A Deterministic Trajectory-Level Evaluation Framework for Learning-Based Agentic Systems
DOI: https://doi.org/10.32996/jefas.2026.8.3.3

Keywords: Agentic Systems, Trajectory-Level Evaluation, Deterministic Replay, Learning-Based Autonomous Agents, AI Governance and Reliability

Abstract
Learning-based agentic systems are increasingly deployed in complex decision environments where reliability, transparency, and compliance with governance requirements are critical. However, the dominant evaluation strategies remain outcome-centric, focusing on aggregate performance measures such as accuracy, reward, or task completion rates. These strategies provide little insight into the internal decision-making processes that produce the observable outcomes, especially for multi-step reasoning-based agentic systems. This paper proposes a Deterministic Trajectory-Level Evaluation Framework (DTLEF) for learning-based agentic systems. The proposed framework shifts the evaluation paradigm from traditional outcome-oriented metrics to an evaluation process centered on the trajectories of states and actions under controlled execution conditions. The DTLEF integrates standardized state initialization, comprehensive action-trace logging, deterministic replay validation, and behavior verification against governance constraints. By running the agentic system in controlled inference mode and comparing trajectory traces across runs, the evaluation process enables the identification of policy instabilities, reasoning drift, and non-deterministic behavior, while also verifying that the trajectories remain consistent with predefined constraints. Unlike traditional performance metrics, the proposed framework does not depend on empirical data or domain-specific metrics. Instead, it provides a methodology for evaluating agentic systems, including autonomous decision pipelines, tool-augmented language agents, and cyber-physical control systems, at the architecture level. The framework increases transparency, reproducibility, and compliance without modifying training procedures.
By formally defining trajectory-level determinism as a primary evaluation criterion, this research provides a scalable, domain-independent methodology for validating learning-based autonomous agents in environments where behavioral reliability is as important as functional performance.
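To make the evaluation loop described in the abstract concrete, the following is a minimal, hypothetical sketch of trajectory-level deterministic replay validation. All names (`run_episode`, `trace_digest`, `replay_consistent`, `satisfies_constraints`) and the toy seeded policy are illustrative assumptions, not artifacts of the paper's actual implementation.

```python
import hashlib
import json
import random

def run_episode(seed, steps=5):
    """Run a toy agent in controlled inference mode (fixed seed = standardized
    state) and log the full (state, action) trajectory trace."""
    rng = random.Random(seed)  # seeded RNG stands in for controlled execution
    state = 0
    trace = []
    for _ in range(steps):
        action = rng.choice(["left", "right", "noop"])  # stand-in policy
        trace.append({"state": state, "action": action})
        state += {"left": -1, "right": 1, "noop": 0}[action]
    return trace

def trace_digest(trace):
    """Canonical hash of a trajectory, used to compare traces across runs."""
    payload = json.dumps(trace, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def replay_consistent(seed, runs=3):
    """Deterministic replay validation: repeated runs under identical
    conditions must yield byte-identical traces (a single digest)."""
    digests = {trace_digest(run_episode(seed)) for _ in range(runs)}
    return len(digests) == 1

def satisfies_constraints(trace, max_abs_state=3):
    """Behavior verification: every visited state must respect a predefined
    governance constraint (here, a bound on state magnitude)."""
    return all(abs(step["state"]) <= max_abs_state for step in trace)
```

Under these assumptions, `replay_consistent(42)` returns `True` because the seeded policy is deterministic; a policy with hidden nondeterminism (unseeded sampling, wall-clock dependence) would produce diverging digests, which is exactly the instability signal the framework is designed to surface.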
Copyright (c) 2026. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
