Detect behavior drift and structural changes across agent executions. Compare current runs against baselines to identify root causes.
Review each regression, compare baseline and current behavior, and run experiments to validate fixes.