ACTION: WATCH
Should the current leading candidate move toward promotion?
Two critical regressions remain in the fleet, so review the experiment evidence before approving any release decision.
Recommendation: Review experiments before promotion
Claude Agent SDK: Add a "use only provided refund-policy context" guard to the system prompt.
Expected impact: faithful-to-context evaluator score expected to lift from 0.71 → ≥ 0.92 within 100 traces.
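The guard and the expected-impact check above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the Claude Agent SDK's API: the guard wording, the helper names `add_guard` and `meets_target`, and the reading of "within 100 traces" as the mean score over the most recent 100 traces are all assumptions.

```python
from statistics import mean

# Hypothetical guard wording; the recommendation names only the guard's intent.
GUARD = (
    "Use only the provided refund-policy context. "
    "If the context does not answer the question, say so."
)

def add_guard(system_prompt: str, guard: str = GUARD) -> str:
    """Append the guard to the system prompt, skipping it if already present."""
    if guard in system_prompt:
        return system_prompt
    return f"{system_prompt.rstrip()}\n\n{guard}"

def meets_target(scores, target=0.92, window=100):
    """Check the faithful-to-context target over the most recent `window` traces.

    Returns None while fewer than `window` scores exist (assumed meaning of
    "within 100 traces"), otherwise True/False for mean >= target.
    """
    if len(scores) < window:
        return None
    return mean(scores[-window:]) >= target
```

A promotion review could call `meets_target` on the evaluator scores collected after the guard is deployed and treat `None` as "not enough evidence yet" rather than as a failure.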