Failure cases where policy-heavy outputs drifted, overfit, or hallucinated unsupported rules.
Go back to the dataset workbench and compare this evidence set against the rest of your current inventory.
Confirm the active evaluator set is appropriate for the cases represented in this dataset.
Use this dataset as the evidence base for candidate prompt or routing experiments.
Inspect the original runs feeding this dataset to make sure curation still matches the operational problem.
Stable dataset identifier.
Dataset creation time relative to now.
Recorded cases currently attached to this dataset.
Items carrying a source trace ID for evidence inspection.
{
  "traceId": "trace_policy_damage_claim_v4_hallucination",
  "storyLabel": "Damaged-item policy checker invented a denial rule",
  "sessionId": "session_damage_claim_policy",
  "promptName": "refund-policy-check",
  "promptVersion": 4
}
{
  "expectedAgentId": "Policy Grounding Reviewer",
  "expectedPromptName": "refund-policy-check",
  "expectedPromptVersion": 4,
  "expectedStory": "Damaged-item policy checker invented a denial rule",
  "expectedOutcome": "Match the seeded production behavior captured by this trace."
}

{
  "traceId": "trace_returns_exception_v18_regression",
  "storyLabel": "Compressed refund rollout denied a valid exception path",
  "sessionId": "session_returns_exception_week1",
  "promptName": "support-reply",
  "promptVersion": 18
}
{
  "expectedAgentId": "Returns Resolution Copilot",
  "expectedPromptName": "support-reply",
  "expectedPromptVersion": 18,
  "expectedStory": "Compressed refund rollout denied a valid exception path",
  "expectedOutcome": "Match the seeded production behavior captured by this trace."
}

{
  "traceId": "trace_fraud_pattern_review_06_053",
  "storyLabel": "Fraud-watch run flagged suspicious refund behavior (6.53)",
  "sessionId": "session_refund_risk_06",
  "promptName": "fraud-risk-review",
  "promptVersion": 2
}
{
  "expectedAgentId": "Fraud Watch Investigator",
  "expectedPromptName": "fraud-risk-review",
  "expectedPromptVersion": 2,
  "expectedStory": "Fraud-watch run flagged suspicious refund behavior (6.53)",
  "expectedOutcome": "Match the seeded production behavior captured by this trace."
}

{
  "traceId": "trace_refund_clarification_needed_03_013",
  "storyLabel": "Refund clarification request avoided a false denial (3.13)",
  "sessionId": "session_refund_clarification_03",
  "promptName": "refund-policy-check",
  "promptVersion": 3
}
{
  "expectedAgentId": "Returns Resolution Copilot",
  "expectedPromptName": "refund-policy-check",
  "expectedPromptVersion": 3,
  "expectedStory": "Refund clarification request avoided a false denial (3.13)",
  "expectedOutcome": "Match the seeded production behavior captured by this trace."
}
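
The items above appear to alternate between an input record (with a traceId) and an expected record. The sketch below is one possible way to sanity-check such an export in Python: it reads back-to-back JSON objects, pairs them, compares the prompt name, prompt version, and story label on each side, and counts the items that carry a trace id. The helper names (iter_json_objects, pair_items, mismatches), the strict input/expected alternation, and the field mapping are assumptions inferred from the key names shown here, not a documented export format.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of back-to-back objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def pair_items(objects):
    """Group objects as (input, expected) pairs, assuming strict alternation."""
    objects = list(objects)
    if len(objects) % 2:
        raise ValueError("expected an even number of objects")
    return list(zip(objects[0::2], objects[1::2]))


def mismatches(item_input, expected):
    """Return the input field names whose values differ from the expected record."""
    # Hypothetical mapping, inferred from the key names in the records above.
    field_map = {
        "promptName": "expectedPromptName",
        "promptVersion": "expectedPromptVersion",
        "storyLabel": "expectedStory",
    }
    return [
        src for src, dst in field_map.items()
        if item_input.get(src) != expected.get(dst)
    ]


# First item of the dataset, copied from the records above.
sample = """{
"traceId": "trace_policy_damage_claim_v4_hallucination",
"storyLabel": "Damaged-item policy checker invented a denial rule",
"sessionId": "session_damage_claim_policy",
"promptName": "refund-policy-check",
"promptVersion": 4
}{
"expectedAgentId": "Policy Grounding Reviewer",
"expectedPromptName": "refund-policy-check",
"expectedPromptVersion": 4,
"expectedStory": "Damaged-item policy checker invented a denial rule",
"expectedOutcome": "Match the seeded production behavior captured by this trace."
}"""

pairs = pair_items(iter_json_objects(sample))
# Count items that carry a source trace id.
traced = sum(1 for item_input, _ in pairs if item_input.get("traceId"))

for item_input, expected in pairs:
    print(item_input["traceId"], mismatches(item_input, expected))
```

An empty mismatch list means the input and expected records agree on the compared fields. It says nothing about whether the expected agent is the right one for that prompt, which still needs a manual look at the original runs.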