Executive-review cohort connecting customer-visible issues to budgets, SLAs, and experiment decisions.
Go back to the dataset workbench and compare this evidence set against the rest of your current inventory.
Confirm the active evaluator set is appropriate for the cases represented in this dataset.
Use this dataset as the evidence base for candidate prompt or routing experiments.
Inspect the original runs feeding this dataset to make sure curation still matches the operational problem.
Stable dataset identifier.
Dataset creation time relative to now.
Recorded cases currently attached to this dataset.
Items carrying a source trace ID for evidence inspection.
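The four summary fields described above (identifier, relative creation time, attached case count, and count of items with a source trace) could be derived along these lines. This is only a sketch under stated assumptions: `DatasetSummary`, `humanize_age`, `summarize`, the `ds_example` identifier, and the age format are all hypothetical names chosen for illustration, not the workbench's actual API. The only detail taken from the records themselves is the `traceId` key.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DatasetSummary:
    # Hypothetical field names mirroring the four descriptions above.
    dataset_id: str          # stable dataset identifier
    created_ago: str         # creation time relative to now
    case_count: int          # recorded cases attached to the dataset
    traced_item_count: int   # items carrying a source trace id


def humanize_age(created_at: datetime, now: datetime) -> str:
    """Render an age such as '3h ago' (the format is an assumption)."""
    seconds = int((now - created_at).total_seconds())
    if seconds < 60:
        return "just now"
    if seconds < 3600:
        return f"{seconds // 60}m ago"
    if seconds < 86400:
        return f"{seconds // 3600}h ago"
    return f"{seconds // 86400}d ago"


def summarize(dataset_id, created_at, items, now=None) -> DatasetSummary:
    """Build the summary from a list of item dicts."""
    now = now or datetime.now(timezone.utc)
    # An item counts as traced only when it has a non-empty traceId.
    traced = sum(1 for item in items if item.get("traceId"))
    return DatasetSummary(dataset_id, humanize_age(created_at, now), len(items), traced)


# Usage with made-up values: two items, one of which has a trace id.
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
summary = summarize(
    "ds_example",
    now - timedelta(hours=3),
    [{"traceId": "trace_shipping_kb_timeout_failed"}, {}],
    now=now,
)
print(summary)
```

Passing `now` explicitly keeps the relative-time field deterministic, which helps when comparing summaries across runs.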
{
  "traceId": "trace_returns_exception_v18_regression",
  "storyLabel": "Compressed refund rollout denied a valid exception path",
  "sessionId": "session_returns_exception_week1",
  "promptName": "support-reply",
  "promptVersion": 18
}
{
  "expectedAgentId": "Returns Resolution Copilot",
  "expectedPromptName": "support-reply",
  "expectedPromptVersion": 18,
  "expectedStory": "Compressed refund rollout denied a valid exception path",
  "expectedOutcome": "Match the seeded production behavior captured by this trace."
}

{
  "traceId": "trace_shipping_kb_timeout_failed",
  "storyLabel": "Shipping resolution degraded during logistics lookup timeout",
  "sessionId": "session_shipping_timeout_cluster",
  "promptName": "shipping-delay-triage",
  "promptVersion": 7
}
{
  "expectedAgentId": "Shipping Delay Resolution",
  "expectedPromptName": "shipping-delay-triage",
  "expectedPromptVersion": 7,
  "expectedStory": "Shipping resolution degraded during logistics lookup timeout",
  "expectedOutcome": "Match the seeded production behavior captured by this trace."
}

{
  "traceId": "trace_fraud_pattern_review_07_060",
  "storyLabel": "Fraud-watch run flagged suspicious refund behavior (7.60)",
  "sessionId": "session_refund_risk_07",
  "promptName": "fraud-risk-review",
  "promptVersion": 2
}
{
  "expectedAgentId": "Fraud Watch Investigator",
  "expectedPromptName": "fraud-risk-review",
  "expectedPromptVersion": 2,
  "expectedStory": "Fraud-watch run flagged suspicious refund behavior (7.60)",
  "expectedOutcome": "Match the seeded production behavior captured by this trace."
}
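The records above alternate between an input item and the expected values for it, written as a sequence of JSON objects with no enclosing array. The sketch below shows one way to read such a stream and confirm that each expectation agrees with its item. It assumes strict item/expectation alternation and that `promptName`, `promptVersion`, and `storyLabel` should equal `expectedPromptName`, `expectedPromptVersion`, and `expectedStory`. That mapping is inferred from the sample records, and the helper names `iter_json_objects` and `pair_and_check` are hypothetical.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        if text[pos].isspace():
            pos += 1
            continue
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Assumed mapping between item keys and expectation keys.
CHECKS = [
    ("promptName", "expectedPromptName"),
    ("promptVersion", "expectedPromptVersion"),
    ("storyLabel", "expectedStory"),
]


def pair_and_check(text):
    """Pair each item with the expectation after it; return (traceId, key) mismatches."""
    objects = list(iter_json_objects(text))
    if len(objects) % 2 != 0:
        raise ValueError("expected an even number of objects (item, expectation)")
    problems = []
    for item, expected in zip(objects[0::2], objects[1::2]):
        for item_key, expected_key in CHECKS:
            if item.get(item_key) != expected.get(expected_key):
                problems.append((item.get("traceId"), item_key))
    return problems


# One item/expectation pair copied from the records above, written back to back.
sample = (
    '{"traceId": "trace_shipping_kb_timeout_failed", '
    '"storyLabel": "Shipping resolution degraded during logistics lookup timeout", '
    '"promptName": "shipping-delay-triage", "promptVersion": 7}'
    '{"expectedPromptName": "shipping-delay-triage", "expectedPromptVersion": 7, '
    '"expectedStory": "Shipping resolution degraded during logistics lookup timeout"}'
)
print(pair_and_check(sample))  # prints []
```

An empty list means every checked field agrees. Any mismatch is reported with the item's `traceId`, so the original run can be inspected.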