Sandbox

policy-grounding-failures

76 items, 4 trace-derived

Failure cases where policy-heavy outputs drifted, over-fit, or hallucinated unsupported rules.

Dataset id

dataset_policy_grounding_failures

Recommended dataset actions

Return to datasets

Go back to the dataset workbench and compare this evidence set against the rest of your current inventory.

Review evaluator coverage

Confirm the active evaluator set is appropriate for the cases represented in this dataset.

Launch or inspect experiments

Use this dataset as the evidence base for candidate prompt or routing experiments.

Return to source traces

Inspect the original runs feeding this dataset to make sure curation still matches the operational problem.

Dataset id

dataset_policy_grounding_failures

Stable dataset identifier.

Created

1m ago

Dataset creation time relative to now.

Items

76

Recorded cases currently attached to this dataset.

Trace lineage

4

Items carrying a source trace id for evidence inspection.
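
The two counts above can be reproduced from the item records. The sketch below is illustrative only: it assumes each item is a dict with an `input` object that may carry a `traceId`, mirroring the records listed on this page. The `summarize` helper and the `items` / `traceLineage` keys are hypothetical names, not part of any documented API.

```python
from typing import Any


def summarize(items: list[dict[str, Any]]) -> dict[str, int]:
    """Count all items, and those whose input carries a non-empty traceId."""
    traced = sum(1 for item in items if item.get("input", {}).get("traceId"))
    return {"items": len(items), "traceLineage": traced}


# Two trace-derived items from this dataset plus one hypothetical item
# that has no source trace attached.
sample = [
    {"input": {"traceId": "trace_policy_damage_claim_v4_hallucination"}},
    {"input": {"traceId": "trace_returns_exception_v18_regression"}},
    {"input": {}},  # assumption: an item without trace lineage
]

summary = summarize(sample)
print(summary)
```

Applied to the full dataset, the same counting would be expected to give 76 items and a trace lineage of 4, as shown in the summary fields.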

Dataset items

dataset_policy_grounding_failures_item_1

Added 2d ago

trace-derived

Input

{
  "traceId": "trace_policy_damage_claim_v4_hallucination",
  "storyLabel": "Damaged-item policy checker invented a denial rule",
  "sessionId": "session_damage_claim_policy",
  "promptName": "refund-policy-check",
  "promptVersion": 4
}

Expected output

{
  "expectedAgentId": "Policy Grounding Reviewer",
  "expectedPromptName": "refund-policy-check",
  "expectedPromptVersion": 4,
  "expectedStory": "Damaged-item policy checker invented a denial rule",
  "expectedOutcome": "Match the seeded production behavior captured by this trace."
}

dataset_policy_grounding_failures_item_2

Added 3d ago

trace-derived

Input

{
  "traceId": "trace_returns_exception_v18_regression",
  "storyLabel": "Compressed refund rollout denied a valid exception path",
  "sessionId": "session_returns_exception_week1",
  "promptName": "support-reply",
  "promptVersion": 18
}

Expected output

{
  "expectedAgentId": "Returns Resolution Copilot",
  "expectedPromptName": "support-reply",
  "expectedPromptVersion": 18,
  "expectedStory": "Compressed refund rollout denied a valid exception path",
  "expectedOutcome": "Match the seeded production behavior captured by this trace."
}

dataset_policy_grounding_failures_item_3

Added 1d ago

trace-derived

Input

{
  "traceId": "trace_fraud_pattern_review_06_053",
  "storyLabel": "Fraud-watch run flagged suspicious refund behavior (6.53)",
  "sessionId": "session_refund_risk_06",
  "promptName": "fraud-risk-review",
  "promptVersion": 2
}

Expected output

{
  "expectedAgentId": "Fraud Watch Investigator",
  "expectedPromptName": "fraud-risk-review",
  "expectedPromptVersion": 2,
  "expectedStory": "Fraud-watch run flagged suspicious refund behavior (6.53)",
  "expectedOutcome": "Match the seeded production behavior captured by this trace."
}

dataset_policy_grounding_failures_item_4

Added 4d ago

trace-derived

Input

{
  "traceId": "trace_refund_clarification_needed_03_013",
  "storyLabel": "Refund clarification request avoided a false denial (3.13)",
  "sessionId": "session_refund_clarification_03",
  "promptName": "refund-policy-check",
  "promptVersion": 3
}

Expected output

{
  "expectedAgentId": "Returns Resolution Copilot",
  "expectedPromptName": "refund-policy-check",
  "expectedPromptVersion": 3,
  "expectedStory": "Refund clarification request avoided a false denial (3.13)",
  "expectedOutcome": "Match the seeded production behavior captured by this trace."
}
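
In every item listed here, the expected output repeats the prompt name, prompt version, and story label from the input. A minimal sketch of a consistency check along those lines follows. The `check_item` helper and the field pairing are assumptions inferred from the records on this page, not a documented validation rule.

```python
from typing import Any

# Assumed pairing of input fields to their expected-output counterparts.
FIELD_PAIRS = [
    ("promptName", "expectedPromptName"),
    ("promptVersion", "expectedPromptVersion"),
    ("storyLabel", "expectedStory"),
]


def check_item(item_input: dict[str, Any], expected: dict[str, Any]) -> list[str]:
    """Return one message per input field that disagrees with its expected counterpart."""
    problems = []
    for in_key, out_key in FIELD_PAIRS:
        if item_input.get(in_key) != expected.get(out_key):
            problems.append(f"{in_key} does not match {out_key}")
    return problems


# Values copied from dataset_policy_grounding_failures_item_4.
item_input = {
    "traceId": "trace_refund_clarification_needed_03_013",
    "storyLabel": "Refund clarification request avoided a false denial (3.13)",
    "sessionId": "session_refund_clarification_03",
    "promptName": "refund-policy-check",
    "promptVersion": 3,
}
item_expected = {
    "expectedAgentId": "Returns Resolution Copilot",
    "expectedPromptName": "refund-policy-check",
    "expectedPromptVersion": 3,
    "expectedStory": "Refund clarification request avoided a false denial (3.13)",
    "expectedOutcome": "Match the seeded production behavior captured by this trace.",
}

problems = check_item(item_input, item_expected)
print(problems or "item is internally consistent")
```

A check like this only confirms that the curated record is internally consistent. It says nothing about whether the captured behavior itself was correct.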
