refund-policy-check
prompt_refund_policy_check
v5 · restores grounded policy behavior
Versions: 3
Latest: v5
Traces: 71
Error rate: 1.4%
Avg cost: $0.0362
Version history
v5 · restores grounded policy behavior · 4/22/2026 · gpt-4o-mini
v4 · source of the policy hallucination story · 4/22/2026 · gpt-4o-mini
v3 · reference policy behavior · 4/22/2026 · gpt-4o
Version 5
gpt-4o-mini · Created 4/22/2026, 5:33:07 PM
restores grounded policy behavior
Corrected policy grounding with safer escalation fallback
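Impact: v4 → v5 (Improved)

Metric        v4        v5        Change
Error rate    100.0%    0.0%      -100.0pp
Avg cost      $0.0522   $0.0000   -100%
Avg latency   10.90s    0.00s     -100%
Traces        1         0         -1

Note: v5 has 0 traces in this comparison, so its 0.0%, $0.0000, and 0.00s values reflect no recorded traces rather than measured performance.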
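The Impact figures use two different units: the error-rate change is in percentage points (pp), while the cost and latency changes are relative percentages. The sketch below reproduces that arithmetic from the values shown in the panel. The function names `pp_delta` and `relative_change` are hypothetical helpers for illustration, not part of any dashboard API.

```python
from typing import Optional


def pp_delta(before_pct: float, after_pct: float) -> float:
    """Absolute change between two percentages, in percentage points."""
    return after_pct - before_pct


def relative_change(before: float, after: float) -> Optional[float]:
    """Relative change in percent; None when the baseline is zero."""
    if before == 0:
        return None  # a relative change against a zero baseline is undefined
    return (after - before) / before * 100.0


# Values copied from the Impact panel (v4 -> v5).
error_pp = pp_delta(100.0, 0.0)
cost_pct = relative_change(0.0522, 0.0)
latency_pct = relative_change(10.90, 0.0)
trace_delta = 0 - 1

print(f"Error rate:  {error_pp:+.1f}pp")
print(f"Avg cost:    {cost_pct:+.0f}%")
print(f"Avg latency: {latency_pct:+.0f}%")
print(f"Traces:      {trace_delta:+d}")
```

A comparison tool would likely also want to flag the case where the newer version has zero traces, since every averaged metric then collapses to zero regardless of actual behavior.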
Changes from v4
v4: Overly strict policy branch with hallucinated denials
v5: Corrected policy grounding with safer escalation fallback
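The change view above is a word-level diff between the v4 and v5 text. A minimal sketch of how such a diff could be computed follows, using Python's standard `difflib.SequenceMatcher` over whitespace-split words. This is an assumption about the technique, not the dashboard's actual implementation, and `word_diff` is a hypothetical helper name.

```python
from difflib import SequenceMatcher

# The two texts as they appear in the v4 -> v5 change view.
old = "Overly strict policy branch with hallucinated denials".split()
new = "Corrected policy grounding with safer escalation fallback".split()


def word_diff(a, b):
    """Return (marker, text) pairs: '-' removed, '+' added, '=' unchanged."""
    ops = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("=", " ".join(a[i1:i2])))
            continue
        # 'replace', 'delete', and 'insert' each contribute whichever
        # side of the change is non-empty.
        if i2 > i1:
            ops.append(("-", " ".join(a[i1:i2])))
        if j2 > j1:
            ops.append(("+", " ".join(b[j1:j2])))
    return ops


for marker, text in word_diff(old, new):
    print(marker, text)
```

Joining the `-` and `=` segments recovers the v4 text, and joining the `+` and `=` segments recovers the v5 text. The exact grouping of changed words may differ from what a given diff viewer renders.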