Inspect experiment status, review attached runs, and connect the experiment back to its dataset, evaluator coverage, and release decision path.
Go back to the experiment workbench and compare this run set against other active or completed candidates.
Confirm the dataset feeding this experiment still represents the production failures or low-scoring cases you intend to fix.
Check whether the evaluator set behind this experiment is strong enough to support a promotion decision.
Use prompt history and release context to decide whether this experiment should influence a real production change.
Created 1m ago
Dataset supplying the evaluation cases for this experiment.
Attached experiment runs currently recorded for this experiment.
$0.1540 total cost across runs
Prompt family this experiment is most directly trying to influence.
Version treated as the stable baseline in this experiment config.
Version under evaluation for a possible release decision.
Keep gathering experiment output before making a release-state change.
{
"targetPromptId": "prompt_support_reply",
"targetPromptName": "support-reply",
"baselinePromptVersion": 11,
"candidatePromptVersion": 12,
"evaluationFocus": "premium escalation sensitivity",
"seededSummary": "Still tuning how aggressively chargeback-like billing cases should be escalated for premium accounts."
}Open the exact prompt diff between v11 and v12 before deciding whether to label or promote a version.
Confirm the experiment outcome is consistent with the current regression picture before changing release state.
Use prompt labels and version history to convert experiment evidence into an explicit environment or production decision.
Start from the comparison surface so you can add one or more peer experiments and review side-by-side evidence in a dedicated workspace.
{
"traceId": "trace_premium_chargeback_v18_missed_escalation",
"storyLabel": "Premium billing dispute answered directly instead of escalating",
"status": "degraded",
"agentId": "Premium Escalation Triage"
}{
"traceId": "trace_premium_chargeback_v19_restored_escalation",
"storyLabel": "Premium billing recovery restored the escalation handoff",
"status": "healthy",
"agentId": "Premium Escalation Triage"
}{
"traceId": "trace_duplicate_charge_review_04_021",
"storyLabel": "Duplicate-charge review resolved billing friction (4.21)",
"status": "healthy",
"agentId": "Billing Operations Guard"
}{
"traceId": "trace_premium_contract_escalation_03_045",
"storyLabel": "Enterprise contract question escalated correctly (3.45)",
"status": "healthy",
"agentId": "Premium Escalation Triage"
}