Inspect experiment status, review attached runs, and connect the experiment back to its dataset, evaluator coverage, and release decision path.
Go back to the experiment workbench and compare this run set against other active or completed candidates.
Confirm the dataset feeding this experiment still represents the production failures or low-scoring cases you intend to fix.
Check whether the evaluator set behind this experiment is strong enough to support a promotion decision.
Use prompt history and release context to decide whether this experiment should influence a real production change.
Created 1m ago
Dataset supplying the evaluation cases for this experiment.
Runs currently attached to this experiment.
$0.1469 total cost across runs
Prompt family this experiment most directly targets.
Version treated as the stable baseline in this experiment config.
Version under evaluation for a possible release decision.
Use runs, prompt history, and regressions together before setting or changing labels.
{
  "targetPromptId": "prompt_support_reply",
  "targetPromptName": "support-reply",
  "baselinePromptVersion": 3,
  "candidatePromptVersion": 4,
  "evaluationFocus": "operator summary specificity",
  "seededWinningCandidate": "weekly-support-brief v4",
  "seededSummary": "Operator summaries are now specific enough to support investigation handoff and executive review without requiring a replay first."
}
Open the exact prompt diff between v3 and v4 before deciding whether to label or promote a version.
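As a rough illustration of how the config above could drive the diff step, the sketch below parses the same JSON and returns the baseline and candidate versions to compare. The helper name `versions_to_diff` and the rule that the candidate version must be greater than the baseline are assumptions made for this example; the page does not state how the product validates these fields.

```python
import json

# Experiment config copied from the page above.
CONFIG_JSON = """
{
  "targetPromptId": "prompt_support_reply",
  "targetPromptName": "support-reply",
  "baselinePromptVersion": 3,
  "candidatePromptVersion": 4,
  "evaluationFocus": "operator summary specificity"
}
"""


def versions_to_diff(config: dict) -> tuple[int, int]:
    """Return (baseline, candidate) versions after basic sanity checks.

    Hypothetical helper: the ordering rule below is an assumption,
    not documented product behavior.
    """
    baseline = config["baselinePromptVersion"]
    candidate = config["candidatePromptVersion"]
    if not isinstance(baseline, int) or not isinstance(candidate, int):
        raise ValueError("prompt versions must be integers")
    if candidate <= baseline:
        raise ValueError("candidate version should be newer than the baseline")
    return baseline, candidate


config = json.loads(CONFIG_JSON)
baseline, candidate = versions_to_diff(config)
print(f"Diff {config['targetPromptName']}: v{baseline} -> v{candidate}")
```

A check like this only confirms that the two versions are a sensible pair to compare; the prompt diff itself still needs a human read before any label changes.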
Confirm the experiment outcome is consistent with the current regression picture before changing release state.
Use prompt labels and version history to convert experiment evidence into an explicit environment or production decision.
Start from the comparison surface so you can add one or more peer experiments and review side-by-side evidence in a dedicated workspace.
{
  "traceId": "trace_returns_exception_v18_regression",
  "storyLabel": "Compressed refund rollout denied a valid exception path",
  "status": "degraded",
  "agentId": "Returns Resolution Copilot"
}
{
  "traceId": "trace_shipping_kb_timeout_failed",
  "storyLabel": "Shipping resolution degraded during logistics lookup timeout",
  "status": "degraded",
  "agentId": "Shipping Delay Resolution"
}
{
  "traceId": "trace_fraud_pattern_review_07_060",
  "storyLabel": "Fraud-watch run flagged suspicious refund behavior (7.60)",
  "status": "healthy",
  "agentId": "Fraud Watch Investigator"
}
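One way to check the experiment outcome against the regression picture is to list the traces that are still degraded before changing release state. The sketch below uses the three trace records shown above. The `TraceSummary` class, the `blocking_traces` helper, and the choice to treat `"degraded"` as a blocking status are assumptions for illustration, not documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceSummary:
    trace_id: str
    status: str
    agent_id: str


# Trace records taken from the page above.
TRACES = [
    TraceSummary("trace_returns_exception_v18_regression", "degraded",
                 "Returns Resolution Copilot"),
    TraceSummary("trace_shipping_kb_timeout_failed", "degraded",
                 "Shipping Delay Resolution"),
    TraceSummary("trace_fraud_pattern_review_07_060", "healthy",
                 "Fraud Watch Investigator"),
]


def blocking_traces(traces, blocking_statuses=frozenset({"degraded"})):
    """Return traces whose status should pause a release decision.

    Assumption: only "degraded" blocks; adjust the set if other
    statuses exist in your workspace.
    """
    return [t for t in traces if t.status in blocking_statuses]


blockers = blocking_traces(TRACES)
ready_for_release_review = not blockers

for trace in blockers:
    print(f"Review before promoting: {trace.trace_id} ({trace.agent_id})")
print(f"Ready for release review: {ready_for_release_review}")
```

With the records above, two traces are flagged and the result is not ready for release review, so the experiment evidence would need to be weighed against those open regressions first.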