Inspect experiment status, review attached runs, and connect the experiment back to its dataset, evaluator coverage, and release decision path.
Go back to the experiment workbench and compare this run set against other active or completed candidates.
Confirm the dataset feeding this experiment still represents the production failures or low-scoring cases you intend to fix.
Check whether the evaluator set behind this experiment is strong enough to support a promotion decision.
Use prompt history and release context to decide whether this experiment should influence a real production change.
Created 1m ago
Dataset supplying the evaluation cases for this experiment.
Attached experiment runs currently recorded for this experiment.
$0.3377 total cost across runs
Prompt family this experiment is most directly trying to influence.
Version treated as the stable baseline in this experiment config.
Version under evaluation for a possible release decision.
Use runs, prompt history, and regressions together before setting or changing labels.
{
  "targetPromptId": "prompt_support_reply",
  "targetPromptName": "support-reply",
  "baselinePromptVersion": 18,
  "candidatePromptVersion": 19,
  "evaluationFocus": "refund exception grounding",
  "seededWinningCandidate": "support-reply v19 for Returns Resolution Copilot",
  "seededSummary": "Version 19 restores grounded refund exception handling across the weekly return-risk cohort while keeping cost close to the compressed rollout."
}
Open the exact prompt diff between v18 and v19 before deciding whether to label or promote a version.
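As a rough illustration of how the experiment config could feed that diff step, the sketch below reads the baseline and candidate versions from the config and builds a label for the pair to review. The field names and values come from the config shown on this page (a subset of it); the helpers `version_pair` and `diff_label` are hypothetical and not part of any documented API.

```python
import json

# Subset of the experiment config shown on this page.
CONFIG_JSON = """
{
  "targetPromptId": "prompt_support_reply",
  "targetPromptName": "support-reply",
  "baselinePromptVersion": 18,
  "candidatePromptVersion": 19,
  "evaluationFocus": "refund exception grounding"
}
"""


def version_pair(config):
    """Return (baseline, candidate) after basic sanity checks."""
    baseline = config["baselinePromptVersion"]
    candidate = config["candidatePromptVersion"]
    if not isinstance(baseline, int) or not isinstance(candidate, int):
        raise ValueError("prompt versions must be integers")
    if baseline == candidate:
        raise ValueError("baseline and candidate must differ")
    return baseline, candidate


def diff_label(config):
    """Hypothetical helper: human-readable label for the diff to review."""
    baseline, candidate = version_pair(config)
    return f"{config['targetPromptName']} v{baseline} -> v{candidate}"


config = json.loads(CONFIG_JSON)
label = diff_label(config)
print(label)
```

Checking that the two versions differ before building the label guards against a config that accidentally compares a version with itself.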
Confirm the experiment outcome is consistent with the current regression picture before changing release state.
Use prompt labels and version history to convert experiment evidence into an explicit environment or production decision.
Start from the comparison surface so you can add one or more peer experiments and review side-by-side evidence in a dedicated workspace.
{
  "traceId": "trace_returns_exception_v17_baseline",
  "storyLabel": "Late return request handled with clear exception guidance",
  "status": "healthy",
  "agentId": "Returns Resolution Copilot"
}
{
  "traceId": "trace_returns_exception_v18_regression",
  "storyLabel": "Compressed refund rollout denied a valid exception path",
  "status": "degraded",
  "agentId": "Returns Resolution Copilot"
}
{
  "traceId": "trace_returns_exception_v19_recovery",
  "storyLabel": "Recovery candidate restored refund exception handling",
  "status": "healthy",
  "agentId": "Returns Resolution Copilot"
}
{
  "traceId": "trace_policy_damage_claim_v4_hallucination",
  "storyLabel": "Damaged-item policy checker invented a denial rule",
  "status": "degraded",
  "agentId": "Policy Grounding Reviewer"
}
{
  "traceId": "trace_refund_clarification_needed_02_011",
  "storyLabel": "Refund clarification request avoided a false denial (2.11)",
  "status": "healthy",
  "agentId": "Returns Resolution Copilot"
}
{
  "traceId": "trace_refund_clarification_needed_05_014",
  "storyLabel": "Refund clarification request avoided a false denial (5.14)",
  "status": "healthy",
  "agentId": "Returns Resolution Copilot"
}
{
  "traceId": "trace_refund_clarification_needed_07_019",
  "storyLabel": "Refund clarification request avoided a false denial (7.19)",
  "status": "healthy",
  "agentId": "Returns Resolution Copilot"
}
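One way to read the trace records above together is to tally them by status and list the degraded traces per agent. The sketch below does that with the `traceId`, `status`, and `agentId` values shown on this page (`storyLabel` is omitted for brevity); the function names are hypothetical, and how the records are actually exported is an assumption.

```python
from collections import Counter

# Trace records copied from this page, without storyLabel.
TRACES = [
    {"traceId": "trace_returns_exception_v17_baseline",
     "status": "healthy", "agentId": "Returns Resolution Copilot"},
    {"traceId": "trace_returns_exception_v18_regression",
     "status": "degraded", "agentId": "Returns Resolution Copilot"},
    {"traceId": "trace_returns_exception_v19_recovery",
     "status": "healthy", "agentId": "Returns Resolution Copilot"},
    {"traceId": "trace_policy_damage_claim_v4_hallucination",
     "status": "degraded", "agentId": "Policy Grounding Reviewer"},
    {"traceId": "trace_refund_clarification_needed_02_011",
     "status": "healthy", "agentId": "Returns Resolution Copilot"},
    {"traceId": "trace_refund_clarification_needed_05_014",
     "status": "healthy", "agentId": "Returns Resolution Copilot"},
    {"traceId": "trace_refund_clarification_needed_07_019",
     "status": "healthy", "agentId": "Returns Resolution Copilot"},
]


def status_counts(traces):
    """Count traces per status value."""
    return Counter(trace["status"] for trace in traces)


def degraded_by_agent(traces):
    """Map each agent to the IDs of its degraded traces."""
    result = {}
    for trace in traces:
        if trace["status"] == "degraded":
            result.setdefault(trace["agentId"], []).append(trace["traceId"])
    return result


print(dict(status_counts(TRACES)))
for agent, trace_ids in degraded_by_agent(TRACES).items():
    print(agent, trace_ids)
```

Grouping the degraded traces by agent separates the v18 regression in Returns Resolution Copilot from the unrelated Policy Grounding Reviewer failure, which helps when checking the candidate against the current regression picture.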