Inspect experiment status, review attached runs, and connect the experiment back to its dataset, evaluator coverage, and release decision path.
Go back to the experiment workbench and compare this run set against other active or completed candidates.
Confirm the dataset feeding this experiment still represents the production failures or low-scoring cases you intend to fix.
Check whether the evaluator set behind this experiment is strong enough to support a promotion decision.
Use prompt history and release context to decide whether this experiment should influence a real production change.
Created 1m ago
Dataset supplying the evaluation cases for this experiment.
Attached experiment runs currently recorded for this experiment.
$0.1894 total cost across runs
Prompt family this experiment is most directly trying to influence.
Version treated as the stable baseline in this experiment config.
Version under evaluation for a possible release decision.
Review runs, prompt history, and regressions together before setting or changing labels.
{
  "targetPromptId": "prompt_support_reply",
  "targetPromptName": "support-reply",
  "baselinePromptVersion": 6,
  "candidatePromptVersion": 7,
  "evaluationFocus": "shipping fallback resilience",
  "seededWinningCandidate": "shipping-delay-triage v7 with fallback hardening",
  "seededSummary": "The revised fallback sequence cuts the worst timeout impact without forcing every shipping case onto the expensive baseline model."
}
Open the exact prompt diff between v6 and v7 before deciding whether to label or promote a version.
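As a minimal sketch, the experiment config shown above could be read programmatically to work out which two versions to diff. The field names come from the config itself; the helper names `version_pair` and `diff_label` are hypothetical and not part of any documented API.

```python
import json

# Subset of the experiment config shown above, copied verbatim.
CONFIG_JSON = """
{
  "targetPromptId": "prompt_support_reply",
  "targetPromptName": "support-reply",
  "baselinePromptVersion": 6,
  "candidatePromptVersion": 7,
  "evaluationFocus": "shipping fallback resilience"
}
"""


def version_pair(config: dict) -> tuple:
    """Return (baseline, candidate) after basic sanity checks (hypothetical helper)."""
    baseline = config["baselinePromptVersion"]
    candidate = config["candidatePromptVersion"]
    if not isinstance(baseline, int) or not isinstance(candidate, int):
        raise ValueError("prompt versions must be integers")
    if baseline == candidate:
        raise ValueError("baseline and candidate are the same version; nothing to diff")
    return baseline, candidate


def diff_label(config: dict) -> str:
    """Build a short human-readable label for the diff to open (hypothetical helper)."""
    baseline, candidate = version_pair(config)
    return f"{config['targetPromptName']}: v{baseline} -> v{candidate}"


config = json.loads(CONFIG_JSON)
print(diff_label(config))  # prints "support-reply: v6 -> v7"
```

Failing early when the two versions are equal or non-integer keeps a misconfigured experiment from producing an empty or misleading diff.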
Confirm the experiment outcome is consistent with the current regression picture before changing release state.
Use prompt labels and version history to convert experiment evidence into an explicit environment or production decision.
Start from the comparison surface so you can add one or more peer experiments and review side-by-side evidence in a dedicated workspace.
{
  "traceId": "trace_shipping_kb_timeout_failed",
  "storyLabel": "Shipping resolution degraded during logistics lookup timeout",
  "status": "degraded",
  "agentId": "Shipping Delay Resolution"
}

{
  "traceId": "trace_shipping_kb_timeout_recovered",
  "storyLabel": "Fallback shipping path recovered after timeout cluster",
  "status": "healthy",
  "agentId": "Shipping Delay Resolution"
}

{
  "traceId": "trace_shipping_status_refresh_01_001",
  "storyLabel": "Shipping status explanation stayed grounded (1.1)",
  "status": "healthy",
  "agentId": "Shipping Delay Resolution"
}

{
  "traceId": "trace_shipping_delay_warning_05_071",
  "storyLabel": "Shipping status path stayed usable but slow (5.71)",
  "status": "healthy",
  "agentId": "Shipping Delay Resolution"
}

{
  "traceId": "trace_shipping_status_refresh_07_010",
  "storyLabel": "Shipping status explanation stayed grounded (7.10)",
  "status": "healthy",
  "agentId": "Shipping Delay Resolution"
}
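A rough sketch of how trace records like those above could be tallied by status before a release decision. It assumes the records arrive as JSON objects placed one after another (with or without whitespace between them) and keeps only the `traceId` and `status` fields; `iter_json_objects` is a hypothetical helper built on the standard library's `json.JSONDecoder.raw_decode`.

```python
import json
from collections import Counter

# traceId and status of the five traces listed above, written
# back-to-back to mimic objects that are not wrapped in an array.
RAW = (
    '{"traceId": "trace_shipping_kb_timeout_failed", "status": "degraded"}'
    '{"traceId": "trace_shipping_kb_timeout_recovered", "status": "healthy"}'
    '{"traceId": "trace_shipping_status_refresh_01_001", "status": "healthy"}'
    '{"traceId": "trace_shipping_delay_warning_05_071", "status": "healthy"}'
    '{"traceId": "trace_shipping_status_refresh_07_010", "status": "healthy"}'
)


def iter_json_objects(text: str):
    """Yield each JSON value from a string of consecutive JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


traces = list(iter_json_objects(RAW))
counts = Counter(t["status"] for t in traces)
degraded = [t["traceId"] for t in traces if t["status"] != "healthy"]

print(dict(counts))
print("needs review:", degraded)
```

Listing the non-healthy trace IDs alongside the counts makes it easier to open the specific degraded traces before changing release state.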