Sandbox

returns-recovery-v19

completed · returns-exception-week

Inspect experiment status, review attached runs, and connect the experiment back to its dataset, evaluator coverage, and release decision path.

Experiment id

exp_returns_recovery_v19

Recommended experiment actions

Return to experiments

Go back to the experiment workbench and compare this run set against other active or completed candidates.

Review source dataset

Confirm the dataset feeding this experiment still represents the production failures or low-scoring cases you intend to fix.

Review evaluator coverage

Check whether the evaluator set behind this experiment is strong enough to support a promotion decision.

Move toward prompt and release review

Use prompt history and release context to decide whether this experiment should influence a real production change.

Compare this experiment

Status

completed

Created 1m ago

Dataset

returns-exception-week

Dataset supplying the evaluation cases for this experiment.

Runs

7

Attached experiment runs currently recorded for this experiment.

Observed output

9554ms avg

$0.3377 total cost across runs
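
The two summary figures above can be reproduced from the per-run latency and cost values listed under "Attached experiment runs". The sketch below assumes the average is an unweighted mean over the seven runs, rounded to the nearest millisecond, and that the total is a plain sum. The page does not state how it aggregates, so treat this as a consistency check rather than the product's actual method. The variable names are illustrative.

```python
from decimal import Decimal
from statistics import mean

# Per-run figures copied from the attached runs on this page (run_1 .. run_7).
latencies_ms = [11800, 9800, 10200, 10900, 7600, 8290, 8290]
costs_usd = [
    Decimal(c)
    for c in ("0.0830", "0.0419", "0.0445", "0.0522", "0.0351", "0.0432", "0.0378")
]

# Assumption: unweighted mean, rounded to the nearest millisecond.
avg_latency_ms = round(mean(latencies_ms))

# Decimal avoids binary floating-point drift when summing currency values.
total_cost_usd = sum(costs_usd, Decimal("0"))

print(f"{avg_latency_ms}ms avg")
print(f"${total_cost_usd} total cost across runs")
```

Under these assumptions the result matches the displayed 9554ms average and $0.3377 total.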

Release decision framing

Target prompt

support-reply

Prompt family this experiment is most directly trying to influence.

Baseline prompt

v18

Version treated as the stable baseline in this experiment config.

Candidate prompt

v19

Version under evaluation for a possible release decision.

Decision posture

Review for promotion

Use runs, prompt history, and regressions together before setting or changing labels.

Experiment config

{
  "targetPromptId": "prompt_support_reply",
  "targetPromptName": "support-reply",
  "baselinePromptVersion": 18,
  "candidatePromptVersion": 19,
  "evaluationFocus": "refund exception grounding",
  "seededWinningCandidate": "support-reply v19 for Returns Resolution Copilot",
  "seededSummary": "Version 19 restores grounded refund exception handling across the weekly return-risk cohort while keeping cost close to the compressed rollout."
}
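
As a rough illustration of how this config might be consumed before a promotion review, the sketch below loads the JSON into a typed record and checks that the candidate version is newer than the baseline. The `ExperimentConfig` class, the `parse_config` helper, and the "candidate must be newer" rule are assumptions made for this example; only the field names and values come from the config shown above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical typed view of the experiment config fields used in review."""
    target_prompt_id: str
    target_prompt_name: str
    baseline_version: int
    candidate_version: int
    evaluation_focus: str


def parse_config(raw: str) -> ExperimentConfig:
    data = json.loads(raw)
    cfg = ExperimentConfig(
        target_prompt_id=data["targetPromptId"],
        target_prompt_name=data["targetPromptName"],
        baseline_version=data["baselinePromptVersion"],
        candidate_version=data["candidatePromptVersion"],
        evaluation_focus=data["evaluationFocus"],
    )
    # Assumed sanity rule (not stated on the page): a candidate should be
    # a later version than the baseline it is compared against.
    if cfg.candidate_version <= cfg.baseline_version:
        raise ValueError("candidate version must be newer than baseline")
    return cfg


RAW_CONFIG = """
{
  "targetPromptId": "prompt_support_reply",
  "targetPromptName": "support-reply",
  "baselinePromptVersion": 18,
  "candidatePromptVersion": 19,
  "evaluationFocus": "refund exception grounding"
}
"""

cfg = parse_config(RAW_CONFIG)
print(f"{cfg.target_prompt_name}: v{cfg.baseline_version} -> v{cfg.candidate_version}")
```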

Promotion review actions

Review candidate vs baseline prompt

Open the exact prompt diff between v18 and v19 before deciding whether to label or promote a version.

Re-check regression posture

Confirm the experiment outcome is consistent with the current regression picture before changing release state.

Move into release controls

Use prompt labels and version history to convert experiment evidence into an explicit environment or production decision.

Open comparison workspace

Start from the comparison surface so you can add one or more peer experiments and review side-by-side evidence in a dedicated workspace.

Attached experiment runs

exp_returns_recovery_v19_run_1

Dataset item dataset_returns_exception_week_item_1 · created 6d ago

output captured

Latency

11800ms

Tokens

Unavailable

Cost

$0.0830

{
  "traceId": "trace_returns_exception_v17_baseline",
  "storyLabel": "Late return request handled with clear exception guidance",
  "status": "healthy",
  "agentId": "Returns Resolution Copilot"
}

exp_returns_recovery_v19_run_2

Dataset item dataset_returns_exception_week_item_2 · created 3d ago

output captured

Latency

9800ms

Tokens

Unavailable

Cost

$0.0419

{
  "traceId": "trace_returns_exception_v18_regression",
  "storyLabel": "Compressed refund rollout denied a valid exception path",
  "status": "degraded",
  "agentId": "Returns Resolution Copilot"
}

exp_returns_recovery_v19_run_3

Dataset item dataset_returns_exception_week_item_3 · created 8h ago

output captured

Latency

10200ms

Tokens

Unavailable

Cost

$0.0445

{
  "traceId": "trace_returns_exception_v19_recovery",
  "storyLabel": "Recovery candidate restored refund exception handling",
  "status": "healthy",
  "agentId": "Returns Resolution Copilot"
}

exp_returns_recovery_v19_run_4

Dataset item dataset_returns_exception_week_item_4 · created 2d ago

output captured

Latency

10900ms

Tokens

Unavailable

Cost

$0.0522

{
  "traceId": "trace_policy_damage_claim_v4_hallucination",
  "storyLabel": "Damaged-item policy checker invented a denial rule",
  "status": "degraded",
  "agentId": "Policy Grounding Reviewer"
}

exp_returns_recovery_v19_run_5

Dataset item dataset_returns_exception_week_item_5 · created 5d ago

output captured

Latency

7600ms

Tokens

Unavailable

Cost

$0.0351

{
  "traceId": "trace_refund_clarification_needed_02_011",
  "storyLabel": "Refund clarification request avoided a false denial (2.11)",
  "status": "healthy",
  "agentId": "Returns Resolution Copilot"
}

exp_returns_recovery_v19_run_6

Dataset item dataset_returns_exception_week_item_6 · created 2d ago

output captured

Latency

8290ms

Tokens

Unavailable

Cost

$0.0432

{
  "traceId": "trace_refund_clarification_needed_05_014",
  "storyLabel": "Refund clarification request avoided a false denial (5.14)",
  "status": "healthy",
  "agentId": "Returns Resolution Copilot"
}

exp_returns_recovery_v19_run_7

Dataset item dataset_returns_exception_week_item_7 · created 6h ago

output captured

Latency

8290ms

Tokens

Unavailable

Cost

$0.0378

{
  "traceId": "trace_refund_clarification_needed_07_019",
  "storyLabel": "Refund clarification request avoided a false denial (7.19)",
  "status": "healthy",
  "agentId": "Returns Resolution Copilot"
}
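
Before a promotion decision, it can help to tally the run statuses above and spot runs that were produced by a different agent than the one the experiment targets. The sketch below hard-codes the run id, `status`, and `agentId` values from the seven run panels. Treating "Returns Resolution Copilot" as the expected agent is an assumption drawn from the `seededWinningCandidate` field, and the variable names are illustrative.

```python
from collections import Counter

# (run id, status, agentId) copied from the run panels on this page.
runs = [
    ("exp_returns_recovery_v19_run_1", "healthy", "Returns Resolution Copilot"),
    ("exp_returns_recovery_v19_run_2", "degraded", "Returns Resolution Copilot"),
    ("exp_returns_recovery_v19_run_3", "healthy", "Returns Resolution Copilot"),
    ("exp_returns_recovery_v19_run_4", "degraded", "Policy Grounding Reviewer"),
    ("exp_returns_recovery_v19_run_5", "healthy", "Returns Resolution Copilot"),
    ("exp_returns_recovery_v19_run_6", "healthy", "Returns Resolution Copilot"),
    ("exp_returns_recovery_v19_run_7", "healthy", "Returns Resolution Copilot"),
]

# Assumption: the agent named in seededWinningCandidate is the one under review.
expected_agent = "Returns Resolution Copilot"

status_counts = Counter(status for _, status, _ in runs)
degraded_runs = [run_id for run_id, status, _ in runs if status == "degraded"]
off_target_runs = [run_id for run_id, _, agent in runs if agent != expected_agent]

print(dict(status_counts))
print("degraded:", degraded_runs)
print("different agent:", off_target_runs)
```

With the values on this page, five runs are healthy and two are degraded, and one of the degraded runs (run_4) belongs to a different agent, which is worth noting when weighing the regression picture.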
