Foxhound - AI Agent Observability

Release decision framing

Target prompt

support-reply

Prompt family this experiment is most directly trying to influence.

Baseline prompt

v3

Version treated as the stable baseline in this experiment config.

Candidate prompt

v4

Version under evaluation for a possible release decision.

Decision posture

Review for promotion

Use runs, prompt history, and regressions together before setting or changing labels.

Experiment config

{
  "targetPromptId": "prompt_support_reply",
  "targetPromptName": "support-reply",
  "baselinePromptVersion": 3,
  "candidatePromptVersion": 4,
  "evaluationFocus": "operator summary specificity",
  "seededWinningCandidate": "weekly-support-brief v4",
  "seededSummary": "Operator summaries are now specific enough to support investigation handoff and executive review without requiring a replay first."
}

Promotion review actions

Review candidate vs baseline prompt

Open the exact prompt diff between v3 and v4 before deciding whether to label or promote a version.

Re-check regression posture

Confirm the experiment outcome is consistent with the current regression picture before changing release state.

Move into release controls

Use prompt labels and version history to convert experiment evidence into an explicit environment or production decision.

Open comparison workspace

Start from the comparison surface so you can add one or more peer experiments and review side-by-side evidence in a dedicated workspace.

Attached experiment runs

exp_operator_summary_grounding_run_1

Dataset item dataset_support_exec_weekly_summary_item_1 · created 3d ago

output captured

Latency

9800ms

Tokens

Unavailable

Cost

$0.0419

{
  "traceId": "trace_returns_exception_v18_regression",
  "storyLabel": "Compressed refund rollout denied a valid exception path",
  "status": "degraded",
  "agentId": "Returns Resolution Copilot"
}

exp_operator_summary_grounding_run_2

Dataset item dataset_support_exec_weekly_summary_item_2 · created 1d ago

output captured

Latency

14600ms

Tokens

Unavailable

Cost

$0.0629

{
  "traceId": "trace_shipping_kb_timeout_failed",
  "storyLabel": "Shipping resolution degraded during logistics lookup timeout",
  "status": "degraded",
  "agentId": "Shipping Delay Resolution"
}

exp_operator_summary_grounding_run_3

Dataset item dataset_support_exec_weekly_summary_item_3 · created 3h ago

output captured

Latency

9220ms

Tokens

Unavailable

Cost

$0.0421

{
  "traceId": "trace_fraud_pattern_review_07_060",
  "storyLabel": "Fraud-watch run flagged suspicious refund behavior (7.60)",
  "status": "healthy",
  "agentId": "Fraud Watch Investigator"
}

operator-summary-grounding

operator-summary-grounding