Foxhound - AI Agent Observability

Release decision framing

Target prompt

support-reply

Prompt family this experiment is most directly trying to influence.

Baseline prompt

v6

Version treated as the stable baseline in this experiment config.

Candidate prompt

v7

Version under evaluation for a possible release decision.

Decision posture

Review for promotion

Use runs, prompt history, and regressions together before setting or changing labels.

Experiment config

{
  "targetPromptId": "prompt_support_reply",
  "targetPromptName": "support-reply",
  "baselinePromptVersion": 6,
  "candidatePromptVersion": 7,
  "evaluationFocus": "shipping fallback resilience",
  "seededWinningCandidate": "shipping-delay-triage v7 with fallback hardening",
  "seededSummary": "The revised fallback sequence cuts the worst timeout impact without forcing every shipping case onto the expensive baseline model."
}
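A minimal sketch of how this config could be loaded and sanity-checked before a review. The `ExperimentConfig` class, `parse_config`, and `review_notes` are hypothetical helpers for illustration, not part of Foxhound; only the JSON keys and values come from the config above.

```python
import json
from dataclasses import dataclass

# Copied verbatim from the experiment config shown above.
RAW_CONFIG = """
{
  "targetPromptId": "prompt_support_reply",
  "targetPromptName": "support-reply",
  "baselinePromptVersion": 6,
  "candidatePromptVersion": 7,
  "evaluationFocus": "shipping fallback resilience",
  "seededWinningCandidate": "shipping-delay-triage v7 with fallback hardening",
  "seededSummary": "The revised fallback sequence cuts the worst timeout impact without forcing every shipping case onto the expensive baseline model."
}
"""


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical typed view over the config keys used in a review."""
    target_prompt_id: str
    target_prompt_name: str
    baseline_version: int
    candidate_version: int
    evaluation_focus: str
    seeded_winning_candidate: str


def parse_config(raw: str) -> ExperimentConfig:
    """Parse the JSON and map camelCase keys onto the dataclass fields."""
    data = json.loads(raw)
    return ExperimentConfig(
        target_prompt_id=data["targetPromptId"],
        target_prompt_name=data["targetPromptName"],
        baseline_version=int(data["baselinePromptVersion"]),
        candidate_version=int(data["candidatePromptVersion"]),
        evaluation_focus=data["evaluationFocus"],
        seeded_winning_candidate=data["seededWinningCandidate"],
    )


def review_notes(cfg: ExperimentConfig) -> list[str]:
    """Return human-readable notes a reviewer may want to look at."""
    notes = []
    if cfg.candidate_version <= cfg.baseline_version:
        notes.append("candidate version is not newer than the baseline")
    if cfg.target_prompt_name not in cfg.seeded_winning_candidate:
        notes.append(
            "seededWinningCandidate does not mention the target prompt name"
        )
    return notes


config = parse_config(RAW_CONFIG)
notes = review_notes(config)
```

For this config the version check passes (v7 is newer than v6), and the second check produces a note because `seededWinningCandidate` names `shipping-delay-triage` while `targetPromptName` is `support-reply`. Whether that difference is intentional seed data is not stated on this page.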

Promotion review actions

Review candidate vs baseline prompt

Open the exact prompt diff between v6 and v7 before deciding whether to label or promote a version.

Re-check regression posture

Confirm the experiment outcome is consistent with the current regression picture before changing release state.

Move into release controls

Use prompt labels and version history to convert experiment evidence into an explicit environment or production decision.

Open comparison workspace

Start from the comparison surface so you can add one or more peer experiments and review side-by-side evidence in a dedicated workspace.
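The review actions above can be read as a checklist that gates any label change. The sketch below is one way to model that, assuming a hypothetical `PromotionReview` record; the field names are illustrative and do not correspond to a Foxhound API.

```python
from dataclasses import dataclass


@dataclass
class PromotionReview:
    """Hypothetical checklist for one baseline/candidate pair."""
    baseline_version: int
    candidate_version: int
    diff_reviewed: bool = False           # prompt diff between versions opened
    regressions_consistent: bool = False  # outcome matches regression picture
    runs_reviewed: bool = False           # attached experiment runs inspected

    def blockers(self) -> list[str]:
        """List the checks that are still open."""
        open_items = []
        if not self.diff_reviewed:
            open_items.append("review the candidate vs baseline prompt diff")
        if not self.regressions_consistent:
            open_items.append("re-check regression posture")
        if not self.runs_reviewed:
            open_items.append("review attached experiment runs")
        return open_items

    def ready_for_label_change(self) -> bool:
        """Allow a label or release change only when nothing is open."""
        return not self.blockers()


review = PromotionReview(baseline_version=6, candidate_version=7)
initial_blockers = review.blockers()

# Mark each check as done once the reviewer has completed it.
review.diff_reviewed = True
review.regressions_consistent = True
review.runs_reviewed = True
ready = review.ready_for_label_change()
```

Keeping the checks as explicit booleans makes it easy to show which step is still outstanding rather than returning a single pass/fail.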

Attached experiment runs

exp_shipping_fallback_hardening_run_1

Dataset item dataset_shipping_reliability_window_item_1 · created 1d ago

output captured

Latency

14600ms

Tokens

Unavailable

Cost

$0.0629

{
  "traceId": "trace_shipping_kb_timeout_failed",
  "storyLabel": "Shipping resolution degraded during logistics lookup timeout",
  "status": "degraded",
  "agentId": "Shipping Delay Resolution"
}

exp_shipping_fallback_hardening_run_2

Dataset item dataset_shipping_reliability_window_item_2 · created 4h ago

output captured

Latency

11200ms

Tokens

Unavailable

Cost

$0.0573

{
  "traceId": "trace_shipping_kb_timeout_recovered",
  "storyLabel": "Fallback shipping path recovered after timeout cluster",
  "status": "healthy",
  "agentId": "Shipping Delay Resolution"
}

exp_shipping_fallback_hardening_run_3

Dataset item dataset_shipping_reliability_window_item_3 · created 6d ago

output captured

Latency

6400ms

Tokens

Unavailable

Cost

$0.0179

{
  "traceId": "trace_shipping_status_refresh_01_001",
  "storyLabel": "Shipping status explanation stayed grounded (1.1)",
  "status": "healthy",
  "agentId": "Shipping Delay Resolution"
}

exp_shipping_fallback_hardening_run_4

Dataset item dataset_shipping_reliability_window_item_4 · created 2d ago

output captured

Latency

11600ms

Tokens

Unavailable

Cost

$0.0279

{
  "traceId": "trace_shipping_delay_warning_05_071",
  "storyLabel": "Shipping status path stayed usable but slow (5.71)",
  "status": "healthy",
  "agentId": "Shipping Delay Resolution"
}

exp_shipping_fallback_hardening_run_5

Dataset item dataset_shipping_reliability_window_item_5 · created 3h ago

output captured

Latency

7320ms

Tokens

Unavailable

Cost

$0.0234

{
  "traceId": "trace_shipping_status_refresh_07_010",
  "storyLabel": "Shipping status explanation stayed grounded (7.10)",
  "status": "healthy",
  "agentId": "Shipping Delay Resolution"
}
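To read the five attached runs together rather than card by card, the figures above can be aggregated. This is a small sketch using only the latency, cost, and status values listed on this page; the `RUNS` structure and `summarize` function are assumptions made for illustration. Token counts are omitted because the page reports them as unavailable.

```python
from collections import Counter
from decimal import Decimal
from statistics import mean, median

# Values copied from the five attached experiment runs above.
RUNS = [
    {"run": "exp_shipping_fallback_hardening_run_1", "latency_ms": 14600,
     "cost_usd": Decimal("0.0629"), "status": "degraded"},
    {"run": "exp_shipping_fallback_hardening_run_2", "latency_ms": 11200,
     "cost_usd": Decimal("0.0573"), "status": "healthy"},
    {"run": "exp_shipping_fallback_hardening_run_3", "latency_ms": 6400,
     "cost_usd": Decimal("0.0179"), "status": "healthy"},
    {"run": "exp_shipping_fallback_hardening_run_4", "latency_ms": 11600,
     "cost_usd": Decimal("0.0279"), "status": "healthy"},
    {"run": "exp_shipping_fallback_hardening_run_5", "latency_ms": 7320,
     "cost_usd": Decimal("0.0234"), "status": "healthy"},
]


def summarize(runs):
    """Aggregate latency, cost, and status across a list of run records."""
    latencies = [r["latency_ms"] for r in runs]
    return {
        "count": len(runs),
        "mean_latency_ms": mean(latencies),
        "median_latency_ms": median(latencies),
        "max_latency_ms": max(latencies),
        # Decimal keeps the cost sum exact to four decimal places.
        "total_cost_usd": sum((r["cost_usd"] for r in runs), Decimal("0")),
        "status_counts": dict(Counter(r["status"] for r in runs)),
    }


summary = summarize(RUNS)
```

For these five runs the mean latency is 10224 ms, the median is 11200 ms, the slowest run is 14600 ms, the total cost is $0.1894, and four runs are healthy against one degraded. The page does not state which prompt version produced each run, so this summary alone does not compare v6 against v7.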

shipping-fallback-hardening
