Foxhound - AI Agent Observability

Release decision framing

Target prompt

support-reply

Prompt family this experiment is most directly trying to influence.

Baseline prompt

v11

Version treated as the stable baseline in this experiment config.

Candidate prompt

v12

Version under evaluation for a possible release decision.

Decision posture

Await more evidence

Keep gathering experiment output before making a release-state change.

Experiment config

{
  "targetPromptId": "prompt_support_reply",
  "targetPromptName": "support-reply",
  "baselinePromptVersion": 11,
  "candidatePromptVersion": 12,
  "evaluationFocus": "premium escalation sensitivity",
  "seededSummary": "Still tuning how aggressively chargeback-like billing cases should be escalated for premium accounts."
}

Promotion review actions

Review candidate vs baseline prompt

Open the exact prompt diff between v11 and v12 before deciding whether to label or promote a version.

Re-check regression posture

Confirm the experiment outcome is consistent with the current regression picture before changing release state.

Move into release controls

Use prompt labels and version history to convert experiment evidence into an explicit environment or production decision.

Open comparison workspace

Start from the comparison surface so you can add one or more peer experiments and review side-by-side evidence in a dedicated workspace.

Attached experiment runs

exp_premium_escalation_tuning_run_1

Dataset item dataset_premium_billing_guardrails_item_1 · created 2d ago

output captured

Latency

8700ms

Tokens

Unavailable

Cost

$0.0364

{
  "traceId": "trace_premium_chargeback_v18_missed_escalation",
  "storyLabel": "Premium billing dispute answered directly instead of escalating",
  "status": "degraded",
  "agentId": "Premium Escalation Triage"
}

exp_premium_escalation_tuning_run_2

Dataset item dataset_premium_billing_guardrails_item_2 · created 7h ago

output captured

Latency

9100ms

Tokens

Unavailable

Cost

$0.0391

{
  "traceId": "trace_premium_chargeback_v19_restored_escalation",
  "storyLabel": "Premium billing recovery restored the escalation handoff",
  "status": "healthy",
  "agentId": "Premium Escalation Triage"
}

exp_premium_escalation_tuning_run_3

Dataset item dataset_premium_billing_guardrails_item_3 · created 3d ago

output captured

Latency

7200ms

Tokens

Unavailable

Cost

$0.0412

{
  "traceId": "trace_duplicate_charge_review_04_021",
  "storyLabel": "Duplicate-charge review resolved billing friction (4.21)",
  "status": "healthy",
  "agentId": "Billing Operations Guard"
}

exp_premium_escalation_tuning_run_4

Dataset item dataset_premium_billing_guardrails_item_4 · created 4d ago

output captured

Latency

7020ms

Tokens

Unavailable

Cost

$0.0373

{
  "traceId": "trace_premium_contract_escalation_03_045",
  "storyLabel": "Enterprise contract question escalated correctly (3.45)",
  "status": "healthy",
  "agentId": "Premium Escalation Triage"
}

premium-escalation-tuning

premium-escalation-tuning