
Dashboard

Mission control at localhost:3000. Nine pages, auto-refreshing every fifteen seconds, powered entirely client-side by the Customer API.

Pages

| Page | What it shows |
| --- | --- |
| Overview | Four stat cards (Critical / High / Signals / Runs) with trend deltas (1h / 24h / 7d). Risk Trend 24-hour bar chart. Token Waste Drift panel — 24h sparkline of wasted tokens vs 7-day baseline, dashed baseline line, WARNING ZONE badge when 24h exceeds baseline by >20%. Failure Posture gauge — half-circle SVG with needle at avg confidence, daily signals, avg confidence, false positive rate. Top Failure Drivers, Agent Signal Drift, live run feed. |
| All Runs | Full run table. Click any row to open run detail. |
| Alerts | Signals grouped by failure type with per-run confidence and token estimates. Shadow signals rendered below with dashed border + SHADOW badge. |
| Analytics | Estimated token cost saved this week (configurable $/1k). Cross-agent totals, top failure patterns, per-agent breakdown. |
| Risk Heatmap | Failure type × agent intensity grid. |
| Agents | Per-agent health cards — failure rate %, dominant pattern, run / critical / high counts, last seen, ungraduated shadow signal count. Each card shows an Agent Health Score badge (0–100, colour-coded green/amber/red) powered by GET /v1/agents/{id}/health-score, and links to a Health Record panel with 30-day per-failure-type rates, sparkline, and a SYSTEMIC badge. Clicking any failure type opens the Why is this happening? deep-dive panel. |
| Compare Runs | Side-by-side comparison. Select any two runs — metrics, signals, and max confidence shown in both panels with a colour-coded delta table (new / resolved failure types highlighted). |
| Detectors | Threshold sliders and alert-level selector. Live review panel: "with current config, N of M past runs would be flagged HIGH or above" — recomputes on every change. |
| Policies | Create, edit, toggle, and delete runtime guardrails. Each row shows trigger, operator, threshold, action type, and enabled state. One-click example templates for "cap tool calls", "cost cap", and "loop fix". Policies saved here are fetched automatically by the SDK within 60 seconds. |
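
The Overview page's WARNING ZONE rule reduces to a single comparison. A minimal sketch in Python; the function and parameter names are illustrative, not part of the SDK:

```python
def warning_zone(waste_24h_tokens: float, baseline_7d_tokens: float) -> bool:
    """WARNING ZONE when the 24h wasted-token total exceeds the
    7-day baseline by more than 20% (hypothetical helper, not SDK API)."""
    if baseline_7d_tokens <= 0:
        return False  # no baseline yet: never warn
    return waste_24h_tokens > 1.2 * baseline_7d_tokens

print(warning_zone(1300, 1000))  # 30% over baseline → True
print(warning_zone(1100, 1000))  # only 10% over → False
```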

Why is this happening?

Clicking any failure type in the Signal Breakdown or Systemic Patterns sidebar opens a cross-run deep-dive panel. Click the same item again or ✕ to dismiss.

| Section | What it answers |
| --- | --- |
| Overview | Affected runs / total runs, rate, avg confidence, severity breakdown, first and last seen |
| Fires at step | P25 / P50 / P75 / avg step index — answers "does this happen early or late in runs?" |
| Evidence patterns | Aggregated detector evidence: loop counts, token growth, RAG top scores, stall steps |
| Co-occurs with | Other failure types that fire in the same runs, ranked by co-occurrence rate |
| 14-day trend | Daily sparkline of affected_runs / rate — is this getting worse, better, or stable? |
| Highest confidence runs | Five runs with highest confidence for this failure type — each row opens run detail |
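
The Fires at step row aggregates the step index at which a failure type fired across affected runs. A sketch of that percentile maths with Python's standard library — illustrative only; the real values come from the failure-patterns endpoint:

```python
import statistics

def step_percentiles(step_indices: list[int]) -> dict[str, float]:
    """P25 / P50 / P75 / avg of the step index at which a failure fired."""
    # quantiles(n=4) returns the three quartile cut points
    p25, p50, p75 = statistics.quantiles(step_indices, n=4)
    return {"p25": p25, "p50": p50, "p75": p75,
            "avg": statistics.fmean(step_indices)}

print(step_percentiles([2, 3, 5, 8, 13, 21, 34]))
```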

Powered by GET /v1/agents/{agent_id}/failure-patterns/{failure_type}.

Run detail

Click any run row to open the detail panel. Three tabs:

  • Analysis — step timeline, signal score cards with confidence bars, plain-English explanation + suggested fix. When Langfuse credentials are configured, an Explain with Langfuse ↗ button calls POST /v1/signals/{id}/explain for a root-cause explanation and optional prompt fix.
  • Run graph — SVG node graph: green = LLM, orange = tool (ok), red = looping tool call, blue = start/end. Loop clusters highlighted with a dashed red outline.
  • Event log — every event in order, expandable to full payload. Content fields shown as SHA-256 hashes.
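
The Event log's content hashing can be reproduced with the standard library. The field names below are hypothetical, not the actual event schema:

```python
import hashlib

def redact_content(event: dict) -> dict:
    """Replace content-bearing fields with their SHA-256 hex digest,
    mirroring how the Event log tab displays payloads (sketch only)."""
    CONTENT_FIELDS = {"prompt", "completion", "tool_output"}  # hypothetical names
    return {
        k: hashlib.sha256(v.encode()).hexdigest() if k in CONTENT_FIELDS else v
        for k, v in event.items()
    }

print(redact_content({"step": 3, "prompt": "find flights to Lisbon"}))
```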

Stat card info buttons

| Card | Threshold |
| --- | --- |
| Critical | conf ≥ 0.85, or prompt injection / cascading failure regardless of confidence |
| High | conf ≥ 0.70 — tool loops, retry storms, context bloat |
| Signals | All four levels: CRITICAL ≥ 0.85 · HIGH ≥ 0.70 · MEDIUM ≥ 0.50 · LOW < 0.50 |
| Total runs | Processed runs counted within one 5s detector poll |
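
Taken together, the thresholds above amount to a small classifier. A sketch, assuming hypothetical failure-type identifiers (prompt_injection, cascading_failure):

```python
def severity(conf: float, failure_type: str) -> str:
    """Map a signal's confidence to the dashboard's four levels.
    Prompt injection and cascading failure are CRITICAL at any confidence."""
    if failure_type in ("prompt_injection", "cascading_failure"):
        return "CRITICAL"
    if conf >= 0.85:
        return "CRITICAL"
    if conf >= 0.70:
        return "HIGH"
    if conf >= 0.50:
        return "MEDIUM"
    return "LOW"

print(severity(0.72, "tool_loop"))         # → HIGH
print(severity(0.40, "prompt_injection"))  # → CRITICAL, overrides confidence
```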

Token waste estimates

Token waste across the dashboard is computed client-side from run step_count using a fixed estimate of 250 tokens per step. Dollar costs use a configurable rate — default $0.010/1k tokens, editable on the Analytics page. These are approximations.
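
A sketch of that arithmetic (the function name is illustrative, not the dashboard's actual code):

```python
TOKENS_PER_STEP = 250          # fixed client-side estimate
DEFAULT_RATE_PER_1K = 0.010    # $/1k tokens, editable on the Analytics page

def estimated_waste_usd(step_count: int,
                        rate_per_1k: float = DEFAULT_RATE_PER_1K) -> float:
    """Approximate dollar cost of a run's wasted tokens."""
    tokens = step_count * TOKENS_PER_STEP
    return tokens / 1000 * rate_per_1k

print(estimated_waste_usd(40))  # 40 steps → 10,000 tokens → $0.10
```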

Data sources

| Page | API calls |
| --- | --- |
| Overview, Alerts, Analytics, Heatmap, Agents | GET /v1/agents + per-agent /runs + /signals?include_shadow=true |
| All Runs, Compare Runs | Same cached data, no extra calls |
| Run detail | GET /v1/runs/{id} (events + signals) |
| Agent view (health record + runs) | GET /v1/agents/{id}/runs + /signals + /insights + /health-score |
| Why is this happening? panel | GET /v1/agents/{agent_id}/failure-patterns/{failure_type} |
| Detectors | Static — edits require updating detectors.yml and restarting the detector service |
| Policies | GET /v1/policies + POST + PUT /{id} + DELETE /{id} + PATCH /{id}/toggle |
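
The "same cached data, no extra calls" rows describe a memoised fetch. A sketch of the pattern, not the dashboard's actual code; fetch_json is a stand-in for its HTTP client:

```python
from functools import lru_cache

CALLS = []  # records which paths actually hit the network

def fetch_json(path: str) -> dict:
    """Stand-in for the dashboard's HTTP client (hypothetical)."""
    CALLS.append(path)
    return {"path": path}

@lru_cache(maxsize=None)
def cached_get(path: str) -> dict:
    # Overview populates the cache; All Runs and Compare Runs reuse it.
    return fetch_json(path)

cached_get("/v1/agents")  # network call
cached_get("/v1/agents")  # served from cache, no extra call
print(len(CALLS))         # → 1
```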