## Pages
Live at localhost:3000. Auto-refreshes every 15 seconds.
| Page | What it shows |
|---|---|
| Overview | Four stat cards (Critical / High / Signals / Runs) with trend deltas (1h / 24h / 7d). Risk Trend 24-hour bar chart. Token Waste Drift panel — 24h sparkline of wasted tokens vs 7-day baseline, dashed baseline line, WARNING ZONE badge when 24h exceeds baseline by >20%. Failure Posture gauge — half-circle SVG with needle at avg confidence, daily signals, avg confidence, false positive rate. Top Failure Drivers, Agent Signal Drift, live run feed. |
| All Runs | Full run table. Click any row to open run detail. |
| Alerts | Signals grouped by failure type with per-run confidence and token estimates. Shadow signals rendered below with dashed border + SHADOW badge. |
| Analytics | Estimated token cost saved this week (configurable $/1k). Cross-agent totals, top failure patterns, per-agent breakdown. |
| Risk Heatmap | Failure type × agent intensity grid. |
| Agents | Per-agent health cards — failure rate %, dominant pattern, run / critical / high counts, last seen, ungraduated shadow signal count. Each card shows an Agent Health Score badge (0–100, colour-coded green/amber/red) powered by GET /v1/agents/{id}/health-score. Each card links to a Health Record panel with 30-day per-failure-type rates, sparkline, and a SYSTEMIC badge. Clicking any failure type opens the Why is this happening? deep-dive panel. |
| Compare Runs | Side-by-side comparison. Select any two runs — metrics, signals, and max confidence shown in both panels with a colour-coded delta table (new / resolved failure types highlighted). |
| Detectors | Threshold sliders and alert-level selector. Live review panel: "with current config, N of M past runs would be flagged HIGH or above" — recomputes on every change. |
| Policies | Create, edit, toggle, and delete runtime guardrails. Each row shows trigger, operator, threshold, action type, and enabled state. One-click example templates for "cap tool calls", "cost cap", and "loop fix". Policies saved here are fetched automatically by the SDK within 60 seconds. |
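The Token Waste Drift panel's WARNING ZONE badge on the Overview page follows a simple rule: the badge appears when the 24-hour total exceeds the 7-day baseline by more than 20%. A minimal sketch of that check (the dashboard's exact comparison logic is assumed, not confirmed):

```typescript
// WARNING ZONE check for the Token Waste Drift panel.
// Assumption: "exceeds the baseline by >20%" means strictly greater
// than 1.2× the 7-day baseline value.
function isWarningZone(last24hTokens: number, baseline7d: number): boolean {
  return last24hTokens > baseline7d * 1.2;
}
```

At exactly 120% of baseline the badge stays off; only a strict excess beyond 20% triggers it under this reading.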
## Why is this happening?
Clicking any failure type in the Signal Breakdown or Systemic Patterns sidebar opens a cross-run deep-dive panel. Click the same item again or ✕ to dismiss.
| Section | What it answers |
|---|---|
| Overview | Affected runs / total runs, rate, avg confidence, severity breakdown, first and last seen |
| Fires at step | P25 / P50 / P75 / avg step index — answers "does this happen early or late in runs?" |
| Evidence patterns | Aggregated detector evidence: loop counts, token growth, RAG top scores, stall steps |
| Co-occurs with | Other failure types that fire in the same runs, ranked by co-occurrence rate |
| 14-day trend | Daily sparkline of affected_runs / rate — is this getting worse, better, or stable? |
| Highest confidence runs | Five runs with highest confidence for this failure type — each row opens run detail |
Powered by GET /v1/agents/{agent_id}/failure-patterns/{failure_type}.
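The "Fires at step" row reports P25 / P50 / P75 / avg over the step indices where the failure type fired. A sketch of that computation (the percentile convention here is an assumption; the API may use a different interpolation):

```typescript
// P25/P50/P75/avg over the step indices at which a failure type fired.
// Uses a simple index-based percentile; other conventions interpolate.
function stepStats(stepIndices: number[]) {
  const s = [...stepIndices].sort((a, b) => a - b);
  const pct = (p: number) =>
    s[Math.min(s.length - 1, Math.floor((p / 100) * s.length))];
  const avg = s.reduce((a, b) => a + b, 0) / s.length;
  return { p25: pct(25), p50: pct(50), p75: pct(75), avg };
}
```

A low P75 suggests the failure fires early in runs; a high P25 suggests it only appears late.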
## Run detail
Click any run row to open the detail panel. Three tabs:
- Analysis — step timeline, signal score cards with confidence bars, plain-English explanation + suggested fix. When Langfuse credentials are configured, an Explain with Langfuse ↗ button calls POST /v1/signals/{id}/explain for a root-cause explanation and optional prompt fix.
- Run graph — SVG node graph: green = LLM, orange = tool (ok), red = looping tool call, blue = start/end. Loop clusters highlighted with a dashed red outline.
- Event log — every event in order, expandable to full payload. Content fields shown as SHA-256 hashes.
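The Event log replaces content fields with SHA-256 hashes. One way this redaction could look, sketched in Node (the field names `content` and `prompt` are illustrative, not the actual event schema):

```typescript
import { createHash } from "node:crypto";

// Replace string content fields with their SHA-256 hex digests before
// display. Field names here are hypothetical examples.
function redactContent(
  event: Record<string, unknown>,
  contentKeys: string[] = ["content", "prompt"],
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...event };
  for (const k of contentKeys) {
    if (typeof out[k] === "string") {
      out[k] = createHash("sha256").update(out[k] as string).digest("hex");
    }
  }
  return out;
}
```

Hashing rather than truncating keeps equal payloads comparable across events without exposing their text.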
## Stat card info buttons
| Card | Threshold |
|---|---|
| Critical | conf ≥ 0.85, or prompt injection / cascading failure regardless of confidence |
| High | conf ≥ 0.70 — tool loops, retry storms, context bloat |
| Signals | All four levels: CRITICAL ≥ 0.85 · HIGH ≥ 0.70 · MEDIUM ≥ 0.50 · LOW < 0.50 |
| Total runs | Processed runs counted within one 5s detector poll |
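The cutoffs in the table above can be expressed as one mapping. A sketch (the failure-type identifiers `prompt_injection` and `cascading_failure` are assumed names, not confirmed API values):

```typescript
type Level = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

// Severity mapping implied by the stat-card table: prompt injection and
// cascading failure are CRITICAL regardless of confidence; everything
// else buckets by confidence.
function severity(conf: number, failureType?: string): Level {
  if (failureType === "prompt_injection" || failureType === "cascading_failure") {
    return "CRITICAL";
  }
  if (conf >= 0.85) return "CRITICAL";
  if (conf >= 0.7) return "HIGH";
  if (conf >= 0.5) return "MEDIUM";
  return "LOW";
}
```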
## Token waste estimates
Token waste across the dashboard is computed client-side from run step_count using a fixed estimate of 250 tokens per step. Dollar costs use a configurable rate — default $0.010/1k tokens, editable on the Analytics page. These are approximations.
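The estimate above reduces to two constants and a multiplication:

```typescript
const TOKENS_PER_STEP = 250; // fixed client-side estimate
const DEFAULT_RATE_PER_1K = 0.01; // $/1k tokens, editable on the Analytics page

// Token and dollar waste estimated from a run's step_count.
function estimateWaste(stepCount: number, ratePer1k = DEFAULT_RATE_PER_1K) {
  const tokens = stepCount * TOKENS_PER_STEP;
  return { tokens, dollars: (tokens / 1000) * ratePer1k };
}
```

So a 40-step run is estimated at 10,000 wasted tokens, or about $0.10 at the default rate.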
## Data sources
| Page | API calls |
|---|---|
| Overview, Alerts, Analytics, Heatmap, Agents | GET /v1/agents + per-agent /runs + /signals?include_shadow=true |
| All Runs, Compare Runs | Same cached data, no extra calls |
| Run detail | GET /v1/runs/{id} (events + signals) |
| Agent view (health record + runs) | GET /v1/agents/{id}/runs + /signals + /insights + /health-score |
| Why is this happening? panel | GET /v1/agents/{agent_id}/failure-patterns/{failure_type} |
| Detectors | Static — edits require updating detectors.yml and restarting the detector service |
| Policies | GET /v1/policies + POST + PUT /{id} + DELETE /{id} + PATCH /{id}/toggle |
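The Policies row lists five HTTP operations. They can be summarised as request descriptors, sketched below (the endpoint paths come from the table; payload shapes are not specified here, so `body` is left opaque):

```typescript
interface Req {
  method: "GET" | "POST" | "PUT" | "DELETE" | "PATCH";
  url: string;
  body?: unknown;
}

// Request descriptors for the Policies CRUD surface listed above.
const policies = {
  list: (): Req => ({ method: "GET", url: "/v1/policies" }),
  create: (body: unknown): Req => ({ method: "POST", url: "/v1/policies", body }),
  update: (id: string, body: unknown): Req => ({ method: "PUT", url: `/v1/policies/${id}`, body }),
  remove: (id: string): Req => ({ method: "DELETE", url: `/v1/policies/${id}` }),
  toggle: (id: string): Req => ({ method: "PATCH", url: `/v1/policies/${id}/toggle` }),
};
```

Since the SDK polls for saved policies within 60 seconds, a toggle issued here takes effect without restarting agents.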