Open source · Apache 2.0

Your AI agents fail.
Silently.

Traditional monitoring can't see inside agent runs. Dunetrace watches every run for structural failure patterns and fires a Slack alert within 15 seconds of completion.

15
Detectors
<15s
Alert latency
0
Raw content sent
<500µs
Agent overhead

Your logs say OK.
Your agent isn't.

API returns 200. No exceptions. But the agent called the same tool 12 times, burned through your token budget, and delivered a wrong answer, or no answer at all.

LangSmith and Langfuse answer "what happened?" after you already know something broke. Dunetrace answers a different question: is something breaking right now?

✗  Tool called 12× · API returns 200
✗  $12 in tokens burned · logs show no errors
✗  User got wrong answer · no alert fired
research-agent · run_f9c2 · ● live
step 01  web_search  "climate 2026"  ✓  820tk
step 02  web_search  "climate 2026"  ⟳  820tk
step 03  web_search  "climate 2026"  ⟳  820tk
step 04  web_search  "climate 2026"  ⟳  820tk
step 05  web_search  "climate 2026"  ⟳  820tk
step 06  web_search  "climate 2026"  ⟳  820tk
⚡ TOOL_LOOP detected HIGH
web_search called 6× with identical args
steps 2–6 · 4,100 tokens wasted
→ fix: add seen-queries deduplication
Slack #agent-alerts · delivered in 11s · First occurrence

Zero-config. Zero latency impact.

01

Instrument once

Two lines of Python. The SDK patches OpenAI, Anthropic, httpx, and requests globally. No code changes per agent.

02

Detect automatically

15 structural detectors run on every completed run. Events are SHA-256 hashed. No raw content ever leaves your process.

03

Alert before users do

Slack or webhook. Plain-English explanation. Suggested fix. Rate context: first occurrence or systemic issue.

Catches what logs miss

Every detector runs automatically on every completed run. All thresholds configurable via detectors.yml — no code changes.

CRIT · Prompt Injection Signal · input matches known injection / jailbreak patterns before hashing
HIGH · Tool Loop · same tool called ≥3× in any 5-tool-call window
HIGH · Tool Thrashing · agent alternates between exactly two tools repeatedly
HIGH · Retry Storm · same tool fails ≥3 times consecutively
HIGH · LLM Truncation Loop · finish_reason=length fires ≥2 times; context is overflowing
HIGH · Empty LLM Response · model returned zero-length output with finish_reason=stop
HIGH · Cascading Tool Failure · ≥3 consecutive failures across ≥2 distinct tools
MED · Reasoning Stall · LLM-to-tool-call ratio ≥4×; agent is thinking without acting
MED · Goal Abandonment · tool use stops, then ≥4 consecutive LLM calls with no exit
MED · Context Bloat · prompt tokens grow 3× from first to last LLM call
MED · RAG Empty Retrieval · retrieval returned 0 results but the agent answered anyway
MED · Slow Step · tool call >15s or LLM call >30s
MED · Tool Avoidance · final answer given without calling available tools
MED · Step Count Inflation · run used >2× the P75 step count for this agent
MED · First Step Failure · error or empty output at step ≤2; failing at the gate
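A rule like Tool Loop (same tool ≥3× in any 5-tool-call window) reduces to a sliding-window count over the run's tool-call sequence. A minimal sketch of that idea, not Dunetrace's implementation:

```python
from collections import Counter

WINDOW = 5      # tool calls per window
THRESHOLD = 3   # identical calls that trigger the signal

def tool_loop(calls: list[str]) -> bool:
    """True if any tool name appears THRESHOLD or more times
    within any WINDOW consecutive tool calls."""
    for start in range(max(len(calls) - WINDOW + 1, 1)):
        window = Counter(calls[start:start + WINDOW])
        if any(count >= THRESHOLD for count in window.values()):
            return True
    return False
```

The windowing matters: three `web_search` calls spread across a 40-step run are normal; three inside five consecutive tool calls are a loop.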

All detectors run automatically. Shadow mode lets you validate custom detectors against real traffic before they page anyone.
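Threshold overrides in detectors.yml might look like the following. The keys shown here are hypothetical, chosen to mirror the detector descriptions above; check the Dunetrace docs for the real schema.

```yaml
# detectors.yml — hypothetical threshold overrides, not the real schema
tool_loop:
  window: 5          # tool calls per window
  min_repeats: 3     # identical calls that trigger the signal
slow_step:
  tool_timeout_s: 15
  llm_timeout_s: 30
step_count_inflation:
  shadow: true       # shadow mode: evaluate on real traffic, alert no one
```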

Every signal. Explained.

The live view auto-refreshes every 15s. Plain-English explanations. Suggested fixes for every failure.

localhost:3000 — Overview
Dunetrace overview dashboard with risk trend and failure patterns
localhost:3000 — Analytics
Dunetrace analytics showing failure trends and patterns over time
localhost:3000 — Agent Detail
Dunetrace agent detail with run timeline and signal analysis

Running in under 5 minutes

  • 1
    Start the backend
    Clone the repo and spin up with Docker Compose. PostgreSQL included. No external dependencies.
  • 2
    Install the SDK
    Pure Python. Supports any framework. First-class LangChain and LangGraph support.
  • 3
    Add two lines
    Decorate your agent function. The SDK patches OpenAI, Anthropic, httpx and requests globally. Automatic.
  • 4
    Open the dashboard
    Navigate to localhost:3000. Detectors start firing immediately. Configure Slack for real-time alerts.
terminal
# 1. Start the backend
git clone https://github.com/dunetrace/dunetrace
cd dunetrace && cp .env.example .env
docker compose build && docker compose up -d

# 2. Install the SDK
pip install dunetrace
# or for LangChain:
pip install 'dunetrace[langchain]'
agent.py
from dunetrace import Dunetrace

dt = Dunetrace()
dt.init(agent_id="my-agent")

@dt.agent()
def run_agent(query: str) -> str:
    ...  # LLM + HTTP calls tracked automatically

You shouldn't have to dig through logs to find out something broke.

Traditional monitoring never tells you when an agent run goes wrong. You find out when a user complains, then spend hours hunting through logs. Dunetrace fires a Slack alert within 15 seconds of a completed run, with a plain-English explanation and a suggested fix already attached.

✓  Slack Block Kit alerts
✓  Generic webhook (PagerDuty, Linear, custom)
✓  Weekly digest with 7-day aggregates
✓  At-least-once delivery with retry
Dunetrace
#agent-alerts
⚑ TOOL_LOOP detected
Agent: research-agent
Severity: HIGH
What: web_search called 8× in last 5-tool window
Why: Agent is stuck in a retrieval loop, likely failing to synthesize results
Fix: Add a tool-call ceiling or result-dedup step
Rate context: Recurring · seen in 3 of last 10 runs (30%)
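At-least-once delivery with retry can be approximated with exponential backoff. A standard-library sketch, not Dunetrace's delivery code; the URL is a placeholder for your webhook endpoint:

```python
import json
import time
import urllib.request

def deliver(url: str, payload: dict, attempts: int = 4) -> bool:
    """POST an alert payload, retrying with exponential backoff until
    a 2xx response or attempts run out (at-least-once semantics)."""
    body = json.dumps(payload).encode("utf-8")
    for attempt in range(attempts):
        try:
            req = urllib.request.Request(
                url,
                data=body,
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req, timeout=5) as resp:
                if 200 <= resp.status < 300:
                    return True
        except OSError:
            pass                      # connection/HTTP error: fall through to retry
        time.sleep(2 ** attempt)      # back off: 1s, 2s, 4s, ...
    return False
```

At-least-once means a flaky network can deliver the same alert twice; receivers like PagerDuty deduplicate on their side.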

No raw content. Ever.

Every prompt, tool argument, and model output is SHA-256 hashed before transmission. Dunetrace detects structural patterns, not content. Your data never leaves your infrastructure.

prompt: "What are the current interest rates?" →  a3f8c1d9e2b47f5a0c6d8e1b3f2a9c47d8e1b3c5a2f9d6e0b4c7a1f3e8d2b5c9
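The hashing step is ordinary SHA-256 over the UTF-8 text; a minimal equivalent of what happens before any event leaves your process:

```python
import hashlib

def hash_event(text: str) -> str:
    """Digest event content before transmission; only the 64-char
    hex digest is sent, never the raw text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Identical inputs produce identical digests, which is what lets detectors spot repeated prompts and tool arguments without ever seeing the content.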

Works with your stack

Drop-in support for every major Python agent framework and observability tool.

LangChain
LangGraph
OpenAI
Anthropic
FastAPI
Flask
OpenTelemetry
Grafana / Loki
Datadog
Honeycomb
Tempo
PagerDuty
Langfuse
TypeScript / Node

Already using Langfuse? Connect your existing traces.

Dunetrace works alongside Langfuse, not instead of it. Langfuse tells you what happened in a run. Dunetrace tells you what went structurally wrong and fires an alert before you ever open a trace. Use both: get the alert from Dunetrace, then drill into the full trace in Langfuse for root cause analysis.

Read the integration docs ↗

Stop finding out from users.

Instrument your agent in 5 minutes. Get your first alert before your next user does.