Langfuse — deep analysis
When a failure signal is detected (e.g. TOOL_LOOP, GOAL_ABANDONMENT), the dashboard shows an "Explain with Langfuse ↗" button. Clicking it fetches the execution trace from Langfuse, runs an LLM analysis against the signal evidence plus trace inputs/outputs, and returns a plain-English root cause with a specific prompt fix you can apply in one click.

Prerequisites
- Langfuse account (cloud or self-hosted) with a project and API keys
- One LLM API key for the analysis call (ANTHROPIC_API_KEY preferred, OPENAI_API_KEY accepted as fallback)
1. Install
```bash
pip install 'dunetrace[langchain,langfuse]'
```
2. Add credentials to .env
```bash
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com  # omit for cloud; set for self-hosted
ANTHROPIC_API_KEY=sk-ant-...
# OPENAI_API_KEY=sk-...                   # accepted as fallback
```
Restart the API container after editing .env:
```bash
docker compose up -d api
```
3. Run both callbacks together
DunetraceCallbackHandler.last_run_id exposes the Dunetrace run ID. Langfuse's last_trace_id gives the corresponding Langfuse trace ID. Pass the Langfuse trace ID to the explain endpoint to join the two systems.
```python
from dunetrace import Dunetrace
from dunetrace.integrations.langchain import DunetraceCallbackHandler
from langfuse import get_client
from langfuse.langchain import CallbackHandler as LangfuseCallbackHandler  # v4+

dt = Dunetrace()
dt_cb = DunetraceCallbackHandler(dt, agent_id="my-agent", model="gpt-4o-mini")
lf_cb = LangfuseCallbackHandler()  # reads LANGFUSE_* from env

# `agent` is your LangChain/LangGraph agent
result = agent.invoke(
    {"messages": [("human", query)]},
    config={"callbacks": [dt_cb, lf_cb]},
)

dt.shutdown(timeout=5)
get_client().flush()  # ensure the Langfuse trace is uploaded

dt_run_id = dt_cb.last_run_id      # e.g. "b5ed23be-e4f0-43bc-..."
lf_trace_id = lf_cb.last_trace_id  # e.g. "b5ed23bee4f043bc..." (same UUID, no dashes)
```
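Since the two IDs are the same UUID in different formats, converting between them is mechanical. A small helper (illustrative, not part of the SDK) using only the standard library:

```python
import uuid

def to_langfuse_trace_id(dt_run_id: str) -> str:
    # The Langfuse trace ID is the same UUID with the dashes removed.
    return uuid.UUID(dt_run_id).hex

print(to_langfuse_trace_id("b5ed23be-e4f0-43bc-8625-914223875508"))
# → b5ed23bee4f043bc8625914223875508
```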
dt_run_id and lf_trace_id represent the same run even though their formats differ.

4. Call the explain endpoint
```http
POST /v1/signals/{signal_id}/explain
Content-Type: application/json
Authorization: Bearer <your-key>

{
  "langfuse_trace_id": "b5ed23bee4f043bc8625914223875508"
}
```
Response includes root_cause, fix_content, fix_type, apply_blocked, and langfuse_prompt_name.
| fix_type | Meaning | apply_blocked |
|---|---|---|
| prompt_addition | One sentence to append to the system prompt | false — apply button shown |
| code_change | Code or infrastructure fix needed (CONTEXT_BLOAT, SLOW_STEP, etc.) | true — apply button hidden |
| no_auto_apply | Security signal (PROMPT_INJECTION_SIGNAL) — never auto-apply | true — blocked at API level |
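The gating logic in the table reduces to a couple of lines; is_apply_blocked is an illustrative helper, not an SDK function:

```python
def is_apply_blocked(fix_type: str) -> bool:
    # Encodes the table above: only prompt_addition fixes can be applied
    # from the dashboard; code_change and no_auto_apply block the button.
    return fix_type != "prompt_addition"
```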
5. Apply a fix to a managed prompt
When langfuse_prompt_name is non-null and apply_blocked is false:
```http
POST /v1/signals/{signal_id}/apply-fix
Content-Type: application/json
Authorization: Bearer <your-key>

{
  "fix_content": "Do not repeat a search query you have already executed in this run.",
  "langfuse_prompt_name": "research-agent-prompt"
}
```
The fix is appended to the current prompt text and published as a new version. The dashboard shows "Applied as v4 in Langfuse ↗" with a link.
6. Track fix effectiveness
```http
GET /v1/signals/{signal_id}/fix-status
Authorization: Bearer <your-key>
```
Returns runs_after_fix, recurrences_after_fix, and a verdict: verified (≥10 runs, 0 recurrences), likely_fixed (≥5 runs, 0 recurrences), still_occurring, or insufficient_data.
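The verdict thresholds above can be sketched as a plain function (an illustration of the documented rules, not the backend's actual code):

```python
def fix_verdict(runs_after_fix: int, recurrences_after_fix: int) -> str:
    # Mirrors the thresholds documented for /fix-status.
    if recurrences_after_fix > 0:
        return "still_occurring"
    if runs_after_fix >= 10:
        return "verified"
    if runs_after_fix >= 5:
        return "likely_fixed"
    return "insufficient_data"
```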
What each tool sees
| Concern | Dunetrace | Langfuse |
|---|---|---|
| Raw prompts & completions | Never — SHA-256 hashed | Yes — full content |
| Structural failures (loops, stalls…) | Automatic, 15 detectors | Manual inspection |
| Proactive alerting | Slack / webhook in <15s | No — passive |
| Prompt fix workflow | Explain + apply in one click | Manual editing |
| Trace timeline | Step graph (hashed) | Full span tree with content |
The Langfuse trace is never stored by Dunetrace — fetched, analysed, discarded. See the LangChain guide for the full runnable example with both callbacks.
OpenTelemetry
```bash
pip install 'dunetrace[otel]' opentelemetry-exporter-otlp-proto-grpc
```
```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from dunetrace import Dunetrace
from dunetrace.integrations.otel import DunetraceOTelExporter

resource = Resource.create({
    "service.name": "my-agent-service",
    "deployment.environment": "production",
})
provider = TracerProvider(resource=resource)
provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter()))

dt = Dunetrace(otel_exporter=DunetraceOTelExporter(provider))
```
Each agent run produces a trace with a deterministic trace_id derived from run_id. Failure signals at run end are written as indexed attributes on the root span (dunetrace.signal.0.failure_type, .severity, .confidence). HIGH / CRITICAL signals set span.status = ERROR.
OpenLLMetry / OTel receiver
OpenLLMetry instruments 40+ AI frameworks and emits standard gen_ai.* OTel spans. Add DunetraceOTelReceiver as a second exporter and get behavioral detection with zero agent-code changes.
```bash
pip install 'dunetrace[otel]'
```

```python
from dunetrace.integrations.otel_receiver import DunetraceOTelReceiver
from traceloop.sdk import Traceloop

# `provider` and `dt` are from the OpenTelemetry example above
DunetraceOTelReceiver.attach(provider, dt, agent_id="my-agent")
Traceloop.init(app_name="my-agent", tracer_provider=provider)
```
Each OTel trace becomes one Dunetrace run. Spans with gen_ai.request.model translate to llm_called / llm_responded. Spans with gen_ai.tool.name become tool_called / tool_responded.
gen_ai.* attribute handling
| Attribute | Handling |
|---|---|
| gen_ai.request.model | Passed as-is (not sensitive) |
| gen_ai.usage.prompt_tokens | Passed as-is |
| gen_ai.usage.completion_tokens | Passed as-is |
| gen_ai.completion.0.finish_reason | Passed as-is |
| gen_ai.tool.name | Passed as-is |
| gen_ai.prompt / gen_ai.completion | SHA-256 hashed at receiver boundary |
| gen_ai.prompt.0.content | SHA-256 hashed at receiver boundary |
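The hashing treatment in the last two rows can be reproduced with the standard library; hash_sensitive is an illustrative name for what the receiver does to content attributes, not a Dunetrace API:

```python
import hashlib

def hash_sensitive(value: str) -> str:
    # Content attributes are replaced with their SHA-256 hex digest
    # before they leave the receiver boundary.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()
```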
FastAPI / ASGI
```python
from fastapi import FastAPI
from dunetrace import Dunetrace, DunetraceASGIMiddleware, get_current_run

dt = Dunetrace()
dt.auto_instrument()

app = FastAPI()
app.add_middleware(
    DunetraceASGIMiddleware,
    dt=dt, agent_id="my-api", model="gpt-4o",
)

@app.post("/chat")
async def chat(query: str):
    run = get_current_run()  # run opened by the middleware
    # `openai_client` is your AsyncOpenAI client
    resp = await openai_client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": query}],
    )
    return resp.choices[0].message.content
```
The run is also available on request.state.dunetrace_run.
Flask / WSGI
```python
from flask import Flask
from dunetrace import Dunetrace, DunetraceWSGIMiddleware

dt = Dunetrace()
dt.auto_instrument()

app = Flask(__name__)
app.wsgi_app = DunetraceWSGIMiddleware(app.wsgi_app, dt=dt, agent_id="my-api")
```
Django
```python
# wsgi.py
from dunetrace import Dunetrace, DunetraceWSGIMiddleware
from django.core.wsgi import get_wsgi_application

dt = Dunetrace()
dt.auto_instrument()

application = DunetraceWSGIMiddleware(get_wsgi_application(), dt=dt, agent_id="django-api")
```
Auto-instrumentation
dt.auto_instrument() patches supported AI framework clients at the class level so every LLM call made inside a dt.run() context (or inside a @dt.agent() function or middleware-wrapped request) is tracked automatically.
Supported frameworks: openai, anthropic, httpx, requests. Uninstalled frameworks are silently skipped. Calling auto_instrument() more than once is safe.
```python
dt.auto_instrument()                         # patch all installed
dt.auto_instrument(["openai", "anthropic"])  # LLM clients only
dt.auto_instrument(["httpx", "requests"])    # HTTP clients only
```
Grafana / Loki
```python
dt = Dunetrace(emit_as_json=True)
```
Writes every event to stdout as a Loki-compatible NDJSON line. Each line includes ts, level, logger, event_type, agent_id, run_id, step_index, payload.
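For example, one such line can be parsed back with the standard library (the field values here are made up for illustration):

```python
import json

# A hypothetical Loki-compatible line as emitted with emit_as_json=True.
line = (
    '{"ts": "2025-01-01T12:00:00.000000000Z", "level": "info", '
    '"logger": "dunetrace", "event_type": "tool_called", '
    '"agent_id": "my-agent", "run_id": "b5ed23be-e4f0-43bc-8625-914223875508", '
    '"step_index": 3, "payload": {"tool_name": "search"}}'
)
event = json.loads(line)
```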
Minimal Promtail pipeline stage:
```yaml
pipeline_stages:
  - json:
      expressions: {ts: ts, event_type: event_type, agent_id: agent_id}
  - timestamp:
      source: ts
      format: RFC3339Nano
  - labels:
      agent_id:
      event_type:
```
get_current_run()
Returns the active RunContext for the current async task or thread, or None. Works inside @dt.agent(), ASGI middleware, WSGI middleware, and direct dt.run().
```python
from dunetrace import get_current_run

def some_helper():
    # `cache` and `key` come from the surrounding application code.
    run = get_current_run()
    if run:
        run.tool_called("cache_lookup")
    result = cache.get(key)
    if run:
        run.tool_responded("cache_lookup", success=result is not None)
    return result
```
Policies
Runtime guardrails evaluated mid-run after every tool_called, llm_responded, and tool_responded event. Policies fire at most once per run (except log policies, which fire every time).
Local policies (no backend required)
```python
from dunetrace import Dunetrace, PolicyViolation

dt = Dunetrace()

# Stop the run if tool calls exceed 5 (the stop action raises PolicyViolation)
dt.add_policy(
    name="cap tool calls",
    condition={"trigger": "tool_call_count", "operator": "gt", "value": 5},
    action={"type": "stop"},
)

# Downgrade the model when estimated cost exceeds $0.50
dt.add_policy(
    name="cost cap",
    condition={"trigger": "cost_usd", "operator": "gt", "value": 0.50},
    action={"type": "switch_model", "params": {"model": "gpt-4o-mini"}},
)

# Inject a corrective prompt when a loop is detected mid-run
dt.add_policy(
    name="loop fix",
    condition={"trigger": "signal", "operator": "eq", "value": "TOOL_LOOP"},
    action={"type": "inject_prompt", "params": {"prompt": "Stop repeating tool calls. Summarise what you know and answer directly."}},
)
```
Remote policies (dashboard-managed)
When api_key and endpoint are set, the SDK fetches policies from the backend at run start and caches them for 60 seconds. Policies defined in the dashboard apply automatically — no code changes needed.
```python
dt = Dunetrace(api_key="dt_live_...", endpoint="https://ingest.dunetrace.com")
# Policies defined in the Policies page are pulled at run start.
```
Local policies (added via add_policy()) take priority over remote ones at the same priority level.
Condition reference
| Trigger | Type | What it measures |
|---|---|---|
| tool_call_count | int | Total tool calls in the run so far |
| step_count | int | Current step index |
| cost_usd | float | Accumulated LLM cost in USD (model-aware pricing) |
| error_count | int | Failed tool calls (success=False) |
| finish_reason | str | Latest LLM finish_reason (e.g. "length", "stop") |
| llm_latency_ms | int | Latest LLM call latency in milliseconds |
| signal | str | Detector signal name — runs the full detector suite lazily (e.g. "TOOL_LOOP") |
Supported operators: gt, gte, lt, lte, eq, neq, contains
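A sketch of how a condition could be evaluated with these operators (illustrative only, not the SDK's implementation):

```python
def check(trigger_value, operator: str, target) -> bool:
    # Maps each documented operator name to its comparison.
    ops = {
        "gt": lambda a, b: a > b,
        "gte": lambda a, b: a >= b,
        "lt": lambda a, b: a < b,
        "lte": lambda a, b: a <= b,
        "eq": lambda a, b: a == b,
        "neq": lambda a, b: a != b,
        "contains": lambda a, b: b in a,
    }
    return ops[operator](trigger_value, target)
```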
Action reference
| Action type | Effect |
|---|---|
| stop | Raises PolicyViolation; run exits with exit_reason="policy_violation" |
| switch_model | Sets run.model_override — read it between steps to switch the model |
| inject_prompt | Appends to run.prompt_additions — pop with run.pop_prompt_addition() and prepend to the next LLM call |
| log | Emits a policy.triggered event; no interruption; fires on every matching event |
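Putting switch_model and inject_prompt together: a hypothetical helper (next_model_and_prompt is not an SDK function) that an agent loop could call between steps, assuming pop_prompt_addition() returns None when nothing is queued:

```python
def next_model_and_prompt(run, default_model: str, user_prompt: str):
    # Honour any policy actions recorded on the run: switch_model sets
    # run.model_override; inject_prompt queues text that
    # run.pop_prompt_addition() returns (assumed None when the queue is empty).
    model = run.model_override or default_model
    addition = run.pop_prompt_addition()
    prompt = f"{addition}\n\n{user_prompt}" if addition else user_prompt
    return model, prompt
```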
Dashboard CRUD
Policies can be created, edited, toggled, and deleted from the Policies page at http://localhost:3000. The backend REST API:
| Endpoint | Description |
|---|---|
| GET /v1/policies | List all policies |
| POST /v1/policies | Create a policy |
| PUT /v1/policies/{id} | Replace a policy |
| DELETE /v1/policies/{id} | Delete a policy |
| PATCH /v1/policies/{id}/toggle | Enable / disable |
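A minimal client sketch for creating a policy over this API. The request-body shape is an assumption (mirroring add_policy()'s name/condition/action fields), and build_create_policy_request is an illustrative helper:

```python
import json
import urllib.request

def build_create_policy_request(base_url: str, api_key: str, policy: dict):
    # Builds (but does not send) a POST /v1/policies request.
    return urllib.request.Request(
        f"{base_url}/v1/policies",
        data=json.dumps(policy).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Send it with `urllib.request.urlopen(req)` against your backend, e.g. http://localhost:8000.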