AgentTel
Agent-Ready Telemetry
AgentTel enriches OpenTelemetry telemetry with the structured context AI agents need to autonomously diagnose, reason about, and resolve production incidents, without human interpretation of dashboards. It works across the full stack: JVM backends (Java, Kotlin, Scala), Go backends, Node.js/TypeScript backends, Python backends (FastAPI, Django, Flask), and browser frontends (TypeScript/JavaScript).
Standard observability answers "What happened?" AgentTel adds "What does an AI agent need to know to act on this?"
Modern observability tools generate massive volumes of telemetry (traces, metrics, logs) optimized for human consumption through dashboards and alert rules. AI agents tasked with autonomous incident response face critical gaps:
- No behavioral context: spans lack baselines, so agents can't distinguish normal from anomalous
- No topology awareness: agents don't know which services are critical, who owns them, or what depends on what
- No decision metadata: is this operation retryable? Is there a fallback? What's the runbook?
- No actionable interface: agents can read telemetry but can't query live system state or execute remediation
AgentTel closes these gaps at the instrumentation layer.
Core principle: telemetry should carry enough context for AI agents to reason and act autonomously.
AgentTel enriches telemetry at three levels, all configurable via YAML with no code changes required:
| Level | Where | What | Example |
|---|---|---|---|
| Topology | OTel Resource (once per service) | Service identity, ownership, dependencies | team, tier, on-call channel |
| Baselines | Span attributes (per operation) | What "normal" looks like | P50/P99 latency, error rate |
| Decisions | Span attributes (per operation) | What an agent is allowed to do | retryable, runbook URL, escalation level |
Topology is set once on the OTel Resource and automatically associated with all telemetry by the SDK. Baselines and decision metadata are attached per-operation on spans. This avoids redundant data on every span while ensuring agents always have the full context.
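To make the split concrete, here is a minimal Python sketch of how a consumer could recombine the two levels into a single view. The function name `agent_view` and the plain-dict representation are assumptions made for illustration, not part of any AgentTel SDK; the attribute keys are the ones listed in this README.

```python
def agent_view(resource_attrs: dict, span_attrs: dict) -> dict:
    """Merge service-level and operation-level context into one dict.

    Span attributes win on key collisions because they are more specific.
    """
    return {**resource_attrs, **span_attrs}


# Topology: emitted once per service on the OTel Resource.
resource = {
    "agenttel.topology.team": "payments-platform",
    "agenttel.topology.tier": "critical",
}

# Baselines and decisions: emitted per operation on each span.
span = {
    "agenttel.baseline.latency_p50_ms": 45.0,
    "agenttel.decision.retryable": True,
}

view = agent_view(resource, span)
```

Because the topology lives on the Resource, it is stored once per service rather than copied onto every span, and the merge happens only when a reader needs the full picture.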
Try AgentTel in one command, which starts a demo payment service with OTel Collector and Jaeger:

```bash
cd examples/spring-boot-example
docker compose -f docker/docker-compose.yml up --build
```

Then open Jaeger to see enriched traces, Swagger UI for the API, and MCP Tool Docs for the agent interface.
Every span is automatically enriched with agent-actionable attributes:
| Category | Attributes | Purpose |
|---|---|---|
| Topology | agenttel.topology.team, tier, domain, dependencies | Service identity and dependency graph |
| Baselines | agenttel.baseline.latency_p50_ms, error_rate, source | What "normal" looks like for each operation |
| Decisions | agenttel.decision.retryable, idempotent, runbook_url, escalation_level | What an agent is allowed to do |
| Anomalies | agenttel.anomaly.detected, pattern, score | Real-time deviation detection |
| SLOs | agenttel.slo.budget_remaining, burn_rate | Error budget consumption tracking |
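As an illustration of how the Baselines and Anomalies rows relate, the sketch below scores each latency sample against a rolling window using a z-score. It is a simplified stand-in, not AgentTel's implementation: the class name `RollingBaseline`, the use of population standard deviation, and scoring a sample before adding it to the window are all assumptions. The defaults mirror the `rolling-window-size`, `rolling-min-samples`, and `z-score-threshold` settings shown in the configuration example in this README.

```python
from collections import deque
from statistics import mean, pstdev


class RollingBaseline:
    """Hypothetical rolling-window z-score detector for one operation."""

    def __init__(self, window_size=1000, min_samples=10, z_threshold=3.0):
        self.samples = deque(maxlen=window_size)
        self.min_samples = min_samples
        self.z_threshold = z_threshold

    def observe(self, latency_ms):
        """Score a sample against the current window, then record it."""
        score, detected = 0.0, False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = pstdev(self.samples)
            if sigma > 0:  # a flat window gives no usable spread
                score = (latency_ms - mu) / sigma
                detected = abs(score) >= self.z_threshold
        self.samples.append(latency_ms)
        return {
            "agenttel.anomaly.detected": detected,
            "agenttel.anomaly.score": score,
        }


baseline = RollingBaseline(window_size=100, min_samples=10)
for ms in [44, 46, 45, 47, 43, 45, 46, 44, 45, 45]:
    baseline.observe(ms)  # warm-up: too few samples to score

normal = baseline.observe(46)   # close to the window mean
spike = baseline.observe(312)   # far outside the window
```

Until `min_samples` observations have been collected, the sketch reports no anomaly, which avoids false positives on a cold start.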
A complete toolkit for AI agent interaction with production systems:
| Component | Description |
|---|---|
| MCP Server | JSON-RPC server implementing the Model Context Protocol; exposes telemetry as tools AI agents can call |
| Health Aggregation | Real-time service health from span data with operation-level and dependency-level metrics |
| Incident Context | Structured incident packages: what's happening, what changed, what's affected, what to do |
| Remediation Framework | Registry of executable remediation actions with approval workflows |
| Action Tracking | Every agent decision and action recorded as OTel spans for full auditability |
| Context Formatters | Prompt-optimized output formats (compact, full, JSON) tuned for LLM context windows |
Browser SDK for agent-ready frontend observability:
| Feature | Description |
|---|---|
| Auto-Instrumentation | Page loads (Navigation Timing API), SPA navigation, fetch/XMLHttpRequest interception, click/submit interactions, JavaScript errors |
| Journey Tracking | Multi-step user funnel tracking with completion rates, abandonment detection, and duration baselines |
| Anomaly Detection | Client-side pattern detection: rage clicks, API failure cascades, slow page loads, error loops, funnel drop-offs |
| Cross-Stack Correlation | W3C Trace Context injection on all outgoing requests; backend trace ID extraction from responses |
| Route Baselines | Per-route configuration of expected page load times, API response times, error rates, and business criticality |
| Decision Metadata | Escalation levels, runbook URLs, retry policies, and fallback pages per route |
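The Cross-Stack Correlation row relies on the W3C Trace Context `traceparent` header, whose version `00` layout is `00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>`. The sketch below builds and parses that header in Python to show what the browser SDK and a backend would exchange. The helper names are hypothetical, and the code handles only version `00`.

```python
import re

# version 00: "00-" + 32 hex trace id + "-" + 16 hex parent id + "-" + 2 hex flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Format a version-00 traceparent header value."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str):
    """Return the header's fields, or None if it is malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero identifiers are invalid according to the specification.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


header = make_traceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
parsed = parse_traceparent(header)
```

A backend that receives this header continues the same trace ID, which is what lets a frontend span and the backend spans it triggered be joined later.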
IDE-integrated MCP server for automated instrumentation setup:
| Tool | Description |
|---|---|
| analyze_codebase | Scans Java/Spring Boot source code: detects endpoints, dependencies, and framework |
| instrument_backend | Generates backend config: Gradle/Maven dependencies, annotations, agenttel.yml |
| instrument_frontend | Generates frontend config: React route detection, criticality inference, SDK initialization |
| validate_instrumentation | Validates agenttel.yml completeness against source code |
| suggest_improvements | Analyzes config and suggests fixes: missing baselines, uncovered endpoints, stale thresholds |
| apply_improvements | Auto-applies low-risk improvements using live health data; flags high-risk items for review |
Full observability for AI/ML workloads on the JVM:
| Framework | Approach | Coverage |
|---|---|---|
| Spring AI | SpanProcessor enrichment of existing Micrometer spans | Framework tag, cost calculation |
| LangChain4j | Decorator-based full instrumentation | Chat, embeddings, RAG retrieval |
| Anthropic SDK | Client wrapper | Messages API with token/cost tracking |
| OpenAI SDK | Client wrapper | Chat completions with token/cost tracking |
| AWS Bedrock | Client wrapper | Converse API with token/cost tracking |
Full lifecycle tracing for AI agents with 70+ semantic attributes:
| Feature | Description |
|---|---|
| Invocation Lifecycle | Goal, status, step count, max steps for each agent execution |
| Reasoning Steps | Thought, action, observation, evaluation, revision tracking |
| Tool Calls | Tool name, success/error/timeout status per call |
| Task Decomposition | Nested task breakdown with depth and parent tracking |
| Orchestration Patterns | Sequential, parallel, evaluator-optimizer, handoff, ReAct, orchestrator-workers |
| Cost Aggregation | Automatic LLM cost rollup from GenAI spans to agent sessions |
| Guardrails | Block, warn, log, escalate actions with named guardrails |
| Human Checkpoints | Approval, feedback, correction gates with wait time tracking |
| Loop Detection | Detects stuck reasoning loops (identical tool calls) |
| Quality Signals | Goal achievement, human interventions, eval scores |
| RAG Pipeline | Retriever and reranker spans with relevance scoring |
| Error Classification | Source (LLM/tool/agent/guardrail/timeout/network), retryability |
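The Loop Detection row describes flagging identical consecutive tool calls. One simple way to do that is sketched below, assuming that "identical" means the same tool name with the same arguments and that the threshold counts consecutive repeats. `SimpleLoopDetector` is a hypothetical name chosen for this example and is not the library's own class.

```python
import json


class SimpleLoopDetector:
    """Flags a stuck agent after `threshold` identical consecutive tool calls."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self._last_call = None
        self._streak = 0

    def record(self, tool_name: str, arguments: dict) -> bool:
        """Record one tool call; return True once a loop is detected."""
        # Serialize with sorted keys so argument order does not matter.
        call = (tool_name, json.dumps(arguments, sort_keys=True))
        if call == self._last_call:
            self._streak += 1
        else:
            self._last_call, self._streak = call, 1
        return self._streak >= self.threshold


detector = SimpleLoopDetector(threshold=3)
results = [
    detector.record("get_service_health", {"service": "payments"})
    for _ in range(3)
]
# A different call breaks the streak.
after_change = detector.record("get_incident_context", {"id": "inc-a3f2b1c4"})
```

Resetting the streak on any different call keeps legitimate retries interleaved with other work from being reported as a loop.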
Three integration styles are available: programmatic, annotation, or YAML config.

Programmatic:

```java
AgentTracer tracer = AgentTracer.create(openTelemetry)
    .agentName("incident-responder")
    .agentType(AgentType.SINGLE)
    .build();

try (AgentInvocation inv = tracer.invoke("Diagnose high latency")) {
    inv.step(StepType.THOUGHT, "Need to check service metrics");
    try (ToolCallScope tool = inv.toolCall("get_service_health")) {
        tool.success();
    }
    inv.complete(true);
}
```

@AgentMethod annotation (Spring Boot):

```java
@AgentMethod(name = "incident-responder", type = "single", maxSteps = 100)
public IncidentReport diagnose(String incidentId) {
    // Automatically wrapped in AgentInvocation; no manual tracer calls
    return analyzeAndRespond(incidentId);
}
```

YAML config (Spring Boot):

```yaml
agenttel:
  agentic:
    agents:
      incident-responder:
        type: single
        max-steps: 100
        loop-threshold: 5
```

AgentTel supports multiple integration paths; pick what fits your stack:
| Path | Best For | Effort |
|---|---|---|
| Spring Boot Starter | Spring Boot applications | Add dependency + YAML config |
| Go SDK | Go services (net/http, Gin, gRPC) | go get + YAML config |
| Node.js SDK | Express / Fastify services | npm install + YAML config |
| Python SDK | FastAPI / Python services | pip install + YAML config |
| JavaAgent Extension | Any JVM app (no code changes) | JVM flag + YAML config |
| Web SDK | Browser/SPA applications | npm install + init call |
| Instrument Agent | IDE-assisted setup | Run MCP server in IDE |
Maven:

```xml
<dependencies>
    <!-- Core: span enrichment, baselines, anomaly detection, SLO tracking -->
    <dependency>
        <groupId>dev.agenttel</groupId>
        <artifactId>agenttel-spring-boot-starter</artifactId>
        <version>0.3.0-alpha</version>
    </dependency>
    <!-- Optional: GenAI instrumentation -->
    <dependency>
        <groupId>dev.agenttel</groupId>
        <artifactId>agenttel-genai</artifactId>
        <version>0.3.0-alpha</version>
    </dependency>
    <!-- Optional: Agent interface layer (MCP server, incident context, remediation) -->
    <dependency>
        <groupId>dev.agenttel</groupId>
        <artifactId>agenttel-agent</artifactId>
        <version>0.3.0-alpha</version>
    </dependency>
</dependencies>
```

Gradle:

```kotlin
// build.gradle.kts
dependencies {
    // Core: span enrichment, baselines, anomaly detection, SLO tracking
    implementation("dev.agenttel:agenttel-spring-boot-starter:0.3.0-alpha")
    // Optional: GenAI instrumentation
    implementation("dev.agenttel:agenttel-genai:0.3.0-alpha")
    // Optional: Agent interface layer (MCP server, incident context, remediation)
    implementation("dev.agenttel:agenttel-agent:0.3.0-alpha")
}
```

All enrichment is driven by YAML configuration; no code changes are needed:
```yaml
# application.yml
agenttel:
  # Topology: set once on the OTel Resource, associated with all telemetry
  topology:
    team: payments-platform
    tier: critical
    domain: commerce
    on-call-channel: "#payments-oncall"
  dependencies:
    - name: postgres
      type: database
      criticality: required
      timeout-ms: 5000
      circuit-breaker: true
    - name: stripe-api
      type: rest_api
      criticality: required
      fallback: "Return cached pricing"
  # Reusable operational profiles: reduce repetition across operations
  profiles:
    critical-write:
      retryable: false
      escalation-level: page_oncall
      safe-to-restart: false
    read-only:
      retryable: true
      idempotent: true
      escalation-level: notify_team
  # Per-operation baselines and decision metadata
  # Use bracket notation [key] for operation names with special characters
  operations:
    "[POST /api/payments]":
      profile: critical-write
      expected-latency-p50: "45ms"
      expected-latency-p99: "200ms"
      expected-error-rate: 0.001
      retryable: true   # overrides profile default
      idempotent: true
      runbook-url: "https://wiki/runbooks/process-payment"
    "[GET /api/payments/{id}]":
      profile: read-only
      expected-latency-p50: "15ms"
      expected-latency-p99: "80ms"
  baselines:
    rolling-window-size: 1000
    rolling-min-samples: 10
  anomaly-detection:
    z-score-threshold: 3.0
```

Annotations are optional; the YAML config above is sufficient. Use @AgentOperation when you want IDE autocomplete and compile-time validation. Reference profiles to avoid repeating values:
```java
@AgentOperation(profile = "critical-write")
@PostMapping("/api/payments")
public ResponseEntity<PaymentResult> processPayment(@RequestBody PaymentRequest req) {
    // Your business logic; spans are enriched automatically
}
```

When both YAML config and annotations define the same operation, YAML config takes priority. Per-operation values override profile defaults.
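The precedence rules can be pictured as a layered merge: profile defaults at the bottom, then annotation values, then YAML per-operation values on top. The Python sketch below illustrates that ordering with plain dictionaries. The function name `resolve_operation` and the relative order of profile and annotation layers are assumptions; only "YAML beats annotations" and "per-operation beats profile" come from the text above.

```python
def resolve_operation(profile: dict, annotation: dict, yaml_op: dict) -> dict:
    """Layer settings from lowest to highest priority."""
    resolved = dict(profile)      # profile defaults (lowest priority)
    resolved.update(annotation)   # values set on @AgentOperation
    resolved.update(yaml_op)      # YAML per-operation values (highest priority)
    return resolved


# Values taken from the configuration example in this README.
critical_write = {
    "retryable": False,
    "escalation-level": "page_oncall",
    "safe-to-restart": False,
}
annotation = {"idempotent": False}                    # hypothetical annotation value
yaml_op = {"retryable": True, "idempotent": True}     # from "[POST /api/payments]"

resolved = resolve_operation(critical_write, annotation, yaml_op)
```

Here `retryable` ends up `True` because the operation overrides the profile, `idempotent` ends up `True` because YAML overrides the annotation, and `escalation-level` is inherited from the profile untouched.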
```java
// Expose telemetry to AI agents via MCP
McpServer mcp = new AgentTelMcpServerBuilder()
    .port(8081)
    .contextProvider(agentContextProvider)
    .remediationExecutor(remediationExecutor)
    .build();
mcp.start();
```

AI agents can now call tools like get_service_health, get_incident_context, list_remediation_actions, and execute_remediation over JSON-RPC.
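For readers unfamiliar with MCP, a tool invocation is a JSON-RPC 2.0 request whose method is `tools/call` and whose params carry the tool name and an arguments object. The sketch below only builds the request body in Python; it does not send it, because the server's transport and URL path are not stated here. The helper name `tool_call_request` is hypothetical, and the arguments are left empty since each tool's input schema is not documented in this README.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call_request(tool: str, arguments: dict) -> str:
    """Build the JSON-RPC 2.0 body for an MCP tools/call request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Tool name from this README; the empty arguments object is a placeholder.
body = tool_call_request("get_service_health", {})
decoded = json.loads(body)
```

An agent would typically list the available tools first to discover each tool's input schema before filling in `arguments`.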
Resource attributes (set once per service, associated with all telemetry):

```
agenttel.topology.team = "payments-platform"
agenttel.topology.tier = "critical"
agenttel.topology.domain = "commerce"
agenttel.topology.
agenttel.topology.dependencies = [{"name":"postgres","type":"database",...}]
```

Span attributes (per operation, only on operations with registered metadata):

```
agenttel.baseline.latency_p50_ms = 45.0
agenttel.baseline.latency_p99_ms = 200.0
agenttel.baseline.error_rate = 0.001
agenttel.baseline.source = "static"
agenttel.decision.retryable = true
agenttel.decision.runbook_url = "https://wiki/runbooks/process-payment"
agenttel.decision.escalation_level = "page_oncall"
agenttel.anomaly.detected = false
agenttel.slo.budget_remaining = 0.85
```
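A value such as `agenttel.slo.budget_remaining = 0.85` can be read with the usual error-budget arithmetic: an availability target of 0.999 allows 0.1% of requests to fail, and the remaining budget is the unused share of that allowance. The Python sketch below uses those conventional definitions; whether AgentTel computes the attributes exactly this way is an assumption, and the function name and request counts are illustrative.

```python
def error_budget(target: float, total: int, failed: int) -> dict:
    """Compute remaining error budget and burn rate for an availability SLO."""
    allowed_failures = (1.0 - target) * total
    if allowed_failures <= 0:
        raise ValueError("need a target below 1.0 and at least one request")
    consumed = failed / allowed_failures  # observed error rate / allowed error rate
    return {
        "agenttel.slo.budget_remaining": max(0.0, 1.0 - consumed),
        "agenttel.slo.burn_rate": consumed,
    }


# 100,000 requests at a 0.999 target allow about 100 failures; 15 have occurred.
slo = error_budget(target=0.999, total=100_000, failed=15)
```

A burn rate above 1.0 means failures are arriving faster than the target allows, so the budget would be exhausted before the window ends.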
When an incident occurs, agents get structured context via MCP:

```
=== INCIDENT inc-a3f2b1c4 ===
SEVERITY: HIGH
SUMMARY: POST /api/payments experiencing elevated error rate (5.2%)

## WHAT IS HAPPENING
Error Rate: 5.2% (baseline: 0.1%)
Latency P50: 312ms (baseline: 45ms)
Patterns: ERROR_RATE_SPIKE

## WHAT CHANGED
Last Deploy: v2.1.0 at 2025-01-15T14:30:00Z

## WHAT IS AFFECTED
Scope: operation_specific
User-Facing: YES
Affected Deps: stripe-api

## SUGGESTED ACTIONS
- [HIGH] rollback_deployment: Rollback to previous version (NEEDS APPROVAL)
- [MEDIUM] enable_circuit_breakers: Circuit break stripe-api
```
Python SDK (FastAPI):

```bash
pip install "agenttel[fastapi]"
# Optional extras (quoted so shells such as zsh do not expand the brackets)
pip install "agenttel[openai]"      # OpenAI instrumentation
pip install "agenttel[anthropic]"   # Anthropic instrumentation
pip install "agenttel[langchain]"   # LangChain instrumentation
pip install "agenttel[all]"         # Everything
```

```yaml
# agenttel.yml
agenttel:
  topology:
    service-name: payment-service
    team: payments-platform
    tier: critical
    domain: commerce
    on-call-channel: "#payments-oncall"
  operations:
    "POST /api/payments":
      expected-latency-p50: 45ms
      expected-latency-p99: 200ms
      retryable: true
      runbook-url: "https://wiki/runbooks/process-payment"
  slo:
    availability:
      target: 0.999
      type: availability
```

```python
from fastapi import FastAPI
from agenttel.fastapi import instrument_fastapi

app = FastAPI()
instrument_fastapi(app)  # One-line integration
```

All spans are now enriched with topology, baselines, anomaly detection, and SLO tracking, using the same attributes as the JVM SDK.
Go SDK:

```bash
go get go.agenttel.dev/agenttel-go@latest
```

```yaml
# agenttel.yml (same format as JVM/Python SDKs)
agenttel:
  topology:
    service-name: payment-service
    team: payments-platform
    tier: critical
  operations:
    "POST /api/payments":
      expected-latency-p50: 45ms
      expected-latency-p99: 200ms
      retryable: true
```

```go
import (
    "log"

    agenttel "go.agenttel.dev/agenttel-go"
    agmw "go.agenttel.dev/agenttel-go/middleware/http"
)

// Fail fast if the config cannot be loaded instead of discarding the error
cfg, err := agenttel.LoadConfigFile("agenttel.yml")
if err != nil {
    log.Fatalf("loading agenttel.yml: %v", err)
}
engine := agenttel.NewEngineBuilder(cfg).Build()

// Wrap your handler with AgentTel middleware
handler := agmw.Middleware(mux,
    agmw.WithBaselineProvider(engine.BaselineProvider()),
    agmw.WithTopology(engine.TopologyRegistry()),
)
```

Also supports Gin (middleware/gin) and gRPC (middleware/grpc) out of the box. See Go Quickstart for details.
Node.js SDK:

```bash
npm install @agenttel/node @opentelemetry/api @opentelemetry/sdk-trace-node
```

```yaml
# agenttel.yml (same format as all other SDKs)
agenttel:
  topology:
    service-name: payment-service
    team: payments-platform
    tier: critical
  operations:
    "POST /api/payments":
      expected-latency-p50: 45ms
      expected-latency-p99: 200ms
      retryable: true
```

```typescript
import { AgentTelEngineBuilder, expressMiddleware, loadConfig } from '@agenttel/node';

const config = loadConfig('agenttel.yml');
const engine = new AgentTelEngineBuilder()
  .withTeam(config.topology?.team ?? 'my-team')
  .withTier(config.topology?.tier ?? 'critical')
  .build();

app.use(expressMiddleware({
  baselineProvider: engine.baselineProvider,
  topology: engine.topologyRegistry,
}));
```

Also supports Fastify (fastifyPlugin). See Node.js Quickstart for details.
Web SDK:

```bash
npm install @agenttel/web
```

```typescript
import { AgentTelWeb } from '@agenttel/web';

AgentTelWeb.init({
  appName: 'checkout-web',
  appVersion: '1.0.0',
  environment: 'production',
  collectorEndpoint: '/otlp',
  routes: {
    '/checkout/:step': {
      businessCriticality: 'revenue',
      baseline: { pageLoadP50Ms: 800, apiCallP50Ms: 300 },
      decision: { escalationLevel: 'page_oncall', runbookUrl: 'https://wiki/runbooks/checkout' },
    },
  },
  journeys: {
    checkout: {
      steps: ['/products', '/cart', '/checkout/shipping', '/checkout/payment', '/confirmation'],
      baseline: { completionRate: 0.65, avgDurationS: 300 },
    },
  },
  anomalyDetection: {
    rageClickThreshold: 3,
    apiFailureCascadeThreshold: 3,
    errorLoopThreshold: 5,
  },
});
```

The SDK automatically instruments page loads, SPA navigation, API calls, clicks, and errors, with cross-stack correlation via W3C Trace Context headers.
Run the instrument agent as an MCP server in your IDE (Cursor, VS Code, etc.):

```bash
pip install agenttel-instrument
agenttel-instrument --config instrument.yml
```

Then ask your IDE agent: "Analyze my codebase and generate AgentTel configuration". It will scan your endpoints, detect dependencies, and generate a complete agenttel.yml.
```
┌─────────────────────────────────────────────────────────────────────┐
│                          Your Application                           │
│  Backend: application.yml + @AgentOperation                         │
│  Frontend: AgentTelWeb.init(config)                                 │
├──────────────────────────┬──────────────────┬───────────────────────┤
│  agenttel-spring-boot-   │ agenttel-        │  agenttel-web         │
│  starter                 │ javaagent-       │  (@agenttel/web)      │
│  Auto-config · BPP · AOP │ extension        │  Browser SDK          │
│  (Spring Boot apps)      │ (any JVM app)    │  (TypeScript/JS)      │
├──────────────┬───────────┴──┬───────────────┴───────────────────────┤
│agenttel-core │agenttel-genai│  agenttel-agent                       │
│              │              │                                       │
│ SpanProcessor│ LangChain4j  │  MCP Server (JSON-RPC)                │
│ Baselines    │ Spring AI    │  Health Aggregation                   │
│ Anomaly      │ Anthropic SDK│  Incident Context + Reporting         │
│  Detection   │ OpenAI SDK   │  Remediation Framework                │
│ SLO Tracking │ Bedrock SDK  │  Trend Analysis · SLO Reports         │
│ Pattern      │ Cost Calc    │  Executive Summaries                  │
│  Matching    │              │  Cross-Stack Context                  │
├──────────────┴──────────────┴───────────────────────────────────────┤
│                            agenttel-api                             │
│  @AgentOperation · AgentTelAttributes · Data Models                 │
├─────────────────────────────────────────────────────────────────────┤
│                          OpenTelemetry SDK                          │
└─────────────────────────────────────────────────────────────────────┘

agenttel-go (Go SDK, full feature parity)
  Core · net/http · Gin · gRPC · GenAI · Agent/MCP · Agentic Observability
agenttel-node (Node.js SDK, full feature parity)
  Core · Express · Fastify · GenAI · Agent/MCP · Agentic Observability
agenttel-python (Python SDK, full feature parity)
  Core · FastAPI · GenAI · Agent/MCP · Agentic Observability
agenttel-instrument (IDE MCP Server, Python)
  Codebase analysis · Config generation · Validation · Auto-improvements
```
| Module | Artifact | Description |
|---|---|---|
| agenttel-api | dev.agenttel:agenttel-api | Annotations, attribute constants, enums, data models. Zero runtime dependencies. |
| agenttel-core | dev.agenttel:agenttel-core | Runtime engine: span enrichment, static + rolling baselines, z-score anomaly detection, pattern matching, SLO tracking, structured events. |
| agenttel-genai | dev.agenttel:agenttel-genai | GenAI instrumentation: LangChain4j wrappers, Spring AI enrichment, Anthropic/OpenAI/Bedrock SDK instrumentation, cost calculation. |
| agenttel-agent | dev.agenttel:agenttel-agent | Agent interface layer: MCP server, health aggregation, incident context, remediation, trend analysis, SLO reports, executive summaries, cross-stack context. |
| agenttel-agentic | dev.agenttel:agenttel-agentic | Agent observability: lifecycle, reasoning, orchestration patterns (ReAct, sequential, parallel, handoff, evaluator-optimizer), cost aggregation, quality signals, guardrails, loop detection. |
| agenttel-javaagent | dev.agenttel:agenttel-javaagent | Zero-code OTel javaagent extension. Drop-in enrichment for any JVM app, with no Spring dependency. |
| agenttel-spring-boot-starter | dev.agenttel:agenttel-spring-boot-starter | Spring Boot auto-configuration. Single dependency for Spring Boot apps. |
| agenttel-web | @agenttel/web (npm) | Browser telemetry SDK: auto-instrumentation of page loads, navigation, API calls, errors, Web Vitals, journey tracking, anomaly detection, W3C trace propagation. |
| agenttel-go | go.agenttel.dev/agenttel-go (Go module) | Go SDK with full feature parity. Core enrichment, net/http + Gin + gRPC middleware, GenAI instrumentation, MCP agent interface, agentic observability. |
| agenttel-node | @agenttel/node (npm) | Node.js SDK with full feature parity. Core enrichment, Express + Fastify middleware, GenAI instrumentation, MCP agent interface, agentic observability. |
| agenttel-types | @agenttel/types (npm) | Shared TypeScript types, enums, and attribute constants for @agenttel/web and @agenttel/node. |
| agenttel-python | agenttel (pip) | Python SDK with full feature parity with the JVM SDK. Core enrichment, FastAPI integration, GenAI instrumentation, MCP agent interface, agentic observability. |
| agenttel-instrument | agenttel-instrument (pip) | IDE MCP server: codebase analysis, config generation, validation, improvement suggestions, and auto-apply for both backend and frontend instrumentation. |
| agenttel-testing | dev.agenttel:agenttel-testing | Test utilities for verifying span enrichment. |
Full documentation site: agenttel.dev
| Document | Description |
|---|---|
| Project Overview | Vision, motivation, and design philosophy |
| Semantic Conventions | Complete attribute and event schema reference |
| Architecture | Technical architecture, data flow, and extension points |
| Agent Layer | MCP server, incident context, remediation, and agent interaction |
| GenAI Instrumentation | LLM framework instrumentation and cost tracking |
| API Reference | Annotations, programmatic API, and configuration reference |
| Roadmap | Implementation phases and release plan |
| Design Considerations | Trade-offs, evolution path, and future direction |
| API Documentation | Swagger UI, MCP tool docs, and aggregated Javadoc |
Working examples to get you started quickly:
| Example | Description | Run Command |
|---|---|---|
| Spring Boot Example | Payment service with span enrichment, topology, baselines, anomaly detection, and MCP server | ./gradlew :examples:spring-boot-example:bootRun |
| LangChain4j Example | GenAI tracing with LangChain4j: chat spans, token tracking, and cost calculation | ./gradlew :examples:langchain4j-example:run |
| Go Service Example | Go payment service with net/http middleware, topology, baselines, anomaly detection | cd examples/go-service-example && go run . |
| Express Example | Node.js payment service with Express middleware, topology, baselines, anomaly detection | cd examples/express-example && npm run dev |
| React Checkout Example | React SPA with frontend telemetry: journey tracking, anomaly detection, cross-stack correlation | cd agenttel-web/examples/react-checkout && npm start |
Each example includes a README with step-by-step instructions and curl commands to exercise the instrumentation.
Release History
| Version | Changes | Urgency | Date |
|---|---|---|---|
| v0.2.0-alpha | What's new in v0.2.0-alpha: Agentic Observability (`agenttel-agentic`), a new module for tracing AI agent invocations with full lifecycle support. **`AgentTracer`**: trace agent invocations with steps, tool calls, tasks, and handoffs. **`AgentInvocation`**: auto-closing scope with step counting, loop detection, and cost aggregation. **`AgentCostAggregator`**: SpanProcessor that rolls up GenAI token costs to parent agent spans. **`LoopDetector`**: configurable detection of repetiti | Low | 3/10/2026 |

