Source Code

The runnable TypeScript source for this lesson is in lessons/19-observability/

Lesson 19: Observability & Tracing

Monitoring, debugging, and understanding agent behavior in production

What You'll Learn

Distributed Tracing: Track operations through spans
Metrics Collection: Token/cost/latency tracking
Structured Logging: Correlated logs for debugging
Export Formats: Console, JSON, JSONL, OTLP
Cost Attribution: Per-task cost tracking

Why This Matters

Without observability, debugging agents is guesswork:

Without Observability:
  Agent run fails... but why?

  ? Which LLM call took so long?
  ? Which tool call failed?
  ? How much did this run cost?
  ? What was the sequence of operations?

With Observability:
  "LLM call at 2:34:15 took 5.2s (timeout was 5s)"
  "Tool 'search' returned 0 results"
  "Total cost: $0.0023, 450 input + 120 output tokens"

Key Concepts

Traces and Spans

Trace (full operation)
└── Span: agent.run (500ms)
    ├── Span: agent.plan (50ms)
    ├── Span: llm.call (150ms)
    │   └── attributes: model=claude-3-sonnet, tokens=200
    ├── Span: tool.search (80ms)
    │   └── attributes: results=5
    └── Span: llm.call (120ms)
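
The parent/child structure in the tree above can be modelled with very little code. The sketch below is not the lesson's `tracer.ts`; `MiniTracer` and `SketchSpan` are hypothetical names, and the tracer simply keeps a stack of open spans so that each new span records the span above it as its parent.

```typescript
// Hypothetical minimal tracer for illustration; not the lesson's tracer.ts API.
interface SketchSpan {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  name: string;
  startTime: number; // ms since epoch
  endTime?: number;  // set when the span ends
  status: 'ok' | 'error';
  attributes: Record<string, string | number | boolean>;
}

class MiniTracer {
  readonly finished: SketchSpan[] = [];
  private stack: SketchSpan[] = [];
  private nextId = 1;

  startSpan(name: string): SketchSpan {
    // The innermost open span (if any) becomes the parent.
    const parent: SketchSpan | undefined = this.stack[this.stack.length - 1];
    const span: SketchSpan = {
      traceId: parent ? parent.traceId : `trace-${this.nextId++}`,
      spanId: `span-${this.nextId++}`,
      parentSpanId: parent ? parent.spanId : undefined,
      name,
      startTime: Date.now(),
      status: 'ok',
      attributes: {},
    };
    this.stack.push(span);
    return span;
  }

  endSpan(span: SketchSpan, status: 'ok' | 'error' = 'ok'): void {
    span.status = status;
    span.endTime = Date.now();
    this.stack.pop(); // assumes spans end in last-in, first-out order
    this.finished.push(span);
  }

  // Convenience wrapper: the span ends whether the callback resolves or throws.
  async withSpan<T>(name: string, fn: (span: SketchSpan) => Promise<T>): Promise<T> {
    const span = this.startSpan(name);
    try {
      const result = await fn(span);
      this.endSpan(span, 'ok');
      return result;
    } catch (err) {
      this.endSpan(span, 'error');
      throw err;
    }
  }
}
```

A plain stack only works when spans are awaited one after another. Spans that run concurrently (for example under `Promise.all`) would need real context propagation, such as Node's `AsyncLocalStorage`.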

Agent Metrics

interface AgentMetrics {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  estimatedCost: number;
  toolCalls: number;
  llmCalls: number;
  duration: number;
  errors: number;
}

Log Correlation

Logs include trace/span IDs for correlation:

2024-01-15T10:30:00Z INFO  [abc12345]: Making LLM call {model: "claude-3"}
2024-01-15T10:30:01Z WARN  [abc12345]: Rate limit approaching {remaining: 10}
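
As a rough illustration of how such lines might be produced, the sketch below formats a log entry with a shortened trace ID in brackets. `formatLogLine` is a hypothetical helper written for this example; the lesson's `logger.ts` may format its output differently.

```typescript
type LogLevel = 'debug' | 'info' | 'warn' | 'error';

// Hypothetical formatter for illustration; not the lesson's logger.ts API.
function formatLogLine(
  timestamp: string,
  level: LogLevel,
  traceId: string,
  message: string,
  fields: Record<string, unknown> = {},
): string {
  const pairs = Object.entries(fields)
    .map(([key, value]) => `${key}: ${JSON.stringify(value)}`)
    .join(', ');
  const suffix = pairs ? ` {${pairs}}` : '';
  // Pad the level to five characters so the trace IDs line up in a column.
  const paddedLevel = level.toUpperCase().padEnd(5);
  // Only the first 8 characters of the trace ID are shown to keep lines short.
  return `${timestamp} ${paddedLevel} [${traceId.slice(0, 8)}]: ${message}${suffix}`;
}

console.log(
  formatLogLine('2024-01-15T10:30:00Z', 'info', 'abc12345', 'Making LLM call', {
    model: 'claude-3',
  }),
);
// prints: 2024-01-15T10:30:00Z INFO  [abc12345]: Making LLM call {model: "claude-3"}
```

Because every line carries the same trace ID, filtering logs by `[abc12345]` recovers everything that happened within one agent run.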

Files in This Lesson

File	Purpose
`types.ts`	Span, Metric, Log type definitions
`tracer.ts`	OpenTelemetry-style tracing
`metrics.ts`	Token/cost/latency tracking
`logger.ts`	Structured logging
`exporter.ts`	Output formats (console, JSON, JSONL, OTLP)
`main.ts`	Demonstration of all concepts

Running This Lesson

npm run lesson:19

Code Examples

Basic Tracing

import { createTracer } from './tracer.js';

const tracer = createTracer('my-agent');

// Wrap operations with spans
const result = await tracer.withSpan('agent.run', async (span) => {
  tracer.setAttribute(span, 'goal', 'Find tutorials');

  // Nested span for LLM call
  await tracer.withSpan('llm.call', async (llmSpan) => {
    tracer.setAttribute(llmSpan, 'model', 'claude-3-sonnet');
    tracer.setAttribute(llmSpan, 'tokens', 150);
    // ... make LLM call
  }, { kind: 'client' });

  return 'success';
});

Metrics Collection

import { createMetricsCollector } from './metrics.js';

const metrics = createMetricsCollector({ service: 'my-agent' });

// Record LLM call
metrics.recordLLMCall(
  'claude-3-sonnet',
  150,  // input tokens
  50,   // output tokens
  120,  // duration ms
  false // cached
);

// Record tool call
metrics.recordToolCall('search', 80, true);

// Get summary
const summary = metrics.getAgentMetrics();
console.log(`Total cost: $${summary.estimatedCost.toFixed(4)}`);

Structured Logging

import { createLogger } from './logger.js';

const logger = createLogger('agent', { minLevel: 'info' });

logger.info('Starting task', { goal: 'Find tutorials' });
logger.debug('Configuration loaded', { model: 'claude-3' }); // below minLevel 'info', so expected to be dropped
logger.warn('Rate limit approaching', { remaining: 10 });
logger.error('Task failed', new Error('Timeout'), { retries: 3 });

Exporting Data

import { ConsoleExporter, JSONLExporter, OTLPExporter } from './exporter.js';

// Console (human-readable). Not named `console`, which would shadow the global.
const consoleExporter = new ConsoleExporter();

// JSONL (streaming)
const jsonlExporter = new JSONLExporter('/path/to/traces.jsonl');

// OTLP (observability platforms)
const otlpExporter = new OTLPExporter('http://collector:4318');

// Export spans with whichever exporter you chose
const exporter = jsonlExporter;
await exporter.exportSpans(tracer.getAllTraces()[0].spans);
await exporter.flush();

Cost Tracking

Model pricing (per 1K tokens):

Model	Input	Output
GPT-4	$0.030	$0.060
GPT-4-Turbo	$0.010	$0.030
Claude-3-Opus	$0.015	$0.075
Claude-3-Sonnet	$0.003	$0.015
Claude-3-Haiku	$0.00025	$0.00125

const cost = metrics.calculateCost('claude-3-sonnet', 1000, 500);
// $0.003 + $0.0075 = $0.0105
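
The arithmetic behind that figure is input tokens and output tokens each divided by 1,000 and multiplied by the matching rate. The sketch below reproduces it with the prices from the table; `estimateCost` and `PRICING` are hypothetical names for this example, and the lesson's `metrics.calculateCost` may be implemented differently.

```typescript
// Prices in USD per 1K tokens, copied from the table above.
const PRICING: Record<string, { input: number; output: number }> = {
  'gpt-4': { input: 0.03, output: 0.06 },
  'gpt-4-turbo': { input: 0.01, output: 0.03 },
  'claude-3-opus': { input: 0.015, output: 0.075 },
  'claude-3-sonnet': { input: 0.003, output: 0.015 },
  'claude-3-haiku': { input: 0.00025, output: 0.00125 },
};

// Hypothetical standalone helper; not the lesson's metrics.calculateCost.
function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICING[model];
  if (!price) {
    // Failing loudly avoids silently reporting a cost of $0 for an unknown model.
    throw new Error(`No pricing configured for model: ${model}`);
  }
  return (inputTokens / 1000) * price.input + (outputTokens / 1000) * price.output;
}

console.log(estimateCost('claude-3-sonnet', 1000, 500).toFixed(4)); // prints "0.0105"
```

Throwing on an unknown model is a design choice for this sketch; a production collector might instead fall back to a default rate and log a warning.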

Alerting Patterns

// `alert` is a placeholder for your notification hook (pager, chat, email)

// Latency alert
metrics.on((event) => {
  if (event.type === 'metric.recorded' &&
      event.metric.name === 'agent.llm.duration' &&
      event.metric.value > 5000) {
    alert('LLM latency > 5s');
  }
});

// Error rate alert (guard against dividing by zero before the first LLM call)
const snapshot = metrics.getAgentMetrics();
const errorRate = snapshot.llmCalls > 0 ? snapshot.errors / snapshot.llmCalls : 0;
if (errorRate > 0.05) {
  alert('Error rate > 5%');
}

// Cost alert
if (snapshot.estimatedCost > 10) {
  alert('Daily cost exceeded $10');
}
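
Threshold checks like these are easier to test when they live in a pure function that returns messages instead of firing alerts directly. The following is a minimal sketch under that assumption; `evaluateAlerts`, `MetricsSnapshot`, and `AlertThresholds` are hypothetical names, not part of the lesson's `metrics.ts`.

```typescript
// Hypothetical subset of AgentMetrics needed for alerting.
interface MetricsSnapshot {
  errors: number;
  llmCalls: number;
  estimatedCost: number; // USD
}

interface AlertThresholds {
  maxErrorRate: number; // fraction, e.g. 0.05 for 5%
  maxCost: number;      // USD
}

// Returns one message per breached threshold; an empty array means all clear.
function evaluateAlerts(metrics: MetricsSnapshot, thresholds: AlertThresholds): string[] {
  const alerts: string[] = [];

  // With zero LLM calls there is no meaningful error rate, so treat it as 0.
  const errorRate = metrics.llmCalls > 0 ? metrics.errors / metrics.llmCalls : 0;
  if (errorRate > thresholds.maxErrorRate) {
    alerts.push(`Error rate ${(errorRate * 100).toFixed(1)}% is above the limit`);
  }

  if (metrics.estimatedCost > thresholds.maxCost) {
    alerts.push(`Cost $${metrics.estimatedCost.toFixed(2)} is above the limit`);
  }

  return alerts;
}

const messages = evaluateAlerts(
  { errors: 1, llmCalls: 10, estimatedCost: 12 },
  { maxErrorRate: 0.05, maxCost: 10 },
);
messages.forEach((message) => console.log(message));
```

The caller decides what to do with the returned messages, which keeps the notification channel separate from the threshold logic.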

Export Formats

Format	Use Case
console	Development, quick debugging
json	Analysis, archival
jsonl	Streaming, log aggregation
otlp	Observability platforms (Jaeger, Datadog)
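
The difference between the json and jsonl rows comes down to how records are delimited. The sketch below shows both serializations for a hypothetical `ExportableSpan` shape; the real span type lives in the lesson's `types.ts` and may carry different fields.

```typescript
// Hypothetical span shape for illustration only.
interface ExportableSpan {
  traceId: string;
  spanId: string;
  name: string;
  durationMs: number;
  attributes: Record<string, string | number | boolean>;
}

// JSON: one pretty-printed array, convenient for archival and offline analysis.
function toJSON(spans: ExportableSpan[]): string {
  return JSON.stringify(spans, null, 2);
}

// JSONL: one compact object per line, so a consumer can process each span
// as soon as its line arrives without parsing the whole file.
function toJSONL(spans: ExportableSpan[]): string {
  return spans.map((span) => JSON.stringify(span) + '\n').join('');
}

const sampleSpans: ExportableSpan[] = [
  { traceId: 't1', spanId: 's1', name: 'agent.run', durationMs: 500, attributes: {} },
  { traceId: 't1', spanId: 's2', name: 'llm.call', durationMs: 150, attributes: { tokens: 200 } },
];
console.log(toJSONL(sampleSpans));
```

JSONL files can also be appended to safely, since adding a record never requires rewriting a closing bracket.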

Best Practices

Span Naming

Use dot notation: agent.run, llm.call, tool.search
Include operation type and subject
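
One way to keep names consistent is to build and validate them in one place. The sketch below assumes a convention of lowercase segments joined by dots, with at least two segments; the regular expression and both helper names are hypothetical, not something the lesson's tracer enforces.

```typescript
// Assumed convention: lowercase segments (letters, digits, underscores)
// separated by dots, with at least two segments, e.g. "tool.search".
const SPAN_NAME_PATTERN = /^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+$/;

function isValidSpanName(name: string): boolean {
  return SPAN_NAME_PATTERN.test(name);
}

// Joins segments with dots and rejects anything that breaks the convention.
function buildSpanName(parts: string[]): string {
  const name = parts.join('.');
  if (!isValidSpanName(name)) {
    throw new Error(`Invalid span name: ${name}`);
  }
  return name;
}

console.log(buildSpanName(['tool', 'search'])); // prints "tool.search"
console.log(isValidSpanName('Agent Run'));      // prints false
```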

Attribute Selection

Don't log sensitive data (API keys, PII)
Include enough context for debugging
Use consistent attribute names
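
A simple guard against leaking secrets is to redact attributes by key name before they are attached to a span. The sketch below is one possible approach; the key pattern and the `redactAttributes` helper are assumptions for illustration, and a key-based filter will not catch sensitive values stored under innocuous names.

```typescript
type AttributeValue = string | number | boolean;

// Assumed list of sensitive key fragments; extend it for your own data.
// "token" alone is deliberately absent so that counts like `tokens` survive.
const SENSITIVE_KEY = /(api[_-]?key|secret|password|authorization|access[_-]?token|email)/i;

// Returns a copy with sensitive values replaced; the input is not mutated.
function redactAttributes(
  attributes: Record<string, AttributeValue>,
): Record<string, AttributeValue> {
  const safe: Record<string, AttributeValue> = {};
  for (const [key, value] of Object.entries(attributes)) {
    safe[key] = SENSITIVE_KEY.test(key) ? '[REDACTED]' : value;
  }
  return safe;
}

console.log(redactAttributes({ model: 'claude-3-sonnet', tokens: 200, apiKey: 'sk-example' }));
// prints: { model: 'claude-3-sonnet', tokens: 200, apiKey: '[REDACTED]' }
```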

Sampling

Sample traces in production (e.g., 10%)
Always record errors
Full sampling in development
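
These three rules fit in one small decision function. The sketch below hashes the trace ID so that every span of a trace gets the same keep-or-drop decision; `shouldSample` is a hypothetical helper, and the FNV-1a-style hash is just one convenient choice, not something the lesson prescribes.

```typescript
// Decide whether to keep a trace.
//   rate: fraction of traces to keep, e.g. 0.1 in production, 1 in development.
function shouldSample(traceId: string, hadError: boolean, rate: number): boolean {
  if (hadError) return true;   // always record errors
  if (rate >= 1) return true;  // full sampling
  if (rate <= 0) return false;

  // FNV-1a-style 32-bit hash over the ID's characters. Hashing (instead of
  // Math.random) makes the decision stable for a given trace ID.
  let hash = 0x811c9dc5;
  for (let i = 0; i < traceId.length; i++) {
    hash ^= traceId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }

  // Map the hash into [0, 1) and compare against the sampling rate.
  return hash / 0x100000000 < rate;
}

const keep = shouldSample('abc12345', false, 0.1);
console.log(keep === shouldSample('abc12345', false, 0.1)); // prints true
```

Deciding per trace rather than per span avoids exporting traces with missing children.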

Cost Management

Set daily/monthly budgets
Alert before hitting limits
Track cost per task type
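
A small ledger is enough to cover all three points: it accumulates cost per task type, compares the total with a budget, and reports a warning before the limit is reached. `CostLedger` and its 80% warning ratio are assumptions made for this sketch, not part of the lesson's source.

```typescript
type BudgetStatus = 'ok' | 'warning' | 'exceeded';

// Hypothetical in-memory ledger; a real system would persist and reset it per period.
class CostLedger {
  private readonly totals = new Map<string, number>();
  private readonly budgetUsd: number;
  private readonly warnRatio: number;

  constructor(budgetUsd: number, warnRatio = 0.8) {
    this.budgetUsd = budgetUsd;
    this.warnRatio = warnRatio;
  }

  record(taskType: string, costUsd: number): void {
    this.totals.set(taskType, (this.totals.get(taskType) ?? 0) + costUsd);
  }

  total(): number {
    let sum = 0;
    this.totals.forEach((cost) => { sum += cost; });
    return sum;
  }

  byTaskType(): Record<string, number> {
    const result: Record<string, number> = {};
    this.totals.forEach((cost, taskType) => { result[taskType] = cost; });
    return result;
  }

  // 'warning' fires once spending reaches warnRatio of the budget.
  status(): BudgetStatus {
    const spent = this.total();
    if (spent >= this.budgetUsd) return 'exceeded';
    if (spent >= this.budgetUsd * this.warnRatio) return 'warning';
    return 'ok';
  }
}

const ledger = new CostLedger(10); // $10 budget
ledger.record('search', 3);
ledger.record('codegen', 5.5);
console.log(ledger.status()); // prints "warning" (8.5 of 10 spent)
```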

Advanced: Execution Economics

The production agent implements an ExecutionEconomicsManager that replaces a single hard-coded iteration limit with token, cost, and time budgets plus progress detection.

The Problem with Iteration Limits

Traditional approach:
  maxIterations = 100  // Arbitrary!

Problems:
- Simple tasks hit the limit unnecessarily
- Complex tasks get cut off prematurely
- No correlation to actual cost or time
- Can't distinguish a productive run from a stuck one

Multi-Dimensional Budgets

interface ExecutionBudget {
  // Hard limits (force stop)
  maxTokens: number;           // e.g., 200000
  maxCost: number;             // e.g., $0.50 USD
  maxDuration: number;         // e.g., 300000ms (5 min)

  // Soft limits (warn/prompt for extension)
  softTokenLimit: number;      // e.g., 150000
  softCostLimit: number;       // e.g., $0.30 USD
  softDurationLimit: number;   // e.g., 180000ms (3 min)

  // Iteration is now soft guidance
  targetIterations: number;    // e.g., 20 (advisory)
  maxIterations: number;       // e.g., 100 (safety cap)
}

Progress Detection

interface ProgressState {
  filesRead: Set<string>;       // Files we've read
  filesModified: Set<string>;   // Files we've changed
  commandsRun: string[];        // Commands executed
  recentToolCalls: ToolCall[];  // Last 10 calls for loop detection
  lastMeaningfulProgress: number; // Timestamp
  stuckCount: number;           // Iterations without progress
}

// Detect stuck state. `iterations` is the loop's current iteration count.
function isStuck(state: ProgressState, iterations: number): boolean {
  // Check for repeated identical calls
  if (state.recentToolCalls.length >= 3) {
    const last3 = state.recentToolCalls.slice(-3);
    // Serialize args: interpolating an object directly would yield "[object Object]"
    const unique = new Set(last3.map(tc => `${tc.tool}:${JSON.stringify(tc.args)}`));
    if (unique.size === 1) return true; // Same call 3x
  }

  // Check for no progress over time
  const timeSinceProgress = Date.now() - state.lastMeaningfulProgress;
  if (timeSinceProgress > 60000 && iterations > 5) {
    return true; // No progress for 1 min after more than 5 iterations
  }

  return false;
}

Budget Check Flow

function checkBudget(): BudgetCheckResult {
  // 1. Check hard limits first
  if (usage.tokens >= budget.maxTokens) {
    return { canContinue: false, suggestedAction: 'stop', reason: 'Token budget exhausted' };
  }
  if (usage.cost >= budget.maxCost) {
    return { canContinue: false, suggestedAction: 'stop', reason: 'Cost budget exhausted' };
  }
  // maxDuration and the other soft limits would be checked the same way

  // 2. Check soft limits (warnings)
  if (usage.tokens >= budget.softTokenLimit) {
    return { canContinue: true, suggestedAction: 'request_extension', reason: 'Soft token limit reached' };
  }

  // 3. Check if stuck
  if (progressState.stuckCount >= 3) {
    return { canContinue: true, suggestedAction: 'request_extension', reason: 'No progress for 3+ iterations' };
  }

  // 4. All good
  return { canContinue: true, suggestedAction: 'continue' };
}

Preset Budgets

// Quick task - simple queries
const QUICK_BUDGET = {
  maxTokens: 50000,
  maxCost: 0.10,
  maxDuration: 60000, // 1 minute
  targetIterations: 5,
};

// Standard task - typical development
const STANDARD_BUDGET = {
  maxTokens: 200000,
  maxCost: 0.50,
  maxDuration: 300000, // 5 minutes
  targetIterations: 20,
};

// Large task - complex multi-step
const LARGE_BUDGET = {
  maxTokens: 500000,
  maxCost: 2.00,
  maxDuration: 900000, // 15 minutes
  targetIterations: 50,
};
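
The presets above list only hard limits and a target iteration count, while `ExecutionBudget` also expects soft limits and a `maxIterations` cap. One way to fill the gap is to derive the missing fields from the hard limits. The sketch below assumes a single ratio (75% by default) and a default cap of 100 iterations; both values and the `withSoftLimits` helper are hypothetical, and the production agent may choose its soft limits differently.

```typescript
interface HardLimits {
  maxTokens: number;
  maxCost: number;      // USD
  maxDuration: number;  // ms
  targetIterations: number;
}

interface FullBudget extends HardLimits {
  softTokenLimit: number;
  softCostLimit: number;
  softDurationLimit: number;
  maxIterations: number;
}

// Derive each soft limit as a fraction of its hard limit (assumed ratio).
function withSoftLimits(preset: HardLimits, ratio = 0.75, maxIterations = 100): FullBudget {
  return {
    ...preset,
    softTokenLimit: Math.floor(preset.maxTokens * ratio),
    softCostLimit: preset.maxCost * ratio,
    softDurationLimit: Math.floor(preset.maxDuration * ratio),
    maxIterations,
  };
}

const standard = withSoftLimits({
  maxTokens: 200000,
  maxCost: 0.5,
  maxDuration: 300000,
  targetIterations: 20,
});
console.log(standard.softTokenLimit); // prints 150000
```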

Integration Example

const economics = createEconomicsManager(STANDARD_BUDGET);

// Set extension handler (prompts user)
economics.setExtensionHandler(async (request) => {
  const answer = await askUser(
    `Task is using ${request.currentUsage.tokens} tokens. Extend budget?`
  );
  return answer ? request.suggestedExtension : null;
});

// In agent loop
while (true) {
  // Record LLM usage
  const response = await llm.chat(messages);
  economics.recordLLMUsage(response.inputTokens, response.outputTokens, model);

  // Record tool calls
  for (const call of response.toolCalls) {
    const result = await executeTool(call);
    economics.recordToolCall(call.name, call.args, result);
  }

  // Check budget
  const check = economics.checkBudget();
  if (!check.canContinue) {
    console.log(`Stopping: ${check.reason}`);
    break;
  }
  if (check.suggestedAction === 'request_extension') {
    const extended = await economics.requestExtension(check.reason);
    if (!extended) break;
  }
}

Next Steps

In Lesson 20: Sandboxing & Isolation, we'll learn how to:

- Run untrusted code safely
- Limit resource usage
- Isolate agent execution environments