Source Code
The runnable TypeScript source for this lesson is in lessons/06-testing-agents/
Lesson 6: Testing Agents¶
What You'll Learn¶
Agents are notoriously hard to test because:
- LLM responses are non-deterministic
- Tool execution has side effects
- Conversations have state
In this lesson, we'll build testing infrastructure that makes agents testable.
Key Concepts¶
The Testing Challenge¶
Traditional unit tests assume deterministic outputs:
// This won't work - LLM outputs vary
expect(await agent.run("fix the bug")).toBe("I fixed the bug in line 42");
Agent tests need different approaches:
// Test behavior, not exact output
expect(result.toolsUsed).toContain('edit_file');
expect(result.filesModified).toContain('auth.py');
expect(result.success).toBe(true);
Testing Levels¶
- Unit Tests: Test individual components (tools, parsers)
- Integration Tests: Test tool + LLM interaction
- Scenario Tests: Test complete agent flows with recorded conversations
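At the unit level, a parser can be tested without any LLM involved. The sketch below assumes tool calls arrive as a fenced `json` block inside the response text, which is how the scripted responses in this lesson are written. `extractToolCall`, the `ToolCall` shape, and the `args` field are hypothetical names for illustration, not the lesson's actual helpers.

```typescript
// Hypothetical shape of a parsed tool call; the lesson's real types may differ.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

// Built at runtime so this example never contains a literal code fence.
const FENCE = "`".repeat(3);

// Return the first fenced JSON tool call in an LLM response, or null if none is found.
function extractToolCall(response: string): ToolCall | null {
  const start = response.indexOf(FENCE + "json");
  if (start === -1) return null;

  const bodyStart = start + FENCE.length + "json".length;
  const end = response.indexOf(FENCE, bodyStart);
  if (end === -1) return null;

  try {
    const parsed = JSON.parse(response.slice(bodyStart, end));
    if (typeof parsed !== "object" || parsed === null) return null;
    if (typeof parsed.tool !== "string") return null;
    return { tool: parsed.tool, args: parsed.args ?? {} };
  } catch {
    // Malformed JSON is treated as "no tool call" rather than a crash.
    return null;
  }
}
```

Because the function is pure, a unit test only needs a handful of input strings: one valid call, one response with no fence, and one with broken JSON.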
Mock Providers¶
Replace real LLMs with predictable mocks:
const mock = new ScriptedLLMProvider([
  { response: 'I\'ll read the file first.\n```json\n{"tool":"read_file"...}```' },
  { response: 'I found the bug. Let me fix it.\n```json\n{"tool":"edit_file"...}```' },
  { response: 'Done! The bug has been fixed.' },
]);
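To make the idea concrete, here is a minimal sketch of how a scripted provider could work internally. It is not the `ScriptedLLMProvider` from `mocks.ts`: the `LLMProvider` interface, the `complete` method, and the `MiniScriptedProvider` name are assumptions chosen for illustration.

```typescript
// Hypothetical minimal interface; the lesson's real provider interface may differ.
interface LLMProvider {
  complete(prompt: string): Promise<string>;
}

interface ScriptedTurn {
  response: string;
}

// Returns scripted responses in order and records every prompt it receives.
class MiniScriptedProvider implements LLMProvider {
  private readonly script: ScriptedTurn[];
  private readonly prompts: string[] = [];
  private cursor = 0;

  constructor(script: ScriptedTurn[]) {
    this.script = script;
  }

  // There is no await inside, so the bookkeeping runs before the promise is returned.
  async complete(prompt: string): Promise<string> {
    this.prompts.push(prompt);
    const turn = this.script[this.cursor];
    if (turn === undefined) {
      // Failing loudly exposes agents that make more LLM calls than the test scripted.
      throw new Error(`Script exhausted after ${this.script.length} responses`);
    }
    this.cursor += 1;
    return turn.response;
  }

  // Number of scripted responses not yet consumed.
  get remaining(): number {
    return this.script.length - this.cursor;
  }

  // Prompts seen so far, in call order.
  getPrompts(): readonly string[] {
    return this.prompts;
  }
}
```

Throwing when the script runs out is a deliberate choice in this sketch: a silent default response would hide a looping agent, while an error fails the test at the exact call that was not expected.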
Conversation Fixtures¶
Record real conversations and replay them:
const fixture = loadFixture('fix-auth-bug.json');
const result = await replayConversation(fixture);
expect(result).toMatchSnapshot();
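The fixture format is not shown here, so the following is only a sketch of what a recorded conversation might contain and how a replay could walk it. Every name in it (`RecordedTurn`, `ConversationFixture`, `replayTurns`, `ToolRunner`, and the field names) is an assumption; the lesson's `replayConversation` most likely drives the real agent loop instead.

```typescript
// Hypothetical fixture format: one entry per recorded LLM turn.
interface RecordedTurn {
  response: string; // text the model returned at record time
  toolCall?: { tool: string; args: Record<string, unknown> };
  toolResult?: string; // output the tool produced at record time
}

interface ConversationFixture {
  name: string;
  task: string;
  turns: RecordedTurn[];
}

// Stand-in for real tool execution, so replays have no side effects.
type ToolRunner = (tool: string, args: Record<string, unknown>) => string;

interface ReplayResult {
  toolsUsed: string[];
  mismatches: string[]; // tools whose output differed from the recording
  finalResponse: string;
}

// Feed recorded responses back in order, re-running each tool through `runTool`.
function replayTurns(fixture: ConversationFixture, runTool: ToolRunner): ReplayResult {
  const toolsUsed: string[] = [];
  const mismatches: string[] = [];
  let finalResponse = "";

  for (const turn of fixture.turns) {
    finalResponse = turn.response;
    if (turn.toolCall === undefined) continue;

    toolsUsed.push(turn.toolCall.tool);
    const actual = runTool(turn.toolCall.tool, turn.toolCall.args);
    if (turn.toolResult !== undefined && actual !== turn.toolResult) {
      mismatches.push(turn.toolCall.tool);
    }
  }
  return { toolsUsed, mismatches, finalResponse };
}

// Inline example; in a real test this would be loaded from a JSON file.
const exampleFixture: ConversationFixture = {
  name: "fix-auth-bug",
  task: "fix the bug",
  turns: [
    {
      response: "I'll read the file first.",
      toolCall: { tool: "read_file", args: { path: "auth.py" } },
      toolResult: "def login(): pass",
    },
    { response: "Done! The bug has been fixed." },
  ],
};
```

Comparing each tool's output against the recorded result is one way to notice when a tool's behavior has drifted since the fixture was captured.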
Files in This Lesson¶
- mocks.ts - Mock LLM providers for testing
- fixtures/ - Recorded conversation fixtures
- helpers.ts - Testing utilities
- agent.test.ts - Example agent tests
- tools.test.ts - Example tool tests
Running Tests¶
Testing Strategies¶
1. Scripted Responses¶
Best for testing specific behaviors:
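test('agent uses read_file before editing', async () => {
  // Script both steps in order: the read must come before the edit
  const mock = new ScriptedLLMProvider([
    { response: '```json\n{"tool":"read_file"...}```', mustContain: 'read_file' },
    { response: '```json\n{"tool":"edit_file"...}```', mustContain: 'edit_file' },
  ]);
  const result = await runAgent('fix bug', { llm: mock });
  // The call log should show read_file first, then edit_file
  expect(mock.getCallLog()).toEqual([
    expect.objectContaining({ containedToolCall: 'read_file' }),
    expect.objectContaining({ containedToolCall: 'edit_file' }),
  ]);
});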
2. Behavioral Assertions¶
Test outcomes, not exact responses:
test('agent successfully fixes a syntax error', async () => {
  const result = await runAgent('fix the syntax error in main.ts');
  expect(result.success).toBe(true);
  expect(result.filesModified).toContain('main.ts');
  // Verify the file no longer has syntax errors
  const lintResult = await lintFile('main.ts');
  expect(lintResult.errors).toHaveLength(0);
});
3. Snapshot Testing¶
Capture expected conversation flows:
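test('agent conversation matches snapshot', async () => {
  // Use a scripted mock so the history is identical on every run
  const mock = new ScriptedLLMProvider([
    { response: 'Done! Created hello world.' },
  ]);
  const result = await runAgent('create hello world', { llm: mock });
  expect(result.history).toMatchSnapshot();
});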
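Snapshots only stay stable if everything in them is deterministic. If history entries carry volatile fields such as generated ids or timestamps (an assumption; the lesson does not show the shape of `result.history`), one option is to strip them before snapshotting. `HistoryEntry` and `normalizeHistory` below are hypothetical names for this sketch.

```typescript
// Hypothetical history entry; `id` and `timestamp` stand in for any volatile fields.
interface HistoryEntry {
  role: "user" | "assistant" | "tool";
  content: string;
  id?: string;
  timestamp?: number;
}

// Only the fields that should be identical across runs.
type StableEntry = Pick<HistoryEntry, "role" | "content">;

// Copy the stable fields and drop everything else.
function normalizeHistory(history: HistoryEntry[]): StableEntry[] {
  return history.map((entry) => ({ role: entry.role, content: entry.content }));
}
```

With a helper like this, the assertion would become `expect(normalizeHistory(result.history)).toMatchSnapshot()`, so a changed timestamp never fails the test while a changed message still does.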
Next Steps¶
After completing this lesson, move on to:
- Lesson 7: MCP Integration - Extend capabilities via protocol