Source Code

The runnable TypeScript source for this lesson is in lessons/16-reflection/

Lesson 16: Self-Reflection & Critique¶

Teaching agents to evaluate and improve their own output

What You'll Learn¶

Reflection Prompts: Designing prompts for self-evaluation
Output Critique: Structured quality assessment
Reflection Loops: Iterative improvement cycles
Quality Scoring: Multi-dimensional output evaluation
Trajectory Analysis: Understanding improvement patterns

Why This Matters¶

Without self-reflection, agents produce output without knowing if it's good. Reflection enables:

Quality Assurance: Catch mistakes before delivery
Iterative Improvement: Get better with each attempt
Transparency: Understand why output was or wasn't satisfactory
Learning: Identify patterns in failures and successes

Key Concepts¶

Reflection Result¶

interface ReflectionResult {
  satisfied: boolean;        // Goal achieved?
  critique: string;          // Detailed feedback
  suggestions: string[];     // How to improve
  confidence: number;        // 0-1 assessment confidence
  issues: ReflectionIssue[]; // Specific problems found
  strengths: string[];       // What was done well
}

Quality Dimensions¶

+--------------------------------------------------------------+
|                    Quality Score                               |
+--------------------------------------------------------------+
|  Completeness  ========..  80%  Addresses all requirements    |
|  Correctness   =========.  90%  Accurate, error-free          |
|  Clarity       =======...  70%  Easy to understand            |
|  Efficiency    ======....  60%  Optimal implementation        |
|  Style         ========..  80%  Follows conventions           |
+--------------------------------------------------------------+
|  Overall                   76%                                |
+--------------------------------------------------------------+

Reflection Loop¶

+-------------------------------------------------------------+
|                                                               |
|  +----------+    +----------+    +------------------+        |
|  | Execute  |--->| Reflect  |--->| Satisfied?       |        |
|  | Task     |    | on       |    |                  |        |
|  |          |    | Output   |    | Yes -> Done      |        |
|  +----------+    +----------+    | No  -> Improve   |        |
|       ^                          +--------+---------+        |
|       |                                   |                  |
|       +------------- Feedback <-----------+                  |
|                                                              |
+-------------------------------------------------------------+

Files in This Lesson¶

File	Purpose
`types.ts`	Reflection types and interfaces
`reflector.ts`	Reflection prompt generation and processing
`critic.ts`	Output critique and quality scoring
`retry-loop.ts`	Reflection-driven retry mechanism
`main.ts`	Demonstration of all concepts

Running This Lesson¶

npm run lesson:16

Code Examples¶

Basic Reflection¶

import { SimpleReflector } from './reflector.js';

const reflector = new SimpleReflector({
  checkCompleteness: true,
  checkCorrectness: true,
  checkCodeQuality: true,
});

const result = await reflector.reflect(
  'Write a function to validate emails',
  `function validate(email) { return email.includes('@'); }`
);

console.log('Satisfied:', result.satisfied);
console.log('Confidence:', result.confidence);
console.log('Issues:', result.issues.length);

Output Critique¶

import { OutputCritic } from './critic.js';

const critic = new OutputCritic();

// Get detailed scores
const score = await critic.score(myCode);
console.log('Overall:', score.overall);
console.log('Completeness:', score.dimensions.completeness);
console.log('Correctness:', score.dimensions.correctness);

// Get full critique
const critique = await critic.critique(myCode, {
  checkCompleteness: true,
  checkCorrectness: true,
  checkCodeQuality: true,
  checkClarity: true,
  checkEdgeCases: true,
});

console.log('Assessment:', critique.assessment);
// 'excellent' | 'good' | 'acceptable' | 'needs_work' | 'poor'

Reflection Loop¶

import { ReflectionLoop } from './retry-loop.js';

const loop = new ReflectionLoop({
  maxAttempts: 3,
  satisfactionThreshold: 0.8,
  includePreviousAttempts: true,
});

// Subscribe to events
loop.on((event) => {
  if (event.type === 'reflection.completed') {
    console.log(`Attempt ${event.attempt}: ${event.result.confidence * 100}% confidence`);
  }
});

// Execute with reflection
const result = await loop.executeSimple(
  async () => generateCode(requirements),
  'Generate a sorting function'
);

console.log(`Completed in ${result.attempts} attempts`);
console.log(`Final confidence: ${result.reflections.at(-1)?.confidence}`);

Trajectory Analysis¶

const result = await loop.execute(task, goal);

const analysis = loop.analyzeTrajectory(result);

console.log('Improved:', analysis.improved);
console.log('Convergence:', analysis.convergence);
// 'improving' | 'plateau' | 'declining' | 'oscillating'

if (analysis.bottleneck) {
  console.log('Common issue:', analysis.bottleneck);
}

Reflection Prompts¶

Standard Template¶

You are a critical evaluator. Assess whether the output achieves the goal.

## Goal
${goal}

## Output to Evaluate
${output}

## Your Task
Evaluate completeness, correctness, clarity, and quality.

Respond in JSON:
{
  "satisfied": boolean,
  "critique": "detailed feedback",
  "suggestions": ["improvement 1", "improvement 2"],
  "confidence": 0.0-1.0,
  "issues": [...],
  "strengths": [...]
}

Code Review Template¶

Includes additional checks for: - Syntax and logic errors - Edge case handling - Security concerns - Code style and documentation

Strictness Levels¶

// Strict: Production code, security-sensitive
const strictLoop = createStrictLoop();
// - 5 max attempts
// - 0.9 satisfaction threshold
// - All criteria checked

// Lenient: Quick prototypes, internal tools
const lenientLoop = createLenientLoop();
// - 2 max attempts
// - 0.6 satisfaction threshold
// - Only completeness and correctness

When to Use Reflection¶

Use Reflection When:¶

Output quality is critical
Mistakes are expensive
Task is complex or ambiguous
User trust depends on accuracy

Skip Reflection When:¶

Task is simple and well-defined
Speed matters more than quality
Output is easily verifiable otherwise
Cost/latency is a concern

Issue Types¶

Type	Description
`incomplete`	Missing required elements
`incorrect`	Factually wrong or buggy
`unclear`	Hard to understand
`inefficient`	Could be done better
`inconsistent`	Contradicts requirements
`off_topic`	Doesn't address the goal
`style`	Style or formatting issues
`security`	Security concerns
`edge_case`	Missing edge case handling

Best Practices¶

Design Good Reflection Prompts¶

Be specific about evaluation criteria
Include context from previous attempts
Request structured output format

Choose Appropriate Strictness¶

Match criteria to task importance
Consider time/cost constraints
Balance thoroughness with efficiency

Analyze Patterns¶

Track improvement trajectories
Identify common bottlenecks
Learn from failure patterns

Prevent Infinite Loops¶

Set reasonable max attempts
Detect when improvement plateaus
Allow human escalation for critical issues

Next Steps¶

In Lesson 17: Multi-Agent Coordination, we'll explore how multiple agents can work together, combining reflection with collaboration:

Agent roles and specialization
Communication protocols
Conflict resolution
Team orchestration