OSS Demo Playbook: Code-Intel Across Codex, Claude, and Cursor¶

This playbook demonstrates Attocode's MCP codebase-intelligence tools on large open-source repositories using three agent lanes:

Codex CLI
Claude Code
Cursor Composer 1.5

The goal is not "which model wins." The goal is to show practical gains from structured code intelligence in large repos, especially for cheap/fast lanes.

What This Includes¶

Fixed benchmark manifest: eval/oss_demo/run_manifest.yaml
Prompt packet generator: python -m eval.oss_demo prepare
Results validator: python -m eval.oss_demo validate-results
Comparative report generator: python -m eval.oss_demo summarize

Repositories (Wave 1)¶

microsoft/vscode (TypeScript monorepo)
django/django (Python framework)
pytorch/pytorch (Python + C++)

Wave 2 roadmap is included in manifest: Kubernetes, Next.js, Airflow.

Step 1: Verify MCP Setup¶

uv run attocode code-intel status

If needed:

uv run attocode code-intel install codex --global
uv run attocode code-intel install claude --global
uv run attocode code-intel install cursor

Step 2: Generate Standardized Task Packets¶

python -m eval.oss_demo prepare \
  --manifest eval/oss_demo/run_manifest.yaml \
  --out eval/oss_demo/generated \
  --run-id big3-oss-demo-001 \
  --include-ablation

This creates per-agent/per-repo markdown packets plus a JSONL results template.

Step 3: Run Subagent-Style Exploration in Parallel¶

Use three lanes in parallel, one per repo archetype:

Lane A (vscode): TS monorepo exploration and impact tracing.
Lane B (django): Python framework analysis + with/without code-intel ablation.
Lane C (pytorch): cross-language dependency/risk analysis.

In Attocode, this can be done with /spawn for each lane. In Codex/Claude/Cursor, run equivalent task packets in parallel sessions.

Step 4: Fill `results.jsonl`¶

Populate rows for each {agent, repo, task, mode} with:

status
time/cost/tool-call fields
five rubric scores (0-5)
evidence paths

Then validate:

python -m eval.oss_demo validate-results \
  --manifest eval/oss_demo/run_manifest.yaml \
  --results eval/oss_demo/results.jsonl

Step 5: Build the Comparative Report¶

python -m eval.oss_demo summarize \
  --manifest eval/oss_demo/run_manifest.yaml \
  --results eval/oss_demo/results.jsonl \
  --out eval/oss_demo/report.md

Use eval/oss_demo/report_template.md when you want a manually narrated version for publishing.

Suggested "Convince People" Storyline¶

Show one failure mode without code-intel (slow/noisy search).
Show the same task with bootstrap + relevant_context + impact_analysis.
Quantify quality/time delta on at least one repo.
Emphasize ROI: cheap/fast agent + strong structure tools > blind traversal.

Codespaces Roadmap¶

Current flow is local-first. For portable demonstrations:

Add repo-local devcontainers for the target OSS repos.
Re-run the same packets in GitHub Codespaces.
Compare drift in latency and reproducibility versus local execution.