Attocode Roadmap

v0.2.16 -- Install Probing, Portable Bundles & Supply-Chain Hardening (Released 2026-04-05)

~~Installed-target runtime probing~~ -- DONE: attocode code-intel probe-install validates file-based MCP installs by launching the configured stdio command, resolving ${workspaceFolder}, and optionally exercising project_summary
~~Portable code-intel bundles~~ -- DONE: attocode code-intel bundle export / bundle inspect package local artifacts with metadata, hashes, and version stamping for offline transfer and debugging
~~Shared install config resolution~~ -- DONE: ResolvedInstallSpec and resolve_install_spec() unify JSON/TOML/YAML-backed assistant config parsing for install, status, and probe flows
~~Supply-chain malware detection expansion~~ -- DONE: new security_scan anti-patterns for invisible Unicode payloads, eval-on-decoded-data obfuscation, dynamic require() assembly, and string-based timer execution
~~Install-hook auditing~~ -- DONE: dependency audit now flags suspicious preinstall/install/postinstall scripts and shares matcher behavior between local and DB-backed scanners
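
To illustrate the install-hook auditing item above, here is a minimal sketch of flagging suspicious npm lifecycle scripts in a `package.json`. The pattern list and the name `audit_install_hooks` are assumptions made for this example; the roadmap does not describe the actual matcher set.

```python
import json
import re

# Lifecycle hooks that npm runs automatically during installation.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")

# Hypothetical heuristics for illustration only; not attocode's real matchers.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\b(curl|wget)\b"),   # network fetch during install
    re.compile(r"\|\s*(sh|bash)\b"),  # piping output into a shell
    re.compile(r"\bnode\s+-e\b"),     # inline script execution
    re.compile(r"\bbase64\b"),        # decoding an embedded payload
]


def audit_install_hooks(package_json_text: str) -> list[tuple[str, str]]:
    """Return (hook, command) pairs whose command matches a suspicious pattern."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    findings = []
    for hook in INSTALL_HOOKS:
        command = scripts.get(hook)
        if command and any(p.search(command) for p in SUSPICIOUS_PATTERNS):
            findings.append((hook, command))
    return findings
```

Keeping the patterns in one module-level list is one way to let a local scanner and a DB-backed scanner share matcher behavior.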

~~Language-specific symbol extraction~~ -- DONE: 11 new tree-sitter configs (Erlang, Clojure, Perl, Crystal, Dart, OCaml, F#, Julia, Nim, R, Objective-C); total 36 languages supported
~~Architecture analysis fallback~~ -- DONE: directory-based module detection when dependency graph is sparse; 22 repos improved from 2/5 to 4/5
~~Search ranking improvements~~ -- DONE: graduated symbol boosting, multi-term coverage, path relevance, non-code penalty; MRR +23%, NDCG +16%
~~Lazy embedding initialization~~ -- DONE: embedding model loads on first semantic_search() call, not on construction; bootstrap latency 20s → 0.8s
~~3-way benchmark expansion~~ -- DONE: 19 → 49 repos benchmarked across 30+ languages
~~FK constraint fix~~ -- DONE: ast_service.py now saves files before symbols to satisfy the foreign-key constraint
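
The lazy embedding initialization item above follows a common pattern: defer the expensive model load until the first call that needs it. A minimal sketch, assuming a hypothetical `LazyEmbedder` wrapper and a caller-supplied `loader` (neither name comes from the attocode codebase):

```python
from typing import Callable, Optional, Sequence

# An embedding function maps text to a vector of floats.
EmbedFn = Callable[[str], Sequence[float]]


class LazyEmbedder:
    """Defers loading the embedding model until embed() is first called."""

    def __init__(self, loader: Callable[[], EmbedFn]) -> None:
        # Construction only stores the loader; nothing expensive runs here.
        self._loader = loader
        self._model: Optional[EmbedFn] = None

    def embed(self, text: str) -> Sequence[float]:
        if self._model is None:
            # First use pays the load cost once; later calls reuse the model.
            self._model = self._loader()
        return self._model(text)
```

This sketch is not thread-safe. If several threads can trigger the first call concurrently, the load would need to be guarded with a lock.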

~~Embedding-based semantic search~~ -- DONE: hybrid vector + BM25 with RRF; two-stage retrieval
~~Persistent index across instances~~ -- DONE: SQLite-backed IndexStore with incremental updates
~~Progressive hydration~~ -- DONE: adaptive tier-based indexing (small/medium/large/huge); skeleton init <2s for any repo; background hydration thread; on-demand gap filling
~~3-way benchmark (20 repos)~~ -- DONE: grep vs ast-grep vs code-intel across 12 languages; code-intel 4.7/5 quality
~~New MCP tools~~ -- DONE: hydration_status tool; indexing_depth param on bootstrap; mode param on semantic_search
Ground truth expansion -- add YAML files for 15+ more benchmark repos (currently 5)
Go-specific search improvements -- Go MRR 0.200 lags Python 0.725; index package docs, use module paths
ast-grep integration -- optional structural pattern searches alongside tree-sitter parsing
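
For readers unfamiliar with the RRF step in the hybrid search item above, Reciprocal Rank Fusion merges ranked lists by summing `1 / (k + rank)` for each document across the lists. A minimal sketch, assuming the conventional constant `k = 60` and a hypothetical function name; the roadmap does not state which constant or tie-breaking rule attocode uses:

```python
from collections import defaultdict
from typing import Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Fuse several best-first rankings into a single best-first ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; a document missing from a list contributes nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a.py", "b.py", "c.py"]
bm25_hits = ["b.py", "d.py", "a.py"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Because RRF uses only ranks, it needs no normalization between cosine similarities and BM25 scores, which sit on different scales.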

~~BM25 keyword index disk cache~~ -- DONE: SQLite cache with incremental mtime-based updates; 8x speedup on cockroach-scale repos (20s → 2.5s warm)
~~Trigram pre-filtering for BM25~~ -- DONE: narrows candidate docs before scoring; zero accuracy loss (full corpus IDF preserved)
~~Numpy-accelerated vector search~~ -- DONE: BLAS matmul replaces Python loop; 183x speedup at 10K vectors; in-memory cache with version invalidation; pure Python fallback
~~Frecency-boosted search~~ -- DONE: SQLite-backed file access scoring with exponential decay; frecency_search MCP tool with two-phase file ordering
~~Fuzzy search (Smith-Waterman)~~ -- DONE: typo-resistant search via local sequence alignment; fuzzy_search, fuzzy_filename_search, fuzzy_score tools
~~Cross-mode search suggestions~~ -- DONE: "did you mean" fallbacks between filename and content search
~~Query constraints (fff-style)~~ -- DONE: git:modified, !pattern, path/, *.ext filters with git porcelain XY parsing
~~Query history & combo boosting~~ -- DONE: SQLite-backed query-to-file tracking; 3+ selections activate combo boost
~~Code-intel testing infrastructure~~ -- DONE: fixtures, mocks, helpers; 9 tool test modules
~~Overall benchmark improvement~~ -- DONE: 35% faster (4,182ms → 2,731ms avg), quality stable at 4.7/5
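
The fuzzy search item above relies on Smith-Waterman local alignment, which scores the best-matching substring pair and so tolerates typos and transpositions. A minimal sketch of the scoring recurrence with a linear gap penalty; the scoring values and the function name are assumptions for this example, not the parameters attocode ships with:

```python
def smith_waterman_score(
    query: str,
    target: str,
    match: int = 2,
    mismatch: int = -1,
    gap: int = -1,
) -> int:
    """Return the best local alignment score between query and target."""
    cols = len(target) + 1
    prev = [0] * cols  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            step = match if query[i - 1] == target[j - 1] else mismatch
            # Clamping at 0 lets an alignment restart anywhere (local alignment).
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

With these parameters an exact match of length n scores 2n, and a misspelled query such as "serach" still scores higher against "search" than against an unrelated word.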

Cross-repo search in org -- aggregate embeddings across repositories, org-scoped vector queries
Better git integration -- commit graph exploration, blame-weighted hotspots, PR-aware analysis
Cross-service analysis -- detect API contracts (OpenAPI, gRPC), map service-to-service calls
More tests -- integration tests for all 40 MCP tools, Playwright E2E, target 60% coverage
Full MCP feature parity -- ensure all tools work in all modes (local, remote, service)
Better offline mode:
  Offline embedding fallback (auto-switch to a local model such as all-MiniLM-L6-v2)
  Pre-computed analysis bundles (.attocode-bundle export for air-gapped use)
  Offline learning sync (queue locally, sync on reconnect)
  Git-based offline analysis (blame, history, branches via pygit2 without DB)

Swarm mode update to loops -- migrate from DAG-based to loop-based execution architecture
Swarm using code intel -- bootstrap orientation, impact analysis for task scoping, cross-refs for merge conflicts, learning system for per-repo patterns