Advanced Analysis
This page covers three complementary tools for deep codebase analysis: dead code detection, code distillation, and graph DSL queries. All three are available as MCP tools and through the HTTP API.
Dead Code Detection
The dead_code tool identifies unreferenced symbols, files, and modules using the cross-reference index. Results are ranked by a confidence score reflecting how likely the code is truly unused.
Analysis Levels
| Level | What it finds | Use case |
|---|---|---|
| `symbol` | Functions, classes, methods with zero external references | Cleaning up unused internal code |
| `file` | Files with zero importers (not entry points) | Finding abandoned modules |
| `module` | Directories with no imports from outside the directory | Identifying dead packages |
Entry Point Heuristics
Dead code detection automatically excludes known entry points to avoid false positives:
Auto-detected entry-point files:
- `main.py`, `__main__.py`, `__init__.py`, `setup.py`, `manage.py`, `conftest.py`
- Files starting with `test_` or ending with `_test.py`

Auto-detected entry-point symbols:
- Functions named `main`, `setup`, `teardown`
- Symbols with framework decorators containing: `route`, `endpoint`, `fixture`, `command`, `click`
- Symbols defined in `__init__.py` (treated as re-exports)
You can also specify additional entry points manually via the entry_points parameter.
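The heuristics above can be sketched in a few lines of Python. This is an illustration of the listed rules only, not the tool's actual implementation: the function names `is_entry_point_file` and `is_entry_point_symbol` are hypothetical, and decorators are assumed to be available as plain strings.

```python
from pathlib import PurePosixPath

# Names taken from the heuristics listed above.
ENTRY_POINT_FILES = {
    "main.py", "__main__.py", "__init__.py",
    "setup.py", "manage.py", "conftest.py",
}
ENTRY_POINT_FUNCTIONS = {"main", "setup", "teardown"}
DECORATOR_KEYWORDS = ("route", "endpoint", "fixture", "command", "click")


def is_entry_point_file(path: str) -> bool:
    """Apply the file-name rules: known names, test_ prefix, _test.py suffix."""
    name = PurePosixPath(path).name
    return (
        name in ENTRY_POINT_FILES
        or name.startswith("test_")
        or name.endswith("_test.py")
    )


def is_entry_point_symbol(name: str, decorators: list[str], path: str) -> bool:
    """Apply the symbol rules (hypothetical signature; symbol kind is ignored)."""
    if name in ENTRY_POINT_FUNCTIONS:
        return True
    # A decorator counts if its text contains any framework keyword.
    if any(kw in dec for dec in decorators for kw in DECORATOR_KEYWORDS):
        return True
    # Symbols defined in __init__.py are treated as re-exports.
    return PurePosixPath(path).name == "__init__.py"
```

For example, `is_entry_point_symbol("list_users", ["app.route('/users')"], "src/api/routes/users.py")` is true because the decorator text contains `route`.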
Confidence Scoring
Each result includes a confidence score from 0.0 to 1.0:
| Factor | Effect |
|---|---|
| Private name (leading `_`) | +0.1 (base 0.8 vs 0.7) |
| Defined in `__init__.py` | -0.3 |
| Public name (no `_` prefix) | -0.2 |
| File not modified in 180+ days | +0.1 |
A higher score means the code is safer to remove. Use `min_confidence` to filter out lower-scoring results.
Examples
MCP tool:
Sample output (symbol level):
Dead symbols detected in src/api/ (5 results):
1. [0.80] function _legacy_auth_check
src/api/security.py:142
Reason: function never called or imported; private symbol
2. [0.70] class DeprecatedHandler
src/api/handlers.py:88
Reason: class never instantiated or referenced
3. [0.70] method RequestParser._parse_legacy
src/api/parsers.py:201
Reason: method with no external callers
Sample output (file level):
Dead files detected (3 results):
1. [0.80] src/utils/_old_helpers.py
12 symbols, 0 dependencies
Reason: no files import this module; also has no imports (orphan)
2. [0.70] src/api/v1_compat.py
5 symbols, 3 dependencies
Reason: no files import this module
Sample output (module level):
Dead modules detected (2 results):
1. [0.70] src/legacy_exporters/
8 files, 34 symbols
Reason: no external imports into this directory (8 files)
2. [0.40] src/tools/deprecated/
3 files, 12 symbols
Reason: no external imports into this directory (3 files); contains entry-point files (lower confidence)
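If you need to post-process the text report, the result headers in the samples above share the shape `N. [score] description`. The following is a minimal sketch that extracts those headers and filters them by score. It assumes the text format shown in the samples is stable; `parse_results` and `at_least` are hypothetical helper names, not part of the tool.

```python
import re

# Matches header lines such as "1. [0.80] function _legacy_auth_check".
RESULT_HEADER = re.compile(r"^\s*\d+\.\s+\[(\d\.\d+)\]\s+(.+?)\s*$")


def parse_results(report: str) -> list[tuple[float, str]]:
    """Return (confidence, description) pairs for each result header line."""
    results = []
    for line in report.splitlines():
        match = RESULT_HEADER.match(line)
        if match:
            results.append((float(match.group(1)), match.group(2)))
    return results


def at_least(results: list[tuple[float, str]], threshold: float) -> list[tuple[float, str]]:
    """Keep only results whose confidence meets the threshold."""
    return [item for item in results if item[0] >= threshold]
```

Filtering on the client side like this mirrors what `min_confidence` does in the request; prefer the parameter when you control the call.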
Code Distillation
The distill tool compresses codebase information at varying fidelity levels, optimized for different context window budgets.
Distillation Levels
| Level | Compression | What's included | Best for |
|---|---|---|---|
| `full` | ~0% | Complete repo map with symbols | Full understanding, large context |
| `signatures` | ~70% | Public API surface: function signatures, class definitions, type hints, first-line docstrings | API overview, code review |
| `structure` | ~90%+ | File tree + import graph adjacency list | Quick orientation, small context |
File Selection
When `files` is specified, the tool includes those files plus their neighbors in the dependency graph, up to `depth` hops away (maximum 3 hops). When `files` is omitted, the tool auto-selects the highest-ranked files by PageRank score that fit within the token budget.
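The neighbor expansion described above amounts to a bounded breadth-first search. The sketch below assumes that "neighbors" means files connected by an import in either direction, which the text does not state; `expand_selection` and the toy graph in the usage note are hypothetical.

```python
from collections import deque

MAX_DEPTH = 3  # the documented hop limit


def expand_selection(seeds, imports, depth):
    """Return the seed files plus every file within `depth` hops.

    `imports` maps a file to the files it imports. Edges are followed in
    both directions (an assumption of this sketch).
    """
    depth = min(depth, MAX_DEPTH)
    neighbors = {}
    for source, targets in imports.items():
        for target in targets:
            neighbors.setdefault(source, set()).add(target)
            neighbors.setdefault(target, set()).add(source)

    selected = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        current, hops = queue.popleft()
        if hops == depth:
            continue  # do not expand past the hop limit
        for neighbor in neighbors.get(current, ()):
            if neighbor not in selected:
                selected.add(neighbor)
                queue.append((neighbor, hops + 1))
    return selected
```

With `imports = {"app.py": ["auth.py"], "auth.py": ["tokens.py"]}`, expanding from `["app.py"]` with `depth=1` selects `app.py` and `auth.py` but not `tokens.py`.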
Examples
Signatures level (public API surface):
Output:
# src/auth/middleware.py
class AuthMiddleware(BaseHTTPMiddleware):
"""JWT authentication middleware."""
def __init__(self, app: ASGIApp, secret: str, algorithms: list[str] = ["HS256"]): ...
async def dispatch(self, request: Request, call_next: Callable) -> Response: ...
"""Validate JWT token and inject user into request state."""
async def verify_token(token: str, secret: str) -> dict: ...
"""Verify and decode a JWT token."""
def extract_bearer(authorization: str) -> str | None: ...
"""Extract bearer token from Authorization header."""
# src/auth/tokens.py
def create_access_token(user_id: str, expires_delta: timedelta = ...) -> str: ...
"""Create a signed JWT access token."""
def create_refresh_token(user_id: str) -> str: ...
"""Create a signed JWT refresh token."""
(4 files, level=signatures, ~380 tokens)
Structure level (maximum compression):
Output:
# File tree
src/api/__init__.py
src/api/app.py
src/api/routes/admin.py
src/api/routes/auth.py
src/api/routes/users.py
src/auth/middleware.py
src/auth/tokens.py
src/config.py
src/db/session.py
src/models/user.py
# Import graph (file -> imports)
src/api/app.py -> src/api/routes/admin.py, src/api/routes/auth.py, src/auth/middleware.py, src/config.py
src/api/routes/auth.py -> src/auth/tokens.py, src/models/user.py, src/db/session.py
src/auth/middleware.py -> src/auth/tokens.py, src/config.py, src/db/session.py
(10 files, level=structure, ~120 tokens)
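The import graph section of the structure output is a plain adjacency list, so it is easy to load back into a dictionary. This sketch assumes the `source -> target, target` line format shown above; `parse_import_graph` and `fan_in` are hypothetical helper names.

```python
from collections import Counter


def parse_import_graph(text: str) -> dict[str, list[str]]:
    """Parse 'file -> a, b, c' lines into {file: [a, b, c]}."""
    graph = {}
    for line in text.splitlines():
        # Skip comment headers such as "# Import graph (file -> imports)".
        if " -> " not in line or line.lstrip().startswith("#"):
            continue
        source, _, targets = line.partition(" -> ")
        graph[source.strip()] = [t.strip() for t in targets.split(",") if t.strip()]
    return graph


def fan_in(graph: dict[str, list[str]]) -> Counter:
    """Count how many listed files import each target."""
    return Counter(target for targets in graph.values() for target in targets)
```

Applied to the three graph lines above, `fan_in` reports a count of 2 for `src/auth/tokens.py`, `src/config.py`, and `src/db/session.py`.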
Full level (delegates to repo_map):
Graph DSL
The graph_dsl tool provides a Cypher-inspired query language for traversing the dependency graph with depth constraints, filters, and multi-hop chains.
Syntax Reference
Nodes
- Variable: `target`, `caller`, `a`, `b` --- matches any file, binds to a variable name
- Literal path: `"src/api/app.py"` --- matches a specific file
- Glob pattern: `"src/api/*.py"` --- matches files by glob
Edges
- Outbound: `node -[EDGE_TYPE]-> node`
- Inbound: `node <-[EDGE_TYPE]- node`
- Exact depth: `-[IMPORTS*3]->` (exactly 3 hops)
- Depth range: `-[IMPORTS*1..3]->` (1 to 3 hops)
- Default depth: `-[IMPORTS]->` (exactly 1 hop)
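The node and edge forms above are regular enough to recognize with small helpers. The sketch below is one way to read them, not the tool's parser: `classify_node` and `parse_edge` are hypothetical names, and treating any quoted string containing `*`, `?`, or `[` as a glob is an assumption.

```python
import re

GLOB_CHARS = set("*?[")
EDGE_SPEC = re.compile(r"^(<)?-\[([A-Z_]+)(?:\*(\d+)(?:\.\.(\d+))?)?\]-(>)?$")


def classify_node(token: str) -> str:
    """Return 'variable', 'literal', or 'glob' for a node token."""
    if len(token) >= 2 and token[0] == token[-1] == '"':
        inner = token[1:-1]
        return "glob" if GLOB_CHARS & set(inner) else "literal"
    if re.fullmatch(r"[A-Za-z_]\w*", token):
        return "variable"
    raise ValueError(f"unrecognized node token: {token!r}")


def parse_edge(spec: str) -> tuple[str, str, int, int]:
    """Return (edge_type, direction, min_depth, max_depth) for an edge spec."""
    match = EDGE_SPEC.match(spec.strip())
    if not match:
        raise ValueError(f"unrecognized edge spec: {spec!r}")
    inbound, edge_type, low, high, outbound = match.groups()
    if bool(inbound) == bool(outbound):
        raise ValueError("an edge must point in exactly one direction")
    min_depth = int(low) if low else 1            # default depth is 1 hop
    max_depth = int(high) if high else min_depth  # "*3" means exactly 3
    if min_depth > max_depth:
        raise ValueError("depth range is reversed")
    return edge_type, "in" if inbound else "out", min_depth, max_depth
```

For instance, `parse_edge("-[IMPORTS*1..3]->")` yields `("IMPORTS", "out", 1, 3)`, and `classify_node('"src/api/*.py"')` yields `"glob"`.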
Edge Types
| Edge type | Meaning |
|---|---|
| `IMPORTS` | File A imports file B |
| `IMPORTED_BY` | File A is imported by file B |
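`IMPORTED_BY` is the inverse relation of `IMPORTS`, so one adjacency map is enough to answer both. The sketch below inverts an import map and collects files within a depth range. It assumes depth means shortest-path distance from the start file, which the reference does not specify; `invert` and `reachable` are hypothetical names.

```python
def invert(imports):
    """Turn {file: files it imports} into {file: files that import it}."""
    imported_by = {}
    for source, targets in imports.items():
        for target in targets:
            imported_by.setdefault(target, set()).add(source)
    return imported_by


def reachable(start, edges, min_depth, max_depth):
    """Files whose shortest distance from `start` is in [min_depth, max_depth]."""
    seen = {start}
    frontier = {start}
    found = set()
    for depth in range(1, max_depth + 1):
        # Advance one hop and drop anything already visited.
        frontier = {n for node in frontier for n in edges.get(node, ())} - seen
        seen |= frontier
        if depth >= min_depth:
            found |= frontier
    return found
```

Passing the original map walks `IMPORTS` edges; passing `invert(imports)` walks `IMPORTED_BY` edges from the same start file.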
WHERE Clause
Filter results by file properties. Multiple conditions are AND-separated.
Available fields:
| Field | Type | Description |
|---|---|---|
| `language` | string | File language (e.g. "python", "typescript") |
| `line_count` | int | Number of lines |
| `importance` | float | PageRank importance score |
| `path` | string | Relative file path |
| `fan_in` | int | Number of files that import this file |
| `fan_out` | int | Number of files this file imports |
| `is_test` | bool | Whether the file is a test file |
| `is_config` | bool | Whether the file is a config file |

Available operators: `=`, `!=`, `>`, `<`, `>=`, `<=`, `LIKE`
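As an illustration of how AND-joined conditions filter a file, the sketch below evaluates a list of `(field, operator, value)` triples against a dictionary of file properties. It is not the tool's evaluator: `matches` and `like` are hypothetical names, and `LIKE` is assumed to follow SQL conventions (`%` for any run of characters, `_` for one character), which the reference does not confirm.

```python
import operator
import re

OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def like(value: str, pattern: str) -> bool:
    """SQL-style LIKE (assumed semantics): % matches any run, _ one character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def matches(file_props: dict, conditions: list[tuple[str, str, object]]) -> bool:
    """Return True only if every condition holds (AND semantics)."""
    for field, op, expected in conditions:
        actual = file_props[field]
        ok = like(actual, expected) if op == "LIKE" else OPERATORS[op](actual, expected)
        if not ok:
            return False
    return True
```

A file with `{"language": "python", "line_count": 620}` passes `[("language", "=", "python"), ("line_count", ">", 500)]` and fails as soon as any single condition is false.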
RETURN Clause
- `RETURN target` --- return file paths bound to `target`
- `RETURN target.language, target.line_count` --- return specific fields
- `RETURN COUNT` --- return count of matches
Example Queries
1. Find all files imported by a specific file (1 hop):
2. Find transitive dependencies up to 3 hops:
3. Find all callers of a module:
4. Find Python files that import a module:
5. Find large files in the import chain:
MATCH "src/core/loop.py" -[IMPORTS*1..2]-> dep WHERE dep.line_count > 500 RETURN dep, dep.line_count
6. Multi-hop chain --- find files 2 hops downstream:
7. Find high fan-in files (popular dependencies):
8. Find test files that depend on a module:
9. Count dependencies:
10. Glob pattern matching:
Sample output: