Provider Resilience & Reliability

Attocode wraps LLM providers with retry logic, circuit breakers, fallback chains, and a dynamic model cache. These layers ensure the agent stays operational even when individual providers have outages, rate limits, or transient errors.

Architecture

```mermaid
flowchart TB
    Agent[Agent] --> Chain[ProviderFallbackChain]
    Chain --> R1[ResilientProvider<br/>Anthropic]
    Chain --> R2[ResilientProvider<br/>OpenRouter]
    Chain --> R3[ResilientProvider<br/>OpenAI]
    R1 --> P1[AnthropicProvider]
    R2 --> P2[OpenRouterProvider]
    R3 --> P3[OpenAIProvider]
    Cache[ModelCache] -.-> Chain

    style Chain fill:#2d6,stroke:#fff,color:#000
    style R1 fill:#46d,stroke:#fff,color:#fff
    style R2 fill:#46d,stroke:#fff,color:#fff
    style R3 fill:#46d,stroke:#fff,color:#fff
```

Resilient Provider

`ResilientProvider` wraps a single LLM provider with retries, a per-request timeout, and a circuit breaker:

```python
from attocode.providers import ResilientProvider, ResilienceConfig

# inner_provider is any already-constructed provider (e.g. an AnthropicProvider)
config = ResilienceConfig(
    max_retries=3,
    retry_base_delay=1.0,
    retry_max_delay=30.0,
    timeout_seconds=600.0,
    circuit_breaker_threshold=5,
    circuit_breaker_reset_seconds=60.0,
)
provider = ResilientProvider(inner_provider, config=config)
```

Configuration

| Parameter | Default | Description |
|---|---|---|
| `max_retries` | 3 | Number of retry attempts before giving up |
| `retry_base_delay` | 1.0 s | Initial delay between retries |
| `retry_max_delay` | 30.0 s | Maximum delay (caps exponential growth) |
| `timeout_seconds` | 600.0 s | Hard timeout per request |
| `circuit_breaker_threshold` | 5 | Consecutive failures before the circuit opens |
| `circuit_breaker_reset_seconds` | 60.0 s | Time before the circuit breaker tries again |

Retry with Exponential Backoff

When a request fails with a retryable error, the provider retries with exponential backoff:

```text
Attempt 1: immediate
Attempt 2: wait 1s
Attempt 3: wait 2s
Attempt 4: wait 4s
...capped at 30s max
```

The delay before each retry is `min(base_delay * 2^attempt, max_delay)`, where `attempt` counts retries from 0, so the first retry waits exactly `base_delay`. No delay follows the final attempt: it either succeeds or its error is raised.
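As a quick check on the numbers, the sketch below computes the retry delays from that formula and a worst-case wall-clock bound for one call. It assumes `max_retries` counts retries after the first attempt (so 3 retries means 4 attempts, matching the schedule above) and that the timeout applies to each attempt separately; `retry_delays` and `worst_case_seconds` are illustrative helpers, not part of attocode.

```python
def retry_delays(max_retries: int, base_delay: float, max_delay: float) -> list[float]:
    """Delay before each retry: min(base_delay * 2**attempt, max_delay) for attempt = 0, 1, ..."""
    return [min(base_delay * 2 ** attempt, max_delay) for attempt in range(max_retries)]


def worst_case_seconds(
    max_retries: int, base_delay: float, max_delay: float, timeout: float
) -> float:
    """Upper bound (assumed per-attempt timeout): every attempt times out, plus all sleeps."""
    attempts = 1 + max_retries
    return attempts * timeout + sum(retry_delays(max_retries, base_delay, max_delay))


print(retry_delays(3, 1.0, 30.0))                # defaults -> [1.0, 2.0, 4.0]
print(retry_delays(6, 1.0, 30.0))                # -> [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
print(worst_case_seconds(3, 1.0, 30.0, 600.0))   # -> 2407.0
```

With the defaults the backoff itself adds only 7 seconds; under the per-attempt assumption, the 600-second timeout dominates the worst case at roughly 40 minutes.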

Error Classification

| Error type | Retryable? | Behavior |
|---|---|---|
| `asyncio.TimeoutError` | Yes | Wrapped as `ProviderError`, retried |
| `ProviderError(retryable=True)` | Yes | Retried with backoff |
| `ProviderError(retryable=False)` | No | Raised immediately (e.g., auth failure) |
| Other exceptions | Yes | Wrapped as `ProviderError`, retried |
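The classification table translates into a short decision procedure. The following is a minimal asyncio sketch of a retry loop that follows it. `ProviderError` here is a local stand-in that models only the `retryable` flag, and `call_with_retries` is a hypothetical helper, not attocode's actual implementation.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class ProviderError(Exception):
    """Stand-in for attocode's ProviderError; only the `retryable` flag is modeled."""

    def __init__(self, message: str, retryable: bool = True) -> None:
        super().__init__(message)
        self.retryable = retryable


async def call_with_retries(
    call: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
) -> T:
    last_error: Optional[ProviderError] = None
    for attempt in range(max_retries + 1):
        try:
            # Enforce the timeout on each attempt.
            return await asyncio.wait_for(call(), timeout=timeout)
        except ProviderError as err:
            if not err.retryable:
                raise  # e.g. an auth failure: retrying will not help
            last_error = err
        except asyncio.TimeoutError:
            last_error = ProviderError(f"request timed out after {timeout}s")
        except Exception as err:  # unclassified errors are wrapped and treated as retryable
            last_error = ProviderError(str(err))
        if attempt < max_retries:  # no sleep after the final attempt
            await asyncio.sleep(min(base_delay * 2 ** attempt, max_delay))
    assert last_error is not None
    raise last_error
```

A non-retryable `ProviderError` escapes on the first attempt, while timeouts and unknown exceptions are normalized into retryable `ProviderError`s so the caller only ever sees one exception type.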

Statistics

The resilient provider tracks operational metrics:

```python
stats = provider.stats
stats.total_calls          # Total chat() invocations
stats.successful_calls     # Calls that succeeded
stats.retried_calls        # Calls that required retries
stats.timed_out_calls      # Calls that hit timeout
stats.circuit_broken_calls # Calls rejected by circuit breaker
stats.total_retry_count    # Total retry attempts across all calls
```

Circuit Breaker

The circuit breaker prevents cascading failures by failing fast when a provider is consistently down.

Three States

```mermaid
stateDiagram-v2
    [*] --> Closed
    Closed --> Open : 5 consecutive failures
    Open --> HalfOpen : After 60s timeout
    HalfOpen --> Closed : Probe succeeds
    HalfOpen --> Open : Probe fails
```

| State | Behavior |
|---|---|
| Closed | Normal operation. Requests pass through; each failure increments the failure counter. |
| Open | Fast failure. All requests immediately raise `ProviderError` without reaching the provider. After the reset timeout (60 s by default), the circuit moves to half-open. |
| Half-Open | Recovery probe. One request is allowed through: if it succeeds, the circuit closes; if it fails, the circuit reopens. |
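The three states fit in a small state machine. The sketch below is a hypothetical `CircuitBreaker` class written for illustration, not attocode's class. It takes an injectable clock so the reset timeout can be exercised without sleeping, and it assumes the half-open state admits exactly one probe at a time.

```python
import time
from typing import Callable


class CircuitBreaker:
    def __init__(
        self,
        threshold: int = 5,
        reset_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.reset_seconds = reset_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = 0.0
        self._state = "closed"
        self._probe_in_flight = False

    @property
    def state(self) -> str:
        # An open circuit becomes half-open once the reset timeout has elapsed.
        if self._state == "open" and self.clock() - self.opened_at >= self.reset_seconds:
            self._state = "half_open"
            self._probe_in_flight = False
        return self._state

    def allow_request(self) -> bool:
        state = self.state
        if state == "closed":
            return True
        if state == "half_open" and not self._probe_in_flight:
            self._probe_in_flight = True  # let exactly one probe through
            return True
        return False  # open, or a probe is already running: fail fast

    def record_success(self) -> None:
        self.failures = 0  # any success resets the consecutive-failure count
        self._state = "closed"

    def record_failure(self) -> None:
        self.failures += 1
        if self._state == "half_open" or self.failures >= self.threshold:
            self._state = "open"  # trip (or re-trip after a failed probe)
            self.opened_at = self.clock()

    def reset(self) -> None:
        """Manual reset back to closed."""
        self.failures = 0
        self._state = "closed"
```

Computing the open-to-half-open transition lazily inside `state` avoids a background timer: the breaker only checks the clock when someone asks whether a request may proceed.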

When It Trips

The circuit opens when consecutive failures reach the threshold (default 5). A single success at any point resets the failure counter to zero.

```python
# Check circuit breaker state
state = provider.circuit_breaker_state  # "closed", "open", or "half_open"

# Manual reset (e.g., after fixing an API key)
provider.reset_circuit_breaker()
```

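Fallback Chain

`ProviderFallbackChain` tries providers in order, falling back to the next one on failure:

```python
import logging

from attocode.providers import ProviderFallbackChain

logger = logging.getLogger(__name__)

# anthropic_provider, openrouter_provider, openai_provider and their
# circuit breakers cb1..cb3 are assumed to be constructed earlier.
chain = ProviderFallbackChain(
    providers=[anthropic_provider, openrouter_provider, openai_provider],
    circuit_breakers={"anthropic": cb1, "openrouter": cb2, "openai": cb3},
    on_fallback=lambda name, err: logger.warning("Fell back from %s: %s", name, err),
)
```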

Fallback Behavior

1. Try the first provider.
2. If it fails (or its circuit breaker is open), move on to the next.
3. Continue until one succeeds or all fail.
4. If all fail, raise a `ProviderError` that combines the individual error messages.

Providers with open circuit breakers are skipped without consuming an attempt.

Health-Based Reordering

```python
# Get the list of healthy providers (not circuit-broken)
healthy = chain.get_healthy_providers()

# Reorder: put healthy providers first
chain.reorder_by_health()
```

`reorder_by_health()` uses a stable sort, so healthy providers keep their relative order while circuit-broken providers move to the end.
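Because Python's `sorted` is stable, that ordering rule can be expressed as a sort on a boolean key. The sketch below assumes a hypothetical `is_healthy` predicate per provider name; it illustrates the rule, not the chain's internal code.

```python
from typing import Callable, List, Sequence


def reorder_by_health(names: Sequence[str], is_healthy: Callable[[str], bool]) -> List[str]:
    # The key is False for healthy providers, and False sorts before True,
    # so healthy providers come first; stability preserves order within each group.
    return sorted(names, key=lambda name: not is_healthy(name))


broken = {"anthropic"}
print(reorder_by_health(["anthropic", "openrouter", "openai"], lambda n: n not in broken))
# -> ['openrouter', 'openai', 'anthropic']
```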

Statistics

```python
stats = chain.stats
stats.total_calls         # Total chat() invocations
stats.primary_successes   # Calls handled by the first provider
stats.fallback_successes  # Calls that required fallback
stats.total_failures      # All providers failed
stats.fallback_rate       # fallback_successes / total_calls
```
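The chain statistics relate to each other as in this small stand-in dataclass, which reuses the field names above. Two details are assumptions: that every call ends in exactly one of the three outcomes, and that `fallback_rate` is 0.0 before any calls (the page does not say).

```python
from dataclasses import dataclass


@dataclass
class ChainStats:
    total_calls: int = 0
    primary_successes: int = 0
    fallback_successes: int = 0
    total_failures: int = 0

    @property
    def fallback_rate(self) -> float:
        # fallback_successes / total_calls, guarded for the no-calls case
        return self.fallback_successes / self.total_calls if self.total_calls else 0.0


stats = ChainStats(total_calls=200, primary_successes=180, fallback_successes=16, total_failures=4)
print(stats.fallback_rate)  # -> 0.08
```

A rising fallback rate with few total failures usually means the primary provider is degraded while the chain is still absorbing the impact.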

Model Cache

The model cache fetches live model metadata from OpenRouter's API at startup and uses it for context window sizes and pricing for the rest of the session.

Initialization

```python
import asyncio

from attocode.providers.model_cache import init_model_cache

# init_model_cache() is a coroutine; inside async startup code, await it directly.
asyncio.run(init_model_cache())  # Fetches from OpenRouter /api/v1/models
```

- Runs once at agent startup (idempotent: skipped if the cache is still fresh)
- Uses `OPENROUTER_API_KEY` for authentication if set (optional; improves rate limits)
- TTL: 1 hour (3600 seconds)
- Graceful degradation: fetch failures are silent, and lookups fall back to the built-in model data
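The initialization rules above (idempotent, a 1-hour TTL, silent failure) follow a common pattern. The sketch below is hypothetical: `ModelCacheSketch` is not attocode's module, and the injected `fetch` coroutine stands in for the HTTP request to OpenRouter so the behavior can be shown without network access.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Optional

CACHE_TTL_SECONDS = 3600.0  # 1 hour, as documented above

# Hypothetical fetcher shape: returns model id -> metadata.
Fetcher = Callable[[], Awaitable[Dict[str, Any]]]


class ModelCacheSketch:
    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        self.models: Dict[str, Any] = {}
        self.fetched_at: Optional[float] = None

    def is_fresh(self) -> bool:
        return (
            self.fetched_at is not None
            and time.monotonic() - self.fetched_at < CACHE_TTL_SECONDS
        )

    async def init(self) -> None:
        if self.is_fresh():
            return  # idempotent: a fresh cache is never refetched
        try:
            self.models = await self._fetch()
            self.fetched_at = time.monotonic()
        except Exception:
            # Graceful degradation: swallow the error and leave self.models unchanged;
            # lookups then fall back to built-in data.
            return


async def _fake_fetch() -> Dict[str, Any]:
    return {"example/model": {"context_length": 200_000}}


cache = ModelCacheSketch(_fake_fetch)
asyncio.run(cache.init())
asyncio.run(cache.init())  # second call is a no-op while the cache is fresh
```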

What It Caches

| Data | Source | Fallback |
|---|---|---|
| Context window sizes | Live API | `BUILTIN_MODELS` constants |
| Input pricing (per million tokens) | Live API | `BUILTIN_MODELS` constants |
| Output pricing (per million tokens) | Live API | `BUILTIN_MODELS` constants |
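OpenRouter's `/api/v1/models` endpoint reports `pricing.prompt` and `pricing.completion` as USD per token, encoded as strings, so converting to a per-million rate is a multiplication by 1,000,000. The sketch below assumes that response shape and uses a made-up `BUILTIN` dictionary in place of the real `BUILTIN_MODELS` constants to illustrate the fallback column; `pricing_for` is a hypothetical helper.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical stand-in for BUILTIN_MODELS: model id -> (input, output) USD per million tokens.
BUILTIN: Dict[str, Tuple[float, float]] = {"example/model-a": (1.0, 2.0)}


def per_million(per_token: str) -> float:
    """Convert an assumed per-token price string (e.g. "0.000003") to USD per million tokens."""
    return float(per_token) * 1_000_000


def pricing_for(
    model_id: str, live: Dict[str, Dict[str, Any]]
) -> Optional[Tuple[float, float]]:
    entry = live.get(model_id)
    if entry is not None and "pricing" in entry:
        p = entry["pricing"]
        return per_million(p["prompt"]), per_million(p["completion"])
    return BUILTIN.get(model_id)  # no live data: fall back to the built-in constants
```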

Fuzzy Model ID Matching

Model IDs vary across providers (e.g., `claude-sonnet-4-20250514` vs `anthropic/claude-sonnet-4-20250514`). The cache resolves an ID by trying three matches in order:

1. Exact match: direct lookup.
2. Suffix match: the suffix of `model_id` matches the suffix of a cache key.
3. Base match: a cache key's suffix starts with the base of `model_id` (its date suffix stripped).
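One possible reading of the three steps, written as a lookup function. Two details are assumptions rather than facts about attocode: "suffix" is taken to mean the part of an ID after its last `/`, and the date suffix is taken to be a trailing `-YYYYMMDD`. The name `resolve_model_id` is hypothetical.

```python
import re
from typing import Iterable, Optional

DATE_SUFFIX = re.compile(r"-\d{8}$")  # assumed date format, e.g. "-20250514"


def _suffix(model_id: str) -> str:
    """Part of the ID after the provider prefix, e.g. 'anthropic/x' -> 'x'."""
    return model_id.rsplit("/", 1)[-1]


def resolve_model_id(model_id: str, keys: Iterable[str]) -> Optional[str]:
    candidates = list(keys)
    # 1. Exact match
    if model_id in candidates:
        return model_id
    # 2. Suffix match: compare the parts after any provider prefix
    suffix = _suffix(model_id)
    for key in candidates:
        if _suffix(key) == suffix:
            return key
    # 3. Base match: strip the date suffix, then match by prefix
    base = DATE_SUFFIX.sub("", suffix)
    for key in candidates:
        if _suffix(key).startswith(base):
            return key
    return None
```

Because the first matching key wins at each step, key order matters when a prefix such as `claude-sonnet-4` could match more than one cached model.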

Anomaly Detection

Context lengths below 1,024 tokens are treated as anomalous and skipped. This prevents bad API data from breaking context window calculations.
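Applied while parsing the fetched model list, the rule is a single comparison. This sketch assumes entries shaped like OpenRouter's (`id`, `context_length`) and also treats missing or non-integer lengths as anomalous, which goes beyond what this page states; `context_lengths` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable

MIN_CONTEXT_LENGTH = 1024  # reported lengths below this are treated as bad data


def context_lengths(entries: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    lengths: Dict[str, int] = {}
    for entry in entries:
        length = entry.get("context_length")
        # Skip anomalous values so one bad record cannot shrink a model's context window.
        if not isinstance(length, int) or length < MIN_CONTEXT_LENGTH:
            continue
        lengths[entry["id"]] = length
    return lengths
```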

Reconciliation

After fetching live data, the cache compares it against the hardcoded `BUILTIN_MODELS` and patches any mismatched values in place. It logs a warning whenever a builtin value has drifted from the live data, which helps maintainers keep the hardcoded fallbacks accurate.
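A sketch of the reconcile step, assuming for simplicity that the built-in table maps model IDs to context lengths (the real `BUILTIN_MODELS` likely holds more fields). `reconcile` and the sample data are hypothetical; the point is the shape of the step: compare, warn, patch in place.

```python
import logging
from typing import Dict

logger = logging.getLogger("model_cache_sketch")


def reconcile(builtin: Dict[str, int], live: Dict[str, int]) -> int:
    """Patch builtin context lengths in place from live data; return how many were patched."""
    patched = 0
    for model_id, live_value in live.items():
        old = builtin.get(model_id)
        # Only models already in the builtin table are compared; new models are not added.
        if old is not None and old != live_value:
            logger.warning(
                "builtin %s context_length %d drifted from live %d", model_id, old, live_value
            )
            builtin[model_id] = live_value
            patched += 1
    return patched
```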

Querying the Cache

```python
from attocode.providers.model_cache import (
    get_cached_context_length,
    get_cached_pricing,
    is_cache_initialized,
)

context_len = get_cached_context_length("claude-sonnet-4-20250514")  # e.g., 200000
pricing = get_cached_pricing("claude-sonnet-4-20250514")
# pricing.input_rate: cost per million input tokens
# pricing.output_rate: cost per million output tokens
```
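Because the rates are per million tokens, turning a request's token counts into a dollar cost is a short calculation. The sketch below uses a stand-in `Pricing` dataclass with the `input_rate` and `output_rate` fields described above; `request_cost` is a hypothetical helper, and the rates in the example are illustrative, not live prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_rate: float   # USD per million input tokens
    output_rate: float  # USD per million output tokens


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: tokens times per-million rate, divided by one million."""
    return (input_tokens * pricing.input_rate + output_tokens * pricing.output_rate) / 1_000_000


print(request_cost(Pricing(input_rate=3.0, output_rate=15.0), 10_000, 2_000))  # -> 0.06
```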

Related Pages

- Providers: provider configuration and API keys
- Budget System: how pricing feeds into cost tracking
- Troubleshooting: diagnosing provider issues
