LLM Optimization Pipelines
AutoAgents provides a composable LLM pipeline layer in autoagents-llm to optimize inference latency, reliability, and cost without changing agent code.
This feature is available through:
- autoagents::llm::pipeline::PipelineBuilder
- autoagents::llm::optim::{CacheLayer, RetryLayer, FallbackLayer}
Enable Feature Flag
Enable the optim feature on autoagents:
autoagents = { git = "https://github.com/liquidos-ai/AutoAgents", features = ["openai", "optim"] }
Or directly on autoagents-llm:
autoagents-llm = { git = "https://github.com/liquidos-ai/AutoAgents", features = ["optim"] }
Why Pipelines
Pipelines let you keep your agent code provider-agnostic while adding operational behavior:
- Response caching to reduce repeated network calls
- Retry with backoff on transient errors
- Fallback routing to alternate providers on failure
The final built value is still an Arc<dyn LLMProvider>, so existing AgentBuilder code remains unchanged.
Basic Composition
use autoagents::llm::LLMProvider;
use autoagents::llm::optim::{CacheConfig, CacheLayer, FallbackLayer, RetryConfig, RetryLayer};
use autoagents::llm::pipeline::PipelineBuilder;
use std::sync::Arc;
use std::time::Duration;

// `primary_provider` and `fallback_provider` are providers constructed elsewhere.
let llm: Arc<dyn LLMProvider> = PipelineBuilder::new(primary_provider)
    // Outermost: cache hits return before any retry or fallback logic runs.
    .add_layer(CacheLayer::new(CacheConfig {
        ttl: Some(Duration::from_secs(3600)),
        max_size: Some(1000),
        ..CacheConfig::default()
    }))
    .add_layer(RetryLayer::new(RetryConfig::default()))
    .add_layer(FallbackLayer::new(vec![fallback_provider]))
    .build();
Layer Order
Layers are applied so that the first added layer is the outermost interceptor.
For:
PipelineBuilder::new(base)
    .add_layer(LayerA)
    .add_layer(LayerB)
    .build()
request flow is:
LayerA -> LayerB -> base provider
The order determines behavior:
- Place cache outside retry/fallback if you want cache hits to bypass all network logic.
- Place retry outside fallback if you want one global retry around the whole fallback chain.
- Place retry inside each fallback provider if you need per-provider retry policy.
- Place guardrails outermost so input is validated before cache/network work and outputs are checked after inner layers complete.
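The ordering rule can be illustrated with a small std-only model. This is a sketch, not the autoagents implementation: `Provider`, `Base`, `Named`, and `ToyBuilder` are hypothetical stand-ins, and each "layer" only records its name before delegating to the layer inside it.

```rust
use std::sync::{Arc, Mutex};

// Shared log of which component handled the request, in order.
type Trace = Arc<Mutex<Vec<String>>>;

// Hypothetical stand-in for a provider trait (not the real LLMProvider).
trait Provider {
    fn chat(&self, prompt: &str) -> String;
}

// Innermost provider: records itself and echoes the prompt.
struct Base {
    trace: Trace,
}

impl Provider for Base {
    fn chat(&self, prompt: &str) -> String {
        self.trace.lock().unwrap().push("base".to_string());
        format!("echo: {}", prompt)
    }
}

// A toy layer: records its name, then delegates to the wrapped provider.
struct Named {
    name: &'static str,
    trace: Trace,
    inner: Arc<dyn Provider>,
}

impl Provider for Named {
    fn chat(&self, prompt: &str) -> String {
        self.trace.lock().unwrap().push(self.name.to_string());
        self.inner.chat(prompt)
    }
}

// Toy builder: collects layer names, then wraps the base provider.
struct ToyBuilder {
    base: Arc<dyn Provider>,
    layers: Vec<&'static str>,
    trace: Trace,
}

impl ToyBuilder {
    fn new(base: Arc<dyn Provider>, trace: Trace) -> Self {
        ToyBuilder { base, layers: Vec::new(), trace }
    }

    fn add_layer(mut self, name: &'static str) -> Self {
        self.layers.push(name);
        self
    }

    // Wrap from the last-added layer inward-out, so the first-added
    // layer ends up outermost and sees the request first.
    fn build(self) -> Arc<dyn Provider> {
        let ToyBuilder { base, layers, trace } = self;
        layers.into_iter().rev().fold(base, |inner, name| {
            Arc::new(Named { name, trace: trace.clone(), inner }) as Arc<dyn Provider>
        })
    }
}

// Builds LayerA -> LayerB -> base and returns the observed call order.
fn trace_order() -> Vec<String> {
    let trace: Trace = Arc::new(Mutex::new(Vec::new()));
    let base: Arc<dyn Provider> = Arc::new(Base { trace: trace.clone() });
    let llm = ToyBuilder::new(base, trace.clone())
        .add_layer("LayerA")
        .add_layer("LayerB")
        .build();
    llm.chat("hi");
    let order = trace.lock().unwrap().clone();
    order
}
```

Calling `trace_order()` yields the names in the order LayerA, LayerB, base, matching the request flow described above.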
Built-in Optimization Layers
CacheLayer
CacheLayer is an in-memory cache for chat, completion, embedding, and streaming responses.
CacheConfig:
- ttl: entry freshness duration
- max_size: per-cache-bucket maximum entries
- cache_completions: enable completion caching
- cache_embeddings: enable embedding caching
- cache_streaming: enable stream replay caching
Behavior notes:
- Non-streaming requests use single-flight on cache miss to coalesce identical concurrent calls.
- Streaming cache stores chunks only after a successful stream completion.
- chat_with_web_search is intentionally not cached.
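To make the roles of ttl and max_size concrete, here is a minimal std-only sketch of a freshness-bounded, size-bounded cache. `ToyCache` and its methods are hypothetical and do not mirror CacheLayer's internals; in particular, evicting the oldest entry when full is an assumption made for illustration, and single-flight coalescing is not modeled. Time is passed in explicitly so the behavior is deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical toy cache keyed by prompt text.
struct ToyCache {
    ttl: Duration,
    max_size: usize,
    entries: HashMap<String, (Instant, String)>,
}

impl ToyCache {
    fn new(ttl: Duration, max_size: usize) -> Self {
        ToyCache { ttl, max_size, entries: HashMap::new() }
    }

    // Returns the cached value if it exists and is younger than `ttl`.
    // Stale entries are removed on lookup.
    fn get(&mut self, key: &str, now: Instant) -> Option<String> {
        let fresh = match self.entries.get(key) {
            Some((stored_at, value))
                if now.saturating_duration_since(*stored_at) < self.ttl =>
            {
                Some(value.clone())
            }
            _ => None,
        };
        if fresh.is_none() {
            self.entries.remove(key);
        }
        fresh
    }

    // Stores a value; when full, evicts the oldest entry first
    // (an assumed policy, chosen only to keep the sketch simple).
    fn insert(&mut self, key: &str, value: &str, now: Instant) {
        if self.max_size == 0 {
            return;
        }
        if !self.entries.contains_key(key) && self.entries.len() >= self.max_size {
            let oldest = self
                .entries
                .iter()
                .min_by_key(|(_, (stored_at, _))| *stored_at)
                .map(|(k, _)| k.clone());
            if let Some(k) = oldest {
                self.entries.remove(&k);
            }
        }
        self.entries.insert(key.to_string(), (now, value.to_string()));
    }

    // Cache-aside lookup: only invoke `call` on a miss.
    fn get_or_call<F: FnOnce() -> String>(&mut self, key: &str, now: Instant, call: F) -> String {
        if let Some(hit) = self.get(key, now) {
            return hit;
        }
        let value = call();
        self.insert(key, &value, now);
        value
    }
}

// Sends the same prompt at t=0s, t=10s, and t=4000s with a 3600s ttl
// and returns how many times the underlying "provider" was invoked.
fn provider_calls_with_cache() -> u32 {
    let mut cache = ToyCache::new(Duration::from_secs(3600), 1000);
    let mut calls = 0;
    let t0 = Instant::now();
    let t1 = t0 + Duration::from_secs(10);
    let t2 = t0 + Duration::from_secs(4000);
    cache.get_or_call("same prompt", t0, || { calls += 1; "reply".to_string() });
    cache.get_or_call("same prompt", t1, || { calls += 1; "reply".to_string() });
    cache.get_or_call("same prompt", t2, || { calls += 1; "reply".to_string() });
    calls
}
```

In this model the second request is served from the cache and the third misses because the entry has outlived its ttl, so the provider is invoked twice for three identical requests.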
RetryLayer
RetryLayer adds automatic retry with exponential backoff and optional jitter.
RetryConfig:
- max_attempts
- initial_backoff
- max_backoff
- jitter
- retryable predicate
Default policy retries transient/provider/network-style failures and avoids retrying deterministic errors (for example auth/invalid-request style failures).
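The interaction between max_attempts, initial_backoff, max_backoff, and the retryable predicate can be sketched with std only. Everything here is hypothetical (`ToyRetry`, `CallError`, `is_retryable`), not the RetryLayer API. The sketch assumes max_attempts counts the first call, doubles the delay on each retry, omits jitter to stay deterministic, and records delays instead of sleeping.

```rust
use std::time::Duration;

// Hypothetical error type: one transient kind, one deterministic kind.
#[derive(Debug, PartialEq)]
enum CallError {
    Transient,
    Auth,
}

// Assumed predicate: only transient failures are worth retrying.
fn is_retryable(e: &CallError) -> bool {
    matches!(e, CallError::Transient)
}

struct ToyRetry {
    max_attempts: u32,
    initial_backoff: Duration,
    max_backoff: Duration,
}

impl ToyRetry {
    // Delay before retry number `retry` (0-based): initial * 2^retry, capped.
    fn backoff(&self, retry: u32) -> Duration {
        let factor = 2u32.saturating_pow(retry);
        self.initial_backoff.saturating_mul(factor).min(self.max_backoff)
    }

    // Runs `call` until it succeeds, fails with a non-retryable error,
    // or the attempt budget is spent. Returns the result together with
    // the delays a real layer would have slept for.
    fn run<F>(&self, mut call: F) -> (Result<String, CallError>, Vec<Duration>)
    where
        F: FnMut(u32) -> Result<String, CallError>,
    {
        let mut waits = Vec::new();
        let mut attempt = 0;
        loop {
            match call(attempt) {
                Ok(v) => return (Ok(v), waits),
                Err(e) if is_retryable(&e) && attempt + 1 < self.max_attempts => {
                    waits.push(self.backoff(attempt)); // a real layer would sleep here
                    attempt += 1;
                }
                Err(e) => return (Err(e), waits),
            }
        }
    }
}
```

With max_attempts = 4, initial_backoff = 100 ms, and max_backoff = 300 ms, three transient failures followed by a success produce delays of 100 ms, 200 ms, and 300 ms (the third is capped), while an auth-style error returns immediately with no delay.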
FallbackLayer
FallbackLayer routes requests to backup providers when errors are fallbackable.
FallbackConfig:
- fallbackable predicate
Behavior notes:
- Providers are tried in declared order.
- Non-fallbackable errors stop the chain immediately.
- Fallback providers are used as passed to FallbackLayer::new (they are not automatically wrapped by other inner pipeline layers around the primary provider).
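The first two behavior notes can be modeled in a few lines of std-only Rust. This is an illustrative sketch, not FallbackLayer itself: `ProviderError`, `ToyProvider`, `is_fallbackable`, and the sample providers are all hypothetical, and providers are plain function pointers for simplicity.

```rust
// Hypothetical error type: one fallbackable kind, one deterministic kind.
#[derive(Debug, PartialEq)]
enum ProviderError {
    Unavailable,
    InvalidRequest,
}

// Assumed predicate: only availability failures justify trying a backup.
fn is_fallbackable(e: &ProviderError) -> bool {
    matches!(e, ProviderError::Unavailable)
}

// A toy provider is just a function from prompt to reply or error.
type ToyProvider = fn(&str) -> Result<String, ProviderError>;

// Tries the primary, then each fallback in declared order.
// A non-fallbackable error stops the chain immediately.
fn chat_with_fallback(
    primary: ToyProvider,
    fallbacks: &[ToyProvider],
    prompt: &str,
) -> Result<String, ProviderError> {
    let mut last = match primary(prompt) {
        Ok(reply) => return Ok(reply),
        Err(e) if !is_fallbackable(&e) => return Err(e),
        Err(e) => e,
    };
    for provider in fallbacks {
        match provider(prompt) {
            Ok(reply) => return Ok(reply),
            Err(e) if !is_fallbackable(&e) => return Err(e),
            Err(e) => last = e,
        }
    }
    // Every provider failed with a fallbackable error; report the last one.
    Err(last)
}

// Sample providers used to exercise the chain.
fn down(_prompt: &str) -> Result<String, ProviderError> {
    Err(ProviderError::Unavailable)
}

fn rejects(_prompt: &str) -> Result<String, ProviderError> {
    Err(ProviderError::InvalidRequest)
}

fn backup(prompt: &str) -> Result<String, ProviderError> {
    Ok(format!("backup: {}", prompt))
}
```

With fallbacks `[down, backup]`, a `down` primary ends up answered by `backup`, whereas a `rejects` primary returns its error without consulting any fallback.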
Production Recommendations
- Set explicit ttl and max_size for predictable memory usage.
- Tune retry/backoff for your provider SLOs and rate limits.
- Keep fallback providers model-compatible with your prompt/tooling expectations.
- Use structured logging/telemetry around provider failures and fallback hops.
Example Crate
A runnable end-to-end example is available at:
examples/pipeline
Run it with:
OPENAI_API_KEY=... cargo run -p pipeline-example