FastLLM
FastLLM is a Rust SDK for routing LLM requests through one gateway. It supports
cloud providers through autoagents-llm and local GGUF models through the
optional autoagents-llamacpp runtime.
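As a rough illustration of routing through one gateway, the sketch below keeps named routes behind a single entry point and falls back from one route to the next when a provider fails. The Provider trait, the Echo and Down stand-in backends, and the retry-then-fallback order are all hypothetical. They are not FastLLM's types, and the real RetryPipeline policy may differ.

```rust
use std::collections::HashMap;

/// Hypothetical provider abstraction; FastLLM's real traits may differ.
trait Provider {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Hypothetical stand-in for a cloud or local backend that always answers.
struct Echo;
impl Provider for Echo {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

/// Hypothetical stand-in for a backend that is currently unavailable.
struct Down;
impl Provider for Down {
    fn complete(&self, _prompt: &str) -> Result<String, String> {
        Err("provider unavailable".to_string())
    }
}

/// One entry point that owns every route by name.
struct Gateway {
    routes: HashMap<String, Box<dyn Provider>>,
    max_attempts: u32,
}

impl Gateway {
    /// Try each route in `order`, retrying each up to `max_attempts` times,
    /// and return the first success as (route name, response).
    fn complete(&self, order: &[&str], prompt: &str) -> Result<(String, String), String> {
        let mut last_err = String::from("no routes given");
        for &route in order {
            let Some(provider) = self.routes.get(route) else {
                last_err = format!("unknown route {route}");
                continue;
            };
            for _ in 0..self.max_attempts {
                match provider.complete(prompt) {
                    Ok(text) => return Ok((route.to_string(), text)),
                    Err(e) => last_err = e,
                }
            }
        }
        Err(last_err)
    }
}

fn main() {
    let mut routes: HashMap<String, Box<dyn Provider>> = HashMap::new();
    routes.insert("cloud".into(), Box::new(Down));
    routes.insert("local".into(), Box::new(Echo));
    let gateway = Gateway { routes, max_attempts: 2 };

    // "cloud" fails on both attempts, so the request falls back to "local".
    let (route, text) = gateway.complete(&["cloud", "local"], "hi").unwrap();
    println!("{route}: {text}"); // prints "local: echo: hi"
}
```

Keeping every backend behind one trait object is what lets the caller choose cloud or local purely by route name.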
The SDK is configured with typed Rust structs. There is no YAML or TOML config surface in the current API.
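To make the typed-struct approach concrete, here is a minimal, self-contained sketch. GatewayConfigSketch and all of its field names and default values are hypothetical. They mirror the concerns GatewayConfig is described as controlling (scheduling, cache, retry, local memory budget) but are not FastLLM's actual fields. The compiler checks field names and types, and a validation step can reject bad values before a gateway is built; with a YAML or TOML file those mistakes would only surface at load time.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the concerns GatewayConfig covers.
/// Field names and defaults are illustrative, not FastLLM's API.
#[derive(Debug, Clone)]
pub struct GatewayConfigSketch {
    /// Maximum number of requests waiting in the scheduler queue.
    pub max_queue_len: usize,
    /// Maximum in-flight requests per route.
    pub per_route_concurrency: usize,
    /// Deadline applied to each request.
    pub request_timeout: Duration,
    /// Time-to-live for cached chat responses.
    pub cache_ttl: Duration,
    /// Retry attempts before falling back to another route.
    pub max_retries: u32,
    /// Memory budget for resident local models, in bytes.
    pub local_memory_budget_bytes: u64,
}

impl Default for GatewayConfigSketch {
    fn default() -> Self {
        Self {
            max_queue_len: 256,
            per_route_concurrency: 4,
            request_timeout: Duration::from_secs(60),
            cache_ttl: Duration::from_secs(300),
            max_retries: 2,
            local_memory_budget_bytes: 8 * 1024 * 1024 * 1024,
        }
    }
}

impl GatewayConfigSketch {
    /// Reject settings that would make the gateway unusable.
    pub fn validate(&self) -> Result<(), String> {
        if self.per_route_concurrency == 0 {
            return Err("per_route_concurrency must be at least 1".into());
        }
        if self.request_timeout.is_zero() {
            return Err("request_timeout must be non-zero".into());
        }
        Ok(())
    }
}

fn main() {
    // Struct-update syntax overrides only the fields the caller cares about.
    let config = GatewayConfigSketch {
        per_route_concurrency: 2,
        cache_ttl: Duration::from_secs(60),
        ..Default::default()
    };
    assert!(config.validate().is_ok());
    println!("{config:?}");
}
```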
Core Pieces
- LlmGateway is the SDK entry point.
- ModelRoute selects a provider and model.
- GatewayConfig controls scheduling, cache, retry, and local memory budget.
- ExecutionScheduler applies queue limits, per-route concurrency, and request deadlines.
- PromptCache stores completed chat responses with TTL eviction.
- RetryPipeline applies retry and fallback policy.
- ModelRegistry tracks local model residency, TTL, and memory accounting.
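PromptCache is described as storing completed chat responses with TTL eviction. A minimal sketch of that idea follows. TtlCache is hypothetical, keys are plain strings (a real cache would presumably key on the route and the full message list), and the current time is passed in explicitly so expiry is deterministic and easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical TTL cache mapping a prompt key to a completed response.
/// This illustrates the technique; it is not FastLLM's PromptCache.
pub struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (String, Instant)>,
}

impl TtlCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Store a response, stamping it with its insertion time.
    pub fn insert(&mut self, key: &str, response: &str, now: Instant) {
        self.entries.insert(key.to_string(), (response.to_string(), now));
    }

    /// Return a cached response if it is younger than the TTL.
    /// An expired entry is evicted on lookup.
    pub fn get(&mut self, key: &str, now: Instant) -> Option<String> {
        let expired = match self.entries.get(key) {
            Some((_, inserted)) => now.duration_since(*inserted) >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(key);
            None
        } else {
            self.entries.get(key).map(|(resp, _)| resp.clone())
        }
    }

    /// Drop every expired entry, e.g. from a periodic sweep.
    pub fn evict_expired(&mut self, now: Instant) {
        let ttl = self.ttl;
        self.entries.retain(|_, entry| now.duration_since(entry.1) < ttl);
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}

fn main() {
    let start = Instant::now();
    let mut cache = TtlCache::new(Duration::from_secs(30));
    cache.insert("model-a:hello", "Hi there!", start);

    // Within the TTL: cache hit.
    let hit = cache.get("model-a:hello", start + Duration::from_secs(10));
    assert_eq!(hit.as_deref(), Some("Hi there!"));

    // Past the TTL: cache miss, and the entry is evicted.
    assert_eq!(cache.get("model-a:hello", start + Duration::from_secs(31)), None);
    assert_eq!(cache.len(), 0);
}
```

This sketch evicts lazily on lookup and also offers an explicit sweep; whether PromptCache uses either strategy is not stated here.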
Install
[dependencies]
fastllm = { path = "crates/fastllm" }
Enable local llama.cpp support when needed:
[dependencies]
fastllm = { path = "crates/fastllm", features = ["local"] }
Start Here
Examples
- fastllm-example-hello-world: OpenAI provider request.
- fastllm-example-scheduler-showcase: scheduler, cache, and telemetry with EchoProvider.
- fastllm-example-memory-management: local model memory-budget admission.
- fastllm-example-local-model-inference: single local GGUF model.
- fastllm-example-parallel-local-inference: two local GGUF routes executed concurrently.
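The memory-management example is described as local model memory-budget admission. The sketch below shows one way such admission could work, assuming least-recently-used eviction. MemoryBudget, Admission, and the model names are hypothetical, and ModelRegistry's real policy (including its TTL handling) may differ. A model larger than the whole budget is rejected outright. Otherwise, resident models are evicted oldest-first until the new one fits.

```rust
use std::collections::HashMap;

/// Hypothetical admission outcome; not FastLLM's ModelRegistry API.
#[derive(Debug, PartialEq)]
pub enum Admission {
    /// Model is resident; lists models evicted to make room (may be empty).
    Admitted { evicted: Vec<String> },
    /// Model is larger than the whole budget and can never fit.
    Rejected,
}

pub struct MemoryBudget {
    budget_bytes: u64,
    used_bytes: u64,
    /// model name -> (size in bytes, logical time of last use)
    resident: HashMap<String, (u64, u64)>,
    clock: u64,
}

impl MemoryBudget {
    pub fn new(budget_bytes: u64) -> Self {
        Self { budget_bytes, used_bytes: 0, resident: HashMap::new(), clock: 0 }
    }

    pub fn admit(&mut self, name: &str, size_bytes: u64) -> Admission {
        self.clock += 1;
        if let Some(entry) = self.resident.get_mut(name) {
            entry.1 = self.clock; // already loaded: just refresh recency
            return Admission::Admitted { evicted: Vec::new() };
        }
        if size_bytes > self.budget_bytes {
            return Admission::Rejected;
        }
        let mut evicted = Vec::new();
        // Evict least-recently-used models until the new one fits.
        while self.used_bytes + size_bytes > self.budget_bytes {
            let lru = self
                .resident
                .iter()
                .min_by_key(|(_, entry)| entry.1)
                .map(|(model, _)| model.clone())
                .expect("over budget implies at least one resident model");
            let (freed, _) = self.resident.remove(&lru).unwrap();
            self.used_bytes -= freed;
            evicted.push(lru);
        }
        self.resident.insert(name.to_string(), (size_bytes, self.clock));
        self.used_bytes += size_bytes;
        Admission::Admitted { evicted }
    }

    pub fn used_bytes(&self) -> u64 {
        self.used_bytes
    }
}

fn main() {
    let gib: u64 = 1024 * 1024 * 1024;
    let mut registry = MemoryBudget::new(8 * gib);
    assert_eq!(registry.admit("llama-7b-q4", 4 * gib), Admission::Admitted { evicted: vec![] });
    assert_eq!(registry.admit("mistral-7b-q4", 4 * gib), Admission::Admitted { evicted: vec![] });
    // Touch llama so mistral becomes the least recently used model.
    registry.admit("llama-7b-q4", 4 * gib);
    // A third model only fits after evicting the LRU model.
    assert_eq!(
        registry.admit("phi-3-q4", 2 * gib),
        Admission::Admitted { evicted: vec!["mistral-7b-q4".to_string()] }
    );
    assert_eq!(registry.admit("huge-70b", 40 * gib), Admission::Rejected);
    assert_eq!(registry.used_bytes(), 6 * gib);
}
```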