FastLLM

FastLLM is a Rust SDK for routing LLM requests through one gateway. It supports cloud providers through autoagents-llm and local GGUF models through the optional autoagents-llamacpp runtime.

The SDK is configured with typed Rust structs. There is no YAML or TOML config surface in the current API.

Core Pieces

LlmGateway is the SDK entry point.
ModelRoute selects a provider and model.
GatewayConfig controls scheduling, cache, retry, and local memory budget.
ExecutionScheduler applies queue limits, per-route concurrency, and request deadlines.
PromptCache stores completed chat responses with TTL eviction.
RetryPipeline applies retry and fallback policy.
ModelRegistry tracks local model residency, TTL, and memory accounting.
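
To make the TTL eviction idea behind PromptCache concrete, here is a minimal, std-only sketch. It is not the FastLLM implementation: TtlCache and its methods are hypothetical names invented for illustration, and the real PromptCache may key, store, and evict entries differently. The caller passes the current time in explicitly so the behaviour is easy to reason about.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical illustration type; NOT part of the FastLLM API.
/// Maps a prompt key to a completed response plus the time it was stored.
struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Store a completed response under `key`, stamped with `now`.
    fn insert(&mut self, key: &str, response: &str, now: Instant) {
        self.entries
            .insert(key.to_string(), (now, response.to_string()));
    }

    /// Return the cached response if it is younger than the TTL.
    /// An expired entry is evicted lazily on lookup and reported as a miss.
    fn get(&mut self, key: &str, now: Instant) -> Option<String> {
        let expired = match self.entries.get(key) {
            Some((stored_at, _)) => now.duration_since(*stored_at) >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key).map(|(_, response)| response.clone())
    }

    /// Number of entries currently held, including any not yet evicted.
    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

With a 60-second TTL, a response inserted at time t is returned by a lookup at t + 30s and treated as a miss (and removed) by a lookup at t + 61s. This sketch evicts lazily on read; a background sweep is another common design, and the source does not say which one PromptCache uses.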

Install

[dependencies]
fastllm = { path = "crates/fastllm" }

Enable local llama.cpp support when needed:

[dependencies]
fastllm = { path = "crates/fastllm", features = ["local"] }

Start Here

Examples

fastllm-example-hello-world: OpenAI provider request.
fastllm-example-scheduler-showcase: scheduler, cache, and telemetry with EchoProvider.
fastllm-example-memory-management: local model memory-budget admission.
fastllm-example-local-model-inference: single local GGUF model.
fastllm-example-parallel-local-inference: two local GGUF routes executed concurrently.
