Getting Started

Cloud Provider

FastLLM talks to cloud models through autoagents-llm providers, which are registered on and called through LlmGateway.

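use fastllm::{LlmGateway, LlmMessage, LlmRequest, ModelRoute, ProviderConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create the gateway and register an OpenAI provider for gpt-4o-mini,
    // configured from the environment.
    let gateway = LlmGateway::new();
    gateway.register_provider_config(ProviderConfig::from_env("openai", "gpt-4o-mini"))?;

    // Send a single-message chat request, routed to the provider and model
    // registered above.
    let response = gateway
        .chat(LlmRequest::new(
            ModelRoute::new("openai", "gpt-4o-mini"),
            vec![LlmMessage::user("What is the capital of France?")],
        ))
        .await?;

    // Print the text of the model's reply.
    println!("{}", response.text);
    Ok(())
}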

Run the included example:

OPENAI_API_KEY=sk-... cargo run -p fastllm-example-hello-world

Request Options

LlmRequest carries routing and scheduling metadata:

route: provider and model.
request_id: optional ID used in deadline errors.
deadline_ms: per-request scheduler timeout.
priority: reserved for scheduler ordering.
cache: enables or disables prompt-cache lookup for the request.
parameters: output-affecting parameters included in the cache key.
provider_parameters: provider-only metadata excluded from the cache key.
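
The split between parameters and provider_parameters can be illustrated with a small standalone sketch. It is not FastLLM's actual cache-key code: SketchRequest, base_request, and cache_key are hypothetical names, and the choice to hash the route and prompt alongside parameters is an assumption. The only behavior taken from the list above is that parameters affect the key and provider_parameters do not.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the request fields described above.
#[allow(dead_code)]
struct SketchRequest {
    provider: String,
    model: String,
    prompt: String,
    /// Output-affecting parameters: part of the cache key.
    parameters: BTreeMap<String, String>,
    /// Provider-only metadata: deliberately left out of the cache key.
    provider_parameters: BTreeMap<String, String>,
}

/// Builds a sample request used by the demo below (hypothetical helper).
fn base_request() -> SketchRequest {
    let mut parameters = BTreeMap::new();
    parameters.insert("temperature".to_string(), "0.2".to_string());
    SketchRequest {
        provider: "openai".to_string(),
        model: "gpt-4o-mini".to_string(),
        prompt: "What is the capital of France?".to_string(),
        parameters,
        provider_parameters: BTreeMap::new(),
    }
}

/// Assumed key derivation: hash route, prompt, and parameters only.
fn cache_key(req: &SketchRequest) -> u64 {
    let mut hasher = DefaultHasher::new();
    req.provider.hash(&mut hasher);
    req.model.hash(&mut hasher);
    req.prompt.hash(&mut hasher);
    // BTreeMap iterates in sorted key order, so insertion order does not matter.
    req.parameters.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let a = base_request();

    // Changing provider-only metadata leaves the key unchanged.
    let mut b = base_request();
    b.provider_parameters
        .insert("trace_id".to_string(), "abc123".to_string());
    println!("same key: {}", cache_key(&a) == cache_key(&b));

    // Changing an output-affecting parameter produces a different key.
    let mut c = base_request();
    c.parameters
        .insert("temperature".to_string(), "0.9".to_string());
    println!("different key: {}", cache_key(&a) != cache_key(&c));
}
```

In this sketch, two requests that differ only in provider_parameters share a cache entry, while a change to any entry in parameters is treated as a different request.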

Local Provider

Local llama.cpp support is gated behind the local Cargo feature. To run the test suite with it enabled:

cargo test --workspace --all-targets --features local

When the local feature is enabled, call register_llamacpp_model with a ModelConfig that sets model_path.

Run the local examples with the default Hugging Face GGUF model:

cargo run -p fastllm-example-local-model-inference

cargo run -p fastllm-example-parallel-local-inference

The default model is Qwen3.5-9B-Q4_0.gguf from the unsloth/Qwen3.5-9B-GGUF repository. Set FASTLLM_GGUF_MODEL, FASTLLM_GGUF_MODEL_A, or FASTLLM_GGUF_MODEL_B to use local files instead.
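
The override behavior can be pictured with a short standalone sketch. How the examples actually resolve the model is not shown here, so ModelSource and resolve_model are hypothetical names, and treating an empty variable as unset is an assumption. Only the variable name and the default repository and file come from the text above.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical description of where the GGUF weights come from.
#[derive(Debug, PartialEq)]
enum ModelSource {
    /// A local file supplied through an environment variable.
    LocalFile(PathBuf),
    /// The default Hugging Face repository and file.
    HuggingFace {
        repo: &'static str,
        file: &'static str,
    },
}

/// Prefer a non-empty override path; otherwise fall back to the default model.
fn resolve_model(override_path: Option<String>) -> ModelSource {
    match override_path {
        Some(path) if !path.trim().is_empty() => ModelSource::LocalFile(PathBuf::from(path)),
        _ => ModelSource::HuggingFace {
            repo: "unsloth/Qwen3.5-9B-GGUF",
            file: "Qwen3.5-9B-Q4_0.gguf",
        },
    }
}

fn main() {
    // Read the override from the environment; a missing variable becomes None.
    let source = resolve_model(env::var("FASTLLM_GGUF_MODEL").ok());
    println!("{source:?}");
}
```

The same pattern would apply to FASTLLM_GGUF_MODEL_A and FASTLLM_GGUF_MODEL_B, with each variable resolved independently.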

Run SDK behavior examples without external services:

cargo run -p fastllm-example-scheduler-showcase
cargo run -p fastllm-example-memory-management
