
Getting Started

Cloud Provider

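FastLLM wraps autoagents-llm providers behind a single LlmGateway. The example below registers an OpenAI provider configured from the environment and sends one chat request to gpt-4o-mini.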

use fastllm::{LlmGateway, LlmMessage, LlmRequest, ModelRoute, ProviderConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let gateway = LlmGateway::new();
    // Register the "openai" provider for gpt-4o-mini, configured from the
    // environment (the run command below supplies OPENAI_API_KEY).
    gateway.register_provider_config(ProviderConfig::from_env("openai", "gpt-4o-mini"))?;

    // Route a single user message to the registered provider and model.
    let response = gateway
        .chat(LlmRequest::new(
            ModelRoute::new("openai", "gpt-4o-mini"),
            vec![LlmMessage::user("What is the capital of France?")],
        ))
        .await?;

    println!("{}", response.text);
    Ok(())
}

Run the included example:

OPENAI_API_KEY=sk-... cargo run -p fastllm-example-hello-world
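This page does not describe LlmGateway's internals. As a rough mental model only, the sketch below is a hypothetical, synchronous, std-only gateway: it keeps registered providers in a map keyed by provider name, and a route's provider name selects which one handles the request. Provider, EchoProvider, Route, and Gateway are illustrative names, not FastLLM's API, and the real autoagents-llm providers are async.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a provider backend (the real ones are async).
trait Provider {
    fn chat(&self, model: &str, prompt: &str) -> String;
}

/// Toy provider that echoes the prompt, tagged with the requested model.
struct EchoProvider;

impl Provider for EchoProvider {
    fn chat(&self, model: &str, prompt: &str) -> String {
        format!("[{model}] {prompt}")
    }
}

/// Hypothetical route: which registered provider to use and which model to ask for.
struct Route {
    provider: String,
    model: String,
}

/// Minimal gateway: provider name -> boxed provider.
#[derive(Default)]
struct Gateway {
    providers: HashMap<String, Box<dyn Provider>>,
}

impl Gateway {
    fn register(&mut self, name: &str, provider: Box<dyn Provider>) {
        self.providers.insert(name.to_string(), provider);
    }

    /// Looks up the provider named in the route; errors if none was registered.
    fn chat(&self, route: &Route, prompt: &str) -> Result<String, String> {
        let provider = self
            .providers
            .get(&route.provider)
            .ok_or_else(|| format!("no provider registered as '{}'", route.provider))?;
        Ok(provider.chat(&route.model, prompt))
    }
}

fn main() {
    let mut gateway = Gateway::default();
    gateway.register("echo", Box::new(EchoProvider));

    let ok = Route { provider: "echo".into(), model: "tiny".into() };
    println!("{:?}", gateway.chat(&ok, "hi")); // prints Ok("[tiny] hi")

    let missing = Route { provider: "openai".into(), model: "gpt-4o-mini".into() };
    println!("{:?}", gateway.chat(&missing, "hi"));
}
```

Keying providers by name is what lets one gateway serve several backends: a request only has to name its provider and model.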

Request Options

LlmRequest carries routing, scheduling, and caching options:

  • route: the provider and model to call.
  • request_id: optional identifier included in deadline errors.
  • deadline_ms: per-request scheduler timeout, in milliseconds.
  • priority: reserved for scheduler ordering.
  • cache: enables or disables prompt-cache lookup for this request.
  • parameters: output-affecting parameters; these are part of the cache key.
  • provider_parameters: provider-only metadata; excluded from the cache key.
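The page does not show how these fields are consumed. The following is a hypothetical, std-only sketch of two of the documented rules: parameters feed the cache key while provider_parameters do not, and request_id labels deadline errors. RequestSketch, cache_key, check_deadline, the plain-string prompt, and the temperature and user_tag keys are invented for illustration. The sketch also assumes the route and prompt belong in the key, which the page does not state.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};
use std::time::Duration;

/// Hypothetical mirror of the LlmRequest fields listed above; FastLLM's real types may differ.
struct RequestSketch {
    provider: String,
    model: String,
    prompt: String,
    request_id: Option<String>,
    deadline_ms: Option<u64>,
    cache: bool,
    // BTreeMap iterates in sorted key order, so the hash ignores insertion order.
    parameters: BTreeMap<String, String>,
    provider_parameters: BTreeMap<String, String>,
}

impl RequestSketch {
    /// Hashes what can change the output; provider_parameters, request_id,
    /// and deadline_ms are deliberately excluded. Returns None when caching is off.
    fn cache_key(&self) -> Option<u64> {
        if !self.cache {
            return None;
        }
        let mut hasher = DefaultHasher::new();
        self.provider.hash(&mut hasher);
        self.model.hash(&mut hasher);
        self.prompt.hash(&mut hasher);
        self.parameters.hash(&mut hasher);
        Some(hasher.finish())
    }

    /// Errors (naming the request_id when present) if the request waited past deadline_ms.
    fn check_deadline(&self, waited: Duration) -> Result<(), String> {
        match self.deadline_ms {
            Some(ms) if waited > Duration::from_millis(ms) => Err(format!(
                "request {} exceeded its {ms} ms deadline",
                self.request_id.as_deref().unwrap_or("<unnamed>")
            )),
            _ => Ok(()),
        }
    }
}

fn sample() -> RequestSketch {
    RequestSketch {
        provider: "openai".into(),
        model: "gpt-4o-mini".into(),
        prompt: "What is the capital of France?".into(),
        request_id: Some("req-1".into()),
        deadline_ms: Some(500),
        cache: true,
        parameters: BTreeMap::from([("temperature".to_string(), "0.0".to_string())]),
        provider_parameters: BTreeMap::new(),
    }
}

fn main() {
    let a = sample();
    let mut b = sample();
    // Provider-only metadata leaves the cache key unchanged...
    b.provider_parameters.insert("user_tag".into(), "billing-42".into());
    assert_eq!(a.cache_key(), b.cache_key());
    // ...but changing an output-affecting parameter produces a different key.
    b.parameters.insert("temperature".into(), "0.7".into());
    assert_ne!(a.cache_key(), b.cache_key());

    // Prints Err("request req-1 exceeded its 500 ms deadline")
    println!("{:?}", a.check_deadline(Duration::from_millis(800)));
}
```

Splitting the two maps keeps cache hits stable: tagging a request with billing or tracing metadata should not force a fresh model call.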

Local Provider

Local llama.cpp support is gated behind the local Cargo feature. To build and test the whole workspace with it enabled:

cargo test --workspace --all-targets --features local

When the local feature is enabled, register a local model with register_llamacpp_model, passing a ModelConfig whose model_path points to the GGUF model file.

Run the local examples with the default Hugging Face GGUF model:

cargo run -p fastllm-example-local-model-inference

cargo run -p fastllm-example-parallel-local-inference

By default, these examples use Qwen3.5-9B-Q4_0.gguf from the unsloth/Qwen3.5-9B-GGUF Hugging Face repository. To use local files instead, set FASTLLM_GGUF_MODEL, FASTLLM_GGUF_MODEL_A, or FASTLLM_GGUF_MODEL_B to the path of a GGUF file.
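One way to mirror the model selection described above is a hypothetical helper that returns a local path when an override variable is set and otherwise falls back to the default repository and file. ModelSource, resolve, and resolve_from_env are not part of FastLLM, the examples' actual lookup may differ, and treating an empty value as unset is an assumption of this sketch rather than documented behavior.

```rust
use std::path::PathBuf;

/// Where to load a GGUF model from (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum ModelSource {
    LocalFile(PathBuf),
    HuggingFace { repo: &'static str, file: &'static str },
}

const DEFAULT_REPO: &str = "unsloth/Qwen3.5-9B-GGUF";
const DEFAULT_FILE: &str = "Qwen3.5-9B-Q4_0.gguf";

/// Pure resolution logic: a non-empty override wins; otherwise use the default repo and file.
fn resolve(override_path: Option<String>) -> ModelSource {
    match override_path {
        Some(path) if !path.trim().is_empty() => ModelSource::LocalFile(PathBuf::from(path)),
        _ => ModelSource::HuggingFace { repo: DEFAULT_REPO, file: DEFAULT_FILE },
    }
}

/// Reads the named variable (for example FASTLLM_GGUF_MODEL) and resolves it.
fn resolve_from_env(var: &str) -> ModelSource {
    resolve(std::env::var(var).ok())
}

fn main() {
    for var in ["FASTLLM_GGUF_MODEL", "FASTLLM_GGUF_MODEL_A", "FASTLLM_GGUF_MODEL_B"] {
        println!("{var}: {:?}", resolve_from_env(var));
    }
}
```

Keeping resolve separate from the environment read makes the fallback rule easy to test without touching process-wide state.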

Run the SDK behavior examples, which need no external services:

cargo run -p fastllm-example-scheduler-showcase
cargo run -p fastllm-example-memory-management