LLM Guardrails
autoagents-guardrails provides policy-driven request/response safety controls for any Arc<dyn LLMProvider>.
It supports:
- Input guards (prompt checks)
- Output guards (response checks)
- Configurable enforcement policy (
Block,Sanitize,Audit) - Direct wrapper mode and
LLMLayermode forPipelineBuilder
Basic Usage
use std::sync::Arc;
use autoagents::guardrails::{
EnforcementPolicy, Guardrails,
guards::{PromptInjectionGuard, ToxicityGuard},
};
use autoagents::llm::LLMProvider;
use autoagents::llm::pipeline::PipelineBuilder;
let base: Arc<dyn LLMProvider> = build_provider();
let guardrails = Guardrails::builder()
.input_guard(PromptInjectionGuard::default())
.output_guard(ToxicityGuard::default())
.enforcement_policy(EnforcementPolicy::Block)
.build();
let llm: Arc<dyn LLMProvider> = PipelineBuilder::new(base)
.add_layer(guardrails.layer())
.build();
Policy Semantics
Block: fail fast on violations.Sanitize: rewrite payload to a redacted safe form and continue.Audit: log violation and continue.
Global policy is set with enforcement_policy(...). You can override policy for
specific guards with:
input_guard_with_policy(...)output_guard_with_policy(...)
Custom Sanitizers
You can override sanitize behavior with custom functions:
use autoagents::guardrails::{EnforcementPolicy, Guardrails, GuardedOutput};
let guardrails = Guardrails::builder()
.enforcement_policy(EnforcementPolicy::Sanitize)
.output_sanitizer(|output, _violation, _ctx| {
if let GuardedOutput::Completion(c) = output {
c.text = "[custom sanitized]".to_string();
}
})
.build();
Built-in sanitizer helpers are available in autoagents::guardrails::sanitizers:
default_input_sanitizer()default_output_sanitizer()redact_input_payload(...)redact_output_payload(...)redact_output_text_only_payload(...)noop_input_sanitizer()noop_output_sanitizer()
Built-in Input Redaction Guard
RegexPiiRedactionGuard redacts common PII in input payloads (email, phone,
SSN, and card-like numbers) before provider calls:
use autoagents::guardrails::{Guardrails, guards::RegexPiiRedactionGuard};
let guardrails = Guardrails::builder()
.input_guard(RegexPiiRedactionGuard::default())
.build();
Streaming Behavior
Streaming calls use pre-flight and post-aggregate checks:
- Input guards run before stream starts.
- Output guards run once after the stream fully completes.
- For blocked outputs, a final stream error is emitted.
Extending With New Guards
Implement InputGuard or OutputGuard:
use async_trait::async_trait;
use autoagents::guardrails::{
GuardContext, GuardDecision, GuardError, GuardedInput, InputGuard,
};
struct MyInputGuard;
#[async_trait]
impl InputGuard for MyInputGuard {
fn name(&self) -> &'static str { "my-input-guard" }
async fn inspect(
&self,
input: &mut GuardedInput,
_ctx: &GuardContext,
) -> Result<GuardDecision, GuardError> {
let _ = input;
Ok(GuardDecision::Pass)
}
}
Cloud Moderation / External Providers
For cloud moderation services, implement InputGuard and/or OutputGuard
that call your provider SDK or HTTP API and return GuardDecision.