SteeringAPI Documentation
Learn how to control AI model behavior with fine-grained, interpretable features.
๐ Quick Start
Get up and running with SteeringAPI in minutes. Learn the basics of feature steering and make your first API call.
๐ API Reference
Interactive OpenAPI documentation with all endpoints, request/response schemas, and try-it-out functionality.
๐ง How It Works
Deep dive into the mathematics and mechanics of SAE-based model steering from theory to implementation.
๐ท๏ธ SelfIE Labels
Learn about our automated feature labeling system that generates interpretable descriptions for SAE features.
โก Supported Models
Model identifiers, configuration details, and cold start handling for all supported models.
What is SteeringAPI?
SteeringAPI provides fine-grained control over AI model behavior through interpretable features extracted using Sparse Autoencoders (SAEs). Instead of relying on prompt engineering alone, you can directly manipulate high-level semantic concepts like tone, style, safety, and domain-specific knowledge.
Key Features
- Interpretable Control:Steer on human-understandable concepts like "formal language" or "technical jargon"
- Precise Adjustments: Fine-tune model behavior with numerical precision
- Real-time Steering: Apply interventions during inference without retraining
- Multi-Model Support: Steer
meta-llama/Llama-3.3-70B-Instruct(always-on) andRedHatAI/gemma-3-27b-it-FP8-dynamic(serverless), each with 65k interpretable SAE features - Per-Key Rate Limits: 200 req/min for chat completions, 1000 req/min for other endpoints (per API key)
Common Use Cases
Content Moderation
Suppress harmful content by reducing activation of features related to violence, toxicity, or inappropriate topics.
Style Control
Adjust tone, formality, or personality traits to match your brand voice or use case requirements.
Domain Adaptation
Enhance or suppress domain-specific knowledge (medical, legal, technical) without fine-tuning.
Bias Mitigation
Identify and reduce unwanted biases by steering away from problematic feature activations.
Developer Notes
Steering Strength
Feature steering strength values are internally normalized per model to account for differences in SAE activation magnitudes. The scale [-1, 1] produces proportionally equivalent behavioral effects across all supported models. For best results, use moderate strengths (0.3โ0.6). Values above 0.7 may cause repetitive or degraded output as the intervention overwhelms the model's natural activations.
Model Self-Identification
Some models may misidentify themselves when asked (e.g., Gemma 3 27B may claim to be "Gemma 2B"). This is a base model limitation โ the model's training data predates its own release. If accurate self-identification is needed, include the model name in your system prompt.
Next Steps
- 1. Read the Quick Start: Get started in 5 minutes
- 2. Understand the concepts: Learn how steering works
- 3. Try it out: Sign up and start steering