SteeringAPI Documentation

Learn how to control AI model behavior with fine-grained, interpretable features.

🚀 Quick Start

Get up and running with SteeringAPI in minutes. Learn the basics of feature steering and make your first API call.

📚 API Reference

Interactive OpenAPI documentation with all endpoints, request/response schemas, and try-it-out functionality.

🧠 How It Works

Deep dive into the mathematics and mechanics of SAE-based model steering from theory to implementation.

🏷️ SelfIE Labels

Learn about our automated feature labeling system that generates interpretable descriptions for SAE features.

⚡ Supported Models

Model identifiers, configuration details, and cold start handling for all supported models.

What is SteeringAPI?

SteeringAPI provides fine-grained control over AI model behavior through interpretable features extracted using Sparse Autoencoders (SAEs). Instead of relying on prompt engineering alone, you can directly manipulate high-level semantic concepts like tone, style, safety, and domain-specific knowledge.

Key Features

Interpretable Control:Steer on human-understandable concepts like "formal language" or "technical jargon"
Precise Adjustments: Fine-tune model behavior with numerical precision
Real-time Steering: Apply interventions during inference without retraining
Multi-Model Support: Steer meta-llama/Llama-3.3-70B-Instruct (always-on) and RedHatAI/gemma-3-27b-it-FP8-dynamic (serverless), each with 65k interpretable SAE features
Per-Key Rate Limits: 200 req/min for chat completions, 1000 req/min for other endpoints (per API key)

Common Use Cases

Content Moderation

Suppress harmful content by reducing activation of features related to violence, toxicity, or inappropriate topics.

Style Control

Adjust tone, formality, or personality traits to match your brand voice or use case requirements.

Domain Adaptation

Enhance or suppress domain-specific knowledge (medical, legal, technical) without fine-tuning.

Bias Mitigation

Identify and reduce unwanted biases by steering away from problematic feature activations.

Developer Notes

Steering Strength

Feature steering strength values are internally normalized per model to account for differences in SAE activation magnitudes. The scale [-1, 1] produces proportionally equivalent behavioral effects across all supported models. For best results, use moderate strengths (0.3–0.6). Values above 0.7 may cause repetitive or degraded output as the intervention overwhelms the model's natural activations.

Model Self-Identification

Some models may misidentify themselves when asked (e.g., Gemma 3 27B may claim to be "Gemma 2B"). This is a base model limitation — the model's training data predates its own release. If accurate self-identification is needed, include the model name in your system prompt.

Next Steps

1. Read the Quick Start: Get started in 5 minutes
2. Understand the concepts: Learn how steering works
3. Try it out: Sign up and start steering