SteeringAPI Documentation

Learn how to control AI model behavior with fine-grained, interpretable features.

What is SteeringAPI?

SteeringAPI provides fine-grained control over AI model behavior through interpretable features extracted using Sparse Autoencoders (SAEs). Instead of relying on prompt engineering alone, you can directly manipulate high-level semantic concepts like tone, style, safety, and domain-specific knowledge.
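As an illustration of the underlying mechanism, the sketch below uses the standard SAE formulation (a ReLU encoder and a linear decoder). It moves one feature to a chosen activation and maps that change back into the model's activation space. It is plain Python with toy weights. The function names, the weight layout (one row of `w_enc` and `w_dec` per feature), and the numbers are hypothetical and are not the SteeringAPI interface.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]  # one row per SAE feature (assumed layout)


def encode(x: Vector, w_enc: Matrix, b_enc: Vector, b_dec: Vector) -> Vector:
    """Feature activations f = ReLU(W_enc (x - b_dec) + b_enc)."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    return [
        max(0.0, sum(w * c for w, c in zip(row, centered)) + b)
        for row, b in zip(w_enc, b_enc)
    ]


def set_feature(x: Vector, feature: int, value: float, w_enc: Matrix,
                b_enc: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """Move one feature to `value` and apply the change in activation space.

    Only the difference along that feature's decoder direction is added, so the
    part of x the SAE does not reconstruct (its error term) is left untouched.
    """
    current = encode(x, w_enc, b_enc, b_dec)[feature]
    delta = value - current
    return [xi + delta * di for xi, di in zip(x, w_dec[feature])]


# Toy 2-feature SAE with identity weights (hypothetical values).
w_enc = [[1.0, 0.0], [0.0, 1.0]]
w_dec = [[1.0, 0.0], [0.0, 1.0]]
b_enc = [0.0, 0.0]
b_dec = [0.0, 0.0]
x = [0.5, 2.0]

print(encode(x, w_enc, b_enc, b_dec))                      # [0.5, 2.0]
print(set_feature(x, 0, 3.0, w_enc, b_enc, w_dec, b_dec))  # [3.0, 2.0]
```

The sketch adds only the difference rather than replacing x with the decoder's full reconstruction. This design choice avoids injecting the SAE's reconstruction error into the model.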

Key Features
  • Interpretable Control: Steer using human-understandable concepts like "formal language" or "technical jargon"
  • Precise Adjustments: Control how strongly each feature is applied with a numerical value
  • Real-time Steering: Apply interventions during inference without retraining
  • 100k+ Features: Access a vast library of interpretable features across multiple models
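Numerical strengths and inference-time steering can be illustrated with additive steering. In this common technique, a scaled copy of each chosen feature's decoder direction is added to the activations at every token position while the model weights stay fixed. The sketch below is plain Python. `apply_interventions`, the `{feature_index: strength}` mapping, and the toy values are hypothetical, and whether SteeringAPI uses exactly this rule is an assumption.

```python
from typing import Dict, List

Vector = List[float]


def apply_interventions(token_acts: List[Vector], w_dec: List[Vector],
                        strengths: Dict[int, float]) -> List[Vector]:
    """Add strength * decoder_direction for each chosen feature at every token.

    Positive strengths push toward a concept, negative strengths push away.
    Only activations change; no model weights are modified.
    """
    steered = []
    for act in token_acts:
        out = list(act)  # copy so the caller's activations are not mutated
        for feature, strength in strengths.items():
            for j, d in enumerate(w_dec[feature]):
                out[j] += strength * d
        steered.append(out)
    return steered


# Hypothetical decoder directions for two features in a 2-D activation space.
w_dec = [[1.0, 0.0], [0.0, 1.0]]
token_acts = [[0.0, 1.0], [2.0, 0.0]]  # one activation vector per token

print(apply_interventions(token_acts, w_dec, {0: 0.5, 1: -1.0}))
# [[0.5, 0.0], [2.5, -1.0]]
```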

Common Use Cases

Content Moderation

Suppress harmful content by reducing the activation of features related to violence, toxicity, or inappropriate topics.
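One way to implement suppression with an SAE is to cap each flagged feature at a ceiling and subtract only the excess along its decoder direction. Activations where the feature stays below the ceiling then pass through unchanged. This is a plain-Python sketch with hypothetical names and toy weights. Which features count as harmful, and the ceiling value, are assumptions left to the caller.

```python
from typing import List, Sequence

Vector = List[float]


def feature_activation(x: Sequence[float], enc_row: Sequence[float],
                       enc_bias: float, b_dec: Sequence[float]) -> float:
    """ReLU encoder output for a single SAE feature."""
    pre = sum(w * (xi - bi) for w, xi, bi in zip(enc_row, x, b_dec)) + enc_bias
    return max(0.0, pre)


def clamp_features(x: Sequence[float], flagged: Sequence[int], w_enc: List[Vector],
                   b_enc: Vector, w_dec: List[Vector], b_dec: Vector,
                   ceiling: float = 0.0) -> Vector:
    """Cap every flagged feature at `ceiling`, leaving weaker activations alone.

    Features are processed in order; if decoder directions overlap, clamping one
    feature can shift another, so results may depend on the order of `flagged`.
    """
    out = list(x)
    for i in flagged:
        act = feature_activation(out, w_enc[i], b_enc[i], b_dec)
        if act > ceiling:
            excess = act - ceiling
            out = [o - excess * d for o, d in zip(out, w_dec[i])]
    return out


# Toy SAE with identity weights; feature 0 stands in for a flagged concept.
w_enc = [[1.0, 0.0], [0.0, 1.0]]
w_dec = [[1.0, 0.0], [0.0, 1.0]]
b_enc = [0.0, 0.0]
b_dec = [0.0, 0.0]

print(clamp_features([3.0, 0.2], [0], w_enc, b_enc, w_dec, b_dec))               # [0.0, 0.2]
print(clamp_features([3.0, 0.2], [0], w_enc, b_enc, w_dec, b_dec, ceiling=1.0))  # [1.0, 0.2]
```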

Style Control

Adjust tone, formality, or personality traits to match your brand voice or use case requirements.

Domain Adaptation

Enhance or suppress domain-specific knowledge (medical, legal, technical) without fine-tuning.

Bias Mitigation

Identify and reduce unwanted biases by steering away from problematic feature activations.
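The identification step can be as simple as watching a chosen set of features and reporting those whose peak activation over a response crosses a threshold. The sketch assumes per-token feature activations are already available as lists of floats. `flag_features`, the labels, and the threshold are hypothetical.

```python
from typing import Dict, List


def flag_features(token_acts: List[List[float]], watched: Dict[int, str],
                  threshold: float) -> List[str]:
    """Return labels of watched features whose peak activation exceeds `threshold`.

    token_acts[t][i] is the activation of feature i at token t.
    Results are ordered from strongest to weakest peak.
    """
    peaks = {i: max(acts[i] for acts in token_acts) for i in watched}
    hits = [i for i, peak in peaks.items() if peak > threshold]
    hits.sort(key=lambda i: peaks[i], reverse=True)
    return [watched[i] for i in hits]


# Two tokens, three features; labels are placeholders, not real feature names.
token_acts = [[0.1, 2.0, 0.0], [0.9, 0.5, 0.0]]
watched = {0: "f0", 1: "f1", 2: "f2"}

print(flag_features(token_acts, watched, threshold=0.8))  # ['f1', 'f0']
```

Features flagged this way are candidates for the suppression described under Content Moderation.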

Next Steps

  1. Read the Quick Start: Get started in 5 minutes
  2. Understand the concepts: Learn how steering works
  3. Try it out: Sign up and start steering