Quick Start Guide
Get started with SteeringAPI in minutes. Learn how to make your first API call and steer model behavior.
Step 1: Get Your API Key
After signing up, create an API key from your dashboard:
- Go to API Keys in your dashboard
- Click "Create API Key"
- Give your key a descriptive name
- Copy and securely store your API key
Rate Limits & Pricing
Rate limits are applied per API key. Each API key has its own rate limit bucket, so one key hitting the limit won't affect your other keys.
| Endpoint | Rate Limit |
|---|---|
| /v1/chat/completions | 200 requests/minute |
| /v1/payments/* | 30 requests/minute |
| All other endpoints | 1000 requests/minute |
In addition to per-minute rate limits, there is a concurrency limit of 24 simultaneous requests per model. If you send more than 24 concurrent requests to the same model, excess requests are queued for up to 120 seconds. If no slot opens in that time, you'll receive a 429 response. For batch workloads, we recommend limiting yourself to 10-15 concurrent requests and waiting for each response before sending the next batch.
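One way to keep a batch workload under the concurrency limit is to cap in-flight requests with a semaphore. The sketch below is only an illustration and not part of the SDK: `run_bounded` and `fake_request` are hypothetical names, and `fake_request` stands in for a real API call.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def run_bounded(tasks: List[Callable[[], Awaitable[T]]], limit: int = 12) -> List[T]:
    """Run coroutine factories with at most `limit` of them in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(task: Callable[[], Awaitable[T]]) -> T:
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await task()

    # gather returns results in the same order as the input list.
    return await asyncio.gather(*(guarded(task) for task in tasks))


async def fake_request(i: int) -> int:
    # Hypothetical placeholder for a real API call.
    await asyncio.sleep(0.01)
    return i * 2


if __name__ == "__main__":
    jobs = [lambda i=i: fake_request(i) for i in range(30)]
    print(asyncio.run(run_bounded(jobs, limit=12)))
```

A limit of 12 sits inside the recommended 10-15 range; in real use, each factory would wrap one chat completion call.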
When you exceed the rate limit, you'll receive a 429 Too Many Requests response with a retry_after field indicating how many seconds to wait.
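A client can honor the `retry_after` value by sleeping before it retries. The following is a minimal sketch under stated assumptions: `call_with_retry` and `send` are hypothetical names, `send` is any callable returning a `(status_code, body)` pair, and `retry_after` is assumed to be a field in the parsed response body.

```python
import time
from typing import Any, Callable, Dict, Tuple

Response = Tuple[int, Dict[str, Any]]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until it returns a non-429 status or attempts run out."""
    status, body = send()
    for _ in range(max_attempts - 1):
        if status != 429:
            break
        # Assumption: the 429 body carries `retry_after` in seconds.
        # Fall back to 1 second if the field is missing.
        sleep(float(body.get("retry_after", 1)))
        status, body = send()
    return status, body
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays. If all attempts return 429, the last 429 response is returned so the caller can decide what to do next.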
Pricing
- $0.01 per API call
- $0.000001 per token (input + output)
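As a worked example of the pricing above, the sketch below estimates the cost of a workload. It assumes the per-call and per-token charges are simply added together; `estimate_cost` is a hypothetical helper, not an SDK function.

```python
from decimal import Decimal

PRICE_PER_CALL = Decimal("0.01")       # dollars per API call
PRICE_PER_TOKEN = Decimal("0.000001")  # dollars per token (input + output)


def estimate_cost(calls: int, total_tokens: int) -> Decimal:
    """Estimate cost in dollars, assuming the two charges are additive."""
    return PRICE_PER_CALL * calls + PRICE_PER_TOKEN * total_tokens


# 1,000 calls using 500,000 tokens in total: $10.00 + $0.50
total = estimate_cost(1000, 500_000)
print(f"${total:.2f}")  # prints $10.50
```

`Decimal` is used instead of floats so that very small per-token amounts do not pick up rounding error.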
Context Window
Both models have a 4,096 token context window. This is the total limit for input (system prompt + conversation history) and output (completion tokens) combined.
- Llama 3.3 70B: 4,096 tokens total
- Gemma 3 27B: 4,096 tokens total
Requests exceeding the context window will receive a 400 error with the token count and limit.
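Because input and output share one window, the room left for a completion shrinks as the prompt grows. The sketch below shows the arithmetic only. `max_completion_tokens` is a hypothetical helper, and counting the prompt's tokens (which requires the model's tokenizer) is not shown.

```python
CONTEXT_WINDOW = 4096  # total tokens for input plus output, per the limits above


def max_completion_tokens(prompt_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many tokens remain for the completion after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError("the prompt alone fills or exceeds the context window")
    return remaining


# A 3,000-token prompt leaves 1,096 tokens for the reply.
print(max_completion_tokens(3000))
```

Checking the budget before sending a request can help avoid the 400 error described above, provided the token count comes from the same tokenizer the model uses.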
Step 2: Install the SDK
Python
pip install vllm-sdk
For Python, install the vLLM SDK. For JavaScript/Node.js, you can use the native fetch API (no additional packages required).
Step 3: Make Your First Request
Basic Chat Completion
Start with a simple chat completion without steering:
import asyncio
from vllm_sdk import VLLMClient, ChatMessage

async def main():
    async with VLLMClient(api_key="<YOUR_API_KEY>") as client:
        # Send a single user message with no steering applied.
        response = await client.chat_completions(
            model="meta-llama/Llama-3.3-70B-Instruct",
            messages=[
                ChatMessage(role="user", content="Tell me about the ocean")
            ],
        )
        # Print the text of the first completion choice.
        print(response.choices[0].message.content)

asyncio.run(main())
Supported Models
All examples use Llama 3.3 70B. To use Gemma 3 27B instead, replace the model identifier:
- Llama 3.3 70B: meta-llama/Llama-3.3-70B-Instruct (always-on, instant responses)
- Gemma 3 27B: RedHatAI/gemma-3-27b-it-FP8-dynamic (serverless, 4-10 min cold start on first request)
All endpoints, features, and steering modes work identically across both models. See Supported Models for full configuration details.
Step 4: Add Feature Steering
Now let's add steering to control the model's behavior. We'll use feature 99, which represents "pirate speech patterns":
import asyncio
from vllm_sdk import VLLMClient, ChatMessage, Variant

async def main():
    async with VLLMClient(api_key="<YOUR_API_KEY>") as client:
        # A Variant wraps the base model together with its steering interventions.
        variant = Variant("meta-llama/Llama-3.3-70B-Instruct")
        variant.add_intervention(feature_id=99, strength=0.5, mode="add")
        # Pass the variant in place of the model identifier.
        response = await client.chat_completions(
            model=variant,
            messages=[
                ChatMessage(role="user", content="Tell me about the ocean")
            ],
        )
        print(response.choices[0].message.content)
        # Output: "Arr, the ocean be a vast body of water..."

asyncio.run(main())
Understanding Steering Values
- Positive values (0.1 to 1.0): Amplify the feature's effect
- Negative values (-1.0 to 0): Suppress the feature's effect
- Typical range: -0.5 to 0.5 for most use cases
- Experiment: Start small and adjust based on results
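To make "start small and adjust" concrete, one option is to try a short series of strengths and compare the outputs. The sketch below only generates the values to try; `strength_sweep` is a hypothetical helper, and each value would be passed as the `strength` argument of `add_intervention` in a separate request.

```python
from typing import List


def strength_sweep(start: float = 0.1, stop: float = 0.5, step: float = 0.1) -> List[float]:
    """Return evenly spaced strengths from `start` to `stop`, inclusive."""
    if step <= 0:
        raise ValueError("step must be positive")
    if not (-1.0 <= start <= stop <= 1.0):
        raise ValueError("strengths must satisfy -1.0 <= start <= stop <= 1.0")
    count = int(round((stop - start) / step)) + 1
    # Round each value to hide floating-point drift such as 0.30000000000000004.
    return [round(start + i * step, 4) for i in range(count)]


print(strength_sweep())  # [0.1, 0.2, 0.3, 0.4, 0.5]
```

The defaults cover the positive half of the typical range listed above; pass a negative `start` to explore suppression as well.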
Step 5: Discover Features
Use the Feature Search API to find relevant features for your use case:
import requests

response = requests.post(
    "https://api.goodfire.ai/v1/features/search",
    headers={"X-API-Key": "<YOUR_API_KEY>"},
    json={
        "query": "formal academic writing",
        "model_name": "meta-llama/Llama-3.3-70B-Instruct",
        "top_k": 10
    }
)
# Raise an exception on 4xx/5xx responses instead of parsing an error body.
response.raise_for_status()

features = response.json()["data"]
for feature in features:
    print(f"ID: {feature['id']}, Label: {feature['label']}")
You can also browse features interactively in the Feature Search dashboard.
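Once search results are in hand, a small filter can narrow them to the features worth steering on. The sketch below assumes each result is a dict with `id` and `label` keys, as in the loop above. `find_feature_ids` is a hypothetical helper, and the sample data is made up for illustration.

```python
from typing import Any, Dict, List


def find_feature_ids(features: List[Dict[str, Any]], keyword: str) -> List[Any]:
    """Return the IDs of features whose label contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [f["id"] for f in features if needle in f["label"].lower()]


# Illustrative data only; real IDs and labels come from the search response.
sample = [
    {"id": 101, "label": "Formal academic writing style"},
    {"id": 202, "label": "Casual conversational tone"},
    {"id": 303, "label": "Academic citations and references"},
]

print(find_feature_ids(sample, "academic"))  # [101, 303]
```

The returned IDs can then be used as `feature_id` values when adding interventions to a variant.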
Step 6: Combine Multiple Features
You can steer on multiple features simultaneously for fine-grained control:
import asyncio
from vllm_sdk import VLLMClient, ChatMessage, Variant

async def main():
    async with VLLMClient(api_key="<YOUR_API_KEY>") as client:
        variant = Variant("meta-llama/Llama-3.3-70B-Instruct")
        variant.add_intervention(feature_id=1234, strength=0.3, mode="add")   # Increase technical detail
        variant.add_intervention(feature_id=5678, strength=-0.2, mode="add")  # Reduce jargon
        variant.add_intervention(feature_id=9012, strength=0.4, mode="add")   # Add enthusiasm
        response = await client.chat_completions(
            model=variant,
            messages=[
                ChatMessage(role="user", content="Explain quantum computing")
            ],
        )
        print(response.choices[0].message.content)

asyncio.run(main())
Next Steps
API Documentation
Interactive OpenAPI docs with all endpoints, schemas, and try-it-out features.
How It Works
Understand the mathematics and mechanics behind feature steering.
SelfIE Labels
Learn about automated feature labeling and interpretability.
Feature Search
Discover and explore the full library of interpretable features.
Interactive Playground
Try steering features interactively in our chat interface.