Quick Start Guide

Get started with SteeringAPI in minutes. Learn how to make your first API call and steer model behavior.

Step 1: Get Your API Key

After signing up, create an API key from your dashboard:

  1. Go to API Keys in your dashboard
  2. Click "Create API Key"
  3. Give your key a descriptive name
  4. Copy and securely store your API key

Important: Your API key is shown only once. Store it securely and never commit it to version control.
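One common way to keep the key out of version control is to read it from an environment variable at runtime. The sketch below assumes a variable named STEERING_API_KEY; that name is a placeholder chosen for this example, not something the API requires.

```python
import os


def load_api_key(env_var: str = "STEERING_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical choice for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key
```

The returned value can then be passed as api_key wherever the examples in this guide show "<YOUR_API_KEY>".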

Rate Limits & Pricing

Rate limits are applied per API key. Each API key has its own rate limit bucket, so one key hitting the limit won't affect your other keys.

Endpoint                 Rate Limit
/v1/chat/completions     200 requests/minute
/v1/payments/*           30 requests/minute
All other endpoints      1000 requests/minute

In addition to the per-minute rate limits, each model has a concurrency limit of 24 simultaneous requests. If you send more than 24 concurrent requests to the same model, the excess requests are queued for up to 120 seconds. If no slot opens in that time, you'll receive a 429 response. For batch workloads, we recommend limiting yourself to 10-15 concurrent requests and waiting for each batch to complete before sending the next.
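A client-side cap is one way to stay under the concurrency limit. The sketch below uses an asyncio.Semaphore to keep at most 10 requests in flight. It is a sliding window rather than the fixed batches described above, and run_limited is a helper name invented for this example. Each job is any zero-argument coroutine function, such as a wrapper around a chat completion call.

```python
import asyncio

MAX_CONCURRENCY = 10  # within the recommended 10-15 range


async def run_limited(jobs, limit=MAX_CONCURRENCY):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        # A job starts only once one of the `limit` slots is free.
        async with semaphore:
            return await job()

    # gather returns results in the same order as `jobs`.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Results come back in input order, so they can be matched to the original prompts by index.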

When you exceed the rate limit, you'll receive a 429 Too Many Requests response with a retry_after field indicating how many seconds to wait.
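A retry loop that honors retry_after might look like the sketch below. It assumes retry_after arrives in the JSON body of the 429 response, and it takes a hypothetical send callable that returns a (status_code, body) pair, so it is not tied to any particular HTTP client.

```python
import time


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status_code, body_dict).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        # Wait for the server-suggested delay; fall back to 1 second
        # if the field is missing (an assumption of this sketch).
        sleep(float(body.get("retry_after", 1)))
```

Passing sleep as a parameter keeps the function easy to test without real delays.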

Pricing

  • $0.01 per API call
  • $0.000001 per token (input + output)
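As a rough budgeting aid, the two prices can be combined into a single estimate. This sketch assumes the per-call and per-token charges are simply added together. estimate_cost is a name made up for this example.

```python
from decimal import Decimal

PRICE_PER_CALL = Decimal("0.01")       # dollars per API call
PRICE_PER_TOKEN = Decimal("0.000001")  # dollars per token, input + output


def estimate_cost(calls: int, total_tokens: int) -> Decimal:
    """Estimate the charge in dollars for a number of calls and tokens."""
    return calls * PRICE_PER_CALL + total_tokens * PRICE_PER_TOKEN
```

Under that assumption, 1,000 calls using 2,000,000 tokens in total would come to $10 in call charges plus $2 in token charges.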
Need Higher Limits?
If you need higher rate limits for production use cases, please contact us to discuss enterprise plans.

Context Window

Both models have a 4,096 token context window. This is the total limit for input (system prompt + conversation history) and output (completion tokens) combined.

  • Llama 3.3 70B: 4,096 tokens total
  • Gemma 3 27B: 4,096 tokens total

Requests that exceed the context window receive a 400 error that reports the token count and the limit.
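Because input and output share the same 4,096-token budget, it can help to check the arithmetic before sending a request. The helpers below are illustrative names, and they take token counts as plain integers. How those counts are obtained depends on the model's tokenizer and is outside this sketch.

```python
CONTEXT_WINDOW = 4096  # total tokens for input and output combined


def remaining_for_completion(input_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Return how many tokens are left for the completion."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return max(window - input_tokens, 0)


def fits_context(input_tokens: int, output_tokens: int,
                 window: int = CONTEXT_WINDOW) -> bool:
    """Return True if input plus requested output stays within the window."""
    return input_tokens + output_tokens <= window
```

For example, a 3,000-token prompt leaves 4,096 - 3,000 = 1,096 tokens for the completion.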

Step 2: Install the SDK

Python

pip install vllm-sdk

For Python, install the vLLM SDK. For JavaScript/Node.js, you can use the native fetch API (no additional packages required).

Step 3: Make Your First Request

Basic Chat Completion

Start with a simple chat completion without steering:

import asyncio
from vllm_sdk import VLLMClient, ChatMessage

async def main():
    async with VLLMClient(api_key="<YOUR_API_KEY>") as client:
        response = await client.chat_completions(
            model="meta-llama/Llama-3.3-70B-Instruct",
            messages=[
                ChatMessage(role="user", content="Tell me about the ocean")
            ],
        )
        print(response.choices[0].message.content)

asyncio.run(main())

Supported Models

All examples use Llama 3.3 70B. To use Gemma 3 27B instead, replace the model identifier:

  • Llama 3.3 70B: meta-llama/Llama-3.3-70B-Instruct (always-on, instant responses)
  • Gemma 3 27B: RedHatAI/gemma-3-27b-it-FP8-dynamic (serverless, 4-10 min cold start on first request)

All endpoints, features, and steering modes work identically across both models. See Supported Models for full configuration details.

Step 4: Add Feature Steering

Now let's add steering to control the model's behavior. We'll use feature 99, which represents "pirate speech patterns":

import asyncio
from vllm_sdk import VLLMClient, ChatMessage, Variant

async def main():
    async with VLLMClient(api_key="<YOUR_API_KEY>") as client:
        variant = Variant("meta-llama/Llama-3.3-70B-Instruct")
        variant.add_intervention(feature_id=99, strength=0.5, mode="add")

        response = await client.chat_completions(
            model=variant,
            messages=[
                ChatMessage(role="user", content="Tell me about the ocean")
            ],
        )
        print(response.choices[0].message.content)
        # Example output (actual text will vary): "Arr, the ocean be a vast body of water..."

asyncio.run(main())

Understanding Steering Values

  • Positive values (0.1 to 1.0): Amplify the feature's effect
  • Negative values (-1.0 to 0): Suppress the feature's effect
  • Typical range: -0.5 to 0.5 for most use cases
  • Experiment: Start small and adjust based on results
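To follow the "start small" advice, one option is to generate a short list of strengths and try each in turn. The helper below is a sketch with a made-up name. It treats -1.0 to 1.0 as the allowed range based on the values listed above, which may or may not match what the API enforces.

```python
def strength_sweep(start: float = 0.1, stop: float = 0.5, steps: int = 5) -> list:
    """Return `steps` evenly spaced strengths from start to stop, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    for value in (start, stop):
        # Bounds taken from the ranges listed above (an assumption).
        if not -1.0 <= value <= 1.0:
            raise ValueError("strengths should stay within -1.0 to 1.0")
    step = (stop - start) / (steps - 1)
    # Round to avoid floating-point noise such as 0.30000000000000004.
    return [round(start + i * step, 3) for i in range(steps)]
```

Each value can be passed as strength to add_intervention on a fresh Variant so that runs are compared one setting at a time.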

Step 5: Discover Features

Use the Feature Search API to find relevant features for your use case:

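import requests

# Search for features that match a natural-language description.
response = requests.post(
    "https://api.goodfire.ai/v1/features/search",
    headers={"X-API-Key": "<YOUR_API_KEY>"},
    json={
        "query": "formal academic writing",
        "model_name": "meta-llama/Llama-3.3-70B-Instruct",
        "top_k": 10
    },
    timeout=30,
)
response.raise_for_status()  # Stop early on 4xx/5xx responses

# Each result carries a feature ID and a human-readable label.
features = response.json()["data"]
for feature in features:
    print(f"ID: {feature['id']}, Label: {feature['label']}")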

You can also browse features interactively in the Feature Search dashboard.
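Search results can be turned into steering settings with a small helper. The sketch below assumes results are ordered by relevance, which this guide does not state. plan_interventions and the sample data are invented for illustration, with the sample mirroring the id and label fields used above.

```python
def plan_interventions(features, strength=0.3, limit=3):
    """Turn search results into keyword arguments for add_intervention."""
    if not -1.0 <= strength <= 1.0:
        raise ValueError("strength should stay within -1.0 to 1.0")
    return [
        {"feature_id": feature["id"], "strength": strength, "mode": "add"}
        for feature in features[:limit]
    ]


# Made-up sample in the same shape as response.json()["data"].
sample = [
    {"id": 101, "label": "formal academic tone"},
    {"id": 202, "label": "citations and references"},
]
plan = plan_interventions(sample, strength=0.2, limit=1)
```

Each dictionary in the plan uses the same keyword names as the add_intervention calls in this guide, so it can be unpacked with variant.add_intervention(**item).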

Step 6: Combine Multiple Features

You can steer on multiple features simultaneously for fine-grained control:

import asyncio
from vllm_sdk import VLLMClient, ChatMessage, Variant

async def main():
    async with VLLMClient(api_key="<YOUR_API_KEY>") as client:
        variant = Variant("meta-llama/Llama-3.3-70B-Instruct")
        variant.add_intervention(feature_id=1234, strength=0.3, mode="add")   # Increase technical detail
        variant.add_intervention(feature_id=5678, strength=-0.2, mode="add")  # Reduce jargon
        variant.add_intervention(feature_id=9012, strength=0.4, mode="add")   # Add enthusiasm

        response = await client.chat_completions(
            model=variant,
            messages=[
                ChatMessage(role="user", content="Explain quantum computing")
            ],
        )
        print(response.choices[0].message.content)

asyncio.run(main())

Next Steps