Quick Start Guide

Get started with SteeringAPI in minutes. Learn how to make your first API call and steer model behavior.

Step 1: Get Your API Key

After signing up, create a key from your dashboard:

  1. Go to API Keys in your dashboard
  2. Click "Create API Key"
  3. Give your key a descriptive name
  4. Copy and securely store your API key
Important
Your API key will only be shown once. Store it securely and never commit it to version control.
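A common way to keep the key out of version control is to store it in an environment variable and read it at runtime. The sketch below assumes a variable named STEERING_API_KEY; that name is purely illustrative, not something the API requires:

```python
import os

def load_api_key(env_var="STEERING_API_KEY"):
    """Read the API key from the environment instead of hard-coding it.

    Set it in your shell first, e.g.: export STEERING_API_KEY="sk-..."
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

You can then pass `load_api_key()` wherever the examples below use `"<YOUR_API_KEY>"`.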

Rate Limits & Pricing

Rate limits are applied per API key. Each API key has its own rate limit bucket, so one key hitting the limit won't affect your other keys.

Endpoint                 Rate Limit
/v1/chat/completions     200 requests/minute
/v1/payments/*           30 requests/minute
All other endpoints      1000 requests/minute

When you exceed the rate limit, you'll receive a 429 Too Many Requests response with a retry_after field indicating how many seconds to wait.
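A minimal retry loop for that case might look like the following. Note that post_with_retry is a hypothetical helper, not part of any SDK; it assumes the 429 response body is JSON containing the retry_after field:

```python
import time
import requests

def post_with_retry(url, headers, payload, max_retries=3):
    """POST to the API, backing off when it answers 429 Too Many Requests."""
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:
            return resp
        # The 429 body carries retry_after: how many seconds to wait.
        wait = resp.json().get("retry_after", 2 ** attempt)
        time.sleep(wait)
    return resp  # still rate-limited after max_retries attempts
```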

Pricing

  • $0.01 per API call
  • $0.000001 per token (input + output)
Need Higher Limits?
If you need higher rate limits for production use cases, please contact us to discuss enterprise plans.
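Given the per-call and per-token rates above, you can estimate spend with simple arithmetic. estimate_cost below is a hypothetical back-of-the-envelope helper, not an official billing calculator:

```python
COST_PER_CALL = 0.01        # $0.01 per API call
COST_PER_TOKEN = 0.000001   # $0.000001 per token (input + output)

def estimate_cost(num_calls, avg_tokens_per_call):
    """Rough spend estimate from call volume and average token usage."""
    return num_calls * (COST_PER_CALL + avg_tokens_per_call * COST_PER_TOKEN)

# Example: 1,000 calls averaging 500 tokens each:
# 1,000 * $0.01 + 1,000 * 500 * $0.000001 = $10.00 + $0.50 = $10.50
```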

Step 2: Install the SDK

Python

pip install vllm-sdk

For Python, install the vLLM SDK. For JavaScript/Node.js, you can use the native fetch API (no additional packages required).

Step 3: Make Your First Request

Basic Chat Completion

Start with a simple chat completion without steering:

import asyncio
from vllm_sdk import VLLMClient, ChatMessage

async def main():
    async with VLLMClient(api_key="<YOUR_API_KEY>") as client:
        response = await client.chat_completions(
            model="meta-llama/Llama-3.3-70B-Instruct",
            messages=[
                ChatMessage(role="user", content="Tell me about the ocean")
            ],
        )
        print(response.choices[0].message.content)

asyncio.run(main())

Step 4: Add Feature Steering

Now let's add steering to control the model's behavior. We'll use feature 99, which represents "pirate speech patterns":

import asyncio
from vllm_sdk import VLLMClient, ChatMessage, Variant

async def main():
    async with VLLMClient(api_key="<YOUR_API_KEY>") as client:
        variant = Variant("meta-llama/Llama-3.3-70B-Instruct")
        variant.add_intervention(feature_id=99, strength=0.5, mode="add")

        response = await client.chat_completions(
            model=variant,
            messages=[
                ChatMessage(role="user", content="Tell me about the ocean")
            ],
        )
        print(response.choices[0].message.content)
        # Output: "Arr, the ocean be a vast body of water..."

asyncio.run(main())

Understanding Steering Values
  • Positive values (0.1 to 1.0): Amplify the feature's effect
  • Negative values (-1.0 to 0): Suppress the feature's effect
  • Typical range: -0.5 to 0.5 for most use cases
  • Experiment: Start small and adjust based on results
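If you want to guard against out-of-range values while experimenting, a tiny helper can clamp strengths to the -1.0 to 1.0 range described above. clamp_strength is illustrative only, not an SDK function:

```python
def clamp_strength(strength, lo=-1.0, hi=1.0):
    """Keep a steering strength inside the supported range."""
    return max(lo, min(hi, strength))

# clamp_strength(1.5) -> 1.0; clamp_strength(-2.0) -> -1.0
```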

Step 5: Discover Features

Use the Feature Search API to find relevant features for your use case:

import requests

response = requests.post(
    "https://api.goodfire.ai/v1/features/search",
    headers={"X-API-Key": "<YOUR_API_KEY>"},
    json={
        "query": "formal academic writing",
        "model_name": "meta-llama/Llama-3.3-70B-Instruct",
        "top_k": 10
    }
)

features = response.json()["data"]
for feature in features:
    print(f"ID: {feature['id']}, Label: {feature['label']}")

You can also browse features interactively in the Feature Search dashboard.

Step 6: Combine Multiple Features

You can steer on multiple features simultaneously for fine-grained control:

import asyncio
from vllm_sdk import VLLMClient, ChatMessage, Variant

async def main():
    async with VLLMClient(api_key="<YOUR_API_KEY>") as client:
        variant = Variant("meta-llama/Llama-3.3-70B-Instruct")
        variant.add_intervention(feature_id=1234, strength=0.3, mode="add")   # Increase technical detail
        variant.add_intervention(feature_id=5678, strength=-0.2, mode="add")  # Reduce jargon
        variant.add_intervention(feature_id=9012, strength=0.4, mode="add")   # Add enthusiasm

        response = await client.chat_completions(
            model=variant,
            messages=[
                ChatMessage(role="user", content="Explain quantum computing")
            ],
        )
        print(response.choices[0].message.content)

asyncio.run(main())

Next Steps