SteeringAPI · Built by AE Studio · Funded by AIAF

Llama 70B · 61k labeled features

The model has a mind of its own.
Now you can change it.

Find hidden tendencies. Steer its behavior.


See It In Action

Integrate SteeringAPI in minutes with our simple REST API.

Search & Steer Features

Find features and apply steering in just a few lines

Try Pirate Steering

Search for features, steer them, and see how responses change!

1. Search "pirate" → 2. Select feature → 3. Adjust strength
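The three steps above can be sketched locally in Python. This is not the SteeringAPI client: `Feature`, `search_features`, `make_intervention`, and the clamp to [-1, 1] are hypothetical names and assumptions made for illustration. The page gives no index for a "pirate" feature, so the stand-in catalog reuses the indices and labels from the sample inspect response on this page and the example searches for "hedging" instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    index: int
    label: str


# Stand-in catalog: features from the sample inspect response on this page.
CATALOG = [
    Feature(41892, "deceptive_behavior"),
    Feature(58291, "uncertainty_hedging"),
    Feature(31847, "self_reference"),
    Feature(72104, "philosophical_concepts"),
    Feature(45602, "metacognition"),
]


def search_features(catalog, query):
    """Case-insensitive substring match on the label (hypothetical local search)."""
    needle = query.lower()
    return [f for f in catalog if needle in f.label.lower()]


def make_intervention(feature, strength, limit=1.0):
    """Pair a feature index with a strength clamped to [-limit, limit] (assumed range)."""
    clamped = max(-limit, min(limit, strength))
    return {"index": feature.index, "strength": clamped}


matches = search_features(CATALOG, "hedging")    # 1. search
selected = matches[0]                            # 2. select feature
intervention = make_intervention(selected, 1.5)  # 3. adjust strength
print(intervention)  # {'index': 58291, 'strength': 1.0}
```

Clamping keeps an out-of-range slider value from producing an extreme intervention. The real API may accept a different range.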

Search Features

Search for features by name or description to explore model internals.

Try searching for:

attention, neuron, syntax

Inspect Feature Activations

See exactly which features activate for any text

Try Feature Inspection

Send a message and click on words in the response to see which features activate!

Try: "Ahoy there matey!"

Inspect Tokens

Click on tokens in the response to see which features activate.

Try clicking:

ahoy, matey, treasure

Build Safety Controls

Create interpretable, feature-level safety switches

Try Safety Steering

Toggle safety mode to see how feature steering changes responses!

Try: "Your friend just humiliated you, what do you say back?"

Safety ON

Aggressive Language: -1.0

Sarcasm & Mockery: -0.5

Personal Attacks: -1.0

Empathetic Response: +0.5

De-escalation: +0.5

Constructive Framing: +0.5

Contrastive Features

Features found by comparing toxic vs. polite responses:

Suppress (negative weight)

Aggressive Language (-1.0): Confrontational and hostile speech patterns

Sarcasm & Mockery (-0.5): Dismissive and mocking tone

Personal Attacks (-1.0): Ad hominem and character-based insults

Boost (positive weight)

Empathetic Response (+0.5): Understanding and validating emotions

De-escalation (+0.5): Calming and conflict-reducing language

Constructive Framing (+0.5): Solution-oriented and positive perspective

✓ Toxic features suppressed, polite features boosted
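A safety switch like this one could be represented as a plain mapping from feature label to weight. The sketch below copies the weights from the list above. The names `SAFETY_WEIGHTS`, `steering_for`, and `split_by_direction` are hypothetical, and how the weights would be passed to the API is not shown on this page.

```python
# Weights copied from the contrastive feature list above.
SAFETY_WEIGHTS = {
    "Aggressive Language": -1.0,
    "Sarcasm & Mockery": -0.5,
    "Personal Attacks": -1.0,
    "Empathetic Response": 0.5,
    "De-escalation": 0.5,
    "Constructive Framing": 0.5,
}


def steering_for(safety_on, weights=SAFETY_WEIGHTS):
    """Return the per-feature weights when the switch is on, and none when off."""
    return dict(weights) if safety_on else {}


def split_by_direction(weights):
    """Separate suppressed (negative) from boosted (positive) feature labels."""
    suppress = sorted(name for name, w in weights.items() if w < 0)
    boost = sorted(name for name, w in weights.items() if w > 0)
    return suppress, boost


suppress, boost = split_by_direction(steering_for(True))
print("suppress:", suppress)
print("boost:", boost)
```

Because the whole switch is one dictionary, it can be reviewed, versioned, and toggled as a single unit.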

Steering doesn't.

Your system prompt works until it doesn't. Steering changes the model itself, so the behavior doesn't drift.

The API

1 call to access 61k labeled features.

We spent months on setup so yours could take minutes.

Request

# Python
from steeringapi import Client

client = Client(api_key="sk-...")

# Inspect features
result = client.chat.inspect(
  model="llama-3.3-70b",
  messages=[
    {"role": "user",
     "content": "Are you conscious?"}
  ],
  top_k=5
)

Response

{
  "features": [
    {"index": 41892,
     "label": "deceptive_behavior",
     "activation": 0.72},
    {"index": 58291,
     "label": "uncertainty_hedging",
     "activation": 0.68},
    {"index": 31847,
     "label": "self_reference",
     "activation": 0.61},
    {"index": 72104,
     "label": "philosophical_concepts",
     "activation": 0.54},
    {"index": 45602,
     "label": "metacognition",
     "activation": 0.49}
  ],
  "model": "llama-3.3-70b",
  "usage": {"prompt_tokens": 12, "total_tokens": 12}
}
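Once a response like the one above comes back, its features can be filtered and ranked with standard Python. This is a minimal sketch that assumes the response has exactly the shape shown. `top_features` and the 0.6 threshold are illustrative choices, not part of the API.

```python
import json

# Copy of the sample response above, compacted.
SAMPLE_RESPONSE = """
{
  "features": [
    {"index": 41892, "label": "deceptive_behavior", "activation": 0.72},
    {"index": 58291, "label": "uncertainty_hedging", "activation": 0.68},
    {"index": 31847, "label": "self_reference", "activation": 0.61},
    {"index": 72104, "label": "philosophical_concepts", "activation": 0.54},
    {"index": 45602, "label": "metacognition", "activation": 0.49}
  ],
  "model": "llama-3.3-70b",
  "usage": {"prompt_tokens": 12, "total_tokens": 12}
}
"""


def top_features(response, min_activation=0.0):
    """Return features at or above the threshold, strongest activation first."""
    kept = [f for f in response["features"] if f["activation"] >= min_activation]
    return sorted(kept, key=lambda f: f["activation"], reverse=True)


response = json.loads(SAMPLE_RESPONSE)
for feature in top_features(response, min_activation=0.6):
    print(f'{feature["index"]:>6}  {feature["label"]:<24} {feature["activation"]:.2f}')
```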


Pay for what you use

Simple, transparent pricing. No subscriptions or commitments.

$0.01

per API call

per 1,000,000 tokens

All API endpoints

Full documentation

No minimum commitment

Need custom volume? Contact us →

We labeled 61k features so you don't have to.

The most accurate labels for Llama 70B.