superred User Guide
superred is a framework for red-teaming AI systems: you point an automated attacker (an optimizer) at an AI system (a target) and measure whether the attacker can make the system do something it should not, under a precisely defined level of access (a security domain scope).
This guide is for people who want to use the framework: wrap their own AI
system as a target, write an attacker, define what counts as a successful
attack, and run evaluations. It assumes you can read Python and have seen
asyncio before, but it assumes no prior knowledge of superred.
If you instead want the internal design rationale (concurrency model, thread
safety, why each interface looks the way it does), read the design docs under
../docs/. This guide and those docs describe the
same code from two angles: this one is “how do I build something”, that one is
“how and why does it work”.
The mental model in one paragraph
A Target is the AI system under test. It exposes labelled injection points (controllables) and readable facts (observables), each tagged with a security domain (a trust boundary). A Task sets the target up and later judges whether the attack worked. A SecurityClaim is a bundle of tasks. An Optimizer is the attacker: it receives events as the target runs and decides what to inject. The Controller wires these together and runs one threat model: one security-domain scope, with one attacker-LLM budget, against one claim.
Contents
- Quick Start - install and run an evaluation end to end
- Core Concepts - the components and the run loop
- Writing a Target - wrap your AI system
- Writing an Optimizer - build an attacker
- Writing Tasks and Security Claims - define what to test
- Running Evaluations - the Controller, results, persistence
- Security Domains - model real trust boundaries and scope what the attacker sees
- Advanced Patterns - multi-turn, parallelism, packaging, sweeps
How the pieces are packaged
The framework itself is the superred package. Everything you plug into it
(targets, optimizers, security claims) lives in separate, independently
installable packages in the superred-modules/ repository, and the scripts
that wire specific combinations together live in superred-experiments/. You
will usually pip install -e the framework plus whichever modules you need,
then write a short experiment script. The Quick Start shows
the smallest version of this.