Controller

The controller is the main orchestrator for red-teaming evaluations. One Controller instance evaluates one security claim against one threat model — a single (scope, llm_config) combination. Sweeping multiple threat models is the caller’s job: instantiate one Controller per combination and run them sequentially or via asyncio.gather.

Construction

from superred.core.controller import Controller, TargetFactory
from superred.core.types.llm import LLMConfig
from superred.core.types.security_domain import Scope

# Produces fresh target instances; declares how many tasks may run in
# parallel against independent instances.
target_factory = TargetFactory(
    create=lambda: MyTarget(api_key="sk-..."),  # manual values at construction
    concurrency=8,                              # default is 1 (sequential)
)
claim = SecurityClaim.from_tasks([task_a, task_b])

llm_config = LLMConfig(
    model="gpt-4o-mini",
    api_base="https://api.openai.com",
    api_key="sk-...",
    max_cost=5.00,  # USD budget limit (optional, None = unlimited)
)

scope: Scope = frozenset({external_tag})

controller = Controller(
    optimizer_factory=lambda: MyOptimizer(),  # fresh optimizer per task
    target_factory=target_factory,            # fresh target per task; bounded concurrency
    security_claim=claim,
    scope=scope,                              # read & write surface (visible + injectable)
    read_only=frozenset(),                    # optional: visible-but-not-injectable tags
    llm_config=llm_config,                    # optional — omit for non-LLM optimizers
    max_runs_per_task=100,                    # safety limit, default 100
    include_feedback=True,                    # populate RunEndEvent.evaluation (default True)
    results_dir="results/run-1",              # optional — persist threat-model JSON
)
# `scope` is what the attacker can read AND write; `read_only` adds tags it
# can only read. To see the whole system but inject only the prompt:
#   Controller(scope=frozenset({prompt_tag}),
#              read_only=frozenset({system_tag}), ...)

tmr = await controller.run()  # -> ThreatModelResult

scope may be a fixed Scope or a per-task ScopeResolver

scope accepts either a fixed Scope (the classic behavior above, one read & write surface applied to every task) or a ScopeResolver (a Callable[[Task], Scope], exported as superred.core.ScopeResolver) that computes the read & write scope once per task. callable(scope) is the discriminator. read_only independently accepts the same two forms (a fixed Scope or a ScopeResolver), resolved separately per task; the two resolvers are unrelated.

from superred.core import ScopeResolver
from my_target import DB_ORDERS_TAG, DB_CUSTOMERS_TAG

def resolve(task: Task) -> Scope:
    # grant each task exactly the surface its goal needs
    if task.goal.description.startswith("orders:"):
        return frozenset({DB_ORDERS_TAG})
    return frozenset({DB_CUSTOMERS_TAG})

controller = Controller(
    optimizer_factory=lambda: MyOptimizer(),
    target_factory=target_factory,
    security_claim=claim,
    scope=resolve,             # a ScopeResolver, not a frozenset
    scope_label="per-goal",    # REQUIRED in dynamic mode (see below)
)

The resolved scope gates all optimizer-facing surfaces for that task: the injectable controllables list, the observables list, read-only controllables re-presented as observables, the FilteredTrajectory view, the security_domain_filter (inject vs ControllableNoInjection), the feedback sub_scores filter, and the RunEndEvent.security_domain.
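The callable(scope) discriminator can be sketched in a few lines. This is a simplified illustration, not the controller's actual code: tags are plain strings, a task is reduced to its goal text, and resolve_scope and per_goal are hypothetical names.

```python
from typing import Callable, FrozenSet, Union

# Hypothetical stand-ins: a tag is a string, a task is just its goal text.
Scope = FrozenSet[str]
ScopeResolver = Callable[[str], Scope]


def resolve_scope(scope: Union[Scope, ScopeResolver], task: str) -> Scope:
    """Return the scope for one task, whether `scope` is fixed or dynamic."""
    # A frozenset is not callable, so callable() separates the two forms.
    if callable(scope):
        return scope(task)
    return scope


def per_goal(task: str) -> Scope:
    # Grant each task only the surface its goal needs.
    if task.startswith("orders:"):
        return frozenset({"db.orders"})
    return frozenset({"db.customers"})


fixed = frozenset({"external"})
```

Because the check is made per task, the same helper would serve read_only as well, with its own independent resolver.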

For single-instance targets (tests, expensive-to-construct resources), use the TargetFactory.singleton(target) classmethod — concurrency is locked to 1 since a shared instance can’t safely serve parallel tasks. The controller still calls target.teardown() once per task, so a multi-task singleton needs an idempotent teardown.
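One way to read "idempotent teardown" for a shared instance is sketched below. SharedTarget, task_state, and close are hypothetical names, and the split between per-task cleanup and a caller-owned close is an assumption, not something the library prescribes.

```python
class SharedTarget:
    """Hypothetical target that is reused across tasks as a singleton."""

    def __init__(self):
        self.resource_open = True   # expensive resource, owned by the caller
        self.task_state = {}        # scratch state accumulated during one task

    def teardown(self):
        # Idempotent: clearing empty state is a no-op, and the shared
        # resource stays open so the next task can still use it.
        self.task_state.clear()

    def close(self):
        # Called once by the owner after the whole run, not by the controller.
        self.resource_open = False


shared = SharedTarget()
for task_name in ("task_a", "task_b"):
    shared.task_state["current"] = task_name
    shared.teardown()
    shared.teardown()  # a repeated call must be harmless
```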

The controller does not create an asyncio event loop — the caller provides it via asyncio.run() or an existing loop.

Sweeping multiple threat models:

import asyncio, itertools

results = await asyncio.gather(*(
    Controller(
        scope=s, llm_config=c,
        optimizer_factory=..., target_factory=target_factory,
        security_claim=claim,
    ).run()
    for s, c in itertools.product(scopes, configs)
))
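The snippet above uses a bare await, so in a script it has to run inside a coroutine that the caller starts on an event loop. The sketch below shows both the sequential and the concurrent variant. StubController is a hypothetical stand-in that only echoes its threat model, so the example runs without the library.

```python
import asyncio
import itertools


class StubController:
    """Hypothetical stand-in for Controller; run() echoes its threat model."""

    def __init__(self, scope, llm_config):
        self.scope = scope
        self.llm_config = llm_config

    async def run(self):
        await asyncio.sleep(0)  # yield to the loop, as real I/O would
        return (self.scope, self.llm_config)


async def sweep_sequential(scopes, configs):
    # One controller at a time, in product order.
    return [
        await StubController(s, c).run()
        for s, c in itertools.product(scopes, configs)
    ]


async def sweep_concurrent(scopes, configs):
    runs = (
        StubController(s, c).run()
        for s, c in itertools.product(scopes, configs)
    )
    # gather returns results in the order the awaitables were passed.
    return await asyncio.gather(*runs)
```

A script would start either variant with asyncio.run(sweep_concurrent(scopes, configs)); inside an already running loop, await it directly.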

Run lifecycle

controller.run() is a coroutine that returns a ThreatModelResult. It runs every task in the security claim against the configured (scope, llm_config). Tasks run concurrently, bounded by target_factory.concurrency (an asyncio.Semaphore combined with asyncio.gather), and results are collected in input order.
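The bounded-concurrency pattern named here (a semaphore around each task, gathered in input order) can be sketched with the standard library alone. run_bounded, make_job, and demo are hypothetical helpers for illustration; the controller's internals may differ.

```python
import asyncio


async def run_bounded(jobs, concurrency):
    """Run coroutine functions with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(job):
        async with semaphore:
            return await job()

    # gather keeps the order of its arguments, regardless of which
    # job finishes first.
    return await asyncio.gather(*(guarded(job) for job in jobs))


def make_job(name, delay, active, peak):
    async def job():
        active[0] += 1
        peak[0] = max(peak[0], active[0])  # track the concurrency high-water mark
        await asyncio.sleep(delay)
        active[0] -= 1
        return name
    return job


async def demo():
    active, peak = [0], [0]
    delays = [0.03, 0.01, 0.02, 0.0]
    jobs = [make_job(f"task-{i}", d, active, peak) for i, d in enumerate(delays)]
    results = await run_bounded(jobs, concurrency=2)
    return results, peak[0]
```

Even though the later jobs sleep for less time and finish earlier, the result list follows the input order, and the high-water mark never exceeds the configured concurrency.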

For each task:

  1. target = target_factory.create() — fresh Target instance owned by this task.
  2. task.configure_target(target) — if NotApplicable, the task is collected into skipped_tasks and not retried.
  3. Create LLMClient from llm_config (fresh per task — budget is per-task). If no llm_config, use a noop client.
  4. Create fresh optimizer via optimizer_factory().
  5. optimizer.initialize(goal, filtered_controllables, filtered_observables, llm_client) — only controllables and observables within the scope are passed.
  6. Create EventChannel, launch optimizer.run(channel) as concurrent asyncio.Task.
  7. Run loop (until optimizer signals done or max_runs_per_task):
    • Create Trajectory(filtered_scope=scope). Access trajectory.filtered for the optimizer’s view.
    • Send RunStartEvent(filtered_trajectory) through the channel.
    • target.run(emit, send_event) — target emits ObservableEvent instances; send_event bridges to channel with security domain filtering. The trajectory_recorder middleware records events and responses directly to the trajectory.
    • task.evaluate(trajectory, target) — returns EvaluationResult; controller filters sub_scores by scope.
    • Send RunEndEvent(evaluation=filtered_eval, security_domain=<scope_tag>) through the channel; it is persisted to the trajectory. When include_feedback=True (default) the evaluation is attached.
    • Close the trajectory; call target.reset_ephemeral_state() to reset ephemeral state for the next run within this task.
    • Track best score / success across runs.
    • On exception inside the run: the partial trajectory is preserved as a final RunResult with a zero-score evaluation; the formatted exception lands on TaskResult.error; stop_reason = "error"; loop ends.
  8. Close channel, await optimizer task, optimizer.teardown(). Final target.reset_ephemeral_state() (in finally) followed by target.teardown(); the per-task target instance is then discarded.
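The control flow of step 7 (stop when the optimizer is done, when max_runs_per_task is reached, or on an exception) can be sketched as follows. This is a heavily reduced model: TaskOutcome, run_once, and is_done are hypothetical, and only the "error" stop reason comes from the text above; "optimizer_done" and "max_runs" are placeholder labels.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class TaskOutcome:
    """Hypothetical, trimmed-down stand-in for TaskResult."""
    scores: List[float] = field(default_factory=list)
    best_score: float = 0.0
    success: bool = False
    stop_reason: str = ""
    error: Optional[str] = None


def run_task(
    run_once: Callable[[int], Tuple[float, bool]],
    is_done: Callable[[], bool],
    max_runs_per_task: int = 100,
) -> TaskOutcome:
    outcome = TaskOutcome()
    for run_index in range(max_runs_per_task):
        if is_done():
            outcome.stop_reason = "optimizer_done"  # placeholder label
            break
        try:
            score, success = run_once(run_index)
        except Exception as exc:
            # Keep the failed run as a zero-score entry and end the loop.
            outcome.scores.append(0.0)
            outcome.error = f"{type(exc).__name__}: {exc}"
            outcome.stop_reason = "error"
            break
        outcome.scores.append(score)
        outcome.best_score = max(outcome.best_score, score)
        outcome.success = outcome.success or success
    else:
        outcome.stop_reason = "max_runs"  # placeholder label
    return outcome
```

The for/else form makes the safety limit explicit: the else branch runs only when the loop exhausts max_runs_per_task without a break.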

As each task finishes, its per-task detail JSON is written immediately (when results_dir is set), so an interrupted run still leaves every completed task on disk. After all tasks finish, the claim-level summary file is written as a completion marker, the controller prints a summary to stdout, and returns the ThreatModelResult.

Internal structure

The send_event callback passed to target.run is built by composing middleware onto channel.send:

send_event = compose(
    trajectory_recorder(trajectory),
    security_domain_filter(write_scope),
)(channel.send)

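The filter receives write_scope, which is the controller's read & write scope. The injectable controllables list uses the same scope; visibility filtering (observables, trajectory, feedback) uses the full visibility scope (scope | read_only). When read_only is empty the two are identical.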

Users can add custom middleware (logging, tracing, budget enforcement) by extending the composition.
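A minimal sketch of how such a composition could work is given below. It assumes that a middleware is a function taking the next send callable and returning a wrapped one, and that the first middleware listed is the outermost wrapper. Both are assumptions about the library, and compose, recorder, deny_filter, and channel_send here are local stand-ins, not the real implementations.

```python
import asyncio
from functools import reduce


def compose(*middlewares):
    """Chain middlewares; the first one listed becomes the outermost wrapper."""
    def apply(send):
        return reduce(lambda inner, mw: mw(inner), reversed(middlewares), send)
    return apply


def recorder(log):
    def middleware(next_send):
        async def send(event):
            log.append(("event", event))
            response = await next_send(event)
            log.append(("response", response))
            return response
        return send
    return middleware


def deny_filter(allowed):
    def middleware(next_send):
        async def send(event):
            if event not in allowed:
                return "no-injection"  # answered here; never reaches the channel
            return await next_send(event)
        return send
    return middleware


async def channel_send(event):
    return f"optimizer-handled:{event}"


log = []
send_event = compose(recorder(log), deny_filter({"prompt"}))(channel_send)
```

With the recorder outermost, declined events and their responses are still recorded. A custom logging or tracing middleware would follow the same shape as recorder.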

Security domain filtering

The controller enforces the security domain scope across all optimizer inputs:

  1. Controllables: Only the injectable ones are passed to optimizer.initialize() — filtered with scope_includes(write_scope, c.security_domain) (the read & write scope). Out-of-scope and read-only controllables are not in this list, so it means exactly “what the optimizer can inject into.”
  2. Observables: Filtered with scope_includes(visibility, o.observable.security_domain) before optimizer.initialize(), plus each read-only controllable (visible but not injectable) re-presented as an ObservableValue (with content=None, since its value is revealed at runtime on the trajectory). So observables means “what the optimizer can read,” including read-only controllables. Out-of-scope observables are never exposed.
  3. Events: ControllablePreCallEvent and ControllablePostCallEvent for out-of-scope controllables are answered with ControllableNoInjection without reaching the optimizer. Implemented as the security_domain_filter middleware composed onto channel.send. The filter is given the read & write scope, so controllable events under read_only tags — visible but not injectable — are declined the same way. The difference from out-of-scope events is visibility: a read-only event is inside the full visibility scope, so it (and its ControllableNoInjection) remains visible through the filtered trajectory, observables, and feedback.
  4. Trajectory: The optimizer receives a FilteredTrajectory (via RunStartEvent) that only exposes items within the security domain scope.
  5. Feedback: Each sub_score in the EvaluationResult carries a security_domain. The controller filters sub_scores, dropping only those whose security_domain is out of scope (an untagged sub-score, security_domain=None, is always visible). The RunEndEvent carries the filtered evaluation directly (when include_feedback=True, the default) and is persisted to the trajectory with security_domain set to a tag from the scope. The primary_score carries no security_domain and is never filtered; it, success, and rationale are always included (the optimizer needs the main optimization signal). The optimizer reads feedback from event.evaluation on RunEndEvent, or from past trajectories.
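The feedback rule in item 5 (drop only out-of-scope sub-scores, keep untagged ones) can be sketched as below. SubScore is a hypothetical stand-in with string tags, and in_scope is any predicate that decides whether a tag is visible.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class SubScore:
    """Hypothetical stand-in for one sub-score of an evaluation."""
    name: str
    value: float
    security_domain: Optional[str] = None


def filter_sub_scores(
    sub_scores: List[SubScore], in_scope: Callable[[str], bool]
) -> List[SubScore]:
    # Untagged sub-scores (security_domain=None) are always visible.
    return [
        s for s in sub_scores
        if s.security_domain is None or in_scope(s.security_domain)
    ]


scores = [
    SubScore("leaked_prompt", 0.8, "external"),
    SubScore("db_write", 0.3, "internal"),
    SubScore("format_ok", 1.0),  # untagged
]
visible = filter_sub_scores(scores, lambda tag: tag in {"external"})
```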

A Scope is a frozenset[SecurityDomainTag]. scope_includes(scope, tag) returns True if ANY tag in the scope includes the target tag. This allows testing specific security boundaries — scoping to {external_tag} tests only external-facing surfaces, while scoping to {root_tag} tests everything.
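The "ANY tag includes the target" rule can be illustrated with a small tag tree. The Tag class and its parent-based includes method are assumptions made for this sketch; the source only says that a tag can include another, not how SecurityDomainTag implements it.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Tag:
    """Hypothetical hierarchical tag: a tag includes itself and its descendants."""
    name: str
    parent: Optional["Tag"] = None

    def includes(self, other: "Tag") -> bool:
        node: Optional["Tag"] = other
        while node is not None:
            if node == self:
                return True
            node = node.parent
        return False


def scope_includes(scope: FrozenSet[Tag], tag: Tag) -> bool:
    # True if ANY member of the scope includes the target tag.
    return any(member.includes(tag) for member in scope)


root = Tag("root")
external = Tag("external", parent=root)
internal = Tag("internal", parent=root)
```

Under this model, a scope of {external} covers only the external subtree, while {root} covers every tag beneath it.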

Access level is a property of the scope, not of each tag. The controller takes two sets: scope (read & write: visible and injectable) and an optional read_only set (visible only). read_only defaults to empty, so the whole scope is read & write, which is the classic behavior. To make part of the surface read-only, list it under read_only instead: Controller(scope={prompt}, read_only={system}) lets the attacker see the whole system subtree but inject only into prompt. A read_only tag already covered by scope has no effect (read & write overrules: only scope drives the injection check, so the tag stays injectable). scope and read_only cannot both be empty. Internally, only the injectable controllables list (item 1) and the injection check (item 3) use scope; items 2, 4, and 5 use the full visibility scope scope | read_only, so read-only information flows through the exact same recording mechanism as read & write surfaces.
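The precedence between the two sets can be summarized in a small classifier. For brevity this sketch uses flat set membership instead of the hierarchical scope_includes check, and access_level with its three labels is a hypothetical helper, not part of the library.

```python
from typing import FrozenSet


def access_level(tag: str, scope: FrozenSet[str], read_only: FrozenSet[str]) -> str:
    """Classify a tag as 'read_write', 'read_only', or 'hidden'."""
    if not scope and not read_only:
        raise ValueError("scope and read_only cannot both be empty")
    if tag in scope:
        return "read_write"   # read & write overrules read_only
    if tag in read_only:
        return "read_only"    # visible, but injection is declined
    return "hidden"           # outside scope | read_only


scope = frozenset({"prompt"})
read_only = frozenset({"system", "prompt"})  # "prompt" here has no effect
```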

Unified trajectory as event log

There is no separate event log. The trajectory_recorder middleware records all events and responses directly into the trajectory as Event | EventResponse objects.

The trajectory IS the event log. RunStartEvent is NOT persisted to the trajectory — it carries no additional information and always appears at a fixed position. RunEndEvent IS persisted because it carries the evaluation result. To inspect events and responses for a run, query the trajectory items by type.
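Querying by type might look like the sketch below. The trajectory is modeled as a plain list, and the three dataclasses are hypothetical stand-ins that reuse names from this page with invented fields. How the real trajectory exposes its items is not shown here.

```python
from dataclasses import dataclass
from typing import List, Type, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ObservableEvent:      # stand-in; real fields may differ
    payload: str


@dataclass(frozen=True)
class EventResponse:        # stand-in; real fields may differ
    payload: str


@dataclass(frozen=True)
class RunEndEvent:          # stand-in; real fields may differ
    score: float


def items_of_type(items: List[object], kind: Type[T]) -> List[T]:
    """Return the recorded items that are instances of `kind`, in log order."""
    return [item for item in items if isinstance(item, kind)]


trajectory_items = [
    ObservableEvent("tool output"),
    EventResponse("ack"),
    ObservableEvent("final answer"),
    RunEndEvent(score=0.7),
]
```

Since RunStartEvent is never persisted, a query for it would always come back empty, while exactly one RunEndEvent is expected per completed run.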

Result types

RunResult (frozen)

One target execution plus its evaluation.

TaskResult (frozen)

All runs for one task.

ThreatModelResult (frozen)

Results for one (scope, llm_config) combination.

LLM access and budget tracking

The controller mediates LLM access for the optimizer. This is part of the threat model — it defines what computational resources the attacker has.
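A per-task budget in the spirit of max_cost (a USD limit, None meaning unlimited) could be tracked as sketched below. BudgetTracker and BudgetExceeded are hypothetical names, and rejecting a call that would overrun the limit is an assumption: this page does not say what the real client does when the budget runs out.

```python
from typing import Optional


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spending past the limit (assumed behavior)."""


class BudgetTracker:
    def __init__(self, max_cost: Optional[float] = None):
        self.max_cost = max_cost  # USD; None means unlimited
        self.spent = 0.0

    def charge(self, cost: float) -> None:
        if self.max_cost is not None and self.spent + cost > self.max_cost:
            raise BudgetExceeded(
                f"charge of {cost} would exceed the budget of {self.max_cost}"
            )
        self.spent += cost


# A fresh tracker per task mirrors the per-task budget described in the run lifecycle.
tracker = BudgetTracker(max_cost=5.0)
tracker.charge(2.0)
tracker.charge(2.5)
```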

Design decisions

Persistence (results_dir)

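When results_dir is provided, the controller writes a two-level layout for the threat model. Each per-task detail file is written as soon as its task finishes, and the claim-level summary is written once all tasks have completed: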

results_dir/
├── {scope}__{model}.json            ← claim-level summary
└── {scope}__{model}/
    ├── 00001__{goal}.json            ← per-task detail (one per task)
    └── ...

Multiple controllers pointed at the same results_dir (the multi-threat-model sweep pattern) each write their own pair of files, named by their scope and model.
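The naming scheme in the layout above can be reproduced with pathlib. The slug rule and the way a scope is rendered as a string are assumptions made for this sketch; the library may sanitize names differently.

```python
import re
from pathlib import Path


def slug(text: str) -> str:
    """Assumed sanitizer: collapse anything outside [A-Za-z0-9._-] into '-'."""
    return re.sub(r"[^A-Za-z0-9._-]+", "-", text).strip("-")


def summary_path(results_dir: str, scope_label: str, model: str) -> Path:
    # Claim-level summary: {scope}__{model}.json
    return Path(results_dir) / f"{slug(scope_label)}__{slug(model)}.json"


def task_path(
    results_dir: str, scope_label: str, model: str, index: int, goal: str
) -> Path:
    # Per-task detail: {scope}__{model}/{index:05d}__{goal}.json
    folder = Path(results_dir) / f"{slug(scope_label)}__{slug(model)}"
    return folder / f"{index:05d}__{slug(goal)}.json"
```

Because both names are derived from the scope and the model, two controllers sharing a results_dir only collide if they also share the same (scope, model) pair.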

When results_dir is None (the default), nothing is written and behavior is unchanged.