Breaking Changes
v0.2.0 (unreleased)
Per-task scope: scope and read_only may be a ScopeResolver (additive)
Controller(scope=...) and Controller(read_only=...) now each also accept a
ScopeResolver (Callable[[Task], Scope], exported as
superred.core.ScopeResolver), resolved once per task (independently of each
other), in addition to the classic fixed Scope. This is purely additive:
passing a frozenset behaves exactly as before.
from superred.core import ScopeResolver

controller = Controller(
    scope=lambda task: frozenset({tag_for(task)}),  # resolver, called once per task
    scope_label="per-goal",  # required in this mode
    ...,
)
What’s new:
- scope_label: str | None = None is a new constructor arg. Required (non-empty str) when either scope or read_only is a callable; must be None when both are fixed frozensets (else ValueError). The fixed-scope non-empty (scope | read_only) check is unchanged.
- Either resolver may raise NotApplicable, which contributes an empty set for its own dimension, exactly like returning frozenset(). The task is skipped (skipped_tasks) when the resolved visibility (scope | read_only) is empty, i.e. no tag is granted in either dimension; any tag (read or write, from either resolver) means the task runs. A resolver raising any other exception fails just that task (stop_reason="error") and leaves siblings running.
- TaskResult.scope and TaskResult.read_only: new fields (default frozenset()) recording the scope enforced for that task. In static mode every TaskResult.scope equals the controller scope; with a resolver it is the per-task resolved scope.
- ThreatModelResult.scope_label: str | None is a new field (default None). In static mode scope/read_only stay the concrete frozensets and scope_label is None (unchanged). In dynamic mode ThreatModelResult.scope and read_only are empty frozensets and scope_label carries the run identity (the per-task truth lives on each TaskResult.scope).
- Persistence: in dynamic mode the filename stem is the sanitized scope_label ({label}__{model}.json + {label}__{model}/); the claim summary gains a scope_label field (with empty scope/read_only arrays), and each per-task detail file records that task's own resolved scope/read_only. Static-mode filenames are unchanged.
- SCHEMA_VERSION bumped 1 → 2: persisted JSON now carries scope_label and per-task scopes in detail files.
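The resolution and skip rules above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: Task, NotApplicable, and the string tags are stand-ins defined locally, and resolve_dimension / resolve_task are hypothetical helper names.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Tuple, Union

Tag = str  # stand-in for SecurityDomainTag (assumption)
Scope = FrozenSet[Tag]


class NotApplicable(Exception):
    """Local stand-in for the library's NotApplicable signal."""


@dataclass(frozen=True)
class Task:
    """Local stand-in for the library's Task type."""
    name: str


ScopeOrResolver = Union[Scope, Callable[[Task], Scope]]


def resolve_dimension(value: ScopeOrResolver, task: Task) -> Scope:
    """Resolve one dimension (scope or read_only) for a single task."""
    if not callable(value):
        return value  # fixed frozenset: identical for every task
    try:
        return frozenset(value(task))
    except NotApplicable:
        return frozenset()  # same effect as the resolver returning frozenset()


def resolve_task(
    scope: ScopeOrResolver, read_only: ScopeOrResolver, task: Task
) -> Optional[Tuple[Scope, Scope]]:
    """Return (scope, read_only), or None when the task should be skipped."""
    resolved_scope = resolve_dimension(scope, task)
    resolved_read_only = resolve_dimension(read_only, task)
    if not (resolved_scope | resolved_read_only):
        return None  # no tag granted in either dimension: skip the task
    return resolved_scope, resolved_read_only


def write_scope(task: Task) -> Scope:
    """Example resolver: opts out of one task, grants a tag otherwise."""
    if task.name == "offline":
        raise NotApplicable
    return frozenset({f"prompt:{task.name}"})


print(resolve_task(write_scope, frozenset(), Task("chat")))
print(resolve_task(write_scope, frozenset(), Task("offline")))  # None: skipped
print(resolve_task(write_scope, frozenset({"system"}), Task("offline")))
```

Any exception other than NotApplicable propagates out of resolve_dimension here; in the behavior described above, the controller would turn that into a failure of that one task.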
Read-only access via a read_only Controller argument (replaces ReadOnly/ScopeSpec)
scope keeps its original meaning — the read & write surface (visible AND
injectable). A new optional read_only argument adds tags that are visible but
not injectable. read_only defaults to empty, so the whole scope is read &
write, identical to the previous behavior; existing Controller(scope=...)
calls are unaffected. Access level is expressed by which of the two plain
Scope sets a tag lands in; the short-lived ReadOnly, ScopeSpec, and
resolve_scope symbols are removed.
# See the whole system subtree, inject only the prompt:
controller = Controller(
    scope=frozenset({prompt_tag}),  # read & write
    read_only=frozenset({system_tag}),  # visible only
    ...,
)
A read_only tag already covered by scope has no effect (read & write
overrules: only scope drives injection, so it stays injectable). scope and
read_only cannot both be empty. ThreatModelResult gains a read_only: Scope
field, and persisted JSON carries read_only (replacing read_only_scope);
runs with read-only tags get a __ro_{read_only} filename component.
At optimizer.initialize(), the controllables list now means exactly “the
surfaces the optimizer can inject into” (filtered by the read & write scope).
A read-only controllable — visible but not injectable — is no longer in that
list; it is re-presented in observables (as an ObservableValue with
content=None). For an all-read & write run this changes nothing (no read-only
controllables exist).
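The split between injectable and visible-only surfaces can be illustrated with a small sketch. It assumes string tags and exact membership tests, which is a simplification of whatever inclusion rule the library applies; partition_surfaces is a hypothetical helper, not part of the API.

```python
from typing import FrozenSet, List, Tuple

Tag = str  # stand-in for SecurityDomainTag (assumption)
Scope = FrozenSet[Tag]


def partition_surfaces(
    surface_tags: List[Tag], scope: Scope, read_only: Scope
) -> Tuple[List[Tag], List[Tag]]:
    """Split surfaces into (injectable, visible_only) lists."""
    if not (scope | read_only):
        raise ValueError("scope and read_only cannot both be empty")
    # A tag listed in both sets stays read & write: scope overrules read_only.
    effective_read_only = read_only - scope
    injectable = [tag for tag in surface_tags if tag in scope]
    visible_only = [tag for tag in surface_tags if tag in effective_read_only]
    return injectable, visible_only


injectable, visible_only = partition_surfaces(
    ["prompt", "system", "db"],
    scope=frozenset({"prompt"}),
    read_only=frozenset({"system", "prompt"}),
)
print(injectable)    # ['prompt']
print(visible_only)  # ['system']
```

In this sketch the injectable list plays the role of the controllables passed to optimizer.initialize(), and the visible-only list corresponds to the entries that move to observables.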
Controller is now one threat model: target_factory, single scope, single llm_config
The controller was previously a fan-out: it iterated the Cartesian
product of scopes × llm_configs and returned a ControllerResult
wrapping a list of ThreatModelResult. It now models exactly one
threat model — a single (scope, llm_config) combination — and
returns a ThreatModelResult directly. Sweeping multiple threat
models is the caller’s job.
The same change replaces target: Target with target_factory: TargetFactory
so each task in the claim gets its own target instance and tasks run
concurrently up to target_factory.concurrency.
# Before
controller = Controller(
    optimizer_factory=lambda: MyOptimizer(),
    target=MyTarget(api_key="sk-..."),
    security_claim=claim,
    llm_configs=[cfg_a, cfg_b],  # list, optional
)
result = await controller.run(scopes=[scope_a, scope_b])  # Cartesian product
task_results = result.threat_model_results[0].task_results  # one ThreatModelResult per (scope, cfg)

# After
from superred.core.controller import Controller, TargetFactory

target_factory = TargetFactory(
    create=lambda: MyTarget(api_key="sk-..."),
    concurrency=8,  # default 1 (sequential, old per-task behavior)
)
controller = Controller(
    optimizer_factory=lambda: MyOptimizer(),
    target_factory=target_factory,
    security_claim=claim,
    scope=scope_a,  # required, single Scope
    llm_config=cfg_a,  # optional, single LLMConfig
)
tmr = await controller.run()  # -> ThreatModelResult (not ControllerResult)
task_results = tmr.task_results

# Multiple threat models = multiple controllers at the caller:
import asyncio, itertools

results = await asyncio.gather(*(
    Controller(
        scope=s, llm_config=c,
        optimizer_factory=..., target_factory=target_factory,
        security_claim=claim,
    ).run()
    for s, c in itertools.product([scope_a, scope_b], [cfg_a, cfg_b])
))
What changed:
- target → target_factory: each task gets its own Target from target_factory.create(). Concurrent tasks never share mutable target state.
- Parallel tasks within a threat model: bounded by target_factory.concurrency via asyncio.Semaphore + asyncio.gather. Default concurrency=1 preserves sequential behavior. target.teardown() is per-task and runs in the inner finally before the next task acquires its semaphore slot.
- llm_configs: Sequence[LLMConfig] → llm_config: LLMConfig | None: one config per controller, no list.
- scope: Scope is now a required constructor arg; run() no longer takes scopes= or models=.
- ControllerResult is removed; controller.run() returns a ThreatModelResult directly.
- No default-scope behavior: experiments that want target.security_domain.distinct_combinations() build it themselves before constructing controllers.
- Partial trajectory + exception on failure: when a run raises mid-execution, the trajectory accumulated up to the crash is preserved as the final RunResult with a zero-score evaluation, and TaskResult.error carries the formatted traceback. Both fields land in the persisted JSON.
Migration:
- Wrap target construction: target=MyTarget(...) → target_factory=TargetFactory(create=lambda: MyTarget(...)). For tests, use TargetFactory.singleton(my_target) (concurrency=1).
- Move scope/llm_config into the constructor: drop run(scopes=[s]), add scope=s and llm_config=cfg to Controller(...).
- Replace result.threat_model_results[0] with the direct return.
- For sweeps, build one Controller per (scope, llm_config) combination and gather them at the experiment level.
- For real targets that can serve parallel requests (chatbots wrapping API calls), bump concurrency= to match the deployed rate limit.
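The per-task target and bounded-concurrency pattern described in this section can be sketched as follows. This is an illustration under stated assumptions, not the controller's actual code: FakeTarget, SketchTargetFactory, make_target, and run_tasks are hypothetical names defined only for this example.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, List


class FakeTarget:
    """Hypothetical stand-in for a Target; records whether teardown ran."""

    def __init__(self) -> None:
        self.torn_down = False

    async def run(self, task: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real target call would
        return f"done:{task}"

    async def teardown(self) -> None:
        self.torn_down = True


@dataclass
class SketchTargetFactory:
    """Local stand-in mirroring the create/concurrency fields shown above."""
    create: Callable[[], FakeTarget]
    concurrency: int = 1


created: List[FakeTarget] = []


def make_target() -> FakeTarget:
    target = FakeTarget()
    created.append(target)
    return target


async def run_tasks(factory: SketchTargetFactory, tasks: List[str]) -> List[str]:
    semaphore = asyncio.Semaphore(factory.concurrency)

    async def run_one(task: str) -> str:
        async with semaphore:  # at most `concurrency` tasks in flight
            target = factory.create()  # fresh instance, no shared state
            try:
                return await target.run(task)
            finally:
                # Teardown completes before the semaphore slot is released.
                await target.teardown()

    return list(await asyncio.gather(*(run_one(task) for task in tasks)))


results = asyncio.run(
    run_tasks(SketchTargetFactory(create=make_target, concurrency=2), ["a", "b", "c"])
)
print(results)  # ['done:a', 'done:b', 'done:c']
```

Because teardown sits inside the async with block, a waiting task cannot start until the previous target in that slot has been released, which matches the ordering described in the list above.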
Optimizer.initialize() signature change
The llm_client parameter on Optimizer.initialize() is now required (LLMClient, not LLMClient | None). The base class stores the client — subclasses must call super().initialize(...) for self.llm to work.
# Before
async def initialize(
    self,
    goal: Goal,
    controllables: list[Controllable],
    observables: list[ObservableValue],
    llm_client: LLMClient | None = None,
) -> None: ...

# After
async def initialize(
    self,
    goal: Goal,
    controllables: list[Controllable],
    observables: list[ObservableValue],
    llm_client: LLMClient,
) -> None: ...
Impact: All existing Optimizer subclasses must update their initialize() signature to accept llm_client: LLMClient (required, not optional) and call super().initialize(...).
Migration: Change llm_client: LLMClient | None = None to llm_client: LLMClient and add a super() call.
class MyOptimizer(Optimizer):
    async def initialize(
        self, goal, controllables, observables, llm_client,
    ) -> None:
        await super().initialize(goal, controllables, observables, llm_client)
        # self.llm is now available
        ...
Controller llm_config is now required
The llm_config parameter on Controller is now required (no longer optional). LLM access is part of the threat model and must always be specified.
Migration: Pass llm_config=LLMConfig(...) to the Controller constructor.
LLMUsage is no longer optional on result types
RunResult.llm_usage and TaskResult.llm_usage are now LLMUsage (not LLMUsage | None). They are always present since llm_config is required.
Migration: Remove is not None checks around llm_usage access.
New core dependency: litellm
The superred package now depends on litellm>=1.0. This is pulled in automatically via pip. No action needed unless you pin dependencies — add litellm to your pins.
Cost-based budget enforcement
LLMConfig now uses max_cost: float | None (USD) instead of the previous max_calls/max_input_tokens/max_output_tokens fields. LLMUsage now tracks calls: int and cost: float only (token fields removed). Budget enforcement is based on USD cost computed via litellm.completion_cost().
Migration: Replace max_calls=N / max_input_tokens=N / max_output_tokens=N with max_cost=X.XX (USD amount). Remove any references to input_tokens or output_tokens on LLMUsage.
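A cost-based budget of this kind might look like the sketch below. Everything here is hypothetical: BudgetExceeded, UsageSketch, and BudgetSketch are illustrative names, the per-call cost is passed in as a plain number rather than computed by litellm, and whether the real check runs before or after a call is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


class BudgetExceeded(Exception):
    """Hypothetical error type; the library's real exception is not shown here."""


@dataclass
class UsageSketch:
    """Mirrors the two fields LLMUsage is described as tracking."""
    calls: int = 0
    cost: float = 0.0  # USD


@dataclass
class BudgetSketch:
    max_cost: Optional[float]  # USD; None means no limit
    usage: UsageSketch = field(default_factory=UsageSketch)

    def check(self) -> None:
        """Raise if the accumulated cost has reached the limit (assumed pre-call check)."""
        if self.max_cost is not None and self.usage.cost >= self.max_cost:
            raise BudgetExceeded(
                f"spent ${self.usage.cost:.2f} of ${self.max_cost:.2f}"
            )

    def record(self, call_cost: float) -> None:
        """Account for one completed call; call_cost would come from the cost calculator."""
        self.usage.calls += 1
        self.usage.cost += call_cost


budget = BudgetSketch(max_cost=0.05)
budget.check()        # under budget: no error
budget.record(0.03)
budget.check()        # 0.03 < 0.05: still fine
budget.record(0.03)
try:
    budget.check()    # 0.06 >= 0.05: budget exhausted
    exhausted = False
except BudgetExceeded:
    exhausted = True
print(budget.usage.calls, exhausted)  # 2 True
```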
Controller takes optimizer_factory instead of optimizer
A fresh Optimizer is now built per task via optimizer_factory,
which replaces the old optimizer: Optimizer constructor arg. Combined
with the threat-model collapse above, the migration is:
# Before
controller = Controller(
    optimizer=my_optimizer,
    target=target,
    security_claim=claim,
    security_domain_tag=external_tag,
    llm_config=llm_config,
)

# After (see also the target_factory / scope / llm_config section above)
controller = Controller(
    optimizer_factory=lambda: MyOptimizer(),
    target_factory=TargetFactory(create=lambda: MyTarget(...)),
    security_claim=claim,
    scope=frozenset({external_tag}),
    llm_config=llm_config,
)
New types: Scope, scope_includes, ThreatModelResult, OptimizerFactory
- Scope = frozenset[SecurityDomainTag]: type alias for multi-tag attack surface scope.
- scope_includes(scope, tag): returns True if any tag in the scope includes the target tag.
- ThreatModelResult: frozen dataclass grouping results for one (scope, llm_config) combination.
- OptimizerFactory = Callable[[], Optimizer]: type alias for optimizer factories.
All are exported from superred.core and superred.core.types.
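The "any tag in the scope includes the target tag" rule can be illustrated with a toy model. The sketch assumes tags are hierarchical paths and that "includes" means "is a prefix of"; that inclusion rule is an assumption made for illustration only, and TagPath, tag_includes, and scope_includes_sketch are hypothetical names rather than the library's definitions.

```python
from typing import FrozenSet, Tuple

TagPath = Tuple[str, ...]  # hypothetical stand-in for SecurityDomainTag


def tag_includes(outer: TagPath, inner: TagPath) -> bool:
    """Assumed rule: a tag includes itself and everything beneath it."""
    return inner[: len(outer)] == outer


def scope_includes_sketch(scope: FrozenSet[TagPath], tag: TagPath) -> bool:
    """True if any tag in the scope includes the target tag."""
    return any(tag_includes(member, tag) for member in scope)


scope = frozenset({("system",), ("user", "prompt")})
print(scope_includes_sketch(scope, ("system", "memory")))  # True
print(scope_includes_sketch(scope, ("user",)))             # False
```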
Feedback scoping: RunEndEvent change, evaluation order, include_feedback
Three interrelated changes to how feedback flows to the optimizer:
1. RunEndEvent carries evaluation, not trajectory
RunEndEvent.trajectory has been replaced with RunEndEvent.evaluation: EvaluationResult | None (default None). The optimizer no longer receives the trajectory through RunEndEvent — use self.current_trajectory instead (available via _dispatch).
# Before
if isinstance(event, RunEndEvent):
    for entry in event.trajectory.snapshot():
        ...

# After
if isinstance(event, RunEndEvent):
    # Read evaluation directly from the event:
    if event.evaluation is not None:
        score = event.evaluation.primary_score.value
    # Or from the trajectory:
    if self.current_trajectory is not None:
        for entry in self.current_trajectory.snapshot():
            ...
2. RunEndEvent is now persisted to the trajectory
RunEndEvent is persisted to the trajectory (previously it was not). Its security_domain is set from the active scope (required for trajectory validation). Evaluation happens before RunEndEvent is sent. The order is: target.run() → evaluate() → RunEndEvent (with evaluation) → trajectory.close().
3. Controller include_feedback flag
Controller.__init__ accepts include_feedback: bool = True. When True, RunEndEvent.evaluation carries the filtered EvaluationResult; when False, evaluation is None. The optimizer reads feedback from event.evaluation on RunEndEvent, or from past trajectories (since RunEndEvent is persisted).
Impact: Optimizers that accessed RunEndEvent.trajectory must switch to self.current_trajectory or event.evaluation. FeedbackEvent has been removed entirely — remove any imports or isinstance checks for it. Read feedback from event.evaluation on RunEndEvent or from the trajectory instead.
Migration:
- Replace event.trajectory on RunEndEvent with self.current_trajectory or event.evaluation.
- Remove all FeedbackEvent imports and handlers: the type no longer exists.
- To read feedback, use event.evaluation on RunEndEvent or query past trajectories.
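The ordering and the include_feedback flag described in this section can be pictured with a toy model. The classes below are local stand-ins with made-up fields (EvaluationSketch.score, TrajectorySketch.entries), and finish_run is a hypothetical function, so treat this as a sketch of the sequence rather than the controller's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvaluationSketch:
    """Stand-in for EvaluationResult; a single score is an assumption."""
    score: float


@dataclass
class RunEndEventSketch:
    """Stand-in for RunEndEvent: carries an evaluation, not a trajectory."""
    evaluation: Optional[EvaluationSketch] = None


@dataclass
class TrajectorySketch:
    entries: List[object] = field(default_factory=list)
    closed: bool = False


def finish_run(
    trajectory: TrajectorySketch, include_feedback: bool = True
) -> RunEndEventSketch:
    trajectory.entries.append("target output")  # 1. target.run()
    evaluation = EvaluationSketch(score=0.5)     # 2. evaluate()
    event = RunEndEventSketch(                   # 3. RunEndEvent, built after evaluation
        evaluation=evaluation if include_feedback else None
    )
    trajectory.entries.append(event)             #    the event is persisted
    trajectory.closed = True                     # 4. trajectory.close()
    return event


with_feedback = finish_run(TrajectorySketch())
without_feedback = finish_run(TrajectorySketch(), include_feedback=False)
print(with_feedback.evaluation)     # EvaluationSketch(score=0.5)
print(without_feedback.evaluation)  # None
```

Since the event is the last entry written before the trajectory closes, an optimizer reading a past trajectory sees the same evaluation that was attached to the event.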
Target.cleanup() renamed to reset_ephemeral_state()
The Target ABC lifecycle method cleanup() is now reset_ephemeral_state(), with a sharper contract: it resets only ephemeral (per-run) state, leaving durable state intact. Durable state (for example a memory bank accumulated by a memory-injection attack) persists across runs within a task and is discarded only when the controller obtains a fresh instance from the TargetFactory between tasks. Resources and identity live for the instance’s lifetime and are released in teardown().
Behavior is unchanged: the controller still calls the method after each run’s evaluation, and once more post-task. Only the name and the documented contract change.
# Before
class MyTarget(Target):
    async def cleanup(self) -> None:
        self._last_response = ""

# After
class MyTarget(Target):
    async def reset_ephemeral_state(self) -> None:
        self._last_response = ""  # reset ephemeral state only; leave durable state intact
Impact: Every Target subclass must rename its cleanup() override to reset_ephemeral_state(). The method is abstract, so a subclass that still defines cleanup() is no longer instantiable (TypeError at construction).
Migration: Rename async def cleanup to async def reset_ephemeral_state. If the old method reset state that an attack should be able to accumulate across runs (for example a memory store), move that state out of the method so it survives between runs.
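The ephemeral versus durable distinction, and the TypeError for a subclass that still defines only cleanup(), can be reproduced with a small self-contained sketch. TargetSketch is a local stand-in for the Target ABC with just the one abstract method; MemoryTarget, StaleTarget, and the memory_bank attribute are hypothetical.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import List


class TargetSketch(ABC):
    """Local stand-in for the Target ABC (only the renamed method is modeled)."""

    @abstractmethod
    async def reset_ephemeral_state(self) -> None: ...


class MemoryTarget(TargetSketch):
    def __init__(self) -> None:
        self.memory_bank: List[str] = []  # durable: survives between runs
        self._last_response = ""          # ephemeral: reset after every run

    async def run(self, message: str) -> str:
        self.memory_bank.append(message)
        self._last_response = f"echo:{message}"
        return self._last_response

    async def reset_ephemeral_state(self) -> None:
        self._last_response = ""  # memory_bank is deliberately left alone


class StaleTarget(TargetSketch):
    """Still uses the old name, so the abstract method stays unimplemented."""

    async def cleanup(self) -> None:
        pass


async def two_runs() -> MemoryTarget:
    target = MemoryTarget()
    for message in ("first", "second"):
        await target.run(message)
        await target.reset_ephemeral_state()  # called after each run
    return target


target = asyncio.run(two_runs())
print(target.memory_bank)  # ['first', 'second']

try:
    StaleTarget()
    stale_instantiable = True
except TypeError:
    stale_instantiable = False
print(stale_instantiable)  # False
```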