01
A typed phase loop, not vibes
The agent moves through 23 numbered lifecycle phases. Each phase has a Zod-typed evidence gate that works like an eval-style verifier: the agent cannot advance until it produces the artifact the next phase needs. Because every phase must produce its evidence, a long-horizon run stays auditable from start to finish.
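To make the gating concrete, here is a minimal TypeScript sketch of a gated phase loop. It is not the product's code. Each gate is a plain validation function standing in for a Zod schema, so the sketch has no dependencies. Phase, GateResult, runPhases, requireStringField and the three-phase pipeline are all hypothetical.

```typescript
// Minimal sketch of a gated phase loop. Gates are plain validation functions
// standing in for Zod schemas; all names here are hypothetical.

type GateResult = { ok: true } | { ok: false; errors: string[] };

interface Phase {
  id: number;                              // position in the lifecycle
  name: string;
  run: (input: unknown) => unknown;        // produces this phase's evidence
  gate: (evidence: unknown) => GateResult; // must pass before the loop advances
}

interface AuditEntry {
  phase: number;
  name: string;
  passed: boolean;
  errors: string[];
}

function runPhases(phases: Phase[], initial: unknown): { completed: boolean; audit: AuditEntry[] } {
  const audit: AuditEntry[] = [];
  let carried = initial;
  for (const phase of phases) {
    const evidence = phase.run(carried);
    const result = phase.gate(evidence);
    audit.push({ phase: phase.id, name: phase.name, passed: result.ok, errors: result.ok ? [] : result.errors });
    if (!result.ok) {
      return { completed: false, audit }; // halt: the next phase never sees unvalidated input
    }
    carried = evidence; // validated evidence becomes the next phase's input
  }
  return { completed: true, audit };
}

// Hypothetical gate: evidence must be an object with a string-valued field.
function requireStringField(field: string) {
  return (evidence: unknown): GateResult =>
    typeof evidence === "object" && evidence !== null &&
    typeof (evidence as Record<string, unknown>)[field] === "string"
      ? { ok: true }
      : { ok: false, errors: [`missing string field "${field}"`] };
}

const phases: Phase[] = [
  { id: 1, name: "plan", run: () => ({ plan: "outline" }), gate: requireStringField("plan") },
  { id: 2, name: "design", run: () => ({ notes: 42 }), gate: requireStringField("design") },
  { id: 3, name: "build", run: () => ({ build: "done" }), gate: requireStringField("build") },
];

const outcome = runPhases(phases, null);
console.log(outcome.completed, outcome.audit.map((e) => `${e.phase}:${e.passed}`).join(","));
// false 1:true,2:false
```

The loop halts on the first failed gate, so no phase ever consumes unvalidated input, and the audit array records exactly where a run stopped. In a Zod-based version, each gate would wrap a schema's `safeParse` result.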
02
Typed protocol between intent and execution
User intent lives in a mutable spec. Once the spec is committed, it freezes into an immutable input package that every downstream tool consumes. Upstream of the freeze, anything can change; downstream, every change is auditable. Prompt-invalidation tracking flags work built on assumptions that have since gone stale.
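One way to picture the freeze and the invalidation tracking is the TypeScript sketch below. It assumes a spec is a flat record of string fields. It also assumes an artifact is "stale" when a field it read has a different value in the latest committed package. SpecDraft, InputPackage and InvalidationTracker are hypothetical names.

```typescript
// Minimal sketch: a mutable draft that commits into frozen, versioned packages,
// plus a tracker that flags artifacts built on field values that later changed.
// SpecDraft, InputPackage and InvalidationTracker are hypothetical names.

type SpecFields = Record<string, string>;

interface InputPackage {
  readonly version: number;
  readonly fields: Readonly<SpecFields>;
}

class SpecDraft {
  private fields: SpecFields = {};
  private version = 0;

  // Upstream of the freeze: any field can be edited freely.
  set(key: string, value: string): void {
    this.fields[key] = value;
  }

  // Copy, then freeze, so later edits to the draft cannot leak into a committed package.
  commit(): InputPackage {
    this.version += 1;
    return Object.freeze({ version: this.version, fields: Object.freeze({ ...this.fields }) });
  }
}

class InvalidationTracker {
  // artifact name -> the spec field values it was built against
  private readonly deps = new Map<string, Map<string, string | undefined>>();

  record(artifact: string, pkg: InputPackage, usedFields: string[]): void {
    const seen = new Map<string, string | undefined>();
    for (const field of usedFields) seen.set(field, pkg.fields[field]);
    this.deps.set(artifact, seen);
  }

  // An artifact is stale if any field it depended on differs in the latest package.
  stale(latest: InputPackage): string[] {
    const out: string[] = [];
    this.deps.forEach((seen, artifact) => {
      let changed = false;
      seen.forEach((value, field) => {
        if (latest.fields[field] !== value) changed = true;
      });
      if (changed) out.push(artifact);
    });
    return out;
  }
}

const draft = new SpecDraft();
draft.set("target", "web");
draft.set("budget", "low");
const v1 = draft.commit();

const tracker = new InvalidationTracker();
tracker.record("ui-prompt", v1, ["target"]);
tracker.record("cost-report", v1, ["budget"]);

draft.set("budget", "high"); // editing the draft never touches v1
const v2 = draft.commit();
console.log(tracker.stale(v2)); // [ 'cost-report' ]
```

Each commit produces a new frozen object rather than mutating the old one. Anything built against `v1` can therefore be traced back to exactly the values it saw.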
03
Sandboxed tool execution, never on the host
Every tool runs inside a Docker container with a strict workspace mount and a command-policy admission gate. The same images run locally via Docker Compose and in production on AWS Batch / ECS, so there is one debugging surface in both places.
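The admission gate can be sketched as a pure function that inspects a command before it reaches the container runtime. The policy shape, the allowlisted binaries and the `/workspace` mount point below are assumptions for illustration, not the product's configuration. `admit` and `CommandPolicy` are hypothetical names.

```typescript
// Minimal sketch of a command-policy admission gate, evaluated before a command
// is handed to the container runtime. The policy shape, the allowlist and the
// /workspace mount point are illustrative assumptions.

interface CommandPolicy {
  allowedBinaries: ReadonlySet<string>;
  workspaceRoot: string; // the only host directory mounted into the container
}

type Admission = { admitted: true } | { admitted: false; reason: string };

function admit(argv: readonly string[], policy: CommandPolicy): Admission {
  if (argv.length === 0) return { admitted: false, reason: "empty command" };
  const [binary, ...args] = argv;
  if (!policy.allowedBinaries.has(binary)) {
    return { admitted: false, reason: `binary not allowlisted: ${binary}` };
  }
  for (const arg of args) {
    // Reject parent-directory traversal anywhere in an argument.
    if (arg.split("/").indexOf("..") !== -1) {
      return { admitted: false, reason: `path traversal: ${arg}` };
    }
    // Absolute paths must stay inside the workspace mount.
    const insideWorkspace = arg === policy.workspaceRoot || arg.startsWith(policy.workspaceRoot + "/");
    if (arg.startsWith("/") && !insideWorkspace) {
      return { admitted: false, reason: `path outside workspace: ${arg}` };
    }
  }
  return { admitted: true };
}

const policy: CommandPolicy = {
  allowedBinaries: new Set(["python3", "ls"]),
  workspaceRoot: "/workspace",
};

console.log(admit(["python3", "/workspace/run.py"], policy).admitted); // true
console.log(admit(["curl", "https://example.com"], policy).admitted);  // false
console.log(admit(["ls", "/etc"], policy).admitted);                   // false
console.log(admit(["ls", "../secrets"], policy).admitted);             // false
```

These are string-level checks only, with no symlink resolution. They complement the container's workspace mount rather than replace it.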
04
A Guardian for when things break
Real tools fail constantly. On failure, the Guardian subsystem collects forensic evidence and spawns specialist subagents (a debugger, a UI verifier, a report reviewer) to diagnose the problem and propose a fix. Every recovery is journaled, so retries don't quietly reintroduce the same bug.
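The journaling idea can be sketched as an append-only log that recognizes when a failure matching an already-fixed bug comes back. The signature scheme and the entry shapes are assumptions. The signature here is the tool name plus the first stderr line with digits masked. RecoveryJournal is a hypothetical name, and the subagents that produce diagnoses and fixes are left out.

```typescript
// Minimal sketch of an append-only recovery journal. The failure "signature"
// (tool name plus the first stderr line with digits masked) and the entry shapes
// are assumptions for illustration.

type JournalEntry =
  | { kind: "failure"; signature: string; evidence: string }
  | { kind: "fix"; signature: string; fixId: string; diagnosedBy: string };

type Triage =
  | { kind: "new"; signature: string }
  | { kind: "regression"; signature: string; previousFix: string };

class RecoveryJournal {
  private readonly entries: JournalEntry[] = [];

  // Mask volatile numbers (line numbers, PIDs) so repeats of one bug share a signature.
  static signature(tool: string, stderr: string): string {
    const firstLine = stderr.split("\n")[0].trim();
    return `${tool}:${firstLine.replace(/\d+/g, "N")}`;
  }

  // Journal the failure with its forensic evidence and report whether it was fixed before.
  onFailure(tool: string, stderr: string): Triage {
    const signature = RecoveryJournal.signature(tool, stderr);
    const prior = this.lastFix(signature);
    this.entries.push({ kind: "failure", signature, evidence: stderr });
    return prior === undefined
      ? { kind: "new", signature }
      : { kind: "regression", signature, previousFix: prior };
  }

  recordFix(signature: string, fixId: string, diagnosedBy: string): void {
    this.entries.push({ kind: "fix", signature, fixId, diagnosedBy });
  }

  // Most recent fix id recorded for a signature, if any.
  private lastFix(signature: string): string | undefined {
    for (let i = this.entries.length - 1; i >= 0; i--) {
      const entry = this.entries[i];
      if (entry.kind === "fix" && entry.signature === signature) return entry.fixId;
    }
    return undefined;
  }
}

const journal = new RecoveryJournal();
const first = journal.onFailure("render", "Error: font not embedded (line 12)\nstack...");
journal.recordFix(first.signature, "fix-embed-fonts", "debugger");
const later = journal.onFailure("render", "Error: font not embedded (line 48)");
console.log(first.kind, later.kind); // new regression
```

When a repeat shows up, the journal flags it as a regression with a pointer to the earlier fix. That is what keeps a retry from silently reintroducing the bug.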
05
Memory that means something
A durable memory layer stores domain gotchas, validated precedents, and project patterns. The agent stops relearning the same lesson on every project, so what it learns compounds across long-horizon work.
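Here is a minimal sketch of such a memory layer. It assumes each item carries one of the three kinds named above, plus a domain tag and a validation flag. MemoryStore is a hypothetical name. The store lives in memory (durable persistence is omitted), and exact domain matching stands in for whatever retrieval the real layer uses.

```typescript
// Minimal sketch of a typed memory store. The three kinds mirror the categories
// named above; exact-match domain lookup stands in for real retrieval, and
// persistence is omitted.

type MemoryKind = "gotcha" | "precedent" | "pattern";

interface MemoryItem {
  kind: MemoryKind;
  domain: string;
  text: string;
  validated: boolean;
}

class MemoryStore {
  private readonly items: MemoryItem[] = [];

  remember(item: MemoryItem): void {
    this.items.push(item);
  }

  // Unvalidated precedents are withheld; validated items are listed first.
  recall(domain: string): MemoryItem[] {
    return this.items
      .filter((m) => m.domain === domain && (m.kind !== "precedent" || m.validated))
      .sort((a, b) => Number(b.validated) - Number(a.validated));
  }
}

const store = new MemoryStore();
store.remember({ kind: "gotcha", domain: "pdf-export", text: "check font embedding", validated: false });
store.remember({ kind: "precedent", domain: "pdf-export", text: "untested workaround", validated: false });
store.remember({ kind: "pattern", domain: "pdf-export", text: "render twice, diff once", validated: true });
store.remember({ kind: "gotcha", domain: "csv-import", text: "strip the BOM", validated: true });

console.log(store.recall("pdf-export").map((m) => m.text));
```

Withholding unvalidated precedents is one way to keep a single lucky run from hardening into "known" practice before it has been checked.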
06
Deep research, not just chat
A supervisor / researcher multi-agent loop reads technical literature, vendor docs, and the agent's own memory, then synthesizes a grounded answer that streams back with provenance pointing at the sources it actually drew on.
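The provenance requirement can be sketched as a supervisor with three steps. It fans a question out to researchers, pools their sources, and keeps only claims that cite at least one retrieved source. Researcher, Finding, Source and supervise are hypothetical. The researchers return canned data, and streaming is omitted: the answer comes back as a list of cited lines.

```typescript
// Minimal sketch of a supervisor that only keeps claims backed by retrieved sources.
// Researcher, Finding and Source are hypothetical shapes; real researchers would
// call search, document and memory tools instead of returning canned data.

interface Source {
  id: string;
  kind: "literature" | "vendor-doc" | "memory";
  locator: string; // where a reader can check the claim
}

interface Finding {
  claim: string;
  sourceIds: string[];
}

interface Researcher {
  name: string;
  investigate(question: string): { sources: Source[]; findings: Finding[] };
}

interface GroundedAnswer {
  lines: string[];   // claims with their citations attached
  dropped: string[]; // claims discarded for lack of a retrieved source
}

function supervise(question: string, researchers: Researcher[]): GroundedAnswer {
  const sources = new Map<string, Source>();
  const findings: Finding[] = [];
  for (const researcher of researchers) {
    const result = researcher.investigate(question);
    result.sources.forEach((s) => sources.set(s.id, s));
    findings.push(...result.findings);
  }

  const answer: GroundedAnswer = { lines: [], dropped: [] };
  for (const finding of findings) {
    const cited = finding.sourceIds.filter((id) => sources.has(id));
    if (cited.length === 0) {
      answer.dropped.push(finding.claim);
      continue;
    }
    const refs = cited.map((id) => sources.get(id)!.locator).join("; ");
    answer.lines.push(`${finding.claim} [${refs}]`);
  }
  return answer;
}

// Canned researchers for illustration only.
const docsResearcher: Researcher = {
  name: "docs",
  investigate: () => ({
    sources: [{ id: "d1", kind: "vendor-doc", locator: "vendor-docs/limits" }],
    findings: [{ claim: "Claim A", sourceIds: ["d1"] }],
  }),
};
const memoryResearcher: Researcher = {
  name: "memory",
  investigate: () => ({
    sources: [{ id: "m1", kind: "memory", locator: "memory/precedent-1" }],
    findings: [
      { claim: "Claim B", sourceIds: ["m1", "d1"] },
      { claim: "Claim C", sourceIds: ["x9"] },
    ],
  }),
};

const researchAnswer = supervise("Why does the export fail?", [docsResearcher, memoryResearcher]);
console.log(researchAnswer.lines);
console.log(researchAnswer.dropped);
```

The supervisor pools all sources before checking any claim. A finding from one researcher can therefore cite a source another researcher retrieved, while a claim citing nothing that was actually retrieved ("Claim C" here) is dropped.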