01
A typed phase loop, not vibes
The agent moves through 23 numbered lifecycle phases. Each phase has a Zod-typed evidence gate, so the agent can't move forward without producing the artifact the next phase needs. With no hidden state and no fuzzy checkpoints, a long-horizon run stays auditable end to end.
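As a rough illustration of an evidence gate, the sketch below blocks a phase transition until the artifact passes a type check. All names here (`Phase`, `PlanEvidence`, `advance`) are hypothetical, and the hand-written type guard only stands in for a Zod schema so the example runs without dependencies.

```typescript
// Hypothetical sketch: names are illustrative, and the hand-written type
// guard stands in for the Zod schema a real gate would use.
type Gate<T> = (artifact: unknown) => artifact is T;

interface Phase<T> {
  id: number;    // position in the numbered lifecycle
  name: string;
  gate: Gate<T>; // evidence this phase must produce before the run advances
}

interface PlanEvidence {
  steps: string[];
}

const isPlanEvidence: Gate<PlanEvidence> = (artifact: unknown): artifact is PlanEvidence => {
  if (typeof artifact !== "object" || artifact === null) return false;
  const steps = (artifact as { steps?: unknown }).steps;
  return Array.isArray(steps) && steps.every((s) => typeof s === "string");
};

// Returns the next phase id plus the now-typed evidence, or throws if the
// gate rejects the artifact. There is no path forward that skips the gate.
function advance<T>(phase: Phase<T>, artifact: unknown): { nextPhaseId: number; evidence: T } {
  if (!phase.gate(artifact)) {
    throw new Error(`Phase ${phase.id} (${phase.name}): evidence gate rejected the artifact`);
  }
  return { nextPhaseId: phase.id + 1, evidence: artifact };
}

const planPhase: Phase<PlanEvidence> = { id: 3, name: "plan", gate: isPlanEvidence };
const advanced = advance(planPhase, { steps: ["collect requirements", "draft design"] });
```

Because `advance` returns the evidence already narrowed to its type, the next phase can consume it without re-validating.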
02
Typed protocol between intent and execution
User intent lives in a mutable spec. The moment the spec is committed, it freezes into an immutable input package that every downstream tool consumes. Upstream of the freeze, anything can change; downstream, nothing changes silently, and every change is auditable. Prompt-invalidation tracking catches stale assumptions.
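One way to picture the freeze is sketched below. The `Spec`, `InputPackage`, and `Prompt` shapes are hypothetical, and the fingerprint comparison is only an assumed mechanism for prompt-invalidation tracking, not a description of the actual one.

```typescript
// Hypothetical sketch of "mutable spec -> immutable input package".
interface Spec {
  goal: string;
  constraints: string[];
}

interface InputPackage {
  readonly spec: Readonly<Spec>;
  readonly fingerprint: string; // identifies the exact committed content
}

// Recursively freezes an object graph. Safe here because the value comes
// from a JSON round trip and therefore cannot contain cycles.
function deepFreeze<T>(value: T): T {
  if (typeof value === "object" && value !== null) {
    for (const child of Object.values(value)) deepFreeze(child);
    Object.freeze(value);
  }
  return value;
}

function commit(spec: Spec): InputPackage {
  // Copy first, so later edits to the draft cannot leak into the package.
  const copy = JSON.parse(JSON.stringify(spec)) as Spec;
  // Assumption: serialized content is a good enough fingerprint for a sketch
  // (it is sensitive to key order; a real system might hash canonical JSON).
  return deepFreeze({ spec: copy, fingerprint: JSON.stringify(copy) });
}

// Assumed invalidation rule: a prompt remembers which package it was built
// from and is stale once a different package is current.
interface Prompt {
  text: string;
  builtFrom: string;
}

function isStale(prompt: Prompt, current: InputPackage): boolean {
  return prompt.builtFrom !== current.fingerprint;
}

const draft: Spec = { goal: "export monthly report", constraints: ["PDF output"] };
const committed = commit(draft);
const prompt: Prompt = { text: "Generate the report", builtFrom: committed.fingerprint };

draft.goal = "export weekly report"; // the draft stays editable after the commit
const recommitted = commit(draft);   // a change downstream means a new package
```

Here `committed` keeps its original goal, while `isStale(prompt, recommitted)` flags the prompt as built on an outdated package.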
03
Sandboxed tool execution, never on the host
Every tool runs inside a Docker image with a strict workspace mount and a command-policy admission gate. Same images locally via Docker Compose, same images in production on AWS Batch / ECS — one debugging surface in both places.
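A command-policy admission gate could look roughly like the sketch below, which checks a command before it is ever handed to a container. The policy shape, the `/workspace` mount path, and the specific rules are assumptions made for illustration; a real gate would likely enforce more than this.

```typescript
// Hypothetical admission gate: every name and rule here is illustrative.
interface CommandPolicy {
  allowedExecutables: ReadonlySet<string>;
  workspaceMount: string; // the only absolute path prefix tools may touch
}

type Admission = { admitted: true } | { admitted: false; reason: string };

function admit(argv: readonly string[], policy: CommandPolicy): Admission {
  const [exe, ...args] = argv;
  if (exe === undefined) {
    return { admitted: false, reason: "empty command" };
  }
  if (!policy.allowedExecutables.has(exe)) {
    return { admitted: false, reason: `executable not allowed: ${exe}` };
  }
  for (const arg of args) {
    // Reject ".." segments so relative paths cannot climb out of the mount.
    if (arg.split("/").includes("..")) {
      return { admitted: false, reason: `path traversal in argument: ${arg}` };
    }
    // Absolute paths must be the mount itself or live underneath it.
    const insideMount =
      arg === policy.workspaceMount || arg.startsWith(policy.workspaceMount + "/");
    if (arg.startsWith("/") && !insideMount) {
      return { admitted: false, reason: `absolute path outside workspace: ${arg}` };
    }
  }
  return { admitted: true };
}

const examplePolicy: CommandPolicy = {
  allowedExecutables: new Set(["python", "pytest"]),
  workspaceMount: "/workspace",
};

const allowedRun = admit(["python", "/workspace/run.py"], examplePolicy);
const blockedRun = admit(["python", "/etc/passwd"], examplePolicy);
```

Returning a reason alongside each rejection keeps the gate's decisions reviewable after the fact.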
04
A Guardian for when things break
Real tools fail constantly. The Guardian subsystem collects forensic evidence on failure and spawns specialist subagents — a debugger, a UI verifier, a report reviewer — to diagnose and propose a fix. Every recovery is journaled, so retries don't quietly re-introduce the same bug.
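The journaling idea can be sketched as an append-only log that is consulted before a proposed fix is retried. The entry fields, the `failureSignature` key, and the `alreadyFailed` check are hypothetical; the specialist names follow the three mentioned above.

```typescript
// Hypothetical recovery journal: field names and lookup rule are assumptions.
type Specialist = "debugger" | "ui-verifier" | "report-reviewer";

interface RecoveryEntry {
  failureSignature: string; // stable key for "the same failure"
  specialist: Specialist;
  proposedFix: string;
  outcome: "resolved" | "failed";
}

class RecoveryJournal {
  private readonly entries: RecoveryEntry[] = [];

  // Append-only: entries are copied in and never edited afterwards.
  record(entry: RecoveryEntry): void {
    this.entries.push({ ...entry });
  }

  // True when this exact fix was already tried for this failure and failed,
  // so a retry would only re-introduce a known dead end.
  alreadyFailed(failureSignature: string, proposedFix: string): boolean {
    return this.entries.some(
      (e) =>
        e.failureSignature === failureSignature &&
        e.proposedFix === proposedFix &&
        e.outcome === "failed",
    );
  }

  history(failureSignature: string): RecoveryEntry[] {
    return this.entries
      .filter((e) => e.failureSignature === failureSignature)
      .map((e) => ({ ...e }));
  }
}

const journal = new RecoveryJournal();
journal.record({
  failureSignature: "build:missing-module",
  specialist: "debugger",
  proposedFix: "pin dependency version",
  outcome: "failed",
});
journal.record({
  failureSignature: "build:missing-module",
  specialist: "debugger",
  proposedFix: "regenerate lockfile",
  outcome: "resolved",
});
```

Before applying a new proposal, the caller would ask `journal.alreadyFailed(signature, fix)` and skip any fix that is already a recorded failure.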
05
Memory that means something
A durable memory layer stores domain gotchas, validated precedents, and project patterns. The agent stops relearning the same lesson on every project — a small idea with a large effect on long-horizon competence.
06
Deep research, not just chat
A supervisor / researcher multi-agent loop reads technical literature, vendor docs, and the agent's own memory, then synthesizes a grounded answer and streams it back with provenance instead of hallucinated citations.
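To make "with provenance" concrete, the sketch below shows one possible check a supervisor could run before an answer is streamed: every claim must cite at least one source that was actually read. The `Source` and `Claim` shapes and the `ungroundedClaims` helper are hypothetical.

```typescript
// Hypothetical provenance check: shapes and names are illustrative only.
interface Source {
  id: string;
  kind: "literature" | "vendor-doc" | "memory";
  title: string;
}

interface Claim {
  text: string;
  sourceIds: string[]; // sources the researcher says support this claim
}

// Returns the claims that cite nothing, or cite a source that was never
// actually retrieved. An empty result means every claim is grounded.
function ungroundedClaims(claims: readonly Claim[], sources: readonly Source[]): Claim[] {
  const known = new Set(sources.map((s) => s.id));
  return claims.filter(
    (c) => c.sourceIds.length === 0 || c.sourceIds.some((id) => !known.has(id)),
  );
}

const retrieved: Source[] = [
  { id: "s1", kind: "vendor-doc", title: "Vendor installation guide" },
  { id: "s2", kind: "memory", title: "Validated precedent from an earlier project" },
];

const draftClaims: Claim[] = [
  { text: "The tool requires a license file at startup.", sourceIds: ["s1"] },
  { text: "The default timeout is too short for large jobs.", sourceIds: ["s9"] },
  { text: "This pattern worked before.", sourceIds: [] },
];

const flagged = ungroundedClaims(draftClaims, retrieved);
```

In this example the second and third claims are flagged, one for citing an unknown source and one for citing nothing at all.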