Work · 01 · SimPilot · Solo project · 2025 – present · simpilot.dev

A multi-agent system
for engineering automation.

SimPilot is what I've been building to find out how far agent engineering actually goes when the work is real. You describe a problem in plain English, and a fleet of typed, tool-using agents plans the work, runs it on sandboxed compute, diagnoses its own failures, and hands back a report you can audit. It's also a stress test for the patterns I keep coming back to: long-horizon orchestration, typed protocols, durable memory, and being honest about evaluation.

§ 01 · By the numbers

The shape of the system.

04 / facts
155+ · Tools registered
23 · Lifecycle phases
3 · LLM providers
15+ · Internal packages
§ 02 · What's actually interesting

Six design ideas doing most of the work.

06 / patterns
01

A typed phase loop, not vibes

The agent moves through 23 numbered lifecycle phases. Each has a Zod-typed evidence gate, so it can't move forward without producing the artifact the next phase needs. No hidden state, no fuzzy checkpoint — a long-horizon run stays auditable end to end.
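The gate idea can be sketched in a few lines. This is a minimal illustration, not SimPilot's actual code: the phase names and evidence fields are hypothetical, and the real system uses Zod schemas where this sketch uses plain predicate functions.

```typescript
type Evidence = Record<string, unknown>;

interface Phase {
  id: number;
  name: string;
  // Gate: passes only if the artifact the next phase needs is present.
  gate: (evidence: Evidence) => boolean;
}

// Three stand-in phases; the real loop has 23.
const phases: Phase[] = [
  { id: 1, name: "parse-intent", gate: (e) => typeof e.spec === "string" },
  { id: 2, name: "plan", gate: (e) => Array.isArray(e.plan) },
  { id: 3, name: "execute", gate: (e) => e.report !== undefined },
];

// Advancement is a pure function of evidence: either the gate passes
// and the run moves to the next numbered phase, or it throws. There is
// no hidden state to make progress ambiguous.
function advance(current: number, evidence: Evidence): number {
  const phase = phases.find((p) => p.id === current);
  if (!phase) throw new Error(`unknown phase ${current}`);
  if (!phase.gate(evidence)) {
    throw new Error(`phase ${phase.name}: missing required evidence`);
  }
  return current + 1;
}
```

Because every transition either validates or throws, replaying the evidence log reproduces the exact phase history, which is what makes a long run auditable.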

02

Typed protocol between intent and execution

User intent lives in a mutable spec. The moment it's committed, it freezes into an immutable input package that every downstream tool consumes. Upstream of the freeze, anything can change; downstream, every change is auditable. Prompt-invalidation tracking catches stale assumptions.
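A sketch of that freeze boundary, with illustrative field names (the real spec shape is not shown here, and a real system would use a proper content hash where this uses a JSON string as a stand-in):

```typescript
interface Spec {
  geometry?: string;
  solver?: string;
  meshResolution?: number;
}

interface InputPackage {
  readonly spec: Readonly<Spec>;
  readonly committedAt: string;
  // Content fingerprint: downstream tools compare this to detect
  // stale assumptions (prompt invalidation). JSON is a stand-in for
  // a real content hash.
  readonly fingerprint: string;
}

// Upstream of commit(), the Spec is freely mutable. The moment it is
// committed, the package is frozen and every downstream consumer sees
// the same immutable input.
function commit(spec: Spec): InputPackage {
  const frozen = Object.freeze({ ...spec });
  return Object.freeze({
    spec: frozen,
    committedAt: new Date().toISOString(),
    fingerprint: JSON.stringify(frozen),
  });
}
```

The useful property is that "who changed what" only has two answers: before the freeze, anyone; after the freeze, nobody without producing a new package with a new fingerprint.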

03

Sandboxed tool execution, never on the host

Every tool runs inside a Docker image with a strict workspace mount and a command-policy admission gate. The same images run locally via Docker Compose and in production on AWS Batch / ECS, so there is one debugging surface in both places.
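The admission-gate idea, sketched with an illustrative allowlist and mount path (the real policy is not shown here; the binaries and checks below are assumptions):

```typescript
// Hypothetical policy: only these binaries may run in the sandbox.
const ALLOWED_BINARIES = new Set(["gmsh", "SU2_CFD", "python3"]);
const WORKSPACE = "/workspace";

interface ToolCommand {
  binary: string;
  args: string[];
}

// The gate runs before anything reaches the container. It rejects
// non-allowlisted binaries and any absolute path that escapes the
// workspace mount; relative paths resolve inside the mount anyway.
function admit(cmd: ToolCommand): void {
  if (!ALLOWED_BINARIES.has(cmd.binary)) {
    throw new Error(`policy: binary "${cmd.binary}" is not allowlisted`);
  }
  for (const arg of cmd.args) {
    if (arg.startsWith("/") && !arg.startsWith(WORKSPACE + "/")) {
      throw new Error(`policy: path "${arg}" escapes the workspace`);
    }
  }
}
```

Putting the check in front of the container rather than inside it means a bad command never even starts, which keeps failure logs about real tool failures instead of policy noise.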

04

A Guardian for when things break

Real tools fail constantly. The Guardian subsystem collects forensic evidence on failure and spawns specialist subagents — a debugger, a UI verifier, a report reviewer — to diagnose and propose a fix. Every recovery is journaled, so retries don't quietly re-introduce the same bug.
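The journaling half of that loop can be sketched directly. The specialist roles come from the description above; the journal's shape and the signature scheme are assumptions for illustration.

```typescript
interface RecoveryEntry {
  // A stable fingerprint of the failure, e.g. derived from stderr
  // and exit code (scheme is an assumption here).
  failureSignature: string;
  specialist: "debugger" | "ui-verifier" | "report-reviewer";
  proposedFix: string;
}

class RecoveryJournal {
  private entries: RecoveryEntry[] = [];

  record(entry: RecoveryEntry): void {
    this.entries.push(entry);
  }

  // Before a retry, the orchestrator asks whether this failure has
  // been seen and fixed before, so it does not quietly re-introduce
  // the same bug with the same "fix".
  seen(failureSignature: string): RecoveryEntry | undefined {
    return this.entries.find((e) => e.failureSignature === failureSignature);
  }
}
```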

05

Memory that means something

A durable memory layer stores domain gotchas, validated precedents, and project patterns. The agent stops relearning the same lesson on every project — a small idea with a large effect on long-horizon competence.
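The record categories below come from the text; everything else is an illustrative sketch, with an in-memory map standing in for the durable store (Postgres in the stack section).

```typescript
// The three memory kinds named above.
type MemoryKind = "gotcha" | "precedent" | "pattern";

interface MemoryRecord {
  kind: MemoryKind;
  topic: string;
  lesson: string;
  // How many runs have confirmed this lesson (field is an assumption).
  validatedRuns: number;
}

const memory = new Map<string, MemoryRecord>();

// Re-encountering a known lesson strengthens it instead of relearning
// it from scratch: the count accumulates across projects.
function remember(rec: MemoryRecord): void {
  const key = `${rec.kind}:${rec.topic}`;
  const prior = memory.get(key);
  memory.set(
    key,
    prior
      ? { ...prior, validatedRuns: prior.validatedRuns + rec.validatedRuns }
      : rec
  );
}
```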

06

Deep research, not just chat

A supervisor / researcher multi-agent loop reads technical literature, vendor docs, and the agent's own memory, and synthesizes a grounded answer streamed back with provenance — not hallucinated citations.
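The grounding rule in that loop is simple to state in code. This is a toy sketch of the topology, not the streaming implementation: the types and the source-filtering rule are illustrative assumptions.

```typescript
interface Finding {
  claim: string;
  // Provenance: where this claim came from (doc, paper, memory entry).
  source: string;
}

type Researcher = (question: string) => Finding[];

// The supervisor fans the question out to researchers and keeps only
// findings that carry provenance, so every citation in the synthesized
// answer traces back to a real source rather than being invented.
function supervise(question: string, researchers: Researcher[]): Finding[] {
  return researchers
    .flatMap((research) => research(question))
    .filter((finding) => finding.source.length > 0);
}
```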

§ 03 · Stack

Boring tech, where boring means proven.

06 / layers
Web
Next.js 16 · React 19 · AI SDK v7 · tRPC · Zod · Tailwind
Compute
AWS Batch / ECS · Docker sandboxes · Vercel Workflows · Redis
Data
Postgres · Drizzle ORM · S3 / MinIO · Better Auth
Solvers
OpenFOAM · SU2 · CalculiX · Gmsh · FreeCAD · Trame / PyVista
Observability
OpenTelemetry · Sentry · Langfuse
Billing
Stripe (usage-based credits)
§ 04 · Why I'm building it

The cleanest test case I know.

Manifesto

Most agent demos work because the tasks are short, the tools forgive everything, and a wrong answer is cheap. I wanted to see what happens at the other extreme.

Engineering automation is the cleanest test case I know. The ground truth exists, and you can't talk your way around a bad result.

SimPilot is my bet on what a system like that actually needs. The protocol is typed where most demos rely on free-form prompts. Memory is durable instead of stuffed into context. Tools run in sandboxes rather than on trust. The multi-agent topology is scoped narrowly enough that each agent does one job. The domain is engineering simulation, but I care more about the shape of the system underneath.