HN · Hossein Naderi · Pittsburgh, PA

Building agentic systems & the models that ground them.

I work on agentic AI, digital twins, Large Language Models (LLMs), and model evaluation. Senior R&D at Synopsys, bringing agentic capability into digital-twin software, building Machine Learning (ML) models, and rebuilding hybrid analytics for faster simulation and modeling. Off hours I'm shipping SimPilot, a tool-using multi-agent platform with typed evidence gates, memory, sandboxed execution, and audit traces. PhD in Computational Modeling & Simulation, focused on Reinforcement Learning (RL), JAX/GPU training systems, transformers, and model evaluation.

SimPilot · Large Language Models · Agentic AI · Digital twins · Hybrid analytics · Post-training · Evals · Tool-using agents · Reinforcement learning
§ 01 · Selected work

What I'm building right now.

05 / entries · 2023 — present
→ 01 · Feature · 2025 —

SimPilot

Multi-agent platform for engineering automation

Describe an engineering simulation in plain English. SimPilot's tool-using LLM agents plan the work, run it on sandboxed compute, check their outputs with typed validators, and hand back a report you can audit. Long-horizon agent work, automated end to end.

→ 02 · 2025 · Synopsys

Digital twins

Agentic capability for simulation users

Bringing agents into digital-twin software, rebuilding the hybrid analytics framework from scratch, and building ML models that accelerate simulation and modeling workflows.

→ 03 · 2024 · Project

JAX training

1000× faster model sweeps

GPU-parallel training and evaluation harness for RL policies and deep sequence models. A week-long sweep now finishes overnight. Built for fast iteration, metrics, and reproducible runs.

→ 04 · 2023 · Project

LLM evals

Fine-tuning, QA metrics, failure analysis

Fine-tuned T5/BERT QA models and built task-specific evals to track accuracy, failure modes, and data quality. The best run reached >80% accuracy on the target QA task.

→ 05 · Ongoing · Project

Agent verifiers

Typed gates for long-horizon runs

SimPilot turns tool outputs into pass/fail artifacts, run provenance, and debugging traces. It is my practical version of agent evals: evidence first, self-report second.

§ 02 · Background

A decade of computational modeling.

04 / entries · 2012 — present
§ 03 · On the side

Things I keep thinking about.

A loose page

Most agent demos work because the tasks are short, the tools forgive everything, and a wrong answer is cheap. I wanted to see what happens at the other extreme.

Engineering simulation is the cleanest test case I know. The ground truth exists, and you can't talk your way around a bad result.

Other things I keep coming back to: dynamical systems and chaos, how to evaluate agents honestly, post-training data quality, the gap between research papers and shipped systems, and why AI tooling is still rougher than it needs to be. I also have a soft spot for movies that sit with ambiguity: quiet character studies, strange sci-fi, and anything that makes the ordinary feel slightly unreal.

§ 04 · Reach me

Four direct lines.

Open to chats · Eastern time