Building and evaluating reliable AI agents and developer systems.

I work on agent infrastructure, AI-assisted software engineering, and the evaluation of open-weight models on realistic developer tasks.

Current research

Latest writing