About
I build and evaluate AI systems for software developers, with a focus on agent reliability, tool use, context management, and practical AI infrastructure.
My current research examines how open-weight models perform on realistic engineering tasks: whether they can complete working changes, use tools correctly, operate within local hardware constraints, recover from failures, and recognize when human intervention is needed.
This work centers on The 64GB Frontier, a project that evaluates open-weight coding models running locally on a 64GB M1 Max.