Build Notes

Project stories and technical notes

Multi-agent systems • AI evaluation • Production AI • Research methods • Founder engineering

Reader-first notes • Technical depth, clear boundaries

Peter McCann Strain working at a laptop on a sunny Glasgow balcony — From the writing desk

AI Evaluation • Computer Vision • Founder Engineering • LLM Cost Optimization • Legal Infrastructure • Medical AI • Multi-Agent Systems • Production AI • Recruitment AI • Research Methods

Aurum at SThree: Enterprise Agentic AI Engineering

Multi-Agent Systems • AI Evaluation

12 min read

An employer-IP-bounded case study of Aurum at SThree covering enterprise agentic AI engineering, reviewable state, fairness diagnostics, red-team testing, production control, and the self-improving-evaluator research that raised scoring quality from a 0.627 baseline to 0.851 without overfitting to the benchmark.

Removing the LLM From the Hot Path: Building SThree’s Skills Taxonomy Pipeline

Production AI • LLM Cost Optimization

13 min read

A 30-experiment programme that replaced a per-skill LLM validator with a zero-LLM retrieval-and-filter cascade and a 7 kB classifier head: 180× cheaper, ~100× faster, and within ~2 percentage points of the model it replaced.

18 May 2026