Real-time voice operations
Conversico (Greeta voice agent)
AI receptionist venture with bounded voice workflows, escalation logic, tenant privacy, and launch gates.
Multi-agent systems
Aurum at SThree shows enterprise agentic AI engineering in a high-stakes domain: orchestration, evaluator architecture, self-improvement research, reviewable state, fairness diagnostics, red-team layers, and production control. Personal Assistant AI is the private daily-use counterpart. Conversico (with Greeta as the voice agent) adds real-time voice operations.
Enterprise agentic AI
SThree employer work with controlled tests, evaluator architecture, reviewable state, fairness/red-team systems, and self-improvement research.
8 specialists · memory OS
A private multi-agent second brain with capture, durable recall, review, action, and quality monitoring.
Full recruitment lifecycle
Candidate matching, outreach, scheduling, AI voice interviews, enrichment, and reporting.
Evaluator self-improvement research
GEPA prompt evolution, MultiStep evaluators, leakage audits, and Exp5 validation transfer.
P0–P12 patterns
A study of around 13 orchestration patterns, from single-pass pipelines to multi-step agentic planning.
Zero LLM calls
SThree CV skill tagging shows when the right move is removing the LLM from the hot path entirely.
Detailed plan
A human-in-the-loop impact-assessment architecture: an exploration-stage strategy, not a delivered system.
Conversico, with Greeta as the voice agent, shows multi-agent judgement under time pressure: the system has to recover missed calls, support bookings, handle uncertainty gracefully, and preserve practice-level privacy.
Personal Assistant AI is the independent counterpoint to Aurum: not recruitment safety, but a private assistant case study for capture, memory, review, action, and proactive support.
Peter uses multi-agent systems when decomposition improves control, reliability, or the reviewer’s ability to understand what happened. In Aurum, the public architecture story is deliberately bounded because it is SThree employer work and the implementation remains employer IP.
The production question is how to debug, evaluate, and govern the system after it leaves a notebook. Aurum treats runtime state as a first-class product concern.
A separate Aurum research layer shows Peter can build agentic AI evaluation architecture, not just orchestration. It studied reflective prompt evolution against a synthetic proxy oracle across a frozen 215-candidate snapshot.
The deep-research benchmark shows the same instinct outside recruitment: orchestration claims need a controlled comparison, a shared tool layer, judge-reliability checks, and bounded interpretation.
The Skills Taxonomy work is the counterweight to agent-count storytelling. It shows Peter can decide when the right production move is retrieval, filters, calibration, and a small head rather than more LLM calls.
Continue exploring
Private assistant case study: memory, proactive operation, quality monitoring, and product boundaries.
Voice operations case study for a product that is in production and going to market (Greeta voice agent), covering product boundaries, with a demo video coming soon.
Full public article on Aurum as enterprise agentic AI at SThree.
DSPy, GEPA, MultiStep evaluators, synthetic proxy-oracle validation, and claim discipline.
Around 13 orchestration patterns, a three-judge reliability panel, citation audits, and equivalence testing.
SThree production AI economics: retrieval, filters, calibration, and a tiny head.
Broader senior AI engineering context.
Where the agentic architecture becomes audit infrastructure.