Portfolio
Lancet
An evaluation-first system for measuring agent and model behaviour with evidence — knowing whether a system actually works, not whether the demo ran.
Intelligence System · Flagship · Product Candidate · Public-safe
01 Category
Intelligence System
02 Arc Role
Product Candidate / Research Artifact
03 R&D Funnel
Alpha
04 Disclosure
Public-safe
01
What it is
Lancet is an evaluation system for agents and models. It follows a familiar discipline: just as pytest made tests a first-class part of writing software, Lancet makes evaluation a first-class part of shipping an agent. It consumes the traces other tools already produce and turns them into measured, defensible judgements about behaviour.
02
Problem space
Generative systems demo well and fail silently. An agent can appear to work and be wrong in ways no one sees, because the failure hides inside a fluent, confident output. Without evaluation grounded in evidence, teams ship on hope.
03
Arc's position
Arc's position is that the unsolved problem of the GenAI era is not capability but trust, and that trust is earned by measurement, not assertion. Lancet is the instrument that does the measuring: it provides evidence-grounded evaluation and calibration, while its core method stays protected. It is where Arc's discipline of telling the truth about a system becomes a tool.
04
Current status
Alpha