Lancet

An evaluation-first system for measuring agent and model behaviour with evidence — knowing whether a system actually works, not whether the demo ran.

Intelligence System / Flagship / Product Candidate / Public-safe

01 Category

Intelligence System

02 Arc Role

Product Candidate / Research Artifact

03 R&D Funnel

Alpha

04 Disclosure

Public-safe

What it is

Lancet is an evaluation system for agents and models. It follows a familiar discipline: just as pytest made tests a first-class part of writing software, Lancet makes evaluation a first-class part of shipping an agent. It consumes the traces other tools already produce and turns them into measured, defensible judgements about behaviour.
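
To make the pytest analogy concrete, here is a minimal sketch of evaluation written as an assertion over recorded traces. It is an illustration only and does not show Lancet's interface or its method: the Trace record, the exact_match check, the pass_rate helper, and the 0.5 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical record of one agent run; not Lancet's actual trace format."""
    task: str
    output: str
    expected: str


def exact_match(trace: Trace) -> bool:
    """Hypothetical check: output equals the reference after trimming and lowercasing."""
    return trace.output.strip().lower() == trace.expected.strip().lower()


def pass_rate(traces: list[Trace], check=exact_match) -> float:
    """Fraction of traces that pass the check."""
    if not traces:
        raise ValueError("no traces to evaluate")
    return sum(check(t) for t in traces) / len(traces)


# Illustrative traces (made up for this sketch, not real results).
traces = [
    Trace("capital of France", "Paris", "paris"),
    Trace("2 + 2", "5", "4"),  # a confident answer that is wrong
]


def test_agent_meets_threshold():
    # Written the way a pytest test would be: the evaluation is an assertion.
    assert pass_rate(traces) >= 0.5
```

Saved in a file that pytest collects, the test function would run alongside ordinary unit tests, so a drop in the measured pass rate fails the build instead of going unnoticed.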

Problem space

Generative systems demo well and fail silently. An agent can appear to work and be wrong in ways no one sees, because the failure hides inside a fluent, confident output. Without evaluation grounded in evidence, teams ship on hope.

Arc's position

Arc's position is that the unsolved problem of the GenAI era is not capability but trust — and trust is earned by measurement, not assertion. Lancet is the instrument that measures: evidence-grounded evaluation and calibration, with the core method protected. It is where Arc's discipline of telling the truth about a system becomes a tool.

Current status

Alpha