Methodology

Forward-Deployed Engineering

The method beneath Arc's field arm — how a system that demos becomes a system you can trust, by engineering the boundary between the deterministic and the non-deterministic.

Forward-deployed engineering is how Arc enters a partner's reality and makes an AI system real — true in production, trustworthy as it changes, safe to build on. The buyer-facing account lives on the Field Practice page; this is the mechanism beneath it: the one engineering problem the work exists to solve, and the discipline that solves it.

The boundary

Classical software is deterministic. The same input yields the same output, so behaviour verified once stays verified, and you can test it to the edges. A language model is not: the same input can draw a different answer, and the failure does not announce itself; it hides inside a fluent, confident sentence. The unsolved engineering problem of this era is shipping a non-deterministic component inside a system that has to be trusted. Most teams do one of two things: pretend the component is deterministic, and inherit silent failure; or bolt cleanup on afterwards, and inherit a system no one can reason about.

Arc does neither. It engineers the boundary itself, part by part:

  • push determinism down where it can go — give the probabilistic model a deterministic substrate to stand on, so the ground beneath an answer is fixed even when the answer is not;
  • measure what must stay non-deterministic — ship with evidence, not hope;
  • govern the non-deterministic action — a deterministic boundary around what the system is allowed to do;
  • bound the generation by construction — trust drawn before behaviour, not filtered after it.

A demo is a claim. The method is what tests it.
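
As an illustration of "measure what must stay non-deterministic", the sketch below samples one prompt repeatedly and reports how often the most common answer appears. It is a minimal sketch, not Arc's tooling: `agreement_rate` and `make_stub_model` are hypothetical names, and the stub stands in for a real model call.

```python
import random
from collections import Counter
from typing import Callable


def agreement_rate(generate: Callable[[str], str], prompt: str, n: int = 20) -> tuple[str, float]:
    """Call `generate` n times on one prompt; return the modal answer and its share."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Normalise lightly so trivial formatting differences do not count as disagreement.
    answers = [generate(prompt).strip().lower() for _ in range(n)]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / n


def make_stub_model(seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: answers '42' roughly 80% of the time."""
    rng = random.Random(seed)

    def generate(prompt: str) -> str:
        return "42" if rng.random() < 0.8 else "41"

    return generate


answer, rate = agreement_rate(make_stub_model(seed=0), "What is 6 * 7?", n=50)
print(f"modal answer={answer!r} agreement={rate:.2f}")
```

A low agreement rate is evidence that a component needs a firmer substrate or a tighter boundary. A high rate shows only that the answer is stable, not that it is correct.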

What "real" requires

"Real" is not a feeling; it is three properties a system either has or does not, and each is engineered, not asserted:

  • Evidence — every answer is bound to something addressable and citable, so it can be traced and checked rather than believed. A system that cannot show its ground is not production-real, however fluent.
  • Evaluation — the system is measured against a defensible standard, with the failures named and a regression guard in place. "It seems to work" is the absence of evaluation, not its result. This is the discipline that refuses to take a demo for proof.
  • Governance — boundaries by construction: what the system may do, what requires a human, and a record that survives scrutiny. Safety drawn first, not filtered after the fact.

Evidence, evaluation, and governance are separate jobs and stay separate; collapsing evaluation into "we tested it" or governance into "we added a guardrail" is exactly how the boundary is mis-engineered.
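
The evidence property can be sketched as a check that every citation in an answer resolves to an addressable unit and quotes text that is really there. This is an assumption-laden illustration: `Citation`, `Answer`, `check_evidence`, and the `policy/4.2` address are hypothetical, and the substrate is reduced to a dictionary of id to text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    unit_id: str  # address of a unit in the substrate
    quote: str    # exact text the answer relies on


@dataclass(frozen=True)
class Answer:
    text: str
    citations: tuple[Citation, ...]


def check_evidence(answer: Answer, substrate: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every citation resolves."""
    problems = []
    if not answer.citations:
        problems.append("answer cites nothing")
    for c in answer.citations:
        unit = substrate.get(c.unit_id)
        if unit is None:
            problems.append(f"unknown unit: {c.unit_id}")
        elif c.quote not in unit:
            problems.append(f"quote not found in {c.unit_id}")
    return problems


# Hypothetical substrate with one addressable unit.
substrate = {"policy/4.2": "Refunds are issued within 14 days of a return."}
grounded = Answer("Refunds take up to 14 days.", (Citation("policy/4.2", "within 14 days"),))
ungrounded = Answer("Refunds take 3 days.", ())

print(check_evidence(grounded, substrate))    # []
print(check_evidence(ungrounded, substrate))  # ['answer cites nothing']
```

The check is deterministic and cheap, but it shows only that the cited ground exists. Whether the answer actually follows from that ground is a question for evaluation, which is one reason the two jobs stay separate.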

Four families of failure

A system can demo well and fail in any of four places. Naming them is half the work, because an ordinarily excellent engineer asking the right questions finds the failure faster than a brilliant one improvising:

  • Retrieval and document structure. The right source is never surfaced; or it is surfaced but unusable, because the fragment lost the context that made it meaningful; or the answer is not bound to anything citable. The cure is a substrate of addressable, evidence-bearing units.
  • Agent behaviour and evaluation. The system ships on "it seems to work" — no eval set, no reproducible failure, no trace of why it acted, a confidence untied to evidence. The cure is measurement brought in at the start: a domain-grounded judge, calibration, a regression guard.
  • Agentic software engineering. A system writes code faster than anyone can review it — unchecked against intent, no attestation on the irreversible change. The cure is a supervisor every change passes before it lands.
  • Governance and trust. Safety is filtering output after the fact rather than bounding the system by construction; nothing produces a record that survives scrutiny. The cure is a boundary, drawn first.

These are one axis, the deterministic/non-deterministic boundary, showing up in four domains. As Arc deploys further, new families open; the vocabulary stays small on purpose.
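
A regression guard with failures named by family might look like the sketch below. Everything here is illustrative: the `Case` type, the family labels, and the two stub systems are hypothetical, and the exact-match `judge` is a deliberately simple stand-in for the domain-grounded judge described above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    case_id: str
    prompt: str
    expected: str
    family: str  # failure family this case probes


def judge(output: str, expected: str) -> bool:
    """Stand-in judge: exact match after trimming and lower-casing."""
    return output.strip().lower() == expected.strip().lower()


def run_eval(system: Callable[[str], str], cases: list[Case]) -> dict[str, bool]:
    """Run every case through the system and record pass or fail per case id."""
    return {c.case_id: judge(system(c.prompt), c.expected) for c in cases}


def failures_by_family(results: dict[str, bool], cases: list[Case]) -> dict[str, list[str]]:
    """Group failing case ids under the family they belong to."""
    named: dict[str, list[str]] = {}
    for c in cases:
        if not results[c.case_id]:
            named.setdefault(c.family, []).append(c.case_id)
    return named


def regressions(baseline: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Case ids that passed in the baseline but fail (or are missing) now."""
    return sorted(cid for cid, ok in baseline.items() if ok and not current.get(cid, False))


CASES = [
    Case("c1", "refund window?", "14 days", "retrieval"),
    Case("c2", "may the agent delete records?", "no", "governance"),
]
# Two hypothetical versions of a system, reduced to lookup tables.
v1 = {"refund window?": "14 days", "may the agent delete records?": "No"}
v2 = {"refund window?": "30 days", "may the agent delete records?": "no"}

baseline = run_eval(v1.__getitem__, CASES)
current = run_eval(v2.__getitem__, CASES)
print(failures_by_family(current, CASES))  # {'retrieval': ['c1']}
print(regressions(baseline, current))      # ['c1']
```

Failing a build whenever `regressions` returns a non-empty list is one way to turn "it seems to work" into a reproducible failure with a name.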

The method, end to end

Whatever the starting point, the work runs the same arc:

ground → boundary → substrate → evaluation → governance → deployment

  1. Establish the ground — what is real, and what must be made real. For a system in hand, the diagnosis of where it is true and where it only demos; for a capability you do not yet have, the research or held technology that will supply it.
  2. Boundary mapping — what must be deterministic, what may stay probabilistic, what needs evidence, what needs evaluation, what needs governance.
  3. Substrate intervention — supply the knowledge substrate and evidence layer the answers must stand on.
  4. Evaluation harness — the eval set, the failure taxonomy, the regression loop; measurement at the start, not bolted on.
  5. Governance gate — the supervision, attestation, and permission boundary; trust by construction.
  6. Deployment artifact — a running system inside the partner's environment, not a slide.
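
The governance gate in step 5 can be sketched as a deterministic permission check that sits in front of every action and records each decision. This is a minimal sketch under stated assumptions: `Gate`, `Decision`, the action names, and the reviewer address are hypothetical, and a real gate would also need to authenticate the approver.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_HUMAN = "needs_human"
    DENY = "deny"


@dataclass
class Gate:
    policy: dict[str, Decision]  # action name -> decision, fixed before the system runs
    audit_log: list[dict] = field(default_factory=list)

    def check(self, action: str, approved_by: Optional[str] = None) -> bool:
        """Return True only if the action may run; record every decision."""
        # Actions missing from the policy are denied by default.
        decision = self.policy.get(action, Decision.DENY)
        permitted = decision is Decision.ALLOW or (
            decision is Decision.NEEDS_HUMAN and approved_by is not None
        )
        self.audit_log.append({
            "action": action,
            "decision": decision.value,
            "approved_by": approved_by,
            "permitted": permitted,
        })
        return permitted


gate = Gate({
    "read_document": Decision.ALLOW,
    "send_email": Decision.NEEDS_HUMAN,
    "delete_record": Decision.DENY,
})
print(gate.check("read_document"))                                  # True
print(gate.check("send_email"))                                     # False
print(gate.check("send_email", approved_by="reviewer@example.com")) # True
print(gate.check("drop_table"))                                     # False
print(len(gate.audit_log))                                          # 4
```

The model can propose any action it likes. What may actually run is decided by a table that is fixed in advance, which is the sense in which trust is drawn before behaviour rather than filtered after it.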

What makes this a method and not a job title is that it is written down. The diagnostics, the failure families, and the evaluation patterns are a shared, recorded system rather than one engineer's instinct. That makes the work reproducible: an ordinarily excellent engineer working inside it still produces grounded judgement, and each engagement leaves the system a little sharper than it found it.

What Arc brings, and what stays whose

Arc does not arrive with a finished product to install, advice to hand over, or bodies to rent. It brings judgement, method, execution, technology, substrate, evaluation, and governance — and works until the system is real. The split of what each side keeps is fixed before work begins, and it is the structural reason this is engineering and not outsourcing:

  • You keep your data, your environment, your operational control, your deliverables, and your domain.
  • Arc keeps its pre-existing technology, its methods, and the general, non-client-specific learnings without which no method improves.

The carrier of all of it is substrate built and held in advance — retrieval and document structure that stays traceable, defensible evaluation, and governance by construction. Substrate, evaluation, and governance, closed together, are the trustworthy system; this is why the method is reproducible rather than heroic — the judgement is held in the system Arc brings, not in any one person.

The boundary is where research becomes engineering and a demo becomes something you can entrust. Arc works there, and holds the general, non-client-specific part of what it learns as a deep technical option, filed and reusable, long after the engagement ends.