Mark Gerrard
Reliability

Reliability is a budget, not a feature

14 February 2026 · 12 min

The interesting failures are never the model. They are the seams between systems — the retry that wasn’t idempotent, the timeout nobody set, the row that two processes wrote at once.
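The first two seams, a retry that is not idempotent and a timeout nobody set, are cheap to close. Here is a minimal sketch. It assumes the system of record can deduplicate writes by an idempotency key. `FakeStore` and `write_with_retry` are hypothetical stand-ins, not any particular CRM's API. The key names the logical write, so it has to stay stable across retries (for example, derived from the job run and the account id). `asyncio.wait_for` gives every attempt an explicit deadline.

```python
import asyncio


class FakeStore:
    """Hypothetical system of record that deduplicates writes by idempotency key."""

    def __init__(self):
        self.applied = {}  # idempotency key -> value written under that key
        self.rows = {}     # account id -> industry code

    async def write(self, key, account_id, value):
        if key in self.applied:
            # A retry of a write that already landed is a no-op, not a second write.
            return self.applied[key]
        self.rows[account_id] = value
        self.applied[key] = value
        return value


async def write_with_retry(store, key, account_id, value, attempts=3, timeout_s=2.0):
    """Retry a write with a per-attempt deadline, reusing the same key every time."""
    for attempt in range(attempts):
        try:
            # Every attempt gets an explicit timeout instead of waiting forever.
            return await asyncio.wait_for(
                store.write(key, account_id, value), timeout=timeout_s
            )
        except asyncio.TimeoutError:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(0.1 * 2 ** attempt)  # exponential backoff between attempts
```

The key is reused on every attempt. If the write landed but its acknowledgement was lost to the timeout, the retry does not apply it a second time.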

When we first put an agent in front of our CRM enrichment pipeline, it worked in the demo and fell over in week two. It quietly wrote a wrong industry code to a few hundred accounts because an upstream call had degraded and the model confidently filled the gap. No exception, no alert. The system did exactly what we told it to, which was the problem.
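That failure was silent because nothing treated missing upstream data as an error. Here is a minimal sketch of the opposite default. The field names, `REQUIRED_FIELDS`, and `UpstreamDegraded` are hypothetical. The idea is to check the upstream record before the model sees it and raise if anything is missing. A degraded call then produces an exception and an alert instead of a confident guess.

```python
REQUIRED_FIELDS = ("company_name", "website", "description")  # hypothetical schema


class UpstreamDegraded(Exception):
    """Raised instead of letting the model fill a gap left by a degraded upstream call."""


def check_upstream(record: dict) -> dict:
    # Missing or empty fields are a failure to surface, not a blank for the model to fill.
    missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
    if missing:
        raise UpstreamDegraded("missing fields: " + ", ".join(missing))
    return record
```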

Treat reliability as spend

Every nine of reliability has a price, and you pay it in engineering time and latency, not in cleverness. You decide the budget up front, per workflow. A draft-email agent can be loose. Anything that writes to the system of record gets a verification pass and a hard confidence floor, and you accept that it will sometimes refuse to act. Refusing is cheaper than being wrong.
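One way to write the budget down is as an explicit per-workflow policy, so the loose and strict cases show up in code review. This is a sketch under stated assumptions. `ReliabilityBudget`, `BUDGETS`, and `should_act` are hypothetical names, and the 0.82 floor is the enrichment figure used later in this post. `break_even_floor` is a swapped-in expected-cost argument, not necessarily how that floor was chosen. It acts only when the expected cost of being wrong is lower than the cost of refusing and sending the item to review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReliabilityBudget:
    """Hypothetical per-workflow policy: how much verification a workflow pays for."""
    writes_to_system_of_record: bool
    confidence_floor: float  # proposals below this are refused, not applied


def break_even_floor(cost_wrong: float, cost_review: float) -> float:
    # Acting risks (1 - confidence) * cost_wrong; refusing costs cost_review.
    # Act only when (1 - confidence) * cost_wrong < cost_review,
    # i.e. confidence > 1 - cost_review / cost_wrong (assumes cost_wrong > 0).
    return max(0.0, 1.0 - cost_review / cost_wrong)


BUDGETS = {
    "draft_email": ReliabilityBudget(writes_to_system_of_record=False, confidence_floor=0.0),
    "crm_enrichment": ReliabilityBudget(writes_to_system_of_record=True, confidence_floor=0.82),
}


def should_act(workflow: str, confidence: float, verified: bool) -> bool:
    budget = BUDGETS[workflow]
    if budget.writes_to_system_of_record and not verified:
        return False  # no verification pass, no write
    return confidence >= budget.confidence_floor
```

Under this cost model, a floor of 0.82 means a wrong write costs roughly 5.6 times as much as a human review (1 / 0.18 ≈ 5.6).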

The verification pass

The pattern is boring and it works: the model proposes, a deterministic check disposes. For enrichment, the check is a guard on confidence and a schema validation before anything touches the database.

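```python
# enrich.py: `model` and `schema` are the pipeline's own client and validator (not shown).
CONFIDENCE_FLOOR = 0.82  # below this, the lead goes to a human

class NeedsReview(Exception):
    """Raised when a proposal fails the deterministic check; nothing is written."""

async def enrich(lead):
    # The model proposes; a deterministic check disposes before anything is written.
    res = await model.run("classify", lead)
    if res.confidence < CONFIDENCE_FLOOR or not schema.valid(res):
        raise NeedsReview(lead.id)
    return res.apply(lead)
```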
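The guard can be exercised without a live model if the model is passed in and stubbed. The sketch below is self-contained. `Lead`, `Proposal`, `StubModel`, `schema_valid`, and the industry code list are hypothetical stand-ins for the pipeline's real types. Only the 0.82 threshold and the propose-then-check shape come from the code above.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import Optional

CONFIDENCE_FLOOR = 0.82
VALID_INDUSTRY_CODES = {"5112", "7372"}  # hypothetical code list


class NeedsReview(Exception):
    """Raised when a proposal fails the deterministic check."""


@dataclass(frozen=True)
class Lead:
    id: str
    name: str
    industry: Optional[str] = None


@dataclass(frozen=True)
class Proposal:
    industry: str
    confidence: float

    def apply(self, lead: Lead) -> Lead:
        # Return an updated copy; the caller decides when to persist it.
        return replace(lead, industry=self.industry)


class StubModel:
    """Stands in for the real model client: always returns a fixed proposal."""

    def __init__(self, proposal: Proposal):
        self.proposal = proposal

    async def run(self, task: str, lead: Lead) -> Proposal:
        return self.proposal


def schema_valid(p: Proposal) -> bool:
    # Deterministic check: the code must be known and the confidence a probability.
    return p.industry in VALID_INDUSTRY_CODES and 0.0 <= p.confidence <= 1.0


async def enrich(lead: Lead, model) -> Lead:
    res = await model.run("classify", lead)
    if res.confidence < CONFIDENCE_FLOOR or not schema_valid(res):
        raise NeedsReview(lead.id)
    return res.apply(lead)
```

Passing `model` in, instead of reading a module-level client, makes the refuse path as easy to test as the accept path.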

What I’d tell my earlier self

Spend the first week instrumenting, not building. You cannot budget reliability you can’t see. Most of it is just being able to ask “what did it actually see?” and get a straight answer.
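Here is a minimal sketch of that kind of instrumentation. It assumes model calls go through one async function with a `(task, payload)` signature, like the `model.run` call earlier. A hypothetical `record_calls` decorator writes each call's exact input, its output or error, and its latency as one JSON line. The line goes to whatever sink you pass in: a log shipper, a file, or a list in tests.

```python
import functools
import json
import time
import uuid


def record_calls(sink):
    """Hypothetical decorator: log each model call's exact input and output as a JSON line."""

    def wrap(fn):
        @functools.wraps(fn)
        async def inner(task, payload):
            entry = {"call_id": str(uuid.uuid4()), "task": task, "input": payload}
            start = time.perf_counter()
            try:
                result = await fn(task, payload)
                entry["output"] = result
                return result
            except Exception as exc:
                entry["error"] = repr(exc)  # failures are recorded too, then re-raised
                raise
            finally:
                entry["latency_ms"] = round((time.perf_counter() - start) * 1000, 1)
                # default=str keeps non-JSON values (dataclasses, datetimes) loggable.
                sink(json.dumps(entry, default=str) + "\n")

        return inner

    return wrap
```

The line is written in `finally`, so even a cancelled call (for example, one cut off by a timeout) leaves a record of what the model was shown.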
