Reliability is a budget, not a feature
The interesting failures are never the model. They are the seams between systems — the retry that wasn’t idempotent, the timeout nobody set, the row that two processes wrote at once.
When we first put an agent in front of our CRM enrichment pipeline, it worked in the demo and fell over in week two. It quietly wrote a wrong industry code to a few hundred accounts because an upstream call had degraded and the model confidently filled the gap. No exception, no alert. The system did exactly what we told it to, which was the problem.
Treat reliability as spend
Every nine of reliability has a price, and you pay it in engineering time and latency, not in cleverness. You decide the budget up front, per workflow. A draft-email agent can be loose. Anything that writes to the system of record gets a verification pass and a hard confidence floor, and you accept that it will sometimes refuse to act. Refusing is cheaper than being wrong.
The verification pass
The pattern is boring and it works: the model proposes, a deterministic check disposes. For enrichment, the check is a guard on confidence and a schema validation before anything touches the database.
What I’d tell my earlier self
Spend the first week instrumenting, not building. You cannot budget reliability you can’t see. Most of it is just being able to ask “what did it actually see?” and get a straight answer.