Mark Gerrard
Orchestration

Multi-model orchestration without the orchestra

3 October 2025 · 10 min

Most “orchestration” I see is elaborate routing that exists to justify itself. The useful version is smaller and more boring.

You do not need a model router for the sake of having one. We route for exactly two reasons: cost on high-volume, low-stakes calls, and a fallback when the primary degrades. Everything else is one good model and a clear prompt.
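The fallback half of that can be very small. Here is a minimal sketch, assuming each model is exposed as a plain callable and that a degraded primary surfaces as an exception; `ModelUnavailable`, `with_fallback`, and the stub models are hypothetical names for illustration, not part of any particular client library.

```python
from typing import Callable


class ModelUnavailable(Exception):
    """Hypothetical error raised when a model times out or errors."""


def with_fallback(
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> Callable[[str], str]:
    """Wrap two callables so the fallback runs only when the primary fails."""

    def call(task: str) -> str:
        try:
            return primary(task)
        except ModelUnavailable:
            # Primary is degraded: serve this request from the fallback.
            return fallback(task)

    return call


# Stub models for illustration only; real clients would make network calls.
def flaky_primary(task: str) -> str:
    raise ModelUnavailable("primary timed out")


def steady_fallback(task: str) -> str:
    return f"fallback answer for: {task}"


ask = with_fallback(flaky_primary, steady_fallback)
print(ask("summarise this ticket"))
```

There is no scoring and no policy in the wrapper: the primary answers unless it fails, and the fallback exists only for that case.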

A cascade, not a council

The pattern that earned its keep is a cascade: try the cheap model and accept its answer only if it clears both a confidence threshold and a schema check; otherwise escalate to the stronger one. No voting, no debate, no five-agent committee. Each step is independently verifiable.

route.py
python
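def answer(task):
    # small, large, and schema are assumed to be defined elsewhere.
    cheap = small.run(task)
    # Accept the cheap answer only if it clears both checks.
    if cheap.confidence >= 0.9 and schema.valid(cheap):
        return cheap
    # Otherwise escalate to the stronger model; this should be the rare path.
    return large.run(task)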
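To try the cascade without real models, here is a self-contained sketch with stubs. Everything beyond the shape of `answer` is an assumption: the `Answer` dataclass, the `is_valid` stand-in for the schema check, and the two stub models are hypothetical, and the confidence value is assumed to come from the model client as a number between 0 and 1.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Answer:
    text: str
    confidence: float  # assumed to be reported by the model client, in [0, 1]


def is_valid(ans: Answer) -> bool:
    """Stand-in schema check: here it only requires non-empty text."""
    return bool(ans.text.strip())


def cascade(
    task: str,
    small: Callable[[str], Answer],
    large: Callable[[str], Answer],
    threshold: float = 0.9,
) -> Tuple[Answer, str]:
    """Return the answer plus the tier that produced it."""
    cheap = small(task)
    if cheap.confidence >= threshold and is_valid(cheap):
        return cheap, "small"
    # The cheap answer failed a check: escalate to the stronger model.
    return large(task), "large"


# Stub models for illustration only.
def small_stub(task: str) -> Answer:
    # Pretend the small model is confident only on short tasks.
    conf = 0.95 if len(task) < 20 else 0.4
    return Answer(text=f"small: {task}", confidence=conf)


def large_stub(task: str) -> Answer:
    return Answer(text=f"large: {task}", confidence=0.99)


print(cascade("short task", small_stub, large_stub))
print(cascade("a much longer task that needs escalation", small_stub, large_stub))
```

Returning the tier alongside the answer is one way to keep each step checkable: you can log how often escalation happens and audit the cheap answers that were accepted.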
