What the planning domain taught me about retrieval

17 August 2025 · 11 min

We spent months tuning embeddings before realising the problem was that our documents were the wrong shape.

UK planning documents are long, cross-referencing, and full of clauses that only mean something in the context of the policy they amend. Chunking them on token count gave us retrieval that was technically relevant and practically useless — fragments that cited a rule without the rule.

Shape beats similarity

The fix was structural, not statistical. We chunked along the document’s own hierarchy — policy, clause, amendment — and attached the parent context to every child. Retrieval quality jumped more from that than from any embedding-model change we tried.
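A minimal sketch of chunking along the hierarchy, assuming the documents have already been parsed into a tree. The `Node` type, the `chunk` function, and the policy text are all invented for illustration and are not our production code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One unit in the document hierarchy: a policy, clause, or amendment."""
    kind: str
    title: str
    text: str
    children: list["Node"] = field(default_factory=list)


def chunk(node, ancestors=()):
    """Yield one chunk per node, prefixed with the text of every ancestor."""
    context = "\n".join(f"[{a.kind}] {a.title}: {a.text}" for a in ancestors)
    body = f"[{node.kind}] {node.title}: {node.text}"
    yield {
        "path": " > ".join(a.title for a in (*ancestors, node)),
        "text": f"{context}\n{body}" if context else body,
    }
    for child in node.children:
        yield from chunk(child, (*ancestors, node))


# Hypothetical policy tree, made up for this example.
policy = Node("policy", "H1", "Housing density limits.", [
    Node("clause", "H1.2", "Maximum of 50 dwellings per hectare.", [
        Node("amendment", "H1.2-A", "Replace 50 with 65 within 400 m of a station."),
    ]),
])

chunks = list(chunk(policy))
print(chunks[-1]["text"])
```

The amendment's chunk now carries the clause it amends and the policy above it, so a retrieved fragment arrives with the rule it cites. The cost is duplicated parent text across siblings, which may need truncating for deep hierarchies.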

Build the eval before the index

What actually moved the needle was a small, honest evaluation set: real questions, each paired with the passages an expert would expect back. Once we could measure retrieval, every change became a decision instead of a guess.
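A sketch of what such an evaluation can look like, assuming recall@k as the metric (one reasonable choice among several, not necessarily the one we used). The function names, passage ids, and the stand-in retriever are hypothetical.

```python
def recall_at_k(retrieved, expected, k):
    """Fraction of expected passage ids found in the top-k retrieved ids."""
    if not expected:
        raise ValueError("each eval question needs at least one expected passage")
    top = set(retrieved[:k])
    return sum(1 for pid in expected if pid in top) / len(expected)


def evaluate(retrieve, eval_set, k=5):
    """Mean recall@k of `retrieve` over the whole evaluation set."""
    scores = [recall_at_k(retrieve(q), expected, k) for q, expected in eval_set]
    return sum(scores) / len(scores)


# Hypothetical eval set: (question, passage ids an expert would expect back).
eval_set = [
    ("What density applies near a station?", ["H1.2", "H1.2-A"]),
    ("How many parking spaces per dwelling?", ["T3.1"]),
]

# Stand-in retriever returning canned rankings; a real one would query the index.
canned = {
    "What density applies near a station?": ["H1.2-A", "T3.1", "H1.2"],
    "How many parking spaces per dwelling?": ["T3.1", "H1.2"],
}

score = evaluate(canned.get, eval_set, k=2)
print(f"mean recall@2 = {score:.2f}")
```

Running the same `evaluate` call before and after a chunking or embedding change gives a number to compare, which is the point: the set can be small as long as the expected passages are ones an expert actually chose.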
