Logging the prompt that actually shipped

30 June 2025 · 7 min

The single highest-leverage thing we did for reliability was unglamorous: record the exact, fully-resolved prompt that was sent, every time.

Templates lie. The prompt in your code is not the prompt that shipped — by the time variables, retrieved context, and truncation have done their work, the string the model actually saw can be surprising. If you only log the template, every incident starts with a reconstruction.

Log the resolved string

Store the final rendered prompt, the model and parameters, the resolved context, and a hash that ties it to the response. Storage is cheap; a wrong answer you cannot reproduce is not.

trace.py

python

log.emit({
    "prompt": rendered,        # the literal string sent
    "model": cfg.model,
    "context_ids": ctx.ids,    # what retrieval returned
    "response_hash": sha(out),
})

Most of reliability is being able to ask “what did it actually see?” and get a straight answer in one query.

observability

← Previous

What the planning domain taught me about retrieval

More writing

All writing →

Reliability is a budget, not a feature Reliability · 2026·02·14 Verification before autonomy Agents · 2026·01·09 The CRM is the hard part Integration · 2025·11·22 Multi-model orchestration without the orchestra Orchestration · 2025·10·03 What the planning domain taught me about retrieval Retrieval · 2025·08·17