Patterns that consistently work
- Build the eval first. If you can't measure the system, you can't improve it.
- Start with retrieval, not training. RAG beats fine-tuning for most enterprise use cases.
- Stream tokens. Latency perception matters as much as actual latency.
- Fail open with a graceful fallback. Models will fail; users shouldn't notice.
- Build the human-handoff path on day one. Not all queries should be AI-resolved.
Traps to avoid
- Don't over-engineer the agent loop before you've nailed the single-turn experience.
- Don't centralise prompts in a file no one reviews. Prompts are code — review them.
- Don't treat tokens like they're free. They're cheap, but at scale, expensive.
- Don't skip cost dashboards. AI bills surprise teams more than any other infra cost.
