LLM integration without the cargo cult
The current wave of LLM integration in production systems has produced a recognisable pattern: teams adopt the technology enthusiastically, copy patterns from blog posts and documentation examples, and end up with integrations that are fragile, expensive to run, and difficult to reason about. This is cargo cult engineering applied to a new domain.
What cargo cult integration looks like
The symptoms are consistent across organisations and stacks:
- Prompts that are hundreds of lines long, encoding business logic that belongs in code
- LLM calls in hot paths where latency and cost accumulate invisibly
- No fallback when the model returns something unexpected
- Output parsed with assumptions that break on minor model version changes
- Evaluation done by eyeballing a handful of examples rather than systematic testing
None of this is a criticism of the teams involved. The tooling encourages it, the documentation examples demonstrate it, and the time pressure to ship something working is real.
Treating the LLM as a component
The discipline that fixes most of these problems is treating the LLM as a component with a defined interface, not as a magic box you talk to. This means:
Define the contract. What goes in, what comes out, and what constitutes a valid response. If you cannot write this down, you do not yet understand the integration well enough to build it.
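One way to write the contract down is as types plus an explicit validity predicate. The sketch below assumes a hypothetical ticket-triage integration; the names, fields, and category set are illustrative, not from the text above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TriageRequest:
    """What goes in: the raw ticket text."""
    ticket_text: str

@dataclass(frozen=True)
class TriageResponse:
    """What comes out: a category from a closed set, plus confidence."""
    category: str
    confidence: float

VALID_CATEGORIES = {"billing", "bug", "feature_request", "other"}

def is_valid(resp: TriageResponse) -> bool:
    """What constitutes a valid response, written down as code."""
    return resp.category in VALID_CATEGORIES and 0.0 <= resp.confidence <= 1.0
```

If you cannot fill in the types and the predicate, that is the signal that the integration is not yet understood well enough to build.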
Validate outputs. Parse and validate every response before using it. A response that does not conform to the expected schema should be treated as an error and handled through your error path, not patched around inline as a special case.
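Concretely, validation means a parser that either returns a conforming value or raises. A minimal sketch, reusing the hypothetical triage schema (`category`, `confidence` in a JSON object); a real system might use a schema library instead of hand-rolled checks.

```python
import json

class ModelResponseError(ValueError):
    """Raised when the model's output fails validation."""

def parse_triage(raw: str) -> dict:
    # Expected (hypothetical) schema: {"category": str, "confidence": float}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ModelResponseError(f"not JSON: {raw!r}") from exc
    if not isinstance(data, dict):
        raise ModelResponseError("expected a JSON object")
    category = data.get("category")
    confidence = data.get("confidence")
    if category not in {"billing", "bug", "feature_request", "other"}:
        raise ModelResponseError(f"unknown category: {category!r}")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        raise ModelResponseError(f"bad confidence: {confidence!r}")
    return {"category": category, "confidence": float(confidence)}
```

Everything downstream of `parse_triage` can then assume a conforming value; everything that raises goes through the same error path as any other failed dependency.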
Separate prompt from logic. Prompts are configuration. Business logic is code. Mixing them produces something that is neither maintainable as configuration nor testable as code.
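The separation can be as simple as keeping the prompt as a template that code only renders. A sketch under the same hypothetical triage example; in practice the template would live in a file or config store rather than inline.

```python
from string import Template

# Prompt as configuration: no business logic, just a renderable template.
TRIAGE_PROMPT = Template(
    "Classify the following support ticket into exactly one of: "
    "billing, bug, feature_request, other.\n"
    'Respond with JSON: {"category": ..., "confidence": ...}\n\n'
    "Ticket:\n$ticket_text"
)

def build_prompt(ticket_text: str) -> str:
    # Code only renders the configuration; what to *do* with the
    # classification lives elsewhere, where it can be unit tested.
    return TRIAGE_PROMPT.substitute(ticket_text=ticket_text)
```

Swapping the template then requires no code change, and the logic can be tested without ever rendering a prompt.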
On cost and latency
LLM calls are not free and they are not fast. Both cost and latency scale with token volume, which makes prompt length a first-class engineering concern, not a detail to be optimised later.
Cache aggressively where the input is stable. Batch where latency permits. Move LLM calls out of synchronous request paths where possible. These are not optimisations for scale — they are correct engineering practice from the first integration.
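For stable inputs, aggressive caching can be as little as a memoised wrapper around the call. A minimal sketch; `call_model` is a stand-in for a real client, and the approach is only correct when identical prompts may legitimately return a reused response.

```python
import hashlib
from functools import lru_cache

def call_model(prompt: str) -> str:
    # Stand-in for a real LLM call; deterministic here so the
    # sketch is self-contained.
    return "response-" + hashlib.sha256(prompt.encode()).hexdigest()[:8]

@lru_cache(maxsize=4096)
def cached_call(prompt: str) -> str:
    # Identical prompts hit the in-process cache instead of the model,
    # saving both the tokens and the round trip.
    return call_model(prompt)
```

A shared cache (keyed on a hash of the prompt) extends the same idea across processes; the principle is unchanged.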
Summary
LLM integration does not require new engineering disciplines. It requires applying existing ones: defined interfaces, validated outputs, separated concerns, and measured costs. The cargo cult emerges when teams treat the novelty of the technology as a reason to suspend normal engineering judgement. It is not.