The Prompt Is Not the Product
After 25 years of shipping products, I can tell you: the prompt is the easy part. The hard part is everything you build around it when it inevitably fails.
I spent three weeks on a prompt. Three weeks. I refined the system message, tuned the temperature, added few-shot examples, restructured the chain-of-thought reasoning. By the end I had something that nailed every test case I threw at it. It was beautiful. Clean outputs, consistent tone, handled edge cases gracefully.
Then I shipped it.
Users broke it seventeen different ways in the first forty-eight hours. Someone pasted in a CSV with 400 rows. Someone fed it Welsh. Someone — and I still don't understand how — managed to get it to return valid JSON with every value set to the string "undefined." A user in our Slack posted a screenshot of the output and just wrote "lol what." That was the entire bug report. They weren't wrong.
Warning
Here's the thing nobody tells you when you're getting started with AI: the prompt is maybe 15% of the work. The other 85% is everything you build around it for the moment it inevitably does something stupid.
I've seen this movie before
Twenty-five years in this industry and the pattern repeats with every new technology. I watched it happen with responsive design. Everyone obsessed over media queries and fluid grids — the sexy part — and ignored the hundred small decisions about what content to prioritise, how navigation should collapse, what happens when someone rotates their phone mid-scroll. The breakpoint wasn't the product. The experience at every resolution was the product.
I watched it happen with single-page apps. Everyone went all-in on client-side rendering and forgot about loading states, error boundaries, back-button behaviour, deep linking. The framework wasn't the product.
And now I'm watching it happen with AI. Everyone's tweaking prompts and benchmarking models and nobody's talking about what happens when the model returns garbage at 2am on a Saturday and there's no fallback.
The 85% that actually matters
After that humbling launch, I rebuilt the system properly. Here's what the "boring" 85% looked like:
Input validation. Not just type checking — semantic validation. Is this input something the model can reasonably handle? Is it too long? Too short? In a language we support? Contains PII we need to strip? I spent two days just on input preprocessing. Two days I would have killed for during that three-week prompt binge.
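To make that concrete, here's a minimal Python sketch of the kind of semantic validation I mean. The length limits and regex-based PII scrubbing are illustrative stand-ins of my own; a real system would use the model's actual context budget and a proper PII detector.

```python
import re

MAX_CHARS = 8_000  # assumption: a budget well under the model's context limit
MIN_CHARS = 3

# Hypothetical PII patterns; real systems need a dedicated detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def validate_input(text: str) -> tuple[bool, str]:
    """Return (ok, cleaned_or_reason). Semantic checks, not just types."""
    stripped = text.strip()
    if len(stripped) < MIN_CHARS:
        return False, "input too short to act on"
    if len(stripped) > MAX_CHARS:
        return False, "input too long; ask the user to trim or chunk it"
    # Strip PII before the text ever reaches a third-party API.
    cleaned = EMAIL_RE.sub("[email]", stripped)
    cleaned = PHONE_RE.sub("[phone]", cleaned)
    return True, cleaned
```

The point isn't these specific checks. It's that the function can say "no" before the model ever sees the input.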
Output parsing. The model doesn't always return what you asked for. Sometimes it wraps JSON in markdown code fences. Sometimes it adds a chatty preamble before the structured data. Sometimes it returns a perfectly formatted response to a completely different question. You need parsers that are fault-tolerant, that can extract signal from messy output, that know when to retry versus when to bail.
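A fault-tolerant parser doesn't have to be clever. Here's a rough Python sketch that tries the happy path, then peels off code fences, then grabs the outermost braces before giving up; returning None is the signal to retry or fall back.

```python
import json
import re

FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL)

def parse_model_json(raw: str):
    """Extract JSON from messy model output; return None when hopeless."""
    # Try the happy path first.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Peel off markdown code fences the model sometimes adds.
    m = FENCE_RE.search(raw)
    if m:
        try:
            return json.loads(m.group(1))
        except json.JSONDecodeError:
            pass
    # Last resort: grab the outermost {...} and hope it's intact.
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(raw[start:end + 1])
        except json.JSONDecodeError:
            pass
    return None  # caller decides: retry or bail
```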
Fallback chains. When the primary model fails — and it will fail — what happens? Do you try a different prompt? A smaller model? A rules-based fallback? Do you ask the user to rephrase? Every AI feature I ship now has at least two layers of fallback. The best ones have three.
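The shape of a fallback chain is simple: an ordered list of handlers, each of which may decline, with a final handler that never fails. A hypothetical sketch, where the lambdas stand in for real model calls:

```python
from typing import Callable, Optional

Handler = Callable[[str], Optional[str]]

def run_with_fallbacks(task: str, handlers: list[Handler]) -> str:
    """Try each handler in order; a handler declines by returning None."""
    for handler in handlers:
        try:
            result = handler(task)
            if result is not None:
                return result
        except Exception:
            continue  # in production: log the failure, then move on
    raise RuntimeError("the final handler must be infallible")

# Hypothetical three-layer chain, best to most basic:
chain: list[Handler] = [
    lambda t: None,              # stand-in for the primary model declining
    lambda t: None,              # stand-in for a smaller model declining
    lambda t: f"[rules] {t}",    # rules-based fallback that always answers
]
```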
Graceful degradation. This is the one that separates senior engineers from everyone else. When everything fails, the user should still be able to accomplish their goal. Maybe slower, maybe with more manual effort, but the door should never slam shut. I learned this lesson fifteen years ago building e-commerce sites — if the recommendation engine dies, you show bestsellers. If the search index is stale, you show categories. The AI version of this principle is identical.
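In code, graceful degradation is often just a try/except with a boring default. A toy version of that e-commerce example, where `personalised_recs` is a stand-in for a real service call:

```python
def personalised_recs(user_id: str) -> list[str]:
    # Stand-in for a real recommendation-service call.
    raise TimeoutError("recommendation engine unavailable")

BESTSELLERS = ["widget-a", "widget-b", "widget-c"]  # assumed static fallback

def recommendations(user_id: str) -> list[str]:
    """If the engine dies, show bestsellers; the page still renders."""
    try:
        return personalised_recs(user_id)
    except Exception:
        return BESTSELLERS
```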
Monitoring. You need to know when things go wrong before your users tell you. Latency tracking, output quality scoring, cost per request, failure rates by input type. I have dashboards now that would make my old analytics teammates weep. Not because they're complex — because they're focused on the only question that matters: is this thing actually working for real people right now?
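You don't need a metrics platform to start. A hypothetical per-feature counter like this, flushed to whatever dashboard you already have, answers the basic questions; the field names are mine, not a standard.

```python
from dataclasses import dataclass

@dataclass
class RequestMetrics:
    """The four numbers worth watching per AI feature."""
    count: int = 0
    failures: int = 0
    total_latency_s: float = 0.0
    total_cost_usd: float = 0.0

    def record(self, latency_s: float, cost_usd: float, ok: bool) -> None:
        self.count += 1
        self.failures += 0 if ok else 1
        self.total_latency_s += latency_s
        self.total_cost_usd += cost_usd

    def failure_rate(self) -> float:
        return self.failures / self.count if self.count else 0.0
```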
The parallel to marketing
Years ago I ran a campaign for a SaaS product. We spent six weeks on the creative — the copy, the visuals, the landing page. It was gorgeous. Award-worthy, probably. Conversion rate: 0.3%. Terrible. The creative was brilliant but we'd completely ignored the targeting, the offer structure, the follow-up sequence, the friction in the signup flow. We'd polished the thing people see and neglected everything around it.
That campaign taught me something I carry into every project: the visible part is never the whole product. The system around the visible part is the product.
A prompt is like a headline. It matters. A great one is better than a bad one. But a great headline with a broken landing page converts worse than a decent headline with a page that actually loads, works on mobile, and has a clear call to action.
What I build now
Every AI feature I ship follows the same architecture:
1. Validate and preprocess the input
2. Select prompt and model
3. Call the model
4. Parse and validate the output
5. Retry with simplified prompt
6. Fall back to rules-based approach
7. Present a graceful manual option
8. Log everything
Steps 2 and 3 are the "AI" part. They're two steps out of eight. That ratio feels right.
The uncomfortable truth
The best AI products I've used — the ones that feel magical — work well even when the AI doesn't. There's always a path forward. There's always a way to correct the output, to provide more context, to fall back to doing it manually. The AI accelerates the happy path but it doesn't gatekeep the outcome.
Key Insight
That's not a limitation of the technology. That's good product design. It was true when I was building forms in 2004 and it's true now.
I still spend time on prompts. I'm not saying they don't matter. But when I see someone tweet about spending a month "perfecting" their prompt, I want to ask: what happens when it meets a real user? What happens when someone pastes in something you never imagined? What happens at 2am when the API is slow and the model hallucinates and there's no one around to fix it?
The prompt is not the product. The prompt is a component in the product. And if you're spending more than 15% of your time on it, you're probably neglecting the parts that will actually determine whether anyone uses this thing twice.
Three weeks on a prompt. I'll never make that mistake again.