Prompt Engineering and the Taste Gap

2025-05-29 by Josh Bleecher Snyder

As I was learning prompt engineering, I encountered the same (good!) core advice repeatedly: invest in evals, make incremental improvements.

That’s great. But…where do the words come from? The ones that we evaluate, and iterate on, and augment with examples? When writing code, if you do a good job up front, you’ll spend less time fixing bugs and chasing performance issues. How do you do a better job up front of writing and editing a prompt?

The answer has slowly become clear to me: Have an LLM write and edit your prompts. You are still doing the heavy lifting. The understanding and context you provide are the crux of the work. But LLMs are unparalleled linguistic optimizers and amanuenses, remarkably good at taking muddled, sprawling intentions and crystallizing them into clear, effective prompts.

I’m a native English speaker with extensive writing experience, most of it technical. I am a reasonably clear thinker and communicator. I’ve been writing prompts for a year. And yet frontier models systematically outperform my attempts to do it all myself. Maybe if I had learned by osmosis at Anthropic, or had another few years of prompt engineering experience, this wouldn’t be true. But, to a first approximation, that describes nobody, so I’m pretty comfortable giving out this advice.

Frontier LLMs seem to be particularly good at providing strong guardrails to prevent unwanted but stubbornly persistent behaviors. For example, Claude generated this line, which proved to be strikingly effective in its context, and which I would never have come up with: “If you are about to do X - stop - and reconsider.”

How to prompt for prompts

When asking an LLM to write a prompt, I bring three things to the table: an explanation of what I’m trying to accomplish, all the relevant context I can gather, and my own judgment about what it produces.

The same approach works for refining a prompt. Explain what you’re doing, bring context (the original prompt and what’s going wrong), encourage the model not to make major changes (unless you think it’s necessary!), put on your judge’s wig, and iterate together.
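
For concreteness, here’s roughly what a refinement request can look like in code. This is a minimal sketch using the Anthropic Python SDK; the model name, the placeholder strings, and the wording of the meta-prompt are all illustrative, not prescriptions.

```python
# A minimal sketch of asking Claude to refine an existing prompt.
# Assumes ANTHROPIC_API_KEY is set in the environment; the model name
# and the meta-prompt wording are illustrative stand-ins.
import anthropic

client = anthropic.Anthropic()

original_prompt = "..."  # the prompt you're iterating on
whats_wrong = "..."      # the observed problem, in your own words

meta_prompt = f"""I'm refining a prompt for an LLM.

Here is the current prompt:

<prompt>
{original_prompt}
</prompt>

Here is what's going wrong: {whats_wrong}

Suggest a revised prompt that addresses this. Keep your changes
minimal; don't restructure the prompt unless it's clearly necessary."""

message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model name
    max_tokens=2048,
    messages=[{"role": "user", "content": meta_prompt}],
)
print(message.content[0].text)  # review with your judge's wig on, then iterate
```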

Don’t just ask an LLM to improve a prompt. If you do this, sure, it’ll make a bunch of plausible-sounding changes. But anything it can infer from the original prompt is already implicit in the original prompt. You need to bring new information: new thoughts, observed problems, additional judgments. You could ask it to speculatively expand the prompt and then work to cut it back; the judgment involved in that paring down is new information. But the more context and judgment you bring, the more improvements are available.

Lastly, don’t be afraid to make changes yourself, no matter what the LLM thinks. Trust your judgment. And then, of course, eval!

To each their own

Oh, one more thing. I have a pet theory: Models have a house palette. Claude responds better to prompts that Claude writes. ChatGPT does a better job following instructions that it wrote.

I have not put this to an empirical test. (If anyone out there wants to do an experiment...)
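
If anyone does, the shape of the experiment is simple: have each model write a prompt for the same task, then run every prompt on every model against the same eval and compare pass rates. Here is a rough sketch; the task, the model names, and the trivial stand-in eval are all mine, purely for illustration.

```python
# A rough sketch of the house-palette experiment: each model writes a
# prompt for the same task, then every (prompt, model) pairing runs
# against the same eval. Task, model names, and the stand-in eval are
# all illustrative.
import anthropic
import openai

TASK = "Summarize a customer support ticket in one sentence."

anthropic_client = anthropic.Anthropic()
openai_client = openai.OpenAI()

def claude(prompt: str) -> str:
    msg = anthropic_client.messages.create(
        model="claude-3-5-sonnet-latest",  # illustrative model name
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

def gpt(prompt: str) -> str:
    resp = openai_client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

models = {"claude": claude, "gpt": gpt}

# Step 1: each model writes a prompt for the task.
meta = f"Write a prompt that instructs an LLM to do this task well: {TASK}"
prompts = {name: ask(meta) for name, ask in models.items()}

def passes(output: str) -> bool:
    # Stand-in eval; substitute your real eval suite here.
    return output.strip().count(".") == 1

# Step 2: run every prompt on every model. If the theory holds, the
# diagonal (prompt author == runner) should come out ahead.
for author, prompt in prompts.items():
    for runner, ask in models.items():
        ok = passes(ask(prompt))
        print(f"prompt by {author}, run on {runner}: pass={ok}")
```

One run per pairing proves nothing, of course; a real test would need many tasks and repeated trials.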

This applies to humans, too. Words meant for humans should be written by humans.

Also published at commaok.xyz/ai/prompt-engineering-and-the-taste-gap/
