Prompt Engineering and the Taste Gap

2025-05-29 by Josh Bleecher Snyder

As I was learning prompt engineering, I encountered the same (good!) core advice repeatedly: invest in evals, make incremental improvements.

That’s great. But…where do the words come from? The ones that we evaluate, and iterate on, and augment with examples? When writing code, if you do a good job up front, you’ll spend less time fixing bugs and chasing performance issues. How do you do a better job up front of writing and editing a prompt?

The answer has slowly become clear to me: Have an LLM write and edit your prompts. You are still doing the heavy lifting. The understanding and context you provide are the crux of the work. But LLMs are unparalleled linguistic optimizers and amanuenses, remarkably good at taking muddled, sprawling intentions and crystallizing them into clear, effective prompts.

I’m a native English speaker with extensive writing experience, most of it technical. I am a reasonably clear thinker and communicator. I’ve been writing prompts for a year. And yet frontier models systematically outperform my attempts to do it all myself. Maybe if I had learned by osmosis at Anthropic, or had another few years of prompt engineering experience, this wouldn’t be true. But, to a first approximation, that describes nobody, so I’m pretty comfortable giving out this advice.

Frontier LLMs seem to be particularly good at providing strong guardrails to prevent unwanted but stubbornly persistent behaviors. For example, Claude generated this line, which proved to be strikingly effective in its context, and which I would never have come up with: “If you are about to do X - stop - and reconsider.”

How to prompt for prompts

When asking an LLM to write a prompt, I bring three things to the table: an explanation of what I’m trying to accomplish, all the relevant context I can gather, and my own judgment about what it produces.

The same approach works for refining a prompt. Explain what you’re doing, bring context (the original prompt and what’s going wrong), encourage the model not to make major changes (unless you think it’s necessary!), put on your judge’s wig, and iterate together.
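
For concreteness, here’s roughly what a refinement request can look like in code. This is a minimal sketch using the Anthropic Python SDK; the model name, the placeholder strings, and the wording of the meta-prompt are all illustrative, not prescriptions.

```python
# A minimal sketch of asking Claude to refine an existing prompt.
# Assumes ANTHROPIC_API_KEY is set in the environment; the model name
# and the meta-prompt wording are illustrative stand-ins.
import anthropic

client = anthropic.Anthropic()

original_prompt = "..."  # the prompt you're iterating on
whats_wrong = "..."      # the observed problem, in your own words

meta_prompt = f"""I'm refining a prompt for an LLM.

Here is the current prompt:

<prompt>
{original_prompt}
</prompt>

Here is what's going wrong: {whats_wrong}

Suggest a revised prompt that addresses this. Keep your changes
minimal; don't restructure the prompt unless it's clearly necessary."""

message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model name
    max_tokens=2048,
    messages=[{"role": "user", "content": meta_prompt}],
)
print(message.content[0].text)  # review with your judge's wig on, then iterate
```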

Don’t just ask an LLM to improve a prompt. If you do this, sure, it’ll make a bunch of plausible-sounding changes. But anything it can infer from the original prompt is already implicit in the original prompt. You need to bring new information: new thoughts, observed problems, additional judgments. You could ask it to speculatively expand the prompt and then work to cut it back; the judgment involved in that paring down is new information. But the more context and judgment you bring, the more improvements are available.

Lastly, don’t be afraid to make changes yourself, no matter what the LLM thinks. Trust your judgment. And then, of course, eval!

To each their own

Oh, one more thing. I have a pet theory: Models have a house palette. Claude responds better to prompts that Claude writes. ChatGPT does a better job following instructions that it wrote.

I have not put this to an empirical test. (If anyone out there wants to do an experiment...)
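
If anyone does, the shape of the experiment is simple: have each model write a prompt for the same task, then run every prompt on every model against the same eval and compare pass rates. Here is a rough sketch; the task, the model names, and the trivial stand-in eval are all mine, purely for illustration.

```python
# A rough sketch of the house-palette experiment: each model writes a
# prompt for the same task, then every (prompt, model) pairing runs
# against the same eval. Task, model names, and the stand-in eval are
# all illustrative.
import anthropic
import openai

TASK = "Summarize a customer support ticket in one sentence."

anthropic_client = anthropic.Anthropic()
openai_client = openai.OpenAI()

def claude(prompt: str) -> str:
    msg = anthropic_client.messages.create(
        model="claude-3-5-sonnet-latest",  # illustrative model name
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

def gpt(prompt: str) -> str:
    resp = openai_client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

models = {"claude": claude, "gpt": gpt}

# Step 1: each model writes a prompt for the task.
meta = f"Write a prompt that instructs an LLM to do this task well: {TASK}"
prompts = {name: ask(meta) for name, ask in models.items()}

def passes(output: str) -> bool:
    # Stand-in eval; substitute your real eval suite here.
    return output.strip().count(".") == 1

# Step 2: run every prompt on every model. If the theory holds, the
# diagonal (prompt author == runner) should come out ahead.
for author, prompt in prompts.items():
    for runner, ask in models.items():
        ok = passes(ask(prompt))
        print(f"prompt by {author}, run on {runner}: pass={ok}")
```

One run per pairing proves nothing, of course; a real test would need many tasks and repeated trials.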

This applies to humans, too. Words meant for humans should be written by humans.

Also published at commaok.xyz/ai/prompt-engineering-and-the-taste-gap/
