Prompt engineering
Status: 🟩 COMPLETE Last updated: 2026-06-19 Plain-English tagline: The craft of writing inputs that produce the outputs you actually want. The single highest-leverage skill in working with LLMs — and one anyone can learn in an afternoon.
In plain English
Same LLM. Same task. Two different prompts. The outputs can be wildly different in quality.
That’s the whole reason prompt engineering exists. Models are pattern-matchers — they’re heavily influenced by the patterns you give them. A vague prompt produces vague output. A well-structured prompt produces structured output. An example-rich prompt produces output matching the examples.
“Prompt engineering” sounds like a discipline; in practice it’s mostly a small set of habits you internalize. Once you have them, you stop having to think about it — your prompts become good by default.
This entry isn’t a magic-words list. It’s the structural moves that consistently work, with the reasoning so you can adapt them.
Why it matters
- Quality: the difference between a 60% answer and a 95% answer is almost always the prompt, not the model.
- Cost: good prompts get the right answer in one shot. Bad prompts require follow-ups, retries, and more tokens.
- Reliability: good prompts produce consistent output across runs. Bad prompts swing wildly.
- Compounding: every system you build with LLMs — agents, RAG pipelines, automation — runs on prompts. Bad prompts ripple through everything downstream.
If you’re going to spend any time with LLMs, the hour you spend learning prompt engineering pays back forever.
The seven moves that consistently work
Move 1: State the goal clearly
The single biggest improvement. State what you want in a sentence. Don’t bury it.
❌ Bad: “I have this code and I’m thinking about it and not sure what’s going on with it can you take a look”
✅ Better: “Find and fix the bug in this code that causes [specific symptom].”
✅ Best: “Find and fix the bug in this code that causes the user to be logged out after 5 minutes. Return only the corrected code; no explanation.”
The model can only optimize for what you ask. If you don’t say it, the model guesses.
Move 2: Give context — who, why, constraints
Models behave differently for different audiences and purposes. Tell them.
You are reviewing code for a non-coder's hobby project. The user
maintains the code with AI help, doesn't know advanced patterns,
and prioritizes clarity over cleverness. Don't suggest refactors
unless they fix actual bugs.
This single paragraph radically changes what the model produces — it’ll stop suggesting fancy abstractions and start writing the kind of feedback you can actually act on.
For Claude, the system prompt is the right place for this kind of persistent context. For one-off prompts, just put it at the top.
Move 3: Show examples (few-shot prompting)
If you want output in a specific format, show one or two examples. The model will match the pattern faithfully.
Convert each user message into a structured task.
Example 1:
User: "I need to deploy the site by Friday"
Output: { task: "Deploy site", deadline: "Friday", priority: "high" }
Example 2:
User: "Maybe one day we should clean up the old auth code"
Output: { task: "Refactor old auth code", deadline: null, priority: "low" }
Now convert this:
User: "Tomorrow morning let's send the welcome email"
Output:
The model will produce { task: "Send welcome email", deadline: "tomorrow morning", priority: "medium" } with very high reliability. Without the examples, it might give you free-form text, a different schema, or skip fields.
Two or three examples is usually plenty. More can help for complex tasks; rarely worth more than five.
Move 4: Structure your prompt with clear sections
Models follow structure. Use markdown headings, XML tags, or numbered lists to separate sections.
<task>
Summarize the following article in 3 bullet points.
</task>
<constraints>
- Each bullet must be one sentence
- Use plain English; no jargon
- Don't include the article's title
</constraints>
<article>
[paste article here]
</article>
Claude is specifically trained to attend to XML-style tags. Anthropic recommends them for any prompt with multiple sections. The model knows to treat <article>...</article> as data and <task>...</task> as instructions.
Move 5: Ask for the structure of the output
If you want JSON, ask for JSON. If you want a markdown table, ask for one. If you want exactly three sentences, say so.
Return your answer as JSON in this exact shape:
{
"summary": "<one-sentence summary>",
"topics": ["<topic>", "<topic>", "<topic>"],
"confidence": <0-100>
}
For strict structured output, also use Anthropic’s tool use feature — define a tool with a JSON schema and the model is forced to produce conforming output.
Move 6: Tell the model to think before answering
For non-trivial problems, ask the model to reason before producing the final answer. This dramatically improves accuracy on hard tasks.
Before giving your final answer, think step by step in <thinking> tags.
Then give your final answer in <answer> tags.
This is the “Chain of Thought” technique. It costs more tokens (the thinking takes tokens) but is often dramatically more accurate.
Modern Claude models (Opus with extended thinking enabled) do this automatically when you turn on the extended thinking feature in the API. The model can think for an arbitrary amount of time before answering.
Move 7: Iterate with the model
Don’t treat the first response as final. Critique it, ask for revisions, ask the model to evaluate its own work.
- “That was good but please make point 2 more specific.”
- “Are there cases your solution doesn’t handle?”
- “Re-read your answer and identify any factual claims that should be verified.”
LLMs are surprisingly good at self-critique when explicitly asked. Use this.
Two patterns that are gold for harder tasks
Pattern A: “Plan, then execute”
For any multi-step task, ask the model to plan first, then execute. The plan acts as scaffolding that keeps the execution coherent.
You're going to refactor this auth module. Before doing anything:
1. Read the module and list every function it exports.
2. Identify the top 3 changes that would improve it most.
3. For each change, describe the BEFORE state, the AFTER state, and the risk.
4. Wait for me to approve before making any edits.
This is exactly what Claude Code’s /plan skill and Plan subagent do. It’s the highest-leverage move for any non-trivial change.
Pattern B: “Define done”
State what “complete” means. Otherwise the model decides for itself, and you may disagree.
This task is complete when:
- All existing tests pass
- The new function has at least 3 test cases
- The diff has no `console.log` statements
- The PR description summarizes the change in one sentence
Explicit definitions of done remove judgment calls. The model can self-check.
A concrete example: same task, three prompts, three outputs
Task: Get a summary of a meeting transcript.
Prompt 1 (bad)
Summarize this meeting transcript: [transcript]
Likely output: A paragraph that may or may not capture what matters. Length: unpredictable. Format: prose. May include filler. May miss action items entirely.
Prompt 2 (better)
Summarize the following meeting transcript. Include:
- Topics discussed
- Decisions made
- Action items (who, what, when)
Format as markdown with headings.
Transcript: [transcript]
Likely output: Structured markdown with the three sections. Decent quality. Some action items might still be missed if they were ambiguous in the transcript.
Prompt 3 (best)
You are a precise meeting summarizer for a busy founder. The summary
will be read in 60 seconds, so it must be scannable and only contain
what matters.
<task>
Summarize the meeting below. Return three sections:
1. **Decisions** — list of decisions made. If none were made, write "No decisions."
2. **Action items** — list, each with: owner | task | deadline. If a deadline wasn't
stated, write "TBD."
3. **Open questions** — list of things explicitly left unresolved. If none, write "None."
Do not include: small talk, restating what people said in agreement, or
speculation about what was meant.
</task>
<example>
Decisions:
- Ship v2 of the dashboard by end of month
- Drop support for Internet Explorer
Action items:
- Sarah | Write the v2 launch announcement | Friday
- Marcus | Audit IE-specific code paths | TBD
Open questions:
- Will marketing have campaign assets ready?
</example>
<transcript>
[transcript]
</transcript>
Likely output: Exactly the three sections requested, in exactly the format shown, with the level of detail you wanted. Reliable across runs.
The total token cost of Prompt 3 vs Prompt 1 is maybe 3Ă— more. But the output is 10Ă— more useful. That ratio is typical.
Things that don’t work as well as people think
- “Be creative” — too vague to act on. Specify the kind of creativity you want.
- “Make it better” — better at what? Be specific.
- “Don’t make mistakes” — doesn’t help. Models will produce errors anyway. Better to ask for self-review.
- “Please please please” — adding politeness doesn’t materially improve output (though it doesn’t hurt either).
- Threatening the model (“if you get this wrong I will be very disappointed”) — also doesn’t help despite occasional folklore.
- Negative instructions alone (“don’t be boring”) — better paired with positive ones (“write in a punchy, conversational tone”).
Iteration vs craft
You won’t get prompts right on the first try. The workflow:
- Write a decent first prompt
- Run it
- Look at the output — what’s missing, wrong, or off-tone?
- Add a sentence to the prompt addressing that issue
- Run again
- Repeat until consistently good
Save your good prompts. Reuse them. Adapt them. Over time, you’ll build a personal library of templates for the situations you face. (The encyclopedia is one place to keep them.)
For high-stakes or high-volume prompts, test them against many inputs before locking them in. A prompt that works on one input but fails on another isn’t ready.
Common gotchas
-
The system prompt sets the tone for everything. A short, focused system prompt outperforms a long, kitchen-sink one. Edit ruthlessly.
-
Order matters. Instructions at the top and bottom of a long prompt get more attention than those in the middle. Put critical rules in those positions.
-
Models can be too compliant. If you tell Claude “you must answer in 5 words,” it will — even when the right answer needs more. Allow flexibility for the model to push back when constraints are wrong: “If you can’t comply, say why instead.”
-
Examples can bias. If all your examples have positive sentiment, the model assumes new inputs are also positive. Mix examples.
-
Roleplay can drift. “Pretend you are a Victorian-era butler” works initially; after a few turns, the model often relaxes back to default tone. Re-anchor in the system prompt if needed.
-
temperature: 0doesn’t guarantee determinism. Across model versions or even runs on the same version, you may see tiny differences. For true determinism, also pin the model version and seed (where available). -
Don’t over-engineer for one-off prompts. A 10-line system prompt for a single ad-hoc question is overkill. Save the heavy artillery for templates you’ll reuse.
See also
- What is an LLM? 🟩 — why prompt engineering works at all
- Tokens & context windows 🟩 — the cost dimension
- Tool use 🟩 — for structured output and external actions
- Temperature & sampling 🟩 — the determinism dial
- The Claude API 🟩 🟦 — how prompts get sent
- Claude models 🟩 🟦 — choosing the right one
- Agents 🟩 — the system prompt is half of an agent
- Fine-tuning vs context 🟩 — when prompting isn’t enough
- Claude Code deep dive 🟩 🟦 — prompt patterns in a real agent
- Prompt patterns for dev 🟩 — 20 reusable patterns
- Workflow patterns đźź©
- Glossary: Prompt
Sources
- Anthropic — Prompt engineering overview
- Anthropic — Prompt library — battle-tested examples
- OpenAI — Prompt engineering guide
- Lilian Weng — Prompt Engineering — research-flavored survey
- Chain-of-Thought paper — the foundational reasoning technique