Introduction: The Art of Communicating with LLMs
Prompt Engineering is the discipline that studies how to formulate effective requests to Large Language Models. It's not a mystical art: it's a set of systematic, testable, and reproducible techniques that can dramatically improve output quality.
The difference between a generic prompt and a well-engineered prompt can be the difference between a mediocre response and a professional result. This article covers the most effective techniques, from basic level to advanced patterns like chain-of-thought and ReAct.
What You'll Learn in This Article
- Fundamental techniques: zero-shot, one-shot, and few-shot
- Chain-of-Thought (CoT) and Tree-of-Thought (ToT) for complex reasoning
- System prompts and persona injection to control behavior
- Structured output: getting JSON, tables, and specific formats
- The ReAct pattern for integrating reasoning and action
- Prompt versioning strategies and A/B testing
Fundamentals: Zero-Shot, One-Shot, and Few-Shot
The most basic distinction in prompt engineering concerns the number of examples provided to the model before the actual request: zero-shot provides none, one-shot provides exactly one, and few-shot provides several.
Zero-Shot: No Examples
In zero-shot prompting, you directly ask the model to perform a task without providing any examples. It works well for simple, well-defined tasks where the model has already learned the pattern during training.
# Zero-shot: direct request without examples
from anthropic import Anthropic

client = Anthropic()

# Zero-shot classification
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=50,
    messages=[{
        "role": "user",
        "content": "Classify the sentiment of this review as POSITIVE, NEGATIVE, or NEUTRAL:\n\n'The product arrived early and works perfectly. Highly recommended!'"
    }]
)
print(response.content[0].text)  # POSITIVE
Few-Shot: Examples as Guidance
Few-shot prompting provides 2-5 input/output examples before the request. This "teaches" the model the desired pattern through concrete examples, significantly improving output consistency and accuracy.
# Few-shot: provide examples before the request
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=100,
    messages=[{
        "role": "user",
        "content": """Extract entities from the text in the format ENTITY: TYPE.

Example 1:
Text: "John Smith works in London for Acme Corp"
Output:
- John Smith: PERSON
- London: LOCATION
- Acme Corp: ORGANIZATION

Example 2:
Text: "On January 15th Apple presented the new iPhone in San Francisco"
Output:
- January 15th: DATE
- Apple: ORGANIZATION
- iPhone: PRODUCT
- San Francisco: LOCATION

Text: "Sarah Johnson will attend the Google I/O 2026 conference in Mountain View"
Output:"""
    }]
)
print(response.content[0].text)
Chain-of-Thought: Step-by-Step Reasoning
Chain-of-Thought (CoT) is one of the most powerful prompt engineering techniques. Instead of asking the model for a direct answer, you invite it to "reason step by step", making the reasoning process explicit. This significantly improves performance on tasks requiring logic, math, or multi-step reasoning.
# Chain-of-Thought: explicit reasoning

# WITHOUT CoT - often wrong answer on complex problems
response_no_cot = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=200,
    messages=[{
        "role": "user",
        "content": "A store sells apples at 2 euros per kg. If I buy 3.5 kg and pay with a 20 euro bill, how much change do I get? And if a 10% discount applies?"
    }]
)

# WITH CoT - step-by-step reasoning
response_cot = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": """A store sells apples at 2 euros per kg. If I buy 3.5 kg and pay with a 20 euro bill, how much change do I get? And if a 10% discount applies?

Think step by step before giving the final answer."""
    }]
)
print(response_cot.content[0].text)
# The model will explicitly show:
# 1. Base cost: 3.5 * 2 = 7 EUR
# 2. Change without discount: 20 - 7 = 13 EUR
# 3. 10% discount: 7 * 0.10 = 0.70 EUR
# 4. Cost with discount: 7 - 0.70 = 6.30 EUR
# 5. Change with discount: 20 - 6.30 = 13.70 EUR
When to Use Chain-of-Thought
CoT is particularly effective for: multi-step math problems, logical analysis with multiple premises, decisions with multiple criteria to balance, code debugging where you need to trace the flow, and any task where intermediate reasoning is as important as the final answer.
System Prompts and Persona Injection
The system prompt is a special message that pre-conditions the model's behavior for the entire conversation. It's the most effective way to define the role, tone, restrictions, and desired output format.
# System prompt: define role and behavior
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    system="""You are a senior software architecture expert with 20 years of experience.

RULES:
1. Always respond concisely and practically
2. Use Python code examples when useful
3. Highlight trade-offs and performance considerations
4. If a question is ambiguous, ask for clarification
5. Don't make up non-existent libraries or APIs

OUTPUT FORMAT:
- Brief answer (2-3 sentences)
- Pros and Cons (if applicable)
- Code example (if relevant)
- References to known patterns""",
    messages=[{
        "role": "user",
        "content": "Should I use a monolith or microservices for a startup with 3 developers?"
    }]
)
print(response.content[0].text)
Persona Injection
Persona injection is a variant of the system prompt where you assign the model a specific identity. It's not role-playing: it's a technique to activate specific knowledge and communication styles within the model.
- "You are a senior PostgreSQL DBA": activates specific knowledge about query optimization, indexes, EXPLAIN ANALYZE
- "You are a tech lead doing code review": focus on maintainability, naming, SOLID principles
- "You are a B2B landing page copywriter": professional tone, conversion focus, effective CTAs
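The personas above can be composed programmatically instead of hand-writing each system prompt. The following sketch is illustrative: the helper name `build_persona_prompt` and the example rules are assumptions, not part of any library.

```python
# Sketch: composing a persona system prompt from reusable parts.
# build_persona_prompt and the rule texts are hypothetical examples.

def build_persona_prompt(persona: str, rules: list[str]) -> str:
    """Combine a persona line with numbered behavioral rules."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return f"You are {persona}.\n\nRULES:\n{numbered}"

dba_prompt = build_persona_prompt(
    "a senior PostgreSQL DBA",
    [
        "Always consider index usage and suggest EXPLAIN ANALYZE when relevant",
        "Point out lock contention and vacuum implications",
        "Do not invent configuration parameters",
    ],
)
print(dba_prompt)
```

The resulting string would then be passed as the `system` parameter, exactly as in the example above.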
Structured Output: JSON, Tables, and Specific Formats
One of the most practical applications of prompt engineering is getting output in structured formats that can be easily parsed by code.
import json

# Request structured JSON output
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": """Analyze this bug description and return a JSON with this exact structure:

{
  "severity": "critical|high|medium|low",
  "component": "string",
  "steps_to_reproduce": ["step1", "step2"],
  "expected_behavior": "string",
  "actual_behavior": "string",
  "suggested_fix": "string"
}

Bug report: "When I click 'Save' in the profile form with an image larger than 5MB, the page crashes with error 500. It should show an error message about the maximum size."

Reply ONLY with the JSON, no additional text."""
    }]
)

# Safe JSON parsing
try:
    bug_data = json.loads(response.content[0].text)
    print(f"Severity: {bug_data['severity']}")
    print(f"Component: {bug_data['component']}")
except json.JSONDecodeError:
    print("JSON parsing error")
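In practice, models sometimes wrap the JSON in a markdown code fence despite the "JSON only" instruction. A defensive parser can strip the fence before parsing; the helper below (`extract_json` is a hypothetical name, not a library function) is one minimal way to do that.

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Parse JSON from a model reply, tolerating a surrounding markdown fence.

    Hypothetical helper: strips a ```json ... ``` fence if present and
    raises ValueError when no valid JSON can be parsed.
    """
    text = raw.strip()
    # Remove a surrounding ```json ... ``` fence if present
    fenced = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"No valid JSON in model output: {exc}") from exc

print(extract_json('```json\n{"severity": "high"}\n```'))  # {'severity': 'high'}
```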
The ReAct Pattern: Reasoning + Action
ReAct (Reason + Act) is an advanced pattern where the model alternates between reasoning phases and action phases. The model thinks about what to do, executes an action (like calling a tool), observes the result, and repeats the cycle until the task is complete.
# Simulation of the ReAct pattern
react_prompt = """You are an assistant that solves problems using the ReAct pattern.
For each step, use the format:
Thought: [reasoning about what to do]
Action: [action to execute]
Observation: [action result]
... (repeat until solution)
Final Answer: [final answer]
Available tools:
- search(query): search for information
- calculate(expression): calculate mathematical expressions
- lookup(database, key): look up in a database
Question: What is the total cost if I buy 15 software licenses at 49.99 EUR each with 22% VAT?"""
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    messages=[{"role": "user", "content": react_prompt}]
)
print(response.content[0].text)
# The model will produce:
# Thought: I need to calculate the base cost and then add VAT
# Action: calculate(15 * 49.99)
# Observation: 749.85
# Thought: Now I need to add 22% VAT
# Action: calculate(749.85 * 1.22)
# Observation: 914.817
# Final Answer: The total cost is 914.82 EUR (749.85 + 164.97 VAT)
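To turn the transcript above into a working loop, something has to parse each "Action:" line, run the tool, and feed the observation back to the model. The dispatcher below is a minimal sketch of that step for the calculate tool only; `run_action` is a hypothetical name, and the expression is evaluated with builtins disabled so that only plain arithmetic works.

```python
import re

def run_action(line: str) -> str:
    """Parse an 'Action: tool(args)' line and return the observation string.

    Minimal sketch: only the calculate tool is wired up; the expression
    is evaluated with builtins disabled, so only arithmetic succeeds.
    """
    match = re.match(r"Action:\s*(\w+)\((.*)\)", line.strip())
    if not match:
        return "Observation: unrecognized action"
    tool, args = match.groups()
    if tool == "calculate":
        result = eval(args, {"__builtins__": {}}, {})  # arithmetic only
        return f"Observation: {result}"
    return f"Observation: tool '{tool}' not implemented"

print(run_action("Action: calculate(15 * 49.99)"))
```

A full ReAct agent would call this in a loop, appending each observation to the conversation until the model emits a "Final Answer:" line.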
Advanced Optimization Techniques
Beyond fundamental techniques, there are advanced strategies to maximize output quality.
Delimiters and Structure
Using clear delimiters to separate different parts of the prompt reduces ambiguity and improves model comprehension:
- Triple quotes (""") to enclose text to analyze
- XML tags (<context>, <instructions>) to structure sections
- Hash markers (###) to separate logical sections
- Explicit numbering for multi-step instructions
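The delimiter styles above can be applied mechanically when assembling prompts in code. The sketch below uses XML-style tags; the tag names and the helper `build_delimited_prompt` are illustrative conventions, not an API.

```python
# Sketch: assembling a prompt with XML-style delimiters.
# The <context>/<instructions>/<input> tag names are conventions, not an API.

def build_delimited_prompt(context: str, instructions: str, user_input: str) -> str:
    return (
        f"<context>\n{context}\n</context>\n\n"
        f"<instructions>\n{instructions}\n</instructions>\n\n"
        f"<input>\n{user_input}\n</input>"
    )

prompt = build_delimited_prompt(
    context="You are reviewing customer feedback for a SaaS product.",
    instructions="Classify the feedback as BUG, FEATURE_REQUEST, or PRAISE.",
    user_input="The export button does nothing when I click it.",
)
print(prompt)
```

Keeping context, instructions, and input in clearly tagged sections makes it unambiguous which part the model should analyze and which part it should obey.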
Negative Prompting
Specifying what not to do is equally important as specifying what to do. Negative instructions help the model avoid common unwanted patterns:
- "DO NOT start with 'Sure!' or 'Certainly!'"
- "DO NOT make up data or statistics if they're not in the context"
- "DO NOT repeat the question in the answer"
- "If you don't know the answer, say so. DO NOT fabricate."
Self-Consistency
The self-consistency technique consists of generating multiple responses to the same prompt (with temperature > 0) and selecting the answer that appears most often, a form of majority voting over independent samples. It's particularly useful for tasks with a single well-defined correct answer.
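The voting step can be sketched in a few lines. Here `sample_fn` stands in for a repeated API call with temperature > 0; the stubbed outputs below are illustrative, not real model responses.

```python
from collections import Counter

def self_consistency(sample_fn, n: int = 5) -> str:
    """Run the same prompt n times and return the most frequent final answer.

    sample_fn stands in for an API call with temperature > 0; any callable
    returning a final-answer string works.
    """
    answers = [sample_fn() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Stubbed sampler simulating five noisy model outputs (illustrative only)
fake_outputs = iter(["13.70", "13.70", "13.00", "13.70", "12.30"])
winner = self_consistency(lambda: next(fake_outputs), n=5)
print(winner)  # 13.70
```

In a real pipeline, `sample_fn` would extract only the final answer from each completion before voting, so that differences in intermediate reasoning don't split the vote.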
Prompt Engineering Checklist
- Define the role with a clear system prompt
- Use few-shot for specific tasks or custom formats
- Activate CoT for complex reasoning ("Think step by step")
- Specify the desired output format (JSON, list, table)
- Use delimiters to separate context, instructions, and input
- Include negative instructions to avoid unwanted patterns
- Test with different temperatures to find optimal balance
- Version your prompts and track performance with A/B tests
Prompt Versioning and Testing
In production, prompts are code. They should be versioned, tested, and monitored like any other software component. A seemingly innocuous prompt change can drastically alter model behavior.
# Simple prompt versioning and testing system
from dataclasses import dataclass
from typing import Callable

@dataclass
class PromptVersion:
    version: str
    system_prompt: str
    template: str
    evaluator: Callable[[str], float]

# Define prompt versions
v1 = PromptVersion(
    version="1.0",
    system_prompt="You are a helpful assistant.",
    template="Summarize this text: {text}",
    evaluator=lambda output: max(0.0, 1 - len(output.split()) / 100)  # rewards brevity, target < 100 words
)

v2 = PromptVersion(
    version="2.0",
    system_prompt="You are an expert editor. Produce concise and informative summaries.",
    template="""Summarize the following text in maximum 3 bullet points.
Each point must contain a key piece of information with specific data.

Text:
\"\"\"{text}\"\"\"

Output format:
- [point 1]
- [point 2]
- [point 3]""",
    evaluator=lambda output: max(0.0, 1 - len(output.split()) / 50)  # stricter brevity target
)

# A/B test
def test_prompt(version: PromptVersion, test_texts: list[str]) -> float:
    scores = []
    for text in test_texts:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=300,
            system=version.system_prompt,
            messages=[{"role": "user", "content": version.template.replace("{text}", text)}]
        )
        score = version.evaluator(response.content[0].text)
        scores.append(score)
    return sum(scores) / len(scores)
Conclusions
Prompt engineering is not a static skill: it evolves with models and with the understanding of their capabilities. The techniques presented in this article - zero-shot, few-shot, CoT, system prompts, structured output, ReAct - form a complete toolkit for interacting effectively with any LLM.
The key point is to treat prompts as code: version them, test them, iterate. A well-engineered prompt can eliminate the need for fine-tuning in many cases, saving significant time and costs.
In the next article, we'll explore Fine-Tuning: when prompt engineering isn't enough and you need to adapt the model itself to your data, using efficient techniques like LoRA, QLoRA, and PEFT.