Mastering Claude AI: Advanced Prompting, Tool Use, and Production Patterns
Why Claude?
Claude (built by Anthropic) is designed from the ground up for honesty, helpfulness, and harmlessness. It excels at long-context reasoning (200K token window), follows instructions with high fidelity, and produces reliable structured outputs. The model family spans Haiku (fast/cheap), Sonnet (balanced), and Opus (most capable).
┌───────────────────────────────────────────────────────┐
│ Claude Model Comparison (2025) │
├────────────────┬──────────┬────────────┬──────────────┤
│ Model │ Speed │ Cost/MTok │ Best For │
├────────────────┼──────────┼────────────┼──────────────┤
│ claude-opus-4-7 │ Slow │ $15/$75 │ Complex reasoning│
│ claude-sonnet-4-6 │ Fast │ $3/$15 │ Balanced tasks │
│ claude-haiku-4-5 │ Fastest │ $0.25/$1.25│ High throughput │
└────────────────┴──────────┴────────────┴──────────────┘
Cost = Input/Output per million tokens
Setting Up the Anthropic SDK
python1pip install anthropic
python1from anthropic import Anthropic 2 3client = Anthropic() # Uses ANTHROPIC_API_KEY env var 4 5# Basic message 6response = client.messages.create( 7 model="claude-sonnet-4-6", 8 max_tokens=1024, 9 system="You are an expert software architect. Be precise and concise.", 10 messages=[ 11 {"role": "user", "content": "Explain the CQRS pattern in 3 sentences."} 12 ] 13) 14print(response.content[0].text) 15print(f"Input tokens: {response.usage.input_tokens}") 16print(f"Output tokens: {response.usage.output_tokens}")
Prompt Engineering for Claude
System Prompt Architecture
Claude responds extremely well to structured system prompts:
python1system_prompt = """You are a senior code reviewer at a top tech company. 2 3## Your Role 4- Review code for correctness, performance, security, and maintainability 5- Be direct and specific — cite line numbers when relevant 6- Prioritize: security bugs > logic errors > performance > style 7 8## Output Format 9Always respond with: 101. **Summary** (1-2 sentences overall assessment) 112. **Critical Issues** (security/bugs — must fix) 123. **Suggestions** (performance/style — nice to have) 134. **Verdict**: APPROVE / REQUEST_CHANGES / REJECT 14 15## Constraints 16- Never praise code just to be nice 17- If code is good, say so and explain why 18- Use markdown code blocks with language tags"""
Few-Shot Prompting
python1def classify_sentiment(texts: list[str]) -> list[str]: 2 examples = [ 3 {"text": "This product exceeded all my expectations!", "label": "POSITIVE"}, 4 {"text": "Completely useless, waste of money.", "label": "NEGATIVE"}, 5 {"text": "It arrived on time.", "label": "NEUTRAL"}, 6 ] 7 8 examples_text = "\n".join([ 9 f'Text: "{e["text"]}" → {e["label"]}' 10 for e in examples 11 ]) 12 13 response = client.messages.create( 14 model="claude-haiku-4-5-20251001", 15 max_tokens=256, 16 messages=[{ 17 "role": "user", 18 "content": f"""Classify sentiment as POSITIVE, NEGATIVE, or NEUTRAL. 19 20Examples: 21{examples_text} 22 23Now classify these (respond with JSON array): 24{chr(10).join(f'{i+1}. "{t}"' for i, t in enumerate(texts))}""" 25 }] 26 ) 27 import json 28 return json.loads(response.content[0].text)
Chain of Thought with Extended Thinking
Claude's thinking mode lets the model reason through complex problems before responding:
python1response = client.messages.create( 2 model="claude-opus-4-7", 3 max_tokens=16000, 4 thinking={ 5 "type": "enabled", 6 "budget_tokens": 10000 # max tokens for internal reasoning 7 }, 8 messages=[{ 9 "role": "user", 10 "content": """A store sells apples for $0.50 each and oranges for $0.75 each. 11 Alice buys 3x as many apples as oranges and spends exactly $9.00. 12 How many of each fruit did she buy?""" 13 }] 14) 15 16for block in response.content: 17 if block.type == "thinking": 18 print(f"[Thinking]\n{block.thinking}") 19 elif block.type == "text": 20 print(f"[Answer]\n{block.text}")
Tool Use (Function Calling)
Claude can call external tools and use the results in its responses:
python1import json 2from datetime import datetime 3 4tools = [ 5 { 6 "name": "get_stock_price", 7 "description": "Get the current stock price for a ticker symbol", 8 "input_schema": { 9 "type": "object", 10 "properties": { 11 "ticker": { 12 "type": "string", 13 "description": "Stock ticker symbol (e.g. AAPL, GOOGL)" 14 } 15 }, 16 "required": ["ticker"] 17 } 18 }, 19 { 20 "name": "search_news", 21 "description": "Search for recent news articles about a topic", 22 "input_schema": { 23 "type": "object", 24 "properties": { 25 "query": {"type": "string"}, 26 "max_results": {"type": "integer", "default": 5} 27 }, 28 "required": ["query"] 29 } 30 } 31] 32 33def run_tool(name: str, inputs: dict) -> str: 34 """Execute the tool and return result as string.""" 35 if name == "get_stock_price": 36 # In production, call a real stock API 37 return json.dumps({"ticker": inputs["ticker"], "price": 185.32, "change": "+1.2%"}) 38 elif name == "search_news": 39 return json.dumps([{"title": f"News about {inputs['query']}", "url": "..."}]) 40 41def agentic_loop(user_message: str) -> str: 42 """Run Claude in an agentic loop until it produces a final answer.""" 43 messages = [{"role": "user", "content": user_message}] 44 45 while True: 46 response = client.messages.create( 47 model="claude-sonnet-4-6", 48 max_tokens=4096, 49 tools=tools, 50 messages=messages 51 ) 52 53 if response.stop_reason == "end_turn": 54 # Claude is done — extract final text 55 return next(b.text for b in response.content if b.type == "text") 56 57 if response.stop_reason == "tool_use": 58 # Claude wants to use tools 59 messages.append({"role": "assistant", "content": response.content}) 60 61 tool_results = [] 62 for block in response.content: 63 if block.type == "tool_use": 64 print(f"Calling tool: {block.name}({block.input})") 65 result = run_tool(block.name, block.input) 66 tool_results.append({ 67 "type": "tool_result", 68 "tool_use_id": block.id, 69 "content": result 70 }) 71 72 messages.append({"role": "user", "content": tool_results}) 73 74# Run it 75answer = agentic_loop("What's the latest news about Apple stock and its current price?") 76print(answer)
Prompt Caching (90% Cost Reduction)
For prompts with large static context (system prompts, documents), use prompt caching:
python1from anthropic import Anthropic 2 3client = Anthropic() 4 5# Load your large document once 6with open("large_codebase.txt") as f: 7 codebase_content = f.read() # Could be 100K+ tokens 8 9def ask_about_code(question: str) -> str: 10 response = client.messages.create( 11 model="claude-sonnet-4-6", 12 max_tokens=1024, 13 system=[ 14 { 15 "type": "text", 16 "text": "You are a code assistant. Answer questions about the provided codebase." 17 }, 18 { 19 "type": "text", 20 "text": f"<codebase>\n{codebase_content}\n</codebase>", 21 "cache_control": {"type": "ephemeral"} # Cache this large block 22 } 23 ], 24 messages=[{"role": "user", "content": question}] 25 ) 26 27 # First call: cache MISS — pays full price 28 # Subsequent calls (within 5 min): cache HIT — 90% cheaper, 2x faster 29 cache_hits = response.usage.cache_read_input_tokens 30 print(f"Cache hits: {cache_hits} tokens (saved ~${cache_hits * 0.000003:.4f})") 31 32 return response.content[0].text
Structured Output with JSON Mode
python1from pydantic import BaseModel 2from typing import Literal 3 4class CodeReview(BaseModel): 5 verdict: Literal["APPROVE", "REQUEST_CHANGES", "REJECT"] 6 summary: str 7 critical_issues: list[str] 8 suggestions: list[str] 9 score: int # 1-10 10 11def review_code(code: str) -> CodeReview: 12 response = client.messages.create( 13 model="claude-sonnet-4-6", 14 max_tokens=2048, 15 system="You are a code reviewer. Always respond with valid JSON matching the schema.", 16 messages=[{ 17 "role": "user", 18 "content": f"""Review this code and respond with JSON matching this schema: 19{CodeReview.model_json_schema()} 20 21Code: 22```python 23{code} 24```""" 25 }] 26 ) 27 28 import json 29 data = json.loads(response.content[0].text) 30 return CodeReview(**data) 31 32review = review_code("def add(a, b): return a + b") 33print(f"Verdict: {review.verdict}, Score: {review.score}/10")
Streaming for Real-Time UX
python1import sys 2 3with client.messages.stream( 4 model="claude-sonnet-4-6", 5 max_tokens=2048, 6 messages=[{"role": "user", "content": "Write a blog post about AI in healthcare."}] 7) as stream: 8 for text in stream.text_stream: 9 print(text, end="", flush=True) 10 11 # Final message with usage stats 12 final = stream.get_final_message() 13 print(f"\n\nTotal tokens: {final.usage.input_tokens + final.usage.output_tokens}")
Production Best Practices
- Model selection: Haiku for classification/extraction, Sonnet for reasoning/generation, Opus for frontier tasks only
- Temperature: 0 for deterministic tasks (extraction, classification); 0.3–0.7 for creative tasks
- Max tokens: Set tight limits — avoids runaway costs and signals expected response length
- Retry logic: Implement exponential backoff for 529 (overloaded) and 529 errors
- Observability: Log every request with model, tokens, latency, and cost