Mastering Claude AI: Advanced Prompting, Tool Use, and Production Patterns

A complete expert guide to Claude AI — prompt engineering strategies, tool use, extended thinking, prompt caching, multi-turn conversations, and patterns for building production AI applications with Anthropic's Claude.

#Claude#Anthropic#LLM#Prompt Engineering#Tool Use#AI

Why Claude?

Claude (built by Anthropic) is designed from the ground up for honesty, helpfulness, and harmlessness. It excels at long-context reasoning (200K token window), follows instructions with high fidelity, and produces reliable structured outputs. The model family spans Haiku (fast/cheap), Sonnet (balanced), and Opus (most capable).

┌─────────────────────────────────────────────────────────────┐
│               Claude Model Comparison (2025)                │
├───────────────────┬─────────┬───────────┬───────────────────┤
│ Model             │ Speed   │ Cost/MTok │ Best For          │
├───────────────────┼─────────┼───────────┼───────────────────┤
│ claude-opus-4-7   │ Slow    │ $15/$75   │ Complex reasoning │
│ claude-sonnet-4-6 │ Fast    │ $3/$15    │ Balanced tasks    │
│ claude-haiku-4-5  │ Fastest │ $1/$5     │ High throughput   │
└───────────────────┴─────────┴───────────┴───────────────────┘
  Cost = input/output price in USD per million tokens
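
To make the table concrete, a per-request cost can be estimated from the token counts the API returns. This is a minimal sketch, assuming the Sonnet row's prices ($3 input, $15 output per million tokens). `PRICES_PER_MTOK` and `estimate_cost` are hypothetical names used only for illustration, and the figures should be checked against current pricing.

```python
# Illustrative only: prices copied from the Sonnet row of the table above.
PRICES_PER_MTOK = {
    "claude-sonnet-4-6": (3.00, 15.00),  # (input, output) USD per million tokens
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


cost = estimate_cost("claude-sonnet-4-6", input_tokens=2_000, output_tokens=500)
print(f"${cost:.4f}")
```

Output tokens cost five times as much as input tokens in this row, so response length usually dominates the bill for short prompts.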

Setting Up the Anthropic SDK

bash
pip install anthropic
python
from anthropic import Anthropic

client = Anthropic()  # Uses ANTHROPIC_API_KEY env var

# Basic message
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are an expert software architect. Be precise and concise.",
    messages=[
        {"role": "user", "content": "Explain the CQRS pattern in 3 sentences."}
    ],
)
print(response.content[0].text)
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")

Prompt Engineering for Claude

System Prompt Architecture

Claude responds extremely well to structured system prompts:

python
system_prompt = """You are a senior code reviewer at a top tech company.

## Your Role
- Review code for correctness, performance, security, and maintainability
- Be direct and specific; cite line numbers when relevant
- Prioritize: security bugs > logic errors > performance > style

## Output Format
Always respond with:
1. **Summary** (1-2 sentences overall assessment)
2. **Critical Issues** (security/bugs: must fix)
3. **Suggestions** (performance/style: nice to have)
4. **Verdict**: APPROVE / REQUEST_CHANGES / REJECT

## Constraints
- Never praise code just to be nice
- If code is good, say so and explain why
- Use markdown code blocks with language tags"""
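
One practical benefit of a fixed output format is that downstream code can read it. The sketch below pulls the verdict out of a review that follows the format in the system prompt. `extract_verdict` and the sample review text are hypothetical, and a real reply may deviate from the format, so the function returns `None` rather than guessing.

```python
import re
from typing import Optional


def extract_verdict(review_text: str) -> Optional[str]:
    """Return the verdict named after the word 'Verdict', or None if absent."""
    # \W* skips markdown markers and punctuation such as '**: ' or ':** '.
    match = re.search(r"Verdict\W*(APPROVE|REQUEST_CHANGES|REJECT)\b", review_text)
    return match.group(1) if match else None


sample = """1. **Summary** User input is concatenated into SQL.
2. **Critical Issues** SQL injection in the query builder.
3. **Suggestions** Use parameterized queries.
4. **Verdict**: REQUEST_CHANGES"""

print(extract_verdict(sample))
```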

Few-Shot Prompting

python
import json


def classify_sentiment(texts: list[str]) -> list[str]:
    # Labelled examples that show the model the expected output format
    examples = [
        {"text": "This product exceeded all my expectations!", "label": "POSITIVE"},
        {"text": "Completely useless, waste of money.", "label": "NEGATIVE"},
        {"text": "It arrived on time.", "label": "NEUTRAL"},
    ]

    examples_text = "\n".join(
        f'Text: "{e["text"]}" → {e["label"]}' for e in examples
    )
    numbered_texts = "\n".join(f'{i + 1}. "{t}"' for i, t in enumerate(texts))

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": f"""Classify sentiment as POSITIVE, NEGATIVE, or NEUTRAL.

Examples:
{examples_text}

Now classify these. Respond with only a JSON array of labels, one per text:
{numbered_texts}""",
        }],
    )
    # Assumes the reply is a bare JSON array; json.loads raises if it is not
    return json.loads(response.content[0].text)
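
Calling `json.loads` directly on the reply works only when the model returns a bare JSON array. Replies sometimes arrive wrapped in a code fence or with a sentence of preamble, so a more tolerant parser can help. This is a minimal sketch, not part of the SDK; `parse_json_array` is a hypothetical helper name.

```python
import json
import re


def parse_json_array(raw: str) -> list:
    """Parse a JSON array from model output, tolerating code fences or extra prose."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to the span from the first '[' to the last ']' in the text.
        match = re.search(r"\[.*\]", raw, re.DOTALL)
        if match is None:
            raise ValueError("no JSON array found in model output")
        parsed = json.loads(match.group(0))
    if not isinstance(parsed, list):
        raise ValueError("expected a JSON array")
    return parsed


fence = "`" * 3  # built at runtime so this example can sit inside a fenced block
wrapped = f"Here are the labels:\n{fence}json\n" + '["POSITIVE", "NEUTRAL"]' + f"\n{fence}"
print(parse_json_array(wrapped))
```

The fallback is deliberately simple: it assumes the reply contains a single array and would need tightening if the surrounding prose could itself contain square brackets.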

Chain of Thought with Extended Thinking

Claude's thinking mode lets the model reason through complex problems before responding:

python
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # max tokens for internal reasoning
    },
    messages=[{
        "role": "user",
        "content": """A store sells apples for $0.50 each and oranges for $0.75 each.
        Alice buys 3x as many apples as oranges and spends exactly $9.00.
        How many of each fruit did she buy?""",
    }],
)

for block in response.content:
    if block.type == "thinking":
        print(f"[Thinking]\n{block.thinking}")
    elif block.type == "text":
        print(f"[Answer]\n{block.text}")
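
When testing a reasoning prompt, it helps to know the expected answer independently of the model. The word problem above reduces to 3n × $0.50 + n × $0.75 = $9.00 for n oranges, and a short brute-force check confirms the single solution. This sketch is only a verification aid, and the constant names are illustrative.

```python
# Work in cents to avoid floating-point rounding.
APPLE_CENTS, ORANGE_CENTS, TOTAL_CENTS = 50, 75, 900

# Try every orange count that could fit the budget; apples are always 3x oranges.
solutions = [
    (3 * oranges, oranges)
    for oranges in range(1, TOTAL_CENTS // ORANGE_CENTS + 1)
    if 3 * oranges * APPLE_CENTS + oranges * ORANGE_CENTS == TOTAL_CENTS
]
print(solutions)  # [(12, 4)] -> 12 apples and 4 oranges
```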

Tool Use (Function Calling)

Claude can call external tools and use the results in its responses:

python
import json

tools = [
    {
        "name": "get_stock_price",
        "description": "Get the current stock price for a ticker symbol",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {
                    "type": "string",
                    "description": "Stock ticker symbol (e.g. AAPL, GOOGL)",
                }
            },
            "required": ["ticker"],
        },
    },
    {
        "name": "search_news",
        "description": "Search for recent news articles about a topic",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "max_results": {"type": "integer", "default": 5},
            },
            "required": ["query"],
        },
    },
]


def run_tool(name: str, inputs: dict) -> str:
    """Execute the tool and return result as string."""
    if name == "get_stock_price":
        # In production, call a real stock API
        return json.dumps({"ticker": inputs["ticker"], "price": 185.32, "change": "+1.2%"})
    if name == "search_news":
        return json.dumps([{"title": f"News about {inputs['query']}", "url": "..."}])
    # Report unknown tools back to the model instead of silently returning None
    return json.dumps({"error": f"Unknown tool: {name}"})


def agentic_loop(user_message: str, max_turns: int = 10) -> str:
    """Run Claude in an agentic loop until it produces a final answer."""
    messages = [{"role": "user", "content": user_message}]

    # Cap the number of turns so a misbehaving conversation cannot loop forever
    for _ in range(max_turns):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )

        if response.stop_reason == "tool_use":
            # Claude wants to use tools: echo its turn back, then answer each call
            messages.append({"role": "assistant", "content": response.content})

            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    print(f"Calling tool: {block.name}({block.input})")
                    result = run_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })

            messages.append({"role": "user", "content": tool_results})
            continue

        # Any other stop reason (end_turn, max_tokens, ...) ends the loop
        return "".join(b.text for b in response.content if b.type == "text")

    raise RuntimeError(f"No final answer after {max_turns} turns")


# Run it
answer = agentic_loop("What's the latest news about Apple stock and its current price?")
print(answer)

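As the number of tools grows, an if/elif chain becomes hard to maintain, and a tool that raises will crash the whole loop. One option is a registry that maps tool names to functions and reports failures back to the model through the `is_error` field of the `tool_result` block. This is a sketch under assumptions: `TOOL_REGISTRY` and `build_tool_result` are hypothetical names, and the stock price is the same placeholder value used above.

```python
import json
from typing import Callable

# Hypothetical registry mapping tool names to plain Python functions.
TOOL_REGISTRY: dict[str, Callable[..., object]] = {
    "get_stock_price": lambda ticker: {"ticker": ticker, "price": 185.32},
}


def build_tool_result(tool_use_id: str, name: str, inputs: dict) -> dict:
    """Run one tool call and wrap the outcome as a tool_result content block."""
    handler = TOOL_REGISTRY.get(name)
    try:
        if handler is None:
            raise LookupError(f"unknown tool: {name}")
        content, is_error = json.dumps(handler(**inputs)), False
    except Exception as exc:  # tell the model what went wrong instead of crashing
        content, is_error = f"{type(exc).__name__}: {exc}", True
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": content,
        "is_error": is_error,
    }


print(build_tool_result("toolu_demo", "get_stock_price", {"ticker": "AAPL"}))
print(build_tool_result("toolu_demo", "get_weather", {"city": "Paris"}))
```

Returning the error as a result gives the model a chance to correct its arguments or pick another tool on the next turn.
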
Prompt Caching (Up to 90% Cost Reduction)

For prompts with large static context (system prompts, documents), use prompt caching:

python
from anthropic import Anthropic

client = Anthropic()

SONNET_INPUT_PRICE_PER_TOKEN = 3 / 1_000_000  # $3 per million input tokens

# Load your large document once
with open("large_codebase.txt") as f:
    codebase_content = f.read()  # Could be 100K+ tokens


def ask_about_code(question: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": "You are a code assistant. Answer questions about the provided codebase.",
            },
            {
                "type": "text",
                "text": f"<codebase>\n{codebase_content}\n</codebase>",
                "cache_control": {"type": "ephemeral"},  # Cache this large block
            },
        ],
        messages=[{"role": "user", "content": question}],
    )

    # First call: cache MISS, the block is written to the cache (billed slightly above base input)
    # Subsequent calls (within 5 min): cache HIT, cached tokens billed at about 10% of base input
    cache_hits = response.usage.cache_read_input_tokens or 0
    saved = cache_hits * SONNET_INPUT_PRICE_PER_TOKEN * 0.9  # 90% discount on cached tokens
    print(f"Cache hits: {cache_hits} tokens (saved ~${saved:.4f})")

    return response.content[0].text
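
The 90% figure applies to cached tokens on a hit, so the overall saving depends on how many calls reuse the prefix. The sketch below compares the cost of a static prefix with and without caching. It assumes a cache write is billed at 1.25x the base input price, a cache read at 0.1x, that every call lands within the cache lifetime, and that only the static prefix is counted. `cached_prefix_cost` and the constants are illustrative names, not SDK features.

```python
BASE_INPUT_PER_MTOK = 3.00     # USD; Sonnet input price used in this article
CACHE_WRITE_MULTIPLIER = 1.25  # assumed premium for writing a cache entry
CACHE_READ_MULTIPLIER = 0.10   # assumed discounted rate for reading a cache entry


def cached_prefix_cost(prefix_tokens: int, calls: int) -> tuple[float, float]:
    """Return (cost_without_cache, cost_with_cache) in USD for the static prefix only."""
    per_call = prefix_tokens * BASE_INPUT_PER_MTOK / 1_000_000
    without_cache = per_call * calls
    # One cache write on the first call, then cache reads on the remaining calls.
    with_cache = per_call * (CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (calls - 1))
    return without_cache, with_cache


plain, cached = cached_prefix_cost(prefix_tokens=100_000, calls=10)
print(f"without cache: ${plain:.3f}, with cache: ${cached:.3f}")
print(f"saving: {1 - cached / plain:.0%}")
```

Under these assumptions ten calls save a little under 80%, and the saving approaches 90% only as the number of cache hits grows. A single call costs more with caching than without, because of the write premium.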

Structured Output with JSON Mode

python
import json
from typing import Literal

from pydantic import BaseModel


class CodeReview(BaseModel):
    verdict: Literal["APPROVE", "REQUEST_CHANGES", "REJECT"]
    summary: str
    critical_issues: list[str]
    suggestions: list[str]
    score: int  # 1-10


def review_code(code: str) -> CodeReview:
    # Serialize the schema as real JSON rather than a Python dict repr
    schema = json.dumps(CodeReview.model_json_schema(), indent=2)

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system="You are a code reviewer. Always respond with valid JSON matching the schema.",
        messages=[{
            "role": "user",
            "content": f"""Review this code and respond with JSON matching this schema:
{schema}

Code:
```python
{code}
```""",
        }],
    )

    # Raises pydantic.ValidationError if the reply is not valid JSON for the schema
    return CodeReview.model_validate_json(response.content[0].text)


review = review_code("def add(a, b): return a + b")
print(f"Verdict: {review.verdict}, Score: {review.score}/10")
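
Prompted JSON is not guaranteed to validate, so production code usually needs a plan for bad replies. One common approach is to retry and include the validation error in the follow-up prompt. The sketch below shows that loop with the model call injected as a plain function, so it runs without network access. `generate_valid_review`, `validate_review`, and the canned replies are hypothetical, and the validation is a stdlib stand-in for the Pydantic model above.

```python
import json
from typing import Callable

REQUIRED_KEYS = {"verdict", "summary", "critical_issues", "suggestions", "score"}
ALLOWED_VERDICTS = {"APPROVE", "REQUEST_CHANGES", "REJECT"}


def validate_review(data: dict) -> None:
    """Raise ValueError if the payload does not look like a CodeReview."""
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["verdict"] not in ALLOWED_VERDICTS:
        raise ValueError(f"invalid verdict: {data['verdict']!r}")


def generate_valid_review(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call generate(prompt) until its output parses and validates, feeding errors back."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = generate(current_prompt)
        try:
            data = json.loads(raw)
            if not isinstance(data, dict):
                raise ValueError("expected a JSON object")
            validate_review(data)
            return data
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            current_prompt = (
                f"{prompt}\n\nYour previous reply was rejected ({exc}). Reply with JSON only."
            )
    raise RuntimeError(f"no valid review after {max_attempts} attempts")


# Stand-in for a model call: the first reply is not JSON, the second is valid.
canned_replies = iter([
    "Sure! Here is the review.",
    json.dumps({"verdict": "APPROVE", "summary": "Simple and correct.",
                "critical_issues": [], "suggestions": [], "score": 9}),
])
review_data = generate_valid_review(lambda prompt: next(canned_replies), "Review: def add(a, b): return a + b")
print(review_data["verdict"])
```

In real use, `generate` would wrap `client.messages.create` and return the text of the first content block.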

Streaming for Real-Time UX

python
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Write a blog post about AI in healthcare."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # Final message with usage stats
    final = stream.get_final_message()
    print(f"\n\nTotal tokens: {final.usage.input_tokens + final.usage.output_tokens}")

Production Best Practices

  1. Model selection: Haiku for classification/extraction, Sonnet for reasoning/generation, Opus for frontier tasks only
  2. Temperature: 0 for deterministic tasks (extraction, classification); 0.3–0.7 for creative tasks
  3. Max tokens: Set tight limits to cap runaway costs. max_tokens is a hard cutoff, not a length hint, so state the desired length in the prompt as well
  4. Retry logic: Implement exponential backoff for 429 (rate limit) and 529 (overloaded) errors
  5. Observability: Log every request with model, tokens, latency, and cost
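
The retry advice in item 4 can be sketched as a small wrapper. This is a minimal illustration under assumptions, not the SDK's own retry mechanism: `ApiStatusError` is a hypothetical stand-in for an error that carries an HTTP status code, and `with_backoff` is an illustrative name. The Anthropic client also retries some failures by itself, so a wrapper like this mainly matters when you need custom limits or logging.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")
RETRYABLE_STATUS = {429, 529}  # rate limited, overloaded


class ApiStatusError(Exception):
    """Hypothetical stand-in for an SDK error that carries an HTTP status code."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying retryable statuses with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except ApiStatusError as exc:
            if exc.status_code not in RETRYABLE_STATUS or attempt == max_retries:
                raise
            # 1s, 2s, 4s, ... plus up to 50% random jitter to avoid synchronized retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


attempts = {"count": 0}


def flaky_call() -> str:
    """Fails twice with 529, then succeeds (stands in for a real API request)."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ApiStatusError(529)
    return "ok"


delays: list[float] = []
result = with_backoff(flaky_call, sleep=delays.append)  # record delays instead of sleeping
print(result, len(delays))
```

Passing `sleep` as a parameter keeps the wrapper testable: the demo records the delays rather than waiting for them.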

Sumit Kumar Pandey

Full-Stack Developer

Full-Stack Developer with 5+ years of experience building scalable web applications. Passionate about clean code, performance optimization, and modern web technologies.
