Mastering Claude AI: Advanced Prompting, Tool Use, and Production Patterns

Why Claude?

Claude (built by Anthropic) is designed from the ground up for honesty, helpfulness, and harmlessness. It excels at long-context reasoning (200K token window), follows instructions with high fidelity, and produces reliable structured outputs. The model family spans Haiku (fast/cheap), Sonnet (balanced), and Opus (most capable).

┌───────────────────────────────────────────────────────┐
│              Claude Model Comparison (2025)            │
├────────────────┬──────────┬────────────┬──────────────┤
│  Model         │  Speed   │  Cost/MTok │  Best For    │
├────────────────┼──────────┼────────────┼──────────────┤
│  claude-opus-4-7   │  Slow    │  $15/$75   │  Complex reasoning│
│  claude-sonnet-4-6 │  Fast    │  $3/$15    │  Balanced tasks  │
│  claude-haiku-4-5  │  Fastest │  $0.25/$1.25│  High throughput │
└────────────────┴──────────┴────────────┴──────────────┘
  Cost = Input/Output per million tokens

Setting Up the Anthropic SDK

python
1pip install anthropic

python
1from anthropic import Anthropic
2
3client = Anthropic()  # Uses ANTHROPIC_API_KEY env var
4
5# Basic message
6response = client.messages.create(
7    model="claude-sonnet-4-6",
8    max_tokens=1024,
9    system="You are an expert software architect. Be precise and concise.",
10    messages=[
11        {"role": "user", "content": "Explain the CQRS pattern in 3 sentences."}
12    ]
13)
14print(response.content[0].text)
15print(f"Input tokens: {response.usage.input_tokens}")
16print(f"Output tokens: {response.usage.output_tokens}")

Prompt Engineering for Claude

System Prompt Architecture

Claude responds extremely well to structured system prompts:

python
1system_prompt = """You are a senior code reviewer at a top tech company.
2
3## Your Role
4- Review code for correctness, performance, security, and maintainability
5- Be direct and specific — cite line numbers when relevant
6- Prioritize: security bugs > logic errors > performance > style
7
8## Output Format
9Always respond with:
101. **Summary** (1-2 sentences overall assessment)
112. **Critical Issues** (security/bugs — must fix)
123. **Suggestions** (performance/style — nice to have)
134. **Verdict**: APPROVE / REQUEST_CHANGES / REJECT
14
15## Constraints
16- Never praise code just to be nice
17- If code is good, say so and explain why
18- Use markdown code blocks with language tags"""

Few-Shot Prompting

python
1def classify_sentiment(texts: list[str]) -> list[str]:
2    examples = [
3        {"text": "This product exceeded all my expectations!", "label": "POSITIVE"},
4        {"text": "Completely useless, waste of money.", "label": "NEGATIVE"},
5        {"text": "It arrived on time.", "label": "NEUTRAL"},
6    ]
7    
8    examples_text = "\n".join([
9        f'Text: "{e["text"]}" → {e["label"]}'
10        for e in examples
11    ])
12    
13    response = client.messages.create(
14        model="claude-haiku-4-5-20251001",
15        max_tokens=256,
16        messages=[{
17            "role": "user",
18            "content": f"""Classify sentiment as POSITIVE, NEGATIVE, or NEUTRAL.
19
20Examples:
21{examples_text}
22
23Now classify these (respond with JSON array):
24{chr(10).join(f'{i+1}. "{t}"' for i, t in enumerate(texts))}"""
25        }]
26    )
27    import json
28    return json.loads(response.content[0].text)

Chain of Thought with Extended Thinking

Claude's thinking mode lets the model reason through complex problems before responding:

python
1response = client.messages.create(
2    model="claude-opus-4-7",
3    max_tokens=16000,
4    thinking={
5        "type": "enabled",
6        "budget_tokens": 10000  # max tokens for internal reasoning
7    },
8    messages=[{
9        "role": "user",
10        "content": """A store sells apples for $0.50 each and oranges for $0.75 each.
11        Alice buys 3x as many apples as oranges and spends exactly $9.00.
12        How many of each fruit did she buy?"""
13    }]
14)
15
16for block in response.content:
17    if block.type == "thinking":
18        print(f"[Thinking]\n{block.thinking}")
19    elif block.type == "text":
20        print(f"[Answer]\n{block.text}")

Tool Use (Function Calling)

Claude can call external tools and use the results in its responses:

python
1import json
2from datetime import datetime
3
4tools = [
5    {
6        "name": "get_stock_price",
7        "description": "Get the current stock price for a ticker symbol",
8        "input_schema": {
9            "type": "object",
10            "properties": {
11                "ticker": {
12                    "type": "string",
13                    "description": "Stock ticker symbol (e.g. AAPL, GOOGL)"
14                }
15            },
16            "required": ["ticker"]
17        }
18    },
19    {
20        "name": "search_news",
21        "description": "Search for recent news articles about a topic",
22        "input_schema": {
23            "type": "object",
24            "properties": {
25                "query": {"type": "string"},
26                "max_results": {"type": "integer", "default": 5}
27            },
28            "required": ["query"]
29        }
30    }
31]
32
33def run_tool(name: str, inputs: dict) -> str:
34    """Execute the tool and return result as string."""
35    if name == "get_stock_price":
36        # In production, call a real stock API
37        return json.dumps({"ticker": inputs["ticker"], "price": 185.32, "change": "+1.2%"})
38    elif name == "search_news":
39        return json.dumps([{"title": f"News about {inputs['query']}", "url": "..."}])
40
41def agentic_loop(user_message: str) -> str:
42    """Run Claude in an agentic loop until it produces a final answer."""
43    messages = [{"role": "user", "content": user_message}]
44    
45    while True:
46        response = client.messages.create(
47            model="claude-sonnet-4-6",
48            max_tokens=4096,
49            tools=tools,
50            messages=messages
51        )
52        
53        if response.stop_reason == "end_turn":
54            # Claude is done — extract final text
55            return next(b.text for b in response.content if b.type == "text")
56        
57        if response.stop_reason == "tool_use":
58            # Claude wants to use tools
59            messages.append({"role": "assistant", "content": response.content})
60            
61            tool_results = []
62            for block in response.content:
63                if block.type == "tool_use":
64                    print(f"Calling tool: {block.name}({block.input})")
65                    result = run_tool(block.name, block.input)
66                    tool_results.append({
67                        "type": "tool_result",
68                        "tool_use_id": block.id,
69                        "content": result
70                    })
71            
72            messages.append({"role": "user", "content": tool_results})
73
74# Run it
75answer = agentic_loop("What's the latest news about Apple stock and its current price?")
76print(answer)

Prompt Caching (90% Cost Reduction)

For prompts with large static context (system prompts, documents), use prompt caching:

python
1from anthropic import Anthropic
2
3client = Anthropic()
4
5# Load your large document once
6with open("large_codebase.txt") as f:
7    codebase_content = f.read()  # Could be 100K+ tokens
8
9def ask_about_code(question: str) -> str:
10    response = client.messages.create(
11        model="claude-sonnet-4-6",
12        max_tokens=1024,
13        system=[
14            {
15                "type": "text",
16                "text": "You are a code assistant. Answer questions about the provided codebase."
17            },
18            {
19                "type": "text",
20                "text": f"<codebase>\n{codebase_content}\n</codebase>",
21                "cache_control": {"type": "ephemeral"}  # Cache this large block
22            }
23        ],
24        messages=[{"role": "user", "content": question}]
25    )
26    
27    # First call: cache MISS — pays full price
28    # Subsequent calls (within 5 min): cache HIT — 90% cheaper, 2x faster
29    cache_hits = response.usage.cache_read_input_tokens
30    print(f"Cache hits: {cache_hits} tokens (saved ~${cache_hits * 0.000003:.4f})")
31    
32    return response.content[0].text

Structured Output with JSON Mode

python
1from pydantic import BaseModel
2from typing import Literal
3
4class CodeReview(BaseModel):
5    verdict: Literal["APPROVE", "REQUEST_CHANGES", "REJECT"]
6    summary: str
7    critical_issues: list[str]
8    suggestions: list[str]
9    score: int  # 1-10
10
11def review_code(code: str) -> CodeReview:
12    response = client.messages.create(
13        model="claude-sonnet-4-6",
14        max_tokens=2048,
15        system="You are a code reviewer. Always respond with valid JSON matching the schema.",
16        messages=[{
17            "role": "user",
18            "content": f"""Review this code and respond with JSON matching this schema:
19{CodeReview.model_json_schema()}
20
21Code:
22```python
23{code}
24```"""
25        }]
26    )
27    
28    import json
29    data = json.loads(response.content[0].text)
30    return CodeReview(**data)
31
32review = review_code("def add(a, b): return a + b")
33print(f"Verdict: {review.verdict}, Score: {review.score}/10")

Streaming for Real-Time UX

python
1import sys
2
3with client.messages.stream(
4    model="claude-sonnet-4-6",
5    max_tokens=2048,
6    messages=[{"role": "user", "content": "Write a blog post about AI in healthcare."}]
7) as stream:
8    for text in stream.text_stream:
9        print(text, end="", flush=True)
10    
11    # Final message with usage stats
12    final = stream.get_final_message()
13    print(f"\n\nTotal tokens: {final.usage.input_tokens + final.usage.output_tokens}")

Production Best Practices

Model selection: Haiku for classification/extraction, Sonnet for reasoning/generation, Opus for frontier tasks only
Temperature: 0 for deterministic tasks (extraction, classification); 0.3–0.7 for creative tasks
Max tokens: Set tight limits — avoids runaway costs and signals expected response length
Retry logic: Implement exponential backoff for 529 (overloaded) and 529 errors
Observability: Log every request with model, tokens, latency, and cost

Mastering Claude AI: Advanced Prompting, Tool Use, and Production Patterns

Mastering Claude AI: Advanced Prompting, Tool Use, and Production Patterns

Why Claude?

Setting Up the Anthropic SDK

Prompt Engineering for Claude

System Prompt Architecture

Few-Shot Prompting

Chain of Thought with Extended Thinking

Tool Use (Function Calling)

Prompt Caching (90% Cost Reduction)

Structured Output with JSON Mode

Streaming for Real-Time UX

Production Best Practices

Sumit Kumar Pandey

Share this article

Discussion (0)