Agentic Development Skill
Load with: base.md + llm-patterns.md + [language].md
For building autonomous AI agents that perform multi-step tasks with tools.
Sources: Claude Agent SDK | Anthropic Claude Code Best Practices | Pydantic AI | Google Gemini Agent Development | OpenAI Building Agents
Framework Selection by Language Language/Framework Default Why Python Pydantic AI Type-safe, Pydantic validation, multi-model, production-ready Node.js / Next.js Claude Agent SDK Official Anthropic SDK, tools, multi-agent, native streaming Python: Pydantic AI (Default) from pydantic_ai import Agent from pydantic import BaseModel
class SearchResult(BaseModel): title: str url: str summary: str
agent = Agent( 'claude-sonnet-4-20250514', result_type=list[SearchResult], system_prompt='You are a research assistant.', )
Type-safe result
result = await agent.run('Find articles about AI agents') for item in result.data: print(f"{item.title}: {item.url}")
Node.js / Next.js: Claude Agent SDK (Default) import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
// Define tools const tools: Anthropic.Tool[] = [ { name: "web_search", description: "Search the web for information", input_schema: { type: "object", properties: { query: { type: "string", description: "Search query" }, }, required: ["query"], }, }, ];
// Agentic loop async function runAgent(prompt: string) { const messages: Anthropic.MessageParam[] = [ { role: "user", content: prompt }, ];
while (true) { const response = await client.messages.create({ model: "claude-sonnet-4-20250514", max_tokens: 4096, tools, messages, });
// Check for tool use
if (response.stop_reason === "tool_use") {
const toolUse = response.content.find((b) => b.type === "tool_use");
if (toolUse) {
const result = await executeTool(toolUse.name, toolUse.input);
messages.push({ role: "assistant", content: response.content });
messages.push({
role: "user",
content: [{ type: "tool_result", tool_use_id: toolUse.id, content: result }],
});
continue;
}
}
// Done - return final response
return response.content.find((b) => b.type === "text")?.text;
} }
Core Principle
Plan first, act incrementally, verify always.
Agents that research and plan before executing consistently outperform those that jump straight to action. Break complex tasks into verifiable steps, use tools judiciously, and maintain clear state throughout execution.
Agent Architecture Three Components (OpenAI) ┌─────────────────────────────────────────────────┐ │ AGENT │ ├─────────────────────────────────────────────────┤ │ Model (Brain) │ LLM for reasoning & │ │ │ decision-making │ ├─────────────────────┼───────────────────────────┤ │ Tools (Arms/Legs) │ APIs, functions, external │ │ │ systems for action │ ├─────────────────────┼───────────────────────────┤ │ Instructions │ System prompts defining │ │ (Rules) │ behavior & boundaries │ └─────────────────────┴───────────────────────────┘
Project Structure project/ ├── src/ │ ├── agents/ │ │ ├── orchestrator.ts # Main agent coordinator │ │ ├── specialized/ # Task-specific agents │ │ │ ├── researcher.ts │ │ │ ├── coder.ts │ │ │ └── reviewer.ts │ │ └── base.ts # Shared agent interface │ ├── tools/ │ │ ├── definitions/ # Tool schemas │ │ ├── implementations/ # Tool logic │ │ └── registry.ts # Tool discovery │ ├── prompts/ │ │ ├── system/ # Agent instructions │ │ └── templates/ # Task templates │ └── memory/ │ ├── conversation.ts # Short-term context │ └── persistent.ts # Long-term storage ├── tests/ │ ├── agents/ # Agent behavior tests │ ├── tools/ # Tool unit tests │ └── evals/ # End-to-end evaluations └── skills/ # Agent skills (Anthropic pattern) ├── skill-name/ │ ├── instructions.md │ ├── scripts/ │ └── resources/
Workflow Pattern: Explore-Plan-Execute-Verify
1. Explore Phase
// Gather context before acting
async function explore(task: Task): Promise
return { relevantFiles, existingPatterns, dependencies }; }
-
Plan Phase (Critical) // Plan explicitly before execution async function plan(task: Task, context: Context): Promise
{ const prompt = ` Task: ${task.description} Context: ${JSON.stringify(context)} Create a step-by-step plan. For each step: 1. What action to take 2. What tools to use 3. How to verify success 4. What could go wrong
Output JSON with steps array. `;
return await llmCall({ prompt, schema: PlanSchema }); }
- Execute Phase
// Execute with verification at each step
async function execute(plan: Plan): Promise
{ const results: Result[] = [];
for (const step of plan.steps) { // Execute single step const result = await executeStep(step);
// Verify before continuing
if (!await verify(step, result)) {
// Self-correct or escalate
const corrected = await selfCorrect(step, result);
if (!corrected.success) {
return handleFailure(step, results);
}
}
results.push(result);
}
return results; }
- Verify Phase
// Independent verification prevents overfitting
async function verify(step: Step, result: Result): Promise
{ // Run tests if available if (step.testCommand) { const testResult = await runCommand(step.testCommand); if (!testResult.success) return false; }
// Use LLM to verify against criteria const verification = await llmCall({ prompt: ` Step: ${step.description} Expected: ${step.successCriteria} Actual: ${JSON.stringify(result)}
Does the result satisfy the success criteria?
Respond with { "passes": boolean, "reasoning": string }
`,
schema: VerificationSchema
});
return verification.passes; }
Tool Design Tool Definition Pattern // tools/definitions/file-operations.ts import { z } from 'zod';
export const ReadFileTool = { name: 'read_file', description: 'Read contents of a file. Use before modifying any file.', parameters: z.object({ path: z.string().describe('Absolute path to the file'), startLine: z.number().optional().describe('Start line (1-indexed)'), endLine: z.number().optional().describe('End line (1-indexed)'), }), // Risk level for guardrails (OpenAI pattern) riskLevel: 'low' as const, };
export const WriteFileTool = { name: 'write_file', description: 'Write content to a file. Always read first to understand context.', parameters: z.object({ path: z.string().describe('Absolute path to the file'), content: z.string().describe('Complete file content'), }), riskLevel: 'medium' as const, // Require confirmation for high-risk operations requiresConfirmation: true, };
Tool Implementation
// tools/implementations/file-operations.ts
export async function readFile(
params: z.infer
const start = (params.startLine ?? 1) - 1;
const end = params.endLine ?? lines.length;
return {
success: true,
data: lines.slice(start, end).join('\n'),
metadata: { totalLines: lines.length }
};
} catch (error) {
return {
success: false,
error: Failed to read file: ${error.message}
};
}
}
Prefer Built-in Tools (OpenAI) // Use platform-provided tools when available const agent = createAgent({ tools: [ // Built-in tools (handled by platform) { type: 'web_search' }, { type: 'code_interpreter' },
// Custom tools only when needed
{ type: 'function', function: customDatabaseTool },
], });
Multi-Agent Patterns Single Agent (Default)
Use one agent for most tasks. Multiple agents add complexity.
Agent-as-Tool Pattern (OpenAI) // Expose specialized agents as callable tools const researchAgent = createAgent({ name: 'researcher', instructions: 'You research topics and return structured findings.', tools: [webSearchTool, documentReadTool], });
const mainAgent = createAgent({ tools: [ { type: 'function', function: { name: 'research_topic', description: 'Delegate research to specialized agent', parameters: ResearchQuerySchema, handler: async (query) => researchAgent.run(query), }, }, ], });
Handoff Pattern (OpenAI) // One-way transfer between agents const customerServiceAgent = createAgent({ tools: [ // Handoff to specialist when needed { name: 'transfer_to_billing', description: 'Transfer to billing specialist for payment issues', handler: async (context) => { return { handoff: 'billing_agent', context }; }, }, ], });
When to Use Multiple Agents Separate task domains with non-overlapping tools Different authorization levels needed Complex workflows with clear handoff points Parallel execution of independent subtasks Memory & State Conversation Memory // memory/conversation.ts interface ConversationMemory { messages: Message[]; maxTokens: number;
add(message: Message): void;
getContext(): Message[];
summarize(): Promise
// Maintain state across tool calls (Gemini pattern) interface AgentState { thoughtSignature?: string; // Encrypted reasoning state conversationId: string; // For shared memory currentPlan?: Plan; completedSteps: Step[]; }
Persistent Memory
// memory/persistent.ts
interface PersistentMemory {
// Store learnings across sessions
store(key: string, value: any): Promise
// Semantic search over past interactions
search(query: string, limit: number): Promise
Guardrails & Safety
Multi-Layer Protection (OpenAI)
// guards/index.ts
interface GuardrailConfig {
// Input validation
inputClassifier: (input: string) => Promise
// Output validation
outputValidator: (output: string) => Promise
// Tool risk assessment
toolRiskLevels: Record
// Actions requiring human approval humanInTheLoop: string[]; }
async function executeWithGuardrails(
agent: Agent,
input: string,
config: GuardrailConfig
): Promise
// 2. Execute with tool monitoring const result = await agent.run(input, { beforeTool: async (tool, params) => { const risk = config.toolRiskLevels[tool.name]; if (risk === 'high' || config.humanInTheLoop.includes(tool.name)) { return await requestHumanApproval(tool, params); } return { approved: true }; }, });
// 3. Validate output const outputCheck = await config.outputValidator(result.output); if (!outputCheck.safe) { return { blocked: true, reason: outputCheck.reason }; }
return result; }
Scope Enforcement (OpenAI) // Agent must stay within defined scope const agentInstructions = ` You are a customer service agent for Acme Corp.
SCOPE BOUNDARIES (non-negotiable): - Only answer questions about Acme products and services - Never provide legal, medical, or financial advice - Never access or modify data outside your authorized scope - If a request is out of scope, politely decline and explain why
If you cannot complete a task within scope, notify the user and request explicit approval before proceeding. `;
Model Selection Match Model to Task Task Complexity Recommended Model Notes Simple, fast gpt-5-mini, claude-haiku Low latency General purpose gpt-4.1, claude-sonnet Balance Complex reasoning o4-mini, claude-opus Higher accuracy Deep planning gpt-5 + reasoning, ultrathink Maximum capability Gemini-Specific // Use thinking_level for reasoning depth const response = await gemini.generate({ model: 'gemini-3', thinking_level: 'high', // For complex planning temperature: 1.0, // Optimized for reasoning engine });
// Preserve thought state across tool calls const nextResponse = await gemini.generate({ thoughtSignature: response.thoughtSignature, // Required for function calling // ... rest of params });
Claude-Specific (Thinking Modes) // Trigger extended thinking with keywords const thinkingLevels = { 'think': 'standard analysis', 'think hard': 'deeper reasoning', 'think harder': 'extensive analysis', 'ultrathink': 'maximum reasoning budget', };
const prompt = ` Think hard about this problem before proposing a solution.
Task: ${task.description} `;
Testing Agents Unit Tests (Tools) describe('readFile tool', () => { it('reads file content correctly', async () => { const result = await readFile({ path: '/test/file.txt' }); expect(result.success).toBe(true); expect(result.data).toContain('expected content'); }); });
Behavior Tests (Agent Decisions) describe('agent planning', () => { it('creates plan before executing file modifications', async () => { const trace = await agent.runWithTrace('Refactor the auth module');
// Verify planning happened first
const firstToolCall = trace.toolCalls[0];
expect(firstToolCall.name).toBe('read_file');
// Verify no writes without reads
const writeIndex = trace.toolCalls.findIndex(t => t.name === 'write_file');
const readIndex = trace.toolCalls.findIndex(t => t.name === 'read_file');
expect(readIndex).toBeLessThan(writeIndex);
}); });
Evaluation Tests // Run nightly, not in regular CI describe('Agent Accuracy (Eval)', () => { const testCases = loadTestCases('./evals/coding-tasks.json');
it.each(testCases)('completes $name correctly', async (testCase) => { const result = await agent.run(testCase.input);
// Verify against expected outcomes
expect(result.filesModified).toEqual(testCase.expectedFiles);
expect(await runTests(testCase.testCommand)).toBe(true);
}, 120000); });
Pydantic AI Patterns (Python Default) Project Structure (Python) project/ ├── src/ │ ├── agents/ │ │ ├── init.py │ │ ├── researcher.py # Research agent │ │ ├── coder.py # Coding agent │ │ └── orchestrator.py # Main coordinator │ ├── tools/ │ │ ├── init.py │ │ ├── web.py # Web search tools │ │ ├── files.py # File operations │ │ └── database.py # DB queries │ ├── models/ │ │ ├── init.py │ │ └── schemas.py # Pydantic models │ └── deps.py # Dependencies ├── tests/ │ ├── test_agents.py │ └── test_tools.py └── pyproject.toml
Agent with Tools from pydantic_ai import Agent, RunContext from pydantic import BaseModel from httpx import AsyncClient
class SearchResult(BaseModel): title: str url: str snippet: str
class ResearchDeps(BaseModel): http_client: AsyncClient api_key: str
research_agent = Agent( 'claude-sonnet-4-20250514', deps_type=ResearchDeps, result_type=list[SearchResult], system_prompt='You are a research assistant. Use tools to find information.', )
@research_agent.tool async def web_search(ctx: RunContext[ResearchDeps], query: str) -> list[dict]: """Search the web for information.""" response = await ctx.deps.http_client.get( 'https://api.search.com/search', params={'q': query}, headers={'Authorization': f'Bearer {ctx.deps.api_key}'}, ) return response.json()['results']
@research_agent.tool async def read_webpage(ctx: RunContext[ResearchDeps], url: str) -> str: """Read and extract content from a webpage.""" response = await ctx.deps.http_client.get(url) return response.text[:5000] # Truncate for context
Usage
async def main(): async with AsyncClient() as client: deps = ResearchDeps(http_client=client, api_key='...') result = await research_agent.run( 'Find recent articles about LLM agents', deps=deps, ) for item in result.data: print(f"- {item.title}")
Structured Output with Validation from pydantic import BaseModel, Field from pydantic_ai import Agent
class CodeReview(BaseModel): summary: str = Field(description="Brief summary of the review") issues: list[str] = Field(description="List of issues found") suggestions: list[str] = Field(description="Improvement suggestions") approval: bool = Field(description="Whether code is approved") confidence: float = Field(ge=0, le=1, description="Confidence score")
review_agent = Agent( 'claude-sonnet-4-20250514', result_type=CodeReview, system_prompt='Review code for quality, security, and best practices.', )
Result is validated Pydantic model
result = await review_agent.run(f"Review this code:\npython\n{code}\n")
if result.data.approval:
print("Code approved!")
else:
for issue in result.data.issues:
print(f"Issue: {issue}")
Multi-Agent Coordination from pydantic_ai import Agent
Specialized agents
planner = Agent('claude-sonnet-4-20250514', system_prompt='Create detailed plans.') executor = Agent('claude-sonnet-4-20250514', system_prompt='Execute tasks precisely.') reviewer = Agent('claude-sonnet-4-20250514', system_prompt='Review and verify work.')
async def orchestrate(task: str): # 1. Plan plan = await planner.run(f"Create a plan for: {task}")
# 2. Execute each step
results = []
for step in plan.data.steps:
result = await executor.run(f"Execute: {step}")
results.append(result.data)
# 3. Review
review = await reviewer.run(
f"Review the results:\nTask: {task}\nResults: {results}"
)
return review.data
Streaming Responses from pydantic_ai import Agent
agent = Agent('claude-sonnet-4-20250514')
async def stream_response(prompt: str): async with agent.run_stream(prompt) as response: async for chunk in response.stream(): print(chunk, end='', flush=True)
# Get final structured result
result = await response.get_data()
return result
Testing Agents import pytest from pydantic_ai import Agent from pydantic_ai.models.test import TestModel
@pytest.fixture def test_agent(): return Agent( TestModel(), # Mock model for testing result_type=str, )
async def test_agent_response(test_agent): result = await test_agent.run('Test prompt') assert result.data is not None
Test with specific responses
async def test_with_mock_response(): model = TestModel() model.seed_response('Expected output')
agent = Agent(model)
result = await agent.run('Any prompt')
assert result.data == 'Expected output'
Skills Pattern (Anthropic) Skill Structure skills/ └── code-review/ ├── instructions.md # How to perform code reviews ├── scripts/ │ └── run-linters.sh # Supporting scripts └── resources/ └── checklist.md # Review checklist
instructions.md Example
Code Review Skill
When to Use
Activate this skill when asked to review code, PRs, or diffs.
Process
- Read the changed files completely
- Run linters:
./scripts/run-linters.sh - Check against resources/checklist.md
- Provide structured feedback
Output Format
- Summary (1-2 sentences)
- Issues found (severity: critical/major/minor)
- Suggestions for improvement
- Approval recommendation
Loading Skills Dynamically
async function loadSkill(skillName: string): Promise./skills/${skillName};
const instructions = await fs.readFile(${skillPath}/instructions.md, 'utf-8');
const scripts = await glob(${skillPath}/scripts/*);
const resources = await glob(${skillPath}/resources/*);
return { name: skillName, instructions, scripts: scripts.map(s => ({ name: path.basename(s), path: s })), resources: await Promise.all(resources.map(loadResource)), }; }
Anti-Patterns No planning before execution - Agents that jump to action make more errors Monolithic agents - One agent with 50 tools becomes confused No verification - Agents must verify their own work Hardcoded tool sequences - Let the model decide tool order Missing guardrails - All agents need safety boundaries No state management - Lose context across tool calls Testing only happy paths - Test failures and edge cases Ignoring model differences - Reasoning models need different prompts No cost tracking - Agentic workflows can be expensive Full automation without oversight - Human-in-the-loop for critical actions Quick Reference Agent Development Checklist Define clear agent scope and boundaries Design tools with explicit schemas and risk levels Implement explore-plan-execute-verify workflow Add multi-layer guardrails Set up conversation and persistent memory Write behavior and evaluation tests Configure appropriate model for task complexity Add human-in-the-loop for high-risk operations Monitor token usage and costs Document skills and instructions Thinking Triggers (Claude) "think" → Standard analysis "think hard" → Deeper reasoning "think harder" → Extensive analysis "ultrathink" → Maximum reasoning
Gemini Settings
thinking_level: "high" | "low"
temperature: 1.0 (keep at 1.0 for reasoning)
thoughtSignature: