Building AI Agents from Scratch

A practical guide to creating production-ready AI agents with real code patterns for architecture, tools, memory, and safety.

What Makes an AI Agent Different from a Chatbot

Chatbots answer questions. Agents complete tasks. This fundamental distinction shapes everything about how you build them.

The Key Differences

Aspect	Chatbot	AI Agent
Response Type	Information only	Actions + information
Interaction	Single-turn queries	Multi-step workflows
Initiative	Reactive (user-driven)	Proactive (can initiate actions)
Tool Use	None	API calls, database queries, integrations

While chatbots excel at providing information in response to user queries, AI agents take this further by actually accomplishing work. They can check your calendar, update your CRM, place orders, and complete complex workflows--all while maintaining context and making decisions about the best path forward.

For teams building multi-agent systems, understanding these differences is critical. See our guide on multi-agent systems design for patterns that coordinate multiple agents working together.

In this guide, you'll learn:

Core agent architecture and the agent decision loop
How to integrate tools so agents can take real actions
Memory systems that maintain context across conversations
Error handling and safety guardrails for production systems
Testing strategies that catch issues before deployment

Core Agent Architecture

Every AI agent follows a fundamental loop. Understanding this pattern is essential before adding complexity.

The Agent Decision Loop

User Request →
  Planning (break into steps) →
  For each step:
    Select tool →
    Execute tool →
    Evaluate result →
    Continue or complete →
  Final response/action

The agent receives input, reasons about what needs to happen, selects appropriate tools, executes actions, evaluates results, and either continues or completes. This cycle repeats until the task is done.
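The loop above can be sketched in TypeScript roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: `Decision`, `ToolFn`, `StepRecord`, `runAgent`, and the `maxSteps` cap are hypothetical names, and `decideNextStep` stands in for a real LLM call.

```typescript
// Hypothetical types for illustration; not taken from any specific SDK.
type Decision =
  | { kind: "call_tool"; tool: string; args: Record<string, unknown> }
  | { kind: "finish"; answer: string };

type ToolFn = (args: Record<string, unknown>) => Promise<unknown>;

interface StepRecord {
  tool: string;
  args: Record<string, unknown>;
  result: unknown;
}

// decideNextStep stands in for the LLM: given the request and the steps
// taken so far, it either picks another tool or finishes.
async function runAgent(
  request: string,
  tools: Map<string, ToolFn>,
  decideNextStep: (request: string, history: StepRecord[]) => Promise<Decision>,
  maxSteps = 10
): Promise<string> {
  const history: StepRecord[] = [];

  for (let step = 0; step < maxSteps; step++) {
    const decision = await decideNextStep(request, history);
    if (decision.kind === "finish") {
      return decision.answer;
    }

    const tool = tools.get(decision.tool);
    if (!tool) {
      // Record the failure so the next decision can react to it.
      history.push({ tool: decision.tool, args: decision.args, result: { error: "unknown tool" } });
      continue;
    }

    let result: unknown;
    try {
      result = await tool(decision.args);
    } catch (err) {
      result = { error: err instanceof Error ? err.message : String(err) };
    }
    history.push({ tool: decision.tool, args: decision.args, result });
  }

  // The step cap guards against a loop that never decides to finish.
  return "Stopped: step limit reached before the task completed.";
}
```

Tool failures are written into the history rather than thrown, so the next decision can retry, pick a different tool, or finish with an explanation.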

For advanced orchestration patterns that manage complex workflows across multiple agents, explore our detailed guide on agent orchestration patterns.

Core Components

1. Planning Module: Breaks complex requests into manageable steps. For "book a flight to NYC," the planner might create: search flights, check prices, present options, confirm booking, send confirmation.

2. Tool Selector: Decides which tool (or combination of tools) can accomplish each step. Each tool has a description that helps the agent understand its capabilities.

3. Execution Engine: Runs the selected tool with appropriate parameters and handles the response. Includes retry logic and error handling.

4. Evaluation Logic: Determines whether the step succeeded, failed, or needs clarification. Routes to next step, retry, or escalation.
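One way the execution engine's retry logic might look is sketched below. The names `StepResult`, `RetryOptions`, and `executeWithRetry` are assumptions made for this example, and exponential backoff is one common choice rather than something the components above prescribe.

```typescript
// Hypothetical result shape so the evaluation step can branch on outcome.
type StepResult<T> =
  | { status: "success"; data: T; attempts: number }
  | { status: "failed"; error: string; attempts: number };

interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  // Injected so tests can skip real waiting; defaults to setTimeout.
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function executeWithRetry<T>(
  run: () => Promise<T>,
  { maxAttempts, baseDelayMs, sleep = defaultSleep }: RetryOptions
): Promise<StepResult<T>> {
  let lastError = "no attempts made";

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const data = await run();
      return { status: "success", data, attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      if (attempt < maxAttempts) {
        // Exponential backoff: baseDelayMs, then 2x, 4x, ...
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }

  return { status: "failed", error: lastError, attempts: maxAttempts };
}
```

Returning a tagged result instead of throwing lets the evaluation logic route a `failed` status to retry, clarification, or escalation without a try/catch at every call site.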

System Prompt Structure

Your system prompt defines the agent's personality, boundaries, and capabilities:

You are a customer service agent for ACME Corp. You help customers with:
- Order status inquiries
- Return and refund requests
- Product recommendations
- Technical support questions

You MUST:
- Verify customer identity before sharing personal information
- Ask for clarification when requests are ambiguous
- Escalate complex issues to human agents
- Never make up information--say "I don't know" when unsure

You CANNOT:
- Process payments directly (use the payment tool)
- Access other customers' accounts
- Make promises about shipping times

Tool Integration: Giving Agents Superpowers

Tools turn agents from conversational interfaces into systems that act. Each well-designed tool gives the agent a controlled way to call one external system.

Understanding Function Calling

Modern LLMs support function calling natively. You describe a tool's name, parameters, and purpose--the model decides when to use it:

const searchTool = {
  name: "search_knowledge_base",
  description: "Search company documentation and FAQs for answers.",
  parameters: {
    type: "object",
    properties: {
      query: {
        type: "string",
        description: "The search query, at least 3 characters"
      },
      category: {
        type: "string",
        enum: ["products", "policies", "technical", "billing"],
        description: "Optional category filter"
      }
    },
    required: ["query"]
  }
};

For a deep dive into LLM tool use patterns and function calling best practices, see our comprehensive guide on LLM tool use and function calling.

Building Custom Tools

A production-ready tool includes these elements:

1. Clear Name and Description: The description is critical--it teaches the agent when to use your tool. Be specific about capabilities and limitations.

2. Input Validation: Validate parameters before execution. Return clear error messages if inputs are invalid.

3. Error Handling: Catch exceptions, log errors, and return structured responses the agent can interpret.

4. Response Formatting: Return consistent, parseable responses. Include status, data, and any relevant metadata.
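The four elements above could come together as in the sketch below, which wraps the search_knowledge_base tool from earlier. `ToolResult`, `SearchArgs`, `validateSearchArgs`, and the injected `backend` function are hypothetical names chosen for this example, and logging is left out to keep it short.

```typescript
// Hypothetical structured result every tool returns, so the agent can
// tell success from failure without parsing free text.
type ToolResult<T> =
  | { status: "ok"; data: T }
  | { status: "error"; code: "invalid_input" | "execution_failed"; message: string };

interface SearchArgs {
  query: string;
  category?: "products" | "policies" | "technical" | "billing";
}

const CATEGORIES: string[] = ["products", "policies", "technical", "billing"];

// Returns a validation error message, or null when the input is acceptable.
function validateSearchArgs(input: unknown): string | null {
  if (typeof input !== "object" || input === null) {
    return "arguments must be an object";
  }
  const { query, category } = input as Record<string, unknown>;
  if (typeof query !== "string" || query.trim().length < 3) {
    return "query must be a string of at least 3 characters";
  }
  if (category !== undefined && (typeof category !== "string" || CATEGORIES.indexOf(category) === -1)) {
    return "category must be one of: " + CATEGORIES.join(", ");
  }
  return null;
}

// backend is a stand-in for the real search service.
async function searchKnowledgeBase(
  input: unknown,
  backend: (args: SearchArgs) => Promise<string[]>
): Promise<ToolResult<string[]>> {
  const problem = validateSearchArgs(input);
  if (problem) {
    return { status: "error", code: "invalid_input", message: problem };
  }

  try {
    const data = await backend(input as SearchArgs);
    return { status: "ok", data };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { status: "error", code: "execution_failed", message };
  }
}
```

Because model-generated arguments arrive as untyped data, the function accepts `unknown` and only treats the input as `SearchArgs` after validation passes.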

Common Tool Categories

Category	Examples	Use Case
Web Search	Google, Bing, DuckDuckGo	Research, current information
Database	PostgreSQL, MongoDB, Redis	Customer data, session state
APIs	CRM, ERP, payment systems	Business process integration
File Operations	Read, write, upload	Document processing, reports
Calendar	Google Calendar, Outlook	Scheduling, availability
Email	SendGrid, SMTP	Notifications, communications

Tool Registration Pattern

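// Placeholder interface: adapt it to the LLM client you actually use.
interface LLMClient {
  complete(prompt: string): Promise<{ toolName: string }>;
}

class Agent {
  private tools: Map<string, Tool> = new Map();
  private llm: LLMClient;

  constructor(llm: LLMClient) {
    this.llm = llm;
  }

  registerTool(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }

  async selectTool(request: string): Promise<Tool> {
    const descriptions = Array.from(this.tools.values())
      .map(t => `${t.name}: ${t.description}`)
      .join('\n');

    const prompt = `Given this request: "${request}"
      Choose the best tool from these options:
      ${descriptions}`;

    const response = await this.llm.complete(prompt);
    const tool = this.tools.get(response.toolName);
    if (!tool) {
      // The model can name a tool that was never registered.
      throw new Error(`Unknown tool: ${response.toolName}`);
    }
    return tool;
  }
}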

Memory Systems: Maintaining Context

Without memory, every conversation starts fresh. With the right memory systems, agents carry context across turns and sessions and can build on what earlier interactions established.

Memory Types

1. Conversation History (Short-term): The immediate chat history. Essential for coherent conversations but limited by the model's context window.

interface ConversationMemory {
  messages: Message[];
  maxMessages: number;

  add(message: Message): void;
  getRecent(count: number): Message[];
  summarizeOlder(): string;
}

2. Session State (Medium-term): User preferences, current task status, and temporary data that persists across conversation turns.

3. Long-term Knowledge (Vector Storage): RAG systems that let agents query your documentation, knowledge base, or company data.

For advanced memory management techniques and context optimization strategies, explore our detailed guide on agent memory and context management.

Context Management Strategies

Sliding Window: Keep the most recent N messages, drop older ones. Simple but loses long context.

Summarization: Periodically summarize older messages into condensed notes. Preserves key information.

Hybrid Approach: Keep recent messages verbatim, summarize history, maintain vector store for retrieval.
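A rough sketch of the window-plus-summary part of the hybrid approach follows. `HybridMemory` and `Summarizer` are hypothetical names, the summarizer is synchronous only to keep the example short (in practice it would likely be an LLM call), and the vector store is omitted.

```typescript
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

// Hypothetical summarizer signature: merges dropped messages into the
// running summary and returns the new summary text.
type Summarizer = (previousSummary: string, dropped: Message[]) => string;

class HybridMemory {
  private recent: Message[] = [];
  private summary = "";
  private maxRecent: number;
  private summarize: Summarizer;

  constructor(maxRecent: number, summarize: Summarizer) {
    this.maxRecent = maxRecent;
    this.summarize = summarize;
  }

  add(message: Message): void {
    this.recent.push(message);
    if (this.recent.length > this.maxRecent) {
      // Fold everything that falls outside the window into the summary.
      const overflow = this.recent.splice(0, this.recent.length - this.maxRecent);
      this.summary = this.summarize(this.summary, overflow);
    }
  }

  // Messages to send to the model: one summary note, then the verbatim window.
  buildContext(): Message[] {
    const context: Message[] = [];
    if (this.summary) {
      context.push({
        role: "system",
        content: "Summary of earlier conversation: " + this.summary,
      });
    }
    return context.concat(this.recent);
  }
}
```

Setting `maxRecent` trades prompt size against fidelity: a larger window keeps more messages verbatim, while a smaller one pushes more of the conversation into the summary.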

Implementing RAG

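// Placeholder types: adapt them to your embedding model and vector database client.
type Embedder = (text: string) => Promise<number[]>;
interface VectorStore {
  search(params: { query: number[]; filter: { userId: string }; limit: number }): Promise<{ content: string }[]>;
}

class KnowledgeRetrieval {
  private embed: Embedder;
  private vectorStore: VectorStore;

  constructor(embed: Embedder, vectorStore: VectorStore) {
    this.embed = embed;
    this.vectorStore = vectorStore;
  }

  async search(query: string, userId: string): Promise<string> {
    // Convert query to vector
    const queryVector = await this.embed(query);

    // Search vector database, restricted to this user's documents
    const results = await this.vectorStore.search({
      query: queryVector,
      filter: { userId },
      limit: 5
    });

    // Format for inclusion in prompt
    return results.map(r => r.content).join('\n\n---\n\n');
  }
}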

RAG Best Practices

Chunk documents to 500-1000 tokens
Include source attribution in responses
Filter by user permissions
Track retrieval relevance for improvement
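The chunking and attribution practices above might be implemented along these lines. This sketch approximates tokens with whitespace-separated words, which is an assumption (real tokenizers count differently), and `Chunk`, `chunkDocument`, and the overlap parameter are hypothetical additions for illustration.

```typescript
interface Chunk {
  source: string; // kept so responses can attribute the document
  index: number;
  content: string;
}

// Approximates tokens with whitespace-separated words. Real tokenizers
// count differently, so treat maxTokens as a rough budget.
function chunkDocument(
  source: string,
  text: string,
  maxTokens = 800,
  overlap = 50
): Chunk[] {
  if (overlap < 0 || overlap >= maxTokens) {
    throw new Error("overlap must be >= 0 and smaller than maxTokens");
  }

  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: Chunk[] = [];
  const stride = maxTokens - overlap;

  for (let start = 0; start < words.length; start += stride) {
    const slice = words.slice(start, start + maxTokens);
    chunks.push({ source, index: chunks.length, content: slice.join(" ") });
    if (start + maxTokens >= words.length) break; // this chunk reached the end
  }
  return chunks;
}
```

The overlap repeats the last few words of each chunk at the start of the next, so a sentence that straddles a boundary still appears intact in at least one chunk.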

Error Handling and Safety

Agents that fail badly can harm users and businesses. Robust error handling and safety guardrails are non-negotiable for production systems.

Confidence-Based Routing

interface AgentResponse {
  content: string;
  confidence: number;
  alternatives?: string[];
  needsClarification?: boolean;
}

function routeResponse(response: AgentResponse) {
  if (response.confidence >= 0.9) {
    return { action: 'respond', content: response.content };
  }
  if (response.confidence >= 0.7) {
    return {
      action: 'respond_with_confidence',
      content: response.content,
      confidenceNote: 'I believe this is correct...'
    };
  }
  if (response.confidence >= 0.5) {
    return {
      action: 'ask_confirmation',
      content: response.content,
      prompt: 'Does this look right?'
    };
  }
  return { action: 'escalate' };
}

For debugging techniques and troubleshooting production agent issues, see our guide on AI agent debugging.

Safety Guardrails

Action Confirmation: Require human approval for high-stakes operations: refunds, data deletion, payments, sensitive data access.

Rate Limiting: Prevent runaway agents with per-user, per-minute, and per-hour limits.

Scope Boundaries: Define exactly what the agent can and cannot do. Block unauthorized tool combinations.

Cost Controls: Cap spending per conversation, per day, or per user.
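Two of these guardrails, rate limiting and action confirmation, can be sketched as a check that runs before every tool call. The class and function names, the sliding-window approach, and the tool names in `HIGH_STAKES_TOOLS` are all assumptions made for this example.

```typescript
// Sliding-window limiter: at most `limit` actions per `windowMs` per user.
class RateLimiter {
  private calls: Map<string, number[]> = new Map();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // `now` is a parameter so callers (and tests) control the clock.
  tryAcquire(userId: string, now: number = Date.now()): boolean {
    const recent = (this.calls.get(userId) ?? []).filter(
      (t) => now - t < this.windowMs
    );
    if (recent.length >= this.limit) {
      this.calls.set(userId, recent);
      return false;
    }
    recent.push(now);
    this.calls.set(userId, recent);
    return true;
  }
}

// Hypothetical list of tools that need human approval before running.
const HIGH_STAKES_TOOLS = ["refund_order", "delete_data", "process_payment"];

type GuardDecision = "allow" | "needs_confirmation" | "rate_limited";

function checkGuardrails(
  limiter: RateLimiter,
  userId: string,
  tool: string,
  now?: number
): GuardDecision {
  if (!limiter.tryAcquire(userId, now)) return "rate_limited";
  return HIGH_STAKES_TOOLS.indexOf(tool) !== -1 ? "needs_confirmation" : "allow";
}
```

Per-minute and per-hour limits can be layered by running two limiters with different windows and requiring both to pass.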

Audit Logging

Every agent action should be logged for review:

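interface AuditEntry {
  timestamp: string;
  userId: string;
  tool: string;
  parameters: unknown;
  status: 'pending' | 'success' | 'failed';
  result?: unknown;
  error?: string;
}

// The logger is passed in because a standalone function has no `this`.
async function executeWithLogging(
  toolCall: ToolCall,
  auditLogger: { log(entry: AuditEntry): Promise<void> }
) {
  const auditLog: AuditEntry = {
    timestamp: new Date().toISOString(),
    userId: toolCall.userId,
    tool: toolCall.toolName,
    parameters: toolCall.parameters,
    status: 'pending'
  };

  try {
    const result = await toolCall.execute();
    auditLog.status = 'success';
    auditLog.result = result;
    await auditLogger.log(auditLog);
    return result;
  } catch (error) {
    auditLog.status = 'failed';
    // Caught values are not guaranteed to be Error instances.
    auditLog.error = error instanceof Error ? error.message : String(error);
    await auditLogger.log(auditLog);
    throw error;
  }
}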

Human Escalation Paths

Always provide escape hatches:

Easy "talk to human" option
Escalation triggers for confidence < 0.5
Retry limits before human review
Critical action confirmations
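The four triggers above could be combined into a single check, sketched here. `TurnState`, `humanHandoffReason`, the `maxRetries` default, and the phrase-matching regular expression are hypothetical; a production system would likely detect the "talk to human" intent more robustly.

```typescript
interface TurnState {
  userMessage: string;
  confidence: number; // 0..1, same scale as the routing thresholds
  failedAttempts: number; // consecutive tool or step failures
  isCriticalAction: boolean;
}

type HandoffReason =
  | "user_requested"
  | "critical_action"
  | "low_confidence"
  | "retry_limit";

// Hypothetical phrases that count as asking for a person.
const HUMAN_REQUEST = /\b(talk|speak) to (a )?(human|person|agent)\b/i;

// Returns why a human should step in, or null when the agent may proceed.
function humanHandoffReason(
  state: TurnState,
  maxRetries = 3
): HandoffReason | null {
  if (HUMAN_REQUEST.test(state.userMessage)) return "user_requested";
  if (state.isCriticalAction) return "critical_action";
  if (state.confidence < 0.5) return "low_confidence";
  if (state.failedAttempts >= maxRetries) return "retry_limit";
  return null;
}
```

The explicit user request is checked first so that a customer who asks for a person always gets one, regardless of how confident the agent is.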

Testing AI Agents

Agents introduce testing challenges beyond traditional software. You need to verify correct behavior across scenarios, error paths, and edge cases.

Unit Testing Tools

Test each tool in isolation:

describe('CustomerLookupTool', () => {
  it('returns customer for valid email', async () => {
    const tool = new CustomerLookupTool(mockDatabase);
    const result = await tool.execute({ email: 'john.doe@example.com' });
    expect(result.found).toBe(true);
    expect(result.name).toBe('John Doe');
  });

  it('returns not_found for unknown email', async () => {
    const tool = new CustomerLookupTool(mockDatabase);
    const result = await tool.execute({ email: 'unknown@example.com' });
    expect(result.found).toBe(false);
  });

  it('throws for invalid input', async () => {
    const tool = new CustomerLookupTool(mockDatabase);
    await expect(tool.execute({ email: '' }))
      .rejects.toThrow('Invalid email');
  });
});

Scenario Testing

Test complete flows with mock LLM responses:

describe('OrderRefundFlow', () => {
  it('completes refund for valid order', async () => {
    const agent = buildTestAgent({
      tools: [mockRefundTool, mockEmailTool],
      responses: {
        'Find my order status': { tool: 'lookupOrder', result: { status: 'shipped' } },
        'Process refund': { tool: 'refundOrder', result: { success: true } },
        'Send confirmation': { tool: 'sendEmail', result: { sent: true } }
      }
    });

    await agent.handle('I want to return my order');

    expect(agent.lastTool).toBe('sendEmail');
    expect(mockRefundTool).toHaveBeenCalled();
  });
});

Error Path Testing

it('escalates when payment fails', async () => {
  const agent = buildTestAgent({
    tools: [mockPaymentTool],
    responses: {
      'Process payment': { tool: 'payment', error: 'Card declined' }
    }
  });

  await agent.handle('Pay for my order');

  expect(agent.lastAction.type).toBe('escalate');
  expect(agent.lastAction.reason).toBe('payment_failed');
});

Production Monitoring

Track these metrics in production:

Success Rate: % of conversations completing successfully
Escalation Rate: % requiring human intervention
Tool Error Rate: Failures by tool
Average Confidence: How certain the agent is in its responses
Response Latency: Time to complete tasks
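As a rough illustration, the metrics above can be computed from per-conversation records like this. The `ConversationRecord` shape and the `summarizeMetrics` name are assumptions for this sketch; real systems usually emit these values to a metrics backend rather than aggregating in memory.

```typescript
// Hypothetical record written once per finished conversation.
interface ConversationRecord {
  outcome: "success" | "escalated" | "failed";
  confidence: number;
  latencyMs: number;
  toolCalls: { tool: string; ok: boolean }[];
}

interface Metrics {
  successRate: number;
  escalationRate: number;
  averageConfidence: number;
  averageLatencyMs: number;
  toolErrorRate: Record<string, number>; // failures / calls, per tool
}

function summarizeMetrics(records: ConversationRecord[]): Metrics {
  const total = records.length;
  const share = (count: number) => (total === 0 ? 0 : count / total);
  const mean = (values: number[]) =>
    values.length === 0
      ? 0
      : values.reduce((sum, v) => sum + v, 0) / values.length;

  // Count calls and failures per tool across all conversations.
  const perTool = new Map<string, { calls: number; errors: number }>();
  for (const record of records) {
    for (const call of record.toolCalls) {
      const stats = perTool.get(call.tool) ?? { calls: 0, errors: 0 };
      stats.calls += 1;
      if (!call.ok) stats.errors += 1;
      perTool.set(call.tool, stats);
    }
  }

  const toolErrorRate: Record<string, number> = {};
  perTool.forEach((stats, tool) => {
    toolErrorRate[tool] = stats.errors / stats.calls;
  });

  return {
    successRate: share(records.filter((r) => r.outcome === "success").length),
    escalationRate: share(records.filter((r) => r.outcome === "escalated").length),
    averageConfidence: mean(records.map((r) => r.confidence)),
    averageLatencyMs: mean(records.map((r) => r.latencyMs)),
    toolErrorRate,
  };
}
```

Breaking the error rate out per tool makes it easier to tell a single failing integration apart from a general drop in agent quality.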

Technology Stack Options

Choose the right foundation for your agent project.

Claude Agent SDK

Anthropic's official SDK with built-in tool support, conversation management, and safety features. Best for complex reasoning tasks.

LangChain / LangGraph

Flexible framework with extensive integrations. Ideal for custom workflows and rapid prototyping.

Custom Implementation

Full control over every aspect. Best when you have specific requirements that frameworks don't support.

Infrastructure

Node.js or Python backends, PostgreSQL for data, Redis for state, vector DBs for RAG.

Common Questions About Building AI Agents

How long does it take to build an AI agent?

Simple single-task agents can be built in days. Multi-capability agents with custom integrations typically take 2-4 weeks. Enterprise multi-agent systems with strict safety requirements may take longer depending on complexity.

What's the difference between using a framework and building from scratch?

Frameworks like LangChain provide pre-built components for common patterns, accelerating development. Building from scratch offers maximum control but requires more initial development. Most projects benefit from a hybrid approach--using frameworks for standard components and custom code for unique requirements.

How do I handle agents that make mistakes?

Implement confidence thresholds that trigger clarification requests or human escalation. Test extensively with edge cases. Log all agent actions for review. Start with human-in-the-loop oversight and gradually increase autonomy as confidence grows.

Can agents access my internal systems?

Yes, with proper security. Tools should validate permissions, log access, and follow your data governance policies. Never expose raw database credentials--build abstraction layers that enforce access controls.

What ongoing maintenance do agents require?

Monitor performance metrics and error rates. Update prompts based on edge cases encountered. Refresh knowledge bases regularly. Patch security vulnerabilities. Iteratively improve based on user feedback and changing requirements.
