LLM Tool Use and Function Calling

Practical implementation patterns for building AI agents that can take action in the real world. Master tool calling across OpenAI, Claude, and other LLM providers.

Understanding Tool Use in AI Agents

Function calling bridges the gap between language models and real-world functionality, enabling agents to query databases, call APIs, execute code, and interact with external systems.

What Is Function Calling?

Function calling is the ability of large language models to recognize when a task requires external functionality and respond with structured requests that external systems can execute. Rather than generating text alone, the model produces machine-readable function calls that specify which tool to invoke and with what parameters.

The Tool Calling Flow

The tool calling flow follows a predictable pattern across providers:

Request & Definition: The application sends a user request along with available tool definitions to the model.
Analysis: The model determines whether tool use is necessary.
Tool Call: If tools are required, the model returns structured output specifying which tool to call and with what arguments.
Execution: The application executes the tool and returns the results to the model.
Response: The model incorporates the tool results to generate a coherent response.

The Prompt Engineering Guide's function calling documentation provides comprehensive technical patterns for implementing these flows across different LLM providers.

Effective tool use transforms AI agents from passive text generators into active systems capable of completing real-world tasks. When combined with agent orchestration patterns, complex multi-step workflows become possible.

Why Tool Use Matters for Agent Development

Tool use is the foundation of effective AI agents

Information Retrieval

Agents can query databases, search document stores, and call external APIs to fetch real-time information.

Action Execution

Beyond retrieving data, agents can send emails, update records, process transactions, or trigger workflows.

Computation and Analysis

Agents can call specialized functions for calculations, data analysis, or code execution.

Multi-Step Workflows

Complex tasks require multiple operations in sequence. Tool use enables agents to chain actions and adapt based on outcomes.

OpenAI Tools and Function Calling

The OpenAI Tool Calling Architecture

OpenAI's function calling feature, now called "Tools," allows developers to define custom functions that GPT models can invoke. The system uses JSON Schema to define function parameters, ensuring structured and predictable responses from the model.

The tool calling implementation requires defining three core components:

The tool specification that describes available functions to the model
The model response parsing that extracts function calls from the model's output
The function execution layer that actually runs the requested operations

Defining Tools with JSON Schema

Tool definitions use JSON Schema to describe function parameters. This schema serves a dual purpose: it validates that the model produces valid arguments, and it provides the model with clear guidance about what information is needed. Well-designed schemas significantly improve call accuracy and reduce errors.

According to the Prompt Engineering Guide's OpenAI tools architecture, effective tool definitions require careful attention to schema design and parameter descriptions.

Tool Choice Options

OpenAI provides several options for controlling when and how tools are used:

Auto: The model decides whether to call tools based on the user's request
Required: Forces tool use, ensuring the model calls at least one tool when appropriate
Specific function: Specify a particular function that must be called

Handling Parallel Tool Calls

Modern OpenAI models can call multiple tools in a single response, enabling parallel execution for efficiency. When a request requires data from multiple sources, the model can request all necessary calls at once rather than waiting for sequential results. Your application should be prepared to execute multiple function calls and return all results together.

Parallel tool calls are particularly valuable for aggregation tasks and can significantly improve the efficiency of multi-agent systems where different agents handle different aspects of a complex workflow.

OpenAI Tool Definition Example

1{2 "type": "function",3 "function": {4 "name": "query_database",5 "description": "Query the customer database for user records",6 "parameters": {7 "type": "object",8 "properties": {9 "table": {10 "type": "string",11 "enum": ["users", "orders", "products"],12 "description": "The database table to query"13 },14 "filters": {15 "type": "object",16 "properties": {17 "status": {18 "type": "string",19 "enum": ["active", "pending", "archived"]20 },21 "limit": {22 "type": "integer",23 "minimum": 1,24 "maximum": 100,25 "default": 5026 }27 }28 }29 },30 "required": ["table"]31 }32 }33}

Claude Tool Use and Implementation

Anthropic's Tool-Based Approach

Anthropic uses a tool-based approach for Claude models where schemas are defined as "tools," and Claude enforces structure with type safety. This approach differs from OpenAI in its XML format communication and more explicit tool definition patterns. Claude's tool use implementation emphasizes structured outputs and clear boundaries between thinking and action phases.

The Agenta.ai guide to structured outputs and function calling provides detailed comparison of Claude's approach with other providers, highlighting the unique aspects of Anthropic's implementation.

The XML format used by Claude provides clear visual separation between the model's reasoning and its tool requests. Tool definitions include name, description, and input schema, matching OpenAI's general structure but with Anthropic-specific conventions.

Claude Tool Definition Patterns

Claude tool definitions follow a consistent structure with the tool name, a human-readable description, and an input schema. The description is particularly important because Claude uses it to understand when each tool is appropriate. Unlike OpenAI, Claude's tool definitions often include more detailed guidance about when to use or avoid specific tools.

Parallel Tool Calls in Claude

Claude supports parallel tool calls through the tool_use API pattern, allowing multiple tool invocations in a single response. The model can request several tools simultaneously when it determines that multiple information sources are needed. Applications receive all tool requests together and can execute them in parallel.

Result Handling Patterns

Claude's tool result handling emphasizes clear, structured feedback. After executing a tool, the application should return results in a format that Claude can easily parse and incorporate. Verbose tool outputs can overwhelm the context window, so effective implementations provide concise summaries highlighting relevant information.

Claude Tool Definition Example

1{2 "name": "search_documents",3 "description": "Search the knowledge base for relevant documentation. Use this tool when the user asks about product features, technical specifications, or policy details. Do not use for recent news or current events.",4 "input_schema": {5 "type": "object",6 "properties": {7 "query": {8 "type": "string",9 "description": "The search query, phrased as a question or keyword phrase"10 },11 "max_results": {12 "type": "integer",13 "description": "Maximum number of results to return",14 "default": 515 },16 "filters": {17 "type": "object",18 "properties": {19 "category": {20 "type": "string",21 "enum": ["api", "guides", "reference"]22 }23 }24 }25 },26 "required": ["query"]27 }28}

Tool Schema Design Best Practices

Designing Effective Parameter Schemas

Schema design is the foundation of reliable function calling. Poorly designed schemas lead to incorrect calls, validation failures, and frustrating user experiences. Effective schemas balance specificity with flexibility, providing enough guidance for the model to make correct choices without being so restrictive that valid requests fail.

Use Enums for Limited Options: When a parameter has a small set of valid values, use enum constraints. This eliminates hallucinations where the model invents values that don't exist. For status fields, option lists, or categorical inputs, enums provide clear guidance.

Set Appropriate Type Constraints: Type constraints prevent obvious errors. Use integer ranges for numeric values, string patterns for formatted inputs like emails or phone numbers, and array length limits for collections.

Mark Required Fields Clearly: Define which parameters are required versus optional. Required fields should be truly necessary--if a field is optional in practice, mark it as such.

Provide Default Values: For optional parameters with sensible defaults, include those defaults in the schema.

Description Writing for Model Understanding

The description fields in your schema are the model's primary guide for when and how to use each parameter. Vague descriptions lead to incorrect calls, while detailed descriptions improve accuracy significantly. Effective descriptions explain what the parameter represents, describe valid formats or values, and explain when the parameter is relevant.

Avoiding Common Schema Mistakes

Several schema design mistakes commonly cause function calling failures:

Over-specification: Defining every possible parameter when only a few are needed
Under-specification: Failing to provide enough guidance for complex parameters
Inconsistent naming: Field names that don't match application expectations
Missing type definitions: Allowing the model to produce arguments the code cannot process

The Agenta.ai schema design best practices provide additional guidance on avoiding these common pitfalls in production implementations.

Error Handling and Reliability

Understanding Failure Modes

Tool calls can fail for several reasons, and robust agents must handle each case appropriately. Understanding common failure modes helps you design effective error handling strategies.

Model-Side Errors: The model may produce invalid arguments that fail schema validation, call the wrong tool entirely, or call the right tool with nonsensical arguments.

Execution Errors: Even valid tool calls can fail during execution. Network timeouts, database connection failures, authentication errors, and rate limits are common external failures.

Timeout and Latency Issues: Tool calls may take significant time to complete, especially for external APIs. Long-running calls can degrade user experience.

Partial Results: Sometimes tools return incomplete or unexpected results. A database query might return fewer rows than expected, or an API might return data in an unexpected format.

Implementing Retry Logic

For transient failures like network timeouts or rate limits, retry logic improves reliability. Implement exponential backoff with jitter to avoid overwhelming struggling services. However, not all errors should be retried:

Retryable: Network timeouts, 503 errors, rate limits
Not retryable: 400 errors, authentication failures, resource not found

Validation and Sanitization

Never trust model output without validation. Even with perfect schemas, models can produce arguments that fail validation or contain problematic values. Validate all tool arguments before execution, and sanitize inputs to prevent injection attacks.

Graceful Degradation Strategies

When tools are unavailable or failing repeatedly, agents should degrade gracefully:

Fall back to cached data or secondary sources
Acknowledge limitations and provide partial information
Explain why operations can't be completed and suggest alternatives

Robust error handling is essential for production AI systems. Combined with agent memory and context management, agents can recover from failures and maintain coherent user experiences.

Best Practices for Production Implementation

Provider Selection Considerations

Different LLM providers offer different trade-offs for tool use implementations. OpenAI provides mature tooling with extensive documentation and broad model support. Anthropic's Claude offers strong reasoning capabilities and XML format clarity. Consider several factors when selecting a provider:

Model performance: Some models excel at tool selection while others are better at argument generation
Latency requirements: Some providers offer faster response times for tool call round-trips
Cost structures: Per-token pricing affects total cost for agents that make many tool calls

Optimizing for Cost and Performance

Tool-using agents can generate significant API costs through both model calls and tool execution. Optimize both dimensions:

Reduce unnecessary tool calls: Only invoke expensive calls when genuinely needed
Batch operations: Design tools that handle batch operations rather than single items
Cache tool results: Avoid redundant execution for frequently occurring calls
Select appropriate models: Simple tool selection may work with faster, cheaper models

Security Considerations

Tool use introduces significant security considerations that must be addressed in production systems:

Input validation: Sanitize all user inputs before passing them to tools
Authorization checks: Every tool call should verify user authorization
Rate limiting: Prevent abuse and protect downstream services
Audit logging: Record all tool calls with sufficient detail for security review
Human oversight: Require approval for sensitive operations

Testing Tool Use Systems

Testing tool-using agents requires a different approach than testing traditional software:

Unit tests: Test each tool's implementation in isolation
Integration tests: Test the complete tool calling flow
Adversarial testing: Test how the system handles malicious inputs
Performance testing: Measure latency and identify bottlenecks
Failure mode testing: Systematically test each failure mode

Our AI agent debugging guide provides additional strategies for testing and troubleshooting agent systems.

Building Multi-Tool Agent Systems

Tool Selection and Routing

Complex agents often have access to many tools, and determining which tool (or combination) to use becomes a key challenge. Effective tool selection requires:

Clear descriptions for each tool
Disambiguation mechanisms for ambiguous requests
Explicit routing logic for complex scenarios

Orchestrating Sequential Tool Calls

Many tasks require multiple tool calls in sequence. The results of one call may determine what the next call should be. Design your system to support multi-step workflows where the model can accumulate information across calls.

Managing Tool Result Context

As agents accumulate tool results, context window management becomes important:

Summarize tool results, extracting only relevant information
Track which information has been provided to avoid redundant calls
Implement intelligent caching to prevent unnecessary repetitions

Best Practices Summary

Tool use transforms LLMs from text generators into actionable agents. Success requires:

Careful schema design with clear descriptions
Robust error handling and graceful degradation
Production-ready security and testing
Thoughtful optimization for cost and performance

When building agents that integrate tool use with memory systems, the combination enables sophisticated workflows that learn and adapt over time. Explore our building AI agents from scratch guide for comprehensive implementation patterns.

Frequently Asked Questions

Ready to Build Intelligent AI Agents?

Let us help you implement tool use and function calling for your specific use case.

Sources

Prompt Engineering Guide - Function Calling - Comprehensive technical guide covering function calling across multiple LLM providers with practical implementation patterns and code examples.
Agenta.ai - The Guide to Structured Outputs and Function Calling with LLMs - Production-focused guide covering OpenAI, Claude, and Gemini implementations with JSON mode, Pydantic, Instructor, and Outlines libraries.