Introduction: Tool Calling as the Bridge to the Real World
Tool calling is the mechanism that allows AI agents to move beyond text generation and act in the real world. Without tool calling, an agent can only produce words: answers, explanations, code that stays on the screen. With tool calling, it can search the web, query databases, call external APIs, create files, send emails, manage deployments, and automate complex processes.
Tool calling transforms the language model from a passive generation system into an active orchestrator that autonomously decides which tools to use, what parameters to provide, and how to compose results into a coherent response. This capability is what distinguishes a chatbot from an agent: the former responds, the latter acts.
In this article, we will explore tool calling in depth: from the formal definition of a tool using JSON Schema, to input validation, output parsing, and integration with REST APIs, GraphQL, and databases. We will build a reusable tool framework and analyze advanced patterns such as dynamic tool discovery and long-running tool management.
What You Will Learn in This Article
- How to define tools with JSON Schema: name, description, parameters, and output
- Input validation and sanitization to prevent injection and errors
- Structured output parsing with error recovery
- Integration with REST APIs and auto-generation from OpenAPI specs
- Database tools with secure, parameterized queries
- How to build a reusable custom tool framework
- Dynamic tool discovery and runtime registration
- Handling long-running tools with streaming and timeouts
- Web scraping tools with BeautifulSoup and Playwright
- Best practices for tool naming, descriptions, and parameter design
Function Calling API: How It Works
The function calling API is the protocol that enables language models to request the execution of external functions. Both OpenAI and Anthropic have implemented this capability, each with slightly different approaches but converging on the same core principle: the model receives a list of available tool definitions, reasons about the user's request, and outputs a structured JSON object specifying which tool to call and with what parameters.
It is critical to understand that the model does not execute the tools itself. It only generates the call specification. The actual execution is handled by the application layer, which receives the model's output, validates it, invokes the tool, and feeds the result back to the model for further reasoning.
Tool Definition with JSON Schema
Every tool that an agent can use must be formally described so that the language model understands what it does, what parameters it accepts, and what kind of output it produces. The de facto standard for this description is JSON Schema, a declarative format for specifying the structure, types, and constraints of data.
A good tool specification is fundamental for agent quality: if the description is vague, the model will not know when to use the tool; if the parameters are ambiguous, it will generate calls with incorrect values; if the output is undocumented, it will be unable to correctly interpret the results.
# JSON Schema tool definition
tool_definition = {
    "name": "search_database",
    "description": (
        "Search the project database for records matching a query. "
        "Use this tool when the user asks about stored data, project records, "
        "or needs to find specific entries. Do NOT use this tool for general "
        "knowledge questions - those should be answered directly."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query. Supports boolean operators (AND, OR, NOT).",
                "minLength": 1,
                "maxLength": 500
            },
            "table": {
                "type": "string",
                "description": "The database table to search in",
                "enum": ["issues", "pull_requests", "commits", "users", "projects"]
            },
            "limit": {
                "type": "integer",
                "description": "Maximum number of results to return",
                "default": 10,
                "minimum": 1,
                "maximum": 100
            },
            "sort_by": {
                "type": "string",
                "description": "Field to sort results by",
                "enum": ["relevance", "date_created", "date_updated", "priority"],
                "default": "relevance"
            }
        },
        "required": ["query", "table"]
    }
}
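Before executing a call the model proposes, the arguments should be checked against this schema. In production you would typically use a full JSON Schema validator such as the third-party jsonschema package; the sketch below is a simplified, standard-library stand-in that covers only the constraints used above (required fields, enums, integer bounds), with an inline schema for illustration:

```python
# Simplified stand-in for a full JSON Schema validator: checks only
# required fields, enum membership, and integer minimum/maximum.
def validate_arguments(schema: dict, args: dict) -> list[str]:
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required parameter: {name}")
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            errors.append(f"unknown parameter: {name}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: {value!r} not in {spec['enum']}")
        if spec.get("type") == "integer":
            if "minimum" in spec and value < spec["minimum"]:
                errors.append(f"{name}: below minimum {spec['minimum']}")
            if "maximum" in spec and value > spec["maximum"]:
                errors.append(f"{name}: above maximum {spec['maximum']}")
    return errors

params = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "table": {"type": "string", "enum": ["issues", "users"]},
        "limit": {"type": "integer", "minimum": 1, "maximum": 100},
    },
    "required": ["query", "table"],
}
# A call with a bad enum value and an out-of-range limit yields two errors.
print(validate_arguments(params, {"query": "bug", "table": "wiki", "limit": 500}))
```

Returning a list of error messages (rather than raising) lets the application feed the failures back to the model so it can correct its own call.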
OpenAI Function Calling Flow
The OpenAI function calling flow follows a well-defined cycle. The developer sends the conversation messages along with a list of available tools. The model analyzes the conversation, decides whether a tool call is needed, and if so, returns a response with the tool name and arguments as a structured JSON object. The developer executes the tool, appends the result as a new message, and sends the conversation back for the model to continue reasoning.
import json

from openai import OpenAI

client = OpenAI()

# Define available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given location. Use when the user asks about weather conditions.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. 'London, UK'"
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "default": "celsius"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Step 1: Send message with tools
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Rome?"}],
    tools=tools,
    tool_choice="auto"
)
message = response.choices[0].message

# Step 2: Check if model wants to call a tool
if message.tool_calls:
    tool_call = message.tool_calls[0]
    function_name = tool_call.function.name
    arguments = json.loads(tool_call.function.arguments)

    # Step 3: Execute the tool
    result = execute_tool(function_name, arguments)

    # Step 4: Send the result back to the model
    follow_up = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": "What's the weather in Rome?"},
            message,
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result)
            }
        ],
        tools=tools
    )
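The execute_tool helper used above is application code, not part of the OpenAI SDK. One common shape is a dispatcher that maps tool names to Python callables; the sketch below is illustrative (the weather lookup is a stub, and the registry name is our own), and it returns errors as data so the model can recover instead of the process crashing:

```python
# Hypothetical dispatcher mapping tool names to Python callables.
def get_weather(location: str, units: str = "celsius") -> dict:
    # Stub implementation; a real tool would call a weather API here.
    return {"location": location, "temperature": 21, "units": units}

TOOL_REGISTRY = {"get_weather": get_weather}

def execute_tool(name: str, arguments: dict) -> dict:
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        # Return the error to the model instead of raising, so it can retry.
        return {"error": f"unknown tool: {name}"}
    try:
        return handler(**arguments)
    except TypeError as e:
        # Catches wrong or missing argument names generated by the model.
        return {"error": f"bad arguments for {name}: {e}"}
```

Returning a dict in every case keeps Step 4 uniform: the result is always JSON-serializable, whether the call succeeded or failed.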
Anthropic Tool Use
Anthropic's Claude implements tool calling through a similar but distinct mechanism. Tools are defined with an input_schema field instead of parameters, and the response uses a tool_use content block. The overall flow is the same: define tools, receive tool call requests, execute them, and return results.
import json

import anthropic

client = anthropic.Anthropic()

# Define tools for Claude
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a given location.",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. 'London, UK'"
                }
            },
            "required": ["location"]
        }
    }
]

# Send message with tools
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Rome?"}]
)

# Process tool use blocks
for block in response.content:
    if block.type == "tool_use":
        tool_name = block.name
        tool_input = block.input
        tool_use_id = block.id

        # Execute and return result
        result = execute_tool(tool_name, tool_input)
        follow_up = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=tools,
            messages=[
                {"role": "user", "content": "What's the weather in Rome?"},
                {"role": "assistant", "content": response.content},
                {
                    "role": "user",
                    "content": [{
                        "type": "tool_result",
                        "tool_use_id": tool_use_id,
                        "content": json.dumps(result)
                    }]
                }
            ]
        )
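With either provider, a real agent runs this exchange in a loop rather than a single follow-up: it keeps executing tool calls and appending results until the model returns a plain text answer, with an iteration cap as a safety net. A provider-agnostic sketch, where call_model and run_tool are injected stand-ins (our own simplified interfaces, not SDK functions):

```python
def agent_loop(call_model, run_tool, messages: list, max_iterations: int = 10) -> str:
    """Generic tool-use loop. call_model(messages) returns either
    ("text", answer) or ("tool", name, args); run_tool(name, args) executes a tool."""
    for _ in range(max_iterations):
        reply = call_model(messages)
        if reply[0] == "text":
            return reply[1]  # final answer, no more tools needed
        _, name, args = reply
        result = run_tool(name, args)
        # Feed the result back so the model can keep reasoning.
        messages.append({"role": "tool", "name": name, "content": str(result)})
    return "Stopped: iteration limit reached."
```

The iteration cap matters in practice: a model that keeps requesting tools (for example, retrying a failing call) would otherwise loop forever and burn tokens.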
Creating Custom Tools with LangChain
LangChain provides a powerful and ergonomic system for creating custom tools through the @tool decorator and Pydantic schemas. The framework automatically extracts the tool name, description, and parameter schema from the Python function, making tool creation straightforward while maintaining full control over validation and behavior.
The @tool Decorator
The simplest and most idiomatic way to create a tool in LangChain is the @tool decorator. It transforms an ordinary Python function into a tool that agents can use, automatically extracting metadata from the function signature and docstring. Type hints are essential: LangChain uses them to generate the JSON schema that the model reads to understand how to invoke the tool.
from langchain_core.tools import tool

@tool
def web_search(query: str, max_results: int = 5) -> str:
    """Search for information on the web using a search engine.

    Args:
        query: The search query to execute.
        max_results: Maximum number of results to return (default: 5).

    Returns:
        A string with formatted search results.
    """
    from tavily import TavilyClient

    client = TavilyClient()
    results = client.search(query, max_results=max_results)
    formatted = []
    for r in results["results"]:
        formatted.append(f"- {r['title']}: {r['content'][:200]}")
    return "\n".join(formatted) if formatted else "No results found."
The Description Is Critical
The function docstring is used as the tool description sent to the LLM. A clear, concise, and precise description is fundamental: the model reads it to decide when to invoke the tool and how to provide parameters. Vague descriptions lead to incorrect or missed invocations. Write the docstring as if you were explaining to a colleague what the function does and when to use it.
Pydantic Schemas for Input Validation
For more complex tools that require structured, validated input, Pydantic models provide a robust alternative to basic type hints. By defining a Pydantic BaseModel as the tool's input schema, you get automatic type coercion, constraint validation, and detailed error messages when the model provides invalid parameters.
from pydantic import BaseModel, Field, validator
from langchain_core.tools import tool
from typing import Optional
from enum import Enum

class SearchCategory(str, Enum):
    ISSUES = "issues"
    DOCS = "documentation"
    CODE = "code"
    WIKI = "wiki"

class SearchInput(BaseModel):
    """Schema for the search tool input with full validation."""

    query: str = Field(
        ...,
        min_length=1,
        max_length=500,
        description="The search query. Supports boolean operators (AND, OR, NOT)."
    )
    category: SearchCategory = Field(
        default=SearchCategory.DOCS,
        description="The category to search in."
    )
    max_results: int = Field(
        default=10,
        ge=1,
        le=100,
        description="Maximum number of results to return."
    )
    date_from: Optional[str] = Field(
        default=None,
        description="Start date filter in ISO 8601 format (YYYY-MM-DD)."
    )
    date_to: Optional[str] = Field(
        default=None,
        description="End date filter in ISO 8601 format (YYYY-MM-DD)."
    )

    @validator("query")
    def sanitize_query(cls, v):
        """Remove potentially dangerous substrings from the query."""
        dangerous_tokens = [";", "--", "/*", "*/", "xp_", "exec("]
        for token in dangerous_tokens:
            v = v.replace(token, "")
        return v.strip()

    @validator("date_from", "date_to")
    def validate_date_format(cls, v):
        if v is not None:
            from datetime import datetime
            try:
                datetime.strptime(v, "%Y-%m-%d")
            except ValueError:
                raise ValueError(f"Invalid date format: {v}. Expected YYYY-MM-DD.")
        return v

@tool(args_schema=SearchInput)
def search_project(
    query: str,
    category: str = "documentation",
    max_results: int = 10,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None
) -> str:
    """Search the project knowledge base for relevant documents, issues, or code.

    Use this tool when the user asks about project-specific information,
    documentation, bug reports, or source code. Do NOT use for general
    knowledge questions.

    Args:
        query: The search query with optional boolean operators.
        category: The category to search in (issues, documentation, code, wiki).
        max_results: Maximum results to return (1-100).
        date_from: Optional start date filter (YYYY-MM-DD).
        date_to: Optional end date filter (YYYY-MM-DD).

    Returns:
        Formatted search results with titles, snippets, and relevance scores.
    """
    # Implementation here
    results = perform_search(query, category, max_results, date_from, date_to)
    return format_results(results)
API Integration Tools
REST APIs are the most common integration point for AI agents. Most web services expose RESTful APIs, and integrating these APIs as agent tools unlocks a vast ecosystem of capabilities: from GitHub for code management, to Jira for project tracking, to Slack for communication, to any SaaS service with a public API.
Building a REST API Tool
A well-designed API tool wraps HTTP requests with proper authentication, error handling, and response formatting. The tool should abstract away the HTTP details and present the agent with a clean, domain-specific interface.
import os

import httpx
from langchain_core.tools import tool
from pydantic import BaseModel, Field
from typing import Optional

class GitHubIssueInput(BaseModel):
    repo: str = Field(..., description="Repository in 'owner/repo' format")
    state: str = Field(default="open", description="Issue state: open, closed, all")
    labels: Optional[str] = Field(default=None, description="Comma-separated label names")
    per_page: int = Field(default=10, ge=1, le=100, description="Results per page")

@tool(args_schema=GitHubIssueInput)
def list_github_issues(
    repo: str,
    state: str = "open",
    labels: Optional[str] = None,
    per_page: int = 10
) -> str:
    """List issues from a GitHub repository.

    Use this tool to find bugs, feature requests, or discussions
    in a specific GitHub repository. Supports filtering by state and labels.

    Args:
        repo: Repository in 'owner/repo' format (e.g., 'langchain-ai/langchain').
        state: Filter by issue state (open, closed, all).
        labels: Comma-separated label names to filter by.
        per_page: Number of results per page (1-100).

    Returns:
        Formatted list of issues with titles, labels, and creation dates.
    """
    token = os.getenv("GITHUB_TOKEN")
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28"
    }
    params = {"state": state, "per_page": per_page}
    if labels:
        params["labels"] = labels

    try:
        response = httpx.get(
            f"https://api.github.com/repos/{repo}/issues",
            headers=headers,
            params=params,
            timeout=30.0
        )
        response.raise_for_status()
        issues = response.json()
        if not issues:
            return f"No {state} issues found in {repo}."

        formatted = []
        for issue in issues:
            labels_str = ", ".join(l["name"] for l in issue.get("labels", []))
            formatted.append(
                f"#{issue['number']} - {issue['title']}\n"
                f"  State: {issue['state']} | Labels: {labels_str or 'none'}\n"
                f"  Created: {issue['created_at'][:10]} | "
                f"Comments: {issue['comments']}"
            )
        return f"Found {len(issues)} issues in {repo}:\n\n" + "\n\n".join(formatted)
    except httpx.HTTPStatusError as e:
        return f"GitHub API error: {e.response.status_code} - {e.response.text[:200]}"
    except httpx.RequestError as e:
        return f"Request failed: {str(e)}"
Authentication Patterns
Integrating external APIs requires handling different authentication mechanisms. The agent's tool framework must support the most common patterns transparently, without exposing credentials to the language model context.
API Authentication Methods
| Method | Implementation | Security | Use Case |
|---|---|---|---|
| API Key | `X-API-Key` header or query parameter | Medium | Simple APIs, internal services |
| Bearer Token | `Authorization: Bearer <token>` header | High | Standard RESTful APIs, JWT |
| OAuth 2.0 | Authorization flow with token refresh | Very High | Third-party APIs, delegated access |
| mTLS | Mutual client and server certificates | Maximum | Enterprise APIs, internal microservices |
Rate Limiting and Retry Logic
Every API has frequency limits that must be respected. An agent that calls APIs without managing rate limiting risks being blocked, degrading the user experience. Management should be proactive: track remaining limits and slow down before reaching the threshold, not just react afterwards.
import time

import httpx

class RateLimiter:
    """Token bucket rate limiter for external APIs."""

    def __init__(self):
        self.limits: dict[str, dict] = {}

    def configure(self, api_name: str, requests_per_minute: int):
        self.limits[api_name] = {
            "rpm": requests_per_minute,
            "tokens": requests_per_minute,
            "last_refill": time.time()
        }

    def acquire(self, api_name: str) -> bool:
        """Try to acquire a token. Returns False if rate limited."""
        if api_name not in self.limits:
            return True
        limit = self.limits[api_name]
        now = time.time()
        elapsed = now - limit["last_refill"]
        # Refill tokens proportionally to elapsed time
        refill = elapsed * (limit["rpm"] / 60.0)
        limit["tokens"] = min(limit["rpm"], limit["tokens"] + refill)
        limit["last_refill"] = now
        if limit["tokens"] >= 1:
            limit["tokens"] -= 1
            return True
        return False

    def wait_time(self, api_name: str) -> float:
        """Estimated wait time before next available token."""
        if api_name not in self.limits:
            return 0
        limit = self.limits[api_name]
        if limit["tokens"] >= 1:
            return 0
        return (1 - limit["tokens"]) * (60.0 / limit["rpm"])

class ResilientAPIClient:
    """API client with rate limiting and retry with exponential backoff."""

    def __init__(self, base_url: str, rate_limiter: RateLimiter):
        self.base_url = base_url
        self.rate_limiter = rate_limiter
        self.max_retries = 3
        self.base_delay = 1.0

    def request(self, method: str, path: str, **kwargs) -> dict:
        api_name = self.base_url
        for attempt in range(self.max_retries):
            # Wait if rate limited
            if not self.rate_limiter.acquire(api_name):
                wait = self.rate_limiter.wait_time(api_name)
                time.sleep(wait)
            try:
                response = httpx.request(
                    method, f"{self.base_url}{path}",
                    timeout=30.0, **kwargs
                )
                # Handle rate limit responses (HTTP 429)
                if response.status_code == 429:
                    retry_after = int(response.headers.get("Retry-After", 60))
                    time.sleep(retry_after)
                    continue
                response.raise_for_status()
                return response.json()
            except httpx.TimeoutException:
                delay = self.base_delay * (2 ** attempt)
                time.sleep(delay)
            except httpx.HTTPStatusError as e:
                if e.response.status_code >= 500:
                    delay = self.base_delay * (2 ** attempt)
                    time.sleep(delay)
                else:
                    raise
        raise Exception(f"Failed after {self.max_retries} attempts")
Web Scraping Tools
Web scraping tools allow agents to extract structured data from websites that do not provide APIs. These tools are essential for tasks like market research, competitive analysis, content aggregation, and data collection. Two primary approaches exist: static scraping with BeautifulSoup for simple HTML pages, and dynamic scraping with Playwright for JavaScript-rendered content.
Static Scraping with BeautifulSoup
import httpx
from bs4 import BeautifulSoup
from langchain_core.tools import tool
from pydantic import BaseModel, Field
from typing import Optional

class ScrapeInput(BaseModel):
    url: str = Field(..., description="URL of the page to scrape")
    selector: Optional[str] = Field(
        default=None,
        description="CSS selector to target specific elements (e.g., 'article p', '.content h2')"
    )
    extract: str = Field(
        default="text",
        description="What to extract: 'text' for visible text, 'links' for all links, 'tables' for table data"
    )

@tool(args_schema=ScrapeInput)
def scrape_webpage(
    url: str,
    selector: Optional[str] = None,
    extract: str = "text"
) -> str:
    """Scrape content from a webpage. Extracts text, links, or table data.

    Use this tool when you need to read the content of a specific webpage,
    extract links from a page, or parse tabular data. Works best with
    static HTML pages. For JavaScript-heavy sites, use scrape_dynamic instead.

    Args:
        url: The full URL of the page to scrape.
        selector: Optional CSS selector to target specific page elements.
        extract: Type of content to extract (text, links, tables).

    Returns:
        Extracted content formatted as plain text.
    """
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (compatible; ResearchBot/1.0)"
        }
        response = httpx.get(url, headers=headers, timeout=30.0, follow_redirects=True)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")

        # Remove script and style elements
        for tag in soup(["script", "style", "nav", "footer", "header"]):
            tag.decompose()

        if selector:
            elements = soup.select(selector)
            if not elements:
                return f"No elements found matching selector: {selector}"
            target = elements
        else:
            target = [soup]

        if extract == "text":
            texts = []
            for el in target:
                text = el.get_text(separator="\n", strip=True)
                texts.append(text)
            content = "\n\n".join(texts)
            # Truncate to avoid overwhelming the context
            if len(content) > 8000:
                content = content[:8000] + "\n... [truncated]"
            return content

        elif extract == "links":
            links = []
            for el in target:
                for a in el.find_all("a", href=True):
                    text = a.get_text(strip=True)
                    href = a["href"]
                    if href.startswith("http"):
                        links.append(f"- [{text}]({href})")
            return "\n".join(links[:50]) if links else "No links found."

        elif extract == "tables":
            tables = []
            for el in target:
                for table in el.find_all("table"):
                    rows = []
                    for tr in table.find_all("tr"):
                        cells = [td.get_text(strip=True) for td in tr.find_all(["td", "th"])]
                        rows.append(" | ".join(cells))
                    tables.append("\n".join(rows))
            return "\n\n---\n\n".join(tables) if tables else "No tables found."

        return "Invalid extract type. Use: text, links, or tables."
    except httpx.HTTPStatusError as e:
        return f"HTTP error {e.response.status_code}: {url}"
    except Exception as e:
        return f"Scraping error: {str(e)}"
Dynamic Scraping with Playwright
Modern web applications render content dynamically using JavaScript. For these sites, static HTML parsing is insufficient because the content does not exist in the initial HTML response. Playwright provides a headless browser that executes JavaScript and renders the page fully before extraction.
import asyncio

from playwright.async_api import async_playwright
from langchain_core.tools import tool

@tool
def scrape_dynamic(
    url: str,
    wait_selector: str = "body",
    extract_selector: str = "body"
) -> str:
    """Scrape content from a JavaScript-rendered webpage using a headless browser.

    Use this tool for modern web applications (React, Vue, Angular) where
    content is rendered dynamically. More resource-intensive than static scraping
    but handles all page types.

    Args:
        url: The full URL of the page to scrape.
        wait_selector: CSS selector to wait for before extracting (ensures page is loaded).
        extract_selector: CSS selector for the content to extract.

    Returns:
        Extracted text content from the rendered page.
    """
    async def _scrape():
        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=True)
            context = await browser.new_context(
                user_agent="Mozilla/5.0 (compatible; ResearchBot/1.0)"
            )
            page = await context.new_page()
            try:
                await page.goto(url, wait_until="networkidle", timeout=30000)
                await page.wait_for_selector(wait_selector, timeout=10000)
                # Extract text content
                content = await page.eval_on_selector(
                    extract_selector,
                    "el => el.innerText"
                )
                if len(content) > 8000:
                    content = content[:8000] + "\n... [truncated]"
                return content
            except Exception as e:
                return f"Dynamic scraping error: {str(e)}"
            finally:
                await browser.close()

    return asyncio.run(_scrape())
Web Scraping Considerations
Always respect the target website's robots.txt file and terms of service. Implement appropriate delays between requests to avoid overloading servers. Use caching to minimize redundant requests. Be aware that some websites actively block automated scraping and may require additional strategies such as rotating user agents or using proxy services.
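The robots.txt check can be automated with Python's standard library. The sketch below parses rules supplied as text so it runs offline; in practice you would first fetch the site's robots.txt (the domain and rules here are purely illustrative):

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    # Parse rules from text instead of fetching, so the check is testable offline.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Illustrative rules: everything under /private/ is off-limits to all agents.
rules = """User-agent: *
Disallow: /private/
"""
print(is_allowed(rules, "ResearchBot", "https://example.com/private/data"))  # False
print(is_allowed(rules, "ResearchBot", "https://example.com/blog/post"))     # True
```

Calling such a check at the top of a scraping tool and returning a polite refusal message keeps the agent compliant without any change to its reasoning loop.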
Database Tools
Database tools allow agents to query and modify relational and NoSQL databases. They are among the most powerful and simultaneously most dangerous tools: an incorrect query can expose sensitive data, corrupt records, or cause performance problems. Security must be built into the architecture from the ground up.
Security Architecture for Database Tools
Database access must be mediated by a multi-layered security system:
- Read-only vs. read-write permissions: by default, database tools should have read-only access. Write operations must require explicit authorization and user confirmation
- Query allowlist: restrict the types of queries that can be executed. Only SELECT, never DROP, ALTER, or TRUNCATE
- Row limit: always enforce a LIMIT on queries to prevent returning millions of rows and overwhelming the context
- Table access control: define an allowlist of accessible tables, excluding sensitive tables such as those containing passwords or financial data
- Audit logging: record every executed query with timestamp, user, and result for accountability and debugging
import re
import sqlite3
import time

from langchain_core.tools import tool

class SecureDatabaseTool:
    """Database tool with multi-layered security."""

    ALLOWED_TABLES = {"issues", "projects", "sprints", "tasks", "comments"}
    MAX_ROWS = 100
    BLOCKED_KEYWORDS = {"DROP", "ALTER", "TRUNCATE", "DELETE", "INSERT", "UPDATE"}

    def __init__(self, db_path: str, read_only: bool = True):
        self.db_path = db_path
        self.read_only = read_only
        self.audit_log = []

    def _extract_tables(self, query: str) -> set[str]:
        """Extract table names from a SQL query."""
        pattern = r'\bFROM\s+(\w+)|\bJOIN\s+(\w+)'
        matches = re.findall(pattern, query, re.IGNORECASE)
        return {m[0] or m[1] for m in matches if m[0] or m[1]}

    def execute_query(self, query: str, params: tuple = ()) -> dict:
        """Execute a SQL query with security checks."""
        query_upper = query.upper().strip()

        # 1. Block non-SELECT queries in read-only mode
        if self.read_only:
            if not query_upper.startswith("SELECT"):
                return {"error": "Only SELECT queries are allowed in read-only mode"}
            for keyword in self.BLOCKED_KEYWORDS:
                if keyword in query_upper:
                    return {"error": f"Blocked keyword detected: {keyword}"}

        # 2. Verify accessed tables
        tables_in_query = self._extract_tables(query)
        unauthorized = tables_in_query - self.ALLOWED_TABLES
        if unauthorized:
            return {"error": f"Access denied to tables: {unauthorized}"}

        # 3. Add LIMIT if missing
        if "LIMIT" not in query_upper:
            query = query.rstrip(";") + f" LIMIT {self.MAX_ROWS}"

        # 4. Execute with parameterized query (NEVER string concatenation)
        try:
            conn = sqlite3.connect(self.db_path)
            cursor = conn.execute(query, params)
            columns = [desc[0] for desc in cursor.description]
            rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
            conn.close()

            # 5. Audit log
            self.audit_log.append({
                "query": query, "params": params,
                "rows_returned": len(rows),
                "timestamp": time.time()
            })
            return {
                "success": True,
                "columns": columns,
                "rows": rows,
                "row_count": len(rows)
            }
        except Exception as e:
            return {"error": f"Query error: {str(e)}"}

# Create LangChain tool from the secure database class
db_tool = SecureDatabaseTool("app.db", read_only=True)

@tool
def query_database(sql_query: str) -> str:
    """Execute a read-only SQL query against the project database.

    IMPORTANT: Only SELECT queries are allowed. Do not attempt INSERT,
    UPDATE, DELETE, or DDL operations. Available tables: issues, projects,
    sprints, tasks, comments.

    Args:
        sql_query: The SQL SELECT query to execute.

    Returns:
        Query results formatted as a text table, or an error message.
    """
    result = db_tool.execute_query(sql_query)
    if "error" in result:
        return f"Error: {result['error']}"
    if not result["rows"]:
        return "The query returned no results."

    # Format as readable table
    columns = result["columns"]
    rows = result["rows"]
    header = " | ".join(columns)
    separator = "-" * len(header)
    body = "\n".join(
        " | ".join(str(row.get(col, "")) for col in columns)
        for row in rows
    )
    return f"Results ({result['row_count']} rows):\n{header}\n{separator}\n{body}"
SQL Injection Prevention
When a tool accepts parameters that will be used in SQL queries, injection protection is absolutely critical. A language model might generate parameters containing malicious SQL code, either because it was influenced by adversarial content in the prompt, or because the user is attempting an intentional attack. The fundamental rule is simple and admits no exceptions: always use parameterized queries, never concatenate strings to build SQL queries.
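With Python's sqlite3 driver this means passing user-derived values as a separate tuple and letting the driver handle escaping. A minimal, self-contained demonstration against an in-memory database (table and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO issues (title) VALUES (?)", ("login timeout",))

# A malicious value stays inert: it is bound as a literal, never parsed as SQL.
user_input = "x'; DROP TABLE issues; --"
rows = conn.execute(
    "SELECT id, title FROM issues WHERE title = ?",  # placeholder, not an f-string
    (user_input,),
).fetchall()
print(rows)  # [] - the injection attempt matched nothing and executed nothing

# The table still exists and still contains its row.
print(conn.execute("SELECT COUNT(*) FROM issues").fetchone()[0])  # 1
```

The same `?` binding works for every value the model supplies; only identifiers (table and column names) cannot be parameterized, which is exactly why they must come from an allowlist as in the SecureDatabaseTool above.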
Tool Composition and Pipelines
The power of agents lies not in individual tools, but in the ability to compose them into complex workflows. A mature agent can chain multiple tools to complete tasks that no single tool could handle alone. Tool composition transforms a collection of simple capabilities into a sophisticated problem-solving system.
Sequential Tool Pipelines
In a sequential pipeline, each tool's output becomes the next tool's input. This pattern is ideal for multi-step workflows where each step depends on the previous result.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class PipelineStep:
    tool_name: str
    transform: Callable[[Any], dict]  # Transforms previous output to next input
    description: str

class ToolPipeline:
    """Compose tools into sequential or conditional pipelines."""

    def __init__(self, registry: dict):
        self.registry = registry
        self.steps: list[PipelineStep] = []

    def add_step(self, tool_name: str, transform=None, description=""):
        self.steps.append(PipelineStep(
            tool_name=tool_name,
            transform=transform or (lambda x: x),
            description=description
        ))
        return self

    def execute(self, initial_input: dict) -> list[dict]:
        """Execute all pipeline steps sequentially."""
        results = []
        current_input = initial_input
        for i, step in enumerate(self.steps):
            # Transform input from previous step
            tool_input = step.transform(current_input)
            # Execute the tool
            tool = self.registry[step.tool_name]
            result = tool.invoke(tool_input)
            results.append({
                "step": i + 1,
                "tool": step.tool_name,
                "description": step.description,
                "input": tool_input,
                "output": result
            })
            current_input = result
        return results

# Example: Bug Analysis Pipeline
pipeline = ToolPipeline(tool_registry)
pipeline.add_step(
    "search_database",
    # Note: interpolating a value into SQL is shown for brevity only;
    # in production, pass user-derived values as bound parameters instead.
    transform=lambda inp: {"sql_query": f"SELECT * FROM issues WHERE title LIKE '%{inp['bug_title']}%'"},
    description="Search for existing reports of this bug"
).add_step(
    "search_codebase",
    transform=lambda prev: {"query": prev, "file_types": ["py", "ts"]},
    description="Find relevant source code files"
).add_step(
    "analyze_code",
    transform=lambda prev: {"code": prev, "checks": ["bugs", "complexity"]},
    description="Analyze the code for potential issues"
)

results = pipeline.execute({"bug_title": "login timeout"})
Conditional Routing
Conditional routing enables the agent to choose different tool paths based on intermediate results. This pattern is essential for workflows where the next action depends on the outcome of the current step.
class ConditionalPipeline:
    """Pipeline with conditional branching based on tool results."""

    def __init__(self, registry: dict):
        self.registry = registry
        self.routes: dict[str, dict] = {}

    def add_route(self, condition_name: str, condition_fn: Callable,
                  steps: list[PipelineStep]):
        """Add a conditional route with its pipeline steps."""
        self.routes[condition_name] = {
            "condition": condition_fn,
            "steps": steps
        }

    def execute(self, initial_input: dict) -> dict:
        """Execute the appropriate route based on conditions."""
        for route_name, route in self.routes.items():
            if route["condition"](initial_input):
                pipeline = ToolPipeline(self.registry)
                pipeline.steps = route["steps"]
                return {
                    "route": route_name,
                    "results": pipeline.execute(initial_input)
                }
        return {"error": "No matching route found"}

# Example: Route based on query type
router = ConditionalPipeline(tool_registry)
router.add_route(
    "code_question",
    condition_fn=lambda x: any(kw in x.get("query", "").lower()
                               for kw in ["code", "function", "class", "bug", "error"]),
    steps=[
        PipelineStep("search_codebase", lambda x: x, "Search code"),
        PipelineStep("analyze_code", lambda x: x, "Analyze findings"),
    ]
)
router.add_route(
    "data_question",
    condition_fn=lambda x: any(kw in x.get("query", "").lower()
                               for kw in ["how many", "count", "statistics", "report"]),
    steps=[
        PipelineStep("query_database", lambda x: x, "Query data"),
        PipelineStep("generate_report", lambda x: x, "Create report"),
    ]
)
Tool Composition Example: Automated Bug Investigation
When a user reports a bug, the agent can automatically orchestrate a sequence of tool calls:
- search_database: check the bug tracker for existing reports of this bug
- search_codebase: find source code files relevant to the affected component
- analyze_code: analyze the code to identify the root cause
- generate_fix: generate a suggested patch based on the analysis
- run_tests: execute tests to verify the fix does not introduce regressions
- create_pull_request: create a PR with the fix and a detailed description
Each tool uses the output of the previous tool as input, creating an automated pipeline that transforms a bug report into a verified pull request.
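The chaining itself is simple: each step takes the accumulated context and returns an enriched copy. The sketch below uses hypothetical stub functions in place of the real tools, just to make the data flow concrete:

```python
# Stub versions of the six tools above; each enriches a shared context dict.
def search_database(ctx):     return {**ctx, "existing_reports": []}
def search_codebase(ctx):     return {**ctx, "files": ["auth/login.py"]}
def analyze_code(ctx):        return {**ctx, "root_cause": "missing null check"}
def generate_fix(ctx):        return {**ctx, "patch": "add null check"}
def run_tests(ctx):           return {**ctx, "tests_passed": True}
def create_pull_request(ctx): return {**ctx, "pr": f"Fix: {ctx['root_cause']}"}

steps = [search_database, search_codebase, analyze_code,
         generate_fix, run_tests, create_pull_request]

ctx = {"bug_report": "Login fails when password field is empty"}
for step in steps:            # each step consumes the previous step's output
    ctx = step(ctx)
print(ctx["pr"])              # → Fix: missing null check
```

In a real agent the model decides at each step whether to continue, but the context-accumulation pattern is the same.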
Error Handling in Tool Execution
Error handling is a critical aspect of AI agent development. A production agent must gracefully handle network errors, timeouts, malformed responses, and iteration limits. Without robust error handling, a single failing tool call can cascade into a complete agent failure, leaving the user without a response.
Structured Error Responses
Every tool should return output in a structured, predictable format. A standard result object with consistent fields allows the agent framework to handle errors uniformly regardless of the tool that produced them.
from dataclasses import dataclass, field
from typing import Any, Optional
from enum import Enum

class ErrorSeverity(str, Enum):
    TRANSIENT = "transient"    # Retryable (timeout, rate limit)
    VALIDATION = "validation"  # Bad input, do not retry
    PERMANENT = "permanent"    # Service down, API key invalid
    UNKNOWN = "unknown"        # Unexpected error

@dataclass
class ToolResult:
    """Standard format for tool execution results."""
    success: bool
    data: Any = None
    error: Optional[str] = None
    error_severity: Optional[ErrorSeverity] = None
    metadata: dict = field(default_factory=dict)

    def to_message(self) -> str:
        """Convert to a message readable by the language model."""
        if self.success:
            if isinstance(self.data, list):
                return f"Found {len(self.data)} results:\n" + \
                       "\n".join(str(item) for item in self.data)
            return str(self.data)
        else:
            severity_hint = ""
            if self.error_severity == ErrorSeverity.TRANSIENT:
                severity_hint = " (this error may resolve on retry)"
            elif self.error_severity == ErrorSeverity.VALIDATION:
                severity_hint = " (check input parameters)"
            return f"Error: {self.error}{severity_hint}"
Retry Logic with Exponential Backoff
import time
import random
from functools import wraps

def with_retry(max_attempts: int = 3, base_delay: float = 1.0,
               max_delay: float = 60.0, jitter: bool = True):
    """Decorator that adds retry logic with exponential backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs) -> ToolResult:
            last_error = None
            for attempt in range(max_attempts):
                try:
                    result = func(*args, **kwargs)
                    if isinstance(result, ToolResult) and not result.success:
                        # Only retry transient errors
                        if result.error_severity != ErrorSeverity.TRANSIENT:
                            return result
                        last_error = result.error
                    else:
                        return result
                except TimeoutError as e:
                    last_error = f"Timeout: {str(e)}"
                except ConnectionError as e:
                    last_error = f"Connection failed: {str(e)}"
                except Exception as e:
                    # Non-transient errors: do not retry
                    return ToolResult(
                        success=False,
                        error=str(e),
                        error_severity=ErrorSeverity.UNKNOWN
                    )
                # Exponential backoff; no sleep needed after the final attempt
                if attempt < max_attempts - 1:
                    delay = min(base_delay * (2 ** attempt), max_delay)
                    if jitter:
                        delay = delay * (0.5 + random.random())
                    time.sleep(delay)
            return ToolResult(
                success=False,
                error=f"Failed after {max_attempts} attempts. Last error: {last_error}",
                error_severity=ErrorSeverity.PERMANENT
            )
        return wrapper
    return decorator
# Usage
import httpx

@with_retry(max_attempts=3, base_delay=1.0)
def call_external_api(endpoint: str, params: dict) -> ToolResult:
    response = httpx.get(endpoint, params=params, timeout=10.0)
    if response.status_code == 429:
        return ToolResult(
            success=False,
            error="Rate limited",
            error_severity=ErrorSeverity.TRANSIENT
        )
    response.raise_for_status()
    return ToolResult(success=True, data=response.json())
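To make the backoff growth concrete, here is the delay schedule the decorator computes (jitter disabled): each attempt doubles the wait, capped at `max_delay`.

```python
# Delay per attempt with base_delay=1.0 and max_delay=60.0:
# delay = min(base_delay * 2**attempt, max_delay)
base_delay, max_delay = 1.0, 60.0
delays = [min(base_delay * (2 ** attempt), max_delay) for attempt in range(7)]
print(delays)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```

With jitter enabled, each delay is scaled by a random factor in [0.5, 1.5), which spreads out retries from many clients and avoids thundering-herd spikes against a recovering service.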
Fallback Strategies
When a tool fails even after retries, the agent needs fallback strategies to continue serving the user. A well-designed fallback system maintains service quality even when individual components are unavailable.
- Alternative tools: if one search tool is down, use another search provider
- Cached results: return previously cached results when the live service is unavailable
- Graceful degradation: respond with available information and clearly communicate what could not be retrieved
- Human escalation: for critical operations, escalate to a human operator when automated tools fail
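The first three strategies compose naturally into a fallback chain. The sketch below is a minimal illustration (the provider functions and cache are hypothetical stand-ins, not a specific library API):

```python
# Try providers in order, refresh the cache on success, and fall back to
# cached results, then to graceful degradation, when everything fails.
from typing import Callable, Optional

def search_with_fallback(query: str,
                         providers: list[Callable[[str], Optional[str]]],
                         cache: dict) -> str:
    for provider in providers:
        try:
            result = provider(query)
            if result is not None:
                cache[query] = result          # refresh cache on success
                return result
        except ConnectionError:
            continue                           # alternative tool: try the next provider
    if query in cache:
        return f"(cached) {cache[query]}"      # stale, but better than nothing
    return "Search is currently unavailable; answering from model knowledge only."

def primary(q):   raise ConnectionError("provider down")
def secondary(q): return f"results for {q!r}"

cache: dict = {}
print(search_with_fallback("python tutorial", [primary, secondary], cache))
```

The ordering matters: live providers first, cache second, degradation message last, so the agent always returns *something* the model can relay to the user.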
Tool Testing
Tools must be tested independently before being integrated into an agent. Testing tools separately from the LLM ensures that failures are attributable to the tool implementation rather than to model behavior, making debugging significantly easier.
Unit Testing Tools
Each tool should have comprehensive unit tests covering normal operation, edge cases, and error conditions. External dependencies should be mocked to ensure tests are fast, deterministic, and independent of external service availability.
import pytest
from unittest.mock import patch, MagicMock

# Test the search tool
class TestWebSearchTool:
    def test_successful_search(self):
        """Test that search returns formatted results."""
        mock_results = {
            "results": [
                {"title": "Python Guide",
                 "content": "A comprehensive guide to Python...",
                 "url": "https://example.com"},
                {"title": "Python Tutorial",
                 "content": "Learn Python step by step...",
                 "url": "https://example2.com"},
            ]
        }
        with patch("tavily.TavilyClient") as MockClient:
            MockClient.return_value.search.return_value = mock_results
            result = web_search.invoke({"query": "python tutorial", "max_results": 2})
            assert "Python Guide" in result
            assert "Python Tutorial" in result

    def test_empty_results(self):
        """Test handling of no search results."""
        with patch("tavily.TavilyClient") as MockClient:
            MockClient.return_value.search.return_value = {"results": []}
            result = web_search.invoke({"query": "xyznonexistent", "max_results": 5})
            assert "No results found" in result

    def test_api_error_handling(self):
        """Test graceful handling of API errors."""
        with patch("tavily.TavilyClient") as MockClient:
            MockClient.return_value.search.side_effect = ConnectionError("API unavailable")
            with pytest.raises(ConnectionError):
                web_search.invoke({"query": "test", "max_results": 5})
# Test the database tool
class TestDatabaseTool:
    def setup_method(self):
        """Create an in-memory test database."""
        import sqlite3
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE issues (id INTEGER, title TEXT, status TEXT)"
        )
        self.conn.execute(
            "INSERT INTO issues VALUES (1, 'Login bug', 'open')"
        )
        self.conn.execute(
            "INSERT INTO issues VALUES (2, 'UI glitch', 'closed')"
        )
        self.conn.commit()
        # Bypass __init__ so the tool can be wired to the in-memory database
        self.db_tool = SecureDatabaseTool.__new__(SecureDatabaseTool)
        self.db_tool.read_only = True
        self.db_tool.audit_log = []

    def test_select_query(self):
        """Test that SELECT queries execute successfully."""
        result = self.db_tool.execute_query(
            "SELECT * FROM issues WHERE status = ?", ("open",)
        )
        assert result["success"] is True
        assert len(result["rows"]) == 1
        assert result["rows"][0]["title"] == "Login bug"

    def test_blocked_write_query(self):
        """Test that write queries are blocked in read-only mode."""
        result = self.db_tool.execute_query(
            "DELETE FROM issues WHERE id = 1"
        )
        assert "error" in result
        assert "Only SELECT" in result["error"]

    def test_blocked_table_access(self):
        """Test that access to unauthorized tables is denied."""
        result = self.db_tool.execute_query(
            "SELECT * FROM passwords"
        )
        assert "error" in result
        assert "Access denied" in result["error"]

    def test_automatic_limit(self):
        """Test that LIMIT is added when missing."""
        # This test would verify the modified query includes LIMIT
        result = self.db_tool.execute_query("SELECT * FROM issues")
        assert result["success"] is True
        assert len(self.db_tool.audit_log) == 1
Mocking External Services
When tools depend on external services (APIs, databases, web pages), mocking is essential for creating reliable, fast tests. The key principle is to mock at the boundary: replace the external HTTP call or database connection, not internal logic.
import pytest
from unittest.mock import patch, MagicMock
import httpx

class TestGitHubIssueTool:
    """Test suite for the GitHub issues tool with mocked API."""

    @pytest.fixture
    def mock_github_response(self):
        return [
            {
                "number": 42,
                "title": "Authentication fails on timeout",
                "state": "open",
                "labels": [{"name": "bug"}, {"name": "priority-high"}],
                "created_at": "2026-01-15T10:30:00Z",
                "comments": 5
            },
            {
                "number": 43,
                "title": "Add dark mode support",
                "state": "open",
                "labels": [{"name": "enhancement"}],
                "created_at": "2026-01-16T14:00:00Z",
                "comments": 2
            },
        ]

    @patch("httpx.get")
    def test_list_issues_success(self, mock_get, mock_github_response):
        """Test successful issue listing."""
        mock_response = MagicMock()
        mock_response.json.return_value = mock_github_response
        mock_response.raise_for_status = MagicMock()
        mock_get.return_value = mock_response

        result = list_github_issues.invoke({
            "repo": "owner/repo",
            "state": "open",
            "per_page": 10
        })
        assert "#42" in result
        assert "Authentication fails on timeout" in result
        assert "bug" in result
        mock_get.assert_called_once()

    @patch("httpx.get")
    def test_rate_limit_handling(self, mock_get):
        """Test handling of GitHub rate limit (HTTP 429)."""
        mock_response = MagicMock()
        mock_response.status_code = 429
        mock_response.raise_for_status.side_effect = httpx.HTTPStatusError(
            "Rate limited", request=MagicMock(), response=mock_response
        )
        mock_response.text = "API rate limit exceeded"
        mock_get.return_value = mock_response

        result = list_github_issues.invoke({
            "repo": "owner/repo",
            "state": "open"
        })
        assert "error" in result.lower() or "rate" in result.lower()

    @patch("httpx.get")
    def test_empty_repository(self, mock_get):
        """Test handling of a repository with no issues."""
        mock_response = MagicMock()
        mock_response.json.return_value = []
        mock_response.raise_for_status = MagicMock()
        mock_get.return_value = mock_response

        result = list_github_issues.invoke({
            "repo": "owner/empty-repo",
            "state": "open"
        })
        assert "No" in result and "issues" in result
Testing Best Practices for Tools
- Test independently: each tool should have its own test suite, separate from agent-level tests
- Mock at boundaries: replace external HTTP calls and database connections, not internal logic
- Test error paths: verify that tools return meaningful error messages for all failure modes
- Test validation: ensure Pydantic schemas reject invalid inputs with clear error messages
- Test edge cases: empty results, maximum-length inputs, special characters, unicode content
- Use fixtures: create reusable test data and mock responses with pytest fixtures
- Test idempotency: tools should produce consistent results when called multiple times with the same input
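The idempotency item is easy to overlook, so here is a minimal sketch. It assumes a pure formatting helper (`format_results` is hypothetical) of the kind a search tool would use to render results; such helpers should be deterministic so repeated calls with the same input produce byte-identical output:

```python
# A pure formatting helper: deterministic, so trivially testable for idempotency.
def format_results(results: list[dict]) -> str:
    if not results:
        return "No results found"
    return "\n".join(f"{r['title']}: {r['url']}" for r in results)

sample = [{"title": "Python Guide", "url": "https://example.com"}]
first, second = format_results(sample), format_results(sample)
assert first == second                          # idempotent: same input, same output
assert format_results([]) == "No results found" # edge case: empty results
print("idempotency check passed")
```

Tools that touch external state (timestamps, random IDs, live APIs) should isolate that state behind mocks so the remaining logic can be tested this way.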
Best Practices for Tool Design
The quality of an agent's tool calling depends heavily on how well the tools are designed. A well-named tool with a clear description and intuitive parameters will be invoked correctly by the model far more often than a poorly designed one. These best practices are drawn from production experience with thousands of tool invocations.
Naming Conventions
Tool names should follow the verb_noun pattern, making them self-explanatory and reducing ambiguity. The name is the first thing the model sees when deciding which tool to use, so clarity is paramount.
Tool Naming Guidelines
| Pattern | Good Example | Bad Example | Why |
|---|---|---|---|
| verb_noun | search_database | database | Action is clear from the name |
| Specific action | create_github_issue | github | Specifies which GitHub operation |
| Domain prefix | jira_list_tickets | list_things | Domain context helps selection |
| Scope clarity | read_file_content | file | Distinguishes from write/delete |
| Avoid abbreviations | calculate_statistics | calc_stats | Full words are clearer for the LLM |
Writing Effective Descriptions
The tool description is the most critical field in the tool definition. The model uses it to decide when to invoke the tool and how to provide parameters. A well-written description should answer three questions: What does this tool do? When should the model use it? When should it NOT use it?
# BAD: Vague, unhelpful description
@tool
def search(query: str) -> str:
    """Search for stuff."""
    pass

# GOOD: Specific, actionable description with boundaries
@tool
def search_project_documentation(
    query: str,
    section: str = "all"
) -> str:
    """Search the project's internal documentation for guides, API references,
    and architecture decisions.

    USE this tool when:
    - The user asks about how a specific feature works
    - The user needs API endpoint details or parameters
    - Questions about architecture decisions or design patterns used

    DO NOT use this tool when:
    - The user asks general programming questions (answer directly)
    - The user asks about external libraries (use web_search instead)
    - The question is about project issues or bugs (use search_issues instead)

    Args:
        query: Natural language search query. Be specific and include
            relevant technical terms for best results.
        section: Documentation section to search in. Options: 'all',
            'api', 'architecture', 'guides', 'changelog'.

    Returns:
        Top matching documentation excerpts with section references.
        Returns "No results found" if no matches exist.
    """
    pass
Parameter Design Principles
Well-designed parameters make it easy for the model to generate correct tool calls. Follow these principles for parameter design:
- Use enums for constrained values: when a parameter has a fixed set of valid options, use an `enum` field. This prevents the model from inventing invalid values and makes the options explicit
- Provide sensible defaults: optional parameters should have default values that work well for the most common use case. This reduces the cognitive burden on the model
- Include examples in descriptions: concrete examples of valid parameter values are more informative than abstract type definitions
- Set bounds for numeric parameters: always specify `minimum` and `maximum` for numeric fields to prevent unreasonable values
- Keep the parameter count low: tools with more than 5-6 parameters are harder for models to use correctly. Consider breaking them into multiple focused tools
- Use descriptive parameter names: `file_path` is better than `path`, `max_results` is better than `limit` or `n`
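The principles above can be expressed directly in a tool's JSON Schema. The fragment below sketches a hypothetical documentation-search tool: an enum constrains `section`, `max_results` carries bounds and a default, and the description embeds a concrete example value:

```python
# JSON Schema for a hypothetical search tool, applying the principles above.
search_docs_schema = {
    "name": "search_project_documentation",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Natural language query, e.g. 'auth token refresh flow'"
            },
            "section": {
                "type": "string",
                "enum": ["all", "api", "architecture", "guides", "changelog"],
                "default": "all"           # sensible default for the common case
            },
            "max_results": {
                "type": "integer",
                "minimum": 1,              # bounds prevent unreasonable values
                "maximum": 20,
                "default": 5
            }
        },
        "required": ["query"]              # only one required parameter
    }
}
print(search_docs_schema["parameters"]["properties"]["section"]["enum"])
```

Note the parameter count: one required field plus two bounded optionals keeps the schema small enough for the model to fill in reliably.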
Tool Granularity
Finding the right level of granularity is one of the most important decisions in tool design. Tools that are too broad try to do everything and become complex to use. Tools that are too narrow require many calls for simple operations and waste tokens.
# TOO BROAD: One tool tries to do everything
@tool
def manage_project(action: str, target: str, data: dict) -> str:
    """Manage all project operations: create issues, update tasks,
    assign users, close sprints, generate reports..."""
    pass  # Complex switch-case logic

# TOO NARROW: Excessive fragmentation
@tool
def get_issue_title(issue_id: int) -> str:
    """Get only the title of an issue."""
    pass

@tool
def get_issue_status(issue_id: int) -> str:
    """Get only the status of an issue."""
    pass

@tool
def get_issue_assignee(issue_id: int) -> str:
    """Get only the assignee of an issue."""
    pass

# JUST RIGHT: Focused but complete
@tool
def get_issue_details(issue_id: int, fields: list[str] = None) -> str:
    """Get details of a specific issue by ID.

    Args:
        issue_id: The unique identifier of the issue.
        fields: Optional list of fields to return. If not specified,
            returns all fields. Options: title, status, assignee,
            priority, labels, created_at, updated_at, description.

    Returns:
        Issue details as formatted text.
    """
    pass

@tool
def search_issues(query: str, status: str = "open", limit: int = 10) -> str:
    """Search for issues matching a query."""
    pass

@tool
def create_issue(title: str, description: str, labels: list[str] = None) -> str:
    """Create a new issue in the project tracker."""
    pass
The Golden Rule of Tool Design
Design each tool as if you were creating a function for a colleague who has never seen your codebase. The name should explain what it does. The description should explain when to use it. The parameters should be self-documenting with types, constraints, and examples. If a developer would need to read the implementation to understand how to use the tool, the specification needs improvement.
Advanced Patterns: Dynamic Tool Discovery
In advanced agents, the set of available tools is not fixed: it can change at runtime based on context, user permissions, or external service availability. Dynamic tool discovery is the mechanism that allows the agent to discover new tools and register them dynamically without restarting the system.
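The registration half of this is straightforward: keep a live registry that tools can join or leave at runtime, and rebuild the tool list exposed to the model from it on every call. A minimal sketch (class and method names are illustrative, not a specific framework API):

```python
# Minimal runtime tool registry: tools can be added or removed while the
# agent is running; the model's tool list is rebuilt from the live registry.
from typing import Callable

class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, dict] = {}

    def register(self, spec: dict, handler: Callable):
        """Add or replace a tool; takes effect on the next model call."""
        self._tools[spec["name"]] = {"spec": spec, "handler": handler}

    def unregister(self, name: str):
        """Remove a tool, e.g. when its backing service goes down."""
        self._tools.pop(name, None)

    def specs(self) -> list[dict]:
        """Tool definitions to expose to the model on the next call."""
        return [t["spec"] for t in self._tools.values()]

registry = ToolRegistry()
registry.register({"name": "search_issues", "description": "Search the issue tracker"},
                  lambda query: [])
print([s["name"] for s in registry.specs()])  # → ['search_issues']
```

Because the model only ever sees the output of `specs()`, registering or unregistering a tool requires no restart and no change to the agent loop itself.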
Semantic Tool Selection
When the agent has access to many tools (dozens or hundreds), it is impractical to include them all in the prompt: they would consume too many tokens and confuse the model. Semantic tool selection dynamically chooses only the relevant tools for the current query, based on the semantic similarity between the tool description and the user request.
from numpy import dot
from numpy.linalg import norm

def cosine_similarity(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (norm(a) * norm(b))

class DynamicToolSelector:
    """Selects the most relevant tools for the current query."""

    def __init__(self, all_tools: list[dict], embedding_model):
        self.all_tools = all_tools
        self.embedding_model = embedding_model
        self._tool_embeddings: dict[str, list[float]] = {}
        self._build_index()

    def _build_index(self):
        """Pre-compute embeddings for all tool descriptions."""
        for tool in self.all_tools:
            text = f"{tool['name']}: {tool['description']}"
            self._tool_embeddings[tool["name"]] = self.embedding_model.embed(text)

    def select(self, query: str, max_tools: int = 10) -> list[dict]:
        """Select the most relevant tools for a given query."""
        query_embedding = self.embedding_model.embed(query)
        scores = []
        for tool_name, tool_embedding in self._tool_embeddings.items():
            similarity = cosine_similarity(query_embedding, tool_embedding)
            scores.append((tool_name, similarity))
        scores.sort(key=lambda x: x[1], reverse=True)
        selected_names = {name for name, _ in scores[:max_tools]}
        return [t for t in self.all_tools if t["name"] in selected_names]

# Usage in an agent
selector = DynamicToolSelector(all_available_tools, embedding_model)

# For each user query, select only the relevant tools
relevant_tools = selector.select("How many open bugs do we have?", max_tools=5)
# Returns: [query_database, search_issues, get_statistics, ...]
# instead of all 50+ available tools
Benefits of Dynamic Tool Discovery
- Scalability: supports hundreds of tools without saturating the context window
- Precision: the model sees only relevant tools, reducing confusion and errors
- Extensibility: new tools can be added at runtime without modifying the agent code
- Security: tools can be filtered based on user permissions
- Cost reduction: fewer tools in the prompt means fewer tokens consumed per call
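The security item deserves a concrete shape: permission filtering should run *before* semantic selection, so a restricted user's model never even sees privileged tools. A hedged sketch (the `required_permission` key and permission names are hypothetical conventions, not a standard field):

```python
# Filter the tool list by user permissions before passing it to the selector.
def filter_by_permissions(tools: list[dict], user_perms: set[str]) -> list[dict]:
    # Tools without a required_permission are treated as public.
    return [t for t in tools
            if t.get("required_permission", "public") in user_perms | {"public"}]

tools = [
    {"name": "search_docs"},                                      # public
    {"name": "delete_records", "required_permission": "admin"},   # privileged
]

visible = filter_by_permissions(tools, {"read"})
print([t["name"] for t in visible])  # → ['search_docs']
```

Filtering at this layer is a real security boundary only if tool *execution* also re-checks permissions; the prompt-side filter mainly prevents the model from attempting calls that would be rejected anyway.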
Conclusion
Tool calling is the mechanism that transforms AI agents from passive text generators into active orchestrators capable of acting in the real world. The quality of tool calling depends on three pillars: precise definitions with detailed JSON Schema, rigorous validation of inputs to prevent errors and attacks, and robust error handling with appropriate retry and fallback strategies.
We have explored how to integrate REST APIs, web scraping, and databases securely, how to build a reusable tool framework with dynamic discovery, how to compose tools into pipelines, and how to test tools independently with proper mocking. These patterns form the foundation for building reliable, scalable production agents.
The best practices for tool naming, description writing, and parameter design are not mere conventions: they directly impact how effectively the language model can select and invoke tools. A well-designed tool ecosystem is the difference between an agent that occasionally makes useful tool calls and one that consistently executes complex multi-step workflows with precision.
In the next article, we will tackle testing AI agents: how to test tool calling flows end-to-end, how to simulate model responses, how to measure the quality of agent decisions, and how to implement regression tests to ensure that changes do not introduce unexpected behaviors.