Tool use (function calling)

Status: 🟩 COMPLETE
Last updated: 2026-06-19
Plain-English tagline: The feature that lets LLMs do things in the real world, not just talk. You tell the model “here are some functions you can call”; the model decides when to call them. This single feature is what turns chatbots into agents.


In plain English

A plain LLM can only produce text. That’s it. It can write you a function in Python, but it can’t run that function. It can describe a Google search, but it can’t do the search. It can write an email, but it can’t send it.

Tool use changes that. You tell the model: “here are the tools you have access to — each one with a name, a description of what it does, and a description of the inputs it needs.” Then, when you give the model a task, it can decide to call a tool. Your code receives the call, actually runs the tool (e.g. searches Google, sends the email), and feeds the result back to the model. The model uses the result to continue working — possibly calling more tools — until the task is done.

Tool use is also called function calling (OpenAI’s name) or just tools. Same idea.

It’s a simple mechanism, but it is arguably the most important capability LLMs have gained since the transformer itself, because it is the gateway to agency. Once an LLM can take actions, it stops being a fancy autocomplete and starts being a worker.


Why it matters

Most of the interesting AI products built since 2023 rely on tool use:

  • ChatGPT uses it for web search, code interpreter, image generation, custom GPTs
  • Claude Code uses it for Read, Edit, Write, Bash, Grep, WebSearch — literally every action it takes is a tool call
  • Cursor / Windsurf / Aider use it for editing files and running terminal commands
  • MCP (Model Context Protocol) servers form an entire ecosystem of pre-built tools that any MCP-compatible client can plug into
  • Customer support agents, research assistants, code reviewers: all of them rely on tool use

Without tool use, an LLM can describe a solution. With tool use, the LLM can deliver the solution. That’s the difference.


How it actually works — the loop

The basic pattern is a loop between your code and the model:

1. You call the model with:
   - Your task ("Summarize the top 5 Hacker News posts from today")
   - A list of available tools (e.g. fetch_url, get_current_time)

2. The model responds with one of two things:
   (a) A normal text response (the task is done)
   (b) A tool_call: "I want to call fetch_url with these arguments"

3. If it's a text response → you're done.
   If it's a tool_call → your code:
   - Actually runs the tool
   - Captures the result
   - Sends it back to the model with the conversation history

4. Goto step 2.

This loop continues until the model decides it has enough information to answer in plain text.
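
The loop above fits in a few dozen lines of TypeScript. This is a minimal sketch rather than production code: `callModel` is a hypothetical stand-in for a real API request (here a scripted fake so the example runs offline), the block types are simplified versions of the message shapes used later in this note, and the `maxTurns` cap is an assumed safety limit.

```typescript
// Simplified content blocks, loosely modelled on the message shapes used in this note.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };
type Block = TextBlock | ToolUseBlock | ToolResultBlock;
type Message = { role: "user" | "assistant"; content: string | Block[] };
type ModelReply = { stop_reason: "tool_use" | "end_turn"; content: Block[] };

// Hypothetical model call; real code would send `messages` (plus tool definitions) to an API.
type CallModel = (messages: Message[]) => Promise<ModelReply>;
type ToolFn = (input: Record<string, unknown>) => Promise<unknown>;

async function runAgentLoop(
  task: string,
  callModel: CallModel,
  tools: Record<string, ToolFn>,
  maxTurns = 10, // assumed cap so a confused model cannot loop forever
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: task }];
  for (let turn = 0; turn < maxTurns; turn++) {
    // Steps 1-2: ask the model, and keep its reply in the history.
    const reply = await callModel(messages);
    messages.push({ role: "assistant", content: reply.content });
    if (reply.stop_reason !== "tool_use") {
      // Step 3, text branch: the task is done.
      return reply.content
        .filter((b): b is TextBlock => b.type === "text")
        .map((b) => b.text)
        .join("");
    }
    // Step 3, tool branch: our code runs each requested tool and captures the result.
    const results: ToolResultBlock[] = [];
    for (const block of reply.content) {
      if (block.type !== "tool_use") continue;
      const tool = tools[block.name];
      const output = tool ? await tool(block.input) : { error: `Unknown tool: ${block.name}` };
      results.push({ type: "tool_result", tool_use_id: block.id, content: JSON.stringify(output) });
    }
    // Step 4: send the results back and go round again.
    messages.push({ role: "user", content: results });
  }
  throw new Error(`No final answer after ${maxTurns} turns`);
}

// Scripted fake model: requests get_current_time once, then answers from the tool result.
const fakeModel: CallModel = async (messages) => {
  const last = messages[messages.length - 1];
  if (typeof last.content === "string") {
    return { stop_reason: "tool_use", content: [{ type: "tool_use", id: "t1", name: "get_current_time", input: {} }] };
  }
  const result = last.content.find((b): b is ToolResultBlock => b.type === "tool_result");
  return { stop_reason: "end_turn", content: [{ type: "text", text: `The tool said ${result?.content}` }] };
};
```

For example, `runAgentLoop("What time is it?", fakeModel, { get_current_time: async () => "12:00" })` resolves to `The tool said "12:00"`. Replacing `fakeModel` with a function that makes a real API request leaves the structure of the loop unchanged.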

The critical insight: the model never actually runs the tool itself. It just produces a structured request saying “I’d like to call this tool with these arguments.” Your code is the one that executes the tool. That separation is what makes this safe and flexible — your code can refuse to run dangerous tools, ask the user first, log everything, or substitute mock data for testing.
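
A minimal sketch of that separation in TypeScript. `executeToolCall`, the `allow` / `ask` / `deny` policy values, the `confirm` callback, and the log strings are all hypothetical names chosen for illustration; nothing here is part of an SDK. The point is that every decision happens in your code, after the model has only asked.

```typescript
type ToolCall = { id: string; name: string; input: Record<string, unknown> };
type Policy = "allow" | "ask" | "deny";

interface ToolEntry {
  run: (input: Record<string, unknown>) => string;
  policy: Policy;
}

// The model only *requests* a call; this function decides whether it actually runs.
function executeToolCall(
  call: ToolCall,
  registry: Map<string, ToolEntry>,
  confirm: (call: ToolCall) => boolean, // e.g. a yes/no prompt shown to the user
  log: string[],
): string {
  // Log every request, including ones we end up refusing.
  log.push(`requested ${call.name} ${JSON.stringify(call.input)}`);
  const entry = registry.get(call.name);
  if (!entry) return `Error: no tool named ${call.name}`;
  if (entry.policy === "deny") return `Error: ${call.name} is disabled`;
  if (entry.policy === "ask" && !confirm(call)) return `Error: the user declined ${call.name}`;
  return entry.run(call.input);
}

// For testing, swap real implementations for mocks without changing anything on the model side.
const mockRegistry = new Map<string, ToolEntry>([
  ["get_weather", { policy: "allow", run: (input) => `18°C and partly cloudy in ${String(input.city)}` }],
  ["send_email", { policy: "ask", run: () => "sent" }],
  ["delete_database", { policy: "deny", run: () => "deleted" }],
]);
```

Refusals come back as ordinary strings, so they can be returned to the model as tool results and it can adjust its plan.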


What a tool definition looks like

Each tool you give the model has three parts:

{
  name: "fetch_url",
  description: "Fetch the contents of a web page given its URL. Returns the HTML.",
  input_schema: {
    type: "object",
    properties: {
      url: {
        type: "string",
        description: "The full URL to fetch (must start with http:// or https://)"
      }
    },
    required: ["url"]
  }
}
  • name: a short identifier the model uses to call the tool.
  • description: natural-language text telling the model what the tool does and when to use it. The model reads this to decide whether the tool fits the current task, so the quality of the description directly affects how well the model uses the tool.
  • input_schema: a JSON Schema describing the arguments. The model is trained to produce input matching this schema, so in the common case you can rely on the structure (“did the model give me a URL?”). It is not an absolute guarantee, though (see the schema-strictness gotcha below), so validate before executing.
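
Checking the model's arguments before running a tool is cheap insurance. The sketch below is a deliberately tiny, hypothetical validator (`checkArgs`) that understands only flat `object` schemas with primitive `type`s and `required`; for real schemas, a full JSON Schema validator library such as Ajv is the safer choice.

```typescript
type PrimitiveType = "string" | "number" | "integer" | "boolean";

interface FlatObjectSchema {
  type: "object";
  properties: Record<string, { type: PrimitiveType; description?: string }>;
  required?: string[];
}

// Returns a list of problems; an empty list means the input passed these checks.
function checkArgs(schema: FlatObjectSchema, input: unknown): string[] {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return ["input must be a JSON object"];
  }
  const args = input as Record<string, unknown>;
  const problems: string[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in args)) problems.push(`missing required field "${key}"`);
  }
  for (const [key, value] of Object.entries(args)) {
    const prop = schema.properties[key];
    if (!prop) {
      // Stricter than JSON Schema's default, which allows extra properties
      // unless additionalProperties is false; useful for catching invented fields.
      problems.push(`unexpected field "${key}"`);
      continue;
    }
    const ok = prop.type === "integer" ? Number.isInteger(value) : typeof value === prop.type;
    if (!ok) problems.push(`field "${key}" should be ${prop.type}`);
  }
  return problems;
}

// The fetch_url schema from the definition above.
const fetchUrlSchema: FlatObjectSchema = {
  type: "object",
  properties: {
    url: { type: "string", description: "The full URL to fetch (must start with http:// or https://)" },
  },
  required: ["url"],
};
```

A non-empty result is a good candidate to send back to the model as an error tool_result rather than running the tool.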

Good tool descriptions look like:

“Search the web using DuckDuckGo. Use this when the user asks about current events, recent news, or anything that may have happened after your training cutoff. Returns the top 5 results with titles and snippets.”

Bad tool descriptions look like:

“Search the web.”

The model reads the description the way a human reads a tooltip: it’s how it decides which tool fits.


A concrete example: getting weather

Imagine you want to build an assistant that can answer “what’s the weather in Sydney?”

You define one tool:

const tools = [{
  name: "get_weather",
  description: "Get the current weather for a city. Returns temperature in Celsius and a short description.",
  input_schema: {
    type: "object",
    properties: {
      city: { type: "string", description: "The city name, e.g. 'Sydney' or 'New York'" }
    },
    required: ["city"]
  }
}];

You implement the actual function:

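async function getWeather({ city }) {
  // encodeURIComponent keeps names like "New York" or "São Paulo" URL-safe
  const res = await fetch(`https://api.weatherapi.com/v1/current.json?key=...&q=${encodeURIComponent(city)}`);
  // Report failures as data so the model can see them and react
  if (!res.ok) return { error: `Weather API returned HTTP ${res.status}` };
  const data = await res.json();
  // Return only what the model needs; smaller results cost fewer tokens
  return { temp_c: data.current.temp_c, description: data.current.condition.text };
}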

You make the API call:

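import Anthropic from "@anthropic-ai/sdk";

// The client reads ANTHROPIC_API_KEY from the environment
const anthropic = new Anthropic();

const response = await anthropic.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  tools,
  messages: [{ role: "user", content: "What's the weather in Sydney?" }]
});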

The model replies (trimmed to the relevant fields):

{
  "stop_reason": "tool_use",
  "content": [
    { "type": "text", "text": "I'll check the weather in Sydney for you." },
    { "type": "tool_use", "id": "tool_123", "name": "get_weather", "input": { "city": "Sydney" } }
  ]
}

Your code calls getWeather({ city: "Sydney" }), gets { temp_c: 18, description: "Partly cloudy" }, and sends it back to the model:

const next = await anthropic.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  tools,
  messages: [
    { role: "user", content: "What's the weather in Sydney?" },
    { role: "assistant", content: response.content },
    { role: "user", content: [{ type: "tool_result", tool_use_id: "tool_123", content: '{"temp_c":18,"description":"Partly cloudy"}' }] }
  ]
});

The model now replies in plain text:

“It’s currently 18°C and partly cloudy in Sydney.”

Done.

That entire pattern — define tools, send message, handle tool calls, feed results back — is the whole of tool use. Everything fancier (multi-step agents, complex workflows) is just running that loop more times with more tools.


Parallel tool calls

Recent Claude models can request several tool calls in a single turn:

{
  "stop_reason": "tool_use",
  "content": [
    { "type": "tool_use", "id": "1", "name": "get_weather", "input": { "city": "Sydney" } },
    { "type": "tool_use", "id": "2", "name": "get_weather", "input": { "city": "Melbourne" } },
    { "type": "tool_use", "id": "3", "name": "get_weather", "input": { "city": "Brisbane" } }
  ]
}

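Your code runs all three in parallel and sends the three results back together in a single user message, one tool_result block per tool_use id. For agents that need several independent pieces of information, this cuts latency substantially compared with one call per turn.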
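
A minimal sketch of the parallel case, assuming async tool implementations: start every call with `Promise.all`, then answer with one user message holding one `tool_result` per `tool_use` id. Because all three calls in the example target get_weather, the sketch dispatches only that tool; `getWeather` here is a fake with a simulated 50 ms delay, and `runToolsInParallel` is a hypothetical helper name.

```typescript
type ToolUse = { type: "tool_use"; id: string; name: string; input: { city: string } };
type ToolResult = { type: "tool_result"; tool_use_id: string; content: string };

// Fake weather lookup; the timer stands in for network latency.
async function getWeather(input: { city: string }): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, 50));
  return JSON.stringify({ city: input.city, temp_c: 18 });
}

// Start every call at once; Promise.all resolves with results in the same order as `calls`.
async function runToolsInParallel(calls: ToolUse[]): Promise<{ role: "user"; content: ToolResult[] }> {
  const outputs = await Promise.all(calls.map((call) => getWeather(call.input)));
  return {
    role: "user",
    // Pair each output with the id of the call that produced it.
    content: calls.map((call, i): ToolResult => ({ type: "tool_result", tool_use_id: call.id, content: outputs[i] })),
  };
}
```

With three 50 ms lookups, the parallel version waits roughly 50 ms in total instead of about 150 ms for a sequential loop. If any tool rejects, `Promise.all` rejects too; `Promise.allSettled` is the alternative when each failure should become its own error result.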


Tool use vs structured output

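Sometimes you don’t need a real tool; you just want the model to produce JSON in a specific shape. Tool use works for this too: define a “tool” called submit_answer with the schema you want, then tell the model to use it (or force it through the API’s tool_choice setting). The tool input the model produces is your JSON, shaped by your schema.

This is usually more reliable than asking the model to “respond in JSON”, because the model is trained to fill in tool inputs according to the input_schema. Still validate the result, as with any tool input.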
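
A sketch of the submit_answer pattern, assuming the Anthropic Messages API's `tool_choice` parameter, where `{ type: "tool", name }` forces the model to call that tool. The request is built as a plain object so the example runs offline; in real code you would pass an object like this to `messages.create`. The sentiment schema, `buildRequest`, and `extractAnswer` are made-up examples.

```typescript
// Hypothetical output shape we want from the model.
interface Answer {
  sentiment: "positive" | "negative" | "neutral";
  confidence: number;
}

const submitAnswerTool = {
  name: "submit_answer",
  description: "Submit the final sentiment classification. Call this exactly once.",
  input_schema: {
    type: "object" as const,
    properties: {
      sentiment: { type: "string", enum: ["positive", "negative", "neutral"] },
      confidence: { type: "number", description: "Confidence between 0 and 1" },
    },
    required: ["sentiment", "confidence"],
  },
};

// Request body: tool_choice forces the model to answer through submit_answer.
function buildRequest(text: string) {
  return {
    model: "claude-sonnet-4-6",
    max_tokens: 256,
    tools: [submitAnswerTool],
    tool_choice: { type: "tool" as const, name: "submit_answer" },
    messages: [{ role: "user" as const, content: `Classify the sentiment of: ${text}` }],
  };
}

type Block = { type: string; name?: string; input?: unknown };

// Read the structured answer out of the response content, with a light sanity check.
function extractAnswer(content: Block[]): Answer {
  const call = content.find((b) => b.type === "tool_use" && b.name === "submit_answer");
  if (!call) throw new Error("model did not call submit_answer");
  const input = (call.input ?? {}) as Partial<Answer>;
  const validSentiment = ["positive", "negative", "neutral"].includes(String(input.sentiment));
  if (!validSentiment || typeof input.confidence !== "number") {
    throw new Error("submit_answer input did not match the schema");
  }
  return input as Answer;
}
```

Nothing is "executed" here: the tool call itself is the output, so your code reads its input instead of running anything.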


The tool-use loop in Claude Code (a real example)

When you ask Claude Code to “fix this bug”:

  1. Tool call: Glob({ pattern: "**/*.ts" }) → list all TypeScript files
  2. Tool call: Read({ file_path: "src/auth.ts" }) → read the suspected file
  3. Tool call: Grep({ pattern: "expires", path: "src" }) → search for related code
  4. Tool call: Read({ file_path: "src/session.ts" }) → read another suspected file
  5. Tool call: Edit({ file_path: "src/session.ts", old_string: "...", new_string: "..." }) → apply a fix
  6. Tool call: Bash({ command: "npm test" }) → verify
  7. Text response: “I fixed the session expiry bug. Tests pass.”

Six tool calls, then a final text answer. Each call was a decision by the model: which tool, with what arguments, given the current state of the conversation. The loop continued until the model judged the job done.

This is what “agentic” means in practice — the LLM in the driver’s seat, your code as the engine and steering wheel.


Common gotchas

  • Tool description quality matters more than you think. A vague description is the #1 cause of “the model isn’t using my tool.” Treat tool descriptions like documentation written for a smart but unfamiliar developer.

  • Schema strictness varies. Most models conform to the input schema in the vast majority of calls. Failures cluster in edge cases such as deeply nested schemas, or free-form fields that the description encourages the model to fill loosely. Validate arguments on your side too.

  • The model can hallucinate tools that don’t exist. If your prompt or a tool description mentions search_web but you only registered get_weather, a search-related request may lead the model to call search_web anyway. Always check that the requested tool name exists before executing.

  • Don’t give too many tools. Tool-selection accuracy tends to drop as the list grows; a common rule of thumb is to be wary beyond roughly 20–30 tools. Group related operations into multi-purpose tools, or use a “select a tool category first” pattern.

  • Tool descriptions are tokens. Long descriptions for every tool add up — both in input cost and context-window space. Be concise.

  • Cycles can occur. If the model keeps calling the same tool with similar inputs, it might be stuck. Add a max-iterations check in your loop, and consider feeding the model a meta-message (“you’ve tried this 3 times — try a different approach”).

  • Race conditions in parallel tool calls. If the model calls two tools that depend on each other, parallel execution can produce stale or wrong results. Use sequential tool calls for dependent operations.

  • Tool errors should be returned, not thrown. When a tool fails (timeout, 404, etc.), return the error message as a tool_result; the Anthropic API also lets you flag it with is_error: true. The model can often recover by trying a different approach.

  • Sensitive tools need extra care. A tool that can delete data, send email, or spend money should require explicit user approval before being called. Claude Code’s permission system is exactly this — every “risky” tool prompts for approval.
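
Two of these gotchas as a TypeScript sketch: `safeToolResult` turns a thrown error into a tool_result flagged with the API's `is_error` field instead of crashing the loop, and `RepeatTracker` counts identical calls so the loop can send the model a nudge. Both names, the threshold of 3, and the hint wording are illustrative assumptions.

```typescript
type ToolResultBlock = {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
};

// Run a tool and always hand the model a result, even when the tool fails.
function safeToolResult(id: string, run: () => string): ToolResultBlock {
  try {
    return { type: "tool_result", tool_use_id: id, content: run() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { type: "tool_result", tool_use_id: id, content: `Error: ${message}`, is_error: true };
  }
}

// Counts identical (name, input) calls so the loop can spot a model that is stuck.
class RepeatTracker {
  private counts = new Map<string, number>();
  private readonly limit: number;

  constructor(limit = 3) {
    this.limit = limit;
  }

  // Returns a hint for the model once the same call reaches `limit` repeats, otherwise null.
  // Inputs are compared via JSON.stringify, so differing key order counts as a different call.
  record(name: string, input: unknown): string | null {
    const key = `${name}:${JSON.stringify(input)}`;
    const n = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, n);
    return n >= this.limit
      ? `You have called ${name} with the same input ${n} times. Try a different approach.`
      : null;
  }
}
```

The hint can be appended as a text block next to the tool results in the next user message; combined with a hard cap on loop iterations, this covers both soft and hard stops.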

