Guide
What is Tool Use in LLMs?
A guide to LLM tool use: browsing, code execution, APIs, MCP, agents, function calls, guardrails and evaluation.
Tool use in LLMs means connecting a large language model to external capabilities such as browsing, code execution, APIs, and agent workflows. It matters because it lets a model act on external systems and data without additional training. For example, the MAVEN framework raised GPT-OSS-120b's accuracy on MAVEN-Bench from 48% to 71%, a gain of 23 percentage points, while Microsoft's Agent Governance Toolkit routes agent actions through a governance layer before execution (30 articles, 13 citations, 2026).
Quick Answer
Tool use in LLMs refers to the ability of large language models to interact with external systems and perform tasks through APIs, code execution, and other mechanisms. This capability is increasingly important as AI applications expand: paired with the MAVEN framework, GPT-OSS-120b reaches 71% accuracy on MAVEN-Bench, up from 48%. Recent developments also show governance frameworks, such as Microsoft's Agent Governance Toolkit, being integrated to make tool use safer.
- Evidence base: 30 filtered articles
- Cited sources: 13 citations across 5 sources
- Refresh cadence: Weekly
- Last updated: Jun 1, 2026
FAQ
What is tool use in LLMs?
Tool use in LLMs refers to their ability to interact with external systems and perform tasks through APIs, code execution, and other mechanisms.
Why is tool use important?
Tool use is crucial for enhancing the functionality and applicability of LLMs across various sectors, including finance, healthcare, and software development.
What recent advancements have been made in LLM tool use?
Recent advancements include the MAVEN framework improving GPT-OSS-120b accuracy to 71% and the implementation of governance frameworks like Microsoft's Agent Governance Toolkit.
Current Read
Tool use in large language models (LLMs) encompasses various functionalities, including browsing, code execution, and API interactions, which enhance their utility across different applications. For example, the MAVEN framework has improved the accuracy of the GPT-OSS-120b model from 48% to 71% on MAVEN-Bench, showcasing significant advancements in agentic tool calling. Furthermore, the integration of governance measures, such as Microsoft's Agent Governance Toolkit, emphasizes the need for safety and compliance in AI agent workflows, ensuring that actions are evaluated based on identity and trust scores before execution.
Recent trends indicate a growing reliance on AI agents in various sectors, with tools like OpenAI's Codex being utilized to streamline processes in tax filing and software development. Companies like Endava have reported reducing software delivery timelines from weeks to hours by leveraging Codex, while Cisco and OpenAI's collaboration aims to enhance enterprise engineering through AI-native development. As the landscape evolves, the focus on secure and efficient tool use in LLMs will continue to shape AI's role in business and technology.
Key Takeaways
- MAVEN improves GPT-OSS-120b accuracy from 48% to 71% on MAVEN-Bench.
- Microsoft's Agent Governance Toolkit enhances safety in AI agent workflows.
- Codex is being used to automate tax filings and improve software delivery timelines.
- Endava reduced software delivery from weeks to hours using Codex.
- Nvidia's Vera CPU sets a new benchmark for agentic workloads in AI factories.
Topic Map
Understanding Tool Use in LLMs
Tool use in LLMs covers the functionality that lets a model interact with external systems and perform tasks: browsing, code execution, and API calls. Recent work such as the MAVEN framework shows that accuracy and reasoning can improve without additional training: with MAVEN, GPT-OSS-120b reaches 71% accuracy on MAVEN-Bench, up from 48%.
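As a rough illustration of the API-interaction pattern described above, the sketch below shows how a host application might route a model's structured tool call to a local function. It is a minimal sketch, not any vendor's API: the registry, the `register_tool` and `dispatch` helpers, the two example tools, and the JSON shape (`name` plus `arguments`) are all hypothetical, and a hand-written JSON string stands in for real model output.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool registry: maps a tool name to a plain Python function.
TOOLS: Dict[str, Callable[..., Any]] = {}


def register_tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so a model can request it."""
    TOOLS[func.__name__] = func
    return func


@register_tool
def add(a: float, b: float) -> float:
    """Illustrative tool: add two numbers."""
    return a + b


@register_tool
def word_count(text: str) -> int:
    """Illustrative tool: count whitespace-separated words."""
    return len(text.split())


def dispatch(tool_call_json: str) -> Dict[str, Any]:
    """Parse a tool call (assumed shape: name + arguments) and run the tool."""
    call = json.loads(tool_call_json)
    name = call.get("name")
    args = call.get("arguments", {})
    if name not in TOOLS:
        # Unknown tools are rejected rather than guessed at.
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": TOOLS[name](**args)}
    except TypeError as exc:
        # Raised when the arguments do not match the tool's signature.
        return {"ok": False, "error": str(exc)}


# Stand-in for what a model might emit; no real model is called here.
simulated_model_output = '{"name": "add", "arguments": {"a": 2, "b": 3}}'
print(dispatch(simulated_model_output))
```

In a real system the result would be sent back to the model as a new message so it can continue reasoning; that round trip is omitted here to keep the sketch self-contained.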
Governance Frameworks for AI Agents
The implementation of governance frameworks is crucial for ensuring safe tool use in AI agents. Microsoft's Agent Governance Toolkit serves as a model for creating governed workflows, where actions are evaluated based on identity and trust scores before execution. This approach enhances the safety and reliability of AI agents in various applications.
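The idea of checking identity and trust score before an action runs can be sketched as a small gate in front of every tool call. This is an illustrative sketch only and does not use or reproduce the Agent Governance Toolkit's API: the `Agent` and `GovernanceGate` classes, the 0.0 to 1.0 trust scale, the tier names, and the thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


@dataclass
class Agent:
    agent_id: str
    trust_score: float  # assumed scale: 0.0 (untrusted) to 1.0 (fully trusted)


@dataclass
class GovernanceGate:
    known_agents: Set[str]
    min_trust_by_tier: Dict[str, float]  # illustrative thresholds, not a real policy
    audit_log: List[Dict[str, Any]] = field(default_factory=list)

    def evaluate(self, agent: Agent, action: str, risk_tier: str) -> str:
        """Return 'allow', 'needs_approval', or 'deny', and log the decision."""
        if agent.agent_id not in self.known_agents:
            decision = "deny"  # unknown identity
        elif risk_tier not in self.min_trust_by_tier:
            decision = "deny"  # no policy for this tier, so fail closed
        elif agent.trust_score >= self.min_trust_by_tier[risk_tier]:
            decision = "allow"
        else:
            decision = "needs_approval"  # escalate instead of executing
        self.audit_log.append(
            {"agent": agent.agent_id, "action": action,
             "risk_tier": risk_tier, "decision": decision}
        )
        return decision

    def run(self, agent: Agent, action: str, risk_tier: str,
            tool: Callable[[], Any]) -> Dict[str, Any]:
        """Execute the tool only when the gate allows it."""
        decision = self.evaluate(agent, action, risk_tier)
        if decision != "allow":
            return {"decision": decision, "result": None}
        return {"decision": decision, "result": tool()}


gate = GovernanceGate(known_agents={"billing-bot"},
                      min_trust_by_tier={"low": 0.3, "high": 0.8})
bot = Agent(agent_id="billing-bot", trust_score=0.5)
print(gate.run(bot, "read_invoice", "low", lambda: "invoice #1"))
print(gate.run(bot, "issue_refund", "high", lambda: "refunded"))
```

With these made-up thresholds, the low-risk read is allowed while the high-risk refund is held for approval, and both decisions land in the audit log regardless of outcome.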
Related Guides
What is Function Calling?
A guide to function calling in LLMs: structured tool calls, schemas, APIs, agent workflows, reliability and safety checks.
What are AI Agents?
A living guide to AI agents: how they work, where they are useful, what can fail, and the latest agent news from trusted AI sources.
What is Agentic AI?
A guide to agentic AI: planning, tool use, memory, workflows, autonomy levels, risks and the latest agent product signals.
Source-Linked Articles
MAVEN: Improving Generalization in Agentic Tool Calling
MAVEN (Modular Agentic Verification and Execution Network) enhances reasoning in agentic tool-calling environments, improving GPT-OSS-120b accuracy from 48% to 71% on MAVEN-Bench without extra training. This lightweight framework also remains competitive against proprietary models at a cost ratio of 1/10, highlighting its potential for better compositional reasoning.
arXiv cs.AI · Jun 1, 2026
An Implementation of the Microsoft Agent Governance Toolkit for Safe AI Agent Tool Use with Policies, Approvals, Audit Logs, and Risk Controls
This tutorial demonstrates the implementation of Microsoft's Agent Governance Toolkit to create a governed AI-agent workflow. The framework ensures that all actions by AI agents pass through a governance layer that evaluates identity, trust score, risk tier, and other factors before execution, enhancing safety in tool use.
MarkTechPost · May 31, 2026