Smart Tool Selection: Achieving 34-64% Token Savings with Spring AI's Dynamic Tool Discovery

As AI agents connect to more services—Slack, GitHub, Jira, MCP servers—tool libraries grow rapidly. A typical multi-server setup can easily have 50+ tools consuming 55,000+ tokens before any conversation starts. Worse, tool selection accuracy degrades when models face 30+ similarly-named tools.

The Tool Search Tool pattern, pioneered by Anthropic, addresses this: instead of loading all tool definitions upfront, the model discovers tools on demand. It receives only a search tool initially, queries for capabilities when needed, and gets relevant tool definitions expanded into context. This achieves significant token savings while maintaining access to hundreds of tools.

The key insight: while Anthropic introduced this pattern for Claude, we can implement the same approach for any LLM using Spring AI’s Recursive Advisors. Spring AI provides a portable abstraction that makes dynamic tool discovery work across OpenAI, Anthropic, Gemini, Ollama, Azure OpenAI, and any other LLM provider supported by Spring AI.

Our preliminary benchmarks show Spring AI’s Tool Search Tool implementation achieves 34-64% token reduction across OpenAI, Anthropic, and Gemini models while maintaining full access to hundreds of tools.

The Spring AI Tool Search Tool project is available on GitHub: spring-ai-tool-search-tool.

How Tool Calling Works

First, let’s understand how Spring AI’s tool calling works when using the ToolCallAdvisor - a special recursive advisor that:

Intercepts the ChatClient request before it reaches the LLM
Includes tool definitions in the prompt sent to the model - for all registered tools!
Detects tool call requests in the model’s response
Executes the requested tools using the ToolCallingManager
Loops back with tool results until the model provides a final answer

The tool execution happens in a recursive loop - the advisor keeps calling the LLM until no more tool calls are requested.
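The loop described above can be sketched in plain Java without any Spring AI dependency. This is only an illustration of the control flow: the `Model` interface, the `Reply` record, and the scripted model in `main` are hypothetical stand-ins, not Spring AI types, and the real ToolCallAdvisor delegates execution to the ToolCallingManager rather than to a map of functions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ToolLoopSketch {

    /** A model reply: either a tool call (toolName != null) or a final answer. */
    record Reply(String toolName, String argument, String finalAnswer) {
        static Reply call(String tool, String arg) { return new Reply(tool, arg, null); }
        static Reply answer(String text) { return new Reply(null, null, text); }
        boolean isToolCall() { return toolName != null; }
    }

    /** Hypothetical stand-in for an LLM: maps the conversation so far to a reply. */
    interface Model {
        Reply next(List<String> conversation);
    }

    /** Keeps calling the model until it stops requesting tools. */
    static String run(Model model, Map<String, Function<String, String>> tools, String userMessage) {
        List<String> conversation = new ArrayList<>();
        conversation.add("user: " + userMessage);
        while (true) {
            Reply reply = model.next(conversation);
            if (!reply.isToolCall()) {
                return reply.finalAnswer(); // no more tool calls: the loop ends
            }
            // Execute the requested tool and feed the result back to the model.
            Function<String, String> tool = tools.get(reply.toolName());
            String result = tool == null ? "unknown tool" : tool.apply(reply.argument());
            conversation.add("tool " + reply.toolName() + ": " + result);
        }
    }

    public static void main(String[] args) {
        Map<String, Function<String, String>> tools =
                Map.of("currentTime", location -> "2025-12-08T11:30");
        // Scripted model: first asks for the time, then answers using the tool result.
        Model scripted = conversation -> conversation.size() == 1
                ? Reply.call("currentTime", "Amsterdam")
                : Reply.answer("It is " + conversation.get(1).substring("tool currentTime: ".length()));
        System.out.println(run(scripted, tools, "What time is it in Amsterdam?"));
    }
}
```

Note that in this sketch every request would carry the full `tools` map, which is exactly the cost the next sections address.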

The Problem

The standard tool calling flow (such as ToolCallAdvisor) sends all tool definitions to the LLM upfront. This creates three major issues with large tool collections:

Context bloat - Massive token consumption before any conversation begins
Tool confusion - Models struggle to choose correctly when facing 30+ similar tools
Higher costs - Paying for unused tool definitions in every request

The Tool Search Tool Solution

By extending Spring AI’s ToolCallAdvisor, we’ve created a ToolSearchToolCallAdvisor that implements dynamic tool discovery. It intercepts the tool calling loop to selectively inject tools based on what the model discovers it needs.

The flow works as follows:

Indexing: At conversation start, all registered tools are indexed in the ToolSearcher (but NOT sent to the LLM)
Initial Request: Only the Tool Search Tool (TST) definition is sent to the LLM - saving context
Discovery Call: When the LLM needs capabilities, it calls the TST with a search query
Search & Expand: The ToolSearcher finds matching tools (e.g., “Tool XYZ”) and their definitions are added to the next request
Tool Invocation: The LLM now sees both TST and the discovered tool definitions, and can call the actual tool
Tool Execution: The discovered tool is executed and results returned to the LLM
Response: The LLM generates the final answer using the tool results

In code, this looks like this:

var toolSearchToolCallAdvisor = ToolSearchToolCallAdvisor.builder()
    .toolSearcher(toolSearcher)
    .maxResults(5)
    .build();

ChatClient chatClient = chatClientBuilder
    .defaultTools(new MyTools())  // 100s of tools registered but NOT sent to LLM initially
    .defaultAdvisors(toolSearchToolCallAdvisor) // Activate Tool Search Tool
    .build();
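To make the indexing and expansion steps concrete, here is a minimal plain-Java sketch of the bookkeeping involved. It is not the ToolSearchToolCallAdvisor implementation: the `DiscoverySketch` class and its naive word-overlap matching are assumptions made for illustration. It only shows that all tools are indexed up front while the set of definitions visible to the model starts with the search tool and grows as searches succeed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class DiscoverySketch {

    static final String SEARCH_TOOL = "toolSearchTool";

    private final Map<String, String> index = new LinkedHashMap<>(); // name -> description
    private final Set<String> visible = new LinkedHashSet<>(List.of(SEARCH_TOOL));
    private final int maxResults;

    DiscoverySketch(Map<String, String> allTools, int maxResults) {
        this.index.putAll(allTools); // indexed, but not exposed to the model yet
        this.maxResults = maxResults;
    }

    /** Tool names whose definitions would be sent with the next request. */
    List<String> visibleTools() {
        return List.copyOf(visible);
    }

    /** Handles a search-tool call using naive word overlap on name and description. */
    List<String> search(String query) {
        String[] words = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> tool : index.entrySet()) {
            if (hits.size() >= maxResults) {
                break;
            }
            String text = (tool.getKey() + " " + tool.getValue()).toLowerCase(Locale.ROOT);
            for (String word : words) {
                if (!word.isEmpty() && text.contains(word)) {
                    hits.add(tool.getKey());
                    break;
                }
            }
        }
        visible.addAll(hits); // discovered definitions join the next request
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> tools = new LinkedHashMap<>();
        tools.put("weather", "Get the weather for a given location at a given time");
        tools.put("clothing", "Get clothing shop names for a given location at a given time");
        tools.put("currentTime", "Current date and time for a given location");

        DiscoverySketch sketch = new DiscoverySketch(tools, 5);
        System.out.println(sketch.visibleTools());    // [toolSearchTool]
        System.out.println(sketch.search("weather")); // [weather]
        System.out.println(sketch.visibleTools());    // [toolSearchTool, weather]
    }
}
```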

Pluggable Search Strategies

The ToolSearcher interface abstracts the search implementation, supporting multiple strategies (see tool-searchers for implementations):

Strategy	Implementation	Best For
Semantic	`VectorToolSearcher`	Natural language queries, fuzzy matching
Keyword	`LuceneToolSearcher`	Exact term matching, known tool names
Regex	`RegexToolSearcher`	Tool name patterns (`get_*_data`)
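The idea of swapping strategies behind one interface can be illustrated with a small plain-Java sketch. The `NameSearcher` interface below is hypothetical and deliberately simpler than the library's ToolSearcher (whose actual signature is not shown in this post); it matches on tool names only and covers just the keyword and regex rows of the table.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class SearchStrategySketch {

    /** Hypothetical strategy interface; the library's ToolSearcher may differ. */
    interface NameSearcher {
        List<String> search(List<String> toolNames, String query);
    }

    /** Keyword strategy: case-insensitive substring match on the tool name. */
    static final NameSearcher KEYWORD = (names, query) -> names.stream()
            .filter(n -> n.toLowerCase(Locale.ROOT).contains(query.toLowerCase(Locale.ROOT)))
            .toList();

    /** Regex strategy: the query is a regular expression matched against the full name. */
    static final NameSearcher REGEX = (names, query) -> {
        Pattern pattern = Pattern.compile(query);
        return names.stream().filter(n -> pattern.matcher(n).matches()).toList();
    };

    public static void main(String[] args) {
        List<String> names = List.of("get_sales_data", "get_user_data", "send_email", "currentTime");
        System.out.println(KEYWORD.search(names, "time"));      // [currentTime]
        System.out.println(REGEX.search(names, "get_.*_data")); // [get_sales_data, get_user_data]
    }
}
```

Because callers depend only on the interface, the strategy can be changed without touching the code that drives the tool calling loop.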

Getting Started

The project’s GitHub repository is: spring-ai-tool-search-tool. For detailed setup instructions and code examples, see the Quick Start guide (v1.x) and the related example application (v1.x).

Maven Central (1.0.1):

<dependency>
    <groupId>org.springaicommunity</groupId>
    <artifactId>tool-search-tool</artifactId>
    <version>1.0.1</version>
</dependency>

<!-- Choose a search strategy -->
<dependency>
    <groupId>org.springaicommunity</groupId>
    <artifactId>tool-searcher-lucene</artifactId>
    <version>1.0.1</version>
</dependency>

Version v1.0.x is compatible with Spring AI 1.1.x / Spring Boot 3, and v2.0.x is compatible with Spring AI 2.x / Spring Boot 4.

Example Usage

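import java.util.List;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
// Also import ToolSearchToolCallAdvisor and ToolSearcher from the tool-search-tool library.

@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

    @Bean
    CommandLineRunner demo(ChatClient.Builder builder, ToolSearcher toolSearcher) {
        return args -> {
            // The advisor replaces the upfront tool list with on-demand discovery.
            var advisor = ToolSearchToolCallAdvisor.builder()
                .toolSearcher(toolSearcher)
                .build();

            ChatClient chatClient = builder
                .defaultTools(new MyTools())   // indexed, but not sent to the LLM initially
                .defaultAdvisors(advisor)
                .build();

            var answer = chatClient.prompt("""
                Help me plan what to wear today in Amsterdam.
                Please suggest clothing shops that are open right now.
                """).call().content();

            System.out.println(answer);
        };
    }

    static class MyTools {

        @Tool(description = "Get the weather for a given location at a given time")
        public String weather(String location,
                @ToolParam(description = "YYYY-MM-DDTHH:mm") String atTime) {...}

        @Tool(description = "Get clothing shop names for a given location at a given time")
        public List<String> clothing(String location,
                @ToolParam(description = "YYYY-MM-DDTHH:mm") String openAtTime) {...}

        @Tool(description = "Current date and time for a given location")
        public String currentTime(String location) {...}

        // ... potentially hundreds more tools
    }
}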

For the example above, the flow would be:

User Request: “Help me plan what to wear today in Amsterdam. Please suggest clothing shops that are open right now.”
Initialization: Index all tools: weather, clothing, currentTime (+ potentially 100s more)
First LLM Call - LLM sees only toolSearchTool
- LLM calls toolSearchTool(query="current time date") → ["currentTime"]
Second LLM Call - LLM sees toolSearchTool + currentTime
- LLM calls currentTime("Amsterdam") → "2025-12-08T11:30"
- LLM calls toolSearchTool(query="weather location") → ["weather"]
Third LLM Call - LLM sees toolSearchTool + currentTime + weather
- LLM calls weather("Amsterdam", "2025-12-08T11:30") → "Sunny, 15°C"
- LLM calls toolSearchTool(query="clothing shops") → ["clothing"]
Fourth LLM Call - LLM sees toolSearchTool + currentTime + weather + clothing
- LLM calls clothing("Amsterdam", "2025-12-08T11:30") → ["H&M", "Zara", "Uniqlo"]
Final Response: “Based on the sunny 15°C weather in Amsterdam, I recommend light layers. Here are clothing shops open now: H&M, Zara, …”

Performance Measurements

⚠️ Disclaimer: These are preliminary, manual measurements taken after a few runs. They are not averaged across multiple iterations and should be considered illustrative rather than representative.

To quantify the token savings, we ran preliminary benchmarks using the demo application with the following setup:

Task: “Help me plan what to wear today in Amsterdam. Please suggest clothing shops that are open right now.”
28 total tools: 3 relevant tools (weather, clothing, currentTime) + 25 unrelated “dummy” tools, deliberately not relevant to the weather/clothing task, demonstrating how the tool search efficiently discovers only the needed tools among many unrelated options.
Search strategies: Lucene (keyword-based) and VectorStore (semantic)
Models tested: Gemini (gemini-3-pro-preview), OpenAI (gpt-5-mini-2025-08-07), Anthropic (claude-sonnet-4-5-20250929)

The measurements are collected using a custom TokenCounterAdvisor that tracks and aggregates the token usage.
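The demo's TokenCounterAdvisor is not reproduced here, but the aggregation it performs can be sketched in a few lines of plain Java. The `TokenTallySketch` class and the numbers in `main` are made up for illustration. The sketch also assumes that the total is simply prompt plus completion tokens, which matches the OpenAI and Anthropic rows in the tables below but not the Gemini rows, where the reported total is higher than that sum.

```java
public class TokenTallySketch {

    private long promptTokens;
    private long completionTokens;
    private int requests;

    /** Records the usage reported for one model request. */
    void record(long prompt, long completion) {
        promptTokens += prompt;
        completionTokens += completion;
        requests++;
    }

    long promptTokens() { return promptTokens; }

    long completionTokens() { return completionTokens; }

    /** Assumption: total = prompt + completion (providers may count more). */
    long totalTokens() { return promptTokens + completionTokens; }

    int requests() { return requests; }

    public static void main(String[] args) {
        TokenTallySketch tally = new TokenTallySketch();
        tally.record(500, 100); // first request in the tool calling loop (made-up values)
        tally.record(900, 50);  // second request, now carrying a tool result
        System.out.println(tally.requests() + " requests, "
                + tally.promptTokens() + " prompt + "
                + tally.completionTokens() + " completion = "
                + tally.totalTokens() + " total tokens");
    }
}
```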

Results with Lucene Search

Model	Approach	Total Tokens	Prompt Tokens	Completion Tokens	Requests	Savings
Gemini	With TST	2,165	1,412	231	4	60%
	Without TST	5,375	4,800	176	3	—
OpenAI	With TST	4,706	2,770	1,936	5	34%
	Without TST	7,175	5,765	1,410	3	—
Anthropic	With TST	6,273	5,638	635	5	64%
	Without TST	17,342	16,752	590	4	—

Results with VectorStore Search

Model	Approach	Total Tokens	Prompt Tokens	Completion Tokens	Requests	Savings
Gemini	With TST	2,214	1,502	234	4	57%
	Without TST	5,122	4,767	73	3	—
OpenAI	With TST	3,697	2,109	1,588	4	47%
	Without TST	6,959	5,771	1,188	3	—
Anthropic	With TST	6,319	5,642	677	5	63%
	Without TST	17,291	16,744	547	4	—
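The Savings column in both tables can be reproduced from the Total Tokens column as (without - with) / without, rounded to the nearest percent. The helper below is a small sketch of that arithmetic using the figures from the tables; the class and method names are illustrative.

```java
public class TokenSavings {

    /** Percentage of total tokens saved, rounded to the nearest whole percent. */
    static long savingsPercent(long withoutTst, long withTst) {
        return Math.round(100.0 * (withoutTst - withTst) / withoutTst);
    }

    public static void main(String[] args) {
        // Lucene search
        System.out.println("Gemini:    " + savingsPercent(5_375, 2_165) + "%");  // 60%
        System.out.println("OpenAI:    " + savingsPercent(7_175, 4_706) + "%");  // 34%
        System.out.println("Anthropic: " + savingsPercent(17_342, 6_273) + "%"); // 64%
        // VectorStore search
        System.out.println("Gemini:    " + savingsPercent(5_122, 2_214) + "%");  // 57%
        System.out.println("OpenAI:    " + savingsPercent(6_959, 3_697) + "%");  // 47%
        System.out.println("Anthropic: " + savingsPercent(17_291, 6_319) + "%"); // 63%
    }
}
```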

Key Observations

Significant token savings across all models: The Tool Search Tool pattern achieved 34-64% reduction in total token consumption depending on the model and search strategy.
Prompt tokens are the key driver: The savings come primarily from reduced prompt tokens - with TST, only discovered tool definitions are included rather than all 28 tools upfront.
Trade-off: More requests, fewer tokens: TST requires 4-5 requests vs 3-4 without, but the total token cost is significantly lower.
Both search strategies perform similarly: Lucene and VectorStore produced comparable results, with VectorStore showing slightly better efficiency for OpenAI in this test.
All models successfully completed the task: All three models (Gemini, OpenAI, Anthropic) figured out that they needed to call currentTime before invoking the other tools, demonstrating correct reasoning about tool dependencies.
Different tool discovery strategies: Models exhibited varying approaches—some managed to request all necessary tools upfront, while others discovered them one by one. However, all models leveraged parallel tool calling when possible to optimize execution.
Older models may struggle: Older model versions may have difficulty with the tool search pattern, potentially missing required tools or making suboptimal discovery decisions. Consider adding a custom systemMessageSuffix to provide additional guidance to the model, experimenting with different tool-searcher configurations, or pairing this approach with the LLM-as-a-Judge pattern to ensure reliable and consistent behavior across different models.

When to Use

Tool Search Tool Approach	Traditional Approach
20+ tools in your system	Small tool library (<20 tools)
Tool definitions consuming >5K tokens	All tools frequently used in every session
Building MCP-powered systems with multiple servers	Very compact tool definitions
Experiencing tool selection accuracy issues

Next Steps

As the Tool Search Tool project matures and proves its value within the Spring AI Community, we may consider adding it to the core Spring AI project.

For deterministic tool selection without LLM involvement, explore the Pre-Select Tool Demo and the experimental PreSelectToolCallAdvisor. Unlike the agentic ToolSearchToolCallAdvisor, this advisor pre-selects tools based on message content before the LLM call, which makes it ideal for Chain of Thought patterns where a preliminary reasoning step explicitly names the required tools.

You can also combine the Tool Search Tool with LLM-as-a-Judge patterns to ensure discovered tools actually fulfill the user’s task. A judge model could evaluate whether the dynamically selected tools produced satisfactory results and improve the tool discovery if needed.

Try the current implementation and provide feedback to help shape its evolution into a first-class Spring AI feature.

Conclusion

The Tool Search Tool pattern is a step toward scalable AI agents. By combining Anthropic’s innovative approach with Spring AI’s portable abstraction, we can build systems that efficiently manage thousands of tools while maintaining high accuracy across any LLM provider. The power of Spring AI’s recursive advisor architecture is that it allows us to implement sophisticated tool discovery workflows that work universally - whether you’re using OpenAI’s GPT models, Anthropic’s Claude, local Ollama models, or any other LLM supported by Spring AI. You get the same dynamic tool discovery benefits without being locked into a specific provider’s native implementation.

References

Anthropic Tool Search Tool Pattern: Advanced Tool Use
Spring AI Tool Search Tool Implementation: GitHub Repository
Spring AI Tools Documentation: Tools API Reference
Spring AI Recursive Advisors: Advisors API Reference
Spring AI Recursive Advisors Blog: Spring AI Recursive Advisors
