What AWS Bedrock Adds
AWS Bedrock extends prompt caching beyond Claude to include Amazon Nova models:

Claude models (via Bedrock):
- Claude Opus 4.1, Opus 4, Sonnet 4.5, Sonnet 4, Haiku 4.5
- Claude 3.7 Sonnet, 3.5 Haiku
- Full caching support including tool definitions

Amazon Nova models:
- Nova Micro, Lite, Pro, Premier
- System and conversation caching support only
Key Differences from Anthropic Direct API
While the core caching concepts remain the same (as covered in our previous blog), AWS Bedrock has several differences worth understanding.

Fixed 5-Minute Cache TTL
AWS Bedrock uses a fixed 5-minute TTL with no configuration options, while Anthropic’s direct API offers optional 1-hour caching.

Tool Caching Not Supported on Nova
Amazon Nova models do not support tool caching. Attempting to use TOOLS_ONLY or SYSTEM_AND_TOOLS strategies with Nova throws an exception.
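Because the strategy names are shared across providers, you can fail fast on the client side before Bedrock rejects the request. Here is a minimal sketch; the guard class and its model-ID check are our own illustration, not Spring AI API (only the strategy names come from the framework):

```java
// Sketch: reject tool-caching strategies for Nova model IDs up front,
// mirroring the exception Bedrock raises. The enum values match Spring AI's
// strategy names; the validate() helper is an illustrative assumption.
public class CacheStrategyGuard {

    enum CacheStrategy { SYSTEM_ONLY, TOOLS_ONLY, SYSTEM_AND_TOOLS, CONVERSATION_HISTORY, NONE }

    static void validate(String modelId, CacheStrategy strategy) {
        boolean isNova = modelId.contains("amazon.nova");
        boolean cachesTools = strategy == CacheStrategy.TOOLS_ONLY
                || strategy == CacheStrategy.SYSTEM_AND_TOOLS;
        if (isNova && cachesTools) {
            throw new IllegalArgumentException(
                    "Nova models support system and conversation caching only, got: " + strategy);
        }
    }

    public static void main(String[] args) {
        validate("amazon.nova-pro-v1:0", CacheStrategy.SYSTEM_ONLY);   // fine on Nova
        try {
            validate("amazon.nova-pro-v1:0", CacheStrategy.TOOLS_ONLY); // throws
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```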
Model-Specific Token Thresholds
| Model | Minimum Tokens per Checkpoint |
|---|---|
| Claude 3.7 Sonnet, 3.5 Sonnet v2, Opus 4, Sonnet 4, Sonnet 4.5 | 1,024 |
| Claude 3.5 Haiku | 2,048 |
| Claude Haiku 4.5 | 4,096 |
| Amazon Nova (all variants) | 1,000 |
Cache Metrics Naming
| Metric | Anthropic Direct | AWS Bedrock |
|---|---|---|
| Creating a cache entry | cacheCreationInputTokens | cacheWriteInputTokens |
| Reading from cache | cacheReadInputTokens | cacheReadInputTokens |
Same Spring AI Patterns Across Providers
Spring AI uses identical caching strategies across both providers.

Provider Characteristics at a Glance
| Feature | AWS Bedrock | Anthropic Direct |
|---|---|---|
| Cache TTL | 5 minutes (fixed) | 5 minutes (default), 1 hour (optional) |
| Models | Claude + Nova | Claude only |
| Tool Caching | Claude only | All Claude models |
| Token Metrics | cacheWriteInputTokens, cacheReadInputTokens | cacheCreationInputTokens, cacheReadInputTokens |
| Pricing | Varies by region/model | Published per-model |
| Cost Pattern | ~25% write premium, ~90% read savings | 25% write premium, 90% read savings |
Example: Document Analysis with Caching
Here’s a practical example showing cache effectiveness:cacheWriteInputTokens > 0, cacheReadInputTokens = 0
Subsequent requests (within TTL): cacheWriteInputTokens = 0, cacheReadInputTokens > 0
With a 3,500-token system prompt, this yields approximately 65% cost reduction on cached content (first question pays ~1.25x, subsequent questions pay ~0.10x).
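The arithmetic behind that figure is easy to reproduce. A small sketch using the multipliers quoted above (~1.25x on the cache write, ~0.10x on each cache read):

```java
public class CacheCostSketch {
    // Illustrative arithmetic only: first request pays the ~25% write premium,
    // every later request within the TTL pays ~10% of the base input price.
    static double effectiveCostFraction(int questions) {
        double first = 1.25;   // cache write: base price plus ~25% premium
        double cached = 0.10;  // cache read: ~10% of base price
        double total = first + cached * (questions - 1);
        return total / questions;  // average multiplier vs. no caching
    }

    public static void main(String[] args) {
        for (int n : new int[]{1, 5, 10}) {
            System.out.printf("%d question(s) -> %.1f%% of uncached cost%n",
                    n, 100 * effectiveCostFraction(n));
        }
    }
}
```

A single question costs 125% of the uncached price, but five questions average about a third of it, which is where the roughly 65% saving on the 3,500-token prompt comes from.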
Using Different Models on Bedrock
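Switching models on Bedrock is a configuration change rather than a code change. A sketch using the Bedrock Converse chat property; the property key and the model IDs shown are assumptions to verify against the Spring AI documentation and your region’s Bedrock model catalog:

```properties
# application.properties — model IDs are illustrative examples
spring.ai.bedrock.converse.chat.options.model=anthropic.claude-sonnet-4-5-20250929-v1:0

# Or a Nova model (system and conversation caching only, no tool caching):
# spring.ai.bedrock.converse.chat.options.model=amazon.nova-pro-v1:0
```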
Getting Started
Add the Spring AI Bedrock Converse starter.

Note: AWS Bedrock prompt caching support is available in Spring AI 1.1.0 and later. Try it with the latest 1.1.0-SNAPSHOT version.
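The dependency itself looks like the following; the artifact name assumes Spring AI’s 1.x starter naming scheme, with the version managed by the Spring AI BOM:

```xml
<!-- Spring AI Bedrock Converse starter (version managed by the Spring AI BOM) -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-bedrock-converse</artifactId>
</dependency>
```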
Strategy Selection Reference
| Strategy | Use When | Claude | Nova |
|---|---|---|---|
| SYSTEM_ONLY | Large stable system prompt | Yes | Yes |
| TOOLS_ONLY | Large stable tools, dynamic system | Yes | No |
| SYSTEM_AND_TOOLS | Both large and stable | Yes | No |
| CONVERSATION_HISTORY | Multi-turn conversations | Yes | Yes |
| NONE | Disable caching explicitly | Yes | Yes |