- Goals: the objectives the agent is asked to complete, such as increasing code coverage, labeling issues, or reviewing and merging pull requests.
- Context: the data and environment the agent reasons over - source files, logs, structured datasets, and documentation.
- Tools: custom capabilities made available to the model to invoke when needed, most often exposed through the Model Context Protocol.
- Judges: evaluators that verify outcomes and assess quality against predefined criteria. These can be deterministic, e.g. a code coverage number, or AI-driven, using the LLM-as-Judge pattern.
- Sandbox: an abstraction of where the agent executes its work safely and reproducibly. Current support covers local execution and execution in a Docker container. A sketch of how these pieces compose appears after this list.
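To make the relationship between these abstractions concrete, here is a minimal sketch of how a single benchmark case might tie them together. It is purely illustrative: the type names, the JaCoCo-based judge, and the MCP server URI are assumptions made for this example, not the Spring AI Bench API.

```java
// Hypothetical sketch only - these types and names are not from Spring AI Bench.
import java.nio.file.Path;
import java.util.List;

public class CoverageBenchmarkSketch {

    // Goal: what the agent should achieve, with a measurable target.
    record Goal(String description, double targetCoverage) {}

    // Context: the repository and documentation the agent reasons over.
    record Context(Path repositoryRoot, List<Path> documentation) {}

    // Tool: a capability exposed to the model, e.g. via an MCP server (URI is made up).
    record Tool(String name, String mcpServerUri) {}

    // Judge: verifies the outcome; here a deterministic coverage check.
    interface Judge {
        boolean pass(Path workspace);
    }

    // Sandbox: where the agent executes - locally or inside a Docker container.
    enum Sandbox { LOCAL, DOCKER }

    // A benchmark case is simply the composition of the five abstractions.
    record BenchmarkCase(Goal goal, Context context, List<Tool> tools,
                         Judge judge, Sandbox sandbox) {}

    public static void main(String[] args) {
        BenchmarkCase increaseCoverage = new BenchmarkCase(
                new Goal("Increase test coverage of the payments module", 0.80),
                new Context(Path.of("repos/payments"), List.of(Path.of("docs/payments.md"))),
                List.of(new Tool("run-tests", "stdio://jacoco-mcp")),
                workspace -> readJacocoLineCoverage(workspace) >= 0.80,
                Sandbox.DOCKER);
        System.out.println("Defined benchmark case: " + increaseCoverage.goal().description());
    }

    // Placeholder: a real judge would parse a coverage report produced in the sandbox.
    static double readJacocoLineCoverage(Path workspace) {
        return 0.0;
    }
}
```

In this sketch the judge is deterministic (a coverage threshold); an LLM-as-Judge variant would replace the coverage check with a model call that scores the agent's result against the goal's criteria.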
Resources
Projects:

- Spring AI Bench - GitHub repository
- Spring AI Agents - Documentation
- Spring AI Community - Community portal
- Developer Productivity AI Arena (DPAIA) - Industry initiative for modern agent benchmarking
- SWE-bench - Original benchmark suite
- SWE-bench-Live - Fresh-issue benchmark showing the resolution-rate drop from 60% to 19%
- SWE-bench-Java - Multi-language benchmark showing Java resolution rates of ~7-10% vs ~75% for Python
- mini-SWE-agent - Minimal agent achieving competitive results
- Model Context Protocol - MCP specification
- BetterBench - Benchmark quality framework
- Devoxx 2025: Spring AI Agents and Spring AI Bench - Mark Pollack’s talk introducing both projects