- Dedicated Vector Databases
- Database Engines with Vector Search Capabilities
A typical vector search request takes a vector of float numbers, a namespace (something like a table or collection), and a Top K (the number of results to return) argument. The vector search then runs an Approximate Nearest Neighbor (ANN) or k-Nearest Neighbors (kNN) search.
These databases allow additional filtering on similarity and metadata fields; however, the core of the search revolves around vector representations.
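To make the vector, namespace, and Top K inputs concrete, here is a minimal sketch of an exact (brute-force) kNN search over an in-memory map. The class `BruteForceKnn`, its method names, and the use of cosine similarity are assumptions made for illustration; real vector databases rely on ANN indexes rather than scanning every vector.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a vector store namespace: id -> embedding.
class BruteForceKnn {

    // Cosine similarity: dot(a, b) / (|a| * |b|). Assumes equal lengths and non-zero vectors.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Exact kNN: score every stored vector, sort by descending similarity, keep the first topK ids.
    static List<String> search(Map<String, float[]> namespace, float[] query, int topK) {
        List<Map.Entry<String, float[]>> entries = new ArrayList<>(namespace.entrySet());
        entries.sort(Comparator.comparingDouble(
                (Map.Entry<String, float[]> e) -> cosine(query, e.getValue())).reversed());
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, float[]> e : entries.subList(0, Math.min(topK, entries.size()))) {
            ids.add(e.getKey());
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, float[]> namespace = Map.of(
                "a", new float[] {1f, 0f},
                "b", new float[] {0.9f, 0.1f},
                "c", new float[] {0f, 1f});
        System.out.println(search(namespace, new float[] {1f, 0f}, 2)); // prints [a, b]
    }
}
```

The scan is linear in the number of stored vectors, which is exactly the cost that ANN indexes trade a little accuracy to avoid.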
Existing Database Engines such as Postgres (pgvector), Oracle, and MongoDB have gradually added vector search capabilities to their engines.
They are not dedicated vector databases but rather general-purpose databases with vector search capabilities. Their strength lies in their ability to handle a wide range of data types and queries, especially when it comes to combining vector search with traditional queries. They also have a long history of supporting administrative tasks with a well-understood operating model for backup and recovery, scaling, and maintenance.
Another aspect to consider is that these databases are already being used in production, containing large amounts of existing data.
Elephant in the Room
Spring AI has a wide range of integrations with vector stores. The obvious question is: “Why does Spring AI have support for Vector Search while Spring Data does not?” And why does that even matter?
The goal of Spring AI is to simplify building AI-powered applications by providing a consistent programming model and abstractions. It focuses on integrating AI capabilities into Spring applications and provides a unified API for working with various AI models and services. AI is a hot topic: several database vendors have contributed their vector search integrations to Spring AI to enable use cases such as Retrieval Augmented Generation. This is a great example of how Open Source can drive innovation and collaboration in the database space.
Once the peak of AI’s hype cycle has passed, we are faced with day-2 operations. Data has a lifecycle, and LLMs come and go; some are better for certain tasks or languages than others. While Spring AI’s VectorStore has the means to reflect the data lifecycle to some extent, this is by no means its primary focus.
This is where Spring Data comes into play. Spring Data is all about data models, access, and data lifecycle. It provides a consistent programming model for accessing different data stores, including relational databases, NoSQL databases, and more. Spring Data’s focus is on simplifying data access and management, making it easier to work with various data sources in a consistent way.
Wouldn’t it then make sense to have Vector Search capabilities in Spring Data?
Yes, it would.
Vector Search in Spring Data
With Spring Data 3.5, we’ve introduced a Vector type to simplify usage of vector data in entities.
Vector data types are not common in typical domain models.
The closest relatives of vector data have been geospatial data types such as Point and Polygon, but even those are not common.
Domain models rather consist of primitive types and value types reflecting the domain they are used in.
Vector properties use either vendor-specific types (such as Cassandra’s CqlVector) or some sort of array, like float[]. In the latter case, arrays introduce quite some accidental complexity: Java array variables are references, and the underlying array data is mutable. Passing arrays around isn’t too common either.
Your domain model can leverage the Vector property type, reducing the risk of accidentally mutating the underlying data and giving the otherwise plain float[] a semantic context. Persisting and retrieving Vector properties is handled by Spring Data for modules where Spring Data handles object mapping. For JPA, you will require additional converters.
Vector.of(…) creates an immutable Vector instance and a copy of the given input array. While this is useful for most scenarios, performance-sensitive arrangements that want to reduce GC pressure can retain the reference to the original array by using Vector.unsafe(…) instead.
You can obtain a copy of the float[] array by calling toFloatArray(), or toDoubleArray() if you want to use double[]. Alternatively, you can access the Vector’s source through getSource().
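The trade-off between copying and retaining the original array can be sketched with a small, self-contained class. `FloatVector` below is a hypothetical stand-in written for this illustration; it is not Spring Data's Vector implementation and only mirrors the copy-versus-wrap idea described above.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration only; not Spring Data's Vector type.
final class FloatVector {

    private final float[] values;

    private FloatVector(float[] values) {
        this.values = values;
    }

    // Copies the input so later changes to the caller's array do not leak into the vector.
    static FloatVector of(float... values) {
        return new FloatVector(Arrays.copyOf(values, values.length));
    }

    // Wraps the caller's array without copying: no extra allocation,
    // but the caller must not mutate the array afterwards.
    static FloatVector unsafe(float[] values) {
        return new FloatVector(values);
    }

    // Returns a fresh copy so callers cannot modify the internal state.
    float[] toFloatArray() {
        return Arrays.copyOf(values, values.length);
    }

    float get(int index) {
        return values[index];
    }

    public static void main(String[] args) {
        float[] input = {0.1f, 0.2f};
        FloatVector copied = FloatVector.of(input);
        FloatVector wrapped = FloatVector.unsafe(input);
        input[0] = 9f; // mutate the original array
        System.out.println(copied.get(0));  // prints 0.1
        System.out.println(wrapped.get(0)); // prints 9.0
    }
}
```

The copying factory costs one allocation per vector. The wrapping factory avoids that allocation but shifts the responsibility for immutability to the caller.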
Depending on your data store, you might need to equip your data model with additional annotations to indicate e.g. the number of dimensions or its precision.
When running a Vector search operation, each database uses a very different API. Let’s take a look at MongoDB and Apache Cassandra.
In MongoDB, Vector Search is used through its Aggregation Framework requiring an aggregation stage:
VectorSearchOperation offers a fluent API that guides you through each step of the operation and reflects the underlying MongoDB API in a convenient way.
Let’s have a look at Apache Cassandra. Cassandra uses CQL (Cassandra Query Language) to run queries against the database. Cassandra Vector Search uses the same approach. With Cassandra, Spring Data users have the choice to either use Spring Data Cassandra’s Query API or the native CQL API to run a Vector Search:
Did you notice the CommentVectorSearchResult class and List<Document> (MongoDB’s native document type)? Cassandra has no detached raw type that you could use to consume results, so we need CommentVectorSearchResult as a dedicated type to map this specific Cassandra search result. We want to access not only the domain data but also the score. MongoDB’s Java driver ships with a Document type that is essentially a Map.
That’s not how we envisioned a modern programming model.
When searching for items, the result is not a list of domain objects but rather a list of search results. How do we even represent those?
And can’t we have a uniform programming model that combines the simplicity of expressing what I want with the power of the underlying database?
Vector Search Methods
If it is a search result, then it is a SearchResult<T>. What if repository methods could return SearchResults<T>? Searching is a slightly different concept than querying (finding) entities. Beyond that, search methods would work similarly to existing query methods.
language.
By leveraging the Near and Within keywords, Spring Data MongoDB is able to associate the given Vector and Similarity predicate to customize the actual Vector Search operation. The result is returned as SearchResults, providing access to the found entity and its score or similarity value.
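The idea of a result that pairs an entity with its score can be sketched with plain Java records. `Comment`, `ScoredHit`, and `ScoredHits` are hypothetical names chosen for this illustration; they mirror the concept only and are not the SearchResult and SearchResults types that Spring Data ships.

```java
import java.util.List;

// Hypothetical domain type for illustration.
record Comment(String id, String text) {}

// A single hit: the domain object plus the score the store assigned to it.
record ScoredHit<T>(T content, double score) {}

// The full answer of a search: an ordered list of hits.
record ScoredHits<T>(List<ScoredHit<T>> hits) {

    // Unwraps the entities when the caller does not care about scores.
    List<T> contents() {
        return hits.stream().map(ScoredHit::content).toList();
    }
}

class SearchResultDemo {

    // Builds a fixed sample result, as a store might return it for a Top-2 search.
    static ScoredHits<Comment> sample() {
        return new ScoredHits<>(List.of(
                new ScoredHit<>(new Comment("c1", "Great talk"), 0.93),
                new ScoredHit<>(new Comment("c2", "Nice slides"), 0.81)));
    }

    public static void main(String[] args) {
        for (ScoredHit<Comment> hit : sample().hits()) {
            System.out.println(hit.content().id() + " scored " + hit.score());
        }
    }
}
```

Keeping the score next to the entity, instead of inside it, leaves the domain model free of search-specific fields.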
Using Vector Search with Postgres or Oracle is even simpler. The following example shows a Vector Search method in a Spring Data JPA repository through Hibernate’s hibernate-vector module:
Near and Within search methods require a Vector and a Score, Similarity (a subtype of Score), or Range<Score> parameter to determine how similarity/distance is calculated. Traditionally, query methods are intended to express the predicates of a query, while a typical Vector search is more about Top-K limiting. That is something we have to consider in the future.
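The difference between a score predicate and Top-K limiting can be sketched over a list of already-scored hits. The class `ThresholdVersusTopK`, its `Hit` record, and both method names are assumptions made for this illustration and do not reflect how Spring Data derives queries.

```java
import java.util.Comparator;
import java.util.List;

class ThresholdVersusTopK {

    // Hypothetical scored hit: an identifier plus a similarity between 0 and 1.
    record Hit(String id, double similarity) {}

    // Predicate-style restriction: keep every hit whose similarity lies in [min, max].
    static List<Hit> within(List<Hit> hits, double min, double max) {
        return hits.stream()
                .filter(h -> h.similarity() >= min && h.similarity() <= max)
                .toList();
    }

    // Limit-style restriction: keep the k best hits regardless of their absolute similarity.
    static List<Hit> topK(List<Hit> hits, int k) {
        return hits.stream()
                .sorted(Comparator.comparingDouble(Hit::similarity).reversed())
                .limit(k)
                .toList();
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit("a", 0.95), new Hit("b", 0.80), new Hit("c", 0.40));
        System.out.println(within(hits, 0.75, 1.0).size()); // prints 2
        System.out.println(topK(hits, 3).size());           // prints 3
    }
}
```

A range can return any number of hits, including none, while a Top-K limit returns up to k hits even when they are poor matches. The two restrictions are independent and can be combined.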
Search methods can also leverage annotations in the same way that query methods do. The following example shows a search method in a Spring Data JPA repository:
When you pass Similarity.of(…) as an argument with the appropriate ScoringFunction, Spring Data normalizes the native score into a similarity range between 0 and 1.
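To illustrate what normalizing a native score into the range between 0 and 1 can look like, the sketch below uses one common convention, the one MongoDB Atlas Vector Search documents for its own scores. Whether Spring Data applies exactly these formulas for each store is an assumption here; the class and method names are hypothetical.

```java
class ScoreNormalization {

    // Cosine similarity lies in [-1, 1]; shift and scale it into [0, 1].
    static double fromCosine(double cosine) {
        return (1 + cosine) / 2;
    }

    // Euclidean distance lies in [0, +inf); map 0 to 1 and large distances towards 0.
    static double fromEuclideanDistance(double distance) {
        return 1 / (1 + distance);
    }

    public static void main(String[] args) {
        System.out.println(fromCosine(0.0));            // prints 0.5
        System.out.println(fromEuclideanDistance(3.0)); // prints 0.25
    }
}
```

With a normalized similarity, a threshold such as 0.8 keeps a comparable meaning across scoring functions, which is harder to achieve with raw distances.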
A note on using Vector with JPA: you can use Vector in your domain model to store the vector, assuming you have configured an AttributeConverter to convert the Vector into a database type.
However, when using Hibernate’s distance methods (such as cosine_distance), Hibernate doesn’t consider any AttributeConverter, hence your model must resort to using float[] or double[] as the embedding type:
Conclusion
We’ve explored the world of Vector Search and how it fits into the Spring ecosystem, along with its origins in Spring AI. Support for the Vector type ships with Spring Data 3.5 (2025.0) in the May 2025 release.
Vector Search Methods are a preview feature of the Spring Data 4.0 (2025.1) release train with first implementations for JPA through Hibernate Vector, MongoDB and Apache Cassandra. We’re excited to hear what you think about Vector Search methods and how we can improve them further.
You can find the documentation about Vector Search in the reference documentation for JPA, MongoDB, and Cassandra.