Recap of Part I
In the first part of this blog series, we explored the basics of integrating Spring AI with large language models. We walked through building a custom ChatClient, leveraging Function Calling for dynamic interactions, and refining our prompts to suit the Spring Petclinic use case. By the end, we had a functional AI assistant capable of understanding and processing requests related to our veterinary clinic domain. Now, in Part II, we’ll go a step further by exploring Retrieval-Augmented Generation (RAG), a technique that enables us to handle large datasets that wouldn’t fit within the constraints of a typical Function Calling approach. Let’s see how RAG can seamlessly integrate AI with domain-specific knowledge.

Retrieval-Augmented Generation
While listing veterinarians could have been a straightforward implementation, I chose this as an opportunity to showcase the power of Retrieval-Augmented Generation (RAG). RAG integrates large language models with real-time data retrieval to produce more accurate and contextually relevant text. Although this concept aligns with our previous work, RAG typically emphasizes data retrieval from a vector store. A vector store contains data in the form of embeddings—numerical representations that capture the meaning of the information, such as the data about our veterinarians. These embeddings are stored as high-dimensional vectors, facilitating efficient similarity searches based on semantics rather than traditional text-based searches. For instance, consider the following veterinarians and their specialties:

- Dr. Alice Brown - Cardiology
- Dr. Bob Smith - Dentistry
- Dr. Carol White - Dermatology
Fun fact - this example was generated by ChatGPT itself.

In essence, similarity searches operate by finding the source data whose numerical values are nearest to those of the search query; the closest matches are returned. The process of transforming text into these numerical embeddings is also handled by the LLM.
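To make “nearest” a bit more concrete: a common way to measure closeness between two embedding vectors is cosine similarity. Here is a toy sketch of the idea (illustrative only; this is not Spring AI or vector store code):

```java
// Toy illustration of how "nearest" is measured between two embedding vectors.
// Real vector stores use optimized implementations of this same idea.
static double cosineSimilarity(float[] a, float[] b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];   // direction agreement
        normA += a[i] * a[i]; // magnitude of a
        normB += b[i] * b[i]; // magnitude of b
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB)); // closer to 1.0 = more similar
}
```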
Generating Test Data
Utilizing a vector store is most effective when handling a substantial amount of data. Given that six veterinarians can easily be processed in a single call to the LLM, I aimed to increase the number to 256. While even 256 may still be relatively small, it serves well for illustrating our process. Veterinarians in this setup can have zero, one, or two specialties, mirroring the original examples from Spring Petclinic. To avoid the tedious task of creating all this mock data manually, I enlisted ChatGPT’s assistance. It generated a union query that produces 250 veterinarians and assigns specialties to 80% of them; the result lives in the data.sql file.
Embedding the Test Data
We have several options available for the vector store itself. Postgres with the pgVector extension is probably the most popular choice. Greenplum—a massively parallel Postgres database—also supports pgVector. The Spring AI reference documentation lists the currently supported vector stores. For our simple use case, I opted to use the Spring AI-provided SimpleVectorStore. This class implements a vector store using a straightforward Java ConcurrentHashMap, which is more than sufficient for our small dataset of 256 vets. The configuration for this vector store, along with the chat memory implementation, is defined in the AIBeanConfiguration class, annotated with @Configuration:
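As a rough sketch (bean shapes and constructors differ slightly across Spring AI milestone versions, so treat this as an approximation rather than the exact Petclinic code), the configuration might look like this:

```java
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.InMemoryChatMemory;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.vectorstore.SimpleVectorStore;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AIBeanConfiguration {

    // SimpleVectorStore keeps embeddings in an in-memory map; the EmbeddingModel
    // is auto-configured by the Spring AI starter. (Newer Spring AI releases use
    // SimpleVectorStore.builder(embeddingModel).build() instead.)
    @Bean
    public VectorStore vectorStore(EmbeddingModel embeddingModel) {
        return new SimpleVectorStore(embeddingModel);
    }

    // The in-memory chat memory implementation mentioned above
    @Bean
    public ChatMemory chatMemory() {
        return new InMemoryChatMemory();
    }
}
```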
The embedding itself is handled by the VectorStoreController bean, which includes an @EventListener that listens for the ApplicationStartedEvent. This method is automatically invoked by Spring as soon as the application is up and running, ensuring that the veterinarian data is embedded into the vector store at the appropriate time:
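Here is a minimal sketch of such a listener, assuming Petclinic’s VetRepository for data access (the actual code differs in detail, but the steps are the same):

```java
import java.io.File;
import java.io.UncheckedIOException;
import java.util.List;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.JsonReader;
import org.springframework.ai.vectorstore.SimpleVectorStore;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.boot.context.event.ApplicationStartedEvent;
import org.springframework.context.event.EventListener;
import org.springframework.core.io.ByteArrayResource;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Controller;

@Controller
public class VectorStoreController {

    private final VectorStore vectorStore;
    private final VetRepository vetRepository;

    public VectorStoreController(VectorStore vectorStore, VetRepository vetRepository) {
        this.vectorStore = vectorStore;
        this.vetRepository = vetRepository;
    }

    @EventListener
    public void loadVetDataToVectorStoreOnStartup(ApplicationStartedEvent event) {
        // Fetch all vets from the database and turn them into an in-memory JSON resource
        Resource vetsAsJson = convertListToJsonResource(List.copyOf(vetRepository.findAll()));

        // Read the JSON into Spring AI Document entities (embedded vector + original text)
        List<Document> documents = new JsonReader(vetsAsJson).get();

        // Embed each document and store it; this issues one embedding call per document
        vectorStore.add(documents);

        // Optionally persist the store so we can inspect it and reuse it later
        if (vectorStore instanceof SimpleVectorStore simpleVectorStore) {
            simpleVectorStore.save(new File("vectorstore.json"));
        }
    }

    // Helper: serialize the vets to JSON with Jackson, wrapped as a Spring Resource
    private Resource convertListToJsonResource(List<Vet> vets) {
        ObjectMapper objectMapper = new ObjectMapper();
        try {
            return new ByteArrayResource(objectMapper.writeValueAsBytes(vets));
        } catch (JsonProcessingException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```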
- Similar to listOwners, we begin by retrieving all vets from the database.
- Spring AI embeds entities of type Document into the vector store. A Document represents the embedded numerical data alongside its original, human-readable text data. This dual representation allows our code to map correlations between the embedded vectors and the natural text.
- To create these Document entities, we need to convert our Vet entities into a textual format. Spring AI provides two built-in readers for this purpose: JsonReader and TextReader. Since our Vet entities are structured data, it makes sense to represent them as JSON. To achieve this, we use the helper method convertListToJsonResource (see the sketch above), which leverages the Jackson parser to convert the list of vets into an in-memory JSON resource.
- Next, we call the add(documents) method on the vector store. This method is responsible for embedding the data: it iterates over the list of documents (our vets in JSON format) and embeds each one while associating the original metadata with it.
- Though not strictly required, we also generate a vectorstore.json file, which represents the state of our SimpleVectorStore database. This file allows us to observe how Spring AI interprets the stored data behind the scenes.

Let’s take a look at the generated file to understand what Spring AI sees.
The file contains each Vet in JSON format alongside a set of numbers that, while they might not make much sense to us, are highly meaningful to the LLM. These numbers represent the embedded vector data, which the model uses to understand the relationships and semantics of the Vet entity in a way that goes far beyond simple text matching.
Optimizing for Cost and Fast Startup
If we were to run this embedding method on every application restart, it would lead to two significant drawbacks:

- Long startup times: each Vet JSON document would need to be re-embedded by making calls to the LLM again, delaying application readiness.
- Increased costs: embedding 256 documents would send 256 requests to the LLM every time the app starts, leading to unnecessary usage of LLM credits.
To avoid this, the generated vectorstore.json file is placed under src/main/resources, ensuring that the application always loads the pre-embedded vector store on startup rather than re-embedding the data from scratch. If we ever need to regenerate the vector store, we can simply delete the existing vectorstore.json file and restart the application. Once the updated vector store is generated, we place the new vectorstore.json file back into src/main/resources. This approach gives us flexibility while avoiding unnecessary re-embedding during regular restarts.
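Concretely, the startup listener can short-circuit when the saved store is present. A sketch building on the listener shown earlier (embedVetData() is a hypothetical helper wrapping the read-and-add steps):

```java
// Sketch: guard the embedding with an existence check; names are illustrative
@EventListener
public void loadVetDataToVectorStoreOnStartup(ApplicationStartedEvent event) {
    Resource savedStore = new ClassPathResource("vectorstore.json");
    if (savedStore.exists()) {
        // Reuse the pre-embedded data: no LLM calls, fast startup
        ((SimpleVectorStore) vectorStore).load(savedStore);
        return;
    }
    // First run, or vectorstore.json was deleted: embed from scratch and persist
    embedVetData(); // hypothetical helper wrapping the read-and-add steps shown earlier
    ((SimpleVectorStore) vectorStore).save(new File("src/main/resources/vectorstore.json"));
}
```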
Implementing Similarity Search
With our vector store ready, implementing the listVets function becomes straightforward. Here is the implementation in AIDataProvider:
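What follows is a minimal sketch of that implementation. The VetRequest and VetResponse records are assumed shapes based on the surrounding text, and the SearchRequest calls match the Spring AI milestone API (newer releases replace them with a builder and rename getContent() to getText()):

```java
import java.util.List;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;

// Sketch of AIDataProvider.listVets; vectorStore is the injected VectorStore bean
public VetResponse listVets(VetRequest request) throws JsonProcessingException {
    // The vector store holds the vets as JSON, so express the query the same way
    ObjectMapper objectMapper = new ObjectMapper();
    String vetAsJson = objectMapper.writeValueAsString(request.vet());

    SearchRequest sr = SearchRequest.query(vetAsJson).withTopK(20);
    if (request.vet() == null) {
        // No filter criteria provided: raise topK for broad queries like
        // "list the vets in the clinic"
        sr = sr.withTopK(50);
    }

    // getContent() holds the original Vet JSON rather than the embedded vector data
    List<Document> topMatches = vectorStore.similaritySearch(sr);
    return new VetResponse(topMatches.stream().map(Document::getContent).toList());
}

// Assumed shapes of the request/response records used by the function callback
record VetRequest(Vet vet) {}
record VetResponse(List<String> vetData) {}
```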
- We start with a Vet entity in the request. Since the records in our vector store are represented as JSON, the first step is to convert the Vet entity into JSON as well.
- Next, we create a SearchRequest, which is the parameter passed to the similaritySearch method of the vector store. The SearchRequest allows us to fine-tune the search based on our specific needs. In this case, we mostly use the defaults, except for the topK parameter, which determines how many results to return. By default, this is set to 4, but in our case we increase it to 20. This lets us handle broader queries like “How many vets specialize in cardiology?”
- If no filters are provided in the request (i.e., the Vet entity is empty), we increase the topK value to 50. This enables us to return up to 50 vets for queries like “list the vets in the clinic.” Of course, this won’t be the entire list, as we want to avoid overwhelming the LLM with too much data. However, we should be fine, because we carefully fine-tuned the system text to manage these cases.
- The final step is to call the similaritySearch method. We then map the getContent() of each returned result, as this contains the actual Vet JSONs rather than the embedded data.
It looks like our system text is functioning as expected, preventing any overload. Now, let’s try providing some specific criteria:
The data returned from the LLM is exactly what we expect. Let’s try a broader question:
The LLM successfully identified at least 20 vets specializing in cardiology, adhering to our defined upper limit of topK (20). However, if there’s any uncertainty about the results, the LLM notes that there may be additional vets available, as specified in our system text.
Implementing the UI
Implementing the chatbot UI involves working with Thymeleaf, JavaScript, CSS, and the SCSS preprocessor. After reviewing the code, I decided to place the chatbot in a location accessible from any tab, making layout.html the ideal choice.
During discussions about the PR with Dr. Dave Syer, I realized that I shouldn’t modify petclinic.css directly, as Spring Petclinic utilizes an SCSS preprocessor to generate the CSS file.
I’ll admit—I’m primarily a backend Spring developer with a career focused on Spring, cloud architecture, Kubernetes, and Cloud Foundry. While I have some experience with Angular, I’m not an expert in frontend development. I could probably come up with something, but it likely wouldn’t look polished.
Fortunately, I had a great partner for pair programming—ChatGPT. If you’re interested in how I developed the UI code, you can check out this ChatGPT session. It’s remarkable how much you can learn from collaborating with large language models on coding exercises. Just remember to thoroughly review the suggestions instead of blindly copy-pasting them.