This article will teach you how to create a Spring Boot application that uses RAG (Retrieval Augmented Generation) and a vector store with Spring AI. We will continue the stock data experiments started in my previous article about Spring AI. This is the third part of my series of articles about Spring Boot and AI, so it is worth reading the following posts before proceeding with the current one:

  1. https://piotrminkowski.com/2025/01/28/getting-started-with-spring-ai-and-chat-model: The first tutorial introduces the Spring AI project and its support for building applications based on chat models like OpenAI or Mistral AI.
  2. https://piotrminkowski.com/2025/01/30/getting-started-with-spring-ai-function-calling: The second tutorial shows Spring AI support for Java function calling with the OpenAI chat model.

This article shows how to add one of the vector stores supported by Spring AI, together with the advisors dedicated to RAG, to the sample application codebase used in the two previous articles. The app connects to the OpenAI API, but you can easily switch to other models using the Mistral AI or Ollama support in Spring AI. For more details, please refer to my first article.

Source Code

If you would like to try it yourself, you can always take a look at my source code. To do that, clone my sample GitHub repository and follow my instructions.

Motivation for RAG with Spring AI

The problem to solve is similar to the one described in my previous article about the Spring AI function calling feature. Since the OpenAI model is trained on a static dataset, it does not have direct access to online services or APIs. We want it to analyze stock growth trends for the biggest companies on the US stock market. Therefore, we must obtain share prices from a public API that returns live stock market data. Then, we can store this data in our own database and integrate it with the sample Spring Boot AI application. Instead of a typical relational database, we will use a vector store. Queries in vector databases work differently from those in traditional relational databases: instead of looking for exact matches, they perform similarity searches, retrieving the vectors most similar to a given input vector.
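
To picture how such a similarity search works, here is a toy example that is not part of the sample application: it ranks two made-up document vectors by their cosine similarity to a query vector, which is essentially the kind of ranking a vector store performs on real embeddings.

public class CosineSimilarityDemo {

    // Cosine similarity: the cosine of the angle between two vectors (1.0 = same direction)
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] query = {0.9, 0.1, 0.0};
        double[] docA = {0.8, 0.2, 0.1};  // points in a similar direction -> high score
        double[] docB = {0.0, 0.3, 0.9};  // points in a different direction -> low score
        System.out.printf("docA: %.3f, docB: %.3f%n", cosine(query, docA), cosine(query, docB));
    }
}

Java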

After loading all required data into a vector database, we must integrate it with the AI model. Spring AI provides a convenient mechanism for that based on the Advisors API. We have already used some built-in advisors in the previous examples, e.g. to print detailed AI communication logs or to enable chat memory. This time, advisors will allow us to implement the Retrieval Augmented Generation (RAG) technique in our app. Thanks to that, the Spring Boot app will retrieve the documents that best match a user query before sending a request to the AI model. These documents provide context for the query and are sent to the AI model alongside the user’s question.

Here’s a simplified visualization of our process.

Vector Store with Spring AI

Set up Pinecone Database

In this section, we will prepare a vector store, integrate it with our Spring Boot application, and load some data into it. Spring AI supports various vector databases and provides the VectorStore interface for interacting with them directly from a Spring Boot app. The full list of supported databases can be found in the Spring AI docs here.
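
As a quick orientation, here is a minimal sketch (not the article’s final code) of the two VectorStore operations we will rely on: adding documents and running a similarity search. It assumes the VectorStore bean is already auto-configured, and the document text is an illustrative JSON payload; the real app serializes a Stock object, as shown later.

import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;

class VectorStoreSketch {

    void demo(VectorStore store) {
        // Each document is converted to an embedding vector by the configured embedding model
        Document doc = Document.builder()
                .id("AAPL")
                .text("{\"company\":\"AAPL\",\"prices\":[241.8,239.1]}")  // illustrative payload
                .build();
        store.add(List.of(doc));

        // A similarity search returns the documents whose vectors are closest to the query
        List<Document> results = store.similaritySearch(SearchRequest.builder()
                .query("Find the most growth trends")
                .topK(2)
                .build());
        results.forEach(d -> System.out.println(d.getId()));
    }
}

Java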

We will proceed with the Pinecone database. It is a popular cloud-based vector database that allows us to store and search vectors efficiently. Instead of a cloud-based database, we could set up a local instance of another popular vector store, ChromaDB. In that case, you can use the docker-compose.yml file in the repository root directory and run the database with the docker compose up command. With Pinecone, we need to sign up for an account on their portal and then create an index. Several customizations are available, but the most important thing is to choose the right embedding model. Since text-embedding-ada-002 is the default embedding model for OpenAI, we should choose that option. The name of our index is spring-ai. We can read the environment and project names from the generated host URL.


After creating an index, we should generate an API key.


Then, we will copy the generated key and export it as the PINECONE_TOKEN environment variable.

export PINECONE_TOKEN=<YOUR_PINECONE_TOKEN>

ShellSession

Integrate Spring Boot App with Pinecone using Spring AI

Our Spring Boot application must include the spring-ai-pinecone-store-spring-boot-starter dependency to smoothly integrate with the Pinecone vector store.


<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-pinecone-store-spring-boot-starter</artifactId>
</dependency>

XML

Then, we must provide the connection settings and credentials for the Pinecone database in the Spring Boot application.properties file. At a minimum, these include the Pinecone API key, environment name, project ID, and index name.

spring.ai.vectorstore.pinecone.api-key=${PINECONE_TOKEN}
spring.ai.vectorstore.pinecone.environment=aped-4627-b74a
spring.ai.vectorstore.pinecone.project-id=fsbak04
spring.ai.vectorstore.pinecone.index-name=spring-ai

Properties

After providing all the required configuration settings, we can inject and use the VectorStore bean, e.g. in our application’s REST controller. In the following code fragment, we load the input data into the vector store and perform a simple similarity search for the stock with the strongest growth trend. We query the Twelve Data API individually for each company on the list and deserialize the response into a StockData object. Then we create a Spring AI Document object containing the company name and the closing share prices for the last 10 days, serialized as JSON.

@RestController
@RequestMapping("/stocks")
public class StockController {

    private final ObjectMapper mapper = new ObjectMapper();
    private final static Logger LOG = LoggerFactory.getLogger(StockController.class);
    private final RestTemplate restTemplate;
    private final VectorStore store;

    @Value("${STOCK_API_KEY}")
    private String apiKey;

    public StockController(VectorStore store,
                           RestTemplate restTemplate) {
        this.store = store;
        this.restTemplate = restTemplate;
    }

    @PostMapping("/load-data")
    void load() throws JsonProcessingException {
        final List<String> companies = List.of("AAPL", "MSFT", "GOOG", "AMZN", "META", "NVDA");
        for (String company : companies) {
            // Fetch the last 10 daily quotes for the company from the Twelve Data API
            StockData data = restTemplate.getForObject("https://api.twelvedata.com/time_series?symbol={0}&interval=1day&outputsize=10&apikey={1}",
                    StockData.class,
                    company,
                    apiKey);
            if (data != null && data.getValues() != null) {
                var list = data.getValues().stream().map(DailyStockData::getClose).toList();
                // One document per company: the symbol plus its closing prices, serialized as JSON
                var doc = Document.builder()
                        .id(company)
                        .text(mapper.writeValueAsString(new Stock(company, list)))
                        .build();
                store.add(List.of(doc));
                LOG.info("Document added: {}", company);
            }
        }
    }
    
    @GetMapping("/docs")
    List<Document> query() {
        // Ask the vector store for the two documents most similar to the query
        SearchRequest searchRequest = SearchRequest.builder()
                .query("Find the most growth trends")
                .topK(2)
                .build();
        List<Document> docs = store.similaritySearch(searchRequest);
        return docs;
    }
    
}

Java

Once we start our application and call the POST /stocks/load-data endpoint, we should see 6 records loaded into the target store. You can verify the content of the database in the Pinecone index browser.


Then we can interact directly with a vector store by calling the GET /stocks/docs endpoint.

curl http://localhost:8080/stocks/docs

ShellSession

Implement RAG with Spring AI

Use QuestionAnswerAdvisor

Previously, we loaded data into the target vector store and performed a simple search for the strongest growth trend. Our main goal in this section is to incorporate the relevant data into the AI model prompt. We can implement RAG with Spring AI in two ways, using different advisors. Let’s begin with QuestionAnswerAdvisor. To perform RAG, we must provide an instance of QuestionAnswerAdvisor to the ChatClient. Its constructor takes the VectorStore instance as an input argument.
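
The chatClient used below is created in the controller constructor. For reference, here is a trimmed version of that constructor, taken from the full listing shown later in this article (the query-rewrite builder used in the last section is omitted at this stage):

private final ChatClient chatClient;

public StockController(ChatClient.Builder chatClientBuilder,
                       VectorStore store,
                       RestTemplate restTemplate) {
    // SimpleLoggerAdvisor logs every request and response exchanged with the AI model
    this.chatClient = chatClientBuilder
            .defaultAdvisors(new SimpleLoggerAdvisor())
            .build();
    this.store = store;
    this.restTemplate = restTemplate;
}

Java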

@RequestMapping("/v1/most-growth-trend")
String getBestTrend() {
   PromptTemplate pt = new PromptTemplate("""
            {query}.
            Which {target} is the most % growth?
            The 0 element in the prices table is the latest price, while the last element is the oldest price.
            """);

   Prompt p = pt.create(
            Map.of("query", "Find the most growth trends",
                   "target", "share")
   );

   return this.chatClient.prompt(p)
            .advisors(new QuestionAnswerAdvisor(store))
            .call()
            .content();
}

Java

Then, we can call the GET /stocks/v1/most-growth-trend endpoint to see the AI model response. As it turns out, the result is not accurate.
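
To try it yourself, assuming the application runs locally on the default port:

curl http://localhost:8080/stocks/v1/most-growth-trend

ShellSession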


Let’s improve our previous code a little. We will publish a new version of the AI prompt under the GET /stocks/v1-1/most-growth-trend endpoint. This time, we build a SearchRequest object that returns the top 3 records with a 0.7 similarity threshold. The newly created SearchRequest object is then passed as the second argument to the QuestionAnswerAdvisor constructor.

@RequestMapping("/v1-1/most-growth-trend")
String getBestTrendV11() {
   PromptTemplate pt = new PromptTemplate("""
            Which share is the most % growth?
            The 0 element in the prices table is the latest price, while the last element is the oldest price.
            Return a full name of company instead of a market shortcut. 
            """);

   SearchRequest searchRequest = SearchRequest.builder()
            .query("""
            Find the most growth trends.
            The 0 element in the prices table is the latest price, while the last element is the oldest price.
            """)
            .topK(3)
            .similarityThreshold(0.7)
            .build();

   return this.chatClient.prompt(pt.create())
            .advisors(new QuestionAnswerAdvisor(store, searchRequest))
            .call()
            .content();
}

Java

Now, the results are more accurate. The model also returns the full names of the companies instead of their market ticker symbols.


Use RetrievalAugmentationAdvisor

Instead of the QuestionAnswerAdvisor class, we can also use the experimental RetrievalAugmentationAdvisor. It provides an out-of-the-box implementation for the most common RAG flows, based on a modular architecture. There are several built-in modules we can use with RetrievalAugmentationAdvisor. We will include the RewriteQueryTransformer module, which uses an LLM to rewrite the user query so it produces better results when querying the target vector database. It requires the {query} and {target} placeholders to be present in the prompt template. Thanks to this transformer, we can retrieve an optimal set of records for the percentage growth calculation.

@RestController
@RequestMapping("/stocks")
public class StockController {

    private final ObjectMapper mapper = new ObjectMapper();
    private final static Logger LOG = LoggerFactory.getLogger(StockController.class);
    private final ChatClient chatClient;
    private final RewriteQueryTransformer.Builder rqtBuilder;
    private final RestTemplate restTemplate;
    private final VectorStore store;

    @Value("${STOCK_API_KEY}")
    private String apiKey;

    public StockController(ChatClient.Builder chatClientBuilder,
                           VectorStore store,
                           RestTemplate restTemplate) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        // The RewriteQueryTransformer uses its own ChatClient (built from the same builder) to rewrite queries
        this.rqtBuilder = RewriteQueryTransformer.builder()
                .chatClientBuilder(chatClientBuilder);
        this.store = store;
        this.restTemplate = restTemplate;
    }
    
    // other methods ...
    
    @RequestMapping("/v2/most-growth-trend")
    String getBestTrendV2() {
        PromptTemplate pt = new PromptTemplate("""
                {query}.
                Which {target} is the most % growth?
                The 0 element in the prices table is the latest price, while the last element is the oldest price.
                """);

        Prompt p = pt.create(Map.of("query", "Find the most growth trends", "target", "share"));

        // Retrieve the top 3 documents above the 0.7 similarity threshold,
        // rewriting the user query with the RewriteQueryTransformer before retrieval
        Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
                .documentRetriever(VectorStoreDocumentRetriever.builder()
                        .similarityThreshold(0.7)
                        .topK(3)
                        .vectorStore(store)
                        .build())
                .queryTransformers(rqtBuilder.promptTemplate(pt).build())
                .build();

        return this.chatClient.prompt(p)
                .advisors(retrievalAugmentationAdvisor)
                .call()
                .content();
    }

}

Java

Once again, we can verify the AI model response by calling the GET /stocks/v2/most-growth-trend endpoint. The response is similar to the one generated by the GET /stocks/v1-1/most-growth-trend endpoint.
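
As before, assuming the application runs locally on the default port:

curl http://localhost:8080/stocks/v2/most-growth-trend

ShellSession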


Run the Application

Just a reminder: before running the application, we must provide the OpenAI, Twelve Data, and Pinecone API tokens.

$ export OPEN_AI_TOKEN=<YOUR_OPEN_AI_TOKEN>
$ export STOCK_API_KEY=<YOUR_STOCK_API_KEY>
$ export PINECONE_TOKEN=<YOUR_PINECONE_TOKEN>
$ mvn spring-boot:run

ShellSession

Final Thoughts

In this article, you learned how to use an important AI technique, Retrieval Augmented Generation (RAG), with Spring AI. Spring AI simplifies RAG by providing built-in support for vector stores and an easy way to incorporate retrieved data into chat model prompts through the Advisors API. However, since RetrievalAugmentationAdvisor is still an experimental feature, we cannot rule out changes in future releases.
