3 best practices for using retrieval-augmented generation (RAG)

Retrieval-augmented generation (RAG) can play a crucial role in improving the quality of a large language model’s outputs.

It can prevent large language models (LLMs) from hallucinating, extend an LLM’s use cases, make LLMs easier to maintain, and more.

Before you can reap the benefits of RAG, however, you need to follow certain best practices over time.

Here are just a few worth implementing.

Continuously evaluate the outputs to spot issues and areas for improvement

Unfortunately, an LLM can stop producing the output you expect and want.

For example, if you make API calls to a base model (e.g., GPT) and the provider changes the endpoint so that your requests start reaching a different model version (e.g., GPT-3.5 instead of GPT-4), the output can worsen.
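If that kind of drift is a concern, one simple guard—shown here as a minimal sketch that assumes you call OpenAI’s chat completions API through the official Python client (the pinned model name is just an example)—is to specify an exact model version in every request and log which model actually served the response, so a silent switch shows up in your monitoring.

```python
from openai import OpenAI  # assumes the official openai Python package (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str, model: str = "gpt-4-0613") -> str:
    """Call the chat completions endpoint with an explicitly pinned model version."""
    response = client.chat.completions.create(
        model=model,  # pin a dated snapshot rather than a floating alias
        messages=[{"role": "user", "content": prompt}],
    )
    # Record which model actually served the request so drift is visible in monitoring.
    print(f"served by: {response.model}")
    return response.choices[0].message.content
```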

Whatever the reason, it’s worth keeping close tabs on the model’s outputs through human oversight and a wide range of tests. Only then can you look for the root issue and address it successfully.

As for testing, you can use—among other tests—consistency testing to see if similar queries produce similar outputs; load testing to see how the retriever and generator perform in high-demand scenarios; and edge-case testing to see how the model handles unusual or unexpected inputs.
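Here’s a rough illustration of what consistency testing might look like in Python. The `rag_answer` function, the similarity measure, and the 0.7 threshold are all placeholders for whatever pipeline and evaluation criteria you actually use.

```python
from difflib import SequenceMatcher

def rag_answer(query: str) -> str:
    """Placeholder for your actual RAG pipeline (retriever + generator)."""
    raise NotImplementedError

def similarity(a: str, b: str) -> float:
    """Crude lexical similarity; swap in an embedding-based metric for real tests."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_consistent(paraphrases: list[str], threshold: float = 0.7) -> bool:
    """Check that near-identical queries produce near-identical answers."""
    answers = [rag_answer(query) for query in paraphrases]
    baseline = answers[0]
    return all(similarity(baseline, other) >= threshold for other in answers[1:])

# Example: these two queries should yield answers that broadly agree.
queries = [
    "How many vacation days do new employees get?",
    "What is the vacation allowance for a new hire?",
]
# assert is_consistent(queries), "Similar queries produced dissimilar answers"
```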

Related: Common examples of RAG

Provide context on how the output was generated

To help users trust the LLM’s outputs, it’s worth appending links to the specific sources that were used and/or a brief description of how the response was generated—all within the output itself.

For example, say you’ve integrated your product with customers’ file storage systems and feed the synced files to an LLM, which can process the files’ information.

You can then power an intranet solution for employers that not only answers employees’ questions using the information in those files but also links out to the specific files used to generate the answers. To give users even more context, the output can mention the specific parts of the files it pulled from.

Screenshot of Assembly's AI feature, "Dora AI"
Assembly, an HR platform, follows the use case described above; its customers can easily visit the documents that its AI feature, “Dora AI”, cites in a given output
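Here’s a minimal sketch of that citation pattern, assuming your retriever returns each passage along with metadata about the file it came from; the `RetrievedChunk` structure, the `generate` callable, and the field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RetrievedChunk:
    text: str       # the passage handed to the LLM as context
    file_name: str  # e.g., "PTO policy.pdf"
    link: str       # deep link back to the file in the customer's storage system
    section: str    # the part of the file the passage came from

def answer_with_citations(question: str, chunks: list[RetrievedChunk], generate) -> str:
    """Generate an answer from the retrieved context, then append the sources used."""
    context = "\n\n".join(chunk.text for chunk in chunks)
    answer = generate(question=question, context=context)  # your LLM call goes here

    # List the specific files (and sections) the answer drew on, with links.
    sources = "\n".join(
        f"- {chunk.file_name} ({chunk.section}): {chunk.link}" for chunk in chunks
    )
    return f"{answer}\n\nSources:\n{sources}"
```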

Related: The top challenges of using RAG

Feed the LLM product integration data

Your success with RAG largely depends on the specific data sources you’re using.

To that end, you should look to use your customers’ data from product integrations.

For example, if a customer integrates their CRM with your product, you could feed the CRM data that’s synced over to your product to an LLM as well.

CRM integration for LLMs

Product integration data offers a wealth of benefits that, taken together, allow an LLM to provide high-quality outputs consistently:

  • The data is more likely to be accurate and up-to-date than other types of data, as it’s often taken from customers’ systems of record—which are consistently maintained
  • The data is often diverse, which helps the LLM support a wider range of use cases more effectively
  • The integrations for syncing the data can be API-based, which allows you to collect data from customers’ systems quickly and in the format an LLM needs
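As a rough sketch of how synced integration data could be handed to an LLM as context, here’s an example in Python; the record fields, the `fetch_synced_crm_records` function, and the prompt format are all hypothetical stand-ins for your own pipeline.

```python
def fetch_synced_crm_records(customer_id: str) -> list[dict]:
    """Placeholder for reading the CRM records your integration has already synced."""
    return [
        {"object": "Opportunity", "name": "Acme renewal", "stage": "Negotiation", "amount": 52000},
        {"object": "Contact", "name": "Jane Doe", "title": "VP of Operations", "account": "Acme"},
    ]

def build_crm_context(customer_id: str) -> str:
    """Flatten the synced CRM records into a text block an LLM can use as context."""
    records = fetch_synced_crm_records(customer_id)
    lines = []
    for record in records:
        fields = ", ".join(f"{key}: {value}" for key, value in record.items() if key != "object")
        lines.append(f"{record['object']}: {fields}")
    return "\n".join(lines)

# The resulting context can be prepended to the user's question before calling the LLM, e.g.:
# prompt = f"Answer using this CRM data:\n{build_crm_context('acme')}\n\nQuestion: {question}"
```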

To help you collect product integration data at scale, you can use a unified API solution.

Using this type of integration solution, you can offer hundreds of integrations from a single build. This allows you to feed a higher share of your customers' data to an LLM, which, in turn, lets you provide personalized AI features to a higher percentage of users.

Moreover, through Merge, the leading unified API solution, you’ll get access to a suite of Integration Observability features. These features, coupled with Merge’s integration maintenance support, help keep your integrations’ downtime to a minimum—which enables an LLM to gather all the data it needs over time.

Finally, since Merge provides normalized data to your product, the LLM will be better positioned to generate high-quality outputs consistently.

{{this-blog-only-cta}}