3 best practices for using retrieval-augmented generation (RAG)

Retrieval-augmented generation (RAG) can play a crucial role in improving the quality of a large language model’s outputs.

It can prevent large language models (LLMs) from hallucinating, extend an LLM’s use cases, make LLMs easier to maintain, and more.

Before you can reap the benefits of RAG, however, you need to follow certain best practices over time.

Here are just a few worth implementing.

Continuously evaluate the outputs to spot issues and areas for improvement

Unfortunately, an LLM can stop producing the output you expect and want.

For example, if you make API calls to a base model (e.g., GPT) and the provider changes the endpoint so that your requests start reaching a different model version (e.g., GPT-3.5 instead of GPT-4), the output can worsen.
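If that kind of drift is a concern, one simple guard—shown here as a minimal sketch that assumes you call OpenAI’s chat completions API through the official Python client (the pinned model name is just an example)—is to specify an exact model version in every request and log which model actually served the response, so a silent switch shows up in your monitoring.

```python
from openai import OpenAI  # assumes the official openai Python package (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str, model: str = "gpt-4-0613") -> str:
    """Call the chat completions endpoint with an explicitly pinned model version."""
    response = client.chat.completions.create(
        model=model,  # pin a dated snapshot rather than a floating alias
        messages=[{"role": "user", "content": prompt}],
    )
    # Record which model actually served the request so drift is visible in monitoring.
    print(f"served by: {response.model}")
    return response.choices[0].message.content
```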

Whatever the reason, it’s worth keeping close tabs on the model’s outputs through human oversight and a wide range of tests. Only then can you look for the root issue and address it successfully.

As for testing, you can use—among other tests—consistency testing to see if similar queries produce similar outputs; load testing to see how the retriever and generator perform in high-demand scenarios; and edge-case testing to see how the model handles unusual or unexpected inputs.
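Here’s a rough illustration of what consistency testing might look like in Python. The `rag_answer` function, the similarity measure, and the 0.7 threshold are all placeholders for whatever pipeline and evaluation criteria you actually use.

```python
from difflib import SequenceMatcher

def rag_answer(query: str) -> str:
    """Placeholder for your actual RAG pipeline (retriever + generator)."""
    raise NotImplementedError

def similarity(a: str, b: str) -> float:
    """Crude lexical similarity; swap in an embedding-based metric for real tests."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_consistent(paraphrases: list[str], threshold: float = 0.7) -> bool:
    """Check that near-identical queries produce near-identical answers."""
    answers = [rag_answer(query) for query in paraphrases]
    baseline = answers[0]
    return all(similarity(baseline, other) >= threshold for other in answers[1:])

# Example: these two queries should yield answers that broadly agree.
queries = [
    "How many vacation days do new employees get?",
    "What is the vacation allowance for a new hire?",
]
# assert is_consistent(queries), "Similar queries produced dissimilar answers"
```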

Related: Common examples of RAG

Provide context on how the output was generated

To help users trust the LLM’s outputs, it’s worth appending links to the specific sources that were used and/or a brief description of how the response was generated—all within the output itself.

For example, say you’ve integrated your product with customers’ file storage systems and feed the synced files to an LLM, which can process the files’ information.

You can then power an intranet solution for employers that not only answers employees’ questions using the information in those files but also links out to the specific files used to generate the answers. To give users even more context, the output can mention the specific parts of the files it pulled from.

Screenshot of Assembly's AI feature, "Dora AI"
Assembly, an HR platform, follows the use case described above; its customers can easily visit the documents that its AI feature, “Dora AI”, cites in a given output
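Here’s a minimal sketch of that citation pattern, assuming your retriever returns each passage along with metadata about the file it came from; the `RetrievedChunk` structure, the `generate` callable, and the field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RetrievedChunk:
    text: str       # the passage handed to the LLM as context
    file_name: str  # e.g., "PTO policy.pdf"
    link: str       # deep link back to the file in the customer's storage system
    section: str    # the part of the file the passage came from

def answer_with_citations(question: str, chunks: list[RetrievedChunk], generate) -> str:
    """Generate an answer from the retrieved context, then append the sources used."""
    context = "\n\n".join(chunk.text for chunk in chunks)
    answer = generate(question=question, context=context)  # your LLM call goes here

    # List the specific files (and sections) the answer drew on, with links.
    sources = "\n".join(
        f"- {chunk.file_name} ({chunk.section}): {chunk.link}" for chunk in chunks
    )
    return f"{answer}\n\nSources:\n{sources}"
```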

Related: The top challenges of using RAG

Feed the LLM product integration data

Your success with RAG largely depends on the specific data sources you’re using.

To that end, you should look to use your customers’ data from product integrations.

For example, if a customer integrates their CRM with your product, you could feed the CRM data that’s synced over to your product to an LLM as well.

CRM integration for LLMs

Product integration data offers a wealth of benefits that, taken together, allow an LLM to provide high-quality outputs consistently:

  • The data is more likely to be accurate and up-to-date than other types of data, as it’s often taken from customers’ systems of record—which are consistently maintained
  • The data is often diverse, which helps the LLM support a wider range of use cases more effectively
  • The integrations for syncing the data can be API-based, which allows you to collect data from customers’ systems quickly and in the format an LLM needs
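As a rough sketch of how synced integration data could be handed to an LLM as context, here’s an example in Python; the record fields, the `fetch_synced_crm_records` function, and the prompt format are all hypothetical stand-ins for your own pipeline.

```python
def fetch_synced_crm_records(customer_id: str) -> list[dict]:
    """Placeholder for reading the CRM records your integration has already synced."""
    return [
        {"object": "Opportunity", "name": "Acme renewal", "stage": "Negotiation", "amount": 52000},
        {"object": "Contact", "name": "Jane Doe", "title": "VP of Operations", "account": "Acme"},
    ]

def build_crm_context(customer_id: str) -> str:
    """Flatten the synced CRM records into a text block an LLM can use as context."""
    records = fetch_synced_crm_records(customer_id)
    lines = []
    for record in records:
        fields = ", ".join(f"{key}: {value}" for key, value in record.items() if key != "object")
        lines.append(f"{record['object']}: {fields}")
    return "\n".join(lines)

# The resulting context can be prepended to the user's question before calling the LLM, e.g.:
# prompt = f"Answer using this CRM data:\n{build_crm_context('acme')}\n\nQuestion: {question}"
```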

To help you collect product integration data at scale, you can use a unified API solution.

Using this type of integration solution, you can offer hundreds of integrations from a single build. This allows you to feed a higher share of your customers' data to an LLM, which, in turn, lets you provide personalized AI features to a higher percentage of users.

Moreover, through Merge, the leading unified API solution, you’ll get access to a suite of Integration Observability features. These features, coupled with Merge’s integration maintenance support, help keep your integrations’ downtime to a minimum—which enables an LLM to gather all the data it needs over time.

Finally, since Merge provides normalized data to your product, the LLM will be better positioned to generate high-quality outputs consistently.

{{this-blog-only-cta}}