5 challenges of using retrieval-augmented generation (RAG)

Retrieval-augmented generation can help large language models (LLMs) generate more reliable, personalized, and valuable outputs.

But reaping these benefits isn’t a given. 

There are several challenges to using the technique, and you’ll need to understand and address them proactively before you’re able to leverage RAG effectively.

Let’s take a closer look at some of these top challenges.

Related: How to use RAG effectively

Building and maintaining integrations

To help an LLM access third-party data, you’ll need to connect to the associated third-party data source(s).

For instance, you might need to build a screen scraper that copies certain text from a site on a recurring cadence and makes it available to the LLM. Or, to use another example, you might need to build against a SaaS application’s API endpoints to access specific data and feed it to the LLM consistently.
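To make that concrete, here’s a minimal sketch of what one such connection might look like. It assumes a hypothetical SaaS endpoint (`https://api.example-saas.com/v1/articles`), a hypothetical response shape, and a simple in-memory dictionary standing in for whatever index your RAG pipeline actually retrieves from; a production integration would also need pagination, rate limiting, auth refresh, and error handling, which is where the ongoing engineering cost comes in.

```python
import time
import requests  # any HTTP client works; requests is used here for brevity

API_URL = "https://api.example-saas.com/v1/articles"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"
SYNC_INTERVAL_SECONDS = 3600  # re-sync hourly

knowledge_base = {}  # stand-in for the vector store / index your RAG pipeline reads from

def sync_articles():
    """Pull the latest records from the SaaS API and upsert them into the knowledge base."""
    response = requests.get(
        API_URL, headers={"Authorization": f"Bearer {API_KEY}"}, timeout=30
    )
    response.raise_for_status()
    for article in response.json().get("results", []):  # hypothetical response shape
        # Key on the record ID so re-syncs update existing entries instead of duplicating them
        knowledge_base[article["id"]] = {
            "text": article["body"],
            "source": article["url"],
            "updated_at": article["updated_at"],
        }

if __name__ == "__main__":
    while True:
        sync_articles()
        time.sleep(SYNC_INTERVAL_SECONDS)
```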

Whatever the case might be, the process of implementing and maintaining these connections requires significant technical resources. You might end up having to reallocate several engineers to this, which can prevent them from focusing on your core product.

Failing to perform retrieval operations quickly 

Several factors can prevent your retrieval operations from working quickly (which, in turn, delays response generation), such as:

  • The size of the data source
  • Network delays
  • The number of data sources that need to be accessed
  • The number of queries a retrieval system needs to perform 

Regardless of the cause, the retrieval operation can ultimately fail to work quickly enough to meet your needs and those of your end users (e.g., customers).
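One common mitigation is to query your data sources concurrently and enforce a per-source latency budget so that a single slow source can’t hold up the whole response. Below is a minimal sketch using Python’s asyncio; the source names and the `search_source` stub are placeholders for whatever retrieval calls your system actually makes, and the timeout value is an assumption you’d tune to your own latency target.

```python
import asyncio

RETRIEVAL_TIMEOUT_SECONDS = 2.0  # per-source budget; tune to your latency target

async def search_source(source_name: str, query: str) -> list[str]:
    """Stand-in for querying one data source (vector DB, SaaS API, internal search, etc.)."""
    await asyncio.sleep(0.5)  # simulate network / query time
    return [f"result from {source_name} for '{query}'"]

async def retrieve(query: str, sources: list[str]) -> list[str]:
    """Query every source concurrently and drop any that exceed the latency budget."""
    tasks = [
        asyncio.wait_for(search_source(name, query), RETRIEVAL_TIMEOUT_SECONDS)
        for name in sources
    ]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    passages = []
    for result in results:
        if isinstance(result, Exception):
            continue  # a slow or failed source shouldn't block the whole response
        passages.extend(result)
    return passages

if __name__ == "__main__":
    print(asyncio.run(retrieve("vacation policy", ["hr_docs", "wiki", "help_center"])))
```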

Configuring the output to include the source

To help users trust an LLM’s output and explore the answer further, you can append the specific data source(s) used to generate a particular output.

Screenshot of Assembly’s Dora AI citing a source document
Assembly, an HR platform, follows this approach: its customers can easily visit the document that its AI feature, Dora AI, cites in a given output.

Adding the correct source to any output, however, can prove complex. The LLM needs to correctly identify the source behind each output, and if several sources are used, this becomes even more difficult.

Your LLM will also need to place the source in a section of the output that doesn’t disrupt the flow of the text. And if multiple sources are used, the LLM needs to make clear to the end user which part of the output came from which source, which can be difficult to get right.
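One common pattern is to carry source metadata alongside each retrieved chunk, number the chunks in the prompt so the model can reference them, and render a matching source list with the answer. The sketch below assumes hypothetical chunk fields (`text`, `title`, `url`) and is a simplification of what citation handling looks like in practice.

```python
def build_prompt_and_citations(question: str, retrieved_chunks: list[dict]) -> tuple[str, str]:
    """Number each retrieved chunk so the model can cite it, and build a matching source list."""
    context_lines = []
    citation_lines = []
    for i, chunk in enumerate(retrieved_chunks, start=1):
        context_lines.append(f"[{i}] {chunk['text']}")
        citation_lines.append(f"[{i}] {chunk['title']} ({chunk['url']})")
    prompt = (
        "Answer the question using only the numbered context below, and cite the "
        "relevant numbers in brackets after each claim.\n\n"
        "Context:\n" + "\n".join(context_lines) + f"\n\nQuestion: {question}"
    )
    sources_footer = "Sources:\n" + "\n".join(citation_lines)
    return prompt, sources_footer

# Example usage with hypothetical chunks carrying source metadata
chunks = [
    {"text": "PTO requests are approved by managers.", "title": "PTO Policy", "url": "https://example.com/pto"},
    {"text": "New hires accrue 15 days per year.", "title": "Benefits Guide", "url": "https://example.com/benefits"},
]
prompt, sources = build_prompt_and_citations("How is PTO approved?", chunks)
# The prompt goes to the LLM; the sources footer is appended to (or rendered beside) its answer.
```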

Related: RAG best practices

Accessing sensitive data

Certain third-party sources can include personally identifiable information (PII).

Without taking the proper precautions when accessing and handling this sensitive data, you can end up violating privacy laws and regulations, such as GDPR or HIPAA.

This, in turn, can harm your business in a variety of tangible and intangible ways, from significant fines to lost customer trust and a damaged reputation in the market.
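One basic precaution is to redact or mask PII in retrieved text before it ever reaches the model (or your prompt logs). The sketch below uses a few illustrative regexes as an assumption of what detection might look like; in practice you’d likely reach for a dedicated PII-detection library or service and pair redaction with access controls on the underlying sources.

```python
import re

# Very rough patterns for illustration only; real deployments typically rely on a
# dedicated PII-detection service or library rather than hand-rolled regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace likely PII with placeholder tokens before the text reaches the LLM or its logs."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text

print(redact_pii("Contact Jane at jane.doe@example.com or 555-867-5309."))
# -> "Contact Jane at [REDACTED_EMAIL] or [REDACTED_PHONE]."
```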

Using unreliable data sources 

Countless sites may seem credible but suffer from any combination of the following issues:

  • False or inaccurate information
  • Incomplete coverage of a topic
  • Content that isn’t updated over time
  • Biased information
  • Extensive and lengthy outages

Grounding an LLM in data sources with any of these flaws can lead the model to hallucinate (when the retrieved context doesn’t actually cover the input) or to generate false output based on inaccurate context.
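A simple guardrail, sketched below under the assumption that each retrieved document carries a domain and a last-updated timestamp, is to filter retrieved context against an allowlist of vetted sources and a freshness threshold before it reaches the model. It won’t catch subtle bias or gaps in coverage, but it screens out the most obviously unreliable material.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy: only pull context from vetted domains, and skip anything stale.
TRUSTED_DOMAINS = {"docs.example.com", "wiki.internal.example.com"}
MAX_AGE = timedelta(days=365)

def is_usable(doc: dict) -> bool:
    """Keep a retrieved document only if it comes from a vetted source and is reasonably fresh."""
    fresh = datetime.now(timezone.utc) - doc["last_updated"] <= MAX_AGE
    trusted = doc["domain"] in TRUSTED_DOMAINS
    return fresh and trusted

docs = [
    {"domain": "docs.example.com", "last_updated": datetime.now(timezone.utc) - timedelta(days=30)},
    {"domain": "random-blog.net", "last_updated": datetime.now(timezone.utc) - timedelta(days=900)},
]
usable = [d for d in docs if is_usable(d)]  # keeps only the vetted, recently updated document
```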

Related: The top benefits of using RAG

Leverage RAG effectively with Merge

Using Merge, the leading unified API solution, you can access the data an LLM needs to power stand-out AI features in your product.

Merge allows you to add hundreds of integrations to your product in a single build as well as maintain and manage each integration with ease, all but ensuring the LLM receives a comprehensive set of accurate data without interruptions. 

Moreover, Merge provides normalized data to your product, which helps offset the less predictable parts of an LLM and enables it to generate high-quality output more consistently.

Learn more about how Merge powers AI features for companies like Guru, Causal, Kraftful, Telescope, and Assembly, and discover how Merge can provide your product with LLM-ready data by scheduling a demo with one of our integration experts.