Why AI companies need both raw and normalized customer data
Transforming customer data before embedding it and adding it to a vector database is essential to powering reliable, personalized, and robust AI capabilities. More specifically, the majority of your customer data needs to be normalized before it’s embedded.
That said, normalization isn’t always the right approach. When critical data is unique to a specific customer, it’s often better to keep it raw.
Read on to learn more about the roles normalized and raw data play in fueling AI products and features.
Normalized data helps LLMs generate clean, accurate, and non-sensitive outputs
Normalization refers to the process of standardizing and transforming data into a consistent format across systems.
This process offers several advantages during the retrieval portion of a RAG (retrieval-augmented generation) pipeline. Because equivalent values are stored in a consistent format, semantically similar content produces similar embeddings, which helps the retrieval step surface the most accurate context and, in turn, allows the LLM to generate more reliable output.
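To make this concrete, here’s a minimal sketch in Python of what normalizing records from two differently shaped sources might look like before embedding. The field names, date formats, and the normalize_record helper are illustrative assumptions, not a specific implementation:

```python
# A minimal sketch: map differently shaped source records onto one schema
# so equivalent data produces equivalent embedding inputs.
# Field names and date formats here are hypothetical.
from datetime import datetime

def normalize_record(raw: dict) -> dict:
    """Normalize a customer record into a consistent shape."""
    # Some systems store names as one field, others as two.
    full_name = raw.get("full_name") or f"{raw.get('first_name', '')} {raw.get('last_name', '')}".strip()

    # Standardize dates to ISO 8601 so the same date always embeds the same way.
    hire_date = raw.get("hire_date", "")
    if "/" in hire_date:
        hire_date = datetime.strptime(hire_date, "%m/%d/%Y").date().isoformat()

    return {"full_name": full_name, "hire_date": hire_date}

records = [
    {"first_name": "Ada", "last_name": "Lovelace", "hire_date": "01/15/2024"},
    {"full_name": "Ada Lovelace", "hire_date": "2024-01-15"},
]
# Both records normalize to the same shape and values before embedding.
print([normalize_record(r) for r in records])
```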
But the value of normalized data doesn’t stop there.
The normalization process can also include removing certain types of sensitive data (e.g., social security numbers). Since that data is never embedded, it can’t be returned in your retrieval step.
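As a rough illustration, this scrubbing can be as simple as a pattern-based pass during normalization. The regex and placeholder below are assumptions for the sketch; production systems typically use more robust PII detection:

```python
# A minimal sketch: strip US social security numbers from text during
# normalization so they are never embedded or retrievable.
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_sensitive(text: str) -> str:
    """Replace SSN-shaped strings with a placeholder before embedding."""
    return SSN_PATTERN.sub("[REDACTED_SSN]", text)

print(redact_sensitive("Employee 412 (SSN 123-45-6789) requested a W-2 copy."))
# -> "Employee 412 (SSN [REDACTED_SSN]) requested a W-2 copy."
```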
Finally, part of normalizing data involves removing duplicates automatically. This means that duplicate data won’t go on to get embedded, retrieved, and used by an LLM.
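One simple way to do this, sketched below under the assumption that records have already been normalized into consistent dictionaries, is to hash each record’s content and drop exact repeats before embedding:

```python
# A minimal sketch: drop exact duplicates before embedding by hashing
# each normalized record's content.
import hashlib
import json

def dedupe(records: list[dict]) -> list[dict]:
    """Keep only the first occurrence of each identical normalized record."""
    seen: set[str] = set()
    unique = []
    for record in records:
        # A stable hash of the sorted fields identifies exact duplicates.
        digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique

rows = [
    {"full_name": "Ada Lovelace"},
    {"full_name": "Ada Lovelace"},  # duplicate, will be dropped
    {"full_name": "Grace Hopper"},
]
print(dedupe(rows))
```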
Raw data lets you account for edge cases across your customer base
Your customers’ applications are often highly customized with unique objects and fields that fit their specific business needs.
Since this type of data isn’t consistently created and stored across your customers’ systems, it wouldn’t make sense to create strict normalized data models for them.
For example, say you offer a product intelligence solution that uses an LLM to summarize product feedback based on the transcripts of recorded customer calls. Let’s also assume that a customer has a unique “Customer Health Score” field in their CRM that can—depending on the value—determine how they prioritize product feedback.
By embedding the health score data from that customer’s CRM, you ensure it can be returned in the retrieval step whenever the customer’s queries reference a client’s health. Your LLM can then use that additional context to not only summarize customer-specific product feedback but also weigh in on whether and why it should be prioritized.
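To illustrate, here’s a minimal sketch of folding that raw, customer-specific field into the text that gets embedded alongside each call transcript. The customer_health_score field name and the build_document helper are hypothetical, and embed() stands in for whatever embedding model and vector store you actually use:

```python
# A minimal sketch: attach a customer-specific raw CRM field (here, a
# hypothetical "Customer Health Score") to a call transcript before embedding,
# so it can surface during retrieval.
def build_document(transcript: str, crm_record: dict) -> str:
    """Combine a call transcript with raw, customer-specific CRM context."""
    health_score = crm_record.get("customer_health_score")  # custom field, kept raw
    header = f"Customer Health Score: {health_score}\n" if health_score is not None else ""
    return header + transcript

doc = build_document(
    "Customer asked for bulk export and flagged slow sync times.",
    {"customer_health_score": 42},
)
print(doc)
# embed(doc) would then make the health score retrievable when a query
# references an account's health.
```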
Related: Why your RAG pipelines need normalized data
Access normalized and raw data across your integrations with Merge
Merge, the leading unified API solution, normalizes integrated data using predefined Common Models for the 200+ cross-category integrations it supports.
The platform also lets you access raw data from your customers’ systems through its Authenticated Passthrough Request feature.
Learn how Merge powers cutting-edge AI companies like Guru, Ema, and Telescope, and discover how it can support your organization by scheduling a demo with one of our integration experts.