Why AI companies need both raw and normalized customer data

Transforming customer data before you embed it and add it to a vector database is essential to powering reliable, personalized, and robust AI capabilities. More specifically, most of your customer data needs to be normalized before it’s embedded.

But that might not be the case when critical data is unique to a specific customer.

Read on to learn more about the roles normalized and raw data play in fueling AI products and features.

Normalized data helps LLMs generate clean, accurate, and non-sensitive outputs 

Normalization refers to the process of standardizing and transforming data into a consistent format across systems.

How data around file created date can be normalized
Fields related to when a file gets created can be normalized across file storage solutions, or transformed into a common format
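As a rough illustration of the idea, the sketch below maps two differently shaped "created" fields onto one common format. The provider names, the field names (`createdTime`, `created`), and the choice of ISO 8601 in UTC as the target format are all assumptions made for this example, not the schema of any real file storage API.

```python
from datetime import datetime, timezone


def normalize_created_at(provider: str, record: dict) -> str:
    """Return a file's creation time as an ISO 8601 string in UTC.

    The providers and field names are hypothetical.
    """
    if provider == "provider_a":
        # Assumed shape: ISO 8601 string with an explicit UTC offset.
        created = datetime.fromisoformat(record["createdTime"])
    elif provider == "provider_b":
        # Assumed shape: Unix epoch seconds.
        created = datetime.fromtimestamp(record["created"], tz=timezone.utc)
    else:
        raise ValueError(f"Unsupported provider: {provider}")
    # Convert everything to UTC so records are comparable across systems.
    return created.astimezone(timezone.utc).isoformat()


file_a = {"createdTime": "2024-03-01T14:00:00+02:00"}
file_b = {"created": 1709294400}
print(normalize_created_at("provider_a", file_a))
print(normalize_created_at("provider_b", file_b))
```

Both calls describe the same moment, so they produce the same normalized string even though the source fields differ in name and type.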

This process offers several advantages during the retrieval portion of a RAG (retrieval-augmented generation) pipeline.

Since normalized data is consistent and doesn’t include extraneous information, an embedding model is more likely to produce semantically accurate vectors for storage.

This makes it more likely that the most relevant embeddings are retrieved, which in turn allows the LLM to generate more reliable output.

How normalized data can improve an LLM's outputs

But the value of normalized data doesn’t stop there.

The normalization process can also include removing certain types of sensitive data (e.g., social security numbers). Because that data never gets embedded, it can’t be returned in your retrieval step.

How data related to tax numbers can be removed during the normalization process
The process of normalizing data can include removing certain fields that are sensitive, like organizations’ tax numbers in your customers’ ERP systems 
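One way this step could look is sketched below. The field names in `SENSITIVE_FIELDS` and the US-style social security number pattern are assumptions for the example; a real pipeline would need rules matched to the systems and regions it covers.

```python
import re

# Hypothetical field names that should never reach the embedding step.
SENSITIVE_FIELDS = {"tax_number", "ssn"}

# Assumed pattern: US-style social security numbers such as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(record: dict) -> dict:
    """Drop sensitive fields and mask SSN-like strings in free text."""
    cleaned = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            continue  # remove the field entirely
        if isinstance(value, str):
            value = SSN_PATTERN.sub("[REDACTED]", value)
        cleaned[key] = value
    return cleaned


company = {
    "name": "Acme",
    "tax_number": "12-3456789",
    "notes": "SSN 123-45-6789 on file",
}
print(redact(company))
```

Dropping whole fields handles structured data, while the pattern match is a fallback for sensitive values that end up in free-text fields.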

Finally, part of normalizing data involves removing duplicates automatically. This means that duplicate data won’t go on to get embedded, retrieved, and used by an LLM.

Normalizing data from customers’ HRISs can include removing duplicate names
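A minimal sketch of deduplication is shown below, assuming employee records with hypothetical `name` and `email` fields. Matching on a case-insensitive name and email pair is an illustrative choice; matching on name alone could wrongly merge two different people.

```python
def dedupe_employees(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each employee, preserving order."""
    seen = set()
    unique = []
    for record in records:
        # Normalize whitespace and case so trivial variations still match.
        key = (
            record["name"].strip().casefold(),
            record["email"].strip().casefold(),
        )
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


employees = [
    {"name": "Jane Doe", "email": "jane@example.com"},
    {"name": "jane doe ", "email": "JANE@example.com"},
    {"name": "John Roe", "email": "john@example.com"},
]
print(dedupe_employees(employees))
```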


Raw data lets you account for edge cases across your customer base

Your customers’ applications are often highly customized with unique objects and fields that fit their specific business needs.

Your customers might have custom fields across systems of record that need to be fed to your LLM

Since this type of data isn’t consistently created and stored across your customers’ systems, it wouldn’t make sense to create strict normalized data models for them.

That said, custom data can be an important part of a customer’s use case(s) with your product, making it an essential input for the LLM you use.

For example, say you offer a product intelligence solution that uses an LLM to summarize product feedback based on the transcripts of recorded customer calls. Let’s also assume that a customer has a unique “Customer Health Score” field in their CRM that can, depending on its value, determine how they prioritize product feedback.

If you embed the health score data from that customer’s CRM, it can be returned in the retrieval step whenever the customer uses terminology and data related to a client’s health. Your LLM can then use the additional context not only to summarize customer-specific product feedback but also to weigh in on whether and why it should be prioritized.
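To make this concrete, the sketch below combines normalized fields with a customer's raw custom fields into a single text chunk that could then be passed to an embedding model. The function name, the `custom.` prefix, and the record shapes are assumptions for illustration only.

```python
def build_embedding_text(normalized: dict, custom_fields: dict) -> str:
    """Render normalized and raw custom fields as one text chunk."""
    lines = [f"{key}: {value}" for key, value in sorted(normalized.items())]
    # Prefix custom fields so they stay distinguishable from normalized ones.
    lines += [
        f"custom.{key}: {value}"
        for key, value in sorted(custom_fields.items())
    ]
    return "\n".join(lines)


account = {"name": "Acme Corp", "industry": "Retail"}
raw_custom = {"Customer Health Score": 42}
print(build_embedding_text(account, raw_custom))
```

Keeping the custom field's original label in the text gives the retrieval step a chance to match queries that mention a client's health.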

Related: Why your RAG pipelines need normalized data

Access normalized and raw data across your integrations with Merge

Merge, the leading unified API solution, normalizes integrated data using predefined Common Models for the 200+ cross-category integrations it supports.

The platform also lets you access raw data from your customers’ systems through its Authenticated Passthrough Request feature.

Merge Authenticated Passthrough visualization
How Merge’s Authenticated Passthrough Request feature works

Learn how Merge powers cutting-edge AI companies like Guru, Ema, and Telescope, and discover how it can support your organization by scheduling a demo with one of our integration experts.
