The top challenges of normalizing multiple API integrations
Gartner estimates that poor data quality costs the average organization $15 million per year.
Part of the solution to better data quality involves applying the concept of normalization: that is, having a single source of truth for how data can be read and written, regardless of where it comes from.
To help you leverage data normalization effectively, we’ll cover how it works, the top challenges of performing it for multiple API integrations, and a solution to help you overcome these challenges.
Note: Data normalization is used in both internal and external (customer-facing) integrations.
What is data normalization in the context of integration?
Data normalization is the process of accurately and consistently transforming data from a third-party source into a unified destination format, such as the schema your own database expects.
In the context of integrating multiple APIs, it means you’re normalizing data across multiple unique systems into a single format.
To help clarify this definition, let's use an example: Say you’re looking to integrate an HRIS solution with your product to sync employee start dates (among other fields). The HRIS stores an employee start date as MM-DD-YYYY, while your backend stores it as DD-MM-YYYY. Normalizing this data would involve:
- Establishing DD-MM-YYYY as the standard date format in your backend.
- Writing logic to consistently translate source data into your unified format (sketched after this list).
- Anticipating any custom values or fields your customers may require. For example, a customer might store 'employee start date' in a custom field.
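To make this concrete, here's a minimal sketch of that translation logic, assuming the HRIS returns start dates as MM-DD-YYYY strings and your backend expects DD-MM-YYYY (the function name is illustrative):

```typescript
// A minimal sketch: convert an HRIS-style start date (MM-DD-YYYY)
// into the backend's standard format (DD-MM-YYYY). Throwing on
// unexpected input surfaces bad source data instead of silently
// writing a corrupted record.
function normalizeStartDate(hrisDate: string): string {
  const match = /^(\d{2})-(\d{2})-(\d{4})$/.exec(hrisDate);
  if (!match) {
    throw new Error(`Unexpected date format from HRIS: ${hrisDate}`);
  }
  const [, month, day, year] = match;
  return `${day}-${month}-${year}`;
}

// "06-15-2024" (June 15) becomes "15-06-2024" in the unified format.
console.log(normalizeStartDate("06-15-2024"));
```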
Note: Data normalization and data standardization might seem like different processes, but in the context of integration they’re functionally the same thing. For the purposes of this article, we’ll use “data normalization.”
Challenges of normalizing data
While there are several obstacles, here are the top ones to account for.
Unifying data models at scale
If you're accustomed to one-to-one or simple integrations, the concept of normalizing data into a ‘common’ data model may be new.
A common data model is a standard data object that’s flexible enough to represent a single concept from multiple different APIs. Common models represent real-world concepts; there can be “Employee” models, “Candidate” models, “Invoice” models, and so on.
Determining ahead of time what a common data model looks like for any external integration can save your engineering team headaches and heartaches.
To help make this point, let’s start with what the world looks like without a common data model.
Related: A deep dive on the different types of REST API pagination
The scenario without a common data model
Imagine that you’re setting up applicant tracking system (ATS) integrations. Initially, you focus on Lever, where demand is high. You map Lever’s API fields directly to your backend, creating a simple 1:1 mapping.
Later, you need to integrate with Greenhouse. But there’s a twist: unlike Lever’s API, Greenhouse’s uses two fields to represent a name, `first_name` and `last_name`. You think, "I'll just add a new field to accommodate this."
So now your schema for an ATS candidate has three separate fields for someone’s name: `Name`, `First_name`, and `Last_name`.
As you continue to integrate with more ATS systems, you’ll find that each has its own conventions for naming fields, leading your backend database to become cluttered with various fields for essentially the same data point.
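To put the drift in concrete terms, here's a simplified sketch (TypeScript types standing in for your actual schema) of what the candidate record starts to look like without a common data model, next to what a common model would give you:

```typescript
// Without a common data model: the candidate record accretes
// provider-specific name fields with every new ATS integration.
interface CandidateRecord {
  name?: string;        // Lever's single name field
  first_name?: string;  // Greenhouse
  last_name?: string;   // Greenhouse
  // ...another cluster of fields for each ATS you add
}

// With a common data model: one canonical shape, and each integration
// is responsible for mapping its API's fields into it.
interface CommonCandidate {
  firstName: string;
  lastName: string;
}
```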
When does it end? It’s up to you.
To recap, here are the top challenges of operating without a common data model:
- Increased complexity: Managing and understanding your database becomes increasingly complex with each new integration.
- Data redundancy: You end up with multiple fields in your database that represent the same data, leading to redundancy and potential inconsistencies.
- Difficult to maintain: Updating or modifying integrations becomes more difficult as you have to deal with different structures and fields for each ATS.
- Scalability issues: As you add more integrations, scaling your integrations' architecture becomes more challenging due to the ever-increasing number of fields and the complex logic required to handle them. And even if you have a common data model set up, you’ll still need to manage and maintain it at scale.
Transforming 3rd-party data at scale
Field transformation may seem trivial at first. In its most basic form, it’s making sure a field from system “A” fits the standards of system “B.” The difficulty lies in building a transformation engine that’s adaptable and reliably returns consistently transformed data at scale.
To make this point, let’s continue with our example. You’ve chosen to normalize candidate names in ATS solutions to `First_Name` and `Last_Name`. The source system, Lever, stores the data as `FullName`.
Your code’s logic, then, just needs to look for the first space in the name and store the two strings separately: `"FullName": "Hayley Ye"` becomes `"First_Name": "Hayley"` and `"Last_Name": "Ye"`.
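A naive version of that logic might look like the sketch below (the function and field names are illustrative):

```typescript
// Naive transformation: split a full name on the first space.
function splitFullName(fullName: string): { First_Name: string; Last_Name: string } {
  const index = fullName.indexOf(" ");
  if (index === -1) {
    // No space at all: treat the whole string as a first name.
    return { First_Name: fullName, Last_Name: "" };
  }
  return {
    First_Name: fullName.slice(0, index),
    Last_Name: fullName.slice(index + 1),
  };
}

// { First_Name: "Hayley", Last_Name: "Ye" }
console.log(splitFullName("Hayley Ye"));
```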
This seems simple enough, but your data transformation engine can cause issues.
Data transformation engines require consistent fine-tuning
The first issue that you run into is adaptability around edge cases.
For example, what if someone has a middle name? Or, what if they have a hyphenated name? Or what if the data is entered incorrectly? Obviously, these basic edge cases can eventually be accounted for. But what you can’t account for is the myriad of edge cases that will continue to add up.
This requires you to invest in integration transformation tooling that’s adaptable and efficient. For example, how can you quickly identify the source of poor data, and update the underlying transformation logic so that existing integrations don’t break?
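One approach, sketched below under assumed types, is to quarantine records that fail transformation and keep enough context to trace them back to the source, so a single bad value doesn't break the whole sync:

```typescript
interface TransformResult<T> {
  ok: T[];
  quarantined: { raw: unknown; reason: string }[];
}

// Applies a transform to each raw record. Records that fail are set
// aside with their error so they can be inspected and the logic
// updated, while the rest of the sync continues uninterrupted.
function transformAll<T>(
  rawRecords: unknown[],
  transform: (raw: unknown) => T,
): TransformResult<T> {
  const result: TransformResult<T> = { ok: [], quarantined: [] };
  for (const raw of rawRecords) {
    try {
      result.ok.push(transform(raw));
    } catch (err) {
      result.quarantined.push({
        raw,
        reason: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return result;
}
```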
Additionally, as you onboard customers, you’ll need to continually adjust your transformation logic. New customers mean new, unpredictable “inputs” to your transformation engine, which requires significant ongoing investment in reliable maintenance tooling.
Related: A guide to integration maintenance
Data transformation engines can prove difficult to scale
Ensuring that the transformation process is both efficient and scalable is another layer of complexity. Your transformation engine can’t just handle data from a single integration: It needs to accommodate dozens of integrations used by hundreds of potential customers.
It’s one thing to ingest and transform 200,000 JSON objects. It’s another to ingest and transform 20 million. Scaling your system to that volume requires a serious investment in architecture.
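At that volume, you generally can't load everything into memory and transform it in one pass. One common pattern, sketched below with an assumed paginated fetch function passed in as a parameter, is to stream records page by page and transform them as they arrive:

```typescript
type Page = { records: unknown[]; nextCursor: string | null };

// Streams transformed records one page at a time, so memory use stays
// proportional to a single page rather than the full data set.
// fetchPage stands in for whatever calls the third-party API.
async function* streamTransformed<T>(
  fetchPage: (cursor: string | null) => Promise<Page>,
  transform: (raw: unknown) => T,
): AsyncGenerator<T> {
  let cursor: string | null = null;
  do {
    const page = await fetchPage(cursor);
    for (const raw of page.records) {
      yield transform(raw);
    }
    cursor = page.nextCursor;
  } while (cursor !== null);
}
```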
While the fundamentals of field and value transformation are simple, the tooling and architecture required to both update transformation logic and maintain it are far from simple.
Accounting for unique data configurations in enterprise software
The difficulties and nuances of data transformation continue to mount as you move upmarket.
Let’s say you’re trying to onboard a set of enterprise users, but the underlying systems they store their data in are a black box: all you can see is the output of their APIs.
Enterprise business systems are known for adapting to business needs, and this makes building a generic integration very tough.
Your challenge isn’t simply to transform the data (addressed above), but also to understand how that data changes over time.
Consider a typical scenario in a CRM system: when an Opportunity, like a potential sale, is successfully closed, it gets converted into a customer record. The question arises: how should your system query this data and recognize that the change occurred?
For example, contact details, communication history, and purchase preferences from the Opportunity might need to be migrated to the customer's account. It's crucial to determine how your system will handle the logic of this transformation.
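Here's a rough sketch of what that detection step could look like, assuming the CRM exposes a stage field and you track the stage each record had at the last sync (field names are illustrative and vary by CRM):

```typescript
interface Opportunity {
  id: string;
  stage: string;          // e.g. "open", "closed_won", "closed_lost"
  contactEmail: string;
}

interface CustomerRecord {
  sourceOpportunityId: string;
  email: string;
}

// Given opportunities modified since the last sync, pick out the ones
// that have just transitioned to closed-won and map the fields you
// care about onto a customer record.
function detectConversions(
  changedSinceLastSync: Opportunity[],
  previousStages: Map<string, string>,
): CustomerRecord[] {
  return changedSinceLastSync
    .filter(
      (opp) =>
        opp.stage === "closed_won" &&
        previousStages.get(opp.id) !== "closed_won",
    )
    .map((opp) => ({ sourceOpportunityId: opp.id, email: opp.contactEmail }));
}
```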
Keeping data secure while only collecting the information that’s needed
In an era of frequent data breaches and stricter privacy regulations, ensuring security and practicing data minimization are paramount in API integration.
Security concerns aren't just about protecting data from unauthorized access; they're also about ensuring that the data is transmitted, transformed, and stored securely throughout the integration process. This involves implementing robust encryption methods, secure authentication protocols, and a shift-left security stance.
Data minimization, on the other hand, is about only collecting and processing the data that’s absolutely necessary for the intended purpose. This is not only a best practice from a data privacy perspective but also reduces the complexity and potential risks associated with data handling.
Implementing features such as Scopes (that is, controls that ensure you're only syncing the necessary data into your system) is necessary, but far from trivial.
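Conceptually, that kind of control often boils down to an allowlist applied before anything is persisted. Here's a minimal, generic sketch (the field names are illustrative):

```typescript
// Fields this integration is scoped to sync; everything else in the
// third-party payload is dropped before it ever reaches storage.
const EMPLOYEE_SCOPE = new Set(["first_name", "last_name", "start_date"]);

function applyScope(
  rawRecord: Record<string, unknown>,
  scope: Set<string>,
): Record<string, unknown> {
  const scoped: Record<string, unknown> = {};
  for (const key of Object.keys(rawRecord)) {
    if (scope.has(key)) {
      scoped[key] = rawRecord[key];
    }
  }
  return scoped;
}

// Only the scoped fields survive; the sensitive field is never stored.
applyScope({ first_name: "Hayley", last_name: "Ye", ssn: "000-00-0000" }, EMPLOYEE_SCOPE);
```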
Balancing the need for detailed, comprehensive multi-API integration with the principles of security and data minimization requires careful planning, a thorough understanding of the relevant data privacy laws, and a proactive approach to data governance.
Related: The top SaaS integration challenges
Allowing for user-defined mapping
Allowing users to define their own data mappings introduces additional complexity. This flexibility can be incredibly powerful, allowing users to tailor integrations to their specific needs. However, it also adds another dimension to the challenge, as you now need to support a wide range of potential mapping configurations, each with its own quirks and complexities.
Providing user-defined mapping capabilities requires robust, intuitive interfaces that allow users to easily configure their data mappings, as well as a backend system capable of handling a diverse array of mapping scenarios. It also necessitates a strong support and documentation system to guide users through the process and help them resolve any issues they might encounter.
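At its core, user-defined mapping means storing each customer's configuration as data and applying it at sync time. A simplified sketch, with hypothetical field names:

```typescript
// A customer's mapping: keys are fields in your common model, values
// are the (possibly custom) field names in their third-party system.
type FieldMapping = Record<string, string>;

// Builds a normalized record by pulling each target field from the
// customer-specified source field.
function applyMapping(
  rawRecord: Record<string, unknown>,
  mapping: FieldMapping,
): Record<string, unknown> {
  const normalized: Record<string, unknown> = {};
  for (const [targetField, sourceField] of Object.entries(mapping)) {
    normalized[targetField] = rawRecord[sourceField];
  }
  return normalized;
}

// e.g. one customer keeps start dates in a custom field:
const mapping: FieldMapping = { start_date: "custom_field_1187" };
applyMapping({ custom_field_1187: "15-06-2024" }, mapping);
```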
Avoid normalization hassles with Merge
What if there was a platform that normalized all the data your customers care about?
Meet Merge.
Merge is a product integration platform that offers dozens of common data models for 200+ integrations. The platform handles all the underlying logic and transformation, and has robust security and compliance measures in place to guarantee that your customers’ data is kept secure.
You can learn more about how Merge works as well as how it can help your team integrate at scale by scheduling a demo with one of our integration experts.