Enterprise B2B companies need to build and maintain integrations with an average of 90+ vendors to keep their customers satisfied. Developers need to decide on the best method for building those integrations: scraping, reverse-engineering internal APIs, or building with official APIs. At Merge, we build for enterprise customers, so we build our integrations with official APIs because of their speed, scalability, and stability.
Scraping? For enterprise data?
Yup. This time-tested method for pulling data is also the easiest way to build out an integration. A scraper pulls information from the target platform as if you were copying and pasting all the data yourself. What you see on an HR platform, the computer sees and compiles.
Scraping is visual: you’re pinpointing the user interface (UI) elements of the target platform to pull the relevant data for your integration. Since you only need a registered account, getting access isn’t a hassle. Scrapy, Beautiful Soup, and Selenium are popular tools that make scraping remarkably easy.
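To make "pinpointing UI elements" concrete, here is a minimal sketch of a scraper. It uses Python's built-in `html.parser` rather than the libraries named above, purely so it runs without installing anything. The page markup, the class names, and the `scrape_employees` helper are all invented for illustration and do not reflect any real HR platform.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for an HR platform's employee directory page.
SAMPLE_PAGE = """
<table id="employees">
  <tr><td class="name">Ada Lovelace</td><td class="title">Engineer</td></tr>
  <tr><td class="name">Grace Hopper</td><td class="title">Director</td></tr>
</table>
"""


class EmployeeTableScraper(HTMLParser):
    """Collects one dict per <tr>, keyed by each <td>'s class attribute."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # dict for the row being read, or None outside a row
        self._field = None  # class of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag == "td" and self._row is not None:
            self._field = dict(attrs).get("class")

    def handle_data(self, data):
        # Only keep text that sits inside a labelled cell of a row.
        if self._row is not None and self._field and data.strip():
            self._row[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def scrape_employees(html):
    scraper = EmployeeTableScraper()
    scraper.feed(html)
    return scraper.rows


for employee in scrape_employees(SAMPLE_PAGE):
    print(employee["name"], "-", employee["title"])
```

Notice that every piece of the logic is tied to how the page happens to be laid out today: the tag names, the nesting, and the class attributes.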
Scraping seems great! What could go wrong?
A lot — we’ll break it down by stage.
Scrapers start by logging in to user accounts. If a user’s session expires, the scraper will need a new sign-on. But with layered security such as two-factor authentication, Google authentication, and FIDO keys, you’re now requiring a physical person to log back in and re-authenticate. That’s an unnecessary hassle for your customer.
Remember that part about the UI that made scraping so easy? It’s also its biggest weakness. UI gets tweaked, HTML elements change, and links break all the time. The customer doesn’t care about “session links” or “UI adjustments”; they care about their product working. With scraping, developers leave too many critical variables out of their control.
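Here is a small sketch of that failure mode, again using only Python's standard library. Both HTML snippets and both class names are made up: they stand in for the same page before and after a hypothetical front-end redesign.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collects the text of every element that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.matches = []
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._capturing = self.css_class in classes

    def handle_data(self, data):
        if self._capturing and data.strip():
            self.matches.append(data.strip())

    def handle_endtag(self, tag):
        self._capturing = False


def extract(html, css_class):
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    return parser.matches


# Hypothetical page before and after a redesign renamed one class.
BEFORE_REDESIGN = '<ul><li class="employee-name">Ada Lovelace</li></ul>'
AFTER_REDESIGN = '<ul><li class="emp-card__name">Ada Lovelace</li></ul>'

print(extract(BEFORE_REDESIGN, "employee-name"))  # ['Ada Lovelace']
print(extract(AFTER_REDESIGN, "employee-name"))   # []
```

The second call doesn't raise an error. It quietly returns nothing, which is arguably worse: the integration looks healthy while the data has stopped flowing.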
Even if you have a fully built-out, well-maintained scraping integration, you’re still not going to be the best in show: scraping is plain slow. Imagine pulling 10,000 employees from a system by going page by page. Because you’re reliant on a public UI, you’re dependent on that UI’s speed.
While scraping may be easy to demonstrate as a proof of concept, it’s simply unacceptable for the requirements of enterprise-level integration management.
Internal APIs — could that be an alternative?
Absolutely. Remember: the internal API communicates data between the front-end (what the user sees) and the back-end (the logic controller of the website). Through any browser you can study network traffic, just like you can inspect elements and adjust HTML. Studying the network requests that a platform makes shows you how the internal API behaves. By repeating the requests the website makes, your own integration is able to pull data.
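As a rough sketch of what "repeating the requests the website makes" looks like, the snippet below rebuilds a request the way you might after spotting it in the browser's network tab. Everything specific here is an assumption made up for illustration: the `hr.example.com` host, the `/internal/v1/employees` path, the `session` cookie, and the shape of the JSON response. The request is only constructed, not sent, so the example runs offline.

```python
import json
import urllib.parse
import urllib.request


def build_replay_request(base_url, session_cookie, page=1):
    """Recreate the GET request the front-end was observed making."""
    query = urllib.parse.urlencode({"page": page})
    return urllib.request.Request(
        f"{base_url}/internal/v1/employees?{query}",  # hypothetical endpoint
        headers={"Cookie": session_cookie, "Accept": "application/json"},
        method="GET",
    )


def parse_employees(response_body):
    """Pull names out of a response shaped like the one seen in the network tab."""
    return [employee["name"] for employee in json.loads(response_body)["employees"]]


request = build_replay_request("https://hr.example.com", "session=abc123", page=2)
print(request.full_url)  # https://hr.example.com/internal/v1/employees?page=2
# Sending it for real would be: urllib.request.urlopen(request).read()

# A made-up response body, standing in for what the back-end might return.
captured_body = '{"employees": [{"id": 1, "name": "Ada Lovelace"}]}'
print(parse_employees(captured_body))  # ['Ada Lovelace']
```

Note that the session cookie still comes from a logged-in user, so the sign-on problems described for scrapers don't go away with this approach.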
Internal APIs are more reliable than scraping. Even though you’re going in ‘blind,’ assuming you craft a decent integration, a platform rarely makes backward-incompatible changes to its API. And, compared to scraping, reverse engineering is significantly faster: you’re only making a few directed requests as opposed to pulling from tens of thousands of individual pages.
Sounds great — but I feel like something could be wrong…
You’re absolutely right. Companies can still change internal APIs without notice, which could break how your integration talks to the backend. Additionally, you may not have all of the data you need to make smart decisions. Because you’re still limited to what the website itself requests, you only get the minimal amount of data the back-end sends to populate a given UI. Behind the scenes, there is a whole host of data that you’re missing out on.
Now the good stuff. Official APIs are by far the best way to integrate — they just need a little extra love from the developer (hint: this is where Merge comes in).
An official API is one the platform created explicitly so that external companies can integrate with its product. Unlike with scraping, a user never has to re-authenticate their integration: it’s permanent until disconnected. In terms of speed and quality, there’s nothing better. An integration built with the Greenhouse API can pull 5,000,000 candidates in minutes. Imagine scraping that: it would take hours or even days. Most official APIs allow the pushing and pulling of almost every piece of data, meaning you can have all of the information you need.
Official APIs are directly supported by their platforms, making them highly reliable. Companies know backward-incompatible changes could break integrating applications, so they commit to avoiding them. Additionally, platforms seek to be developer-friendly. They often advertise partnerships with companies that use their official APIs; you can see some of ours here and here.
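Bulk pulls from an official API usually come down to walking through pages of results. The sketch below shows one common pattern, cursor-based pagination, against an in-memory stand-in. The `results` and `next` field names and the `fetch_page` callback are assumptions chosen for illustration. This is not the Greenhouse API or any other vendor's actual interface.

```python
def fetch_all(fetch_page):
    """Follow 'next' cursors until the API reports there are no more pages."""
    records, cursor = [], None
    while True:
        page = fetch_page(cursor)
        records.extend(page["results"])
        cursor = page.get("next")
        if cursor is None:
            return records


# In-memory stand-in for an official API; a real fetch_page would make an
# authenticated HTTP call (for example, with an API key) instead.
FAKE_PAGES = {
    None: {"results": ["cand-1", "cand-2"], "next": "p2"},
    "p2": {"results": ["cand-3", "cand-4"], "next": "p3"},
    "p3": {"results": ["cand-5"], "next": None},
}


def fake_fetch_page(cursor):
    return FAKE_PAGES[cursor]


candidates = fetch_all(fake_fetch_page)
print(len(candidates))  # 5
```

Each request returns a whole batch of structured records, rather than one rendered page that still has to be parsed.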
So what’s the catch?
There are a few for the developer building out the integration.
Linking accounts is a bit more complicated, and may involve getting an API key.
With Merge, that’s not necessary. We make connecting with official APIs simple for end users.
Different platforms also expose data in different ways, and this can lead to headaches when accessing large quantities of data from different vendors. With Merge’s Unified API, we’ve already done the heavy lifting to normalize your data. Your job is to use it.
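To illustrate what "normalizing" means, here is a toy sketch that maps two differently shaped vendor records onto one common shape. The vendors, their field names, and the `Employee` model are all hypothetical. This is not Merge's actual data model, just the general idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    """A made-up common model that every vendor record is mapped onto."""
    first_name: str
    last_name: str
    work_email: str


def from_vendor_a(record):
    # Hypothetical vendor A: flat record with camelCase keys.
    return Employee(record["firstName"], record["lastName"], record["email"])


def from_vendor_b(record):
    # Hypothetical vendor B: one full-name string and nested contact details.
    first, _, last = record["full_name"].partition(" ")
    return Employee(first, last, record["contact"]["work_email"])


NORMALIZERS = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}


def normalize(vendor, record):
    return NORMALIZERS[vendor](record)


a = normalize("vendor_a", {"firstName": "Ada", "lastName": "Lovelace",
                           "email": "ada@example.com"})
b = normalize("vendor_b", {"full_name": "Ada Lovelace",
                           "contact": {"work_email": "ada@example.com"}})
print(a == b)  # True
```

Every new vendor means another mapping function to write and maintain, and that is the headache a unified API takes off your plate.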
Integrating once with Merge gives you access to 60+ integrations in HR, payroll, and recruiting. Want to test it out yourself? Sign up for a demo here.