How to Stop Being Rate Limited: Best Practices for Making API Calls At Scale

David Donnelly II

February 17, 2022

Editor's note: This is a series on API-based integrations. Check out Merge if you're looking to add 150+ integrations across HR, ATS, CRM, Accounting, and Project Management platforms with one unified API.

Congratulations: you finally pushed your integration to prod. Everything should go fine, right? After all, pulling just 5 employees from Greenhouse’s API worked like a charm.

Then the trouble starts – 429 Error: Rate Limit Exceeded. 

Feels like you got caught red-handed. Turns out pulling 50,000 employees didn’t scale as well as you’d thought. More importantly: if it’s this easy to get rate limited for one integration, how do you manage a whole unified integration solution? At Merge, our team has faced, and overcome, this exact challenge. 

In Part 1 of this article, we’ll cover what rate limiting is, why you should care about it, and the four most common ways rate limits are implemented by API providers. 

In Part 2, we’ll lay out a methodology for ensuring that your data engine for making API calls can avoid getting rate limited. We’ll start by creating a technical definition of a rate limit, think through the storage options for single-server and multi-server set-ups, and outline how we’re able to dynamically handle rate limits. 

By the end of this article, you’ll have a solid understanding of the fundamental frameworks and methods for avoiding rate limits when building integrations. 

Part 1: Understanding Rate Limiting

What is Rate Limiting?

3rd Party APIs will often implement a rate limit to prevent their users (or malicious actors) from flooding their servers with too many requests. In the worst case, such floods are deliberate DDoS (Distributed Denial of Service) attacks, which can cause outages of the third party’s platform and downtime for all of its customers. 

At Merge, we interact with a lot of APIs. This means we’re required to avoid being rate limited in countless ways, at all hours, every day. Because we’re constantly making calls to these APIs we’ve had to dynamically figure out how to configure our internal backend to handle these rate limits appropriately. 

Why You Should Care About Being Rate Limited

Not handling rate limits properly can lead to the following issues:

  1. Account Suspension: Accounts that continually violate rate limits can be marked by 3rd parties as "bad actors" attempting DDoS attacks. 
  2. Write Blocking: If an integration reaches its rate limit, the third party will likely stop processing requests. This is especially pressing for WRITE/UPDATE/DELETE operations, which often need to happen with a fast turnaround.
  3. Degrading Integration Performance: At Merge, we’re aware that we may not be the only client using an API instance for data management. We never want to negatively impact other services that our end-users and partners have built.

For our platform (and yours), any of these outcomes is unacceptable: each one degrades the experience of our customers or yours.

To avoid being rate limited, it’s generally best practice to stay under a set threshold below the 3rd party's rate limit, as opposed to running right up to that limit. 

4 Ways You’ll Be Rate Limited

We’ve been able to adapt to the four major ways APIs implement rate-limiting. These are:

  1. Request Frequency Rate Limits: these are limits based on the number of requests in a defined time range. Generally, limits are configured to be a number of requests per second (for example, 10/s), but others can be hourly, daily, or even weekly.
  2. Fetched Model Count Rate Limits: these are limits based on the amount of data fetched in a defined time range. These limits will often be configured as "entities" or "models" returned in response payloads from the 3rd party. Similar to frequency limits, they can have varying time ranges and cutoff points.
  3. Concurrent Sessions Rate Limits: these are limits based on the number of active client/server sessions established with the 3rd party service. These rate limits are typically not set over a time range, but rather are the current count at any time.
  4. Unsuccessful Request Limits: these are limits that will start restricting access if too many unsuccessful requests are made in a row. They are generally used to thwart brute-force attacks and unbounded retries. These limits typically have a daily time frame and should have a very low threshold.

As you design your rate limit solution, you’ll want to be aware of how each platform implements its rate limiting, and make sure to tailor your rate limit management to that platform. 

Part 2: Implementing Rate Limit Tracking Into Your Platform

Defining a Rate Limit Configuration

The most basic function of a rate limit tracker is the ability to programmatically define all the different rate limits you want to account for. It’s not uncommon for platforms to have more than one rate limit or even variations of different types. You’ll want to be able to define one or more rate limits per platform you’re integrating with.

We’ve also found that while the majority of rate limit tracking occurs at a platform-wide level, some APIs adjust rate limits for specific end-users. Therefore, our rate limit tracker stores two types of information: platform-specific details and end-user-specific details.

The definitions of platform-specific rate limit tracking we recommend using are:

  1. Rate Limit Type: The type of rate limit we’re defining
  2. Rate Limit Default Threshold: The default threshold of where to start slowing down requests or stopping them completely. 
  3. Rate Limit Max Count: This is the number of “incidents” at which rate-limiting starts kicking in. We say “incident” to accommodate the different rate limit types. For example, a request frequency rate limit of 10/s will have a max count value of 10.
  4. Rate Limit Time Period: The unit of time that the limit is defined over. Typically this will be seconds, minutes, hours, or days.
  5. Rate Limit Time Value: The number of time periods the limit spans before it resets. For example, a model count rate limit of 300 entities every 2 hours will have a time period of HOURS and a time value of 2. Using both period and value will let you configure pretty much any time range that a rate limit is defined over.
  6. Default Backoff Factor + Retry Count: These control how we slow down and retry once a threshold has been reached. We’ll discuss this in more detail in our “What to do When a Rate Limit Threshold Has Been Reached” section.
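
As a rough illustration, the six attributes above could be captured in a small configuration object. This is a minimal sketch, not Merge’s actual schema: the class and field names are hypothetical, and expressing the default threshold as an integer percentage of the max count is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class RateLimitType(Enum):
    REQUEST_FREQUENCY = "request_frequency"
    FETCHED_MODEL_COUNT = "fetched_model_count"
    CONCURRENT_SESSIONS = "concurrent_sessions"
    UNSUCCESSFUL_REQUESTS = "unsuccessful_requests"


class TimePeriod(Enum):
    # Each value is the length of one unit in seconds.
    SECONDS = 1
    MINUTES = 60
    HOURS = 3600
    DAYS = 86400


@dataclass(frozen=True)
class RateLimitConfig:
    limit_type: RateLimitType
    default_threshold_pct: int  # assumption: threshold stored as a percent of max_count
    max_count: int              # number of "incidents" allowed per window
    time_period: TimePeriod
    time_value: int
    backoff_factor: float = 2.0  # hypothetical defaults
    retry_count: int = 5

    def window_seconds(self) -> int:
        """Length of the rate limit window (period x value) in seconds."""
        return self.time_period.value * self.time_value

    def threshold_count(self) -> int:
        """Incident count at which we start slowing down or stopping."""
        return self.max_count * self.default_threshold_pct // 100


# The "300 entities every 2 hours" example, tracked at 80% of the limit.
example = RateLimitConfig(
    limit_type=RateLimitType.FETCHED_MODEL_COUNT,
    default_threshold_pct=80,
    max_count=300,
    time_period=TimePeriod.HOURS,
    time_value=2,
)
print(example.window_seconds())   # 7200
print(example.threshold_count())  # 240
```

Keeping period and value separate mirrors how API docs usually phrase limits ("300 every 2 hours"), while `window_seconds` gives the rest of the system a single number to work with.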

Tracking End User-Specific Details

Rate limit configurations are great for defining what a rate limit looks like for an entire 3rd party API, and for the majority of APIs that you deal with, this definition alone should suffice. But after working with so many different APIs and rate limits, Merge noticed a common design choice: platforms will change the details of a rate limit for an end-user’s specific use of that API. 

Two factors that influence this are:

  1. Changing the rate limit threshold based on customer pricing. For example, having a rate limit of 100 requests/second for enterprise users and 50/s for freemium customers.
  2. A rolling rate limit that will adjust based on current server load (this limit is typically passed back via API response headers).

Differences in rate limits per customer are often defined in either the docs of an API, or they’re returned via API headers so that a user can dynamically determine the rate limit they must abide by. In these cases, Merge tracks these details per end-user using the following definitions as a template of default values that are overridden when we start fetching data. 

Our End-User Rate Limit Definition has the following attributes:

  1. Rate Limit Configuration: A reference back to the general definition of the rate limit (see above)
  2. End-User: A reference to the end-user that the rate limit details apply to.
  3. Override Default Threshold: We thought it useful to also be able to throttle specific integrations at a customer’s request. Even if a rate limit is the same for all users, if one of our (or your) customers wants to operate at a different threshold than the default, we can easily configure that for them.
  4. Override Rate Limit Max Count: This covers our cases above where 3rd parties will change their rate limits depending on the customer. At Merge, this field can be either manually set or will dynamically populate itself if the rate limit offers details via headers.
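
The override logic described above might look something like the following sketch. The names are hypothetical, the platform configuration is reduced to two plain fields to keep the example self-contained, and the `X-RateLimit-Limit` header is only a common convention: each API documents its own header names, so treat that default as an assumption.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class EndUserRateLimit:
    # Stand-ins for the reference back to the platform-wide configuration.
    config_max_count: int
    config_threshold_pct: int
    end_user_id: str
    override_threshold_pct: Optional[int] = None
    override_max_count: Optional[int] = None

    def effective_max_count(self) -> int:
        """Prefer the end-user override, else the platform default."""
        if self.override_max_count is not None:
            return self.override_max_count
        return self.config_max_count

    def effective_threshold_count(self) -> int:
        pct = self.config_threshold_pct
        if self.override_threshold_pct is not None:
            pct = self.override_threshold_pct
        return self.effective_max_count() * pct // 100

    def update_from_headers(
        self,
        headers: Mapping[str, str],
        header_name: str = "X-RateLimit-Limit",  # assumption: varies per API
    ) -> None:
        """Populate the max count override when the API reports its limit."""
        raw = headers.get(header_name)
        if raw is not None and raw.strip().isdigit():
            self.override_max_count = int(raw.strip())


# A freemium default of 50/s that the API later reports as 100/s for this user.
limit = EndUserRateLimit(config_max_count=50, config_threshold_pct=80, end_user_id="user-1")
print(limit.effective_threshold_count())  # 40
limit.update_from_headers({"X-RateLimit-Limit": "100"})
print(limit.effective_threshold_count())  # 80
```

Note that a plain `dict` lookup is case-sensitive, while HTTP header names are not; a real client would normally hand you a case-insensitive mapping.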

Tracking API Requests

Now that you know how you’re defining your rate limits, let’s work through how you can best store information about how close you are to a given rate limit. You don’t want to call an API blind! Regardless of whether you’re making API calls on a single server or a distributed system, the same general logic for choosing where to track a rate limit applies:

  • Temporary storage is the quickest option to track how many calls have been made to a server. However, we lose this information if the process terminates (whether through an error or because the script finishes).
  • Persistent storage with a database is slower. However, data is saved in case of interruption or sync completion and can be referenced later. 

Which storage you use depends on the timeframe for the rate limit. For example:

  • Integration with a Rate Limit of 10/s should use temporary storage. Why? Because of the faster expiration (seconds!), it would make sense to store this locally because of the speed at which your program can check it.  
  • Integration with a rate limit of 100 calls per day should use persistent storage. Why? The slower rate limit means that your process will need to know in hours, not seconds, whether it should try again. You don’t want to worry about losing this information, and you’re not calling this process many times a second — so store it in a persistent database. 

A proper implementation of rate limit tracking will use both methods. While performing syncs, it's likely best to track counts in temporary storage and periodically save them to persistent storage. 
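
One way to combine the two kinds of storage is a counter that lives in memory and checkpoints itself to a database every so often. The sketch below is illustrative only: SQLite stands in for your application database, and the table name, column names, and `flush_every` interval are all assumptions. Window expiry is left out to keep the example short.

```python
import sqlite3


class CheckpointedCounter:
    """In-memory incident counter with periodic saves to a database."""

    def __init__(self, db: sqlite3.Connection, key: str, flush_every: int = 100) -> None:
        self.db = db
        self.key = key
        self.flush_every = flush_every
        db.execute(
            "CREATE TABLE IF NOT EXISTS rate_limit_counts "
            "(limit_key TEXT PRIMARY KEY, incident_count INTEGER NOT NULL)"
        )
        # Resume from the last checkpoint if an earlier process saved one.
        row = db.execute(
            "SELECT incident_count FROM rate_limit_counts WHERE limit_key = ?", (key,)
        ).fetchone()
        self.count = row[0] if row else 0
        self._unsaved = 0

    def increment(self, amount: int = 1) -> int:
        self.count += amount
        self._unsaved += amount
        if self._unsaved >= self.flush_every:
            self.flush()
        return self.count

    def flush(self) -> None:
        """Write the current count to persistent storage."""
        self.db.execute(
            "INSERT OR REPLACE INTO rate_limit_counts (limit_key, incident_count) "
            "VALUES (?, ?)",
            (self.key, self.count),
        )
        self.db.commit()
        self._unsaved = 0


db = sqlite3.connect(":memory:")
counter = CheckpointedCounter(db, "example-platform:user-1:daily", flush_every=3)
for _ in range(3):
    counter.increment()  # the third call triggers a save

# A new process picking up the same key starts from the saved count.
print(CheckpointedCounter(db, "example-platform:user-1:daily").count)  # 3
```

The trade-off is the one described above: anything counted since the last flush is lost if the process dies, so a smaller `flush_every` buys safety at the cost of more database writes.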

For robust applications, and to cover the 4 different types of rate limits, it is likely best to wrap all of your rate limit tracking into a manager. This manager can be accessed by your data engine and specifically called when your application:

  1. Makes an API Request
  2. Fetches a number of data elements via API
  3. Starts a new process or connection to a 3rd party
  4. Encounters a non-200 response from an API

Use this manager to implement your different storage access, and have a “check” method that will validate that you’re below thresholds before making an API call.
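
A stripped-down version of such a manager is sketched below, with one hook per event in the list above and a single `check` method. Everything here is hypothetical naming, storage is a plain in-memory sliding window, and unsuccessful responses are counted over a window rather than strictly "in a row", which is a deliberate simplification.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Optional, Tuple

REQUESTS, MODELS, FAILURES = "requests", "models", "failures"


class RateLimitManager:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # name -> (threshold, window length in seconds)
        self._windows: Dict[str, Tuple[int, float]] = {}
        # name -> (timestamp, amount) pairs that are still inside the window
        self._events: Dict[str, Deque[Tuple[float, int]]] = {}
        self._session_threshold: Optional[int] = None
        self._open_sessions = 0

    def add_window_limit(self, name: str, threshold: int, window_seconds: float) -> None:
        self._windows[name] = (threshold, window_seconds)
        self._events[name] = deque()

    def set_session_limit(self, threshold: int) -> None:
        self._session_threshold = threshold

    def _record(self, name: str, amount: int) -> None:
        if name in self._events:  # ignore events for limits we do not track
            self._events[name].append((self._clock(), amount))

    # Hooks for the four events the data engine reports.
    def on_request(self) -> None:
        self._record(REQUESTS, 1)

    def on_models_fetched(self, count: int) -> None:
        self._record(MODELS, count)

    def on_session_opened(self) -> None:
        self._open_sessions += 1

    def on_session_closed(self) -> None:
        self._open_sessions = max(0, self._open_sessions - 1)

    def on_unsuccessful_response(self) -> None:
        self._record(FAILURES, 1)

    def _current(self, name: str) -> int:
        _, window = self._windows[name]
        events = self._events[name]
        cutoff = self._clock() - window
        while events and events[0][0] <= cutoff:
            events.popleft()  # drop incidents that have aged out
        return sum(amount for _, amount in events)

    def check(self) -> bool:
        """True when every configured limit is still below its threshold."""
        if self._session_threshold is not None and self._open_sessions >= self._session_threshold:
            return False
        return all(
            self._current(name) < threshold
            for name, (threshold, _) in self._windows.items()
        )
```

The data engine would call `check()` before each API call and the matching `on_*` hook afterwards. Injecting the clock keeps the manager easy to test without real sleeps.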

What to do When a Rate Limit Threshold Has Been Reached

So far, we’ve discussed how to define, track, and store rate limit information while your application is performing data syncs. But what happens when we hit the rate limit threshold? Because of the nature of rate-limiting, the only solution is to wait until we are below the threshold. However, we do have agency over how long we wait until we are below the threshold. 

Merge uses two main approaches when we approach a rate limit: 

  1. Exponential backoff: Exponential backoff is a general technique of slowing down a constrained process until it reaches an appropriate speed. For us, this means our data sync sleeps for a small period before re-checking temporary storage to see whether we are within the rate limit. If we are still at the threshold, we sleep again for an exponentially increased amount of time before another retry. At Merge, we tune backoffs specifically for each rate limit in the configuration mentioned above.
  2. Save + Schedule: Sleeping a process is not always ideal in large applications because it wastes resources that could be used for other work. In the “save and schedule” method we commit our current rate limit status to persistent storage, log where we are in the data sync, and then schedule the task to pick up where it left off at a later time. 
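
The exponential backoff approach can be sketched in a few lines. This is a generic illustration rather than Merge’s implementation: the function name, the default delay, and the idea of passing in a zero-argument `is_below_threshold` check are all assumptions. The backoff factor and retry count correspond to the configuration attributes defined earlier.

```python
import time
from typing import Callable


def wait_until_below_threshold(
    is_below_threshold: Callable[[], bool],
    base_delay: float = 0.5,
    backoff_factor: float = 2.0,
    retry_count: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Sleep with exponentially growing delays until the check passes.

    Returns True once we are below the threshold, or False if we are
    still at the threshold after `retry_count` sleeps.
    """
    delay = base_delay
    for _ in range(retry_count):
        if is_below_threshold():
            return True
        sleep(delay)
        delay *= backoff_factor  # 0.5s, 1s, 2s, 4s, ... with the defaults
    return is_below_threshold()


# Simulated check that only passes on the third attempt.
attempts = iter([False, False, True])
delays = []
ok = wait_until_below_threshold(
    lambda: next(attempts), base_delay=1.0, sleep=delays.append
)
print(ok, delays)  # True [1.0, 2.0]
```

When the function returns False, the caller has exhausted its retries, which is a natural point to fall back to the save and schedule approach instead of sleeping any longer.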

Which approach do I use when reaching rate limit thresholds? 

The main factor in your decision is going to be the time frame of the limit. Sleeping a process is a waste of resources: if a rate limit is configured with a time period of several hours or a day, you’ll want to give those resources back to the machine and schedule a time later to try. But another consideration is that “starting from where I left off” will either not be possible for some 3rd Party APIs, or will be more computationally expensive than just sleeping for a few seconds and trying again. 
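
That decision can be written down as a small helper. The sketch below is only one possible reading of the trade-off: the names are hypothetical, and the 60-second cutoff for "short enough to sleep through" is an arbitrary assumption that you would tune for your own workloads.

```python
from enum import Enum


class Strategy(Enum):
    BACKOFF = "exponential_backoff"
    SAVE_AND_SCHEDULE = "save_and_schedule"


def choose_strategy(
    window_seconds: float,
    can_resume: bool,
    max_sleep_seconds: float = 60.0,  # assumption: tune per workload
) -> Strategy:
    """Pick how to wait once a rate limit threshold has been reached."""
    if not can_resume:
        # The API gives us no way to pick up where we left off,
        # so rescheduling would mean starting the sync over.
        return Strategy.BACKOFF
    if window_seconds > max_sleep_seconds:
        # Long windows (hours, days): free the worker and come back later.
        return Strategy.SAVE_AND_SCHEDULE
    return Strategy.BACKOFF


print(choose_strategy(window_seconds=1, can_resume=True).value)      # exponential_backoff
print(choose_strategy(window_seconds=86400, can_resume=True).value)  # save_and_schedule
```

A fuller version might also weigh how expensive resuming is for a given API, since that cost can outweigh a few seconds of sleep.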

Implementing Rate Limit Tracking in a Distributed System

When you’re making calls off of a simple application on your laptop or maybe even a single server, the two types of storage are pretty easy to work with:

  • Store temporary information on rate limits in a local variable or globally defined hash set
  • Store persistent data in a local DB or filesystem 

For enterprise applications, you’ll likely have data syncs distributed amongst multiple servers or even entire clusters and data centers. Even for the same end-user, you could be running multiple syncs at once on different machines, but they all have to stay under one rate limit.

We want to keep the same 2-storage type approach, but change our technology implementation as follows:

  • Store temporary information in a shared cache (Merge uses Redis, which has been extremely performant for us). This will give you fast reads and writes, but take special consideration of:
      • Locking: You want rate limit tracking to be fast, so that it slows down neither your data syncs nor the tracking itself. If tracking falls out of step, your current “rate limit status” will be out of date relative to the real 3rd party server.
      • Race conditions: These occur when multiple machines update the same rate limit status for the same user. At Merge, we use only increments and expirations when counting limits (queue + set methods).
  • Store persistent information in your regular application database. While this data will be accessed far less frequently than the temporary storage, take special care not to “forget” about it and rely only on temporary storage. Make sure your data engine always checks persistent storage for an active rate limit block before starting again, because there is a chance your temporary storage has purged old data while a rate limit block is still in effect!
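
To make the "increments and expirations" idea concrete, here is a minimal fixed-window counter. It is a sketch under assumptions, not Merge’s code: `record_incident` is a hypothetical helper written against any client that exposes `incr` and `expire` (the redis-py client has methods with these names), and `InMemoryStore` is a single-process stand-in so the example runs without a Redis server.

```python
import time
from typing import Callable, Dict, Protocol, Tuple


class CounterStore(Protocol):
    def incr(self, name: str, amount: int = 1) -> int: ...
    def expire(self, name: str, time: int) -> bool: ...


def record_incident(store: CounterStore, key: str, window_seconds: int, amount: int = 1) -> int:
    """Atomically bump the shared counter and return the new count."""
    count = store.incr(key, amount)
    if count == amount:
        # First incident in this window: start the expiration clock.
        store.expire(key, window_seconds)
    return count


class InMemoryStore:
    """Local stand-in for a shared cache. Not shared across processes."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[int, float]] = {}  # key -> (count, expires_at)

    def incr(self, name: str, amount: int = 1) -> int:
        count, expires_at = self._data.get(name, (0, float("inf")))
        if self._clock() >= expires_at:
            count, expires_at = 0, float("inf")  # key expired: start over
        count += amount
        self._data[name] = (count, expires_at)
        return count

    def expire(self, name: str, time: int) -> bool:
        if name not in self._data:
            return False
        count, _ = self._data[name]
        self._data[name] = (count, self._clock() + time)
        return True


store = InMemoryStore()
key = "example-platform:user-1:requests"  # hypothetical key layout
print(record_incident(store, key, window_seconds=1))  # 1
print(record_incident(store, key, window_seconds=1))  # 2
```

Because each worker only ever increments, two machines updating the same key cannot overwrite each other’s counts the way a read-modify-write would. One caveat: the increment and the expiration are two separate calls here, so a worker that dies between them would leave a key with no expiry. Wrapping both in a transaction or server-side script is a common way to close that gap.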

Using Dynamic Thresholds to Manage Rate Limits

The design above should offer you most of the functionality you’ll need to define, track, and stay under rate limits. 

One final feature that we’ve built on top of our tracking system is Dynamic Default Thresholds. 

We previously mentioned that rate limit thresholds are dynamic in Merge because they can be changed per customer and end-user enabling granular control over how much work Merge does on behalf of our customers. The thresholds can also be configured to be even more dynamic by being adjusted based on anticipated data load. 

Remember from Part 1 (“Why You Should Care About Being Rate Limited”) that an important consideration in where to set our thresholds is that Merge, or your product, is likely not the only user of an API. While we can set a threshold and track the work that we’re doing, there’s no good way to know where we stand against the total rate limit unless the API exposes headers or endpoints that report your current rate limit count.

To compensate for this, we set relatively conservative thresholds for our rate limits. Of course, this lower threshold is going to slow down data syncs because the more we run into the threshold, the more often we will either have to back off or try again later. 

What would an ideal solution to this be? If we could know beforehand how much data we’d need to fetch, then we can specifically tailor a threshold to:

  • Perform the data sync our data-engine needs
  • Leave room for other users of the same API

And? You guessed it — Merge has several techniques to estimate the amount of data we are about to start fetching and approximately how many requests it will take to do so. Before starting a sync we adjust the rate limit threshold (always respecting a customer-set limit) to either sync faster for higher data loads, or more conservatively for smaller ones. 
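
The article does not spell out how the estimate is turned into a threshold, so the following is purely an illustrative guess at the shape of such a function. The linear interpolation, the 50% and 90% bounds, and all names are assumptions, and the estimation step itself is taken as given.

```python
from typing import Optional


def dynamic_threshold(
    estimated_requests: int,
    max_count: int,
    conservative_pct: int = 50,  # assumption: floor for small syncs
    aggressive_pct: int = 90,    # assumption: ceiling for large syncs
    customer_cap_pct: Optional[int] = None,
) -> int:
    """Return how many incidents per window to allow for the next sync."""
    # Share of one window's budget the sync is expected to need, capped at 100%.
    load = min(max(estimated_requests, 0), max_count)
    # Interpolate linearly between the conservative and aggressive settings.
    pct = conservative_pct + (aggressive_pct - conservative_pct) * load // max_count
    if customer_cap_pct is not None:
        pct = min(pct, customer_cap_pct)  # a customer-set limit always wins
    return max_count * pct // 100


print(dynamic_threshold(estimated_requests=0, max_count=1000))     # 500
print(dynamic_threshold(estimated_requests=500, max_count=1000))   # 700
print(dynamic_threshold(estimated_requests=5000, max_count=1000))  # 900
print(dynamic_threshold(5000, 1000, customer_cap_pct=60))          # 600
```

Small syncs stay well below the limit and leave room for other users of the same API, while large syncs are allowed to run closer to it, never exceeding whatever cap the customer has set.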

Implementing this feature is difficult and likely the subject of a blog post in itself. After it's implemented, you just need to tune your rate limit management system for all of the different attributes listed above. Implementing such a system can be difficult, especially at scale, but is a vital part of any integration effort.

If you made it this far in the article then congrats! You just read nearly 3,000 words on how to successfully track and prevent yourself from being rate limited! If you’re looking to implement such a system into your product, but don’t have the time or resources to do so at scale, then consider Merge. We’ve already built out rate-limiting for our HR and Payroll, ATS, CRM, ticketing, file storage, and Accounting systems, so start building with a free account or schedule a demo.
