
Rate Limiting

What is rate limiting?

Rate limiting is a strategy for limiting network traffic. It puts a cap on how often someone can repeat an action within a certain timeframe – for instance, trying to log in to an account. Rate limiting can help stop certain kinds of malicious bot activity. It can also reduce strain on web servers. However, rate limiting is not a complete solution for managing bot activity.

What are the main uses of rate limiting?

The primary aim of rate limiting is to ensure fair use of shared resources. Beyond that, rate limiting is a versatile technique that organizations apply for a wide variety of reasons.

Rate Limiting Offers Extra Security

Rate limiting helps prevent denial-of-service (DoS) attacks, in which a malicious user initiates a massive number of service requests to bring down a system. For example, if a concert ticket sales website receives a million requests in the second tickets go on sale, the flood can choke the system and make the web server and database unavailable. With rate limiting, the website can block such floods before they overwhelm it. A denial-of-service condition can also occur without any malicious intent, when an error in the requesting client system causes it to issue requests uncontrollably. Rate limiting prevents these unintended attacks as well.

Access Control

Rate limiting is not restricted to capping the number of requests; it can also be adapted to limit the level of access. For example, in an API-based service for viewing and modifying a user's personal details, the rate-limiting layer can implement different access levels: one set of users can only view the personal details, while a second set can both view and modify them.

Metering for APIs

In API business models, rate limiting can be used to meter usage. For example, if a user has signed up for a plan that allows 1,000 API requests per hour, the rate-limiting logic rejects any request above that cap. The rate-limiting algorithm can also allow the user to dynamically purchase additional request capacity.
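As a rough sketch of how such metering might look, the snippet below checks a user's request count against a per-plan hourly cap. The plan names, limits, and the check_quota function are invented for illustration, not taken from any real API.

    # Hypothetical plan-based API metering; plan names and limits are
    # invented for illustration.
    PLAN_LIMITS = {
        "free": 100,      # requests per hour
        "basic": 1000,
        "pro": 10000,
    }

    def check_quota(user_plan: str, requests_this_hour: int) -> bool:
        """Return True if the user is still within their hourly quota."""
        return requests_this_hour < PLAN_LIMITS.get(user_plan, 0)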

Guarantees Performance

A key objective of rate limiting is to protect the performance of an API. When the system allows unlimited requests, the performance of the server degrades and the API slows down. In extreme cases, the server might stop serving requests altogether. This can lead to cascading failures in a distributed system, where the load from the failed server is redistributed to the other servers and gradually overloads them as well. Rate limiting prevents this condition by restricting requests at either the user level or the server level.

Ensures Availability

One of the main requirements of API-based services is 24/7 availability. Every second, thousands of users may access an API, and even a few seconds of outage can cause a significant loss for the organization. It is therefore in every organization's best interest to strive for zero downtime. Rate limiting, along with other techniques such as load sharing, allows organizations to contain sudden bursts of API requests and keep the system available.

What kinds of bot attacks are stopped by rate limiting?

Rate limiting is often employed to stop bad bots from negatively impacting a website or application. Bot attacks that rate limiting can help mitigate include:

  • Brute force attacks
  • DoS and DDoS attacks
  • Web scraping

How does rate limiting work?

Rate limiting also protects against API overuse, which is not necessarily malicious or due to bot activity, but is important to prevent nonetheless.

Rate limiting typically runs within an application rather than on the web server itself. In most cases, it is based on tracking the IP addresses that requests come from and how much time elapses between requests. The IP address is the main way an application identifies who or what is making a request.

A rate limiting solution measures the amount of time between each request from each IP address, and also measures the number of requests within a specified timeframe. If there are too many requests from a single IP within the given timeframe, the rate limiting solution will not fulfill the IP address's requests for a certain amount of time.

Essentially, a rate-limited application will say, "Hey, slow down," to unique users that are making requests at a rapid rate. This is comparable to a police officer who pulls over a driver for exceeding the road's speed limit, or to a parent who tells their child not to eat so much candy in such a short span of time.
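To make the mechanism concrete, here is a minimal in-memory sketch of that approach in Python. The constants and the is_allowed helper are hypothetical names, and a production system would normally keep these counters in a shared store rather than in process memory.

    # Minimal in-memory rate limiter using a sliding log of timestamps.
    import time
    from collections import defaultdict, deque

    MAX_REQUESTS = 100     # requests allowed per client per window
    WINDOW_SECONDS = 60    # length of the tracking window

    # Maps a client key (typically an IP address) to recent request times.
    _request_log = defaultdict(deque)

    def is_allowed(key: str) -> bool:
        """Return True if this client may make another request right now."""
        now = time.monotonic()
        log = _request_log[key]
        # Discard timestamps that have aged out of the window.
        while log and now - log[0] > WINDOW_SECONDS:
            log.popleft()
        if len(log) >= MAX_REQUESTS:
            return False
        log.append(now)
        return True

When is_allowed returns False, a web application would typically respond with HTTP status 429 ("Too Many Requests") rather than fulfilling the request.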

How does rate limiting work with user logins?

Users may find themselves locked out of an account if they unsuccessfully attempt to log in too many times in a short amount of time. This occurs when a website has login rate limiting in place.

This precaution exists, not to frustrate users who have forgotten their passwords, but to block brute force attacks in which a bot tries thousands of different passwords in order to guess the correct one and break into the account. If a bot can only make 3 or 4 login attempts an hour, then such an attack is statistically unlikely to be successful.

Rate limiting on a login page can be applied according to the IP address of the user trying to log in, or according to the user's username. Ideally it would use a combination of the two, because:

  • If rate limiting is applied only by IP address, brute force attackers can bypass it by attempting logins from multiple IP addresses (perhaps by using a botnet).
  • If it is applied only by username, an attacker with a list of known usernames can try a variety of commonly used passwords against those usernames, all from the same IP address, and is likely to break into at least a few accounts.

Because rate limiting is necessary to prevent these brute force attacks, users who can't remember their passwords may be rate limited along with malicious bots. Such users will likely see a "too many login attempts" message of some sort and be prompted to try again within a specified timeframe, or be advised that they are locked out of their accounts altogether.
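Continuing the hypothetical sketch from earlier, a login endpoint could apply both limits by calling the same helper with two different keys:

    # Combine per-IP and per-username login limits, reusing the
    # hypothetical is_allowed helper sketched above.
    def allow_login_attempt(ip: str, username: str) -> bool:
        # The per-IP limit slows one machine guessing many accounts; the
        # per-username limit slows a botnet guessing one account.
        return is_allowed("ip:" + ip) and is_allowed("user:" + username)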

How does rate limiting work for APIs?

An API, or application programming interface, is a way to request functionality from a program. APIs are invisible to most users, but they're extremely important for applications to function properly. For example, a restaurant's website could rely upon the API of a table reservation service to enable customers to make reservations online. Or, an eCommerce platform could integrate a shipping company's API to provide users with accurate shipping costs.

Every time an API responds to a request, the owner of that API has to pay for compute time: the server resources required for code to run and produce a response to that API request. In the example above, the restaurant's API integration will cause the table reservation service to pay for compute time whenever a restaurant customer makes a reservation.

For this reason, any application or service that offers an API for developers will limit how many API calls each unique user can make per hour or day. These limits keep third-party developers from overusing the API.

Rate limiting can also create a commercial incentive: developers can often make only a limited number of API calls before they must pay for a higher tier of the API service.

Rate limiting for APIs helps protect against malicious bot attacks as well. An attacker can use bots to make so many repeated calls to an API that it renders the service unavailable for anyone else, or crashes the service altogether. This is a type of DoS or DDoS attack.

How do social media platforms like Twitter and Instagram use rate limiting?

Social media platform rate limiting is essentially API rate limiting. Any third-party application that integrates with Twitter, for instance, can only refresh to look for new tweets or messages a certain number of times per hour. Instagram has similar limits for third-party apps. This is why users may occasionally encounter "rate limit exceeded" messages.

These limits typically don't apply to users who are using the social media platform directly.

Rate Limiting Algorithms

In general, a rate is a simple count of occurrences over time. However, there are several different techniques for measuring and limiting rates, each with its own uses and implications.

  • Token bucket: A token bucket maintains a rolling and accumulating budget of usage as a balance of tokens. This technique recognizes that not all inputs to a service correspond 1:1 with requests. A token bucket adds tokens at some steady rate. When a service request is made, the service attempts to withdraw a token (decrementing the token count) to fulfill the request. If there are no tokens in the bucket, the service has reached its limit and responds with backpressure (see the sketch after this list).

    For example, in a GraphQL service, a single request might result in multiple operations that are composed into a result. These operations may each take one token. This way, the service can keep track of the capacity that it needs to limit the use of, rather than tie the rate-limiting technique directly to requests.

  • Leaky bucket: A leaky bucket is similar to a token bucket, but the rate is limited by the amount that can drip or leak out of the bucket. This technique recognizes that the system has some degree of finite capacity to hold a request until the service can act on it; any extra simply spills over the edge and is discarded. This notion of buffer capacity (but not necessarily the use of leaky buckets) also applies to components adjacent to your service, such as load balancers and disk I/O buffers.

  • Fixed window: Fixed-window limits, such as 3,000 requests per hour or 10 requests per day, are easy to state, but they are subject to spikes at the edges of the window as the available quota resets. A limit of 3,000 requests per hour, for example, still allows all 3,000 requests to be made in the first minute of the hour, which might overwhelm the service.

  • Sliding window: Sliding windows have the benefits of a fixed window, while the rolling window of time smooths out bursts. Systems such as Redis facilitate this technique with expiring keys.
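As an illustration of the first technique above, here is a minimal token bucket in Python. The class name, capacity, and refill rate are illustrative choices, not taken from any particular library.

    # Minimal token bucket: tokens accumulate at refill_rate per second,
    # up to capacity, and each operation withdraws from the balance.
    import time

    class TokenBucket:
        def __init__(self, capacity, refill_rate):
            self.capacity = capacity        # maximum tokens the bucket holds
            self.refill_rate = refill_rate  # tokens added per second
            self.tokens = capacity
            self.last_refill = time.monotonic()

        def try_withdraw(self, tokens=1):
            """Withdraw tokens for an operation; False signals backpressure."""
            now = time.monotonic()
            # Credit tokens for the elapsed time, capped at capacity.
            elapsed = now - self.last_refill
            self.tokens = min(self.capacity,
                              self.tokens + elapsed * self.refill_rate)
            self.last_refill = now
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            return False

    # Example: allow bursts of up to 10 operations, refilled at 5 per second.
    bucket = TokenBucket(capacity=10, refill_rate=5)

A GraphQL service like the one described above could call try_withdraw once per operation rather than once per request, so that expensive requests consume proportionally more of the budget.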

When you have many independently running instances of a service (such as Cloud Functions) in a distributed system, and the service needs to be limited as a whole, you need a fast, logically global (global to all running instances, not necessarily geographically global) key-value store like Redis to synchronize the various limit counters.
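As a sketch of that pattern, the snippet below implements a fixed-window counter shared across instances via Redis. It assumes the redis-py client; the key naming scheme and the limits are invented for illustration.

    # Fixed-window counter synchronized across service instances via Redis.
    import time
    import redis

    r = redis.Redis()        # one logical store visible to every instance
    LIMIT = 3000             # requests allowed per window
    WINDOW_SECONDS = 3600

    def allow_request(user_id: str) -> bool:
        window = int(time.time()) // WINDOW_SECONDS
        key = "rl:%s:%d" % (user_id, window)
        count = r.incr(key)                  # INCR is atomic in Redis
        if count == 1:
            r.expire(key, WINDOW_SECONDS)    # old windows expire on their own
        return count <= LIMIT

Because INCR is atomic, every instance sees a consistent count without extra locking, at the cost of the edge-of-window spikes described above.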
