How is rate limiting implemented?

I would like to implement a multithreaded client which may scale up or down. The simplest way to handle the rate limiting is to rely on the API to tell me when I’ve hit a limit; then each client can exponentially back off until the rate limit is restored.

But this depends on how rate limiting is implemented. When I hit my limit, do I get a specific error code? Am I limited for a certain amount of time? Is the 600-request counter reset every 5 minutes, or is it averaged over a different period?


Rate limiting is based on a maximum number of requests within a given time window.

By default, the limit is 600 requests in a 5-minute window, and the window begins when the first request is made.

Each request returns an X-Ratelimit-Reset header. This is a Unix timestamp; once that time passes, the remaining rate limit (X-Ratelimit-Remain) is reset to the maximum (X-Ratelimit-Limit).
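
As a rough illustration of how a client might use this header, here is a minimal Python sketch. It assumes the response headers are available as a plain dict of strings; the function name seconds_until_reset and the now parameter are made up for this example and are not part of the API.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return the number of seconds until the rate-limit window resets.

    `headers` is assumed to be a dict of response headers.
    `now` is a hypothetical hook for passing in the current Unix time.
    """
    now = time.time() if now is None else now
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    reset_at = float(lowered["x-ratelimit-reset"])
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, reset_at - now)
```

For example, with a reset timestamp of 1000 and a current time of 990, this returns 10.0.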

When you exceed the rate limit, any further requests will return a 429 response.

After receiving a 429 response, you should wait until the time specified in X-Ratelimit-Reset has passed before making further requests.
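
One possible way to apply this in a client is sketched below in Python. It is only an outline under some assumptions: send is a hypothetical callable that performs one request and returns a (status_code, headers) pair, the 429 response is assumed to carry the X-Ratelimit-Reset header like any other response, and the extra one-second margin is an arbitrary choice to allow for clock differences.

```python
import time


def call_with_rate_limit(send, max_attempts=5, sleep=time.sleep, clock=time.time):
    """Call `send` and, on a 429, wait until the reset time before retrying.

    `send` is a hypothetical callable returning (status_code, headers).
    `sleep` and `clock` are injectable so the logic can be tested offline.
    """
    for _ in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status, headers
        # If the header is missing, fall back to "now", which means
        # only the one-second safety margin below is waited.
        reset_at = float(headers.get("X-Ratelimit-Reset", clock()))
        wait = max(0.0, reset_at - clock()) + 1.0
        sleep(wait)
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

In a multithreaded client, every thread draws on the same limit, so it may be worth sharing the reset time between threads rather than letting each one discover it through its own 429.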

Can the ‘600 requests in a 5-minute window’ limit be changed in some way?
That is, is there any way to increase the number of requests per 5 minutes, or even to disable this limit?

Thank you very much in advance!
