Rate Limiter
Materials — open to everyone, no sign-in
Topic: Rate Limiter
Interviewer: Pinglu
Interviewee: 09817167d
Level: L4 (Experienced Individual Contributor)
Rate limiter
Mock System Design Interview Summary
Interview Overview
Date:
Target level:
Duration: 1 hour
Topic covered: Rate limiter
Drawing tool used
Requirements
(side note: candidate is Ph.D. new grad, but already joined system design interviews with real companies)
Functional requirements
Limit the rate of requests
By request type or by user
Frequency per minute depends on users
Services are deployed on multiple servers
Return boolean if false, response of http status of 400
Limit based on per minute, per hour
Non functional requirements
System Design
External APIs
For each request
Input parameter: user, request type
Limit based on per minute, per hour
Input parameters: int api_id, int userId
Output:
boolean
true: process request, give a response
False: return HTTP status
Database schema
Data model:
APITable
Key: api_id
Value: rate_limit
UserTable
Key: userId
Values: rate_limit, email, phone
Low latency:
cache rate count of each api in time intervals, cache user info; cache API info
Multiple clusters around the world; clusters around
System design diagram
Type of storage: cache, redis, memcached
For each API, create a hashTable
hashTable, key is userID, value is count of frequency in current time interval
Type of storage: hashTable, key
API_ID 2Bytes + userID 8Bytes + overhead of hashTable 20B = 30B per count
Assuming 1000 requests per time interval
30KB per user
1 million users, 30GB storage
IP address and user?
IP cannot distinguish good/bad users
Effectively limit the total rate
Users: pros: distinguish good/bad users
Cons: short period of time of surges
Cons: login
Algorithm:
For API working on single server, sliding window will be a simple and effective algorithm
API on multiple server: token method.
For each server we can allocate a few tokens
If a request comes, the server reduces from the token locally
Servers communicate with each other
Discussions during the Interview
Interviewer:
customer sends request
Customer goes to the rate limiter, then go to the actual server
Interviewee:
Move the rate limit to the LB
For server with high computing capability, we can allocate more tokens -> more job allocated
Eventual consistency
It is possible to compute more jobs in a time interval than limit -> allow negative count of token
https://docs.google.com/document/d/182rZ1om9Tj1RNCNPRRm5DSfWmXJn1Gm9Y4gLk6vjQIM/edit
Interviewer and Audience Feedback after the Interview
Interviewer feedback
Did well:
Gather requirement well
Design API ahead of time
Right cache design, key and value
Pros and cons of IP and users
Explain clearly
To improve
Cluster vs single machine. Does not affect the rate limiter. Rate limiter happens before requests hit the server
Explain the algorithm more clearly. Sliding and fixed window. Pros and cons.
Rate limiter: does not fit into template. Type of storage. It took a lot of time. We don’t need to persist data in the database. We can remove
Limiter:
Per server, per API
It can be separate from
QPS is very high for limiter
Therefore we don’t need
Audience:
Directly decorate on the API server
Stickiness on the load balancer
Audience:
Need to switch algorithm
Audience
Is there a traffic spike?
If there is no traffic spike. There may need to be a queue
If a spike goes to 1000 QPS per second from usual 100 QPS
Can support lots of request
Should be able to handle a lot of requests
Can queue up and still receive the response
Audience
If capacity is enough, can add burst handling
Audience
Rate limit should reject?
Overall rate limit at load balancer
Server can have its own limit
Audience
Rate limit: 多个customer
There is some flexibility to go over the limit
burst , bucket
Audience
Why per user rate limit
Protection of the server: may not need to limit by user
Audience
Some customers may automate the requests using script
Interviewer
Security may be a consideration
Audience
Some web servers allow anonymous
User story
Based on work experience, Restful api, where can we get the user ID?
Distributed system: may be for a backend server
Anonymous token?
Use IP
Load balancer: IP hash. Fixed ip->fixed server
Backend:
Load balancer may put IP into X-IP forward
Audience
Client side rate limiting
E.g. chrome browser. Javascript may limit it from the client side
Client side can help. Server side is required
Audience
Can we use DB?
DB, MemCacheD
All servers can read from the same DB
Pure memory: it’s possible that one user’s requests are sent to multiple
Sticky server
Or use MemCacheD
Based on diskDB is not feasible.
If use DB, must use memory based DB
Can also use memory
10 LBs -> each LB just use 10% of the rate limit
Same domain name can map to multiple IPs, then it may randomly pick one of the IPs
If it’s for different services
Load balancer needs all rules for rate limiting
Too much business logic in the LB
Rate limiter can be put on each
IP hash may not be balanced.
Using IP based
What’s the topology?
LB -> Rate limiter
Rate limiter can be put on LB, can also be put on the server behind LB
Traffic usually reaches LB first, then rate limiter
Token server
Are we testing the rate limiter algorithm as well?
Per user, per API
Same user may get different quota
API is heavy or light
Can load different algorithm based on
Rate limiter: VIP, IP address, can get a few more token
VIP: queue1. Regular: queue2
Preempted: VIP not preempted. IP preempted
Can sliding window handle distributed
Distributed: should use token
Single machine: can use sliding window
Regardless whether its token or sliding window, whether it’s token, it has pros and cons
Token vs sliding window
Distributed vs single machine
Load balancer put at LB vs backend
Let’s say 3 regions
500QPS for asian
200QPS for other regions
Reason for multiple regions
DNS can handle first level of LB, split to beijing vs shanghai
Second level is within one region
We can have 2 layers of rate limiter
First one is at LB
Second one is at service level
L4 vs L7 rate limiter
Key
UserID:API
Value
Number of time hit
Expiration
1 second
Easy to maintain
Not as accurate as sliding window
We can add buffer to make the system behavior more smooth
3 choices
Rate limiter at LB
Rate limiter at API server
Rate limiter can be separate from API
DNS load
Rate limit can be for IP
Rate limit can be for user
IP