Rate Limiter
Topic: Rate Limiter
Interviewer: 乔磊
Interviewee: Jin📷🏂🍛🏃🍹.C (Jin_Jin_Jin_424)
Level: L5 (Senior)
QR code
Sign up for future events. Presenters for system design appreciated.
Sign up for 1-1 coaching:
WGROUP_08G 八月底前获取30%优惠
Mock System Design Interview Summary
Interview Overview
Date: 8/21/2022
Target level: L5
Duration: 45 minutes
Topic covered: Rate limiter
Drawing tool used: Excalidraw
Requirements
Functional requirements
Rate limiter for public services in an e-commerce company
Different rate limits for different services.
For each request, we need to limit requests based on user ID and service
For example, post API for creating an item
Scaling requirements
Assume we have 1M active users
Users 20 requests per day
Scaling estimate
20M requests per day -> QPS: 20M / 10^5 = 200
System Design
External APIs
System design
Where should the rate limiter be?
Application gateway, vs application proxy
Put the rate limiter in both service proxy and application proxy
IP based throttling in the API gateway
Interviewer: IP based throttling is not required
Interviewee: we can put the throttling at the application proxies for user based
Which algorithm will be used
Token bucket; 100 requests to A service per minute
Fixed window algorithm
How to set up config
Add config files and config service
What is in the configuration file?
YAML:
Domain: service; service_A
Descriptors:
key:
value:
rate_limit:
Unit: S, M, H, D
rate_limit_value: 1
Example:
Domain: SERVICE; service_B
Descriptors:
Key: service_name
Value: service_A
Rate_limit:
Unit: m
Rate_limit_value: 100
Interviewer:
Right now it covers calls from service B to A
What if there are 1M users? Do we need 1M entries in the configuration file?
Key: “user_id”
Rate_limit:
Unit: s
rate_limit_value: 100
ShouldAllowRequest(domain=”service_B”, descriptor={key=”user_id”, value=user123})
Interviewer: Enable this scenario: For most users we limit the requests to 100 requests/second. For one important user, we limit the requests to 1000 request/second
Interviewee:
how to setup the config
YAML:
domain: SERVICE; service_B
descriptors:
key: service_name
value: service_A
rate_limit:
unit: m
rate_limit_value: 100
// user_id
key: “user_id”
rate_limit:
unit: s
rate_limit_value: 10
key: “user_id”
value: user123
rate_limit:
unit: s
rate_limit_value: 100
Interviewee: Implementing the fixed window rate limit algorithm, add Redis
How to limit service B to send at most 100 requests to A service per minute.
GET key: service_b+service_A
If not set key
SET(key, 1)
expire(key, 1m)
Else
Value < 100
increment(key, 1)
LET_REWUEST_GO
Downside for fixed window algorithm, it may allow burst of requests when they are made near the time interval boundary
Interviewer: What happens when the request is denied?
Interviewee: we can drop the request, or put it in a queue to process later
Interviewer: how many Redis servers? How much memory?
Interviewee:
Redis can handle 50-100k write requests per second. One Redis can handle all requests.
Assume entry_size -> ~100Bytes
100 bytes * 20M = 2GB
Interviewer: How do you load the cache? How to load it into the rate limiter service?
Interviewee: Config service is an independent service with its own UI
Once we change the configuration, we can push the configuration files to the ratelimiter service
Interviewer: rate limiter service is a cluster, will there a timelapse to fully pushed to all nodes
Interviewee: yes there will be some delay to fully push. If we are concerned about this, we can put the configuration data into Redis
Interviewer: that sounds better because file operations are usually more dangerous.
Interviewer and Audience Feedback
Soft Skills
Hard Skills
Interviewer Feedback
Interviewee prepared well. Can control the timing. Can ask the key questions
Algorithm choices are not well chosen. Fixed window is not a very popular algorithm to use.
We don’t need to write pseudocode for the algorithm. I hope to hear the tradeoffs between different algorithms. Using memory, handling bursts.
Service team or on API gateway
Interviewee Self Review
I was surprised that we needed to discuss where to put the service. In industry these 2 places are both fine.
I prepared a better algorithm than the fixed window, but I was cut off. Should I insist on continuing to use a better algorithm?
Interviewer: I suggest going to the best algorithm directly. You can talk about the 2 algorithms at a high level.
Interviewer’s design
Tradeoff where to put rate limiter. Main tradeoff is integration effort
Recommend to web service cluster
Audience
Sidecar pattern
Daemon:
consumer can deploy the rate limiter daemon into its own service
Token bucket is better
Incoming is bursty, outgoing burst is allowed then token bucket is better
Otherwise, leaky bucket is better (outgoing burst is not allowed)
Audience
In real work, it’s a politics problem where to put the configuration server
Code review, independent control
API gateway: provide autonomous and central control
Sidecar it’s easier for autonomous control
Performance: API gateway: rate limit becomes a policy
Rate limit has high availability
Side car is better to increase high availability
Cache estimation:
What if we don’t provide cache.
For example, big user 1,000 QPS limit. Small user 10 QPS limit. We can push back to let the big customer route to a different resource.
Interviewer: Regular user may have 100, premium user may have 500 QPS limit. We don’t need to route to a different resource
Audience
How do we handle sidecar?
Execution plane can be separate
Centralized control plane
===
Audience
Outgoing burst allowed -> token bucket - easier to implement
Outgoing burst not allowed -> leaky bucket - timer - difficult to implement
Window is the most difficult to implement.
Interviewer:
Fallback strategy - default fallback configuration, e.g. guarantee 100 requests per second.
Interviewer:
5000G memory then it will need sharding
Redis cluster
Audience:
Should calculate concurrent user not total user
Interviewer:
Agreed. We need to estimate concurrent user per minute
Interviewee:
2 additional points
Add local token bucket
What happens if redis is down? Fallback to local token bucket
We can use 2 redis. One handles requests per second. One to handle requests per minute
SLA 99.99% - 10 seconds down per year
99.95% - 1 minute down per year
Interviewer:
Rate limit is a cluster
If the same user sends requests to 2 different machines. They may be out-of-date.
Audience:
Redis -> only a single thread. Guarantees consistency.
If we directly hit Redis, the changes are atomic.
Write throughput is lower
Deploy multiple Redis cluster. There can be different partition. We probably will write to a different cluster. Not very accurate.
Interviewee:
Increase, then check value.
Interviewer:
We can use a session sticker to route to the same Redis server.
Audience
How do we modify distributed redis when we use a token bucket?
Interviewer: First increase then check value.
Audience: however this cannot handle burst correctly. Check value may return a larger value after the burst.
Audience: increase counter will return the value
Audience: redis can provide cluster.
Audience
Fixed window: can use counter. Increment
Token bucket: requires implementing through LUA script. We implement using sorted set. Sorted set is a key value set. Second key with expiration.
First count the # of users in the
Add key to the sorted set.
Multi - can guarantee it is in one transaction.