Rate Limiter

System DesignReliability & Scaling

Topic: Rate Limiter

Interviewer: 乔磊

Interviewee: Jin📷🏂🍛🏃🍹.C (Jin_Jin_Jin_424)

Level: L5 (Senior)


QR code

Sign up for future events. Presenters for system design appreciated.

Sign up for 1-1 coaching:

https://commitway.com/store

WGROUP_08G 八月底前获取30%优惠

Mock System Design Interview Summary

Interview Overview

Date: 8/21/2022

Target level: L5

Duration: 45 minutes

Topic covered: Rate limiter

Drawing tool used: Excalidraw

Requirements

Functional requirements

Rate limiter for public services in an e-commerce company

Different rate limits for different services.

For each request, we need to limit requests based on user ID and service

For example, post API for creating an item

Scaling requirements

Assume we have 1M active users

Users 20 requests per day

Scaling estimate

20M requests per day -> QPS: 20M / 10^5 = 200

System Design

External APIs

System design

Where should the rate limiter be?

Application gateway, vs application proxy

Put the rate limiter in both service proxy and application proxy

IP based throttling in the API gateway

Interviewer: IP based throttling is not required

Interviewee: we can put the throttling at the application proxies for user based

Which algorithm will be used

Token bucket; 100 requests to A service per minute

Fixed window algorithm

How to set up config

Add config files and config service

What is in the configuration file?

YAML:

Domain: service; service_A

Descriptors:

key:

value:

rate_limit:

Unit: S, M, H, D

rate_limit_value: 1

Example:

Domain: SERVICE; service_B

Descriptors:

Key: service_name

Value: service_A

Rate_limit:

Unit: m

Rate_limit_value: 100

Interviewer:

Right now it covers calls from service B to A

What if there are 1M users? Do we need 1M entries in the configuration file?

Key: “user_id”

Rate_limit:

Unit: s

rate_limit_value: 100

ShouldAllowRequest(domain=”service_B”, descriptor={key=”user_id”, value=user123})

Interviewer: Enable this scenario: For most users we limit the requests to 100 requests/second. For one important user, we limit the requests to 1000 request/second

Interviewee:

how to setup the config

YAML:

domain: SERVICE; service_B

descriptors:

key: service_name

value: service_A

rate_limit:

unit: m

rate_limit_value: 100

// user_id

key: “user_id”

rate_limit:

unit: s

rate_limit_value: 10

key: “user_id”

value: user123

rate_limit:

unit: s

rate_limit_value: 100

Interviewee: Implementing the fixed window rate limit algorithm, add Redis

How to limit service B to send at most 100 requests to A service per minute.

GET key: service_b+service_A

If not set key

SET(key, 1)

expire(key, 1m)

Else

Value < 100

increment(key, 1)

LET_REWUEST_GO

Downside for fixed window algorithm, it may allow burst of requests when they are made near the time interval boundary

Interviewer: What happens when the request is denied?

Interviewee: we can drop the request, or put it in a queue to process later

Interviewer: how many Redis servers? How much memory?

Interviewee:

Redis can handle 50-100k write requests per second. One Redis can handle all requests.

Assume entry_size -> ~100Bytes

100 bytes * 20M = 2GB

Interviewer: How do you load the cache? How to load it into the rate limiter service?

Interviewee: Config service is an independent service with its own UI

Once we change the configuration, we can push the configuration files to the ratelimiter service

Interviewer: rate limiter service is a cluster, will there a timelapse to fully pushed to all nodes

Interviewee: yes there will be some delay to fully push. If we are concerned about this, we can put the configuration data into Redis

Interviewer: that sounds better because file operations are usually more dangerous.

Interviewer and Audience Feedback

Soft Skills

Hard Skills

Interviewer Feedback

Interviewee prepared well. Can control the timing. Can ask the key questions

Algorithm choices are not well chosen. Fixed window is not a very popular algorithm to use.

We don’t need to write pseudocode for the algorithm. I hope to hear the tradeoffs between different algorithms. Using memory, handling bursts.

Service team or on API gateway

Interviewee Self Review

I was surprised that we needed to discuss where to put the service. In industry these 2 places are both fine.

I prepared a better algorithm than the fixed window, but I was cut off. Should I insist on continuing to use a better algorithm?

Interviewer: I suggest going to the best algorithm directly. You can talk about the 2 algorithms at a high level.

Interviewer’s design

Tradeoff where to put rate limiter. Main tradeoff is integration effort

Recommend to web service cluster

Audience

Sidecar pattern

Daemon:

consumer can deploy the rate limiter daemon into its own service

Token bucket is better

Incoming is bursty, outgoing burst is allowed then token bucket is better

Otherwise, leaky bucket is better (outgoing burst is not allowed)

Audience

In real work, it’s a politics problem where to put the configuration server

Code review, independent control

API gateway: provide autonomous and central control

Sidecar it’s easier for autonomous control

Performance: API gateway: rate limit becomes a policy

Rate limit has high availability

Side car is better to increase high availability

Cache estimation:

What if we don’t provide cache.

For example, big user 1,000 QPS limit. Small user 10 QPS limit. We can push back to let the big customer route to a different resource.

Interviewer: Regular user may have 100, premium user may have 500 QPS limit. We don’t need to route to a different resource

Audience

How do we handle sidecar?

Execution plane can be separate

Centralized control plane

===

Audience

Outgoing burst allowed -> token bucket - easier to implement

Outgoing burst not allowed -> leaky bucket - timer - difficult to implement

Window is the most difficult to implement.

Interviewer:

Fallback strategy - default fallback configuration, e.g. guarantee 100 requests per second.

Interviewer:

5000G memory then it will need sharding

Redis cluster

Audience:

Should calculate concurrent user not total user

Interviewer:

Agreed. We need to estimate concurrent user per minute

Interviewee:

2 additional points

Add local token bucket

What happens if redis is down? Fallback to local token bucket

We can use 2 redis. One handles requests per second. One to handle requests per minute

SLA 99.99% - 10 seconds down per year

99.95% - 1 minute down per year

Interviewer:

Rate limit is a cluster

If the same user sends requests to 2 different machines. They may be out-of-date.

Audience:

Redis -> only a single thread. Guarantees consistency.

If we directly hit Redis, the changes are atomic.

Write throughput is lower

Deploy multiple Redis cluster. There can be different partition. We probably will write to a different cluster. Not very accurate.

Interviewee:

Increase, then check value.

Interviewer:

We can use a session sticker to route to the same Redis server.

Audience

How do we modify distributed redis when we use a token bucket?

Interviewer: First increase then check value.

Audience: however this cannot handle burst correctly. Check value may return a larger value after the burst.

Audience: increase counter will return the value

Audience: redis can provide cluster.

Audience

Fixed window: can use counter. Increment

Token bucket: requires implementing through LUA script. We implement using sorted set. Sorted set is a key value set. Second key with expiration.

First count the # of users in the

Add key to the sorted set.

Multi - can guarantee it is in one transaction.

https://redis.io/commands/incr/