Design Yelp
Topic: Design Yelp
Interviewer: guanyao
Interviewee: longwei
Level: L5 (Senior)
Sign up for future events
Mock System Design Interview Summary
Interview Overview
Date: 7/10/2022
Target level: L5
Duration: 45 minutes
Topic covered: Design Yelp
Drawing tool used: Excalidraw
Requirements
Functional requirements
Design Yelp
No result, return empty
A few radius choices: 1, 5, 10 miles
Business owner can update information, low QPS
Non functional requirements
100M DAU
200M businesses
Low latency, 500 ms in case of changing radius
Hours / 1 day delay. Ideally within minutes.
Low latency
Highly scalable
Durable
QPS
2 requests per day per user, peak hour * 5, 1k * 10 = 10k QPS
Storage: video/image
Low frequency for business updates
System Design
External APIs
User get_biz_from(cur_location, radius)
Biz_owner: CRUD update(biz_id, location, biz_info)
System design
Q: Purpose of API GW?
A: choking point for redirect. HTTPS termination.
Q: how to store based on location?
A: geo hashing, mapping 2D->1D
Search for longest prefix
Len 6 => 1 mile
Len 5 => 5 miles
Len 4 => 10 miles
Q: how does geohashing work?
A: cut the whole area into 4. Each with prefix 00, 01, 11, 10
Further division will append to the prefix 00, 01, 11, 10
How do you implement 5 mile, 10 mile radius?
Example geocache code. Resolution A:
abcdef
abcde
abcde
Q: How do you store geocache in database?
A: how we use it:
My_geohash: list of biz[]
Filter based on geohash
Return … biz id
Query the biz_info service, and fill in information
How we store it:
Main storage is redis cache, look up from list to business ID
Backend is a database, with geohash -> biz_id
Reading from Redis
Writing to geohash
Q: how to store it in database?
A: we can use regular SQL
Q: how do you search nearby businesses in SQL?
A: select biz_id where geohash = XX
Q: What if we don’t have enough results?
A:
Q: will we build 3 entries for the same location
A: yes, depends on resolution
Q: Too much resources consumed. Can we have 1 geocache code?
A: a few gigs. Because # of business doesn’t take a lot of storage.
Q: what if there is consistency issue between redis and database?
A: in redis we store like this: geohash: [biz1, biz2]
In database we store like this:
Geohash: biz1
Geohash: biz2
If radius is 10 miles, how will use search for the location?
Geohash_1mile: biz1…biz2
Geohash_5mile: biz1
Geohash_10mile: []
Do we have enough memory?
A:
8 byte, 200M * 3 = 24*2 = 5G
If it is the case, we can shard the storage by regions
8 bytes, 200M * 3 = 24 * 2 = 5G, 100, 100% = 50G
We can have one server in california, another for washington
Q: is it true for each restaurant we have 3 different entries stored in redis?
A: yes
Q: can you just use 1? Make 1,5,10 miles input parameters for querying
A:
Interviewer and Audience Feedback
Interviewer
Overall is good
Yelp: has higher expectation
Soft skill: try to get get to the point sooner
L5 weak hire
Good communication
Reasonable drawing
Soft skill: the requirement gathering was a bit too slow. Covered some corner cases. Should get to the point, how to search with prefix
API server: login, authentication. But the answer includes some other points not related.
How does geohash work? Should cover the high level first.
Hard skill: it seems in the database we can store just one entry for each business. Just store the longest one. You can use one of the two simpler solutions
B+ tree index
GBPass tree. Hashtable.
In Redis, you can store multiple times. In the database you should just store it once.
SQL: range queries
Interviewee
Timing was too slow during requirement gathering
In actual database, we can just store each business once
The interviewer is right
There may be multiple entries in Redis in Redis.
It seems 5-10G we can store all data. We can store everything in memory. I should cover this with you, option is to use memory to accelerate or use sortable geohash
Interviewer: if we store everything in database, database can handle the QPS
Audience:
Audience: Redis can be used as a persistent storage: https://redis.io/docs/manual/persistence/
Audience: Redis supports prefix search
Audience: soft skill
Lots of misspellings.
Can slow down when expressing ideas. Can think and communicate
Lots of feed words. Can listen more
Audience: there may be two closeby location that go into different geo bounding box
Interviewee: we can query using less granular way. We can query 9 nearby boxes
From 1 mile query, then we can use 5 mile query
Audience: it feels we need to handle some boundary conditions, to specially add other boxes
Audience: we can precompute nearby boxes.
Audience: it feels we should also consider the maximum number of restaurants. We don’t need to return 10,000 restaurants.
Interviewer: we can use this as a follow-up question.
We can either use number of restaurants. Or we use popular restaurants. Combining multiple results.
Audience: this is a business requirement.
Interviewer: additional criteria may include ethnicity/category.
Audience: can use 9 boxes, get results, then sort
Audience: do we use some size grid? Can we use smaller grid for downtown?
Audience: similar to quadtree
Interviewer: quadtree - near k business, can handle different densities.
Geohash is easier to implement.
Audience: we can put geohash into Redis. Redis natively support geohash, and can support sorting based on overlapping bits. We can shard based on city into multiple Redis shards
Audience: how do we handle different density of restaurants
Audience:
GEORADIUS key longitude latitude radius m|km|ft|mi [WITHCOORD] [WITHDIST] [WITHHASH] [ASC|DESC] [COUNT count] [STORE key] [STORedisT key]
Key may be different geography centers
Audience: quadtree, geohash. Space tradeoffs