Design Yelp · commitway

Topic: Design Yelp

Interviewer: guanyao

Interviewee: longwei

Level: L5 (Senior)

Mock System Design Interview Summary

Interview Overview

Date: 7/10/2022

Target level: L5

Duration: 45 minutes

Topic covered: Design Yelp

Drawing tool used: Excalidraw

Requirements

Functional requirements

Design Yelp

No result, return empty

A few radius choices: 1, 5, 10 miles

Business owner can update information, low QPS

Non functional requirements

100M DAU

200M businesses

Low latency, 500 ms in case of changing radius

Hours / 1 day delay. Ideally within minutes.

Low latency

Highly scalable

Durable

QPS

2 requests per day per user, peak hour * 5, 1k * 10 = 10k QPS

Storage: video/image

Low frequency for business updates

System Design

External APIs

User get_biz_from(cur_location, radius)

Biz_owner: CRUD update(biz_id, location, biz_info)

System design

Q: Purpose of API GW?

A: choking point for redirect. HTTPS termination.

Q: how to store based on location?

A: geo hashing, mapping 2D->1D

Search for longest prefix

Len 6 => 1 mile

Len 5 => 5 miles

Len 4 => 10 miles

Q: how does geohashing work?

A: cut the whole area into 4. Each with prefix 00, 01, 11, 10

Further division will append to the prefix 00, 01, 11, 10

How do you implement 5 mile, 10 mile radius?

Example geocache code. Resolution A:

abcdef

abcde

Q: How do you store geocache in database?

A: how we use it:

My_geohash: list of biz[]

Filter based on geohash

Return … biz id

Query the biz_info service, and fill in information

How we store it:

Main storage is redis cache, look up from list to business ID

Backend is a database, with geohash -> biz_id

Reading from Redis

Writing to geohash

Q: how to store it in database?

A: we can use regular SQL

Q: how do you search nearby businesses in SQL?

A: select biz_id where geohash = XX

Q: What if we don’t have enough results?

Q: will we build 3 entries for the same location

A: yes, depends on resolution

Q: Too much resources consumed. Can we have 1 geocache code?

A: a few gigs. Because # of business doesn’t take a lot of storage.

Q: what if there is consistency issue between redis and database?

A: in redis we store like this: geohash: [biz1, biz2]

In database we store like this:

Geohash: biz1

Geohash: biz2

If radius is 10 miles, how will use search for the location?

Geohash_1mile: biz1…biz2

Geohash_5mile: biz1

Geohash_10mile: []

Do we have enough memory?

8 byte, 200M * 3 = 24*2 = 5G

If it is the case, we can shard the storage by regions

8 bytes, 200M * 3 = 24 * 2 = 5G, 100, 100% = 50G

We can have one server in california, another for washington

Q: is it true for each restaurant we have 3 different entries stored in redis?

A: yes

Q: can you just use 1? Make 1,5,10 miles input parameters for querying

Interviewer and Audience Feedback

Interviewer

Overall is good

Yelp: has higher expectation

Soft skill: try to get get to the point sooner

L5 weak hire

Good communication

Reasonable drawing

Soft skill: the requirement gathering was a bit too slow. Covered some corner cases. Should get to the point, how to search with prefix

API server: login, authentication. But the answer includes some other points not related.

How does geohash work? Should cover the high level first.

Hard skill: it seems in the database we can store just one entry for each business. Just store the longest one. You can use one of the two simpler solutions

B+ tree index

GBPass tree. Hashtable.

In Redis, you can store multiple times. In the database you should just store it once.

SQL: range queries

Interviewee

Timing was too slow during requirement gathering

In actual database, we can just store each business once

The interviewer is right

There may be multiple entries in Redis in Redis.

It seems 5-10G we can store all data. We can store everything in memory. I should cover this with you, option is to use memory to accelerate or use sortable geohash

Interviewer: if we store everything in database, database can handle the QPS

Audience:

https://stackoverflow.com/questions/30182570/elasticsearch-quadprefixtree-vs-geohashprefixtree#:~:text=A%20quadtree%20can%20be%20implemented,curve%20by%20interleaving%20the%20bits.

Audience: Redis can be used as a persistent storage: https://redis.io/docs/manual/persistence/

Audience: Redis supports prefix search

Audience: soft skill

Lots of misspellings.

Can slow down when expressing ideas. Can think and communicate

Lots of feed words. Can listen more

Audience: there may be two closeby location that go into different geo bounding box

Interviewee: we can query using less granular way. We can query 9 nearby boxes

From 1 mile query, then we can use 5 mile query

Audience: it feels we need to handle some boundary conditions, to specially add other boxes

Audience: we can precompute nearby boxes.

Audience: it feels we should also consider the maximum number of restaurants. We don’t need to return 10,000 restaurants.

Interviewer: we can use this as a follow-up question.

We can either use number of restaurants. Or we use popular restaurants. Combining multiple results.

Audience: this is a business requirement.

Interviewer: additional criteria may include ethnicity/category.

Audience: can use 9 boxes, get results, then sort

Audience: do we use some size grid? Can we use smaller grid for downtown?

Audience: similar to quadtree

Interviewer: quadtree - near k business, can handle different densities.

Geohash is easier to implement.

Audience: we can put geohash into Redis. Redis natively support geohash, and can support sorting based on overlapping bits. We can shard based on city into multiple Redis shards

Audience: how do we handle different density of restaurants

Audience:

GEORADIUS key longitude latitude radius m|km|ft|mi [WITHCOORD] [WITHDIST] [WITHHASH] [ASC|DESC] [COUNT count] [STORE key] [STORedisT key]

Key may be different geography centers

Audience: quadtree, geohash. Space tradeoffs