Customer Review System
Topic: Customer Review System
Interviewer: Jin📷🏂🍛🏃🍹.C (Jin_Jin_Jin_424)
Interviewee: 乔磊
Level: L5 (Senior)
Signup sheet:
QRCode:
Mock System Design Interview Summary
Interview Overview
Date: 8/28/2022
Target level: L5
Duration: 45 minutes
Topic covered: Customer review for food delivery
Drawing tool used:
Requirements
Functional requirements
Customer review for food delivery system
Need to show average rating for each restaurant
Accept reviews
Text reviews and ratings (image later)
Provide review, and get reviews
No approval needed
Scaling requirements
Need average ratings.
DAU : 10 M(100 M) (1M restaurant, 50M food items) 1000 M reviews
10M / 80,000 = 100 QPS
10QPS (write) -> 40Gbytes of reviews per day
Q: Assumption 10% QPS are reviews. How about rating?
A: add functional requirement: how to get rating per food item
We assume the query can include the ratings for the items
System Design
High level system:
Separate read and write part of services.
NoSQL: Use eventual consistency Capacity and partition.
Cassandra: eventual consistency, scalable (consistent hashing)
Q: why not MySQL?
A: Not needed to have ACID
Need to self-manage sharding of database
Cassandra it’s easy to scale to add more partitions
Q: 100 QPS can be handled by MySQL. Why choosing NoSQL over MySQL
A: sacrifice consistency to improve availability and partition.
Benefit for NoSQL: you don’t worry about the future
If we need to increase by 100x, NoSQL is more scalable.
We don’t need a join or transaction.
Schema:
Comments_by_item
itemId (uuid)
commentId (timeuuid)
User_id
user_name
Comment (text)
Rating (0-10)
Picture1_location
creation_time
Interviewee: should we support comment hierarchy?
Interviewer: we don’t need to support replies in comments
Item and rating tables may be managed by a different team.
Rating_by_item
Item_id
Total_score
Total_count
Computing average score periodically (say 10-30 minutes)
Can leave the unsaved information in redis
Q: why not update the value directly?
A: we can reduce speed for updating the average
Q: what if we have a popular dish, do we need to return all reviews?
A: We should add pagination from the start.
List comes with pagination feature with limit and offset
Q: how do we show 3 top popular reviews?
A: we need to change the database design
We can support thumb up thumb down to understand which comment is most helpful.
Alternative: we can automatically detect based on content, length, word cloud
Interviewer: let’s use thumb up thumb down
Interviewee: we can use thumb up count - thumb down count. We can list review based on creation time or popularity
Interviewer: how do we choose top 3 reviews? Do we need to process all reviews for an item?
Interviewee: I probably want to add a flag for promoted comments. We use a backend system to scan comments.
Interviewer: walk through a scenario where upvote of an item goes up?
Example:
100 upvote
50 upvote
15 upvote
45 upvote
Keep a threshold on the item
threshold_upvote in item table.
External APIs
System design
Interviewer and Audience Feedback
Interviewer:
Soft skill
Ok soft skill
Each time there are multiple options. It is a good way to go
Improve:
QPS - too long. Just basic is fine. QPS is low.
Missing API design. Ideally design
Hard skill
First 30 minutes: rating not designed.
30 minutes - 45 minutes: I care most about upvote
Interviewee:
Time control was not ideal
I wasn’t able to get the keypoint for the last question, e.g. upvote, downvote.
For example if we just need to update the top 3 comments, and if I understand it sooner, I wouldn’t have spent too much time on this.
API design could have cost too much time.
Interviewer:
Can I ask “We have 5 APIs, should I write them down?”
1000 reviews: should I return all of it?
The most difficult is to figure out what the interviewer wants to test
How to handle new requirement
===
Sql vs nosql
top3
Rating
Audience feedback
Audience: Do we need to add user and restaurant information?
Interviewee: we shouldn’t. We can assume we are only responsible for rating/review part. Limit the scope
Audience: Data modeling. If we include information from other tables, we can denormalize information, e.g. add user/restaurant information into the review
Interviewee: when a user reads a review, they are on a new page. There is no need to include restaurant information together.
Audience: there is a danger for denormalization. If we update the user information, then we need to propagate to other tables. Comments may need their own table.
Audience: should ask read operation
Interviewer: 10M DAU, 100M total user
10QPS per write
After the user finishes an order, they may read a lot of reviews.
They may read the review for many dishes
One user orders 1 dish, but may read many dishes. Reading reviews may be 20 times per user.
QPS:
80,000 seconds. Is not whole day, 2 hours lunch, 2 hours dinner
Should use peak time of 4 hours
Interviewee: there are other possible features, such as
searching the review content with keywords
Serving images
This appears to be product feature design.
Audience:
Why did you insist on nosql? User generated content, fraud detection. There may be data pipelines and heuristics.
Returning comment: is a ranking algorithm. Popularity, relevance, time. When there is ranking when you query, it may come from elastic search. As long as you can store it, it doesn’t matter. Elastic search.
Slice and dice. User, ethnicity. SQL can more easily support these.
Interviewee: I asked if we need machine / person review. But the interviewer mentioned it’s out of scope.
Audience: people may have phishing attack.
Audience: read QPS is very high. mySQL may not be able to handle. Cache, cassandra.
Audience: most important is ranking. Likely you will use elastic search. You can still use mySQL as source of truth, but use elastic search to serve derived results.
Near real time average rating: likely we will use elastic search
Interviewer:
SQL vs NoSQL. Read, write are very low. 10 QPS is low. 100 QPS still easy to handle
future proof is not a very strong reason.
popular/unpopular, interviewee proposed upvote vs machine detected.
Interviewee:
I find it most difficult to choose SQL vs NoSQL.
In an ambiguous situation, it feels NoSQL is fine due to the flexible schema.
Audience:
We usually need to support follow up comments
Usually we should answer relational databases. Query, Join, are easy to implement. For extensible, we can change to NoSQL later
Interviewer:
SQL vs NoSQL tradeoff: consistency, big QPS, CAP
Audience:
Depends on access pattern: B+ tree vs LSM (log structured merge tree)
NoSQL: schema is very flexible
DynamoDB: very flexible. JSON string
MongoDB: JSON string
Cassandra: requires column and row