Customer Review System

Topic: Customer Review System

Interviewer: Jin📷🏂🍛🏃🍹.C （Jin_Jin_Jin_424）

Interviewee: 乔磊

Level: L5 (Senior)

Signup sheet:

QRCode:

Mock System Design Interview Summary

Interview Overview

Date: 8/28/2022

Target level: L5

Duration: 45 minutes

Topic covered: Customer review for food delivery

Drawing tool used:

Requirements

Functional requirements

Customer review for food delivery system

Need to show average rating for each restaurant

Accept reviews

Text reviews and ratings (image later)

Provide review, and get reviews

No approval needed

Scaling requirements

Need average ratings.

DAU : 10 M(100 M) (1M restaurant, 50M food items) 1000 M reviews

10M / 80,000 = 100 QPS

10QPS (write) -> 40Gbytes of reviews per day

Q: Assumption 10% QPS are reviews. How about rating?

A: add functional requirement: how to get rating per food item

We assume the query can include the ratings for the items

System Design

High level system:

Separate read and write part of services.

NoSQL: Use eventual consistency Capacity and partition.

Cassandra: eventual consistency, scalable (consistent hashing)

Q: why not MySQL?

A: Not needed to have ACID

Need to self-manage sharding of database

Cassandra it’s easy to scale to add more partitions

Q: 100 QPS can be handled by MySQL. Why choosing NoSQL over MySQL

A: sacrifice consistency to improve availability and partition.

Benefit for NoSQL: you don’t worry about the future

If we need to increase by 100x, NoSQL is more scalable.

We don’t need a join or transaction.

Schema:

Comments_by_item

itemId (uuid)

commentId (timeuuid)

User_id

user_name

Comment (text)

Rating (0-10)

Picture1_location

creation_time

Interviewee: should we support comment hierarchy?

Interviewer: we don’t need to support replies in comments

Item and rating tables may be managed by a different team.

Rating_by_item

Item_id

Total_score

Total_count

Computing average score periodically (say 10-30 minutes)

Can leave the unsaved information in redis

Q: why not update the value directly?

A: we can reduce speed for updating the average

Q: what if we have a popular dish, do we need to return all reviews?

A: We should add pagination from the start.

List comes with pagination feature with limit and offset

Q: how do we show 3 top popular reviews?

A: we need to change the database design

We can support thumb up thumb down to understand which comment is most helpful.

Alternative: we can automatically detect based on content, length, word cloud

Interviewer: let’s use thumb up thumb down

Interviewee: we can use thumb up count - thumb down count. We can list review based on creation time or popularity

Interviewer: how do we choose top 3 reviews? Do we need to process all reviews for an item?

Interviewee: I probably want to add a flag for promoted comments. We use a backend system to scan comments.

Interviewer: walk through a scenario where upvote of an item goes up?

Example:

100 upvote

50 upvote

15 upvote

45 upvote

Keep a threshold on the item

threshold_upvote in item table.

External APIs

System design

Interviewer and Audience Feedback

Interviewer:

Soft skill

Ok soft skill

Each time there are multiple options. It is a good way to go

Improve:

QPS - too long. Just basic is fine. QPS is low.

Missing API design. Ideally design

Hard skill

First 30 minutes: rating not designed.

30 minutes - 45 minutes: I care most about upvote

Interviewee:

Time control was not ideal

I wasn’t able to get the keypoint for the last question, e.g. upvote, downvote.

For example if we just need to update the top 3 comments, and if I understand it sooner, I wouldn’t have spent too much time on this.

API design could have cost too much time.

Interviewer:

Can I ask “We have 5 APIs, should I write them down?”

1000 reviews: should I return all of it?

The most difficult is to figure out what the interviewer wants to test

How to handle new requirement

===

Sql vs nosql

top3

Rating

Audience feedback

Audience: Do we need to add user and restaurant information?

Interviewee: we shouldn’t. We can assume we are only responsible for rating/review part. Limit the scope

Audience: Data modeling. If we include information from other tables, we can denormalize information, e.g. add user/restaurant information into the review

Interviewee: when a user reads a review, they are on a new page. There is no need to include restaurant information together.

Audience: there is a danger for denormalization. If we update the user information, then we need to propagate to other tables. Comments may need their own table.

Audience: should ask read operation

Interviewer: 10M DAU, 100M total user

10QPS per write

After the user finishes an order, they may read a lot of reviews.

They may read the review for many dishes

One user orders 1 dish, but may read many dishes. Reading reviews may be 20 times per user.

QPS:

80,000 seconds. Is not whole day, 2 hours lunch, 2 hours dinner

Should use peak time of 4 hours

Interviewee: there are other possible features, such as

searching the review content with keywords

Serving images

This appears to be product feature design.

Audience:

Why did you insist on nosql? User generated content, fraud detection. There may be data pipelines and heuristics.

Returning comment: is a ranking algorithm. Popularity, relevance, time. When there is ranking when you query, it may come from elastic search. As long as you can store it, it doesn’t matter. Elastic search.

Slice and dice. User, ethnicity. SQL can more easily support these.

Interviewee: I asked if we need machine / person review. But the interviewer mentioned it’s out of scope.

Audience: people may have phishing attack.

Audience: read QPS is very high. mySQL may not be able to handle. Cache, cassandra.

Audience: most important is ranking. Likely you will use elastic search. You can still use mySQL as source of truth, but use elastic search to serve derived results.

Near real time average rating: likely we will use elastic search

Interviewer:

SQL vs NoSQL. Read, write are very low. 10 QPS is low. 100 QPS still easy to handle

future proof is not a very strong reason.

popular/unpopular, interviewee proposed upvote vs machine detected.

Interviewee:

I find it most difficult to choose SQL vs NoSQL.

In an ambiguous situation, it feels NoSQL is fine due to the flexible schema.

Audience:

We usually need to support follow up comments

Usually we should answer relational databases. Query, Join, are easy to implement. For extensible, we can change to NoSQL later

Interviewer:

SQL vs NoSQL tradeoff: consistency, big QPS, CAP

Audience:

Depends on access pattern: B+ tree vs LSM (log structured merge tree)

NoSQL: schema is very flexible

DynamoDB: very flexible. JSON string

MongoDB: JSON string

Cassandra: requires column and row