Newsfeed

System Design, Recommendation & Feed

Topic: Newsfeed

Interviewer: Li Ning

Interviewee: Fei

Level: L5 (Senior)



Mock System Design Interview Summary

Interview Overview

Date: 5/22/2022

Target level: L5

Duration: 45 minutes

Topic covered: Design Newsfeed

Drawing tool used: excalidraw

Requirements

Functional requirements

See newsfeed

Post to newsfeed: text / image / video. Focus on text and image first

Non-requirements

Login/logout

Follow users

Non-functional requirements

Assume 100,000 users

2 tweets per day per active user => 0.2 tweets per day averaged over all users

DAU 10,000 users

Stability / reliability: 24/7 availability

10k DAU * 2 tweets = 20k tweets/day

20k tweets * 10 MB per tweet = 200 GB/day (image, text, video)

200 GB / 86,400 s ≈ 2.4 MB/s
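
The estimate above can be re-run as a short script. This is only a sketch using the numbers from the notes; the variable names are made up for illustration, and the 100-followees figure is taken from the fan-out estimate later in the notes.

```python
# Back-of-envelope numbers from the interview notes (all inputs are assumptions).
TOTAL_USERS = 100_000
DAU = 10_000                 # daily active users
TWEETS_PER_ACTIVE_USER = 2   # tweets per active user per day
MB_PER_TWEET = 10            # average payload, media included
READ_WRITE_RATIO = 4         # 80% reads / 20% writes
FOLLOWEES_PER_USER = 100     # used in the fan-out estimate later in the notes
SECONDS_PER_DAY = 86_400

tweets_per_day = DAU * TWEETS_PER_ACTIVE_USER
avg_tweets_per_user = tweets_per_day / TOTAL_USERS
ingest_mb_per_day = tweets_per_day * MB_PER_TWEET
write_mb_per_s = ingest_mb_per_day / SECONDS_PER_DAY
read_mb_per_s = write_mb_per_s * READ_WRITE_RATIO
fanout_qps = TOTAL_USERS * FOLLOWEES_PER_USER * 2 / SECONDS_PER_DAY

print(f"tweets/day:          {tweets_per_day}")
print(f"avg tweets/user/day: {avg_tweets_per_user}")
print(f"ingest:              {ingest_mb_per_day / 1000:.0f} GB/day")
print(f"write throughput:    {write_mb_per_s:.2f} MB/s")
print(f"read throughput:     {read_mb_per_s:.2f} MB/s")
print(f"fan-out deliveries:  {fanout_qps:.0f} per second")
```

The write throughput comes out a little above 2.3 MB/s and the read throughput a little above 9 MB/s, in line with the rounded 2.4 MB/s and 10 MB/s in the notes. At this order of magnitude a single database is enough.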

System Design


MySQL + NoSQL. NoSQL: image, text, video

MySQL: feed info, people’s posts, follower relationship, timeline

Write / read ratio: 20% / 80%

Reads: 2.4 MB/s * 4 ≈ 10 MB/s

QPS: 100k

Each user can read the newsfeed 5 times

Each user can follow 100 users: 100k * 100 * 2 / 86,400 ≈ 230 QPS

13:00

Q: let’s focus on text services

A: Choose MongoDB

JSON-native: text can be stored as JSON documents, which helps us support a REST API

For media files, we can use another database

Q: let’s focus on text messages first

When tweeting: the tweet service saves the content to MongoDB and also talks to the user service for auth

Read feed:

Use the friendship service to find the followers, then get their new feeds

Q: what does friendship service do?

A: It finds the followers of a user

Q: where do we save the follower relations?

A: MySQL DB.

Add MySQL DB

Q: Table design for friendship service

A:

Users: userID, userName, email, phone, password

Following: from_userID, to_userID

Tweets: saved in MongoDB
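
The two relational tables above could be sketched as follows. This is a minimal sketch, not the design from the interview: SQLite (via Python's built-in sqlite3 module) stands in for MySQL so the example runs on its own, and the column types, the composite primary key, the index, and the helper names followees/followers are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL in this sketch
conn.executescript("""
CREATE TABLE users (
    user_id   INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL,
    email     TEXT,
    phone     TEXT,
    password  TEXT            -- in practice a password hash, never plain text
);
CREATE TABLE following (
    from_user_id INTEGER NOT NULL REFERENCES users(user_id),  -- the follower
    to_user_id   INTEGER NOT NULL REFERENCES users(user_id),  -- the user being followed
    PRIMARY KEY (from_user_id, to_user_id)
);
-- Assumed index so that "who follows X?" does not scan the whole table.
CREATE INDEX idx_following_to ON following(to_user_id);
""")


def followees(conn, user_id):
    """Users that user_id follows (needed when reading a feed)."""
    rows = conn.execute(
        "SELECT to_user_id FROM following WHERE from_user_id = ? ORDER BY to_user_id",
        (user_id,),
    )
    return [row[0] for row in rows]


def followers(conn, user_id):
    """Users that follow user_id (needed when fanning out a new post)."""
    rows = conn.execute(
        "SELECT from_user_id FROM following WHERE to_user_id = ? ORDER BY from_user_id",
        (user_id,),
    )
    return [row[0] for row in rows]


conn.executemany("INSERT INTO users (user_id, user_name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO following VALUES (?, ?)", [(1, 2), (1, 3), (3, 2)])
print(followees(conn, 1), followers(conn, 2))
```

The composite primary key serves lookups by from_userID, and the extra index serves lookups by to_userID, which are the two directions the friendship service is asked about in the notes.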

De-couple services

Q: how to send a tweet? newsfeed service, friendship service

A: The user service is for creating users. The friendship service is for relations

Q: why do we need to talk to user service and newsfeed service when sending a tweet?

A: The follower pushes data into the database. The followee pulls information. The newsfeed service can act as a broker.

Q: back to basic requirements. Which services will be involved when a user publishes a tweet?

Newsfeed service -> tweet service

Q: why not talk to tweet service directly?

A: We don’t want to expose the tweet service. It’s a private service, not exposed externally

Q: how does a friend see the new tweet?

A: Brute-force solution: first check the friendship service to get all followers. Then talk to the tweet service to retrieve the tweets of the followers. The newsfeed service can aggregate them based on the timeline.

Q: How can newsfeed service get all newsfeed?

A: We can use the friendship service to look up relations, then the tweet service returns the tweets of those users
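
The brute-force read path described in the last two answers can be sketched as below. It is an in-memory illustration only: the class names follow the services named in the notes, but the methods, the Tweet fields, and the data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    author_id: int
    created_at: int  # e.g. a Unix timestamp
    text: str


class FriendshipService:
    def __init__(self, following):
        self._following = following  # user_id -> ids of the users they follow

    def followees(self, user_id):
        return self._following.get(user_id, [])


class TweetService:
    def __init__(self, tweets):
        self._tweets = list(tweets)

    def tweets_by_authors(self, author_ids):
        wanted = set(author_ids)
        return [t for t in self._tweets if t.author_id in wanted]


class NewsfeedService:
    """Coordinator: look up relations, fetch tweets, order them by time."""

    def __init__(self, friendship, tweets):
        self._friendship = friendship
        self._tweets = tweets

    def get_feed(self, user_id):
        authors = self._friendship.followees(user_id)
        found = self._tweets.tweets_by_authors(authors)
        return sorted(found, key=lambda t: t.created_at, reverse=True)


friendship = FriendshipService({1: [2, 3]})
tweet_store = TweetService([
    Tweet(10, 2, 100, "hello"),
    Tweet(11, 3, 200, "world"),
    Tweet(12, 4, 300, "from someone user 1 does not follow"),
])
feed = NewsfeedService(friendship, tweet_store).get_feed(1)
print([t.tweet_id for t in feed])
```

Every read does the full lookup and sort, which is acceptable at the low QPS estimated earlier and is the part that changes first when the load grows.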

API design:

getTweets/Id=123/pageNo=1

postTweets/getTweets/pageNo=1

{userId: [123, 124, 125]}

100 tweets * 3 = 300

Q: will you return all tweets? Which service will do pagination?

A: The newsfeed service. It passes the pagination parameters to the tweet service, which retrieves the requested page
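
A possible shape for the paginated call, as a rough sketch: the function name get_tweets_page, the dictionary fields, and the page size of 100 are assumptions, and a real tweet service would push the ordering and limit down into its database query rather than sort in memory.

```python
def get_tweets_page(tweets, user_ids, page_no, page_size=100):
    """Return one page of the newest tweets written by user_ids.

    page_no is 1-based, matching pageNo=1 in the API sketch above.
    """
    if page_no < 1 or page_size < 1:
        raise ValueError("page_no and page_size must be positive")
    wanted = set(user_ids)
    matching = sorted(
        (t for t in tweets if t["user_id"] in wanted),
        key=lambda t: t["created_at"],
        reverse=True,
    )
    start = (page_no - 1) * page_size
    return matching[start:start + page_size]


# 3 followed users with 100 tweets each, as in the 100 * 3 = 300 example.
all_tweets = [
    {"tweet_id": user_id * 1000 + i, "user_id": user_id, "created_at": i * 10 + user_id}
    for user_id in (123, 124, 125)
    for i in range(100)
]
first_page = get_tweets_page(all_tweets, [123, 124, 125], page_no=1)
print(len(first_page), first_page[0]["tweet_id"])
```

With this split the newsfeed service never has to hold all 300 tweets; it only forwards the page number and gets back at most one page.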

Q: what happens when we increase from 100,000 users to 100M users?

A: There may be bottlenecks in different services

We can separate people by sparse distribution, dense distribution, inactive/active, lazy pull

We can fan out when people create new posts: the posts are fanned out to followers

A: We can notify followers when a new post is created

Q: for 100M users, are you going to use the tweet service to retrieve the tweets?

A: We can use sharding, and for celebrities we can add a cache and serve reads directly from it

We can use an async service to persist the tweets in MongoDB; we can use cache management to handle celebrities.

Q: what does notification service do?

A: newsfeed -> tweet service -> notify all followers

Notification service monitors the people you are following

Interviewer and Audience Feedback

Interviewer:

The interviewee did not ask for the complete functional requirements and should have clarified them: users can post, users can see their own posts, and users can see the newsfeed

Too much time was spent on estimation for such a low QPS. I wasn’t expecting a lot of time to be spent here.

High-level design: the diagram could be clearer. The database design should include table names.

The most important part is discussing the tradeoff of push vs pull.

User service, validation: I didn’t care about this area. The discussion should have been more focused

Scaling the system can wait until we scale it 1000 times

Fanout: we didn’t discuss it

Interviewee:

Architecture: I was interrupted

Newsfeed service: I wanted it to be a multi-tier service and to use it as the coordinator

QPS, DAU design: I didn’t get the interviewer’s point. I thought the numbers were too low and that it was a trick question, different from a normal interview

Push vs pull: we didn’t discuss it deeply. With low QPS at the beginning, we didn’t add a cache or queue. I tried to keep things simple at the start, so I was not set up to discuss the cache.

If the load is small: we can just use the database

Increasing QPS: we can add a cache

Increasing further: we can add a queue and a cache

Should prepare for multiple tiers of load

Interviewer:

I hoped for a basic architecture at the beginning. Then we can add a message queue and a cache.

I gave a low QPS to encourage starting from the basic architecture

Then adding QPS can lead to sharding, message queue and cache

Wanted to see a progression

===

What’s the best way to drive?

If we notice the QPS is very low, we can quickly calculate the order of magnitude

If we think the calculation is important, we can do it and refer back to it later.

We can design with a single machine.

High QPS => distributed system

If we have very low QPS, we can just retrieve from the database

If we have high QPS, then we improve the design

Push vs pull depends on the case: the business need drives the technical solution.

What are the key dimensions?

Normal user vs celebrity

Post service to create post

ViewPost: view my own post

With a low number of users, we can reactively query the users being followed, then retrieve the feed from the database

With a high number of users

we can proactively compute the feed when a post is created.

Celebrity: too many followers. Add a flag in Postcache marking whether the author is a celebrity. No fanout for celebrities

Combine the pre-created feed and celebrity posts
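
The push/pull split above could look roughly like this. It is an in-memory sketch only: the HybridFeed class, its method names, and the tuple layout of a post are hypothetical, and the celebrity flag is modeled as a plain set rather than a field in Postcache.

```python
from collections import defaultdict


class HybridFeed:
    """Push posts of normal users at write time; pull celebrity posts at read time."""

    def __init__(self, followers, celebrities):
        self.followers = followers            # author_id -> list of follower ids
        self.celebrities = set(celebrities)   # stands in for the celebrity flag
        self.precomputed = defaultdict(list)  # user_id -> pre-created feed
        self.celebrity_posts = defaultdict(list)  # author_id -> their own posts

    def publish(self, author_id, created_at, text):
        post = (created_at, author_id, text)
        if author_id in self.celebrities:
            # No fanout: store once, followers pull it when they read.
            self.celebrity_posts[author_id].append(post)
        else:
            # Fanout on write: copy the post into every follower's feed.
            for follower_id in self.followers.get(author_id, []):
                self.precomputed[follower_id].append(post)

    def read(self, user_id, following):
        posts = list(self.precomputed.get(user_id, []))
        for author_id in following:
            if author_id in self.celebrities:
                posts.extend(self.celebrity_posts.get(author_id, []))
        return sorted(posts, reverse=True)  # newest first


feed = HybridFeed(followers={1: [10, 11], 99: [10]}, celebrities={99})
feed.publish(1, 5, "hi from a normal user")
feed.publish(99, 7, "big news from a celebrity")
print(feed.read(10, following=[1, 99]))
```

Publishing as user 99 touches a single list no matter how many followers exist, which is the point of skipping fanout for celebrities; the cost moves to the merge at read time.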

Notification service:

can push new posts to offline users

When a celebrity sends a post, the user receives a notification

Post service -> queue -> newsfeed service

Batch job to fanout

Fanout only to active user. We need to detect which users are active users. We can build a user cache to mark if they are active or not.
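
The queue-plus-batch idea can be sketched as follows. A deque stands in for a real message queue and a set stands in for the user cache that marks active users; the function names and the batch size are assumptions.

```python
from collections import defaultdict, deque


def enqueue_post(queue, author_id, post_id):
    """Post service side: hand the new post to the queue and return immediately."""
    queue.append((author_id, post_id))


def fanout_batch(queue, followers, active_users, feeds, batch_size=100):
    """Batch job side: drain up to batch_size posts and push them to active followers.

    Inactive followers are skipped; their feed would be built lazily when they return.
    Returns the number of feed entries written.
    """
    delivered = 0
    for _ in range(min(batch_size, len(queue))):
        author_id, post_id = queue.popleft()
        for follower_id in followers.get(author_id, ()):
            if follower_id in active_users:
                feeds[follower_id].append(post_id)
                delivered += 1
    return delivered


post_queue = deque()
feeds = defaultdict(list)                 # user_id -> list of post ids
followers = {1: [10, 11, 12]}
active_users = {10, 12}                   # stand-in for the active-user cache

enqueue_post(post_queue, author_id=1, post_id=500)
delivered = fanout_batch(post_queue, followers, active_users, feeds)
print(delivered, dict(feeds))
```

Because the post service only enqueues, publishing stays fast even when the fanout itself is slow, and the active-user filter keeps the batch job from writing feeds nobody will read.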

Major event: many celebrities all post at once

Everyone tries to poll at the same time, so we need rate limiting
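
One common way to rate limit is a token bucket. The notes do not say which algorithm was meant, so this is just one option, sketched under that assumption; the TokenBucket class and its parameters are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full
        self.last = now

    def allow(self, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per user: at most 2 feed polls in a burst, refilled at 1 per second.
bucket = TokenBucket(rate=1, capacity=2)
results = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
print(results)
```

In a real deployment `now` would come from a monotonic clock, and rejected polls would get an error or a slightly stale cached feed.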

Newsfeed cache: can be regenerated if it goes down.

We may need to reconstruct the cache from the database

Or we can persist the result into the database

Should we cover the pros and cons of different DBs and different caches? Vendor, SLA
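
Rebuilding the cache from the database could follow a cache-aside pattern, sketched below. The FeedCache class and the load_from_db callback are hypothetical; the callback stands for whatever query reconstructs a user's feed.

```python
class FeedCache:
    """Cache-aside: serve from memory, rebuild an entry from the database on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # callable: user_id -> list of posts
        self._cache = {}
        self.rebuilds = 0                  # counts how often the database was hit

    def get(self, user_id):
        if user_id not in self._cache:
            self._cache[user_id] = list(self._load_from_db(user_id))
            self.rebuilds += 1
        return self._cache[user_id]

    def drop_all(self):
        """Simulate the cache going down and losing everything."""
        self._cache.clear()


database = {1: ["post-a", "post-b"]}      # stand-in for the persisted feed data
cache = FeedCache(lambda user_id: database.get(user_id, []))

first = cache.get(1)     # miss: loaded from the database
second = cache.get(1)    # hit: served from memory
cache.drop_all()
third = cache.get(1)     # miss again: regenerated after the outage
print(first, cache.rebuilds)
```

Entries are rebuilt lazily, one user at a time, so a restart does not require recomputing every feed up front; the tradeoff is a burst of database reads right after the cache comes back.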

Newsfeed ranking service:

Insert some ads, or give extra weight to posts from some people.

https://whimsical.com/newsfeed-Kgu7U3wpGYiYbwit8AYTVj