Design YouTube
Topic: Design YouTube
Interviewer: video from 12-5-2021
Interviewee: system
Level: L4 (Experienced Individual Contributor)
System Design Interview
Interview Notes:
Requirements:
Focus on upload and viewing
Multiple device types
View the video, thumbnail, preview (when hovering over the video)
Support like feature
Non-functional:
Do not lose the video
Watch video fluently
High availability
Reduce network cost
Constraints:
100M daily active users
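5 videos viewed per person per day
View:
QPS: 5 * 100 million / 86,400 s (~0.1 million s) = 5 * 1,000 = ~5,000/s
Upload (upload:download = 1:200):
QPS: 5,000 / 200 = 25/s
300 MB/video
Upload bandwidth: 25 * 300 MB = 7,500 MB/s, roughly 7 GB/s
Storage: roughly 683 TB/day (7 GB/s * ~0.1 million s / 1,024)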
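The arithmetic above can be reproduced with a short script. This is only a sketch: the function name `estimate` and its parameters are made up for illustration, and it assumes decimal units (1 GB = 1,000 MB). With unrounded inputs the results come out somewhat higher than the rounded figures in the notes (about 750 TB/day rather than 683 TB/day).

```python
SECONDS_PER_DAY = 86_400


def estimate(dau, views_per_user, views_per_upload, video_mb,
             seconds=SECONDS_PER_DAY):
    """Return rough load figures for a video service (hypothetical helper)."""
    view_qps = dau * views_per_user / seconds
    upload_qps = view_qps / views_per_upload
    upload_gb_per_s = upload_qps * video_mb / 1000         # decimal GB
    storage_tb_per_day = upload_gb_per_s * seconds / 1000  # decimal TB
    return {
        "view_qps": view_qps,
        "upload_qps": upload_qps,
        "upload_gb_per_s": upload_gb_per_s,
        "storage_tb_per_day": storage_tb_per_day,
    }


# Exact seconds per day.
exact = estimate(100_000_000, 5, 200, 300)

# Same inputs, but with a day rounded to 100,000 seconds as in the notes.
rounded = estimate(100_000_000, 5, 200, 300, seconds=100_000)

for name in exact:
    print(f"{name}: exact={exact[name]:,.1f} rounded={rounded[name]:,.1f}")
```

Rounding a day to 100,000 seconds lowers the per-second figures by about 14%, which is fine for an interview estimate as long as the assumption is stated.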
External API:
getVideo(video_id, user_id, offset) -> URL
uploadVideo(video_id, user_id, description, length, tags[], video_content)
Q: Why separate metadata server and upload service?
A: Because the upload takes a long time, we don’t need to block the metadata service. We can first contact the metadata service, which returns a signed URL. Then the client contacts the upload service for uploading.
Q: Why is the encode service before the upload service?
A: The client first uploads the video to the original storage.
Q: Why do we need the original service?
A: It supports different video encodings.
Q: Do we ask the metadata service for the URL to upload to?
A: Yes.
Need to change the APIs
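One way the revised flow could look: the metadata service issues a signed, expiring URL, and the upload service checks the signature before accepting the file. The sketch below uses an HMAC over the request fields. The secret, host name, function names, and URL layout are all assumptions made for illustration; they are not part of the interview design, and a managed object store would normally provide its own pre-signed URL mechanism.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"                 # hypothetical shared secret
UPLOAD_HOST = "https://upload.example.com"  # hypothetical upload service host


def _sign(video_id, user_id, expires):
    """HMAC-SHA256 over the fields the upload service must trust."""
    message = f"{video_id}:{user_id}:{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def create_upload_url(video_id, user_id, ttl_seconds=900, now=None):
    """Metadata service side: hand back a signed URL that expires."""
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    query = urlencode({
        "video_id": video_id,
        "user_id": user_id,
        "expires": expires,
        "sig": _sign(video_id, user_id, expires),
    })
    return f"{UPLOAD_HOST}/upload?{query}"


def verify_upload_url(url, now=None):
    """Upload service side: accept only unexpired, untampered URLs."""
    now = int(time.time()) if now is None else now
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    try:
        expires = int(params["expires"])
        expected = _sign(params["video_id"], params["user_id"], expires)
        supplied = params["sig"]
    except (KeyError, ValueError):
        return False
    same = hmac.compare_digest(expected.encode(), supplied.encode())
    return same and now <= expires


url = create_upload_url("video-1", "user-1", now=1_000)
print(verify_upload_url(url, now=1_000))
```

With this split, the metadata service only handles a small request, and the long-running upload goes to the upload service without blocking it.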
Interviewee: can we just use an existing solution or do we need to build our own distributed file system?
Interviewer: why?
Interviewee: it’s simpler to use and maintain than building our own.
Add a message queue to trigger the encode service
Q: Why do you use a message queue?
A: Because encoding and uploading are time-consuming. It is more efficient to do them asynchronously.
Q: What’s the benefit?
A: We don’t need to wait.
Q: Why do you choose S3 rather than another type of storage?
A: Because we have a lot of videos to store.
Q: Other databases also support large storage.
A: We can also choose Azure or Google Cloud.
We also need to update the status of the video
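The queue-plus-status idea can be sketched with Python's standard `queue` and `threading` modules standing in for a real message queue and encode service. The status names ("uploaded", "encoding", "ready", "failed"), the `encode` placeholder, and the in-memory `video_status` dict are assumptions for illustration only.

```python
import queue
import threading

video_status = {}             # stands in for the metadata store
encode_queue = queue.Queue()  # stands in for the message queue
STOP = None                   # sentinel that tells the worker to exit


def on_upload_complete(video_id):
    """Upload path: record the status and enqueue, without waiting for encoding."""
    video_status[video_id] = "uploaded"
    encode_queue.put(video_id)


def encode(video_id):
    """Placeholder for the real transcoding work (hypothetical)."""
    return [f"{video_id}_{res}" for res in ("360p", "720p", "1080p")]


def encode_worker():
    """Encode service: consume messages and update the video status."""
    while True:
        video_id = encode_queue.get()
        try:
            if video_id is STOP:
                return
            video_status[video_id] = "encoding"
            encode(video_id)
            video_status[video_id] = "ready"
        except Exception:
            video_status[video_id] = "failed"
        finally:
            encode_queue.task_done()


worker = threading.Thread(target=encode_worker, daemon=True)
worker.start()

on_upload_complete("video-1")  # returns immediately
encode_queue.join()            # demo only: wait so the final status is visible
encode_queue.put(STOP)
worker.join()
print(video_status)
```

The upload path returns as soon as the message is enqueued; clients can poll the status to learn when the video becomes viewable.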
Design download
Download bandwidth is very high
The client can go directly to S3 to get the video
We need to add a CDN to cache popular videos
[may need a metadata service to resolve video IDs to CDN URLs]
The CDN caches popular videos and falls back to S3 on a cache miss
We can use a recommender/top-K system to determine which popular videos to cache in the CDN
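A minimal sketch of the cache-with-fallback behaviour, assuming a simple view counter as the top-K signal. `Origin` stands in for S3 and `EdgeCache` for a CDN node; both classes and their methods are hypothetical and only illustrate the read path, not how a real CDN is configured.

```python
from collections import Counter


class Origin:
    """Stands in for S3: the authoritative copy of every video."""

    def __init__(self, objects):
        self.objects = dict(objects)
        self.reads = 0

    def get(self, video_id):
        self.reads += 1
        return self.objects[video_id]


class EdgeCache:
    """Stands in for a CDN node that holds only popular videos."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}
        self.views = Counter()

    def get(self, video_id):
        self.views[video_id] += 1
        if video_id in self.store:
            return self.store[video_id]   # cache hit
        return self.origin.get(video_id)  # cache miss: fall back to origin

    def refresh(self, k):
        """Keep exactly the k most-viewed videos in the cache."""
        new_store = {}
        for video_id, _count in self.views.most_common(k):
            if video_id in self.store:
                new_store[video_id] = self.store[video_id]
            else:
                new_store[video_id] = self.origin.get(video_id)
        self.store = new_store


origin = Origin({"a": b"video-a", "b": b"video-b", "c": b"video-c"})
cdn = EdgeCache(origin)
for video_id in ["a", "a", "a", "b"]:
    cdn.get(video_id)
cdn.refresh(k=1)  # "a" is the most viewed, so it gets cached
cdn.get("a")      # served from the cache, no origin read
print(sorted(cdn.store), origin.reads)
```

In practice the popularity signal would come from a separate top-K or recommender pipeline rather than from counters kept on the cache node itself.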
Q: Any other optimization?
A: Encode into different resolutions.
Q: How do we design the comment feature?
A:
External API:
comment(video_id, user_id, content, comment_id)
like(video_id, comment_id)
Interviewee: Define database schema?
Interviewer: cover the comment service first
We can save comments in a NoSQL database
Q: Why NoSQL for the comment database?
A: The schema is straightforward: many comments per video. The data volume is large, and there is no need to join tables.
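To illustrate why no joins are needed, here is a sketch of a comment record keyed by `video_id`, with an in-memory dictionary standing in for the NoSQL store. The field names, the `CommentStore` class, and the choice of `video_id` as the partition key are assumptions, not something fixed in the interview.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Comment:
    comment_id: str
    video_id: str     # assumed partition key: all comments of a video live together
    user_id: str
    content: str
    created_at: float
    like_count: int = 0


class CommentStore:
    """In-memory stand-in for a NoSQL table partitioned by video_id."""

    def __init__(self):
        self._by_video = defaultdict(dict)  # video_id -> {comment_id: Comment}

    def put(self, comment):
        self._by_video[comment.video_id][comment.comment_id] = comment

    def like(self, video_id, comment_id):
        self._by_video[video_id][comment_id].like_count += 1

    def list_for_video(self, video_id, limit=20):
        """Newest comments first; a single-partition read, no join."""
        comments = self._by_video.get(video_id, {}).values()
        ordered = sorted(comments, key=lambda c: c.created_at, reverse=True)
        return ordered[:limit]


store = CommentStore()
store.put(Comment("c1", "video-1", "user-1", "first!", created_at=1.0))
store.put(Comment("c2", "video-1", "user-2", "nice video", created_at=2.0))
store.like("video-1", "c1")
print([c.comment_id for c in store.list_for_video("video-1")])
```

Every read and write names the `video_id`, so each request touches a single partition.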
Add a cache for popular comments
We can store the comments in Redis and persist them to the database every 5 minutes
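The write-back pattern described here can be sketched with plain dictionaries standing in for Redis and the database. The `WriteBackCache` class, its method names, and the choice to check the flush interval on each write (rather than from a background job) are assumptions made to keep the example short.

```python
class WriteBackCache:
    """Buffers writes in memory and persists them in batches."""

    def __init__(self, database, flush_interval=300.0, now=0.0):
        self.database = database          # dict standing in for the database
        self.pending = {}                 # dict standing in for Redis
        self.flush_interval = flush_interval
        self.last_flush = now

    def write(self, comment_id, content, now):
        """Accept the write immediately; persist once the interval has passed."""
        self.pending[comment_id] = content
        if now - self.last_flush >= self.flush_interval:
            self.flush(now)

    def read(self, comment_id):
        """Serve unflushed data first, then fall back to the database."""
        if comment_id in self.pending:
            return self.pending[comment_id]
        return self.database.get(comment_id)

    def flush(self, now):
        self.database.update(self.pending)
        self.pending.clear()
        self.last_flush = now


database = {}
cache = WriteBackCache(database)               # flush every 300 s (5 minutes)
cache.write("c1", "first!", now=10.0)          # buffered only
cache.write("c2", "nice video", now=310.0)     # interval elapsed: both persisted
print(database)
```

Anything still in `pending` is lost if the cache node fails before the next flush, so the flush interval bounds how much data is at risk.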
Q: If the user hits the like button, this action is important to the user
A: Do we need to make it accurate?
We can add a standby Redis instance