Design YouTube (used)
Topic: Design YouTube (used)
Interviewer: peijin
Interviewee: sinco
Level: L5 (Senior)
Topic
Mock System Design Interview Summary
Interview Overview
Date: 12/5/2021
Target level: L5 (senior)
Duration: 1 hour
Topic covered: Design YouTube
Drawing tool used: Whimsical
Requirements
Functional requirements
View videos and thumbnails
Upload videos
Search for videos
Popular videos and comments on videos
Non-functional requirements
Durability: uploaded videos must never be lost
Smooth playback (no stuttering)
High availability
Reduce network cost
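Constraints:
DAU: 100 million
5 videos viewed per user per day
Assumption: 300 MB per video
View (download):
QPS: 5 × 100 million / 86,400 s ≈ 5,800 views per second (rounded to ~5,000 for the estimates below)
Upload:
Upload-to-view ratio: 1:200
QPS: 5,000 / 200 = 25 uploads per second
Upload bandwidth: 25 × 300 MB = 7.5 GB/s (~7 GB/s)
Download bandwidth: 7 GB/s × 200 = 1,400 GB/s ≈ 1.4 TB/s
Storage: 100 million × 5 / 200 = 2.5 million uploads per day × 300 MB ≈ 750 TB per day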
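The same estimate can be reproduced from daily totals with a short script. This is a sketch assuming decimal units (1 GB = 1,000 MB) and that every view transfers the full 300 MB video; because it divides by the exact 86,400 s instead of rounding, its per-second figures come out somewhat higher than the rounded ones above.

```python
# Back-of-envelope capacity estimate computed from daily totals.
# Assumptions: decimal units (1 GB = 1,000 MB, 1 TB = 1,000,000 MB) and
# every view downloads the full video exactly once.

SECONDS_PER_DAY = 86_400


def estimate(dau: int, views_per_user: int, mb_per_video: int, views_per_upload: int) -> dict:
    views_per_day = dau * views_per_user
    uploads_per_day = views_per_day / views_per_upload
    return {
        "view_qps": views_per_day / SECONDS_PER_DAY,
        "upload_qps": uploads_per_day / SECONDS_PER_DAY,
        # MB per day -> GB per second
        "upload_gb_per_s": uploads_per_day * mb_per_video / 1_000 / SECONDS_PER_DAY,
        "download_gb_per_s": views_per_day * mb_per_video / 1_000 / SECONDS_PER_DAY,
        # MB per day -> TB per day
        "storage_tb_per_day": uploads_per_day * mb_per_video / 1_000_000,
    }


if __name__ == "__main__":
    for name, value in estimate(100_000_000, 5, 300, 200).items():
        print(f"{name}: {value:,.1f}")
```

With the interview's inputs this reports roughly 5,787 views/s, 29 uploads/s, 8.7 GB/s upload, 1,736 GB/s download and 750 TB/day of new storage.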
System Design
External APIs
getVideo(video_id, user_id, offset)
Offset is needed because a large video is split into small chunks
uploadVideo(video_id, user_id, description, length, tags[], video_content)
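To illustrate why getVideo takes an offset, here is a minimal sketch of how a server could map a byte offset to a chunk. It assumes fixed-size chunks; the 4 MiB size and the function names are hypothetical choices, not part of the design above.

```python
# Map a byte offset from getVideo(video_id, user_id, offset) to a chunk.
# Assumption: every chunk except possibly the last has the same fixed size.

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB, an arbitrary illustrative choice


def locate(offset: int, chunk_size: int = CHUNK_SIZE) -> tuple[int, int]:
    """Return (chunk_index, offset_within_chunk) for a byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, chunk_size)


def chunks_for_range(start: int, length: int, chunk_size: int = CHUNK_SIZE) -> range:
    """Return the indices of all chunks that overlap [start, start + length)."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length > 0")
    first = start // chunk_size
    last = (start + length - 1) // chunk_size
    return range(first, last + 1)
```

For example, `locate(10 * 1024 * 1024)` returns `(2, 2097152)`: byte 10 MiB lies 2 MiB into chunk 2.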
System design diagram
Upload flow:
Revised upload API (metadata only; the client uploads the video content directly to storage through the returned URL)
uploadVideo(video_id, user_id, description, length, tags[])
-> returns { presigned-url }
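A presigned URL lets the client upload straight to object storage without holding storage credentials. The sketch below shows the idea with a simplified HMAC scheme; it is NOT the real AWS Signature Version 4 format, and the host, key layout, secret and function names are all hypothetical.

```python
# Simplified presigned-URL scheme: the upload/metadata service signs
# (method, key, expiry) with a secret shared with the storage tier, and the
# storage tier verifies the signature before accepting the client's PUT.
# Hypothetical names throughout; not S3's actual signing format.
import hashlib
import hmac
from urllib.parse import parse_qs, quote, unquote, urlencode, urlsplit

SECRET = b"hypothetical-shared-secret"


def _sign(key: str, expires_at: int, secret: bytes) -> str:
    payload = f"PUT\n{key}\n{expires_at}".encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def make_presigned_url(host: str, key: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Issued by the service after uploadVideo(...) creates the metadata record."""
    query = urlencode({"expires": expires_at, "signature": _sign(key, expires_at, secret)})
    return f"https://{host}/{quote(key)}?{query}"


def verify_presigned_url(url: str, now: int, secret: bytes = SECRET) -> bool:
    """Checked by the storage tier before it accepts the upload."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    key = unquote(parts.path.lstrip("/"))
    expires_at = int(params["expires"][0])
    signature = params["signature"][0]
    if now > expires_at:
        return False  # link expired
    return hmac.compare_digest(_sign(key, expires_at, secret), signature)
```

Because the signature covers the object key and expiry, a client cannot reuse the URL for a different object or after it expires.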
Q: Should we use Google Cloud Storage / AWS S3 instead of maintaining our own distributed file system?
A: What are the tradeoffs of a pre-built solution vs a self-maintained one?
A: Using a pre-built solution reduces maintenance.
Add a message queue to trigger the encoding service
Q from interviewer: Why a message queue?
A from interviewee: Encoding (and uploading) is time-consuming, so the upload service should hand encoding off asynchronously instead of waiting for it.
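The decoupling can be sketched in-process with Python's `queue.Queue` and a worker thread standing in for a real message broker. This is a minimal illustration under that assumption; the function names are hypothetical and the transcode step is a placeholder.

```python
# Decoupling upload from encoding with a queue (in-process stand-in for a real
# message broker). The upload service enqueues a job and returns immediately;
# a separate worker drains the queue at its own pace.
import queue
import threading

STOP = None  # sentinel that tells the worker to exit


def upload_service(jobs: queue.Queue, video_id: str) -> str:
    # Storing the raw file is omitted; enqueue the encode job and return.
    jobs.put({"video_id": video_id})
    return "UPLOADED"


def encoding_worker(jobs: queue.Queue, encoded: list) -> None:
    while True:
        job = jobs.get()
        try:
            if job is STOP:
                return
            encoded.append(job["video_id"])  # placeholder for the slow transcode
        finally:
            jobs.task_done()


def run_demo(n: int) -> tuple[list, list]:
    jobs: queue.Queue = queue.Queue()
    encoded: list = []
    worker = threading.Thread(target=encoding_worker, args=(jobs, encoded))
    worker.start()
    statuses = [upload_service(jobs, f"v{i}") for i in range(n)]
    jobs.put(STOP)
    worker.join()
    return statuses, encoded
```

The upload calls return as soon as the job is enqueued; the encoding latency is paid by the worker, not by the user waiting on the upload.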
Q from interviewer: Why use S3 and not other types of storage?
A from interviewee: Because we need a large amount of storage, and S3 scales almost without limit.
Q from interviewer: Why not another database?
A from interviewee: We could also use Google Cloud or Azure.
Record the status of each intermediate processing step
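One way to record processing status is an explicit state machine per video, so invalid jumps (for example, serving a video that never finished encoding) are rejected. The states and transitions below are a hypothetical sketch, not the design drawn in the interview.

```python
# Hypothetical processing states for a video, with the allowed transitions.
from enum import Enum


class Status(Enum):
    UPLOADING = "uploading"
    UPLOADED = "uploaded"
    ENCODING = "encoding"
    READY = "ready"
    FAILED = "failed"


ALLOWED = {
    Status.UPLOADING: {Status.UPLOADED, Status.FAILED},
    Status.UPLOADED: {Status.ENCODING, Status.FAILED},
    Status.ENCODING: {Status.READY, Status.FAILED},
    Status.FAILED: {Status.ENCODING},  # retry the encode (assumes the raw upload survived)
    Status.READY: set(),
}


def transition(statuses: dict, video_id: str, new: Status) -> None:
    """Move video_id to `new`, raising ValueError on an invalid transition."""
    current = statuses.get(video_id)
    if current is None:
        if new is not Status.UPLOADING:
            raise ValueError(f"{video_id}: must start in UPLOADING")
    elif new not in ALLOWED[current]:
        raise ValueError(f"{video_id}: {current.name} -> {new.name} not allowed")
    statuses[video_id] = new
```

The `statuses` dict stands in for a status column in the metadata database.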
Download flow
Retrieve data from S3
Add a CDN to cache popular videos
Q from interviewer: How do we know which videos are watched most frequently?
A from interviewee: Add a top-K system and use its output to decide which popular videos to cache.
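The output of the top-K system is just a ranked list of video IDs. Production top-K services usually work on streaming, time-windowed data and often use approximate counters (such as a count-min sketch); the exact, single-machine version below is only a sketch of that output, with made-up video IDs.

```python
# Exact, single-machine top-K over a batch of view events.
from collections import Counter


def top_k_videos(view_events: list[str], k: int) -> list[str]:
    """Return the k most-viewed video IDs (ties keep first-seen order)."""
    return [video_id for video_id, _ in Counter(view_events).most_common(k)]
```

For example, `top_k_videos(["a", "b", "a", "c", "a", "b"], 2)` returns `["a", "b"]`; those IDs would be the ones pushed to the CDN.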
Q from interviewer: Any other optimizations for watching videos?
A from interviewee: Encode each video at different resolutions.
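Encoding several resolutions lets the player switch quality as network conditions change; in adaptive streaming such as DASH, the client makes this choice per segment. A minimal selection rule, assuming a hypothetical bitrate ladder and a 20% safety margin:

```python
# Adaptive bitrate selection: choose the highest rendition whose bitrate fits
# within a safety margin of the measured bandwidth. The ladder is hypothetical.
RENDITIONS = [  # (name, bitrate in kbps), highest first
    ("1080p", 5000),
    ("720p", 2800),
    ("480p", 1400),
    ("360p", 800),
    ("240p", 400),
]


def pick_rendition(bandwidth_kbps: float, headroom: float = 0.8) -> str:
    budget = bandwidth_kbps * headroom  # leave slack for bandwidth fluctuations
    for name, kbps in RENDITIONS:
        if kbps <= budget:
            return name
    return RENDITIONS[-1][0]  # nothing fits: fall back to the lowest quality
```

With 3,000 kbps measured, the budget is 2,400 kbps, so the player picks 480p rather than risking stalls at 720p.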
Q from interviewer: How do we implement the comment feature?
Add APIs:
comment(video_id, user_id, content, comment_id)
like(video_id, comment_id)
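The two APIs can be sketched as an in-memory service. A real implementation would persist to a database; the class, field and method names here are hypothetical. Taking `comment_id` from the caller, as the API above does, also lets the service reject a retried duplicate.

```python
# In-memory sketch of comment(...) and like(...). Names are hypothetical.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Comment:
    comment_id: str
    video_id: str
    user_id: str
    content: str
    likes: int = 0


class CommentService:
    def __init__(self) -> None:
        self._by_id: dict[str, Comment] = {}
        self._by_video: dict[str, list[str]] = defaultdict(list)

    def comment(self, video_id: str, user_id: str, content: str, comment_id: str) -> Comment:
        if comment_id in self._by_id:
            raise ValueError(f"duplicate comment_id {comment_id}")
        c = Comment(comment_id, video_id, user_id, content)
        self._by_id[comment_id] = c
        self._by_video[video_id].append(comment_id)
        return c

    def like(self, video_id: str, comment_id: str) -> int:
        c = self._by_id[comment_id]
        if c.video_id != video_id:
            raise ValueError("comment does not belong to this video")
        c.likes += 1
        return c.likes

    def top_comments(self, video_id: str, n: int = 10) -> list[Comment]:
        comments = [self._by_id[cid] for cid in self._by_video[video_id]]
        return sorted(comments, key=lambda c: c.likes, reverse=True)[:n]
```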
Interviewee: DB schema first or comments first?
Interviewer: First the comment feature at a high level, then the DB design.
Interviewer: What will you store in Redis?
Interviewee: the Redis schema
How do we handle a Redis failure?
The data is not critical
Sync Redis with the NoSQL store every 1 to 5 minutes, so a Redis failure loses at most the last few minutes of updates
Interviewer: How do we make sure nothing is lost if Redis crashes?
Interviewee: Keep another copy of Redis as a standby.
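The periodic sync can be sketched as a write-back counter: increments land in the cache and are flushed to the durable store once per interval, so a crash loses only unflushed increments. Plain dicts stand in for Redis and the NoSQL store, and the flush is triggered on writes rather than by a timer; both are simplifying assumptions. A standby replica narrows the loss window but does not close it, since Redis replicates asynchronously by default.

```python
# Write-back counters: increments go to a fast cache (dict standing in for
# Redis) and are flushed to the durable store (dict standing in for NoSQL)
# at most every flush_interval_s seconds.
from collections import defaultdict


class WriteBackCounter:
    def __init__(self, store: dict, flush_interval_s: float = 60.0) -> None:
        self.store = store
        self.flush_interval_s = flush_interval_s
        self.pending: defaultdict[str, int] = defaultdict(int)
        self.last_flush = 0.0

    def incr(self, key: str, now: float) -> None:
        self.pending[key] += 1
        if now - self.last_flush >= self.flush_interval_s:
            self.flush(now)

    def flush(self, now: float) -> None:
        for key, delta in self.pending.items():
            self.store[key] = self.store.get(key, 0) + delta
        self.pending.clear()
        self.last_flush = now

    def crash(self) -> None:
        """Simulate losing the cache, e.g. the Redis node dies."""
        self.pending.clear()
```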
Database schema
Additional design
Discussions during the Interview
Interviewer and Audience Feedback after the Interview
Requirement gathering - feature
An L5 interviewee should drive the interview process
Tradeoffs - sometimes missed the key point: for S3, the key point is the large video file size (not Amazon vs Google)
Estimate QPS - no need to be very accurate
Some parts were not covered (e.g., database schema)
Project awareness - not enough time to cover everything
Audience
Should the interviewee propose requirements and then follow up with additional ones,
or leave it entirely to the interviewer?
Interviewer
Prefers that the interviewee proposes
Audience
We have not seen the landing page design
Audience
Yes. Probably based on a recommendation system
Audience
Upload now takes 2 API calls: create the metadata, then upload to the returned URL
Two steps for upload, but one of the steps may fail in between
Audience
An upload service can be placed between the client and S3
Gigabit / 10-gigabit network cards (NICs)
Upload
Dev tool
How do we view a video from S3?
S3: 4,000 small files for a 10-minute video
First, get the chunk metadata from the server
Then retrieve each chunk
The download API should be designed to deal with chunks
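The two-step download (metadata first, then chunks) can be sketched as a generator that yields chunks in order, so playback can start before the whole file arrives. DASH works along these lines with its manifest (the MPD); the key layout and `FAKE_STORAGE` below are hypothetical stand-ins for S3/CDN objects.

```python
# Client-side download driven by a manifest: fetch the chunk list first, then
# fetch chunks in order so the player can start on the first chunk early.
from typing import Any, Callable, Iterator

FAKE_STORAGE: dict[str, Any] = {
    "v1/manifest": ["v1/chunk-0", "v1/chunk-1", "v1/chunk-2"],
    "v1/chunk-0": b"abc",
    "v1/chunk-1": b"def",
    "v1/chunk-2": b"ghi",
}


def stream_video(video_id: str, get_object: Callable[[str], Any]) -> Iterator[bytes]:
    manifest = get_object(f"{video_id}/manifest")  # step 1: chunk metadata
    for key in manifest:                            # step 2: each chunk, in order
        yield get_object(key)
```

A player would consume it as `for chunk in stream_video("v1", FAKE_STORAGE.__getitem__): ...`, feeding each chunk to the decoder as it arrives.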
There are only 2 APIs, but there are many services.
For example, there is no API for saving
Uploading to the original-storage service also requires an API
L5: should cover some additional questions
Non-functional: lots
10 GB per second -> a cluster, or many instances. If one network card supports only x KB/s, how do we handle it?
How do we handle upload error?
How do we handle popular and non-popular videos?
How do we pace it?
45 minutes - may not have enough time to cover all features
Confirm the set of features, then go directly to the important ones
The interviewee seems to know most of the knowledge, but did not express it systematically.
Scalability, cache.
How to structure the system?
Interviewer: “I expect you to drive the interview”
First write down an outline before drawing the diagram, e.g., the life cycle of a video: upload, download, review
Comment service. Interviewer: address the key points
Should discuss the tradeoffs
NoSQL: should talk about extensibility, high concurrency
SQL: query is good
Why Redis: high availability
Redis cluster, scale up
Upload/download worker? Kubernetes? AWS?
Lambda: small loads, serverless; upload/download clusters
Does everybody use S3?
Netflix uses AWS. Netflix builds its own CDN.
S3 streaming
Cut into small pieces
Can be done on the client: download pieces, then play the downloaded portion
Can S3 sustain such high QPS?
Perhaps 20% of users download from S3; most users should rely on the CDN.
CDN and S3 should be connected.
We can deploy many CDN nodes
Map each client IP to a CDN node
CDN and S3 should have a dedicated line to keep the CDN up to date
Is there another buffer between S3 and the CDN?
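Mapping a client IP to a CDN node can be sketched as a longest-prefix match over a routing table, falling back to the origin (S3) when no edge covers the client. Real CDNs typically use DNS-based or anycast routing backed by geo-IP data; this toy table uses documentation IP ranges and hypothetical hostnames.

```python
# Toy longest-prefix-match routing from client IP to a CDN edge, with the
# origin as fallback. Prefixes are documentation ranges; hosts are hypothetical.
from ipaddress import ip_address, ip_network

EDGES = {
    ip_network("203.0.113.0/24"): "edge-a.cdn.example",
    ip_network("203.0.113.128/25"): "edge-b.cdn.example",  # more specific
    ip_network("198.51.100.0/24"): "edge-c.cdn.example",
}
ORIGIN = "origin.example"


def pick_edge(client_ip: str) -> str:
    addr = ip_address(client_ip)
    # Collect every covering prefix; an IPv6 address never matches an IPv4 network.
    matches = [(net.prefixlen, host) for net, host in EDGES.items() if addr in net]
    return max(matches)[1] if matches else ORIGIN
```

The longest matching prefix wins, so `203.0.113.200` goes to edge-b even though edge-a's /24 also covers it.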
Netflix: uses AWS
Self-built service - load balancer becomes the bottleneck
Bilibili - builds its own infrastructure
Netflix - found lower cost in AWS
Instagram - photos - their own storage system
Depends on the scale of the company
Mock - interviewer - interviewee - synchronize
Real interview, we don’t know the question ahead of time
It’s hard for the interviewer to drive the next step
Upload, download, or other things
Interviewer - hard to drive
Interviewer - should have some idea. MVP - critical user journeys - upload, download
Needs to be proactive, not reactive
Want to have an expectation of the key point for testing.
Limit the requirements.
Redis, NoSQL - but there is not enough time.
I don't understand whether the interviewer or the interviewee should drive
Best: the interviewee drives and confirms with the interviewer
Interviewee should drive. If the interviewer disagrees, then interviewee can change direction.
NoSQL vs MySQL: summarize the interviewer's point and make it a key point.
Try to drive toward the part that you understand.
There are some usual tradeoffs - NoSQL/SQL
Buffering
Synchronous vs asynchronous (message queue)
Every 2-3 minutes, confirm you are on the right track.
Is interviewer in partnership with interviewee?
At checkpoints: you can double-confirm at each checkpoint.
If the interviewer asks about something, the interviewee should take the hint
How does the interviewer score? What metrics?
Different interviewers are different.
Don't expect all interviewers to be professional.
Soft skill / hard skill.
Communication skills: every 2-4 minutes, confirm with the interviewer.
Hard skill. Every question may have some points, scalability, reliability.
Why should we use message queue? Did I give the right answers?
It is right. There is async processing.
The alternative is synchronous processing, where latency is much longer.
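Four benefits of a message queue: decoupling, asynchrony, peak shaving, trough filling
Decoupling
Asynchronous processing
Handle peaks (peak shaving)
Handle troughs (trough filling)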
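Peak shaving and trough filling can be seen in a small tick-based simulation: a burst of jobs enters the queue, and a consumer with fixed capacity works through it over the following otherwise idle ticks. The capacity and arrival numbers are illustrative only.

```python
# Tick-based simulation of a queue between bursty producers and a consumer
# with fixed capacity: the burst is spread over later ticks (peak shaving)
# and the consumer stays busy during idle ticks (trough filling).
def simulate(arrivals: list[int], capacity: int) -> list[int]:
    """Return how many jobs the consumer processes in each tick."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    backlog = 0
    processed = []
    tick = 0
    while tick < len(arrivals) or backlog > 0:
        if tick < len(arrivals):
            backlog += arrivals[tick]
        done = min(backlog, capacity)
        backlog -= done
        processed.append(done)
        tick += 1
    return processed
```

For example, `simulate([10, 0, 0, 0], 3)` returns `[3, 3, 3, 1]`: the consumer never exceeds 3 jobs per tick, and the quiet ticks absorb the burst.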
Uploads should be broken into chunks.
Client-side code handles the chunking.
Some users may upload the same video
The system compares uploads to detect duplicates.
A duplicate can reference the existing video ID.
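A minimal sketch of duplicate detection by content hash: byte-identical re-uploads map to the existing video ID. Hashing is incremental so large files can be processed chunk by chunk. This only catches exact copies; the same video re-encoded produces different bytes and would need perceptual fingerprinting, which is not shown. Class and method names are hypothetical.

```python
# Exact duplicate detection by SHA-256 content hash.
import hashlib
from typing import Iterable


def content_hash(chunks: Iterable[bytes]) -> str:
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)  # incremental: never needs the whole file in memory
    return h.hexdigest()


class DedupIndex:
    def __init__(self) -> None:
        self._id_by_hash: dict[str, str] = {}

    def register(self, chunks: Iterable[bytes], new_video_id: str) -> str:
        """Return the existing video ID for duplicate content, else new_video_id."""
        return self._id_by_hash.setdefault(content_hash(chunks), new_video_id)
```

Because the hash depends only on the bytes, `[b"ab", b"c"]` and `[b"abc"]` are recognised as the same content regardless of how the client chunked it.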
If the client cuts the video into 10 pieces, will it be hard on the client?
4-core client, 4K video: cut into 20 pieces
Encoding is not done on the client side
Does the client compress before the upload?
May depend on upload format
Original source file can be saved in “original storage”
Transcode - output with compression.
Before upload
HTTP can use gzip (built-in compression)
Video is already compressed, so compressing it again gains little
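A quick experiment shows why HTTP-level compression helps text but not video. It uses `zlib`, which implements DEFLATE, the same algorithm gzip uses (with a different wrapper). Random bytes are an assumed stand-in for already-compressed video, which is likewise high-entropy.

```python
# Compare DEFLATE on repetitive text versus high-entropy bytes.
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


text_like = b"comment " * 10_000                      # highly repetitive
video_like = random.Random(0).randbytes(80_000)       # incompressible stand-in
```

The repetitive text shrinks to a tiny fraction of its size, while the random bytes come out slightly larger than the input because of framing overhead.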
Chunk then upload
Upload is just through the browser
Does the browser chunk it for you?
JavaScript can chunk it for you before the upload.
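Client-side chunking also answers the upload-error question: only a failed chunk is retried, not the whole file. In a browser this logic would be JavaScript (slicing a File/Blob); the Python sketch below keeps the same shape. `send` is a hypothetical callback that uploads one chunk and raises `ConnectionError` on failure, and the backoff values are illustrative.

```python
# Client-side chunked upload with per-chunk retry and exponential backoff.
import time
from typing import Callable


def split_into_chunks(data: bytes, chunk_size: int) -> list[tuple[int, bytes]]:
    return [(i // chunk_size, data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def upload_all(
    data: bytes,
    chunk_size: int,
    send: Callable[[int, bytes], None],
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Upload every chunk, retrying failures; return the number of chunks sent."""
    chunks = split_into_chunks(data, chunk_size)
    for index, chunk in chunks:
        for attempt in range(max_attempts):
            try:
                send(index, chunk)
                break
            except ConnectionError:
                if attempt == max_attempts - 1:
                    raise  # give up on this chunk after max_attempts
                sleep(base_delay_s * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    return len(chunks)
```

Passing `sleep` as a parameter keeps the retry policy testable without real waiting.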
ADP - on top of TCP
TCP splits the data into packets
When the client needs a different resolution, it queries the metadata
DASH (Dynamic Adaptive Streaming over HTTP) - streaming protocol.
Chunk - each chunk’s offset is saved in DB
The DB need not be MySQL.
Offsets - put in a blob
User-facing metadata can be put in MySQL
Technical metadata is handled separately
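The split between user-facing and technical metadata can be sketched as two tables. Here `sqlite3` stands in for MySQL, the table and column names are hypothetical, and the chunk offsets are JSON-encoded into a BLOB with one row per (video, resolution).

```python
# User-facing metadata vs technical metadata in separate tables.
import json
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE videos (            -- user-facing metadata
    video_id    TEXT PRIMARY KEY,
    user_id     TEXT NOT NULL,
    title       TEXT,
    description TEXT,
    created_at  INTEGER
);
CREATE TABLE video_chunks (      -- technical metadata
    video_id      TEXT NOT NULL REFERENCES videos(video_id),
    resolution    TEXT NOT NULL,
    chunk_offsets BLOB NOT NULL,
    PRIMARY KEY (video_id, resolution)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def save_offsets(conn: sqlite3.Connection, video_id: str, resolution: str, offsets: list[int]) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO video_chunks VALUES (?, ?, ?)",
        (video_id, resolution, json.dumps(offsets).encode()),
    )


def load_offsets(conn: sqlite3.Connection, video_id: str, resolution: str) -> Optional[list[int]]:
    row = conn.execute(
        "SELECT chunk_offsets FROM video_chunks WHERE video_id = ? AND resolution = ?",
        (video_id, resolution),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Keeping offsets out of the user-facing table means listing or searching videos never touches the bulky per-chunk data.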
Redis can track watch progress (how far each user has watched)