Design Realtime Comment for Newsfeed
Topic: Design Realtime Comment for Newsfeed
Interviewer: ken
Interviewee: ying
Level: L4 (Experienced Individual Contributor)
Additional Resources:
System Design Interview - Design Realtime Comments
===
[7:00]
Requirements
Realtime Facebook comments
Comments appear below the post
1 level of comments
Optional: privileged comment
MAU: 10^9
10^8 posts per minute
6*10^5 comments per minute
[12:40-7:00]
API design
Publish_comment(uid, …)
View_comment(post_id, …): get the comments on a post
View_comment_by_user(uid, self_uid, auth_token): get the comments made by one user
[15:00-7:00]
1 High availability: 99.999%
2 Resilience: published comments are not lost
3 Scales well
4 No need for linearizability or transactions; eventual consistency is enough
Additional comment
- Deliver in real time (1 second)
[19:16-7:00]
QPS: 6*10^5/60 = 10^4 QPS
Comments per day: 6*10^5 * 60 * 24 = 8.64*10^8 comments
Q: multimedia?
A: text only
1 KB per comment: 8.64*10^8 KB ~ 0.86 TB per day, roughly 1 TB per day
Assume 50% compression rate: about 0.5 TB per day
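The estimate above can be rechecked with a short script. This is only a sketch: it starts from the stated rate of 6*10^5 comments per minute, assumes decimal units (1 KB = 1000 bytes, 1 TB = 10^12 bytes), and uses illustrative constant and function names that are not from the interview.

```python
# Back-of-envelope capacity estimate from the stated comment rate.
COMMENTS_PER_MINUTE = 6 * 10**5   # from the requirements
BYTES_PER_COMMENT = 1_000         # 1 KB per comment, text only (decimal KB assumed)
COMPRESSION_RATIO = 0.5           # assumed 50% compression


def estimate(comments_per_minute=COMMENTS_PER_MINUTE):
    """Return write QPS, comments per day, and daily storage in TB."""
    write_qps = comments_per_minute / 60
    comments_per_day = comments_per_minute * 60 * 24
    raw_tb_per_day = comments_per_day * BYTES_PER_COMMENT / 10**12
    return {
        "write_qps": write_qps,
        "comments_per_day": comments_per_day,
        "raw_tb_per_day": raw_tb_per_day,
        "compressed_tb_per_day": raw_tb_per_day * COMPRESSION_RATIO,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.3f}")
```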
[24:00-7:00]
High level design
Realtime DB
Leader-follower DB can work
Multi-leader DB can work
Quorum-based: Dynamo/Cassandra can work
[28:44-7:00]
Q: SQL/NoSQL?
A: When we look at a post, we want to read its comments as a whole. SQL: hard to support locality
Use NoSQL: document style or column style
Comment in post table
Key: PID, UID, post content; value: JSON: {[comment1, comment2, etc.]}
[appending is difficult]
Comment by user: key: UID; value: JSON: {[comment1, comment2, etc.]}
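A minimal sketch of the two-table layout, using plain Python dicts in place of a real NoSQL store. The `post_id` and `text` parameters of `publish_comment` are hypothetical (the notes elide them), and `self_uid` and `auth_token` are left out for brevity. It also shows why appending is awkward: each new comment forces a read-modify-write of the whole JSON value.

```python
import json

# Two key-value tables, modeled as dicts purely for illustration.
comments_by_post = {}   # post_id -> JSON string holding a list of comments
comments_by_user = {}   # uid     -> JSON string holding a list of comments


def publish_comment(uid, post_id, text):
    """Append a comment to both tables (hypothetical parameters)."""
    comment = {"uid": uid, "post_id": post_id, "text": text}
    for table, key in ((comments_by_post, post_id), (comments_by_user, uid)):
        # Read the whole value, modify it, write the whole value back.
        existing = json.loads(table.get(key, "[]"))
        existing.append(comment)
        table[key] = json.dumps(existing)
    return comment


def view_comment(post_id):
    """Return all comments on a post in one read (good locality)."""
    return json.loads(comments_by_post.get(post_id, "[]"))


def view_comment_by_user(uid):
    """Return all comments made by one user."""
    return json.loads(comments_by_user.get(uid, "[]"))
```

Without some form of concurrency control, two concurrent calls to `publish_comment` on the same post could overwrite each other.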
What database to use?
Memcached, Redis
1 KB to several megabytes per comment
Would like to find a NoSQL database that supports values of a few megabytes
What if 2 users comment at the same time?
Version vector system to resolve conflicts
V1: [comment1, comment2]
V1 -> V2 (writer 1): [comment1, comment2, comment3]
V1 -> V2 (writer 2): [comment1, comment2, comment4]
Merged: [comment1, comment2, comment3, comment4]
[optimistic lock?]
Do we keep all versions?
Delete old version when versions are combined
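The merge of two concurrent versions can be sketched as follows. This is an illustrative sketch, not the candidate's implementation: it assumes comments are append-only (no deletes), that each replica has an id such as "A" or "B", and that the names `Versioned`, `descends`, and `merge` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    clock: dict      # version vector: replica id -> counter
    comments: list   # comment list stored under this version


def descends(a, b):
    """True if clock `a` has seen every update recorded in clock `b`."""
    return all(a.get(replica, 0) >= count for replica, count in b.items())


def merge(x, y):
    """Resolve two versions of the same key."""
    if descends(x.clock, y.clock):
        return x          # x supersedes y; y can be deleted
    if descends(y.clock, x.clock):
        return y          # y supersedes x; x can be deleted
    # Concurrent writes: take the element-wise max of the clocks and the
    # order-preserving union of the comment lists, then drop both siblings.
    replicas = x.clock.keys() | y.clock.keys()
    clock = {r: max(x.clock.get(r, 0), y.clock.get(r, 0)) for r in replicas}
    comments = list(x.comments)
    comments += [c for c in y.comments if c not in comments]
    return Versioned(clock, comments)


if __name__ == "__main__":
    a = Versioned({"A": 2}, ["comment1", "comment2", "comment3"])
    b = Versioned({"A": 1, "B": 1}, ["comment1", "comment2", "comment4"])
    print(merge(a, b))
```

The union-based merge only works because a comment list never loses entries; supporting deletes would need tombstones or a different merge rule.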
How do we deliver comment in real time?
Do another quorum read
[39:10-7:00]
User keeps a socket with the server
Put comments into a stream.
Or polling
Kafka stream.
Publish comments into the stream
Reading the stream: 1 assign back to comment db
Track which user is reading which post
Save comment in in-memory datastore
Realtime = 1 second
Set up a Kafka stream with servers behind it. This gives the fastest response
NoSQL speed should be fast enough
Polling from the user's client.
Pushing can deliver comments faster
Prefer polling over pushing
[may not be realtime]
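For comparison with polling, the push path discussed above (a stream consumer plus tracking which user is reading which post) could look roughly like this. It is an in-memory sketch only: the `CommentFanout` class is hypothetical, a per-user deque stands in for a socket, and no real Kafka client is used.

```python
from collections import defaultdict, deque


class CommentFanout:
    """Route each new comment to the users currently viewing that post."""

    def __init__(self):
        self.viewers = defaultdict(set)    # post_id -> uids viewing the post
        self.outbox = defaultdict(deque)   # uid -> pending comments (socket stand-in)

    def start_viewing(self, uid, post_id):
        self.viewers[post_id].add(uid)

    def stop_viewing(self, uid, post_id):
        self.viewers[post_id].discard(uid)

    def on_comment(self, post_id, comment):
        """Called by the stream consumer for each published comment.

        Returns the number of viewers the comment was pushed to.
        """
        targets = self.viewers.get(post_id, ())
        for uid in targets:
            self.outbox[uid].append(comment)
        return len(targets)


if __name__ == "__main__":
    fanout = CommentFanout()
    fanout.start_viewing("u1", "p1")
    fanout.start_viewing("u2", "p1")
    print(fanout.on_comment("p1", "hello"))
```

The cost of this approach is the extra state (who is viewing what) and the open connections, which is part of why the notes lean toward polling.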
Q: Each active user reads db every 3 seconds. What is the read throughput we need to support?
A: if client is an app.
Calculate throughput
10^9 users * 0.5 * 3 posts/user = 1.5 * 10^9 reads per 3-second polling interval
1.5 * 10^9 / 3 = 5 * 10^8 reads per second in total
Scale by partitioning onto 1000-2000 servers
5 * 10^8 / (1000 to 2000 servers) = 2.5 * 10^5 to 5 * 10^5 reads per second per server
Redis 100,000-200,000 QPS
Use in-memory database to serve at high speed
Polling is easy to implement but the required throughput is very high
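The polling load can be worked through in a few lines. This sketch reads the 0.5 factor as the fraction of users who are active and divides by the 3-second poll interval from the question; both are interpretations of the notes, and the function name and parameters are illustrative. Under that reading the total is 5 * 10^8 reads per second.

```python
def polling_read_load(users=10**9, active_fraction=0.5, posts_per_user=3,
                      poll_interval_s=3, servers=(1000, 2000)):
    """Return (total reads per second, reads per second per server)."""
    # Every active user re-reads each visible post once per poll interval.
    reads_per_second = users * active_fraction * posts_per_user / poll_interval_s
    per_server = {n: reads_per_second / n for n in servers}
    return reads_per_second, per_server


if __name__ == "__main__":
    total, per_server = polling_read_load()
    print(f"total reads/s: {total:,.0f}")
    for n, load in per_server.items():
        print(f"{n} servers: {load:,.0f} reads/s each")
```

Even at 2000 servers the per-server load sits above the 100,000-200,000 QPS quoted for Redis, which is the point of the observation that polling needs very high throughput.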
===
Post content
Comment content
Who is reading which post
Writer: everybody
Reader: people who submit comments
Write globally, read locally: A, B
Write locally, read globally: C
1B comment readers per day
10M comment writers per day
A -> C (write locally 1B users, read globally 10M users) -> B (write globally 10M users, read locally)
===