Real time message system
Materials — open to everyone, no sign-in
Topic: Real time message system
Interviewer: 刘君洁(liujunjie034661)
Interviewee: LW
Level: L4 (Experienced Individual Contributor)
QR code
Sign up for future events. Presenters for system design appreciated.
Sign up for 1-1 coaching:
WGROUP_08G 八月底前获取30%优惠
Mock System Design Interview Summary
Interview Overview
Date: 8/14/2022
Target level: L4 (experienced IC)
Duration: 45 minutes
Topic covered: Messaging system for users
Drawing tool used: Excalidraw
Requirements
Functional requirements
1-1 chat
Need to maintain history
Interviewee: 200 vs 200,000 concurrent connections?
Interviewer: what’s the difference?
Interviewee: 10M-100M will make
Excalidraw
Non functional requirements
System Design
External APIs
send(userId, content)
receive(userId, content)
System design
Choosing NoSQL for scalability
Chat sent and saved in database
User gets online, the message is received from NoSQL
Protocol used:
Polling. Ask for message periodically. Expensive if there are no messages.
Long polling: wait for the message when there is no response.
Q: why is there high overhead for long polling?
A: each round trip requires HTTP handshake
Websocket: bidirectional communication, low overhead.
Q: How does a user subscribe to the messages that it needs to receive?
A: each user has its own message queue topic
Q: which message queue do you like to use?
A: kafka: can create a topic for each user. robust fail over and retry
Q: how do you scale up message queue?
A: for kafka, scalability is simple. Each chat server will be a publisher, and a receiver. We can add publisher and subscriber.
Q: chat server or user subscribing to the message queue?
A: chat server. THis can help us handle additional security requirements
Q: how many websockets each user needs to maintain with the server
A: one websocket per server. Bi-directional connections
Q: user A sends a message to user B
A: first it goes to database, then it goes to the message queue user B is subscribing to
Q: how does chat server know to poll the message
A: use long polling
Q: how many chat servers do you want to maintain? What is the constraint for the chat server to handle many users?
A: the bottleneck is in the middleware. Kafka can scale up easily.
Q: let’s say there are 100 chat servers. How do you make sure the server pulls the message is connected to the right user?
A: when user opens websocket to chat server, the chat server subscribes the topic in the queue
Q: how to design the system to send the right message from the queue to the right user
A: there is a local mapping between message topics and the websocket connection
Q: talk about the database
A: no relations. NoSQL is more scalable.
messageID, content, sender, receiver, timestamp
Q: what database to use?
A: we can use cassandra
Q: what is the row ID?
A: we don’t need a row ID.
Q: it’s a column based database. Do you need a row ID? What are the column names
A: column names: messageID, content, sender, receiver, timestamp
Q: should we maintain a history?
A: messageID should be incrementing. The partition key is senderID. New messages should have a larger message ID value than older messages.
Q: if I want to read the historical messages from one friend, do I need to query a different partition?
A: sender ID or receiver ID can be the partition key
Online users rely on message queue. So database is write heavy
Q: how to maintain the message order?
A: can use incrementing message ID
Q: what if the messages are too close?
A: just use the receival time sequence
Q: if there is some network issue, how do we maintain order?
A: if there are 2 messages that are really close, we still use the timestamp of the receival.
Scalability / availability
Q: how to avoid single point of failure?
A: chat servers are replicated.
Message queue: kafka provides
NoSQL: different replicas in different data centers
Chat servers: we have different instances. Can bring down using zookeeper. User can connect to a different server
Q: zookeeper for availability of chat server?
A: we can use it for balancing loads between different chat servers
Q: what load balancing algorithms?
A: consistent hashing of the user ID. If the user ID is random, there should not be a hotspot.
Q: why use consistent hashing?
A: this maps servers to users. We just want to make sure the mapping is random
Q: what complexity for consistent hashing?
A: when new server joins, we need to balance the consistent hashing.
Q: how to handle chat server leaving and joining the ring?A: heartbeat.
Q: when will it rejoin to leave the cluster?
A: when missing heartbeat for 30 seconds
Q: will it cause problem in consistent hashing?
A: randomize. If one server drops, other servers will take the load evenly
Q: message queue between chat server and noSQL database.
Ordering
Interviewer and Audience Feedback
Audience scores.
Interviewer:
Websocket: is rather difficult. Harder than one socket. Didn’t get solid answer to find which server has the websocket connection
I wonder if there are real experience using
Kafka is not very good. Consumer is by partition. Kafka is more suitable for offline, not online.
RabbitMQ is very good. Kafka streaming is good. Each partition is only consumed by one consumer.
Consumer: RabbitMQ支持的 TOPIC 比 KAFKA多。。其他好像差不多。。 九章说的。
Interviewee
Seems to be fine
I didn’t get
===
Which message queue
Not the main point - RabbitMQ
Interviewer:
Websocket server
Client connects to a specific server. There are multiple websocket servers. Even if each websocket can handle 1M connections, we need 2.
Using Redis to record which client is on which webserver
Audience: Why do we need to require mapping?
Interviewer: how do we find the right websocket server
Interviewer: Do you create a topic for each user?
Alternative design:
Audience: can we not use message queue
Interviewer: MQ: buffering, decoupling
Audience: if the receiving user is not available, we may need to deliver
Interview: Cassandra schema?
Audience: is Cassandra column based database
Interview: row key, column key
Column key is flexible
rowKey: threadID
audience:
userID, sender or receiver
userID, sender or receiver
A composite key userID1, userID2
Should duplicate the message twice
Interviewer:
Primary key = userID1, userID2 sorted order
Audience:
ThreadID is fine
Receiver is the groupID
Interviewer:
How do we order the messages?
Everybody may have a different location
Audience: Cassandra can use transaction of 2 records
Scott Shi interview
Audience: Cassandra can record the last time the user read the messages.
Interviewer: How do you maintain ordering?
Audience: we can just use the receival time. Wrong order at edge case.
Interviewer: version stamp
Vector version timestamp.
Logical version stamp
Kafka
===
Can kafka dynamically add topics?
Usually use 1 topic per kafka
We can use redis pub/sub to create many topics
Easy to create and subscribe topic
RabbitMQ can dynamically create topics
===
Audience: use NoSQL. No need for delete. Is scalable.
===
Recommend reading: https://youtu.be/6w6E_B55p0E