Distributed Message Queue

System DesignMessaging & StreamingDistributed Systems

Topic: Distributed Message Queue

Interviewer: longwei

Interviewee: chenbin

Level: L5 (Senior)


System design Public speaking / behavioral interview

Topic

Mock System Design Interview Summary

Interview Overview

Date: 6/19/2022

Target level: L5

Duration: 45 minutes

Topic covered: distributed message queue

Drawing tool used: Jamboard

Key problem vote: ?

Requirements

Functional requirements

Distributed message queue across regions

In order delivery: input order is A, B, C, then output order is A, B, C

Support topics

Consumer will pull messages

Support multiple consumers?

Same order in a partition

Lower priority

Once and only once

Non functional requirements

Most important:

Reliable

Fault tolerant

Scaling is not as important

Smaller message.

Does not need to be as scalable as kafka

Initially we just consider a single consumer

System Design

External APIs

Producer: post(message, ID)

Consumer: pull(ID)

System design

13 minutes

Interviewee: we should use ack=3 to ensure the message is reliably persisted by 3 replicas

Interviewer: What’s in meta service?

Interviewee: similar to a zoo service. Supports leader/follower election

Interviewer: How do we maintain ordering?

Interviewee: we need to decide which component will do the ordering. We can synchronize the calls using the clock of the leader of the replica.

Interviewer: how do you define a leader? If we have leader and follower relations, why do we need metadata service?

Interviewee: some place we need to record which one is the leader.

Metadata: leader election for replicas

Locate message ID -> message storage service

Interviewee:

Consumer comes with consumerID

Returns list

Meta service converts message ID to message storage service.

Interviewer: does the service return a list of IDs, or the message bodies?

Interviewee: the API service will return a list of messages, not a single one.

Let’s say producer sends A, B, C, D

When consumer arrives

Returns A

Returns B, C

Returns D

Each they may get more than one message

Interviewer:

consumer sends CID to API server

API server looks up the message storage from metadata service

API server retrieves the messages from the message storage

Interviewee:

Correct.

Interviewer: how do you handle the ordering

Interviewee: we can write to the leader, and the leader decides the order. Then the leader sends the ordering to replicas

Interviewer: do you use timestamp?

Interviewee: we can use timestamp or sequence number

Interviewer: why not order in the API server. (I’m fine)

Interviewer: how to support high availability? Is meta service a single point of failure?

Interviewee: we use raft to coordinate between Meta service nodes. We can use zoo keeper with multiple nodes, running consensus algorithms.

Interviewer: zookeeper with more than one instance?

Interviewee: we can set up zookeeper on 5 machines

Interviewer: zookeeper itself is a cluster

Interviewee: yes

Interviewer: replica x 3. Metadata service has multiple replicas. We have 3 copies of the data. If one copy of the data is bad, what do you do?

Interviewee: if one copy is bad

Read and detect failure. We need to go back to the metadata service to allocate another storage node.

Or metadata service provides the addresses of multiple services. We may detect the corrupted replica during read. If we find out we have fewer than 3, we will metadata service to allocate another storage node, and replica data into the new node.

Interviewer: if the API returns a list of messages to the consumers, how do you make sure the same messages will not be returned again

Interviewee: API service will see the list of messages. If we have consumed the actual message, let’s say we have chunk services c1, c2, c3. We need to get acknowledgment of consumer, then we bump the watermark at the meta data.

Interviewer: why we don’t remove the messages?

Interviewee: benefit is we save the disk space. Alternative is we save it for a couple weeks. It may be good to retain the data to handle edge cases.

Audience Rating

Interviewee Self Feedback

Basically covered all requirements

The interviewer skipped some requirements. I followed the focus of the interviewer.

At the beginning I missed some requirements. I didn’t follow up right away.

A bit nervous

Soft skill

Audience: We can describe the use case.

Interviewee: it looks like the interviewer locked down the requirements

Audience: Different use case. There are multiple solutions on the market such as RocketMQ, Kafka. Throughput, latency, business requirements.

Interviewee: small scale. Focus was on fault tolerant.

Audience: Any business case?

Interviewee: small scale, similar to finance applications

Audience: it looks similar to Kafka

Interviewee: best to ask about the business

Hard skill

Interviewee: ordering. If there is only one consumer, it’s not hard to guarantee the order. However, it’s more difficult to guarantee order for multiple consumers. Every partition can only be consumed by (at most) one consumer.

Audience: timestamp or broker timestamp. If it’s based on broker timestamp, then it is fine.

Interviewee: I tried to clarify with interviewer. It looks like for one partition, we guarantee the order.

Audience, in Kafka, if we use the same key, we will send to the same partition.

If we need to ensure end-to-end, it’s more difficult, for example, we cannot allow two producers to send at the same time.

How to partition: handled by procedure

How to sequentialize, how big is the throughput, producer must coordinate with the broker.

Broker is within the storage. Every broker is a machine in the message storage.

Audience M: LB is put in broker. In producer client library will resolve the address of the broker. Using LB it may

Audience: zookeeper needs to push update to the producer.

Audience: if the master is down, will the zookeeper send it to the producer?

Audience M: producer will pull data from the coordinator of the cluster.

Audience M: yes we can also design metadata as zookeeper, and zookeeper pushes the configuration with the producer.

Kafka controller: relieved some duty from zookeeper.

Discuss tradeoff.

Use load balancer

Or directly put it in consumer

Consumer Pulls metadata from metadata

Metadata pushes metadata to consumer

Audience: How do we implement a queue? Table, in-memory, file

Interviewee: we can save everything on disk.

We can use version or timestamp to ensure

Walkahead log, save the offset

Audience: walkahead log?

Interviewee: file 1, file 2, file 3.

Audience: Are we saving multiple messages in the same file?

Interviewee: yes. Append to file, and change offset.

Audience: use reactor mode, use a single thread to write to the file.

Audience M: we can have multiple producers.

Interviewee: we can build an in-memory queue, the flush from queue to disk

Audience: scalability, partition

Interviewee: yes it’s important but the time is limited.

Audience: how to handle some producer sending first but arriving last?

Interviewee: we can provide some locking at the client side when sending the messages.

Audience: DDIA total order guaranteed by consensus algorithm.

Audience: rabbit queue can support ordering

Audience:

Reference: qingyun 算法

Audience: exactly once. Idempotent key for producer. Idempotent key for consumer.

https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/