Metrics System (Time series DB)

System DesignDatabases & Storage

Topic: Metrics System (Time series DB)

Interviewer: Bryan

Interviewee: Alina

Level: L4 (Experienced Individual Contributor)

Additional Resources:


Metrics System

Mock System Design Interview Summary

Interview Overview

Date: 10/31/2021

Target level: 4

Duration: 1 hour

Topic covered:

Drawing tool used: diagrams.net

Requirements

1M users, 10 writes, 1 read per second

Functional requirements

data collection, CRUD, aggregate, cal, store, query, data visual

Non functional requirements

high available » high consistency

high scalability

high performance

data consistency

Math constrains:

1M

QPS:

write heavy

1M * 20 / 10 ^ 5 = 10 ^ 4

peek:

3 * 10 ^ 4

read:

10 write, 1 read

10 ^ 3 /s

storage:

3 years * 365 * 10 ^ 5 * 10 ^ 4 * 1KB / (1024 * 1024)

bandwidth:

write: 1KB * 3 * 10 ^ 4 = 0.5 M/s

read: 0.05 M/s

memory:

1M * 0.2 * 1KB = 5 * 10 ^ 3 GB

System Design

System design diagram

API

metric.send(userId, eventName, status, timestamp): statusCode

metric.get(eventName, timestamp, range): List / integrate

Why is the result? Trying to get a count per day - unit can be 1 day

Customer can change range to 1 hour, 1 day, 1 week, 1 month

Push/Pull: choose push

Data flow:

Writes:

Client -> load balancer -> aggregator service -> message queue (kafka) ->

Log service

No SQL database Elastic search

Visualization service (logstash / self )

Elastic search (index, range query)

Notification service

SQL database (notification, sender/receiver for queries)

PostSQL with replica

Reads:

Client -> LB -> Redis -> read service -> elastic search, sliding window (within the window, what the counts are)

Database schema:

nosql:

{

“index”: eventName,

“timestamp”: time,

“status”: running,

“tenant”: {

“tenant1”, “tenant2”

}

}

(why is there “tenant”?)

SQL:

| NotificationID | sender | receiver | content | status | timestamp |

Discussion:

Interviewer: Go through API and schema

Interviewee:

metric.send(userId, eventName, status, timestamp): statusCode

Interviewer: What is the example?

Interviewee:

metrics.send({

User: userId

eventName: add item,

status: isAdded,

Timestamp: 2021,

})

Interviewer:

Let’s skip log and notification service

Can we see the schema for visualization service.

Interviewee:

Schema is …

Read service will read from elastic search

Interviewer:

How can user see the visualization

Interviewee:

Added Grafana (added between elastic search and read service)

Removed grafana -> the client can get a count based on range, can display its own UI

Interviewer:

Why do we have Redis?

Interviewee:

The client can keep on clicking refresh to read from database

Redis:

{

“Input parameter”: {

“eventName”, “time interval”

}

Count:

}

Interviewer:

Computation in the database is really slow. How do we do real time?

Interviewee:

Can be real time. It’s ok to be slow.

Can I get some hint? For performance improvement.

Interviewer:

If you need to monitor

1M user, all traffic at the beginning. Say within 1 minute, lots of writes

Every 5 seconds: display all traffic in the monitor

Multiple million in each query

Interviewee:

Can use some sampling to estimate to render quickly

Then render a more accurate

Interviewer:

Can we aggregate before we write to the time series database

Can we add time interval as part of the key

Interviewee:

Add a temporary storage for aggregation

Interviewer:

Kafka can aggregate

Other services can also aggregate

Before you save to elastic search

Interviewer

If our payload is huge, what can we transport the data to the database

Interviewee

Can split the load into small files

One worker is 1kb

Every second 1M records

Interviewee:

Can hash and send to different instances of aggregator

Interviewer:

Example

Interviewee

Missing data. Monitor -> wait for 2 days and there are no events. Can ask the client to resent.

Validation of data

Interviewer:

Walk through the whole design

Send request

Aggregator: missing data, validation

Kafka: push to different consumer

Log service: append to log entry

Visualization: can aggregate and write to elastic search

Notification: may or may not need

Read:

Request

Redis check cache. Return result if cache hit

Redis talk to read service if cache miss. Read service read from elastic search

UI keeps on pulling

Handle scale up:

Database sharding. Replica to make service highly available. Sharding: time interval + event key

Kafka, elastic search, read service, can all have multiple instances

Discussions during the Interview

Interviewer and Audience Feedback after the Interview

Interviewer:

Interviewee is nervous

I waited for interviewee to finish

Chat window: expressed a lot of prepared material

However should pause and ask for requirements

Spend too much time upfront

Interviewer:

Design share screen

Our requests are big

Aggregator service. Event can be aggregated at the aggregator

Partition, event name, time interval.

Regardless time interval, can make it a batch

Can aggregate 5 seconds

Currently individual records are sent to kafka directly

Load in frontend is heavy

Kafka is heavily loaded

Range query will need to visit the whole database

Slow

Metrics monitoring is close to real time

Calculation in database is slow, hard to become real time

Every time UI will need to visit the database

Time series database:

Header - file type

Table split into blocks. Log structure merge stream, append only.

Reading is relatively slow

Recommend to do aggregation ahead of time

For example aggregate within 1 second

Time series database, CRUD, compaction

Small chunks at a time

Then will merge

Cache use

Redis as cache

There is a monitor, so why does it need Redis to cache

ELK: elastic search, kabana - can read and provide close to real time visualization. It’s not through the client side.

It feeds the component directly

Hard skill:

Read more paper

Soft skill:

Try to get more feedback from interviewer

Too nervous

When I ask interviewee is stuck

Can practice more.

===

Audience

Kaisi:

Metrics design:

CPU/hardware monitoring

Application level monitoring

Direct answer missing? Is the UI querying the database continuously?

Kibana can work

Or direct

Audience

Anna: my team is using kafka, elastic search,

Every log needs to aggregate

What to use as partition key in Kafka

How to accelerate reading

Elastic search: how do I save?

Index: similar to DB table

How to define table schema

Interviewer:

I asked interviewee about schema. We probably can’t get answer

Partition key: Log, metrics, notification. Can use partition.

If we don’t use partition. 3 services may read the same record/log

Metrics: time-range, aggregation count, event name. Reduction of volume.

Using partition: can allow slow process keep running. Faster process

Elastic search. Full text search.

Metrics monitoring: Influx db is fine.

Prometheus (google)

If we really want complete log, and we want to full text search. Answer: don’t do it.

If you need it, then use ELK stack

Anna: should not use elastic search

Interviewer: Usually interviewer will not deny your choice

Anna: should we choose tool first? DB/cache

Bryan: queuing system. If interviewer doesn’t know the system, try to use generic terms instead of specific technology Kafka

Push vs Pull. Pros and cons. Why do we use push?

This will increase your score. Try to cover choices and rationales.

Audience: 1 million user. Very heavy traffic. API send metrics may be too slow.

Can we first write to log, then batch process log

Interviewer:

aggregation service. Aggregate, then classify. Do some aggregation (small, big time range), do it as a batch. Smaller latency.

Can do finer aggregation before writing.

Client side is rare to issue read requests. We may not need redis

Main bottleneck is at write side.

If there are 3 services, then there may be time skew between 3 services.

Most of the time, interviewer will write down redflags. If the interviewer asks multiple times, then there must be something wrong.

Audience:

Metrics circular dependency

Kafka can go down, then metrics system can go down.

Interviewer:

If there are some part of Kafka is down,

3 replica. 1 is primary. When primary replica is down, then fail over to other replica

Audience:

Write heavy. If read heavy, elastic search already contains read cache.

Audience:

Everyday should have a partition

Interviewer:

Time interval as key. Bloom filter. Different range. Can do a fast filter to only

Inverse index. Make a

If it’s based on day, can partition/save by

Audience:

How to handle tag (e.g. based on location)

Audience:

Add tag as part of the key. Can do a bloom filter

Audience

5 tags, 3 values. 3 ** 5. Lots of tags/partitions

Yes we can use this optimize.

Kaisi: Audience

Audience:

If granularity is 1 minute, 1 hour, 1 day, then we need 3 tables

Table 1:

Every record covers 1 minute

Table 2:

Every record covers 1 hour

Table 3:

Every record covers 1 day

Interviewer:

Granularity should depend on business requirements.

Interviewer:

Async: scanner

One day: need to wait for the whole day

Every hour compact minutes to hour

Every day compact hour to day

Audience

Aggregation service upfront is mainly used to reduce QPS

Eric Che

TSDB has already implemented all of these

Interviewer:

From system design, it’s not driven based on DB. It’s based on our requirement

Audience

Aggregator, time series DB

Audience

If there is a TSDB, we can directly send record to TSDB

Interviewer:

Using Kafka can handle a wide range of requirements

Interviewer:

Aggregator service, aggregate based on second, and type of message

Writes to database

Influx DB

Log service - full text search can use elastic search

===

Anna:

How do we show full life cycle

Interviewer:

Later stage how to maintain system

No single point of failure, heartbeat, health

Core system design must work. Non-functional requirement

How to handle over count, under count, exactly count

Exact commit: 2PC

Audience:

Prometheus supports tag

Audience: another architecture graph (based on interviewer):