Metrics System (Time series DB)
Topic: Metrics System (Time series DB)
Interviewer: Bryan
Interviewee: Alina
Level: L4 (Experienced Individual Contributor)
Additional Resources:
Metrics System
Mock System Design Interview Summary
Interview Overview
Date: 10/31/2021
Target level: 4
Duration: 1 hour
Topic covered:
Drawing tool used: diagrams.net
Requirements
1M users, 10 writes, 1 read per second
Functional requirements
data collection, CRUD, aggregate, cal, store, query, data visual
Non functional requirements
high available » high consistency
high scalability
high performance
data consistency
Math constrains:
1M
QPS:
write heavy
1M * 20 / 10 ^ 5 = 10 ^ 4
peek:
3 * 10 ^ 4
read:
10 write, 1 read
10 ^ 3 /s
storage:
3 years * 365 * 10 ^ 5 * 10 ^ 4 * 1KB / (1024 * 1024)
bandwidth:
write: 1KB * 3 * 10 ^ 4 = 0.5 M/s
read: 0.05 M/s
memory:
1M * 0.2 * 1KB = 5 * 10 ^ 3 GB
System Design
System design diagram
API
metric.send(userId, eventName, status, timestamp): statusCode
metric.get(eventName, timestamp, range): List
Why is the result? Trying to get a count per day - unit can be 1 day
Customer can change range to 1 hour, 1 day, 1 week, 1 month
Push/Pull: choose push
Data flow:
Writes:
Client -> load balancer -> aggregator service -> message queue (kafka) ->
Log service
No SQL database Elastic search
Visualization service (logstash / self )
Elastic search (index, range query)
Notification service
SQL database (notification, sender/receiver for queries)
PostSQL with replica
Reads:
Client -> LB -> Redis -> read service -> elastic search, sliding window (within the window, what the counts are)
Database schema:
nosql:
{
“index”: eventName,
“timestamp”: time,
“status”: running,
“tenant”: {
“tenant1”, “tenant2”
}
}
(why is there “tenant”?)
SQL:
| NotificationID | sender | receiver | content | status | timestamp |
Discussion:
Interviewer: Go through API and schema
Interviewee:
metric.send(userId, eventName, status, timestamp): statusCode
Interviewer: What is the example?
Interviewee:
metrics.send({
User: userId
eventName: add item,
status: isAdded,
Timestamp: 2021,
})
Interviewer:
Let’s skip log and notification service
Can we see the schema for visualization service.
Interviewee:
Schema is …
Read service will read from elastic search
Interviewer:
How can user see the visualization
Interviewee:
Added Grafana (added between elastic search and read service)
Removed grafana -> the client can get a count based on range, can display its own UI
Interviewer:
Why do we have Redis?
Interviewee:
The client can keep on clicking refresh to read from database
Redis:
{
“Input parameter”: {
“eventName”, “time interval”
}
Count:
}
Interviewer:
Computation in the database is really slow. How do we do real time?
Interviewee:
Can be real time. It’s ok to be slow.
Can I get some hint? For performance improvement.
Interviewer:
If you need to monitor
1M user, all traffic at the beginning. Say within 1 minute, lots of writes
Every 5 seconds: display all traffic in the monitor
Multiple million in each query
Interviewee:
Can use some sampling to estimate to render quickly
Then render a more accurate
Interviewer:
Can we aggregate before we write to the time series database
Can we add time interval as part of the key
Interviewee:
Add a temporary storage for aggregation
Interviewer:
Kafka can aggregate
Other services can also aggregate
Before you save to elastic search
Interviewer
If our payload is huge, what can we transport the data to the database
Interviewee
Can split the load into small files
One worker is 1kb
Every second 1M records
Interviewee:
Can hash and send to different instances of aggregator
Interviewer:
Example
Interviewee
Missing data. Monitor -> wait for 2 days and there are no events. Can ask the client to resent.
Validation of data
Interviewer:
Walk through the whole design
Send request
Aggregator: missing data, validation
Kafka: push to different consumer
Log service: append to log entry
Visualization: can aggregate and write to elastic search
Notification: may or may not need
Read:
Request
Redis check cache. Return result if cache hit
Redis talk to read service if cache miss. Read service read from elastic search
UI keeps on pulling
Handle scale up:
Database sharding. Replica to make service highly available. Sharding: time interval + event key
Kafka, elastic search, read service, can all have multiple instances
Discussions during the Interview
Interviewer and Audience Feedback after the Interview
Interviewer:
Interviewee is nervous
I waited for interviewee to finish
Chat window: expressed a lot of prepared material
However should pause and ask for requirements
Spend too much time upfront
Interviewer:
Design share screen
Our requests are big
Aggregator service. Event can be aggregated at the aggregator
Partition, event name, time interval.
Regardless time interval, can make it a batch
Can aggregate 5 seconds
Currently individual records are sent to kafka directly
Load in frontend is heavy
Kafka is heavily loaded
Range query will need to visit the whole database
Slow
Metrics monitoring is close to real time
Calculation in database is slow, hard to become real time
Every time UI will need to visit the database
Time series database:
Header - file type
Table split into blocks. Log structure merge stream, append only.
Reading is relatively slow
Recommend to do aggregation ahead of time
For example aggregate within 1 second
Time series database, CRUD, compaction
Small chunks at a time
Then will merge
Cache use
Redis as cache
There is a monitor, so why does it need Redis to cache
ELK: elastic search, kabana - can read and provide close to real time visualization. It’s not through the client side.
It feeds the component directly
Hard skill:
Read more paper
Soft skill:
Try to get more feedback from interviewer
Too nervous
When I ask interviewee is stuck
Can practice more.
===
Audience
Kaisi:
Metrics design:
CPU/hardware monitoring
Application level monitoring
Direct answer missing? Is the UI querying the database continuously?
Kibana can work
Or direct
Audience
Anna: my team is using kafka, elastic search,
Every log needs to aggregate
What to use as partition key in Kafka
How to accelerate reading
Elastic search: how do I save?
Index: similar to DB table
How to define table schema
Interviewer:
I asked interviewee about schema. We probably can’t get answer
Partition key: Log, metrics, notification. Can use partition.
If we don’t use partition. 3 services may read the same record/log
Metrics: time-range, aggregation count, event name. Reduction of volume.
Using partition: can allow slow process keep running. Faster process
Elastic search. Full text search.
Metrics monitoring: Influx db is fine.
Prometheus (google)
If we really want complete log, and we want to full text search. Answer: don’t do it.
If you need it, then use ELK stack
Anna: should not use elastic search
Interviewer: Usually interviewer will not deny your choice
Anna: should we choose tool first? DB/cache
Bryan: queuing system. If interviewer doesn’t know the system, try to use generic terms instead of specific technology Kafka
Push vs Pull. Pros and cons. Why do we use push?
This will increase your score. Try to cover choices and rationales.
Audience: 1 million user. Very heavy traffic. API send metrics may be too slow.
Can we first write to log, then batch process log
Interviewer:
aggregation service. Aggregate, then classify. Do some aggregation (small, big time range), do it as a batch. Smaller latency.
Can do finer aggregation before writing.
Client side is rare to issue read requests. We may not need redis
Main bottleneck is at write side.
If there are 3 services, then there may be time skew between 3 services.
Most of the time, interviewer will write down redflags. If the interviewer asks multiple times, then there must be something wrong.
Audience:
Metrics circular dependency
Kafka can go down, then metrics system can go down.
Interviewer:
If there are some part of Kafka is down,
3 replica. 1 is primary. When primary replica is down, then fail over to other replica
Audience:
Write heavy. If read heavy, elastic search already contains read cache.
Audience:
Everyday should have a partition
Interviewer:
Time interval as key. Bloom filter. Different range. Can do a fast filter to only
Inverse index. Make a
If it’s based on day, can partition/save by
Audience:
How to handle tag (e.g. based on location)
Audience:
Add tag as part of the key. Can do a bloom filter
Audience
5 tags, 3 values. 3 ** 5. Lots of tags/partitions
Yes we can use this optimize.
Kaisi: Audience
Audience:
If granularity is 1 minute, 1 hour, 1 day, then we need 3 tables
Table 1:
Every record covers 1 minute
Table 2:
Every record covers 1 hour
Table 3:
Every record covers 1 day
Interviewer:
Granularity should depend on business requirements.
Interviewer:
Async: scanner
One day: need to wait for the whole day
Every hour compact minutes to hour
Every day compact hour to day
Audience
Aggregation service upfront is mainly used to reduce QPS
Eric Che
TSDB has already implemented all of these
Interviewer:
From system design, it’s not driven based on DB. It’s based on our requirement
Audience
Aggregator, time series DB
Audience
If there is a TSDB, we can directly send record to TSDB
Interviewer:
Using Kafka can handle a wide range of requirements
Interviewer:
Aggregator service, aggregate based on second, and type of message
Writes to database
Influx DB
Log service - full text search can use elastic search
===
Anna:
How do we show full life cycle
Interviewer:
Later stage how to maintain system
No single point of failure, heartbeat, health
Core system design must work. Non-functional requirement
How to handle over count, under count, exactly count
Exact commit: 2PC
Audience:
Prometheus supports tag
Audience: another architecture graph (based on interviewer):