Distributed Key-Value Store
Materials — open to everyone, no sign-in
Topic: Distributed Key-Value Store
Interviewer: ken
Level: L5 (Senior)
Additional Resources:
System Design Interview - Distributed Key Value Store
8/6/2024
YouTube for the event:
| | | 职场提升俱乐部 |
[45]
[42]
[ Functional requirements
Design the next generation of distributed KV store that will support services:
authentication, authorization, messaging, etc
Won’t be used to support payment
Range query
Technical implications: B+ tree more ideal than hashtables
Eventual consistency
Technical implication: no need for transactions
Data type: string ]
[ Scaling requirements
5 PB
K/V Scale linearly
Multi data center after 5 years
Latency: Users should have real time experience, <= .5 seconds
Availability: 99.9%
Robustness: user data should not get lost
Technical implications: how to ensure multiple replicas
read/write ratio = 99/1
Technical implication: B+ tree is better than LSM tree
CAP: Availability > Partition > Consistency
Non-SQL
DNS, TCP load balancer
[36]
Partition - using hash
[34:05]
[partition by hash or by range?]
3 replica
Partition:
Based on a hash method
We may support multi-tenancy
[30:45]
3 partition
Q: What’s supported by the zookeeper?
Socket number, TCP server
Primary selection - some machine is down
Q: How to support range query?
Send it to all partitions
Calculate - if the keys are sequential
[may want to clarify the key type]
[25:25]
Q: Alternative?
Calculate the min and max key
[21:31]
Q: how does one node process the range query
Individual nodes
Cache data
Hash
Bloomfilter if the key exists or not
Rocksdb: load the page
[want to understand B+ tree vs LSM tree]
Rocksdb: B+ tree, write ahead log, a big hashtable/b+ tree table
Read-write ratio: to select which
Read » write : rocksdb , ssd, sequential read and sequential write, not in place-write. Append write, compaction
Write » Read: may not rocksdb. Local store
[14:45]
Q: ZooKeeper?
Generation and sequence concept
Primary switch, new generation
Based on generation/sequence number
Which generation/sequence is the biggest, is most the primary
Subscribe to primary
ZooKeeper storage backup
Self status: secondary/primary, currentstatus, primarystatus
[7:30]
[6:41]
meta service
Metadata ping
All meta to proxy service
====
Interviewee:
High level architecture
Range query - fan out
Detail - primary selection - process - not very well