Voting System
Topic: Voting System
Interviewer: jack
Interviewee: @Marco Y
Level: L5 (Senior)
QR Code
Registration:
YouTube channel
https://www.youtube.com/channel/UCKzpuki3fHTfCCngJDCZ_Mg
Mock System Design Interview Summary
Interview Overview
Date: 9/18/2022
Target level: ?
Duration: 45 minutes
Topic covered: Voting for talent show
Drawing tool used: Whimsical
Requirements
Functional requirements
Vote for a talent show
24 hours to vote
Query voting result anytime before or after the voting time
Vote from phone, web, mobile
Registered users only
VP user can cast 10 votes
Regular user - 1 vote only up to three actors
VIP users can cast 10 votes up to three action
Regular users - 1 vote only up to three actors
Data retention - keep for 1 month
Scaling requirements
Number of users: 1B voters
VIP users: 1k
Low latency
Highly available
Highly consistent
Scalable: support up to 1B
Estimation
Estimate:
QPS 1b / 10^5 = 10k per second
Storage:
1b * 1k ~= 1 TB
Need to cons
System Design
https://whimsical.com/voting-system-demo-TxAGpeHtAr3D5FB5jo7t4a
Database schema
User(name, email, dob, phone, isVIP, creation_date)
constraint(conntestantId, name, dob, creation_date)
Score(constestantId, voteCount)
Voter(contestantId, voteCount)
Q: How do we rollback a change if we detected fraud?
We can rely on voter ID
We can use sorted set in redis to support top voted contestants
===
APIs
POST /api/v1/vote
req(api_key, constestantId, voteCount)
req(200 -OK)
GET /api/v1/score
req (api_key, size, sortorder(top)}
rep(list of contestant meet the criteria in JSON)
Q: Can you handle 1 billion users?
A: throughput = 10k per sec
If one machine can handle 1k requests per second
-> 10 machines
We need multiple web servers and vote servers
Add slaves for redis to handle the reads
One server may not be able to handle such high throughput, we can partition Redis
Shard key: contestantID -> hashID
Autoscaling: shard by hashkey
Transaction
Lua script
WATCH/MULTI/EXEC
How to handle fraud? Some users may have click frauds
Security - 2FA
Username + password + token
Close the vote
App server should validate the timestamp
= 24 hours, discard the vote
Continue process another 30 minutes to those in-flight votes
Q from interviewee: should I cut the voting by 24 hours
A: how do you cut it?
Interviewer and Audience Feedback
Interviewer Feedback
Good answers
Complete
Handled different type of abuse, and choice of database
The last part to cut off through a checkpoint.
Strong knowledge
Interviewee Feedback
Did some research
Research Redis
Expressed what I prepared
Need to improve skills, such as calculating the number of machines
There are some special knowledge needed. Need to improve
Design from interviewer
Interviewer: my design may not be as good as interviewee
Audience: how does the system aggregate the data into database
There should be an application to aggregate
Audience: Lua script may not be fast enough. MQ can aggregate
Interviewee: it’s hard to cutoff. MQ may be slow
Audience: we can use spark, microbatch
Interviewer: can read from Alex Xu book 2: realtime gaming leaderboard
Audience: if the network is bad, how to handle overvoting or missing voting?
Interviewer: this should be handled by frontend
Audience: how to handle fake votes?
Interviewer: IP address filter
Audience: in your design, there are two arrows to redis
Interviewer: Redis contains aggregate vote, and individual vote
Audience: if there are big amount of votes, and TTL is 30 seconds. Will we see the updates after 30 seconds?
Interviewer: worker will update redis before 30 seconds
Audience: how does your design dedupe multiple votes from the same user?
Interviewer: it’s not handled by this design
Audience: 1 machine handles 1k queries. One machine can handle 60 queries
Interviewer: Redis processes in memory and is very fast. SQL persists to disk so it’s slower.
Audience: tomcat, jetty, 500 QPS. 1 second to process a request. Then 500 QPS
500 threads by default. You can increase or decrease the thread count
CPU - a few hundred ms. 1 core .1 seconds. 10 requests per second per core
32 core - 300 requests per second
Interviewee: do we need to load test?
Audience: yes
Audience: Redis as a storage. - data retention for one month. Do we need to persist?
Interviewee: Redis can persist: capture snapshot. Append only file. It’s different from memcache
Audience: how do we aggregate?
Interviewee: we don’t have an aggregator. We can have a python code segment; it reads the count from a sorted set.
Audience: do we need to aggregate from different machines?
Interviewee: we have multiple machines, each for each shard. Aggregation can be handled by app server.
Audience: there are so many app servers.
Interviewee: every shard has its own counter. We can aggregate data across different machines. During this aggregation we don’t need to touch the counter.
Audience: do we need to read from multiple servers for every read query?
Interviewee: the same contestant always goes to the same partition.
Audience: there may be a hot partition if there is a popular contestant
Interviewee: hard to handle
Audience: you can use map-reduce
Audience: how to implement auto scaling?
Interviewee: AWS EC2 instance. We deploy applications. When there is a traffic spike, the system will generate another instance of the same server. The load balancer can route to the old and new instance. It’s a common feature in cloud platform
Audience: hot partition. Read can go to a slave, but write always goes to master
Audience: We don’t need a LUA script.
Audience: we need to ensure every user vote 3 times. If there are 2 devices, it will be problematic
Rate limiter. Every user can only vote one time every second.
Audience: can use saga
Audience: Redis can handle 100k per second. We probably don’t need to worry about partition
Audience: how do we get the final score?
Interviewee: we can call the GET API to get the scores from different partitions.