Ticketmaster
Topic: Ticketmaster
Interviewer: xiang
Interviewee: becky
Level: L5 (Senior)
Mock System Design Interview Summary
Interview Overview
Date: 11/14/2021
Target level: L4 or low L5
Duration: 1 hour
Topic covered:
Drawing tool used: https://whimsical.com/LMjh729zgs43BaxYWoKM4T
Requirements
Functional requirements
Ticketmaster
A large ticket sales event will start at a pre-set time
All tickets are the same (no difference in seating or content)
Non-functional requirements
100k tickets to sell
10M people want to buy it
P0: buy ticket, cap = 2 tickets / user, all tickets the same
P0: buyer can see order status
P0: 1 hour session timeout for order
P3: view tickets
Audience comment in Zoom:
Non-functional requirements missing
System Design
API
POST: /v1/tickets?q=number_items
Resp: {status: "", callback: url}
status values: fail, pending
POST: /v1/tickets/payment
Resp: {status: }
status values: accepted, rejected, timeout
Interviewer:
view tickets
Interviewee:
P3
Interviewer:
P0: get order status
Interviewee:
GET: /v1/orders
GET: /v1/orders/orderID
Database schema:
Ticket:
Uid: pk
Item: varchar
Total_quantity: int
Available_quantity: int
Price: int
Order:
Uid
User_id
Payment_ref
Status
Creation_ts (can be used to cancel)
Quantity
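The two tables above could be written out as DDL. The following is only a sketch: it uses Python's built-in sqlite3 as a stand-in for MySQL/PostgreSQL, names the order table `orders` because ORDER is a reserved word, and assumes column types for the Order fields (the notes give none). The CHECK constraint and the price unit are also assumptions.

```python
import sqlite3

# Assumed DDL for the Ticket and Order tables from the notes.
SCHEMA = """
CREATE TABLE ticket (
    uid                INTEGER PRIMARY KEY,
    item               VARCHAR(255) NOT NULL,
    total_quantity     INTEGER NOT NULL,
    -- assumption: the DB itself refuses to oversell
    available_quantity INTEGER NOT NULL CHECK (available_quantity >= 0),
    price              INTEGER NOT NULL  -- assumption: smallest currency unit
);
CREATE TABLE orders (
    uid         INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    payment_ref TEXT,
    status      TEXT NOT NULL,
    creation_ts INTEGER NOT NULL,  -- used to expire unpaid orders
    quantity    INTEGER NOT NULL
);
"""

def create_schema(conn):
    """Create both tables on the given connection."""
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")
create_schema(conn)
# All tickets are the same, so one row describes the whole event.
conn.execute("INSERT INTO ticket VALUES (1, 'event', 100000, 100000, 5000)")
```

With all tickets identical, the ticket table holds a single row, which matches the later point in the discussion that the table is very small.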
System design diagram
Discussion
Use MySQL or PostgreSQL
Reason: relational, strong consistency guarantee needed
Discussion:
Interviewer:
10M users will visit the website
Will this cause a problem?
Interviewee:
Not many rows in the ticket DB
Only one row in the ticket DB to indicate
The ticket table will be very small, so the DB can handle it
However, if there are many types of tickets to sell, we can add a cache layer between the service and the DB, e.g. Redis or another in-memory DB
For placing an order, we need to go to MySQL
1. User visits the website
2. API gateway sends the request to the ticket service
3.1 Ticket service checks the available ticket count in Redis
3.2 Ticket service places the order with the order service
3.3 Order service updates the ticket count in the DB
4. Order service responds with success/failure
- API gateway sends request to
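Steps 3.2 to 4 can be sketched as one database transaction: a conditional UPDATE that only decrements the count when enough tickets remain, followed by the order insert. This is a minimal sketch rather than the design from the interview. It uses sqlite3 in place of MySQL, the status names 'pending' and 'paid' are assumptions, and `place_order` is a hypothetical helper.

```python
import sqlite3
import time

MAX_PER_USER = 2  # P0 requirement: cap of 2 tickets per user

def place_order(conn, user_id, quantity, ticket_uid=1, now=None):
    """Reserve tickets and insert a pending order; return the order uid or None."""
    now = int(time.time()) if now is None else now
    with conn:  # commits on success, rolls back on an exception
        # Note: with concurrent connections this cap check would also need
        # a row lock or a unique constraint; that is out of scope here.
        held = conn.execute(
            "SELECT COALESCE(SUM(quantity), 0) FROM orders "
            "WHERE user_id = ? AND status IN ('pending', 'paid')",
            (user_id,),
        ).fetchone()[0]
        if quantity < 1 or held + quantity > MAX_PER_USER:
            return None
        # The WHERE clause makes the decrement conditional, so the count
        # can never go below zero even under contention.
        cur = conn.execute(
            "UPDATE ticket SET available_quantity = available_quantity - ? "
            "WHERE uid = ? AND available_quantity >= ?",
            (quantity, ticket_uid, quantity),
        )
        if cur.rowcount == 0:
            return None  # sold out, or not enough tickets left
        cur = conn.execute(
            "INSERT INTO orders (user_id, status, creation_ts, quantity) "
            "VALUES (?, 'pending', ?, ?)",
            (user_id, now, quantity),
        )
        return cur.lastrowid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ticket (uid INTEGER PRIMARY KEY, available_quantity INTEGER NOT NULL);
CREATE TABLE orders (uid INTEGER PRIMARY KEY, user_id INTEGER, status TEXT,
                     creation_ts INTEGER, quantity INTEGER);
INSERT INTO ticket VALUES (1, 3);
""")
```

Storing `creation_ts` on the pending order is what later allows an unpaid reservation to be cancelled after the one-hour timeout.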
Discussion
Interviewer:
500k users want to buy tickets within 1 second
Interviewee:
Multiple replicas of the API gateway, each handling 50k
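The replica count implied by this answer is simple arithmetic. A small sketch, where `replicas_needed` is a hypothetical helper and the 50k figure is taken as requests per second per replica (an assumption; the notes do not state the unit):

```python
import math

def replicas_needed(peak_rps, per_replica_rps):
    """Smallest number of replicas whose combined capacity covers the peak."""
    return math.ceil(peak_rps / per_replica_rps)

gateways = replicas_needed(500_000, 50_000)
print(gateways)  # 10
```

In practice some headroom above this minimum would be added for failures and uneven load balancing.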
Interviewer:
What is in the cache?
Interviewee:
The remaining ticket count, updated by a batch job
Interviewer:
How often do you update the cache?
Interviewee:
10 to 30 seconds
Interviewer:
The cache may show tickets as available when they are actually not available
Interviewee:
Reason for the batch job:
After 1 hour, some orders expire and we should update the ticket database to replenish the available count
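The replenishing batch job described here might look like the sketch below. It assumes the single-row ticket table and the `creation_ts` column from the schema, uses sqlite3 in place of MySQL, and reuses the 'timeout' status from the payment API for expired orders. `expire_stale_orders` is a hypothetical name.

```python
import sqlite3

ORDER_TTL_SECONDS = 3600  # P0 requirement: 1 hour session timeout

def expire_stale_orders(conn, now, ticket_uid=1):
    """Time out unpaid orders older than the TTL; return the tickets released."""
    cutoff = now - ORDER_TTL_SECONDS
    with conn:  # one transaction, so the count and the orders stay consistent
        released = conn.execute(
            "SELECT COALESCE(SUM(quantity), 0) FROM orders "
            "WHERE status = 'pending' AND creation_ts <= ?",
            (cutoff,),
        ).fetchone()[0]
        conn.execute(
            "UPDATE orders SET status = 'timeout' "
            "WHERE status = 'pending' AND creation_ts <= ?",
            (cutoff,),
        )
        # Return the expired reservations to the pool.
        conn.execute(
            "UPDATE ticket SET available_quantity = available_quantity + ? "
            "WHERE uid = ?",
            (released, ticket_uid),
        )
    return released

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ticket (uid INTEGER PRIMARY KEY, available_quantity INTEGER NOT NULL);
CREATE TABLE orders (uid INTEGER PRIMARY KEY, user_id INTEGER, status TEXT,
                     creation_ts INTEGER, quantity INTEGER);
INSERT INTO ticket VALUES (1, 0);
INSERT INTO orders (user_id, status, creation_ts, quantity) VALUES
    (1, 'pending', 0, 2), (2, 'pending', 3000, 1), (3, 'paid', 0, 2);
""")
```

Only the pending order created at least an hour before `now` is released; paid orders and fresher reservations are left alone.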
Interviewer:
You update at 10 seconds frequency. The first second the
Interviewer:
Who updates the available quantity?
Interviewee:
When a user places an order, the quantity is updated
Interviewee:
Session management service is needed
All tables are in the MySQL DB
Redis table
ticketID: available_q
Probably don’t need cache service
Interviewer:
Why copy the data to Redis? Why not just MySQL?
Interviewee:
After the event is sold out, we don’t need to keep the table in Redis
Interviewer:
Want to maintain the order of the requests
Want to make sure requests are handled in the order they arrive
Interviewee:
We can do something very close to first-in, first-out
It cannot be exact, due to uncertainty in network delay
For strict first-in, first-out, we need a single thread to process the requests
We can put some queuing service after the database
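The single-threaded, first-in-first-out processing mentioned above can be illustrated with an in-process queue. This is a sketch only: `queue.Queue` stands in for whatever queuing service would be used, and `run_fifo_worker` is a hypothetical name.

```python
import queue
import threading

def run_fifo_worker(requests, available):
    """Process (user_id, quantity) requests strictly in arrival order."""
    results = {}
    q = queue.Queue()  # FIFO: items come out in the order they were put in

    def worker():
        nonlocal available
        while True:
            item = q.get()
            if item is None:  # sentinel: no more requests
                break
            user_id, quantity = item
            # Only this one thread touches `available`, so no lock is needed.
            if quantity <= available:
                available -= quantity
                results[user_id] = "accepted"
            else:
                results[user_id] = "rejected"

    t = threading.Thread(target=worker)
    t.start()
    for request in requests:
        q.put(request)
    q.put(None)
    t.join()
    return results, available

results, left = run_fifo_worker([("a", 2), ("b", 2), ("c", 1)], available=3)
```

The order is only as strict as the order in which requests reach the queue, which is the network-delay caveat raised in the answer.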
Discussions during the Interview
Requirements:
Are all tickets the same?
Interviewer and Audience Feedback after the Interview
Soft skill
312
321
Hard skill
21
Interviewer:
Non-functional requirements: which part is the most difficult?
CAP: which property should we emphasize the most?
At every checkpoint (every 3-5 minutes), ask for feedback
Hard skill:
I am a bit confused about the design
Before going into detail, what is the high-level design?
E.g. I want to put all data into MySQL. It may not be ideal to directly hit MySQL.
Redis is correct. Cache - not good
10 seconds to update the cache, may not be good since too much changes for
Audience
Redis: if we write from Redis to hard disk, it will slow down performance
Audience
Redis is distributed. Don’t store a count
Every ticket is an entry in Redis
Audience
Key is sequential.
100 tickets - 100 entries
How does the service know which ticket is where?
Ticket table:
Every type of ticket
Interviewer: all tickets are the same
Audience:
Every ticket is one entry
Interviewee:
You need to scan 5M entries
Some tickets are reserved, available or bought
From order table
Audience:
Do we need ticket ID or are they always the same?
Interviewee:
In the initial design every ticket is different
Interviewer:
How do we handle the huge number of requests?
How do we ensure the quantity is accurate?
50k - how to distribute to 5M people?
Handle 50k requests
Audience:
You may want to throttle at the API
We may just limit the count.
There may be abuse from ticket-grabbing systems
Not everyone may buy?
If they buy and there is no abuse, then we can throttle at the API
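The discussion does not name a throttling algorithm. A token bucket is one common choice, sketched below with an injected clock so the behaviour is deterministic. The class name and parameters are illustrative assumptions.

```python
class TokenBucket:
    """Allow up to `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full
        self.last = now           # time of the last refill

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2, capacity=2)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0),
             bucket.allow(0.5), bucket.allow(0.5)]
```

Requests rejected at the gateway never reach the ticket or order services, which is the point of throttling this early in the flow.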
Interviewer:
Accurate number?
$ is saved in the SQL database
However, the throughput is too high
Need to find some method to handle the high throughput
Audience
When reading, you can read dirty data
However, when you place order, you need to be more sequential
Interviewer:
3-5M requests in 10 seconds
Placing an order needs to succeed
Interviewee:
Redis: update quantity
Order service: insert order row
Audience
3.2: put request in the queue
Audience
What happens if Redis crashes?
Audience
MySQL usually crashes before Redis
Can use cluster
Interviewee
Source of truth is in MySQL
Sum of status paid
Redis is just a cache
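If MySQL is the source of truth, the cached count can always be rebuilt from the order table after a Redis crash. A minimal sketch with sqlite3 in place of MySQL and a plain dict in place of Redis. Counting 'pending' reservations as held in addition to 'paid' is an assumption beyond the "sum of status paid" in the notes, so it is left as a parameter.

```python
import sqlite3

def rebuild_available(conn, total_quantity, holding_statuses=("paid", "pending")):
    """Recompute the available count from orders that still hold tickets."""
    placeholders = ",".join("?" for _ in holding_statuses)
    held = conn.execute(
        "SELECT COALESCE(SUM(quantity), 0) FROM orders "
        f"WHERE status IN ({placeholders})",
        tuple(holding_statuses),
    ).fetchone()[0]
    return total_quantity - held

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (uid INTEGER PRIMARY KEY, status TEXT, quantity INTEGER);
INSERT INTO orders (status, quantity) VALUES
    ('paid', 2), ('pending', 1), ('timeout', 2);
""")

cache = {}  # stand-in for the Redis key ticketID -> available count
cache["ticket:1"] = rebuild_available(conn, total_quantity=10)
```

Because the cache can be regenerated this way, losing it costs a recomputation rather than any order data.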
Audience
Ticket and order are different systems
You need 2PC between Redis and the DB
Interviewee:
Adding lock for one row, same for multiple rows
Audience
First read Redis
Update the DB, then immediately update Redis
Audience
Data entry is small
Can we cache count in API gateway?
Interviewee:
Initial design uses ticket as cache
Audience
Redis: atomic write
Redis: server cluster
For distributed for
There may be bursts of writes
Concurrency issue for burst
Reading: you may read dirty
However, when you write you can
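The atomic-write point can be illustrated without a Redis server. The sketch below uses a lock-protected in-memory counter as a stand-in; in real Redis a plain DECRBY is atomic but does not stop at zero, so a check-and-decrement would typically need a Lua script or a WATCH/MULTI transaction. `TicketCounter` and `simulate` are hypothetical names.

```python
import threading

class TicketCounter:
    """In-memory stand-in for an atomic check-and-decrement on one key."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def reserve(self, quantity):
        """Atomically take `quantity` tickets; return False if too few remain."""
        with self._lock:  # check and decrement happen as one step
            if self._value >= quantity:
                self._value -= quantity
                return True
            return False

    @property
    def value(self):
        with self._lock:
            return self._value

def simulate(buyers, stock):
    """Run `buyers` concurrent one-ticket purchases against `stock` tickets."""
    counter = TicketCounter(stock)
    wins = []

    def buyer():
        if counter.reserve(1):
            wins.append(1)

    threads = [threading.Thread(target=buyer) for _ in range(buyers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(wins), counter.value
```

Calling `simulate(50, 20)` lets exactly 20 buyers succeed, whichever threads get there first, and the count never goes negative during the burst.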
Audience
3.1 for reservation
3.2 is for placing order
Audience:
Ticket service may crash after updating Redis
Then the MQ may not have the request to buy the ticket
Ticket service crashes, we
Interviewee
Need 2PC
Audience
Should update DB first before updating cache
So we should not update cache first
Audience
Whoever pays first will get the ticket
After buying the ticket, you need to compete to pay, bad experience
Payment must succeed, if reservation is
Need to reserve for one hour
How do we reserve for one hour?
Reduce the available count
Interviewee:
Session management
Scan, and unlock after 1 hour
Audience:
What happens if we update MySQL, and then crash before updating Redis?
Redis will not have the most up-to-date value
Protect MySQL with a queue: 3.2 and the unlabeled arrow into the order service
Key point is to have a cache for high throughput read
Redis is a buffer
Interviewee:
Direct buy button, small scope in the flow
===
Audience
Life cycle
Where do we get the tickets?
Seller
Monitoring, reporting, lifecycle
1 billion requests: we can throttle at API gateway
1 million within one minute
Separate the hot tickets to a different set of servers
You may have a separate SQL database table
How do we increase
Audience
It’s very specific
After you finish the design
At estimate stage
Sometimes the interview starts at a small scale
It may be hard to extend the design to scale up
Try to design an MVP. If there is time, extend
Interviewer