Google Flights
Materials — open to everyone, no sign-in
Topic: Google Flights
Interviewer: 慢慢失败
Interviewee: MiracleGuardian
Level: L5 (Senior)
Additional Resources:
Sign Up Form:
QRCode
Mock System Design Interview Summary
Interview Overview
Date: 1/14/2022
Target level: L5 - senior
Duration: 45 minutes
Topic covered: Google Flights
Drawing tool used: Whimsical
Requirements
Functional requirements
Interviewee: Customer can go to google website and search for flight (src, destination, date)
Customer click certain flight and the website customer to the booking
(expedia, delta, USA - reservation and payment will be out of scope)
Set alarm for tracking flight ticket price (out of scope)
Interviewee: Customers can filter and sort flights based on price, duration, stops
Interviewer: Customer can filter by stops
Customers can track historical price for a given flight
Interviewer: where do google flight get data? It needs to get data
Interviewee: not part of requirement?
Interviewer: it is part of the requirement
Assumption:
Interviewee: Google is not responsible to create flight data, the SSOT of flight data might come from flight companies or booking companies
Interviewer: We need to consider retrieving data as part of the requirements
Non functional requirements:
Availability
Eventual consistency
Scale
High level design
Interviewer: explain?
Interviewee:
Google flight UI: user entry point to the service
Flight management: query the flight database
Interviewer: Who owns service “third party flight data service”
Interviewee: google
Interviewer: how does it get the data?
Interviewee:We can use pull model
Interviewer: can you expand on the blocks in the graph?
As a design doc, will this be sufficient detail?
Interviewee: we can drill down further
Interviewer: what does the API look like?
Interviewee: Query API: request: src
Interviewer: is it a function call?
Interviewee: can be a rest API
GET flight/list
Request: src: string, dest: string, start date, timestamp, stop: boolean
Response: a list of flight instance
Flight instance
Flight:
Flight instance ID
Start date
End date:
Flight:
Flight id:
Flight company:
Stops:
Duration:
Interviewer: why are there 2 types of flights?
Interviewee:
Flight instance:
Flight:
Flight instance ID
Start date
End date:
Flight:
Flight id:
Flight company:
Stops:
Duration:
There are 2 concepts: flight, and flight instance. Each flight can have multiple flight instances
Interviewer: what happens if there is an internal error
Interviewee: should we differentiate between internal error vs no flight?
Exceptions:
500 internal errors
400 retriable
===
Interviewer: how do you handle 1 stop, 2 stops, 3 stops?
Interviewee: this will be a filter condition
17:00
Stop: int (0, 1, 2, -1)
===
Interviewer: how do you handle this in the backend? What’s the schema of the flight db?
Interviewee:
Database schema:
Flight
flight ID:
Flight company:
Start Time
End time
Departure:
Arrival:
Duration:
Flight instance:
Departure time:
Flight:
Status:
[14:00]
Flight seat
List
Ticket:
Fare:
Type:
RemainNumber:
[11:00]
Interviewer: do we need to update the column when the user reserves or cancels a flight?
Interviewee: yes
Interviewer: how do we calculate the routes?
Interviewee: in real time or offline
Interviewer: If I need to search multi-stop flights, when do you compute it?
Interviewee:
When can pre-compute popular flights
But for rare flights, we can compute in real time.
Interviewer: what’s the criteria?
Interviewee: based on search pattern
[7:20]
Interviewer: talk about integration with 3rd party
Interviewee: 2 ways
Push model:
assume the flight companies can trigger after an update, and push changes to google
When a flight is booked, then it can push google the seat count changes
Pull model
Google can pull every 5-10 minutes using API from 3rd party
[4:30]
Interviewer: Availability and consistency
Interviewee:
No need for strong consistency
For availability, we can use dynamodb or other nosql for scaling
Interviewer: dynamoDB needs key, partition key.
Interviewee DynamoDB:
Hashkey:
Interviewee: for high availability we can put a caching layer
====
===
====
Interviewer feedback
Probably a senior engineer
No hire L5
Weak hire L4
Go to strong senior team
There are 2 flights. Was not able to understand
Colon. Definition
Requirement gathering:
Did not ask scale during requirement gathering
But scale is important
QPS, peak QPS
Reasoning should be supported by numbers
I want the candidate to be independent
Read and write. Gathering data is part of the requirement
System design
Needs more communication during diagraming
I wanted to know more details.
Adding cache
What is the caching strategy? Cache aside, cache through
I tried to drill down more but did not get more details
Need API and schema
I asked the candidate about these points
Database appeared to be SQL DB, but later changed to NoSQL
I tried to ask for deeper and wider but didn’t get the signal for it
Interviewee:
Requirement
Too much time for requirement gathering
Requirement usually doesn’t include the data in my company
Usually not part of BRD
Diagram:
Not detail enough
API: a bit too late. Should have a better design
In request some information is missing
Database schema
DynamoDB schema usually the same as SQL database
NoSQL vs SQL: we should say it ahead of time
Interviewer:
Usually NoSQL has no primary key
Usually we add index
Interviewee:
DynamoDB schema
Interviewer:
Should go through the design
Start from basic requirement, then API, the database schema
Google probably buys streaming API
Biggest problem to call is it’s expensive
Stock trading: each stream costs
First get to working solution
Then go to details
Audience Feedback
Soft
Audience
Sometimes the interviewee didn’t get the concern of the interviewer
Listen to the question of the feedback
Requirement:
Too much typing
book, change, cancel,
High level confirm (users and system)
Q: basic requirement, ask or confirm
Interviewer:
most of the time ask
===
Hard skill:
Interviewer: there is core puzzle of the question
Audience: multiple stop is a unique puzzle
Functional
Nonfunctional
Database
API
Deepdive
Core puzzle: how to quickly compute multiple stops
We didn’t discuss about the scale
How many airports? How do we compute in real time? Should we pre-compute? Tradeoffs?
Core puzzle2: pull, push
A bit too high level
===
Airports 10k
Currently 40k
10M
1M DA
===
Audience
It looks like the data size is small
Can use SQL DB
Graph DB may help
40,000 airports
100,000 flights
10M
===
Leetcode
BFS
DB: lowest price, smallest stops
Using redis is fine
Audience
Rare flights can be in database
Most flights can be from Redis
Interviewee:
We probably can save the recent dates, and bigger airports
Interviewee:
Best to use NoSQL, easier to scale.
Data orchestration
Mismatch of information between NoSQL and Redis
Some information may be missing in cache, such as how many tickets are available.
Interviewer:
Search: origin, destination, start, end
Some database are only for delta
Some database may be based on price
Location as partition key
Interviewer:
This is probably stored in memory
If it’s one stop
We will precompute international 1 stop flights to major cities
1 stop SF to beijing
SF to 200 major cities.
Start, end
Have another pre-computed route table
Hub to hub
Source hub, destination hub, flights available
Branch to smaller cities
Interviewer:
SQL
Audience:
NoSQL supports sharding
SQL
Audience
How to handle sold-out flights
There will be a trigger to remove the flight