Google Flights

System DesignE-commerce & PaymentsSearch & Indexing

Materials — open to everyone, no sign-in

Topic: Google Flights

Interviewer: 慢慢失败

Interviewee: MiracleGuardian

Level: L5 (Senior)

Additional Resources:


Sign Up Form:

QRCode

Mock System Design Interview Summary

Interview Overview

Date: 1/14/2022

Target level: L5 - senior

Duration: 45 minutes

Topic covered: Google Flights

Drawing tool used: Whimsical

Requirements

Functional requirements

Interviewee: Customer can go to google website and search for flight (src, destination, date)

Customer click certain flight and the website customer to the booking

(expedia, delta, USA - reservation and payment will be out of scope)

Set alarm for tracking flight ticket price (out of scope)

Interviewee: Customers can filter and sort flights based on price, duration, stops

Interviewer: Customer can filter by stops

Customers can track historical price for a given flight

Interviewer: where do google flight get data? It needs to get data

Interviewee: not part of requirement?

Interviewer: it is part of the requirement

Assumption:

Interviewee: Google is not responsible to create flight data, the SSOT of flight data might come from flight companies or booking companies

Interviewer: We need to consider retrieving data as part of the requirements

Non functional requirements:

Availability

Eventual consistency

Scale

High level design

Interviewer: explain?

Interviewee:

Google flight UI: user entry point to the service

Flight management: query the flight database

Interviewer: Who owns service “third party flight data service”

Interviewee: google

Interviewer: how does it get the data?

Interviewee:We can use pull model

Interviewer: can you expand on the blocks in the graph?

As a design doc, will this be sufficient detail?

Interviewee: we can drill down further

Interviewer: what does the API look like?

Interviewee: Query API: request: src

Interviewer: is it a function call?

Interviewee: can be a rest API

GET flight/list

Request: src: string, dest: string, start date, timestamp, stop: boolean

Response: a list of flight instance

Flight instance

Flight:

Flight instance ID

Start date

End date:

Flight:

Flight id:

Flight company:

Stops:

Duration:

Interviewer: why are there 2 types of flights?

Interviewee:

Flight instance:

Flight:

Flight instance ID

Start date

End date:

Flight:

Flight id:

Flight company:

Stops:

Duration:

There are 2 concepts: flight, and flight instance. Each flight can have multiple flight instances

Interviewer: what happens if there is an internal error

Interviewee: should we differentiate between internal error vs no flight?

Exceptions:

  1. 500 internal errors

  2. 400 retriable

===

Interviewer: how do you handle 1 stop, 2 stops, 3 stops?

Interviewee: this will be a filter condition

17:00

Stop: int (0, 1, 2, -1)

===

Interviewer: how do you handle this in the backend? What’s the schema of the flight db?

Interviewee:

Database schema:

Flight

flight ID:

Flight company:

Start Time

End time

Departure:

Arrival:

Duration:

Flight instance:

Departure time:

Flight:

Status:

[14:00]

Flight seat

List tickets

Ticket:

Fare:

Type:

RemainNumber:

[11:00]

Interviewer: do we need to update the column when the user reserves or cancels a flight?

Interviewee: yes

Interviewer: how do we calculate the routes?

Interviewee: in real time or offline

Interviewer: If I need to search multi-stop flights, when do you compute it?

Interviewee:

When can pre-compute popular flights

But for rare flights, we can compute in real time.

Interviewer: what’s the criteria?

Interviewee: based on search pattern

[7:20]

Interviewer: talk about integration with 3rd party

Interviewee: 2 ways

Push model:

assume the flight companies can trigger after an update, and push changes to google

When a flight is booked, then it can push google the seat count changes

Pull model

Google can pull every 5-10 minutes using API from 3rd party

[4:30]

Interviewer: Availability and consistency

Interviewee:

No need for strong consistency

For availability, we can use dynamodb or other nosql for scaling

Interviewer: dynamoDB needs key, partition key.

Interviewee DynamoDB:

Hashkey:

Interviewee: for high availability we can put a caching layer

====

===

====

Interviewer feedback

Probably a senior engineer

No hire L5

Weak hire L4

Go to strong senior team

There are 2 flights. Was not able to understand

Colon. Definition

Requirement gathering:

Did not ask scale during requirement gathering

But scale is important

QPS, peak QPS

Reasoning should be supported by numbers

I want the candidate to be independent

Read and write. Gathering data is part of the requirement

System design

Needs more communication during diagraming

I wanted to know more details.

Adding cache

What is the caching strategy? Cache aside, cache through

I tried to drill down more but did not get more details

Need API and schema

I asked the candidate about these points

Database appeared to be SQL DB, but later changed to NoSQL

I tried to ask for deeper and wider but didn’t get the signal for it

Interviewee:

Requirement

Too much time for requirement gathering

Requirement usually doesn’t include the data in my company

Usually not part of BRD

Diagram:

Not detail enough

API: a bit too late. Should have a better design

In request some information is missing

Database schema

DynamoDB schema usually the same as SQL database

NoSQL vs SQL: we should say it ahead of time

Interviewer:

Usually NoSQL has no primary key

Usually we add index

Interviewee:

DynamoDB schema

Interviewer:

Should go through the design

Start from basic requirement, then API, the database schema

Google probably buys streaming API

Biggest problem to call is it’s expensive

Stock trading: each stream costs

First get to working solution

Then go to details

Audience Feedback

Soft

Audience

Sometimes the interviewee didn’t get the concern of the interviewer

Listen to the question of the feedback

Requirement:

Too much typing

book, change, cancel,

High level confirm (users and system)

Q: basic requirement, ask or confirm

Interviewer:

most of the time ask

===

Hard skill:

Interviewer: there is core puzzle of the question

Audience: multiple stop is a unique puzzle

Functional

Nonfunctional

Database

API

Deepdive

Core puzzle: how to quickly compute multiple stops

We didn’t discuss about the scale

How many airports? How do we compute in real time? Should we pre-compute? Tradeoffs?

Core puzzle2: pull, push

A bit too high level

===

Airports 10k

Currently 40k

10M

1M DA

===

Audience

It looks like the data size is small

Can use SQL DB

Graph DB may help

40,000 airports

100,000 flights

10M

===

Leetcode

BFS

DB: lowest price, smallest stops

Using redis is fine

Audience

Rare flights can be in database

Most flights can be from Redis

Interviewee:

We probably can save the recent dates, and bigger airports

Interviewee:

Best to use NoSQL, easier to scale.

Data orchestration

Mismatch of information between NoSQL and Redis

Some information may be missing in cache, such as how many tickets are available.

Interviewer:

Search: origin, destination, start, end

Some database are only for delta

Some database may be based on price

Location as partition key

Interviewer:

This is probably stored in memory

If it’s one stop

We will precompute international 1 stop flights to major cities

1 stop SF to beijing

SF to 200 major cities.

Start, end

Have another pre-computed route table

Hub to hub

Source hub, destination hub, flights available

Branch to smaller cities

Interviewer:

SQL

Audience:

NoSQL supports sharding

SQL

Audience

How to handle sold-out flights

There will be a trigger to remove the flight