TinyURL (used)

System DesignDatabases & Storage

Topic: TinyURL (used)

Interviewer: jiayan

Interviewee: 靖楓

Level: L4 (Experienced Individual Contributor)


Topic

Mock System Design Interview Summary

Interview Overview

Date: 5/5/2022

Target level: L4

Duration: 45 minutes

Topic covered: Tiny URL

Drawing tool used: excalidraw

Key problem vote: ?

Requirements

Functional requirements

Long URL=> short URL

Each long URL should only have one short URL

Length as short as possible

Characters: upper and lower case + numbers

Non functional requirements

Generate 100M per day

100 bytes - long URL

90 bytes - long URL

10 bytes - short URL

100 bytes * 100M * 365 * 5 = 8 TB

100M * 365 * 5 = ~200,000M

Much heavier read than write

System Design

System design

NoSQL:

Key-value pair

Easier to scale, horizontal scale

Easy lookup

Range queries

Hint: Clarifying requirement, if the user types a short URL, the system should return the long URL

Hint: can create 2 tables, long URL maps to short URL, short URL maps to long URL

One write server, multiple read servers

External APIs

Two APIs:

POST API for generating short URL from long URL

GET API for translating short URL into long URL

Hint Assume short URL is 64 bits

Q: Two different long URLs may have the same hash collision (map to the same hash code)

How do you make sure we generate unique URLs?

A: when the hash function returns a value, we check the value in the DB. If the value exists in DB, then attach a random value to the end, and try again

Hash_function(timestamp + long URL)

First 10 characters, sliding window, until the value is unique

Q: Separate service for write and get?

A: yes, we have a lot writes than reads

Add LB to send traffic to read server

Q: is web server same as web browser?

Add DB

Q: let’s worry about the traffic

We can have a cache value to sort the most recently queried short URLs

Q: Is the cache a browser side cache?

A: yes

APIs,

write: POST /api/v1/generate

Original URL

Read: GET /api/v1/get_url/param short_url

Read: GET

https://generateurl.com/{short_url}

How does redirect works?

Hint: should return the right status code 301 for redirect

Q: Read node is 100 times more than the write node

Will you put 100 more read node?

Q: How does database handle read service? Each database have a limitation for handling read requests. How do you handle read requests?

A: we have more than 1 database. Depends on how many database we have, one database will handle the writes (master) and multiple databases will handle the reads (slaves)

We will use eventual consistency to propagate the mapping to the read replicas

Q: what do you want to add to make your system more robust?

A: add cache

Q: if cache space runs out, how do we handle it?

A: we can evict data from cache

Q: what else can you add to make it more robust?

A: _

Q: how do you avoid hackers to generate random short URL to attack the system? Your system will need to go to database to check.

A: we can add rate-limiter on the client

Algorithms: can use sliding window, or use token bucket to drop requests based on limit

Interviewer and Audience Feedback

Interviewer:

Requirement, communication: soft skill is good

Hard skill: understand basic building blocks, but can improve on how to use the building blocks

Interviewee:

Hard skill should be improved

==

Soft skill

Interviewer:

Alex Xu book gives a design pattern for each

Functional

Non functional

API design

High level design

Database design

Deep dive

API design jumped out in the middle

==

Audience:

Requirement gathering needs more improvement

If the user does not provide URL

Parameter may have its own

Malicious/porn URL, how do we handle these situations

API design

Came in too late

Jumped into scalability too early

==

Hard skills

Interviewer:

Load balancer and server

I was not sure if the box represent web browser or web server

URL preprocessing - may not be too important

Key points:

Must fill 1-1 mapping

Database: how to ensure consistency?

Cassandra, if you create 2 tables, how do we know these two tables are in sync?

We may crash between writing 1 table and writing another table

Cassandra can create 1 table + a materialized view.

It’s very important requirement

Cache location is not right

Interviewee: When you write checking cache. This is not ideal

We may need ID generator.

Timestamp + long URL, may still encounter hash collision

Base 64: there are some special characters, ? and = are used. Therefore we should use base 62

Rate limiter: is a bonus.

==

Audience

System design interview is 45 minutes

When we design, should we start with core function, then expand?

Core feature is long->short URL, short->long URL

Then we add scalability

Then we can add DDOS

Interviewer:

It’s very reasonable

===

Audience

System design interview - it feels quite reactive

Can interviewee drive the discussion?

Interviewer:

It’s right for interviewee to drive

I had to drill down because I wanted to figure out the knowledge

Audience

Perhaps experienced people will drive more

Interviewer:

L4: doesn’t need to drive strongly as long as can answer the questions

===

Audience

How to distinguish L4, L5?

Interviewer:

Guarantee database 1-1 mapping:

L4: two tables can work

L5: have to deep dive into guarantee of 1-1 mapping

====

Audience

How to enforce 1-1

Interviewer:

Primary key

Materialized view

2PC can be used to guarantee consistency between the 2 tables

Audience

How to ensure consistency between 2 tables

2PC

First make sure all party are ready to commit

Then commit the changes

==

Step 1: long URL insert to long URL, short URL table (using UUID generator)

Step 2: insert short URL + long URL into the second table

DynamoDB: can use global secondary index

Cassandra: secondary index is not recommended; materialized view is recommended

==

Audience:

Database writing. Should we handle data center going down?

Interviewer:

It will be a bonus if you can handle data center going down.

===

Audience

Cache: Only for reading

Audience

Expiration. TTL

Interviewer:

5 year expiration

===

Redis:

multi - similar to a transaction

Lua - can also implement transaction

Not using queue

===

MySQL requires sharding

100M write

1000 QPS