Topic: Cloud File Storage

Interviewer: Tekken

Interviewee: Tom (慢慢失败)

Level: L5 (Senior)

Mock System Design Interview Summary

Interview Overview

Date: 2/6/2022

Target level: L5

Duration: 45 minutes

Topic covered: Cloud storage system

Drawing tool used: Excalidraw.com

Requirements

Functional requirements

Google Drive: can edit/download/upload

Dropbox: local folder and cloud folder sync

Permission control

public/private

List files in the client/web

View files in the client/web

Sync the local files to cloud

Upload files/Download files/Delete files

Non-functional requirements

Availability: 99.9%

Reliability: do not lose user data

Latency: listing files should feel real-time to the user

Capacity:

DAU - 10M

Total users - 50M

Assume: on average, every user has 1 GB of files in our cloud

Storage: 50M * 1 GB = 50 PB

Network bandwidth:

Assumption: on average, each daily active user uploads 100 MB per day

Upload network bandwidth = 10M * 100 MB / (24 h * 3600 s) ≈ 11.6 GB/s

1:2 upload/download ratio

Download network bandwidth = 20M * 100 MB / (24 h * 3600 s) ≈ 23.1 GB/s

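Number of files per user:

Assume on average, a file has a size of 50 MB

=> 1 GB / 50 MB = 20 files for a single user (recorded as 200 during the interview)

Interviewer: 10 billion files total (50M users * 200 files per user, the figure used in the interview)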
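The capacity numbers above can be re-derived with a short script. This is only a back-of-envelope sketch: it assumes decimal units (1 GB = 1000 MB, 1 PB = 1,000,000 GB), and the constant and function names are made up for illustration. With these inputs the division gives 20 files per user and 1 billion files overall; a total of 10 billion would correspond to 200 files per user.

```python
# Back-of-envelope capacity estimate using the figures from the notes.
# Decimal units are assumed: 1 GB = 1000 MB, 1 PB = 1_000_000 GB.

TOTAL_USERS = 50_000_000
DAILY_ACTIVE_USERS = 10_000_000
STORAGE_PER_USER_GB = 1
UPLOAD_PER_DAU_MB = 100          # uploaded per active user per day
DOWNLOAD_TO_UPLOAD_RATIO = 2     # 1:2 upload/download ratio
AVG_FILE_SIZE_MB = 50
SECONDS_PER_DAY = 24 * 3600


def estimate() -> dict:
    """Return the storage, bandwidth and file-count estimates."""
    storage_pb = TOTAL_USERS * STORAGE_PER_USER_GB / 1_000_000
    upload_gb_per_s = DAILY_ACTIVE_USERS * UPLOAD_PER_DAU_MB / SECONDS_PER_DAY / 1000
    download_gb_per_s = upload_gb_per_s * DOWNLOAD_TO_UPLOAD_RATIO
    files_per_user = STORAGE_PER_USER_GB * 1000 // AVG_FILE_SIZE_MB
    return {
        "storage_pb": storage_pb,
        "upload_gb_per_s": round(upload_gb_per_s, 1),
        "download_gb_per_s": round(download_gb_per_s, 1),
        "files_per_user": files_per_user,
        "total_files": files_per_user * TOTAL_USERS,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value}")
```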

System Design

Starting about 7:25

External APIs

System design

Interviewer: If you upload a file from one device, how does this sync the file to a different device?

Interviewee: let’s discuss the components at high level first.

Interviewer: how to upload file?

Interviewee: let’s define the API first

API:

UploadFileRequest(user_token, file_path, file(data_stream), description, title, permission)

DownloadFileRequest(user_token, file_path) -> file(data_stream)

DeleteFile(user_token, file_path)

ListFile(user_token, folder_path, pagination, sort) -> list<file_meta_data>
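To make the four calls above concrete, here is a minimal in-memory sketch. It is not the design from the interview: the class name, the offset/limit form of pagination, and treating user_token directly as the user identity (no real authentication) are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileMetaData:
    file_path: str
    title: str
    description: str
    permission: str  # "public" or "private"
    size: int


class InMemoryFileService:
    """Toy stand-in for the upload/download/delete/list API."""

    def __init__(self) -> None:
        # (user_token, file_path) -> (metadata, file bytes)
        self._files: Dict[Tuple[str, str], Tuple[FileMetaData, bytes]] = {}

    def upload_file(self, user_token: str, file_path: str, data: bytes,
                    description: str = "", title: str = "",
                    permission: str = "private") -> FileMetaData:
        if permission not in ("public", "private"):
            raise ValueError("permission must be 'public' or 'private'")
        meta = FileMetaData(file_path, title, description, permission, len(data))
        self._files[(user_token, file_path)] = (meta, data)
        return meta

    def download_file(self, user_token: str, file_path: str) -> bytes:
        return self._files[(user_token, file_path)][1]

    def delete_file(self, user_token: str, file_path: str) -> None:
        del self._files[(user_token, file_path)]

    def list_files(self, user_token: str, folder_path: str, offset: int = 0,
                   limit: int = 20, sort: str = "file_path") -> List[FileMetaData]:
        # Match every file of this user whose path is under the folder.
        prefix = folder_path.rstrip("/") + "/"
        metas = [meta for (token, path), (meta, _) in self._files.items()
                 if token == user_token and path.startswith(prefix)]
        metas.sort(key=lambda meta: getattr(meta, sort))
        return metas[offset:offset + limit]
```

A real service would stream the file body instead of holding it in memory and would resolve the token to a user before any lookup.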

Interviewer: do we need multiple services (permission service, upload service, etc.)?

Interviewee: yes. The sync functionality can also be achieved by these services.

Cloud file storage

FileMetadataTable: (NoSQL + transactions)

file_id, user_id,

permission [public/private],

title, description, file_path,

list<chunk_address>, creation_time, last_updated_time

File system:

Store these chunks
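The split between the metadata table and the chunk store can be sketched as below. The field names follow the table above; the Python types, the dict used as a chunk store, and the read_file helper are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class FileMetadataRow:
    file_id: str
    user_id: str
    permission: str               # "public" or "private"
    title: str
    description: str
    file_path: str
    chunk_addresses: List[str]    # ordered addresses into the chunk store
    creation_time: datetime
    last_updated_time: datetime


def read_file(row: FileMetadataRow, chunk_store: Dict[str, bytes]) -> bytes:
    """Rebuild the file by concatenating its chunks in metadata order."""
    return b"".join(chunk_store[address] for address in row.chunk_addresses)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    store = {"c1": b"hello ", "c2": b"world"}
    row = FileMetadataRow("f1", "u1", "private", "Greeting", "", "/docs/hi.txt",
                          ["c1", "c2"], now, now)
    print(read_file(row, store))
```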

Interviewer: you need ACID?

Interviewee: yes. Strong consistency means a read always returns the latest write. So we may not need strong consistency

Interviewer: should you draw a new database component?

Interviewee: let’s finish this part of design first

Two ways to produce these chunks:

Upload in chunks.

Pro: an interrupted upload can resume from where it stopped.

Con: it needs client-side support (costs client-side CPU)

Upload the entire file and split it into chunks on the server

Pro: easy to implement, supports web client

Con: if the upload fails, we have to start from the beginning
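The resume advantage of the first option can be sketched as follows. This is a toy model, not a real client: the tiny chunk size, the FlakyServer class that fails once to simulate a dropped connection, and the function names are all assumptions for illustration.

```python
from typing import Dict, List, Optional

CHUNK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class FlakyServer:
    """Stores chunks by index and fails once after `fail_after` chunks."""

    def __init__(self, fail_after: Optional[int] = None) -> None:
        self.received: Dict[int, bytes] = {}
        self._budget = fail_after

    def put_chunk(self, index: int, chunk: bytes) -> None:
        if self._budget is not None:
            if self._budget == 0:
                self._budget = None  # fail only once, then behave normally
                raise ConnectionError("simulated network failure")
            self._budget -= 1
        self.received[index] = chunk


def upload(data: bytes, server: FlakyServer) -> int:
    """Send only the chunks the server is missing; return how many were sent."""
    sent = 0
    for index, chunk in enumerate(split_into_chunks(data)):
        if index in server.received:
            continue  # already uploaded before the interruption
        server.put_chunk(index, chunk)
        sent += 1
    return sent
```

Calling upload a second time after a ConnectionError sends only the remaining chunks, which is the behavior the whole-file option cannot offer.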

Adding “File Processor” to generate metadata

Workflow:

User upload: call “UploadFileRequest”. There are 2 APIs: one for the device client and one for the web client

Syncing file to client:

A few possible ways:

WebSocket:

Pro: bidirectional, can upload/download

Con: keeping connections open is expensive

Pull/long polling:

Pro: cheap, easy to implement

Con: for download, we need another request

Personal preference: WebSocket

Interviewer: How do you know which part of the file requires sync?

Interviewee: 2 clients can modify the same file

Because of race conditions, the resulting behavior may not be what the user expects

Interviewer: User A modifies an existing file and uploads it in chunks

Once the upload is done,

the next step is to notify the other devices which file and which chunks changed

Interviewee: in the file processor, we do dedup and comparison to find how many chunks were updated.

Add “fanout message queue”

Dedup:

Options

Based on the checksum (recommended)

Based on the entire data
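A checksum-based version of the dedup and comparison step might look like the sketch below. It assumes SHA-256 as the checksum, fixed chunk positions, and the checksum doubling as the chunk address; none of these choices were specified in the interview, and the names are hypothetical.

```python
import hashlib
from typing import Dict, List


def checksum(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


class ChunkStore:
    """Content-addressed store: identical chunks are kept only once."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def put(self, chunk: bytes) -> str:
        address = checksum(chunk)
        self.chunks.setdefault(address, chunk)  # no-op if already stored
        return address


def changed_chunk_indexes(old: List[str], new: List[str]) -> List[int]:
    """Indexes in the new version whose checksum differs from the old one."""
    return [i for i, address in enumerate(new)
            if i >= len(old) or old[i] != address]


if __name__ == "__main__":
    store = ChunkStore()
    v1 = [store.put(c) for c in (b"aaaa", b"bbbb", b"cccc")]
    v2 = [store.put(c) for c in (b"aaaa", b"XXXX", b"cccc", b"dddd")]
    print(changed_chunk_indexes(v1, v2), len(store.chunks))
```

Comparing checksums avoids shipping or comparing the full chunk data, which is why it is the cheaper of the two options listed.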

Interviewer and Audience Feedback

Audience:

Reference answer: https://www.youtube.com/watch?v=PE4gwstWhmc Dropbox Senior Engineer design at Stanford University.

Interviewer

Need a working solution

5 minutes: requirement

25 minutes: system architecture design

5-10 minutes for normal case

If more time: 1-2 questions

Database, table design

Notification

Assumption: interviewee can ask interviewer question

I didn’t quite follow the diagram, but didn’t want to interrupt

Interrupt or not

Metadata - on cloud storage

Need to separate storage and database

===

Soft skill feedback from audience

Some interviewers interrupt

Some interviewees don’t interrupt

What matters is how the interviewee reacts: they should solicit feedback regularly

If we don’t interrupt, but the design is not what I expected, then it’s probably no hire

Some interviewers don’t interrupt and just give no-hire feedback

Interviewee can ask for more feedback

Interviewer asked twice how to sync. Interviewee pushed back twice.

Interviewee can adapt to the pace of the interviewer

Interviewee: it feels like the question is too early to be asked

Interviewer: it feels the diagram cannot handle all use cases. I will probably draw high-level

It looks like notification portion is missing, so I provided hint for

Audience: interviewee can discuss the focus of the interview

The design doesn’t have an impressive aspect

Give interviewer some multiple choice

Audience: interviewee wants to have complete coverage

Interviewer can suggest that this part is too much detail, e.g. upload, download is too

10 minutes spent on

Audience: the requirements were not complete. Can ask the interviewer what the key features are

Interviewer wanted to show the share feature

Interviewer wanted to provide hint for

Availability: 99.9%?

Audience: 2 nines, 3 nines, will that be enough?

There is no proof - 99.9%.

Handle scalability

It depends on level. Availability

Interviewer: collect positive signal. (compared to other candidates)

High availability may be a possible point of discussion.

Audience: 3-9 or 4-9 not provable

High availability

Low latency

Reliability

The numbers are not
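For the availability discussion above, it may help to translate "nines" into allowed downtime. The helper below is a simple illustration that assumes a 365-day year; the function name is made up.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes_per_year(availability_percent: float) -> float:
    """Downtime budget per year for a given availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes_per_year(target)
        print(f"{target}% -> {minutes:.1f} minutes/year ({minutes / 60:.2f} hours)")
```

Under this assumption 99.9% allows roughly 8.8 hours of downtime per year, while 99.99% allows under an hour.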

Hard skill feedback

One topic per device?

10 topics for 10 devices?

If there is only 1 queue per user, then the message may be consumed

Long-lived connection

SSE (server-sent events) may be better

If there are too many users

Message queue 50,000 connections

Database is harder to scale (partition)

Redis for metadata cache

Audience

Meta data stores the chunk

If the file storage server crashes, the system can end up in a strange state

If it’s not a transaction, the server can crash after the data is uploaded to cloud storage but before the metadata is updated

1 file to multiple chunks

If there are 2 chunks, then are there 2 messages?

Answer: yes

https://whimsical.com/google-drive-BBsXFU8DQX7tp9CMyTXASs

Chunking is done on the client

Don’t use S3 chunking

Client already chunked

MVP: chunk is an optimization

If we don’t do chunk, then it will still

Client: there is a client service.

Directly talking to file storage service

Sync, notification, first talk to client service

If there is a web client, then where is the client service?

If it’s an app, then client service sits on device

If it’s a web client, where is the client service?

There is local storage for web browser, but there is a limit

Many browsers don’t have local storage

Chunker: only an app can do chunking

If there is webclient, then we don’t need to do chunking

At the beginning we don’t need to discuss chunk

Multiple clients modify different parts of the file?

Can use a lock

For example, Google Sheets can be updated at different places

Try to lock a unit (such as a line in the file)

Google Sheets - last modification wins

Google Drive - does not solve this problem. The sync has a problem. It will just create different copies of the same file.
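The "create different copies" behavior can be sketched with a version check. This is an assumed mechanism for illustration, not a description of how Google Drive works internally: the per-file version counter, the base_version argument, and the conflict file naming are all hypothetical.

```python
from typing import Dict, Tuple


class VersionedStore:
    """Keeps one version counter per path and never overwrites blindly."""

    def __init__(self) -> None:
        self.files: Dict[str, Tuple[int, bytes]] = {}  # path -> (version, data)

    def save(self, path: str, data: bytes, base_version: int, device_id: str) -> str:
        """Save an edit that was made on top of `base_version`.

        Returns the path the data was written to.
        """
        current_version = self.files.get(path, (0, b""))[0]
        if base_version == current_version:
            # The client edited the latest version: accept the write.
            self.files[path] = (current_version + 1, data)
            return path
        # Someone else wrote first: keep both by storing a separate copy.
        conflict_path = f"{path} (conflicted copy from {device_id})"
        self.files[conflict_path] = (1, data)
        return conflict_path
```

Last-modification-wins would instead overwrite unconditionally, silently dropping the earlier edit.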

Workable solution

First upload: create metadata, then upload, then confirm the upload finished

Message queue: what’s the topic?

Producer, consumer. What’s the topic.

For example, 10 devices. Then there will be 10 topics each one for one device.

Why do we use device ID as topic?

If there is one topic, then once a message is consumed, it disappears

You can use one queue. Notification server: tracks which clients are connected, pushes to the connected clients, then consumes the message

Every time an offline client reconnects, it diffs its own metadata against the metadata in the cloud
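The single-queue alternative with a notification server can be sketched as below. It is a toy model under several assumptions: metadata is reduced to a file_path -> version map, pushes are modeled as per-device inbox lists instead of real connections, and every connected device (including the one that made the change) is notified.

```python
from collections import deque
from typing import Deque, Dict, List


class NotificationServer:
    def __init__(self, cloud_metadata: Dict[str, int]) -> None:
        self.cloud_metadata = cloud_metadata       # file_path -> version
        self.queue: Deque[str] = deque()           # single queue of changed paths
        self.connected: Dict[str, List[str]] = {}  # device_id -> pushed paths

    def connect(self, device_id: str, local_metadata: Dict[str, int]) -> List[str]:
        """Register the device and return the paths it must re-sync."""
        self.connected[device_id] = []
        return sorted(path for path, version in self.cloud_metadata.items()
                      if local_metadata.get(path) != version)

    def disconnect(self, device_id: str) -> None:
        self.connected.pop(device_id, None)

    def publish(self, file_path: str, version: int) -> None:
        """Called after an upload completes."""
        self.cloud_metadata[file_path] = version
        self.queue.append(file_path)

    def drain(self) -> None:
        """Consume each message once and push it to all connected devices."""
        while self.queue:
            path = self.queue.popleft()
            for inbox in self.connected.values():
                inbox.append(path)
```

Because offline devices catch up through the metadata diff in connect, a message can be dropped from the queue as soon as the connected devices have been notified, so per-device topics are not required.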

There may be multiple metadata service.

One in China and one in US

We should have multiple copies of the metadata

S3 chunked upload

First it will create the metadata, including all chunks

Then the chunks can be uploaded

After all chunks (for example, 100) are uploaded, mark the file as finished
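The three steps above (create metadata, upload chunks, mark finished) can be sketched as a small state holder. This is not the S3 API; the UploadSession class, its status strings, and its method names are hypothetical and only illustrate the lifecycle.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UploadSession:
    file_path: str
    expected_chunks: int
    chunk_addresses: Dict[int, str] = field(default_factory=dict)
    status: str = "PENDING"  # metadata exists, file not yet visible

    def record_chunk(self, index: int, address: str) -> None:
        """Note that one chunk has been stored at `address`."""
        if not 0 <= index < self.expected_chunks:
            raise IndexError(f"chunk index {index} out of range")
        self.chunk_addresses[index] = address

    def complete(self) -> List[str]:
        """Mark the upload finished only if every chunk has arrived."""
        missing = [i for i in range(self.expected_chunks)
                   if i not in self.chunk_addresses]
        if missing:
            raise ValueError(f"missing chunks: {missing}")
        self.status = "COMPLETE"
        return [self.chunk_addresses[i] for i in range(self.expected_chunks)]
```

If the server crashes midway, the session simply stays PENDING, so readers never see a half-written file and the client can retry the missing chunks.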

Modified design after discussion
