Google File System, a Planet Scale Distributed File System
Materials — open to everyone, no sign-in
Topic: Google File System, a Planet Scale Distributed File System
Presenter: Longwei
Additional Resources:
Sign up for future events:
System Design Presentation Summary
Date: 9/11/2022
Topic: distributed file system
Presenter: Leo
Top 4 layers are conceptual
The master is a single machine (single point of failure)
Client only upload data to one replica (the closest)
Network is the most expensive resource
Naively we can assume each set of operation is performed on each chunk
Chunk location is not persisted
Chunk server will sync with master during boot
Summary review:
Master supports metadata
Real data is stored in the chunk server
Meta information is very small X kB information
10 GB memory - 100T data
Q: how to handle master crash
A: master has write-ahead-log. Master can be restarted within 1 minute
Audience: there is a shadow master. Can recover from log
May get to a state where we fail to write to some replica.
Big idea
Client sends data to the memory of 3 replicas
Client asks 3 replicas to save from memory to disk
Lease: a node is a master of a version only for 1 minute
When appending more data than a chunk, GFS will start a new chunk.
Shadow master: similar to slaves in master/slave architecture where master can be used for write and read, and slaves can be used for read.
Are operation log sent to shadow master
Trade-offs
Chunk size = 64MB
Larger chunk size: client has less communication with master; master requires less memory
Not good for small sizes.
Not all master data is strongly consistent to gain performance.
At least once vs exactly once
Using at least once; may have duplicate; faster write, slower read
Offset decided by GFS
Cannot support fast random access
Two primary on network partitions?
Yes. resolved by lease + version
Master fails?
Human intervention
Stale replica
Audience Discussion
If we modify 1 chunk, will it bump up a version?
Snapshotting is related to file copying. When copying a file to another file, the other file is a shallow copy, until a modification happens.
Why can we write at most once during one minute?
Lease has TTL of 1 minute.
What if the access is longer than 1 minute?
Because we are sending a trunk, which should be fast.
Every request will get a lease from the master.
Master may change from one version to another version.
It seems the version number can solve the problem of ordering.