SQL Database

System DesignDatabases & Storage

Topic: SQL Database

Presenter: Pauline


System Design Presentation

SQL database

We are looking for presenters !

QRCode:

Diagrams:

https://app.excalidraw.com/l/7yDNW4C8AjE/9OCFEE4WJdL

SQL

MySQL

ISAM, vs InnoDB

Index: ordered storage of data, or pointers to data

Sequential search, binary search, B+ tree search

Binary search: Tree height determines the IO speed

B+ tree:

Internal node

MyISAM is a non-clustered index.

InnoDB is a clustered index. It must have primary key

MyISAM: primary and secondary key is similar. B+ tree leaf nodes point to the data address. A

In InnoDB: secondary key B+ tree points to the primary key. Then it needs to go to B+ tree of primary key to get to data.

Before 2009 default to MyISAM

After 2009 default InnoDB

Composite index: There is order in columns. Saves space.

Multiple indexes: there is an index for every column. Bigger memory consumption

Unique index

Unique constraint

Best practice

Don’t use long string as primary key

Choose column to build index:

Make sure the column can help eliminate choices quickly (e.g. don’t use female/male as index)

Transaction model

An example of why crash may cause problems in the database records

Transaction: multiple changes are done together.

Atomic, consistent, isolation, durability

Consistency is the most important.

Biggest problem is concurrent access

Log recovery - keep atomic and durability

Detect wrong transactions and rolls it back

Concurrency issues

Read only no problem

Write - read: may read incorrectly

Write - write: may lose data updated

Different isolation levels:

Read uncommitted

Read committed

Repeatable read

Serializable

Read uncommitted’s problems: Dirty write, dirty read.

Read commit’s problems: there may be unrepeatable commit

Read repeatable: Phantom reads

Transaction isolation and isolation levels.

MySQL - repeatable read

Oracle - read committed

Optimistic concurrency control

Read->compute->validate->Write

Pessimistic concurrency control

Validate -> read -> compute -> write

Concurrency-control protocols - lock based protocol

Concurrrency-control protocols - lock based

Shared lock (S)

Exclusive lock (X)

Deadlock, Hungry

Concurrency-control protocols - timestamp based

W_TS(X) write timestamp

R_TS(X) read timestamp

Rollback conditions:

Write item (X): R_TS(X) > TS(T) or W_TS(X) > TS(T)

Read item (X): W_TS(X) > TS(T).

Time-based control is pessimistic

Write immediately but does not commit

Only validate when committing

===

Write based log (WAL)

Add, delete, update records in WAL

Intermediate update and delay update

Undo WAL: record old value, when system rollback, it takes original state

Redo WAL: record new value, when system crashes, it will redo to get new value