NoSQL database


Topic: NoSQL database

Presenter: Marco



System Design Presentation Summary

NoSQL Database

SQL databases store data in tables, so application data has to be split across several tables. This can lower development productivity.

Start of NoSQL: Google Bigtable and Amazon Dynamo

Timeline

Document database: MongoDB

Column family store: Cassandra

Key-value database: Redis

Graph database: Neo4j

Time series database: InfluxDB

Document databases

Documents are stored as JSON or BSON

Can perform nested queries

MongoDB is the most popular document database

MongoDB vs RDBMS

Collection vs table

Document vs row

_id vs rowid

reference/embedding vs join

Consistency: replica sets

Transactions: atomic transaction at single document level

Availability: high availability through leader-followers. Auto fail-over

Query features: JSON-like query language

Scaling: Sharding is used to scale writes and reads

Working with MongoDB - data modeling

Embedding vs reference
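
A minimal sketch of the two modeling choices, using plain Python dicts in place of real collections. The order and item data and the helper `load_order_with_items` are made up for illustration.

```python
# Embedding: the order document carries its line items inside itself,
# so a single read returns everything.
order_embedded = {
    "_id": "order-1",
    "customer": "Ann",
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}

# Referencing: items live in their own collection and point back to the
# order by id, much like a foreign key.
order_ref = {"_id": "order-1", "customer": "Ann"}
items = [
    {"_id": "item-1", "order_id": "order-1", "sku": "A1", "qty": 2},
    {"_id": "item-2", "order_id": "order-1", "sku": "B7", "qty": 1},
    {"_id": "item-3", "order_id": "order-2", "sku": "C3", "qty": 5},
]


def load_order_with_items(order: dict, item_collection: list) -> dict:
    """Application-side 'join': a second lookup resolves the references."""
    found = [i for i in item_collection if i["order_id"] == order["_id"]]
    return {**order, "items": [{"sku": i["sku"], "qty": i["qty"]} for i in found]}


assembled = load_order_with_items(order_ref, items)
print(assembled == order_embedded)  # True
```

Embedding tends to suit data that is read together and owned by one parent; referencing tends to suit data that is shared or grows without bound.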

MongoDB demo

Mongo CLI

MongoDB Compass (UI)

Columnar database

Data is stored in cells

Each row is associated with a row key

Unlimited columns in a row

Cassandra

Column is a name-value pair

Good for write-intensive applications

Cassandra vs RDBMS

Keyspace vs database

Column family vs table

Row vs row

Column vs column

Column database - Cassandra

Consistency

Writes go to the commit log first, then into the memtable
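
The write path above can be sketched in a few lines. This is a toy model, not Cassandra's implementation: the class `TinyWriteStore`, its flush threshold, and the in-memory "commit log" are all assumptions made for illustration.

```python
import json


class TinyWriteStore:
    """Toy write path: append to a commit log, then update the memtable."""

    def __init__(self, flush_threshold: int = 3):
        self.commit_log = []   # stand-in for an append-only file on disk
        self.memtable = {}     # in-memory table holding recent writes
        self.sstables = []     # immutable, sorted snapshots of flushed memtables
        self.flush_threshold = flush_threshold

    def write(self, key: str, value) -> None:
        # 1. Durability first: record the write in the commit log.
        self.commit_log.append(json.dumps({"key": key, "value": value}))
        # 2. Then apply it to the memtable.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self._flush()

    def _flush(self) -> None:
        # Persist the memtable as a sorted table; its log entries are no longer needed.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()

    def recover(self) -> None:
        """Rebuild the memtable by replaying the commit log (after a crash)."""
        self.memtable = {}
        for line in self.commit_log:
            record = json.loads(line)
            self.memtable[record["key"]] = record["value"]

    def read(self, key: str):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest flushed table first
            for k, v in table:
                if k == key:
                    return v
        return None


store = TinyWriteStore(flush_threshold=3)
store.write("a", 1)
store.write("b", 2)
store.memtable = {}     # simulate losing memory in a crash
store.recover()         # the commit log restores the lost writes
print(store.read("a"))  # 1
store.write("c", 3)     # third key reaches the threshold and triggers a flush
print(len(store.sstables), store.memtable)  # 1 {}
```

Because every write is only an append plus an in-memory update, this style of write path is one reason such stores handle write-heavy workloads well.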

Transactions

Atomic transaction only at the single row level

Availability

Highly available

Query features

Cassandra query language

Scaling

Sharding for writes and reads

Working with Cassandra

Tables and keys

CREATE TABLE

The primary key is also the sharding key: its partition-key part decides which node stores the row

PRIMARY KEY ((actor), release_year, movie_id)

Primary key defines how we can query data
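
To see why the primary key dictates the queries, here is a small in-memory sketch of the key above: `actor` picks the partition, and rows inside a partition stay sorted by `release_year`, then `movie_id`. The function names and the sample rows are hypothetical.

```python
from bisect import insort
from collections import defaultdict

# partition key (actor) -> rows kept sorted by (release_year, movie_id)
partitions = defaultdict(list)


def insert(actor: str, release_year: int, movie_id: int, title: str) -> None:
    """Place the row in its partition, keeping clustering order."""
    insort(partitions[actor], (release_year, movie_id, title))


def movies_by_actor(actor: str, min_year=None, max_year=None) -> list:
    """Efficient query shape: one partition, optional range on the first clustering column."""
    rows = partitions.get(actor, [])
    return [
        row for row in rows
        if (min_year is None or row[0] >= min_year)
        and (max_year is None or row[0] <= max_year)
    ]


# Made-up sample data.
insert("alice", 2001, 10, "First Film")
insert("alice", 1999, 7, "Debut")
insert("alice", 2001, 3, "Second Film")
insert("bob", 2001, 11, "Other")

print(movies_by_actor("alice", min_year=2000))
# [(2001, 3, 'Second Film'), (2001, 10, 'First Film')]
```

A query such as "all movies released in 2001" does not start from an actor, so in this model it would have to scan every partition. That is the sense in which the key design defines the supported queries.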

Key-value databases

Simplest form

Redis is the most popular key-value database

Key-value DB vs RDBMS

Key-value vs row

Consistency applicable on a single key

Transactions: MULTI/WATCH/EXEC, or Lua scripts
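
The idea behind WATCH with MULTI/EXEC is optimistic locking: queue the writes, and apply them only if the watched key has not changed in the meantime. The sketch below simulates that idea in memory; `TinyKV` and its methods are hypothetical and are not the Redis client API.

```python
class TinyKV:
    """In-memory simulation of optimistic transactions on a key-value store."""

    def __init__(self):
        self.data = {}
        self.versions = {}  # key -> number of writes applied so far

    def set(self, key, value):
        self.data[key] = value
        self.versions[key] = self.versions.get(key, 0) + 1

    def get(self, key):
        return self.data.get(key)

    def watch(self, key):
        """Remember the key's current version (the role WATCH plays)."""
        return (key, self.versions.get(key, 0))

    def execute(self, watched, queued_writes):
        """Apply all queued writes, or none if the watched key changed."""
        key, seen_version = watched
        if self.versions.get(key, 0) != seen_version:
            return False  # another client wrote the key: abort
        for k, v in queued_writes:
            self.set(k, v)
        return True


kv = TinyKV()
kv.set("stock", 5)

# Uncontended case: read, compute, commit.
token = kv.watch("stock")
ok = kv.execute(token, [("stock", kv.get("stock") - 1)])
print(ok, kv.get("stock"))  # True 4

# Contended case: another writer changes the key after the watch.
token = kv.watch("stock")
kv.set("stock", 10)  # simulated concurrent client
conflicted = kv.execute(token, [("stock", 3)])
print(conflicted, kv.get("stock"))  # False 10
```

On a failed commit the caller would normally re-read the value and retry.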

Audience question

Partition: shard data across individual databases

Replication: duplicate the entire database

Working with Redis

String

We can set expire time

List

LPUSH, RPUSH

Hashes

HSET

HMSET

SET ckey "Redis demo"

UI: RedisInsight
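
The data types listed in this section can be mimicked with a small in-memory class, which may help when reasoning about their semantics without a server. `TinyRedis` is a hypothetical stand-in, not a Redis client; the injected clock is an assumption that keeps the expiry example deterministic.

```python
class TinyRedis:
    """In-memory imitation of a few commands on strings, lists, and hashes."""

    def __init__(self, clock):
        self.clock = clock     # callable returning the current time in seconds
        self.store = {}
        self.expires_at = {}   # key -> absolute expiry time

    # Strings, with an optional expire time in seconds.
    def set(self, key, value, ex=None):
        self.store[key] = value
        if ex is None:
            self.expires_at.pop(key, None)
        else:
            self.expires_at[key] = self.clock() + ex

    def get(self, key):
        deadline = self.expires_at.get(key)
        if deadline is not None and self.clock() >= deadline:
            self.store.pop(key, None)        # lazily drop the expired key
            self.expires_at.pop(key, None)
        return self.store.get(key)

    # Lists: push to the head (LPUSH) or the tail (RPUSH).
    def lpush(self, key, *values):
        items = self.store.setdefault(key, [])
        for value in values:
            items.insert(0, value)
        return len(items)

    def rpush(self, key, *values):
        items = self.store.setdefault(key, [])
        items.extend(values)
        return len(items)

    # Hashes: field/value pairs stored under one key.
    def hset(self, key, field, value):
        self.store.setdefault(key, {})[field] = value

    def hget(self, key, field):
        return self.store.get(key, {}).get(field)


now = [0.0]                       # fake clock we can advance by hand
r = TinyRedis(clock=lambda: now[0])

r.set("ckey", "Redis demo", ex=10)
print(r.get("ckey"))              # Redis demo
now[0] = 11.0
print(r.get("ckey"))              # None (expired)

r.rpush("queue", "b", "c")
r.lpush("queue", "a")
print(r.store["queue"])           # ['a', 'b', 'c']

r.hset("user:1", "name", "Ann")
print(r.hget("user:1", "name"))   # Ann
```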

Graph database

Contains a collection of nodes and edges

Nodes

Edges, e.g. LOVES

Labels

Properties
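
The four building blocks can be modeled directly with two small classes. This is only a sketch of the property-graph model; the class names, the `neighbors` helper, and every sample value apart from Dan and LOVES are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set                                       # e.g. {"Person"}
    properties: dict = field(default_factory=dict)    # e.g. {"name": "Dan"}


@dataclass
class Edge:
    start: int                                        # id of the source node
    end: int                                          # id of the target node
    rel_type: str                                     # e.g. "LOVES"
    properties: dict = field(default_factory=dict)


nodes = {
    1: Node(1, {"Person"}, {"name": "Dan"}),
    2: Node(2, {"Person"}, {"name": "Ann"}),
    3: Node(3, {"City"}, {"name": "Oslo"}),
}
edges = [
    Edge(1, 2, "LOVES", {"since": 2020}),
    Edge(1, 3, "LIVES_IN"),
]


def neighbors(label: str, prop: str, value, rel_type: str) -> list:
    """Find nodes reached over `rel_type` from nodes matching label and property."""
    start_ids = {
        n.node_id for n in nodes.values()
        if label in n.labels and n.properties.get(prop) == value
    }
    return [nodes[e.end] for e in edges if e.start in start_ids and e.rel_type == rel_type]


loved = neighbors("Person", "name", "Dan", "LOVES")
print([n.properties["name"] for n in loved])  # ['Ann']
```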

Graph DB vs RDBMS

Maps directly to the structures of OO applications vs impedance mismatch problem

Relation

Graph DB - Neo4j

ACID-compliant

Implemented in Java

Multi-platform

Supports scale up and out

Billions of entities, schemaless

Consistency: leader-follower

Does not support distributing nodes/relationships on different servers

Sharding is not supported in graph DB

Most popular commands

MATCH (:PERSON {NAME: "DAN"})-[:LOVES]->(whom) RETURN whom

CREATE

Neo4j - demo

Can apply graph algorithms such as DFS and BFS
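
For reference, a breadth-first search over a small adjacency list looks like the sketch below. The graph and the names in it are made up, and the code runs on a plain dict, not on Neo4j.

```python
from collections import deque

# Hypothetical directed graph: person -> people they are connected to.
graph = {
    "Dan": ["Ann", "Bo"],
    "Ann": ["Cy"],
    "Bo": ["Cy"],
    "Cy": ["Di"],
    "Di": [],
}


def bfs_hops(adjacency: dict, source: str) -> dict:
    """Return the minimum number of hops from `source` to each reachable node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, []):
            if neighbor not in hops:          # first visit is the shortest path
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    return hops


print(bfs_hops(graph, "Dan"))
# {'Dan': 0, 'Ann': 1, 'Bo': 1, 'Cy': 2, 'Di': 3}
```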

Time series database

Usually used to capture metrics

InfluxDB - a well-known time series database

Time-structured merge tree (TSM)

Write-ahead log (WAL)

InfluxQL is SQL-like

Availability - supports horizontal scaling (sharding)

Not a transactional database

InfluxDB vs RDBMS

Tags vs indexed columns

Other concepts

Measurement
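
A small sketch of how measurements and tags fit together: each point belongs to a measurement and carries tags (indexed), fields (the values, not indexed), and a timestamp. The function names and sample metrics are hypothetical; this models the concepts only, not InfluxDB's storage engine or query language.

```python
from collections import defaultdict

points = []                    # every point: measurement, tags, fields, time
tag_index = defaultdict(set)   # (measurement, tag_key, tag_value) -> positions in `points`


def write_point(measurement: str, tags: dict, fields: dict, timestamp: int) -> None:
    """Store a point and index it by each of its tags."""
    position = len(points)
    points.append(
        {"measurement": measurement, "tags": tags, "fields": fields, "time": timestamp}
    )
    for key, value in tags.items():
        tag_index[(measurement, key, value)].add(position)


def query(measurement: str, tag_key: str, tag_value: str, start: int, end: int) -> list:
    """Use the tag index to find candidates, then filter by time range [start, end)."""
    positions = tag_index.get((measurement, tag_key, tag_value), set())
    return [points[p] for p in sorted(positions) if start <= points[p]["time"] < end]


write_point("cpu", {"host": "web-1"}, {"usage": 0.42}, 100)
write_point("cpu", {"host": "web-2"}, {"usage": 0.90}, 105)
write_point("cpu", {"host": "web-1"}, {"usage": 0.50}, 110)

result = query("cpu", "host", "web-1", start=0, end=200)
print([p["fields"]["usage"] for p in result])  # [0.42, 0.5]
```

Filtering on a field such as `usage` would need a scan in this model, which mirrors the "tags vs indexed columns" comparison: values you filter or group by belong in tags.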

Time series database usage

Usually used with other tools

TICK stack (Telegraf, InfluxDB, Chronograf, Kapacitor)

InfluxDB - DEMO

Q&A

What’s the purpose of having a column family when you already have columns?

A column family is like a table

MongoDB: e-commerce orders, blogging

Key-value: cache the user profile, session information. May store shopping cart data

Graph DB: recommendation systems/engines, location-based services, any connected data

Cassandra: event logging

Partition and replica

Partition - separate data to different DB nodes
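
A minimal sketch of how partitioning and replication combine: hash the key to pick an owner node, then copy the data to the next node(s). The node names, the replication factor, and the simple modulo placement are assumptions for illustration.

```python
import hashlib

NODES = ["db-0", "db-1", "db-2", "db-3"]  # hypothetical DB nodes
REPLICATION_FACTOR = 2                    # each key is stored on this many nodes


def owner_index(key: str) -> int:
    """Partitioning: a stable hash of the key picks the owning node."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % len(NODES)


def replicas_for(key: str) -> list:
    """Replication: the owner plus the following node(s) hold copies."""
    first = owner_index(key)
    return [NODES[(first + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


for key in ["user:1", "user:2", "user:3"]:
    print(key, "->", replicas_for(key))
```

Modulo placement moves most keys when a node is added or removed; production systems commonly use consistent hashing to limit that movement.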

How much data can be held in a single DB?

Depends on the computer’s disk size

Indexes used by different databases