NoSQL database
Topic: NoSQL database
Presenter: Marco
System Design Presentation Summary
NoSQL Database
SQL databases store data in tables. Data must be split across multiple tables, which can lower development productivity.
Start of NoSQL: Bigtable (Google) and Dynamo (Amazon)
Timeline
Document database: MongoDB
Column family store: Cassandra
Key-value database: Redis
Graph database: Neo4j
Time series database: InfluxDB
Document databases
Documents are stored as JSON or BSON
Can perform nested queries
MongoDB is the most popular document database
MongoDB vs RDBMS
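As a rough illustration of nested queries, the sketch below mimics MongoDB-style dot notation on plain Python dictionaries. The get_path and find helpers and the sample documents are hypothetical and are not part of any MongoDB driver.

```python
# Hypothetical sample documents; a real collection would live in MongoDB.
users = [
    {"_id": 1, "name": "Ann", "address": {"city": "Seattle", "zip": "98101"}},
    {"_id": 2, "name": "Bob", "address": {"city": "Austin", "zip": "73301"}},
]

def get_path(doc, path):
    """Follow a dotted path such as 'address.city' into nested dicts."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def find(docs, query):
    """Return documents whose dotted-path fields equal the query values."""
    return [d for d in docs
            if all(get_path(d, p) == v for p, v in query.items())]

matches = find(users, {"address.city": "Austin"})
print([d["name"] for d in matches])
```

The query dictionary plays the role of a JSON-like filter: the key names a nested field and the value is what it must equal.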
Collection vs table
Document vs row
_id vs rowid
reference/embedding vs join
Consistency: replica sets
Transactions: atomic transaction at single document level
Availability: high availability through leader-followers. Auto fail-over
Query features: JSON-like query language
Scaling: Sharding is used to scale writes and reads
Working with MongoDB - data modeling
Embedding vs reference
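The two modeling options can be sketched with plain Python dictionaries standing in for documents. The order and item fields below are made-up examples, and resolve_items is a hypothetical helper showing the application-side lookup that references require.

```python
# Embedding: the order items live inside the order document.
order_embedded = {
    "_id": "order-1",
    "customer": "Ann",
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}

# Referencing: the order stores only item ids; items are separate documents.
items = {
    "item-1": {"_id": "item-1", "sku": "A1", "qty": 2},
    "item-2": {"_id": "item-2", "sku": "B7", "qty": 1},
}
order_referenced = {
    "_id": "order-1",
    "customer": "Ann",
    "item_ids": ["item-1", "item-2"],
}

def resolve_items(order, item_collection):
    """Application-side 'join': look up each referenced item by id."""
    return [item_collection[i] for i in order["item_ids"]]

embedded_skus = [i["sku"] for i in order_embedded["items"]]
referenced_skus = [i["sku"] for i in resolve_items(order_referenced, items)]
print(embedded_skus, referenced_skus)
```

Embedding returns everything in one read; referencing avoids duplicating data that is shared by many documents, at the cost of extra lookups.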
MongoDB demo
mongo CLI (shell)
MongoDB Compass (UI)
Columnar database
Data is stored in cells
Each row is associated with a row key
Unlimited columns in a row
Cassandra
Column is a name-value pair
Good for write-intensive applications
Cassandra vs RDBMS
Keyspace vs database
Column family vs table
Row vs row
Column vs column
Column database - Cassandra
Consistency
Commit log then write into memtable
Transactions
Atomic transaction only at the single row level
Availability
Highly available
Query features
Cassandra query language
Scaling
Sharding for writes and reads
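The "commit log, then memtable" write path noted under Consistency can be sketched as a toy in-memory model. TinyWriteNode is a hypothetical class, the commit log is a Python list rather than a file on disk, and the later flush of memtables to SSTables is left out.

```python
class TinyWriteNode:
    """Toy node: append to a commit log first, then update the memtable."""

    def __init__(self):
        self.commit_log = []   # stands in for the on-disk, append-only log
        self.memtable = {}     # in-memory table keyed by row key

    def write(self, row_key, column, value):
        # 1. Record the mutation in the log before anything else.
        self.commit_log.append((row_key, column, value))
        # 2. Apply it to the in-memory structure that serves reads.
        self.memtable.setdefault(row_key, {})[column] = value

    def recover(self):
        """Rebuild the memtable by replaying the commit log after a crash."""
        self.memtable = {}
        for row_key, column, value in self.commit_log:
            self.memtable.setdefault(row_key, {})[column] = value

node = TinyWriteNode()
node.write("user:1", "name", "Ann")
node.write("user:1", "city", "Seattle")
node.memtable = {}      # simulate losing in-memory state
node.recover()
print(node.memtable)
```

Because each write is only an append plus an in-memory update, this path is cheap, which is one reason such stores suit write-intensive workloads.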
Working with Cassandra
Tables and keys
CREATE TABLE
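The partition key (the first part of the primary key) is also the sharding key
PRIMARY KEY ((actor), release_year, movie_id): actor is the partition key; release_year and movie_id are clustering columns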
Primary key defines how we can query data
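To see why the primary key dictates the query shape, the sketch below models a table keyed by ((actor), release_year, movie_id) in plain Python. The rows, the partitions dictionary, and movies_by_actor are illustrative assumptions, not Cassandra APIs.

```python
from collections import defaultdict

# Hypothetical rows for a movies-by-actor table.
rows = [
    {"actor": "Tom Hanks", "release_year": 1994, "movie_id": 2, "title": "Forrest Gump"},
    {"actor": "Tom Hanks", "release_year": 1988, "movie_id": 1, "title": "Big"},
    {"actor": "Meg Ryan", "release_year": 1993, "movie_id": 3, "title": "Sleepless in Seattle"},
]

# The partition key (actor) picks the partition; the clustering columns
# (release_year, movie_id) define the sort order inside it.
partitions = defaultdict(list)
for row in rows:
    partitions[row["actor"]].append(row)
for part in partitions.values():
    part.sort(key=lambda r: (r["release_year"], r["movie_id"]))

def movies_by_actor(actor, since_year=None):
    """Fix the partition key, optionally range on the first clustering column."""
    result = partitions.get(actor, [])
    if since_year is not None:
        result = [r for r in result if r["release_year"] >= since_year]
    return [r["title"] for r in result]

print(movies_by_actor("Tom Hanks"))
print(movies_by_actor("Tom Hanks", since_year=1990))
```

A lookup by actor touches one partition and reads rows already in order; a lookup by title alone would have to scan every partition, which is why the table is designed around the query.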
Key-value databases
Simplest form
Redis is the most popular key-value database
Key-value DB vs RDBMS
Key-value vs row
Consistency: applies to operations on a single key
Transactions: multi/watch/exec. Lua script
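The MULTI/WATCH/EXEC flow is a form of optimistic locking: commands are queued, and EXEC aborts if a watched key changed in the meantime. The sketch below imitates that idea in memory; TinyStore and TinyTransaction are hypothetical classes, and per-key version counters are an assumption used here to detect changes, not a description of Redis internals.

```python
class TinyStore:
    """Toy key-value store with per-key versions for optimistic checks."""

    def __init__(self):
        self.data = {}
        self.versions = {}

    def set(self, key, value):
        self.data[key] = value
        self.versions[key] = self.versions.get(key, 0) + 1

class TinyTransaction:
    """Mimics the WATCH / MULTI / EXEC flow in a simplified form."""

    def __init__(self, store):
        self.store = store
        self.watched = {}
        self.queue = []

    def watch(self, key):
        # Remember the version seen at WATCH time.
        self.watched[key] = self.store.versions.get(key, 0)

    def queue_set(self, key, value):
        # After MULTI, commands are queued instead of executed.
        self.queue.append((key, value))

    def execute(self):
        # EXEC aborts if any watched key changed since WATCH.
        for key, seen in self.watched.items():
            if self.store.versions.get(key, 0) != seen:
                return False
        for key, value in self.queue:
            self.store.set(key, value)
        return True

store = TinyStore()
store.set("balance", 100)

tx = TinyTransaction(store)
tx.watch("balance")
tx.queue_set("balance", 90)
store.set("balance", 50)        # another client writes in between
aborted_ok = tx.execute()       # watched key changed, so nothing is applied

tx2 = TinyTransaction(store)
tx2.watch("balance")
tx2.queue_set("balance", 40)
committed_ok = tx2.execute()    # no interference, queued write is applied
print(aborted_ok, committed_ok, store.data["balance"])
```

A client whose transaction aborts would typically re-read the value and retry.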
Audience question
Partitioning: shard data across individual databases
Replication: duplicate the entire database
Working with Redis
String
We can set expire time
List
LPUSH, RPUSH
Hashes
HSET
HMSET
…
SET ckey "Redis demo"
UI: RedisInsight
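The data types above can be imitated in a few lines of Python to show what each command does to the stored value. TinyRedis is a hypothetical in-memory stand-in, not a Redis client, and the explicit now argument is an assumption added so that expiry can be shown without waiting.

```python
from collections import deque

class TinyRedis:
    """In-memory imitation of a few Redis commands (not a Redis client)."""

    def __init__(self):
        self.strings = {}   # key -> (value, expire_at or None)
        self.lists = {}     # key -> deque
        self.hashes = {}    # key -> dict

    def set(self, key, value, now=0.0, ex=None):
        expire_at = now + ex if ex is not None else None
        self.strings[key] = (value, expire_at)

    def get(self, key, now=0.0):
        value, expire_at = self.strings.get(key, (None, None))
        if expire_at is not None and now >= expire_at:
            del self.strings[key]   # lazily drop the expired key
            return None
        return value

    def lpush(self, key, value):
        self.lists.setdefault(key, deque()).appendleft(value)

    def rpush(self, key, value):
        self.lists.setdefault(key, deque()).append(value)

    def hset(self, key, field, value):
        self.hashes.setdefault(key, {})[field] = value

r = TinyRedis()
r.set("ckey", "Redis demo", now=0.0, ex=10)   # string with a 10-second expiry
r.rpush("queue", "b")
r.lpush("queue", "a")
r.rpush("queue", "c")
r.hset("user:1", "name", "Ann")
print(r.get("ckey", now=5.0), list(r.lists["queue"]), r.hashes["user:1"])
```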
Graph database
Contains a collection of nodes and edges
Nodes
Edges (relationships), e.g. LOVES
Labels
Properties
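A minimal sketch of this property-graph model in plain Python, assuming made-up people and a single LOVES edge. The nodes and edges structures and the match_outgoing helper are illustrative only and say nothing about how a graph database stores data internally.

```python
# Nodes carry a label and properties; edges carry a type and two endpoints.
nodes = {
    1: {"label": "Person", "props": {"name": "Dan"}},
    2: {"label": "Person", "props": {"name": "Ann"}},
}
edges = [
    {"type": "LOVES", "start": 1, "end": 2, "props": {}},
]

def match_outgoing(label, prop_key, prop_value, edge_type):
    """Find end nodes reachable from matching start nodes over edge_type."""
    starts = {nid for nid, n in nodes.items()
              if n["label"] == label and n["props"].get(prop_key) == prop_value}
    return [nodes[e["end"]]["props"]
            for e in edges
            if e["type"] == edge_type and e["start"] in starts]

whom = match_outgoing("Person", "name", "Dan", "LOVES")
print(whom)
```

The helper answers "whom does Dan love?" by following the edge direction from start to end node.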
Graph DB vs RDBMS
Maps directly to the structures of OO applications vs impedance mismatch problem
Relation
Graph DB - Neo4j
ACID-compliant
Implemented in Java
Multi-platform
Supports scale up and out
Billions of entities, schemaless
Consistency: leader-follower
Does not support distributing nodes/relationships on different servers
Sharding is not supported in graph DB
Most popular commands
MATCH (:Person {name: "Dan"})-[:LOVES]->(whom) RETURN whom
CREATE
Neo4j - demo
Can apply graph algorithms such as DFS and BFS
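For reference, breadth-first search over a small graph can be sketched as follows. The adjacency list and names are hypothetical; a graph database would run such traversals over its stored nodes and relationships.

```python
from collections import deque

# Hypothetical graph as an adjacency list.
graph = {
    "Dan": ["Ann", "Bob"],
    "Ann": ["Cid"],
    "Bob": ["Cid"],
    "Cid": [],
}

def bfs_order(adjacency, start):
    """Visit nodes level by level, returning them in visit order."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)
    return visited

order = bfs_order(graph, "Dan")
print(order)
```

Swapping the queue for a stack (popping from the same end that is pushed to) turns the same loop into a depth-first traversal.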
Time series database
Usually used to capture metrics
InfluxDB - famous
Time-structured merge tree (TSM)
Write-ahead log (WAL)
InfluxQL is SQL-like
Availability - supports horizontal scaling (sharding)
Not a transactional database
InfluxDB vs RDBMS
Tags vs indexed columns
Other concepts
Measurement
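To make measurements and tags concrete, the sketch below formats one data point in InfluxDB's line protocol (measurement, tag set, field set, timestamp). The to_line_protocol helper and the sample values are hypothetical, and the formatter is deliberately simplified: float fields only, with no escaping of spaces or commas.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value field=value timestamp.

    Simplified: float fields only, no escaping of spaces or commas.
    """
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))
    head = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{head} {field_part} {timestamp_ns}"

line = to_line_protocol(
    measurement="cpu",
    tags={"host": "server01", "region": "us-west"},
    fields={"usage": 0.64},
    timestamp_ns=1465839830100400200,
)
print(line)
```

Tags are indexed and suit values you filter or group by (such as host); the measured numbers themselves go into fields.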
Time series database Usage
Usually used with other tools
TICK stack (Telegraf, InfluxDB, Chronograf, Kapacitor)
InfluxDB - DEMO
Q&A
What's the purpose of having a column family when you already have columns?
A column family is like a table
MongoDB: e-commerce (orders), blogging
Key-value: caching user profiles and session information; may also store shopping cart data
Graph DB: recommendation engines, location-based services, any highly connected data
Cassandra: event logging
Partition and replica
Partition - separate data to different DB nodes
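Partitioning and replication can be combined, as in the sketch below: a stable hash picks the node that owns a key, and the following node holds a copy. The node names, partition_for, and placement are assumptions for illustration; real systems commonly use more elaborate schemes such as consistent hashing.

```python
import hashlib

NODES = ["db-0", "db-1", "db-2"]   # hypothetical node names

def partition_for(key, node_count):
    """Map a key to a partition index with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count

def placement(key, nodes, replicas=2):
    """Primary node for the key plus the next nodes as replicas."""
    start = partition_for(key, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]

for key in ["user:1", "user:2", "user:3"]:
    print(key, placement(key, NODES))
```

hashlib is used instead of Python's built-in hash() because the built-in string hash changes between interpreter runs, which would move keys to different nodes on every restart.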
How much data can be held in a single DB?
Depends on the computer’s disk size
Indexes used by different databases