Design Google Calendar

System DesignScheduling & Calendar

Materials — open to everyone, no sign-in

Topic: Design Google Calendar

Interviewer: system

Level: L5 (Senior)

Additional Resources:


System Design Interview

Join Us on Wechat

Interview Notes

[7]

Requirements

500M total users

100M DAU

QPS: 1000 QPS for write

10k QPS for reading

[peak traffic?]

[11]

API:

Booking event

Request:

Sender, invitee, title, time, location, test

GET /calendar-event

[how to handle anonymous user?]

[13]

Relational database

[throughput calculation?]

Events

eventID, sender, invitee(string), title, time, location, text

[database normalization? Some information may be repeated]

[17]

eventID, sender, invitee(string), title, location, from_ts, to_ts

Calculate data volume

1.5TB per year

15TB over 10 years

[ volume is (too) big for single instance ]

[how to support multiple invitees? Will there be multiple rows]

[invitee is email or ?]

[read write throughput for one machine? Is it sufficient?]

[how to update the meeting time?]

[21]

[23]

How to support large number of invitees

Range query for meetings within an interval

Select * from meetings where time > from_ts and time < to_ts

May have 2 indices but still need to merge the result

[ in relational DB, can use composite index]

Optimize the query with slot ID

[27]

Optimizing for range query

[ spending too much time on range query optimization. Like to see other components in the design ]

[30]

Defining primary key

PK: sender + eventID

Index: dayID

Range query:

Continue on index

Index: participant + day ID

Select *

from meetings

where dayID in (day1, day2) and participant = “alf”

3k meeting per month

40k meetings per person

Need pagination

[38 - 54]

Like to clarify these questions:

[how to support multiple invitees? Will there be multiple rows]

[invitee is email or ID?]

[read write throughput for one machine? Is it sufficient?]

[how to update the meeting time?]

Invitees by email?

[database normalization. Some fields are repeated across rows]

Q: Primary key?

A: normalize the table. Factor out event detail table

[41]

Change event time requires change of multiple rows

[44]

[ system design is not complete yet]

Q: Handling users outside of the system?

A: add email notification system

Add meeting processor for changes of the events

[asynchronous]

[ discuss distributed system ]

Q: how to send email with high throughput

A: add message queue

[48]

[relational vs non-relational database: throughput and volume can be too high. For relational database, we may shard by time range. Throughput is still high ]

Q: can DB handle 1000 QPS?

A: small writes. Could be writing 3000 rows per second due to multi-invitees for some meetings

Backup longer term data

Prefer to use same database technology for old and new data

===

Discussion after Interview

Shard by time range may not be good

The QPS is high. May add cache for today’s calendar

Can create 2 databases:

User

Events - a very big table.

It’s difficult to support event table using traditional database.

Indices to maintain: user, events, time range. Difficult to maintain index on large data volume

Recommend to use non-relational database

How to solve conflict?

If another user has a meeting at the same spot.

User can receive multiple invitations at the same time

Should gather the requirements

If we need to support the meeting room.

Event - location, start-time, end-time.

Rest API can also solve this problem.

No-sql database, may require atomic support across different machines

How to shard?

Can use user ID to shard / partition key

Heavier read, lighter write

Usually use invitee ID as partition key

Possible points for drill down

Database schema

How to sync data to client

How to save to database, and how to index the database

Micro services

1:24:49:10

1:25:47:09