Bird's Eye View of Ads Systems


Topic: Bird’s Eye View of Ads Systems

Presenter: Leo 玩中生活

Additional Resources:


System Design Presentation - Bird’s Eye View of Ads System


Level of audience: Ads system 101

Source of information: internet

Topics

Ads’ importance

Use cases of ads system

Engineering design

Challenges

Why are ads systems important?

Connect products with people

2% of US GDP

76% of Google gross revenue

27% year-over-year growth in Amazon ads revenue

Internet users:

See ads during website visit

Goal is to have fun

May not like seeing too many ads

Business owner:

Wants to deliver ads to a target audience based on location, gender, interests, etc.

Wants to set a budget

Wants to track the performance of the ads

Platform owner:

Grow demand (advertisers)

Grow supply (active users)

A simplified flow of ads system

Business owner: create an ad (add to ads inventory)

Ads platform: indexes the ads inventory

User visits a website, which loads ads from the ads ranking system of the ads platform

Website sends user activity to the platform: userID, adID, placement

User clicks on the ad and takes an action. Website sends the user action to the platform

Business owner: gets a report of

Impressions

Purchases

Business owner may adjust budget based on the feedback

Ads measurement feeds into ads ranking

Inventory service

Requirements: Strong consistency, durability, scalability (millions of ads)

Relational database for strong consistency

Sharding for scalability

Replication for durability

Index service

Goal: speed up ad retrieval

Skip list, B+ tree in relational DB

LSM tree in key-value database

Inverted index in Lucene (a document search library)

Maps keywords to ad IDs

Business owner can define an audience group (gender, geolocation, habits, age)

An inverted index can map each of the above criteria to ad IDs
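
A minimal sketch of an inverted index from targeting criteria to ad IDs, as described above. The class name, attribute names, and the rule that an ad matches only if it lists the user's value for every attribute are assumptions made for illustration, not the design of any particular ads platform.

```python
from collections import defaultdict


class InvertedIndex:
    """Maps (attribute, value) pairs to the set of ad IDs targeting them."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_ad(self, ad_id, targeting):
        # targeting: attribute -> list of accepted values,
        # e.g. {"gender": ["f"], "geo": ["US", "CA"]}
        for attribute, values in targeting.items():
            for value in values:
                self.postings[(attribute, value)].add(ad_id)

    def lookup(self, user):
        # Intersect the posting lists of every attribute of the user.
        result = None
        for attribute, value in user.items():
            ads = self.postings.get((attribute, value), set())
            result = set(ads) if result is None else result & ads
        return result if result is not None else set()


index = InvertedIndex()
index.add_ad("ad1", {"gender": ["f"], "geo": ["US", "CA"]})
index.add_ad("ad2", {"gender": ["f", "m"], "geo": ["US"]})
matches = index.lookup({"gender": "m", "geo": "US"})
```

Retrieval cost depends on the size of the posting lists touched, not on the size of the whole inventory.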

Ranking service

Goal: select 30 relevant ads from an inventory of millions within XX milliseconds

Solutions:

1st layer ranking - millions of ads to tens of thousands of ads

2nd layer ranking - tens of thousands of ads to hundreds of ads

Final layer - a heavier model chooses the 30 ads
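
The layered ranking above can be sketched as a funnel of (scoring function, keep count) stages, cheapest scorer first. The scoring functions and the fields "bid" and "quality" are hypothetical stand-ins for real models. The toy sizes are scaled down from the millions-to-30 funnel.

```python
import heapq


def rank_funnel(ads, stages):
    """Apply each (score_fn, keep_count) stage to the survivors of the previous one."""
    candidates = ads
    for score_fn, keep_count in stages:
        candidates = heapq.nlargest(keep_count, candidates, key=score_fn)
    return candidates


# Toy inventory: 1000 ads with made-up features.
ads = [{"id": i, "bid": i % 10, "quality": i} for i in range(1000)]

stages = [
    (lambda ad: ad["bid"], 100),                 # 1st layer: very cheap filter
    (lambda ad: ad["bid"] * ad["quality"], 10),  # 2nd layer: slightly richer score
    (lambda ad: ad["quality"], 3),               # final layer: stand-in for a heavy model
]

final_ads = rank_funnel(ads, stages)
```

The expensive scorer only runs on the few candidates that survive the cheap layers, which is what keeps the latency bounded.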

Billing service

Goal: high availability, end-to-end consistency

Solutions:

Process actions asynchronously to ensure high availability

Idempotent processing so each event is billed exactly once (no more, no less)

Message queue - exactly once

Globally unique ID + deduplication
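
A minimal sketch of the "globally unique ID + dedupe" idea, assuming every billable event carries an event ID assigned once at the source. The class and field names are hypothetical. A real system would keep the seen-ID set in durable storage, not in process memory.

```python
class BillingLedger:
    """Applies each billable event at most once, keyed by its unique event ID."""

    def __init__(self):
        self.seen_event_ids = set()
        self.charged_cents = {}  # ad_id -> total charged, in cents

    def charge(self, event_id, ad_id, amount_cents):
        # A redelivered event (same ID) is ignored, so retries are safe.
        if event_id in self.seen_event_ids:
            return False
        self.seen_event_ids.add(event_id)
        self.charged_cents[ad_id] = self.charged_cents.get(ad_id, 0) + amount_cents
        return True


ledger = BillingLedger()
first = ledger.charge("evt-1", "ad-42", 50)
duplicate = ledger.charge("evt-1", "ad-42", 50)  # queue redelivery
```

With at-least-once delivery from the queue plus this idempotent consumer, the billed amount behaves as if each event were processed exactly once.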

[typical latency?]

Measurement service: Aggregation reporting

Ads impression record: user ID, ads ID, timestamp

Ads action record: user ID, action (e.g. purchase)

Attribution: Link action with impression

Aggregation
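
A sketch of attribution followed by aggregation. It assumes impression records carry a user ID, that action records also carry a timestamp, and that an action is credited to the user's most recent impression within a lookback window (last-touch attribution). None of these choices come from the talk; they are just one common way to do it.

```python
from collections import Counter


def attribute(impressions, actions, window_seconds=86400):
    """impressions: (user_id, ad_id, ts); actions: (user_id, action, ts)."""
    by_user = {}
    for user_id, ad_id, ts in sorted(impressions, key=lambda record: record[2]):
        by_user.setdefault(user_id, []).append((ts, ad_id))

    conversions = Counter()  # (ad_id, action) -> count
    for user_id, action, ts in actions:
        # Impressions seen by this user before the action, inside the window.
        prior = [
            (seen_ts, ad_id)
            for seen_ts, ad_id in by_user.get(user_id, [])
            if ts - window_seconds <= seen_ts <= ts
        ]
        if prior:
            last_ad_id = prior[-1][1]  # most recent impression wins
            conversions[(last_ad_id, action)] += 1
    return conversions


impressions = [("u1", "adA", 100), ("u1", "adB", 200), ("u2", "adA", 150)]
actions = [("u1", "purchase", 250), ("u2", "purchase", 100), ("u2", "purchase", 160)]
report = attribute(impressions, actions)
```

The purchase by u2 at time 100 is not attributed, because that user had not yet seen any ad.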

Measurement service:

Support near real-time reporting across dozens of dimensions

Message queue + streaming, e.g. Kafka + Flink
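
To illustrate what the streaming job computes, here is a plain-Python sketch of tumbling-window counts grouped by a chosen set of dimensions. It does not use the Kafka or Flink APIs; the event fields are assumptions.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds, dimensions):
    """Count events per (window start, dimension values...) key."""
    counts = Counter()
    for event in events:
        # Round the timestamp down to the start of its window.
        window_start = event["timestamp"] // window_seconds * window_seconds
        key = (window_start,) + tuple(event[name] for name in dimensions)
        counts[key] += 1
    return counts


events = [
    {"timestamp": 5, "ad_id": "a", "placement": "top"},
    {"timestamp": 59, "ad_id": "a", "placement": "top"},
    {"timestamp": 61, "ad_id": "a", "placement": "top"},
    {"timestamp": 62, "ad_id": "b", "placement": "side"},
]
per_minute = tumbling_window_counts(events, 60, ("ad_id", "placement"))
```

Reporting on another axis only changes the dimensions tuple, which is why one pipeline can serve many report shapes.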

Sharding

Goal:

divide workload

batch the actions

In-memory pre-batching to reduce database writes

Use cases

Ads inventory: consistent hashing

Budget check, report aggregation:

Group related transactions together by sharding on ads ID

Batch in memory, then update the database
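
A minimal consistent-hashing sketch for routing by ads ID, so that all events for one ad land on the same shard. The shard names, the number of virtual nodes, and the use of SHA-256 are illustrative assumptions.

```python
import bisect
import hashlib


def _hash(key):
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["shard-0", "shard-1", "shard-2"])
bigger_ring = HashRing(["shard-0", "shard-1", "shard-2", "shard-3"])
shard = ring.node_for("ad-1111")
```

When a shard is added, a key either stays where it was or moves to the new shard, so only a fraction of the ads need to be relocated.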

[inventory ads, ]

Error handling

Infrastructure

Backoff retry

Regional traffic control - manual or automated cut-off

Model fallback
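
The backoff retry item above can be sketched as exponential backoff with full jitter. The parameter names and defaults are assumptions; the sleep function is injectable only to make the sketch easy to test.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1,
                       max_delay=5.0, sleep=time.sleep):
    """Call operation(); on failure wait a random time up to an exponentially growing cap."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # jitter avoids synchronized retries


calls = {"count": 0}
waits = []


def flaky():
    # Fails twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = retry_with_backoff(flaky, sleep=waits.append)
```

The jitter matters when many clients fail at once: without it they would all retry at the same moments and hit the recovering service together.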

Cold start: new ads or new user

Last seen

Trending

Similarity (offline)

Random
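
One possible way to combine the cold-start strategies above is a fallback chain: try each candidate source in order and pad with random ads if there are still too few. Treating the list as an ordered chain is an assumption, and the source functions here are hypothetical placeholders.

```python
import random


def cold_start_candidates(user_id, sources, k, all_ads, rng=random):
    """Collect up to k distinct ad IDs from ordered sources, then fill randomly."""
    picked = []
    for source in sources:
        for ad_id in source(user_id):
            if ad_id not in picked:
                picked.append(ad_id)
            if len(picked) == k:
                return picked
    # Not enough candidates: fall back to random ads not already picked.
    remaining = [ad_id for ad_id in all_ads if ad_id not in picked]
    rng.shuffle(remaining)
    return picked + remaining[: k - len(picked)]


def last_seen(user_id):
    return []  # new user: no history yet


def trending(user_id):
    return ["t1", "t2"]


def similar_offline(user_id):
    return ["t2", "s1"]  # e.g. precomputed offline


sources = [last_seen, trending, similar_offline]
all_ads = ["t1", "t2", "s1", "r1", "r2"]
candidates = cold_start_candidates("new-user", sources, 4, all_ads, rng=random.Random(0))
```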

Privacy challenges

Platform limitation

3rd-party cookies

Cross-app tracking: results come back as aggregates (e.g. total purchases during a time interval), not per specific user

Regulation limitation

Digital Markets Act

Online behavioral advertising

Stricter limits on leveraging 3rd-party data

Impact

Least impact for Amazon (first-party data)

Medium impact for Google (lots of data sources)

Most impact for social platforms such as Meta (conversion data comes from 3rd parties)

[conversion tracking most difficult]

Audience:

Q: How does the cross-app tracking limitation impact ads tracking?

A: It affects impression and conversion tracking.

3rd-party signals can no longer be used for targeting, or the data from 3rd parties is lost

Q: Billing system latency?

A: Sharding and batching (billing service, Flink). Shard by ads ID, so the same ad is always handled by the same machine

Q: Is today’s talk about one company?

A: it’s about big ads companies

Q: How do you learn about the design of the overall system?

A: Knowledge sharing inside the company, plus Bilibili and YouTube

Q: Memory-related inconsistency issues?

A: Such as when a machine crashes?

Q: Fail-over and source of truth? Is the truth in memory or in the database?

A: Billing has only one service as the writer, and it is the source of truth

Data loss: some data may be lost. Checkpoints are kept on the local host

Q: Sharding?

A: Message queue sharding for events.

Storage sharding: ads, budget limits

Message queue sharding and storage sharding are independent

Q: Is the budget always calculated by one machine?

A: Events for adsID 1111 may arrive at 1000 machines; message queue sharding sends them back to the same machine.

After 5 minutes, the events are flushed to the database master

Hot task -> sub-sharding
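
A sketch of the batch-then-flush behavior described in this answer: the machine that owns an ad accumulates spend in memory and writes one combined update per interval. The class name, the 300-second default, and the flush callback are assumptions. Time is passed in explicitly to keep the example deterministic.

```python
class SpendBatcher:
    """Accumulates per-ad spend in memory and flushes it periodically."""

    def __init__(self, flush_fn, flush_interval_seconds=300):
        self.flush_fn = flush_fn  # e.g. a function that writes to the database
        self.flush_interval_seconds = flush_interval_seconds
        self.pending_cents = {}
        self.last_flush = None

    def record(self, ad_id, amount_cents, now):
        if self.last_flush is None:
            self.last_flush = now
        self.pending_cents[ad_id] = self.pending_cents.get(ad_id, 0) + amount_cents
        if now - self.last_flush >= self.flush_interval_seconds:
            self.flush(now)

    def flush(self, now):
        if self.pending_cents:
            self.flush_fn(dict(self.pending_cents))  # one write for many events
            self.pending_cents.clear()
        self.last_flush = now


database_writes = []
batcher = SpendBatcher(database_writes.append)
batcher.record("ad-1111", 10, now=0)
batcher.record("ad-1111", 5, now=100)
batcher.record("ad-2222", 7, now=200)
batcher.record("ad-1111", 1, now=300)  # interval elapsed: triggers a flush
```

Anything still in memory at crash time is lost unless it is checkpointed, which is the data-loss trade-off raised in the Q&A.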

Q: auto scaling?

A: yes

Q: challenges?

A: users + ads, two tower sharding.

Online model: sparse neural network

Q: Aggregation happens on the same machine every 1 minute, with strong consistency. Can an ad still run over its budget?

A: That was the simplified flow. In reality, a pacer controls the rate of ad impressions

The ads ranking system calls the pacer.
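
A minimal sketch of what a pacer could look like, assuming simple even pacing: cumulative spend may never run ahead of the share of the day that has elapsed. The talk does not describe the pacing policy, so the class, its fields, and the even-pacing rule are all hypothetical.

```python
class BudgetPacer:
    """Even pacing: spend may not exceed budget * (elapsed time / day length)."""

    def __init__(self, daily_budget_cents, day_seconds=86400):
        self.daily_budget_cents = daily_budget_cents
        self.day_seconds = day_seconds
        self.spent_cents = 0

    def allow(self, seconds_into_day, cost_cents):
        elapsed = min(seconds_into_day, self.day_seconds)
        allowed_so_far = self.daily_budget_cents * elapsed // self.day_seconds
        if self.spent_cents + cost_cents > allowed_so_far:
            return False  # ranking should skip this ad for now
        self.spent_cents += cost_cents
        return True


# Toy "day" of 100 seconds with a budget of 10000 cents.
pacer = BudgetPacer(10000, day_seconds=100)
decisions = [
    pacer.allow(10, 500),  # 500 of the 1000 allowed so far
    pacer.allow(10, 600),  # would reach 1100 > 1000
    pacer.allow(20, 600),  # 1100 of the 2000 allowed so far
]
```

The ranking system would call allow() before serving an ad, so the budget is enforced at serving time instead of only at the periodic aggregation.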

Q: What is an ads impression?

A: A user views the ad, e.g. it played for a few seconds

Q: 3rd party data?

A:

Signals move from 3rd party to 1st party.

Aggregated data, e.g. there were 3 conversions in the last hour. A model may be used to make predictions from it

The platform limitation applies only to data collected through the platform. Advertisers can still call the advertising platform’s API

Q: How good are online courses for ads infrastructure?

A: Online courses can give you an overview. The rest comes from work experience

Q: Adblock?

A: Need to study more. It may block the request to the ads platform

Q: big challenges

A:

Recommendation

Big data

Billing: financial, exactly once, high throughput