System Design for Recommendations and Search // Eugene Yan // MLOps Meetup #78
MLOps.community

Published on Sep 17, 2021

Join us at our first in-person conference on June 25 all about AI Quality: https://www.aiqualityconference.com/

MLOps Community Meetup #78! Last Wednesday we talked to Eugene Yan, an Applied Scientist at Amazon.

//Abstract
What does system design for industrial recommendations and search look like? In this talk, Eugene Yan shares how it's often split into:
- Latency-constrained online vs. less-demanding offline environments, and
- Fast but coarse candidate retrieval vs. slower but more precise ranking

We'll also see examples of system design from companies such as Alibaba, Facebook, JD, DoorDash, LinkedIn, and maybe do a quick walk-through on how to implement a candidate retrieval MVP.
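
// Candidate retrieval sketch
Not from the talk itself, but a rough sense of the candidate retrieval MVP Eugene walks through: learn item embeddings from interaction sequences (word2vec-style, as in the Instagram example below), then retrieve similar items by nearest-neighbor lookup. Python with gensim, numpy, and toy data assumed; the brute-force lookup here stands in for a proper approximate nearest-neighbor index (e.g., Annoy, Faiss, HNSW) at catalog scale.

# Step 1: item embeddings from user interaction sequences (word2vec-style).
# Step 2: candidate retrieval by nearest-neighbor lookup over those embeddings.
import numpy as np
from gensim.models import Word2Vec

# Each "sentence" is one user's sequence of item IDs (toy data for illustration).
sessions = [
    ["item_1", "item_2", "item_3"],
    ["item_2", "item_3", "item_4"],
    ["item_1", "item_4", "item_5"],
]

# Skip-gram: items that co-occur in the same sessions end up close together.
w2v = Word2Vec(sessions, vector_size=32, window=3, min_count=1, sg=1, epochs=50)

item_ids = list(w2v.wv.index_to_key)
vectors = np.array([w2v.wv[i] for i in item_ids])
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit norm for cosine similarity

def retrieve(query_item, k=3):
    # Brute-force cosine similarity; swap in an approximate nearest-neighbor
    # index once the catalog reaches millions of items.
    q = vectors[item_ids.index(query_item)]
    scores = vectors @ q
    ranked = np.argsort(-scores)
    return [(item_ids[i], float(scores[i])) for i in ranked if item_ids[i] != query_item][:k]

print(retrieve("item_2"))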

//Bio
Eugene Yan designs, builds, and operates machine learning systems that serve customers at scale. He's currently an Applied Scientist at Amazon. Previously, he led the data science teams at Lazada (acquired by Alibaba) and uCare.ai. He writes & speaks about data science, data/ML systems, and career growth at eugeneyan.com and tweets at @eugeneyan.

// Relevant links
eugeneyan.com
applyingml.com
https://www.oreilly.com/library/view/...

-------------- ✌️Connect With Us ✌️ ------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, Feature Store, Machine Learning Monitoring and Blogs: https://mlops.community/

Connect with Demetrios on LinkedIn: /dpbrinkm
Connect with Eugene: /eugeneyan

Timestamps:
[00:10] System Design for Recommendations and Search
[01:37] Why: Batch vs. Real-time
[02:05] Batch
- Recommender (key-value DB)
- Recommendations refreshed periodically
[02:21] Real-time
- Recommender (REST/gRPC)
- Recommendations generated in real-time
[02:37] Batch benefits
- Pre-computed
- Decouple compute from serving
- Lower operational load
[03:25] Real-time benefits
- Responsive to time-sensitive context
- Reduce cost on non-visiting users (compute only when users show up)
[06:50] Focus on real-time aka on-demand
[07:00] Offline vs Online aspect
[07:11] Offline aspect
- Host batch processes such as training, index/graph building
- Load data into feature stores
[07:23] Online aspect
- Uses artifacts from the offline environment to serve requests
- Candidate retrieval and ranking
[07:40] Retrieval
- Fast but coarse
- Searches millions of items to get hundreds of candidates
- Approximate nearest neighbors, graphs, etc.
[08:05] Ranking
- Slower but more precise
- Ranks hundreds of candidates
- Adds more features
- Classification or learning to rank
[08:49] Online Retrieval
[09:37] Offline Ranking
[10:50] Online Retrieval
[11:15] Offline Retrieval
[12:25] How: Industry Examples
[12:45] Building item embeddings for candidate retrieval (Alibaba)
[15:31] Building a graph network for ranking (Alibaba)
[17:06] Building embeddings for retrieval in search (Facebook)
[19:10] Building graphs for query expansion and retrieval (DoorDash)
[22:32] Unnecessary real-time over-engineering
[25:05] Real-time timely decision
[26:27] How: Industry Examples (Retrieval)
[26:43] Collaborative Filtering
[30:32] Candidate Retrieval at YouTube (via penultimate-layer embeddings)
[32:06] Candidate Retrieval at Instagram (via word2vec)
[33:53] How: Industry Examples (Ranking)
[33:56] Ranking at Google (via sigmoid)
[35:00] Ranking at YouTube (via weighted logistic regression)
[35:31] Ranking at Alibaba (via Transformer)
[36:16] How: Building an MVP
[36:22] Training: Self-supervised Representation Learning
[37:21] Retrieval: Approximate nearest neighbors
[38:40] Ranking: Logistic Regression (see the sketch after these timestamps)
[39:00] Serving: Multiple instances + Load Balancer (or SageMaker)
[39:38] From two-stage to four-stage
[41:54] Further reading
[43:44] Applied ML page
[52:52] Keeping the habit
[55:26] Recommended books for machine learning
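
// Ranking sketch
Likewise not from the talk: a minimal version of the ranking step at [38:40], assuming scikit-learn and made-up features. Retrieval narrows millions of items down to hundreds of candidates; ranking then scores those candidates with richer features and a simple classifier (logistic regression here) and returns the top few.

# Toy ranking step: score retrieved candidates and keep the best ones.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# One row per (user, candidate item) pair with illustrative features,
# e.g. retrieval similarity, item popularity, user-item category match.
X_train = rng.random((1000, 3))
# Synthetic click labels so the example runs end to end.
y_train = (X_train @ np.array([2.0, 1.0, 1.5]) + rng.normal(0, 0.3, 1000) > 2.2).astype(int)

ranker = LogisticRegression()
ranker.fit(X_train, y_train)

# At serve time: score the few hundred retrieved candidates and sort by
# predicted click probability, returning the top 10 to show the user.
candidates = rng.random((200, 3))
scores = ranker.predict_proba(candidates)[:, 1]
top_10 = np.argsort(-scores)[:10]
print(top_10, scores[top_10])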
