Attention is all you need (Transformer) - Model explanation (including math), Inference and Training

Published on May 28, 2023

A complete explanation of all the layers of a Transformer model (Multi-Head Self-Attention, Positional Encoding, and more), including all the matrix multiplications and a full walkthrough of the training and inference process. The key formulas are recapped after the chapter list below.

Paper: Attention is all you need - https://arxiv.org/abs/1706.03762

Slides PDF: https://github.com/hkproj/transformer...

Chapters
00:00 - Intro
01:10 - RNNs and their problems
08:04 - Transformer Model
09:02 - Maths background and notation
12:20 - Encoder (overview)
12:31 - Input Embeddings
15:04 - Positional Encoding
20:08 - Single Head Self-Attention
28:30 - Multi-Head Attention
35:39 - Query, Key, Value
37:55 - Layer Normalization
40:13 - Decoder (overview)
42:24 - Masked Multi-Head Attention
44:59 - Training
52:09 - Inference
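
For reference, the core formulas from the paper (Vaswani et al., 2017) that the video walks through. Notation follows the paper: d_model is the embedding dimension, d_k the key dimension, h the number of heads.

\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V

\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, \dots, \text{head}_h)\,W^O, \quad \text{head}_i = \text{Attention}(QW_i^Q, KW_i^K, VW_i^V)

PE_{(pos, 2i)} = \sin\!\left(pos / 10000^{2i/d_{model}}\right), \quad PE_{(pos, 2i+1)} = \cos\!\left(pos / 10000^{2i/d_{model}}\right)

And a minimal NumPy sketch of single-head scaled dot-product attention (illustrative only; the function and variable names are my own, not taken from the video or the slides):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q: (seq_len_q, d_k), K: (seq_len_k, d_k), V: (seq_len_k, d_v)
    d_k = Q.shape[-1]
    # Similarity scores, scaled by sqrt(d_k) to keep the softmax well-behaved
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq_len_q, seq_len_k)
    # Row-wise softmax (shift by the row max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the values
    return weights @ V                              # (seq_len_q, d_v)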
