Hybrid Vision Transformer Model (ViT)
EscVM

Published on May 2, 2021

The Hybrid Vision Transformer (ViT) from the paper “An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale”, explained as fast as possible.

An episode of AIQuickie, with TensorFlow 2.x code.

▬ Contents of this video ▬▬▬▬▬▬▬▬▬▬ 👀

0:00 - Intro
0:56 - Theory part
2:13 - Code part

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

Link to the Notebook: https://github.com/EscVM/EscVM_YT/blo...

Link to the paper: https://arxiv.org/pdf/2010.11929.pdf
