Published on May 2, 2021
The Hybrid Vision Transformer (ViT) from the paper “An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale”, explained as fast as possible.
An episode of AIQuickie, with TensorFlow 2.x code.
▬ Contents of this video ▬▬▬▬▬▬▬▬▬▬ 👀
0:00 - Intro
0:56 - Theory part
2:13 - Code part
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Link to the Notebook: https://github.com/EscVM/EscVM_YT/blo...
Link to the paper: https://arxiv.org/pdf/2010.11929.pdf