Hybrid Vision Transformer Model (ViT)

Published On May 2, 2021

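The hybrid Vision Transformer (ViT) from the paper “An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale”, explained as fast as possible. In the hybrid variant, the Transformer's input sequence is built from the feature maps of a CNN instead of from raw image patches.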

An episode of AIQuickie, with TensorFlow 2.x code.
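For readers who want the core idea before watching: in the hybrid model, the patch-embedding step runs on a CNN feature map rather than on raw pixels. What follows is a minimal NumPy sketch under stated assumptions, not the notebook's TensorFlow code. The function name `hybrid_patch_embedding` is hypothetical, a random array stands in for the CNN backbone's output, the "learnable" parameters are just random arrays, and the shapes are illustrative rather than the paper's configuration. With `patch_size=1`, each feature-map location becomes one token, which is the special case the paper describes.

```python
import numpy as np

def hybrid_patch_embedding(feature_map, patch_size, w_proj, cls_token, pos_embed):
    """Hypothetical helper: turn a CNN feature map into a ViT input sequence.

    feature_map: (B, H, W, C) output of a CNN backbone (a placeholder here).
    patch_size:  spatial patch size on the feature map; 1 means every
                 feature-map location becomes one token.
    w_proj:      (patch_size * patch_size * C, D) linear projection.
    cls_token:   (1, D) class token.
    pos_embed:   (1 + num_tokens, D) 1D position embeddings.
    """
    b, h, w, c = feature_map.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "feature map must be divisible by patch_size"
    # Split the feature map into non-overlapping p x p patches and flatten each one.
    x = feature_map.reshape(b, h // p, p, w // p, p, c)
    x = x.transpose(0, 1, 3, 2, 4, 5).reshape(b, (h // p) * (w // p), p * p * c)
    # Linearly project every flattened patch to the Transformer width D.
    tokens = x @ w_proj                                   # (B, N, D)
    # Prepend the same class token to every sequence in the batch.
    cls = np.broadcast_to(cls_token, (b, 1, cls_token.shape[-1]))
    tokens = np.concatenate([cls, tokens], axis=1)        # (B, 1 + N, D)
    # Add position embeddings, broadcast over the batch dimension.
    return tokens + pos_embed

# Illustrative shapes only (assumed, not taken from the paper or the notebook).
rng = np.random.default_rng(0)
B, H, W, C, D = 2, 14, 14, 256, 64
feature_map = rng.standard_normal((B, H, W, C))   # stand-in for CNN backbone output
w_proj = rng.standard_normal((C, D))              # patch_size=1, so C input features
cls_token = rng.standard_normal((1, D))
pos_embed = rng.standard_normal((1 + H * W, D))

out = hybrid_patch_embedding(feature_map, 1, w_proj, cls_token, pos_embed)
print(out.shape)  # (2, 197, 64): 1 class token + 14 * 14 feature-map tokens
```

Setting `patch_size=2` would instead group 2×2 feature-map locations into each token, so `w_proj` would need `4 * C` rows and the sequence would shrink to 1 + 49 tokens. After this step, the sequence goes into the same Transformer encoder as in the standard ViT.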

▬ Contents of this video ▬▬▬▬▬▬▬▬▬▬ 👀

0:00 - Intro
0:56 - Theory part
2:13 - Code part

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

Link to the Notebook: https://github.com/EscVM/EscVM_YT/blo...

Link to the paper: https://arxiv.org/pdf/2010.11929.pdf