Multimodal AI from First Principles - Neural Nets that can see, hear, AND write.
Neural Breakdown with AVB Neural Breakdown with AVB
6.2K subscribers
6,665 views
0

 Published On May 27, 2023

Generative Large Language Models like OpenAI's GPT-4, Google's PaLM 2, and Discriminative models like ImageBind are models released in 2023 that combine visual and textual input to perform multi-modal tasks. Multimodal modeling combines multiple modalities to train neural networks - images, text, audio, etc empowering ML models to perform amazing multimodal tasks like text-image retrieval, multimodal vector arithmetic, visual question answering, and language modelling.

To support the channel and access the Word documents/slides used in this video, consider JOINING the channel on Youtube or Patreon. Members get access to scripts, slides, animations, and illustrations for most of the videos on my channel!

Patreon -   / neuralbreakdownwithavb  

Follow on Twitter: @neural_avb

In this video, I covered the essential published techniques for Multimodal Modelling and so many amazing results of the past few years that have left my jaws on the floor. Hope you enjoy it!

Watch how Multimodal models generate images:
   • If LLMs are text models, how do they ...  

#deeplearning #languagemodel #gpt #computervision

Papers references in this video:
Unifying Visual-Semantic Embeddings: https://arxiv.org/pdf/1411.2539.pdf
CLIP: https://arxiv.org/abs/2102.02779
ImageBInd: https://arxiv.org/abs/2305.05665
BLIP: https://arxiv.org/abs/2201.12086
HERO: https://arxiv.org/pdf/2005.00200.pdf
VL-T5: https://arxiv.org/pdf/2102.02779.pdf
OFA: https://arxiv.org/abs/2202.03052
SimVLM: https://arxiv.org/abs/2108.10904
Frozen: https://arxiv.org/abs/2106.13884
Flamingo: https://arxiv.org/abs/2204.14198
MiniGPT4: https://arxiv.org/abs/2304.10592
Kosmos-1: https://arxiv.org/abs/2302.14045
PaLM-E: https://arxiv.org/abs/2303.03378

Timestamps:
0:00 - Intro
02:55 - Basics
05:05 - Contrastive Learning
07:54 - Masked Visual Language Models
10:20 - Unified Models
13:41 - Generative LLMs

show more

Share/Embed