How to Deploy Models at Scale with GPUs | TransformX 2022
Scale AI

Published on Oct 25, 2022

Graphics Processing Units (GPUs) are used to train artificial intelligence and deep learning models and to serve them in ML inference use cases. However, using GPUs to deploy models at scale can create several challenges for ML practitioners. In this session, Varun Mohan, CEO and Co-Founder of Exafunction, will share the best practices he has learned for building an architecture that optimizes GPUs for deep learning workloads. Mohan will explain the advantages of using GPUs for ML deployment, as well as the cases where they offer fewer benefits, weighing cost, memory, and other factors in the GPU-vs-CPU equation. He will also discuss inefficiencies that can arise in different scenarios, including issues related to network bandwidth and egress, and will offer techniques to address them, such as batching workloads and optimizing models. Finally, he will discuss how some companies use GPUs to run their recommendation and serving systems. Before Exafunction, Mohan was a technical lead and senior manager at Nuro, where he saw the power of deep learning and the large challenges of productionizing it at scale.
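One of the techniques mentioned above, batching workloads, can be illustrated with a minimal micro-batching sketch. The talk does not provide code; the class, names, and parameters below are illustrative assumptions, showing only the general pattern of collecting individual inference requests and running them through the model in batches to amortize per-call GPU overhead.

```python
import queue
import threading
import time


class MicroBatcher:
    """Illustrative micro-batcher: groups individual requests into batches
    before invoking a batched model function (a stand-in for a GPU model).
    Names and defaults are assumptions, not from the talk."""

    def __init__(self, model_fn, max_batch_size=8, max_wait_s=0.01):
        self.model_fn = model_fn          # batched function: list -> list
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s      # latency budget for filling a batch
        self._queue = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def infer(self, item):
        """Submit one item; block until its result is ready."""
        slot = {"input": item, "event": threading.Event(), "output": None}
        self._queue.put(slot)
        slot["event"].wait()
        return slot["output"]

    def _loop(self):
        while True:
            batch = [self._queue.get()]   # block for the first request
            deadline = time.monotonic() + self.max_wait_s
            # Keep filling the batch until it is full or the deadline passes.
            while len(batch) < self.max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            # One batched model call instead of one call per request.
            outputs = self.model_fn([s["input"] for s in batch])
            for slot, out in zip(batch, outputs):
                slot["output"] = out
                slot["event"].set()


# Stand-in for a GPU model: squares each input in one batched call.
batcher = MicroBatcher(lambda xs: [x * x for x in xs], max_batch_size=4)
results = [batcher.infer(i) for i in range(5)]
```

In a real serving system the requests would arrive concurrently, so batches actually fill up; the trade-off is the `max_wait_s` latency budget spent waiting for more requests versus the throughput gained per batched GPU call.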

👉 Check out more here: https://scl.ai/3zpj1DN
