Teaching Large Language Models to Reason with Reinforcement Learning with Alex Havrilla - 680

The TWIML AI Podcast with Sam Charrington

Published On Apr 17, 2024

Today we're joined by Alex Havrilla, a PhD student at Georgia Tech, to discuss his paper "Teaching Large Language Models to Reason with Reinforcement Learning." Alex explains the role of creativity and exploration in problem solving, and the opportunities that reinforcement learning algorithms offer for improving reasoning in large language models. He also shares his research on the effect of noise on language model training, highlighting the robustness of LLM architectures to that noise. Finally, we delve into the future of RL and the potential of combining language models with traditional methods to achieve more robust AI reasoning.

🔔 Subscribe to our channel for more great content just like this: https://youtube.com/twimlai?sub_confi...

🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
Join our Slack Community: https://twimlai.com/community/
Subscribe to our newsletter: https://twimlai.com/newsletter/
Want to get in touch? Send us a message: https://twimlai.com/contact/
Follow us on Twitter: / twimlai
Follow us on LinkedIn: / twimlai

📖 CHAPTERS
===============================
00:00 - Introduction
02:19 - RL vs RLHF
06:22 - The state of RL
07:31 - Path to online learning
11:04 - Teaching LLMs to reason with RL
31:10 - ARB
34:45 - The importance of storing information
35:15 - Static and dynamic noise
45:06 - Conclusion

🔗 LINKS & RESOURCES
===============================
Teaching Large Language Models to Reason with Reinforcement Learning - https://arxiv.org/abs/2403.04642
ARB: Advanced Reasoning Benchmark for Large Language Models - https://arxiv.org/pdf/2307.13692.pdf
Proximal Policy Optimization Algorithms - https://arxiv.org/abs/1707.06347
Prioritized Level Replay - https://arxiv.org/pdf/2010.03934.pdf
Direct Preference Optimization: Your Language Model is Secretly a Reward Model - https://arxiv.org/pdf/2305.18290.pdf
trlX documentation - https://trlx.readthedocs.io/en/latest/

📸 Camera: https://amzn.to/3TQ3zsg
🎙️Microphone: https://amzn.to/3t5zXeV
🚦Lights: https://amzn.to/3TQlX49
🎛️ Audio Interface: https://amzn.to/3TVFAIq
🎚️ Stream Deck: https://amzn.to/3zzm7F5
