Intro to Amazon EMR - Big Data Tutorial using Spark
YouTube Viewers
9.72K subscribers
14,696 views

 Published On Sep 9, 2023

Edits:
Make sure you encrypt your Spark script when you upload it to S3 (timestamp: 13:42).
There's a small typo on line 41 of the code: it should be "add_argument".
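The encryption note can be illustrated with a minimal boto3 sketch. It is not necessarily how the video performs the upload. It assumes boto3 is installed and AWS credentials are configured, and the bucket, key, and KMS key ID you pass in are placeholders. By default it requests server-side encryption with S3-managed keys (AES256). If you give it a KMS key ID, it requests SSE-KMS instead.

```python
"""Sketch: upload a Spark script to S3 with server-side encryption requested.

Assumptions (not from the video): boto3 is installed, AWS credentials are
configured, and bucket/key/KMS key values are caller-supplied placeholders.
"""


def encrypted_upload_args(kms_key_id=None):
    """Build the ExtraArgs dict for boto3's S3 upload_file.

    No KMS key -> SSE-S3 (AES256); with a KMS key ID -> SSE-KMS.
    """
    if kms_key_id is None:
        return {"ServerSideEncryption": "AES256"}
    return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}


def upload_script(local_path, bucket, key, kms_key_id=None):
    """Upload a local file to s3://bucket/key and return its S3 URI."""
    import boto3  # imported lazily so encrypted_upload_args works without boto3

    s3 = boto3.client("s3")
    s3.upload_file(
        local_path, bucket, key, ExtraArgs=encrypted_upload_args(kms_key_id)
    )
    return f"s3://{bucket}/{key}"
```

For example, `upload_script("main.py", "my-bucket", "scripts/main.py")` (hypothetical names) would upload the script with AES256 encryption requested.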

Intro
Today we're going to talk about a popular tool in data engineering. Amazon EMR is an industry-leading big data platform. It's a mature service that launched back in 2009, and it draws heavily on the Apache Hadoop project. EMR is used to process terabytes of data and to train machine learning models. In this tutorial, we'll dive deep into EMR's architecture, walk through a live demo of triggering jobs using Steps, and show how to use Spark to extract and process data from Amazon S3. Hope you enjoy this one!
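A Spark script for this kind of job might look roughly like the sketch below. It is not the video's script. The argument names `--data_source` and `--output_uri`, the CSV input, the `category` column, and the Parquet output are all hypothetical. It does show the argparse `add_argument` calls that the edit note refers to.

```python
"""Sketch of a PySpark job that reads CSV data from S3 and writes results back.

Assumptions (not from the video): argument names, CSV input with a header row,
the "category" column, and Parquet output are hypothetical placeholders.
"""
import argparse


def parse_args(argv=None):
    """Parse the input and output S3 locations from the command line."""
    parser = argparse.ArgumentParser(description="Process S3 data with Spark on EMR")
    # The method is add_argument (the typo mentioned in the edit note).
    parser.add_argument("--data_source", required=True, help="S3 URI of the input CSV data")
    parser.add_argument("--output_uri", required=True, help="S3 URI for the output")
    return parser.parse_args(argv)


def run(data_source, output_uri):
    """Read CSV from data_source, count rows per category, write Parquet."""
    # Imported here so argument parsing can be tested without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("emr-s3-example").getOrCreate()
    try:
        df = spark.read.option("header", "true").csv(data_source)
        # Hypothetical aggregation: number of rows per value of "category".
        counts = df.groupBy("category").count()
        counts.write.mode("overwrite").parquet(output_uri)
    finally:
        spark.stop()


def main(argv=None):
    """Entry point: parse arguments, then run the Spark job."""
    args = parse_args(argv)
    run(args.data_source, args.output_uri)
```

When the file is submitted as an EMR step, it would end with the usual `if __name__ == "__main__": main()` guard. Keeping parsing in its own function lets you check the arguments without starting Spark.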

Timestamps ⏰
0:00 Intro
1:16 Overview of Amazon EMR
5:10 Create filesystem and VPC, configure EMR cluster
9:04 Writing our Spark script
13:42 3 ways to trigger Steps in EMR
18:32 SSH into the YARN ResourceManager
19:50 Enable EMR managed auto-scaling
20:57 Summary
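The description doesn't list the three ways to trigger Steps. The sketch below shows one programmatic option that is assumed, not taken from the video. It builds a Spark step for boto3's EMR `add_job_flow_steps` and a managed scaling policy for `put_managed_scaling_policy`. The cluster ID, script URI, and instance limits are placeholders, and boto3 plus AWS credentials are assumed.

```python
"""Sketch: submit a Spark step to an EMR cluster and set managed scaling.

Assumptions (not from the video): boto3 is installed, credentials are set,
and cluster IDs, S3 URIs, and instance limits are caller-supplied placeholders.
"""


def spark_step(name, script_uri, script_args, action_on_failure="CONTINUE"):
    """Build one EMR step that runs spark-submit through command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # Cluster deploy mode is a choice made for this sketch.
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri, *script_args],
        },
    }


def managed_scaling_policy(min_instances, max_instances):
    """Build a ManagedScalingPolicy whose compute limits count instances."""
    if not 1 <= min_instances <= max_instances:
        raise ValueError("need 1 <= min_instances <= max_instances")
    return {
        "ComputeLimits": {
            "UnitType": "Instances",
            "MinimumCapacityUnits": min_instances,
            "MaximumCapacityUnits": max_instances,
        }
    }


def submit_and_scale(cluster_id, step, policy):
    """Add the step, apply the scaling policy, and return the new step IDs."""
    import boto3  # imported lazily so the builders above work without boto3

    emr = boto3.client("emr")
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    emr.put_managed_scaling_policy(ClusterId=cluster_id, ManagedScalingPolicy=policy)
    return response["StepIds"]
```

Building the request dictionaries in plain functions keeps them easy to inspect before anything is sent to AWS.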

Notes from video 📝
https://bittersweet-mall-f00.notion.s...

Who am I? 🙋🏻‍♂️
I'm Jay, and I love making videos about travel, self-help, and tech. I currently work in New York City as a data engineer, but I grew up in Malaysia and lived in the UK when I was 19. Back then, I had no idea what life was about as I moved between so many places and navigated a career in tech. Today, I've learned a lot and wanna share my perspective through filmmaking.

Socials 📱
Instagram: /jayzern
