Apache Spark Programming

Course Details
Code: DB105
Tuition (USD):
  • Classroom (3 days): $2,500.00
  • Virtual (3 days): $2,500.00

This 3-day course provides a thorough review of the Apache Spark framework, including Spark fundamentals, through a combination of lectures and hands-on labs. It places specific emphasis on skills development and on the unique needs of a Data Engineering team.

This course combines the content of DB 100 - Apache Spark Overview with additional material to provide a comprehensive overview of the Apache Spark framework for Data Engineers.

After working through the Apache Spark fundamentals on the first day, the following days continue with more advanced APIs and techniques, such as a review of specific Readers & Writers, broadcast table joins, and additional SQL functions, along with more hands-on labs. Additionally, the Structured Streaming demos from day 1 are replaced with broader, streaming-specific lectures and labs.

Throughout the three-day course, participants are also introduced to more of the Apache Spark architecture. Topics include, but are not limited to, the DAG execution model, an introduction to the Catalyst Optimizer, and Spark partitioning.

Who Can Benefit

  • This course is ideal for Data Engineers who are new to Apache Spark or who have been using it for less than one year
  • This course is suitable for SQL Analysts who want to move beyond simple SQL queries to the DataFrame APIs
  • This course is suitable for Data Analysts, Data Scientists, and ML Practitioners who have a strong engineering background and would like to gain a deeper understanding of the Spark architecture and APIs

Prerequisites

  • Knowledge of SQL is helpful
  • Experience with either Python or Scala is required
  • Some familiarity with Apache Spark or other big-data processing frameworks is helpful but not required

Software & Hardware Requirements

  • Web Browser: Chrome
  • An Internet Connection
  • GoToTraining (for remote classes only)
  • A computer, laptop, or tablet with a keyboard

Course Outline

  • About Databricks and Spark
  • A high-level overview of the Spark Architecture
  • Spark entry points, simple data ingestion, and an overview of the API docs
  • Hands-on practice with different data ingestion options
  • Hands-on practice with the DataFrame APIs
  • Introduction to Spark's execution model
  • Hands-on practice with performance optimization
  • Introduction to Structured Streaming
  • Introduction to Machine Learning Pipelines