Getting Started with Apache Spark DataFrames

Course Details
Code: SPARK-DF-SELF
Tuition (USD): $75.00 • Self-Paced

This hands-on, self-paced training course targets Analysts and Data Scientists who are getting started with Databricks to analyze big data using Apache Spark™ DataFrames. The course ends with a capstone project that demonstrates Exploratory Data Analysis with Spark DataFrames on Databricks.

Skills Gained

  • Use the interactive Databricks notebook environment
  • Examine external data sets
  • Query existing data sets using Spark DataFrames
  • Visualize query results and data using the built-in Databricks visualization features
  • Perform exploratory data analysis using Spark DataFrames
  • Translate SQL statements to DataFrame syntax

Who Can Benefit

  • Primary Audience: Data Scientists and Engineers
  • Secondary Audience: Data Analysts

Prerequisites

  • Programming experience in Scala or Python is required.

Course Outline

  • Getting Started and Accessing the Course
  • Querying Files with DataFrames
  • Aggregations and JOINs
  • Uploading and Accessing Data
  • Querying JSON & Hierarchical Data with DataFrames
  • Querying Data Lakes with DataFrames
  • Capstone Project: Exploratory Data Analysis

Platforms

  • Supported platforms include Azure Databricks, Databricks Community Edition, and non-Azure Databricks.
  • If you're planning to use the course on Azure Databricks, select the "Azure Databricks" Platform option.
  • If you're planning to use the course on Databricks Community Edition or on a non-Azure version of Databricks, select the "Other Databricks" Platform option.

Format: SELF-PACED

The course consists of six self-paced lessons followed by a final capstone project in which you perform Exploratory Data Analysis using Spark DataFrames on Databricks. Each lesson includes hands-on exercises.