This hands-on, self-paced training course is for analysts and data scientists who are getting started with Databricks and want to analyze big data with Apache Spark™ DataFrames. The course ends with a capstone project that demonstrates exploratory data analysis with Spark DataFrames on Databricks.
Learning Objectives
- Use the interactive Databricks notebook environment
- Examine external data sets
- Query existing data sets using Spark DataFrames
- Visualize query results and data using the built-in Databricks visualization features
- Perform exploratory data analysis using Spark DataFrames
- Translate SQL statements to DataFrame syntax
Who Can Benefit
- Primary Audience: Data Scientists and Engineers
- Secondary Audience: Data Analysts
Prerequisites
- Programming experience in Scala or Python is required.
Lessons
- Getting Started and Accessing the Course
- Querying Files with DataFrames
- Aggregations and JOINs
- Uploading and Accessing Data
- Querying JSON & Hierarchical Data with DataFrames
- Querying Data Lakes with DataFrames
- Capstone Project: Exploratory Data Analysis
Platforms
- Supported platforms are Azure Databricks, Databricks Community Edition, and non-Azure versions of Databricks.
- If you plan to take the course on Azure Databricks, select the "Azure Databricks" Platform option.
- If you plan to take the course on Databricks Community Edition or on a non-Azure version of Databricks, select the "Other Databricks" Platform option.
The course consists of six self-paced lessons and a final capstone project in which you perform exploratory data analysis using Spark DataFrames on Databricks. Each lesson includes hands-on exercises.