This one-day course is for data engineers, architects, data scientists, and software engineers who want to use Databricks Delta for ETL processing on Data Lakes. The course ends with a capstone project in which students build a complete data pipeline using Databricks Delta.
Each topic includes lecture content along with hands-on labs in the Databricks notebook environment. Students may keep the notebooks and continue to use them with the free Databricks Community Edition after the class ends; all examples are designed to run in that environment.
After taking this class, students will be able to:
- Use the interactive Databricks notebook environment.
- Use Databricks Delta to create tables and to append and upsert data in a Data Lake.
- Use Databricks Delta to manage a Data Lake and extract actionable insights from it.
- Use Databricks Delta’s advanced optimization features to speed up queries.
- Use Databricks Delta to seamlessly ingest streaming and historical data.
- Implement a Databricks Delta data pipeline architecture.
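Of the objectives above, "upsert" corresponds to Delta's MERGE INTO operation: rows whose keys match an existing row update it, and unmatched rows are inserted. As a rough illustration of those semantics only (not the Delta API itself), here is a plain-Python sketch using dicts in place of tables; the `upsert` function and the sample rows are hypothetical.

```python
# Conceptual sketch of upsert (MERGE) semantics: rows in `updates` replace
# rows in `target` that share the same key, and rows with new keys are
# inserted. Databricks Delta performs this transactionally on Data Lake
# tables; plain Python dicts are used here purely for illustration.

def upsert(target, updates, key="id"):
    merged = {row[key]: row for row in target}  # index existing rows by key
    for row in updates:
        merged[row[key]] = row                  # matched -> update, else insert
    return list(merged.values())

target = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
updates = [{"id": 2, "value": "B"}, {"id": 3, "value": "c"}]

result = upsert(target, updates)
# id 1 is untouched, id 2 is updated, id 3 is inserted
```

In the course itself, the same operation is expressed declaratively against Delta tables rather than coded by hand.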
Who Can Benefit
Data engineers, software engineers, DevOps engineers, IT operations staff, and team leads with experience using Databricks.
Prerequisites
Students should have completed the Getting Started with Apache Spark™ SQL, Getting Started with Apache Spark™ DataFrames, or ETL Part 1 course, or already have similar knowledge.