In this course, data engineers apply best practices for transforming and writing data, such as user-defined functions, join optimizations, and parallel database writes. By the end of this course, you will transform complex data with custom functions, load it into a target database, and navigate Databricks and Spark documentation to source solutions.
- Apply built-in functions to manipulate data
- Write UDFs with a single DataFrame column input
- Apply UDFs that take multiple DataFrame column inputs and return complex types
- Employ table join best practices relevant to big data environments
- Repartition DataFrames to optimize table inserts
- Write to managed and unmanaged tables
ETL Part 1 self-paced course.
- Course Overview and Setup
- Common Transformations
- User Defined Functions
- Advanced UDFs
- Joins and Lookup Tables
- Database Writes
- Table Management
- Capstone Project: Custom Transformations, Aggregating and Loading
Supported platforms include Azure Databricks, Databricks Community Edition, and non-Azure Databricks.
- If you're planning to use the course on Azure Databricks, select the "Azure Databricks" Platform option.
- If you're planning to use the course on Databricks Community Edition or on a non-Azure version of Databricks, select the "Other Databricks" Platform option.
The course is a series of seven self-paced lessons available in both Scala and Python. A final capstone project involves writing custom, generalizable transformation logic to populate data warehouse summary tables and efficiently writing the tables to a database. Each lesson includes hands-on exercises.