ETL Part 2 - Transformations and Loads

Course Details
Code: ETL2-TRAN-SELF
Tuition (USD): $75.00 • Self Paced

In this course, data engineers apply best practices for transforming and writing data, such as user-defined functions, join optimizations, and parallel database writes. By the end of this course, you will be able to transform complex data with custom functions, load it into a target database, and navigate the Databricks and Spark documentation to source solutions.

Skills Gained

  • Apply built-in functions to manipulate data
  • Write UDFs that take a single DataFrame column as input
  • Apply UDFs that take multiple DataFrame columns as input and return complex types
  • Employ table join best practices relevant to big data environments
  • Repartition DataFrames to optimize table inserts
  • Write to managed and unmanaged tables
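
As a rough illustration of the UDF skills listed above, here is a minimal Python sketch, not taken from the course materials. The function names (normalize_city, order_summary, build_udfs), the column meanings, and the bulk threshold of 10 are all hypothetical. The logic is kept in plain Python functions so it can be checked without a cluster, and the Spark wrapping is assumed to run only where pyspark is installed.

```python
from typing import Optional, Tuple


def normalize_city(raw: Optional[str]) -> Optional[str]:
    """Single-column logic: trim whitespace and title-case a city name."""
    if raw is None:
        return None
    cleaned = raw.strip()
    return cleaned.title() if cleaned else None


def order_summary(
    price: Optional[float], quantity: Optional[int]
) -> Optional[Tuple[float, bool]]:
    """Multi-column logic: return (total, is_bulk), which maps to a struct."""
    if price is None or quantity is None:
        return None
    # The threshold of 10 is an illustrative assumption.
    return (float(price) * quantity, quantity >= 10)


def build_udfs():
    """Wrap the plain functions as Spark UDFs (requires pyspark)."""
    # Imported lazily so the functions above stay usable without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import (
        BooleanType,
        DoubleType,
        StringType,
        StructField,
        StructType,
    )

    # Complex return type: a struct with two named fields.
    summary_type = StructType([
        StructField("total", DoubleType()),
        StructField("is_bulk", BooleanType()),
    ])
    return udf(normalize_city, StringType()), udf(order_summary, summary_type)
```

On a cluster, the two UDFs returned by `build_udfs()` could be applied with `df.withColumn(...)`, passing `col("city")` to the first and `col("price"), col("quantity")` to the second; those column names are likewise assumptions.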

Prerequisites

ETL Part 1 self-paced course.

Course Outline

  • Course Overview and Setup
  • Common Transformations
  • User Defined Functions
  • Advanced UDFs
  • Joins and Lookup Tables
  • Database Writes
  • Table Management
  • Capstone Project: Custom Transformations, Aggregating and Loading

Platforms

Supported platforms include Azure Databricks, Databricks Community Edition, and non-Azure Databricks.

  • If you're planning to use the course on Azure Databricks, select the "Azure Databricks" Platform option.
  • If you're planning to use the course on Databricks Community Edition or on a non-Azure version of Databricks, select the "Other Databricks" Platform option.

Format

The course is a series of seven self-paced lessons available in both Scala and Python. Each lesson includes hands-on exercises. A final capstone project involves writing custom, generalizable transformation logic to populate data warehouse summary tables and efficiently writing those tables to a database.