When does class start/end?
Classes begin promptly at 9:00 am and typically end at 5:00 pm.
When you feel constrained by the computing power of a single computer, you can leverage the Apache Spark platform's massively parallel processing capabilities using PySpark, the Python API for Spark. Along with introducing PySpark, this course covers the Spark Shell for interactively exploring and manipulating data. Spark SQL is introduced as a uniform programming API for working with structured data. The course concludes with Pandas for data manipulation and analysis and seaborn for data visualization.
Knowledge of SQL, familiarity with Python (or the ability to learn the basics of a new language)
Chapter 1. Introduction to Apache Spark
Chapter 2. The Spark Shell
Chapter 3. Introduction to Spark SQL
Chapter 4. Practical Introduction to Pandas
Chapter 5. Data Visualization with seaborn in Python
Chapter 6. (Optional) Quick Introduction to Python for Data Engineers
Lab Exercises