Cloudera Developer Training for Spark & Hadoop

545 Reviews

By Request

Price: $3,520 USD (GSA: $2,736.27)

Course Code: DEV-S-H

Duration: 4 days

Available Formats: Classroom

This four-day hands-on training course delivers the key concepts and expertise developers need to use Apache Spark to develop high-performance parallel applications. Participants will learn how to use Spark SQL to query structured data and Spark Streaming to perform real-time processing on streaming data from a variety of sources. Developers will also practice writing applications that use core Spark to perform ETL processing and iterative algorithms. The course covers how to work with “big data” stored in a distributed file system and how to execute Spark applications on a Hadoop cluster. After taking this course, participants will be prepared to face real-world challenges and build applications that enable faster, better decisions and interactive analysis across a wide variety of use cases, architectures, and industries.

Who Can Benefit

This course is designed for developers and engineers who have programming experience, but prior knowledge of Hadoop and/or Spark is not required.

Prerequisites

Prior knowledge of Spark and Hadoop is not required. Apache Spark examples and hands-on exercises are presented in Scala and Python, and the ability to program in one of those languages is required. Basic familiarity with the Linux command line is assumed, and basic knowledge of SQL is helpful.

Course Details

Course Outline

1. Introduction

2. Introduction to Apache Hadoop and the Hadoop Ecosystem

Apache Hadoop Overview
Data Processing
Introduction to the Hands-On Exercises

3. Apache Hadoop File Storage

Apache Hadoop Cluster Components
HDFS Architecture
Using HDFS

4. Distributed Processing on an Apache Hadoop Cluster

YARN Architecture
Working With YARN

5. Apache Spark Basics

What is Apache Spark?
Starting the Spark Shell
Using the Spark Shell
Getting Started with Datasets and DataFrames
DataFrame Operations

6. Working with DataFrames and Schemas

Creating DataFrames from Data Sources
Saving DataFrames to Data Sources
DataFrame Schemas
Eager and Lazy Execution

7. Analyzing Data with DataFrame Queries

Querying DataFrames Using Column Expressions
Grouping and Aggregation Queries
Joining DataFrames

8. RDD Overview

RDD Overview
RDD Data Sources
Creating and Saving RDDs
RDD Operations

9. Transforming Data with RDDs

Writing and Passing Transformation Functions
Transformation Execution
Converting Between RDDs and DataFrames

10. Aggregating Data with Pair RDDs

Key-Value Pair RDDs
Map-Reduce
Other Pair RDD Operations

11. Querying Tables and Views with SQL

Querying Tables in Spark Using SQL
Querying Files and Views
The Catalog API
Comparing Spark SQL, Apache Impala, and Apache Hive-on-Spark

12. Working with Datasets in Scala

Datasets and DataFrames
Creating Datasets
Loading and Saving Datasets
Dataset Operations

13. Writing, Configuring, and Running Spark Applications

Writing a Spark Application
Building and Running an Application
Application Deployment Mode
The Spark Application Web UI
Configuring Application Properties

14. Spark Distributed Processing

Review: Apache Spark on a Cluster
RDD Partitions
Example: Partitioning in Queries
Stages and Tasks
Job Execution Planning
Example: Catalyst Execution Plan
Example: RDD Execution Plan

15. Distributed Data Persistence

DataFrame and Dataset Persistence
Persistence Storage Levels
Viewing Persisted RDDs

16. Common Patterns in Spark Data Processing

Common Apache Spark Use Cases
Iterative Algorithms in Apache Spark
Machine Learning
Example: k-means

17. Introduction to Structured Streaming

Apache Spark Streaming Overview
Creating Streaming DataFrames
Transforming DataFrames
Executing Streaming Queries

18. Structured Streaming with Apache Kafka

Overview
Receiving Kafka Messages
Sending Kafka Messages

19. Aggregating and Joining Streaming DataFrames

Streaming Aggregation
Joining Streaming DataFrames

20. Conclusion

A. Message Processing with Apache Kafka

What Is Apache Kafka?
Apache Kafka Overview
Scaling Apache Kafka
Apache Kafka Cluster Architecture
Apache Kafka Command Line Tools

There are currently no scheduled dates for this course. If you are interested in this course, you can request a course date or an on-site course, and we can contact you when the course is scheduled in your area.

When does class start/end?

Classes begin promptly at 9:00 am, and typically end at 5:00 pm.

Does the course schedule include a lunch break?

Lunch is normally an hour long and begins at noon. Coffee, tea, hot chocolate, and juice are available all day in the kitchen. Fruit, muffins, and bagels are served each morning. There are numerous restaurants near each of our centers, and some popular ones are marked on the Area Map in the Student Welcome Handbook, which can be picked up in the lobby or requested from ExitCertified staff.

How can someone reach me during class?

If someone should need to contact you while you are in class, please have them call the center telephone number and leave a message with the receptionist.

What languages are used to deliver training?

Most courses are conducted in English, unless otherwise specified. Some courses will have the word "FRENCH" marked in red beside the scheduled date(s) indicating the language of instruction.

What does GTR stand for?

GTR stands for Guaranteed to Run; if you see a course with this status, it means this event is confirmed to run. View our GTR page to see our full list of Guaranteed to Run courses.

How do I find an ExitCertified training location?

We have training locations across the United States and Canada. View a full list of classroom training locations.

Which delivery formats are available?

At ExitCertified we offer training that is Instructor-Led, Online, Virtual and Self-Paced.

Does ExitCertified deliver group training?

Yes, we provide training for groups, individuals, and private on-site sessions. View our group training page for more information.

What does vendor-authorized training mean?

As a vendor-authorized training partner, we offer a curriculum that our partners have vetted. We use the same course materials and facilitate the same labs as our vendor-delivered training. These courses are considered the gold standard and, as such, are priced accordingly.

Is the training too basic, or will you go deep into technology?

It depends on your requirements, your role in your company, and your depth of knowledge. The good news is that many of our learning paths take you from the fundamentals all the way to highly specialized training.

How up-to-date are your courses and support materials?

We continuously work with our vendors to evaluate and refresh course material to reflect the latest training courses and best practices.

Are your instructors seasoned trainers who have deep knowledge of the training topic?

ExitCertified instructors have an average of 27 years of practical IT experience and have also served as consultants for an average of 15 years. To stay up to date, instructors spend at least 25 percent of their time learning new and emerging technologies and courses.

Do you provide hands-on training and exercises in an actual lab environment?

Lab access is dependent on the vendor and the type of training you sign up for. However, many of our top vendors will provide lab access to students to test and practice. The course description will specify lab access.

Will you customize the training for our company’s specific needs and goals?

We will work with you to identify training needs and areas of growth. We offer a variety of training methods, such as private group training, on-site training at a location of your choice, and virtual training. We provide courses and certifications that are aligned with your business goals.

How do I get started with certification?

Getting started on a certification pathway depends on your goals and the vendor you choose to get certified in. Many vendors offer certifications ranging from entry level to advanced that can boost your career. To get access to certification vouchers and discounts, please contact customerexp@exitcertified.com.

Will I get access to content after I complete a course?

You will get access to the PDF of course books and guides, but access to the recording and slides will depend on the vendor and type of training you receive.

How do I request a W-9 for ExitCertified LLC?

View our filing status and instructions for requesting a W-9.

Eric was a superb instructor who effortlessly explained complex topics in a simple and fun way.

ExitCertified Student

It was a very pleasant experience going through this training with Charles Hardin. It is truly amazing to see Charles's passion for teaching and technology.

ExitCertified Student

The combination of Eric's lecture and the class materials enabled me to learn and practice the concepts we went over in this training course.

ExitCertified Student

I came to get a high-level understanding of Hadoop and Spark. I feel confident about all that Spark can do and can apply this to my work.

ExitCertified Student

Joel is one of the best instructors to listen to, and he explains things the way you expect from a specialist. He is the best in the industry.

ExitCertified Student