Winter Savings - Save on IT Training Using Promo Code FROSTBYTE

closeClose

Cloudera Developer Training

  • Tuition USD $2,235 GSA  $1,914.11
  • Reviews star_rate star_rate star_rate star_rate star_half 1527 Ratings
  • Course Code SPARK-HADOOP-OD
  • Available Formats Self Paced

This OnDemand offering provides you with a 180-day subscription that begins on the date of purchase.

Skills Gained

Through instructor-led discussion and interactive, hands-on exercises, participants will learn Apache Spark and how it integrates with the entire Hadoop ecosystem, learning:

  • How data is distributed, stored, and processed in a Hadoop cluster
  • How to use Sqoop and Flume to ingest data
  • How to process distributed data with Apache Spark
  • How to model structured data as tables in Impala and Hive
  • How to choose the best data storage format for different data usage patterns
  • Best practices for data storage

Prerequisites

This course is designed for developers and engineers who have programming experience. Apache Spark examples and hands-on exercises are presented in Scala and Python, so the ability to program in one of those languages is required. Basic familiarity with the Linux command line is assumed. Basic knowledge of SQL is helpful. Prior knowledge of Hadoop is not required.

Course Details

This four-day hands-on training course delivers the key concepts and expertise participants need to ingest and process data on a Hadoop cluster using the most up-to-date tools and techniques. Employing Hadoop ecosystem projects such as Spark, Hive, Flume, Sqoop, and Impala, this training course is the best preparation for the real-world challenges faced by Hadoop developers. Participants learn to identify which tool is the right one to use in a given situation, and will gain hands-on experience in developing using those tools

CCA Spark & Hadoop Developer

This course is an excellent place to start for people working towards the CCA Spark & Hadoop Developer certification. Although further study is required before passing the exam, this course covers many of the subjects tested in the CCA Spark & Hadoop Developer exam. After successfully completing this course, we recommend that participants attend Cloudera’s Designing and Building Big Data Applications course, which builds on the foundations taught here.

Introduction to Hadoop and the Hadoop Ecosystem

  • Problems with Traditional Large-Scale Systems
  • Hadoop!
  • Data Storage and Ingest
  • Data Processing
  • Data Analysis and Exploration
  • Other Ecosystem Tools
  • Introduction to the Hands-On Exercises

Hadoop Architecture and HDFS

  • Distributed Processing on a Cluster
  • Storage: HDFS Architecture
  • Storage: Using HDFS
  • Resource Management: YARN Architecture
  • Resource Management: Working with YARN

Importing Relational Data with Apache Sqoop

  • Sqoop Overview
  • Basic Imports and Exports
  • Limiting Results
  • Improving Sqoop’s Performance
  • Sqoop 2

Introduction to Impala and Hive

  • Introduction to Impala and Hive
  • Why Use Impala and Hive?
  • Querying Data With Impala and Hive
  • Comparing Hive and Impala to Traditional Databases

Modeling and Managing Data with Impala and Hive

  • Data Storage Overview
  • Creating Databases and Tables
  • Loading Data into Tables
  • HCatalog
  • Impala Metadata Caching

Data Formats

  • Selecting a File Format
  • Hadoop Tool Support for File Formats
  • Avro Schemas
  • Using Avro with Impala, Hive, and Sqoop
  • Avro Schema Evolution
  • Compression

Data File Partitioning

  • Partitioning Overview
  • Partitioning in Impala and Hive

Capturing Data with Apache Flume

  • What is Apache Flume?
  • Basic Flume Architecture
  • Flume Sources
  • Flume Sinks
  • Flume Channels
  • Flume Configuration

Spark Basics

  • What is Apache Spark?
  • Using the Spark Shell
  • RDDs (Resilient Distributed Datasets)
  • Functional Programming in Spark

Working with RDDs in Spark

  • Creating RDDs
  • Other General RDD Operations

Writing and Deploying Spark Applications

  • Spark Applications vs. Spark Shell
  • Creating the SparkContext
  • Building a Spark Application (Scala and Java)
  • Running a Spark Application
  • The Spark Application Web UI
  • Configuring Spark Properties
  • Logging

Parallel Processing in Spark

  • Review: Spark on a Cluster
  • RDD Partitions
  • Partitioning of File-Based RDDs
  • HDFS and Data Locality
  • Executing Parallel Operations
  • Stages and Tasks

Spark RDD Persistence

  • RDD Lineage
  • RDD Persistence Overview
  • Distributed Persistence

Common Patterns in Spark Data Processing

  • Common Spark Use Cases
  • Iterative Algorithms in Spark
  • Graph Processing and Analysis
  • Machine Learning
  • Example: k-means

DataFrames and Spark SQL

  • Spark SQL and the SQL Context
  • Creating DataFrames
  • Transforming and Querying DataFrames
  • Saving DataFrames
  • DataFrames and RDDs
  • Comparing Spark SQL, Impala, and Hive-on-Spark

When does class start/end?

Classes begin promptly at 9:00 am, and typically end at 5:00 pm.

Does the course schedule include a Lunchbreak?

Lunch is normally an hour long and begins at noon. Coffee, tea, hot chocolate and juice are available all day in the kitchen. Fruit, muffins and bagels are served each morning. There are numerous restaurants near each of our centers, and some popular ones are indicated on the Area Map in the Student Welcome Handbooks - these can be picked up in the lobby or requested from one of our ExitCertified staff.

How can someone reach me during class?

If someone should need to contact you while you are in class, please have them call the center telephone number and leave a message with the receptionist.

What languages are used to deliver training?

Most courses are conducted in English, unless otherwise specified. Some courses will have the word "FRENCH" marked in red beside the scheduled date(s) indicating the language of instruction.

What does GTR stand for?

GTR stands for Guaranteed to Run; if you see a course with this status, it means this event is confirmed to run. View our GTR page to see our full list of Guaranteed to Run courses.

Does ExitCertified deliver group training?

Yes, we provide training for groups, individuals and private on sites. View our group training page for more information.

Does ExitCertified deliver group training?

Yes, we provide training for groups, individuals, and private on sites. View our group training page for more information.

The course on Redshift was excellent! As a DBA/DA I found there is a quite a bit of learning for Oracle professionals to move into AWS.

Great company to work with. Have taken training with Exit in the past. This experience is consistent with the others.

This is a great way to learn online. Courses are well structured and clearly explained with very dedicated staff.

It was quick and easy to sign up and attend the class. Instructions were easy to follow. I appreciate the reminder emails of class date/time.

Everything went well. Enrollment process was easy and I always felt welcomed throughout the whole experience.

Contact Us 1-800-803-3948
Contact Us Live Chat
FAQ Get immediate answers to our most frequently asked qestions. View FAQs arrow_forward