Performing Data Engineering on Microsoft HD Insight (90 Day)

Course Details
Code: OD20775
Tuition (USD): $870.00 $652.50 • Self Paced
This course is available in other formats
Instructor-Led Classroom & Virtual
Performing Data Engineering on Microsoft HD Insight (20775)

The main purpose of the course is to give students the ability to plan and implement big data workflows on HDInsight.

Skills Gained

After completing this course, students will be able to:

  • Explain how Microsoft R
  • Transform and clean big data sets

Who Can Benefit

The primary audience for this course is data engineers, data architects, data scientists, and data developers who plan to implement big data engineering workflows on HDInsight.

Prerequisites

In addition to their professional experience, students who attend this course should have:

  • Programming experience using R, and familiarity with common R packages.
  • Knowledge of common statistical methods and data analysis best practices.
  • Basic knowledge of the Microsoft Windows operating system and its core functionality.
  • Working knowledge of relational databases.

Course Details

Outline

Module 1: Getting Started with HDInsight
This module introduces Hadoop, the MapReduce paradigm, and HDInsight.
Lessons

  • Big Data
  • Hadoop
  • MapReduce
  • HDInsight
Lab : Querying Big Data
  • Query data with Hive
  • Visualize data with Excel
After completing this module, students will be able to:
  • Describe big data.
  • Describe Hadoop.
  • Describe MapReduce.
  • Describe HDInsight.
Module 2: Deploying HDInsight Clusters
At the end of this module the student will be able to deploy HDInsight clusters.
Lessons
  • HDInsight cluster types
  • Managing HDInsight Clusters
  • Managing HDInsight Clusters with PowerShell
Lab : Managing HDInsight clusters with the Azure Portal
  • Create an HDInsight Hadoop Cluster
  • Customize HDInsight using a script action
  • Customize HDInsight using Bootstrap
  • Delete an HDInsight cluster
After completing this module, students will be able to:
  • Describe HDInsight cluster types.
  • Describe the creation, management, and deletion of HDInsight clusters with the Azure portal.
  • Describe the creation, management, and deletion of HDInsight clusters with PowerShell.
Module 3: Authorizing Users to Access Resources
This module covers permissions and the assignment of permissions.
Lessons
  • Non-domain Joined clusters
  • Configuring domain-joined HDInsight clusters
  • Manage domain-joined HDInsight clusters
Lab : Authorizing Users to Access Resources
  • Configure a domain-joined HDInsight cluster
  • Configure Hive policies
After completing this module, students will be able to:
  • Describe how to authorize user access to objects.
  • Describe how to authorize users to execute code.
  • Describe how to manage domain-joined HDInsight clusters.
Module 4: Loading Data into HDInsight
This module covers loading data into HDInsight.
Lessons
  • HDInsight Storage
  • Data loading tools
  • Performance and reliability
Lab : Loading Data into HDInsight
  • Loading data using Sqoop
  • Loading data using AzCopy
  • Loading data using AdlCopy
  • Use HDInsight to compress data
After completing this module, students will be able to:
  • Describe HDInsight storage configurations and architectures.
  • Describe options for loading data into HDInsight.
  • Describe benefits of compression and pre-processing in HDInsight.
Module 5: Troubleshooting HDInsight
This module describes how to troubleshoot HDInsight.
Lessons
  • Analyze HDInsight logs
  • YARN logs
  • Heap dumps
  • Operations Management Suite
Lab : Troubleshooting HDInsight
  • Analyze HDInsight logs
  • Analyze YARN logs
  • Monitor resources with Operations Management Suite
After completing this module, students will be able to:
  • Analyze HDInsight logs.
  • Analyze YARN logs.
  • Analyze heap dumps.
  • Use the Operations Management Suite to monitor resources.
Module 6: Implementing Batch Solutions
This module describes how to implement batch solutions.
Lessons
  • Apache Hive storage
  • Querying with Hive and Pig
  • Operationalize HDInsight
Lab : Backing Up SQL Server Databases
  • Load data into a Hive table
  • Query data with Hive and Pig
After completing this module, students will be able to:
  • Describe Apache Hive storage.
  • Query data using Hive and Pig.
  • Operationalize HDInsight.
Module 7: Design Batch ETL solutions for big data with Spark
This module describes how to design batch ETL solutions for big data with Spark.
Lessons
  • What is Spark?
  • ETL with Spark
  • Spark performance
Lab : Design Batch ETL solutions for big data with Spark
  • Create an HDInsight cluster with access to Data Lake Store
  • Use HDInsight Spark cluster to analyze data in Data Lake Store
  • Analyzing website logs using a custom library with Apache Spark cluster on HDInsight
  • Managing resources for Apache Spark cluster on Azure HDInsight
After completing this module, students will be able to:
  • Describe Spark and when to use it.
  • Describe the use of ETL with Spark.
  • Analyze Spark performance.
Module 8: Analyze Data with Spark SQL
This module describes how to analyze data with Spark SQL.
Lessons
  • Implement interactive queries
  • Perform exploratory data analysis
Lab : Analyze data with Spark SQL
  • Implement interactive queries
  • Perform exploratory data analysis
After completing this module, students will be able to:
  • Implement interactive queries.
  • Perform exploratory data analysis.
Module 9: Analyze Data with Hive and Phoenix
This module describes how to analyze data with Hive and Phoenix.
Lessons
  • Implement interactive queries for big data with interactive Hive
  • Perform exploratory data analysis by using Hive
  • Perform interactive processing by using Apache Phoenix
Lab : Analyze data with Hive and Phoenix
  • Implement interactive queries for big data with interactive Hive
  • Perform exploratory data analysis by using Hive
  • Perform interactive processing by using Apache Phoenix
After completing this module, students will be able to:
  • Implement interactive queries with interactive Hive.
  • Perform exploratory data analysis using Hive.
  • Perform interactive processing by using Apache Phoenix.
Module 10: Stream Analytics
This module introduces Azure Stream Analytics.
Lessons
  • Stream analytics
  • Process streaming data from stream analytics
  • Managing stream analytics jobs
Lab : Implement Stream Analytics
  • Process streaming data with stream analytics
  • Managing stream analytics jobs
After completing this module, students will be able to:
  • Describe stream analytics and its capabilities.
  • Process streaming data with stream analytics.
  • Manage stream analytics jobs.
Module 11: Spark Streaming using the DStream API
This module introduces the DStream API and describes how to create Spark structured streaming applications.
Lessons
  • DStream
  • Create Spark structured streaming applications
  • Persistence and visualization
Lab : Spark streaming applications using DStream API
  • Creating Spark streaming applications using the DStream API
  • Creating Spark structured streaming applications
After completing this module, students will be able to:
  • Explain DStream.
  • Create Spark structured streaming applications.
  • Describe persistence and visualization.
Module 12: Develop big data real-time processing solutions with Apache Storm
This module explains how to develop big data real-time processing solutions with Apache Storm.
Lessons
  • Persist long-term data
  • Stream data with Storm
  • Create Storm topologies
  • Configure Apache Storm
Lab : Developing big data real-time processing solutions with Apache Storm
  • Stream data with Storm
  • Create Storm Topologies
After completing this module, students will be able to:
  • Persist long-term data.
  • Stream data with Storm.
  • Create Storm topologies.
  • Configure Apache Storm.
Module 13: Analyze Data with Spark SQL
This module describes how to analyze data with Spark SQL.
Lessons
  • Implement interactive queries
  • Perform exploratory data analysis
Lab : Analyze data with Spark SQL
  • Implement interactive queries
  • Perform exploratory data analysis
After completing this module, students will be able to:
  • Implement interactive queries.
  • Perform exploratory data analysis.