Analyzing Big Data with R Programming

Course Code ACCEL-R-ABDP
Duration 4 days
Available Formats Classroom
Rating 4.5 out of 5 stars (5090 reviews)

Accelebrate's Analyzing Big Data with R Programming training teaches attendees how to perform in-memory, on-disk, and distributed analysis using H2O, Hadoop, and Apache Spark, and how to integrate Microsoft Machine Learning Server with R.

Skills Gained

All students will be able to:

  • Use data.table to manipulate large datasets that fit in memory
  • Understand batch processing via SQL queries
  • Implement online-learning models using R
  • Describe the difference between Hadoop and Spark
  • Understand the Hadoop Distributed File System (HDFS)
  • Use the SparkR package to leverage Spark through an R API
  • Manage data via H2O and the R API
  • Implement models using H2O and the R API
  • Explain how Microsoft R Server and Microsoft R Client work
  • Use R Client with R Server to explore big data held in different data stores
  • Visualize data by using graphs and plots
  • Transform and clean big data sets
  • Build and evaluate regression models generated from big data
  • Create, score, and deploy partitioning models generated from big data
  • Use R in the SQL Server and Hadoop environments

Prerequisites

In addition to their professional experience, students who attend this course should have:

  • Programming experience using R, and familiarity with common R packages
  • Knowledge of common statistical methods and data analysis best practices
  • Basic knowledge of the Microsoft Windows operating system and its core functionality

Course Details

Software Requirements

  • R 3.0 or later with console
  • IDE or text editor of your choice (RStudio recommended)

Big Data with R Training Outline

Introduction

In-memory Big Data: data.table

  • Why do we need data.table?
  • Why is it
  • The i and the j arguments in data.table
  • Renaming Columns
  • Adding new columns
  • Binning data (continuous to categorical)
  • Combining categorical values
  • Transforming Variables
  • Group-by functions with data.table
  • Handling missing data
  • Long to Wide and Back
  • Merging datasets together
  • Stacking datasets together (concatenation)

SQL Connections and Sequential data updates

  • The biglm package

Data Munging and Machine Learning via H2O

  • Intro to H2O
  • Launching the cluster, checking status
  • Data import and manipulation in H2O
  • Unstructured data analysis: Word2Vec
  • Fitting models in H2O
  • Generalized Linear Models
  • Naïve Bayes
  • Random Forest
  • Gradient Boosting Machine (GBM)
  • Ensemble model building

Overview of Hadoop

  • Distributed data versus distributed analytics
  • HDFS and MapReduce

Apache Spark

  • Overview of Spark
  • APIs to use Apache Spark with R
  • sparklyr versus SparkR
  • R, Python, Java and Scala APIs to Spark
  • Applied Examples using SparkR
  • Data import and manipulation in Spark(R)
  • The Spark machine learning library MLlib:
  • Generalized Linear Models
  • Random Forest
  • Naïve Bayes

Microsoft Machine Learning Server Overview

  • What is Microsoft R Server?
  • Using Microsoft R Client
  • The ScaleR functions

Data Munging

  • Understanding XDF files
  • Data I/O
  • Variable transformations
  • Data subsetting, splitting, and merging

Data Summarization

  • Creating visualizations
  • Numerical summaries

Processing Big Data

  • Transforming Big Data
  • Managing datasets

Implementing General Linear Models

  • Establishing and leveraging partitions/clusters
  • Fitting regression models and making predictions

Implementing Other Models

  • Decision Trees and Random Forests
  • Naïve Bayes

Conclusion
