Analyzing Big Data with R Programming

  • Contact Us For Pricing
  • Reviews ★★★★½ (4119 Ratings)
  • Course Code ACCEL-R-ABDP
  • Duration 4 days
  • Available Formats Classroom

Accelebrate's Analyzing Big Data with R Programming training teaches attendees how to perform in-memory, on-disk, and distributed analysis using H2O, Hadoop, and Apache Spark, and how to integrate R with Microsoft Machine Learning Server.

Skills Gained

All students will be able to:

  • Use data.table to manipulate large datasets that fit in memory
  • Understand batch processing via SQL queries
  • Implement online learning style models using R
  • Describe the difference between Hadoop and Spark
  • Understand the Hadoop Distributed File System (HDFS)
  • Use the SparkR package to leverage Spark through an R API
  • Manage data via H2O and the R API
  • Implement models using H2O and the R API
  • Explain how Microsoft R Server and Microsoft R Client work
  • Use R Client with R Server to explore big data held in different data stores
  • Visualize data by using graphs and plots
  • Transform and clean big data sets
  • Build and evaluate regression models generated from big data
  • Create, score, and deploy partitioning models generated from big data
  • Use R in the SQL Server and Hadoop environments

Prerequisites

In addition to their professional experience, students who attend this course should have:

  • Programming experience using R, and familiarity with common R packages
  • Knowledge of common statistical methods and data analysis best practices
  • Basic knowledge of the Microsoft Windows operating system and its core functionality

Course Details

Software Requirements

  • R 3.0 or later with console
  • IDE or text editor of your choice (RStudio recommended)

Big Data with R Training Outline

Introduction

In-memory Big Data: data.table

  • Why do we need data.table?
  • Why is it
  • The i and the j arguments in data.table
  • Renaming Columns
  • Adding new columns
  • Binning data (continuous to categorical)
  • Combining categorical values
  • Transforming Variables
  • Group-by functions with data.table
  • Handling missing data
  • Long to Wide and Back
  • Merging datasets together
  • Stacking datasets together (concatenation)

SQL Connections and Sequential data updates

  • The biglm package
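The idea behind sequential data updates (fitting a model one chunk at a time, so the full dataset never has to sit in memory) can be illustrated without any external packages. The course itself uses R and the biglm package; the sketch below is a simplified, language-neutral illustration in Python, not biglm's actual implementation. It fits a simple linear regression y = a + b·x by folding each chunk into five running sums. The class and function names, and the idea that chunks arrive as successive pages of a SQL query result, are assumptions for illustration only.

```python
from typing import Iterable, List, Tuple

Chunk = List[Tuple[float, float]]  # (x, y) pairs making up one chunk of rows


class RunningSimpleRegression:
    """Hypothetical helper: least-squares fit of y = a + b*x, updated chunk by chunk.

    Only five running sums are stored, so memory use does not grow with the
    number of rows processed. Illustrative sketch only, not biglm.
    """

    def __init__(self) -> None:
        self.n = 0
        self.sx = 0.0
        self.sy = 0.0
        self.sxx = 0.0
        self.sxy = 0.0

    def update(self, chunk: Chunk) -> None:
        # Fold one chunk of rows into the running sums; the chunk can then be discarded.
        for x, y in chunk:
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self) -> Tuple[float, float]:
        # Closed-form ordinary least squares solution from the accumulated sums.
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


def fit_in_chunks(chunks: Iterable[Chunk]) -> Tuple[float, float]:
    # Hypothetical driver: each chunk could be one page of rows fetched from a database.
    model = RunningSimpleRegression()
    for chunk in chunks:
        model.update(chunk)
    return model.coefficients()


if __name__ == "__main__":
    # Made-up data lying exactly on y = 1 + 2x, split into two chunks.
    chunks = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
    print(fit_in_chunks(chunks))
```

Because the sums are simply added together, the result does not depend on how the rows are split into chunks. Plain running sums can lose precision on large or badly scaled data, so production tools typically use numerically stabler update schemes.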

Data Munging and Machine Learning via H2O

  • Intro to H2O
  • Launching the cluster, checking status
  • Data import and manipulation in H2O
  • Unstructured data analysis: Word2Vec
  • Fitting models in H2O
  • Generalized Linear Models
  • Naïve Bayes
  • Random Forest
  • Gradient Boosting Machine (GBM)
  • Ensemble model building

Overview of Hadoop

  • Distributed data versus distributed analytics
  • HDFS and map-reduce
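The map-reduce model can be sketched in a few lines. The following single-process Python simulation is an illustration only (it is not Hadoop's API, and nothing here runs on a cluster). It counts words the way a classic map-reduce job does: the map phase emits (word, 1) pairs, a shuffle step groups the pairs by key, and the reduce phase sums each group. The function names and the list-of-blocks input, which stands in for data split into HDFS blocks, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit a (key, value) pair of (word, 1) for every word in the line.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(blocks: Iterable[List[str]]) -> Dict[str, int]:
    # Each block plays the role of one input split handled by a separate mapper.
    mapped = (pair for block in blocks for line in block for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    blocks = [["big data with R", "R and Spark"], ["Spark and Hadoop"]]
    print(word_count(blocks))
```

In a real cluster the map calls run in parallel on the nodes that hold each block (moving computation to the data), and only the grouped intermediate pairs travel over the network to the reducers.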

Apache Spark

  • Overview of Spark
  • APIs to use Apache Spark with R
  • sparklyr versus SparkR
  • R, Python, Java and Scala APIs to Spark
  • Applied Examples using SparkR
  • Data import and manipulation in Spark(R)
  • The Spark machine learning library MLlib:
      • Generalized Linear Models
      • Random Forest
      • Naïve Bayes

Microsoft Machine Learning Server Overview

  • What is Microsoft R Server?
  • Using Microsoft R client
  • The ScaleR functions

Data Munging

  • Understanding XDF files
  • Data I/O
  • Variable transformations
  • Data subsetting, splitting, and merging

Data Summarization

  • Creating visualizations
  • Numerical summaries

Processing Big Data

  • Transforming Big Data
  • Managing datasets

Implementing General Linear Models

  • Establishing and leveraging partitions/clusters
  • Fitting regression models and making predictions

Implementing Other Models

  • Decision Trees and Random Forests
  • Naïve Bayes

Conclusion

When does class start/end?

Classes begin promptly at 9:00 am, and typically end at 5:00 pm.

Does the course schedule include a lunch break?

Lunch is normally an hour long and begins at noon. Coffee, tea, hot chocolate, and juice are available all day in the kitchen. Fruit, muffins, and bagels are served each morning. There are numerous restaurants near each of our centers, and some popular ones are marked on the Area Map in the Student Welcome Handbook, which you can pick up in the lobby or request from an ExitCertified staff member.

How can someone reach me during class?

If someone should need to contact you while you are in class, please have them call the center telephone number and leave a message with the receptionist.

What languages are used to deliver training?

Most courses are conducted in English, unless otherwise specified. Some courses will have the word "FRENCH" marked in red beside the scheduled date(s) indicating the language of instruction.

What does GTR stand for?

GTR stands for Guaranteed to Run; if you see a course with this status, it means this event is confirmed to run. View our GTR page to see our full list of Guaranteed to Run courses.

Does ExitCertified deliver group training?

Yes, we provide training for groups and individuals, as well as private on-site training. View our group training page for more information.

Classroom and environment comfortable. Instructor knowledgeable about the course material, but needs to tailor it to students working in the Canadian federal government vs. the US federal government.

Great company -- easy to sign up and very organized. Loved my teacher and class overall.

Fantastic training. Tons of hands-on labs to really make you understand the material being taught.

Instructor was great; the course was mostly very good, except for too much focus on pricing.

Class was very informative, although one lab didn't work; I will try it again later.

There are currently no scheduled dates for this course. If you are interested in this course, request a course date with the links above. We can also contact you when the course is scheduled in your area.

Contact Us 1-800-803-3948
FAQ: Get immediate answers to our most frequently asked questions. View FAQs