Data Science and Big Data Analytics

7879 Reviews (4.5 out of 5 stars)

$3,140 USD

Course Code WA2688

Duration 5 days

Available Formats Classroom, Virtual

Date	Format	Language	Time
Jun 3, 2024 - Jun 7, 2024 (5 days)	Virtual	English	10:00 AM – 6:00 PM EST
Jul 1, 2024 - Jul 5, 2024 (5 days)	Virtual	English	10:00 AM – 6:00 PM EST
Aug 26, 2024 - Aug 30, 2024 (5 days)	Virtual	English	10:00 AM – 6:00 PM EST

Business success in the information age depends on an organization's ability to convert raw data from various sources into high-grade business information. To stay competitive, organizations have started adopting new approaches to data processing and analysis. For example, data scientists are turning to Apache Spark to process massive amounts of data with its distributed compute capability and built-in machine learning library, or are switching from costly proprietary solutions to the free R programming language.

Applied Data Science and Business Analytics
Algorithms, Techniques and Common Analytical Methods
Machine Learning Introduction
Visualizing and Reporting Processed Results
The R Programming Language
Data Analysis with R
Elements of Functional Programming
Apache Spark Introduction
Spark SQL
ETL with Spark
MLlib Machine Learning Library
Graph Processing with GraphX

Who Can Benefit

Data Scientists, Software Developers, IT Architects, and Technical Managers

Prerequisites

Participants should have general knowledge of statistics and programming.

Course Details

Outline

Chapter 1 - Data Science Algorithms and Analytical Methods

Supervised vs Unsupervised Machine Learning
Supervised Machine Learning Algorithms
Unsupervised Machine Learning Algorithms
Choose the Right Algorithm
Life-cycles of Machine Learning Development
Classifying with k-Nearest Neighbors (SL)
k-Nearest Neighbors Algorithm
The Error Rate
Decision Trees (SL)
Using Decision Trees
Random Forests
Naive Bayes Classifier (SL)
Classification of Documents with Naive Bayes
Unsupervised Learning Type: Clustering
K-Means Clustering (UL)
K-Means Clustering in a Nutshell
Regression Analysis
Types of Regression
Simple Linear Regression Model
Linear Regression Illustration
Least-Squares Method (LSM)
LSM Assumptions
Fitting Linear Regression Models in R
Example of Using R's lm() Function
Example of Using lm() with a Data Frame
Regression Models in Excel
Logistic Regression
Regression vs Classification
Time-Series Analysis
Decomposing Time-Series
Summary

Chapter 2 - Getting Started with R

Introduction
Positioning of R in the Data Science Arena
R Integrated Development Environments
Running R
Running RStudio
Ending the Current R Session
Getting Help
Getting System Information
General Notes on R Commands and Statements
R Data Structures
R Objects and Workspace
Assignment Operators
Assignment Example
Arithmetic Operators
Logical Operators
System Date and Time
Operations
User-defined Functions
User-defined Function Example
R Code Example
Type Conversion (Coercion)
Control Statements
Conditional Execution
Repetitive Execution
Built-in Functions
Reading Data from Files into Vectors
Example of Reading Data from a File
Writing Data to a File
Example of Writing Data to a File
Logical Vectors
Character Vectors
Matrix Data Structure
Creating Matrices
Working with Data Frames
Matrices vs Data Frames
A Data Frame Sample
Accessing Data Cells
Getting Info About a Data Frame
Selecting Columns in Data Frames
Selecting Rows in Data Frames
Getting a Subset of a Data Frame
Sorting (ordering) Data in Data Frames by Attribute(s)
Applying Functions to Matrices and Data Frames
Using the apply() Function
Example of Using apply()
Executing External R commands
Loading External Scripts in RStudio
Listing Objects in Workspace
Removing Objects in Workspace
Saving Your Workspace in R
Saving Your Workspace in RStudio
Saving Your Workspace in R GUI
Loading Your Workspace
Hands-on Exercises
Getting and Setting the Working Directory
Getting the List of Files in a Directory
Diverting Output to a File
Batch (Unattended) Processing
Importing Data into R
Exporting Data from R
Hands-on Exercise
Standard R Packages
Extending R
Extending R in R GUI
Extending R in RStudio
CRAN Page
Summary

Chapter 3 - Text Mining

What is Text Mining?
The Common Text Mining Tasks
What is Natural Language Processing (NLP)?
Some of the NLP Use Cases
Machine Learning in Text Mining and NLP
Machine Learning in NLP
TF-IDF
The Feature Hashing Trick
Stemming
Example of Stemming
Stop Words
Popular Text Mining and NLP Libraries and Packages
Summary

Chapter 4 - Introduction to Functional Programming

What is Functional Programming (FP)?
Terminology: Higher-Order Functions
Terminology: Lambda vs Closure
A Short List of Languages that Support FP
FP with Java
FP with JavaScript
Imperative Programming in JavaScript
The JavaScript map (FP) Example
The JavaScript reduce (FP) Example
Using reduce to Flatten an Array of Arrays (FP) Example
The JavaScript filter (FP) Example
Common Higher-Order Functions in Python
Common Higher-Order Functions in Scala
Elements of FP in R
Summary

Chapter 5 - What is NoSQL?

Limitations of Relational Databases
Limitations of Relational Databases (Cont'd)
Defining NoSQL
What are NoSQL (Not Only SQL) Databases?
The Past and Present of the NoSQL World
NoSQL Database Properties
NoSQL Benefits
NoSQL Database Storage Types
The CAP Theorem
NoSQL Systems CAP Triangle
Mechanisms to Guarantee a Single CAP Property
Limitations of NoSQL Databases
Big Data Sharding
Sharding Example
Quiz
Quiz Answers
Summary

Chapter 6 - MapReduce Overview

The Client – Server Processing Pattern
Distributed Computing Challenges
MapReduce Defined
Google's MapReduce
MapReduce Phases
The Map Phase
The Reduce Phase
MapReduce Word Count Job
MapReduce Shared-Nothing Architecture
Similarity with SQL Aggregation Operations
Example of Map & Reduce Operations using JavaScript
Problems Suitable for Solving with MapReduce
Typical MapReduce Jobs
Fault-tolerance of MapReduce
Distributed Computing Economics
MapReduce Systems
Summary

Chapter 7 - Hadoop Overview

Apache Hadoop
Apache Hadoop Logo
Typical Hadoop Applications
Hadoop Clusters
Hadoop Design Principles
Hadoop Versions
Hadoop's Main Components
Hadoop Simple Definition
Side-by-Side Comparison: Hadoop 1 and Hadoop 2
Hadoop-based Systems for Data Analysis
Other Hadoop Ecosystem Projects
Hadoop Caveats
Hadoop Distributions
Cloudera Distribution of Hadoop (CDH)
Cloudera Distributions
Hortonworks Data Platform (HDP)
MapR
Summary

Chapter 8 - Hadoop Distributed File System Overview

Hadoop Distributed File System (HDFS)
HDFS Considerations
HDFS High Availability
Storing Raw Data in HDFS
HDFS Security
HDFS Rack-awareness
Data Blocks
Data Block Replication Example
HDFS NameNode Directory Diagram
File Metadata Records (Conceptual View)
NameNode Meta Information Size
HDFS Balancing
Accessing HDFS
Examples of HDFS Commands
Other Supported File Systems
WebHDFS
Examples of WebHDFS Calls
HDFS Daemon Web UI Ports
Viewing Replica Factor and Block Size in NameNode Web UI
HDFS Write Operation
HDFS Read Operation
Read Operation Sequence Diagram
Communication inside HDFS
Summary

Chapter 9 - MapReduce with Hadoop

Hadoop's MapReduce
MapReduce 1 and MapReduce 2
Why Do I Need a Discussion of the Old MapReduce?
MapReduce v1 ("Classic MapReduce")
JobTracker and TaskTracker (the "Classic MapReduce")
YARN (MapReduce v2)
YARN vs MR1
YARN As Data Operating System
MapReduce Programming Options
Hadoop's Streaming MapReduce
Python Word Count Mapper Program Example
Python Word Count Reducer Program Example
Setting up Java Classpath for Streaming Support
Streaming Use Cases
The Streaming API vs Java MapReduce API
Amazon Elastic MapReduce
Apache Tez
Summary

Chapter 10 - Apache Pig Scripting Platform

What is Pig?
Pig Latin
Apache Pig Logo
Pig Execution Modes
Local Execution Mode
MapReduce Execution Mode
Running Pig
Running Pig in Batch Mode
What is Grunt?
Pig Latin Statements
Pig Programs
Pig Latin Script Example
SQL Equivalent
Differences between Pig and SQL
Statement Processing in Pig
Comments in Pig
Supported Simple Data Types
Supported Complex Data Types
Arrays
Defining Relation's Schema
Not Matching the Defined Schema
The bytearray Generic Type
Using Field Delimiters
Loading Data with TextLoader()
Referencing Fields in Relations
Summary

Chapter 11 - Apache Pig Relational and Eval Operators

Pig Relational Operators
Example of Using the JOIN Operator
Example of Using the Order By Operator
Caveats of Using Relational Operators
Pig Eval Functions
Caveats of Using Eval Functions (Operators)
Example of Using Single-column Eval Operations
Example of Using Eval Operators For Global Operations
Summary

Chapter 12 - Hive

What is Hive?
Apache Hive Logo
Hive's Value Proposition
Who uses Hive?
What Hive Does Not Have
Hive's Main Sub-Systems
Hive Features
The "Classic" Hive Architecture
The New Hive Architecture
HiveQL
Where are the Hive Tables Located?
Hive Command-line Interface (CLI)
The Beeline Command Shell
Summary

Chapter 13 - Hive Command-line Interface

Hive Command-line Interface (CLI)
The Hive Interactive Shell
Running Host OS Commands from the Hive Shell
Interfacing with HDFS from the Hive Shell
The Hive in Unattended Mode
The Hive CLI Integration with the OS Shell
Executing HiveQL Scripts
Comments in Hive Scripts
Variables and Properties in Hive CLI
Setting Properties in CLI
Example of Setting Properties in CLI
Hive Namespaces
Using the SET Command
Setting Properties in the Shell
Setting Properties for the New Shell Session
Setting Alternative Hive Execution Engines
The Beeline Shell
Connecting to the Hive Server in Beeline
Beeline Command Switches
Beeline Internal Commands
Summary

Chapter 14 - Hive Data Definition Language

Hive Data Definition Language
Creating Databases in Hive
Using Databases
Creating Tables in Hive
Supported Data Type Categories
Common Numeric Types
String and Date / Time Types
Miscellaneous Types
Example of the CREATE TABLE Statement
Working with Complex Types
Table Partitioning
Table Partitioning on Multiple Columns
Viewing Table Partitions
Row Format
Data Serializers / Deserializers
File Format Storage
File Compression
More on File Formats
The ORC Data Format
Converting Text to ORC Data Format
The EXTERNAL DDL Parameter
Example of Using EXTERNAL
Creating an Empty Table
Dropping a Table
Table / Partition(s) Truncation
Alter Table/Partition/Column
Views
Create View Statement
Why Use Views?
Restricting Amount of Viewable Data
Examples of Restricting Amount of Viewable Data
Creating and Dropping Indexes
Describing Data
Summary

Chapter 15 - Apache Sqoop

What is Sqoop?
Apache Sqoop Logo
Sqoop Import / Export
Sqoop Help
Examples of Using Sqoop Commands
Data Import Example
Fine-tuning Data Import
Controlling the Number of Import Processes
Data Splitting
Helping Sqoop Out
Example of Executing Sqoop Load in Parallel
A Word of Caution: Avoid Complex Free-Form Queries
Using Direct Export from Databases
Example of Using Direct Export from MySQL
More on Direct Mode Import
Data Export from HDFS
Export Tool Common Arguments
Data Export Control Arguments
Data Export Example
INSERT and UPDATE Statements
INSERT Operations
UPDATE Operations
Example of the Update Operation
Failed Exports
Sqoop2
Summary

Chapter 16 - Introduction to Apache Spark

What is Apache Spark
A Short History of Spark
Where to Get Spark?
The Spark Platform
Spark Logo
Common Spark Use Cases
Languages Supported by Spark
Running Spark on a Cluster
The Driver Process
Spark Applications
Spark Shell
The spark-submit Tool
The spark-submit Tool Configuration
The Executor and Worker Processes
The Spark Application Architecture
Interfaces with Data Storage Systems
Limitations of Hadoop's MapReduce
Spark vs MapReduce
Spark as an Alternative to Apache Tez
The Resilient Distributed Dataset (RDD)
Spark Streaming (Micro-batching)
Spark SQL
Example of Spark SQL
Spark Machine Learning Library
GraphX
Spark vs R
Summary

Chapter 17 - The Spark Shell

The Spark Shell
The Spark Shell UI
Spark Shell Options
Getting Help
The Spark Context (sc) and SQL Context (sqlContext)
The Shell Spark Context
Loading Files
Saving Files
Basic Spark ETL Operations
Summary

Chapter 18 - Spark RDDs

The Resilient Distributed Dataset (RDD)
Ways to Create an RDD
Custom RDDs
Supported Data Types
RDD Operations
RDDs are Immutable
Spark Actions
RDD Transformations
Other RDD Operations
Chaining RDD Operations
RDD Lineage
The Big Picture
What May Go Wrong
Checkpointing RDDs
Local Checkpointing
Parallelized Collections
More on parallelize() Method
The Pair RDD
Where do I use Pair RDDs?
Example of Creating a Pair RDD with Map
Example of Creating a Pair RDD with keyBy
Miscellaneous Pair RDD Operations
RDD Caching
RDD Persistence
The Tachyon Storage
Summary

Chapter 19 - Parallel Data Processing with Spark

Running Spark on a Cluster
Spark Stand-alone Option
The High-Level Execution Flow in Stand-alone Spark Cluster
Data Partitioning
Data Partitioning Diagram
Single Local File System RDD Partitioning
Multiple File RDD Partitioning
Special Cases for Small-sized Files
Parallel Data Processing of Partitions
Spark Application, Jobs, and Tasks
Stages and Shuffles
The "Big Picture"
Summary

Chapter 20 - Shared Variables in Spark

Shared Variables in Spark
Broadcast Variables
Creating and Using Broadcast Variables
Example of Using Broadcast Variables
Accumulators
Creating and Using Accumulators
Example of Using Accumulators
Custom Accumulators
Summary

Chapter 21 - Introduction to Spark SQL

What is Spark SQL?
Uniform Data Access with Spark SQL
Hive Integration
Hive Interface
Integration with BI Tools
Spark SQL is No Longer an Experimental Developer API!
What is a DataFrame?
The SQLContext Object
The SQLContext API
Changes Between Spark SQL 1.3 and 1.4
Example of Spark SQL (Scala Example)
Example of Working with a JSON File
Example of Working with a Parquet File
Using JDBC Sources
JDBC Connection Example
Performance & Scalability of Spark SQL
Summary

Chapter 22 - Graph Processing with GraphX

What is GraphX?
Supported Languages
Vertices and Edges
Graph Terminology
Example of Property Graph
The GraphX API
The GraphX Views
The Triplet View
Graph Algorithms
Graphs and RDDs
Constructing Graphs
Graph Operators
Example of Using GraphX Operators
GraphX Performance Optimization
The PageRank Algorithm
GraphX Support for PageRank
Summary

Chapter 23 - The Spark Machine Learning Library

What is MLlib?
Supported Languages
MLlib Packages
Dense and Sparse Vectors
Labeled Point
Python Example of Using the LabeledPoint Class
LIBSVM format
An Example of a LIBSVM File
Loading LIBSVM Files
Local Matrices
Example of Creating Matrices in MLlib
Distributed Matrices
Example of Using a Distributed Matrix
Classification and Regression Algorithms
Clustering
Summary

Chapter 24 - Machine Learning with BigML

What is BigML?
How BigML Service Works
Data Files
Data Sets
Data Sets Example
Models
Predictions
The Prediction UI Form
Text Analysis in BigML
REST API
Summary

When does class start/end?

Classes begin promptly at 9:00 am, and typically end at 5:00 pm.

Does the course schedule include a lunch break?

Lunch is normally an hour long and begins at noon. Coffee, tea, hot chocolate and juice are available all day in the kitchen. Fruit, muffins and bagels are served each morning. There are numerous restaurants near each of our centers, and some popular ones are indicated on the Area Map in the Student Welcome Handbooks - these can be picked up in the lobby or requested from one of our ExitCertified staff.

How can someone reach me during class?

If someone should need to contact you while you are in class, please have them call the center telephone number and leave a message with the receptionist.

What languages are used to deliver training?

Most courses are conducted in English, unless otherwise specified. Some courses will have the word "FRENCH" marked in red beside the scheduled date(s) indicating the language of instruction.

What does GTR stand for?

GTR stands for Guaranteed to Run; if you see a course with this status, it means this event is confirmed to run. View our GTR page to see our full list of Guaranteed to Run courses.

How do I find an ExitCertified training location?

We have training locations across the United States and Canada. View a full list of classroom training locations.

Which delivery formats are available?

At ExitCertified we offer training that is Instructor-Led, Online, Virtual and Self-Paced.

Does ExitCertified deliver group training?

Yes, we provide training for groups and individuals, as well as private on-site sessions. View our group training page for more information.

What does vendor-authorized training mean?

As a vendor-authorized training partner, we offer a curriculum that our partners have vetted. We use the same course materials and facilitate the same labs as our vendor-delivered training. These courses are considered the gold standard and, as such, are priced accordingly.

Is the training too basic, or will you go deep into technology?

It depends on your requirements, your role in your company, and your depth of knowledge. The good news is that many of our learning paths let you progress from the fundamentals to highly specialized training.

How up-to-date are your courses and support materials?

We continuously work with our vendors to evaluate and refresh course material to reflect the latest training courses and best practices.

Are your instructors seasoned trainers who have deep knowledge of the training topic?

ExitCertified instructors have an average of 27 years of practical IT experience. They have also served as consultants for an average of 15 years. To stay up to date, instructors spend at least 25 percent of their time learning new and emerging technologies and courses.

Do you provide hands-on training and exercises in an actual lab environment?

Lab access is dependent on the vendor and the type of training you sign up for. However, many of our top vendors will provide lab access to students to test and practice. The course description will specify lab access.

Will you customize the training for our company’s specific needs and goals?

We will work with you to identify training needs and areas of growth. We offer a variety of training methods, such as private group training, on-site training at a location of your choice, and virtual delivery. We provide courses and certifications that are aligned with your business goals.

How do I get started with certification?

Getting started on a certification pathway depends on your goals and the vendor you choose to get certified with. Many vendors offer certifications ranging from entry-level to advanced that can boost your career. To get access to certification vouchers and discounts, please contact customerexp@exitcertified.com.

Will I get access to content after I complete a course?

You will get access to the PDF of course books and guides, but access to the recording and slides will depend on the vendor and type of training you receive.

How do I request a W9 for ExitCertified LLC?

View our filing status and how to request a W9.

The format of the class was concise. I learned new skills to use at my workplace.

Mustapha F

ExitCertified

the class/lecture was amazing and very easy to understand and was in detail.

Muhammad M

ExitCertified

This course gave me a clearer understanding of the AWS cloud architecture.

ExitCertified Student

ExitCertified

Courseware was effective but would like to have some PDF material on BPML and XPATH

"Steve"

ExitCertified

Both course material and instructor demonstrated a sound foundation on Maximo material

ExitCertified Student

ExitCertified
