Comprehensive Data Science with Python

  • Contact Us For Pricing
  • Reviews 4.5 out of 5 stars (1131 Ratings)
  • Course Code ACCEL-PYTH-CDS
  • Duration 5 days
  • Available Formats Classroom

This data science training course teaches engineers, data scientists, statisticians, and other quantitative professionals the Python skills they need to analyze and chart data.

Skills Gained

All students will:

  • Understand the differences between Python's basic data types
  • Know when to use different Python collections
  • Implement Python functions
  • Understand control flow constructs in Python
  • Handle errors via exception handling constructs
  • Be able to quantitatively define an answerable, actionable question
  • Import both structured and unstructured data into Python
  • Parse unstructured data into structured formats
  • Understand the differences between NumPy arrays and pandas DataFrames
  • Understand where Python fits in the Python/Hadoop/Spark ecosystem
  • Simulate data through random number generation
  • Understand mechanisms for missing data and analytic implications
  • Explore and clean data
  • Create compelling graphics to reveal analytic results
  • Reshape and merge data to prepare for advanced analytics
  • Test for group differences using inferential statistics
  • Implement linear regression from a frequentist perspective
  • Understand non-linear terms, confounding, and interaction in linear regression
  • Extend to logistic regression to model binary outcomes
  • Understand the difference between machine learning and frequentist approaches to statistics
  • Implement classification and regression models using machine learning
  • Score new datasets, evaluate model fit, and quantify variable importance

Prerequisites

All attendees should have prior programming experience and an understanding of basic statistics.

Course Details

Software Requirements

  • Anaconda Python 3.5 or later
  • Spyder IDE (Comes with Anaconda)

Data Science with Python Programming Training Outline

Base Python Introduction

  • History and current use
  • Installing the Software
  • Python Distributions
  • String Literals and numeric objects
  • Collections (lists, tuples, dicts)
  • Datetime classes in Python
  • Memory Management in Python
  • Control Flow
  • Functions
  • Exception Handling

Defining Actionable, Analytic Questions

  • Defining the quantitative construct to make inference on the question
  • Identifying the data needed to support the constructs
  • Identifying limitations to the data and analytic approach
  • Constructing Sensitivity analyses

Bringing Data In

  • Structured Data
  • Structured Text Files
  • Excel workbooks
  • SQL databases
  • Working with Unstructured Text Data
  • Reading Unstructured Text
  • Introduction to Natural Language Processing with Python

NumPy: Matrix Language

  • Introduction to the ndarray
  • NumPy operations
  • Broadcasting
  • Missing data in NumPy (masked array)
  • NumPy Structured arrays
  • Random number generation

Data Preparation with Pandas

  • Filtering
  • Creating and deleting variables
  • Discretization of Continuous Data
  • Scaling and standardizing data
  • Identifying Duplicates
  • Dummy Coding
  • Combining Datasets
  • Transposing Data
  • Long to wide and back

Exploratory Data Analysis with Pandas

  • Univariate Statistical Summaries and Detecting Outliers
  • Multivariate Statistical Summaries and Outlier Detection
  • Group-wise calculations using Pandas
  • Pivot Tables

Exploring Data Graphically

  • Histogram
  • Box-and-whiskers plot
  • Scatter plots
  • Forest Plots
  • Group-by plotting

Advanced Graphing with Matplotlib, Pandas, and Seaborn

Python, Hadoop and Spark

  • Introduction to the differences among Python, Hadoop, and Spark
  • Importing data from Spark and Hadoop to Python
  • Parallel execution leveraging Spark or Hadoop

Missing Data

  • Exploring and understanding patterns in missing data
  • Missing at Random
  • Missing Not at Random
  • Missing Completely at Random
  • Data imputation methods

Traditional Inferential Statistics

  • Comparing Groups
  • P-Values, summary statistics, sufficient statistics, inferential targets
  • T-Tests (equal and unequal variances)
  • ANOVA
  • Chi-Square Tests
  • Correlation

Frequentist Approaches to Multivariate Statistics

  • Linear Regression
  • Multivariate linear regression
  • Capturing Non-linear Relationships
  • Comparing Model Fits
  • Scoring new data
  • Poisson Regression Extension
  • Logistic regression
  • Logistic Regression Example
  • Classification Metrics

Machine Learning Approaches to Multivariate Statistics

  • Machine Learning Theory
  • Data pre-processing
  • Missing Data
  • Dummy Coding
  • Standardization
  • Training/Test data
  • Supervised Versus Unsupervised Learning
  • Unsupervised Learning: Clustering
  • Clustering Algorithms
  • Evaluating Cluster Performance
  • Dimensionality Reduction
  • A-priori
  • Principal Components Analysis
  • Penalized Regression

Supervised Learning: Regression

  • Linear Regression
  • Penalized Linear Regression
  • Stochastic Gradient Descent
  • Scoring New Data Sets
  • Cross Validation
  • Bias-Variance Tradeoff
  • Feature Importance

Supervised Learning: Classification

  • Logistic Regression
  • LASSO
  • Random Forest
  • Ensemble Methods
  • Feature Importance
  • Scoring New Data Sets
  • Cross Validation

Conclusion

When does class start/end?

Classes begin promptly at 9:00 am, and typically end at 5:00 pm.

Does the course schedule include a lunch break?

Lunch is normally an hour long and begins at noon. Coffee, tea, hot chocolate and juice are available all day in the kitchen. Fruit, muffins and bagels are served each morning. There are numerous restaurants near each of our centers, and some popular ones are indicated on the Area Map in the Student Welcome Handbooks, which can be picked up in the lobby or requested from one of our ExitCertified staff.

How can someone reach me during class?

If someone should need to contact you while you are in class, please have them call the center telephone number and leave a message with the receptionist.

What languages are used to deliver training?

Most courses are conducted in English, unless otherwise specified. Some courses will have the word "FRENCH" marked in red beside the scheduled date(s) indicating the language of instruction.

What does GTR stand for?

GTR stands for Guaranteed to Run; if you see a course with this status, it means this event is confirmed to run. View our GTR page to see our full list of Guaranteed to Run courses.

Does ExitCertified deliver group training?

Yes, we provide training for groups and individuals, as well as private on-site training. View our group training page for more information.

The course was well organized, with excellent training materials and lab exercises.

The instructor was very good, but there were a few technical problems from Cognizant's end that made things difficult from time to time.

A must-take course if you want to be an AWS cloud architect. It touches almost all the essential services in AWS.

ExitCertified offers friendly, experienced, and great instructors for their courses!

The course was very thorough and the instructor ensured that all the students were on the same page.

There are currently no scheduled dates for this course. If you are interested in this course, request a course date with the links above. We can also contact you when the course is scheduled in your area.

Contact Us 1-800-803-3948
FAQ Get immediate answers to our most frequently asked questions. View FAQs