Accelebrate's Introduction to R Programming training course teaches attendees how to use R programming to explore data from a variety of sources by building inferential models and generating charts, graphs, and other data representations.
Skills Gained
All students will:
- Master the use of the R interactive environment
- Expand R by installing R packages
- Explore and understand how to use the R documentation
- Read structured data into R from various sources
- Understand the different data types in R
- Understand the different data structures in R
- Understand how to use dates in R
- Use R for mathematical operations
- Use vectorized calculations
- Write user-defined R functions
- Use control statements
- Write loop constructs in R
- Use the apply family of functions to iterate functions across data
- Reshape data to support different analyses
- Understand split-apply-combine (group-wise operations) in R
- Deal with missing data
- Manipulate strings in R
- Understand basic regular expressions in R
- Understand base R graphics
- Focus on ggplot2 graphics for R
- Be familiar with trellis (lattice) graphics
- Use R for descriptive statistics
- Use R for inferential statistics
- Write multivariate models in R
- Understand confounding and adjustment in multivariate models
- Understand interaction in multivariate models
- Predict/Score new data using models
- Understand basic non-linear functions in models
- Understand how to link data, statistical methods, and actionable questions
Prerequisites
Students should have knowledge of basic statistics (t-test, chi-square test, regression) and know the difference between descriptive and inferential statistics. No programming experience is needed.
Software Requirements
- R 3.0 or later with console
- IDE or text editor of your choice (RStudio recommended)
Outline
Overview
- History of R
- Advantages and disadvantages
- Downloading and installing
- How to find documentation
Introduction
- Using the R console
- Getting help
- Learning about the environment
- Writing and executing scripts
- Object-oriented programming
- Introduction to vectorized calculations
- Introduction to data frames
- Installing packages
- Working directory
- Saving your work
Variable types and data structures
- Variables and assignment
- Data types
  - Numeric, character, logical (boolean), and factor
- Data structures
  - Vectors, matrices, arrays, data frames, lists
- Indexing, subsetting
- Assigning new values
- Viewing data and summaries
- Naming conventions
- Objects
Getting data into the R environment
- Built-in data
- Reading data from structured text files
- Reading data using ODBC
Data frame manipulation with dplyr
- Renaming columns
- Adding new columns
- Binning data (continuous to categorical)
- Combining categorical values
- Transforming variables
- Handling missing data
- Reshaping data from long to wide and back
- Merging datasets together
- Stacking datasets together (concatenation)
Handling dates in R
- Date and date-time classes in R
- Formatting dates for modeling
Control flow
- Truth testing
- Branching
- Looping
Functions in depth
- Parameters
- Return values
- Variable scope
- Exception handling
Applying functions across dimensions
Exploratory data analysis (descriptive statistics)
- Continuous data
  - Distributions
  - Quantiles, mean
  - Bimodal distributions
  - Histograms, box plots
- Categorical data
  - Tables
  - Bar plots
- Group-by calculations with dplyr
- Split-apply-combine
- Melting and casting data
Inferential statistics
- Bivariate correlation
- t-test and non-parametric equivalents
- Chi-squared test
Base graphics
- Base graphics system in R
- Scatterplots, histograms, bar charts, box-and-whisker plots, dot plots
- Labels, legends, titles, axes
- Exporting graphics to different formats
Advanced R graphics: ggplot2
- Understanding the grammar of graphics
- Quick plots (qplot function)
- Building graphics by pieces (ggplot function)
Generalized linear regression
- Linear and logistic models
- Regression plots
- Confounding / interaction in regression
- Scoring new data from models (prediction)
Conclusion