This course focuses on the statistical and machine learning methods for predictive modeling that are available in the IMSTAT procedure. Topics include building candidate predictive models and assessing them on training and holdout data for honest assessment. You learn about methods such as decision trees and random forests using the DECISIONTREE and RANDOMWOODS statements. The course also covers modeling a binary response with the LOGISTIC and NEURAL statements, and analyzing an interval target with linear and generalized linear models using the GLM and GENMODEL statements. Generating and using Base SAS score code is demonstrated as well, and features of ODS Statistical Graphics are described for visualizing IMSTAT results.
Learn How To
- distribute SAS tables in the Hadoop Distributed File System (HDFS)
- load Hadoop tables into LASR memory
- process in-memory tables with PROC LASR and PROC IMSTAT
- build predictive models using the PROC IMSTAT statements DECISIONTREE, RANDOMWOODS, LOGISTIC, NEURAL, GLM, and GENMODEL
- produce assessment statistics using the PROC IMSTAT ASSESS statement
- produce score code
- score new data sets
- generate visual summaries of data using ODS Statistical Graphics
Who Can Benefit
- Experienced predictive modelers who need to learn the syntax and functionality of the analytical statements in the IMSTAT procedure
Prerequisites
Before attending this course, you should have completed Getting Started with In-Memory Statistics For Hadoop. You should also have knowledge of, and experience with, analytical methods such as binary logistic regression and decision trees. An understanding of predictive modeling concepts such as honest assessment on holdout data is also required.