This course focuses on DS2, a fourth-generation SAS-proprietary language that provides modern programming techniques and structures for parallel processing and for preparing large volumes of data for analysis. The course also gives a brief introduction to Hadoop and compares it with traditional relational databases. If you are a programmer or data scientist who wants training and hands-on experience manipulating Hadoop data using powerful SAS algorithms, this class is for you.
Learn How To
- describe Hadoop and its core technologies
- differentiate between Hadoop and traditional relational database management systems
- identify the similarities and differences between traditional SAS DATA steps and DS2 DATA programs
- convert a Base SAS DATA step to a DS2 DATA program
- use DS2 variable declarations, expressions, and methods for data conversion, manipulation, and conditional processing
- create user-defined packages to store, share, and execute user-defined DS2 methods
- use predefined DS2 packages for advanced data manipulation
- create and execute DS2 threads for parallel processing
- leverage the SAS In-Database Code Accelerator to execute DS2 code directly on a Hadoop cluster
- execute DS2 code in the SAS High-Performance Analytics grid using the HPDS2 procedure
Who Can Benefit
- Experienced SAS programmers or data scientists who want training and hands-on experience manipulating Hadoop data using powerful SAS algorithms
- This course was written with the seasoned SAS programmer in mind. You should be quite comfortable in this class if you have completed both the SAS(R) Programming II: Manipulating Data with the DATA Step course and the SAS(R) SQL 1: Essentials course, or if you have a solid SAS DATA step programming background and know how to write SQL joins.