DS2 Programming Essentials with Hadoop

Course Details
Code: DS2H
Tuition (USD): $1,300.00 • Classroom (2 days)
GSA (USD): $1,178.84 • Classroom (2 days)

This course focuses on DS2, a fourth-generation SAS-proprietary language that provides modern programming techniques and structures for parallel processing and for preparing large data for analysis. The course also gives a brief introduction to Hadoop and compares it with traditional relational databases. If you are a programmer or data scientist who wants training and hands-on experience manipulating Hadoop data using powerful SAS algorithms, this class is for you.

Skills Gained

  • describe Hadoop and its core technologies
  • differentiate between Hadoop and traditional relational database management systems
  • identify the similarities and differences between traditional SAS DATA steps and DS2 DATA programs
  • convert a Base SAS DATA step to a DS2 DATA program
  • use DS2 variable declarations, expressions, and methods for data conversion, manipulation, and conditional processing
  • create user-defined packages to store, share, and execute user-defined DS2 methods
  • use predefined DS2 packages for advanced data manipulation
  • create and execute DS2 threads for parallel processing
  • leverage the SAS In-Database Code Accelerator to execute DS2 code directly on a Hadoop cluster
  • execute DS2 code in the SAS High-Performance Analytics grid using the HPDS2 procedure

Who Can Benefit

  • Experienced SAS programmers and/or data scientists who want training and hands-on experience manipulating Hadoop data using powerful SAS algorithms

Prerequisites

  • This course was written with the seasoned SAS programmer in mind. If you have completed both the SAS® Programming II: Manipulating Data with the DATA Step course and the SAS® SQL 1: Essentials course, or have a solid SAS DATA step programming background and know how to write SQL joins, you should be quite comfortable in this class.

Course Details

Introduction

  • introduction to DS2
  • introduction to Hadoop
  • course logistics

Getting Started

  • hello world
  • basic DS2 syntax
  • converting DATA steps to DS2 DATA programs

DATA Steps versus DS2 DATA Programs

  • similarities to the DATA step
  • DS2 missing features

New Data Types and Syntax

  • DATA program structuring
  • data types
  • automatic data type conversion
  • expressions

Methods, Packages, and Threads

  • methods
  • user-defined packages
  • predefined packages
  • threads

DS2 Unleashed

  • SAS In-Database Code Accelerator
  • introduction to the HPDS2 procedure

Learning More

  • learning more

Appendix (Self-Study): Tracing and Debugging

  • tracing
  • debugging