This OnDemand offering provides you with a 180-day subscription that begins on the date of purchase.
In this course, you will learn:
- How to identify potential business use cases where data science can provide impactful results
- How to obtain, clean, and combine disparate data sources to create a coherent picture for analysis
- What statistical methods to leverage for data exploration that will provide critical insight into your data
- Where and when to leverage Hadoop Streaming and Apache Spark for data science pipelines
- What machine learning technique to use for a particular data science project
- How to implement and manage recommenders using Spark’s MLlib, and how to set up and evaluate data experiments
- What pitfalls to avoid when deploying new analytics projects to production at scale
This course is suitable for developers, data analysts, and statisticians who have basic knowledge of Apache Hadoop (HDFS, MapReduce, Hadoop Streaming, and Apache Hive) and experience working in Linux environments. Students should be proficient in a scripting language; Python is strongly preferred, but familiarity with Perl or Ruby is sufficient.