This course builds on skills developed in the Data Science and Big Data Analytics course. The main focus areas are Hadoop (including Pig, Hive, and HBase), natural language processing, social network analysis, simulation, random forests, multinomial logistic regression, and data visualization. Taking a technology-neutral approach, the course uses several open-source tools to address big data challenges.
- MapReduce functionality
- NoSQL databases and Hadoop ecosystem tools for analyzing large-scale, unstructured datasets
- Natural language processing, social network analysis, and data visualization concepts
- Advanced quantitative methods, with one of them applied in a Hadoop environment
- Application of advanced techniques to real-world datasets in a final lab
Who Can Benefit
- Aspiring data scientists
- Data analysts who have completed the associate-level Data Science and Big Data Analytics course
- Computer scientists who want to learn MapReduce and methods for analyzing unstructured data such as text