Apache Hive makes transformation and analysis of complex, multi-structured data scalable in Hadoop. Apache Impala enables low-latency, interactive analysis of data stored in Hadoop using a native SQL environment. Together, they make multi-structured data accessible to analysts, database administrators, and others without Java programming expertise.
Participants will learn:
- How the open source ecosystem of big data tools addresses challenges not met by traditional RDBMSs
- How Apache Hive and Apache Impala are used to provide SQL access to data
- How Hive and Impala syntax and data formats, including functions and subqueries, help answer questions about data
- How to create, modify, and delete tables, views, and databases; load data; and store results of queries
- How to create and use partitions and different file formats
- How to combine two or more datasets using JOIN or UNION, as appropriate
- What analytic and windowing functions are, and how to use them
- How to store and query complex or nested data structures
- How to process and analyze semi-structured and unstructured data
- Techniques for optimizing Hive and Impala queries
- How to extend the capabilities of Hive and Impala using parameters, custom file formats and SerDes, and external scripts
- How to determine whether Hive, Impala, an RDBMS, or a mix of these is best for a given task
This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Some knowledge of SQL and basic familiarity with the Linux command line are assumed. Prior knowledge of Apache Hadoop is not required.