This course describes the functionality of SAS Text Miner, a separately licensed component available for SAS Enterprise Miner. You learn to use SAS Text Miner to uncover underlying themes or concepts in large document collections, automatically group documents into topical clusters, classify documents into predefined categories, and integrate text data with structured data to enrich predictive models.
The e-learning format of this course includes Virtual Lab time to practice.
Learn how to:
- Convert documents stored in standard formats (Microsoft Word, Adobe PDF, and so on) into general purpose HTML or TXT formats.
- Read documents from a variety of sources (web pages, flat files, data elements in a relational database, spreadsheet cells, and so on) into SAS tables.
- Process textual data for text mining (for example, correct misspellings or recode acronyms and abbreviations).
- Convert unstructured text-based character data into structured numeric data.
- Explore words and phrases in a document collection.
- Query document collections using keywords (that is, identify documents having specific words or phrases).
- Identify topics or concepts that appear in a document collection.
- Create user-influenced topic tables from scratch or by modifying machine-generated topics or concepts using domain knowledge.
- Use derived topic tables or preexisting user-influenced topic tables (or both) to enhance information retrieval and document classification.
- Cluster documents into homogeneous subgroups.
- Classify documents into predefined categories.
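To make two of the objectives above concrete (converting unstructured text into structured numeric data, and querying a collection by keyword), here is a minimal sketch in plain Python. It is not SAS Text Miner code and does not reflect how the product works internally. The function names `tokenize`, `term_document_matrix`, and `query`, as well as the sample documents, are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Return (sorted vocabulary, rows of raw term counts), one row per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    # A Counter returns 0 for terms that a document does not contain.
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


def query(docs, keyword):
    """Return the indices of documents that contain the keyword as a whole token."""
    kw = keyword.lower()
    return [i for i, d in enumerate(docs) if kw in tokenize(d)]


# Hypothetical sample collection (illustration only).
docs = [
    "The engine stalled and the engine light came on.",
    "Brake pedal feels soft.",
    "Engine noise after brake service.",
]

vocab, rows = term_document_matrix(docs)
print(vocab)
print(rows)
print(query(docs, "engine"))  # [0, 2]
```

Each row of `rows` is a numeric representation of one document that could sit alongside structured columns in a table. A real text mining workflow would typically add steps such as stop-word removal, stemming, and term weighting, which this sketch omits.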
Who Can Benefit
- Statisticians, business analysts, and market researchers who incorporate free-format textual information in their analyses
- Managers of large document collections who must organize and select documents using data mining
- Students of data mining who want to learn about text mining
Before attending this course, you should:
- Be acquainted with Microsoft Windows and Windows-based software.
- Have at least an introductory-level familiarity with basic statistics and regression modeling.