Features
CONTENT ANALYSIS AND TEXT-MINING TOOL FOR STATA
WordStat for Stata was created to allow users of Stata v13 through v16 running under Windows to apply text analytics techniques to any string variable stored in a Stata data file. WordStat combines natural language processing, content analysis, and statistical techniques to quickly extract topics, patterns, and relationships from large amounts of text. It can process millions of words in seconds and compare extracted themes across any other numerical, categorical, or date variables in the Stata file.

WHAT IS IT USED FOR?
WordStat can be used by anyone who needs to quickly extract and analyze information stored in Stata text variables. It may be used for:
• Content analysis of open-ended responses, interview transcripts, or focus group transcripts
• Business intelligence and competitive website analysis
• Information extraction and knowledge discovery from incident reports and customer complaints
• Content analysis of news coverage or scientific literature (scientometric or bibliometric studies)
• Automatic tagging and classification of documents
• Fraud detection, authorship attribution, and patent analysis
• Taxonomy development and validation
For some examples of studies using WordStat, see the Studies page.

WORDSTAT FOR STATA KEY FEATURES

EXPLORATORY TEXT MINING
Integrated exploratory text mining and visualization tools, such as clustering, multidimensional scaling, and proximity plots, quickly extract themes and automatically identify patterns.

TOPIC MODELING
Get a quick overview of the most salient topics in large text collections. A side panel allows one to compare the frequency of specific topics across other variables using bar charts or line charts.

CATEGORIZATION DICTIONARIES
Use existing dictionaries or create custom ones composed of words, word patterns, phrases, and proximity rules. Get computer assistance for building taxonomies with phrase and named-entity extraction, misspelling replacement, an integrated thesaurus, etc.

COMPARATIVE ANALYSIS
Explore relationships between unstructured text and structured data with statistical and graphical tools (correspondence analysis, heatmaps, bubble charts, etc.).

LINK ANALYSIS
Explore relationships among words or extracted concepts using force-based graphs, multidimensional scaling, or circular graphs. Retrieve text segments associated with specific connections.

MACHINE LEARNING
Develop automatic document classification models using Naive Bayes and K-Nearest Neighbors. Classification models may then be saved to disk and reapplied to new data.

CHARTING
Illustrate patterns and explore complex phenomena with interactive visualization tools such as bar charts, line charts, heatmaps, word clouds, bubble charts, MDS plots, etc. Copy and paste charts or save them to disk in BMP, JPG, or PNG file formats.

DOCUMENT CONVERSION WIZARD
The Document Conversion Wizard allows one to easily import documents stored in various file formats (DOC, HTML, PDF, TXT) into a new Stata .dta file and to automatically extract numeric and alphanumeric values from structured documents.