ParaText: scalable text modeling and analysis

Automated analysis of unstructured text documents (e.g., web pages, newswire articles, research publications, business reports) is a key capability for solving important problems in areas including decision making, risk assessment, social network analysis, intelligence analysis, scholarly research, and others. However, as data sizes continue to grow in these areas, scalable processing, modeling, and semantic analysis of text collections become essential. In this paper, we present the ParaText text analysis engine, a distributed memory software framework for processing, modeling, and analyzing collections of unstructured text documents. Results on several document collections using hundreds of processors are presented to illustrate the flexibility, extensibility, and scalability of the entire process of text modeling from raw data ingestion to application analysis.
Categories and Subject Descriptors: I.2.7 [Computing Methodologies]: Natural Language Processing--text analysis General...
Daniel M. Dunlavy, Timothy M. Shead, Eric T. Stanton
Added 09 Nov 2010
Updated 09 Nov 2010
Type Conference
Year 2010
Where HPDC
Authors Daniel M. Dunlavy, Timothy M. Shead, Eric T. Stanton