Sciweavers

» Document Analysis
DL
1994
Springer
15 years 8 months ago
Corpus Linguistics for Establishing The Natural Language Content of Digital Library Documents
Digital Libraries will hold huge amounts of text and other forms of information. For the collections to be maximally useful, they must be highly organized with useful indexes and ...
Robert P. Futrelle, Xiaolan Zhang 0002, Yumiko Sek...
ACML
2009
Springer
15 years 10 months ago
Estimating Likelihoods for Topic Models
Topic models are a discrete analogue to principal component analysis and independent component analysis that model topic at the word level within a document. They have ma...
Wray L. Buntine
PVLDB
2008
15 years 3 months ago
Scalable ad-hoc entity extraction from text collections
Supporting entity extraction from large document collections is important for enabling a variety of important data analysis tasks. In this paper, we introduce the "ad-hoc" ...
Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaud...
IJCNLP
2004
Springer
15 years 9 months ago
Combining Labeled and Unlabeled Data for Learning Cross-Document Structural Relationships
Multi-document discourse analysis has emerged with the potential of improving various NLP applications. Based on the newly proposed Cross-document Structure Theory (CST), this pap...
Zhu Zhang, Dragomir R. Radev
ANLP
1994
15 years 5 months ago
Practical Issues in Automatic Documentation Generation
PLANDoc, a system under joint development by Columbia and Bellcore, documents the activity of planning engineers as they study telephone routes. It takes as input a trace of the e...
Kathleen McKeown, Karen Kukich, James Shaw