Sciweavers

241 search results for "Detecting Co-Derivative Documents in Large Text Collections" (page 21 of 49)
KDD
2009
ACM
On burstiness-aware search for document sequences
As the number and size of large timestamped collections (e.g. sequences of digitized newspapers, periodicals, blogs) increase, the problem of efficiently indexing and searching su...
Theodoros Lappas, Benjamin Arai, Manolis Platakis,...

AND
2009
Digital weight watching: reconstruction of scanned documents
A web portal providing access to over 250,000 scanned and OCRed cultural heritage documents is analyzed. The collection consists of the complete Dutch Hansard from 1917 to 1995. E...
Tim Gielissen, Maarten Marx

CICLING
2007
Springer
Text Categorization for Improved Priors of Word Meaning
Distributions of the senses of words are often highly skewed. This fact is exploited by word sense disambiguation (WSD) systems which back off to the predominant (most frequent) s...
Rob Koeling, Diana McCarthy, John Carroll

EMNLP
2010
Evaluating Models of Latent Document Semantics in the Presence of OCR Errors
Models of latent document semantics such as the mixture of multinomials model and Latent Dirichlet Allocation have received substantial attention for their ability to discover top...
Daniel David Walker, William B. Lund, Eric K. Ring...

PKDD
1998
Springer
Text Mining at the Term Level
Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on...
Ronen Feldman, Moshe Fresko, Yakkov Kinar, Yehuda ...