As the number and size of large timestamped collections (e.g. sequences of digitized newspapers, periodicals, blogs) increase, the problem of efficiently indexing and searching su...
Theodoros Lappas, Benjamin Arai, Manolis Platakis,...
A web portal providing access to over 250,000 scanned and OCRed cultural heritage documents is analyzed. The collection consists of the complete Dutch Hansard from 1917 to 1995. E...
Distributions of the senses of words are often highly skewed. This fact is exploited by word sense disambiguation (WSD) systems which back off to the predominant (most frequent) s...
Models of latent document semantics such as the mixture of multinomials model and Latent Dirichlet Allocation have received substantial attention for their ability to discover top...
Daniel David Walker, William B. Lund, Eric K. Ring...
Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on...