We propose a new probabilistic approach to information retrieval based upon the ideas and methods of statistical machine translation. The central ingredient in this approach is a ...
We present a system to automatically generate RSS feeds from HTML documents that consist of time-series items with date expressions, e.g., archives of weblogs, BBSs, chats, mailin...
We describe an evaluation of result set filtering techniques for providing ultra-high precision in the task of presenting related news for general web queries. In this task, the n...
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury...
To understand the subjective documents, for example, public comments on the government’s proposed regulation, opinion identification and classification is required. Rather than ...
Namhee Kwon, Liang Zhou, Eduard H. Hovy, Stuart W....
Accurate web page classification often depends crucially on information gained from neighboring pages in the local web graph. Prior work has exploited the class labels of nearby p...