We apply a well-known Bayesian probabilistic model to textual information retrieval: the classification of documents based on their relevance to a query. This model was previously...
In this paper, we propose a tree-structured multiclass classifier to identify annotations and overlapping text from machine printed documents. Each node of the tree-structured cla...
This work applies boosted wrapper induction (BWI), a machine learning algorithm for information extraction from semi-structured documents, to the problem of named entity recogniti...
Many of the existing machine-learning approaches to speech summarization cast important sentence selection as a two-class classification problem and have shown empirical success f...
Opinion detection research relies on labeled documents for training data, either by assumptions based on the document’s origin or by using human assessors to categorise the docu...