We present two machine learning approaches to information extraction from semi-structured documents that can be used if no annotated training data are available, but there does ex...
Challenging the implicit reliance on document collections, this paper discusses the pros and cons of using query logs rather than document collections, as self-contained sources o...
Supervised text categorization is a machine learning task where a predefined category label is automatically assigned to a previously unlabelled document based upon characteristic...
Many academic journals ask their authors to provide a list of about five to fifteen keywords, to appear on the first page of each article. Since these key words are often phrases ...
Most prior work on information extraction has focused on extracting information from text in digital documents. However, often, the most important information being reported in an...