This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. ...
Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom...
: This paper presents a comprehensive overview of the TopX search engine, an extensive framework for unified indexing and querying large collections of unstructured, semistructured...
Abstract. Notations like SGML and XML represent document structures using tree structures; while this is in general a step forward from earlier systems, it creates certain difficul...
Today, valuable business information is increasingly stored as unstructured data (documents, emails, etc.). For example, documents exchanged between business partners capture info...
Despite recent advances in authoring systems and tools, creating multimedia presentations remains a labor-intensive process. This paper describes a system for automatically constr...