The main text content of an HTML document on the WWW is typically surrounded by additional content, such as navigation menus, advertisements, link lists or design elements. Conte...
We propose a new unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a...
Michal Rosen-Zvi, Chaitanya Chemudugunta, Thomas L...
This paper introduces an information-retrieval-based approach for automating the detection and classification of non-functional requirements (NFRs). Early detection of NFRs is use...
Jane Cleland-Huang, Raffaella Settimi, Xuchang Zou...
This paper reports on work done for the Genomics Track at TREC 2004 by ConverSpeech LLC in conjunction with scientists at the Saccharomyces Genome Database (SGD), the model organi...
Colleen E. Crangle, Alex Zbyslaw, J. Michael Cherr...
Indexing file systems is a powerful means of helping users locate documents, software, and other types of data among large repositories. In environments that contain many differen...