We present a novel approach to managing redundancy in sequence databanks such as GenBank. We store clusters of near-identical sequences as a representative union-sequence and a se...
Michael Cameron, Yaniv Bernstein, Hugh E. Williams
Background: Document classification is a wide-spread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, ...
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
On the Web, there is a pervasive use of XML to give lightweight semantics to textual collections. Such documentcentric XML collections require a query language that can gracefully...
Jaap Kamps, Maarten Marx, Maarten de Rijke, Bö...
This paper proposes an effective approach to provide relevant search terms for conceptual Web search. ‘Semantic Term Suggestion’ function has been included so that users can f...