Browsing and searching for documents in large, online enterprise document repositories are common activities. While internet search produces satisfying results for most user queri...
Andreas Girgensohn, Frank M. Shipman III, Francine...
We simulate different architectures of a distributed Information Retrieval system on a very large Web collection, in order to work out the optimal setting for a particular set of r...
This paper describes a system to support humanities scholars in their interpretation of literary work. It presents a user interface and web architecture that integrates text minin...
Catherine Plaisant, James Rose, Bei Yu, Loretta Au...
Text categorization algorithms usually represent documents as bags of words and consequently have to deal with huge numbers of features. Most previous studies found that the major...
The Wikipedia XML collection turned out to be rich in marked-up phrases as we carried out our INEX 2007 experiments. Assuming that a phrase occurs at the inline level of the markup...