We present a text analysis toolset that allows analysts in various fields to sift quickly through large collections of multilingual news items and to find information that...
Many of the documents in large text collections are duplicates and versions of each other. In recent research, we developed new methods for finding such duplicates; however, as the...
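The abstract does not describe the new methods. For reference only, one common baseline for near-duplicate detection compares documents by the overlap of their word k-gram "shingles" under Jaccard similarity. The sketch below assumes that baseline. The function names, the shingle size `k=3`, and the `0.5` threshold are hypothetical illustration choices, not the authors' method.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (contiguous word k-grams) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts: treat the whole token sequence as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(docs: dict[str, str], threshold: float = 0.5, k: int = 3):
    """Yield (id1, id2, score) for document pairs whose shingle overlap meets threshold.

    Hypothetical baseline, not the method from the abstract. It compares every
    pair (O(n^2)), so a large collection would need an index such as MinHash/LSH.
    """
    sigs = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    ids = sorted(sigs)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = jaccard(sigs[a], sigs[b])
            if score >= threshold:
                yield a, b, score


# Usage: two versions of the same news item and one unrelated item.
docs = {
    "a": "The central bank raised interest rates by half a point on Tuesday.",
    "b": "The central bank raised interest rates by half a point on Tuesday morning.",
    "c": "Heavy rain caused flooding across the northern provinces.",
}
for pair in near_duplicate_pairs(docs):
    print(pair)
```

Here "a" and "b" share 10 of their 11 distinct shingles (score about 0.91), so they are reported as a pair, while "c" matches neither. Jaccard over shingles is used because small edits, such as an appended word, change only the few shingles that touch the edit.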
A key challenge of data-driven social science is the gathering of high-quality multi-dimensional datasets. A second challenge relates to the design and execution of structured experim...
Term-weighting schemes are vital to the performance of Information Retrieval models that use term frequency characteristics to determine the relevance of a document. The vector spa...
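The schemes studied in the abstract are cut off, so the sketch below only illustrates the general idea of term weighting in a vector space model. It assumes plain TF-IDF weights (raw term frequency times ln(N / df)) and cosine similarity. That is a standard textbook choice picked for illustration, not necessarily any scheme from the work itself. The helper names are hypothetical.

```python
import math
from collections import Counter


def weigh(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """TF-IDF vector: raw term frequency times idf (unseen terms get weight 0)."""
    return {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokens).items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query: list[str], docs: list[list[str]]) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs sorted by descending cosine similarity."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / c) for t, c in df.items()}
    q_vec = weigh(query, idf)
    scores = [(i, cosine(q_vec, weigh(doc, idf))) for i, doc in enumerate(docs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


docs = [["rates", "rise", "bank"], ["bank", "holiday"], ["flood", "rain", "rain"]]
print(rank(["bank", "rates"], docs))
```

Document 0 ranks first because it contains the rare term "rates", which carries a high idf weight. "bank" appears in two of the three documents and contributes less. Note that with this idf variant, a term occurring in every document gets weight 0. Smoothed variants avoid that.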
Evaluation in Information Retrieval (IR) has long focused on effectiveness and efficiency. However, new and emerging access tasks now demand alternative evaluation measures which ...