In a corpus of jokes, a human might judge two documents to be the "same joke" even if characters, locations, and other details are varied. A given joke could be retold w...
This overview introduces the aim of the SMUC 2010 workshop, as well as the list of papers presented in the workshop. Categories and Subject Descriptors H.3 [Information Systems]: ...
Web search is challenging partly due to the fact that search queries and Web documents use different language styles and vocabularies. This paper provides a quantitative analysis ...
A typical collection of personal information contains many documents and mentions many concepts (e.g., person names, events, etc.). In this environment, associative browsing betwe...
Jinyoung Kim, Anton Bakalov, David A. Smith, W. Br...
Many large-scale Web applications that require ranked top-k retrieval are implemented using inverted indices. An inverted index represents a sparse term-document matrix, where non...
George Beskales, Marcus Fontoura, Maxim Gurevich, ...