This paper explores correspondence and mixture topic modeling of documents tagged from two different perspectives. There has been ongoing work in topic modeling of documents with...
In this paper, we reveal a common deficiency of the current retrieval models: the component of term frequency (TF) normalization by document length is not lower-bounded properly;...
Many large-scale Web applications that require ranked top-k retrieval are implemented using inverted indices. An inverted index represents a sparse term-document matrix, where non...
George Beskales, Marcus Fontoura, Maxim Gurevich, ...
Aggregated search is the task of integrating results from potentially multiple specialized search services, or verticals, into the Web search results. The task requires predicting...
This paper addresses the problem of mining named entity translations from comparable corpora, specifically, mining English and Chinese named entity translation. We first observe...
Jinhan Kim, Long Jiang, Seung-won Hwang, Young-In ...
Modern computing systems generate large amounts of log data. System administrators or domain experts utilize the log data to understand and optimize system behaviors. Most system ...
Twitter summarizes the great deal of messages posted by users in the form of trending topics that reflect the top conversations being discussed at a given moment. These trending ...
The field of information retrieval has witnessed over 50 years of research on retrieval methods for metadata descriptions and controlled indexing languages, the prototypical exam...
Previous information extraction (IE) systems are typically organized as a pipeline architecture of separated stages which make independent local decisions. When the data grows bey...
Qi Li, Sam Anzaroot, Wen-Pin Lin, Xiang Li, Heng J...