An Embedded Media Marker (EMM) is a transparent mark printed on a paper document that signifies the availability of additional media associated with that part of the document. Use...
Qiong Liu, Chunyuan Liao, Lynn Wilcox, Anthony Dun...
In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
In recentyears,databaseresearchandproduct developmentactivities havefocusedonsupport for non-traditional data types, such astext or multi-media documents.This paper describes an a...
A Question Answering (QA) system aims to return exact answers to natural language questions. While today information retrieval techniques are quite successful at locating within l...
Although many variants of language models have been proposed for information retrieval, there are two related retrieval heuristics remaining “external” to the language modelin...