Say you are looking for information about a particular person. A search engine returns many pages for that person's name but which pages are about the person you care about, ...
Improving the precision of information retrieval has been a challenging issue on Chinese Web. As exemplified by Chinese recipes on the Web, it is not easy/natural for people to us...
The selection of indexing terms for representing documents is a key decision that limits how effective subsequent retrieval can be. Often stemming algorithms are used to normaliz...
Users of a digital book library system typically interact with the system to search for books by querying on the metadata describing the books or to search for information in the p...
In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz