The amount of user-generated content on the Web is exploding due to the emergence of "Web 2.0" services, such as Blogger, MySpace, Flickr, and del.icio.us. The part...
Ka Cheung Sia, Junghoo Cho, Yun Chi, Belle L. Tsen...
This paper identifies and explores the problem of seed selection in a web-scale crawler. We argue that seed selection is a nontrivial and very important problem. Selecting proper...
TextWise LLC. participated in the TREC-7 Cross-Language Retrieval track using the CINDOR system, which utilizes a "conceptual interlingua" representation of documents a...
Anne Diekema, Farhad Oroumchian, Paraic Sheridan, ...
In a corpus of jokes, a human might judge two documents to be the "same joke" even if characters, locations, and other details are varied. A given joke could be retold w...
In the area of image retrieval, post-retrieval processing is often used to refine the retrieval results to better satisfy users’ requirements. Previous methods mainly focus on p...