This paper describes our participation in the 2008 TREC Blog track. Our system consists of three components: data preprocessing, topic retrieval, and opinion finding. In the topic ret...
Given only the URL of a web page, can we identify its topic? This is the question that we examine in this paper. Usually, web pages are classified using their content [7], but a U...
Search engine logs are an emerging type of data that offers interesting opportunities for data mining. Existing work on mining such data has mostly attempted to discover knowl...
Powerful SIMD instructions in modern processors offer an opportunity for greater search performance. In this paper, we apply these instructions to decoding search engine posting ...
Alexander A. Stepanov, Anil R. Gangolli, Daniel E....
Wikipedia is the largest monolithic repository of human knowledge. In addition to its sheer size, it represents a new encyclopedic paradigm by interconnecting articles through hyp...