: We describe our participation in the TREC 2004 Web and Terabyte tracks. For the web track, we employ mixture language models based on document full-text, incoming anchortext, and...
XML has become a popular method of data representation both on the web and in databases in recent years. One of the reasons for the popularity of XML has been its ability to encod...
Charu C. Aggarwal, Na Ta, Jianyong Wang, Jianhua F...
Modern web search engines are expected to return top-k results efficiently given a query. Although many dynamic index pruning strategies have been proposed for efficient top-k com...
We visualize the structure of sections of the World Wide Web by constructing graphical representations in 3D hyperbolic space. The felicitous property that hyperbolic space has â€...
This paper presents a potential seed selection algorithm for web crawlers using a gain - share scoring approach. Initially we consider a set of arbitrarily chosen tourism queries. ...