In this paper, we study search bot traffic from search engine query logs at a large scale. Although bots that generate search traffic aggressively can be easily detected, a large ...
There is an exploding amount of user-generated content on the Web due to the emergence of "Web 2.0" services, such as Blogger, MySpace, Flickr, and del.icio.us. The part...
Ka Cheung Sia, Junghoo Cho, Yun Chi, Belle L. Tsen...
Although tagging has become increasingly popular in online image and video sharing systems, tags are known to be noisy, ambiguous, incomplete and subjective. These factors can ser...
We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table, a data structure that better lends itself to high-level...
Modeling query concepts through term dependencies has been shown to have a significant positive effect on retrieval performance, especially for tasks such as web search, where rel...