Web archives preserve the history of Web sites and have high long-term value for media and business analysts. Such archives are maintained by periodically re-crawling entire Web s...
Marc Spaniol, Dimitar Denev, Arturas Mazeika, Gerh...
The most commonly used request processing model in multithreaded web servers is thread-per-request, in which an individual thread is bound to serve each web request. However, with...
Contextual advertising on web pages has recently become very popular, and it poses its own set of unique text mining challenges. Often advertisers wish to either target (or avoid) ...
Yi Zhang, Arun C. Surendran, John C. Platt, Mukund...
With the rapid and dramatic increase in web feeds published by different publishers, providers or websites via Really Simple Syndication (RSS) and Atom, users cannot be expected t...
In this paper, we introduce the concept of a QA-Pagelet to refer to the content region in a dynamic page that contains query matches. We present THOR, a scalable and efficient min...