HTTP provides a mechanism to connect web sites. Almost all sites have a large amount of hypertext content that provides connections to other sites in the World Wide Web. The succes...
The high availability of video streams makes mechanisms for indexing such content on the Web necessary. In this paper we focus on news programs and we propose a mechani...
We present a new algorithm for finding large, dense subgraphs in massive graphs. Our algorithm is based on a recursive application of fingerprinting via shingles, and is extreme...
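
As a rough illustration of fingerprinting via shingles in general, the sketch below builds a min-hash fingerprint from the shingles of a sequence and compares two fingerprints. It is not the algorithm of the abstract above, whose details are truncated here. The function names, the use of contiguous windows as shingles, the SHA-1 based hash, and the parameters `s` (shingle size) and `c` (number of hash functions) are all assumptions made for this example.

```python
import hashlib


def shingles(items, s):
    """Return the set of all contiguous s-element windows of a sequence."""
    return {tuple(items[i:i + s]) for i in range(len(items) - s + 1)}


def _hash(shingle, seed):
    # Hypothetical seeded hash: the first 8 bytes of SHA-1 over (seed, shingle).
    data = repr((seed, shingle)).encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def fingerprint(items, s=2, c=4):
    """Min-hash fingerprint: for each of c seeds, keep the minimum hash
    taken over all shingles. Returns an empty tuple if there are no shingles."""
    sh = shingles(items, s)
    if not sh:
        return ()
    return tuple(min(_hash(x, seed) for x in sh) for seed in range(c))


def estimated_similarity(fp_a, fp_b):
    """Fraction of fingerprint positions that agree, a rough estimate of
    the Jaccard similarity between the two underlying shingle sets."""
    if not fp_a or len(fp_a) != len(fp_b):
        return 0.0
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog".split()
    b = "the quick brown fox leaps over the lazy dog".split()
    print(estimated_similarity(fingerprint(a), fingerprint(b)))
```

Two inputs with the same shingle set always receive the same fingerprint, so grouping items by fingerprint is a cheap way to propose candidates that share much of their content. A larger `c` gives a finer similarity estimate at the cost of more hashing.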
In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
There have been many attempts to study the content of the web, either through human or automatic agents. Five different previously used web survey methodologies are described and ...