Dividing web pages into fragments has been shown to provide significant benefits for both content generation and caching. In order for a web site to use fragment-based content gen...
Lakshmish Ramaswamy, Arun Iyengar, Ling Liu, Fred ...
Many visual similarity-based phishing page detectors have been developed to detect phishing webpages, however, scammers now create polymorphic phishing pages to breach the defense ...
In this paper we present the Infocious Web search engine [23]. Our goal in creating Infocious is to improve the way people find information on the Web by resolving ambiguities pre...
The term web genre denotes the type of a given web resource, in contrast to the topic of its content. In this research, we focus on recognizing the web genres blog, wiki and forum...
In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz