Web-page classification is much more difficult than pure-text classification due to a large variety of noisy information embedded in Web pages. In this paper, we propose a new Web...
People regularly interact with different representations of Web pages. A person looking for new information may initially find a Web page represented as a short snippet rendered b...
Jaime Teevan, Edward Cutrell, Danyel Fisher, Steve...
Unlike conventional data or text, Web pages typically contain a large amount of information that is not part of the main contents of the pages, e.g., banner ads, navigation bars, ...
Automatic classification of web pages is an effective way to deal with the difficulty of retrieving information from the Internet. Although there are many automatic classification...
In this paper, we continue our investigations of "web spam": the injection of artificially-created pages into the web in order to influence the results from search engin...
Alexandros Ntoulas, Marc Najork, Mark Manasse, Den...