The PageRank algorithm, used in the Google search engine, greatly improves the results of Web search by taking into account the link structure of the Web. PageRank assigns to a pa...
Web content mining opens up the possibility to use data presented in web pages for the discovery of interesting and useful patterns. Our web mining tool, FBL (Filtered Bayesian Le...
In this paper, we continue our investigations of "web spam": the injection of artificially-created pages into the web in order to influence the results from search engin...
Alexandros Ntoulas, Marc Najork, Mark Manasse, Den...
Given a user keyword query, current Web search engines return a list of individual Web pages ranked by their "goodness" with respect to the query. Thus, the basic unit fo...
Ramakrishna Varadarajan, Vagelis Hristidis, Tao Li
The GDA (Global Document Annotation) project proposes a tag set which allows machines to automatically infer the underlying semantic/pragmatic structure of documents. Its objectiv...