Authority flow is an effective ranking mechanism for answering queries on a broad class of data. Systems have been developed to apply this principle on the Web (PageRank and topic ...
In data integration applications, a join matches elements that are common to two data sources. Often, however, elements are represented slightly differently in each source, so an app...
Frequent structure mining (FSM) aims to discover and extract patterns frequently occurring in structural data, such as trees and graphs. FSM finds many applications in bioinformat...
Combating Web spam has become one of the top challenges for Web search engines. State-of-the-art spam detection techniques are usually designed for specific known types of Web spa...
Near-duplicate web documents are abundant. Two such documents differ only in a very small portion, for example one that displays advertisements. Such differences are irrele...