When automatically extracting information from the world wide web, most established methods focus on spotting single HTMLdocuments. However, the problem of spotting complete web s...
Martin Ester, Hans-Peter Kriegel, Matthias Schuber...
This paper considers the framework of the so-called "market basket problem", in which a database of transactions is mined for the occurrence of unusually frequent item s...
Displaying scanned book pages in a web browser is difficult, due to an array of characteristics of the common user's configuration that compound to yield text that is degrade...
Alexander J. Quinn, Chang Hu, Takeshi Arisaka, Ann...
A graph is chordal if it does not contain any induced cycle of size greater than three. An alternative characterization of chordal graphs is via a perfect elimination ordering, whi...
Entity Resolution (ER) is the problem of identifying which records in a database refer to the same real-world entity. An exhaustive ER process involves computing the similarities b...
Steven Euijong Whang, David Menestrina, Georgia Ko...