Sciweavers

PVLDB
2008
Industry-scale duplicate detection
Duplicate detection is the process of identifying multiple representations of the same real-world object in a data source. Duplicate detection is a problem of critical importance in...
Melanie Weis, Felix Naumann, Ulrich Jehle, Jens Lu...
WebTables: exploring the power of tables on the web
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
SCOPE: easy and efficient parallel processing of massive data sets
Companies providing cloud-scale services have an increasing need to store and analyze massive data sets such as search logs and click streams. For cost and performance reasons, pr...
Ronnie Chaiken, Bob Jenkins, Per-Åke Larson,...
Making SENSE: socially enhanced search and exploration
Online communities like Flickr, del.icio.us and YouTube have established themselves as very popular and powerful services for publishing and searching content, but also for ident...
Tom Crecelius, Mouna Kacimi, Sebastian Michel, Tho...
Learning to create data-integrating queries
The number of potentially-related data resources available for querying -- databases, data warehouses, virtual integrated schemas -- continues to grow rapidly. Perhaps no area has s...
Partha Pratim Talukdar, Marie Jacob, Muhammad Salm...