Sciweavers

498 search results - page 47 / 100
» Robust web content extraction
SIGMOD
2006
ACM
16 years 18 days ago
Documentum ECI self-repairing wrappers: performance analysis
Documentum Enterprise Content Integration (ECI) services is a content integration middleware that provides one-query access to the Intranet and Internet content resources. The ECI...
Boris Chidlovskii, Bruno Roustant, Marc Brette
SPIESR
2001
15 years 1 months ago
Video summarization and semantics editing tools
This paper describes a video summarization and semantics editing tool that is suited for content-based video indexing and retrieval with appropriate human operator assistance. The...
Li-Qun Xu, Jian Zhu, Fred Stentiford
WWW
2009
ACM
16 years 1 months ago
Sitemaps: above and beyond the crawl of duty
Comprehensive coverage of the public web is crucial to web search engines. Search engines use crawlers to retrieve pages and then discover new ones by extracting the pages' o...
Uri Schonfeld, Narayanan Shivakumar
DEBU
2000
15 years 8 days ago
Learning to Understand the Web
In a traditional information retrieval system, it is assumed that queries can be posed about any topic. In reality, a large fraction of web queries are posed about a relatively sm...
William W. Cohen, Andrew McCallum, Dallan Quass
CAISE
2010
Springer
15 years 1 months ago
Probabilistic Models to Reconcile Complex Data from Inaccurate Data Sources
There is a large amount of data that is published on the Web and several techniques have been developed to extract and integrate data from Web sources. However, Web data are inhere...
Lorenzo Blanco, Valter Crescenzi, Paolo Merialdo, ...