Sciweavers

Search results for: Robust web content extraction

SIGMOD 2006, ACM
Documentum ECI self-repairing wrappers: performance analysis
Documentum Enterprise Content Integration (ECI) services is a content integration middleware that provides one-query access to the Intranet and Internet content resources. The ECI...
Boris Chidlovskii, Bruno Roustant, Marc Brette

SPIESR 2001
Video summarization and semantics editing tools
This paper describes a video summarization and semantics editing tool that is suited for content-based video indexing and retrieval with appropriate human operator assistance. The...
Li-Qun Xu, Jian Zhu, Fred Stentiford

WWW 2009, ACM
Sitemaps: above and beyond the crawl of duty
Comprehensive coverage of the public web is crucial to web search engines. Search engines use crawlers to retrieve pages and then discover new ones by extracting the pages' o...
Uri Schonfeld, Narayanan Shivakumar

DEBU 2000
Learning to Understand the Web
In a traditional information retrieval system, it is assumed that queries can be posed about any topic. In reality, a large fraction of web queries are posed about a relatively sm...
William W. Cohen, Andrew McCallum, Dallan Quass

CAISE 2010, Springer
Probabilistic Models to Reconcile Complex Data from Inaccurate Data Sources
There is a large amount of data that is published on the Web and several techniques have been developed to extract and integrate data from Web sources. However, Web data are inhere...
Lorenzo Blanco, Valter Crescenzi, Paolo Merialdo, ...