Sciweavers

609 search results - page 35 / 122
» Adaptive record extraction from web pages
WWW
2005
ACM
16 years 2 months ago
Thresher: automating the unwrapping of semantic content from the World Wide Web
We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
Andrew Hogue, David R. Karger
SIGMOD
2006
ACM
16 years 2 months ago
Documentum ECI self-repairing wrappers: performance analysis
Documentum Enterprise Content Integration (ECI) services is a content integration middleware that provides one-query access to the Intranet and Internet content resources. The ECI...
Boris Chidlovskii, Bruno Roustant, Marc Brette
DEBU
2000
15 years 1 month ago
Accurately and Reliably Extracting Data from the Web: A Machine Learning Approach
A critical problem in developing information agents for the Web is accessing data that is formatted for human use. We have developed a set of tools for extracting data from web si...
Craig A. Knoblock, Kristina Lerman, Steven Minton,...
PVLDB
2010
15 years 11 days ago
Towards The Web of Concepts: Extracting Concepts from Large Datasets
Concepts are sequences of words that represent real or imaginary entities or ideas that users are interested in. As a first step towards building a web of concepts that will form...
Aditya G. Parameswaran, Hector Garcia-Molina, Anan...
CICLING
2006
Springer
15 years 5 months ago
Extracting Key Phrases to Disambiguate Personal Names on the Web
When you search for information regarding a particular person on the web, a search engine returns many pages. Some of these pages may be for people with the same name. Ho...
Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuk...