Sciweavers

146 search results - page 3 / 30
» RoadRunner: Towards Automatic Data Extraction from Large Web...
PVLDB
2010
14 years 8 months ago
Towards The Web of Concepts: Extracting Concepts from Large Datasets
Concepts are sequences of words that represent real or imaginary entities or ideas that users are interested in. As a first step towards building a web of concepts that will form...
Aditya G. Parameswaran, Hector Garcia-Molina, Anan...
HIS
2003
14 years 11 months ago
Data Mining of Web Access Logs From an Academic Web Site
We have used a general purpose data mining tool to determine whether we can find any ‘golden nuggets’ in the web access logs of a large academic web site. Our goal was to use...
Victor Ciesielski, A. Lalani
WWW
2005
ACM
15 years 10 months ago
Web data extraction based on partial tree alignment
This paper studies the problem of extracting data from a Web page that contains several structured data records. The objective is to segment these data records, extract data items...
Yanhong Zhai, Bing Liu
DEBU
2000
14 years 9 months ago
Accurately and Reliably Extracting Data from the Web: A Machine Learning Approach
A critical problem in developing information agents for the Web is accessing data that is formatted for human use. We have developed a set of tools for extracting data from web si...
Craig A. Knoblock, Kristina Lerman, Steven Minton,...
CIKM
2003
Springer
15 years 2 months ago
Extracting unstructured data from template generated web documents
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
Ling Ma, Nazli Goharian, Abdur Chowdhury, Misun Ch...