Sciweavers

433 search results - page 14 / 87
» Web page title extraction and its application
HUMAN
2005
Springer
15 years 2 months ago
How to Evaluate the Effectiveness of URL Normalizations
Syntactically different URLs could represent the same web page on the World Wide Web, and duplicate representation for web pages causes web applications to handle a large amount of...
Sang Ho Lee, Sung Jin Kim, Hyo Sook Jeong
DEXA
2006
Springer
197 views, Database
14 years 11 months ago
Cleaning Web Pages for Effective Web Content Mining
Classifying and mining noise-free web pages will improve on accuracy of search results as well as search speed, and may benefit webpage organization applications (e.g., keyword-bas...
Jing Li, Christie I. Ezeife
ICDE
2010
IEEE
255 views, Database
15 years 4 months ago
On supporting effective web extraction
Commercial tuple extraction systems have enjoyed some success to extract tuples by regarding HTML pages as tree structures and exploiting XPath queries to find attributes of t...
Wook-Shin Han, Wooseong Kwak, Hwanjo Yu
KDD
2002
ACM
170 views, Data Mining
15 years 9 months ago
Web site mining: a new way to spot competitors, customers and suppliers in the world wide web
When automatically extracting information from the world wide web, most established methods focus on spotting single HTML documents. However, the problem of spotting complete web s...
Martin Ester, Hans-Peter Kriegel, Matthias Schuber...
WWW
2005
ACM
15 years 10 months ago
Thresher: automating the unwrapping of semantic content from the World Wide Web
We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
Andrew Hogue, David R. Karger