Search Sciweavers | Sciweavers

433 search results - page 14 / 87

» Web page title extraction and its application


HUMAN
2005
Springer

144 views, Social Sciences

How to Evaluate the Effectiveness of URL Normalizations

15 years 5 months ago

Download dblab.ssu.ac.kr

Syntactically different URLs could represent the same web page on the World Wide Web, and duplicate representation for web pages causes web applications to handle a large amount of...

Sang Ho Lee, Sung Jin Kim, Hyo Sook Jeong


DEXA
2006
Springer

197 views, Database

Cleaning Web Pages for Effective Web Content Mining

15 years 1 months ago

Download sol.cs.uwindsor.ca

Classifying and mining noise-free web pages will improve on accuracy of search results as well as search speed, and may benefit webpage organization applications (e.g., keyword-bas...

Jing Li, Christie I. Ezeife


ICDE
2010
IEEE

255 views, Database

On supporting effective web extraction

15 years 6 months ago

Download rosaec.snu.ac.kr

Commercial tuple extraction systems have enjoyed some success to extract tuples by regarding HTML pages as tree structures and exploiting XPath queries to find attributes of t...

Wook-Shin Han, Wooseong Kwak, Hwanjo Yu


KDD
2002
ACM

170 views, Data Mining

Web site mining: a new way to spot competitors, customers and suppliers in the world wide web

16 years 1 days ago

Download www.cs.sfu.ca

When automatically extracting information from the world wide web, most established methods focus on spotting single HTML documents. However, the problem of spotting complete web s...

Martin Ester, Hans-Peter Kriegel, Matthias Schuber...


WWW
2005
ACM

154 views, Internet Technology

Thresher: automating the unwrapping of semantic content from the World Wide Web

16 years 9 days ago

Download www2005.org

We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...

Andrew Hogue, David R. Karger
