Sciweavers

» Landmark Extraction: A Web Mining Approach
JCDL
2004
ACM
15 years 2 months ago
Finding authoritative people from the web
Today’s web is so huge and diverse that it arguably reflects the real world. For this reason, searching the web is a promising approach to find things in the real world. This ...
Masanori Harada, Shin-ya Sato, Kazuhiro Kazama
WSDM
2010
ACM
15 years 4 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
SAC
2005
ACM
15 years 3 months ago
Automatic wrapper maintenance for semi-structured web sources using results from previous queries
During the last years, significant attention has been paid to the problem of building wrappers for extracting data from semistructured web sources. Nevertheless, since web sources...
Juan Raposo, Alberto Pan, Manuel Álvarez, ...
EMNLP
2007
14 years 11 months ago
Learning to Find English to Chinese Transliterations on the Web
We present a method for learning to find English to Chinese transliterations on the Web. In our approach, proper nouns are expanded into new queries aimed at maximizing the probab...
Jian-Cheng Wu, Jason S. Chang
WWW
2005
ACM
15 years 10 months ago
The volume and evolution of web page templates
Web pages contain a combination of unique content and template material, which is present across multiple pages and used primarily for formatting, navigation, and branding. We stu...
David Gibson, Kunal Punera, Andrew Tomkins