Sciweavers

311 search results - page 14 / 63
» Cleaning Web Pages for Effective Web Content Mining
WWW
2007
ACM
15 years 10 months ago
Homepage live: automatic block tracing for web personalization
The emergence of personalized homepage services, e.g. personalized Google Homepage and Microsoft Windows Live, has enabled Web users to select Web contents of interest and to aggr...
Jie Han, Dingyi Han, Chenxi Lin, Hua-Jun Zeng, Zhe...
SIGIR
2004
ACM
15 years 2 months ago
Web-a-where: geotagging web content
We describe Web-a-Where, a system for associating geography with Web pages. Web-a-Where locates mentions of places and determines the place each name refers to. In addition, it as...
Einat Amitay, Nadav Har'El, Ron Sivan, Aya Soffer
WIRI
2005
IEEE
15 years 3 months ago
Extended Link Analysis for Extracting Spatial Information Hubs
Recently, web mining that tries to find useful knowledge from the vast amount of web pages has attracted a lot of research interest. Besides, it is becoming an essential task to...
Jianwei Zhang 0002, Yoshiharu Ishikawa, Hiroyuki K...
WWW
2003
ACM
15 years 10 months ago
Detecting Near-replicas on the Web by Content and Hyperlink Analysis
The presence of replicas or near-replicas of documents is very common on the Web. Documents may be replicated completely or partially for different reasons (versions, mirrors, etc...
Ernesto Di Iorio, Michelangelo Diligenti, Marco Go...
ICDE
2004
IEEE
15 years 10 months ago
Probe, Cluster, and Discover: Focused Extraction of QA-Pagelets from the Deep Web
In this paper, we introduce the concept of a QA-Pagelet to refer to the content region in a dynamic page that contains query matches. We present THOR, a scalable and efficient min...
James Caverlee, Ling Liu, David Buttler