Sciweavers

AH
2004
Springer
15 years 9 months ago
Machine Learning Methods for One-Session Ahead Prediction of Accesses to Page Categories
This paper presents a comparison among several well-known machine learning techniques when they are used to carry out a one-session ahead prediction of page categories. We use reco...
José David Martín-Guerrero, Emili Ba...
CIKM
2008
ACM
15 years 6 months ago
CoreEx: content extraction from online news articles
We developed and tested a heuristic technique for extracting the main article from news site Web pages. We construct the DOM tree of the page and score every node based on the amo...
Jyotika Prasad, Andreas Paepcke
TREC
2003
15 years 5 months ago
Combining Structural Information and the Use of Priors in Mixed Named-Page and Homepage Finding
This paper presents Carnegie Mellon University’s experiments on the mixed named-page and homepage finding task of the TREC 12 Web Track. Our results were strong; we achieved the...
Paul Ogilvie, Jamie Callan
WWW
2007
ACM
16 years 5 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
COLCOM
2008
IEEE
15 years 6 months ago
Web Canary: A Virtualized Web Browser to Support Large-Scale Silent Collaboration in Detecting Malicious Web Sites
Malicious Web content poses a serious threat to the Internet, organizations and users. Current approaches to detecting malicious Web content employ high-powered honey cli...
Jiang Wang, Anup K. Ghosh, Yih Huang