AH 2004, Springer

Machine Learning Methods for One-Session Ahead Prediction of Accesses to Page Categories

15 years 9 months ago

This paper presents a comparison among several well-known machine learning techniques when they are used to carry out a one-session ahead prediction of page categories. We use reco...

José David Martín-Guerrero, Emili Ba...

CIKM 2008, Springer

CoreEx: Content Extraction from Online News Articles

15 years 6 months ago

Download ilpubs.stanford.edu

We developed and tested a heuristic technique for extracting the main article from news site Web pages. We construct the DOM tree of the page and score every node based on the amo...

Jyotika Prasad, Andreas Paepcke

TREC 2003

Combining Structural Information and the Use of Priors in Mixed Named-Page and Homepage Finding

15 years 5 months ago

Download www.cs.cmu.edu

This paper presents Carnegie Mellon University’s experiments on the mixed named-page and homepage finding task of the TREC 12 Web Track. Our results were strong; we achieved the...

Paul Ogilvie, Jamie Callan

WWW 2007, ACM

Detecting near-duplicates for web crawling

16 years 5 months ago

Download infolab.stanford.edu

Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...

Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma

COLCOM 2008, IEEE

Web Canary: A Virtualized Web Browser to Support Large-Scale Silent Collaboration in Detecting Malicious Web Sites

15 years 6 months ago

Download mason.gmu.edu

Malicious Web content poses a serious threat to the Internet, organizations and users. Current approaches to detecting malicious Web content employ high-powered honey cli...

Jiang Wang, Anup K. Ghosh, Yih Huang
