Search Sciweavers | Sciweavers

385 search results - page 61 / 77

Results for: "A language for manipulating clustered web documents"

NSDI
2010

Computer Networks (194 views)

The Architecture and Implementation of an Extensible Web Crawler

15 years 3 months ago

Download www.usenix.org

Many Web services operate their own Web crawlers to discover data of interest, despite the fact that large-scale, timely crawling is complex, operationally intensive, and expensive...

Jonathan M. Hsieh, Steven D. Gribble, Henry M. Lev...

WSDM
2010
ACM

Data Mining (204 views)

Learning URL patterns for webpage de-duplication

15 years 8 months ago

Download www.wsdm-conference.org

Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...

Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...

IDEAS
2003
IEEE

Database (96 views)

Evaluating Nested Queries on XML Data

15 years 7 months ago

Download www.di.unipi.it

In the past few years, much attention has been paid to the study of semistructured data, i.e., data with irregular, possibly unstable, and rapidly changing structure, and, in part...

Carlo Sartiani

WWW
2008
ACM

Internet Technology (202 views)

Using subspace analysis for event detection from web click-through data

16 years 2 months ago

Download www2008.org

Although most existing research detects events by analyzing the content or structural information of Web documents, a recent direction is to study the usage data. In th...

Ling Chen 0002, Yiqun Hu, Wolfgang Nejdl

posted by jekky

CIKM
2008
Springer

Information Technology (125 views)

Learning to link with Wikipedia

15 years 4 months ago

Download www.cs.waikato.ac.nz

This paper describes how to automatically cross-reference documents with Wikipedia: the largest knowledge base ever known. It explains how machine learning can be used to identify...

David N. Milne, Ian H. Witten
