Sciweavers

385 search results - page 61 / 77
» A language for manipulating clustered web documents results
NSDI
2010
14 years 11 months ago
The Architecture and Implementation of an Extensible Web Crawler
Many Web services operate their own Web crawlers to discover data of interest, despite the fact that large-scale, timely crawling is complex, operationally intensive, and expensive...
Jonathan M. Hsieh, Steven D. Gribble, Henry M. Lev...
WSDM
2010
ACM
15 years 4 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
IDEAS
2003
IEEE
15 years 3 months ago
Evaluating Nested Queries on XML Data
In the past few years, much attention has been paid to the study of semistructured data, i.e., data with irregular, possibly unstable, and rapidly changing structure, and, in part...
Carlo Sartiani
WWW
2008
ACM
15 years 10 months ago
Using subspace analysis for event detection from web click-through data
Although most existing research detects events by analyzing the content or structural information of Web documents, a recent direction is to study the usage data. In th...
Ling Chen 0002, Yiqun Hu, Wolfgang Nejdl
CIKM
2008
ACM
14 years 11 months ago
Learning to link with Wikipedia
This paper describes how to automatically cross-reference documents with Wikipedia: the largest knowledge base ever known. It explains how machine learning can be used to identify...
David N. Milne, Ian H. Witten