Sciweavers

563 search results - page 16 / 113
» Crawling the web for structured documents
DEXA
2010
Springer
15 years 12 days ago
Vi-DIFF: Understanding Web Pages Changes
Nowadays, many applications are interested in detecting and discovering changes on the web to help users to understand page updates and more generally, the web dynamics. Web archiv...
Zeynep Pehlivan, Myriam Ben Saad, Stéphane ...
CLEF
2010
Springer
15 years 3 months ago
MapReduce for Information Retrieval Evaluation: "Let's Quickly Test This on 12 TB of Data"
We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use ...
Djoerd Hiemstra, Claudia Hauff
CORR
2010
Springer
15 years 1 months ago
MIREX: MapReduce Information Retrieval Experiments
We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use...
Djoerd Hiemstra, Claudia Hauff
WWW
2005
ACM
16 years 2 months ago
Predictive ranking: a novel page ranking approach by estimating the web structure
PageRank (PR) is one of the most popular ways to rank web pages. However, as the Web continues to grow in volume, it is becoming more and more difficult to crawl all the available...
Haixuan Yang, Irwin King, Michael R. Lyu
NDSS
2009
IEEE
15 years 8 months ago
Document Structure Integrity: A Robust Basis for Cross-site Scripting Defense
Cross-site scripting (or XSS) has been the most dominant class of web vulnerabilities in 2007. The main underlying reason for XSS vulnerabilities is that web markup and client-sid...
Yacin Nadji, Prateek Saxena, Dawn Song