TREC
2001

Retrieving Web Pages Using Content, Links, URLs and Anchors

Download trec.nist.gov

For this year's web track, we concentrated on the entry page finding task. For the content-only runs, in both the ad hoc task and the entry page finding task, we used an information retrieval system based on a simple unigram language model. In the ad hoc task, we experimented with alternative approaches to smoothing. For the entry page task, we incorporated additional information into the model: besides the document's content, we used links, URLs and anchors. We found that almost every approach can improve on the results of a content-only run. In the end, a very basic approach, using the depth of the URL path as a prior, yielded by far the largest improvement over the content-only results.
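To make the idea of a URL-depth prior concrete, here is a minimal Python sketch that combines a smoothed unigram language model score with a prior based on the depth of the URL path. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which smoothing methods were tried or how the prior was estimated, so the linear interpolation smoothing, the geometric `decay` prior, the parameter values, and all function names (`url_depth`, `depth_prior`, `score`) are hypothetical.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def url_depth(url):
    """Count the non-empty path segments of a URL (a site root has depth 0)."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


def depth_prior(url, decay=0.5):
    """Hypothetical prior: shrinks geometrically with each extra path level.

    The source only says the URL path depth was used as a prior; this
    particular functional form and the decay value are assumptions.
    """
    return decay ** url_depth(url)


def score(query, doc_tokens, collection_counts, collection_len, url, lam=0.5):
    """Log of prior times query likelihood under a smoothed unigram model.

    Smoothing here is linear interpolation between document and collection
    term frequencies (an assumed choice, with an assumed weight `lam`).
    """
    term_freq = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    log_score = math.log(depth_prior(url))
    for term in query:
        p_doc = term_freq[term] / doc_len if doc_len else 0.0
        p_col = collection_counts[term] / collection_len
        p = lam * p_doc + (1.0 - lam) * p_col
        if p == 0.0:
            # Term never occurs in the collection: the query cannot be generated.
            return float("-inf")
        log_score += math.log(p)
    return log_score


# Toy collection: two pages with identical content at different URL depths.
docs = {
    "http://example.org/": ["example", "home", "page"],
    "http://example.org/docs/old/index.html": ["example", "home", "page"],
}
collection_counts = Counter(t for tokens in docs.values() for t in tokens)
collection_len = sum(collection_counts.values())

query = ["example", "home"]
ranked = sorted(
    docs,
    key=lambda u: score(query, docs[u], collection_counts, collection_len, u),
    reverse=True,
)
print(ranked)
```

Because the two toy pages have identical content, the content-only score cannot separate them, and the prior alone decides the ranking in favour of the shallower URL. This mirrors the intuition behind entry page finding, where site entry pages tend to sit near the root of the path.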

Thijs Westerveld, Wessel Kraaij, Djoerd Hiemstra

Ad Hoc Task | Entry Page | Page Finding Task | TREC 2001 | TREC 2008

Related Content

» LinkContexts for Ranking

» Linking wikipedia to the web

» An Intelligent Web Agent to Mine Bilingual Parallel Pages via Automatic Discovery of URL P...

» Using anchor texts with their hyperlink structure for web search

» Retrieving broken web links using an approach based on contextual information

» Analyzing Information Retrieval Methods to Recover Broken Web Links

» Combining anchor text categorization and graph analysis for paid link detection

» Using urls and table layout for web classification tasks

» Fast webpage classification using URL features

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2001
Where	TREC
Authors	Thijs Westerveld, Wessel Kraaij, Djoerd Hiemstra
