Parsimonious Language Models for a Terabyte of Text

Download trec.nist.gov

The aims of this paper are twofold. Our first aim is to compare results of the earlier Terabyte tracks to the Million Query track. We submitted a number of runs using different document representations (such as full-text, title-fields, or incoming anchor-texts) to increase pool diversity. The initial results show broad agreement in system rankings over various measures on topic sets judged at both the Terabyte and Million Query tracks, with runs using the full-text index giving superior results on all measures, but also some noteworthy upsets. Our second aim is to explore the use of parsimonious language models for retrieval on terabyte-scale collections. These models are smaller and thus more efficient than the standard language models when used at indexing time, and they may also improve retrieval performance. We have conducted initial experiments using parsimonious models in combination with pseudo-relevance feedback, for both the Terabyte and Million Query track topic sets, and obtaine...

Djoerd Hiemstra, Rongmei Li, Jaap Kamps, Rianne Kaptein

Earlier Terabyte Tracks | Query Track | Terabyte | TREC 2007 | TREC 2008

Post Info

Added	07 Nov 2010
Updated	07 Nov 2010
Type	Conference
Year	2007
Where	TREC
Authors	Djoerd Hiemstra, Rongmei Li, Jaap Kamps, Rianne Kaptein
