Length normalization in XML retrieval

XML retrieval departs from standard document retrieval in that each individual XML element, ranging from italicized words or phrases to full-blown articles, is a potentially retrievable unit. The distribution of XML element lengths is unlike what we usually observe in standard document collections, prompting us to revisit the issue of document length normalization. We perform a comparative analysis of arbitrary elements versus relevant elements, and show the importance of length as a parameter for XML retrieval. Within the language modeling framework, we investigate a range of techniques that deal with length either directly or indirectly. We observe a length bias introduced by the amount of smoothing, and show the importance of extreme length priors for XML retrieval. We also show that simply removing shorter elements from the index (by introducing a cut-off value) does not create an appropriate document length normalization. Even after increasing the minimal size of XML ele...
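The three length mechanisms named in the abstract (smoothing, a length prior, and an index cut-off) can be illustrated with a minimal sketch. It assumes linear-interpolation (Jelinek-Mercer) smoothing and a prior proportional to element length raised to a power `beta`; the listing does not state the exact formulas the authors used, so the function names, the parameters `lam`, `beta`, and `min_length`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def score(query, element, collection_tf, collection_len, lam=0.15, beta=1.0):
    """Log query likelihood of one element plus a log length prior.

    Assumed model (not taken from the listing):
      log P(e) + sum over query terms of log(lam * P(t|e) + (1 - lam) * P(t|C))
    with P(e) proportional to len(e) ** beta.
    """
    length = len(element)
    if length == 0:
        return float("-inf")
    tf = Counter(element)
    total = beta * math.log(length)  # length prior in log space
    for term in query:
        p_elem = tf[term] / length
        p_coll = collection_tf.get(term, 0) / collection_len
        p = lam * p_elem + (1.0 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        total += math.log(p)
    return total


def rank(query, elements, lam=0.15, beta=1.0, min_length=1):
    """Rank element names by score; elements shorter than min_length are skipped."""
    collection_tf = Counter(t for tokens in elements.values() for t in tokens)
    collection_len = sum(collection_tf.values())
    scored = [
        (score(query, tokens, collection_tf, collection_len, lam, beta), name)
        for name, tokens in elements.items()
        if len(tokens) >= min_length  # hypothetical index cut-off
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Toy collection: a one-word italic element and a six-word section.
elements = {
    "it": ["xml"],
    "sec": ["xml", "retrieval", "uses", "element", "length", "priors"],
}
print(rank(["xml"], elements, beta=0.0))  # no prior: the short element wins
print(rank(["xml"], elements, beta=2.0))  # strong prior: the longer element wins
```

In this toy setting the one-word element outranks the section when no prior is applied, and a sufficiently large `beta` reverses the order. Raising `min_length` only drops the short element from the ranking; it leaves the scores of the remaining elements unchanged.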

Jaap Kamps, Maarten de Rijke, Börkur Sigurbjörnsson

Document Length Normalization | Length Normalization | SIGIR 2004 | XML Retrieval

» Examining topic shifts in content-oriented XML retrieval

» Hierarchical Indexing and Flexible Element Retrieval for Structured Document

» Automatic Face Recognition for Film Character Retrieval in Feature-Length Films

» Lower-bounding term frequency normalization

» Using Topic Shifts for Focussed Access to XML Repositories

» A signal-to-noise approach to score normalization

» Analysis of Combining Multiple Query Representations with Varying Lengths in a Single Engi...

» Structured Document Retrieval, Multimedia Retrieval, and Entity Ranking Using PF/Tijah

Post Info

Added	30 Jun 2010
Updated	30 Jun 2010
Type	Conference
Year	2004
Where	SIGIR
Authors	Jaap Kamps, Maarten de Rijke, Börkur Sigurbjörnsson
