INEX
2004
Springer

The Utrecht Blend: Basic Ingredients for an XML Retrieval System

Exploiting the structure of a document allows for more powerful information retrieval techniques. This article discusses a basic approach to the retrieval of XML document fragments. Starting from a vector-space model for text retrieval, we investigate various strategies that influence the retrieval performance of an XML-based IR system. The first extension of the system uses a schema-based approach, which takes into account that authors tag their text to emphasise particular pieces of content that are of extra importance. Based on the schema used by the document collection, the system can easily derive the children of mixed-content nodes and judge those child nodes to be more important than other nodes. A second approach discussed here is based on a horizontal fragmentation of the inverse document frequencies used by the vector-space model. The underlying assumption is that the spread of terms is related to the semantic structure of the document. However, we obser...
Roelof van Zwol, Frans Wiering, Virginia Dignum
Added 02 Jul 2010
Updated 02 Jul 2010
Type Conference
Year 2004
Where INEX