Semistructured data, in particular XML, has emerged as one of the primary means for information exchange and content management. The power of XML allows authors to structure a doc...
Whenever transformation of data is used to bridge the gap between different data formats, and a query is given in the destination format, query reformulation can speed up the transfor...
Many different ranking algorithms based on content and context have been used in web search engines to find pages based on a user query. Furthermore, to achieve better performance ...
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
In this report we describe the approach of the University of Twente to the 2006 GeoCLEF task. It is based on retrieval by content followed by filtering by geographical rele...