SIGIR
2008
ACM

XML-aided phrase indexing for hypertext documents

We combine techniques of XML Mining and Text Mining for the benefit of Information Retrieval. By manipulating the word sequence according to the XML structure of the marked-up text, we strengthen phrase boundaries so that they are more obvious to the algorithms that extract multiword sequences from text. Consequently, the quality of the indexed phrases improves, which has a positive effect on the average precision measured by the INEX 2007 standards.
Categories and Subject Descriptors H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing -- Indexing methods
General Terms Algorithms
Keywords XML, Phrase, Word sequence, Text mining, XML Retrieval
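The abstract's central idea, using XML markup to make phrase boundaries explicit before extracting multiword sequences, could be sketched roughly as follows. This is only an illustrative interpretation, not the authors' algorithm: it assumes that every tag boundary acts as a phrase boundary and that candidate phrases are simple bigrams. The function names `segments` and `bigrams` are hypothetical.

```python
import xml.etree.ElementTree as ET


def segments(elem):
    """Yield lowercase word lists, one per text run delimited by XML tags.

    Assumption: every tag boundary is treated as a phrase boundary.
    """
    if elem.text and elem.text.split():
        yield elem.text.lower().split()
    for child in elem:
        # Text inside the child element forms its own segment(s).
        yield from segments(child)
        # Text following the child's closing tag starts a new segment.
        if child.tail and child.tail.split():
            yield child.tail.lower().split()


def bigrams(xml_string):
    """Extract word bigrams that never span an XML tag boundary."""
    root = ET.fromstring(xml_string)
    pairs = []
    for words in segments(root):
        pairs.extend(zip(words, words[1:]))
    return pairs


doc = "<p>We study <b>information retrieval</b> systems today</p>"
for pair in bigrams(doc):
    print(pair)
```

In this sketch the pair ("study", "information") is never produced, because the opening tag of the inline element separates the two words, while ("information", "retrieval") is kept as a candidate phrase.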
Miro Lehtonen, Antoine Doucet
Added 15 Dec 2010
Updated 15 Dec 2010
Type Journal
Year 2008
Where SIGIR