Prior Art Retrieval Using the Claims Section as a Bag of Words

We describe our participation in the 2009 CLEF-IP task, which targeted prior-art search for topic patent documents. Our system retrieved patent documents using a standard bag-of-words approach for both the Main Task and the English Task. In both runs, we extracted the claims sections from all English patents in the corpus and saved them in the Lemur index format with the patent IDs as DOCIDs. These claims were then indexed using Lemur's BuildIndex function. In the topic documents we also focussed exclusively on the claims sections. These were extracted and converted to queries by removing stopwords and punctuation. We did not perform any term selection. We retrieved 100 patents per topic using Lemur's RetEval function with the TF-IDF retrieval model. Compared to the other runs submitted for the track, we obtained good results in terms of nDCG (0.46) and moderate results in terms of MAP (0.054). Categories and Subject Descriptors H.3 [Information Storage and Retrieval]: H.3.1...
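
The pipeline described in the abstract (claims text only, stopwords and punctuation removed, no term selection, top 100 documents per topic by TF-IDF score) can be illustrated with a minimal sketch. This is not the authors' code and does not call Lemur: the stopword list, the function names, the example patent IDs and the exact weighting formula are assumptions made for illustration, and Lemur's own TF-IDF model may weight term frequencies differently.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption: the abstract does not say
# which stopword list was used).
STOPWORDS = {"a", "an", "the", "of", "and", "or", "in", "to", "is",
             "for", "with", "said", "wherein"}


def tokenize(text):
    """Lowercase the text, drop punctuation, and remove stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_index(claims_by_id):
    """Return per-document term counts and corpus document frequencies."""
    term_counts = {doc_id: Counter(tokenize(text))
                   for doc_id, text in claims_by_id.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    return term_counts, doc_freq


def retrieve(query_text, term_counts, doc_freq, k=100):
    """Rank documents by a simple TF-IDF dot product with the query.

    Every query term is used (no term selection). The weighting
    tf * log(1 + N / df) is an illustrative choice, not Lemur's formula.
    """
    n_docs = len(term_counts)
    query_counts = Counter(tokenize(query_text))
    scores = {}
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term, q_tf in query_counts.items():
            if term in counts:
                idf = math.log(1 + n_docs / doc_freq[term])
                score += (q_tf * idf) * (counts[term] * idf)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document ID for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical claims text keyed by made-up patent IDs.
corpus = {
    "EP-0000001": "A pump comprising a rotor and a housing.",
    "EP-0000002": "A method of brewing coffee with a filter.",
    "EP-0000003": "A rotor assembly for a pump, wherein the rotor is sealed.",
}
term_counts, doc_freq = build_index(corpus)
topic_claims = "A pump wherein the rotor is mounted in a housing."
results = retrieve(topic_claims, term_counts, doc_freq, k=100)
```

In this toy corpus the topic shares three terms with the first document and two with the third, so those two are returned in that order and the unrelated second document is not returned at all. Because the whole claims section is used as the query, query length grows with the topic patent, which is the trade-off of skipping term selection.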
Suzan Verberne, Eva D'hondt
Added 08 Nov 2010
Updated 08 Nov 2010
Type Conference
Year 2009
Where CLEF