TREC
2000

Information Space Based on HTML Structure

The main goal of the Information Space system for TREC-9 was early precision. To that end, the system emphasized seeking matches only in the TITLE, H1, H2 and H3 tags of the Web (WT10g) and large Web (WT100g) document collections. Documents were ranked using a combination of Boolean union sets, term weights, and principal components analysis (PCA). Very large sparse co-occurrence matrices were created for term weighting and PCA. The Information Space system is part of a larger general-purpose software package called IRTools.
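As a rough illustration of the first steps described above, the following Python sketch extracts terms from only the TITLE, H1, H2 and H3 tags, counts term co-occurrences within each document, and selects documents by Boolean union. It is not taken from IRTools: the names `HeadingTextParser`, `heading_terms`, `cooccurrence` and `boolean_union` are hypothetical, the tokenization rule is an assumption, and term weighting and PCA are left out.

```python
from collections import Counter
from html.parser import HTMLParser
from itertools import combinations
import re

# Tags whose text is indexed; everything else in the page is ignored.
FIELDS = {"title", "h1", "h2", "h3"}


class HeadingTextParser(HTMLParser):
    """Collects lower-cased terms found inside TITLE, H1, H2 and H3."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # > 0 while inside one of the indexed tags
        self.terms = []

    def handle_starttag(self, tag, attrs):
        if tag in FIELDS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in FIELDS and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            # Assumed tokenization: runs of ASCII letters and digits.
            self.terms.extend(re.findall(r"[a-z0-9]+", data.lower()))


def heading_terms(html):
    """Return the list of terms from the indexed tags of one document."""
    parser = HeadingTextParser()
    parser.feed(html)
    parser.close()
    return parser.terms


def cooccurrence(docs):
    """Count, per unordered term pair, the documents containing both terms."""
    counts = Counter()
    for html in docs:
        vocab = sorted(set(heading_terms(html)))
        counts.update(combinations(vocab, 2))
    return counts


def boolean_union(docs, query_terms):
    """Return indices of documents whose indexed tags match any query term."""
    wanted = {t.lower() for t in query_terms}
    return [i for i, html in enumerate(docs) if wanted & set(heading_terms(html))]
```

Storing the counts in a `Counter` keyed by term pairs keeps only the nonzero entries, which is one simple way to hold a very sparse co-occurrence matrix in memory; a real system at the scale of these collections would need a more compact representation.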
Gregory B. Newby
Added: 01 Nov 2010
Updated: 01 Nov 2010
Type: Conference
Year: 2000
Where: TREC
Authors: Gregory B. Newby