Temporally-aware algorithms for document classification

Automatic Document Classification (ADC) is still one of the major information retrieval problems. It usually employs a supervised learning strategy: we first build a classification model from pre-classified documents and then use this model to classify unseen documents. Most supervised algorithms assume that all documents provide equally important information. In practice, however, a document may be more or less important for building the classification model depending on several factors, such as its timeliness, the venue in which it was published, and its authors, among others. In this paper, we are particularly concerned with the impact that temporal effects may have on ADC and how to minimize that impact. To deal with these effects, we introduce a temporal weighting function (TWF) and propose a methodology to determine it for document collections. We applied the proposed methodology to ACM-DL and Medline and found that the TWF of both follows a logno...
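To make the idea of a temporal weighting function concrete, here is a minimal sketch of how a TWF could plug into a classifier. It is not the authors' method: the choice of a k-nearest-neighbour vote, the cosine similarity, and the names `classify` and `example_twf` are all illustrative assumptions. In particular, `example_twf` is a hypothetical placeholder decay, not the function fitted to ACM-DL or Medline in the paper. Each training document's vote is scaled by the TWF evaluated at the time gap between it and the document being classified.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A training document: (term counts, class label, timestamp such as publication year).
Doc = Tuple[Dict[str, int], str, int]


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def example_twf(delta: int) -> float:
    """HYPOTHETICAL temporal weight: decays with the time gap (not the paper's fitted TWF)."""
    return 1.0 / (1.0 + delta)


def classify(test_terms: Dict[str, int], test_time: int, train: List[Doc],
             twf: Callable[[int], float], k: int = 3) -> str:
    """Temporally weighted kNN vote (illustrative; kNN is our assumed base classifier)."""
    scored = []
    for terms, label, t in train:
        # Content similarity scaled by how temporally close the training document is.
        scored.append((cosine(test_terms, terms) * twf(abs(test_time - t)), label))
    # Keep the k highest-scoring neighbours and sum their scores per class.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes: Dict[str, float] = defaultdict(float)
    for score, label in scored[:k]:
        votes[label] += score
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    train = [
        ({"neural": 1, "network": 1}, "AI", 1990),
        ({"neural": 1, "network": 1, "wireless": 1}, "Networks", 2009),
        ({"graph": 1}, "Theory", 2010),
    ]
    query = {"neural": 1, "network": 1}
    print(classify(query, 2010, train, example_twf, k=1))  # → Networks
    print(classify(query, 2010, train, lambda d: 1.0, k=1))  # → AI
```

With a constant weight the exact term match from 1990 wins; with the decaying weight the slightly less similar but recent document wins, which is the kind of effect a TWF is meant to capture.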
Added 21 May 2011
Updated 21 May 2011
Type Journal
Year 2010
Authors Thiago Salles, Leonardo C. da Rocha, Gisele L. Pappa, Fernando Mourão, Wagner Meira Jr., Marcos André Gonçalves