
JCDL
2009
ACM

Topic model methods for automatically identifying out-of-scope resources

Recent years have seen the rise of subject-themed digital libraries, such as the NSDL pathways and the Digital Library for Earth System Education (DLESE). These libraries often need to manually verify that contributed resources cover topics that fit within the theme of the library. We show that such scope judgments can be automated using a combination of text classification techniques and topic modeling. Our models address two significant challenges in making scope judgments: only a small number of out-of-scope resources are typically available, and the topic distinctions required for digital libraries are much more subtle than those in classic text classification problems. To meet these challenges, our models combine support vector machine learners optimized to different performance metrics with semantic topics induced by unsupervised statistical topic models. Our best model is able to distinguish resources that belong in DLESE from resources that don’t with an accuracy of around 70%. W...
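The abstract describes combining support vector machine learners with topics induced by an unsupervised topic model. The following is a minimal Python sketch of that general idea under stated assumptions, not the authors' implementation: a small collapsed-Gibbs LDA supplies per-document topic proportions, which are concatenated with term frequencies and fed to a Pegasos-style linear SVM. Because the abstract does not say how the SVMs were optimized to different performance metrics, the sketch substitutes a simple heavier hinge-loss cost on the rare out-of-scope class. All function names, hyperparameters, and the toy corpus are hypothetical.

```python
import random
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately crude assumption)."""
    return text.lower().split()


def lda_topic_proportions(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic-proportion vector per doc."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics
    z = []  # current topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k, wi = z[d][n], index[w]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][wi] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions (each row sums to 1).
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


def features(doc, topics, vocab):
    """Concatenate normalized term frequencies, topic proportions, and a constant bias term."""
    counts = Counter(doc)
    length = len(doc) or 1
    return [counts[w] / length for w in vocab] + list(topics) + [1.0]


def train_svm(X, y, lam=0.01, epochs=200, pos_weight=1.0, seed=0):
    """Pegasos-style linear SVM; the +1 class gets a larger hinge-loss cost."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1:  # hinge loss is active: step toward the example
                cost = pos_weight if y[i] == 1 else 1.0
                w = [wj + eta * cost * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Hypothetical toy corpus: +1 marks the rare out-of-scope class.
texts = [
    "plate tectonics earthquakes and volcanoes",
    "glaciers climate and ocean currents",
    "rock cycle sediment and erosion",
    "volcanoes magma and earthquakes",
    "bake bread with flour and yeast",
]
labels = [-1, -1, -1, -1, 1]
docs = [tokenize(t) for t in texts]
vocab = sorted({w for doc in docs for w in doc})
theta = lda_topic_proportions(docs, num_topics=2)
X = [features(doc, th, vocab) for doc, th in zip(docs, theta)]
w = train_svm(X, labels, pos_weight=labels.count(-1) / labels.count(1))
predictions = [predict(w, x) for x in X]
```

For brevity the sketch fits topics over the whole toy collection at once; with real data one would hold out test resources and infer their topic proportions from the trained topic model instead. The class weight (in-scope count divided by out-of-scope count) is one common way to keep a scarce class from being ignored, chosen here only because it is simple.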
Added 28 May 2010
Updated 28 May 2010
Type Conference
Year 2009
Where JCDL
Authors Steven Bethard, Soumya Ghosh, James H. Martin, Tamara Sumner