We are interested in retrieving information from conversational speech corpora, such as call-center data. This data comprises spontaneous speech conversations with low recording q...
Modern web search engines are expected to return top-k results efficiently given a query. Although many dynamic index pruning strategies have been proposed for efficient top-k com...
The use of NLP techniques for document classification has not produced significant improvements in performance within the standard term weighting statistical assignment paradigm (...
We investigate the problem of learning document classifiers in a multilingual setting, from collections where labels are only partially available. We address this problem in the ...
Ontologies represent a method of formally expressing a shared understanding of information, and have been seen by many authors as a prerequisite for the "Semantic web". ...