We demonstrate a phonotactic-semantic paradigm for spoken document categorization. In this framework, we define a set of acoustic words instead of lexical words to represent acous...
Combination of structure and content features is necessary for effective retrieval and classification of XML documents. Composite kernels provide a way for fusion of content and s...
This paper addresses personal E-mail filtering by casting it in the framework of text classification. Modeled as semi-structured documents, Email messages consist of a set of field...
The extraction of optimal features, in a classification sense, is still quite challenging in the context of large-scale classification problems (such as visual recognition), inv...
Document clustering is a very hard task in Automatic Text Processing since it requires to extract regular patterns from a document collection without a priori knowledge on the cat...