Addressed in this paper is the issue of `email data cleaning' for text mining. Many text mining applications need take emails as input. Email data is usually noisy and thus i...
We propose a new unsupervised learning technique for extracting information from large text collections. We model documents as if they were generated by a two-stage stochastic pro...
Mark Steyvers, Padhraic Smyth, Michal Rosen-Zvi, T...
We describe cross language retrieval experiments using Amharic queries and English language document collection from our participation in the bilingual ad hoc track at the CLEF 20...
Malware detectors require a specification of malicious behavior. Typically, these specifications are manually constructed by investigating known malware. We present an automatic t...
Mihai Christodorescu, Somesh Jha, Christopher Krue...
Category ranking provides a way to classify plain text documents into a pre-determined set of categories. This work proposes to have a look at typical document collections and ana...