We approached the problem of classifying papers for the TREC 2004 Genomics Track triage task as a four step process: feature generation, feature selection, classifier training, an...
Aaron M. Cohen, Ravi Teja Bhupatiraju, William R. ...
Word segmentation is a crucial step for segmentation-free document analysis systems and is used for creating an index based on word matching. In this paper, we propose a novel met...
By exploiting the object-oriented dynamic composability of modern document applications and formats, malcode hidden in otherwise inconspicuous documents can reach third-party appli...
Wei-Jen Li, Salvatore J. Stolfo, Angelos Stavrou, ...
This report addresses some of our observations made in a dozen of projects in the area of software testing, and more specifically, in automated testing. It documents, analyzes and...
Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to utilize it for document clustering. In this pape...
Anna Huang, David N. Milne, Eibe Frank, Ian H. Wit...