One of the biggest challenges in building effective anti-spam solutions is designing systems to defend against the everevolving bag of tricks spammers use to defeat them. Because ...
We approached the problem of classifying papers for the TREC 2004 Genomics Track triage task as a four step process: feature generation, feature selection, classifier training, an...
Aaron M. Cohen, Ravi Teja Bhupatiraju, William R. ...
Due to the exponential growth of documents on the Internet and the emergent need to organize them, the automated categorization of documents into predefined labels has received an...
In this paper we study the problem of collecting training samples for building enterprise taxonomies. We develop a computer-aided tool named InfoAnalyzer, which can effectively as...
Overall performance of the data mining process depends not just on the value of the induced knowledge but also on various costs of the process itself such as the cost of acquiring...