In automated multi-label text categorization, an automatic categorization system should output a category set, whose size is unknown a priori, for each document under analysis. Ma...
Claudine Badue, Felipe Pedroni, Alberto Ferreira d...
We present the architecture of nreduce, a distributed virtual machine which uses parallel graph reduction to run programs across a set of computers. It executes code written in a ...
Peter M. Kelly, Paul D. Coddington, Andrew L. Wend...
Multistructured documents are documents whose structure is composed of a set of concurrent hierarchical structures. In this paper, we propose a new model of multistructured docume...
Similarity measures for text have historically been an important tool for solving information retrieval problems. In many interesting settings, however, documents are often closel...
We study cost-sensitive learning of decision trees that incorporate both test costs and misclassification costs. In particular, we first propose a lazy decision tree learning that ...