Abstract. We focus on two recently proposed algorithms in the family of “boosting”-based learners for automated text classification, AdaBoost.MH and AdaBoost.MHKR. While the ...
Pio Nardiello, Fabrizio Sebastiani, Alessandro Spe...
In order to return relevant search results, a search engine must keep its local repository synchronized to the Web, but it is usually impossible to attain perfect freshness. Hence...
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Tasks that take place over a long period of time or collaborative tasks where participants are required to develop an understanding of each other’s effort benefit from better co...
Cluster label quality is crucial for browsing topic hierarchies obtained via document clustering. Intuitively, the hierarchical structure should influence the labeling accuracy. H...