Originally conceived as a "naive" baseline experiment using traditional n-gram language models as classifiers, the NCLEANER system has turned out to be a fast and lightw...
This paper is concerned with the problem of Imbalanced Classification (IC) in web mining, which often arises on the web due to the "Matthew Effect". As web IC applicatio...
Web link analysis has proven to be a significant enhancement for quality based web search. Most existing links can be classified into two categories: intra-type links (e.g., web h...
We present a highly accurate method for classifying web pages based on link percentage, which is the percentage of text characters that are parts of links normalized by the number...
Automatic classification of web pages is an effective way to deal with the difficulty of retrieving information from the Internet. Although there are many automatic classification...