Unlike conventional data or text, Web pages typically contain a large amount of information that is not part of the main contents of the pages, e.g., banner ads, navigation bars, ...
The purpose of this study is to demonstrate the benefit of using common data mining techniques on survey data where statistical analysis is routinely applied. The statistical surv...
Hongxing He, Huidong Jin, Jie Chen, Damien McAulla...
The use of unlabeled data to aid classification is important as labeled data is often available in limited quantity. Instead of utilizing training samples directly into semi-super...
This paper describes a study performed in an industrial setting that attempts to build predictive models to identify parts of a Java system with a high probability of fault. The s...
The problem of automatically classifying the gender of a blog author has important applications in many commercial domains. Existing systems mainly use features such as words, wor...