With the large volume of news available on the internet every day, automatically organizing news events by time order and by the dependencies between them is an interesting task. The w...
Defining the boundaries of a web-site, for (say) archiving or information retrieval purposes, is an important but complicated task. In this paper a web-page clustering approach to...
The high computational cost of nonlinear support vector machines has limited their usability for large-scale problems. We propose two novel stochastic algorithms to tackle this pr...
We study a class of algorithms that speed up the training process of support vector machines (SVMs) by returning an approximate SVM. We focus on algorithms that reduce the size of...
One major goal of data mining is to understand data. Rule-based methods are better than other methods at making mining results comprehensible. However, the current rule-based cla...