The growth of the web has directly increased the availability of relational data. One of the key problems in mining such data is computing the similarity between o...
Pradeep Muthukrishnan, Dragomir R. Radev, Qiaozhu ...
Abstract. The current generation of data mining tools has limited capacity and performance, since these tools tend to be sequential. This paper explores a migration path out of th...
Abstract. This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document clustering. We first create a concept-based document representation b...
Anna Huang, David N. Milne, Eibe Frank, Ian H. Wit...
A data mining component is included in Microsoft SQL Server 2000 and SQL Server 2005, one of the most popular DBMSs. This gives a push for data mining technologies to move from a ...
Many approaches to active learning involve periodically training one classifier and choosing data points with the lowest confidence. An alternative approach is to periodically cho...