A Semi-Supervised Document Clustering Technique for Information Organization

This paper discusses a new type of semi-supervised document clustering that uses partial supervision to partition a large set of documents. Most clustering methods organize documents into groups based only on similarity measures. Unfortunately, these traditional approaches are often unable to discern structural details hidden within the document corpus, because their algorithms depend strongly on the documents themselves and their similarity to each other. In this paper, we attempt to isolate more semantically coherent clusters by employing the domain-specific knowledge provided by a document analyst. By using external human knowledge to guide the clustering mechanism, with some flexibility when creating the clusters, clustering efficiency can be considerably enhanced. As a basic clustering strategy, we use a variant of complete-linkage agglomerative hierarchical clustering, and develop the concepts (or seeds) of requested clusters by exploiti...
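The abstract is cut off before it describes the method in detail, so the following is only an illustrative sketch, not the authors' algorithm. It shows one simple way analyst-provided seeds could steer complete-linkage agglomerative clustering: each seed group starts out as a single cluster, and ordinary complete-linkage merging proceeds from there. The function names, the seed format (lists of document indices), the cosine distance, and the toy term-weight vectors are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def cosine_distance(a, b):
    """1 - cosine similarity for two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def complete_linkage(c1, c2, vectors):
    """Complete linkage: distance between the two farthest members."""
    return max(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)


def seeded_complete_linkage(vectors, seeds, k):
    """Merge clusters until k remain, starting from analyst-provided seeds.

    seeds is a list of disjoint lists of document indices; this input
    format is hypothetical and not taken from the paper.
    """
    seeded = {i for group in seeds for i in group}
    # Each seed group starts as one cluster; other documents are singletons.
    clusters = [set(group) for group in seeds]
    clusters += [{i} for i in range(len(vectors)) if i not in seeded]
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest complete-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: complete_linkage(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[a] |= clusters[b]
        del clusters[b]  # a < b always holds, so index a stays valid
    return sorted(sorted(c) for c in clusters)


# Toy corpus: documents 0-1 lean on term A, 2-3 on term B, 4 is ambiguous.
vectors = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [0.7, 0.7]]
# The analyst states that documents 0 and 4 belong together.
print(seeded_complete_linkage(vectors, seeds=[[0, 4]], k=2))
# -> [[0, 1, 4], [2, 3]]
```

In this toy run the ambiguous document 4 ends up with documents 0 and 1 because the seed places it there before any similarity-based merging happens. This version scans every cluster pair at each step, so it only suits small illustrative inputs.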
Han-joon Kim, Sang-goo Lee
Added 02 Aug 2010
Updated 02 Aug 2010
Type Conference
Year 2000
Where CIKM