This paper presents Multilingual Document Clustering (MDC) on comparable corpora. Wikipedia, a structured multilingual knowledge base, has been highly exploited in many monolingual...
High dimensionality remains a significant challenge for document clustering. Recent approaches used frequent itemsets and closed frequent itemsets to reduce dimensionality, and to...
In traditional text clustering methods, documents are represented as "bags of words" without considering the semantic information of each document. For instance, if two ...
Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, ...
This paper discusses a new type of semi-supervised document clustering that uses partial supervision to partition a large set of documents. Most clustering methods organizes docum...