Sciweavers

SIGIR
2005
ACM
13 years 10 months ago
Relation between PLSA and NMF and implications
Non-negative Matrix Factorization (NMF, [5]) and Probabilistic Latent Semantic Analysis (PLSA, [4]) have been successfully applied to a number of text analysis tasks such as docum...
Éric Gaussier, Cyril Goutte
ICAIL
2005
ACM
13 years 10 months ago
Effective Document Clustering for Large Heterogeneous Law Firm Collections
Computational resources for research in legal environments have historically implied remote access to large databases of legal documents such as case law, statutes, law reviews an...
Jack G. Conrad, Khalid Al-Kofahi, Ying Zhao, Georg...
SIGIR
2006
ACM
13 years 10 months ago
Feature diversity in cluster ensembles for robust document clustering
The performance of document clustering systems depends on employing optimal text representations, which are not only difficult to determine beforehand, but also may vary from one ...
Xavier Sevillano, Germán Cobo, Francesc Al...
JCDL
2006
ACM
13 years 10 months ago
A comprehensive comparison study of document clustering for a biomedical digital library MEDLINE
Document clustering has been used for better document retrieval, document browsing, and text mining in digital libraries. In this paper, we perform a comprehensive comparison study ...
Illhoi Yoo, Xiaohua Hu
ICDM
2006
IEEE
13 years 10 months ago
High Quality, Efficient Hierarchical Document Clustering Using Closed Interesting Itemsets
High dimensionality remains a significant challenge for document clustering. Recent approaches used frequent itemsets and closed frequent itemsets to reduce dimensionality, and to...
Hassan H. Malik, John R. Kender
ICDE
2006
IEEE
13 years 10 months ago
Novelty-based Incremental Document Clustering for On-line Documents
Document clustering has been used as a core technique in managing vast amounts of data and providing needed information. In on-line environments, generally new information gains mo...
Sophoin Khy, Yoshiharu Ishikawa, Hiroyuki Kitagawa
CBMS
2006
IEEE
13 years 10 months ago
Biomedical Ontology MeSH Improves Document Clustering Qualify on MEDLINE Articles: A Comparison Study
Document clustering has been used for better document retrieval, document browsing, and text mining. In this paper, we investigate if biomedical ontology MeSH improves the cluster...
Illhoi Yoo, Xiaohua Hu
ICDM
2007
IEEE
13 years 10 months ago
GDClust: A Graph-Based Document Clustering Technique
This paper introduces a new document clustering technique based on frequent senses. The proposed system, GDClust (Graph-Based Document Clustering), works with frequent senses ra...
M. Shahriar Hossain, Rafal A. Angryk
DASFAA
2007
IEEE
13 years 10 months ago
A Comparative Study of Ontology Based Term Similarity Measures on PubMed Document Clustering
Recent research shows that ontology as background knowledge can improve document clustering quality with its concept hierarchy knowledge. Previous studies take term semantic simila...
Xiaodan Zhang, Liping Jing, Xiaohua Hu, Michael K....
ICDM
2008
IEEE
13 years 11 months ago
Clustering Documents with Active Learning Using Wikipedia
Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to utilize it for document clustering. In this pape...
Anna Huang, David N. Milne, Eibe Frank, Ian H. Wit...