Two dimensional plots (2-D) in digital documents on the web are an important source of information that is largely under-utilized. In this paper, we outline how data and text can ...
Saurabh Kataria, William Browuer, Prasenjit Mitra,...
This paper focuses on automatically improving the readability of documents. We explore mechanisms relating to content control that could be used (i) by authors to improve the qual...
This paper describes a novel browsing paradigm, taking benefit of the various types of links (e.g. thematic, temporal, references, etc.) that can be automatically built between mul...
We present a novel sequential clustering algorithm which is motivated by the Information Bottleneck (IB) method. In contrast to the agglomerative IB algorithm, the new sequential ...
This paper proposes an algorithm called Imprecise Spectrum Analysis (ISA) to carry out fast dimension reduction for document classification. ISA is designed based on the one-sided...
Hu Guan, Bin Xiao, Jingyu Zhou, Minyi Guo, Tao Yan...