In this paper, we propose a novel probabilistic generative model to deal with explicit multiple-topic documents: the Parametric Dirichlet Mixture Model (PDMM). PDMM is an expansion of...
This paper presents an approach using social semantics for the task of topic labelling by means of Open Topic Models. Our approach utilizes a social ontology to create an alignm...
This paper presents an approach for extracting and segmenting tables from Chinese ink documents based on a matrix model. An ink document is first modeled as a matrix containing i...
Previously, topic models such as PLSI (Probabilistic Latent Semantic Indexing) and LDA (Latent Dirichlet Allocation) were developed for modeling the contents of plain texts. Recent...
Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document met...