Sciweavers

» Integrating Data and Probabilistically Structured Text Docum...
KDD
2006
ACM
A mixture model for contextual text mining
Contextual text mining is concerned with extracting topical themes from a text collection with context information (e.g., time and location) and comparing/analyzing the variations...
Qiaozhu Mei, ChengXiang Zhai
KDD
2007
ACM
Generalized component analysis for text with heterogeneous attributes
We present a class of richly structured, undirected hidden variable models suitable for simultaneously modeling text along with other attributes encoded in different modalities. O...
Xuerui Wang, Chris Pal, Andrew McCallum
SIGIR
2004
ACM
GaP: a factor model for discrete data
We present a probabilistic model for a document corpus that combines many of the desirable features of previous models. The model is called “GaP” for Gamma-Poisson, the distri...
John F. Canny
DOCENG
2003
ACM
INFTY: an integrated OCR system for mathematical documents
An integrated OCR system for mathematical documents, called INFTY, is presented. INFTY consists of four procedures, i.e., layout analysis, character recognition, structure analysi...
Masakazu Suzuki, Fumikazu Tamari, Ryoji Fukuda, Se...
AUSDM
2008
Springer
Structure-Based Document Model with Discrete Wavelet Transforms and Its Application to Document Classification
Term signal is an existing text representation that depicts a term as a vector of frequencies of occurrences in a number of user-defined partitions of a document. Although term si...
Supphachai Thaicharoen, Tom Altman, Krzysztof J. C...
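
The term-signal representation described in the abstract above (a term as a vector of occurrence counts over partitions of a document) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `term_signal`, the whitespace-tokenized input, and the choice of contiguous, near-equal-width partitions are all assumptions made for illustration, since the abstract only says the partitions are user-defined.

```python
def term_signal(tokens, term, num_partitions):
    """Return per-partition occurrence counts of `term`.

    Assumption: the document is split into `num_partitions` contiguous,
    near-equal-width slices by token position. The abstract only says the
    partitions are user-defined, so other schemes are equally valid.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")

    n = len(tokens)
    signal = [0] * num_partitions
    for position, token in enumerate(tokens):
        if token == term:
            # Map token position 0..n-1 onto partition index 0..num_partitions-1.
            signal[position * num_partitions // n] += 1
    return signal


if __name__ == "__main__":
    doc = ["a", "b", "a", "a", "c", "d", "e", "a"]
    print(term_signal(doc, "a", 4))
```

Unlike a single bag-of-words count, the resulting vector keeps a coarse record of where in the document the term occurs.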