Sciweavers

602 search results - page 3 / 121
» Integrating Data and Probabilistically Structured Text Docum...
DAS
2010
Springer
13 years 3 months ago
Information extraction by finding repeated structure
Repetition of layout structure is prevalent in document images. In document design, such repetition conveys the underlying logical and functional structure of the data. For exampl...
Evgeniy Bart, Prateek Sarkar
SDM
2009
SIAM
14 years 2 months ago
Topic Cube: Topic Modeling for OLAP on Multidimensional Text Databases
As the amount of textual information grows explosively in various kinds of business systems, it becomes more and more desirable to analyze both structured data records and unstruc...
ChengXiang Zhai, Duo Zhang, Jiawei Han
KDD
2008
ACM
14 years 6 months ago
Structured entity identification and document categorization: two tasks with one joint model
Traditionally, research in identifying structured entities in documents has proceeded independently of document categorization research. In this paper, we observe that these two t...
Indrajit Bhattacharya, Shantanu Godbole, Sachindra...
ICDM
2009
IEEE
14 years 6 days ago
iTopicModel: Information Network-Integrated Topic Modeling
Document networks, i.e., networks associated with text information, are becoming increasingly popular due to the ubiquity of Web documents, blogs, and various kinds of online da...
Yizhou Sun, Jiawei Han, Jing Gao, Yintao Yu
PKDD
2005
Springer
13 years 11 months ago
A Probabilistic Clustering-Projection Model for Discrete Data
For discrete co-occurrence data like documents and words, calculating optimal projections and clustering are two different but related tasks. The goal of projection is to find a ...
Shipeng Yu, Kai Yu, Volker Tresp, Hans-Peter Krieg...