Sciweavers

41 search results - page 3 / 9
» Corpus Based Unsupervised Labeling of Documents
EMNLP
2006
14 years 11 months ago
Entity Annotation based on Inverse Index Operations
Entity annotation involves attaching a label such as 'name' or 'organization' to a sequence of tokens in a document. All the current rule-based and machine learning-based...
Ganesh Ramakrishnan, Sreeram Balakrishnan, Sachind...
KDD
1998
ACM
15 years 1 months ago
Probabilistic Modeling for Information Retrieval with Unsupervised Training Data
We apply a well-known Bayesian probabilistic model to textual information retrieval: the classification of documents based on their relevance to a query. This model was previously...
Ernest P. Chan, Santiago Garcia, Salim Roukos
ICDAR
2005
IEEE
15 years 3 months ago
Learning Diagram Parts with Hidden Random Fields
Many diagrams contain compound objects composed of parts. We propose a recognition framework that learns parts in an unsupervised way, and requires training labels only for compou...
Martin Szummer
ICA
2007
Springer
15 years 1 months ago
Text Clustering on Latent Thematic Spaces: Variants, Strengths and Weaknesses
Deriving a thematically meaningful partition of an unlabeled document corpus is a challenging task. In this context, the use of document representations based on latent thematic ge...
Xavier Sevillano, Germán Cobo, Francesc Al...
DEXA
2009
Springer
15 years 4 months ago
Incremental Ontology-Based Extraction and Alignment in Semi-structured Documents
SHIRI is an ontology-based system for integration of semi-structured documents related to a specific domain. The system’s purpose is to allow users to access relevant parts ...
Mouhamadou Thiam, Nacéra Bennacer, Nathalie...