Sciweavers

41 search results - page 3 / 9
» Corpus Based Unsupervised Labeling of Documents
EMNLP
2006
13 years 7 months ago
Entity Annotation based on Inverse Index Operations
Entity annotation involves attaching a label such as 'name' or 'organization' to a sequence of tokens in a document. All the current rule-based and machine learning-based...
Ganesh Ramakrishnan, Sreeram Balakrishnan, Sachind...
KDD
1998
ACM
13 years 10 months ago
Probabilistic Modeling for Information Retrieval with Unsupervised Training Data
We apply a well-known Bayesian probabilistic model to textual information retrieval: the classification of documents based on their relevance to a query. This model was previously...
Ernest P. Chan, Santiago Garcia, Salim Roukos
ICDAR
2005
IEEE
13 years 12 months ago
Learning Diagram Parts with Hidden Random Fields
Many diagrams contain compound objects composed of parts. We propose a recognition framework that learns parts in an unsupervised way, and requires training labels only for compou...
Martin Szummer
ICA
2007
Springer
13 years 10 months ago
Text Clustering on Latent Thematic Spaces: Variants, Strengths and Weaknesses
Deriving a thematically meaningful partition of an unlabeled document corpus is a challenging task. In this context, the use of document representations based on latent thematic ge...
Xavier Sevillano, Germán Cobo, Francesc Al...
DEXA
2009
Springer
14 years 28 days ago
Incremental Ontology-Based Extraction and Alignment in Semi-structured Documents
SHIRI is an ontology-based system for integration of semistructured documents related to a specific domain. The system’s purpose is to allow users to access relevant parts ...
Mouhamadou Thiam, Nacéra Bennacer, Nathalie...