Sciweavers

» Corpora and data preparation
ACL
2006
14 years 11 months ago
Semi-Supervised Learning of Partial Cognates Using Bilingual Bootstrapping
Partial cognates are pairs of words in two languages that have the same meaning in some, but not all contexts. Detecting the actual meaning of a partial cognate in context can be ...
Oana Frunza, Diana Zaiu Inkpen
ACL
2006
14 years 11 months ago
Subword-Based Tagging for Confidence-Dependent Chinese Word Segmentation
We proposed a subword-based tagging for Chinese word segmentation to improve the existing character-based tagging. The subword-based tagging was implemented using the maximum entr...
Ruiqiang Zhang, Gen-ichiro Kikui, Eiichiro Sumita
EACL
2006
ACL Anthology
14 years 11 months ago
Web Text Corpus for Natural Language Processing
Web text has been successfully used as training data for many NLP applications. While most previous work accesses web text through search engine hit counts, we created a Web Corpu...
Vinci Liu, James R. Curran
ACL
2001
14 years 11 months ago
Using Machine Learning to Maintain Rule-based Named-Entity Recognition and Classification Systems
This paper presents a method that assists in maintaining a rule-based named-entity recognition and classification system. The underlying idea is to use a separate system, construc...
Georgios Petasis, Frantz Vichot, Francis Wolinski,...
NIPS
2001
14 years 11 months ago
Latent Dirichlet Allocation
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian m...
David M. Blei, Andrew Y. Ng, Michael I. Jordan