
ACL
2001

Serial Combination of Rules and Statistics: A Case Study in Czech Tagging

A hybrid system is described that combines the strengths of manual rule writing and statistical learning, obtaining results superior to either method applied separately. The rule-based and statistical components are combined not in parallel but serially: the rule-based system, which performs partial disambiguation with recall close to 100%, is applied first, and a trigram HMM tagger then runs on its output. An experiment in Czech tagging has been performed with encouraging results.

1 Tagging of Inflective Languages

Inflective languages pose a specific problem for tagging due to two phenomena: their highly inflective nature (causing a sparse-data problem in any statistically based system) and free word order (making fixed-context systems, such as n-gram Hidden Markov Models (HMMs), even less adequate than for English). The average tagset contains about 1,000-2,000 distinct tags; the size of the set of possible and plausible tags can reach several thousand. Apart from agglutinative languag...
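The serial pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' system: the lexicon, the tag names (`V-past`, `PREP-gen`, `N-gen-sg`, ...), the rule `no_nominative_after_preposition` and all probabilities are hypothetical toy values, and unseen events get a fixed floor probability instead of real smoothing. Stage 1 only removes tags and skips any removal that would leave a word without tags; stage 2 is a standard trigram Viterbi search restricted to the tags that survive stage 1.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical rule signature: (words, current candidate sets, position)
# -> set of tags the rule wants to remove at that position.
Rule = Callable[[List[str], List[Set[str]], int], Set[str]]


def apply_rules(words: List[str], lexicon: Dict[str, Set[str]],
                rules: List[Rule]) -> List[Set[str]]:
    """Stage 1: rule-based partial disambiguation.

    Rules may only remove tags; a removal that would leave a word with
    no tags is skipped. Repeats until no rule changes anything (sets
    only shrink, so the loop terminates).
    """
    candidates = [set(lexicon[w]) for w in words]
    changed = True
    while changed:
        changed = False
        for i in range(len(words)):
            for rule in rules:
                remaining = candidates[i] - rule(words, candidates, i)
                if remaining and remaining != candidates[i]:
                    candidates[i] = remaining
                    changed = True
    return candidates


def viterbi_trigram(words: List[str], candidates: List[Set[str]],
                    trans: Dict[Tuple[str, str, str], float],
                    emit: Dict[Tuple[str, str], float],
                    floor: float = 1e-6) -> List[str]:
    """Stage 2: trigram HMM (Viterbi) decoding over the surviving tags.

    trans[(t2, t1, t)] ~ P(t | t2, t1) and emit[(w, t)] ~ P(w | t).
    Unseen events get `floor` instead of proper smoothing.
    """
    def logp(table, key):
        return math.log(table.get(key, floor))

    # Viterbi state: (previous tag, current tag) -> (log score, best path).
    best = {("<s>", "<s>"): (0.0, [])}
    for i, w in enumerate(words):
        new_best = {}
        for (t2, t1), (score, path) in best.items():
            for t in sorted(candidates[i]):
                s = score + logp(trans, (t2, t1, t)) + logp(emit, (w, t))
                if (t1, t) not in new_best or s > new_best[(t1, t)][0]:
                    new_best[(t1, t)] = (s, path + [t])
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1]


# Toy, simplified lexicon and made-up probabilities (illustration only).
LEXICON = {
    "odešel": {"V-past"},
    "bez": {"PREP-gen"},
    "ženy": {"N-gen-sg", "N-nom-pl", "N-acc-pl"},
}
TRANS = {
    ("<s>", "<s>", "V-past"): 0.3,
    ("<s>", "V-past", "PREP-gen"): 0.2,
    ("V-past", "PREP-gen", "N-gen-sg"): 0.6,
    ("V-past", "PREP-gen", "N-acc-pl"): 0.05,
}
EMIT = {
    ("odešel", "V-past"): 0.01,
    ("bez", "PREP-gen"): 0.05,
    ("ženy", "N-gen-sg"): 0.02,
    ("ženy", "N-nom-pl"): 0.04,
    ("ženy", "N-acc-pl"): 0.03,
}


def no_nominative_after_preposition(words, candidates, i):
    """Hypothetical rule: a noun right after an unambiguous preposition
    is not nominative."""
    if i > 0 and candidates[i - 1] and all(
            t.startswith("PREP") for t in candidates[i - 1]):
        return {t for t in candidates[i] if t.startswith("N-nom")}
    return set()


if __name__ == "__main__":
    sentence = ["odešel", "bez", "ženy"]  # "(he) left without (his) wife"
    cands = apply_rules(sentence, LEXICON, [no_nominative_after_preposition])
    print(cands)
    print(viterbi_trigram(sentence, cands, TRANS, EMIT))
```

With these toy values, the rule removes `N-nom-pl` from `ženy` but leaves `N-gen-sg` and `N-acc-pl`, and the trigram model then picks `N-gen-sg`. Because stage 1 is remove-only, any tag it wrongly discards is lost to stage 2, which is why the abstract stresses recall close to 100% for the rule-based part.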
Type Conference
Year 2001
Where ACL
Authors Jan Hajic, Pavel Krbec, Pavel Kveton, Karel Oliva, Vladimir Petkevic