
FINTAL
2006

Morphological Lexicon Extraction from Raw Text Data

The tool extract automatically extracts lemma-paradigm pairs from raw text data. It uses search patterns built from regular expressions and propositional logic. These search patterns define sufficient conditions for including a lemma-paradigm pair in the lexicon, based on the word forms that occur in the data. This paper explains the search pattern syntax of extract and its search algorithm, and discusses how to design search patterns with respect to recall and precision. The extract tool was developed for morphologies defined in the Functional Morphology tool [1], but it can be used with any system that implements a word-and-paradigm description of a morphology. The usefulness of the tool is demonstrated by a case study on the Canadian Hansards Corpus of French. The result is evaluated in terms of the precision of the extracted lemmas, together with statistics on coverage and rule productiveness. Competitive extraction figures show that human-written rules in a tai...
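The abstract does not show extract's actual search pattern syntax, so the following is only a minimal Python sketch of the general idea, under stated assumptions: a rule pairs a regular expression that proposes a candidate stem with a propositional condition over which stem + suffix forms are attested in the corpus. The `Rule` type, the helpers `form`, `all_of`, `any_of` and `not_`, the paradigm names `verb_er` and `noun_s`, and the toy French corpus are all hypothetical illustrations. The search is a naive scan over every word form, not the search algorithm described in the paper.

```python
import re
from dataclasses import dataclass
from typing import AbstractSet, Callable, Iterable, Set, Tuple

# Hypothetical: a condition decides whether the corpus supports a candidate stem.
Condition = Callable[[str, AbstractSet[str]], bool]


def form(suffix: str) -> Condition:
    """True if stem + suffix occurs as a word form in the corpus."""
    return lambda stem, words: (stem + suffix) in words


def all_of(*conds: Condition) -> Condition:
    """Propositional conjunction of conditions."""
    return lambda stem, words: all(c(stem, words) for c in conds)


def any_of(*conds: Condition) -> Condition:
    """Propositional disjunction of conditions."""
    return lambda stem, words: any(c(stem, words) for c in conds)


def not_(cond: Condition) -> Condition:
    """Propositional negation of a condition."""
    return lambda stem, words: not cond(stem, words)


@dataclass(frozen=True)
class Rule:
    paradigm: str         # hypothetical paradigm name
    stem_regex: str       # regex whose group 1 captures a candidate stem
    lemma_suffix: str     # lemma is built as stem + lemma_suffix
    condition: Condition  # sufficient condition for adding the pair


def extract(words: AbstractSet[str], rules: Iterable[Rule]) -> Set[Tuple[str, str]]:
    """Naive scan: try every rule on every word form, keep supported pairs."""
    lexicon: Set[Tuple[str, str]] = set()
    for rule in rules:
        pattern = re.compile(rule.stem_regex)
        for w in words:
            m = pattern.fullmatch(w)
            if m and rule.condition(m.group(1), words):
                lexicon.add((m.group(1) + rule.lemma_suffix, rule.paradigm))
    return lexicon


# Toy corpus (hypothetical), standing in for word forms from raw text.
CORPUS = frozenset({"parler", "parlons", "parlez", "parle",
                    "aimer", "aimez", "mer", "fleur", "fleurs"})

RULES = [
    # -er verb: the infinitive and at least one other finite form must occur.
    Rule("verb_er", r"(.+)er", "er",
         all_of(form("er"), any_of(form("ons"), form("ez")))),
    # Regular noun: singular and plural occur, and no -er infinitive does.
    Rule("noun_s", r"(.+)s", "",
         all_of(form(""), form("s"), not_(form("er")))),
]

print(sorted(extract(CORPUS, RULES)))
# prints [('aimer', 'verb_er'), ('fleur', 'noun_s'), ('parler', 'verb_er')]
```

The word "mer" matches the verb regex but is rejected because neither "mons" nor "mez" occurs, which illustrates how adding conjuncts to a condition trades recall for precision.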
Added 22 Aug 2010
Updated 22 Aug 2010
Type Conference
Year 2006
Where FINTAL
Authors Markus Forsberg, Harald Hammarström, Aarne Ranta