SDM
2010
SIAM

Semi-supervised Bio-named Entity Recognition with Word-Codebook Learning

We describe a novel semi-supervised method called Word-Codebook Learning (WCL) and apply it to the task of bio-named entity recognition (bioNER). A typical bioNER system can be seen as assigning labels to words in bio-literature text. To improve supervised tagging, WCL learns a class of word-level feature embeddings from a large unlabeled corpus that capture word semantics or word label patterns. Words are then clustered according to their embedding vectors through a vector quantization step, in which each word is assigned to one of the codewords in a codebook. Finally, the codewords are treated as new word attributes and added as features for entity labeling. Two types of word-codebook learning are proposed: (1) General WCL, where an unsupervised method uses contextual semantic similarity of words to learn accurate word representations; (2) Task-oriented WCL, where for every word a semi-supervised method learns target-class label patterns from unlabeled data using supervised signals fro...
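
The pipeline summarized in the abstract (embed words, quantize the embeddings into a codebook, then add each word's codeword as a tagging feature) can be illustrated with a minimal sketch. It assumes plain k-means as the vector quantization step and uses hand-made two-dimensional toy embeddings; the function names, the toy vectors, the initialization rule and the choice of k-means are all illustrative assumptions, not the authors' implementation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, codebook):
    """Index of the codeword (centroid) closest to vec."""
    return min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))


def learn_codebook(embeddings, num_codewords, iterations=20):
    """Vector quantization via plain k-means (an assumed stand-in for the
    quantization step; the abstract does not name a specific algorithm)."""
    words = sorted(embeddings)
    # Assumption: deterministic init from the first words in sorted order.
    codebook = [list(embeddings[w]) for w in words[:num_codewords]]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for w in words:
            clusters[nearest(embeddings[w], codebook)].append(embeddings[w])
        for k, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def codeword_features(tokens, embeddings, codebook):
    """Attach a codeword id to each token as an extra attribute.
    Tokens without an embedding get the placeholder id -1."""
    features = []
    for tok in tokens:
        vec = embeddings.get(tok)
        code = nearest(vec, codebook) if vec is not None else -1
        features.append({"word": tok, "codeword": code})
    return features


# Hypothetical toy embeddings: gene-like words near (1, 0), function words near (0, 1).
TOY_EMBEDDINGS = {
    "BRCA1": (0.9, 0.1),
    "TP53": (0.8, 0.2),
    "of": (0.2, 0.8),
    "the": (0.1, 0.9),
}

codebook = learn_codebook(TOY_EMBEDDINGS, num_codewords=2)
feats = codeword_features(["BRCA1", "binds", "TP53", "of", "the"],
                          TOY_EMBEDDINGS, codebook)
for f in feats:
    print(f)
```

In this sketch the codeword id would be passed to a supervised tagger alongside its usual word-level features. Words that land in the same cluster share a feature value, so a label pattern seen for one word in the labeled data can carry over to its cluster neighbors.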
Pavel P. Kuksa, Yanjun Qi
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2010
Where SDM