CICLING
2007
Springer

Rule-Based Protein Term Identification with Help from Automatic Species Tagging

In biomedical articles, a single term often refers to different protein entities. For example, an arbitrary occurrence of the term p53 might denote any of thousands of proteins across a number of species. A human annotator can resolve this ambiguity relatively easily by looking at the context and, if necessary, searching an appropriate protein database. However, the ambiguity causes considerable trouble for a text mining system, which does not understand human language and hence cannot identify the correct protein that the term refers to. In this paper, we present a Term Identification system that automatically assigns unique identifiers, as found in a protein database, to ambiguous protein mentions in text. Unlike other solutions described in the literature, which only handle gene/protein mentions for a specific model organism, our system is able to tackle protein mentions across many species by integrating a machine-learning-based species tagger. We have compared the performance of our automatic ...
Xinglong Wang
Type Conference