Sciweavers

BMCBI
2008

Rule-based knowledge aggregation for large-scale protein sequence analysis of influenza A viruses

Background: The explosive growth of biological data provides opportunities for new statistical and comparative analyses of large information sets, such as alignments comprising tens of thousands of sequences. In such studies, sequence annotations frequently play an essential role, and reliable results depend on metadata quality. However, the semantic heterogeneity and annotation inconsistencies in biological databases greatly increase the complexity of aggregating and cleaning metadata. Manual curation of datasets, traditionally favoured by life scientists, is impractical for studies involving thousands of records. In this study, we investigate quality issues that affect major public databases, and quantify the effectiveness of an automated metadata extraction approach that combines structural and semantic rules. We applied this approach to more than 90,000 influenza A records, to annotate sequences with protein name, virus subtype, isolate, host, geographic origin, and year of isolat...
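To make the idea of combining structural and semantic rules concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the record carries a strain name in the conventional influenza form, for example "A/chicken/Hong Kong/220/1997(H5N1)", with the host field omitted for human isolates. The function name parse_strain, the regular expression, the returned field names, and the two-digit-year rule are all hypothetical choices made for illustration.

```python
import re
from typing import Optional

# Hypothetical structural rule: "A/<fields>(<subtype>)", subtype optional.
STRAIN_RE = re.compile(
    r"^A/(?P<fields>[^()]+?)\s*(?:\((?P<subtype>H\d+N\d+)\))?$"
)


def parse_strain(name: str) -> Optional[dict]:
    """Extract host, location, isolate, year and subtype from a strain name.

    Returns None when the name does not satisfy the rules, so that
    unparseable records can be set aside for manual review.
    """
    match = STRAIN_RE.match(name.strip())
    if match is None:
        return None

    parts = [p.strip() for p in match.group("fields").split("/")]

    # Structural rule: four fields mean host/location/isolate/year;
    # three fields mean the host was omitted (assumed human here).
    if len(parts) == 4:
        host, location, isolate, year_text = parts
    elif len(parts) == 3:
        host = "human"
        location, isolate, year_text = parts
    else:
        return None

    # Semantic rule: the year must be numeric with two or four digits.
    if not year_text.isdigit() or len(year_text) not in (2, 4):
        return None
    year = int(year_text)
    if len(year_text) == 2:
        # Assumption for this sketch: two-digit years belong to the 1900s.
        year += 1900

    return {
        "host": host.lower(),
        "location": location,
        "isolate": isolate,
        "year": year,
        "subtype": match.group("subtype"),
    }
```

Returning None instead of guessing keeps malformed names out of the aggregated metadata; a real pipeline would need many more rules to cope with the inconsistencies the abstract describes.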
Olivo Miotto, Tin Wee Tan, Vladimir Brusic
Added 09 Dec 2010
Updated 09 Dec 2010
Type Journal
Year 2008
Where BMCBI