
SDM
2003
SIAM

Detection of Underrepresented Biological Sequences using Class-Conditional Distribution Models

A labeled sequence data set related to a certain biological property is often biased and, therefore, does not completely capture the diversity of that property in nature. To reduce this sampling bias, a data mining procedure is proposed for detecting underrepresented relevant sequences. The procedure is aimed at helping domain experts achieve a cost-effective qualitative enlargement of knowledge through an in-depth study of a small number of statistically underrepresented and functionally interesting sequences. Our procedure consists of: (i) learning a class-conditional distribution model on each class of labeled data; (ii) applying the models to select statistically underrepresented unlabeled sequences; and (iii) automatically evaluating their interestingness. An application of the proposed approach is illustrated on the important problem of enlarging the data set of confirmed disordered proteins. The obtained results demonstrate the promise of the proposed approach for an efficient reductio...
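Steps (i) and (ii) of the procedure can be illustrated with a minimal sketch. The abstract does not say which class-conditional distribution model the authors use, so the first-order Markov chain over amino acids below is an assumption made purely for illustration, as are the function names `fit_markov`, `avg_loglik`, and `rank_underrepresented` and the toy sequences. Step (iii), the interestingness evaluation, is not covered.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def fit_markov(sequences, alpha=1.0):
    """Assumed model: first-order Markov chain with add-alpha smoothing.

    Returns a nested dict of log transition probabilities.
    """
    counts = {a: {b: alpha for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a in counts and b in counts[a]:
                counts[a][b] += 1
    logp = {}
    for a, row in counts.items():
        total = sum(row.values())
        logp[a] = {b: math.log(c / total) for b, c in row.items()}
    return logp


def avg_loglik(seq, logp):
    """Average log-likelihood per transition, so that length does not dominate."""
    pairs = [(a, b) for a, b in zip(seq, seq[1:]) if a in logp and b in logp[a]]
    if not pairs:
        return float("-inf")
    return sum(logp[a][b] for a, b in pairs) / len(pairs)


def rank_underrepresented(unlabeled, class_models):
    """Order unlabeled sequences from least to most typical.

    A sequence is scored by its best fit under any class model; a low best
    score means no labeled class explains it well (hypothetical criterion).
    """
    scored = [
        (max(avg_loglik(s, m) for m in class_models.values()), s)
        for s in unlabeled
    ]
    return [s for _, s in sorted(scored)]


# Toy data, invented for illustration only.
labeled = {
    "disordered": ["SSSSSSSS", "SSGSSGSS"],
    "ordered": ["LLLLLLLL", "LLVLLVLL"],
}
models = {label: fit_markov(seqs) for label, seqs in labeled.items()}
ranked = rank_underrepresented(["SSSSSS", "LLLLLL", "WCWCWC"], models)
```

In this sketch, sequences at the front of `ranked` are the ones that no class model explains well, which is one plausible reading of "statistically underrepresented"; the criterion used in the paper may differ.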
Added 01 Nov 2010
Updated 01 Nov 2010
Type Conference
Year 2003
Where SDM
Authors Slobodan Vucetic, Dragoljub Pokrajac, Hongbo Xie, Zoran Obradovic