We introduce a new EM framework in which it is possible to optimize not only the model parameters but also the number of model components. A key feature of our approach is that we...
Information extraction (IE) from semi-structured Web documents is a critical issue for information integration systems on the Internet. Previous work in wrapper induction aims to so...
Background: Biomedical and chemical databases are large and rapidly growing in size. Graphs naturally model such kinds of data. To fully exploit the wealth of information in these...
Abstract. A system of nested dichotomies is a hierarchical decomposition of a multi-class problem with c classes into c − 1 two-class problems and can be represented as a tree st...
To access the content of digital texts efficiently, it is necessary to provide more sophisticated access than keyword-based searching. Genescene provides biomedical researchers wi...
Gondy Leroy, Hsinchun Chen, Jesse D. Martinez, Sha...