Background: The statistical modeling of biomedical corpora could yield integrated, coarse-to-fine views of biological phenomena that complement discoveries made from analysis of m...
David M. Blei, K. Franks, Michael I. Jordan, I. Sa...
Classifier subset selection (CSS) from a large ensemble is an effective way to design multiple classifier systems (MCSs). Given a validation dataset and a selection criterion, the...
In TREC 2007, we participate in four tasks of the Blog and Enterprise tracks. We continue experiments using Terrier1 [14], our modular and scalable Information Retrieval (IR) plat...
David Hannah, Craig Macdonald, Jie Peng, Ben He, I...
In the intellectual property field two tasks are of high relevance: prior art searching and patent classification. Prior art search is fundamental for many strategic issues such as...
Douglas Teodoro, Julien Gobeill, Emilie Pasche, Di...
This paper proposes an OCR post-processing approach based on multi-knowledge, which integrates language knowledge and candidate distance information given by the OCR engine. In thi...