Searching and extracting meaningful information out of highly heterogeneous datasets is a hot topic that received a lot of attention. However, the existing solutions are based on e...
Large repositories of source code create new challenges and opportunities for statistical machine learning. Here we first develop Sourcerer, an infrastructure for the automated c...
Erik Linstead, Paul Rigor, Sushil Krishna Bajracha...
Topics in prior-art patent search are typically full patent applications and relevant items are patents often taken from sources in different languages. Cross language patent retr...
Background: The engineering of ontologies, especially with a view to a text-mining use, is still a new research field. There does not yet exist a well-defined theory and technolog...
Background: We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document ...