Machine learning techniques for data extraction from semistructured sources exhibit different precision and recall characteristics. However to date the formal relationship between...
Guizhen Yang, Saikat Mukherjee, I. V. Ramakrishnan
In this study I use statistical Natural Language Processing and adapted Controlled Language methods to preprocess individual documents before they are used as source documents for ...
Deduplication, a key operation in integrating data from multiple sources, is a time-consuming, labor-intensive and domainspecific operation. We present our design of alias that us...
In this paper, we present a new approach for word sense disambiguation (WSD) using an exemplar-based learning algorithm. This approach integrates a diverse set of knowledge source...