Integration of multiple heterogeneous data sources continues to be a critical problem for many application domains and a challenge for researchers world-wide. With the increasing ...
In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
Automated extraction of structured data from Web sources often leads to large heterogeneous knowledge bases (KB), with data and schema items numbering in the hundreds of thousands...
It is common for large and complex organizations to maintain repositories of business process models in order to document and to continuously improve their operations. Given suc...
Remco M. Dijkman, Marlon Dumas, Boudewijn F. van D...
We propose a generative model for automatic query reformulations from an initial query using the underlying subtopic structure of top ranked retrieved documents. We addre...
Debasis Ganguly, Johannes Leveling, Gareth J. F. J...