Leveraging information from relevance assessments has been proposed as an effective means for improving retrieval. We introduce a novel language modeling method which uses inform...
AS-Index is a new index structure for exact string search in disk resident databases. It uses hashing, unlike known alternate structures, tree or trie based, and indexes every n-g...
We propose a partitioning scheme for similarity search indexes that is called Maximal Metric Margin Partitioning (MMMP). MMMP divides the data on the basis of its distribution pat...
We describe a novel simple and highly scalable semi-supervised method called Word-Class Distribution Learning (WCDL), and apply it the task of information extraction (IE) by utili...
Yanjun Qi, Ronan Collobert, Pavel Kuksa, Koray Kav...
Parallel dataflow programming frameworks such as Map-Reduce are increasingly being used for large scale data analysis on computing clouds. It is therefore becoming important to a...