Abstract. This paper explores the possibility of using a modified Expectation-Maximization algorithm to estimate parameters for a simple hierarchical generative model for XML retr...
We describe a system for extracting mentions of terms such as company and product names, in a large and noisy corpus of documents, such as the World Wide Web. Since natural langua...
Einat Amitay, Rani Nelken, Wayne Niblack, Ron Siva...
Abstract. We investigate the potential of coherence-based scores to predict query difficulty. The coherence of a document set associated with each query word is used to capture the...
This paper describes BABYLON, a system that attempts to overcome the shortage of parallel texts in low-density languages by supplementing existing parallel texts with texts gather...
In this paper, we propose an approach for identifying curatable articles from a large document set. This system considers three parts of an article (title ract, MeSH terms, and ca...