In this article, we propose a special type of decision tree, called a decision cascade, for binarizing document images. Such images are produced by cameras, resulting in varying de...
ABSTRACT. In the framework of the LegDoc project at Xerox Research Centre Europe, we are developing components for the semantic annotation of semi-structured documents. While certa...
Representing documents by vectors that are independent of language enhances machine translation and multilingual text categorization. We use discriminative training to create a pr...
We propose a semantic tagger that provides high level concept information for phrases based on several kinds of low level information about words in clinical narrative texts. The ...
Internet search engines identify web pages that contain user-specified keywords, and then rank these pages according to their (heuristically assessed) relevance to the user’s qu...