The goal of the DARPA MADCAT (Multilingual Automatic Document Classification Analysis and Translation) Program is to automatically convert foreign language text images into Englis...
In this paper, we introduce a method for categorizing digital items according to their topic, only relying on the document's metadata, such as author name and title informati...
In this paper we analyze our recent research on the use of document analysis techniques for metadata extraction from PDF papers. We describe a package that is designed to extract ...
Bayesian text classifiers face a common issue which is referred to as data sparsity problem, especially when the size of training data is very small. The frequently used Laplacian...
In software development, many kinds of knowledge are shared and reused as software patterns. Howevel; the relation analysis among software by hand is on the large scale. In this w...