This paper presents a generic features selection method and its applications on some document analysis problems. The method is based on a genetic algorithm (GA), whose tness funct...
Indexes are a well established method of locating information in printed literature just as find is a popular technique when searching in digital documents. However, document reade...
Jennifer Pearson, George Buchanan, Harold W. Thimb...
The categorization of documents is traditionally topic-based. This paper presents a complementary analysis of research and experiments on genre to show that encouraging results ca...
We consider the problem of document conversion from the renderingoriented HTML markup into a semantic-oriented XML annotation defined by user-specific DTDs or XML Schema descrip...
A method is presented for segmenting documents into conceptually related areas. Determining the equivalence of text is often based on the number of word repetitions. This approach...