Sciweavers

ICPR
2010
IEEE

Text Separation from Mixed Documents Using a Tree-Structured Classifier

In this paper, we propose a tree-structured multiclass classifier to identify annotations and overlapping text in machine-printed documents. Each node of the tree-structured classifier is a binary weak learner. Unlike a standard decision tree (DT), which considers only a subset of the training data at each node and is susceptible to overfitting, the proposed tree is boosted using all of the training data at each node, with different weights. The proposed method is evaluated on a set of machine-printed documents that have been annotated by multiple writers in an office/collaborative environment.
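The idea of training every node on the full training set with per-node weights can be illustrated with a minimal sketch. The abstract does not specify the weak learner, the weighting scheme, or how the multiclass case is handled, so everything below is an assumption made for illustration: the sketch is binary (labels +1/-1), uses decision stumps as weak learners, and down-weights samples routed to the other branch by a hypothetical factor `off_branch` instead of discarding them. The names `Stump`, `Node`, `fit_stump`, `build_tree`, and `predict` are likewise hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Stump:
    """Binary weak learner: thresholds a single feature."""
    feature: int
    threshold: float
    polarity: int  # +1 means "predict +1 when x[feature] > threshold"

    def predict(self, x: Sequence[float]) -> int:
        return self.polarity if x[self.feature] > self.threshold else -self.polarity


@dataclass
class Node:
    stump: Stump
    pos: Optional["Node"] = None  # child followed when the stump outputs +1
    neg: Optional["Node"] = None  # child followed when the stump outputs -1


def fit_stump(X, y, w) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted error on ALL samples."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for pol in (1, -1):
                s = Stump(f, t, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if s.predict(xi) != yi)
                if err < best_err:
                    best, best_err = s, err
    return best, best_err


def build_tree(X, y, w: List[float], depth: int, off_branch: float = 0.1) -> Node:
    """Grow a tree in which each node is trained on every sample.

    Assumption of this sketch: samples that the parent routes to the other
    branch are kept, but their weight is multiplied by `off_branch`.
    """
    stump, err = fit_stump(X, y, w)
    node = Node(stump)
    if depth == 0 or err < 1e-12:
        return node
    for side in (1, -1):
        cw = [wi if stump.predict(xi) == side else wi * off_branch
              for xi, wi in zip(X, w)]
        total = sum(cw)
        cw = [c / total for c in cw]  # renormalise so the weights sum to 1
        child = build_tree(X, y, cw, depth - 1, off_branch)
        if side == 1:
            node.pos = child
        else:
            node.neg = child
    return node


def predict(node: Node, x: Sequence[float]) -> int:
    """Follow the stump outputs down the tree and return the last stump's label."""
    while True:
        out = node.stump.predict(x)
        child = node.pos if out == 1 else node.neg
        if child is None:
            return out
        node = child
```

Calling `build_tree(X, y, w, depth=1)` with uniform weights `w` returns a root `Node`, and `predict(root, x)` then labels a single sample. Because no sample is ever dropped, each child stump still sees the whole training set, which is the contrast with a conventional decision tree that the abstract draws.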
Xujun Peng, Srirangaraj Setlur, Venu Govindaraju, Ramachandrula Sitaram
Added 13 Feb 2011
Updated 13 Feb 2011
Type Journal
Year 2010
Where ICPR
Authors Xujun Peng, Srirangaraj Setlur, Venu Govindaraju, Ramachandrula Sitaram