We consider the problem of content extraction from online news webpages. To explore to what extent the syntactic markup and the visual structure of a webpage facilitate the extrac...
We consider the problem of improving named entity recognition (NER) systems by using external dictionaries--more specifically, the problem of extending state-of-the-art NER system...
Image anchor templates are used in document image analysis for document classification, data localization, and other tasks. Current tools allow human operators to mark out small s...
We describe a simple active learning heuristic which greatly enhances the generalization behavior of support vector machines (SVMs) on several practical document classification ta...
A novel online dynamic value system for machine learning is proposed in this paper. The proposed system has a dual network structure: data processing network (DPN) and information ...