Abstract. In this paper, we present a method for the automatic extraction of numerical fields (zip codes, phone numbers, etc.) from incoming mail documents. The approach is based o...
This paper discusses a methodology for applying general-purpose first-order inductive learning to extract information from Web documents structured as unranked ordered trees. The...
In this paper, we propose a machine learning approach to title extraction from general documents. By general documents, we mean documents that can belong to any one of a number of...
Yunhua Hu, Hang Li, Yunbo Cao, Dmitriy Meyerzon, Q...
Abstract. We present a hybrid machine learning approach for information extraction from unstructured documents by integrating a learned classifier based on the Maximum Entropy Mod...
This paper is concerned with a universal pattern, which is defined as a character pattern designed to have high machine-readability. This universal pattern is a character pattern...