The automatic extraction of information from unstructured sources has opened up new avenues for querying, organizing, and analyzing data by drawing upon the clean semantics of str...
There exist many interrelated information sources on the Internet that can be categorized into structured (database) and semistructured (documents). A key challenge is to integrat...
Although Web search engines have become information gateways to the Internet, for queries containing technical terms, search results often contain pages that are difficult to be ...
This paper proposes a new method for document transformation using OCR to generate various XML documents from printed documents. The proposed method adopts a hierarchical transfor...
: The heterogeneous and dynamic nature of components making up a web application, the lack of effective programming mechanisms for implementing basic software engineering principle...