We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
This paper presents a new, enhanced algorithm for extracting text from degraded document images based on probabilistic models. The observed document image is considered as a...
Libraries in South Asia hold huge collections of valuable printed documents in Urdu, and it is of interest to digitize these collections to make them more accessible. The unavail...
Archeological sites have heterogeneous information, including artifacts, image data, geospatial information, chronological data, and other relevant metadata. ETANA-DL,...
Naga Srinivas Vemuri, Ricardo da Silva Torres, Rao...
In the context of XML retrieval, this paper proposes a general methodology for managing structured queries (involving both content and structure) within any given st...