Abstract—Recent progress in research fields such as Information Extraction and Information Retrieval enables the creation of systems providing better search experiences to web u...
Gianluca Demartini, Claudiu S. Firan, Mihai George...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
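The per-line tag-ratio idea behind CETR can be illustrated with a minimal sketch. It assumes the ratio for a line is the number of non-tag characters divided by the number of HTML tags on that line, with tagless lines counted as having one tag. The regex tag matcher, the function names `tag_ratios` and `extract_content`, and the fixed `threshold` parameter are all hypothetical. The threshold is a simple stand-in for whatever selection step the full paper uses.

```python
import re

# Hypothetical, simplified tag matcher: it does not handle comments,
# scripts, or attribute values that contain '>' characters.
TAG_RE = re.compile(r"<[^>]*>")


def tag_ratios(html: str) -> list[float]:
    """Return a text-to-tag ratio for each line of an HTML document."""
    ratios = []
    for line in html.splitlines():
        tags = TAG_RE.findall(line)
        # Length of the line's text once all tags are removed.
        text_len = len(TAG_RE.sub("", line).strip())
        # Assumption: a line with no tags counts as having one tag,
        # which also avoids division by zero.
        ratios.append(text_len / max(len(tags), 1))
    return ratios


def extract_content(html: str, threshold: float) -> list[str]:
    """Keep the tag-stripped text of lines whose ratio exceeds threshold.

    A fixed threshold is a hypothetical stand-in for the paper's own
    selection procedure.
    """
    lines = html.splitlines()
    return [
        TAG_RE.sub("", line).strip()
        for line, ratio in zip(lines, tag_ratios(html))
        if ratio > threshold
    ]


html = (
    "<html>\n"
    "<div><a href='x'>Home</a></div>\n"
    "<p>This is the main article text of the page.</p>\n"
    "</html>"
)
print(tag_ratios(html))            # [0.0, 1.0, 21.0, 0.0]
print(extract_content(html, 5.0))  # ['This is the main article text of the page.']
```

Lines that are mostly markup, such as the navigation link, score low. Lines that are dense in prose score high, so even a crude cutoff separates them in this toy example.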
Abstract. In this paper we introduce BioPubMiner, a machine learning component-based platform for biomedical information analysis. BioPubMiner employs natural language processing t...
Document storage and retrieval capabilities of the CEDAR-FOX forensic handwritten document examination system are described. The system is designed for automated and semi-automate...
This paper proposes the Metadata Production Framework (MPF) as a common platform for generating content-based metadata. A lot of research on extracting useful information from aud...