Abstract. Poor quality data may be detected and corrected by performing various quality assurance activities that rely on techniques with different efficacy and cost. In this pape...
Lei Jiang, Daniele Barone, Alexander Borgida, John...
This paper proposes a model-based text line segmentation algorithm for machine-printed document images. The model is based on geometric configuration which uses the interline sp...
We tackle the problem of disambiguating entities on the Web. We propose a user-driven scheme where graphs of entities, represented by globally identifiable declarative artifacts ...
Hermann de Meer, Karl Aberer, Michael Jost, Parisa...
Systems based on statistical and machine learning methods have been shown to be extremely effective and scalable for the analysis of large amounts of textual data. However, in the r...
Manually annotated medical images can be compared either lexically, by matching keywords, or by using existing medical thesauri to calculate semantic simil...