Sciweavers

AND
2009
13 years 2 months ago
Accessing the content of Greek historical documents
In this paper, we propose an alternative method for accessing the content of Greek historical documents printed during the 17th and 18th centuries by searching words directly in d...
Anastasios L. Kesidis, Eleni Galiotou, Basilios Ga...
AND
2009
13 years 2 months ago
Tools for monitoring, visualizing, and refining collections of noisy documents
Developing better systems for document image analysis requires understanding errors, their sources, and their effects. The interactions between various processing steps are comple...
Daniel P. Lopresti, George Nagy
AND
2009
13 years 2 months ago
A comprehensive evaluation methodology for noisy historical document recognition techniques
In this paper, we propose a new comprehensive methodology in order to evaluate the performance of noisy historical document recognition techniques. We aim to evaluate not only the...
Nikolaos Stamatopoulos, Georgios Louloudis, Basili...
AINA
2009
IEEE
13 years 2 months ago
Document-Oriented Pruning of the Inverted Index in Information Retrieval Systems
Searching very large collections can be costly in both computation and storage. To reduce this cost, recent research has focused on reducing the size (pruning) of the inverted ind...
Lei Zheng, Ingemar J. Cox
ACL
2009
13 years 2 months ago
A Generative Blog Post Retrieval Model that Uses Query Expansion based on External Collections
User generated content is characterized by short, noisy documents, with many spelling errors and unexpected language usage. To bridge the vocabulary gap between the user's in...
Wouter Weerkamp, Krisztian Balog, Maarten de Rijke
SOFSEM
2010
Springer
13 years 2 months ago
Approximate Structural Consistency
We consider documents as words and trees on some alphabet and study how to compare them with some regular schemas on an alphabet. Given an input document I, we decide ...
Michel de Rougemont, Adrien Vieilleribière
KDD
2010
ACM
13 years 2 months ago
Document clustering via Dirichlet process mixture model with feature selection
One essential issue of document clustering is to estimate the appropriate number of clusters for a document collection to which documents should be partitioned. In this paper, we ...
Guan Yu, Ruizhang Huang, Zhaojun Wang
IRFC
2010
Springer
13 years 2 months ago
An Information Retrieval Model Based on Discrete Fourier Transform
Information Retrieval (IR) systems combine a variety of techniques stemming from logical, vector-space and probabilistic models. This variety of combinations has produced...
Alberto Costa, Massimo Melucci
ICPR
2010
IEEE
13 years 2 months ago
Learning Image Anchor Templates for Document Classification and Data Extraction
Image anchor templates are used in document image analysis for document classification, data localization, and other tasks. Current tools allow human operators to mark out small s...
Prateek Sarkar
ICML
2010
IEEE
13 years 2 months ago
Learning optimally diverse rankings over large document collections
Most learning to rank research has assumed that the utility of different documents is independent, which results in learned ranking functions that return redundant results. The fe...
Aleksandrs Slivkins, Filip Radlinski, Sreenivas Go...