Sciweavers

» Document normalization revisited
COLING
2000
15 years 12 days ago
A Method of Measuring Term Representativeness - Baseline Method Using Co-occurrence Distribution
This paper introduces a scheme, which we call the baseline method, to define a measure of term representativeness and measures defined by using the scheme. The representativeness ...
Toru Hisamitsu, Yoshiki Niwa, Jun-ichi Tsujii
SIGIR
2011
ACM
14 years 1 month ago
When documents are very long, BM25 fails!
We reveal that the Okapi BM25 retrieval function tends to overly penalize very long documents. To address this problem, we present a simple yet effective extension of BM25, namel...
Yuanhua Lv, ChengXiang Zhai
ICPR
2002
IEEE
16 years 4 days ago
Robust Text Detection from Binarized Document Images
Many document images are rich in color and have complex background. To detect text from them, a standard approach utilizes both color and binary information. This often leads to t...
Oleg Okun, Yu Yan, Matti Pietikäinen
ICPR
2010
IEEE
14 years 9 months ago
Text Separation from Mixed Documents Using a Tree-Structured Classifier
In this paper, we propose a tree-structured multiclass classifier to identify annotations and overlapping text from machine printed documents. Each node of the tree-structured cla...
Xujun Peng, Srirangaraj Setlur, Venu Govindaraju, ...
DOCENG
2010
ACM
14 years 8 months ago
Diffing, patching and merging XML documents: toward a generic calculus of editing deltas
This work addresses what we believe to be a central issue in the field of XML diff and merge computation: the mathematical modeling of so-called editing deltas and the study of their ...
Jean-Yves Vion-Dury