Sciweavers

Search results: Document normalization revisited
COLING 2000
A Method of Measuring Term Representativeness - Baseline Method Using Co-occurrence Distribution
This paper introduces a scheme, which we call the baseline method, to define a measure of term representativeness and measures defined by using the scheme. The representativeness ...
Toru Hisamitsu, Yoshiki Niwa, Jun-ichi Tsujii

SIGIR 2011, ACM
When documents are very long, BM25 fails!
We reveal that the Okapi BM25 retrieval function tends to overly penalize very long documents. To address this problem, we present a simple yet effective extension of BM25, namel...
Yuanhua Lv, ChengXiang Zhai
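For context on the claim above, here is a minimal sketch of the standard Okapi BM25 term weight (not the extension proposed in the paper, whose name is cut off in the snippet). It shows where document length enters the score. The function names, the defaults k1 = 1.2 and b = 0.75, and the non-negative IDF variant log(1 + (N - df + 0.5) / (df + 0.5)) are illustrative assumptions, not taken from the paper.

```python
import math


def bm25_term_weight(tf: int, doc_len: int, avg_doc_len: float,
                     df: int, num_docs: int,
                     k1: float = 1.2, b: float = 0.75) -> float:
    """Contribution of one query term to a document's Okapi BM25 score.

    Assumption: uses the non-negative IDF variant
    log(1 + (N - df + 0.5) / (df + 0.5)); k1 and b are typical defaults.
    """
    idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
    # Length normalization: documents longer than average get a larger
    # denominator, which shrinks the term-frequency component.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)


def bm25_score(query_terms, doc_terms, avg_doc_len, doc_freq, num_docs,
               k1=1.2, b=0.75):
    """Sum the term weights of each distinct query term found in the document.

    Hypothetical helper: query-term frequency is ignored for simplicity.
    """
    score = 0.0
    doc_len = len(doc_terms)
    for term in set(query_terms):
        tf = doc_terms.count(term)
        if tf > 0:
            score += bm25_term_weight(tf, doc_len, avg_doc_len,
                                      doc_freq[term], num_docs, k1, b)
    return score


if __name__ == "__main__":
    # Same term frequency (tf = 1) in increasingly long documents:
    # the term's contribution keeps falling as length grows.
    for length in (100, 1_000, 10_000, 100_000):
        w = bm25_term_weight(tf=1, doc_len=length, avg_doc_len=500.0,
                             df=10, num_docs=1_000)
        print(f"doc_len={length:>7}: weight={w:.4f}")
```

With b > 0, the normalization term grows linearly with doc_len, so a matching term in a very long document contributes only a small positive amount. This is the length penalty the abstract refers to.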

ICPR 2002, IEEE
Robust Text Detection from Binarized Document Images
Many document images are rich in color and have complex backgrounds. To detect text from them, a standard approach utilizes both color and binary information. This often leads to t...
Oleg Okun, Yu Yan, Matti Pietikäinen

ICPR 2010, IEEE
Text Separation from Mixed Documents Using a Tree-Structured Classifier
In this paper, we propose a tree-structured multiclass classifier to identify annotations and overlapping text from machine printed documents. Each node of the tree-structured cla...
Xujun Peng, Srirangaraj Setlur, Venu Govindaraju, ...

DOCENG 2010, ACM
Diffing, patching and merging XML documents: toward a generic calculus of editing deltas
This work addresses what we believe to be a central issue in the field of XML diff and merge computation: the mathematical modeling of so-called editing deltas and the study of their ...
Jean-Yves Vion-Dury