Sciweavers

Search results for "Document Processing with LinkIT"
EDBT
2006
ACM
16 years 4 months ago
Indexing Shared Content in Information Retrieval Systems
Abstract. Modern document collections often contain groups of documents with overlapping or shared content. However, most information retrieval systems process each document separa...
Andrei Z. Broder, Nadav Eiron, Marcus Fontoura, Mi...
ICDAR
2009
IEEE
15 years 11 months ago
High Performance Chinese/English Mixed OCR with Character Level Language Identification
Currently, there have been several high performance OCR products for Chinese or for English. However, no one OCR technique can be simultaneously fit for both the English and the C...
Kai Wang, Jianming Jin, Qingren Wang
SIGIR
2006
ACM
15 years 10 months ago
Near-duplicate detection by instance-level constrained clustering
For the task of near-duplicated document detection, both traditional fingerprinting techniques used in database community and bag-of-word comparison approaches used in information...
Hui Yang, James P. Callan
PKDD
1998
Springer
15 years 8 months ago
Text Mining at the Term Level
Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on...
Ronen Feldman, Moshe Fresko, Yakkov Kinar, Yehuda ...
DAS
2006
Springer
15 years 8 months ago
Finding the Best-Fit Bounding-Boxes
The bounding-box of a geometric shape in 2D is the rectangle with the smallest area in a given orientation (usually upright) that completely contains the shape. The best-fit bounding...
Bo Yuan, Leong Kwoh, Chew Lim Tan