Sciweavers

804 search results for "Text Segmentation Based on Similarity between Words" (page 80 of 161)
ICDAR
1997
IEEE
15 years 2 months ago
Representing OCRed documents in HTML
OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...
Tao Hong, Sargur N. Srihari
PAKDD
2009
ACM
15 years 4 months ago
Clustering Documents Using a Wikipedia-Based Concept Representation
This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document clustering. We first create a concept-based document representation b...
Anna Huang, David N. Milne, Eibe Frank, Ian H. Wit...
LREC
2010
14 years 11 months ago
How Large a Corpus Do We Need: Statistical Method Versus Rule-based Method
We investigate the impact of input data scale in corpus-based learning using a study style of Zipf's law. In our research, Chinese word segmentation is chosen as the study ca...
Hai Zhao, Yan Song, Chunyu Kit
ICIW
2009
IEEE
14 years 7 months ago
Detecting Ontology Mappings via Descriptive Statistical Methods
Instance-based ontology mapping comprises a collection of theoretical approaches and applications for identifying the implicit semantic similarities between two ontologies on the ...
Konstantin Todorov
STACS
1992
Springer
15 years 2 months ago
Speeding Up Two String-Matching Algorithms
We show how to speed up two string-matching algorithms: the Boyer-Moore algorithm (BM algorithm), and its version called here the reverse factor algorithm (RF algorithm). The RF al...
Maxime Crochemore, Thierry Lecroq, Artur Czumaj, L...