Sciweavers

ICDAR
2011
IEEE

Chinese Keyword Spotting Using Knowledge-Based Clustering

Content-based document image retrieval is a new and promising research area. Indexing documents directly from image content, without OCR, is more general and convenient. However, content-based retrieval of Chinese documents is difficult because of the complexity of Chinese character structure and the large number of character classes. Few papers address this problem, which is the focus of this paper. The paper presents a novel knowledge-based clustering algorithm and a serial batch clustering mechanism for large data sets. The knowledge is derived from an artificial document image collection: high-frequency Chinese characters are edited and synthesized into images automatically. Cluster IDs are then used to index the characters. A Dream of Red Mansions, a famous work of classical Chinese literature containing nearly one million characters, is used to evaluate Chinese keyword spotting performance. Experimental results confirm the effectiveness of knowledge-based clustering and its application on Chinese keywo...
Yong Xia, Kuanquan Wang, Mingwei Li
Added 24 Dec 2011
Updated 24 Dec 2011
Type Journal
Year 2011
Where ICDAR
Authors Yong Xia, Kuanquan Wang, Mingwei Li
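
The abstract says that cluster IDs are used to index characters for keyword spotting, but it does not describe the features or the clustering algorithm. The following is only a minimal sketch of that indexing idea under assumptions that are not from the paper: each character image is already reduced to a small feature vector, the "knowledge" is a set of hypothetical reference centroids (standing in for clusters built from synthesized character images), and characters are assigned by plain nearest-centroid matching. All function names are illustrative.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_cluster(feature: Vector, centroids: Dict[int, Vector]) -> int:
    """Return the ID of the centroid nearest to the given feature vector.

    Assumption: nearest-centroid assignment stands in for the paper's
    knowledge-based clustering, which the abstract does not specify.
    """
    return min(centroids, key=lambda cid: squared_distance(feature, centroids[cid]))


def index_characters(features: List[Vector], centroids: Dict[int, Vector]) -> List[int]:
    """Map a sequence of character features to a sequence of cluster IDs."""
    return [assign_cluster(f, centroids) for f in features]


def spot_keyword(doc_ids: List[int], query_ids: List[int]) -> List[int]:
    """Return every start position where the query ID sequence occurs in the document."""
    n = len(query_ids)
    if n == 0:
        return []
    return [i for i in range(len(doc_ids) - n + 1) if doc_ids[i:i + n] == query_ids]


# Hypothetical toy data: three reference centroids and five document characters.
centroids = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0)}
document = [(0.1, 0.0), (0.9, 0.1), (0.1, 0.9), (0.9, -0.1), (0.0, 1.1)]
keyword = [(1.0, 0.1), (0.1, 1.0)]

doc_ids = index_characters(document, centroids)
query_ids = index_characters(keyword, centroids)
positions = spot_keyword(doc_ids, query_ids)
```

In this sketch both the document and the query keyword are mapped to cluster ID sequences, so spotting reduces to exact subsequence matching on integers rather than comparing images directly. A real system would likely need to tolerate assignment errors, which this example does not attempt.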