Web-page classification is much more difficult than pure-text classification due to a large variety of noisy information embedded in Web pages. In this paper, we propose a new Web...
With the recent developments in ePaper technology, consumer eBook readers have display qualities and form factors that are approaching those of traditional books. These eBook reade...
The National Taiwan University Library has built a digital library of historical documents about Taiwan. The content is unique in that it covers about 80% of all primary Chinese hi...
In this paper, we propose a robust approach for recognition of text embedded in natural scenes. Instead of using binary information as most other OCR systems do, we extract featur...
Jing Zhang, Xilin Chen, Andreas Hanneman, Jie Yang...
In text categorization, term weighting methods assign appropriate weights to the terms to improve the classification performance. In this study, we propose an effective term weigh...