The goal of the DARPA MADCAT (Multilingual Automatic Document Classification Analysis and Translation) Program is to automatically convert foreign language text images into Englis...
In this paper, we propose GermanPolarityClues, a new publicly available lexical resource for sentiment analysis for the German language. While sentiment analysis and polarity clas...
Classification of documents by genre is typically done either using linguistic analysis or term frequency based techniques. The former provides better classification accuracy than...
Topics in prior-art patent search are typically full patent applications and relevant items are patents often taken from sources in different languages. Cross language patent retr...
This paper proposes an algorithm called Imprecise Spectrum Analysis (ISA) to carry out fast dimension reduction for document classification. ISA is designed based on the one-sided...
Hu Guan, Bin Xiao, Jingyu Zhou, Minyi Guo, Tao Yan...