High dimensional structured data such as text and images is often poorly understood and misrepresented in statistical modeling. The standard histogram representation suffers from ...
This paper presents a method for learning a distance metric from relative comparison such as “A is closer to B than A is to C”. Taking a Support Vector Machine (SVM) approach,...
Separating machine printed text and handwriting from overlapping text is a challenging problem in the document analysis field and no reliable algorithms have been developed thus f...
Abstract. We describe a semantic clustering method designed to address shortcomings in the common bag-of-words document representation for functional semantic classification tasks....
Abstract. We consider the problem of predicting how a user will continue a given initial text fragment. Intuitively, our goal is to develop a “tab-complete” function for natura...