Most traditional text clustering methods are based on a "bag of words" (BOW) representation built from frequency statistics over a set of documents. BOW, however, ignores the ...
Jian Hu, Lujun Fang, Yang Cao, Hua-Jun Zeng, Hua L...
In most IR clustering problems, we directly cluster the documents, working in the document space, using cosine similarity between documents as the similarity measure. In many real...
Plankton form the base of the food chain in the ocean and are fundamental to marine ecosystem dynamics. The rapid mapping of plankton abundance together with taxonomic and size com...
Xiaoou Tang, W. Kenneth Stewart, He Huang, Scott M...
We describe a new method for performing a nonlinear form of Principal Component Analysis. By the use of integral operator kernel functions, we can efficiently compute principal comp...
For every string inclusion relation there are two optimization problems: find a longest string included in every string of a given finite language, and find a shortest string in...