This paper describes an exploratory, qualitative study of a process for extracting, identifying and exploiting an enterprise's implicit (less visible) web communities using l...
Handwritten text lines are prominent structures in freeform digital ink notes, and their reliable detection is the foundation for a natural and intelligent interface for note editin...
Ming Ye, Herry Sutanto, Sashi Raghupathy, Chengyan...
Text segmentation is important for text analysis, while text alignment aims to determine shared sub-topics among similar documents. Multi-task text segmentation and alignment is the...
We study the usability of linguistic features in the Web spam classification task. The features were computed on two Web spam corpora, Webspam-Uk2006 and Webspam-Uk2007; we make t...
We present a document analysis system able to assign logical labels and extract the reading order in a broad set of documents. All information sources, from geometric features and ...