Accurate web page classification often depends crucially on information gained from neighboring pages in the local web graph. Prior work has exploited the class labels of nearby p...
We propose an unsupervised method for detecting spam documents from Web page data, based on equivalence relations on strings. We propose 3 measures for quantifying the alienness (...
We present a user interface, the OntoRefiner1 system, for helping the user to navigate numerous retrieved documents after a search querying a semantic portal which integrates a ver...
Text segmentation is important for text analysis, while text alignment is to determine shared sub-topics among similar documents. Multi-task text segmentation and alignment is the...
Marking up queries with annotations such as part-of-speech tags, capitalization, and segmentation, is an important part of many approaches to query processing and understanding. D...