With the wide adoption of XML as a standard data representation and exchange format, querying XML documents becomes increasingly important. However, relational database systems co...
We propose an unsupervised method for detecting spam documents from Web page data, based on equivalence relations on strings. We propose 3 measures for quantifying the alienness (...
Traditionally, popular synonym acquisition methods are based on the distributional hypothesis, and a metric such as Jaccard coefficients is used to evaluate the similarity between...
Abstract. Given a binary dominance relation on a set of alternatives, a common thread in the social sciences is to identify subsets of alternatives that satisfy certain notions of ...
Dorothea Baumeister, Felix Brandt, Felix Fischer, ...
This paper considers the problem of identifying on the Web compound documents (cDocs) ? groups of web pages that in aggregate constitute semantically coherent information entities...