Stemming algorithms find canonical forms for inflected words, e. g. for declined nouns or conjugated verbs. Since such a unification of words with respect to gender, number, time, ...
Content classification performed by end users is spreading through the web. Most of the work done so far is related to the hypermedia web. In spite of that, there is a growing mas...
Considering a broad definition for service contracts (beyond web services and software, e.g. airline tickets and insurance policies), we tackle the challenges of building a high ...
Vertical partitioning is a well-known technique for optimizing query performance in relational databases. An extreme form of this technique, which we call vectorization, is to sto...
Peter Buneman, Byron Choi, Wenfei Fan, Robert Hutc...
Duplicate URLs have brought serious troubles to the whole pipeline of a search engine, from crawling, indexing, to result serving. URL normalization is to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...