This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
Microdata protection is a hot topic in the field of Statistical Disclosure Control, which has gained special interest after the disclosure of 658000 queries by the America Online...
The lack of a large scale Chinese test collection is an obstacle to the Chinese information retrieval development. In order to address this issue, we built such a collection compos...
In many design tasks it is difficult to explicitly define an objective function. This paper uses machine learning to derive an objective in a feature space based on selected examp...
Spam pages on the web use various techniques to artificially achieve high rankings in search engine results. Human experts can do a good job of identifying spam pages and pages wh...