Hundreds of millions of users each day submit queries to the Web search engine. The user queries are typically very short which makes query understanding a challenging problem. In...
This work presents a new comparison-based diagnosis model and a new algorithm, called Hi-Dif, based on this model. The algorithm is used for checking the integrity of systems with...
This paper considers the problem of identifying on the Web compound documents (cDocs) ? groups of web pages that in aggregate constitute semantically coherent information entities...
Abstract. Automated language identification of written text is a wellestablished research domain that has received considerable attention in the past. By now, efficient and effecti...
We present a novel approach for classifying documents that combines different pieces of evidence (e.g., textual features of documents, links, and citations) transparently, through...
Adriano Veloso, Wagner Meira Jr., Marco Cristo, Ma...