We formulate and study search algorithms that consider a user’s prior interactions with a wide variety of content to personalize that user’s current Web search. Rather than re...
Word form normalization through lemmatization or stemming is a standard procedure in information retrieval because morphological variation needs to be accounted for and several la...
This paper presents an extensive study about the evolution of textual content on the Web, which shows how some new pages are created from scratch while others are created using al...
Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation, is an important part of query processing and understanding in ...
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...