We propose a novel approach to find aliases of a given name from the web. We exploit a set of known names and their aliases as training data and extract lexical patterns that conv...
Photo community sites such as Flickr and Picasa Web Album host a massive number of personal photos, with millions of new photos uploaded every month. These photos constitute an ove...
Liangliang Cao, Jie Yu, Jiebo Luo, Thomas S. Huang
In a traditional information retrieval system, it is assumed that queries can be posed about any topic. In reality, a large fraction of web queries are posed about a relatively sm...
This paper presents WordRank, a new page ranking system, which exploits similarity between interconnected pages. WordRank introduces the model of the ‘biased surfer’ which is ...
The presence of replicas or near-replicas of documents is very common on the Web. Documents may be replicated completely or partially for different reasons (versions, mirrors, etc...
Ernesto Di Iorio, Michelangelo Diligenti, Marco Go...