Combating Web spam is one of the greatest challenges for Web search engines. State-of-the-art anti-spam techniques focus mainly on detecting various spam strategies, such as ...
Chao Wei, Yiqun Liu, Min Zhang, Shaoping Ma, Liyun...
We propose a novel approach that identifies Web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
This paper studies a new query on uncertain data, called the k-selection query. Given an uncertain dataset of N objects, where each object is associated with a preference score and a p...
Xingjie Liu, Mao Ye, Jianliang Xu, Yuan Tian, Wang...
Versioned textual collections are collections that retain multiple versions of a document as it evolves over time. Important large-scale examples are Wikipedia and the web collect...
Several initiatives to establish standards for metadata models are currently under way, but each focuses on its own requirements when defining metadata attri...