Graphs are widely used to model real world objects and their relationships, and large graph datasets are common in many application domains. To understand the underlying character...
Yuanyuan Tian, Richard A. Hankins, Jignesh M. Pate...
Duplicate URLs have brought serious troubles to the whole pipeline of a search engine, from crawling, indexing, to result serving. URL normalization is to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
How can we automatically spot all outstanding observations in a data set? This question arises in a large variety of applications, e.g. in economy, biology and medicine. Existing ...
Many applications in surveillance, monitoring, scientific discovery, and data cleaning require the identification of anomalies. Although many methods have been developed to iden...
In the past few years there has been increased research interest in detecting previously unidentified events from Web resources. Our focus in this paper is to detect events from ...