Physical database design is important for query performance in a shared-nothing parallel database system, in which data is horizontally partitioned among multiple independent node...
Jun Rao, Chun Zhang, Nimrod Megiddo, Guy M. Lohman
We present a novel anytime version of partitional clustering algorithms, such as k-Means and EM, for time series. The algorithm works by leveraging off the multi-resolution property...
Jessica Lin, Michail Vlachos, Eamonn J. Keogh, Dim...
We consider a collaboration of peers autonomously crawling the Web. A pivotal issue when designing a peer-to-peer (P2P) Web search engine in this environment is query rou...
Sebastian Michel, Matthias Bender, Peter Triantafi...
Estimating the number of distinct elements in a large multiset has several applications, and hence has attracted active research in the past two decades. Several sampling and sket...
Major media companies such as The Financial Times, the Wall Street Journal or Reuters generate huge amounts of textual news data on a daily basis. Mining frequent patterns in this...