WWW
2010
ACM

Mind the data skew: distributed inferencing by speeddating in elastic regions

Semantic Web data exhibits highly skewed term-frequency distributions. Efficient large-scale distributed reasoning methods should maintain load balance despite this skew in the input data. We show that term-based partitioning, used by most distributed reasoning approaches, has limited scalability due to load-balancing problems. We address this problem with a method for data distribution based on clustering in elastic regions. Instead of being assigned to fixed peers, data flows semi-randomly through the network. Data items “speed-date” while temporarily collocated on the same peer. We introduce a bias in the routing to allow semantically clustered neighborhoods to emerge. Our approach is self-organising and efficient, and it does not require any central coordination. We have implemented this method on the MaRVIN platform and have performed experiments on large real-world datasets, using a cluster of up to 64 nodes. We compute the RDFS closure over differ...
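The load-balancing claim in the abstract can be illustrated with a small simulation. The sketch below is not the MaRVIN implementation and does not model speed-dating or biased routing. It only compares how evenly items spread over peers when each item is sent to a peer chosen by hashing its term, versus a peer chosen uniformly at random. The Zipf-like weights (1/rank), the synthetic term names, and all function names are assumptions made for illustration.

```python
import random
import zlib
from collections import Counter


def zipf_terms(num_items, num_terms, rng):
    """Draw item terms with frequency proportional to 1/rank (assumed skew)."""
    terms = [f"term{rank}" for rank in range(1, num_terms + 1)]
    weights = [1.0 / rank for rank in range(1, num_terms + 1)]
    return rng.choices(terms, weights=weights, k=num_items)


def imbalance(loads, num_peers):
    """Ratio of the busiest peer's load to the mean load (1.0 is perfect)."""
    mean = sum(loads.values()) / num_peers
    return max(loads.values()) / mean


def compare(num_items=100_000, num_terms=1_000, num_peers=64, seed=0):
    rng = random.Random(seed)
    items = zipf_terms(num_items, num_terms, rng)
    # Term-based partitioning: a term always maps to the same peer,
    # so every occurrence of a frequent term lands on one machine.
    by_term = Counter(zlib.crc32(t.encode()) % num_peers for t in items)
    # Random placement: each item goes to a uniformly chosen peer.
    by_random = Counter(rng.randrange(num_peers) for _ in items)
    return imbalance(by_term, num_peers), imbalance(by_random, num_peers)


if __name__ == "__main__":
    term_ratio, random_ratio = compare()
    print(f"term-based: {term_ratio:.2f}, random: {random_ratio:.2f}")
```

Under these assumptions the peer that owns the most frequent term carries several times the mean load, while random placement stays close to even. Purely random placement gives up the collocation that reasoning needs, which is the gap the biased routing described in the abstract is meant to close.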
Spyros Kotoulas, Eyal Oren, Frank van Harmelen
Added: 14 May 2010
Updated: 14 May 2010
Type: Conference
Year: 2010
Where: WWW
Authors: Spyros Kotoulas, Eyal Oren, Frank van Harmelen