Sciweavers

ECIR
2004
Springer

Performance Analysis of Distributed Architectures to Index One Terabyte of Text

We simulate different architectures of a distributed Information Retrieval system on a very large Web collection, in order to work out the optimal setting for a particular set of resources. We analyse the effectiveness of a distributed, replicated and clustered architecture using a variable number of workstations. A collection of approximately 94 million documents and 1 terabyte of text is used to test the performance of the different architectures. We show that in a purely distributed architecture, the brokers become the bottleneck due to the high number of local answer sets to be sorted. In a replicated system, the network is the bottleneck due to the high number of query servers and the continuous data interchange with the brokers. Finally, we demonstrate that a clustered system will outperform a replicated system if a large number of query servers is used, mainly due to the reduction of the network load.
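To make the broker bottleneck concrete, the following is a minimal sketch of the merge step a broker performs on the local answer sets it receives. The abstract does not specify the merging algorithm or the result format, so the function name `merge_answer_sets`, the `(score, doc_id)` pairs, and the sample scores are illustrative assumptions rather than details from the paper.

```python
import heapq
from itertools import islice


def merge_answer_sets(local_answer_sets, k):
    """Hypothetical broker-side merge of per-server ranked lists into a global top-k.

    Each local answer set is assumed to be a list of (score, doc_id) pairs
    already sorted by descending score, as a query server might return it.
    """
    # heapq.merge does a lazy k-way merge; with reverse=True it expects
    # every input to be sorted from largest to smallest.
    merged = heapq.merge(*local_answer_sets, key=lambda pair: pair[0], reverse=True)
    return list(islice(merged, k))


# Three hypothetical query servers, each indexing a disjoint part of the collection.
server_results = [
    [(0.91, "d17"), (0.40, "d03")],
    [(0.88, "d52"), (0.75, "d48"), (0.10, "d60")],
    [(0.95, "d71")],
]

top3 = merge_answer_sets(server_results, k=3)
print(top3)  # [(0.95, 'd71'), (0.91, 'd17'), (0.88, 'd52')]
```

In this sketch the broker's work grows with the number of local answer sets it has to receive and merge, which is one plausible reading of why adding query servers shifts the load onto the brokers in a purely distributed setting.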
Fidel Cacheda, Vassilis Plachouras, Iadh Ounis
Added: 30 Oct 2010
Updated: 30 Oct 2010
Type: Conference
Year: 2004
Where: ECIR
Authors: Fidel Cacheda, Vassilis Plachouras, Iadh Ounis