Sciweavers

HPDC
2010
IEEE

Reshaping text data for efficient processing on Amazon EC2

Text analysis tools must now process increasingly large corpora, which are often organized as small files (abstracts, news articles, etc.). Cloud computing offers a convenient, on-demand, pay-as-you-go computing environment for solving such problems. We investigate provisioning on the Amazon EC2 cloud from the user perspective, attempting to provide a scheduling strategy that is both timely and cost-effective. We rely on the empirical performance of the application of interest on smaller subsets of the data to construct an execution plan. A first goal of our performance measurements is to determine an optimal file size for our application to consume. Using the subset-sum first-fit heuristic, we reshape the input data by merging files to match the desired file size as closely as possible. This also speeds up retrieval of the application's results, because the output is less segmented. Using predictions of the performance of our application based on m...
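The file-merging step mentioned in the abstract can be illustrated with a minimal first-fit sketch. This is not the authors' implementation: the function name `first_fit_merge`, the example file names and sizes, and the choice to process files in their given order are all assumptions made for illustration. Each file is placed into the first group whose running total stays at or below the target size, and a new group is opened when none fits.

```python
from typing import Dict, List


def first_fit_merge(file_sizes: Dict[str, int], target_size: int) -> List[List[str]]:
    """Group files so that each group's total size stays within target_size.

    Hypothetical helper: files are visited in dictionary order and placed
    into the first group with enough remaining room (first fit).
    A file larger than target_size ends up alone in its own group.
    """
    groups: List[List[str]] = []  # file names per merged group
    totals: List[int] = []        # running total size per group

    for name, size in file_sizes.items():
        for i, total in enumerate(totals):
            if total + size <= target_size:
                groups[i].append(name)
                totals[i] += size
                break
        else:
            # No existing group has room, so open a new one.
            groups.append([name])
            totals.append(size)

    return groups


# Illustrative sizes (arbitrary units), not data from the paper.
sizes = {"a.txt": 40, "b.txt": 70, "c.txt": 30, "d.txt": 60, "e.txt": 20}
groups = first_fit_merge(sizes, target_size=100)
print(groups)
```

Each resulting group would then be concatenated into one input file. Sorting the files by decreasing size before the loop (first fit decreasing) is a common variant that tends to fill groups more tightly; whether the paper uses it is not stated in the abstract.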
Gabriela Turcu, Ian T. Foster, Svetlozar Nestorov
Added 09 Nov 2010
Updated 09 Nov 2010
Type Conference
Year 2010
Where HPDC