Data parallelism in bioinformatics workflows using Hydra

Large-scale bioinformatics experiments are usually composed of a set of data flows generated by a chain of activities (programs or services) that may be modeled as scientific workflows. Current Scientific Workflow Management Systems (SWfMS) are used to orchestrate these workflows and to control and monitor the whole execution. Bioinformatics experiments very commonly process very large datasets, so data parallelism is a common approach to increasing performance and reducing overall execution time. However, most current SWfMS still lack support for parallel execution in high performance computing (HPC) environments. Additionally, keeping track of provenance data in distributed environments is still an open, yet important, problem. Recently, the Hydra middleware was proposed to bridge the gap between SWfMS and HPC environments by providing a transparent way for scientists to parallelize workflow executions while capturing distributed provenance. This paper analy...
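The abstract does not describe how Hydra distributes work internally, so the following is only a minimal sketch of the general idea of data parallelism it refers to. The input dataset is split into fragments, the same activity runs on every fragment concurrently, and one provenance entry is recorded per fragment. This is not Hydra's API. The activity (`reverse_complement`), the `ProvenanceRecord` fields, and every function name are hypothetical, and a thread pool on one machine stands in for the nodes of an HPC cluster.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    # Hypothetical stand-in for one workflow activity (e.g. a bioinformatics program).
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class ProvenanceRecord:
    # Hypothetical per-fragment provenance: which fragment, how many inputs,
    # which worker ran it, and how long it took.
    fragment_id: int
    n_inputs: int
    worker: str
    elapsed_s: float


def split_into_fragments(items, n_fragments):
    # Contiguous chunks, so concatenating the outputs preserves input order.
    if not items:
        return []
    size = -(-len(items) // n_fragments)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_fragment(fragment_id, fragment):
    # Apply the same activity to every item of one fragment and record provenance.
    start = time.perf_counter()
    outputs = [reverse_complement(s) for s in fragment]
    record = ProvenanceRecord(
        fragment_id=fragment_id,
        n_inputs=len(fragment),
        worker=threading.current_thread().name,
        elapsed_s=time.perf_counter() - start,
    )
    return outputs, record


def data_parallel(items, n_workers=4):
    # Data parallelism: same activity, different data fragments, run concurrently.
    fragments = split_into_fragments(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(run_fragment, i, f) for i, f in enumerate(fragments)]
        results = [f.result() for f in futures]  # collected in fragment order
    outputs = [o for outs, _ in results for o in outs]
    provenance = [rec for _, rec in results]
    return outputs, provenance


if __name__ == "__main__":
    seqs = ["ATGC", "GGCA", "TTAA", "CCGT", "AACG"]
    out, prov = data_parallel(seqs, n_workers=2)
    print(out)
    for rec in prov:
        print(rec)
```

Contiguous fragments keep the merged output in input order, and the per-fragment records show one simple way provenance could be gathered alongside parallel execution. A real middleware would also have to ship fragments to remote nodes and collect their provenance centrally, which this single-process sketch does not attempt.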
Added 09 Nov 2010
Updated 09 Nov 2010
Type Conference
Year 2010
Where HPDC
Authors Fábio Coutinho, Eduardo S. Ogasawara, Daniel de Oliveira, Vanessa P. Braganholo, Alexandre A. B. Lima, Alberto M. R. Dávila, Marta Mattoso