Sciweavers

BMCBI
2011

A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines

Background: Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or ‘workflow’, is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results: To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python (‘PaPy’). A PaPy workflow is created from reusable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either l...
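The idea of a workflow as nested higher-order map functions evaluated over pooled resources can be illustrated with a minimal sketch. This is not PaPy's actual API; the stage functions (`parse`, `gc_content`) and the `run_pipeline` helper are hypothetical, showing only how successive element-wise stages compose into a pipeline that can run sequentially or on a process pool.

```python
from multiprocessing import Pool

# Hypothetical pipeline stages (illustrative, not PaPy's API):
# each is a plain function applied element-wise to the data stream.
def parse(record):
    # Split a toy "id:sequence" record into its parts.
    name, seq = record.split(":")
    return name, seq

def gc_content(item):
    # Fraction of G/C bases in the sequence.
    name, seq = item
    gc = sum(base in "GC" for base in seq) / len(seq)
    return name, round(gc, 2)

def run_pipeline(records, stages, processes=None):
    """Apply each stage as a map over the stream; chaining the maps
    mirrors the 'nested higher-order map' composition described above."""
    if processes:
        # Parallel evaluation on a pool of worker processes.
        with Pool(processes) as pool:
            data = list(records)
            for stage in stages:
                data = pool.map(stage, data)
            return data
    # Sequential evaluation via lazily nested maps.
    data = records
    for stage in stages:
        data = map(stage, data)
    return list(data)

if __name__ == "__main__":
    records = ["r1:ATGC", "r2:GGCC"]
    print(run_pipeline(records, [parse, gc_content]))
    # [('r1', 0.5), ('r2', 1.0)]
```

Because each stage is a side-effect-free function of one item, the same pipeline definition runs unchanged on one core or many; this is the property that makes the DFP mapping natural for bioinformatic task chains.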
Marcin Cieslik, Cameron Mura
Added 12 May 2011
Updated 12 May 2011
Type Journal
Year 2011
Where BMCBI
Authors Marcin Cieslik, Cameron Mura