Secure genome processing in public cloud and HPC environments

Aligning next-generation sequencing data requires significant compute resources. HPC and cloud systems can provide sufficient compute capacity, but they do not offer the data security guarantees that patient genomes require. HPC environments are typically designed for many groups of trusted users and often enforce only minimal security, while cloud environments are mostly controlled by untrusted third parties. In this work we present a scalable pipeline approach that enables the use of public cloud and HPC environments while better protecting patient privacy. The applied techniques include mixing noise data into the input, cryptography, and a MapReduce program for parallel processing of the data.
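As a rough illustration of how noise data and cryptography can be combined around an untrusted map step, consider the minimal sketch below. It is not the pipeline presented in this work. All names (`mask_reads`, `untrusted_map`, `unmask`), the use of HMAC-SHA256 tags, the decoy ratio, and the toy exact-match "alignment" are assumptions made purely for illustration.

```python
import hashlib
import hmac
import secrets


def tag(key: bytes, read_id: str) -> str:
    """Keyed tag for a read id; only the key holder can recompute it."""
    return hmac.new(key, read_id.encode(), hashlib.sha256).hexdigest()


def mask_reads(reads, key, decoys_per_read=2):
    """Trusted side: mix real reads with random decoy reads (the noise).

    Real reads carry an HMAC tag, decoys carry a random tag of the same
    length, so the two look alike without the key (hypothetical scheme).
    """
    rng = secrets.SystemRandom()
    batch = []
    for i, seq in enumerate(reads):
        batch.append((tag(key, f"r{i}"), seq))
        for _ in range(decoys_per_read):
            decoy = "".join(rng.choice("ACGT") for _ in range(len(seq)))
            batch.append((secrets.token_hex(32), decoy))
    rng.shuffle(batch)  # hide which positions hold real reads
    return batch


def untrusted_map(batch, reference):
    """Untrusted side: toy stand-in for alignment (exact substring search).

    Each record is processed independently, so this step could be run as
    the map phase of a MapReduce job.
    """
    return [(t, reference.find(seq)) for t, seq in batch]


def unmask(results, key, n_reads):
    """Trusted side: keep only results whose tag matches a real read."""
    by_tag = dict(results)
    return [by_tag[tag(key, f"r{i}")] for i in range(n_reads)]
```

In this sketch the untrusted environment sees real and decoy reads in shuffled order and cannot tell them apart by tag, while the key holder discards the decoy results afterwards. A real deployment would also have to make the decoy sequences statistically plausible, which uniformly random bases are not.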