Loughborough University

Zeroing memory deallocator to reduce checkpoint sizes in virtualized HPC environments

journal contribution
posted on 2018-09-28, 15:24 authored by Ramy Gad, Simon Pickartz, Tim Suss, Lars Nagel, Stefan Lankes, Antonello Monti, Andre Brinkmann
Virtualization has become an indispensable tool in data centers and cloud environments to flexibly assign virtual machines (VMs) to resources. Virtualization is also becoming increasingly attractive for high-performance computing (HPC). This is mainly due to the strong isolation of VMs, which enables: (1) the sharing of cluster nodes and optimization of the system’s overall utilization; (2) load balancing by means of migrations due to the reduction of residual dependencies; and (3) the creation of system-level checkpoints, increasing the fault tolerance in an application-transparent way. On the downside, the additional virtualization layer conceals information that is only available on the process level. This information has a direct influence on the checkpoint size, which should be kept as small as possible. In this paper, we propose a novel technique for checkpoint size reduction in virtualized environments. We exploit the fact that the hypervisor detects zero pages, which are omitted when capturing a checkpoint. Moreover, compression techniques are applied for a further reduction of the checkpoint size. We therefore fill freed memory regions with zeros, which supports both the zero-page detection and the compression. We evaluate our approach using HPC applications as examples. The results reveal a reduction of the checkpoint size by up to 9% when compression is disabled in the hypervisor and up to 49% with compression enabled. Furthermore, memory zeroing is able to reduce VM migration time by up to 10% when compression is disabled and by up to 60% when compression is enabled.
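
The idea in the abstract can be illustrated with a small toy model. The sketch below is not the authors' implementation and does not reproduce their measurements. It assumes a 4096-byte page size, models guest memory as a byte array, and uses zlib as a stand-in for whatever compression the hypervisor applies. The names `free_region`, `checkpoint_size`, and `make_memory` are hypothetical helpers introduced only for this illustration.

```python
import random
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes


def make_memory(n_pages: int, seed: int = 0) -> bytearray:
    """Build toy guest memory filled with non-zero pseudo-random bytes."""
    rng = random.Random(seed)
    return bytearray(rng.randrange(1, 256) for _ in range(n_pages * PAGE_SIZE))


def free_region(memory: bytearray, start: int, length: int, zero: bool) -> None:
    """Toy deallocator: a plain free leaves stale data, a zeroing free clears it."""
    if zero:
        memory[start:start + length] = bytes(length)


def is_zero_page(page) -> bool:
    """A page counts as a zero page only if every byte in it is zero."""
    return page.count(0) == len(page)


def checkpoint_size(memory, compress: bool = False) -> int:
    """Bytes a toy checkpoint needs: zero pages are skipped, others stored."""
    total = 0
    for off in range(0, len(memory), PAGE_SIZE):
        page = memory[off:off + PAGE_SIZE]
        if is_zero_page(page):
            continue  # models the hypervisor omitting zero pages
        total += len(zlib.compress(bytes(page))) if compress else len(page)
    return total


if __name__ == "__main__":
    stale = make_memory(8)
    zeroed = bytearray(stale)
    # Free four whole pages plus half of the following page.
    start, length = 2 * PAGE_SIZE, 4 * PAGE_SIZE + PAGE_SIZE // 2
    free_region(stale, start, length, zero=False)
    free_region(zeroed, start, length, zero=True)
    for name, mem in (("plain free", stale), ("zeroing free", zeroed)):
        print(name, checkpoint_size(mem), checkpoint_size(mem, compress=True))
```

In this model, fully freed pages disappear from the checkpoint regardless of compression, while the partially freed page shrinks only when compression is enabled. That is consistent with the abstract's observation that the benefit is larger with compression. In a real system the zeroing would happen inside the memory allocator of the guest application, which this sketch does not attempt to show.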

Funding

This research and development was supported by the Federal Ministry of Education and Research (BMBF) under Grant 01IH13004 (Project FAST) and Grant 01IH16010B (Project Envelope).

School

  • Science

Department

  • Computer Science

Published in

Journal of Supercomputing

Citation

GAD, R. ... et al, 2018. Zeroing memory deallocator to reduce checkpoint sizes in virtualized HPC environments. The Journal of Supercomputing, 74 (11), pp.6236–6257.

Publisher

© Springer

Version

  • AM (Accepted Manuscript)

Publisher statement

This is a post-peer-review, pre-copyedit version of an article published in Journal of Supercomputing. The final authenticated version is available online at: https://doi.org/10.1007/s11227-018-2548-6.

Acceptance date

2018-08-25

Publication date

2018

ISSN

0920-8542

eISSN

1573-0484

Language

  • en