Germany, Belgium, France, and back again
Abstract
The EC-funded project HPC4U developed a Grid fabric that provides not only SLA-awareness but also a software-only system for checkpointing sequential and MPI-parallel jobs. This allows jobs to complete and SLAs to be met even when resources fail. Checkpoints are generated in the background, transparently to the user. Applications need not be modified in any way or executed in a special manner. Checkpoint data sets can be migrated to other cluster systems, where job execution resumes. This paper focuses on job migration over the Grid, using the WS-Agreement protocol for SLA negotiation and mechanisms provided by the Globus Toolkit.