Service level agreement aware resource management

Hovestadt, Matthias

dc.contributor.author	Hovestadt, Matthias
dc.date.accessioned	2025-04-09T07:09:57Z
dc.date.available	2025-04-09T07:09:57Z
dc.date.issued	2006
dc.description.abstract	Next Generation Grids aim to attract commercial users to employ Grid environments for their business-critical compute jobs. These customers demand contractually fixed service quality levels that ensure results are available in time. In this context, a Service Level Agreement (SLA) is a powerful instrument for defining a comprehensive requirement profile. Numerous research projects worldwide already focus on integrating SLA technology into Grid middleware components such as broker services. However, focusing solely on Grid middleware services is not sufficient. Grid middleware services may accept compute jobs from customers, but they have to realize them by means of local resource management systems (RMS). Current RMS offer only best-effort service, which also limits the service quality level that the Grid middleware service is able to provide. This thesis describes the architecture and operation of an SLA-aware resource management system that allows Grid middleware components to negotiate SLAs. The system uses its internal mechanisms for application-transparent fault tolerance to ensure the terms of these SLAs even in case of resource outages. The main parts of this work focus on scheduling aspects and strategies for ensuring SLA compliance, and on design aspects of the implementation. Scheduling strategies significantly determine the level of fault tolerance that the system is able to provide. After presenting the requirements of Grid middleware components on service quality and describing the operation phases of an SLA-aware resource management system, the thesis describes intra-cluster scheduling strategies. Here, the system uses only its own resources and mechanisms to cope with resource outages. To further increase the level of fault tolerance, strategies for cross-border migration are presented.
Besides migration to other cluster systems in the same administrative domain, the system also uses Grid resources as migration targets. To ensure a successful restart, mechanisms for describing the compatibility profile of a checkpointed job are presented. The concept of the SLA-aware resource management system has been implemented within the EC-funded project HPC4U. We describe design aspects of this realization and show results from system deployments at use-case customers.
dc.identifier.uri	https://bibliographie.hs-hannover.de/handle/hsh/27148
dc.language.iso	en
dc.publisher.place	Paderborn
dc.title	Service level agreement aware resource management
dc.type	Qualifikationsschrift (qualification thesis)
dc.type.thesis	Dissertation
dcterms.accessRights	open access
dspace.entity.type	Publication
hsh.citavi_tags	Fakultät IV; open_access
hsh.creator_hsh	Hovestadt, Matthias
hsh.openAccess.status	yes
hsh.publisher.peerreviewed	Unknown
relation.isAuthorOfPublication	8a491a6d-70aa-474d-b209-55e299568064
relation.isAuthorOfPublication.latestForDiscovery	8a491a6d-70aa-474d-b209-55e299568064
