Dynamically Resilient Systems



Introduction

In the future, important social and economic services will be delivered to users in a personalised way by pervasive, network-enabled information systems. The architectures of such systems will be open to the addition and modification of components, and will have to assimilate change in a predictable way without dysfunction. But how resilient will they be when the environment, components or infrastructure change, whether in a crisis (the "surge response") or through normal evolution? Some technologies already exist to help us build resilient systems, but they cannot ensure resilience in this volatile environment because they require fixed (static) levels of redundancy to deal with specific failures identified during design. This collaboration will investigate and develop the new, still embryonic, technologies needed to build systems that are free to evolve dynamically yet remain predictably resilient.

Approach

System architects should be able to design and validate open, dynamic component-based systems that achieve predictable dynamic resilience through run-time architecture adaptation governed by resilience policies and triggered by trustworthy metadata. Fundamental advances are needed in several areas in order to realise this vision. We are working on the following points:

Dynamic Resilience Mechanisms (DRMs, design time): These are architectural patterns that use run-time information to maintain resilience through adaptation, e.g. by dynamically composing a satisfactory service from lower-specification components. DRMs are generic (not application-specific) but are realised in application-specific designs as resilience policies. In our scenario, the pattern allowing dynamic selection and parallel composition of services to maintain availability is an example of such a mechanism. So far, relatively few DRMs have been identified, and only on an ad hoc basis; we have no systematic way of describing and reasoning about such mechanisms during design.
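To make the parallel-composition example concrete, here is a minimal Python sketch. It assumes replicas fail independently, so a parallel composition is up whenever at least one replica is up, giving availability 1 - prod(1 - a_i). The function names and the greedy selection strategy are illustrative assumptions, not part of the project's design.

```python
from math import prod


def parallel_availability(availabilities: list[float]) -> float:
    """Availability of a parallel composition: the service is up if at least
    one replica is up. Assumes replicas fail independently."""
    return 1.0 - prod(1.0 - a for a in availabilities)


def compose_for_availability(candidates: dict[str, float], target: float) -> list[str]:
    """Hypothetical DRM step: select candidate services, most available first,
    until their parallel composition meets the target availability.
    Returns an empty list if even all candidates together fall short."""
    chosen: list[str] = []
    for name, _ in sorted(candidates.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(name)
        if parallel_availability([candidates[n] for n in chosen]) >= target:
            return chosen
    return []
```

For example, `compose_for_availability({"s1": 0.9, "s2": 0.8, "s3": 0.7}, 0.95)` composes two lower-specification services to exceed the target, which neither meets alone. Because choosing the k most available candidates minimises the product of failure probabilities for any fixed k, the greedy loop returns the smallest number of candidates that meets the target.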

Predictable Resilience by Policies (run-time): The architect must be able to define application-specific resilience policies that implement dynamic resilience mechanisms. Policies should be amenable to formal analysis before deployment, to confirm that they will achieve the resilience properties required by the application. This in turn implies that the system architecture and the resilience policy must be expressed formally enough to give confidence in analyses of whether a particular adaptation is viable.
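Formal pre-deployment analysis of a policy could take many forms. As a much simpler stand-in, the sketch below exhaustively enumerates every scenario with up to `max_failures` failed components, applies a hypothetical "replace failed replicas from a spare pool" policy, and reports the scenarios in which the availability requirement is not met. It assumes independent failures and treats availability as the only resilience property; all names are illustrative.

```python
from itertools import combinations
from math import prod


def availability(replicas: list[str], avail: dict[str, float]) -> float:
    """Parallel-composition availability, assuming independent failures."""
    return 1.0 - prod(1.0 - avail[r] for r in replicas)


def replace_failed_policy(active: list[str], failed: set[str], spares: list[str],
                          avail: dict[str, float], target: float) -> list[str]:
    """Hypothetical policy: drop failed replicas, then add working spares
    (most available first) until the target is met or spares run out."""
    survivors = [r for r in active if r not in failed]
    pool = sorted((s for s in spares if s not in failed),
                  key=lambda s: avail[s], reverse=True)
    for spare in pool:
        if availability(survivors, avail) >= target:
            break
        survivors.append(spare)
    return survivors


def check_policy(active: list[str], spares: list[str], avail: dict[str, float],
                 target: float, max_failures: int) -> list[tuple[str, ...]]:
    """Enumerate every set of up to max_failures failed components and return
    those after which the policy's configuration misses the target."""
    components = list(active) + list(spares)
    violations = []
    for k in range(max_failures + 1):
        for failed in combinations(components, k):
            config = replace_failed_policy(active, set(failed), spares, avail, target)
            if availability(config, avail) < target:
                violations.append(failed)
    return violations
```

With two active replicas and two spares, each of availability 0.9, and a target of 0.98, the check finds no violations for up to two simultaneous failures but reports every three-failure scenario, since a single surviving replica cannot reach the target. Exhaustive enumeration grows combinatorially with the number of components, so it only scales to small configurations.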

Trustworthy Resilience Metadata: Resilience policies, executed at run time in an open and dynamic system, require appropriate metadata, i.e. information about the running system (its components, infrastructure and environment). For predictable dynamic resilience, we require metadata conveying functional information (e.g. pre/postconditions, represented by logical formulae or informal descriptions) and non-functional information (e.g. availability, represented by structured values) relevant to resilience. This metadata is continuously updated as the system runs.
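One way to hold both kinds of metadata is a per-component record. The sketch below is hypothetical: the field names, keeping pre/postconditions as plain strings, and the exponentially weighted moving average used to keep the availability estimate up to date are all assumptions chosen for illustration. The `source` field only hints at where trustworthiness checks could attach; the sketch does not verify anything.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ResilienceMetadata:
    """Hypothetical metadata record for one component."""
    component: str
    # Functional information: kept as text here (formulae or informal descriptions).
    preconditions: list[str] = field(default_factory=list)
    postconditions: list[str] = field(default_factory=list)
    # Non-functional information: estimated fraction of time the component is up.
    availability: float = 1.0
    updated_at: float = field(default_factory=time.time)
    source: str = "monitor"  # who supplied the latest update

    def record_probe(self, up: bool, alpha: float = 0.1, source: str = "monitor") -> None:
        """Fold one liveness probe into the availability estimate using an
        exponentially weighted moving average (alpha weights the new sample)."""
        sample = 1.0 if up else 0.0
        self.availability = (1 - alpha) * self.availability + alpha * sample
        self.updated_at = time.time()
        self.source = source
```

Each failed probe pulls the estimate down and each successful one pulls it back up, so a policy reading `availability` always sees a value reflecting recent behaviour rather than a figure fixed at design time.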

Service-Oriented Architecture supporting Reasoning and Adaptation Services: Architectures supporting dynamic resilience must include computation, reasoning and adaptation services that are expressive enough to operate on the metadata needed to implement the adaptation policies. These must be backed up by other services that perform component searches and enact adaptations with minimal disruption, as prescribed by the resilience policy.
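The sketch below illustrates how a component-search service and an adaptation service might cooperate: when fresh metadata shows that a bound component no longer meets a hypothetical availability threshold taken from the resilience policy, the adaptation service asks the registry for a replacement offering the same interface and rebinds only the affected client. All class and method names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Component:
    name: str
    interface: str       # service type offered (hypothetical naming scheme)
    availability: float  # taken from the component's resilience metadata


class Registry:
    """Hypothetical component-search service over resilience metadata."""

    def __init__(self) -> None:
        self._components: list[Component] = []

    def register(self, component: Component) -> None:
        self._components.append(component)

    def search(self, interface: str, min_availability: float,
               exclude: Iterable[str] = ()) -> list[Component]:
        """Return matching components, most available first."""
        excluded = set(exclude)
        matches = [c for c in self._components
                   if c.interface == interface
                   and c.availability >= min_availability
                   and c.name not in excluded]
        return sorted(matches, key=lambda c: c.availability, reverse=True)


class AdaptationService:
    """Hypothetical adaptation service enforcing a threshold policy."""

    def __init__(self, registry: Registry, threshold: float) -> None:
        self.registry = registry
        self.threshold = threshold
        self.bindings: dict[str, Component] = {}

    def bind(self, client: str, component: Component) -> None:
        self.bindings[client] = component

    def on_metadata_update(self, client: str, observed: float) -> Component:
        """React to a new availability observation for the client's component."""
        current = self.bindings[client]
        if observed >= self.threshold:
            return current  # policy satisfied: leave the binding alone
        candidates = self.registry.search(current.interface, self.threshold,
                                          exclude={current.name})
        if candidates:
            # Rebind only this client, so other clients are not disturbed.
            self.bindings[client] = candidates[0]
        return self.bindings[client]
```

If no suitable replacement is found, this sketch simply keeps the existing binding; a richer policy could instead fall back on a parallel composition of several lower-availability components.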

G. Di Marzo Serugendo
Feb 2009