UNIVERSITÀ DEGLI STUDI DI PISA
DIPARTIMENTO DI INGEGNERIA DELLA INFORMAZIONE: ELETTRONICA, INFORMATICA, TELECOMUNICAZIONI

Ph.D. Thesis in Electronic, Computer and Telecommunication Engineering
XI Cycle

MODELLING AND EVALUATION OF PHASED MISSION SYSTEMS

The candidate: Ivan Mura
Supervisor: Prof. Ing. Luca Simoncini
Co-supervisor: Dr. Andrea Bondavalli

JANUARY 1999
MODELLING AND EVALUATION OF PHASED MISSION SYSTEMS
Ivan Mura
Table of contents
Introduction ..................................................................................................................... I
PART I: Framework of the research
Chapter 1. Analytical modelling for dependability evaluation ....................................... 1
1.1. The role of analytical modelling in dependability analysis ......................... 1
1.2. Dependability modelling methodologies and tools...................................... 4
1.2.1. Combinatorial modelling techniques ............................................ 4
Fault-trees.................................................................................... 4
Reliability block diagrams .......................................................... 6
1.2.2. Markov chains ............................................................................... 7
1.2.3. Classical Petri nets ........................................................................ 11
Place-transition Petri nets............................................................ 11
Stochastic Petri nets .................................................................... 14
Generalised stochastic Petri nets ................................................. 16
1.2.4. Non Markovian Petri nets ............................................................. 17
Deterministic and stochastic Petri nets ....................................... 17
Markov regenerative stochastic Petri nets .................................. 21
Chapter 2. Phased mission systems................................................................................. 23
2.1. Peculiar characteristics of PMSs .................................................................. 24
2.2. Examples of PMSs ....................................................................................... 27
2.3. Dependability measures for PMSs ............................................................... 29
2.4. An application example ............................................................................... 31
Chapter 3. State-of-the-art and open issues in the analysis of PMSs.............................. 36
3.1. Literature survey .......................................................................................... 37
3.2. Comparison of the methods based on the Markov approach ....................... 43
3.3. Open issues and objectives of the thesis ...................................................... 47
PART II: New methodologies for the analysis of PMS dependability
Chapter 4. Hierarchical Markov models for PMSs ......................................................... 51
4.1. The modelling approach............................................................................... 53
4.1.1. Upper level model ......................................................................... 53
4.1.2. Lower level models ....................................................................... 54
4.2. Evaluation procedure ................................................................................... 59
4.3. Application example .................................................................................... 62
4.3.1. Modelling ...................................................................................... 62
4.3.2. Evaluation ..................................................................................... 68
4.4. Concluding remarks ..................................................................................... 70
Chapter 5. DSPN models for PMSs ................................................................................ 73
5.1. The modelling approach............................................................................... 74
5.2. Analytical solution ....................................................................................... 81
5.2.1. Linear PhN .................................................................................... 82
5.2.2. Tree-like PhN ................................................................................ 85
5.2.3. Deterministic transitions with marking dependent firing rate ...... 87
5.2.4. Sensitivity analysis of DSPN models of PMSs............................. 87
5.3. Evaluation procedure ................................................................................... 91
5.4. Application example .................................................................................... 93
5.4.1. Modelling ...................................................................................... 93
5.4.2. Evaluation ..................................................................................... 96
5.5. Concluding remarks ..................................................................................... 98
Chapter 6. MRSPN models for PMSs............................................................................. 100
6.1. The modelling approach............................................................................... 101
6.2. Analytical solution ....................................................................................... 102
6.2.1. The transient marking occupation probabilities ............................ 103
6.2.2. Analytical evaluation of PMS Reliability ..................................... 107
6.2.3. Specialising the general results ..................................................... 108
Constant phase duration .............................................................. 109
Markov chain subordinate processes .......................................... 110
General subordinate processes .................................................... 113
6.2.4. Sensitivity analysis of MRSPN models of PMSs ......................... 114
Sensitivity analysis of V( t ) ........................................................ 114
Sensitivity analysis of R ............................................................. 116
6.3. Evaluation procedure ................................................................................... 117
6.4. Application example .................................................................................... 118
6.4.1. Modelling ...................................................................................... 118
6.4.2. Evaluation ..................................................................................... 119
6.5. Concluding remarks ..................................................................................... 122
Chapter 7. Conclusions ................................................................................................... 124
References. ...................................................................................................................... 128
Appendix A. Laplace transform properties ..................................................................... 136
Appendix B. LSTs of the kernel matrix blocks .............................................................. 138
B.1. Exponential distribution .............................................................................. 138
B.2. Deterministic distribution ............................................................................ 139
B.3. Truncated exponential distribution.............................................................. 140
B.4. Uniform distribution .................................................................................... 141
B.5. Erlang distribution ....................................................................................... 142
Introduction
Nowadays, an increasing number of critical systems are controlled by digital processing
systems, to effectively manage complex operations and to ensure a quick reaction to
the occurrence of potentially dangerous events. As a consequence
of this pervasive deployment of computer systems, there is a growing need for depend-
able systems. The risk arising from the failures of the control system, in terms of both
human lives and economic losses, ought to be minimised. To this end, specific design
methodologies have been defined to support the early design phases of such systems, and
fault-tolerance techniques and tools have been developed [6, 13, 16, 42, 47, 50, 82, 85] to
provide the means to cope with unforeseen situations which may turn into catastrophic
failures of the controlled systems.
Though the utilisation of the aforementioned techniques can surely increase the level of
reliance that can be put on a system, it is not enough to guarantee that the system will be
adequately dependable during its whole life-cycle. Moreover, regulatory authorities impose high dependability targets for a wide variety of manufacturing,
production, and transportation systems, especially for all those systems whose malfunctioning may endanger human lives [40]. The assessment of dependability attributes, like
Reliability, Safety, and Availability, is thus a necessary step for the verification and vali-
dation of a proper design, to ensure that the dependability targets have been achieved and
that only adequately dependable systems come into operation.
Amongst the approaches commonly adopted to evaluate dependability attributes, analyt-
ical modelling has proven to be very useful and versatile [6, 8, 13, 41, 45-47, 50, 82, 85].
Modelling can be used for system assessment in all phases of the system life cycle. From
the early design phases, models show their usefulness and potential, allowing for the
comparison of different architectural and design solutions and for the selection of the
most suitable one. Moreover, the sensitivity analyses that can be carried out after modelling make it possible to identify dependability bottlenecks, thus highlighting problems in the design, and to identify the critical parameters (out of the many that are usually employed at
this stage), those to which the system is highly sensitive. Various methods and tools for
dependability modelling and analysis have been developed which provide support to the
analyst, during the phases of definition and evaluation of the models. Among these,
Markov based models [9, 15, 43, 49, 68, 85] and Petri nets [1, 2, 21, 22, 33, 36, 52, 53,
64, 65, 70, 84] have been widely accepted in the dependability community because of
their powerful representative capabilities, and the relatively cheap solution techniques,
and have been also integrated in many automated tools for the evaluation of system de-
pendability.
Many of the systems devoted to the control and management of critical activities have to
perform a series of tasks that must be accomplished in sequence. Their operational life
consists of a sequence of non-overlapping periods, called “phases”. A system that ex-
hibits such a phased behaviour is often called a phased mission system, abbreviated as
PMS hereafter. During a specific phase, a PMS is devoted to the execution of a particular
set of tasks, which may be completely different from the activities performed within
other phases. The performance and dependability requirements of a PMS may differ significantly from one phase to another. Moreover, during some phases the system may be
subject to a particularly stressing environment, thus experiencing dramatic increases in
the failure rate of its components. In order to accomplish its mission, a PMS may need to
change its configuration over time, to adopt the most suitable one with respect to the performance and dependability requirements of the phase currently being executed, or simply to be more resilient to a hazardous external environment.
Because of their deployment in critical applications, the dependability modelling and
analysis of PMSs is a task of primary relevance. However, a systematic and generally
applicable methodology for this purpose has not yet been established. The modelling of complex
systems always poses formidable problems, and PMSs do not represent an exception to
the rule. Moreover, their phased behaviour adds a further degree of complexity to the
analysis of these systems. Modelling a PMS can be a complicated task even within a
single phase; when a multiplicity of phases and the dependencies among them are to be
taken into account, additional difficulties are encountered, whichever modelling tools and
approaches are adopted.
This thesis aims at proposing new methodologies for the study of PMS dependability
related characteristics. The dissertation is divided into two main parts. The first part has the
role of defining the framework of the study. Chapter 1 introduces the motivations and the
goals of the system dependability analysis, and shortly presents the most commonly
employed methodologies and tools for dependability modelling and evaluation. The specific features that characterise PMSs with respect to other classes of systems are described in Chapter 2, which also gives some examples of PMSs that can be found in various application domains. The main dependability measures of interest for a PMS are also
defined in Chapter 2. Last, this same chapter introduces an ideal PMS application that is
used as the case of study throughout the entire dissertation, to provide a unified frame-
work for the application and comparison of the various approaches. A survey of the most
important works that have appeared in the literature for the study of PMSs is then pre-
sented in Chapter 3. We point out the advantages and the drawbacks of the proposed ap-
proaches, by using the application example defined in Chapter 2 as a benchmark to test
the applicability and efficiency of the various methodologies. The goals of the thesis are
then stated, with the aim of improving over the current limits of the state-of-the-art in the
dependability modelling of PMSs.
The second part of the thesis contains the original contributions to the dependability analysis of PMSs. It is organised into three chapters, 4, 5, and 6, which describe different
methods for the analysis of PMSs. The method presented in Chapter 4 is based on a hierarchy of Markov models, and introduces several novel aspects that address some of the
weaknesses of the methods in the literature. In Chapter 5 we present a different approach
for the dependability modelling of PMSs, based on the class of Deterministic and
Stochastic Petri Nets (DSPNs). The method takes advantage of the power and expressive-
ness of DSPNs to define very concise and easy-to-read models for PMSs. At the same
time, DSPN models of PMSs can be analytically solved with a very efficient procedure.
Chapter 5 also presents the analytical derivation of the sensitivity functions for the DSPN
model of a PMS. The last methodology we present in Chapter 6 of this thesis has the
specific goal of enlarging the class of PMSs that can be analytically solved. We propose
the class of Markov Regenerative Stochastic Petri Nets (MRSPNs) as a modelling tool
for PMSs, and we derive a general, exact, and efficient analytical solution method.
Finally, a summary and some concluding remarks are given in Chapter 7, where the ob-
tained results are reviewed and discussed with respect to the fulfilment of the purposes of
the investigation. Some properties of the mathematical tools used throughout the thesis
and the derivation of some analytical expressions are presented in two short appendices.
The research presented throughout this thesis has been conducted in collaboration with
my tutor Prof. Luca Simoncini, from the Department of Information Engineering of the
University of Pisa, and my co-tutor Dr. Andrea Bondavalli, from the CNUCE Institute of
the Italian National Research Council, to whom I am deeply thankful for the invaluable
support and for the time we enjoyed working together. I also wish to thank Prof. Kishor
Shridharbhai Trivedi and Dr. Xinju Zang from the Department of Electrical and
Computer Engineering of Duke University, Durham, North Carolina, and
Ing. Manuela Nelli, previously a member of the Pisa Dependable Computing Center
research group, for the precious hints and fruitful discussions that contributed to the
fundamental results of this dissertation. I am also indebted to Dr. Felicita Di
Giandomenico, from the institute IEI of the Italian National Research Council, for her
support and the encouraging appreciation expressed to me during the whole Ph.D. course.
PART I:
Framework of the research
Chapter 1
Analytical modelling for dependability evaluation
1.1 The role of analytical modelling in dependability analysis
The term dependability has been introduced in the literature to define that characteristic
of a system that allows a justifiable reliance to be put in the services the system provides
[45, 46]. A system can be classified as dependable only with respect to the specific re-
quirements stated in the system specification for the service it must provide. A service
compliant with the system specification is called a proper service; one that is not is
called an improper service. Since various types of requirements can be prescribed, dependability is in fact a unifying concept encompassing a number of distinct system characteristics.
The following are some attributes of dependability. Reliability is a measure of the continuous delivery of correct service; Safety is the non-occurrence of catastrophic consequences; Availability is a measure of the delivery of correct service with respect to the
alternation of correct and incorrect service; Security is the non-occurrence of unauthorised
access. Other dependability-related attributes are also often defined [9, 45, 46, 55, 85].
Performability attributes originate from a combination of performance and dependability
metrics, by taking into account performance in degraded system states. Integrity is de-
fined as the avoidance of improper alterations of system service (information provided by
the system). Confidentiality means the non-occurrence of unauthorised disclosures of
system service. Maintainability is the ability to undergo repairs and evolution.
Testability is the ability to test for certain attributes within the system.
A system is hopefully built to satisfy all the requirements prescribed for the service it
provides, in other words, to be perfect. However, for complex real systems, it is practi-
cally impossible to give absolute guarantees that the system will properly perform during
its whole life-cycle. This is because faults may randomly affect system components and
eventually lead to unforeseen situations, in which system requirements can no longer be fulfilled and improper services are delivered.
The deployment of fault-tolerance techniques is a useful means to limit the potential neg-
ative effects of faults [6, 13, 16, 42, 47, 50, 82, 85]. Redundant components increase the
robustness of the system against faults, thus increasing its level of dependability. Faults
can be detected, their effects masked, and even removed from the system. However,
whatever degree of redundancy is employed, it is still possible, however unlikely, for a
system to experience a failure scenario that overwhelms its fault-tolerance capabilities.
Moreover, the introduction of redundant components and of their management policies further increases the complexity of the system, and exposes new targets to
the negative influences of faults. This poses significant problems for critical systems in
charge of managing valuable resources, such as production plants, means of transport, or
human lives. For this class of systems, malfunctioning may lead to catastrophic consequences.
The evaluation of dependability attributes can be used for fault-forecasting [5, 9, 13, 15,
32, 41, 47, 74, 85], that is, to probabilistically estimate the occurrence of faults and their
impact on the ability of the system to provide a proper service. Hence, the dependability
evaluation represents a suitable means to verify the adequacy of a system design with re-
spect to the requirements given in its specification. System assessment can be performed
using several approaches like testing, fault injection and analytical models, often com-
bined together [48].
The analytical modelling approach is generally cheap for manufacturers and has proven to
be useful and versatile in all the phases of the system life cycle. During the design phase,
models make it possible to compare different solutions, to select the most suitable one (among
those obeying other design constraints), and to highlight problems within the design. This
early validation of the concepts and architectural choices avoids wasting time and resources before realising whether the system fulfils its objectives or needs some re-design.
In assessing an already built system, models make it possible to characterise specific aspects, to detect dependability bottlenecks, and to suggest solutions to be adopted in future releases.
The modelling also represents an effective tool to foresee the effects of system mainte-
nance operations, and of possible changes or upgrades of the system configuration.
Models of complex systems usually involve many parameters, whose meaning is not
always intuitive for designers, and whose values must be determined (usually by way of
experimental tests), which may be very difficult. Moreover, obtaining
values for all the parameters can be impossible during the preliminary design phases of
the system. The sensitivity analyses that can be carried out after modelling make it possible to identify the critical parameters out of the many that are employed, that is, those parameters to
which the system is highly sensitive. Indeed, sensitivity analysis makes it possible to evaluate a
range of possible system scenarios by varying the values of model parameters, so as to determine the trends in the consequent variations of dependability figures. On one hand,
this helps in understanding the accuracy level that must be attained while estimating the
parameter values. Since even slight variations of critical parameter values may result in
relevant changes of system dependability attributes, a thorough calibration of such pa-
rameters is necessary to increase the level of confidence that can be put on the depend-
ability evaluation itself. On the other hand, it is possible to point out which parts of the
system are most sensitive to the parameter variations, and to adopt responses such as
the deployment of adequate fault-tolerance techniques to achieve a proper level of de-
pendability.
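As a minimal numeric illustration of such a sensitivity analysis (the single-component model and all parameter values here are assumptions of this sketch, not taken from the text), consider a component with constant failure rate λ, so that R(t) = e^(−λt) and the sensitivity function is ∂R/∂λ = −t·e^(−λt):

```python
import math

def reliability(lam, t):
    """Reliability of a single component with constant failure rate lam."""
    return math.exp(-lam * t)

def sensitivity(lam, t):
    """Sensitivity of R(t) to lam: the derivative dR/dlam = -t * exp(-lam*t)."""
    return -t * math.exp(-lam * t)

lam, t = 1e-4, 1000.0                 # illustrative values: failures/hour, hours
base = reliability(lam, t)
# First-order prediction of the effect of a 10% increase in lam ...
approx_delta = 0.1 * lam * sensitivity(lam, t)
# ... versus the exact variation obtained by re-evaluating the model
exact_delta = reliability(1.1 * lam, t) - base
```

The larger |∂R/∂λ| is at the operating point, the more accurately λ must be estimated for the computed dependability figure to be trustworthy.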
Modelling also presents several problems that require special care. The first
is complexity: although building models of simple mechanisms may be
easy, the overall description of critical complex systems, accounting at the same time for
all the relevant aspects, is far from trivial. To master complexity, a modelling methodology
is needed so that only the relevant aspects are detailed, while still allowing numerical results
to be effectively computable: for instance, if a model specifies too many details, the
amount of related information may explode, giving rise to processing problems. In addition, simplifying hypotheses are very often necessary to keep the model manageable, and the
choice of such hypotheses is critical. Making assumptions, on one hand, yields
simpler models but, on the other, leads to approximations of the system behaviour. The
resulting error should always be estimated, either through sensitivity analyses or by comparing the results returned by the model containing the assumption with those of a model in which it has been relaxed.
1.2 Dependability modelling methodologies and tools
In the following, we briefly describe the main characteristics of the various classes of
modelling methodologies that have been developed over the last decades to provide dependability engineers with support tools for defining and solving models. A distinction can
be made between methodologies that employ combinatorial models, like Fault-trees and
Reliability block diagrams, and those based on state-space-oriented representations, such
as Markov chains and Petri net models. The combinatorial approaches offer extremely
simple and intuitive methods for the construction and solution of models. However,
they are inadequate to deal with systems that exhibit complex dependencies among components, and cannot deal with repairable systems. Therefore, their popularity has been
decreasing in favour of the more powerful state-space-based approaches. Many of
the existing modelling techniques are supported by automated tools like SURF-2 [54],
SHARPE [73], SPNP [25], GreatSPN [20], UltraSAN [76], TimeNET [37], PANDA [4],
for the assisted construction and solution of dependability models.
1.2.1 Combinatorial modelling techniques
In this section we present two of the most commonly used combinatorial modelling ap-
proaches, Fault-trees and Reliability block diagrams.
Fault-trees
Fault-trees are a deductive modelling and analysis technique based on the study of the
events that may impair the dependability of a system [53]. A Fault-tree considers the
combination of events that may lead to an undesirable situation of the system, typically
the delivery of an improper service, if the Reliability of the system is the subject of the
analysis, or a catastrophic failure for a Safety study. Fault-trees are fruitfully employed
during the first design phases of a system, in that they are able to represent at a higher abstraction level the various failure scenarios that can affect the dependability figures of the
system.
A Fault-tree is formed by a series of layers of events, connected through logic gates that
represent the two Boolean operators AND and OR. The outgoing event of a gate is obtained by combining the incoming events according to the logic expressed by the gate itself. The undesirable event that represents the object of the evaluation occupies the root
of the Fault-tree. The construction of a Fault-tree model follows a top-down approach.
Starting from the root of the Fault-tree, each event is decomposed as a logical combination
of a set of simpler events, until a set of elementary events is found that are not further de-
composed. An event can be classified as elementary either because its occurrence proba-
bility is known, or because it is not possible to carry on the decomposition.
Figure 1.1 shows an example of a Fault-tree, where the failure of the system is the result
of the failure of the components A, B, and C. The event D is an intermediate failure event
representing the failure of the subsystem formed by the two components A and B.
[Figure 1.1: An example of Fault-tree. The root event "System failure" is the OR of the failure of component C and the intermediate event D, which is the AND of the failure events "Component A fails" and "Component B fails".]
The analysis of a Fault-tree model for the evaluation of the probability of occurrence of
the root event is based on the computation of the set of cuts. A cut is defined as a set of
elementary events that, according to the logic expressed by the Fault-tree, leads to the oc-
currence of the root event. For instance, for the example given in Figure 1.1 the set {A,B}
is a cut of the Fault-tree. The numerical solution of the Fault-Tree is performed by com-
puting the probability of occurrence for each of the cuts, and by combining those proba-
bilities to estimate the probability of the root event.
Among the possible cuts of a Fault-tree, the minimal cuts provide additional information
on the dependability characteristics of the modelled system. A cut is said to be minimal if
it does not contain any other cut. The two sets {A,B} and {C} are the minimal cuts for
the example. With the analysis of the minimal cuts, it is possible to identify the simplest
combinations of elementary events that bring the system to failure. In particular, it is
possible to discover whether the system has single-points of failure, that is minimal cuts
formed by a single element, which may represent dependability bottlenecks.
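The bottom-up enumeration of cuts and minimal cuts described above can be sketched as follows (an illustrative sketch only; the tuple encoding of gates and elementary events is an assumption of this example, not notation from the thesis):

```python
from itertools import product

def cut_sets(node):
    """Return the cut sets of a fault-tree node as frozensets of
    elementary-event names."""
    kind = node[0]
    if kind == "event":          # elementary event: a singleton cut
        return [frozenset([node[1]])]
    children = [cut_sets(child) for child in node[1:]]
    if kind == "OR":             # any cut of any input causes the output event
        return [c for sets in children for c in sets]
    if kind == "AND":            # one cut from each input, taken together
        return [frozenset().union(*combo) for combo in product(*children)]
    raise ValueError(f"unknown gate {kind!r}")

def minimal_cut_sets(node):
    """A cut is minimal if it does not contain any other cut."""
    cuts = set(cut_sets(node))
    return {c for c in cuts if not any(other < c for other in cuts)}

# The tree of Figure 1.1: root = OR(D, C), with D = AND(A, B)
tree = ("OR", ("AND", ("event", "A"), ("event", "B")), ("event", "C"))
mcs = minimal_cut_sets(tree)                        # {{A, B}, {C}}
spof = {e for c in mcs if len(c) == 1 for e in c}   # single points of failure
```

For the example, the minimal cuts are {A, B} and {C}, and the size-one cut {C} exposes component C as a single point of failure.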
Reliability block diagrams
Reliability block diagrams [78] are especially meant to evaluate the Reliability-related
characteristics of systems composed of multiple components connected in series or in
parallel. The logical relations and the interconnections among components are modelled in
a quite intuitive way.
Each system component is modelled as an atomic entity, graphically represented as a box.
The various components are connected among them in a configuration that can represent
the temporal order with which the system uses the components, or some redundancy
management scheme, or the success criteria of the system. Each system component is as-
signed a probability of working, or the complete time-dependent Reliability function. The
goal is to derive the probability or the time-dependent Reliability of correct operation for
the overall system.
For instance, the Reliability block diagram shown in Figure 1.2 represents a system that
uses six components to accomplish its duties. Some of the components are arranged in a
parallel combination, such as components A, D, and F, and others in series, such as
components A and B.
The solution of a Reliability block diagram model is similar to that of a Fault-tree. It is
in fact easier to compute the Unreliability of the modelled system by considering
the cuts of the model. For a Reliability block diagram, a cut is easily identified as a set of
components whose removal causes the left end of the diagram to be disconnected from
the right end. The overall Unreliability of the system is evaluated from that of the single
cuts. Approximate solution techniques have been defined [34], which provide upper and
lower bounds on the exact results, based on the analysis of only the minimal cuts of the
Reliability block diagram model.
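For purely series-parallel structures, the diagram solution reduces to two closed-form composition rules: blocks in series all have to work (R = ∏ R_i), while blocks in parallel fail only if every one of them fails (R = 1 − ∏(1 − R_i)). A minimal sketch, where the component reliabilities are illustrative values and not taken from this chapter:

```python
from math import prod

def series(*rs):
    """Series composition: the system works only if every block works."""
    return prod(rs)

def parallel(*rs):
    """Parallel (redundant) composition: the system works if at least one
    block works, i.e. it fails only if all blocks fail."""
    return 1.0 - prod(1.0 - r for r in rs)

# A hypothetical series-parallel arrangement: two series pairs, replicated
r = parallel(series(0.90, 0.95), series(0.85, 0.90))   # about 0.966
```

Diagrams that are not series-parallel (such as bridge structures) do not decompose this way, and require the cut-based analysis described above.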
[Figure 1.2: Example of Reliability block diagram, showing six components A–F in a series-parallel arrangement.]
1.2.2 Markov chains
Markov chains represent a suitable tool for the modelling of a variety of systems of different nature. They combine extreme versatility with well-developed and efficient solution algorithms, which have been implemented in many automated tools for the evaluation
of performance and dependability related attributes.
Markov chains are a state based formalism, very close to that of the automata. The set of
possible states of a Markov chain is called the state space, and is denoted by S . If the
state space S is a discrete or countable set, we are properly talking of a Markov chain,
and its cardinality is denoted by C . If the state space is a continuous set, then the term
Markov process is more appropriate. A state change of a Markov chain is called a state
transition.
Rigorously speaking, a Markov chain is a particular stochastic process { X ( t ),t ≥ 0 },
which enjoys the following memoryless property, also known as the Markov property:
Definition 1.1: (Markov property) The probability of any particular future be-
haviour of the process, when its current state is known exactly, is not altered by
additional knowledge concerning its past behaviour. In formulae, for any n > 0 and
any sequence of increasing time instants t_1, t_2, …, t_n, t_{n+1}, the following Equation
holds:

Prob[X(t_{n+1}) = j | X(t_n) = i_n, X(t_{n−1}) = i_{n−1}, …, X(t_1) = i_1] = Prob[X(t_{n+1}) = j | X(t_n) = i_n],   ∀ j, i_n, i_{n−1}, …, i_1 ∈ S
The past history of a Markov chain is thus completely summarised by its current state.
If the probabilistic characterisation of the transitions of the process does not depend on the
current time, then the Markov chain is said to be time-homogeneous; otherwise it is said
to be a non-homogeneous Markov chain.
The parameter t that indexes the Markov chain can be either discrete or continuous. In
the former case we have a discrete-time Markov chain { X_n, n ≥ 0 }, where state transitions
only occur at discrete points in time, often called steps, whereas in the latter case we have
a continuous-time Markov chain { X(t), t ≥ 0 }, and state transitions may occur at any
point in time.
The dynamic behaviour of a discrete-time non-homogeneous Markov chain at time n,
n ≥ 0, can be described by the one-step transition probability matrix P_n = [ p_{i,j}^n ], whose
entries are defined as follows:

p_{i,j}^n = Prob[ X_{n+1} = j | X_n = i ],  ∀ i, j ∈ S
For each pair (i, j) of states in S, matrix P_n gives the probability that at time n+1 the
Markov chain is in state j, given that it was in state i at time n. For a homogeneous dis-
crete-time Markov chain, the dependency on the time parameter n vanishes, and a single
one-step matrix P exists.
Similarly, for continuous-time non-homogeneous Markov chains the evolution at time t is
described by the transition rate matrix Q(t) = [ q_{i,j}(t) ], also called the infinitesimal gener-
ator matrix of the Markov chain, defined as follows:

q_{i,j}(t) = lim_{Δ→0} Prob[ X(t+Δ) = j | X(t) = i ] / Δ,  ∀ i, j ∈ S, i ≠ j

q_{i,i}(t) = lim_{Δ→0} ( Prob[ X(t+Δ) = i | X(t) = i ] − 1 ) / Δ,  ∀ i ∈ S
For each pair (i, j) of states in S, matrix Q(t) gives the rate with which the Markov
chain moves from state i to state j at time t. For a homogeneous continuous-time
Markov chain, the dependency on the parameter t is omitted, and a single transition
rate matrix Q is defined.
Because of the memoryless property, each transition from state i to state j of a homo-
geneous continuous-time Markov chain occurs in an exponentially distributed time, and
the rate q_{i,j} is exactly the inverse of the expected time to the transition, that is, the rate of
the corresponding exponential distribution. Therefore, in a homogeneous Markov chain,
all the transitions among states occur in a negative exponentially distributed time. This
implies that the time needed to perform any activity of a system must be modelled
with an exponential transition in the Markov chain model of that system. This is the
most severe constraint limiting the applicability of Markov chains. Whenever an activ-
ity having a non-exponential duration is modelled with an exponential transition, an
approximation is introduced in the model, and the evaluation results can differ significantly
from the exact ones. Several techniques exist to limit this error, such as phase
expansion [30], which uses a sequence of exponential stages to approximate a non-
exponential random variable.
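As an illustration of phase expansion, the sketch below approximates a deterministic delay d with an Erlang-k distribution, i.e. a series of k exponential stages each of rate k/d: the mean stays d while the standard deviation d/√k vanishes as k grows. The numbers are illustrative and not taken from [30].

```python
import numpy as np

# Phase expansion sketch: a deterministic delay d is approximated by the sum
# of k exponential stages with rate k/d (an Erlang-k distribution).
d = 2.0
rng = np.random.default_rng(0)
for k in (1, 10, 100):
    # Each row sums k independent exponential stage durations.
    samples = rng.exponential(scale=d / k, size=(50_000, k)).sum(axis=1)
    print(k, samples.mean(), samples.std())  # mean stays near d, spread shrinks
```

With k = 1 the approximation is a plain exponential of mean d; increasing k trades state-space size for accuracy, which is exactly the cost of phase expansion in a Markov model.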
Modelling the dependability of a system by using a Markov chain requires first a projec-
tion of the set of possible states of the system to define the state space S. Different lev-
els of fulfilment of the dependability requirements are associated with the states in S. For in-
stance, a state can be included in S to represent the catastrophic failure of the system, or
a degraded operation mode. Then, the matrix (of transition rates or probabilities) ruling the
transitions of the Markov chain model from the initial state must be defined. The solution
of the Markov chain focuses on the computation of the state occupation probabilities for
some of the states of S.
Homogeneous Markov chains can be solved in a very efficient way in terms of the tran-
sient, also called time-dependent, state occupation probabilities. For a discrete-time ho-
mogeneous Markov chain, we shall denote with π_i^n the occupation probability of state i
at step n, for each i ∈ S and n ≥ 0, and with π^n the vector that collects the state occupa-
tion probabilities of all the elements of the state space S at step n, for each n ≥ 0.
Starting from the initial occupation probabilities π^0, the transient state occupation prob-
abilities at step n are computed from the n-th power of the one-step transition probabil-
ity matrix, as follows:

π^n = π^0 · P^n,  n ≥ 0   (1.1)
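Equation (1.1) translates directly into a few lines of code. The two-state chain below is a made-up example: iterating the one-step matrix drives the transient probabilities towards the stationary distribution.

```python
import numpy as np

# Transient solution of a homogeneous discrete-time Markov chain via
# Equation (1.1): pi_n = pi_0 * P^n. The 2-state matrix P is illustrative.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi0 = np.array([1.0, 0.0])          # start with certainty in state 0
pi_n = pi0 @ np.linalg.matrix_power(P, 20)
print(pi_n)  # close to the stationary distribution [5/6, 1/6]
```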
Similarly, for a continuous-time homogeneous Markov chain, we shall denote with π_i(t)
the occupation probability of state i at time t, for each i ∈ S and t ≥ 0. The vector π(t),
which gives the state occupation probabilities of all the elements of the state space S at time
t, is computed from the matrix exponential of the transition rate matrix, as follows:

π(t) = π(0) · e^{Qt},  t ≥ 0   (1.2)
The pointwise and expected values of the most important dependability measures can be
evaluated from the time-dependent occupation probabilities.
For instance, suppose we are interested in evaluating the time-dependent Availability of a
processor. As a consequence of faults, the processor ceases to be operative and reaches a
faulty state in which it is unavailable. The average time to failure of the processor is 1/λ
units of time. Once failed, the processor is put under repair, and when the repair is com-
pleted, it becomes available again. The repair activity lasts on average 1/μ units of
time. The simple two-state homogeneous continuous-time Markov chain shown in Figure
1.3 is a suitable model to represent the evolution of the processor over time. The state
space of the Markov chain represents the two possible states of the processor.
The transition from the operative state to the faulty one occurs with rate
λ, which corresponds to an average time of 1/λ spent in state Ok. Similarly, the transi-
tion modelling the repair has rate μ. The solution of the Markov chain according to
Equation (1.2) returns the time-dependent occupation probabilities π_Ok(t) and
π_Faulty(t) of the two states of the model. Since the processor is only available when it is
in state Ok, the transient state occupation probability π_Ok(t) is exactly the time-depen-
dent Availability we are interested in.
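A minimal numerical sketch of this example (the rates λ and μ below are arbitrary choices): Equation (1.2) is evaluated with a matrix exponential and checked against the closed-form availability of the two-state chain, A(t) = μ/(λ+μ) + λ/(λ+μ)·e^{−(λ+μ)t}.

```python
import numpy as np
from scipy.linalg import expm

# Two-state availability model of Figure 1.3: state 0 = Ok, state 1 = Faulty.
lam, mu, t = 0.1, 1.0, 5.0          # illustrative failure/repair rates
Q = np.array([[-lam, lam],
              [mu, -mu]])
pi_t = np.array([1.0, 0.0]) @ expm(Q * t)   # Equation (1.2), starting in Ok

# Closed form for this particular chain, for comparison.
A_t = mu / (lam + mu) + lam / (lam + mu) * np.exp(-(lam + mu) * t)
print(pi_t[0], A_t)
```

The first component of π(t) is exactly the time-dependent Availability discussed in the text.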
Figure 1.3: Markov chain model of processor availability
To evaluate combined metrics such as Performability measures, a reward structure must
also be defined, which assigns reward values to the states of the Markov chain model. We
denote with w_i the reward associated with state i, for each i ∈ S, and with w the vector
that collects the reward values of all the states in S. Quite informally, a reward w_i can be
assigned to represent the gain achieved by the system for each unit of time spent in state
i. A Performability measure can then be computed as the total reward accumulated as the
model evolves over time. This evaluation requires the derivation of the state occupation
probabilities, too. For a discrete-time homogeneous Markov chain, the reward accumulated
over the first n steps is denoted with W(n), and is evaluated as follows:

W(n) = w · Σ_{h=1}^{n} π^h,   (1.3)

and the reward W(t) accumulated during the time interval [0, t] for a continuous-time ho-
mogeneous Markov chain is computed as follows:

W(t) = w · ∫_0^t π(u) du   (1.4)
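Equation (1.4) can be evaluated numerically. The sketch below uses illustrative rates and a reward of 1 per unit of time spent in the Ok state (0 otherwise), so the integral of w·π(u) over [0, t] estimates the expected total up-time of the two-state availability model.

```python
import numpy as np
from scipy.linalg import expm

lam, mu, t = 0.1, 1.0, 10.0         # illustrative failure/repair rates
Q = np.array([[-lam, lam],
              [mu, -mu]])
w = np.array([1.0, 0.0])            # reward: 1 per unit of time in state Ok
pi0 = np.array([1.0, 0.0])

# Integrate w . pi(u) over [0, t] with the trapezoidal rule (Equation (1.4)).
us = np.linspace(0.0, t, 2001)
vals = np.array([w @ (pi0 @ expm(Q * u)) for u in us])
W_t = float(np.sum((vals[:-1] + vals[1:]) * np.diff(us)) / 2)
print(W_t)  # expected up-time accumulated in [0, t]
```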
1.2.3 Classical Petri nets
The Petri net modelling paradigm was developed with the specific purpose of repre-
senting in a compact and clear way concurrency, synchronisation and cooperation among
processes [65]. Petri nets soon gained wide acceptance because of their ability
to describe the qualitative and quantitative aspects of complex systems, and also because
of their intuitive and appealing graphical representation.
Place-transition Petri nets
The most basic class of Petri net models is that of place-transition Petri nets [64, 70]. A
place-transition Petri net model comprises a set of places { P_1, P_2, …, P_n } and a set of
transitions { t_1, t_2, …, t_m }. The elements of the two sets are linked by arcs. Arcs going
from a place to a transition are called input arcs, and arcs directed from a transition to a
place are called output arcs. The places that are linked to transition t by an input arc are
called the input places of the transition. Similarly, the places linked to transition t by
an output arc are called the output places of the transition. Each arc has an associated posi-
tive integer weight, called its multiplicity. In the graphical representation of the Petri net
model, places are drawn as circles, and transitions are drawn as thin bars, with the input
and output arcs linking them. Places may contain tokens, entities graphically represented
as black dots. The state of the Petri net model is a vector ( m(P_1), m(P_2), …, m(P_n) ),
called the marking of the net, defined by the number of tokens m(P_i) in each place
P_i of the model.
The model elements introduced above define the static structure of the Petri net. Besides
this, there is a dynamic evolution of the net, which proceeds from an initial marking, denoted
m_0, to reach other markings. A transition t is said to be enabled in marking m if
each of its input places contains at least as many tokens as the multiplicity of the corre-
sponding input arc. An enabled transition can fire, and the marking of the net is changed
by removing from each input place of the transition as many tokens as the multiplicity of
the corresponding input arc, and adding to each output place as many tokens as the mul-
tiplicity of the corresponding output arc. A place may be an input place for more than a
single transition. This leads to a competition among the enabled transitions for the tokens con-
tained in the shared input places. When two enabled transitions share an input place and the
number of tokens therein is not sufficient for both of them to fire, the transitions are said
to be in conflict, and a selection rule must be employed to settle the competition in favour
of one of them. Usually, a priority assignment is used to solve the conflicts.
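The enabling and firing rules just described can be captured in a few lines. This is an illustrative dictionary-based sketch (with arc multiplicities), not a full Petri net tool; the tiny net encoded at the bottom shows a structural conflict: two transitions competing for a single token.

```python
# A transition is enabled when every input place holds at least as many
# tokens as the multiplicity of the corresponding input arc.
def enabled(marking, t):
    return all(marking[p] >= mult for p, mult in t["in"].items())

# Firing removes tokens from input places and adds tokens to output places,
# according to the arc multiplicities, in one atomic step.
def fire(marking, t):
    m = dict(marking)
    for p, mult in t["in"].items():
        m[p] -= mult
    for p, mult in t["out"].items():
        m[p] = m.get(p, 0) + mult
    return m

# Two transitions in conflict over the single token in P1 (names illustrative).
t1 = {"in": {"P1": 1}, "out": {"P2": 1}}
t2 = {"in": {"P1": 1}, "out": {"P3": 1}}
m0 = {"P1": 1, "P2": 0, "P3": 0}
print(enabled(m0, t1), enabled(m0, t2))  # both enabled, only one can fire
print(fire(m0, t1))                      # firing t1 disables t2
```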
The definition of a Petri net model to represent the evolution of a system requires some
experience, but is usually a quite intuitive process. Tokens can be used to represent enti-
ties of the system, such as tasks to be executed, messages to be sent, or control elements,
such as a capability, to be passed to other entities. A number of tokens in a place is put in
correspondence with a possible state of the system to be modelled. A transition models an
activity or a synchronisation constraint of the system, and the firing rules define the pre-
conditions to be satisfied for the activity to be executed or the synchronisation to be
completed, respectively. As the enabled transitions fire, the marking of the net changes
and other transitions may become enabled. Thus, a set of different markings, called the reacha-
bility set of the Petri net model, is generated. The reachability set can depend on the ini-
tial marking m_0 of the Petri net model, and is denoted by RS(m_0).
The dynamic evolution of the model can be described by a directed graph, called the reach-
ability graph and denoted by RG(m_0), whose nodes are the markings in RS(m_0), and
whose arcs are labelled with the transition that causes the corresponding marking change.
It is worth observing that the number of markings in the reachability set RS(m_0)
can be huge even for a very simple Petri net. It is thus evident that Petri net
models provide a concise formalism to represent complex systems with a large number
of states.
The model shown in Figure 1.4 is a well-known example illustrating the
simplicity and expressiveness of place-transition Petri nets. The model represents
a semaphore mechanism for mutually exclusive access to a resource shared by two con-
current processes. The evolution of the first process is represented by the thread on the
left side, formed by place P_1, transition t_1, place P_4 and transition t_3. Exactly one token
circulates among the places along this thread, modelling the two possible states of the pro-
cess, that is, the request for the resource to be granted and the exclusive use of the resource.
Figure 1.4: Petri net model representing a mutual exclusion problem
Transition t_1 represents the wait for the resource acquisition, and t_3 the resource
release. Similarly, the thread on the right side, formed by place P_3, transition t_2, place P_5
and transition t_4, models the evolution of the second process. The state of the resource is
modelled through place P_2: when a token is in P_2 the resource is free, and when the
place is empty, the resource has already been assigned. Therefore, a process requiring the
resource is forced to wait for the enabling of the acquisition transition, which only hap-
pens when the resource is free. Notice that the model in Figure 1.4 represents in a very
simple way the non-determinism that arises as a consequence of the truly concurrent ex-
ecution of the two processes.
The class of place-transition Petri nets does not include any notion of time. In fact, they
were introduced to model qualitative rather than quantitative aspects of systems. By
analysing the reachability graph RG(m_0) generated starting from the initial marking m_0,
it is possible to verify a set of structural properties of the Petri net model, and to infer a set
of corresponding properties of the modelled system. For instance, the absence of dead-
lock states of the system can be checked by verifying that the model never reaches an ab-
sorbing marking, that is, a marking in which no transitions are enabled. Also, it is possible
to check that certain activities of the system are always executed in a given order, by veri-
fying the firing order of the corresponding transitions modelling those activities in the
Petri net model. For instance, for the model in Figure 1.4, it is possible to verify that an
acquisition of the resource is always followed by a release of the resource.
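For the mutual exclusion net of Figure 1.4, the reachability set can be generated by a breadth-first exploration of the firing rule. The encoding below reconstructs the arcs from the textual description (all multiplicities 1), so it should be read as a plausible sketch of the figure rather than its exact reproduction; the final check verifies that no reachable marking puts tokens in both P4 and P5, i.e. mutual exclusion holds.

```python
from collections import deque

# Arcs reconstructed from the text: P2 is the semaphore place.
transitions = {
    "t1": ({"P1": 1, "P2": 1}, {"P4": 1}),  # process 1 acquires the resource
    "t3": ({"P4": 1}, {"P1": 1, "P2": 1}),  # process 1 releases it
    "t2": ({"P3": 1, "P2": 1}, {"P5": 1}),  # process 2 acquires
    "t4": ({"P5": 1}, {"P3": 1, "P2": 1}),  # process 2 releases
}
places = ["P1", "P2", "P3", "P4", "P5"]
m0 = (1, 1, 1, 0, 0)  # initial marking: both processes requesting, semaphore free

def successors(m):
    d = dict(zip(places, m))
    for name, (pre, post) in transitions.items():
        if all(d[p] >= k for p, k in pre.items()):
            n = dict(d)
            for p, k in pre.items():
                n[p] -= k
            for p, k in post.items():
                n[p] += k
            yield name, tuple(n[p] for p in places)

# Breadth-first generation of RS(m0).
seen, frontier = {m0}, deque([m0])
while frontier:
    m = frontier.popleft()
    for _, m2 in successors(m):
        if m2 not in seen:
            seen.add(m2)
            frontier.append(m2)
print(len(seen), sorted(seen))
```

The exploration finds three markings, and in none of them do P4 and P5 hold a token simultaneously: the two processes never use the resource at the same time.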
Stochastic Petri nets
A very popular timed extension of place-transition Petri nets is the class of stochastic
Petri nets [57], denoted as SPNs in the following. In an SPN model, each transition t has
an associated random firing delay whose probability density function is a negative expo-
nential with rate λ. These timed transitions are graphically represented by empty boxes.
To simplify the discussion, a transition having a random firing delay with a negative ex-
ponential distribution will be called an exponential transition, without any ambiguity.
The introduction of the firing delay requires the definition of a more precise firing rule.
The enabling of a transition is defined exactly as in the case of place-transition
Petri net models. As soon as a transition t becomes enabled, a random firing time is sampled
from the distribution associated with t, and a timer starts counting down from that value to
zero. Transition t fires if and only if it remains continuously enabled until the timer
reaches zero. When transition t fires, the tokens are removed from its input places and
added to its output places in a single atomic and instantaneous operation (atomic firing
rule). It is interesting to observe that, in the time interval between the enabling and the fir-
ing of t, other transitions sharing some input places with t may become enabled and fire with-
out disabling it, provided that there is a sufficient number of tokens in the common input
places. On the contrary, in the case of a conflict, the transition whose timer reaches zero
first is the one that fires (race model). It is also important to notice that the use of the ex-
ponential distribution relieves the modeller from specifying the behaviour of those
transitions that do not fire after having been enabled. Indeed, thanks to the memoryless
property of the exponential distribution, whether or not the memory of the time they have al-
ready been enabled is kept, the remaining time to the firing is exponentially dis-
tributed with the same rate.
It is easy to realise that the evolution of an SPN model can be represented by a continu-
ous-time homogeneous Markov chain, whose state space elements are in a one-to-one
correspondence with the elements of the reachability set RS(m_0), and whose transition
rates among states are equal to the firing rates of the transitions that produce the corre-
sponding marking changes in the SPN. Thus, the associated Markov chain is indeed iso-
morphic to the reachability graph RG(m_0). For instance, Figure 1.5 shows an SPN model
and the corresponding Markov chain model. The SPN is a simple model of a two-proces-
sor system, where each processor fails in a time that is exponentially distributed with rate
λ, and is repaired in an exponentially distributed time of rate μ. The number of tokens
in place Up represents the number of healthy processors, and that in place Down the
number of faulty processors. The two exponential transitions t_1 and t_2 model the failure
and repair processes. Notice that the firing rates of the two transitions depend on
the number of tokens in their respective input places.
Figure 1.5: SPN model and the corresponding Markov chain
An SPN model can be solved in terms of the marking occupation probabilities by per-
forming the analysis of the associated Markov chain. As an example, suppose we want to
evaluate the probability that no processors are available at time t for the simple system
modelled by the SPN in Figure 1.5, that is, we are interested in the time-dependent prob-
ability that the marking of the net is (0,2), given that the marking of the net at time 0 is
m_0 = (2,0), that is, both processors are healthy. To answer this question, we can gen-
erate the reachability graph RG(m_0) by examining all the possible firing sequences of the
transitions, and then build the transition rate matrix Q of the Markov chain. By apply-
ing Equation (1.2) we can evaluate the transient occupation probability of state
(0,2) of the Markov chain, which is exactly the same as the occupation probability of
marking (0,2) of the Petri net model.
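For the two-processor SPN of Figure 1.5, the associated Markov chain has the three states (2,0), (1,1), (0,2). Ordering them this way, its generator can be written down explicitly and Equation (1.2) applied; the rates below are illustrative.

```python
import numpy as np
from scipy.linalg import expm

# States: (2,0), (1,1), (0,2) = (healthy, faulty) processors.
# Failure rates 2*lam then lam; repair rates mu then 2*mu, as in Figure 1.5.
lam, mu, t = 0.01, 1.0, 100.0       # illustrative rates and horizon
Q = np.array([[-2 * lam, 2 * lam, 0.0],
              [mu, -(mu + lam), lam],
              [0.0, 2 * mu, -2 * mu]])
pi_t = np.array([1.0, 0.0, 0.0]) @ expm(Q * t)   # start in (2,0)
print(pi_t[2])  # probability that both processors are down at time t
```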
A reward structure can be associated with the markings of an SPN model, to evaluate some
Performability metric of interest. The reward values assigned to the various markings
need to be translated into the corresponding rewards over the states of the associated Markov
chain. The evaluation of the cumulated reward is then performed according to Equation
(1.4).
Generalised stochastic Petri nets
In an SPN model, only exponential transitions are allowed. This can be inconvenient when
we want to use a transition to represent a synchronisation among multiple concurrent
flows, because in this case no time is associated with the synchronisation; rather,
the transition is enabled by the occurrence of a particular event. For instance, an expo-
nential delay could be assigned to the two transitions that model the use of the resource in
the model in Figure 1.4, but it is hard to figure out a meaningful delay to associate with the
two transitions modelling the resource acquisition. Moreover, we are often interested in
modelling activities that require a negligible amount of time to be performed.
The class of generalised stochastic Petri nets, denoted by GSPNs [2], relaxes the assump-
tion that all the transitions have an exponentially distributed delay, and allows for expo-
nential transitions as well as for instantaneous transitions, that is, transitions that once
enabled fire in zero time. Conflicts among timed transitions are solved according to the
same race model as in the case of SPNs, whereas conflicts among instantaneous transi-
tions are solved by a priority assignment, and by associating weights (or probabilities) with
instantaneous transitions at the same priority level.
The solution of a GSPN model resorts again to that of an associated Markov chain.
However, for GSPN models, the reachability set elements are not in a one-to-one corre-
spondence with the states of the associated Markov chain. Indeed, because of the instanta-
neous transitions, some of the markings in RG(m_0) have a zero sojourn time, that is, the
GSPN model spends zero time therein. These markings are called the vanishing mark-
ings of the GSPN model, whereas the non-vanishing markings are often called tangible
markings. Nevertheless, it is possible to operate a reduction of the reachability graph
RG(m_0) to eliminate the vanishing markings, and to obtain the reduced reachability
graph RRG(m_0). The reduced reachability graph is isomorphic to a Markov chain, and
the reduction procedure does not affect the equivalence between the tangible mark-
ing occupation probabilities of the GSPN and the state occupation probabilities of the
Markov chain. Therefore, the Markov chain associated with the reduced reachability graph
RRG(m_0) can be solved to study the GSPN model evolution over time.
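The elimination of the vanishing markings can be sketched with the standard matrix formulation: splitting the one-step probabilities of the embedded chain into tangible (T) and vanishing (V) blocks, the tangible-to-tangible matrix after elimination is P' = P_TT + P_TV(I − P_VV)^{-1}P_VT, where the inverse sums the geometric series over all excursions through vanishing markings. The small matrices below are illustrative, not derived from a specific GSPN.

```python
import numpy as np

# Illustrative embedded chain with 2 tangible and 1 vanishing marking.
P_TT = np.array([[0.0, 0.5],
                 [1.0, 0.0]])       # tangible -> tangible probabilities
P_TV = np.array([[0.5],
                 [0.0]])            # tangible -> vanishing
P_VT = np.array([[0.3, 0.7]])       # vanishing -> tangible
P_VV = np.array([[0.0]])            # vanishing -> vanishing (no loops here)

# Eliminate the vanishing marking: redistribute its probability mass.
P_red = P_TT + P_TV @ np.linalg.inv(np.eye(1) - P_VV) @ P_VT
print(P_red)  # a proper stochastic matrix over tangible markings only
```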
1.2.4 Non Markovian Petri nets
The standard definition of GSPNs implies that all the timed activities associated with the
transitions have an exponentially distributed duration, so that the evolution of the model
can be mapped onto a continuous-time homogeneous Markov chain. As we already
pointed out, this constraint implies that an approximation is introduced into the models
whenever an activity with a non-exponential duration must be represented. Recently, sev-
eral classes of Petri nets have been defined which allow for transitions whose firing times
can be drawn from non-exponential distributions. These new classes of Petri nets are col-
lectively called non Markovian Petri nets [11]. Among non Markovian Petri nets, we con-
sider particularly interesting for the purposes of this dissertation the class of determinis-
tic and stochastic Petri nets [1], denoted by DSPNs, and that of Markov regenerative
stochastic Petri nets [21], denoted by MRSPNs.
Deterministic and stochastic Petri nets
DSPNs have been introduced as an extension of GSPNs, to allow the modelling of events
having deterministic occurrence times [1]. The set of transitions of a DSPN can be parti-
tioned into three disjoint sets: the set of instantaneous transitions, represented by thin
bars; the set of transitions having exponentially distributed firing times, represented by
empty rectangles; and the set of transitions with deterministic firing times, represented by
filled rectangles. This enriched set of possible transitions offered by DSPNs allows the
exact modelling of a wider set of system features, such as time-outs and message
propagation delays in synchronous systems. Repair delays represent another example of
activities that are typically modelled more accurately by deterministic transitions than by
exponential ones. Furthermore, several classes of PMSs, such as those employed in
space applications, have a preplanned deterministic duration of the phases they perform,
a feature whose exact modelling requires the introduction of deterministic delays.
The DSPN model shown in Figure 1.6 is a classical example very often encountered in the
literature. It represents the queuing model denoted as M / D / 1 / 2 / 2 [43, 85], where
two customers require a service of deterministic duration, modelled by transition t3. After
having been served, each customer spends a period of time modelled by the exponential
transition t1 before reapplying for the service. The instantaneous transition t2 together
with the control place P3 is used to model the exclusive access to the service.
Figure 1.6: DSPN model of the M / D / 1 / 2 / 2 queuing system
Unfortunately, the analytical solution of a DSPN model is not possible in general. Indeed,
the deterministic distribution does not enjoy the Markov memoryless property, and the
time-dependent evolution of the model requires keeping track of much additional informa-
tion [36], which greatly complicates the analysis. However, analytical tractability is
guaranteed for the subset of DSPN models whose structure satisfies the following as-
sumption:

Assumption 1.1: at most one deterministic transition is enabled in each of the pos-
sible markings of the DSPN.

This hypothesis severely limits the expressiveness of DSPN models. Quite recently,
some attempts have been made to relax Assumption 1.1 [36, 67].
Provided that Assumption 1.1 holds, the time-dependent marking occupation probabilities
of a DSPN model can be analytically evaluated by resorting to the theory of Markov
regenerative processes [26], denoted by MRGPs in the following. The definition of an
MRGP is based on Markov renewal sequences [26, 72].

Definition 1.2: Let S be a discrete state space. A sequence of bivariate random
variables { (Y_i, T_i), i ≥ 0 } is called a Markov renewal sequence if:
1) T_0 = 0, T_{i+1} ≥ T_i, and Y_i ∈ S, ∀ i ≥ 0
2) Prob[ Y_{i+1} = m', T_{i+1} − T_i ≤ t | Y_i = m, T_i, Y_{i−1}, T_{i−1}, …, T_0, Y_0 ] =
   = Prob[ Y_{i+1} = m', T_{i+1} − T_i ≤ t | Y_i = m ] =
   = Prob[ Y_1 = m', T_1 ≤ t | Y_0 = m ],  ∀ m, m' ∈ S, ∀ t ≥ 0
The sequence { T_i, i ≥ 0 } identifies a series of time instants. The first equality of point 2)
states that the discrete-time stochastic process { Y_i, i ≥ 0 } enjoys the Markovian prop-
erty of absence of memory at each time instant T_i, i = 1, 2, …. This is the reason why
the word Markov appears in the name of the bivariate sequence { (Y_i, T_i), i ≥ 0 }.
Moreover, the second equality of point 2) says that process { Y_i, i ≥ 0 } is time-homoge-
neous, and that its probabilistic evolution restarts unchanged after each of the time instants
T_i, that is, each T_i represents a renewal instant for the process.
Definition 1.3: A stochastic process { Z(t), t ≥ 0 } is called an MRGP if there exists
a Markov renewal sequence { (Y_i, T_i), i ≥ 0 } of random variables such that all the
conditional finite distributions of { Z(T_i + t), t ≥ 0 } given
{ Z(u), 0 ≤ u ≤ T_i, Y_i = m } are the same as those of { Z(t), t ≥ 0 } given Y_0 = m.

In simpler words, a stochastic process { Z(t), t ≥ 0 } is an MRGP if it is possible to find
an embedded Markov renewal sequence, that is, a sequence of time instants such that the
properties stated by point 2) of Definition 1.2 hold for the state of process
{ Z(t), t ≥ 0 }. The time instants defined by the renewal sequence are also known as the
regeneration points of the MRGP.
Let { m(t), t ≥ 0 } denote the marking process of the DSPN, that is, the stochastic process
formed by the changes of the tangible markings over time, and let S be the state space of
such process { m(t), t ≥ 0 }. Note that, due to the existence of deterministic transitions,
this stochastic process is not a Markov chain. However, it has been shown by Choi,
Kulkarni and Trivedi in [22] that, if Assumption 1.1 holds, the marking process
{ m(t), t ≥ 0 } of a DSPN is an MRGP. In the following, we briefly recall that proof.
A Markov renewal sequence { (Y_i, T_i), i ≥ 0 } embedded in the marking process is defined
as follows. For i = 0, set T_0 = 0, and suppose m ∈ S is the marking such that
m = m(T_i^+), that is, the marking of the model immediately after T_i. If m is an absorbing
marking, then set T_{i+1} = +∞. If m is not absorbing, and no deterministic transitions are
enabled in m, define T_{i+1} as the first time after T_i at which a marking change occurs. If one
deterministic transition is enabled in marking m, define T_{i+1} to be the time when such
transition fires or is disabled. Now, define the sequence { Y_i, i ≥ 0 } as Y_i = m(T_i^+),
i ≥ 0. It can be proved that the so-defined bivariate sequence { (Y_i, T_i), i ≥ 0 } embedded
in the marking process { m(t), t ≥ 0 } of the DSPN model is a Markov renewal sequence,
according to Definition 1.2. Moreover, it is easy to show that the stochastic behaviour of
the marking process from time T_i onwards, { m(T_i + t), t ≥ 0 }, depends only on Y_i = m,
which is the marking of the DSPN at time T_i; according to Definition 1.3, this implies
that the marking process is an MRGP.
The solution of the DSPN in terms of the time-dependent marking occupation probabili-
ties can exploit the general transient analysis technique developed for MRGPs.
Consider the time-dependent occupation probability of marking m' at time t,
conditioned on the initial marking m, and denote it v_{m,m'}(t), m, m' ∈ S, t > 0. Moreover,
denote with V(t) = [ v_{m,m'}(t) ] the transient probability matrix that collects these condi-
tional probabilities. Once matrix V(t) is known, the vector π(t) of the unconditional
marking occupation probabilities is simply evaluated as π(t) = π(0) V(t), for a given
initial marking occupation probability vector π(0).
Obtaining matrix V(t) requires first the computation of two fundamental matrices, called the
kernels of the MRGP. The matrix of functions K(t) = [ k_{m,m'}(t) ], whose entries are de-
fined as k_{m,m'}(t) = Prob[ Y_1 = m', T_1 ≤ t | Y_0 = m ], m, m' ∈ S, t > 0, is the global
kernel of the MRGP. This matrix accounts for the evolution of the MRGP between two
successive regeneration points. The matrix of functions E(t) = [ e_{m,m'}(t) ], whose entries
are defined as e_{m,m'}(t) = Prob[ m(t) = m', T_1 > t | Y_0 = m ], m, m' ∈ S, t > 0, is the
local kernel of the MRGP, which describes the evolution of the process before the first
regeneration instant. For the sake of conciseness, denote with K∗V(t) the
matrix whose element (m, m') is defined as follows:

(k∗v)_{m,m'}(t) = Σ_{h ∈ S} ∫_0^t dk_{m,h}(u) v_{h,m'}(t − u),  m, m' ∈ S, t ≥ 0

that is, K∗V(t) is the matrix of functions of t whose generic element is obtained through the
row-by-column convolution of the kernel matrix K(t) with matrix V(t), rather than through
the usual multiplication operator. According to MRGP theory [26], matrix V(t) can
be obtained by solving the following generalised Markov renewal equation:

V(t) = E(t) + K∗V(t)   (1.5)
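Equation (1.5) can be solved numerically by discretizing the convolution in time. As a self-check, the sketch below encodes a plain two-state Markov chain (rates λ and μ, with a regeneration at every jump) as an MRGP: its kernels are known in closed form, so the computed v_{0,0}(t) must agree with the CTMC transient known from Equation (1.2). All numbers are illustrative, and the first-order discretization is a sketch, not a production solver.

```python
import numpy as np

lam, mu, h, n = 0.5, 1.0, 0.002, 1000    # step h, horizon t = n*h = 2.0

ts = np.arange(n + 1) * h
K = np.zeros((n + 1, 2, 2))              # global kernel K(t)
E = np.zeros((n + 1, 2, 2))              # local kernel E(t)
K[:, 0, 1] = 1.0 - np.exp(-lam * ts)     # regeneration 0 -> 1 by time t
K[:, 1, 0] = 1.0 - np.exp(-mu * ts)
E[:, 0, 0] = np.exp(-lam * ts)           # still in state 0, no jump yet
E[:, 1, 1] = np.exp(-mu * ts)

V = np.zeros((n + 1, 2, 2))
V[0] = np.eye(2)
dK = np.diff(K, axis=0)                  # kernel increments dK(u)
for k in range(1, n + 1):
    # V(t_k) = E(t_k) + sum_j dK(u_j) V(t_k - u_j): discretized Eq. (1.5)
    V[k] = E[k] + np.einsum('jab,jbc->ac', dK[:k], V[k - 1::-1][:k])

# Known CTMC transient for v_00(t), used as a correctness check.
exact = mu / (lam + mu) + lam / (lam + mu) * np.exp(-(lam + mu) * ts[-1])
print(V[-1][0, 0], exact)
```

The discretization error is O(h), so halving the step roughly halves the gap between the computed and exact values; for a DSPN the same scheme applies once the kernels induced by the deterministic transition are written down.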
Markov regenerative stochastic Petri nets
MRSPNs are a class of timed Petri nets that further extends DSPNs. The definition of
MRSPNs is indirectly given in terms of a property of their marking process, as follows:

Definition 1.4: a Petri net belongs to the class of MRSPNs if its marking pro-
cess is an MRGP.

Definition 1.4 does not give per se an immediate description of the modelling features al-
lowed for MRSPNs. Since Markov processes are special cases of MRGPs, the SPN
and GSPN classes of Petri nets in fact belong to the MRSPN class. Also, the set of
DSPNs for which an analytic solution method exists is included in the MRSPN class.
Moreover, MRGPs comprise the class of semi-Markov processes, too, for which activi-
ties having generally distributed durations are allowed [26, 33, 72]. Thus, MRSPN models
may include instantaneous and exponential transitions, as well as transitions having gen-
erally distributed (neither instantaneous nor exponential) firing times. We shall refer to this
last type of transitions as general transitions.
In fact, because of their generality, it is not easy to check whether a given timed Petri net model is an MRSPN or not. In general, the marking process must be inspected to verify the existence of a Markov regenerative sequence, but this can be a very inefficient check for complex models. A possible way to build Petri net models that are guaranteed to be MRSPNs is to impose well-defined constraints on the net structure, such that a Markov regenerative sequence is surely found in the underlying marking process. A sufficient assumption on the net structure, which guarantees that the underlying marking process is an MRGP, is the following one:
Assumption 1.2: at most one general transition is enabled in each marking of the
Petri Net.
In this case, the Markov property holds (at least) each time a general transition fires or is disabled. The property stated by Assumption 1.2 is usually adopted [21] as an easy-to-check condition for the existence of the MRGP structure in the underlying marking process, but it obviously restricts the modelling capabilities of MRSPNs. In particular, it prevents the concurrent enabling of generally distributed transitions. Less restrictive conditions, still sufficient for the existence of a Markov renewal sequence embedded in the marking process, can be given for classes of MRSPN models having particular structural properties [12].
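The easy-to-check condition of Assumption 1.2 amounts to a scan over the reachable markings. A minimal sketch of such a check follows; the marking encoding and the `enabled` predicate are hypothetical placeholders, not the API of any particular tool.

```python
def satisfies_assumption_1_2(reachable_markings, enabled, general_transitions):
    """Return True if at most one general transition is enabled in each
    reachable marking (sufficient for the marking process to be an MRGP).

    reachable_markings  -- iterable of markings (any hashable encoding)
    enabled(m)          -- returns the set of transitions enabled in marking m
    general_transitions -- set of transitions with general firing times
    """
    for m in reachable_markings:
        # more than one concurrently enabled general transition violates it
        if len(enabled(m) & general_transitions) > 1:
            return False
    return True
```

In practice the reachable markings would come from a reachability analysis of the net; here they are simply supplied by the caller.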
The transient solution of an MRSPN model follows the same procedure as that described for the DSPN class of Petri nets. In fact, both resort to the MRGP transient solution techniques. However, the analytic expressions of the kernel matrix entries depend on the specific firing time distributions of the general transitions.
Chapter 2
Phased mission systems
With the increasing complexity and automation encountered in systems of the nuclear, aerospace, transportation, electronic, and many other industrial fields, the deployment of processing systems in charge of performing a multiplicity of different control and computational activities is becoming common practice. Very often, the system and its external environment can change during operation, in such a way that the behaviour of the system during one time interval may be completely different from that within other periods.
This time-dependent characteristic of a wide class of systems has led to the concept of
phased mission systems. In the following, the term system is intended to describe the physical system, composed of computing resources, both hardware and software, to be distinguished from the external environment the processing system is embedded in. A
phase is one of the time periods between two consecutive changes of the system or of its
environment, and mission refers to the overall duty the system is intended to execute.
Following this terminology, the mission of a PMS can be partitioned over time into a sequence of disjoint consecutive phases. In the following, we describe in detail the particular features of PMSs, especially the changes responsible for the phased behaviour. Then, we give a set of examples of PMS applications, to clarify the application domains for which the study of PMSs is of interest. The principal dependability measures of interest for a PMS are then defined, and finally an application example is introduced, which exhibits most of the peculiar features of PMSs and represents an interesting case study to compare the various modelling and evaluation methodologies that will be considered throughout the dissertation.
2.1 Peculiar characteristics of PMSs
We consider systems that consist of multiple components (hardware, software, mechanical, etc.). Each component may be subject to faults, as a consequence of damage affecting its internal subcomponents, or due to the negative influence of a stressing external environment. Faults may result in complete failures or degraded performance modes of the affected components. A failure criterion is a function of the state of the system, which defines the accomplishment level of the system on the basis of the failed components. A system configuration is a subset of the available components, arranged in an operational configuration to execute a given workload for the accomplishment of a duty of the system.
A system configuration, a specific workload, a state of the external environment, and a failure criterion are the defining elements of a single phase system, the typical class of systems usually taken into account in performance and dependability modelling studies. In the context of a single phase system, all the above elements are assumed to be constant during system operation, or at most some parameters are subject to slight variations. A PMS is a generalisation of a single phase system, which completely relaxes this invariance assumption. As a consequence, PMSs must be able to react to the changing system scenario in order to best utilise their resources for the accomplishment of the mission purposes.
Consider first the changes that occur in a PMS as a result of a variable workload. During its mission, the system may be required to execute various different sets of application tasks, thus defining an equivalent number of different phases. A particular workload has well-defined performance requirements and thus requires a specific set of resources for its execution. Thus, changes in the incoming workload usually result in changes of the operational configuration of a PMS. Furthermore, for a particularly demanding phase during which many resources are utilised, the failure criteria of the PMS are more stringent than for a less computationally intensive one.
The activities performed within the various phases can be associated with distinct criticality levels, the dependable execution of certain phases being much more relevant than that of other phases, during which the system performs non-vital functionalities.
Moreover, even if the workload does not vary from one phase to another, the relative criticality levels of the various tasks may change, and a given task that was of secondary relevance within an earlier phase might become primarily important during a later one. For instance, consider the execution of the flight altitude control tasks of an avionics PMS, for which the delivery of erroneous results during the landing phase is much more dangerous than during the cruise phase. To guarantee an adequately dependable execution of critical tasks, fault-tolerance techniques are employed, which resort to the deployment of redundancy [8, 16, 47, 78]. As is common practice for critical systems, PMSs are thus equipped with redundant resources, and the operational configuration may be rearranged to properly support the different dependability requirements of the application tasks. Obviously, the failure criteria of a PMS during a critical phase are more stringent than during non-critical ones.
The external influence of the environment is a relevant point for the analysis of PMSs, which are very often required to operate in particularly stressing conditions. The negative impact that intense mechanical or electromagnetic solicitations have on computing systems leads to dramatic increases in the failure rates of components, and cannot be neglected without compromising the accuracy of the evaluation outcomes. Thus, a further partition of the mission into different phases becomes necessary to properly characterise this time-dependence of the failure processes. Moreover, due to the increased failure rates, the overall robustness against faults might be reduced. The fault-tolerance provisions that are sufficient during periods of normal solicitation from the external environment may not be adequate to ensure a proper dependability during the occurrence of such hazardous situations. Therefore, a reconfiguration of the PMS to increase the amount of resources for the execution of critical tasks may be useful to face the accelerated wear of the system components.
A last point that affects the evolution of a PMS over time is the varying availability of system resources. Due to the occurrence of faults, the number of correctly performing components is not constant. When a component is diagnosed as faulty, it is removed from the pool of available resources, and the workload it had been assigned must be redistributed. Such an event has the potential to trigger a reconfiguration of the system, and consequently the beginning of a new phase of the PMS. Some of the failed components can be recovered and reintegrated among the available resources. This can be the result of either an explicit intervention, like the replacement or the repair of the component, or an implicit repair, as in the case of transient faults, whose influence lasts for a very short period of time. As in the case of a removal, the reinsertion of a component may cause a reconfiguration of the PMS.
A typical PMS starts its operation by executing the first phase; when it is completed, the system switches to the second, and so on. The mission is completed when the last phase is completed. If a failure of the system occurs during a phase, that is, any of the failure criteria is met, the overall mission fails. However, it is possible for a phase to be unsuccessfully executed, in the sense that the goals of the phase are not achieved or are only partially achieved, without compromising the integrity of the mission as a whole.
Phase changes are usually assumed to be instantaneous. Moreover, the activities performed at phase change times, such as the reconfigurations of the system, usually take a negligible amount of time. Should the system spend a non-negligible amount of time during those changes, a set of fictitious phases could be introduced to represent the phase changes themselves as tasks of the mission.
The selection of the next phase to be performed can be completely preplanned before the PMS starts its operation. In this static scenario, a predictable sequence of phases is executed. Each phase may have a fixed or a random duration, depending on the events that trigger the beginning of the next phase. Fixed durations are typical of phase changes that result from predefined variations of the workload, or foreseen fluctuations of the external environment influence. Conversely, unexpected events such as a fault leading to the removal of a system component, or the arising of an emergency operating situation, result in phases of random duration, whose initial and ending time instants may be unknown.
More flexible missions are also of interest in the analysis of PMSs. For instance, consider a mission with multiple objectives, where some less relevant phases could be sacrificed in favour of a more important one. An already initiated phase could be aborted or shortened if the PMS reaches a dangerous condition, such as insufficient resource availability, or excessive stress from the external environment. The profile of the mission would in all these cases be represented by a tree of possible choices, rather than a chain as in the case of a predefined sequence of phases.
2.2 Examples of PMSs
A number of examples of PMSs can be found in various application domains. Before starting with a list of some of them, we would like to point out that a few studies have been published in the literature showing the applicability of the PMS theory to the military field. We intentionally ignore this branch of the research on PMSs, for it does not fit with our idea of scientific research. We strongly believe that the interest in the results of this dissertation will not be affected at all by this choice.
A typical class of PMSs frequently found in the literature is represented by the on-board systems for the aided guidance of aircraft. Large aircraft manufacturers, like the American Boeing, are investing relevant resources in the dependability assessment of their products [80]. The mission of a flight control system is divided into phases such as take-off, ascent, cruise, approach and landing. These phases have completely different dependability requirements, with the take-off and landing phases being the most critical ones. The duration of most of the phases, including the critical ones, is known in advance, but a certain randomness is considered to account for the influence of the external environment, specifically the weather conditions at the terminal airport.
A PMS analysis can also be applied to study the dependability features of an aircraft during its whole life-cycle, to establish a proper schedule of maintenance operations. In this context, we talk of the Scheduled Maintenance System (SMS) problem. The system is intended to be used during its lifetime for multiple missions. Within each mission, the system behaves as a PMS. The system is run for a finite duration, and then it has to pass a maintenance check. Typically, after a prefixed number of missions the system is completely checked, so that all its components are as good as brand-new ones after the check. However, other kinds of maintenance actions can be performed between these major checks. For instance, some highly critical components could be checked and possibly repaired after each mission, and some others could be replaced after each mission even if they are still working. All these maintenance actions can be thought of as reconfigurations of the system, and the same analysis framework as for PMSs can be adopted to study the SMS problem. Obviously, the study of system dependability can be limited to the period between two complete checks of the system, because after such a check the system is reset to its initial conditions.
Another typical example of PMSs is provided by the computing systems carried on board long-life spacecraft, which must survive long periods of very low activity, and then are required to achieve the main scientific goals of the mission in a short period of intense solicitation. The phases of a typical spacecraft mission include: launch, hibernation, and operational phases such as orbital manoeuvres, rendez-vous and fly-bys in proximity of planets and other space objects. Performance and dependability requirements are significantly different from phase to phase. The launch phase is an example of an extremely stressing phase, due to the intense mechanical solicitations. Hibernations are long dormancy periods usually entered for cruise navigation, characterised by a minimal level of activity. The spacecraft stays in hibernation for most of the mission, and only leaves it for short time periods, in which the operational phases are performed. For instance, during a planet rendez-vous phase the spacecraft performs gravity assist manoeuvres to increase its speed. Afterwards, it enters a new hibernation phase till the next wake-up. To survive a mission that lasts for several years, space PMSs are equipped with many redundant components. Besides providing a high availability, these redundant resources are employed to meet the dependability requirements of the critical operational phases.
Spacecraft represent a very interesting class of PMSs because of their multi-objective missions. Different goals are defined, some of primary and others of secondary relevance. Phases performed to reach secondary goals can be skipped in favour of the more important primary ones. For instance, a phase could be skipped if the current spacecraft hardware configuration is already degraded due to component failures experienced in the past: reaching a secondary goal might lead to an unreliable execution of a more important subsequent phase. An example of this flexible mission can be found in a research performed within the framework of the ESPRIT project 20716 GUARDS, Generic Upgradable Architecture for Real-Time Dependable Systems [66]. This research focuses on the European Space Agency mission Rosetta, and was started by the French aerospace company Matra Marconi Space, an end-user of the GUARDS project. Rosetta is an unmanned spacecraft for scientific research whose primary goal is a rendez-vous with the comet Wirtanen in 2011. Secondary objectives include two fly-bys with asteroids during the 9-year cruise to reach the comet.
Besides these classical examples of PMSs, there are many other kinds of systems, not generally classified as such, that could be conveniently and quite naturally reformulated as PMSs, and then modelled and solved with the specific methodologies for PMSs. More precisely, we are considering systems having a phased mission for which the phase ordering is dynamically determined by the environment the system is embedded in, or by particular sequences of faults.
Real-time systems with “multi-moded” behaviour [14, 19] represent good examples of these PMS-like systems. These real-time systems behave in different modes, each mode being characterised by a fixed incoming workload and a constant availability of system resources. Each mode can be naturally associated with a distinct phase. Changes from one mode to another occur as a result of workload variations. Also, faults affecting system resources may cause mode changes, because they may lead to the failure and subsequent unavailability of resources.
The classical “server with vacation” model, very often encountered in the literature, is another example of a system that is readily reformulated as a PMS. The server with vacation is a simple single-server queuing model, where the server cyclically alternates normal operative periods with vacation periods. During a vacation period, customers are not served and the queue length increases. The durations of both the operative and the vacation periods are usually randomly distributed. A simple PMS model of the server with vacation would consider two phases of random duration, cyclically executed.
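The two-phase view of the server with vacation can be illustrated with a toy discrete-event simulation. The sketch below is ours: all names and parameters are illustrative, and for brevity the phase durations are drawn exponential here, although the text allows general distributions (with exponential clocks, restarting the arrival/service races at a phase boundary is harmless by memorylessness).

```python
import random

def simulate_server_with_vacation(t_end, lam, mu, op_mean, vac_mean, seed=0):
    """Simulate the two-phase (operative/vacation) PMS view of the server
    with vacation up to time t_end and return the final queue length.

    lam, mu            -- arrival and service rates
    op_mean, vac_mean  -- mean durations of operative and vacation phases
    """
    rng = random.Random(seed)
    t, queue, phase = 0.0, 0, "operative"
    phase_end = rng.expovariate(1.0 / op_mean)
    while t < t_end:
        rates = {"arrival": lam}
        if phase == "operative" and queue > 0:
            rates["service"] = mu           # no service during vacations
        dt = rng.expovariate(sum(rates.values()))
        if t + dt >= phase_end:             # phase change wins the race
            t = phase_end
            if phase == "operative":
                phase, phase_end = "vacation", t + rng.expovariate(1.0 / vac_mean)
            else:
                phase, phase_end = "operative", t + rng.expovariate(1.0 / op_mean)
            continue
        t += dt
        event = rng.choices(list(rates), weights=list(rates.values()))[0]
        queue += 1 if event == "arrival" else -1
    return queue
```

During vacation phases only arrivals are enabled, so the queue can only grow, exactly as described above.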
2.3 Dependability measures for PMSs
Different classes of dependability measures are considered for PMSs. A first spontaneous question a PMS engineer or a user might ask is: “What is the probability that the mission will be completed successfully?”. To answer such a question, it is first necessary to precisely define a success criterion for the mission of the PMS. Suppose a PMS is required to execute a certain number of phases, selected somehow from a set of possible phases numbered {1, 2, …, n}. The criterion usually considered is the complete fulfilment of the goals of the mission during all of the n phases, but it is also possible to give more stringent requirements, such as the completion of the n phases within a given time-out and with a particular operational configuration, or less demanding requirements such as the fulfilment of at least n′ out of the n primary goals of the mission, and so on.
Most of the questions about a PMS can be answered by evaluating the so-called Reliability at mission completion time, that is, the probability, denoted with R throughout this dissertation, that the PMS is in a successful condition (a condition that fulfils the success criterion) at the time the last phase ends. This measure, which is also variously called probability of survival, or mission Reliability, is a reasonable way of stating requirements for some classes of PMSs. The computation of R requires modelling the transient evolution of the PMS during the whole mission, to keep track of the events that can impair its ability to perform properly, and of the healthy/failed state of the components.
More complete information can be obtained by evaluating the Reliability of the PMS at each phase completion time, denoted as Ri, i = 1, 2, …, n, hereafter. This kind of evaluation can be useful to understand the relative criticality of each single phase within the PMS mission. Drops in these Reliability figures help in pointing out possible bottleneck phases, those to which the overall Reliability at mission completion time R is most sensitive.
The complete Reliability function, defined as the probability that the PMS is successfully performing at time t, for any time t ≥ 0 before the end of the mission, will be denoted with R(t). This real-valued function provides an accurate description of the PMS evolution over time. It is worth observing that, as a result of the different failure criteria and reconfigurations of the various phases, the function R(t) may be discontinuous.
Another kind of measure that can be interesting for PMSs is a performability metric [9, 14, 55, 56, 62, 75, 82], which takes into account a measure of the total value produced by the system in the mission. Performability-oriented measures are especially appropriate for multi-objective missions of PMSs, where a range of scenarios is possible between complete failure, bringing no benefits, and completely successful operation. For such PMSs, a client might be interested in the expected utility produced over a mission. For performability modelling, a reward structure must first be defined, by assigning values to the different phases and/or to different activities/states of the PMS inside a phase. A reward associated with a state of the phase model represents a benefit that is obtained by the PMS for the time spent in that state. For instance, suppose a state models an operative configuration for the sampling of relevant data in a PMS for scientific purposes. The longer the time spent in that configuration, the larger the amount of information collected. If we are interested in quantifying a measure of the value achieved with the gathered data, we can assign a reward w to the state representing the configuration, and evaluate the total reward accumulated while within the phase. A reward value can also be assigned to a transition, meaning that the benefit is obtained with the completion of the activity the transition represents.
Different applications require different reward functions. A reward function must be chosen to be realistic for a specific real-world situation, and no single reward model is appropriate in general. Even considering only additive reward structures, there are applications where the cost of a mission failure is large compared to the gain from a successful mission. For instance, in a civil avionics system, where each flight constitutes a mission, the failure of a mission implies a cost varying between that of aborting a flight and rescheduling or compensating the passengers, and that of a crash. A completely different scenario would be a deep space probe, where, after the initial investment, each unit of time of productive flight until the spacecraft fails for good may be considered as added value, thus allowing some degree of utility even if the mission does not complete successfully. In this latter case, the reward accumulated during the whole mission, and the cumulated reward function W(t) at time t, for any time t ≥ 0 before the end of the mission, are two measures of interest to assess the dependability of the PMS.
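For an additive per-phase reward structure of the kind just described, the expected cumulated reward takes a particularly simple form. The helper below is ours, and it makes the illustrative simplifying assumption that each phase's reward is earned only upon successful completion of that phase, with the completion probabilities supplied by a separate Reliability analysis.

```python
def expected_total_reward(completion_probs, rewards):
    """E[W] for an additive per-phase reward structure: each phase
    contributes its reward weighted by the probability that the phase
    completes successfully (illustrative simplification)."""
    return sum(p * r for p, r in zip(completion_probs, rewards))
```

For instance, with rewards (0, 0, 1, 10) in units of w and completion probabilities (1, 1, 0.9, 0.85), the expected total reward is 0.9 + 8.5 = 9.4 (in units of w).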
2.4 An application example
In the following we present an example of a generic PMS application. This application is inspired by a real PMS space application that we thoroughly studied in [15] with one of the modelling approaches we are going to introduce in the second part of the thesis. Here, we modify that PMS to serve the purposes of this dissertation. The resulting hypothetical PMS application includes most of the typical features of the PMSs that have been described in this chapter.
This example of PMS will be conveniently used to exemplify, on a common case study, the various methodologies we will be dealing with throughout this dissertation. To point out the applicability limits of each of these methodologies, we intentionally define a PMS that is challenging, in that it contains many aspects that usually pose relevant issues for the modelling and evaluation activities.
We consider a PMS that has to execute four phases, each of whose duration is a random variable having general distribution Fi(t) with average value τi, i = 1, 2, …, 4. Phase 3 represents a secondary though relevant objective of the mission, whereas the activities performed within phase 4 are the primary goals to be achieved.
The various phases require different levels of dependability, and consequently have different failure criteria. To meet these dependability requirements, the PMS is equipped with four redundant identical processors that can be used in various configurations. While within a given phase, the system uses at most those processors that are necessary to reach the ideal configuration for the phase, that is, the configuration that best fits the requirements of the activities to be executed within the phase. The number of active processors currently employed is denoted with a. The unused processors are turned off and act as cold spares. The number of spare processors is denoted with s. The different requirements of the various phases and the corresponding failure criteria are shown in Table 2.1, together with all the other phase-dependent parameters that define the PMS.
During each phase, active processors are subject to faults. All the processors are assumed to exhibit the same stochastic behaviour with respect to fault occurrences. Each processor fails independently of the others, and the time to failure is exponentially distributed with a rate that is constant within a phase. However, the same processor can have different failure rates in different phases. The phase-dependent failure rates are listed in Table 2.1. The number of faulty processors is denoted with f. Spare processors are not affected by faults.
Phase | Ideal configuration | Minimal requirement | Failure rate | Reward | Next phase | Expected duration | Starting criterion
  1   |       a = 3         |       a = 2         |      2λ      |   0    |     2      |        τ1         |         -
  2   |       a = 2         |       a = 1         |      λ       |   0    |   {3, 4}   |        τ2         |         -
  3   |       a = 3         |       a = 3         |      5λ      |   w    |     4      |       τ3(f)       |       f = 0
  4   |       a = 3         |       a = 3         |      2λ      |  10w   |     -      |        τ4         |       f > 0

Table 2.1: Phase-dependent parameters of the PMS
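Table 2.1 can be encoded directly as a small data structure, which is convenient when sketching analyses of this example. The encoding below is ours: λ and w are kept as symbolic units (multipliers), and the expected durations are omitted.

```python
from dataclasses import dataclass

@dataclass
class Phase:
    ideal: int            # ideal number of active processors
    minimal: int          # minimal requirement (failure criterion threshold)
    rate_mult: int        # failure-rate multiplier of the base rate lambda
    reward: int           # reward in units of w
    next_phases: tuple    # possible successor phases

# Hypothetical encoding of Table 2.1
PHASES = {
    1: Phase(ideal=3, minimal=2, rate_mult=2, reward=0,  next_phases=(2,)),
    2: Phase(ideal=2, minimal=1, rate_mult=1, reward=0,  next_phases=(3, 4)),
    3: Phase(ideal=3, minimal=3, rate_mult=5, reward=1,  next_phases=(4,)),
    4: Phase(ideal=3, minimal=3, rate_mult=2, reward=10, next_phases=()),
}

def next_phase(current, faulty):
    """Starting criteria of Table 2.1: leaving phase 2, phase 3 requires
    f = 0 and phase 4 requires f > 0; all other successors are unique."""
    succ = PHASES[current].next_phases
    if current == 2:
        return 3 if faulty == 0 else 4
    return succ[0] if succ else None
```

For example, `next_phase(2, 0)` selects phase 3, while `next_phase(2, 1)` skips it in favour of phase 4.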
Faulty processors are removed from the pool of the active ones. To recover the ideal configuration for the current phase, one or more reconfigurations are tried, with the activation of a spare component. This activity requires a negligible amount of time, and may either succeed, with probability c, or fail, with probability 1 − c. Should it fail, the spare processor that does not switch on is declared faulty. However, the PMS is able to perform its vital functions as long as the configuration does not degrade below the minimal requirement for the phase, which in fact represents the failure criterion for the phase, as shown in Table 2.1.
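Within a single phase, the failure and reconfiguration behaviour just described induces a small continuous-time Markov chain. The sketch below solves its transient equations numerically for the phase survival probability; it is a deliberate simplification of the example (repairs ignored, a single spare activation attempt per failure), and all names are ours.

```python
import numpy as np

def phase_survival(n0, ideal, minimal, lam, c, t, steps=4000):
    """Probability that one phase survives for time t. State = number of
    non-faulty processors; active = min(state, ideal); each failure (rate
    active*lam) triggers at most one spare activation succeeding with
    coverage c. Dropping below `minimal` active processors is absorbing
    failure. Repair is ignored in this simplified sketch."""
    F = n0 + 1                          # absorbing failure state
    Q = np.zeros((n0 + 2, n0 + 2))
    def dest(m):                        # state reached with m non-faulty left
        return m if min(m, ideal) >= minimal else F
    for n in range(1, n0 + 1):
        if min(n, ideal) < minimal:
            continue                    # already below the failure criterion
        rate = min(n, ideal) * lam
        if n > ideal:                   # a cold spare is available
            Q[n, dest(n - 1)] += rate * c        # spare switches on
            Q[n, dest(n - 2)] += rate * (1 - c)  # spare declared faulty too
        else:
            Q[n, dest(n - 1)] += rate
        Q[n, n] -= rate
    p = np.zeros(n0 + 2)
    p[n0] = 1.0
    h = t / steps                       # RK4 integration of dp/dt = p Q
    for _ in range(steps):
        k1 = p @ Q
        k2 = (p + h / 2 * k1) @ Q
        k3 = (p + h / 2 * k2) @ Q
        k4 = (p + h * k3) @ Q
        p = p + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return 1.0 - p[F]
```

As a check, with perfect coverage (c = 1), four processors and ideal = minimal = 3, the time to failure is the sum of two exponentials of rate 3λ, so the survival probability matches the Erlang value e^{-3λt}(1 + 3λt).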
After a period of time that cannot be determined a priori, a faulty processor can get repaired. Such a repair may require the replacement of some internal components, or the reinstallation of parts of the software, in the case of a permanent fault. For a fault of transient nature, a rollback operation is usually sufficient to recover a consistent state of the processor. The repair facilities are shared in the system, and faulty processors are repaired sequentially, one after the other, on a First Come First Served basis. The random time needed for the repair is assumed to follow a general distribution law, with cumulative distribution function μ(t), t ≥ 0. Repaired processors become spares or active processors depending on the requirements of the current phase.
When a new phase begins, the PMS needs to update its configuration to reach the most
adequate one with respect to the requirements of the phase. This reconfiguration may
trigger the activation of processors that are kept in a spare state. Since the switching on of
a processor can fail, there is the possibility that a phase fails at its very beginning.
A certain level of flexibility is foreseen for the mission of the PMS: the decision on which phase has to be performed next can be taken dynamically, depending on the state of the system. Therefore, as shown in Table 2.1, a phase can have multiple possible subsequent phases, like phase 2, which can be followed either by phase 3 or by phase 4. The dynamic decision is taken at the end of the preceding phase, that is phase 2, by evaluating a starting criterion that enables the beginning of operation of only one among the possible phases. This selection mechanism makes it possible to skip phases that could lead to excessively hazardous situations for the PMS, or to sacrifice some secondary objectives for the sake of a more relevant phase to be performed later during the mission. The possible paths of the mission profile are conveniently represented by the tree shown in Figure 2.1, where the branching points represent the points at which a selection among alternative phases is performed. This tree plays a relevant role in the application of the hierarchical modelling methodology.
[Figure: tree of mission phases. Phase 1 is followed by phase 2, which branches either to phase 3 and then phase 4 (total accomplishment), or directly to phase 4 (partial accomplishment).]
Figure 2.1: Dynamic selection of the phase to be performed
For this application example, we assume that the secondary objective of the mission, that is phase 3, is to be skipped if any of the processors is faulty, as dictated by the starting criterion, because during phase 3 the PMS active processors are subject to a very intense stress that results in a high failure rate. This selection mechanism makes it possible to skip phase 3 when it could lead to excessively hazardous situations for the PMS; the secondary goal of the mission is sacrificed for the sake of a more reliable execution of the primary objective represented by phase 4.
If the starting criterion for phase 3 is not fulfilled, then the PMS proceeds directly to the primary objective represented by phase 4, as shown in Figure 2.1. Otherwise, both objectives of the mission are pursued. Moreover, the PMS dynamically adjusts the duration of phase 3 after the phase has already started, if a change in the operational configuration occurs. We suppose that phase 3 has an expected duration τ3(f) that is a function of the number of faulty processors f. Precisely, if f = 0, then τ3(f) = τ3, that is, the phase lasts for the period of time determined by the distribution function F3(t). However, as soon as the first processor fails, that is, when f > 0 holds, the duration of the phase is changed so that the new expected duration of the phase is τ3(f) = τ3′, with τ3′ << τ3. In other words, if the configuration of the PMS gets degraded, then phase 3 is consequently shortened, to avoid endangering a dependable execution of the primary objective. However, if the repair of the faulty processor is completed before the PMS leaves the phase, then the duration of phase 3 is updated again to the original value τ3(f) = τ3.
Therefore, not only is the sequence of phases performed by the PMS not fixed, but the overall duration of the mission itself can change dynamically, depending on the availability of the processors.
The measure of interest for the PMS are the probability R of successfully completing the
mission, and the whole probability function R( t ). Moreover, to evaluate the probabilities
of the different accomplishment level of the mission, that is the achievement of both the
objectives of the mission, or that of the primary goals solely, we define a very simple re-
ward structure associated with the phases of the system. As shown in Table 3.1, the two
first phases have a null associated reward, in that their correct execution, though essential
to carry on with the mission, brings no practical benefits. The primary objective is esti-
mated to be ten times more valuable than the secondary objective. The total reward W
cumulated during the entire mission is the Performability metric of interest.
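To make this reward structure concrete, the following sketch cumulates the mission reward W from the set of phases completed successfully. The numeric rewards (0, 0, 1 and 10) are illustrative values consistent with the ten-to-one ratio between the primary and secondary objectives stated above; they are our assumption, not figures taken from the thesis.

```python
# Illustrative reward structure for the four-phase mission: phases 1 and 2
# carry no reward, the secondary objective (phase 3) is worth 1, and the
# primary objective (phase 4) is worth ten times as much.  These values
# are an assumption for illustration only.
PHASE_REWARD = {1: 0.0, 2: 0.0, 3: 1.0, 4: 10.0}

def mission_reward(completed_phases):
    """Total reward W cumulated over the phases completed successfully."""
    return sum(PHASE_REWARD[p] for p in completed_phases)

# Both objectives achieved:
print(mission_reward([1, 2, 3, 4]))   # 11.0
# Primary objective only (secondary phase 3 skipped):
print(mission_reward([1, 2, 4]))      # 10.0
```

The accomplishment levels of the mission then correspond directly to distinct values of W.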
Chapter 3
State-of-the-art and open issues in the analysis of PMSs
Systems showing a phased behaviour offer challenges in dependability modelling and
evaluation. The analysis of PMSs is indeed complicated by their dynamic behaviour,
which may change considerably from phase to phase. A different model should be in
principle built to represent the behaviour of a PMS inside each phase. This point affects
not only the modelling step, but the evaluation step as well. The model that represents
the behaviour of a PMS within a phase cannot be solved independently from the other
phase models. Indeed, the same system components are to be used inside the various
phases, thus introducing dependencies among the phase models. Moreover, the phase
changes very often reflect the occurrence of particular events that do not depend on the
system state itself, but are rather triggered by external events, such as time-out excep-
tions, or are preplanned for the entire system life-time.
To better understand the contribution offered by this thesis, in this chapter we present a
brief survey of the most significant results that have appeared in the literature for the
analysis of the dependability attributes of PMSs. We first consider the methods based on
combinatorial approaches, as Reliability block diagrams and Fault trees, and then the
more recent ones based on Markov models. The two kinds of approaches have both ad-
vantages and drawbacks. The combinatorial approach is conceptually simple, but it tends
to become computationally expensive as the number of phases increases. More impor-
tantly, combinatorial models can only deal with non-repairable systems. The Markov ap-
proach is well suited to representing the complex dependencies among components and the
dynamic structure of PMSs, but it generally suffers from a state explosion problem when
applied to large systems. Nevertheless, in our opinion the Markov approach does offer
the potential needed to adequately address the features of the more complex instances
of PMSs. We present a comparison of the methods based on the Markov chain approach,
by briefly discussing the applicability of each methodology to the modelling and solution
of the example PMS introduced in Chapter 2. This helps in understanding the main
advantages and drawbacks of the methods, and in clarifying which points need to be ad-
dressed to extend the applicability of the results that have been proposed up to now in
the literature. Finally, we point out the issues that will represent the goals of the research
conducted throughout this dissertation.
3.1 Literature survey
PMSs have been widely investigated. Starting from the early studies [17, 86], which as-
sumed fairly simple phase dependencies of system components, many works have been
proposed which resort to Reliability block diagrams and Fault tree models [18, 34, 62,
81] (only for non-repairable systems) and to Markov chain models [3, 5, 31, 56, 79, 80].
A different approach based on the Stochastic Activity Networks modelling paradigm is
adopted in [7] and presented at the end of the survey; it represents a study closely re-
lated to some points of the investigation presented in this dissertation.
One of the earliest studies that propose a general methodology for the analysis of PMS is
found in the paper [34] by Esary and Ziehms. They defined therein a method for the es-
timation of the Reliability of Phased Mission Systems, based on Reliability block diagram
analysis. The system components are assumed to fail independently from each other, and
failed components are not repaired. They considered a transformational approach that
takes as input the separate Reliability block diagram of the individual phases, and gener-
ates a single Reliability block diagram representative of the overall mission. The various
failure criteria of the different phases are captured by the separate Reliability block dia-
grams specific for the single phase, and are then combined to obtain the failure criterion of
the system. In this way, a multi-phase mission is reduced to an equivalent, synthetic,
single-phase system, for which standard solution algorithms can be applied to compute
mission Reliability. A number of new events need to be introduced in the final Reliability
block diagram during the transformation step, to represent the dependencies among sys-
tem components from a phase to the successive ones. This increase in the number of
Reliability block diagram elements is partially compensated by a clever definition of the
set of minimal cuts, which can be reduced by exploiting the sequential order of execution
of the phases.
The paper by Burdick et al. [18] compares the exact results provided by the method of
Esary and Ziehms with several less computationally demanding approximate methods
based on Fault trees. They considered the same system scenario as Esary and Ziehms did
in the work cited above, that is non repairable systems with components failing indepen-
dently from each other. The measure of interest was the Unreliability of the system. The
comparison was carried out on an application example, the emergency core cooling sys-
tem for a boiling water nuclear reactor.
A similar approach was presented by Pedar and Sarma in [62] to evaluate the effective-
ness of a distributed fault-tolerant avionics computing system taken as an application ex-
ample, still under the hypothesis of independently failing and non-repairable compo-
nents. They considered an analysis based on Reliability block diagrams and obtained both
exact and bound results for the system Reliability. They developed a systematic proce-
dure to cancel out the common events in earlier phases that are accounted for in later
phases. Different accomplishment levels for the mission were considered, ranging from
the regular completion of the flight up to the fatal crash of the aircraft. The unpredictability
of the external environment was included in the analysis as well, through the insertion
of a dummy phase at the landing time.
Somani and Trivedi proposed in [81] an analysis technique that combines Fault trees and
Boolean algebraic methods to efficiently compute the probability of failure in complex
PMSs. Component failure rates, configuration and failure criteria may vary from phase to
phase. The failure criterion of each phase is expressed as a Fault tree, as in the method of
Esary and Ziehms, but a single monolithic model is not built. Rather, phases are
handled one at a time, thus obtaining a computationally efficient solution technique. A
phase algebra was developed to account for the effects of variable configuration and fail-
ure criteria from phase to phase. Quite recently, the method of Somani and Trivedi has
been extended and improved by Ma and Trivedi [51], who developed a much more effi-
cient algorithmic strategy for the application of the phase algebra.
The method presented by Meyer et al. in [56] for the analysis of an air transport mission
is the first example of Markov based approach for PMSs found in the literature. The au-
thors consider a PMS formed by the SIFT (Software Implemented Fault-Tolerance) pro-
cessing system plus the external environment. Similarly to the work of Pedar and Sarma
described above, different levels of accomplishment are defined for the mission of the air-
craft, and the measure evaluated is a performability metric [55]. The system and the ex-
ternal environment are modelled with a hierarchical approach, where three levels of mod-
elling are devised: the computer level, the aircraft level and the mission level. Specific in-
terlevel translations are performed to obtain the performability measure at the highest
level (mission) from the state probabilities of the lower level ones. The various phases are
modelled at the lowest level, the computer level. Within a given phase, the stochastic pro-
cess representing the PMS is a continuous-time homogeneous Markov process. These in-
traphase processes are allowed to differ from phase to phase. The overall solution is ob-
tained by sequentially evaluating all the phases, by using as initial probability distribution
of the currently analysed phase the final probability distribution of the previously solved
phase. The probabilities of the interphase transitions taking place at the time of phase
change are specified by interphase transition matrices. These matrices perform the map-
ping of the state probability distribution from one phase to the successive one. To apply
this method to the modelling of our example of PMS, many simplifying assumptions
must be introduced. First of all, the only distribution of the phase duration that can be ac-
commodated is the deterministic one. Therefore, the duration of phase i must be approx-
imated with the expected value τi of the distribution Fi( t ), i = 1, 2, 3, 4. The consequences
of such approximation can be more or less relevant for the results of evaluation, mainly
depending on the variance of the original distribution Fi( t ). Moreover, only a static
profile of the mission can be considered with the method described above: a model for the
mission can be built only forcing a sequential execution of the four phases. The dynamic
selection of the next phase to be performed and the other flexibility characteristics of the
PMS taken as example are hence to be disregarded. Last, since phase models are built as
homogeneous Markov chains, the time needed for the repair of a faulty processor must be
modelled with an exponential random variable, thus introducing a further approximation
of the PMS behaviour. Once the simplified model of the PMS has been defined, the eval-
uation of both the Reliability and Performability measures of interest can be carried out
within the solution framework proposed by the method.
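The sequential phase-by-phase evaluation described above can be sketched numerically as follows. The generator matrices, phase durations and interphase matrix are invented toy data, and the transient solver uses uniformization, a standard technique for homogeneous continuous-time Markov chains; none of the identifiers come from [56].

```python
import numpy as np

def transient(pi0, Q, t, eps=1e-12):
    """Transient distribution pi(t) of a homogeneous CTMC via uniformization:
    pi(t) = sum_k e^{-qt} (qt)^k / k! * pi0 P^k, with P = I + Q/q."""
    n = Q.shape[0]
    q = max(-Q.diagonal().min(), 1e-12)      # uniformization rate
    P = np.eye(n) + Q / q
    weight = np.exp(-q * t)                  # Poisson weight for k = 0
    term, out, k = pi0.copy(), weight * pi0, 0
    while weight > eps or k < q * t:
        k += 1
        weight *= q * t / k
        term = term @ P
        out = out + weight * term
    return out

# Toy two-phase mission: states = 2, 1 or 0 working processors, no repair;
# all rates and durations are invented for illustration.
Q1 = np.array([[-0.2,  0.2, 0.0],
               [ 0.0, -0.1, 0.1],
               [ 0.0,  0.0, 0.0]])
Q2 = np.array([[-0.4,  0.4, 0.0],
               [ 0.0, -0.3, 0.3],
               [ 0.0,  0.0, 0.0]])
M12 = np.eye(3)                              # interphase matrix: states map 1:1

pi = np.array([1.0, 0.0, 0.0])               # all processors working initially
pi = transient(pi, Q1, 5.0) @ M12            # solve phase 1, map into phase 2
pi = transient(pi, Q2, 3.0)                  # solve phase 2
print(pi)                                    # final state distribution
```

The final distribution of one phase, multiplied by the interphase matrix, becomes the initial distribution of the next phase, exactly as in the scheme of Meyer et al.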
A similar approach has been used by Arlat et al. in [5] to evaluate the Reliability and
Safety figures of a space application. The various phases are separately modelled by
continuous-time homogeneous Markov process, and the mapping of the state probability
distribution from one phase to the next is performed by hand. A sensitivity analysis
based on multiple evaluations with respect to the variations of some of the most crucial
parameters, as the coverage of fault-tolerance strategies, is also performed. Concerning the
application of the methodology to the example of PMS, the same considerations as those
we have made for the previous method apply. Indeed, again only a constant duration of
the phases can be considered, the profile of the mission must be static, and only exponential
activities can be exactly dealt with. Performability-oriented metrics are not taken into ac-
count.
The work presented by Alam and Al-Saggaf in [3] defines two different methods, to deal
with repairable PMSs having deterministic and random phase duration, respectively.
Both system failure criteria and failure rates of components are allowed to vary from
phase to phase. In the case of deterministic duration, an exact analysis methodology is
defined, which models the multi-phase system with a single Markov process. This overall
process is solved as many times as the number of phases the PMS consists of, and the
initial state occupation probabilities of the system at the time a new phase starts are ob-
tained from the final state occupation probabilities of the preceding phase. For PMSs
having a random duration of the phases, an approximate method is proposed. First, the
probability distribution functions of the phase change times are computed, and the ex-
pected values for the phase change times are obtained. Then, the single Markov model is
again utilised, but now the state occupation probabilities of the PMS at the beginning of
each phase are approximated with the state occupation probabilities evaluated at the ex-
pected phase change times. When applied to our example of PMS, the method of Alam
and Al-Saggaf makes it possible to consider random phase durations, but at the expense of an
exact solution. Unfortunately, no estimates of the error introduced by the approximation
are provided, thus heavily limiting the confidence that can be placed in the evaluation re-
sults. The flexible profile of the mission and the non-exponential duration of the repair
can not be modelled with this methodology. Only Reliability measures are taken into
consideration in the Alam and Al-Saggaf method.
Smotherman and Zemoudeh presented a method [79] that builds a single non-homoge-
neous continuous-time Markov model to carry out the analysis of PMSs. The behaviour
of the system inside each phase is represented by a different Markov chain having a sepa-
rate subset of the whole state space for the system. The state transitions are described in
terms of general random variables, and are generalised to include the events representing
the phase change. Thus, state dependent phase changes, random phase duration, time
varying failure and repair behaviour are all easily modelled. The system of differential
equations describing the evolution of the non-homogeneous Markov model is numerically
solved to evaluate the dependability measures. Concerning the application of this method
to the modelling of our example of PMS, we remark the generality of the assumptions
made, which accommodate most of those features that pose problems to the other
methodologies. In fact, apart from the dynamic selection of the mission profile, the ran-
dom duration of the phases and the non-exponential duration of the repair activity are ex-
actly modelled. However, as we will see in more detail in the following, the generality of the
hypotheses under which the method applies comes at a heavy price in terms of solution
computational cost.
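A minimal numerical sketch of this single-model approach: the forward equations dπ(t)/dt = π(t)Q(t), with a generator that changes at the phase boundary, are integrated with a classical fourth-order Runge-Kutta scheme. The two-phase generator, the rates and the boundary at t = 5 are invented toy data, not taken from [79].

```python
import numpy as np

def Q_of_t(t):
    """Piecewise-constant generator: toy two-phase mission, phase change
    at t = 5.  State 0 = operational, state 1 = failed (absorbing)."""
    Q1 = np.array([[-0.2, 0.2], [0.0, 0.0]])   # phase 1: failure rate 0.2
    Q2 = np.array([[-0.5, 0.5], [0.0, 0.0]])   # phase 2: harsher, rate 0.5
    return Q1 if t < 5.0 else Q2

def solve(pi0, t_end, h=1e-3):
    """Integrate the forward equations dpi/dt = pi Q(t) with classical RK4."""
    pi, t = pi0.astype(float), 0.0
    while t < t_end - 1e-12:
        step = min(h, t_end - t)
        k1 = pi @ Q_of_t(t)
        k2 = (pi + 0.5 * step * k1) @ Q_of_t(t + 0.5 * step)
        k3 = (pi + 0.5 * step * k2) @ Q_of_t(t + 0.5 * step)
        k4 = (pi + step * k3) @ Q_of_t(t + step)
        pi = pi + (step / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += step
    return pi

pi_end = solve(np.array([1.0, 0.0]), 8.0)
print(pi_end)   # operational / failed probabilities at mission end
```

Note that, because phase changes are folded into Q(t), no explicit interphase mapping is needed; the price is the numerical integration of the whole system of equations over the mission time.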
Similarly to the method of Smotherman and Zemoudeh, Dugan proposed in [31] a method
in which a single Markov chain with state space equal to the union of the state spaces of
the individual phase models is generated. The phase models are automatically obtained
from a Fault tree description of the success/failure criteria for the single phase. The tran-
sition rates are parameterised with phase numbers; thus, strictly speaking, the model is a
non-homogeneous Markov chain. The model is solved once for each of the phases the
PMS is formed by, and only those transitions whose label corresponds to the one of the
phase being currently handled are considered. The final state occupation probabilities of
one phase become the initial state occupation probabilities for the next phase. For the
applicability of Dugan’s approach, the failure criteria of the various phases must be
such that if a state is a system failure state in a phase, then it can not become a proper
state for the system in a later phase. This constraint is not respected by our example of
PMS, for which phase 2 is less demanding than phase 1 and consequently a failed config-
uration in phase 1 is a feasible configuration during phase 2. Apart from this issue, which
can not be solved without relevant changes to the structure of the PMS, the same mod-
elling problems encountered by the methods of Meyer et al. and Arlat et al. are also found
for Dugan's method: random phase durations, non-exponential distributions of the time
needed for the repair activities and flexible mission profiles can not be taken into account.
The evaluation of the Reliability is the sole objective of the method.
Somani et al. [80] presented a computationally efficient method to analyse multi-phased
systems. With this approach, as for the methods of Meyer et al. [56], and Arlat et al. [5],
Markov chains for different phases are developed and solved separately, instead of a sin-
gle Markov chain. The issue of varying failure criteria and changes in system configura-
tion from phase to phase is addressed by providing an efficient mapping procedure at the
transition time from one phase to another. While analysing a phase, only the states
relevant to that phase are considered, thus limiting the size of each individual Markov
model, and reducing the computation time without compromising the accuracy of the
evaluation. Phases can be of fixed or random duration, and in the latter case durations are
drawn from the Beta distribution. The same approximate approach as the one described
above by the method of Alam and Al-Saggaf [3] is adopted to deal with random phase du-
ration. Therefore, this method makes it possible to model the random duration of the phases
of our example of PMS, at least when they belong to the family of Beta distributions,
but it incurs the same approximation as the method of Alam and Al-Saggaf, which does
not bound the error introduced. The other features of the PMS taken as example, that is
the dynamic profile of the mission and the non-exponential duration of the repair, can not
be modelled with this method. Once again, the Reliability of the PMS is the sole measure
that can be evaluated.
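The approximation adopted for random phase durations can be illustrated directly: each Beta-distributed duration is replaced by its expected value, and the resulting expected phase-change instants are then used as if they were deterministic. The shape parameters and the rescaling interval below are invented for illustration.

```python
# Expected value of a Beta(a, b) duration rescaled to the interval
# [0, t_max]: E[T] = t_max * a / (a + b).  Parameters are illustrative.
def expected_phase_duration(a, b, t_max):
    return t_max * a / (a + b)

# Three phases with different Beta shapes, each at most 10 time units long:
phases = [(2.0, 2.0, 10.0), (5.0, 1.0, 10.0), (1.0, 4.0, 10.0)]
change_times = []
t = 0.0
for a, b, t_max in phases:
    t += expected_phase_duration(a, b, t_max)
    change_times.append(t)     # expected phase-change instants
print(change_times)
```

The state occupation probabilities are then evaluated at these expected change times, which is exactly where the unbounded approximation error noted above enters.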
Last, we describe the study by Aupperle et al. in [7]. The authors employed the
METASAN [75] modelling tool, based on Stochastic Activity Networks (SANs), to rep-
resent with two separate models the failure/repair processes of system components and
the changing environment which is responsible for the phased behaviour. The two sub-
models defined for the PMS are easily combined into a single overall model that is able to
capture the complex phase dependent behaviour by exploiting the modelling features of
the SANs. This approach is quite remarkable in that it contains many interesting elements
for the modelling of complex PMSs. Indeed, the SAN paradigm makes available many
useful modelling features, such as activities with generally distributed durations, state-
dependent firing rates, and guards controlling the enabling
conditions of activities. Thus, the random duration of the phases, as well as the non-ex-
ponential duration of the repair of our example of PMS can be easily modelled.
Moreover, even if this possibility is actually not explored by the authors in the paper [7],
by exploiting the modelling features of the SANs it is possible to represent the dynamic
profile of the mission, too. However, no analytical solution method is provided for the
evaluation of the SAN model of the PMS, which was instead intended to be solved by
simulation. It is worthwhile remarking that the simulative approach can be very costly for
PMSs. Indeed, PMSs usually have very long mission times, during which a number of
short duration activities are performed. These characteristics easily result in expensive
and time-consuming simulation runs required to solve the models of PMSs.
3.2 Comparison of the methods based on the Markov approach
In this section we compare the Markov chain-based approaches [3, 5, 31, 56, 79, 80] and
the one based on SANs in [7], for the dependability modelling and analysis of PMSs. The
most relevant aspects of the comparison have been deduced from the application of the
various methodologies to the example of PMS introduced in the Chapter 2, and are sum-
marised in Table 3.1. As a general remark, it is possible to affirm that very few of the
methods try to define a general methodology for the analysis of PMSs; most of them re-
main tied to the study of the particular example they present, and it has not been easy
to figure out how to apply their results to the modelling and evaluation of our example.
The studies that have appeared in the literature may be roughly classified in two major
groups, on the basis of the approach used to deal with the changes in the structure of the
system. These studies consider the definition and the solution of either a global model in-
cluding all phases as proposed in [3, 7, 31, 79] or the definition of a distinct model for
each phase of the system and a separate evaluation for each of these models as in [5, 80].
The hierarchical method of Meyer et al. [56] represents the sole exception. In the following,
we briefly explain the advantages and disadvantages of the two approaches.
The single/separate modelling of the phases is a key point that impacts most of the other
aspects: it affects the understandability of the models, the reusability and flexibility of
previously built models, the modelling of dependencies among phases and the complexity
of the solution steps. The single model approaches suffer from a lack of reusability: a
new model needs to be built if the behaviour of the system in any phase is changed or if
the phase order is changed. However, this task is considerably simpler for the SAN model
of PMS than for all the other Markov chain based approaches. As an advantage, the sin-
gle-model approach allows the exploitation of similarities among phases to obtain a com-
pact model. It is indeed often the case that the set of possible actions of the system
within a given phase is a subset of the actions performed in other phases. If we consider
our example of PMS, it could hence be convenient to build a single model in which all the
phases are properly “embedded”.
Conversely, the separate modelling of each phase allows the reuse of previously built
models of the phases. Moreover, it is easier to characterise the differences among phases,
in terms of different failure rates and different configuration requirements. The hierarchi-
cal approach combines the advantages of the single and separate modelling to keep phase
models small and easy-to-define and at the same time to provide different levels of ab-
straction at which the mission can be described and analysed.
Another relevant feature in analysing phased mission systems is represented by phase
duration. The choice between deterministic or random phase duration basically depends
on the features of the system to be analysed. The assumption of constant phase duration
seems to hold for a wide range of PMS applications, as for instance space applica-
tions, where phases are pre-planned on the ground. Moreover, this assumption enables an ex-
act solution of the models through the mapping of the transient state probabilities from
one phase to the next. When phases of random duration are to be considered the analysis
becomes more complicated. Exponential distributions do not appear to be a suitable way
to model phase duration due to their long tail behaviour, while using non-exponential ones
complicates the analysis and may lead to approximate solutions as in [3, 80], or may re-
quire the numerical solution of the associated set of differential equations, as in [79], or a
simulative solution as in [7].
Method                        Type of modelling   Phase duration   Treatment of dependencies   Complexity (upper bound)
Meyer et al. 1979 [56]        hierarchical        fixed            interphase matrices         O( Σ_{i=1..n} Ci² qi τi )
Arlat et al. 1986 [5]         separate models     fixed            by hand                     O( Σ_{i=1..n} Ci² qi τi )
Alam et al. 1986 [3]          single model        random           implicit                    O( C² q Σ_{i=1..n} τi )
Smotherman et al. 1989 [79]   single model        random           implicit                    O( n² C² q Σ_{i=1..n} τi )
Aupperle et al. 1989 [7]      single model        random           implicit                    simulation
Dugan 1991 [31]               single model        fixed            implicit                    O( C² q Σ_{i=1..n} τi )
Somani et al. 1992 [80]       separate models     random           interphase matrices         O( Σ_{i=1..n} Ci² qi τi )

Table 3.1: Markov chain based methods for PMS
The treatment of dependencies among phases requires a mapping of probabilities from
states of a phase to states of the next phase, and represents a relevant issue to be ad-
dressed. The single model permits representing implicitly the mapping of probabilities
from one phase to another phase because phase changes and state changes are modelled
together. However, this joint modelling may result in a non-homogeneous Markov model,
as in [31, 79]. On the contrary, the separate approach requires us to explicitly perform
the mapping from state to state for each phase change. This job is conceptually simple
but can be cumbersome and becomes a potential source of errors for large models. In [56,
80] the mapping is realised through proper interphase matrices, whereas in [5] the map-
ping is carried out manually.
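The explicit state-to-state mapping can be encoded as a matrix M whose entry M[s, s′] gives the probability that a system in state s at the end of one phase starts the next phase in state s′. A toy sketch follows; the state labels and the merging rule are our own illustration, not taken from any of the surveyed methods.

```python
import numpy as np

# Phase A distinguishes 3 states (2, 1 or 0 working processors); phase B's
# failure criterion only distinguishes "operational" (>= 1 working) from
# "failed", so two of phase A's states merge into one state of phase B.
states_A = [2, 1, 0]
states_B = ["operational", "failed"]
to_B = {2: "operational", 1: "operational", 0: "failed"}   # deterministic map

M = np.zeros((len(states_A), len(states_B)))
for i, s in enumerate(states_A):
    M[i, states_B.index(to_B[s])] = 1.0      # each row sums to 1

pi_end_A = np.array([0.7, 0.2, 0.1])         # final distribution of phase A
pi_start_B = pi_end_A @ M                    # initial distribution of phase B
print(pi_start_B)                            # -> [0.9, 0.1]
```

Building such matrices systematically, rather than by hand as in [5], is precisely what removes this source of errors for large models.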
To carry out a comparison of the computational complexity that the various methods re-
quire, we consider an estimation of the cost required to evaluate the Reliability at mission
completion time R for the four phase mission of our example of PMS. Since none of the
considered methods is able to deal with the flexibility characteristics, we neglect such fea-
tures of the mission and assume that the four phases are sequentially performed. For the
sake of generality, we give the computational complexity expressions for a mission having
n phases; letting n = 4 provides the values for this particular example. Moreover, to
provide a consistent example that can be analysed with all the different methodologies,
we suppose that phase i has a constant duration of τi units of time, i = 1, 2, 3, 4, and
also consider a fixed, time-invariant repair rate μ for the time to repair, which corre-
sponds to an exponential distribution of the repair activity duration.
Suppose phase i is modelled by a Markov chain whose state space Si has Ci states,
i = 1, 2, …, n. Both the methods in [3] and in [31] solve a unique Markov model whose
state space is the set given by the union of the single state spaces of each phase. The size
C of the unified state space is at least max_{i=1,…,n} Ci, and can be at most Σ_{i=1..n} Ci. The
single Markov chain is solved n times for n phases. If the uniformization method [68] is
used to evaluate the transient probability vector, the overall complexity of the solution is
O( C² q Σ_{i=1..n} τi ), where q is defined as the maximum absolute value among the entries
of the generator matrix of the whole Markov model. The method of Smotherman and
Zemoudeh [79] requires the solution of a set of nC differential equations. The authors
adopt a Runge-Kutta solution algorithm, at a cost bounded by O( n² C² q Σ_{i=1..n} τi ),
where q is defined as above. The complexity of the solution for the method in [7] can not
be compared to the others listed in Table 3.1, because the SAN model is to be solved by
simulation. Therefore, the computational cost is affected by a number of factors, such as
the width of the confidence intervals, the method used for the statistical analysis, etc. The
hierarchical method in [56], as well as those that adopt the separate modelling approach
[5, 80], require a number of operations which is dominated by O( Σ_{i=1..n} Ci² qi τi ),
where qi is the maximum absolute value among the entries of the Markov chain generator
matrix of phase i. This computational complexity grows linearly with the number of phases.
Therefore, the methods that use separate solution of the phase models are, in general,
more efficient than the ones that use a combined model. Note that all the computational
costs reported here are to be intended as upper bounds. In fact, the matrices obtained are
quite often sparse, thus allowing cheaper solution techniques [68].
Just to give a flavour of the relative efficiency of the various methodologies, we show in
Figure 3.1 the plots of the computational complexities as functions of n, by fixing the
values of the other model parameters. Precisely, we set the state space cardinalities
Ci = 5, the maximum absolute value of the generator matrix entries q = qi = 1, and the
phase durations τi = 1, i = 1, 2, …, n. We assume that C = Ci( n + 1) / 2 holds for the
methods that adopt the single modelling approach, that is, we suppose that the union of
the various state spaces of the PMS over the n phases results in the average cardinality
between the minimum value Ci and the maximum one nCi.
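With these parameter values fixed, the three complexity bounds can be tabulated directly; the sketch below evaluates them for missions of n phases (the function names are ours, chosen for readability).

```python
# Upper-bound complexities from the comparison, with Ci = 5, q = qi = 1,
# tau_i = 1 and C = Ci * (n + 1) / 2 for the single-model approaches.
def separate_models(n, Ci=5, qi=1.0, tau=1.0):
    """Separate/hierarchical approaches: O( sum_i Ci^2 qi tau_i )."""
    return n * Ci**2 * qi * tau

def single_model(n, Ci=5, q=1.0, tau=1.0):
    """Single-model approaches of [3, 31]: O( C^2 q sum_i tau_i )."""
    C = Ci * (n + 1) / 2.0
    return C**2 * q * n * tau

def smotherman(n, Ci=5, q=1.0, tau=1.0):
    """Runge-Kutta solution of [79]: O( n^2 C^2 q sum_i tau_i )."""
    return n**2 * single_model(n, Ci, q, tau)

for n in (1, 4, 10):
    print(n, separate_models(n), single_model(n), smotherman(n))
```

For n = 10 the bound for the separate modelling approaches stays in the hundreds, while that of Smotherman and Zemoudeh approaches 10⁶, which is the gap visible in the plot.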
[Log-scale plot: computational complexity (from 1.00E+00 to 1.00E+06, logarithmic vertical axis) versus the number of phases n (1 to 10), comparing the separate modelling approaches, the single-model methods of Dugan and of Alam and Al-Saggaf, and the method of Smotherman and Zemoudeh.]
Figure 3.1: Comparison of the computational complexities for PMS methodologies
Notice that the values of the computational complexities on the vertical axis follow a log-
arithmic scale. Therefore, the advantage of the separate modelling approach in terms of
solution efficiency is very relevant. This point of the comparison thus represents a dis-
criminating aspect for the choice of a solution methodology. In particular, the computa-
tional cost required by the single modelling approach of Smotherman and Zemoudeh
makes it practically feasible only for very simple PMSs.
3.3 Open issues and objectives of the thesis
From the review and the subsequent comparison carried out in the previous subsections,
we can conclude that several aspects of PMSs still represent a source of formidable
problems for the dependability modelling and analysis techniques that have been applied
for their solution.
The main concern for a dependability engineer dealing with the analysis of a PMS, is rep-
resented by the choice of a proper method that is powerful enough to be applicable to the
particular system of interest, and at the same time provides accurate results with a limited
computational effort. Unfortunately, the set of available methodologies is not uniformly
distributed over the spectrum of the possible scenarios of PMSs. A few well-defined
methodologies have been proposed for particular classes of PMSs, such as the method of
Esary and Ziehms [34] and that of Somani and Trivedi [81] for non repairable systems.
When more general classes of PMSs are to be studied, as in the case of repairable PMSs
with random phase duration, the set of the applicable methods for an exact solution is
much narrower (Smotherman and Zemoudeh's method [79]) and the computational cost becomes ex-
tremely heavy. Definitely, there is a lack of a “scalable” method, that is a method able to
effectively deal with PMS instances of different complexity, from the simplest ones to
the most general cases, by adopting a well-defined coherent modelling and solution ap-
proach.
Moreover, the available methods very often result in intricate and hardly understand-
able models, where the phased structure of the system is either merged with the mass of
other modelling elements, as in the case of single-model approaches, or split and only
implicitly taken into account, as for the separate modelling approaches. Also, these
complicated models may easily become a potential source of errors. The problem is par-
ticularly relevant for the separate modelling approach, because the explicit handling of
phase dependencies is a cumbersome and error-prone task. In this respect, we believe that
the hierarchical method proposed by Meyer et al. [56] is a promising approach.
Combining different views of a PMS at various abstraction levels helps to keep in a clear
and easy to read way both the information about the profile of the mission and that re-
garding the state of the system components.
The methods that have appeared in the literature can not accommodate several interesting
features of PMSs. In particular, they are limited to considering PMSs with completely static profiles
of their mission. The phase duration is mostly assumed to be constant, and those few
studies that relax such hypothesis do not provide exact analytic solutions. The ordering
of phases is always considered as fixed and predetermined before the beginning of the
mission. However, from the discussion in the previous chapter, it is very easy to figure
out situations in which some phases of the mission can be skipped or prematurely
aborted, or the mission diverted to secondary objectives.
The ultimate goal of this thesis is the development of computationally efficient and gen-
erally applicable methodologies for the analysis of PMSs. We intend to alleviate the ad-
ditional complexity that is introduced by the phased behaviour of PMSs with respect to
State-of-the-art and open issues in the analysis of PMSs 49
the dependability modelling and evaluation tasks for single phased systems. Conciseness, understandability, modifiability and reusability of the models are some of the key aspects we consider mandatory for a methodology to be adequate for our purposes. We will take into account flexible missions of PMSs, and we will aim at removing the constraint of constant phase duration. Also, a limited computational effort of the analysis will be regarded as a discriminating point. The asymptotic computational complexity required for the separate modelling and solution of the PMSs inside the various phases is set as an upper bound for the computational complexity of an efficient methodology for the dependability analysis of PMSs.
To address these tasks, we employ various modelling strategies and tools in the second
part of this thesis. First, we introduce the method presented in Chapter 4, which is based
on a hierarchy of Markov models and, within the same restricted scenario of PMSs with constant phase duration, proposes several original elements that relieve some of the weak
points of the methods in the literature. In particular, we attack the problem of phase de-
pendencies, introducing a systematic modelling strategy that includes their explicit repre-
sentation in the form of specific submodels. Moreover, we perform a first step towards
the introduction of a flexible profile of the mission, allowing the dynamic choice of the
next phase to be performed at the time the preceding phase ends. The computational
complexity required for the solution of the models is of the same order as the threshold
one.
In Chapter 5 we present a different approach for the dependability modelling of PMSs,
based on the class of DSPNs. The method takes advantage of the power and expressive-
ness of DSPNs to define very concise and easy-to-read models for PMSs. Remarkably, it completely removes the need to explicitly handle phase dependencies, thus avoiding such a cumbersome and error-prone modelling task. At the same time, the computational
cost of the method is not increased with respect to the fixed threshold. Moreover, since
the phase-dependent behaviour of the system is described with a simple list of marking
dependent predicates, it becomes easy to rearrange and modify the model of the PMS.
Finally, the method is supported by the existence of analytical solutions which can be
fully automated. The solution efficiency greatly benefits from the splitting of Markov
chains for different phases. This way the computational cost needed to solve the overall
model is almost completely reduced to the cheaper problem of separately solving the
PMS model inside the various phases, and the computational complexity is of the same
order of magnitude as the one of the previously proposed cheap hierarchical method.
Very flexible mission profiles can be readily modelled and analysed. The proposed solution approach for the DSPN model makes it possible to perform the analytical sensitivity analysis of a PMS, a task that, to the best of our knowledge, has not been addressed yet in the literature. If some of the parameter values of the PMS model must be varied, for instance because they are subject to some estimation error, or because different scenarios of the system are of interest, a convenient strategy of analysis is to study the sensitivity of the model with respect to the parameter variations. The main limitation of the DSPN approach is that it does not widen the applicability with respect to the methods proposed in the literature. Indeed, the same restrictive assumptions on the PMS must be satisfied to permit the applicability of the DSPN modelling approach. In particular, the duration of the phases is once again restrained to be deterministic.
The last methodology we present in Chapter 6 of this thesis has the specific goal of en-
larging the class of PMSs that can be analytically solved. We propose the class of MRSPNs as a modelling tool for PMSs, and we derive a general, exact, and efficient analytic solution method. We show that the MRSPN models are able to deal with the complex and dynamic behaviours of PMSs, representing in quite a natural way all those scenarios of PMSs previously treated by the various studies in the literature. Moreover, we show how well the MRSPN models can deal with those features of PMSs that could not be accommodated by the methods that have appeared in the literature. In particular, we pro-
vide an exact analytic solution technique that can be applied to PMSs with a phase dura-
tion drawn from a random variable of general distribution. The solution for PMSs with
deterministic phase duration is obtained as a particular case. Once again, the computa-
tional complexity is reduced to the one needed for the separate solution of the PMS in-
side the different phases. Chapter 6 also presents the analytical derivation of the sensitiv-
ity functions for the MRSPN models of PMSs.
PART II:
New methodologies for the analysis of PMS dependability
Chapter 4
Hierarchical Markov models for PMSs
In this chapter we present an approach to the modelling of PMSs with constant phase
duration, based on the hierarchical combination of various Markov models which repre-
sent different abstraction level views of the system. An example of application of this
methodology can be found in [15].
Hierarchical models offer many advantages, and have been employed in many contexts for
the analysis of various classes of processing and communication systems. For instance,
Courtois and Semal have proposed in [28, 29] exact and bound methodologies based on
the solution of a hierarchy of homogeneous Markov chain models of the system. The
seven levels of the OSI model of the communication networks have been modelled with a
hierarchical approach by Conway in [27], while the hierarchical modelling of the single
levels has been presented by Perros et al. in [63], Reiser in [69], and Schwartz in [77]. Tai et al. in [83] have applied a hierarchical modelling approach to derive the optimal duty period for on-board preventive maintenance in a space application. The method proposed by Meyer et al. [56] (introduced in Chapter 2) is the sole example of a hierarchical modelling approach for the dependability modelling of a PMS, in this particular case an air-transport application.
Here, we mainly follow the basic ideas of a methodology that has been originally pro-
posed by Nelli et al. for the analysis of complex single phased systems operating in the
railway application field [60]. The use of a hierarchy of models represents a suitable means
to master the complexity of the analysis. Different submodels of various parts of the sys-
tem can be separately solved and the obtained results aggregated to define a more compact model of the system at a higher abstraction level. The iterative application of such a pro-
cedure alleviates the state explosion problem caused by the combinatorial growth of the
model size. At the same time, the savings in the computational complexity required for
the solution of the models permits the refined modelling of many features which usually cannot be taken into account due to the limited capacity of the solution tools. Very de-
tailed submodels of the most interesting parts of the system can be built and solved in
isolation.
We build a two level hierarchical model. The upper level describes the mission of the
PMS as a composition of phases, without detailing the internal behaviour of the system
within each single phase. This makes it easy to model a variety of mission scenarios by sequencing the phases in different ways. Moreover, we can consider missions in which a dynamic selection of the next phase to be performed can take place, thus allowing a probabilistic selection of alternative paths for the mission profile. Such a probabilistic choice of the mission has not yet been considered in the literature, to the best of our knowledge. The
parameters to be used in the upper level model are obtained by solving the lower level
models. These models (one for each phase) detail the behaviour of the PMS inside phases
and are built and solved separately from each other. This way, if a phase is repeated dur-
ing the mission, a model for that phase can be built once and reused when needed. To rep-
resent the dependencies among successive phases, we introduce an additional set of sub-
models, that we call Phase Transition Models (PTM), which describe in an explicit and
precise way the changes that occur in the PMS as phases alternate.
In the following, we first describe the two modelling levels of the methodology, and then
we detail the solution procedure of the models and evaluate the computational complexity required for the application of our hierarchical approach. After that, the application of
the methodology to the PMS taken as example is presented. Last, we give some remarks
to discuss the main advantages of the hierarchical modelling approach with respect to the
other proposals that have appeared in the literature.
4.1 The modelling approach
We present now the general guidelines for the hierarchical dependability modelling of
PMSs having fixed duration of the phases. Phases are numbered 1, 2, …, n and their durations are denoted with τ1, τ2, …, τn, respectively. The methodology we propose is an exact
one, where the only approximations introduced are those due to the computational solu-
tion algorithms. The two levels of the hierarchical modelling are first introduced and sub-
sequently applied for the analysis of the application example presented above.
4.1.1 Upper level model
The upper level is a concise model that represents the overall mission of the system as a
homogeneous discrete-time Markov chain. Since we intend to consider the dynamic selec-
tion of the mission profile, the PMS may in fact perform different missions. The upper
level model must take into account all the possible sequences of phases that can be per-
formed by the PMS.
The state space SU of the upper level model is generated by analysing the tree of all the
possible paths that the mission can follow. This tree contains more nodes than the number n of possible phases, in that some phases can be repeated along different paths. For each node belonging to the tree, a distinct state is included in SU. In this way, phases found in different subtrees are considered as distinct from one another. This distinc-
tion allows the memory of the past history of the system to be taken into account in its
further evolution. Two extra states, named “FAIL” and “STOP” are added to SU , which
represent the premature loss and the completion of the mission, respectively.
As for the state space SU, the one-step transition matrix PU of the upper level model is built from the analysis of the tree of the possible mission paths. For each direct edge linking phase i to phase j in the tree, a non-zero transition probability pi,j is assigned to the entry of PU identified by the row corresponding to phase i and the column corresponding to phase j. This transition probability represents the probability that the system can start the new phase j after phase i has successfully completed. For each state but STOP, a non-zero transition probability pi,F is assigned to the entry of PU identified by the row corresponding to phase i and the column corresponding to state FAIL, representing the failure of the PMS inside phase i, and the consequent failure of the whole mission. For each state i representative of a phase that is a leaf of the tree of the possible mission paths, a non-zero transition probability pi,S is assigned to the entry identified by the row corresponding to phase i and the column corresponding to state STOP, representing the probability of successful completion of the last phase of the mission. All the other entries of matrix PU are set to zero.
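As a small illustration of this construction, the following sketch builds the matrix PU for a hypothetical four-phase mission tree with one dynamic branching point. All branching probabilities are invented for illustration; in practice they would be returned by the lower level models.

```python
import numpy as np

# Hypothetical mission tree with four phase states (0-based indices),
# plus the two extra states FAIL and STOP.  The probabilities p_ij and
# p_iS below are illustrative stand-ins for the lower level results.
succ = {0: [(1, 0.95)],             # first phase -> second phase
        1: [(2, 0.50), (3, 0.45)],  # dynamic choice of the next phase
        2: [], 3: []}               # leaves of the mission tree
p_stop = {2: 0.93, 3: 0.90}         # p_iS for the leaf phases
n = len(succ)
FAIL, STOP = n, n + 1

P = np.zeros((n + 2, n + 2))
for i, out in succ.items():
    for j, p in out:
        P[i, j] = p                 # p_ij: phase j follows phase i
    if not out:
        P[i, STOP] = p_stop[i]      # leaf: successful mission completion
    P[i, FAIL] = 1.0 - P[i].sum()   # residual mass: failure inside phase i
P[FAIL, FAIL] = P[STOP, STOP] = 1.0  # the two extra states are absorbing
```

Each row of PU thus sums to one, with the failure probability of a phase obtained as the residual mass.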
The transition probabilities that constitute the entries of matrix PU are obtained by solving the lower level models. It is worthwhile observing that the upper level model does not represent the stochastic dependencies among the various phases, arising from the usage of the same system architectural components in all the phases of the mission. These dependencies will be properly taken into account by the models of the lower level.
A reward wi can be associated to each state i of the upper level model, representing the
benefit obtained by the system with the execution of phase i. Clearly, in a given mission, a reward may be associated to only a subset of the phases, and, as will be shown later, even phases having an associated reward might be completed while bringing no benefit at all. The reward wi can be either directly assigned to phase i at the upper level, or it can
be obtained from the solution of the lower level model of phase i . In this latter case, more
detailed reward structures can be considered, as it will be discussed in the following.
4.1.2 Lower level models
The models of the lower level explore the behaviour of the PMS inside the various
phases. We choose to model each phase separately, as in the methodologies presented in [5, 56, 80], which have been described in Chapter 1. The standard dependability modelling
techniques [2, 6, 9, 13, 29, 32, 41, 43, 44, 47, 49, 52, 60, 77, 85] can be applied to model
the behaviour of a PMS inside a single phase, where all the phase-dependent parameters
stay constant. Notice that, if a phase is repeated during the mission, then it will be mod-
elled only once, and the model reused whenever it is needed.
A model for a phase is built by using a continuous-time homogeneous Markov chain. The
state space Si represents the set of operational configurations admitted for phase i ,
i = 1, 2, …, n of the PMS. All the configurations that meet the failure criteria for the phase are mapped into an absorbing failure state of Si, that is, a state having no outgoing transitions. If multiple failure modes are possible, then a set of failure states can be defined, to
keep track of the events that lead to the failure of the phase. A reward structure can be
defined at the level of phase models, by assigning reward values to the states or to the
transitions of the Markov chain.
The transitions among states of the model of phase i are defined by the transition rate
matrix Qi, i = 1, 2, …, n. These transitions represent the events that may affect the PMS dependability, such as the occurrence of faults, repairs, or the diagnosis of failed components.
Notice that we can equivalently build a separate model of each phase with SPN or GSPN models. Such a higher level modelling would allow the verification of the correctness of the
modelling, through the check of various structural properties of the Petri net models, to
increase the confidence in the modelling itself. The chosen reward structure could be de-
fined at the level of the Petri net, over the markings of the model. Very importantly, the
Petri net phase models can be automatically transformed into the associated continuous-
time homogeneous Markov chains. This automatic transformation also translates the re-
ward structure over the states of the corresponding Markov chain. Since Petri net models are much more expressive than Markov chains, this transformational approach is very convenient in terms of ease of modelling. However, to adopt a consistent
modelling approach and avoid the introduction of too many different models, in the fol-
lowing we directly define the Markov models of phases.
Due to the very nature of homogeneous Markov chains, the transitions among states of a phase model occur after exponentially distributed times. This implies that the time needed to perform any activity of the PMS inside a phase will be modelled with an exponential transition in the Markov chain model of that phase. This is the most severe
constraint that limits the applicability of Markov chains. Whenever an activity having
non-exponential duration is modelled with an exponential transition, an approximation is
introduced in the model and the dependability results can be significantly different from
the exact ones. Several techniques exist to limit this error, such as phase expansion [30], which uses a sequence of exponential stages to approximate a non-exponential random variable.
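The phase expansion idea can be illustrated with a short simulation; the mean and stage count below are arbitrary, not taken from the example PMS.

```python
import numpy as np

# Phase expansion sketch: a duration with mean m is approximated by k
# exponential stages of rate k/m in series (an Erlang-k variable).  The
# mean stays m while the squared coefficient of variation is 1/k, so
# increasing k approaches a deterministic duration.
rng = np.random.default_rng(0)
m, k = 10.0, 50                        # illustrative mean and stage count
samples = rng.exponential(m / k, size=(100_000, k)).sum(axis=1)

print(samples.mean())                  # close to the target mean m
print(samples.std() / samples.mean())  # coefficient of variation ~ 1/sqrt(k)
```

With k = 50 stages the coefficient of variation drops to about 0.14, at the price of multiplying the number of states of the phase model.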
Once the Markov chain models of all the phases have been built, n separate models de-
scribing the evolution of the PMS are available. As explained in Chapter 2, these n models are mutually dependent, and they need to be properly linked to correctly represent the time-dependent behaviour of the PMS over the whole mission. For instance, consider the first phase of the mission. The evolution of the PMS inside this phase, that is, during the time interval [0, τ1), is described by the evolution of the corresponding Markov chain phase model. Starting from the initial state occupation probabilities, the time-dependent solution of the model can be computed according to Equation (1.2), for any time instant t up to the phase ending time τ1.
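As a sketch of this computation, the following code evaluates the transient solution of a phase model by uniformisation, a standard way to compute π(0)·exp(Qt) without forming the matrix exponential directly; the small three-state phase model is invented for illustration.

```python
import numpy as np

def transient(pi0, Q, t, tol=1e-12):
    """pi(t) = pi(0) exp(Q t) for a CTMC, computed by uniformisation."""
    q = max(-Q.diagonal())              # uniformisation rate
    P = np.eye(len(Q)) + Q / q          # subordinated DTMC
    term = np.array(pi0, dtype=float)   # holds pi(0) P^k
    weight = np.exp(-q * t)             # Poisson(q t) mass at k = 0
    out = weight * term
    k = 0
    while weight > tol or k < q * t:
        k += 1
        term = term @ P
        weight *= q * t / k
        out = out + weight * term
    return out

# Illustrative phase model: two active processors with failure rate lam
# each; the third state is the absorbing failure state of the phase.
lam, tau = 1e-3, 100.0
Q = np.array([[-2 * lam, 2 * lam, 0.0],
              [0.0, -lam, lam],
              [0.0, 0.0, 0.0]])
pi_tau = transient([1.0, 0.0, 0.0], Q, tau)   # pi_i(tau_i)
```

For this pure-death chain the first component of the result equals exp(-2λτ), which can be used as a sanity check of the implementation.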
At the time τ1, several changes may occur in the PMS, as described in Chapter 1, and the
future evolution of the PMS is represented by the Markov chain model corresponding to
the second phase. Due to the sharing of the architectural components, the initial state oc-
cupation probability for this next model are dependent from the final ones of the first
phase model. Since this next phase has in general a state space different from that of the
previous one, a mapping procedure is necessary to assign the initial state occupation
probabilities. Such a mapping procedure must be performed for any pair of consecutive
phases.
As we already pointed out in Chapter 2, the mapping of the state occupation probabili-
ties is conceptually simple, but becomes a tedious task and a potential source of errors.
Indeed, the mapping is to be performed at the level of the phase model state space, which
easily becomes a huge set for a complex PMS. To attack the problem with a systematic
approach, we define specific submodels for phase changes, which address the treatment
of dependencies between phases, by explicitly representing the mapping between the
state spaces of consecutive phases.
These submodels are called Phase Transition Models, abbreviated as PTMs in the follow-
ing. PTMs are defined in terms of the state space of the Markov chains. PTMs map the
state occupation probabilities of the PMS at the end of a phase into the initial occupation
probabilities of the successive phase. Also, PTMs account for the configuration require-
ments of the next phase, and for any instantaneous activities that may need to be per-
formed at phase change times. The initial occupation probabilities of the next phase are normalised with respect to the overall probability of starting the phase.
To better explain the role of PTMs, consider the case when the PMS completes phase i
and phase j is the one to be executed next. Let us denote with i1, i2, …, ih the elements of the state space Si of the Markov chain underlying the model of phase i, and with j1, j2, …, jk those of the state space Sj of the model of phase j. In this case, where there is only one possible next phase, we model the phase change by what is called a Deterministic PTM, which is shown in Figure 4.1.
The mapping between states takes into account the instantaneous activities that are executed at the phase change and their associated probabilities. For instance, suppose that the system is in state i1 at the time when phase i ends, and a reconfiguration must take place before starting phase j. Assume that this reconfiguration can either succeed or fail, with probability a and b, respectively, and that according to the outcome a different system configuration is reached in the next phase. The two arcs leaving state i1 in the PTM of Figure 4.1 model exactly this feature. The occupation probabilities of states that are failure states either for phase i or for phase j are not mapped by the PTM.
[Figure: states i1, i2, …, ih of phase i mapped by labelled arcs a, b, c, d onto states j1, j2, …, jk of phase j.]
Figure 4.1: Deterministic PTM for the transition from phase i to phase j
The mapping modelled by the PTM can be translated into an associated transition prob-
ability matrix Mi, j , with h rows and k columns, whose entries give the transition prob-
abilities for each pair of states ( u,v ), u ∈Si and v ∈S j , that is the probability on the
corresponding arc of the PTM. Matrix Mi,j is shown in Figure 4.2, where only the non-zero entries related to the four arcs depicted in Figure 4.1 are represented. As we will see in the following, the matrix form of the PTM is particularly useful during the evaluation of the models. Conversely, its pictorial representation is more suitable for the modelling, in that it helps in understanding the dependencies among phases. The clarity of the PTM modelling is even more useful when dealing with phase changes that involve a choice of the phase to be performed next.
[Matrix Mi,j with rows indexed by i1, i2, …, ih and columns by j1, j2, …, jk; only the non-zero entries a, b, c, d corresponding to the four arcs of Figure 4.1 are shown.]
Figure 4.2: Transition probability matrix associated with a PTM
Let us suppose that after phase i, either phase j or phase m can be performed, depending on some conditions related to the current system configuration. These conditions can be used to control the mapping of the probability distribution among the states of successive phases. Let i1, i2, …, ih denote the elements of the state space Si of the Markov chain underlying the model of phase i, and j1, j2, …, jk and m1, m2, …, ml those of the state spaces Sj and Sm of the models of phases j and m, respectively. The choice between two possible phases only is modelled here for the sake of clarity, but the extension to more than two phases is straightforward. We model this phase change with a probabilistic PTM, shown in Figure 4.3.
[Figure: states i1, …, ih of phase i mapped by labelled arcs a, b, c, d, e onto states j1, …, jk of phase j and m1, …, ml of phase m.]
Figure 4.3: Probabilistic PTM modelling the dynamic choice of a phase
The associated transition probability matrix Mi,j,m of the probabilistic PTM is shown in Figure 4.4. Matrix Mi,j,m is defined as in the case of deterministic PTMs, except that the matrix has as many columns as the sum of the sizes of the state spaces of all the possible phases that may follow the current one. For instance, for the PTM in Figure 4.3, the transition probability matrix has h rows and k + l columns.
[Matrix Mi,j,m with rows indexed by i1, i2, …, ih and columns by j1, j2, …, jk, m1, m2, …, ml; only the non-zero entries a, b, c, d, e corresponding to the arcs of Figure 4.3 are shown.]
Figure 4.4: Transition matrix associated with a probabilistic PTM
A special PTM is built to represent the transition from the last phase executed to the fic-
titious STOP state. It simply maps the occupation probability of each non-failure state into a single state STOP that represents the successful completion of the mission.
4.2 Evaluation procedure
In our modelling framework, the evaluation is hierarchically performed, with a bottom-up solution procedure. First, the lower level models are sequentially solved following their ordering, as defined in the upper level mission model. The solution of the lower level models returns all the parameters needed to instantiate the upper level model, which is solved last to evaluate the measures of interest.
The lower level models are solved using the following iterative procedure. Start with the first phase of the mission, say phase i, and solve the Markov chain model with a proper vector πi(0) of initial state occupation probabilities, which for this initial step must be provided from outside. This initial vector represents the state of the system at the very beginning of operations. The Markov chain is solved according to Equation (1.2), to obtain the vector πi(t) of transient state occupation probabilities, for any 0 ≤ t < τi. Also, if a reward structure is defined for the phase model, the total reward wi cumulated during the phase is evaluated according to Equation (1.4), and reported at the upper level to instantiate the discrete-time Markov chain model of the mission.
By evaluating πi(τi), the state occupation probabilities at the end of the phase, we obtain the input needed to solve the PTM. The solution of a PTM depends on its nature. Suppose first that we are considering a deterministic PTM, that is, only phase j can be executed after phase i. Let us recall that Mi,j denotes the transition probability matrix associated with the PTM modelling the phase change from i to j, and let k be the number of states of the Markov chain for phase j. We first calculate the probability pi,j of starting the next phase as follows:
pi,j = πi(τi) · Mi,j · ej ,    (4.1)

where ej is a vector having k elements, whose components are all 1. Note that this scalar value is exactly the probability mass which is mapped by the PTM from phase i
to phase j. Probability pi,j can thus be reported at the upper level, together with the associated probability pi,F = 1 − pi,j. Then, the vector πj(0) of initial state occupation probabilities for the states of the Markov chain model of phase j is derived according to the following formula:

πj(0) = πi(τi) · Mi,j / (πi(τi) · Mi,j · ej) = πi(τi) · Mi,j / pi,j
Note that, via the division by pi,j, we are conditioning the initial probability of the subsequent phase on the probability of successfully starting the phase.
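A minimal numerical sketch of this solution step, with an invented three-state phase model and mapping matrix:

```python
import numpy as np

# Sketch of the deterministic PTM solution step (Equation (4.1)); all
# numbers are illustrative, not taken from the example PMS.
pi_end = np.array([0.90, 0.08, 0.02])    # pi_i(tau_i): end of phase i

# M_{i,j}: rows = states of phase i, columns = states of phase j.
# Rows may sum to less than 1: the missing mass models, e.g., a failed
# reconfiguration at the phase change.
M = np.array([[0.99, 0.01],
              [0.00, 0.95],
              [0.00, 0.00]])             # failure state of phase i: not mapped

p_ij = pi_end @ M @ np.ones(M.shape[1])  # probability of starting phase j
pi_next = (pi_end @ M) / p_ij            # conditioned initial vector pi_j(0)
p_iF = 1.0 - p_ij                        # reported to the upper level model
```

The division by p_ij implements the conditioning discussed above, so that pi_next is again a proper probability vector.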
The solution of a probabilistic PTM proceeds as follows. Let j1, j2, …, jr be the phases that could be performed after phase i, and let us denote with Mi,j1,…,jr the transition probability matrix associated with that probabilistic PTM. Moreover, let k1, k2, …, kr be the number of states of the Markov chains for phases j1, j2, …, jr, respectively. We first evaluate the probability pi,F that none of the phases j1, j2, …, jr starts, as follows:

pi,F = 1 − πi(τi) · Mi,j1,…,jr · ej1,…,jr ,    (4.2)

where ej1,…,jr is a vector with as many elements as the number of columns of Mi,j1,…,jr, and whose components are all 1. This probability is reported at the upper level. Note that πi(τi) · Mi,j1,…,jr · ej1,…,jr is exactly the probability mass which is mapped by the probabilistic PTM to the subsequent phases. At this point, to evaluate the initial state probability distributions πj1(0), πj2(0), …, πjr(0), we obtain the following fictitious state occupation probability vector π*(0):

π*(0) = πi(τi) · Mi,j1,…,jr
The probabilities pi,j1, pi,j2, …, pi,jr of starting each of the new phases are evaluated by properly partitioning vector π*(0). More precisely, denote with π*(0)|j1 the subvector of π*(0) that is related to the mapping between phase i and phase j1, which is the subvector formed by the first k1 elements of π*(0). Similarly, let π*(0)|j2 be the subvector with k2 components related to the mapping between i and phase j2, and so on, so that π*(0)|j1, π*(0)|j2, …, π*(0)|jr is a partition of π*(0) into consecutive subvectors, each of them representing the probabilities that are mapped by the transition to one of the possible next phases. The probability pi,j1 of starting phase j1, and the vector πj1(0) of initial state occupation probabilities for phase j1, are evaluated as follows:

pi,j1 = π*(0)|j1 · ej1 ,    πj1(0) = π*(0)|j1 / pi,j1

Similarly, we obtain the other probabilities and initial vectors for phases j2, …, jr.
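The partitioning step can be sketched as follows, again with invented numbers, for a phase i followed by two candidate phases with k1 = 2 and k2 = 3 states:

```python
import numpy as np

# Sketch of the probabilistic PTM solution: phase i may be followed by
# j1 (first 2 columns) or j2 (last 3 columns).  Numbers are illustrative.
pi_end = np.array([0.6, 0.3, 0.1])           # pi_i(tau_i)
M = np.array([[0.5, 0.0, 0.48, 0.00, 0.0],   # h rows, k1 + k2 columns
              [0.0, 0.4, 0.00, 0.55, 0.0],
              [0.0, 0.0, 0.00, 0.00, 0.0]])  # failure state: not mapped
k = [2, 3]                                   # state space sizes of j1, j2

pi_star = pi_end @ M                  # fictitious vector pi*(0)
p_iF = 1.0 - pi_star.sum()            # Equation (4.2)

blocks = np.split(pi_star, np.cumsum(k)[:-1])      # pi*(0)|j1, pi*(0)|j2
p_next = [b.sum() for b in blocks]    # p_{i,j1}, p_{i,j2}
pi_init = [b / s for b, s in zip(blocks, p_next)]  # normalised pi_j(0)
```

The probabilities p_iF and p_next sum to one, and each normalised subvector is a proper initial distribution for the corresponding candidate phase.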
Once the new vectors of initial probability distribution are known, all the phases can be
sequentially solved as described, and all the parameters needed by the upper level model
are made available. Solving the discrete-time homogeneous Markov model of the upper
level allows to derive the dependability measures of interest. The Reliability R at the
mission completion time is obtained from Equation (1.1), by evaluating the occupation
probability of state STOP. Similarly, we can obtain the Reliability Ri at the completion
time of phase i , i = 1,2,K,n . The performability is very easily obtained by applying
Equation (1.3).
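Once instantiated, the upper level model is solved by simple matrix-vector iterations; the following sketch computes R for an invented four-phase mission with one probabilistic branching (states 4 and 5 play the roles of FAIL and STOP).

```python
import numpy as np

# Illustrative upper level matrix P_U; in practice its entries p_ij,
# p_iF and p_iS come from the lower level solutions.
P = np.zeros((6, 6))
P[0, 1] = 0.95
P[1, 2], P[1, 3] = 0.50, 0.45
P[2, 5], P[3, 5] = 0.93, 0.90
for i in range(4):
    P[i, 4] = 1.0 - P[i].sum()   # residual mass goes to FAIL
P[4, 4] = P[5, 5] = 1.0          # FAIL and STOP are absorbing

pi = np.zeros(6)
pi[0] = 1.0                      # mission starts in the first phase
for _ in range(4):               # enough steps to absorb every path
    pi = pi @ P
R = pi[5]                        # mission reliability: P(absorbed in STOP)
```

Because the mission tree is acyclic, a number of iterations equal to its depth suffices for all the probability mass to be absorbed in FAIL or STOP.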
The complexity of the overall evaluation step is readily assessed by following the steps of the solution. The computational cost of solving one phase model is determined by the cardinality of the state space of its underlying Markov chain. If that state space consists of Ci different states, then a cost of O(Ci² qi τi) operations (multiplications) is needed to compute the vector of the transient state occupation probabilities at time τi, where qi is the maximum absolute value of the entries of the generator matrix Qi of phase i. Solving a deterministic PTM model requires a cost of O(Ci × Cj), where Cj is the cardinality of the state space of the next phase j, whereas a cost of O(Ci × (Cj1 + … + Cjn)) is needed for a probabilistic PTM from phase i to phases j1, …, jn having state space cardinalities Cj1, …, Cjn. Of course, since it can reasonably be expected that only a small set of phases is involved in probabilistic PTMs, the solution cost for all the PTMs is much smaller than that of the phase models. Thus, if n is the number of phases composing the mission (not all necessarily executed in an individual mission), the overall cost required to obtain the parameters to instantiate the upper level is dominated by O(Σi=1,…,n Ci² qi τi). If all the phase models have about the same number of states, then O(n · maxi=1,…,n Ci² qi τi) is a tight upper bound for this complexity. The solution of the upper level model only requires O(n³) steps, and is negligible compared to the other steps. Hence, the entire complexity of the solution can be estimated as O(Σi=1,…,n Ci² qi τi).
4.3 Application example
4.3.1 Modelling
In this section we completely carry out the modelling of the example PMS introduced in Chapter 2, according to the methodology described above. The hierarchical modelling methodology proposed here does not widen the applicability of the analytical solution approaches with respect to the other Markov based methods. Therefore, some approxi-
mations and simplifications of the PMS are needed, similarly as for some other method-
ologies among those presented in Chapter 3.
Since phases are only allowed to have a constant duration, we approximate the random
duration of phase i with the expected value τi of the distribution Fi( t ), for i = 1,2,4 .
The special case of phase 3 is discussed in the following. In this way, we are in fact sub-
stituting the original distribution Fi( t ) with the deterministic distribution of mean τi ,
the sole distribution we can analytically take into account with this methodology. Also,
the model of each phase is a homogeneous continuous-time Markov chain, and thus the
repair activities can only be modelled by approximating the actual distribution of the time
to repair with a constant rate μ . A proper value of μ could be provided by the inverse
of the expected value of the time necessary to complete the repair.
Hierarchical Markov models for PMSs 63
The flexibility foreseen for the PMS can be partially supported thanks to the upper level.
Precisely, we can easily represent the dynamic selection of the next phase to be per-
formed, which takes place at the end of phase 2. However, it is not possible to consider
the premature termination of phase 3 as a consequence of the degradation of the PMS
configuration. Let us recall that the expected duration of phase 3 is intended to be τ3 if
the number f of faulty processors remains 0 throughout the phase, and is reduced to
τ′3 as soon as a processor fails. Nevertheless, a fixed constant duration l must instead
be chosen for the phase, thus introducing a further approximation.
A possible heuristic way of computing a suitable value l for phase 3 duration is to sepa-
rately analyse the transient evolution of the model of the phase, to estimate the average
time τ_{f=0} the PMS spends in those states in which no processor is faulty, and the time
τ_{f>0} spent in those in which there are faulty processors, over a time interval [0, Δ].
Once this information is available, the value l can be evaluated according to the following
weighted sum:

l = τ3 · τ_{f=0} / ( τ_{f>0} + τ_{f=0} ) + τ′3 · τ_{f>0} / ( τ_{f>0} + τ_{f=0} ) = τ3 · τ_{f=0} / Δ + τ′3 · τ_{f>0} / Δ        (4.3)
Since the configuration degrading eventually leads to the failure of the PMS, the average
time τ_{f=0} spent in the states in which no processor is faulty depends on the value of Δ;
more precisely, the ratio τ_{f=0} / Δ tends to 0 as Δ grows. A proper value of Δ would
be given by the expected phase duration l, which is however the unknown. An iterative
method could be applied to successively evaluate Equation (4.3) for a series of values of
l; however, the convergence of the sequence is not guaranteed. Moreover, this approach
is only a heuristic one, which can be inadequate when conservative estimations of the
dependability attributes are needed. In this case, the pessimistic choice l = τ3 is the one
that results in the most stressing scenario for the PMS.
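The iterative evaluation of Equation (4.3) can be sketched as a fixed-point loop on l. The following Python fragment is only an illustration of the heuristic: the forward-Euler transient integration, the function names and the stopping rule are all our own assumptions, not part of the methodology.

```python
import numpy as np

def cumulative_sojourn(Q, p0, delta, steps=2000):
    """Expected time spent in each state of a CTMC with generator Q over
    [0, delta], by forward-Euler integration of the Kolmogorov equations."""
    dt = delta / steps
    p = p0.astype(float)
    sojourn = np.zeros_like(p)
    for _ in range(steps):
        sojourn += p * dt
        p = p + dt * (p @ Q)
    return sojourn

def phase3_duration(Q, p0, fault_free, tau3, tau3_red, iters=50, tol=1e-6):
    """Fixed-point iteration on Equation (4.3), starting from the pessimistic
    choice Delta = tau3. As noted in the text, convergence is not guaranteed."""
    l = tau3
    for _ in range(iters):
        s = cumulative_sojourn(Q, p0, l)
        t_f0 = s[fault_free].sum()             # time with no faulty processor
        l_new = tau3 * t_f0 / l + tau3_red * (l - t_f0) / l
        if abs(l_new - l) < tol:
            return l_new
        l = l_new
    return l
```

Here `fault_free` is the index set of the states with f = 0, and the sojourn time with f > 0 is obtained as l − τ_{f=0}, since the sojourn times over [0, l] sum to l.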
Consider now the upper level modelling. When the hierarchical methodology is applied to
the example of PMS, the Markov model shown in Figure 4.5 is produced. The state space
SU consists of four states 1, 2, 3, 4 representing the four phases of the PMS, plus the
additional state 4′, which is introduced because phase 4 appears in two distinct possible
paths of the mission, plus the FAIL and STOP states. The non-zero entries of the transi-
tion rate matrix PU correspond to those pairs for which an edge is represented in the
Markov chain model in Figure 4.5. The rewards associated to the two objectives of the
mission are indicated in boldface.
Figure 4.5: Upper level model for the application example
We now build the models of the lower level. Let us first consider the Markov chain mod-
els of the phases. The triple ( a, s, f ), where a represents the number of active proces-
sors, s the number of spare processors, and f the number of faulty processors, com-
pletely describes a PMS configuration. Therefore, we also use the notation ( a, s, f ) to
identify the possible states of the phase models. The state space Si of the Markov chain
model of phase i, i = 1, 2, 3, 4, is defined as the set of configurations that satisfy the
minimal configuration requirements, and that at the same time do not exceed the require-
ments of the ideal configuration. Also, each state space Si contains a single failure state,
namely state FAIL, that represents all the unfeasible configurations for phase i. The four
state spaces are as follows:
S1 = {( 3,1,0 ),( 3,0,1 ),( 2,0,2 ),( FAIL )}
S2 = {( 2,2,0 ),( 2,1,1 ),( 2,0,2 ),( 1,0,3 ),( FAIL )}
S3 = S4 = {( 3,1,0 ),( 3,0,1 ),( FAIL )}
The transition rate matrices of the various Markov chain models describe the effects of
the failures affecting the active processors, the reconfigurations with the consequent
switching on of the spare processors, and the repair of the failed ones. Figure 4.6 shows
the state transition diagram for the Markov chain model of phase 1.
Figure 4.6: State transition diagram for the Markov chain model of phase 1
Starting from the initial state ( 3,1,0 ), where there are no failed processors, the configura-
tion of the PMS degrades as a result of the failures affecting the active units. As soon as
the first active processor is hit by a fault, the insertion of the spare processor is attempted.
If the switching on succeeds, the ideal configuration with three active units is restored,
state ( 3,0,1 ); otherwise the degraded mode with only two operational units, state
( 2,0,2 ), is reached. While the PMS stays in this configuration, any further fault immedi-
ately brings it to failure, modelled by the transition to state FAIL.
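As a concrete illustration, the transient occupation probabilities of such a phase model can be computed by uniformization. The sketch below encodes the phase-1 chain with the rate structure as we read it from Figure 4.6 (in particular, the target states of the repair transitions are our interpretation of the figure), and sums the non-failure mass at time τ1:

```python
import numpy as np

def transient(Q, p0, t, eps=1e-12, kmax=10_000):
    """Transient distribution pi(t) = pi(0) exp(Qt) of a CTMC, by uniformization."""
    q = -Q.diagonal().min()             # uniformization rate
    P = np.eye(len(Q)) + Q / q          # DTMC kernel of the uniformized chain
    term = p0.astype(float)
    result = np.zeros_like(term)
    poisson, weight = np.exp(-q * t), 0.0
    for k in range(1, kmax + 1):        # Poisson-weighted sum of DTMC steps
        result += poisson * term
        weight += poisson
        if weight >= 1.0 - eps:
            break
        poisson *= q * t / k
        term = term @ P
    return result

# Phase-1 chain of Figure 4.6; state order: (3,1,0), (3,0,1), (2,0,2), FAIL.
lam, c, mu, tau1 = 1e-4, 0.99, 5e-2, 50.0
Q1 = np.array([[-6*lam,  6*lam*c,       6*lam*(1-c),   0.0],
               [ mu,    -(mu + 6*lam),  6*lam,         0.0],
               [ 0.0,    mu,           -(mu + 4*lam),  4*lam],
               [ 0.0,    0.0,           0.0,           0.0]])
pi1 = transient(Q1, np.array([1.0, 0.0, 0.0, 0.0]), tau1)
p12 = pi1[:3].sum()   # probability mass later mapped by the PTM to phase 2
```

The quantity `p12` anticipates the computation of the phase-transition probability discussed below for the PTM from phase 1 to phase 2.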
The state transition diagram for the model of phase 2 is shown in Figure 4.7. Notice
that for this phase it is possible to have multiple reconfiguration attempts. For instance,
suppose the PMS is in configuration ( 2,2,0 ), and a fault hits a processor. The insertion
of a spare unit is instantaneously attempted, and should it fail, the reconfiguration is
immediately retried with the insertion of the second spare. The transition linking state
( 2,2,0 ) to state ( 1,0,3 ) models the occurrence of the fault and the failure of both con-
secutive spare insertions.
Figure 4.7: State transition diagram for the Markov chain model of phase 2
The Markov chain models for phases 3 and 4 are equivalent to each other, except for
the transition rates related to the fault occurrence process. Thus, we only show in Figure
4.8 the state transition diagram for the model of phase 3.
Figure 4.8: State transition diagram for the Markov chain model of phase 3
Consider now the PTMs modelling the phase changes. The deterministic PTM for the
transition from phase 1 to phase 2 is shown in Figure 4.9. This PTM is a very simple one
because phase 2 is less demanding than phase 1, and the only reconfigurations needed as
the next phase starts are represented by the switching off of the extra processors. These
operations are always successfully performed, as indicated by the probabilities on the
corresponding arcs of the PTM.
Figure 4.9: Deterministic PTM for the transition from phase 1 to phase 2
The state occupation probability of state FAIL of phase 1 is not mapped by the PTM,
because it represents a failure probability for the whole mission, and is thus reported at
the upper level to instantiate probability p_{1,F}. The dual probability p_{1,2} to be reported
at the upper level is the sum of the probability mass mapped by the PTM to the next
phase, and according to Equation (4.1) is given by

p_{1,2} = π^1( τ1 ) · M_{1,2} · ( 1,1,1,1,1 )^T = π^1_{(3,1,0)}( τ1 ) + π^1_{(3,0,1)}( τ1 ) + π^1_{(2,0,2)}( τ1 )
Notice that a null initial occupation probability is assigned by the PTM to the two states
( 1,0,3 ) and FAIL of phase 2.
The more interesting probabilistic PTM modelling the selection of the next phase to be
performed at the time phase 2 ends is shown in Figure 4.10. The starting criterion given in
Table 2.1 prescribes that the secondary objective phase 3 must be performed only if there
are no faulty processors at the time phase 2 ends. Therefore, the PTM only maps the oc-
cupation probability of state ( 2,2,0 ) into the states of phase 3. However, it could be the
case that the insertion of the spare processor needed to reach the required operational
configuration for phase 3 fails. In this case, phase 3 does not start, and phase 4 is per-
formed instead, provided that the other spare can be inserted, according to the specifica-
tion given in Table 2.1.
Figure 4.10: Probabilistic PTM for the transition from phase 2 to phase 3 or 4
All the mass probability not mapped by the PTM is summed up according to Equation
(4.2), to evaluate probability p2,F of the upper level model.
Figure 4.11: Deterministic PTM for the transition from phase 3 to phase 4
The deterministic PTM modelling the transition from phase 3 to phase 4, shown in Figure
4.11, is very simple because the two phases have exactly the same requirements.
4.3.2 Evaluation
The evaluation of the models proceeds as explained in Section 2. The initial state occupa-
tion probability vector π^1( 0 ) for phase 1 is the one that assigns probability 1 to the
state ( 3,1,0 ); that is, we suppose that the PMS starts its activities with all healthy pro-
cessors. The initial state occupation probability vectors for the subsequent phases are ob-
tained from the solution of the PTM submodels.
Solving the upper level model we can evaluate the measures of interest for this particular
PMS. The Reliability R at the mission completion time is simply the occupation prob-
ability of state STOP at the fourth step, when all the phases have been executed. To eval-
uate the Reliability Ri of the PMS at the time phase i ends, we must compute the occu-
pation probabilities at step i of all the non failure states that are reached in one step from
the state representing phase i . For instance, R2 is computed as the sum of the occupa-
tion probabilities of states 3 and 4 at the second step. The total reward cumulated during
the mission of the PMS is computed according to Equation (1.3) of Chapter 1.
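The upper-level computation just described can be sketched as a few steps of a discrete-time Markov chain. In the fragment below, the chain structure follows Figure 4.5, but all the numeric transition probabilities are placeholders of our own choosing; in the methodology they come from the solutions of the phase models and PTMs.

```python
import numpy as np

# Upper level DTMC; state order: 1, 2, 3, 4, 4', STOP, FAIL.
S1, S2, S3, S4, S4p, STOP, FAIL = range(7)
# Placeholder values for the phase-transition probabilities (assumptions):
p12, p23, p24, p34p, p4S, p4pS = 0.995, 0.70, 0.29, 0.99, 0.99, 0.99

P = np.zeros((7, 7))
P[S1, S2], P[S1, FAIL] = p12, 1 - p12
P[S2, S3], P[S2, S4], P[S2, FAIL] = p23, p24, 1 - p23 - p24
P[S3, S4p], P[S3, FAIL] = p34p, 1 - p34p
P[S4, STOP], P[S4, FAIL] = p4S, 1 - p4S
P[S4p, STOP], P[S4p, FAIL] = p4pS, 1 - p4pS
P[STOP, STOP] = P[FAIL, FAIL] = 1.0          # absorbing states

pi = np.zeros(7); pi[S1] = 1.0
for step in range(1, 5):
    pi = pi @ P
    if step == 2:
        R2 = pi[S3] + pi[S4]   # Reliability at the end of phase 2
R = pi[STOP]                   # mission Reliability at the fourth step
```

As in the text, R2 sums the occupation probabilities of the non-failure states reachable in one step from state 2, and R is the occupation probability of STOP once all phases have been executed.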
We perform the evaluation of the example of PMS in terms of the Reliability Ri at phase
i completion time. The values of the model parameters are specified in Table 4.1, where
we only specify the mean values of the distributions involved in the definition of the
PMS. All the values are expressed in terms of the unit of time considered, which is the
hour.
Phase durations: τ1 = 50 h, τ2 = 1000 h, τ3 = 100 h, τ′3 = 2 h, τ4 = 200 h
Failure rate: λ = 10^-4 / h
Successful insertion probability: c = 0.99
Mean repair rate: μ = 5 · 10^-2 / h
Table 4.1: Parameter values for the evaluation of PMS Reliability
The value l = τ3 is selected for phase 3. Figure 4.12 shows the values of the Reliability
Ri at phase completion time, i = 1, 2, 3, 4. The Reliability of each phase is computed by
averaging the various Reliabilities along the different paths of the mission. Thus, we are
only taking into account the original phases of the PMS, without providing a separate
Reliability value for the fictitious phases that have been introduced to keep the memory
of the past history of the PMS, such as phase 4′ in the upper level model. The Reliability
R4 shown in Figure 4.12 is exactly the Reliability R at mission completion time.
Figure 4.12: Reliability at phase completion times
Figure 4.13: Reliability comparison for the two values of l (l = 2 hours and l = 100 hours)
Moreover, to quantify the effect of the approximation introduced by the constant dura-
tion l assumed for phase 3, we also evaluate the Reliability Ri, i = 1, 2, 3, 4, in the case
we set l = τ′3. The comparison of the Reliability results for the two possible values of l
is shown in Figure 4.13. Notice how relevant the difference between the results for the
two values of l is.
We also evaluate the reward cumulated during the mission of the PMS for the two differ-
ent values of phase 3 duration considered above, and for various values of the probability
c of successful spare insertion. The total cumulated rewards shown in Figure 4.14 are
normalised with respect to the maximum possible value 11w, which is obtained by the
PMS with the successful execution of both phase 3 and phase 4, the two objectives of the
mission. This normalised reward expresses the accomplishment level of the PMS with re-
spect to the reward structure defined for the mission. We can observe from the plots in
Figure 4.14 that the cumulated reward is nearly insensitive to changes of c for values
greater than 0.99. On the contrary, as for the Reliability, varying the duration of phase 3
heavily affects the final results, and this influence is relevant for any value of c.
Figure 4.14: Normalised rewards as a function of c, for l = 2 hours and l = 100 hours
4.4 Concluding remarks
The idea underlying this methodology is to take advantage from the separate modelling
and solution of the phase models, and at the same time to build, at a higher abstraction
level, a global model of the mission. This hierarchical approach simplifies the modelling
Hierarchical Markov models for PMSs 71
activity. The separate modelling of the different phases turns out in self-contained and
easy-to-define models, and allows the dependability analyst to focus on the specific char-
acteristics of each single phase, selecting the more appropriate level of detail for each
model.
The upper level can easily describe missions composed of any combination of phases
with pre-planned phase durations, thus allowing various mission profiles to be analysed
at a high abstraction level. Missions may include probabilistic choices of the next phases
to be performed, thus introducing a more flexible and dynamic planning of the mission
goals. The mission models can hence have a tree configuration. Many different missions
can be modelled and analysed by reusing and composing lower level models, which de-
scribe individual phases and deterministic as well as probabilistic transitions among
phases. Thanks to this modularity, changing the assumptions or conditions ruling a phase
change only requires changing a few submodels.
The hierarchical approach makes it possible to master the complexity of the analysis by
considering a separate resolution of the phases, and of the dependencies among phases
caused by the usage of the same system components in the different phases. Indeed, each
single phase model and PTM is quite simple compared to a model of the whole mission,
and can be evaluated at a low computational cost by using general purpose tools.
Moreover, the upper level makes it possible to perform a sensitivity analysis to under-
stand which phases are more critical for the success of a specific mission. Once such
bottleneck phases are identified, it is possible to establish requirements on the depend-
ability figures of each single phase to guarantee that the mission target is reached.
With respect to the limits of the other approaches that have been proposed in the litera-
ture, the hierarchical approach proposed here does not extend the class of PMSs that can
be analysed. Its applicability is still restricted to PMSs that have constant phase dura-
tions. However, some original aspects are introduced, which try to alleviate the weak
points of the methods in the literature.
The explicit modelling of the transitions among phases performed by the PTMs is a first
attempt to introduce a systematic treatment of the dependencies among phases of a PMS.
The use of PTMs results in a clean, easily understandable modelling. Even complex situ-
ations, such as the dynamic choice of the phase to be performed next, are modelled in a
concise and easy-to-read way.
The possibility of dynamically selecting the next phase to be performed represents a first
extension towards the analysis of more flexible missions of PMSs. None of the methods
previously presented in the literature allows the introduction of such flexibility into the
models. Notably, dealing with this feature does not add significant complexity, either to
the modelling or to the evaluation step.
The overall computational complexity required for the application of the hierarchical
modelling methodology does not exceed the upper bound we set in Chapter 3, its asymp-
totic cost being exactly the same as that of the other methods that adopt a separate mod-
elling of the various phases. Moreover, the solution of the models can be easily auto-
mated, by exploiting the features made available by the tools for the automated evaluation
of system dependability.
Chapter 5
DSPN models for PMSs
In this chapter we propose another modelling and evaluation approach for PMSs having
constant phase duration, based on the deterministic and stochastic Petri nets (DSPNs). A
complete example of application of this methodology can be found in [15].
As we have described in Chapter 1, the class of DSPN models allows timed transitions
with exponentially distributed firing times, immediate transitions, and includes
deterministic transitions as well. Moreover, the DSPNs allow a very concise modelling of
even quite complex systems, through the use of guards on transitions, timed transition
priorities, halting conditions, reward rates, etc. Due to their high expressiveness, DSPNs
are able to cope with the dynamic structure of the PMSs in a much better way than the
lower level Markov chains can do. In this respect, they offer a suitable means for a
systematic formulation and solution of PMS models.
The DSPN approach to the PMSs offers many advantages, concerning both the modelling
and the evaluation. PMSs are modelled with a single DSPN model representing the whole
mission. As detailed in Chapter 3, the single-model approach offers the advantage of im-
plicitly solving the issues arising from the dependencies among phases caused by the
sharing of architectural components. Moreover, the DSPN model of the mission is a very
concise and at the same time easy-to-read one, and allows additional features of PMSs to
be modelled, features which are usually not considered in the literature. In particular, we
can consider flexible missions during which the duration of the phases is dynamically
adjusted depending on the current state of the PMS. This feature is not supported by
the methodology based on hierarchical Markov models defined in the previous chapter.
DSPN models for PMSs 74
The evaluation procedure of the DSPN models is supported by the existence of the ana-
lytical solution method for the transient probabilities of the underlying MRGP marking
process, which has been summarised in Chapter 1. This solution technique can be very
costly in terms of computational complexity required for a general DSPN model.
However, by taking into account the particular structure of PMSs, we derive a much
more efficient analytical solution method, which basically only requires the separate so-
lution of each single phase, and that can be fully automated.
Moreover, the existence of analytical expressions for the time-dependent marking occu-
pation probabilities allows us to derive the sensitivity functions of the dependability
measures of interest for a PMS, that is the analytical form of the derivatives of the mea-
sures with respect to the variations of a set of parameters. Thus, a dependability engineer
can conduct a sensitivity analysis of a PMS by studying the characteristics of the sensi-
tivity functions, rather than performing batches of multiple evaluations for different val-
ues of the parameters.
In the following, we first present the DSPN modelling approach for PMSs, by describing
how the various peculiar features of PMSs can be modelled by exploiting the expressive
capabilities of the DSPNs. Then, we specialise the general theory of DSPNs to the mod-
elling of PMSs, and derive the sensitivity function for the transient marking occupation
probabilities. After that, we describe a possible automation of the solution technique.
Then, we apply the modelling methodology to the example of PMS we have been study-
ing throughout the dissertation. Finally, we give some concluding remarks, to discuss ad-
vantages and limitations of the DSPN approach with respect to the other studies in the
literature.
5.1 The modelling approach
The original class of DSPNs was introduced to extend the representative power of
the SPN and GSPN models, allowing the exact modelling of events having deterministic
occurrence times [1]. Besides the introduction of deterministic transitions, other mod-
elling features have more recently been added to the DSPNs, features that do not extend
the power of this class of models, but significantly improve their expressiveness. These
same features can be found in Stochastic Activity Networks, and because of their extreme
usefulness, they have been included in several other classes of Petri nets, such as GSPNs
and Stochastic Reward Nets. We now introduce and discuss some of the more relevant
among these new modelling capabilities.
A very useful modelling feature is the possibility of specifying the firing rate of a timed
transition as an arbitrary function of the marking of any place of the Petri net. Notice
that in the Petri net models we have considered up to now, the sole marking-dependence
that has been taken into account is that on the input places of a transition. The general
marking-dependent duration of the activities associated to the transitions makes it pos-
sible to model very complex behaviours in a quite compact way. As well as transition
rates, rewards associated to the markings of the Petri net may depend on the marking of
the model.
Another very interesting possibility to enrich the capabilities of DSPNs is the introduc-
tion of logic conditions, also called guards, which control the enabling of transitions. In
the classical Petri net models, the enabling of transitions is completely determined by the
number of tokens in the input places and the corresponding multiplicity of the input arcs.
Therefore, to add a particular control condition on the enabling of a given transition, it is
necessary to introduce additional places and arcs which explicitly model the condition
into the structure of the net. These additional modelling elements increase the model di-
mensions and may consequently diminish its understandability. Conversely, a guard can
be any arbitrary function of the marking of the model, which is added to the specification
of the transition without impairing the clarity of the modelling.
A special class of guards on transitions is represented by the halting conditions. They are
introduced to stop the movement of the tokens as the Petri net model reaches a given
marking, by permanently disabling all the transitions and transforming the current mark-
ing into an absorbing one. Notice that if halting conditions are not available, stopping the
evolution of the model requires the introduction of additional modelling elements for each
potentially enabled transition.
Input arcs with variable cardinality provide additional simplifications of the modelling. A
variable cardinality input arc is one whose multiplicity is equal to the number of tokens in
the corresponding input place. Using such an arc can be very useful in the following situa-
tion. Consider the case when all the tokens in a place are to be flushed to another place.
This batch movement can be equivalently modelled through the basic constructs of the
GSPNs, by introducing an instantaneous transition that iteratively moves the tokens one
at a time until the input place is empty. Though perfectly correct, this modelling choice
is less efficient than the one that employs the variable cardinality arc. In fact, the mod-
elling with the basic GSPN constructs results in a set of intermediate vanishing markings
in the reachability set of the model, markings that need to be removed to obtain the re-
duced reachability graph. On the contrary, the solution with the variable cardinality arc
does not incur this problem.
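The contrast between the two encodings can be made concrete with a toy representation of a marking as a dictionary of token counts (all names here are illustrative; this is not the syntax of any specific DSPN tool):

```python
def flush_variable_cardinality(marking, src, dst):
    """One firing of an immediate transition whose input arc has variable
    cardinality: all tokens of src move to dst at once, producing no
    intermediate (vanishing) markings."""
    marking[dst] = marking.get(dst, 0) + marking[src]
    marking[src] = 0
    return 1  # a single firing

def flush_one_at_a_time(marking, src, dst):
    """Equivalent basic GSPN encoding: an immediate transition fires
    repeatedly, generating one vanishing marking per moved token."""
    firings = 0
    while marking[src] > 0:
        marking[src] -= 1
        marking[dst] = marking.get(dst, 0) + 1
        firings += 1
    return firings
```

Both functions reach the same final marking, but the second performs as many firings, and hence generates as many vanishing markings, as there are tokens in the input place.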
This same efficiency argument applies to all the additional modelling features introduced
for DSPN models and described above. Besides allowing a clearer and more concise mod-
elling, these capabilities avoid the introduction of many vanishing markings that would
need to be subsequently eliminated from the reachability graph of the models. However,
it is worthwhile observing once again that these features do not increase the power and
thus the applicability of the models, but represent only convenient shorthand notations
that simplify the modelling and the generation of the underlying marking process.
The enriched set of modelling features offered by DSPNs allows the PMSs to be repre-
sented by the general model schema shown in Figure 5.1.
Figure 5.1: General structure of a DSPN model of a PMS
At a high abstraction level, the model of a PMS is composed of two logically separate
parts: one represents the system, that is its components and their interactions, which
evolve according to the events that modify their state, and the other represents the
control part, which describes the phase changes. Both parts can be modelled via
DSPNs: the system part, containing only exponentially distributed and instantaneous
transitions, which we call System Net (SN), and the control part, represented by another
DSPN submodel, the Phase Net (PhN), which contains all the deterministic transitions of
the overall DSPN model. As we shall see in the following, the PhN may contain
instantaneous transitions as well.
A token in a place of the PhN model represents a phase being executed, and the firing of a
deterministic transition models a phase change. The sequence of phases ends with a token
in the STOP place, which represents the end of the mission. The SN model describing the
evolution of the underlying system is governed by the PhN: the SN's evolution may de-
pend on the marking of the PhN thus representing the phase-dependent behaviour of the
system. According to the DSPN modelling rules, this phase dependent behaviour is mod-
elled by proper marking dependent predicates which modify transition rates, enabling
conditions, reward rates. As we will detail in the following, the two nets can interact in
various ways, and this allows the modelling of different features of PMSs.
Any structure of the two nets can be considered: in particular, the PhN is not limited to
have a linear structure. Indeed, the PhN must represent all the possible paths of the mis-
sion. If a dynamic profile of the mission is to be considered, then the PhN takes a tree
structure. If a phase is performed along different paths of the mission, the PhN includes a
different place to model the execution of each distinct instance of that phase.
The same structure of the model of a PMS can be found in the paper of Aupperle, Meyer
and Wei described in Chapter 3. They used the METASAN [75] modelling tool, based on
Stochastic Activity Networks, to represent both the PhN and the SN. In the SAN lan-
guage, transitions are called activities. SANs allow for general distributions of activity du-
ration, for marking dependent firing rates, and for guards on the enabling conditions of ac-
tivities. Hence, SANs and DSPNs share many of the modelling features, and the work in
[7] contains many interesting elements for the modelling of complex PMSs. Here, we in-
tend to exploit the potentialities of DSPNs to investigate how the various aspects of
PMSs can be modelled. Moreover, we exploit the MRGP theory to provide an analytical solu-
tion technique for the DSPN models of PMSs. The possibility of such analytical solution
was not available at the time the work in [7] was proposed, and it represents a consider-
able step towards the definition of a systematic solution method for PMSs.
The adequacy of the DSPN models for the modelling of PMSs is to be found in the possi-
bility of expressing even complex phase-dependent behaviours through the relations be-
tween the two submodels PhN and SN shown in Figure 5.1. To present the guidelines for
the modelling of PMSs, in the following we take into account the various features de-
scribed in Chapter 2 and explain how each of them can be modelled by introducing new
interaction capabilities between the PhN and the SN.
First of all, consider the changes that occur in the configuration of the PMS as a result of
the various performance/dependability requirements of the different phases. These
changes can be modelled by corresponding variations of the SN, which depend on the
marking of the PhN. For instance, consider the situation represented in Figure 5.2, where
the PMS adopts the operational configuration A while within phase i, and configuration
B within the successive phase j. Places Pi and Pj in the PhN model the execution of the
two phases.
The SN submodel includes the model of both configurations; however, the tokens circulate
only inside the model of configuration A while the token of the PhN submodel is in the
place representing phase i. The two immediate transitions ta and tb have the guard
m( Pj ) = 1, which enables them only when a token reaches the place Pj. As they get en-
abled, the two instantaneous transitions move the tokens from the model of configuration
A to that of configuration B, modelling the configuration change.
Figure 5.2: Phase-dependent configuration modelling
Variable cardinality arcs can be used to flush several tokens in a single firing. Notice that
this mechanism actually substitutes the mapping of the state occupation probabilities
that was necessary in the previous methodology based on Markov chains. If configura-
tion A and configuration B share some of the system components, only the differences
between the two configurations need to be modelled in the SN, and the common parts
can overlap for a concise modelling.
The different failure criteria of the various phases can be immediately modelled by using
again instantaneous transitions, whose enabling conditions are augmented with the failure
criterion of the current phase. For instance, suppose the failure criterion of phase i is
given by a predicate f( m ) of the marking m. The correct behaviour of the PMS inside
phase i is modelled by a subnet of the SN that represents the normal operation, as shown
in Figure 5.3, and the failure of the PMS is modelled by an absorbing place FAIL. A set
of instantaneous transitions, the two transitions ta and tb in the example in Figure 5.3,
with a guard f( m ) ∧ ( m( Pi ) = 1 ), can be used to move some or all the tokens of the
normal operation SN submodel into the FAIL place if the failure criterion is met during
phase i. Moreover, halting conditions may be used to stop the movement of other tokens
not moved to place FAIL.
Figure 5.3: Phase-dependent failure criteria modelling
The phase-dependent failure rates and the rewards are easily modelled by defining the
proper firing rates as a function of the marking of the PhN submodel.
Up to now, we have conditioned the behaviour of the SN submodel to that of the PhN
submodel, to represent the phase-dependent behaviours that characterise PMSs. Now,
we perform the opposite step, by making the evolution of the PhN dependent on the
marking of the SN, to model the possibility of dynamically adjusting the mission as a
consequence of the state of the system.
Consider the case when the next phase to be performed is to be chosen at the time the
preceding phase ends. We show how the DSPN approach can provide the flexibility to
model such a dynamic behaviour. Suppose that after phase i either phase j or phase h
can be performed, depending on whether the state of the PMS satisfies some starting
criterion or not.
[Figure: PhN places P_i, P_j and P_h, the auxiliary place P_choice, and the instantaneous transitions t_j and t_h selecting the next phase]
Figure 5.4: Modelling of a dynamic mission profile
We model this feature as shown in Figure 5.4. The PhN submodel shows the two possible paths of the mission, and the choice between the two candidate phases is determined by the guards on the instantaneous transitions t_j and t_h. The starting criterion is formulated as a function s(m), where m is the marking of the SN submodel, and is used as a guard to enable the instantaneous transition that triggers the proper phase. The place P_choice is an auxiliary place to model the selection of the next phase.
The possibility of defining marking-dependent firing times of deterministic transitions makes it possible to model activities whose duration can be modified after they have already started, taking however into account the amount of time already spent. More precisely, suppose a deterministic transition has been enabled for δ units of time and, according to the current marking of the DSPN, has a firing time of α units of time. Then a marking change occurs, and the firing time in the new marking is β. In the new marking, an elapsed firing time equal to (δ/α)β is taken for the transition, since δ out of the α units of time of its firing delay have already elapsed. The transition will fire in the new marking, if no further marking changes take place, when the elapsed firing time reaches β.
This particular dependency has been named marking-dependency by a scaling factor [24, 58], because the distribution of the firing time is affected by the marking changes only in its expected value, and not in its general law.
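The scaling rule above can be illustrated numerically. The following is a minimal sketch with our own function names and invented numbers, not code from the thesis:

```python
# Marking-dependency by a scaling factor: when the firing time changes from alpha
# to beta after delta units have elapsed, the *fraction* of the delay already
# consumed (delta/alpha) is preserved in the new marking.
def rescaled_elapsed_time(delta, alpha, beta):
    """Elapsed firing time carried into the new marking: the same fraction
    (delta/alpha) of the new firing time beta is considered already elapsed."""
    return (delta / alpha) * beta

def remaining_time(delta, alpha, beta):
    """Residual delay in the new marking, absent further marking changes."""
    return beta - rescaled_elapsed_time(delta, alpha, beta)

# A transition 2 units into a 10-unit delay has consumed 1/5 of its delay, so it
# keeps 4/5 of a new 5-unit delay.
print(remaining_time(2.0, 10.0, 5.0))  # -> 4.0
```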
5.2 Analytical solution
As we discussed in Chapter 1, a DSPN model can be analytically solved by applying the results of MRGP theory, provided that the condition expressed by Assumption 1.1 is satisfied, that is, at most one deterministic transition is enabled in each of the possible markings of the net. This condition is obviously satisfied for the DSPN model of a PMS, provided that the only deterministic activities modelled are the phase durations. In this case, there is one (and exactly one) deterministic transition enabled, which corresponds to the current phase of the system, and an elegant analytical solution can be given for the transient marking probabilities by specialising the general solution techniques for MRGPs.
The sequence of regeneration points { T_i, i ≥ 0 } of the underlying MRGP is chosen as the successive firing times of the deterministic phase transitions in the PhN. The kernel matrices are accordingly defined, as detailed in Chapter 1, and the matrix V(t), which describes the time-dependent behaviour of the DSPN model, can be obtained as the solution to Equation (1.5). Solving Equation (1.5) to obtain matrix V(t) for a general MRGP requires the use of numerical algorithms or of Laplace transforms, and can be a computationally intensive task. However, the DSPN models of PMSs have the following property, which allows us to simplify the general expression of the transient probability matrix of an MRGP:
Property 1: in every non-absorbing marking of the DSPN model of a PMS, there is
always one deterministic transition enabled.
This property allows us to tailor the general theory introduced in Chapter 1 to derive a much more efficient solution technique. We first consider the case when the PhN has a linear structure, that is, no flexibility is allowed for the mission. The ordered sequence of phases performed through the mission is denoted with 1, 2, …, n, and the deterministic durations of the phases are denoted with τ_1, τ_2, …, τ_n. Moreover, we also assume for the time being that the duration of the phases does not depend on the marking of the SN. Then, we address the more general case when the PMS can dynamically select the next phase to be performed, and consequently the PhN shows a tree structure. After that, we briefly mention the technique to be applied to analytically deal with deterministic transitions having marking-dependent firing times. Then, the analytical derivation of the sensitivity function for the time-dependent marking occupation probabilities is presented, and last we give the main guidelines for the automation of the evaluation procedure.
5.2.1 Linear PhN
Let S denote the state space of the marking process of a DSPN for a PMS having a PhN with linear structure. Let t_i^Det be the deterministic transition, with firing time τ_i, which models the time the PMS spends in phase i, i = 1, 2, …, n.
For any marking m of the DSPN state space S, let D(m) denote the deterministic transition which is enabled in marking m. Consider the subsets S_i of S defined as follows:

S_i = { m ∈ S : D(m) = t_i^Det } ,   i = 1, 2, …, n
Owing to Assumption 1.1 together with Property 1 stated above, the sets S_i are disjoint from one another. The marking process { m(t), t ≥ 0 } moves from S_i to S_{i+1}, i = 1, 2, …, n−1, over time. Moreover, denote with S_{n+1} the set of markings given by S \ ∪_{i=1}^{n} S_i, which contains those absorbing markings reached by the model at the end of the mission, when a token is in place STOP. The sets S_i, i = 1, 2, …, n+1, form a partition of the marking set S. Let C_i denote the cardinality of set S_i, i = 1, 2, …, n+1.
Now, consider the continuous-time Markov chain { m_i(t), t ≥ 0 } whose state transition diagram corresponds to the reduced reachability graph of the DSPN when transition t_i^Det is enabled, that is, within the time interval [T_{i−1}, T_i), i = 1, 2, …, n. This stochastic process is called the subordinated process of transition t_i^Det. The state space of { m_i(t), t ≥ 0 } is a subset of S, and is given by S_i. The generator matrix of { m_i(t), t ≥ 0 }, denoted by Q_i, can be obtained by analysing the evolution of the DSPN while staying in phase i, i = 1, 2, …, n. More precisely, let λ(m, m′) denote the transition rate from marking m to marking m′ in the reduced reachability graph. Matrix Q_i has size C_i × C_i, and is defined as follows:
Q_i = [ q^i_{m,m′} ] ,   q^i_{m,m′} = { λ(m, m′)   if m ≠ m′ ;   −Σ_{r∈S_i, r≠m} q^i_{m,r}   if m = m′ } ,   m, m′ ∈ S_i
Between the regeneration points T_{i−1} and T_i, i = 1, 2, …, n, the evolution of the MRGP underlying the DSPN follows that of the simple subordinated continuous-time Markov chain { m_i(t), t ≥ 0 }. Therefore, for any t ∈ [T_{i−1}, T_i), the transient probability matrix can be obtained through the exponential of the matrix Q_i.
Let us define the following branching-probability matrices, which account for the branching probabilities as the deterministic transitions fire. Let Δ_i, i = 1, 2, …, n−1, be the matrix defined as follows:

Δ_i = [ δ^i_{m,m′} ] ,   δ^i_{m,m′} = Prob[ Y_i = m′ | m(T_i^−) = m ] ,   m ∈ S_i , m′ ∈ S_{i+1}

where we recall that Y_i is the marking of the DSPN at the regeneration instant T_i, which is the completion time of phase i, and T_i^− denotes the instant of time that immediately precedes T_i. The branching-probability matrix Δ_i has dimension C_i × C_{i+1}, i = 1, 2, …, n−1. These matrices can be automatically obtained when the reachability graph is generated [22].
According to the partition S_i, i = 1, 2, …, n+1, of the state space S, the matrices K(t) and E(t) can be written in block form K(t) = [ K_{i,j}(t) ] and E(t) = [ E_{i,j}(t) ]. The block matrices K_{i,j}(t) and E_{i,j}(t), i, j = 1, 2, …, n+1, have size C_i × C_j. We emphasise that subscripts i and j now denote the phase being executed and not individual markings of the DSPN. Thus K_{i,j}(t) and E_{i,j}(t) are submatrices of K(t) and E(t), respectively.
Let us observe that the study of the transient evolution of the marking process { m(t), t ≥ 0 } can be stopped at the time T_n where the last phase ends, which, due to the deterministic duration of all the phases, is known in advance. Thus, we do not need to consider the evolution of the marking process inside the subset S_{n+1} of the state space S. The blocks of interest of the kernel matrices are defined as follows:
K_{i,j}(t) = { e^{Q_i τ_i} Δ_i u(t − τ_i)   if 1 ≤ i ≤ n−1 and j = i+1 ;   0   otherwise }

E_{i,j}(t) = { e^{Q_i t} (1 − u(t − τ_i))   if j = i ;   0   otherwise }
where u(t − τ_i) is the delayed unit step function defined as u(t − τ_i) = 0 if t < τ_i, and u(t − τ_i) = 1 if t ≥ τ_i, i = 1, 2, …, n. Let us observe that the function u(t − τ_i) is the cumulative distribution of the deterministic firing time of transition t_i^Det, for i = 1, 2, …, n. Matrices K(t) and E(t) depend on time only through these delayed unit step functions, because the regeneration points { T_i, i ≥ 0 } are chosen as the firing times of the deterministic transitions. Moreover, note that due to the linear structure of the PhN, matrix K(t) is a block upper-diagonal matrix with only a band of non-zero blocks, and matrix E(t) is a block diagonal one. These properties will be useful to simplify the general form of the transient probability matrix V(t), which gives the time-dependent marking occupation probabilities and is the solution to matrix Equation (1.5). Indeed, we can solve Equation (1.5) for matrix V(t) by exploiting the block partitioning V(t) = [ V_{i,j}(t) ], as follows:
V_{i,j}(t) = E_{i,j}(t) + Σ_{h=1}^{n} ∫_0^t dK_{i,h}(x) V_{h,j}(t − x) ,   t ≥ 0
By eliminating from the summation the null blocks of matrix K(t), we rewrite the preceding expression as follows:
V_{i,j}(t) = E_{i,j}(t) + ∫_0^t dK_{i,i+1}(x) V_{i+1,j}(t − x) =
          = E_{i,j}(t) + e^{Q_i τ_i} Δ_i ∫_0^t (d/dx) u(x − τ_i) V_{i+1,j}(t − x) dx =
          = E_{i,j}(t) + e^{Q_i τ_i} Δ_i V_{i+1,j}(t − τ_i) ,   t ≥ 0      (5.1)
where the last expression comes from the fact that the derivative of the function u(x − τ_i) is the Dirac impulse at τ_i, which allows us to reduce the convolution integral to just a time shift. Observe that matrix V(t) has an upper-triangular block structure.
This allows the linear system in Equation (5.1) to be solved by backward substitutions, recursively obtaining all the non-zero blocks V_{i,j}(t) of matrix V(t), as follows:

V_{i,j}(t) = { (∏_{h=i}^{j−1} e^{Q_h τ_h} Δ_h) E_{j,j}(t − Σ_{h=i}^{j−1} τ_h)   if i ≤ j ;   0   if i > j }      (5.2)
where the product over an empty set is intended to be the identity matrix.
It is worthwhile observing that evaluating the formula given in (5.2) to obtain the transient state probability matrix only requires us to derive the matrices e^{Q_h τ_h}, h = 1, 2, …, j, and Δ_h, h = 1, 2, …, j−1. The solution of the single DSPN model is reduced to the cheaper problem of deriving the transient probability matrices of a set of homogeneous continuous-time Markov chains whose state spaces are proper subsets of the whole state space of the marking process. Therefore, formula (5.2) provides a highly effective way of solving the single DSPN model of a PMS with a linearly structured mission.
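As a minimal numerical sketch of this product form (all rates, durations and branching matrices below are our own toy data, and a simple Taylor series stands in for a Markovian transient solver), the end-of-mission probability vector of a two-phase linear mission is just m0 · e^{Q_1 τ_1} Δ_1 · e^{Q_2 τ_2}:

```python
# Toy evaluation of the linear-PhN solution: the transient probabilities of each
# subordinated Markov chain are chained by the branching matrices.
import numpy as np

def taylor_expm(A, terms=60):
    """Matrix exponential by truncated Taylor series; a stand-in for
    scipy.linalg.expm, adequate for the small-norm matrices of this toy example."""
    P, T = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        P = P + T
    return P

# Phase 1: states (up, failed), failure rate 1e-3/h; phase 2: higher stress, 5e-3/h.
Q1 = np.array([[-1e-3, 1e-3], [0.0, 0.0]])
Q2 = np.array([[-5e-3, 5e-3], [0.0, 0.0]])
tau1, tau2 = 10.0, 4.0
Delta1 = np.eye(2)              # one-to-one branching between phase state spaces
m0 = np.array([1.0, 0.0])       # the mission starts with the system up

m_end = m0 @ taylor_expm(Q1 * tau1) @ Delta1 @ taylor_expm(Q2 * tau2)
reliability = m_end[0]          # probability of completing the mission up
```

For this absorbing-failure toy model the result is simply exp(−(1e-3·10 + 5e-3·4)), which the matrix product reproduces.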
5.2.2 Tree-like PhN
Now, consider the case when the next phase to be performed can be chosen at the time the current phase ends. The PhN exhibits a tree structure, with all the leaves of the tree linked to the place STOP. The solution of Equation (5.1) is quite similar to the previous case, but it requires a little more notation. As before, let 1, 2, …, n be the set of phases and t_1^Det, t_2^Det, …, t_n^Det be the corresponding deterministic transitions. Let f(i) denote the forward phases of phase i in the PhN, that is, the set of phases which can be performed after phase i, i = 1, 2, …, n. For the sake of simplicity, we assume that the ordering of the phases is such that j > i for each j ∈ f(i). Note that such an ordering can always be found because of the acyclic structure of the PhN.
Consider the Markov chain { m_i(t), t ≥ 0 }, defined as in the previous subsection, and let S_i and Q_i be its state space and its transition rate matrix, respectively, i = 1, 2, …, n. Define the branching probability matrix Δ_{i,j}, i, j = 1, 2, …, n, as follows:

Δ_{i,j} = [ δ^{i,j}_{m,m′} ] ,   δ^{i,j}_{m,m′} = { Prob[ Y_i = m′ | m(T_i^−) = m ]   if m ∈ S_i , m′ ∈ S_j , j ∈ f(i) ;   0   otherwise }
According to the partition S_i, i = 1, 2, …, n, of the state space S, matrix K(t) can be written in block form as K(t) = [ K_{i,j}(t) ]. Each block K_{i,j}(t) has size C_i × C_j and is defined as follows:

K_{i,j}(t) = e^{Q_i τ_i} Δ_{i,j} u(t − τ_i) ,   i, j = 1, 2, …, n ,   t ≥ 0
Note that matrix E(t), which accounts for the local evolution of the MRGP while within a phase, remains unchanged. The tree structure of the PhN still results in a block upper-diagonal form of matrix K(t), but in this case more non-zero blocks may appear in each row of the block matrix. Equation (1.5) can be rewritten as follows:
V_{i,j}(t) = E_{i,j}(t) + Σ_{h=1}^{n} ∫_0^t dK_{i,h}(x) V_{h,j}(t − x) =
          = E_{i,j}(t) + e^{Q_i τ_i} Σ_{h∈f(i)} Δ_{i,h} ∫_0^t (d/dx) u(x − τ_i) V_{h,j}(t − x) dx =
          = E_{i,j}(t) + e^{Q_i τ_i} Σ_{h∈f(i)} Δ_{i,h} V_{h,j}(t − τ_i) ,   t ≥ 0      (5.3)
The linear system in (5.3) can be solved by backward substitutions, and a solution can be provided as follows by exploiting the acyclic structure of the PhN. Consider the unique path p(i, j) linking phase i to phase j according to the tree structure of the PhN. This path is a set of phases p(i, j) = { p_1, p_2, …, p_r }, where p_1 = i, p_r = j and p_{h+1} ∈ f(p_h), h = 1, 2, …, r−1. Matrix V_{i,j}(t) is then given by the following formula:
V_{i,j}(t) = { (∏_{h=1}^{r−1} e^{Q_{p_h} τ_{p_h}} Δ_{p_h,p_{h+1}}) E_{j,j}(t − Σ_{h=1}^{r−1} τ_{p_h})   if p(i, j) ≠ ∅ ;   0   otherwise }      (5.4)
Equation (5.4) gives an operative way to evaluate the transient probability matrix of the DSPN model through the separate analysis of the various alternative paths which compose the mission. The preceding Equation (5.2) can be derived as a particular case of Equation (5.4).
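The path-product structure of Equation (5.4) can be sketched numerically. In the toy fragment below the tree, rates, durations and branching matrices are all invented for illustration, and a simple Taylor series stands in for a Markovian transient solver:

```python
# Evaluating a block V_{i,j}(t) along the unique path p(i,j) of a tree-shaped PhN.
import numpy as np

def taylor_expm(A, terms=60):
    """Truncated-Taylor matrix exponential (stand-in for scipy.linalg.expm)."""
    P, T = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        P = P + T
    return P

def path(i, j, parent):
    """Unique path from phase i down to phase j, or None if j is not below i."""
    p = [j]
    while p[-1] != i:
        if p[-1] not in parent:
            return None
        p.append(parent[p[-1]])
    return p[::-1]

def block_V(i, j, t, Q, tau, Delta, parent):
    """(product over the path of e^{Q tau} Delta) times E_{j,j}(t - path durations)."""
    p = path(i, j, parent)
    if p is None:
        return None                              # a zero block of V(t)
    M, shift = np.eye(2), 0.0
    for a, b in zip(p[:-1], p[1:]):
        M = M @ taylor_expm(Q[a] * tau[a]) @ Delta[(a, b)]
        shift += tau[a]
    return M @ taylor_expm(Q[j] * (t - shift))   # E_{j,j} before phase j ends

# Phase 1 branches to phase 2 (probability 0.7) or phase 3 (probability 0.3).
parent = {2: 1, 3: 1}
Q = {k: np.array([[-1e-3, 1e-3], [0.0, 0.0]]) for k in (1, 2, 3)}
tau = {1: 10.0, 2: 5.0, 3: 8.0}
Delta = {(1, 2): 0.7 * np.eye(2), (1, 3): 0.3 * np.eye(2)}
V12 = block_V(1, 2, 12.0, Q, tau, Delta, parent)
```

Each alternative mission path is analysed separately, which mirrors the separability noted in the text.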
5.2.3 Deterministic transitions with marking-dependent firing time
The treatment of deterministic transitions having a marking-dependent firing time is easily reduced to that of fixed-delay transitions [24]. Suppose phase i is modelled by a deterministic transition t_i^Det whose firing time is a function g(m) of the current marking m of the net.
In order to give a marking-independent definition of the amount of work performed by the marking-dependent deterministic transition, it is possible to introduce a normalised elapsed firing time, defined as the ratio between the elapsed firing time τ_m and the deterministic firing delay g(m) in marking m. The normalisation is extended to the transition rate matrix Q_i of the subordinated Markov chain representing the evolution of the PMS within phase i: the rate of the outgoing transitions of marking m is multiplied by the firing delay g(m) of t_i^Det. Correspondingly, the firing time of transition t_i^Det is set to a normalised unitary value.
An intuitive interpretation of this scaling of the subordinated Markov chain lies in modifying the rate, that is, the relative speed, of the exponential transitions of the SN that are concurrently enabled with t_i^Det, rather than changing the speed of the deterministic transition t_i^Det.
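A tiny sketch of the normalisation (toy generator and durations of our own choosing):

```python
# Normalisation of a deterministic transition with marking-dependent firing time
# g(m): each row m of the subordinated generator is scaled by g(m), and the
# deterministic transition is then given the normalised unitary firing time.
import numpy as np

def normalise_generator(Q, g):
    """Scale the outgoing rates of marking m by g[m]; one time unit of the scaled
    chain then corresponds to g(m) original time units spent in marking m."""
    Qn = np.array(Q, dtype=float)
    for m, gm in enumerate(g):
        Qn[m, :] *= gm
    return Qn

Q = np.array([[-2.0, 2.0], [1.0, -1.0]])   # subordinated chain of the phase
g = [3.0, 5.0]                             # phase duration seen from each marking
Qn = normalise_generator(Q, g)             # deterministic delay becomes 1
```

Rows of the scaled matrix still sum to zero, so Qn remains a valid generator.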
5.2.4 Sensitivity analysis of DSPN models of PMSs
The existence of an analytic expression for the transient marking occupation probabilities makes it possible to evaluate the derivatives of the dependability measures of interest with respect to the variations of some parameter θ. These derivatives can be conveniently employed to perform an analytical study of the effects that the parameter variations have on the dependability [35]. Indeed, the absolute value of the derivative indicates the magnitude of the variation of the measure for a small perturbation of the parameter, and its sign reveals whether an increase of the parameter value causes a corresponding increase or instead a decrease of the measure. Therefore, analytical sensitivity analysis can fruitfully complement the means available for the study of a PMS, in some cases avoiding or limiting the need of performing long batches of evaluations for a number of different values of the parameters. Moreover, the pointwise derivatives of the measure can be employed to provide accurate approximations of the measure over continuous intervals, through a series approximation.
In the following, we compute the derivative of the transient marking occupation probability vector at the end of the mission, with respect to some independent parameter θ. The sensitivity analysis of steady-state occupation probabilities for DSPN models has already been presented in [23]. The analytical derivation of this derivative is just an example of the sensitivity studies that can be performed once the analytical expressions for the transient marking occupation probabilities are available. The initial probability vector, the duration of the phases, the firing rates of exponential transitions, as well as the probability mass functions ruling the firing of immediate transitions, may all depend on the parameter θ. For the sake of simplicity, we consider a PMS having a chain-like structure of the PhN, the extension to a tree-like PhN being simply a matter of more complicated notation.
According to Equation (5.2) for the transient marking probability matrix, the vector of marking probabilities m(Σ_{i=1}^{n} τ_i) at the end of the mission is given by:

m(Σ_{i=1}^{n} τ_i) = m_0 V_{1,n}(Σ_{i=1}^{n} τ_i) = m_0 (∏_{h=1}^{n−1} e^{Q_h τ_h} Δ_h) E_{n,n}(τ_n)
where m_0 is the initial assignment of marking probabilities, and E_{n,n}(τ_n) is given by E_{n,n}(τ_n) = e^{Q_n τ_n}. To avoid a cumbersome notation, let us define Δ_n = I and M_h = e^{Q_h τ_h} Δ_h, h = 1, 2, …, n, so that the vector m(Σ_{i=1}^{n} τ_i) can be conveniently rewritten as follows:

m(Σ_{i=1}^{n} τ_i) = m_0 ∏_{h=1}^{n} M_h
Now, we evaluate the sensitivity function s(θ), defined as the derivative of m(Σ_{i=1}^{n} τ_i) with respect to θ:

s(θ) = ∂/∂θ m(Σ_{i=1}^{n} τ_i) = (∂m_0/∂θ) ∏_{h=1}^{n} M_h + m_0 (∂/∂θ ∏_{h=1}^{n} M_h)
The derivative of the product of the M_h, which appears in the last term of the preceding equation, is obtained as follows:

∂/∂θ ∏_{h=1}^{n} M_h = Σ_{k=1}^{n} [ (∏_{h=1}^{k−1} M_h) (∂M_k/∂θ) (∏_{h=k+1}^{n} M_h) ]
Last, the derivative of M_k can be obtained as follows:

∂M_k/∂θ = (∂/∂θ e^{Q_k τ_k}) Δ_k + e^{Q_k τ_k} (∂Δ_k/∂θ)
By combining the intermediate results found so far, we obtain the most general formula for the sensitivity function s(θ), for it takes into account the dependence on θ of any other parameter of the DSPN model. With respect to the evaluation of the PMS time-dependent marking occupation probabilities, the estimation of the sensitivity only requires a small additional computational effort. Indeed, the only non-trivial computation is that of the derivative of the matrix exponential, for which efficient algorithms have been defined [10, 39, 59]. Moreover, the general formula can be specialised if the dependence on θ solely involves some particular component of the net. Consider for instance the following interesting cases.
Suppose that the phases in a subset D of { 1, 2, …, n } have a duration that is a function of θ, and no other parameters are affected by θ variations. We can then estimate the sensitivity of the final marking occupation probability vector with respect to the length of the phases in D. In this case, the general expressions for the derivative can be simplified, because the functions M_k depend on θ only through the τ_k, and the matrices Q_k and Δ_k are independent of θ, k = 1, 2, …, n. Specifically, the derivative of M_k can be rewritten as follows:
∂M_k/∂θ = (∂/∂θ e^{Q_k τ_k}) Δ_k = (∂/∂θ Σ_{i=0}^{∞} (Q_k τ_k)^i / i!) Δ_k = (∂/∂θ Σ_{i=1}^{∞} (Q_k τ_k)^i / i!) Δ_k =
        = (Σ_{i=1}^{∞} (Q_k^i / i!) (∂τ_k^i/∂θ)) Δ_k = (Σ_{i=1}^{∞} (Q_k^i / i!) i τ_k^{i−1} (∂τ_k/∂θ)) Δ_k = Q_k (Σ_{i=1}^{∞} (Q_k^{i−1} / (i−1)!) τ_k^{i−1} (∂τ_k/∂θ)) Δ_k =
        = Q_k e^{Q_k τ_k} (∂τ_k/∂θ) Δ_k = Q_k M_k (∂τ_k/∂θ)
By combining this last equation with the others previously obtained, we obtain the following formula for the sensitivity function:
s(θ) = m_0 Σ_{k∈D} (∏_{h=1}^{k−1} M_h) Q_k M_k (∂τ_k/∂θ) (∏_{h=k+1}^{n} M_h) = m_0 Σ_{k∈D} (∏_{h=1}^{k−1} M_h) Q_k (∂τ_k/∂θ) (∏_{h=k}^{n} M_h)
If we are interested in a single particular phase j, and the dependence of τ_j on θ is such that τ_j = θ, we can further simplify to:
s(θ) = m_0 (∏_{h=1}^{j−1} M_h) Q_j (∏_{h=j}^{n} M_h)
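This simplified sensitivity can be cross-checked numerically against a finite difference. The toy data below (rates, durations, phase index) are our own, and a truncated Taylor series stands in for a transient solver:

```python
# Sensitivity of the end-of-mission probabilities to the duration of one phase:
# analytic formula m0 (prod_{h<j} M_h) Q_j (prod_{h>=j} M_h) vs finite difference.
import numpy as np

def taylor_expm(A, terms=60):
    """Truncated-Taylor matrix exponential (stand-in for scipy.linalg.expm)."""
    P, T = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        P = P + T
    return P

Q = [np.array([[-1e-3, 1e-3], [0.0, 0.0]]),
     np.array([[-5e-3, 5e-3], [0.0, 0.0]])]
Delta = [np.eye(2), np.eye(2)]           # Delta_n = I by the convention in the text
m0 = np.array([1.0, 0.0])

def mission_end(tau):
    M = [taylor_expm(Qh * th) @ Dh for Qh, th, Dh in zip(Q, tau, Delta)]
    return m0 @ M[0] @ M[1]

tau = [10.0, 4.0]
j = 1                                    # theta is the duration of the last phase
M = [taylor_expm(Qh * th) @ Dh for Qh, th, Dh in zip(Q, tau, Delta)]
s_analytic = m0 @ M[0] @ Q[j] @ M[1]

eps = 1e-6
s_numeric = (mission_end([tau[0], tau[1] + eps]) - mission_end(tau)) / eps
```

The negative sign of the first component confirms that lengthening the phase decreases the probability of ending the mission in the up state.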
Now, suppose we want to study the sensitivity with respect to the firing rates of some exponential transitions, for instance the repair rates. In this case, the generator matrix Q_k of the Markov chain subordinated to phase k, k = 1, 2, …, n, is the only element of the sensitivity function that depends on θ. The general formula for s(θ) can be rewritten as follows:
s(θ) = m_0 Σ_{k=1}^{n} (∏_{h=1}^{k−1} M_h) (∂/∂θ e^{Q_k τ_k}) Δ_k (∏_{h=k+1}^{n} M_h)
Consider now the case in which, at the end of each phase, the system is inspected and some maintenance actions are undertaken, depending on the status of its components. The repair actions can be of several types, ranging from the minimal repair, which brings a unit back to the condition it was in immediately before the failure occurred, up to the replacement of faulty or aged units with brand-new ones. Moreover, the repair itself can be unsuccessful, and damage may be inflicted on components. A coverage factor of the repair can therefore be included in the modelling of the inter-phase maintenance. A sensitivity analysis can be performed to evaluate the impact that this coverage factor may have on system dependability. Therefore, assuming that the coverage factor is a function of parameter θ, the matrices Δ_k, k = 1, 2, …, n−1, are the sole elements that depend on θ, and the formula for the sensitivity can be rewritten as follows:
s(θ) = m_0 Σ_{k=1}^{n−1} (∏_{h=1}^{k−1} M_h) e^{Q_k τ_k} (∂Δ_k/∂θ) (∏_{h=k+1}^{n} M_h)
5.3 Evaluation procedure
To compute the dependability measures of the PMS, including the reward-based measures, the transient marking occupation probability vector m(t) at time t, t ≥ 0, must be derived. We can obtain m(t) from the transient probability matrix V(t) given by Equation (5.4) in the last section, through the equation m(t) = m_0 V(t), where m_0 is the initial probability vector of the DSPN. To numerically evaluate the transient state probability vector m(t) of the DSPN models presented in the previous sections, different approaches can be considered.
A general-purpose transient solver for DSPNs such as TimeNET [37] can be used for this purpose. TimeNET provides many of the modelling features available under the DSPN paradigm, and is able to support the proposed modelling methodology. Thus, the PMS models can be built in the TimeNET environment according to the proposed structure and solved with the transient solution algorithm based on supplementary variables [36, 38] that the tool implements. Of course, in this case no advantage is obtained from the particular structure of the models. A system consisting of as many differential equations as the number of non-vanishing markings is built and solved. The asymptotic computational cost needed for the solution is the same as the one required by the method of Smotherman and Zemoudeh [79] described in Chapter 3.
In fact, in order to take advantage of the separability of the PMS model solution, a specific algorithm must be developed and implemented to obtain from the PMS model description the matrices needed to evaluate Equation (5.4). The solution algorithm takes as input the DSPN model and its initial probability vector, and performs the following steps:
1) build the Markov chain subordinated to phase i, and obtain its transition rate matrix Q_i, i = 1, 2, …, n;
2) build the branching probability matrices Δ_{i,j} from phase i to phase j, i = 1, 2, …, n−1, j ∈ f(i);
3) obtain the transient state probability vector of the subordinated Markov chain i at T_i, the ending time of the corresponding phase i, for i = 1, 2, …, n;
4) multiply the initial probability vector and the matrices, according to the ordering given by Equation (5.4), to obtain the final result.
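The four steps above can be sketched as a small driver. All model data below are invented toy values, and a truncated Taylor series stands in for the Markovian transient solver of step 3:

```python
# Toy end-to-end evaluation of a three-phase linear mission:
# step 1: subordinated generators; step 2: branching matrices;
# steps 3-4: per-phase transient solution chained by multiplication.
import numpy as np

def taylor_expm(A, terms=60):
    """Truncated-Taylor matrix exponential (stand-in for scipy.linalg.expm)."""
    P, T = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        P = P + T
    return P

# Step 1: transition rate matrices of the subordinated Markov chains.
Q = {i: np.array([[-r, r], [0.0, 0.0]]) for i, r in ((1, 1e-3), (2, 5e-3), (3, 2e-3))}
tau = {1: 10.0, 2: 4.0, 3: 6.0}
# Step 2: branching-probability matrices between consecutive phases.
Delta = {(1, 2): np.eye(2), (2, 3): np.eye(2)}
# Steps 3 and 4: propagate the probability vector phase by phase.
m = np.array([1.0, 0.0])                 # initial probability vector m0
for i, j in ((1, 2), (2, 3)):
    m = m @ taylor_expm(Q[i] * tau[i]) @ Delta[(i, j)]
m = m @ taylor_expm(Q[3] * tau[3])       # last phase: E_{n,n}(tau_n)
reliability = m[0]
```

Note that no global state space is ever built: each generator is handled separately, which is the source of the cost savings discussed below.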
All of the steps described above only require well-known algorithms; in fact, they have been implemented in most of the tools for the automated evaluation of dependability. Therefore, such tools can be used to various extents to simplify some of the steps of the algorithm. For example, it is possible to use any Markovian solver to carry out the transient analysis of the subordinated Markov chains in step 3 of the algorithm. In particular, the Stochastic Petri Net Package (SPNP) [25], which supports the SRN modelling paradigm, can be efficiently used to automate all the steps of the algorithm with a limited additional programming effort.
The computational cost for the solution of a PMS is given by the cost to build the matrices at steps 1) and 2), plus the cost for the transient solutions and the multiplications at steps 3) and 4). The transition rate matrices Q_i can be generated separately from one another, by building the reduced reachability graph of the DSPN when transition t_i^Det is enabled, i = 1, 2, …, n, whereas to build the matrices Δ_{i,j}, i = 1, 2, …, n−1, j ∈ f(i), the generation of the reachability graph for two consecutive phases is required.
It is worthwhile observing that the state space of the MRGP does not need to be generated and handled as a whole. Therefore, the overall computational cost of the algorithm sketched above is dominated by the operations at steps 3) and 4), which require the same cost O(Σ_{i=1}^{n} C_i² q_i τ_i) as the one needed for the evaluation of the hierarchical modelling approach presented in the previous Chapter 4. Thus, with our single-model approach we are able to deal efficiently and exactly with a quite general scenario of PMSs, at a computational complexity that is again limited by the threshold O(Σ_{i=1}^{n} C_i² q_i τ_i). The issues posed by the phased behaviour are completely solved by our method for the case of constant phase duration PMSs. The applicability of the DSPN methodology is only limited by the size of the biggest Petri net model which can be handled by the automated tools.
5.4 Application example
5.4.1 Modelling
In this section we apply the DSPN modelling approach to the example of PMS introduced in Chapter 2. Due to the intrinsic limitation of the class of transitions that can be included in DSPN models, some approximations and simplifications of the PMS are still needed for this methodology to be applicable.
The dynamic features of the PMS mission profile are easily modelled by exploiting the dependence between the PhN and the SN submodels, as we explained in Section 1 of this chapter. In particular, we can represent both the dynamic selection of the next phase to be performed at the time phase 2 ends, and the dynamically adjusted duration of phase 3, for which we can introduce a marking-dependent deterministic transition to model the two possible expected values τ3 and τ3′ of the phase duration. A phase is only allowed to have a constant duration, therefore we must introduce the same approximation considered in Chapter 4 for the hierarchical modelling methodology. We approximate the random duration of phase i with the expected value τ_i of the distribution F_i(t), for i = 1, 2, 4.
Once again, the modelling of the repair activities leads to an additional approximation. Indeed, the SN submodel can only contain exponential and instantaneous transitions; the distribution μ(t) of the repair time needs to be modelled by an exponential transition having a time-invariant constant rate μ. A proper value of μ is the inverse of the expected value of the time necessary to complete the repair.
The DSPN model of the PhN is shown in Figure 5.5. A token in place P_i, i = 1, 2, …, 4, 4′, models the execution of the corresponding phase of the PMS. Notice that place P_4′ is introduced because phase 4 can be executed along two different paths of the mission, that is, the path that includes the secondary objective phase 3, and the one that does not. The deterministic transitions t_i^Det, i = 1, 2, …, 4, 4′, model the duration of the corresponding phases. The place P_choice and the two instantaneous transitions t_yes and t_no
model the dynamic selection of the next phase to be performed, which takes place at the end of phase 2. The two instantaneous transitions are guarded by enabling conditions that are functions of the marking of the SN submodel, which will be presented in the following. Similarly, transition t_3^Det has a firing time that depends on the marking of the SN submodel. For the sake of clarity, we collect all the dependencies of the PhN parameters on the SN markings in Table 5.1. The rewards associated with the phases are shown in boldface. The initial marking of the PhN submodel assigns a single token to the place P_1, modelling the start of the first phase of the mission, and no tokens to all the remaining places.
[Figure: PhN submodel with places P_1, P_2, P_3, P_4, P_4′ and P_choice, deterministic transitions t_1^Det, t_2^Det, t_3^Det, t_4^Det, t_4′^Det, instantaneous transitions t_yes and t_no, and the phase rewards w and 10w in boldface]
Figure 5.5: PhN submodel for the example of PMS
PhN element                           SN marking condition
t_yes enabled if:                     m(Down) = 0
t_no enabled if:                      m(Down) > 0
firing time of t_3^Det = τ3 if:       m(Down) = 0
firing time of t_3^Det = τ3′ if:      m(Down) > 0

Table 5.1: Parameters of the PhN dependent on the SN marking
The SN submodel for the example of PMS is shown in Figure 5.6. It is worthwhile observing the extreme conciseness of the overall modelling of the PMS with respect to the number of different submodels we defined according to the hierarchical modelling methodology presented in Chapter 4. The two submodels shown in Figures 5.5 and 5.6 completely represent our example of PMS. The state of the processors composing the processing system of the PMS is modelled by the three places Up, Down, and Spare.
The marking of these places represents the number of actively working, failed and cold spare processors, respectively. The two exponential transitions t1 and t2 model the failure and repair processes of the processors, moving tokens back and forth between place Up and place Down. The rate of transition t1 is made dependent on the marking of the PhN, to model the different failure intensities of the processors inside the various phases. The phase-dependent parameters of the SN, whose values are determined by the marking of the PhN, are shown in Table 5.2.
[Figure: SN submodel with places Up, Down, Spare and FAIL, exponential transitions t1 and t2, instantaneous transitions Turn-off, Rec-ok (probability c), Rec-nok (probability 1−c), and the failure transitions S-fail1, S-fail2, S-fail3]
Figure 5.6: SN submodel for the example of PMS
SN element                 m(P_1) = 1     m(P_2) = 1     m(P_3) = 1     m(P_4) = 1     m(P_4′) = 1
firing rate of t1:         2λ·m(Up)       λ·m(Up)        5λ·m(Up)       2λ·m(Up)       2λ·m(Up)
Turn-off enabled if:       m(Up) > 3      m(Up) > 2      m(Up) > 3      m(Up) > 3      m(Up) > 3
Rec-ok enabled if:         m(Up) < 2      m(Up) < 1      m(Up) < 2      m(Up) < 2      m(Up) < 2
Rec-nok enabled if:        m(Up) < 2      m(Up) < 1      m(Up) < 2      m(Up) < 2      m(Up) < 2
S-fail1 enabled if:        m(Down) = 3    m(Down) = 4    m(Down) = 2    m(Down) = 2    m(Down) = 2
S-fail2 enabled if:        m(Down) = 3    m(Down) = 4    m(Down) = 2    m(Down) = 2    m(Down) = 2
S-fail3 enabled if:        m(Down) = 3    m(Down) = 4    m(Down) = 2    m(Down) = 2    m(Down) = 2

Table 5.2: Parameters of the SN dependent on the PhN marking
The instantaneous transition Turn-off models the switching off of a processor that, according to the requirements of the phase being currently executed, is no longer needed to be active. Similarly, the other two instantaneous transitions Rec-ok and Rec-nok model the spare reactivation, and the possible successful or failed result of this operation. The two instantaneous transitions Rec-ok and Rec-nok are controlled by a condition on the marking of the SN: they can be enabled if and only if the PMS has not failed, that is, if the condition m(FAIL) = 0 holds for the marking of the net. Moreover, all three instantaneous transitions Turn-off, Rec-ok, and Rec-nok are controlled by additional enabling conditions depending on the marking of the PhN subnet, as shown in Table 5.2.
Finally, the instantaneous transitions S-fail1, S-fail2, and S-fail3 are introduced to model the failure criteria of the different phases. They are controlled by the guards shown in Table 5.2 and, once enabled, flush all the tokens of the corresponding input places to the FAIL place, which represents the failure of the phase and the consequent failure of the PMS. Notice that the input arcs of these transitions are variable cardinality arcs. The initial marking of the SN submodel is shown in Figure 5.6, with three tokens in the Up place and one in Spare, as prescribed by the requirements shown in Table 2.1 for phase 1 of the mission. No other tokens are assigned to the remaining places of the SN.
It is interesting to observe how closely the conditions relating the behaviour of the two submodels, as presented in Tables 5.1 and 5.2, resemble the form these same conditions are given in Table 2.1 of Chapter 2. In fact, high level DSPN modelling allows models of PMSs to be defined in a way that is very similar to a system specification language: easily understandable, unambiguous and easily modifiable.
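To illustrate this specification-like character, the phase-dependent parameters of Table 5.2 can be encoded directly as data, one entry per PhN place that may hold the phase token. This is a purely illustrative sketch: the names (PHASE_PARAMS, t1_rate, enabled) and the value of LAMBDA are invented here, not part of any DSPN tool, and the three S-fail guards are collapsed into one since they coincide.

```python
# Hypothetical encoding of Table 5.2 as marking-dependent predicates.
# All identifiers and numeric values are illustrative.

LAMBDA = 1.0e-4  # base failure rate lambda (illustrative value)

# For each phase place: multiplier k in the firing rate k*lambda*m(Up) of t1,
# and the thresholds used by the guards of the instantaneous transitions.
PHASE_PARAMS = {
    "P1":  dict(rate_mult=2, up_off=3, up_rec=2, down_fail=3),
    "P2":  dict(rate_mult=1, up_off=2, up_rec=1, down_fail=4),
    "P3":  dict(rate_mult=5, up_off=3, up_rec=2, down_fail=2),
    "P4":  dict(rate_mult=2, up_off=3, up_rec=2, down_fail=2),
    "P4'": dict(rate_mult=2, up_off=3, up_rec=2, down_fail=2),
}

def t1_rate(phase, m_up):
    """Firing rate of t1: k * lambda * m(Up), with k set by the PhN marking."""
    return PHASE_PARAMS[phase]["rate_mult"] * LAMBDA * m_up

def enabled(phase, m_up, m_down, m_fail=0):
    """Evaluate the guards of the instantaneous SN transitions."""
    p = PHASE_PARAMS[phase]
    return {
        "Turn-off": m_up > p["up_off"],
        "Rec-ok":  m_fail == 0 and m_up < p["up_rec"],
        "Rec-nok": m_fail == 0 and m_up < p["up_rec"],
        "S-fail":  m_down == p["down_fail"],   # same guard for S-fail1/2/3
    }
```

Rearranging the mission then amounts to editing this table of predicates, which is precisely the kind of modifiability the text claims for the DSPN models.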
5.4.2 Evaluation
We evaluate the PMS Reliability R_i at the completion time of phase i, i = 1,2,3,4, with the same parameter values as those introduced in Table 4.1. However, it is important to keep in mind that the dynamic duration of phase 3 is now taken into account by the marking-dependent firing delay of the corresponding deterministic transition of the PhN. The Reliability values for the DSPN model solution are shown in Figure 5.7. It is interesting to compare these results with the pairs obtained with the hierarchical modelling methodology, shown in Figure 4.13. The Reliability R_i obtained with the DSPN model always lies within the interval identified by the pair of values of R_i returned by the hierarchical model solution for l = τ3 and l = τ3 r.
The total reward cumulated during the mission of the PMS for various values of c, the
probability of successful reconfiguration, is shown in Figure 5.8. As in the case of the
Reliability, the rewards obtained with the DSPN approach are within the intervals de-
fined by the pair of reward values evaluated with the hierarchical Markov modelling ap-
proach, shown in Figure 4.14.
[Figure: Reliability values R1, R2, R3, R4, plotted on a scale from 0.65 to 1]
Figure 5.7: Reliability at phase completion times
[Figure: Reward (0.6 to 0.9) plotted against c, the probability of successful reconfiguration, for c = 0.85, 0.9, 0.95, 0.99, 0.995, 0.999]
Figure 5.8: Reward cumulated during the mission
5.5 Concluding remarks
In this Chapter we have presented a new method to model PMS dependability-related characteristics using DSPN models. The method takes advantage of the power and expressiveness of DSPNs to provide a modelling methodology that results in compact and easy-to-read models of PMSs. At the same time, the DSPN models of PMSs can be solved very efficiently, by exploiting the peculiar characteristics of the underlying MRGP marking process.
According to the proposed methodology, the model of a PMS is built as a single DSPN, though logically split in two parts which separately represent the sequence of phases encompassed by the PMS and the failure/repair behaviour of its components. As we discussed in Chapter 2, the methods based on the single model approach offer the relevant advantage of implicitly solving the issues related to the treatment of the dependencies among phases. Indeed, for our DSPN model, the behaviour of the PMS between two consecutive phases is represented by a branching probability matrix that can be automatically obtained from the analysis of the reachability graph. This completely relieves the dependability analyst of the problem of mapping the marking occupation probabilities, which plagues the approaches based on separate modelling.
Besides, DSPN modelling permits a quite satisfactory level of mission flexibility to be represented. The dynamic selection of the next phase to be performed is easily modelled. Further, it is possible to take into account phases whose duration can be adjusted as the state of the system changes, a feature that has never been considered by any of the other methods that appeared in the literature. Since the phase-dependent behaviour of the system is described with a simple list of marking-dependent predicates, it becomes easy to rearrange and modify the model of PMSs.
These interesting characteristics are reinforced by the existence of a very efficient solution procedure for the evaluation of the time-dependent marking occupation probabilities. We have obtained this efficient solution technique by specialising the general transient analysis method that has been proposed for MRGPs. Remarkably, the computational complexity of this evaluation technique for the DSPN models of PMSs is exactly the same as that of the cheapest methodologies presented in Chapter 2. Moreover, another interesting aspect of the DSPN approach is that it easily lends itself to the automation of the solution steps, by exploiting the capabilities of the automated tools for the evaluation of system dependability.
We have applied our method to the quite complex example of PMS defined in Chapter 2, to show how well it can deal with nearly all its peculiar features, including those related to the flexibility of the mission. The main limitation of the DSPN approach is that it still requires some of the same restrictive assumptions as the other methods proposed in the literature. In particular, the durations of the phases are constrained to be deterministic, and the phase model is restricted to contain only exponential and instantaneous transitions.
However, this methodology opens a new direction in the research on PMSs. In the next chapter of this thesis, we exploit the full potential of MRGPs to widen the class of systems that can be analysed.
Chapter 6
MRSPN models for PMSs
In this chapter we address the most relevant weakness shared by all the methods that have appeared in the literature, that is, the lack of an exact analytic solution for PMSs with random phase durations. To this aim, we continue the exploitation of the potential of MRGP theory, started in the previous chapter with the DSPN modelling of PMSs.
As introduced in Chapter 1, the class of Markov regenerative stochastic Petri nets (MRSPNs) has been especially defined to characterise those Petri net models whose underlying marking process exhibits the MRGP structure, and is thus amenable to an exact analytical solution within the framework of MRGP theory. MRSPNs allow for transitions having a general distribution of the firing delays, and thus permit the exact modelling and evaluation of a dramatically wider class of systems with respect to the other classes of Petri nets.
Here, we propose MRSPN models for the analysis of PMSs. We basically follow the same single model approach that has been fruitfully employed in the case of the DSPN methodology. We specialise the general MRGP theory that applies to the solution of the MRSPN models, to exploit the special phased structure of PMSs. The resulting analytical solution technique for PMSs has a low computational complexity, which is again basically dominated by the cost of the separate analysis of the PMS inside each phase. This way, the solution issues posed by the phased behaviour are completely solved in the general scenario of PMSs with general random phase durations. The solution for PMSs with deterministic phase durations is obtained as a particular case, together with all the other scenarios previously treated by the various studies in the literature.
MRSPN models for PMSs 101
The rest of the chapter is organised as follows. We first present the MRSPN approach for PMSs, discussing the extensions of modelling power that it offers. Then, we define the analytical solution procedure for the MRGP underlying the MRSPN model, by specialising the general solution approach presented in Chapter 1. We obtain the exact analytical solution for the Laplace-Stieltjes transforms of the time-dependent marking occupation probabilities, and for the probability R of successfully completing the mission. Also, some specialisations of the general results are given for a set of interesting particular cases, and the applicability limits of the MRSPN modelling approach are discussed. Furthermore, the analytical derivation of the sensitivity functions is presented. The modelling approach is then applied to our example of PMS, and its evaluation carried out in terms of the Reliability R_i at phase i completion time, i = 1,2,…,n. Last, some concluding remarks are given to compare the MRSPN approach to the other ones that have appeared in the literature.
6.1 The modelling approach
The modelling of PMSs through MRSPN models follows the general approach we presented in the last chapter for DSPNs. We adopt the same single modelling strategy, building a model of the PMS that is logically split in a submodel called PhN, which represents the mission, and a submodel called SN, which represents the failure/repair behaviour of the system components. Moreover, we also exploit the high level modelling features introduced for DSPNs, such as marking-dependent firing rates, guards on transitions and variable cardinality arcs, which allow a concise description of the PMS dynamics. In this way, we follow the same general model of a PMS shown in Figure 5.1. However, since the MRSPN class also allows for generally distributed firing times, we do not impose any constraint on the transitions of the PhN subnet.
To ensure that such a Petri net model is indeed a MRSPN, or equivalently that its underlying marking process is a MRGP, we can check whether the condition expressed by Assumption 1.2 is satisfied, which requires that at most one general transition be enabled in each marking of the Petri net model. As for the DSPN models introduced in the last chapter, this condition is fulfilled by the general MRSPN model sketched above,
provided that the only general transitions of the model are those included in the PhN subnet.
This constraint prevents the concurrent enabling of generally distributed transitions. In fact, we cannot introduce any general or deterministic transition in the SN submodel without violating Assumption 1.2. However, in the special case of the models for PMSs, a much less restrictive assumption is needed to ensure that the Petri net model is indeed a MRSPN:
Assumption 6.1: whenever a timed transition fires in the PhN, each enabled transition resamples a new firing time, which can only depend on the marking of the net.
Owing to the structure of the model, this condition is sufficient to ensure that the firing
times of the timed transitions in the PhN are regeneration points of the MRSPN model,
and thus the underlying marking process is a MRGP. Note that it implies that all memory
of the model evolution, apart from that included in the marking, is lost whenever a new
phase begins. The limitations that this regeneration imposes on the representativity of the
models will be carefully discussed in the following.
It is worthwhile observing that this condition is much less restrictive than the one prescribed by Assumption 1.2. Between two regeneration points, the marking process of the model can be a completely general stochastic process. For instance, this makes it possible to include general transitions in the SN submodel, which can be concurrently enabled with the general transitions of the PhN, as long as no transition fires in the PhN. Moreover, Assumption 6.1 could be further relaxed. In fact, what is actually required to ensure that the model is a MRSPN is that a sequence of regeneration points exists. Therefore, it would be sufficient that the property stated by Assumption 6.1 hold for some subset of transitions in the PhN. However, throughout the rest of the chapter we assume that Assumption 6.1 holds for the model of the PMS.
6.2 Analytical solution
We now present the specialisation of the general solution method for MRGPs to the analytical study of the MRSPN models of PMSs. We consider in the following the case when the PhN submodel shows a linear structure, that is, the PMS sequentially performs the n phases the mission consists of. The generalisation of the results to the models of PMSs with a tree structure is simply a matter of slightly more complicated notation, as it was for the DSPN modelling approach. In the following, we first derive the analytical expression for the transient marking occupation probabilities, and then we focus on the evaluation of the Reliability figures, obtaining quite simple and compact expressions for some interesting cases of PMSs. Last, we analytically obtain the sensitivity functions for both the transient marking occupation probabilities and the Reliability of PMSs.
6.2.1 The transient marking occupation probabilities
Let 1,2,…,n be the ordered sequence of phases performed through the mission. Let t_i^Gen be the transition of the PhN which models the time the PMS spends in phase i, i = 1,2,…,n. The firing time of transition t_i^Gen is assumed to be a general random variable having cumulative distribution function F_i(t), and probability density function f_i(t) = F'_i(t), i = 1,2,…,n.
Denote with S the set of markings that form the reachability graph of the MRSPN model, and with {m(t), t ≥ 0} the stochastic process representing the evolution of the marking of the MRSPN. Define T_0 = 0, and let T_i be the time instant at which transition t_i^Gen completes, i = 1,2,…,n. As in the case of the DSPN models of PMSs, it is easy to check that the phase ending times T_i represent regeneration points for the marking process {m(t), t ≥ 0}. Indeed, the property stated by Assumption 6.1 guarantees that the memoryless Markov property holds for the marking process of the MRSPN at the phase completion times. Thus, the sequence of bivariate random variables {(Y_i, T_i), i = 1,2,…,n}, where Y_i denotes the marking of the MRSPN model at the time instant immediately after T_i, is a Markov regenerative sequence, and consequently the marking process {m(t), t ≥ 0} is a MRGP.
The global kernel matrix K(t) = [k_{m,m'}(t)] and the local kernel matrix E(t) = [e_{m,m'}(t)], m, m' ∈ S, t ≥ 0, are defined as in the case of DSPN models. For the sake of clearness we recall the analytical expressions of their entries:

k_{m,m'}(t) = Prob[Y_1 = m', T_1 ≤ t | Y_0 = m]    (6.1)

e_{m,m'}(t) = Prob[m(t) = m', T_1 > t | Y_0 = m]    (6.2)
To obtain the transient probability matrix V ( t ), which describes the transient behaviour
of the MRGP marking process, we proceed with the same approach adopted for the
DSPN models of PMSs in Chapter 5, by separately studying the marking process
{ m( t ),t ≥ 0 } over the different phases.
Consider the subordinate process that describes the MRGP evolution during the time interval [T_{i−1}, T_i), when transition t_i^Gen is enabled. Notice that, due to the possibility of having concurrently enabled general transitions in the MRSPN model, the subordinate processes are not restricted to be Markov chains as in the case of DSPN models, but can be general stochastic processes. We denote with Π_i(t) the transient probability matrix of the subordinate process during the time period [T_{i−1}, T_i). Moreover, denote with S_i the state space of the subordinate process, i = 1,2,…,n. Because of the structure of the MRSPN model of a PMS, the sets S_i, i = 1,2,…,n, are disjoint from one another.
The set S \ ∪_{i=1}^{n} S_i contains those absorbing markings reached by the model at the end of the mission, when a token is in place STOP. Denote by S_{n+1} this set of markings. Let us observe that in the case of the DSPN models of PMSs we did not need to take into account the evolution of the marking process over the states of the set S_{n+1}, because the measures of interest could be evaluated from the transient marking occupation probabilities at the mission completion time, which was known. Here, for PMSs that have random phase durations, the mission completion time is unknown, and we must analyse the evolution of the PMS after the last phase ending time. We consider a fictitious phase n+1 of infinite duration to properly take into account the random times to absorption into the states of S_{n+1}.
The sets S_i, i = 1,2,…,n+1, form a partition of the marking set S. Let C_i be the cardinality of set S_i, i = 1,2,…,n+1. The global kernel K(t) and the local kernel E(t) matrices can be reordered according to the partition defined by the sets S_i, i = 1,2,…,n+1, and thus partitioned in blocks K_{i,j}(t) and E_{i,j}(t) of dimensions C_i × C_j. Matrices K(t) and E(t) exhibit exactly the same block structure they had in the case of the DSPN models of PMSs, that is, the block matrices K_{i,j}(t) and E_{i,j}(t) take non-zero values if and only if j = i + 1 and j = i, respectively. Indeed, this particular structure of the kernel matrices is due to the peculiar features of the PMS.
Let Δ_i, i = 1,2,…,n, be the C_i × C_{i+1} branching-probability matrix defined as follows:

Δ_i = [δ^i_{m,m'}] = Prob[Y_i = m' | m(T_i^−) = m],  m ∈ S_i, m' ∈ S_{i+1}

where T_i^− denotes the time instant immediately before T_i. The elements of the global kernel defined by Equation (6.1) can be rewritten as follows:

k_{m,m'}(t) = Prob[Y_1 = m', T_1 ≤ t | Y_0 = m] = ∫_0^t Prob[Y_1 = m', T_1 = u | Y_0 = m] du =
            = ∫_0^t Prob[Y_1 = m' | T_1 = u, Y_0 = m] f_i(u) du,  m, m' ∈ S, t ≥ 0    (6.3)
where we recall that f_i(u) = F'_i(u) is the probability density function of the duration of phase i, i = 1,2,…,n. According to Equation (6.3), the non-zero blocks K_{i,i+1}(t) of the global kernel matrix K(t) can be obtained as follows:

K_{i,i+1}(t) = ∫_0^t Π_i(u) Δ_i f_i(u) du,  i = 1,2,…,n, t ≥ 0
Similarly, the entries e_{m,m'}(t) of the local kernel matrix defined by Equation (6.2) can be rewritten as follows:

e_{m,m'}(t) = Prob[m(t) = m', T_1 > t | Y_0 = m] =
            = Prob[m(t) = m' | T_1 > t, Y_0 = m] ⋅ Prob[T_1 > t],  m, m' ∈ S, t ≥ 0

and consequently, the non-zero blocks E_{i,i}(t) of the local kernel matrix E(t) are given by:

E_{i,i}(t) = Π_i(t) (1 − F_i(t)),  i = 1,2,…,n, t ≥ 0
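A useful sanity check of the two kernel definitions is that, since Π_i(t) and Δ_i are row-stochastic, for every t the rows of K_{i,i+1}(t) must sum to F_i(t) and the rows of E_{i,i}(t) to 1 − F_i(t). The sketch below verifies this numerically for a hypothetical 2-state subordinate process, taken to be a CTMC only so that Π_i(t) = e^{Qt} is easy to compute (numpy/scipy assumed; all numeric values illustrative).

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad_vec

Q = np.array([[-0.3, 0.3],    # Up -> Down failures at rate 0.3 (illustrative)
              [ 0.0, 0.0]])   # Down absorbing inside the phase
Delta = np.eye(2)             # branching matrix: identity mapping (illustrative)
lam = 0.5                     # exponential phase duration, F(t) = 1 - exp(-lam*t)

def F(t): return 1.0 - np.exp(-lam * t)
def f(t): return lam * np.exp(-lam * t)

def K_block(t):
    """K_{i,i+1}(t) = integral_0^t Pi(u) Delta f(u) du, by quadrature."""
    val, _ = quad_vec(lambda u: expm(Q * u) @ Delta * f(u), 0.0, t)
    return val

def E_block(t):
    """E_{i,i}(t) = Pi(t) (1 - F(t))."""
    return expm(Q * t) * (1.0 - F(t))
```

Together, K_{i,i+1}(t)·1 + E_{i,i}(t)·1 = 1 for every t, reflecting the fact that at time t phase i has either already ended or is still running.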
Consider the block partition of matrix V(t) induced by the sets S_i, i = 1,2,…,n+1. Owing to the upper triangular and diagonal block structure of matrices K(t) and E(t), respectively, all the blocks V_{i,j}(t) are null if i > j, that is, matrix V(t) exhibits an upper triangular block form. The generalised Markov renewal Equation (1.5) can be rewritten according to the block partitioning of the matrices:

V_{i,j}(t) = E_{i,j}(t) + ∫_0^t dK_{i,i+1}(u) V_{i+1,j}(t − u),  i, j = 1,2,…,n+1, t ≥ 0    (6.4)

and a backward substitution procedure can be applied to obtain all the non-zero blocks of V(t) in the j-th column, starting from the diagonal block V_{j,j}(t) = E_{j,j}(t).
The system of integral equations (6.4) could be solved numerically in the time domain to obtain matrix V(t), but this would result in a computationally expensive procedure. Indeed, each substitution step requires the solution of the integral Equation (6.4) for the associated submatrices, which is costly and may require the introduction of numerical approximations.
Alternatively, we can resort to the analysis in the Laplace-Stieltjes transform (LST) domain. Define the Laplace transform L[f](s) of a function f(t) to be ∫_0^{+∞} e^{−st} f(t) dt, and its LST f°(s) as follows:

f°(s) = ∫_0^{+∞} e^{−st} df(t)

The properties of LSTs are thus derived from those of Laplace transforms, which for the sake of convenience we list in Appendix A.
If we take the LST of both sides of Equation (6.4) we obtain:

V°_{i,j}(s) = E°_{i,j}(s) + K°_{i,i+1}(s) V°_{i+1,j}(s),  i = 1,2,…,n+1, j ≥ i

where V°_{i,j}(s), E°_{i,j}(s), and K°_{i,i+1}(s) are the LSTs of V_{i,j}(t), E_{i,j}(t) and K_{i,i+1}(t), respectively. Then, the backward substitution procedure can be explicitly performed and we get the expression for each block of matrix V°(s), as follows:

V°_{i,j}(s) = ∏_{h=i}^{j−1} K°_{h,h+1}(s) ⋅ E°_{j,j}(s),  i = 1,2,…,n+1, j ≥ i    (6.5)
Equation (6.5) is remarkable for its generality. It allows one to derive the LST of the time-dependent marking occupation probabilities for the MRSPN model of a PMS with general phase durations and a general intraphase marking process. Of course, a numerical inverse transformation is needed to obtain the final result in the time domain. However, we only need to invert a single block V°_{i,j}(s), rather than performing a sequence of numerical solutions of integral equation systems, as required by the analysis in the time domain with Equation (6.4). Moreover, as we shall see in the following, the LSTs of the kernel blocks can be explicitly computed for a wide family of distributions of the phase duration.
6.2.2 Analytical evaluation of PMS Reliability
Starting from the transient MRGP probability matrix V(t), we can compute the probability R of successfully completing the mission. If m_0 is the initial probability vector over the markings of the reduced reachability graph of the PMS model, the expression m_0 ⋅ V_{1,n+1}(t) is the time-dependent marking occupation probability vector at time t, t ≥ 0, over the markings of S_{n+1}, which represent the state of the PMS at the end of the mission. Since these markings are all absorbing, we can compute probability R as the product m_0 ⋅ V_{1,n+1}(∞) ⋅ Θ, where Θ is the vector that selects those markings that represent successful states of the system, according to the success criterion defined for the mission of the PMS.
The computation of R is thus formulated as the problem of evaluating the limiting probability distribution of the transient MRGP process. Once the block matrix V_{1,n+1}(t) has been computed, the following limit needs to be obtained:

R = lim_{t→∞} m_0 ⋅ V_{1,n+1}(t) ⋅ Θ

However, it is also possible to take advantage of the LST theory to directly compute the limit without obtaining the time-domain expression of V_{1,n+1}(t), but rather dealing with its transform V°_{1,n+1}(s) given by Equation (6.5):
V°_{1,n+1}(s) = ∏_{i=1}^{n} K°_{i,i+1}(s) ⋅ E°_{n+1,n+1}(s) = ∏_{i=1}^{n} K°_{i,i+1}(s)
Notice that the last equivalence comes from the fact that all the markings of S_{n+1} are absorbing markings, and thus matrix E°_{n+1,n+1}(s) is the identity matrix. Then, because of the time-invariance of m_0 and Θ, and from the final-value theorem of Laplace transforms (see Appendix A), the reliability R is obtained as follows:

R = lim_{t→∞} m_0 ⋅ V_{1,n+1}(t) ⋅ Θ = m_0 ⋅ ( lim_{t→∞} V_{1,n+1}(t) ) ⋅ Θ = m_0 ⋅ ( lim_{s→0+} V°_{1,n+1}(s) ) ⋅ Θ
Let K*_{i,i+1} denote the limit for s → 0+ of matrix K°_{i,i+1}(s), which surely exists by the properties of the transient probability matrices. Then, we can rewrite the preceding formula as:

R = m_0 ⋅ lim_{s→0+} ∏_{i=1}^{n} K°_{i,i+1}(s) ⋅ Θ = m_0 ⋅ ∏_{i=1}^{n} K*_{i,i+1} ⋅ Θ    (6.6)
Thus, an explicit formula can be obtained for probability R, without any need to evaluate the transient state probability matrix of the MRGP underlying the PMS model. It is worthwhile observing that the approach followed to compute the reliability of the PMS at the time the mission ends can also be applied to obtain the reliability of the system at each phase completion instant. Indeed, each phase of the system can be seen as being the last one the system has to perform, and thus the Reliability R_i at the end of a phase i, i = 1,2,…,n, is computed as an absorption probability through the limit of the global kernel submatrix transforms.
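Once the limit kernels K*_{i,i+1} are available, Equation (6.6) and the per-phase reliabilities R_i reduce to products of small matrices. The sketch below illustrates this; the two K* matrices are made-up numbers for a two-marking state space per phase (operational, failed), not taken from the thesis example.

```python
import numpy as np

# Hypothetical limit kernels K*_{1,2} and K*_{2,3} (illustrative values).
K1 = np.array([[0.90, 0.10],      # phase 1: survive with prob 0.9, fail with 0.1
               [0.00, 1.00]])     # already-failed markings stay failed
K2 = np.array([[0.80, 0.20],
               [0.00, 1.00]])
m0 = np.array([1.0, 0.0])         # initially operational
Theta = np.array([1.0, 0.0])      # selects the successful markings

def phase_reliabilities(kernels, m0, Theta):
    """R_i = m0 . K*_{1,2} ... K*_{i,i+1} . Theta, for i = 1..n (Eq. 6.6)."""
    out, acc = [], m0.copy()
    for K in kernels:
        acc = acc @ K             # partial product of limit kernels
        out.append(float(acc @ Theta))
    return out
```

With these numbers the partial products give R_1 = 0.9 and R_2 = 0.9 ⋅ 0.8 = 0.72, showing how each phase can be treated as if it were the last one.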
6.2.3 Specialising the general results
In this section we present a specialisation of the general results provided by Equations (6.4) and (6.5) for the transient marking occupation probabilities, and by Equation (6.6) for the Reliability R at mission completion time. We consider three particular scenarios. First, we briefly reconsider the case when the phases have a deterministic duration, for which we can complete the derivation of the transient marking occupation probabilities in the time domain. Then, we explore the particular case when the subordinate process is a simple homogeneous continuous-time Markov chain, that is, all the timed transitions of the SN submodel are exponential transitions. In this scenario, we consider the special case of deterministic phase durations, for which we reobtain the same analytic results for the Reliability R as the ones obtained with the DSPN approach, thus providing a cross validation of the correctness of the two methodologies. Last, we discuss the case when the subordinate process is a more general stochastic process, and the effects of Assumption 6.1 on the representativity of the models, to point out the applicability limits of the MRSPN approach.
Constant phase duration
Let τ_i be the deterministic firing time of transition t_i^Gen, i = 1,2,…,n. In this special case, we can complete the analysis in the time domain without resorting to the LSTs, by completing the backward substitution in Equation (6.4) in the same way we have done for the DSPN modelling approach. The blocks of the global kernel matrix K(t) take the following form:

K_{i,i+1}(t) = ∫_0^t Π_i(u) Δ_i f_i(u) du = Π_i(τ_i) Δ_i u(t − τ_i),  i = 1,2,…,n, t ≥ 0

where u(⋅) denotes the unit step function,
and thus Equation (6.4) for the transient matrix blocks V_{i,j}(t) can be rewritten as follows:

V_{i,j}(t) = E_{i,j}(t) + ∫_0^t dK_{i,i+1}(u) V_{i+1,j}(t − u) =
           = E_{i,j}(t) + Π_i(τ_i) Δ_i ∫_0^t f_i(u) V_{i+1,j}(t − u) du =
           = E_{i,j}(t) + Π_i(τ_i) Δ_i V_{i+1,j}(t − τ_i),  i, j = 1,2,…,n+1, t ≥ 0
where the last integral is solved as in the case of the DSPN models of PMSs, due to the fact that the probability density function of the deterministic transition is the Dirac impulse function at τ_i, which allows us to reduce the convolution integral to just a time shift. After completing the backward substitution procedure, we obtain the following expression for the transient probability matrix blocks V_{i,j}(t):

V_{i,j}(t) = ∏_{h=i}^{j−1} Π_h(τ_h) Δ_h ⋅ E_{j,j}( t − ∑_{h=i}^{j−1} τ_h ),  i = 1,2,…,n, j ≥ i, t ≥ 0    (6.7)
where the product over an empty set is to be intended as the identity matrix. It is worthwhile observing the close resemblance of Equation (6.7) to its transform counterpart, Equation (6.5). Also, the form of Equation (6.7) is equivalent to that of Equation (5.2), obtained in Chapter 5 for the time-dependent marking occupation probabilities of the DSPN models of PMSs. However, it is important to remark that Equation (6.7) is applicable to any MRSPN model of a PMS with constant phase durations that satisfies Assumption 6.1, regardless of the structure of the SN submodel.
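The time-shift structure of Equation (6.7) can be checked numerically in a degenerate case: if the subordinate processes are CTMCs sharing a single generator Q, Δ_h = I, and a fictitious final phase of infinite duration (F = 0) follows the deterministic ones, the product of phase matrices must collapse to plain transient analysis, e^{Qt}. A sketch under these assumptions (numpy/scipy assumed; values illustrative):

```python
import numpy as np
from scipy.linalg import expm

Q = np.array([[-0.2, 0.2],       # illustrative 2-state generator, shared by all phases
              [ 0.0, 0.0]])
taus = [1.0, 2.5]                # deterministic durations of phases 1 and 2
t = 4.2                          # observation instant inside the fictitious phase 3

# Equation (6.7) with i = 1, j = 3: prod_h Pi_h(tau_h) Delta_h, then the
# E_{3,3} factor evaluated at the shifted time t - (tau_1 + tau_2).
V = np.eye(2)
for tau in taus:                 # Delta_h = I, Pi_h(tau_h) = expm(Q * tau_h)
    V = V @ expm(Q * tau)
V = V @ expm(Q * (t - sum(taus)))   # E_{3,3}(t - sum tau_h), with F_3 = 0
```

Since e^{Qτ1} e^{Qτ2} e^{Q(t−τ1−τ2)} = e^{Qt}, the result agrees with a direct transient analysis of the chain, confirming that the convolution has indeed reduced to a time shift.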
Markov chain subordinate processes
In this section we consider the particular case when the SN submodel contains only transitions whose firing delays are either exponentially distributed or instantaneous. In this case, the subordinate process during the time window [T_{i−1}, T_i) is a continuous-time Markov chain. This implies that the transient probability matrix Π_i(t) of the subordinate process can be expressed via the matrix exponential e^{Q_i t}, where Q_i is the infinitesimal generator of the subordinate Markov process, i = 1,2,…,n+1. The expressions given for the global and local kernel matrix blocks can be rewritten as follows:

K_{i,i+1}(t) = ∫_0^t e^{Q_i u} Δ_i f_i(u) du,  i = 1,2,…,n, t ≥ 0

E_{i,i}(t) = e^{Q_i t} (1 − F_i(t)),  i = 1,2,…,n+1, t ≥ 0
Quite interestingly, the LSTs of the preceding expressions can be computed in explicit form. Indeed, the LST of the global kernel matrix blocks is obtained as follows:

K°_{i,i+1}(s) = ∫_0^{+∞} e^{−st} dK_{i,i+1}(t) = L[e^{Q_i t} f_i(t)] Δ_i,  i = 1,2,…,n    (6.8)

where L[⋅] denotes the Laplace transform operator. The LST of the local kernel blocks is given by:

E°_{i,i}(s) = ∫_0^{+∞} e^{−st} dE_{i,i}(t) = s L[e^{Q_i t} (1 − F_i(t))],  i = 1,2,…,n+1    (6.9)
These expressions for the LSTs can be completely worked out and the integration solved in many interesting special cases. For instance, consider the case when the duration of phase i has an exponential distribution of parameter λ, that is F_i(t) = 1 − e^{−λt} and f_i(t) = λ e^{−λt}, t ≥ 0. In this case, due to the properties of Laplace transforms, Equation (6.8) can be rewritten as follows:

K°_{i,i+1}(s) = L[e^{Q_i t} λ e^{−λt}] Δ_i = λ ((λ + s)I − Q_i)^{−1} Δ_i

where to obtain the last expression we recall that the Laplace transform of the matrix exponential e^{Mt} is given by (sI − M)^{−1}. Similarly, Equation (6.9) for the local kernel block matrices becomes:

E°_{i,i}(s) = s L[e^{Q_i t} e^{−λt}] = s ((λ + s)I − Q_i)^{−1}
Consider now the case when phase i has a deterministic duration τ. From Equation (6.8) we obtain the following expression for the global kernel block matrices:

K°_{i,i+1}(s) = L[e^{Q_i t} f_i(t)] Δ_i = e^{−sτ} e^{Q_i τ} Δ_i = e^{−(sI − Q_i)τ} Δ_i

The local kernel block matrices can be obtained as follows:

E°_{i,i}(s) = s (sI − Q_i)^{−1} ( I − e^{−(sI − Q_i)τ} )

The detailed derivation of these analytical expressions for the LSTs of the kernel matrices can be found in Appendix B, where other interesting cases are also presented.
Owing to the fact that explicit expressions for the LSTs of the global kernel matrix blocks are available, we can compute the Reliability R of successfully completing the mission by applying Equation (6.6). Suppose for instance that the phase durations are all exponentially distributed, and let λ_i be the firing rate of transition t_i^Gen in the PhN submodel. Then, Equation (6.6) takes the following form:

R = m_0 ⋅ lim_{s→0+} ∏_{i=1}^{n} λ_i ((s + λ_i)I − Q_i)^{−1} Δ_i ⋅ Θ = m_0 ⋅ ∏_{i=1}^{n} λ_i (λ_i I − Q_i)^{−1} Δ_i ⋅ Θ    (6.10)
Notice the simplicity of Equation (6.10), which allows the Reliability R to be evaluated at a very limited cost. If the phase durations are all deterministic, and phase i lasts for τ_i units of time, Equation (6.6) specialises as follows:

R = m_0 ⋅ lim_{s→0+} ∏_{i=1}^{n} e^{−(sI − Q_i)τ_i} Δ_i ⋅ Θ = m_0 ⋅ ∏_{i=1}^{n} e^{Q_i τ_i} Δ_i ⋅ Θ    (6.11)
in perfect agreement with the result already established in Chapter 5 with the time-domain analysis of the MRGP transient marking occupation probabilities of the DSPN models of PMSs.
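For a hypothetical two-state component (one operational marking failing at rate γ, one absorbing failed marking, Δ_i = I), the single-phase instances of Equations (6.10) and (6.11) have elementary answers, λ/(λ+γ) and e^{−γτ} respectively, which the matrix expressions must reproduce. A sketch under these assumptions (numpy/scipy assumed; parameter values illustrative):

```python
import numpy as np
from scipy.linalg import expm

gamma, lam, tau = 0.1, 0.5, 3.0          # failure rate, phase rate, phase length
Q = np.array([[-gamma, gamma],           # Up -> Fail at rate gamma
              [0.0,    0.0]])            # Fail absorbing
m0 = np.array([1.0, 0.0])                # start operational
Theta = np.array([1.0, 0.0])             # success = still operational
I = np.eye(2)

# Equation (6.10), single exponential phase of rate lam:
R_exp = m0 @ (lam * np.linalg.inv(lam * I - Q)) @ Theta
# Equation (6.11), single deterministic phase of length tau:
R_det = m0 @ expm(Q * tau) @ Theta
```

Here R_exp is the probability that the phase ends (rate λ) before the component fails (rate γ), a race between two exponentials, while R_det is simply the probability of surviving for τ time units.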
Deterministic and random phase durations can be freely mixed, and the Reliability of the PMS is still easily obtained according to Equation (6.6). For instance, consider a periodic PMS alternating the execution of a phase whose duration is a random variable exponentially distributed with parameter λ, and a constant duration phase which lasts for τ units of time. The PMS cyclically executes the two phases over time. The Reliability R_{2n} of the system at the time the 2n-th phase ends is obtained as follows:

R_{2n} = m_0 ⋅ lim_{s→0+} ( λ((s + λ)I − Q_1)^{−1} Δ_1 e^{−(sI − Q_2)τ} Δ_2 )^n ⋅ Θ =
       = m_0 ⋅ ( λ(λI − Q_1)^{−1} Δ_1 e^{Q_2 τ} Δ_2 )^n ⋅ Θ
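For a hypothetical two-state component failing at rate γ in both phases (Q_1 = Q_2, Δ_1 = Δ_2 = I), each exponential phase contributes a factor λ/(λ+γ) and each deterministic phase a factor e^{−γτ}, so the cycle formula above should give R_{2n} = (λ/(λ+γ) ⋅ e^{−γτ})^n. A sketch under these assumptions (numpy/scipy assumed; values illustrative):

```python
import numpy as np
from scipy.linalg import expm

gamma, lam, tau, n = 0.1, 0.5, 3.0, 4    # illustrative parameters, n cycles
Q1 = Q2 = np.array([[-gamma, gamma],     # Up -> Fail at rate gamma
                    [0.0,    0.0]])
D1 = D2 = np.eye(2)                      # branching matrices (identity, illustrative)
m0, Theta = np.array([1.0, 0.0]), np.array([1.0, 0.0])

# One cycle: exponential phase (rate lam) followed by deterministic phase (tau).
cycle = lam * np.linalg.inv(lam * np.eye(2) - Q1) @ D1 @ expm(Q2 * tau) @ D2
R_2n = m0 @ np.linalg.matrix_power(cycle, n) @ Theta
```

The mission reliability thus decays geometrically with the number of cycles, as expected for a periodic mission with statistically identical cycles.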
General subordinate processes
In case the SN submodel is a non-Markovian Petri net, the subordinate process can be any stochastic process. Still, the general results stated by Equations (6.4), (6.5), and (6.6) remain valid.
The limit of practical applicability of the theoretical results derived in this chapter for the MRSPN models of PMSs lies in the possibility of evaluating the time-dependent matrix Π_i(t) of marking occupation probabilities of the subordinate process during the enabling period of transition t_i^Gen, i = 1,2,…,n. This information is absolutely necessary to complete the analytical derivation of the expressions for the time-dependent marking occupation probabilities and the Reliability of the models.
Besides the particular case of homogeneous Markov chains we dealt with above, there are
other types of subordinate process that are amenable to an analytical solution in terms of
their time-dependent marking occupation probabilities. For instance, the subordinate process
can be a non homogeneous Markov chain, for which exact transient analysis methods
exist [71]. Another interesting case is when the subordinate process is a semi-Markov
process, as happens when some general transitions are included in the SN submodel [33]
and they restart their firing each time a marking change occurs. Last, we consider the case
when the subordinate process is a MRGP itself. In this case the SN is allowed to contain
general transitions, which must however fulfil Assumption 1.2, that is, a single general
transition can be enabled in each marking of the nested MRGP. The transient marking
occupation probabilities of the subordinate process can then be computed according to
Equation (1.5).
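For the homogeneous Markov case, Π_i(t) = e^{Q_i t} can be computed, for instance, with the standard uniformization technique. The sketch below is a generic illustration with a hypothetical 2-state generator; it is not tied to any specific model in this chapter.

```python
import numpy as np

def transient_probs(Q, t, tol=1e-12):
    """Pi(t) = e^(Q t) for a CTMC generator Q, by uniformization:
    e^(Q t) = sum_k Poisson(k; q*t) * P^k, with P = I + Q/q, q >= max exit rate."""
    n = Q.shape[0]
    q = max(float(-Q.diagonal().min()), 1e-12) * 1.05  # uniformization rate
    P = np.eye(n) + Q / q                              # uniformized DTMC
    weight = np.exp(-q * t)                            # Poisson mass at k = 0
    Pk = np.eye(n)                                     # running power P^k
    result = weight * Pk
    acc, k = weight, 0
    while acc < 1.0 - tol:                             # stop once Poisson mass ~ 1
        k += 1
        weight *= q * t / k
        Pk = Pk @ P
        result += weight * Pk
        acc += weight
    return result

# Hypothetical 2-state repairable component: failure rate 0.2, repair rate 0.05.
Q = np.array([[-0.2, 0.2], [0.05, -0.05]])
Pi = transient_probs(Q, 10.0)
print(Pi)
```

Uniformization only involves vector-matrix products with a stochastic matrix, which keeps it numerically stable even for moderately stiff generators.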
In all the scenarios above, the condition stated by Assumption 6.1 requires that the memory
of the subordinate process be lost each time a phase completes. This constraint that the
PhN submodel imposes on the SN submodel leads to an approximate modelling whenever
this loss of memory is not part of the semantics of the activities modelled in the SN. The loss
of memory can be reflected in different ways in the behaviour of the non memoryless
transitions included in the SN. In fact, we can model the loss of memory either with a
premature completion of all the timed non exponential transitions of the SN at the phase
ending times, or with their restart at the beginning of the next phase, provided that they
are still enabled. These two possibilities can in some cases be conveniently exploited
to compute bounds on the error that the approximate modelling introduces in the
outcomes of the evaluation, as we shall explain in the following, when dealing with the
modelling of our example of PMS.
6.2.4 Sensitivity analysis of MRSPN models of PMSs
Suppose now that some of the parameter values defining the PMS model can vary,
for instance because they are subject to estimation errors, or because different scenarios
of the system are to be analysed. In this case, an additional analysis strategy for
the PMS is offered by the study of the sensitivity of the model with respect to the
parameter variations. As an example of the various sensitivity studies that can be performed,
we analytically derive the sensitivity functions that represent the effects that parameter
variations may have on the transient marking occupation probability matrix V(t)
of the MRGP underlying the PMS, and on the probability R of successfully completing
the mission.
Sensitivity analysis of V (t)
To carry out the sensitivity analysis of the transient marking occupation probability matrix,
we consider the matrix V(t) as a function of some independent parameter θ,
that is V(t) = V(θ;t). We denote with S(θ;t) the sensitivity function matrix, that is, the
derivative of V(θ;t) with respect to θ.
Matrix S(θ;t) can be partitioned into blocks S_{i,j}(θ;t) according to the same partition
as that defined for V(θ;t). Since the expression for the transient marking occupation
probabilities is given in terms of the LSTs of the kernel matrices in Equation (6.5), we obtain
the sensitivity function matrix in the transform domain. Let us denote with
V°_{i,j}(θ;s) and S°_{i,j}(θ;s) the Laplace-Stieltjes transforms of the block matrices V_{i,j}(θ;t) and
S_{i,j}(θ;t), respectively. The sensitivity function transform is obtained as follows:
S°_{i,j}(θ;s) = ∫_{0}^{+∞} e^{−st} d( ∂V_{i,j}(θ;t)/∂θ ) = ∂/∂θ ∫_{0}^{+∞} e^{−st} dV_{i,j}(θ;t) = ∂V°_{i,j}(θ;s)/∂θ
where the second equality comes from the fact that θ is independent of both s and t.
Carrying out the differentiation of V°_{i,j}(θ;s) gives the following expression for the
sensitivity function:
S°_{i,j}(θ;s) = ∂/∂θ [ ∏_{h=i}^{j−1} K°_{h,h+1}(s) · E°_{j,j}(s) ] =
= Σ_{h=i}^{j−1} [ ∏_{l=i}^{h−1} K°_{l,l+1}(s) · (∂K°_{h,h+1}(s)/∂θ) · ∏_{l=h+1}^{j−1} K°_{l,l+1}(s) ] · E°_{j,j}(s) + ∏_{h=i}^{j−1} K°_{h,h+1}(s) · (∂E°_{j,j}(s)/∂θ)    (6.12)
Thus, the evaluation of the sensitivity function S(θ;t) is reduced to the computation of
the derivatives with respect to θ of the LSTs of the blocks of the kernel matrices.
When more specific assumptions on the structure of the MRSPN model of the PMS are
made, the general Equation (6.12) can be further simplified. For instance, when the
subordinate processes are simple homogeneous Markov chains, the derivatives of the
LSTs of the kernel matrix blocks can be obtained as follows:
∂K°_{i,i+1}(s)/∂θ = ∂/∂θ L[ e^{Q_i t} Δ_i f_i(t) ] = L[ ∂/∂θ ( e^{Q_i t} Δ_i f_i(t) ) ] =
= L[ (∂e^{Q_i t}/∂θ) Δ_i f_i(t) ] + L[ e^{Q_i t} (∂Δ_i/∂θ) f_i(t) ] + L[ e^{Q_i t} Δ_i (∂f_i(t)/∂θ) ]

∂E°_{i,i}(s)/∂θ = ∂/∂θ s·L[ e^{Q_i t} (1 − F_i(t)) ] = s·L[ ∂/∂θ ( e^{Q_i t} (1 − F_i(t)) ) ] =
= s·L[ (∂e^{Q_i t}/∂θ) (1 − F_i(t)) ] − s·L[ e^{Q_i t} (∂F_i(t)/∂θ) ]
Sensitivity analysis of R
Consider now the probability R of successfully completing the mission as being a func-
tion of some independent parameter θ , that is R = R(θ ). Let r(θ ) denote the derivative
of R(θ ) with respect to parameter θ . From Equation (6.6), we can evaluate function
r(θ ) as follows:
r(θ) = ∂R(θ)/∂θ = ∂/∂θ ( m0 · ∏_{i=1}^{n} K*_{i,i+1} · Θ ) = m0 · ∂/∂θ ( ∏_{i=1}^{n} K*_{i,i+1} ) · Θ
The derivative of the matrix product in the preceding expression can be obtained as follows:
∂/∂θ ∏_{i=1}^{n} K*_{i,i+1} = Σ_{j=1}^{n} [ ∏_{h=1}^{j−1} K*_{h,h+1} · (∂K*_{j,j+1}/∂θ) · ∏_{h=j+1}^{n} K*_{h,h+1} ]    (6.13)
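Equation (6.13) is the product rule for a matrix product, and it can be sanity-checked against a central finite difference. The parameterised 2×2 matrices K_i(θ) below are arbitrary smooth illustrative functions, not the kernel limits of any specific model.

```python
import numpy as np

# Hypothetical smooth parameter-dependent matrices and their exact derivatives.
def K(i, theta):
    return np.array([[1.0, theta * (i + 1)], [0.0, np.exp(-theta * i)]])

def dK(i, theta):
    return np.array([[0.0, float(i + 1)], [0.0, -i * np.exp(-theta * i)]])

def product_derivative(theta, n):
    """sum_j (prod_{h<j} K_h) dK_j (prod_{h>j} K_h), as in Equation (6.13)."""
    total = np.zeros((2, 2))
    for j in range(n):
        left = np.eye(2)
        for h in range(j):
            left = left @ K(h, theta)
        right = np.eye(2)
        for h in range(j + 1, n):
            right = right @ K(h, theta)
        total += left @ dK(j, theta) @ right
    return total

def full_product(theta, n):
    M = np.eye(2)
    for i in range(n):
        M = M @ K(i, theta)
    return M

theta, n, eps = 0.3, 4, 1e-6
fd = (full_product(theta + eps, n) - full_product(theta - eps, n)) / (2 * eps)
print(np.max(np.abs(product_derivative(theta, n) - fd)))
```

Such a finite-difference cross-check is a cheap way to validate a sensitivity implementation before trusting its output.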
If we consider a particular distribution of the phase durations and a specific type of
subordinate process, the general expression given above for function r(θ) can be further
specialised. For instance, consider a MRSPN model for a PMS with exponentially
distributed phase durations, and a subordinate process that in each phase is a homogeneous
Markov chain. In this case, by combining Equations (6.10) and (6.13), we obtain the
following expression for r(θ):
r(θ) = m0 · Σ_{j=1}^{n} [ ∏_{h=1}^{j−1} (I − Q_h/λ_h)^{−1} Δ_h ] · ∂/∂θ( (I − Q_j/λ_j)^{−1} Δ_j ) · [ ∏_{h=j+1}^{n} (I − Q_h/λ_h)^{−1} Δ_h ] · Θ
Note that the expression involving the derivative of matrix (I − Q_j/λ_j)^{−1} can be
worked out as follows:
∂/∂θ ( (I − Q_j/λ_j)^{−1} ) Δ_j = (I − Q_j/λ_j)^{−1} · (∂(Q_j/λ_j)/∂θ) · (I − Q_j/λ_j)^{−1} Δ_j
6.3 Evaluation procedure
The procedure to compute the time-dependent marking occupation probabilities according
to Equations (6.4) and (6.5), and the Reliability according to Equation (6.6), depends
on the specific characteristics of the MRSPN model of the PMS. In particular, the properties
of the subordinate processes and of the phase duration distributions can affect the
evaluation procedure. However, the evaluation of the overall PMS is always
reduced to the sequential analysis of its MRSPN model inside the various phases.
Therefore, an efficient solution scheme similar to the one presented for the DSPN models
of PMSs applies.
For instance, suppose we are interested in computing the transient marking occupation
probabilities of the MRSPN model at the end of the mission, that is, the block
V_{1,n+1}(t) of the transient probability matrix. The solution algorithm takes as input the
MRSPN model and its initial probability vector, and performs the following steps:
1) builds the i-th subordinate process, from the analysis of the reachability graph
of the model during the enabling period of transition t_i^Gen, i = 1,2,…,n;
2) builds the branching probability matrix Δ_i from phase i to phase i + 1,
i = 1,2,…,n;
3) obtains the transient state occupation probability matrix Π_i(t) of the i-th
subordinate process, i = 1,2,…,n;
4) if phases have deterministic durations, completes the analysis in the time domain
according to Equation (6.4); otherwise, computes the LSTs of the kernel matrix
blocks and applies Equation (6.5).
The computational complexity of the preceding steps is dominated by that of steps 3)
and 4), which heavily depend on the particular characteristics of the specific
MRSPN model.
We can easily derive the computational complexity required to evaluate the Reliability R
at the mission completion time when each subordinate process is a homogeneous
Markov chain. In this scenario, according to Equation (6.6), we basically need to apply
the steps of the following algorithm:
1) build the Markov chain subordinate to phase i, and obtain its transition rate
matrix Q_i, i = 1,2,…,n;
2) build the branching probability matrix Δ_i from phase i to phase i + 1,
i = 1,2,…,n;
3) obtain the limit K*_{i,i+1} as s → 0+ of the LST K°_{i,i+1}(s) of the global kernel
matrix blocks, i = 1,2,…,n;
4) multiply the initial probability vector, the matrices K*_{i,i+1}, and the vector Θ in
the order given by Equation (6.6), to obtain the Reliability R.
The computational complexity of this algorithm is dominated by the evaluation of the
limit matrices in step 3). Indeed, apart from the special case of deterministic phase durations,
the inversion of a matrix of the form I − Q_i is in general needed (see Appendix B) to
derive each of the matrices K*_{i,i+1}, i = 1,2,…,n. This inversion requires
O(C_i^3) operations (multiplications), where C_i is the cardinality of the state space S_i of
the Markov chain subordinate to phase i. Thus, in this scenario of PMS, the overall
complexity required for the evaluation of the Reliability R is dominated by O(Σ_{i=1}^{n} C_i^3).
6.4 Application example
6.4.1 Modelling
In this section we discuss the MRSPN modelling of our example of PMS. The great
modelling power of MRSPNs makes it possible to take into account all the features of the PMS. In
particular, we can accommodate the random duration of the phases, the flexibility
characteristics of the mission, and the general distribution of the time needed to repair
the failed processors.
The sole approximation that may need to be introduced is due to the loss of memory of the
non exponential repair activities at the phase completion times. Notice that this leads to
an approximate modelling if and only if such a loss of memory does not respect
the semantics of the repair itself. In fact, we can think of some interesting cases in which the
repair activities can be stopped at the phase ending times, for instance when the next
phase to be performed is a maintenance phase, during which all the failed components are
replaced.
In any case, when such an approximation is actually introduced in the model, its
impact on the results can be easily estimated by considering the following bounding
approach. Suppose that, at the phase ending times, we decide to model the loss of memory
by forcing the premature completion of the repairs that are currently being executed.
It is easy to realise that such a model provides an optimistic estimation of the dependability
measures of the original PMS. Conversely, we can obtain a pessimistic estimation
of the dependability measures from another model of the PMS, in which the repairs
that are not completed when a phase ends simply start over as the next phase begins. These
two models provide an upper and a lower bound on the exact dependability measures,
thus allowing the error introduced by the loss of memory to be estimated.
Moreover, if the actual semantics of the repair is such that forcing the loss of memory
at the phase completion times does not allow a satisfactory modelling, some modifications
can be made to the upper and lower bound models to improve their accuracy, and
consequently narrow the gap between the pair of results they provide. For instance, one
could apply a technique similar to phase expansion [30], to split the repair into a
sequence of shorter activities, for which the influence of the loss of memory is
less relevant. In particular, we could model a deterministic repair of duration τ
with two consecutive deterministic transitions, each having constant delay τ/2. In
this way, the restart or the premature completion of a repair activity would affect only
one of the two stages, resulting in a better approximation of the original behaviour of
the PMS with respect to the case when the repair is modelled with a single transition.
The structure of the MRSPN model of the PMS is the same as that of the DSPN model we
built in Chapter 5, which is shown in Figures 5.5 and 5.6, and whose
definition is completed by Tables 5.1 and 5.2. The only differences are in the firing
time distributions of the PhN general transitions and in the distribution of the repair time.
6.4.2 Evaluation
The solution of the MRSPN model of our example of PMS is conducted with the
parameter values defined in Table 4.1. Moreover, we also need to specify the
distribution F_i(t) of the duration of phase i, i = 1,2,3,4, and the distribution μ(t) of the time needed
to repair a faulty processor. For phases 2 and 4 we select a random duration
that follows a truncated exponential distribution with expected values τ_2 and τ_4, and
whose truncation points are γ = 3000 hours and φ = 500 hours, respectively. This
means that phases 2 and 4 necessarily complete no later than their truncation points.
The cumulative distribution function F_2(t) for the duration of phase 2 is as follows:
F_2(t) = (1 − e^{−αt}) / (1 − e^{−αγ})    for 0 ≤ t < γ
F_2(t) = 1                                for t ≥ γ
and F_4(t) is similarly defined. Notice that the phase probability density functions
f_2(t) and f_4(t) are discontinuous. We keep the deterministic durations of
phases 1 and 3. For the time to repair, we select a 2-stage Erlang distribution. All these
distributions are completely defined by their expected values, already given in
Table 4.1.
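For illustration, the truncated exponential CDF F_2(t) can be coded directly. The rate α below is an arbitrary illustrative value; in the model it would be fixed so that the expected value of the distribution equals τ_2.

```python
import numpy as np

# Hypothetical rate alpha; truncation point gamma = 3000 hours, as in the text.
alpha, gamma = 1.0 / 1500.0, 3000.0

def F2(t):
    """Truncated exponential CDF: exponential(alpha) renormalised on [0, gamma),
    with all remaining mass placed before the truncation point."""
    if t >= gamma:
        return 1.0
    return (1.0 - np.exp(-alpha * t)) / (1.0 - np.exp(-alpha * gamma))

# The phase can never outlast its truncation point:
print(F2(0.0), F2(1500.0), F2(gamma))
```

The density drops to zero at t = γ, which is the discontinuity of f_2(t) mentioned above.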
We consider the pair of MRSPN models that are defined by taking into account the two
different policies to force the loss of the memory at the phase completion times. Thus,
we are actually solving two different models, which, as explained above, provide an upper
and lower bound on the exact results.
[Figure: bar chart comparing the lower and upper bounds on the Reliability values R1, R2, R3, R4 (y-axis range 0.5-1)]
Figure 6.1: Reliability at phase completion times
The evaluation results for the Reliability R_i at the completion time of phase i, i = 1,2,3,4, are
shown in Figure 6.1. Notice that the gap between the two bounds tends to grow as
the number of phases increases. Obviously, the lower bound results provide a conservative
estimation of the PMS Reliability, which can be acceptable in many critical applications.
The accuracy of the PMS Reliability evaluation can be improved by applying the phase
expansion technique to the transitions that lose their memory at phase boundaries. We
split the Erlang repairs into two exponential subrepair transitions of rate μ/2. When a
phase completes, only the subrepair that is currently being performed is affected. Again,
we can define two different MRSPN models, one modelling the premature completion
of the subrepair, and the other modelling its restart. Thus we
obtain a new pair of upper and lower bound models, which provide the much more
accurate Reliability results shown in Figure 6.2.
[Figure: bar chart of the tighter lower and upper bounds on R1, R2, R3, R4 obtained with phase expansion (y-axis range 0.5-1)]
Figure 6.2: A more accurate estimation of the Reliability at phase ending times
We evaluate the reward cumulated during the mission of the PMS for different values of
the probability c of successful spare insertion, with all four MRSPN models considered
so far. As shown in Figure 6.3, the second pair of lower and upper bound results
is much tighter than the first one. The exact reward value for the PMS we have
taken as example lies within the interval defined by the second pair of results.
[Figure: Reward (y-axis, 0.5-0.9) versus the spare insertion coverage c (x-axis, 0.85-0.999), for the first and second pairs of lower and upper bound models]
Figure 6.3: Reward cumulated during the PMS mission
6.5 Concluding remarks
The methodology based on MRSPNs we have introduced in this chapter provides a clear
and generally applicable modelling approach for the study of a wide class of PMSs. Even
very complex features of PMSs can be accommodated in our modelling framework, which
also provides a general analytical solution technique to evaluate the time-dependent be-
haviour and the Reliability figures of the MRSPN models of PMSs.
The modelling approach we employed is mainly based on the one proposed in the previous
chapter for the DSPN models of PMSs. Therefore, the high expressiveness, the consequent
simplicity of the modelling, and the possibility of considering flexible missions are also
found in the MRSPN models of PMSs. Furthermore, the modelling power of MRSPNs
makes it possible to relax the restrictive assumptions on the phase durations and on the types of
transitions that can be included in the SN submodel.
Therefore, very broad conditions suffice to guarantee the applicability of the
MRSPN methodology. We provided a specific sufficient condition (Assumption 6.1) that
exactly defines the set of Petri net models of PMSs that can be solved within the framework
of our MRSPN approach. If such an assumption is fulfilled, the results we derived by
specialising the MRGP theory can be applied; they provide an analytical solution
method for the study of the MRSPN transient marking occupation probabilities and of
the Reliability figures of the PMS.
Once again, the computation required for the solution of the overall model of
the PMS is reduced to that necessary to separately analyse the PMS inside the different
phases. However, we must observe that, due to the generality of MRSPNs, the exact
computational cost required for the application of our methodology cannot be estimated
in general, since it heavily depends on the characteristics of the subordinate processes.
The MRSPN modelling approach represents a relevant enlargement of the class of PMSs
that can be analytically studied. With respect to the class of PMSs that the methods proposed
in the literature (see Chapter 3) can deal with, we are now able to exactly represent
in our MRSPN models many more interesting features of PMSs, without renouncing
the usefulness of an analytical solution that can be carried out at a limited cost.
Chapter 7
Conclusions
In this thesis we have addressed the analytical modelling and evaluation of dependability
related characteristics for the class of systems that are known as phased mission systems.
PMSs are often encountered in many critical applications, and thus the evaluation of their
dependability attributes is a task of primary relevance. However, due to their dynamic
behaviour, a systematic and computationally efficient approach to the modelling and
evaluation of PMSs had not been proposed in the literature.
We have started our investigation in the first part of the dissertation from a review of the
methodologies that have been published over the last decades. Several different ap-
proaches have been considered, but the various peculiar aspects of PMSs represent a
source of formidable problems for the dependability modelling and analysis techniques
that have been applied for their solution. From our preliminary inspection, we have con-
cluded that a generally applicable methodology able to effectively deal with the various
features of PMSs was still missing.
Therefore, we have set as a goal for this dissertation the development of computationally
efficient and generally applicable modelling and evaluation methodologies for the depend-
ability analysis of PMSs. The key characteristics for the methodologies to be developed
have been defined to be understandability, conciseness, modifiability and reusability of
the models, together with a limited computational complexity of the evaluation step.
We have attacked the study of PMSs with the aim of taking into account all their specific
features, including those that had not been considered in the literature because of
the limits of the various modelling methodologies. In particular, we have considered flexible
missions of PMSs, whereas all the methods in the literature had dealt only with fixed
mission profiles, and the possibility of having phases whose duration is drawn from a
generally distributed random variable, whereas only phases of constant duration had been
taken into account in the previous studies.
To honour these ambitious commitments, we have defined three different modelling and
evaluation methodologies in the second part of the dissertation, which step by step in-
clude more and more features of PMSs, and relax the restrictive assumptions that had
been introduced in the literature to permit their analytical tractability.
We have first presented a method based on a hierarchy of Markov models, which, within
the same restricted scenario of constant phase duration of PMSs, proposes several origi-
nal elements. In particular, it introduces a systematic modelling of the dependencies
among consecutive phases, which had posed relevant issues to many of the methodolo-
gies proposed in the literature. Moreover, we perform a first step towards the introduc-
tion of a flexible profile of the mission, allowing the dynamic choice of the next phase to
be performed at the time the preceding phase ends.
Then, we have proposed a second methodology for constant phase duration PMSs, based
on the high level modelling paradigm of deterministic and stochastic Petri nets. Thanks to
the expressiveness of DSPN models, we have defined a general modelling strategy that is
able to represent in a very concise and easy to read form even very complex behaviour of
PMSs. Also, the single DSPN model of a PMS completely removes the need to explicitly
handle phase dependencies, thus avoiding a cumbersome and error-prone modelling
task. Moreover, it makes it possible to consider flexible missions of PMSs, in which the next
phase to be executed can be selected at the time the previous phase ends, and the duration
of the phases can be dynamically adjusted. The proposed solution approach for DSPN
models also supports the analytical sensitivity analysis of a PMS, a task that, to the
best of our knowledge, had not yet been considered in the literature.
Finally, we have proposed a last methodology based on MRSPN models, which considerably
widens the class of PMSs that can be analytically solved. MRSPN models can represent
in quite a natural way all those scenarios of PMSs previously accommodated by
the various studies in the literature, and by the other two methodologies proposed in this
thesis. Moreover, MRSPNs can exactly represent PMSs whose phases have a randomly
distributed duration. The general results have been specialised for particular scenarios of
PMSs, deriving in many interesting cases simple and explicit expressions for the
Reliability figures of the system. We have also analytically derived some sensitivity
functions for the MRSPN models of PMSs.
For all the three defined methodologies, a limited computational cost is required for the
evaluation of the models. Indeed, the complexity of the analysis of the overall mission of
the PMS is in each case reduced to that necessary for the separate solution inside the dif-
ferent phases. Thus, the computational cost of the original methodologies here proposed
is comparable to that of the cheapest approaches described in the survey we have given in
Chapter 3.
Throughout the dissertation we have applied the three methodologies to an ideal
example of PMS, which we defined as a benchmark to exercise the various approaches
on the same application case. We have evaluated the Reliability of the PMS at
the phase completion times, and observed how the various simplifying hypotheses
needed for the applicability of the methods affect the final results. The results shown in
Figure 7.1 for the Reliability R at the mission completion time of the example PMS
illustrate the impact of the assumptions that the various methods require. The differences
among the three values immediately show the effects that a scarcely realistic modelling
may have on the final results.
[Figure: bar chart of the Reliability at mission completion time estimated with the Markov, DSPN, and MRSPN methodologies (y-axis range 0-1)]
Figure 7.1: Reliability estimations with the three methodologies
The results of the research presented in this thesis offer a relevant practical contribution
to the study of the dependability features of PMSs. A future activity that would further
improve the applicability of these results is the implementation of the algorithmic
solution procedures for the various methodologies proposed here, within a dedicated
automated tool for the modelling and evaluation of PMSs.
References
[1] M. Ajmone-Marsan and G. Chiola, “On Petri nets with deterministic and
exponentially distributed firing times,” in Lecture Notes in Computer Science 266:
Springer-Verlag, 1987, pp. 132-145.
[2] M. Ajmone-Marsan, G. Conte, and G. Balbo, “A class of generalized stochastic
Petri nets for the performance evaluation of multiprocessor systems,” ACM
Transactions on Computer Systems, vol. 2, pp. 93-122, 1984.
[3] M. Alam and U. M. Al-Saggaf, “Quantitative Reliability Evaluation of Repairable
Phased-Mission Systems Using Markov Approach,” IEEE Transactions on
Reliability, vol. R-35, pp. 498-503, 1986.
[4] S. Allmaier and S. Dalibor, “PANDA - Petri net analysis and design assistant,”
presented at Performance TOOLS’97, Saint Malo, France, 1997.
[5] J. Arlat, T. Eliasson, K. Kanoun, et al., “Evaluation of Fault-Tolerant Data
Handling Systems for Spacecraft: Measures, Techniques and Example
Applications,” LAAS-CNRS 86.321, November 1986.
[6] J. Arlat, K. Kanoun, and J. C. Laprie, “Dependability Modelling and Evaluation of
Software Fault-Tolerant Systems,” IEEE Transactions on Computers, vol. 39, pp.
504-513, 1990.
[7] B. E. Aupperle, J. F. Meyer, and L. Wei, “Evaluation of fault-tolerant systems
with non-homogeneous workloads,” presented at 19th IEEE Fault Tolerant
Computing Symposium (FTCS-19), 1989.
[8] A. Avizienis, “Building Dependable Systems: How to Keep Up with Complexity,”
presented at IEEE 25-th International Symposium on Fault-Tolerant Computing
Systems (FTCS-25), Pasadena, U.S.A., 1995.
[9] M. D. Beaudry, “Performance-Related Reliability Measures for Computing
Systems,” IEEE Transactions on Computers, vol. 27, pp. 540-547, 1978.
Bibliography 129
[10] J. T. Blake, A. Reibman, and K. S. Trivedi, “Sensitivity analysis of reliability and
performability measures for multiprocessor systems,”, Technical Report DUKE
TR-1987-32, Duke University, 1987.
[11] A. Bobbio and M. Telek, “Computational restrictions for SPN with generally
distributed transition times,” presented at First European Dependable Computing
Conference (EDCC-1), 1994.
[12] A. Bobbio and M. Telek, “Markov regenerative SPN with non-overlapping activity
cycles,” presented at International Computer Performance and Dependability
Symposium (IPDS’95), 1995.
[13] A. Bondavalli, S. Chiaradonna, F. Di Giandomenico, et al., “Dependability of
Iterative Software: a Model for Evaluating the Effects of Input Correlation,”
presented at SAFECOMP, Belgirate, Italy, 1995.
[14] A. Bondavalli, F. Di Giandomenico, and I. Mura, “An Optimal Value-Based
Admission Policy and its Reflective Use in Real-Time Systems,” Real-Time
Systems, to appear.
[15] A. Bondavalli, I. Mura, and M. Nelli, “Analytical Modelling and Evaluation of
Phased-Mission Systems for Space Applications,” presented at IEEE High
Assurance System Engineering Workshop (HASE’97), Bethesda Maryland, USA,
1997.
[16] A. Bondavalli, L. Strigini, and J. A. Stankovic, “Adaptable Fault-Tolerance for
Real-Time Systems,” in Responsive Computer Systems, Towards the Integration of
Fault-Tolerance and Real-Time: Kluwer, 1994.
[17] J. L. Bricker, “A unified method for analyzing mission reliability for fault tolerant
computer systems,” IEEE Transactions on Reliability, vol. R-22, 1973.
[18] G. R. Burdick, J. B. Fussell, D. M. Rasmuson, et al., “Phased Mission Analysis: A
review of New Developments and An Application,” IEEE Transactions on
Reliability, vol. R-26, pp. 43-49, 1977.
[19] A. Burns and N. Audsley, “Scheduling Hard Real-Time Systems: a Review,”
Software Engineering Journal, vol. 6, pp. 116-128, 1991.
[20] G. Chiola, G. Franceschinis, R. Gaeta, et al., “GreatSPN 1.7: graphical editor and
analyzer for timed and stochastic Petri nets,” Performance Evaluation, vol. 24, pp.
47-68, 1997.
[21] H. Choi, V. Kulkarni, and K. S. Trivedi, “Markov regenerative stochastic Petri
nets,” Performance Evaluation, vol. 20, pp. 337-356, 1994.
[22] H. Choi, V. G. Kulkarni, and K. S. Trivedi, “Transient analysis of deterministic and
stochastic Petri nets.,” presented at 14th International Conference on Application
and Theory of Petri Nets, Chicago Illinois, USA, 1993.
[23] H. Choi, V. Mainkar, and K. S. Trivedi, “Sensitivity analysis of deterministic and
stochastic Petri nets,” presented at MASCOTS ‘93, 1993.
[24] G. Ciardo, R. German, and C. Lindemann, “A characterization of the stochastic
process underlying a stochastic Petri net,” IEEE Transactions on Software
Engineering, vol. 20, pp. 506-515, 1994.
[25] G. Ciardo, J. Muppala, and K. S. Trivedi, “SPNP: stochastic Petri net package,”
presented at International Conference on Petri Nets and Performance Models,
Kyoto, Japan, 1989.
[26] E. Cinlar, Introduction to Stochastic Processes. Englewood Cliffs, NJ, USA:
Prentice-Hall, 1975.
[27] A. Conway, “Analytical performance evaluation of communication protocols,”
IEEE Journal on Selected Areas in Communications, vol. 9, pp. 1-14, 1991.
[28] P. J. Courtois and P. Semal, “Bounds for the positive eigenvectors of non negative
matrices and for their approximation by decomposition,” Journal of the ACM, vol.
31, pp. 804-825, 1984.
[29] P. J. Courtois and P. Semal, “Computable bounds for conditional steady-state
probabilities in large Markov chains and in queueing models,” IEEE Journal on
Selected Areas in Communications, vol. 4, pp. 920-926, 1986.
[30] D. Cox, “The analysis of non-Markov stochastic processes by the inclusion of
supplementary variables,” Proc. Cambridge Philosophical Society, vol. 51, pp. 433-
441, 1955.
[31] J. B. Dugan, “Automated Analysis of Phased-Mission Reliability,” IEEE
Transactions on Reliability, vol. 40, pp. 45-52, 1991.
[32] J. B. Dugan and M. R. Lyu, “Dependability modeling for fault-tolerant software
and systems,” in Software Fault Tolerance, vol. 3 of Trends in Software, M. R. Lyu,
Ed. New York: Wiley & Sons, 1995, pp. 109-137.
[33] J. B. Dugan, K. S. Trivedi, R. Geist, et al., “Extended stochastic Petri nets:
application and analysis,” presented at PERFORMANCE ‘84, Paris, France, 1984.
[34] J. D. Esary and H. Ziehms, “Reliability analysis of phased missions,” in Reliability
and fault tree analysis: SIAM Philadelphia, 1975, pp. 213-236.
[35] P. M. Frank, Introduction to System Sensitivity Theory: Academic Press, New York
USA, 1978.
[36] R. German, “Transient analysis of deterministic and stochastic Petri nets by
method of supplementary variables.,” presented at MASCOTS ‘95, Durham, NC,
1995.
[37] R. German, C. Kelling, A. Zimmermann, et al., “TimeNET: a toolkit for evaluating
non-Markovian stochastic Petri nets,” Performance Evaluation, vol. 24, 1995.
[38] R. German and C. Lindemann, “Analysis of stochastic Petri nets by the method of
supplementary variables,” Performance Evaluation, vol. 20, pp. 317-335, 1994.
[39] P. Heidelberger and A. Goyal, “Sensitivity analysis of continuous time Markov
chains using uniformization,” presented at 2-nd International Workshop on Applied
Mathematics and Performance/Reliability models of Computer/Communication
systems, Rome, Italy, 1987.
[40] I.E.C. International Electrotechnical Commission SC 65A, “Functional safety:
safety-related systems,” Draft n. 1508-5, 1995.
[41] K. Kanoun, M. Borrel, T. Morteveille, et al., “Modeling the dependability of
CAUTRA, a subset of the French air traffic control system,” presented at IEEE 26-
th International Symposium on Fault-Tolerant Computing (FTCS26), Sendai,
Japan, 1996.
[42] K. Kim and T. Lawrence, “Adaptive Fault-Tolerance in Complex Real-Time
Distributed Computer Applications,” Computer Communications, vol. 15, 1992.
[43] L. Kleinrock, Queueing Systems: Wiley-Interscience, 1976.
[44] V. G. Kulkarni, Modeling and analysis of stochastic systems: Chapman-Hall, 1995.
[45] J. C. Laprie, “Dependability: a Unifying Concept for Reliable Computing and Fault
Tolerance,” in Dependability of Resilient Computers, T. Anderson, Ed.: BSP
Professional Books, 1989, pp. 1-28.
[46] J. C. Laprie, “Dependability: Its Attributes, Impairments and Means,” in Predictably
Dependable Computing Systems, B. Randell, J. C. Laprie, H. Kopetz, and B.
Littlewood, Eds.: Springer-Verlag, 1995, pp. 3-24.
[47] J. C. Laprie, J. Arlat, C. Beounes, et al., “Definition and Analysis of Hardware-and-
Software Fault-Tolerant Architectures,” IEEE Computer, vol. 23, pp. 39-51, 1990.
[48] J. C. Laprie, J. Arlat, J. P. Blanquart, et al., Guide de la surete de fonctionnement.
Toulouse: Cepadues-Editions, 1996.
[49] S. S. Lavenberg, Computer performance modeling handbook. New York: Academic
Press, 1983.
[50] M. R. Lyu, Software Fault Tolerance: Wiley, 1995.
[51] Y. Ma and K. S. Trivedi, “An algorithm for reliability analysis of phased-mission
systems,” Reliability Engineering and System Safety, special issue on Reliability
Certification, to appear.
[52] M. Malhotra and K. S. Trivedi, “Dependability modeling using Petri nets,” IEEE
Transactions on Reliability, vol. 44, pp. 428-440, 1995.
[53] M. Malhotra and K. S. Trivedi, “Power-hierarchy among dependability model
types,” IEEE Transactions on Reliability, vol. 43, pp. 493-502, 1994.
[54] S. Metge, M. Aguera, J. Arlat, et al., “SURF-2: a tool for dependability evaluation
based on Markov chains and stochastic Petri nets,” presented at 9-th International
Conference on Reliability and Maintainability (ESREL’94), La Baule, France, 1994.
[55] J. F. Meyer, “On Evaluating the Performability of Degradable Computing
Systems,” IEEE Transactions on Computers, vol. C-29, pp. 720-731, 1980.
[56] J. F. Meyer, D. G. Furchgott, and L. T. Wu, “Performability Evaluation of the
SIFT Computer,” presented at IEEE FTCS’79 Fault-Tolerant Computing
Symposium, June 20-22, Madison, Wisconsin, USA, 1979.
[57] M. K. Molloy, “Performance analysis using stochastic Petri nets,” IEEE
Transactions on Computers, vol. 31, pp. 913-917, 1982.
[58] M. K. Molloy, “Discrete time stochastic Petri nets,” IEEE Transactions on
Software Engineering, vol. 11, pp. 417-423, 1985.
[59] J. K. Muppala and K. S. Trivedi, “GSPN models: sensitivity analysis and
applications,” presented at 28-th ACM Southeast Region Conference, 1990.
[60] M. Nelli, A. Bondavalli, and L. Simoncini, “Dependability modelling and analysis
of complex control systems: an application to railway interlocking,” presented at
EDCC2, Taormina, Italy, 1996.
[61] M. F. Neuts, Matrix-Geometric Solutions in Stochastic Models: Johns Hopkins
University Press, 1982.
[62] A. Pedar and V. V. S. Sarma, “Phased-Mission Analysis for Evaluating the
Effectiveness of Aerospace Computing-Systems,” IEEE Transactions on Reliability,
vol. R-30, pp. 429-437, 1981.
[63] H. G. Perros, Y. Dallery, and G. Pujolle, “Analysis of a queueing network model
with class dependent window flow control,” IEEE Infocom, pp. 968-977, 1992.
[64] J. L. Peterson, Petri net theory and the modeling of systems. Englewood Cliffs, NJ:
Prentice-Hall, 1981.
[65] C. A. Petri, “Kommunikation mit Automaten,” PhD thesis, University of Bonn,
Bonn, Germany, 1962.
[66] D. Powell, J. Arlat, L. Beus-Dukic, et al., “GUARDS: a Generic Upgradable
Architecture for Real-time Dependable Systems,” IEEE Transactions on
Parallel and Distributed Systems, to appear.
[67] A. Puliafito, M. Scarpa, and K. S. Trivedi, “Petri nets with k simultaneously
enabled generally distributed timed transitions,” Performance Evaluation, 1997.
[68] A. Reibman and K. S. Trivedi, “Numerical transient analysis of Markov models,”
Computers and Operation Research, vol. 15, pp. 19-36, 1988.
[69] M. Reiser, “Communication system models embedded in the OSI reference model, a
survey,” in Computer Networking and Performance Evaluation, T. Hasegawa, H.
Takagi, and Y. Takahashi, Eds.: Elsevier, 1986.
[70] W. Reisig, Petri nets: an introduction: Springer Verlag, 1985.
[71] A. Rindos, S. Woolet, I. Viniotis, et al., “Exact methods for the transient analysis of
non-homogeneous continuous-time Markov chains,” presented at 2-nd International
Workshop on the Numerical Solution of Markov Chains, Raleigh, NC, USA, 1995.
[72] S. M. Ross, Stochastic Processes. Berkeley, CA, USA: John Wiley & Sons, 1983.
[73] R. A. Sahner and K. S. Trivedi, “Reliability modeling using SHARPE,” IEEE
Transactions on Reliability, vol. 36, pp. 186-193, 1987.
[74] W. H. Sanders and L. M. Malhis, “Dependability evaluation using composed SAN-
based reward models,” Journal of parallel and distributed computing, vol. 15, pp.
238-254, 1992.
[75] W. H. Sanders and J. F. Meyer, “METASAN: a performability evaluation tool
based on stochastic activity networks,” presented at ACM-IEEE Comp. Soc. 1986
Fall Joint Comp. Conf., Dallas TX, 1986.
[76] W. H. Sanders, W. D. Obal II, M. A. Qureshi, et al., “The UltraSAN modeling
environment,” Performance Evaluation, vol. 21, 1995.
[77] M. Schwartz, Telecommunication networks: protocol modelling and analysis:
Addison Wesley, 1987.
[78] D. P. Siewiorek and R. S. Swarz, Reliable Computer Systems: Digital Press, 1992.
[79] M. Smotherman and K. Zemoudeh, “A Non-Homogeneous Markov Model for
Phased-Mission Reliability Analysis,” IEEE Transactions on Reliability, vol. 38,
pp. 585-590, 1989.
[80] A. K. Somani, J. A. Ritcey, and S. H. L. Au, “Computationally-Efficient Phased-
Mission Reliability Analysis for Systems with Variable Configurations,” IEEE
Transactions on Reliability, vol. 41, pp. 504-511, 1992.
[81] A. K. Somani and K. S. Trivedi, “Phased-Mission Systems Using Boolean
Algebraic Methods,” Performance Evaluation Review, pp. 98-107, 1994.
[82] A. T. Tai, A. Avizienis, and J. F. Meyer, “Evaluation of Fault-Tolerant Software: a
Performability Modeling Approach,” in Dependable Computing for Critical
Applications 3, C. E. Landwher, B. Randell, and L. Simoncini, Eds.: Springer-Verlag,
1992, pp. 113-135.
[83] A. T. Tai, S. N. Chau, L. Alkalaj, et al., “On board preventive maintenance: analysis
of effectiveness and optimal duty period,” presented at Third International
Workshop on Object-Oriented Real-Time Dependable Systems (WORDS’97),
Newport Beach, CA, USA, 1997.
[84] L. A. Tomek and K. S. Trivedi, “Analyses using stochastic reward nets,” in
Software fault-tolerance, vol. 3 of Trends in Software, M. R. Lyu, Ed. New York:
Wiley & Sons, 1995, pp. 139-166.
[85] K. S. Trivedi, Probability and Statistics with Reliability, Queuing and Computer
Science Applications: Prentice-Hall, 1982.
[86] H. S. Winokur and L. J. Goldstein, “Analysis of mission oriented systems,” IEEE
Transactions on Reliability, vol. R-18, pp. 144-148, 1969.
Appendix A
Laplace transform properties
Suppose f(t) is a piecewise-continuous function defined for t ≥ 0. If f(t) is of exponential order, that is, there exist two constants M and α such that |f(t)| ≤ M e^{αt} holds for any t ≥ 0, then the Laplace transform L[f](s) of f(t) is defined as follows:

    L[f](s) = ∫_0^{+∞} e^{-st} f(t) dt

for any complex number s such that Re(s) > α. Table A.1 shows the Laplace transforms of some important functions.
Function                        Transform
c (constant)                    c / s
t                               1 / s^2
t^a, a natural number           a! / s^{a+1}
e^{-at}                         1 / (s + a),  Re(s) > -a
t^b e^{-at}                     b! / (s + a)^{b+1},  Re(s) > -a
e^{Mt}, M a matrix              (sI - M)^{-1}
Table A.1: Laplace transforms of some functions
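As an illustrative aside, the closed forms of Table A.1 can be checked against the defining integral. The short Python sketch below (ours, for illustration only; the function name `laplace` and all parameter values are arbitrary) truncates the integral at a large T and applies the trapezoidal rule:

```python
import math

def laplace(f, s, T=60.0, n=60000):
    """Approximate L[f](s) = ∫_0^∞ e^{-st} f(t) dt by the trapezoidal
    rule, truncating at T (valid when the integrand decays rapidly)."""
    h = T / n
    total = 0.5 * (f(0.0) + math.exp(-s * T) * f(T))
    for k in range(1, n):
        t = k * h
        total += math.exp(-s * t) * f(t)
    return total * h

a, b, s = 2.0, 3, 1.5

# L[e^{-at}](s) = 1/(s+a)
assert abs(laplace(lambda t: math.exp(-a * t), s) - 1.0 / (s + a)) < 1e-5

# L[t^b e^{-at}](s) = b!/(s+a)^{b+1}
assert abs(laplace(lambda t: t**b * math.exp(-a * t), s)
           - math.factorial(b) / (s + a) ** (b + 1)) < 1e-5
print("Table A.1 entries verified numerically")
```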
The Laplace transform is very useful in simplifying calculations. Indeed, in the transform domain, expressions that involve integrals, derivatives and limits usually become extremely easy to manipulate. Moreover, there is a one-to-one correspondence between functions and their Laplace transforms, which implies that once a calculation has been performed in the transform domain, the result can be recovered in the time domain by applying the inverse Laplace transformation.
The main properties of the Laplace transform operator are listed in Table A.2.

Property                              Function                 Transform
Direct transform                      f(t)                     ∫_0^{+∞} e^{-st} f(t) dt
Linearity                             a f(t) + b g(t)          a L[f](s) + b L[g](s)
Real differentiation                  f'(t)                    s L[f](s) - f(0^+)
Real integration                      ∫_0^t f(u) du            s^{-1} L[f](s)
Translation                           e^{-at} f(t)             L[f](s + a)
Scale change                          a^{-1} f(t/a)            L[f](as)
Differentiation with respect to
an independent variable               ∂f(t,r)/∂r               ∂L[f(r)](s)/∂r
Final value                           lim_{t→+∞} f(t)          lim_{s→0^+} s L[f](s)
Initial value                         lim_{t→0^+} f(t)         lim_{s→+∞} s L[f](s)
Table A.2: Laplace transform properties
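For instance, the final- and initial-value properties can be checked on the simple function f(t) = 1 - e^{-t}, whose transform follows from linearity and Table A.1. The Python sketch below (illustrative only; the name `sLf` is ours) evaluates s L[f](s) near both limits:

```python
# f(t) = 1 - e^{-t}:  L[f](s) = 1/s - 1/(s+1)  (by linearity and Table A.1)
def sLf(s):
    return s * (1.0 / s - 1.0 / (s + 1.0))

# Final value: lim_{t→+∞} f(t) = 1  matches  lim_{s→0+} s L[f](s)
assert abs(sLf(1e-9) - 1.0) < 1e-6

# Initial value: lim_{t→0+} f(t) = 0  matches  lim_{s→+∞} s L[f](s)
assert abs(sLf(1e9) - 0.0) < 1e-6
```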
Appendix B
LSTs of the kernel matrix blocks
In this appendix we present the mathematical derivation of the LSTs of the kernel matrix
blocks for the MRGP underlying the MRSPN model of a PMS. We consider the case
when the subordinate process is a homogeneous continuous-time Markov chain. In this
case, the kernel matrix blocks are given by the following equations, which we have derived in Chapter 6:

    K°_{i,i+1}(s) = ∫_0^{+∞} e^{-st} dK_{i,i+1}(t) = L[e^{Q_i t} f_i(t)] Δ_i ,    i = 1, 2, …, n

    E°_{i,i}(s) = ∫_0^{+∞} e^{-st} dE_{i,i}(t) = s L[e^{Q_i t} (1 - F_i(t))] ,    i = 1, 2, …, n+1
where F_i(t) and f_i(t) are the cumulative distribution function and the probability density function of the phase i duration, respectively, and e^{Q_i t} is the transient probability matrix of the subordinate Markov chain. In the following, we obtain the LSTs of the kernel matrix blocks for various distributions of the phase duration.
B.1 Exponential distribution
Suppose the phase i duration is a random variable exponentially distributed with parameter λ, that is, F_i(t) = 1 - e^{-λt} and f_i(t) = λ e^{-λt}, t ≥ 0. In this case, the expression for the global kernel block K°_{i,i+1}(s) can be rewritten as follows:

    K°_{i,i+1}(s) = L[e^{Q_i t} λ e^{-λt}] Δ_i = λ ((λ + s)I - Q_i)^{-1} Δ_i

where to obtain the last expression we recall that the Laplace transform of the matrix exponential e^{Mt} is (sI - M)^{-1}, and we apply the translation property of the Laplace transform. Similarly, the local kernel block matrix E°_{i,i}(s) becomes:

    E°_{i,i}(s) = s L[e^{Q_i t} e^{-λt}] = s ((λ + s)I - Q_i)^{-1}
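To see the two closed forms at work, consider the degenerate scalar case in which the subordinate Markov chain has a single state, so that Q_i reduces to a scalar q and e^{Q_i t} to e^{qt}. The Python sketch below (the helper name `lst_numeric` and all parameter values are ours, for illustration only) compares both formulas with a numerical evaluation of the defining integrals:

```python
import math

def lst_numeric(f, q, s, T=60.0, n=60000):
    """Approximate ∫_0^T e^{-st} e^{qt} f(t) dt by the trapezoidal rule:
    a scalar stand-in for L[e^{Q_i t} f(t)] with 1x1 generator Q_i = q < 0."""
    h = T / n
    g = lambda t: math.exp((q - s) * t) * f(t)
    total = 0.5 * (g(0.0) + g(T))
    for k in range(1, n):
        total += g(k * h)
    return total * h

lam, q, s = 1.2, -0.3, 0.8

# Global kernel block: λ((λ+s)I - Q_i)^{-1}  →  λ/(λ+s-q) in the scalar case
assert abs(lst_numeric(lambda t: lam * math.exp(-lam * t), q, s)
           - lam / (lam + s - q)) < 1e-5

# Local kernel block: s((λ+s)I - Q_i)^{-1}, using 1 - F_i(t) = e^{-λt}
assert abs(s * lst_numeric(lambda t: math.exp(-lam * t), q, s)
           - s / (lam + s - q)) < 1e-5
```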
B.2 Deterministic distribution
Consider a phase i having a fixed duration of τ units of time. The probability density function f_i(t) of the random variable representing the phase duration is the Dirac impulse at τ. We obtain the following expression for the global kernel block matrices:

    K°_{i,i+1}(s) = L[e^{Q_i t} f_i(t)] Δ_i = e^{-sτ} e^{Q_i τ} Δ_i = e^{-(sI - Q_i)τ} Δ_i

The cumulative distribution function F_i(t) is given by:

    F_i(t) = 0 for 0 ≤ t < τ ,    F_i(t) = 1 for t ≥ τ

The local kernel block matrices can be obtained as follows:

    E°_{i,i}(s) = s L[e^{Q_i t} (1 - F_i(t))] = s ∫_0^τ e^{-st} e^{Q_i t} dt =
                = s ∫_0^∞ e^{-st} e^{Q_i t} dt - s ∫_τ^∞ e^{-st} e^{Q_i t} dt =
                = s (sI - Q_i)^{-1} - s e^{-(sI - Q_i)τ} ∫_0^∞ e^{-su} e^{Q_i u} du

where the last equality is obtained with the substitution u = t - τ. Finally, we obtain:

    E°_{i,i}(s) = s (sI - Q_i)^{-1} (I - e^{-(sI - Q_i)τ})
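In the scalar case (1×1 generator Q_i = q), these closed forms can again be checked directly; the following sketch (illustrative only, with arbitrary parameter values) verifies the global kernel identity and compares the local kernel formula with a trapezoidal evaluation of s ∫_0^τ e^{-st} e^{qt} dt:

```python
import math

q, s, tau = -0.4, 0.9, 2.5   # scalar (1x1) generator q, transform point s, phase length τ

# Global kernel: K°(s) = e^{-sτ} e^{qτ} = e^{-(s-q)τ}  (Δ_i omitted in the scalar case)
assert abs(math.exp(-s * tau) * math.exp(q * tau) - math.exp(-(s - q) * tau)) < 1e-12

# Local kernel: s ∫_0^τ e^{-st} e^{qt} dt  versus  s(s-q)^{-1}(1 - e^{-(s-q)τ})
n = 50000
h = tau / n
num = 0.5 * (1.0 + math.exp((q - s) * tau))   # trapezoidal rule endpoints
for k in range(1, n):
    num += math.exp((q - s) * k * h)
num *= s * h
closed = s / (s - q) * (1.0 - math.exp(-(s - q) * tau))
assert abs(num - closed) < 1e-6
```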
B.3 Truncated exponential distribution
We consider this particular case as an example to show that the LSTs of the kernel matrices are easily derived also for finite-support distributions. The distribution of interest is obtained by truncating at time τ > 0 the exponential distribution of parameter λ, and then normalising with respect to the probability 1 - e^{-λτ}. The cumulative distribution function and the probability density function of this truncated exponential distribution are, respectively:

    F_i(t) = (1 - e^{-λt}) / (1 - e^{-λτ}) for 0 ≤ t < τ ,    F_i(t) = 1 for t ≥ τ

    f_i(t) = λ e^{-λt} / (1 - e^{-λτ}) for 0 ≤ t < τ ,    f_i(t) = 0 for t ≥ τ

and the generic block of the global kernel is easily obtained as follows:

    K°_{i,i+1}(s) = ∫_0^τ e^{-st} e^{Q_i t} (λ e^{-λt} / (1 - e^{-λτ})) dt Δ_i =
                  = (λ / (1 - e^{-λτ})) [ L[e^{Q_i t} e^{-λt}] - ∫_τ^∞ e^{-st} e^{Q_i t} e^{-λt} dt ] Δ_i =
                  = (λ / (1 - e^{-λτ})) ((s + λ)I - Q_i)^{-1} (I - e^{-((s+λ)I - Q_i)τ}) Δ_i

The blocks of the local kernel are obtained as follows:

    E°_{i,i}(s) = s ∫_0^τ e^{-st} e^{Q_i t} ((e^{-λt} - e^{-λτ}) / (1 - e^{-λτ})) dt =
                = (s / (1 - e^{-λτ})) [ ∫_0^τ e^{-st} e^{Q_i t} e^{-λt} dt - e^{-λτ} ∫_0^τ e^{-st} e^{Q_i t} dt ] =
                = (s / (1 - e^{-λτ})) [ ((s+λ)I - Q_i)^{-1} (I - e^{-((s+λ)I - Q_i)τ}) - e^{-λτ} (sI - Q_i)^{-1} (I - e^{-(sI - Q_i)τ}) ]
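The global kernel expression can be validated in the scalar case (Q_i = q, 1×1) against a direct numerical evaluation; the sketch below is ours, with arbitrary parameter values:

```python
import math

lam, q, s, tau = 1.0, -0.5, 0.7, 2.0

def f(t):
    """Density of the exponential(λ) distribution truncated at τ."""
    return lam * math.exp(-lam * t) / (1.0 - math.exp(-lam * tau))

# numeric: ∫_0^τ e^{-st} e^{qt} f(t) dt  (trapezoidal rule)
n = 50000
h = tau / n
num = 0.5 * (f(0.0) + math.exp((q - s) * tau) * f(tau))
for k in range(1, n):
    t = k * h
    num += math.exp((q - s) * t) * f(t)
num *= h

# closed form from B.3, scalar version: λ/(1-e^{-λτ}) · (1 - e^{-aτ})/a  with a = s+λ-q
a = s + lam - q
closed = lam / (1.0 - math.exp(-lam * tau)) / a * (1.0 - math.exp(-a * tau))
assert abs(num - closed) < 1e-6
```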
B.4 Uniform distribution
The probability density function f_i(t) of the uniform distribution over the interval [a, b] is given by:

    f_i(t) = 1 / (b - a) for a ≤ t ≤ b ,    f_i(t) = 0 elsewhere

Therefore, the LST of the global kernel matrix blocks can be written as follows:

    K°_{i,i+1}(s) = (1/(b-a)) ∫_a^b e^{-st} e^{Q_i t} dt Δ_i =
                  = (1/(b-a)) [ ∫_0^b e^{-st} e^{Q_i t} dt - ∫_0^a e^{-st} e^{Q_i t} dt ] Δ_i =
                  = (1/(b-a)) [ (sI - Q_i)^{-1} (I - e^{-(sI - Q_i)b}) - (sI - Q_i)^{-1} (I - e^{-(sI - Q_i)a}) ] Δ_i =
                  = (1/(b-a)) (sI - Q_i)^{-1} (e^{-(sI - Q_i)a} - e^{-(sI - Q_i)b}) Δ_i

The cumulative distribution function of the uniform distribution has the following form:

    F_i(t) = 0 for t < a ,    F_i(t) = (t - a)/(b - a) for a ≤ t ≤ b ,    F_i(t) = 1 for t > b

The LST of the local kernel blocks takes the following form:

    E°_{i,i}(s) = s [ ∫_0^a e^{-st} e^{Q_i t} dt + (b/(b-a)) ∫_a^b e^{-st} e^{Q_i t} dt - (1/(b-a)) ∫_a^b t e^{-st} e^{Q_i t} dt ]

The first and the second terms in the preceding sum are easily derived as follows:

    ∫_0^a e^{-st} e^{Q_i t} dt = (sI - Q_i)^{-1} (I - e^{-(sI - Q_i)a})

    (b/(b-a)) ∫_a^b e^{-st} e^{Q_i t} dt = (b/(b-a)) (sI - Q_i)^{-1} (e^{-(sI - Q_i)a} - e^{-(sI - Q_i)b})

To obtain the third term, we proceed this way:

    (1/(b-a)) ∫_a^b t e^{-st} e^{Q_i t} dt = (1/(b-a)) [ ∫_a^∞ t e^{-st} e^{Q_i t} dt - ∫_b^∞ t e^{-st} e^{Q_i t} dt ]

Now, the first improper integral appearing in the preceding expression is solved as follows:

    ∫_a^∞ t e^{-st} e^{Q_i t} dt = ∫_0^∞ e^{-s(u+a)} (u + a) e^{Q_i (u+a)} du =
        = e^{-sa} e^{Q_i a} [ a ∫_0^∞ e^{-su} e^{Q_i u} du + ∫_0^∞ u e^{-su} e^{Q_i u} du ] =
        = e^{-(sI - Q_i)a} [ a (sI - Q_i)^{-1} + ∫_0^∞ u e^{-su} e^{Q_i u} du ] =
        = e^{-(sI - Q_i)a} [ a (sI - Q_i)^{-1} + (sI - Q_i)^{-2} ]

where the first equality is obtained with the substitution u = t - a. The second improper integral is solved in a similar way. By combining all the intermediate results, with some algebra we obtain:

    E°_{i,i}(s) = s (sI - Q_i)^{-1} + (s/(b-a)) (sI - Q_i)^{-2} (e^{-(sI - Q_i)b} - e^{-(sI - Q_i)a})
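Reducing Q_i to a scalar q once more, the final expression for E°_{i,i}(s) can be validated against a direct trapezoidal evaluation of s L[e^{Q_i t}(1 - F_i(t))]; the sketch and its parameter values are ours, for illustration:

```python
import math

q, s = -0.6, 0.8
a, b = 1.0, 3.0        # uniform phase duration on [a, b]
d = s - q              # scalar version of sI - Q_i

def surv(t):
    """Survival function 1 - F_i(t) of the uniform distribution on [a, b]."""
    if t < a:
        return 1.0
    if t <= b:
        return (b - t) / (b - a)
    return 0.0

# numeric: E°(s) = s ∫_0^b e^{-st} e^{qt} (1 - F_i(t)) dt  (integrand vanishes past b)
n = 60000
h = b / n
num = 0.5 * (surv(0.0) + math.exp(-d * b) * surv(b))
for k in range(1, n):
    t = k * h
    num += math.exp(-d * t) * surv(t)
num *= s * h

# closed form: s(sI-Q_i)^{-1} + s(b-a)^{-1}(sI-Q_i)^{-2}(e^{-(sI-Q_i)b} - e^{-(sI-Q_i)a})
closed = s / d + s / ((b - a) * d * d) * (math.exp(-d * b) - math.exp(-d * a))
assert abs(num - closed) < 1e-5
```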
B.5 Erlang distribution
We consider now, as a last case, that of the Erlang distribution with r stages of parameter λ. The probability density function and the cumulative distribution function of such an Erlang variable are defined as follows:

    f_i(t) = λ^r t^{r-1} e^{-λt} / (r-1)! ,  t ≥ 0        F_i(t) = 1 - e^{-λt} Σ_{h=0}^{r-1} (λt)^h / h! ,  t ≥ 0

The global kernel blocks can be rewritten as follows:

    K°_{i,i+1}(s) = L[e^{Q_i t} λ^r t^{r-1} e^{-λt} / (r-1)!] Δ_i = (λ^r / (r-1)!) L[e^{Q_i t} t^{r-1} e^{-λt}] Δ_i =
                  = (λ^r / (r-1)!) (-1)^{r-1} (∂^{r-1}/∂s^{r-1}) L[e^{Q_i t} e^{-λt}] Δ_i =
                  = (λ^r / (r-1)!) (-1)^{r-1} (∂^{r-1}/∂s^{r-1}) ((λ+s)I - Q_i)^{-1} Δ_i

The derivative of the inverse of the matrix (λ+s)I - Q_i with respect to s can be obtained as follows:

    (∂^{r-1}/∂s^{r-1}) ((λ+s)I - Q_i)^{-1} = (∂^{r-1}/∂s^{r-1}) (1/(λ+s)) (I - Q_i/(λ+s))^{-1} =
        = (∂^{r-1}/∂s^{r-1}) Σ_{h=0}^{∞} Q_i^h / (λ+s)^{h+1} = Σ_{h=0}^{∞} Q_i^h (∂^{r-1}/∂s^{r-1}) (λ+s)^{-(h+1)} =
        = Σ_{h=0}^{∞} Q_i^h (-1)^{r-1} (h+r-1)! / (h! (λ+s)^{h+r})

By substituting this expression into the LST of K°_{i,i+1}(s) we obtain:

    K°_{i,i+1}(s) = (λ^r / (r-1)!) (-1)^{r-1} Σ_{h=0}^{∞} Q_i^h (-1)^{r-1} (h+r-1)! / (h! (λ+s)^{h+r}) Δ_i =
                  = (λ^r / (λ+s)^r) Σ_{h=0}^{∞} C(h+r-1, h) Q_i^h / (λ+s)^h Δ_i =
                  = (λ^r / (λ+s)^r) (I - Q_i/(λ+s))^{-r} Δ_i = λ^r ((λ+s)I - Q_i)^{-r} Δ_i

The local kernel blocks become:

    E°_{i,i}(s) = s L[e^{Q_i t} e^{-λt} Σ_{h=0}^{r-1} (λt)^h / h!] = s Σ_{h=0}^{r-1} (λ^h / h!) L[e^{Q_i t} e^{-λt} t^h]

By following the same steps as for the global kernel we obtain:

    s Σ_{h=0}^{r-1} (λ^h / h!) L[e^{Q_i t} e^{-λt} t^h] = s Σ_{h=0}^{r-1} (λ^h / h!) (-1)^h (∂^h/∂s^h) ((λ+s)I - Q_i)^{-1} =
        = s Σ_{h=0}^{r-1} (λ^h / h!) (-1)^h Σ_{p=0}^{∞} Q_i^p (-1)^h (p+h)! / (p! (λ+s)^{p+h+1}) =
        = s Σ_{h=0}^{r-1} λ^h (λ+s)^{-(h+1)} Σ_{p=0}^{∞} C(p+h, p) Q_i^p / (λ+s)^p =
        = s Σ_{h=0}^{r-1} λ^h (λ+s)^{-(h+1)} (I - Q_i/(λ+s))^{-(h+1)} = s Σ_{h=0}^{r-1} λ^h ((λ+s)I - Q_i)^{-(h+1)}
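Both Erlang-case results can be checked in the scalar setting (Q_i = q, 1×1), where ((λ+s)I - Q_i)^{-1} reduces to 1/(λ+s-q); the sketch below (ours, with arbitrary parameter values) compares the closed forms with trapezoidal evaluations of the defining integrals:

```python
import math

lam, q, s, r = 1.5, -0.4, 0.6, 3   # Erlang with r stages of rate λ; scalar generator q
a = lam + s - q                     # scalar version of (λ+s)I - Q_i

def f(t):
    """Erlang-r density λ^r t^{r-1} e^{-λt} / (r-1)!."""
    return lam**r * t**(r - 1) * math.exp(-lam * t) / math.factorial(r - 1)

def surv(t):
    """Survival function 1 - F_i(t) = e^{-λt} Σ_{h<r} (λt)^h / h!."""
    return math.exp(-lam * t) * sum((lam * t)**h / math.factorial(h) for h in range(r))

def lst(g, T=80.0, n=80000):
    """Trapezoidal approximation of ∫_0^T e^{-st} e^{qt} g(t) dt."""
    h = T / n
    tot = 0.5 * (g(0.0) + math.exp((q - s) * T) * g(T))
    for k in range(1, n):
        t = k * h
        tot += math.exp((q - s) * t) * g(t)
    return tot * h

# Global kernel: K°(s) = λ^r ((λ+s)I - Q_i)^{-r}  →  λ^r / a^r
assert abs(lst(f) - lam**r / a**r) < 1e-5

# Local kernel: E°(s) = s Σ_{h=0}^{r-1} λ^h ((λ+s)I - Q_i)^{-(h+1)}
closed = s * sum(lam**h / a**(h + 1) for h in range(r))
assert abs(s * lst(surv) - closed) < 1e-5
```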
We observe that the same procedure we have followed here to compute the LSTs of the kernel matrix blocks for the Erlang distribution can be similarly applied to other, more general distributions of the phase duration, such as the Coxian distributions [30] and the phase-type distributions [61].