UNIVERSITÀ DEGLI STUDI DI PISA
DIPARTIMENTO DI INGEGNERIA DELLA INFORMAZIONE: ELETTRONICA, INFORMATICA, TELECOMUNICAZIONI

Ph.D. Thesis in Electronic, Computer and Telecommunication Engineering
XI Cycle

MODELLING AND EVALUATION OF PHASED MISSION SYSTEMS

The candidate: Ivan Mura
Supervisor: Prof. Ing. Luca Simoncini
Co-supervisor: Dr. Andrea Bondavalli

JANUARY 1999
MODELLING AND EVALUATION OF PHASED MISSION SYSTEMS
Ivan Mura
Table of contents
Introduction ..................................................................................................................... I
PART I: Framework of the research
Chapter 1. Analytical modelling for dependability evaluation ....................................... 1
1.1. The role of analytical modelling in dependability analysis ......................... 1
1.2. Dependability modelling methodologies and tools...................................... 4
1.2.1. Combinatorial modelling techniques ............................................ 4
Fault-trees.................................................................................... 4
Reliability block diagrams .......................................................... 6
1.2.2. Markov chains ............................................................................... 7
1.2.3. Classical Petri nets ........................................................................ 11
Place-transition Petri nets............................................................ 11
Stochastic Petri nets .................................................................... 14
Generalised stochastic Petri nets ................................................. 16
1.2.4. Non Markovian Petri nets ............................................................. 17
Deterministic and stochastic Petri nets ....................................... 17
Markov regenerative stochastic Petri nets .................................. 21
Chapter 2. Phased mission systems................................................................................. 23
2.1. Peculiar characteristics of PMSs .................................................................. 24
2.2. Examples of PMSs ....................................................................................... 27
2.3. Dependability measures for PMSs ............................................................... 29
2.4. An application example ............................................................................... 31
Chapter 3. State-of-the-art and open issues in the analysis of PMSs.............................. 36
3.1. Literature survey .......................................................................................... 37
3.2. Comparison of the methods based on the Markov approach ....................... 43
3.3. Open issues and objectives of the thesis ...................................................... 47
PART II: New methodologies for the analysis of PMS dependability
Chapter 4. Hierarchical Markov models for PMSs ......................................................... 51
4.1. The modelling approach............................................................................... 53
4.1.1. Upper level model ......................................................................... 53
4.1.2. Lower level models ....................................................................... 54
4.2. Evaluation procedure ................................................................................... 59
4.3. Application example .................................................................................... 62
4.3.1. Modelling ...................................................................................... 62
4.3.2. Evaluation ..................................................................................... 68
4.4. Concluding remarks ..................................................................................... 70
Chapter 5. DSPN models for PMSs ................................................................................ 73
5.1. The modelling approach............................................................................... 74
5.2. Analytical solution ....................................................................................... 81
5.2.1. Linear PhN .................................................................................... 82
5.2.2. Tree-like PhN ................................................................................ 85
5.2.3. Deterministic transitions with marking dependent firing rate ...... 87
5.2.4. Sensitivity analysis of DSPN models of PMSs............................. 87
5.3. Evaluation procedure ................................................................................... 91
5.4. Application example .................................................................................... 93
5.4.1. Modelling ...................................................................................... 93
5.4.2. Evaluation ..................................................................................... 96
5.5. Concluding remarks ..................................................................................... 98
Chapter 6. MRSPN models for PMSs............................................................................. 100
6.1. The modelling approach............................................................................... 101
6.2. Analytical solution ....................................................................................... 102
6.2.1. The transient marking occupation probabilities ............................ 103
6.2.2. Analytical evaluation of PMS Reliability ..................................... 107
6.2.3. Specialising the general results ..................................................... 108
Constant phase duration .............................................................. 109
Markov chain subordinate processes .......................................... 110
General subordinate processes .................................................... 113
6.2.4. Sensitivity analysis of MRSPN models of PMSs ......................... 114
Sensitivity analysis of V( t ) ........................................................ 114
Sensitivity analysis of R ............................................................. 116
6.3. Evaluation procedure ................................................................................... 117
6.4. Application example .................................................................................... 118
6.4.1. Modelling ...................................................................................... 118
6.4.2. Evaluation ..................................................................................... 119
6.5. Concluding remarks ..................................................................................... 122
Chapter 7. Conclusions ................................................................................................... 124
References. ...................................................................................................................... 128
Appendix A. Laplace transform properties ..................................................................... 136
Appendix B. LSTs of the kernel matrix blocks .............................................................. 138
B.1. Exponential distribution .............................................................................. 138
B.2. Deterministic distribution ............................................................................ 139
B.3. Truncated exponential distribution.............................................................. 140
B.4. Uniform distribution .................................................................................... 141
B.5. Erlang distribution ....................................................................................... 142
Introduction
Nowadays, an increasing number of critical systems are controlled by digital processing
systems, to effectively manage complex operations and to ensure a quick reaction to
the occurrence of potentially dangerous events. As a consequence
of this pervasive deployment of computer systems, there is a growing need for depend-
able systems. The risk arising from the failures of the control system, in terms of both
human lives and economic losses, ought to be minimised. To this end, specific design
methodologies have been defined to support the early design phases of such systems, and
fault-tolerance techniques and tools have been developed [6, 13, 16, 42, 47, 50, 82, 85] to
provide the means to cope with unforeseen situations which may turn into catastrophic
failures of the controlled systems.
Though the utilisation of the aforementioned techniques can surely increase the level of
reliance that can be put on a system, it is not enough to guarantee that the system will be
adequately dependable during its whole life-cycle. Moreover, regulatory authorities impose high dependability targets for a wide variety of manufacturing,
production, and transportation systems, especially for all those systems whose malfunctioning may endanger human lives [40]. The assessment of dependability attributes, like
Reliability, Safety, and Availability, is thus a necessary step for the verification and vali-
dation of a proper design, to ensure that the dependability targets have been achieved and
that only adequately dependable systems come into operation.
Amongst the approaches commonly adopted to evaluate dependability attributes, analyt-
ical modelling has proven to be very useful and versatile [6, 8, 13, 41, 45-47, 50, 82, 85].
Modelling can be used for system assessment in all phases of the system life cycle. From
the early design phases, models show their usefulness and potential, allowing for the
comparison of different architectural and design solutions and for the selection of the
most suitable one. Moreover, the sensitivity analyses that can be carried out after modelling make it possible to identify dependability bottlenecks, thus highlighting problems in the design, and to identify the critical parameters (out of the many that are usually employed at
this stage), those to which the system is highly sensitive. Various methods and tools for
dependability modelling and analysis have been developed which provide support to the
analyst, during the phases of definition and evaluation of the models. Among these,
Markov based models [9, 15, 43, 49, 68, 85] and Petri nets [1, 2, 21, 22, 33, 36, 52, 53,
64, 65, 70, 84] have been widely accepted in the dependability community because of
their powerful representative capabilities, and the relatively cheap solution techniques,
and have been also integrated in many automated tools for the evaluation of system de-
pendability.
Many of the systems devoted to the control and management of critical activities have to
perform a series of tasks that must be accomplished in sequence. Their operational life
consists of a sequence of non-overlapping periods, called “phases”. A system that ex-
hibits such a phased behaviour is often called a phased mission system, abbreviated as
PMS hereafter. During a specific phase, a PMS is devoted to the execution of a particular
set of tasks, which may be completely different from the activities performed within
other phases. The performance and dependability requirements of a PMS may differ significantly from one phase to another. Moreover, during some phases the system may be
subject to a particularly stressing environment, thus experiencing dramatic increases in
the failure rate of its components. In order to accomplish its mission, a PMS may need to
change its configuration over time, to adopt the most suitable one with respect to the performance and dependability requirements of the phase currently being executed, or simply to be more resilient to a hazardous external environment.
Because of their deployment in critical applications, the dependability modelling and
analysis of PMSs is a task of primary relevance. However, a systematic and generally
applicable methodology for this purpose has not yet been established. The modelling of complex
systems always poses formidable problems, and PMSs do not represent an exception to
the rule. Moreover, their phased behaviour adds a further degree of complexity to the
analysis of these systems. Modelling a PMS can be a complicated task even within a
single phase; when a multiplicity of phases and the dependencies among them are to be
taken into account, additional difficulties are encountered, whichever modelling tools and
approaches are adopted.
This thesis aims at proposing new methodologies for the study of PMS dependability
related characteristics. The dissertation is divided into two main parts. The first part has the
role of defining the framework of the study. Chapter 1 introduces the motivations and the
goals of the system dependability analysis, and shortly presents the most commonly
employed methodologies and tools for dependability modelling and evaluation. The specific features that characterise PMSs with respect to other classes of systems are described in Chapter 2, which also gives some examples of PMSs that can be found in various application domains. The main dependability measures of interest for a PMS are also
defined in Chapter 2. Last, this same chapter introduces an ideal PMS application that is
used as the case of study throughout the entire dissertation, to provide a unified frame-
work for the application and comparison of the various approaches. A survey of the most
important works that have appeared in the literature for the study of PMSs is then pre-
sented in Chapter 3. We point out the advantages and the drawbacks of the proposed ap-
proaches, by using the application example defined in Chapter 2 as a benchmark to test
the applicability and efficiency of the various methodologies. The goals of the thesis are
then stated, with the aim of improving over the current limits of the state-of-the-art in the
dependability modelling of PMSs.
The second part of the thesis contains the original contributions to the dependability analysis of PMSs. It is organised into three chapters, 4, 5, and 6, which describe different
methods for the analysis of PMSs. The method presented in Chapter 4 is based on a hierarchy of Markov models, and introduces several novel aspects that address some of the
weaknesses of the methods in the literature. In Chapter 5 we present a different approach
for the dependability modelling of PMSs, based on the class of Deterministic and
Stochastic Petri Nets (DSPNs). The method takes advantage of the power and expressive-
ness of DSPNs to define very concise and easy-to-read models for PMSs. At the same
time, DSPN models of PMSs can be analytically solved with a very efficient procedure.
Chapter 5 also presents the analytical derivation of the sensitivity functions for the DSPN
model of a PMS. The last methodology we present in Chapter 6 of this thesis has the
specific goal of enlarging the class of PMSs that can be analytically solved. We propose
the class of Markov Regenerative Stochastic Petri Nets (MRSPNs) as a modelling tool
for PMSs, and we derive a general, exact, and efficient analytical solution method.
Finally, a summary and some concluding remarks are given in Chapter 7, where the ob-
tained results are reviewed and discussed with respect to the fulfilment of the purposes of
the investigation. Some properties of the mathematical tools used throughout the thesis
and the derivation of some analytical expressions are presented in two short appendices.
The research presented throughout this thesis has been conducted in collaboration with
my tutor Prof. Luca Simoncini, from the Department of Information Engineering of the
University of Pisa, and my co-tutor Dr. Andrea Bondavalli, from the CNUCE Institute of
the Italian National Research Council, to whom I am deeply thankful for the invaluable
support and for the time we enjoyed working together. I also wish to thank Prof. Kishor
Shridharbhai Trivedi and Dr. Xinju Zang from the Department of Electrical and
Computer Engineering of Duke University, Durham, North Carolina, and
Ing. Manuela Nelli, previously a member of the Pisa Dependable Computing Center
research group, for the precious hints and fruitful discussions that contributed to the
fundamental results of this dissertation. I am also indebted to Dr. Felicita Di
Giandomenico, from the institute IEI of the Italian National Research Council, for her
support and the encouraging appreciation expressed to me during the whole Ph.D. course.
PART I:
Framework of the research
Chapter 1
Analytical modelling for dependability evaluation
1.1 The role of analytical modelling in dependability analysis
The term dependability has been introduced in the literature to define that characteristic
of a system that allows a justifiable reliance to be put in the services the system provides
[45, 46]. A system can be classified as dependable only with respect to the specific re-
quirements stated in the system specification for the service it must provide. A service
compliant with the system specification is called a proper service; one that is not is
called an improper service. Since various types of requirements can be prescribed, dependability is in fact a unifying concept encompassing a number of distinct system characteristics.
The following are some attributes of dependability. Reliability is a measure of the continuous delivery of correct service; Safety is the non-occurrence of catastrophic consequences; Availability is a measure of the delivery of correct service with respect to the
alternation of correct and incorrect service; Security is the non-occurrence of unauthorised
access. Other dependability-related attributes are also often defined [9, 45, 46, 55, 85].
Performability attributes originate from a combination of performance and dependability
metrics, by taking into account performance in degraded system states. Integrity is de-
fined as the avoidance of improper alterations of system service (information provided by
the system). Confidentiality means the non-occurrence of unauthorised disclosures of
system service. Maintainability is the ability to undergo repairs and evolution.
Testability is the ability to test for certain attributes within the system.
A system is hopefully built to satisfy all the requirements prescribed for the service it
provides, in other words, to be perfect. However, for complex real systems, it is practi-
cally impossible to give absolute guarantees that the system will properly perform during
its whole life-cycle. This is because faults may randomly affect system components and
eventually lead to unforeseen situations, in which system requirements can no longer be fulfilled and improper services are delivered.
The deployment of fault-tolerance techniques is a useful means to limit the potential neg-
ative effects of faults [6, 13, 16, 42, 47, 50, 82, 85]. Redundant components increase the
robustness of the system against faults, thus increasing its level of dependability. Faults
can be detected, their effects masked, and even removed from the system. However,
whatever degree of redundancy is employed, it is still possible, however unlikely, for a
system to experience a failure scenario that overwhelms its fault-tolerance capabilities.
Moreover, the introduction of redundant components and of their management policies further increases the complexity of the system, and exposes new targets to
the negative influences of faults. This poses significant problems for critical systems in
charge of managing valuable resources, such as production plants, means of transport, or
human lives. For this class of systems, malfunctioning may lead to catastrophic consequences.
The evaluation of dependability attributes can be used for fault-forecasting [5, 9, 13, 15,
32, 41, 47, 74, 85], that is, to probabilistically estimate the occurrence of faults and their
impact on the ability of the system to provide a proper service. Hence, the dependability
evaluation represents a suitable means to verify the adequacy of a system design with re-
spect to the requirements given in its specification. System assessment can be performed
using several approaches like testing, fault injection and analytical models, often com-
bined together [48].
The analytical modelling approach is generally cheap for manufacturers and has proven to
be useful and versatile in all the phases of the system life cycle. During the design phase,
models make it possible to compare different solutions, to select the most suitable one (among
those obeying other design constraints), and to highlight problems within the design. This
early validation of the concepts and architectural choices avoids wasting time and resources before realising whether the system fulfils its objectives or needs some re-design.
In assessing an already built system, models make it possible to characterise specific aspects, to detect dependability bottlenecks, and to suggest solutions to be adopted in future releases.
The modelling also represents an effective tool to foresee the effects of system mainte-
nance operations, and of possible changes or upgrades of the system configuration.
Models of complex systems usually involve many parameters, whose meaning is not
always intuitive for designers, and whose values must be determined (usually by way of
experimental tests), which may be very difficult. Moreover, obtaining
values for all the parameters can be impossible during the preliminary design phases of
the system. The sensitivity analyses that can be carried out after modelling make it possible to identify the critical parameters out of the many that are employed, that is, those parameters to
which the system is highly sensitive. Indeed, sensitivity analysis makes it possible to evaluate a
range of possible system scenarios by varying the values of model parameters, so as to determine the trends in the consequent variations of dependability figures. On one hand,
this helps in understanding the accuracy level that must be attained while estimating the
parameter values. Since even slight variations of critical parameter values may result in
relevant changes of system dependability attributes, a thorough calibration of such pa-
rameters is necessary to increase the level of confidence that can be put on the depend-
ability evaluation itself. On the other hand, it is possible to point out which parts of the
system are most sensitive to the parameter variations, and to adopt responses such as
the deployment of adequate fault-tolerance techniques to achieve a proper level of de-
pendability.
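As a minimal numeric illustration of such a sensitivity analysis (the single-component model and all parameter values here are assumptions of this sketch, not taken from the text), consider a component with constant failure rate λ, so that R(t) = e^(−λt) and the sensitivity function is ∂R/∂λ = −t·e^(−λt):

```python
import math

def reliability(lam, t):
    """Reliability of a single component with constant failure rate lam."""
    return math.exp(-lam * t)

def sensitivity(lam, t):
    """Sensitivity of R(t) to lam: the derivative dR/dlam = -t * exp(-lam*t)."""
    return -t * math.exp(-lam * t)

lam, t = 1e-4, 1000.0                 # illustrative values: failures/hour, hours
base = reliability(lam, t)
# First-order prediction of the effect of a 10% increase in lam ...
approx_delta = 0.1 * lam * sensitivity(lam, t)
# ... versus the exact variation obtained by re-evaluating the model
exact_delta = reliability(1.1 * lam, t) - base
```

The larger |∂R/∂λ| is at the operating point, the more accurately λ must be estimated for the computed dependability figure to be trustworthy.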
Modelling also presents several problems that require special care. The first
is complexity: although building models of simple mechanisms may be
easy, the overall description of critical complex systems, accounting at the same time for
all the relevant aspects, is far from trivial. To master complexity, a modelling methodology
is needed so that only the relevant aspects are detailed, while still allowing numerical results
to be effectively computable: for instance, if a model specifies too many details, the
amount of related information may explode, giving rise to processing problems. In addition, simplifying hypotheses are very often necessary to keep the model manageable, and the
choice of such hypotheses is critical. Making assumptions, on one hand, yields
simpler models but, on the other, leads to approximations of the system behaviour. The
resulting error should always be estimated, either through sensitivity analyses or by comparing the results returned by the model containing the assumption with those of a model in which it has been relaxed.
1.2 Dependability modelling methodologies and tools
In the following, we briefly describe the main characteristics of the various classes of
modelling methodologies that have been developed over the last decades to provide dependability engineers with support tools for defining and solving models. A distinction can
be made between methodologies that employ combinatorial models, like Fault-trees and
Reliability block diagrams, and those based on state-space-oriented representations, such
as Markov chains and Petri net models. The combinatorial approaches offer extremely
simple and intuitive methods for the construction and solution of models. However,
they are inadequate to deal with systems that exhibit complex dependencies among components, and cannot deal with repairable systems. Therefore, their popularity has been
decreasing in favour of the more powerful state-space-based approaches. Many of
the existing modelling techniques are supported by automated tools like SURF-2 [54],
SHARPE [73], SPNP [25], GreatSPN [20], UltraSAN [76], TimeNET [37], PANDA [4],
for the assisted construction and solution of dependability models.
1.2.1 Combinatorial modelling techniques
In this section we present two of the most commonly used combinatorial modelling ap-
proaches, Fault-trees and Reliability block diagrams.
Fault-trees
Fault-trees are a deductive modelling and analysis technique based on the study of the
events that may impair the dependability of a system [53]. A Fault-tree considers the
combination of events that may lead to an undesirable situation of the system, typically
the delivery of an improper service, if the Reliability of the system is the subject of the
analysis, or a catastrophic failure for a Safety study. Fault-trees are fruitfully employed
during the first design phases of a system, in that they are able to represent at a higher abstraction level the various failure scenarios that can affect the dependability figures of the
system.
A Fault-tree is formed by a series of layers of events, connected through logic gates that
represent the two Boolean operators AND and OR. The outgoing event of a gate is obtained by combining the incoming events according to the logic expressed by the gate itself. The undesirable event that represents the object of the evaluation occupies the root
of the Fault-tree. The construction of a Fault-tree model follows a top-down approach.
Starting from the root of the Fault-tree, each event is decomposed as a logical combination
of a set of simpler events, until a set of elementary events is found that are not further de-
composed. An event can be classified as elementary either because its occurrence proba-
bility is known, or because it is not possible to carry on the decomposition.
Figure 1.1 shows an example of a Fault-tree, where the failure of the system is the result
of the failure of the components A, B, and C. The event D is an intermediate failure event
representing the failure of the subsystem formed by the two components A and B.
[Figure 1.1: An example of Fault-tree. The root event "System failure" is the OR of the failure of component C and the intermediate event D, which is the AND of the failure events "Component A fails" and "Component B fails".]
The analysis of a Fault-tree model for the evaluation of the probability of occurrence of
the root event is based on the computation of the set of cuts. A cut is defined as a set of
elementary events that, according to the logic expressed by the Fault-tree, leads to the oc-
currence of the root event. For instance, for the example given in Figure 1.1 the set {A,B}
is a cut of the Fault-tree. The numerical solution of the Fault-Tree is performed by com-
puting the probability of occurrence for each of the cuts, and by combining those proba-
bilities to estimate the probability of the root event.
Among the possible cuts of a Fault-tree, the minimal cuts provide additional information
on the dependability characteristics of the modelled system. A cut is said to be minimal if
it does not contain any other cut. The two sets {A,B} and {C} are the minimal cuts for
the example. With the analysis of the minimal cuts, it is possible to identify the simplest
combinations of elementary events that bring the system to failure. In particular, it is
possible to discover whether the system has single-points of failure, that is minimal cuts
formed by a single element, which may represent dependability bottlenecks.
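The bottom-up enumeration of cuts and minimal cuts described above can be sketched as follows (an illustrative sketch only; the tuple encoding of gates and elementary events is an assumption of this example, not notation from the thesis):

```python
from itertools import product

def cut_sets(node):
    """Return the cut sets of a fault-tree node as frozensets of
    elementary-event names."""
    kind = node[0]
    if kind == "event":          # elementary event: a singleton cut
        return [frozenset([node[1]])]
    children = [cut_sets(child) for child in node[1:]]
    if kind == "OR":             # any cut of any input causes the output event
        return [c for sets in children for c in sets]
    if kind == "AND":            # one cut from each input, taken together
        return [frozenset().union(*combo) for combo in product(*children)]
    raise ValueError(f"unknown gate {kind!r}")

def minimal_cut_sets(node):
    """A cut is minimal if it does not contain any other cut."""
    cuts = set(cut_sets(node))
    return {c for c in cuts if not any(other < c for other in cuts)}

# The tree of Figure 1.1: root = OR(D, C), with D = AND(A, B)
tree = ("OR", ("AND", ("event", "A"), ("event", "B")), ("event", "C"))
mcs = minimal_cut_sets(tree)                        # {{A, B}, {C}}
spof = {e for c in mcs if len(c) == 1 for e in c}   # single points of failure
```

For the example, the minimal cuts are {A, B} and {C}, and the size-one cut {C} exposes component C as a single point of failure.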
Reliability block diagrams
Reliability block diagrams [78] are especially meant to evaluate the Reliability-related
characteristics of systems composed of multiple components connected in series or in
parallel. The logical relations and the interconnections among components are modelled in
a quite intuitive way.
Each system component is modelled as an atomic entity, graphically represented as a box.
The various components are connected among them in a configuration that can represent
the temporal order with which the system uses the components, or some redundancy
management scheme, or the success criteria of the system. Each system component is as-
signed a probability of working, or the complete time-dependent Reliability function. The
goal is to derive the probability or the time-dependent Reliability of correct operation for
the overall system.
For instance, the Reliability block diagram shown in Figure 1.2 represents a system that
uses six components to accomplish its duties. Some of the components are arranged in a
parallel combination, such as components A, D, and F, and others in series, such as
components A and B.
The solution of a Reliability block diagram model is similar to that of a Fault-tree. It is
in fact easier to compute the Unreliability of the modelled system by considering
the cuts of the model. For a Reliability block diagram, a cut is easily identified as a set of
components whose removal causes the left end of the diagram to be disconnected from
the right end. The overall Unreliability of the system is evaluated from that of the single
cuts. Approximate solution techniques have been defined [34], which provide upper and
lower bounds on the exact results, based on the analysis of only the minimal cuts of the
Reliability block diagram model.
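For purely series-parallel structures, the diagram solution reduces to two closed-form composition rules: blocks in series all have to work (R = ∏ R_i), while blocks in parallel fail only if every one of them fails (R = 1 − ∏(1 − R_i)). A minimal sketch, where the component reliabilities are illustrative values and not taken from this chapter:

```python
from math import prod

def series(*rs):
    """Series composition: the system works only if every block works."""
    return prod(rs)

def parallel(*rs):
    """Parallel (redundant) composition: the system works if at least one
    block works, i.e. it fails only if all blocks fail."""
    return 1.0 - prod(1.0 - r for r in rs)

# A hypothetical series-parallel arrangement: two series pairs, replicated
r = parallel(series(0.90, 0.95), series(0.85, 0.90))   # about 0.966
```

Diagrams that are not series-parallel (such as bridge structures) do not decompose this way, and require the cut-based analysis described above.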
[Figure 1.2: Example of Reliability block diagram, showing six components A–F in a series-parallel arrangement.]
1.2.2 Markov chains
Markov chains represent a suitable tool for the modelling of a variety of systems of different nature. They combine extreme versatility with well-developed and efficient solution algorithms, which have been implemented in many automated tools for the evaluation
of performance and dependability related attributes.
Markov chains are a state based formalism, very close to that of the automata. The set of
possible states of a Markov chain is called the state space, and is denoted by S . If the
state space S is a discrete or countable set, we are properly talking of a Markov chain,
and its cardinality is denoted by C . If the state space is a continuous set, then the term
Markov process is more appropriate. A state change of a Markov chain is called a state
transition.
Rigorously speaking, a Markov chain is a particular stochastic process { X ( t ),t ≥ 0 },
which enjoys the following memoryless property, also known as the Markov property:
Definition 1.1: (Markov property) The probability of any particular future be-
haviour of the process, when its current state is known exactly, is not altered by
additional knowledge concerning its past behaviour. In formulae, for any n > 0 and
any sequence of increasing time instants t_1, t_2, …, t_n, t_{n+1}, the following Equation
holds:

Prob[X(t_{n+1}) = j | X(t_n) = i_n, X(t_{n−1}) = i_{n−1}, …, X(t_1) = i_1] = Prob[X(t_{n+1}) = j | X(t_n) = i_n],   ∀ j, i_n, i_{n−1}, …, i_1 ∈ S
The past history of a Markov chain is thus completely summarised by its current state.
If the probabilistic characterisation of the transitions of the process does not depend on the
current time, then the Markov chain is said to be time-homogeneous; otherwise it is said
to be a non-homogeneous Markov chain.
The parameter t that indexes the Markov chain can be either discrete or continuous. In
the former case we have a discrete-time Markov chain { X_n, n ≥ 0 }, where state transitions
only occur at discrete points in time, often called steps, whereas in the latter case we have
a continuous-time Markov chain { X(t), t ≥ 0 }, and state transitions may occur at any
point in time.
The dynamic behaviour of a discrete-time non-homogeneous Markov chain at time n,
n ≥ 0, can be described by the one-step transition probability matrix P_n = [ p_{i,j}^n ], whose
entries are defined as follows:

p_{i,j}^n = Prob[ X_{n+1} = j | X_n = i ],  ∀ i, j ∈ S
For each pair (i, j) of states in S, matrix P_n gives the probability that at time n+1 the
Markov chain is in state j, given that it was in state i at time n. For a homogeneous dis-
crete-time Markov chain, the dependency on the time parameter n vanishes, and a single
one-step matrix P exists.
Similarly, for continuous-time non-homogeneous Markov chains the evolution at time t is
described by the transition rate matrix Q(t) = [ q_{i,j}(t) ], also called the infinitesimal gener-
ator matrix of the Markov chain, defined as follows:

q_{i,j}(t) = lim_{Δ→0} Prob[ X(t+Δ) = j | X(t) = i ] / Δ,  ∀ i, j ∈ S, i ≠ j

q_{i,i}(t) = lim_{Δ→0} ( Prob[ X(t+Δ) = i | X(t) = i ] − 1 ) / Δ,  ∀ i ∈ S
For each pair (i, j) of states in S, matrix Q(t) gives the rate with which the Markov
chain moves from state i to state j at time t. For a homogeneous continuous-time
Markov chain, the dependency on the parameter t is omitted, and a single transition
rate matrix Q is defined.
Because of the memoryless property, each transition from state i to state j of a homo-
geneous continuous-time Markov chain occurs in an exponentially distributed time, and
the rate q_{i,j} is exactly the inverse of the expected time to the transition, that is, the rate of
the corresponding exponential distribution. Therefore, in a homogeneous Markov chain,
all the transitions among states occur in a negative exponentially distributed time. This
implies that the time needed to perform any activity of a system must be modelled
with an exponential transition in the Markov chain model of that system. This is the
most severe constraint limiting the applicability of Markov chains. Whenever an activ-
ity having a non-exponential duration is modelled with an exponential transition, an
approximation is introduced in the model, and the evaluation results can differ significantly
from the exact ones. Several techniques exist to limit this error, such as phase
expansion [30], which uses a sequence of exponential stages to approximate a non-
exponential random variable.
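As an illustration of phase expansion, the sketch below approximates a deterministic delay d with an Erlang-k distribution, i.e. a series of k exponential stages each of rate k/d: the mean stays d while the standard deviation d/√k vanishes as k grows. The numbers are illustrative and not taken from [30].

```python
import numpy as np

# Phase expansion sketch: a deterministic delay d is approximated by the sum
# of k exponential stages with rate k/d (an Erlang-k distribution).
d = 2.0
rng = np.random.default_rng(0)
for k in (1, 10, 100):
    # Each row sums k independent exponential stage durations.
    samples = rng.exponential(scale=d / k, size=(50_000, k)).sum(axis=1)
    print(k, samples.mean(), samples.std())  # mean stays near d, spread shrinks
```

With k = 1 the approximation is a plain exponential of mean d; increasing k trades state-space size for accuracy, which is exactly the cost of phase expansion in a Markov model.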
Modelling the dependability of a system by using a Markov chain requires first a projec-
tion of the set of possible states of the system to define the state space S. Different lev-
els of fulfilment of the dependability requirements are associated with the states in S. For in-
stance, a state can be included in S to represent the catastrophic failure of the system, or
a degraded operation mode. Then, the matrix (of transition rates or probabilities) ruling the
transitions of the Markov chain model from the initial state must be defined. The solution
of the Markov chain focuses on the computation of the state occupation probabilities for
some of the states of S.
Homogeneous Markov chains can be solved in a very efficient way in terms of the tran-
sient, also called time-dependent, state occupation probabilities. For a discrete-time ho-
mogeneous Markov chain, we shall denote with π_i^n the occupation probability of state i
at step n, for each i ∈ S and n ≥ 0, and with π^n the vector that collects the state occupa-
tion probabilities of all the elements of the state space S at step n, for each n ≥ 0.
Starting from the initial occupation probabilities π^0, the transient state occupation prob-
abilities at step n are computed from the n-th power of the one-step transition probabil-
ity matrix, as follows:

π^n = π^0 · P^n,  n ≥ 0   (1.1)
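Equation (1.1) translates directly into a few lines of code. The two-state chain below is a made-up example: iterating the one-step matrix drives the transient probabilities towards the stationary distribution.

```python
import numpy as np

# Transient solution of a homogeneous discrete-time Markov chain via
# Equation (1.1): pi_n = pi_0 * P^n. The 2-state matrix P is illustrative.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi0 = np.array([1.0, 0.0])          # start with certainty in state 0
pi_n = pi0 @ np.linalg.matrix_power(P, 20)
print(pi_n)  # close to the stationary distribution [5/6, 1/6]
```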
Similarly, for a continuous-time homogeneous Markov chain, we shall denote with π_i(t)
the occupation probability of state i at time t, for each i ∈ S and t ≥ 0. The vector π(t),
which gives the state occupation probabilities of all the elements of the state space S at time
t, is computed from the matrix exponential of the transition rate matrix, as follows:

π(t) = π(0) · e^{Qt},  t ≥ 0   (1.2)
The pointwise and expected values of the most important dependability measures can be
evaluated from the time-dependent occupation probabilities.
For instance, suppose we are interested in evaluating the time-dependent Availability of a
processor. As a consequence of faults, the processor ceases to be operative and reaches a
faulty state in which it is unavailable. The average time to failure of the processor is 1/λ
units of time. Once failed, the processor is put under repair, and when the repair is com-
pleted, it becomes available again. The repair activity lasts on average 1/μ units of
time. The simple two-state homogeneous continuous-time Markov chain shown in Figure
1.3 is a suitable model to represent the evolution of the processor over time. The state
space of the Markov chain represents the two possible states of the processor.
The transition from the operative state to the faulty one occurs with rate
λ, which corresponds to an average time of 1/λ spent in state Ok. Similarly, the transi-
tion modelling the repair has rate μ. The solution of the Markov chain according to
Equation (1.2) returns the time-dependent occupation probabilities π_Ok(t) and
π_Faulty(t) of the two states of the model. Since the processor is only available when it is
in state Ok, the transient state occupation probability π_Ok(t) is exactly the time-depen-
dent Availability we are interested in.
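A minimal numerical sketch of this example (the rates λ and μ below are arbitrary choices): Equation (1.2) is evaluated with a matrix exponential and checked against the closed-form availability of the two-state chain, A(t) = μ/(λ+μ) + λ/(λ+μ)·e^{−(λ+μ)t}.

```python
import numpy as np
from scipy.linalg import expm

# Two-state availability model of Figure 1.3: state 0 = Ok, state 1 = Faulty.
lam, mu, t = 0.1, 1.0, 5.0          # illustrative failure/repair rates
Q = np.array([[-lam, lam],
              [mu, -mu]])
pi_t = np.array([1.0, 0.0]) @ expm(Q * t)   # Equation (1.2), starting in Ok

# Closed form for this particular chain, for comparison.
A_t = mu / (lam + mu) + lam / (lam + mu) * np.exp(-(lam + mu) * t)
print(pi_t[0], A_t)
```

The first component of π(t) is exactly the time-dependent Availability discussed in the text.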
Figure 1.3: Markov chain model of processor availability
To evaluate combined metrics such as Performability measures, a reward structure must
also be defined, which assigns reward values to the states of the Markov chain model. We
denote with w_i the reward associated with state i, for each i ∈ S, and with w the vector
that collects the reward values of all the states in S. Quite informally, a reward w_i can be
assigned to represent the gain achieved by the system for each unit of time spent in state
i. A Performability measure can then be computed as the total reward accumulated as the
model evolves over time. This evaluation requires the derivation of the state occupation
probabilities, too. For a discrete-time homogeneous Markov chain, the reward accumulated
over the first n steps is denoted with W(n), and is evaluated as follows:

W(n) = w · Σ_{h=1}^{n} π^h,   (1.3)

and the reward W(t) accumulated during the time interval [0, t] for a continuous-time ho-
mogeneous Markov chain is computed as follows:

W(t) = w · ∫_0^t π(u) du   (1.4)
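Equation (1.4) can be evaluated numerically. The sketch below uses illustrative rates and a reward of 1 per unit of time spent in the Ok state (0 otherwise), so the integral of w·π(u) over [0, t] estimates the expected total up-time of the two-state availability model.

```python
import numpy as np
from scipy.linalg import expm

lam, mu, t = 0.1, 1.0, 10.0         # illustrative failure/repair rates
Q = np.array([[-lam, lam],
              [mu, -mu]])
w = np.array([1.0, 0.0])            # reward: 1 per unit of time in state Ok
pi0 = np.array([1.0, 0.0])

# Integrate w . pi(u) over [0, t] with the trapezoidal rule (Equation (1.4)).
us = np.linspace(0.0, t, 2001)
vals = np.array([w @ (pi0 @ expm(Q * u)) for u in us])
W_t = float(np.sum((vals[:-1] + vals[1:]) * np.diff(us)) / 2)
print(W_t)  # expected up-time accumulated in [0, t]
```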
1.2.3 Classical Petri nets
The Petri net modelling paradigm was developed with the specific purpose of repre-
senting in a compact and clear way concurrency, synchronisation and cooperation among
processes [65]. Petri nets soon gained wide acceptance because of their ability
to describe the qualitative and quantitative aspects of complex systems, and also because
of their intuitive and appealing graphical representation.
Place-transition Petri nets
The most basic class of Petri net models is that of place-transition Petri nets [64, 70]. A
place-transition Petri net model comprises a set of places { P_1, P_2, …, P_n } and a set of
transitions { t_1, t_2, …, t_m }. The elements of the two sets are linked by arcs. Arcs going
from a place to a transition are called input arcs, and arcs directed from a transition to a
place are called output arcs. The places that are linked to transition t by an input arc are
called the input places of the transition. Similarly, the places linked to transition t by
an output arc are called the output places of the transition. Each arc has an associated posi-
tive integer weight, called its multiplicity. In the graphical representation of the Petri net
model, places are drawn as circles, and transitions are drawn as thin bars, with the input
and output arcs linking them. Places may contain tokens, entities graphically represented
as black dots. The state of the Petri net model is a vector ( m(P_1), m(P_2), …, m(P_n) ),
called the marking of the net, defined by the number of tokens m(P_i) in each place
P_i of the model.
The model elements introduced above define the static structure of the Petri net. Besides
this, there is a dynamic evolution of the net, which proceeds from an initial marking, denoted
m_0, to reach other markings. A transition t is said to be enabled in marking m if
each of its input places contains at least as many tokens as the multiplicity of the corre-
sponding input arc. An enabled transition can fire, and the marking of the net is changed
by removing from each input place of the transition as many tokens as the multiplicity of
the corresponding input arc, and adding to each output place as many tokens as the mul-
tiplicity of the corresponding output arc. A place may be an input place for more than a
single transition. This leads to a competition among the enabled transitions for the tokens con-
tained in the shared input places. When two enabled transitions share an input place and the
number of tokens therein is not sufficient for both of them to fire, the transitions are said
to be in conflict, and a selection rule must be employed to settle the competition in favour
of one of them. Usually, a priority assignment is used to solve the conflicts.
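The enabling and firing rules just described can be captured in a few lines. This is an illustrative dictionary-based sketch (with arc multiplicities), not a full Petri net tool; the tiny net encoded at the bottom shows a structural conflict: two transitions competing for a single token.

```python
# A transition is enabled when every input place holds at least as many
# tokens as the multiplicity of the corresponding input arc.
def enabled(marking, t):
    return all(marking[p] >= mult for p, mult in t["in"].items())

# Firing removes tokens from input places and adds tokens to output places,
# according to the arc multiplicities, in one atomic step.
def fire(marking, t):
    m = dict(marking)
    for p, mult in t["in"].items():
        m[p] -= mult
    for p, mult in t["out"].items():
        m[p] = m.get(p, 0) + mult
    return m

# Two transitions in conflict over the single token in P1 (names illustrative).
t1 = {"in": {"P1": 1}, "out": {"P2": 1}}
t2 = {"in": {"P1": 1}, "out": {"P3": 1}}
m0 = {"P1": 1, "P2": 0, "P3": 0}
print(enabled(m0, t1), enabled(m0, t2))  # both enabled, only one can fire
print(fire(m0, t1))                      # firing t1 disables t2
```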
The definition of a Petri net model to represent the evolution of a system requires some
experience, but is usually a quite intuitive process. Tokens can be used to represent enti-
ties of the system, such as tasks to be executed, messages to be sent, or control elements,
such as a capability, to be passed to other entities. A number of tokens in a place is put in
correspondence with a possible state of the system to be modelled. A transition models an
activity or a synchronisation constraint of the system, and the firing rules define the pre-
conditions to be satisfied for the activity to be executed or the synchronisation to be
completed, respectively. As the enabled transitions fire, the marking of the net changes
and other transitions may become enabled. Thus, a set of different markings, called the reacha-
bility set of the Petri net model, is generated. The reachability set can depend on the ini-
tial marking m_0 of the Petri net model, and is denoted by RS(m_0).
The dynamic evolution of the model can be described by a directed graph, called the reach-
ability graph and denoted by RG(m_0), whose nodes are the markings in RS(m_0), and
whose arcs are labelled with the transition that causes the corresponding marking change.
It is worth observing that the number of markings in the reachability set RS(m_0)
can be huge even for a very simple Petri net. It is thus evident that Petri net
models provide a concise formalism to represent complex systems with a large number
of states.
The model shown in Figure 1.4 is a well-known example illustrating the
simplicity and expressiveness of place-transition Petri nets. The model represents
a semaphore mechanism for mutually exclusive access to a resource shared by two con-
current processes. The evolution of the first process is represented by the thread on the
left side, formed by place P_1, transition t_1, place P_4 and transition t_3. Exactly one token
circulates among the places along this thread, modelling the two possible states of the pro-
cess, that is, the request for the resource to be granted and the exclusive use of the resource.
Figure 1.4: Petri net model representing a mutual exclusion problem
Transition t_1 represents the wait for the resource acquisition, and t_3 the resource
release. Similarly, the thread on the right side, formed by place P_3, transition t_2, place P_5
and transition t_4, models the evolution of the second process. The state of the resource is
modelled through place P_2: when a token is in P_2 the resource is free, and when the
place is empty, the resource has already been assigned. Therefore, a process requiring the
resource is forced to wait for the enabling of the acquisition transition, which only hap-
pens when the resource is free. Notice that the model in Figure 1.4 represents in a very
simple way the non-determinism that arises as a consequence of the truly concurrent ex-
ecution of the two processes.
The class of place-transition Petri nets does not include any notion of time. In fact, they
were introduced to model qualitative rather than quantitative aspects of systems. By
analysing the reachability graph RG(m_0) generated starting from the initial marking m_0,
it is possible to verify a set of structural properties of the Petri net model, and to infer a set
of corresponding properties of the modelled system. For instance, the absence of dead-
lock states of the system can be checked by verifying that the model never reaches an ab-
sorbing marking, that is, a marking in which no transitions are enabled. Also, it is possible
to check that certain activities of the system are always executed in a given order, by veri-
fying the firing order of the corresponding transitions modelling those activities in the
Petri net model. For instance, for the model in Figure 1.4, it is possible to verify that an
acquisition of the resource is always followed by a release of the resource.
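For the mutual exclusion net of Figure 1.4, the reachability set can be generated by a breadth-first exploration of the firing rule. The encoding below reconstructs the arcs from the textual description (all multiplicities 1), so it should be read as a plausible sketch of the figure rather than its exact reproduction; the final check verifies that no reachable marking puts tokens in both P4 and P5, i.e. mutual exclusion holds.

```python
from collections import deque

# Arcs reconstructed from the text: P2 is the semaphore place.
transitions = {
    "t1": ({"P1": 1, "P2": 1}, {"P4": 1}),  # process 1 acquires the resource
    "t3": ({"P4": 1}, {"P1": 1, "P2": 1}),  # process 1 releases it
    "t2": ({"P3": 1, "P2": 1}, {"P5": 1}),  # process 2 acquires
    "t4": ({"P5": 1}, {"P3": 1, "P2": 1}),  # process 2 releases
}
places = ["P1", "P2", "P3", "P4", "P5"]
m0 = (1, 1, 1, 0, 0)  # initial marking: both processes requesting, semaphore free

def successors(m):
    d = dict(zip(places, m))
    for name, (pre, post) in transitions.items():
        if all(d[p] >= k for p, k in pre.items()):
            n = dict(d)
            for p, k in pre.items():
                n[p] -= k
            for p, k in post.items():
                n[p] += k
            yield name, tuple(n[p] for p in places)

# Breadth-first generation of RS(m0).
seen, frontier = {m0}, deque([m0])
while frontier:
    m = frontier.popleft()
    for _, m2 in successors(m):
        if m2 not in seen:
            seen.add(m2)
            frontier.append(m2)
print(len(seen), sorted(seen))
```

The exploration finds three markings, and in none of them do P4 and P5 hold a token simultaneously: the two processes never use the resource at the same time.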
Stochastic Petri nets
A very popular timed extension of place-transition Petri nets is the class of stochastic
Petri nets [57], denoted as SPNs in the following. In an SPN model, each transition t has
an associated random firing delay whose probability density function is a negative expo-
nential with rate λ. These timed transitions are graphically represented by empty boxes.
To simplify the discussion, a transition having a random firing delay with a negative ex-
ponential distribution will be called an exponential transition, without any ambiguity.
The introduction of the firing delay requires the definition of a more precise firing rule.
The enabling of a transition is defined exactly as in the case of place-transition
Petri net models. As soon as a transition t becomes enabled, a random firing time is sampled
from the distribution associated with t, and a timer starts counting down from that value to
zero. Transition t fires if and only if it remains continuously enabled until the timer
reaches zero. When transition t fires, the tokens are removed from its input places and
added to its output places in a single atomic and instantaneous operation (atomic firing
rule). It is interesting to observe that, in the time interval between the enabling and the fir-
ing of t, other transitions sharing some input places with t may become enabled and fire with-
out disabling it, provided that there is a sufficient number of tokens in the common input
places. On the contrary, in the case of a conflict, the transition whose timer reaches zero
first is the one that fires (race model). It is also important to notice that the use of the ex-
ponential distribution relieves the modeller from specifying the behaviour of those
transitions that do not fire after having been enabled. Indeed, thanks to the memoryless
property of the exponential distribution, whether or not the memory of the time they have al-
ready been enabled is kept, the remaining time to the firing is exponentially dis-
tributed with the same rate.
It is easy to realise that the evolution of an SPN model can be represented by a continu-
ous-time homogeneous Markov chain, whose state space elements are in a one-to-one
correspondence with the elements of the reachability set RS(m_0), and whose transition
rates among states are equal to the firing rates of the transitions that produce the corre-
sponding marking changes in the SPN. Thus, the associated Markov chain is indeed iso-
morphic to the reachability graph RG(m_0). For instance, Figure 1.5 shows an SPN model
and the corresponding Markov chain model. The SPN is a simple model of a two-proces-
sor system, where each processor fails in a time that is exponentially distributed with rate
λ, and is repaired in an exponentially distributed time of rate μ. The number of tokens
in place Up represents the number of healthy processors, and that in place Down the
number of faulty processors. The two exponential transitions t_1 and t_2 model the failure
and repair processes. Notice that the firing rates of the two transitions depend on
the number of tokens in their respective input places.
Figure 1.5: SPN model and the corresponding Markov chain
An SPN model can be solved in terms of the marking occupation probabilities by per-
forming the analysis of the associated Markov chain. As an example, suppose we want to
evaluate the probability that no processors are available at time t for the simple system
modelled by the SPN in Figure 1.5, that is, we are interested in the time-dependent prob-
ability that the marking of the net is (0,2), given that the marking of the net at time 0 is
m_0 = (2,0), that is, both processors are healthy. To answer this question, we can gen-
erate the reachability graph RG(m_0) by examining all the possible firing sequences of the
transitions, and then build the transition rate matrix Q of the Markov chain. By apply-
ing Equation (1.2) we can evaluate the transient occupation probability of state
(0,2) of the Markov chain, which is exactly the same as the occupation probability of
marking (0,2) of the Petri net model.
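For the two-processor SPN of Figure 1.5, the associated Markov chain has the three states (2,0), (1,1), (0,2). Ordering them this way, its generator can be written down explicitly and Equation (1.2) applied; the rates below are illustrative.

```python
import numpy as np
from scipy.linalg import expm

# States: (2,0), (1,1), (0,2) = (healthy, faulty) processors.
# Failure rates 2*lam then lam; repair rates mu then 2*mu, as in Figure 1.5.
lam, mu, t = 0.01, 1.0, 100.0       # illustrative rates and horizon
Q = np.array([[-2 * lam, 2 * lam, 0.0],
              [mu, -(mu + lam), lam],
              [0.0, 2 * mu, -2 * mu]])
pi_t = np.array([1.0, 0.0, 0.0]) @ expm(Q * t)   # start in (2,0)
print(pi_t[2])  # probability that both processors are down at time t
```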
A reward structure can be associated with the markings of an SPN model, to evaluate some
Performability metric of interest. The reward values assigned to the various markings
need to be translated into the corresponding rewards over the states of the associated Markov
chain. The evaluation of the cumulated reward is then performed according to Equation
(1.4).
Generalised stochastic Petri nets
In an SPN model, only exponential transitions are allowed. This can be inconvenient when
we want to use a transition to represent a synchronisation among multiple concurrent
flows, because in this case no time is associated with the synchronisation; rather,
the transition is enabled by the occurrence of a particular event. For instance, an expo-
nential delay could be assigned to the two transitions that model the use of the resource in
the model in Figure 1.4, but it is hard to figure out a meaningful delay to associate with the
two transitions modelling the resource acquisition. Moreover, we are often interested in
modelling activities that require a negligible amount of time to be performed.
The class of generalised stochastic Petri nets, denoted by GSPNs [2], relaxes the assump-
tion that all the transitions have an exponentially distributed delay, and allows for expo-
nential transitions as well as for instantaneous transitions, that is, transitions that once
enabled fire in zero time. Conflicts among timed transitions are solved according to the
same race model as in the case of SPNs, whereas conflicts among instantaneous transi-
tions are solved by a priority assignment, and by associating weights (or probabilities) with
instantaneous transitions at the same priority level.
The solution of a GSPN model resorts again to that of an associated Markov chain.
However, for GSPN models, the reachability set elements are not in a one-to-one corre-
spondence with the states of the associated Markov chain. Indeed, because of the instanta-
neous transitions, some of the markings in RG(m_0) have a zero sojourn time, that is, the
GSPN model spends zero time therein. These markings are called the vanishing mark-
ings of the GSPN model, whereas the non-vanishing markings are often called tangible
markings. Nevertheless, it is possible to operate a reduction of the reachability graph
RG(m_0) to eliminate the vanishing markings, and to obtain the reduced reachability
graph RRG(m_0). The reduced reachability graph is isomorphic to a Markov chain, and
the reduction procedure does not affect the equivalence between the tangible mark-
ing occupation probabilities of the GSPN and the state occupation probabilities of the
Markov chain. Therefore, the Markov chain associated with the reduced reachability graph
RRG(m_0) can be solved to study the GSPN model evolution over time.
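The elimination of the vanishing markings can be sketched with the standard matrix formulation: splitting the one-step probabilities of the embedded chain into tangible (T) and vanishing (V) blocks, the tangible-to-tangible matrix after elimination is P' = P_TT + P_TV(I − P_VV)^{-1}P_VT, where the inverse sums the geometric series over all excursions through vanishing markings. The small matrices below are illustrative, not derived from a specific GSPN.

```python
import numpy as np

# Illustrative embedded chain with 2 tangible and 1 vanishing marking.
P_TT = np.array([[0.0, 0.5],
                 [1.0, 0.0]])       # tangible -> tangible probabilities
P_TV = np.array([[0.5],
                 [0.0]])            # tangible -> vanishing
P_VT = np.array([[0.3, 0.7]])       # vanishing -> tangible
P_VV = np.array([[0.0]])            # vanishing -> vanishing (no loops here)

# Eliminate the vanishing marking: redistribute its probability mass.
P_red = P_TT + P_TV @ np.linalg.inv(np.eye(1) - P_VV) @ P_VT
print(P_red)  # a proper stochastic matrix over tangible markings only
```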
1.2.4 Non Markovian Petri nets
The standard definition of GSPNs implies that all the timed activities associated with the
transitions have an exponentially distributed duration, so that the evolution of the model
can be mapped onto a continuous-time homogeneous Markov chain. As we already
pointed out, this constraint implies that an approximation is introduced into the models
whenever an activity with a non-exponential duration must be represented. Recently, sev-
eral classes of Petri nets have been defined which allow for transitions whose firing times
can be drawn from non-exponential distributions. These new classes of Petri nets are col-
lectively called non Markovian Petri nets [11]. Among non Markovian Petri nets, we con-
sider particularly interesting for the purposes of this dissertation the class of determinis-
tic and stochastic Petri nets [1], denoted by DSPNs, and that of Markov regenerative
stochastic Petri nets [21], denoted by MRSPNs.
Deterministic and stochastic Petri nets
DSPNs have been introduced as an extension of GSPNs, to allow the modelling of events
having deterministic occurrence times [1]. The set of transitions of a DSPN can be parti-
tioned into three disjoint sets: the set of instantaneous transitions, represented by thin
bars; the set of transitions having exponentially distributed firing times, represented by
empty rectangles; and the set of transitions with deterministic firing times, represented by
filled rectangles. This enriched set of possible transitions offered by DSPNs allows the
exact modelling of a wider set of system features, such as time-outs and message
propagation delays in synchronous systems. Repair delays represent another example of
activities that are typically modelled more accurately by deterministic transitions than by
exponential ones. Furthermore, several classes of PMSs, such as those employed in
space applications, have a preplanned deterministic duration of the phases they perform,
a feature whose exact modelling requires the introduction of deterministic delays.
The DSPN model shown in Figure 1.6 is a classical example very often encountered in the
literature. It represents the queuing model denoted as M / D / 1 / 2 / 2 [43, 85], where
two customers require a service of deterministic duration, modelled by transition t3. After
having been served, each customer spends a period of time modelled by the exponential
transition t1 before reapplying for the service. The instantaneous transition t2 together
with the control place P3 is used to model the exclusive access to the service.
Figure 1.6: DSPN model of the M / D / 1 / 2 / 2 queuing system
Unfortunately, the analytical solution of a DSPN model is not possible in general. Indeed,
the deterministic distribution does not enjoy the Markov memoryless property, and the
time-dependent evolution of the model requires keeping track of much additional informa-
tion [36], which greatly complicates the analysis. However, analytical tractability is
guaranteed for the subset of DSPN models whose structure satisfies the following as-
sumption:

Assumption 1.1: at most one deterministic transition is enabled in each of the pos-
sible markings of the DSPN.

This hypothesis severely limits the expressiveness of DSPN models. Quite recently,
some attempts have been made to relax Assumption 1.1 [36, 67].
Provided that Assumption 1.1 holds, the time-dependent marking occupation probabilities
of a DSPN model can be analytically evaluated by resorting to the theory of Markov
regenerative processes [26], denoted by MRGPs in the following. The definition of an
MRGP is based on Markov renewal sequences [26, 72].

Definition 1.2: Let S be a discrete state space. A sequence of bivariate random
variables { (Y_i, T_i), i ≥ 0 } is called a Markov renewal sequence if:
1) T_0 = 0, T_{i+1} ≥ T_i, and Y_i ∈ S, ∀ i ≥ 0
2) Prob[ Y_{i+1} = m', T_{i+1} − T_i ≤ t | Y_i = m, T_i, Y_{i−1}, T_{i−1}, …, T_0, Y_0 ] =
   = Prob[ Y_{i+1} = m', T_{i+1} − T_i ≤ t | Y_i = m ] =
   = Prob[ Y_1 = m', T_1 ≤ t | Y_0 = m ],  ∀ m, m' ∈ S, ∀ t ≥ 0
The sequence { T_i, i ≥ 0 } identifies a series of time instants. The first equality of point 2)
states that the discrete-time stochastic process { Y_i, i ≥ 0 } enjoys the Markovian prop-
erty of absence of memory at each time instant T_i, i = 1, 2, …. This is the reason why
the word Markov appears in the name of the bivariate sequence { (Y_i, T_i), i ≥ 0 }.
Moreover, the second equality of point 2) says that process { Y_i, i ≥ 0 } is time-homoge-
neous, and that its probabilistic evolution restarts unchanged after each of the time instants
T_i, that is, each T_i represents a renewal instant for the process.
Definition 1.3: A stochastic process { Z(t), t ≥ 0 } is called an MRGP if there exists
a Markov renewal sequence { (Y_i, T_i), i ≥ 0 } of random variables such that all the
conditional finite distributions of { Z(T_i + t), t ≥ 0 } given
{ Z(u), 0 ≤ u ≤ T_i, Y_i = m } are the same as those of { Z(t), t ≥ 0 } given Y_0 = m.

In simpler words, a stochastic process { Z(t), t ≥ 0 } is an MRGP if it is possible to find
an embedded Markov renewal sequence, that is, a sequence of time instants such that the
properties stated by point 2) of Definition 1.2 hold for the state of process
{ Z(t), t ≥ 0 }. The time instants defined by the renewal sequence are also known as the
regeneration points of the MRGP.
Let { m(t), t ≥ 0 } denote the marking process of the DSPN, that is, the stochastic process
formed by the changes of the tangible markings over time, and let S be the state space of
such process { m(t), t ≥ 0 }. Note that, due to the existence of deterministic transitions,
this stochastic process is not a Markov chain. However, it has been shown by Choi,
Kulkarni and Trivedi in [22] that, if Assumption 1.1 holds, the marking process
{ m(t), t ≥ 0 } of a DSPN is an MRGP. In the following, we briefly recall that proof.
A Markov renewal sequence { (Y_i, T_i), i ≥ 0 } embedded in the marking process is defined
as follows. For i = 0, set T_0 = 0, and suppose m ∈ S is the marking such that
m = m(T_i^+), that is, the marking of the model immediately after T_i. If m is an absorbing
marking, then set T_{i+1} = +∞. If m is not absorbing, and no deterministic transitions are
enabled in m, define T_{i+1} as the first time after T_i at which a marking change occurs. If one
deterministic transition is enabled in marking m, define T_{i+1} to be the time when such
transition fires or is disabled. Now, define the sequence { Y_i, i ≥ 0 } as Y_i = m(T_i^+),
i ≥ 0. It can be proved that the so-defined bivariate sequence { (Y_i, T_i), i ≥ 0 } embedded
in the marking process { m(t), t ≥ 0 } of the DSPN model is a Markov renewal sequence,
according to Definition 1.2. Moreover, it is easy to show that the stochastic behaviour of
the marking process from time T_i onwards, { m(T_i + t), t ≥ 0 }, depends only on Y_i = m,
which is the marking of the DSPN at time T_i; according to Definition 1.3, this implies
that the marking process is an MRGP.
The solution of the DSPN in terms of the time-dependent marking occupation probabili-
ties can exploit the general transient analysis technique developed for MRGPs.
Consider the time-dependent occupation probability of marking m' at time t,
conditioned on the initial marking m, and denote it v_{m,m'}(t), m, m' ∈ S, t > 0. Moreover,
denote with V(t) = [ v_{m,m'}(t) ] the transient probability matrix that collects these condi-
tional probabilities. Once matrix V(t) is known, the vector π(t) of the unconditional
marking occupation probabilities is simply evaluated as π(t) = π(0) V(t), for a given
initial marking occupation probability vector π(0).
Obtaining matrix V(t) requires first the computation of two fundamental matrices, called the
kernels of the MRGP. The matrix of functions K(t) = [ k_{m,m'}(t) ], whose entries are de-
fined as k_{m,m'}(t) = Prob[ Y_1 = m', T_1 ≤ t | Y_0 = m ], m, m' ∈ S, t > 0, is the global
kernel of the MRGP. This matrix accounts for the evolution of the MRGP between two
successive regeneration points. The matrix of functions E(t) = [ e_{m,m'}(t) ], whose entries
are defined as e_{m,m'}(t) = Prob[ m(t) = m', T_1 > t | Y_0 = m ], m, m' ∈ S, t > 0, is the
local kernel of the MRGP, which describes the evolution of the process before the first
regeneration instant. For the sake of conciseness, denote with K∗V(t) the
matrix whose element (m, m') is defined as follows:

(k∗v)_{m,m'}(t) = Σ_{h ∈ S} ∫_0^t dk_{m,h}(u) v_{h,m'}(t − u),  m, m' ∈ S, t ≥ 0

that is, K∗V(t) is the matrix of functions of t whose generic element is obtained through the
row-by-column convolution of the kernel matrix K(t) with matrix V(t), rather than through
the usual multiplication operator. According to MRGP theory [26], matrix V(t) can
be obtained by solving the following generalised Markov renewal equation:

V(t) = E(t) + K∗V(t)   (1.5)
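Equation (1.5) can be solved numerically by discretizing the convolution in time. As a self-check, the sketch below encodes a plain two-state Markov chain (rates λ and μ, with a regeneration at every jump) as an MRGP: its kernels are known in closed form, so the computed v_{0,0}(t) must agree with the CTMC transient known from Equation (1.2). All numbers are illustrative, and the first-order discretization is a sketch, not a production solver.

```python
import numpy as np

lam, mu, h, n = 0.5, 1.0, 0.002, 1000    # step h, horizon t = n*h = 2.0

ts = np.arange(n + 1) * h
K = np.zeros((n + 1, 2, 2))              # global kernel K(t)
E = np.zeros((n + 1, 2, 2))              # local kernel E(t)
K[:, 0, 1] = 1.0 - np.exp(-lam * ts)     # regeneration 0 -> 1 by time t
K[:, 1, 0] = 1.0 - np.exp(-mu * ts)
E[:, 0, 0] = np.exp(-lam * ts)           # still in state 0, no jump yet
E[:, 1, 1] = np.exp(-mu * ts)

V = np.zeros((n + 1, 2, 2))
V[0] = np.eye(2)
dK = np.diff(K, axis=0)                  # kernel increments dK(u)
for k in range(1, n + 1):
    # V(t_k) = E(t_k) + sum_j dK(u_j) V(t_k - u_j): discretized Eq. (1.5)
    V[k] = E[k] + np.einsum('jab,jbc->ac', dK[:k], V[k - 1::-1][:k])

# Known CTMC transient for v_00(t), used as a correctness check.
exact = mu / (lam + mu) + lam / (lam + mu) * np.exp(-(lam + mu) * ts[-1])
print(V[-1][0, 0], exact)
```

The discretization error is O(h), so halving the step roughly halves the gap between the computed and exact values; for a DSPN the same scheme applies once the kernels induced by the deterministic transition are written down.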
Markov regenerative stochastic Petri nets
MRSPNs are a class of timed Petri nets that further extends DSPNs. The definition of
MRSPNs is indirectly given in terms of a property of their marking process, as follows:

Definition 1.4: a Petri net belongs to the class of MRSPNs if its marking pro-
cess is an MRGP.

Definition 1.4 does not give per se an immediate description of the modelling features al-
lowed for MRSPNs. Since Markov processes are special cases of MRGPs, the SPN
and GSPN classes of Petri nets in fact belong to the MRSPN class. Also, the set of
DSPNs for which an analytic solution method exists is included in the MRSPN class.
Moreover, MRGPs comprise the class of semi-Markov processes, too, for which activi-
ties having generally distributed durations are allowed [26, 33, 72]. Thus, MRSPN models
may include instantaneous and exponential transitions, as well as transitions having gen-
erally distributed (neither instantaneous nor exponential) firing times. We shall refer to this
last type of transitions as general transitions.
In fact, because of their generality, it is not easy to check whether a given timed Petri net model is an MRSPN or not. In general, the marking process must be inspected to verify the existence of a Markov regenerative sequence, but this can be a very inefficient check for complex models. A possible way to build Petri net models that are guaranteed to be MRSPNs is to impose well-defined constraints on the net structure, such that a Markov regenerative sequence is surely found in the underlying marking process. A sufficient assumption on the net structure, which guarantees that the underlying marking process is an MRGP, is the following one:
Assumption 1.2: at most one general transition is enabled in each marking of the
Petri Net.
In this case, the Markov property holds (at least) each time a general transition fires or is disabled. The property stated by Assumption 1.2 is usually adopted [21] as an easy-to-check condition for the existence of the MRGP structure in the underlying marking process, but it obviously restricts the modelling capabilities of MRSPNs. In particular, it prevents the concurrent enabling of generally distributed transitions. Less restrictive conditions, still sufficient for the existence of a Markov renewal sequence embedded in the marking process, can be given for classes of MRSPN models having particular structural properties [12].
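The easy-to-check condition of Assumption 1.2 amounts to a scan over the reachable markings. A minimal sketch of such a check follows; the marking encoding and the `enabled` predicate are hypothetical placeholders, not the API of any particular tool.

```python
def satisfies_assumption_1_2(reachable_markings, enabled, general_transitions):
    """Return True if at most one general transition is enabled in each
    reachable marking (sufficient for the marking process to be an MRGP).

    reachable_markings  -- iterable of markings (any hashable encoding)
    enabled(m)          -- returns the set of transitions enabled in marking m
    general_transitions -- set of transitions with general firing times
    """
    for m in reachable_markings:
        # more than one concurrently enabled general transition violates it
        if len(enabled(m) & general_transitions) > 1:
            return False
    return True
```

In practice the reachable markings would come from a reachability analysis of the net; here they are simply supplied by the caller.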
The transient solution of an MRSPN model follows the same procedure as that described for the DSPN class of Petri nets. In fact, both resort to the MRGP transient solution techniques. However, the analytic expressions of the kernel matrix entries depend on the specific firing time distributions of the general transitions.
Chapter 2
Phased mission systems
With the increasing complexity and automation encountered in systems of the nuclear, aerospace, transportation, electronic, and many other industrial fields, the deployment of processing systems in charge of performing a multiplicity of different control and computational activities is becoming common practice. Very often, the system and its external environment can change during operation, in such a way that the behaviour of the system during one time interval may be completely different from that within other periods.
This time-dependent characteristic of a wide class of systems has led to the concept of
phased mission systems. In the following, the term system is intended to describe the physical system, composed of computing resources, both hardware and software, to be distinguished from the external environment the processing system is embedded in. A
phase is one of the time periods between two consecutive changes of the system or of its
environment, and mission refers to the overall duty the system is intended to execute.
Following this terminology, the mission of a PMS can be partitioned over time into a sequence of disjoint consecutive phases. In the following, we describe in detail the particular features of PMSs, especially the changes responsible for the phased behaviour. Then, we give a set of examples of PMS applications, to clarify the application domains for which the study of PMSs is of interest. The principal dependability measures of interest for a PMS are then defined, and finally an application example is introduced, which exhibits most of the peculiar features of PMSs and represents an interesting case study to compare the various modelling and evaluation methodologies that will be considered throughout the dissertation.
2.1 Peculiar characteristics of PMSs
We consider systems that consist of multiple components (hardware, software, mechanical, etc.). Each component may be subject to faults, as a consequence of damage affecting its internal subcomponents, or due to the negative influence of a stressing external environment. Faults may result in complete failures or degraded performance modes of the affected components. A failure criterion is a function of the state of the system, which defines the accomplishment level of the system on the basis of the failed components. A system configuration is a subset of the available components, arranged in an operational configuration to execute a given workload for the accomplishment of a duty of the system.
A system configuration, a specific workload, a state of the external environment, and a failure criterion are the defining elements of a single phase system, the typical class of systems usually taken into account in performance and dependability modelling studies. In the context of a single phase system, all the above elements are assumed to be constant during system operation, or at most some parameters are subject to slight variations. A PMS is a generalisation of a single phase system, which completely relaxes this invariance assumption. As a consequence, PMSs must be able to react to the changing system scenario in order to best utilise their resources for the accomplishment of the mission purposes.
Consider first the changes that occur in a PMS as a result of a variable workload. During its mission, the system may be required to execute various different sets of application tasks, thus defining an equivalent number of different phases. A particular workload has well-defined performance requirements and thus requires a specific set of resources for its execution. Thus, changes in the incoming workload usually result in changes of the operational configuration of a PMS. Furthermore, for a particularly demanding phase during which many resources are utilised, the failure criteria of the PMS are more stringent than for a less computationally intensive one.
The activities performed within the various phases can be associated with distinct criticality levels, the dependable execution of certain phases being much more relevant than that of other phases, during which the system performs non-vital functionalities.
Moreover, even if the workload does not vary from one phase to another, the relative criticality levels of the various tasks may change, and a given task that was of secondary relevance within an earlier phase might become primarily important during a later one. For instance, consider the execution of the flight altitude control tasks of an avionics PMS, for which the delivery of erroneous results during the landing phase is much more dangerous than during the cruise phase. To guarantee an adequately dependable execution of critical tasks, fault-tolerance techniques are employed, which resort to the deployment of redundancy [8, 16, 47, 78]. As is common practice for critical systems, PMSs are thus equipped with redundant resources, and the operational configuration may be rearranged to properly support the different dependability requirements of the application tasks. Obviously, the failure criteria of a PMS during a critical phase are more stringent than during non-critical ones.
The external influence of the environment is a relevant point for the analysis of PMSs, which are very often required to operate in particularly stressing conditions. The negative impact that intense mechanical or electromagnetic solicitations have on computing systems leads to dramatic increases in the failure rates of components, and cannot be neglected without compromising the accuracy of the evaluation outcomes. Thus, a further partition of the mission into different phases becomes necessary to properly characterise this time-dependence of the failure processes. Moreover, due to the increased failure rates, the overall robustness against faults might be reduced. The fault-tolerance provisions that are sufficient during periods of normal solicitation from the external environment may not be adequate to ensure a proper dependability during the occurrence of such hazardous situations. Therefore, a reconfiguration of the PMS to increase the amount of resources for the execution of critical tasks may be useful to face the accelerated wear of the system components.
A last point that affects the evolution of a PMS over time is the varying availability of system resources. Due to the occurrence of faults, the number of correctly performing components is not constant. When a component is diagnosed as faulty, it is removed from the pool of available resources, and the workload it had been assigned must be redistributed. Such an event has the potential to trigger a reconfiguration of the system, and consequently the beginning of a new phase of the PMS. Some of the failed components can be recovered and reintegrated among the available resources. This can be the result of either an explicit intervention, like the replacement or the repair of the component, or an implicit repair, as in the case of transient faults, whose influence lasts for a very short period of time. As in the case of a removal, the reinsertion of a component may cause a reconfiguration of the PMS.
A typical PMS starts its operation by executing the first phase; when it is completed, the system switches to the second, and so on. The mission is completed when the last phase is completed. If a failure of the system occurs during a phase, that is, any of the failure criteria is met, the overall mission fails. However, it is possible for a phase to be unsuccessfully executed, in the sense that the goals of the phase are not achieved or are only partially achieved, without compromising the integrity of the mission as a whole.
Phase changes are usually assumed to be instantaneous. Moreover, the activities performed at phase change times, such as the reconfigurations of the system, usually take a negligible amount of time. Should the system spend a non-negligible amount of time during those changes, a set of fictitious phases could be introduced to represent the phase changes themselves as tasks of the mission.
The selection of the next phase to be performed can be completely preplanned before the PMS starts its operation. In this static scenario, a predictable sequence of phases is executed. Each phase may have a fixed or a random duration, depending on the events that trigger the beginning of the next phase. Fixed durations are typical of phase changes that result from predefined variations of the workload, or foreseen fluctuations of the external environment influence. Conversely, unexpected events such as a fault leading to the removal of a system component, or the arising of an emergency operating situation, result in phases of random duration, whose initial and ending time instants may be unknown.
More flexible missions are also of interest in the analysis of PMSs. For instance, consider a mission with multiple objectives, where some less relevant phases could be sacrificed in favour of a more important one. An already initiated phase could be aborted or shortened if the PMS reaches a dangerous condition, such as insufficient resource availability, or excessive stress from the external environment. The profile of the mission would in all these cases be represented by a tree of possible choices, rather than a chain as in the case of a predefined sequence of phases.
2.2 Examples of PMSs
A number of examples of PMSs can be found in various application domains. Before starting with a list of some of them, we would like to point out that a few studies have been published in the literature showing the applicability of the PMS theory to the military field. We intentionally ignore this branch of the research on PMSs, for it does not fit with our idea of scientific research. We strongly believe that the interest in the results of this dissertation will not be affected at all by this choice.
A typical class of PMSs frequently found in the literature is represented by the on-board systems for the aided guidance of aircraft. Large aircraft manufacturers, like the American Boeing, are investing relevant resources in the dependability assessment of their products [80]. The mission of a flight control system is divided into phases such as take-off, ascent, cruise, approach and landing. These phases have completely different dependability requirements, with the take-off and landing phases being the most critical ones. The duration of most of the phases, including the critical ones, is known in advance, but a certain randomness is considered to account for the influence of the external environment, specifically the weather conditions at the terminal airport.
A PMS analysis can also be applied to study the dependability features of an aircraft during its whole life-cycle, to establish a proper schedule of maintenance operations. In this context, we talk of the Scheduled Maintenance System (SMS) problem. The system is intended to be used during its lifetime for multiple missions. Within each mission, the system behaves as a PMS. The system is run for a finite duration, and then it has to pass a maintenance check. Typically, after a prefixed number of missions the system is completely checked, so that all its components are as good as brand-new ones after the check. However, other kinds of maintenance actions can be performed between these major checks. For instance, some highly critical components could be checked and possibly repaired after each mission, and some others could be replaced after each mission even if they are still working. All these maintenance actions can be thought of as reconfigurations of the system, and the same analysis framework as for PMSs can be adopted to study the SMS problem. Obviously, the study of system dependability can be limited to the period between two complete checks of the system, because after such a check the system is reset to its initial conditions.
Another typical example of PMSs is provided by the computing systems carried on board long-life spacecraft, which must survive long periods of very low activity, and then are required to achieve the main scientific goals of the mission in a short period of intense solicitation. The phases of a typical spacecraft mission include: launch, hibernation, and operational phases such as orbital manoeuvres, rendez-vous and fly-bys in proximity of planets and other space objects. Performance and dependability requirements are significantly different from phase to phase. The launch phase is an example of an extremely stressing phase, due to the intense mechanical solicitations. Hibernations are long dormancy periods usually entered for cruise navigation, characterised by a minimal level of activity. The spacecraft stays in hibernation for most of the mission, and only leaves it for short time periods, in which the operational phases are performed. For instance, during a planet rendez-vous phase the spacecraft performs gravity assist manoeuvres to increase its speed. Afterwards, it enters a new hibernation phase till the next wake-up. To survive a mission that lasts for several years, space PMSs are equipped with many redundant components. Besides providing a high availability, these redundant resources are employed to meet the dependability requirements of the critical operational phases.
Spacecraft represent a very interesting class of PMSs because of their multi-objective missions. Different goals are defined, some of primary and others of secondary relevance. Phases performed to reach secondary goals can be skipped in favour of the more important primary ones. For instance, a phase could be skipped if the current spacecraft hardware configuration is already degraded due to component failures experienced in the past: reaching a secondary goal might lead to an unreliable execution of a more important subsequent phase. An example of this flexible mission can be found in a research performed within the framework of the ESPRIT project 20716 GUARDS, Generic Upgradable Architecture for Real-Time Dependable Systems [66]. This research focuses on the European Space Agency mission Rosetta, and was started by the French aerospace company Matra Marconi Space, an end-user of the GUARDS project. Rosetta is an unmanned spacecraft for scientific research whose primary goal is a rendez-vous with the comet Wirtanen in 2011. Secondary objectives include two fly-bys with asteroids during the 9-year cruise to reach the comet.
Besides these classical examples of PMSs, there are many other kinds of systems, not generally classified as such, that could be conveniently and quite naturally reformulated as PMSs, and then modelled and solved with the specific methodologies for PMSs. More precisely, we are considering systems having a phased mission for which the phase ordering is dynamically determined by the environment the system is embedded in, or by particular sequences of faults.
Real-time systems with “multi-moded” behaviour [14, 19] represent good examples of these PMS-like systems. These real-time systems behave in different modes, each mode being characterised by a fixed incoming workload and a constant availability of system resources. Each mode can be naturally associated with a distinct phase. Changes from one mode to another occur as a result of workload variations. Also, faults affecting system resources may cause mode changes, because they may lead to the failure and subsequent unavailability of resources.
The classical “server with vacation” model, very often encountered in the literature, is another example of a system that is readily reformulated as a PMS. The server with vacation is a simple single-server queuing model, where the server cyclically alternates normal operative periods with vacation periods. During a vacation period, customers are not served and the queue length increases. The durations of both the operative and the vacation periods are usually randomly distributed. A simple PMS model of the server with vacation would consider two phases of random duration, cyclically executed.
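The two-phase view of the server with vacation can be illustrated with a toy discrete-event simulation. The sketch below is ours: all names and parameters are illustrative, and for brevity the phase durations are drawn exponential here, although the text allows general distributions (with exponential clocks, restarting the arrival/service races at a phase boundary is harmless by memorylessness).

```python
import random

def simulate_server_with_vacation(t_end, lam, mu, op_mean, vac_mean, seed=0):
    """Simulate the two-phase (operative/vacation) PMS view of the server
    with vacation up to time t_end and return the final queue length.

    lam, mu            -- arrival and service rates
    op_mean, vac_mean  -- mean durations of operative and vacation phases
    """
    rng = random.Random(seed)
    t, queue, phase = 0.0, 0, "operative"
    phase_end = rng.expovariate(1.0 / op_mean)
    while t < t_end:
        rates = {"arrival": lam}
        if phase == "operative" and queue > 0:
            rates["service"] = mu           # no service during vacations
        dt = rng.expovariate(sum(rates.values()))
        if t + dt >= phase_end:             # phase change wins the race
            t = phase_end
            if phase == "operative":
                phase, phase_end = "vacation", t + rng.expovariate(1.0 / vac_mean)
            else:
                phase, phase_end = "operative", t + rng.expovariate(1.0 / op_mean)
            continue
        t += dt
        event = rng.choices(list(rates), weights=list(rates.values()))[0]
        queue += 1 if event == "arrival" else -1
    return queue
```

During vacation phases only arrivals are enabled, so the queue can only grow, exactly as described above.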
2.3 Dependability measures for PMSs
Different classes of dependability measures are considered for PMSs. A first spontaneous question a PMS engineer or a user might ask is: “What is the probability that the mission will be completed successfully?”. To answer such a question, it is first necessary to precisely define a success criterion for the mission of the PMS. Suppose a PMS is required to execute a certain number of phases, selected somehow from a set of possible phases numbered {1, 2, …, n}. The criterion usually considered is the complete fulfilment of the goals of the mission during all of the n phases, but it is also possible to give more stringent requirements, such as the completion of the n phases within a given time-out and with a particular operational configuration, or less demanding requirements such as the fulfilment of at least n′ out of the n primary goals of the mission, and so on.
Most of the questions about a PMS can be answered by evaluating the so-called Reliability at mission completion time, that is, the probability, denoted with R throughout this dissertation, that the PMS is in a successful condition (a condition that fulfils the success criterion) at the time the last phase ends. This measure, which is also variously called probability of survival, or mission Reliability, is a reasonable way of stating requirements for some classes of PMSs. The computation of R requires modelling the transient evolution of the PMS during the whole mission, to keep track of the events that can impair its ability to perform properly, and of the healthy/failed state of the components.
More complete information can be obtained by evaluating the Reliability of the PMS at each phase completion time, denoted as Ri, i = 1, 2, …, n, hereafter. This kind of evaluation can be useful to understand the relative criticality of each single phase within the PMS mission. Drops in these Reliability figures help in pointing out possible bottleneck phases, those to which the overall Reliability at mission completion time R is most sensitive.
The complete Reliability function, defined as the probability that the PMS is successfully performing at time t, for any time t ≥ 0 before the end of the mission, will be denoted with R(t). This real-valued function provides an accurate description of the PMS evolution over time. It is worth observing that, as a result of the different failure criteria and reconfigurations of the various phases, the function R(t) may be discontinuous.
Another kind of measure that can be interesting for PMSs is a performability metric [9, 14, 55, 56, 62, 75, 82], which takes into account a measure of the total value produced by the system in the mission. Performability-oriented measures are especially appropriate for multi-objective missions of PMSs, where a range of scenarios is possible between complete failure, bringing no benefits, and completely successful operation. For such PMSs, a client might be interested in the expected utility produced over a mission. For performability modelling, a reward structure must first be defined, by assigning values to the different phases and/or to different activities/states of the PMS inside a phase. A reward associated with a state of the phase model represents a benefit that is obtained by the PMS for the time spent in that state. For instance, suppose a state models an operative configuration for the sampling of relevant data in a PMS for scientific purposes. The longer the time spent in that configuration, the larger the amount of information collected. If we are interested in quantifying a measure of the value achieved with the gathered data, we can assign a reward w to the state representing the configuration, and evaluate the total reward accumulated while within the phase. A reward value can also be assigned to a transition, meaning that the benefit is obtained with the completion of the activity the transition represents.
Different applications require different reward functions. A reward function must be chosen to be realistic for a specific real-world situation, and no single reward model is appropriate in general. Even considering only additive reward structures, there are applications where the cost of a mission failure is large compared to the gain from a successful mission. For instance, in a civil avionics system, where each flight constitutes a mission, the failure of a mission implies a cost varying between that of aborting a flight and rescheduling or compensating the passengers, and that of a crash. A completely different scenario would be a deep space probe, where, after the initial investment, each unit of time of productive flight until the spacecraft fails for good may be considered as added value, thus allowing some degree of utility even if the mission does not complete successfully. In this latter case, the reward accumulated during the whole mission, and the cumulated reward function W(t) at time t, for any time t ≥ 0 before the end of the mission, are two measures of interest to assess the dependability of the PMS.
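For an additive per-phase reward structure of the kind just described, the expected cumulated reward takes a particularly simple form. The helper below is ours, and it makes the illustrative simplifying assumption that each phase's reward is earned only upon successful completion of that phase, with the completion probabilities supplied by a separate Reliability analysis.

```python
def expected_total_reward(completion_probs, rewards):
    """E[W] for an additive per-phase reward structure: each phase
    contributes its reward weighted by the probability that the phase
    completes successfully (illustrative simplification)."""
    return sum(p * r for p, r in zip(completion_probs, rewards))
```

For instance, with rewards (0, 0, 1, 10) in units of w and completion probabilities (1, 1, 0.9, 0.85), the expected total reward is 0.9 + 8.5 = 9.4 (in units of w).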
2.4 An application example
In the following we present an example of a generic PMS application. This application is inspired by a real PMS space application that we thoroughly studied in [15] with one of the modelling approaches we are going to introduce in the second part of the thesis. Here, we modify that PMS to serve the purposes of this dissertation. The resulting hypothetical PMS application includes most of the typical features of the PMSs that have been described in this chapter.
This example of PMS will be conveniently used to exemplify, on a common case study, the various methodologies we will be dealing with throughout this dissertation. To point out the applicability limits of each of these methodologies, we intentionally define a PMS that is challenging, in that it contains many aspects that usually pose relevant issues for the modelling and evaluation activities.
We consider a PMS that has to execute four phases, each of whose duration is a random variable having general distribution Fi(t) with average value τi, i = 1, 2, …, 4. Phase 3 represents a secondary though relevant objective of the mission, whereas the activities performed within phase 4 are the primary goals to be achieved.
The various phases require different levels of dependability, and consequently have different failure criteria. To meet these dependability requirements, the PMS is equipped with four redundant identical processors that can be used in various configurations. While within a given phase, the system uses at most those processors that are necessary to reach the ideal configuration for the phase, that is, the configuration that best fits the requirements of the activities to be executed within the phase. The number of active processors currently employed is denoted with a. The unused processors are turned off and act as cold spares. The number of spare processors is denoted with s. The different requirements of the various phases and the corresponding failure criteria are shown in Table 2.1, together with all the other phase-dependent parameters that define the PMS.
During each phase, active processors are subject to faults. All the processors are assumed to exhibit the same stochastic behaviour with respect to fault occurrences. Each processor fails independently of the others, and the time to failure is exponentially distributed with a rate that is constant within a phase. However, the same processor can have different failure rates in different phases. The phase-dependent failure rates are listed in Table 2.1. The number of faulty processors is denoted with f. Spare processors are not affected by faults.
Phase | Ideal configuration | Minimal requirement | Failure rate | Reward | Next phase | Expected duration | Starting criterion
  1   |       a = 3         |       a = 2         |      2λ      |   0    |     2      |        τ1         |         -
  2   |       a = 2         |       a = 1         |      λ       |   0    |   {3, 4}   |        τ2         |         -
  3   |       a = 3         |       a = 3         |      5λ      |   w    |     4      |       τ3(f)       |       f = 0
  4   |       a = 3         |       a = 3         |      2λ      |  10w   |     -      |        τ4         |       f > 0

Table 2.1: Phase-dependent parameters of the PMS
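Table 2.1 can be encoded directly as a small data structure, which is convenient when sketching analyses of this example. The encoding below is ours: λ and w are kept as symbolic units (multipliers), and the expected durations are omitted.

```python
from dataclasses import dataclass

@dataclass
class Phase:
    ideal: int            # ideal number of active processors
    minimal: int          # minimal requirement (failure criterion threshold)
    rate_mult: int        # failure-rate multiplier of the base rate lambda
    reward: int           # reward in units of w
    next_phases: tuple    # possible successor phases

# Hypothetical encoding of Table 2.1
PHASES = {
    1: Phase(ideal=3, minimal=2, rate_mult=2, reward=0,  next_phases=(2,)),
    2: Phase(ideal=2, minimal=1, rate_mult=1, reward=0,  next_phases=(3, 4)),
    3: Phase(ideal=3, minimal=3, rate_mult=5, reward=1,  next_phases=(4,)),
    4: Phase(ideal=3, minimal=3, rate_mult=2, reward=10, next_phases=()),
}

def next_phase(current, faulty):
    """Starting criteria of Table 2.1: leaving phase 2, phase 3 requires
    f = 0 and phase 4 requires f > 0; all other successors are unique."""
    succ = PHASES[current].next_phases
    if current == 2:
        return 3 if faulty == 0 else 4
    return succ[0] if succ else None
```

For example, `next_phase(2, 0)` selects phase 3, while `next_phase(2, 1)` skips it in favour of phase 4.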
Faulty processors are removed from the pool of the active ones. To recover the ideal configuration for the current phase, one or more reconfigurations are tried, with the activation of a spare component. This activity requires a negligible amount of time, and may either succeed, with probability c, or fail, with probability 1 − c. Should it fail, the spare processor that does not switch on is declared faulty. However, the PMS is able to perform its vital functions as long as the configuration does not degrade below the minimal requirement for the phase, which in fact represents the failure criterion for the phase, as shown in Table 2.1.
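Within a single phase, the failure and reconfiguration behaviour just described induces a small continuous-time Markov chain. The sketch below solves its transient equations numerically for the phase survival probability; it is a deliberate simplification of the example (repairs ignored, a single spare activation attempt per failure), and all names are ours.

```python
import numpy as np

def phase_survival(n0, ideal, minimal, lam, c, t, steps=4000):
    """Probability that one phase survives for time t. State = number of
    non-faulty processors; active = min(state, ideal); each failure (rate
    active*lam) triggers at most one spare activation succeeding with
    coverage c. Dropping below `minimal` active processors is absorbing
    failure. Repair is ignored in this simplified sketch."""
    F = n0 + 1                          # absorbing failure state
    Q = np.zeros((n0 + 2, n0 + 2))
    def dest(m):                        # state reached with m non-faulty left
        return m if min(m, ideal) >= minimal else F
    for n in range(1, n0 + 1):
        if min(n, ideal) < minimal:
            continue                    # already below the failure criterion
        rate = min(n, ideal) * lam
        if n > ideal:                   # a cold spare is available
            Q[n, dest(n - 1)] += rate * c        # spare switches on
            Q[n, dest(n - 2)] += rate * (1 - c)  # spare declared faulty too
        else:
            Q[n, dest(n - 1)] += rate
        Q[n, n] -= rate
    p = np.zeros(n0 + 2)
    p[n0] = 1.0
    h = t / steps                       # RK4 integration of dp/dt = p Q
    for _ in range(steps):
        k1 = p @ Q
        k2 = (p + h / 2 * k1) @ Q
        k3 = (p + h / 2 * k2) @ Q
        k4 = (p + h * k3) @ Q
        p = p + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return 1.0 - p[F]
```

As a check, with perfect coverage (c = 1), four processors and ideal = minimal = 3, the time to failure is the sum of two exponentials of rate 3λ, so the survival probability matches the Erlang value e^{-3λt}(1 + 3λt).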
After a period of time that cannot be determined a priori, a faulty processor can get repaired. Such a repair may require the replacement of some internal components, or the reinstallation of parts of the software, in the case of a permanent fault. For a fault of transient nature, a rollback operation is usually sufficient to recover a consistent state of the processor. The repair facilities are shared in the system, and faulty processors are repaired sequentially, one after the other, on a First Come First Served basis. The random time needed for the repair is assumed to follow a general distribution law, with cumulative distribution function μ(t), t ≥ 0. Repaired processors become spares or active processors depending on the requirements of the current phase.
When a new phase begins, the PMS needs to update its configuration to reach the most
adequate one with respect to the requirements of the phase. This reconfiguration may
trigger the activation of processors that are kept in a spare state. Since the switching on of
a processor can fail, there is the possibility that a phase fails at its very beginning.
A certain level of flexibility is foreseen for the mission of the PMS: the decision on which phase has to be performed next can be taken dynamically, depending on the state of the system. Therefore, as shown in Table 2.1, a phase can have multiple possible subsequent phases, like phase 2, which can be followed either by phase 3 or by phase 4. The dynamic decision is taken at the end of the preceding phase, that is phase 2, by evaluating a starting criterion that enables the beginning of operation of only one among the possible phases. This selection mechanism makes it possible to skip phases that could lead to excessively hazardous situations for the PMS, or to sacrifice some secondary objectives for the sake of a more relevant phase to be performed later during the mission. The possible paths of the mission profile are conveniently represented by the tree shown in Figure 2.1, where the branching points represent the points at which a selection among alternative phases is performed. This tree plays a relevant role in the application of the hierarchical modelling methodology.
[Figure: tree of mission phases. Phase 1 is followed by phase 2, which branches either to phase 3 and then phase 4 (total accomplishment), or directly to phase 4 (partial accomplishment).]
Figure 2.1: Dynamic selection of the phase to be performed
For this application example, we assume that the secondary objective of the mission, that is phase 3, is to be skipped if any of the processors is faulty, as dictated by the starting criterion, because during phase 3 the PMS active processors are subject to a very intense stress that results in a high failure rate. This selection mechanism makes it possible to skip phase 3 when it could lead to excessively hazardous situations for the PMS; the secondary goal of the mission is sacrificed for the sake of a more reliable execution of the primary objective represented by phase 4.
If the starting criterion for phase 3 is not fulfilled, then the PMS proceeds directly to the primary objective represented by phase 4, as shown in Figure 2.1. Otherwise, both objectives of the mission are pursued. Moreover, the PMS dynamically adjusts the duration of phase 3 after the phase has already started, if a change in the operational configuration occurs. We suppose that phase 3 has an expected duration τ3(f) that is a function of the number of faulty processors f. Precisely, if f = 0, then τ3(f) = τ3, that is, the phase lasts for the period of time determined by the distribution function F3(t). However, as soon as the first processor fails, that is, when f > 0 holds, the duration of the phase is changed so that the new expected duration of the phase is τ3(f) = τ3′, with τ3′ << τ3. In other words, if the configuration of the PMS gets degraded, then phase 3 is consequently shortened, to avoid endangering a dependable execution of the primary objective. However, if the repair of the faulty processor is completed before the PMS leaves the phase, then the duration of phase 3 is updated again to the original value τ3(f) = τ3.
Therefore, not only is the sequence of phases performed by the PMS not fixed, but the overall duration of the mission itself can change dynamically, depending on the availability of the processors.
The measure of interest for the PMS are the probability R of successfully completing the
mission, and the whole probability function R( t ). Moreover, to evaluate the probabilities
of the different accomplishment level of the mission, that is the achievement of both the
objectives of the mission, or that of the primary goals solely, we define a very simple re-
ward structure associated with the phases of the system. As shown in Table 3.1, the two
first phases have a null associated reward, in that their correct execution, though essential
to carry on with the mission, brings no practical benefits. The primary objective is esti-
mated to be ten times more valuable than the secondary objective. The total reward W
cumulated during the entire mission is the Performability metric of interest.
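To make this reward structure concrete, the following sketch cumulates the mission reward W from the set of phases completed successfully. The numeric rewards (0, 0, 1 and 10) are illustrative values consistent with the ten-to-one ratio between the primary and secondary objectives stated above; they are our assumption, not figures taken from the thesis.

```python
# Illustrative reward structure for the four-phase mission: phases 1 and 2
# carry no reward, the secondary objective (phase 3) is worth 1, and the
# primary objective (phase 4) is worth ten times as much.  These values
# are an assumption for illustration only.
PHASE_REWARD = {1: 0.0, 2: 0.0, 3: 1.0, 4: 10.0}

def mission_reward(completed_phases):
    """Total reward W cumulated over the phases completed successfully."""
    return sum(PHASE_REWARD[p] for p in completed_phases)

# Both objectives achieved:
print(mission_reward([1, 2, 3, 4]))   # 11.0
# Primary objective only (secondary phase 3 skipped):
print(mission_reward([1, 2, 4]))      # 10.0
```

The accomplishment levels of the mission then correspond directly to distinct values of W.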
Chapter 3
State-of-the-art and open issues in the analysis of PMSs
Systems showing a phased behaviour offer challenges in dependability modelling and
evaluation. The analysis of PMSs is indeed complicated by their dynamic behaviour,
which may change considerably from phase to phase. A different model should be in
principle built to represent the behaviour of a PMS inside each phase. This point affects
not only the modelling step, but the evaluation step as well. The model that represents
the behaviour of a PMS within a phase cannot be solved independently from the other
phase models. Indeed, the same system components are to be used inside the various
phases, thus introducing dependencies among the phase models. Moreover, the phase
changes very often reflect the occurrence of particular events that do not depend on the
system state itself, but are rather triggered by external events, such as time-out excep-
tions, or are preplanned for the entire system life-time.
To better understand the contribution offered by this thesis, in this chapter we present a
brief survey of the most significant results that have appeared in the literature for the
analysis of the dependability attributes of PMSs. We first consider the methods based on
combinatorial approaches, as Reliability block diagrams and Fault trees, and then the
more recent ones based on Markov models. The two kinds of approaches have both ad-
vantages and drawbacks. The combinatorial approach is conceptually simple, but it tends
to become computationally expensive as the number of phases increases. More impor-
tantly, combinatorial models can only deal with non-repairable systems. The Markov ap-
proach is well suited to representing the complex dependencies among components and the
dynamic structure of PMSs, but it generally suffers from a state explosion problem when
applied to large systems. Nevertheless, in our opinion the Markov approach does offer
the potential needed to adequately address the features of the more complex instances
of PMSs. We present a comparison of the methods based on the Markov chain approach,
by briefly discussing the applicability of each methodology to the modelling and solution
of the example PMS introduced in Chapter 2. This helps in understanding the main
advantages and drawbacks of the methods, and in clarifying which points need to be ad-
dressed to extend the applicability of the results that have been proposed up to now in
the literature. Finally, we point out the issues that will represent the goals of the research
conducted throughout this dissertation.
3.1 Literature survey
PMSs have been widely investigated. Starting from the early studies [17, 86], which as-
sumed fairly simple phase dependencies of system components, many works have been
proposed which resort to Reliability block diagrams and Fault tree models [18, 34, 62,
81] (only for non-repairable systems) and to Markov chain models [3, 5, 31, 56, 79, 80].
A different approach based on the Stochastic Activity Networks modelling paradigm is
adopted in [7] and presented at the end of the survey; it represents a study closely re-
lated to some points of the investigation presented in this dissertation.
One of the earliest studies that propose a general methodology for the analysis of PMS is
found in the paper [34] by Esary and Ziehms. They defined therein a method for the es-
timation of the Reliability of Phased Mission Systems, based on Reliability block diagram
analysis. The system components are assumed to fail independently from each other, and
failed components are not repaired. They considered a transformational approach that
takes as input the separate Reliability block diagram of the individual phases, and gener-
ates a single Reliability block diagram representative of the overall mission. The various
failure criteria of the different phases are captured by the separate Reliability block dia-
grams specific for the single phase, and are then combined to obtain the failure criterion of
the system. In this way, a multi-phase mission is reduced to an equivalent, synthetic,
single-phase system, for which standard solution algorithms can be applied to compute
mission Reliability. A number of new events need to be introduced in the final Reliability
block diagram during the transformation step, to represent the dependencies among sys-
tem components from a phase to the successive ones. This increase in the number of
Reliability block diagram elements is partially compensated by a clever definition of the
set of minimal cuts, which can be reduced by exploiting the sequential order of execution
of the phases.
The paper by Burdick et al. [18] compares the exact results provided by the method of
Esary and Ziehms with several less computationally demanding approximate methods
based on Fault trees. They considered the same system scenario as Esary and Ziehms did
in the work cited above, that is non repairable systems with components failing indepen-
dently from each other. The measure of interest was the Unreliability of the system. The
comparison was carried out on an application example, the emergency core cooling sys-
tem for a boiling water nuclear reactor.
A similar approach was presented by Pedar and Sarma in [62] to evaluate the effective-
ness of a distributed fault-tolerant avionics computing system taken as an application ex-
ample, still under the hypothesis of independently failing and non-repairable compo-
nents. They considered an analysis based on Reliability block diagrams and obtained both
exact and bound results for the system Reliability. They developed a systematic proce-
dure to cancel out the common events in earlier phases that are accounted for in later
phases. Different accomplishment levels for the mission were considered, ranging from
the regular completion of the flight up to the fatal crash of the aircraft. The unpredictability
of the external environment was included in the analysis as well, through the insertion
of a dummy phase at the landing time.
Somani and Trivedi proposed in [81] an analysis technique that combines Fault trees and
Boolean algebraic methods to efficiently compute the probability of failure in complex
PMSs. Component failure rates, configuration and failure criteria may vary from phase to
phase. The failure criterion of each phase is expressed as a Fault tree, as in the method of
Esary and Ziehms, but a single monolithic model is not built. Rather, phases are
handled one at a time, thus obtaining a computationally efficient solution technique. A
phase algebra was developed to account for the effects of variable configuration and fail-
ure criteria from phase to phase. Quite recently, the method of Somani and Trivedi has
been extended and improved by Ma and Trivedi [51], who developed a much more effi-
cient algorithmic strategy for the application of the phase algebra.
The method presented by Meyer et al. in [56] for the analysis of an air transport mission
is the first example of Markov based approach for PMSs found in the literature. The au-
thors consider a PMS formed by the SIFT (Software Implemented Fault-Tolerance) pro-
cessing system plus the external environment. Similarly to the work of Pedar and Sarma
described above, different levels of accomplishment are defined for the mission of the air-
craft, and the measure evaluated is a performability metric [55]. The system and the ex-
ternal environment are modelled with a hierarchical approach, where three levels of mod-
elling are devised: the computer level, the aircraft level and the mission level. Specific in-
terlevel translations are performed to obtain the performability measure at the highest
level (mission) from the state probabilities of the lower level ones. The various phases are
modelled at the lowest level, the computer level. Within a given phase, the stochastic pro-
cess representing the PMS is a continuous-time homogeneous Markov process. These in-
traphase processes are allowed to differ from phase to phase. The overall solution is ob-
tained by sequentially evaluating all the phases, by using as initial probability distribution
of the currently analysed phase the final probability distribution of the previously solved
phase. The probabilities of the interphase transitions taking place at the time of phase
change are specified by interphase transition matrices. These matrices perform the map-
ping of the state probability distribution from one phase to the successive one. To apply
this method to the modelling of our example of PMS, many simplifying assumptions
must be introduced. First of all, the only distribution of the phase duration that can be ac-
commodated is the deterministic one. Therefore, the duration of phase i must be approx-
imated with the expected value τi of the distribution Fi( t ), i = 1, 2, 3, 4. The consequences
of such approximation can be more or less relevant for the results of evaluation, mainly
depending on the variance of the original distribution Fi( t ). Moreover, only a static
profile of the mission can be considered with the method described above: a model for the
mission can be built only forcing a sequential execution of the four phases. The dynamic
selection of the next phase to be performed and the other flexibility characteristics of the
PMS taken as example are hence to be disregarded. Last, since phase models are built as
homogeneous Markov chains, the time needed for the repair of a faulty processor must be
modelled with an exponential random variable, thus introducing a further approximation
of the PMS behaviour. Once the simplified model of the PMS has been defined, the eval-
uation of both the Reliability and Performability measures of interest can be carried out
within the solution framework proposed by the method.
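The sequential phase-by-phase evaluation described above can be sketched numerically as follows. The generator matrices, phase durations and interphase matrix are invented toy data, and the transient solver uses uniformization, a standard technique for homogeneous continuous-time Markov chains; none of the identifiers come from [56].

```python
import numpy as np

def transient(pi0, Q, t, eps=1e-12):
    """Transient distribution pi(t) of a homogeneous CTMC via uniformization:
    pi(t) = sum_k e^{-qt} (qt)^k / k! * pi0 P^k, with P = I + Q/q."""
    n = Q.shape[0]
    q = max(-Q.diagonal().min(), 1e-12)      # uniformization rate
    P = np.eye(n) + Q / q
    weight = np.exp(-q * t)                  # Poisson weight for k = 0
    term, out, k = pi0.copy(), weight * pi0, 0
    while weight > eps or k < q * t:
        k += 1
        weight *= q * t / k
        term = term @ P
        out = out + weight * term
    return out

# Toy two-phase mission: states = 2, 1 or 0 working processors, no repair;
# all rates and durations are invented for illustration.
Q1 = np.array([[-0.2,  0.2, 0.0],
               [ 0.0, -0.1, 0.1],
               [ 0.0,  0.0, 0.0]])
Q2 = np.array([[-0.4,  0.4, 0.0],
               [ 0.0, -0.3, 0.3],
               [ 0.0,  0.0, 0.0]])
M12 = np.eye(3)                              # interphase matrix: states map 1:1

pi = np.array([1.0, 0.0, 0.0])               # all processors working initially
pi = transient(pi, Q1, 5.0) @ M12            # solve phase 1, map into phase 2
pi = transient(pi, Q2, 3.0)                  # solve phase 2
print(pi)                                    # final state distribution
```

The final distribution of one phase, multiplied by the interphase matrix, becomes the initial distribution of the next phase, exactly as in the scheme of Meyer et al.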
A similar approach has been used by Arlat et al. in [5] to evaluate the Reliability and
Safety figures of a space application. The various phases are separately modelled by
continuous-time homogeneous Markov process, and the mapping of the state probability
distribution from one phase to the next is performed by hand. A sensitivity analysis
based on multiple evaluations with respect to the variations of some of the most crucial
parameters, as the coverage of fault-tolerance strategies, is also performed. Concerning the
application of the methodology to the example of PMS, the same considerations as those
we have made for the previous method apply. Indeed, again only a constant duration of
the phases can be considered, the profile of the mission must be static, and only exponential
activities can be exactly dealt with. Performability-oriented metrics are not taken into ac-
count.
The work presented by Alam and Al-Saggaf in [3] defines two different methods, to deal
with repairable PMSs having deterministic and random phase duration, respectively.
Both system failure criteria and failure rates of components are allowed to vary from
phase to phase. In the case of deterministic duration, an exact analysis methodology is
defined, which models the multi-phase system with a single Markov process. This overall
process is solved as many times as the number of phases the PMS consists of, and the
initial state occupation probabilities of the system at the time a new phase starts are ob-
tained from the final state occupation probabilities of the preceding phase. For PMSs
having a random duration of the phases, an approximate method is proposed. First, the
probability distribution functions of the phase change times are computed, and the ex-
pected values for the phase change times are obtained. Then, the single Markov model is
again utilised, but now the state occupation probabilities of the PMS at the beginning of
each phase are approximated with the state occupation probabilities evaluated at the ex-
pected phase change times. When applied to our example of PMS, the method of Alam
and Al-Saggaf makes it possible to consider random phase durations, but at the expense of an
exact solution. Unfortunately, no estimates of the error introduced by the approximation
are provided, thus heavily limiting the confidence that can be placed in the evaluation re-
sults. The flexible profile of the mission and the non-exponential duration of the repair
can not be modelled with this methodology. Only Reliability measures are taken into
consideration in the Alam and Al-Saggaf method.
Smotherman and Zemoudeh presented a method [79] that builds a single non-homoge-
neous continuous-time Markov model to carry out the analysis of PMSs. The behaviour
of the system inside each phase is represented by a different Markov chain having a sepa-
rate subset of the whole state space for the system. The state transitions are described in
terms of general random variables, and are generalised to include the events representing
the phase change. Thus, state dependent phase changes, random phase duration, time
varying failure and repair behaviour are all easily modelled. The system of differential
equations describing the evolution of the non-homogeneous Markov model is numerically
solved to evaluate the dependability measures. Concerning the application of this method
to the modelling of our example of PMS, we remark the generality of the assumptions
made, which accommodate most of those features that pose problems to the other
methodologies. In fact, apart from the dynamic selection of the mission profile, the ran-
dom duration of the phases and the non-exponential duration of the repair activity are ex-
actly modelled. However, as we will see in more detail in the following, the generality of the
hypotheses under which the method applies comes at a heavy price in terms of solution
computational cost.
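A minimal numerical sketch of this single-model approach: the forward equations dπ(t)/dt = π(t)Q(t), with a generator that changes at the phase boundary, are integrated with a classical fourth-order Runge-Kutta scheme. The two-phase generator, the rates and the boundary at t = 5 are invented toy data, not taken from [79].

```python
import numpy as np

def Q_of_t(t):
    """Piecewise-constant generator: toy two-phase mission, phase change
    at t = 5.  State 0 = operational, state 1 = failed (absorbing)."""
    Q1 = np.array([[-0.2, 0.2], [0.0, 0.0]])   # phase 1: failure rate 0.2
    Q2 = np.array([[-0.5, 0.5], [0.0, 0.0]])   # phase 2: harsher, rate 0.5
    return Q1 if t < 5.0 else Q2

def solve(pi0, t_end, h=1e-3):
    """Integrate the forward equations dpi/dt = pi Q(t) with classical RK4."""
    pi, t = pi0.astype(float), 0.0
    while t < t_end - 1e-12:
        step = min(h, t_end - t)
        k1 = pi @ Q_of_t(t)
        k2 = (pi + 0.5 * step * k1) @ Q_of_t(t + 0.5 * step)
        k3 = (pi + 0.5 * step * k2) @ Q_of_t(t + 0.5 * step)
        k4 = (pi + step * k3) @ Q_of_t(t + step)
        pi = pi + (step / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += step
    return pi

pi_end = solve(np.array([1.0, 0.0]), 8.0)
print(pi_end)   # operational / failed probabilities at mission end
```

Note that, because phase changes are folded into Q(t), no explicit interphase mapping is needed; the price is the numerical integration of the whole system of equations over the mission time.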
Similarly to the method of Smotherman and Zemoudeh, Dugan proposed in [31] a method
in which a single Markov chain with state space equal to the union of the state spaces of
the individual phase models is generated. The phase models are automatically obtained
from a Fault tree description of the success/failure criteria for the single phase. The tran-
sition rates are parameterised with phase numbers; thus, strictly speaking, the model is a
non-homogeneous Markov chain. The model is solved once for each of the phases the
PMS is formed by, and only those transitions whose label corresponds to the one of the
phase being currently handled are considered. The final state occupation probabilities of
one phase become the initial state occupation probabilities for the next phase. For the
applicability of Dugan’s approach, the failure criteria of the various phases must be
such that if a state is a system failure state in a phase, then it can not become a proper
state for the system in a later phase. This constraint is not respected by our example of
PMS, for which phase 2 is less demanding than phase 1 and consequently a failed config-
uration in phase 1 is a feasible configuration during phase 2. Apart from this issue, which
can not be solved without relevant changes to the structure of the PMS, the same mod-
elling problems encountered by the methods of Meyer et al. and Arlat et al. are also found
for Dugan's method: random phase durations, non-exponential distributions of the time
needed for the repair activities and flexible mission profiles can not be taken into account.
The evaluation of the Reliability is the sole objective of the method.
Somani et al. [80] presented a computationally efficient method to analyse multi-phased
systems. With this approach, as for the methods of Meyer et al. [56], and Arlat et al. [5],
Markov chains for different phases are developed and solved separately, instead of a sin-
gle Markov chain. The issue of varying failure criteria and changes in system configura-
tion from phase to phase is addressed by providing an efficient mapping procedure at the
transition time from one phase to another. While analysing a phase, only the states
relevant to that phase are considered, thus limiting the size of each individual Markov
model, and reducing the computation time without compromising the accuracy of the
evaluation. Phases can be of fixed or random duration, and in the latter case durations are
drawn from the Beta distribution. The same approximate approach as the one described
above by the method of Alam and Al-Saggaf [3] is adopted to deal with random phase du-
ration. Therefore, this method makes it possible to model the random duration of the phases
of our example of PMS, at least when they belong to the family of Beta distributions,
but it incurs the same approximation as the method of Alam and Al-Saggaf, which does
not bound the error introduced. The other features of the PMS taken as example, that is
the dynamic profile of the mission and the non-exponential duration of the repair, can not
be modelled with this method. Once again, the Reliability of the PMS is the sole measure
that can be evaluated.
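The approximation adopted for random phase durations can be illustrated directly: each Beta-distributed duration is replaced by its expected value, and the resulting expected phase-change instants are then used as if they were deterministic. The shape parameters and the rescaling interval below are invented for illustration.

```python
# Expected value of a Beta(a, b) duration rescaled to the interval
# [0, t_max]: E[T] = t_max * a / (a + b).  Parameters are illustrative.
def expected_phase_duration(a, b, t_max):
    return t_max * a / (a + b)

# Three phases with different Beta shapes, each at most 10 time units long:
phases = [(2.0, 2.0, 10.0), (5.0, 1.0, 10.0), (1.0, 4.0, 10.0)]
change_times = []
t = 0.0
for a, b, t_max in phases:
    t += expected_phase_duration(a, b, t_max)
    change_times.append(t)     # expected phase-change instants
print(change_times)
```

The state occupation probabilities are then evaluated at these expected change times, which is exactly where the unbounded approximation error noted above enters.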
Last, we describe the study by Aupperle et al. in [7]. The authors employed the
METASAN [75] modelling tool, based on Stochastic Activity Networks (SANs), to rep-
resent with two separate models the failure/repair processes of system components and
the changing environment which is responsible for the phased behaviour. The two sub-
models defined for the PMS are easily combined into a single overall model that is able to
capture the complex phase dependent behaviour by exploiting the modelling features of
the SANs. This approach is quite remarkable in that it contains many interesting elements
for the modelling of complex PMSs. Indeed, the SAN paradigm makes available many
useful modelling features, such as activities with generally distributed durations, state-
dependent firing rates, and guards controlling the enabling
conditions of activities. Thus, the random duration of the phases, as well as the non-ex-
ponential duration of the repair of our example of PMS can be easily modelled.
Moreover, even if this possibility is actually not explored by the authors in the paper [7],
by exploiting the modelling features of the SANs it is possible to represent the dynamic
profile of the mission, too. However, no analytical solution method is provided for the
evaluation of the SAN model of the PMS, which was instead intended to be solved by
simulation. It is worthwhile remarking that the simulative approach can be very costly for
PMSs. Indeed, PMSs usually have very long mission times, during which a number of
short duration activities are performed. These characteristics easily result in expensive
and time-consuming simulation runs required to solve the models of PMSs.
3.2 Comparison of the methods based on the Markov approach
In this section we compare the Markov chain-based approaches [3, 5, 31, 56, 79, 80] and
the one based on SANs in [7], for the dependability modelling and analysis of PMSs. The
most relevant aspects of the comparison have been deduced from the application of the
various methodologies to the example of PMS introduced in the Chapter 2, and are sum-
marised in Table 3.1. As a general remark, it is possible to affirm that very few of the
methods try to define a general methodology for the analysis of PMSs; most of them re-
main tied to the study of the particular example they present, and it has not been easy
to figure out how to apply their results to the modelling and evaluation of our example.
The studies that have appeared in the literature may be roughly classified in two major
groups, on the basis of the approach used to deal with the changes in the structure of the
system. These studies consider the definition and the solution of either a global model in-
cluding all phases as proposed in [3, 7, 31, 79] or the definition of a distinct model for
each phase of the system and a separate evaluation for each of these models as in [5, 80].
The hierarchical method of Meyer et al. [56] represents the sole exception. In the following,
we briefly explain the advantages and disadvantages of the two approaches.
The single/separate modelling of the phases is a key point that impacts most of the other
aspects: it affects the understandability of the models, the reusability and flexibility of
previously built models, the modelling of dependencies among phases and the complexity
of the solution steps. The single model approaches suffer from a lack of reusability: a
new model needs to be built if the behaviour of the system in any phase is changed or if
the phase order is changed. However, this task is considerably simpler for the SAN model
of PMS than for all the other Markov chain based approaches. As an advantage, the sin-
gle-model approach allows the exploitation of similarities among phases to obtain a com-
pact model. It is indeed often the case that the set of possible actions of the system
within a given phase is a subset of the actions performed in other phases. If we consider
our example of PMS, it could hence be convenient to build a single model in which all the
phases are properly “embedded”.
Conversely, the separate modelling of each phase allows the reuse of previously built
models of the phases. Moreover, it is easier to characterise the differences among phases,
in terms of different failure rates and different configuration requirements. The hierarchi-
cal approach combines the advantages of the single and separate modelling to keep phase
models small and easy-to-define and at the same time to provide different levels of ab-
straction at which the mission can be described and analysed.
Another relevant feature in analysing phased mission systems is represented by phase
duration. The choice between deterministic or random phase duration basically depends
on the features of the system to be analysed. The assumption of constant phase duration
seems to hold for a wide range of PMS applications, as for instance space applica-
tions, where phases are pre-planned on the ground. Moreover, this assumption enables an ex-
act solution of the models through the mapping of the transient state probabilities from
one phase to the next. When phases of random duration are to be considered the analysis
becomes more complicated. Exponential distributions do not appear to be a suitable way
to model phase duration due to their long tail behaviour, while using non-exponential ones
complicates the analysis and may lead to approximate solutions as in [3, 80], or may re-
quire the numerical solution of the associated set of differential equations, as in [79], or a
simulative solution as in [7].
Method                        Type of modelling   Phase duration   Treatment of dependencies   Complexity (upper bound)
Meyer et al. 1979 [56]        hierarchical        fixed            interphase matrices         O( Σ_{i=1..n} Ci² qi τi )
Arlat et al. 1986 [5]         separate models     fixed            by hand                     O( Σ_{i=1..n} Ci² qi τi )
Alam et al. 1986 [3]          single model        random           implicit                    O( C² q Σ_{i=1..n} τi )
Smotherman et al. 1989 [79]   single model        random           implicit                    O( n² C² q Σ_{i=1..n} τi )
Aupperle et al. 1989 [7]      single model        random           implicit                    simulation
Dugan 1991 [31]               single model        fixed            implicit                    O( C² q Σ_{i=1..n} τi )
Somani et al. 1992 [80]       separate models     random           interphase matrices         O( Σ_{i=1..n} Ci² qi τi )

Table 3.1: Markov chain based methods for PMS
The treatment of dependencies among phases requires a mapping of probabilities from
states of a phase to states of the next phase, and represents a relevant issue to be ad-
dressed. The single model permits representing implicitly the mapping of probabilities
from one phase to another phase because phase changes and state changes are modelled
together. However, this joint modelling may result in a non-homogeneous Markov model,
as in [31, 79]. On the contrary, the separate approach requires us to explicitly perform
the mapping from state to state for each phase change. This job is conceptually simple
but can be cumbersome and becomes a potential source of errors for large models. In [56,
80] the mapping is realised through proper interphase matrices, whereas in [5] the map-
ping is carried out manually.
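The explicit state-to-state mapping can be encoded as a matrix M whose entry M[s, s′] gives the probability that a system in state s at the end of one phase starts the next phase in state s′. A toy sketch follows; the state labels and the merging rule are our own illustration, not taken from any of the surveyed methods.

```python
import numpy as np

# Phase A distinguishes 3 states (2, 1 or 0 working processors); phase B's
# failure criterion only distinguishes "operational" (>= 1 working) from
# "failed", so two of phase A's states merge into one state of phase B.
states_A = [2, 1, 0]
states_B = ["operational", "failed"]
to_B = {2: "operational", 1: "operational", 0: "failed"}   # deterministic map

M = np.zeros((len(states_A), len(states_B)))
for i, s in enumerate(states_A):
    M[i, states_B.index(to_B[s])] = 1.0      # each row sums to 1

pi_end_A = np.array([0.7, 0.2, 0.1])         # final distribution of phase A
pi_start_B = pi_end_A @ M                    # initial distribution of phase B
print(pi_start_B)                            # -> [0.9, 0.1]
```

Building such matrices systematically, rather than by hand as in [5], is precisely what removes this source of errors for large models.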
To carry out a comparison of the computational complexity that the various methods re-
quire, we consider an estimation of the cost required to evaluate the Reliability at mission
completion time R for the four phase mission of our example of PMS. Since none of the
considered methods is able to deal with the flexibility characteristics, we neglect such fea-
tures of the mission and assume that the four phases are sequentially performed. For the
sake of generality, we give the computational complexity expressions for a mission having
n phases; letting n = 4 provides the values for this particular example. Moreover, to
provide a consistent example that can be analysed with all the different methodologies,
we suppose that phase i has a constant duration of τi units of time, i = 1, 2, 3, 4, and
also consider a fixed, time-invariant repair rate μ for the time to repair, which corre-
sponds to an exponential distribution of the repair activity duration.
Suppose phase i is modelled by a Markov chain whose state space Si has Ci states,
i = 1, 2, …, n. Both the methods in [3] and in [31] solve a unique Markov model whose
state space is the set given by the union of the single state spaces of each phase. The size
C of the unified state space is at least max_{i=1,…,n} Ci, and can be at most Σ_{i=1..n} Ci. The
single Markov chain is solved n times for n phases. If the uniformization method [68] is
used to evaluate the transient probability vector, the overall complexity of the solution is
O( C² q Σ_{i=1..n} τi ), where q is defined as the maximum absolute value among the entries
of the generator matrix of the whole Markov model. The method of Smotherman and
Zemoudeh [79] requires the solution of a set of nC differential equations. The authors
adopt a Runge-Kutta solution algorithm, at a cost bounded by O( n² C² q Σ_{i=1..n} τi ),
where q is defined as above. The complexity of the solution for the method in [7] can not
be compared to the others listed in Table 3.1, because the SAN model is to be solved by
simulation. Therefore, the computational cost is affected by a number of factors, such as
the width of the confidence intervals, the method used for the statistical analysis, etc. The
hierarchical method in [56], as well as those that adopt the separate modelling approach
[5, 80], require a number of operations which is dominated by O( Σ_{i=1..n} Ci² qi τi ),
where qi is the maximum absolute value among the entries of the Markov chain generator
matrix of phase i. This computational complexity grows linearly with the number of phases.
Therefore, the methods that use separate solution of the phase models are, in general,
more efficient than the ones that use a combined model. Note that all the computational
costs reported here are to be intended as upper bounds. In fact, the matrices obtained are
quite often sparse, thus allowing cheaper solution techniques [68].
Just to give a flavour of the relative efficiency of the various methodologies, we show in
Figure 3.1 the plots of the computational complexities as functions of n, by fixing the
values of the other model parameters. Precisely, we set the state space cardinalities
Ci = 5, the maximum absolute value of the generator matrix entries q = qi = 1, and the
phase durations τi = 1, i = 1, 2, …, n. We assume that C = Ci( n + 1) / 2 holds for the
methods that adopt the single modelling approach, that is, we suppose that the union of
the various state spaces of the PMS over the n phases results in the average cardinality
between the minimum value Ci and the maximum one nCi.
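With these parameter values fixed, the three complexity bounds can be tabulated directly; the sketch below evaluates them for missions of n phases (the function names are ours, chosen for readability).

```python
# Upper-bound complexities from the comparison, with Ci = 5, q = qi = 1,
# tau_i = 1 and C = Ci * (n + 1) / 2 for the single-model approaches.
def separate_models(n, Ci=5, qi=1.0, tau=1.0):
    """Separate/hierarchical approaches: O( sum_i Ci^2 qi tau_i )."""
    return n * Ci**2 * qi * tau

def single_model(n, Ci=5, q=1.0, tau=1.0):
    """Single-model approaches of [3, 31]: O( C^2 q sum_i tau_i )."""
    C = Ci * (n + 1) / 2.0
    return C**2 * q * n * tau

def smotherman(n, Ci=5, q=1.0, tau=1.0):
    """Runge-Kutta solution of [79]: O( n^2 C^2 q sum_i tau_i )."""
    return n**2 * single_model(n, Ci, q, tau)

for n in (1, 4, 10):
    print(n, separate_models(n), single_model(n), smotherman(n))
```

For n = 10 the bound for the separate modelling approaches stays in the hundreds, while that of Smotherman and Zemoudeh approaches 10⁶, which is the gap visible in the plot.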
[Log-scale plot: computational complexity (from 1.00E+00 to 1.00E+06, logarithmic vertical axis) versus the number of phases n (1 to 10), comparing the separate modelling approaches, the single-model methods of Dugan and of Alam and Al-Saggaf, and the method of Smotherman and Zemoudeh.]
Figure 3.1: Comparison of the computational complexities for PMS methodologies
Notice that the values of the computational complexities on the vertical axis follow a log-
arithmic scale. Therefore, the advantage of the separate modelling approach in terms of
solution efficiency is very relevant. This point of the comparison thus represents a dis-
criminating aspect for the choice of a solution methodology. In particular, the computa-
tional cost required by the single modelling approach of Smotherman and Zemoudeh
makes it practically feasible only for very simple PMSs.
3.3 Open issues and objectives of the thesis
From the review and the subsequent comparison carried out in the previous subsections,
we can conclude that several aspects of PMSs still represent a source of formidable
problems for the dependability modelling and analysis techniques that have been applied
for their solution.
The main concern for a dependability engineer dealing with the analysis of a PMS, is rep-
resented by the choice of a proper method that is powerful enough to be applicable to the
particular system of interest, and at the same time provides accurate results with a limited
computational effort. Unfortunately, the set of available methodologies is not uniformly
distributed over the spectrum of the possible scenarios of PMSs. A few well-defined
methodologies have been proposed for particular classes of PMSs, such as the method of
Esary and Ziehms [34] and that of Somani and Trivedi [81] for non repairable systems.
When more general classes of PMSs are to be studied, as in the case of repairable PMSs
with random phase duration, the set of the applicable methods for an exact solution is
much narrower (Smotherman and Zemoudeh's method [79]) and the computational cost becomes ex-
tremely heavy. Definitely, there is a lack of a “scalable” method, that is a method able to
effectively deal with PMS instances of different complexity, from the simplest ones to
the most general cases, by adopting a well-defined coherent modelling and solution ap-
proach.
Moreover, the available methods very often result in intricate and hardly understand-
able models, where the phased structure of the system is either merged with the mass of
other modelling elements, as in the case of single-model approaches, or split and only
implicitly taken into account, as for the separate modelling approaches. Also, these
complicated models may easily become a potential source of errors. The problem is par-
ticularly relevant for the separate modelling approach, because the explicit handling of
phase dependencies is a cumbersome and error-prone task. In this respect, we believe that
the hierarchical method proposed by Meyer et al. [56] is a promising approach.
Combining different views of a PMS at various abstraction levels helps to keep in a clear
and easy to read way both the information about the profile of the mission and that re-
garding the state of the system components.
The methods that have appeared in the literature can not accommodate several interesting
features of PMSs. In particular, they are limited to considering PMSs with completely static profiles
of their mission. The phase duration is mostly assumed to be constant, and those few
studies that relax such hypothesis do not provide exact analytic solutions. The ordering
of phases is always considered as fixed and predetermined before the beginning of the
mission. However, from the discussion in the previous chapter, it is very easy to figure
out situations in which some phases of the mission can be skipped or prematurely
aborted, or the mission diverted to secondary objectives.
The ultimate goal of this thesis is the development of computationally efficient and gen-
erally applicable methodologies for the analysis of PMSs. We intend to alleviate the ad-
ditional complexity that is introduced by the phased behaviour of PMSs with respect to
State-of-the-art and open issues in the analysis of PMSs 49
the dependability modelling and evaluation tasks for single phased systems. Conciseness, understandability, modifiability and reusability of the models are some of the key aspects we consider mandatory for a methodology to be adequate for our purposes. We will take into account flexible missions of PMSs, and we will aim at removing the constraint of constant phase duration. Also, a limited computational effort of the analysis will be regarded as a discriminating point. The asymptotic computational complexity required for the separate modelling and solution of the PMSs inside the various phases is set as an upper bound for the computational complexity of an efficient methodology for the dependability analysis of PMSs.
To address these tasks, we employ various modelling strategies and tools in the second
part of this thesis. First, we introduce the method presented in Chapter 4, which is based
on a hierarchy of Markov models and, within the same restricted scenario of PMSs with constant phase duration, proposes several original elements that relieve some of the weak
points of the methods in the literature. In particular, we attack the problem of phase de-
pendencies, introducing a systematic modelling strategy that includes their explicit repre-
sentation in the form of specific submodels. Moreover, we perform a first step towards
the introduction of a flexible profile of the mission, allowing the dynamic choice of the
next phase to be performed at the time the preceding phase ends. The computational
complexity required for the solution of the models is of the same order as the threshold
one.
In Chapter 5 we present a different approach for the dependability modelling of PMSs,
based on the class of DSPNs. The method takes advantage of the power and expressive-
ness of DSPNs to define very concise and easy-to-read models for PMSs. Remarkably, it completely removes the need to explicitly handle phase dependencies, thus avoiding such a cumbersome and error-prone modelling task. At the same time, the computational
cost of the method is not increased with respect to the fixed threshold. Moreover, since
the phase-dependent behaviour of the system is described with a simple list of marking
dependent predicates, it becomes easy to rearrange and modify the model of the PMS.
Finally, the method is supported by the existence of analytical solutions which can be
fully automated. The solution efficiency greatly benefits from the splitting of Markov
chains for different phases. This way the computational cost needed to solve the overall
model is almost completely reduced to the cheaper problem of separately solving the
PMS model inside the various phases, and the computational complexity is of the same
order of magnitude as the one of the previously proposed cheap hierarchical method.
Very flexible mission profiles can be readily modelled and analysed. The proposed solution approach for the DSPN model makes it possible to perform the analytical sensitivity analysis of a PMS, a task that, to the best of our knowledge, has not been addressed yet in the literature. If some of the parameter values of the PMS model must be varied, for instance because they are subject to some estimation error, or because different scenarios of the system are of interest, a convenient strategy of analysis is to study the sensitivity of the model with respect to the parameter variations. The main limitation of the DSPN approach is that it does not widen the applicability with respect to the methods proposed in the literature. Indeed, the same restrictive assumptions on the PMS must be satisfied to permit the applicability of the DSPN modelling approach. In particular, the duration of the phases is once again restrained to be deterministic.
The last methodology we present in Chapter 6 of this thesis has the specific goal of en-
larging the class of PMSs that can be analytically solved. We propose the class of MRSPNs as a modelling tool for PMSs, and we derive a general, exact, and efficient analytic solution method. We show that the MRSPN models are able to deal with the complex and dynamic behaviours of PMSs, representing in quite a natural way all those scenarios of PMSs previously treated by the various studies in the literature. Moreover, we show how well the MRSPN models can deal with those features of PMSs that could not be accommodated by the methods that have appeared in the literature. In particular, we pro-
vide an exact analytic solution technique that can be applied to PMSs with a phase dura-
tion drawn from a random variable of general distribution. The solution for PMSs with
deterministic phase duration is obtained as a particular case. Once again, the computa-
tional complexity is reduced to the one needed for the separate solution of the PMS in-
side the different phases. Chapter 6 also presents the analytical derivation of the sensitiv-
ity functions for the MRSPN models of PMSs.
PART II:
New methodologies for the analysis of PMS dependability
Chapter 4
Hierarchical Markov models for PMSs
In this chapter we present an approach to the modelling of PMSs with constant phase
duration, based on the hierarchical combination of various Markov models which repre-
sent different abstraction level views of the system. An example of application of this
methodology can be found in [15].
Hierarchical models offer many advantages, and have been employed in many contexts for
the analysis of various classes of processing and communication systems. For instance,
Courtois and Semal have proposed in [28, 29] exact and bound methodologies based on
the solution of a hierarchy of homogeneous Markov chain models of the system. The
seven levels of the OSI model of the communication networks have been modelled with a
hierarchical approach by Conway in [27], while the hierarchical modelling of the single
levels has been presented by Perros et al. in [63], Reiser in [69], and Schwartz in [77]. Tai et al. in [83] have applied a hierarchical modelling approach to derive the optimal duty period for on-board preventive maintenance in a space application. The method proposed by Meyer et al. [56] (introduced in Chapter 2) is the sole example of a hierarchical modelling approach for the dependability modelling of a PMS, in this particular case an air-transport application.
Here, we mainly follow the basic ideas of a methodology that has been originally pro-
posed by Nelli et al. for the analysis of complex single phased systems operating in the
railway application field [60]. The use of a hierarchy of models represents a suitable means
to master the complexity of the analysis. Different submodels of various parts of the sys-
tem can be separately solved and the obtained results aggregated to define a more compact model of the system at a higher abstraction level. The iterative application of such a pro-
cedure alleviates the state explosion problem caused by the combinatorial growth of the
model size. At the same time, the savings in the computational complexity required for
the solution of the models permits the refined modelling of many features which usually cannot be taken into account due to the limited capacity of the solution tools. Very de-
tailed submodels of the most interesting parts of the system can be built and solved in
isolation.
We build a two level hierarchical model. The upper level describes the mission of the
PMS as a composition of phases, without detailing the internal behaviour of the system
within each single phase. This makes it easy to model a variety of mission scenarios by sequencing the phases in different ways. Moreover, we can consider missions in which a dynamic selection of the next phase to be performed can take place, thus allowing a probabilistic selection of alternative paths for the mission profile. Such a probabilistic choice of the mission has not yet been considered in the literature, to the best of our knowledge. The
parameters to be used in the upper level model are obtained by solving the lower level
models. These models (one for each phase) detail the behaviour of the PMS inside phases
and are built and solved separately from each other. This way, if a phase is repeated dur-
ing the mission, a model for that phase can be built once and reused when needed. To rep-
resent the dependencies among successive phases, we introduce an additional set of sub-
models, that we call Phase Transition Models (PTM), which describe in an explicit and
precise way the changes that occur in the PMS as phases alternate.
In the following, we first describe the two modelling levels of the methodology, and then
we detail the solution procedure of the models and evaluate the computational complexity required for the application of our hierarchical approach. After that, the application of
the methodology to the PMS taken as example is presented. Last, we give some remarks
to discuss the main advantages of the hierarchical modelling approach with respect to the
other proposals that have appeared in the literature.
4.1 The modelling approach
We present now the general guidelines for the hierarchical dependability modelling of
PMSs having fixed duration of the phases. Phases are numbered 1, 2, …, n and their durations are denoted with τ1, τ2, …, τn, respectively. The methodology we propose is an exact
one, where the only approximations introduced are those due to the computational solu-
tion algorithms. The two levels of the hierarchical modelling are first introduced and sub-
sequently applied for the analysis of the application example presented above.
4.1.1 Upper level model
The upper level is a concise model that represents the overall mission of the system as a
homogeneous discrete-time Markov chain. Since we intend to consider the dynamic selec-
tion of the mission profile, the PMS may in fact perform different missions. The upper
level model must take into account all the possible sequences of phases that can be per-
formed by the PMS.
The state space SU of the upper level model is generated by analysing the tree of all the
possible paths that the mission can follow. This tree contains more nodes than the number n of possible phases, in that some phases can be repeated along different paths. For each node belonging to the tree, a distinct state is included in SU. In this way, phases found in different subtrees are considered as distinct from one another. This distinc-
tion allows the memory of the past history of the system to be taken into account in its
further evolution. Two extra states, named “FAIL” and “STOP” are added to SU , which
represent the premature loss and the completion of the mission, respectively.
As for the state space SU, the one-step transition matrix PU of the upper level model is built from the analysis of the tree of the possible mission paths. For each direct edge linking phase i to phase j in the tree, a non-zero transition probability pi,j is assigned to the entry of PU identified by the row corresponding to phase i and the column corresponding to phase j. This transition probability represents the probability that the system can start the new phase j after phase i has successfully completed. For each state but STOP, a non-zero transition probability pi,F is assigned to the entry of PU identified by the row corresponding to phase i and the column corresponding to state FAIL, representing the failure of the PMS inside phase i, and the consequent failure of the whole mission. For each state i representative of a phase that is a leaf of the tree of the possible mission paths, a non-zero transition probability pi,S is assigned to the entry identified by the row corresponding to phase i and the column corresponding to state STOP, representing the probability of successful completion of the last phase of the mission. All the other entries of matrix PU are set to zero.
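As a small illustration of this construction, the following sketch builds the matrix PU for a hypothetical four-phase mission tree with one dynamic branching point. All branching probabilities are invented for illustration; in practice they would be returned by the lower level models.

```python
import numpy as np

# Hypothetical mission tree with four phase states (0-based indices),
# plus the two extra states FAIL and STOP.  The probabilities p_ij and
# p_iS below are illustrative stand-ins for the lower level results.
succ = {0: [(1, 0.95)],             # first phase -> second phase
        1: [(2, 0.50), (3, 0.45)],  # dynamic choice of the next phase
        2: [], 3: []}               # leaves of the mission tree
p_stop = {2: 0.93, 3: 0.90}         # p_iS for the leaf phases
n = len(succ)
FAIL, STOP = n, n + 1

P = np.zeros((n + 2, n + 2))
for i, out in succ.items():
    for j, p in out:
        P[i, j] = p                 # p_ij: phase j follows phase i
    if not out:
        P[i, STOP] = p_stop[i]      # leaf: successful mission completion
    P[i, FAIL] = 1.0 - P[i].sum()   # residual mass: failure inside phase i
P[FAIL, FAIL] = P[STOP, STOP] = 1.0  # the two extra states are absorbing
```

Each row of PU thus sums to one, with the failure probability of a phase obtained as the residual mass.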
The transition probabilities that constitute the entries of matrix PU are obtained by solving the lower level models. It is worthwhile observing that the upper level model does not represent the stochastic dependencies among the various phases, arising from the usage of the same system architectural components in all the phases of the mission. These dependencies will be properly taken into account by the models of the lower level.
A reward wi can be associated to each state i of the upper level model, representing the
benefit obtained by the system with the execution of phase i. Clearly, in a given mission, a reward may be associated to only a subset of the phases, and, as will be shown later, even phases having an associated reward might be completed while bringing no benefit at all. The reward wi can be either directly assigned to phase i at the upper level, or it can
be obtained from the solution of the lower level model of phase i . In this latter case, more
detailed reward structures can be considered, as it will be discussed in the following.
4.1.2 Lower level models
The models of the lower level explore the behaviour of the PMS inside the various
phases. We choose to model each phase separately, as in the methodologies presented in [5, 56, 80], which have been described in Chapter 1. The standard dependability modelling
techniques [2, 6, 9, 13, 29, 32, 41, 43, 44, 47, 49, 52, 60, 77, 85] can be applied to model
the behaviour of a PMS inside a single phase, where all the phase-dependent parameters
stay constant. Notice that, if a phase is repeated during the mission, then it will be mod-
elled only once, and the model reused whenever it is needed.
A model for a phase is built by using a continuous-time homogeneous Markov chain. The
state space Si represents the set of operational configurations admitted for phase i ,
i = 1, 2, …, n of the PMS. All the configurations that meet the failure criteria for the phase are mapped into an absorbing failure state of Si, that is, a state having no outgoing transitions. If multiple failure modes are possible, then a set of failure states can be defined, to
keep track of the events that lead to the failure of the phase. A reward structure can be
defined at the level of phase models, by assigning reward values to the states or to the
transitions of the Markov chain.
The transitions among states of the model of phase i are defined by the transition rate
matrix Qi, i = 1, 2, …, n. These transitions represent the events that may affect the PMS dependability, such as the occurrence of faults, repairs, or the diagnosis of failed components.
Notice that we can equivalently build a separate model of each phase with SPN or GSPN models. Such a higher level modelling would allow the verification of the correctness of the
modelling, through the check of various structural properties of the Petri net models, to
increase the confidence in the modelling itself. The chosen reward structure could be de-
fined at the level of the Petri net, over the markings of the model. Very importantly, the
Petri net phase models can be automatically transformed into the associated continuous-
time homogeneous Markov chains. This automatic transformation also translates the re-
ward structure over the states of the corresponding Markov chain. Since Petri net models are much more expressive than Markov chains, this transformational approach is very convenient in terms of ease of modelling. However, to adopt a consistent
modelling approach and avoid the introduction of too many different models, in the fol-
lowing we directly define the Markov models of phases.
Due to the very nature of homogeneous Markov chains, the transitions among states of a phase model occur after exponentially distributed times. This implies that the time needed to perform any activity of the PMS inside a phase will be modelled with an exponential transition in the Markov chain model of that phase. This is the most severe
constraint that limits the applicability of Markov chains. Whenever an activity having
non-exponential duration is modelled with an exponential transition, an approximation is
introduced in the model and the dependability results can be significantly different from
the exact ones. Several techniques exist to limit this error, such as phase expansion [30], which uses a sequence of exponential stages to approximate a non-exponential random variable.
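The phase expansion idea can be illustrated with a short simulation; the mean and stage count below are arbitrary, not taken from the example PMS.

```python
import numpy as np

# Phase expansion sketch: a duration with mean m is approximated by k
# exponential stages of rate k/m in series (an Erlang-k variable).  The
# mean stays m while the squared coefficient of variation is 1/k, so
# increasing k approaches a deterministic duration.
rng = np.random.default_rng(0)
m, k = 10.0, 50                        # illustrative mean and stage count
samples = rng.exponential(m / k, size=(100_000, k)).sum(axis=1)

print(samples.mean())                  # close to the target mean m
print(samples.std() / samples.mean())  # coefficient of variation ~ 1/sqrt(k)
```

With k = 50 stages the coefficient of variation drops to about 0.14, at the price of multiplying the number of states of the phase model.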
Once the Markov chain models of all the phases have been built, n separate models de-
scribing the evolution of the PMS are available. As explained in Chapter 2, these n models are mutually dependent, and they need to be properly linked to correctly represent the time-dependent behaviour of the PMS over the whole mission. For instance, consider the first phase of the mission. The evolution of the PMS inside this phase, that is, during the time interval [0, τ1), is described by the evolution of the corresponding Markov chain phase model. Starting from the initial state occupation probabilities, the time-dependent solution of the model can be computed according to Equation (1.2), for any time instant t up to the phase ending time τ1.
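As a sketch of this computation, the following code evaluates the transient solution of a phase model by uniformisation, a standard way to compute π(0)·exp(Qt) without forming the matrix exponential directly; the small three-state phase model is invented for illustration.

```python
import numpy as np

def transient(pi0, Q, t, tol=1e-12):
    """pi(t) = pi(0) exp(Q t) for a CTMC, computed by uniformisation."""
    q = max(-Q.diagonal())              # uniformisation rate
    P = np.eye(len(Q)) + Q / q          # subordinated DTMC
    term = np.array(pi0, dtype=float)   # holds pi(0) P^k
    weight = np.exp(-q * t)             # Poisson(q t) mass at k = 0
    out = weight * term
    k = 0
    while weight > tol or k < q * t:
        k += 1
        term = term @ P
        weight *= q * t / k
        out = out + weight * term
    return out

# Illustrative phase model: two active processors with failure rate lam
# each; the third state is the absorbing failure state of the phase.
lam, tau = 1e-3, 100.0
Q = np.array([[-2 * lam, 2 * lam, 0.0],
              [0.0, -lam, lam],
              [0.0, 0.0, 0.0]])
pi_tau = transient([1.0, 0.0, 0.0], Q, tau)   # pi_i(tau_i)
```

For this pure-death chain the first component of the result equals exp(-2λτ), which can be used as a sanity check of the implementation.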
At the time τ1, several changes may occur in the PMS, as described in Chapter 1, and the
future evolution of the PMS is represented by the Markov chain model corresponding to
the second phase. Due to the sharing of the architectural components, the initial state oc-
cupation probability for this next model are dependent from the final ones of the first
phase model. Since this next phase has in general a state space different from that of the
previous one, a mapping procedure is necessary to assign the initial state occupation
probabilities. Such a mapping procedure must be performed for any pair of consecutive
phases.
As we already pointed out in Chapter 2, the mapping of the state occupation probabili-
ties is conceptually simple, but becomes a tedious task and a potential source of errors.
Indeed, the mapping is to be performed at the level of the phase model state space, which
easily becomes a huge set for a complex PMS. To attack the problem with a systematic
approach, we define specific submodels for phase changes, which address the treatment
of dependencies between phases, by explicitly representing the mapping between the
state spaces of consecutive phases.
These submodels are called Phase Transition Models, abbreviated as PTMs in the follow-
ing. PTMs are defined in terms of the state space of the Markov chains. PTMs map the
state occupation probabilities of the PMS at the end of a phase into the initial occupation
probabilities of the successive phase. Also, PTMs account for the configuration require-
ments of the next phase, and for any instantaneous activities that may need to be per-
formed at phase change times. The initial occupation probabilities of the next phase are normalised with respect to the overall probability of starting the phase.
To better explain the role of PTMs, consider the case when the PMS completes phase i
and phase j is the one to be executed next. Let us denote with i1, i2, …, ih the elements of the state space Si of the Markov chain underlying the model of phase i, and with j1, j2, …, jk those of the state space Sj of the model of phase j. In this case, where there is only one possible next phase, we model the phase change by what is called a Deterministic PTM, which is shown in Figure 4.1.
The mapping between states takes into account the instantaneous activities that are executed at the phase change and their associated probabilities. For instance, suppose that the system is in state i1 at the time when phase i ends, and a reconfiguration must take place before starting phase j. Assume that this reconfiguration can either succeed or fail, with probability a and b, respectively, and that according to the outcome a different system configuration is reached in the next phase. The two arcs leaving state i1 in the PTM of Figure 4.1 model exactly this feature. The occupation probabilities of states that are failure states either for phase i or for phase j are not mapped by the PTM.
[Figure: states i1, i2, …, ih of phase i mapped by labelled arcs a, b, c, d onto states j1, j2, …, jk of phase j.]
Figure 4.1: Deterministic PTM for the transition from phase i to phase j
The mapping modelled by the PTM can be translated into an associated transition prob-
ability matrix Mi, j , with h rows and k columns, whose entries give the transition prob-
abilities for each pair of states ( u,v ), u ∈Si and v ∈S j , that is the probability on the
corresponding arc of the PTM. Matrix Mi,j is shown in Figure 4.2, where only the non-zero entries related to the four arcs depicted in Figure 4.1 are represented. As we will see in the following, the matrix form of the PTM is particularly useful during the evaluation of the models. Conversely, its pictorial representation is more suitable for the modelling, in that it helps in understanding the dependencies among phases. The clarity of the PTM modelling is even more useful when dealing with phase changes that involve a choice of the phase to be performed next.
[Matrix Mi,j with rows indexed by i1, i2, …, ih and columns by j1, j2, …, jk; only the non-zero entries a, b, c, d corresponding to the four arcs of Figure 4.1 are shown.]
Figure 4.2: Transition probability matrix associated with a PTM
Let us suppose that after phase i, either phase j or phase m can be performed, depending on some conditions related to the current system configuration. These conditions can be used to control the mapping of the probability distribution among the states of successive phases. Let i1, i2, …, ih denote the elements of the state space Si of the Markov chain underlying the model of phase i, and j1, j2, …, jk and m1, m2, …, ml those of the state spaces Sj and Sm of the models of phases j and m, respectively. The choice between two possible phases only is modelled here for the sake of clarity, but the extension to more than two phases is straightforward. We model this phase change with a probabilistic PTM, shown in Figure 4.3.
[Figure: states i1, …, ih of phase i mapped by labelled arcs a, b, c, d, e onto states j1, …, jk of phase j and m1, …, ml of phase m.]
Figure 4.3: Probabilistic PTM modelling the dynamic choice of a phase
The associated transition probability matrix Mi,j,m of the probabilistic PTM is shown in Figure 4.4. Matrix Mi,j,m is defined as in the case of deterministic PTMs, except that the matrix has as many columns as the sum of the sizes of the state spaces of all the possible phases that may follow the current one. For instance, for the PTM in Figure 4.3, the transition probability matrix has h rows and k + l columns.
[Matrix Mi,j,m with rows indexed by i1, i2, …, ih and columns by j1, j2, …, jk, m1, m2, …, ml; only the non-zero entries a, b, c, d, e corresponding to the arcs of Figure 4.3 are shown.]
Figure 4.4: Transition matrix associated with a probabilistic PTM
A special PTM is built to represent the transition from the last phase executed to the fic-
titious STOP state. It simply maps the occupation probability of each non-failure state into a single state STOP that represents the successful completion of the mission.
4.2 Evaluation procedure
In our modelling framework, the evaluation is hierarchically performed, with a bottom-up solution procedure. First, the lower level models are sequentially solved following their ordering, as defined in the upper level mission model. The solution of the lower level models returns all the parameters needed to instantiate the upper level model, which is solved last to evaluate the measures of interest.
The lower level models are solved using the following iterative procedure. Start with the first phase of the mission, say phase i, and solve the Markov chain model with a proper vector πi(0) of initial state occupation probabilities, which for this initial step must be provided from outside. This initial vector represents the state of the system at the very beginning of operations. The Markov chain is solved according to Equation (1.2), to obtain the vector πi(t) of transient state occupation probabilities, for any 0 ≤ t < τi. Also, if a reward structure is defined for the phase model, the total reward wi cumulated during the phase is evaluated according to Equation (1.4), and reported at the upper level to instantiate the discrete-time Markov chain model of the mission.
By evaluating πi(τi), the state occupation probabilities at the end of the phase, we obtain the input needed to solve the PTM. The solution of a PTM depends on its nature. Suppose first that we are considering a deterministic PTM, that is, only phase j can be executed after phase i. Let us recall that Mi,j denotes the transition probability matrix associated with the PTM modelling the phase change from i to j, and let k be the number of states of the Markov chain for phase j. We first calculate the probability pi,j of starting the next phase as follows:
pi,j = πi(τi) · Mi,j · ej ,    (4.1)

where ej is a vector having k elements, whose components are all 1. Note that this scalar value is exactly the probability mass which is mapped by the PTM from phase i
to phase j. Probability pi,j can thus be reported at the upper level, together with the associated probability pi,F = 1 − pi,j. Then, the vector πj(0) of initial state occupation probabilities for the states of the Markov chain model of phase j is derived according to the following formula:

πj(0) = πi(τi) · Mi,j / (πi(τi) · Mi,j · ej) = πi(τi) · Mi,j / pi,j
Note that, via the division by pi,j, we are conditioning the initial probability of the subsequent phase on the probability of successfully starting the phase.
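A minimal numerical sketch of this solution step, with an invented three-state phase model and mapping matrix:

```python
import numpy as np

# Sketch of the deterministic PTM solution step (Equation (4.1)); all
# numbers are illustrative, not taken from the example PMS.
pi_end = np.array([0.90, 0.08, 0.02])    # pi_i(tau_i): end of phase i

# M_{i,j}: rows = states of phase i, columns = states of phase j.
# Rows may sum to less than 1: the missing mass models, e.g., a failed
# reconfiguration at the phase change.
M = np.array([[0.99, 0.01],
              [0.00, 0.95],
              [0.00, 0.00]])             # failure state of phase i: not mapped

p_ij = pi_end @ M @ np.ones(M.shape[1])  # probability of starting phase j
pi_next = (pi_end @ M) / p_ij            # conditioned initial vector pi_j(0)
p_iF = 1.0 - p_ij                        # reported to the upper level model
```

The division by p_ij implements the conditioning discussed above, so that pi_next is again a proper probability vector.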
The solution of a probabilistic PTM proceeds as follows. Let j1, j2, …, jr be the phases that could be performed after phase i, and let us denote with Mi,j1,…,jr the transition probability matrix associated with that probabilistic PTM. Moreover, let k1, k2, …, kr be the number of states of the Markov chains for phases j1, j2, …, jr, respectively. We first evaluate the probability pi,F that none of the phases j1, j2, …, jr starts, as follows:

pi,F = 1 − πi(τi) · Mi,j1,…,jr · ej1,…,jr ,    (4.2)

where ej1,…,jr is a vector with as many elements as the number of columns of Mi,j1,…,jr, and whose components are all 1. This probability is reported at the upper level. Note that πi(τi) · Mi,j1,…,jr · ej1,…,jr is exactly the probability mass which is mapped by the probabilistic PTM to the subsequent phases. At this point, to evaluate the initial state probability distributions πj1(0), πj2(0), …, πjr(0), we obtain the following fictitious state occupation probability vector π*(0):

π*(0) = πi(τi) · Mi,j1,…,jr
The probabilities pi,j1, pi,j2, …, pi,jr of starting each of the new phases are evaluated by properly partitioning vector π*(0). More precisely, denote with π*(0)|j1 the subvector of π*(0) that is related to the mapping between phase i and phase j1, which is the subvector formed by the first k1 elements of π*(0). Similarly, let π*(0)|j2 be the subvector with k2 components related to the mapping between i and phase j2, and so on, so that π*(0)|j1, π*(0)|j2, …, π*(0)|jr is a partition of π*(0) into consecutive subvectors, each of them representing the probabilities that are mapped by the transition to one of the possible next phases. The probability pi,j1 of starting phase j1, and the vector πj1(0) of initial state occupation probabilities for phase j1, are evaluated as follows:

pi,j1 = π*(0)|j1 · ej1 ,    πj1(0) = π*(0)|j1 / pi,j1

Similarly, we obtain the other probabilities and initial vectors for phases j2, …, jr.
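The partitioning step can be sketched as follows, again with invented numbers, for a phase i followed by two candidate phases with k1 = 2 and k2 = 3 states:

```python
import numpy as np

# Sketch of the probabilistic PTM solution: phase i may be followed by
# j1 (first 2 columns) or j2 (last 3 columns).  Numbers are illustrative.
pi_end = np.array([0.6, 0.3, 0.1])           # pi_i(tau_i)
M = np.array([[0.5, 0.0, 0.48, 0.00, 0.0],   # h rows, k1 + k2 columns
              [0.0, 0.4, 0.00, 0.55, 0.0],
              [0.0, 0.0, 0.00, 0.00, 0.0]])  # failure state: not mapped
k = [2, 3]                                   # state space sizes of j1, j2

pi_star = pi_end @ M                  # fictitious vector pi*(0)
p_iF = 1.0 - pi_star.sum()            # Equation (4.2)

blocks = np.split(pi_star, np.cumsum(k)[:-1])      # pi*(0)|j1, pi*(0)|j2
p_next = [b.sum() for b in blocks]    # p_{i,j1}, p_{i,j2}
pi_init = [b / s for b, s in zip(blocks, p_next)]  # normalised pi_j(0)
```

The probabilities p_iF and p_next sum to one, and each normalised subvector is a proper initial distribution for the corresponding candidate phase.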
Once the new vectors of initial probability distribution are known, all the phases can be
sequentially solved as described, and all the parameters needed by the upper level model
are made available. Solving the discrete-time homogeneous Markov model of the upper
level allows to derive the dependability measures of interest. The Reliability R at the
mission completion time is obtained from Equation (1.1), by evaluating the occupation
probability of state STOP. Similarly, we can obtain the Reliability Ri at the completion
time of phase i , i = 1,2,K,n . The performability is very easily obtained by applying
Equation (1.3).
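Once instantiated, the upper level model is solved by simple matrix-vector iterations; the following sketch computes R for an invented four-phase mission with one probabilistic branching (states 4 and 5 play the roles of FAIL and STOP).

```python
import numpy as np

# Illustrative upper level matrix P_U; in practice its entries p_ij,
# p_iF and p_iS come from the lower level solutions.
P = np.zeros((6, 6))
P[0, 1] = 0.95
P[1, 2], P[1, 3] = 0.50, 0.45
P[2, 5], P[3, 5] = 0.93, 0.90
for i in range(4):
    P[i, 4] = 1.0 - P[i].sum()   # residual mass goes to FAIL
P[4, 4] = P[5, 5] = 1.0          # FAIL and STOP are absorbing

pi = np.zeros(6)
pi[0] = 1.0                      # mission starts in the first phase
for _ in range(4):               # enough steps to absorb every path
    pi = pi @ P
R = pi[5]                        # mission reliability: P(absorbed in STOP)
```

Because the mission tree is acyclic, a number of iterations equal to its depth suffices for all the probability mass to be absorbed in FAIL or STOP.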
The complexity of the overall evaluation step is readily assessed by following the steps of the solution. The computational cost of solving one phase model is determined by the cardinality of the state space of its underlying Markov chain. If that state space consists of Ci different states, then a cost of O(Ci² qi τi) operations (multiplications) is needed to compute the vector of the transient state occupation probabilities at time τi, where qi is the maximum absolute value of the entries of the generator matrix Qi of phase i. Solving a deterministic PTM model requires a cost of O(Ci × Cj), where Cj is the cardinality of the state space of the next phase j, whereas a cost of O(Ci × (Cj1 + … + Cjn)) is needed for a probabilistic PTM from phase i to phases j1, …, jn having state space cardinalities Cj1, …, Cjn. Of course, since it can reasonably be expected that only a small set of phases is involved in probabilistic PTMs, the solution cost for all the PTMs is much smaller than that of the phase models. Thus, if n is the number of phases composing the mission (not all necessarily executed in an individual mission), the overall cost required to obtain the parameters to instantiate the upper level is dominated by O(Σi=1,…,n Ci² qi τi). If all the phase models have about the same number of states, then O(n · maxi=1,…,n Ci² qi τi) is a tight upper bound for this complexity. The solution of the upper level model only requires O(n³) steps, and is negligible compared to the other steps. Hence, the entire complexity of the solution can be estimated as O(Σi=1,…,n Ci² qi τi).
4.3 Application example
4.3.1 Modelling
In this section we completely carry out the modelling of the example PMS introduced in Chapter 2, according to the methodology described above. The hierarchical modelling methodology proposed here does not widen the applicability of the analytical solution approaches with respect to the other Markov based methods. Therefore, some approxi-
mations and simplifications of the PMS are needed, similarly as for some other method-
ologies among those presented in Chapter 3.
Since phases are only allowed to have a constant duration, we approximate the random
duration of phase i with the expected value τi of the distribution Fi( t ), for i = 1,2,4 .
The special case of phase 3 is discussed in the following. In this way, we are in fact sub-
stituting the original distribution Fi( t ) with the deterministic distribution of mean τi ,
the sole distribution we can analytically take into account with this methodology. Also,
the model of each phase is a homogeneous continuous-time Markov chain, and thus the
repair activities can only be modelled by approximating the actual distribution of the time
to repair with a constant rate μ . A proper value of μ could be provided by the inverse
of the expected value of the time necessary to complete the repair.
Hierarchical Markov models for PMSs 63
The flexibility foreseen for the PMS can be partially supported thanks to the upper level.
Precisely, we can easily represent the dynamic selection of the next phase to be per-
formed, which takes place at the end of phase 2. However, it is not possible to consider
the premature termination of phase 3 as a consequence of the degradation of the PMS
configuration. Let us recall that the expected duration of phase 3 is intended to be τ3 if
the number f of faulty processors remains 0 throughout the phase, and is reduced to
τ′3 as soon as a processor fails. Nevertheless, a fixed constant duration l must instead
be chosen for the phase, thus introducing a further approximation.
A possible heuristic way of computing a suitable value l for phase 3 duration is to sepa-
rately analyse the transient evolution of the model of the phase, to estimate the average
time τ_{f=0} the PMS spends in those states in which no processor is faulty, and the time
τ_{f>0} spent in those in which there are faulty processors, over a time interval [0, Δ].
Once this information is available, the value l can be evaluated according to the following
weighted sum:

l = τ3 · τ_{f=0} / ( τ_{f>0} + τ_{f=0} ) + τ′3 · τ_{f>0} / ( τ_{f>0} + τ_{f=0} ) = τ3 · τ_{f=0} / Δ + τ′3 · τ_{f>0} / Δ        (4.3)
Since the configuration degrading eventually leads to the failure of the PMS, the average
time τ_{f=0} spent in the states in which no processor is faulty depends on the value of Δ;
more precisely, the ratio τ_{f=0} / Δ tends to 0 as Δ grows. A proper value of Δ would
be given by the expected phase duration l, which is however the unknown. An iterative
method could be applied to successively evaluate Equation (4.3) for a series of values of
l; however, the convergence of the sequence is not guaranteed. Moreover, this approach
is only a heuristic one, which can be inadequate when conservative estimations of the
dependability attributes are needed. In this case, the pessimistic choice l = τ3 is the one
that results in the most stressing scenario for the PMS.
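The iterative evaluation of Equation (4.3) can be sketched as a fixed-point loop on l. The following Python fragment is only an illustration of the heuristic: the forward-Euler transient integration, the function names and the stopping rule are all our own assumptions, not part of the methodology.

```python
import numpy as np

def cumulative_sojourn(Q, p0, delta, steps=2000):
    """Expected time spent in each state of a CTMC with generator Q over
    [0, delta], by forward-Euler integration of the Kolmogorov equations."""
    dt = delta / steps
    p = p0.astype(float)
    sojourn = np.zeros_like(p)
    for _ in range(steps):
        sojourn += p * dt
        p = p + dt * (p @ Q)
    return sojourn

def phase3_duration(Q, p0, fault_free, tau3, tau3_red, iters=50, tol=1e-6):
    """Fixed-point iteration on Equation (4.3), starting from the pessimistic
    choice Delta = tau3. As noted in the text, convergence is not guaranteed."""
    l = tau3
    for _ in range(iters):
        s = cumulative_sojourn(Q, p0, l)
        t_f0 = s[fault_free].sum()             # time with no faulty processor
        l_new = tau3 * t_f0 / l + tau3_red * (l - t_f0) / l
        if abs(l_new - l) < tol:
            return l_new
        l = l_new
    return l
```

Here `fault_free` is the index set of the states with f = 0, and the sojourn time with f > 0 is obtained as l − τ_{f=0}, since the sojourn times over [0, l] sum to l.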
Consider now the upper level modelling. When the hierarchical methodology is applied to
the example of PMS, the Markov model shown in Figure 4.5 is produced. The state space
SU consists of four states 1, 2, 3, 4 representing the four phases of the PMS, plus the
additional state 4′, which is introduced because phase 4 appears in two distinct possible
paths of the mission, plus the FAIL and STOP states. The non-zero entries of the transi-
tion rate matrix PU correspond to those pairs for which an edge is represented in the
Markov chain model in Figure 4.5. The rewards associated to the two objectives of the
mission are indicated in boldface.
Figure 4.5: Upper level model for the application example
We now build the models of the lower level. Let us first consider the Markov chain mod-
els of the phases. The triple ( a, s, f ), where a represents the number of active proces-
sors, s the number of spare processors, and f the number of faulty processors, com-
pletely describes a PMS configuration. Therefore, we also use the notation ( a, s, f ) to
identify the possible states of the phase models. The state space Si of the Markov chain
model of phase i, i = 1, 2, 3, 4, is defined as the set of configurations that satisfy the
minimal configuration requirements, and that at the same time do not exceed the require-
ments of the ideal configuration. Also, each state space Si contains a single failure state,
namely state FAIL, that represents all the unfeasible configurations for phase i. The four
state spaces are as follows:
S1 = {( 3,1,0 ),( 3,0,1 ),( 2,0,2 ),( FAIL )}
S2 = {( 2,2,0 ),( 2,1,1 ),( 2,0,2 ),( 1,0,3 ),( FAIL )}
S3 = S4 = {( 3,1,0 ),( 3,0,1 ),( FAIL )}
The transition rate matrices of the various Markov chain models describe the effects of
the failures affecting the active processors, the reconfigurations with the consequent
switching on of the spare processors, and the repair of the failed ones. Figure 4.6 shows
the state transition diagram for the Markov chain model of phase 1.
Figure 4.6: State transition diagram for the Markov chain model of phase 1
Starting from the initial state ( 3,1,0 ), where there are no failed processors, the configura-
tion of the PMS degrades as a result of the failures affecting the active units. As soon as
the first active processor is hit by a fault, the insertion of the spare processor is attempted.
If the switching on succeeds, the ideal configuration with three active units is restored,
state ( 3,0,1 ); otherwise the degraded mode with only two operational units, state
( 2,0,2 ), is reached. While the PMS stays in this configuration, any further fault immedi-
ately brings it to failure, modelled by the transition to state FAIL.
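As a concrete illustration, the transient occupation probabilities of such a phase model can be computed by uniformization. The sketch below encodes the phase-1 chain with the rate structure as we read it from Figure 4.6 (in particular, the target states of the repair transitions are our interpretation of the figure), and sums the non-failure mass at time τ1:

```python
import numpy as np

def transient(Q, p0, t, eps=1e-12, kmax=10_000):
    """Transient distribution pi(t) = pi(0) exp(Qt) of a CTMC, by uniformization."""
    q = -Q.diagonal().min()             # uniformization rate
    P = np.eye(len(Q)) + Q / q          # DTMC kernel of the uniformized chain
    term = p0.astype(float)
    result = np.zeros_like(term)
    poisson, weight = np.exp(-q * t), 0.0
    for k in range(1, kmax + 1):        # Poisson-weighted sum of DTMC steps
        result += poisson * term
        weight += poisson
        if weight >= 1.0 - eps:
            break
        poisson *= q * t / k
        term = term @ P
    return result

# Phase-1 chain of Figure 4.6; state order: (3,1,0), (3,0,1), (2,0,2), FAIL.
lam, c, mu, tau1 = 1e-4, 0.99, 5e-2, 50.0
Q1 = np.array([[-6*lam,  6*lam*c,       6*lam*(1-c),   0.0],
               [ mu,    -(mu + 6*lam),  6*lam,         0.0],
               [ 0.0,    mu,           -(mu + 4*lam),  4*lam],
               [ 0.0,    0.0,           0.0,           0.0]])
pi1 = transient(Q1, np.array([1.0, 0.0, 0.0, 0.0]), tau1)
p12 = pi1[:3].sum()   # probability mass later mapped by the PTM to phase 2
```

The quantity `p12` anticipates the computation of the phase-transition probability discussed below for the PTM from phase 1 to phase 2.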
The state transition diagram for the model of phase 2 is shown in Figure 4.7. Notice
that for this phase it is possible to have multiple reconfiguration attempts. For instance,
suppose the PMS is in configuration ( 2,2,0 ), and a fault hits a processor. The insertion
of a spare unit is instantaneously attempted, and should it fail, the reconfiguration is
immediately retried with the insertion of the second spare. The transition linking state
( 2,2,0 ) to state ( 1,0,3 ) models the occurrence of the fault and the failure of both con-
secutive spare insertions.
Figure 4.7: State transition diagram for the Markov chain model of phase 2
The Markov chain models for phases 3 and 4 are equivalent to each other, except for
the transition rates related to the fault occurrence process. Thus, we only show in Figure
4.8 the state transition diagram for the model of phase 3.
Figure 4.8: State transition diagram for the Markov chain model of phase 3
Consider now the PTMs modelling the phase changes. The deterministic PTM for the
transition from phase 1 to phase 2 is shown in Figure 4.9. This PTM is a very simple one
because phase 2 is less demanding than phase 1, and the only reconfigurations needed as
the next phase starts are represented by the switching off of the extra processors. These
operations are always successfully performed, as indicated by the probabilities on the
corresponding arcs of the PTM.
Figure 4.9: Deterministic PTM for the transition from phase 1 to phase 2
The state occupation probability of state FAIL of phase 1 is not mapped by the PTM,
because it represents a failure probability for the whole mission, and is thus reported at
the upper level to instantiate probability p_{1,F}. The dual probability p_{1,2} to be reported
at the upper level is the sum of the probability mass mapped by the PTM to the next
phase, and according to Equation (4.1) is given by

p_{1,2} = π^1( τ1 ) · M_{1,2} · ( 1,1,1,1,1 )^T = π^1_{(3,1,0)}( τ1 ) + π^1_{(3,0,1)}( τ1 ) + π^1_{(2,0,2)}( τ1 )
Notice that a null initial occupation probability is assigned by the PTM to the two states
( 1,0,3 ) and FAIL of phase 2.
The more interesting probabilistic PTM modelling the selection of the next phase to be
performed at the time phase 2 ends is shown in Figure 4.10. The starting criterion given in
Table 2.1 prescribes that the secondary objective phase 3 must be performed only if there
are no faulty processors at the time phase 2 ends. Therefore, the PTM only maps the oc-
cupation probability of state ( 2,2,0 ) into the states of phase 3. However, it could be the
case that the insertion of the spare processor needed to reach the required operational
configuration for phase 3 fails. In this case, phase 3 does not start, and phase 4 is per-
formed instead, provided that the other spare can be inserted, according to the specifica-
tion given in Table 2.1.
Figure 4.10: Probabilistic PTM for the transition from phase 2 to phase 3 or 4
All the mass probability not mapped by the PTM is summed up according to Equation
(4.2), to evaluate probability p2,F of the upper level model.
Figure 4.11: Deterministic PTM for the transition from phase 3 to phase 4
The deterministic PTM modelling the transition from phase 3 to phase 4, shown in Figure
4.11, is very simple because the two phases have exactly the same requirements.
4.3.2 Evaluation
The evaluation of the models proceeds as explained in Section 2. The initial state occupa-
tion probability vector π^1( 0 ) for phase 1 is the one that assigns probability 1 to the
state ( 3,1,0 ); that is, we suppose that the PMS starts its activities with all healthy pro-
cessors. The initial state occupation probability vectors for the subsequent phases are ob-
tained from the solution of the PTM submodels.
Solving the upper level model we can evaluate the measures of interest for this particular
PMS. The Reliability R at the mission completion time is simply the occupation prob-
ability of state STOP at the fourth step, when all the phases have been executed. To eval-
uate the Reliability Ri of the PMS at the time phase i ends, we must compute the occu-
pation probabilities at step i of all the non failure states that are reached in one step from
the state representing phase i . For instance, R2 is computed as the sum of the occupa-
tion probabilities of states 3 and 4 at the second step. The total reward cumulated during
the mission of the PMS is computed according to Equation (1.3) of Chapter 1.
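The upper-level computation just described can be sketched as a few steps of a discrete-time Markov chain. In the fragment below, the chain structure follows Figure 4.5, but all the numeric transition probabilities are placeholders of our own choosing; in the methodology they come from the solutions of the phase models and PTMs.

```python
import numpy as np

# Upper level DTMC; state order: 1, 2, 3, 4, 4', STOP, FAIL.
S1, S2, S3, S4, S4p, STOP, FAIL = range(7)
# Placeholder values for the phase-transition probabilities (assumptions):
p12, p23, p24, p34p, p4S, p4pS = 0.995, 0.70, 0.29, 0.99, 0.99, 0.99

P = np.zeros((7, 7))
P[S1, S2], P[S1, FAIL] = p12, 1 - p12
P[S2, S3], P[S2, S4], P[S2, FAIL] = p23, p24, 1 - p23 - p24
P[S3, S4p], P[S3, FAIL] = p34p, 1 - p34p
P[S4, STOP], P[S4, FAIL] = p4S, 1 - p4S
P[S4p, STOP], P[S4p, FAIL] = p4pS, 1 - p4pS
P[STOP, STOP] = P[FAIL, FAIL] = 1.0          # absorbing states

pi = np.zeros(7); pi[S1] = 1.0
for step in range(1, 5):
    pi = pi @ P
    if step == 2:
        R2 = pi[S3] + pi[S4]   # Reliability at the end of phase 2
R = pi[STOP]                   # mission Reliability at the fourth step
```

As in the text, R2 sums the occupation probabilities of the non-failure states reachable in one step from state 2, and R is the occupation probability of STOP once all phases have been executed.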
We perform the evaluation of the example of PMS in terms of the Reliability Ri at phase
i completion time. The values of the model parameters are specified in Table 4.1, where
we only specify the mean values of the distributions involved in the definition of the
PMS. All the values are expressed in terms of the unit of time considered, which is the
hour.
Phase durations: τ1 = 50 h, τ2 = 1000 h, τ3 = 100 h, τ′3 = 2 h, τ4 = 200 h
Failure rate: λ = 10^-4 / h
Successful insertion probability: c = 0.99
Mean repair rate: μ = 5 · 10^-2 / h
Table 4.1: Parameter values for the evaluation of PMS Reliability
The value l = τ3 is selected for phase 3. Figure 4.12 shows the values of the Reliability
Ri at phase completion time, i = 1, 2, 3, 4. The Reliability of each phase is computed by
averaging the various Reliabilities along the different paths of the mission. Thus, we are
only taking into account the original phases of the PMS, without providing a separate
Reliability value for the fictitious phases that have been introduced to keep the memory
of the past history of the PMS, such as phase 4′ in the upper level model. The Reliability
R4 shown in Figure 4.12 is exactly the Reliability R at mission completion time.
Figure 4.12: Reliability at phase completion times
Figure 4.13: Reliability comparison for the two values of l (l = 2 hours and l = 100 hours)
Moreover, to quantify the effect of the approximation introduced by the constant dura-
tion l assumed for phase 3, we also evaluate the Reliability Ri, i = 1, 2, 3, 4, in the case
we set l = τ′3. The comparison of the Reliability results for the two possible values of l
is shown in Figure 4.13. Notice how relevant the difference between the results for the
two values of l is.
We also evaluate the reward cumulated during the mission of the PMS for the two differ-
ent values of phase 3 duration considered above, and for various values of the probability
c of successful spare insertion. The total cumulated rewards shown in Figure 4.14 are
normalised with respect to the maximum possible value 11w, which is obtained by the
PMS with the successful execution of both phase 3 and phase 4, the two objectives of the
mission. This normalised reward expresses the accomplishment level of the PMS with re-
spect to the reward structure defined for the mission. We can observe from the plots in
Figure 4.14 that the cumulated reward is nearly insensitive to changes of c for values
greater than 0.99. On the contrary, as for the Reliability, varying the duration of phase 3
heavily affects the final results, and this influence is relevant for any value of c.
Figure 4.14: Normalised rewards as a function of c, for l = 2 hours and l = 100 hours
4.4 Concluding remarks
The idea underlying this methodology is to take advantage from the separate modelling
and solution of the phase models, and at the same time to build, at a higher abstraction
level, a global model of the mission. This hierarchical approach simplifies the modelling
Hierarchical Markov models for PMSs 71
activity. The separate modelling of the different phases turns out in self-contained and
easy-to-define models, and allows the dependability analyst to focus on the specific char-
acteristics of each single phase, selecting the more appropriate level of detail for each
model.
The upper level can easily describe missions composed of any combination of phases
with pre-planned phase durations, thus allowing various mission profiles to be analysed
at a high abstraction level. Missions may include probabilistic choices of the next phases
to be performed, thus introducing a more flexible and dynamic planning of the mission
goals. The mission models can hence have a tree configuration. Many different missions
can be modelled and analysed by reusing and composing lower level models, which de-
scribe individual phases and deterministic as well as probabilistic transitions among
phases. Thanks to this modularity, changing the assumptions or conditions ruling a phase
change only requires changing a few submodels.
The hierarchical approach makes it possible to master the complexity of the analysis by
considering a separate resolution of the phases, and of the dependencies among phases
caused by the usage of the same system components in the different phases. Indeed, each
single phase model and PTM is quite simple compared to a model of the whole mission,
and can be evaluated at a low computational cost by using general purpose tools.
Moreover, the upper level makes it possible to perform a sensitivity analysis to under-
stand which phases are more critical for the success of a specific mission. Once such
bottleneck phases are identified, it is possible to establish requirements on the depend-
ability figures of each single phase to guarantee that the mission target is reached.
With respect to the limits of the other approaches that have been proposed in the litera-
ture, the hierarchical approach proposed here does not extend the class of PMSs that can
be analysed. Its applicability is still restricted to PMSs that have constant phase dura-
tions. However, some original aspects are introduced, which try to alleviate the weak
points of the methods in the literature.
The explicit modelling of the transitions among phases performed by the PTMs is a first
attempt to introduce a systematic treatment of the dependencies among phases of a PMS.
The use of PTMs results in a clean, easily understandable modelling. Even complex situ-
ations, such as the dynamic choice of the phase to be performed next, are modelled in a
concise and easy-to-read way.
The possibility of dynamically selecting the next phase to be performed represents a first
extension towards the analysis of more flexible missions of PMSs. None of the methods
previously presented in the literature allows the introduction of such flexibility into the
models. Notably, dealing with this feature does not add significant complexity, either to
the modelling or to the evaluation step.
The overall computational complexity required for the application of the hierarchical
modelling methodology does not exceed the upper bound we set in Chapter 3, its asymp-
totic cost being exactly the same as that of the other methods that adopt a separate mod-
elling of the various phases. Moreover, the solution of the models can be easily auto-
mated, by exploiting the features made available by the tools for the automated evaluation
of system dependability.
Chapter 5
DSPN models for PMSs
In this chapter we propose another modelling and evaluation approach for PMSs having
constant phase duration, based on the deterministic and stochastic Petri nets (DSPNs). A
complete example of application of this methodology can be found in [15].
As we have described in Chapter 1, the class of DSPN models allows timed transitions
with exponentially distributed firing times, immediate transitions, and includes
deterministic transitions as well. Moreover, the DSPNs allow a very concise modelling of
even quite complex systems, through the use of guards on transitions, timed transition
priorities, halting conditions, reward rates, etc. Due to their high expressiveness, DSPNs
are able to cope with the dynamic structure of the PMSs in a much better way than the
lower level Markov chains can do. In this respect, they offer a suitable means for a
systematic formulation and solution of PMS models.
The DSPN approach to the PMSs offers many advantages, concerning both the modelling
and the evaluation. PMSs are modelled with a single DSPN model representing the whole
mission. As detailed in Chapter 3, the single-model approach offers the advantage of im-
plicitly solving the issues arising from the dependencies among phases caused by the
sharing of architectural components. Moreover, the DSPN model of the mission is a very
concise and at the same time easy-to-read one, and allows additional features of PMSs to
be modelled, features which are usually not considered in the literature. In particular, we
can consider flexible missions during which the duration of the phases is dynamically
adjusted depending on the current state of the PMS. This feature is not supported by
the methodology based on hierarchical Markov models defined in the previous chapter.
DSPN models for PMSs 74
The evaluation procedure of the DSPN models is supported by the existence of the ana-
lytical solution method for the transient probabilities of the underlying MRGP marking
process, which has been summarised in Chapter 1. This solution technique can be very
costly in terms of computational complexity required for a general DSPN model.
However, by taking into account the particular structure of PMSs, we derive a much
more efficient analytical solution method, which basically only requires the separate so-
lution of each single phase, and that can be fully automated.
Moreover, the existence of analytical expressions for the time-dependent marking occu-
pation probabilities allows us to derive the sensitivity functions of the dependability
measures of interest for a PMS, that is the analytical form of the derivatives of the mea-
sures with respect to the variations of a set of parameters. Thus, a dependability engineer
can conduct a sensitivity analysis of a PMS by studying the characteristics of the sensi-
tivity functions, rather than performing batches of multiple evaluations for different val-
ues of the parameters.
In the following, we first present the DSPN modelling approach for PMSs, by describing
how the various peculiar features of PMSs can be modelled by exploiting the expressive
capabilities of the DSPNs. Then, we specialise the general theory of DSPNs to the mod-
elling of PMSs, and derive the sensitivity function for the transient marking occupation
probabilities. After that, we describe a possible automation of the solution technique.
Then, we apply the modelling methodology to the example of PMS we have been study-
ing throughout the dissertation. Finally, we give some concluding remarks, to discuss ad-
vantages and limitations of the DSPN approach with respect to the other studies in the
literature.
5.1 The modelling approach
The original class of DSPNs was introduced to extend the representative power of
the SPN and GSPN models, allowing the exact modelling of events having deterministic
occurrence times [1]. Besides the introduction of deterministic transitions, other mod-
elling features have more recently been added to the DSPNs, features that do not extend
the power of this class of models, but significantly improve their expressiveness. These
same features can be found in Stochastic Activity Networks, and because of their extreme
usefulness, they have been included in several other classes of Petri nets, such as GSPNs
and Stochastic Reward Nets. We now introduce and discuss some of the more relevant
among these new modelling capabilities.
A very useful modelling feature is the possibility of specifying the firing rate of a timed
transition as an arbitrary function of the marking of any place of the Petri net. Notice
that in the Petri net models we have considered up to now, the sole marking-dependence
that has been taken into account is that on the input places of a transition. The general
marking-dependent duration of the activities associated to the transitions makes it pos-
sible to model very complex behaviours in a quite compact way. As well as transition
rates, rewards associated to the markings of the Petri net may depend on the marking of
the model.
Another very interesting possibility to enrich the capabilities of DSPNs is the introduc-
tion of logic conditions, also called guards, which control the enabling of transitions. In
the classical Petri net models, the enabling of transitions is completely determined by the
number of tokens in the input places and the corresponding multiplicity of the input arcs.
Therefore, to add a particular control condition on the enabling of a given transition, it is
necessary to introduce additional places and arcs which explicitly model the condition
into the structure of the net. These additional modelling elements increase the model di-
mensions and may consequently diminish its understandability. Conversely, a guard can
be any arbitrary function of the marking of the model, which is added to the specification
of the transition without impairing the clarity of the modelling.
A special class of guards on transitions is represented by the halting conditions. They are
introduced to stop the movement of the tokens as the Petri net model reaches a given
marking, by permanently disabling all the transitions and transforming the current mark-
ing into an absorbing one. Notice that if halting conditions are not available, stopping the
evolution of the model requires the introduction of additional modelling elements for each
potentially enabled transition.
Input arcs with variable cardinality provide additional simplifications of the modelling. A
variable cardinality input arc is one whose multiplicity is equal to the number of tokens in
the corresponding input place. Using such an arc can be very useful in the following situa-
tion. Consider the case when all the tokens in a place are to be flushed to another place.
This batch movement can be equivalently modelled through the basic constructs of the
GSPNs, by introducing an instantaneous transition that iteratively moves the tokens one
at a time until the input place is empty. Though perfectly correct, this modelling choice
is less efficient than the one that employs the variable cardinality arc. In fact, the mod-
elling with the basic GSPN constructs results in a set of intermediate vanishing markings
in the reachability set of the model, markings that need to be removed to obtain the re-
duced reachability graph. On the contrary, the solution with the variable cardinality arc
does not incur this problem.
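The contrast between the two encodings can be made concrete with a toy representation of a marking as a dictionary of token counts (all names here are illustrative; this is not the syntax of any specific DSPN tool):

```python
def flush_variable_cardinality(marking, src, dst):
    """One firing of an immediate transition whose input arc has variable
    cardinality: all tokens of src move to dst at once, producing no
    intermediate (vanishing) markings."""
    marking[dst] = marking.get(dst, 0) + marking[src]
    marking[src] = 0
    return 1  # a single firing

def flush_one_at_a_time(marking, src, dst):
    """Equivalent basic GSPN encoding: an immediate transition fires
    repeatedly, generating one vanishing marking per moved token."""
    firings = 0
    while marking[src] > 0:
        marking[src] -= 1
        marking[dst] = marking.get(dst, 0) + 1
        firings += 1
    return firings
```

Both functions reach the same final marking, but the second performs as many firings, and hence generates as many vanishing markings, as there are tokens in the input place.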
This same efficiency argument applies to all the additional modelling features introduced
for DSPN models and described above. Besides allowing a clearer and more concise mod-
elling, these capabilities avoid the introduction of many vanishing markings that would
need to be subsequently eliminated from the reachability graph of the models. However,
it is worthwhile observing once again that these features do not increase the power and
thus the applicability of the models, but represent only convenient shorthand notations
that simplify the modelling and the generation of the underlying marking process.
The enriched set of modelling features offered by DSPNs allows the PMSs to be repre-
sented by the general model schema shown in Figure 5.1.
Figure 5.1: General structure of a DSPN model of a PMS
At a high abstraction level, the model of a PMS is composed of two logically separate
parts: one represents the system, that is its components and their interactions, which
evolve according to the events that modify their state, and the other represents the
control part, which describes the phase changes. Both parts can be modelled via
DSPNs: the system part, containing only exponentially distributed and instantaneous
transitions, which we call System Net (SN), and the control part, represented by another
DSPN submodel, the Phase Net (PhN), which contains all the deterministic transitions of
the overall DSPN model. As we shall see in the following, the PhN may contain
instantaneous transitions as well.
A token in a place of the PhN model represents a phase being executed, and the firing of a
deterministic transition models a phase change. The sequence of phases ends with a token
in the STOP place, which represents the end of the mission. The SN model describing the
evolution of the underlying system is governed by the PhN: the SN's evolution may de-
pend on the marking of the PhN thus representing the phase-dependent behaviour of the
system. According to the DSPN modelling rules, this phase dependent behaviour is mod-
elled by proper marking dependent predicates which modify transition rates, enabling
conditions, reward rates. As we will detail in the following, the two nets can interact in
various ways, and this allows the modelling of different features of PMSs.
Any structure of the two nets can be considered: in particular, the PhN is not limited to
have a linear structure. Indeed, the PhN must represent all the possible paths of the mis-
sion. If a dynamic profile of the mission is to be considered, then the PhN takes a tree
structure. If a phase is performed along different paths of the mission, the PhN includes a
different place to model the execution of each distinct instance of that phase.
The same structure of the model of a PMS can be found in the paper of Aupperle, Meyer
and Wei described in Chapter 3. They used the METASAN [75] modelling tool, based on
Stochastic Activity Networks, to represent both the PhN and the SN. In the SAN lan-
guage, transitions are called activities. SANs allow for general distributions of activity du-
ration, for marking dependent firing rates, and for guards on the enabling conditions of ac-
tivities. Hence, SANs and DSPNs share many of the modelling features, and the work in
[7] contains many interesting elements for the modelling of complex PMSs. Here, we in-
tend to exploit the potentialities of DSPNs to investigate how the various aspects of
PMSs can be modelled. Moreover, we exploit the MRGP theory to provide an analytical solu-
tion technique for the DSPN models of PMSs. The possibility of such analytical solution
was not available at the time the work in [7] was proposed, and it represents a consider-
able step towards the definition of a systematic solution method for PMSs.
The adequacy of the DSPN models for the modelling of PMSs is to be found in the possi-
bility of expressing even complex phase-dependent behaviours through the relations be-
tween the two submodels PhN and SN shown in Figure 5.1. To present the guidelines for
the modelling of PMSs, in the following we take into account the various features de-
scribed in Chapter 2 and explain how each of them can be modelled by introducing new
interaction capabilities between the PhN and the SN.
First of all, consider the changes that occur in the configuration of the PMS as a result of
the various performance/dependability requirements of the different phases. These
changes can be modelled by corresponding variations of the SN, which depend on the
marking of the PhN. For instance, consider the situation represented in Figure 5.2, where
the PMS adopts the operational configuration A while within phase i, and configuration
B within the successive phase j. Places Pi and Pj in the PhN model the execution of the
two phases.
The SN submodel includes the model of both configurations; however, the tokens circulate
only inside the model of configuration A while the token of the PhN submodel is in the
place representing phase i. The two immediate transitions ta and tb have the guard
m( Pj ) = 1, which enables them only when a token reaches the place Pj. As they get en-
abled, the two instantaneous transitions move the tokens from the model of configuration
A to that of configuration B, modelling the configuration change.
Figure 5.2: Phase-dependent configuration modelling
Variable cardinality arcs can be used to flush several tokens in a single firing. Notice that
this mechanism actually substitutes the mapping of the state occupation probabilities
that was necessary in the previous methodology based on Markov chains. If configura-
tion A and configuration B share some of the system components, only the differences
between the two configurations need to be modelled in the SN, and the common parts
can overlap for a concise modelling.
The different failure criteria of the various phases can be immediately modelled by using
again instantaneous transitions, whose enabling conditions are augmented with the failure
criterion of the current phase. For instance, suppose the failure criterion of phase i is
given by a predicate f( m ) of the marking m. The correct behaviour of the PMS inside
phase i is modelled by a subnet of the SN that represents the normal operation, as shown
in Figure 5.3, and the failure of the PMS is modelled by an absorbing place FAIL. A set
of instantaneous transitions, the two transitions ta and tb in the example in Figure 5.3,
with a guard f( m ) ∧ ( m( Pi ) = 1 ), can be used to move some or all the tokens of the
normal operation SN submodel into the FAIL place if the failure criterion is met during
phase i. Moreover, halting conditions may be used to stop the movement of other tokens
not moved to place FAIL.
Figure 5.3: Phase-dependent failure criteria modelling
The phase-dependent failure rates and the rewards are easily modelled by defining the
proper firing rates as a function of the marking of the PhN submodel.
Up to now, we have conditioned the behaviour of the SN submodel to that of the PhN
submodel, to represent the phase-dependent behaviours that characterise PMSs. Now,
we perform the opposite step, by making the evolution of the PhN dependent on the
marking of the SN, to model the possibility of dynamically adjusting the mission as a
consequence of the state of the system.
Consider the case when the next phase to be performed is to be chosen at the time the
preceding phase ends. We show how the DSPN approach can provide the flexibility to
model such a dynamic behaviour. Suppose that after phase i either phase j or phase h
can be performed, depending on whether the state of the PMS satisfies some starting
criterion or not.
[Figure: PhN places P_i, P_j and P_h, the auxiliary place P_choice, and the instantaneous transitions t_j and t_h selecting the next phase]
Figure 5.4: Modelling of a dynamic mission profile
We model this feature as shown in Figure 5.4. The PhN submodel shows the two possible paths of the mission, and the choice between the two candidate phases is determined by the guards on the instantaneous transitions t_j and t_h. The starting criterion is formulated as a function s(m), where m is the marking of the SN submodel, and is used as a guard to enable the instantaneous transition that triggers the proper phase. The place P_choice is an auxiliary place to model the selection of the next phase.
The possibility of defining marking-dependent firing times of deterministic transitions makes it possible to model activities whose duration can be modified after they have already started, taking however into account the amount of time already spent. More precisely, suppose a deterministic transition has been enabled for δ units of time and, according to the current marking of the DSPN, has a firing time of α units of time. Then a marking change occurs, and the firing time in the new marking is β. In the new marking, an elapsed firing time equal to (δ/α)β is taken for the transition, since δ out of the α units of time of its firing delay have already elapsed. The transition will fire in the new marking, if no further marking changes take place, when the elapsed firing time reaches β.
This particular dependency has been named marking-dependency by a scaling factor [24, 58], because the distribution of the firing time is affected by the marking changes only in its expected value, and not in its general law.
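The scaling rule above can be illustrated numerically. The following is a minimal sketch with our own function names and invented numbers, not code from the thesis:

```python
# Marking-dependency by a scaling factor: when the firing time changes from alpha
# to beta after delta units have elapsed, the *fraction* of the delay already
# consumed (delta/alpha) is preserved in the new marking.
def rescaled_elapsed_time(delta, alpha, beta):
    """Elapsed firing time carried into the new marking: the same fraction
    (delta/alpha) of the new firing time beta is considered already elapsed."""
    return (delta / alpha) * beta

def remaining_time(delta, alpha, beta):
    """Residual delay in the new marking, absent further marking changes."""
    return beta - rescaled_elapsed_time(delta, alpha, beta)

# A transition 2 units into a 10-unit delay has consumed 1/5 of its delay, so it
# keeps 4/5 of a new 5-unit delay.
print(remaining_time(2.0, 10.0, 5.0))  # -> 4.0
```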
5.2 Analytical solution
As we discussed in Chapter 1, a DSPN model can be analytically solved by applying the results of MRGP theory, provided that the condition expressed by Assumption 1.1 is satisfied, that is, at most one deterministic transition is enabled in each of the possible markings of the net. This condition is obviously satisfied for the DSPN model of a PMS, provided that the only deterministic activities modelled are the phase durations. In this case, there is one (and exactly one) deterministic transition enabled, which corresponds to the current phase of the system, and an elegant analytical solution can be given for the transient marking probabilities by specialising the general solution techniques for MRGPs.
The sequence of regeneration points { T_i, i ≥ 0 } of the underlying MRGP is chosen as the successive firing times of the deterministic phase transitions in the PhN. The kernel matrices are accordingly defined, as detailed in Chapter 1, and the matrix V(t), which describes the time-dependent behaviour of the DSPN model, can be obtained as the solution to Equation (1.5). Solving Equation (1.5) to obtain matrix V(t) for a general MRGP requires the use of numerical algorithms or of Laplace transforms, and can be a computationally intensive task. However, the DSPN models of PMSs have the following property, which allows us to simplify the general expression of the transient probability matrix of an MRGP:
Property 1: in every non-absorbing marking of the DSPN model of a PMS, there is
always one deterministic transition enabled.
This property allows us to tailor the general theory introduced in Chapter 1 to derive a much more efficient solution technique. We first consider the case when the PhN has a linear structure, that is, no flexibility is allowed for the mission. The ordered sequence of phases performed through the mission is denoted with 1, 2, …, n, and the deterministic durations of the phases are denoted with τ_1, τ_2, …, τ_n. Moreover, we also assume for the time being that the duration of the phases does not depend on the marking of the SN. Then, we address the more general case when the PMS can dynamically select the next phase to be performed, and consequently the PhN shows a tree structure. After that, we briefly mention the technique to be applied to analytically deal with deterministic transitions having marking-dependent firing times. Then, the analytical derivation of the sensitivity function for the time-dependent marking occupation probabilities is presented, and last we give the main guidelines for the automation of the evaluation procedure.
5.2.1 Linear PhN
Let S denote the state space of the marking process of a DSPN for a PMS having a PhN with linear structure. Let t_i^Det be the deterministic transition, with firing time τ_i, which models the time the PMS spends in phase i, i = 1, 2, …, n.
For any marking m of the DSPN state space S, let D(m) denote the deterministic transition which is enabled in marking m. Consider the subsets S_i of S defined as follows:

S_i = { m ∈ S : D(m) = t_i^Det } ,   i = 1, 2, …, n
Owing to Assumption 1.1 together with Property 1 stated above, the sets S_i are disjoint from one another. The marking process { m(t), t ≥ 0 } moves from S_i to S_{i+1}, i = 1, 2, …, n−1, over time. Moreover, denote with S_{n+1} the set of markings given by S \ ∪_{i=1}^{n} S_i, which contains those absorbing markings reached by the model at the end of the mission, when a token is in place STOP. The sets S_i, i = 1, 2, …, n+1, form a partition of the marking set S. Let C_i denote the cardinality of set S_i, i = 1, 2, …, n+1.
Now, consider the continuous-time Markov chain { m_i(t), t ≥ 0 } whose state transition diagram corresponds to the reduced reachability graph of the DSPN when transition t_i^Det is enabled, that is, within the time interval [T_{i−1}, T_i), i = 1, 2, …, n. This stochastic process is called the subordinated process of transition t_i^Det. The state space of { m_i(t), t ≥ 0 } is a subset of S, and is given by S_i. The generator matrix of { m_i(t), t ≥ 0 }, denoted by Q_i, can be obtained by analysing the evolution of the DSPN while staying in phase i, i = 1, 2, …, n. More precisely, let λ(m, m′) denote the transition rate from marking m to marking m′ in the reduced reachability graph. Matrix Q_i has size C_i × C_i, and is defined as follows:
Q_i = [ q^i_{m,m′} ] ,   q^i_{m,m′} = { λ(m, m′)   if m ≠ m′ ;   −Σ_{r∈S_i, r≠m} q^i_{m,r}   if m = m′ } ,   m, m′ ∈ S_i
Between the regeneration points T_{i−1} and T_i, i = 1, 2, …, n, the evolution of the MRGP underlying the DSPN follows that of the simple subordinated continuous-time Markov chain { m_i(t), t ≥ 0 }. Therefore, for any t ∈ [T_{i−1}, T_i), the transient probability matrix can be obtained through the exponential of the matrix Q_i.
Let us define the following branching-probability matrices, which account for the branching probabilities as the deterministic transitions fire. Let Δ_i, i = 1, 2, …, n−1, be the matrix defined as follows:

Δ_i = [ δ^i_{m,m′} ] ,   δ^i_{m,m′} = Prob[ Y_i = m′ | m(T_i^−) = m ] ,   m ∈ S_i , m′ ∈ S_{i+1}

where we recall that Y_i is the marking of the DSPN at the regeneration instant T_i, which is the completion time of phase i, and T_i^− denotes the instant of time that immediately precedes T_i. The branching-probability matrix Δ_i has dimension C_i × C_{i+1}, i = 1, 2, …, n−1. These matrices can be automatically obtained when the reachability graph is generated [22].
According to the partition S_i, i = 1, 2, …, n+1, of the state space S, the matrices K(t) and E(t) can be written in block form K(t) = [ K_{i,j}(t) ] and E(t) = [ E_{i,j}(t) ]. The block matrices K_{i,j}(t) and E_{i,j}(t), i, j = 1, 2, …, n+1, have size C_i × C_j. We emphasise that subscripts i and j now denote the phase being executed and not individual markings of the DSPN. Thus K_{i,j}(t) and E_{i,j}(t) are submatrices of K(t) and E(t), respectively.
Let us observe that the study of the transient evolution of the marking process { m(t), t ≥ 0 } can be stopped at the time T_n where the last phase ends, which, due to the deterministic duration of all the phases, is known in advance. Thus, we do not need to consider the evolution of the marking process inside the subset S_{n+1} of the state space S. The blocks of interest of the kernel matrices are defined as follows:
K_{i,j}(t) = { e^{Q_i τ_i} Δ_i u(t − τ_i)   if 1 ≤ i ≤ n−1 and j = i+1 ;   0   otherwise }

E_{i,j}(t) = { e^{Q_i t} (1 − u(t − τ_i))   if j = i ;   0   otherwise }
where u(t − τ_i) is the delayed unit step function defined as u(t − τ_i) = 0 if t < τ_i, and u(t − τ_i) = 1 if t ≥ τ_i, i = 1, 2, …, n. Let us observe that the function u(t − τ_i) is the cumulative distribution of the deterministic firing time of transition t_i^Det, for i = 1, 2, …, n. Matrices K(t) and E(t) depend on time only through these delayed unit step functions, because the regeneration points { T_i, i ≥ 0 } are chosen as the firing times of the deterministic transitions. Moreover, note that due to the linear structure of the PhN, matrix K(t) is a block upper-diagonal matrix with only a band of non-zero blocks, and matrix E(t) is a block diagonal one. These properties will be useful to simplify the general form of the transient probability matrix V(t), which gives the time-dependent marking occupation probabilities and is the solution to matrix Equation (1.5). Indeed, we can solve Equation (1.5) for matrix V(t) by exploiting the block partitioning V(t) = [ V_{i,j}(t) ], as follows:
V_{i,j}(t) = E_{i,j}(t) + Σ_{h=1}^{n} ∫_0^t dK_{i,h}(x) V_{h,j}(t − x) ,   t ≥ 0
By eliminating from the summation the null blocks of matrix K(t), we rewrite the preceding expression as follows:
V_{i,j}(t) = E_{i,j}(t) + ∫_0^t dK_{i,i+1}(x) V_{i+1,j}(t − x) =
          = E_{i,j}(t) + e^{Q_i τ_i} Δ_i ∫_0^t (d/dx) u(x − τ_i) V_{i+1,j}(t − x) dx =
          = E_{i,j}(t) + e^{Q_i τ_i} Δ_i V_{i+1,j}(t − τ_i) ,   t ≥ 0      (5.1)
where the last expression comes from the fact that the derivative of the function u(x − τ_i) is the Dirac impulse at τ_i, which allows us to reduce the convolution integral to just a time shift. Observe that matrix V(t) has an upper-triangular block structure.
This allows the linear system in Equation (5.1) to be solved by backward substitutions, recursively obtaining all the non-zero blocks V_{i,j}(t) of matrix V(t), as follows:

V_{i,j}(t) = { (∏_{h=i}^{j−1} e^{Q_h τ_h} Δ_h) E_{j,j}(t − Σ_{h=i}^{j−1} τ_h)   if i ≤ j ;   0   if i > j }      (5.2)
where the product over an empty set is intended to be the identity matrix.
It is worthwhile observing that evaluating the formula given in (5.2) to obtain the transient state probability matrix only requires us to derive the matrices e^{Q_h τ_h}, h = 1, 2, …, j, and Δ_h, h = 1, 2, …, j−1. The solution of the single DSPN model is reduced to the cheaper problem of deriving the transient probability matrices of a set of homogeneous continuous-time Markov chains whose state spaces are proper subsets of the whole state space of the marking process. Therefore, formula (5.2) provides a highly effective way of solving the single DSPN model of a PMS with a linearly structured mission.
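As a minimal numerical sketch of this product form (all rates, durations and branching matrices below are our own toy data, and a simple Taylor series stands in for a Markovian transient solver), the end-of-mission probability vector of a two-phase linear mission is just m0 · e^{Q_1 τ_1} Δ_1 · e^{Q_2 τ_2}:

```python
# Toy evaluation of the linear-PhN solution: the transient probabilities of each
# subordinated Markov chain are chained by the branching matrices.
import numpy as np

def taylor_expm(A, terms=60):
    """Matrix exponential by truncated Taylor series; a stand-in for
    scipy.linalg.expm, adequate for the small-norm matrices of this toy example."""
    P, T = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        P = P + T
    return P

# Phase 1: states (up, failed), failure rate 1e-3/h; phase 2: higher stress, 5e-3/h.
Q1 = np.array([[-1e-3, 1e-3], [0.0, 0.0]])
Q2 = np.array([[-5e-3, 5e-3], [0.0, 0.0]])
tau1, tau2 = 10.0, 4.0
Delta1 = np.eye(2)              # one-to-one branching between phase state spaces
m0 = np.array([1.0, 0.0])       # the mission starts with the system up

m_end = m0 @ taylor_expm(Q1 * tau1) @ Delta1 @ taylor_expm(Q2 * tau2)
reliability = m_end[0]          # probability of completing the mission up
```

For this absorbing-failure toy model the result is simply exp(−(1e-3·10 + 5e-3·4)), which the matrix product reproduces.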
5.2.2 Tree-like PhN
Now, consider the case when the next phase to be performed can be chosen at the time the current phase ends. The PhN exhibits a tree structure, with all the leaves of the tree linked to the place STOP. The solution of Equation (5.1) is quite similar to the previous case, but it requires a little more notation. As before, let 1, 2, …, n be the set of phases and t_1^Det, t_2^Det, …, t_n^Det be the corresponding deterministic transitions. Let f(i) denote the forward phases of phase i in the PhN, that is, the set of phases which can be performed after phase i, i = 1, 2, …, n. For the sake of simplicity, we assume that the ordering of the phases is such that j > i for each j ∈ f(i). Note that such an ordering can always be found because of the acyclic structure of the PhN.
Consider the Markov chain { m_i(t), t ≥ 0 }, defined as in the previous subsection, and let S_i and Q_i be its state space and its transition rate matrix, respectively, i = 1, 2, …, n. Define the branching probability matrix Δ_{i,j}, i, j = 1, 2, …, n, as follows:

Δ_{i,j} = [ δ^{i,j}_{m,m′} ] ,   δ^{i,j}_{m,m′} = { Prob[ Y_i = m′ | m(T_i^−) = m ]   if m ∈ S_i , m′ ∈ S_j , j ∈ f(i) ;   0   otherwise }
According to the partition S_i, i = 1, 2, …, n, of the state space S, matrix K(t) can be written in block form as K(t) = [ K_{i,j}(t) ]. Each block K_{i,j}(t) has size C_i × C_j and is defined as follows:

K_{i,j}(t) = e^{Q_i τ_i} Δ_{i,j} u(t − τ_i) ,   i, j = 1, 2, …, n ,   t ≥ 0
Note that matrix E(t), which accounts for the local evolution of the MRGP while within a phase, remains unchanged. The tree structure of the PhN still results in a block upper-diagonal form of matrix K(t), but in this case more non-zero blocks may appear in each row of the block matrix. Equation (1.5) can be rewritten as follows:
V_{i,j}(t) = E_{i,j}(t) + Σ_{h=1}^{n} ∫_0^t dK_{i,h}(x) V_{h,j}(t − x) =
          = E_{i,j}(t) + e^{Q_i τ_i} Σ_{h∈f(i)} Δ_{i,h} ∫_0^t (d/dx) u(x − τ_i) V_{h,j}(t − x) dx =
          = E_{i,j}(t) + e^{Q_i τ_i} Σ_{h∈f(i)} Δ_{i,h} V_{h,j}(t − τ_i) ,   t ≥ 0      (5.3)
The linear system in (5.3) can be solved by backward substitutions, and a solution can be provided as follows by exploiting the acyclic structure of the PhN. Consider the unique path p(i, j) linking phase i to phase j according to the tree structure of the PhN. This path is a set of phases p(i, j) = { p_1, p_2, …, p_r }, where p_1 = i, p_r = j and p_{h+1} ∈ f(p_h), h = 1, 2, …, r−1. Matrix V_{i,j}(t) is then given by the following formula:
V_{i,j}(t) = { (∏_{h=1}^{r−1} e^{Q_{p_h} τ_{p_h}} Δ_{p_h,p_{h+1}}) E_{j,j}(t − Σ_{h=1}^{r−1} τ_{p_h})   if p(i, j) ≠ ∅ ;   0   otherwise }      (5.4)
Equation (5.4) gives an operative way to evaluate the transient probability matrix of the DSPN model through the separate analysis of the various alternative paths which compose the mission. The preceding Equation (5.2) can be derived as a particular case of Equation (5.4).
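The path-product structure of Equation (5.4) can be sketched numerically. In the toy fragment below the tree, rates, durations and branching matrices are all invented for illustration, and a simple Taylor series stands in for a Markovian transient solver:

```python
# Evaluating a block V_{i,j}(t) along the unique path p(i,j) of a tree-shaped PhN.
import numpy as np

def taylor_expm(A, terms=60):
    """Truncated-Taylor matrix exponential (stand-in for scipy.linalg.expm)."""
    P, T = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        P = P + T
    return P

def path(i, j, parent):
    """Unique path from phase i down to phase j, or None if j is not below i."""
    p = [j]
    while p[-1] != i:
        if p[-1] not in parent:
            return None
        p.append(parent[p[-1]])
    return p[::-1]

def block_V(i, j, t, Q, tau, Delta, parent):
    """(product over the path of e^{Q tau} Delta) times E_{j,j}(t - path durations)."""
    p = path(i, j, parent)
    if p is None:
        return None                              # a zero block of V(t)
    M, shift = np.eye(2), 0.0
    for a, b in zip(p[:-1], p[1:]):
        M = M @ taylor_expm(Q[a] * tau[a]) @ Delta[(a, b)]
        shift += tau[a]
    return M @ taylor_expm(Q[j] * (t - shift))   # E_{j,j} before phase j ends

# Phase 1 branches to phase 2 (probability 0.7) or phase 3 (probability 0.3).
parent = {2: 1, 3: 1}
Q = {k: np.array([[-1e-3, 1e-3], [0.0, 0.0]]) for k in (1, 2, 3)}
tau = {1: 10.0, 2: 5.0, 3: 8.0}
Delta = {(1, 2): 0.7 * np.eye(2), (1, 3): 0.3 * np.eye(2)}
V12 = block_V(1, 2, 12.0, Q, tau, Delta, parent)
```

Each alternative mission path is analysed separately, which mirrors the separability noted in the text.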
5.2.3 Deterministic transitions with marking-dependent firing time
The treatment of deterministic transitions having a marking-dependent firing time is easily reduced to that of fixed-delay transitions [24]. Suppose phase i is modelled by a deterministic transition t_i^Det whose firing time is a function g(m) of the current marking m of the net.
In order to give a marking-independent definition of the amount of work performed by the marking-dependent deterministic transition, it is possible to introduce a normalised elapsed firing time, defined as the ratio between the elapsed firing time τ_m and the deterministic firing delay g(m) in marking m. The normalisation is extended to the transition rate matrix Q_i of the subordinated Markov chain representing the evolution of the PMS within phase i: the rate of the outgoing transitions of marking m is multiplied by the firing delay g(m) of t_i^Det. Correspondingly, the firing time of transition t_i^Det is set to a normalised unitary value.
An intuitive interpretation of this scaling of the subordinated Markov chain lies in modifying the rate, that is, the relative speed, of the exponential transitions of the SN that are concurrently enabled with t_i^Det, rather than changing the speed of the deterministic transition t_i^Det.
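A tiny sketch of the normalisation (toy generator and durations of our own choosing):

```python
# Normalisation of a deterministic transition with marking-dependent firing time
# g(m): each row m of the subordinated generator is scaled by g(m), and the
# deterministic transition is then given the normalised unitary firing time.
import numpy as np

def normalise_generator(Q, g):
    """Scale the outgoing rates of marking m by g[m]; one time unit of the scaled
    chain then corresponds to g(m) original time units spent in marking m."""
    Qn = np.array(Q, dtype=float)
    for m, gm in enumerate(g):
        Qn[m, :] *= gm
    return Qn

Q = np.array([[-2.0, 2.0], [1.0, -1.0]])   # subordinated chain of the phase
g = [3.0, 5.0]                             # phase duration seen from each marking
Qn = normalise_generator(Q, g)             # deterministic delay becomes 1
```

Rows of the scaled matrix still sum to zero, so Qn remains a valid generator.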
5.2.4 Sensitivity analysis of DSPN models of PMSs
The existence of an analytic expression for the transient marking occupation probabilities makes it possible to evaluate the derivatives of the dependability measures of interest with respect to the variations of some parameter θ. These derivatives can be conveniently employed to perform an analytical study of the effects that the parameter variations have on the dependability [35]. Indeed, the absolute value of the derivative indicates the magnitude of the variation of the measure for a small perturbation of the parameter, and its sign reveals whether an increase of the parameter value causes a corresponding increase or instead a decrease of the measure. Therefore, analytical sensitivity analysis can fruitfully complement the means available for the study of a PMS, in some cases avoiding or limiting the need of performing long batches of evaluations for a number of different values of the parameters. Moreover, the pointwise derivatives of the measure can be employed to provide accurate approximations of the measure over continuous intervals, through a series approximation.
In the following, we compute the derivative of the transient marking occupation probability vector at the end of the mission, with respect to some independent parameter θ. The sensitivity analysis of steady-state occupation probabilities for DSPN models has already been presented in [23]. The analytical derivation of this derivative is just an example of the sensitivity studies that can be performed once the analytical expressions for the transient marking occupation probabilities are available. The initial probability vector, the duration of the phases, the firing rates of exponential transitions, as well as the probability mass functions ruling the firing of immediate transitions, may all depend on the parameter θ. For the sake of simplicity, we consider a PMS having a chain-like structure of the PhN, the extension to a tree-like PhN being simply a matter of more complicated notation.
According to Equation (5.2) for the transient marking probability matrix, the vector of marking probabilities m(Σ_{i=1}^{n} τ_i) at the end of the mission is given by:

m(Σ_{i=1}^{n} τ_i) = m_0 V_{1,n}(Σ_{i=1}^{n} τ_i) = m_0 (∏_{h=1}^{n−1} e^{Q_h τ_h} Δ_h) E_{n,n}(τ_n)
where m_0 is the initial assignment of marking probabilities, and E_{n,n}(τ_n) is given by E_{n,n}(τ_n) = e^{Q_n τ_n}. To avoid a cumbersome notation, let us define Δ_n = I and M_h = e^{Q_h τ_h} Δ_h, h = 1, 2, …, n, so that the vector m(Σ_{i=1}^{n} τ_i) can be conveniently rewritten as follows:

m(Σ_{i=1}^{n} τ_i) = m_0 ∏_{h=1}^{n} M_h
Now, we evaluate the sensitivity function s(θ), defined as the derivative of m(Σ_{i=1}^{n} τ_i) with respect to θ:

s(θ) = ∂/∂θ m(Σ_{i=1}^{n} τ_i) = (∂m_0/∂θ) ∏_{h=1}^{n} M_h + m_0 (∂/∂θ ∏_{h=1}^{n} M_h)
The derivative of the product of the M_h, which appears in the last term of the preceding equation, is obtained as follows:

∂/∂θ ∏_{h=1}^{n} M_h = Σ_{k=1}^{n} [ (∏_{h=1}^{k−1} M_h) (∂M_k/∂θ) (∏_{h=k+1}^{n} M_h) ]
Last, the derivative of M_k can be obtained as follows:

∂M_k/∂θ = (∂/∂θ e^{Q_k τ_k}) Δ_k + e^{Q_k τ_k} (∂Δ_k/∂θ)
By combining the intermediate results found so far, we obtain the most general formula for the sensitivity function s(θ), for it takes into account the dependence on θ of any other parameter of the DSPN model. With respect to the evaluation of the PMS time-dependent marking occupation probabilities, the estimation of the sensitivity only requires a small additional computational effort. Indeed, the only non-trivial computation is that of the derivative of the matrix exponential, for which efficient algorithms have been defined [10, 39, 59]. Moreover, the general formula can be specialised if the dependence on θ solely involves some particular component of the net. Consider for instance the following interesting cases.
Suppose that the phases in a subset D of { 1, 2, …, n } have a duration that is a function of θ, and no other parameters are affected by θ variations. We can then estimate the sensitivity of the final marking occupation probability vector with respect to the length of the phases in D. In this case, the general expressions for the derivative can be simplified, because the functions M_k depend on θ only through the τ_k, and the matrices Q_k and Δ_k are independent of θ, k = 1, 2, …, n. Specifically, the derivative of M_k can be rewritten as follows:
∂M_k/∂θ = (∂/∂θ e^{Q_k τ_k}) Δ_k = (∂/∂θ Σ_{i=0}^{∞} (Q_k τ_k)^i / i!) Δ_k = (∂/∂θ Σ_{i=1}^{∞} (Q_k τ_k)^i / i!) Δ_k =
        = (Σ_{i=1}^{∞} (Q_k^i / i!) (∂τ_k^i/∂θ)) Δ_k = (Σ_{i=1}^{∞} (Q_k^i / i!) i τ_k^{i−1} (∂τ_k/∂θ)) Δ_k = Q_k (Σ_{i=1}^{∞} (Q_k^{i−1} / (i−1)!) τ_k^{i−1} (∂τ_k/∂θ)) Δ_k =
        = Q_k e^{Q_k τ_k} (∂τ_k/∂θ) Δ_k = Q_k M_k (∂τ_k/∂θ)
By combining this last equation with the others previously obtained, we obtain the following formula for the sensitivity function:
s(θ) = m_0 Σ_{k∈D} (∏_{h=1}^{k−1} M_h) Q_k M_k (∂τ_k/∂θ) (∏_{h=k+1}^{n} M_h) = m_0 Σ_{k∈D} (∏_{h=1}^{k−1} M_h) Q_k (∂τ_k/∂θ) (∏_{h=k}^{n} M_h)
If we are interested in a single particular phase j, and the dependence of τ_j on θ is such that τ_j = θ, we can further simplify to:
s(θ) = m_0 (∏_{h=1}^{j−1} M_h) Q_j (∏_{h=j}^{n} M_h)
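This simplified sensitivity can be cross-checked numerically against a finite difference. The toy data below (rates, durations, phase index) are our own, and a truncated Taylor series stands in for a transient solver:

```python
# Sensitivity of the end-of-mission probabilities to the duration of one phase:
# analytic formula m0 (prod_{h<j} M_h) Q_j (prod_{h>=j} M_h) vs finite difference.
import numpy as np

def taylor_expm(A, terms=60):
    """Truncated-Taylor matrix exponential (stand-in for scipy.linalg.expm)."""
    P, T = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        P = P + T
    return P

Q = [np.array([[-1e-3, 1e-3], [0.0, 0.0]]),
     np.array([[-5e-3, 5e-3], [0.0, 0.0]])]
Delta = [np.eye(2), np.eye(2)]           # Delta_n = I by the convention in the text
m0 = np.array([1.0, 0.0])

def mission_end(tau):
    M = [taylor_expm(Qh * th) @ Dh for Qh, th, Dh in zip(Q, tau, Delta)]
    return m0 @ M[0] @ M[1]

tau = [10.0, 4.0]
j = 1                                    # theta is the duration of the last phase
M = [taylor_expm(Qh * th) @ Dh for Qh, th, Dh in zip(Q, tau, Delta)]
s_analytic = m0 @ M[0] @ Q[j] @ M[1]

eps = 1e-6
s_numeric = (mission_end([tau[0], tau[1] + eps]) - mission_end(tau)) / eps
```

The negative sign of the first component confirms that lengthening the phase decreases the probability of ending the mission in the up state.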
Now, suppose we want to study the sensitivity with respect to the firing rates of some exponential transitions, for instance the repair rates. In this case, the generator matrix Q_k of the Markov chain subordinated to phase k, k = 1, 2, …, n, is the only element of the sensitivity function that depends on θ. The general formula for s(θ) can be rewritten as follows:
s(θ) = m_0 Σ_{k=1}^{n} (∏_{h=1}^{k−1} M_h) (∂/∂θ e^{Q_k τ_k}) Δ_k (∏_{h=k+1}^{n} M_h)
Consider now the case in which, at the end of each phase, the system is inspected and some maintenance actions are undertaken, depending on the status of its components. The repair actions can be of several types, ranging from the minimal repair, which brings a unit back to the condition it was in immediately before the failure occurred, up to the replacement of faulty or aged units with brand-new ones. Moreover, the repair itself can be unsuccessful, and damage may be inflicted on components. A coverage factor of the repair can therefore be included in the modelling of the inter-phase maintenance. A sensitivity analysis can be performed to evaluate the impact that this coverage factor may have on system dependability. Therefore, assuming that the coverage factor is a function of parameter θ, the matrices Δ_k, k = 1, 2, …, n−1, are the sole elements that depend on θ, and the formula for the sensitivity can be rewritten as follows:
s(θ) = m_0 Σ_{k=1}^{n−1} (∏_{h=1}^{k−1} M_h) e^{Q_k τ_k} (∂Δ_k/∂θ) (∏_{h=k+1}^{n} M_h)
5.3 Evaluation procedure
To compute the dependability measures of the PMS, including the reward-based measures, the transient marking occupation probability vector m(t) at time t, t ≥ 0, must be derived. We can obtain m(t) from the transient probability matrix V(t) given by Equation (5.4) in the last section, through the equation m(t) = m_0 V(t), where m_0 is the initial probability vector of the DSPN. To numerically evaluate the transient state probability vector m(t) of the DSPN models presented in the previous sections, different approaches can be considered.
A general-purpose transient solver for DSPNs such as TimeNET [37] can be used for this purpose. TimeNET provides many of the modelling features available under the DSPN paradigm, and is able to support the proposed modelling methodology. Thus, the PMS models can be built in the TimeNET environment according to the proposed structure and solved with the transient solution algorithm based on supplementary variables [36, 38] that the tool implements. Of course, in this case no advantage is obtained from the particular structure of the models. A system consisting of as many differential equations as the number of non-vanishing markings is built and solved. The asymptotic computational cost needed for the solution is the same as the one required by the method of Smotherman and Zemoudeh [79] described in Chapter 3.
In fact, in order to take advantage of the separability of the PMS model solution, a specific algorithm must be developed and implemented to obtain from the PMS model description the matrices needed to evaluate Equation (5.4). The solution algorithm takes as input the DSPN model and its initial probability vector, and performs the following steps:
1) build the Markov chain subordinated to phase i, and obtain its transition rate matrix Q_i, i = 1, 2, …, n;
2) build the branching probability matrices Δ_{i,j} from phase i to phase j, i = 1, 2, …, n−1, j ∈ f(i);
3) obtain the transient state probability vector of the subordinated Markov chain i at T_i, the ending time of the corresponding phase i, for i = 1, 2, …, n;
4) multiply the initial probability vector and the matrices, according to the ordering given by Equation (5.4), to obtain the final result.
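The four steps above can be sketched as a small driver. All model data below are invented toy values, and a truncated Taylor series stands in for the Markovian transient solver of step 3:

```python
# Toy end-to-end evaluation of a three-phase linear mission:
# step 1: subordinated generators; step 2: branching matrices;
# steps 3-4: per-phase transient solution chained by multiplication.
import numpy as np

def taylor_expm(A, terms=60):
    """Truncated-Taylor matrix exponential (stand-in for scipy.linalg.expm)."""
    P, T = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        P = P + T
    return P

# Step 1: transition rate matrices of the subordinated Markov chains.
Q = {i: np.array([[-r, r], [0.0, 0.0]]) for i, r in ((1, 1e-3), (2, 5e-3), (3, 2e-3))}
tau = {1: 10.0, 2: 4.0, 3: 6.0}
# Step 2: branching-probability matrices between consecutive phases.
Delta = {(1, 2): np.eye(2), (2, 3): np.eye(2)}
# Steps 3 and 4: propagate the probability vector phase by phase.
m = np.array([1.0, 0.0])                 # initial probability vector m0
for i, j in ((1, 2), (2, 3)):
    m = m @ taylor_expm(Q[i] * tau[i]) @ Delta[(i, j)]
m = m @ taylor_expm(Q[3] * tau[3])       # last phase: E_{n,n}(tau_n)
reliability = m[0]
```

Note that no global state space is ever built: each generator is handled separately, which is the source of the cost savings discussed below.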
All of the steps described above only require well-known algorithms; in fact, they have been implemented in most of the tools for the automated evaluation of dependability. Therefore, such tools can be used to various extents to simplify some of the steps of the algorithm. For example, it is possible to use any Markovian solver to carry out the transient analysis of the subordinated Markov chains in step 3 of the algorithm. In particular, the Stochastic Petri Net Package (SPNP) [25], which supports the SRN modelling paradigm, can be efficiently used to automate all the steps of the algorithm with a limited additional programming effort.
The computational cost for the solution of a PMS is given by the cost to build the matrices at steps 1) and 2), plus the cost for the transient solutions and the multiplications at steps 3) and 4). The transition rate matrices Q_i can be generated separately from one another, by building the reduced reachability graph of the DSPN when transition t_i^Det is enabled, i = 1, 2, …, n, whereas to build the matrices Δ_{i,j}, i = 1, 2, …, n−1, j ∈ f(i), the generation of the reachability graph for two consecutive phases is required.
It is worthwhile observing that the state space of the MRGP does not need to be generated and handled as a whole. Therefore, the overall computational cost of the algorithm sketched above is dominated by the operations at steps 3) and 4), which require the same cost O(Σ_{i=1}^{n} C_i² q_i τ_i) as the one needed for the evaluation of the hierarchical modelling approach presented in the previous Chapter 4. Thus, with our single-model approach we are able to deal efficiently and exactly with a quite general scenario of PMSs, at a computational complexity that is again limited by the threshold O(Σ_{i=1}^{n} C_i² q_i τ_i). The issues posed by the phased behaviour are completely solved by our method for the case of constant phase duration PMSs. The applicability of the DSPN methodology is only limited by the size of the biggest Petri net model which can be handled by the automated tools.
5.4 Application example
5.4.1 Modelling
In this section we apply the DSPN modelling approach to the example of PMS introduced in Chapter 2. Due to the intrinsic limitation of the class of transitions that can be included in DSPN models, some approximations and simplifications of the PMS are still needed for this methodology to be applicable.
The dynamic features of the PMS mission profile are easily modelled by exploiting the dependence between the PhN and the SN submodels, as we explained in Section 1 of this chapter. In particular, we can represent both the dynamic selection of the next phase to be performed at the time phase 2 ends, and the dynamically adjusted duration of phase 3, for which we can introduce a marking-dependent deterministic transition to model the two possible expected values τ3 and τ3′ of the phase duration. A phase is only allowed to have a constant duration, therefore we must introduce the same approximation considered in Chapter 4 for the hierarchical modelling methodology. We approximate the random duration of phase i with the expected value τ_i of the distribution F_i(t), for i = 1, 2, 4.
Once again, the modelling of the repair activities leads to an additional approximation. Indeed, the SN submodel can only contain exponential and instantaneous transitions; the distribution μ(t) of the repair time needs to be modelled by an exponential transition having a time-invariant constant rate μ. A proper value of μ is the inverse of the expected value of the time necessary to complete the repair.
The DSPN model of the PhN is shown in Figure 5.5. A token in place P_i, i = 1, 2, …, 4, 4′, models the execution of the corresponding phase of the PMS. Notice that place P_4′ is introduced because phase 4 can be executed along two different paths of the mission, that is, the path that includes the secondary objective phase 3, and the one that does not. The deterministic transitions t_i^Det, i = 1, 2, …, 4, 4′, model the duration of the corresponding phases. The place P_choice and the two instantaneous transitions t_yes and t_no
model the dynamic selection of the next phase to be performed, which takes place at the end of phase 2. The two instantaneous transitions are guarded by enabling conditions that are functions of the marking of the SN submodel, which will be presented in the following. Similarly, transition t_3^Det has a firing time that depends on the marking of the SN submodel. For the sake of clarity, we collect all the dependencies of the PhN parameters on the SN markings in Table 5.1. The rewards associated with the phases are shown in boldface. The initial marking of the PhN submodel assigns a single token to the place P_1, modelling the start of the first phase of the mission, and no tokens to all the remaining places.
[Figure: PhN submodel with places P_1, P_2, P_3, P_4, P_4′ and P_choice, deterministic transitions t_1^Det, t_2^Det, t_3^Det, t_4^Det, t_4′^Det, instantaneous transitions t_yes and t_no, and the phase rewards w and 10w in boldface]
Figure 5.5: PhN submodel for the example of PMS
PhN element                           SN marking condition
t_yes enabled if:                     m(Down) = 0
t_no enabled if:                      m(Down) > 0
firing time of t_3^Det = τ3 if:       m(Down) = 0
firing time of t_3^Det = τ3′ if:      m(Down) > 0

Table 5.1: Parameters of the PhN dependent on the SN marking
The SN submodel for the example of PMS is shown in Figure 5.6. It is worthwhile observing the extreme conciseness of the overall modelling of the PMS with respect to the number of different submodels we defined according to the hierarchical modelling methodology presented in Chapter 4. The two submodels shown in Figures 5.5 and 5.6 completely represent our example of PMS. The state of the processors composing the processing system of the PMS is modelled by the three places Up, Down, and Spare.
The marking of these places represents the number of actively working, failed and cold spare processors, respectively. The two exponential transitions t1 and t2 model the failure and repair processes of the processors, moving tokens back and forth between place Up and place Down. The rate of transition t1 is made dependent on the marking of the PhN, to model the different failure intensities of the processors inside the various phases. The phase-dependent parameters of the SN, whose values are determined by the marking of the PhN, are shown in Table 5.2.
[Figure: SN submodel with places Up, Down, Spare and FAIL, exponential transitions t1 and t2, instantaneous transitions Turn-off, Rec-ok (probability c), Rec-nok (probability 1−c), and the failure transitions S-fail1, S-fail2, S-fail3]
Figure 5.6: SN submodel for the example of PMS
SN element                 m(P_1) = 1     m(P_2) = 1     m(P_3) = 1     m(P_4) = 1     m(P_4′) = 1
firing rate of t1:         2λ·m(Up)       λ·m(Up)        5λ·m(Up)       2λ·m(Up)       2λ·m(Up)
Turn-off enabled if:       m(Up) > 3      m(Up) > 2      m(Up) > 3      m(Up) > 3      m(Up) > 3
Rec-ok enabled if:         m(Up) < 2      m(Up) < 1      m(Up) < 2      m(Up) < 2      m(Up) < 2
Rec-nok enabled if:        m(Up) < 2      m(Up) < 1      m(Up) < 2      m(Up) < 2      m(Up) < 2
S-fail1 enabled if:        m(Down) = 3    m(Down) = 4    m(Down) = 2    m(Down) = 2    m(Down) = 2
S-fail2 enabled if:        m(Down) = 3    m(Down) = 4    m(Down) = 2    m(Down) = 2    m(Down) = 2
S-fail3 enabled if:        m(Down) = 3    m(Down) = 4    m(Down) = 2    m(Down) = 2    m(Down) = 2

Table 5.2: Parameters of the SN dependent on the PhN marking
The instantaneous transition Turn-off models the switching off of a processor that, according to the requirements of the phase being currently executed, is no longer needed to be active. Similarly, the other two instantaneous transitions Rec-ok and Rec-nok model the spare reactivation, and the possible successful or failed result of this operation. The two instantaneous transitions Rec-ok and Rec-nok are controlled by a condition on the marking of the SN: they can be enabled if and only if the PMS has not failed, that is, if the condition m(FAIL) = 0 holds for the marking of the net. Moreover, all three instantaneous transitions Turn-off, Rec-ok, and Rec-nok are controlled by additional enabling conditions depending on the marking of the PhN subnet, as shown in Table 5.2.
Finally, the instantaneous transitions S-fail1, S-fail2, and S-fail3 are introduced to model the failure criteria of the different phases. They are controlled by the guards shown in Table 5.2 and, once enabled, flush all the tokens of the corresponding input places to the FAIL place, which represents the failure of the phase and the consequent failure of the PMS. Notice that the input arcs of these transitions are variable cardinality arcs. The initial marking of the SN submodel is shown in Figure 5.6, with three tokens in the Up place and one in Spare, as prescribed by the requirements shown in Table 2.1 for phase 1 of the mission. No other tokens are assigned to the remaining places of the SN.
It is interesting to observe how closely the conditions relating the behaviour of the two submodels, as presented in Tables 5.1 and 5.2, resemble the form these same conditions are given in Table 2.1 of Chapter 2. In fact, high level DSPN modelling allows models of PMSs to be defined in a way that is very similar to a system specification language: easily understandable, unambiguous and easily modifiable.
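To illustrate this specification-like character, the phase-dependent parameters of Table 5.2 can be encoded directly as data, one entry per PhN place that may hold the phase token. This is a purely illustrative sketch: the names (PHASE_PARAMS, t1_rate, enabled) and the value of LAMBDA are invented here, not part of any DSPN tool, and the three S-fail guards are collapsed into one since they coincide.

```python
# Hypothetical encoding of Table 5.2 as marking-dependent predicates.
# All identifiers and numeric values are illustrative.

LAMBDA = 1.0e-4  # base failure rate lambda (illustrative value)

# For each phase place: multiplier k in the firing rate k*lambda*m(Up) of t1,
# and the thresholds used by the guards of the instantaneous transitions.
PHASE_PARAMS = {
    "P1":  dict(rate_mult=2, up_off=3, up_rec=2, down_fail=3),
    "P2":  dict(rate_mult=1, up_off=2, up_rec=1, down_fail=4),
    "P3":  dict(rate_mult=5, up_off=3, up_rec=2, down_fail=2),
    "P4":  dict(rate_mult=2, up_off=3, up_rec=2, down_fail=2),
    "P4'": dict(rate_mult=2, up_off=3, up_rec=2, down_fail=2),
}

def t1_rate(phase, m_up):
    """Firing rate of t1: k * lambda * m(Up), with k set by the PhN marking."""
    return PHASE_PARAMS[phase]["rate_mult"] * LAMBDA * m_up

def enabled(phase, m_up, m_down, m_fail=0):
    """Evaluate the guards of the instantaneous SN transitions."""
    p = PHASE_PARAMS[phase]
    return {
        "Turn-off": m_up > p["up_off"],
        "Rec-ok":  m_fail == 0 and m_up < p["up_rec"],
        "Rec-nok": m_fail == 0 and m_up < p["up_rec"],
        "S-fail":  m_down == p["down_fail"],   # same guard for S-fail1/2/3
    }
```

Rearranging the mission then amounts to editing this table of predicates, which is precisely the kind of modifiability the text claims for the DSPN models.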
5.4.2 Evaluation
We evaluate the PMS Reliability R_i at the completion time of phase i, i = 1,2,3,4, with the same parameter values as those introduced in Table 4.1. However, it is important to keep in mind that the dynamic duration of phase 3 is now taken into account by the marking-dependent firing delay of the corresponding deterministic transition of the PhN. The Reliability values for the DSPN model solution are shown in Figure 5.7. It is interesting to compare these results with the pairs obtained with the hierarchical modelling methodology, shown in Figure 4.13. The Reliability R_i obtained with the DSPN model always lies within the interval identified by the pair of values of R_i returned by the hierarchical model solution for l = τ3 and l = τ3 r.
The total reward cumulated during the mission of the PMS for various values of c, the
probability of successful reconfiguration, is shown in Figure 5.8. As in the case of the
Reliability, the rewards obtained with the DSPN approach are within the intervals de-
fined by the pair of reward values evaluated with the hierarchical Markov modelling ap-
proach, shown in Figure 4.14.
[Figure: Reliability values R1, R2, R3, R4, plotted on a scale from 0.65 to 1]
Figure 5.7: Reliability at phase completion times
[Figure: Reward (0.6 to 0.9) plotted against c, the probability of successful reconfiguration, for c = 0.85, 0.9, 0.95, 0.99, 0.995, 0.999]
Figure 5.8: Reward cumulated during the mission
5.5 Concluding remarks
In this Chapter we have presented a new method to model PMS dependability-related characteristics using DSPN models. The method takes advantage of the power and expressiveness of DSPNs to provide a modelling methodology that results in compact and easy-to-read models of PMSs. At the same time, the DSPN models of PMSs can be solved very efficiently, by exploiting the peculiar characteristics of the underlying MRGP marking process.
According to the proposed methodology, the model of a PMS is built as a single DSPN, though logically split in two parts which separately represent the sequence of phases encompassed by the PMS and the failure/repair behaviour of its components. As we discussed in Chapter 2, the methods based on the single model approach offer the relevant advantage of implicitly solving the issues related to the treatment of the dependencies among phases. Indeed, for our DSPN model, the behaviour of the PMS between two consecutive phases is represented by a branching probability matrix that can be automatically obtained from the analysis of the reachability graph. This completely relieves the dependability analyst of the problem of mapping the marking occupation probabilities, which plagues the approaches based on separate modelling.
Besides, DSPN modelling permits a quite satisfactory level of mission flexibility to be represented. The dynamic selection of the next phase to be performed is easily modelled. Further, it is possible to take into account phases whose duration can be adjusted as the state of the system changes, a feature that has never been considered by any of the other methods that appeared in the literature. Since the phase-dependent behaviour of the system is described with a simple list of marking-dependent predicates, it becomes easy to rearrange and modify the model of PMSs.
These interesting characteristics are reinforced by the existence of a very efficient solution procedure for the evaluation of the time-dependent marking occupation probabilities. We have obtained this efficient solution technique by specialising the general transient analysis method that has been proposed for MRGPs. Remarkably, the computational complexity of this evaluation technique for the DSPN models of PMSs is exactly the same as that of the cheapest methodologies presented in Chapter 2. Moreover, another interesting aspect of the DSPN approach is that it easily lends itself to the automation of the solution steps, by exploiting the capabilities of the automated tools for the evaluation of system dependability.
We have applied our method to the quite complex example of PMS defined in Chapter 2, to show how well it can deal with nearly all its peculiar features, including those related to the flexibility of the mission. The main limitation of the DSPN approach is that it still requires some of the same restrictive assumptions as the other methods proposed in the literature. In particular, the durations of the phases are constrained to be deterministic, and the phase model is restricted to contain only exponential and instantaneous transitions.
However, this methodology opens a new direction in the research on PMSs. In the next chapter of this thesis, we exploit the full potential of MRGPs to widen the class of systems that can be analysed.
Chapter 6
MRSPN models for PMSs
In this chapter we address the most relevant weakness shared by all the methods that have appeared in the literature, that is, the lack of an exact analytic solution for PMSs with random phase durations. To this aim, we continue the exploitation of the potential of MRGP theory, started in the previous chapter with the DSPN modelling of PMSs.
As introduced in Chapter 1, the class of Markov regenerative stochastic Petri nets (MRSPNs) has been especially defined to characterise those Petri net models whose underlying marking process exhibits the MRGP structure, and is thus amenable to an exact analytical solution within the framework of MRGP theory. MRSPNs allow for transitions having a general distribution of the firing delays, and thus permit the exact modelling and evaluation of a dramatically wider class of systems with respect to the other classes of Petri nets.
Here, we propose MRSPN models for the analysis of PMSs. We basically follow the same single model approach that has been fruitfully employed in the case of the DSPN methodology. We specialise the general MRGP theory that applies to the solution of the MRSPN models, to exploit the special phased structure of PMSs. The resulting analytical solution technique for PMSs has a low computational complexity, which is again basically dominated by the cost of the separate analysis of the PMS inside each phase. This way, the solution issues posed by the phased behaviour are completely solved in the general scenario of PMSs with general random phase durations. The solution for PMSs with deterministic phase durations is obtained as a particular case, together with all the other scenarios previously treated by the various studies in the literature.
MRSPN models for PMSs 101
The rest of the chapter is organised as follows. We first present the MRSPN approach for PMSs, discussing the extensions of modelling power that it offers. Then, we define the analytical solution procedure for the MRGP underlying the MRSPN model, by specialising the general solution approach presented in Chapter 1. We obtain the exact analytical solution for the Laplace-Stieltjes transforms of the time-dependent marking occupation probabilities, and for the probability R of successfully completing the mission. Also, some specialisations of the general results are given for a set of interesting particular cases, and the applicability limits of the MRSPN modelling approach are discussed. Furthermore, the analytical derivation of the sensitivity functions is presented. The modelling approach is then applied to our example of PMS, and its evaluation carried out in terms of the Reliability R_i at phase i completion time, i = 1,2,…,n. Last, some concluding remarks are given to compare the MRSPN approach to the other ones that have appeared in the literature.
6.1 The modelling approach
The modelling of PMSs through MRSPN models follows the general approach we presented in the last chapter for DSPNs. We adopt the same single modelling strategy, building a model of the PMS that is logically split in a submodel called PhN, which represents the mission, and a submodel called SN, which represents the failure/repair behaviour of the system components. Moreover, we also exploit the high level modelling features introduced for DSPNs, such as marking-dependent firing rates, guards on transitions and variable cardinality arcs, which allow a concise description of the PMS dynamics. In this way, we follow the same general model of a PMS shown in Figure 5.1. However, since the MRSPN class also allows for generally distributed firing times, we do not impose any constraint on the transitions of the PhN subnet.
To ensure that such a Petri net model is indeed a MRSPN, or equivalently that its underlying marking process is a MRGP, we can check whether the condition expressed by Assumption 1.2 is satisfied, which requires that at most one general transition be enabled in each marking of the Petri net model. As for the DSPN models introduced in the last chapter, this condition is fulfilled by the general MRSPN model sketched above,
provided that the only general transitions of the model are those included in the PhN subnet.
This constraint prevents the concurrent enabling of generally distributed transitions. In fact, we cannot introduce any general or deterministic transition in the SN submodel without violating Assumption 1.2. However, in the special case of the models for PMSs, a much less restrictive assumption is needed to ensure that the Petri net model is indeed a MRSPN:
Assumption 6.1: whenever a timed transition fires in the PhN, each enabled transition resamples a new firing time, which can only depend on the marking of the net.
Owing to the structure of the model, this condition is sufficient to ensure that the firing
times of the timed transitions in the PhN are regeneration points of the MRSPN model,
and thus the underlying marking process is a MRGP. Note that it implies that all memory
of the model evolution, apart from that included in the marking, is lost whenever a new
phase begins. The limitations that this regeneration imposes on the representativity of the
models will be carefully discussed in the following.
It is worthwhile observing that this condition is much less restrictive than the one prescribed by Assumption 1.2. Between two regeneration points, the marking process of the model can be a completely general stochastic process. For instance, this makes it possible to include general transitions in the SN submodel, which can be concurrently enabled with the general transitions of the PhN, as long as no transition fires in the PhN. Moreover, Assumption 6.1 could be further relaxed. In fact, what is actually required to ensure that the model is a MRSPN is that a sequence of regeneration points exists. Therefore, it would be sufficient that the property stated by Assumption 6.1 hold for some subset of transitions in the PhN. However, throughout the rest of the chapter we assume that Assumption 6.1 holds for the model of the PMS.
6.2 Analytical solution
We now present the specialisation of the general solution method for MRGPs to the analytical study of the MRSPN models of PMSs. We consider in the following the case when the PhN submodel shows a linear structure, that is, the PMS sequentially performs the n phases the mission consists of. The generalisation of the results to the models of PMSs with a tree structure is simply a matter of slightly more complicated notation, as it was for the DSPN modelling approach. In the following, we first derive the analytical expression for the transient marking occupation probabilities, and then we focus on the evaluation of the Reliability figures, obtaining quite simple and compact expressions for some interesting cases of PMSs. Last, we analytically obtain the sensitivity functions for both the transient marking occupation probabilities and the Reliability of PMSs.
6.2.1 The transient marking occupation probabilities
Let 1,2,…,n be the ordered sequence of phases performed through the mission. Let t_i^Gen be the transition of the PhN which models the time the PMS spends in phase i, i = 1,2,…,n. The firing time of transition t_i^Gen is assumed to be a general random variable having cumulative distribution function F_i(t), and probability density function f_i(t) = F'_i(t), i = 1,2,…,n.
Denote with S the set of markings that form the reachability graph of the MRSPN model, and with {m(t), t ≥ 0} the stochastic process representing the evolution of the marking of the MRSPN. Define T_0 = 0, and let T_i be the time instant at which transition t_i^Gen completes, i = 1,2,…,n. As in the case of the DSPN models of PMSs, it is easy to check that the phase ending times T_i represent regeneration points for the marking process {m(t), t ≥ 0}. Indeed, the property stated by Assumption 6.1 guarantees that the memoryless Markov property holds for the marking process of the MRSPN at the phase completion times. Thus, the sequence of bivariate random variables {(Y_i, T_i), i = 1,2,…,n}, where Y_i denotes the marking of the MRSPN model at the time instant immediately after T_i, is a Markov regenerative sequence, and consequently the marking process {m(t), t ≥ 0} is a MRGP.
The global kernel matrix K(t) = [k_{m,m'}(t)] and the local kernel matrix E(t) = [e_{m,m'}(t)], m, m' ∈ S, t ≥ 0, are defined as in the case of DSPN models. For the sake of clearness we recall the analytical expressions of their entries:

k_{m,m'}(t) = Prob[Y_1 = m', T_1 ≤ t | Y_0 = m]    (6.1)

e_{m,m'}(t) = Prob[m(t) = m', T_1 > t | Y_0 = m]    (6.2)
To obtain the transient probability matrix V ( t ), which describes the transient behaviour
of the MRGP marking process, we proceed with the same approach adopted for the
DSPN models of PMSs in Chapter 5, by separately studying the marking process
{ m( t ),t ≥ 0 } over the different phases.
Consider the subordinate process that describes the MRGP evolution during the time interval [T_{i−1}, T_i), when transition t_i^Gen is enabled. Notice that, due to the possibility of having concurrently enabled general transitions in the MRSPN model, the subordinate processes are not restricted to be Markov chains as in the case of DSPN models, but can be general stochastic processes. We denote with Π_i(t) the transient probability matrix of the subordinate process during the time period [T_{i−1}, T_i). Moreover, denote with S_i the state space of the subordinate process, i = 1,2,…,n. Because of the structure of the MRSPN model of a PMS, the sets S_i, i = 1,2,…,n, are disjoint from one another.
The set S \ ∪_{i=1}^{n} S_i contains those absorbing markings reached by the model at the end of the mission, when a token is in place STOP. Denote by S_{n+1} this set of markings. Let us observe that in the case of the DSPN models of PMSs we did not need to take into account the evolution of the marking process over the states of the set S_{n+1}, because the measures of interest could be evaluated from the transient marking occupation probabilities at the mission completion time, which was known. Here, for PMSs that have random phase durations, the mission completion time is unknown, and we must analyse the evolution of the PMS after the last phase ending time. We consider a fictitious phase n+1 of infinite duration to properly take into account the random times to absorption into the states of S_{n+1}.
The sets S_i, i = 1,2,…,n+1, form a partition of the marking set S. Let C_i be the cardinality of set S_i, i = 1,2,…,n+1. The global kernel K(t) and the local kernel E(t) matrices can be reordered according to the partition defined by the sets S_i, i = 1,2,…,n+1, and thus partitioned in blocks K_{i,j}(t) and E_{i,j}(t) of dimensions C_i × C_j. Matrices K(t) and E(t) exhibit exactly the same block structure they had in the case of the DSPN models of PMSs, that is, the block matrices K_{i,j}(t) and E_{i,j}(t) take non-zero values if and only if j = i + 1 and j = i, respectively. Indeed, this particular structure of the kernel matrices is due to the peculiar features of the PMS.
Let Δ_i, i = 1,2,…,n, be the C_i × C_{i+1} branching-probability matrix defined as follows:

Δ_i = [δ^i_{m,m'}] = Prob[Y_i = m' | m(T_i^−) = m],  m ∈ S_i, m' ∈ S_{i+1}

where T_i^− denotes the time instant immediately before T_i. The elements of the global kernel defined by Equation (6.1) can be rewritten as follows:

k_{m,m'}(t) = Prob[Y_1 = m', T_1 ≤ t | Y_0 = m] = ∫_0^t Prob[Y_1 = m', T_1 = u | Y_0 = m] du =
            = ∫_0^t Prob[Y_1 = m' | T_1 = u, Y_0 = m] f_i(u) du,  m, m' ∈ S, t ≥ 0    (6.3)
where we recall that f_i(u) = F'_i(u) is the probability density function of the duration of phase i, i = 1,2,…,n. According to Equation (6.3), the non-zero blocks K_{i,i+1}(t) of the global kernel matrix K(t) can be obtained as follows:

K_{i,i+1}(t) = ∫_0^t Π_i(u) Δ_i f_i(u) du,  i = 1,2,…,n, t ≥ 0
Similarly, the entries e_{m,m'}(t) of the local kernel matrix defined by Equation (6.2) can be rewritten as follows:

e_{m,m'}(t) = Prob[m(t) = m', T_1 > t | Y_0 = m] =
            = Prob[m(t) = m' | T_1 > t, Y_0 = m] ⋅ Prob[T_1 > t],  m, m' ∈ S, t ≥ 0

and consequently, the non-zero blocks E_{i,i}(t) of the local kernel matrix E(t) are given by:

E_{i,i}(t) = Π_i(t) (1 − F_i(t)),  i = 1,2,…,n, t ≥ 0
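A useful sanity check of the two kernel definitions is that, since Π_i(t) and Δ_i are row-stochastic, for every t the rows of K_{i,i+1}(t) must sum to F_i(t) and the rows of E_{i,i}(t) to 1 − F_i(t). The sketch below verifies this numerically for a hypothetical 2-state subordinate process, taken to be a CTMC only so that Π_i(t) = e^{Qt} is easy to compute (numpy/scipy assumed; all numeric values illustrative).

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad_vec

Q = np.array([[-0.3, 0.3],    # Up -> Down failures at rate 0.3 (illustrative)
              [ 0.0, 0.0]])   # Down absorbing inside the phase
Delta = np.eye(2)             # branching matrix: identity mapping (illustrative)
lam = 0.5                     # exponential phase duration, F(t) = 1 - exp(-lam*t)

def F(t): return 1.0 - np.exp(-lam * t)
def f(t): return lam * np.exp(-lam * t)

def K_block(t):
    """K_{i,i+1}(t) = integral_0^t Pi(u) Delta f(u) du, by quadrature."""
    val, _ = quad_vec(lambda u: expm(Q * u) @ Delta * f(u), 0.0, t)
    return val

def E_block(t):
    """E_{i,i}(t) = Pi(t) (1 - F(t))."""
    return expm(Q * t) * (1.0 - F(t))
```

Together, K_{i,i+1}(t)·1 + E_{i,i}(t)·1 = 1 for every t, reflecting the fact that at time t phase i has either already ended or is still running.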
Consider the block partition of matrix V(t) induced by the sets S_i, i = 1,2,…,n+1. Owing to the upper triangular and diagonal block structure of matrices K(t) and E(t), respectively, all the blocks V_{i,j}(t) are null if i > j, that is, matrix V(t) exhibits an upper triangular block form. The generalised Markov renewal Equation (1.5) can be rewritten according to the block partitioning of the matrices:

V_{i,j}(t) = E_{i,j}(t) + ∫_0^t dK_{i,i+1}(u) V_{i+1,j}(t − u),  i, j = 1,2,…,n+1, t ≥ 0    (6.4)

and a backward substitution procedure can be applied to obtain all the non-zero blocks of V(t) in the j-th column, starting from the diagonal block V_{j,j}(t) = E_{j,j}(t).
The system of integral equations (6.4) could be solved numerically in the time domain to obtain matrix V(t), but this would result in a computationally expensive procedure. Indeed, each substitution step requires the solution of the integral Equation (6.4) for the associated submatrices, which is costly and may require the introduction of numerical approximations.
Alternatively, we can resort to the analysis in the Laplace-Stieltjes transform (LST) domain. Define the Laplace transform L[f](s) of a function f(t) to be ∫_0^{+∞} e^{−st} f(t) dt, and its LST f°(s) as follows:

f°(s) = ∫_0^{+∞} e^{−st} df(t)

The properties of LSTs are thus derived from those of Laplace transforms, which for the sake of convenience we list in Appendix A.
If we take the LST of both sides of Equation (6.4) we obtain:

V°_{i,j}(s) = E°_{i,j}(s) + K°_{i,i+1}(s) V°_{i+1,j}(s),  i = 1,2,…,n+1, j ≥ i

where V°_{i,j}(s), E°_{i,j}(s), and K°_{i,i+1}(s) are the LSTs of V_{i,j}(t), E_{i,j}(t) and K_{i,i+1}(t), respectively. Then, the backward substitution procedure can be explicitly performed and we get the expression for each block of matrix V°(s), as follows:

V°_{i,j}(s) = ∏_{h=i}^{j−1} K°_{h,h+1}(s) ⋅ E°_{j,j}(s),  i = 1,2,…,n+1, j ≥ i    (6.5)
Equation (6.5) is remarkable for its generality. It allows one to derive the LST of the time-dependent marking occupation probabilities for the MRSPN model of a PMS with general phase durations and a general intraphase marking process. Of course, a numerical inverse transformation is needed to obtain the final result in the time domain. However, we only need to invert a single block V°_{i,j}(s), rather than performing a sequence of numerical solutions of integral equation systems, as required by the analysis in the time domain with Equation (6.4). Moreover, as we shall see in the following, the LSTs of the kernel blocks can be explicitly computed for a wide family of distributions of the phase duration.
6.2.2 Analytical evaluation of PMS Reliability
Starting from the transient MRGP probability matrix V(t), we can compute the probability R of successfully completing the mission. If m_0 is the initial probability vector over the markings of the reduced reachability graph of the PMS model, the expression m_0 ⋅ V_{1,n+1}(t) is the time-dependent marking occupation probability vector at time t, t ≥ 0, over the markings of S_{n+1}, which represent the state of the PMS at the end of the mission. Since these markings are all absorbing, we can compute probability R as the product m_0 ⋅ V_{1,n+1}(∞) ⋅ Θ, where Θ is the vector that selects those markings that represent successful states of the system, according to the success criterion defined for the mission of the PMS.
The computation of R is thus formulated as the problem of evaluating the limiting probability distribution of the transient MRGP process. Once the block matrix V_{1,n+1}(t) has been computed, the following limit needs to be obtained:

R = lim_{t→∞} m_0 ⋅ V_{1,n+1}(t) ⋅ Θ

However, it is also possible to take advantage of the LST theory to directly compute the limit without obtaining the time-domain expression of V_{1,n+1}(t), but rather dealing with its transform V°_{1,n+1}(s) given by Equation (6.5):
V°_{1,n+1}(s) = ∏_{i=1}^{n} K°_{i,i+1}(s) ⋅ E°_{n+1,n+1}(s) = ∏_{i=1}^{n} K°_{i,i+1}(s)
Notice that the last equivalence comes from the fact that all the markings of S_{n+1} are absorbing markings, and thus matrix E°_{n+1,n+1}(s) is the identity matrix. Then, because of the time-invariance of m_0 and Θ, and from the final-value theorem of Laplace transforms (see Appendix A), the reliability R is obtained as follows:

R = lim_{t→∞} m_0 ⋅ V_{1,n+1}(t) ⋅ Θ = m_0 ⋅ ( lim_{t→∞} V_{1,n+1}(t) ) ⋅ Θ = m_0 ⋅ ( lim_{s→0+} V°_{1,n+1}(s) ) ⋅ Θ
Let K*_{i,i+1} denote the limit for s → 0+ of matrix K°_{i,i+1}(s), which surely exists by the properties of the transient probability matrices. Then, we can rewrite the preceding formula as:

R = m_0 ⋅ lim_{s→0+} ∏_{i=1}^{n} K°_{i,i+1}(s) ⋅ Θ = m_0 ⋅ ∏_{i=1}^{n} K*_{i,i+1} ⋅ Θ    (6.6)
Thus, an explicit formula can be obtained for probability R, without any need to evaluate the transient state probability matrix of the MRGP underlying the PMS model. It is worthwhile observing that the approach followed to compute the reliability of the PMS at the time the mission ends can also be applied to obtain the reliability of the system at each phase completion instant. Indeed, each phase of the system can be seen as being the last one the system has to perform, and thus the Reliability R_i at the end of a phase i, i = 1,2,…,n, is computed as an absorption probability through the limit of the global kernel submatrix transforms.
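Once the limit kernels K*_{i,i+1} are available, Equation (6.6) and the per-phase reliabilities R_i reduce to products of small matrices. The sketch below illustrates this; the two K* matrices are made-up numbers for a two-marking state space per phase (operational, failed), not taken from the thesis example.

```python
import numpy as np

# Hypothetical limit kernels K*_{1,2} and K*_{2,3} (illustrative values).
K1 = np.array([[0.90, 0.10],      # phase 1: survive with prob 0.9, fail with 0.1
               [0.00, 1.00]])     # already-failed markings stay failed
K2 = np.array([[0.80, 0.20],
               [0.00, 1.00]])
m0 = np.array([1.0, 0.0])         # initially operational
Theta = np.array([1.0, 0.0])      # selects the successful markings

def phase_reliabilities(kernels, m0, Theta):
    """R_i = m0 . K*_{1,2} ... K*_{i,i+1} . Theta, for i = 1..n (Eq. 6.6)."""
    out, acc = [], m0.copy()
    for K in kernels:
        acc = acc @ K             # partial product of limit kernels
        out.append(float(acc @ Theta))
    return out
```

With these numbers the partial products give R_1 = 0.9 and R_2 = 0.9 ⋅ 0.8 = 0.72, showing how each phase can be treated as if it were the last one.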
6.2.3 Specialising the general results
In this section we present a specialisation of the general results provided by Equations (6.4) and (6.5) for the transient marking occupation probabilities, and by Equation (6.6) for the Reliability R at mission completion time. We consider three particular scenarios. First, we briefly reconsider the case when the phases have a deterministic duration, for which we can complete the derivation of the transient marking occupation probabilities in the time domain. Then, we explore the particular case when the subordinate process is a simple homogeneous continuous-time Markov chain, that is, all the timed transitions of the SN submodel are exponential transitions. In this scenario, we consider the special case of deterministic phase durations, for which we reobtain the same analytic results for the Reliability R as the ones obtained with the DSPN approach, thus providing a cross validation of the correctness of the two methodologies. Last, we discuss the case when the subordinate process is a more general stochastic process, and the effects of Assumption 6.1 on the representativity of the models, to point out the applicability limits of the MRSPN approach.
Constant phase duration
Let τ_i be the deterministic firing time of transition t_i^Gen, i = 1,2,…,n. In this special case, we can complete the analysis in the time domain without resorting to the LSTs, by completing the backward substitution in Equation (6.4) in the same way we have done for the DSPN modelling approach. The blocks of the global kernel matrix K(t) take the following form:

K_{i,i+1}(t) = ∫_0^t Π_i(u) Δ_i f_i(u) du = Π_i(τ_i) Δ_i u(t − τ_i),  i = 1,2,…,n, t ≥ 0

where u(⋅) denotes the unit step function,
and thus Equation (6.4) for the transient matrix blocks V_{i,j}(t) can be rewritten as follows:

V_{i,j}(t) = E_{i,j}(t) + ∫_0^t dK_{i,i+1}(u) V_{i+1,j}(t − u) =
           = E_{i,j}(t) + Π_i(τ_i) Δ_i ∫_0^t f_i(u) V_{i+1,j}(t − u) du =
           = E_{i,j}(t) + Π_i(τ_i) Δ_i V_{i+1,j}(t − τ_i),  i, j = 1,2,…,n+1, t ≥ 0
where the last integral is solved as in the case of the DSPN models of PMSs, due to the fact that the probability density function of the deterministic transition is the Dirac impulse function at τ_i, which allows us to reduce the convolution integral to just a time shift. After completing the backward substitution procedure, we obtain the following expression for the transient probability matrix blocks V_{i,j}(t):

V_{i,j}(t) = ∏_{h=i}^{j−1} Π_h(τ_h) Δ_h ⋅ E_{j,j}( t − ∑_{h=i}^{j−1} τ_h ),  i = 1,2,…,n, j ≥ i, t ≥ 0    (6.7)
where the product over an empty set is to be intended as the identity matrix. It is worthwhile observing the close resemblance of Equation (6.7) to its transform counterpart, Equation (6.5). Also, the form of Equation (6.7) is equivalent to that of Equation (5.2), obtained in Chapter 5 for the time-dependent marking occupation probabilities of the DSPN models of PMSs. However, it is important to remark that Equation (6.7) is applicable to any MRSPN model of a PMS with constant phase durations that satisfies Assumption 6.1, regardless of the structure of the SN submodel.
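The time-shift structure of Equation (6.7) can be checked numerically in a degenerate case: if the subordinate processes are CTMCs sharing a single generator Q, Δ_h = I, and a fictitious final phase of infinite duration (F = 0) follows the deterministic ones, the product of phase matrices must collapse to plain transient analysis, e^{Qt}. A sketch under these assumptions (numpy/scipy assumed; values illustrative):

```python
import numpy as np
from scipy.linalg import expm

Q = np.array([[-0.2, 0.2],       # illustrative 2-state generator, shared by all phases
              [ 0.0, 0.0]])
taus = [1.0, 2.5]                # deterministic durations of phases 1 and 2
t = 4.2                          # observation instant inside the fictitious phase 3

# Equation (6.7) with i = 1, j = 3: prod_h Pi_h(tau_h) Delta_h, then the
# E_{3,3} factor evaluated at the shifted time t - (tau_1 + tau_2).
V = np.eye(2)
for tau in taus:                 # Delta_h = I, Pi_h(tau_h) = expm(Q * tau_h)
    V = V @ expm(Q * tau)
V = V @ expm(Q * (t - sum(taus)))   # E_{3,3}(t - sum tau_h), with F_3 = 0
```

Since e^{Qτ1} e^{Qτ2} e^{Q(t−τ1−τ2)} = e^{Qt}, the result agrees with a direct transient analysis of the chain, confirming that the convolution has indeed reduced to a time shift.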
Markov chain subordinate processes
In this section we consider the particular case when the SN submodel contains only transitions whose firing delays are either exponentially distributed or instantaneous. In this case, the subordinate process during the time window [T_{i−1}, T_i) is a continuous-time Markov chain. This implies that the transient probability matrix Π_i(t) of the subordinate process can be expressed via the matrix exponential e^{Q_i t}, where Q_i is the infinitesimal generator of the subordinate Markov process, i = 1,2,…,n+1. The expressions given for the global and local kernel matrix blocks can be rewritten as follows:

K_{i,i+1}(t) = ∫_0^t e^{Q_i u} Δ_i f_i(u) du,  i = 1,2,…,n, t ≥ 0

E_{i,i}(t) = e^{Q_i t} (1 − F_i(t)),  i = 1,2,…,n+1, t ≥ 0
Quite interestingly, the LSTs of the preceding expressions can be computed in explicit form. Indeed, the LST of the global kernel matrix blocks is obtained as follows:

K°_{i,i+1}(s) = ∫_0^{+∞} e^{−st} dK_{i,i+1}(t) = L[e^{Q_i t} f_i(t)] Δ_i,  i = 1,2,…,n    (6.8)

where L[⋅] denotes the Laplace transform operator. The LST of the local kernel blocks is given by:

E°_{i,i}(s) = ∫_0^{+∞} e^{−st} dE_{i,i}(t) = s L[e^{Q_i t} (1 − F_i(t))],  i = 1,2,…,n+1    (6.9)
These expressions for the LSTs can be completely worked out and the integration solved in many interesting special cases. For instance, consider the case when the duration of phase i has an exponential distribution of parameter λ, that is F_i(t) = 1 − e^{−λt} and f_i(t) = λ e^{−λt}, t ≥ 0. In this case, due to the properties of Laplace transforms, Equation (6.8) can be rewritten as follows:

K°_{i,i+1}(s) = L[e^{Q_i t} λ e^{−λt}] Δ_i = λ ((λ + s)I − Q_i)^{−1} Δ_i

where to obtain the last expression we recall that the Laplace transform of the matrix exponential e^{Mt} is given by (sI − M)^{−1}. Similarly, Equation (6.9) for the local kernel block matrices becomes:

E°_{i,i}(s) = s L[e^{Q_i t} e^{−λt}] = s ((λ + s)I − Q_i)^{−1}
Consider now the case when phase i has a deterministic duration τ. From Equation (6.8) we obtain the following expression for the global kernel block matrices:

K°_{i,i+1}(s) = L[e^{Q_i t} f_i(t)] Δ_i = e^{−sτ} e^{Q_i τ} Δ_i = e^{−(sI − Q_i)τ} Δ_i

The local kernel block matrices can be obtained as follows:

E°_{i,i}(s) = s (sI − Q_i)^{−1} ( I − e^{−(sI − Q_i)τ} )

The detailed derivation of these analytical expressions for the LSTs of the kernel matrices can be found in Appendix B, where other interesting cases are also presented.
Owing to the fact that explicit expressions for the LSTs of the global kernel matrix blocks are available, we can compute the Reliability R of successfully completing the mission by applying Equation (6.6). Suppose for instance that the phase durations are all exponentially distributed, and let λ_i be the firing rate of transition t_i^Gen in the PhN submodel. Then, Equation (6.6) takes the following form:

R = m_0 ⋅ lim_{s→0+} ∏_{i=1}^{n} λ_i ((s + λ_i)I − Q_i)^{−1} Δ_i ⋅ Θ = m_0 ⋅ ∏_{i=1}^{n} λ_i (λ_i I − Q_i)^{−1} Δ_i ⋅ Θ    (6.10)
Notice the simplicity of Equation (6.10), which allows the Reliability R to be evaluated at a very limited cost. If the phase durations are all deterministic, and phase i lasts for τ_i units of time, Equation (6.6) specialises as follows:

R = m_0 ⋅ lim_{s→0+} ∏_{i=1}^{n} e^{−(sI − Q_i)τ_i} Δ_i ⋅ Θ = m_0 ⋅ ∏_{i=1}^{n} e^{Q_i τ_i} Δ_i ⋅ Θ    (6.11)
in perfect agreement with the result already established in Chapter 5 with the time-domain analysis of the MRGP transient marking occupation probabilities of the DSPN models of PMSs.
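For a hypothetical two-state component (one operational marking failing at rate γ, one absorbing failed marking, Δ_i = I), the single-phase instances of Equations (6.10) and (6.11) have elementary answers, λ/(λ+γ) and e^{−γτ} respectively, which the matrix expressions must reproduce. A sketch under these assumptions (numpy/scipy assumed; parameter values illustrative):

```python
import numpy as np
from scipy.linalg import expm

gamma, lam, tau = 0.1, 0.5, 3.0          # failure rate, phase rate, phase length
Q = np.array([[-gamma, gamma],           # Up -> Fail at rate gamma
              [0.0,    0.0]])            # Fail absorbing
m0 = np.array([1.0, 0.0])                # start operational
Theta = np.array([1.0, 0.0])             # success = still operational
I = np.eye(2)

# Equation (6.10), single exponential phase of rate lam:
R_exp = m0 @ (lam * np.linalg.inv(lam * I - Q)) @ Theta
# Equation (6.11), single deterministic phase of length tau:
R_det = m0 @ expm(Q * tau) @ Theta
```

Here R_exp is the probability that the phase ends (rate λ) before the component fails (rate γ), a race between two exponentials, while R_det is simply the probability of surviving for τ time units.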
Deterministic and random phase durations can be freely mixed, and the Reliability of the PMS is still easily obtained according to Equation (6.6). For instance, consider a periodic PMS alternating the execution of a phase whose duration is a random variable exponentially distributed with parameter λ, and a constant duration phase which lasts for τ units of time. The PMS cyclically executes the two phases over time. The Reliability R_{2n} of the system at the time the 2n-th phase ends is obtained as follows:

R_{2n} = m_0 ⋅ lim_{s→0+} ( λ((s + λ)I − Q_1)^{−1} Δ_1 e^{−(sI − Q_2)τ} Δ_2 )^n ⋅ Θ =
       = m_0 ⋅ ( λ(λI − Q_1)^{−1} Δ_1 e^{Q_2 τ} Δ_2 )^n ⋅ Θ
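For a hypothetical two-state component failing at rate γ in both phases (Q_1 = Q_2, Δ_1 = Δ_2 = I), each exponential phase contributes a factor λ/(λ+γ) and each deterministic phase a factor e^{−γτ}, so the cycle formula above should give R_{2n} = (λ/(λ+γ) ⋅ e^{−γτ})^n. A sketch under these assumptions (numpy/scipy assumed; values illustrative):

```python
import numpy as np
from scipy.linalg import expm

gamma, lam, tau, n = 0.1, 0.5, 3.0, 4    # illustrative parameters, n cycles
Q1 = Q2 = np.array([[-gamma, gamma],     # Up -> Fail at rate gamma
                    [0.0,    0.0]])
D1 = D2 = np.eye(2)                      # branching matrices (identity, illustrative)
m0, Theta = np.array([1.0, 0.0]), np.array([1.0, 0.0])

# One cycle: exponential phase (rate lam) followed by deterministic phase (tau).
cycle = lam * np.linalg.inv(lam * np.eye(2) - Q1) @ D1 @ expm(Q2 * tau) @ D2
R_2n = m0 @ np.linalg.matrix_power(cycle, n) @ Theta
```

The mission reliability thus decays geometrically with the number of cycles, as expected for a periodic mission with statistically identical cycles.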
General subordinate processes
In case the SN submodel is a non-Markovian Petri net, the subordinate process can be any stochastic process. Still, the general results stated by Equations (6.4), (6.5), and (6.6) remain valid.
The limit of practical applicability of the theoretical results derived in this chapter for the MRSPN models of PMSs lies in the possibility of evaluating the time-dependent matrix Π_i(t) of marking occupation probabilities of the subordinate process during the enabling period of transition t_i^Gen, i = 1,2,…,n. This information is absolutely necessary to complete the analytical derivation of the expressions for the time-dependent marking occupation probabilities and the Reliability of the models.
Besides the particular case of homogeneous Markov chains we dealt with above, there are
other types of subordinate process that are amenable to an analytical solution in terms of
their time-dependent marking occupation probabilities. For instance, the subordinate process
can be a non homogeneous Markov chain, for which exact transient analysis methods
exist [71]. Another interesting case is when the subordinate process is a semi-Markov
process, as happens when some general transitions are included in the SN submodel [33]
and they restart their firing each time a marking change occurs. Last, we consider the case
when the subordinate process is a MRGP itself. In this case the SN is allowed to contain
general transitions, which must however fulfil Assumption 1.2, that is, a single general
transition can be enabled in each marking of the nested MRGP. The transient marking
occupation probabilities of the subordinate process can then be computed according to
Equation (1.5).
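For the homogeneous Markov case, Π_i(t) = e^{Q_i t} can be computed, for instance, with the standard uniformization technique. The sketch below is a generic illustration with a hypothetical 2-state generator; it is not tied to any specific model in this chapter.

```python
import numpy as np

def transient_probs(Q, t, tol=1e-12):
    """Pi(t) = e^(Q t) for a CTMC generator Q, by uniformization:
    e^(Q t) = sum_k Poisson(k; q*t) * P^k, with P = I + Q/q, q >= max exit rate."""
    n = Q.shape[0]
    q = max(float(-Q.diagonal().min()), 1e-12) * 1.05  # uniformization rate
    P = np.eye(n) + Q / q                              # uniformized DTMC
    weight = np.exp(-q * t)                            # Poisson mass at k = 0
    Pk = np.eye(n)                                     # running power P^k
    result = weight * Pk
    acc, k = weight, 0
    while acc < 1.0 - tol:                             # stop once Poisson mass ~ 1
        k += 1
        weight *= q * t / k
        Pk = Pk @ P
        result += weight * Pk
        acc += weight
    return result

# Hypothetical 2-state repairable component: failure rate 0.2, repair rate 0.05.
Q = np.array([[-0.2, 0.2], [0.05, -0.05]])
Pi = transient_probs(Q, 10.0)
print(Pi)
```

Uniformization only involves vector-matrix products with a stochastic matrix, which keeps it numerically stable even for moderately stiff generators.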
In all the scenarios above, the condition stated by Assumption 6.1 requires that the memory
of the subordinate process be lost each time a phase completes. This constraint that the
PhN submodel imposes on the SN submodel leads to an approximate modelling whenever
this loss of memory is not part of the semantics of the activities modelled in the SN. The loss
of memory can be reflected in different ways in the behaviour of the non memoryless
transitions included in the SN. In fact, we can model the loss of memory either with a
premature completion of all the timed non exponential transitions of the SN at the phase
ending times, or with their restart at the beginning of the next phase, provided that they
are still enabled. These two possibilities can in some cases be conveniently exploited
to compute bounds on the error that the approximate modelling introduces in the
outcomes of the evaluation, as we shall explain in the following, when dealing with the
modelling of our example of PMS.
6.2.4 Sensitivity analysis of MRSPN models of PMSs
Suppose now that some of the parameter values defining the PMS model can vary,
for instance because they are subject to estimation errors, or because different scenarios
of the system are to be analysed. In this case, an additional analysis strategy for
the PMS is offered by the study of the sensitivity of the model with respect to the
parameter variations. As an example of the various sensitivity studies that can be performed,
we analytically derive the sensitivity functions that represent the effects that parameter
variations may have on the transient marking occupation probability matrix V(t)
of the MRGP underlying the PMS, and on the probability R of successfully completing
the mission.
Sensitivity analysis of V (t)
To carry out the sensitivity analysis of the transient marking occupation probability matrix,
we consider the matrix V(t) as a function of some independent parameter θ,
that is V(t) = V(θ;t). We denote with S(θ;t) the sensitivity function matrix, that is, the
derivative of V(θ;t) with respect to θ.
Matrix S(θ;t) can be partitioned into blocks S_{i,j}(θ;t) according to the same partition
as that defined for V(θ;t). Since the expression for the transient marking occupation
probabilities is given in terms of the LSTs of the kernel matrices in Equation (6.5), we obtain
the sensitivity function matrix in the transform domain. Let us denote with
V°_{i,j}(θ;s) and S°_{i,j}(θ;s) the Laplace-Stieltjes transforms of the block matrices V_{i,j}(θ;t) and
S_{i,j}(θ;t), respectively. The sensitivity function transform is obtained as follows:
S°_{i,j}(θ;s) = ∫_{0}^{+∞} e^{−st} d( ∂V_{i,j}(θ;t)/∂θ ) = ∂/∂θ ∫_{0}^{+∞} e^{−st} dV_{i,j}(θ;t) = ∂V°_{i,j}(θ;s)/∂θ
where the second equality comes from the fact that θ is independent of both s and t.
Carrying out the differentiation of V°_{i,j}(θ;s) gives the following expression for the
sensitivity function:
S°_{i,j}(θ;s) = ∂/∂θ [ ∏_{h=i}^{j−1} K°_{h,h+1}(s) · E°_{j,j}(s) ] =
= Σ_{h=i}^{j−1} [ ∏_{l=i}^{h−1} K°_{l,l+1}(s) · (∂K°_{h,h+1}(s)/∂θ) · ∏_{l=h+1}^{j−1} K°_{l,l+1}(s) ] · E°_{j,j}(s) + ∏_{h=i}^{j−1} K°_{h,h+1}(s) · (∂E°_{j,j}(s)/∂θ)    (6.12)
Thus, the evaluation of the sensitivity function S(θ;t) is reduced to the computation of
the derivatives with respect to θ of the LSTs of the blocks of the kernel matrices.
When more specific assumptions on the structure of the MRSPN model of the PMS are
made, the general Equation (6.12) can be further simplified. For instance, when the
subordinate processes are simple homogeneous Markov chains, the derivatives of the
LSTs of the kernel matrix blocks can be obtained as follows:
∂K°_{i,i+1}(s)/∂θ = ∂/∂θ L[ e^{Q_i t} Δ_i f_i(t) ] = L[ ∂/∂θ ( e^{Q_i t} Δ_i f_i(t) ) ] =
= L[ (∂e^{Q_i t}/∂θ) Δ_i f_i(t) ] + L[ e^{Q_i t} (∂Δ_i/∂θ) f_i(t) ] + L[ e^{Q_i t} Δ_i (∂f_i(t)/∂θ) ]

∂E°_{i,i}(s)/∂θ = ∂/∂θ s·L[ e^{Q_i t} (1 − F_i(t)) ] = s·L[ ∂/∂θ ( e^{Q_i t} (1 − F_i(t)) ) ] =
= s·L[ (∂e^{Q_i t}/∂θ) (1 − F_i(t)) ] − s·L[ e^{Q_i t} (∂F_i(t)/∂θ) ]
Sensitivity analysis of R
Consider now the probability R of successfully completing the mission as being a func-
tion of some independent parameter θ , that is R = R(θ ). Let r(θ ) denote the derivative
of R(θ ) with respect to parameter θ . From Equation (6.6), we can evaluate function
r(θ ) as follows:
r(θ) = ∂R(θ)/∂θ = ∂/∂θ ( m0 · ∏_{i=1}^{n} K*_{i,i+1} · Θ ) = m0 · ∂/∂θ ( ∏_{i=1}^{n} K*_{i,i+1} ) · Θ
The derivative of the matrix product in the preceding expression can be obtained as follows:
∂/∂θ ∏_{i=1}^{n} K*_{i,i+1} = Σ_{j=1}^{n} [ ∏_{h=1}^{j−1} K*_{h,h+1} · (∂K*_{j,j+1}/∂θ) · ∏_{h=j+1}^{n} K*_{h,h+1} ]    (6.13)
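Equation (6.13) is the product rule for a matrix product, and it can be sanity-checked against a central finite difference. The parameterised 2×2 matrices K_i(θ) below are arbitrary smooth illustrative functions, not the kernel limits of any specific model.

```python
import numpy as np

# Hypothetical smooth parameter-dependent matrices and their exact derivatives.
def K(i, theta):
    return np.array([[1.0, theta * (i + 1)], [0.0, np.exp(-theta * i)]])

def dK(i, theta):
    return np.array([[0.0, float(i + 1)], [0.0, -i * np.exp(-theta * i)]])

def product_derivative(theta, n):
    """sum_j (prod_{h<j} K_h) dK_j (prod_{h>j} K_h), as in Equation (6.13)."""
    total = np.zeros((2, 2))
    for j in range(n):
        left = np.eye(2)
        for h in range(j):
            left = left @ K(h, theta)
        right = np.eye(2)
        for h in range(j + 1, n):
            right = right @ K(h, theta)
        total += left @ dK(j, theta) @ right
    return total

def full_product(theta, n):
    M = np.eye(2)
    for i in range(n):
        M = M @ K(i, theta)
    return M

theta, n, eps = 0.3, 4, 1e-6
fd = (full_product(theta + eps, n) - full_product(theta - eps, n)) / (2 * eps)
print(np.max(np.abs(product_derivative(theta, n) - fd)))
```

Such a finite-difference cross-check is a cheap way to validate a sensitivity implementation before trusting its output.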
If we consider a particular distribution of the phase durations and a specific type of
subordinate process, the general expression given above for function r(θ) can be further
specialised. For instance, consider a MRSPN model for a PMS with exponentially
distributed phase durations, and a subordinate process that in each phase is a homogeneous
Markov chain. In this case, by combining Equations (6.10) and (6.13), we obtain the
following expression for r(θ):
r(θ) = m0 · Σ_{j=1}^{n} [ ∏_{h=1}^{j−1} (I − Q_h/λ_h)^{−1} Δ_h ] · ∂/∂θ( (I − Q_j/λ_j)^{−1} Δ_j ) · [ ∏_{h=j+1}^{n} (I − Q_h/λ_h)^{−1} Δ_h ] · Θ
Note that the expression involving the derivative of matrix (I − Q_j/λ_j)^{−1} can be
worked out as follows:
∂/∂θ ( (I − Q_j/λ_j)^{−1} ) Δ_j = (I − Q_j/λ_j)^{−1} · (∂(Q_j/λ_j)/∂θ) · (I − Q_j/λ_j)^{−1} Δ_j
6.3 Evaluation procedure
The procedure to compute the time-dependent marking occupation probabilities according
to Equations (6.4) and (6.5), and the Reliability according to Equation (6.6), depends
on the specific characteristics of the MRSPN model of the PMS. In particular, the properties
of the subordinate processes and of the phase duration distributions can affect the
evaluation procedure. However, the evaluation of the overall PMS is always
reduced to the sequential analysis of its MRSPN model inside the various phases.
Therefore, an efficient solution scheme similar to the one presented for the DSPN models
of PMSs applies.
For instance, suppose we are interested in computing the transient marking occupation
probabilities of the MRSPN model at the end of the mission, that is, the block
V_{1,n+1}(t) of the transient probability matrix. The solution algorithm takes as input the
MRSPN model and its initial probability vector, and performs the following steps:
1) builds the i-th subordinate process, from the analysis of the reachability graph
of the model during the enabling period of transition t_i^Gen, i = 1,2,…,n;
2) builds the branching probability matrix Δ_i from phase i to phase i + 1,
i = 1,2,…,n;
3) obtains the transient state occupation probability matrix Π_i(t) of the i-th
subordinate process, i = 1,2,…,n;
4) if phases have deterministic durations, completes the analysis in the time domain
according to Equation (6.4); otherwise, computes the LSTs of the kernel matrix
blocks and applies Equation (6.5).
The computational complexity of the preceding steps is dominated by that of steps 3)
and 4), which heavily depend on the particular characteristics of the specific
MRSPN model.
We can easily derive the computational complexity required to evaluate the Reliability R
at the mission completion time when each subordinate process is a homogeneous
Markov chain. In this scenario, according to Equation (6.6), we basically need to apply
the steps of the following algorithm:
1) build the Markov chain subordinate to phase i, and obtain its transition rate
matrix Q_i, i = 1,2,…,n;
2) build the branching probability matrix Δ_i from phase i to phase i + 1,
i = 1,2,…,n;
3) obtain the limit K*_{i,i+1} as s → 0+ of the LST K°_{i,i+1}(s) of the global kernel
matrix blocks, i = 1,2,…,n;
4) multiply the initial probability vector, the matrices K*_{i,i+1}, and the vector Θ in
the order given by Equation (6.6), to obtain the Reliability R.
The computational complexity of this algorithm is dominated by the evaluation of the
limit matrices in step 3). Indeed, apart from the special case of deterministic phase durations,
the inversion of a matrix of the form I − Q_i is in general needed (see Appendix B) to
derive each of the matrices K*_{i,i+1}, i = 1,2,…,n. This inversion requires
O(C_i^3) operations (multiplications), where C_i is the cardinality of the state space S_i of
the Markov chain subordinate to phase i. Thus, in this scenario of PMS, the overall
complexity required for the evaluation of the Reliability R is dominated by O(Σ_{i=1}^{n} C_i^3).
6.4 Application example
6.4.1 Modelling
In this section we discuss the MRSPN modelling of our example of PMS. The great
modelling power of MRSPNs makes it possible to take into account all the features of the PMS. In
particular, we can accommodate the random duration of the phases, the flexibility
characteristics of the mission, and the general distribution of the time needed to repair
the failed processors.
The sole approximation that may need to be introduced is due to the loss of memory of the
non exponential repair activities at the phase completion times. Notice that this leads to
an approximate modelling if and only if such a loss of memory does not respect
the semantics of the repair itself. In fact, we can think of some interesting cases in which the
repair activities can be stopped at the phase ending times, for instance when the next
phase to be performed is a maintenance phase, during which all the failed components are
replaced.
In any case, when such an approximation is actually introduced in the model, its
impact on the results can be easily estimated by considering the following bounding
approach. Suppose that, at the phase ending times, we decide to model the loss of memory
by forcing the premature completion of the repairs that are currently being executed.
It is easy to realise that such a model provides an optimistic estimation of the dependability
measures of the original PMS. Conversely, we can obtain a pessimistic estimation
of the dependability measures from another model of the PMS, in which the repairs
that are not completed when a phase ends simply start over as the next phase begins. These
two models provide an upper and a lower bound on the exact dependability measures,
thus allowing the error introduced by the loss of memory to be estimated.
Moreover, if the actual semantics of the repair is such that forcing the loss of memory
at the phase completion times does not allow a satisfactory modelling, some modifications
can be made to the upper and lower bound models to improve their accuracy, and
consequently narrow the gap between the pair of results they provide. For instance, one
could apply a technique similar to phase expansion [30], to split the repair into a
sequence of shorter activities, for which the influence of the loss of memory is
less relevant. In particular, we could model a deterministic repair of duration τ
with two consecutive deterministic transitions, each having constant delay τ/2. In
this way, the restart or the premature completion of a repair activity would affect only
one of the two stages, resulting in a better approximation of the original behaviour of
the PMS with respect to the case when the repair is modelled with a single transition.
The structure of the MRSPN model of the PMS is the same as that of the DSPN model we
built in Chapter 5, which is shown in Figures 5.5 and 5.6, and whose
definition is completed by Tables 5.1 and 5.2. The only differences are in the firing
time distributions of the PhN general transitions and in the distribution of the repair time.
6.4.2 Evaluation
The solution of the MRSPN model of our example of PMS is conducted with the
parameter values defined in Table 4.1. Moreover, we also need to specify the
distribution F_i(t) of the duration of phase i, i = 1,2,3,4, and the distribution μ(t) of the time needed
to repair a faulty processor. For phases 2 and 4 we select a random duration
that follows a truncated exponential distribution with expected values τ_2 and τ_4, and
whose truncation points are γ = 3000 hours and φ = 500 hours, respectively. This
means that phases 2 and 4 necessarily complete no later than their truncation points.
The cumulative distribution function F_2(t) for the duration of phase 2 is as follows:
F_2(t) = (1 − e^{−αt}) / (1 − e^{−αγ})    for 0 ≤ t < γ
F_2(t) = 1                                for t ≥ γ
and F_4(t) is similarly defined. Notice that the phase probability density functions
f_2(t) and f_4(t) are discontinuous. We keep the deterministic durations of
phases 1 and 3. For the time to repair, we select a 2-stage Erlang distribution. All these
distributions are completely defined by their expected values, already given in
Table 4.1.
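For illustration, the truncated exponential CDF F_2(t) can be coded directly. The rate α below is an arbitrary illustrative value; in the model it would be fixed so that the expected value of the distribution equals τ_2.

```python
import numpy as np

# Hypothetical rate alpha; truncation point gamma = 3000 hours, as in the text.
alpha, gamma = 1.0 / 1500.0, 3000.0

def F2(t):
    """Truncated exponential CDF: exponential(alpha) renormalised on [0, gamma),
    with all remaining mass placed before the truncation point."""
    if t >= gamma:
        return 1.0
    return (1.0 - np.exp(-alpha * t)) / (1.0 - np.exp(-alpha * gamma))

# The phase can never outlast its truncation point:
print(F2(0.0), F2(1500.0), F2(gamma))
```

The density drops to zero at t = γ, which is the discontinuity of f_2(t) mentioned above.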
We consider the pair of MRSPN models that are defined by taking into account the two
different policies to force the loss of the memory at the phase completion times. Thus,
we are actually solving two different models, which, as explained above, provide an upper
and lower bound on the exact results.
[Figure: bar chart comparing the lower and upper bounds on the Reliability values R1, R2, R3, R4 (y-axis range 0.5-1)]
Figure 6.1: Reliability at phase completion times
The evaluation results for the Reliability R_i at the completion time of phase i, i = 1,2,3,4, are
shown in Figure 6.1. Notice that the gap between the two bounds tends to grow as
the number of phases increases. Obviously, the lower bound results provide a conservative
estimation of the PMS Reliability, which can be acceptable in many critical applications.
The accuracy of the PMS Reliability evaluation can be improved by applying the phase
expansion technique to the transitions that lose their memory at phase boundaries. We
split the Erlang repairs into two exponential subrepair transitions of rate μ/2. When a
phase completes, only the subrepair that is currently being performed is affected. Again,
we can define two different MRSPN models, one modelling the premature completion
of the subrepair, and the other modelling its restart. Thus we
obtain a new pair of upper and lower bound models, which provide the much more
accurate Reliability results shown in Figure 6.2.
[Figure: bar chart of the tighter lower and upper bounds on R1, R2, R3, R4 obtained with phase expansion (y-axis range 0.5-1)]
Figure 6.2: A more accurate estimation of the Reliability at phase ending times
We evaluate the reward cumulated during the mission of the PMS for different values of
the probability c of successful spare insertion, with all four MRSPN models considered
so far. As shown in Figure 6.3, the second pair of lower and upper bound results
is much tighter than the first one. The exact reward value for the PMS we have
taken as example lies within the interval defined by the second pair of results.
[Figure: Reward (y-axis, 0.5-0.9) versus the spare insertion coverage c (x-axis, 0.85-0.999), for the first and second pairs of lower and upper bound models]
Figure 6.3: Reward cumulated during the PMS mission
6.5 Concluding remarks
The methodology based on MRSPNs we have introduced in this chapter provides a clear
and generally applicable modelling approach for the study of a wide class of PMSs. Even
very complex features of PMSs can be accommodated in our modelling framework, which
also provides a general analytical solution technique to evaluate the time-dependent be-
haviour and the Reliability figures of the MRSPN models of PMSs.
The modelling approach we employed is mainly based on the one proposed in the previous
chapter for the DSPN models of PMSs. Therefore, the high expressiveness, the consequent
simplicity of the modelling, and the possibility of considering flexible missions are also
found in the MRSPN models of PMSs. Furthermore, the modelling power of MRSPNs
makes it possible to relax the restrictive assumptions on the phase durations and on the types of
transitions that can be included in the SN submodel.
Therefore, very broad conditions suffice to guarantee the applicability of the
MRSPN methodology. We provided a specific sufficient condition (Assumption 6.1) that
exactly defines the set of Petri net models of PMSs that can be solved within the framework
of our MRSPN approach. If such an assumption is fulfilled, the results we derived by
specialising the MRGP theory can be applied; they provide an analytical solution
method for the study of the MRSPN transient marking occupation probabilities and of
the Reliability figures of the PMS.
Once again, the computation required for the solution of the overall model of
the PMS is reduced to that necessary to separately analyse the PMS inside the different
phases. However, we must observe that, due to the generality of MRSPNs, the exact
computational cost required for the application of our methodology cannot be estimated
in general, since it heavily depends on the characteristics of the subordinate processes.
The MRSPN modelling approach represents a relevant enlargement of the class of PMSs
that can be analytically studied. With respect to the class of PMSs that the methods proposed
in the literature (see Chapter 3) can deal with, we are now able to exactly represent
in our MRSPN models many more interesting features of PMSs, without renouncing
the usefulness of an analytical solution that can be carried out at a limited cost.
Chapter 7
Conclusions
In this thesis we have addressed the analytical modelling and evaluation of dependability
related characteristics for the class of systems that are known as phased mission systems.
PMSs are often encountered in many critical applications, and thus the evaluation of their
dependability attributes is a task of primary relevance. However, due to their dynamic
behaviour, a systematic and computationally efficient approach to the modelling and
evaluation of PMSs had not been proposed in the literature.
We have started our investigation in the first part of the dissertation from a review of the
methodologies that have been published over the last decades. Several different ap-
proaches have been considered, but the various peculiar aspects of PMSs represent a
source of formidable problems for the dependability modelling and analysis techniques
that have been applied for their solution. From our preliminary inspection, we have con-
cluded that a generally applicable methodology able to effectively deal with the various
features of PMSs was still missing.
Therefore, we have set as a goal for this dissertation the development of computationally
efficient and generally applicable modelling and evaluation methodologies for the depend-
ability analysis of PMSs. The key characteristics for the methodologies to be developed
have been defined to be understandability, conciseness, modifiability and reusability of
the models, together with a limited computational complexity of the evaluation step.
We have attacked the study of PMSs with the aim of taking into account all their specific
features, including those that had not been considered in the literature because of
the limits of the various modelling methodologies. In particular, we have considered flexible
missions of PMSs, whereas all the methods in the literature had dealt only with fixed
mission profiles, and the possibility of having phases whose duration is drawn from a
generally distributed random variable, whereas only phases of constant duration had been
taken into account in the previous studies.
To honour these ambitious commitments, we have defined three different modelling and
evaluation methodologies in the second part of the dissertation, which step by step in-
clude more and more features of PMSs, and relax the restrictive assumptions that had
been introduced in the literature to permit their analytical tractability.
We have first presented a method based on a hierarchy of Markov models, which, within
the same restricted scenario of constant phase duration of PMSs, proposes several origi-
nal elements. In particular, it introduces a systematic modelling of the dependencies
among consecutive phases, which had posed relevant issues to many of the methodolo-
gies proposed in the literature. Moreover, we perform a first step towards the introduc-
tion of a flexible profile of the mission, allowing the dynamic choice of the next phase to
be performed at the time the preceding phase ends.
Then, we have proposed a second methodology for constant phase duration PMSs, based
on the high level modelling paradigm of deterministic and stochastic Petri nets. Thanks to
the expressiveness of DSPN models, we have defined a general modelling strategy that is
able to represent in a very concise and easy to read form even very complex behaviour of
PMSs. Also, the single DSPN model of a PMS completely removes the need to explicitly
handle phase dependencies, thus avoiding a cumbersome and error-prone modelling
task. Moreover, it makes it possible to consider flexible missions of PMSs, in which the next
phase to be executed can be selected at the time the previous phase ends, and the duration
of the phases can be dynamically adjusted. The proposed solution approach for DSPN
models also supports the analytical sensitivity analysis of a PMS, a task that, to the
best of our knowledge, had not yet been considered in the literature.
Finally, we have proposed a last methodology based on MRSPN models, which considerably
widens the class of PMSs that can be analytically solved. MRSPN models can represent
in quite a natural way all those scenarios of PMSs previously accommodated by
the various studies in the literature, and by the other two methodologies proposed in this
thesis. Moreover, MRSPNs can exactly represent PMSs whose phases have a randomly
distributed duration. The general results have been specialised for particular scenarios of
PMSs, deriving in many interesting cases simple and explicit expressions for the
Reliability figures of the system. We have also analytically derived some sensitivity
functions for the MRSPN models of PMSs.
For all the three defined methodologies, a limited computational cost is required for the
evaluation of the models. Indeed, the complexity of the analysis of the overall mission of
the PMS is in each case reduced to that necessary for the separate solution inside the dif-
ferent phases. Thus, the computational cost of the original methodologies here proposed
is comparable to that of the cheapest approaches described in the survey we have given in
Chapter 3.
Throughout the dissertation we have applied the three methodologies to an ideal
example of PMS, which we defined as a benchmark to exercise the various approaches
on the same application case. We have evaluated the Reliability of the PMS at
the phase completion times, and observed how the various simplifying hypotheses
needed for the applicability of the methods affect the final results. The results shown in
Figure 7.1 for the Reliability R at the mission completion time of the example PMS
illustrate the impact of the assumptions that the various methods require. The differences
among the three values immediately show the effects that a scarcely realistic modelling
may have on the final results.
[Figure: bar chart of the Reliability at mission completion time estimated with the Markov, DSPN, and MRSPN methodologies (y-axis range 0-1)]
Figure 7.1: Reliability estimations with the three methodologies
The results of the research presented in this thesis offer a relevant practical contribution
to the study of the dependability features of PMSs. A future activity that would further
improve the applicability of these results is the implementation of the algorithmic
solution procedures for the various methodologies proposed here, within a dedicated
automated tool for the modelling and evaluation of PMSs.
References
[1] M. Ajmone-Marsan and G. Chiola, “On Petri nets with deterministic and
exponentially distributed firing times,” in Lecture Notes in Computer Science 266:
Springer-Verlag, 1987, pp. 132-145.
[2] M. Ajmone-Marsan, G. Conte, and G. Balbo, “A class of generalized stochastic
Petri nets for the performance evaluation of multiprocessor systems,” ACM
Transactions on Computer Systems, vol. 2, pp. 93-122, 1984.
[3] M. Alam and U. M. Al-Saggaf, “Quantitative Reliability Evaluation of Repairable
Phased-Mission Systems Using Markov Approach,” IEEE Transactions on
Reliability, vol. R-35, pp. 498-503, 1986.
[4] S. Allmaier and S. Dalibor, “PANDA - Petri net analysis and design assistant,”
presented at Performance TOOLS’97, Saint Malo, France, 1997.
[5] J. Arlat, T. Eliasson, K. Kanoun, et al., “Evaluation of Fault-Tolerant Data
Handling Systems for Spacecraft: Measures, Techniques and Example
Applications,” LAAS-CNRS 86.321, November 1986.
[6] J. Arlat, K. Kanoun, and J. C. Laprie, “Dependability Modelling and Evaluation of
Software Fault-Tolerant Systems,” IEEE Transactions on Computers, vol. 39, pp.
504-513, 1990.
[7] B. E. Aupperle, J. F. Meyer, and L. Wei, “Evaluation of fault-tolerant systems
with non-homogeneous workloads,” presented at 19th IEEE Fault Tolerant
Computing Symposium (FTCS-19), 1989.
[8] A. Avizienis, “Building Dependable Systems: How to Keep Up with Complexity,”
presented at IEEE 25-th International Symposium on Fault-Tolerant Computing
Systems (FTCS-25), Pasadena, U.S.A., 1995.
[9] M. D. Beaudry, “Performance-Related Reliability Measures for Computing
Systems,” IEEE Transactions on Computers, vol. 27, pp. 540-547, 1978.
Bibliography 129
[10] J. T. Blake, A. Reibman, and K. S. Trivedi, “Sensitivity analysis of reliability and
performability measures for multiprocessor systems,”, Technical Report DUKE
TR-1987-32, Duke University, 1987.
[11] A. Bobbio and M. Telek, “Computational restrictions for SPN with generally
distributed transition times,” presented at First European Dependable Computing
Conference (EDCC-1), 1994.
[12] A. Bobbio and M. Telek, “Markov regenerative SPN with non-overlapping activity
cycles,” presented at International Computer Performance and Dependability
Symposium (IPDS’95), 1995.
[13] A. Bondavalli, S. Chiaradonna, F. Di Giandomenico, et al., “Dependability of
Iterative Software: a Model for Evaluating the Effects of Input Correlation,”
presented at SAFECOMP, Belgirate, Italy, 1995.
[14] A. Bondavalli, F. Di Giandomenico, and I. Mura, “An Optimal Value-Based
Admission Policy and its Reflective Use in Real-Time Systems,” Real-Time
Systems, to appear.
[15] A. Bondavalli, I. Mura, and M. Nelli, “Analytical Modelling and Evaluation of
Phased-Mission Systems for Space Applications,” presented at IEEE High
Assurance System Engineering Workshop (HASE’97), Bethesda Maryland, USA,
1997.
[16] A. Bondavalli, L. Strigini, and J. A. Stankovic, “Adaptable Fault-Tolerance for
Real-Time Systems,” in Responsive Computer Systems, Towards the Integration of
Fault-Tolerance and Real-Time: Kluwer, 1994.
[17] J. L. Bricker, “A unified method for analyzing mission reliability for fault tolerant
computer systems,” IEEE Transactions on Reliability, vol. R-22, 1973.
[18] G. R. Burdick, J. B. Fussell, D. M. Rasmuson, et al., “Phased Mission Analysis: A
review of New Developments and An Application,” IEEE Transactions on
Reliability, vol. R-26, pp. 43-49, 1977.
[19] A. Burns and N. Audsley, “Scheduling Hard Real-Time Systems: a Review,”
Software Engineering Journal, vol. 6, pp. 116-128, 1991.
[20] G. Chiola, G. Franceschinis, R. Gaeta, et al., “GreatSPN 1.7: graphical editor and
analyzer for timed and stochastic Petri nets,” Performance Evaluation, vol. 24, pp.
47-68, 1997.
[21] H. Choi, V. Kulkarni, and K. S. Trivedi, “Markov regenerative stochastic Petri
nets,” Performance Evaluation, vol. 20, pp. 337-356, 1994.
[22] H. Choi, V. G. Kulkarni, and K. S. Trivedi, “Transient analysis of deterministic and
stochastic Petri nets.,” presented at 14th International Conference on Application
and Theory of Petri Nets, Chicago Illinois, USA, 1993.
[23] H. Choi, V. Mainkar, and K. S. Trivedi, “Sensitivity analysis of deterministic and
stochastic Petri nets,” presented at MASCOTS ‘93, 1993.
[24] G. Ciardo, R. German, and C. Lindemann, “A characterization of the stochastic
process underlying a stochastic Petri net,” IEEE Transactions on Software
Engineering, vol. 20, pp. 506-515, 1994.
[25] G. Ciardo, J. Muppala, and K. S. Trivedi, “SPNP: stochastic Petri net package,”
presented at International Conference on Petri Nets and Performance Models,
Kyoto, Japan, 1989.
[26] E. Cinlar, Introduction to Stochastic Processes. Englewood Cliffs, NJ, USA:
Prentice-Hall, 1975.
[27] A. Conway, “Analytical performance evaluation of communication protocols,”
IEEE Journal on Selected Areas in Communications, vol. 9, pp. 1-14, 1991.
[28] P. J. Courtois and P. Semal, “Bounds for the positive eigenvectors of non negative
matrices and for their approximation by decomposition,” Journal of the ACM, vol.
31, pp. 804-825, 1984.
[29] P. J. Courtois and P. Semal, “Computable bounds for conditional steady-state
probabilities in large Markov chains and in queueing models,” IEEE Journal on
Selected Areas in Communications, vol. 4, pp. 920-926, 1986.
[30] D. Cox, “The analysis of non-Markov stochastic processes by the inclusion of
supplementary variables,” Proc. Cambridge Philosophical Society, vol. 51, pp. 433-
441, 1955.
[31] J. B. Dugan, “Automated Analysis of Phased-Mission Reliability,” IEEE
Transactions on Reliability, vol. 40, pp. 45-52, 1991.
[32] J. B. Dugan and M. R. Lyu, “Dependability modeling for fault-tolerant software
and systems,” in Software Fault Tolerance, vol. 3 of Trends in Software, M. R. Lyu,
Ed. New York: Wiley & Sons, 1995, pp. 109-137.
[33] J. B. Dugan, K. S. Trivedi, R. Geist, et al., “Extended stochastic Petri nets:
application and analysis,” presented at PERFORMANCE ‘84, Paris, France, 1984.
[34] J. D. Esary and H. Ziehms, “Reliability analysis of phased missions,” in Reliability
and fault tree analysis: SIAM Philadelphia, 1975, pp. 213-236.
[35] P. M. Frank, Introduction to System Sensitivity Theory: Academic Press, New York
USA, 1978.
[36] R. German, “Transient analysis of deterministic and stochastic Petri nets by
method of supplementary variables.,” presented at MASCOTS ‘95, Durham, NC,
1995.
[37] R. German, C. Kelling, A. Zimmermann, et al., “TimeNET: a toolkit for evaluating
non-Markovian stochastic Petri nets,” Performance Evaluation, vol. 24, 1995.
[38] R. German and C. Lindemann, “Analysis of stochastic Petri nets by the method of
supplementary variables,” Performance Evaluation, vol. 20, pp. 317-335, 1994.
[39] P. Heidelberger and A. Goyal, “Sensitivity analysis of continuous time Markov
chains using uniformization,” presented at 2-nd International Workshop on Applied
Mathematics and Performance/Reliability models of Computer/Communication
systems, Rome, Italy, 1987.
[40] I.E.C. International Electrotechnical Commission SC 65A, “Functional safety:
safety-related systems,” Draft n. 1508-5, 1995.
[41] K. Kanoun, M. Borrel, T. Morteveille, et al., “Modeling the dependability of
CAUTRA, a subset of the French air traffic control system,” presented at IEEE 26-
th International Symposium on Fault-Tolerant Computing (FTCS26), Sendai,
Japan, 1996.
[42] K. Kim and T. Lawrence, “Adaptive Fault-Tolerance in Complex Real-Time
Distributed Computer Applications,” Computer Communications, vol. 15, 1992.
[43] L. Kleinrock, Queueing Systems: Wiley-Interscience, 1976.
[44] V. G. Kulkarni, Modeling and analysis of stochastic systems: Chapman-Hall, 1995.
[45] J. C. Laprie, “Dependability: a Unifying Concept for Reliable Computing and Fault
Tolerance,” in Dependability of Resilient Computers, T. Anderson, Ed.: BSP
Professional Books, 1989, pp. 1-28.
[46] J. C. Laprie, “Dependability: Its Attributes, Impairments and Means,” in Predictably
Dependable Computing Systems, B. Randell, J. C. Laprie, H. Kopetz, and B.
Littlewood, Eds.: Springer-Verlag, 1995, pp. 3-24.
[47] J. C. Laprie, J. Arlat, C. Beounes, et al., “Definition and Analysis of Hardware-and-
Software Fault-Tolerant Architectures,” IEEE Computer, vol. 23, pp. 39-51, 1990.
[48] J. C. Laprie, J. Arlat, J. P. Blanquart, et al., Guide de la surete de fonctionnement.
Toulouse: Cepadues-Editions, 1996.
[49] S. S. Lavenberg, Computer performance modeling handbook. New York: Academic
Press, 1983.
[50] M. R. Lyu, Software Fault Tolerance: Wiley, 1995.
[51] Y. Ma and K. S. Trivedi, “An algorithm for reliability analysis of phased-mission
systems,” Reliability Engineering and System Safety, special issue on Reliability
Certification, to appear.
[52] M. Malhotra and K. S. Trivedi, “Dependability modeling using Petri nets,” IEEE
Transactions on Reliability, vol. 44, pp. 428-440, 1995.
[53] M. Malhotra and K. S. Trivedi, “Power-hierarchy among dependability model
types,” IEEE Transactions on Reliability, vol. 43, pp. 493-502, 1994.
[54] S. Metge, M. Aguera, J. Arlat, et al., “SURF-2: a tool for dependability evaluation
based on Markov chains and stochastic Petri nets,” presented at 9-th International
Conference on Reliability and Maintainability (ESREL’94), La Baule, France, 1994.
[55] J. F. Meyer, “On Evaluating the Performability of Degradable Computing
Systems,” IEEE Transactions on Computers, vol. C-29, pp. 720-731, 1980.
[56] J. F. Meyer, D. G. Furchgott, and L. T. Wu, “Performability Evaluation of the
SIFT Computer,” presented at IEEE FTCS’79 Fault-Tolerant Computing
Symposium, June 20-22, Madison, Wisconsin, USA, 1979.
[57] M. K. Molloy, “Performance analysis using stochastic Petri nets,” IEEE
Transactions on Computers, vol. 31, pp. 913-917, 1982.
[58] M. K. Molloy, “Discrete time stochastic Petri nets,” IEEE Transactions on
Software Engineering, vol. 11, pp. 417-423, 1985.
[59] J. K. Muppala and K. S. Trivedi, “GSPN models: sensitivity analysis and
applications,” presented at 28-th ACM Southeast Region Conference, 1990.
[60] M. Nelli, A. Bondavalli, and L. Simoncini, “Dependability modelling and analysis
of complex control systems: an application to railway interlocking,” presented at
EDCC2, Taormina, Italy, 1996.
[61] M. F. Neuts, Matrix-Geometric Solutions in Stochastic Models: Johns Hopkins
University Press, 1982.
[62] A. Pedar and V. V. S. Sarma, “Phased-Mission Analysis for Evaluating the
Effectiveness of Aerospace Computing-Systems,” IEEE Transactions on Reliability,
vol. R-30, pp. 429-437, 1981.
[63] H. G. Perros, Y. Dallery, and G. Pujolle, “Analysis of a queueing network model
with class dependent window flow control,” IEEE Infocom, pp. 968-977, 1992.
[64] J. L. Peterson, Petri net theory and the modeling of systems. Englewood Cliffs, NJ:
Prentice-Hall, 1981.
[65] C. A. Petri, “Kommunikation mit Automaten,” PhD thesis, University of Bonn,
Bonn, Germany, 1962.
[66] D. Powell, J. Arlat, L. Beus-Dukic, et al., “GUARDS: a Generic Upgradable
Architecture for Real-time Dependable Systems,” IEEE Transactions on
Parallel and Distributed Systems, to appear.
[67] A. Puliafito, M. Scarpa, and K. S. Trivedi, “Petri nets with k simultaneously
enabled generally distributed timed transitions,” Performance Evaluation, 1997.
[68] A. Reibman and K. S. Trivedi, “Numerical transient analysis of Markov models,”
Computers and Operation Research, vol. 15, pp. 19-36, 1988.
[69] M. Reiser, “Communication system models embedded in the OSI reference model, a
survey,” in Computer Networking and Performance Evaluation, T. Hasegawa, H.
Takagi, and Y. Takahashi, Eds.: Elsevier, 1986.
[70] W. Reisig, Petri nets: an introduction: Springer Verlag, 1985.
[71] A. Rindos, S. Woolet, I. Viniotis, et al., “Exact methods for the transient analysis of
non-homogeneous continuous-time Markov chains,” presented at 2-nd International
Workshop on the Numerical Solution of Markov Chains, Raleigh, NC, USA, 1995.
[72] S. M. Ross, Stochastic Processes. Berkeley, CA, USA: John Wiley & Sons, 1983.
[73] R. A. Sahner and K. S. Trivedi, “Reliability modeling using SHARPE,” IEEE
Transactions on Reliability, vol. 36, pp. 186-193, 1987.
[74] W. H. Sanders and L. M. Malhis, “Dependability evaluation using composed SAN-
based reward models,” Journal of parallel and distributed computing, vol. 15, pp.
238-254, 1992.
[75] W. H. Sanders and J. F. Meyer, “METASAN: a performability evaluation tool
based on stochastic activity networks,” presented at ACM-IEEE Comp. Soc. 1986
Fall Joint Comp. Conf., Dallas TX, 1986.
[76] W. H. Sanders, W. D. Obal II, M. A. Qureshi, et al., “The UltraSAN modeling
environment,” Performance Evaluation, vol. 21, 1995.
[77] M. Schwartz, Telecommunication networks: protocol modelling and analysis:
Addison Wesley, 1987.
[78] D. P. Siewiorek and R. S. Swarz, Reliable Computer Systems: Digital Press, 1992.
[79] M. Smotherman and K. Zemoudeh, “A Non-Homogeneous Markov Model for
Phased-Mission Reliability Analysis,” IEEE Transactions on Reliability, vol. 38,
pp. 585-590, 1989.
[80] A. K. Somani, J. A. Ritcey, and S. H. L. Au, “Computationally-Efficient Phased-
Mission Reliability Analysis for Systems with Variable Configurations,” IEEE
Transactions on Reliability, vol. 41, pp. 504-511, 1992.
[81] A. K. Somani and K. S. Trivedi, “Phased-Mission Systems Using Boolean
Algebraic Methods,” Performance Evaluation Review, pp. 98-107, 1994.
[82] A. T. Tai, A. Avizienis, and J. F. Meyer, “Evaluation of Fault-Tolerant Software: a
Performability Modeling Approach,” in Dependable Computing for Critical
Applications 3, C. E. Landwher, B. Randell, and L. Simoncini, Eds.: Springer-Verlag,
1992, pp. 113-135.
[83] A. T. Tai, S. N. Chau, L. Alkalaj, et al., “On board preventive maintenance: analysis
of effectiveness and optimal duty period,” presented at Third International
Workshop on Object-Oriented Real-Time Dependable Systems (WORDS’97),
Newport Beach, CA, USA, 1997.
[84] L. A. Tomek and K. S. Trivedi, “Analyses using stochastic reward nets,” in
Software fault-tolerance, vol. 3 of Trends in Software, M. R. Lyu, Ed. New York:
Wiley & Sons, 1995, pp. 139-166.
[85] K. S. Trivedi, Probability and Statistics with Reliability, Queuing and Computer
Science Applications: Prentice-Hall, 1982.
[86] H. S. Winokur and L. J. Goldstein, “Analysis of mission oriented systems,” IEEE
Transactions on Reliability, vol. R-18, pp. 144-148, 1969.
Appendix A
Laplace transform properties
Suppose f(t) is a piecewise-continuous function defined for t ≥ 0. If f(t) is of exponential order, that is, there exist two constants M and α such that |f(t)| ≤ M e^{αt} holds for any t ≥ 0, then the Laplace transform L[f](s) of f(t) is defined as follows:

    L[f](s) = ∫_0^{+∞} e^{-st} f(t) dt

for any complex number s such that Re(s) > α. Table A.1 shows the Laplace transforms of some important functions.
Function                        Transform
c (constant)                    c / s
t                               1 / s^2
t^a, a natural number           a! / s^{a+1}
e^{-at}                         1 / (s + a),  Re(s) > -a
t^b e^{-at}                     b! / (s + a)^{b+1},  Re(s) > -a
e^{Mt}, M a matrix              (sI - M)^{-1}
Table A.1: Laplace transforms of some functions
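As an illustrative aside, the closed forms of Table A.1 can be checked against the defining integral. The short Python sketch below (ours, for illustration only; the function name `laplace` and all parameter values are arbitrary) truncates the integral at a large T and applies the trapezoidal rule:

```python
import math

def laplace(f, s, T=60.0, n=60000):
    """Approximate L[f](s) = ∫_0^∞ e^{-st} f(t) dt by the trapezoidal
    rule, truncating at T (valid when the integrand decays rapidly)."""
    h = T / n
    total = 0.5 * (f(0.0) + math.exp(-s * T) * f(T))
    for k in range(1, n):
        t = k * h
        total += math.exp(-s * t) * f(t)
    return total * h

a, b, s = 2.0, 3, 1.5

# L[e^{-at}](s) = 1/(s+a)
assert abs(laplace(lambda t: math.exp(-a * t), s) - 1.0 / (s + a)) < 1e-5

# L[t^b e^{-at}](s) = b!/(s+a)^{b+1}
assert abs(laplace(lambda t: t**b * math.exp(-a * t), s)
           - math.factorial(b) / (s + a) ** (b + 1)) < 1e-5
print("Table A.1 entries verified numerically")
```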
The Laplace transform is very useful in simplifying calculations. Indeed, in the transform domain, expressions that involve integrals, derivatives and limits usually become extremely easy to manipulate. Moreover, there is a one-to-one correspondence between functions and their Laplace transforms, which implies that once a calculation has been performed in the transform domain, the result can be recovered in the time domain by applying the inverse Laplace transformation.
The main properties of the Laplace transform operator are listed in Table A.2.

Property                              Function                 Transform
Direct transform                      f(t)                     ∫_0^{+∞} e^{-st} f(t) dt
Linearity                             a f(t) + b g(t)          a L[f](s) + b L[g](s)
Real differentiation                  f'(t)                    s L[f](s) - f(0^+)
Real integration                      ∫_0^t f(u) du            s^{-1} L[f](s)
Translation                           e^{-at} f(t)             L[f](s + a)
Scale change                          a^{-1} f(t/a)            L[f](as)
Differentiation with respect to
an independent variable               ∂f(t,r)/∂r               ∂L[f(r)](s)/∂r
Final value                           lim_{t→+∞} f(t)          lim_{s→0^+} s L[f](s)
Initial value                         lim_{t→0^+} f(t)         lim_{s→+∞} s L[f](s)
Table A.2: Laplace transform properties
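For instance, the final- and initial-value properties can be checked on the simple function f(t) = 1 - e^{-t}, whose transform follows from linearity and Table A.1. The Python sketch below (illustrative only; the name `sLf` is ours) evaluates s L[f](s) near both limits:

```python
# f(t) = 1 - e^{-t}:  L[f](s) = 1/s - 1/(s+1)  (by linearity and Table A.1)
def sLf(s):
    return s * (1.0 / s - 1.0 / (s + 1.0))

# Final value: lim_{t→+∞} f(t) = 1  matches  lim_{s→0+} s L[f](s)
assert abs(sLf(1e-9) - 1.0) < 1e-6

# Initial value: lim_{t→0+} f(t) = 0  matches  lim_{s→+∞} s L[f](s)
assert abs(sLf(1e9) - 0.0) < 1e-6
```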
Appendix B
LSTs of the kernel matrix blocks
In this appendix we present the mathematical derivation of the LSTs of the kernel matrix
blocks for the MRGP underlying the MRSPN model of a PMS. We consider the case
when the subordinate process is a homogeneous continuous-time Markov chain. In this
case, the kernel matrix blocks are given by the following equations, which we have derived in Chapter 6:

    K°_{i,i+1}(s) = ∫_0^{+∞} e^{-st} dK_{i,i+1}(t) = L[e^{Q_i t} f_i(t)] Δ_i ,    i = 1, 2, …, n

    E°_{i,i}(s) = ∫_0^{+∞} e^{-st} dE_{i,i}(t) = s L[e^{Q_i t} (1 - F_i(t))] ,    i = 1, 2, …, n+1
where F_i(t) and f_i(t) are the cumulative distribution function and the probability density function of the phase i duration, respectively, and e^{Q_i t} is the transient probability matrix of the subordinate Markov chain. In the following, we obtain the LSTs of the kernel matrix blocks for various distributions of the phase duration.
B.1 Exponential distribution
Suppose the phase i duration is a random variable exponentially distributed with parameter λ, that is, F_i(t) = 1 - e^{-λt} and f_i(t) = λ e^{-λt}, t ≥ 0. In this case, the expression for the global kernel block K°_{i,i+1}(s) can be rewritten as follows:

    K°_{i,i+1}(s) = L[e^{Q_i t} λ e^{-λt}] Δ_i = λ ((λ + s)I - Q_i)^{-1} Δ_i

where to obtain the last expression we recall that the Laplace transform of the matrix exponential e^{Mt} is (sI - M)^{-1}, and we apply the translation property of the Laplace transform. Similarly, the local kernel block matrix E°_{i,i}(s) becomes:

    E°_{i,i}(s) = s L[e^{Q_i t} e^{-λt}] = s ((λ + s)I - Q_i)^{-1}
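To see the two closed forms at work, consider the degenerate scalar case in which the subordinate Markov chain has a single state, so that Q_i reduces to a scalar q and e^{Q_i t} to e^{qt}. The Python sketch below (the helper name `lst_numeric` and all parameter values are ours, for illustration only) compares both formulas with a numerical evaluation of the defining integrals:

```python
import math

def lst_numeric(f, q, s, T=60.0, n=60000):
    """Approximate ∫_0^T e^{-st} e^{qt} f(t) dt by the trapezoidal rule:
    a scalar stand-in for L[e^{Q_i t} f(t)] with 1x1 generator Q_i = q < 0."""
    h = T / n
    g = lambda t: math.exp((q - s) * t) * f(t)
    total = 0.5 * (g(0.0) + g(T))
    for k in range(1, n):
        total += g(k * h)
    return total * h

lam, q, s = 1.2, -0.3, 0.8

# Global kernel block: λ((λ+s)I - Q_i)^{-1}  →  λ/(λ+s-q) in the scalar case
assert abs(lst_numeric(lambda t: lam * math.exp(-lam * t), q, s)
           - lam / (lam + s - q)) < 1e-5

# Local kernel block: s((λ+s)I - Q_i)^{-1}, using 1 - F_i(t) = e^{-λt}
assert abs(s * lst_numeric(lambda t: math.exp(-lam * t), q, s)
           - s / (lam + s - q)) < 1e-5
```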
B.2 Deterministic distribution
Consider a phase i having a fixed duration of τ units of time. The probability density function f_i(t) of the random variable representing the phase duration is the Dirac impulse at τ. We obtain the following expression for the global kernel block matrices:

    K°_{i,i+1}(s) = L[e^{Q_i t} f_i(t)] Δ_i = e^{-sτ} e^{Q_i τ} Δ_i = e^{-(sI - Q_i)τ} Δ_i

The cumulative distribution function F_i(t) is given by:

    F_i(t) = 0 for 0 ≤ t < τ ,    F_i(t) = 1 for t ≥ τ

The local kernel block matrices can be obtained as follows:

    E°_{i,i}(s) = s L[e^{Q_i t} (1 - F_i(t))] = s ∫_0^τ e^{-st} e^{Q_i t} dt =
                = s ∫_0^∞ e^{-st} e^{Q_i t} dt - s ∫_τ^∞ e^{-st} e^{Q_i t} dt =
                = s (sI - Q_i)^{-1} - s e^{-(sI - Q_i)τ} ∫_0^∞ e^{-su} e^{Q_i u} du

where the last equality is obtained with the substitution u = t - τ. Finally, we obtain:

    E°_{i,i}(s) = s (sI - Q_i)^{-1} (I - e^{-(sI - Q_i)τ})
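In the scalar case (1×1 generator Q_i = q), these closed forms can again be checked directly; the following sketch (illustrative only, with arbitrary parameter values) verifies the global kernel identity and compares the local kernel formula with a trapezoidal evaluation of s ∫_0^τ e^{-st} e^{qt} dt:

```python
import math

q, s, tau = -0.4, 0.9, 2.5   # scalar (1x1) generator q, transform point s, phase length τ

# Global kernel: K°(s) = e^{-sτ} e^{qτ} = e^{-(s-q)τ}  (Δ_i omitted in the scalar case)
assert abs(math.exp(-s * tau) * math.exp(q * tau) - math.exp(-(s - q) * tau)) < 1e-12

# Local kernel: s ∫_0^τ e^{-st} e^{qt} dt  versus  s(s-q)^{-1}(1 - e^{-(s-q)τ})
n = 50000
h = tau / n
num = 0.5 * (1.0 + math.exp((q - s) * tau))   # trapezoidal rule endpoints
for k in range(1, n):
    num += math.exp((q - s) * k * h)
num *= s * h
closed = s / (s - q) * (1.0 - math.exp(-(s - q) * tau))
assert abs(num - closed) < 1e-6
```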
B.3 Truncated exponential distribution
We consider this particular case as an example to show that the LSTs of the kernel matrices are easily derived also for finite-support distributions. The distribution of interest is obtained by truncating at time τ > 0 the exponential distribution of parameter λ, and then normalising with respect to the probability 1 - e^{-λτ}. The cumulative distribution function and the probability density function of this truncated exponential distribution are, respectively:

    F_i(t) = (1 - e^{-λt}) / (1 - e^{-λτ}) for 0 ≤ t < τ ,    F_i(t) = 1 for t ≥ τ

    f_i(t) = λ e^{-λt} / (1 - e^{-λτ}) for 0 ≤ t < τ ,    f_i(t) = 0 for t ≥ τ

and the generic block of the global kernel is easily obtained as follows:

    K°_{i,i+1}(s) = ∫_0^τ e^{-st} e^{Q_i t} (λ e^{-λt} / (1 - e^{-λτ})) dt Δ_i =
                  = (λ / (1 - e^{-λτ})) [ L[e^{Q_i t} e^{-λt}] - ∫_τ^∞ e^{-st} e^{Q_i t} e^{-λt} dt ] Δ_i =
                  = (λ / (1 - e^{-λτ})) ((s + λ)I - Q_i)^{-1} (I - e^{-((s+λ)I - Q_i)τ}) Δ_i

The blocks of the local kernel are obtained as follows:

    E°_{i,i}(s) = s ∫_0^τ e^{-st} e^{Q_i t} ((e^{-λt} - e^{-λτ}) / (1 - e^{-λτ})) dt =
                = (s / (1 - e^{-λτ})) [ ∫_0^τ e^{-st} e^{Q_i t} e^{-λt} dt - e^{-λτ} ∫_0^τ e^{-st} e^{Q_i t} dt ] =
                = (s / (1 - e^{-λτ})) [ ((s+λ)I - Q_i)^{-1} (I - e^{-((s+λ)I - Q_i)τ}) - e^{-λτ} (sI - Q_i)^{-1} (I - e^{-(sI - Q_i)τ}) ]
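The global kernel expression can be validated in the scalar case (Q_i = q, 1×1) against a direct numerical evaluation; the sketch below is ours, with arbitrary parameter values:

```python
import math

lam, q, s, tau = 1.0, -0.5, 0.7, 2.0

def f(t):
    """Density of the exponential(λ) distribution truncated at τ."""
    return lam * math.exp(-lam * t) / (1.0 - math.exp(-lam * tau))

# numeric: ∫_0^τ e^{-st} e^{qt} f(t) dt  (trapezoidal rule)
n = 50000
h = tau / n
num = 0.5 * (f(0.0) + math.exp((q - s) * tau) * f(tau))
for k in range(1, n):
    t = k * h
    num += math.exp((q - s) * t) * f(t)
num *= h

# closed form from B.3, scalar version: λ/(1-e^{-λτ}) · (1 - e^{-aτ})/a  with a = s+λ-q
a = s + lam - q
closed = lam / (1.0 - math.exp(-lam * tau)) / a * (1.0 - math.exp(-a * tau))
assert abs(num - closed) < 1e-6
```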
B.4 Uniform distribution
The probability density function f_i(t) of the uniform distribution over the interval [a, b] is given by:

    f_i(t) = 1 / (b - a) for a ≤ t ≤ b ,    f_i(t) = 0 elsewhere

Therefore, the LST of the global kernel matrix blocks can be written as follows:

    K°_{i,i+1}(s) = (1/(b-a)) ∫_a^b e^{-st} e^{Q_i t} dt Δ_i =
                  = (1/(b-a)) [ ∫_0^b e^{-st} e^{Q_i t} dt - ∫_0^a e^{-st} e^{Q_i t} dt ] Δ_i =
                  = (1/(b-a)) [ (sI - Q_i)^{-1} (I - e^{-(sI - Q_i)b}) - (sI - Q_i)^{-1} (I - e^{-(sI - Q_i)a}) ] Δ_i =
                  = (1/(b-a)) (sI - Q_i)^{-1} (e^{-(sI - Q_i)a} - e^{-(sI - Q_i)b}) Δ_i

The cumulative distribution function of the uniform distribution has the following form:

    F_i(t) = 0 for t < a ,    F_i(t) = (t - a)/(b - a) for a ≤ t ≤ b ,    F_i(t) = 1 for t > b

The LST of the local kernel blocks takes the following form:

    E°_{i,i}(s) = s [ ∫_0^a e^{-st} e^{Q_i t} dt + (b/(b-a)) ∫_a^b e^{-st} e^{Q_i t} dt - (1/(b-a)) ∫_a^b t e^{-st} e^{Q_i t} dt ]

The first and the second terms in the preceding sum are easily derived as follows:

    ∫_0^a e^{-st} e^{Q_i t} dt = (sI - Q_i)^{-1} (I - e^{-(sI - Q_i)a})

    (b/(b-a)) ∫_a^b e^{-st} e^{Q_i t} dt = (b/(b-a)) (sI - Q_i)^{-1} (e^{-(sI - Q_i)a} - e^{-(sI - Q_i)b})

To obtain the third term, we proceed this way:

    (1/(b-a)) ∫_a^b t e^{-st} e^{Q_i t} dt = (1/(b-a)) [ ∫_a^∞ t e^{-st} e^{Q_i t} dt - ∫_b^∞ t e^{-st} e^{Q_i t} dt ]

Now, the first improper integral appearing in the preceding expression is solved as follows:

    ∫_a^∞ t e^{-st} e^{Q_i t} dt = ∫_0^∞ e^{-s(u+a)} (u + a) e^{Q_i (u+a)} du =
        = e^{-sa} e^{Q_i a} [ a ∫_0^∞ e^{-su} e^{Q_i u} du + ∫_0^∞ u e^{-su} e^{Q_i u} du ] =
        = e^{-(sI - Q_i)a} [ a (sI - Q_i)^{-1} + ∫_0^∞ u e^{-su} e^{Q_i u} du ] =
        = e^{-(sI - Q_i)a} [ a (sI - Q_i)^{-1} + (sI - Q_i)^{-2} ]

where the first equality is obtained with the substitution u = t - a. The second improper integral is solved in a similar way. By combining all the intermediate results, with some algebra we obtain:

    E°_{i,i}(s) = s (sI - Q_i)^{-1} + (s/(b-a)) (sI - Q_i)^{-2} (e^{-(sI - Q_i)b} - e^{-(sI - Q_i)a})
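Reducing Q_i to a scalar q once more, the final expression for E°_{i,i}(s) can be validated against a direct trapezoidal evaluation of s L[e^{Q_i t}(1 - F_i(t))]; the sketch and its parameter values are ours, for illustration:

```python
import math

q, s = -0.6, 0.8
a, b = 1.0, 3.0        # uniform phase duration on [a, b]
d = s - q              # scalar version of sI - Q_i

def surv(t):
    """Survival function 1 - F_i(t) of the uniform distribution on [a, b]."""
    if t < a:
        return 1.0
    if t <= b:
        return (b - t) / (b - a)
    return 0.0

# numeric: E°(s) = s ∫_0^b e^{-st} e^{qt} (1 - F_i(t)) dt  (integrand vanishes past b)
n = 60000
h = b / n
num = 0.5 * (surv(0.0) + math.exp(-d * b) * surv(b))
for k in range(1, n):
    t = k * h
    num += math.exp(-d * t) * surv(t)
num *= s * h

# closed form: s(sI-Q_i)^{-1} + s(b-a)^{-1}(sI-Q_i)^{-2}(e^{-(sI-Q_i)b} - e^{-(sI-Q_i)a})
closed = s / d + s / ((b - a) * d * d) * (math.exp(-d * b) - math.exp(-d * a))
assert abs(num - closed) < 1e-5
```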
B.5 Erlang distribution
We consider now, as a last case, that of the Erlang distribution with r stages of parameter λ. The probability density function and the cumulative distribution function of such an Erlang variable are defined as follows:

    f_i(t) = λ^r t^{r-1} e^{-λt} / (r-1)! ,  t ≥ 0        F_i(t) = 1 - e^{-λt} Σ_{h=0}^{r-1} (λt)^h / h! ,  t ≥ 0

The global kernel blocks can be rewritten as follows:

    K°_{i,i+1}(s) = L[e^{Q_i t} λ^r t^{r-1} e^{-λt} / (r-1)!] Δ_i = (λ^r / (r-1)!) L[e^{Q_i t} t^{r-1} e^{-λt}] Δ_i =
                  = (λ^r / (r-1)!) (-1)^{r-1} (∂^{r-1}/∂s^{r-1}) L[e^{Q_i t} e^{-λt}] Δ_i =
                  = (λ^r / (r-1)!) (-1)^{r-1} (∂^{r-1}/∂s^{r-1}) ((λ+s)I - Q_i)^{-1} Δ_i

The derivative of the inverse of the matrix (λ+s)I - Q_i with respect to s can be obtained as follows:

    (∂^{r-1}/∂s^{r-1}) ((λ+s)I - Q_i)^{-1} = (∂^{r-1}/∂s^{r-1}) (1/(λ+s)) (I - Q_i/(λ+s))^{-1} =
        = (∂^{r-1}/∂s^{r-1}) Σ_{h=0}^{∞} Q_i^h / (λ+s)^{h+1} = Σ_{h=0}^{∞} Q_i^h (∂^{r-1}/∂s^{r-1}) (λ+s)^{-(h+1)} =
        = Σ_{h=0}^{∞} Q_i^h (-1)^{r-1} (h+r-1)! / (h! (λ+s)^{h+r})

By substituting this expression into the LST of K°_{i,i+1}(s) we obtain:

    K°_{i,i+1}(s) = (λ^r / (r-1)!) (-1)^{r-1} Σ_{h=0}^{∞} Q_i^h (-1)^{r-1} (h+r-1)! / (h! (λ+s)^{h+r}) Δ_i =
                  = (λ^r / (λ+s)^r) Σ_{h=0}^{∞} C(h+r-1, h) Q_i^h / (λ+s)^h Δ_i =
                  = (λ^r / (λ+s)^r) (I - Q_i/(λ+s))^{-r} Δ_i = λ^r ((λ+s)I - Q_i)^{-r} Δ_i

The local kernel blocks become:

    E°_{i,i}(s) = s L[e^{Q_i t} e^{-λt} Σ_{h=0}^{r-1} (λt)^h / h!] = s Σ_{h=0}^{r-1} (λ^h / h!) L[e^{Q_i t} e^{-λt} t^h]

By following the same steps as for the global kernel we obtain:

    s Σ_{h=0}^{r-1} (λ^h / h!) L[e^{Q_i t} e^{-λt} t^h] = s Σ_{h=0}^{r-1} (λ^h / h!) (-1)^h (∂^h/∂s^h) ((λ+s)I - Q_i)^{-1} =
        = s Σ_{h=0}^{r-1} (λ^h / h!) (-1)^h Σ_{p=0}^{∞} Q_i^p (-1)^h (p+h)! / (p! (λ+s)^{p+h+1}) =
        = s Σ_{h=0}^{r-1} λ^h (λ+s)^{-(h+1)} Σ_{p=0}^{∞} C(p+h, p) Q_i^p / (λ+s)^p =
        = s Σ_{h=0}^{r-1} λ^h (λ+s)^{-(h+1)} (I - Q_i/(λ+s))^{-(h+1)} = s Σ_{h=0}^{r-1} λ^h ((λ+s)I - Q_i)^{-(h+1)}
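Both Erlang-case results can be checked in the scalar setting (Q_i = q, 1×1), where ((λ+s)I - Q_i)^{-1} reduces to 1/(λ+s-q); the sketch below (ours, with arbitrary parameter values) compares the closed forms with trapezoidal evaluations of the defining integrals:

```python
import math

lam, q, s, r = 1.5, -0.4, 0.6, 3   # Erlang with r stages of rate λ; scalar generator q
a = lam + s - q                     # scalar version of (λ+s)I - Q_i

def f(t):
    """Erlang-r density λ^r t^{r-1} e^{-λt} / (r-1)!."""
    return lam**r * t**(r - 1) * math.exp(-lam * t) / math.factorial(r - 1)

def surv(t):
    """Survival function 1 - F_i(t) = e^{-λt} Σ_{h<r} (λt)^h / h!."""
    return math.exp(-lam * t) * sum((lam * t)**h / math.factorial(h) for h in range(r))

def lst(g, T=80.0, n=80000):
    """Trapezoidal approximation of ∫_0^T e^{-st} e^{qt} g(t) dt."""
    h = T / n
    tot = 0.5 * (g(0.0) + math.exp((q - s) * T) * g(T))
    for k in range(1, n):
        t = k * h
        tot += math.exp((q - s) * t) * g(t)
    return tot * h

# Global kernel: K°(s) = λ^r ((λ+s)I - Q_i)^{-r}  →  λ^r / a^r
assert abs(lst(f) - lam**r / a**r) < 1e-5

# Local kernel: E°(s) = s Σ_{h=0}^{r-1} λ^h ((λ+s)I - Q_i)^{-(h+1)}
closed = s * sum(lam**h / a**(h + 1) for h in range(r))
assert abs(s * lst(surv) - closed) < 1e-5
```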
We observe that the same procedure we have followed here to compute the LSTs of the kernel matrix blocks for the Erlang distribution can be similarly applied to other, more general distributions of the phase duration, such as the Coxian distributions [30] and the phase-type distributions [61].