FAULT TOLERANT SYSTEMS DESIGN COURSE
M. Keshtgary
Shiraz University of Technology
Fall 1392

REFERENCES
– E. Dubrova, Fault-Tolerant Design, Springer, 2013.
– I. Koren and C. M. Krishna, Fault-Tolerant Systems, Morgan Kaufmann, 2007.
– Mostafa Abd-El-Barr (Department of Information Science, Kuwait University, Kuwait), Design and Analysis of Reliable and Fault-Tolerant Computer Systems, Imperial College Press, 2007.

OBJECTIVES
Understanding fault tolerance:
– faults and their effects (errors, failures)
– redundancy techniques
– evaluation of fault-tolerant systems
– concepts and applications

OVERVIEW
Introduction
– definition of fault tolerance, applications
Fundamentals of dependability
– dependability attributes: reliability, availability, safety
– dependability impairments: faults, errors, failures
– dependability means
Dependability evaluation techniques
– common measures: failure rate, MTTF, MTTR
– reliability block diagrams
– Markov processes

OVERVIEW
Redundancy techniques
– space redundancy
  • hardware redundancy: NMR
  • information redundancy: parity check
  • software redundancy: consistency check
– time redundancy: re-computation

FAULT TOLERANCE
Fault tolerance is the ability of a system to continue performing its function in spite of faults.
– hardware fault: broken connection
– software fault: bug in program

GOALS OF FAULT TOLERANCE
The main goal of fault tolerance is to increase the dependability of a system.

DEPENDABILITY

EXAMPLES OF SPECIFICATIONS OF PROPER SERVICE

DEPENDABILITY TREE

AVAILABILITY
A(t) is the probability that a system is functioning correctly at the instant of time t.
Availability depends on:
– how frequently the system becomes non-operational
– how quickly it can be repaired

AVAILABILITY STEADY-STATE

HIGH AVAILABILITY EXAMPLES

RELIABILITY
Reliability is a measure of the continuous delivery of service.
R(t) is the probability that a system operates without failure in the interval [0,t], given that it worked at time 0.
We need high reliability when:
– even momentary periods of incorrect performance are unacceptable (e.g., aircraft)
– no repair is possible (e.g., satellite, spacecraft)

RELIABILITY VERSUS AVAILABILITY

RELIABILITY VERSUS FAULT TOLERANCE

HOW FAULT TOLERANCE HELPS

SAFETY
Safety is the probability that a system will either perform its function correctly or will discontinue its operation in a safe way.
A system is safe if:
– it functions correctly, or
– when it fails, it remains in a safe state

HIGH SAFETY EXAMPLES
– Nuclear energy: stop the reactor if a problem occurs
– Banking: do not give the money if in doubt

RELIABILITY VERSUS SAFETY
– Reliability is the probability that a system will perform its functions correctly.
– Safety is the probability that a system will either work correctly or will stop in a manner that causes no harm.

HOW FAULT TOLERANCE HELPS
– Fault tolerance techniques can improve safety by turning a system off if a failure of a certain sort is detected.
– In a nuclear power plant, the reaction process should be stopped if some discrepancy is detected.

CONFIDENTIALITY
Absence of unauthorized disclosure of information.

INTEGRITY
Absence of improper alterations of system state or information.

MAINTAINABILITY
Ability to undergo repairs and modifications.

SECURITY
Security is the concurrent existence of
a) availability for authorized users only,
b) confidentiality, and
c) integrity, with 'improper' meaning 'unauthorized'.
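
WORKED EXAMPLE: AVAILABILITY AND RELIABILITY MEASURES
The measures named in the overview (failure rate, MTTF, MTTR) can be tied to the availability and reliability definitions with a short numeric sketch. The formulas below are the usual textbook forms, not formulas recovered from these slides: steady-state availability as MTTF / (MTTF + MTTR), reliability as R(t) = exp(-λt) under the assumption of a constant failure rate λ, and 2-out-of-3 majority voting (NMR with N = 3) under the assumption of independent, identical modules and a perfect voter. All function names and numbers are hypothetical and chosen only for illustration.

```python
import math


def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def reliability(failure_rate_per_hour: float, t_hours: float) -> float:
    """R(t) = exp(-lambda * t); valid only if the failure rate is constant."""
    return math.exp(-failure_rate_per_hour * t_hours)


def tmr_reliability(module_reliability: float) -> float:
    """Reliability of a 2-out-of-3 majority system.

    Assumes three independent, identical modules and a perfect voter:
    R_tmr = 3R^2 - 2R^3.
    """
    r = module_reliability
    return 3 * r**2 - 2 * r**3


if __name__ == "__main__":
    # Hypothetical numbers: one failure per 1000 hours on average, 2-hour repair.
    mttf, mttr = 1000.0, 2.0
    failure_rate = 1.0 / mttf  # constant-rate assumption: lambda = 1 / MTTF

    print("steady-state availability:", steady_state_availability(mttf, mttr))
    for t in (10.0, 100.0, 1000.0):
        r = reliability(failure_rate, t)
        print(f"t = {t:6.0f} h  R(t) = {r:.4f}  TMR = {tmr_reliability(r):.4f}")
```

Under these assumptions the triplicated system is more reliable than a single module only while the module reliability stays above 0.5. This is one way to see why redundancy suits short missions without repair better than very long ones.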