Design of Reliable Processors Based on Unreliable Devices Séminaire COMELEC Lirida Alves de Barros Naviner Paris, 1 July 213
Outline Basics on reliability Technology Aspects Design for Reliability Conclusions 2 /33 COMELEC Lirida Naviner Paris, June 213
What is Reliability? Reliability is the ability of a system or component to perform its reuired functions under stated conditions for a specified period of time. 3 /33 COMELEC Lirida Naviner Paris, June 213
What is Reliability? Reliability is the ability of a system or component to perform its reuired functions under stated conditions for a specified period of time. 3 /33 COMELEC Lirida Naviner Paris, June 213
When is Reliability Important? Safety critical applications Biomedical Transport Spatial Energy Security... But today also for Computer Communications Consumer 4 /33 COMELEC Lirida Naviner Paris, June 213
Is the Electronics Reliable? Radiations Shocks (mechanical, temperature) HW/SW system inputs Unexpected conditions of use Design errors Software failures Defects Process variation Ageing Noise Malicious attacks Human errors expected outputs Performance Power Data integrity Availability Security 5 /33 COMELEC Lirida Naviner Paris, June 213
Advances in CMOS Moore s law (popular form) : 2 N tr /mm 2 every 18 months Intel 44 (1971) : 1µm and 2.3 1 3 tr Intel Xeon Phi (212) : 22nm and 5 1 9 tr 6 /33 COMELEC Lirida Naviner Paris, June 213
More Moore Scaling issues Design complexity, test challenge, low power voltage Variability Modelling Sensitivity to unscaled environmental disturbances Scaling effects Yield decrease Reliability decrease 7 /33 COMELEC Lirida Naviner Paris, June 213
Scaling and Reliability ageing, single and multiple transient faults 8 /33 COMELEC Lirida Naviner Paris, June 213
Fault Propagation 9 /33 COMELEC Lirida Naviner Paris, June 213
Bathtub Curve 1 /33 COMELEC Lirida Naviner Paris, June 213
Some Metrics Mean Time To Failures MTTF Mean Time To Repair MTTR Mean Time Between Failures MTBF Failures In Time FIT Failure Rate R = e Time MTBF R = Prob(exact) 11 /33 COMELEC Lirida Naviner Paris, June 213
Research at Telecom ParisTech How to get optimized design of processors? 12 /33 COMELEC Lirida Naviner Paris, June 213
Research at Telecom ParisTech How to get optimized design of reliable processors based on unreliable devices? Risk minimization More (than) Moore Fabless generalization Reliability improvement penalties! 12 /33 COMELEC Lirida Naviner Paris, June 213
Masking Effects Reliability assessment! 13 /33 COMELEC Lirida Naviner Paris, June 213
Hardware Fault Injection 14 /33 COMELEC Lirida Naviner Paris, June 213
Probabilistic Transfer Matrices (PTM) inputs 1 1 11 a b outputs 1 1 1 1 1 IT M OR R OR = inputs 1 1 11 z outputs 1 1 1 1 1 P T M OR inputs 1 1 11 1 11 11 111 outputs 1 1 1 1 1 1 1 1 1 a bc R XOR3 = inputs 1 1 11 1 11 11 111 z 1 1 1 1 outputs 1 1 1 1 1 IT M XOR3 P T M XOR3 [1] S. Krishnaswamy, G.F Viamontes, I.L. Markov, I.L and J.P Hayes, Accurate reliability evaluation and enhancement via probabilistic transfer matrices, DATE 25 15 /33 COMELEC Lirida Naviner Paris, June 213
PTM Computation A B C A Level 1 (L1) B C Level 1 (L1) Level 2 (L2) Level 2 (L2) PTM = PTM PTM L1 AND NOT PTM = PTM PTM L1 AND NOT p p p S PTM p p NOT = p PTM AND = PTM NOR p p = p p p p S PTM p NOT = p PTM AND = PTM NOR p = p p p p 2 p 2 p p p p 2 p p p 2 p p p 2 p 2 p p p p p p 2 p 2 p 2 p p p 2 = p p = p p p 2 p 2 p p 2 p p p 2 p p p p 2 p p p p 2 p 2 2 p p p p p p p 2 2 p p p 2 p 2 = p p p p = p pp 2 p p 2 p 2 p p p 2 p p p 2 p p p p 2 p p 2 p p p p 2 2 p p p p p 16 /33 COMELEC Lirida Naviner Paris, June 213
PTM Computation A B C Level 1 (L1) Level 2 (L2) p p S PTM p NOT = p PTM AND = PTM NOR p = p p p p p p p p p p 2 p p p 2 p p p 2 p p p p p 2 p p 2 p 2 p 2p 2 + p 2 + 3 p 2 + 2p 2 + p 3 p p 2 p p p 2 PTM = PTM PTM 2 p p = p 2 = p 2 + 3p 2 L1 AND NOT p p p p p 2 2p 2 + p 3+ 3 p p 2 p 2 p p p p 2p 2 + 2 p 2 + 3 p p p p 2 2 + 2p 2 + p 3 2 p p p 2 p p p 2 p p p + 3p 2 p p 2 2p 2 + p 3+ 3 PTM CIR = PTM L1 PTM NOR = p 2 p 2 = p p p 2 2 p 2p p + p p + 3 2 p p 2 + 2p 2 + p 3 2 p p p 2 p p 2 + 3p 2 2p 2 + p 3+ 3 p 2 p p 2 2p 2 + p 3+ 3 3p 2 + p 2 p p 2 2 p 2p 2 + p 2 + 3 p 2 + 2p 2 + p 3 16 /33 COMELEC Lirida Naviner Paris, June 213
Reliability Calculation with PTM R cir = p(j i)p(i) ITM cir (i,j)=1 p(i) is the probability of input value i p(j i) is the (i, j)th element in PTM cir 2p 2 + p 2 + 3 1 1 p 2 + 3p 2 1 2p 2 + p 2 + 3 ITM CIR = 1 p 2 + 3p 2 PTM CIR = 1 2p 2 + p 2 + 3 1 p 2 + 3p 2 1 2p 2 + p 3+ 3 1 2p 2 + p 2 + 3 p 2 + 2p 2 + p 3 2p 2 + p 3+ 3 p 2 + 2p 2 + p 3 2p 2 + p 3+ 3 p 2 + 2p 2 + p 3 2p 2 + p 3+ 3 3p 2 + p 2 p 2 + 2p 2 + p 3 17 /33 COMELEC Lirida Naviner Paris, June 213
Signal Probability and Reliability (SPR) [ c 1 S = i i 1 c ] SPR S = [ p(s = c ) p(s = 1 i ) p(s = i ) p(s = 1 c ) ] [ s s = 1 s 2 s 3 ] R S = p(s = c ) + p(s = 1 c ) = s + s 3 [1] D. Franco, M. Vasconcelos, L. Naviner et J.-F. Naviner, Signal Probability for Reliability Evaluation of Logic Circuits, Microelectronics Reliability Journal, Septembre 28, vol. 48, pp. 1586-1591 18 /33 COMELEC Lirida Naviner Paris, June 213
Signal Reliability Propagation A 4 = B 4 =.5.5.5.5 NAND =.92 a y b y IT M NAND 1 1 1 1 y 3.25.25.25.25 I NAND = A 4 B 4.8.92.8.92.8.92.92.8 P T M NAND =.2.2.2.23 p(y).23.23.23.2.23.6 Y 4.2.69 19 /33 COMELEC Lirida Naviner Paris, June 213
Reconvergent Signals a = b A 4 = B 4 = a a 3 b b 3 a b circuit R NOT = y() y(1) Y () 4 = Y (1) 4 = a 3 a b 3 b a 3 a b 3 b R(circuit) = R y() R y(1) = a 3 b 3 2 + a 3 b 2 + a b 3 2 + a b 2 2 /33 COMELEC Lirida Naviner Paris, June 213
Multipath SPR P ass 1 : 1 A 4 = circuit a R NOT = y() y(1) Y () 4 = Y (1) 4 = R(circuit, a ) = R y() R y(1) a = a 2 P ass 2 : A 4 = 1 a circuit R NOT = y() y(1) Y () 4 = Y (1) 4 = R(circuit, a 3 ) = R y() R y(1) a 3 = a 3 2 R(circuit) = R(circuit, a ) + R(circuit, a 3 ) = a 2 + a 3 2 21 /33 COMELEC Lirida Naviner Paris, June 213
Multipath SPR A G1 D G3 B S C E G2 SPR Di = ITM G1 (SPR A SPR Bi ) PTM G1 (1) SPR Ei = ITM G2 (SPR Bi SPR C ) PTM G2 (2) SPR Si = ITM G3 (SPR Di SPR Ei ) PTM G3 (3) 4 SPR S = SPR Si w i (4) i=1 [1] D. Teixeira Franco, M. Rabelo de Vasconcelos, L. A. B. de Naviner, and J. cois Naviner, "A Tool for Signal Reliability Analysis of Logic Circuits," DATE 9 22 /33 COMELEC Lirida Naviner Paris, June 213
Conditional Probability I1 B1 Z1 S I2 B2 Z2 p(z 1 = i) = f 1(S, I1, B1) p(z 1 = i S = k) = f 1(I1, B1) p(z 2 = j) = f 2(S, I2, B2) p(z 2 = j S = k) = f 2(I2, B2) 23 /33 COMELEC Lirida Naviner Paris, June 213
Conditional Probability A c i 1i 1c B B B B c i 1i 1c c i 1i 1c c i 1i 1c c i 1i 1c G G G G G G G G G G G G G G G G c 1i i 1c 1i c 1c i i 1c c 1i 1c i 1i c 1i c 1c i c 1i i 1c 1c i 1i c i 1c 1i c p(z A1) p(z A2) p(z A3) p(z A4) Logic XOR 24 /33 COMELEC Lirida Naviner Paris, June 213
Conditional Probability Matrices (CPM) CPM Z = p(z 1 /a 1 ) p(z 1 /a 2 ) p(z 1 /a 3 ) p(z 1 /a 4 ) p(z 2 /a 1 ) p(z 2 /a 2 ) p(z 2 /a 3 ) p(z 2 /a 4 ) p(z 3 /a 1 ) p(z 3 /a 2 ) p(z 3 /a 3 ) p(z 3 /a 4 ) p(z 4 /a 1 ) p(z 4 /a 2 ) p(z 4 /a 3 ) p(z 4 /a 4 ) [1] J. Torras Flauer, J.-M. Daveau, L. Naviner and Ph. Roche, Handling reconvergent paths using conditional probabilities in combinatorial logic netlists reliability estimation, IEEE ICECS 1. 25 /33 COMELEC Lirida Naviner Paris, June 213
Conditional Probability Matrices (CPM) CPM Y = p(y 1 /s1 1... sn 1 )... p(y 1/s4 1... sn 4 ) p(y 2 /s1 1... sn 1 )... p(y 1/s4 1... sn 4 ) p(y 3 /s1 1... sn 1 )... p(y 1/s4 1... sn 4 ) p(y 4 /s1 1... sn 1 )... p(y 1/s4 1... sn 4 ) [1] J. Torras Flauer, J.-M. Daveau, L. Naviner and Ph. Roche, Handling reconvergent paths using conditional probabilities in combinatorial logic netlists reliability estimation, IEEE ICECS 1. 25 /33 COMELEC Lirida Naviner Paris, June 213
Reliability based on CPM A B P G C D P1 G1 PM GN X Y Gout Z SPR Z = ITM T G out M XY PTM T G out M N CPM X/A B = CPM Pi CPM Y /A B = CPM Gj i= i= 4 4 p(x i y j ) = p(x i /a k b l ) p(y j /a k b l ) p(a k ) p(b j ) k=1 l=1 26 /33 COMELEC Lirida Naviner Paris, June 213
Probabilistic Binomial Reliability Model R = 2 n e 1 j= fault free i faulty 2 n x 1 ( ) (1 i ) p(x i ) y(x i, e ) y(x i, e j ) i= Logical masking features Technology/profile aspects Relevant faults Relevant input vectors n m x generic logic circuit y [1] M. Correia De Vasconcelos, D. Teixeira Franco, L. Alves de Barros Naviner and J. F. Naviner, Reliability Analysis of Combinational Circuits Based on a Probabilistic Binomial Model, IEEE-NEWCAS and TAISA Conference, Montréal, Canada, June 28. 27 /33 COMELEC Lirida Naviner Paris, June 213
Probabilistic Binomial Reliability Model R = 2 n e 1 j= fault free i faulty 2 n x 1 ( ) (1 i ) p(x i ) y(x i, e ) y(x i, e j ) i= Logical masking features Technology/profile aspects Relevant faults Relevant input vectors n m x generic logic circuit y [1] M. Correia De Vasconcelos, D. Teixeira Franco, L. Alves de Barros Naviner and J. F. Naviner, Reliability Analysis of Combinational Circuits Based on a Probabilistic Binomial Model, IEEE-NEWCAS and TAISA Conference, Montréal, Canada, June 28. 27 /33 COMELEC Lirida Naviner Paris, June 213
Probabilistic Binomial Reliability Model R = 2 n e 1 j= fault free i faulty 2 n x 1 ( ) (1 i ) p(x i ) y(x i, e ) y(x i, e j ) i= Logical masking features Technology/profile aspects Relevant faults Relevant input vectors n m x generic logic circuit y [1] M. Correia De Vasconcelos, D. Teixeira Franco, L. Alves de Barros Naviner and J. F. Naviner, Reliability Analysis of Combinational Circuits Based on a Probabilistic Binomial Model, IEEE-NEWCAS and TAISA Conference, Montréal, Canada, June 28. 27 /33 COMELEC Lirida Naviner Paris, June 213
Probabilistic Binomial Reliability Model R = 2 n e 1 j= fault free i faulty 2 n x 1 ( ) (1 i ) p(x i ) y(x i, e ) y(x i, e j ) i= Logical masking features Technology/profile aspects Relevant faults Relevant input vectors n m x generic logic circuit y [1] M. Correia De Vasconcelos, D. Teixeira Franco, L. Alves de Barros Naviner and J. F. Naviner, Reliability Analysis of Combinational Circuits Based on a Probabilistic Binomial Model, IEEE-NEWCAS and TAISA Conference, Montréal, Canada, June 28. 27 /33 COMELEC Lirida Naviner Paris, June 213
Probabilistic Binomial Reliability Model R = 2 n e 1 j= fault free i faulty 2 n x 1 ( ) (1 i ) p(x i ) y(x i, e ) y(x i, e j ) i= Logical masking features Technology/profile aspects Relevant faults Relevant input vectors n m x generic logic circuit y [1] M. Correia De Vasconcelos, D. Teixeira Franco, L. Alves de Barros Naviner and J. F. Naviner, Reliability Analysis of Combinational Circuits Based on a Probabilistic Binomial Model, IEEE-NEWCAS and TAISA Conference, Montréal, Canada, June 28. 27 /33 COMELEC Lirida Naviner Paris, June 213
PBR Implementation Simulate (or emulate) errors and then compare outputs FIFA : Fault-Injection Fault-Analysis platform [1,2] Acceleration of fault injection and functional simulation input generation n inputs n inputs n fault free circuit fault prone circuit outputs m outputs m comparison coefficients storage fault generation G [1] E. Marues, N. Maciel, L. Naviner and J. F. Naviner, A New Fault Generator Suitable for Reliability Analysis of Digital Circuits, IEEE CUMTA 1. [2] L. Alves de Barros Naviner, J. F. Naviner, G. Gonçalves dos Santos Jr, E. Crespo Marues and N. Maciel, FIFA : A Fault-Injection-Fault-Analysis-based tool for reliability assessment at RTL level, ES- REF 11. 28 /33 COMELEC Lirida Naviner Paris, June 213
SNAP Approach Fault source and fault propagation input output input (a) output ifs analysis gf = 1 ofs (b) ofs = f (ifs, gf, input, gate) [1] S. Pagliarini, A. Ben Dhia, L. Naviner and J. F. Naviner, SNaP : a Novel Hybrid Method for Circuit Reliability Assessment Under Multiple Faults, ESREF 13. 29 /33 COMELEC Lirida Naviner Paris, June 213
Reliability Calculation based on SNAP R = ofs inputa inputb output (a) inputa ifsa inputb ifsb analysis gf = 1 analysis gf = 1 (b) analysis gf = 2 output ofs 3 /33 COMELEC Lirida Naviner Paris, June 213
Comparisons Accuracy Complexity Trade-off FPGA PTM - - SPR - - SPR-MP Yes - CPM Yes - PBR Yes Yes SNAP - Yes 31 /33 COMELEC Lirida Naviner Paris, June 213
Conclusions Reliability issues and challenges Need of cost-effective fault tolerant architectures Need of efficient assessment approaches Some Telecom ParisTech contributions (models, methods and tools) SPR, SPR-MP, CPM, SPR-DWAA PBR, FIFA, SNAP,... 32 /33 COMELEC Lirida Naviner Paris, June 213
Conclusions Reliability issues and challenges Need of cost-effective fault tolerant architectures Need of efficient assessment approaches Some Telecom ParisTech contributions (models, methods and tools) SPR, SPR-MP, CPM, SPR-DWAA PBR, FIFA, SNAP,... 32 /33 COMELEC Lirida Naviner Paris, June 213
To know more about... Contact : Visit : Prof. Lirida A. B. Naviner lirida.naviner@telecom-paristech.fr http ://nanoelec.wp.mines-telecom.fr 33 /33 COMELEC Lirida Naviner Paris, June 213