Revisiting Memory Errors in Large-Scale Production Data Centers


1 Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field. Justin Meza, Qiang Wu, Sanjeev Kumar, Onur Mutlu

2 Overview Study of DRAM reliability: on modern devices and workloads at a large scale in the field

3 Overview Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

4 Overview Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Errors follow a power-law distribution and a large number of errors occur due to sockets/channels Modeling errors Architecture & workload

5 Overview Error/failure occurrence We find that newer cell fabrication technologies have higher failure rates Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

6 Overview Error/failure occurrence Chips per DIMM, transfer width, and workload type (not necessarily CPU/ memory utilization) affect reliability Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

7 Overview Error/failure occurrence We have made publicly available a statistical model for assessing server memory reliability Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

8 Overview Error/failure occurrence Page offlining at scale First large-scale study of page offlining; real-world limitations of the technique New reliability trends Technology scaling Modeling errors Architecture & workload

9 Outline: background and motivation; server memory organization; error collection/analysis methodology; memory reliability trends; summary

10 Background and motivation

11 DRAM errors are common: examined extensively in prior work (charged particles, wear-out, variable retention time [next talk]). Error correcting codes are used to detect and correct errors, but require additional storage overheads

12 Our goal Strengthen understanding of DRAM reliability by studying: new trends in DRAM errors; modern devices and workloads; at a large scale: billions of device-days, across 14 months

13 Our main contributions identified new DRAM failure trends developed a model for DRAM errors evaluated page offlining at scale

14 Server memory organization

15

16 Socket

17 Memory channels

18 DIMM slots

19 DIMM

20 Chip

21 Banks

22 Rows and columns

23 Cell

24 User data ECC metadata: additional 12.5% overhead

25 Reliability events Fault: the underlying cause of an error (e.g., a DRAM cell unreliably stores charge) Error: the manifestation of a fault permanent: manifests every time transient: manifests only some of the time

26 Error collection/ analysis methodology

27 DRAM error measurement measured every correctable error across Facebook's fleet for 14 months metadata associated with each error parallelized Map-Reduce jobs to process the data used R for further analysis

28 System characteristics 6 different system configurations: Web, Hadoop, Ingest, Database, Cache, Media diverse CPU/memory/storage requirements modern DRAM devices: DDR communication protocol (more aggressive clock frequencies) diverse organizations (banks, ranks,...) previously unexamined characteristics: density, # of chips, transfer width, workload

29 Memory reliability trends

30 Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

31 Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

32 Server error rate [chart: fraction of servers with correctable errors (CE) and uncorrectable errors (UCE) per month, 7/13 to 8/14]

33 Memory error distribution

34 Memory error distribution decreasing hazard rate

35 Memory error distribution decreasing hazard rate How are errors mapped to memory organization?
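The distribution slides report that per-server error counts follow a power-law with a decreasing hazard rate, i.e., a small fraction of servers accounts for most errors. As a hedged illustration only (this is not the study's data or code; the server count, tail exponent, and seed are all made up), a short Python sketch draws hypothetical per-server error counts from a Pareto distribution and measures how concentrated the total becomes:

```python
import random

random.seed(42)

# Hypothetical parameters: 10,000 servers, Pareto tail exponent 1.1.
def error_concentration(num_servers=10_000, alpha=1.1, top_frac=0.01):
    # Draw a skewed (power-law) error count per server.
    counts = sorted((random.paretovariate(alpha) for _ in range(num_servers)),
                    reverse=True)
    top = counts[:int(num_servers * top_frac)]
    # Share of all errors contributed by the top servers.
    return sum(top) / sum(counts)

share = error_concentration()
```

Under a uniform distribution the top 1% of servers would hold about 1% of errors; with a heavy-tailed draw like this, their share is many times larger, which is the qualitative behavior the slides describe.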

36 Sockets/channels: many errors Socket Channel Bank Row Column Cell Spurious Fraction of errors...

37 Sockets/channels: many errors Fraction of errors... Socket Channel Bank Row Not mentioned in prior chip-level studies Column Cell Spurious

38 Sockets/channels: many errors Fraction of errors... Socket Channel Bank Row Not accounted for in prior chip-level studies At what rate do components fail on servers? Column Cell Spurious

39 Bank/cell/spurious failures are common Socket Channel Bank Row Column Cell Spurious Fraction of servers...

40 # of servers # of errors Socket Channel Fraction of servers... Fraction of errors... Denial-of-service-like behavior

41 # of servers # of errors Socket Channel Fraction of servers... Fraction of errors... Denial-of-service-like behavior What factors contribute to memory failures at scale?

42 Analytical methodology measure server characteristics not feasible to examine every server examined all servers with errors (error group) sampled servers without errors (control group) bucket devices based on characteristics measure relative failure rate of error group vs. control group within each bucket
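The bucketing methodology just described can be sketched in a few lines. This is a rough, hypothetical illustration of the idea only (the field names and sample counts below are invented, not Facebook's data): bucket servers by a characteristic, compare error-group against control-group membership per bucket, and normalize to get a relative failure rate.

```python
from collections import Counter

# Hypothetical sample: servers described only by DRAM chip density (Gb).
error_group   = [{"density_gb": d} for d in [1] * 2 + [2] * 5 + [4] * 9]
control_group = [{"density_gb": d} for d in [1] * 10 + [2] * 10 + [4] * 10]

errors   = Counter(s["density_gb"] for s in error_group)    # bucket error group
controls = Counter(s["density_gb"] for s in control_group)  # bucket control group

# Failure rate per bucket: error-group members per control-group member,
# then normalized so the least-failing bucket has a relative rate of 1.0.
rates = {b: errors[b] / controls[b] for b in controls}
baseline = min(rates.values())
relative = {b: r / baseline for b, r in rates.items()}
```

With the made-up counts above, the higher-density buckets come out with a higher relative failure rate, mirroring the kind of comparison the methodology enables.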

43 Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

44 Prior work found inconclusive trends with respect to memory capacity Relative server failure rate DIMM capacity (GB)

45 Prior work found inconclusive trends with respect to memory capacity Relative server failure rate... Examine characteristic more closely related to cell fabrication technology DIMM capacity (GB)

46 Use DRAM chip density to examine technology scaling (closely related to fabrication technology)

47 Relative server failure rate Chip density (Gb)

48 Relative server failure rate... Intuition: quadratic increase in capacity 1 2 4 Chip density (Gb)

49 Error/failure occurrence We find that newer cell fabrication technologies have higher failure rates Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

50 Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

51 DIMM architecture chips per DIMM, transfer width 8 to 48 chips x4, x8 = 4 or 8 bits per cycle electrical implications [DIMM mechanical drawing]

52 DIMM architecture Does DIMM organization affect memory reliability? chips per DIMM, transfer width 8 to 48 chips x4, x8 = 4 or 8 bits per cycle electrical implications [DIMM mechanical drawing]

53 1 Gb 2 Gb 4 Gb Relative server failure rate... x8 x4 Data chips per DIMM

55 1 Gb 2 Gb 4 Gb Relative server failure rate... x8 x4 No consistent trend across chips per DIMM alone Data chips per DIMM

59 1 Gb 2 Gb 4 Gb Relative server failure rate... x8 x4 More chips → higher failure rate Data chips per DIMM

61 1 Gb 2 Gb 4 Gb Relative server failure rate... x8 x4 More bits per cycle → higher failure rate Data chips per DIMM

62 1 Gb 2 Gb 4 Gb Relative server failure rate... x8 x4 Intuition: increased electrical loading Data chips per DIMM

63 Workload dependence prior studies: homogeneous workloads (web search and scientific) warehouse-scale data centers: web, hadoop, ingest, database, cache, media

64 Workload dependence prior studies: homogeneous workloads (web search and scientific) What effect do heterogeneous workloads have on reliability? warehouse-scale data centers: web, hadoop, ingest, database, cache, media

65 1 Gb 2 Gb 4 Gb Relative server failure rate... CPU utilization Relative server failure rate... Memory utilization

66 1 Gb 2 Gb 4 Gb Relative server failure rate... No consistent trend across CPU/memory utilization CPU utilization Memory utilization

67 Web Hadoop Ingest Database Memcache Media Relative server failure rate...

68 Web Hadoop Ingest Database Memcache Media Relative server failure rate... Varies by up to 6.5x

69 Error/failure occurrence Chips per DIMM, transfer width, and workload type (not necessarily CPU/ memory utilization) affect reliability Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

70 Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

71 A model for server failure use statistical regression model compare control group vs. error group linear regression in R trained using data from the analysis enable exploratory analysis (e.g., high perf. vs. low power systems)

72 Density Chips Age Memory error model Relative server failure rate...

73 Density Chips Age Memory error model Relative server failure rate... CPUs: number of physical CPU cores in the server. ln(F / (1 - F)) = β_Intercept + (Capacity × β_Capacity) + (Density2Gb × β_Density2Gb) + (Density4Gb × β_Density4Gb) + (Chips × β_Chips) + (CPU% × β_CPU%) + (Age × β_Age) + (CPUs × β_CPUs)
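The logistic model above can be sketched in a few lines of Python. Everything below is illustrative only: the coefficient values are invented placeholders (not the published ones), and the two server descriptions are hypothetical; only the functional form ln(F / (1 - F)) = β_Intercept + Σ xᵢ·βᵢ comes from the slide.

```python
import math

# Invented placeholder coefficients, NOT the published model's values.
BETAS = {
    "intercept":   -5.0,
    "capacity_gb":  0.02,
    "density_2gb":  0.4,
    "density_4gb":  0.8,
    "chips":        0.05,
    "cpu_pct":      0.01,
    "age_years":    0.3,
    "cpus":        -0.1,
}

def failure_probability(server):
    # logit = beta_intercept + sum_i(x_i * beta_i), then invert the logit.
    logit = BETAS["intercept"] + sum(x * BETAS[k] for k, x in server.items())
    return 1.0 / (1.0 + math.exp(-logit))

# Two hypothetical server configurations for an exploratory comparison.
low_end  = {"capacity_gb": 4,  "density_2gb": 1, "density_4gb": 0,
            "chips": 16, "cpu_pct": 50, "age_years": 1, "cpus": 8}
high_end = {"capacity_gb": 16, "density_2gb": 0, "density_4gb": 1,
            "chips": 32, "cpu_pct": 25, "age_years": 0, "cpus": 16}

rel = failure_probability(high_end) / failure_probability(low_end)
```

The ratio `rel` is the kind of relative-failure-rate comparison the model enables: plug in two configurations and compare their predicted probabilities.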

74 Available online

75 Error/failure occurrence We have made publicly available a statistical model for assessing server memory reliability Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

76 Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

77 Prior page offlining work [Tang+, DSN'06] proposed the technique: "retire" faulty pages using the OS; do not allow software to allocate them [Hwang+, ASPLOS'12] simulated evaluation on error traces from Google and IBM; recommended retirement on first error (large number of cell/spurious errors)

78 Prior page offlining work [Tang+, DSN'06] proposed the technique: "retire" faulty pages using the OS How effective is page offlining in the wild? do not allow software to allocate them [Hwang+, ASPLOS'12] simulated evaluation on error traces from Google and IBM; recommended retirement on first error (large number of cell/spurious errors)
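The "retirement on first error" policy recommended by the prior work can be sketched as a small state machine. This is a hedged, hypothetical sketch: the class and method names are invented, and the actual OS mechanism (on Linux, for example, writing a page's physical address to /sys/devices/system/memory/soft_offline_page as root) is abstracted behind an `offline_fn` callback so the policy logic stays self-contained.

```python
# Hypothetical retire-on-first-error policy tracker (names invented).
class PageOffliner:
    def __init__(self, offline_fn):
        self.offline_fn = offline_fn
        self.retired = set()   # pages already taken out of use
        self.failed = []       # attempts the OS rejected

    def on_corrected_error(self, page_addr):
        if page_addr in self.retired:
            return False       # already retired; nothing to do
        try:
            self.offline_fn(page_addr)
            self.retired.add(page_addr)
            return True
        except OSError:
            # Some offlining attempts can fail, e.g. when the kernel
            # cannot migrate a pinned page.
            self.failed.append(page_addr)
            return False

offliner = PageOffliner(offline_fn=lambda addr: None)  # stubbed backend
first = offliner.on_corrected_error(0x1000)
second = offliner.on_corrected_error(0x1000)  # duplicate report, ignored
```

Keeping the OS call behind a callback also makes it easy to count failed attempts, which matters for the real-world limitations discussed next.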

79 Cluster of 12,276 servers

80 Cluster of 12,276 servers -67%

81 Cluster of 12,276 servers -67% Prior work: -86% to -94%

82 Cluster of 12,276 servers -67% 6% of page offlining attempts failed due to OS Prior work: -86% to -94%

83 Error/failure occurrence Page offlining at scale First large-scale study of page offlining; real-world limitations of the technique New reliability trends Technology scaling Modeling errors Architecture & workload

84 Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

85 More results in paper Vendors Age Processor cores Correlation analysis Memory model case study

86 Summary

87 Modern systems Large scale

88 Summary Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

89 Summary Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Errors follow a power-law distribution and a large number of errors occur due to sockets/channels Modeling errors Architecture & workload

90 Summary Error/failure occurrence We find that newer cell fabrication technologies have higher failure rates Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

91 Summary Error/failure occurrence Chips per DIMM, transfer width, and workload type (not necessarily CPU/ memory utilization) affect reliability Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

92 Summary Error/failure occurrence We have made publicly available a statistical model for assessing server memory reliability Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

93 Summary Error/failure occurrence Page offlining at scale First large-scale study of page offlining; real-world limitations of the technique New reliability trends Technology scaling Modeling errors Architecture & workload

94 Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field. Justin Meza, Qiang Wu, Sanjeev Kumar, Onur Mutlu

95 Backup slides

96 decreasing hazard rate

97 [backup: DIMM mechanical drawing]

98 [backup: DIMM drawings annotated with error counts by chip density: 1 Gb, 2 Gb, 4 Gb]

99 [backup: DIMM drawings annotated with error counts by chip density: 1 Gb, 2 Gb, 4 Gb]

100 [backup: DIMM drawings annotated with error counts, bucketed by chip density: 1 Gb, 2 Gb, 4 Gb]

101 [backup: DIMM drawings annotated with error counts by chip density: 1 Gb, 2 Gb, 4 Gb]

102 [backup: DIMM drawings annotated with error counts by chip density: 1 Gb, 2 Gb, 4 Gb]

103 Case study

104 Case study Factor Low-end High-end (HE) Capacity 4 GB 16 GB Density2Gb Density4Gb Chips 6 2 CPU% % 2% Age CPUs 8 16 Predicted relative failure rate .2 .78 Inputs Output

105 Case study Factor Low-end High-end (HE) Capacity 4 GB 16 GB Density2Gb Density4Gb Chips 6 2 CPU% % 2% Age CPUs 8 16 Predicted relative failure rate .2 .78 Does CPUs or density have a higher impact? Inputs Output

106 Exploratory analysis Factor Low-end High-end (HE) HE/#density HE/#CPUs Capacity 4 GB 16 GB 4 GB 16 GB Density2Gb Density4Gb Chips CPU% % 2% 2% % Age CPUs Predicted relative failure rate

107 Exploratory analysis Factor Low-end High-end (HE) HE/#density HE/#CPUs Capacity 4 GB 16 GB 4 GB 16 GB Density2Gb Density4Gb Chips CPU% % 2% 2% % Age CPUs Predicted relative failure rate


More information

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

ESE 570: Digital Integrated Circuits and VLSI Fundamentals ESE 570: Digital Integrated Circuits and VLSI Fundamentals Lec 21: April 4, 2017 Memory Overview, Memory Core Cells Penn ESE 570 Spring 2017 Khanna Today! Memory " Classification " ROM Memories " RAM Memory

More information

Lecture 13: Sequential Circuits, FSM

Lecture 13: Sequential Circuits, FSM Lecture 13: Sequential Circuits, FSM Today s topics: Sequential circuits Finite state machines 1 Clocks A microprocessor is composed of many different circuits that are operating simultaneously if each

More information

Redundant Array of Independent Disks

Redundant Array of Independent Disks Redundant Array of Independent Disks Yashwant K. Malaiya 1 Redundant Array of Independent Disks (RAID) Enables greater levels of performance and/or reliability How? By concurrent use of two or more hard

More information

Do we have a quorum?

Do we have a quorum? Do we have a quorum? Quorum Systems Given a set U of servers, U = n: A quorum system is a set Q 2 U such that Q 1, Q 2 Q : Q 1 Q 2 Each Q in Q is a quorum How quorum systems work: A read/write shared register

More information

P214 Efficient Computation of Passive Seismic Interferometry

P214 Efficient Computation of Passive Seismic Interferometry P214 Efficient Computation of Passive Seismic Interferometry J.W. Thorbecke* (Delft University of Technology) & G.G. Drijkoningen (Delft University of Technology) SUMMARY Seismic interferometry is from

More information

Lecture 2: Metrics to Evaluate Systems

Lecture 2: Metrics to Evaluate Systems Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with AM, GM, HM Sign up for the class mailing list! Video

More information

EECS150 - Digital Design Lecture 26 Error Correction Codes, Linear Feedback Shift Registers (LFSRs)

EECS150 - Digital Design Lecture 26 Error Correction Codes, Linear Feedback Shift Registers (LFSRs) EECS150 - igital esign Lecture 26 Error Correction Codes, Linear Feedback Shift Registers (LFSRs) Nov 21, 2002 John Wawrzynek Fall 2002 EECS150 Lec26-ECC Page 1 Outline Error detection using parity Hamming

More information

EECS150 - Digital Design Lecture 26 Faults and Error Correction. Recap

EECS150 - Digital Design Lecture 26 Faults and Error Correction. Recap EECS150 - Digital Design Lecture 26 Faults and Error Correction Nov. 26, 2013 Prof. Ronald Fearing Electrical Engineering and Computer Sciences University of California, Berkeley (slides courtesy of Prof.

More information

ECE-470 Digital Design II Memory Test. Memory Cells Per Chip. Failure Mechanisms. Motivation. Test Time in Seconds (Memory Size: n Bits) Fault Types

ECE-470 Digital Design II Memory Test. Memory Cells Per Chip. Failure Mechanisms. Motivation. Test Time in Seconds (Memory Size: n Bits) Fault Types ECE-470 Digital Design II Memory Test Motivation Semiconductor memories are about 35% of the entire semiconductor market Memories are the most numerous IPs used in SOC designs Number of bits per chip continues

More information

Scalable Asynchronous Gradient Descent Optimization for Out-of-Core Models

Scalable Asynchronous Gradient Descent Optimization for Out-of-Core Models Scalable Asynchronous Gradient Descent Optimization for Out-of-Core Models Chengjie Qin 1, Martin Torres 2, and Florin Rusu 2 1 GraphSQL, Inc. 2 University of California Merced August 31, 2017 Machine

More information

Real-Time Course. Clock synchronization. June Peter van der TU/e Computer Science, System Architecture and Networking

Real-Time Course. Clock synchronization. June Peter van der TU/e Computer Science, System Architecture and Networking Real-Time Course Clock synchronization 1 Clocks Processor p has monotonically increasing clock function C p (t) Clock has drift rate For t1 and t2, with t2 > t1 (1-ρ)(t2-t1)

More information

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2) INF2270 Spring 2010 Philipp Häfliger Summary/Repetition (1/2) content From Scalar to Superscalar Lecture Summary and Brief Repetition Binary numbers Boolean Algebra Combinational Logic Circuits Encoder/Decoder

More information

Fault Modeling. 李昆忠 Kuen-Jong Lee. Dept. of Electrical Engineering National Cheng-Kung University Tainan, Taiwan. VLSI Testing Class

Fault Modeling. 李昆忠 Kuen-Jong Lee. Dept. of Electrical Engineering National Cheng-Kung University Tainan, Taiwan. VLSI Testing Class Fault Modeling 李昆忠 Kuen-Jong Lee Dept. of Electrical Engineering National Cheng-Kung University Tainan, Taiwan Class Fault Modeling Some Definitions Why Modeling Faults Various Fault Models Fault Detection

More information

Introduction to Reliability Theory (part 2)

Introduction to Reliability Theory (part 2) Introduction to Reliability Theory (part 2) Frank Coolen UTOPIAE Training School II, Durham University 3 July 2018 (UTOPIAE) Introduction to Reliability Theory 1 / 21 Outline Statistical issues Software

More information

Computational and Monte-Carlo Aspects of Systems for Monitoring Reliability Data. Emmanuel Yashchin IBM Research, Yorktown Heights, NY

Computational and Monte-Carlo Aspects of Systems for Monitoring Reliability Data. Emmanuel Yashchin IBM Research, Yorktown Heights, NY Computational and Monte-Carlo Aspects of Systems for Monitoring Reliability Data Emmanuel Yashchin IBM Research, Yorktown Heights, NY COMPSTAT 2010, Paris, 2010 Outline Motivation Time-managed lifetime

More information

Comparing the Effects of Intermittent and Transient Hardware Faults on Programs

Comparing the Effects of Intermittent and Transient Hardware Faults on Programs Comparing the Effects of Intermittent and Transient Hardware Faults on Programs Jiesheng Wei, Layali Rashid, Karthik Pattabiraman and Sathish Gopalakrishnan Department of Electrical and Computer Engineering,

More information

Does soft-decision ECC improve endurance?

Does soft-decision ECC improve endurance? Flash Memory Summit 2015 Does soft-decision ECC improve endurance? Thomas Parnell IBM Research - Zurich Outline Motivation Why do we need more endurance? Capacity What is channel capacity? How can it be

More information

Contributions to the Evaluation of Ensembles of Combinational Logic Gates

Contributions to the Evaluation of Ensembles of Combinational Logic Gates Contributions to the Evaluation of Ensembles of Combinational Logic Gates R. P. Ribas, S. Bavaresco, N. Schuch, V. Callegaro,. Lubaszewski, A. I. Reis Institute of Informatics FRGS, Brazil. ISCAS 29 &

More information

Agreement Protocols. CS60002: Distributed Systems. Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur

Agreement Protocols. CS60002: Distributed Systems. Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur Agreement Protocols CS60002: Distributed Systems Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur Classification of Faults Based on components that failed Program

More information

EECS150 - Digital Design Lecture 26 - Faults and Error Correction. Types of Faults in Digital Designs

EECS150 - Digital Design Lecture 26 - Faults and Error Correction. Types of Faults in Digital Designs EECS150 - Digital Design Lecture 26 - Faults and Error Correction April 25, 2013 John Wawrzynek 1 Types of Faults in Digital Designs Design Bugs (function, timing, power draw) detected and corrected at

More information

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining Slide 1 EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining Slide 2 Topics Clocking Clock Parameters Latch Types Requirements for reliable clocking Pipelining Optimal pipelining

More information

Terminology and Concepts

Terminology and Concepts Terminology and Concepts Prof. Naga Kandasamy 1 Goals of Fault Tolerance Dependability is an umbrella term encompassing the concepts of reliability, availability, performability, safety, and testability.

More information

Digital Integrated Circuits A Design Perspective. Semiconductor. Memories. Memories

Digital Integrated Circuits A Design Perspective. Semiconductor. Memories. Memories Digital Integrated Circuits A Design Perspective Semiconductor Chapter Overview Memory Classification Memory Architectures The Memory Core Periphery Reliability Case Studies Semiconductor Memory Classification

More information

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each

More information

Fault Tolerance. Dealing with Faults

Fault Tolerance. Dealing with Faults Fault Tolerance Real-time computing systems must be fault-tolerant: they must be able to continue operating despite the failure of a limited subset of their hardware or software. They must also allow graceful

More information

Parallel Transposition of Sparse Data Structures

Parallel Transposition of Sparse Data Structures Parallel Transposition of Sparse Data Structures Hao Wang, Weifeng Liu, Kaixi Hou, Wu-chun Feng Department of Computer Science, Virginia Tech Niels Bohr Institute, University of Copenhagen Scientific Computing

More information

GMU, ECE 680 Physical VLSI Design 1

GMU, ECE 680 Physical VLSI Design 1 ECE680: Physical VLSI Design Chapter VIII Semiconductor Memory (chapter 12 in textbook) 1 Chapter Overview Memory Classification Memory Architectures The Memory Core Periphery Reliability Case Studies

More information

Analysis of the Tradeoffs between Energy and Run Time for Multilevel Checkpointing

Analysis of the Tradeoffs between Energy and Run Time for Multilevel Checkpointing Analysis of the Tradeoffs between Energy and Run Time for Multilevel Checkpointing Prasanna Balaprakash, Leonardo A. Bautista Gomez, Slim Bouguerra, Stefan M. Wild, Franck Cappello, and Paul D. Hovland

More information

Chapter 7 HYPOTHESIS-BASED INVESTIGATION OF DIGITAL TIMESTAMPS. 1. Introduction. Svein Willassen

Chapter 7 HYPOTHESIS-BASED INVESTIGATION OF DIGITAL TIMESTAMPS. 1. Introduction. Svein Willassen Chapter 7 HYPOTHESIS-BASED INVESTIGATION OF DIGITAL TIMESTAMPS Svein Willassen Abstract Timestamps stored on digital media play an important role in digital investigations. However, the evidentiary value

More information

On Superposition of Heterogeneous Edge Processes in Dynamic Random Graphs

On Superposition of Heterogeneous Edge Processes in Dynamic Random Graphs On Superposition of Heterogeneous Edge Processes in Dynamic Random Graphs Zhongmei Yao (Univerity of Dayton) Daren B.H. Cline (Texas A&M University) Dmitri Loguinov (Texas A&M University) IEEE INFOCOM

More information

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 11

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 11 EEC 686/785 Modeling & Performance Evaluation of Computer Systems Lecture Department of Electrical and Computer Engineering Cleveland State University wenbing@ieee.org (based on Dr. Raj Jain s lecture

More information

EE115C Winter 2017 Digital Electronic Circuits. Lecture 19: Timing Analysis

EE115C Winter 2017 Digital Electronic Circuits. Lecture 19: Timing Analysis EE115C Winter 2017 Digital Electronic Circuits Lecture 19: Timing Analysis Outline Timing parameters Clock nonidealities (skew and jitter) Impact of Clk skew on timing Impact of Clk jitter on timing Flip-flop-

More information

416 Distributed Systems

416 Distributed Systems 416 Distributed Systems RAID, Feb 26 2018 Thanks to Greg Ganger and Remzi Arapaci-Dusseau for slides Outline Using multiple disks Why have multiple disks? problem and approaches RAID levels and performance

More information

Introduction to VLSI Testing

Introduction to VLSI Testing Introduction to 李昆忠 Kuen-Jong Lee Dept. of Electrical Engineering National Cheng-Kung University Tainan, Taiwan Class Problems to Think How are you going to test A 32 bit adder A 32 bit counter A 32Mb

More information

CMPE12 - Notes chapter 1. Digital Logic. (Textbook Chapter 3)

CMPE12 - Notes chapter 1. Digital Logic. (Textbook Chapter 3) CMPE12 - Notes chapter 1 Digital Logic (Textbook Chapter 3) Transistor: Building Block of Computers Microprocessors contain TONS of transistors Intel Montecito (2005): 1.72 billion Intel Pentium 4 (2000):

More information

Machine Learning to Automatically Detect Human Development from Satellite Imagery

Machine Learning to Automatically Detect Human Development from Satellite Imagery Technical Disclosure Commons Defensive Publications Series April 24, 2017 Machine Learning to Automatically Detect Human Development from Satellite Imagery Matthew Manolides Follow this and additional

More information

Administrivia. Course Objectives. Overview. Lecture Notes Week markem/cs333/ 2. Staff. 3. Prerequisites. 4. Grading. 1. Theory and application

Administrivia. Course Objectives. Overview. Lecture Notes Week markem/cs333/ 2. Staff. 3. Prerequisites. 4. Grading. 1. Theory and application Administrivia 1. markem/cs333/ 2. Staff 3. Prerequisites 4. Grading Course Objectives 1. Theory and application 2. Benefits 3. Labs TAs Overview 1. What is a computer system? CPU PC ALU System bus Memory

More information

1 Brief Introduction to Quantum Mechanics

1 Brief Introduction to Quantum Mechanics CMSC 33001: Novel Computing Architectures and Technologies Lecturer: Yongshan Ding Scribe: Jean Salac Lecture 02: From bits to qubits October 4, 2018 1 Brief Introduction to Quantum Mechanics 1.1 Quantum

More information

PI SERVER 2012 Do. More. Faster. Now! Copyr i g h t 2012 O S Is o f t, L L C. 1

PI SERVER 2012 Do. More. Faster. Now! Copyr i g h t 2012 O S Is o f t, L L C. 1 PI SERVER 2012 Do. More. Faster. Now! Copyr i g h t 2012 O S Is o f t, L L C. 1 AUGUST 7, 2007 APRIL 14, 2010 APRIL 24, 2012 Copyr i g h t 2012 O S Is o f t, L L C. 2 PI Data Archive Security PI Asset

More information

Chapter 3. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 3 <1>

Chapter 3. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 3 <1> Chapter 3 Digital Design and Computer Architecture, 2 nd Edition David Money Harris and Sarah L. Harris Chapter 3 Chapter 3 :: Topics Introduction Latches and Flip-Flops Synchronous Logic Design Finite

More information

Semiconductor Memories

Semiconductor Memories Semiconductor References: Adapted from: Digital Integrated Circuits: A Design Perspective, J. Rabaey UCB Principles of CMOS VLSI Design: A Systems Perspective, 2nd Ed., N. H. E. Weste and K. Eshraghian

More information

CMSC 313 Lecture 25 Registers Memory Organization DRAM

CMSC 313 Lecture 25 Registers Memory Organization DRAM CMSC 33 Lecture 25 Registers Memory Organization DRAM UMBC, CMSC33, Richard Chang A-75 Four-Bit Register Appendix A: Digital Logic Makes use of tri-state buffers so that multiple registers

More information

Architecture-Level Soft Error Analysis: Examining the Limits of Common Assumptions

Architecture-Level Soft Error Analysis: Examining the Limits of Common Assumptions In the Proceedings of the International Conference on Dependable Systems and Networks(DSN 07), June 2007. Architecture-Level Soft Error Analysis: Examining the Limits of Common Assumptions Xiaodong Li,

More information

Lecture 6 NEW TYPES OF MEMORY

Lecture 6 NEW TYPES OF MEMORY Lecture 6 NEW TYPES OF MEMORY Memory Logic needs memory to function (efficiently) Current memories Volatile memory SRAM DRAM Non-volatile memory (Flash) Emerging memories Phase-change memory STT-MRAM (Ferroelectric

More information

Coded Caching for Files with Distinct File Sizes

Coded Caching for Files with Distinct File Sizes Coded Caching for Files with Distinct File Sizes Jinbei Zhang, Xiaojun Lin, Chih-Chun Wang, Xinbing Wang Shanghai Jiao Tong University Purdue University The Importance of Caching Server cache cache cache

More information

Magnetic core memory (1951) cm 2 ( bit)

Magnetic core memory (1951) cm 2 ( bit) Magnetic core memory (1951) 16 16 cm 2 (128 128 bit) Semiconductor Memory Classification Read-Write Memory Non-Volatile Read-Write Memory Read-Only Memory Random Access Non-Random Access EPROM E 2 PROM

More information

Square Always Exponentiation

Square Always Exponentiation Square Always Exponentiation Christophe Clavier 1 Benoit Feix 1,2 Georges Gagnerot 1,2 Mylène Roussellet 2 Vincent Verneuil 2,3 1 XLIM-Université de Limoges, France 2 INSIDE Secure, Aix-en-Provence, France

More information

Part 1: Hashing and Its Many Applications

Part 1: Hashing and Its Many Applications 1 Part 1: Hashing and Its Many Applications Sid C-K Chau Chi-Kin.Chau@cl.cam.ac.u http://www.cl.cam.ac.u/~cc25/teaching Why Randomized Algorithms? 2 Randomized Algorithms are algorithms that mae random

More information

Using R for Iterative and Incremental Processing

Using R for Iterative and Incremental Processing Using R for Iterative and Incremental Processing Shivaram Venkataraman, Indrajit Roy, Alvin AuYoung, Robert Schreiber UC Berkeley and HP Labs UC BERKELEY Big Data, Complex Algorithms PageRank (Dominant

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Fault Tolerant Computing ECE 655

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Fault Tolerant Computing ECE 655 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Fault Tolerant Computing ECE 655 Part 1 Introduction C. M. Krishna Fall 2006 ECE655/Krishna Part.1.1 Prerequisites Basic courses in

More information

Stochastic Dynamic Thermal Management: A Markovian Decision-based Approach. Hwisung Jung, Massoud Pedram

Stochastic Dynamic Thermal Management: A Markovian Decision-based Approach. Hwisung Jung, Massoud Pedram Stochastic Dynamic Thermal Management: A Markovian Decision-based Approach Hwisung Jung, Massoud Pedram Outline Introduction Background Thermal Management Framework Accuracy of Modeling Policy Representation

More information

Quantitative evaluation of Dependability

Quantitative evaluation of Dependability Quantitative evaluation of Dependability 1 Quantitative evaluation of Dependability Faults are the cause of errors and failures. Does the arrival time of faults fit a probability distribution? If so, what

More information

Troubleshooting IOM Issues

Troubleshooting IOM Issues This chapter contains the following sections: IOM Terminology, page 1 Chassis Boot Sequence, page 2 Link Pinning and Failover Behavior, page 3 Recommended Solutions for IOM Issues, page 4 IOM Terminology

More information

Saving Power in the Data Center. Marios Papaefthymiou

Saving Power in the Data Center. Marios Papaefthymiou Saving Power in the Data Center Marios Papaefthymiou Energy Use in Data Centers Several MW for large facilities Responsible for sizable fraction of US electricity consumption 2006: 1.5% or 61 billion kwh

More information

Lecture 13: Sequential Circuits, FSM

Lecture 13: Sequential Circuits, FSM Lecture 13: Sequential Circuits, FSM Today s topics: Sequential circuits Finite state machines Reminder: midterm on Tue 2/28 will cover Chapters 1-3, App A, B if you understand all slides, assignments,

More information

THE ZCACHE: DECOUPLING WAYS AND ASSOCIATIVITY. Daniel Sanchez and Christos Kozyrakis Stanford University

THE ZCACHE: DECOUPLING WAYS AND ASSOCIATIVITY. Daniel Sanchez and Christos Kozyrakis Stanford University THE ZCACHE: DECOUPLING WAYS AND ASSOCIATIVITY Daniel Sanchez and Christos Kozyrakis Stanford University MICRO-43, December 6 th 21 Executive Summary 2 Mitigating the memory wall requires large, highly

More information

The Analysis of Microburst (Burstiness) on Virtual Switch

The Analysis of Microburst (Burstiness) on Virtual Switch The Analysis of Microburst (Burstiness) on Virtual Switch Chunghan Lee Fujitsu Laboratories 09.19.2016 Copyright 2016 FUJITSU LABORATORIES LIMITED Background What is Network Function Virtualization (NFV)?

More information

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah PERFORMANCE METRICS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Jan. 17 th : Homework 1 release (due on Jan.

More information

Introduction to Side Channel Analysis. Elisabeth Oswald University of Bristol

Introduction to Side Channel Analysis. Elisabeth Oswald University of Bristol Introduction to Side Channel Analysis Elisabeth Oswald University of Bristol Outline Part 1: SCA overview & leakage Part 2: SCA attacks & exploiting leakage and very briefly Part 3: Countermeasures Part

More information

2M x 32 Bit 5V FPM SIMM. Fast Page Mode (FPM) DRAM SIMM S51T04JD Pin 2Mx32 FPM SIMM Unbuffered, 1k Refresh, 5V. General Description.

2M x 32 Bit 5V FPM SIMM. Fast Page Mode (FPM) DRAM SIMM S51T04JD Pin 2Mx32 FPM SIMM Unbuffered, 1k Refresh, 5V. General Description. Fast Page Mode (FPM) DRAM SIMM 322006-S51T04JD Pin 2Mx32 Unbuffered, 1k Refresh, 5V General Description The module is a 2Mx32 bit, 4 chip, 5V, 72 Pin SIMM module consisting of (4) 1Mx16 (SOJ) DRAM. The

More information

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 Multi-processor vs. Multi-computer architecture µp vs. DSP RISC vs. DSP RISC Reduced-instruction-set Register-to-register operation Higher throughput by using

More information

Using A54SX72A and RT54SX72S Quadrant Clocks

Using A54SX72A and RT54SX72S Quadrant Clocks Application Note AC169 Using A54SX72A and RT54SX72S Quadrant Clocks Architectural Overview The A54SX72A and RT54SX72S devices offer four quadrant clock networks (QCLK0, 1, 2, and 3) that can be driven

More information

Bus transit = 20 ns (one way) access, each module cannot be accessed faster than 120 ns. So, the maximum bandwidth is

Bus transit = 20 ns (one way) access, each module cannot be accessed faster than 120 ns. So, the maximum bandwidth is 32 Flynn: Coputer Architecture { The Solutions Chapter 6. Meory Syste Design Proble 6. The eory odule uses 64 4 M b chips for 32MB of data and 8 4 M b chips for ECC. This allows 64 bits + 8 bits ECC to

More information

April 2004 AS7C3256A

April 2004 AS7C3256A pril 2004 S7C3256 3.3V 32K X 8 CMOS SRM (Common I/O) Features Pin compatible with S7C3256 Industrial and commercial temperature options Organization: 32,768 words 8 bits High speed - 10/12/15/20 ns address

More information

Analysis of Software Artifacts

Analysis of Software Artifacts Analysis of Software Artifacts System Performance I Shu-Ngai Yeung (with edits by Jeannette Wing) Department of Statistics Carnegie Mellon University Pittsburgh, PA 15213 2001 by Carnegie Mellon University

More information

EECS 579: SOC Testing

EECS 579: SOC Testing EECS 579: SOC Testing Core-Based Systems-On-A-Chip (SOCs) Cores or IP circuits are predesigned and verified functional units of three main types Soft core: synthesizable RTL Firm core: gate-level netlist

More information

Predicting IC Defect Level using Diagnosis

Predicting IC Defect Level using Diagnosis 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising

More information

Outline. EECS Components and Design Techniques for Digital Systems. Lec 18 Error Coding. In the real world. Our beautiful digital world.

Outline. EECS Components and Design Techniques for Digital Systems. Lec 18 Error Coding. In the real world. Our beautiful digital world. Outline EECS 150 - Components and esign Techniques for igital Systems Lec 18 Error Coding Errors and error models Parity and Hamming Codes (SECE) Errors in Communications LFSRs Cyclic Redundancy Check

More information

SLOVAKIA. Slovak Hydrometeorological Institute

SLOVAKIA. Slovak Hydrometeorological Institute WWW TECHNICAL PROGRESS REPORT ON THE GLOBAL DATA- PROCESSING AND FORECASTING SYSTEM (GDPFS), AND THE ANNUAL NUMERICAL WEATHER PREDICTION (NWP) PROGRESS REPORT FOR THE YEAR 2005 SLOVAKIA Slovak Hydrometeorological

More information