Revisiting Memory Errors in Large-Scale Production Data Centers

Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field. Justin Meza, Qiang Wu, Sanjeev Kumar, Onur Mutlu

Overview Study of DRAM reliability: on modern devices and workloads at a large scale in the field

Overview Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

Overview Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Errors follow a power-law distribution and a large number of errors occur due to sockets/channels Modeling errors Architecture & workload

Overview Error/failure occurrence We find that newer cell fabrication technologies have higher failure rates Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

Overview Error/failure occurrence Chips per DIMM, transfer width, and workload type (not necessarily CPU/ memory utilization) affect reliability Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

Overview Error/failure occurrence We have made publicly available a statistical model for assessing server memory reliability Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

Overview Error/failure occurrence Page offlining at scale First large-scale study of page offlining; real-world limitations of the technique New reliability trends Technology scaling Modeling errors Architecture & workload

Outline: background and motivation; server memory organization; error collection/analysis methodology; memory reliability trends; summary

Background and motivation

DRAM errors are common: examined extensively in prior work; causes include charged particles, wear-out, variable retention time (next talk); error correcting codes are used to detect and correct errors but require additional storage overheads

Our goal: strengthen understanding of DRAM reliability by studying new trends in DRAM errors on modern devices and workloads at a large scale (billions of device-days, across 14 months)

Our main contributions identified new DRAM failure trends developed a model for DRAM errors evaluated page offlining at scale

Server memory organization

Socket

Memory channels

DIMM slots

DIMM

Chip

Banks

Rows and columns

Cell

User data + ECC metadata (an additional 12.5% overhead)
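The 12.5% figure can be sanity-checked with a quick calculation. The sketch below is my own illustration (not from the talk), assuming the standard SEC-DED arrangement used on ECC DIMMs, where 8 check bits protect each 64-bit data word (72 bits stored per 64 bits of user data):

```python
# Back-of-the-envelope check of SEC-DED ECC storage overhead (illustrative only).
# A Hamming code needs r check bits with 2**r >= k + r + 1 to locate any
# single-bit error in a k-bit data word; one extra parity bit adds
# double-error detection (SEC-DED).

def secded_check_bits(k: int) -> int:
    """Check bits needed for SEC-DED protection of a k-bit data word."""
    r = 1
    while 2 ** r < k + r + 1:
        r += 1
    return r + 1  # extra overall parity bit for double-error detection

k = 64
r = secded_check_bits(k)  # 8 check bits for a 64-bit word
print(f"{r} check bits per {k} data bits -> {r / k:.1%} overhead")  # 12.5%
```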

Reliability events. Fault: the underlying cause of an error (e.g., a DRAM cell unreliably stores charge). Error: the manifestation of a fault; permanent: occurs every time; transient: occurs only some of the time

Error collection/ analysis methodology

DRAM error measurement: measured every correctable error across Facebook's fleet for 14 months; metadata associated with each error; parallelized Map-Reduce to process; used R for further analysis

System characteristics: 6 different system configurations (Web, Hadoop, Ingest, Database, Cache, Media) with diverse CPU/memory/storage requirements; modern DRAM devices: DDR3 communication protocol (more aggressive clock frequencies), diverse organizations (banks, ranks, ...); previously unexamined characteristics: density, # of chips, transfer width, workload

Memory reliability trends

Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

Server error rate [chart: fraction of servers with correctable errors (CE) and uncorrectable errors (UCE), by month]

Memory error distribution

Memory error distribution: decreasing hazard rate
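As a rough illustration of the "power-law distribution, decreasing hazard rate" observation, the sketch below (my own, on synthetic data, not the study's analysis) fits a Pareto tail to per-server error counts with the maximum-likelihood (Hill) estimator; a Pareto distribution has hazard rate alpha/x, which decreases as the error count grows:

```python
# Pareto tail fit for per-server error counts (synthetic stand-in data).
import math
import random

def pareto_alpha_mle(samples, x_min):
    """Hill/MLE estimate of the Pareto tail index for samples >= x_min."""
    tail = [x for x in samples if x >= x_min]
    return len(tail) / sum(math.log(x / x_min) for x in tail)

random.seed(0)
# Heavy-tailed synthetic counts; the real input would be correctable errors
# logged per server over the measurement period.
errors_per_server = [int(random.paretovariate(1.3)) for _ in range(10_000)]

alpha = pareto_alpha_mle(errors_per_server, x_min=1)
print(f"estimated tail index alpha = {alpha:.2f}")
# Pareto hazard rate h(x) = alpha / x decreases with x: servers that have
# already produced many errors tend to keep producing them.
```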

Memory error distribution: decreasing hazard rate. How are errors mapped to the memory organization?

Sockets/channels: many errors [chart: fraction of errors by component: socket, channel, bank, row, column, cell, spurious]

Sockets/channels: many errors; socket and channel failures were not mentioned in prior chip-level studies

Sockets/channels: many errors; not accounted for in prior chip-level studies. At what rate do components fail on servers?

Bank/cell/spurious failures are common [chart: fraction of servers by failing component: socket, channel, bank, row, column, cell, spurious]

# of servers vs. # of errors [charts: fraction of servers and fraction of errors for socket and channel failures]; denial-of-service-like behavior

# of servers vs. # of errors: denial-of-service-like behavior. What factors contribute to memory failures at scale?

Analytical methodology measure server characteristics not feasible to examine every server examined all servers with errors (error group) sampled servers without errors (control group) bucket devices based on characteristics measure relative failure rate of error group vs. control group within each bucket
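A minimal sketch of this methodology (assumed record format and a made-up 10% control sample, not the study's actual pipeline): estimate each bucket's failure rate as error-group servers over all servers in the bucket, scaling the sampled control group back up to fleet size, then report rates relative to a baseline bucket.

```python
from collections import Counter

def failure_rate_by_bucket(error_group, control_group, key, control_sampling):
    """Per-bucket failure-rate estimate; control_sampling is the fraction of
    error-free servers that was sampled into the control group."""
    err = Counter(key(s) for s in error_group)
    ctl = Counter(key(s) for s in control_group)
    rates = {}
    for bucket in set(err) | set(ctl):
        estimated_total = err[bucket] + ctl[bucket] / control_sampling
        rates[bucket] = err[bucket] / estimated_total if estimated_total else 0.0
    return rates

# Toy per-server records bucketed by chip density (hypothetical values).
error_group   = [{"density_gb": 4}, {"density_gb": 4}, {"density_gb": 2}, {"density_gb": 1}]
control_group = [{"density_gb": 4}, {"density_gb": 2}, {"density_gb": 2},
                 {"density_gb": 1}, {"density_gb": 1}]
rates = failure_rate_by_bucket(error_group, control_group,
                               key=lambda s: s["density_gb"], control_sampling=0.1)
baseline = min(rates)  # lowest-density bucket as the reference
print({bucket: rate / rates[baseline] for bucket, rate in rates.items()})
```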

Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

Prior work found inconclusive trends with respect to memory capacity [chart: relative server failure rate vs. DIMM capacity (GB)]

Prior work found inconclusive trends with respect to memory capacity; examine a characteristic more closely related to cell fabrication technology

Use DRAM chip density to examine technology scaling (closely related to fabrication technology)

[chart: relative server failure rate vs. chip density (1, 2, 4 Gb)]

Intuition: quadratic increase in capacity [chart: relative server failure rate vs. chip density (Gb)]

Error/failure occurrence We find that newer cell fabrication technologies have higher failure rates Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

DIMM architecture: chips per DIMM, transfer width; 8 to 48 chips; x4, x8 = 4 or 8 bits per cycle; electrical implications [figure: registered DIMM mechanical drawing]

DIMM architecture: does DIMM organization affect memory reliability? Chips per DIMM, transfer width; 8 to 48 chips; x4, x8 = 4 or 8 bits per cycle; electrical implications

[chart: relative server failure rate vs. data chips per DIMM (8 to 48), for x4 and x8 devices at 1 Gb, 2 Gb, and 4 Gb densities]

No consistent trend across chips per DIMM alone

More chips: higher failure rate

More bits per cycle: higher failure rate

Intuition: increased electrical loading

Workload dependence: prior studies used homogeneous workloads (web search and scientific); warehouse-scale data centers run web, hadoop, ingest, database, cache, media

Workload dependence: what effect do heterogeneous workloads have on reliability?

[charts: relative server failure rate vs. CPU utilization and memory utilization, for 1, 2, and 4 Gb densities]

No consistent trend across CPU/memory utilization

[chart: relative server failure rate by workload: Web, Hadoop, Ingest, Database, Memcache, Media]

Varies by up to 6.x across workloads

Error/failure occurrence Chips per DIMM, transfer width, and workload type (not necessarily CPU/ memory utilization) affect reliability Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

A model for server failure: uses a statistical regression model; compares control group vs. error group; linear regression in R; trained using data from the analysis; enables exploratory analysis (e.g., high-performance vs. low-power systems)

Memory error model [chart: relative server failure rate contribution by model factor: density, chips, age, ...]

Memory error model (CPUs = number of physical CPU cores in the server):

ln(F / (1 - F)) = β_Intercept + (Capacity)·β_Capacity + (Density2Gb)·β_Density2Gb + (Density4Gb)·β_Density4Gb + (Chips)·β_Chips + (CPU%)·β_CPU% + (Age)·β_Age + (CPUs)·β_CPUs

Available online http://www.ece.cmu.edu/~safari/tools/memerr/
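To make the model form concrete, here is a small sketch of evaluating it for a hypothetical server. The coefficient values below are placeholders I invented for illustration; the actual trained coefficients are what the paper and the tool page above provide.

```python
import math

# PLACEHOLDER coefficients, not the published ones.
BETA = {
    "Intercept":  -5.0,
    "Capacity":    0.01,   # per GB of DIMM capacity
    "Density2Gb":  0.2,    # indicator: DIMMs use 2 Gb chips
    "Density4Gb":  0.4,    # indicator: DIMMs use 4 Gb chips
    "Chips":       0.02,   # data chips per DIMM
    "CPU%":        0.001,  # average CPU utilization
    "Age":         0.05,   # server age
    "CPUs":        0.03,   # physical CPU cores
}

def failure_probability(server):
    """P(failure) from the logistic form ln(F / (1 - F)) = beta_0 + sum_i x_i * beta_i."""
    log_odds = BETA["Intercept"] + sum(value * BETA[name] for name, value in server.items())
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical high-end server configuration.
example_server = {"Capacity": 16, "Density2Gb": 0, "Density4Gb": 1,
                  "Chips": 32, "CPU%": 25, "Age": 1, "CPUs": 16}
print(f"predicted failure probability: {failure_probability(example_server):.3f}")
```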

Error/failure occurrence We have made publicly available a statistical model for assessing server memory reliability Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

Prior page offlining work: [Tang+, DSN'06] proposed the technique: "retire" faulty pages using the OS, i.e., do not allow software to allocate them; [Hwang+, ASPLOS'12] did a simulated evaluation on error traces from Google and IBM and recommended retirement on the first error, given the large number of cell/spurious errors

Prior page offlining work: how effective is page offlining in the wild? [Tang+, DSN'06]: "retire" faulty pages using the OS, do not allow software to allocate them; [Hwang+, ASPLOS'12]: simulated evaluation on error traces from Google and IBM, recommended retirement on first error (large number of cell/spurious errors)
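As a rough sketch of what OS-level page retirement can look like in practice (my assumption about the mechanism, not the one used in the study): Linux kernels built with memory-failure handling expose /sys/devices/system/memory/soft_offline_page, and writing a page's physical address to it asks the kernel to migrate the page's contents and stop allocating that page.

```python
# Hedged sketch of soft page offlining on Linux (requires root and a kernel
# with CONFIG_MEMORY_FAILURE; interface availability varies by kernel).
SOFT_OFFLINE = "/sys/devices/system/memory/soft_offline_page"

def soft_offline(phys_addr: int) -> bool:
    """Ask the kernel to soft-offline the page containing phys_addr."""
    try:
        with open(SOFT_OFFLINE, "w") as f:
            f.write(f"0x{phys_addr:x}")
        return True
    except OSError:
        # Offlining can fail, e.g., for kernel or pinned pages; the study
        # observes 6% of attempts failing due to the OS.
        return False

if __name__ == "__main__":
    # Hypothetical physical address taken from a correctable-error log entry.
    print("offlined" if soft_offline(0x12345000) else "offline attempt failed")
```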

Cluster of 12,276 servers

Cluster of 12,276 servers: -67% error rate

Cluster of 12,276 servers: -67% error rate; prior work: -86% to -94%

Cluster of 12,276 servers: -67% error rate; 6% of page offlining attempts failed due to the OS; prior work: -86% to -94%

Error/failure occurrence Page offlining at scale First large-scale study of page offlining; real-world limitations of the technique New reliability trends Technology scaling Modeling errors Architecture & workload

Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

More results in paper Vendors Age Processor cores Correlation analysis Memory model case study

Summary

Modern systems Large scale

Summary Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

Summary Error/failure occurrence Page offlining at scale New reliability trends Technology scaling Errors follow a power-law distribution and a large number of errors occur due to sockets/channels Modeling errors Architecture & workload

Summary Error/failure occurrence We find that newer cell fabrication technologies have higher failure rates Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

Summary Error/failure occurrence Chips per DIMM, transfer width, and workload type (not necessarily CPU/ memory utilization) affect reliability Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

Summary Error/failure occurrence We have made publicly available a statistical model for assessing server memory reliability Page offlining at scale New reliability trends Technology scaling Modeling errors Architecture & workload

Summary Error/failure occurrence Page offlining at scale First large-scale study of page offlining; real-world limitations of the technique New reliability trends Technology scaling Modeling errors Architecture & workload

Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field. Justin Meza, Qiang Wu, Sanjeev Kumar, Onur Mutlu

Backup slides

Decreasing hazard rate

[backup figures: distribution of errors bucketed by DIMM chip density (1 Gb, 2 Gb, 4 Gb), overlaid on registered-DIMM mechanical drawings; per-bucket error and server counts]

Case study

Case study

Factor (input)                             Low-end    High-end (HE)
Capacity                                   4 GB       16 GB
Density2Gb
Density4Gb
Chips                                      16         32
CPU%
Age
CPUs                                       8          16
Predicted relative failure rate (output)   0.2        0.78

Case study: does CPUs or density have a higher impact on the predicted failure rate? (same inputs and output as above)

Exploratory analysis

Factor (input)                             Low-end    High-end (HE)    HE with lower density    HE with fewer CPUs
Capacity                                   4 GB       16 GB            4 GB                     16 GB
Density2Gb
Density4Gb
Chips                                      16         32               16                       32
CPU%
Age
CPUs                                       8          16               16                       8
Predicted relative failure rate (output)   0.2        0.78