PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

Similar documents
Lecture 2: Metrics to Evaluate Systems

Computer Architecture

CMP 338: Third Class

Modern Computer Architecture

Amdahl's Law. Execution time new = ((1 f) + f/s) Execution time. S. Then:

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

Chapter 1. Binary Systems 1-1. Outline. ! Introductions. ! Number Base Conversions. ! Binary Arithmetic. ! Binary Codes. ! Binary Elements 1-2

Performance Metrics for Computer Systems. CASS 2018 Lavanya Ramapantulu

Lecture: Pipelining Basics

Performance of Computers. Performance of Computers. Defining Performance. Forecast

CMP 334: Seventh Class

Measurement & Performance

Measurement & Performance

ECE321 Electronics I

CS 700: Quantitative Methods & Experimental Design in Computer Science

Lecture 12: Energy and Power. James C. Hoe Department of ECE Carnegie Mellon University

Goals for Performance Lecture

Performance Metrics & Architectural Adaptivity. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

Performance, Power & Energy

ALU A functional unit

Lecture 13: Sequential Circuits, FSM

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

! Charge Leakage/Charge Sharing. " Domino Logic Design Considerations. ! Logic Comparisons. ! Memory. " Classification. " ROM Memories.

Low power Architectures. Lecture #1:Introduction

TU Wien. Energy Efficiency. H. Kopetz 26/11/2009 Model of a Real-Time System

In 1980, the yield = 48% and the Die Area = 0.16 from figure In 1992, the yield = 48% and the Die Area = 0.97 from figure 1.31.

Lecture 2: CMOS technology. Energy-aware computing

EE115C Winter 2017 Digital Electronic Circuits. Lecture 6: Power Consumption

Lecture 3, Performance

Lecture 3, Performance

CPU Consolidation versus Dynamic Voltage and Frequency Scaling in a Virtualized Multi-Core Server: Which is More Effective and When

BeiHang Short Course, Part 7: HW Acceleration: It s about Performance, Energy and Power

Design for Manufacturability and Power Estimation. Physical issues verification (DSM)

Vectorized 128-bit Input FP16/FP32/ FP64 Floating-Point Multiplier

CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues

ww.padasalai.net

MICROPROCESSOR REPORT. THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE

CIS 371 Computer Organization and Design

CSE140L: Components and Design Techniques for Digital Systems Lab. Power Consumption in Digital Circuits. Pietro Mercati

Where Does Power Go in CMOS?

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)

CS470: Computer Architecture. AMD Quad Core

Computation Offloading Strategy Optimization with Multiple Heterogeneous Servers in Mobile Edge Computing

Convolution Algorithm

Administrative Stuff

Welcome to MCS 572. content and organization expectations of the course. definition and classification

Chapter 7. Sequential Circuits Registers, Counters, RAM

Tradeoff between Reliability and Power Management

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Lecture 15: Scaling & Economics

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Lecture 5: Performance (Sequential) James C. Hoe Department of ECE Carnegie Mellon University

Moores Law for DRAM. 2x increase in capacity every 18 months 2006: 4GB

EE241 - Spring 2001 Advanced Digital Integrated Circuits

Serial Parallel Multiplier Design in Quantum-dot Cellular Automata

EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis

Last Lecture. Power Dissipation CMOS Scaling. EECS 141 S02 Lecture 8

L16: Power Dissipation in Digital Systems. L16: Spring 2007 Introductory Digital Systems Laboratory

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

Lecture 6 Power Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Implications on the Design

Power in Digital CMOS Circuits. Fruits of Scaling SpecInt 2000

Outline. policies for the first part. with some potential answers... MCS 260 Lecture 10.0 Introduction to Computer Science Jan Verschelde, 9 July 2014

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

ECE 3401 Lecture 23. Pipeline Design. State Table for 2-Cycle Instructions. Control Unit. ISA: Instruction Specifications (for reference)

NEC PerforCache. Influence on M-Series Disk Array Behavior and Performance. Version 1.0

Introduction to CMOS VLSI Design Lecture 1: Introduction

Chapter Overview. Memory Classification. Memory Architectures. The Memory Core. Periphery. Reliability. Memory

Lecture 8-1. Low Power Design

EE241 - Spring 2000 Advanced Digital Integrated Circuits. Announcements

Stochastic Dynamic Thermal Management: A Markovian Decision-based Approach. Hwisung Jung, Massoud Pedram

CMPE12 - Notes chapter 1. Digital Logic. (Textbook Chapter 3)

Qualitative vs Quantitative metrics

Anton: A Specialized ASIC for Molecular Dynamics

EN2912C: Future Directions in Computing Lecture 08: Overview of Near-Term Emerging Computing Technologies

CSE140L: Components and Design Techniques for Digital Systems Lab. FSMs. Instructor: Mohsen Imani. Slides from Tajana Simunic Rosing

LECTURE 28. Analyzing digital computation at a very low level! The Latch Pipelined Datapath Control Signals Concept of State

Microprocessor Power Analysis by Labeled Simulation

Mark Redekopp, All rights reserved. Lecture 1 Slides. Intro Number Systems Logic Functions

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements.

Lecture 5: Performance and Efficiency. James C. Hoe Department of ECE Carnegie Mellon University

Introduction to Computer Engineering. CS/ECE 252, Fall 2012 Prof. Guri Sohi Computer Sciences Department University of Wisconsin Madison

I. INTRODUCTION. CMOS Technology: An Introduction to QCA Technology As an. T. Srinivasa Padmaja, C. M. Sri Priya

Lecture 13: Sequential Circuits, FSM

Technical Report GIT-CERCS. Thermal Field Management for Many-core Processors

Digital Integrated Circuits A Design Perspective. Semiconductor. Memories. Memories

Scaling of MOS Circuits. 4. International Technology Roadmap for Semiconductors (ITRS) 6. Scaling factors for device parameters

Today. ESE532: System-on-a-Chip Architecture. Energy. Message. Preclass Challenge: Power. Energy Today s bottleneck What drives Efficiency of

ICS 233 Computer Architecture & Assembly Language

3. (2) What is the difference between fixed and hybrid instructions?

Administrivia. Course Objectives. Overview. Lecture Notes Week markem/cs333/ 2. Staff. 3. Prerequisites. 4. Grading. 1. Theory and application

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING QUESTION BANK

Digital Integrated Circuits Designing Combinational Logic Circuits. Fuyuzhuo

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University

Mitigating Semiconductor Hotspots

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Digital Integrated Circuits A Design Perspective

Lecture 34: Portable Systems Technology Background Professor Randy H. Katz Computer Science 252 Fall 1995

CprE 281: Digital Logic

Transcription:

PERFORMANCE METRICS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture

Overview Announcement Jan. 17 th : Homework 1 release (due on Jan. 30 th ) This lecture Technology trends Measuring performance Principles of computer design Power and energy Cost and reliability

Technology Trends (Historical Data) IC logic Technology: on-chip transistor count doubles every 18-24 months (Moore s Law) Transistor density increases by 35% per year Die size increases 10-20% per year DRAM Technology Chip capacity increases 25-40% per year Flash Storage Chip capacity increases 50-60% per year

Technology Trends (Historical Data) Recent Microprocessor Trends Transistor count (1.43x/yr) Core count (1.2-1.43x/yr) Performance (1.15x/yr) Frequency (1.05x/yr) Power (1.04x/yr) 2004 2010 Source: Micron University Symposium

Performance Trends How to measure performance? Latency or response time: the time between start and completion of an event (e.g., milliseconds for disk access) Bandwidth or throughput: the total amount of work done in a given time (e.g., megabytes per second for disk transfer) Which one grows faster? Bandwidth, by at least the square of latency improvement rate. Which one is better? latency or throughput?

Measuring Performance Which one is better (faster)? Car Bus Delay=10m Delay=30m Capacity=4p Capacity=30p Throughput=0.4PPM Throughput=1PPM It really depends on your needs (goals).

Measuring Performance What program to use for measuring performance? Benchmarks Suites A set of representative programs that are likely relevant to the user Examples: n SPEC CPU 2006: CPU-oriented programs (for desktops) n SPECweb: throughput-oriented (for servers) n EEMBC: embedded processors/workloads

Summarizing Performance Numbers How to capture the behavior of multiple programs with a single number Comp-A Comp-B Comp-C Prog-1 10 5 25 Prog-2 5 10 20 Prog-3 25 10 25 AM: Arithmetic Mean (good for times and latencies)

Summarizing Performance Numbers How to capture the behavior of multiple programs with a single number Comp-A Comp-B Comp-C Prog-1 1/10 1/5 1/25 Prog-2 1/5 1/10 1/20 Prog-3 1/25 1/10 1/25 HM: Harmonic Mean (good for rates and throughput)

Summarizing Performance Numbers How to capture the behavior of multiple programs with a single number Comp-A Comp-B Comp-C Prog-1 10/10 10/5 10/25 Prog-2 5/5 5/10 5/20 Prog-3 25/25 25/10 25/25 GM: Geometric Mean (good for speedups)

The Processor Performance Clock cycle time (CT = 1/clock frequency) Influenced by technology and pipeline Cycles per instruction (CPI) Influenced by architecture IPC may be used instead (IPC = 1/CPI) Instruction count (IC) Influenced by ISA and compiler CPU time = IC x CPI x CT

Example Problem Find the average CPI of a load/store machine when running an application that results in the following statistics Instruction Type Frequency Cycles Load 20% 2 Store 20% 2 Branch 20% 2 ALU 40% 1 CPI = 0.2x2 + 0.2x2 + 0.2x2 + 0.4x1 = 1.6

Example Problem Find the average CPI of a load/store machine when running an application that results in the following statistics Instruction Type Frequency Cycles Load 20% 2 Store 20% 2 Branch 20% 2 ALU 40% 1 50% of the branches can be combined with ALU instructions and executed as Branch-ALU fused in 2 cycles. What is the new average CPI?

Example Problem Find the average CPI of a load/store machine when running an application that results in the following statistics Instruction Type Frequency Cycles Load 22% 2 Store 22% 2 Branch 11% 2 ALU 33% 1 Branch-ALU 12% 2 80% of the branches can be combined with ALU instructions and executed as Branch-ALU fused in 2 cycles. What is the new average CPI? CPI = 1.67

The Processor Performance Points to note Performance = 1 / execution time AM(IPCs) = 1 / HM(CPIs) GM(IPCs) = 1 / GM(CPIs)

Speedup vs. Percentage Speedup = old execution time / new execution time Improvement = (new performance - old performance)/old performance My old and new computers run a particular program in 80 and 60 seconds; compute the followings speedup = 80/60 percentage increase in performance reduction in execution time = 33% = 20/80 = 25%

Example Problem A new computer has an IPC that is 20% worse than the old one. However, it has a clock speed that is 30% higher than the old one. If running the same binaries on both machines. What speedup is the new computer providing? Speedup = 1/0.96 = 1.04 OLD NEW IPC 1 0.8 Frequency 1 1.3 IC 1 1 CPI 1/1 1/0.8 = 1.25 CT 1/1 1/1.3 ~ 0.77 CPU Time 1 ~0.96

Principles of Computer Design Designing better computer systems requires better utilization of resources Parallelism n Multiple units for executing partial or complete tasks Principle of locality (temporal and spatial) n Reuse data and functional units Common Case n Use additional resources to improve the common case

Amdahl s Law The law of diminishing returns

Example Problem Our new processor is 10x faster on computation than the original processor. Assuming that the original processor is busy with computation 40% of the time and is waiting for IO 60% of the time, what is the overall speedup? f=0.4 s=10 Speedup = 1 / (0.6 + 0.4/10) = 1/0.64 = 1.5625

Power and Energy

Power and Energy Power = Voltage x Current (P = VI) Instantaneous rate of energy transfer (Watt) Energy = Power x Time (E = PT) The cost of performing a task (Joule)

Power and Energy Power = Voltage x Current (P = VI) Instantaneous rate of energy transfer (Watt) Energy = Power x Time (E = PT) The cost of performing a task (Joule) Peak Power = 3W Average Power = 1.66W Total Energy = 5J

CPU Power and Energy All consumed energy is converted to heat CPU power is the rate of heat generation Excessive peak power may result in burning the chip Static and dynamic energy components n Energy = (Power Static + Power Dynamic ) x Time n Power Static = Voltage x Current Static n Power Dynamic = Activity x Capacitance x Voltage 2 x Frequency

Power Reduction Techniques Reducing capacitance (C) Requires changes to physical layout and technology Reducing voltage (V) Negative effect on frequency Opportunistically power gating (wakeup time) Dynamic voltage and frequency scaling Reducing frequency (f) Negative effect on CPU time Clock gating in unused resources Points to note Utilization directly effects dynamic power Lowering power does NOT mean lowering energy

Example Problem For a processor running at 100% utilization and consuming 60W, 30% of the power is attributed to leakage. What is the total power dissipation when the processor is running at 50% utilization?

Example Problem For a processor running at 100% utilization and consuming 60W, 30% of the power is attributed to leakage. What is the total power dissipation when the processor is running at 50% utilization? @100% Power = 18W + 42W = 60W @50% Power = 18W + 21W = 39W

Example Problem A processor consumes 80W of dynamic power and 20W of static power at 3GHz. It completes a program in 20 seconds. What is the energy consumption if frequency scales down by 20%?

Example Problem A processor consumes 80W of dynamic power and 20W of static power at 3GHz. It completes a program in 20 seconds. What is the energy consumption if frequency scales down by 20%? @3GHz Energy = (80W + 20W) x 20s = 2000J @2.4GHz Energy = (0.8x80W + 20W) x 20/0.8 = 2100J

Example Problem A processor consumes 80W of dynamic power and 20W of static power at 3GHz. It completes a program in 20 seconds. What is the energy consumption if frequency scales down by 20%? What is the energy consumption if voltage and frequency scale down by 20%?

Example Problem A processor consumes 80W of dynamic power and 20W of static power at 3GHz. It completes a program in 20 seconds. What is the energy consumption if frequency scales down by 20%? What is the energy consumption if voltage and frequency scale down by 20%? @ 80%V and 80%f Energy = (80x0.8 2 x0.8+20x0.8) x 20/0.8 = 1424J

Cost and Reliability

Cost of Integrated Circuit Cost of die!"#$% '()* +,$) -$%!"#$% +,$ /,$0+ Example wafer Yield of die!"#$% /,$0+ (23+$#$'* -$% 45,* "%$" +,$ "%$") 7 N: process-complexity factor Die n Specified by chip manufacturer

Example Problem Defect rate for a 144mm 2 die is 0.5 per cm 2. Assuming that we use a 40nm technology node (N=11) with 100% wafer yield, find the die yield.

Example Problem Defect rate for a 144mm 2 die is 0.5 per cm 2. Assuming that we use a 40nm technology node (N=11) with 100% wafer yield, find the die yield. Die yield = 1/(1 + 0.5x1.44) 11

Dependability A measure of system's reliability and availability System reliability n A measure of continuous service (time-to-failure) n Mean Time To Failure (MTTF) n Mean Time To Repair (MTTR) System availability

Dependability A measure of system's reliability and availability System reliability n A measure of continuous service (time-to-failure) n Mean Time To Failure (MTTF) = (3+2+1)/3 = 2 n Mean Time To Repair (MTTR) = (0.75+1+1.25)/3 = 1 System availability MTTF MTTF + MTTR = 2/(2+1) = 0.67