Amdahl's Law. Execution time new = ((1 f) + f/s) Execution time. S. Then:

Similar documents
Measurement & Performance

Measurement & Performance

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

Lecture 2: Metrics to Evaluate Systems

Computer Architecture

Performance of Computers. Performance of Computers. Defining Performance. Forecast

Lecture 2: CMOS technology. Energy-aware computing

Performance Metrics for Computer Systems. CASS 2018 Lavanya Ramapantulu

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

CMP 334: Seventh Class

CSE140L: Components and Design Techniques for Digital Systems Lab. Power Consumption in Digital Circuits. Pietro Mercati

Scheduling for Reduced CPU Energy

Goals for Performance Lecture

CIS 371 Computer Organization and Design

L16: Power Dissipation in Digital Systems. L16: Spring 2007 Introductory Digital Systems Laboratory

Power in Digital CMOS Circuits. Fruits of Scaling SpecInt 2000

CS 700: Quantitative Methods & Experimental Design in Computer Science

Lecture 3, Performance

Modern Computer Architecture

Lecture 3, Performance

Lecture 12: Energy and Power. James C. Hoe Department of ECE Carnegie Mellon University

Performance Metrics & Architectural Adaptivity. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

Dynamic Voltage and Frequency Scaling Under a Precise Energy Model Considering Variable and Fixed Components of the System Power Dissipation

Low power Architectures. Lecture #1:Introduction

EE115C Winter 2017 Digital Electronic Circuits. Lecture 6: Power Consumption

ECE 250 / CPS 250 Computer Architecture. Basics of Logic Design Boolean Algebra, Logic Gates

CSE140L: Components and Design Techniques for Digital Systems Lab. FSMs. Instructor: Mohsen Imani. Slides from Tajana Simunic Rosing

ICS 233 Computer Architecture & Assembly Language

ECE/CS 250 Computer Architecture

Lecture: Pipelining Basics

CMP 338: Third Class

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

ENERGY AND TIME CONSTANTS IN RC CIRCUITS By: Iwana Loveu Student No Lab Section: 0003 Date: February 8, 2004

EECS 427 Lecture 11: Power and Energy Reading: EECS 427 F09 Lecture Reminders

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements.

Where Does Power Go in CMOS?

Lecture 13: Sequential Circuits, FSM

Last Lecture. Power Dissipation CMOS Scaling. EECS 141 S02 Lecture 8

CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues

In 1980, the yield = 48% and the Die Area = 0.16 from figure In 1992, the yield = 48% and the Die Area = 0.97 from figure 1.31.

TU Wien. Energy Efficiency. H. Kopetz 26/11/2009 Model of a Real-Time System

Lecture 21: Packaging, Power, & Clock

Stochastic Dynamic Thermal Management: A Markovian Decision-based Approach. Hwisung Jung, Massoud Pedram

Lecture 7 Circuit Delay, Area and Power

CprE 281: Digital Logic

Introduction The Nature of High-Performance Computation

CARNEGIE MELLON UNIVERSITY DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING DIGITAL INTEGRATED CIRCUITS FALL 2002

Digital System Clocking: High-Performance and Low-Power Aspects. Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M.

VLSI Design I; A. Milenkovic 1

CHAPTER 20 ELECTRIC CIRCUITS

Dynamic operation 20

CSE493/593. Designing for Low Power

Introduction to CMOS VLSI Design (E158) Lecture 20: Low Power Design

Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM

Lecture 6 Power Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010

Designing Information Devices and Systems I Fall 2015 Anant Sahai, Ali Niknejad Homework 8. This homework is due October 26, 2015, at Noon.

Reducing power in using different technologies using FSM architecture

Lecture 34: Portable Systems Technology Background Professor Randy H. Katz Computer Science 252 Fall 1995

Optimal Voltage Allocation Techniques for Dynamically Variable Voltage Processors

Saving Power in the Data Center. Marios Papaefthymiou

Microprocessor Power Analysis by Labeled Simulation

Mark Redekopp, All rights reserved. Lecture 1 Slides. Intro Number Systems Logic Functions

CIRCUITS AND ELECTRONICS. Introduction and Lumped Circuit Abstraction

Series and Parallel. How we wire the world

Lecture 13: Sequential Circuits, FSM

Summarizing Measured Data

Power Dissipation. Where Does Power Go in CMOS?

EEC 216 Lecture #2: Metrics and Logic Level Power Estimation. Rajeevan Amirtharajah University of California, Davis

EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis

Digital Electronics Part II - Circuits

INTEGRATED CIRCUITS. For a complete data sheet, please also download:

Sequential Logic (3.1 and is a long difficult section you really should read!)

6.002x CIRCUITS AND ELECTRONICS. Introduction and Lumped Circuit Abstraction

EECS 141: FALL 05 MIDTERM 1

Introduction to Computer Engineering. CS/ECE 252, Fall 2012 Prof. Guri Sohi Computer Sciences Department University of Wisconsin Madison

BeiHang Short Course, Part 7: HW Acceleration: It s about Performance, Energy and Power

Chapter 8. Capacitors. Charging a capacitor

EE115C Winter 2017 Digital Electronic Circuits. Lecture 19: Timing Analysis

ASIC FPGA Chip hip Design Pow Po e w r e Di ssipation ssipa Mahdi Shabany

INTEGRATED CIRCUITS. For a complete data sheet, please also download:

ECE/CS 250: Computer Architecture. Basics of Logic Design: Boolean Algebra, Logic Gates. Benjamin Lee

LECTURE 28. Analyzing digital computation at a very low level! The Latch Pipelined Datapath Control Signals Concept of State

Fig. 1 CMOS Transistor Circuits (a) Inverter Out = NOT In, (b) NOR-gate C = NOT (A or B)

Answer Key. Chapter 23. c. What is the current through each resistor?

Question 1. Question 2. Question 3

Optimizing Energy Consumption under Flow and Stretch Constraints

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Sciences

Coulomb s constant k = 9x10 9 N m 2 /C 2

ALU A functional unit

Intro To Digital Logic

A capacitor is a device that stores electric charge (memory devices). A capacitor is a device that stores energy E = Q2 2C = CV 2

Scaling of MOS Circuits. 4. International Technology Roadmap for Semiconductors (ITRS) 6. Scaling factors for device parameters

The Linear-Feedback Shift Register

INTEGRATED CIRCUITS. For a complete data sheet, please also download:

Electrical Circuits. Winchester College Physics. makptb. c D. Common Time man. 3rd year Revision Test

8. Electric Currents

Chapter 9. Estimating circuit speed. 9.1 Counting gate delays

Lecture 13 - Digital Circuits (II) MOS Inverter Circuits. March 20, 2003

1.7 Digital Logic Inverters

Lecture 5: Performance (Sequential) James C. Hoe Department of ECE Carnegie Mellon University

Transcription:

Amdahl's Law Useful for evaluating the impact of a change. (A general observation.) Insight: Improving a feature cannot improve performance beyond the use of the feature Suppose we introduce a particular enhancement that improves fraction f of execution time by factor S. Then: Execution time new = ((1 f) + f/s) Execution time

Example Task: complete EE/CS 314 project Walk to lab: 15 minutes Work in lab: 600 minutes Food: 45 minutes Walk home: 10 minutes Total time = 670 minutes. Skip meals! And sleep in the lab! :-) 10.4% improvement Observation: doesn't help with the time spent working.

Other Performance Metrics Commonly used: MIPS (Millions of instructions per second) MIPS = Instruction count Execution time 10 6 Problems with MIPS as a metric: Instructions with different capabilities? Programs with different instruction mixes? Might not predict which machine is faster!

MIPS As A Performance Metric Consider an optimized and unoptimized version of a program: Memory ALU Branch Total Instrs. Instrs. Instrs. Instrs. Unopt 200 M 150 M 50 M 400 M Optimized 100 M 100 M 50 M 250 M Memory ALU Branch CPI cycles (2x) cycles (1x) cycles (3x) Unopt 400 M 150 M 150 M 1.75 Optimized 200 M 100 M 150 M 1.8

MIPS As A Performance Metric Assume a 750MHz Clock: MIPS (Unopt): 750/1.75 = 428.6 MIPS (opt): 750/1.8 = 416.7 Execution time: Unopt: CP I IC 1/CR = 1.75 400/750 = 0.93 secs Opt: CP I IC 1/CR = 1.8 250/750 = 0.6 secs MIPS doesn't predict which program is faster! (why?)

What Is "Typical"? Source of serious debate Instruction mix can signi cantly impact performance Benchmarks for performance Started with simple toy programs (quicksort, sieve, etc) Moved to synthetic benchmarks (dhrystone, whetstone) Kernels (time critical parts of real programs) Today: SPEC (collection of standard programs)

Summarizing Performance Given the execution time of a number of programs, can we come up with one gure of merit? Arithmetic mean: Average time = 1 n n i=1 Execution time i Assumes each benchmark is equally important; long-running benchmarks dominate. If we know workload, we can weight execution times: Weighted average time = n i=1 W i Execution time i n i=1 W i

Normalized Performance Pick weights to be 1/(time on a reference machine) What if we attempt a summary using normalized performance? Consider two machines A and B: P1 P2 Relative to A Relative to B secs secs P1 P2 AM P1 P2 AM A 10 50 1 1 1 0.1 50 25 B 100 1 10 0.02 5 1 1 1 Total execution time on A: 60 secs Total execution time on B: 101 secs If equal runs of programs P1 and P2, A is 1.7 times faster than B.

Geometric Mean Can be used to combine relative measures. Geometric Mean = n n R i i=1 P1 P2 Relative to A Relative to B secs secs P1 P2 GM P1 P2 GM A 10 50 1 1 1 0.1 50 2.2 B 100 1 10 0.02 0.45 1 1 1 Some unfortunate properties: does not track execution time! all improvements are equal

Harmonic Mean Used to average rates like MIPS or MFLOPS. Harmonic Mean = n n i=1 (1/R i ) Another way to look at it: 1 Harmonic Mean = n i=1 (1/R i ) n i.e. convert rate to time/task, average, convert back to rate. Also can be weighted like arithmetic mean.

Performance Estimation Time is the measure of performance! Recall how we derived the CPU performance equation. Typically, tradeoff between performance and cost. cost: parts, labor for assembly, etc. New metric: power. portable devices want to extend battery life

Power Where does power dissipation come from? Underlying implementation: transistor networks We can approximate these as RC circuits Power dissipated: CV 2, C capacitance, V voltage. (Resistors are dissipative) Power dissipated when the voltage on a wire changes. (Quick recap: I = V R e t/rc, E = CV 2 ) time 1/V, energy V 2 power V 3

Power Total power: Alpha 21164: 50W Alpha 21264: 72W 550 MHz Pentium III: 30.8W 300 MHz Mobile Pentium II: 9W 160 MHz StrongARM: 0.5W Sources of power consumption: Inputs to combinational logic, Šip-Šops Signals within combinational logic Clock keeps changing!

Power Minimize power consumption: Avoid spurious changes of signal values Turn off clock when part of the circuit not in use clock gating Modes of operation: turn off large parts of the chip Slow down clock in idle mode Reduce voltage in idle mode Only activate part of circuit used for computation? Eliminate clocks?