BeiHang Short Course, Part 7: HW Acceleration: It s about Performance, Energy and Power

Size: px
Start display at page:

Download "BeiHang Short Course, Part 7: HW Acceleration: It s about Performance, Energy and Power"

Transcription

1 BeiHang Short Course, Part 7: HW Acceleration: It s about Performance, Energy and Power James C. Hoe Department of ECE Carnegie Mellon niversity Eric S. Chung, et al., Single chip Heterogeneous Computing: Does the future include Custom Logic, FPGAs, and GPs? Proc. ACM/IEEE International Symposium on Microarchitecture (MICRO), pp 53 64, December J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s1 Moore s Law The number of transistors that can be economically integrated shall double every 24 months J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s2 [ 1

2 The Real Moore s Law J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s3 [Cramming More Components Onto Integrated Circuits, G. E. Moore, 1965] Transistor Scaling gate length J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s4 [ commercial products are at 20~14nm today distance between silicon atoms ~ 500 pm 2

3 Basic Scaling Theory Planned scaling occurs in nodes ~0.7x of the previous e.g., 90nm, 65nm, 45nm, 32nm, 20nm, 14nm, The End Take the same design, reducing the linear dimensions by 0.7x (aka gate shrink ) leads to (ideally) die area = 0.5x delay = 0.7x, frequency=1.43x capacitance = 0.7x Vdd=0.7x (if constant field) or Vdd=1x (if constant voltage) power = C V 2 f = 0.5x (if constant field) BT power = 1x (if constant voltage) Take the same area, then transistor count = 2x power = 1x (constant field), power = 2x (constant voltage) J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s5 Moore s Law Performance According to scaling theory, we should complexity: 1x transistors at 1.43x frequency 1.43x performance at 0.5x complexity: 2x transistors at 1.43x frequency 2.8x performance at constant power Instead, we have been getting (for high perf CPs) ~2x transistors ~2x frequency (note: faster than scaling would give us) all together we get about ~2x performance at ~2x power J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s6 3

4 Performance (In)efficiency To hit the expected performance target on single thread microprocessors we had been pushing frequency harder by deepening pipelines we used the 2x transistors to build more complicated microarchitectures so the fast/deep pipelines don t stall (i.e., caches, BP, superscalar, out of order) The consequence of performance inefficiency is limit of economical PC cooling [ITRS] IntelP4 Tehas 150W (Guess what happened next) J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s7 [from Shekhar Borkar, IEEE Micro, July 1999] How to run faster on less power? technology normalized power (Watt) Better to replace 1 of this by 2 of these; Or N of these Pentium 4 Power Perf Energy per Instruction Trends in Intel Microprocessors, Grochowski et al., 2006 J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s8 technology normalized performance (op/sec) 4

5 Multis Big Core 1970~ ~?? J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s9 So what is the problem? We (actually Intel, et al.) know how to pack more s on a die to stay on Moore s law in terms of aggregate or throughput performance How to use them? life is good if your N units of work is N independent programs just run them what if your N units of work is N operations of the same program? rewrite a parallel program what if your N units of work is N sequentially dependent operations of the same program??? How many s can you use up meaningfully? J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s10 5

6 Future is about Perf/Watt and Op/Joule By 2022 (11nm node), a large die (~500mm 2 ) will have over 10 billion transistors Big Core What will you Custom choose Logic to put on it? GPGP FPGA J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s11 Amdahl s Law on Speedup A program is rarely completely parallelizable. Let s say a fraction f can be sped up by a factor of p and the rest can not time baseline (1 f) f time accellerated (1 f) f/p If f is small p doesn t matter. An architect also cannot ignore the sequential performance of the (1 f) portion J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s12 6

7 Asymmetric Multis Fast Core Base Core Equivalent () in [Hill and Marty, 2008] J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s13 Combine a power hungry fast with lots of simple, power efficient slow s fast for sequential code slow s for parallel sections [Hill and Marty, 2008] 1 Speedup 1 f f perf seq ( n r) perf seq solve for optimal ldie area allocation depending on parallelizability f But, up to 95% of energy of von Neuman processors is overhead [Hameed, et al., ISCA 2010] Modeling Heterogeneous Multis Asymmetric Fast Core Base Core Equivalent Heterogeneous Speedup 1 f perf 1 f ( n r) seq perf seq [Hill and Marty, 2008] simplified f is fraction parallelizable n is total die area in units r is fast area in units perf seq (r) is fast perf. relative to Fast Core J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s14 Speedup 1- f perf seq 1 f ( n - r) For the sake of analysis, break the area for GP/FPGA/etc. into units of s that are the same size as s. Each type is characterized by a relative performance µ and relative power compared to a 7

8 Case Study [Chung, MICRO 2010] CP GPs FPGA ASIC Intel Core i7 960 Nvidia GTX285 Nvidia GTX480 ATI R5870 Xilinx V6 LX760 Std. Cell Year Node 45nm 55nm 40nm 40nm 40nm 65nm Die area 263mm 2 470mm 2 529mm 2 334mm 2 Clock rate 3.2GHz 1.5GHz 1.4GHz 1.5GHz 0.3GHz Single prec. floating point apps M M Mult FFT Black Scholes MKL multithreaded Spiral.net multithreaded J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s15 CBLAS 2.3 CFFT /3.1 PARSEC multithreaded CDA 2.3 CBLAS 3.1 CAL++ hand coded CFFT 3.0 Spiral.net hand coded Best Case Performance and Energy MMM Device GFLOP/s actual (GFLOP/s)/mm 2 normalized to 40nm GFLOP/J normalized to 40nm Intel Core i7 (45nm) Nvidia GTX285 (55nm) Nvidia GTX480 (40nm) ATI R5870 (40nm) Xilinx V6 LX760 (40nm) same RTL std cell (65nm) CP and GP benchmarking is compute bound; FPGA and Std Cell effectively compute bound (no off chip I/O) Power (switching+leakage) measurements isolated the from the system For detail see [Chung, et al. MICRO 2010] J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s16 8

9 Less Regular Applications GFLOP/s (GFLOP/s)/mm 2 GFLOP/J FFT 2 10 Intel Core i7 (45nm) Nvidia GTX285 (55nm) Nvidia GTX480 (40nm) ATI R5870 (40nm) Xilinx V6 LX760 (40nm) same RTL std cell (65nm) Black Scholes Mopt/s (Mopts/s)/mm 2 Mopts/J Intel lcore i7 (45nm) Nvidia GTX285 (55nm) Nvidia GTX480 (40nm) ATI R5870 (40nm) Xilinx V6 LX760 (40nm) same RTL std cell (65nm) J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s17 and μ example values MMM Black Scholes FFT 2 10 Nvidia Φ GTX285 μ Nvidia GTX480 Φ μ On equal area basis, 3.41x performance ATI Φ 1.27 R5870 μ 8.47 at 0.74x power Xilinx Φ 0.31 relative 0.26 a 0.29 LX760 μ Custom Logic Φ μ Nominal based on an Intel Atom in order processor, 26mm 2 in a 45nm process J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s18 9

10 Modeling Power and Bandwidth Budgets Heterogeneous Fast Core Speedup 1 1- f f perf seq ( n - r) The above is based on area alone Power or bandwidth budget limits the usable die area if P is total power budget expressed as a multiple of a s power, then usable area n r P if B is total memory bandwidth expressed also as a multiple of s, then usable area n r B μ J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s19 Combine Model with ITRS Trends Year Technology 40nm 32nm 22nm 16nm 11nm Core die budget (mm 2 ) Normalized area () (16x) Core power (W) Bandwidth (GB/s) (1.4x) Rel pwr per device 1X 0.75X 0.5X 0.36X 0.25X Fast Core 2011 parameters reflect high end h systems today; future parameters extrapolated from ITRS mm 2 populated by an optimally sized Fast Core and s of choice J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s20 10

11 Single Prec. MMMult (f=90%) Speedup ASIC GP R5870 FPGA LX760 Asymmetric multi Symmetric multi f= nm 32nm 22nm 16nm 11nm (0) SymCMP SymMC (2) ASIC GP (5) R5870 (1) AsymCMP AsymMC FPGA (3) LX760 Power Bound Mem Bound J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s21 Single Prec. MMMult (f=99%) 300 ASIC 250 Speedup nm 32nm 22nm 16nm 11nm (0) SymCMP SymMC (1) AsymCMP AsymMC (2) ASIC FPGA (3) LX760 GP (5) R5870 GP R5870 FPGA LX760 Sym f=0.990 & Asym multi Power Bound Mem Bound J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s22 11

12 Single Prec. MMMult (f=50%) Speedup 8 ASIC/GP/FPGA 7 Asymmetric Symmetric 1 f= nm 32nm 22nm 16nm 11nm (0) SymCMP SymMC (2) ASIC ASIC GP (5) R5870 (1) AsymCMP AsymMC FPGA (3) LX760 Power Bound Mem Bound J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s23 Single Prec. FFT 1024 (f=99%) 60 ASIC/GP/FPGA 50 Speedup Asymmetric Symmetric 10 f= nm 32nm 22nm 16nm 11nm (0) SymCMP SymMC (2) ASIC (4) GP GTX480 R5870 (1) AsymCMP AsymMC FPGA (3) LX760 Power Bound Mem Bound J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s24 12

13 FFT 1024 (f=99%) if hypothetical 1TB/sec bandwidth 200 ASIC Speedup FPGA LX760 GP GTX nm 32nm 22nm 16nm 11nm (0) SymCMP SymMC (1) AsymCMP AsymMC (2) ASIC ASIC FPGA (3) LX760 (4) GP GTX480 R5870 f=0.990 Sym & Asym multi Power Bound Mem Bound J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s25 Recap the Last Hour Perf per Watt and op per Joule matter need to look beyond programmable processors GPs and FPGAs are viable programmable candidates GP/FPGA/custom logic help performance only if significant fraction amenable to acceleration, and adequate bandwidth to sustain acceleration 3D stacked memory could be very helpful! Without adequate bandwidth, GP/FPGA catches up with custom logic in achievable performance but remains programmable and flexible Custom logic best for maximizing performance under tight energy/power constraints J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s26 13

14 Computer Architecture Lab (CALCM) Carnegie Mellon niversity J. C. Hoe, CM/ECE/CALCM, 2014, BHSC L7 s27 14

Lecture 27: Hardware Acceleration. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 27: Hardware Acceleration. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 27: Hardware Acceleration James C. Hoe Department of ECE Carnegie Mellon niversity 18 447 S18 L27 S1, James C. Hoe, CM/ECE/CALCM, 2018 18 447 S18 L27 S2, James C. Hoe, CM/ECE/CALCM, 2018

More information

Lecture 12: Energy and Power. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 12: Energy and Power. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 12: Energy and Power James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L12 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today a working understanding of

More information

MICROPROCESSOR REPORT. THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE

MICROPROCESSOR REPORT.   THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE MICROPROCESSOR www.mpronline.com REPORT THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE ENERGY COROLLARIES TO AMDAHL S LAW Analyzing the Interactions Between Parallel Execution and Energy Consumption By

More information

L16: Power Dissipation in Digital Systems. L16: Spring 2007 Introductory Digital Systems Laboratory

L16: Power Dissipation in Digital Systems. L16: Spring 2007 Introductory Digital Systems Laboratory L16: Power Dissipation in Digital Systems 1 Problem #1: Power Dissipation/Heat Power (Watts) 100000 10000 1000 100 10 1 0.1 4004 80088080 8085 808686 386 486 Pentium proc 18KW 5KW 1.5KW 500W 1971 1974

More information

Lecture 5: Performance and Efficiency. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 5: Performance and Efficiency. James C. Hoe Department of ECE Carnegie Mellon University 18 643 Lecture 5: Performance and Efficiency James C. Hoe Department of ECE Carnegie Mellon University 18 643 F17 L05 S1, James C. Hoe, CMU/ECE/CALCM, 2017 18 643 F17 L05 S2, James C. Hoe, CMU/ECE/CALCM,

More information

CIS 371 Computer Organization and Design

CIS 371 Computer Organization and Design CIS 371 Computer Organization and Design Unit 13: Power & Energy Slides developed by Milo Mar0n & Amir Roth at the University of Pennsylvania with sources that included University of Wisconsin slides by

More information

Lecture 23: Illusiveness of Parallel Performance. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 23: Illusiveness of Parallel Performance. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 23: Illusiveness of Parallel Performance James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L23 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Your goal today Housekeeping peel

More information

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah PERFORMANCE METRICS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Jan. 17 th : Homework 1 release (due on Jan.

More information

Today. ESE532: System-on-a-Chip Architecture. Energy. Message. Preclass Challenge: Power. Energy Today s bottleneck What drives Efficiency of

Today. ESE532: System-on-a-Chip Architecture. Energy. Message. Preclass Challenge: Power. Energy Today s bottleneck What drives Efficiency of ESE532: System-on-a-Chip Architecture Day 20: November 8, 2017 Energy Today Energy Today s bottleneck What drives Efficiency of Processors, FPGAs, accelerators How does parallelism impact energy? 1 2 Message

More information

Lecture 2: Metrics to Evaluate Systems

Lecture 2: Metrics to Evaluate Systems Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with AM, GM, HM Sign up for the class mailing list! Video

More information

Lecture 4: Technology Scaling

Lecture 4: Technology Scaling Digital Integrated Circuits (83-313) Lecture 4: Technology Scaling Semester B, 2016-17 Lecturer: Dr. Adam Teman TAs: Itamar Levi, Robert Giterman 2 April 2017 Disclaimer: This course was prepared, in its

More information

Today. ESE532: System-on-a-Chip Architecture. Energy. Message. Preclass Challenge: Power. Energy Today s bottleneck What drives Efficiency of

Today. ESE532: System-on-a-Chip Architecture. Energy. Message. Preclass Challenge: Power. Energy Today s bottleneck What drives Efficiency of ESE532: System-on-a-Chip Architecture Day 22: April 10, 2017 Today Today s bottleneck What drives Efficiency of Processors, FPGAs, accelerators 1 2 Message dominates Including limiting performance Make

More information

Performance Metrics & Architectural Adaptivity. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

Performance Metrics & Architectural Adaptivity. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Performance Metrics & Architectural Adaptivity ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So What are the Options? Power Consumption Activity factor (amount of circuit switching) Load Capacitance (size

More information

Administrative Stuff

Administrative Stuff EE141- Spring 2004 Digital Integrated Circuits Lecture 30 PERSPECTIVES 1 Administrative Stuff Homework 10 posted just for practice. No need to turn in (hw 9 due today). Normal office hours next week. HKN

More information

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2) INF2270 Spring 2010 Philipp Häfliger Summary/Repetition (1/2) content From Scalar to Superscalar Lecture Summary and Brief Repetition Binary numbers Boolean Algebra Combinational Logic Circuits Encoder/Decoder

More information

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Performance, Power & Energy ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Recall: Goal of this class Performance Reconfiguration Power/ Energy H. So, Sp10 Lecture 3 - ELEC8106/6102 2 PERFORMANCE EVALUATION

More information

CMP 338: Third Class

CMP 338: Third Class CMP 338: Third Class HW 2 solution Conversion between bases The TINY processor Abstraction and separation of concerns Circuit design big picture Moore s law and chip fabrication cost Performance What does

More information

FPGA Implementation of a Predictive Controller

FPGA Implementation of a Predictive Controller FPGA Implementation of a Predictive Controller SIAM Conference on Optimization 2011, Darmstadt, Germany Minisymposium on embedded optimization Juan L. Jerez, George A. Constantinides and Eric C. Kerrigan

More information

Amdahl's Law. Execution time new = ((1 f) + f/s) Execution time. S. Then:

Amdahl's Law. Execution time new = ((1 f) + f/s) Execution time. S. Then: Amdahl's Law Useful for evaluating the impact of a change. (A general observation.) Insight: Improving a feature cannot improve performance beyond the use of the feature Suppose we introduce a particular

More information

Vector Lane Threading

Vector Lane Threading Vector Lane Threading S. Rivoire, R. Schultz, T. Okuda, C. Kozyrakis Computer Systems Laboratory Stanford University Motivation Vector processors excel at data-level parallelism (DLP) What happens to program

More information

EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis

EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture Rajeevan Amirtharajah University of California, Davis Outline Announcements Review: PDP, EDP, Intersignal Correlations, Glitching, Top

More information

TU Wien. Energy Efficiency. H. Kopetz 26/11/2009 Model of a Real-Time System

TU Wien. Energy Efficiency. H. Kopetz 26/11/2009 Model of a Real-Time System TU Wien 1 Energy Efficiency Outline 2 Basic Concepts Energy Estimation Hardware Scaling Power Gating Software Techniques Energy Sources 3 Why are Energy and Power Awareness Important? The widespread use

More information

Low power Architectures. Lecture #1:Introduction

Low power Architectures. Lecture #1:Introduction Low power Architectures Lecture #1:Introduction Dr. Avi Mendelson mendlson@ee.technion.ac.il Contributors: Ronny Ronen, Eli Savransky, Shekhar Borkar, Fred PollackP Technion, EE department Dr. Avi Mendelson,

More information

From Physics to Logic

From Physics to Logic From Physics to Logic This course aims to introduce you to the layers of abstraction of modern computer systems. We won t spend much time below the level of bits, bytes, words, and functional units, but

More information

Intro To Digital Logic

Intro To Digital Logic Intro To Digital Logic 1 Announcements... Project 2.2 out But delayed till after the midterm Midterm in a week Covers up to last lecture + next week's homework & lab Nick goes "H-Bomb of Justice" About

More information

Mark Redekopp, All rights reserved. Lecture 1 Slides. Intro Number Systems Logic Functions

Mark Redekopp, All rights reserved. Lecture 1 Slides. Intro Number Systems Logic Functions Lecture Slides Intro Number Systems Logic Functions EE 0 in Context EE 0 EE 20L Logic Design Fundamentals Logic Design, CAD Tools, Lab tools, Project EE 357 EE 457 Computer Architecture Using the logic

More information

Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning

Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning Yuchun Ma* Zhuoyuan Li* Jason Cong Xianlong Hong Glenn Reinman Sheqin Dong* Qiang Zhou *Department of Computer Science &

More information

Post Von Neumann Computing

Post Von Neumann Computing Post Von Neumann Computing Matthias Kaiserswerth Hasler Stiftung (formerly IBM Research) 1 2014 IBM Corporation Foundation Purpose Support information and communication technologies (ICT) to advance Switzerland

More information

Welcome to MCS 572. content and organization expectations of the course. definition and classification

Welcome to MCS 572. content and organization expectations of the course. definition and classification Welcome to MCS 572 1 About the Course content and organization expectations of the course 2 Supercomputing definition and classification 3 Measuring Performance speedup and efficiency Amdahl s Law Gustafson

More information

3/10/2013. Lecture #1. How small is Nano? (A movie) What is Nanotechnology? What is Nanoelectronics? What are Emerging Devices?

3/10/2013. Lecture #1. How small is Nano? (A movie) What is Nanotechnology? What is Nanoelectronics? What are Emerging Devices? EECS 498/598: Nanocircuits and Nanoarchitectures Lecture 1: Introduction to Nanotelectronic Devices (Sept. 5) Lectures 2: ITRS Nanoelectronics Road Map (Sept 7) Lecture 3: Nanodevices; Guest Lecture by

More information

Implications on the Design

Implications on the Design Implications on the Design Ramon Canal NCD Master MIRI NCD Master MIRI 1 VLSI Basics Resistance: Capacity: Agenda Energy Consumption Static Dynamic Thermal maps Voltage Scaling Metrics NCD Master MIRI

More information

ECE321 Electronics I

ECE321 Electronics I ECE321 Electronics I Lecture 1: Introduction to Digital Electronics Payman Zarkesh-Ha Office: ECE Bldg. 230B Office hours: Tuesday 2:00-3:00PM or by appointment E-mail: payman@ece.unm.edu Slide: 1 Textbook

More information

CS 700: Quantitative Methods & Experimental Design in Computer Science

CS 700: Quantitative Methods & Experimental Design in Computer Science CS 700: Quantitative Methods & Experimental Design in Computer Science Sanjeev Setia Dept of Computer Science George Mason University Logistics Grade: 35% project, 25% Homework assignments 20% midterm,

More information

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 Multi-processor vs. Multi-computer architecture µp vs. DSP RISC vs. DSP RISC Reduced-instruction-set Register-to-register operation Higher throughput by using

More information

ECE 172 Digital Systems. Chapter 12 Instruction Pipelining. Herbert G. Mayer, PSU Status 7/20/2018

ECE 172 Digital Systems. Chapter 12 Instruction Pipelining. Herbert G. Mayer, PSU Status 7/20/2018 ECE 172 Digital Systems Chapter 12 Instruction Pipelining Herbert G. Mayer, PSU Status 7/20/2018 1 Syllabus l Scheduling on Pipelined Architecture l Idealized Pipeline l Goal of Scheduling l Causes for

More information

Today. ESE532: System-on-a-Chip Architecture. Why Care? Message. Scaling. Why Care: Custom SoC

Today. ESE532: System-on-a-Chip Architecture. Why Care? Message. Scaling. Why Care: Custom SoC ESE532: System-on-a-Chip Architecture Day 21: April 5, 2017 VLSI Scaling 1 Today VLSI Scaling Rules Effects Historical/predicted scaling Variations (cheating) Limits Note: gory equations! goal is to understand

More information

Technical Report GIT-CERCS. Thermal Field Management for Many-core Processors

Technical Report GIT-CERCS. Thermal Field Management for Many-core Processors Technical Report GIT-CERCS Thermal Field Management for Many-core Processors Minki Cho, Nikhil Sathe, Sudhakar Yalamanchili and Saibal Mukhopadhyay School of Electrical and Computer Engineering Georgia

More information

DESIGN AND ANALYSIS OF A FULL ADDER USING VARIOUS REVERSIBLE GATES

DESIGN AND ANALYSIS OF A FULL ADDER USING VARIOUS REVERSIBLE GATES DESIGN AND ANALYSIS OF A FULL ADDER USING VARIOUS REVERSIBLE GATES Sudhir Dakey Faculty,Department of E.C.E., MVSR Engineering College Abstract The goal of VLSI has remained unchanged since many years

More information

Performance Evaluation of MPI on Weather and Hydrological Models

Performance Evaluation of MPI on Weather and Hydrological Models NCAR/RAL Performance Evaluation of MPI on Weather and Hydrological Models Alessandro Fanfarillo elfanfa@ucar.edu August 8th 2018 Cheyenne - NCAR Supercomputer Cheyenne is a 5.34-petaflops, high-performance

More information

LEADING THE EVOLUTION OF COMPUTE MARK KACHMAREK HPC STRATEGIC PLANNING MANAGER APRIL 17, 2018

LEADING THE EVOLUTION OF COMPUTE MARK KACHMAREK HPC STRATEGIC PLANNING MANAGER APRIL 17, 2018 LEADING THE EVOLUTION OF COMPUTE MARK KACHMAREK HPC STRATEGIC PLANNING MANAGER APRIL 17, 2018 INTEL S RESEARCH EFFORTS COMPONENTS RESEARCH INTEL LABS ENABLING MOORE S LAW DEVELOPING NOVEL INTEGRATION ENABLING

More information

Novel Devices and Circuits for Computing

Novel Devices and Circuits for Computing Novel Devices and Circuits for Computing UCSB 594BB Winter 2013 Lecture 4: Resistive switching: Logic Class Outline Material Implication logic Stochastic computing Reconfigurable logic Material Implication

More information

Moores Law for DRAM. 2x increase in capacity every 18 months 2006: 4GB

Moores Law for DRAM. 2x increase in capacity every 18 months 2006: 4GB MEMORY Moores Law for DRAM 2x increase in capacity every 18 months 2006: 4GB Corollary to Moores Law Cost / chip ~ constant (packaging) Cost / bit = 2X reduction / 18 months Current (2008) ~ 1 micro-cent

More information

Performance of Computers. Performance of Computers. Defining Performance. Forecast

Performance of Computers. Performance of Computers. Defining Performance. Forecast Performance of Computers Which computer is fastest? Not so simple scientific simulation - FP performance program development - Integer performance commercial work - I/O Performance of Computers Want to

More information

Modeling and Analyzing NBTI in the Presence of Process Variation

Modeling and Analyzing NBTI in the Presence of Process Variation Modeling and Analyzing NBTI in the Presence of Process Variation Taniya Siddiqua, Sudhanva Gurumurthi, Mircea R. Stan Dept. of Computer Science, Dept. of Electrical and Computer Engg., University of Virginia

More information

EEC 216 Lecture #2: Metrics and Logic Level Power Estimation. Rajeevan Amirtharajah University of California, Davis

EEC 216 Lecture #2: Metrics and Logic Level Power Estimation. Rajeevan Amirtharajah University of California, Davis EEC 216 Lecture #2: Metrics and Logic Level Power Estimation Rajeevan Amirtharajah University of California, Davis Announcements PS1 available online tonight R. Amirtharajah, EEC216 Winter 2008 2 Outline

More information

Last Lecture. Power Dissipation CMOS Scaling. EECS 141 S02 Lecture 8

Last Lecture. Power Dissipation CMOS Scaling. EECS 141 S02 Lecture 8 EECS 141 S02 Lecture 8 Power Dissipation CMOS Scaling Last Lecture CMOS Inverter loading Switching Performance Evaluation Design optimization Inverter Sizing 1 Today CMOS Inverter power dissipation» Dynamic»

More information

Performance, Power & Energy

Performance, Power & Energy Recall: Goal of this class Performance, Power & Energy ELE8106/ELE6102 Performance Reconfiguration Power/ Energy Spring 2010 Hayden Kwok-Hay So H. So, Sp10 Lecture 3 - ELE8106/6102 2 What is good performance?

More information

Lecture 3, Performance

Lecture 3, Performance Lecture 3, Performance Repeating some definitions: CPI Clocks Per Instruction MHz megahertz, millions of cycles per second MIPS Millions of Instructions Per Second = MHz / CPI MOPS Millions of Operations

More information

Lecture 15: Scaling & Economics

Lecture 15: Scaling & Economics Lecture 15: Scaling & Economics Outline Scaling Transistors Interconnect Future Challenges Economics 2 Moore s Law Recall that Moore s Law has been driving CMOS [Moore65] Corollary: clock speeds have improved

More information

Power in Digital CMOS Circuits. Fruits of Scaling SpecInt 2000

Power in Digital CMOS Circuits. Fruits of Scaling SpecInt 2000 Power in Digital CMOS Circuits Mark Horowitz Computer Systems Laboratory Stanford University horowitz@stanford.edu Copyright 2004 by Mark Horowitz MAH 1 Fruits of Scaling SpecInt 2000 1000.00 100.00 10.00

More information

CS-683: Advanced Computer Architecture Course Introduction

CS-683: Advanced Computer Architecture Course Introduction CS-683: Advanced Computer Architecture Course Introduction Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology

More information

Anton: A Specialized ASIC for Molecular Dynamics

Anton: A Specialized ASIC for Molecular Dynamics Anton: A Specialized ASIC for Molecular Dynamics Martin M. Deneroff, David E. Shaw, Ron O. Dror, Jeffrey S. Kuskin, Richard H. Larson, John K. Salmon, and Cliff Young D. E. Shaw Research Marty.Deneroff@DEShawResearch.com

More information

CRYPTOGRAPHIC COMPUTING

CRYPTOGRAPHIC COMPUTING CRYPTOGRAPHIC COMPUTING ON GPU Chen Mou Cheng Dept. Electrical Engineering g National Taiwan University January 16, 2009 COLLABORATORS Daniel Bernstein, UIC, USA Tien Ren Chen, Army Tanja Lange, TU Eindhoven,

More information

System Level Leakage Reduction Considering the Interdependence of Temperature and Leakage

System Level Leakage Reduction Considering the Interdependence of Temperature and Leakage System Level Leakage Reduction Considering the Interdependence of Temperature and Leakage Lei He, Weiping Liao and Mircea R. Stan EE Department, University of California, Los Angeles 90095 ECE Department,

More information

Floating Point Representation and Digital Logic. Lecture 11 CS301

Floating Point Representation and Digital Logic. Lecture 11 CS301 Floating Point Representation and Digital Logic Lecture 11 CS301 Administrative Daily Review of today s lecture w Due tomorrow (10/4) at 8am Lab #3 due Friday (9/7) 1:29pm HW #5 assigned w Due Monday 10/8

More information

CSE493/593. Designing for Low Power

CSE493/593. Designing for Low Power CSE493/593 Designing for Low Power Mary Jane Irwin [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.].1 Why Power Matters Packaging costs Power supply rail design Chip and system

More information

Moore s Law Technology Scaling and CMOS

Moore s Law Technology Scaling and CMOS Design Challenges in Digital High Performance Circuits Outline Manoj achdev Dept. of Electrical and Computer Engineering University of Waterloo Waterloo, Ontario, Canada Power truggle ummary Moore s Law

More information

CS 100: Parallel Computing

CS 100: Parallel Computing CS 100: Parallel Computing Chris Kauffman Week 12 Logistics Upcoming HW 5: Due Friday by 11:59pm HW 6: Up by Early Next Week Moore s Law: CPUs get faster Smaller transistors closer together Smaller transistors

More information

A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem

A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem Abid Rafique, Nachiket Kapre, and George A. Constantinides Electrical and Electronic Engineering,

More information

Some thoughts about energy efficient application execution on NEC LX Series compute clusters

Some thoughts about energy efficient application execution on NEC LX Series compute clusters Some thoughts about energy efficient application execution on NEC LX Series compute clusters G. Wellein, G. Hager, J. Treibig, M. Wittmann Erlangen Regional Computing Center & Department of Computer Science

More information

Previously. ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems. Today. Variation Types. Fabrication

Previously. ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems. Today. Variation Types. Fabrication ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Previously Understand how to model transistor behavior Given that we know its parameters V dd, V th, t OX, C OX, W, L, N A Day

More information

GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications

GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications Christopher Rodrigues, David J. Hardy, John E. Stone, Klaus Schulten, Wen-Mei W. Hwu University of Illinois at Urbana-Champaign

More information

Variation-Resistant Dynamic Power Optimization for VLSI Circuits

Variation-Resistant Dynamic Power Optimization for VLSI Circuits Process-Variation Variation-Resistant Dynamic Power Optimization for VLSI Circuits Fei Hu Department of ECE Auburn University, AL 36849 Ph.D. Dissertation Committee: Dr. Vishwani D. Agrawal Dr. Foster

More information

EECS 427 Lecture 11: Power and Energy Reading: EECS 427 F09 Lecture Reminders

EECS 427 Lecture 11: Power and Energy Reading: EECS 427 F09 Lecture Reminders EECS 47 Lecture 11: Power and Energy Reading: 5.55 [Adapted from Irwin and Narayanan] 1 Reminders CAD5 is due Wednesday 10/8 You can submit it by Thursday 10/9 at noon Lecture on 11/ will be taught by

More information

Scaling of MOS Circuits. 4. International Technology Roadmap for Semiconductors (ITRS) 6. Scaling factors for device parameters

Scaling of MOS Circuits. 4. International Technology Roadmap for Semiconductors (ITRS) 6. Scaling factors for device parameters 1 Scaling of MOS Circuits CONTENTS 1. What is scaling?. Why scaling? 3. Figure(s) of Merit (FoM) for scaling 4. International Technology Roadmap for Semiconductors (ITRS) 5. Scaling models 6. Scaling factors

More information

CMPE12 - Notes chapter 1. Digital Logic. (Textbook Chapter 3)

CMPE12 - Notes chapter 1. Digital Logic. (Textbook Chapter 3) CMPE12 - Notes chapter 1 Digital Logic (Textbook Chapter 3) Transistor: Building Block of Computers Microprocessors contain TONS of transistors Intel Montecito (2005): 1.72 billion Intel Pentium 4 (2000):

More information

Numbering Systems. Computational Platforms. Scaling and Round-off Noise. Special Purpose. here that is dedicated architecture

Numbering Systems. Computational Platforms. Scaling and Round-off Noise. Special Purpose. here that is dedicated architecture Computational Platforms Numbering Systems Basic Building Blocks Scaling and Round-off Noise Computational Platforms Viktor Öwall viktor.owall@eit.lth.seowall@eit lth Standard Processors or Special Purpose

More information

Modern Computer Architecture

Modern Computer Architecture Modern Computer Architecture Lecture1 Fundamentals of Quantitative Design and Analysis (II) Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University 1.4 Trends in Technology Logic: transistor density 35%/year,

More information

Designing Information Devices and Systems I Fall 2015 Anant Sahai, Ali Niknejad Homework 8. This homework is due October 26, 2015, at Noon.

Designing Information Devices and Systems I Fall 2015 Anant Sahai, Ali Niknejad Homework 8. This homework is due October 26, 2015, at Noon. EECS 16A Designing Information Devices and Systems I Fall 2015 Anant Sahai, Ali Niknejad Homework 8 This homework is due October 26, 2015, at Noon. 1. Nodal Analysis Or Superposition? (a) Solve for the

More information

Counters. We ll look at different kinds of counters and discuss how to build them

Counters. We ll look at different kinds of counters and discuss how to build them Counters We ll look at different kinds of counters and discuss how to build them These are not only examples of sequential analysis and design, but also real devices used in larger circuits 1 Introducing

More information

VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects

VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects Smruti R. Sarangi, Brian Greskamp, Radu Teodorescu, Jun Nakano, Abhishekh Tiwarii and Josep Torrellas University of

More information

Lecture 3, Performance

Lecture 3, Performance Repeating some definitions: Lecture 3, Performance CPI MHz MIPS MOPS Clocks Per Instruction megahertz, millions of cycles per second Millions of Instructions Per Second = MHz / CPI Millions of Operations

More information

Impact of Scaling on The Effectiveness of Dynamic Power Reduction Schemes

Impact of Scaling on The Effectiveness of Dynamic Power Reduction Schemes Impact of Scaling on The Effectiveness of Dynamic Power Reduction Schemes D. Duarte Intel Corporation david.e.duarte@intel.com N. Vijaykrishnan, M.J. Irwin, H-S Kim Department of CSE, Penn State University

More information

Qualitative vs Quantitative metrics

Qualitative vs Quantitative metrics Qualitative vs Quantitative metrics Quantitative: hard numbers, measurable Time, Energy, Space Signal-to-Noise, Frames-per-second, Memory Usage Money (?) Qualitative: feelings, opinions Complexity: Simple,

More information

Today. ESE534: Computer Organization. Why Care? Why Care. Scaling. ITRS Roadmap

Today. ESE534: Computer Organization. Why Care? Why Care. Scaling. ITRS Roadmap ESE534: Computer Organization Day 5: September 19, 2016 VLSI Scaling 1 Today VLSI Scaling Rules Effects Historical/predicted scaling Variations (cheating) Limits Note: Day 5 and 6 most gory MOSFET equations

More information

2.1. Unit 2. Digital Circuits (Logic)

2.1. Unit 2. Digital Circuits (Logic) 2.1 Unit 2 Digital Circuits (Logic) 2.2 Moving from voltages to 1's and 0's ANALOG VS. DIGITAL volts volts 2.3 Analog signal Signal Types Continuous time signal where each voltage level has a unique meaning

More information

Chapter 1. Binary Systems 1-1. Outline. ! Introductions. ! Number Base Conversions. ! Binary Arithmetic. ! Binary Codes. ! Binary Elements 1-2

Chapter 1. Binary Systems 1-1. Outline. ! Introductions. ! Number Base Conversions. ! Binary Arithmetic. ! Binary Codes. ! Binary Elements 1-2 Chapter 1 Binary Systems 1-1 Outline! Introductions! Number Base Conversions! Binary Arithmetic! Binary Codes! Binary Elements 1-2 3C Integration 傳輸與介面 IA Connecting 聲音與影像 Consumer Screen Phone Set Top

More information

Molecular Dynamics Simulations

Molecular Dynamics Simulations MDGRAPE-3 chip: A 165- Gflops application-specific LSI for Molecular Dynamics Simulations Makoto Taiji High-Performance Biocomputing Research Team Genomic Sciences Center, RIKEN Molecular Dynamics Simulations

More information

Section 3: Combinational Logic Design. Department of Electrical Engineering, University of Waterloo. Combinational Logic

Section 3: Combinational Logic Design. Department of Electrical Engineering, University of Waterloo. Combinational Logic Section 3: Combinational Logic Design Major Topics Design Procedure Multilevel circuits Design with XOR gates Adders and Subtractors Binary parallel adder Decoders Encoders Multiplexers Programmed Logic

More information

Lecture 5: Performance (Sequential) James C. Hoe Department of ECE Carnegie Mellon University

Lecture 5: Performance (Sequential) James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 5: Performance (Sequential) James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L05 S1, James C. Hoe, CMU/ECE/CALCM, 2018 18 447 S18 L05 S2, James C. Hoe, CMU/ECE/CALCM,

More information

CMP N 301 Computer Architecture. Appendix C

CMP N 301 Computer Architecture. Appendix C CMP N 301 Computer Architecture Appendix C Outline Introduction Pipelining Hazards Pipelining Implementation Exception Handling Advanced Issues (Dynamic Scheduling, Out of order Issue, Superscalar, etc)

More information

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics Lecture 23 Dealing with Interconnect Impact of Interconnect Parasitics Reduce Reliability Affect Performance Classes of Parasitics Capacitive Resistive Inductive 1 INTERCONNECT Dealing with Capacitance

More information

Lecture 2: CMOS technology. Energy-aware computing

Lecture 2: CMOS technology. Energy-aware computing Energy-Aware Computing Lecture 2: CMOS technology Basic components Transistors Two types: NMOS, PMOS Wires (interconnect) Transistors as switches Gate Drain Source NMOS: When G is @ logic 1 (actually over

More information

Skew-Tolerant Circuit Design

Skew-Tolerant Circuit Design Skew-Tolerant Circuit Design David Harris David_Harris@hmc.edu December, 2000 Harvey Mudd College Claremont, CA Outline Introduction Skew-Tolerant Circuits Traditional Domino Circuits Skew-Tolerant Domino

More information

PARADE: PARAmetric Delay Evaluation Under Process Variation * (Revised Version)

PARADE: PARAmetric Delay Evaluation Under Process Variation * (Revised Version) PARADE: PARAmetric Delay Evaluation Under Process Variation * (Revised Version) Xiang Lu, Zhuo Li, Wangqi Qiu, D. M. H. Walker, Weiping Shi Dept. of Electrical Engineering Dept. of Computer Science Texas

More information

Lecture: Pipelining Basics

Lecture: Pipelining Basics Lecture: Pipelining Basics Topics: Performance equations wrap-up, Basic pipelining implementation Video 1: What is pipelining? Video 2: Clocks and latches Video 3: An example 5-stage pipeline Video 4:

More information

VLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1

VLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1 VLSI Design Adder Design [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1 Major Components of a Computer Processor Devices Control Memory Input Datapath

More information

LECTURE 28. Analyzing digital computation at a very low level! The Latch Pipelined Datapath Control Signals Concept of State

LECTURE 28. Analyzing digital computation at a very low level! The Latch Pipelined Datapath Control Signals Concept of State Today LECTURE 28 Analyzing digital computation at a very low level! The Latch Pipelined Datapath Control Signals Concept of State Time permitting, RC circuits (where we intentionally put in resistance

More information

ACCELERATED LEARNING OF GAUSSIAN PROCESS MODELS

ACCELERATED LEARNING OF GAUSSIAN PROCESS MODELS ACCELERATED LEARNING OF GAUSSIAN PROCESS MODELS Bojan Musizza, Dejan Petelin, Juš Kocijan, Jožef Stefan Institute Jamova 39, Ljubljana, Slovenia University of Nova Gorica Vipavska 3, Nova Gorica, Slovenia

More information

Professor Lee, Yong Surk. References. Topics Microprocessor & microcontroller. High Performance Microprocessor Architecture Overview

Professor Lee, Yong Surk. References. Topics Microprocessor & microcontroller. High Performance Microprocessor Architecture Overview This lecture was mae by a generous contribution of C & S Technology corporation. (http://( http://www.cnstec.com) There is no copyright on this lecture. Processor Laboratory homepage, (http://( http://mpu.yonsei.ac.kr)

More information

Where Does Power Go in CMOS?

Where Does Power Go in CMOS? Power Dissipation Where Does Power Go in CMOS? Dynamic Power Consumption Charging and Discharging Capacitors Short Circuit Currents Short Circuit Path between Supply Rails during Switching Leakage Leaking

More information

Computer Architecture

Computer Architecture Lecture 2: Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture CPU Evolution What is? 2 Outline Measurements and metrics : Performance, Cost, Dependability, Power Guidelines

More information

Datapath Component Tradeoffs

Datapath Component Tradeoffs Datapath Component Tradeoffs Faster Adders Previously we studied the ripple-carry adder. This design isn t feasible for larger adders due to the ripple delay. ʽ There are several methods that we could

More information

Exascale Computing for Radio Astronomy: GPU or FPGA?

Exascale Computing for Radio Astronomy: GPU or FPGA? Exascale Computing for Radio Astronomy: GPU or FPGA? Kees van Berkel MPSoC 2016, Nara, Japan 2016, July 14 Radio Astronomy: Herculus A (a.k.a. 3C 348) optically invisible jets, one-and-a-half million light-years

More information

τ gd =Q/I=(CV)/I I d,sat =(µc OX /2)(W/L)(V gs -V TH ) 2 ESE534 Computer Organization Today At Issue Preclass 1 Energy and Delay Tradeoff

τ gd =Q/I=(CV)/I I d,sat =(µc OX /2)(W/L)(V gs -V TH ) 2 ESE534 Computer Organization Today At Issue Preclass 1 Energy and Delay Tradeoff ESE534 Computer Organization Today Day 8: February 10, 2010 Energy, Power, Reliability Energy Tradeoffs? Voltage limits and leakage? Variations Transients Thermodynamics meets Information Theory (brief,

More information

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining Slide 1 EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining Slide 2 Topics Clocking Clock Parameters Latch Types Requirements for reliable clocking Pipelining Optimal pipelining

More information

A CUDA Solver for Helmholtz Equation

A CUDA Solver for Helmholtz Equation Journal of Computational Information Systems 11: 24 (2015) 7805 7812 Available at http://www.jofcis.com A CUDA Solver for Helmholtz Equation Mingming REN 1,2,, Xiaoguang LIU 1,2, Gang WANG 1,2 1 College

More information

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 19: Adder Design [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp11 CMPEN 411 L19

More information

Computer Architecture ELEC2401 & ELEC3441

Computer Architecture ELEC2401 & ELEC3441 Last Time Pipeline Hazard Computer Architecture ELEC2401 & ELEC3441 Lecture 8 Pipelining (3) Dr. Hayden Kwok-Hay So Department of Electrical and Electronic Engineering Structural Hazard Hazard Control

More information

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics)

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Eftychios Sifakis CS758 Guest Lecture - 19 Sept 2012 Introduction Linear systems

More information