Lecture 27: Hardware Acceleration. James C. Hoe Department of ECE Carnegie Mellon University

Size: px
Start display at page:

Download "Lecture 27: Hardware Acceleration. James C. Hoe Department of ECE Carnegie Mellon University"

Transcription

1 Lecture 27: Hardware Acceleration James C. Hoe Department of ECE Carnegie Mellon niversity S18 L27 S1, James C. Hoe, CM/ECE/CALCM, 2018

2 S18 L27 S2, James C. Hoe, CM/ECE/CALCM, 2018 Housekeeping Your goal today see why you should care about accelerators know the basics to think about the topic Notices Lab4, due this week HW5, past due practice final solutions (hardcopies) Readings Amdahl's Law in the Multi Era, 2008 (optional) Single Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPs? 2010 (optional)

3 HW Acceleration is nothing new! What needed to be faster/smaller/cheaper/ lower energy than SW has always been done in HW we go to HW when SW isn t good enough because good HW can be more efficient we don t go to HW when SW is good enough because good HW takes more work When we say HW acceleration, we always mean efficient and not just correct S18 L27 S3, James C. Hoe, CM/ECE/CALCM, 2018

4 Computing s Brave New World Microsoft Catapult [MICRO 2016, Caulfield, et al.] Google TP [Hotchips, 2017, Jeff Dean] S18 L27 S4, James C. Hoe, CM/ECE/CALCM, 2018

5 How we got here.... Big Core 1970~ ~?? S18 L27 S5, James C. Hoe, CM/ECE/CALCM, 2018

6 Moore s Law without Dennard Scaling Intl. Technology Roadmap for Semiconductors logic density VDD >16x 1 25% 0 node label ?? S18 L27 S6, James C. Hoe, CM/ECE/CALCM, 2018 nder fixed power ceiling, more ops/second only achievable if less Joules/op?

7 Future is about Performance/Watt and Ops/Joule Big Core What will (or can) you do with all those transistors? GPGP FPGA Custom Logic S18 L27 S7, James C. Hoe, CM/ECE/CALCM, 2018 This is a sign of desperation....

8 Why is Computing Directly in Hardware Efficient? S18 L27 S8, James C. Hoe, CM/ECE/CALCM, 2018

9 S18 L27 S9, James C. Hoe, CM/ECE/CALCM, 2018 Why is HW/FPGA better? no overhead A processor spends a lot of transistors & energy to present von Neumann ISA abstraction to support a broad application base (e.g., caches, superscalar out of order, prefetching,...) In fact, processor is mostly overhead ~90% energy [Hameed, ISCA 2010, Tensilica ] ~95% energy [Balfour, CAL 2007, embedded RISC ] even worse on a high perf superscalar OoO proc Computing directly in application specific hardware can be 10x to 100x more energy efficient

10 Why is HW/FPGA better? For a given functionality, non linear tradeoff between power and performance slower design is simpler lower frequency needs lower voltage For the same throughput, replacing 1 module by 2 half as fast reduces total power and energy S18 L27 S10, James C. Hoe, CM/ECE/CALCM, 2018 efficiency of parallelism Power (Joule/sec) Better to replace 1 of this by 2 of these; or N of these Power Perf k>1 Perf (op/sec) Good hardware designs derive performance from parallelism

11 Software to Hardware Spectrum Software CP: highest level abstraction / most general purpose support GP: explicitly parallel programs / best for SIMD, regular FPGA: ASIC like abstraction / overhead for reprogrammability ASIC: lowest level abstraction / fixed application and tuning Hardware tradeoff between efficiency and effort S18 L27 S11, James C. Hoe, CM/ECE/CALCM, 2018

12 Case Study [Chung, MICRO 2010] CP GPs FPGA ASIC Intel Core i7 960 Nvidia GTX285 ATI R5870 Xilinx V6 LX760 Std. Cell Year Node 45nm 55nm 40nm 40nm 65nm Die area 263mm 2 470mm 2 334mm 2 Clock rate 3.2GHz 1.5GHz 1.5GHz 0.3GHz Single prec. floating point apps M M Mult FFT Black Scholes MKL Multithreaded Spiral.net Multithreaded PARSEC multithreaded CBLAS 2.3 CAL++ hand coded CFFT /3.1 CDA 2.3 Spiral.net hand coded S18 L27 S12, James C. Hoe, CM/ECE/CALCM, 2018

13 Best Case Performance and Energy MMM Device GFLOP/s actual (GFLOP/s)/mm 2 normalized to 40nm GFLOP/J normalized to 40nm Intel Core i7 (45nm) Nvidia GTX285 (55nm) ATI R5870 (40nm) XilinxV6 LX760 (40nm) same RTL std cell (65nm) CP and GP benchmarking is compute bound; FPGA and Std Cell effectively compute bound (no off chip I/O) Power (switching+leakage) measurements isolated the from the system For detail see [Chung, et al. MICRO 2010] S18 L27 S13, James C. Hoe, CM/ECE/CALCM, 2018

14 Less Regular Applications GFLOP/s (GFLOP/s)/mm 2 GFLOP/J FFT 2 10 Intel Core i7 (45nm) Nvidia GTX285 (55nm) ATI R5870 (40nm) XilinxV6 LX760 (40nm) same RTL std cell (65nm) Mopt/s (Mopts/s)/mm 2 Mopts/J Black Scholes Intel Core i7 (45nm) Nvidia GTX285 (55nm) ATI R5870 (40nm) XilinxV6 LX760 (40nm) same RTL std cell (65nm) S18 L27 S14, James C. Hoe, CM/ECE/CALCM, 2018

15 ASIC isn t always ultimate in performance Amdahl s Law: S overall = 1 / ( (1 f)+ f/s f ) S f ASIC > S f FPGA but f ASIC f FPGA f FPGA > f ASIC (when not perfectly app specific) more flexible design to cover a greater fraction reprogram FPGA to cover different applications S f ASIC S overall S f FPGA [based on Joel Emer s original comment about programmable accelerators in general] S18 L27 S15, James C. Hoe, CM/ECE/CALCM, 2018 f ASIC f FPGA (at break even)

16 S18 L27 S16, James C. Hoe, CM/ECE/CALCM, 2018 Tradeoff in Heterogeneity?

17 Amdahl s Law on Multi Base Core Equivalent () in [Hill and Marty, 2008] A program is rarely completely parallelizable; let s say a fraction f is perfectly parallelizable Speedup of n s over sequential Speedup for small f, die area underutilized 1 (1 f ) f n S18 L27 S17, James C. Hoe, CM/ECE/CALCM, 2018

18 F=0.999 F=0.99 F=0.9 F=0.5 more smaller s S18 L27 S18, James C. Hoe, CM/ECE/CALCM, 2018 size of s in fewer larger s

19 Asymmetric Multis Fast Core Base Core Equivalent () in [Hill and Marty, 2008] S18 L27 S19, James C. Hoe, CM/ECE/CALCM, 2018 Pwr/area efficient slow s vs pwr/area hungry fast fast for sequential code slow s for parallel sections [Hill and Marty, 2008] Speedup 1 f perf 1 f ( n r) seq perf seq r = cost of fast in perf seq = speedup of fast over solve for optimal die allocation

20 F=0.999 F=0.99 assume perf of fast grows with sqrt of area F=0.9 F= S18 L27 S20, James C. Hoe, CM/ECE/CALCM, 2018 size of fast in

21 Asymmetric Fast Core Fast Core S18 L27 S21, James C. Hoe, CM/ECE/CALCM, 2018 Heterogeneous Multis [Chung, et al. MICRO 2010] Base Core Equivalent Heterogeneous Speedup Speedup 1 f perf 1- f perf seq perf seq seq 1 f ( n r) [Hill and Marty, 2008] simplified f is fraction parallelizable n is total die area in units r is fast area in units perf seq (r) is fast perf. relative to 1 f ( n - r) For the sake of analysis, break the area for GP/FPGA/etc. into units of s that are the same size as s. Each type is characterized by a relative performance µ and relative power compared to a

22 and μ example values MMM Black Scholes FFT 2 10 Nvidia GTX285 Xilinx LX760 Custom Logic Φ μ On equal area basis, 3.41x performance at 0.74x power relative a Φ μ Φ μ S18 L27 S22, James C. Hoe, CM/ECE/CALCM, 2018 Nominal based on an Intel Atom in order processor, 26mm 2 in a 45nm process

23 Modeling Power and Bandwidth Budgets Heterogeneous Fast Core Speedup 1- f perf seq 1 f ( n - r) The above is based on area alone Power or bandwidth budget limits the usable die area if P is total power budget expressed as a multiple of a s power, usable area n r P if B is total memory bandwidth expressed as a multiple of s, usable area n r B μ S18 L27 S23, James C. Hoe, CM/ECE/CALCM, 2018

24 Combine Model with ITRS Trends Year Technology 40nm 32nm 22nm 16nm 11nm Core die budget (mm 2 ) Normalized area () (16x) Core power (W) Bandwidth (GB/s) (1.4x) Rel pwr per device 1X 0.75X 0.5X 0.36X 0.25X Fast Core 2011 parameters reflect high end systems of the day; future parameters extrapolated from ITRS mm 2 populated by an optimally sized Fast Core and s of choice S18 L27 S24, James C. Hoe, CM/ECE/CALCM, 2018

25 Single Prec. MMMult (f=99%) 300 ASIC 250 Speedup nm 32nm 22nm 16nm 11nm SymMC AsymMC (0) SymCMP (1) AsymCMP (2) ASIC FPGA (3) LX760 GP R5870 (5) R5870 GP R5870 FPGA LX760 Sym f=0.990 & Asym multi Power Bound Mem Bound S18 L27 S25, James C. Hoe, CM/ECE/CALCM, 2018

26 Single Prec. MMMult (f=90%) Speedup ASIC GP R5870 FPGA LX760 Asymmetric multi Symmetric multi f=0.900 x 0 40nm 32nm 22nm 16nm 11nm SymMC AsymMC (0) SymCMP (1) AsymCMP (2) ASIC FPGA (3) LX760 GP R5870 (5) R5870 Power Bound Mem Bound S18 L27 S26, James C. Hoe, CM/ECE/CALCM, 2018

27 Single Prec. MMMult (f=50%) ASIC/GP/FPGA Asymmetric Symmetric Speedup f= nm 32nm 22nm 16nm 11nm SymMC AsymMC (0) SymCMP (1) AsymCMP (2) ASIC ASIC FPGA (3) LX760 GP R5870 (5) R5870 Power Bound Mem Bound S18 L27 S27, James C. Hoe, CM/ECE/CALCM, 2018

28 Single Prec. FFT 1024 (f=99%) 60 ASIC/GP/FPGA 50 Speedup Asymmetric Symmetric 10 f= nm 32nm 22nm 16nm 11nm SymMC AsymMC (0) SymCMP (1) AsymCMP (2) ASIC FPGA (3) LX760 GP R5870 (4) GTX480 Power Bound Mem Bound S18 L27 S28, James C. Hoe, CM/ECE/CALCM, 2018

29 FFT 1024 (f=99%) if hypothetical 1TB/sec bandwidth 200 ASIC Speedup FPGA LX760 GP GTX nm 32nm 22nm 16nm 11nm SymMC AsymMC (0) SymCMP (1) AsymCMP (2) ASIC ASIC FPGA (3) LX760 GP R5870 (4) GTX480 f=0.990 Sym & Asym multi Power Bound Mem Bound S18 L27 S29, James C. Hoe, CM/ECE/CALCM, 2018

30 You will be seeing more of this Performance scaling requires improved efficiency in Op/Joules and Perf/Watt Hardware acceleration is the most direct way to improve energy/power efficiency Need better hardware design methodology to enable application developers (without losing hardware s advantages) Software is easy; hardware is hard? Hardware isn t hard; perf and efficiency is!!! S18 L27 S30, James C. Hoe, CM/ECE/CALCM, 2018

BeiHang Short Course, Part 7: HW Acceleration: It s about Performance, Energy and Power

BeiHang Short Course, Part 7: HW Acceleration: It s about Performance, Energy and Power BeiHang Short Course, Part 7: HW Acceleration: It s about Performance, Energy and Power James C. Hoe Department of ECE Carnegie Mellon niversity Eric S. Chung, et al., Single chip Heterogeneous Computing:

More information

Lecture 12: Energy and Power. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 12: Energy and Power. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 12: Energy and Power James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L12 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today a working understanding of

More information

Lecture 5: Performance and Efficiency. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 5: Performance and Efficiency. James C. Hoe Department of ECE Carnegie Mellon University 18 643 Lecture 5: Performance and Efficiency James C. Hoe Department of ECE Carnegie Mellon University 18 643 F17 L05 S1, James C. Hoe, CMU/ECE/CALCM, 2017 18 643 F17 L05 S2, James C. Hoe, CMU/ECE/CALCM,

More information

Lecture 23: Illusiveness of Parallel Performance. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 23: Illusiveness of Parallel Performance. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 23: Illusiveness of Parallel Performance James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L23 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Your goal today Housekeeping peel

More information

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2) INF2270 Spring 2010 Philipp Häfliger Summary/Repetition (1/2) content From Scalar to Superscalar Lecture Summary and Brief Repetition Binary numbers Boolean Algebra Combinational Logic Circuits Encoder/Decoder

More information

L16: Power Dissipation in Digital Systems. L16: Spring 2007 Introductory Digital Systems Laboratory

L16: Power Dissipation in Digital Systems. L16: Spring 2007 Introductory Digital Systems Laboratory L16: Power Dissipation in Digital Systems 1 Problem #1: Power Dissipation/Heat Power (Watts) 100000 10000 1000 100 10 1 0.1 4004 80088080 8085 808686 386 486 Pentium proc 18KW 5KW 1.5KW 500W 1971 1974

More information

Lecture 5: Performance (Sequential) James C. Hoe Department of ECE Carnegie Mellon University

Lecture 5: Performance (Sequential) James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 5: Performance (Sequential) James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L05 S1, James C. Hoe, CMU/ECE/CALCM, 2018 18 447 S18 L05 S2, James C. Hoe, CMU/ECE/CALCM,

More information

Performance Evaluation of MPI on Weather and Hydrological Models

Performance Evaluation of MPI on Weather and Hydrological Models NCAR/RAL Performance Evaluation of MPI on Weather and Hydrological Models Alessandro Fanfarillo elfanfa@ucar.edu August 8th 2018 Cheyenne - NCAR Supercomputer Cheyenne is a 5.34-petaflops, high-performance

More information

Lecture 15: Scaling & Economics

Lecture 15: Scaling & Economics Lecture 15: Scaling & Economics Outline Scaling Transistors Interconnect Future Challenges Economics 2 Moore s Law Recall that Moore s Law has been driving CMOS [Moore65] Corollary: clock speeds have improved

More information

Today. ESE532: System-on-a-Chip Architecture. Energy. Message. Preclass Challenge: Power. Energy Today s bottleneck What drives Efficiency of

Today. ESE532: System-on-a-Chip Architecture. Energy. Message. Preclass Challenge: Power. Energy Today s bottleneck What drives Efficiency of ESE532: System-on-a-Chip Architecture Day 20: November 8, 2017 Energy Today Energy Today s bottleneck What drives Efficiency of Processors, FPGAs, accelerators How does parallelism impact energy? 1 2 Message

More information

Vector Lane Threading

Vector Lane Threading Vector Lane Threading S. Rivoire, R. Schultz, T. Okuda, C. Kozyrakis Computer Systems Laboratory Stanford University Motivation Vector processors excel at data-level parallelism (DLP) What happens to program

More information

Intro To Digital Logic

Intro To Digital Logic Intro To Digital Logic 1 Announcements... Project 2.2 out But delayed till after the midterm Midterm in a week Covers up to last lecture + next week's homework & lab Nick goes "H-Bomb of Justice" About

More information

CMP 338: Third Class

CMP 338: Third Class CMP 338: Third Class HW 2 solution Conversion between bases The TINY processor Abstraction and separation of concerns Circuit design big picture Moore s law and chip fabrication cost Performance What does

More information

Lecture 2: Metrics to Evaluate Systems

Lecture 2: Metrics to Evaluate Systems Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with AM, GM, HM Sign up for the class mailing list! Video

More information

CIS 371 Computer Organization and Design

CIS 371 Computer Organization and Design CIS 371 Computer Organization and Design Unit 13: Power & Energy Slides developed by Milo Mar0n & Amir Roth at the University of Pennsylvania with sources that included University of Wisconsin slides by

More information

CS 700: Quantitative Methods & Experimental Design in Computer Science

CS 700: Quantitative Methods & Experimental Design in Computer Science CS 700: Quantitative Methods & Experimental Design in Computer Science Sanjeev Setia Dept of Computer Science George Mason University Logistics Grade: 35% project, 25% Homework assignments 20% midterm,

More information

Anton: A Specialized ASIC for Molecular Dynamics

Anton: A Specialized ASIC for Molecular Dynamics Anton: A Specialized ASIC for Molecular Dynamics Martin M. Deneroff, David E. Shaw, Ron O. Dror, Jeffrey S. Kuskin, Richard H. Larson, John K. Salmon, and Cliff Young D. E. Shaw Research Marty.Deneroff@DEShawResearch.com

More information

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Performance, Power & Energy ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Recall: Goal of this class Performance Reconfiguration Power/ Energy H. So, Sp10 Lecture 3 - ELEC8106/6102 2 PERFORMANCE EVALUATION

More information

Welcome to MCS 572. content and organization expectations of the course. definition and classification

Welcome to MCS 572. content and organization expectations of the course. definition and classification Welcome to MCS 572 1 About the Course content and organization expectations of the course 2 Supercomputing definition and classification 3 Measuring Performance speedup and efficiency Amdahl s Law Gustafson

More information

Today. ESE532: System-on-a-Chip Architecture. Energy. Message. Preclass Challenge: Power. Energy Today s bottleneck What drives Efficiency of

Today. ESE532: System-on-a-Chip Architecture. Energy. Message. Preclass Challenge: Power. Energy Today s bottleneck What drives Efficiency of ESE532: System-on-a-Chip Architecture Day 22: April 10, 2017 Today Today s bottleneck What drives Efficiency of Processors, FPGAs, accelerators 1 2 Message dominates Including limiting performance Make

More information

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 Multi-processor vs. Multi-computer architecture µp vs. DSP RISC vs. DSP RISC Reduced-instruction-set Register-to-register operation Higher throughput by using

More information

Lecture 4: Technology Scaling

Lecture 4: Technology Scaling Digital Integrated Circuits (83-313) Lecture 4: Technology Scaling Semester B, 2016-17 Lecturer: Dr. Adam Teman TAs: Itamar Levi, Robert Giterman 2 April 2017 Disclaimer: This course was prepared, in its

More information

Administrative Stuff

Administrative Stuff EE141- Spring 2004 Digital Integrated Circuits Lecture 30 PERSPECTIVES 1 Administrative Stuff Homework 10 posted just for practice. No need to turn in (hw 9 due today). Normal office hours next week. HKN

More information

Modern Computer Architecture

Modern Computer Architecture Modern Computer Architecture Lecture1 Fundamentals of Quantitative Design and Analysis (II) Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University 1.4 Trends in Technology Logic: transistor density 35%/year,

More information

Lower Bounds on Algorithm Energy Consumption: Current Work and Future Directions. March 1, 2013

Lower Bounds on Algorithm Energy Consumption: Current Work and Future Directions. March 1, 2013 Lower Bounds on Algorithm Energy Consumption: Current Work and Future Directions James Demmel, Andrew Gearhart, Benjamin Lipshitz and Oded Schwartz Electrical Engineering and Computer Sciences University

More information

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah PERFORMANCE METRICS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Jan. 17 th : Homework 1 release (due on Jan.

More information

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization)

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) Schodinger equation: Hψ = Eψ Choose a basis set of wave functions Two cases: Orthonormal

More information

Some thoughts about energy efficient application execution on NEC LX Series compute clusters

Some thoughts about energy efficient application execution on NEC LX Series compute clusters Some thoughts about energy efficient application execution on NEC LX Series compute clusters G. Wellein, G. Hager, J. Treibig, M. Wittmann Erlangen Regional Computing Center & Department of Computer Science

More information

Last Lecture. Power Dissipation CMOS Scaling. EECS 141 S02 Lecture 8

Last Lecture. Power Dissipation CMOS Scaling. EECS 141 S02 Lecture 8 EECS 141 S02 Lecture 8 Power Dissipation CMOS Scaling Last Lecture CMOS Inverter loading Switching Performance Evaluation Design optimization Inverter Sizing 1 Today CMOS Inverter power dissipation» Dynamic»

More information

MICROPROCESSOR REPORT. THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE

MICROPROCESSOR REPORT.   THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE MICROPROCESSOR www.mpronline.com REPORT THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE ENERGY COROLLARIES TO AMDAHL S LAW Analyzing the Interactions Between Parallel Execution and Energy Consumption By

More information

SP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay

SP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay SP-CNN: A Scalable and Programmable CNN-based Accelerator Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay Motivation Power is a first-order design constraint, especially for embedded devices. Certain

More information

Post Von Neumann Computing

Post Von Neumann Computing Post Von Neumann Computing Matthias Kaiserswerth Hasler Stiftung (formerly IBM Research) 1 2014 IBM Corporation Foundation Purpose Support information and communication technologies (ICT) to advance Switzerland

More information

From Physics to Logic

From Physics to Logic From Physics to Logic This course aims to introduce you to the layers of abstraction of modern computer systems. We won t spend much time below the level of bits, bytes, words, and functional units, but

More information

CS 100: Parallel Computing

CS 100: Parallel Computing CS 100: Parallel Computing Chris Kauffman Week 12 Logistics Upcoming HW 5: Due Friday by 11:59pm HW 6: Up by Early Next Week Moore s Law: CPUs get faster Smaller transistors closer together Smaller transistors

More information

Scaling of MOS Circuits. 4. International Technology Roadmap for Semiconductors (ITRS) 6. Scaling factors for device parameters

Scaling of MOS Circuits. 4. International Technology Roadmap for Semiconductors (ITRS) 6. Scaling factors for device parameters 1 Scaling of MOS Circuits CONTENTS 1. What is scaling?. Why scaling? 3. Figure(s) of Merit (FoM) for scaling 4. International Technology Roadmap for Semiconductors (ITRS) 5. Scaling models 6. Scaling factors

More information

FPGA Implementation of a Predictive Controller

FPGA Implementation of a Predictive Controller FPGA Implementation of a Predictive Controller SIAM Conference on Optimization 2011, Darmstadt, Germany Minisymposium on embedded optimization Juan L. Jerez, George A. Constantinides and Eric C. Kerrigan

More information

τ gd =Q/I=(CV)/I I d,sat =(µc OX /2)(W/L)(V gs -V TH ) 2 ESE534 Computer Organization Today At Issue Preclass 1 Energy and Delay Tradeoff

τ gd =Q/I=(CV)/I I d,sat =(µc OX /2)(W/L)(V gs -V TH ) 2 ESE534 Computer Organization Today At Issue Preclass 1 Energy and Delay Tradeoff ESE534 Computer Organization Today Day 8: February 10, 2010 Energy, Power, Reliability Energy Tradeoffs? Voltage limits and leakage? Variations Transients Thermodynamics meets Information Theory (brief,

More information

Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method

Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method NUCLEAR SCIENCE AND TECHNIQUES 25, 0501 (14) Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method XU Qi ( 徐琪 ), 1, YU Gang-Lin ( 余纲林 ), 1 WANG Kan ( 王侃 ),

More information

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics)

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Eftychios Sifakis CS758 Guest Lecture - 19 Sept 2012 Introduction Linear systems

More information

ECE 574 Cluster Computing Lecture 20

ECE 574 Cluster Computing Lecture 20 ECE 574 Cluster Computing Lecture 20 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 18 April 2017 Announcements Project updates, related work. HW#8 was due Big Data: Last HW not

More information

Julian Merten. GPU Computing and Alternative Architecture

Julian Merten. GPU Computing and Alternative Architecture Future Directions of Cosmological Simulations / Edinburgh 1 / 16 Julian Merten GPU Computing and Alternative Architecture Institut für Theoretische Astrophysik Zentrum für Astronomie Universität Heidelberg

More information

Mark Redekopp, All rights reserved. Lecture 1 Slides. Intro Number Systems Logic Functions

Mark Redekopp, All rights reserved. Lecture 1 Slides. Intro Number Systems Logic Functions Lecture Slides Intro Number Systems Logic Functions EE 0 in Context EE 0 EE 20L Logic Design Fundamentals Logic Design, CAD Tools, Lab tools, Project EE 357 EE 457 Computer Architecture Using the logic

More information

Accelerating linear algebra computations with hybrid GPU-multicore systems.

Accelerating linear algebra computations with hybrid GPU-multicore systems. Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)

More information

Block AIR Methods. For Multicore and GPU. Per Christian Hansen Hans Henrik B. Sørensen. Technical University of Denmark

Block AIR Methods. For Multicore and GPU. Per Christian Hansen Hans Henrik B. Sørensen. Technical University of Denmark Block AIR Methods For Multicore and GPU Per Christian Hansen Hans Henrik B. Sørensen Technical University of Denmark Model Problem and Notation Parallel-beam 3D tomography exact solution exact data noise

More information

EEC 118 Lecture #6: CMOS Logic. Rajeevan Amirtharajah University of California, Davis Jeff Parkhurst Intel Corporation

EEC 118 Lecture #6: CMOS Logic. Rajeevan Amirtharajah University of California, Davis Jeff Parkhurst Intel Corporation EEC 118 Lecture #6: CMOS Logic Rajeevan mirtharajah University of California, Davis Jeff Parkhurst Intel Corporation nnouncements Quiz 1 today! Lab 2 reports due this week Lab 3 this week HW 3 due this

More information

Performance, Power & Energy

Performance, Power & Energy Recall: Goal of this class Performance, Power & Energy ELE8106/ELE6102 Performance Reconfiguration Power/ Energy Spring 2010 Hayden Kwok-Hay So H. So, Sp10 Lecture 3 - ELE8106/6102 2 What is good performance?

More information

In 1980, the yield = 48% and the Die Area = 0.16 from figure In 1992, the yield = 48% and the Die Area = 0.97 from figure 1.31.

In 1980, the yield = 48% and the Die Area = 0.16 from figure In 1992, the yield = 48% and the Die Area = 0.97 from figure 1.31. CS152 Homework 1 Solutions Spring 2004 1.51 Yield = 1 / ((1 + (Defects per area * Die Area / 2))^2) Thus, if die area increases, defects per area must decrease. 1.52 Solving the yield equation for Defects

More information

FPGA-based Niederreiter Cryptosystem using Binary Goppa Codes

FPGA-based Niederreiter Cryptosystem using Binary Goppa Codes FPGA-based Niederreiter Cryptosystem using Binary Goppa Codes Wen Wang 1, Jakub Szefer 1, and Ruben Niederhagen 2 1. Yale University, USA 2. Fraunhofer Institute SIT, Germany April 9, 2018 PQCrypto 2018

More information

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering TIMING ANALYSIS Overview Circuits do not respond instantaneously to input changes

More information

Lecture 3, Performance

Lecture 3, Performance Lecture 3, Performance Repeating some definitions: CPI Clocks Per Instruction MHz megahertz, millions of cycles per second MIPS Millions of Instructions Per Second = MHz / CPI MOPS Millions of Operations

More information

LEADING THE EVOLUTION OF COMPUTE MARK KACHMAREK HPC STRATEGIC PLANNING MANAGER APRIL 17, 2018

LEADING THE EVOLUTION OF COMPUTE MARK KACHMAREK HPC STRATEGIC PLANNING MANAGER APRIL 17, 2018 LEADING THE EVOLUTION OF COMPUTE MARK KACHMAREK HPC STRATEGIC PLANNING MANAGER APRIL 17, 2018 INTEL S RESEARCH EFFORTS COMPONENTS RESEARCH INTEL LABS ENABLING MOORE S LAW DEVELOPING NOVEL INTEGRATION ENABLING

More information

Numbering Systems. Computational Platforms. Scaling and Round-off Noise. Special Purpose. here that is dedicated architecture

Numbering Systems. Computational Platforms. Scaling and Round-off Noise. Special Purpose. here that is dedicated architecture Computational Platforms Numbering Systems Basic Building Blocks Scaling and Round-off Noise Computational Platforms Viktor Öwall viktor.owall@eit.lth.seowall@eit lth Standard Processors or Special Purpose

More information

Summary of Hyperion Research's First QC Expert Panel Survey Questions/Answers. Bob Sorensen, Earl Joseph, Steve Conway, and Alex Norton

Summary of Hyperion Research's First QC Expert Panel Survey Questions/Answers. Bob Sorensen, Earl Joseph, Steve Conway, and Alex Norton Summary of Hyperion Research's First QC Expert Panel Survey Questions/Answers Bob Sorensen, Earl Joseph, Steve Conway, and Alex Norton Hyperion s Quantum Computing Program Global Coverage of R&D Efforts

More information

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements.

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements. 1 2 Introduction Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements. Defines the precise instants when the circuit is allowed to change

More information

Power in Digital CMOS Circuits. Fruits of Scaling SpecInt 2000

Power in Digital CMOS Circuits. Fruits of Scaling SpecInt 2000 Power in Digital CMOS Circuits Mark Horowitz Computer Systems Laboratory Stanford University horowitz@stanford.edu Copyright 2004 by Mark Horowitz MAH 1 Fruits of Scaling SpecInt 2000 1000.00 100.00 10.00

More information

ACCELERATED LEARNING OF GAUSSIAN PROCESS MODELS

ACCELERATED LEARNING OF GAUSSIAN PROCESS MODELS ACCELERATED LEARNING OF GAUSSIAN PROCESS MODELS Bojan Musizza, Dejan Petelin, Juš Kocijan, Jožef Stefan Institute Jamova 39, Ljubljana, Slovenia University of Nova Gorica Vipavska 3, Nova Gorica, Slovenia

More information

WRF performance tuning for the Intel Woodcrest Processor

WRF performance tuning for the Intel Woodcrest Processor WRF performance tuning for the Intel Woodcrest Processor A. Semenov, T. Kashevarova, P. Mankevich, D. Shkurko, K. Arturov, N. Panov Intel Corp., pr. ak. Lavrentieva 6/1, Novosibirsk, Russia, 630090 {alexander.l.semenov,tamara.p.kashevarova,pavel.v.mankevich,

More information

CS-683: Advanced Computer Architecture Course Introduction

CS-683: Advanced Computer Architecture Course Introduction CS-683: Advanced Computer Architecture Course Introduction Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology

More information

GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications

GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications Christopher Rodrigues, David J. Hardy, John E. Stone, Klaus Schulten, Wen-Mei W. Hwu University of Illinois at Urbana-Champaign

More information

Goals for Performance Lecture

Goals for Performance Lecture Goals for Performance Lecture Understand performance, speedup, throughput, latency Relationship between cycle time, cycles/instruction (CPI), number of instructions (the performance equation) Amdahl s

More information

Performance Metrics & Architectural Adaptivity. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

Performance Metrics & Architectural Adaptivity. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Performance Metrics & Architectural Adaptivity ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So What are the Options? Power Consumption Activity factor (amount of circuit switching) Load Capacitance (size

More information

EEC 116 Lecture #5: CMOS Logic. Rajeevan Amirtharajah Bevan Baas University of California, Davis Jeff Parkhurst Intel Corporation

EEC 116 Lecture #5: CMOS Logic. Rajeevan Amirtharajah Bevan Baas University of California, Davis Jeff Parkhurst Intel Corporation EEC 116 Lecture #5: CMOS Logic Rajeevan mirtharajah Bevan Baas University of California, Davis Jeff Parkhurst Intel Corporation nnouncements Quiz 1 today! Lab 2 reports due this week Lab 3 this week HW

More information

Novel Devices and Circuits for Computing

Novel Devices and Circuits for Computing Novel Devices and Circuits for Computing UCSB 594BB Winter 2013 Lecture 4: Resistive switching: Logic Class Outline Material Implication logic Stochastic computing Reconfigurable logic Material Implication

More information

Introduction The Nature of High-Performance Computation

Introduction The Nature of High-Performance Computation 1 Introduction The Nature of High-Performance Computation The need for speed. Since the beginning of the era of the modern digital computer in the early 1940s, computing power has increased at an exponential

More information

CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues

CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan,

More information

Previously. ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems. Today. Variation Types. Fabrication

Previously. ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems. Today. Variation Types. Fabrication ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Previously Understand how to model transistor behavior Given that we know its parameters V dd, V th, t OX, C OX, W, L, N A Day

More information

Today. ESE532: System-on-a-Chip Architecture. Why Care? Message. Scaling. Why Care: Custom SoC

Today. ESE532: System-on-a-Chip Architecture. Why Care? Message. Scaling. Why Care: Custom SoC ESE532: System-on-a-Chip Architecture Day 21: April 5, 2017 VLSI Scaling 1 Today VLSI Scaling Rules Effects Historical/predicted scaling Variations (cheating) Limits Note: gory equations! goal is to understand

More information

New Exploration Frameworks for Temperature-Aware Design of MPSoCs. Prof. David Atienza

New Exploration Frameworks for Temperature-Aware Design of MPSoCs. Prof. David Atienza New Exploration Frameworks for Temperature-Aware Degn of MPSoCs Prof. David Atienza Dept Computer Architecture and Systems Engineering (DACYA) Complutense Univerty of Madrid, Spain Integrated Systems Lab

More information

Where Does Power Go in CMOS?

Where Does Power Go in CMOS? Power Dissipation Where Does Power Go in CMOS? Dynamic Power Consumption Charging and Discharging Capacitors Short Circuit Currents Short Circuit Path between Supply Rails during Switching Leakage Leaking

More information

Parallel Performance Theory - 1

Parallel Performance Theory - 1 Parallel Performance Theory - 1 Parallel Computing CIS 410/510 Department of Computer and Information Science Outline q Performance scalability q Analytical performance measures q Amdahl s law and Gustafson-Barsis

More information

A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem

A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem Abid Rafique, Nachiket Kapre, and George A. Constantinides Electrical and Electronic Engineering,

More information

Skew-Tolerant Circuit Design

Skew-Tolerant Circuit Design Skew-Tolerant Circuit Design David Harris David_Harris@hmc.edu December, 2000 Harvey Mudd College Claremont, CA Outline Introduction Skew-Tolerant Circuits Traditional Domino Circuits Skew-Tolerant Domino

More information

Continuous heat flow analysis. Time-variant heat sources. Embedded Systems Laboratory (ESL) Institute of EE, Faculty of Engineering

Continuous heat flow analysis. Time-variant heat sources. Embedded Systems Laboratory (ESL) Institute of EE, Faculty of Engineering Thermal Modeling, Analysis and Management of 2D Multi-Processor System-on-Chip Prof David Atienza Alonso Embedded Systems Laboratory (ESL) Institute of EE, Falty of Engineering Outline MPSoC thermal modeling

More information

Analytical Modeling of Parallel Systems

Analytical Modeling of Parallel Systems Analytical Modeling of Parallel Systems Chieh-Sen (Jason) Huang Department of Applied Mathematics National Sun Yat-sen University Thank Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar for providing

More information

Tradeoff between Reliability and Power Management

Tradeoff between Reliability and Power Management Tradeoff between Reliability and Power Management 9/1/2005 FORGE Lee, Kyoungwoo Contents 1. Overview of relationship between reliability and power management 2. Dakai Zhu, Rami Melhem and Daniel Moss e,

More information

EE 240B Spring Advanced Analog Integrated Circuits Lecture 2: MOS Transistor Models. Elad Alon Dept. of EECS

EE 240B Spring Advanced Analog Integrated Circuits Lecture 2: MOS Transistor Models. Elad Alon Dept. of EECS EE 240B Spring 2018 Advanced Analog Integrated Circuits Lecture 2: MOS Transistor Models Elad Alon Dept. of EECS Square Law Model? Assumptions made to come up with this model: Charge density determined

More information

Parallelization of the QC-lib Quantum Computer Simulator Library

Parallelization of the QC-lib Quantum Computer Simulator Library Parallelization of the QC-lib Quantum Computer Simulator Library Ian Glendinning and Bernhard Ömer VCPC European Centre for Parallel Computing at Vienna Liechtensteinstraße 22, A-19 Vienna, Austria http://www.vcpc.univie.ac.at/qc/

More information

Lecture 3, Performance

Lecture 3, Performance Repeating some definitions: Lecture 3, Performance CPI MHz MIPS MOPS Clocks Per Instruction megahertz, millions of cycles per second Millions of Instructions Per Second = MHz / CPI Millions of Operations

More information

3/10/2013. Lecture #1. How small is Nano? (A movie) What is Nanotechnology? What is Nanoelectronics? What are Emerging Devices?

3/10/2013. Lecture #1. How small is Nano? (A movie) What is Nanotechnology? What is Nanoelectronics? What are Emerging Devices? EECS 498/598: Nanocircuits and Nanoarchitectures Lecture 1: Introduction to Nanotelectronic Devices (Sept. 5) Lectures 2: ITRS Nanoelectronics Road Map (Sept 7) Lecture 3: Nanodevices; Guest Lecture by

More information

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers UT College of Engineering Tutorial Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers Stan Tomov 1, George Bosilca 1, and Cédric

More information

Analytical Modeling of Parallel Programs (Chapter 5) Alexandre David

Analytical Modeling of Parallel Programs (Chapter 5) Alexandre David Analytical Modeling of Parallel Programs (Chapter 5) Alexandre David 1.2.05 1 Topic Overview Sources of overhead in parallel programs. Performance metrics for parallel systems. Effect of granularity on

More information

Qualitative vs Quantitative metrics

Qualitative vs Quantitative metrics Qualitative vs Quantitative metrics Quantitative: hard numbers, measurable Time, Energy, Space Signal-to-Noise, Frames-per-second, Memory Usage Money (?) Qualitative: feelings, opinions Complexity: Simple,

More information

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics Lecture 23 Dealing with Interconnect Impact of Interconnect Parasitics Reduce Reliability Affect Performance Classes of Parasitics Capacitive Resistive Inductive 1 INTERCONNECT Dealing with Capacitance

More information

Performance Metrics for Computer Systems. CASS 2018 Lavanya Ramapantulu

Performance Metrics for Computer Systems. CASS 2018 Lavanya Ramapantulu Performance Metrics for Computer Systems CASS 2018 Lavanya Ramapantulu Eight Great Ideas in Computer Architecture Design for Moore s Law Use abstraction to simplify design Make the common case fast Performance

More information

Lecture 11. Advanced Dividers

Lecture 11. Advanced Dividers Lecture 11 Advanced Dividers Required Reading Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design Chapter 15 Variation in Dividers 15.3, Combinational and Array Dividers Chapter 16, Division

More information

Toward More Accurate Scaling Estimates of CMOS Circuits from 180 nm to 22 nm

Toward More Accurate Scaling Estimates of CMOS Circuits from 180 nm to 22 nm Toward More Accurate Scaling Estimates of CMOS Circuits from 180 nm to 22 nm Aaron Stillmaker, Zhibin Xiao, and Bevan Baas VLSI Computation Lab Department of Electrical and Computer Engineering University

More information

Datapath Component Tradeoffs

Datapath Component Tradeoffs Datapath Component Tradeoffs Faster Adders Previously we studied the ripple-carry adder. This design isn t feasible for larger adders due to the ripple delay. ʽ There are several methods that we could

More information

14.1. Unit 14. State Machine Design

14.1. Unit 14. State Machine Design 4. Unit 4 State Machine Design 4.2 Outcomes I can create a state diagram to solve a sequential problem I can implement a working state machine given a state diagram STATE MACHINES OVERVIEW 4.3 4.4 Review

More information

CRYPTOGRAPHIC COMPUTING

CRYPTOGRAPHIC COMPUTING CRYPTOGRAPHIC COMPUTING ON GPU Chen Mou Cheng Dept. Electrical Engineering g National Taiwan University January 16, 2009 COLLABORATORS Daniel Bernstein, UIC, USA Tien Ren Chen, Army Tanja Lange, TU Eindhoven,

More information

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Digital Logic

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Digital Logic Computer Science 324 Computer Architecture Mount Holyoke College Fall 2007 Topic Notes: Digital Logic Our goal for the next few weeks is to paint a a reasonably complete picture of how we can go from transistor

More information

Universität Dortmund UCHPC. Performance. Computing for Finite Element Simulations

Universität Dortmund UCHPC. Performance. Computing for Finite Element Simulations technische universität dortmund Universität Dortmund fakultät für mathematik LS III (IAM) UCHPC UnConventional High Performance Computing for Finite Element Simulations S. Turek, Chr. Becker, S. Buijssen,

More information

2.1. Unit 2. Digital Circuits (Logic)

2.1. Unit 2. Digital Circuits (Logic) 2.1 Unit 2 Digital Circuits (Logic) 2.2 Moving from voltages to 1's and 0's ANALOG VS. DIGITAL volts volts 2.3 Analog signal Signal Types Continuous time signal where each voltage level has a unique meaning

More information

Amdahl's Law. Execution time new = ((1 f) + f/s) Execution time. S. Then:

Amdahl's Law. Execution time new = ((1 f) + f/s) Execution time. S. Then: Amdahl's Law Useful for evaluating the impact of a change. (A general observation.) Insight: Improving a feature cannot improve performance beyond the use of the feature Suppose we introduce a particular

More information

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining Slide 1 EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining Slide 2 Topics Clocking Clock Parameters Latch Types Requirements for reliable clocking Pipelining Optimal pipelining

More information

Dense Arithmetic over Finite Fields with CUMODP

Dense Arithmetic over Finite Fields with CUMODP Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,

More information

Parallelization of the QC-lib Quantum Computer Simulator Library

Parallelization of the QC-lib Quantum Computer Simulator Library Parallelization of the QC-lib Quantum Computer Simulator Library Ian Glendinning and Bernhard Ömer September 9, 23 PPAM 23 1 Ian Glendinning / September 9, 23 Outline Introduction Quantum Bits, Registers

More information

Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning

Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning Yuchun Ma* Zhuoyuan Li* Jason Cong Xianlong Hong Glenn Reinman Sheqin Dong* Qiang Zhou *Department of Computer Science &

More information

Lecture 25. Dealing with Interconnect and Timing. Digital Integrated Circuits Interconnect

Lecture 25. Dealing with Interconnect and Timing. Digital Integrated Circuits Interconnect Lecture 25 Dealing with Interconnect and Timing Administrivia Projects will be graded by next week Project phase 3 will be announced next Tu.» Will be homework-like» Report will be combined poster Today

More information

2. Accelerated Computations

2. Accelerated Computations 2. Accelerated Computations 2.1. Bent Function Enumeration by a Circular Pipeline Implemented on an FPGA Stuart W. Schneider Jon T. Butler 2.1.1. Background A naive approach to encoding a plaintext message

More information

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

ESE 570: Digital Integrated Circuits and VLSI Fundamentals ESE 570: Digital Integrated Circuits and VLSI Fundamentals Lec 19: March 29, 2018 Memory Overview, Memory Core Cells Today! Charge Leakage/Charge Sharing " Domino Logic Design Considerations! Logic Comparisons!

More information