Reducing NVM Writes with Optimized Shadow Paging
1 Reducing NVM Writes with Optimized Shadow Paging. Yuanjiang Ni, Jishen Zhao, Daniel Bittman, Ethan L. Miller. Center for Research in Storage Systems, University of California, Santa Cruz.
2 Emerging Technology. Memory: byte-addressable, high speed, volatile, small capacity. Storage: block-addressable, slow, durable, large capacity. BNVM (byte-addressable NVM) is shown between the two.
3 New Storage Architecture. [Diagram: DRAM and BNVM are accessed with cache-line load/store and cache-line flush; HDD/SSD is accessed at page granularity through read()/write(), fsync(), etc.]
4 Crash Consistency. Before: A = 1,000,000, B = 1,000,000. Transaction: XBEGIN; A.account -= 500,000; B.account += 500,000; XEND. After a crash in the middle of the transaction: A = 500,000, B = 1,000,000. A, B lost money! Crash consistency is a must!
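The failure on this slide can be reproduced with a small simulation. This is only an illustrative sketch: the dictionary stands in for NVM, and `SimulatedCrash` and the `crash_after_debit` flag are hypothetical names introduced here, not part of the talk.

```python
class SimulatedCrash(Exception):
    """Models power loss in the middle of the transfer (hypothetical helper)."""


def transfer(nvm, amount, crash_after_debit=False):
    # Each store reaches "NVM" on its own; nothing makes the pair atomic.
    nvm["A"] -= amount
    if crash_after_debit:
        raise SimulatedCrash()
    nvm["B"] += amount


nvm = {"A": 1_000_000, "B": 1_000_000}
try:
    transfer(nvm, 500_000, crash_after_debit=True)
except SimulatedCrash:
    pass  # after "reboot", only the state left in nvm survives

print(nvm)  # {'A': 500000, 'B': 1000000}
```

The debit is durable but the credit never happened, so 500,000 has vanished from the combined balance.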
5-7 Opportunities. Leverage byte-addressability, e.g., fine-grained logging. Leverage virtual memory: indirection is necessary for many techniques; can we directly leverage virtual memory indirection? Explore hardware support: Intel proposes instructions such as clwb specifically for persistent memory. Other hardware support?
8-11 Inefficiencies of Existing Approaches. Extra writes to NVM are bad for both performance and endurance. [Figure legend: extra writes; newly written data; data from the last commit. Each panel shows cache lines A, B, C, D.] Logging: the actual data is written twice, once to the log and once to the data. Shadow paging (pages P0 and P1): unmodified data must be copied. Our approach, OSP (pages P0 and P1): [figure panel].
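A rough count makes the comparison concrete. The sketch below assumes 4 KiB pages and 64-byte cache lines (64 lines per page), which the slides do not state, and it ignores log headers, metadata, and the journaling and consolidation costs that OSP pays separately. The function name `line_writes` is made up for this example.

```python
LINES_PER_PAGE = 4096 // 64  # assumption: 4 KiB page, 64-byte cache line


def line_writes(modified_lines, lines_per_page=LINES_PER_PAGE):
    """Cache-line writes to NVM for updating `modified_lines` lines of one page."""
    if not 0 <= modified_lines <= lines_per_page:
        raise ValueError("modified_lines must be between 0 and lines_per_page")
    return {
        # Logging: every modified line is written to the log and to the data.
        "logging": 2 * modified_lines,
        # Shadow paging: the whole page is rewritten (new lines plus copies).
        "shadow_paging": lines_per_page if modified_lines else 0,
        # OSP: only the modified lines are written.
        "osp": modified_lines,
    }


print(line_writes(4))  # {'logging': 8, 'shadow_paging': 64, 'osp': 4}
```

Under these assumptions, a transaction that touches 4 lines of a page costs 8 line writes with logging, 64 with shadow paging, and 4 with cache-line-level tracking.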
12 Cache-line Level Mapping. Track modifications at the cache-line level? Can't simply reduce the page size!
13 Cache-line Level Mapping. Two physical pages, P0 and P1, with two bits per cache line. Committed bit: where is the old state? Updated bit: has this cache line been updated? Only required when pages are being actively updated!
14-16 TLB Extension. Wider TLB entry: committed bitmap, updated bitmap, additional PPN. Minimal impact on run-time performance: requires only a few gate delays, and is done in parallel with cache access (e.g., VIPT caches). No need to change the PTE: the additional information is required only when pages are actively being updated.
17 Example. [Figure: physical pages P0 and P1, and a wider TLB entry with fields VPN, P0, P1, Committed, Updated.]
18 Read cache line 0. Read from P(committed_bit XOR updated_bit).
19 Update cache line 0. Writes go to P(committed_bit XOR 1); then set the updated_bit.
20 Update cache line 1. Writes go to P(committed_bit XOR 1); then set the updated_bit.
21 Commit. committed bitmap = (committed bitmap XOR updated bitmap); then clear the updated bitmap. [Figure: pages and TLB entry before and after.]
22 Abort. Clear the updated bitmap. [Figure: pages and TLB entry before and after.]
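The read, update, commit, and abort rules from slides 18 to 22 can be modeled in software as follows. This is a minimal sketch, not the hardware design: the class name `OspPage`, the use of Python lists for the bitmaps, and the 64-line default are assumptions made for illustration.

```python
class OspPage:
    """One virtual page backed by two physical pages, P0 and P1."""

    def __init__(self, num_lines=64):
        self.pages = [[None] * num_lines, [None] * num_lines]  # P0, P1
        self.committed = [0] * num_lines  # which page holds the committed copy
        self.updated = [0] * num_lines    # 1 if written since the last commit

    def read(self, line):
        # Read from P(committed_bit XOR updated_bit).
        return self.pages[self.committed[line] ^ self.updated[line]][line]

    def write(self, line, value):
        # Writes go to P(committed_bit XOR 1); then set the updated bit.
        self.pages[self.committed[line] ^ 1][line] = value
        self.updated[line] = 1

    def commit(self):
        # committed bitmap = committed bitmap XOR updated bitmap.
        self.committed = [c ^ u for c, u in zip(self.committed, self.updated)]
        self.updated = [0] * len(self.updated)

    def abort(self):
        # Dropping the updated bits makes reads fall back to the old copies.
        self.updated = [0] * len(self.updated)


page = OspPage(num_lines=4)
page.write(0, "A")
page.commit()
page.write(0, "A2")          # lands in the other physical page
in_flight = page.read(0)     # the transaction sees its own write
page.abort()
after_abort = page.read(0)   # the committed value is untouched
print(in_flight, after_abort)  # A2 A
```

Note that neither commit nor abort copies any data; both only flip or clear bits, which is where the savings over logging and conventional shadow paging come from.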
23 Page Consolidation. Keeping two physical pages per virtual page can waste memory space. To reduce the storage cost, consolidate virtual pages that are not being actively updated: copy the valid data into one page and free the other one. TLB eviction identifies inactive virtual pages. Page consolidation is not a per-transaction overhead.
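Consolidation can be sketched as a merge driven by the committed bitmap. The sketch assumes no transaction is open on the page (the updated bitmap is all zeros) and always keeps P0; the slides do not say which page survives, so that choice and the function name `consolidate` are assumptions.

```python
def consolidate(p0, p1, committed):
    """Merge the committed lines of P0 and P1 into P0.

    committed[i] == 1 means the committed copy of line i lives in P1.
    Returns the surviving page and the reset committed bitmap;
    the caller can then free P1.
    """
    for i, bit in enumerate(committed):
        if bit:
            p0[i] = p1[i]  # copy the valid line across
    return p0, [0] * len(committed)


p0 = ["a", "b_old", "c", "d_old"]
p1 = ["a_stale", "b", None, "d"]
merged, bitmap = consolidate(p0, p1, [0, 1, 0, 1])
print(merged, bitmap)  # ['a', 'b', 'c', 'd'] [0, 0, 0, 0]
```

A real implementation could instead keep whichever page already holds more committed lines, which would reduce the number of lines copied.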
24 Multi-page Atomicity. Consistent state table: one entry per VPN (V1, V2, ...) holding its committed bitmap. Can't atomically update separate locations in place.
25 Lightweight Journaling. Consistent state table: VPN (V1, V2) and committed bitmap. Journal, completed: (V1, Bitmap1), (V2, Bitmap2), TX-END. Journal, uncompleted: (V2, Bitmap2) with no TX-END. Lightweight and not a per-update overhead!
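One way to read this slide is as a small redo journal for the consistent state table. The sketch below follows that reading and is an assumption, not the authors' stated protocol: the helper names are hypothetical, the journal is a plain list, and the flushes and fences needed to order the journal writes on real hardware are omitted.

```python
TX_END = "TX-END"


def apply_journal(journal, table):
    """Copy every journaled (VPN, bitmap) record into the table, then truncate."""
    for record in journal:
        if record != TX_END:
            vpn, bitmap = record
            table[vpn] = bitmap
    journal.clear()


def commit_pages(journal, table, new_bitmaps):
    """Journal all records, seal them with TX-END, then update the table."""
    for vpn, bitmap in new_bitmaps.items():
        journal.append((vpn, bitmap))
    journal.append(TX_END)
    apply_journal(journal, table)


def recover(journal, table):
    """After a crash: replay a completed journal, discard an uncompleted one."""
    if journal and journal[-1] == TX_END:
        apply_journal(journal, table)
    else:
        journal.clear()


# Crash before TX-END was written: the table keeps its old bitmaps.
table = {"V1": 0b0000, "V2": 0b0000}
journal = [("V1", 0b0101), ("V2", 0b0011)]
recover(journal, table)
print(table)  # {'V1': 0, 'V2': 0}

# Crash after TX-END: both entries are updated together.
journal = [("V1", 0b0101), ("V2", 0b0011), TX_END]
recover(journal, table)
print(table)  # {'V1': 5, 'V2': 3}
```

Only one small record per modified page is journaled at commit, not the data itself, which matches the slide's point that this is not a per-update overhead.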
26 Experiment Setup. Based on McSimA+ with a 64-entry L1 DTLB. Transactional workloads: array swap (SPS), hash table (HT), red-black tree (RBT), B-tree (BT). *-uni: inserts/deletes in a uniformly random fashion. *-zipf: inserts/deletes following a Zipf distribution. 1G to 4G footprint. Metric: CPU flushes.
27 CPU Flushes. [Chart: CPU flushes (normalized) for SPS, HT-uni, HT-zipf, RBT-uni, RBT-zipf, BT-uni, BT-zipf; series: baseline (undo-log) and OSP.] OSP reduces the number of CPU flushes by 1.6x on average.
28 Breakdown. [Chart: CPU flushes (normalized), broken down into in-place, journaling, and consolidation, for SPS, HT-uni, HT-zipf, RBT-uni, RBT-zipf, BT-uni, BT-zipf.] OSP nearly eliminates the consistency cost for workloads with locality.
29 Discussion. Limitations: the size of a transaction is limited by the TLB capacity (fallback path); TLB coherence for multi-threaded processes (overhead, correctness); working with virtual caches.
30 Conclusion. Use the virtual memory system to implement efficient transactional updates and avoid the extra copies required by logging. Keep two copies of each page being modified. Track modifications at the cache-line level. Avoid the inefficiencies of traditional shadow paging. Small changes to hardware: TLB extension. Preliminary simulation shows great promise.
31 Questions. Collaborators: Yuanjiang Ni, Jishen Zhao, Daniel Bittman, Ethan Miller.