DMP. Deterministic Shared Memory Multiprocessing. Presenter: Wu, Weiyi Yale University
|
|
- Job Garrett
- 6 years ago
- Views:
Transcription
1 DMP Deterministic Shared Memory Multiprocessing 1 Presenter: Wu, Weiyi Yale University
2 Outline What is determinism? How to make execution deterministic? What s the overhead of determinism? 2
3 What Is Determinism? The SAME input, the SAME output The same execution sequence (with Linearizability) (strong) The same resource-accessing ordering (weak) 3
4 (Weak) Determinism ld X ld X ld X ld Y ld Y st S st S ld Y Time Flow st S st T st X st T st T Data Dependency st X st X st Y st Y st Y Concurrent Program Execution 1 Execution 2 4
5 Non-Determinism ld X ld X ld X ld Y ld Y st S st S ld Y Time Flow st S st T st T st T st X Communications st X st X st Y st Y st Y Concurrent Program Execution 1 Execution 2 5
6 Non-determinism in Non-Determinism M barnes 2M 3M 4M 5M Non-Determinism Practice 1M ocean-contig 2M 3M 4M 5M Time (Insns.) Time (Insns.) Figure 3. Amount of nondeterminism over the execution of barnes and ocean-contig. The x axis is the position in the execution where each sample of 100,000 instructions was taken. The y axis is ND (Eq. 1) computed for each sample in the execution. 6
7 Is Record n Replay Deterministic? Yes! When the log is replayed deterministically, the replayed execution is definitely deterministic No... It cannot deploy to any machines and still run deterministically for any input 7
8 How to Execute Deterministically? Eliminating all instruction communications Deterministically arrange all communications Read-after-Write Write-after-Write, Write-after-Read 8
9 DMP-Serial No concurrency! Make it single-threaded! Programs are divided into quanta All processors act as one Only execute when hold token Deterministically pass token when quanta end 9
10 Example of DMP-Serial ld X ld X ld Y ld X ld Y st S Time Flow st S st S ld Y st T st X st T st T Token Pass st Y st X st Y st X st Y Quantum Concurrent Program Small Quanta Bigger Quanta 10
11 Recovering Parallelism DMP-Serial is way too strong Weak determinism is also acceptable Instructions without communication can execute concurrently Deterministically schedule communications 11
12 DMP-ShTab Break quantum into parallel prefix and serial suffix (dynamically), do prefix concurrently Deterministically control shared memory access Only read from Shared memory position Only write to Private memory position 12
13 Example of DMP-ShTab ld X ld Y ld X ld X st S st S ld Y st S ld Y Time Flow st T st T st T st X st X st Y Token Pass st Y st X st Y Quantum Concurrent Program DMP-Serial DMP-ShTab 13
14 Similarity to MESI coherence protocol Probe Write Hit Reset Invalid Read Miss, Exclusive Exclusive Read Hit Read Miss, Shared Probe Write Hit Probe Read Hit Probe Write Hit Write Hit Read Hit Probe Read Hit Shared Probe Read Hit Write Hit Write Miss Modified Read Hit Write Hit 14
15 Similarity to MESI coherence protocol Probe Write Hit Reset Invalid Read Miss, Exclusive Exclusive Read Hit Read Miss, Shared Probe Write Hit Probe Read Hit Probe Write Hit Write Hit Read Hit Probe Read Hit Shared Probe Read Hit Write Hit Write Miss Modified Read Hit Write Hit Shared Private 14
16 DMP-TM Atomicity, isolation and deterministic total order of quanta will guarantee determinism Make quantum transaction to explicitly execute concurrently Deterministically commit Abort and re-execute latter quantum when conflicting 15
17 DMP-TMFwd Read-after-Write will be no conflict if propagating modified values in time Forward writes across transactions Need to abort infected transactions when aborting 16
18 Example of DMP-TM* N memory operation deterministic token passing deterministic order atomic quantum uncommitted value flow P0 P1 P0 P1 1 ld B st C (squash) st C 1 ld B 2 3 ld T st C 2 3 ld B ld B 4 (a) Pure TM ld B 4 (b) TM w/ uncommitted value flow Figure 7. Recovering parallelism by executing quanta as memory transactions (a). Avoiding unnecessary squashes with un-committed data forwarding (b). 17
19 How to build quanta? Split code evenly - QB-Count The less waiting time, the better No spinning on lock - QB-SyncFollow (token is enough for synchronization) End quantum when finishing working on shared data others wait for - QB-Sharing 18
20 Example of QB-SyncFollowing forward immediately if a thread starts spinning on a lock. memory operation deterministic token passing N deterministic order atomic quantum P0 P1 P0 P1 lock L lock L 1 unlock L lock L... spin... (wasted work) 2 1 unlock L lock L (grab) 2 3 grab lock L 3 4 (a) Regular quantum break 4 (b) Better quantum break Figure 8. Example of a situation when better quantum breaking policies leads to better performance. 19
21 Example of QB-Sharing P0 P1 P0 P1 P0 P1 (a) Parallel execution (b) Deterministic serialized execution at a fine-grain (c) Deterministic serialized execution at a coarse-grain Figure 4. Deterministic serialization of memory operations. Dots are memory operations and dashed arrows are happens-before synchronization. 20
22 Overhead & Scalability Hw-DMP-TMFwd Hw-DMP-TM Hw-DMP-ShTab Hw-DMP-Serial runtime normalized to non-deterministic parallel execution barnes cholesky fft(p) 16 4 fmm 8 4 lu-nc 8 ocean-c(p) radix(p) 4 8 vlrend 16 water-sp SPLASH blacksch bodytr 4 8 fluid strmcl PARSEC gmean gmean Figure 9. Runtime overheads with 4, 8 and 16 threads. (P) indicates page-level conflict detection; line-level otherwise. up over word-granularity 500% 450% 400% 350% 300% 250% 200% 150% 100% Hw-DMP-TMFwd Hw-DMP-TM Hw-DMP-ShTab 21 eedup over QB-Count 180% 160% 140% 120% 100% 80% 60% 40% Hw-DMP-TMFwd Hw-DMP-TM Hw-DMP-ShTab Hw-DMP-Serial
23 Quanta Sensitivity Figure 9. Runtime overheads with 4, 8 and 16 threads. (P) indicates page-level conflict detection; line-level otherwise. however, Hw-DMP-Serial is less affected by quantum size. % speedup over word-granularity 500% 450% 400% 350% 300% 250% 200% 150% 100% 50% 0% -50% -100% % speedup over 1,000-insn quanta 40% 20% 0% -20% -40% -60% -80% -100% fft 2 X C lu-nc lu-nc ocean-c 2 X C vlrend radix SPLASH gmean vlrend 2 X C SPLASH gmean Hw-DMP-TMFwd Hw-DMP-TM Hw-DMP-ShTab Hw-DMP-TMFwd Hw-DMP-Serial Hw-DMP-TM Hw-DMP-ShTab 2 X C strmcl 2 X C PARSEC gmean Figure 10. Performance of 2,000 (2), 10,000 (X) and Figure100, Performance (C) instruction of page-granularity quanta, relative to conflict 1,000 detectiontion relative quanta. to instruc- line-granularity. bodytr fluid PARSEC gmean strmcl % speedup over QB-Count 180% 160% 140% 120% 100% 80% 60% 40% 20% 0% -20% -40% erate progress along the application s critical path. With 10,000-instruction quanta (Figure 13), the effects of the different quantum builders are more pronounced. In general, the quantum builders that take program synchronization into account (QB-SyncFollow and QB-SyncSharing) outperform those that do not. Hw-DMP-Serial and Hw-DMP-ShTab perform better with smarter quantum building, while the TMs sf ss barnes s sf ss radix(p) s sf ss SPLASH gmean Hw-DMP-TMFwd Hw-DMP-TM Hw-DMP-ShTab Hw-DMP-Serial s sf ss bodytr s sf ss strmcl s sf ss PARSEC gmean Figure 12. Performance of QB-Sharing (s), QB- SyncFollow (sf) and QB-SyncSharing (ss) quantum builders, relative to QB-Count, with 1,000-insn quanta. eedup over QB-Count % 160% 140% 120% 100% 80% 60% 40% Hw-DMP-TMFwd Hw-DMP-TM Hw-DMP-ShTab Hw-DMP-Serial
24 Software-DMP threads 4 threads 8 threads Sw-DMP-ShTab runtime normalized to nondeterministic parallel execution barnes fft fmm lu-c radix water-sp SPLASH gmean Figure 14. Runtime of Sw-DMP-ShTab relative to nondeterministic execution. y ability of Swity and energy trade-offs; (2) support for debugging; (3) interaction with operating system and I/O nondeterminism; 23 and
25 Conclusion (First?) implementation of DMP Demonstration of low-overhead determinism for concurrent programs Not pervasive No deterministic interrupts Can not save nondeterministic hardwares
26 Thanks~ 25
What is the Cost of Determinism? Cedomir Segulja, Tarek S. Abdelrahman University of Toronto
What is the Cost of Determinism? Cedomir Segulja, Tarek S. Abdelrahman University of Toronto Source: [Youtube] [Intel] Non-Determinism Same program + same input same output This is bad for Testing Too
More informationReal Time Operating Systems
Real Time Operating ystems Luca Abeni luca.abeni@unitn.it Interacting Tasks Until now, only independent tasks... A job never blocks or suspends A task only blocks on job termination In real world, jobs
More information1 st Semester 2007/2008
Chapter 17: System Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2007/2008 Slides baseados nos slides oficiais do livro Database System c Silberschatz, Korth and Sudarshan.
More informationCyrus: Unintrusive Application-Level Record-Replay for Replay Parallelism
Cyrus: Unintrusive Application-Level Record-Replay for Replay Parallelism Nima Honarmand, Nathan Dautenhahn, Josep Torrellas and Samuel T. King (UIUC) Gilles Pokam and Cristiano Pereira (Intel) iacoma.cs.uiuc.edu
More informationChe-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University
Che-Wei Chang chewei@mail.cgu.edu.tw Department of Computer Science and Information Engineering, Chang Gung University } 2017/11/15 Midterm } 2017/11/22 Final Project Announcement 2 1. Introduction 2.
More informationOperating Systems. VII. Synchronization
Operating Systems VII. Synchronization Ludovic Apvrille ludovic.apvrille@telecom-paristech.fr Eurecom, office 470 http://soc.eurecom.fr/os/ @OS Eurecom Outline Synchronization issues 2/22 Fall 2017 Institut
More informationCIS 4930/6930: Principles of Cyber-Physical Systems
CIS 4930/6930: Principles of Cyber-Physical Systems Chapter 11 Scheduling Hao Zheng Department of Computer Science and Engineering University of South Florida H. Zheng (CSE USF) CIS 4930/6930: Principles
More informationRollback-Recovery. Uncoordinated Checkpointing. p!! Easy to understand No synchronization overhead. Flexible. To recover from a crash:
Rollback-Recovery Uncoordinated Checkpointing Easy to understand No synchronization overhead p!! Flexible can choose when to checkpoint To recover from a crash: go back to last checkpoint restart How (not)to
More informationReal Time Operating Systems
Real Time Operating ystems hared Resources Luca Abeni Credits: Luigi Palopoli, Giuseppe Lipari, and Marco Di Natale cuola uperiore ant Anna Pisa -Italy Real Time Operating ystems p. 1 Interacting Tasks
More informationMICROPROCESSOR REPORT. THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE
MICROPROCESSOR www.mpronline.com REPORT THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE ENERGY COROLLARIES TO AMDAHL S LAW Analyzing the Interactions Between Parallel Execution and Energy Consumption By
More informationDistributed Transactions. CS5204 Operating Systems 1
Distributed Transactions CS5204 Operating Systems 1 . Transactions transactions T. TM T Distributed DBMS Model data manager DM T. TM T transaction manager network DM TM DM T T physical database CS 5204
More informationAnalytical Modeling of Parallel Programs (Chapter 5) Alexandre David
Analytical Modeling of Parallel Programs (Chapter 5) Alexandre David 1.2.05 1 Topic Overview Sources of overhead in parallel programs. Performance metrics for parallel systems. Effect of granularity on
More informationAnalytical Modeling of Parallel Systems
Analytical Modeling of Parallel Systems Chieh-Sen (Jason) Huang Department of Applied Mathematics National Sun Yat-sen University Thank Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar for providing
More informationTDDB68 Concurrent programming and operating systems. Lecture: CPU Scheduling II
TDDB68 Concurrent programming and operating systems Lecture: CPU Scheduling II Mikael Asplund, Senior Lecturer Real-time Systems Laboratory Department of Computer and Information Science Copyright Notice:
More informationCEC 450 Real-Time Systems
E 450 Real-ime Systems Lecture 4 Rate Monotonic heory Part September 7, 08 Sam Siewert Quiz Results 93% Average, 75 low, 00 high Goal is to learn what you re not learning yet Motivation to keep up with
More informationVector Lane Threading
Vector Lane Threading S. Rivoire, R. Schultz, T. Okuda, C. Kozyrakis Computer Systems Laboratory Stanford University Motivation Vector processors excel at data-level parallelism (DLP) What happens to program
More informationCHAPTER 5 - PROCESS SCHEDULING
CHAPTER 5 - PROCESS SCHEDULING OBJECTIVES To introduce CPU scheduling, which is the basis for multiprogrammed operating systems To describe various CPU-scheduling algorithms To discuss evaluation criteria
More informationAndrew Morton University of Waterloo Canada
EDF Feasibility and Hardware Accelerators Andrew Morton University of Waterloo Canada Outline 1) Introduction and motivation 2) Review of EDF and feasibility analysis 3) Hardware accelerators and scheduling
More informationUC Santa Barbara. Operating Systems. Christopher Kruegel Department of Computer Science UC Santa Barbara
Operating Systems Christopher Kruegel Department of Computer Science http://www.cs.ucsb.edu/~chris/ Many processes to execute, but one CPU OS time-multiplexes the CPU by operating context switching Between
More informationLecture 13. Real-Time Scheduling. Daniel Kästner AbsInt GmbH 2013
Lecture 3 Real-Time Scheduling Daniel Kästner AbsInt GmbH 203 Model-based Software Development 2 SCADE Suite Application Model in SCADE (data flow + SSM) System Model (tasks, interrupts, buses, ) SymTA/S
More informationSafety and Liveness. Thread Synchronization: Too Much Milk. Critical Sections. A Really Cool Theorem
Safety and Liveness Properties defined over an execution of a program Thread Synchronization: Too Much Milk Safety: nothing bad happens holds in every finite execution prefix Windows never crashes No patient
More informationEmbedded Systems Development
Embedded Systems Development Lecture 3 Real-Time Scheduling Dr. Daniel Kästner AbsInt Angewandte Informatik GmbH kaestner@absint.com Model-based Software Development Generator Lustre programs Esterel programs
More informationDense Arithmetic over Finite Fields with CUMODP
Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,
More information/ : Computer Architecture and Design
16.482 / 16.561: Computer Architecture and Design Summer 2015 Homework #5 Solution 1. Dynamic scheduling (30 points) Given the loop below: DADDI R3, R0, #4 outer: DADDI R2, R1, #32 inner: L.D F0, 0(R1)
More informationLecture Note #6: More on Task Scheduling EECS 571 Principles of Real-Time Embedded Systems Kang G. Shin EECS Department University of Michigan
Lecture Note #6: More on Task Scheduling EECS 571 Principles of Real-Time Embedded Systems Kang G. Shin EECS Department University of Michigan Note 6-1 Mars Pathfinder Timing Hiccups? When: landed on the
More informationShared resources. Sistemi in tempo reale. Giuseppe Lipari. Scuola Superiore Sant Anna Pisa -Italy
istemi in tempo reale hared resources Giuseppe Lipari cuola uperiore ant Anna Pisa -Italy inher.tex istemi in tempo reale Giuseppe Lipari 7/6/2005 12:35 p. 1/21 Interacting tasks Until now, we have considered
More informationSimulation of Process Scheduling Algorithms
Simulation of Process Scheduling Algorithms Project Report Instructor: Dr. Raimund Ege Submitted by: Sonal Sood Pramod Barthwal Index 1. Introduction 2. Proposal 3. Background 3.1 What is a Process 4.
More informationFall 2008 CSE Qualifying Exam. September 13, 2008
Fall 2008 CSE Qualifying Exam September 13, 2008 1 Architecture 1. (Quan, Fall 2008) Your company has just bought a new dual Pentium processor, and you have been tasked with optimizing your software for
More informationLSN 15 Processor Scheduling
LSN 15 Processor Scheduling ECT362 Operating Systems Department of Engineering Technology LSN 15 Processor Scheduling LSN 15 FCFS/FIFO Scheduling Each process joins the Ready queue When the current process
More informationLatches. October 13, 2003 Latches 1
Latches The second part of CS231 focuses on sequential circuits, where we add memory to the hardware that we ve already seen. Our schedule will be very similar to before: We first show how primitive memory
More informationNotation. Bounds on Speedup. Parallel Processing. CS575 Parallel Processing
Parallel Processing CS575 Parallel Processing Lecture five: Efficiency Wim Bohm, Colorado State University Some material from Speedup vs Efficiency in Parallel Systems - Eager, Zahorjan and Lazowska IEEE
More informationAllocate-On-Use Space Complexity of Shared-Memory Algorithms
1 2 3 4 5 6 7 8 9 10 Allocate-On-Use Space Complexity of Shared-Memory Algorithms James Aspnes 1 Yale University Department of Computer Science Bernhard Haeupler Carnegie Mellon School of Computer Science
More informationCMP N 301 Computer Architecture. Appendix C
CMP N 301 Computer Architecture Appendix C Outline Introduction Pipelining Hazards Pipelining Implementation Exception Handling Advanced Issues (Dynamic Scheduling, Out of order Issue, Superscalar, etc)
More informationReal-Time Software Transactional Memory: Contention Managers, Time Bounds, and Implementations
Real-Time Software Transactional Memory: Contention Managers, Time Bounds, and Implementations Mohammed El-Shambakey Dissertation Submitted to the Faculty of the Virginia Polytechnic Institute and State
More informationAccelerating Decoupled Look-ahead to Exploit Implicit Parallelism
Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism Raj Parihar Advisor: Prof. Michael C. Huang March 22, 2013 Raj Parihar Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism
More informationCSE 380 Computer Operating Systems
CSE 380 Computer Operating Systems Instructor: Insup Lee & Dianna Xu University of Pennsylvania, Fall 2003 Lecture Note 3: CPU Scheduling 1 CPU SCHEDULING q How can OS schedule the allocation of CPU cycles
More information1 Introduction. 1.1 The Problem Domain. Self-Stablization UC Davis Earl Barr. Lecture 1 Introduction Winter 2007
Lecture 1 Introduction 1 Introduction 1.1 The Problem Domain Today, we are going to ask whether a system can recover from perturbation. Consider a children s top: If it is perfectly vertically, you can
More informationThis Unit: Scheduling (Static + Dynamic) CIS 501 Computer Architecture. Readings. Review Example
This Unit: Scheduling (Static + Dnamic) CIS 50 Computer Architecture Unit 8: Static and Dnamic Scheduling Application OS Compiler Firmware CPU I/O Memor Digital Circuits Gates & Transistors! Previousl:!
More informationOrigami: Folding Warps for Energy Efficient GPUs
Origami: Folding Warps for Energy Efficient GPUs Mohammad Abdel-Majeed*, Daniel Wong, Justin Huang and Murali Annavaram* * University of Southern alifornia University of alifornia, Riverside Stanford University
More informationTransactional Information Systems:
Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery Gerhard Weikum and Gottfried Vossen 2002 Morgan Kaufmann ISBN 1-55860-508-8 Teamwork is essential.
More informationDeadlock Ezio Bartocci Institute for Computer Engineering
TECHNISCHE UNIVERSITÄT WIEN Fakultät für Informatik Cyber-Physical Systems Group Deadlock Ezio Bartocci Institute for Computer Engineering ezio.bartocci@tuwien.ac.at Deadlock Permanent blocking of a set
More informationCounters. We ll look at different kinds of counters and discuss how to build them
Counters We ll look at different kinds of counters and discuss how to build them These are not only examples of sequential analysis and design, but also real devices used in larger circuits 1 Introducing
More informationLab Course: distributed data analytics
Lab Course: distributed data analytics 01. Threading and Parallelism Nghia Duong-Trung, Mohsan Jameel Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim, Germany International
More informationECE 571 Advanced Microprocessor-Based Design Lecture 10
ECE 571 Advanced Microprocessor-Based Design Lecture 10 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 23 February 2017 Announcements HW#5 due HW#6 will be posted 1 Oh No, More
More informationINF 4140: Models of Concurrency Series 3
Universitetet i Oslo Institutt for Informatikk PMA Olaf Owe, Martin Steffen, Toktam Ramezani INF 4140: Models of Concurrency Høst 2016 Series 3 14. 9. 2016 Topic: Semaphores (Exercises with hints for solution)
More informationParallel Performance Theory - 1
Parallel Performance Theory - 1 Parallel Computing CIS 410/510 Department of Computer and Information Science Outline q Performance scalability q Analytical performance measures q Amdahl s law and Gustafson-Barsis
More informationExtending Parallel Scalability of LAMMPS and Multiscale Reactive Molecular Simulations
September 18, 2012 Extending Parallel Scalability of LAMMPS and Multiscale Reactive Molecular Simulations Yuxing Peng, Chris Knight, Philip Blood, Lonnie Crosby, and Gregory A. Voth Outline Proton Solvation
More informationDeadlock. CSE 2431: Introduction to Operating Systems Reading: Chap. 7, [OSC]
Deadlock CSE 2431: Introduction to Operating Systems Reading: Chap. 7, [OSC] 1 Outline Resources Deadlock Deadlock Prevention Deadlock Avoidance Deadlock Detection Deadlock Recovery 2 Review: Synchronization
More informationPerformance and Scalability. Lars Karlsson
Performance and Scalability Lars Karlsson Outline Complexity analysis Runtime, speedup, efficiency Amdahl s Law and scalability Cost and overhead Cost optimality Iso-efficiency function Case study: matrix
More informationarxiv: v1 [cs.dc] 9 Feb 2015
Why Transactional Memory Should Not Be Obstruction-Free Petr Kuznetsov 1 Srivatsan Ravi 2 1 Télécom ParisTech 2 TU Berlin arxiv:1502.02725v1 [cs.dc] 9 Feb 2015 Abstract Transactional memory (TM) is an
More informationModule 5: CPU Scheduling
Module 5: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation 5.1 Basic Concepts Maximum CPU utilization obtained
More informationarxiv: v2 [cs.dc] 20 Nov 2017
The Optimal Pessimistic Transactional Memory Algorithm arxiv:1605.01361v2 [cs.dc] 20 Nov 2017 Paweł T. Wojciechowski, Konrad Siek {pawel.t.wojciechowski,konrad.siek}@cs.put.edu.pl Institute of Computing
More informationAbstractions and Decision Procedures for Effective Software Model Checking
Abstractions and Decision Procedures for Effective Software Model Checking Prof. Natasha Sharygina The University of Lugano, Carnegie Mellon University Microsoft Summer School, Moscow, July 2011 Lecture
More informationReal-Time Scheduling and Resource Management
ARTIST2 Summer School 2008 in Europe Autrans (near Grenoble), France September 8-12, 2008 Real-Time Scheduling and Resource Management Lecturer: Giorgio Buttazzo Full Professor Scuola Superiore Sant Anna
More informationLogic Design II (17.342) Spring Lecture Outline
Logic Design II (17.342) Spring 2012 Lecture Outline Class # 10 April 12, 2012 Dohn Bowden 1 Today s Lecture First half of the class Circuits for Arithmetic Operations Chapter 18 Should finish at least
More informationChapter 6: CPU Scheduling
Chapter 6: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation 6.1 Basic Concepts Maximum CPU utilization obtained
More informationEnergy-efficient Mapping of Big Data Workflows under Deadline Constraints
Energy-efficient Mapping of Big Data Workflows under Deadline Constraints Presenter: Tong Shu Authors: Tong Shu and Prof. Chase Q. Wu Big Data Center Department of Computer Science New Jersey Institute
More informationSpeculative Parallelism in Cilk++
Speculative Parallelism in Cilk++ Ruben Perez & Gregory Malecha MIT May 11, 2010 Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 1 / 33 Parallelizing Embarrassingly Parallel
More informationLoop Parallelization Techniques and dependence analysis
Loop Parallelization Techniques and dependence analysis Data-Dependence Analysis Dependence-Removing Techniques Parallelizing Transformations Performance-enchancing Techniques 1 When can we run code in
More informationModule 9: "Introduction to Shared Memory Multiprocessors" Lecture 17: "Introduction to Cache Coherence Protocols" Invalidation vs.
Invalidation vs. Update Sharing patterns Migratory hand-off States of a cache line Stores MSI protocol MSI example MESI protocol MESI example MOESI protocol Hybrid inval+update file:///e /parallel_com_arch/lecture17/17_1.htm[6/13/2012
More information6. Finite State Machines
6. Finite State Machines 6.4x Computation Structures Part Digital Circuits Copyright 25 MIT EECS 6.4 Computation Structures L6: Finite State Machines, Slide # Our New Machine Clock State Registers k Current
More informationClojure Concurrency Constructs, Part Two. CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014
Clojure Concurrency Constructs, Part Two CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014 1 Goals Cover the material presented in Chapter 4, of our concurrency textbook In particular,
More informationTowards a Practical Secure Concurrent Language
Towards a Practical Secure Concurrent Language Stefan Muller and Stephen Chong TR-05-12 Computer Science Group Harvard University Cambridge, Massachusetts Towards a Practical Secure Concurrent Language
More informationAnalytical Modeling of Parallel Programs. S. Oliveira
Analytical Modeling of Parallel Programs S. Oliveira Fall 2005 1 Scalability of Parallel Systems Efficiency of a parallel program E = S/P = T s /PT p Using the parallel overhead expression E = 1/(1 + T
More informationVariants of Turing Machine (intro)
CHAPTER 3 The Church-Turing Thesis Contents Turing Machines definitions, examples, Turing-recognizable and Turing-decidable languages Variants of Turing Machine Multi-tape Turing machines, non-deterministic
More informationTHE ZCACHE: DECOUPLING WAYS AND ASSOCIATIVITY. Daniel Sanchez and Christos Kozyrakis Stanford University
THE ZCACHE: DECOUPLING WAYS AND ASSOCIATIVITY Daniel Sanchez and Christos Kozyrakis Stanford University MICRO-43, December 6 th 21 Executive Summary 2 Mitigating the memory wall requires large, highly
More informationCPU Scheduling. Heechul Yun
CPU Scheduling Heechul Yun 1 Recap Four deadlock conditions: Mutual exclusion No preemption Hold and wait Circular wait Detection Avoidance Banker s algorithm 2 Recap: Banker s Algorithm 1. Initialize
More informationModule 9: Addendum to Module 6: Shared Memory Multiprocessors Lecture 18: Sharing Patterns and Cache Coherence Protocols. The Lecture Contains:
The Lecture Contains: Invalidation vs. Update Sharing Patterns Migratory Hand-off States of a Cache Line Stores MSI Protocol State Transition MSI Example MESI Protocol MESI Example MOESI Protocol MOSI
More informationTDDI04, K. Arvidsson, IDA, Linköpings universitet CPU Scheduling. Overview: CPU Scheduling. [SGG7] Chapter 5. Basic Concepts.
TDDI4 Concurrent Programming, Operating Systems, and Real-time Operating Systems CPU Scheduling Overview: CPU Scheduling CPU bursts and I/O bursts Scheduling Criteria Scheduling Algorithms Multiprocessor
More informationCPU SCHEDULING RONG ZHENG
CPU SCHEDULING RONG ZHENG OVERVIEW Why scheduling? Non-preemptive vs Preemptive policies FCFS, SJF, Round robin, multilevel queues with feedback, guaranteed scheduling 2 SHORT-TERM, MID-TERM, LONG- TERM
More informationNondeterminism. September 7, Nondeterminism
September 7, 204 Introduction is a useful concept that has a great impact on the theory of computation Introduction is a useful concept that has a great impact on the theory of computation So far in our
More informationPart I: Definitions and Properties
Turing Machines Part I: Definitions and Properties Finite State Automata Deterministic Automata (DFSA) M = {Q, Σ, δ, q 0, F} -- Σ = Symbols -- Q = States -- q 0 = Initial State -- F = Accepting States
More informationProcess Scheduling for RTS. RTS Scheduling Approach. Cyclic Executive Approach
Process Scheduling for RTS Dr. Hugh Melvin, Dept. of IT, NUI,G RTS Scheduling Approach RTS typically control multiple parameters concurrently Eg. Flight Control System Speed, altitude, inclination etc..
More informationSolution of Linear Systems
Solution of Linear Systems Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico May 12, 2016 CPD (DEI / IST) Parallel and Distributed Computing
More informationCISC 4090: Theory of Computation Chapter 1 Regular Languages. Section 1.1: Finite Automata. What is a computer? Finite automata
CISC 4090: Theory of Computation Chapter Regular Languages Xiaolan Zhang, adapted from slides by Prof. Werschulz Section.: Finite Automata Fordham University Department of Computer and Information Sciences
More informationReducing NVM Writes with Optimized Shadow Paging
Reducing NVM Writes with Optimized Shadow Paging Yuanjiang Ni, Jishen Zhao, Daniel Bittman, Ethan L. Miller Center for Research in Storage Systems University of California, Santa Cruz Emerging Technology
More informationComplexity Theory Part I
Complexity Theory Part I Outline for Today Recap from Last Time Reviewing Verifiers Nondeterministic Turing Machines What does nondeterminism mean in the context of TMs? And just how powerful are NTMs?
More informationMarwan Burelle. Parallel and Concurrent Programming. Introduction and Foundation
and and marwan.burelle@lse.epita.fr http://wiki-prog.kh405.net Outline 1 2 and 3 and Evolutions and Next evolutions in processor tends more on more on growing of cores number GPU and similar extensions
More informationHighly-scalable branch and bound for maximum monomial agreement
Highly-scalable branch and bound for maximum monomial agreement Jonathan Eckstein (Rutgers) William Hart Cynthia A. Phillips Sandia National Laboratories Sandia National Laboratories is a multi-program
More informationAppendix A. Formal Proofs
Distributed Reasoning for Multiagent Simple Temporal Problems Appendix A Formal Proofs Throughout this paper, we provided proof sketches to convey the gist of the proof when presenting the full proof would
More informationCS 6112 (Fall 2011) Foundations of Concurrency
CS 6112 (Fall 2011) Foundations of Concurrency 29 November 2011 Scribe: Jean-Baptiste Jeannin 1 Readings The readings for today were: Eventually Consistent Transactions, by Sebastian Burckhardt, Manuel
More informationSpecial Nodes for Interface
fi fi Special Nodes for Interface SW on processors Chip-level HW Board-level HW fi fi C code VHDL VHDL code retargetable compilation high-level synthesis SW costs HW costs partitioning (solve ILP) cluster
More informationModels: Amdahl s Law, PRAM, α-β Tal Ben-Nun
spcl.inf.ethz.ch @spcl_eth Models: Amdahl s Law, PRAM, α-β Tal Ben-Nun Design of Parallel and High-Performance Computing Fall 2017 DPHPC Overview cache coherency memory models 2 Speedup An application
More informationSynchronous Reactive Systems
Synchronous Reactive Systems Stephen Edwards sedwards@synopsys.com Synopsys, Inc. Outline Synchronous Reactive Systems Heterogeneity and Ptolemy Semantics of the SR Domain Scheduling the SR Domain 2 Reactive
More informationLeveraging Transactional Memory for a Predictable Execution of Applications Composed of Hard Real-Time and Best-Effort Tasks
Leveraging Transactional Memory for a Predictable Execution of Applications Composed of Hard Real-Time and Best-Effort Tasks Stefan Metzlaff, Sebastian Weis, and Theo Ungerer Department of Computer Science,
More informationScheduling I. Today Introduction to scheduling Classical algorithms. Next Time Advanced topics on scheduling
Scheduling I Today Introduction to scheduling Classical algorithms Next Time Advanced topics on scheduling Scheduling out there You are the manager of a supermarket (ok, things don t always turn out the
More informationCSE 548: Analysis of Algorithms. Lecture 12 ( Analyzing Parallel Algorithms )
CSE 548: Analysis of Algorithms Lecture 12 ( Analyzing Parallel Algorithms ) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Fall 2017 Why Parallelism? Moore s Law Source: Wikipedia
More information2/5/07 CSE 30341: Operating Systems Principles
page 1 Shortest-Job-First (SJR) Scheduling Associate with each process the length of its next CPU burst. Use these lengths to schedule the process with the shortest time Two schemes: nonpreemptive once
More informationCS/IT OPERATING SYSTEMS
CS/IT 5 (CR) Total No. of Questions :09] [Total No. of Pages : 0 II/IV B.Tech. DEGREE EXAMINATIONS, DECEMBER- 06 CS/IT OPERATING SYSTEMS. a) System Boot Answer Question No. Compulsory. Answer One Question
More informationMark Redekopp, All rights reserved. Lecture 1 Slides. Intro Number Systems Logic Functions
Lecture Slides Intro Number Systems Logic Functions EE 0 in Context EE 0 EE 20L Logic Design Fundamentals Logic Design, CAD Tools, Lab tools, Project EE 357 EE 457 Computer Architecture Using the logic
More informationClock-driven scheduling
Clock-driven scheduling Also known as static or off-line scheduling Michal Sojka Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Control Engineering November 8, 2017
More informationHow to deal with uncertainties and dynamicity?
How to deal with uncertainties and dynamicity? http://graal.ens-lyon.fr/ lmarchal/scheduling/ 19 novembre 2012 1/ 37 Outline 1 Sensitivity and Robustness 2 Analyzing the sensitivity : the case of Backfilling
More informationEmbedded Systems Development
Embedded Systems Development Lecture 2 Finite Automata & SyncCharts Daniel Kästner AbsInt Angewandte Informatik GmbH kaestner@absint.com Some things I forgot to mention 2 Remember the HISPOS registration
More informationLightweight Real-Time Synchronization under P-EDF on Symmetric and Asymmetric Multiprocessors
Consistent * Complete * Well Documented * Easy to Reuse * Technical Report MPI-SWS-216-3 May 216 Lightweight Real-Time Synchronization under P-EDF on Symmetric and Asymmetric Multiprocessors (extended
More informationQuantum Cluster Simulations of Low D Systems
Quantum Cluster Simulations of Low D Systems Electronic Correlations on Many Length Scales M. Jarrell, University of Cincinnati High Perf. QMC Hybrid Method SP Sep. in 1D NEW MEM SP Sep. in 1D 2-Chain
More informationDesign Patterns and Refactoring
Singleton Oliver Haase HTWG Konstanz 1 / 19 Description I Classification: object based creational pattern Puropse: ensure that a class can be instantiated exactly once provide global access point to single
More informationAdders allow computers to add numbers 2-bit ripple-carry adder
Lecture 12 Logistics HW was due yesterday HW5 was out yesterday (due next Wednesday) Feedback: thank you! Things to work on: ig picture, ook chapters, Exam comments Last lecture dders Today Clarification
More informationReview for the Midterm Exam
Review for the Midterm Exam 1 Three Questions of the Computational Science Prelim scaled speedup network topologies work stealing 2 The in-class Spring 2012 Midterm Exam pleasingly parallel computations
More informationRecap from the previous lecture on Analytical Modeling
COSC 637 Parallel Computation Analytical Modeling of Parallel Programs (II) Edgar Gabriel Fall 20 Recap from the previous lecture on Analytical Modeling Speedup: S p = T s / T p (p) Efficiency E = S p
More informationScheduling I. Today. Next Time. ! Introduction to scheduling! Classical algorithms. ! Advanced topics on scheduling
Scheduling I Today! Introduction to scheduling! Classical algorithms Next Time! Advanced topics on scheduling Scheduling out there! You are the manager of a supermarket (ok, things don t always turn out
More information