DMP. Deterministic Shared Memory Multiprocessing. Presenter: Wu, Weiyi Yale University

Size: px
Start display at page:

Download "DMP. Deterministic Shared Memory Multiprocessing. Presenter: Wu, Weiyi Yale University"

Transcription

1 DMP Deterministic Shared Memory Multiprocessing 1 Presenter: Wu, Weiyi Yale University

2 Outline What is determinism? How to make execution deterministic? What s the overhead of determinism? 2

3 What Is Determinism? The SAME input, the SAME output The same execution sequence (with Linearizability) (strong) The same resource-accessing ordering (weak) 3

4 (Weak) Determinism ld X ld X ld X ld Y ld Y st S st S ld Y Time Flow st S st T st X st T st T Data Dependency st X st X st Y st Y st Y Concurrent Program Execution 1 Execution 2 4

5 Non-Determinism ld X ld X ld X ld Y ld Y st S st S ld Y Time Flow st S st T st T st T st X Communications st X st X st Y st Y st Y Concurrent Program Execution 1 Execution 2 5

6 Non-determinism in Non-Determinism M barnes 2M 3M 4M 5M Non-Determinism Practice 1M ocean-contig 2M 3M 4M 5M Time (Insns.) Time (Insns.) Figure 3. Amount of nondeterminism over the execution of barnes and ocean-contig. The x axis is the position in the execution where each sample of 100,000 instructions was taken. The y axis is ND (Eq. 1) computed for each sample in the execution. 6

7 Is Record n Replay Deterministic? Yes! When the log is replayed deterministically, the replayed execution is definitely deterministic No... It cannot deploy to any machines and still run deterministically for any input 7

8 How to Execute Deterministically? Eliminating all instruction communications Deterministically arrange all communications Read-after-Write Write-after-Write, Write-after-Read 8

9 DMP-Serial No concurrency! Make it single-threaded! Programs are divided into quanta All processors act as one Only execute when hold token Deterministically pass token when quanta end 9

10 Example of DMP-Serial ld X ld X ld Y ld X ld Y st S Time Flow st S st S ld Y st T st X st T st T Token Pass st Y st X st Y st X st Y Quantum Concurrent Program Small Quanta Bigger Quanta 10

11 Recovering Parallelism DMP-Serial is way too strong Weak determinism is also acceptable Instructions without communication can execute concurrently Deterministically schedule communications 11

12 DMP-ShTab Break quantum into parallel prefix and serial suffix (dynamically), do prefix concurrently Deterministically control shared memory access Only read from Shared memory position Only write to Private memory position 12

13 Example of DMP-ShTab ld X ld Y ld X ld X st S st S ld Y st S ld Y Time Flow st T st T st T st X st X st Y Token Pass st Y st X st Y Quantum Concurrent Program DMP-Serial DMP-ShTab 13

14 Similarity to MESI coherence protocol Probe Write Hit Reset Invalid Read Miss, Exclusive Exclusive Read Hit Read Miss, Shared Probe Write Hit Probe Read Hit Probe Write Hit Write Hit Read Hit Probe Read Hit Shared Probe Read Hit Write Hit Write Miss Modified Read Hit Write Hit 14

15 Similarity to MESI coherence protocol Probe Write Hit Reset Invalid Read Miss, Exclusive Exclusive Read Hit Read Miss, Shared Probe Write Hit Probe Read Hit Probe Write Hit Write Hit Read Hit Probe Read Hit Shared Probe Read Hit Write Hit Write Miss Modified Read Hit Write Hit Shared Private 14

16 DMP-TM Atomicity, isolation and deterministic total order of quanta will guarantee determinism Make quantum transaction to explicitly execute concurrently Deterministically commit Abort and re-execute latter quantum when conflicting 15

17 DMP-TMFwd Read-after-Write will be no conflict if propagating modified values in time Forward writes across transactions Need to abort infected transactions when aborting 16

18 Example of DMP-TM* N memory operation deterministic token passing deterministic order atomic quantum uncommitted value flow P0 P1 P0 P1 1 ld B st C (squash) st C 1 ld B 2 3 ld T st C 2 3 ld B ld B 4 (a) Pure TM ld B 4 (b) TM w/ uncommitted value flow Figure 7. Recovering parallelism by executing quanta as memory transactions (a). Avoiding unnecessary squashes with un-committed data forwarding (b). 17

19 How to build quanta? Split code evenly - QB-Count The less waiting time, the better No spinning on lock - QB-SyncFollow (token is enough for synchronization) End quantum when finishing working on shared data others wait for - QB-Sharing 18

20 Example of QB-SyncFollowing forward immediately if a thread starts spinning on a lock. memory operation deterministic token passing N deterministic order atomic quantum P0 P1 P0 P1 lock L lock L 1 unlock L lock L... spin... (wasted work) 2 1 unlock L lock L (grab) 2 3 grab lock L 3 4 (a) Regular quantum break 4 (b) Better quantum break Figure 8. Example of a situation when better quantum breaking policies leads to better performance. 19

21 Example of QB-Sharing P0 P1 P0 P1 P0 P1 (a) Parallel execution (b) Deterministic serialized execution at a fine-grain (c) Deterministic serialized execution at a coarse-grain Figure 4. Deterministic serialization of memory operations. Dots are memory operations and dashed arrows are happens-before synchronization. 20

22 Overhead & Scalability Hw-DMP-TMFwd Hw-DMP-TM Hw-DMP-ShTab Hw-DMP-Serial runtime normalized to non-deterministic parallel execution barnes cholesky fft(p) 16 4 fmm 8 4 lu-nc 8 ocean-c(p) radix(p) 4 8 vlrend 16 water-sp SPLASH blacksch bodytr 4 8 fluid strmcl PARSEC gmean gmean Figure 9. Runtime overheads with 4, 8 and 16 threads. (P) indicates page-level conflict detection; line-level otherwise. up over word-granularity 500% 450% 400% 350% 300% 250% 200% 150% 100% Hw-DMP-TMFwd Hw-DMP-TM Hw-DMP-ShTab 21 eedup over QB-Count 180% 160% 140% 120% 100% 80% 60% 40% Hw-DMP-TMFwd Hw-DMP-TM Hw-DMP-ShTab Hw-DMP-Serial

23 Quanta Sensitivity Figure 9. Runtime overheads with 4, 8 and 16 threads. (P) indicates page-level conflict detection; line-level otherwise. however, Hw-DMP-Serial is less affected by quantum size. % speedup over word-granularity 500% 450% 400% 350% 300% 250% 200% 150% 100% 50% 0% -50% -100% % speedup over 1,000-insn quanta 40% 20% 0% -20% -40% -60% -80% -100% fft 2 X C lu-nc lu-nc ocean-c 2 X C vlrend radix SPLASH gmean vlrend 2 X C SPLASH gmean Hw-DMP-TMFwd Hw-DMP-TM Hw-DMP-ShTab Hw-DMP-TMFwd Hw-DMP-Serial Hw-DMP-TM Hw-DMP-ShTab 2 X C strmcl 2 X C PARSEC gmean Figure 10. Performance of 2,000 (2), 10,000 (X) and Figure100, Performance (C) instruction of page-granularity quanta, relative to conflict 1,000 detectiontion relative quanta. to instruc- line-granularity. bodytr fluid PARSEC gmean strmcl % speedup over QB-Count 180% 160% 140% 120% 100% 80% 60% 40% 20% 0% -20% -40% erate progress along the application s critical path. With 10,000-instruction quanta (Figure 13), the effects of the different quantum builders are more pronounced. In general, the quantum builders that take program synchronization into account (QB-SyncFollow and QB-SyncSharing) outperform those that do not. Hw-DMP-Serial and Hw-DMP-ShTab perform better with smarter quantum building, while the TMs sf ss barnes s sf ss radix(p) s sf ss SPLASH gmean Hw-DMP-TMFwd Hw-DMP-TM Hw-DMP-ShTab Hw-DMP-Serial s sf ss bodytr s sf ss strmcl s sf ss PARSEC gmean Figure 12. Performance of QB-Sharing (s), QB- SyncFollow (sf) and QB-SyncSharing (ss) quantum builders, relative to QB-Count, with 1,000-insn quanta. eedup over QB-Count % 160% 140% 120% 100% 80% 60% 40% Hw-DMP-TMFwd Hw-DMP-TM Hw-DMP-ShTab Hw-DMP-Serial

24 Software-DMP threads 4 threads 8 threads Sw-DMP-ShTab runtime normalized to nondeterministic parallel execution barnes fft fmm lu-c radix water-sp SPLASH gmean Figure 14. Runtime of Sw-DMP-ShTab relative to nondeterministic execution. y ability of Swity and energy trade-offs; (2) support for debugging; (3) interaction with operating system and I/O nondeterminism; 23 and

25 Conclusion (First?) implementation of DMP Demonstration of low-overhead determinism for concurrent programs Not pervasive No deterministic interrupts Can not save nondeterministic hardwares

26 Thanks~ 25

What is the Cost of Determinism? Cedomir Segulja, Tarek S. Abdelrahman University of Toronto

What is the Cost of Determinism? Cedomir Segulja, Tarek S. Abdelrahman University of Toronto What is the Cost of Determinism? Cedomir Segulja, Tarek S. Abdelrahman University of Toronto Source: [Youtube] [Intel] Non-Determinism Same program + same input same output This is bad for Testing Too

More information

Real Time Operating Systems

Real Time Operating Systems Real Time Operating ystems Luca Abeni luca.abeni@unitn.it Interacting Tasks Until now, only independent tasks... A job never blocks or suspends A task only blocks on job termination In real world, jobs

More information

1 st Semester 2007/2008

1 st Semester 2007/2008 Chapter 17: System Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2007/2008 Slides baseados nos slides oficiais do livro Database System c Silberschatz, Korth and Sudarshan.

More information

Cyrus: Unintrusive Application-Level Record-Replay for Replay Parallelism

Cyrus: Unintrusive Application-Level Record-Replay for Replay Parallelism Cyrus: Unintrusive Application-Level Record-Replay for Replay Parallelism Nima Honarmand, Nathan Dautenhahn, Josep Torrellas and Samuel T. King (UIUC) Gilles Pokam and Cristiano Pereira (Intel) iacoma.cs.uiuc.edu

More information

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University Che-Wei Chang chewei@mail.cgu.edu.tw Department of Computer Science and Information Engineering, Chang Gung University } 2017/11/15 Midterm } 2017/11/22 Final Project Announcement 2 1. Introduction 2.

More information

Operating Systems. VII. Synchronization

Operating Systems. VII. Synchronization Operating Systems VII. Synchronization Ludovic Apvrille ludovic.apvrille@telecom-paristech.fr Eurecom, office 470 http://soc.eurecom.fr/os/ @OS Eurecom Outline Synchronization issues 2/22 Fall 2017 Institut

More information

CIS 4930/6930: Principles of Cyber-Physical Systems

CIS 4930/6930: Principles of Cyber-Physical Systems CIS 4930/6930: Principles of Cyber-Physical Systems Chapter 11 Scheduling Hao Zheng Department of Computer Science and Engineering University of South Florida H. Zheng (CSE USF) CIS 4930/6930: Principles

More information

Rollback-Recovery. Uncoordinated Checkpointing. p!! Easy to understand No synchronization overhead. Flexible. To recover from a crash:

Rollback-Recovery. Uncoordinated Checkpointing. p!! Easy to understand No synchronization overhead. Flexible. To recover from a crash: Rollback-Recovery Uncoordinated Checkpointing Easy to understand No synchronization overhead p!! Flexible can choose when to checkpoint To recover from a crash: go back to last checkpoint restart How (not)to

More information

Real Time Operating Systems

Real Time Operating Systems Real Time Operating ystems hared Resources Luca Abeni Credits: Luigi Palopoli, Giuseppe Lipari, and Marco Di Natale cuola uperiore ant Anna Pisa -Italy Real Time Operating ystems p. 1 Interacting Tasks

More information

MICROPROCESSOR REPORT. THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE

MICROPROCESSOR REPORT.   THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE MICROPROCESSOR www.mpronline.com REPORT THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE ENERGY COROLLARIES TO AMDAHL S LAW Analyzing the Interactions Between Parallel Execution and Energy Consumption By

More information

Distributed Transactions. CS5204 Operating Systems 1

Distributed Transactions. CS5204 Operating Systems 1 Distributed Transactions CS5204 Operating Systems 1 . Transactions transactions T. TM T Distributed DBMS Model data manager DM T. TM T transaction manager network DM TM DM T T physical database CS 5204

More information

Analytical Modeling of Parallel Programs (Chapter 5) Alexandre David

Analytical Modeling of Parallel Programs (Chapter 5) Alexandre David Analytical Modeling of Parallel Programs (Chapter 5) Alexandre David 1.2.05 1 Topic Overview Sources of overhead in parallel programs. Performance metrics for parallel systems. Effect of granularity on

More information

Analytical Modeling of Parallel Systems

Analytical Modeling of Parallel Systems Analytical Modeling of Parallel Systems Chieh-Sen (Jason) Huang Department of Applied Mathematics National Sun Yat-sen University Thank Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar for providing

More information

TDDB68 Concurrent programming and operating systems. Lecture: CPU Scheduling II

TDDB68 Concurrent programming and operating systems. Lecture: CPU Scheduling II TDDB68 Concurrent programming and operating systems Lecture: CPU Scheduling II Mikael Asplund, Senior Lecturer Real-time Systems Laboratory Department of Computer and Information Science Copyright Notice:

More information

CEC 450 Real-Time Systems

CEC 450 Real-Time Systems E 450 Real-ime Systems Lecture 4 Rate Monotonic heory Part September 7, 08 Sam Siewert Quiz Results 93% Average, 75 low, 00 high Goal is to learn what you re not learning yet Motivation to keep up with

More information

Vector Lane Threading

Vector Lane Threading Vector Lane Threading S. Rivoire, R. Schultz, T. Okuda, C. Kozyrakis Computer Systems Laboratory Stanford University Motivation Vector processors excel at data-level parallelism (DLP) What happens to program

More information

CHAPTER 5 - PROCESS SCHEDULING

CHAPTER 5 - PROCESS SCHEDULING CHAPTER 5 - PROCESS SCHEDULING OBJECTIVES To introduce CPU scheduling, which is the basis for multiprogrammed operating systems To describe various CPU-scheduling algorithms To discuss evaluation criteria

More information

Andrew Morton University of Waterloo Canada

Andrew Morton University of Waterloo Canada EDF Feasibility and Hardware Accelerators Andrew Morton University of Waterloo Canada Outline 1) Introduction and motivation 2) Review of EDF and feasibility analysis 3) Hardware accelerators and scheduling

More information

UC Santa Barbara. Operating Systems. Christopher Kruegel Department of Computer Science UC Santa Barbara

UC Santa Barbara. Operating Systems. Christopher Kruegel Department of Computer Science UC Santa Barbara Operating Systems Christopher Kruegel Department of Computer Science http://www.cs.ucsb.edu/~chris/ Many processes to execute, but one CPU OS time-multiplexes the CPU by operating context switching Between

More information

Lecture 13. Real-Time Scheduling. Daniel Kästner AbsInt GmbH 2013

Lecture 13. Real-Time Scheduling. Daniel Kästner AbsInt GmbH 2013 Lecture 3 Real-Time Scheduling Daniel Kästner AbsInt GmbH 203 Model-based Software Development 2 SCADE Suite Application Model in SCADE (data flow + SSM) System Model (tasks, interrupts, buses, ) SymTA/S

More information

Safety and Liveness. Thread Synchronization: Too Much Milk. Critical Sections. A Really Cool Theorem

Safety and Liveness. Thread Synchronization: Too Much Milk. Critical Sections. A Really Cool Theorem Safety and Liveness Properties defined over an execution of a program Thread Synchronization: Too Much Milk Safety: nothing bad happens holds in every finite execution prefix Windows never crashes No patient

More information

Embedded Systems Development

Embedded Systems Development Embedded Systems Development Lecture 3 Real-Time Scheduling Dr. Daniel Kästner AbsInt Angewandte Informatik GmbH kaestner@absint.com Model-based Software Development Generator Lustre programs Esterel programs

More information

Dense Arithmetic over Finite Fields with CUMODP

Dense Arithmetic over Finite Fields with CUMODP Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,

More information

/ : Computer Architecture and Design

/ : Computer Architecture and Design 16.482 / 16.561: Computer Architecture and Design Summer 2015 Homework #5 Solution 1. Dynamic scheduling (30 points) Given the loop below: DADDI R3, R0, #4 outer: DADDI R2, R1, #32 inner: L.D F0, 0(R1)

More information

Lecture Note #6: More on Task Scheduling EECS 571 Principles of Real-Time Embedded Systems Kang G. Shin EECS Department University of Michigan

Lecture Note #6: More on Task Scheduling EECS 571 Principles of Real-Time Embedded Systems Kang G. Shin EECS Department University of Michigan Lecture Note #6: More on Task Scheduling EECS 571 Principles of Real-Time Embedded Systems Kang G. Shin EECS Department University of Michigan Note 6-1 Mars Pathfinder Timing Hiccups? When: landed on the

More information

Shared resources. Sistemi in tempo reale. Giuseppe Lipari. Scuola Superiore Sant Anna Pisa -Italy

Shared resources. Sistemi in tempo reale. Giuseppe Lipari. Scuola Superiore Sant Anna Pisa -Italy istemi in tempo reale hared resources Giuseppe Lipari cuola uperiore ant Anna Pisa -Italy inher.tex istemi in tempo reale Giuseppe Lipari 7/6/2005 12:35 p. 1/21 Interacting tasks Until now, we have considered

More information

Simulation of Process Scheduling Algorithms

Simulation of Process Scheduling Algorithms Simulation of Process Scheduling Algorithms Project Report Instructor: Dr. Raimund Ege Submitted by: Sonal Sood Pramod Barthwal Index 1. Introduction 2. Proposal 3. Background 3.1 What is a Process 4.

More information

Fall 2008 CSE Qualifying Exam. September 13, 2008

Fall 2008 CSE Qualifying Exam. September 13, 2008 Fall 2008 CSE Qualifying Exam September 13, 2008 1 Architecture 1. (Quan, Fall 2008) Your company has just bought a new dual Pentium processor, and you have been tasked with optimizing your software for

More information

LSN 15 Processor Scheduling

LSN 15 Processor Scheduling LSN 15 Processor Scheduling ECT362 Operating Systems Department of Engineering Technology LSN 15 Processor Scheduling LSN 15 FCFS/FIFO Scheduling Each process joins the Ready queue When the current process

More information

Latches. October 13, 2003 Latches 1

Latches. October 13, 2003 Latches 1 Latches The second part of CS231 focuses on sequential circuits, where we add memory to the hardware that we ve already seen. Our schedule will be very similar to before: We first show how primitive memory

More information

Notation. Bounds on Speedup. Parallel Processing. CS575 Parallel Processing

Notation. Bounds on Speedup. Parallel Processing. CS575 Parallel Processing Parallel Processing CS575 Parallel Processing Lecture five: Efficiency Wim Bohm, Colorado State University Some material from Speedup vs Efficiency in Parallel Systems - Eager, Zahorjan and Lazowska IEEE

More information

Allocate-On-Use Space Complexity of Shared-Memory Algorithms

Allocate-On-Use Space Complexity of Shared-Memory Algorithms 1 2 3 4 5 6 7 8 9 10 Allocate-On-Use Space Complexity of Shared-Memory Algorithms James Aspnes 1 Yale University Department of Computer Science Bernhard Haeupler Carnegie Mellon School of Computer Science

More information

CMP N 301 Computer Architecture. Appendix C

CMP N 301 Computer Architecture. Appendix C CMP N 301 Computer Architecture Appendix C Outline Introduction Pipelining Hazards Pipelining Implementation Exception Handling Advanced Issues (Dynamic Scheduling, Out of order Issue, Superscalar, etc)

More information

Real-Time Software Transactional Memory: Contention Managers, Time Bounds, and Implementations

Real-Time Software Transactional Memory: Contention Managers, Time Bounds, and Implementations Real-Time Software Transactional Memory: Contention Managers, Time Bounds, and Implementations Mohammed El-Shambakey Dissertation Submitted to the Faculty of the Virginia Polytechnic Institute and State

More information

Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism

Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism Raj Parihar Advisor: Prof. Michael C. Huang March 22, 2013 Raj Parihar Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism

More information

CSE 380 Computer Operating Systems

CSE 380 Computer Operating Systems CSE 380 Computer Operating Systems Instructor: Insup Lee & Dianna Xu University of Pennsylvania, Fall 2003 Lecture Note 3: CPU Scheduling 1 CPU SCHEDULING q How can OS schedule the allocation of CPU cycles

More information

1 Introduction. 1.1 The Problem Domain. Self-Stablization UC Davis Earl Barr. Lecture 1 Introduction Winter 2007

1 Introduction. 1.1 The Problem Domain. Self-Stablization UC Davis Earl Barr. Lecture 1 Introduction Winter 2007 Lecture 1 Introduction 1 Introduction 1.1 The Problem Domain Today, we are going to ask whether a system can recover from perturbation. Consider a children s top: If it is perfectly vertically, you can

More information

This Unit: Scheduling (Static + Dynamic) CIS 501 Computer Architecture. Readings. Review Example

This Unit: Scheduling (Static + Dynamic) CIS 501 Computer Architecture. Readings. Review Example This Unit: Scheduling (Static + Dnamic) CIS 50 Computer Architecture Unit 8: Static and Dnamic Scheduling Application OS Compiler Firmware CPU I/O Memor Digital Circuits Gates & Transistors! Previousl:!

More information

Origami: Folding Warps for Energy Efficient GPUs

Origami: Folding Warps for Energy Efficient GPUs Origami: Folding Warps for Energy Efficient GPUs Mohammad Abdel-Majeed*, Daniel Wong, Justin Huang and Murali Annavaram* * University of Southern alifornia University of alifornia, Riverside Stanford University

More information

Transactional Information Systems:

Transactional Information Systems: Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery Gerhard Weikum and Gottfried Vossen 2002 Morgan Kaufmann ISBN 1-55860-508-8 Teamwork is essential.

More information

Deadlock Ezio Bartocci Institute for Computer Engineering

Deadlock Ezio Bartocci Institute for Computer Engineering TECHNISCHE UNIVERSITÄT WIEN Fakultät für Informatik Cyber-Physical Systems Group Deadlock Ezio Bartocci Institute for Computer Engineering ezio.bartocci@tuwien.ac.at Deadlock Permanent blocking of a set

More information

Counters. We ll look at different kinds of counters and discuss how to build them

Counters. We ll look at different kinds of counters and discuss how to build them Counters We ll look at different kinds of counters and discuss how to build them These are not only examples of sequential analysis and design, but also real devices used in larger circuits 1 Introducing

More information

Lab Course: distributed data analytics

Lab Course: distributed data analytics Lab Course: distributed data analytics 01. Threading and Parallelism Nghia Duong-Trung, Mohsan Jameel Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim, Germany International

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 10

ECE 571 Advanced Microprocessor-Based Design Lecture 10 ECE 571 Advanced Microprocessor-Based Design Lecture 10 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 23 February 2017 Announcements HW#5 due HW#6 will be posted 1 Oh No, More

More information

INF 4140: Models of Concurrency Series 3

INF 4140: Models of Concurrency Series 3 Universitetet i Oslo Institutt for Informatikk PMA Olaf Owe, Martin Steffen, Toktam Ramezani INF 4140: Models of Concurrency Høst 2016 Series 3 14. 9. 2016 Topic: Semaphores (Exercises with hints for solution)

More information

Parallel Performance Theory - 1

Parallel Performance Theory - 1 Parallel Performance Theory - 1 Parallel Computing CIS 410/510 Department of Computer and Information Science Outline q Performance scalability q Analytical performance measures q Amdahl s law and Gustafson-Barsis

More information

Extending Parallel Scalability of LAMMPS and Multiscale Reactive Molecular Simulations

Extending Parallel Scalability of LAMMPS and Multiscale Reactive Molecular Simulations September 18, 2012 Extending Parallel Scalability of LAMMPS and Multiscale Reactive Molecular Simulations Yuxing Peng, Chris Knight, Philip Blood, Lonnie Crosby, and Gregory A. Voth Outline Proton Solvation

More information

Deadlock. CSE 2431: Introduction to Operating Systems Reading: Chap. 7, [OSC]

Deadlock. CSE 2431: Introduction to Operating Systems Reading: Chap. 7, [OSC] Deadlock CSE 2431: Introduction to Operating Systems Reading: Chap. 7, [OSC] 1 Outline Resources Deadlock Deadlock Prevention Deadlock Avoidance Deadlock Detection Deadlock Recovery 2 Review: Synchronization

More information

Performance and Scalability. Lars Karlsson

Performance and Scalability. Lars Karlsson Performance and Scalability Lars Karlsson Outline Complexity analysis Runtime, speedup, efficiency Amdahl s Law and scalability Cost and overhead Cost optimality Iso-efficiency function Case study: matrix

More information

arxiv: v1 [cs.dc] 9 Feb 2015

arxiv: v1 [cs.dc] 9 Feb 2015 Why Transactional Memory Should Not Be Obstruction-Free Petr Kuznetsov 1 Srivatsan Ravi 2 1 Télécom ParisTech 2 TU Berlin arxiv:1502.02725v1 [cs.dc] 9 Feb 2015 Abstract Transactional memory (TM) is an

More information

Module 5: CPU Scheduling

Module 5: CPU Scheduling Module 5: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation 5.1 Basic Concepts Maximum CPU utilization obtained

More information

arxiv: v2 [cs.dc] 20 Nov 2017

arxiv: v2 [cs.dc] 20 Nov 2017 The Optimal Pessimistic Transactional Memory Algorithm arxiv:1605.01361v2 [cs.dc] 20 Nov 2017 Paweł T. Wojciechowski, Konrad Siek {pawel.t.wojciechowski,konrad.siek}@cs.put.edu.pl Institute of Computing

More information

Abstractions and Decision Procedures for Effective Software Model Checking

Abstractions and Decision Procedures for Effective Software Model Checking Abstractions and Decision Procedures for Effective Software Model Checking Prof. Natasha Sharygina The University of Lugano, Carnegie Mellon University Microsoft Summer School, Moscow, July 2011 Lecture

More information

Real-Time Scheduling and Resource Management

Real-Time Scheduling and Resource Management ARTIST2 Summer School 2008 in Europe Autrans (near Grenoble), France September 8-12, 2008 Real-Time Scheduling and Resource Management Lecturer: Giorgio Buttazzo Full Professor Scuola Superiore Sant Anna

More information

Logic Design II (17.342) Spring Lecture Outline

Logic Design II (17.342) Spring Lecture Outline Logic Design II (17.342) Spring 2012 Lecture Outline Class # 10 April 12, 2012 Dohn Bowden 1 Today s Lecture First half of the class Circuits for Arithmetic Operations Chapter 18 Should finish at least

More information

Chapter 6: CPU Scheduling

Chapter 6: CPU Scheduling Chapter 6: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation 6.1 Basic Concepts Maximum CPU utilization obtained

More information

Energy-efficient Mapping of Big Data Workflows under Deadline Constraints

Energy-efficient Mapping of Big Data Workflows under Deadline Constraints Energy-efficient Mapping of Big Data Workflows under Deadline Constraints Presenter: Tong Shu Authors: Tong Shu and Prof. Chase Q. Wu Big Data Center Department of Computer Science New Jersey Institute

More information

Speculative Parallelism in Cilk++

Speculative Parallelism in Cilk++ Speculative Parallelism in Cilk++ Ruben Perez & Gregory Malecha MIT May 11, 2010 Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 1 / 33 Parallelizing Embarrassingly Parallel

More information

Loop Parallelization Techniques and dependence analysis

Loop Parallelization Techniques and dependence analysis Loop Parallelization Techniques and dependence analysis Data-Dependence Analysis Dependence-Removing Techniques Parallelizing Transformations Performance-enchancing Techniques 1 When can we run code in

More information

Module 9: "Introduction to Shared Memory Multiprocessors" Lecture 17: "Introduction to Cache Coherence Protocols" Invalidation vs.

Module 9: Introduction to Shared Memory Multiprocessors Lecture 17: Introduction to Cache Coherence Protocols Invalidation vs. Invalidation vs. Update Sharing patterns Migratory hand-off States of a cache line Stores MSI protocol MSI example MESI protocol MESI example MOESI protocol Hybrid inval+update file:///e /parallel_com_arch/lecture17/17_1.htm[6/13/2012

More information

6. Finite State Machines

6. Finite State Machines 6. Finite State Machines 6.4x Computation Structures Part Digital Circuits Copyright 25 MIT EECS 6.4 Computation Structures L6: Finite State Machines, Slide # Our New Machine Clock State Registers k Current

More information

Clojure Concurrency Constructs, Part Two. CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014

Clojure Concurrency Constructs, Part Two. CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014 Clojure Concurrency Constructs, Part Two CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014 1 Goals Cover the material presented in Chapter 4, of our concurrency textbook In particular,

More information

Towards a Practical Secure Concurrent Language

Towards a Practical Secure Concurrent Language Towards a Practical Secure Concurrent Language Stefan Muller and Stephen Chong TR-05-12 Computer Science Group Harvard University Cambridge, Massachusetts Towards a Practical Secure Concurrent Language

More information

Analytical Modeling of Parallel Programs. S. Oliveira

Analytical Modeling of Parallel Programs. S. Oliveira Analytical Modeling of Parallel Programs S. Oliveira Fall 2005 1 Scalability of Parallel Systems Efficiency of a parallel program E = S/P = T s /PT p Using the parallel overhead expression E = 1/(1 + T

More information

Variants of Turing Machine (intro)

Variants of Turing Machine (intro) CHAPTER 3 The Church-Turing Thesis Contents Turing Machines definitions, examples, Turing-recognizable and Turing-decidable languages Variants of Turing Machine Multi-tape Turing machines, non-deterministic

More information

THE ZCACHE: DECOUPLING WAYS AND ASSOCIATIVITY. Daniel Sanchez and Christos Kozyrakis Stanford University

THE ZCACHE: DECOUPLING WAYS AND ASSOCIATIVITY. Daniel Sanchez and Christos Kozyrakis Stanford University THE ZCACHE: DECOUPLING WAYS AND ASSOCIATIVITY Daniel Sanchez and Christos Kozyrakis Stanford University MICRO-43, December 6 th 21 Executive Summary 2 Mitigating the memory wall requires large, highly

More information

CPU Scheduling. Heechul Yun

CPU Scheduling. Heechul Yun CPU Scheduling Heechul Yun 1 Recap Four deadlock conditions: Mutual exclusion No preemption Hold and wait Circular wait Detection Avoidance Banker s algorithm 2 Recap: Banker s Algorithm 1. Initialize

More information

Module 9: Addendum to Module 6: Shared Memory Multiprocessors Lecture 18: Sharing Patterns and Cache Coherence Protocols. The Lecture Contains:

Module 9: Addendum to Module 6: Shared Memory Multiprocessors Lecture 18: Sharing Patterns and Cache Coherence Protocols. The Lecture Contains: The Lecture Contains: Invalidation vs. Update Sharing Patterns Migratory Hand-off States of a Cache Line Stores MSI Protocol State Transition MSI Example MESI Protocol MESI Example MOESI Protocol MOSI

More information

TDDI04, K. Arvidsson, IDA, Linköpings universitet CPU Scheduling. Overview: CPU Scheduling. [SGG7] Chapter 5. Basic Concepts.

TDDI04, K. Arvidsson, IDA, Linköpings universitet CPU Scheduling. Overview: CPU Scheduling. [SGG7] Chapter 5. Basic Concepts. TDDI4 Concurrent Programming, Operating Systems, and Real-time Operating Systems CPU Scheduling Overview: CPU Scheduling CPU bursts and I/O bursts Scheduling Criteria Scheduling Algorithms Multiprocessor

More information

CPU SCHEDULING RONG ZHENG

CPU SCHEDULING RONG ZHENG CPU SCHEDULING RONG ZHENG OVERVIEW Why scheduling? Non-preemptive vs Preemptive policies FCFS, SJF, Round robin, multilevel queues with feedback, guaranteed scheduling 2 SHORT-TERM, MID-TERM, LONG- TERM

More information

Nondeterminism. September 7, Nondeterminism

Nondeterminism. September 7, Nondeterminism September 7, 204 Introduction is a useful concept that has a great impact on the theory of computation Introduction is a useful concept that has a great impact on the theory of computation So far in our

More information

Part I: Definitions and Properties

Part I: Definitions and Properties Turing Machines Part I: Definitions and Properties Finite State Automata Deterministic Automata (DFSA) M = {Q, Σ, δ, q 0, F} -- Σ = Symbols -- Q = States -- q 0 = Initial State -- F = Accepting States

More information

Process Scheduling for RTS. RTS Scheduling Approach. Cyclic Executive Approach

Process Scheduling for RTS. RTS Scheduling Approach. Cyclic Executive Approach Process Scheduling for RTS Dr. Hugh Melvin, Dept. of IT, NUI,G RTS Scheduling Approach RTS typically control multiple parameters concurrently Eg. Flight Control System Speed, altitude, inclination etc..

More information

Solution of Linear Systems

Solution of Linear Systems Solution of Linear Systems Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico May 12, 2016 CPD (DEI / IST) Parallel and Distributed Computing

More information

CISC 4090: Theory of Computation Chapter 1 Regular Languages. Section 1.1: Finite Automata. What is a computer? Finite automata

CISC 4090: Theory of Computation Chapter 1 Regular Languages. Section 1.1: Finite Automata. What is a computer? Finite automata CISC 4090: Theory of Computation Chapter Regular Languages Xiaolan Zhang, adapted from slides by Prof. Werschulz Section.: Finite Automata Fordham University Department of Computer and Information Sciences

More information

Reducing NVM Writes with Optimized Shadow Paging

Reducing NVM Writes with Optimized Shadow Paging Reducing NVM Writes with Optimized Shadow Paging Yuanjiang Ni, Jishen Zhao, Daniel Bittman, Ethan L. Miller Center for Research in Storage Systems University of California, Santa Cruz Emerging Technology

More information

Complexity Theory Part I

Complexity Theory Part I Complexity Theory Part I Outline for Today Recap from Last Time Reviewing Verifiers Nondeterministic Turing Machines What does nondeterminism mean in the context of TMs? And just how powerful are NTMs?

More information

Marwan Burelle. Parallel and Concurrent Programming. Introduction and Foundation

Marwan Burelle.  Parallel and Concurrent Programming. Introduction and Foundation and and marwan.burelle@lse.epita.fr http://wiki-prog.kh405.net Outline 1 2 and 3 and Evolutions and Next evolutions in processor tends more on more on growing of cores number GPU and similar extensions

More information

Highly-scalable branch and bound for maximum monomial agreement

Highly-scalable branch and bound for maximum monomial agreement Highly-scalable branch and bound for maximum monomial agreement Jonathan Eckstein (Rutgers) William Hart Cynthia A. Phillips Sandia National Laboratories Sandia National Laboratories is a multi-program

More information

Appendix A. Formal Proofs

Appendix A. Formal Proofs Distributed Reasoning for Multiagent Simple Temporal Problems Appendix A Formal Proofs Throughout this paper, we provided proof sketches to convey the gist of the proof when presenting the full proof would

More information

CS 6112 (Fall 2011) Foundations of Concurrency

CS 6112 (Fall 2011) Foundations of Concurrency CS 6112 (Fall 2011) Foundations of Concurrency 29 November 2011 Scribe: Jean-Baptiste Jeannin 1 Readings The readings for today were: Eventually Consistent Transactions, by Sebastian Burckhardt, Manuel

More information

Special Nodes for Interface

Special Nodes for Interface fi fi Special Nodes for Interface SW on processors Chip-level HW Board-level HW fi fi C code VHDL VHDL code retargetable compilation high-level synthesis SW costs HW costs partitioning (solve ILP) cluster

More information

Models: Amdahl s Law, PRAM, α-β Tal Ben-Nun

Models: Amdahl s Law, PRAM, α-β Tal Ben-Nun spcl.inf.ethz.ch @spcl_eth Models: Amdahl s Law, PRAM, α-β Tal Ben-Nun Design of Parallel and High-Performance Computing Fall 2017 DPHPC Overview cache coherency memory models 2 Speedup An application

More information

Synchronous Reactive Systems

Synchronous Reactive Systems Synchronous Reactive Systems Stephen Edwards sedwards@synopsys.com Synopsys, Inc. Outline Synchronous Reactive Systems Heterogeneity and Ptolemy Semantics of the SR Domain Scheduling the SR Domain 2 Reactive

More information

Leveraging Transactional Memory for a Predictable Execution of Applications Composed of Hard Real-Time and Best-Effort Tasks

Leveraging Transactional Memory for a Predictable Execution of Applications Composed of Hard Real-Time and Best-Effort Tasks Leveraging Transactional Memory for a Predictable Execution of Applications Composed of Hard Real-Time and Best-Effort Tasks Stefan Metzlaff, Sebastian Weis, and Theo Ungerer Department of Computer Science,

More information

Scheduling I. Today Introduction to scheduling Classical algorithms. Next Time Advanced topics on scheduling

Scheduling I. Today Introduction to scheduling Classical algorithms. Next Time Advanced topics on scheduling Scheduling I Today Introduction to scheduling Classical algorithms Next Time Advanced topics on scheduling Scheduling out there You are the manager of a supermarket (ok, things don t always turn out the

More information

CSE 548: Analysis of Algorithms. Lecture 12 ( Analyzing Parallel Algorithms )

CSE 548: Analysis of Algorithms. Lecture 12 ( Analyzing Parallel Algorithms ) CSE 548: Analysis of Algorithms Lecture 12 ( Analyzing Parallel Algorithms ) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Fall 2017 Why Parallelism? Moore s Law Source: Wikipedia

More information

2/5/07 CSE 30341: Operating Systems Principles

2/5/07 CSE 30341: Operating Systems Principles page 1 Shortest-Job-First (SJR) Scheduling Associate with each process the length of its next CPU burst. Use these lengths to schedule the process with the shortest time Two schemes: nonpreemptive once

More information

CS/IT OPERATING SYSTEMS

CS/IT OPERATING SYSTEMS CS/IT 5 (CR) Total No. of Questions :09] [Total No. of Pages : 0 II/IV B.Tech. DEGREE EXAMINATIONS, DECEMBER- 06 CS/IT OPERATING SYSTEMS. a) System Boot Answer Question No. Compulsory. Answer One Question

More information

Mark Redekopp, All rights reserved. Lecture 1 Slides. Intro Number Systems Logic Functions

Mark Redekopp, All rights reserved. Lecture 1 Slides. Intro Number Systems Logic Functions Lecture Slides Intro Number Systems Logic Functions EE 0 in Context EE 0 EE 20L Logic Design Fundamentals Logic Design, CAD Tools, Lab tools, Project EE 357 EE 457 Computer Architecture Using the logic

More information

Clock-driven scheduling

Clock-driven scheduling Clock-driven scheduling Also known as static or off-line scheduling Michal Sojka Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Control Engineering November 8, 2017

More information

How to deal with uncertainties and dynamicity?

How to deal with uncertainties and dynamicity? How to deal with uncertainties and dynamicity? http://graal.ens-lyon.fr/ lmarchal/scheduling/ 19 novembre 2012 1/ 37 Outline 1 Sensitivity and Robustness 2 Analyzing the sensitivity : the case of Backfilling

More information

Embedded Systems Development

Embedded Systems Development Embedded Systems Development Lecture 2 Finite Automata & SyncCharts Daniel Kästner AbsInt Angewandte Informatik GmbH kaestner@absint.com Some things I forgot to mention 2 Remember the HISPOS registration

More information

Lightweight Real-Time Synchronization under P-EDF on Symmetric and Asymmetric Multiprocessors

Lightweight Real-Time Synchronization under P-EDF on Symmetric and Asymmetric Multiprocessors Consistent * Complete * Well Documented * Easy to Reuse * Technical Report MPI-SWS-216-3 May 216 Lightweight Real-Time Synchronization under P-EDF on Symmetric and Asymmetric Multiprocessors (extended

More information

Quantum Cluster Simulations of Low D Systems

Quantum Cluster Simulations of Low D Systems Quantum Cluster Simulations of Low D Systems Electronic Correlations on Many Length Scales M. Jarrell, University of Cincinnati High Perf. QMC Hybrid Method SP Sep. in 1D NEW MEM SP Sep. in 1D 2-Chain

More information

Design Patterns and Refactoring

Design Patterns and Refactoring Singleton Oliver Haase HTWG Konstanz 1 / 19 Description I Classification: object based creational pattern Puropse: ensure that a class can be instantiated exactly once provide global access point to single

More information

Adders allow computers to add numbers 2-bit ripple-carry adder

Adders allow computers to add numbers 2-bit ripple-carry adder Lecture 12 Logistics HW was due yesterday HW5 was out yesterday (due next Wednesday) Feedback: thank you! Things to work on: ig picture, ook chapters, Exam comments Last lecture dders Today Clarification

More information

Review for the Midterm Exam

Review for the Midterm Exam Review for the Midterm Exam 1 Three Questions of the Computational Science Prelim scaled speedup network topologies work stealing 2 The in-class Spring 2012 Midterm Exam pleasingly parallel computations

More information

Recap from the previous lecture on Analytical Modeling

Recap from the previous lecture on Analytical Modeling COSC 637 Parallel Computation Analytical Modeling of Parallel Programs (II) Edgar Gabriel Fall 20 Recap from the previous lecture on Analytical Modeling Speedup: S p = T s / T p (p) Efficiency E = S p

More information

Scheduling I. Today. Next Time. ! Introduction to scheduling! Classical algorithms. ! Advanced topics on scheduling

Scheduling I. Today. Next Time. ! Introduction to scheduling! Classical algorithms. ! Advanced topics on scheduling Scheduling I Today! Introduction to scheduling! Classical algorithms Next Time! Advanced topics on scheduling Scheduling out there! You are the manager of a supermarket (ok, things don t always turn out

More information