Lecture 16 More Profiling: gperftools, systemwide tools: oprofile, perf, DTrace, etc.
|
|
- Emily Shaw
- 5 years ago
- Views:
Transcription
1 Lecture 16 More Profiling: gperftools, systemwide tools: oprofile, perf, DTrace, etc. ECE 459: Programming for Performance March 6, 2014
2 Part I gperftools 2 / 49
3 Introduction to gperftools Google Performance Tools include: CPU profiler. Heap profiler. Heap checker. Faster (multithreaded) malloc. We ll mostly use the CPU profiler: purely statistical sampling; no recompilation; at most linking; and built-in visual output. 3 / 49
4 Google Perf Tools profiler usage You can use the profiler without any recompilation. Not recommended worse data. LD_PRELOAD="/ u s r / l i b / l i b p r o f i l e r. so " \ CPUPROFILE=t e s t. p r o f. / t e s t The other option is to link to the profiler: -lprofiler Both options read the CPUPROFILE environment variable: states the location to write the profile data. 4 / 49
5 Other Usage You can use the profiling library directly as well: #include <gperftools/profiler.h> Bracket code you want profiled with: ProfilerStart() ProfilerEnd() You can change the sampling frequency with the CPUPROFILE_FREQUENCY environment variable. Default value: / 49
6 pprof Usage Like gprof, it will analyze profiling results. % p p r o f t e s t t e s t. p r o f E n t e r s " i n t e r a c t i v e " mode % p p r o f t e x t t e s t t e s t. p r o f Outputs one l i n e p e r p r o c e d u r e % p p r o f gv t e s t t e s t. p r o f D i s p l a y s a n n o t a t e d c a l l graph v i a gv % p p r o f gv f o c u s=mutex t e s t t e s t. p r o f R e s t r i c t s to code p a t h s i n c l u d i n g a. Mutex. e n t r y % p p r o f gv f o c u s=mutex i g n o r e=s t r i n g t e s t t e s t. p r o f Code p a t h s i n c l u d i n g Mutex but not s t r i n g % p p r o f l i s t =g e t d i r t e s t t e s t. p r o f ( Per l i n e ) a n n o t a t e d s o u r c e l i s t i n g f o r g e t d i r ( ) % p p r o f disasm=g e t d i r t e s t t e s t. p r o f ( Per PC) a n n o t a t e d d i s a s s e m b l y f o r g e t d i r ( ) Can also output dot, ps, pdf or gif instead of gv. 6 / 49
7 Text Output Similar to the flat profile in gprof j o r i k e r examples master % p p r o f t e x t t e s t t e s t. p r o f Using l o c a l f i l e t e s t. Using l o c a l f i l e t e s t. p r o f. Removing k i l l p g from a l l s t a c k t r a c e s. T o t a l : 300 s a m p l e s % 31.7% % int_ power % 51.0% % f l o a t _ p o w e r % 68.0% % f l o a t _ m a t h _ h e l p e r % 84.7% % i n t _ m a t h _ h e l p e r % 90.7% % float_ ma th % 95.3% % int_math % 100.0% % main 0 0.0% 100.0% % l i b c _ s t a r t _ m a i n 0 0.0% 100.0% % _ s t a r t 7 / 49
8 Text Output Explained Columns, from left to right: Number of checks (samples) in this function. Percentage of checks in this function. Same as time in gprof. Percentage of checks in the functions printed so far. Equivalent to cumulative (but in %). Number of checks in this function and its callees. Percentage of checks in this function and its callees. Function name. 8 / 49
9 of 300 (100.0%) of 300 (100.0%) Graphical Output a.out Total samples: 300 Focusing on: 300 Dropped nodes with <= 1 abs(samples) Dropped edges with <= 0 samples _start 0 (0.0%) 300 libc_start_main 0 (0.0%) 300 main 14 (4.7%) of 300 (100.0%) float_math 18 (6.0%) of 131 (43.7%) 16 4 int_math 3 14 (4.7%) of 159 (53.0%) int_math_helper 50 (16.7%) of 137 (45.7%) float_math_helper 51 (17.0%) of 96 (32.0%) int_power 95 (31.7%) of 102 (34.0%) float_power 58 (19.3%) 51 9 / 49
10 Graphical Output Explained Output was too small to read on the slide. Shows the same numbers as the text output. Directed edges denote function calls. Note: number of samples in callees = number in this function + callees - number in this function. Example: float_math_helper, 51 (local) of 96 (cumulative) = 45 (callees) callee int_power = 7 (bogus) callee float_power = 38 callees total = / 49
11 Things You May Notice Call graph is not exact. In fact, it shows many bogus relations which clearly don t exist. For instance, we know that there are no cross-calls between int and float functions. As with gprof, optimizations will change the graph. You ll probably want to look at the text profile first, then use the focus flag to look at individual functions. 11 / 49
12 Part II System profiling: oprofile, perf, DTrace, WAIT 12 / 49
13 Introduction: oprofile Sampling-based tool. Uses CPU performance counters. Tracks currently-running function; records profiling data for every application run. Can work system-wide (across processes). Technology: Linux Kernel Performance Events (formerly a Linux kernel module). 13 / 49
14 Setting up oprofile Must run as root to use system-wide, otherwise can use per-process. % sudo o p c o n t r o l \ v m l i n u x=/u s r / s r c / l i n u x ARCH/ v m l i n u x % echo 0 sudo t e e / p r o c / s y s / k e r n e l / nmi_watchdog % sudo o p c o n t r o l s t a r t Using d e f a u l t e v e n t : CPU_CLK_UNHALTED: : 0 : 1 : 1 Using 2.6+ O P r o f i l e k e r n e l i n t e r f a c e. Reading module i n f o. Using l o g f i l e / v a r / l i b / o p r o f i l e / s a m p l e s / o p r o f i l e d. l o g Daemon s t a r t e d. P r o f i l e r r u n n i n g. Per-process: [ plam@lynch nm morph ] $ o p e r f. / t e s t _ h a r n e s s o p e r f : P r o f i l e r s t a r t e d P r o f i l i n g done. 14 / 49
15 oprofile Usage (1) Pass your executable to opreport. % sudo o p r e p o r t l. / t e s t CPU : I n t e l Core / i7, speed MHz ( e s t i m a t e d ) Counted CPU_CLK_UNHALTED e v e n t s ( Clock c y c l e s when not h a l t e d ) w i t h a u n i t mask o f 0 x00 (No u n i t mask ) count s a m p l e s % symbol name i n t _ m a t h _ h e l p e r int_ power f l o a t _ p o w e r flo at_ math int_math f l o a t _ m a t h _ h e l p e r main If you have debug symbols (-g) you could use: % sudo o p a n n o t a t e s o u r c e \ output d i r =/path / to / annotated s o u r c e / path / to / mybinary 15 / 49
16 oprofile Usage (2) Use opreport by itself for a whole-system view. You can also reset and stop the profiling. % sudo o p c o n t r o l r e s e t S i g n a l l i n g daemon... done % sudo o p c o n t r o l s t o p S t o p p i n g p r o f i l i n g. 16 / 49
17 Perf: Introduction Interface to Linux kernel built-in sampling-based profiling. Per-process, per-cpu, or system-wide. Can even report the cost of each line of code. 17 / 49
18 Perf: Usage Example On last year s Assignment 3 code: [ plam@lynch nm morph ] $ p e r f s t a t. / t e s t _ h a r n e s s Performance c o u n t e r s t a t s f o r. / t e s t _ h a r n e s s : task c l o c k # CPUs u t i l i z e d 666 c o n t e x t s w i t c h e s # K/ s e c 0 cpu m i g r a t i o n s # K/ s e c 3,791 page f a u l t s # K/ s e c 24,874,267,078 c y c l e s # GHz [ 83.32%] 12,565,457,337 s t a l l e d c y c l e s f r o n t e n d # 50.52% f r o n t e n d c y c l e s i d l e [ % ] 5,874,853,028 s t a l l e d c y c l e s backend # 23.62% backend c y c l e s i d l e [ % ] 33,787,408,650 i n s t r u c t i o n s # i n s n s p e r c y c l e # s t a l l e d c y c l e s p e r i n s n [ % ] 5,271,501,213 b r a n c h e s # M/ s e c [ % ] 155,568,356 branch misses # 2.95% of a l l branches [ 83.36%] s e c o n d s time e l a p s e d 18 / 49
19 Perf: Source-level Analysis perf can tell you which instructions are taking time, or which lines of code. Compile with -ggdb to enable source code viewing. % p e r f r e c o r d. / t e s t _ h a r n e s s % p e r f a n n o t a t e perf annotate is interactive. Play around with it. 19 / 49
20 DTrace: Introduction Intrumentation-based tool. System-wide. Meant to be used on production systems. (Eh?) 20 / 49
21 DTrace: Introduction Intrumentation-based tool. System-wide. Meant to be used on production systems. (Eh?) (Typical instrumentation can have a slowdown of 100x (Valgrind).) Design goals: 1 No overhead when not in use; 2 Guarantee safety must not crash (strict limits on expressiveness of probes). 21 / 49
22 DTrace: Operation How does DTrace achieve 0 overhead? only when activated, dynamically rewrites code by placing a branch to instrumentation code. Uninstrumented: runs as if nothing changed. Most instrumentation: at function entry or exit points. You can also instrument kernel functions, locking, instrument-based on other events. Can express sampling as instrumentation-based events also. 22 / 49
23 DTrace Example You write this: s y s c a l l : : r e a d : e n t r y { s e l f >t = timestamp ; } s y s c a l l : : r e a d : r e t u r n / s e l f >t / { p r i n t f ("%d/%d s p e n t %d n s e c s i n r e a d \n " pid, t i d, timestamp s e l f >t ) ; } t is a thread-local variable. This code prints how long each call to read takes, along with context. To ensure safety, DTrace limits what you write; e.g. no loops. (Hence, no infinite loops!) 23 / 49
24 Other Tools AMD CodeAnalyst based on oprofile; leverages AMD processor features. WAIT IBM s tool tells you what operations your JVM is waiting on while idle. Non-free and not available. 24 / 49
25 Other Tools AMD CodeAnalyst based on oprofile. WAIT IBM s tool tells you what operations your JVM is waiting on while idle. Non-free and not available. 25 / 49
26 WAIT: Introduction Built for production environments. Specialized for profiling JVMs, uses JVM hooks to analyze idle time. Sampling-based analysis; infrequent samples (1 2 per minute!) 26 / 49
27 WAIT: Operation At each sample: records each thread s state, call stack; participation in system locks. Enables WAIT to compute a wait state (using expert-written rules): what the process is currently doing or waiting on, e.g. disk; GC; network; blocked; etc. 27 / 49
28 WAIT: Workflow You: run your application; collect data (using a script or manually); and upload the data to the server. Server provides a report. You fix the performance problems. Report indicates processor utilization (idle, your application, GC, etc); runnable threads; waiting threads (and why they are waiting); thread states; and a stack viewer. Paper presents 6 case studies where WAIT identified performance problems: deadlocks, server underloads, memory leaks, database bottlenecks, and excess filesystem activity. 28 / 49
29 Other Profiling Tools Profiling: Not limited to C/C++, or even code. You can profile Python using cprofile; standard profiling technology. Google s Page Speed Tool: profiling for web pages how can you make your page faster? reducing number of DNS lookups; leveraging browser caching; combining images; plus, traditional JavaScript profiling. 29 / 49
30 Part III Reentrancy 30 / 49
31 Reentrancy A function can be suspended in the middle and re-entered (called again) before the previous execution returns. Does not always mean thread-safe (although it usually is). Recall: thread-safe is essentially no data races. Moot point if the function only modifies local data, e.g. sin(). 31 / 49
32 Reentrancy Example Courtesy of Wikipedia (with modifications): i n t t ; v o i d swap ( i n t x, i n t y ) { t = x ; x = y ; // hardware i n t e r r u p t might i n v o k e i s r ( ) h e r e! y = t ; } v o i d i s r ( ) { i n t x = 1, y = 2 ; swap(&x, &y ) ; }... i n t a = 3, b = 4 ;... swap(&a, &b ) ; 32 / 49
33 Reentrancy Example Explained (a trace) c a l l swap(&a, &b ) ; t = x ; // t = 3 ( a ) x = y ; // a = 4 ( b ) c a l l i s r ( ) ; x = 1 ; y = 2 ; c a l l swap(&x, &y ) t = x ; // t = 1 ( x ) x = y ; // x = 2 ( y ) y = t ; // y = 1 y = t ; // b = 1 F i n a l v a l u e s : a = 4, b = 1 Expected v a l u e s : a = 4, b = 3 33 / 49
34 Reentrancy Example, Fixed i n t t ; v o i d swap ( i n t x, i n t y ) { i n t s ; } s = t ; // s a v e g l o b a l v a r i a b l e t = x ; x = y ; // hardware i n t e r r u p t might i n v o k e i s r ( ) h e r e! y = t ; t = s ; // r e s t o r e g l o b a l v a r i a b l e v o i d i s r ( ) { i n t x = 1, y = 2 ; swap(&x, &y ) ; }... i n t a = 3, b = 4 ;... swap(&a, &b ) ; 34 / 49
35 Reentrancy Example, Fixed Explained (a trace) c a l l swap(&a, &b ) ; s = t ; // s = UNDEFINED t = x ; // t = 3 ( a ) x = y ; // a = 4 ( b ) c a l l i s r ( ) ; x = 1 ; y = 2 ; c a l l swap(&x, &y ) s = t ; // s = 3 t = x ; // t = 1 ( x ) x = y ; // x = 2 ( y ) y = t ; // y = 1 t = s ; // t = 3 y = t ; // b = 3 t = s ; // t = UNDEFINED F i n a l v a l u e s : a = 4, b = 3 Expected v a l u e s : a = 4, b = 3 35 / 49
36 Previous Example: thread-safety Is the previous reentrant code also thread-safe? (This is more what we re concerned about in this course.) Let s see: i n t t ; v o i d swap ( i n t x, i n t y ) { i n t s ; } s = t ; // s a v e g l o b a l v a r i a b l e t = x ; x = y ; // hardware i n t e r r u p t might i n v o k e i s r ( ) h e r e! y = t ; t = s ; // r e s t o r e g l o b a l v a r i a b l e Consider two calls: swap(a, b), swap(c, d) with a = 1, b = 2, c = 3, d = / 49
37 Previous Example: thread-safety trace g l o b a l : t / t h r e a d 1 / / t h r e a d 2 / a = 1, b = 2 ; s = t ; // s = UNDEFINED t = a ; // t = 1 c = 3, d = 4 ; s = t ; // s = 1 t = c ; // t = 3 c = d ; // c = 4 d = t ; // d = 3 a = b ; // a = 2 b = t ; // b = 3 t = s ; // t = UNDEFINED t = s ; // t = 1 F i n a l v a l u e s : a = 2, b = 3, c = 4, d = 3, t = 1 Expected v a l u e s : a = 2, b = 1, c = 4, d = 3, t = UNDEFINED 37 / 49
38 Reentrancy vs Thread-Safety (1) Re-entrant does not always mean thread-safe (as we saw) But, for most sane implementations, it is thread-safe Ok, but are thread-safe functions reentrant? 38 / 49
39 Reentrancy vs Thread-Safety (2) Are thread-safe functions reentrant? Nope. Consider: i n t f ( ) { l o c k ( ) ; // p r o t e c t e d code u n l o c k ( ) ; } Recall: Reentrant functions can be suspended in the middle of execution and called again before the previous execution completes. f() obviously isn t reentrant. Plus, it will deadlock. Interrupt handling is more for systems programming, so the topic of reentrancy may or may not come up again. 39 / 49
40 Summary of Reentrancy vs Thread-Safety Difference between reentrant and thread-safe functions: Reentrancy Has nothing to do with threads assumes a single thread. Reentrant means the execution can context switch at any point in in a function, call the same function, and complete before returning to the original function call. Function s result does not depend on where the context switch happens. Thread-safety Result does not depend on any interleaving of threads from concurrency or parallelism. No unexpected results from multiple concurrent executions of the function. 40 / 49
41 Another Definition of Thread-Safe Functions A function whose effect, when called by two or more threads, is guaranteed to be as if the threads each executed the function one after another, in an undefined order, even if the actual execution is interleaved. 41 / 49
42 Good Example of an Exam Question v o i d swap ( i n t x, i n t y ) { i n t t ; t = x ; x = y ; y = t ; } Is the above code thread-safe? Write some expected results for running two calls in parallel. Argue these expected results always hold, or show an example where they do not. 42 / 49
43 Part IV Good Practices 43 / 49
44 Inlining We have seen the notion of inlining: Instructs the compiler to just insert the function code in-place, instead of calling the function. Hence, no function call overhead! Compilers can also do better context-sensitive operations they couldn t have done before. No overhead... sounds like better performance... let s inline everything! 44 / 49
45 Inlining in C++ Implicit inlining (defining a function inside a class definition): c l a s s P { p u b l i c : i n t get_x ( ) c o n s t { r e t u r n x ; }... p r i v a t e : i n t x ; } ; Explicit inlining: i n l i n e max ( c o n s t i n t& x, c o n s t i n t& y ) { r e t u r n x < y? y : x ; } 45 / 49
46 The Other Side of Inlining One big downside: Your program size is going to increase. This is worse than you think: Fewer cache hits. More trips to memory. Some inlines can grow very rapidly (C++ extended constructors). Just from this your performance may go down easily. 46 / 49
47 Compilers on Inlining Inlining is merely a suggestion to compilers. They may ignore you. For example: taking the address of an inline function and using it; or virtual functions (in C++), will get you ignored quite fast. 47 / 49
48 From a Usability Point-of-View Debugging is more difficult (e.g. you can t set a breakpoint in a function that doesn t actually exist). Most compilers simply won t inline code with debugging symbols on. Some do, but typically it s more of a pain. Library design: If you change any inline function in your library, any users of that library have to recompile their program if the library updates. (non-binary-compatible change!) Not a problem for non-inlined functions programs execute the new function dynamically at runtime. 48 / 49
49 Summary System profiling tools: oprofile, DTrace, WAIT, perf Reentrancy vs thread-safety. Inlining: limit your inlining to trivial functions: makes debugging easier and improves usability; won t slow down your program before you even start optimizing it. Tell the compiler high-level information but think low-level. 49 / 49
Lecture 15 Compiler Optimizations
Lecture 15 Compiler Optimizations ECE 459: Programming for Performance March 5, 2013 Previous Lecture Lock Granularity What Reentrant means Inlining Leveraging Abstractions (C++) 2 / 36 About That Midterm
More informationClojure Concurrency Constructs, Part Two. CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014
Clojure Concurrency Constructs, Part Two CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014 1 Goals Cover the material presented in Chapter 4, of our concurrency textbook In particular,
More informationAdministrivia. Course Objectives. Overview. Lecture Notes Week markem/cs333/ 2. Staff. 3. Prerequisites. 4. Grading. 1. Theory and application
Administrivia 1. markem/cs333/ 2. Staff 3. Prerequisites 4. Grading Course Objectives 1. Theory and application 2. Benefits 3. Labs TAs Overview 1. What is a computer system? CPU PC ALU System bus Memory
More informationSpeculative Parallelism in Cilk++
Speculative Parallelism in Cilk++ Ruben Perez & Gregory Malecha MIT May 11, 2010 Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 1 / 33 Parallelizing Embarrassingly Parallel
More informationReasoning About Imperative Programs. COS 441 Slides 10b
Reasoning About Imperative Programs COS 441 Slides 10b Last time Hoare Logic: { P } C { Q } Agenda If P is true in the initial state s. And C in state s evaluates to s. Then Q must be true in s. Program
More informationUC Santa Barbara. Operating Systems. Christopher Kruegel Department of Computer Science UC Santa Barbara
Operating Systems Christopher Kruegel Department of Computer Science http://www.cs.ucsb.edu/~chris/ Many processes to execute, but one CPU OS time-multiplexes the CPU by operating context switching Between
More informationAbstractions and Decision Procedures for Effective Software Model Checking
Abstractions and Decision Procedures for Effective Software Model Checking Prof. Natasha Sharygina The University of Lugano, Carnegie Mellon University Microsoft Summer School, Moscow, July 2011 Lecture
More informationChe-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University
Che-Wei Chang chewei@mail.cgu.edu.tw Department of Computer Science and Information Engineering, Chang Gung University } 2017/11/15 Midterm } 2017/11/22 Final Project Announcement 2 1. Introduction 2.
More informationClock-driven scheduling
Clock-driven scheduling Also known as static or off-line scheduling Michal Sojka Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Control Engineering November 8, 2017
More informationLab Course: distributed data analytics
Lab Course: distributed data analytics 01. Threading and Parallelism Nghia Duong-Trung, Mohsan Jameel Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim, Germany International
More informationINF 4140: Models of Concurrency Series 3
Universitetet i Oslo Institutt for Informatikk PMA Olaf Owe, Martin Steffen, Toktam Ramezani INF 4140: Models of Concurrency Høst 2016 Series 3 14. 9. 2016 Topic: Semaphores (Exercises with hints for solution)
More informationLecture 4: Process Management
Lecture 4: Process Management Process Revisited 1. What do we know so far about Linux on X-86? X-86 architecture supports both segmentation and paging. 48-bit logical address goes through the segmentation
More informationPerformance Metrics for Computer Systems. CASS 2018 Lavanya Ramapantulu
Performance Metrics for Computer Systems CASS 2018 Lavanya Ramapantulu Eight Great Ideas in Computer Architecture Design for Moore s Law Use abstraction to simplify design Make the common case fast Performance
More informationLatches. October 13, 2003 Latches 1
Latches The second part of CS231 focuses on sequential circuits, where we add memory to the hardware that we ve already seen. Our schedule will be very similar to before: We first show how primitive memory
More informationBayes Nets III: Inference
1 Hal Daumé III (me@hal3.name) Bayes Nets III: Inference Hal Daumé III Computer Science University of Maryland me@hal3.name CS 421: Introduction to Artificial Intelligence 10 Apr 2012 Many slides courtesy
More informationCHAPTER 5 - PROCESS SCHEDULING
CHAPTER 5 - PROCESS SCHEDULING OBJECTIVES To introduce CPU scheduling, which is the basis for multiprogrammed operating systems To describe various CPU-scheduling algorithms To discuss evaluation criteria
More informationSafety and Liveness. Thread Synchronization: Too Much Milk. Critical Sections. A Really Cool Theorem
Safety and Liveness Properties defined over an execution of a program Thread Synchronization: Too Much Milk Safety: nothing bad happens holds in every finite execution prefix Windows never crashes No patient
More informationDeadlock. CSE 2431: Introduction to Operating Systems Reading: Chap. 7, [OSC]
Deadlock CSE 2431: Introduction to Operating Systems Reading: Chap. 7, [OSC] 1 Outline Resources Deadlock Deadlock Prevention Deadlock Avoidance Deadlock Detection Deadlock Recovery 2 Review: Synchronization
More informationProcess Scheduling for RTS. RTS Scheduling Approach. Cyclic Executive Approach
Process Scheduling for RTS Dr. Hugh Melvin, Dept. of IT, NUI,G RTS Scheduling Approach RTS typically control multiple parameters concurrently Eg. Flight Control System Speed, altitude, inclination etc..
More informationMulticore Semantics and Programming
Multicore Semantics and Programming Peter Sewell Tim Harris University of Cambridge Oracle October November, 2015 p. 1 These Lectures Part 1: Multicore Semantics: the concurrency of multiprocessors and
More informationCSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis. Ruth Anderson Winter 2019
CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis Ruth Anderson Winter 2019 Today Algorithm Analysis What do we care about? How to compare two algorithms Analyzing Code Asymptotic Analysis
More informationAn Experimental Evaluation of Passage-Based Process Discovery
An Experimental Evaluation of Passage-Based Process Discovery H.M.W. Verbeek and W.M.P. van der Aalst Technische Universiteit Eindhoven Department of Mathematics and Computer Science P.O. Box 513, 5600
More informationComputational Complexity. This lecture. Notes. Lecture 02 - Basic Complexity Analysis. Tom Kelsey & Susmit Sarkar. Notes
Computational Complexity Lecture 02 - Basic Complexity Analysis Tom Kelsey & Susmit Sarkar School of Computer Science University of St Andrews http://www.cs.st-andrews.ac.uk/~tom/ twk@st-andrews.ac.uk
More informationOperating Systems. VII. Synchronization
Operating Systems VII. Synchronization Ludovic Apvrille ludovic.apvrille@telecom-paristech.fr Eurecom, office 470 http://soc.eurecom.fr/os/ @OS Eurecom Outline Synchronization issues 2/22 Fall 2017 Institut
More informationLecture 13: Sequential Circuits, FSM
Lecture 13: Sequential Circuits, FSM Today s topics: Sequential circuits Finite state machines 1 Clocks A microprocessor is composed of many different circuits that are operating simultaneously if each
More informationDiscrete-event simulations
Discrete-event simulations Lecturer: Dmitri A. Moltchanov E-mail: moltchan@cs.tut.fi http://www.cs.tut.fi/kurssit/elt-53606/ OUTLINE: Why do we need simulations? Step-by-step simulations; Classifications;
More informationUnit: Blocking Synchronization Clocks, v0.3 Vijay Saraswat
Unit: Blocking Synchronization Clocks, v0.3 Vijay Saraswat This lecture discusses X10 clocks. For reference material please look at the chapter on Clocks in []. 1 Motivation The central idea underlying
More informationSorting. CMPS 2200 Fall Carola Wenk Slides courtesy of Charles Leiserson with changes and additions by Carola Wenk
CMPS 2200 Fall 2017 Sorting Carola Wenk Slides courtesy of Charles Leiserson with changes and additions by Carola Wenk 11/17/17 CMPS 2200 Intro. to Algorithms 1 How fast can we sort? All the sorting algorithms
More informationMath 54 Homework 3 Solutions 9/
Math 54 Homework 3 Solutions 9/4.8.8.2 0 0 3 3 0 0 3 6 2 9 3 0 0 3 0 0 3 a a/3 0 0 3 b b/3. c c/3 0 0 3.8.8 The number of rows of a matrix is the size (dimension) of the space it maps to; the number of
More informationINF Models of concurrency
INF4140 - Models of concurrency Fall 2017 October 17, 2017 Abstract This is the handout version of the slides for the lecture (i.e., it s a rendering of the content of the slides in a way that does not
More informationA GUI FOR EVOLVE ZAMS
A GUI FOR EVOLVE ZAMS D. R. Schlegel Computer Science Department Here the early work on a new user interface for the Evolve ZAMS stellar evolution code is presented. The initial goal of this project is
More informationINF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)
INF2270 Spring 2010 Philipp Häfliger Summary/Repetition (1/2) content From Scalar to Superscalar Lecture Summary and Brief Repetition Binary numbers Boolean Algebra Combinational Logic Circuits Encoder/Decoder
More informationPHYSICS 15a, Fall 2006 SPEED OF SOUND LAB Due: Tuesday, November 14
PHYSICS 15a, Fall 2006 SPEED OF SOUND LAB Due: Tuesday, November 14 GENERAL INFO The goal of this lab is to determine the speed of sound in air, by making measurements and taking into consideration the
More informationUsing web-based Java pplane applet to graph solutions of systems of differential equations
Using web-based Java pplane applet to graph solutions of systems of differential equations Our class project for MA 341 involves using computer tools to analyse solutions of differential equations. This
More informationStatic Program Analysis
Static Program Analysis Xiangyu Zhang The slides are compiled from Alex Aiken s Michael D. Ernst s Sorin Lerner s A Scary Outline Type-based analysis Data-flow analysis Abstract interpretation Theorem
More informationCSE370: Introduction to Digital Design
CSE370: Introduction to Digital Design Course staff Gaetano Borriello, Brian DeRenzi, Firat Kiyak Course web www.cs.washington.edu/370/ Make sure to subscribe to class mailing list (cse370@cs) Course text
More informationLab 4. Current, Voltage, and the Circuit Construction Kit
Physics 2020, Spring 2009 Lab 4 Page 1 of 8 Your name: Lab section: M Tu Wed Th F TA name: 8 10 12 2 4 Lab 4. Current, Voltage, and the Circuit Construction Kit The Circuit Construction Kit (CCK) is a
More informationCS1210 Lecture 23 March 8, 2019
CS1210 Lecture 23 March 8, 2019 HW5 due today In-discussion exams next week Optional homework assignment next week can be used to replace a score from among HW 1 3. Will be posted some time before Monday
More informationComp 11 Lectures. Mike Shah. July 26, Tufts University. Mike Shah (Tufts University) Comp 11 Lectures July 26, / 40
Comp 11 Lectures Mike Shah Tufts University July 26, 2017 Mike Shah (Tufts University) Comp 11 Lectures July 26, 2017 1 / 40 Please do not distribute or host these slides without prior permission. Mike
More informationITI Introduction to Computing II
(with contributions from R. Holte) School of Electrical Engineering and Computer Science University of Ottawa Version of January 9, 2019 Please don t print these lecture notes unless you really need to!
More informationCSE 380 Computer Operating Systems
CSE 380 Computer Operating Systems Instructor: Insup Lee & Dianna Xu University of Pennsylvania, Fall 2003 Lecture Note 3: CPU Scheduling 1 CPU SCHEDULING q How can OS schedule the allocation of CPU cycles
More informationWorst-Case Execution Time Analysis. LS 12, TU Dortmund
Worst-Case Execution Time Analysis Prof. Dr. Jian-Jia Chen LS 12, TU Dortmund 02, 03 May 2016 Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 1 / 53 Most Essential Assumptions for Real-Time Systems Upper
More informationBinding Performance and Power of Dense Linear Algebra Operations
10th IEEE International Symposium on Parallel and Distributed Processing with Applications Binding Performance and Power of Dense Linear Algebra Operations Maria Barreda, Manuel F. Dolz, Rafael Mayo, Enrique
More information1 SIMPLE PENDULUM 1 L (1.1)
1 SIMPLE PENDULUM 1 October 13, 2015 1 Simple Pendulum IMPORTANT: You must work through the derivation required for this assignment before you turn up to the laboratory. You are also expected to know what
More informationCSE. 1. In following code. addi. r1, skip1 xor in r2. r3, skip2. counter r4, top. taken): PC1: PC2: PC3: TTTTTT TTTTTT
CSE 560 Practice Problem Set 4 Solution 1. In this question, you will examine several different schemes for branch prediction, using the following code sequence for a simple load store ISA with no branch
More informationTable of Content. Chapter 11 Dedicated Microprocessors Page 1 of 25
Chapter 11 Dedicated Microprocessors Page 1 of 25 Table of Content Table of Content... 1 11 Dedicated Microprocessors... 2 11.1 Manual Construction of a Dedicated Microprocessor... 3 11.2 FSM + D Model
More informationAnalysis of Software Artifacts
Analysis of Software Artifacts System Performance I Shu-Ngai Yeung (with edits by Jeannette Wing) Department of Statistics Carnegie Mellon University Pittsburgh, PA 15213 2001 by Carnegie Mellon University
More information( )( b + c) = ab + ac, but it can also be ( )( a) = ba + ca. Let s use the distributive property on a couple of
Factoring Review for Algebra II The saddest thing about not doing well in Algebra II is that almost any math teacher can tell you going into it what s going to trip you up. One of the first things they
More informationHYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017
HYCOM and Navy ESPC Future High Performance Computing Needs Alan J. Wallcraft COAPS Short Seminar November 6, 2017 Forecasting Architectural Trends 3 NAVY OPERATIONAL GLOBAL OCEAN PREDICTION Trend is higher
More informationCSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis. Ruth Anderson Winter 2018
CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis Ruth Anderson Winter 2018 Today Algorithm Analysis What do we care about? How to compare two algorithms Analyzing Code Asymptotic Analysis
More informationLecture 23: Illusiveness of Parallel Performance. James C. Hoe Department of ECE Carnegie Mellon University
18 447 Lecture 23: Illusiveness of Parallel Performance James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L23 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Your goal today Housekeeping peel
More informationBinocular Observing 101: A guide to binocular observing at LBT
Binocular Observing 101: A guide to binocular observing at LBT 481s625f 23-March-2014 Presented by: John M. Hill LBT Technical Director Outline of the presentation Introduction to Telescope Control Concepts
More information1 Trees. Listing 1: Node with two child reference. public class ptwochildnode { protected Object data ; protected ptwochildnode l e f t, r i g h t ;
1 Trees The next major set of data structures belongs to what s called Trees. They are called that, because if you try to visualize the structure, it kind of looks like a tree (root, branches, and leafs).
More informationCSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis. Ruth Anderson Winter 2018
CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis Ruth Anderson Winter 2018 Today Algorithm Analysis What do we care about? How to compare two algorithms Analyzing Code Asymptotic Analysis
More informationECE 448 Lecture 6. Finite State Machines. State Diagrams, State Tables, Algorithmic State Machine (ASM) Charts, and VHDL Code. George Mason University
ECE 448 Lecture 6 Finite State Machines State Diagrams, State Tables, Algorithmic State Machine (ASM) Charts, and VHDL Code George Mason University Required reading P. Chu, FPGA Prototyping by VHDL Examples
More informationAnnouncements. CS 188: Artificial Intelligence Fall Causality? Example: Traffic. Topology Limits Distributions. Example: Reverse Traffic
CS 188: Artificial Intelligence Fall 2008 Lecture 16: Bayes Nets III 10/23/2008 Announcements Midterms graded, up on glookup, back Tuesday W4 also graded, back in sections / box Past homeworks in return
More informationCptS 464/564 Fall Prof. Dave Bakken. Cpt. S 464/564 Lecture January 26, 2014
Overview of Ordering and Logical Time Prof. Dave Bakken Cpt. S 464/564 Lecture January 26, 2014 Context This material is NOT in CDKB5 textbook Rather, from second text by Verissimo and Rodrigues, chapters
More informationThe New Horizons Geometry Visualizer: Planning the Encounter with Pluto
The New Horizons Geometry Visualizer: Planning the Encounter with Pluto IDL User Group October 16, 2008 LASP, Boulder, CO Dr. Henry Throop Sr. Research Scientist Southwest Research Institute Boulder, CO
More informationComp 11 Lectures. Mike Shah. July 26, Tufts University. Mike Shah (Tufts University) Comp 11 Lectures July 26, / 45
Comp 11 Lectures Mike Shah Tufts University July 26, 2017 Mike Shah (Tufts University) Comp 11 Lectures July 26, 2017 1 / 45 Please do not distribute or host these slides without prior permission. Mike
More informationTimeline of a Vulnerability
Introduction Timeline of a Vulnerability Is this all a conspiracy? Vulnerability existed for many years 2 Michael Schwarz (@misc0110) www.iaik.tugraz.at Timeline of a Vulnerability Is this all a conspiracy?
More informationChemical Reaction Engineering Prof. JayantModak Department of Chemical Engineering Indian Institute of Science, Bangalore
Chemical Reaction Engineering Prof. JayantModak Department of Chemical Engineering Indian Institute of Science, Bangalore Module No. #05 Lecture No. #29 Non Isothermal Reactor Operation Let us continue
More informationBiology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week:
Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week: Course general information About the course Course objectives Comparative methods: An overview R as language: uses and
More informationCS 700: Quantitative Methods & Experimental Design in Computer Science
CS 700: Quantitative Methods & Experimental Design in Computer Science Sanjeev Setia Dept of Computer Science George Mason University Logistics Grade: 35% project, 25% Homework assignments 20% midterm,
More informationExample: 2x y + 3z = 1 5y 6z = 0 x + 4z = 7. Definition: Elementary Row Operations. Example: Type I swap rows 1 and 3
Linear Algebra Row Reduced Echelon Form Techniques for solving systems of linear equations lie at the heart of linear algebra. In high school we learn to solve systems with or variables using elimination
More informationTimeline of a Vulnerability
Timeline of a Vulnerability Is this all a conspiracy? Vulnerability existed for many years 1 Daniel Gruss, Moritz Lipp, Michael Schwarz www.iaik.tugraz.at Timeline of a Vulnerability Is this all a conspiracy?
More informationLecture 2: Metrics to Evaluate Systems
Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with AM, GM, HM Sign up for the class mailing list! Video
More informationModel Checking, Theorem Proving, and Abstract Interpretation: The Convergence of Formal Verification Technologies
Model Checking, Theorem Proving, and Abstract Interpretation: The Convergence of Formal Verification Technologies Tom Henzinger EPFL Three Verification Communities Model checking: -automatic, but inefficient
More informationEfficient Dependency Tracking for Relevant Events in Concurrent Systems
Distributed Computing manuscript No. (will be inserted by the editor) Anurag Agarwal Vijay K. Garg Efficient Dependency Tracking for Relevant Events in Concurrent Systems Received: date / Accepted: date
More informationSorting. CS 3343 Fall Carola Wenk Slides courtesy of Charles Leiserson with small changes by Carola Wenk
CS 3343 Fall 2010 Sorting Carola Wenk Slides courtesy of Charles Leiserson with small changes by Carola Wenk 10/7/10 CS 3343 Analysis of Algorithms 1 How fast can we sort? All the sorting algorithms we
More informationCS 453 Operating Systems. Lecture 7 : Deadlock
CS 453 Operating Systems Lecture 7 : Deadlock 1 What is Deadlock? Every New Yorker knows what a gridlock alert is - it s one of those days when there is so much traffic that nobody can move. Everything
More informationCSE 241 Class 1. Jeremy Buhler. August 24,
CSE 41 Class 1 Jeremy Buhler August 4, 015 Before class, write URL on board: http://classes.engineering.wustl.edu/cse41/. Also: Jeremy Buhler, Office: Jolley 506, 314-935-6180 1 Welcome and Introduction
More informationCompiling Techniques
Lecture 11: Introduction to 13 November 2015 Table of contents 1 Introduction Overview The Backend The Big Picture 2 Code Shape Overview Introduction Overview The Backend The Big Picture Source code FrontEnd
More informationAlgorithms and Programming I. Lecture#1 Spring 2015
Algorithms and Programming I Lecture#1 Spring 2015 CS 61002 Algorithms and Programming I Instructor : Maha Ali Allouzi Office: 272 MSB Office Hours: T TH 2:30:3:30 PM Email: mallouzi@kent.edu The Course
More informationActors for Reactive Programming. Actors Origins
Actors Origins Hewitt, early 1970s (1973 paper) Around the same time as Smalltalk Concurrency plus Scheme Agha, early 1980s (1986 book) Erlang (Ericsson), 1990s Akka, 2010s Munindar P. Singh (NCSU) Service-Oriented
More informationDesigning geared 3D twisty puzzles (on the example of Gear Pyraminx) Timur Evbatyrov, Jan 2012
Designing geared 3D twisty puzzles (on the example of Gear Pyraminx) Timur Evbatyrov, Jan 2012 Introduction In this tutorial I ll demonstrate you my own method of designing geared 3D twisty puzzles on
More informationPerformance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6
Performance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6 Sangwon Joo, Yoonjae Kim, Hyuncheol Shin, Eunhee Lee, Eunjung Kim (Korea Meteorological Administration) Tae-Hun
More informationComputer Architecture
Computer Architecture QtSpim, a Mips simulator S. Coudert and R. Pacalet January 4, 2018..................... Memory Mapping 0xFFFF000C 0xFFFF0008 0xFFFF0004 0xffff0000 0x90000000 0x80000000 0x7ffff4d4
More informationCosmic Ray Detector Software
Cosmic Ray Detector Software Studying cosmic rays has never been easier Matthew Jones Purdue University 2012 QuarkNet Summer Workshop 1 Brief History First cosmic ray detector built at Purdue in about
More informationSequential programs. Uri Abraham. March 9, 2014
Sequential programs Uri Abraham March 9, 2014 Abstract In this lecture we deal with executions by a single processor, and explain some basic notions which are important for concurrent systems as well.
More informationCIS 4930/6930: Principles of Cyber-Physical Systems
CIS 4930/6930: Principles of Cyber-Physical Systems Chapter 11 Scheduling Hao Zheng Department of Computer Science and Engineering University of South Florida H. Zheng (CSE USF) CIS 4930/6930: Principles
More informationGenerate i/o load on your vm and use iostat to demonstrate the increased in i/o
Jae Sook Lee SP17 CSIT 432-01 Dr. Chris Leberknight iostat Generate i/o load on your vm and use iostat to demonstrate the increased in i/o Terminal # 1 iostat -c -d -t 2 100 > JaeSookLee_iostat.txt
More informationReal Analysis Prof. S.H. Kulkarni Department of Mathematics Indian Institute of Technology, Madras. Lecture - 13 Conditional Convergence
Real Analysis Prof. S.H. Kulkarni Department of Mathematics Indian Institute of Technology, Madras Lecture - 13 Conditional Convergence Now, there are a few things that are remaining in the discussion
More informationGPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications
GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications Christopher Rodrigues, David J. Hardy, John E. Stone, Klaus Schulten, Wen-Mei W. Hwu University of Illinois at Urbana-Champaign
More informationLecture 27: Theory of Computation. Marvin Zhang 08/08/2016
Lecture 27: Theory of Computation Marvin Zhang 08/08/2016 Announcements Roadmap Introduction Functions Data Mutability Objects This week (Applications), the goals are: To go beyond CS 61A and see examples
More informationUpdate and Modernization of Sales Tax Rate Lookup Tool for Public and Agency Users. David Wrigh
Update and Modernization of Sales Tax Rate Lookup Tool for Public and Agency Users David Wrigh GIS at the Agency Introduction Who we are! George Alvarado, David Wright, Marty Parsons and Bob Bulgrien make
More informationAutomata Theory CS S-12 Turing Machine Modifications
Automata Theory CS411-2015S-12 Turing Machine Modifications David Galles Department of Computer Science University of San Francisco 12-0: Extending Turing Machines When we added a stack to NFA to get a
More informationPhysics 2020 Lab 5 Intro to Circuits
Physics 2020 Lab 5 Intro to Circuits Name Section Tues Wed Thu 8am 10am 12pm 2pm 4pm Introduction In this lab, we will be using The Circuit Construction Kit (CCK). CCK is a computer simulation that allows
More informationDESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018
DESIGN THEORY FOR RELATIONAL DATABASES csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018 1 Introduction There are always many different schemas for a given
More informationHowto make an Arduino fast enough to... Willem Maes
Howto make an Arduino fast enough to... Willem Maes May 1, 2018 0.1 Introduction Most problems student face when using the Arduino in their projects seems to be the speed. There seems to be a general opinion
More informationCISC 4090: Theory of Computation Chapter 1 Regular Languages. Section 1.1: Finite Automata. What is a computer? Finite automata
CISC 4090: Theory of Computation Chapter Regular Languages Xiaolan Zhang, adapted from slides by Prof. Werschulz Section.: Finite Automata Fordham University Department of Computer and Information Sciences
More informationToday s Agenda: 1) Why Do We Need To Measure The Memory Component? 2) Machine Pool Memory / Best Practice Guidelines
Today s Agenda: 1) Why Do We Need To Measure The Memory Component? 2) Machine Pool Memory / Best Practice Guidelines 3) Techniques To Measure The Memory Component a) Understanding Your Current Environment
More informationOur Problem. Model. Clock Synchronization. Global Predicate Detection and Event Ordering
Our Problem Global Predicate Detection and Event Ordering To compute predicates over the state of a distributed application Model Clock Synchronization Message passing No failures Two possible timing assumptions:
More informationPhysics Motion Math. (Read objectives on screen.)
Physics 302 - Motion Math (Read objectives on screen.) Welcome back. When we ended the last program, your teacher gave you some motion graphs to interpret. For each section, you were to describe the motion
More informationThis chapter covers asymptotic analysis of function growth and big-o notation.
Chapter 14 Big-O This chapter covers asymptotic analysis of function growth and big-o notation. 14.1 Running times of programs An important aspect of designing a computer programs is figuring out how well
More informationIntroduction to Algorithms
Lecture 1 Introduction to Algorithms 1.1 Overview The purpose of this lecture is to give a brief overview of the topic of Algorithms and the kind of thinking it involves: why we focus on the subjects that
More informationITI Introduction to Computing II
(with contributions from R. Holte) School of Electrical Engineering and Computer Science University of Ottawa Version of January 11, 2015 Please don t print these lecture notes unless you really need to!
More informationMarwan Burelle. Parallel and Concurrent Programming. Introduction and Foundation
and and marwan.burelle@lse.epita.fr http://wiki-prog.kh405.net Outline 1 2 and 3 and Evolutions and Next evolutions in processor tends more on more on growing of cores number GPU and similar extensions
More informationLet s now begin to formalize our analysis of sequential machines Powerful methods for designing machines for System control Pattern recognition Etc.
Finite State Machines Introduction Let s now begin to formalize our analysis of sequential machines Powerful methods for designing machines for System control Pattern recognition Etc. Such devices form
More informationCOSMOS: Making Robots and Making Robots Intelligent Lecture 3: Introduction to discrete-time dynamics
COSMOS: Making Robots and Making Robots Intelligent Lecture 3: Introduction to discrete-time dynamics Jorge Cortés and William B. Dunbar June 3, 25 Abstract In this and the coming lecture, we will introduce
More informationEE 144/244: Fundamental Algorithms for System Modeling, Analysis, and Optimization Fall 2016
EE 144/244: Fundamental Algorithms for System Modeling, Analysis, and Optimization Fall 2016 Discrete Event Simulation Stavros Tripakis University of California, Berkeley Stavros Tripakis (UC Berkeley)
More information