GRAPHITE Two Years After: First Lessons Learned From Real-World Polyhedral Compilation


1 GRAPHITE Two Years After: First Lessons Learned From Real-World Polyhedral Compilation

Konrad Trifunovic (2), Albert Cohen (2), David Edelsohn (3), Li Feng (6), Tobias Grosser (5), Harsha Jagasia (1), Razya Ladelsky (4), Sebastian Pop (1), Jan Sjödin (1), Ramakrishna Upadrasta (2)

(1) Open Source Compiler Engineering, AMD, Austin, Texas, USA; (2) INRIA Saclay Île-de-France and LRI, Paris-Sud 11 University, Orsay, France; (3) IBM T. J. Watson Research, Yorktown Heights, USA; (4) IBM Haifa Research, Haifa, Israel; (5) University of Passau, Passau, Germany; (6) Xi'an Jiaotong University, Xi'an, China

January 30, 2010, GROW Workshop, Pisa, Italy

5 1. Motivation: Keeping up a sustained performance increase

Multi-level parallelism:
- Instruction-level parallelism, ILP (instruction scheduling)
- Data-level parallelism (vectorization)
- Thread-level parallelism (automatic parallelization)

Memory hierarchy:
- Caches
- Registers
- Scratchpad memories

Hence the need for complex program (loop) optimizations.

8 2. Why the polyhedral model in GCC?

Source-to-source compilers:
- Syntax based
- Output source code might lose semantic information
- Need for source code normalization

Low-level internal polyhedral representation:
- Semantics based: SSA GIMPLE form, scalar evolution analysis (inductions, reductions)
- Leverages the > 100 optimization passes of GCC
- Tight interaction with the vectorizer, the parallelizer and memory layout optimizations

9 3. Compilation workflow

Front ends (C, C++, F95) produce GENERIC, which is lowered to GIMPLE and then to GIMPLE + CFG + SSA + LOOP form, on which the GRAPHITE pass runs; afterwards the code flows through GIMPLE (SSA, CFG) to RTL and finally to assembly for x86, PPC or SPU.

Inside the GRAPHITE pass (operating on GIMPLE, SSA, CFG): SCoP detection finds SCoPs, GPOLY construction builds the polyhedral representation, a legality check validates transformations, the transformations produce a transformed GPOLY, and GLOOG (CLooG based) regenerates GIMPLE.

11 4. Polyhedral model: Iteration domains (GPOLY)

    for (v = 0; v < N; v++)
      for (h = 0; h < N; h++)
        out[v][h] = 0;

Iteration domain of the statement: D_S = {(v, h) | 0 ≤ v ≤ N−1 ∧ 0 ≤ h ≤ N−1}

(Figure: the square 2D iteration space over axes v and h, bounded by 0 ≤ v < N and 0 ≤ h < N.)

The domain is stored as a system of affine inequalities over the vector (v, h, N, 1), encoding the four constraints v ≥ 0, v ≤ N−1, h ≥ 0, h ≤ N−1.
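As a small illustration (not part of the original slides), the sketch below enumerates the integer points of D_S directly from an inequality system and checks that they match the iterations executed by the loop nest. The row layout of the constraint matrix (coefficients of v, h, N and the constant) is my own assumption; it simply mirrors the four bounds listed above.

    /* Hypothetical sketch: enumerate D_S = {(v,h) | 0 <= v,h <= N-1} from its
       inequality rows and compare against the loop-nest iterations. */
    #include <stdio.h>

    #define N 4
    #define NROWS 4

    /* Each row encodes c_v*v + c_h*h + c_N*N + c_1 >= 0 (assumed layout). */
    static const int A[NROWS][4] = {
        { 1,  0, 0,  0},  /* v >= 0         */
        {-1,  0, 1, -1},  /* N - 1 - v >= 0 */
        { 0,  1, 0,  0},  /* h >= 0         */
        { 0, -1, 1, -1},  /* N - 1 - h >= 0 */
    };

    static int in_domain(int v, int h) {
        for (int r = 0; r < NROWS; r++)
            if (A[r][0]*v + A[r][1]*h + A[r][2]*N + A[r][3] < 0)
                return 0;
        return 1;
    }

    int main(void) {
        int points = 0, iterations = 0;
        /* Count integer points of the polyhedron... */
        for (int v = -1; v <= N; v++)
            for (int h = -1; h <= N; h++)
                points += in_domain(v, h);
        /* ...and the iterations actually executed by the loop nest. */
        for (int v = 0; v < N; v++)
            for (int h = 0; h < N; h++)
                iterations++;
        printf("points = %d, iterations = %d\n", points, iterations);
        return 0;
    }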

12 4. Polyhedral model: Data accesses

Data accesses map iterations to memory: f(i, g) = F (i, g, 1)^T

(Figure: the iteration domain of the statement over axes t1 and t2, next to the linearized memory layout of the array out, with elements out[1][1], out[1][2], ..., out[3][3] laid out row by row; the access function sends each iteration (v, h) to the memory location it touches.)
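To make the idea concrete, here is a small sketch of my own (not from the slides) that maps an iteration (v, h) to the row-major linear offset of out[v][h] and checks it against the address the compiler itself computes, i.e., the linearized memory layout from the figure. In GPOLY the access function records the per-dimension subscripts; the linear offset is what the array layout maps them to.

    /* Hypothetical sketch: row-major linearization of the access out[v][h],
       assuming out is declared as out[N][N]. */
    #include <stdio.h>

    #define N 3

    /* Linear offset of element (v, h) in a row-major N x N array. */
    static int linear_offset(int v, int h) {
        return v * N + h;
    }

    int main(void) {
        int out[N][N];
        int ok = 1;
        for (int v = 0; v < N; v++)
            for (int h = 0; h < N; h++) {
                /* Compare the affine offset with the actual element address. */
                int *lin = &out[0][0] + linear_offset(v, h);
                if (lin != &out[v][h])
                    ok = 0;
            }
        printf("linearization matches layout: %s\n", ok ? "yes" : "no");
        return 0;
    }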


21 4. Polyhedral model: Scheduling

Scheduling defines the execution order: t = θ_S(i) = Θ_S (i, g, 1)^T, where i = (v, h) and g = (N).

Original order, Θ_S = [1 0 0 0; 0 1 0 0], i.e. t = (v, h):

    for (v = 0; v < N; v++)
      for (h = 0; h < N; h++)
        out[v][h] = 0;

Interchanged order, Θ_S = [0 1 0 0; 1 0 0 0], i.e. t = (h, v):

    for (t1 = 0; t1 < N; t1++)
      for (t2 = 0; t2 < N; t2++)
        out[t2][t1] = 0;
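The following standalone sketch is my addition: it applies both Θ_S matrices above to every iteration of the domain and prints the resulting time stamps; the lexicographic order of the time stamps is the execution order, so the second schedule reproduces the interchanged loop nest.

    /* Hypothetical sketch: t = Theta_S (v, h, N, 1)^T for the identity and
       the interchange schedules. */
    #include <stdio.h>

    #define N 3

    static void apply(const int Theta[2][4], int v, int h, int t[2]) {
        const int x[4] = { v, h, N, 1 };
        for (int r = 0; r < 2; r++) {
            t[r] = 0;
            for (int c = 0; c < 4; c++)
                t[r] += Theta[r][c] * x[c];
        }
    }

    int main(void) {
        const int original[2][4]    = { {1, 0, 0, 0}, {0, 1, 0, 0} };  /* t = (v, h) */
        const int interchange[2][4] = { {0, 1, 0, 0}, {1, 0, 0, 0} };  /* t = (h, v) */
        for (int v = 0; v < N; v++)
            for (int h = 0; h < N; h++) {
                int t0[2], t1[2];
                apply(original, v, h, t0);
                apply(interchange, v, h, t1);
                printf("(v=%d,h=%d)  original t=(%d,%d)  interchange t=(%d,%d)\n",
                       v, h, t0[0], t0[1], t1[0], t1[1]);
            }
        return 0;
    }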

22 5. SSA-based polyhedral model

MVT kernel:

    for (i = 0; i < N; i++) {
      b[i] = 0;
      for (j = 0; j < N; j++)
        b[i] += A[i][j] * x[j];
    }

GIMPLE SSA form:

    bb 3:
      i_21 = PHI <i_11(7), 0(2)>
      b[i_21] = 0.0;
      b_i_lsm.5_16 = b[i_21];
    bb 4:
      j_22 = PHI <j_10(5), 0(3)>
      pre.3_28 = PHI <D.3_9(5), 0.0(3)>
      D.0_6 = A[i_21][j_22];
      D.1_7 = x[j_22];
      D.2_8 = D.1_7 * D.0_6;
      D.3_9 = D.2_8 + pre.3_28;
      b_i_lsm.5_5 = D.3_9;
      j_10 = j_22 + 1;
      if (j_10 < N) goto <bb 5>; else goto <bb 6>;
    bb 5:
      goto <bb 4>;
    bb 6:
      b_i_lsm.5_30 = PHI <b_i_lsm.5_5(4)>
      b[i_21] = b_i_lsm.5_30;
      i_11 = i_21 + 1;
      if (i_11 < N) goto <bb 7>; else goto <bb 8>;
    bb 7:
      goto <bb 3>;

Polyhedral representation:

    Domains:   D_bb3 = {(i) | 0 ≤ i ≤ N−1}
               D_bb4 = {(i, j) | 0 ≤ i ≤ N−1 ∧ 0 ≤ j ≤ N−1}
               D_bb6 = {(i) | 0 ≤ i ≤ N−1}
    Accesses:  F_dr1 = {(i, a, s1) | a = 0 ∧ s1 = i ∧ 0 ≤ s1 ≤ N−1}
               F_dr2 = {(i, j, a, s1) | a = 1 ∧ s1 = j ∧ 0 ≤ s1 ≤ N−1}
               F_dr4 = {(i, a, s1) | a = 0 ∧ s1 = i ∧ 0 ≤ s1 ≤ N−1}
    Schedules: θ_bb3 = {(i, t1, t2, t3) | t1 = 0 ∧ t2 = i ∧ t3 = 0}
               θ_bb4 = {(i, j, t1, t2, t3, t4, t5) | t1 = 0 ∧ t2 = i ∧ t3 = 1 ∧ t4 = j ∧ t5 = 0}
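As an added illustration (not on the slide), the scattering functions above already encode that b[i] = 0 runs before every inner-loop iteration with the same i: for a fixed i, θ_bb3 yields (0, i, 0) while θ_bb4 yields (0, i, 1, j, 0), and the third component orders them. A minimal sketch comparing the (zero-padded) time stamps lexicographically:

    /* Hypothetical sketch: compare the time stamps theta_bb3(i) = (0, i, 0)
       and theta_bb4(i, j) = (0, i, 1, j, 0). */
    #include <stdio.h>

    #define DIMS 5

    /* Lexicographic comparison; shorter vectors are padded with zeros. */
    static int lex_before(const int *a, int na, const int *b, int nb) {
        for (int k = 0; k < DIMS; k++) {
            int av = k < na ? a[k] : 0;
            int bv = k < nb ? b[k] : 0;
            if (av != bv)
                return av < bv;
        }
        return 0;
    }

    int main(void) {
        int N = 4, ok = 1;
        for (int i = 0; i < N; i++) {
            int t_bb3[3] = { 0, i, 0 };
            for (int j = 0; j < N; j++) {
                int t_bb4[5] = { 0, i, 1, j, 0 };
                if (!lex_before(t_bb3, 3, t_bb4, 5))
                    ok = 0;
            }
        }
        printf("b[i] = 0 scheduled before all (i, j) iterations: %s\n",
               ok ? "yes" : "no");
        return 0;
    }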

26 6. Research: Cost modelling for vectorization

Scalar convolution kernel:

    for (v = 0; v < N; v++)
      for (h = 0; h < N; h++) {
        s = 0;
        for (i = 0; i < K; i++)
          for (j = 0; j < K; j++)
            s += img[v+i][h+j] * filter[i][j];
        out[v][h] = s;
      }

Vectorized version (array-slice notation):

    for (v = 0; v < N; v++)
      for (h = 0; h < N; h++) {
        s = 0;
        for (i = 0; i < K; i++) {
          vs[0:3] = {0, 0, 0, 0};
          for (j = 0; j < K; j += 4)
            vs[0:3] += img[v+i][h+j : h+j+3] * filter[i][j : j+3];
          s += sum(vs[0:3]);
        }
        out[v][h] = s;
      }

Reduction costs: the sum operation reducing vector vs into scalar s executes N^2 * K times.
Benefits: VF = 4 scalar operations are replaced by 1 vector operation.

[Trifunovic et al. 2009]
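A back-of-the-envelope version of such a cost comparison might look like the sketch below. The cost constants and the exact formula are assumptions of mine; only the structure follows the slide and the loop nests above: N^2 * K^2 scalar multiply-adds versus N^2 * K * ceil(K/VF) vector operations plus N^2 * K reductions.

    /* Hypothetical cost-model sketch: estimated scalar vs. vectorized cost
       of the convolution kernel. Cost constants are made-up assumptions. */
    #include <stdio.h>

    #define VF 4  /* vectorization factor */

    static long scalar_cost(long n, long k, long c_scalar_op) {
        /* N^2 * K^2 scalar multiply-accumulates. */
        return n * n * k * k * c_scalar_op;
    }

    static long vector_cost(long n, long k, long c_vector_op, long c_reduction) {
        long vec_ops    = n * n * k * ((k + VF - 1) / VF);  /* vector multiply-accumulates */
        long reductions = n * n * k;                        /* sum(vs) runs N^2 * K times */
        return vec_ops * c_vector_op + reductions * c_reduction;
    }

    int main(void) {
        long n = 1024, k = 8;
        long sc = scalar_cost(n, k, 1);
        long vc = vector_cost(n, k, 1, 4);  /* assume a reduction costs 4 units */
        printf("scalar = %ld, vector = %ld -> %s\n",
               sc, vc, vc < sc ? "vectorize" : "keep scalar");
        return 0;
    }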

28 6. Research: Cost modelling for vectorization

Where vectorized loop selection happens in the pipeline:
- Front end
- Middle end (GIMPLE SSA):
  - Loop nest optimization: GRAPHITE pass with the loop-nest-level analytical model
  - Vectorization pass: vectorization API with the instruction-level model
- Back end (RTL)

29 6. Research: Automatic parallelization (autopar)

(a) Sequential loop nest; the dependence x[h][v-1] -> x[h][v] is carried by the inner v loop, so the outer h loop is parallel:

    parloop() {
      for (h = 0; h < N; h++)
        for (v = 1; v < N; v++)
          x[h][v] = x[h][v-1] + 1;
    }

(b) Code generated by autopar: the h loop is outlined and distributed over 4 threads through the GOMP runtime:

    parloop() {
      .paral_data.x = &x;
      builtin_gomp_parallel_start(parloop._loopfn, &.paral_data, 4);
      parloop._loopfn(&.paral_data);
      builtin_gomp_parallel_end();
    }

    parloop._loopfn(.paral_data) {
      for (h = start; h < end; h++)
        for (v = 1; v < N; v++)
          (*.paral_data->x)[h][v] = x[h][v-1] + 1;
    }
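For comparison (my addition, not what GRAPHITE emits), the same parallelization can be written by hand with OpenMP; an "omp parallel for" on the h loop is roughly what the outlined function and GOMP calls in (b) correspond to inside GCC.

    /* Hand-written OpenMP equivalent of the loop that autopar parallelizes:
       the h loop carries no dependence, so its iterations run in parallel. */
    #include <stdio.h>
    #include <omp.h>

    #define N 64

    static int x[N][N];

    int main(void) {
        #pragma omp parallel for num_threads(4)
        for (int h = 0; h < N; h++)
            for (int v = 1; v < N; v++)
                x[h][v] = x[h][v-1] + 1;

        printf("x[N-1][N-1] = %d\n", x[N-1][N-1]);  /* expect N-1 */
        return 0;
    }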

30 7. Alias Analysis and GRAPHITE: Encoding aliasing information

Representation of alias sets in GRAPHITE:
- Dependence analysis requires alias information
- Alias sets are encoded as an extra dimension of the access functions

Example with memory objects A_1 and A_2:

    int a[10], b[10];
    void foo(int *p);

Points-to mapping: a -> {A_1}, p -> {A_1, A_2}, b -> {A_2}

Finding a minimal alias-set numbering is equivalent to solving Minimum Edge Clique Cover (ECC), an NP-complete problem.

31 7. Alias Analysis and GRAPHITE: Empirical analysis on alias graphs (4481 graphs)

- Only 11 graphs are interesting, with up to 90 vertices
- In all the others, every connected component is a clique!

(Figure: panels (i) and (ii) of an alias graph from H.264, with vertex groups {1,5}, {2,8}, {3,7}, {4,6} and {9, 10, ..., 16}.)

Future work: a faster algorithm exploiting modular decomposition properties. Currently the fastest is an O(|V| |E|) algorithm ([Gramm et al. 2009], in Haskell, using Patricia trees, which does not seem simple to implement).
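The empirical observation above is cheap to verify: the sketch below (an illustration I wrote, not GRAPHITE code) checks whether every connected component of an undirected alias graph is a clique, which is exactly the case in which the edge clique cover is trivial, one clique per non-trivial component.

    /* Hypothetical sketch: check that every connected component of an
       undirected graph (adjacency matrix) is a clique. */
    #include <stdio.h>
    #include <string.h>

    #define MAXV 16

    static int nv;
    static int adj[MAXV][MAXV];
    static int comp[MAXV];

    static void dfs(int u, int c) {
        comp[u] = c;
        for (int w = 0; w < nv; w++)
            if (adj[u][w] && comp[w] < 0)
                dfs(w, c);
    }

    static int components_are_cliques(void) {
        memset(comp, -1, sizeof comp);
        int c = 0;
        for (int u = 0; u < nv; u++)
            if (comp[u] < 0)
                dfs(u, c++);
        /* Two distinct vertices in the same component must be adjacent. */
        for (int u = 0; u < nv; u++)
            for (int w = u + 1; w < nv; w++)
                if (comp[u] == comp[w] && !adj[u][w])
                    return 0;
        return 1;
    }

    int main(void) {
        nv = 4;  /* toy graph: {0,1,2} form a triangle, vertex 3 is isolated */
        adj[0][1] = adj[1][0] = 1;
        adj[1][2] = adj[2][1] = 1;
        adj[0][2] = adj[2][0] = 1;
        printf("every component is a clique: %s\n",
               components_are_cliques() ? "yes" : "no");
        return 0;
    }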

32 8. Development

Libraries used:
- PPL, the Parma Polyhedra Library
- CLooG, the Chunky Loop Generator

(Figure: number of commits per year.)

Weekly phone calls: every Wednesday, Pisa time, sip: @iptel.org

33 9. Bibliography

[Gramm et al. 2009] J. Gramm, J. Guo, F. Hüffner and R. Niedermeier. Data reduction and exact algorithms for clique cover. J. Exp. Algorithmics, 14, 2009.

[Trifunovic et al. 2009] K. Trifunovic, D. Nuzman, A. Cohen, A. Zaks and I. Rosen. Polyhedral-Model Guided Loop-Nest Auto-Vectorization. In Parallel Architectures and Compilation Techniques (PACT'09), Raleigh, North Carolina, September 2009.

34 10. Questions

Thank you for your attention. Questions?
