Automatic Loop Interchange


RETROSPECTIVE: Automatic Loop Interchange

Randy Allen
Catalytic Compilers
1900 Embarcadero Rd, #206
Palo Alto, CA

Ken Kennedy
Department of Computer Science
Rice University
Houston, TX

Retrospectives provide a rare and interesting opportunity to reflect upon the past and recall (or more accurately after a couple of decades, speculate upon) our state of mind and understandings at earlier times. Automatic Loop Interchange was published almost 20 years ago at a midpoint in research on data dependence and program transformations. This paper pulled together the work of many predecessors [9,10,11,15] into a simple, clean theory, providing a checkpoint on earlier predictions on the power and applicability of data dependence. At the same time, the paper was published just as the field of data dependence entered a golden age. It is our hope that this paper helped catalyze the following research into data dependence theory and applications.

A sabbatical at IBM catalyzed our initial interest in automatic vectorization. We started development with a version of the Parafrase system built at the University of Illinois by Dave Kuck and his group (including Michael Wolfe) [9,15], but we began work on an entirely new system in 1981 to provide a better platform for our research on multilevel code generation [8] (eventually published in 1987 [3]). The new system became known as the Parallel Fortran Converter (PFC).

At the time we started the PFC project, few programmers had access to vectorizing compilers that used data dependence. Vector units and vectorizing compilers were employed exclusively on expensive high-end machines or specialized array processors, which were available to only a small percentage of the general programming public. Despite this limited access, vectorizing compilers had already earned the informal nicknames "paralyzers" and "terrorizers" due to their large compile times and often less-than-optimal output.

We began our effort with modest expectations.
PFC was deliberately structured as a source-to-source translator, primarily because we believed the algorithms that we wanted to employ would require more compile-time than could be justified in a production compiler. We also doubted the power of data dependence, and expected that we would need to employ techniques from artificial intelligence to achieve satisfactory results. This paper marked a point in PFC's development where our early assumptions had been proved wrong. What PFC and this paper had shown was that a fairly simple set of program transformations based on a unified underlying theory could provide effective restructuring without requiring unacceptable compile times.

20 Years of the ACM/SIGPLAN Conference on Programming Language Design and Implementation ( ): A Selection, Copyright 2003 ACM $5.00

Foundation

Although this paper is entitled Automatic Loop Interchange, it is far broader in scope. As an introduction to interchange, the paper also covers a wide spectrum of dependence-based theory and transformations. This work is built on the efforts of many others, and we would be remiss if we did not acknowledge at least some of those efforts; acknowledging all of them would quickly blow our page limits.

The earliest papers on dependence-based program transformations include papers by Lamport [10,11] and Kuck [9]. Lamport developed a form of loop interchange for use in vectorization, as well as the wavefront method for parallelization, an early form of what came to be called loop skewing. As indicated earlier, we also had access to the Parafrase system and the associated body of research. In particular, Michael Wolfe's Master's thesis focused on loop interchange [15], a topic that he developed further in later works [16,17].
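
To make the idea of a legal interchange concrete, here is a minimal Python sketch. It is not taken from the paper or from PFC. It assumes each dependence in a loop nest is summarized as a direction vector with one entry ('<', '=', or '>') per loop, outermost first, and it applies the standard condition that a reordering is legal only if no reordered vector has '>' as its leftmost non-'=' entry. The function names are hypothetical.

```python
def leftmost_direction(vector):
    """Return the first entry that is not '=', or '=' if every entry is '='."""
    for direction in vector:
        if direction != "=":
            return direction
    return "="


def interchange_is_legal(direction_vectors, outer, inner):
    """Check whether swapping loop positions `outer` and `inner` is legal.

    The swap is applied to every direction vector; it is rejected as soon as
    one swapped vector would start with '>', because the sink of that
    dependence would then run before its source.
    """
    for vector in direction_vectors:
        swapped = list(vector)
        swapped[outer], swapped[inner] = swapped[inner], swapped[outer]
        if leftmost_direction(swapped) == ">":
            return False
    return True
```

For a statement such as A(I,J) = A(I-1,J) + 1 the only dependence has direction vector ('<', '='), so swapping the I and J loops is accepted. For A(I,J) = A(I-1,J+1) + 1 the vector is ('<', '>'), and the swap is rejected.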
Our own work on the subject began with our multilevel code generation strategy [8, 1, 2, 3], which we implemented in the summer of […]. Real implementations often provide incredible insights into the weaknesses of theoretical approaches; this was definitely true in the case of PFC. The code generation strategy proved extremely effective in practice and was far more efficient in terms of compile-time than we had anticipated.¹ However, a real implementation quickly showed us that loop interchange was the key missing piece. While PFC performed well in terms of the vectorization it detected, we quickly saw that loop interchange was the key incremental transformation.

The practical strategy presented in the paper ("innermosting" loops that carried no dependence, testing loops that carried dependences for interchange only to the next deeper position) evolved out of discussions with Randy Scarborough, Joe Warren, and others in the PFC project. The strategy reported in this paper was implemented in the PFC system. Although we reported no experimental results in the paper, a later study reviewed in our book [4] showed that PFC was able to do extremely well on the Callahan, Dongarra, and Levine vectorization tests [7].

¹ At that time, we had to pay for computer time by the CPU-minute. The first time that we tried a large test case (roughly 1000 lines of code), Ken insisted that we limit the CPU time to 10 minutes (which was still several thousand dollars of computer time) to avoid blowing our research budget. We didn't expect the test case to complete in the time limit; when it took only 40 seconds, we assumed that PFC had crashed processing the input. It took us a day of wading through the output to verify that it had in fact completely and correctly processed the test.

ACM SIGPLAN 75 Best of PLDI
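
The practical strategy described above can be sketched in terms of direction vectors. The following Python is an illustrative reading of that strategy, not PFC's actual implementation. It assumes one direction vector per dependence, with entries '<', '=', or '>' ordered from the outermost loop inward, and treats a loop as carrying a dependence when it holds the leftmost non-'=' entry of some vector. All function names are hypothetical.

```python
def carrier(vector):
    """Return the loop level that carries this dependence, or None if it is loop-independent."""
    for level, direction in enumerate(vector):
        if direction != "=":
            return level
    return None


def dependence_free_loops(direction_vectors, depth):
    """List the loop levels (0 = outermost) that carry no dependence."""
    carriers = {carrier(vector) for vector in direction_vectors}
    return [level for level in range(depth) if level not in carriers]


def innermost_order(direction_vectors, depth):
    """Return a loop order that keeps carrying loops outermost, in their
    original relative order, and moves dependence-free loops innermost."""
    free = dependence_free_loops(direction_vectors, depth)
    carrying = [level for level in range(depth) if level not in free]
    return carrying + free


def can_sink_one_level(direction_vectors, level):
    """Test whether the loop at `level` may be interchanged with the next deeper loop."""
    for vector in direction_vectors:
        swapped = list(vector)
        swapped[level], swapped[level + 1] = swapped[level + 1], swapped[level]
        first = carrier(swapped)
        if first is not None and swapped[first] == ">":
            return False
    return True
```

For A(I,J) = A(I,J-1) + 1 with loops I (level 0) and J (level 1), the direction vector is ('=', '<'). The I loop carries no dependence, so `innermost_order([("=", "<")], 2)` gives `[1, 0]`: J becomes the outer loop and I the inner loop, where it can run as a vector operation.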

Impact

The approaches to dependence and loop interchange presented in this paper were soon incorporated into a number of commercial compilers. We are directly aware of the implementations in the IBM compiler for the 3090 Vector Feature [13] and the Convex vectorizing compiler, and were involved in the implementation of the Ardent restructuring compilers.

Beyond the immediate practical impact, Automatic Loop Interchange also established interchange as a fundamental transformation in all advanced optimizing compilers: vectorizing, parallelizing, and even scalar. While many previous papers had focused on dependences as execution constraints that limit reordering, this paper (in a section devoted to other applications of dependence) also pointed out the dual aspect: dependences represented reused memory locations. Accordingly, dependence provided a basis for optimizing for memory hierarchies by moving the most frequently accessed memory locations into the fastest elements of the hierarchy. Later research would prove loop interchange to be as important for moving dependences into inner loops (thereby optimizing memory reuse) as it had proven to be for moving dependences out of inner loops (as was necessary for vectorizing loops). Particularly important exemplars of this research are the papers by Callahan, Carr, and Kennedy on register optimization [6] and by Wolf and Lam on cache blocking [14]. Both papers are included in this volume. Practical implementations that included this aspect of dependence include the Ardent compiler [5].

Future Applications

Looking back over the past 18 years, we doubt that we would have predicted the impact of loop interchange on the compiler literature.
Although our own work and the work of others went on to more powerful transformation strategies based on direction and distance matrices [4, 14, 16, 17], this work was one of the first to establish that powerful and effective program transformations could be implemented in practical compiler systems. Of course, one reason for the growth in importance of this work is the increased use of parallelism in computer architecture and the increasing disparity between CPU and memory speeds. Looking to the future, we believe these factors are only going to increase in the design of computer systems, making these compiler techniques even more relevant. Memory hierarchies in particular are increasingly dominating computation times, and automatic loop interchange is a key transformation for exploiting that hierarchy.

While loop interchange has been thoroughly explored in the context of restructuring compilers, there are other contexts which have not been so thoroughly explored. For instance, given the intimate relationship between dependence and loop iterations, it is natural to assume that dependence and loop interchange should have as important a role to play in the design of pipelined architectures as it does in exploiting pipelined architectures.

Bibliography

1. J. R. Allen. Dependence analysis for subscripted variables and its application to program transformations. Ph.D. dissertation, Department of Mathematical Sciences, Rice University, May.
2. J. R. Allen and K. Kennedy. PFC: a program to convert Fortran to parallel form. In Supercomputers: Design and Applications, K. Hwang, editor. IEEE Computer Society Press, August.
3. J. R. Allen and K. Kennedy. Automatic translation of Fortran programs to vector form. ACM Transactions on Programming Languages and Systems, 9(4), October.
4. R. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures. Morgan Kaufmann.
5. R. Allen. Unifying vectorization, parallelization, and optimization: the Ardent compiler. In Proceedings of the Third International Conference on Supercomputing.
6. D. Callahan, S. Carr, and K. Kennedy. Improving register allocation for subscripted variables. In PLDI '90 (also included in this volume).
7. D. Callahan, J. Dongarra, and D. Levine. Vectorizing compilers: A test suite and results. In Proceedings of Supercomputing '88, Orlando, FL.
8. K. Kennedy. Automatic translation of Fortran programs to vector form. Rice Technical Report, Department of Mathematical Sciences, Rice University.
9. D. Kuck, R. Kuhn, D. Padua, B. Leasure, and M. J. Wolfe. Dependence graphs and compiler optimizations. In Conference Record of the Eighth Annual ACM Symposium on the Principles of Programming Languages, Williamsburg, VA, January.
10. L. Lamport. The parallel execution of DO loops. Communications of the ACM, 17(2):83-93, February.
11. L. Lamport. The coordinate method for the parallel execution of iterative DO loops. Technical Report CA […], SRI, Menlo Park, CA, August 1976, revised October.
12. D. A. Padua and M. J. Wolfe. Advanced compiler optimizations for supercomputers. Communications of the ACM, 29(12), December.
13. R. G. Scarborough and H. G. Kolsky. A vectorizing FORTRAN compiler. IBM Journal of Research and Development, March.
14. M. E. Wolf and M. Lam. A data locality optimizing algorithm. In PLDI '91 (also included in this volume).
15. M. J. Wolfe. Techniques for improving the inherent parallelism in programs. Master's thesis, Dept. of Computer Science, University of Illinois at Urbana-Champaign, July.
16. M. J. Wolfe. Advanced loop interchanging. In Proceedings of the 1986 International Conference on Parallel Processing, St. Charles, IL, August.
17. M. J. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley, Redwood City, CA.

Acknowledgements

As was the case at the time the paper was published, this work has progressed over the years only by the efforts and collaborations of others far too numerous to list here. However, we would be remiss if we did not acknowledge the contributions of Randy Scarborough, Joe Warren, Horace Flatt, and all the graduate students who worked on PFC.


Compiler Optimisation

Compiler Optimisation Compiler Optimisation 8 Dependence Analysis Hugh Leather IF 1.18a hleather@inf.ed.ac.uk Institute for Computing Systems Architecture School of Informatics University of Edinburgh 2018 Introduction This

More information

Dependence Analysis. Dependence Examples. Last Time: Brief introduction to interprocedural analysis. do I = 2, 100 A(I) = A(I-1) + 1 enddo

Dependence Analysis. Dependence Examples. Last Time: Brief introduction to interprocedural analysis. do I = 2, 100 A(I) = A(I-1) + 1 enddo Dependence Analysis Dependence Examples Last Time: Brief introduction to interprocedural analysis Today: Optimization for parallel machines and memory hierarchies Dependence analysis Loop transformations

More information

Improving Memory Hierarchy Performance Through Combined Loop. Interchange and Multi-Level Fusion

Improving Memory Hierarchy Performance Through Combined Loop. Interchange and Multi-Level Fusion Improving Memory Hierarchy Performance Through Combined Loop Interchange and Multi-Level Fusion Qing Yi Ken Kennedy Computer Science Department, Rice University MS-132 Houston, TX 77005 Abstract Because

More information

Loop Scheduling and Software Pipelining \course\cpeg421-08s\topic-7.ppt 1

Loop Scheduling and Software Pipelining \course\cpeg421-08s\topic-7.ppt 1 Loop Scheduling and Software Pipelining 2008-04-24 \course\cpeg421-08s\topic-7.ppt 1 Reading List Slides: Topic 7 and 7a Other papers as assigned in class or homework: 2008-04-24 \course\cpeg421-08s\topic-7.ppt

More information

Loop Interchange. Loop Transformations. Taxonomy. do I = 1, N do J = 1, N S 1 A(I,J) = A(I-1,J) + 1 enddo enddo. Loop unrolling.

Loop Interchange. Loop Transformations. Taxonomy. do I = 1, N do J = 1, N S 1 A(I,J) = A(I-1,J) + 1 enddo enddo. Loop unrolling. Advanced Topics Which Loops are Parallel? review Optimization for parallel machines and memory hierarchies Last Time Dependence analysis Today Loop transformations An example - McKinley, Carr, Tseng loop

More information

Timing analysis and predictability of architectures

Timing analysis and predictability of architectures Timing analysis and predictability of architectures Cache analysis Claire Maiza Verimag/INP 01/12/2010 Claire Maiza Synchron 2010 01/12/2010 1 / 18 Timing Analysis Frequency Analysis-guaranteed timing

More information

MICROPROCESSOR REPORT. THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE

MICROPROCESSOR REPORT.   THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE MICROPROCESSOR www.mpronline.com REPORT THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE ENERGY COROLLARIES TO AMDAHL S LAW Analyzing the Interactions Between Parallel Execution and Energy Consumption By

More information

Static Program Analysis using Abstract Interpretation

Static Program Analysis using Abstract Interpretation Static Program Analysis using Abstract Interpretation Introduction Static Program Analysis Static program analysis consists of automatically discovering properties of a program that hold for all possible

More information

Electrostatic Breakdown Analysis

Electrostatic Breakdown Analysis Utah State University DigitalCommons@USU Senior Theses and Projects Materials Physics 11-18-2014 Electrostatic Breakdown Analysis Sam Hansen Utah State University Follow this and additional works at: https://digitalcommons.usu.edu/mp_seniorthesesprojects

More information

Classes of data dependence. Dependence analysis. Flow dependence (True dependence)

Classes of data dependence. Dependence analysis. Flow dependence (True dependence) Dependence analysis Classes of data dependence Pattern matching and replacement is all that is needed to apply many source-to-source transformations. For example, pattern matching can be used to determine

More information

Dependence analysis. However, it is often necessary to gather additional information to determine the correctness of a particular transformation.

Dependence analysis. However, it is often necessary to gather additional information to determine the correctness of a particular transformation. Dependence analysis Pattern matching and replacement is all that is needed to apply many source-to-source transformations. For example, pattern matching can be used to determine that the recursion removal

More information

ICS 233 Computer Architecture & Assembly Language

ICS 233 Computer Architecture & Assembly Language ICS 233 Computer Architecture & Assembly Language Assignment 6 Solution 1. Identify all of the RAW data dependencies in the following code. Which dependencies are data hazards that will be resolved by

More information

A Simple Model for Sequences of Relational State Descriptions

A Simple Model for Sequences of Relational State Descriptions A Simple Model for Sequences of Relational State Descriptions Ingo Thon, Niels Landwehr, and Luc De Raedt Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001 Heverlee,

More information

Announcements PA2 due Friday Midterm is Wednesday next week, in class, one week from today

Announcements PA2 due Friday Midterm is Wednesday next week, in class, one week from today Loop Transformations Announcements PA2 due Friday Midterm is Wednesday next week, in class, one week from today Today Recall stencil computations Intro to loop transformations Data dependencies between

More information

Examiners Report/ Principal Examiner Feedback. Summer GCE Core Mathematics C3 (6665) Paper 01

Examiners Report/ Principal Examiner Feedback. Summer GCE Core Mathematics C3 (6665) Paper 01 Examiners Report/ Principal Examiner Feedback Summer 2013 GCE Core Mathematics C3 (6665) Paper 01 Edexcel and BTEC Qualifications Edexcel and BTEC qualifications come from Pearson, the UK s largest awarding

More information

COMP 515: Advanced Compilation for Vector and Parallel Processors. Vivek Sarkar Department of Computer Science Rice University

COMP 515: Advanced Compilation for Vector and Parallel Processors. Vivek Sarkar Department of Computer Science Rice University COMP 515: Advanced Compilation for Vector and Parallel Processors Vivek Sarkar Department of Computer Science Rice University vsarkar@rice.edu COMP 515 Lecture 10 10 February 2009 Announcement Feb 17 th

More information

High-Performance Scientific Computing

High-Performance Scientific Computing High-Performance Scientific Computing Instructor: Randy LeVeque TA: Grady Lemoine Applied Mathematics 483/583, Spring 2011 http://www.amath.washington.edu/~rjl/am583 World s fastest computers http://top500.org

More information

Automata Theory. Definition. Computational Complexity Theory. Computability Theory

Automata Theory. Definition. Computational Complexity Theory. Computability Theory Outline THEORY OF COMPUTATION CS363, SJTU What is Theory of Computation? History of Computation Branches and Development Xiaofeng Gao Dept. of Computer Science Shanghai Jiao Tong University 2 The Essential

More information

Quantum Classification of Malware

Quantum Classification of Malware Quantum Classification of Malware John Seymour seymour1@umbc.edu Charles Nicholas nicholas@umbc.edu August 24, 2015 Abstract Quantum computation has recently become an important area for security research,

More information

Final Report. COMET Partner's Project. University of Texas at San Antonio

Final Report. COMET Partner's Project. University of Texas at San Antonio Final Report COMET Partner's Project University: Name of University Researcher Preparing Report: University of Texas at San Antonio Dr. Hongjie Xie National Weather Service Office: Name of National Weather

More information

Multicore Semantics and Programming

Multicore Semantics and Programming Multicore Semantics and Programming Peter Sewell Tim Harris University of Cambridge Oracle October November, 2015 p. 1 These Lectures Part 1: Multicore Semantics: the concurrency of multiprocessors and

More information

Handouts. CS701 Theory of Computation

Handouts. CS701 Theory of Computation Handouts CS701 Theory of Computation by Kashif Nadeem VU Student MS Computer Science LECTURE 01 Overview In this lecturer the topics will be discussed including The Story of Computation, Theory of Computation,

More information

Leland Jameson Division of Mathematical Sciences National Science Foundation

Leland Jameson Division of Mathematical Sciences National Science Foundation Leland Jameson Division of Mathematical Sciences National Science Foundation Wind tunnel tests of airfoils Wind tunnels not infinite but some kind of closed loop Wind tunnel Yet another wind tunnel The

More information

Can Vector Space Bases Model Context?

Can Vector Space Bases Model Context? Can Vector Space Bases Model Context? Massimo Melucci University of Padua Department of Information Engineering Via Gradenigo, 6/a 35031 Padova Italy melo@dei.unipd.it Abstract Current Information Retrieval

More information

One-dimensional I test and direction vector I test with array references by induction variable

One-dimensional I test and direction vector I test with array references by induction variable Int. J. High Performance Computing an Networking, Vol. 3, No. 4, 2005 219 One-imensional I test an irection vector I test with array references by inuction variable Minyi Guo School of Computer Science

More information

Algebraic Equations. 2.0 Introduction. Nonsingular versus Singular Sets of Equations. A set of linear algebraic equations looks like this:

Algebraic Equations. 2.0 Introduction. Nonsingular versus Singular Sets of Equations. A set of linear algebraic equations looks like this: Chapter 2. 2.0 Introduction Solution of Linear Algebraic Equations A set of linear algebraic equations looks like this: a 11 x 1 + a 12 x 2 + a 13 x 3 + +a 1N x N =b 1 a 21 x 1 + a 22 x 2 + a 23 x 3 +

More information

Outline. policies for the first part. with some potential answers... MCS 260 Lecture 10.0 Introduction to Computer Science Jan Verschelde, 9 July 2014

Outline. policies for the first part. with some potential answers... MCS 260 Lecture 10.0 Introduction to Computer Science Jan Verschelde, 9 July 2014 Outline 1 midterm exam on Friday 11 July 2014 policies for the first part 2 questions with some potential answers... MCS 260 Lecture 10.0 Introduction to Computer Science Jan Verschelde, 9 July 2014 Intro

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms Chapter 6 Structured and Low Rank Matrices Section 6.3 Numerical Optimization Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at

More information

Branch Prediction using Advanced Neural Methods

Branch Prediction using Advanced Neural Methods Branch Prediction using Advanced Neural Methods Sunghoon Kim Department of Mechanical Engineering University of California, Berkeley shkim@newton.berkeley.edu Abstract Among the hardware techniques, two-level

More information

the library from whu* * »WS^-SA minimum on or before the W«t D tee Oi? _ J_., n of books oro W«' previous due date.

the library from whu* * »WS^-SA minimum on or before the W«t D tee Oi? _ J_., n of books oro W«' previous due date. on or before the W«t D»WS^-SA the library from whu* * tee Oi? _ J_., n of books oro ^ minimum W«' 21997 previous due date. Digitized by the Internet Archive in University of Illinois 2011 with funding

More information

Examiners Report. Summer Pearson Edexcel GCE in Mechanics M1 (6677/01)

Examiners Report. Summer Pearson Edexcel GCE in Mechanics M1 (6677/01) Examiners Report Summer 2014 Pearson Edexcel GCE in Mechanics M1 (6677/01) Edexcel and BTEC Qualifications Edexcel and BTEC qualifications are awarded by Pearson, the UK s largest awarding body. We provide

More information

1 Simplex and Matrices

1 Simplex and Matrices 1 Simplex and Matrices We will begin with a review of matrix multiplication. A matrix is simply an array of numbers. If a given array has m rows and n columns, then it is called an m n (or m-by-n) matrix.

More information

A Universe Filled with Questions

A Universe Filled with Questions Volume 19 Issue 1 Spring 2010 Illinois Wesleyan University Magazine Article 2 2010 A Universe Filled with Questions Tim Obermiller Illinois Wesleyan University, iwumag@iwu.edu Recommended Citation Obermiller,

More information

On the optimality of Allen and Kennedy's algorithm for parallelism extraction in nested loops Alain Darte and Frederic Vivien Laboratoire LIP, URA CNR

On the optimality of Allen and Kennedy's algorithm for parallelism extraction in nested loops Alain Darte and Frederic Vivien Laboratoire LIP, URA CNR On the optimality of Allen and Kennedy's algorithm for parallelism extraction in nested loops Alain Darte and Frederic Vivien Laboratoire LIP, URA CNRS 1398 Ecole Normale Superieure de Lyon, F - 69364

More information

1 Introduction. 1.1 The Problem Domain. Self-Stablization UC Davis Earl Barr. Lecture 1 Introduction Winter 2007

1 Introduction. 1.1 The Problem Domain. Self-Stablization UC Davis Earl Barr. Lecture 1 Introduction Winter 2007 Lecture 1 Introduction 1 Introduction 1.1 The Problem Domain Today, we are going to ask whether a system can recover from perturbation. Consider a children s top: If it is perfectly vertically, you can

More information

Discovering Exoplanets Transiting Bright and Unusual Stars with K2

Discovering Exoplanets Transiting Bright and Unusual Stars with K2 Discovering Exoplanets Transiting Bright and Unusual Stars with K2 PhD Thesis Proposal, Department of Astronomy, Harvard University Andrew Vanderburg Advised by David Latham April 18, 2015 After four years

More information

Distributed Data Mining for Pervasive and Privacy-Sensitive Applications. Hillol Kargupta

Distributed Data Mining for Pervasive and Privacy-Sensitive Applications. Hillol Kargupta Distributed Data Mining for Pervasive and Privacy-Sensitive Applications Hillol Kargupta Dept. of Computer Science and Electrical Engg, University of Maryland Baltimore County http://www.cs.umbc.edu/~hillol

More information

Loop Parallelization Techniques and dependence analysis

Loop Parallelization Techniques and dependence analysis Loop Parallelization Techniques and dependence analysis Data-Dependence Analysis Dependence-Removing Techniques Parallelizing Transformations Performance-enchancing Techniques 1 When can we run code in

More information

Literary Geographies, Past and Future. Sheila Hones. The University of Tokyo

Literary Geographies, Past and Future. Sheila Hones. The University of Tokyo 1 THINKING SPACE Thinking Space is a series of short position papers on key terms and concepts for literary geography. Cumulatively, these accessible and wide-ranging pieces will explore the scope, parameters,

More information

Massive Parallelization of First Principles Molecular Dynamics Code

Massive Parallelization of First Principles Molecular Dynamics Code Massive Parallelization of First Principles Molecular Dynamics Code V Hidemi Komatsu V Takahiro Yamasaki V Shin-ichi Ichikawa (Manuscript received April 16, 2008) PHASE is a first principles molecular

More information

HYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017

HYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017 HYCOM and Navy ESPC Future High Performance Computing Needs Alan J. Wallcraft COAPS Short Seminar November 6, 2017 Forecasting Architectural Trends 3 NAVY OPERATIONAL GLOBAL OCEAN PREDICTION Trend is higher

More information

Integer Factorisation on the AP1000

Integer Factorisation on the AP1000 Integer Factorisation on the AP000 Craig Eldershaw Mathematics Department University of Queensland St Lucia Queensland 07 cs9@student.uq.edu.au Richard P. Brent Computer Sciences Laboratory Australian

More information

Ch 01. Analysis of Algorithms

Ch 01. Analysis of Algorithms Ch 01. Analysis of Algorithms Input Algorithm Output Acknowledgement: Parts of slides in this presentation come from the materials accompanying the textbook Algorithm Design and Applications, by M. T.

More information

Transposition Mechanism for Sparse Matrices on Vector Processors

Transposition Mechanism for Sparse Matrices on Vector Processors Transposition Mechanism for Sparse Matrices on Vector Processors Pyrrhos Stathis Stamatis Vassiliadis Sorin Cotofana Electrical Engineering Department, Delft University of Technology, Delft, The Netherlands

More information

Chapter 0. Prologue. Algorithms (I) Johann Gutenberg. Two ideas changed the world. Decimal system. Al Khwarizmi

Chapter 0. Prologue. Algorithms (I) Johann Gutenberg. Two ideas changed the world. Decimal system. Al Khwarizmi Algorithms (I) Yijia Chen Shanghai Jiaotong University Chapter 0. Prologue Johann Gutenberg Two ideas changed the world Because of the typography, literacy spread, the Dark Ages ended, the human intellect

More information

Induction of Non-Deterministic Finite Automata on Supercomputers

Induction of Non-Deterministic Finite Automata on Supercomputers JMLR: Workshop and Conference Proceedings 21:237 242, 2012 The 11th ICGI Induction of Non-Deterministic Finite Automata on Supercomputers Wojciech Wieczorek Institute of Computer Science, University of

More information

ONLINE SCHEDULING OF MALLEABLE PARALLEL JOBS

ONLINE SCHEDULING OF MALLEABLE PARALLEL JOBS ONLINE SCHEDULING OF MALLEABLE PARALLEL JOBS Richard A. Dutton and Weizhen Mao Department of Computer Science The College of William and Mary P.O. Box 795 Williamsburg, VA 2317-795, USA email: {radutt,wm}@cs.wm.edu

More information

A hard integer program made easy by lexicography

A hard integer program made easy by lexicography Noname manuscript No. (will be inserted by the editor) A hard integer program made easy by lexicography Egon Balas Matteo Fischetti Arrigo Zanette October 12, 2010 Abstract A small but notoriously hard

More information

Quiz 1 Solutions. Problem 2. Asymptotics & Recurrences [20 points] (3 parts)

Quiz 1 Solutions. Problem 2. Asymptotics & Recurrences [20 points] (3 parts) Introduction to Algorithms October 13, 2010 Massachusetts Institute of Technology 6.006 Fall 2010 Professors Konstantinos Daskalakis and Patrick Jaillet Quiz 1 Solutions Quiz 1 Solutions Problem 1. We

More information
