EECS 358 Introduction to Parallel Computing Final Assignment


Jiangtao Gou, Zhenyu Zhao

March 19

1 Problem 1

1.1 Matrix-vector Multiplication on Hypercube and Torus

As shown in slide 15.11, we assume row-wise striping for an n × n matrix A and an n × 1 vector x, and compute y = Ax. Without loss of generality we assume p ≤ n; in the special case of one row per processor, p = n. Processor P_i initially stores the vector elements x[in/p], ..., x[(i+1)n/p − 1] and the matrix rows A[in/p], ..., A[(i+1)n/p − 1], and is responsible for computing y[in/p], ..., y[(i+1)n/p − 1].

The first step is an all-to-all broadcast, because every processor needs the entire vector. The second step is purely local computation. The matrix-vector multiplication itself contains n^2 multiplications and n(n − 1) additions; taking the sequential run time to be W = n^2, each processor spends n^2/p time multiplying its own n/p rows to produce its n/p elements of the result vector.
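The following is a minimal MPI sketch of this two-step scheme, assuming p divides n and that each process already holds its n/p rows of A and its n/p elements of x; the function and variable names (matvec_rowstriped, A_local, x_full) are ours for illustration, not from the assignment code.

#include <mpi.h>
#include <stdlib.h>

void matvec_rowstriped(const double *A_local, const double *x_local,
                       double *y_local, int n, int p, MPI_Comm comm)
{
    int rows = n / p;  /* rows of A (and elements of x) per process */
    double *x_full = malloc(n * sizeof(double));

    /* Step 1: all-to-all broadcast -- every process collects the whole vector */
    MPI_Allgather(x_local, rows, MPI_DOUBLE,
                  x_full, rows, MPI_DOUBLE, comm);

    /* Step 2: local computation -- multiply our own n/p rows */
    for (int i = 0; i < rows; i++) {
        y_local[i] = 0.0;
        for (int j = 0; j < n; j++)
            y_local[i] += A_local[i * n + j] * x_full[j];
    }
    free(x_full);
}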

1.1.1 Hypercube

The communication time on the hypercube is the all-to-all broadcast time

T^H_all2all = Σ_{i=1}^{log_2 p} (t_s + t_h + 2^{i−1} (n/p) t_w) = t_s log_2 p + t_h log_2 p + t_w (n/p)(p − 1).

Neglecting the per-hop time t_h, the communication time is

T^H_all2all = t_s log_2 p + t_w (n/p)(p − 1),

so the run time on the hypercube is

T_P = n^2/p + t_s log_2 p + t_w (n − n/p).

1.1.2 Torus

The first phase of √p simultaneous ring-style all-to-all broadcasts takes

T_1 = (t_s + t_h + t_w (n/p)) (√p − 1).

The second phase of ring-style all-to-all broadcasts runs along the other dimension and takes

T_2 = (t_s + t_h + t_w (√p · n/p)) (√p − 1).

The total communication time is

T^T_all2all = T_1 + T_2 = 2(t_s + t_h)(√p − 1) + t_w (√p − 1)(n/p + n/√p)
            = 2(t_s + t_h)(√p − 1) + t_w (n − n/p)
            ≈ 2 t_s (√p − 1) + t_w (n − n/p),

so the run time on the torus is

T_P = n^2/p + 2 t_s (√p − 1) + t_w (n − n/p).
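As a quick illustration of how the two models compare, the small program below plugs assumed values of n, p, t_s, and t_w into both run-time expressions; the constants are made up for illustration, not measured on any machine.

#include <math.h>
#include <stdio.h>

int main(void)
{
    double n = 1024.0, p = 64.0;   /* assumed problem and machine sizes */
    double ts = 10.0, tw = 1.0;    /* assumed startup and per-word costs */

    /* T_P on the hypercube and on the torus, from the formulas above */
    double t_hcube = n * n / p + ts * log2(p) + tw * (n - n / p);
    double t_torus = n * n / p + 2.0 * ts * (sqrt(p) - 1.0) + tw * (n - n / p);

    printf("hypercube: %.1f   torus: %.1f\n", t_hcube, t_torus);
    return 0;
}

Note that the only difference between the two models is the startup term: t_s log_2 p on the hypercube versus 2 t_s (√p − 1) on the torus.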

1.2 Matrix Transposition on Ring

Here we consider three different ring structures, shown in Figure 1, taking a 16-processor parallel machine as an example. When transposing a matrix, the longest path covers 7 links under the first ring structure, 6 links under the second, and 3 links under the third. Since the third ring structure is significantly better than the other two for matrix transposition, we use it.

Assuming the number of processors p is less than n^2, the transpose of the entire matrix is computed in two phases. In the first phase we transpose the square matrix blocks across processors. The longest path contains √p − 1 links, so the inter-processor communication time is

T_R = t_s (√p − 1) + t_w (√p − 1) n^2/p ≈ t_s √p + t_w n^2/√p,

where we assume the per-hop time t_h is negligible. In the second phase each processor performs a local exchange: it holds an (n/√p) × (n/√p) block, and the local transposition takes time n^2/2p (a sketch of this phase follows). Using our specific ring connection, the total parallel run time on the ring is

T_P = n^2/2p + t_s √p + t_w n^2/√p.
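A minimal sketch of the local-exchange phase, assuming each process stores its b × b block (b = n/√p) in row-major order; only the b(b − 1)/2 off-diagonal pairs are swapped, which is where the n^2/2p term comes from.

/* Transpose one b-by-b block in place (row-major layout). */
void transpose_local(double *B, int b)
{
    for (int i = 0; i < b; i++)
        for (int j = i + 1; j < b; j++) {
            double tmp = B[i * b + j];
            B[i * b + j] = B[j * b + i];
            B[j * b + i] = tmp;
        }
}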

Figure 1: Ring Structures

2 Problem 2

2.1 Algorithm Description

We apply a top-down greedy algorithm to find the partition with the minimal cost.

2.1.1 Sequential Algorithm, 1 processor

Our greedy partition algorithm is shown in Figure 2. In each step we choose, between the horizontal partition and the vertical partition, the one with the smaller cost, to equally divide the points of the given intermediate quadrant; we keep partitioning until we reach the pre-specified number of quadrants.

Comparing the horizontal and the vertical partition requires computing their costs. The trick is that we do not need to compute the within-group costs, only the between-group costs, as shown in Figure 3 (and sketched in code after this subsection). Assume there are M = 2^m points in the quadrant. We have two choices: a vertical partition dividing the quadrant into areas 1 and 2, and a horizontal partition dividing it into areas 3 and 4. The cost of the vertical partition is the sum of (1) the cost within group 13, (2) the cost within group 14, (3) the cost within group 23, (4) the cost within group 24, (5) the cost between groups 13 and 14, and (6) the cost between groups 23 and 24. The cost of the horizontal partition is the sum of the same four within-group costs (1)-(4), plus (5) the cost between groups 13 and 23 and (6) the cost between groups 14 and 24. The within-group terms are common to both, so the comparison only needs the between-group costs. Since each of the groups 13, 14, 23 and 24 contains M/4 points, the comparison requires computing 4 (M/4)^2 = M^2/4 distances.

Assume there are a total of N = 2^n points in a 2-dimensional coordinate system (here N = 524,288 and n = 19 in the "find quadrants" data given by the professor, or N = 1,048,576 and n = 20 in the description of Homework 4). We assume that in one unit of time a processor can compute the distance between two points (one square root, two multiplications and three additions). Let the number of quadrants be Q = 2^q, with q = 6, 7, 8, and the numbers of processors p = 1, 2, 4, 8, 16.
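The comparison above can be written as a short routine. This is a minimal sketch under our assumptions: the M points have already been split into the four corner groups, each of size m4 = M/4, and the Point type and function names are illustrative, not from our actual program.

#include <math.h>
#include <stddef.h>

typedef struct { double x, y; } Point;

/* Sum of pairwise distances between two groups of points. */
static double between_cost(const Point *a, size_t na,
                           const Point *b, size_t nb)
{
    double sum = 0.0;
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            sum += hypot(a[i].x - b[j].x, a[i].y - b[j].y);
    return sum;
}

/* Returns 1 if the vertical cut (areas 1|2) is cheaper, 0 otherwise. */
int choose_cut(const Point *g13, const Point *g14,
               const Point *g23, const Point *g24, size_t m4)
{
    double vertical   = between_cost(g13, m4, g14, m4)
                      + between_cost(g23, m4, g24, m4);
    double horizontal = between_cost(g13, m4, g23, m4)
                      + between_cost(g14, m4, g24, m4);
    return vertical < horizontal;
}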

Figure 2: Greedy Partition Algorithm

Figure 3: Cost Comparison

In order to compare costs we need to compute

T_1 = N^2/4 + 2 (N/2)^2/4 + 4 (N/4)^2/4 + ...
    = Σ_{i=0}^{q−1} 2^i (N/2^i)^2 / 4
    = (N^2/4) Σ_{i=0}^{q−1} 1/2^i
    = (N^2/2)(1 − 1/2^q).

So when q is relatively large, different values of q make essentially no difference. Since we need the total cost, in the last step we also compute the within-group distances, which takes

T_2 = 2^{q−1} (N/2^{q−1})^2 / 4 = N^2 / 2^{q+1}.

The time spent computing costs is therefore

T_cost = T_1 + T_2 = N^2/2,

which is independent of q. When partitioning, we also need to sort the points by either x-coordinate or y-coordinate; since we use quicksort, with average complexity 1.39 N log_2 N, which is significantly smaller than T_cost, we conclude that the sequential algorithm takes T_S = N^2/2 to find the partition with minimum cost.
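As a sanity check of this derivation, the short program below sums the per-level comparison costs T_1 and the final within-group cost T_2 and confirms that the total equals N^2/2 for each q; it verifies only our counting, not the partition program itself.

#include <math.h>
#include <stdio.h>

int main(void)
{
    double N = 524288.0;                       /* N = 2^19 */
    for (int q = 6; q <= 8; q++) {
        double t1 = 0.0;
        for (int i = 0; i < q; i++)            /* level i: 2^i quadrants */
            t1 += pow(2, i) * pow(N / pow(2, i), 2) / 4.0;
        double t2 = pow(2, q - 1) * pow(N / pow(2, q - 1), 2) / 4.0;
        printf("q=%d  T1+T2=%.0f  N^2/2=%.0f\n", q, t1 + t2, N * N / 2.0);
    }
    return 0;
}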

2.1.2 Parallel Algorithm, p processors

When using multiple processors, we use an interleaved schedule to assign the cost computation to the processors; with p processors, computing the costs takes N^2/2p time. Between every two steps we need one single-node accumulation to sum up the cost, and one one-to-all broadcast to let every processor know the decision between the two candidate partitions; the decision is made by one processor, say P_0 (a sketch of this exchange follows below). After receiving the decision about the better partition, each processor partitions the points independently, which requires sorting the list of points again first. The sorting (O(N log N)) takes much less time than computing costs (O(N^2)), so we can omit its time. Assuming the accumulation and the broadcast each take log p time, we conclude that

T_P = N^2/2p + 2q log p.
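A minimal MPI sketch of this per-step exchange, assuming each process holds its partial sums of the two candidate costs in local_cost[0] (vertical) and local_cost[1] (horizontal); names such as decide_cut are ours, not from the assignment code.

#include <mpi.h>

int decide_cut(const double local_cost[2], MPI_Comm comm)
{
    double total[2];
    int decision = 0, rank;
    MPI_Comm_rank(comm, &rank);

    /* Single-node accumulation at P0 (log p steps internally). */
    MPI_Reduce(local_cost, total, 2, MPI_DOUBLE, MPI_SUM, 0, comm);

    if (rank == 0)
        decision = (total[0] < total[1]) ? 0 : 1;  /* 0: vertical, 1: horizontal */

    /* One-to-all broadcast of the decision (log p steps). */
    MPI_Bcast(&decision, 1, MPI_INT, 0, comm);
    return decision;
}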

2.2 Scalability

The speedup is

S = (N^2/2) / (N^2/2p + 2q log p) = p N^2 / (N^2 + 4qp log p),

and the efficiency is

E = S/p = N^2 / (N^2 + 4qp log p).

Rewriting,

S = p N^2 / (N^2 + 4qp log p) = p / (1 + 4qp log p / N^2) = p / (1 + (p − 1) α),

where α = 4qp log p / ((p − 1) N^2). Since α → 0 as N → ∞, this algorithm is effective. Meanwhile, note that doubling the number of processors gives

S(2p) = 2p N^2 / (N^2 + 4q(2p) log(2p)) ≤ 2p N^2 / (N^2 + 4qp log p) = 2S,

and so it is scalable.

2.3 Results: Run Time and Partition Cost

We use the setting N = 524,288 and n = 19. Outputs are shown in Figures 5 through 13: we print the coordinates of the four corners of each quadrant, the cost of the quadrant distribution, and the wall time. Run times are plotted in Figure 4, and the results are summarized in Table 1 and Table 2.

Table 1: Cost (by number of quadrants)

Table 2: Time consumed on lab machines, in seconds (by number of processors, for 64, 128 and 256 quadrants)

Running on the 358smp host machine took longer than running on the Wilkinson lab machines (almost triple the time).

Figure 4: Time vs Number of Processors

Figure 5: Results (Lab machine): 64 quadrants, 1 and 2 processors

Figure 6: Results (Lab machine): 64 quadrants, 4 and 8 processors

Figure 7: Results (Lab machine): 64 quadrants, 16 processors

Figure 8: Results (Lab machine): 128 quadrants, 1 and 2 processors

Figure 9: Results (Lab machine): 128 quadrants, 4 and 8 processors

Figure 10: Results (Lab machine): 128 quadrants, 16 processors

Figure 11: Results (Lab machine): 256 quadrants, 1 and 2 processors

Figure 12: Results (Lab machine): 256 quadrants, 4 and 8 processors

Figure 13: Results (Lab machine): 256 quadrants, 16 processors
