EECS 358 Introduction to Parallel Computing Final Assignment
Jiangtao Gou, Zhenyu Zhao
March 19

Problem 1

1.1 Matrix-vector Multiplication on Hypercube and Torus
As shown in slide 15.11, we assumed row-wise striping for an $n \times n$ matrix $A$ and an $n \times 1$ vector $x$, and computed $y = Ax$. Without loss of generality, we assumed that $p \le n$; the special case of one row per processor is $p = n$. Processor $P_i$ initially stored the vector elements $x[in/p], \dots, x[(i+1)n/p - 1]$ and the matrix elements $A[in/p, 0], \dots, A[in/p, n-1]$, $A[in/p+1, 0], \dots, A[in/p+1, n-1]$, ..., $A[(i+1)n/p-1, 0], \dots, A[(i+1)n/p-1, n-1]$, and was responsible for computing $y[in/p], \dots, y[(i+1)n/p - 1]$.
The first step was an all-to-all broadcast, because every processor needs the entire vector $x$. The second step was local computation on each processor. The matrix-vector product itself contains $n^2$ multiplications and $n(n-1)$ additions. Taking the sequential run time as $W = n^2$, each processor spent $n^2/p$ time multiplying its own $n/p$ rows by $x$ to obtain its $n/p$ elements of the result vector.
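The data layout and the two steps above can be checked with a small sequential simulation. This is a minimal Python sketch, not the authors' code: the function name `striped_matvec` is hypothetical, the all-to-all broadcast is modeled simply as every simulated processor receiving a copy of the concatenated slices of $x$, and $n$ is assumed to be divisible by $p$.

```python
def striped_matvec(A, x, p):
    """Simulate row-wise striped y = A x on p processors (assumes p divides n)."""
    n = len(A)
    assert n % p == 0, "this sketch assumes p divides n"
    b = n // p  # rows of A (and elements of x) owned by each processor

    # Initial layout: P_i owns rows i*b .. (i+1)*b - 1 of A and the same slice of x.
    local_A = [A[i * b:(i + 1) * b] for i in range(p)]
    local_x = [x[i * b:(i + 1) * b] for i in range(p)]

    # Step 1: all-to-all broadcast; afterwards every processor holds all of x.
    gathered = [v for part in local_x for v in part]
    full_x = [list(gathered) for _ in range(p)]

    # Step 2: each processor computes its own slice of y and counts multiplications.
    y, mults = [], []
    for i in range(p):
        count = 0
        for row in local_A[i]:
            y.append(sum(a * v for a, v in zip(row, full_x[i])))
            count += len(row)
        mults.append(count)
    return y, mults
```

For an $8 \times 8$ matrix on 4 simulated processors, every entry of `mults` is $n^2/p = 16$, which is the per-processor computation term used in the run-time formulas.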
1.1.1 Hypercube
The communication time on the hypercube was
$$T^H_{all2all} = \sum_{i=1}^{\log_2 p}\left(t_s + t_h + 2^{i-1}\frac{n}{p}\,t_w\right) = t_s\log_2 p + t_h\log_2 p + t_w\frac{n}{p}(p-1).$$
Neglecting the per-hop time $t_h$, the communication time was
$$T^H_{all2all} = t_s\log_2 p + t_w\frac{n}{p}(p-1).$$
So the run time on the hypercube is
$$T_P = \frac{n^2}{p} + t_s\log_2 p + t_w\left(n - \frac{n}{p}\right).$$

1.1.2 Torus
On a $\sqrt{p} \times \sqrt{p}$ torus, the first phase of $\sqrt{p}$ simultaneous ring-style all-to-all broadcasts consumed
$$T_1 = \left(t_s + t_h + t_w\frac{n}{p}\right)(\sqrt{p} - 1).$$
The second phase of ring-style all-to-all broadcasts ran along the other dimension, with messages of $\sqrt{p}\,n/p$ words, and took
$$T_2 = \left(t_s + t_h + t_w\frac{\sqrt{p}\,n}{p}\right)(\sqrt{p} - 1).$$
The total communication time was
$$T^T_{all2all} = T_1 + T_2 = 2(t_s + t_h)(\sqrt{p} - 1) + t_w(\sqrt{p} - 1)\left(\frac{n}{p} + \frac{n}{\sqrt{p}}\right) = 2(t_s + t_h)(\sqrt{p} - 1) + t_w\left(n - \frac{n}{p}\right) \approx 2t_s(\sqrt{p} - 1) + t_w\left(n - \frac{n}{p}\right).$$
So the run time on the torus is
$$T_P = \frac{n^2}{p} + 2t_s(\sqrt{p} - 1) + t_w\left(n - \frac{n}{p}\right).$$
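The two run-time models can be evaluated numerically to confirm the closed forms. In this Python sketch the helper names `hypercube_time` and `torus_time` are hypothetical; $p$ is assumed to be a power of two for the hypercube and a perfect square for the torus, and the hypercube communication term is summed step by step rather than taken from the closed form, so the two can be compared.

```python
import math

def hypercube_time(n, p, ts, tw, th=0.0):
    """n^2/p plus an all-to-all broadcast on a hypercube, summed step by step."""
    d = p.bit_length() - 1
    assert 1 << d == p, "p must be a power of two"
    # Step i exchanges 2^(i-1) blocks of n/p words with one neighbor.
    comm = sum(ts + th + 2 ** (i - 1) * (n / p) * tw for i in range(1, d + 1))
    return n * n / p + comm

def torus_time(n, p, ts, tw, th=0.0):
    """n^2/p plus a two-phase ring all-to-all broadcast on a sqrt(p) x sqrt(p) torus."""
    r = math.isqrt(p)
    assert r * r == p, "p must be a perfect square"
    t1 = (ts + th + tw * n / p) * (r - 1)      # phase 1: rings along one dimension
    t2 = (ts + th + tw * r * n / p) * (r - 1)  # phase 2: messages are r times longer
    return n * n / p + t1 + t2
```

With $n = 1024$, $p = 16$, $t_s = 10$ and $t_w = 1$, both models have the same $t_w$ term, $n - n/p = 960$; they differ only in the start-up term ($t_s\log_2 p = 40$ on the hypercube versus $2t_s(\sqrt{p} - 1) = 60$ on the torus).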
1.2 Matrix Transposition on Ring
Here we considered three different ring structures, shown in Fig 1 for a 16-processor parallel machine. When transposing a matrix, the longest path covers 7 links with the first ring structure, 6 links with the second, and 3 links with the third. Because the third ring structure was significantly better than the other two for matrix transposition, we decided to use it.
Assuming that the number of processors $p$ is less than $n^2$, the transpose of the entire matrix was computed in two phases. In the first phase, we transposed the square matrix blocks. The longest path contained $\sqrt{p} - 1$ links, so the communication time between processors was
$$T_R = t_s(\sqrt{p} - 1) + t_w(\sqrt{p} - 1)\frac{n^2}{p} \approx t_s\sqrt{p} + t_w\frac{n^2}{\sqrt{p}},$$
where we assumed that the per-hop time $t_h$ was negligible. In the second phase, we performed a local exchange: each processor held an $n/\sqrt{p} \times n/\sqrt{p}$ block, whose transposition took $n^2/(2p)$ time. Using our specific ring connection, the total parallel run time on the ring was
$$T_P = \frac{n^2}{2p} + t_s\sqrt{p} + t_w\frac{n^2}{\sqrt{p}}.$$

Problem 2

2.1 Algorithm Description
We applied a top-down greedy algorithm to find the partition with the minimal cost.
Figure 1: Ring Structures
2.1.1 Sequential Algorithm, 1 processor
Our greedy partition algorithm is shown in Fig 2. In each step, for the given intermediate quadrant we chose whichever of the horizontal and the vertical partition had the smaller cost, where each partition divides the points of the quadrant into two equal halves. We kept partitioning until we reached the pre-specified number of quadrants.
Comparing the horizontal and the vertical partition requires computing their costs. The trick is that we did not need to compute within-group costs, only between-group costs, as shown in Figure 3. Assume that there are $M = 2^m$ points in this quadrant. We had two partition choices: a vertical partition dividing the quadrant into areas 1 and 2, and a horizontal partition dividing it into areas 3 and 4. Writing group $jk$ for the points that lie in both area $j$ and area $k$, the cost of the vertical partition was the sum of (1) the cost within group 13, (2) the cost within group 14, (3) the cost within group 23, (4) the cost within group 24, (5) the cost between groups 13 and 14, and (6) the cost between groups 23 and 24. The cost of the horizontal partition was the sum of the same within-group costs (1) to (4), plus (5) the cost between groups 13 and 23 and (6) the cost between groups 14 and 24. So to make the comparison, we only need to compute the between-group costs. Assuming that each of the groups 13, 14, 23 and 24 contained $M/4$ points, the comparison needs $4(M/4)^2 = M^2/4$ distance computations.
Assume that there are a total of $N = 2^n$ points in a two-dimensional coordinate system (here $N = 524{,}288$ and $n = 19$ in find quadrants, given by the professor, or $N = 1{,}048{,}576$ and $n = 20$ in the description of homework 4). We assume that one processor can compute the distance between two points (a square root, two multiplications and three additions) in unit time. Assume that the number of quadrants is $Q = 2^q$, here $q = 6, 7, 8$. The numbers of processors were $p = 1, 2, 4, 8, 16$.
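The greedy step and the between-group trick can be illustrated with a small Python sketch. It is not the authors' code, and it rests on assumptions not stated in the text: the cost of a partition is taken to be the sum of pairwise Euclidean distances within each quadrant, each cut is made at the median coordinate, area 3 is taken as the lower half in $y$, and every quadrant of one level is split before moving to the next. The helper names (`split_terms`, `greedy_partition`, and so on) are hypothetical.

```python
import math
import random

def dist(a, b):
    """Euclidean distance: one unit of work in the cost model."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    return math.sqrt(dx * dx + dy * dy)

def within_cost(g):
    """Sum of distances over all pairs inside one group."""
    return sum(dist(g[i], g[j]) for i in range(len(g)) for j in range(i + 1, len(g)))

def between_cost(g, h):
    """Sum of distances over all pairs with one point in g and one in h."""
    return sum(dist(a, b) for a in g for b in h)

def halves(pts, axis):
    """Split at the median of coordinate `axis` (0 = x, 1 = y) into equal halves."""
    s = sorted(pts, key=lambda pt: pt[axis])
    m = len(s) // 2
    return s[:m], s[m:]

def split_terms(pts):
    """Between-group terms (5) and (6) of the vertical and horizontal partitions."""
    area1, area2 = halves(pts, 0)  # vertical cut
    area3, area4 = halves(pts, 1)  # horizontal cut
    in3 = set(area3)
    g13 = [pt for pt in area1 if pt in in3]
    g14 = [pt for pt in area1 if pt not in in3]
    g23 = [pt for pt in area2 if pt in in3]
    g24 = [pt for pt in area2 if pt not in in3]
    vertical = between_cost(g13, g14) + between_cost(g23, g24)
    horizontal = between_cost(g13, g23) + between_cost(g14, g24)
    return vertical, horizontal, (area1, area2), (area3, area4)

def greedy_partition(pts, q):
    """Split every quadrant in two, q times, keeping the cheaper cut each time."""
    quadrants = [pts]
    for _ in range(q):
        nxt = []
        for quad in quadrants:
            vertical, horizontal, v_cut, h_cut = split_terms(quad)
            nxt.extend(v_cut if vertical <= horizontal else h_cut)
        quadrants = nxt
    return quadrants

if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(256)]
    quads = greedy_partition(pts, 3)
    print(len(quads), sum(within_cost(qd) for qd in quads))
```

Because the four within-group terms are shared, the difference between the two full partition costs equals `vertical - horizontal`, so only the between-group distances need to be computed to pick the cheaper cut.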
Figure 2: Greedy Partition Algorithm
Figure 3: Cost Comparison
To compare costs over all levels, we need to compute
$$T_1 = \frac{N^2}{4} + 2\cdot\frac{(N/2)^2}{4} + 4\cdot\frac{(N/4)^2}{4} + \cdots = \sum_{i=0}^{q-1} 2^i\,\frac{(N/2^i)^2}{4} = \frac{N^2}{4}\sum_{i=0}^{q-1}\frac{1}{2^i} = \frac{N^2}{2}\left(1 - \frac{1}{2^q}\right).$$
So when $q$ is relatively large, $T_1 \approx N^2/2$ and there is little difference between different values of $q$. Since we need the total cost, in the last step we also need to compute the within-group distances, which took
$$T_2 = 2^{q-1}\,\frac{(N/2^{q-1})^2}{4} = \frac{N^2}{2^{q+1}}.$$
So the time spent computing costs is
$$T_{cost} = T_1 + T_2 = \frac{N^2}{2},$$
which is independent of $q$. When partitioning, we need to sort the points by either the x- or the y-coordinate. We used quicksort, whose average cost of about $1.39\,N\log_2 N$ comparisons is significantly smaller than $T_{cost}$, so we concluded that the sequential algorithm took $T_S = N^2/2$ to search for the partition with minimum cost.

2.1.2 Parallel Algorithm, p processors
When using multiple processors, we used an interleaved schedule to assign the cost computations to different processors. With $p$ processors, computing the costs took $N^2/(2p)$.
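The cost derivation can be checked exactly with rational arithmetic. In this Python sketch `t1` and `t2` evaluate the sums as written above; `interleaved_loads` is a hypothetical helper that assumes the interleaved schedule is a plain round-robin over distance computations (item $k$ goes to processor $k \bmod p$), since the text does not spell out the schedule.

```python
from fractions import Fraction

def t1(N, q):
    """Between-group distances over levels 0..q-1: 2^i quadrants of N/2^i points."""
    return sum(2 ** i * Fraction(N, 2 ** i) ** 2 / 4 for i in range(q))

def t2(N, q):
    """Within-group distances in the last step, as written in the derivation."""
    return 2 ** (q - 1) * Fraction(N, 2 ** (q - 1)) ** 2 / 4

def interleaved_loads(num_items, p):
    """Assumed round-robin schedule: work item k goes to processor k mod p."""
    loads = [0] * p
    for k in range(num_items):
        loads[k % p] += 1
    return loads
```

For $N = 2^{19}$ and $q = 6, 7, 8$, `t1(N, q) + t2(N, q)` equals exactly `Fraction(N * N, 2)`, matching $T_{cost} = N^2/2$; under round-robin assignment the per-processor loads differ by at most one, which is what makes the $N^2/(2p)$ term reasonable.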
Between every two steps, we need one single-node accumulation to sum up the cost, and one one-to-all broadcast to let each processor know the decision between the two possible partitions. The decision was made by one processor, say $P_0$. After receiving the decision about the better partition, each processor partitioned the points independently, which first requires sorting the list of points again. Sorting ($O(N\log N)$) takes much less time than computing costs ($O(N^2)$), so we omit the sorting time. Assuming that an accumulation and a broadcast each take $\log p$ time, with one of each per step over $q$ steps, we conclude that
$$T_P = \frac{N^2}{2p} + 2q\log p.$$

2.2 Scalability
Speedup. Note that
$$S = \frac{T_S}{T_P} = \frac{N^2/2}{N^2/(2p) + 2q\log p} = \frac{pN^2}{N^2 + 4qp\log p}.$$
Efficiency.
$$E = \frac{S}{p} = \frac{N^2}{N^2 + 4qp\log p}.$$
Rewriting the speedup in the form $S = p/(1 + (p-1)\alpha)$,
$$S = \frac{pN^2}{N^2 + 4qp\log p} = \frac{p}{1 + 4qp\log p/N^2} = \frac{p}{1 + (p-1)\,\dfrac{4qp\log p}{(p-1)N^2}},$$
so $\alpha = \dfrac{4qp\log p}{(p-1)N^2}$. Since $\alpha \to 0$ as $N \to \infty$, this algorithm is effective: its efficiency approaches 1 for large $N$. Meanwhile, note that $S = \dfrac{pN^2}{N^2 + 4qp\log p}$,
$$S(2p) = \frac{(2p)N^2}{N^2 + 4q(2p)\log(2p)} \approx \frac{2pN^2}{N^2 + 4qp\log p} = 2S,$$
where the approximation holds when $N^2 \gg 8qp\log(2p)$, and so it is scalable.

2.3 Results: Run time and Partition cost
We used the setting $N = 524{,}288$ and $n = 19$. Outputs are shown in Figures 5 to 13. We printed the coordinates of the four corners of each quadrant, the cost of the quadrant distribution, and the wall time, as shown in Figure. Results are summarized in Table 1 and Table 2.

Table 1: Cost
Number of Quadrants | Cost

Table 2: Run time on the lab machine (seconds)
Number of Processors | 64 Quadrants | 128 Quadrants | 256 Quadrants

Running on the 358smp host machine took almost three times as long as running on the Wilkinson lab machines.
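For comparison with the measured timings, the speedup model of Section 2.2 can be evaluated for the same $N$, $q$ and $p$. This Python sketch uses hypothetical helper names, takes "log" to be base 2, and charges each accumulation or broadcast $\log p$ units of the same unit time as one distance computation, as the model does. Its output is a model prediction, not a measurement.

```python
import math

def t_parallel(N, q, p):
    """Model T_P = N^2/(2p) + 2 q log p, with log taken base 2 (assumption)."""
    return N * N / (2 * p) + 2 * q * math.log2(p)

def speedup(N, q, p):
    """S = T_S / T_P with T_S = N^2 / 2."""
    return (N * N / 2) / t_parallel(N, q, p)

def efficiency(N, q, p):
    return speedup(N, q, p) / p

def alpha(N, q, p):
    """Term alpha in S = p / (1 + (p - 1) * alpha); defined only for p > 1."""
    return 4 * q * p * math.log2(p) / ((p - 1) * N * N)

if __name__ == "__main__":
    N = 2 ** 19
    for q in (6, 7, 8):
        for p in (1, 2, 4, 8, 16):
            print(f"q={q} p={p:2d} S={speedup(N, q, p):.6f} E={efficiency(N, q, p):.9f}")
```

At $N = 2^{19}$ the communication term $2q\log p$ is tiny next to $N^2/(2p)$, so the model predicts nearly linear speedup; any gap between this prediction and the timings would have to come from costs the model leaves out.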
Figure 4: Time vs Number of Processors
Figure 5: Results (Lab machine): 64 quadrants, 1 and 2 processors
Figure 6: Results (Lab machine): 64 quadrants, 4 and 8 processors
Figure 7: Results (Lab machine): 64 quadrants, 16 processors
Figure 8: Results (Lab machine): 128 quadrants, 1 and 2 processors
Figure 9: Results (Lab machine): 128 quadrants, 4 and 8 processors
Figure 10: Results (Lab machine): 128 quadrants, 16 processors
Figure 11: Results (Lab machine): 256 quadrants, 1 and 2 processors
Figure 12: Results (Lab machine): 256 quadrants, 4 and 8 processors
Figure 13: Results (Lab machine): 256 quadrants, 16 processors
More informationODD EVEN SHIFTS IN SIMD HYPERCUBES 1
ODD EVEN SHIFTS IN SIMD HYPERCUBES 1 Sanjay Ranka 2 and Sartaj Sahni 3 Abstract We develop a linear time algorithm to perform all odd (even) length circular shifts of data in an SIMD hypercube. As an application,
More informationDesigning Information Devices and Systems I Fall 2017 Official Lecture Notes Note 2
EECS 6A Designing Information Devices and Systems I Fall 07 Official Lecture Notes Note Introduction Previously, we introduced vectors and matrices as a way of writing systems of linear equations more
More information= ( 1 P + S V P) 1. Speedup T S /T V p
Numerical Simulation - Homework Solutions 2013 1. Amdahl s law. (28%) Consider parallel processing with common memory. The number of parallel processor is 254. Explain why Amdahl s law for this situation
More informationMatrices. In this chapter: matrices, determinants. inverse matrix
Matrices In this chapter: matrices, determinants inverse matrix 1 1.1 Matrices A matrix is a retangular array of numbers. Rows: horizontal lines. A = a 11 a 12 a 13 a 21 a 22 a 23 a 31 a 32 a 33 a 41 a
More informationCSC 1700 Analysis of Algorithms: Warshall s and Floyd s algorithms
CSC 1700 Analysis of Algorithms: Warshall s and Floyd s algorithms Professor Henry Carter Fall 2016 Recap Space-time tradeoffs allow for faster algorithms at the cost of space complexity overhead Dynamic
More informationSP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay
SP-CNN: A Scalable and Programmable CNN-based Accelerator Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay Motivation Power is a first-order design constraint, especially for embedded devices. Certain
More information1 Markov decision processes
2.997 Decision-Making in Large-Scale Systems February 4 MI, Spring 2004 Handout #1 Lecture Note 1 1 Markov decision processes In this class we will study discrete-time stochastic systems. We can describe
More informationMATH2210 Notebook 2 Spring 2018
MATH2210 Notebook 2 Spring 2018 prepared by Professor Jenny Baglivo c Copyright 2009 2018 by Jenny A. Baglivo. All Rights Reserved. 2 MATH2210 Notebook 2 3 2.1 Matrices and Their Operations................................
More informationCSE 431/531: Analysis of Algorithms. Dynamic Programming. Lecturer: Shi Li. Department of Computer Science and Engineering University at Buffalo
CSE 431/531: Analysis of Algorithms Dynamic Programming Lecturer: Shi Li Department of Computer Science and Engineering University at Buffalo Paradigms for Designing Algorithms Greedy algorithm Make a
More informationLogic Design II (17.342) Spring Lecture Outline
Logic Design II (17.342) Spring 2012 Lecture Outline Class # 10 April 12, 2012 Dohn Bowden 1 Today s Lecture First half of the class Circuits for Arithmetic Operations Chapter 18 Should finish at least
More informationECE 661: Homework 10 Fall 2014
ECE 661: Homework 10 Fall 2014 This homework consists of the following two parts: (1) Face recognition with PCA and LDA for dimensionality reduction and the nearest-neighborhood rule for classification;
More information