1 Outline
Recursive QR factorization
Hybrid recursive QR factorization
The linear least squares problem
SMP algorithms and implementations
Conclusions
2 Recursion: automatic variable blocking (e.g., QR factorization)
[Figure: partially factored matrix with regions labelled "Factorization completed", "Update completed", "Fits low level in memory hierarchy" and "Fits high level in memory hierarchy".]
1. Partition.
2. Factor the left-hand side.
3. Update the right-hand side.
4. Factor the right-hand side.
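A minimal sketch of why recursion gives variable blocking "for free": assuming the left/right split $n_1 = \lfloor n/2 \rfloor$ that RGEQR3 uses, recursive halving visits column blocks of every width from $n$ down to 1, so some level of the recursion matches each level of the memory hierarchy without a tuning parameter. The function `block_widths` is hypothetical and records only the block widths, not any factorization work.

```python
from collections import defaultdict


def block_widths(n, depth=0, out=None):
    """Record the width of every column block visited by a recursive
    left/right split (n1 = n // 2), grouped by recursion depth."""
    if out is None:
        out = defaultdict(list)
    out[depth].append(n)
    if n > 1:
        n1 = n // 2
        block_widths(n1, depth + 1, out)      # left-hand side
        block_widths(n - n1, depth + 1, out)  # right-hand side
    return dict(out)


if __name__ == "__main__":
    for depth, widths in sorted(block_widths(8).items()):
        print(depth, widths)
```

Each depth corresponds to one blocking level; for n = 8 the widths are 8, then 4, 2 and 1.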
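3 Recursive QR factorization
1. Divide the matrix in two parts (left and right): $A = \begin{pmatrix} A_1 & A_2 \end{pmatrix}$.
2. Factorize the left-hand side by a recursive call: $A_1 = Q_1 \begin{pmatrix} R_{11} \\ 0 \end{pmatrix}$. Stopping criterion: if $A$ is a single column, apply a Householder transformation.
3. Update the right-hand side, $\tilde A_2 = Q_1^T A_2 = \begin{pmatrix} R_{12} \\ \tilde A_{22} \end{pmatrix}$, and factorize $\tilde A_{22} = Q_2 \begin{pmatrix} R_{22} \\ 0 \end{pmatrix}$ by a recursive call.
Result: $A = Q_1 \begin{pmatrix} I & 0 \\ 0 & Q_2 \end{pmatrix} \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \\ 0 & 0 \end{pmatrix}$.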
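The stopping criterion needs one Householder transformation that maps a single column onto a multiple of $e_1$. A minimal sketch, assuming the sign convention of LAPACK's DLARFG (without its underflow safeguards) and dense Python lists; `householder` and `apply_reflector` are illustrative names, not routines from the slides.

```python
import math


def householder(x):
    """Return (u, tau, alpha) with u[0] == 1 such that
    (I - tau * u u^T) x = (alpha, 0, ..., 0)^T."""
    sigma = sum(v * v for v in x[1:])
    if sigma == 0.0:
        # x is already a multiple of e_1: use the identity (tau = 0).
        return [1.0] + [0.0] * (len(x) - 1), 0.0, x[0]
    norm_x = math.sqrt(x[0] * x[0] + sigma)
    # alpha gets the sign opposite to x[0], so x[0] - alpha never cancels.
    alpha = -math.copysign(norm_x, x[0])
    v0 = x[0] - alpha
    u = [1.0] + [v / v0 for v in x[1:]]
    tau = (alpha - x[0]) / alpha
    return u, tau, alpha


def apply_reflector(u, tau, x):
    """Compute (I - tau * u u^T) x."""
    s = tau * sum(a * b for a, b in zip(u, x))
    return [b - s * a for a, b in zip(u, x)]


if __name__ == "__main__":
    x = [3.0, 4.0]
    u, tau, alpha = householder(x)
    print(u, tau, alpha, apply_reflector(u, tau, x))
```

Storing $u$ with a unit first entry is what lets the vectors later be packed into a unit lower trapezoidal $Y$.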
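4 Aggregating Householder transformations: $Q = I - Y T Y^T$
Two elementary transformations: given $Q_1 = I - t_1 y_1 y_1^T$ and $Q_2 = I - t_2 y_2 y_2^T$, then $Q_1 Q_2 = I - Y T Y^T$ with $Y = \begin{pmatrix} y_1 & y_2 \end{pmatrix}$ and $T = \begin{pmatrix} t_1 & -t_1 y_1^T y_2 t_2 \\ 0 & t_2 \end{pmatrix}$.
One block and one elementary transformation: given $Q_1 = I - Y_1 T_1 Y_1^T$ and $Q_2 = I - t_2 y_2 y_2^T$, then $Y = \begin{pmatrix} Y_1 & y_2 \end{pmatrix}$ and $T = \begin{pmatrix} T_1 & -t_2 T_1 Y_1^T y_2 \\ 0 & t_2 \end{pmatrix}$. Column by column, using Level 2 operations.
Two block transformations: given $Q_1 = I - Y_1 T_1 Y_1^T$ and $Q_2 = I - Y_2 T_2 Y_2^T$, then $Y = \begin{pmatrix} Y_1 & Y_2 \end{pmatrix}$ and $T = \begin{pmatrix} T_1 & -T_1 Y_1^T Y_2 T_2 \\ 0 & T_2 \end{pmatrix}$. Recursively, block by block, using Level 3 operations.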
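The aggregation rules can be checked numerically. A minimal pure-Python sketch of the two-block rule; since a single reflector is just a one-column $Y$ with a $1 \times 1$ $T$, the same function also covers the other two cases. The helper names `matmul`, `transpose`, `merge_wy` and `explicit_q` are illustrative, not from the slides, and `explicit_q` forms $Q$ densely only for checking.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def merge_wy(Y1, T1, Y2, T2):
    """Aggregate Q1 = I - Y1 T1 Y1^T and Q2 = I - Y2 T2 Y2^T into
    Q1 Q2 = I - Y T Y^T with Y = (Y1 Y2), T = [[T1, T12], [0, T2]]
    and T12 = -T1 (Y1^T Y2) T2."""
    T12 = matmul(matmul(T1, matmul(transpose(Y1), Y2)), T2)
    k1, k2 = len(T1), len(T2)
    Y = [r1 + r2 for r1, r2 in zip(Y1, Y2)]
    T = [T1[i] + [-v for v in T12[i]] for i in range(k1)]
    T += [[0.0] * k1 + T2[i] for i in range(k2)]
    return Y, T


def explicit_q(Y, T):
    """Form I - Y T Y^T explicitly (for checking only)."""
    YTYt = matmul(matmul(Y, T), transpose(Y))
    m = len(Y)
    return [[(1.0 if i == j else 0.0) - YTYt[i][j] for j in range(m)] for i in range(m)]


if __name__ == "__main__":
    # Two elementary transformations ...
    Y1, T1 = [[1.0], [0.5], [0.25]], [[0.8]]
    Y2, T2 = [[0.0], [1.0], [-0.3]], [[1.2]]
    Y, T = merge_wy(Y1, T1, Y2, T2)
    # ... then one block plus one elementary transformation.
    Yb, Tb = merge_wy(Y, T, [[0.0], [0.0], [1.0]], [[0.5]])
    print(Tb)
```

The identity $Q_1 Q_2 = I - Y T Y^T$ is purely algebraic, so the check holds for any $Y_i$, $T_i$ of matching sizes.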
5 RGEQR3: recursive algorithm for QR factorization (in practice, Y and R overwrite A)
  [Y, R, T] = RGEQR3(A(1:m, 1:n))
  if (n = 1) then
      Compute the Householder transformation Q = I - t u u^T such that Q A(1:m, 1) = (x, 0)^T
      return (u, x, t)
  else
      let n1 = floor(n/2) and j1 = n1 + 1
      [Y1, R1, T1] = RGEQR3(A(1:m, 1:n1))                   ! Recursively factor left-hand side
      A(1:m, j1:n) = (I - Y1 T1^T Y1^T) A(1:m, j1:n)         ! Update remaining part of A
      [Y2, R2, T2] = RGEQR3(A(j1:m, j1:n))                  ! Recursively factor remaining part of A
      T3 = -T1 (Y1^T Y2) T2                                 ! Y2 padded with n1 leading zero rows
      Let R3 = A(1:n1, j1:n)
      Now, return [Y, R, T], where
          Y = ( Y1  (0; Y2) ),  T = ( T1  T3 ; 0  T2 ),  R = ( R1  R3 ; 0  R2 )
  end
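A pure-Python transcription of RGEQR3 makes it easy to check the algebra on small matrices. It is a sketch under stated assumptions: dense lists of rows, $m \ge n$, A is copied rather than overwritten, the recursion runs all the way down to single columns (no pruning), and the Householder convention follows LAPACK's DLARFG. The name `rgeqr3` and its helpers are illustrative; this is not the authors' code and makes no attempt at performance.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def householder(x):
    """(u, tau, alpha) with u[0] = 1 and (I - tau u u^T) x = alpha e_1."""
    sigma = sum(v * v for v in x[1:])
    if sigma == 0.0:
        return [1.0] + [0.0] * (len(x) - 1), 0.0, x[0]
    alpha = -math.copysign(math.sqrt(x[0] * x[0] + sigma), x[0])
    return [1.0] + [v / (x[0] - alpha) for v in x[1:]], (alpha - x[0]) / alpha, alpha


def rgeqr3(A):
    """Recursive QR of an m x n matrix A (m >= n, list of rows).
    Returns (Y, R, T) with A = Q [R; 0] and Q = I - Y T Y^T."""
    m, n = len(A), len(A[0])
    if n == 1:
        # Stopping criterion: one column, one Householder transformation.
        u, tau, alpha = householder([row[0] for row in A])
        return [[v] for v in u], [[alpha]], [[tau]]
    n1 = n // 2
    # Recursively factor the left-hand side A(:, 1:n1).
    Y1, R1, T1 = rgeqr3([row[:n1] for row in A])
    # Update the right-hand side: A2 <- Q1^T A2 = A2 - Y1 (T1^T (Y1^T A2)).
    A2 = [row[n1:] for row in A]
    YW = matmul(Y1, matmul(transpose(T1), matmul(transpose(Y1), A2)))
    A2 = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A2, YW)]
    # Recursively factor the remaining part A2(n1+1:m, :).
    Y2, R2, T2 = rgeqr3(A2[n1:])
    # Pad Y2 with n1 zero rows, then T3 = -T1 (Y1^T Y2) T2.
    n2 = n - n1
    Y2 = [[0.0] * n2 for _ in range(n1)] + Y2
    T3 = matmul(matmul(T1, matmul(transpose(Y1), Y2)), T2)
    Y = [r1 + r2 for r1, r2 in zip(Y1, Y2)]
    R = [R1[i] + A2[i] for i in range(n1)] + [[0.0] * n1 + R2[i] for i in range(n2)]
    T = [T1[i] + [-v for v in T3[i]] for i in range(n1)]
    T += [[0.0] * n1 + T2[i] for i in range(n2)]
    return Y, R, T


if __name__ == "__main__":
    A = [[4.0, 1.0, 2.0], [2.0, 3.0, 0.0], [1.0, 0.0, 5.0], [2.0, 2.0, 1.0]]
    Y, R, T = rgeqr3(A)
    for row in R:
        print(["%8.4f" % v for v in row])
```

Forming $Q = I - Y T Y^T$ from the returned factors and multiplying by $[R; 0]$ reproduces A, which is the quickest way to validate changes to the merge step.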
6 Some performance issues
Overhead from recursion becomes significant as n decreases. Cure: prune the recursion tree, i.e., stop the recursion at, e.g., n = 4.
The increasing FLOP count for the Q = I - Y T Y^T computations prohibits efficient use of the pure recursive algorithm for large n. Cure: a hybrid recursive algorithm.
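How much pruning saves can be counted directly. A small sketch, assuming the halving split $n_1 = \lfloor n/2 \rfloor$ and that every block of width at most `cutoff` is handed to a non-recursive kernel (an assumption about the leaf handling; the slide only says to stop at, e.g., n = 4). `count_calls` is a hypothetical name.

```python
def count_calls(n, cutoff=1):
    """Number of recursive invocations for n columns when the recursion
    stops at blocks of width n <= cutoff (leaves then use a
    non-recursive kernel)."""
    if n <= cutoff:
        return 1
    n1 = n // 2
    return 1 + count_calls(n1, cutoff) + count_calls(n - n1, cutoff)


if __name__ == "__main__":
    for cutoff in (1, 2, 4, 8):
        print(cutoff, count_calls(64, cutoff))
```

Without pruning the recursion tree has $2n - 1$ nodes; stopping at width 4 removes the three lowest levels, where call overhead is largest relative to the work done.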
7 Hybrid recursive algorithm RGEQRF
  [Y, R, T] = RGEQRF(A(1:m, 1:n))
  do j = 1, n, nb                                            ! nb is the block size
      jb = min(n - j + 1, nb)
      [Y, R, T] = RGEQR3(A(j:m, j:j+jb-1))                   ! Factor panel using recursive routine
      if (j + jb <= n) then                                  ! Update remaining part of A
          A(j:m, j+jb:n) = (I - Y T^T Y^T) A(j:m, j+jb:n)
      end if
  end do
Relation to LAPACK DGEQRF:
- Level 2 factorization of block panels replaced by recursive Level 3 computations.
- Level 2 computations of T replaced by recursive Level 3 computations in connection with the factorization.
Effects:
- Increased performance for block panels and T computations.
- The optimal block size is larger.
- Improved performance of the Level 3 updates.
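To make the index arithmetic of the blocked loop concrete, the sketch below lists, for given m, n and nb, which panel each iteration factors and which trailing block it updates, using the slide's 1-based inclusive ranges. It does no arithmetic on matrix entries; `rgeqrf_schedule` is a hypothetical name.

```python
def rgeqrf_schedule(m, n, nb):
    """Panel factorizations and trailing updates performed by
    'do j = 1, n, nb' in RGEQRF, as (kind, row range, column range)
    with 1-based inclusive bounds."""
    steps = []
    for j in range(1, n + 1, nb):
        jb = min(n - j + 1, nb)
        steps.append(("factor", (j, m), (j, j + jb - 1)))
        if j + jb <= n:
            steps.append(("update", (j, m), (j + jb, n)))
    return steps


if __name__ == "__main__":
    for step in rgeqrf_schedule(10, 7, 3):
        print(step)
```

The last panel may be narrower than nb, and no update follows it because no columns remain to its right.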
8 QR: Performance results, 160 MHz POWER2, m = n
9 QR: Performance results, 332 MHz PPC604e, m > n
10 RGELS: linear least squares routine
Solve $\min_X \|AX - B\|_F$, where A is m x n, X is n x nrhs and B is m x nrhs.
  X = RGELS(A(1:m, 1:n), B(1:m, 1:nrhs))
  do j = 1, n, nb                                            ! nb is the block size
      jb = min(n - j + 1, nb)
      [Y, R, T] = RGEQR3(A(j:m, j:j+jb-1))                   ! Factor panel using recursive routine
      ! Update remaining part of A and the complete B
      A(j:m, j+jb:n) = (I - Y T^T Y^T) A(j:m, j+jb:n)
      B(j:m, 1:nrhs) = (I - Y T^T Y^T) B(j:m, 1:nrhs)
  end do
  X = R^{-1} B(1:n, 1:nrhs)                                  ! Solve triangular system
Relation to LAPACK DGELS:
- Level 2 factorization of block panels replaced by recursive Level 3 computations.
- Level 2 computations of T replaced by recursive Level 3 computations in connection with the factorization (T is reused for the update of B).
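The essential structure of RGELS (apply each transformation to B while A is being factored, then back-substitute with R) can be shown in a small unblocked sketch. It assumes a full-rank A with $m \ge n$ and uses one reflector at a time in place of the recursive RGEQR3 panels, so it illustrates the data flow, not the blocking. `least_squares` and `apply_left` are hypothetical names.

```python
import math


def householder(x):
    """(u, tau, alpha) with u[0] = 1 and (I - tau u u^T) x = alpha e_1."""
    sigma = sum(v * v for v in x[1:])
    if sigma == 0.0:
        return [1.0] + [0.0] * (len(x) - 1), 0.0, x[0]
    alpha = -math.copysign(math.sqrt(x[0] * x[0] + sigma), x[0])
    return [1.0] + [v / (x[0] - alpha) for v in x[1:]], (alpha - x[0]) / alpha, alpha


def apply_left(u, tau, M, k, c0=0):
    """In place: M(k:, c0:) <- (I - tau u u^T) M(k:, c0:)."""
    for c in range(c0, len(M[0])):
        s = tau * sum(u[i] * M[k + i][c] for i in range(len(u)))
        for i in range(len(u)):
            M[k + i][c] -= s * u[i]


def least_squares(A, B):
    """min_X ||A X - B||_F for a full-rank m x n A (m >= n), B m x nrhs.
    Each reflector is applied to B as soon as it is computed."""
    A = [row[:] for row in A]
    B = [row[:] for row in B]
    m, n, nrhs = len(A), len(A[0]), len(B[0])
    for k in range(n):
        u, tau, _ = householder([A[i][k] for i in range(k, m)])
        apply_left(u, tau, A, k, k)  # column k becomes (alpha, 0, ..., 0)
        apply_left(u, tau, B, k)     # accumulates Q^T B
    # Back substitution R X = B(1:n, :), R in the upper triangle of A.
    X = [[0.0] * nrhs for _ in range(n)]
    for c in range(nrhs):
        for i in reversed(range(n)):
            s = B[i][c] - sum(A[i][j] * X[j][c] for j in range(i + 1, n))
            X[i][c] = s / A[i][i]
    return X


if __name__ == "__main__":
    A = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
    print(least_squares(A, [[1.0], [2.0], [2.0]]))
```

Because B is transformed together with A, $Q$ is never formed and the transformations are not recomputed afterwards, which is the role "reuse of T" plays in the blocked routine.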
11 RGELS: additional cases
Solve $\min_X \|A^T X - B\|_F$, where A is m x n, X is m x nrhs and B is n x nrhs:
- LAPACK DGELS computes an LQ factorization of A and solves the remaining triangular system. Each Householder transformation is computed on a row of A, i.e., its elements are stored with stride LDA.
- RGELS explicitly transposes A and solves $\min_X \|AX - B\|_F$ for the transposed A.
Underdetermined systems:
- LAPACK DGELS computes the minimum norm solution by computing A = QR or A = LQ.
- Work on RGELS is in progress. Gains made for overdetermined systems will generalize to the underdetermined case, and additional gains can be made by removing redundant computations in the updates.
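For the underdetermined case, the QR route works as follows: if A is m x n with $m \ge n$ and full column rank, then $A^T X = B$ has infinitely many solutions, and with $A = Q \begin{pmatrix} R \\ 0 \end{pmatrix}$ the minimum-norm one is $X = Q \begin{pmatrix} R^{-T} B \\ 0 \end{pmatrix}$ (this is the case LAPACK's DGELS handles with TRANS = 'T' and m >= n). A minimal unblocked sketch of that formula; `min_norm_transposed` and its helpers are hypothetical names, and nothing here reflects how RGELS will eventually implement it.

```python
import math


def householder(x):
    """(u, tau, alpha) with u[0] = 1 and (I - tau u u^T) x = alpha e_1."""
    sigma = sum(v * v for v in x[1:])
    if sigma == 0.0:
        return [1.0] + [0.0] * (len(x) - 1), 0.0, x[0]
    alpha = -math.copysign(math.sqrt(x[0] * x[0] + sigma), x[0])
    return [1.0] + [v / (x[0] - alpha) for v in x[1:]], (alpha - x[0]) / alpha, alpha


def apply_left(u, tau, M, k, c0=0):
    """In place: M(k:, c0:) <- (I - tau u u^T) M(k:, c0:)."""
    for c in range(c0, len(M[0])):
        s = tau * sum(u[i] * M[k + i][c] for i in range(len(u)))
        for i in range(len(u)):
            M[k + i][c] -= s * u[i]


def min_norm_transposed(A, B):
    """Minimum-norm solution of A^T X = B for an m x n A (m >= n) of full
    column rank, using A = Q [R; 0] and X = Q [R^{-T} B; 0]."""
    A = [row[:] for row in A]
    m, n, nrhs = len(A), len(A[0]), len(B[0])
    reflectors = []
    for k in range(n):
        u, tau, _ = householder([A[i][k] for i in range(k, m)])
        apply_left(u, tau, A, k, k)
        reflectors.append((u, tau))
    # Forward substitution R^T Z(1:n, :) = B; rows n+1..m of Z stay zero.
    Z = [[0.0] * nrhs for _ in range(m)]
    for c in range(nrhs):
        for i in range(n):
            s = B[i][c] - sum(A[j][i] * Z[j][c] for j in range(i))
            Z[i][c] = s / A[i][i]
    # X = Q Z = H_1 H_2 ... H_n Z, so apply the reflectors in reverse order.
    for k in reversed(range(n)):
        u, tau = reflectors[k]
        apply_left(u, tau, Z, k)
    return Z


if __name__ == "__main__":
    print(min_norm_transposed([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[1.0], [2.0]]))
```

Setting the trailing block of $Q^T X$ to zero is what makes the solution orthogonal to the null space of $A^T$, and hence of minimum norm.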
12 RGELS: NRHS, $\min_X \|AX - B\|_F$
13 RGELS: transposed, M 50, $\min_X \|A^T X - B\|_F$
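14 SMP-parallel algorithms for matrix factorizations
[Figure: the matrix divided into panels, with tasks F_1, U_1, F_2, U_2, F_3, U_3, ..., U_{r-1}, F_r; F_i is the factorization of panel i and U_i the update with the transformations from F_i.]
1. Factor the first panel, F_1.
2. Update U_1 and factor F_2.
3. Update U_2 and factor F_3, etc.
Factor F_i can start when U_{i-1} is completed for that panel.
Update U_i can start when F_i is completed and U_{i-1} is completed for that panel.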
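The two start conditions define a task graph. A hypothetical sketch, assuming each U_i is split into one task per panel k to be updated, written ("U", i, k), and that every task takes one time unit on unlimited processors, encodes the rules and computes the earliest step at which each task could start. The task encoding and the names `deps`, `all_tasks` and `earliest_step` are illustrative.

```python
def deps(task):
    """Dependencies from the slide. ("F", i) factors panel i;
    ("U", i, k) updates panel k (k > i) with the transformations of F_i."""
    if task[0] == "F":
        i = task[1]
        return [("U", i - 1, i)] if i > 1 else []
    _, i, k = task
    return [("F", i)] + ([("U", i - 1, k)] if i > 1 else [])


def all_tasks(r):
    """All tasks for r panels, listed so every task follows its dependencies."""
    tasks = []
    for i in range(1, r + 1):
        tasks.append(("F", i))
        tasks.extend(("U", i, k) for k in range(i + 1, r + 1))
    return tasks


def earliest_step(r):
    """Earliest start step of every task with unit-time tasks and
    unlimited processors."""
    step = {}
    for t in all_tasks(r):
        d = deps(t)
        step[t] = 1 + max(step[p] for p in d) if d else 0
    return step


if __name__ == "__main__":
    for task, s in sorted(earliest_step(4).items(), key=lambda kv: kv[1]):
        print(s, task)
```

Under these rules F_3 waits only for the update of panel 3 by panel 2, not for the update of panel 4, which is what lets factorizations overlap with the remaining updates.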
15 Parallel RGEQRF
  HRQR(m, n, A, work, ...)
  if me = 0 then                                             ! Factor first panel
      call RGEQR3(A(1:m, 1:firstjb)) -> (Y, R, nextT)
  end if
  do while (there is still enough work for me)
      call GEJOB(j, first, last, jb, T, nextT, Y, nextY, R, dofact, ...)     ! Get a new panel
      A(j:m, first:last) = (I - Y T^T Y^T) A(j:m, first:last)                 ! Update
      if dofact then                                                          ! Possibly factor
          call RGEQR3(A(first:m, first:first+jb-1)) -> (nextY, R, nextT)
      end if
  end while                                                                   ! Repeat
16 GEJOB: the pool-of-tasks implementation
  do while (I have not yet found a new task)
      Enter critical section
      If I did factor in my last task: update global variables             ! Inform about completed job
      If remaining problem is too small for the current # processors then
          Update global variables and terminate                            ! Exit if problem is too small
      else
          Find the next matrix block to update                             ! Find next task
          Test if it is OK to start working on this block, i.e., test that:   ! Dependency problems?
            - no one is writing on any column in this block
            - the block I will read is computed
            - if I will factor: is it safe to overwrite one of the T matrices?
          If (it is OK to start working on this block) then
              Update global variables                                      ! Inform about any changes to the pool
          else
              Update variables to show that no block is reserved
          endif
      endif
      Leave critical section
  enddo
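GEJOB keeps the pool consistent by doing all bookkeeping inside one critical section. The sketch below mimics that with Python's `threading.Condition`; it is hypothetical and simplified. It uses the per-panel reading of the start conditions on slide 14 (tasks ("F", i) and ("U", i, k)), checks only that a task's inputs are complete, and omits GEJOB's other tests (column write conflicts, reuse of the T buffers, the "problem too small" exit). `TaskPool` and `worker` are illustrative names.

```python
import threading


def deps(task):
    """("F", i) waits for ("U", i-1, i); ("U", i, k) waits for ("F", i)
    and ("U", i-1, k)."""
    if task[0] == "F":
        i = task[1]
        return [("U", i - 1, i)] if i > 1 else []
    _, i, k = task
    return [("F", i)] + ([("U", i - 1, k)] if i > 1 else [])


def all_tasks(r):
    tasks = []
    for i in range(1, r + 1):
        tasks.append(("F", i))
        tasks.extend(("U", i, k) for k in range(i + 1, r + 1))
    return tasks


class TaskPool:
    """Pool of tasks; every access to shared state holds self.cond."""

    def __init__(self, r):
        self.pending = all_tasks(r)
        self.done = set()
        self.log = []  # completion order
        self.cond = threading.Condition()

    def get_job(self):
        with self.cond:  # enter critical section
            while True:
                if not self.pending:
                    return None  # nothing left to reserve: terminate
                for t in self.pending:
                    if all(p in self.done for p in deps(t)):
                        self.pending.remove(t)  # reserve the task
                        return t
                self.cond.wait()  # no task ready: wait for a completion

    def complete(self, task):
        with self.cond:  # inform about completed job
            self.done.add(task)
            self.log.append(task)
            self.cond.notify_all()


def worker(pool):
    while (task := pool.get_job()) is not None:
        # The factorization or update for `task` would run here, outside the lock.
        pool.complete(task)


if __name__ == "__main__":
    pool = TaskPool(6)
    threads = [threading.Thread(target=worker, args=(pool,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(pool.log)
```

Only reservation and completion happen under the lock, so the expensive Level 3 work in each task can proceed concurrently on different processors.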
17 Parallel performance: 4-processor PPC604e
18 Conclusions
Recursion efficiently provides automatic variable blocking for an arbitrary number of levels in a memory hierarchy:
- QR factorization
- Linear least squares problem
Hybrid implementations outperform the LAPACK algorithms by:
- around 20% for large square problems (QR factorization)
- up to a factor 2.9 for tall-thin matrices (QR factorization)
- up to a factor of … for the RGELS $\min_X \|AX - B\|_F$ case
- up to a factor of 5 for the RGELS $\min_X \|A^T X - B\|_F$ case
Speedups of up to 3.97 on 4 processors for parallel QR.
Serial and parallel QR will be part of IBM ESSL 3.
19 References
E. Elmroth and F. Gustavson. A New Much Faster and Simpler Algorithm for LAPACK DGELS. Report UMINF, September 2000. Submitted to BIT.
E. Elmroth and F. Gustavson. High-Performance Library Software for QR Factorization. In P. Bjørstad et al. (eds.), Applied Parallel Computing. New Paradigms for HPC in Industry and Academia. Lecture Notes in Computer Science. Springer-Verlag, June 2000 (to appear).
E. Elmroth and F. Gustavson. Applying Recursion to Serial and Parallel QR Factorization Leads to Better Performance. IBM J. Research & Development, Vol. 44, No. 4, 2000.
E. Elmroth and F. Gustavson. New Serial and Parallel Recursive QR Factorization Algorithms for SMP Systems. In B. Kågström et al. (eds.), Applied Parallel Computing. Large Scale Scientific and Industrial Problems. Lecture Notes in Computer Science, No. 1541, 1998, pp. 120-128. Springer-Verlag.