Restructuring the Symmetric QR Algorithm for Performance. Field Van Zee Gregorio Quintana-Orti Robert van de Geijn
Transcription
1 Restructuring the Symmetric QR Algorithm for Performance. Field Van Zee, Gregorio Quintana-Orti, Robert van de Geijn
2 For details: Field Van Zee, Robert van de Geijn, and Gregorio Quintana-Orti. Restructuring the QR algorithm for Performance. ACM TOMS. Accepted (pending minor modifications).
This work was supported by:
- UTAustin-Portugal Colab program
- Microsoft
- NSF under grants OCI , CCF , and OCI
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).
3 Overview
- 50+ years of progress
- The hidden costs of MRRR and D&C
- QR algorithm basics
- Accumulating and applying rotations
- Performance
- Conclusion
5 Symmetric EVD/SVD: 50+ Years of Progress
- Recent progress focuses a lot on the mathematics side:
  - Divide & Conquer (Cuppen's) algorithm (D&C)
  - Method of Relatively Robust Representations (MRRR)
- Occasional revisit of Jacobi's method
- Progress on QR has been for the non-symmetric problem:
  - Aggressive Early Deflation
  - Multishift
6 Two Insights
- When computing the dense EVD (all eigenvalues and eigenvectors), D&C and MRRR have a hidden O(n^3) cost.
- QR becomes competitive if rotations are applied in batches:
  - Classical QR: cast in terms of vector-vector operations.
  - Batched application: cast in terms of computation that reuses data in cache, like matrix-matrix operations.
8 The Hidden Cost of D&C and MRRR
- Start with symmetric, dense A
- Reduce to tridiagonal form
- Compute spectral decomposition of T
- Backtransform
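The formulas on this slide did not survive transcription. As a hedged reconstruction in conventional notation (the symbols Q_T and D are assumptions introduced here; Q_A follows the later "Form Q A" slides), the three stages are usually written as:

```latex
\begin{align*}
A &= Q_A \, T \, Q_A^T
  && \text{(reduction to tridiagonal form)} \\
T &= Q_T \, D \, Q_T^T
  && \text{(spectral decomposition of } T\text{)} \\
A &= (Q_A Q_T) \, D \, (Q_A Q_T)^T
  && \text{(backtransformation: form } Q_A Q_T\text{)}
\end{align*}
```

Here D is diagonal (the eigenvalues) and the columns of Q_A Q_T are the eigenvectors of A.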
9-12 Reduction to Tridiagonal Form (figure sequence)
13-16 Backtransformation (figure sequence)
17 Cost of QR algorithm
- Start with symmetric, dense A
- Reduce to tridiagonal form
- Form Q_A
- Compute spectral decomposition of T while updating Q_A
18-22 Form Q_A (figure sequence)
23 Cost
- Backtransformation: 2n^3 flops
- Form Q_A: 4/3 n^3 flops
- Hidden cost of MRRR and D&C: 2/3 n^3 flops
For the EVD of a dense matrix (all eigenvalues and eigenvectors)!
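One way to read these three figures (an interpretation, not something the slide states explicitly) is that the hidden cost is the difference between backtransforming the eigenvectors of T and merely forming Q_A:

```latex
\[
\underbrace{2n^3}_{\text{backtransformation}}
\;-\;
\underbrace{\tfrac{4}{3}n^3}_{\text{form } Q_A}
\;=\;
\tfrac{2}{3}n^3 .
\]
```

As an illustrative case, n = 3000 gives n^3 = 2.7 x 10^10, so the difference is about 1.8 x 10^10 flops.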
25 Classical QR algorithm
26-38 T (figure sequence)
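Since the figures for these slides are lost, the following is a minimal sketch of one QR step on a symmetric tridiagonal matrix using Givens rotations. It is an illustration under simplifying assumptions, not the implementation from the talk: it is unshifted, has no deflation, and stores T as a dense list of lists. The names `givens` and `qr_step` are hypothetical.

```python
import math

def givens(a, b):
    """Return (c, s) such that c*a + s*b = r and -s*a + c*b = 0."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r

def qr_step(T):
    """One unshifted QR step: factor T = Q R, then return (R Q, rotations)."""
    n = len(T)
    A = [row[:] for row in T]
    rots = []
    # Apply G_k^T from the left to zero each subdiagonal entry (A becomes R).
    for k in range(n - 1):
        c, s = givens(A[k][k], A[k + 1][k])
        rots.append((c, s))
        for j in range(n):
            x, y = A[k][j], A[k + 1][j]
            A[k][j] = c * x + s * y
            A[k + 1][j] = -s * x + c * y
    # Apply the same rotations from the right (A becomes R Q = Q^T T Q).
    for k, (c, s) in enumerate(rots):
        for i in range(n):
            x, y = A[i][k], A[i][k + 1]
            A[i][k] = c * x + s * y
            A[i][k + 1] = -s * x + c * y
    return A, rots

# Small example matrix (chosen here for illustration only).
T = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
for _ in range(200):
    T, _rots = qr_step(T)
eigs = sorted(T[i][i] for i in range(3))
```

Each step is an orthogonal similarity transformation, so the eigenvalues are preserved while the off-diagonal entries shrink. The rotations returned by each step are what later get accumulated into the eigenvector matrix.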
40-43 Accumulating Rotations (LAPACK): T (figure sequence)
44 Accumulating Rotations (LAPACK)
- Apply one sweep's worth of rotations at a time.
- This makes the application behave like a level-2 BLAS operation.
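As a rough sketch of what applying one sweep's worth of rotations means, the code below applies n-1 Givens rotations from the right to a matrix Q, where rotation k touches only columns k and k+1. The function name `apply_sweep` and the example angles are hypothetical, and this is plain Python for illustration, not LAPACK code.

```python
import math

def apply_sweep(Q, rots):
    """Apply rotations G_0, ..., G_{n-2} from the right: Q <- Q G_0 ... G_{n-2}."""
    for k, (c, s) in enumerate(rots):
        for row in Q:
            # Each rotation updates only columns k and k+1 of every row.
            x, y = row[k], row[k + 1]
            row[k] = c * x + s * y
            row[k + 1] = -s * x + c * y
    return Q

n = 4
Q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
rots = [(math.cos(t), math.sin(t)) for t in (0.3, 1.1, -0.7)]
apply_sweep(Q, rots)
```

Every rotation performs O(n) flops on O(n) data (two columns), and the whole matrix is streamed through once per sweep, which is why this formulation offers little cache reuse.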
45-56 Accumulating Rotations (libflame): T (figure sequence)
57-58 Applying Rotations (libflame) (figure sequence)
59 Optimization
- Applying a batch of Givens rotations: O(n^2 b) operations on O(n^2) data.
- Can attain level-3 BLAS performance.
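To illustrate why batching helps, the sketch below applies b sweeps of rotations in two loop orders and checks that they agree. This is only a simple loop interchange chosen for illustration; it is not the libflame kernel, whose ordering is described in the paper. All function names are hypothetical.

```python
import math
import random

def rotate(row, k, c, s):
    """Apply one Givens rotation to entries k and k+1 of a row, in place."""
    x, y = row[k], row[k + 1]
    row[k] = c * x + s * y
    row[k + 1] = -s * x + c * y

def apply_sweep_by_sweep(Q, sweeps):
    """Conventional order: every rotation streams through all rows of Q."""
    for rots in sweeps:
        for k, (c, s) in enumerate(rots):
            for row in Q:
                rotate(row, k, c, s)

def apply_batched(Q, sweeps):
    """Batched order: each row receives all b sweeps before moving on."""
    for row in Q:
        for rots in sweeps:
            for k, (c, s) in enumerate(rots):
                rotate(row, k, c, s)

def random_rotation(rng):
    t = rng.uniform(-math.pi, math.pi)
    return math.cos(t), math.sin(t)

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

n, b = 6, 3
rng = random.Random(0)
sweeps = [[random_rotation(rng) for _ in range(n - 1)] for _ in range(b)]
Q1, Q2 = identity(n), identity(n)
apply_sweep_by_sweep(Q1, sweeps)
apply_batched(Q2, sweeps)
max_diff = max(abs(Q1[i][j] - Q2[i][j]) for i in range(n) for j in range(n))
```

Rotations applied from the right act on each row independently, so the two orders give the same result. In the batched order each row is loaded once and receives b(n-1) rotations, which matches the slide's count of O(n^2 b) operations on O(n^2) data.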
61 Predicted Performance [Figure panels: Conventional QR / MRRR (real); Restructured QR / MRRR (real)]
62 Predicted Performance (EVD) [Figure panels: Conventional QR / MRRR (complex); Restructured QR / MRRR (complex)]
63 Observed Performance
- Target architecture:
  - Single core of a Dell PowerEdge R900 server
  - 16 megabyte L2 cache/core
  - Single core peak of FLOPS
64 Application of Givens rotations [Figure. Curves: theoretical peak for dgemm; dgemm; theoretical peak for Givens kernel; observed]
65 EVD performance (relative to netlib MRRR) [Figure. Curves: MKL MRRR; Netlib MRRR; Restructured QR]
67 libflame SVD Performance [Figure. Curves: libflame SVD; Netlib via DC]
68 EVD Parallel Performance (24 cores) [Figure: Performance on clarksville (24 cores). Vertical axis: Standardized FLOPS. Horizontal axis: Matrix size. Legend: LAPACK QR; LAPACK DC; LAPACK MRRR; Ideal MRRR; libflame var1; libflame var2; var2a (vertical wspace in backtrans.); var2r (outside BLAS parallelism). Annotations: Infinitely fast MRRR tridiag EVD; LAPACK MRRR; restructured QR]
69 Is your favorite graph missing?
- The paper has an electronic appendix with tons of performance graphs.
71 Conclusion
- The QR algorithm lives!
- Future directions:
  - Parallelization
  - (multi)GPU
  - Aggressive early deflation