Scalable, Robust, Fault-Tolerant Parallel QR Factorization. Camille Coti
1 Scalable, Robust, Fault-Tolerant Parallel QR Factorization
Camille Coti, LIPN, CNRS UMR 7030, SPC, Université Paris 13, France
DCABES, August 24th
2 Roadmap
1. Failures; QR Factorization
2. Communication-avoiding QR
3. Fault-tolerant QR
4. [title lost in transcription]
3 Large-scale systems are volatile
Life expectancy of an electronic component: the famous bathtub curve.
4 Reliability of a distributed system
Mean Time Between Failures:
$$\mathrm{MTBF}_{total} = \left( \sum_{i=0}^{n-1} \frac{1}{\mathrm{MTBF}_i} \right)^{-1} \qquad (1)$$
The more components a system is made of, the more likely it is to have a failure.
[Figure: Mean Time Between Failures of the system (hours) as a function of the number of components in the system.]
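As a rough illustration of equation (1), the sketch below computes the MTBF of a system built from identical components. The function name `system_mtbf` and the 100,000-hour component MTBF are assumptions made for this example, not values taken from the talk.

```python
def system_mtbf(component_mtbfs):
    """MTBF of a system that fails as soon as any one component fails (eq. 1)."""
    return 1.0 / sum(1.0 / m for m in component_mtbfs)


# Hypothetical components, each with an MTBF of 100,000 hours.
for n in (1, 1_000, 100_000):
    print(n, "components:", system_mtbf([100_000.0] * n), "hours")
```

With identical components the system MTBF is simply the component MTBF divided by the number of components, which is why it collapses so quickly at scale.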
5 QR Factorization
Matrix A: A = QR
- R is upper-triangular
- Q is orthogonal
Uniqueness:
- Yes, if the diagonal elements of R are positive (and A has full column rank)
- No otherwise
Several algorithms:
- Dense linear algebra: algorithms based on unitary transformations
- (Modified) Gram-Schmidt, Givens rotations, Householder reflections...
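The uniqueness remark can be checked numerically. The sketch below, using NumPy, normalises a thin QR factorization so that the diagonal of R is positive, and shows that flipping the signs of Q and R yields another valid factorization. The helper name `qr_positive_diagonal` is an assumption made for this example.

```python
import numpy as np


def qr_positive_diagonal(a):
    """Thin QR of a, normalised so that the diagonal of R is non-negative."""
    q, r = np.linalg.qr(a)
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    # Flip the sign of column j of Q and of row j of R together: Q R is unchanged.
    return q * signs, signs[:, None] * r


rng = np.random.default_rng(0)
a = rng.standard_normal((6, 3))
q, r = qr_positive_diagonal(a)

# Without the sign convention the factorization is not unique:
q_other, r_other = -q, -r
print(np.allclose(q @ r, a), np.allclose(q_other @ r_other, a))
```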
8 QR Factorization: panel-update approach
- Factorize a panel
- Update the trailing matrix
- Repeat recursively on the trailing matrix
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = Q_1 \begin{pmatrix} R_{11} & R_{12} \\ 0 & A_{22}^{1} \end{pmatrix}$$
[Figure: the matrix split into the panel and the trailing matrix.]
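The three steps of the panel-update approach can be sketched sequentially with NumPy. This is a minimal illustration, not the parallel algorithm of the talk: the function name `panel_update_qr` and the panel width `nb` are assumptions, and Q is accumulated explicitly only so that the result can be checked.

```python
import numpy as np


def panel_update_qr(a, nb):
    """Blocked QR: factorize a panel, update the trailing matrix, repeat."""
    r = np.array(a, dtype=float)
    m, n = r.shape
    q = np.eye(m)
    for k in range(0, min(m, n), nb):
        kb = min(nb, n - k)
        # Factorize the current panel (rows k.., columns k..k+kb-1).
        qk, rk = np.linalg.qr(r[k:, k:k + kb], mode="complete")
        r[k:, k:k + kb] = rk
        # Update the trailing matrix with Q_k^T.
        r[k:, k + kb:] = qk.T @ r[k:, k + kb:]
        # Accumulate Q = Q_1 Q_2 ... (for verification only).
        q[:, k:] = q[:, k:] @ qk
    return q, np.triu(r)


rng = np.random.default_rng(1)
a = rng.standard_normal((8, 5))
q, r = panel_update_qr(a, nb=2)
print(np.allclose(q @ r, a))
```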
15 QR factorization of a panel: particular shape
- M >> N: the matrix is tall-and-skinny
- Specific algorithm: TSQR
$$\begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix} = Q_1 \begin{pmatrix} R_{11} \\ 0 \end{pmatrix}$$
[Figure: TSQR reduction tree on processes P0 to P3, which hold the blocks A0 to A3; each step produces Householder vectors V, and the final R is obtained on P0.]
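The TSQR reduction tree can be simulated in a single process with NumPy. In this sketch each list entry stands for the block held by one process, and only the R factor is tracked; the function name `tsqr_r` and the four-block split are assumptions made for the example.

```python
import numpy as np


def tsqr_r(blocks):
    """R factor of the stacked blocks, computed with a pairwise reduction tree."""
    # Step 0: every process factorizes its own block.
    rs = [np.linalg.qr(b, mode="r") for b in blocks]
    # Next steps: pairs of R factors are stacked and factorized again.
    while len(rs) > 1:
        merged = [
            np.linalg.qr(np.vstack((rs[i], rs[i + 1])), mode="r")
            for i in range(0, len(rs) - 1, 2)
        ]
        if len(rs) % 2:  # an unpaired R factor moves up the tree unchanged
            merged.append(rs[-1])
        rs = merged
    return rs[0]


rng = np.random.default_rng(2)
a = rng.standard_normal((40, 3))        # tall-and-skinny: M >> N
r = tsqr_r(np.split(a, 4))              # blocks A0..A3 of "P0".."P3"
print(np.allclose(r.T @ r, a.T @ a))
```

The check uses the identity R^T R = A^T A, which holds for any valid R factor regardless of the sign convention.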
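16 After the panel factorization:
- Compute the compact representation of $Q_1$: $Q_1 = I - Y_1 T_1 Y_1^T$
- Use the $Y_1$ and $T_1$ matrices to update the trailing matrix by applying $Q_1^T = I - Y_1 T_1^T Y_1^T$:
$$\left( I - Y_1 T_1^T Y_1^T \right) \begin{pmatrix} A_{12} \\ A_{22} \end{pmatrix} = \begin{pmatrix} A_{12} \\ A_{22} \end{pmatrix} - Y_1 \left( T_1^T \left( Y_1^T \begin{pmatrix} A_{12} \\ A_{22} \end{pmatrix} \right) \right) = \begin{pmatrix} R_{12} \\ A_{22}^{1} \end{pmatrix}$$
- $R_{12}$: upper part of R
- $A_{22}^{1}$: new trailing matrix
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = Q_1 \begin{pmatrix} R_{11} & R_{12} \\ 0 & A_{22}^{1} \end{pmatrix}$$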
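To make the compact representation concrete, the sketch below builds Y and T from Householder reflectors with the usual column-by-column recurrence, then checks the trailing-matrix update formula. The function name `householder_wy` is an assumption, and this is an unblocked reference version written for clarity rather than performance.

```python
import numpy as np


def householder_wy(a):
    """Householder QR returning Y, T, R with Q = I - Y T Y^T and Q^T a = [R; 0]."""
    work = np.array(a, dtype=float)
    m, n = work.shape
    y = np.zeros((m, n))
    t = np.zeros((n, n))
    for j in range(n):
        x = work[j:, j]
        alpha = -np.copysign(np.linalg.norm(x), x[0])
        v = x.copy()
        v[0] -= alpha
        v /= v[0]                          # normalise so that v[0] == 1
        tau = 2.0 / (v @ v)
        # Apply H_j = I - tau v v^T to the remaining columns.
        work[j:, j:] -= tau * np.outer(v, v @ work[j:, j:])
        y[j:, j] = v
        # Extend T so that H_0 ... H_j = I - Y T Y^T still holds.
        t[:j, j] = -tau * t[:j, :j] @ (y[:, :j].T @ y[:, j])
        t[j, j] = tau
    return y, t, np.triu(work[:n, :])


rng = np.random.default_rng(3)
a = rng.standard_normal((8, 3))            # the panel
c = rng.standard_normal((8, 4))            # the trailing columns
y, t, r = householder_wy(a)
q = np.eye(8) - y @ t @ y.T
c_updated = c - y @ (t.T @ (y.T @ c))      # update without forming Q
print(np.allclose(q.T @ c, c_updated))
```

Evaluating the product from the innermost parenthesis outward, as on the slide, keeps every intermediate result small: no M-by-M matrix is ever formed.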
19 [Figure: the trailing-matrix blocks Cij held by processes P0 to P3 at Step 0, Step 1 and Step 2.]
26 Let's look at TSQR in detail
[Figure: TSQR reduction tree on processes P0 to P3, which hold the blocks A0 to A3, with the Householder vectors V produced at each step and the final R on P0.]
27
- P0 works from beginning to end
- P2 works during the first two steps, then stops
- P1 and P3 work during the first step, then stop
Let's put these lazy dudes to work!
30 Fault-tolerant TSQR
[Figure: communication pattern of fault-tolerant TSQR among P0, P1, P2 and P3.]
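One way to read the fault-tolerant TSQR slide, given the later remark that data is exchanged instead of sent, is that partners swap their R factors at every step and both perform the factorization. The single-process simulation below follows that reading. The function name `redundant_tsqr`, the butterfly partner choice `rank ^ step` and the power-of-two process count are assumptions made for this sketch.

```python
import numpy as np


def redundant_tsqr(blocks):
    """Simulated TSQR in which no process goes idle: partners exchange R factors.

    Returns one R factor per simulated process.
    """
    p = len(blocks)
    if p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two number of processes")
    rs = [np.linalg.qr(b, mode="r") for b in blocks]
    step = 1
    while step < p:
        new_rs = []
        for rank in range(p):
            buddy = rank ^ step                      # exchange partner at this step
            lo, hi = min(rank, buddy), max(rank, buddy)
            # Both partners stack the two factors in the same order.
            new_rs.append(np.linalg.qr(np.vstack((rs[lo], rs[hi])), mode="r"))
        rs = new_rs
        step *= 2
    return rs


rng = np.random.default_rng(4)
a = rng.standard_normal((32, 3))
rs = redundant_tsqr(np.split(a, 4))
print(all(np.allclose(r, rs[0]) for r in rs))
```

In this reading the number of copies of each intermediate R doubles at every step, so a failed process can fetch its lost state from a surviving partner.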
31 Update of the trailing matrix
- Triggered by the tree of the panel factorization
- Uses the intermediate Y and T matrices
- Pairwise operation tree
$$\begin{pmatrix} R_0 & C_0' \\ R_1 & C_1' \end{pmatrix} = Q \begin{pmatrix} R & \hat{C}_0' \\ 0 & \hat{C}_1' \end{pmatrix}$$
Decompose the blocks of the trailing matrix: $C_i = \begin{pmatrix} C_i' \\ C_i'' \end{pmatrix}$, with $C_i' = C_i[:N-1]$ and $C_i'' = C_i[N:]$.
Compute the update:
- P0 sends $C_0'$ to P1, receives $W$, and computes $\hat{C}_0' = C_0' - W$
- P1 computes $W = T^T \left( C_0' + Y_1^T C_1' \right)$ and $\hat{C}_1' = C_1' - Y_1 W$
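The pairwise update can be verified numerically. The sketch below factorizes two stacked triangular factors with structured Householder reflectors of the form [e_j; y_j], so that Q = I - [I; Y1] T [I; Y1]^T, and then compares the W-based update with an explicit application of Q^T. The function name `stacked_qr` and the variable names are assumptions made for this example.

```python
import numpy as np


def stacked_qr(r0, r1):
    """QR of [R0; R1] with reflectors [e_j; y_j]; returns Y1, T and the new R."""
    n = r0.shape[0]
    top, bot = np.array(r0, dtype=float), np.array(r1, dtype=float)
    y1 = np.zeros_like(bot)
    t = np.zeros((n, n))
    for j in range(n):
        x0, xb = top[j, j], bot[:, j]
        alpha = -np.copysign(np.hypot(x0, np.linalg.norm(xb)), x0)
        y = xb / (x0 - alpha)                      # reflector v = [e_j; y]
        tau = 2.0 / (1.0 + y @ y)
        w = tau * (top[j, j:] + y @ bot[:, j:])    # tau * v^T [top; bot]
        top[j, j:] -= w
        bot[:, j:] -= np.outer(y, w)
        t[:j, j] = -tau * t[:j, :j] @ (y1[:, :j].T @ y)
        t[j, j] = tau
        y1[:, j] = y
    return y1, t, np.triu(top)


rng = np.random.default_rng(5)
n, k = 3, 4
r0 = np.linalg.qr(rng.standard_normal((6, n)), mode="r")
r1 = np.linalg.qr(rng.standard_normal((6, n)), mode="r")
c0, c1 = rng.standard_normal((n, k)), rng.standard_normal((n, k))

y1, t, r = stacked_qr(r0, r1)
w = t.T @ (c0 + y1.T @ c1)          # computed on P1
c0_hat = c0 - w                      # computed on P0 once it has received W
c1_hat = c1 - y1 @ w                 # computed on P1

v = np.vstack((np.eye(n), y1))
q = np.eye(2 * n) - v @ t @ v.T      # explicit Q, for checking only
print(np.allclose(q.T @ np.vstack((c0, c1)), np.vstack((c0_hat, c1_hat))))
```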
33 Fault-tolerant update
Change the order of the operations: P0 and P1 exchange their data ($C_0'$ from P0; $C_1'$ and $Y_1$ from P1), then both compute $W$.
- P0: $W = T^T \left( C_0' + Y_1^T C_1' \right)$, then $\hat{C}_0' = C_0' - W$
- P1: $W = T^T \left( C_0' + Y_1^T C_1' \right)$, then $\hat{C}_1' = C_1' - Y_1 W$
If a process fails:
- Both processes have the W matrix
- Computation recovered from the W matrix and the initial submatrix
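The recovery argument can be illustrated with a few lines of NumPy. In this sketch `y1` and `t` are random placeholders standing in for the matrices produced by the panel factorization, and T is assumed to be available on both processes; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 3, 4
# Placeholder data: in the real algorithm Y1 and T come from the panel tree.
y1 = rng.standard_normal((n, n))
t = np.triu(rng.standard_normal((n, n)))
c0 = rng.standard_normal((n, k))     # initial block of P0
c1 = rng.standard_normal((n, k))     # initial block of P1


def compute_w(t, y1, c0, c1):
    """W = T^T (C0' + Y1^T C1'), computed redundantly on both processes."""
    return t.T @ (c0 + y1.T @ c1)


# After the exchange, both processes hold C0', C1' and Y1.
w_on_p0 = compute_w(t, y1, c0, c1)
w_on_p1 = compute_w(t, y1, c0, c1)
c0_hat = c0 - w_on_p0                # P0's part of the update
c1_hat = c1 - y1 @ w_on_p1           # P1's part of the update

# Suppose P1 fails: P0 still has W, Y1 and the initial C1', so the lost
# block can be rebuilt without redoing the rest of the factorization.
c1_recovered = c1 - y1 @ w_on_p0
print(np.allclose(c1_recovered, c1_hat))
```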
34 Communication scheme
[Figure: communication scheme among P0, P1, P2 and P3.]
35 Soft errors
Soft errors:
- Bit flips causing numerical errors
- Caused by radiation, cosmic rays, various electric disturbances...
Approaches to tackle soft errors:
- Triple modular redundancy
- ECC memory
- Multiple executions...
- At several levels: hardware, system, application
Idea here:
- Take advantage of partial redundancy to detect errors
- Recompute only the erroneous step
40 Soft errors
[Figure: timeline of P0, P1, P2 and P3 alternating exchange and check phases (exchange, check, check, exchange, check).]
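The exchange-and-check pattern can be sketched as follows: two partners perform the same tree step, compare their results, and redo only that step when the results disagree. This is an illustration of the idea under stated assumptions, not the author's implementation; `checked_combine` and its `corrupt_first_try` hook, which simulates a bit flip, are hypothetical names.

```python
import numpy as np


def combine(r_a, r_b):
    """One TSQR tree step: R factor of the two stacked R factors."""
    return np.linalg.qr(np.vstack((r_a, r_b)), mode="r")


def checked_combine(r_a, r_b, corrupt_first_try=False, max_tries=3):
    """Run the step on both partners, compare, and redo it on a mismatch."""
    for attempt in range(max_tries):
        mine = combine(r_a, r_b)
        theirs = combine(r_a, r_b)        # the partner's redundant copy
        if corrupt_first_try and attempt == 0:
            mine[0, 0] += 1.0             # injected soft error
        if np.allclose(mine, theirs):     # check phase
            return mine, attempt
    raise RuntimeError("results still disagree after retries")


rng = np.random.default_rng(7)
r0 = np.linalg.qr(rng.standard_normal((10, 3)), mode="r")
r1 = np.linalg.qr(rng.standard_normal((10, 3)), mode="r")
r, retries = checked_combine(r0, r1, corrupt_first_try=True)
print("retries needed:", retries)
```

With only two copies a mismatch reveals that an error occurred but not which copy is wrong, so in this sketch both partners recompute the step.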
42 Algorithm-based fault-tolerance
- Failure recovery is handled by the application
- Behavior upon failures: defined in the algorithm
Goal:
- Minimal overhead during failure-free executions
- As few recomputations as possible after failures
Fault-tolerant algorithms for QR factorization:
- Use intrinsic redundancy or idle processes
- Operation reordering based on algebraic properties
- Small modifications in the critical path
  - No additional computations
  - Data exchange instead of send/recv
- Partial results already ready for recovery
43 Appendix: performance
[Figure: time (sec) against matrix size (1k, 4k, 8k, 16k, 32k) for CAQR, FT-TSQR, FT-CAQR and TSQR.]
Do we have a quorum? Quorum Systems Given a set U of servers, U = n: A quorum system is a set Q 2 U such that Q 1, Q 2 Q : Q 1 Q 2 Each Q in Q is a quorum How quorum systems work: A read/write shared register
More informationCRYOGENIC DRAM BASED MEMORY SYSTEM FOR SCALABLE QUANTUM COMPUTERS: A FEASIBILITY STUDY
CRYOGENIC DRAM BASED MEMORY SYSTEM FOR SCALABLE QUANTUM COMPUTERS: A FEASIBILITY STUDY MEMSYS-2017 SWAMIT TANNU DOUG CARMEAN MOINUDDIN QURESHI Why Quantum Computers? 2 Molecule and Material Simulations
More informationA Vector Space Justification of Householder Orthogonalization
A Vector Space Justification of Householder Orthogonalization Ronald Christensen Professor of Statistics Department of Mathematics and Statistics University of New Mexico August 28, 2015 Abstract We demonstrate
More informationInteger Least Squares: Sphere Decoding and the LLL Algorithm
Integer Least Squares: Sphere Decoding and the LLL Algorithm Sanzheng Qiao Department of Computing and Software McMaster University 28 Main St. West Hamilton Ontario L8S 4L7 Canada. ABSTRACT This paper
More informationError Detection, Correction and Erasure Codes for Implementation in a Cluster File-system
Error Detection, Correction and Erasure Codes for Implementation in a Cluster File-system Steve Baker December 6, 2011 Abstract. The evaluation of various error detection and correction algorithms and
More informationAssignment #9: Orthogonal Projections, Gram-Schmidt, and Least Squares. Name:
Assignment 9: Orthogonal Projections, Gram-Schmidt, and Least Squares Due date: Friday, April 0, 08 (:pm) Name: Section Number Assignment 9: Orthogonal Projections, Gram-Schmidt, and Least Squares Due
More informationSolving linear equations with Gaussian Elimination (I)
Term Projects Solving linear equations with Gaussian Elimination The QR Algorithm for Symmetric Eigenvalue Problem The QR Algorithm for The SVD Quasi-Newton Methods Solving linear equations with Gaussian
More informationSwitch Fabrics. Switching Technology S P. Raatikainen Switching Technology / 2004.
Switch Fabrics Switching Technology S38.65 http://www.netlab.hut.fi/opetus/s3865 L4 - Switch fabrics Basic concepts Time and space switching Two stage switches Three stage switches Cost criteria Multi-stage
More informationAMS 209, Fall 2015 Final Project Type A Numerical Linear Algebra: Gaussian Elimination with Pivoting for Solving Linear Systems
AMS 209, Fall 205 Final Project Type A Numerical Linear Algebra: Gaussian Elimination with Pivoting for Solving Linear Systems. Overview We are interested in solving a well-defined linear system given
More informationOrthonormal Bases; Gram-Schmidt Process; QR-Decomposition
Orthonormal Bases; Gram-Schmidt Process; QR-Decomposition MATH 322, Linear Algebra I J. Robert Buchanan Department of Mathematics Spring 205 Motivation When working with an inner product space, the most
More informationShortest paths with negative lengths
Chapter 8 Shortest paths with negative lengths In this chapter we give a linear-space, nearly linear-time algorithm that, given a directed planar graph G with real positive and negative lengths, but no
More informationLinear Algebra, part 3 QR and SVD
Linear Algebra, part 3 QR and SVD Anna-Karin Tornberg Mathematical Models, Analysis and Simulation Fall semester, 2012 Going back to least squares (Section 1.4 from Strang, now also see section 5.2). We
More information5. Communication resources
5. Communication resources Classical channel Quantum channel Entanglement How does the state evolve under LOCC? Properties of maximally entangled states Bell basis Quantum dense coding Quantum teleportation
More informationMatrices, Vector Spaces, and Information Retrieval
Matrices, Vector Spaces, and Information Authors: M. W. Berry and Z. Drmac and E. R. Jessup SIAM 1999: Society for Industrial and Applied Mathematics Speaker: Mattia Parigiani 1 Introduction Large volumes
More information5.6. PSEUDOINVERSES 101. A H w.
5.6. PSEUDOINVERSES 0 Corollary 5.6.4. If A is a matrix such that A H A is invertible, then the least-squares solution to Av = w is v = A H A ) A H w. The matrix A H A ) A H is the left inverse of A and
More informationTowards optimal synchronous counting
Towards optimal synchronous counting Christoph Lenzen Joel Rybicki Jukka Suomela MPI for Informatics MPI for Informatics Aalto University Aalto University PODC 5 July 3 Focus on fault-tolerance Fault-tolerant
More informationSafety and Reliability of Embedded Systems
(Sicherheit und Zuverlässigkeit eingebetteter Systeme) Fault Tree Analysis Mathematical Background and Algorithms Prof. Dr. Liggesmeyer, 0 Content Definitions of Terms Introduction to Combinatorics General
More informationA New Space for Comparing Graphs
A New Space for Comparing Graphs Anshumali Shrivastava and Ping Li Cornell University and Rutgers University August 18th 2014 Anshumali Shrivastava and Ping Li ASONAM 2014 August 18th 2014 1 / 38 Main
More informationA DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS
SIAM J. SCI. COMPUT. Vol. 39, No. 4, pp. C292 C318 c 2017 Society for Industrial and Applied Mathematics A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS ZIXING
More informationPART II : Least-Squares Approximation
PART II : Least-Squares Approximation Basic theory Let U be an inner product space. Let V be a subspace of U. For any g U, we look for a least-squares approximation of g in the subspace V min f V f g 2,
More informationPreliminaries and Complexity Theory
Preliminaries and Complexity Theory Oleksandr Romanko CAS 746 - Advanced Topics in Combinatorial Optimization McMaster University, January 16, 2006 Introduction Book structure: 2 Part I Linear Algebra
More informationArnoldi Methods in SLEPc
Scalable Library for Eigenvalue Problem Computations SLEPc Technical Report STR-4 Available at http://slepc.upv.es Arnoldi Methods in SLEPc V. Hernández J. E. Román A. Tomás V. Vidal Last update: October,
More information