18.409 The Behavior of Algorithms in Practice, Lecture 3 (2/14/02)
Lecturer: Dan Spielman
Scribe: Arvind Sankar

1 Largest singular value

In order to bound the condition number, we need an upper bound on the largest singular value, in addition to the lower bound on the smallest one that we derived last class. Since the largest singular value of $A + G$ can be bounded by
$$\sigma_n(A + G) = \|A + G\| \le \|A\| + \|G\|,$$
and we can't really do much about $\|A\|$, the important thing to do is bound $\|G\|$. To start off with a weak but easy bound, we use the following simple lemma.

Lemma 1. If $a_i$ denote the columns of the matrix $A$, then
$$\max_i \|a_i\| \le \|A\| \le \sqrt{d}\,\max_i \|a_i\|.$$

Proof. If $e_i$ denotes the vector with a 1 in the $i$th component and 0s everywhere else, then $A e_i = a_i$, so the left-hand inequality is clear. For the other inequality, let $x$ be a unit vector and write
$$Ax = A\Bigl(\sum_i x_i e_i\Bigr) = \sum_i x_i a_i.$$
Therefore
$$\|Ax\| \le \sum_i |x_i| \|a_i\|.$$
Applying Cauchy-Schwarz and using the fact that $\|x\| = 1$, we get
$$\|Ax\| \le \sqrt{\sum_i x_i^2}\,\sqrt{\sum_i \|a_i\|^2} \le \sqrt{d}\,\max_i \|a_i\|,$$
which is what we want.

If $g$ is a vector of $d$ independent Gaussian random variables with variance 1, then $\|g\|^2$ is distributed according to the $\chi^2$ distribution with $d$ degrees of freedom, which has density function
$$f(x) = \frac{x^{d/2-1} e^{-x/2}}{2^{d/2}\,\Gamma(d/2)}.$$
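As a quick numerical illustration of Lemma 1 and of the $\chi^2$ fact above, here is a minimal sketch (assuming Python with numpy; the dimension $d = 50$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
A = rng.standard_normal((d, d))

# Lemma 1: max_i ||a_i|| <= ||A|| <= sqrt(d) * max_i ||a_i||.
col_norms = np.linalg.norm(A, axis=0)   # column norms ||a_i||
op_norm = np.linalg.norm(A, 2)          # largest singular value ||A||
assert col_norms.max() <= op_norm <= np.sqrt(d) * col_norms.max()
print(f"max ||a_i|| = {col_norms.max():.2f}, ||A|| = {op_norm:.2f}, "
      f"sqrt(d) max ||a_i|| = {np.sqrt(d) * col_norms.max():.2f}")

# ||g||^2 for a standard Gaussian vector g in R^d is chi^2 with d degrees
# of freedom, so its empirical mean should be close to d.
g = rng.standard_normal((100_000, d))
print(f"mean ||g||^2 = {(g ** 2).sum(axis=1).mean():.1f}  (d = {d})")
```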

We need the following bound on how large a $\chi^2$ random variable can be.

Lemma 2. If $X$ is a random variable distributed according to the $\chi^2$ distribution with $d$ degrees of freedom, then
$$\Pr[X \ge kd] \le k^{d/2-1} e^{-d(k-1)/2}.$$

By Lemma 1, $\|G\| \ge kd$ implies $\max_i \|g_i\| \ge k\sqrt{d}$, i.e., $\max_i \|g_i\|^2 \ge k^2 d$. Hence, applying Lemma 2 with $k^2$ in place of $k$ and taking a union bound over the $d$ columns, we get
$$\Pr[\|G\| \ge kd] \le d\,k^{d-2} e^{-d(k^2-1)/2}.$$

2 A sharper bound using nets

The bound above is unsatisfying: for any fixed unit vector $x$, the vector $Gx$ is a Gaussian random vector, and so its length should be about $\sqrt{d}$ on average. This section will use this idea to get a bound on $\|G\|$ that grows as $\sqrt{d}$ rather than as $d$.

Let $S^{d-1}$ denote the $(d-1)$-dimensional unit sphere (the boundary of the unit ball in $d$ dimensions).

Definition 1. A $\lambda$-net on $S^{d-1}$ is a collection of points $\{x_1, x_2, \ldots, x_n\}$ such that for any $x \in S^{d-1}$,
$$\min_i \|x - x_i\| \le \lambda.$$

We will use only 1-nets, and the following lemma claims that they need not be too large.

Lemma 3. For $d \ge 2$, there exists a 1-net with at most $2^d(d-1)$ points.

Using this lemma, we can prove the following bound on $\|G\|$:

Lemma 4. If $G$ is a $d \times d$ matrix of standard normal variables, then
$$\Pr[\|G\| \ge 2k\sqrt{d}] \le 2^d(d-1)\,k^{d-2} e^{-d(k^2-1)/2}.$$

(This lemma appears with a slightly different bound as Lemma 2.8 on pg. 907 of [Sza90].)
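Before the proof, here is a quick Monte Carlo check of this scaling (a sketch, assuming numpy): the norm of a $d \times d$ Gaussian matrix should hover around $2\sqrt{d}$.

```python
import numpy as np

rng = np.random.default_rng(1)
for d in (10, 50, 200):
    # Average the largest singular value over a few Gaussian matrices.
    norms = [np.linalg.norm(rng.standard_normal((d, d)), 2) for _ in range(50)]
    print(f"d = {d:3d}: mean ||G|| = {np.mean(norms):6.2f}, "
          f"2 sqrt(d) = {2 * np.sqrt(d):6.2f}")
```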

Proof. Let $N$ be the 1-net given by Lemma 3. Let $G = U \Sigma V^T$ be the singular value decomposition of $G$, and let $u_i$ and $v_i$ be the columns of $U$ and $V$ respectively. By definition of the net, there exists a vector $x \in N$ such that
$$\|v_n - x\| \le 1.$$
Since both are unit vectors, $\|v_n - x\|^2 = 2 - 2\langle v_n, x\rangle$, so this is equivalent to
$$\langle v_n, x \rangle \ge \tfrac{1}{2}.$$
Expanding $x$ in the basis $v_i$, we obtain $x = \sum_i x_i v_i$ with $x_n \ge 1/2$. Hence
$$\|Gx\| = \Bigl\|\sum_i x_i G v_i\Bigr\| = \Bigl\|\sum_i x_i \sigma_i u_i\Bigr\| \ge x_n \sigma_n \ge \|G\|/2.$$
Hence $\|G\| \ge 2k\sqrt{d}$ implies that there exists $x \in N$ such that
$$\|Gx\| \ge k\sqrt{d}.$$
By the union bound and Lemma 2, we obtain
$$\Pr[\|G\| \ge 2k\sqrt{d}] \le |N|\,k^{d-2} e^{-d(k^2-1)/2},$$
which is the stated result.

3 Gaussian elimination

In the next couple of lectures, we will use the results we have proved to analyze Gaussian elimination. Briefly, Gaussian elimination solves a system $Ax = b$ by performing row and column operations on $A$ to reduce it to an upper triangular matrix, after which the system can be easily solved. Theoretically, one can view this process as factoring $A$ into a product of a lower triangular matrix representing the row operations performed (actually, their inverses) and an upper triangular matrix representing the result of these operations. This is called the LU factorization of $A$.

There are three pivoting strategies one can use while performing this algorithm (pivoting is the process of permuting rows and/or columns before doing the elimination).

1. No pivoting: just what it says. This can be done only if we never run into zeros on the diagonal. This is easy to analyze.

2. Partial pivoting: here only row permutations are permitted. The strategy is to bring the largest entry in the column we are considering onto the diagonal. The LU factorization now actually has to be written as $LU = PA$, where $P$ is a permutation matrix representing the row permutations performed. Partial pivoting guarantees that no entry in $L$ can exceed 1 in absolute value (see the sketch after this list).

3. Complete pivoting: here both row and column permutations are permitted, and the strategy is to move the largest entry in the part of the matrix that we have not yet processed to the diagonal. The factorization now looks like $LU = PAQ$, where $P$ and $Q$ are permutation matrices.
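The following sketch (assuming scipy; both matrices are illustrative) checks the $|L_{ij}| \le 1$ guarantee of partial pivoting, and also previews the failure mode discussed next: the classic example, usually attributed to Wilkinson, on which the last column of $U$ doubles at every elimination step.

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(2)

# Partial pivoting: scipy.linalg.lu factors A = P L U with unit lower
# triangular L, and every multiplier satisfies |L_ij| <= 1.
P, L, U = lu(rng.standard_normal((8, 8)))
assert np.max(np.abs(L)) <= 1.0
print("max |L_ij| =", np.max(np.abs(L)))

# Classic worst case: 1 on the diagonal and in the last column, -1 below
# the diagonal. No row swaps occur, and the last column of U becomes
# 1, 2, 4, ..., 2^(d-1), so ||U|| is exponential in d.
d = 10
A = np.eye(d) + np.tril(-np.ones((d, d)), -1)
A[:, -1] = 1.0
P, L, U = lu(A)
print("last column of U:", U[:, -1])
```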

Wilkinson showed that if $\hat L$, $\hat U$ and $\hat x$ represent the computed values of $L$, $U$ and $x$ in floating point to an accuracy of $\epsilon$, then $(A + \delta A)\hat x = b$ with $\delta A$ such that
$$\|\delta A\| \le d\epsilon\,\bigl(3\|A\| + 5\|\hat L\|\|\hat U\|\bigr).$$

Matlab uses partial pivoting, and it can be shown that there exist matrices $A$ for which partial pivoting fails, in the sense that $\|U\|$ becomes exponentially large (in $d$). This leads to a total loss of precision unless at least $d$ bits are used to store intermediate results. Wilkinson also showed that for complete pivoting,
$$\|U\| \le \|A\|\, d^{\frac{1}{2}\lg d},$$
which means that the number of bits required is only $\lg^2 d$ in the worst case. However, complete pivoting is much more expensive in floating point than partial pivoting, which seems to work quite well in practice. One of the goals of this class is to understand why. In the next couple of lectures, we will show in fact that no pivoting does well most of the time.

4 Proof of technical lemmas

For completeness, we give the proofs of Lemmas 2 and 3.

Proof of Lemma 2. We have
$$\Pr[X \ge kd] = \int_{kd}^{\infty} \frac{x^{d/2-1} e^{-x/2}}{2^{d/2}\,\Gamma(d/2)}\, dx = \int_{d}^{\infty} \frac{\bigl(x + (k-1)d\bigr)^{d/2-1} e^{-(k-1)d/2}\, e^{-x/2}}{2^{d/2}\,\Gamma(d/2)}\, dx.$$
Using $x + (k-1)d \le kx$ for $x \ge d$, this is at most
$$k^{d/2-1} e^{-(k-1)d/2} \int_{d}^{\infty} \frac{x^{d/2-1} e^{-x/2}}{2^{d/2}\,\Gamma(d/2)}\, dx \le k^{d/2-1} e^{-(k-1)d/2},$$
and we are done.
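As an empirical check of this tail bound (a sketch, assuming numpy; $d = 20$ and $k = 2$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
d, k = 20, 2.0
# Sample chi^2 variables with d degrees of freedom and compare the
# empirical tail Pr[X >= kd] with the bound of Lemma 2.
X = rng.chisquare(d, size=1_000_000)
empirical = (X >= k * d).mean()
bound = k ** (d / 2 - 1) * np.exp(-d * (k - 1) / 2)
print(f"Pr[X >= kd] ~ {empirical:.2e}, Lemma 2 bound = {bound:.2e}")
```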

Proof of Lemma 3. Let $N$ be a maximal set of points on the unit sphere such that the great circle distance between any two points in $N$ is at least $\pi/3$. Then $N$ is a 1-net: if $u$ were a unit vector such that no vector in $N$ is within distance 1 of $u$, then there would be no point of $N$ within great circle distance $\pi/3$ of $u$, so $u$ could be added to $N$, contradicting maximality.

To see that $|N| \le (d-1)2^d$, observe that the sets
$$B(x, \pi/6) = \{u \in S^{d-1} : d(u, x) \le \pi/6\}, \qquad x \in N,$$
are disjoint. A lower bound on the $(d-1)$-dimensional volume of each $B(x, \pi/6)$ is given by the volume of the $(d-1)$-dimensional ball of radius $\sin(\pi/6) = 1/2$. If $|S^{d-1}|$ denotes the volume of $S^{d-1}$ and $V_d$ the volume of the unit ball in $d$ dimensions, then
$$V_d = \frac{2\pi^{d/2}}{d\,\Gamma(d/2)} \qquad \text{and} \qquad |S^{d-1}| = \frac{2\pi^{d/2}}{\Gamma(d/2)}.$$
Hence
$$|N| \le \frac{|S^{d-1}|}{(1/2)^{d-1}\, V_{d-1}} = 2^{d-1}\,\frac{|S^{d-1}|}{V_{d-1}} = 2^{d-1}(d-1)\sqrt{\pi}\,\frac{\Gamma((d-1)/2)}{\Gamma(d/2)} \le 2^d(d-1).$$
A somewhat tighter bound can be obtained by using the fact that
$$\lim_{d \to \infty} \sqrt{d}\,\frac{\Gamma((d-1)/2)}{\Gamma(d/2)} = \sqrt{2}.$$

References

[Sza90] Stanislaw J. Szarek, Spaces with large distance to $\ell_\infty^n$ and random matrices, American Journal of Mathematics 112 (1990), no. 6, 899-942.