Homework 1. Yuan Yao. September 18, 2011

1. Singular Value Decomposition: The goal of this exercise is to refresh your memory about the singular value decomposition and matrix norms. A good reference for the singular value decomposition is Chapter 2 of Matrix Computations, Golub and Van Loan, 3rd edition. Parts of the book are available online here:
http://books.google.com/books?id=mloa7wpx6oyc&dq=matrix+computations&printsec=frontcover&source=bl&ots=lbfmg9jbly&sig=_znyb_4zdwftrtn3zckezh9hvza&hl=en&ei=cry8sug7nmo2lafuhtwyba&sa=x&oi=book_result&ct=result&resnum=3#v=onepage&q=&f=false

(a) Existence: Prove the existence of the singular value decomposition. That is, show that if A is an m × n real-valued matrix, then A = UΣV^T, where U is an m × m orthogonal matrix, V is an n × n orthogonal matrix, and Σ = diag(σ_1, σ_2, ..., σ_p) (where p = min{m, n}) is an m × n diagonal matrix. It is customary to order the singular values in decreasing order: σ_1 ≥ σ_2 ≥ ... ≥ σ_p ≥ 0. Determine to what extent the SVD is unique. (See Theorem 2.5.2, page 70, in Golub and Van Loan.)

(b) Best rank-k approximation, operator norm: Prove that the best rank-k approximation of a matrix in the operator norm sense is given by its SVD. That is, if A = UΣV^T is the SVD of A, then A_k = UΣ_k V^T (where Σ_k = diag(σ_1, σ_2, ..., σ_k, 0, ..., 0) is a diagonal matrix containing the largest k singular values) is a rank-k matrix that satisfies

    ‖A − A_k‖ = min_{rank(B)=k} ‖A − B‖.

(Recall that the operator norm of A is ‖A‖ = max_{‖x‖=1} ‖Ax‖. See Theorem 2.5.3, page 72, in Golub and Van Loan.)

(c) Best rank-k approximation, Frobenius norm: Show that the SVD also provides the best rank-k approximation in the Frobenius norm, that is, A_k = UΣ_k V^T satisfies

    ‖A − A_k‖_F = min_{rank(B)=k} ‖A − B‖_F.

(d) Schatten p-norms: A matrix norm ‖·‖ that satisfies ‖QAZ‖ = ‖A‖ for all orthogonal matrices Q and Z is called a unitarily invariant norm. The Schatten p-norm of a matrix A is given by the ℓ_p norm (p ≥ 1) of its vector of singular values, namely,

    ‖A‖_p = (Σ_i σ_i^p)^{1/p}.

Show that the Schatten p-norm is unitarily invariant. Note that the case p = 1 is sometimes called the nuclear norm of the matrix, the case p = 2 is the Frobenius norm, and p = ∞ is the operator norm.

(e) Best rank-k approximation for unitarily invariant norms: Show that the SVD provides the best rank-k approximation for any unitarily invariant norm. See also 7.4.51 and 7.4.52 in: Matrix Analysis, Horn and Johnson, Cambridge University Press, 1985.

(f) Closest rotation: Given a square n × n matrix A whose SVD is A = UΣV^T, show that its closest (in the Frobenius norm) orthogonal matrix R (satisfying RR^T = R^T R = I) is given by R = UV^T. That is, show that

    ‖A − UV^T‖_F = min_{RR^T = R^T R = I} ‖A − R‖_F,

where A = UΣV^T. In other words, R is obtained from the SVD of A by dropping the diagonal matrix Σ. Use this observation to conclude what is the optimal rotation that aligns two sets of points p_1, p_2, ..., p_n and q_1, ..., q_n in R^d, that is, find R that minimizes Σ_{i=1}^n ‖Rp_i − q_i‖².

See also (the papers are posted on the course website):

[Arun87] Arun, K. S., Huang, T. S., and Blostein, S. D., "Least-squares fitting of two 3-D point sets", IEEE Transactions on Pattern Analysis and Machine Intelligence, 9 (5), pp. 698-700, 1987.

[Keller75] Keller, J. B., "Closest Unitary, Orthogonal and Hermitian Operators to a Given Operator", Mathematics Magazine, 48 (4), pp. 192-197, 1975.

[FanHoffman55] Fan, K. and Hoffman, A. J., "Some Metric Inequalities in the Space of Matrices", Proceedings of the American Mathematical Society, 6 (1), pp. 111-116, 1955.
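
Although this problem asks for proofs, a quick numerical check can build intuition. The following NumPy sketch (the matrix sizes, the value of k, and the random seed are arbitrary illustrative choices) checks parts (b), (c) and (f) on random matrices; the operator-norm error should come out as σ_{k+1} and the Frobenius error as the square root of the tail sum of squared singular values.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n, k = 8, 6, 3
    A = rng.standard_normal((m, n))

    # Thin SVD: A = U @ diag(s) @ Vt, singular values in decreasing order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Best rank-k approximation A_k = U Sigma_k V^T, parts (b) and (c).
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    print("operator-norm error:", np.linalg.norm(A - A_k, 2),
          " vs sigma_{k+1} =", s[k])
    print("Frobenius error:    ", np.linalg.norm(A - A_k, "fro"),
          " vs tail of singular values =", np.sqrt(np.sum(s[k:] ** 2)))

    # Closest orthogonal matrix R = U V^T to a square matrix, part (f):
    # no random orthogonal matrix should come closer to B in Frobenius norm.
    B = rng.standard_normal((n, n))
    Ub, sb, Vbt = np.linalg.svd(B)
    R = Ub @ Vbt
    best_random = min(
        np.linalg.norm(B - np.linalg.qr(rng.standard_normal((n, n)))[0], "fro")
        for _ in range(1000))
    print("||B - UV^T||_F =", np.linalg.norm(B - R, "fro"), "<=", best_random)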

2. James-Stein Estimators: Suppose that X_i ∼ N(µ, I_p), i = 1, ..., n, are independent p-dimensional Gaussian vectors, and let

    y = µ̂_n = (1/n) Σ_{i=1}^n X_i ∼ N(µ, (1/n) I_p) = N(µ, ε²I_p),   ε² = 1/n.

Consider the James-Stein estimator

    µ^JS = (1 − ε²(p − 2)/‖µ̂_n‖²) µ̂_n

and its positive-part truncation

    µ^JS+ = (1 − ε²(p − 2)/‖µ̂_n‖²)_+ µ̂_n,

where (x)_+ = max(0, x).

(a) Show that

    E‖µ^JS − µ‖² ≤ E‖µ̂_n − µ‖² − ε⁴(p − 2)²/(ε²p + ‖µ‖²),

so when p ≫ 2 and µ = 0, the gain is of order O(ε²p). (Hint: use Jensen's inequality, as in Lemma 3.10 of [Tsybakov09].)

(b) Show that

    E‖µ^JS+ − µ‖² ≤ E‖µ^JS − µ‖².

(Hint: see Lemma A.6 in the Appendix of [Tsybakov09].)
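
A Monte Carlo sketch in NumPy (not required by the problem; the dimensions p and n, the number of trials, and the choice µ = 0 are illustrative assumptions) that compares the empirical risks of µ̂_n, µ^JS and µ^JS+:

    import numpy as np

    rng = np.random.default_rng(1)
    p, n, trials = 50, 100, 2000
    eps2 = 1.0 / n                      # epsilon^2 = 1/n, variance of each coordinate of mu_hat
    mu = np.zeros(p)                    # try mu != 0 as well to see the gain shrink

    risk_mle = risk_js = risk_jsp = 0.0
    for _ in range(trials):
        X = mu + rng.standard_normal((n, p))       # X_i ~ N(mu, I_p), one sample per row
        mu_hat = X.mean(axis=0)                    # mu_hat ~ N(mu, (1/n) I_p)
        shrink = 1.0 - eps2 * (p - 2) / np.sum(mu_hat ** 2)
        mu_js = shrink * mu_hat                    # James-Stein estimator
        mu_jsp = max(shrink, 0.0) * mu_hat         # positive-part truncation
        risk_mle += np.sum((mu_hat - mu) ** 2)
        risk_js += np.sum((mu_js - mu) ** 2)
        risk_jsp += np.sum((mu_jsp - mu) ** 2)

    print("E||mu_hat - mu||^2 ~", risk_mle / trials, "(theory: eps^2 * p =", eps2 * p, ")")
    print("E||mu_JS  - mu||^2 ~", risk_js / trials)
    print("E||mu_JS+ - mu||^2 ~", risk_jsp / trials, "(should be the smallest)")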

3. Phase transition in the PCA spike model: Consider a finite sample of n i.i.d. vectors x_1, x_2, ..., x_n drawn from the p-dimensional Gaussian distribution N(0, σ²I_{p×p} + λ_0 uu^T), where λ_0/σ² is the signal-to-noise ratio (SNR) and u ∈ R^p. In class we showed that the largest eigenvalue λ of the sample covariance matrix

    S_n = (1/n) Σ_{i=1}^n x_i x_i^T

pops outside the support of the Marcenko-Pastur distribution if

    λ_0/σ² > √γ,

or equivalently, if

    SNR > √(p/n)

(here γ = p/n). (Notice that √γ < (1 + √γ)², that is, λ_0 can be buried well inside the support of the Marcenko-Pastur distribution and still the largest eigenvalue pops outside its support.)

All the following questions refer to the limit n → ∞ and to almost sure values:

(a) Find λ given SNR > √γ.

(b) Use your previous answer to explain how the SNR can be estimated from the eigenvalues of the sample covariance matrix.

(c) Find the squared correlation between the eigenvector v of the sample covariance matrix (corresponding to the largest eigenvalue λ) and the true signal component u, as a function of the SNR, p and n. That is, find ⟨u, v⟩².

(d) Confirm your results using MATLAB simulations (e.g., set u = e_1, choose σ = 1, and vary λ_0 over different levels; compute the largest eigenvalue and its associated eigenvector, and compare them with the true ones). A NumPy sketch of such a simulation follows below.
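
For part (d), the homework suggests MATLAB; the following NumPy sketch is one possible alternative (the values of n, p, the list of λ_0 levels, and the choice u = e_1 are illustrative assumptions). It prints the empirical top eigenvalue and ⟨u, v⟩² so they can be compared with the formulas derived in (a) and (c).

    import numpy as np

    rng = np.random.default_rng(2)
    n, p, sigma = 2000, 400, 1.0
    gamma = p / n
    u = np.zeros(p)
    u[0] = 1.0                                   # u = e_1, one unit-norm choice

    for lam0 in [0.1, 0.3, np.sqrt(gamma), 1.0, 3.0]:   # SNR = lam0/sigma^2 around the threshold sqrt(gamma)
        # x_i = sigma * noise + sqrt(lam0) * g_i * u has covariance sigma^2 I_p + lam0 u u^T
        X = sigma * rng.standard_normal((n, p)) + np.sqrt(lam0) * rng.standard_normal((n, 1)) * u
        S = X.T @ X / n                                  # sample covariance S_n
        evals, evecs = np.linalg.eigh(S)
        lam, v = evals[-1], evecs[:, -1]
        print(f"lambda_0 = {lam0:.3f}: top eigenvalue = {lam:.3f} "
              f"(MP edge = {sigma**2 * (1 + np.sqrt(gamma))**2:.3f}), <u,v>^2 = {np.dot(u, v)**2:.3f}")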

4. Finite rank perturbations of random symmetric matrices: Wigner's semi-circle law (proved by Eugene Wigner in 1951) concerns the limiting distribution of the eigenvalues of random symmetric matrices. It states, for example, that the limiting eigenvalue distribution of n × n symmetric matrices whose entries w_ij on and above the diagonal (i ≤ j) are i.i.d. Gaussians N(0, 1/(4n)) (and the entries below the diagonal are determined by symmetrization, i.e., w_ji = w_ij) is the semi-circle

    p(t) = (2/π) √(1 − t²),   −1 ≤ t ≤ 1,

where the distribution is supported in the interval [−1, 1].

(a) Confirm Wigner's semi-circle law using MATLAB simulations (take, e.g., n = 400); a NumPy sketch follows below.

(b) Find the largest eigenvalue of a rank-1 perturbation of a Wigner matrix. That is, find the largest eigenvalue of the matrix

    W + λ_0 uu^T,

where W is an n × n random symmetric matrix as above, and u is some deterministic unit-norm vector. Determine the value of λ_0 for which a phase transition occurs. What is the correlation between the top eigenvector of W + λ_0 uu^T and the vector u as a function of λ_0? Use techniques similar to the ones we used in class for analyzing finite rank perturbations of sample covariance matrices.
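
A NumPy/Matplotlib sketch along the lines of the requested MATLAB simulation (n = 400 as suggested; the λ_0 levels and u = e_1 are illustrative choices):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(3)
    n = 400
    G = rng.normal(0.0, np.sqrt(1.0 / (4 * n)), size=(n, n))
    W = np.triu(G) + np.triu(G, 1).T      # symmetric; entries on/above the diagonal are i.i.d. N(0, 1/(4n))

    # (a) histogram of eigenvalues versus the semi-circle density on [-1, 1]
    evals = np.linalg.eigvalsh(W)
    t = np.linspace(-1, 1, 200)
    plt.hist(evals, bins=40, density=True, label="eigenvalues of W")
    plt.plot(t, (2 / np.pi) * np.sqrt(1 - t**2), label="semi-circle density")
    plt.legend()
    plt.show()

    # (b) rank-1 perturbation W + lambda_0 u u^T: watch the top eigenvalue detach from the bulk
    u = np.zeros(n)
    u[0] = 1.0
    for lam0 in [0.2, 0.5, 1.0, 2.0]:
        vals, vecs = np.linalg.eigh(W + lam0 * np.outer(u, u))
        print(f"lambda_0 = {lam0}: top eigenvalue = {vals[-1]:.3f}, "
              f"<u, v_top>^2 = {np.dot(u, vecs[:, -1])**2:.3f}")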

5. PCA experiments: Take any digit data ('0', ..., '9'), or all of them, from the website http://www-stat.stanford.edu/~tibs/elemstatlearn/datasets/zip.digits/ and perform PCA experiments with MATLAB or another language you are familiar with (a NumPy sketch of the steps follows below):

(a) Set up the data matrix X = (x_1, ..., x_n) ∈ R^{p×n};

(b) Compute the sample mean µ̂_n and form the centered matrix X̃ = X − µ̂_n e^T, where e = (1, ..., 1)^T ∈ R^n;

(c) Compute the top-k SVD of X̃: X̃ = U S_k V^T;

(d) Plot the eigenvalue curve, i.e., i vs. λ_i(Σ̂_n)/tr(Σ̂_n) (i = 1, ..., k), where λ_i are the top-k eigenvalues of the sample covariance matrix Σ̂_n = (1/n) X̃ X̃^T;

(e) Use imshow to visualize the mean and the top-k principal components, given by the left singular vectors U = [u_1, ..., u_k];

(f) For k = 1, sort the image data x_i (i = 1, ..., n) according to the top right singular vector v_1, in ascending order;

(g) For k = 2, make a scatter plot of (v_1, v_2) and select a grid on that plane to show the images located on the grid (e.g., Figure 14.23 in the book [ESL]: Elements of Statistical Learning).
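
A NumPy/Matplotlib sketch of steps (a)-(g); MATLAB works just as well. The file name train.3, the one-image-per-row 16×16 format, and k = 16 are assumptions about the downloaded data, so adjust the parsing to the actual files.

    import numpy as np
    import matplotlib.pyplot as plt

    # Assumes a local copy of, e.g., train.3 from the course URL, with one 16x16
    # image (256 gray values) per row; adjust the parsing if the format differs.
    X = np.loadtxt("train.3").T                   # (a) X in R^{p x n}, p = 256, columns are images
    p, n = X.shape

    mu = X.mean(axis=1)                           # (b) sample mean
    Xc = X - mu[:, None]                          #     centered data X_tilde

    k = 16
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)    # (c) SVD; keep the top k components
    lam = s**2 / n                                        # eigenvalues of Sigma_hat = (1/n) Xc Xc^T

    plt.plot(range(1, k + 1), lam[:k] / lam.sum(), "o-") # (d) eigenvalue curve
    plt.xlabel("i")
    plt.ylabel("lambda_i / trace")
    plt.show()

    fig, axes = plt.subplots(1, k + 1)                    # (e) mean image and top-k principal components
    for ax, img in zip(axes, [mu] + [U[:, i] for i in range(k)]):
        ax.imshow(img.reshape(16, 16), cmap="gray")
        ax.axis("off")
    plt.show()

    order = np.argsort(Vt[0])                             # (f) images sorted by the top right singular vector v_1
    print("ascending v_1 order (first 10 image indices):", order[:10])

    plt.scatter(Vt[0], Vt[1], s=5)                        # (g) scatter plot of (v_1, v_2)
    plt.xlabel("v_1")
    plt.ylabel("v_2")
    plt.show()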