Convex and Nonsmooth Optimization: Assignment Set # 5 Spring 2009 Professor: Michael Overton April 23, 2009

Eduardo Corona

1. Let p ≥ q, and let A be a full rank p × q matrix. Also, let A = UΣV^T be the SVD factorization of A. Finally, we define a (p+q) × (p+q) symmetric matrix by blocks:

B = [0, A^T; A, 0] = [0, VΣ^T U^T; UΣV^T, 0] = [V, 0; 0, U] [0, Σ^T; Σ, 0] [V^T, 0; 0, U^T].

Now, the matrix Σ is essentially a q × q diagonal matrix Σ̂ with p − q zero rows attached to it on the bottom, and Σ^T is this same matrix with p − q zero columns attached to its right. Hence, we can rewrite the middle factor as a three-by-three block matrix and diagonalize it by "rotating" part of the space by 90 degrees:

[0, Σ^T; Σ, 0] = [0, Σ̂, 0; Σ̂, 0, 0; 0, 0, 0] = Q [Σ̂, 0, 0; 0, −Σ̂, 0; 0, 0, 0] Q^T,  where  Q = (1/√2) [I_q, I_q, 0; I_q, −I_q, 0; 0, 0, √2 I_{p−q}],

so that

B = ([V, 0; 0, U] Q) [Σ̂, 0, 0; 0, −Σ̂, 0; 0, 0, 0] ([V, 0; 0, U] Q)^T.

It is then clear that B has 2q nonzero eigenvalues, namely ±σ_i, with σ_i the singular values of A. In the case where the rank r of A is less than q we can still do this, although now B has only 2r nonzero eigenvalues (since some of the σ_i are zero). Finally, we observe that in the case p = q, B has no zero eigenvalues (it is invertible).
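As a quick numerical sanity check of this eigenvalue structure (my own addition, not part of the submitted solution), the following MATLAB snippet builds B from a random full rank A and compares its spectrum with ±σ_i and the p − q expected zeros:

    % Sanity check: the eigenvalues of B = [0 A'; A 0] should be the
    % singular values of A with both signs, plus p - q zeros.
    p = 7; q = 4;
    A = randn(p, q);                      % full rank with probability 1
    B = [zeros(q), A'; A, zeros(p)];      % (p+q) x (p+q), symmetric
    expected = sort([svd(A); -svd(A); zeros(p - q, 1)]);
    disp(norm(sort(eig(B)) - expected))   % should be of the order of 1e-14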

2. In the Candès and Recht paper, it is shown that the low rank matrix completion problem can be cast as a nuclear norm minimization problem, which in turn is an SDP. Here X, M ∈ R^{q×q}:

min trace(X)
subject to X_ij = M_ij, (i,j) ∈ Ω,
           X ⪰ 0.

For a general matrix M ∈ R^{p×q}, we introduce the auxiliary matrices W_1 ∈ R^{p×p}, W_2 ∈ R^{q×q} to obtain an alternative formulation of this as an SDP:

min (1/2)(trace(W_1) + trace(W_2))
subject to X_ij = M_ij, (i,j) ∈ Ω,
           [W_1, X; X^T, W_2] ⪰ 0.

Now, given any X satisfying X_ij = M_ij, we set W_1 = I_p, W_2 = I_q. The semidefinite constraint then reads

[I_p, X; X^T, I_q] = I + [0, X; X^T, 0] ⪰ 0.

And from problem 1 (making X = A^T), this constraint is equivalent to I + B ⪰ 0. However, we know that the nonzero eigenvalues of B are ±σ_i, where σ_i are the singular values of X. Thus, this constraint is equivalent to asking σ_i ≤ 1 for all i, that is, max{σ_i} = ‖X‖_2 ≤ 1. Hence, the best upper bound we can come up with for (1/2)(trace(W_1) + trace(W_2)) is (p+q)/2: if ‖X‖_2 ≤ 1, then clearly the pair W_1 = I_p, W_2 = I_q is feasible and attains that value.

3. Let y collect vec(W_1), vec(W_2) and the free entries x_ij, (i,j) ∈ Ω^c, of X, and set b_i = −1/2 if y_i is a diagonal entry of W_1 or W_2, zero otherwise. Also, let E_ij denote the matrix with a 1 in the (i,j)-th entry and zeros everywhere else. This SDP in standard dual form then looks like

max b^T y
subject to Σ_{i,j=1}^{p} w^1_{ij} E^1_{ij} + Σ_{i,j=1}^{q} w^2_{ij} E^2_{ij} + Σ_{(i,j)∈Ω^c} x_{ij} E^X_{ij} + [0, M̃; M̃^T, 0] = Z,  Z ⪰ 0,

where E^1_{ij}, E^2_{ij} and E^X_{ij} = (E_{i,p+j} + E_{p+j,i})/2 are the elementary matrices embedded in the W_1 block, the W_2 block and (symmetrized) the off-diagonal blocks of the (p+q) × (p+q) matrix, and M̃ is the matrix with the entries of M on Ω (and 0 otherwise). And of course, we can rewrite this completely in terms of the y's, renaming A_i the corresponding matrix in front of each variable (bringing the constraint to the standard form Σ_i y_i A_i + Z = C gives A_i equal to minus the corresponding elementary matrix and C = [0, M̃; M̃^T, 0]). Hence, the number of dual variables is m = p² + q² + card(Ω^c). Alternatively, we can also write this problem in the primal standard form:

min (1/2) trace(Z)
subject to ⟨(E_{i,p+j} + E_{p+j,i})/2, Z⟩ = M_ij, (i,j) ∈ Ω,
           Z ⪰ 0.

The Schur complement matrix is an m × m matrix with entries

S_kl = tr(P A_k Z^{-1} A_l),

where A_k is one of the block matrices used in the problem formulation. Hence, to construct it we have to find m² of these entries, which in turn involves computing the products P A_k Z^{-1} A_l. The A_k are often sparse (in fact, in our example they only have one or two nonzero entries); however, P and Z^{-1} are in general full matrices in R^{(p+q)×(p+q)}. In the SDP paper provided (Alizadeh-Haeberly-Overton), the authors argue that the most expensive operation in the XZ method is the construction of this matrix, and provide a complexity bound of O(mn³ + m²n²) work. For our problem, n = p + q; written in the dual standard form, m = p² + q² + card(Ω^c), whereas in the primal standard form, m = card(Ω). Hence, it is computationally cheaper to use the primal form.

How do we get this bound? A way to go about inverting Z is to compute its Cholesky factor, which takes O(n³) work; back-substitution with L then costs O(n²) per right-hand side (or O(n³) if L^{-1} is formed naively). In any case, this is done once, so it does not enter our calculations. Next, we have m products of the form P A_k and Z^{-1} A_l, and their computational cost depends on the sparsity structure of the A_k. In general one does not know this structure, and so if the number of nonzero entries is roughly comparable with n, one gets a bound of 2mn³ work. However, in our case the matrices A_k have one or two nonzero entries regardless of p and q, and we only need to multiply one or two rows; done smartly, this can be reduced to 4m(p+q) work. Finally, for each of the m² entries we need to compute the trace of the matrix (P A_k)(Z^{-1} A_l), which in general takes m²n² work (since we only need the diagonal entries of this product). However, in our case these factors have only one or two nonzero columns, so this is reduced to 4m²(p+q) work. Overall, constructing the Schur complement matrix takes O((p+q)³ + m²(p+q)) work, and computing the Cholesky factor of the Schur complement matrix takes O(m³) work. For the dual problem, m = O((p+q)²) (it has at most p² + q² + pq variables), and so constructing the Schur complement matrix is more expensive. However, here we are using too many variables. Using the primal form of this problem, m = O(pq), and so both operations take comparably the same amount of work.
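To make the sparsity argument concrete, here is a small MATLAB sketch (my own, with a hypothetical helper name) of how a single entry S_kl = tr(P A_k Z^{-1} A_l) collapses to a handful of scalar products once Y = Z^{-1} is available, under the assumption that A_k = (E_ij + E_ji)/2 and A_l = (E_rs + E_sr)/2:

    % Hypothetical helper: one entry of the Schur complement matrix when the
    % constraint matrices are symmetrized elementary matrices.  Using
    % tr(P*E_ij*Y*E_rs) = Y(j,r)*P(s,i), the trace reduces to four products
    % of individual entries of P and Y = inv(Z).
    function val = schur_entry(P, Y, i, j, r, s)
        val = 0.25 * ( Y(j,r)*P(s,i) + Y(j,s)*P(r,i) ...
                     + Y(i,r)*P(s,j) + Y(i,s)*P(r,j) );
    end

Under this assumption each entry needs only a few entries of P and Z^{-1}, which is consistent with the observation above that the sparsity of the A_k makes forming S far cheaper than the generic O(m²n²) bound.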

4. If this is a dual problem in its standard form, the corresponding primal problem would be

min ⟨[0, M̃; M̃^T, 0], P⟩
subject to ⟨A_i, P⟩ = b_i, i = 1, ..., m,
           P ⪰ 0.

The constraints corresponding to the entries of W_1 and W_2 tell us that

p_ii = 1/2,   p_ij = 0 for all i ≠ j with i, j ∈ {1, ..., p} or i, j ∈ {p+1, ..., p+q}.

That is, the variable in our primal problem looks like

P = [(1/2) I_p, Q; Q^T, (1/2) I_q].

Finally, taking into account the constraints corresponding to X, we conclude that q_ij = 0 for all (i,j) ∈ Ω^c.

Matlab Programming Assignment: CVX and the Matrix Completion Problem

5. I wrote a Matlab function [X,error] = rankSDP(M,num) to solve the corresponding SDP using CVX. Using this package, I wrote the optimization problem, declaring the variables W_1, W_2 to be symmetric and using the SDP notation (sketched below). The program grabs num entries from the matrix M (which are drawn randomly from a complete list), and returns the completed matrix and the error (which is taken to be the norm of the difference between X and M). For each of the three matrices in Xdata.mat, I ran 50 tests, in which I randomly added one or two entries at a time to the constraints and re-ran the optimization program. For each case, I observed the proportion of matrices completed vs. the number of entries provided (where a matrix is considered to be completed if the norm of the error falls under a certain tolerance), and I also plotted a histogram of the minimum number of entries needed to complete the matrix.
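For reference, here is a minimal CVX sketch (my own, illustrative) of the kind of model a function like rankSDP sets up; the index vector omega holding the revealed entries of M is an assumption about the interface, not the exact code that was submitted:

    % Minimal CVX sketch of the matrix completion SDP; omega is assumed to be
    % a vector of linear indices of the revealed entries of M.
    [p, q] = size(M);
    cvx_begin sdp quiet
        variable X(p, q)
        variable W1(p, p) symmetric
        variable W2(q, q) symmetric
        minimize( 0.5 * (trace(W1) + trace(W2)) )
        subject to
            [W1, X; X', W2] >= 0;     % semidefinite constraint (SDP mode)
            X(omega) == M(omega);     % match the revealed entries
    cvx_end
    err = norm(X - M, 'fro');         % one reasonable choice of error norm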

[Figure: proportion of matrices completed vs. number of entries provided, experiments with X1.]

[Figure: histogram of the number of entries needed to complete X1 (out of 25).]

[Figure: proportion of matrices completed vs. number of entries provided, experiments with X2.]

[Figure: histogram of the number of entries needed to complete X2 (out of 50).]

[Figure: proportion of matrices completed vs. number of entries provided, experiments with X3.]

[Figure: histogram of the number of entries needed to complete X3 (out of 80).]

We observe that, as we provide more and more entries, there is a minimum number of entries after which the algorithm starts completing the matrix in some of the experiments; eventually, all matrices are completed. The mean number of entries needed in each case is 16, 41 and 68, which corresponds to a proportion of 64%, 82% and 85% of the entries. This seems contradictory; however, these matrices are distinct and have a very small number of entries. To really observe the behaviour of this algorithm, we need to run a more systematic set of experiments like those proposed in problem 6.

6. Now, I wrote a Matlab function to run this experiment, [num,e,t] = rankexp(q,r,m,tol), which generates m random experiments of order q and rank r (in this case, I only use it for rank 3). These matrices are randomly generated by obtaining a random matrix V ∈ R^{q×3} and computing VV^T; with overwhelming probability the result is a matrix of rank r (and in any case, its rank is at most r). A short sketch of this generation step appears after the table below.

Experiment: I ran experiments on the CIMS number crunching servers (Solaris, 6GB RAM) for q = 20, 30, 40, 50 and 100, increasing the number of entries provided by 2% of the total number of entries each time. For these experiments, the mean minimum number of entries required to complete the matrix was:

    q     proportion    entries / total
    20       72%           288 / 400
    30       62%           558 / 900
    40       51%           816 / 1600
    50       44%          1100 / 2500
   100       28%          2800 / 10000
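The generation step and the 2% increments can be sketched as follows (my own illustration; completeSDP is a hypothetical stand-in for the CVX model of problem 5 restricted to a given index set):

    % Sketch of one rank-3 experiment: build a random rank-3 test matrix and
    % reveal 2% more of its entries at a time until the SDP completes it.
    q = 20; r = 3; tol = 1e-6;
    V = randn(q, r);
    Xtrue = V * V';                   % rank r with overwhelming probability
    step = ceil(0.02 * q^2);          % 2% of the q^2 entries per round
    perm = randperm(q^2);             % fixed random order of the entries
    num = 0; err = Inf;
    while err > tol && num < q^2
        num = min(num + step, q^2);
        % hypothetical helper: solve the SDP of problem 5 with the entries
        % indexed by perm(1:num) revealed
        [Xhat, err] = completeSDP(Xtrue, perm(1:num));
    end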

[Figure: number of entries needed vs. matrix size, plotted against q (order of X).]

The mean running times were:

    q     Total (s)
    20       .375
    30       .628
    40       .93
    50       .258
   100      5.64

[Figure: average time (minutes) vs. size q of the completed matrix.]

I tried to run more experiments (for example, for q > 100, or trying to increase the number of entries by a smaller percentage of q²), but the runs took too long. However, based on the observations I have, I think a good upper bound for a q such that the mean completion time is bigger than 5 is q = 100. In computing running time, we observe the trade-off between the increased efficiency of the matrix completion algorithm (which has a theoretical bound of n^{6/5} log(n) entries needed, so the theoretical bound on the proportion of entries needed should go like n^{−4/5} log(n)) and the increase in size and complexity of the problem as q grows. Also, we note that although the overall running time increases, the average time (per SDP problem solved) decreases with matrix size; it is most likely that for larger matrix sizes this behaviour is reversed.

7. The ratio test algorithm to find s̄ is very simple. We have

x + sΔx > 0  ⟺  1 + s (Δx)_i / x_i > 0 for every component i.

Now, as was stated in the problem, for a particular component x_i, if (Δx)_i ≥ 0 then this entry stays positive for all s ≥ 0; otherwise, s < −x_i/(Δx)_i ensures positivity. Hence, we can take t̄ to be

t̄ = min_{ i : (Δx)_i < 0 } { −x_i / (Δx)_i }

if there are any negative entries of Δx, and t̄ = ∞ otherwise.
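A direct MATLAB transcription of this ratio test (my own sketch, not the assignment's reference code) could look like:

    % Ratio test sketch: largest t such that x + s*dx > 0 for all s in [0, t).
    function tbar = ratio_test(x, dx)
        neg = dx < 0;                   % only decreasing components block the step
        if any(neg)
            tbar = min(-x(neg) ./ dx(neg));
        else
            tbar = Inf;                 % the step is never blocked
        end
    end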

8. For SDPs we can do a similar test. First, we obtain the Cholesky factorization of X, X = LL^T. Inverting this triangular factor, we obtain the following equivalence:

X + sΔX ⪰ 0  ⟺  I + s L^{−1} ΔX (L^{−1})^T ⪰ 0.

Now, we can just perform the exact same test on the eigenvalues of L^{−1} ΔX (L^{−1})^T. That is, we can take t̄ to be

t̄ = min { −1/λ : λ an eigenvalue of L^{−1} ΔX (L^{−1})^T, λ < 0 }

if L^{−1} ΔX (L^{−1})^T has any negative eigenvalues, and t̄ = ∞ otherwise.
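A matching sketch of this semidefinite ratio test (again my own, assuming X is positive definite so that the Cholesky factor exists):

    % Semidefinite ratio test sketch: with X = L*L', the vector test is
    % applied to the eigenvalues of L^{-1} * dX * L^{-T}.
    function tbar = sdp_ratio_test(X, dX)
        L = chol(X, 'lower');           % X is assumed positive definite
        W = L \ dX / L';                % L^{-1} * dX * L^{-T}
        lam = eig((W + W') / 2);        % symmetrize for numerical safety
        neg = lam < 0;
        if any(neg)
            tbar = min(-1 ./ lam(neg)); % equals -1 / lambda_min
        else
            tbar = Inf;
        end
    end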