CHAPTER 7. Regression


This chapter presents an extended example, illustrating and extending many of the concepts introduced over the past three chapters. Perhaps the best-known multi-variate optimisation problem is linear least squares. In general, it can be seen as an extension of solving linear systems to matrices that are not square. There, for $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, the system is over-determined when $m > n$ and under-determined when $m < n$. In the former case, there is in general no exact solution; in the latter, there is no unique one. One hence relaxes $Ax = b$ into an unconstrained optimisation problem, so as to guarantee the existence of a solution, and adds additional terms to the objective, so as to guarantee certain properties of the solution. Key concepts to illustrate include:

- Linear least squares seek $\arg\min_{x \in \mathbb{R}^n} \|Ax - b\|_2$, where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$.
- Key decompositions in solving the problem include:
  - QRD: takes an $m \times n$ matrix $A$, $m \ge n$, and produces an $m \times m$ orthogonal matrix $Q$, i.e., $Q^T Q = Q Q^T = I$, and an $m \times n$ upper-triangular matrix $R$.
  - SVD: takes an $m \times n$ matrix $A$ and produces an $m \times m$ matrix $U$, an $m \times n$ matrix $\Sigma$ with $\Sigma_{i,i} \ge 0$ being the singular values of $A$, and an $n \times n$ matrix $V^T$.
- Sparse least squares approximate the constraint $\|x\|_0 \le c$ by including a term $d\|x\|_1$ in the objective. This can be seen as a regularisation. Such regularisations have an intimate connection to perturbation analysis.
- When one wants to optimise with such non-smooth regularisations, one has to consider:
  - A subderivative of a function $f : \mathbb{R} \to \mathbb{R}$ at a point $y \in \mathbb{R}$ is a real number $c$ such that $f(x) - f(y) \ge c(x - y)$ for all $x \in \mathbb{R}$. (This generalises to higher dimensions, but we will not need that.)
  - The subdifferential of the function $f$ at $y$ is the set of all subderivatives of $f$ at $y$.

1. Linear Least Squares

In regression, we consider a model $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ of a system, which translates an independent $n$-dimensional variable ("inputs") into a scalar dependent variable ("output"), using an $m$-dimensional adjustable parameter, while introducing some noise. Given a number of observations of the dependent and independent variables of a fixed model, the goal is to estimate the adjustable parameters. Let us assume that the model is known and linear, e.g., $f(x) := A_i x + \epsilon =: b_i$ for a row vector $A_i$. Then, under standard assumptions (zero-mean noise $\epsilon$, etc.), the best linear unbiased estimator of $x$ is $\arg\min_{x \in \mathbb{R}^n} \|b - Ax\|_2$; cf. the Gauss-Markov theorem. Expanding the objective:

$\|b - Ax\|_2^2 = \sum_{i=1}^m \big( b_i - \sum_{j=1}^n A_{ij} x_j \big)^2$   (7.1)
$= (b - Ax)^T (b - Ax)$   (7.2)
$= b^T b - x^T A^T b - b^T A x + x^T A^T A x$   (7.3)
$= b^T b - 2 x^T A^T b + x^T A^T A x,$   (7.4)

where, in order to obtain (7.4) from (7.3), note that $(x^T A^T b)^T = b^T A x$ is a scalar and hence $x^T A^T b = b^T A x$. Notice that this objective is convex and smooth. Its partial derivative with respect to $x_j$, for $j = 1, 2, \ldots, n$, is:

$\frac{\partial \|b - Ax\|_2^2}{\partial x_j} = \sum_{i=1}^m \frac{\partial}{\partial x_j} \big( b_i - \sum_{k=1}^n A_{ik} x_k \big)^2$   (7.5)
$= 2 \sum_{i=1}^m \big( b_i - \sum_{k=1}^n A_{ik} x_k \big) \big( -A_{ij} \big).$   (7.6)
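Before turning to the optimality conditions, the following minimal sketch (in Python with NumPy; the data $A$, $b$ and the point $x$ are made-up) evaluates the objective (7.4) and the gradient (7.6), and checks the latter against a finite-difference approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))   # made-up data matrix
b = rng.standard_normal(m)        # made-up right-hand side
x = rng.standard_normal(n)        # an arbitrary point, not the minimiser

def objective(x):
    # ||b - Ax||_2^2, as in (7.1)-(7.4)
    r = b - A @ x
    return r @ r

def gradient(x):
    # (7.6): the j-th entry is 2 * sum_i (b_i - A_i x) * (-A_ij), i.e. -2 A^T (b - Ax)
    return -2.0 * A.T @ (b - A @ x)

# check the analytic gradient against central finite differences
eps = 1e-6
fd = np.array([(objective(x + eps * e) - objective(x - eps * e)) / (2 * eps)
               for e in np.eye(n)])
print(np.allclose(gradient(x), fd, atol=1e-5))   # expected: True
```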

The conditions of first-order optimality for $j = 1, 2, \ldots, n$ hence are:

$\sum_{i=1}^m \big( b_i - \sum_{k=1}^n A_{ik} \hat x_k \big) \big( -A_{ij} \big) = 0$, or, equivalently,   (7.7)

$\sum_{i=1}^m A_{ij} \sum_{k=1}^n A_{ik} \hat x_k = \sum_{i=1}^m A_{ij} b_i$,   (7.8)

which in matrix notation, $(A^T A)\hat x = A^T b$, is known as the normal equation. Second-order conditions involve $A$ being of full column rank, which is equivalent to $A^T A$ being positive definite.

2. Linear-Algebraic Methods

Linear least squares is a very special case of multi-variate optimisation, studied since the times of Gauss and Legendre. There are a variety of specialised approaches; we will present a brief overview of four approaches developed in linear algebra over the past two centuries. For a complete treatment, see Björck [1996].

First, notice that the solution of the normal equation is $(A^T A)^{-1} A^T b = A^\dagger b$, where $A^\dagger$ is the Moore-Penrose pseudoinverse of $A$. If $A^T A$ is positive definite (and hence full-rank), one can use the Cholesky decomposition $A^T A = R^T R$, where $R$ is upper-triangular. A more stable approach does not form $A^T A$, but considers the so-called (thin) QR decomposition $A = QR$, where $Q$ is an $m \times n$ matrix with orthonormal columns, i.e., $Q^T Q = I$, and $R$ is an $n \times n$ upper-triangular matrix with $r_{ii} > 0$.

2.1. The Pseudo-Inverse. The Moore-Penrose pseudo-inverse $A^\dagger$ is a generalisation of the inverse ($A^{-1} A = I$) that satisfies:

- $A A^\dagger A = A$ ("maps all column vectors to themselves"),
- $A^\dagger A A^\dagger = A^\dagger$ ("weak multiplicative inverse"),
- $(A A^\dagger)^T = A A^\dagger$,
- $(A^\dagger A)^T = A^\dagger A$.

In practice, it can be computed using the SVD. The issue with the pseudo-inverse is captured in the following:

Theorem 7.1 (1.4.1 in Björck [1996]). If $\mathrm{rank}(A + E) \neq \mathrm{rank}(A)$, then $\|(A + E)^\dagger - A^\dagger\|_2 \ge 1 / \|E\|_2$.

Example 7.2 (1.4.1 in Björck [1996]). Consider:

$A = \begin{bmatrix} \sigma & 0 \\ 0 & 0 \end{bmatrix}$, $A^\dagger = \begin{bmatrix} \sigma^{-1} & 0 \\ 0 & 0 \end{bmatrix}$, $E = \begin{bmatrix} 0 & 0 \\ 0 & \epsilon \end{bmatrix}$, $(A + E)^\dagger = \begin{bmatrix} \sigma^{-1} & 0 \\ 0 & \epsilon^{-1} \end{bmatrix}$,

where $\|(A + E)^\dagger - A^\dagger\|_2 = \epsilon^{-1} = 1/\|E\|_2$.

2.2. QRD. Alternatively, one can consider the QR decomposition of an $m \times n$ matrix $A$, $m \ge n$, into $QR$, where

- $Q$ is an $m \times m$ orthogonal matrix, i.e., $Q^T Q = Q Q^T = I$, or $Q^T = Q^{-1}$;
- $R$ is an $m \times n$ upper-triangular matrix with the bottom $(m - n)$ rows equal to 0.

One solves the least squares problem by solving $R x = Q^T b$ (restricted to its top $n$ rows), which is easy, considering $R$ is upper-triangular.

2.3. SVD. Alternatively, one can consider the singular value decomposition (SVD) of an $m \times n$ matrix $A$ into $U \Sigma V^T$, where

- $U$ is an $m \times m$ matrix whose $m$ columns are left-singular vectors of $A$;
- $\Sigma$ is an $m \times n$ matrix with $\Sigma_{i,i} \ge 0$ being the singular values of $A$ and all elements outside of the main diagonal equal to 0;
- $V$ is an $n \times n$ matrix whose $n$ columns are right-singular vectors of $A$.

We will see much more of the SVD in the next chapter. Once you have $A = U \Sigma V^T$, solving least squares amounts to $x = V \Sigma^\dagger U^T b$, where the diagonal matrix $\Sigma$ is easy to (pseudo-)invert. Using the SVD also makes it possible to use the truncated SVD, i.e., to ignore small singular values, whose inversion would create numerical issues.
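As a sanity check on these solution routes, the sketch below (Python with NumPy; the data are made-up) solves the same over-determined system via the normal equations, the thin QR decomposition, and the SVD, and confirms that all three agree with numpy.linalg.lstsq.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 4
A = rng.standard_normal((m, n))   # made-up data, full column rank with probability 1
b = rng.standard_normal(m)

# Normal equations: (A^T A) x = A^T b (fine here, but squares the condition number)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Thin QR: A = QR with Q of size m x n, then solve the triangular system R x = Q^T b
Q, R = np.linalg.qr(A, mode="reduced")
x_qr = np.linalg.solve(R, Q.T @ b)

# SVD: x = V Sigma^+ U^T b
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / s)

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_normal, x_ref), np.allclose(x_qr, x_ref), np.allclose(x_svd, x_ref))
```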

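The discontinuity of the pseudo-inverse described in Theorem 7.1 and Example 7.2 is also easy to observe numerically; a minimal sketch (NumPy; the values of $\sigma$ and $\epsilon$ are made-up):

```python
import numpy as np

sigma, eps = 1.0, 1e-8                  # made-up values for Example 7.2
A = np.diag([sigma, 0.0])               # rank 1
E = np.diag([0.0, eps])                 # tiny perturbation that increases the rank

gap = np.linalg.norm(np.linalg.pinv(A + E) - np.linalg.pinv(A), 2)
print(gap, 1.0 / np.linalg.norm(E, 2))  # both are about 1e8: the pseudo-inverse jumps
```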
3. The Condition Number

Considering that solving linear systems (with full-rank $A$) is a special case of linear least squares, the analysis will seem familiar.

Definition 7.3. The condition number of a matrix $A \in \mathbb{R}^{m \times n}$ with respect to linear least squares is

$\mathrm{cond}(A) = \|A\|_2 \, \|A^\dagger\|_2 = \sigma_1 / \sigma_r$,

where $0 < r = \mathrm{rank}(A)$ and $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r > 0$ are the non-zero singular values of $A$.

One can also introduce a component-wise condition number, sometimes called Bauer-Skeel. Let $A$ be perturbed by $\delta A$ and let the right-hand side $b$ be perturbed by $\delta b$. For

$\arg\min_{x \in \mathbb{R}^n} \|(A + \delta A)(x + \delta x) - (b + \delta b)\|$,   (7.9)

we want to bound $\delta x$ as a function of $\delta b$ or $\delta A$. With basic algebra, cf. Björck [1996], one obtains, to first order in the perturbation:

$\|\delta x\|_2 \le \frac{1}{\sigma_n} \big( \|\delta b\|_2 + \|\delta A\|_2 \, \|x\|_2 \big) + \frac{1}{\sigma_n^2} \, \|\delta A\|_2 \, \|b - Ax\|_2$,   (7.10)

where $1/\sigma_n = \mathrm{cond}(A) / \|A\|_2$, which makes the second summand $O(\mathrm{cond}^2(A))$. When the perturbation does not change the rank (an "acute" perturbation), the behaviour is better, but still rather awful. How would you fix this?

4. Regularisations

In many applications of least squares, one hopes to immunise the solution against such perturbations. In many applications, one also hopes to find a solution with only a few non-zero elements of $x$, which would improve interpretability. For example, if we are looking for genes affecting the height of a person, it may be reasonable to assume that there is a small number of genes involved.^1 The use of the number of non-zero elements, $\|x\|_0$, in the objective function or constraints makes the problem NP-Hard [Natarajan, 1995]. The compressed sensing movement hence uses regularisations, which sometimes come with very interesting probabilistic analyses, and which address both the immunisation and the sparsity issues. The best-known regularisations include:

- $\ell_0$ regularisation: $\|Ax - b\|_2 + \beta \|x\|_0$: non-convex, NP-Hard;
- $\ell_1$ regularisation: $\|Ax - b\|_2 + \beta \|x\|_1$: non-smooth, convex;
- $\ell_2$ regularisation: $\|Ax - b\|_2 + \alpha \|x\|_2$: smooth, convex;
- Tikhonov regularisation: $\|Ax - b\|_2 + \|\Gamma x\|_2$ for some matrix $\Gamma$: smooth, convex.

Notice that $\ell_2$ regularisation is Tikhonov regularisation with $\Gamma = \alpha I$; a numerical illustration of the immunising effect follows below. As it turns out, there is a close connection between regularisation and perturbation analysis:

Theorem 7.4 (Bertsimas and Copenhaver [2014]). Let $\Upsilon \subseteq \mathbb{R}^{m \times n}$ be any non-empty, compact set and $g : \mathbb{R}^m \to \mathbb{R}$ a seminorm. Then there exists some seminorm $h : \mathbb{R}^n \to \mathbb{R}$ so that for any $z \in \mathbb{R}^m$, $b \in \mathbb{R}^n$,

$\max_{\Delta \in \Upsilon} g(z + \Delta b) \le g(z) + h(b)$,

with equality when $z = 0$, where $\Upsilon \subseteq \mathbb{R}^{m \times n}$ is $\Upsilon = \{ \Delta : \|\Delta\| \le \lambda \}$ and $\|\cdot\|$ is a norm whose dual satisfies: there exist norms $\phi$, $\psi$ so that $\|u v^T\|^* = \phi(u) \, \psi(v)$ for all $u \in \mathbb{R}^m$, $v \in \mathbb{R}^n$.

N.B. Seminorms generalise norms, but allow zero length to be assigned to some non-zero vectors.

Exercise 7.5. A table in the lecture notes summarises the results of the paper of Bertsimas and Copenhaver [2014]. Download the paper and try to understand what it says.

^1 Genetic Investigation of Anthropometric Traits, a consortium, has recently identified 423 genetic regions connected to height, which explain 60% of the genetic component.
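To illustrate the immunising effect of $\ell_2$/Tikhonov regularisation on an ill-conditioned problem, the following sketch (Python with NumPy; the singular values, the perturbation size, and the weight $\alpha$ are made-up, and the standard squared form of the regularised objective is assumed) compares how the plain and the regularised solutions react to a small perturbation of $b$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 30, 5
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.array([1.0, 0.5, 0.1, 1e-5, 1e-6])        # made-up singular values: ill-conditioned
A = U @ np.diag(s) @ V.T
b = A @ rng.standard_normal(n)
db = 1e-6 * rng.standard_normal(m)               # small perturbation of the right-hand side

def solve_ls(rhs):
    # plain normal equations (A^T A) x = A^T rhs
    return np.linalg.solve(A.T @ A, A.T @ rhs)

def solve_ridge(rhs, alpha=1e-3):
    # Tikhonov with Gamma = alpha*I: minimise ||Ax - rhs||^2 + alpha^2 ||x||^2
    return np.linalg.solve(A.T @ A + alpha**2 * np.eye(n), A.T @ rhs)

print(np.linalg.cond(A))                                     # about 1e6
print(np.linalg.norm(solve_ls(b + db) - solve_ls(b)))        # perturbation amplified
print(np.linalg.norm(solve_ridge(b + db) - solve_ridge(b)))  # perturbation damped
```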

Table: A summary of equivalencies for robustification with uncertainty set $\Upsilon$ and regularisation with penalty $h$, cited verbatim from Bertsimas and Copenhaver [2014]. Equivalence means that for all $z \in \mathbb{R}^m$ and $b \in \mathbb{R}^n$, $\max_{\Delta \in \Upsilon} g(z + \Delta b) = g(z) + h(b)$, where $g$ is the loss function. Throughout, $p, q \in [1, \infty]$ and $m \ge 2$; $\delta_i$ denotes the $i$th row of $\Delta$; and $h$ is as in the theorem. For any remaining definitions, please see Bertsimas and Copenhaver [2014].

| Loss function $g$ | Uncertainty set $\Upsilon \subseteq \mathbb{R}^{m \times n}$ | $h(b)$ | Equivalence if and only if |
| seminorm $g$ | $\Upsilon_{(h,g)}$ ($h$ a norm) | $\lambda h(b)$ | always |
| $\ell_p$ | $\Upsilon_{\sigma_q}$ | $\lambda \, \delta_m(p, 2) \, \|b\|_2$ | $p \in \{1, 2, \infty\}$ |
| $\ell_p$ | $\Upsilon_{F_q}$ | $\lambda \, \delta_m(p, q) \, \|b\|_q$ | $p = q$ or $p \in \{1, \infty\}$ |
| $\ell_p$ | $\Upsilon_{(q,r)}$ | $\lambda \, \delta_m(p, r) \, \|b\|_q$ | $p = r$ or $p \in \{1, \infty\}$ |
| $\ell_p$ | $\{\Delta : \|\delta_i\|_q \le \lambda \ \forall i\} = \Upsilon_{(q,\infty)}$ | $\lambda \, m^{1/p} \, \|b\|_q$ | $p \in \{1, \infty\}$ |

5. Sub-Gradient Methods

A subderivative of a function $f : \mathbb{R} \to \mathbb{R}$ at a point $y$ is a real number $c$ such that $f(x) - f(y) \ge c(x - y)$ for all $x \in \mathbb{R}$. The set of all subderivatives of $f$ at a point $y$ is called the subdifferential of the function $f$ at $y$. For a convex function $f$, the subdifferential is a non-empty closed interval $[a, b]$, where $a$ and $b$ are the one-sided limits

$a = \lim_{x \to y^-} \frac{f(x) - f(y)}{x - y}$, $\quad b = \lim_{x \to y^+} \frac{f(x) - f(y)}{x - y}$,

which are guaranteed to exist and satisfy $a \le b$. A point $x$ is a global minimum of a convex function $f$ if and only if zero is contained in the subdifferential at $x$.

Example 7.6. Consider $f(x) = |x|$, which is convex. Its subdifferential at 0 is $[-1, 1]$. The subdifferential at positive $y$ is the singleton set $\{1\}$; the subdifferential at negative $y$ is the singleton set $\{-1\}$. These should be seen as the slopes of subtangent lines. Contrast this with $f(x) = x$, which is differentiable: its subdifferential is a singleton at any point, and there is hence a single tangent line at any $x \in \mathbb{R}$.

$\ell_1$ regularisation can be seen as an optimisation problem of the form

$\min_{x \in \mathbb{R}^n} f(x) + \psi(x)$,   (7.11)

where $f$ is a smooth, convex and partially block-separable function, and $\psi$ is non-smooth and convex. This is typically solved using sub-gradient methods. These can be shown to have a rate of convergence of $O(\log(1/\epsilon))$ for strongly convex $F$ and $O(1/\epsilon)$ for general convex $F$, e.g., Mareček et al. [2015].

In coordinate descent for minimisation, each iteration $k$ is associated with a coordinate $1 \le i \le n$, and one updates only the $i$th coordinate,

$x^{k+1} = x^k + e_i \, h^k(x^k)$,

where $e_i$ is the vector with 1 at position $i$ and 0 elsewhere, and the step-length is

$h^k(x^k) \in \arg\min_{t \in \mathbb{R}} \ \langle \nabla_i f(x^k), t \rangle + \frac{c}{2} t^2 + \psi_i(x^k_i + t)$

for a suitable constant $c > 0$. This requires $\nabla_i f(x) = -\sum_{j=1}^m A_{j,i} (b_j - A_{j:} x)$, where $A_{j:}$ denotes the $j$-th row of the matrix $A$. Once we know how to compute $\nabla_i f(x)$, all that remains is the 1D problem

$\min_{t \in \mathbb{R}} \ a + b t + \frac{c}{2} t^2 + \lambda |d + t|$,   (7.12)

where $a, b, d \in \mathbb{R}$ and $c, \lambda \in \mathbb{R}_{>0}$. There exists a closed-form solution, known as the soft-thresholding formula:

$t^* = \mathrm{sgn}(\zeta) \big( |\zeta| - \tfrac{\lambda}{c} \big)_+ - d$, where $\zeta = d - \tfrac{b}{c}$ and $(\cdot)_+$ takes the positive part.

When carefully implemented, e.g., Mareček et al. [2015], on a machine with 4,096 hardware threads, such a coordinate descent has been used to solve artificial instances of sparse least squares with a matrix of $n = 10^9$ rows in block-angular form, which requires 3 TB to store. Such matrices often arise in stochastic optimisation.

6. Integer-Programming Methods

Consider the problem of "best subset selection": $\min_x \|Ax - b\|_q$ such that $\|x\|_0 \le k$, for $q \in \{1, 2\}$. This is an NP-Hard mixed-integer programming problem. Within the so-called compressed sensing community, a number of heuristics for it have been developed.
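The soft-thresholding step and the coordinate descent loop around it can be sketched as follows (Python with NumPy; the data, the weight $\lambda$, and the choice $f(x) = \tfrac{1}{2}\|b - Ax\|_2^2$ with per-coordinate constant $c_i = \|A_{:,i}\|_2^2$ are assumptions made for the sake of a small, self-contained example).

```python
import numpy as np

def soft_threshold_step(b_coef, c, d, lam):
    # closed-form minimiser of  b_coef*t + (c/2)*t^2 + lam*|d + t|  over t, as in (7.12)
    zeta = d - b_coef / c
    return np.sign(zeta) * max(abs(zeta) - lam / c, 0.0) - d

def lasso_coordinate_descent(A, b, lam, iters=200):
    m, n = A.shape
    x = np.zeros(n)
    r = b - A @ x                      # maintain the residual b - Ax
    c = (A ** 2).sum(axis=0)           # per-coordinate curvature ||A_{:,i}||^2
    for _ in range(iters):
        for i in range(n):
            g = -A[:, i] @ r           # grad_i of f(x) = 0.5*||b - Ax||^2
            t = soft_threshold_step(g, c[i], x[i], lam)
            x[i] += t
            r -= t * A[:, i]           # update the residual after the coordinate move
    return x

# tiny made-up instance with a sparse ground truth
rng = np.random.default_rng(3)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10); x_true[[2, 7]] = [1.5, -2.0]
b = A @ x_true + 0.01 * rng.standard_normal(50)
print(np.round(lasso_coordinate_descent(A, b, lam=0.5), 2))
```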

Example 7.7 (Blumensath and Davies [2009]). Blumensath and Davies [2009] studied Iterative Hard Thresholding (IHT). Starting with $x^0 = 0$, one applies

$x^{k+1} = H_s\big( x^k + A^T (b - A x^k) \big)$,

where $H_s(a)$ is a non-linear operator setting all but the $s$ largest elements of $a$ to zero, with arbitrary tie-breaking in case this is not unique. For $\|A\|_2 < 1$, this converges to a local minimum of the best subset selection problem.

Beyond heuristics, best subset selection can be solved using mixed-integer programming solvers, such as IBM ILOG CPLEX, in time possibly exponential in the size of the problem. Bertsimas et al. [2015] have recently solved instances with $n$ in the 100s and $p$ in the 1000s in minutes. As they suggest, this is because algorithmic advances and hardware improvements between 1990 and 2014 have resulted in a speed-up in solving mixed-integer programming problems by a factor of 200 billion.
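A minimal sketch of the IHT iteration from Example 7.7 (Python with NumPy; the data, the sparsity level $s$, and the rescaling of $A$ so that $\|A\|_2 < 1$ are made-up assumptions):

```python
import numpy as np

def hard_threshold(a, s):
    # H_s: keep the s entries of a that are largest in absolute value, zero out the rest
    out = np.zeros_like(a)
    keep = np.argsort(np.abs(a))[-s:]
    out[keep] = a[keep]
    return out

def iht(A, b, s, iters=500):
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = hard_threshold(x + A.T @ (b - A @ x), s)
    return x

rng = np.random.default_rng(4)
A = rng.standard_normal((40, 100))
A /= np.linalg.norm(A, 2) * 1.01          # rescale so that ||A||_2 < 1, as the theory assumes
x_true = np.zeros(100); x_true[[5, 20, 60]] = [1.0, -2.0, 0.5]
b = A @ x_true
x_hat = iht(A, b, s=3)
print(np.nonzero(x_hat)[0])               # support of the recovered vector
```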


Bibliography

Dimitris Bertsimas and Martin S. Copenhaver. Characterization of the equivalence of robustification and regularization in linear, median, and matrix regression. arXiv preprint, 2014.

Dimitris Bertsimas, Angela King, and Rahul Mazumder. Best subset selection via a modern optimization lens. arXiv preprint, 2015.

Åke Björck. Numerical Methods for Least Squares Problems. Society for Industrial and Applied Mathematics, 1996.

Thomas Blumensath and Mike E. Davies. Iterative hard thresholding for compressed sensing. Applied and Computational Harmonic Analysis, 27(3), 2009.

Jakub Mareček, Peter Richtárik, and Martin Takáč. Distributed block coordinate descent for minimizing partially separable functions. In Mehiddin Al-Baali, Lucio Grandinetti, and Anton Purnama, editors, Numerical Analysis and Optimization, volume 134 of Springer Proceedings in Mathematics & Statistics. Springer International Publishing, 2015.

Balas Kausik Natarajan. Sparse approximate solutions to linear systems. SIAM Journal on Computing, 24(2), 1995.
