CHAPTER 7. Regression
This chapter presents an extended example, illustrating and extending many of the concepts introduced over the past three chapters. Perhaps the best-known multivariate optimisation problem is linear least squares. In general, it can be seen as an extension of solving linear systems to matrices that are not square. There, for $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, the system $Ax = b$ is over-determined when $m > n$ and under-determined when $m < n$. In either case, there is no unique solution. One hence relaxes $Ax = b$ into an unconstrained optimisation problem, so as to guarantee the existence of a solution, and adds additional terms to the objective, so as to guarantee certain properties of the solution. Key concepts to illustrate include:

- Linear least squares seek $\arg\min_{x \in \mathbb{R}^n} \|Ax - b\|_2$, where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$. Key decompositions in solving the problem include:
  - QRD: takes an $m \times n$ matrix $A$, $m \ge n$, and produces an $m \times m$ orthogonal matrix $Q$, i.e., $Q^T Q = Q Q^T = I$, and an $m \times n$ upper-triangular matrix $R$.
  - SVD: takes an $m \times n$ matrix $A$ and produces an $m \times m$ matrix $U$, an $m \times n$ matrix $\Sigma$ with $\Sigma_{i,i} \ge 0$ being the singular values of $A$, and an $n \times n$ matrix $V^T$.
- Sparse least squares approximate the constraint $\|x\|_0 \le c$ by including $d\|x\|_1$ in the objective. This can be seen as a regularisation. Such regularisations have an intimate connection to perturbation analysis.
- When one wants to optimise with such non-smooth regularisations, one has to consider subderivatives. A subderivative of a function $f : \mathbb{R} \to \mathbb{R}$ at a point $y \in \mathbb{R}$ is a real number $c$ such that $f(x) - f(y) \ge c(x - y)$ for all $x \in \mathbb{R}$. (This generalises to higher dimensions, but we will not need that.) The subdifferential of the function $f$ at $y$ is the set of all subderivatives of $f$ at $y$.

1. Linear Least Squares

In regression, we consider a model $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ of a system, which translates an independent $n$-dimensional variable ("inputs") into a scalar dependent variable ("output"), using an $m$-dimensional adjustable parameter, while introducing some noise. Given a number of observations of the dependent and independent variables of a fixed model, the goal is to estimate the adjustable parameters. Let us assume that the model is known and linear, e.g., $f(x) := A_i x + \epsilon =: b_i$, for a row vector $A_i$. Then, under standard assumptions (zero-mean noise $\epsilon$, etc.), the best linear unbiased estimator of $x$ is $\arg\min_{x \in \mathbb{R}^n} \|b - Ax\|^2$; cf. the Gauss-Markov theorem. The objective can be rewritten as:

\[
\begin{aligned}
\|b - Ax\|^2
&= \sum_{i=1}^m \Big(b_i - \sum_{j=1}^n A_{ij} x_j\Big)^2 && (7.1)\\
&= (b - Ax)^T (b - Ax) && (7.2)\\
&= b^T b - x^T A^T b - b^T A x + x^T A^T A x && (7.3)\\
&= b^T b - 2 x^T A^T b + x^T A^T A x, && (7.4)
\end{aligned}
\]

where, in order to obtain (7.4) from (7.3), note that $(x^T A^T b)^T = b^T A x$ is a scalar and hence $x^T A^T b = b^T A x$. Notice that this objective is convex and smooth. The gradient, for $j = 1, 2, \ldots, n$, is:

\[
\begin{aligned}
\frac{\partial \|b - Ax\|^2}{\partial x_j}
&= 2 \sum_{i=1}^m \Big(b_i - \sum_{k=1}^n A_{ik} x_k\Big)\, \frac{\partial \big(b_i - \sum_{k=1}^n A_{ik} x_k\big)}{\partial x_j} && (7.5)\\
&= 2 \sum_{i=1}^m \Big(b_i - \sum_{k=1}^n A_{ik} x_k\Big)\, (-A_{ij}). && (7.6)
\end{aligned}
\]
The conditions of first-order optimality for $j = 1, 2, \ldots, n$ hence are:

\[
\begin{aligned}
\sum_{i=1}^m \Big(b_i - \sum_{k=1}^n A_{ik} \hat{x}_k\Big)(-A_{ij}) &= 0 \quad \text{or, equivalently,} && (7.7)\\
\sum_{i=1}^m \sum_{k=1}^n A_{ij} A_{ik} \hat{x}_k &= \sum_{i=1}^m A_{ij} b_i, && (7.8)
\end{aligned}
\]

which in matrix notation, $(A^T A)\hat{x} = A^T b$, is known as the normal equation. Second-order conditions involve $A$ being full-rank, which makes $A^T A$ positive definite.

2. Linear-Algebraic Methods

Linear least squares are a very special case of multivariate optimisation and have been studied since the early 1800s. There are a variety of specialised approaches. We will present a brief overview of four approaches developed in linear algebra over the past two centuries. For a complete treatment, see Björck [1996].

First, notice that the solution of the normal equation is $(A^T A)^{-1} A^T b = A^\dagger b$, where $A^\dagger$ is the Moore-Penrose pseudoinverse of $A$. If $A^T A$ is positive definite (and hence full-rank), one can use the Cholesky decomposition $A^T A = R^T R$, where $R$ is upper-triangular. A more stable approach does not form $A^T A$, but considers the so-called QR decomposition $A = QR$, where $Q$ is an $m \times n$ orthogonal matrix, i.e., $Q^T Q = I$, and $R$ is an $n \times n$ upper-triangular matrix with $r_{ii} > 0$.

2.1. The Pseudo-Inverse. The Moore-Penrose pseudo-inverse $A^\dagger$ is a generalisation of the inverse $A^{-1}$ (for which $A^{-1} A = I$) that satisfies

- $A A^\dagger A = A$ ("maps all column vectors to themselves"),
- $A^\dagger A A^\dagger = A^\dagger$ ("weak multiplicative inverse"),
- $(A A^\dagger)^T = A A^\dagger$,
- $(A^\dagger A)^T = A^\dagger A$.

In practice, it can be computed using the SVD decomposition. The issue with the pseudo-inverse is captured in the following:

Theorem 7.1 (1.4.1 in Björck [1996]). If $\mathrm{rank}(A + E) \neq \mathrm{rank}(A)$, then $\|(A + E)^\dagger - A^\dagger\|_2 \ge 1/\|E\|_2$.

Example 7.2 (1.4.1 in Björck [1996]). Consider:

\[
A = \begin{bmatrix} \sigma & 0 \\ 0 & 0 \end{bmatrix},\quad
A^\dagger = \begin{bmatrix} \sigma^{-1} & 0 \\ 0 & 0 \end{bmatrix},\quad
E = \begin{bmatrix} 0 & 0 \\ 0 & \epsilon \end{bmatrix},\quad
(A + E)^\dagger = \begin{bmatrix} \sigma^{-1} & 0 \\ 0 & \epsilon^{-1} \end{bmatrix},
\]

where $\|(A + E)^\dagger - A^\dagger\|_2 = \epsilon^{-1} = 1/\|E\|_2$.

2.2. QRD. Alternatively, one can consider the QR decomposition of an $m \times n$ matrix $A$, $m \ge n$, into $QR$, where

- $Q$ is an $m \times m$ orthogonal matrix, i.e., $Q^T Q = Q Q^T = I$, or $Q^T = Q^{-1}$,
- $R$ is an $m \times n$ upper-triangular matrix with the bottom $(m - n)$ rows equal to $0$.

One solves the least squares by solving $Rx = Q^T b$, which is easy, considering $R$ is upper-triangular.

2.3. SVD. Alternatively, one can consider the singular value decomposition (SVD) of an $m \times n$ matrix $A$ into $U \Sigma V^T$, where

- $U$ is an $m \times m$ matrix whose $m$ columns are left-singular vectors of $A$,
- $\Sigma$ is an $m \times n$ matrix with $\Sigma_{i,i} \ge 0$ being the singular values of $A$ and all elements outside of the main diagonal equal to $0$,
- $V$ is an $n \times n$ matrix whose $n$ columns are right-singular vectors of $A$.

We will see much more of the SVD in the next chapter. Once you have $A = U \Sigma V^T$, solving least squares involves $x = V \Sigma^\dagger U^T b$, where the diagonal matrix $\Sigma$ is easy to invert. Using the SVD also makes it possible to use a truncated SVD, i.e., to ignore small singular values, whose inversion would create numerical issues.
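To make the three routes above concrete, the following is a minimal numerical sketch (not part of the original notes): it solves the same least-squares problem via the normal equation, via QRD, and via the SVD in NumPy/SciPy. The data are arbitrary, and a library routine is used as the reference solution.

```python
# Minimal sketch: three ways to solve min_x ||Ax - b||_2 on synthetic data.
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
m, n = 100, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Normal equation (A^T A) x = A^T b: cheap, but squares the condition number.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# QRD: A = QR with Q^T Q = I, then back-substitution on R x = Q^T b.
Q, R = np.linalg.qr(A)                 # "economy" QR: Q is m-by-n, R is n-by-n
x_qr = solve_triangular(R, Q.T @ b)

# SVD: A = U Sigma V^T, then x = V Sigma^{-1} U^T b (the pseudo-inverse applied to b).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / s)

# On this well-conditioned A, all three agree with the library solver.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_normal, x_ref), np.allclose(x_qr, x_ref), np.allclose(x_svd, x_ref))
```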
3. The Condition Number

Considering that solving linear systems (with full-rank $A$) is a special case of linear least squares, the analysis will seem familiar.

Definition 7.3. The condition number of a matrix $A \in \mathbb{R}^{m \times n}$ with respect to linear least squares is

\[
\mathrm{cond}(A) = \|A\|_2 \|A^\dagger\|_2 = \frac{\sigma_1}{\sigma_r},
\]

where $0 < r = \mathrm{rank}(A)$ and $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r > 0$ are the non-zero singular values of $A$. One can also introduce a component-wise condition number, sometimes called Bauer-Skeel.

Let $A$ be perturbed by $\delta A$ and let the right-hand side $b$ be perturbed by $\delta b$. For

\[
\arg\min_{x \in \mathbb{R}^n} \|(A + \delta A)(x + \delta x) - (b + \delta b)\|, \tag{7.9}
\]

we want to bound $\delta x$ as a function of $\delta b$ and $\delta A$. With basic algebra, cf. Björck [1996], one obtains:

\[
\|\delta x\|_2 \lesssim \frac{1}{\sigma_n}\big(\|\delta b\|_2 + \|\delta A\|_2 \|x\|_2\big) + \frac{1}{\sigma_n^2}\,\|\delta A\|_2\,\|b - Ax\|_2, \tag{7.10}
\]

where $1/\sigma_n = \mathrm{cond}(A)/\|A\|_2$, which makes the second summand $O(\mathrm{cond}^2(A))$. When the perturbation does not change the rank (an "acute perturbation"), the behaviour is better, but still rather awful. How would you fix this?

4. Regularisations

In many applications of least squares, one hopes to immunise the solution against such perturbations. In many applications, one also hopes to find a solution with only a few non-zero elements of $x$, which would improve interpretability. For example, if we are looking for genes affecting the height of a person, it may be reasonable to assume that there is a small number of genes involved.¹ The use of the number of non-zero elements, $\|x\|_0$, in the objective function or constraints makes the problem NP-hard [Natarajan, 1995]. The compressed sensing movement hence uses regularisations, which sometimes come with very interesting probabilistic analyses, and which address both the immunisation and sparsity issues. The best-known regularisations include:

- $\ell_0$ regularisation: $\|Ax - b\|_2 + \beta \|x\|_0$: non-convex, NP-hard
- $\ell_1$ regularisation: $\|Ax - b\|_2 + \beta \|x\|_1$: non-smooth, convex
- $\ell_2$ regularisation: $\|Ax - b\|_2 + \alpha \|x\|_2$: smooth, convex
- Tikhonov regularisation: $\|Ax - b\|_2 + \|\Gamma x\|_2$ for some matrix $\Gamma$: smooth, convex.

Notice that $\ell_2$ regularisation is Tikhonov with $\Gamma = \alpha I$. As it turns out, there is a close connection between regularisation and perturbation analysis:

Theorem 7.4 (Bertsimas and Copenhaver [2014]). Let $\Upsilon \subseteq \mathbb{R}^{m \times n}$ be any non-empty, compact set and $g : \mathbb{R}^m \to \mathbb{R}$ a seminorm. Then there exists some seminorm $h : \mathbb{R}^n \to \mathbb{R}$ so that for any $z \in \mathbb{R}^m$, $b \in \mathbb{R}^n$,

\[
\max_{\Delta \in \Upsilon} g(z + \Delta b) \le g(z) + h(b),
\]

with equality when $z = 0$, where $\Upsilon \subseteq \mathbb{R}^{m \times n}$ is $\Upsilon = \{\Delta : \|\Delta\| \le \lambda\}$ and $\|\cdot\|$ is a norm whose dual satisfies: there exist norms $\varphi, \psi$ so that $\|u v^T\|^* = \varphi(u)\psi(v)$ for all $u \in \mathbb{R}^m$, $v \in \mathbb{R}^n$.

N.B. Seminorms generalise norms, but allow zero length to be assigned to some non-zero vectors.

Exercise 7.5. The table below summarises the results of the paper of Bertsimas and Copenhaver [2014]. Download the paper and try to understand what it says.

¹ The Genetic Investigation of Anthropometric Traits consortium has recently identified 423 genetic regions connected to height, which explain 60% of the genetic component.
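Before turning to the table that Exercise 7.5 refers to, here is a hedged numerical illustration of the two points above: the condition number as the ratio $\sigma_1/\sigma_r$, and regularisation as a way of immunising the solution against perturbations. The sketch uses the squared ("ridge") form of $\ell_2$ regularisation; the matrix, the perturbation, and the value of $\alpha$ are all made-up illustrative choices.

```python
# Hedged sketch: condition number from singular values, and the damping effect
# of a (squared, ridge-type) l2 regulariser on an artificially ill-conditioned A.
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 5
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.array([1.0, 0.5, 0.1, 1e-6, 1e-8])            # widely spread singular values
A = U @ np.diag(s) @ V.T
b = rng.standard_normal(m)
print("cond(A) = sigma_1 / sigma_r =", s[0] / s[-1])

# min ||Ax - b||^2 + alpha ||x||^2   <=>   (A^T A + alpha I) x = A^T b
alpha = 1e-4
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
x_ridge = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ b)

# A tiny perturbation of b shifts the unregularised solution by orders of
# magnitude more than the regularised one.
db = 1e-6 * rng.standard_normal(m)
x_ls_p = np.linalg.lstsq(A, b + db, rcond=None)[0]
x_ridge_p = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ (b + db))
print("unregularised shift:", np.linalg.norm(x_ls_p - x_ls))
print("regularised shift:  ", np.linalg.norm(x_ridge_p - x_ridge))
```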
Table: A summary of equivalencies for robustification with uncertainty set $\Upsilon$ and regularisation with penalty $h$, cited verbatim from Bertsimas and Copenhaver [2014]. Equivalence means that for all $z \in \mathbb{R}^m$ and $b \in \mathbb{R}^n$, $\max_{\Delta \in \Upsilon} g(z + \Delta b) = g(z) + h(b)$, where $g$ is the loss function. Throughout, $p, q \in [1, \infty]$, $m \ge 2$, $\delta_i$ denotes the $i$th row of $\Delta$, and $h$ is as in the theorem. For any remaining definitions, please see Bertsimas and Copenhaver [2014].

Loss function $g$ | Uncertainty set $\Upsilon \subseteq \mathbb{R}^{m \times n}$ | $h(b)$ | Equivalence if and only if
seminorm $g$ | $\Upsilon_{\mathrm{independent}(h,g)}$ ($h$ norm) | $\lambda h(b)$ | always
$\ell_p$ | $\Upsilon_{\sigma_q}$ | $\lambda\,\delta_m(p, 2)\,\|b\|_2$ | $p \in \{1, 2, \infty\}$
$\ell_p$ | $\Upsilon_{F_q}$ | $\lambda\,\delta_m(p, q)\,\|b\|_q$ | $p = q$ or $p \in \{1, \infty\}$
$\ell_p$ | $\Upsilon_{\mathrm{independent}(q,r)}$ | $\lambda\,\delta_m(p, r)\,\|b\|_q$ | $p = r$ or $p \in \{1, \infty\}$
$\ell_p$ | $\{\Delta : \|\delta_i\|_q \le \lambda\ \forall i\} = \Upsilon_{\mathrm{independent}(q, \infty)}$ | $\lambda m^{1/p} \|b\|_q$ | $p \in \{1, \infty\}$

5. Sub-Gradient Methods

A subderivative of a function $f : \mathbb{R} \to \mathbb{R}$ at a point $y$ is a real number $c$ such that $f(x) - f(y) \ge c(x - y)$ for all $x \in \mathbb{R}$. The set of all subderivatives of $f$ at a point $y$ is called the subdifferential of the function $f$ at $y$. For a convex function $f$, the subdifferential is a non-empty closed interval $[a, b]$, where $a$ and $b$ are the one-sided limits

\[
a = \lim_{x \to y^-} \frac{f(x) - f(y)}{x - y}, \qquad
b = \lim_{x \to y^+} \frac{f(x) - f(y)}{x - y},
\]

which are guaranteed to exist and satisfy $a \le b$. A point $x$ is a global minimum of a convex function $f$ if and only if zero is contained in the subdifferential at $x$.

Example 7.6. Consider $f(x) = |x|$, which is convex. Its subdifferential at $0$ is $[-1, 1]$. The subdifferential at positive $y$ is the singleton set $\{1\}$. The subdifferential at negative $y$ is the singleton set $\{-1\}$. These should be seen as the slopes of subtangent lines. Contrast this with $f(x) = x$, which is differentiable: its subdifferential is a singleton at any point, and there is hence a single tangent line at any $x \in \mathbb{R}$.

$\ell_1$ regularisation can be seen as an optimisation problem of the form

\[
\min_{x \in \mathbb{R}^n} f(x) + \psi(x), \tag{7.11}
\]

where $f$ is a smooth, convex and partially block-separable function, and $\psi$ is non-smooth and convex. This is typically solved using sub-gradient methods. These can be shown to have a rate of convergence of $O(\log(1/\epsilon))$ for strongly convex objectives and $O(1/\epsilon)$ for general convex objectives, e.g., Mareček et al. [2015].

In coordinate descent for minimisation, each iteration $k$ is associated with a coordinate $1 \le i \le n$ and one updates only the $i$th coordinate, $x^{k+1} = x^k + e_i h^k(x^k)$, where $e_i$ is the vector with $1$ at position $i$ and $0$ elsewhere, and the step-length is

\[
h^k(x^k) \in \arg\min_{t \in \mathbb{R}} \ \langle \nabla_i f(x^k), t \rangle + \frac{c}{2} t^2 + \psi_i(x^k_i + t)
\]

for a suitable constant $c > 0$. This requires $\nabla_i f(x) = -\sum_{j=1}^m A_{j,i}\,(b^{(j)} - A_{j:}x)$, where $A_{j:}$ denotes the $j$-th row of the matrix $A$. Once we know how to compute $\nabla_i f(x)$, all that remains is a 1D problem

\[
\min_{t \in \mathbb{R}} \ a + bt + \frac{c}{2} t^2 + \lambda |d + t|, \tag{7.12}
\]

where $a, b, d \in \mathbb{R}$ and $c, \lambda \in \mathbb{R}_{>0}$. There exists a closed-form solution, known as the soft-thresholding formula:

\[
t^* = \mathrm{sgn}(\zeta)\Big(|\zeta| - \frac{\lambda}{c}\Big)_+ - d, \qquad \text{where } \zeta = d - \frac{b}{c}
\]

and $(\cdot)_+$ takes the positive part. When carefully implemented, e.g., by Mareček et al. [2015] on a machine with 4,096 hardware threads, such a coordinate descent has been used to solve artificial instances of sparse least squares with a block-angular matrix of $n = 10^9$ rows, requiring 3 TB to store. Such matrices often arise in stochastic optimisation.
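As a concrete, hedged illustration of the coordinate-descent update and the soft-thresholding step, the sketch below applies cyclic coordinate descent to $\min_x \tfrac{1}{2}\|b - Ax\|^2 + \lambda\|x\|_1$ in NumPy. It is not the distributed implementation of Mareček et al. [2015]; the data, $\lambda$, and iteration counts are arbitrary.

```python
# Minimal sketch: cyclic coordinate descent for the l1-regularised least squares
#   min_x (1/2)||b - Ax||^2 + lam * ||x||_1
# using the soft-thresholding formula for each 1D subproblem.
import numpy as np

def soft_threshold(z, tau):
    """Closed-form minimiser of (1/2)(u - z)^2 + tau * |u|."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def coordinate_descent_lasso(A, b, lam, iters=100):
    n = A.shape[1]
    x = np.zeros(n)
    r = b - A @ x                      # running residual b - Ax
    col_sq = (A ** 2).sum(axis=0)      # c_i = ||A_{:,i}||^2
    for _ in range(iters):
        for i in range(n):
            if col_sq[i] == 0.0:
                continue
            zeta = x[i] + A[:, i] @ r / col_sq[i]      # centre of the 1D problem
            x_new = soft_threshold(zeta, lam / col_sq[i])
            r += A[:, i] * (x[i] - x_new)              # keep residual consistent
            x[i] = x_new
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((200, 50))
x_true = np.zeros(50)
x_true[:3] = [3.0, -2.0, 1.5]                          # sparse ground truth
b = A @ x_true + 0.01 * rng.standard_normal(200)
x_hat = coordinate_descent_lasso(A, b, lam=5.0)
print("non-zeros recovered:", np.flatnonzero(np.abs(x_hat) > 1e-6))
```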
6. Integer-Programming Methods

Consider the problem of "best subset selection": minimise $\|Ax - b\|_q$ subject to $\|x\|_0 \le k$, for $q \in \{1, 2\}$. This is an NP-hard mixed-integer programming problem. Within the so-called compressed sensing community, a number of heuristics for it have been developed.

Example 7.7 (Blumensath and Davies [2009]). Blumensath and Davies [2009] studied Iterative Hard Thresholding (IHT). Starting with $x^0 = 0$, one applies

\[
x^{k+1} = H_s\big(x^k + A^T (b - A x^k)\big),
\]

where $H_s(a)$ is a non-linear operator setting all but the $s$ largest elements of $a$ to zero, with arbitrary tie-breaking in case this is not unique. For $\|A\|_2 < 1$, this converges to a local minimum of the best subset selection problem.

Beyond heuristics, best subset selection can be solved using mixed-integer programming solvers, such as IBM ILOG CPLEX, in time possibly exponential in the size of the problem. Bertsimas et al. [2015] have recently solved instances with $n$ in the 100s and $p$ in the 1000s within minutes. As they suggest, this is possible because algorithmic advances and hardware improvements between 1990 and 2014 have resulted in a speed-up in solving mixed-integer programming problems by a factor of roughly 200 billion.
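A hedged sketch of the IHT iteration from Example 7.7 follows. It only illustrates the update $x^{k+1} = H_s(x^k + A^T(b - Ax^k))$ on synthetic data; the dimensions, the sparsity level $s$, and the ad-hoc rescaling used to enforce $\|A\|_2 < 1$ are illustrative choices, not taken from the paper.

```python
# Minimal sketch of Iterative Hard Thresholding (IHT) on synthetic data.
import numpy as np

def hard_threshold(a, s):
    """Keep the s largest-magnitude entries of a, zero out the rest."""
    out = np.zeros_like(a)
    keep = np.argsort(np.abs(a))[-s:]
    out[keep] = a[keep]
    return out

def iht(A, b, s, iters=200):
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = hard_threshold(x + A.T @ (b - A @ x), s)
    return x

rng = np.random.default_rng(3)
m, n, s = 80, 200, 5
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, 2) * 1.01          # rescale so that ||A||_2 < 1
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
b = A @ x_true
x_hat = iht(A, b, s)
print("support recovered:", set(np.flatnonzero(x_hat)) == set(np.flatnonzero(x_true)))
```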
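For the $q = 1$ case, best subset selection can be written as a mixed-integer linear program with a big-M constraint. The sketch below (illustrative only, not the formulation or solver of Bertsimas et al. [2015]) uses SciPy's generic MILP interface rather than CPLEX; the data, $k$, and the big-M value are arbitrary assumptions.

```python
# Hedged sketch: best subset selection with q = 1,
#   min ||Ax - b||_1  subject to  ||x||_0 <= k,
# as a MILP over variables v = [x (n), t (m), z (n)]: minimise sum(t) with
# |Ax - b| <= t (elementwise), |x_j| <= M * z_j, sum(z) <= k, z binary.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

rng = np.random.default_rng(4)
m, n, k, M = 30, 10, 3, 10.0
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[:k] = [2.0, -1.0, 0.5]
b = A @ x_true

c = np.concatenate([np.zeros(n), np.ones(m), np.zeros(n)])   # minimise sum(t)
I_m, I_n, Z_nm = np.eye(m), np.eye(n), np.zeros((n, m))
A_ub = np.vstack([
    np.hstack([A,  -I_m, np.zeros((m, n))]),                 #  Ax - t <= b
    np.hstack([-A, -I_m, np.zeros((m, n))]),                 # -Ax - t <= -b
    np.hstack([I_n,  Z_nm, -M * I_n]),                       #  x - M z <= 0
    np.hstack([-I_n, Z_nm, -M * I_n]),                       # -x - M z <= 0
    np.concatenate([np.zeros(n), np.zeros(m), np.ones(n)])[None, :],  # sum(z) <= k
])
b_ub = np.concatenate([b, -b, np.zeros(n), np.zeros(n), [k]])
constraints = LinearConstraint(A_ub, -np.inf, b_ub)
integrality = np.concatenate([np.zeros(n + m), np.ones(n)])  # only z is integer
bounds = Bounds(np.concatenate([np.full(n, -np.inf), np.zeros(m), np.zeros(n)]),
                np.concatenate([np.full(n, np.inf), np.full(m, np.inf), np.ones(n)]))

res = milp(c, constraints=constraints, integrality=integrality, bounds=bounds)
x_hat = res.x[:n]
print("selected support:", np.flatnonzero(np.abs(x_hat) > 1e-6))
```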
Bibliography

Dimitris Bertsimas and Martin S. Copenhaver. Characterization of the equivalence of robustification and regularization in linear, median, and matrix regression. arXiv preprint, 2014.

Dimitris Bertsimas, Angela King, and Rahul Mazumder. Best subset selection via a modern optimization lens. arXiv preprint, 2015.

Åke Björck. Numerical Methods for Least Squares Problems. Society for Industrial and Applied Mathematics, 1996.

Thomas Blumensath and Mike E. Davies. Iterative hard thresholding for compressed sensing. Applied and Computational Harmonic Analysis, 27(3), 2009.

Jakub Mareček, Peter Richtárik, and Martin Takáč. Distributed block coordinate descent for minimizing partially separable functions. In Mehiddin Al-Baali, Lucio Grandinetti, and Anton Purnama, editors, Numerical Analysis and Optimization, volume 134 of Springer Proceedings in Mathematics & Statistics. Springer International Publishing, 2015.

Balas Kausik Natarajan. Sparse approximate solutions to linear systems. SIAM Journal on Computing, 24(2), 1995.