Total Least Squares Approach in Regression Methods

WDS'08 Proceedings of Contributed Papers, Part I, 88–93, MATFYZPRESS

Total Least Squares Approach in Regression Methods

M. Pešta
Charles University, Faculty of Mathematics and Physics, Prague, Czech Republic.

Abstract. Total least squares (TLS) is a data modelling technique which can be used for many types of statistical analysis, e.g. regression. In the regression setup, both dependent and independent variables are considered to be measured with errors. The TLS approach in statistics is therefore sometimes called errors-in-variables (EIV) modelling and, moreover, this type of regression is usually known as orthogonal regression. We take an EIV regression model into account. Necessary algebraic tools are introduced in order to construct the TLS estimator. A comparison with the classical ordinary least squares estimator is illustrated. Consequently, the existence and uniqueness of the TLS estimator are discussed. Finally, we show the large sample properties of the TLS estimator, i.e. strong and weak consistency, and an asymptotic distribution.

Introduction

Observing several characteristics, which may be thought of as variables, straightforwardly poses a natural question: what is the relationship between these measured characteristics? One possible attitude is that some of the characteristics might be explained by a (functional) dependence on the other characteristics. Therefore, we consider the first-mentioned variables as dependent (response) and the remaining ones as independent (explanatory). Our proposed model of dependence contains errors in the response variable (we consider only one dependent variable) and in the explanatory variables as well. Firstly, however, we just try to find an appropriate fit for some points in the Euclidean space using a hyperplane, i.e. we approximate several incompatible linear relations. Afterwards, some properties of the measurement errors are added and, hence, several statistical asymptotic qualities are developed.

Overdetermined System

Let us consider the overdetermined system of linear relations

  y ≈ Xβ,  y ∈ ℝ^n,  X ∈ ℝ^{n×m},  n > m.  (1)

The relations in (1) are deliberately not denoted as equations, because in many cases an exact solution need not exist; only an approximation can be found. Hence, one can speak about the best solution of the overdetermined system (1). But the best in which way?

Singular Value Decomposition

Before inquiring into an appropriate solution of (1), we should introduce some very important tools for further exploration.

Theorem (Singular Value Decomposition, SVD). If A ∈ ℝ^{n×m}, then there exist orthonormal matrices U = [u_1, ..., u_n] ∈ ℝ^{n×n} and V = [v_1, ..., v_m] ∈ ℝ^{m×m} such that

  U^⊤ A V = Σ = diag{σ_1, ..., σ_p} ∈ ℝ^{n×m},  σ_1 ≥ ... ≥ σ_p ≥ 0,  p = min{n, m}.  (2)

Proof. See Golub and Van Loan [1996].

In the SVD, the diagonal matrix Σ is uniquely determined by A (though the matrices U and V are not). This powerful matrix decomposition allows us to define a cutting point r for a given matrix A ∈ ℝ^{n×m} using its singular values σ_i:

  σ_1 ≥ ... ≥ σ_r > σ_{r+1} = ... = σ_p = 0,  p = min{n, m}.
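The following short NumPy sketch (not part of the original paper; the matrix A and the random seed are illustrative assumptions of mine) shows how the SVD of Theorem (2) is computed in practice and how the cutting point r is read off from the singular values.

```python
import numpy as np

# Illustrative data only (not from the paper): any n x m matrix with n > m works here.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 4))  # rank(A) <= 3 by construction

# Full SVD of Theorem (2): A = U @ diag(sigma) @ V^T, sigma_1 >= ... >= sigma_p >= 0.
U, sigma, Vt = np.linalg.svd(A, full_matrices=True)

# The cutting point r counts the singular values that are (numerically) nonzero.
tol = max(A.shape) * np.finfo(A.dtype).eps * sigma[0]
r = int(np.sum(sigma > tol))
print("singular values:", np.round(sigma, 4))
print("cutting point r:", r, "   np.linalg.matrix_rank(A):", np.linalg.matrix_rank(A))
```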

Since the matrices U and V in (2) are orthonormal, it follows that rank(A) = r, and one may obtain a dyadic decomposition (expansion) of the matrix A:

  A = ∑_{i=1}^{r} σ_i u_i v_i^⊤.  (3)

A suitable matrix norm is also required; hence, the Frobenius norm of a matrix A = (a_{ij})_{i,j=1}^{n,m} is defined as follows:

  ‖A‖_F := ( ∑_{i=1}^{n} ∑_{j=1}^{m} a_{ij}^2 )^{1/2} = ( tr(A^⊤A) )^{1/2} = ( ∑_{i=1}^{p} σ_i^2 )^{1/2} = ( ∑_{i=1}^{r} σ_i^2 )^{1/2},  p = min{n, m}.  (4)

Furthermore, the following approximation theorem plays the main role in the forthcoming derivation, where a matrix is approximated by another one of lower rank.

Theorem (Eckart–Young–Mirsky Matrix Approximation). Let the SVD of A ∈ ℝ^{n×m} be given by A = ∑_{i=1}^{r} σ_i u_i v_i^⊤ with rank(A) = r. If k < r and A_k = ∑_{i=1}^{k} σ_i u_i v_i^⊤, then

  min_{rank(B)=k} ‖A − B‖_F = ‖A − A_k‖_F = ( ∑_{i=k+1}^{r} σ_i^2 )^{1/2}.  (5)

Proof. See Eckart and Young [1936] and Mirsky [1960].

Above all, one more technical property needs to be incorporated.

Theorem (Sturm Interlacing Property). Let n ≥ m and let the singular values of A ∈ ℝ^{n×m} be σ_1 ≥ ... ≥ σ_m. If B results from A by deleting one column of A and B has singular values σ'_1 ≥ ... ≥ σ'_{m−1}, then

  σ_1 ≥ σ'_1 ≥ σ_2 ≥ σ'_2 ≥ ... ≥ σ'_{m−1} ≥ σ_m ≥ 0.  (6)

Proof. See Thompson [1972].

Total Least Squares Solution

Now, three basic ways of approximating the overdetermined system (1) are suggested. The traditional approach penalizes only the misfit in the dependent variable part,

  min_{ε ∈ ℝ^n, β ∈ ℝ^m} ‖ε‖_2  s.t.  y + ε = Xβ,  (7)

and is called ordinary least squares (OLS). Here, the data matrix X is regarded as exactly known and errors occur only in the vector y. The opposite case to OLS is represented by data least squares (DLS), which allows corrections only in the explanatory variables (independent input data):

  min_{Θ ∈ ℝ^{n×m}, β ∈ ℝ^m} ‖Θ‖_F  s.t.  y = (X + Θ)β.  (8)

Finally, we concentrate on the total least squares approach, which minimizes the squared errors in the values of both dependent and independent variables:

  min_{[ε, Ξ] ∈ ℝ^{n×(m+1)}, β ∈ ℝ^m} ‖[ε, Ξ]‖_F  s.t.  y + ε = (X + Ξ)β.  (9)

A graphical illustration of the three previous cases can be found in Figure 1. One may notice that TLS searches for the orthogonal projection of the observed data onto the unknown approximation corresponding to a TLS solution. Once a minimizing [ε̂, Ξ̂] of the TLS problem (9) is found, any β satisfying y + ε̂ = (X + Ξ̂)β is called a TLS solution. The basic form of the TLS solution was investigated for the first time by Golub and Van Loan [1980].
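As a quick numerical illustration (not from the paper; NumPy and the random example matrix are my own assumptions), the sketch below checks the dyadic expansion (3), the singular-value form of the Frobenius norm (4), and the Eckart–Young–Mirsky error formula (5).

```python
import numpy as np

# Example data only (not from the paper); checks (3), (4), and (5) numerically.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = len(s)                                   # A has full column rank here, so r = 5

# (3): dyadic decomposition A = sum_i sigma_i * u_i * v_i^T.
A_dyadic = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))
assert np.allclose(A, A_dyadic)

# (4): the Frobenius norm equals the root of the sum of squared singular values.
assert np.isclose(np.linalg.norm(A, "fro"), np.sqrt(np.sum(s**2)))

# (5): the truncated SVD A_k is the best rank-k approximation in the Frobenius norm,
#      attaining the error sqrt(sigma_{k+1}^2 + ... + sigma_r^2).
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
assert np.isclose(np.linalg.norm(A - A_k, "fro"), np.sqrt(np.sum(s[k:]**2)))
print("best rank-2 Frobenius error:", np.linalg.norm(A - A_k, "fro"))
```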

[Figure 1. Various least squares fits (ordinary, data, and total LS) for the same three data points in the two-dimensional plane, which corresponds to the regression setup of one response and one explanatory variable.]

Theorem (TLS Solution of y ≈ Xβ). Let the SVD of X ∈ ℝ^{n×m} be given by X = ∑_{i=1}^{m} σ'_i u'_i (v'_i)^⊤ and the SVD of [y, X] by [y, X] = ∑_{i=1}^{m+1} σ_i u_i v_i^⊤. If σ'_m > σ_{m+1}, then

  [ŷ, X̂] := [y + ε̂, X + Ξ̂] = U Σ̂ V^⊤,  Σ̂ = diag{σ_1, ..., σ_m, 0},  (10)

with the corresponding TLS correction matrix

  [ε̂, Ξ̂] = −σ_{m+1} u_{m+1} v_{m+1}^⊤,  (11)

solves the TLS problem (9), and

  β̂ = −(1 / (e_1^⊤ v_{m+1})) [v_{2,m+1}, ..., v_{m+1,m+1}]^⊤  (12)

exists and is the unique solution to ŷ = X̂β.

Proof. By contradiction, we firstly show that e_1^⊤ v_{m+1} ≠ 0. Suppose v_{1,m+1} = 0; then v_{m+1} = [0, w^⊤]^⊤ for some unit vector 0 ≠ w ∈ ℝ^m, and

  [0, w^⊤] [y, X]^⊤ [y, X] [0, w^⊤]^⊤ = σ_{m+1}^2,

which yields w^⊤ X^⊤ X w = σ_{m+1}^2. But this is a contradiction with the assumption σ'_m > σ_{m+1}, since (σ'_m)^2 is the smallest eigenvalue of X^⊤X.

The Sturm interlacing theorem (6) and the assumption σ'_m > σ_{m+1} yield σ_m ≥ σ'_m > σ_{m+1}. Therefore, σ_{m+1} is not a repeated singular value of [y, X] and σ_m > 0.

If σ_{m+1} ≠ 0, then rank([y, X]) = m + 1. We want to find [ŷ, X̂] such that ‖[y, X] − [ŷ, X̂]‖_F is minimal and [ŷ, X̂] [−1, β^⊤]^⊤ = 0 for some β. Therefore, rank([ŷ, X̂]) = m and, applying the Eckart–Young–Mirsky theorem (5), one may easily obtain the SVD of [ŷ, X̂] in (10) and the TLS correction matrix (11), which must have rank one. Now, it is clear that the TLS solution is given by the last column of V. Finally, since dim Ker([ŷ, X̂]) = 1, the TLS solution (12) must be unique.

If σ_{m+1} = 0, then v_{m+1} ∈ Ker([y, X]) and [y, X] [−1, β^⊤]^⊤ = 0. Hence, no approximation is needed, the overdetermined system (1) is compatible, and the exact TLS solution is given by (12). Uniqueness of this TLS solution follows from the fact that [−1, β^⊤]^⊤ ⟂ Range([y, X]^⊤).
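The theorem above translates directly into a few lines of linear algebra. The sketch below (not from the paper; the simulated y and X, the seed, and NumPy are my own illustrative assumptions) computes the TLS estimate from the last right singular vector of [y, X] as in (12), applies the rank-one correction (11), and checks that the corrected system (10) is compatible.

```python
import numpy as np

# Sketch of the TLS solution (10)-(12); y and X are simulated for illustration only.
rng = np.random.default_rng(2)
n, m = 50, 3
X = rng.standard_normal((n, m))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# SVD of the augmented matrix [y, X]; the solution sits in the last right singular vector.
U, s, Vt = np.linalg.svd(np.c_[y, X], full_matrices=False)
v_last = Vt[-1, :]                                  # v_{m+1}, belonging to sigma_{m+1}
beta_tls = -v_last[1:] / v_last[0]                  # (12): scale v_{m+1} to [-1, beta^T]^T

# (11): rank-one TLS correction, and (10): the corrected, compatible system.
correction = -s[-1] * np.outer(U[:, -1], v_last)    # [eps_hat, Xi_hat]
yX_hat = np.c_[y, X] + correction
y_hat, X_hat = yX_hat[:, 0], yX_hat[:, 1:]
assert np.allclose(X_hat @ beta_tls, y_hat)         # y_hat = X_hat @ beta_tls

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("TLS estimate:", beta_tls)
print("OLS estimate:", beta_ols)
```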

A closed-form expression of the TLS solution (12) can be derived. If σ'_m > σ_{m+1}, the existence and uniqueness of the TLS solution have already been shown. Indeed, since the singular vectors v_i from (10) are eigenvectors of [y, X]^⊤[y, X], the estimate β̂ also satisfies

  [y, X]^⊤ [y, X] [−1, β̂^⊤]^⊤ = [ y^⊤y  y^⊤X ; X^⊤y  X^⊤X ] [−1, β̂^⊤]^⊤ = σ_{m+1}^2 [−1, β̂^⊤]^⊤

and, hence,

  β̂ = (X^⊤X − σ_{m+1}^2 I_m)^{−1} X^⊤ y.  (13)

The previous equation reminds us of the form of an estimator in the ridge regression setup. Therefore, one may expect to avoid the multicollinearity problems of classical OLS regression (7), due to the correspondence between ridge regression and TLS orthogonal regression. Expression (13) looks almost similar to the OLS estimator β̃ of (7), except for the term containing σ_{m+1}^2. This term is missing in the well-known OLS estimator with a full-rank regression matrix, provided by the Gauss–Markov theorem as the solution of the so-called normal equations X^⊤X β̃ = X^⊤y.

From a statistical point of view, a situation where σ'_m = σ_{m+1} occurs for real data is unlikely and also quite irrelevant. Nevertheless, Van Huffel and Vandewalle [1991] investigated this case and concluded the following summary. Suppose σ_q > σ_{q+1} = ... = σ_{m+1}, q ≤ m, and denote Q := [v_{q+1}, ..., v_{m+1}]. Then:
- σ'_m > σ_{m+1}: the unique TLS solution (12) exists;
- σ'_m = σ_{m+1} and e_1^⊤Q ≠ 0: infinitely many TLS solutions of (9) exist and one can pick the one with the smallest norm;
- σ'_m = σ_{m+1} and e_1^⊤Q = 0: no solution of (9) exists and one needs to define another ("more restrictive") TLS problem.

The more restrictive TLS problem mentioned previously is called a nongeneric TLS problem. Simply put, the additional restriction [ε, Ξ]Q = 0 added to the constraints in (9) tries to project out unimportant or redundant data from the original TLS problem (9).

Errors-in-Variables Model

One should pay attention not only to the existence or form of the TLS solution, but also to its properties, e.g. statistical ones. In statistics, the TLS problem (9) corresponds to a so-called errors-in-variables setup. Here, unobservable true values y_0 and X_0 satisfy a single linear relationship

  y_0 = α 1_n + X_0 β  (14)

and the unknown parameters α (intercept) and β (regression coefficients) need to be estimated. The observations y and X measure y_0 and X_0 with additive errors ε and Ξ:

  y = y_0 + ε,  (15)
  X = X_0 + Ξ.  (16)

The rows of the errors [ε, Ξ] are iid with common zero mean and covariance matrix σ_ν^2 I_{m+1}, where σ_ν^2 > 0 is unknown.

TLS Estimator

For simplicity, we suppose that the condition σ'_m > σ_{m+1} is satisfied. Let us denote G := I_n − n^{−1} 1_n 1_n^⊤ with 1_n := [1, ..., 1]^⊤ for practical purposes. Then, we define the estimate of the coefficient β as the TLS solution β̂ and the estimate of the intercept α as

  α̂ := ȳ − [x̄_1, ..., x̄_m] β̂,  (17)

where x̄_i denotes the average of the elements of the i-th column of the matrix X. Finally, the variance term σ_ν^2 is estimated using the singular values as σ̂^2 := n^{−1} σ_{m+1}^2.
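A small simulation (my own illustration, not from the paper; NumPy, the seed, and the simulated EIV data are assumptions, and β̂ is computed from the centered data [Gy, GX], which is one natural reading of the estimator section above) checks that the SVD-based solution (12) agrees with the closed form (13) and evaluates the estimates α̂, β̂, and σ̂².

```python
import numpy as np

# Simulated errors-in-variables data (illustration only, not from the paper).
rng = np.random.default_rng(3)
n, m = 200, 2
alpha, beta = 1.5, np.array([2.0, -1.0])
X0 = rng.standard_normal((n, m))                    # unobservable true regressors
y0 = alpha + X0 @ beta                              # (14)
sigma_nu = 0.2
X = X0 + sigma_nu * rng.standard_normal((n, m))     # (16)
y = y0 + sigma_nu * rng.standard_normal(n)          # (15)

# Center the data (multiplication by G = I - n^{-1} 1 1^T) and solve TLS for beta.
Gy, GX = y - y.mean(), X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(np.c_[Gy, GX], full_matrices=False)
beta_svd = -Vt[-1, 1:] / Vt[-1, 0]                  # TLS solution (12) on centered data

# Closed form (13) on the centered data must give the same estimate.
beta_closed = np.linalg.solve(GX.T @ GX - s[-1]**2 * np.eye(m), GX.T @ Gy)
assert np.allclose(beta_svd, beta_closed)

alpha_hat = y.mean() - X.mean(axis=0) @ beta_svd    # intercept estimate (17)
sigma2_hat = s[-1]**2 / n                           # variance estimate sigma_hat^2
print("alpha_hat:", alpha_hat, " beta_hat:", beta_svd, " sigma2_hat:", sigma2_hat)
```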

Large Sample Properties

The asymptotic behaviour of an estimator is one of its basic characteristics. The asymptotic properties can provide some information about the quality (i.e. efficiency) of the estimator.

Consistency

Firstly, we provide a theorem showing the strong consistency of the TLS estimator.

Theorem (Strong Consistency). If lim_{n→∞} n^{−1} X_0^⊤ X_0 exists, then

  lim_{n→∞} σ̂^2 = σ_ν^2  almost surely.  (18)

Moreover, if lim_{n→∞} n^{−1} X_0^⊤ G X_0 > 0, then

  lim_{n→∞} β̂ = β  almost surely,  (19)
  lim_{n→∞} α̂ = α  almost surely.  (20)

Proof. See Gleser [1981].

The assumptions of the previous theorem are somewhat restrictive and need not be satisfied, e.g. in a univariate errors-in-variables model where the values of the independent variable vary linearly with the sample size. Therefore, these assumptions need to be weakened, yielding the following theorem.

Theorem (Weak Consistency). Suppose that the distribution of the rows of [ε, Ξ] possesses finite fourth moments. Denote X̃_0 := [1_n, X_0]. If

  λ_min(X̃_0^⊤ X̃_0) → ∞  and  λ_max(X̃_0^⊤ X̃_0) / λ_min^2(X̃_0^⊤ X̃_0) → 0,  n → ∞,  (21)

then

  [α̂, β̂^⊤]^⊤ →^P [α, β^⊤]^⊤,  n → ∞.

Proof. Can be easily derived using Theorem 2 by Gallo [1982a].

The notation λ_min (respectively, λ_max) denotes the minimal (respectively, maximal) eigenvalue. Regarding the finiteness of the fourth moments of the rows of [ε, Ξ], it has to be remarked that this mathematically means, for all i ∈ {1, ..., n},

  E ∏_j ω_{ij}^{r_j} < ∞,  ω_{ij} ∈ {ε_i, Ξ_{i,1}, ..., Ξ_{i,m}},  r_j ∈ ℕ_0,  ∑_j r_j = 4.  (22)

The assumptions in the previous theorems ensure that the values of the independent variables spread out fast enough. Gallo [1982a] proved that the previous intermediate assumptions are implied by the assumptions of the theorem for strong consistency.

Asymptotic Distributions

Finally, an asymptotic distribution for further statistical inference has to be shown.

Theorem (Asymptotic Normality). Suppose that the distribution of the rows of [ε, Ξ] possesses finite fourth moments. If

  lim_{n→∞} n^{−1} X_0^⊤ X_0 > 0,

then

  √n [α̂ − α, (β̂ − β)^⊤]^⊤

has an asymptotic zero-mean multivariate normal distribution as n → ∞.

Proof. See Gallo [1982b].

The covariance matrix of the multivariate normal distribution from the previous theorem is not shown here due to its complicated form; one may find that formula in Gallo [1982b].
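A brief simulation (my own illustration, not from the paper; NumPy, the seed, and the data-generating setup are assumptions) hints at the consistency statements: in a univariate EIV model without intercept, the TLS estimate approaches β as n grows, whereas the naive OLS slope stays attenuation-biased because the regressor is measured with error.

```python
import numpy as np

# Illustration of consistency (simulation only): univariate EIV model, no intercept.
rng = np.random.default_rng(4)
beta, sigma_nu = 2.0, 0.5

def estimates(n):
    x0 = rng.standard_normal(n)                     # true regressor values
    x = x0 + sigma_nu * rng.standard_normal(n)      # observed with error
    y = x0 * beta + sigma_nu * rng.standard_normal(n)
    s, Vt = np.linalg.svd(np.c_[y, x], full_matrices=False)[1:]
    beta_tls = -Vt[-1, 1] / Vt[-1, 0]               # TLS solution (12) with m = 1
    beta_ols = (x @ y) / (x @ x)                    # OLS slope (no intercept)
    return beta_tls, beta_ols

for n in (100, 1_000, 10_000, 100_000):
    tls, ols = estimates(n)
    print(f"n = {n:6d}   TLS: {tls:.4f}   OLS: {ols:.4f}")
```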

Discussion and Conclusions

In this paper, the TLS problem is summarized from an algebraic point of view and a connection with the errors-in-variables statistical model is shown. A unification of algebraic and numerical results with statistical ones is demonstrated. The TLS optimization problem is defined here together with its OLS and DLS alternatives. Its solution is found using spectral information of the system, and the existence and uniqueness of this solution are discussed. The errors-in-variables model, as a correspondence to orthogonal regression, is introduced. Moreover, a comparison of the classical regression approach with the errors-in-variables setup is shown. Finally, large sample properties, such as strong and weak consistency and the asymptotic distribution of the TLS estimator (an estimator in the errors-in-variables model), are recapitulated.

For further research, one may be interested in the extension of the TLS approach to nonlinear regression or, on top of that, to nonparametric regression. Amemiya [1997] proposed a way of first-order linearization of the nonlinear relations. Computational stability could be improved using the Golub–Kahan bidiagonalization, connected with the TLS problem by Paige and Strakoš [2006]. This approach needs to be studied from the statistical point of view as well.

Acknowledgments. The present work was supported by the Grant Agency of the Czech Republic (grant 20/05/H007).

References

Amemiya, Y., Generalization of the TLS approach in the errors-in-variables problem, in Proceedings of the Second International Workshop on Total Least Squares and Errors-in-Variables Modeling, edited by S. Van Huffel, 1997.
Eckart, C. and Young, G., The approximation of one matrix by another of lower rank, Psychometrika, 1, 211–218, 1936.
Gallo, P. P., Consistency of regression estimates when some variables are subject to error, Communications in Statistics: Theory and Methods, 1982a.
Gallo, P. P., Properties of Estimators in Errors-in-Variables Models, Ph.D. thesis, Institute of Statistics Mimeo Series #5, University of North Carolina, Chapel Hill, NC, 1982b.
Gleser, L. J., Estimation in a multivariate errors in variables regression model: Large sample results, Annals of Statistics, 9, 24–44, 1981.
Golub, G. H. and Van Loan, C. F., An analysis of the total least squares problem, SIAM Journal on Numerical Analysis, 17, 883–893, 1980.
Golub, G. H. and Van Loan, C. F., Matrix Computations, Johns Hopkins University Press, Baltimore, MD, 3rd edn., 1996.
Mirsky, L., Symmetric gauge functions and unitarily invariant norms, Quarterly Journal of Mathematics Oxford, 11, 50–59, 1960.
Paige, C. C. and Strakoš, Z., Core problems in linear algebraic systems, SIAM Journal on Matrix Analysis and Applications, 27, 2006.
Thompson, R. C., Principal submatrices IX: Interlacing inequalities for singular values of submatrices, Linear Algebra and its Applications, 5, 1–12, 1972.
Van Huffel, S. and Vandewalle, J., The Total Least Squares Problem: Computational Aspects and Analysis, SIAM, Philadelphia, PA, 1991.
