Lecture 13: Simple Linear Regression in Matrix Format
36-401, Section B, Fall 2015. 13 October 2015.
See updates and corrections at http://www.stat.cmu.edu/~cshalizi/mreg/

To move beyond simple regression we need to use matrix algebra. We'll start by re-expressing simple linear regression in matrix form. Linear algebra is a pre-requisite for this class; I strongly urge you to go back to your textbook and notes for review.

1 Expectations and Variances with Vectors and Matrices

If we have $p$ random variables $Z_1, Z_2, \ldots, Z_p$, we can put them into a random vector $Z = [Z_1\; Z_2\; \cdots\; Z_p]^T$. This random vector can be thought of as a $p \times 1$ matrix of random variables. The expected value of $Z$ is defined to be the vector
\[
\mu = E[Z] = \begin{bmatrix} E[Z_1] \\ E[Z_2] \\ \vdots \\ E[Z_p] \end{bmatrix}.
\]
If $a$ and $b$ are non-random scalars, then
\[
E[aZ + bW] = a\,E[Z] + b\,E[W].
\]
If $a$ is a non-random vector, then
\[
E[a^T Z] = a^T E[Z].
\]
If $A$ is a non-random matrix, then
\[
E[AZ] = A\,E[Z].
\]
Every coordinate of a random vector has some covariance with every other coordinate. The variance-covariance matrix of $Z$ is the $p \times p$ matrix which stores these values. In other words,
\[
\mathrm{Var}[Z] = \begin{bmatrix}
\mathrm{Var}[Z_1] & \mathrm{Cov}[Z_1, Z_2] & \cdots & \mathrm{Cov}[Z_1, Z_p] \\
\mathrm{Cov}[Z_2, Z_1] & \mathrm{Var}[Z_2] & \cdots & \mathrm{Cov}[Z_2, Z_p] \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}[Z_p, Z_1] & \mathrm{Cov}[Z_p, Z_2] & \cdots & \mathrm{Var}[Z_p]
\end{bmatrix}.
\]
This inherits properties of ordinary variances and covariances. Just as $\mathrm{Var}[Z] = E[Z^2] - (E[Z])^2$ for a scalar $Z$, we have
\[
\mathrm{Var}[Z] = E[ZZ^T] - E[Z]\,(E[Z])^T.
\]
For a non-random vector $a$ and a non-random scalar $b$,
\[
\mathrm{Var}[a + bZ] = b^2\,\mathrm{Var}[Z].
\]
For a non-random matrix $C$,
\[
\mathrm{Var}[CZ] = C\,\mathrm{Var}[Z]\,C^T.
\]
(Check that the dimensions all conform here: if $C$ is $q \times p$, $\mathrm{Var}[CZ]$ should be $q \times q$, and so is the right-hand side.)
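
These rules are easy to check by simulation. The following Python/NumPy sketch (the particular $\mu$, $\Sigma$, and $C$ below are invented purely for illustration) draws many copies of a Gaussian random vector $Z$ and compares the sample mean and covariance of $CZ$ with $C\,E[Z]$ and $C\,\mathrm{Var}[Z]\,C^T$:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim = 200_000

# Arbitrary illustrative choices for E[Z], Var[Z], and a non-random 2 x 3 matrix C
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, 0.4],
                  [0.1, 0.4, 1.5]])
C = np.array([[1.0, 2.0, -1.0],
              [0.5, 0.0,  3.0]])

Z = rng.multivariate_normal(mu, Sigma, size=n_sim)   # rows are independent draws of Z
W = Z @ C.T                                          # each row is C times that draw of Z

print(W.mean(axis=0), C @ mu)    # sample mean of CZ vs. C E[Z]
print(np.cov(W.T, bias=True))    # sample covariance of CZ ...
print(C @ Sigma @ C.T)           # ... should be close to C Var[Z] C^T
```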

A random vector $Z$ has a multivariate Normal distribution with mean $\mu$ and variance $\Sigma$ if its density is
\[
f(z) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (z - \mu)^T \Sigma^{-1} (z - \mu) \right).
\]
We write this as $Z \sim N(\mu, \Sigma)$ or $Z \sim MVN(\mu, \Sigma)$.

If $A$ is a square matrix, then the trace of $A$, denoted $\mathrm{tr}(A)$, is defined to be the sum of the diagonal elements. In other words,
\[
\mathrm{tr}(A) = \sum_j A_{jj}.
\]
Recall that the trace satisfies these properties:
\[
\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B), \quad \mathrm{tr}(cA) = c\,\mathrm{tr}(A), \quad \mathrm{tr}(A^T) = \mathrm{tr}(A),
\]
and we have the cyclic property
\[
\mathrm{tr}(ABC) = \mathrm{tr}(BCA) = \mathrm{tr}(CAB).
\]
If $C$ is non-random, then $Z^T C Z$ is called a quadratic form. We have that
\[
E[Z^T C Z] = E[Z]^T C\, E[Z] + \mathrm{tr}(C\,\mathrm{Var}[Z]).
\]
To see this, notice that
\[
Z^T C Z = \mathrm{tr}(Z^T C Z)
\]
because it's a $1 \times 1$ matrix. But the trace of a matrix product doesn't change when we cyclically permute the matrices, so
\[
Z^T C Z = \mathrm{tr}(C Z Z^T).
\]
Therefore
\[
\begin{aligned}
E[Z^T C Z] &= E[\mathrm{tr}(C Z Z^T)] \\
&= \mathrm{tr}(E[C Z Z^T]) \\
&= \mathrm{tr}(C\, E[Z Z^T]) \\
&= \mathrm{tr}\big(C\,(\mathrm{Var}[Z] + E[Z]\,E[Z]^T)\big) \\
&= \mathrm{tr}(C\,\mathrm{Var}[Z]) + \mathrm{tr}(C\,E[Z]\,E[Z]^T) \\
&= \mathrm{tr}(C\,\mathrm{Var}[Z]) + \mathrm{tr}(E[Z]^T C\,E[Z]) \\
&= \mathrm{tr}(C\,\mathrm{Var}[Z]) + E[Z]^T C\,E[Z],
\end{aligned}
\]
using the fact that tr is a linear operation, so it commutes with taking expectations; the decomposition of $\mathrm{Var}[Z]$; the cyclic permutation trick again; and finally dropping tr from a scalar.

Unfortunately, there is generally no simple formula for the variance of a quadratic form, unless the random vector is Gaussian. If $Z \sim N(\mu, \Sigma)$ then
\[
\mathrm{Var}(Z^T C Z) = 2\,\mathrm{tr}(C \Sigma C \Sigma) + 4\,\mu^T C \Sigma C \mu.
\]

2 Least Squares in Matrix Form

Our data consists of $n$ paired observations of the predictor variable $X$ and the response variable $Y$, i.e., $(X_1, Y_1), \ldots, (X_n, Y_n)$. We wish to fit the model
\[
Y = \beta_0 + \beta_1 X + \epsilon
\]
where $E[\epsilon \mid X = x] = 0$, $\mathrm{Var}[\epsilon \mid X = x] = \sigma^2$, and $\epsilon$ is uncorrelated across measurements.
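
Both moment formulas for the quadratic form can be checked by Monte Carlo. A minimal sketch, assuming a Gaussian $Z$ and an arbitrary symmetric $C$ (all of the values below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n_sim = 500_000

# Arbitrary symmetric C, mean mu, and covariance Sigma, purely for illustration
C = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
Sigma = np.array([[1.0, 0.2, 0.1],
                  [0.2, 2.0, 0.0],
                  [0.1, 0.0, 0.5]])
mu = np.array([1.0, -1.0, 2.0])

Z = rng.multivariate_normal(mu, Sigma, size=n_sim)
q = np.einsum('ij,jk,ik->i', Z, C, Z)   # Z^T C Z for every simulated draw

# Mean formula: E[Z]^T C E[Z] + tr(C Var[Z])
print(q.mean(), mu @ C @ mu + np.trace(C @ Sigma))
# Gaussian variance formula: 2 tr(C Sigma C Sigma) + 4 mu^T C Sigma C mu
print(q.var(), 2 * np.trace(C @ Sigma @ C @ Sigma) + 4 * mu @ C @ Sigma @ C @ mu)
```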

2.1 The Basic Matrices

\[
Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}, \quad
\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}.
\]
Note that $X$, which is called the design matrix, is an $n \times 2$ matrix, where the first column is always 1, and the second column contains the actual observations of $X$. Now
\[
X\beta = \begin{bmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \vdots \\ \beta_0 + \beta_1 X_n \end{bmatrix}.
\]
So we can write the set of equations $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$, $i = 1, \ldots, n$, in the simpler form $Y = X\beta + \epsilon$.

2.2 Mean Squared Error

Let
\[
e = e(\beta) = Y - X\beta.
\]
The training error (or observed mean squared error) is
\[
MSE(\beta) = \frac{1}{n} \sum_{i=1}^n e_i^2(\beta) = \frac{1}{n} e^T e.
\]
Let us expand this a little for further use. We have:
\[
\begin{aligned}
MSE(\beta) &= \frac{1}{n} e^T e \\
&= \frac{1}{n} (Y - X\beta)^T (Y - X\beta) \\
&= \frac{1}{n} (Y^T - \beta^T X^T)(Y - X\beta) \\
&= \frac{1}{n} \left( Y^T Y - Y^T X\beta - \beta^T X^T Y + \beta^T X^T X \beta \right) \\
&= \frac{1}{n} \left( Y^T Y - 2\beta^T X^T Y + \beta^T X^T X \beta \right),
\end{aligned}
\]
where we used the fact that $\beta^T X^T Y = (Y^T X \beta)^T = Y^T X \beta$, since a $1 \times 1$ matrix equals its transpose.
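
In code, the matrix form of the MSE is one line. A small NumPy sketch with invented toy data (and an arbitrary candidate $\beta$, not the optimum) confirming that $\frac{1}{n} e^T e$ agrees with the coordinate-wise definition:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)   # invented toy data

X = np.column_stack([np.ones(n), x])   # the n x 2 design matrix: a column of 1s, then x
beta = np.array([0.8, 1.9])            # some candidate coefficient vector (not the optimum)

e = y - X @ beta                        # residual vector e(beta) = Y - X beta
mse_matrix = (e @ e) / n                # (1/n) e^T e
mse_sum = np.mean((y - (beta[0] + beta[1] * x)) ** 2)   # coordinate-wise definition

print(np.isclose(mse_matrix, mse_sum))  # the same number, written two ways
```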

2.3 Minimizing the MSE

First, we find the gradient of the MSE with respect to $\beta$:
\[
\begin{aligned}
\nabla MSE(\beta) &= \frac{1}{n} \left( \nabla Y^T Y - 2\,\nabla \beta^T X^T Y + \nabla \beta^T X^T X \beta \right) \\
&= \frac{1}{n} \left( 0 - 2 X^T Y + 2 X^T X \beta \right) \\
&= \frac{2}{n} \left( X^T X \beta - X^T Y \right).
\end{aligned}
\]
We now set this to zero at the optimum, $\hat{\beta}$:
\[
X^T X \hat{\beta} - X^T Y = 0.
\]
This equation, for the two-dimensional vector $\hat{\beta}$, corresponds to our pair of normal or estimating equations for $\hat{\beta}_0$ and $\hat{\beta}_1$. Thus, it, too, is called an estimating equation. Solving, we get
\[
\hat{\beta} = (X^T X)^{-1} X^T Y.
\]
That is, we've got one matrix equation which gives us both coefficient estimates.

If this is right, the equation we've got above should in fact reproduce the least-squares estimates we've already derived, which are of course
\[
\hat{\beta}_1 = \frac{c_{XY}}{s_X^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}.
\]
Let's see if that's right. As a first step, let's introduce normalizing factors of $1/n$ into both the matrix products:
\[
\hat{\beta} = (n^{-1} X^T X)^{-1} (n^{-1} X^T Y).
\]
Now
\[
\frac{1}{n} X^T Y = \frac{1}{n} \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
= \frac{1}{n} \begin{bmatrix} \sum_i Y_i \\ \sum_i X_i Y_i \end{bmatrix}
= \begin{bmatrix} \bar{Y} \\ \overline{XY} \end{bmatrix}.
\]
Similarly,
\[
\frac{1}{n} X^T X = \frac{1}{n} \begin{bmatrix} n & \sum_i X_i \\ \sum_i X_i & \sum_i X_i^2 \end{bmatrix}
= \begin{bmatrix} 1 & \bar{X} \\ \bar{X} & \overline{X^2} \end{bmatrix}.
\]
Hence,
\[
\left( \frac{1}{n} X^T X \right)^{-1} = \frac{1}{\overline{X^2} - \bar{X}^2} \begin{bmatrix} \overline{X^2} & -\bar{X} \\ -\bar{X} & 1 \end{bmatrix}
= \frac{1}{s_X^2} \begin{bmatrix} \overline{X^2} & -\bar{X} \\ -\bar{X} & 1 \end{bmatrix}.
\]
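
A quick numerical confirmation, on invented data, that the matrix solution reproduces the familiar scalar formulas (the sketch solves the estimating equation with `np.linalg.solve` rather than forming the inverse explicitly, which is numerically preferable):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.normal(loc=1.0, scale=2.0, size=n)
y = 3.0 - 1.5 * x + rng.normal(size=n)               # invented toy data
X = np.column_stack([np.ones(n), x])

# Matrix route: solve the estimating equation X^T X beta = X^T Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Scalar route: beta1-hat = c_XY / s_X^2, beta0-hat = Ybar - beta1-hat * Xbar
c_xy = np.mean(x * y) - x.mean() * y.mean()          # sample covariance (1/n convention)
s2_x = np.mean(x ** 2) - x.mean() ** 2               # sample variance (1/n convention)
beta1_hat = c_xy / s2_x
beta0_hat = y.mean() - beta1_hat * x.mean()

print(np.allclose(beta_hat, [beta0_hat, beta1_hat]))  # True: both routes agree
```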

Therefore,
\[
\begin{aligned}
(X^T X)^{-1} X^T Y &= \frac{1}{s_X^2} \begin{bmatrix} \overline{X^2} & -\bar{X} \\ -\bar{X} & 1 \end{bmatrix}
\begin{bmatrix} \bar{Y} \\ \overline{XY} \end{bmatrix} \\
&= \frac{1}{s_X^2} \begin{bmatrix} \overline{X^2}\,\bar{Y} - \bar{X}\,\overline{XY} \\ \overline{XY} - \bar{X}\,\bar{Y} \end{bmatrix} \\
&= \frac{1}{s_X^2} \begin{bmatrix} (s_X^2 + \bar{X}^2)\bar{Y} - \bar{X}(c_{XY} + \bar{X}\,\bar{Y}) \\ c_{XY} \end{bmatrix} \\
&= \begin{bmatrix} \bar{Y} - \bar{X}\,c_{XY}/s_X^2 \\ c_{XY}/s_X^2 \end{bmatrix}
= \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix},
\end{aligned}
\]
using $\overline{X^2} = s_X^2 + \bar{X}^2$ and $\overline{XY} = c_{XY} + \bar{X}\,\bar{Y}$.

3 Fitted Values and Residuals

Remember that when the coefficient vector is $\beta$, the point predictions (fitted values) for each data point are $X\beta$. Thus the vector of fitted values is $\hat{Y} \equiv \hat{m}(X) \equiv \hat{m} = X\hat{\beta}$. Using our equation for $\hat{\beta}$, we then have
\[
\hat{Y} = X\hat{\beta} = X(X^T X)^{-1} X^T Y = HY,
\]
where
\[
H \equiv X(X^T X)^{-1} X^T
\]
is called the hat matrix or the influence matrix.

Let's look at some of the properties of the hat matrix.

1. Influence. Check that $\partial \hat{Y}_i / \partial Y_j = H_{ij}$. Thus, $H_{ij}$ is the rate at which the $i$th fitted value changes as we vary the $j$th observation, the "influence" that observation has on that fitted value.

2. Symmetry. It's easy to see that $H^T = H$.

3. Idempotency. Check that $H^2 = H$, so the matrix is idempotent.

Geometry. A symmetric, idempotent matrix is a projection matrix. This means that $H$ projects $Y$ into a lower-dimensional subspace. Specifically, $Y$ is a point in $\mathbb{R}^n$, but $\hat{Y} = HY$ is a linear combination of two vectors, namely, the two columns of $X$. In other words: $H$ projects $Y$ onto the column space of $X$. The column space of $X$ is the set of vectors that can be written as linear combinations of the columns of $X$.
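
The hat matrix properties can be verified directly. A short sketch on invented data (forming $H$ explicitly is fine at this size, though one would avoid it for large $n$):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)               # invented toy data
X = np.column_stack([np.ones(n), x])

H = X @ np.linalg.inv(X.T @ X) @ X.T                 # hat matrix H = X (X^T X)^{-1} X^T
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(H, H.T))                           # symmetry: H^T = H
print(np.allclose(H @ H, H))                         # idempotency: H^2 = H
print(np.allclose(H @ y, X @ beta_hat))              # fitted values: H Y = X beta-hat
print(np.isclose(np.trace(H), 2.0))                  # tr(H) equals the number of columns of X
```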

3.1 Residuals

The vector of residuals, $e$, is
\[
e \equiv Y - \hat{Y} = Y - HY = (I - H)Y.
\]
Here are some properties of $I - H$:

1. Influence. $\partial e_i / \partial y_j = (I - H)_{ij}$.

2. Symmetry. $(I - H)^T = I - H$.

3. Idempotency. $(I - H)^2 = (I - H)(I - H) = I - H - H + H^2$. But, since $H$ is idempotent, $H^2 = H$, and thus $(I - H)^2 = I - H$.

Thus,
\[
MSE(\hat{\beta}) = \frac{1}{n} Y^T (I - H)^T (I - H) Y = \frac{1}{n} Y^T (I - H) Y.
\]

3.2 Expectations and Covariances

Remember that $Y = X\beta + \epsilon$, where $\epsilon$ is an $n \times 1$ matrix of random variables, with mean vector 0 and variance-covariance matrix $\sigma^2 I$. What can we deduce from this?

First, the expectation of the fitted values:
\[
E[\hat{Y}] = E[HY] = H E[Y] = HX\beta + H E[\epsilon] = X(X^T X)^{-1} X^T X \beta + 0 = X\beta.
\]
Next, the variance-covariance of the fitted values:
\[
\mathrm{Var}[\hat{Y}] = \mathrm{Var}[HY] = \mathrm{Var}[H(X\beta + \epsilon)] = \mathrm{Var}[H\epsilon] = H\,\mathrm{Var}[\epsilon]\,H^T = \sigma^2 HIH = \sigma^2 H,
\]
using the symmetry and idempotency of $H$.

Similarly, the expected residual vector is zero:
\[
E[e] = (I - H)(X\beta + E[\epsilon]) = X\beta - X\beta = 0.
\]
The variance-covariance matrix of the residuals is
\[
\begin{aligned}
\mathrm{Var}[e] &= \mathrm{Var}[(I - H)(X\beta + \epsilon)] \\
&= \mathrm{Var}[(I - H)\epsilon] \\
&= (I - H)\,\mathrm{Var}[\epsilon]\,(I - H)^T \\
&= \sigma^2 (I - H)(I - H)^T \\
&= \sigma^2 (I - H).
\end{aligned}
\]
Thus, the variance of each residual is not quite $\sigma^2$, nor are the residuals exactly uncorrelated.

Finally, the expected MSE is
\[
E\left[ \frac{1}{n} e^T e \right] = \frac{1}{n} E[\epsilon^T (I - H) \epsilon].
\]
We know that this must be $(n-2)\sigma^2/n$.
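
A simulation sketch, with an invented fixed design and $\sigma$, illustrating that the average of $\frac{1}{n}e^Te$ over many replications is close to $(n-2)\sigma^2/n$, and that the individual residual variances $\sigma^2(1 - H_{ii})$ fall somewhat below $\sigma^2$ and are not all equal:

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma, n_sim = 25, 2.0, 20_000
x = rng.uniform(0, 5, size=n)          # one fixed design, reused in every replication
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T
I_H = np.eye(n) - H
beta = np.array([1.0, -2.0])

mses = np.empty(n_sim)
for s in range(n_sim):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    e = I_H @ y                        # residual vector (I - H) Y
    mses[s] = (e @ e) / n

print(mses.mean(), (n - 2) * sigma**2 / n)   # average in-sample MSE vs (n-2) sigma^2 / n
resid_var = sigma**2 * (1 - np.diag(H))      # Var[e_i] = sigma^2 (1 - H_ii)
print(resid_var.min(), resid_var.max(), sigma**2)
```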

4 Sampling Distribution of Estimators

Let's now assume that $\epsilon_i \sim N(0, \sigma^2)$, and that the $\epsilon_i$ are independent of each other and of $X$. The vector of all $n$ noise terms, $\epsilon$, is an $n \times 1$ matrix. Its distribution is a multivariate Gaussian or multivariate Normal with mean vector 0 and variance-covariance matrix $\sigma^2 I$. We write this as $\epsilon \sim N(0, \sigma^2 I)$.

We may use this to get the sampling distribution of the estimator $\hat{\beta}$:
\[
\begin{aligned}
\hat{\beta} &= (X^T X)^{-1} X^T Y \\
&= (X^T X)^{-1} X^T (X\beta + \epsilon) \\
&= (X^T X)^{-1} X^T X \beta + (X^T X)^{-1} X^T \epsilon \\
&= \beta + (X^T X)^{-1} X^T \epsilon.
\end{aligned}
\]
Since $\epsilon$ is Gaussian and is being multiplied by a non-random matrix, $(X^T X)^{-1} X^T \epsilon$ is also Gaussian. Its mean vector is
\[
E[(X^T X)^{-1} X^T \epsilon] = (X^T X)^{-1} X^T E[\epsilon] = 0,
\]
while its variance matrix is
\[
\begin{aligned}
\mathrm{Var}[(X^T X)^{-1} X^T \epsilon] &= (X^T X)^{-1} X^T \,\mathrm{Var}[\epsilon]\, \left( (X^T X)^{-1} X^T \right)^T \\
&= (X^T X)^{-1} X^T \,\sigma^2 I\, X (X^T X)^{-1} \\
&= \sigma^2 (X^T X)^{-1} X^T X (X^T X)^{-1} \\
&= \sigma^2 (X^T X)^{-1}.
\end{aligned}
\]
Since $\mathrm{Var}[\hat{\beta}] = \mathrm{Var}[(X^T X)^{-1} X^T \epsilon]$, we conclude that
\[
\hat{\beta} \sim N\left( \beta, \sigma^2 (X^T X)^{-1} \right).
\]
Re-writing slightly,
\[
\hat{\beta} \sim N\left( \beta, \frac{\sigma^2}{n} (n^{-1} X^T X)^{-1} \right)
\]
will make it easier to prove to yourself that, according to this, $\hat{\beta}_0$ and $\hat{\beta}_1$ are both unbiased, that
\[
\mathrm{Var}[\hat{\beta}_1] = \frac{\sigma^2}{n s_X^2}, \qquad \mathrm{Var}[\hat{\beta}_0] = \frac{\sigma^2}{n} \left( 1 + \frac{\bar{X}^2}{s_X^2} \right).
\]
This will also give us $\mathrm{Cov}[\hat{\beta}_0, \hat{\beta}_1]$, which otherwise would be tedious to calculate.

I will leave you to show, in a similar way, that the fitted values $HY$ are multivariate Gaussian, as are the residuals $e$, and to find both their mean vectors and their variance matrices.
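
A Monte Carlo sketch (design, $\beta$, and $\sigma$ invented for illustration) showing that the simulated sampling distribution of $\hat{\beta}$ has mean $\beta$ and covariance close to $\sigma^2 (X^TX)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma, n_sim = 40, 1.5, 50_000
x = rng.normal(size=n)                        # fixed design across replications
X = np.column_stack([np.ones(n), x])
beta = np.array([0.5, 2.0])

eps = rng.normal(scale=sigma, size=(n_sim, n))
Y = X @ beta + eps                            # each row is one simulated response vector
betas = np.linalg.solve(X.T @ X, X.T @ Y.T).T # each row is the beta-hat from that replication

print(betas.mean(axis=0), beta)               # unbiased: mean of beta-hat is close to beta
print(np.cov(betas.T))                        # simulated covariance of beta-hat ...
print(sigma**2 * np.linalg.inv(X.T @ X))      # ... close to sigma^2 (X^T X)^{-1}
```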

5 Derivatives with Respect to Vectors

This is a brief review of basic vector calculus. Consider some scalar function of a vector, say $f(X)$, where $X$ is represented as a $p \times 1$ matrix. (Here $X$ is just being used as a place-holder or generic variable; it's not necessarily the design matrix of a regression.) We would like to think about the derivatives of $f$ with respect to $X$. We can write $f(X) = f(x_1, \ldots, x_p)$, where $X = (x_1, \ldots, x_p)^T$.

The gradient of $f$ is the vector of partial derivatives:
\[
\nabla f \equiv \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\ \dfrac{\partial f}{\partial x_2} \\ \vdots \\ \dfrac{\partial f}{\partial x_p} \end{bmatrix}.
\]
The first-order Taylor series of $f$ around $X_0$ is
\[
f(X) \approx f(X_0) + \sum_i (X - X_0)_i \left. \frac{\partial f}{\partial x_i} \right|_{X_0}
= f(X_0) + (X - X_0)^T \nabla f(X_0).
\]
Here are some properties of the gradient:

1. Linearity.
\[
\nabla\left( a f(X) + b g(X) \right) = a \nabla f(X) + b \nabla g(X).
\]
Proof: Directly from the linearity of partial derivatives.

2. Linear forms. If $f(X) = X^T a$, with $a$ not a function of $X$, then
\[
\nabla (X^T a) = a.
\]
Proof: $f(X) = \sum_i x_i a_i$, so $\partial f / \partial x_i = a_i$. Notice that $a$ was already a $p \times 1$ matrix, so we don't have to transpose anything to get the derivative.

3. Linear forms the other way. If $f(X) = bX$, with $b$ not a function of $X$, then
\[
\nabla (bX) = b^T.
\]
Proof: Once again, $\partial f / \partial x_i = b_i$, but now remember that $b$ was a $1 \times p$ matrix, and $\nabla f$ is $p \times 1$, so we need to transpose.

4. Quadratic forms. Let $C$ be a $p \times p$ matrix which is not a function of $X$, and consider the quadratic form $X^T C X$. (You can check that this is scalar.) The gradient is
\[
\nabla (X^T C X) = (C + C^T) X.
\]
Proof: First, write out the matrix multiplications as explicit sums:
\[
X^T C X = \sum_j \sum_k x_j c_{jk} x_k.
\]
Now take the derivative with respect to $x_i$:
\[
\frac{\partial f}{\partial x_i} = \sum_j \sum_k \frac{\partial}{\partial x_i} \left( x_j c_{jk} x_k \right).
\]

If $j = k = i$, the term in the inner sum is $2 c_{ii} x_i$. If $j = i$ but $k \neq i$, the term in the inner sum is $c_{ik} x_k$. If $k = i$ but $j \neq i$, we get $x_j c_{ji}$. Finally, if $j \neq i$ and $k \neq i$, we get zero. The $j = i$ terms add up to $(CX)_i$. The $k = i$ terms add up to $(C^T X)_i$. (This splits the $2 c_{ii} x_i$ evenly between them.) Thus,
\[
\frac{\partial f}{\partial x_i} = \left( (C + C^T) X \right)_i
\]
and
\[
\nabla f = (C + C^T) X.
\]
(Check that this has the right dimensions.)

5. Symmetric quadratic forms. If $C = C^T$, then
\[
\nabla (X^T C X) = 2 C X.
\]

5.1 Second Derivatives

The $p \times p$ matrix of second partial derivatives is called the Hessian. I won't step through its properties, except to note that they, too, follow from the basic rules for partial derivatives.

5.2 Maxima and Minima

We need all the partial derivatives to be equal to zero at a minimum or maximum. This means that the gradient must be zero there. At a minimum, the Hessian must be positive-definite (so that moves away from the minimum always increase the function); at a maximum, the Hessian must be negative-definite (so moves away always decrease the function). If the Hessian is neither positive nor negative definite, the point is neither a minimum nor a maximum but a "saddle" (since moving in some directions increases the function but moving in others decreases it, as though one were at the center of a horse's saddle).
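
The gradient formula for quadratic forms, and the claim that $\hat{\beta}$ is a minimum, can both be checked numerically. The sketch below uses a finite-difference gradient for an arbitrary $C$, and then verifies that the Hessian of the MSE, which from the gradient in Section 2.3 is $\frac{2}{n}X^TX$, is positive definite for a full-rank design (all values are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
p = 4
C = rng.normal(size=(p, p))       # a generic, not necessarily symmetric, p x p matrix
x0 = rng.normal(size=p)

def quad_form(x):
    return x @ C @ x              # the scalar x^T C x

# Central-difference gradient vs. the formula (C + C^T) x
h = 1e-6
E = np.eye(p)
grad_fd = np.array([(quad_form(x0 + h * E[i]) - quad_form(x0 - h * E[i])) / (2 * h)
                    for i in range(p)])
print(np.allclose(grad_fd, (C + C.T) @ x0, atol=1e-5))

# Differentiating the gradient (2/n)(X^T X beta - X^T Y) once more gives the Hessian
# (2/n) X^T X, which is positive definite whenever X has full column rank,
# so the critical point beta-hat is a minimum.
n = 50
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
hessian = (2.0 / n) * (X.T @ X)
print(np.all(np.linalg.eigvalsh(hessian) > 0))   # all eigenvalues positive
```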
