Lecture 13: Simple Linear Regression in Matrix Format

36-401, Section B, Fall 2015. 13 October 2015. See updates and corrections at http://www.stat.cmu.edu/~cshalizi/mreg/
To move beyond simple regression we need to use matrix algebra. We'll start by re-expressing simple linear regression in matrix form. Linear algebra is a pre-requisite for this class; I strongly urge you to go back to your textbook and notes for review.

1 Expectations and Variances with Vectors and Matrices

If we have $p$ random variables, $Z_1, Z_2, \ldots, Z_p$, we can put them into a random vector $Z = [Z_1\ Z_2\ \ldots\ Z_p]^T$. This random vector can be thought of as a $p \times 1$ matrix of random variables. The expected value of $Z$ is defined to be the vector

$$\mu = E[Z] = \begin{bmatrix} E[Z_1] \\ E[Z_2] \\ \vdots \\ E[Z_p] \end{bmatrix} \quad (1)$$

If $a$ and $b$ are non-random scalars, then

$$E[aZ + bW] = aE[Z] + bE[W] \quad (2)$$

If $a$ is a non-random vector, then $E[a^T Z] = a^T E[Z]$. If $A$ is a non-random matrix, then

$$E[AZ] = AE[Z] \quad (3)$$

Every coordinate of a random vector has some covariance with every other coordinate. The variance-covariance matrix of $Z$ is the $p \times p$ matrix which stores these values. In other words,

$$Var[Z] = \begin{bmatrix} Var[Z_1] & Cov[Z_1, Z_2] & \ldots & Cov[Z_1, Z_p] \\ Cov[Z_2, Z_1] & Var[Z_2] & \ldots & Cov[Z_2, Z_p] \\ \vdots & \vdots & \ddots & \vdots \\ Cov[Z_p, Z_1] & Cov[Z_p, Z_2] & \ldots & Var[Z_p] \end{bmatrix} \quad (4)$$

This inherits properties of ordinary variances and covariances. Just as $Var[Z] = E[Z^2] - (E[Z])^2$, we have

$$Var[Z] = E[ZZ^T] - E[Z](E[Z])^T \quad (5)$$

For a non-random vector $a$ and a non-random scalar $b$,

$$Var[a + bZ] = b^2 Var[Z] \quad (6)$$

For a non-random matrix $C$,

$$Var[CZ] = C\, Var[Z]\, C^T \quad (7)$$

(Check that the dimensions all conform here: if $C$ is $q \times p$, $Var[CZ]$ should be $q \times q$, and so is the right-hand side.)
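Since the later derivations lean on rule (7) repeatedly, it is worth seeing it verified once by simulation. Here is a minimal numpy sketch; the matrices, seed, and simulation size are arbitrary illustrations, not anything from the running example.

```python
# Numerical check of the rule Var[CZ] = C Var[Z] C^T by simulation.
import numpy as np

rng = np.random.default_rng(0)
p, n_sim = 3, 200_000

# A fixed (non-random) q x p matrix C, and a random vector Z with a
# known variance-covariance matrix Sigma (any positive-definite matrix).
C = np.array([[1.0, -2.0, 0.5],
              [0.0,  1.0, 3.0]])
A = rng.normal(size=(p, p))
Sigma = A @ A.T

Z = rng.multivariate_normal(mean=np.zeros(p), cov=Sigma, size=n_sim)
CZ = Z @ C.T                          # each row is C z for one draw

empirical = np.cov(CZ, rowvar=False)  # q x q sample variance matrix
theoretical = C @ Sigma @ C.T         # the claimed formula

print(np.round(empirical, 2))
print(np.round(theoretical, 2))       # agree up to Monte Carlo error
```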
A random vector $Z$ has a multivariate Normal distribution with mean $\mu$ and variance $\Sigma$ if its density is

$$f(z) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(z - \mu)^T \Sigma^{-1} (z - \mu)\right)$$

We write this as $Z \sim N(\mu, \Sigma)$ or $Z \sim MVN(\mu, \Sigma)$.

If $A$ is a square matrix, then the trace of $A$, denoted by $\mathrm{tr}\,A$, is defined to be the sum of the diagonal elements. In other words,

$$\mathrm{tr}\,A = \sum_j A_{jj}$$

Recall that the trace satisfies these properties:

$$\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B), \quad \mathrm{tr}(cA) = c\,\mathrm{tr}(A), \quad \mathrm{tr}(A^T) = \mathrm{tr}(A)$$

and we have the cyclic property

$$\mathrm{tr}(ABC) = \mathrm{tr}(BCA) = \mathrm{tr}(CAB)$$

If $C$ is non-random, then $Z^T C Z$ is called a quadratic form. We have that

$$E[Z^T C Z] = E[Z]^T C E[Z] + \mathrm{tr}[C\, Var[Z]] \quad (8)$$

To see this, notice that

$$Z^T C Z = \mathrm{tr}[Z^T C Z] \quad (9)$$

because it's a $1 \times 1$ matrix. But the trace of a matrix product doesn't change when we cyclically permute the matrices, so

$$Z^T C Z = \mathrm{tr}[C Z Z^T] \quad (10)$$

Therefore

$$E[Z^T C Z] = E[\mathrm{tr}[C Z Z^T]] \quad (11)$$
$$= \mathrm{tr}[E[C Z Z^T]] \quad (12)$$
$$= \mathrm{tr}[C E[Z Z^T]] \quad (13)$$
$$= \mathrm{tr}[C (Var[Z] + E[Z] E[Z]^T)] \quad (14)$$
$$= \mathrm{tr}[C\, Var[Z]] + \mathrm{tr}[C E[Z] E[Z]^T] \quad (15)$$
$$= \mathrm{tr}[C\, Var[Z]] + \mathrm{tr}[E[Z]^T C E[Z]] \quad (16)$$
$$= \mathrm{tr}[C\, Var[Z]] + E[Z]^T C E[Z] \quad (17)$$

using the fact that tr is a linear operation, so it commutes with taking expectations; the decomposition of $Var[Z]$; the cyclic permutation trick again; and finally dropping tr from a scalar.

Unfortunately, there is generally no simple formula for the variance of a quadratic form, unless the random vector is Gaussian. If $Z \sim N(\mu, \Sigma)$, then

$$Var(Z^T C Z) = 2\,\mathrm{tr}(C \Sigma C \Sigma) + 4 \mu^T C \Sigma C \mu$$

2 Least Squares in Matrix Form

Our data consists of $n$ paired observations of the predictor variable $X$ and the response variable $Y$, i.e., $(X_1, Y_1), \ldots, (X_n, Y_n)$. We wish to fit the model

$$Y = \beta_0 + \beta_1 X + \epsilon \quad (18)$$

where $E[\epsilon | X = x] = 0$, $Var[\epsilon | X = x] = \sigma^2$, and $\epsilon$ is uncorrelated across measurements.
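Both equation (8) and the Gaussian variance formula can be checked by simulation. In the numpy sketch below (illustrative numbers only) $C$ is built to be symmetric, since replacing $C$ by $(C + C^T)/2$ leaves the quadratic form unchanged and the variance formula above is stated for that symmetric version.

```python
# Monte Carlo check of E[Z'CZ] = E[Z]'C E[Z] + tr(C Var[Z]), and of
# Var(Z'CZ) = 2 tr(C S C S) + 4 mu' C S C mu for Gaussian Z, symmetric C.
import numpy as np

rng = np.random.default_rng(1)
p, n_sim = 3, 500_000

mu = np.array([1.0, -1.0, 2.0])
A = rng.normal(size=(p, p))
Sigma = A @ A.T                       # positive-definite variance matrix
B = rng.normal(size=(p, p))
C = B + B.T                           # symmetric, as the formula assumes

Z = rng.multivariate_normal(mu, Sigma, size=n_sim)
quad = np.einsum('ij,jk,ik->i', Z, C, Z)    # z'Cz for every draw

print(quad.mean(), mu @ C @ mu + np.trace(C @ Sigma))        # eq. (8)
print(quad.var(),
      2 * np.trace(C @ Sigma @ C @ Sigma)
      + 4 * mu @ C @ Sigma @ C @ mu)                         # Gaussian case
```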
2.1 The Basic Matrices

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \quad X = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}, \quad \epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} \quad (19)$$

Note that $X$ — which is called the design matrix — is an $n \times 2$ matrix, where the first column is always 1, and the second column contains the actual observations of $X$. Now

$$X\beta = \begin{bmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \vdots \\ \beta_0 + \beta_1 X_n \end{bmatrix} \quad (20)$$

So we can write the set of equations $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$, $i = 1, \ldots, n$ in the simpler form $Y = X\beta + \epsilon$.

2.2 Mean Squared Error

Let

$$e = e(\beta) = Y - X\beta \quad (21)$$

The training error (or observed mean squared error) is

$$MSE(\beta) = \frac{1}{n}\sum_{i=1}^{n} e_i^2(\beta), \quad \text{i.e.,} \quad MSE(\beta) = \frac{1}{n} e^T e \quad (22)$$

Let us expand this a little for further use. We have:

$$MSE(\beta) = \frac{1}{n} e^T e \quad (23)$$
$$= \frac{1}{n}(Y - X\beta)^T (Y - X\beta) \quad (24)$$
$$= \frac{1}{n}(Y^T - \beta^T X^T)(Y - X\beta) \quad (25)$$
$$= \frac{1}{n}\left(Y^T Y - Y^T X\beta - \beta^T X^T Y + \beta^T X^T X \beta\right) \quad (26)$$
$$= \frac{1}{n}\left(Y^T Y - 2\beta^T X^T Y + \beta^T X^T X \beta\right) \quad (27)$$

where we used the fact that $\beta^T X^T Y = (Y^T X\beta)^T = Y^T X\beta$.
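As a sanity check, here is a small numpy sketch that builds the design matrix for simulated data and evaluates the matrix form of equation (22); the data-generating values ($\beta_0 = 1$, $\beta_1 = 2$, $\sigma = 0.5$) and the helper name `mse` are illustrative choices.

```python
# Build the n x 2 design matrix and evaluate MSE(beta) = (1/n) e'e.
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])   # column of 1s, then the observed x

def mse(beta):
    e = y - X @ beta                   # residual vector e = Y - X beta
    return (e @ e) / n                 # (1/n) e'e, equation (22)

print(mse(np.array([1.0, 2.0])))       # near sigma^2 = 0.25 at the truth
print(mse(np.array([0.0, 0.0])))       # much larger away from it
```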
2.3 Minimizing the MSE

First, we find the gradient of the MSE with respect to $\beta$:

$$\nabla MSE(\beta) = \frac{1}{n}\left(\nabla Y^T Y - 2\nabla \beta^T X^T Y + \nabla \beta^T X^T X \beta\right) \quad (28)$$
$$= \frac{1}{n}\left(0 - 2X^T Y + 2X^T X \beta\right) \quad (29)$$
$$= \frac{2}{n}\left(X^T X \beta - X^T Y\right) \quad (30)$$

We now set this to zero at the optimum, $\hat\beta$:

$$X^T X \hat\beta - X^T Y = 0 \quad (31)$$

This equation, for the two-dimensional vector $\hat\beta$, corresponds to our pair of normal or estimating equations for $\hat\beta_0$ and $\hat\beta_1$. Thus, it, too, is called an estimating equation. Solving, we get

$$\hat\beta = (X^T X)^{-1} X^T Y \quad (32)$$

That is, we've got one matrix equation which gives us both coefficient estimates.

If this is right, the equation we've got above should in fact reproduce the least-squares estimates we've already derived, which are of course

$$\hat\beta_1 = \frac{c_{XY}}{s_X^2}, \quad \hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X} \quad (33)$$

Let's see if that's right. As a first step, let's introduce normalizing factors of $1/n$ into both the matrix products:

$$\hat\beta = (n^{-1} X^T X)^{-1} (n^{-1} X^T Y) \quad (34)$$

Now

$$\frac{1}{n} X^T Y = \frac{1}{n}\begin{bmatrix} 1 & 1 & \ldots & 1 \\ X_1 & X_2 & \ldots & X_n \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \frac{1}{n}\begin{bmatrix} \sum_i Y_i \\ \sum_i X_i Y_i \end{bmatrix} \quad (35)$$
$$= \begin{bmatrix} \bar{Y} \\ \overline{XY} \end{bmatrix} \quad (36)$$

Similarly,

$$\frac{1}{n} X^T X = \frac{1}{n}\begin{bmatrix} n & \sum_i X_i \\ \sum_i X_i & \sum_i X_i^2 \end{bmatrix} = \begin{bmatrix} 1 & \bar{X} \\ \bar{X} & \overline{X^2} \end{bmatrix} \quad (37)$$

Hence,

$$\left(\frac{1}{n} X^T X\right)^{-1} = \frac{1}{\overline{X^2} - \bar{X}^2}\begin{bmatrix} \overline{X^2} & -\bar{X} \\ -\bar{X} & 1 \end{bmatrix} = \frac{1}{s_X^2}\begin{bmatrix} \overline{X^2} & -\bar{X} \\ -\bar{X} & 1 \end{bmatrix}$$

since $\overline{X^2} - \bar{X}^2 = s_X^2$.
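The agreement can also be confirmed numerically. The numpy sketch below (same simulated data scheme as above) solves the estimating equation (31) with `np.linalg.solve` — preferable in practice to forming the inverse explicitly — and compares the result to the scalar formulas (33). The `bias=True` / default-`ddof` options give the divide-by-$n$ versions of $c_{XY}$ and $s_X^2$ used in these notes.

```python
# Matrix solution of (X'X) beta = X'Y versus the scalar formulas (33).
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # estimating equation (31)

c_xy = np.cov(x, y, bias=True)[0, 1]           # sample covariance (1/n)
s2_x = np.var(x)                               # sample variance (1/n)
beta1 = c_xy / s2_x
beta0 = y.mean() - beta1 * x.mean()

print(beta_hat)            # matrix solution
print(beta0, beta1)        # scalar solution -- identical up to rounding
```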
Therefore,

$$(X^T X)^{-1} X^T Y = \frac{1}{s_X^2}\begin{bmatrix} \overline{X^2} & -\bar{X} \\ -\bar{X} & 1 \end{bmatrix} \begin{bmatrix} \bar{Y} \\ \overline{XY} \end{bmatrix} \quad (38)$$
$$= \frac{1}{s_X^2}\begin{bmatrix} \overline{X^2}\,\bar{Y} - \bar{X}\,\overline{XY} \\ \overline{XY} - \bar{X}\bar{Y} \end{bmatrix} \quad (39)$$
$$= \frac{1}{s_X^2}\begin{bmatrix} (s_X^2 + \bar{X}^2)\bar{Y} - \bar{X}(c_{XY} + \bar{X}\bar{Y}) \\ c_{XY} \end{bmatrix} \quad (40)$$
$$= \frac{1}{s_X^2}\begin{bmatrix} s_X^2\,\bar{Y} - \bar{X}\,c_{XY} \\ c_{XY} \end{bmatrix} \quad (41)$$
$$= \begin{bmatrix} \bar{Y} - \bar{X}\,c_{XY}/s_X^2 \\ c_{XY}/s_X^2 \end{bmatrix} = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \end{bmatrix} \quad (42)$$

using $\overline{XY} = c_{XY} + \bar{X}\bar{Y}$ and $\overline{X^2} = s_X^2 + \bar{X}^2$.

3 Fitted Values and Residuals

Remember that when the coefficient vector is $\beta$, the point predictions (fitted values) for each data point are $X\beta$. Thus the vector of fitted values is $\hat{Y} = \hat{m}(X) = \hat{m} = X\hat\beta$. Using our equation for $\hat\beta$, we then have

$$\hat{Y} = X\hat\beta = X(X^T X)^{-1} X^T Y = HY$$

where

$$H = X(X^T X)^{-1} X^T \quad (43)$$

is called the hat matrix or the influence matrix. Let's look at some of the properties of the hat matrix.

1. Influence. Check that $\partial \hat{Y}_i / \partial Y_j = H_{ij}$. Thus, $H_{ij}$ is the rate at which the $i$-th fitted value changes as we vary the $j$-th observation, the "influence" that observation has on that fitted value.

2. Symmetry. It's easy to see that $H^T = H$.

3. Idempotency. Check that $H^2 = H$, so the matrix is idempotent.

Geometry. A symmetric, idempotent matrix is a projection matrix. This means that $H$ projects $Y$ into a lower-dimensional subspace. Specifically, $Y$ is a point in $\mathbb{R}^n$, but $\hat{Y} = HY$ is a linear combination of two vectors, namely, the two columns of $X$. In other words: $H$ projects $Y$ onto the column space of $X$. The column space of $X$ is the set of vectors that can be written as linear combinations of the columns of $X$.
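These properties are easy to confirm numerically. A minimal numpy sketch on simulated data (illustrative numbers; the trace check uses the standard fact that the trace of a projection matrix equals the dimension of the space it projects onto, here the 2 columns of $X$):

```python
# Hat-matrix checks: symmetry, idempotency, trace, and HY = fitted values.
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])

H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix, equation (43)

print(np.allclose(H, H.T))                 # symmetry
print(np.allclose(H @ H, H))               # idempotency
print(np.isclose(np.trace(H), 2.0))        # tr H = number of columns of X

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(H @ y, X @ beta_hat))    # HY equals the fitted values
```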
3.1 Residuals

The vector of residuals, $e$, is

$$e = Y - \hat{Y} = Y - HY = (I - H)Y \quad (44)$$

Here are some properties of $I - H$:

1. Influence. $\partial e_i / \partial y_j = (I - H)_{ij}$.

2. Symmetry. $(I - H)^T = I - H$.

3. Idempotency. $(I - H)^2 = (I - H)(I - H) = I - H - H + H^2$. But, since $H$ is idempotent, $H^2 = H$, and thus $(I - H)^2 = I - H$.

Thus,

$$MSE(\hat\beta) = \frac{1}{n} Y^T (I - H)^T (I - H) Y = \frac{1}{n} Y^T (I - H) Y \quad (45)$$

3.2 Expectations and Covariances

Remember that $Y = X\beta + \epsilon$, where $\epsilon$ is an $n \times 1$ matrix of random variables, with mean vector 0 and variance-covariance matrix $\sigma^2 I$. What can we deduce from this?

First, the expectation of the fitted values:

$$E[\hat{Y}] = E[HY] = H E[Y] = HX\beta + H E[\epsilon] \quad (46)$$
$$= X(X^T X)^{-1} X^T X \beta + 0 = X\beta \quad (47)$$

Next, the variance-covariance of the fitted values:

$$Var[\hat{Y}] = Var[HY] = Var[H(X\beta + \epsilon)] \quad (48)$$
$$= Var[H\epsilon] = H\, Var[\epsilon]\, H^T = \sigma^2 HIH = \sigma^2 H \quad (49)$$

using the symmetry and idempotency of $H$.

Similarly, the expected residual vector is zero:

$$E[e] = (I - H)(X\beta + E[\epsilon]) = X\beta - X\beta = 0 \quad (50)$$

The variance-covariance matrix of the residuals:

$$Var[e] = Var[(I - H)(X\beta + \epsilon)] \quad (51)$$
$$= Var[(I - H)\epsilon] \quad (52)$$
$$= (I - H)\, Var[\epsilon]\, (I - H)^T \quad (53)$$
$$= \sigma^2 (I - H)(I - H)^T \quad (54)$$
$$= \sigma^2 (I - H) \quad (55)$$

Thus, the variance of each residual is not quite $\sigma^2$, nor are the residuals exactly uncorrelated.

Finally, the expected MSE is

$$E\left[\frac{1}{n} e^T e\right] = \frac{1}{n} E[\epsilon^T (I - H) \epsilon] \quad (56)$$

We know that this must be $(n-2)\sigma^2/n$.
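A simulation with a fixed design makes both facts visible: the empirical variance of each residual matches the corresponding diagonal entry of $\sigma^2(I - H)$, and the average MSE is close to $(n-2)\sigma^2/n$. A numpy sketch with illustrative numbers:

```python
# Replicate the noise many times with a fixed design; compare residual
# variances to sigma^2 (I - H) and the mean MSE to (n - 2) sigma^2 / n.
import numpy as np

rng = np.random.default_rng(3)
n, sigma, n_rep = 20, 0.5, 100_000
x = rng.uniform(0, 10, size=n)              # design held fixed throughout
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T
I_H = np.eye(n) - H

eps = rng.normal(scale=sigma, size=(n_rep, n))
Y = 1.0 + 2.0 * x + eps                     # broadcast over replications
E = Y @ I_H.T                               # residuals e = (I - H) Y, all reps

print(np.round(E.var(axis=0)[:5], 4))                    # empirical Var[e_i]
print(np.round((sigma**2 * I_H).diagonal()[:5], 4))      # sigma^2 (1 - H_ii)
print((E**2).mean(), (n - 2) * sigma**2 / n)             # E[MSE] check
```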
4 Sampling Distribution of Estimators

Let's now assume that $\epsilon_i \sim N(0, \sigma^2)$, and are independent of each other and of $X$. The vector of all $n$ noise terms, $\epsilon$, is an $n \times 1$ matrix. Its distribution is a multivariate Gaussian or multivariate Normal with mean vector 0 and variance-covariance matrix $\sigma^2 I$. We write this as $\epsilon \sim N(0, \sigma^2 I)$.

We may use this to get the sampling distribution of the estimator $\hat\beta$:

$$\hat\beta = (X^T X)^{-1} X^T Y \quad (57)$$
$$= (X^T X)^{-1} X^T (X\beta + \epsilon) \quad (58)$$
$$= (X^T X)^{-1} X^T X \beta + (X^T X)^{-1} X^T \epsilon \quad (59)$$
$$= \beta + (X^T X)^{-1} X^T \epsilon \quad (60)$$

Since $\epsilon$ is Gaussian and is being multiplied by a non-random matrix, $(X^T X)^{-1} X^T \epsilon$ is also Gaussian. Its mean vector is

$$E[(X^T X)^{-1} X^T \epsilon] = (X^T X)^{-1} X^T E[\epsilon] = 0 \quad (61)$$

while its variance matrix is

$$Var[(X^T X)^{-1} X^T \epsilon] = (X^T X)^{-1} X^T\, Var[\epsilon]\, \left((X^T X)^{-1} X^T\right)^T \quad (62)$$
$$= (X^T X)^{-1} X^T \sigma^2 I X (X^T X)^{-1} \quad (63)$$
$$= \sigma^2 (X^T X)^{-1} X^T X (X^T X)^{-1} \quad (64)$$
$$= \sigma^2 (X^T X)^{-1} \quad (65)$$

Since $Var[\hat\beta] = Var[(X^T X)^{-1} X^T \epsilon]$, we conclude that

$$\hat\beta \sim N(\beta, \sigma^2 (X^T X)^{-1}) \quad (66)$$

Re-writing slightly,

$$\hat\beta \sim N\left(\beta, \frac{\sigma^2}{n}\left(\frac{1}{n} X^T X\right)^{-1}\right) \quad (67)$$

will make it easier to prove to yourself that, according to this, $\hat\beta_0$ and $\hat\beta_1$ are both unbiased, that $Var[\hat\beta_1] = \frac{\sigma^2}{n s_X^2}$, and that $Var[\hat\beta_0] = \frac{\sigma^2}{n}\left(1 + \bar{X}^2 / s_X^2\right)$. This will also give us $Cov[\hat\beta_0, \hat\beta_1]$, which otherwise would be tedious to calculate.

I will leave you to show, in a similar way, that the fitted values $HY$ are multivariate Gaussian, as are the residuals $e$, and to find both their mean vectors and their variance matrices.

5 Derivatives with Respect to Vectors

This is a brief review of basic vector calculus. Consider some scalar function of a vector, say $f(X)$, where $X$ is represented as a $p \times 1$ matrix. (Here $X$ is just being used as a place-holder or generic variable; it's not necessarily the design matrix of a regression.) We would like to think about the derivatives of $f$ with respect to $X$. We can write $f(X) = f(x_1, \ldots, x_p)$, where $X = (x_1, \ldots, x_p)^T$.
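Equation (66) is easy to check by Monte Carlo: hold the design fixed, redraw the Gaussian noise many times, and compare the empirical distribution of $\hat\beta$ to $N(\beta, \sigma^2 (X^T X)^{-1})$. A numpy sketch with illustrative numbers:

```python
# Sampling distribution of beta-hat under Gaussian noise, fixed design.
import numpy as np

rng = np.random.default_rng(4)
n, sigma, n_rep = 50, 0.5, 200_000
beta = np.array([1.0, 2.0])

x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

eps = rng.normal(scale=sigma, size=(n_rep, n))
Y = X @ beta + eps                               # one response vector per row
B = Y @ X @ XtX_inv                              # each row is a beta-hat draw

print(B.mean(axis=0))                            # approx beta: unbiased
print(np.round(np.cov(B, rowvar=False), 5))      # empirical variance matrix
print(np.round(sigma**2 * XtX_inv, 5))           # sigma^2 (X'X)^{-1}
```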
The gradient of $f$ is the vector of partial derivatives:

$$\nabla f \equiv \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_p} \end{bmatrix} \quad (68)$$

The first-order Taylor series of $f$ around $X_0$ is

$$f(X) \approx f(X_0) + \sum_i (X - X_0)_i \left.\frac{\partial f}{\partial x_i}\right|_{X = X_0} \quad (69)$$
$$= f(X_0) + (X - X_0)^T \nabla f(X_0) \quad (70)$$

Here are some properties of the gradient:

1. Linearity.
$$\nabla(a f(X) + b g(X)) = a \nabla f(X) + b \nabla g(X) \quad (71)$$
Proof: Directly from the linearity of partial derivatives.

2. Linear forms. If $f(X) = X^T a$, with $a$ not a function of $X$, then
$$\nabla(X^T a) = a \quad (72)$$
Proof: $f(X) = \sum_i x_i a_i$, so $\partial f / \partial x_i = a_i$. Notice that $a$ was already a $p \times 1$ matrix, so we don't have to transpose anything to get the derivative.

3. Linear forms the other way. If $f(X) = bX$, with $b$ not a function of $X$, then
$$\nabla(bX) = b^T \quad (73)$$
Proof: Once again, $\partial f / \partial x_i = b_i$, but now remember that $b$ was a $1 \times p$ matrix, and $\nabla f$ is $p \times 1$, so we need to transpose.

4. Quadratic forms. Let $C$ be a $p \times p$ matrix which is not a function of $X$, and consider the quadratic form $X^T C X$. (You can check that this is scalar.) The gradient is
$$\nabla(X^T C X) = (C + C^T) X \quad (74)$$
Proof: First, write out the matrix multiplications as explicit sums:
$$X^T C X = \sum_j \sum_k x_j c_{jk} x_k \quad (75)$$
Now take the derivative with respect to $x_i$:
$$\frac{\partial f}{\partial x_i} = \sum_j \sum_k \frac{\partial}{\partial x_i}\, x_j c_{jk} x_k \quad (76)$$
If $j = k = i$, the term in the inner sum is $2 c_{ii} x_i$. If $j = i$ but $k \neq i$, the term in the inner sum is $c_{ik} x_k$. If $j \neq i$ but $k = i$, we get $x_j c_{ji}$. Finally, if $j \neq i$ and $k \neq i$, we get zero. The $j = i$ terms add up to $(CX)_i$. The $k = i$ terms add up to $(C^T X)_i$. (This splits the $2 c_{ii} x_i$ evenly between them.) Thus,

$$\frac{\partial f}{\partial x_i} = ((C + C^T) X)_i \quad (77)$$

and

$$\nabla f = (C + C^T) X \quad (78)$$

(Check that this has the right dimensions.)

5. Symmetric quadratic forms. If $C = C^T$, then
$$\nabla(X^T C X) = 2CX \quad (79)$$

5.1 Second Derivatives

The $p \times p$ matrix of second partial derivatives is called the Hessian. I won't step through its properties, except to note that they, too, follow from the basic rules for partial derivatives.

5.2 Maxima and Minima

We need all the partial derivatives to be equal to zero at a minimum or maximum. This means that the gradient must be zero there. At a minimum, the Hessian must be positive-definite (so that moves away from the minimum always increase the function); at a maximum, the Hessian must be negative-definite (so moves away always decrease the function). If the Hessian is neither positive nor negative definite, the point is neither a minimum nor a maximum, but a "saddle" (since moving in some directions increases the function but moving in others decreases it, as though one were at the center of a horse's saddle).
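Both the gradient formula (74) and the role of the Hessian can be checked numerically. For a quadratic form the Hessian is the constant matrix $C + C^T$; in particular, the MSE of Section 2.3 has Hessian $\frac{2}{n} X^T X$, which is positive-definite whenever the columns of $X$ are linearly independent, so the solution of the estimating equation is indeed a minimum. A numpy sketch, with arbitrary matrices and central finite differences (exact for quadratics, up to floating point):

```python
# Finite-difference check of grad(x'Cx) = (C + C')x, plus a look at the
# Hessian C + C' and its eigenvalue signs (min / max / saddle test).
import numpy as np

rng = np.random.default_rng(5)
p = 4
C = rng.normal(size=(p, p))            # not necessarily symmetric
x0 = rng.normal(size=p)

def f(x):
    return x @ C @ x                   # the quadratic form x'Cx

h = 1e-6
grad_fd = np.array([(f(x0 + h * e) - f(x0 - h * e)) / (2 * h)
                    for e in np.eye(p)])
print(np.allclose(grad_fd, (C + C.T) @ x0, atol=1e-5))   # matches (74)

hess = C + C.T                         # constant Hessian of x'Cx
print(np.linalg.eigvalsh(hess))        # all positive: minimum; all negative:
                                       # maximum; mixed signs: saddle
```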