STAT 100C: Linear models


1 STAT 100C: Linear models. Arash A. Amini. April 27.

2 Table of Contents

3 Linear Algebra Review. Read 3.1 and 3.2 from the text.
1. Fundamental subspaces (rank-nullity, etc.): [Im(X)]^⊥ = ker(X^T) ⊆ R^n. Im(X) = C(X) = image = column space = range; ker(X) = N(X) = kernel = null space.
2. Orthogonal decomposition of a space w.r.t. a subspace V: R^n = V ⊕ V^⊥.
3. Spectral decomposition of a symmetric matrix A: A = U Λ U^T, with U orthogonal and Λ diagonal.

4 Table of Contents

5 Linear independence
A set of vectors {x_1, ..., x_n} is linearly dependent if a nontrivial linear combination of them is zero: that is, there exist c_1, ..., c_n, not all zero, such that Σ_{i=1}^n c_i x_i = 0. Otherwise the set is linearly independent.
Example 1. Which of the two is a linearly independent set: {(1, 1), (2, 2)} or {(1, 1), (1, −1)}?

6 Exercise: Show that if a set of nonzero vectors is pairwise orthogonal, then it is linearly independent.
Exercise: Show that if a set of vectors contains the zero vector, then it is linearly dependent.

7 Span
The span of a set of vectors is the set of all their linear combinations:
span{x_1, ..., x_n} = { Σ_{i=1}^n c_i x_i : c_1, ..., c_n ∈ R }.
Example 2.
span{(1,1), (2,2)} = { t_1 (1,1) + t_2 (2,2) : t_1, t_2 ∈ R } = { (t_1 + 2 t_2) (1,1) : t_1, t_2 ∈ R } = { t (1,1) : t ∈ R }.

8 Example 3.
span{(1,1), (1,−1)} = { t_1 (1,1) + t_2 (1,−1) : t_1, t_2 ∈ R } = { (t_1 + t_2, t_1 − t_2) : t_1, t_2 ∈ R } = { (α, β) : α, β ∈ R } = { α (1,0) + β (0,1) : α, β ∈ R } = span{(1,0), (0,1)} = R^2.

9 Basis and dimension
A set of linearly independent vectors that spans a subspace V is called a basis of the subspace. A subspace V can have different bases:
V := { (α, β) : α, β ∈ R } = span{(1,1), (1,−1)} = span{(1,0), (0,1)}.
All the bases for a given subspace V have the same number of elements. This common number is the dimension of V, denoted dim(V). In the example, dim(V) = 2. Dimension formalizes the notion of degrees of freedom.

10 Table of Contents

11 Image or column space
The column space of a matrix X is the span of the columns of X. Let x_1, ..., x_p ∈ R^n (n-dimensional vectors) and form a matrix with these columns: X = (x_1 x_2 ... x_p) ∈ R^{n×p}.
Recall that for β = (β_1, ..., β_p) ∈ R^p we have X β = Σ_{j=1}^p β_j x_j. Hence
col(X) = span{x_1, ..., x_p} = { Σ_{j=1}^p β_j x_j : β_1, ..., β_p ∈ R } = { X β : β ∈ R^p }.

12 Image or column space
The column space is also called the range or image: col(X) = ran(X) = Im(X) = { X β : β ∈ R^p }.
Im(X) is a linear subspace of R^n. The dimension of the image is called the rank of the matrix: rank(X) := dim(Im(X)). rank(X) is the number of linearly independent columns of X.

13 Kernel or null space
The kernel of X ∈ R^{n×p} is the set of all vectors that are mapped to zero by X: ker(X) = { β ∈ R^p : X β = 0 }. Note that ker(X) ⊆ R^p.

14 Example 4.
Consider the 2×2 matrix X with columns (1, 1) and (2, 2).
What is Im(X)? Im(X) = span{(1,1), (2,2)} = span{(1,1)}, so rank(X) = dim(Im(X)) = 1.
What is ker(X)? ker(X) = { β = (β_1, β_2) : β_1 (1,1) + β_2 (2,2) = 0 } = { β = (β_1, β_2) : β_1 + 2 β_2 = 0 }, so null(X) = dim(ker(X)) = 1.
rank(X) + null(X) = 2.
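A quick numerical check of Example 4 (a sketch using numpy; the tolerance and variable names are our own, not from the slides):

import numpy as np

X = np.array([[1.0, 2.0],
              [1.0, 2.0]])            # the matrix from Example 4: columns (1,1) and (2,2)

rank = np.linalg.matrix_rank(X)       # dim(Im(X)) = 1
_, s, Vt = np.linalg.svd(X)
null_basis = Vt[s < 1e-10]            # rows of V^T with zero singular value span ker(X)
print(rank, null_basis)               # 1, basis proportional to (2, -1), i.e. beta_1 + 2*beta_2 = 0
print(rank + null_basis.shape[0])     # rank + nullity = 2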

15 Table of Contents

16 Inner product, orthogonality
For two vectors x, y ∈ R^n, the (Euclidean) inner product is ⟨x, y⟩ := Σ_{i=1}^n x_i y_i = x^T y.
x is orthogonal to y, denoted x ⊥ y, if ⟨x, y⟩ = 0.
Let V ⊆ R^n be a (linear) subspace. We say x is orthogonal to V, denoted x ⊥ V, whenever x ⊥ v for all v ∈ V.

17 Orthogonal complement
The set of all vectors x that are orthogonal to V is called the orthogonal complement of V:
V^⊥ = { x ∈ R^n : x ⊥ V } = { x ∈ R^n : ⟨x, y⟩ = 0 for all y ∈ V }.

18 Example 5.
Consider X ∈ R^{3×2} with columns x_1, x_2, and let V = Im(X). What is V^⊥?
z ∈ V^⊥ iff z ⊥ x_1 and z ⊥ x_2. For this X, that is equivalent to z_1 = z_2 = 0, while z_3 can be anything:
V^⊥ = { (0, 0, α) : α ∈ R } = span{(0, 0, 1)}.

19 Norm and distance
The (Euclidean) norm of a vector x = (x_1, ..., x_n) ∈ R^n, also called the ℓ_2 norm, is ‖x‖ = √⟨x, x⟩ = √(x^T x) = √(Σ_{i=1}^n x_i^2).
For two vectors x, y ∈ R^n, their (Euclidean) distance is ‖x − y‖ = √(Σ_{i=1}^n (x_i − y_i)^2).
Exercise: Show that ‖x − y‖^2 = ‖x‖^2 + ‖y‖^2 − 2⟨x, y⟩.

20 Projection
Let V ⊆ R^n be a subspace and x ∈ R^n a vector. Then there is a unique closest element of V to x: x̂ = argmin_{z ∈ V} ‖x − z‖. x̂ is called the projection of x onto V.
Consider the error e = x − x̂. The projection x̂ is the only vector in V such that e ⊥ V; in other words, e ∈ V^⊥. Thus:
Proposition 1. Every vector x ∈ R^n can be uniquely represented as x = x̂ + e, where x̂ ∈ V and e ∈ V^⊥; mnemonically, we write R^n = V ⊕ V^⊥.
Thus we have an orthogonal decomposition of the space w.r.t. a subspace V and its orthogonal complement V^⊥.
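A small numpy sketch of the projection and the orthogonality principle (the subspace and test vector here are randomly generated for illustration; they are not from the slides):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))             # columns span a 2-dimensional subspace V of R^5
x = rng.normal(size=5)

coef, *_ = np.linalg.lstsq(A, x, rcond=None)
x_hat = A @ coef                        # projection of x onto V = Im(A)
e = x - x_hat                           # the error/residual

print(A.T @ e)                          # ~0: e is orthogonal to every column of A, i.e. e in V-perp
z = A @ rng.normal(size=2)              # any other point of V
print(np.linalg.norm(x - x_hat) <= np.linalg.norm(x - z))   # True: x_hat is the closest point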

21 Proof (Optional).
(e ⊥ V implies x̂ is the projection.) Assume x̂ ∈ V is such that e := x − x̂ ⊥ V. For any z ∈ V we have (substituting x = x̂ + e)
‖x − z‖^2 = ‖x̂ − z‖^2 + 2⟨e, x̂ − z⟩ + ‖e‖^2. (1)
Since x̂ − z ∈ V (why?), the cross term vanishes. Thus ‖x − z‖^2 = ‖x̂ − z‖^2 + ‖e‖^2 for all z ∈ V, showing that x̂ is the projection (why?).
(Conversely, x̂ being the projection implies e ⊥ V.) Since x̂ minimizes z ↦ ‖x − z‖^2, from (1),
‖x̂ − z‖^2 + 2⟨e, x̂ − z⟩ = ‖x − z‖^2 − ‖e‖^2 ≥ 0, for all z ∈ V.
By the change of variable u = x̂ − z, ‖u‖^2 + 2⟨e, u⟩ ≥ 0 for all u ∈ V. Changing u to tu with t > 0 and letting t → 0 gives ⟨e, u⟩ ≥ 0 for all u ∈ V; applying this to −u as well gives ⟨e, u⟩ = 0, i.e. e ⊥ V.

22 Other facts
If V ⊆ R^n is a linear subspace:
(a) (V^⊥)^⊥ = V.
(b) dim(R^n) = dim(V) + dim(V^⊥), i.e. (b') dim(V^⊥) = n − dim(V).
(c) If X ∈ R^{n×m}, then [Im(X)]^⊥ = ker(X^T) (both subspaces of R^n).
(d) rank(X) + nullity(X^T) = n.
Notes: (b) follows from R^n = V ⊕ V^⊥. We will see the proof of (c) later. (d) follows by taking dimensions of both sides of (c) and using (b). Recall that nullity(A) = dim(ker(A)), i.e., the dimension of the null space.

23 Table of Contents

24 Spectral decomposition of symmetric matrices
Let A ∈ R^{n×n} be a symmetric matrix: A = A^T. Then we have the eigenvalue decomposition (EVD) of the matrix: A = U Λ U^T, where
U is an orthogonal matrix: U U^T = U^T U = I (i.e., U^{-1} = U^T),
Λ = diag(λ_1, ..., λ_n), where λ_1, ..., λ_n are the eigenvalues of A.
The columns of U, denoted u_1, ..., u_n, are eigenvectors of A: A u_i = λ_i u_i.
{u_1, ..., u_n} is an orthonormal basis for R^n: ⟨u_i, u_j⟩ = 0 for i ≠ j and ⟨u_i, u_j⟩ = 1 for i = j.
There is a corresponding decomposition for a general (rectangular) matrix, called the singular value decomposition (SVD).

25 Example: the EVDs A_1 = U Λ U^T and A_2 = U Λ U^T of two symmetric 2×2 matrices are worked out on the slide. What are rank(A_1) and rank(A_2)?
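In the same spirit, a short numpy sketch (the matrices A1 and A2 below are our own illustrative choices, not necessarily those on the slide):

import numpy as np

A1 = np.array([[1.0, 2.0], [2.0, 1.0]])    # symmetric, eigenvalues 3 and -1  -> rank 2
A2 = np.array([[1.0, 1.0], [1.0, 1.0]])    # symmetric, eigenvalues 2 and 0   -> rank 1

for A in (A1, A2):
    lam, U = np.linalg.eigh(A)             # EVD of a symmetric matrix: A = U diag(lam) U^T
    print(lam)
    print(np.allclose(U @ np.diag(lam) @ U.T, A))    # True: the decomposition reproduces A
    print(np.sum(np.abs(lam) > 1e-10))     # rank = number of nonzero eigenvalues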

26 Table of Contents

27 Positive semi-definite (PSD) matrices
A symmetric matrix A ∈ R^{n×n} is PSD if ⟨x, Ax⟩ = x^T A x ≥ 0 for all x ∈ R^n. It is positive definite (PD) if x^T A x > 0 for all x ≠ 0.
Let λ_1(A), λ_2(A), ..., λ_n(A) be the eigenvalues of A. Then:
A is PSD if and only if λ_i(A) ≥ 0 for all i = 1, ..., n.
A is PD if and only if λ_i(A) > 0 for all i = 1, ..., n.
Every PSD matrix A has a symmetric square root A^{1/2}, defined as the unique symmetric PSD matrix such that A^{1/2} A^{1/2} = A. If A = U Λ U^T is the EVD of A, then it is easy to show that A^{1/2} = U Λ^{1/2} U^T.
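A minimal numpy sketch of the symmetric square root built from the EVD (the helper name sqrtm_psd and the example matrix are our own choices):

import numpy as np

def sqrtm_psd(A):
    # symmetric PSD square root via the EVD: A^{1/2} = U diag(sqrt(lam)) U^T
    lam, U = np.linalg.eigh(A)
    lam = np.clip(lam, 0.0, None)          # guard against tiny negative eigenvalues from round-off
    return U @ np.diag(np.sqrt(lam)) @ U.T

Sigma = np.array([[1.0, 0.5], [0.5, 1.0]]) # a PSD (in fact PD) matrix
R = sqrtm_psd(Sigma)
print(np.allclose(R @ R, Sigma))           # True: R is the symmetric square root of Sigma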

28 Table of Contents

29 Table of Contents

30 Expectation
Assume that y = (y_1, ..., y_n) is a random vector. We define E(y) := (E(y_1), ..., E(y_n)), or compactly [E(y)]_i = E(y_i). E(y) is a nonrandom vector in R^n.
An important consequence of the linearity of expectation:
Lemma 1. If A ∈ R^{m×n} is nonrandom and y ∈ R^n is random, then E(Ay) = A E(y).
Proof: [E(Ay)]_i = E[(Ay)_i] = E[ Σ_{j=1}^n A_ij y_j ] = Σ_{j=1}^n A_ij E(y_j) = [A E(y)]_i.

31 Similarly, we define the expectation of a random matrix elementwise: if A ∈ R^{m×n} is a random matrix with entries a_ij, then E(A) is the nonrandom matrix in R^{m×n} with entries [E(A)]_ij := E(a_ij).
Example 6. Let X_1, X_2 ~ N(0, 1) with cov(X_1, X_2) = ρ. Consider
A = [ X_1^2, X_1 X_2 ; X_1 X_2, X_2^2 ].
Then E(A) = [ E(X_1^2), E(X_1 X_2) ; E(X_1 X_2), E(X_2^2) ] = [ 1, ρ ; ρ, 1 ].

32 Extension of Lemma 1
The following is a very useful extension of Lemma 1:
Lemma 2. Consider matrices A, B, C such that A is random and B, C are nonrandom. Assume that the dimensions allow us to form the matrix product BAC. Then E(BAC) = B E(A) C. (2)
Important: we keep the order of matrix multiplication in (2). (Matrix multiplication is noncommutative.)

33 Example 7. Let x = (α, β) and let A be as before, A = [ X_1^2, X_1 X_2 ; X_1 X_2, X_2^2 ]. Then
E(x^T A x) = x^T E(A) x = (α β) [ 1, ρ ; ρ, 1 ] (α ; β) = α^2 + β^2 + 2ραβ.

34 Table of Contents

35 Covariance matrix
Consider a random vector y = (y_1, ..., y_n) ∈ R^n.
Definition 1. The covariance matrix of y, denoted cov(y), is the n×n matrix with entries [cov(y)]_ij = cov(y_i, y_j) = E[(y_i − E y_i)(y_j − E y_j)].
Note: E y_i = E(y_i); we drop the parentheses for simplicity. cov(y_i, y_j) is the usual covariance between y_i and y_j. The diagonal entries of the covariance matrix are [cov(y)]_ii = cov(y_i, y_i) = var(y_i).
Recall the alternative formula cov(y_i, y_j) = E(y_i y_j) − E(y_i) E(y_j).

36 Some properties of the covariance matrix Σ := cov(y):
Σ is symmetric (Σ = Σ^T).
If y_1, ..., y_n are pairwise uncorrelated, then Σ is diagonal.
Letting µ = E(y) ∈ R^n, we have Σ = E[(y − µ)(y − µ)^T]. We also have Σ = E(y y^T) − µ µ^T.
Let ỹ := y − E(y) be the centered version of y. Then cov(y) = E(ỹ ỹ^T).
Let α ∈ R and b ∈ R^n; then cov(αy + b) = α^2 cov(y).

37 Lemma 3. Let A ∈ R^{m×n}, and let y ∈ R^n be a random vector. Then cov(Ay) = A cov(y) A^T.
Proof: Let u := Ay and ũ := u − E u; also let ỹ := y − E y. Since E(u) = A E(y), we have ũ = A ỹ, hence
cov(u) = E(ũ ũ^T) = E(A ỹ ỹ^T A^T) = A E(ỹ ỹ^T) A^T,
where the last equality is by Lemma 2. (Recall that (AB)^T = B^T A^T.)

38 Example 8. X_1, X_2 ~ N(0, 1) with cov(X_1, X_2) = ρ. Let X = (X_1, X_2) ∈ R^2 (think of it as a column vector). What is the covariance matrix of X?
cov(X) = [ var(X_1), cov(X_1, X_2) ; cov(X_2, X_1), var(X_2) ] = [ 1, ρ ; ρ, 1 ].
Let u := α X_1 + β X_2 for α, β ∈ R. Then u = (α β) (X_1 ; X_2), and by Lemma 3,
cov(u) = (α β) cov(X) (α ; β) = α^2 + β^2 + 2ραβ.
Note that cov(u) = var(u) is a scalar in this case.
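A Monte Carlo sketch of Lemma 3 / Example 8 in numpy (the values of rho, alpha, beta and the sample size are our own choices):

import numpy as np

rng = np.random.default_rng(0)
rho = 0.3
Sigma = np.array([[1.0, rho], [rho, 1.0]])           # cov(X) from Example 8

X = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)
alpha, beta = 2.0, -1.0
u = X @ np.array([alpha, beta])                      # u = alpha*X1 + beta*X2, sample by sample

print(np.var(u))                                     # approx alpha^2 + beta^2 + 2*rho*alpha*beta
print(alpha**2 + beta**2 + 2*rho*alpha*beta)         # exact value from Lemma 3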

39 Example 9. X_1, X_2 ~ N(0, 1) with cov(X_1, X_2) = ρ, as in the previous example. Let Z_1 = X_1, Z_2 = X_2 and Z_3 = X_1 − X_2, and Z = (Z_1, Z_2, Z_3). Then E(Z) = (0, 0, 0) and cov(Z) = ?
Approach 1 (recall that covariance is bilinear):
cov(Z_1, Z_3) = cov(X_1, X_1 − X_2) = cov(X_1, X_1) − cov(X_1, X_2) = var(X_1) − cov(X_1, X_2) = 1 − ρ.
cov(Z_2, Z_3) = cov(X_2, X_1 − X_2) = ρ − 1.
Using the previous example: var(Z_3) = 1 + 1 − 2ρ = 2(1 − ρ).
Continued on the next slide...

40 Conclude that
cov(Z) = [ 1, ρ, 1−ρ ; ρ, 1, ρ−1 ; 1−ρ, ρ−1, 2(1−ρ) ].
Approach 2 (matrix approach): write Z = A X with A = [ 1, 0 ; 0, 1 ; 1, −1 ]. Then E(Z) = A E(X) = 0 and
cov(Z) = A cov(X) A^T = [ 1, 0 ; 0, 1 ; 1, −1 ] [ 1, ρ ; ρ, 1 ] [ 1, 0, 1 ; 0, 1, −1 ],
which gives the desired result.

41 Proposition 2. A covariance matrix is always positive semi-definite (PSD).
Proof: Fix a ∈ R^n and let y ∈ R^n be random. Then var(a^T y) = cov(a^T y) = a^T cov(y) a. Since var(a^T y) ≥ 0, we conclude a^T cov(y) a ≥ 0 for all a ∈ R^n.
Pathological case: if for some a ≠ 0 we have a^T cov(y) a = 0, then var(a^T y) = 0, hence a^T y = constant with probability 1; that is, the distribution of y lies on a lower-dimensional subspace. Note that in this case cov(y) is a singular matrix.

42 Table of Contents

43 Decorrelation or whitening
For any random vector y (with invertible Σ = cov(y)), it is possible to find a linear transform A such that z := Ay has identity covariance matrix: cov(z) = I_n = diag(1, 1, ..., 1); that is, the components of z = (z_1, ..., z_n) are uncorrelated and have unit variance.
How to get A? Take A := Σ^{-1/2}. Hence z = Σ^{-1/2} y, and
cov(z) = Σ^{-1/2} cov(y) Σ^{-1/2} = Σ^{-1/2} Σ Σ^{-1/2} = Σ^{-1/2} Σ^{1/2} Σ^{1/2} Σ^{-1/2} = I_n.
Exercise: Show that we can also take A = Λ^{-1/2} U^T, where Σ = U Λ U^T is the EVD of Σ.
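A short numpy sketch of whitening with A = Σ^{-1/2} (the covariance matrix used here is an arbitrary illustrative choice):

import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])

lam, U = np.linalg.eigh(Sigma)
A = U @ np.diag(lam ** -0.5) @ U.T           # the symmetric inverse square root Sigma^{-1/2}

y = rng.multivariate_normal([0.0, 0.0], Sigma, size=100_000)
z = y @ A.T                                   # z_i = A y_i for each sample (stored as rows)

print(np.cov(z, rowvar=False))                # approximately the 2x2 identity matrix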

44 Table of Contents

45 Multivariate normal distribution
Definition 2. A random vector y = (y_1, ..., y_n) has a multivariate normal (MVN) distribution with mean vector µ = (µ_1, ..., µ_n) and covariance matrix Σ ∈ R^{n×n} if it has the density
f(y) = (2π)^{-n/2} |Σ|^{-1/2} exp( −(1/2) (y − µ)^T Σ^{-1} (y − µ) ), y ∈ R^n.
We write y ~ N(µ, Σ). We implicitly assume that Σ is invertible. The MVN has numerous interesting properties.
We write y ~ N_n(µ, Σ) to emphasize the dimension (i.e., y ∈ R^n).

46 Important properties of the MVN
1. Any affine transformation of y ~ N(µ, Σ) again has an MVN distribution.
Lemma 4. Assume that y ~ N_n(µ, Σ) and let u := A y + b, where A ∈ R^{p×n} and b ∈ R^p are nonrandom. Then u ~ N(µ̃, Σ̃), where µ̃ = A µ + b and Σ̃ = A Σ A^T.
Special case: taking u = a^T y for nonrandom a ∈ R^n, we obtain a^T y ~ N(a^T µ, a^T Σ a).
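A simulation sketch of Lemma 4 in numpy (the particular µ, Σ, A and b below are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
A = np.array([[1.0, 1.0], [2.0, -1.0], [0.0, 3.0]])   # a nonrandom 3x2 matrix
b = np.array([0.5, 0.0, -2.0])

y = rng.multivariate_normal(mu, Sigma, size=200_000)
u = y @ A.T + b                                        # u = A y + b, sample by sample

print(u.mean(axis=0))                                  # approx A mu + b
print(A @ mu + b)
print(np.cov(u, rowvar=False))                         # approx A Sigma A^T
print(A @ Sigma @ A.T)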

47 2. Marginal distributions of y ~ N(µ, Σ) are again MVN. This is a consequence of Lemma 4: suppose we partition y ∈ R^n into y_1 ∈ R^p and y_2 ∈ R^{n−p}. With A = [ I_{p×p}, 0_{p×(n−p)} ] we have A y = I y_1 + 0 y_2 = y_1. Thus y_1 = A y, hence y_1 ~ N(Aµ, A Σ A^T), where
A µ = [ I, 0 ] (µ_1 ; µ_2) = µ_1,
A Σ A^T = [ I, 0 ] [ Σ_11, Σ_12 ; Σ_21, Σ_22 ] [ I ; 0 ] = Σ_11.
So y_1 ~ N_p(µ_1, Σ_11). Similarly, y_2 ~ N_{n−p}(µ_2, Σ_22).

48 3. Conditional distributions of y ~ N(µ, Σ) are again MVN.
Lemma 5. Assume that y ~ N(µ, Σ) and partition y into two pieces y_1 and y_2. Then
y_1 | y_2 ~ N( µ_{1|2}(y_2), Σ_{1|2} ), where
µ_{1|2}(y_2) = µ_1 + Σ_12 Σ_22^{-1} (y_2 − µ_2),
Σ_{1|2} = Σ_11 − Σ_12 Σ_22^{-1} Σ_12^T.
Note that if Σ_12 = 0, then y_1 | y_2 ~ N(µ_1, Σ_11); that is, y_1 and y_2 are independent.
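A small numpy sketch of the conditional mean and covariance formulas in Lemma 5 (the 3-dimensional µ, Σ and the partition used here are our own example):

import numpy as np

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
i1, i2 = [0], [1, 2]                                   # partition: y1 = y[0], y2 = (y[1], y[2])
S11 = Sigma[np.ix_(i1, i1)]
S12 = Sigma[np.ix_(i1, i2)]
S22 = Sigma[np.ix_(i2, i2)]

y2 = np.array([1.5, -0.5])                             # an observed value of y2
mu_cond = mu[i1] + S12 @ np.linalg.solve(S22, y2 - mu[i2])    # mu_{1|2}(y2)
Sigma_cond = S11 - S12 @ np.linalg.solve(S22, S12.T)          # Sigma_{1|2}
print(mu_cond, Sigma_cond)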

49 4. Uncorrelatedness is equivalent to independence: if
(y_1 ; y_2) ~ N( (0 ; 0), [ Σ_11, Σ_12 ; Σ_21, Σ_22 ] ),
then y_1 and y_2 are independent if and only if Σ_12 = 0.
Proof: follows from Lemma 5, as mentioned.
Independence always implies uncorrelatedness, but not necessarily vice versa. In the MVN case, however, the reverse implication holds as well.

50 Example 10. Let y ~ N(0, Σ). Since Σ is PSD, it has a square root Σ^{1/2}. Let z = Σ^{-1/2} y. What is the distribution of z?
We claim that z ~ N(0, I_n): z is MVN by Lemma 4, with E(z) = Σ^{-1/2} E(y) = 0 and cov(z) = Σ^{-1/2} cov(y) Σ^{-1/2} = Σ^{-1/2} Σ Σ^{-1/2} = I_n.
Recall that z = Σ^{-1/2} y is the whitened version of y. We can reduce many problems about y to problems about z. The advantage is
z ~ N(0, I_n) ⟺ z_1, ..., z_n iid ~ N(0, 1),
i.e., z has independent identically distributed (iid) N(0, 1) coordinates.

51 Table of Contents

52 Table of Contents

53 Linear model
A multiple linear regression (MLR) model, or simply a linear model, is:
Population version: y = β_0 + β_1 x_1 + ... + β_p x_p + ε, where µ := β_0 + β_1 x_1 + ... + β_p x_p.
y is the response, or dependent variable (observed).
x_j, j = 1, ..., p are the independent, explanatory, or regressor variables, also called predictors or covariates.
β_j are fixed unknown parameters (unobserved).
x_j are fixed, i.e., deterministic for now (observed). Alternatively, we work conditioned on {x_j}.
ε is random: the noise, random error, or unexplained variation; the only source of randomness in the model.
Assumption: E[ε] = 0, so that E[y] = µ + E[ε] = µ.
The goal is to estimate the β_j.

54 Examples
Give an example: explaining 100C grades; gas consumption data; oral contraceptives.

55 Sampled version
Our data or observations are a collection of n i.i.d. samples from this model: we observe {y_i, x_i1, ..., x_ip}, i = 1, ..., n, with
y_i = β_0 + β_1 x_i1 + ... + β_p x_ip + ε_i = β_0 + Σ_{j=1}^p β_j x_ij + ε_i.
Matrix-vector form: let
x_j = (x_1j, ..., x_nj) ∈ R^n, x_0 := 1 := (1, ..., 1) ∈ R^n, ε = (ε_1, ..., ε_n), y = (y_1, ..., y_n) ∈ R^n.
Then y = β_0 1 + β_1 x_1 + ... + β_p x_p + ε = Σ_{j=0}^p β_j x_j + ε, where everything is now a vector except the β_j.

56 Let β = (β_0, β_1, ..., β_p) ∈ R^{p+1}, and let X = (x_0 x_1 ... x_p) ∈ R^{n×(p+1)}. Since X β = Σ_{j=0}^p β_j x_j, we have y = X β + ε, with µ = X β. This is called the multiple linear regression (MLR) model.
Assumptions on the noise:
(E1) E[ε] = 0, cov(ε) = σ^2 I_n, and the ε_i are independent.
(E2) Distributional assumption: ε ~ N(0, σ^2 I_n).
(E1) implies that E[y] = µ = X β and cov(y) = σ^2 I_n (why?). (E2) implies y ~ N(X β, σ^2 I_n).
Assumption on the design matrix:
(X1) X ∈ R^{n×(p+1)} is fixed (nonrandom) and has full column rank; that is, p + 1 ≤ n and rank(X) = p + 1.

57 Table of Contents

58 Maximum likelihood estimation (MLE)
We are interested in estimating both β and σ^2 (the noise variance). The MLE requires a distributional assumption; here we assume (E2). The PDF of y ~ N(µ, σ^2 I) is
f(y) = (2π)^{-n/2} |σ^2 I_n|^{-1/2} exp( −(1/2) (y − µ)^T (σ^2 I_n)^{-1} (y − µ) ) = (2π σ^2)^{-n/2} exp( −(1/(2σ^2)) ‖y − µ‖^2 ).
Viewed as a function of β and σ^2, this is the likelihood:
L(β, σ^2 | y) = (2π σ^2)^{-n/2} exp( −(1/(2σ^2)) ‖y − X β‖^2 ).
Let us estimate β first. Maximizing the likelihood is equivalent to minimizing
S(β) := ‖y − X β‖^2 = Σ_{i=1}^n [ y_i − (Xβ)_i ]^2.
The problem min_β S(β) is called the least-squares (LS) problem.

59 Use the chain rule to compute the gradient of S(β) and set it to zero:
∂S(β)/∂β_k = −2 Σ_{i=1}^n [ ∂(Xβ)_i/∂β_k ] [ y_i − (Xβ)_i ], with ∂(Xβ)_i/∂β_k = X_ik.
Hence ∂S(β)/∂β_k = −2 [X^T (y − X β)]_k, or ∇S(β) = −2 X^T (y − X β). Setting this to zero gives the normal equations:
∇S(β̂) = 0 ⟺ (X^T X) β̂ = X^T y.
If p + 1 ≤ n and X ∈ R^{n×(p+1)} is full rank, i.e. rank(X) = p + 1, then X^T X is invertible and we get β̂ = (X^T X)^{-1} X^T y.
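A minimal numpy sketch of the normal equations on simulated data (the design, true β and noise level are our own choices; lstsq is used only as a cross-check):

import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # design with intercept column x_0 = 1
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# normal equations: (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)

# same answer from the built-in least-squares solver
print(np.linalg.lstsq(X, y, rcond=None)[0])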

60 Remark 1. We have shown that the maximum likelihood estimate of β under the Gaussian assumption is the same as the least-squares (LS) estimate. The LS estimate makes sense in general, even if we have no distributional assumption.

61 Table of Contents

62 Geometric interpretation of LS (Section 4.2.1)
The least-squares problem is equivalent to
min_{β ∈ R^{p+1}} ‖y − X β‖^2 ⟺ min_{µ ∈ Im(X)} ‖y − µ‖^2,
since, we recall, Im(X) = { X β : β ∈ R^{p+1} }. That is, we are trying to find the projection µ̂ of y onto Im(X):
µ̂ = argmin_{µ ∈ Im(X)} ‖y − µ‖^2.
By the orthogonality principle, the residual e = y − µ̂ ⊥ Im(X), i.e.
y − µ̂ ∈ [Im(X)]^⊥ = ker(X^T) ⟺ X^T (y − µ̂) = 0.
µ̂ ∈ Im(X) means there is at least one β̂ ∈ R^{p+1} such that µ̂ = X β̂, and then X^T (y − X β̂) = 0, which is the same as the normal equations for β̂.

63 [Figure: the response y, its projection µ̂ onto the plane Im(X), the residual e = y − µ̂, and the error ε = y − µ.]
Remark 2. The projection µ̂ is unique in general, but β̂ need not be. If X has full (column) rank, then β̂ is unique.

64 Consequences of the geometric interpretation
The residual vector: e = y − µ̂ = y − X β̂ ∈ R^n. The vector of fitted values: µ̂ = X β̂; sometimes µ̂ is also referred to as ŷ. The following hold (recall that x_j is the jth column of X):
e ⊥ Im(X). Since 1, x_j ∈ Im(X), we have e ⊥ 1 and e ⊥ x_j for j = 1, ..., p, i.e. Σ_{i=1}^n e_i = 0 and Σ_{i=1}^n e_i x_ij = 0 for j = 1, ..., p. Residuals are orthogonal to all the covariate vectors.
Since µ̂ ∈ Im(X), we have e ⊥ µ̂, i.e. Σ_i e_i µ̂_i = 0.
S(β̂) = ‖e‖^2 = min_β S(β).

65 Table of Contents

66 Estimation of σ^2
Back to the likelihood: substituting β̂ for β,
L(β̂, σ^2 | y) = (2π σ^2)^{-n/2} exp( −S(β̂)/(2σ^2) ),
which we want to maximize over σ^2. This is equivalent to maximizing the log-likelihood
log L(β̂, σ^2 | y) = −(n/2) log(2π σ^2) − S(β̂)/(2σ^2)
over σ^2, or maximizing
ℓ(β̂, v | y) := log L(β̂, v | y) = const. − (1/2) [ n log v + v^{-1} S(β̂) ]
over v (change of variable v = σ^2). The problem reduces to
v̂ := argmax_{v>0} ℓ(β̂, v | y) = argmin_{v>0} [ n log v + v^{-1} S(β̂) ].
Setting the derivative to zero gives σ̂^2 = v̂ = S(β̂)/n. (Check that this is indeed the maximizer.)

67 For reasons that will become clear later, we often use the following modified estimator:
s^2 = S(β̂) / (n − (p + 1)).
Compare with the MLE for σ^2: σ̂^2 = S(β̂)/n. We will see that s^2 is unbiased for σ^2 while σ̂^2 is not.
Note that S(β̂) is the sum of squares of the residuals: S(β̂) = Σ_{i=1}^n (y_i − (X β̂)_i)^2 = Σ_{i=1}^n e_i^2 = ‖e‖^2.
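A simulation sketch comparing σ̂^2 = S(β̂)/n with s^2 = S(β̂)/(n − p − 1) (the design, β, σ and number of replications below are arbitrary choices):

import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, 2.0, -0.5])
sigma = 0.7

mle, unbiased = [], []
for _ in range(2000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)    # residual vector
    S = e @ e                                        # S(beta_hat) = ||e||^2
    mle.append(S / n)                                # MLE sigma_hat^2
    unbiased.append(S / (n - (p + 1)))               # s^2

print(np.mean(mle), np.mean(unbiased), sigma**2)     # s^2 averages close to sigma^2 = 0.49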

68 Table of Contents

69 The hat matrix
Recall that β̂ = (X^T X)^{-1} X^T y. We have
µ̂ = X β̂ = X (X^T X)^{-1} X^T y = H y, where H := X (X^T X)^{-1} X^T.
H is called the hat matrix. It is an example of an orthogonal projection matrix. These matrices have a lot of properties.

70 Properties of projection matrices
Lemma 6. For any vector y ∈ R^n, H y ∈ R^n is the orthogonal projection of y onto Im(X).
H is symmetric: H^T = H. (Exercise: use (BCD)^T = D^T C^T B^T.)
H is idempotent: H^2 = H. (Exercise.) This is what we expect from a projection matrix: if we project Hy onto Im(X), we should get back the same thing, H(Hy) = Hy for all y.
A matrix is an orthogonal projection matrix if and only if it is symmetric and idempotent.
I − H is also a projection matrix (symmetric and idempotent): for every y ∈ R^n, (I − H) y is the projection of y onto [Im(X)]^⊥. Note that (I − H) y = e.
To summarize, we can decompose every y ∈ R^n as y = µ̂ + e = H y + (I − H) y, with H y ∈ Im(X) and (I − H) y ∈ [Im(X)]^⊥.
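A quick numpy check of the hat-matrix properties (the design matrix here is simulated for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
H = X @ np.linalg.inv(X.T @ X) @ X.T        # the hat matrix

print(np.allclose(H, H.T))                  # True: H is symmetric
print(np.allclose(H @ H, H))                # True: H is idempotent

y = rng.normal(size=20)
print(np.allclose(X.T @ ((np.eye(20) - H) @ y), 0))   # (I - H)y is orthogonal to Im(X)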

71 Sidenote: Gram matrix
Write X = (x_0 x_1 x_2 ... x_p) ∈ R^{n×(p+1)}, where x_j is the jth column of X. Verify the following useful result:
(X^T X)_ij = x_i^T x_j = ⟨x_i, x_j⟩, for i, j = 0, 1, ..., p.
Thus the entries of X^T X are the pairwise inner products of the columns of X. For example, (X^T X)_ij = 0 means x_i and x_j are orthogonal. X^T X is called the Gram matrix of {x_0, x_1, ..., x_p}.
Exercise: Show that X^T X is always PSD.

72 Table of Contents

73 Example 11 (Simple linear regression)
This is the case p = 1, and the model is y = β_0 1 + β_1 x_1 + ε. We have X = [1 x_1] ∈ R^{n×2}. Assumption (X1) is satisfied if n ≥ 2 and x_1 is not a constant multiple of 1. We have
β̂ = ( (1/n) X^T X )^{-1} ( (1/n) X^T y ) = [ 1, x̄ ; x̄, (1/n)Σ_i x_i^2 ]^{-1} ( ȳ ; (1/n)Σ_i x_i y_i )
  = ( 1 / ( (1/n)Σ_i x_i^2 − x̄^2 ) ) [ (1/n)Σ_i x_i^2, −x̄ ; −x̄, 1 ] ( ȳ ; (1/n)Σ_i x_i y_i ).
From this it follows that β̂_1 = ( (1/n)Σ_i x_i y_i − x̄ ȳ ) / ( (1/n)Σ_i x_i^2 − x̄^2 ) = ρ_xy / ρ_xx and β̂_0 = ȳ − β̂_1 x̄, where ρ_xy and ρ_xx are defined on the next slide.

74 Example (Simple linear regression, cont'd)
From the previous slide, β̂_1 = ρ_xy / ρ_xx and β̂_0 = ȳ − β̂_1 x̄, where
ρ_xy = (1/n) Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) = (1/n) Σ_i x_i y_i − x̄ ȳ,
ρ_xx = (1/n) Σ_i (x_i − x̄)^2 = (1/n) Σ_i x_i^2 − x̄^2.
The formula for β̂_0 is easier to see if one writes the normal equations as ( (1/n) X^T X ) β̂ = (1/n) X^T y and solves for β̂_0 in terms of β̂_1. Note also that
var(β̂_1) = (σ^2 / n) [ ( (1/n) X^T X )^{-1} ]_22 = σ^2 / (n ρ_xx).
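A numpy sketch of the closed-form simple-linear-regression estimates (the simulated data and true coefficients are our own):

import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

xbar, ybar = x.mean(), y.mean()
rho_xy = np.mean((x - xbar) * (y - ybar))     # (1/n) sum (x_i - xbar)(y_i - ybar)
rho_xx = np.mean((x - xbar) ** 2)             # (1/n) sum (x_i - xbar)^2

beta1_hat = rho_xy / rho_xx
beta0_hat = ybar - beta1_hat * xbar
print(beta0_hat, beta1_hat)                   # close to the true values 1.0 and 2.0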

75 Table of Contents

76 Sampling distribution
Both β̂ and σ̂^2 are random quantities (due to the randomness in ε). The distribution of an estimate, called its sampling distribution, allows us to quantify the uncertainty in the estimate.
Basic properties like the mean, the variance, and covariances can be determined under our basic assumption (E1). To determine the full sampling distribution, we need to assume some distribution for the noise vector ε, e.g. (E2).
Recall that E[y] = µ = X β and cov(y) = σ^2 I.

77 Properties of β̂
Let us write A = (X^T X)^{-1} X^T ∈ R^{(p+1)×n}, so that β̂ = A y. Note that A X = I_{p+1}.
Proposition 3. Under the linear model y = X β + ε:
1. β̂ is an unbiased estimate of β, i.e. E[β̂] = β;
2. its covariance matrix is cov(β̂) = σ^2 (X^T X)^{-1}.
Exercise: Prove this. For the covariance, note that A A^T = (X^T X)^{-1}.
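A Monte Carlo sketch of Proposition 3 (unbiasedness and the covariance formula), with a simulated fixed design and our own choices of β and σ:

import numpy as np

rng = np.random.default_rng(0)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed design, p = 1
beta = np.array([1.0, 2.0])
sigma = 0.5

betas = []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    betas.append(np.linalg.solve(X.T @ X, X.T @ y))      # one LS estimate per simulated data set
betas = np.array(betas)

print(betas.mean(axis=0))                                # approx beta: unbiasedness
print(np.cov(betas, rowvar=False))                       # approx sigma^2 (X^T X)^{-1}
print(sigma**2 * np.linalg.inv(X.T @ X))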

78 Properties of a^T β̂
Consider a^T β̂ for some nonrandom a ∈ R^{p+1}. This is interesting since, e.g., with a = (0, 1, −1, 0, ..., 0) we have a^T β̂ = β̂_1 − β̂_2.
In general, E[a^T β̂] = a^T E[β̂] = a^T β and var(a^T β̂) = a^T cov(β̂) a = σ^2 a^T (X^T X)^{-1} a.
The first equation shows that a^T β̂ is an unbiased estimate of a^T β.

79 Fitted values
The vector of fitted values is µ̂ = X β̂ = H y. µ̂ is an unbiased estimate of µ: E[µ̂] = X E[β̂] = X β = µ.
Approach 2: µ = X β, so µ ∈ Im(X), hence H µ = µ (verify directly!), and E[µ̂] = H E[y] = H µ = µ.
The covariance matrix is cov(µ̂) = cov(H y) = H (σ^2 I_n) H^T = σ^2 H H = σ^2 H.

80 Residuals
The residual is e = y − X β̂ = y − µ̂ = (I − H) y ∈ R^n. We have
E[e] = E[y − µ̂] = E[y] − E[µ̂] = µ − µ = 0,
cov(e) = (I − H)(σ^2 I_n)(I − H)^T = σ^2 (I − H)^2 = σ^2 (I − H).

81 Joint behavior of (β̂, e) ∈ R^{p+1+n}
Recall A = (X^T X)^{-1} X^T. Stack β̂ on top of e. We have β̂ = A y and e = (I − H) y. Since H = X (X^T X)^{-1} X^T = X A,
(β̂ ; e) = [ A ; I − H ] y =: P y, and E[(β̂ ; e)] = ( E[β̂] ; E[e] ) = ( β ; 0 ). (3)

82 The covariance matrix is (recall that I − H is symmetric and idempotent):
cov( (β̂ ; e) ) = P cov(y) P^T = σ^2 P P^T
 = σ^2 [ A ; I − H ] [ A^T, (I − H)^T ]
 = σ^2 [ A ; I − H ] [ A^T, I − H ]
 = σ^2 [ A A^T, A (I − H) ; (I − H) A^T, I − H ]
 = [ σ^2 (X^T X)^{-1}, σ^2 A (I − H) ; σ^2 (I − H) A^T, σ^2 (I − H) ].
Note that the first diagonal block matches the covariance of β̂, as expected.

83 Sidenote
Why is A (I − H) = 0? Algebraic calculation: A H = (X^T X)^{-1} X^T [X A] = A, hence A (I − H) = A − A H = 0.
Geometric interpretation: A^T = X (X^T X)^{-1}, hence Im(A^T) ⊆ Im(X) (check!). This means that H leaves A^T intact: H A^T = A^T.

84 Important consequences
Proposition 4. Under the linear regression model y = X β + ε and assumption (E1),
cov( (β̂ ; e) ) = [ σ^2 (X^T X)^{-1}, 0 ; 0, σ^2 (I − H) ], (4)
where β̂ is the LS estimate of the regression coefficient and e is the residual.
Under (E1), β̂ and e are uncorrelated. Under (E2), we have:
(β̂, e) has an MVN distribution (why?) with mean vector (β, 0) and covariance matrix (4).
β̂ and e are independent (why?).
S(β̂) = ‖e‖^2 is independent of β̂ (why?). Similarly, s^2 is independent of β̂.
e and µ̂ are independent (µ̂ = X β̂ is a function of β̂).
