STAT 100C: Linear models


1 STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56

2 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix Simple linear regression (as special case) Statistical properties of estimators and related vectors Gauss Markov Theorem Generalized Least-Squares 2 / 56

3 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix Simple linear regression (as special case) Statistical properties of estimators and related vectors Gauss Markov Theorem Generalized Least-Squares 3 / 56

4 Linear model A multiple linear regression (MLR) model, or simply a linear model, is, in its population version: y = β_0 + β_1 x_1 + ⋯ + β_p x_p + ε, where µ := β_0 + β_1 x_1 + ⋯ + β_p x_p denotes the non-random part. y is the response, or dependent variable (observed). x_j, j = 1, ..., p are independent, or explanatory, or regressor variables, or predictors or covariates. β_j are fixed unknown parameters (unobserved). x_j are fixed, i.e., deterministic for now (observed); alternatively, we work conditioned on {x_j}. ε is random: the noise, random error, unexplained variation, and the only source of randomness in the model. Assumption: E[ε] = 0, so that E[y] = µ + E[ε] = µ. The goal is to estimate the β_j. 4 / 56

5 Examples Give an example: Explaining 100C grades. Gas consumption data. Oral contraceptives. 5 / 56

6 Sampled version Our data or observation is a collection of n i.i.d. samples from this model: we observe {y_i, x_{i1}, ..., x_{ip}}, i = 1, ..., n, and y_i = β_0 + β_1 x_{i1} + ⋯ + β_p x_{ip} + ε_i = β_0 + Σ_{j=1}^p β_j x_{ij} + ε_i. Matrix-vector form: Let x_j = (x_{1j}, ..., x_{nj}) ∈ R^n, x_0 := 1 := (1, ..., 1) ∈ R^n, ε = (ε_1, ..., ε_n), y = (y_1, ..., y_n) ∈ R^n. Then, y = β_0 1 + β_1 x_1 + ⋯ + β_p x_p + ε = Σ_{j=0}^p β_j x_j + ε, where everything is now a vector except the β_j. 6 / 56

7 Let β = (β_0, β_1, ..., β_p) ∈ R^{p+1}, and let X = (x_0 x_1 ⋯ x_p) ∈ R^{n×(p+1)}. Since Xβ = Σ_{j=0}^p β_j x_j, we have y = Xβ + ε, with mean vector µ = Xβ. This is called the multiple linear regression (MLR) model. 7 / 56

8 Assumptions Assumptions on the noise in y = Xβ + ε (with µ = Xβ): (E1) E[ε] = 0, cov(ε) = σ² I_n, and {ε_i} independent. (E2) Distributional assumption: ε ∼ N(0, σ² I_n). (E1) implies that E[y] = µ = Xβ and cov(y) = σ² I_n (why?). (E2) implies y ∼ N(Xβ, σ² I_n). Assumptions on the design matrix: (X1) X ∈ R^{n×(p+1)} is fixed (nonrandom) and has full column rank, that is, p + 1 ≤ n and rank(X) = p + 1. 8 / 56

9 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix Simple linear regression (as special case) Statistical properties of estimators and related vectors Gauss Markov Theorem Generalized Least-Squares 9 / 56

10 Maximum likelihood estimation (MLE) We are interested in estimating both β and σ² (the noise variance). The MLE requires a distributional assumption; here we assume (E2). The PDF of y ∼ N(µ, σ² I) is f(y) = (2π)^{−n/2} |σ² I_n|^{−1/2} exp( −(1/2) (y − µ)^T (σ² I_n)^{−1} (y − µ) ) = (2πσ²)^{−n/2} exp( −‖y − µ‖² / (2σ²) ). Viewed as a function of β and σ², this is the likelihood L(β, σ² | y) = (2πσ²)^{−n/2} exp( −‖y − Xβ‖² / (2σ²) ). Let us estimate β first: maximizing the likelihood is equivalent to minimizing S(β) := ‖y − Xβ‖² = Σ_{i=1}^n [y_i − (Xβ)_i]². The problem min_β S(β) is called the least-squares (LS) problem. 10 / 56

11 Use the chain rule to compute the gradient of S(β) and set it to zero: ∂S(β)/∂β_k = −2 Σ_{i=1}^n [∂(Xβ)_i/∂β_k] [y_i − (Xβ)_i], with ∂(Xβ)_i/∂β_k = X_{ik}. Hence ∂S(β)/∂β_k = −2 [X^T (y − Xβ)]_k, or ∇S(β) = −2 X^T (y − Xβ). Setting to zero gives the normal equations: ∇S(β̂) = 0 ⟺ (X^T X) β̂ = X^T y. If p + 1 ≤ n and if X ∈ R^{n×(p+1)} is full-rank, i.e., rank(X) = p + 1, then X^T X is invertible and we get β̂ = (X^T X)^{−1} X^T y. 11 / 56
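
A minimal numerical sketch of the normal equations on simulated data (the design, coefficients, and noise level below are made up for illustration; numpy's generic least-squares routine is used only as a cross-check):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # n x (p+1) design with intercept
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)           # y = X beta + eps

# Solve the normal equations (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against numpy's generic least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))                    # True
```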

12 Remark 1 We have shown that the maximum likelihood estimate of β under the Gaussian assumption is the same as the least-squares (LS) estimate. The LS estimate makes sense in general, even if we make no distributional assumption. 12 / 56

13 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix Simple linear regression (as special case) Statistical properties of estimators and related vectors Gauss Markov Theorem Generalized Least-Squares 13 / 56

14 Geometric interpretation of LS. (Section 4.2.1) The least-squares problem is equivalent to min_{β ∈ R^{p+1}} ‖y − Xβ‖² ⟺ min_{µ ∈ Im(X)} ‖y − µ‖², since, we recall, Im(X) = {Xβ : β ∈ R^{p+1}}. That is, we are trying to find the projection µ̂ of y onto Im(X): µ̂ = argmin_{µ ∈ Im(X)} ‖y − µ‖². By the orthogonality principle, the residual e = y − µ̂ is perpendicular to Im(X), i.e., y − µ̂ ∈ [Im(X)]^⟂ = ker(X^T) ⟺ X^T (y − µ̂) = 0. µ̂ ∈ Im(X) means there is at least one β̂ ∈ R^{p+1} such that µ̂ = Xβ̂, and X^T (y − Xβ̂) = 0, which are the same normal equations for β̂. 14 / 56

15 [Figure: projection of y onto Im(X), showing the fitted value µ̂, the residual e = y − µ̂, and the error ε = y − µ.] Remark 2 The projection µ̂ is unique in general, but β̂ need not be. If X has full (column) rank, then β̂ is unique. 15 / 56

16 Consequences of geometric interpretation The residual vector: e = y − µ̂ = y − Xβ̂ ∈ R^n. The vector of fitted values: µ̂ = Xβ̂. Sometimes µ̂ is also referred to as ŷ. The following hold (recall that x_j is the jth column of X): e ⟂ Im(X). Since 1, x_j ∈ Im(X), we have e ⟂ 1 and e ⟂ x_j for j = 1, ..., p, that is, Σ_{i=1}^n e_i = 0 and Σ_{i=1}^n e_i x_{ij} = 0 for j = 1, ..., p. Residuals are orthogonal to all the covariate vectors. Since µ̂ ∈ Im(X), we have e ⟂ µ̂, i.e., Σ_i e_i µ̂_i = 0. S(β̂) = ‖e‖² = min_β S(β). 16 / 56
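
These orthogonality facts are easy to verify numerically; a small sketch on simulated data (all quantities below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat                       # residual vector
mu_hat = X @ beta_hat                      # fitted values

print(np.allclose(X.T @ e, 0))             # e is orthogonal to every column of X
print(np.isclose(e.sum(), 0))              # in particular, residuals sum to zero (intercept column)
print(np.isclose(e @ mu_hat, 0))           # and e is orthogonal to the fitted values
```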

17 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix Simple linear regression (as special case) Statistical properties of estimators and related vectors Gauss Markov Theorem Generalized Least-Squares 17 / 56

18 Estimation of σ² Back to the likelihood, substituting β̂ for β: L(β̂, σ² | y) = (2πσ²)^{−n/2} exp( −S(β̂) / (2σ²) ), which we want to maximize over σ². This is equivalent to maximizing the log-likelihood log L(β̂, σ² | y) = −(n/2) log(2πσ²) − S(β̂)/(2σ²) over σ², or maximizing ℓ(β̂, v | y) := log L(β̂, v | y) = const. − (1/2) [n log v + v^{−1} S(β̂)] over v (change of variable v = σ²). The problem reduces to v̂ := argmax_{v>0} ℓ(β̂, v | y) = argmin_{v>0} [n log v + v^{−1} S(β̂)]. Setting the derivative to zero gives σ̂² = v̂ = S(β̂)/n. (Check that this is indeed the maximizer.) 18 / 56

19 For reasons that will become clear later, we often use the following modified estimator: s² = S(β̂) / (n − (p + 1)). Compare with the MLE for σ²: σ̂² = S(β̂) / n. We will see that s² is unbiased for σ² while σ̂² is not. Note that S(β̂) is the sum of squares of residuals: S(β̂) = Σ_{i=1}^n (y_i − (Xβ̂)_i)² = Σ_{i=1}^n e_i² = ‖e‖². 19 / 56
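
A quick sketch comparing the two variance estimators on simulated data (the coefficients and σ below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 30, 2, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(scale=sigma, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
sse = np.sum((y - X @ beta_hat) ** 2)   # S(beta_hat) = ||e||^2

sigma2_mle = sse / n                    # MLE of sigma^2 (biased downward)
s2 = sse / (n - (p + 1))                # modified, unbiased estimator
print(sigma2_mle, s2)                   # both should be in the vicinity of sigma^2 = 4
```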

20 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix Simple linear regression (as special case) Statistical properties of estimators and related vectors Gauss Markov Theorem Generalized Least-Squares 20 / 56

21 The hat matrix Recall that β̂ = (X^T X)^{−1} X^T y. We have µ̂ = Xβ̂ = X (X^T X)^{−1} X^T y = H y, where H := X (X^T X)^{−1} X^T. H is called the hat matrix. It is an example of an orthogonal projection matrix. These matrices have a lot of properties. 21 / 56

22 Properties of projection matrices These matrices have a lot of properties: Lemma 1 For any vector y ∈ R^n, Hy ∈ R^n is the orthogonal projection of y onto Im(X). H is symmetric: H^T = H. (Exercise: use (BCD)^T = D^T C^T B^T.) H is idempotent: H² = H. (Exercise.) This is what we expect from a projection matrix: if we project Hy onto Im(X), we should get back the same thing, H(Hy) = Hy for all y. A matrix is an orthogonal projection matrix if and only if it is symmetric and idempotent. I − H is also a projection matrix (symmetric and idempotent): for every y ∈ R^n, (I − H)y is the projection of y onto [Im(X)]^⟂. Note that (I − H)y = e. To summarize, we can decompose every y ∈ R^n as y = µ̂ + e = Hy + (I − H)y, with Hy ∈ Im(X) and (I − H)y ∈ [Im(X)]^⟂. 22 / 56
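
A short numerical check of these properties, with a simulated design matrix (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix

print(np.allclose(H, H.T))                  # symmetric
print(np.allclose(H @ H, H))                # idempotent
print(np.isclose(np.trace(H), p + 1))       # trace (= rank) equals p + 1

y = rng.normal(size=n)
mu_hat, e = H @ y, (np.eye(n) - H) @ y
print(np.allclose(mu_hat + e, y))           # y decomposes as mu_hat + e
print(np.isclose(mu_hat @ e, 0))            # and the two pieces are orthogonal
```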

23 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix Simple linear regression (as special case) Statistical properties of estimators and related vectors Gauss Markov Theorem Generalized Least-Squares 23 / 56

24 Gram matrix Write X = (x_0 x_1 x_2 ⋯ x_p) ∈ R^{n×(p+1)}. Here x_j is the jth column of X. Verify the following useful result: (X^T X)_{ij} = x_i^T x_j = ⟨x_i, x_j⟩ for i, j = 0, 1, ..., p. Thus, the entries of X^T X are the pairwise inner products of the columns of X. For example, (X^T X)_{ij} = 0 means x_i and x_j are orthogonal. X^T X is called the Gram matrix of {x_0, x_1, ..., x_p}. Exercise: Show that X^T X is always PSD. 24 / 56

25 Gram matrix (Optional) Theorem 1 (Rank of Gram matrix) Let G = X^T X be the Gram matrix of X. Then rank(G) = rank(X). If X ∈ R^{n×(p+1)}, then G ∈ R^{(p+1)×(p+1)} and the theorem says that rank(X) = p + 1 if and only if rank(G) = p + 1, i.e., G is invertible. Proof: Recall that if A ∈ R^{n×(p+1)} then rank(A) + null(A) = p + 1. Since G and X have the same input dimension, it is enough to show null(G) = null(X). Assume that u ∈ ker(G), that is, X^T X u = 0. Then u^T X^T X u = 0 ⟹ ‖Xu‖² = 0 ⟹ Xu = 0, i.e., u ∈ ker(X). Now assume u ∈ ker(X), i.e., Xu = 0. Then X^T X u = 0, i.e., u ∈ ker(G). Thus, we have shown ker(G) = ker(X). Recalling that nullity is the dimension of the kernel, we are done. An alternative simple proof is based on the SVD of X. 25 / 56
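
A small sketch verifying the inner-product structure, positive semidefiniteness, and the rank identity on a simulated design (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(10), rng.normal(size=(10, 2))])
G = X.T @ X                                     # Gram matrix

print(np.isclose(G[1, 2], X[:, 1] @ X[:, 2]))   # entries are pairwise inner products of columns
print(np.all(np.linalg.eigvalsh(G) >= -1e-10))  # all eigenvalues nonnegative: G is PSD
print(np.linalg.matrix_rank(G) == np.linalg.matrix_rank(X))  # rank(G) = rank(X)
```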

26 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix Simple linear regression (as special case) Statistical properties of estimators and related vectors Gauss Markov Theorem Generalized Least-Squares 26 / 56

27 Example 1 (Simple linear regression) This is the case p = 1, and the model is y = β_0 1 + β_1 x_1 + ε. We have X = [1 x_1] ∈ R^{n×2}. Assumption (X1) is satisfied if n ≥ 2 and x_1 is not a constant multiple of 1. We have β̂ = ((1/n) X^T X)^{−1} (1/n) X^T y, where (1/n) X^T X = [1, x̄; x̄, \overline{x^2}] and (1/n) X^T y = (ȳ, \overline{xy})^T, so that β̂ = 1/(\overline{x^2} − (x̄)²) · [\overline{x^2}, −x̄; −x̄, 1] (ȳ, \overline{xy})^T. From which it follows that β̂_1 = (\overline{xy} − x̄ ȳ)/(\overline{x^2} − (x̄)²) = ρ_xy / ρ_xx, and β̂_0 = ȳ − β̂_1 x̄, where ρ_xy and ρ_xx are defined on the next slide. 27 / 56

28 Example (Simple linear regression (cont'd)) From which it follows that β̂_1 = (\overline{xy} − x̄ ȳ)/(\overline{x^2} − (x̄)²) = ρ_xy / ρ_xx, and β̂_0 = ȳ − β̂_1 x̄, where ρ_xy = (1/n) Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) = \overline{xy} − x̄ ȳ, and ρ_xx = (1/n) Σ_i (x_i − x̄)² = \overline{x^2} − (x̄)². The formula for β̂_0 is easier to see if one writes down the normal equations as ((1/n) X^T X) β̂ = (1/n) X^T y and solves for β̂_0 in terms of β̂_1. Note also that var(β̂_1) = (σ²/n) [((1/n) X^T X)^{−1}]_{22} = σ² / (n ρ_xx). 28 / 56
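
A small sketch checking the closed-form β̂_0, β̂_1 against the general formula, on simulated data (the intercept, slope, and noise level are made up):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)

rho_xy = np.mean(x * y) - x.mean() * y.mean()   # sample covariance (1/n convention)
rho_xx = np.mean(x * x) - x.mean() ** 2         # sample variance (1/n convention)
b1 = rho_xy / rho_xx
b0 = y.mean() - b1 * x.mean()

# Same answer from the general formula beta_hat = (X^T X)^{-1} X^T y
X = np.column_stack([np.ones(n), x])
print(np.allclose([b0, b1], np.linalg.solve(X.T @ X, X.T @ y)))
```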

29 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix Simple linear regression (as special case) Statistical properties of estimators and related vectors Gauss Markov Theorem Generalized Least-Squares 29 / 56

30 Sampling distribution Both β̂ and σ̂² are random quantities (due to the randomness in ε). The distribution of an estimate, called its sampling distribution, allows us to quantify the uncertainty in the estimate. Basic properties like the mean, the variance, and covariances can be determined under our basic assumption (E1). To determine the full sampling distribution, we need to assume some distribution for the noise vector ε, e.g. (E2). Recall that E[y] = µ = Xβ and cov(y) = σ² I. 30 / 56

31 Properties of β̂ Let us write A = (X^T X)^{−1} X^T ∈ R^{(p+1)×n}, so that β̂ = Ay. Note that AX = I_{p+1}. Proposition 1 Under the linear model y = Xβ + ε, 1. β̂ is an unbiased estimate of β, i.e., E[β̂] = β; 2. with covariance matrix cov(β̂) = σ² (X^T X)^{−1}. Exercise: Prove this. For the covariance, note AA^T = (X^T X)^{−1}. 31 / 56
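
Proposition 1 can be illustrated by simulation: repeatedly redraw the noise for a fixed design and compare the empirical mean and covariance of β̂ with β and σ² (X^T X)^{−1}. A sketch (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma = 50, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # fixed design, reused across repetitions
beta = np.array([1.0, 2.0, -1.0])
A = np.linalg.inv(X.T @ X) @ X.T                             # beta_hat = A y

draws = np.array([A @ (X @ beta + rng.normal(scale=sigma, size=n))
                  for _ in range(20000)])                    # beta_hat over many noise draws

print(draws.mean(axis=0))                        # close to beta (unbiasedness)
print(np.cov(draws.T))                           # close to sigma^2 (X^T X)^{-1}
print(sigma ** 2 * np.linalg.inv(X.T @ X))
```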

32 Properties of a^T β̂: Consider a^T β̂ for some nonrandom a ∈ R^{p+1}. This is interesting since, e.g., with a = (0, 1, −1, 0, ..., 0) we have a^T β̂ = β̂_1 − β̂_2. In general, E[a^T β̂] = a^T E[β̂] = a^T β, and var(a^T β̂) = a^T cov(β̂) a = σ² a^T (X^T X)^{−1} a. The first equation shows that a^T β̂ is an unbiased estimate of a^T β. 32 / 56

33 Fitted values The vector of fitted values is µ̂ = Xβ̂ = Hy. µ̂ is an unbiased estimate of µ: E[µ̂] = X E[β̂] = Xβ = µ. Approach 2: µ = Xβ, so that µ ∈ Im(X) and hence Hµ = µ (verify directly!): E[µ̂] = H E[y] = Hµ = µ. The covariance matrix is cov(µ̂) = cov(Hy) = H (σ² I_n) H^T = σ² HH = σ² H. 33 / 56

34 Residuals The residual is e = y − Xβ̂ = y − µ̂ = (I − H)y ∈ R^n. We have E[e] = E[y − µ̂] = E[y] − E[µ̂] = µ − µ = 0, and cov(e) = (I − H)(σ² I_n)(I − H)^T = σ² (I − H)² = σ² (I − H). 34 / 56

35 Joint behavior of (β̂, e) ∈ R^{(p+1+n)×1} Recall A = (X^T X)^{−1} X^T. Stack β̂ on top of e. We have β̂ = Ay and e = (I − H)y. Since H = X (X^T X)^{−1} X^T = XA, the stacked vector satisfies (β̂; e) = (A; I − H) y =: P y, and E[(β̂; e)] = (E[β̂]; E[e]) = (β; 0). (1) 35 / 56

36 The covariance matrix is (recall that I − H is symmetric and idempotent): cov((β̂; e)) = P cov(y) P^T = σ² P P^T = σ² (A; I − H)(A^T, (I − H)^T) = σ² [AA^T, A(I − H); (I − H)A^T, I − H] = [σ² (X^T X)^{−1}, 0; 0, σ² (I − H)], using A(I − H) = 0 (see the next slide). Note that the first diagonal block matches the covariance of β̂, as expected. 36 / 56

37 Sidenote Why is A(I − H) = 0? Algebraic calculation: AH = (X^T X)^{−1} X^T [XA] = A. Geometric interpretation: A^T = X (X^T X)^{−1}, hence Im(A^T) ⊆ Im(X) (check!). This means that H leaves A^T intact: HA^T = A^T. 37 / 56

38 Important consequences Proposition 2 Under the linear regression model y = Xβ + ε and assumption (E1), cov((β̂; e)) = [σ² (X^T X)^{−1}, 0; 0, σ² (I − H)], (2) where β̂ is the LS estimate of the regression coefficient and e is the residual. Under (E1), β̂ and e are uncorrelated. Under (E2), we have: (β̂, e) has a multivariate normal (MVN) distribution (why?) with mean vector (β, 0) and covariance matrix (2). β̂ and e are independent (why?). S(β̂) = ‖e‖² is independent of β̂ (why?). Similarly, s² is independent of β̂. e and µ̂ are independent (µ̂ = Xβ̂ is a function of β̂). 38 / 56

39 Distribution of s²? To answer this question, we need to talk about quadratic forms (QF). For now, let us just accept the following result, assuming (E2): S(β̂)/σ² = ‖e‖²/σ² ∼ χ²_{n−(p+1)}. (3) Recall that s² = S(β̂)/(n − (p + 1)) = ‖e‖²/(n − (p + 1)). We know that the mean of χ²_{n−(p+1)} is n − (p + 1). Hence, E[s²/σ²] = 1, showing that s² is unbiased for σ². 39 / 56
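
A simulation sketch of (3), assuming (E2) and using a scipy goodness-of-fit test as a rough check (all values below are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p, sigma = 25, 2, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, -1.0, 0.5])
H = X @ np.linalg.inv(X.T @ X) @ X.T
df = n - (p + 1)

sse = np.array([np.sum(((np.eye(n) - H) @ (X @ beta + rng.normal(scale=sigma, size=n))) ** 2)
                for _ in range(5000)])            # S(beta_hat) over repeated noise draws

# S(beta_hat) / sigma^2 should behave like chi-square with n - (p+1) degrees of freedom
print(stats.kstest(sse / sigma ** 2, "chi2", args=(df,)).pvalue)  # typically not small
print((sse / df).mean())                          # ~ sigma^2 = 4, i.e., s^2 is unbiased
```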

40 Sidenote (optional) We could have shown the unbiasedness of s² directly without knowing (3). Under assumption (E1), E‖e‖² = Σ_{i=1}^n E e_i² = Σ_i var(e_i) = tr(cov(e)) = σ² tr(I − H) = σ² (n − p − 1)... which needs some arguing using the rank of I − H, which is n − p − 1. Remark 3 The rank of an orthogonal projection matrix is equal to the dimension of the space it is projecting onto: rank(H) = dim(Im(H)). I − H is projecting onto [Im(X)]^⟂, hence rank(I − H) = dim([Im X]^⟂) = n − dim(Im X) = n − rank(X) = n − p − 1. 40 / 56

41 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix Simple linear regression (as special case) Statistical properties of estimators and related vectors Gauss Markov Theorem Generalized Least-Squares 41 / 56

42 Gauss-Markov Theorem It says that in the linear regression model, the LSE of β is BLUE. LSE = least-squares estimator. BLUE = best linear unbiased estimator. Among all linear unbiased estimators of β, β̂ = (X^T X)^{−1} X^T y has the smallest variance (in fact, it says more). It shows that the LSE is in some sense optimal. 42 / 56

43 PSD matrices revisited A matrix A ∈ R^{n×n} is PSD if x^T A x ≥ 0 for all x ∈ R^n. We write A ⪰ 0 when A is PSD. We also write A ⪰ B iff A − B ⪰ 0. Lemma 2 A matrix A ∈ R^{n×n} is PSD if and only if A = ZZ^T for some matrix Z ∈ R^{n×m}. If A ⪰ 0 then A = A^{1/2} A^{1/2}; take Z = A^{1/2}. In the other direction, assume A = ZZ^T for some Z. Then x^T A x = x^T Z Z^T x = (Z^T x)^T (Z^T x) = ‖Z^T x‖² ≥ 0. Exercise: If A is PSD, any principal submatrix of A is PSD. In particular A_{ii} ≥ 0 (diagonal elements are nonnegative). 43 / 56

44 Theorem 2 (Gauss-Markov) Consider the linear regression model y = Xβ + ε with E[ε] = 0 and cov(ε) = σ² I_n. Let β̂ = (X^T X)^{−1} X^T y be the LSE of β, and let b̂ be any other unbiased linear estimator of β. Then cov(b̂) ⪰ cov(β̂), that is, cov(b̂) − cov(β̂) is PSD. This implies var(b̂_j) − var(β̂_j) ≥ 0 for all j. 44 / 56

45 Proof We have b̂ = My for some matrix M ∈ R^{(p+1)×n}, and E[b̂] = β for all β ∈ R^{p+1} (why?). Expanding the above, conclude that MX = I. We also recall that β̂ = Ay where A = (X^T X)^{−1} X^T (and E[β̂] = β). What are the covariance matrices? cov(b̂) = σ² MM^T and cov(β̂) = σ² AA^T = σ² (X^T X)^{−1}. Let M̃ := M − A. Then M̃ M̃^T = MM^T − MA^T − AM^T + AA^T, and MA^T = MX (X^T X)^{−1} = (X^T X)^{−1} = AA^T (similarly AM^T = AA^T). It follows that M̃ M̃^T = MM^T − AA^T. Hence, cov(b̂) − cov(β̂) = σ² M̃ M̃^T ⪰ 0. 45 / 56
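
A numerical illustration of the theorem: build some other linear unbiased estimator b̂ = My with MX = I and check that cov(b̂) − cov(β̂) is PSD. A sketch (the particular M below is an arbitrary perturbation of A, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
A = np.linalg.inv(X.T @ X) @ X.T            # LSE: beta_hat = A y

# Another linear unbiased estimator b_hat = M y: any M = A + N with N X = 0 satisfies M X = I
N = rng.normal(size=(p + 1, n))
N = N - (N @ X) @ A                         # enforce N X = 0 (using A X = I)
M = A + N
print(np.allclose(M @ X, np.eye(p + 1)))    # unbiasedness condition M X = I holds

diff = M @ M.T - A @ A.T                    # (cov(b_hat) - cov(beta_hat)) / sigma^2
print(np.all(np.linalg.eigvalsh(diff) >= -1e-10))  # PSD, as the theorem predicts
```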

46 Remark 4 Gauss-Markov only says that β̂ is the best within the class of linear unbiased estimates. It might be that a nonlinear unbiased estimator has a smaller variance. However, if we assume normality, i.e., ε ∼ N(0, σ² I), then β̂ is the best unbiased estimator among all estimators, linear or nonlinear. This is a consequence of the Cramér-Rao bound: cov(β̂) = [I(β)]^{−1}, where I(β) is the Fisher information matrix. Remark 5 A consequence of Gauss-Markov is that a^T β̂ is BLUE for a^T β (for any a). 46 / 56

47 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix Simple linear regression (as special case) Statistical properties of estimators and related vectors Gauss Markov Theorem Generalized Least-Squares 47 / 56

48 Generalized least-squares (GLS) (Section 4.6) Now relax the assumption of uncorrelated noise with equal variance in y = Xβ + ε, cov(ε) = σ² I. Some applications where this assumption is not reasonable: Case 1: Each y_i is obtained from averaging m_i independent measurements. If the noise in each measurement has variance σ², then var(y_i) = var(ε_i) = σ²/m_i. The m_i could be different for i = 1, ..., n: cov(ε) = σ² diag(1/m_1, 1/m_2, ..., 1/m_n). 48 / 56

49 Case 2: Serial correlation among noise variables. This often happens when dealing with time series. For example: y_t = β_0 + β_1 x_{1t} + β_2 x_{2t} + ε_t, where y_t is the sales quantity, x_{1t} the price, x_{2t} how much is spent on ads, and ε_t the error, which could contain hidden variables not included in the model, like economic factors. It is reasonable to assume that y_t depends on y_{t−1} in some form. One way to model this is to assume dependence among {ε_t}, for example, a first-order autoregressive, a.k.a. AR(1), process: ε_t = φ ε_{t−1} + z_t, z_t iid ∼ N(0, σ²), for t = 1, ..., n and ε_0 = 0. 49 / 56

50 For example, a first-order autoregressive, a.k.a. AR(1), process: ε_t = φ ε_{t−1} + z_t, z_t iid ∼ N(0, σ²), for t = 1, ..., n and ε_0 = 0. Exercise: Show that cov(ε_t, ε_{t−r}) = σ² φ^r for r ≤ t − 1. The covariance matrix has Toeplitz structure, with (t, s) entry proportional to φ^{|t−s|}:
cov(ε) = σ² ×
[ 1        φ        φ²       φ³      ⋯  φ^{n−1}
  φ        1        φ        φ²      ⋯  φ^{n−2}
  φ²       φ        1        φ       ⋯  φ^{n−3}
  φ³       φ²       φ        1       ⋯
  ⋮                                      ⋮
  φ^{n−1}                            ⋯  1 ]
Case 3: Spatial observations. 50 / 56
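
This Toeplitz matrix is straightforward to build; a one-liner sketch using scipy (the values of φ, σ², and n are arbitrary):

```python
import numpy as np
from scipy.linalg import toeplitz

phi, sigma2, n = 0.6, 1.0, 6
V = sigma2 * toeplitz(phi ** np.arange(n))   # (i, j) entry: sigma^2 * phi^{|i - j|}
print(V.round(3))
```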

51 Generalized least-squares (GLS) Consider y = Xβ + ε but with a more general noise covariance: cov(ε) = σ² V. We assume V ∈ R^{n×n} is known but σ² is unknown. Assume that V is nonsingular (i.e., invertible). Since V ⪰ 0, it makes sense to talk about V^{1/2}; because of nonsingularity, V^{−1/2} also makes sense. Consider ỹ := V^{−1/2} y = V^{−1/2} X β + V^{−1/2} ε =: X̃ β + ε̃. Note that cov(ε̃) = V^{−1/2} cov(ε) V^{−1/2} = σ² V^{−1/2} V V^{−1/2} = σ² I. So, by transforming the measurements and the design matrix, we get back the standard regression setup (i.e., an uncorrelated noise vector). 51 / 56

52 Why transform? Gauss-Markov only applies to the uncorrelated case. By Gauss-Markov, the following is the BLUE estimator of β: β̂_GLS = (X̃^T X̃)^{−1} X̃^T ỹ = (X^T V^{−1} X)^{−1} X^T V^{−1} y. (Do the plug-in as an exercise.) We have E[β̂_GLS] = β and cov(β̂_GLS) = σ² (X̃^T X̃)^{−1} = σ² (X^T V^{−1} X)^{−1}. 52 / 56
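
A simulation sketch of GLS versus OLS under correlated noise (the AR(1)-style V, the coefficients, and the sample size below are hypothetical):

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(9)
n, phi, sigma = 200, 0.7, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -1.0])
V = toeplitz(phi ** np.arange(n))                 # assumed-known correlation structure

L = np.linalg.cholesky(V)                         # draw noise with covariance sigma^2 V
y = X @ beta + sigma * (L @ rng.normal(size=n))

Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_gls)                                   # both are near beta, but GLS has the smaller
print(beta_ols)                                   # covariance, sigma^2 (X^T V^{-1} X)^{-1}
```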

53 The SSE will be S(β̂_GLS) = ‖ỹ − X̃ β̂_GLS‖² = e_GLS^T V^{−1} e_GLS, where e_GLS = y − X β̂_GLS. Since the transformed model is a standard regression, we have S(β̂_GLS)/σ² ∼ χ²_{n−p−1}, and the unbiased estimate of the variance is s²_GLS = S(β̂_GLS)/(n − p − 1). 53 / 56

54 What about the ordinary least-squares estimate β̂_OLS, which ignores the noise dependence (i.e., is derived under the wrong assumptions)? β̂_OLS = (X^T X)^{−1} X^T y is still unbiased. However, Gauss-Markov implies that β̂_OLS has higher variance than β̂_GLS: cov(β̂_OLS) ⪰ cov(β̂_GLS). Exercise: Show that β̂_GLS = (X^T V^{−1} X)^{−1} X^T V^{−1} y is the solution of the following (quadratic) optimization problem: min_β Q(β), where Q(β) = (y − Xβ)^T V^{−1} (y − Xβ) = ‖V^{−1/2} (y − Xβ)‖². 54 / 56

55 Weighted least squares (WLS) Assume that we want to weigh the samples differently (i.e., discount or emphasize the effect of some samples). Given weights w_1, ..., w_n we can solve min_β S_w(β) = Σ_{i=1}^n w_i (y_i − x_i^T β)², where x_i^T is the ith row of X. The objective can be written as S_w(β) = (y − Xβ)^T W (y − Xβ), where W = diag(w_1, w_2, ..., w_n). Thus WLS can be considered a special case of GLS with V^{−1} = W. 55 / 56

56 The solution simplifies to β̂_WLS = (Σ_{i=1}^n w_i x_i x_i^T)^{−1} (Σ_{i=1}^n w_i x_i y_i), with covariance matrix cov(β̂_WLS) = σ² [Σ_{i=1}^n w_i x_i x_i^T]^{−1}. 56 / 56
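
A sketch of WLS in the averaged-measurements setting of Case 1 (the weights and data below are simulated; taking w_i = m_i corresponds to V^{−1} = W):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 100
m = rng.integers(1, 10, size=n)                   # y_i is an average of m_i raw measurements
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
y = X @ beta + rng.normal(size=n) / np.sqrt(m)    # var(eps_i) = sigma^2 / m_i with sigma = 1

w = m.astype(float)                               # weights w_i proportional to 1 / var(eps_i)
W = np.diag(w)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Equivalent summation form: (sum_i w_i x_i x_i^T)^{-1} (sum_i w_i x_i y_i)
lhs = sum(wi * np.outer(xi, xi) for wi, xi in zip(w, X))
rhs = sum(wi * xi * yi for wi, xi, yi in zip(w, X, y))
print(np.allclose(beta_wls, np.linalg.solve(lhs, rhs)))   # True
```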
