Standard Linear Regression Model (SLM)


Ordinary Least Squares Estimator (OLSE)

Nathan Smooha

Abstract. Estimation of the Standard Linear Model by the Ordinary Least Squares Estimator: model specification, objective function, finite-sample and asymptotic properties of the OLSE, measurement of goodness of fit, tests of linear hypotheses, restricted LSE.

Standard Linear Regression Model (SLM)

Linear Regression Model

The data in economics usually cannot be generated by experiments, so both the dependent and independent variables have to be treated as random variables, variables whose values are subject to chance. A model is a set of restrictions on the joint distribution of the dependent and independent variables. That is, a model is a set of joint distributions satisfying a set of assumptions. A linear regression model specifies the dependent variable (regressand) y_t as the sum of a linear function of the K observable explanatory variables (regressors) x_tk, k = 1, 2, ..., K, and an unobservable error term u_t:

    y_t = X_t β + u_t,   t = 1, 2, ..., N    (1)

where the subscript t indicates the t-th observation, X_t is a K-dimensional row vector, and β is a K-dimensional column vector of unknown coefficients. In matrix form, for all N observations,

    y = Xβ + u

where the dimensions are y : N × 1, X : N × K, β : K × 1, and u : N × 1.

Ordinary Least Squares (OLS) Estimation Method

The OLS estimation method does not require information about the statistical properties of the variables in the model; it only needs a specified linear (or nonlinear) regression equation. The properties of the OLS estimator, of course, depend on the statistical properties of the variables. Although we do not observe the error term, we can calculate the value implied by a hypothetical value β̃ of β as y_t − X_t β̃; this is called the residual for observation t. From the residuals, form the sum of squared residuals (SSR):

    SSR(β̃) ≡ Σ_{t=1}^{N} (y_t − X_t β̃)² = (y − Xβ̃)'(y − Xβ̃)    (2)

This is also called the error sum of squares (ESS) or the residual sum of squares (RSS). It is a function of β̃ because the residual depends on it. The OLS estimate, β̂, of β is the β̃ that minimizes this function:

    β̂ ≡ argmin_{β̃} SSR(β̃)    (3)

As an example, if K = 2 and the first column of X is all 1's, then we want to find β̂_1 and β̂_2 such that the estimated regression line, ŷ_t = β̂_1 + x_t β̂_2, is close to the data points. Closeness is measured by the SSR. So the least squares method of estimation of the unknown coefficients β_k finds the vector β̂ that minimizes the sum of squared residuals. Since it depends on the sample (y, X), the OLS estimate β̂ is in general different from the true value β. By having squared residuals in the objective function, this method imposes a heavy penalty on large residuals; the OLS estimate is chosen to prevent large residuals for a few observations at the expense of tolerating relatively small residuals for many other observations.

A sure-fire way of solving the minimization problem is to derive the first-order conditions by setting the partial derivatives equal to zero.¹ Expanding the objective function,

    SSR(β̃) = (y − Xβ̃)'(y − Xβ̃)
            = y'y − β̃'X'y − y'Xβ̃ + β̃'X'Xβ̃
            = y'y − 2y'Xβ̃ + β̃'X'Xβ̃   (since β̃'X'y is a scalar and equals its transpose y'Xβ̃)
            ≡ y'y − 2a'β̃ + β̃'Aβ̃     (with a ≡ X'y and A ≡ X'X)

Recalling from matrix algebra that

    ∂(a'β̃)/∂β̃ = a   and   ∂(β̃'Aβ̃)/∂β̃ = 2Aβ̃   for A symmetric,

the K-dimensional column vector of partial derivatives is

    ∂SSR(β̃)/∂β̃ = −2a + 2Aβ̃

The first-order conditions are obtained by setting this partial derivative equal to zero. Recalling that a ≡ X'y and A ≡ X'X, we can write the first-order conditions as

    X'Xβ̂ = X'y    (4)

Here, we have replaced β̃ by β̂ because the OLS estimate β̂ is the β̃ that satisfies the first-order conditions. These K equations are called the normal equations.

The OLS predictor, or the vector of fitted values, of the dependent variable is defined as

    ŷ = Xβ̂    (5)

The vector of residuals evaluated at β̃ = β̂,

    û ≡ y − Xβ̂ = y − ŷ    (6)

is called the vector of OLS residuals. Its t-th element is û_t ≡ y_t − X_t β̂.

To be sure, the first-order conditions are just a necessary condition for minimization, so we have to check the second-order condition to make sure that β̂ achieves the minimum:

    ∂²SSR(β̃)/∂β̃∂β̃' = 2X'X > 0

The second-order condition is satisfied since X'X is positive definite.

¹ If h : R^K → R is a scalar-valued function of a K-dimensional column vector x, the derivative of h with respect to x is a K-dimensional column vector whose k-th element is ∂h(x)/∂x_k, where x_k is the k-th element of x. This K-dimensional column vector is called the gradient. Here, x is β̃ and the function h(x) is SSR(β̃).
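As an illustration of the normal equations (4), the following sketch (a hypothetical numpy example on simulated data, not part of the original notes) solves X'Xβ̂ = X'y directly and cross-checks the answer against a library least-squares routine.

```python
import numpy as np

# Hypothetical simulated sample (not from the notes): N observations, K regressors
rng = np.random.default_rng(0)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # first column is a constant
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.8, size=N)

# Solve the K normal equations X'X beta_hat = X'y (equation (4))
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Fitted values and OLS residuals (equations (5) and (6))
y_hat = X @ beta_hat
u_hat = y - y_hat
ssr = u_hat @ u_hat

# Cross-check against numpy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat, beta_lstsq, ssr)
```

Solving the normal equations with a linear solver rather than forming (X'X)⁻¹ explicitly is the usual numerically stable choice.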

Finite-Sample Properties of the OLSE

Since the estimator β̂ depends on the random sample (y, X), it is also random, and thus it has statistical properties. Note that the characteristics of the distribution of the estimator derived here are valid for any given sample size N. The OLSE is the best linear unbiased estimator (BLUE) under certain assumptions. We present the minimum set of assumptions needed for each property of the OLSE.

Assumption 1: rank(X) = K and rank(X, y) = K + 1

The assumption rank(X) = K implies that N ≥ K and that the columns of X (the observation vectors of each regressor) are linearly independent, i.e., none of the columns can be expressed as a linear combination of the other columns. This is called the assumption of full column rank. The regressors are said to be (perfectly) multicollinear if the assumption is not satisfied. The assumption rank(X, y) = K + 1 implies that there is no exact solution for the unknown coefficient β in the linear equation y = Xβ. If this rank were K, then y would be linearly dependent on the columns of X and we could find a nonzero vector β such that

    y = Σ_{k=1}^{K} β_k X_k

where X_k = (x_1k, ..., x_Nk)' denotes the k-th column of X.

Under these assumptions, the OLSE has the following properties:

(a) β̂ is unique

This holds because X'X is positive definite and nonsingular due to the assumption of full column rank. The normal equations can be solved uniquely for β̂ by premultiplying both sides by (X'X)⁻¹:

    β̂ = (X'X)⁻¹X'y    (7)

Viewed as a function of the sample (y, X), this is often called the OLS estimator. For any given sample (y, X), the value of this function is the OLS estimate. Note that β̂_k is an estimate of the unknown population parameter β_k. The predicted value of y can now be written as

    ŷ = Xβ̂ = X(X'X)⁻¹X'y = Py

where P ≡ X(X'X)⁻¹X' is N × N. The projection matrix P projects y onto the column space of X to obtain ŷ. P is symmetric and idempotent, and PX = X. If rank(X) < K, then X'X is positive semi-definite and singular, so β̂ is not unique because an infinite number of solutions exist.

The sampling error is defined as β̂ − β. It can be related to u as follows:

    β̂ − β = (X'X)⁻¹X'y − β             (by (7))
          = (X'X)⁻¹X'(Xβ + u) − β      (since y = Xβ + u)
          = β + (X'X)⁻¹X'u − β
          = (X'X)⁻¹X'u
          = Au                          (where A ≡ (X'X)⁻¹X')    (8)

(b) β̂ is a linear estimator

β̂ is a linear function of y, i.e., β̂ is linear in y:

    β̂_k = b_k1 y_1 + ... + b_kN y_N,   k = 1, ..., K

Example. For K = 2,

    (β̂_1, β̂_2)' = (X'X)⁻¹X'y

where the rows of (X'X)⁻¹X' = [b_kt] give

    β̂_1 = b_11 y_1 + b_12 y_2 + ... + b_1N y_N
    β̂_2 = b_21 y_1 + b_22 y_2 + ... + b_2N y_N

(c) The OLS regression residual, û

Note that since u is a random vector, û should not be considered an estimate of u. Recall that the vector of OLS regression residuals is

    û ≡ y − Xβ̂ = y − ŷ = [I_N − X(X'X)⁻¹X']y = My

where M ≡ I_N − X(X'X)⁻¹X'. The annihilator matrix M is symmetric and idempotent with rank N − K, and MX = 0.

(c.1) û = My = M(Xβ + u) = MXβ + Mu = Mu

(c.2) X'û = X'Mu = (MX)'u = 0. Thus, û is orthogonal to the columns of X, i.e.,

    X'û = (Σ_{t=1}^{N} x_t1 û_t, ..., Σ_{t=1}^{N} x_tK û_t)' = 0,   that is, X_k'û = Σ_{t=1}^{N} x_tk û_t = 0 for each k

This shows that the normal equations can be interpreted as the sample analogue of the orthogonality conditions E(X_t'u_t) = 0, which will be covered in the next subsection.

(c.3) Let X have a column of a constant, so x_t1 = 1 for t = 1, ..., N. From the result in (c.2),

    Σ_{t=1}^{N} x_t1 û_t = 0  ⇒  Σ_{t=1}^{N} û_t = 0  ⇒  N⁻¹ Σ_{t=1}^{N} û_t = ū = 0

As a result, since y = Xβ̂ + û, averaging over the observations gives ȳ = X̄β̂ + ū = X̄β̂.

To sum up, if X has a column of a constant, then Σ_{t=1}^{N} û_t = 0 and ȳ = X̄β̂, where ȳ and X̄ are the sample means of y and X. That is, the regression line passes through the point of sample means. If X does not have a column of a constant, the regression line does not in general pass through the point of sample means.
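These algebraic facts are easy to verify numerically. The sketch below (a hypothetical numpy example on simulated data, not from the notes) checks that P is idempotent with PX = X, that MX = 0, that X'û = 0, and that with a constant regressor the residuals sum to zero and the fit passes through the sample means.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 60, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([0.5, 1.5, -1.0]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
P = X @ XtX_inv @ X.T               # projection matrix
M = np.eye(N) - P                   # annihilator matrix
beta_hat = XtX_inv @ X.T @ y
u_hat = M @ y                       # OLS residuals, u_hat = My

print(np.allclose(P @ P, P))        # P is idempotent
print(np.allclose(P @ X, X))        # PX = X
print(np.allclose(M @ X, 0))        # MX = 0
print(np.allclose(X.T @ u_hat, 0))  # normal equations: X' u_hat = 0
# With a constant regressor, residuals sum to zero and ybar = Xbar beta_hat
print(np.isclose(u_hat.sum(), 0.0))
print(np.isclose(y.mean(), X.mean(axis=0) @ beta_hat))
```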

Assumption 2: E(u|X) = 0 (Strict Exogeneity)

This assumption implies that the regressors are strictly exogenous: the expectation (mean) is conditional on the regressors for all observations, i.e.,

    E(u_t|X_1, ..., X_N) = 0   (t = 1, ..., N)    (9)

To state the assumption differently, take, for any given observation t, the joint distribution of the NK + 1 random variables, f(u_t, X_1, ..., X_N), and consider the conditional distribution f(u_t|X_1, ..., X_N). The conditional mean E(u_t|X_1, ..., X_N) is, in general, a nonlinear function of (X_1, ..., X_N). The strict exogeneity assumption says that this function is a constant value of zero. It also implies that E(y|X) = Xβ. Note that if X is a fixed variate, then X is strongly exogenous. Weak exogeneity of X requires only E(u_t|X_t) = 0 for all t.

Assuming this constant to be zero is not restrictive if the regressors include a constant, because the equation can be rewritten so that the conditional mean of the error is zero. To see this, suppose that E(u_t|X) = µ and x_t1 = 1. The equation can be written as

    y_t = β_1 + β_2 x_t2 + ... + β_K x_tK + u_t = (β_1 + µ) + β_2 x_t2 + ... + β_K x_tK + (u_t − µ)

If we redefine β_1 to be β_1 + µ and u_t to be u_t − µ, the conditional mean of the new error term is zero. In virtually all applications, the regressors include a constant term.

Implications of Strict Exogeneity

The unconditional mean of the error term is zero:

    E(u_t) = 0   (t = 1, ..., N)    (10)

This is because, by the Law of Total Expectations,² E[E(u_t|X)] = E(u_t).

Under strict exogeneity, the regressors are orthogonal to the error term for all observations, i.e.,

    E(x_sk u_t) = 0   (s, t = 1, ..., N; k = 1, ..., K),   or   E(X_s'u_t) = (E(x_s1 u_t), ..., E(x_sK u_t))' = 0   (K × 1)   for all t, s    (11)

Proof. Since x_sk is an element of X, strict exogeneity implies E(u_t|x_sk) = E[E(u_t|X)|x_sk] = 0 by the Law of Iterated Expectations.³ It follows from this that

    E(x_sk u_t) = E[E(x_sk u_t|x_sk)]   (by the Law of Total Expectations)
                = E[x_sk E(u_t|x_sk)]   (by the linearity of conditional expectations)
                = 0

Strict exogeneity requires that the regressors be orthogonal not only to the error term from the same observation (i.e., E(x_tk u_t) = 0 for all k), but also to the error terms from the other observations (i.e., E(x_sk u_t) = 0 for all k and for s ≠ t). Because the mean of the error term is zero, the orthogonality conditions are equivalent to zero-correlation conditions. This is because

    cov(u_t, x_sk) = E(x_sk u_t) − E(x_sk)E(u_t)   (by the definition of covariance)
                   = E(x_sk u_t)                   (by (10))
                   = 0                             (by the orthogonality conditions (11))

In particular, for t = s, cov(x_tk, u_t) = 0. Therefore, strict exogeneity implies the requirement that the regressors be contemporaneously uncorrelated with the error term.

(d) β̂ is unbiased: E(β̂|X) = β for all β

Proof.

    E(β̂|X) = E((X'X)⁻¹X'y|X)                   (by (7))
            = E((X'X)⁻¹X'(Xβ + u)|X)            (since y = Xβ + u)
            = E(β + (X'X)⁻¹X'u|X)
            = E(β|X) + E((X'X)⁻¹X'u|X)          (by the linearity of conditional expectations)
            = β + (X'X)⁻¹X'E(u|X)               (since β is a constant and (X'X)⁻¹X' is a function of X)
            = β                                 (by (9))

² The Law of Total Expectations states that E[E(y|x)] = E(y).
³ The Law of Iterated Expectations states that E[E(y|x, z)|x] = E(y|x).
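A small Monte Carlo experiment illustrates the unbiasedness result. The sketch below (hypothetical simulated design, not from the notes) holds X fixed across replications, as in the conditional argument above, and shows the average of β̂ over many draws of u settling near the true β.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, reps = 40, 3, 5000
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # fixed across replications
beta_true = np.array([1.0, -2.0, 0.5])
A = np.linalg.inv(X.T @ X) @ X.T                                # A = (X'X)^(-1) X'

draws = np.empty((reps, K))
for r in range(reps):
    u = rng.normal(scale=1.5, size=N)   # errors with E(u|X) = 0
    y = X @ beta_true + u
    draws[r] = A @ y                    # beta_hat = A y

print(draws.mean(axis=0))               # close to beta_true, illustrating E(beta_hat | X) = beta
```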

The OLS estimator β̂ is a function of the sample (y, X). Since (y, X) are random, so is β̂. Now imagine that we fix X at some given value, calculate β̂ for all samples corresponding to all possible realizations of y, and take the average of β̂. This average is the (population) conditional mean E(β̂|X). The result above says that this average equals the true value β. It should be emphasized that the strict exogeneity assumption is critical for proving unbiasedness; anything short of strict exogeneity will not do.

There is another notion of unbiasedness that is weaker than the unbiasedness above. The result above implies that E(β̂) = E[E(β̂|X)] = β, by the Law of Total Expectations. This says that if we calculated β̂ for all possible different samples, differing not only in y but also in X, the average would be the true value. Finally, β̂ minimizes the mean squared error E[(β̂ − β)'(β̂ − β)] among linear unbiased estimators of β (a consequence of the Gauss–Markov theorem below).

(e) Linear parametric functions

Let R be a J × K matrix of nonzero constants. An unbiased estimator of the linearly estimable linear parametric functions Rβ is given by Rβ̂.

(f) û is unbiased

The residual û is an unbiased estimator of the error term u in the sense that E(û − u|X) = 0.

Assumption 3: E(uu'|X) = σ²I_N (Spherical error covariance)

This assumption implies that the error terms u_t are homoskedastic (have the same variance) and uncorrelated with each other conditional on X:

    E(u_t²|X) = σ² > 0    (t = 1, 2, ..., N)    (12)
    E(u_t u_s|X) = 0      (t, s = 1, 2, ..., N; t ≠ s)    (13)

The homoskedasticity assumption (12) says that the conditional second moment, which in general is a nonlinear function of X, is a constant. Thanks to strict exogeneity, this condition can be stated equivalently in more familiar terms. Consider the conditional variance var(u_t|X). It equals the same constant because

    var(u_t|X) = E(u_t²|X) − [E(u_t|X)]²   (by the definition of conditional variance)
               = E(u_t²|X)                 (by strict exogeneity)

Similarly, (13) is equivalent to the requirement that

    cov(u_t, u_s|X) = 0   (t, s = 1, 2, ..., N; t ≠ s)

That is, in the joint distribution of (u_t, u_s) conditional on X, the covariance is zero. In the context of time-series models, (13) states that there is no serial correlation in the error term. The discussion above shows that the assumption can also be written as

    cov(u|X) = σ²I_N

With the third assumption added to the previous two, we have the following key results for the OLSE.

(g) The covariance matrix of β̂

    V ≡ cov(β̂|X) = σ²(X'X)⁻¹

Proof.

    cov(β̂|X) = cov(β̂ − β|X)   (since β is not random)
              = cov(Au|X)       (by (8))
              = A cov(u|X) A'   (since A is a function of X)
              = A(σ²I_N)A'      (since cov(u|X) = σ²I_N)
              = σ²AA'
              = σ²(X'X)⁻¹       (since AA' = (X'X)⁻¹X'X(X'X)⁻¹ = (X'X)⁻¹)

So cov(Rβ̂|X) = R cov(β̂|X) R' = σ²R(X'X)⁻¹R' = RVR' for a matrix R of constants.

(h) Gauss–Markov Theorem

The OLSE is efficient in the class of linear unbiased estimators. That is, for any unbiased estimator β̃ that is linear in y, cov(β̃|X) ≥ cov(β̂|X) in the matrix sense, i.e., cov(β̃|X) − cov(β̂|X) is positive semi-definite. So

    a'[cov(β̃|X) − cov(β̂|X)]a ≥ 0   or   a'cov(β̃|X)a ≥ a'cov(β̂|X)a

for any K-dimensional vector a. In particular, consider a special vector whose elements are all 0 except for the k-th element, which is 1. For this particular a, the quadratic form above picks up the (k, k) element, which is var(β̃_k|X), where β̃_k is the k-th element of β̃. Thus, the matrix inequality above implies

    var(β̃_k|X) ≥ var(β̂_k|X)   (k = 1, 2, ..., K)    (14)

That is, for any regression coefficient, the variance of the OLS estimator is no larger than that of any other linear unbiased estimator. (14), along with the unbiasedness of β̂, implies

    var(β̃) ≥ var(β̂)   (where β̃ is any linear unbiased estimator)

The Gauss–Markov theorem says that the OLSE is efficient in the sense that its conditional covariance matrix cov(β̂|X) is smallest among linear unbiased estimators. For this reason, the OLSE is called the Best Linear Unbiased Estimator (BLUE).

Proof. Since β̃ is linear in y, it can be written as β̃ = Cy for some matrix C, which is possibly a function of X. Let D ≡ C − A, or C = D + A, where A = (X'X)⁻¹X'. Then

    β̃ = (D + A)y = Dy + Ay = D(Xβ + u) + β̂   (since y = Xβ + u and β̂ = Ay)
      = DXβ + Du + β̂

Taking conditional expectations on both sides, we obtain

    E(β̃|X) = E(DXβ + Du + β̂|X)
            = DXβ + D E(u|X) + β   (since β̂ is unbiased)
            = DXβ + β              (since E(u|X) = 0)

β̃ is unbiased if and only if DXβ = 0. For this to be true for any given β, it is necessary that DX = 0. So β̃ = Du + β̂, and

    β̃ − β = Du + (β̂ − β) = (D + A)u   (by (8))

Now,

    cov(β̃|X) = cov(β̃ − β|X)
              = cov((D + A)u|X)
              = (D + A) cov(u|X) (D + A)'   (since D and A are functions of X)
              = σ²(D + A)(D + A)'           (since cov(u|X) = σ²I_N)
              = σ²(DD' + DA' + AD' + AA')
              = σ²(DD' + (X'X)⁻¹)           (since DA' = DX(X'X)⁻¹ = 0 because DX = 0, and AA' = (X'X)⁻¹)

Thus,

    cov(β̃|X) = σ²[DD' + (X'X)⁻¹] ≥ σ²(X'X)⁻¹ = cov(β̂|X)   (since DD' is positive semi-definite)

(i) BLUE of linear parametric functions

The unique BLUE of the linearly estimable linear parametric functions Rβ is given by Rβ̂.

Proof. Let β̃ be as defined above. Then

    cov(Rβ̃) − cov(Rβ̂) = σ²R[DD' + (X'X)⁻¹]R' − σ²R(X'X)⁻¹R'
                        = σ²R[DD' + (X'X)⁻¹ − (X'X)⁻¹]R'
                        = σ²R DD' R' ≥ 0   (since DD' is positive semi-definite)
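To see the Gauss–Markov inequality in numbers, the hypothetical sketch below (simulated design, not from the notes) compares the exact conditional covariance σ²(X'X)⁻¹ of the OLSE with that of another linear unbiased estimator: the OLS fit computed from only the first half of the sample, written as Cy with zero weights on the remaining observations, so that CX = I and unbiasedness holds. The difference of the covariance matrices is positive semi-definite, and each coefficient variance is at least as large, as in (14).

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 80, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
sigma2 = 1.0

# Competing linear unbiased estimator: OLS on the first half of the sample only
X1 = X[: N // 2]
C = np.hstack([np.linalg.inv(X1.T @ X1) @ X1.T, np.zeros((K, N - N // 2))])
print(np.allclose(C @ X, np.eye(K)))                # CX = I, so Cy is unbiased for beta

cov_ols = sigma2 * np.linalg.inv(X.T @ X)           # sigma^2 (X'X)^(-1)
cov_alt = sigma2 * C @ C.T                          # sigma^2 C C'
diff = cov_alt - cov_ols
print(np.linalg.eigvalsh(diff).min() >= -1e-10)     # difference is positive semi-definite
print(np.diag(cov_alt) >= np.diag(cov_ols))         # element-wise variance comparison, as in (14)
```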

(j) Conditional covariance of û

cov(û|X) = σ²M, which is singular. Thus, û_t and û_s are correlated (cov(û_t, û_s|X) ≠ 0) even if we assume that the true error terms are uncorrelated.

Proof.

    cov(û|X) = cov(Mu|X)       (since û = Mu)
             = M cov(u|X) M'
             = σ²MM'            (since cov(u|X) = σ²I_N)
             = σ²M              (since M' = M and MM = M)

(k) Estimator of σ²

Let the LSE of σ² be

    σ̂² = SSR/(N − K) = û'û/(N − K)    (15)

The intuitive reason to divide the SSR by N − K rather than by N is that K parameters (β) have to be estimated before obtaining the residual vector û used to calculate σ̂². More specifically, û has to satisfy the K normal equations (4), which limits the variability of the residual. σ̂² is an unbiased estimator of σ².

Proof. Since σ̂² = û'û/(N − K), the proof amounts to showing that E(û'û|X) = (N − K)σ².

    E(û'û|X) = E[(Mu)'(Mu)|X]                 (since û = Mu)
             = E[u'Mu|X]                       (since M' = M and MM = M)
             = E[tr(u'Mu)|X]                   (since u'Mu is a scalar)
             = E[tr(Muu')|X]                   (since tr(ABC) = tr(CAB))
             = tr[E(Muu'|X)]                   (since trace is a linear operator)
             = tr[M E(uu'|X)]                  (since M is a function of X)
             = σ² tr(M)                        (since E(uu'|X) = σ²I_N)
             = σ² tr(I_N − X(X'X)⁻¹X')         (since M = I_N − X(X'X)⁻¹X')
             = σ² [tr(I_N) − tr(X(X'X)⁻¹X')]   (since tr(A + B) = tr(A) + tr(B))
             = σ² [tr(I_N) − tr((X'X)⁻¹X'X)]   (since tr(ABC) = tr(CAB))
             = σ² [tr(I_N) − tr(I_K)]          (since X'X is K × K)
             = σ²(N − K)

Thus, the unbiased estimator of cov(β̂|X) is V̂ ≡ σ̂²(X'X)⁻¹, and the unbiased estimator of cov(Rβ̂|X) is RV̂R' = σ̂²R(X'X)⁻¹R'. The square root of σ̂², namely σ̂, is called the standard error of the regression (SER) or standard error of the equation (SEE). It is an estimate of the standard deviation of the error term.
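In practice σ² is replaced by σ̂² = û'û/(N − K), which also delivers the estimated covariance matrix V̂ and the coefficient standard errors. A hypothetical sketch (simulated data, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat

sigma2_hat = (u_hat @ u_hat) / (N - K)   # unbiased estimator of sigma^2, equation (15)
V_hat = sigma2_hat * XtX_inv             # estimated cov(beta_hat | X)
se = np.sqrt(np.diag(V_hat))             # standard errors of each coefficient
ser = np.sqrt(sigma2_hat)                # standard error of the regression (SER)
print(beta_hat, se, ser)
```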

Assumption 4: (u|X) is distributed as a multivariate normal N(0, σ²I)

Definition. An n-dimensional random vector y is distributed as a multivariate normal, y ~ N(µ, Σ), with mean vector µ and nonsingular (positive definite) variance-covariance matrix Σ, if the joint density function is given by

    f(y) = (2π)^{−n/2} |Σ|^{−1/2} exp{ −(1/2)(y − µ)'Σ⁻¹(y − µ) }

Theorem 1. Let y be an n-dimensional normal variate: y ~ N(µ, Σ). Then

(a) (Ay − b) ~ N(Aµ − b, AΣA'), where A is an m × n matrix of constants with rank(A) = m ≤ n, and b is an m-dimensional vector of constants.

(b) Let P'P = Σ⁻¹. Such a nonsingular matrix P exists for a positive definite matrix Σ. Then P(y − µ) is distributed as a standard multivariate normal: N(0, I).

Now, under Assumption 4, we can specify the distributions of β̂ and σ̂². We need to know the distribution of u because it drives the statistical properties of the estimators. When we know the distribution of u, we know the distributions of β̂ and σ̂². We will then be able to formulate statistical tests, or inferences, about hypotheses. Assumption 4 implies that the distribution of y conditional on X is a multivariate normal N(Xβ, σ²I). This is because a linear transformation of a multivariate normal random vector is still a multivariate normal random vector, by Theorem 1.

(l) Distribution of β̂ and û

    β̂ ~ N(β, V)   and   û ~ N(0, σ²M)

We have already derived the mean and covariance matrix of both random vectors. Since β̂ is a linear function of y, i.e., β̂ = (X'X)⁻¹X'y, and û is a linear function of u, i.e., û = Mu, the results of Theorem 1 apply, so both random vectors are multivariate normal.

(m) Distribution of σ̂²

Theorem 2. Let y be an n-dimensional normal variate: y ~ N(µ, Σ). Then

(a) (y − µ)'Σ⁻¹(y − µ) is distributed as χ²(n), a central chi-square random variable with n degrees of freedom. Note, from (b) in the previous theorem, z = P(y − µ) ~ N(0, I); that is, the z_i are i.i.d. standard normal random variables. The quadratic form can be written as

    (y − µ)'Σ⁻¹(y − µ) = (y − µ)'P'P(y − µ) = z'z = Σ_{i=1}^{n} z_i²

which is a sum of squared independent standard normal random variables. Thus, we can define a χ² random variable with n degrees of freedom as the sum of n squared independent standard normal random variables.

(b) y'Σ⁻¹y is distributed as χ²(n; δ), a noncentral chi-square distribution with n degrees of freedom and noncentrality parameter δ, where δ² = µ'Σ⁻¹µ.

(c) (y − b)'Σ⁻¹(y − b) is distributed as χ²(n; δ), where δ² = (µ − b)'Σ⁻¹(µ − b).

(d) If Σ = σ²I, then y'Ay/σ² is distributed as χ²(m; δ) if and only if A is an idempotent matrix of rank m, where the noncentrality parameter satisfies δ² = µ'Aµ/σ².

(e) Let y be an n-dimensional (not necessarily normal) random vector with E(y) = µ and cov(y) = Σ. Then E(y'Ay) = tr(AΣ) + µ'Aµ.

Let

    W ≡ (N − K)σ̂²/σ² = û'û/σ² = u'Mu/σ²

Since u ~ N(0, σ²I), M is idempotent, and rank(M) = N − K, then by part (d) of Theorem 2,

    W = u'Mu/σ² ~ χ²(N − K)

(n) β̂ is independent of σ̂²

Theorem 3. Let y ~ N(µ, Σ). Then

(a) A linear function By and a quadratic function y'Ay are independent if BΣA = 0, where A is a symmetric matrix.

(b) Two quadratic functions y'Ay and y'By are independent if AΣB = 0.

Example. β̂ = β + (X'X)⁻¹X'u and β̂ − β = (X'X)⁻¹X'u; σ̂² = (N − K)⁻¹û'û = (N − K)⁻¹u'Mu. Let B = (X'X)⁻¹X', A = M, Σ = σ²I. Then

    BΣA = σ²(X'X)⁻¹X'M = σ²(X'X)⁻¹(MX)' = 0

By part (a) of Theorem 3, β̂ is independent of σ̂².

Distributions Used in Statistical Inference

Theorem 4. Let Y ~ N(µ, 1), Z ~ χ²(n; δ), and W ~ χ²(m), all mutually independent. Then

(a) Y/√(W/m) ~ t(m; µ), a noncentral t distribution

(b) (Z/n)/(W/m) ~ F(n, m; δ), a noncentral F distribution

Let R be a J × K matrix of known constants with rank J ≤ K, and let q be a J × 1 vector of known constants. Then the results in (l)-(n) imply:

(o) Rβ̂ ~ N(Rβ, RVR')

(p) Z ≡ (Rβ̂ − Rβ)'(RVR')⁻¹(Rβ̂ − Rβ) ~ χ²(J)

(q) Z ≡ (Rβ̂ − q)'(RVR')⁻¹(Rβ̂ − q) ~ χ²(J; δ)

(r) Central F distribution

Deriving the F test:

    F ≡ (Z/J)/(W/(N − K))
      = [(Rβ̂ − Rβ)'(RVR')⁻¹(Rβ̂ − Rβ)/J] / [((N − K)σ̂²/σ²)/(N − K)]
      = σ²(Rβ̂ − Rβ)'[R(X'X)⁻¹R']⁻¹(Rβ̂ − Rβ) / (Jσ²σ̂²)

so

    F = (1/J)(Rβ̂ − Rβ)'(RV̂R')⁻¹(Rβ̂ − Rβ) ~ F(J, N − K)    (16)

(s) Noncentral F distribution

    F ≡ (Z/J)/(W/(N − K)) = (1/J)(Rβ̂ − q)'(RV̂R')⁻¹(Rβ̂ − q) ~ F(J, N − K; δ)    (17)
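Result (17) becomes a computable test statistic once σ² is replaced by σ̂², and under H_0: Rβ = q it has the central F(J, N − K) distribution of (16). The hypothetical sketch below (simulated data and illustrative restrictions, not from the notes) computes the statistic and its p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
N, K = 120, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 2.0, 2.0]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (N - K)
V_hat = sigma2_hat * XtX_inv

# Hypothetical restrictions (J = 2): beta_2 - beta_3 = 0 and beta_1 = 1
R = np.array([[0.0, 1.0, -1.0],
              [1.0, 0.0,  0.0]])
q = np.array([0.0, 1.0])
J = R.shape[0]

diff = R @ beta_hat - q
F = diff @ np.linalg.inv(R @ V_hat @ R.T) @ diff / J   # statistic from (16)/(17) with sigma^2 replaced by its estimate
p_value = stats.f.sf(F, J, N - K)                      # central F(J, N - K) under H0
print(F, p_value)
```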

(t) Central and noncentral t distributions

When the number of restrictions is 1, we need to consider the following t-statistic. Let Z_1 ~ N(µ, 1) and Z_2 ~ χ²(m). If Z_1 and Z_2 are independent, then

    t = Z_1/√(Z_2/m) ~ t(m; µ)

where µ is the noncentrality parameter, which can be positive, zero, or negative. If µ = 0, it is a central t distribution.

If J = 1 and Rβ̂ is a scalar, its true variance and estimated variance are also scalars:

    t = [(Rβ̂ − Rβ)/sd(Rβ̂)] / √(W/(N − K)) = [(Rβ̂ − Rβ)/sd(Rβ̂)] / √(σ̂²/σ²) = (Rβ̂ − Rβ)/est.sd(Rβ̂) ~ t(N − K)    (18)

    t = (Rβ̂ − q)/est.sd(Rβ̂) ~ t(N − K; Rβ − q)    (19)

Remark: It is easy to verify the following relationship between the t distribution and the F distribution: t(N − K)² ~ F(1, N − K).

Example. Consider a row vector R = (0, 0, ..., 0, 1, 0, ..., 0) which has 1 in the k-th position. Then Rβ̂ = β̂_k, and RV̂R' = v̂_kk is the k-th diagonal element of V̂, which is the estimate of the variance of β̂_k. Therefore, we can write

    t = (β̂_k − β_k)/√v̂_kk ~ t(N − K)
    t = (β̂_k − q)/√v̂_kk ~ t(N − K; β_k − q)

For another example, consider a row vector R = (a, b, 0, ..., 0), where a and b are constants. Then Rβ̂ = aβ̂_1 + bβ̂_2, and RV̂R' = a²v̂_11 + b²v̂_22 + 2ab v̂_12, which is the estimated variance of Rβ̂, and hence

    t = [(aβ̂_1 + bβ̂_2) − (aβ_1 + bβ_2)] / est.sd(aβ̂_1 + bβ̂_2) ~ t(N − K)
    t = [(aβ̂_1 + bβ̂_2) − q] / est.sd(aβ̂_1 + bβ̂_2) ~ t(N − K; aβ_1 + bβ_2 − q)
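For a single coefficient, the t form in (18)-(19) is the familiar coefficient-over-standard-error ratio. A hypothetical sketch (simulated data, not from the notes), testing H_0: β_k = 0:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
N, K = 80, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([0.5, 1.0, 0.0]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat
V_hat = (u_hat @ u_hat / (N - K)) * XtX_inv

k = 2                                          # test H0: beta_k = 0 for the third coefficient
t_stat = beta_hat[k] / np.sqrt(V_hat[k, k])    # (beta_hat_k - q) / est.sd(beta_hat_k) with q = 0
p_value = 2 * stats.t.sf(abs(t_stat), N - K)   # two-sided p-value from t(N - K)
print(t_stat, p_value, t_stat**2)              # t^2 equals the F(1, N - K) statistic for the same restriction
```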

The Classical Regression Model for Random Samples

The sample (y, X) is a random sample if {y_t, X_t} is i.i.d. (independent and identically distributed) across observations. Since u_t is a function of (y_t, X_t) by Assumption 1 and since (y_t, X_t) is independent of (y_s, X_s) for t ≠ s, (u_t, X_t) is independent of X_s for t ≠ s. Thus,

    E(u_t|X) = E(u_t|X_t)
    E(u_t²|X) = E(u_t²|X_t)
    E(u_t u_s|X) = E(u_t|X_t)E(u_s|X_s)   (for t ≠ s)    (20)

Therefore, Assumptions (A2) and (A3) reduce to

    (A2'): E(u_t|X_t) = 0          (t = 1, 2, ..., N)    (21)
    (A3'): E(u_t²|X_t) = σ² > 0    (t = 1, 2, ..., N)    (22)

The implication of the identical distribution aspect of a random sample is that the joint distribution of (u_t, X_t) does not depend on t. So the unconditional second moment E(u_t²) is constant across t (this is referred to as unconditional homoskedasticity), and the functional form of the conditional second moment E(u_t²|X_t) is the same across t. However, (A3'), that the value of the conditional second moment is the same across t, does not follow. Therefore, (A3') remains restrictive for the case of a random sample; without it, the conditional second moment E(u_t²|X_t) can differ across t through its possible dependence on X_t. To emphasize the distinction, the restrictions on the conditional second moments are referred to as conditional homoskedasticity.

Measurement of Goodness of Fit

How well do the regressors explain the dependent variable? How do we measure the goodness of the regression? A measure of the goodness of fit is the squared correlation coefficient between the dependent variable y_t and the least squares predictor ŷ_t = X_t β̂:

    R² ≡ [corr(y_t, ŷ_t)]² = [cov(y_t, ŷ_t)]² / [var(y_t) var(ŷ_t)]
       = [(y − ȳ)'(ŷ − ŷ̄)/N]² / {[(y − ȳ)'(y − ȳ)/N][(ŷ − ŷ̄)'(ŷ − ŷ̄)/N]}    (23)

where ȳ and ŷ̄ are the vectors of sample means of y_t and ŷ_t, respectively. Since R² is a squared correlation coefficient, it takes a value between 0 and 1 (0 ≤ R² ≤ 1), and its value is independent of the measurement unit. A high R² is interpreted as a good fit: the regressors explain the variation of the dependent variable well. The expression can be simplified further by using the following results.

Lemma 1. When a constant is one of the regressors,

(a) ȳ = ŷ̄
(b) û'(ŷ − ȳ) = 0
(c) (y − ȳ)'(ŷ − ȳ) = (ŷ − ȳ)'(ŷ − ȳ)
(d) (y − ȳ)'(y − ȳ) = (ŷ − ȳ)'(ŷ − ȳ) + û'û

Proof. Let a constant be one of the regressors. Then

(a) ȳ = X̄β̂ = N⁻¹ Σ_{t=1}^{N} X_t β̂ = N⁻¹ Σ_{t=1}^{N} ŷ_t = ŷ̄

(b) û'(ŷ − ȳ) = û'ŷ − û'ȳ = (Mu)'(Xβ̂) − ȳ Σ_{t=1}^{N} û_t = 0 − ȳ·0 = 0

(c) (y − ȳ)'(ŷ − ȳ) = ([ŷ − ȳ] + û)'(ŷ − ȳ) = (ŷ − ȳ)'(ŷ − ȳ) + û'(ŷ − ȳ) = (ŷ − ȳ)'(ŷ − ȳ)

(d) (y − ȳ)'(y − ȳ) = ([ŷ − ȳ] + û)'([ŷ − ȳ] + û) = (ŷ − ȳ)'(ŷ − ȳ) + û'û

Now, we substitute these results into the equation for R²:

    R² = [(y − ȳ)'(ŷ − ŷ̄)/N]² / {[(y − ȳ)'(y − ȳ)/N][(ŷ − ŷ̄)'(ŷ − ŷ̄)/N]}
       = [(y − ȳ)'(ŷ − ȳ)]² / {[(y − ȳ)'(y − ȳ)][(ŷ − ȳ)'(ŷ − ȳ)]}    (since ȳ = ŷ̄)
       = [(ŷ − ȳ)'(ŷ − ȳ)]² / {[(y − ȳ)'(y − ȳ)][(ŷ − ȳ)'(ŷ − ȳ)]}    (since (y − ȳ)'(ŷ − ȳ) = (ŷ − ȳ)'(ŷ − ȳ))
       = (ŷ − ȳ)'(ŷ − ȳ) / (y − ȳ)'(y − ȳ)
       = [(y − ȳ)'(y − ȳ) − û'û] / (y − ȳ)'(y − ȳ)                     (since (ŷ − ȳ)'(ŷ − ȳ) = (y − ȳ)'(y − ȳ) − û'û)
       = 1 − û'û / (y − ȳ)'(y − ȳ)    (24)
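The two expressions for R² in (23) and (24) agree when a constant is included, which the following hypothetical sketch (simulated data, not from the notes) verifies; it also computes the adjusted R̄² defined in (26) below.

```python
import numpy as np

rng = np.random.default_rng(7)
N, K = 70, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
u_hat = y - y_hat

tss = np.sum((y - y.mean()) ** 2)                    # total sum of squares
r2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2           # squared correlation, expression (23)
r2_ssr = 1.0 - (u_hat @ u_hat) / tss                 # expression (24)
r2_adj = 1.0 - (N - 1) / (N - K) * (1.0 - r2_ssr)    # adjusted R-squared, expression (26) below
print(np.isclose(r2_corr, r2_ssr), r2_ssr, r2_adj)
```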

Equation (24) is the expression more commonly used to define R², which is called the coefficient of (multiple) determination. The denominator term is called the total sum of squares (TSS) and is a measure of the variation of the dependent variable. The numerator in the first expression, (ŷ − ȳ)'(ŷ − ȳ), is called the explained sum of squares or regression sum of squares (RSS), and the SSR, û'û, is also called the unexplained sum of squares or error sum of squares (ESS):

    R² = RSS/TSS = 1 − ESS/TSS    (25)

If the estimation equation does not include an intercept term and you compute R² by the second expression, then the value of R² can be negative. Without an intercept, the OLS residuals no longer sum to zero, so most of the properties in the Lemma above fail to hold. We can see this from the following derivation:

    ESS = û'û = (y − ŷ)'(y − ŷ) = [(y − ȳ) + (ȳ − ŷ)]'[(y − ȳ) + (ȳ − ŷ)]
        = (y − ȳ)'(y − ȳ) + 2(y − ȳ)'(ȳ − ŷ) + (ȳ − ŷ)'(ȳ − ŷ)

This implies that

    R² = 1 − ESS/TSS = −[2(y − ȳ)'(ȳ − ŷ) + (ȳ − ŷ)'(ȳ − ŷ)] / (y − ȳ)'(y − ȳ)

which need not be nonnegative, since the cross-product term does not vanish when the residuals do not sum to zero.

Another serious drawback of using R² as a measure of goodness of fit is that it is a nondecreasing, and typically an increasing, function of the number of regressors. To explain, TSS = RSS + ESS, so minimizing the SSR, or ESS, is equivalent to maximizing R². When an additional variable is included, R² improves if the coefficient on the variable is nonzero, because SSR_r ≥ SSR_u. Thus, one can increase the value of R² simply by throwing more variables into the regression equation, even if they do not belong in the equation in theory. To avoid this problem in assessing the goodness of the regression equation, the adjusted R² (adjusted for the degrees of freedom) is often used:

    R̄² = 1 − [û'û/(N − K)] / [(y − ȳ)'(y − ȳ)/(N − 1)] = 1 − [(N − 1)/(N − K)](1 − R²);   −(K − 1)/(N − K) ≤ R̄² ≤ 1    (26)

The Best Linear Unbiased Predictor

Consider a linear regression model y_1 = X_1β + u_1 and its OLS estimator β̂, which is the BLUE under Assumptions (A1)-(A3). Suppose we wish to predict the value of the dependent variable at new values of the regressors, X_2, when the regression relationship has not changed, i.e., y_2 = X_2β + u_2, and u = (u_1', u_2')', where u_1 : N_1 × 1 and u_2 : N_2 × 1, satisfies Assumptions (A1)-(A3):

    E(uu'|X) = [ E(u_1u_1'|X)   E(u_1u_2'|X) ]   =   [ σ²I_{N_1}      0      ]   =   σ²I_{N_1+N_2}
               [ E(u_2u_1'|X)   E(u_2u_2'|X) ]       [     0      σ²I_{N_2} ]

Note that E(u_1u_2'|X) = E(u_2u_1'|X)' = 0.

To elaborate, we have data on the new values of the regressors, X_2, but we do not have data on y_2. However, we still wish to predict the values of y_2. To do this, we first obtain β̂ from the linear regression model y_1 = X_1β + u_1. Then we use this estimate β̂ and the new values of the regressors, X_2, to predict y_2, i.e., ŷ_2 = X_2β̂.

Theorem 5. The best linear unbiased predictor (BLUP) of y_2 for a given X_2 is given by ŷ_2 = X_2β̂.

Proof.

(i) Linearity. ŷ_2 is a linear function of y_1 because β̂ is a linear function of y_1.

(ii) Unbiasedness. The prediction error û_2 = y_2 − ŷ_2 has zero mean because

    E(û_2|X) = E[X_2(β − β̂) + u_2|X] = X_2(β − β) = 0

(iii) Best. The prediction error has the smallest covariance matrix. To show this,

    cov(û_2|X) = cov(y_2 − ŷ_2|X) = cov(X_2(β − β̂) + u_2|X)
               = σ²I_{N_2} + X_2 cov(β̂|X) X_2'    (since u_2 is uncorrelated with X_2(β − β̂))
               = σ²I_{N_2} + σ²X_2(X_1'X_1)⁻¹X_2'

Let another linear predictor be ỹ_2 = Ay_1, with prediction error ũ_2 = y_2 − ỹ_2. We need to choose A such that the predictor is unbiased, E(ũ_2) = 0, and then we need to show that cov(ũ_2) − cov(û_2) is positive semi-definite.

    E(ũ_2|X) = E(y_2 − ỹ_2|X) = E(X_2β + u_2 − Ay_1|X) = E[X_2β + u_2 − A(X_1β + u_1)|X]
             = E(X_2β + u_2 − AX_1β − Au_1|X) = 0   if X_2 = AX_1

    cov(ũ_2|X) = cov(u_2 − Au_1|X) = σ²I_{N_2} + σ²AA'

    cov(ũ_2|X) − cov(û_2|X) = σ²I_{N_2} + σ²AA' − [σ²I_{N_2} + σ²X_2(X_1'X_1)⁻¹X_2']
                            = σ²[AA' − X_2(X_1'X_1)⁻¹X_2']
                            = σ²[AA' − AX_1(X_1'X_1)⁻¹X_1'A']    (since X_2 = AX_1)
                            = σ²A[I − X_1(X_1'X_1)⁻¹X_1']A'
                            = σ²AM_1A' = σ²(AM_1)(AM_1)' ≥ 0

Statistical Inference

A major part of statistical inference is the test of hypothesis. A test of hypothesis consists of four steps:

1. specify the null hypothesis (H_0) and the alternative hypothesis (H_1)
2. choose a test statistic
3. choose a rejection region for a given level of significance
4. conclude

Each hypothesis specifies a parameter space, Θ_0 and Θ_1, respectively. If Θ_0 is a subset of Θ_1, then the hypothesis is called nested; otherwise, it is called nonnested. A nested null (or maintained) hypothesis H_0: h(θ) = 0 against the alternative hypothesis H_1: h(θ) ≠ 0 restricts the parameter space. To test H_0 against H_1, we need a test statistic that is computable from the sample (i.e., it must not involve any unknown parameters) and a decision rule of when to reject or not reject H_0.

The decision rule is expressed in terms of the rejection region (or critical region), such that H_0 is rejected if the test statistic is in the critical region. Since the test statistic is a random variable, there is a positive probability that we will reject a true H_0 for any reasonable choice of the critical region. Rejection of a true H_0 is called a Type I error, and the probability of a Type I error is called the size of a test, or the significance level of a test, commonly denoted by α. The conventional choice of the significance level is 0.05 or 0.1. We may also fail to reject a false H_0, which is called a Type II error. The probability of rejecting a false H_0 is called the power of a test, which depends on the alternative hypothesis.

                      H_0 is correct        H_0 is false
    reject H_0        Type I error          correct decision
    accept H_0        correct decision      Type II error

    significance level: α = Prob(reject a true H_0) = Prob(Type I error)
    power of test: π = Prob(reject a false H_0) = 1 − Prob(not reject a false H_0) = 1 − Prob(Type II error)

Since the significance level is the probability that the test statistic is in the rejection region when the null hypothesis is correct, we need to know the distribution function of the test statistic under H_0. Similarly, the power of the test requires knowledge of the distribution function of the test statistic under H_1. Since the distribution function of the test statistic depends on the true value of the parameters, there are many values of the power when the alternative hypothesis is a composite hypothesis. Since the power of the test requires knowledge of the true population parameter β, we are usually unable to compute it.

An ideal test (meaning an ideal choice of test statistic and critical region) is one that has zero probability of Type I and Type II errors. But such a test does not exist. Typically, choosing a smaller size of a test decreases the power of the test. A test is called the most powerful test if it has greater power than any other test of the same size, and the uniformly most powerful test if it has greater power than any other test of the same size for all possible values of the parameters. A test is called an unbiased test if its power is at least as large as its size. A test is called a consistent test if its power converges to 1 as the sample size grows to infinity.

Finite Sample Tests of Nested Linear Hypotheses in a SLM

Linear regression model: y = Xβ + u,  u|X ~ N(0, σ²I)
Number of observations: N
Unrestricted OLS estimator: min_β SSR_u = (y − Xβ)'(y − Xβ)

Properties of the unrestricted OLSE:

    β̂ = (X'X)⁻¹X'y ~ N(β, V),   V = σ²(X'X)⁻¹
    σ̂² = SSR_u/(N − K),   W_u ≡ (N − K)σ̂²/σ² ~ χ²(N − K)
    β̂ is independent of σ̂²
    Rβ̂ ~ N(Rβ, S),   S ≡ RVR' = σ²R(X'X)⁻¹R'
    Z ≡ (Rβ̂ − q)'S⁻¹(Rβ̂ − q) ~ χ²(J; δ)
    F ≡ (Z/J)/(W_u/(N − K)) = (1/J)(Rβ̂ − q)'Ŝ⁻¹(Rβ̂ − q) ~ F(J, N − K; δ)
    δ = [(Rβ − q)'S⁻¹(Rβ − q)]^{1/2}

where Ŝ ≡ RV̂R' = σ̂²R(X'X)⁻¹R'.

Restricted OLS estimator: min_β SSR_r = (y − Xβ)'(y − Xβ) subject to Rβ = q, where R (J × K) and q (J × 1) are known matrices and rank(R) = J < K.

Properties of the restricted OLSE:

    β̃ = β̂ − VR'S⁻¹(Rβ̂ − q) ~ N(β − VR'S⁻¹(Rβ − q), V − VR'S⁻¹RV)
    λ̃ = σ²S⁻¹(Rβ̂ − q) ~ N(σ²S⁻¹(Rβ − q), σ⁴S⁻¹)
    σ̃² = SSR_r/[N − (K − J)] = SSR_r/(N − K + J)
    W_r ≡ SSR_r/σ² ~ χ²(N − K + J; δ)
    β̃, λ̃, σ̃² are mutually independent
    SSR_r = SSR_u + σ²(Rβ̂ − q)'S⁻¹(Rβ̂ − q)

Nested Linear Hypotheses

    H_0: Rβ = q,   H_1: Rβ ≠ q (or: H_0 is false)

where R is a J × K matrix of known constants and q is a J × 1 vector of known constants. The restriction matrix R has full row rank, which is less than K: rank(R) = J < K.

Intuition suggests that we may compare the unrestricted estimate β̂ and the restricted estimate β̃ to judge whether the restriction under H_0 is acceptable. These two estimators will differ with probability 1 due to the random variation of the sample. But what should we expect about the magnitude of the difference: a big difference or a small one? It is easy to imagine that the difference will be relatively small (relative to their variances) if the restriction under H_0 is correct, so that it is not binding in the minimization of the SSR. On the other hand, the difference is expected to be relatively large if the null hypothesis is incorrect and the null restriction is a binding constraint in the optimization. It is then reasonable to consider the difference β̂ − β̃ as a test statistic and construct a rejection region. Of course, we have to decide how to measure the difference, because it is a K-dimensional vector. As we will discuss in more detail later, the Durbin-Wu-Hausman test can be interpreted as a test based on this type of reasoning, though it is used for a different type of hypotheses.

There is no reason why we should limit the test to a comparison of the estimators themselves. It is possible to compare the estimated values of a certain function of the parameters. The four tests presented below compare different functions of the estimators. We will call them the likelihood ratio (LR) type test, the Wald type test, the Lagrange multiplier (LM) type test, and the efficient score type test. The reason for the word "type" is that the ideas of the finite sample tests discussed below are based on the ideas of the classical test procedures, but they are not the true LR, Wald, LM, and score tests, which are asymptotic tests and use the asymptotic χ² distribution of the test statistics.

Likelihood Ratio (LR) Type Test

Since the least squares method minimizes the SSR, we may compare the values of the objective function (SSR). We know that SSR_r ≥ SSR_u. Their difference is expected to be large if the restrictions under H_0 are not true and hence are binding. Thus, it is intuitively reasonable to use the difference SSR_r − SSR_u as the test statistic and to reject the null hypothesis if SSR_r − SSR_u is large.

Once the test statistic is chosen, we need to determine the decision rule of when to reject H_0 in favor of H_1. As alluded to already, we will reject H_0 if SSR_r − SSR_u is sufficiently large. Otherwise, the test concludes that the data do not provide enough evidence to reject H_0.
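The LR-type test below needs SSR_r from the restricted fit. The hypothetical sketch that follows (simulated data and an illustrative restriction, not from the notes) computes β̃ and λ̃ from the closed forms listed above and checks both that Rβ̃ = q holds exactly and that SSR_r = SSR_u + σ²(Rβ̂ − q)'S⁻¹(Rβ̂ − q), noting that σ²S⁻¹ = [R(X'X)⁻¹R']⁻¹ so the identity is free of σ².

```python
import numpy as np

rng = np.random.default_rng(8)
N, K = 90, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 1.0, 1.0]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
ssr_u = np.sum((y - X @ beta_hat) ** 2)

# Hypothetical restriction H0: beta_2 = beta_3 (J = 1)
R = np.array([[0.0, 1.0, -1.0]])
q = np.array([0.0])

G = R @ XtX_inv @ R.T                           # R (X'X)^(-1) R' = S / sigma^2
lam = np.linalg.solve(G, R @ beta_hat - q)      # lambda_tilde = [R (X'X)^(-1) R']^(-1) (R beta_hat - q)
beta_r = beta_hat - XtX_inv @ R.T @ lam         # restricted estimator beta_tilde
ssr_r = np.sum((y - X @ beta_r) ** 2)

print(np.allclose(R @ beta_r, q))                            # restriction holds exactly
print(np.isclose(ssr_r, ssr_u + (R @ beta_hat - q) @ lam))   # SSR_r = SSR_u + (R beta_hat - q)' G^(-1) (R beta_hat - q)
```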

How big a difference is sufficiently large to reject the null hypothesis? This requires a choice of the critical (rejection) region of the test. Since we reject the null hypothesis when SSR_r − SSR_u is large, the proper critical region is the set of samples R_c ≡ {SSR_r − SSR_u > c}, where c is a pre-chosen value. The choice of the critical value depends on how much Type I error we are willing to tolerate. A larger critical value rejects the null hypothesis less frequently, so we are more likely to commit a Type II error. The choice of the critical value is governed by the choice of significance level (size of the test). That is, for a pre-chosen significance level α, the critical value is determined by

    P(SSR_r − SSR_u > c | H_0) = α

which requires knowledge of the distribution function of the test statistic under the null hypothesis.

Remark. A small α leads to a higher critical value. So we reject H_0 less frequently, we are more likely to commit a Type II error, and the power of the test decreases.

The p-value is defined as

    p = Prob(this sample | H_0 is true)

It tells us how likely it is to obtain the sample we got if the null hypothesis is true. It is the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true. It can also be interpreted as the chance of a Type I error if we reject H_0. If the p-value is less than α, then we reject H_0, indicating that the observed result would be highly unlikely under the null hypothesis.

We have shown that

    Z ≡ (SSR_r − SSR_u)/σ² = (Rβ̂ − q)'S⁻¹(Rβ̂ − q) ~ χ²(J; δ)

Recall that S = σ²R(X'X)⁻¹R' involves the unknown parameter σ². Hence Z is not a statistic (i.e., it is not computable from the sample). We may simply replace σ² with its estimator σ̂², but then Z no longer has the χ² distribution. To see this, note that replacing σ² with its estimator σ̂² is equivalent to taking the ratio of two independent χ² random variables, Z and W_u:

    F ≡ (Z/J)/(W_u/(N − K)) = (1/J)(Rβ̂ − q)'Ŝ⁻¹(Rβ̂ − q) ~ F(J, N − K; δ)

which can also be expressed as

    F ≡ (Z/J)/(W_u/(N − K)) = [(SSR_r − SSR_u)/J] / [SSR_u/(N − K)] = [(N − K)/J] · (SSR_r − SSR_u)/SSR_u ~ F(J, N − K; δ)

This is a statistic that has a central F distribution under H_0 and a noncentral F distribution under H_1. The null hypothesis is rejected if F is greater than the critical value F_α of significance level α: P(F > F_α | H_0) = α. To compute this test statistic, we need both the unrestricted estimation (for SSR_u) and the restricted estimation (for SSR_r).
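The two expressions for F above, the quadratic-form version and the SSR-difference version, give the same number, which the following hypothetical sketch (continuing the same kind of simulated setup) confirms while also reporting the p-value from the central F(J, N − K) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 2.0, 2.0]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
ssr_u = np.sum((y - X @ beta_hat) ** 2)

R = np.array([[0.0, 1.0, -1.0]])     # hypothetical H0: beta_2 = beta_3
q = np.array([0.0])
J = R.shape[0]

# Restricted fit
G = R @ XtX_inv @ R.T
lam = np.linalg.solve(G, R @ beta_hat - q)
beta_r = beta_hat - XtX_inv @ R.T @ lam
ssr_r = np.sum((y - X @ beta_r) ** 2)

# LR-type form: based on the change in the objective function
F_lr = (N - K) / J * (ssr_r - ssr_u) / ssr_u

# Quadratic-form version: based on R beta_hat - q and the estimated covariance
sigma2_hat = ssr_u / (N - K)
diff = R @ beta_hat - q
F_quad = diff @ np.linalg.inv(sigma2_hat * G) @ diff / J

print(np.isclose(F_lr, F_quad), stats.f.sf(F_lr, J, N - K))
```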

Wald Type Test

We know that the restricted estimator satisfies the restriction under H_0, i.e., Rβ̃ = q. The Wald type test asks whether the unrestricted estimator also satisfies the restriction equation reasonably well. If the null hypothesis is correct, so that the restricted and unrestricted estimators are close enough, we would expect Rβ̂ to be close to its hypothesized value q. If they are not close enough, we reject the null hypothesis. We may also interpret this test as a comparison of Rβ̂ − q with Rβ̃ − q, where the latter is a zero vector.

Unlike the scalar values of the SSRs in the LR type test, Rβ̂ is in general a J × 1 vector. Therefore, we need to choose a measure of the distance of Rβ̂ from q. One might just consider the squared Euclidean norm, ‖Rβ̂ − q‖² = (Rβ̂ − q)'(Rβ̂ − q). However, this does not take into account the differences in the variances of the elements, so we need to standardize (Rβ̂ − q) in a certain way. Let P'P = S⁻¹. Then P(Rβ̂ − q) ~ N(P(Rβ − q), I). We now take the squared Euclidean norm of this standardized vector:

    Z = ‖P(Rβ̂ − q)‖² = (Rβ̂ − q)'S⁻¹(Rβ̂ − q) ~ χ²(J; δ)

As in the LR type test, we take the ratio of independent χ² random variables to eliminate the unknown σ² in S. The test statistic is then exactly the same as the LR type test statistic:

    F ≡ (1/J)(Rβ̂ − q)'Ŝ⁻¹(Rβ̂ − q) ~ F(J, N − K; δ)

Under the null hypothesis, this F statistic has a central F(J, N − K) distribution. The null hypothesis is rejected if F is greater than the critical value F_α of significance level α: P(F > F_α | H_0) = α. To compute this test statistic, we need only the unrestricted estimators.

Lagrange Multiplier (LM) Type Test

As the name indicates, this test is based on the estimator λ̃ of the Lagrange multiplier λ. If the restriction of H_0 is true, it will not be a binding constraint, and hence the restricted estimator λ̃ (the estimator of the shadow price of the restrictions) will be close to zero. Thus, we reject the null hypothesis if λ̃ is sufficiently different from a zero vector. As in the Wald type test, we need a measure of the distance between λ̃ and a zero vector because λ̃ is in general a vector. Following the same procedure as in the Wald type test, we note

    Z_r ≡ λ̃'[σ⁴S⁻¹]⁻¹λ̃ = σ⁻⁴λ̃'Sλ̃ = σ⁻²λ̃'[R(X'X)⁻¹R']λ̃ ~ χ²(J; δ)

and we take the ratio of independent χ² random variables to eliminate the unknown σ²:

    F_r ≡ (Z_r/J)/(W_r/(N − K + J)) = (1/(Jσ̃²)) λ̃'[R(X'X)⁻¹R']λ̃ ~ F(J, N − K + J)   under H_0

Note that this test statistic requires only the restricted estimators, and its degrees of freedom differ from those of the LR and Wald type tests. Under the null hypothesis, it has a central F distribution because both the numerator and the denominator are distributed as central χ² under H_0. However, under the alternative hypothesis (Rβ ≠ q), both Z_r and W_r are noncentral χ² random variables with the same noncentrality parameter δ. Therefore, F_r has a doubly noncentral F distribution under H_1. This will affect the power of the test. The critical value for the rejection region is still found from the null central F distribution.

We have shown earlier that the LR type test and the Wald type test are identical. To see the relationship of the LM type test with the other tests, note the solution λ̃ = σ²S⁻¹(Rβ̂ − q). Since S⁻¹ has full column rank, λ̃ ≠ 0 is equivalent to Rβ̂ − q ≠ 0, which is what the Wald type test checks. Therefore, both tests are checking the same quantity. This becomes clear if λ̃ = σ²S⁻¹(Rβ̂ − q) is substituted into the Z_r equation: the resulting expression is Z_r = (Rβ̂ − q)'S⁻¹(Rβ̂ − q), which is the same as Z. Thus, the numerator terms of F and F_r are the same; the difference is the denominator. If we use the unrestricted estimator σ̂² instead of the restricted estimator σ̃², that is, if we use W_u/(N − K) instead of W_r/(N − K + J) in the ratio F_r, then we obtain a test statistic that is the same as the Wald type test.
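The LM-type statistic uses only the restricted fit. Continuing the same kind of hypothetical setup (simulated data, illustrative restriction), the sketch below forms F_r from λ̃ and the restricted variance estimate σ̃² = SSR_r/(N − K + J).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 2.0, 2.0]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

R = np.array([[0.0, 1.0, -1.0]])    # hypothetical H0: beta_2 = beta_3
q = np.array([0.0])
J = R.shape[0]

G = R @ XtX_inv @ R.T                           # R (X'X)^(-1) R'
lam = np.linalg.solve(G, R @ beta_hat - q)      # restricted Lagrange multiplier lambda_tilde
beta_r = beta_hat - XtX_inv @ R.T @ lam
ssr_r = np.sum((y - X @ beta_r) ** 2)
sigma2_r = ssr_r / (N - K + J)                  # restricted variance estimate sigma_tilde^2

F_lm = (lam @ G @ lam) / (J * sigma2_r)         # LM-type statistic F_r
print(F_lm, stats.f.sf(F_lm, J, N - K + J))     # p-value from the null central F(J, N - K + J)
```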

Efficient Score Type Test

The unrestricted estimator is the solution of the first-order condition

    ∂SSR(β)/∂β |_{β=β̂} = −2X'y + 2X'Xβ̂ = 0   ⇒   g(β̂) ≡ −X'y + X'Xβ̂ = 0

If the restrictions are correct, then the restricted estimator should satisfy this equation closely, so that

    g(β̃) = −X'y + X'Xβ̃ ≈ 0

The score type test thus asks whether the restricted estimator satisfies the first-order condition of the unrestricted least squares estimation. The test rejects the null hypothesis if g(β̃) is far from a zero vector. This test can also be considered a comparison of the restricted gradient g(β̃) with the unrestricted gradient g(β̂), which is of course zero.

The score type test is equivalent to the LM type test. Consider the first-order condition of the restricted least squares:

    −X'y + X'Xβ̃ + R'λ̃ = 0   ⇒   g(β̃) + R'λ̃ = 0

Since R' has full column rank by our specification of the null hypothesis, λ̃ = 0 if and only if g(β̃) = 0. Therefore, the LM type test of asking whether λ̃ is close to zero is equivalent to asking whether g(β̃) is close to zero. Substituting R'λ̃ = −g(β̃) into Z_r gives

    Z_r = σ⁻²λ̃'[R(X'X)⁻¹R']λ̃ = σ⁻²(R'λ̃)'(X'X)⁻¹(R'λ̃) = σ⁻²g(β̃)'(X'X)⁻¹g(β̃)

and

    F_r = (1/(Jσ̃²)) g(β̃)'(X'X)⁻¹g(β̃) ~ F(J, N − K + J)   under H_0

Case of Single Restriction: t test

If J = 1 and Rβ̂ is a scalar, then under H_0 and under H_1, respectively,

    t = [(Rβ̂ − Rβ)/sd(Rβ̂)] / √(W/(N − K)) = [(Rβ̂ − Rβ)/sd(Rβ̂)] / √(σ̂²/σ²) = (Rβ̂ − Rβ)/est.sd(Rβ̂) ~ t(N − K)

    t = (Rβ̂ − q)/est.sd(Rβ̂) ~ t(N − K; Rβ − q)

H_0 is rejected when |t| > t_α, where the critical value is determined by P(|t| > t_α | H_0) = α.
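For a single restriction, the hypothetical sketch below (simulated data, illustrative restriction, not from the notes) forms the t statistic for H_0: Rβ = q and confirms that its square equals the corresponding F(1, N − K) statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
N, K = 60, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 2.0, 2.0]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (N - K)
V_hat = sigma2_hat * XtX_inv

R = np.array([0.0, 1.0, -1.0])      # hypothetical single restriction H0: beta_2 - beta_3 = 0
q = 0.0

est_sd = np.sqrt(R @ V_hat @ R)     # estimated standard deviation of R beta_hat
t_stat = (R @ beta_hat - q) / est_sd
p_value = 2 * stats.t.sf(abs(t_stat), N - K)
F_stat = (R @ beta_hat - q) ** 2 / (R @ V_hat @ R)
print(t_stat, p_value, np.isclose(t_stat ** 2, F_stat))
```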


More information

Regression and Statistical Inference

Regression and Statistical Inference Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF

More information

Linear Regression. Junhui Qian. October 27, 2014

Linear Regression. Junhui Qian. October 27, 2014 Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency

More information

Ma 3/103: Lecture 25 Linear Regression II: Hypothesis Testing and ANOVA

Ma 3/103: Lecture 25 Linear Regression II: Hypothesis Testing and ANOVA Ma 3/103: Lecture 25 Linear Regression II: Hypothesis Testing and ANOVA March 6, 2017 KC Border Linear Regression II March 6, 2017 1 / 44 1 OLS estimator 2 Restricted regression 3 Errors in variables 4

More information

8. Hypothesis Testing

8. Hypothesis Testing FE661 - Statistical Methods for Financial Engineering 8. Hypothesis Testing Jitkomut Songsiri introduction Wald test likelihood-based tests significance test for linear regression 8-1 Introduction elements

More information

Basic Distributional Assumptions of the Linear Model: 1. The errors are unbiased: E[ε] = The errors are uncorrelated with common variance:

Basic Distributional Assumptions of the Linear Model: 1. The errors are unbiased: E[ε] = The errors are uncorrelated with common variance: 8. PROPERTIES OF LEAST SQUARES ESTIMATES 1 Basic Distributional Assumptions of the Linear Model: 1. The errors are unbiased: E[ε] = 0. 2. The errors are uncorrelated with common variance: These assumptions

More information

Regression #5: Confidence Intervals and Hypothesis Testing (Part 1)

Regression #5: Confidence Intervals and Hypothesis Testing (Part 1) Regression #5: Confidence Intervals and Hypothesis Testing (Part 1) Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #5 1 / 24 Introduction What is a confidence interval? To fix ideas, suppose

More information

Motivation for multiple regression

Motivation for multiple regression Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope

More information

3. Linear Regression With a Single Regressor

3. Linear Regression With a Single Regressor 3. Linear Regression With a Single Regressor Econometrics: (I) Application of statistical methods in empirical research Testing economic theory with real-world data (data analysis) 56 Econometrics: (II)

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini April 27, 2018 1 / 1 Table of Contents 2 / 1 Linear Algebra Review Read 3.1 and 3.2 from text. 1. Fundamental subspace (rank-nullity, etc.) Im(X ) = ker(x T ) R

More information

L2: Two-variable regression model

L2: Two-variable regression model L2: Two-variable regression model Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revision: September 4, 2014 What we have learned last time...

More information

Linear Regression. y» F; Ey = + x Vary = ¾ 2. ) y = + x + u. Eu = 0 Varu = ¾ 2 Exu = 0:

Linear Regression. y» F; Ey = + x Vary = ¾ 2. ) y = + x + u. Eu = 0 Varu = ¾ 2 Exu = 0: Linear Regression 1 Single Explanatory Variable Assume (y is not necessarily normal) where Examples: y» F; Ey = + x Vary = ¾ 2 ) y = + x + u Eu = 0 Varu = ¾ 2 Exu = 0: 1. School performance as a function

More information

FIRST MIDTERM EXAM ECON 7801 SPRING 2001

FIRST MIDTERM EXAM ECON 7801 SPRING 2001 FIRST MIDTERM EXAM ECON 780 SPRING 200 ECONOMICS DEPARTMENT, UNIVERSITY OF UTAH Problem 2 points Let y be a n-vector (It may be a vector of observations of a random variable y, but it does not matter how

More information

Heteroskedasticity and Autocorrelation

Heteroskedasticity and Autocorrelation Lesson 7 Heteroskedasticity and Autocorrelation Pilar González and Susan Orbe Dpt. Applied Economics III (Econometrics and Statistics) Pilar González and Susan Orbe OCW 2014 Lesson 7. Heteroskedasticity

More information

Practical Econometrics. for. Finance and Economics. (Econometrics 2)

Practical Econometrics. for. Finance and Economics. (Econometrics 2) Practical Econometrics for Finance and Economics (Econometrics 2) Seppo Pynnönen and Bernd Pape Department of Mathematics and Statistics, University of Vaasa 1. Introduction 1.1 Econometrics Econometrics

More information

Chapter 2: simple regression model

Chapter 2: simple regression model Chapter 2: simple regression model Goal: understand how to estimate and more importantly interpret the simple regression Reading: chapter 2 of the textbook Advice: this chapter is foundation of econometrics.

More information

Ch4. Distribution of Quadratic Forms in y

Ch4. Distribution of Quadratic Forms in y ST4233, Linear Models, Semester 1 2008-2009 Ch4. Distribution of Quadratic Forms in y 1 Definition Definition 1.1 If A is a symmetric matrix and y is a vector, the product y Ay = i a ii y 2 i + i j a ij

More information

Xβ is a linear combination of the columns of X: Copyright c 2010 Dan Nettleton (Iowa State University) Statistics / 25 X =

Xβ is a linear combination of the columns of X: Copyright c 2010 Dan Nettleton (Iowa State University) Statistics / 25 X = The Gauss-Markov Linear Model y Xβ + ɛ y is an n random vector of responses X is an n p matrix of constants with columns corresponding to explanatory variables X is sometimes referred to as the design

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis y = β 0 + β 1 x 1 + β 2 x 2 +... β k x k + u 2. Inference 0 Assumptions of the Classical Linear Model (CLM)! So far, we know: 1. The mean and variance of the OLS estimators

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

Properties of the least squares estimates

Properties of the least squares estimates Properties of the least squares estimates 2019-01-18 Warmup Let a and b be scalar constants, and X be a scalar random variable. Fill in the blanks E ax + b) = Var ax + b) = Goal Recall that the least squares

More information

ECON The Simple Regression Model

ECON The Simple Regression Model ECON 351 - The Simple Regression Model Maggie Jones 1 / 41 The Simple Regression Model Our starting point will be the simple regression model where we look at the relationship between two variables In

More information

Spatial Regression. 9. Specification Tests (1) Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

Spatial Regression. 9. Specification Tests (1) Luc Anselin.   Copyright 2017 by Luc Anselin, All Rights Reserved Spatial Regression 9. Specification Tests (1) Luc Anselin http://spatial.uchicago.edu 1 basic concepts types of tests Moran s I classic ML-based tests LM tests 2 Basic Concepts 3 The Logic of Specification

More information

Econ 510 B. Brown Spring 2014 Final Exam Answers

Econ 510 B. Brown Spring 2014 Final Exam Answers Econ 510 B. Brown Spring 2014 Final Exam Answers Answer five of the following questions. You must answer question 7. The question are weighted equally. You have 2.5 hours. You may use a calculator. Brevity

More information

Chapter 4: Constrained estimators and tests in the multiple linear regression model (Part III)

Chapter 4: Constrained estimators and tests in the multiple linear regression model (Part III) Chapter 4: Constrained estimators and tests in the multiple linear regression model (Part III) Florian Pelgrin HEC September-December 2010 Florian Pelgrin (HEC) Constrained estimators September-December

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

General Linear Model: Statistical Inference

General Linear Model: Statistical Inference Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter 4), least

More information

Interpreting Regression Results

Interpreting Regression Results Interpreting Regression Results Carlo Favero Favero () Interpreting Regression Results 1 / 42 Interpreting Regression Results Interpreting regression results is not a simple exercise. We propose to split

More information

Large Sample Properties of Estimators in the Classical Linear Regression Model

Large Sample Properties of Estimators in the Classical Linear Regression Model Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in

More information

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University. Summer School in Statistics for Astronomers V June 1 - June 6, 2009 Regression Mosuk Chow Statistics Department Penn State University. Adapted from notes prepared by RL Karandikar Mean and variance Recall

More information

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

1. The OLS Estimator. 1.1 Population model and notation

1. The OLS Estimator. 1.1 Population model and notation 1. The OLS Estimator OLS stands for Ordinary Least Squares. There are 6 assumptions ordinarily made, and the method of fitting a line through data is by least-squares. OLS is a common estimation methodology

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 3 Jakub Mućk Econometrics of Panel Data Meeting # 3 1 / 21 Outline 1 Fixed or Random Hausman Test 2 Between Estimator 3 Coefficient of determination (R 2

More information

INTRODUCTORY ECONOMETRICS

INTRODUCTORY ECONOMETRICS INTRODUCTORY ECONOMETRICS Lesson 2b Dr Javier Fernández etpfemaj@ehu.es Dpt. of Econometrics & Statistics UPV EHU c J Fernández (EA3-UPV/EHU), February 21, 2009 Introductory Econometrics - p. 1/192 GLRM:

More information

Econometrics II - EXAM Answer each question in separate sheets in three hours

Econometrics II - EXAM Answer each question in separate sheets in three hours Econometrics II - EXAM Answer each question in separate sheets in three hours. Let u and u be jointly Gaussian and independent of z in all the equations. a Investigate the identification of the following

More information

Lecture 07 Hypothesis Testing with Multivariate Regression

Lecture 07 Hypothesis Testing with Multivariate Regression Lecture 07 Hypothesis Testing with Multivariate Regression 23 September 2015 Taylor B. Arnold Yale Statistics STAT 312/612 Goals for today 1. Review of assumptions and properties of linear model 2. The

More information

Intermediate Econometrics

Intermediate Econometrics Intermediate Econometrics Heteroskedasticity Text: Wooldridge, 8 July 17, 2011 Heteroskedasticity Assumption of homoskedasticity, Var(u i x i1,..., x ik ) = E(u 2 i x i1,..., x ik ) = σ 2. That is, the

More information

3 Multiple Linear Regression

3 Multiple Linear Regression 3 Multiple Linear Regression 3.1 The Model Essentially, all models are wrong, but some are useful. Quote by George E.P. Box. Models are supposed to be exact descriptions of the population, but that is

More information

Answers to Problem Set #4

Answers to Problem Set #4 Answers to Problem Set #4 Problems. Suppose that, from a sample of 63 observations, the least squares estimates and the corresponding estimated variance covariance matrix are given by: bβ bβ 2 bβ 3 = 2

More information

Estimation of the Response Mean. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 27

Estimation of the Response Mean. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 27 Estimation of the Response Mean Copyright c 202 Dan Nettleton (Iowa State University) Statistics 5 / 27 The Gauss-Markov Linear Model y = Xβ + ɛ y is an n random vector of responses. X is an n p matrix

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley Panel Data Models James L. Powell Department of Economics University of California, Berkeley Overview Like Zellner s seemingly unrelated regression models, the dependent and explanatory variables for panel

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

Instrumental Variables

Instrumental Variables Università di Pavia 2010 Instrumental Variables Eduardo Rossi Exogeneity Exogeneity Assumption: the explanatory variables which form the columns of X are exogenous. It implies that any randomness in the

More information

1 Appendix A: Matrix Algebra

1 Appendix A: Matrix Algebra Appendix A: Matrix Algebra. Definitions Matrix A =[ ]=[A] Symmetric matrix: = for all and Diagonal matrix: 6=0if = but =0if 6= Scalar matrix: the diagonal matrix of = Identity matrix: the scalar matrix

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 4 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 23 Recommended Reading For the today Serial correlation and heteroskedasticity in

More information

STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method.

STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. Rebecca Barter May 5, 2015 Linear Regression Review Linear Regression Review

More information

Intermediate Econometrics

Intermediate Econometrics Intermediate Econometrics Markus Haas LMU München Summer term 2011 15. Mai 2011 The Simple Linear Regression Model Considering variables x and y in a specific population (e.g., years of education and wage

More information

WLS and BLUE (prelude to BLUP) Prediction

WLS and BLUE (prelude to BLUP) Prediction WLS and BLUE (prelude to BLUP) Prediction Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark April 21, 2018 Suppose that Y has mean X β and known covariance matrix V (but Y need

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

MLES & Multivariate Normal Theory

MLES & Multivariate Normal Theory Merlise Clyde September 6, 2016 Outline Expectations of Quadratic Forms Distribution Linear Transformations Distribution of estimates under normality Properties of MLE s Recap Ŷ = ˆµ is an unbiased estimate

More information

LECTURE 5 HYPOTHESIS TESTING

LECTURE 5 HYPOTHESIS TESTING October 25, 2016 LECTURE 5 HYPOTHESIS TESTING Basic concepts In this lecture we continue to discuss the normal classical linear regression defined by Assumptions A1-A5. Let θ Θ R d be a parameter of interest.

More information

Introductory Econometrics

Introductory Econometrics Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 11, 2012 Outline Heteroskedasticity

More information

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models

More information