ECONOMETRICS (I) MEI-YUAN CHEN. Department of Finance National Chung Hsing University. July 17, 2003


© Mei-Yuan Chen. The LaTeX source file is ec471.tex.

Contents

1 Introduction
2 Reviews of Statistics
  2.1 Random Variables
  2.2 Estimation
  2.3 Hypothesis Testing
3 Random Sampling Model, Projection, and Regression
  3.1 Random Sampling
  3.2 Regression
  3.3 Linear Models
  3.4 Linear Projection
  3.5 Assumptions on the Regression Errors
4 Classical Linear Regression Models
  4.1 Simple Linear Regression (Hypothesis Testing; Prediction)
  4.2 Multiple Linear Regression
  4.3 Geometric Interpretations
  4.4 Measures of Goodness of Fit
5 Properties of the OLS Estimators
  5.1 Bias
  5.2 Variance-Covariance Matrix of Regression Error
  5.3 Variance-Covariance Matrix of OLS Estimator
  5.4 Gauss-Markov Theorem
  5.5 OLS Estimation of Error Variance
  5.6 Gaussian Quasi-MLE and MVUE
  5.7 Distribution of β̂ in Normal Regression Models
6 Method of Moments Estimation
7 Asymptotic Distribution Theory (Some Basic Mathematical Concepts; Some Inequalities; Modes of Convergence; Order Notations)
Consistency and Asymptotic Normality of OLS Estimators (Consistency; Asymptotic Normality)
Linear Hypothesis Testing: Finite Sample and Large Sample Tests (Finite Sample Tests: t- and F-Tests; An Alternative Approach; Large Sample Tests: t- and F-Tests, Wald Test, Lagrange Multiplier Test; Confidence Regions; Power of the Tests; Multicollinearity; Near Multicollinearity; Digress: Dummy Variables)
Generalized Least Squares Theory (GLS Estimators; Feasible GLS; Testing for Heteroskedasticity; Consistent Estimation of Covariance Matrices)
Generalized Method of Moments (Endogeneity; Instrumental Variables; Reduced Form; Identification; Instrumental Variables Estimation; GMM Estimator; 2SLS Estimator; Distribution of the GMM Estimator; Optimal Weight Matrix; Estimation of the Efficient Weight Matrix)
Nonlinear Regression Models (NLLS Estimation; Concentration; Computation Using Linearization; Asymptotic Distribution)
Regression Models with Limited Dependent Variables (A Binary Dependent Variable: the Linear Probability Model; Logit and Probit Models for Binary Response)
The Bootstrap (An Example; Definition of the Bootstrap; The Empirical Distribution Function)
References

1 Introduction

Economists have proposed numerous theories to characterize the relationships between economic variables; whether these theories are supported by real-world data is an empirical issue. By econometrics we mean the application of statistical and mathematical methods to the analysis of economic data, with the purpose of verifying or refuting economic theories. One of the most commonly used econometric techniques is regression analysis. In the nineteenth century, Sir Francis Galton studied the relationship between the heights of children and their parents. He observed that although tall parents tended to have tall children and short parents tended to have short children, there was a tendency for children's heights to converge toward the average. He termed this a "regression toward mediocrity." Contemporary regression analysis is concerned with describing and evaluating the relationship between a dependent variable and one or more explanatory variables. This involves formulating an econometric model, estimating its unknown parameters, and drawing statistical inference from the estimated results.

2 Reviews of Statistics

2.1 Random Variables

A random variable is a variable whose values are determined by an experiment of chance (i.e., governed by a probability distribution). We use a capital letter to denote a random variable and lower case to denote its value.

1. Discrete random variable X. Probability: P{X = x}. Probability distribution: P{X ≤ a} = Σ_{i: x_i ≤ a} P{X = x_i}.

2. Continuous random variable X. Probability density function (p.d.f.): f(x). Cumulative distribution function (c.d.f.): F(a) = P{X ≤ a} = ∫_{−∞}^{a} f(x) dx.

The behavior of a random variable is completely determined by its probability density function. Moments are numerical measures summarizing certain aspects of the behavior of a random variable, e.g., the expected value and the variance.

1. Expected value: E(X) = µ. If X is discrete, E(X) = Σ_i x_i P{X = x_i}. If X is continuous, E(X) = ∫ x f(x) dx. If c is nonstochastic, E(c) = c and E(cX) = c E(X).

2. Variance: var(X) = E(X − µ)² = E(X²) − µ² = σ². If X is discrete, var(X) = Σ_i (x_i − µ)² P{X = x_i}. If X is continuous, var(X) = ∫ (x − µ)² f(x) dx. If c is nonstochastic, var(c) = 0 and var(cX) = c² var(X).

The behavior of two (or more) random variables is determined by their joint probability density function.

1. Joint p.d.f.: f_XY(x, y) = P{X = x, Y = y} (in the discrete case).
2. Joint c.d.f.: F_XY(a, b) = P{X ≤ a, Y ≤ b} = ∫_{−∞}^{b} ∫_{−∞}^{a} f_XY(x, y) dx dy.
3. Marginal p.d.f.: f_X(x) = ∫ f_XY(x, y) dy; f_Y(y) = ∫ f_XY(x, y) dx.
4. Conditional p.d.f.: f(x|y) = f_XY(x, y)/f_Y(y); f(y|x) = f_XY(x, y)/f_X(x).
5. If f_XY(x, y) = f_X(x) f_Y(y), then X and Y are said to be independent.

The linear association between two random variables is characterized by their covariance (or correlation).

1. cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = E(XY) − µ_X µ_Y = σ_XY.
2. corr(X, Y) = σ_XY/(σ_X σ_Y) = ρ_XY and −1 ≤ ρ_XY ≤ 1. Note that corr(X, Y) is nothing but the covariance between Z_X and Z_Y, where Z_X = [X − E(X)]/√var(X) and Z_Y = [Y − E(Y)]/√var(Y) are the Z-scores of X and Y.

3. If X and Y are independent, then cov(X, Y) = 0; the converse is not true.
4. E(X + Y) = E(X) + E(Y); var(X + Y) = var(X) + var(Y) + 2 cov(X, Y).

Some frequently used random variables are:

Normal random variable: X ~ N(µ, σ²), and (X − µ)/σ ~ N(0, 1).
If X_1, ..., X_m are independent N(0, 1), then Z = Σ_{i=1}^{m} X_i² ~ χ²_m.
If X ~ N(0, 1) and Y ~ χ²_m are independent, then W = X/√(Y/m) ~ t_m.
If X ~ χ²_n and Y ~ χ²_m are independent, then U = (X/n)/(Y/m) ~ F_{n,m}.

2.2 Estimation

Typically, we do not know the population characteristics θ (e.g., the mean and variance) because we do not know the probabilistic structure governing the random variable. Hence, we collect data to estimate these unknown parameters.

1. Point estimation: An estimator is a function (a rule) of sample data; an estimate is its particular value. Given a sample x_1, ..., x_n, an estimator of θ can be represented as θ̂ = g(x_1, ..., x_n). Examples (see the numerical sketch at the end of this section), given a sample (x_1, y_1), ..., (x_n, y_n):
   An estimator of the mean: the sample average x̄ = Σ_{i=1}^{n} x_i/n.
   An estimator of the variance: the sample variance Σ_{i=1}^{n} (x_i − x̄)²/(n − 1).
   An estimator of the covariance: the sample covariance Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ)/(n − 1).
   An estimator of the correlation: the sample correlation Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / {[Σ_{i=1}^{n} (x_i − x̄)²]^{1/2} [Σ_{i=1}^{n} (y_i − ȳ)²]^{1/2}}.

2. Criteria to evaluate an estimator:
   Unbiasedness: E(θ̂) = θ.
   Efficiency: If θ̂_1 and θ̂_2 are both unbiased estimators, then θ̂_1 is said to be more efficient than θ̂_2 if var(θ̂_1) < var(θ̂_2).

   Mean squared error: E(θ̂ − θ)². This criterion allows us to compare biased estimators as well.

3. Interval estimation: Instead of providing a particular estimate of an unknown parameter, it may be desirable to provide a range of values which may contain the true parameter. To do this, we first specify a confidence coefficient γ, which is a probability, say 0.95, and then construct two functions g_1(x_1, ..., x_n) and g_2(x_1, ..., x_n) such that

P{g_1(x_1, ..., x_n) ≤ θ ≤ g_2(x_1, ..., x_n)} = γ.

The interval (g_1, g_2) is called the confidence interval. In words, we are 95% sure that this interval would contain the parameter θ.

Example: The x_i are drawn independently from N(µ, 1). Let γ = 0.95 and consider the estimator x̄ = Σ_{i=1}^{n} x_i/n for µ. It can be verified that x̄ ~ N(µ, 1/n), so that √n(x̄ − µ) ~ N(0, 1). From the table of the standard normal random variable, P{−1.96 < √n(x̄ − µ) < 1.96} = 0.95. Hence, the 95% confidence interval for µ is (x̄ − 1.96/√n, x̄ + 1.96/√n).

2.3 Hypothesis Testing

Theory (or prior belief) may suggest that the true parameter θ equals a particular value a. Hence, we may be interested in testing the null hypothesis H_0: θ = a against the alternative hypothesis H_a: θ ≠ a (or θ > a).

1. Test statistic T: it typically involves the difference between the estimate and the hypothesized value, e.g., T = √n(x̄ − a) is used to test H_0: µ = a. A test statistic is a random variable and hence has a distribution from which we can assess its probability, e.g., T ~ N(0, 1) under the null hypothesis. A large value of |T| is considered improbable under the null and hence suggests rejection of the null hypothesis.

2. Significance level α: the probability of incorrectly rejecting the null hypothesis that we are willing to tolerate. (This probability is also known as the type I error.)

Given α, a critical value c_α is chosen such that P{|T| > c_α} = α. We reject H_0 if the observed value T = t is such that |t| > c_α, and we say that T = t is significant at the level α.

3. Power: the probability of rejecting the null hypothesis when it is indeed false. The type II error is the probability of incorrectly accepting the null hypothesis. Hence, the power of a test is 1 − (type II error).

4. p-value: given an observed test statistic T = t, the probability of observing a value at least as extreme (i.e., beyond |t| and −|t|). That is, the p-value is the α at which T = t is just significant.
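The following is a minimal numerical sketch of these ideas in Python with NumPy. The simulated sample, the hypothesized value a, and the assumption of a known unit variance are illustrative choices, not part of the text.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

# Hypothetical sample drawn from N(2, 1); in practice (x_i, y_i) are observed data.
n = 100
x = rng.normal(2.0, 1.0, size=n)
y = 0.5 * x + rng.normal(0.0, 1.0, size=n)

# Point estimators from Section 2.2
x_bar = x.mean()                        # sample average
s2_x = x.var(ddof=1)                    # sample variance, divisor n - 1
s_xy = np.cov(x, y, ddof=1)[0, 1]       # sample covariance
r_xy = np.corrcoef(x, y)[0, 1]          # sample correlation

# 95% confidence interval for mu when the variance is known to be 1:
ci = (x_bar - 1.96 / sqrt(n), x_bar + 1.96 / sqrt(n))

# Test H0: mu = a against H_a: mu != a with T = sqrt(n)(x_bar - a) ~ N(0, 1) under H0.
a = 2.0
t_stat = sqrt(n) * (x_bar - a)
p_value = 2 * (1 - 0.5 * (1 + erf(abs(t_stat) / sqrt(2))))   # two-sided p-value

print(x_bar, s2_x, s_xy, r_xy)
print(ci, t_stat, p_value)
```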

3 Random Sampling Model, Projection, and Regression

3.1 Random Sampling

Suppose an econometrician has the observational data {w_i, i = 1, ..., n} = {w_1, w_2, ..., w_n}, where each w_i is a vector of numerical values representing the characteristics of an individual. Typically, the data can be written as

w_1 = (y_1, x_1'), w_2 = (y_2, x_2'), ..., w_n = (y_n, x_n'), with y_i ∈ R and x_i ∈ R^k,

or, stacking the observations,

[ y_1  x_11  x_12 ... x_1k ]
[ y_2  x_21  x_22 ... x_2k ]
[  .     .     .        .  ]
[ y_n  x_n1  x_n2 ... x_nk ]  = (y, x_1, x_2, ..., x_k),

where each column contains the observations on one variable. If the data are cross-sectional (the w_i, i = 1, ..., n, were observed at a given time and i indexes individuals), it is reasonable to assume they are mutually independent (spatial data are an exception). Furthermore, if the data are gathered symmetrically (e.g., randomly), it is also reasonable to model each observation as a random draw from the same probability distribution. Thus, the data are independent and identically distributed, or i.i.d. We call this a random sample.

3.2 Regression

In regression, we want to find the central tendency of the conditional distribution of y given x = x_i. A standard measure of central tendency is the mean; the conditional analog is the conditional mean. Let f(y, x) denote the joint density of (y, x). Then the conditional density

f(y|x = x_i) = f(y, x = x_i)/f_x(x = x_i)

exists, where f_x(x = x_i) = ∫ f(y, x = x_i) dy is the marginal density of x at x_i.

The conditional mean is defined as the function

m(x_i) = E(y|x = x_i) = ∫ y f(y|x = x_i) dy.

Note that this definition requires the existence of densities. The conditional mean m(x_i) = E(y|x = x_i) is a function, meaning that when x equals x_i, the expected value of y is m(x_i). Clearly, it is a random variable since it is a function of the random variable x_i. The regression error e_i is defined to be the difference between y_i given x = x_i and its conditional mean:

e = (y|x = x_i) − m(x_i).

By construction, this yields the formula

(y|x = x_i) = m(x_i) + e. (1)

For the jointly observed data (x_i, y_i), i = 1, ..., n, the regression under consideration can be expressed as

y_i = m(x_i) + e_i, i = 1, ..., n.

It is worth emphasizing that no assumptions have been imposed to develop (1), other than that (y, x) have a joint distribution and E|y| < ∞.

Proposition 3.1 (Properties of the regression errors e_i)
1. E(e_i|x_i) = 0.
2. E(e_i) = 0.
3. E[h(x_i)e_i] = 0 for any function h(·).
4. E(x_i e_i) = 0.

Proof:

1. By the definition of e_i and the linearity of conditional expectation,

E(e_i|x_i) = E[(y_i − m(x_i))|x_i] = E(y_i|x_i) − E[m(x_i)|x_i] = m(x_i) − m(x_i) = 0,

as E[m(x_i)|x_i] = m(x_i).

2. By the law of iterated expectations and the first result, E(e_i) = E[E(e_i|x_i)] = E[0] = 0.

3. By essentially the same argument, E[h(x_i)e_i] = E{E[h(x_i)e_i|x_i]} = E{h(x_i)E[e_i|x_i]} = E{h(x_i) · 0} = 0.

4. Follows from the third result by setting h(x_i) = x_i.

The final result implies that e_i and x_i are uncorrelated. It is important to understand that, despite being uncorrelated, e_i need not in general be independent of x_i. The equations

y_i = m(x_i) + e_i,   E(e_i|x_i) = 0,   for all i,

are often stated jointly as the regression framework. It is important to understand that this is a framework, not a model, because no restrictions have been placed on the joint distribution of the data; these equations hold true by definition. A regression model imposes further restrictions on the joint distribution, most typically restrictions on the permissible class of regression functions m(x).
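A small simulation can illustrate Proposition 3.1. The design below (a known conditional mean m(x) = 1 + x² and heteroskedastic noise) is a hypothetical assumption chosen only so that the regression error can be computed directly; the sample moments then mirror the population properties.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical design with a known conditional mean m(x) = 1 + x^2,
# so the regression error e = y - m(x) is observable in the simulation.
x = rng.uniform(-1.0, 1.0, size=n)
m = 1.0 + x**2
y = m + rng.normal(0.0, 0.5, size=n) * (1.0 + 0.5 * np.abs(x))  # heteroskedastic noise

e = y - m  # regression error, by construction

# Sample analogues of Proposition 3.1: each should be close to zero.
print(e.mean())                      # E(e) = 0
print((x * e).mean())                # E(x e) = 0
print((np.sin(3 * x) * e).mean())    # E(h(x) e) = 0 for h(x) = sin(3x)
```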

3.3 Linear Models

While m(x) in general can take any shape, a parametric family {m(x, β): β ∈ R^k} is typically chosen to simplify estimation and interpretation. Sometimes the form of m(x, β) is given by an economic theory or model. Most often, however, we adopt a linear form for convenience and data coherence. A linear model for m(x) is written as

m(x_i) = β_1 + β_2 x_{i2} + ... + β_k x_{ik},

where β = (β_1, ..., β_k)' is the parameter vector. In matrix notation, m(x_i) = x_i'β, where x_i = (1, x_{i2}, ..., x_{ik})'. Then the linear regression model becomes

y_i = x_i'β + e_i, (2)
E(e_i|x_i) = 0.

This is a model because m(·) has been restricted to the linear form. While linearity is substantively restrictive, it still allows a great deal of flexibility. For example, if x_i is real-valued and

m(x_i) = β_1 + x_i β_2 + x_i² β_3 + ... + x_i^{k−1} β_k

is a polynomial, then a linear regression model still holds, by redefining x_i as (1, x_i, x_i², ..., x_i^{k−1})'. The linear conditional mean model is illustrated in Figure 1.

3.4 Linear Projection

The linear regression model (2) implies E(x_i e_i) = 0, as

E(x_i e_i) = E[x_i(y_i − x_i'β)] = E{E[x_i(y_i − x_i'β)|x_i]} = E{x_i[E(y_i|x_i) − x_i'β]} = E(x_i · 0) = 0.

[Figure 1: An illustration of the linear conditional mean y = α_0 + β_0 x, with E(y|x = x_1), E(y|x = x_2), E(y|x = x_3) marked on the line.]

This condition is sufficient for many asymptotic results. It is interesting to observe that in linear models there is always a vector β such that this equation holds. This vector β may be called the linear projection coefficient, and x_i'β the linear predictor.

Proposition 3.2 For any random variables (y_i, x_i), let

β = [E(x_i x_i')]^{-1} E(x_i y_i) (3)

and e_i = y_i − x_i'β. Then E(x_i e_i) = 0.

Proof:

E(x_i e_i) = E[x_i(y_i − x_i'β)] = E(x_i y_i) − E(x_i x_i')β = E(x_i y_i) − E(x_i x_i')[E(x_i x_i')]^{-1} E(x_i y_i) = 0.

If β is defined as in (3), then E(x_i e_i) = 0 holds by construction. It does not necessarily follow that E(e_i|x_i) = 0; this holds only if the true conditional mean of y_i is x_i'β, i.e., m(x_i) = x_i'β, which is a substantive restriction. Thus the linear regression assumption E(e_i|x_i) = 0 is more restrictive than the linear projection construction. It turns out that for most issues in statistical inference the projection assumption is sufficient, and therefore the more general assumption E(x_i e_i) = 0 is often adopted. For econometric practice, however, it is typically desirable for x_i'β to represent the conditional mean of y_i rather than a simple linear projection. So while the conditional mean assumption is not necessary for inference on β, it may be necessary for inference on an economic relationship of interest.

3.5 Assumptions on the Regression Errors

While the regression motivation leads naturally to the model (2), at times it is more convenient to adopt assumptions which are either more or less restrictive. The standard types of models considered by econometricians, and their strengths and weaknesses, are discussed as follows. All the models are based on the decomposition

y_i = x_i'β + e_i. (4)

In addition, all models normalize the error so that E(e_i) = 0 and presume a finite variance E(e_i²) = σ² < ∞.

Definition 3.1 The Linear Projection Model is (4) plus E(x_i e_i) = 0.

The advantage of the linear projection model is that it is true by construction, and many inferential results hold under this broad condition. The disadvantage is that the coefficient vector β may not have a useful economic interpretation without additional structure. A sample analogue of the projection coefficient (3) is sketched below.
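This sketch computes the sample analogue of (3), β̂ = (Σ_i x_i x_i')^{-1} Σ_i x_i y_i, for a hypothetical data set in which y is deliberately nonlinear in the regressor, so x'β is a projection rather than the conditional mean; the moment condition E(x_i e_i) = 0 nevertheless holds in the sample by construction.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Hypothetical regressors (a constant and one covariate); y need not have a linear
# conditional mean for the projection coefficient to be well defined.
z = rng.normal(size=n)
y = np.exp(0.3 * z) + rng.normal(size=n)     # nonlinear in z
X = np.column_stack([np.ones(n), z])         # x_i' = (1, z_i)

# Sample analogue of beta = [E(x_i x_i')]^{-1} E(x_i y_i)
Sxx = X.T @ X / n
Sxy = X.T @ y / n
beta = np.linalg.solve(Sxx, Sxy)

e = y - X @ beta
print(beta)
print(X.T @ e / n)   # sample version of E(x_i e_i); zero (up to rounding) by construction
```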

Definition 3.2 The Linear Regression Model is (4) plus E(e_i|x_i) = 0.

This model follows naturally from the derivation of the conditional mean function. Its primary advantage is that the parameter β is easily interpretable.

Definition 3.3 The Homoskedastic Regression Model is the Linear Regression Model plus

E(e_i²|x_i) = σ². (5)

This model adds the auxiliary assumption (5) that the regression is conditionally homoskedastic. This assumption greatly simplifies many theoretical arguments and calculations, and it is therefore very useful in illustrative arguments. Many formulae simplify under this assumption, and as a result alternative estimators and techniques are utilized. The danger in this assumption is that these simplifications produce incorrect answers and inferences if the homoskedasticity assumption is in fact false. Another justification for assumption (5) is that while it may not be precisely true in the data, it may be approximately true, and in some applications the cost of imposing homoskedasticity on the estimates may be less than the cost of using the more general techniques appropriate for the linear regression model.

Definition 3.4 The Classical Regression Model is (4) plus the assumption that e_i is independent of x_i. Usually, x_i is assumed to be nonstochastic.

This model is more restrictive than the homoskedastic regression model and is the common starting point in classical econometrics textbooks.

Definition 3.5 The Normal Regression Model is (4) plus the assumption that e_i is independent of x_i and distributed as N(0, σ²).

The above five models are strictly nested, with the first (the linear projection model) the least restrictive and the last (the normal regression model) the most restrictive. The conditional variance function is

var(y_i|x = x_i) = E(e_i²|x = x_i) = σ²(x_i),

which is (potentially) a function of x_i. Just as the conditional mean function may take any form, so may the conditional variance function (other than the restriction that it be non-negative). Given the random variable x_i, the conditional variance is σ_i² = σ²(x_i).

In the general case where σ²(x_i) is not a constant function, so that σ_i² differs across i, we say that the error e_i is heteroskedastic. On the other hand, when the function σ²(x_i) is a constant, so that the conditional variances σ_i² all equal the same constant value σ², we say that the error e_i is homoskedastic.

4 Classical Linear Regression Models

In classical analysis, the tools of regression have been applied to study how a response variable y is affected by the independent variables x in a laboratory experiment. Usually, the values of x are designed by the scientist so that they are controllable; therefore, the x are also called the controlled variables or design variables. Thus, the explanatory variables are assumed to be nonstochastic in classical regression analysis.

4.1 Simple Linear Regression

It is typical and convenient to describe an economic relationship using a linear model. Hence, given a set of economic data, one would like to find a linear equation (a straight line) that best fits the data. We know that a good estimator is one with small mean squared error. In regression analysis, we are trying to estimate the dependent variable y with a set of explanatory variables x; that is, we want to find an estimator ŷ = f(x) of y. The mean squared error of ŷ is mse(ŷ) = E[(ŷ − y)²]. As the arithmetic average of sample observations is nothing but an expected value evaluated with the sample relative frequencies as probabilities, the sample counterpart of mse(ŷ) is the arithmetic average Σ_{i=1}^{n} (y_i − f(x_i))²/n. In the linear regression context, the estimator f(x) is restricted to a linear functional form, i.e., f(x) = β_1 x_1 + β_2 x_2 + ... + β_k x_k. The discussion of simple linear regression focuses on k = 2. Thus, we want to find an estimator which makes Σ_{i=1}^{n} (y_i − α − βx_i)²/n as small as possible. This is the ordinary least squares (OLS) estimator we are going to discuss.

Given that the linear conditional mean E(y|x) = α_0 + β_0 x is assumed (usually it is unknown) from an economic or financial theory, a linear regression model is specified as y = α + βx + u. Under the belief that the obtained sample observations {x_i, y_i}, i = 1, ..., n, are representative, the relations y_i = α_0 + β_0 x_i + e_i, i = 1, ..., n, are believed to hold, and the regression model is appropriate for the sample observations, i.e., y_i = α + βx_i + u_i, i = 1, ..., n. The relations y_i = α_0 + β_0 x_i + e_i, i = 1, ..., n, are sometimes called the identification of the relation between y and x, denoted ID 1 hereafter. That is,

ID 1: y_i = α_0 + β_0 x_i + e_i, i = 1, ..., n.

The OLS estimators of α and β are obtained by minimizing the average of the squared errors:

f(α, β) = (1/n) Σ_{i=1}^{n} (y_i − α − βx_i)².

The first order conditions are

∂f(α, β)/∂α = −2 (1/n) Σ_{i=1}^{n} (y_i − α̂_n − β̂_n x_i) = 0, (6)
∂f(α, β)/∂β = −2 (1/n) Σ_{i=1}^{n} (y_i − α̂_n − β̂_n x_i) x_i = 0, (7)

which are also called the normal equations. From (6) we obtain

α̂_n = (1/n) Σ_{i=1}^{n} y_i − β̂_n (1/n) Σ_{i=1}^{n} x_i = ȳ_n − β̂_n x̄_n, (8)

and by plugging this α̂_n into (7) we get

(1/n) Σ_{i=1}^{n} y_i x_i = (ȳ_n − β̂_n x̄_n)(1/n) Σ_{i=1}^{n} x_i + β̂_n (1/n) Σ_{i=1}^{n} x_i²,

so that

β̂_n Σ_{i=1}^{n} x_i (x_i − x̄_n) = Σ_{i=1}^{n} x_i (y_i − ȳ_n). (9)

It follows from (8) and (9) that the OLS estimators of α and β are

β̂_n = Σ_{i=1}^{n} (y_i − ȳ_n)(x_i − x̄_n) / Σ_{i=1}^{n} (x_i − x̄_n)², (10)
α̂_n = ȳ_n − β̂_n x̄_n. (11)

Note that β̂_n exists uniquely if Σ_{i=1}^{n} (x_i − x̄_n)² is not equal to zero deterministically. Obviously Σ_{i=1}^{n} (x_i − x̄_n)² = 0 if all the x_i are constant, and it could also be zero when x_i is stochastic. Therefore, we impose the following assumption so that β̂_n exists uniquely and deterministically:

A1: x_i, i = 1, ..., n, are nonstochastic and not all constant.

The equation ŷ = α̂_n + β̂_n x is the regression line. The values ŷ_i are called fitted values, and e_i = y_i − ŷ_i are called residuals. Note that by the normal equation (6), Σ_{i=1}^{n} e_i = 0, so that Σ_{i=1}^{n} y_i = Σ_{i=1}^{n} ŷ_i and ȳ_n equals the sample average of the ŷ_i. Also, by (7), Σ_{i=1}^{n} x_i e_i = 0, so that Σ_{i=1}^{n} ŷ_i e_i = 0.
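A minimal Python sketch of formulas (10) and (11); the simulated sample and the values α_0 = 1, β_0 = 2 are illustrative assumptions, not from the text.

```python
import numpy as np

def simple_ols(x, y):
    """OLS estimates for y_i = alpha + beta * x_i + e_i via formulas (10) and (11)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    beta_hat = np.sum((y - y_bar) * (x - x_bar)) / np.sum((x - x_bar) ** 2)
    alpha_hat = y_bar - beta_hat * x_bar
    residuals = y - alpha_hat - beta_hat * x
    return alpha_hat, beta_hat, residuals

# Hypothetical data consistent with ID 1 (alpha_0 = 1, beta_0 = 2 are illustrative).
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(0, 1, size=50)

a_hat, b_hat, e = simple_ols(x, y)
print(a_hat, b_hat)
print(e.sum(), (x * e).sum())   # both ~ 0 by the normal equations (6) and (7)
```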

The OLS estimators have the following properties under appropriate assumptions.

1. Given A1, the OLS estimators are linear estimators in y_i, i.e., β̂_n = Σ_{i=1}^{n} k_i y_i and α̂_n = Σ_{i=1}^{n} h_i y_i:

β̂_n = Σ_{i=1}^{n} (x_i − x̄_n) y_i / Σ_{i=1}^{n} (x_i − x̄_n)²
     = Σ_{i=1}^{n} [(x_i − x̄_n) / Σ_{j=1}^{n} (x_j − x̄_n)²] y_i
     = Σ_{i=1}^{n} k_i y_i,

and

α̂_n = ȳ_n − β̂_n x̄_n
     = Σ_{i=1}^{n} y_i/n − x̄_n Σ_{i=1}^{n} k_i y_i
     = Σ_{i=1}^{n} (1/n − k_i x̄_n) y_i
     = Σ_{i=1}^{n} h_i y_i.

Note that Σ_{i=1}^{n} k_i = 0 and Σ_{i=1}^{n} k_i² = 1/Σ_{i=1}^{n} (x_i − x̄_n)².

2. Given ID 1 (y_i = α_0 + β_0 x_i + e_i, i = 1, ..., n) and A1, the OLS estimators are conditionally unbiased. First observe that, denoting X = (x_1, x_2, ..., x_n),

β̂_n = Σ_{i=1}^{n} (x_i − x̄_n) y_i / Σ_{i=1}^{n} (x_i − x̄_n)²
     = Σ_{i=1}^{n} (x_i − x̄_n)(α_0 + β_0 x_i + e_i) / Σ_{i=1}^{n} (x_i − x̄_n)²   (under ID 1)
     = α_0 Σ_{i=1}^{n} (x_i − x̄_n) / Σ_{i=1}^{n} (x_i − x̄_n)² + β_0 Σ_{i=1}^{n} (x_i − x̄_n) x_i / Σ_{i=1}^{n} (x_i − x̄_n)² + Σ_{i=1}^{n} (x_i − x̄_n) e_i / Σ_{i=1}^{n} (x_i − x̄_n)²
     = β_0 + Σ_{i=1}^{n} (x_i − x̄_n) e_i / Σ_{i=1}^{n} (x_i − x̄_n)²
     = β_0 + Σ_{i=1}^{n} k_i e_i.

To prove unbiasedness, take expectations of both sides of the above equation:

E(β̂_n|X) = E(β_0 + Σ_{i=1}^{n} (x_i − x̄_n) e_i / Σ_{i=1}^{n} (x_i − x̄_n)² | X)
          = β_0 + Σ_{i=1}^{n} (x_i − x̄_n) E(e_i|X) / Σ_{i=1}^{n} (x_i − x̄_n)²
          = β_0.

By iterated expectations, E(β̂_n) = E[E(β̂_n|X)] = E(β_0) = β_0; that is, β̂_n is also unconditionally unbiased. Besides, it can be seen that

E(α̂_n|X) = E(ȳ_n − β̂_n x̄_n|X)
          = E(Σ_{i=1}^{n} y_i/n − β̂_n x̄_n|X)
          = E(Σ_{i=1}^{n} (α_0 + β_0 x_i + ε_i)/n − β̂_n x̄_n|X)
          = E(α_0 + (β_0 − β̂_n) x̄_n + Σ_{i=1}^{n} ε_i/n | X)
          = α_0 + E(β_0 − β̂_n|X) x̄_n + Σ_{i=1}^{n} E(ε_i|X)/n
          = α_0,

since β̂_n is conditionally unbiased for β_0, so that E(β_0 − β̂_n|X) = 0. Alternatively, as E(ȳ_n|X) = α_0 + β_0 x̄_n, it follows that E(α̂_n|X) = E(ȳ_n − β̂_n x̄_n|X) = α_0. Note that

α̂_n = α_0 + (β_0 − β̂_n) x̄_n + Σ_{i=1}^{n} ε_i/n
     = α_0 − Σ_{i=1}^{n} k_i x̄_n ε_i + Σ_{i=1}^{n} ε_i/n
     = α_0 + Σ_{i=1}^{n} [1/n − k_i x̄_n] ε_i
     = α_0 + Σ_{i=1}^{n} h_i ε_i.

3. Under the homoskedastic linear model, i.e., ID 1 plus E(ε_i) = 0, var(ε_i) = σ_0², and E(ε_i ε_j) = 0 for i ≠ j, it can be shown that

σ²_{α̂_n} := var(α̂_n) = σ_0² (1/n + x̄_n² / Σ_{i=1}^{n} (x_i − x̄_n)²),
σ²_{β̂_n} := var(β̂_n) = σ_0² / Σ_{i=1}^{n} (x_i − x̄_n)²,
σ_{α̂_n β̂_n} := cov(α̂_n, β̂_n) = −σ_0² x̄_n / Σ_{i=1}^{n} (x_i − x̄_n)².

First, we observe that

E(y_i) = E(α_0 + β_0 x_i + ε_i) = α_0 + β_0 x_i,
var(y_i) = E[(y_i − E(y_i))²] = E[(α_0 + β_0 x_i + ε_i − α_0 − β_0 x_i)²] = E(ε_i²) = σ_0²,
cov(y_i, y_j) = E[(y_i − E(y_i))(y_j − E(y_j))] = E(ε_i ε_j) = 0 for i ≠ j.

Then, we prove the above results as follows:

σ²_{β̂_n} = var(Σ_{i=1}^{n} k_i y_i)
          = Σ_{i=1}^{n} k_i² var(y_i) + 2 Σ_{i=1}^{n} Σ_{j=i+1}^{n} k_i k_j cov(y_i, y_j)
          = σ_0² Σ_{i=1}^{n} k_i²
          = σ_0² / Σ_{i=1}^{n} (x_i − x̄_n)².

Besides,

σ²_{α̂_n} = var(Σ_{i=1}^{n} h_i y_i)
          = Σ_{i=1}^{n} h_i² var(y_i) + 2 Σ_{i=1}^{n} Σ_{j=i+1}^{n} h_i h_j cov(y_i, y_j)
          = σ_0² Σ_{i=1}^{n} h_i²
          = σ_0² Σ_{i=1}^{n} (1/n − k_i x̄_n)²
          = σ_0² Σ_{i=1}^{n} (1/n² − 2 k_i x̄_n/n + k_i² x̄_n²)
          = σ_0² [1/n − (2 x̄_n/n) Σ_{i=1}^{n} k_i + x̄_n² Σ_{i=1}^{n} k_i²]
          = σ_0² (1/n + x̄_n² / Σ_{i=1}^{n} (x_i − x̄_n)²).

Finally,

σ_{α̂_n β̂_n} = cov(Σ_{i=1}^{n} k_i y_i, Σ_{i=1}^{n} h_i y_i)
             = E{[Σ_{i=1}^{n} k_i (y_i − E(y_i))][Σ_{i=1}^{n} h_i (y_i − E(y_i))]}
             = Σ_{i=1}^{n} h_i k_i var(y_i) + Σ_{i≠j} k_i h_j cov(y_i, y_j)
             = σ_0² Σ_{i=1}^{n} k_i (1/n − k_i x̄_n)
             = (σ_0²/n) Σ_{i=1}^{n} k_i − σ_0² x̄_n Σ_{i=1}^{n} k_i²
             = −σ_0² x̄_n / Σ_{i=1}^{n} (x_i − x̄_n)².

4. (Gauss-Markov Theorem) This result says that, given ID 1 and A1 (together with the error assumptions used in Property 3), α̂_n and β̂_n have the smallest variance (are the most efficient) among all linear unbiased estimators of α_0 and β_0; i.e., they are the Best Linear Unbiased Estimators (BLUE).

Proof: Let β̃_n be any linear estimator in y_i other than β̂_n, so that it can be written as

β̃_n = Σ_{i=1}^{n} (k_i + c_i) y_i
     = Σ_{i=1}^{n} (k_i + c_i)(α_0 + β_0 x_i + ε_i)
     = α_0 Σ_{i=1}^{n} (k_i + c_i) + β_0 Σ_{i=1}^{n} (k_i + c_i) x_i + Σ_{i=1}^{n} (k_i + c_i) ε_i,

where the c_i are not all zero. Unbiasedness of β̃_n requires Σ_{i=1}^{n} (k_i + c_i) = 0 and Σ_{i=1}^{n} (k_i + c_i) x_i = 1. Thus, given Σ_{i=1}^{n} k_i = 0 and Σ_{i=1}^{n} k_i x_i = 1, we need Σ_{i=1}^{n} c_i = 0 and Σ_{i=1}^{n} x_i c_i = 0. As

var(β̃_n) = var(Σ_{i=1}^{n} (k_i + c_i) y_i)
          = Σ_{i=1}^{n} (k_i + c_i)² var(y_i) + 2 Σ_{i=1}^{n} Σ_{j=i+1}^{n} (k_i + c_i)(k_j + c_j) cov(y_i, y_j)
          = Σ_{i=1}^{n} (k_i + c_i)² var(y_i)
          = σ_0² Σ_{i=1}^{n} k_i² + σ_0² Σ_{i=1}^{n} c_i² + 2 σ_0² Σ_{i=1}^{n} k_i c_i,

and

Σ_{i=1}^{n} k_i c_i = Σ_{i=1}^{n} (x_i − x̄_n) c_i / Σ_{i=1}^{n} (x_i − x̄_n)² = [Σ_{i=1}^{n} x_i c_i − x̄_n Σ_{i=1}^{n} c_i] / Σ_{i=1}^{n} (x_i − x̄_n)² = 0,

we have

var(β̃_n) = σ_0² / Σ_{i=1}^{n} (x_i − x̄_n)² + σ_0² Σ_{i=1}^{n} c_i² ≥ var(β̂_n).

Therefore, β̂_n has the smallest variance among linear unbiased estimators.

5. σ̂_n² = Σ_{i=1}^{n} e_i²/(n − 2) is unbiased for σ_0². As

e_i = y_i − ŷ_i = y_i − α̂_n − β̂_n x_i
    = (α_0 + β_0 x_i + ε_i) − (ȳ_n − β̂_n x̄_n) − β̂_n x_i
    = (α_0 + β_0 x_i + ε_i) − (Σ_{j=1}^{n} (α_0 + β_0 x_j + ε_j)/n − β̂_n x̄_n) − β̂_n x_i
    = β_0 x_i + ε_i − β_0 x̄_n − ε̄_n + β̂_n x̄_n − β̂_n x_i
    = −(β̂_n − β_0)(x_i − x̄_n) + (ε_i − ε̄_n),

we have

Σ_{i=1}^{n} e_i² = Σ_{i=1}^{n} (ε_i − ε̄_n)² + (β̂_n − β_0)² Σ_{i=1}^{n} (x_i − x̄_n)² − 2(β̂_n − β_0) Σ_{i=1}^{n} (ε_i − ε̄_n)(x_i − x̄_n).

Observe that

E[Σ_{i=1}^{n} (ε_i − ε̄_n)²] = E(Σ_{i=1}^{n} ε_i² − n ε̄_n²) = Σ_{i=1}^{n} var(ε_i) − n var(ε̄_n) = n σ_0² − n(σ_0²/n) = (n − 1) σ_0².

And,

E[(β̂_n − β_0)² Σ_{i=1}^{n} (x_i − x̄_n)²] = Σ_{i=1}^{n} (x_i − x̄_n)² E[(β̂_n − β_0)²] = Σ_{i=1}^{n} (x_i − x̄_n)² [σ_0² / Σ_{i=1}^{n} (x_i − x̄_n)²] = σ_0².

Finally, as β̂_n = β_0 + Σ_{i=1}^{n} k_i ε_i,

E[(β̂_n − β_0) ε_i] = E[(Σ_{j=1}^{n} k_j ε_j) ε_i] = k_i E(ε_i²) = k_i σ_0²,
E[(β̂_n − β_0) ε̄_n] = E[(Σ_{i=1}^{n} k_i ε_i)(Σ_{j=1}^{n} ε_j/n)] = (1/n) Σ_{i=1}^{n} k_i σ_0² = 0.

This implies

E[2(β̂_n − β_0) Σ_{i=1}^{n} (ε_i − ε̄_n)(x_i − x̄_n)]
  = 2 Σ_{i=1}^{n} (x_i − x̄_n) E[(β̂_n − β_0) ε_i] − 2 Σ_{i=1}^{n} (x_i − x̄_n) E[(β̂_n − β_0) ε̄_n]
  = 2 Σ_{i=1}^{n} (x_i − x̄_n) k_i σ_0²
  = 2 σ_0².

Thus,

E(Σ_{i=1}^{n} e_i²) = (n − 1) σ_0² + σ_0² − 2 σ_0² = (n − 2) σ_0²,

and we have proved that E(σ̂_n²) = σ_0².

6. As σ_0² is unknown, var(α̂_n) and var(β̂_n) can be estimated by

s²_{α̂_n} := v̂ar(α̂_n) = σ̂_n² (1/n + x̄_n² / Σ_{i=1}^{n} (x_i − x̄_n)²),
s²_{β̂_n} := v̂ar(β̂_n) = σ̂_n² / Σ_{i=1}^{n} (x_i − x̄_n)²,
s_{α̂_n β̂_n} := ĉov(α̂_n, β̂_n) = −σ̂_n² x̄_n / Σ_{i=1}^{n} (x_i − x̄_n)².

Since σ̂_n² is unbiased for σ_0², s²_{α̂_n}, s²_{β̂_n}, and s_{α̂_n β̂_n} are unbiased for σ²_{α̂_n}, σ²_{β̂_n}, and σ_{α̂_n β̂_n}, respectively.

Note that the identification condition plus the assumptions mentioned previously are usually called the classical assumptions:

A1: y_i = α_0 + β_0 x_i + ε_i, i = 1, ..., n.
A2: x_i are nonstochastic and not all constant.
A3: E(ε_i) = 0.
A4: E(ε_i²) = σ_0², and E(ε_i ε_j) = 0 for all i ≠ j.
A5: ε_i are i.i.d. N(0, σ_0²).

Hypothesis Testing

To perform hypothesis testing, we now assume that assumption A5 (ε_i are i.i.d. N(0, σ_0²)) holds. As assumption A5 implies assumptions A3 and A4, the previous results remain valid. From assumption A5 we have:

1. The y_i are independent N(α_0 + β_0 x_i, σ_0²).
2. α̂_n ~ N(α_0, σ²_{α̂_n}) and β̂_n ~ N(β_0, σ²_{β̂_n}).
3. (α̂_n − α_0)/σ_{α̂_n} ~ N(0, 1) and (β̂_n − β_0)/σ_{β̂_n} ~ N(0, 1).

4. Σ_{i=1}^{n} e_i²/σ_0² = (n − 2) σ̂_n²/σ_0² ~ χ²_{n−2}. Also, α̂_n and β̂_n are independent of σ̂_n².

Proof: As e_i = −(β̂_n − β_0)(x_i − x̄_n) + (ε_i − ε̄_n), we have

Σ_{i=1}^{n} e_i² = Σ_{i=1}^{n} (ε_i − ε̄_n)² + (β̂_n − β_0)² Σ_{i=1}^{n} (x_i − x̄_n)² − 2(β̂_n − β_0) Σ_{i=1}^{n} (x_i − x̄_n)(ε_i − ε̄_n). (12)

For the first term in (12), we know

Σ_{i=1}^{n} (ε_i − ε̄_n)² = Σ_{i=1}^{n} [(ε_i − E(ε_i)) − (ε̄_n − E(ε_i))]²
  = Σ_{i=1}^{n} [ε_i − E(ε_i)]² + n[ε̄_n − E(ε_i)]² − 2(ε̄_n − E(ε_i)) (Σ_{i=1}^{n} ε_i − n E(ε_i))
  = Σ_{i=1}^{n} [ε_i − E(ε_i)]² + n[ε̄_n − E(ε_i)]² − 2(ε̄_n − E(ε_i)) (n ε̄_n − n E(ε_i))
  = Σ_{i=1}^{n} [ε_i − E(ε_i)]² − n[ε̄_n − E(ε_i)]².

Thus,

Σ_{i=1}^{n} (ε_i − ε̄_n)²/σ_0² = Σ_{i=1}^{n} [(ε_i − E(ε_i))/σ_0]² − [(ε̄_n − E(ε_i))/(σ_0/√n)]² ~ χ²(n) − χ²(1) = χ²(n − 1).

Next, for the second term in (12),

(β̂_n − β_0)² Σ_{i=1}^{n} (x_i − x̄_n)²/σ_0² = [(β̂_n − β_0) / √(σ_0²/Σ_{i=1}^{n} (x_i − x̄_n)²)]² ~ χ²(1).

Finally, for the last term in (12), as β̂_n − β_0 = Σ_{i=1}^{n} k_i ε_i and Σ_{i=1}^{n} (x_i − x̄_n) ε̄_n = 0,

2(β̂_n − β_0) Σ_{i=1}^{n} (x_i − x̄_n)(ε_i − ε̄_n)/σ_0²
  = 2(β̂_n − β_0) Σ_{i=1}^{n} (x_i − x̄_n) ε_i/σ_0²
  = 2(β̂_n − β_0) [Σ_{i=1}^{n} (x_i − x̄_n)²] (β̂_n − β_0)/σ_0²
  = 2 [(β̂_n − β_0) / √(σ_0²/Σ_{i=1}^{n} (x_i − x̄_n)²)]²
  ~ 2 χ²(1).

Therefore,

(n − 2) σ̂_n²/σ_0² = Σ_{i=1}^{n} e_i²/σ_0² ~ χ²(n − 1) + χ²(1) − 2χ²(1) = χ²(n − 2).

5. (α̂_n − α_0)/s_{α̂_n} ~ t_{n−2} and (β̂_n − β_0)/s_{β̂_n} ~ t_{n−2}.

Proof:

(α̂_n − α_0)/s_{α̂_n} = (α̂_n − α_0) / √(σ̂_n² [1/n + x̄_n²/Σ_{i=1}^{n} (x_i − x̄_n)²])
  = {(α̂_n − α_0) / √(σ_0² [1/n + x̄_n²/Σ_{i=1}^{n} (x_i − x̄_n)²])} / √{[(n − 2) σ̂_n²/σ_0²]/(n − 2)}
  ~ N(0, 1) / √(χ²(n − 2)/(n − 2)) = t(n − 2).

Similarly,

(β̂_n − β_0)/s_{β̂_n} = (β̂_n − β_0) / √(σ̂_n²/Σ_{i=1}^{n} (x_i − x̄_n)²)
  = {(β̂_n − β_0) / √(σ_0²/Σ_{i=1}^{n} (x_i − x̄_n)²)} / √{[(n − 2) σ̂_n²/σ_0²]/(n − 2)}
  ~ N(0, 1) / √(χ²(n − 2)/(n − 2)) = t(n − 2).

To test the null hypothesis H_0: α_0 = a or H_0: β_0 = b we can use t-tests.

1. One-sided test, H_a: β_0 > b. Under the null hypothesis, τ_{β̂_n} = (β̂_n − b)/s_{β̂_n} ~ t_{n−2}. Given the significance level γ and degrees of freedom n − 2, the critical value c_{γ,n−2} can be found in a t-table, and we reject H_0 if τ_{β̂_n} > c_{γ,n−2}. Similarly, we can test against H_a: α_0 > a by checking whether τ_{α̂_n} = (α̂_n − a)/s_{α̂_n} > c_{γ,n−2}. For H_a: β_0 < b, reject H_0 if τ_{β̂_n} < −c_{γ,n−2}.

2. Two-sided test: For H_a: β_0 ≠ b, reject H_0 if τ_{β̂_n} > c_{γ/2,n−2} or τ_{β̂_n} < −c_{γ/2,n−2}. For H_a: α_0 ≠ a, reject H_0 if τ_{α̂_n} > c_{γ/2,n−2} or τ_{α̂_n} < −c_{γ/2,n−2}.

3. The (1 − γ) confidence intervals for β_0 and α_0 are

(β̂_n − s_{β̂_n} c_{γ/2,n−2}, β̂_n + s_{β̂_n} c_{γ/2,n−2}),   (α̂_n − s_{α̂_n} c_{γ/2,n−2}, α̂_n + s_{α̂_n} c_{γ/2,n−2}).
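A short sketch of these tests and intervals in Python; the simulated sample is illustrative, and the critical value below is only an approximation for t(48) at the 5% two-sided level (in practice it would be taken from a t-table or computed with scipy.stats.t.ppf).

```python
import numpy as np

def simple_ols_inference(x, y, b=0.0):
    """t statistic for H0: beta_0 = b and standard errors in the simple regression."""
    n = len(x)
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    beta_hat = np.sum((y - y.mean()) * (x - x_bar)) / sxx
    alpha_hat = y.mean() - beta_hat * x_bar
    e = y - alpha_hat - beta_hat * x
    sigma2_hat = np.sum(e ** 2) / (n - 2)              # unbiased estimator of sigma_0^2
    s_beta = np.sqrt(sigma2_hat / sxx)                 # standard error of beta_hat
    s_alpha = np.sqrt(sigma2_hat * (1.0 / n + x_bar ** 2 / sxx))
    tau_beta = (beta_hat - b) / s_beta                 # ~ t(n-2) under H0
    return beta_hat, s_beta, tau_beta, alpha_hat, s_alpha

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(0, 1, size=50)          # illustrative sample

beta_hat, s_beta, tau, *_ = simple_ols_inference(x, y, b=0.0)
crit = 2.01                                            # approx. c_{0.025, 48}
reject = abs(tau) > crit                               # two-sided test at the 5% level
ci = (beta_hat - crit * s_beta, beta_hat + crit * s_beta)
print(tau, reject, ci)
```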

Prediction

Based on the regression line estimated with n observations, we can predict ŷ_{n+1} = α̂_n + β̂_n x_{n+1}, provided that the new information x_{n+1} is available. Observe that the prediction error has mean zero,

E(ŷ_{n+1} − y_{n+1}) = E[(α̂_n + β̂_n x_{n+1}) − (α_0 + β_0 x_{n+1} + ε_{n+1})] = 0,

and variance

E(ŷ_{n+1} − y_{n+1})² = E[(α̂_n − α_0) + (β̂_n − β_0) x_{n+1} − ε_{n+1}]²
  = var(α̂_n) + x²_{n+1} var(β̂_n) + σ_0² + 2 x_{n+1} cov(α̂_n, β̂_n)
  = σ_0² (1 + 1/n + (x_{n+1} − x̄_n)² / Σ_{i=1}^{n} (x_i − x̄_n)²).

Hence, we have a better prediction if x_{n+1} is close to x̄_n.

4.2 Multiple Linear Regression

More generally, we may postulate a linear model with k explanatory variables to represent the identification equation of y:

y = β_10 x_1 + β_20 x_2 + ... + β_k0 x_k + e.

Given a sample of observations, this specification can also be expressed as the identification condition

y = Xβ_0 + e, (13)

where β_0 = (β_10, β_20, ..., β_k0)' is the vector of unknown parameters, and y and X contain all the observations of the dependent and explanatory variables, i.e., y = (y_1, y_2, ..., y_T)' is T × 1 and

X = [ x_11  x_12 ... x_1k ]
    [ x_21  x_22 ... x_2k ]
    [   .     .        .  ]
    [ x_T1  x_T2 ... x_Tk ]

is T × k, where each column vector of X contains the observations on one explanatory variable. The basic identifiability requirement of this specification is that the number of regressors, k, is strictly less than the number of observations, T, and that the matrix X is of full column rank k; that is, the model does not contain any redundant regressor. It is also typical to set the first explanatory variable to the constant one, so that the first column vector of X is a T × 1 vector of ones, l. To summarize, the identification condition is

ID 1: y_t = β_10 + β_20 x_t2 + β_30 x_t3 + ... + β_k0 x_tk + e_t, t = 1, ..., T.

Our objective now is to find a k-dimensional regression hyperplane that best fits the data (y, X). In the light of Section 4.1, we minimize the average of the sum of squared errors:

Q(β) := (1/T) (y − Xβ)'(y − Xβ). (14)

The first order conditions for the OLS minimization problem are

∇_β Q(β) = ∇_β (y'y − 2y'Xβ + β'X'Xβ)/T = −2X'(y − Xβ)/T = 0;

the last equality can also be written as

X'Xβ = X'y,

which is known as the system of normal equations. To have a unique solution of this system for β, (X'X)^{-1} has to exist. This is the first assumption that has to be satisfied for β to be determined uniquely:

[A2] The T × k data matrix X is of full column rank.

Given that X is of full column rank, X'X is p.d. and hence invertible. The solution to the normal equations can then be expressed as

β̂_T = (X'X)^{-1} X'y. (15)

It is easy to see that the second order condition is also satisfied because ∇²_β Q(β) = 2(X'X)/T is p.d. Hence, β̂_T is the minimizer of the OLS criterion function and is known as the OLS estimator of β. As the matrix inverse is unique, the OLS estimator is also unique. The vector of OLS fitted values is ŷ = Xβ̂_T, and the vector of OLS residuals is ê = y − ŷ. By the normal equations, X'ê = 0, so that ŷ'ê = 0. When the first regressor is the constant one, X'ê = 0 implies that l'ê = Σ_t ê_t = 0. It follows that Σ_t y_t = Σ_t ŷ_t, and the sample average of the data y_t is the same as the sample average of the fitted values ŷ_t.
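A minimal NumPy sketch of the matrix formula (15); the data-generating values below are illustrative assumptions, not from the text. Solving the normal equations with np.linalg.solve avoids forming the explicit inverse.

```python
import numpy as np

def ols(X, y):
    """OLS estimator beta_hat = (X'X)^{-1} X'y, fitted values, and residuals (eq. (15))."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solves the normal equations X'X b = X'y
    fitted = X @ beta_hat
    resid = y - fitted
    return beta_hat, fitted, resid

# Hypothetical data: T = 200 observations, a constant and two regressors.
rng = np.random.default_rng(5)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.uniform(size=T)])
beta0 = np.array([1.0, 0.5, -2.0])
y = X @ beta0 + rng.normal(size=T)

b_hat, y_hat, e_hat = ols(X, y)
print(b_hat)
print(X.T @ e_hat)               # ~ 0: the normal equations imply X'e_hat = 0
print(y.mean(), y_hat.mean())    # equal when a constant is included
```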

If X is not of full column rank, then its column vectors satisfy an exact linear relationship; this is known as the problem of exact multicollinearity. In this case, without loss of generality we can write x_1 = γ_2 x_2 + ... + γ_k x_k, where x_i is the i-th column of X and γ_2, ..., γ_k are not all zero. Then, for any number a ≠ 0,

β_1 x_1 = (1 − a)β_1 x_1 + aβ_1(γ_2 x_2 + ... + γ_k x_k).

The linear specification (13) is thus observationally equivalent to

Xβ := (1 − a)β_1 x_1 + (β_2 + aβ_1 γ_2)x_2 + ... + (β_k + aβ_1 γ_k)x_k,

where the elements of β vary with a and therefore could be anything. That is, the parameter vector β is not identified when exact multicollinearity is present.

Practically, when X is not of full column rank, X'X is not invertible, and there are infinitely many solutions to the normal equations X'Xβ = X'y. Consequently, the OLS estimator β̂_T cannot be computed as in (15). Exact multicollinearity usually arises from an inappropriate model specification. For example, including total income, total wage income, and total non-wage income as regressors results in exact multicollinearity because total income is, by definition, the sum of wage and non-wage income.

It is also easy to verify that the magnitudes of the coefficient estimates β̂_i are affected by the measurement units of the variables. Thus, a larger coefficient estimate does not necessarily imply that the associated explanatory variable is more important in explaining the behavior of y. In fact, the coefficient estimates are not comparable in general.

Remark: The OLS estimator is derived without resorting to knowledge of the true relationship between y and X. That is, whether y is indeed generated according to our linear specification is irrelevant to the computation of the OLS estimator; it does, however, affect the properties of the OLS estimator.

4.3 Geometric Interpretations

The OLS estimation result has nice geometric interpretations. The vector of OLS fitted values can be written as

ŷ = X(X'X)^{-1}X'y = P_X y,

where, here and in what follows, P_X = X(X'X)^{-1}X' is an orthogonal projection matrix. Hence, ŷ is the orthogonal projection of y onto span(X). The OLS residual vector is thus

ê = y − ŷ = (I − P_X)y,

which is the orthogonal projection of y onto the orthogonal complement of span(X) and is orthogonal to ŷ and X. Consequently, ŷ is the best approximation of y given the information contained in X. Figure 2 illustrates the simple case where the model contains only two explanatory variables.

Let X = [X_1 X_2], where X_1 is T × k_1 and X_2 is T × k_2, with k_1 + k_2 = k. We can write y = X_1β_1 + X_2β_2 + random error, and β̂_T = (β̂_1', β̂_2')'. Let P_{X_1} = X_1(X_1'X_1)^{-1}X_1' and P_{X_2} = X_2(X_2'X_2)^{-1}X_2' denote the orthogonal projection matrices onto span(X_1) and span(X_2), respectively. We have the following result.

[Figure 2: The orthogonal projection of y onto span(x_1, x_2): P_X y = x_1 β̂_1 + x_2 β̂_2 and ê = (I − P_X)y.]

Theorem 4.1 (Frisch-Waugh-Lovell) Given a vector y, (I − P_{X_2})y and (I − P_{X_1})y can each be uniquely decomposed into two orthogonal components:

(I − P_{X_2})y = (I − P_{X_2})X_1 β̂_1 + (I − P_X)y,
(I − P_{X_1})y = (I − P_{X_1})X_2 β̂_2 + (I − P_X)y.

Proof: As span(X_2) ⊆ span(X), we have P_X P_{X_2} = P_{X_2} P_X = P_{X_2}, and hence (I − P_{X_2})(I − P_X) = I − P_X. Therefore,

(I − P_{X_2})y = (I − P_{X_2})P_X y + (I − P_{X_2})(I − P_X)y
              = (I − P_{X_2})X_1 β̂_1 + (I − P_{X_2})X_2 β̂_2 + (I − P_X)y
              = (I − P_{X_2})X_1 β̂_1 + (I − P_X)y,

and these two components are orthogonal because

y'P_X(I − P_{X_2})(I − P_X)y = y'P_X(I − P_X)y = 0.

The second assertion follows similarly.

An implication of Theorem 4.1 is that (I − P_{X_2})X_1 β̂_1 = (I − P_{X_2})P_X y is the orthogonal projection of (I − P_{X_2})y onto span((I − P_{X_2})X_1). Thus, we can write

β̂_1 = [X_1'(I − P_{X_2})X_1]^{-1} X_1'(I − P_{X_2})y,

as can be verified directly from (15) using the partitioned matrix inversion formula. That is, β̂_1 can also be obtained by regressing (I − P_{X_2})y on (I − P_{X_2})X_1, where (I − P_{X_2})y and (I − P_{X_2})X_1 are, respectively, the residual vectors from two purging regressions: y on X_2 and X_1 on X_2. Moreover, the residual vector from regressing (I − P_{X_2})y on (I − P_{X_2})X_1 is the same as the residual vector from regressing y on X. Similarly,

β̂_2 = [X_2'(I − P_{X_1})X_2]^{-1} X_2'(I − P_{X_1})y

can be obtained by regressing (I − P_{X_1})y on (I − P_{X_1})X_2. Note that β̂_1 is not the same as the OLS estimator from regressing y on X_1 alone, and that β̂_2 is not the same as the OLS estimator from regressing y on X_2 alone, except when X_1 is orthogonal to X_2.

From Theorem 4.1 we can rewrite (I − P_{X_2})y = (I − P_{X_2})P_X y + (I − P_X)y as P_{X_2}y = P_{X_2}P_X y. Thus, a second implication of Theorem 4.1 is that projecting y directly onto span(X_2) is equivalent to iterated projections of y, first onto span(X) and then onto span(X_2). Similarly, P_{X_1}y = P_{X_1}P_X y. For an illustration of Theorem 4.1 see Figure 3; see also Davidson & MacKinnon (1993) for more details.

As an application, consider the model with X = [X_1 X_2], where X_1 contains the constant term and a time trend variable t, and X_2 includes the k_2 other explanatory variables. Then, the OLS estimates of the coefficients of X_2 are the same as those obtained by regressing detrended y on detrended X_2, where detrended y and X_2 are obtained by regressing y and X_2 on X_1, respectively.
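A numerical check of the Frisch-Waugh-Lovell result is straightforward; the data below are a hypothetical illustration, and np.linalg.lstsq is used simply as an OLS solver.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 300
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])   # k1 = 2 columns
X2 = rng.normal(size=(T, 2))                              # k2 = 2 columns
X = np.hstack([X1, X2])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=T)

def residual_maker(Z):
    """Apply I - P_Z to a vector or matrix: v -> v - Z (Z'Z)^{-1} Z'v."""
    return lambda v: v - Z @ np.linalg.solve(Z.T @ Z, Z.T @ v)

M2 = residual_maker(X2)
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]                 # regress y on [X1 X2]
beta_fwl = np.linalg.lstsq(M2(X1), M2(y), rcond=None)[0]         # purged regression

print(beta_full[:2])   # coefficients on X1 from the full regression
print(beta_fwl)        # identical (up to rounding) by Theorem 4.1
```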

[Figure 3: An illustration of the Frisch-Waugh-Lovell Theorem.]

4.4 Measures of Goodness of Fit

We have learned that for a given linear specification, the OLS method yields the best fit of the data. In practice, one may postulate different linear models with different regressors and try to choose a particular one among them. It is therefore of interest to compare the performance across models. In this section we discuss how to measure the goodness of fit of models.

A natural goodness-of-fit measure is the regression variance σ̂² = ê'ê/(T − k). This measure, however, is not invariant with respect to the measurement units of the dependent variable. Instead, the following relative measures of goodness of fit are adopted in linear regression analysis. Recall that

Σ_t y_t² (TSS) = Σ_t ŷ_t² (RSS) + Σ_t ê_t² (ESS),

where TSS, RSS, and ESS denote the total, regression, and error sums of squares, respectively. The non-centered coefficient of determination (or non-centered R²) is defined as the proportion of TSS that can be explained by the regression hyperplane:

R² = RSS/TSS = 1 − ESS/TSS. (16)

Clearly, 0 ≤ R² ≤ 1, and the larger the R², the better the model fits the data. In particular, a model has a perfect fit if R² = 1, and it does not account for any variation of y if R² = 0.

Note that R² is non-decreasing in the number of variables in the model; that is, adding more variables to a model will not reduce its R². As ŷ'ŷ = ŷ'y, we can also write

R² = ŷ'ŷ/(y'y) = (ŷ'y)²/[(y'y)(ŷ'ŷ)] = cos²θ,

where θ is the angle between y and ŷ. That is, R² is a measure of the linear association between these two vectors. It is also easily verified that, when the model contains a constant term,

Σ_t (y_t − ȳ)² (Centered TSS) = Σ_t (ŷ_t − ȳ)² (Centered RSS) + Σ_t ê_t² (ESS),

where the sample average of the fitted values equals ȳ = Σ_t y_t/T. Analogous to (16), the centered coefficient of determination (or centered R²) is defined as

Centered R² = Centered RSS/Centered TSS = 1 − ESS/Centered TSS. (17)

This measure also takes values between 0 and 1 and is non-decreasing in the number of variables in the model. In contrast with the non-centered R², this measure excludes the effect of the constant term and is hence invariant with respect to the addition of a constant. If the model does not contain a constant term, the centered R² may be negative. As Σ_t (y_t − ȳ)(ŷ_t − ȳ) = Σ_t (ŷ_t − ȳ)², we immediately get

Centered R² = Σ_t (ŷ_t − ȳ)²/Σ_t (y_t − ȳ)² = [Σ_t (y_t − ȳ)(ŷ_t − ȳ)]² / {[Σ_t (y_t − ȳ)²][Σ_t (ŷ_t − ȳ)²]}.

That is, the centered R² is also the squared sample correlation coefficient of y_t and ŷ_t.

If R² were the only criterion for determining model adequacy, one would tend to select the model with more explanatory variables. The adjusted R², R̄², is the centered R² adjusted for the degrees of freedom:

R̄² = 1 − [ê'ê/(T − k)] / [(y'y − Tȳ²)/(T − 1)].

It can also be shown that

R̄² = 1 − [(T − 1)/(T − k)](1 − R²) = R² − [(k − 1)/(T − k)](1 − R²).

That is, R̄² is the centered R² with a penalty term depending on model complexity and explanatory ability. Clearly, R̄² < R² except when k = 1 or R² = 1. Note also that R̄² need not increase with the number of explanatory variables; in fact, R̄² < 0 when R² < (k − 1)/(T − 1).

Remark: Models for different dependent variables are not comparable in terms of their R² because their total variations (i.e., TSS) are different. For example, the R² of a model for y and that of a model for log y are not comparable.
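The three measures are easy to compute directly from the fitted values; the example data below are illustrative assumptions.

```python
import numpy as np

def goodness_of_fit(y, y_hat, k):
    """Non-centered R^2 (16), centered R^2 (17), and adjusted R^2 for a fitted linear model."""
    T = len(y)
    ess = np.sum((y - y_hat) ** 2)
    r2_noncentered = 1.0 - ess / np.sum(y ** 2)
    centered_tss = np.sum((y - y.mean()) ** 2)
    r2_centered = 1.0 - ess / centered_tss
    r2_adjusted = 1.0 - (ess / (T - k)) / (centered_tss / (T - 1))
    return r2_noncentered, r2_centered, r2_adjusted

# Hypothetical example: T = 100 observations, k = 3 regressors including the constant.
rng = np.random.default_rng(7)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 0.8, -0.5]) + rng.normal(size=T)
y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]

print(goodness_of_fit(y, y_hat, k))
```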

5 Properties of the OLS Estimators

5.1 Bias

Proposition 5.1 If y = Xβ_0 + e, then β̂_T − β_0 = (X'X)^{-1}X'e.

Proof: Since y = Xβ_0 + e,

β̂_T = (X'X)^{-1}X'y = (X'X)^{-1}X'(Xβ_0 + e) = (X'X)^{-1}X'Xβ_0 + (X'X)^{-1}X'e = β_0 + (X'X)^{-1}X'e.

For β̂_T to be unbiased for β_0, the following assumption has to be imposed:

[A3] E(e|X) = 0.

Proposition 5.2 Given ID 1, [A2], and [A3], E(β̂_T − β_0|X) = 0 and E(β̂_T) = β_0.

Proof: By the previous result,

E[(β̂_T − β_0)|X] = E[(X'X)^{-1}X'e|X] = (X'X)^{-1}X'E(e|X) = 0,

so E(β̂_T|X) = β_0. Applying the law of iterated expectations, E(β̂_T) = E[E(β̂_T|X)] = E(β_0) = β_0.

Thus β̂_T is unbiased for β_0. Indeed, it is conditionally unbiased (conditional upon X), which is a stronger result.

5.2 Variance-Covariance Matrix of the Regression Error

The conditional variance-covariance matrix of the regression error vector e is D = var(e|X) = E(ee'|X), the T × T matrix whose (t, s) element is E(e_t e_s|X). When the data are a random sample, (x_t, e_t) is independent of (x_s, e_s) for t ≠ s; thus

E(e_t²|X) = E(e_t²|x_t) = σ_t²,
E(e_t e_s|X) = E(e_t e_s|x_t, x_s) = E(e_t|x_t) E(e_s|x_s) = 0 for t ≠ s.

Thus, in general, when the data are a random sample,

D = var(e|X) = diag(σ_1², σ_2², ..., σ_T²). (18)

Under the homoskedasticity restriction (5), E(e_t²|x_t) = σ_0² for all t, so

D = σ_0² I_T, (19)

which is the classical assumption for the linear regression model, that is,

[A4] var(e|X) = σ_0² I_T.

5.3 Variance-Covariance Matrix of the OLS Estimator

The conditional variance-covariance matrix of β̂_T is

V = E[(β̂_T − β_0)(β̂_T − β_0)'|X].

Since β̂_T − β_0 = (X'X)^{-1}X'e,

V = E[(X'X)^{-1}X'ee'X(X'X)^{-1}|X] = (X'X)^{-1}X'E(ee'|X)X(X'X)^{-1} = (X'X)^{-1}X'DX(X'X)^{-1},

where D is defined in (18). It may be helpful to observe that X'DX = Σ_t x_t x_t' σ_t². In the special case of (5), σ_t² = σ_0², D = σ_0² I, and X'DX = σ_0² X'X, so V simplifies to

V = (X'X)^{-1}X'X(X'X)^{-1} σ_0² = σ_0²(X'X)^{-1}.

Theorem 5.1 In the linear regression model,

V = (X'X)^{-1}X'DX(X'X)^{-1}. (20)

If (5), E(e_t²|x_t) = σ_0², holds,

V = σ_0²(X'X)^{-1}. (21)

The expression V = (X'X)^{-1}X'DX(X'X)^{-1} is often called a sandwich formula, because the central variance matrix X'DX is sandwiched between the moment matrices (X'X)^{-1}.
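The sketch below computes both (21) and a feasible version of the sandwich form (20). The text leaves D unknown; replacing each σ_t² with the squared OLS residual gives a White-type (heteroskedasticity-robust) estimate, which is introduced here only as an illustration, not as the estimator defined in the text.

```python
import numpy as np

def ols_covariances(X, y):
    """Homoskedastic covariance (21) and a White-type sandwich estimate of (20)."""
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e = y - X @ beta_hat
    sigma2_hat = e @ e / (T - k)
    V_homo = sigma2_hat * XtX_inv                    # eq. (21)
    XDX_hat = (X * (e ** 2)[:, None]).T @ X          # sum_t x_t x_t' e_t^2 (estimates X'DX)
    V_sandwich = XtX_inv @ XDX_hat @ XtX_inv         # eq. (20) with D replaced by residuals
    return V_homo, V_sandwich

# Illustrative heteroskedastic design: error variance depends on the regressor.
rng = np.random.default_rng(8)
T = 500
X = np.column_stack([np.ones(T), rng.normal(size=T)])
e = rng.normal(size=T) * (0.5 + np.abs(X[:, 1]))
y = X @ np.array([1.0, 2.0]) + e

V_h, V_s = ols_covariances(X, y)
print(np.sqrt(np.diag(V_h)))   # conventional standard errors
print(np.sqrt(np.diag(V_s)))   # heteroskedasticity-robust standard errors
```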

5.4 Gauss-Markov Theorem

Theorem 5.2 (Gauss-Markov) In the (homoskedastic) linear regression model, β̂_T is the best linear unbiased estimator of β_0.

Proof: Consider an arbitrary linear estimator β̌_T = Ay = [(X'X)^{-1}X' + C]y, where C is an arbitrary non-zero matrix. Since

β̌_T = [(X'X)^{-1}X' + C](Xβ_0 + e) = β_0 + CXβ_0 + (X'X)^{-1}X'e + Ce

and E[β̌_T|X] = β_0 + CXβ_0, the estimator β̌_T is unbiased if and only if CX = 0. It follows that, when β̌_T is unbiased,

var(β̌_T|X) = E[(β̌_T − β_0)(β̌_T − β_0)'|X]
            = E{[(X'X)^{-1}X'e + Ce][(X'X)^{-1}X'e + Ce]'|X}
            = (X'X)^{-1}X'(σ_0² I)X(X'X)^{-1} + σ_0²(X'X)^{-1}X'C' + σ_0² CX(X'X)^{-1} + σ_0² CC'
            = σ_0²(X'X)^{-1} + σ_0² CC',

where the cross terms vanish because CX = 0 (hence X'C' = 0). The first term on the right-hand side is var(β̂_T|X) and the second term is clearly p.s.d. Thus, for any linear unbiased estimator β̌_T, var(β̌_T|X) − var(β̂_T|X) is a p.s.d. matrix.

5.5 OLS Estimation of the Error Variance

Under the restriction (5), E(e_t²|x_t) = σ_0², the error variance σ_0² is another parameter to be estimated. The OLS estimator of σ_0² is

σ̂_T² = ê'ê/(T − k) = (1/(T − k)) Σ_t ê_t²,

where k is the number of regressors. It is clear that σ̂_T² is not linear in y.

Theorem 5.3 In the homoskedastic regression model, σ̂_T² is an unbiased estimator of σ_0².

Proof: Recall that I − P_X projects onto the orthogonal complement of span(X). Then

ê = (I − P_X)y = (I − P_X)(Xβ_0 + e) = (I − P_X)e.

Hence

E(ê'ê|X) = E[e'(I − P_X)e|X] = E[trace(ee'(I − P_X))|X].

As the trace and expectation operators can be interchanged, we have

E(ê'ê|X) = trace(E[ee'(I − P_X)|X]) = trace[D(I − P_X)].

By the facts that trace(I − P_X) = rank(I − P_X) = T − k and D = σ_0² I, it follows that E(ê'ê) = (T − k)σ_0² and that E(σ̂_T²) = E(ê'ê)/(T − k) = σ_0², proving the unbiasedness of σ̂_T².

The OLS estimate of the variance-covariance matrix of β̂_T in the homoskedastic regression model is then

v̂ar(β̂_T) = σ̂_T²(X'X)^{-1},

which is unbiased for var(β̂_T) = σ_0²(X'X)^{-1} provided σ̂_T² is unbiased for σ_0².
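A small Monte Carlo experiment (purely illustrative; the design values are assumptions) makes the unbiasedness of ê'ê/(T − k) concrete, and also previews the bias of the divisor-T estimator discussed in the next subsection.

```python
import numpy as np

rng = np.random.default_rng(9)
T, k, sigma0_sq = 30, 3, 4.0
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
beta0 = np.array([1.0, -0.5, 2.0])

draws_unbiased, draws_mle = [], []
for _ in range(20_000):
    y = X @ beta0 + rng.normal(scale=np.sqrt(sigma0_sq), size=T)
    e_hat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = e_hat @ e_hat
    draws_unbiased.append(ssr / (T - k))   # sigma_hat^2, divisor T - k
    draws_mle.append(ssr / T)              # divisor T (the Gaussian MLE)

print(np.mean(draws_unbiased))   # ~ 4.0 = sigma_0^2
print(np.mean(draws_mle))        # ~ 4.0 * (T - k) / T = 3.6, i.e. biased downward
```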

5.6 Gaussian Quasi-MLE and MVUE

In the normal regression model, e_t|x_t ~ N(0, σ²), and the likelihood of a single observation is

L_t(β, σ²) = (2πσ²)^{-1/2} exp(−(y_t − x_t'β)²/(2σ²)).

Then the log-likelihood function for the full sample y^T = (y_1, ..., y_T)' is

L_T(y^T; β, σ²) = Σ_t log L_t(y_t; β, σ²) = −(T/2) log(2π) − (T/2) log(σ²) − (1/(2σ²))(y − Xβ)'(y − Xβ).

The MLEs (β̃_T, σ̃_T²) maximize L_T. The first order conditions of the maximization problem are

∂L_T/∂β = X'(y − Xβ)/σ² = 0,
∂L_T/∂σ² = −T/(2σ²) + (y − Xβ)'(y − Xβ)/(2σ⁴) = 0,

which yield the MLEs of β_0 and σ²:

β̃_T = (X'X)^{-1}X'y,
σ̃_T² = (y − Xβ̃_T)'(y − Xβ̃_T)/T = ê'ê/T.

Clearly, the MLE β̃_T is the same as the OLS estimator β̂_T, but the MLE σ̃_T² differs from σ̂_T². In fact, σ̃_T² is a biased estimator because E(σ̃_T²) = σ_0²(T − k)/T ≠ σ_0².

Theorem 5.4 (Minimum Variance Unbiased Estimator, MVUE) In normal regression models, the OLS estimators β̂_T and σ̂_T² are the minimum variance unbiased estimators (MVUE).

Consider a collection of independent random variables z^T = (z_1, ..., z_T), where z_t has density function f_t(z_t, θ) with θ an r × 1 vector of parameters. Let the joint log-likelihood function of z^T be L_T(z^T; θ) = log f_T(z^T; θ). Then the score function

s_T(z^T; θ) := ∇_θ log f_T(z^T; θ) = [1/f_T(z^T; θ)] ∇_θ f_T(z^T; θ)

is the r × 1 vector of first order derivatives of log f_T with respect to θ. Under regularity conditions, differentiation and integration can be interchanged. When the postulated density function f_T is the true density function of z^T, we have

E[s_T(z^T; θ)] = ∫ [∇_θ f_T(z^T; θ)/f_T(z^T; θ)] f_T(z^T; θ) dz^T = ∇_θ (∫ f_T(z^T; θ) dz^T) = 0.

That is, s_T(z^T; θ) has mean zero. The variance of s_T is Fisher's information matrix:

B_T(θ) := var[s_T(z^T; θ)] = E[s_T(z^T; θ) s_T(z^T; θ)'].

Consider the r × r Hessian matrix of second order derivatives of log f_T:

H_T(z^T; θ) := ∇²_θ log f_T(z^T; θ)
            = ∇_θ ([1/f_T(z^T; θ)] ∇_θ f_T(z^T; θ))
            = [1/f_T(z^T; θ)] ∇²_θ f_T(z^T; θ) − [1/f_T(z^T; θ)²] [∇_θ f_T(z^T; θ)][∇_θ f_T(z^T; θ)]'.
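The closed-form MLEs are easy to verify numerically: evaluating the full-sample Gaussian log-likelihood at (β̃_T, σ̃_T²) and at perturbed values shows that the former attains the larger value. The data below are an illustrative assumption.

```python
import numpy as np

def gaussian_loglik(beta, sigma2, X, y):
    """Full-sample Gaussian log-likelihood L_T(beta, sigma^2) of the normal regression model."""
    T = len(y)
    u = y - X @ beta
    return -0.5 * T * np.log(2 * np.pi) - 0.5 * T * np.log(sigma2) - (u @ u) / (2 * sigma2)

rng = np.random.default_rng(10)
T = 100
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=T)

# Closed-form MLEs: beta_tilde equals the OLS estimator; sigma2_tilde uses divisor T.
beta_tilde = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_tilde
sigma2_tilde = e @ e / T

# Any perturbation of the MLEs lowers the log-likelihood.
print(gaussian_loglik(beta_tilde, sigma2_tilde, X, y))
print(gaussian_loglik(beta_tilde + 0.1, sigma2_tilde, X, y))
print(gaussian_loglik(beta_tilde, sigma2_tilde * 1.5, X, y))
```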


More information

Applied Econometrics (QEM)

Applied Econometrics (QEM) Applied Econometrics (QEM) based on Prinicples of Econometrics Jakub Mućk Department of Quantitative Economics Jakub Mućk Applied Econometrics (QEM) Meeting #3 1 / 42 Outline 1 2 3 t-test P-value Linear

More information

Empirical Economic Research, Part II

Empirical Economic Research, Part II Based on the text book by Ramanathan: Introductory Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 7, 2011 Outline Introduction

More information

Lecture 3: Multiple Regression

Lecture 3: Multiple Regression Lecture 3: Multiple Regression R.G. Pierse 1 The General Linear Model Suppose that we have k explanatory variables Y i = β 1 + β X i + β 3 X 3i + + β k X ki + u i, i = 1,, n (1.1) or Y i = β j X ji + u

More information

Topics in Probability and Statistics

Topics in Probability and Statistics Topics in Probability and tatistics A Fundamental Construction uppose {, P } is a sample space (with probability P), and suppose X : R is a random variable. The distribution of X is the probability P X

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

Linear Models and Estimation by Least Squares

Linear Models and Estimation by Least Squares Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:

More information

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

4. Distributions of Functions of Random Variables

4. Distributions of Functions of Random Variables 4. Distributions of Functions of Random Variables Setup: Consider as given the joint distribution of X 1,..., X n (i.e. consider as given f X1,...,X n and F X1,...,X n ) Consider k functions g 1 : R n

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015 Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

General Linear Model: Statistical Inference

General Linear Model: Statistical Inference Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter 4), least

More information

Simple Linear Regression: The Model

Simple Linear Regression: The Model Simple Linear Regression: The Model task: quantifying the effect of change X in X on Y, with some constant β 1 : Y = β 1 X, linear relationship between X and Y, however, relationship subject to a random

More information

Empirical Market Microstructure Analysis (EMMA)

Empirical Market Microstructure Analysis (EMMA) Empirical Market Microstructure Analysis (EMMA) Lecture 3: Statistical Building Blocks and Econometric Basics Prof. Dr. Michael Stein michael.stein@vwl.uni-freiburg.de Albert-Ludwigs-University of Freiburg

More information

STAT 512 sp 2018 Summary Sheet

STAT 512 sp 2018 Summary Sheet STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}

More information

Linear Regression with 1 Regressor. Introduction to Econometrics Spring 2012 Ken Simons

Linear Regression with 1 Regressor. Introduction to Econometrics Spring 2012 Ken Simons Linear Regression with 1 Regressor Introduction to Econometrics Spring 2012 Ken Simons Linear Regression with 1 Regressor 1. The regression equation 2. Estimating the equation 3. Assumptions required for

More information

Multivariate Regression

Multivariate Regression Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the

More information

Heteroskedasticity and Autocorrelation

Heteroskedasticity and Autocorrelation Lesson 7 Heteroskedasticity and Autocorrelation Pilar González and Susan Orbe Dpt. Applied Economics III (Econometrics and Statistics) Pilar González and Susan Orbe OCW 2014 Lesson 7. Heteroskedasticity

More information

ECON 3150/4150, Spring term Lecture 6

ECON 3150/4150, Spring term Lecture 6 ECON 3150/4150, Spring term 2013. Lecture 6 Review of theoretical statistics for econometric modelling (II) Ragnar Nymoen University of Oslo 31 January 2013 1 / 25 References to Lecture 3 and 6 Lecture

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Practical Econometrics. for. Finance and Economics. (Econometrics 2)

Practical Econometrics. for. Finance and Economics. (Econometrics 2) Practical Econometrics for Finance and Economics (Econometrics 2) Seppo Pynnönen and Bernd Pape Department of Mathematics and Statistics, University of Vaasa 1. Introduction 1.1 Econometrics Econometrics

More information

Business Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM

Business Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM Subject Business Economics Paper No and Title Module No and Title Module Tag 8, Fundamentals of Econometrics 3, The gauss Markov theorem BSE_P8_M3 1 TABLE OF CONTENTS 1. INTRODUCTION 2. ASSUMPTIONS OF

More information

Econ 510 B. Brown Spring 2014 Final Exam Answers

Econ 510 B. Brown Spring 2014 Final Exam Answers Econ 510 B. Brown Spring 2014 Final Exam Answers Answer five of the following questions. You must answer question 7. The question are weighted equally. You have 2.5 hours. You may use a calculator. Brevity

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information

3 Multiple Linear Regression

3 Multiple Linear Regression 3 Multiple Linear Regression 3.1 The Model Essentially, all models are wrong, but some are useful. Quote by George E.P. Box. Models are supposed to be exact descriptions of the population, but that is

More information

We begin by thinking about population relationships.

We begin by thinking about population relationships. Conditional Expectation Function (CEF) We begin by thinking about population relationships. CEF Decomposition Theorem: Given some outcome Y i and some covariates X i there is always a decomposition where

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is

More information

1. The Multivariate Classical Linear Regression Model

1. The Multivariate Classical Linear Regression Model Business School, Brunel University MSc. EC550/5509 Modelling Financial Decisions and Markets/Introduction to Quantitative Methods Prof. Menelaos Karanasos (Room SS69, Tel. 08956584) Lecture Notes 5. The

More information

Econometrics I KS. Module 1: Bivariate Linear Regression. Alexander Ahammer. This version: March 12, 2018

Econometrics I KS. Module 1: Bivariate Linear Regression. Alexander Ahammer. This version: March 12, 2018 Econometrics I KS Module 1: Bivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: March 12, 2018 Alexander Ahammer (JKU) Module 1: Bivariate

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

The Multiple Regression Model Estimation

The Multiple Regression Model Estimation Lesson 5 The Multiple Regression Model Estimation Pilar González and Susan Orbe Dpt Applied Econometrics III (Econometrics and Statistics) Pilar González and Susan Orbe OCW 2014 Lesson 5 Regression model:

More information

Motivation for multiple regression

Motivation for multiple regression Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope

More information

Bias Variance Trade-off

Bias Variance Trade-off Bias Variance Trade-off The mean squared error of an estimator MSE(ˆθ) = E([ˆθ θ] 2 ) Can be re-expressed MSE(ˆθ) = Var(ˆθ) + (B(ˆθ) 2 ) MSE = VAR + BIAS 2 Proof MSE(ˆθ) = E((ˆθ θ) 2 ) = E(([ˆθ E(ˆθ)]

More information

HT Introduction. P(X i = x i ) = e λ λ x i

HT Introduction. P(X i = x i ) = e λ λ x i MODS STATISTICS Introduction. HT 2012 Simon Myers, Department of Statistics (and The Wellcome Trust Centre for Human Genetics) myers@stats.ox.ac.uk We will be concerned with the mathematical framework

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Random vectors X 1 X 2. Recall that a random vector X = is made up of, say, k. X k. random variables.

Random vectors X 1 X 2. Recall that a random vector X = is made up of, say, k. X k. random variables. Random vectors Recall that a random vector X = X X 2 is made up of, say, k random variables X k A random vector has a joint distribution, eg a density f(x), that gives probabilities P(X A) = f(x)dx Just

More information

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012 Problem Set #6: OLS Economics 835: Econometrics Fall 202 A preliminary result Suppose we have a random sample of size n on the scalar random variables (x, y) with finite means, variances, and covariance.

More information

In the bivariate regression model, the original parameterization is. Y i = β 1 + β 2 X2 + β 2 X2. + β 2 (X 2i X 2 ) + ε i (2)

In the bivariate regression model, the original parameterization is. Y i = β 1 + β 2 X2 + β 2 X2. + β 2 (X 2i X 2 ) + ε i (2) RNy, econ460 autumn 04 Lecture note Orthogonalization and re-parameterization 5..3 and 7.. in HN Orthogonalization of variables, for example X i and X means that variables that are correlated are made

More information

ECON The Simple Regression Model

ECON The Simple Regression Model ECON 351 - The Simple Regression Model Maggie Jones 1 / 41 The Simple Regression Model Our starting point will be the simple regression model where we look at the relationship between two variables In

More information

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models

More information

Notes on the Multivariate Normal and Related Topics

Notes on the Multivariate Normal and Related Topics Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions

More information

BIOS 2083 Linear Models c Abdus S. Wahed

BIOS 2083 Linear Models c Abdus S. Wahed Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter

More information

STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method.

STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. Rebecca Barter May 5, 2015 Linear Regression Review Linear Regression Review

More information

where x and ȳ are the sample means of x 1,, x n

where x and ȳ are the sample means of x 1,, x n y y Animal Studies of Side Effects Simple Linear Regression Basic Ideas In simple linear regression there is an approximately linear relation between two variables say y = pressure in the pancreas x =

More information

ECON 4160, Autumn term Lecture 1

ECON 4160, Autumn term Lecture 1 ECON 4160, Autumn term 2017. Lecture 1 a) Maximum Likelihood based inference. b) The bivariate normal model Ragnar Nymoen University of Oslo 24 August 2017 1 / 54 Principles of inference I Ordinary least

More information

Problem Selected Scores

Problem Selected Scores Statistics Ph.D. Qualifying Exam: Part II November 20, 2010 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. Problem 1 2 3 4 5 6 7 8 9 10 11 12 Selected

More information

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

The Statistical Property of Ordinary Least Squares

The Statistical Property of Ordinary Least Squares The Statistical Property of Ordinary Least Squares The linear equation, on which we apply the OLS is y t = X t β + u t Then, as we have derived, the OLS estimator is ˆβ = [ X T X] 1 X T y Then, substituting

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation. Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1

More information

So far our focus has been on estimation of the parameter vector β in the. y = Xβ + u

So far our focus has been on estimation of the parameter vector β in the. y = Xβ + u Interval estimation and hypothesis tests So far our focus has been on estimation of the parameter vector β in the linear model y i = β 1 x 1i + β 2 x 2i +... + β K x Ki + u i = x iβ + u i for i = 1, 2,...,

More information

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Multiple Regression Analysis. Part III. Multiple Regression Analysis Part III Multiple Regression Analysis As of Sep 26, 2017 1 Multiple Regression Analysis Estimation Matrix form Goodness-of-Fit R-square Adjusted R-square Expected values of the OLS estimators Irrelevant

More information

Econ 2120: Section 2

Econ 2120: Section 2 Econ 2120: Section 2 Part I - Linear Predictor Loose Ends Ashesh Rambachan Fall 2018 Outline Big Picture Matrix Version of the Linear Predictor and Least Squares Fit Linear Predictor Least Squares Omitted

More information

Econometrics Master in Business and Quantitative Methods

Econometrics Master in Business and Quantitative Methods Econometrics Master in Business and Quantitative Methods Helena Veiga Universidad Carlos III de Madrid Models with discrete dependent variables and applications of panel data methods in all fields of economics

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter

More information

Correlation and Regression

Correlation and Regression Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Regression. ECO 312 Fall 2013 Chris Sims. January 12, 2014

Regression. ECO 312 Fall 2013 Chris Sims. January 12, 2014 ECO 312 Fall 2013 Chris Sims Regression January 12, 2014 c 2014 by Christopher A. Sims. This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License What

More information

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16) Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16) 1 2 Model Consider a system of two regressions y 1 = β 1 y 2 + u 1 (1) y 2 = β 2 y 1 + u 2 (2) This is a simultaneous equation model

More information