Introductory Econometrics


Based on the textbook by Wooldridge, Introductory Econometrics: A Modern Approach. Robert M. Kunst (robert.kunst@univie.ac.at), University of Vienna and Institute for Advanced Studies, Vienna. November 23, 2013.

Outline
Introduction
Simple linear regression
Multiple linear regression
OLS in the multiple linear regression
Statistical properties of OLS
Inference in the multiple model
OLS asymptotics
Selection of regressors in the multiple model
Heteroskedasticity
Regressions with time-series observations
Asymptotics of OLS in time-series regression
Serial correlation in time-series regression
Instrumental variables estimation

OLS in the multiple linear regression: A multiple linear regression model with two regressors
The simplest multiple linear model is one in which a dependent variable y depends on two explanatory variables x_1 and x_2, for example wages on education and work experience: y = β_0 + β_1 x_1 + β_2 x_2 + u, where the slope β_1 measures the reaction of y to a marginal change in x_1 keeping x_2 fixed (ceteris paribus), i.e. ∂y/∂x_1. Often, regressors are closely related, and the ceteris paribus idea becomes problematic.

OLS in the multiple linear regression: The multiple linear regression model
In the general multiple linear regression model, y is regressed on k regressors, y = β_0 + β_1 x_1 + β_2 x_2 + ... + β_k x_k + u, with an intercept β_0 and k slope parameters (coefficients) β_j, 1 ≤ j ≤ k. Again, for the error term u, it will be assumed that E(u | x_1, ..., x_k) = 0. The multiple linear regression is the most important statistical model in econometrics. Note that "multiple" should not be replaced by "multivariate": multivariate regression lets a vector of variables y_1, ..., y_g depend on a vector of regressors. Here, y is just a scalar dependent variable.

OLS in the multiple linear regression: OLS in the multiple model
In order to generalize the idea of OLS estimation to the multiple model, one would minimize
Σ_{i=1}^n (y_i − β_0 − β_1 x_{i1} − ... − β_k x_{ik})²
in β_0, β_1, ..., β_k, and call the minimizing values ˆβ_0, ˆβ_1, ..., ˆβ_k.

OLS in the multiple linear regression
Formally, the solution can be obtained by taking derivatives and solving a system of first-order conditions
Σ_{i=1}^n (y_i − ˆβ_0 − ˆβ_1 x_{i1} − ... − ˆβ_k x_{ik}) = 0,
Σ_{i=1}^n x_{i1} (y_i − ˆβ_0 − ˆβ_1 x_{i1} − ... − ˆβ_k x_{ik}) = 0,
Σ_{i=1}^n x_{i2} (y_i − ˆβ_0 − ˆβ_1 x_{i1} − ... − ˆβ_k x_{ik}) = 0,
...
Σ_{i=1}^n x_{ik} (y_i − ˆβ_0 − ˆβ_1 x_{i1} − ... − ˆβ_k x_{ik}) = 0.
This system does not yield a nice closed form for the OLS coefficients unless matrix algebra is used.

OLS in the multiple linear regression: Interpreting the OLS first-order conditions
Just like in the simple regression model, the first-order conditions have a method-of-moments interpretation:
The condition for the intercept ˆβ_0 says that the sample mean of the OLS residuals is 0. This corresponds to the population moment condition that E(u) = 0;
Each condition for a slope coefficient ˆβ_j says that the sample correlation (or covariance) between the residuals and the regressor x_j is 0. This corresponds to the population condition that the regressors and errors are uncorrelated.

OLS in the multiple linear regression: Multiple linear regression in matrix form
Presume all values of the dependent variable y and of all regressors are collected in a vector y and an n × (k+1) matrix X:
y = (y_1, y_2, ..., y_n)′,
X = [1 x_{11} ... x_{1k}; 1 x_{21} ... x_{2k}; ... ; 1 x_{n1} ... x_{nk}].
In this notation, the OLS estimates ˆβ = (ˆβ_0, ˆβ_1, ..., ˆβ_k)′ can be written compactly as ˆβ = (X′X)^{−1} X′y.
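
As a numerical illustration of the matrix formula, the following Python sketch simulates a small data set and computes ˆβ = (X′X)^{−1} X′y directly (all variable names and parameter values are illustrative, not taken from the slides).
```python
# Minimal sketch of matrix OLS: beta_hat = (X'X)^{-1} X'y on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 2
x = rng.normal(size=(n, k))                   # two regressors
u = rng.normal(size=n)                        # error term
beta_true = np.array([1.0, 0.5, -0.3])        # beta_0, beta_1, beta_2 (illustrative)
y = beta_true[0] + x @ beta_true[1:] + u

X = np.column_stack([np.ones(n), x])          # n x (k+1) design matrix with intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # solves (X'X) b = X'y
print(beta_hat)                               # close to beta_true for moderate n
```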

OLS in the multiple linear regression: Fitted values and residuals
Just as in simple regression, OLS estimation decomposes observed y into an explained part ŷ (the fitted value) and an unexplained part or residual û:
y_i = ˆβ_0 + ˆβ_1 x_{i1} + ... + ˆβ_k x_{ik} + û_i = ŷ_i + û_i.
Because the sample mean of the residuals is 0, the sample mean of the fitted values is ȳ, and the averages lie on the regression hyperplane:
ȳ = ˆβ_0 + ˆβ_1 x̄_1 + ... + ˆβ_k x̄_k.

OLS in the multiple linear regression: Simple and multiple linear regression coefficients
In most cases, the estimate β̃_1 in a simple linear regression y = β̃_0 + β̃_1 x_1 + ũ differs from the estimate ˆβ_1 in a comparable multiple regression y = ˆβ_0 + ˆβ_1 x_1 + ... + ˆβ_k x_k + û. The coefficient estimates only coincide in special cases, such as cov(x_1, x_j) = 0 for all j ≠ 1 or ˆβ_j = 0 for all j ≠ 1. Note that cov(x_1, x_j) = 0 is not the typical case: regressors are usually correlated with other regressor variables.

OLS in the multiple linear regression: Simple and two-regressor regression: a property
Consider the simple regression y = β̃_0 + β̃_1 x_1 + ũ and the regression of y on x_1 and on an additional x_2:
y = ˆβ_0 + ˆβ_1 x_1 + ˆβ_2 x_2 + û.
It is easily shown that
β̃_1 = ˆβ_1 + ˆβ_2 ˆδ,
where ˆδ is the slope coefficient in a regression of x_2 on x_1. Clearly, β̃_1 = ˆβ_1 iff one of the two factors ˆβ_2 and ˆδ is 0. Note: ˆβ_1 is not necessarily better or more correct than β̃_1.
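
The identity β̃_1 = ˆβ_1 + ˆβ_2 ˆδ can be checked numerically; the Python sketch below uses simulated data (names and values are illustrative) and reproduces the relation exactly up to rounding.
```python
# Check of beta_tilde_1 = beta_hat_1 + beta_hat_2 * delta_hat on simulated data,
# where delta_hat is the slope from regressing x2 on x1.
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)            # x2 correlated with x1
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

c = np.ones(n)
beta_tilde = ols(np.column_stack([c, x1]), y)        # simple regression of y on x1
beta_hat = ols(np.column_stack([c, x1, x2]), y)      # multiple regression
delta_hat = ols(np.column_stack([c, x1]), x2)[1]     # slope of x2 on x1

print(beta_tilde[1], beta_hat[1] + beta_hat[2] * delta_hat)  # equal up to rounding
```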

OLS in the multiple linear regression: Goodness of fit in the multiple model
The variance decomposition equation
Σ_{i=1}^n (y_i − ȳ)² = Σ_{i=1}^n (ŷ_i − ȳ)² + Σ_{i=1}^n û_i²,
or SST = SSE + SSR, continues to hold in the multiple regression model. Likewise,
R² = SSE/SST = 1 − SSR/SST
defines a descriptive statistic in the interval [0, 1] that measures the goodness of fit. Note, however, that R² is not the squared correlation of y and any x_j but the maximum squared correlation coefficient of y and linear combinations of x_1, ..., x_k.
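
The decomposition SST = SSE + SSR and the resulting R² can be reproduced in a few lines; the sketch below again uses simulated, purely illustrative data.
```python
# Variance decomposition SST = SSE + SSR and R^2 for a fitted multiple regression.
import numpy as np

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_fit = X @ beta_hat
resid = y - y_fit

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y_fit - y.mean()) ** 2)
SSR = np.sum(resid ** 2)
print(np.isclose(SST, SSE + SSR))             # True: the decomposition holds
print(SSE / SST, 1 - SSR / SST)               # two equivalent expressions for R^2
```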

Statistical properties of OLS: Assumptions for multiple linear regression
In order to establish OLS properties such as unbiasedness etc., model assumptions have to be formulated. The first assumption is the natural counterpart to (SLR.1), the linearity in parameters:
MLR.1 The population model can be written y = β_0 + β_1 x_1 + β_2 x_2 + ... + β_k x_k + u, with unknown coefficient parameters β_1, ..., β_k, an intercept parameter β_0, and unobserved random error u.

Statistical properties of OLS: Assumption of random sampling
MLR.2 The data constitute a random sample of n observations {(x_{i1}, ..., x_{ik}, y_i) : i = 1, ..., n} of random variables corresponding to the population model (MLR.1).
Due to the random-sampling assumption (MLR.2), observations and also errors are independent for different i.

Statistical properties of OLS: No multicollinearity
In simple regression, OLS requires some variation in the regressor (it should not be entirely constant). In multiple regression, more is needed: the matrix X′X must be invertible.
MLR.3 There are no exact linear relationships connecting the regressor variables, and no regressor is constant in the sample or in the population.
Assumption (MLR.3) implies n > k. When it holds in the population, violation in the sample happens with probability 0 for continuous random variables. (MLR.3) is violated if a regressor is the sum or difference of other regressors. It is not violated by nonlinear identities, such as x_2 = x_1².
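
A small numerical sketch of (MLR.3), with simulated and purely illustrative data: an exact linear combination of regressors makes X′X singular, whereas a nonlinear transformation such as x_1² does not violate the assumption.
```python
# Exact collinearity (x3 = x1 + x2) makes X'X rank deficient; x1**2 does not.
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2                                   # exact linear relationship: MLR.3 violated
X_bad = np.column_stack([np.ones(n), x1, x2, x3])
X_ok = np.column_stack([np.ones(n), x1, x2, x1 ** 2])

print(np.linalg.matrix_rank(X_bad.T @ X_bad))  # 3, not 4: X'X singular
print(np.linalg.matrix_rank(X_ok.T @ X_ok))    # 4: full rank, OLS well defined
```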

Statistical properties of OLS: Zero conditional expectation
The assumption E(u | x) = 0 is just a natural generalization of the simple regression assumption:
MLR.4 The error u has an expected value of zero given any values of the regressors, in symbols E(u | x_1, x_2, ..., x_k) = 0.
Again, (MLR.4) implies E(u) = 0, but it is stronger than that property. (MLR.4) also implies cov(x_j, u) = 0 for all regressors x_j.

Statistical properties of OLS: Violations of assumption MLR.4
There are several reasons why (MLR.4) may not hold, in particular the ensuing condition cov(x_j, u) = 0, which is often called an exogeneity condition. When it is violated, x_j is called an endogenous regressor.
If the true relationship is nonlinear, E(u | x_j) ≠ 0 even though E(u) = 0;
If an important influence factor has been omitted from the list of regressors ("omitted variable bias"), (MLR.4) is formally violated. The researcher must decide whether she wishes to estimate the regression without or with the doubtful control;
If there is logical feedback from y to some x_j, u and x_j are correlated, x_j is endogenous, and regression yields biased estimates of the true relationship. This case must be handled by special techniques (instrumental variables).

Statistical properties of OLS: Unbiasedness of OLS
(MLR.1) to (MLR.4) suffice for unbiasedness:
Theorem. Under assumptions (MLR.1)–(MLR.4), E(ˆβ_j) = β_j, j = 0, 1, ..., k, for any values of the parameters β_j.
In words, OLS is an unbiased estimator for the intercept and all coefficients. In short, one may write E(ˆβ) = β, using the notation β for the (k+1)-vector (β_0, β_1, ..., β_k) and a corresponding notation for the expectation operator.

Statistical properties of OLS: Scylla and Charybdis
How many regressors should be included in a multiple regression?
Omitting influential regressors (too low k) tends to overstate the effects: effects due to the omitted variables are attributed to the included regressors ("omitted variable bias"). Conversely, for example, the effect of a difference between two regressors may not be found if only one of them is included;
Profligate regressions with many regressors lack degrees of freedom. Results will be imprecise, variances will be large.
Statistical tools for model selection are important (R² and the adjusted R̄² do not work). Generally, economists tend to include too many regressors.

Statistical properties of OLS: Homoskedasticity
For the efficiency and variance properties, constant variance must be assumed:
MLR.5 The error u has the same variance given any values of the explanatory variables, in symbols var(u | x_1, ..., x_k) = σ².
If (MLR.5) is violated, the errors' variance and also the variance of the dependent variable will change with some x_j. Heteroskedasticity is often observed in cross-section data.

Statistical properties of OLS: The variance of OLS
The most informative way to represent the OLS variance is by using matrices:
Theorem. Under assumptions (MLR.1)–(MLR.5), the variance of the OLS estimator ˆβ = (ˆβ_0, ˆβ_1, ..., ˆβ_k)′ is given by var(ˆβ | X) = σ²(X′X)^{−1}, where the operator var applied to a vector denotes a matrix of variances and covariances.
The matrix expression must be evaluated in OLS estimation anyway. As n → ∞, the matrix X′X divided by n may converge to a moment matrix of the regressors.

Statistical properties of OLS: A property of the OLS variances
From the general formula in the theorem, the interesting formula
var(ˆβ_j | X) = σ² / (SST_j (1 − R²_j))
is obtained, where SST_j denotes Σ_{i=1}^n (x_{ij} − x̄_j)² and R²_j is the R² from a regression of x_j on the other regressors x_l, l ≠ j. Note that this formula does not use any matrices. Strong variation in the regressor x_j and weak correlation with the other regressors benefit the precision of the coefficient estimate ˆβ_j.

Statistical properties of OLS: Estimating the OLS variance
In the formulae for the OLS variance, the item σ² is unobserved and must be estimated. In analogy to simple regression, the following theorem holds:
Theorem. Under the assumptions (MLR.1)–(MLR.5), it holds that
E(Σ_{i=1}^n û_i² / (n − k − 1)) = E(SSR / (n − k − 1)) = E(ˆσ²) = σ²,
i.e. the estimator of the error variance is unbiased.
The scale factor n − k − 1 corresponds to the degrees-of-freedom concept: n observations yield k + 1 coefficient estimates, such that n − k − 1 degrees of freedom remain. The proof is omitted.
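
The two preceding theorems can be combined in a short numerical sketch (simulated, illustrative data): estimate ˆσ² = SSR/(n − k − 1), form ˆσ²(X′X)^{−1}, and verify that its diagonal element for ˆβ_1 agrees with the scalar formula ˆσ²/(SST_1(1 − R²_1)).
```python
# Estimated variance of OLS: matrix form sigma2_hat * (X'X)^{-1} versus the
# scalar formula sigma2_hat / (SST_j * (1 - R2_j)) for the slope on x1.
import numpy as np

rng = np.random.default_rng(4)
n, k = 400, 2
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k - 1)       # SSR / (n - k - 1)
V = sigma2_hat * np.linalg.inv(X.T @ X)        # estimated var(beta_hat | X)

# R2_1 from regressing x1 on the other regressors (here: intercept and x2).
Z = np.column_stack([np.ones(n), x2])
r = x1 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)
SST_1 = np.sum((x1 - x1.mean()) ** 2)
R2_1 = 1 - (r @ r) / SST_1

print(V[1, 1], sigma2_hat / (SST_1 * (1 - R2_1)))  # the two expressions agree
```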

Statistical properties of OLS: Gauss-Markov and multiple regression
In direct analogy to the case of simple regression, there is the celebrated Gauss-Markov Theorem for linear efficiency:
Theorem. Under assumptions (MLR.1)–(MLR.5), the OLS estimators ˆβ_0, ˆβ_1, ..., ˆβ_k are the best linear unbiased estimators of β_0, β_1, ..., β_k, respectively, i.e. BLUE.
For this reason, (MLR.1)–(MLR.5) are called the Gauss-Markov conditions. It can be shown that a genuine multivariate generalization holds and that linear combinations of OLS estimators are BLUE for linear combinations of coefficient parameters.

Inference in the multiple model: Normal regression
For some results, such as unbiasedness and linear efficiency, no exact distributional assumptions are needed. For others, it is convenient to assume a Gaussian (normal) distribution:
MLR.6 The error u is independent of the explanatory variables and it is normally distributed with mean 0 and variance σ², in symbols u ∼ N(0, σ²).
Assumption (MLR.6) implies (MLR.4) and (MLR.5). Normality is often a reasonable working assumption, unless there is strong evidence to the contrary. In large samples, it can be tested.

Inference in the multiple model: OLS coefficient estimates as normal random variables
Theorem. Under the assumptions (MLR.1)–(MLR.6), the distribution of the OLS coefficient estimates ˆβ_j conditional on the regressors is normal, i.e.
ˆβ_j | X ∼ N{β_j, var(ˆβ_j)},
with var(ˆβ_j) given by the direct expression using the idea of a regression of x_j on the other covariates, or as the (j, j) element of the variance matrix σ²(X′X)^{−1}.
Note that the variance is formally a random variable, as it depends on X. The proof is quite obvious: for given X, ˆβ is a linear function of y, which in turn is normal due to normal u.

Inference in the multiple model: Implications of the normality of OLS estimates
Normality does not only hold for the individual ˆβ_j; it holds that ˆβ ∼ N(β, σ²(X′X)^{−1}) as a multivariate ((k+1)-variate) normal distribution. Thus, all sums, differences, and linear combinations of coefficient estimates are also normally distributed;
From the properties of the normal distribution, it follows that the theoretically standardized estimate
(ˆβ_j − β_j) / s.e.(ˆβ_j)
is standard normal N(0, 1) distributed. The denominator standard error, however, is the square root of the true and unknown variance. This distributional property does not hold for the estimated standard error.

Inference in the multiple model: The empirically standardized estimate
If the OLS coefficient estimates are standardized by estimated standard errors, the distribution follows the well-known t law (in the older literature, the Student distribution):
Theorem. Under assumptions (MLR.1)–(MLR.6),
(ˆβ_j − β_j) / ŝ.e.(ˆβ_j) ∼ t_{n−k−1},
in words, the empirically standardized estimate follows a t distribution with n − k − 1 degrees of freedom.

Inference in the multiple model: Remarks on the standardized estimate
The t distribution with m degrees of freedom is defined from m + 1 independent standard normal random variables a, b_1, ..., b_m as the distribution of the ratio a / √((b_1² + ... + b_m²)/m);
For more than around 30 degrees of freedom, the t distribution becomes so close to the normal N(0, 1) that the standard normal can be used instead;
The standardized estimator will not be t distributed if the normality assumption (MLR.6) is violated;
Degrees of freedom can be remembered as follows: out of n original degrees of freedom, k + 1 are used up by estimating the coefficients and the intercept, and n − k − 1 remain.

Inference in the multiple model: Densities of t distributions
[Figure: densities of the t distribution with 5 (black), 10 (blue), and 20 (green) degrees of freedom.]

Inference in the multiple model: Testing the null hypothesis β_j = 0
Researchers are interested in testing the null hypothesis H_0: β_j = 0, usually with the alternative β_j ≠ 0, less often with the alternative β_j > 0 or β_j < 0. An appropriate test statistic for this H_0 is the empirically standardized estimate for β_j = 0, i.e.
t_{β_j} = ˆβ_j / ŝ.e.(ˆβ_j),
which is called the t ratio or t statistic.

Inference in the multiple model: What is a hypothesis test?
A hypothesis test is a statistical decision procedure. Based on the value of a test statistic, which is a function of the sample and hence a random variable, it either rejects the null hypothesis or is unable to reject it (fails to reject). For example, the t test rejects the null of β_j = 0 if t_{β_j} > c in the one-sided version and if |t_{β_j}| > c in the two-sided version. c is called the critical value, and the region of R where the test rejects is called the critical region.

Inference in the multiple model: How are the critical values determined?
Hypothesis tests are tuned to significance levels. A significance level is the probability of a type I error, i.e. of rejecting the null even though it is correct. The construction of a test requires knowledge of the distribution of the test statistic under the null. Suppose the significance level (specified by the researcher) is 5%. Then any region that has probability 5% under the null is a valid critical region for a valid test. In order to minimize the probability of a type II error, critical regions are placed in the tails of the null distribution. For example, the 95% quantile of the t distribution is a good critical value for a 5% test against a one-sided alternative if the test statistic is t distributed.

Inference in the multiple model: Practical implementation of hypothesis tests
Presume the researcher has the value of the test statistic and searches for critical values. Several options are available:
If the null distribution is a known standard law, critical values can be found on the web or in books: inconvenient;
Critical values may also be provided by statistical software: slightly more convenient and flexible;
If the software is smart, it will provide p values instead of or in addition to the critical values: very precise and convenient;
If the null distribution is non-standard and rare, the researcher may have to simulate the distribution via Monte Carlo or bootstrap procedures: computer skills needed.

Inference in the multiple model: Definition of the p value
Correct definitions:
The p value is the significance level at which the test becomes indifferent between rejection and acceptance for the sample at hand (the calculated value of the test statistic);
The p value is the probability of generating values for the test statistic that are, under the null hypothesis, even more unusual (less typical, often "larger") than the one calculated from the sample.
Incorrect definition:
The p value is the probability of the null hypothesis for this sample.

Inference in the multiple model: Test based on quantiles
[Figure: 10% to 90% quantiles of the normal distribution. The observed value of 2.2 for a test statistic that is, under H_0, normally distributed is significant at 10% for the one-sided test.]

Inference in the multiple model: Test based on p values
[Figure: the area under the density curve to the right of the observed value of 2.2 is 0.014, which yields the p value. The one-sided test rejects at the 10% and 5% levels, but not at 1%.]

Inference in the multiple model: Return to the t test
Assume (MLR.1)–(MLR.6). Under the null hypothesis H_0: β_j = 0, the t ratio for β_j,
t_{β_j} = ˆβ_j / ŝ.e.(ˆβ_j),
will be t distributed with n − k − 1 degrees of freedom, i.e. t_{n−k−1} distributed. Thus, reject H_0 at 5% significance in favor of the alternative H_A: β_j > 0 if the test statistic is larger than the 95% quantile of the t_{n−k−1} distribution. Reject in favor of H_A: β_j ≠ 0 if the test statistic is larger than the 97.5% quantile or less than the 2.5% quantile. When the t test rejects, it is often said that β_j is significantly different from 0, or simply that β_j is significant, or also that x_j is significant.
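
The following sketch computes t ratios and two-sided p values from the t_{n−k−1} distribution for every coefficient, using simulated data and scipy's t distribution (all names and values are illustrative).
```python
# t ratios and two-sided p values for H0: beta_j = 0 in a multiple regression.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, k = 120, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.4, 0.0]) + rng.normal(size=n)   # beta_2 = 0 in truth

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k - 1)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

df = n - k - 1
t_stat = beta_hat / se                                    # t ratios
p_two_sided = 2 * (1 - stats.t.cdf(np.abs(t_stat), df))
for j in range(k + 1):
    print(f"beta_{j}: estimate {beta_hat[j]:.3f}, t {t_stat[j]:.2f}, p {p_two_sided[j]:.3f}")
```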

Inference in the multiple model: More general t tests
Presume one wishes to test H_0: β_j = β_{j0} for a given value β_{j0}, such as H_0: β_j = 2.15. Then, evaluate the statistic
(ˆβ_j − β_{j0}) / ŝ.e.(ˆβ_j).
Under H_0, it is clearly t_{n−k−1} distributed, and the usual quantiles can be used.

Inference in the multiple model: Testing several exclusion restrictions jointly
Assume the null hypothesis of concern is now
H_0: β_{l+1} = β_{l+2} = ... = β_k = 0,
i.e. the exclusion of k − l regressors. Then, a suitable test statistic is
F = ((SSR_r − SSR_u)/(k − l)) / (SSR_u/(n − k − 1)),
where SSR_r is the SSR for the restricted model without the k − l regressors and SSR_u for the unrestricted model with all k regressors. Assuming (MLR.1)–(MLR.6), the statistic F is, under the null, distributed F with k − l numerator and n − k − 1 denominator degrees of freedom.
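
A sketch of the F test built directly from the restricted and unrestricted SSR, with the p value from scipy's F distribution (simulated, illustrative data; here the last two regressors are excluded under H_0).
```python
# F test of H0: beta_2 = beta_3 = 0 from restricted and unrestricted SSR.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, k, l = 200, 3, 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

def ssr(X, y):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e

SSR_u = ssr(X, y)                          # unrestricted: all k regressors
SSR_r = ssr(X[:, :l + 1], y)               # restricted: only the first l regressors
q = k - l                                  # number of exclusion restrictions
F = ((SSR_r - SSR_u) / q) / (SSR_u / (n - k - 1))
p_value = 1 - stats.f.cdf(F, q, n - k - 1)
print(F, p_value)
```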

Inference in the multiple model: Densities of F distributions
[Figure: densities of F distributions with 2 (black), 4 (blue), and 6 (green) numerator and 20 denominator degrees of freedom.]

Inference in the multiple model: Some remarks on the F test
The F test is easily generalized to test general restrictions, such as H_0: β_2 + β_3 = 1, β_5 = 3.61, β_6 = 2β_7, as again there exist an SSR_u and an SSR_r. The main difficulty may be the estimation of the restricted model. Numerator degrees of freedom correspond to the number of linearly independent restrictions;
For large n, the F statistic will be distributed like 1/(k − l) times a χ²(k − l) random variable;
The F statistic for the exclusion of a single regressor x_j is the square of t_{β_j}.

Inference in the multiple model: The overall F test
A special F test has the null hypothesis H_0: β_1 = ... = β_k = 0 and the alternative that at least one coefficient is non-zero. The statistic is
F = ((SST − SSR)/k) / (SSR/(n − k − 1)) = (R²/k) / ((1 − R²)/(n − k − 1)),
a transformation of the R². When it fails to reject, the regression model fails to provide a useful description of y. This is the only F statistic that appears in a standard regression printout.

Inference in the multiple model: The importance of F and t tests
F and t tests are restriction tests that serve as tools in searching for the best specification of a regression equation, i.e. the best selection of regressors that determine the targeted dependent variable y.
Only nested models can be compared. For example, y = β_0 + β_1 x_1 + u can be tested against y = β_0 + β_1 x_1 + β_2 x_2 + u but not against y = β_0 + β_1 x_2 + u;
In the specification search, it is often recommended to start with a profligate model and to eliminate insignificant regressors (backward elimination, general-to-specific) rather than to add regressors starting from a small model;
The decisions of t tests, say, for two coefficients β_l, β_j and of the F test for β_l = β_j = 0 are often in conflict. Some researchers prefer the decision of the F test in doubtful cases.

Inference in the multiple model: Ouch!
The following statements are regarded as incorrect:
"The tested null hypothesis is H_0: ˆβ_j = 0";
"The test is rejected";
"The alternative hypothesis can be rejected";
"The test is 2.55";
"The coefficient β_4 is significant at 95%" (unless someone really uses an unusual 95% significance level);
"The hypothesis that β_4 is insignificant can be rejected".

OLS asymptotics: The probability limit
When talking about asymptotics, i.e. large-sample behavior, statistical convergence concepts are needed. For convergence of a sequence of random variables X_1, ..., X_n, ... to a fixed limit, we use:
Definition. A sequence of random variables (X_n) is said to converge in probability to θ ∈ R, in symbols plim_{n→∞} X_n = θ, iff for every ε > 0, P(|X_n − θ| > ε) → 0 as n → ∞.
This concept is relatively weak, as it does not imply that single realizations of the random variable sequence converge. It allows simple rules, such as plim(X_n Y_n) = (plim X_n)(plim Y_n).

OLS asymptotics: Consistency of OLS
An estimator ˆθ for the parameter θ is called consistent iff plim_{n→∞} ˆθ(n) = θ, with ˆθ(n) denoting an estimate from a sample of size n. Consistency holds under relatively weak conditions:
Theorem. Under assumptions (MLR.1)–(MLR.4) and some technical conditions, the OLS estimator ˆβ is consistent for β, which implies that plim_{n→∞} ˆβ_j = β_j for j = 0, 1, ..., k.

OLS asymptotics: A sketch of the consistency issue
Consider
ˆβ = β + (X′X)^{−1} X′u = β + (n^{−1} X′X)^{−1} (n^{−1} X′u).
Typically, the term n^{−1} X′X will converge to some kind of variance matrix. The term n^{−1} X′u should converge to its expectation E(X′u), which is 0 if X and u are uncorrelated and E(u) = 0. Thus, the condition
MLR.4′ E(u) = 0 and cov(x_j, u) = 0 for j = 1, ..., k
will suffice for consistency and can be substituted for the stronger assumption (MLR.4).

OLS asymptotics: Correlation of regressor and error is pretty bad
It was shown before that correlation between a regressor and the errors (for example, with omitted variables and with endogeneity) usually causes a bias in the sense of E(ˆβ) ≠ β. If (MLR.4′) is violated, this bias will not even disappear as n → ∞ and becomes an inconsistency. As Clive Granger said, "If you can't get it right as n goes to infinity, you shouldn't be in the business." This means that inconsistent estimators should not be used at all. Inconsistency is more serious than a finite-sample bias.

OLS asymptotics: Asymptotic normality of the OLS estimator
The celebrated Central Limit Theorem can be used to prove:
Theorem. Under the Gauss-Markov assumptions (MLR.1)–(MLR.5) and some technical conditions, it holds that
(ˆβ_j − β_j) / ŝ.e.(ˆβ_j) →d N(0, 1),
and more generally that
√n (ˆβ_j − β_j) →d N(0, σ²_{β_j}),
with σ²_{β_j} determined either from the matrix formula σ²(X′X)^{−1} or by the aforementioned construction from regressions among the regressors.

OLS asymptotics: Remarks on the asymptotic normality of OLS
Note that normality of the errors is not required: even for most non-normal error distributions, ˆβ will approach a normal limit distribution;
Under the assumptions of the theorem, ˆσ² will converge to σ²;
This latter convergence is of type plim, while the main result of the theorem uses convergence in distribution (→d), a weaker type of convergence. Convergence in distribution means that the distribution of a random variable converges to a limit distribution; nothing else is stated about the random variables proper.
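
A small Monte Carlo sketch of this result: with skewed (demeaned exponential) errors, the empirically standardized slope estimate still behaves approximately like N(0, 1) at moderate n. The simulation design is purely illustrative.
```python
# Monte Carlo: standardized OLS slope with non-normal errors is approximately N(0,1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, reps, beta1 = 200, 2000, 0.5
z = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    u = rng.exponential(scale=1.0, size=n) - 1.0          # skewed, mean-zero errors
    y = 1.0 + beta1 * x + u
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    s2 = e @ e / (n - 2)
    se1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    z[r] = (b[1] - beta1) / se1

# Empirical rejection frequency of |z| > 1.96 versus the normal benchmark 0.05.
print(np.mean(np.abs(z) > 1.96), 2 * (1 - stats.norm.cdf(1.96)))
```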

OLS asymptotics: Lagrange multiplier tests: the idea
Restriction tests (t and F) follow the Wald test principle, one of the three test construction principles used in parametric statistics. The other two principles are the likelihood-ratio (LR) test and the Lagrange multiplier (LM) test principle. LR and LM tests are typically asymptotic tests; their small-sample null distributions are uncertain, while their large-sample distributions will be regular (chi-square) even in the absence of (MLR.6). The LM test estimates the model under the null and checks the increase in the likelihood when moving toward the alternative. It is also called the score test, as the derivative of the likelihood is called the score. Often, the LM test can be made operational in a sequence of regressions, with the test statistic simply calculated as nR² from a specific regression (the "auxiliary regression").

OLS asymptotics: The LM test for exclusion of variables
Consider the multiple regression model
y_i = β_0 + β_1 x_{1,i} + ... + β_k x_{k,i} + u_i,
and the null hypothesis H_0: β_{k−q+1} = ... = β_k = 0. Estimate the restricted regression model
y_i = β_0 + β_1 x_{1,i} + ... + β_{k−q} x_{k−q,i} + u_i
by OLS and keep the residuals ũ. Then, regress these ũ on all k regressors:
ũ_i = γ_0 + γ_1 x_{1,i} + ... + γ_k x_{k,i} + v_i.
The nR² from this second, auxiliary regression is the LM test statistic. Under H_0, it is asymptotically distributed as χ²(q).
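
The two-step LM procedure translates directly into code; the sketch below tests the exclusion of the last q = 2 regressors on simulated, illustrative data and compares nR² with the χ²(q) distribution.
```python
# LM (nR^2) test for excluding the last q regressors, via the auxiliary regression.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, k, q = 250, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.8, 0.0, 0.0]) + rng.normal(size=n)

def residuals(X, y):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return y - X @ b

u_tilde = residuals(X[:, :k + 1 - q], y)      # step 1: residuals of the restricted model
v = residuals(X, u_tilde)                     # step 2: auxiliary regression on all k regressors
R2_aux = 1 - (v @ v) / np.sum((u_tilde - u_tilde.mean()) ** 2)

LM = n * R2_aux
p_value = 1 - stats.chi2.cdf(LM, q)
print(LM, p_value)
```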

Selection of regressors in the multiple model: Model selection: the main issue
The typical situation in multiple regression is that y has been specified a priori, and that the researcher looks for the optimal set of regressors that offers the best explanation for y. Tools for this specification search or regressor selection are:
R² and the adjusted R̄² can be used for comparing any two or more models, but tend to increase when any regressor is added;
F and t tests can only be used for comparing nested models, and lengthy search sequences tend to invalidate the significance level;
Information criteria such as AIC and BIC can compare any two or more models and penalize complexity;
Specification tests can be used to eliminate ill-specified models but cannot find the optimal model.

Selection of regressors in the multiple model: Adjusted R²
The corrected or adjusted R², often denoted R̄² or R²_c, is defined as
R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1).
It holds that R̄² ≤ R². If R̄² is seen as an estimator for corr²(y, β′x), then the bias of R̄² is smaller than the bias of R². R² always increases if a new regressor is included in the regression. R̄² increases if the t ratio for this variable is larger than 1 in absolute value, which corresponds to testing at an enormous significance level. It cannot be used for serious model selection.
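
The contrast between R² and R̄² is easy to see numerically: in the sketch below (simulated, illustrative data), adding a regressor that is unrelated to y never lowers R², while R̄² can fall.
```python
# R^2 never decreases when a regressor is added; adjusted R^2 can decrease.
import numpy as np

rng = np.random.default_rng(9)
n = 150
x1 = rng.normal(size=n)
x_noise = rng.normal(size=n)                   # unrelated to y
y = 1.0 + 0.5 * x1 + rng.normal(size=n)

def r2(X, y):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return 1 - (e @ e) / np.sum((y - y.mean()) ** 2)

def r2_adj(R2, n, k):
    return 1 - (1 - R2) * (n - 1) / (n - k - 1)

c = np.ones(n)
R2_small = r2(np.column_stack([c, x1]), y)
R2_big = r2(np.column_stack([c, x1, x_noise]), y)
print(R2_small <= R2_big)                      # True: R^2 weakly increases
print(r2_adj(R2_small, n, 1), r2_adj(R2_big, n, 2))  # adjusted R^2 may fall
```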

Selection of regressors in the multiple model: Penalizing complexity
Consider the estimated error variance
ˆσ² = (1/(n − k − 1)) Σ_{i=1}^n û_i² = SSR/(n − k − 1),
which, just like R̄², improves (here, however: decreases) if a new regressor with a t ratio greater than one is added. Thus, it cannot be used for serious model selection. It takes a step in the right direction, however, by introducing a trade-off: the numerator improves (decreases) with increasing complexity, while the denominator deteriorates (also decreases) with higher complexity. This idea is pursued by information criteria, which impose a stronger penalty for complexity, strong enough for useful model selection.

Selection of regressors in the multiple model: The AIC according to Akaike
Akaike introduced the AIC (An Information Criterion), in one possible version
AIC = log ˆσ² + 2(k + 1)/n,
which is to be minimized: complexity decreases the first term and increases the second. (In information criteria, ˆσ² should be formed using the scale n, not n − k − 1.) In nested comparisons, minimizing AIC corresponds to t or F tests at an approximate 15% significance level. For n → ∞, minimizing AIC selects the best forecasting model, which tends to keep slightly more regressors than those with non-zero coefficients.

Selection of regressors in the multiple model: The BIC according to Schwarz
Schwarz simplified the BIC that had been introduced by Akaike, in one version
BIC = log ˆσ² + (k + 1) log(n)/n,
which is to be minimized. The BIC complexity penalty is stronger than the AIC penalty, so selected models tend to be more parsimonious (smaller). In nested comparisons, minimizing BIC corresponds to a significance level falling to 0 as n → ∞. For n → ∞, BIC will select the true model, exactly keeping all regressors with non-zero coefficients. In smaller samples, BIC tends to select overly parsimonious models.
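
The AIC and BIC versions given on these two slides can be compared in a short sketch (simulated, illustrative data); note that ˆσ² is formed with the scale n, as remarked above. Other textbooks use different but equivalent formulations.
```python
# AIC and BIC in the versions used on the slides, with sigma_hat^2 = SSR / n.
import numpy as np

rng = np.random.default_rng(10)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                        # irrelevant regressor
y = 1.0 + 0.7 * x1 + rng.normal(size=n)

def ssr(X, y):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e

def aic(ssr_value, n, k):
    return np.log(ssr_value / n) + 2 * (k + 1) / n

def bic(ssr_value, n, k):
    return np.log(ssr_value / n) + (k + 1) * np.log(n) / n

c = np.ones(n)
SSR1 = ssr(np.column_stack([c, x1]), y)        # model with x1 only (k = 1)
SSR2 = ssr(np.column_stack([c, x1, x2]), y)    # model with x1 and the noise regressor (k = 2)
print(aic(SSR1, n, 1), aic(SSR2, n, 2))        # smaller value preferred
print(bic(SSR1, n, 1), bic(SSR2, n, 2))
```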