Making Sense of Econometrics: Basics
Lecture 2: Simple Regression
Egypt Scholars Economic Society
Happy Eid! Eid present: enter the classroom at http://b.socrative.com/login/student/ (room name c28efb78)
Outline
1 Regression example
2 Population regression function; Sample regression function
3 Least squares principle; Desirable properties of estimators; OLS estimator
4 Classical linear regression model; Further properties of OLS estimator
Regression analysis
- econometric model: an inexact relationship; the disturbance term
- econometric methodology: theory (hypothesis) → mathematics → econometrics → data → estimation → hypothesis testing → forecasting & policy purposes
- regression: causation vs. correlation
- simple regression; the population regression function
Regression sense
estimating the (population) mean value of the dependent variable on the basis of the known (fixed) values of the explanatory variable(s)
Regression example
- suppose we have a total population of 60 families, with weekly income x and weekly consumption expenditure y
- the 60 families are divided into 10 income groups
- there is considerable variation in weekly consumption expenditure within each income group
- despite this variability, on average, weekly consumption expenditure increases as income increases
Example
- the dark circled points show the conditional mean values of y against the various x values
- joining these mean values gives the population regression line (PRL)
- [figure: conditional distribution of expenditure for various levels of income]
Population regression line [figure]
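The construction of the conditional means that trace out the PRL can be sketched in code. The numbers below are hypothetical stand-ins, since the lecture's actual 60-family table is not reproduced here:

```python
import numpy as np

# Hypothetical stand-in for the lecture's 60-family table:
# a few families observed at each weekly income level x.
data = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
}

# The conditional mean E(y | x_i) for each income level;
# joining these points traces out the population regression line (PRL).
conditional_means = {x: float(np.mean(ys)) for x, ys in data.items()}

for x, m in sorted(conditional_means.items()):
    print(f"E(y | x={x}) = {m:.1f}")
```

Despite the spread of expenditures within each income group, the conditional means rise steadily with income, which is exactly the pattern the PRL summarizes.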
Population regression function
E(y|x_i) = f(x_i)
- each conditional mean E(y|x_i) is a function of x_i, known as the conditional expectation function (CEF) or population regression function (PRF)
- the PRF tells how the mean (average) response of y varies with x
- assuming the PRF E(y|x_i) is a linear function of x_i:
E(y|x_i) = β1 + β2 x_i
Linearity assumption
linearity in the variables:
- the conditional expectation of y is a linear function of x_i; the regression curve is a straight line
- E(y|x_i) = β1 + β2 x_i² is NOT linear in the variables
linearity in the parameters:
- the conditional expectation of y, E(y|x_i), is a linear function of the parameters; the βs may or may not be linear in the variable x
- E(y|x_i) = β1 + β2 x_i² IS linear in the parameters
Linear regression models [figure]
Linear regression model
- linear regression (LRM) means linear in the parameters: the parameters, the βs, are raised to the first power
- E(y|x_i) = β1 + β2 x_i → LRM
- E(y|x_i) = β1 + β2 x_i² → LRM
- non-linear regression (NLRM) means non-linear in the parameters
- E(y|x_i) = β1 + β2² x_i → NLRM
Stochastic specification of the PRF
the deviation of an individual y_i around its expected value is expressed as follows:
u_i = y_i − E(y|x_i)
y_i = E(y|x_i) + u_i
technically, u_i is known as the stochastic disturbance or stochastic error term
Stochastic specification of the PRF
y_i = E(y|x_i) + u_i
interpretation: the expenditure of an individual family, given its income level, can be expressed as the sum of
1 E(y|x_i), the mean consumption of all families with the same level of income: the systematic, or deterministic, component
2 u_i, the random, or non-systematic, component: a proxy for all the omitted (neglected) variables that may affect y but are not included in the regression model
Stochastic specification of the PRF
if E(y|x_i) is assumed to be linear in x_i:
y_i = E(y|x_i) + u_i
y_i = β1 + β2 x_i + u_i
consumption expenditure of a family is linearly related to its income, plus the disturbance term
Stochastic specification of the PRF
the individual consumption expenditures, given x_i = $80, can be expressed as follows:
y_i = β1 + β2 x_i + u_i
y1 = 55 = β1 + β2(80) + u1
y2 = 60 = β1 + β2(80) + u2
y3 = 65 = β1 + β2(80) + u3
y4 = 70 = β1 + β2(80) + u4
y5 = 75 = β1 + β2(80) + u5
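The decomposition above can be made concrete with hypothetical parameter values; β1 = 17 and β2 = 0.6 below are assumptions chosen for illustration, not the lecture's estimates:

```python
# Hypothetical parameter values for illustration only (not the lecture's estimates).
beta1, beta2 = 17.0, 0.6
x = 80
ys = [55, 60, 65, 70, 75]          # the five observed expenditures at x = 80

systematic = beta1 + beta2 * x      # E(y | x=80) under the assumed line
disturbances = [y - systematic for y in ys]   # u_i = y_i - E(y | x_i)

print(systematic)        # 65.0
print(disturbances)      # [-10.0, -5.0, 0.0, 5.0, 10.0]
```

Each observed expenditure splits into the same systematic component plus a family-specific disturbance; here the disturbances average to zero, previewing the assumption E(u_i|x_i) = 0 discussed next.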
Stochastic specification of the PRF
taking the conditional expectation of y_i = β1 + β2 x_i + u_i:
E(y_i|x_i) = E[E(y|x_i)] + E(u_i|x_i)
E(y_i|x_i) = E(y|x_i) + E(u_i|x_i)
where the expected value of a constant is that constant itself
since E(y_i|x_i) is the same thing as E(y|x_i), it implies that E(u_i|x_i) = 0
Stochastic specification of the PRF
y_i = β1 + β2 x_i + u_i with E(u_i|x_i) = 0
thus, the assumption that the regression line passes through the conditional means of y implies that the conditional mean values of u_i (conditional upon the given x's) are zero
it is clear that the two forms are equivalent:
E(y|x_i) = β1 + β2 x_i
y_i = β1 + β2 x_i + u_i, if E(u_i|x_i) = 0
Stochastic specification of the PRF
y_i = β1 + β2 x_i + u_i
the stochastic specification has the advantage that it clearly shows that there are other variables besides income that affect consumption expenditure: an individual family's consumption expenditure cannot be fully explained by the variable(s) included in the regression model alone
Significance of the stochastic disturbance term
why not include all variables in the model?
- vagueness of theory
- unavailability of data
- core variables vs. peripheral variables
- intrinsic randomness in human behaviour
- poor proxy variables
- principle of parsimony
- wrong functional form
Sample regression function
- assume we do not have data on the population
- we have a randomly selected sample of y values for the fixed x's
- can we estimate the PRF from the sample data? yes, but not accurately, because of sampling fluctuations
SRF [figure]
Sample regression function
- which of the two regression lines represents the true population regression line?
- there is no way we can be absolutely sure, but we can try to find the best approximation of the true PRL
- we would get N different SRFs for N different samples, and these SRFs are not likely to be the same
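This sampling variation is easy to see in a simulation sketch; the "true" parameters β1 = 17, β2 = 0.6 and the error spread are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
beta1, beta2 = 17.0, 0.6           # assumed true PRF parameters (illustrative)
x = np.linspace(80, 260, 20)       # fixed x values, as in repeated sampling

def fit_srf(y, x):
    """OLS intercept and slope for a single sample."""
    b2 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    b1 = y.mean() - b2 * x.mean()
    return b1, b2

# Each random sample from the same population gives a different SRF,
# even though the PRF itself is fixed.
fits = []
for _ in range(5):
    y = beta1 + beta2 * x + rng.normal(0, 10, size=x.size)
    fits.append(fit_srf(y, x))

for b1, b2 in fits:
    print(f"SRF: y-hat = {b1:.2f} + {b2:.3f} x")
```

The five fitted lines all hover around the true PRF but none coincides with it, which is precisely the sampling-fluctuation point made above.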
Sample regression line
ŷ_i = β̂1 + β̂2 x_i
where ŷ_i is read as "y-hat" or "y-cap"
note:
- ŷ_i is an estimator of E(y|x_i)
- β̂1 is an estimator of β1
- β̂2 is an estimator of β2
an estimator is a formula that tells how to estimate the population parameters from the information provided by the sample at hand
SRF
we can express the SRF in its stochastic form as follows:
y_i = ŷ_i + û_i
y_i = β̂1 + β̂2 x_i + û_i
û_i denotes the (sample) residual term; conceptually, û_i is analogous to u_i, and it is introduced in the SRF for the same reasons u_i was introduced in the PRF
To sum up
our primary objective in regression analysis is to estimate the PRF
y_i = β1 + β2 x_i + u_i
on the basis of the SRF
y_i = β̂1 + β̂2 x_i + û_i
because more often than not our analysis is based upon a single sample from some population
To sum up
because of sampling fluctuations, our estimate of the PRF on the basis of the SRF is at best an approximate one
To sum up (in this figure)
- for x = x_i we have one (sample) observation, y = y_i
- in terms of the SRF, the observed y_i can be expressed as y_i = ŷ_i + û_i
- in terms of the PRF, it can be expressed as y_i = E(y|x_i) + u_i
- ŷ_i approximates the true E(y|x_i) for the given x_i
- for any x_i to the left of point A, the SRF will underestimate the true PRF
Critical question
granted that the SRF is but an approximation of the PRF, can we devise a rule that will make this approximation as close as possible?
how should the SRF be constructed so that β̂1 and β̂2 are as close as possible to the true β1 and β2?
note that we will never know the true β1 and β2
Least squares
consider the two-variable PRF: y_i = β1 + β2 x_i + u_i
the PRF is not directly observable; we estimate it from the SRF: y_i = β̂1 + β̂2 x_i + û_i
ŷ_i is the estimated (conditional mean) value of y_i, so that y_i = ŷ_i + û_i
now, how is the SRF itself determined?
Least squares
û_i = y_i − ŷ_i = y_i − β̂1 − β̂2 x_i
given n pairs of observations on y and x, we would like to determine the SRF in such a manner that it is as close as possible to the actual y
we may choose the SRF in such a way that the sum of the residuals, Σû_i = Σ(y_i − ŷ_i), is as small as possible? not a very good criterion!
Least squares
minimizing Σû_i is likely to result in a small (even zero) value regardless of how widely the û_i's are scattered around the SRL, because large positive and negative residuals cancel out
Least squares
we can avoid this problem if we adopt the least-squares criterion: the SRF can be fixed in such a way that
Σû_i² = Σ(y_i − ŷ_i)² = Σ(y_i − β̂1 − β̂2 x_i)²
is as small as possible, where the û_i² are the squared residuals
Least squares
justifying the least-squares method:
- minimizing Σû_i gives every residual the same weight in the sum (û1 + û2 + ...), which does not reflect how far each point is from the SRL
- by squaring û_i, we give more weight to residuals such as û1 and û4 in the previous figure than to the residuals û2 and û3
- the estimators obtained by the least-squares method have some very desirable statistical properties
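A small numeric sketch of why Σû_i fails as a criterion while Σû_i² works (the data points below are made up for illustration): two very different lines can both have a zero sum of residuals, but the sum of squared residuals tells them apart.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

def residuals(b1, b2):
    return y - (b1 + b2 * x)

# Two candidate lines, both passing through (x-bar, y-bar),
# so both have a sum of residuals of (essentially) zero...
flat  = residuals(y.mean(), 0.0)                  # horizontal line
steep = residuals(y.mean() - 3 * x.mean(), 3.0)   # wildly steep line

print(flat.sum(), steep.sum())                # both ~0: this sum cannot rank them
print((flat ** 2).sum(), (steep ** 2).sum())  # squared residuals can
```

The sum of residuals is blind to how far points sit from the line, while the squared-residual sum penalizes the badly fitting steep line heavily, which is exactly the weighting argument made above.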
Desirable properties
- different samples result in different estimates (sampling error)
- an estimator therefore has a "sampling distribution", i.e. a probability density
- this distribution may take any shape, but the sampling distribution of a good estimator will possess certain desirable properties
Desirable properties
unbiasedness: the mean of the sampling distribution of an estimator β̂ is equal to the true population value β
E(β̂) = β
consider two possible estimators, β̂ and β̃
Desirable properties
unbiasedness, E(β̂) = β, does not ensure that estimates will always be accurate, only that in repeated estimation using successive samples, the resulting estimates will be correct on average
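This "correct on average" idea can be sketched with a Monte Carlo simulation; the true values β1 = 17 and β2 = 0.6 and the error spread are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
beta1, beta2 = 17.0, 0.6              # assumed true values (illustrative)
x = np.linspace(80, 260, 30)          # fixed x's in repeated samples

# Repeated sampling: estimate beta2 in each sample, then average the estimates.
estimates = []
for _ in range(2000):
    y = beta1 + beta2 * x + rng.normal(0, 10, size=x.size)
    b2 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    estimates.append(b2)

print(np.mean(estimates))   # close to 0.6: E(beta2-hat) = beta2
```

Any single estimate misses the true value, but the average across 2,000 samples sits on top of it, which is what unbiasedness promises.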
Desirable properties
efficiency: more than one unbiased estimator may well exist; which is the best unbiased estimator? the estimator whose sampling distribution has the smallest variance, i.e. the least widely dispersed, is preferred
Desirable properties
intuitively, the more efficient the estimator, the smaller (on average) the error when inaccurate estimates are obtained
Desirable properties
BLUE: amongst the class of all linear unbiased estimators, the desired estimator
- is unbiased
- has minimum variance
this is known as BLUE: the best linear unbiased estimator
Unbiasedness vs efficiency
trade-off: there can be a trade-off between bias and efficiency
one may choose an estimator that is biased (with a small bias) but has a much smaller variance (greater efficiency) than an alternative unbiased estimator
in this case, choose the estimator with the smallest mean squared error (MSE):
MSE(θ̂) = bias² + variance
MSE(θ̂) = [E(θ̂) − θ]² + var(θ̂)
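The MSE decomposition can be checked by simulation with a deliberately biased toy estimator: 0.9 times the sample mean (all numbers below are illustrative assumptions, not from the lecture):

```python
import numpy as np

# Verify MSE(theta-hat) = bias^2 + variance by simulation.
rng = np.random.default_rng(1)
theta = 5.0                                    # true parameter (illustrative)

samples = rng.normal(theta, 2.0, size=(100_000, 10))
estimates = 0.9 * samples.mean(axis=1)         # biased, but lower-variance, estimator

mse      = np.mean((estimates - theta) ** 2)   # average squared error
bias_sq  = (estimates.mean() - theta) ** 2     # squared bias
variance = estimates.var()                     # sampling variance

print(mse, bias_sq + variance)                 # the two quantities agree
```

Shrinking the sample mean trades a small bias for lower variance; the decomposition shows exactly how the two components add up to the overall MSE.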
OLS estimator
ordinary least squares (OLS) ensures that the estimator is BLUE under certain conditions
we obtain β̂2 as follows:
β̂2 = (n Σx_i y_i − Σx_i Σy_i) / (n Σx_i² − (Σx_i)²)
   = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²
   = Σx̃_i ỹ_i / Σx̃_i²   (where x̃_i = x_i − x̄ and ỹ_i = y_i − ȳ are deviations from the sample means)
OLS estimator
likewise, we obtain β̂1 as follows:
β̂1 = (Σx_i² Σy_i − Σx_i Σx_i y_i) / (n Σx_i² − (Σx_i)²)
   = ȳ − β̂2 x̄
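As a quick check of these formulas, here is a sketch with made-up data chosen to lie exactly on the line y = 17 + 0.6x, so the coefficients the formulas should recover are known in advance:

```python
import numpy as np

x = np.array([80.0, 100.0, 120.0, 140.0, 160.0])
y = np.array([65.0, 77.0, 89.0, 101.0, 113.0])   # exactly y = 17 + 0.6 x
n = x.size

# Raw-sums form of the OLS slope
b2_raw = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x ** 2).sum() - x.sum() ** 2)

# Deviation form: the same number
xd, yd = x - x.mean(), y - y.mean()
b2_dev = (xd * yd).sum() / (xd ** 2).sum()

# Intercept from the means
b1 = y.mean() - b2_dev * x.mean()

print(b2_raw, b2_dev, b1)   # 0.6, 0.6, 17.0
```

Both algebraic forms of β̂2 give identical results; the deviation form is usually easier to compute by hand.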
Classical linear regression model
OLS provides the best approximation of the SRF to the PRF (BLUE) under certain conditions; the classical linear regression model (CLRM) summarizes those conditions
recall the (stochastic) PRF: y_i = β1 + β2 x_i + u_i
and the (stochastic) SRF: y_i = b1 + b2 x_i + e_i
Classical linear regression model
how close the SRF is to the PRF (and the b's to the β's) depends on how close the residuals e_i (û_i) are to the true errors u_i, which in turn depends on the properties of the u_i
the CLRM imposes certain conditions on the properties of the errors u_i
the Gauss-Markov theorem states that, under the CLRM assumptions, the OLS estimators are BLUE
Assumptions
A1: the explanatory variable x is non-stochastic; x is fixed in repeated samples
A2: the average (mean) value of the error term is zero, E(u_i) = 0; that is, u_i is randomly positive or negative
Assumptions
A3: the variance of the error term is constant, var(u_i) = σ²; formally known as homoscedasticity (same variance)
A4: the error terms are independent, cov(u_i, u_j) = 0 for i ≠ j; no correlation between different error terms; no autocorrelation (correlation with itself)
A5: more conditions to be discussed later
Further properties of the OLS estimator
OLS attempts to make the residuals as small as possible collectively: it minimizes Σe_i², to avoid a poorly estimated relationship in which very large positive and negative errors cancel each other out
property 1: the OLS regression line always passes through the sample means (x̄, ȳ): ȳ = b1 + b2 x̄
Further properties of the OLS estimator
property 2: the average value of the residuals is always zero, ē = 0
property 3: the explanatory variable(s) and the residuals are uncorrelated, Σe_i x_i = 0
note that these are desirable properties of the sample residuals, which approximate the random population errors; property 1 is arguably a minimal requirement of any estimator
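All three properties can be checked numerically on any OLS fit; a minimal sketch with simulated data (the parameter values and error spread are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(80, 260, size=50)
y = 17.0 + 0.6 * x + rng.normal(0, 10, size=50)    # illustrative data

# OLS fit
b2 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b1 = y.mean() - b2 * x.mean()
e  = y - (b1 + b2 * x)                              # residuals

print(e.mean())                          # property 2: average residual is zero
print((e * x).sum())                     # property 3: residuals uncorrelated with x
print(y.mean() - (b1 + b2 * x.mean()))   # property 1: line passes through the means
```

All three quantities are zero (up to floating-point rounding) by construction of the OLS estimates, not by luck of the sample.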
Next lecture
you should know:
- assignment 1 available on BB tonight, due on Oct 25 (20:00 Cairo time), to be emailed to eses@egyptscholars.org
- recorded lecture on hypothesis testing
- lecture 3 on Saturday Oct. 18, 20:00-22:00 (Cairo time)
next lecture:
- multiple regression
- regression examples on Stata