LECTURE 2: SIMPLE REGRESSION I

Size: px

Start display at page:

Download "LECTURE 2: SIMPLE REGRESSION I"

Victor Strickland
5 years ago
Views:

1 LECTURE 2: SIMPLE REGRESSION I

2 2 Introducing Simple Regression

3 Introducing Simple Regression 3 simple regression = regression with 2 variables y dependent variable explained variable response variable predicted variable regressand x independent variable explanatory variable control variable predictor variable regressor we are actually going to derive the linear regression model in three very different ways these three ways reflect three types of econometric questions we discussed in the first lecture (descriptive, causal and forecasting) while the math for doing it is identical, conceptually they are very different ideas

4 4 Descriptive Approach Why do we need a regression model? Estimation & interpretation. Correlation vs. causation.

5 Descriptive Analysis 5 goal: estimate E[y x] (called the population regression function) x is discrete (i.e., categories) wages vs. education example one can collect data for the individual categories (the more categories, the more difficult) 1,500 wage 1, education (years)

6 Descriptive Analysis (cont d) 6 x is continuous problem: no categories example: CEO s salary vs. company s ROE ROE = return on equity = net income as a percentage of common equity (ROE = 10 means if I invest $100 in equity in a firm I earn $10 a year) does a CEO s salary (typically huge) reflect her performance? 4000 CEO s salary ($1,000s) company s ROE

7 7 Descriptive Analysis (cont d) therefore, we need a model for E[y x] in other words, we need to find a good mathematical expression for f in E[y x] = f(x) ) the simplest model I can think of is E[y x] = β 0 + β 1 x Why use simple models: Simple models are: easier to estimate. easier to interpret (e.g., β 1 = Δwage/Δeduc etc.). easier to analyze from the statistical standpoint. safe: they serve as a good approximation to the real relationship, the functional nature of which might be unknown and/or complicated. Things can t go too wrong when using a simple model. Further reading: Angrist and Pischke (2008): Mostly Harmless Econometrics: An Empiricist s Companion.

8 8 Linear Model for E[y x] interepretation: intercept: β 0 = E[y x = 0] slope: 1 E[ y x] E[ y x] x x E[y x] E[y x] = β 0 + β 1 x β 1 1 β 0 x

9 9 Estimation once we ve decided about the functional form of the model (in our case, it s E[y x] = β 0 + β 1 x), we need to develop techniques to obtain estimates of the parameters β 0 and β 1 we base our estimates on a sample of the population we need to make inferences about the whole population sample: n observations (people, countries, etc.) indexed with i x and y values for ith person denoted as y i, x i (for i = 1,,n) Preliminaries it will be useful to define u = y β 0 β 1 x what do we know about u? first, because E[y x] = β 0 + β 1 x, we have E[ u x] E[ y x x] 0 1 E[ y x] x 0 1 x x

10 Estimation (cont d) 10 this means two things: 1. E u = 0. (This should be intuitive: E[u x] = 0 for all x E u = 0; alternatively, you can plug in CE.4 from Wooldridge, page 687.) 2. the expected value of u does not change when we change x the second property has numerous implications, the most important being: cov(x,u) = 0 (this is property CE.5 from Wooldridge, page 687) E[xu] = 0 (because cov(x,u) = E[xu] Ex Eu ) how is this useful in estimation? we arrived at two important facts about expectations of u: E u = 0 E[xu] = 0 typically, expectations are estimated using a sample mean (e.g., how would you estimate mean wage with a sample of 10 people?) we ll use the idea of sample analogue, forcing the sample means of u and xu to equal zero (see below)

11 11 Estimation (cont d) before we move on, we ll revise some more statistical concepts connected with random sample remember we need to make inference about the population based on our sample and its characteristics Population vs. sample characteristics: Random variable Population (size N) Sample (size n) x Ex x 1 N N i1 x i x 1 n n i 1 x i var x E( x x ) 2 1 N N i 1 ( xi x ) 2 1 n1 n i1 ( x x) i 2 cov( x, y) E( x )( y ) 1 x y N N i 1 ( yi y )( xi x ) 1 n1 n i1 ( y y)( x x) i i the expressions on the right are called sample mean, sample variance and sample covariance

12 12 Estimation (cont d) for ith person (i.e., observation) in our sample, we have the population regression model y i = β 0 + β 1 x i + u i, where β 0 and β 1 are the unknown parameters to be estimated to think about estimation let s define the sample regression model y ˆ ˆ i 0 1xi uˆ i, where 0 ˆ and 1 ˆ are our estimates of β 0 and β 1 from the sample uˆi is defined as u ˆ ˆ ˆi y i 0 1x i note: if 0 ˆ and 1 ˆ are like β 0 and β 1, then uˆi should be like u i sample analogue: in order to make inferences about the population, we have to believe that our sample looks like the population if it is so, then let s force things to be true in the sample which we know would be true in the population

13 13 Estimation (cont d) from the discussion about (the population s) u, we know that E u = 0 E[xu] = 0 the sample analogue to this is: 0 ( ˆ ˆ ) 1 n 1 n 1 ˆ ni ui ni1 yi 0 1xi 0 ( ˆ ˆ ) 1 n 1 n 1 ˆ ni xiui ni1xi yi 0 1xi as y i and x i are the data, the above equations are in fact two linear equations in variables 0 ˆ and 1 ˆ ; they can be solved (fairly) easily (see Wooldridge, pages 28 and 29) note that the first equation tells us that ˆ 0 y ˆ 1x, where x and y are the sample means of x i s and y i s solving the equations to get 1 ˆ yields: ˆ n i1 i i 1 n 2 i1( xi x) ( y y)( x x )

14 14 Estimation (cont d) we can rewrite the formula for the slope as 1 n n1 i1 i i n 1 i 1( xi x) n ( y y)( x x ) ˆ, which can be viewed as the sample analogue to both var x and its sample counterpart are always positive, therefore: cov( xy, ) var x the regression coefficient (β 1 ), the covariance, and the correlation coefficient must all have the same sign one of them is zero only if all of them are zero

15 15 Exercise: CEO Salaries and Performance 1. Download the ceosal1.gdt file from my website and open it in Gretl (or, if you have the Wooldridge dataset installed in Gretl, open the ceosal1 data file). 2. Find the sample mean and standard deviation for roe and salary, and sample correlation coefficient between the two variables. Store the results in the ceo.xls file from my website. (Use View Summary statistics and View Correlation matrix procedures.) 3. Estimate the regression model E[salary roe] = β 0 + β 1 roe using the formulas we derived for intercept and slope. Note that cov( x, y) corr( x, y), where σ x and σ y are standard deviations of x and y. The same holds for the sample counterparts. 4. Interpret the results. What does β 0 tell you about average CEOs salaries? Is its level sensible? How about β 1? Would you say that a CEO s salary reflects her work performance? x y

16 Correlation Vs. Causation 16 the difference between causation and correlation is a key concept in econometrics you saw that in the model with conditional expectations, the estimates were based on the correlation between x and y (remember the formulas) there are many ways a causal interpretation can be given that is consistent with the (correlation-based) results (see next slide) as we ll see, no econometric tool can ever prove or find a causal relationship on itself having an economic model is essential in establishing the causal interpretation (we ll talk about this in the causal-analytic part) conclusions: correlation causation statistical significance the effect of x on y is significant (only that they are tightly associated ) these issues are confused all the time by politicians and the popular press

17 17 Three Causation Schemes looking at the relationship between x and y from the causal standpoint, the association (or correlation) between x and y can represent three basic situations: y x x causes y: if a CEO s performance is good (high ROE), he gets paid a lot. y x y causes x: high salaries motivate CEOs, thus making them perform well (resulting in high ROE) y z x there is another factor z that causes both x and y, which makes x and y be associated: if a CEO is good (clever, motivated, etc.), he both performs well and gets paid a lot the descriptive approach makes no claims about which one is the case

18 Exercise: Voting Load the data voting.gdt (election outcomes and campaign expenditures for 173 two-party races (A,B) for the U.S. House of Representatives in 1988) 2. Consider the following regression model: E[voteA expena] = β 0 + β 1 expena. Does it make sense to use a model like this to describe the relationship between campaign expenditures and the eventual vote share? How would you interpret β 0 and β 1? In a two-party race, do you think it makes sense to look at the campaign expenditures of party A alone? 3. Consider the following regression model: E[voteA sharea] = β 0 + β 1 sharea, where sharea is A s percentage share in the total campaign expenditures ( total meaning the sum across all parties). Generate sharea in Gretl (use Add Define new variable), estimate the model (Model Ordinary least squares) and interpret the estimates. 4. Find a story for the association between votea and sharea supporting each of the three causation schemes

19 19 Causal Approach The need for causal analysis. Structural model and its assumptions. Estimation. Examples.

20 Causal Analysis 20 in the descriptive framework, we couldn t really say anything about causality however, in decision-making situations, causal questions are typically those we need to answer: if I face a decision, it means I can influence something (I have control over an economic variable) in order to decide effectively, I need to know the impact of my decision on things I m interested in examples: central bank sets interest rates in order to keep inflation within specified bounds (inflation targeting) companies charge prices so as to maximize profit / revenues / market share ministry of education introduces new school fees scheme; the aim is to collect money without discouraging students from education. Therefore, one has to find the effect of education on future income

21 Causal Analysis (cont d) 21 to say something about causality we need to make some more assumptions in practice, these assumptions will have to be checked using an economic theory / common sense + knowledge of the real problem mathematically, the formulation of these assumptions consists in writing down a structural data-generating model this will look similar to what we have been doing, but conceptually it is very different for the description-type analysis, we started with the data and then asked which model could help summarize the conditional expectation for the structural (causal) case we start with the model and then use it to say what the data will look like (even before we actually collect them)

22 Structural model 22 a simple structural model may look like this y = β 0 + β 1 x + u what has changed from the descriptive analysis? descriptive model: conditional expectation of y = a function of x E[y x] = f(x) structural model: y = a function of x and u y = f(x,u) it should be clear that modeling y is much more ambitious than modeling the expectation of y we have already encountered u, but this time it has a real content (see later) in choosing a particular structural model, we re actually saying that we believe that, in reality, the value of y is created from x and u using function f this is a daring claim, so we need to choose the model very carefully

23 Structural model (cont d) 23 let s go back to the simple model and think about it this way: y = β 0 + β 1 x + u β 0 and β 1 are real numbers which are out there in nature (however, we do not know there values that s what we want to estimate) we choose x (i.e., x is the thing we can control) Nature / God choose u in a way that is unrelated to our choice of x β 1 now measures the real causal effect of x on y: if we increase x by 1, y goes up by β 1 in effect example: suppose that y is your annual income (in ), x is your education (in years) β 0 = 10,000, β 1 = 5,000, u = 5,000 Is it worth studying at a university? And how about the tuition fees? And what if we don t know the value of u?

24 How About u? 24 u is probably the most important part of the structural model we ll spend a lot of time talking about u and its relation to x (note that the relation of u and y is obvious from the equation) what does u contain? Everything that the rest of the right-hand side of the equation failed to capture in the previous example, this means everything that affects your wages besides education: intelligence work effort other suggestions? in some textbooks, you can read that u contains a couple more things: measurement errors (or poor proxy variables) the intrinsic randomness (in human behaviour etc.) model specification errors these will not be all that relevant for our causal discussion

25 Crucial Assumption: E[u x] = 0 25 once we have specified the structural model, we need to estimate its parameters, i.e. β 0 and β 1 in order to be able to do so, we need to assume something about the relation between u and x mathematically, the crucial assumption takes on the form E[u x] = 0 this is the same as what we did before (in the descriptive analysis), but conceptually very different before u was defined simply u = y β 0 β 1 x. It didn t actually mean anything now we think of u as this real thing that is actually out there and means something it is just that we don t / can t observe it even though we don t observe it, we can still argue whether the assumption is fulfilled or not in order to do so, we need to know the assumption tells us in the first place

26 Crucial Assumption: E[u x] = 0 (cont d) 26 as before, the assumption that E[u x] = 0 really means two separate things, one of which is a big deal, the other is not: 1. E[u x] doesn t vary with x (i.e., it s a constant). this holds true if u is assigned at random u and x are independent perhaps something else and is not true if u and x are correlated example: intelligence (a part of u) is correlated with education (x) we re in trouble (we ll discuss the possible solutions as we go) 2. Eu = 0 (i.e., the unconditioned expectation is zero). the important part is 1; we assume 2 just for convenience

27 Crucial Assumption: E[u x] = 0 (cont d) 27 imagine 1 holds and 2 does not; for instance, let E[u x] = 5 then the model y = β 0 + β 1 x + u is equivalent to y = γ 0 + γ 1 x + v where: γ 0 = β γ 1 = β 1 u = v 5 notice that now E[v x] = E[u 5 x] = E[u x] 5 = 5 5 = 0. γ 1 and β 1 are the same and that is really what we care about (the effect of x on y) thus, the fact that we pick zero is really a normalization. It makes the model well defined (identified) without any real imposition on the data however, the fact that E[u x] does not vary with x is much more than a normalization. It has real content

28 E[u x] = 0 Vs. Causation 28 what does the assumption tell us about causation? remember the three causation schemes between x an y: y x, y x, y z x in the causal analysis, we want to rule out all but the first one this is exactly what the assumption about E[u x] effectively does the arrow scheme now contains three letters: y, x and u we know that u affects y (by definition), y u note that E[u x] = constant implies cov(x,u) = 0 therefore, we d like the arrow scheme to look like this: y x u this suggests there should be no association (correlation) between x and u

affects x, so u affects x, and u and x are correlated therefore cov(x,u) 0 and the assumption is necessarily

29 E[u x] = 0 Vs. Causation (cont d) 29 imagine we have the reversed causality y x : y x u here u affects y which in turn affects x, so u affects x, and u and x are correlated therefore cov(x,u) 0 and the assumption is necessarily violated now consider the case y z x : if there s any z that affects y, it is a part of u (by definition), and therefore z (and u) affect x y x u u contains z, so u affects x, meaning the two are correlated

30 30 Estimation now, let s take the assumption E[u x] = 0 for granted then, nothing is really any different than in the descriptive case we can write down the sample regression function as we know that y ˆ ˆ x uˆ, i 0 1 i i Eu i 0 E[ xu] 0 i i the sample analogue is 0 ( ˆ ˆ ) 1 n 1 n 1 ˆ ni ui ni1 yi 0 1xi 0 ( ˆ ˆ ) 1 n 1 n 1 ˆ ni xiui ni1xi yi 0 1xi which gives us n i1( yi y)( xi x ) 1, 2 0 y 1 n i1( xi x) ˆ ˆ ˆ x it is only the interpretation that has changed exactly as before

31 Examples 31 lets look back at our previous models and interpret things in this way let me be very clear about something: the question of whether the additional assumption is reasonable or not is definitely questionable in all of these cases. Let s worry about that later for now, just take the assumption as given and focus on the interpretation Voting example for the voting example we estimated the sample regression function votea = sharea as usual, the first parameter is not very interesting. It tells me the fraction of the votes I would get if I spent no money: I would get about 27% the key thing is that for every 1% that I outspend my opponent, my vote share increases by 0.463%

32 Examples (cont d) 32 suppose that currently I am spending the same amount as my opponent: sharea = 50 if I double my spending (and my opponent does not react) then sharea = 67 and I would get additional % = 7.8% is it worth it? Generally, this is not an econometric question. Moreover, it depends what doubling my current expenditure means in $ CEO example here we got salary = roe we can interpret this as the CEO s pay schedule if the CEO can get the return on equity to increase by 10 percentage points more, she will earn an extra $185,010 next year this really is not that much money given how large a 10 percentage point increase is (is it a high enough motivation?)

33 33 Forecasting Approach Forecasting framework. Ordinary least squares estimation (OLS).

34 Forecasting Analysis 34 let s shift gears completely imagine the following you have a bunch of data on x and y now (i.e., the x i s and y i s) you know the value of x tomorrow (we ll denote the value x*) and want to predict what the value of y will be tomorrow (denoted y*) actually, when talking about dynamic models, there are often lags in economic responses, so that today s cause (x) is the yesterday s value of an economic variable examples: inflation rate this year, unemployment rate next year corporate profits today, stock price tomorrow SAT scores, college GPA this can make x* available in advance for prediction

35 35 Forecasting Analysis (cont d) in order to do prediction we need a model let s once again use the linear sample regression model y ˆ ˆ x uˆ once we estimate the parameters, we can predict the future value of y as ˆ ˆ x* note that unlike in the causal analysis, we do not choose x* here! from the forecasting point of view, the fitted values yˆi can be regarded as the would-be predictions we d use if we didn t have the appropriate y i s therefore, it seems sensible to find βy ˆi 0 and βy ˆi 1 such that the overall distance between y s and y i s is minimized how to measure this distance? ˆi i 0 1 i i 0 1 obvious idea: absolute difference between the two: however, absolute value is a really ugly function y i yˆ i

36 36 Forecasting Analysis (cont d) 1 x 1 x a much smoother function is ( y yˆ ) we want to aggregate this distance across all data points: this says how close our model is to the data i n 2 n 2 i1( yi yˆ i) i1( yi 0 1xi ) i 2 ˆ ˆ

37 37 Ordinary Least Squares (OLS) Estimation note that the overall distance distance between yˆi s and y i s is a function of variables βy 0 and βy 1 (because y i s and x i s are known the data) ˆi ˆi we can call this function sum of squares (SS) and write SS ( ˆ, ˆ ) ( y ˆ ˆ x ) n 0 1 i1 i 0 1 i 2 we want to keep the model as close to the data as possible, which means we minimize SS, so we take the derivative of this function with respect to βy ˆi 0 and βy ˆi 1 and set to zero: SS ˆ 0 SS ˆ 1 2 ( y ˆ ˆ x ) 0 n i1 n i1 i x ( y ˆ ˆ x ) 0 i i 0 1 i i note that we re back to the same system of 2 linear equations as before (after dividing both equations by 2) the estimates are the same as before, but with a different kind of reasoning again

38 38 Three Approaches: A Comparison descriptive approach goal: find the association between x and y expressed in terms of conditional expectation E[y x] assumptions: approximate functional form of E[y x] causal approach goal: find the causal effect of x on y assumptions: structural model for y ( how y is created ) assumptions about u: E[u x] = 0 (and thus, E[xu] = 0) forecasting approach goal: predict future values of y based on the knowledge of future x assumptions: approximate functional form of the relation y vs. x different goals, different assumptions, same formulas for estimates the econometric software has only one procedure for all three cases, you have to know what you re doing, check the assumptions etc.

39 LECTURE 2: SIMPLE REGRESSION I

Simple Regression Model. January 24, 2011

Simple Regression Model. January 24, 2011 Simple Regression Model January 24, 2011 Outline Descriptive Analysis Causal Estimation Forecasting Regression Model We are actually going to derive the linear regression model in 3 very different ways