III. Econometric Methodology Regression Analysis

Econ107 Applied Econometrics
Topic 1: An Overview of Regression Analysis (Studenmund, Chapter 1)

I. The Nature and Scope of Econometrics

There are lots of definitions of econometrics.

Nobel Prize Committee; Paul Samuelson, et al.: "Econometrics may be defined as the quantitative analysis of actual economic phenomena."

Goldberger: "... the application of economic theory, mathematics and statistical inference to the analysis of economic phenomena."

(Joke) E. E. Leamer: "There are two things you don't want to see in the making: sausage and econometric research."

II. Major Uses of Econometrics

1. Describing economic reality
2. Testing hypotheses about economic theory
3. Forecasting future economic activity

III. Econometric Methodology: Regression Analysis

An important methodology in econometrics is regression analysis, which typically follows the steps below. We use a famous example to illustrate.

1. State the hypotheses. Keynes, in the General Theory, said that a $1 increase in income will lead to less than a $1 increase in overall consumption.

We want to test this hypothesis that the MPC < 1.

2. Specify the mathematical model of the theory. Keynes didn't specify the exact nature of the relationship, but we might suggest a simple linear relationship:

\[ C = \beta_0 + \beta_1 DI, \qquad 0 < \beta_1 < 1 \]

where C = aggregate consumption, DI = aggregate disposable income, and the slope \(\beta_1\) is the MPC.

3. Specify the econometric model. This purely mathematical model is uninteresting to the econometrician because it assumes an exact, or deterministic, relationship between C and DI. We therefore re-write the equation with a disturbance or error term:

\[ C = \beta_0 + \beta_1 DI + \varepsilon \]

This is now an econometric model, or more precisely a linear regression model.
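Steps 2 and 3 differ only in the disturbance term. The minimal Python sketch below generates consumption from both the exact (deterministic) model and the econometric model. The parameter values \(\beta_0 = 300\) and \(\beta_1 = 0.75\), the normal disturbance and its standard deviation are all hypothetical assumptions chosen for illustration. They are not estimates.

```python
import random

# Hypothetical parameters, for illustration only (NOT estimates from any data).
BETA0 = 300.0   # intercept
BETA1 = 0.75    # slope = MPC, between 0 and 1
SIGMA = 500.0   # assumed standard deviation of the disturbance term

def deterministic_c(di: float) -> float:
    """Mathematical model: C is an exact function of DI."""
    return BETA0 + BETA1 * di

def econometric_c(di: float, rng: random.Random) -> float:
    """Econometric model: C = beta0 + beta1 * DI + epsilon, epsilon ~ N(0, SIGMA^2) (assumed)."""
    return BETA0 + BETA1 * di + rng.gauss(0.0, SIGMA)

rng = random.Random(42)  # fixed seed so the draws are reproducible
incomes = [40_000.0, 50_000.0, 60_000.0, 70_000.0]
exact = [deterministic_c(di) for di in incomes]
noisy = [econometric_c(di, rng) for di in incomes]

for di, c_exact, c_obs in zip(incomes, exact, noisy):
    print(f"DI = {di:>9,.0f}   exact C = {c_exact:>9,.1f}   observed C = {c_obs:>9,.1f}")
```

Every "exact" value lies on the line. The "observed" values scatter around it, which is why the econometrician works with the model that includes \(\varepsilon\).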

4. Obtain the data. The only way to estimate the parameters of interest in this model is to obtain the necessary data. The data could be time series, cross-sectional or panel data.

Time series data are collected over time for the same country or other single aggregate economic unit (e.g., aggregate C and DI could be obtained for Singapore from 1950-2000). In this case, we'd normally re-write the equation with a t subscript on the variables and the disturbance term to denote time:

\[ C_t = \beta_0 + \beta_1 DI_t + \varepsilon_t \]

Cross-sectional data are collected for a sample of individuals, households, firms or other disaggregated economic entities at a point in time (e.g., C and DI could be obtained for a sample of 1,000 Singapore families during 2000). In this case, we'd normally re-write the equation with an i subscript on the variables and the disturbance term to denote the individual:

\[ C_i = \beta_0 + \beta_1 DI_i + \varepsilon_i \]

Finally, panel data contain elements of both time series and cross-sectional data (e.g., C and DI could be obtained for all countries in the OECD during the period 1950-2000). Note that we have variation across countries at any single point in time, as well as variation across time. In this case, we'd normally re-write the equation with both an i and a t subscript on the variables and the disturbance term to denote country and time:

\[ C_{it} = \beta_0 + \beta_1 DI_{it} + \varepsilon_{it} \]

Time series or cross-sectional data could be plotted as a scatter diagram below:
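A panel can be sliced either way. Fixing t gives a cross section, and fixing i gives a time series. The sketch below assumes a Python dictionary keyed by (country, year). The country labels and the (C, DI) numbers are hypothetical placeholders, not real data.

```python
from typing import Dict, Tuple

# Each observation is (C, DI). All numbers below are hypothetical placeholders.
Obs = Tuple[float, float]
Panel = Dict[Tuple[str, int], Obs]  # keyed by (country i, year t)

panel: Panel = {
    ("A", 2000): (50.0, 60.0),
    ("A", 2001): (52.0, 63.0),
    ("B", 2000): (30.0, 35.0),
    ("B", 2001): (31.0, 37.0),
}

def cross_section(data: Panel, year: int) -> Dict[str, Obs]:
    """Fix t: variation across countries i at a single point in time."""
    return {i: obs for (i, t), obs in data.items() if t == year}

def time_series(data: Panel, country: str) -> Dict[int, Obs]:
    """Fix i: variation across time t for a single unit."""
    return {t: obs for (i, t), obs in data.items() if i == country}

print(cross_section(panel, 2000))  # {'A': (50.0, 60.0), 'B': (30.0, 35.0)}
print(time_series(panel, "B"))     # {2000: (30.0, 35.0), 2001: (31.0, 37.0)}
```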

5. Estimate the parameters in the econometric model. Now it is time to estimate the coefficients in the model. The basic idea is to come up with a line that best fits the data points. Imagine that this regression analysis yields the following consumption function:

\[ \hat{C} = 336.9 + 0.820\,DI \]

These are the estimates of the two coefficients. The hat on C indicates that this is an estimated consumption function, or regression model.

6. Test the hypothesis. Recall that we wanted to test Keynes' hypothesis that the MPC was between zero and 1. The estimated MPC of 0.820 looks reasonable, but we are unsure whether there is any statistical evidence that it is below 1.
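The following is a sketch of steps 5 and 6. It fits a simple regression line by ordinary least squares (OLS), the usual "best fit" criterion. Using OLS is an assumption here, since these notes have not yet named the estimator. The code then computes the t statistic for \(H_0: \beta_1 = 1\) against \(H_1: \beta_1 < 1\). The five (DI, C) pairs are hypothetical. The test relies on the standard classical assumptions, including normally distributed errors. The resulting statistic would be compared with a critical value from a t table with \(n - 2\) degrees of freedom.

```python
import math
from typing import Sequence, Tuple

def ols_slr(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (b0, b1, se_b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations to estimate the error variance")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope estimate
    b0 = y_bar - b1 * x_bar        # intercept estimate
    # Residual sum of squares and the usual unbiased estimate of the error variance
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = ssr / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)    # standard error of the slope
    return b0, b1, se_b1

def t_stat(b1: float, se_b1: float, null_value: float = 1.0) -> float:
    """t statistic for H0: beta1 = null_value (one-sided alternative beta1 < null_value)."""
    return (b1 - null_value) / se_b1

# Hypothetical data, for illustration only
di = [100, 200, 300, 400, 500]
c = [420, 500, 590, 660, 750]
b0, b1, se_b1 = ols_slr(di, c)
print(f"C-hat = {b0:.1f} + {b1:.3f} DI, se(b1) = {se_b1:.4f}, t = {t_stat(b1, se_b1):.2f}")
```

With these made-up numbers the slope estimate is 0.820 and the t statistic is about -11. The null \(\beta_1 = 1\) would therefore be rejected in favour of \(\beta_1 < 1\) at conventional significance levels.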

7. Forecast or predict economic behaviour. Another use of this model is forecasting or predicting future economic behaviour. To predict C, however, we need to know future values of DI. Suppose you know that DI is going to be $65,000 (millions):

\[ \hat{C} = 336.9 + 0.820(65{,}000) = 53{,}636.9 \]

This also allows you to predict savings of $11,363.1, which is just the difference between DI and C.

8. Use the model for policy purposes. The model can also be used for control purposes. Suppose that C of $53.6 billion is insufficient to maintain full employment because there is not enough spending by households. The government could consider increasing DI through tax cuts to achieve a higher target. Suppose $62 billion is needed:

\[ 62{,}000 = 336.9 + 0.820\,DI \quad\Rightarrow\quad DI = 75{,}198.9 \]

Thus, taxes need to be cut by just over $10 billion from forecasted levels.

IV. Types of Econometrics and Names of Variables in Regression

Econometrics splits into theoretical and applied fields, and we end up straddling these two approaches. Theoretical econometrics concerns the development of basic estimation approaches, the properties of estimators, etc. It is more closely related to mathematical statistics (e.g., proofs, axioms, ...). Applied econometrics is built on this theoretical foundation and applies estimation techniques to various areas of economic enquiry. Examples: Where should a new restaurant be opened? How much should be spent on advertising? Should the target interest rate be fixed? How many hours should you spend studying for Econ107? Academics and the private and government sectors have increasingly used econometrics.
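The forecasting and policy arithmetic in steps 7 and 8 can be reproduced in a few lines. The sketch uses the estimated coefficients from step 5. Like the notes, it assumes that a tax cut raises DI one-for-one.

```python
# Estimated consumption function from step 5: C-hat = 336.9 + 0.820 DI ($ millions)
B0_HAT = 336.9
B1_HAT = 0.820

def forecast_c(di: float) -> float:
    """Step 7: predicted consumption for a given level of disposable income."""
    return B0_HAT + B1_HAT * di

def required_di(c_target: float) -> float:
    """Step 8: invert C-hat = b0 + b1 * DI to find the DI that hits a target C."""
    return (c_target - B0_HAT) / B1_HAT

di_forecast = 65_000.0
c_hat = forecast_c(di_forecast)
savings = di_forecast - c_hat          # savings = DI - C
di_needed = required_di(62_000.0)      # target C of $62 billion
tax_cut = di_needed - di_forecast      # assumes a tax cut raises DI one-for-one

print(f"Forecast C:  {c_hat:,.1f}")      # 53,636.9
print(f"Savings:     {savings:,.1f}")    # 11,363.1
print(f"DI needed:   {di_needed:,.1f}")  # 75,198.9
print(f"Tax cut:     {tax_cut:,.1f}")    # 10,198.9
```

The required tax cut of about $10,198.9 million is the "just over $10 billion" in step 8.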

Regression analysis is the study of the relationship between a dependent variable and one or more independent, or explanatory, variables. The linear regression model (or true regression line, or population regression function) is

\[ Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_K X_K + \varepsilon \]

- Y is called the dependent or left-hand-side variable, or regressand, and is random;
- \(X_k\) (k = 1, ..., K) is called an independent, explanatory or right-hand-side variable, or regressor; it can be fixed or random;
- \(\varepsilon\) is called the error or disturbance term and is random;
- the \(\beta\)'s are called regression coefficients; they are unknown and fixed;
- \(\beta_0\) is the intercept coefficient;
- \(\beta_k\) (k = 1, ..., K) are the slope coefficients. \(\beta_k\) measures the impact of a one-unit increase in \(X_k\) on Y, holding the other independent variables constant.

The estimated regression line (or sample regression function) is written as

\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \cdots + \hat{\beta}_K X_K \]

- \(\hat{Y}\) is called the estimated, or fitted, value of Y;
- \(\hat{\beta}_k\) (k = 0, ..., K) is called an estimated regression coefficient.

Define \(e = Y - \hat{Y}\) and call e the residual. When K = 1, the regression model is the Simple Linear Regression (SLR) model. When K > 1, it is the Multiple Linear Regression (MLR) model.

V. Statistical vs. Deterministic Relationships

Regression analysis is concerned with a statistical dependence among variables, not a functional or deterministic one. In statistical relationships, the variables are random, or stochastic.

VI. Regression vs. Causation

Regression analysis deals with the dependence of one variable on other variables, but it doesn't necessarily imply causation. A causal relationship must come from outside statistics. Economic theory is supposed to provide the compelling evidence of causation.
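The notation above maps directly onto a least-squares fit with K regressors. This sketch assumes NumPy is available. It calls `numpy.linalg.lstsq` on a design matrix that includes a column of ones for \(\beta_0\). The data are simulated from the hypothetical coefficients \(\beta_0 = 1\), \(\beta_1 = 2\) and \(\beta_2 = -0.5\). With K = 2, this is an MLR model.

```python
import numpy as np

def fit_linear_regression(X: np.ndarray, y: np.ndarray):
    """Least-squares fit of y = b0 + b1*X1 + ... + bK*XK.

    X has shape (n, K) and holds the regressors only; a column of ones is
    prepended for the intercept. Returns (beta_hat, y_hat, residuals).
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])           # n x (K + 1)
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ beta_hat                           # fitted values Y-hat
    residuals = y - y_hat                               # e = Y - Y-hat
    return beta_hat, y_hat, residuals

# Hypothetical data from Y = 1 + 2*X1 - 0.5*X2 + noise (K = 2, so an MLR model)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=50)

beta_hat, y_hat, e = fit_linear_regression(X, y)
print("estimated coefficients:", np.round(beta_hat, 3))
```

By construction \(Y = \hat{Y} + e\). Because the model includes an intercept, the residuals also sum to zero, up to rounding error.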

VII. The True (or Population) Regression Function (PRF)

Suppose we have a small community of 12 families. We're interested in studying the relationship between their weekly disposable income (X) and their expenditure on food (Y). We want to predict the population mean of food expenditures, given some level of family income. The 12 families can be grouped into four income groups, and each family within a group has the same disposable income. This is the entire population, not a sample.

Disposable Income (X)   Individual Food Expenditures (Y)    Average Food Expenditures
250                     78.00, 88.50, 96.00                 87.50
300                     77.50, 89.00, 96.50, 109.00         93.00
350                     90.50, 106.50                       98.50
400                     99.00, 103.00, 110.00               104.00

Plot these data points on the following diagram. This is often known as a scatter diagram. The solid dots are the actual observations. The conditional mean, or conditional expectation, is

\[ E(Y \mid X = X_i) \]

The circles are the conditional means. Clearly, food expenditures on average increase with disposable income. This can be seen even more clearly by connecting these conditional means with a straight line. This is the true (or population) regression line. Note that it could also be a true (or population) regression curve.
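The conditional means in the last column of the table can be checked directly. This sketch averages each income group's food expenditures, which gives \(E(Y \mid X = X_i)\) for this population.

```python
from statistics import mean

# Population data from the table: weekly disposable income X -> individual food expenditures Y
population = {
    250: [78.00, 88.50, 96.00],
    300: [77.50, 89.00, 96.50, 109.00],
    350: [90.50, 106.50],
    400: [99.00, 103.00, 110.00],
}

# Conditional mean E(Y | X = x) for each income group
conditional_means = {x: mean(ys) for x, ys in population.items()}
for x, m in conditional_means.items():
    print(f"E(Y | X = {x}) = {m:.2f}")
```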

Geometrically, a population regression line or curve is simply the locus of the conditional means, or expectations, of the dependent variable for fixed values of the explanatory variable(s). In general, we could write the Population Regression Function (PRF) as:

\[ E(Y \mid X_i) = f(X_i) \]

where f is some function of the explanatory variable. We might anticipate that food consumption will be linearly related to disposable income; this is an initial assumption of our estimation. We could then narrow the functional form to:

\[ E(Y \mid X_i) = \beta_0 + \beta_1 X_i \]

This is known as the linear PRF (or PR line).
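For this particular population, the conditional means happen to lie exactly on a straight line: each increase of 50 in X raises average food expenditure by 5.50. The sketch below checks that the slopes between neighbouring conditional means all agree. It then recovers the implied linear PRF, \(E(Y \mid X_i) = 60 + 0.11 X_i\). Exact `Fraction` arithmetic avoids floating-point rounding noise.

```python
from fractions import Fraction

# Conditional means E(Y | X) from the last column of the income/food table
cond_means = {
    250: Fraction("87.50"),
    300: Fraction("93.00"),
    350: Fraction("98.50"),
    400: Fraction("104.00"),
}

xs = sorted(cond_means)
# Slope between each pair of neighbouring conditional means
slopes = {(cond_means[b] - cond_means[a]) / (b - a) for a, b in zip(xs, xs[1:])}
if len(slopes) != 1:
    raise ValueError("conditional means do not lie on one straight line")

(beta1,) = slopes                          # common slope of the population regression line
beta0 = cond_means[xs[0]] - beta1 * xs[0]  # intercept from any point on the line
print(f"E(Y | X) = {float(beta0)} + {float(beta1)} X")  # E(Y | X) = 60.0 + 0.11 X
```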

VIII. Linearity in Regression Analysis

What do we mean when we say that our regression model is linear? One possibility is that the model is nonlinear in terms of the variables:

\[ E(Y \mid X_i) = \beta_0 + \beta_1 X_i^2 \]

The second possibility is that the PRF is nonlinear in terms of the coefficients:

\[ E(Y \mid X_i) = \beta_0 + \beta_1^2 X_i \]

Regression functions of the second kind will not be considered in this paper. Those of the first kind, nonlinear only in the variables, will be. From now on, "linear regression model" should be read as linear in terms of the parameters.

IX. Adding the Disturbance Term to Our PRF

The PRF tells us the 'average' food expenditure for a given level of household income. But any 'particular' household is unlikely to lie on this function. For this reason we rewrite the PRF as

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]

where \(\varepsilon\) is a random variable with mean 0. There are lots of reasons why \(\varepsilon\) might exist:
- minor influences on Y are omitted;
- the underlying theoretical equation might have a different functional form than the one chosen for the regression;
- some purely random variation is always there;
- there is measurement error in Y or X.
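The table's conditional means (87.50, 93.00, 98.50, 104.00) rise by 5.50 for every 50 of income, so they imply the linear PRF \(E(Y \mid X_i) = 60 + 0.11 X_i\). Under that PRF, each family's disturbance is \(\varepsilon_i = Y_i - E(Y \mid X_i)\). The sketch computes these disturbances for the 12 families. It confirms that they average to zero within every income group, with individual families scattered above and below the PRF.

```python
from fractions import Fraction

# The 12 families from the income/food table (decimal strings keep Fraction exact)
population = {
    250: ["78.00", "88.50", "96.00"],
    300: ["77.50", "89.00", "96.50", "109.00"],
    350: ["90.50", "106.50"],
    400: ["99.00", "103.00", "110.00"],
}

BETA0, BETA1 = Fraction(60), Fraction(11, 100)  # linear PRF through the conditional means

def prf(x: int) -> Fraction:
    """Population regression function E(Y | X = x)."""
    return BETA0 + BETA1 * x

# Disturbance for each family: epsilon_i = Y_i - E(Y | X_i)
disturbances = {x: [Fraction(y) - prf(x) for y in ys] for x, ys in population.items()}

for x, eps in disturbances.items():
    print(x, [float(e) for e in eps], "group mean:", float(sum(eps) / len(eps)))
```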

X. The Sample (Estimated) Regression Function

Thus far, we've dealt with the entire population and the PRF and avoided any consideration of sampling. In most cases, we will never observe the entire population. We have to infer from a sample, or samples, what the PRF might look like. Note that we're unlikely to know just how close we get to the truth. Each sample we draw can be used to produce a Sample (Estimated) Regression Function (SRF), that is, the estimated regression function:

\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i \]

We can also replace the fitted value of the dependent variable (\(\hat{Y}\)) with its actual value (Y). The LHS is then no longer an estimate but the actual value, so the RHS must include the residual term e:

\[ Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + e_i \]

This means that the actual dependent variable can be decomposed into its fitted value and the residual:

\[ Y_i = \hat{Y}_i + e_i \]

This residual, like the disturbance, can be either positive or negative. We can either overestimate the true value of Y:

\[ e_i = Y_i - \hat{Y}_i < 0 \quad \text{if } Y_i < \hat{Y}_i \]

or underestimate it:

\[ e_i = Y_i - \hat{Y}_i > 0 \quad \text{if } Y_i > \hat{Y}_i \]

XI. Questions for discussion: Q1.10

XII. Run the height regression (Section 1.4) using the data file provided. Do further exploration according to Q1.4 and Q1.5.
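The difference between the PRF and an SRF shows up as soon as we work from a sample. The sketch below draws a random sample of 6 of the 12 families from the income/food table and fits an SRF. It uses ordinary least squares, which is an assumption since the notes have not yet named the estimator. It then decomposes each Y into its fitted value and residual. The sample size and seed are arbitrary choices. No income group has more than 4 families, so a sample of 6 always contains at least two distinct X values and the slope is always defined.

```python
import random

# All 12 (X, Y) pairs from the population table: weekly income X, food expenditure Y
POPULATION = [(250, 78.00), (250, 88.50), (250, 96.00),
              (300, 77.50), (300, 89.00), (300, 96.50), (300, 109.00),
              (350, 90.50), (350, 106.50),
              (400, 99.00), (400, 103.00), (400, 110.00)]

def fit_srf(sample):
    """OLS intercept and slope for Y-hat = b0 + b1 * X from a list of (X, Y) pairs."""
    n = len(sample)
    x_bar = sum(x for x, _ in sample) / n
    y_bar = sum(y for _, y in sample) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in sample)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in sample)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

rng = random.Random(1)               # arbitrary seed, for reproducibility
sample = rng.sample(POPULATION, 6)   # a sample, not the whole population
b0_hat, b1_hat = fit_srf(sample)

rows = []
for x, y in sample:
    y_hat = b0_hat + b1_hat * x      # fitted value
    e = y - y_hat                    # residual, so that Y = Y-hat + e
    rows.append((x, y, y_hat, e))
    if e < 0:
        verdict = "SRF overestimates Y (e < 0)"
    elif e > 0:
        verdict = "SRF underestimates Y (e > 0)"
    else:
        verdict = "exact fit (e = 0)"
    print(f"X={x}  Y={y:6.2f}  Y-hat={y_hat:7.2f}  e={e:6.2f}  {verdict}")

print(f"SRF: Y-hat = {b0_hat:.2f} + {b1_hat:.4f} X")
```

Different seeds give different samples and hence different SRFs. None of them needs to coincide with the PRF.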