Econ7 Applied Econometrics
Topic 5: Specification: Choosing Independent Variables (Studenmund, Chapter 6)

Specification errors that we will deal with: (1) wrong independent variable; (2) wrong functional form. This lecture deals with wrong independent variables, which may be due to omitted variables or redundant (irrelevant) variables. Use the following example under both types:

$$\ln W = \beta_0 + \beta_1 S + \beta_2 OJT + \varepsilon$$

where W = wage rate of worker, S = years of formal education of worker, and OJT = effective years of on-the-job training of worker. The idea is that we have two forms of human capital: general human capital obtained through formal education, and specific human capital obtained through vocational education, apprenticeship programmes, etc. Both may increase wages (i.e., $\beta_1 > 0$ and $\beta_2 > 0$), but not at the same rate (i.e., $\beta_1 \neq \beta_2$).

I. Omitting a Relevant Variable

This is one of the most common problems in regression analysis. It could be due to the ignorance of the researcher (i.e., the variable is available, but not used). More likely, the data are unavailable (e.g., Household Economic Survey). We then estimate the following model instead:

$$\ln W = \beta_0 + \beta_1 S + \varepsilon^*$$

so that the true error in the above regression is

$$\varepsilon^* = \ln W - \beta_0 - \beta_1 S = \beta_2 OJT + \varepsilon$$

Assumption 2 therefore does not hold, because $E(\varepsilon^* \mid OJT) = \beta_2 OJT \neq 0$. More importantly, in the case where OJT and S are correlated, Assumption 3 does not hold either, because $Cov(\varepsilon^*, S) = \beta_2 Cov(OJT, S) \neq 0$. As a result, the Gauss-Markov theorem does not apply. In general, the OLS estimate of the regression coefficient is biased, i.e., $E(\hat\beta_1^*) \neq \beta_1$, where $\hat\beta_1^*$ is the OLS estimate of $\beta_1$ from the short regression.
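A small simulation can make the omitted-variable result concrete. The sketch below is illustrative only: the data-generating process, the parameter values ($\beta_1 = 0.08$, $\beta_2 = 0.05$) and the helper names `ols` and `simulate_omitted_variable` are hypothetical, not part of the course material. It generates lnW from S and an OJT that rises with S, fits both the correct model and the short model that drops OJT, and compares the short-regression slope with the standard omitted-variable prediction $\beta_1 + \beta_2 \cdot Cov(S, OJT)/Var(S)$.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients by least squares; X must already contain a constant column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def simulate_omitted_variable(n=100_000, beta0=1.0, beta1=0.08, beta2=0.05, seed=0):
    """Return (long-model slope on S, short-model slope on S, predicted short slope)."""
    rng = np.random.default_rng(seed)
    S = rng.normal(12.0, 2.0, n)              # hypothetical schooling distribution
    OJT = 0.5 * S + rng.normal(0.0, 1.0, n)   # OJT positively correlated with S
    eps = rng.normal(0.0, 0.3, n)
    lnW = beta0 + beta1 * S + beta2 * OJT + eps

    const = np.ones(n)
    long_fit = ols(np.column_stack([const, S, OJT]), lnW)   # correct specification
    short_fit = ols(np.column_stack([const, S]), lnW)       # OJT omitted

    # b = Cov(S, OJT) / Var(S), the slope from regressing OJT on S
    b = np.cov(S, OJT)[0, 1] / np.var(S, ddof=1)
    predicted_short = beta1 + beta2 * b
    return long_fit[1], short_fit[1], predicted_short


long_b1, short_b1, predicted = simulate_omitted_variable()
print(f"long model: {long_b1:.4f}, short model: {short_b1:.4f}, predicted: {predicted:.4f}")
```

Because OJT is built to rise with S and $\beta_2 > 0$, the short-regression slope should land above the true 0.08 (biased upward) and close to the predicted value, while the correctly specified model recovers roughly 0.08.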
And the bias of the short-regression estimate $\hat\beta_1^*$ is

$$bias(\hat\beta_1^*) = E(\hat\beta_1^*) - \beta_1 = \beta_2 b, \qquad b = \frac{Cov(S, OJT)}{Var(S)}$$

where b is the slope coefficient from a regression of OJT on S. Suppose that $b > 0$; then, since $\beta_2 > 0$, $E(\hat\beta_1^*) > \beta_1$ and the estimated coefficient is biased upward. The bias is zero when the coefficient of the omitted variable is zero or when the included and omitted variables are uncorrelated.

In addition, the standard errors on these estimated coefficients will be biased. Let $s_i = S_i - \bar S$ denote deviations from the mean. In the misspecified model:

$$Var(\hat\beta_1^*) = \frac{\sigma^2}{\sum s_i^2}$$

But the variance of the 'true' estimator is:

$$Var(\hat\beta_1) = \frac{\sigma^2}{\sum s_i^2 (1 - r^2)}$$

where r is the correlation coefficient between S and OJT. This means that if $r \neq 0$, then $Var(\hat\beta_1^*) < Var(\hat\beta_1)$: the variance of the estimated coefficient is also biased, and we are placing 'too much' confidence in our coefficient estimates. The result is that the t test will be misleading (this is true even if r = 0, because our estimate of $\sigma^2$ will also be biased).

The remedial measure is easy IF we know which variable has been omitted and this omitted variable is available: include it in the model. If the omitted variable is not available, we might try to find a proxy variable that is closely related to the missing variable (e.g., use information on the average OJT for people in a particular industry and occupation). Or we can at least sign the direction of the bias and estimate its potential magnitude.

The above remedy works in theory. In practice, it is sometimes difficult to know whether a variable has been omitted. To detect the existence of the problem of omitting a relevant variable, one common practice is to examine the signs of the estimated coefficients and see whether they meet our expectations or economic theory. If not, it is very likely that relevant variables have been omitted. The next step is to use the direction of the bias to look for the relevant variables.

II. Including an Irrelevant Variable

Suppose the true model doesn't contain OJT. This is consistent with some theoretical models that predict that this specific human capital will not affect wages, because employers are more likely to pay for it. Thus, the correct regression model is

$$\ln W = \beta_0 + \beta_1 S + \varepsilon$$

but we estimate

$$\ln W = \beta_0 + \beta_1 S + \beta_2 OJT + \varepsilon^*$$

The problems here are less severe than those from omitting a relevant variable. The true error in the above regression is $\varepsilon^* = \varepsilon - \beta_2 OJT$. If OJT is irrelevant, $\beta_2$ should be zero, and hence Assumption 2 holds. Assumption 3 holds too. What are the properties of the OLS estimates?

(1) The estimated coefficients are unbiased and consistent.
(2) The t test is valid if the correct standard error is used.
(3) The only problem is that the estimated coefficients are inefficient.

Writing $\hat\beta_1^*$ now for the estimate from the over-specified ('false') model, we have $E(\hat\beta_1^*) = \beta_1$ and

$$Var(\hat\beta_1^*) = \frac{\sigma^2}{\sum s_i^2 (1 - r^2)}$$

whereas under the 'true' model

$$Var(\hat\beta_1) = \frac{\sigma^2}{\sum s_i^2}$$

Since $Var(\hat\beta_1) < Var(\hat\beta_1^*)$ whenever $r \neq 0$, we are placing 'too little' confidence in our coefficient estimates (i.e., the standard error on the estimated coefficient is larger than it should be). This makes the t-ratio smaller than it should be, and makes it more likely that we won't be able to reject the null when we should.

This is an easy one to solve in theory: if the variable shouldn't be in the regression, eliminate it from the outset. But in practice, this isn't so easy. The theory in this example says that both specifications might be right. If an independent variable may be relevant, include it.

III. How to Decide Whether to Include a Variable or Not?

1. Graphic method to detect the problem of omitting a relevant variable: plot the residuals and look for a 'distinct pattern'. Take the earlier example on the functional form of the regression. We estimate

$$\ln W = \beta_0 + \beta_1 S + u$$

but the 'true' model is

$$\ln W = \beta_0 + \beta_1 S + \beta_2 S^2 + u$$

A plot of the residuals against S would produce a 'detectable' pattern (i.e., curved downward).
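The graphic method can also be mimicked numerically when a plot is not at hand. In this sketch the parameter values and the helper name `residual_pattern` are hypothetical: lnW is generated from a concave quadratic in S (negative coefficient on $S^2$), a model linear in S is fitted, and the residuals are averaged within three equal-width bands of S. A downward-curving residual pattern shows up as negative mean residuals at low and high S and a positive mean in the middle.

```python
import numpy as np


def residual_pattern(n=5_000, seed=1):
    """Mean residual in low/middle/high S bands after fitting a linear-in-S model."""
    rng = np.random.default_rng(seed)
    S = rng.uniform(6.0, 20.0, n)
    # Hypothetical 'true' model with a negative coefficient on S^2 (concave in S)
    lnW = 0.5 + 0.20 * S - 0.005 * S**2 + rng.normal(0.0, 0.1, n)

    X = np.column_stack([np.ones(n), S])        # misspecified: S^2 omitted
    beta, *_ = np.linalg.lstsq(X, lnW, rcond=None)
    resid = lnW - X @ beta

    # Split the range of S into three equal-width bands: 0 = low, 1 = middle, 2 = high
    edges = np.linspace(S.min(), S.max(), 4)
    band = np.digitize(S, edges[1:-1])
    return [float(resid[band == k].mean()) for k in range(3)]


low, middle, high = residual_pattern()
print(f"mean residual  low S: {low:.3f}  middle S: {middle:.3f}  high S: {high:.3f}")
```

With matplotlib installed, scattering `resid` against `S` would give the actual plot; the band averages are just a compact numeric summary of the same downward curve.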
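2. Four criteria:

(1) Economic theory: is there any sound theory?
(2) t statistic: is the coefficient significant in the correct direction?
(3) Has $\bar R^2$ improved?
(4) Do other coefficients change sign when the variable is included?

Include the variable if the answers are positive. Don't necessarily drop insignificant variables: an insignificant finding can be an important result.

Example:

$$\widehat{Coffee} = 9.1 + 7.8\,P_{bc} + 2.4\,P_t + 0.0035\,Y_d$$

(standard errors 15.6, 1.2, 0.0010; t = 0.5, 2.0, 3.5; n = 25; $\bar R^2 = 0.60$)

where Coffee = demand for Brazilian coffee in the US, $P_{bc}$ = price of Brazilian coffee, $P_t$ = price of tea, and $Y_d$ = disposable income in the US.

What happens if you drop $P_{bc}$?

$$\widehat{Coffee} = 9.3 + 2.6\,P_t + 0.0036\,Y_d$$

(standard errors 1.0, 0.0009; t = 2.6, 4.0; n = 25; $\bar R^2 = 0.61$)

What happens if you add another variable, the price of Colombian coffee, $P_{cc}$?

$$\widehat{Coffee} = 10.0 + 8.0\,P_{cc} - 5.6\,P_{bc} + 2.6\,P_t + 0.0030\,Y_d$$

(standard errors 4.0, 2.0, 1.3, 0.0010; t = 2.0, -2.8, 2.0, 3.0; n = 25; $\bar R^2 = 0.65$)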
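The mechanical parts of the four criteria (the t statistic, $\bar R^2$, and sign changes in the other coefficients; economic theory cannot be automated) can be checked with a few lines of NumPy. The helpers `ols_summary` and `compare_specifications` below are hypothetical illustrations, and $\bar R^2$ is computed with the usual formula $1 - [RSS/(n-K-1)]/[TSS/(n-1)]$. The demo data are simulated, not the coffee data.

```python
import numpy as np


def ols_summary(X, y):
    """Coefficients, t-ratios and adjusted R^2 (R-bar^2); X includes a constant column."""
    n, p = X.shape                                   # p = K + 1 estimated parameters
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    sigma2 = rss / (n - p)                           # unbiased estimate of the error variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    tss = float(((y - y.mean()) ** 2).sum())
    r2_bar = 1.0 - (rss / (n - p)) / (tss / (n - 1))
    return beta, beta / se, r2_bar


def compare_specifications(X_base, candidate, y):
    """Criteria (2)-(4) for adding `candidate` to the regressors in X_base."""
    b_old, _, r2_old = ols_summary(X_base, y)
    b_new, t_new, r2_new = ols_summary(np.column_stack([X_base, candidate]), y)
    k = X_base.shape[1]
    return {
        "t_candidate": float(t_new[-1]),             # criterion (2): also check its sign
        "r2_bar_improved": bool(r2_new > r2_old),    # criterion (3)
        # criterion (4): how many existing slope coefficients flip sign
        "sign_changes": int(np.sum(np.sign(b_old[1:]) != np.sign(b_new[1:k]))),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n = 2000
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(size=n)                     # candidate correlated with x1
    y = 1.0 - 1.0 * x1 + 3.0 * x2 + rng.normal(size=n)
    print(compare_specifications(np.column_stack([np.ones(n), x1]), x2, y))
```

In the demo, leaving out the correlated candidate makes the coefficient on x1 positive, and adding it flips that sign, much as adding $P_{cc}$ flips the sign on $P_{bc}$ in the coffee example. Each t value there is simply the coefficient over its standard error (e.g., 7.8/15.6 = 0.5).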
3. Three incorrect techniques for choosing variables:

(1) Data mining: simultaneously try a whole series of possible regression formulations and then choose the equation that conforms most closely to what the researcher wants the results to look like. Doing econometrics = making sausages.
(2) Stepwise regression technique: a systematic way of selecting variables based on $R^2$. The computer program is given a 'shopping list' of possible independent variables and then builds the equation in steps. It always adds to the regression model the variable that increases $R^2$ the most. Problem: the independent variables could be correlated.
(3) Sequential specification search: add and drop variables sequentially (i.e., estimate an undisclosed number of regressions but present only a final choice, as if it were the only specification estimated). Every time you test a model, you run the risk of a Type I error. If you estimate and test too many models, Type I errors will accumulate.

IV. Lagged Independent Variables

Consider the following regressions:

$$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \varepsilon_t \qquad (1)$$

$$Y_t = \beta_0 + \beta_1 X_{1,t-1} + \beta_2 X_{2t} + \varepsilon_t \qquad (2)$$

where t = 1, 2, ..., n. That is, we have a sample of n time-series observations. Note the change of notation from i to t to emphasize time-series data. In equation (1), the effect of $X_1$ on Y is instantaneous. In equation (2), the effect is felt one period later. As long as $X_1$ is exogenous (not influenced by Y), the lagged structure of the equation poses no problem. Of course, the interpretation of the slope coefficient is different.
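To see how the two equations differ in practice, the sketch below simulates data from a hypothetical process of the form of equation (2), in which $X_1$ acts with a one-period lag, and fits both equations by OLS. The coefficients (1.0, 2.0, 0.5) and the function names `fit` and `lagged_example` are assumptions for illustration. Building $X_{1,t-1}$ costs the first observation, so both fits use t = 2, ..., n.

```python
import numpy as np


def fit(X, y):
    """OLS coefficients by least squares; X includes a constant column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def lagged_example(n=2000, seed=3):
    """Estimate equations (1) and (2) on data where X1 acts with a one-period lag."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    eps = rng.normal(scale=0.5, size=n)

    # Hypothetical process in the form of equation (2), defined for t = 2, ..., n:
    # Y_t = 1.0 + 2.0 * X_{1,t-1} + 0.5 * X_{2t} + e_t
    y = 1.0 + 2.0 * x1[:-1] + 0.5 * x2[1:] + eps[1:]
    const = np.ones(n - 1)

    eq2 = fit(np.column_stack([const, x1[:-1], x2[1:]]), y)   # lagged X1
    eq1 = fit(np.column_stack([const, x1[1:], x2[1:]]), y)    # contemporaneous X1
    return eq1[1], eq2[1]


b1_eq1, b1_eq2 = lagged_example()
print(f"beta1 from equation (1): {b1_eq1:.3f}, from equation (2): {b1_eq2:.3f}")
```

Here $X_1$ is drawn independently each period, so equation (1) should give a slope near zero while equation (2) recovers a value near 2; with a serially correlated $X_1$ the contemporaneous fit would instead pick up part of the lagged effect.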
V. Akaike's Information Criterion and Schwarz Criterion

In general, the more variables included in the regression, the smaller the RSS will be. But if a variable contributes only marginally to the reduction of the RSS, it should not be included. AIC and SC (also known as BIC) measure the RSS with a penalty for additional parameters. They are defined in regression models as:

$$AIC = \ln(RSS/n) + 2(K+1)/n$$

$$SC = \ln(RSS/n) + \ln(n)(K+1)/n$$

where K is the number of independent variables. You may select the model that minimizes the AIC or SC; these are called model selection criteria. Note that $\bar R^2$ is also a model selection criterion: you choose the model that maximizes $\bar R^2$. Compared with the AIC or SC, $\bar R^2$ tends to select a model with irrelevant variables.

VI. Questions for Discussion: Q6.3, Q6.9

VII. Computing Exercise: Q6.5 (Johnson, Ch 6, Q6.5; Johnson Ch 6: AIC)
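The two criteria are easy to compute directly from the RSS. The sketch below codes the formulas exactly as defined above (K = number of independent variables) and applies them to hypothetical simulated data in which OJT is correlated with S but has no effect on wages. The function names `aic`, `sc` and `rss_of` and all data values are illustrative assumptions.

```python
import math

import numpy as np


def aic(rss, n, k):
    """AIC = ln(RSS/n) + 2(K+1)/n, with K the number of independent variables."""
    return math.log(rss / n) + 2 * (k + 1) / n


def sc(rss, n, k):
    """SC (BIC) = ln(RSS/n) + ln(n)(K+1)/n."""
    return math.log(rss / n) + math.log(n) * (k + 1) / n


def rss_of(X, y):
    """Residual sum of squares from an OLS fit; X includes a constant column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


# Hypothetical data in which OJT is correlated with S but has no effect on wages
rng = np.random.default_rng(4)
n = 1000
S = rng.normal(12.0, 2.0, n)
OJT = 0.5 * S + rng.normal(0.0, 1.0, n)
lnW = 1.0 + 0.08 * S + rng.normal(0.0, 0.3, n)

X_small = np.column_stack([np.ones(n), S])
X_big = np.column_stack([X_small, OJT])
for name, X, k in [("S only", X_small, 1), ("S + OJT", X_big, 2)]:
    rss = rss_of(X, lnW)
    print(f"{name:8s} AIC = {aic(rss, n, k):.5f}  SC = {sc(rss, n, k):.5f}")
```

The larger model always has an RSS at least as small, but the per-parameter penalties of 2/n (AIC) and ln(n)/n (SC) usually outweigh the tiny reduction from an irrelevant regressor. Because ln(n) > 2 once n is 8 or more, SC penalizes extra parameters more heavily than AIC.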