Econ107 Applied Econometrics Topic 10: Dummy Dependent Variable (Studenmund, Chapter 13)

I. The Linear Probability Model

Suppose we have a cross section of 18-24 year-olds. We specify a simple 2-variable regression model. The probability of enrolling in tertiary study can be written:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where
Y = 1 if enrolled in university; 0 otherwise.
X = performance in secondary school, family background, income or wealth of parents, gender, ethnicity, etc.

We write this as a 2-variable regression for simplicity. It could be a multiple regression, where some or all of these independent variables are included.

Assume $\text{Prob}_i = P(Y_i = 1 \mid X_i)$.

This is known as a Linear Probability Model (LPM), because the conditional expectation of Y given X is the conditional probability of this event occurring:

$$E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i = \text{Prob}_i$$

where $E(\varepsilon_i \mid X_i) = 0$.

Although we can treat this model like any other regression and use OLS to estimate the parameters, one restriction is that:

$$0 \le \text{Prob}_i = \beta_0 + \beta_1 X_i \le 1$$

Only probabilities within the [0, 1] interval make sense.

Two undesirable characteristics of the LPM (i.e., the use of OLS where the dependent variable is discrete):

1. Nonnormal/heteroskedastic disturbances

In general,

$$\varepsilon_i = Y_i - \beta_0 - \beta_1 X_i = Y_i - \text{Prob}_i$$
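The step from the conditional expectation to the conditional probability uses only the fact that Y takes the values 1 and 0. A short derivation, written with the same symbols as the model:

$$E(Y_i \mid X_i) = 1 \cdot P(Y_i = 1 \mid X_i) + 0 \cdot P(Y_i = 0 \mid X_i) = P(Y_i = 1 \mid X_i) = \text{Prob}_i$$

Combined with $E(\varepsilon_i \mid X_i) = 0$, this gives $\beta_0 + \beta_1 X_i = \text{Prob}_i$.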

For the purposes of statistical inference, we assume that these disturbances are normally distributed with a constant variance. Both assumptions are violated under the LPM. We know that Y can take on only one of two values: 0 and 1. Therefore:

If $Y_i = 1$, then $\varepsilon_i = 1 - \text{Prob}_i$
If $Y_i = 0$, then $\varepsilon_i = -\text{Prob}_i$

Since a valid probability lies between zero and one, the error term takes one positive and one negative value. It can be shown that the disturbances follow a binomial distribution, with variance:

$$Var(\varepsilon_i) = \sigma_i^2 = \text{Prob}_i (1 - \text{Prob}_i)$$

The result is that the disturbances are not normally distributed, and are heteroskedastic. If they were homoskedastic, Var(ε) would be a constant. Under the LPM, this variance is a function of X (i.e., it depends on Prob). With heteroskedasticity the coefficient estimates are still unbiased, but inefficient (i.e., no longer minimum variance or BLUE). This can be overcome by running Weighted Least Squares (WLS): transform the data and use OLS.

2-Step Procedure:

1. Run OLS. Retain the fitted values and compute the following 'weight':

$$w_i = \widehat{\text{Prob}}_i (1 - \widehat{\text{Prob}}_i)$$

2. Transform the data in the following way, and run OLS:

$$\frac{Y_i}{\sqrt{w_i}} = \beta_0 \frac{1}{\sqrt{w_i}} + \beta_1 \frac{X_i}{\sqrt{w_i}} + \frac{\varepsilon_i}{\sqrt{w_i}}$$

This eliminates heteroskedasticity, since we now have unit variance for the composite disturbance term. The resulting coefficient estimates are now BLUE. However, this procedure eliminates only one of the two problems.
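The two-step procedure can be sketched in a few lines of Python. This is a minimal illustration, not part of the notes: the data are simulated from a hypothetical LPM (true Prob = 0.25 + 0.10 X), the helper names `ols2` and `lpm_two_step` are made up for the example, and the clipping of fitted values is an added assumption so that no weight is zero or negative.

```python
import math
import random


def ols2(z0, z1, y):
    """OLS of y on two regressors z0 and z1 (no additional constant),
    solved from the 2x2 normal equations."""
    a = sum(u * u for u in z0)
    b = sum(u * v for u, v in zip(z0, z1))
    d = sum(v * v for v in z1)
    e = sum(u * t for u, t in zip(z0, y))
    f = sum(v * t for v, t in zip(z1, y))
    det = a * d - b * b
    return (e * d - b * f) / det, (a * f - b * e) / det


def lpm_two_step(x, y, clip=0.01):
    """Two-step WLS for the LPM: returns (OLS coefs, WLS coefs, weights)."""
    ones = [1.0] * len(x)
    # Step 1: plain OLS, then the fitted probabilities.
    b0, b1 = ols2(ones, x, y)
    # Assumption (not in the notes): fitted values are clipped into
    # [clip, 1 - clip] so that every weight is strictly positive.
    p_hat = [min(max(b0 + b1 * xi, clip), 1.0 - clip) for xi in x]
    w = [p * (1.0 - p) for p in p_hat]
    # Step 2: divide Y, the constant and X by sqrt(w), then run OLS again.
    s = [math.sqrt(wi) for wi in w]
    wls = ols2([1.0 / si for si in s],
               [xi / si for xi, si in zip(x, s)],
               [yi / si for yi, si in zip(y, s)])
    return (b0, b1), wls, w


# Hypothetical data: true Prob = 0.25 + 0.10 * X, with X between -1 and 5.
random.seed(0)
x = [random.uniform(-1.0, 5.0) for _ in range(2000)]
y = [1.0 if random.random() < 0.25 + 0.10 * xi else 0.0 for xi in x]

ols_coef, wls_coef, w = lpm_two_step(x, y)
print("OLS (b0, b1):", ols_coef)
print("WLS (b0, b1):", wls_coef)
print("weights range from", min(w), "to", max(w))
```

Both sets of estimates should land close to the hypothetical true values. The weights vary with X, which is the heteroskedasticity that step 2 corrects.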

2. Unrestricted Range of Prob

We said earlier that this probability must be restricted to the [0, 1] interval. The problem is that nothing in the LPM 'restricts the range' of Prob. Consider the following numerical example. Suppose we estimate:

$$\widehat{\text{Prob}}_i = \hat{Y}_i = 0.197 + 0.141 X_i$$

where X is defined as father's 'years of education minus 12'. For example, if the father completes a secondary education, we say that he has 12 years of schooling (e.g., School Certificate). In this case, X = 0. No qualification would make X negative. Any post-sec qualification would make X positive (e.g., if he has a Ph.D, X = 7). Show this in the following diagram.

[Diagram: scatter of Y against X, with the data points on the horizontal lines Y = 0 and Y = 1 and the fitted line 0.197 + 0.141 X.]

The data points lie on the 2 horizontal lines, where Y = 0 and 1. Either the individual is enrolled in tertiary study or he or she isn't. The dependent variable is dichotomous, although the independent variable is more or less continuous. This is the scatter diagram for a dummy dependent variable model.
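To see how the fitted line behaves, it can be evaluated at a few values of X. The short sketch below uses the intercept 0.197 and slope 0.141 from the estimated equation. The function name `lpm_prob` and the chosen X values are illustrative only.

```python
def lpm_prob(x, intercept=0.197, slope=0.141):
    """Fitted LPM 'probability' for X = father's years of education minus 12."""
    return intercept + slope * x


# Illustrative values of X: two years short of secondary completion,
# secondary completion, two post-secondary years, and a Ph.D.
for x in (-2, 0, 2, 7):
    p = lpm_prob(x)
    flag = "" if 0.0 <= p <= 1.0 else "  <- outside [0, 1]"
    print(f"X = {x:>2}: Prob = {p:.3f}{flag}")
```

Nothing in the function keeps its value between zero and one, which is the 'unrestricted range' problem.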

OLS tries to fit a regression line through these data points that minimises the sum of the squared residuals. Suppose we get the upward-sloping regression function in the diagram. Enrolment in tertiary education is positively related to the father's education. The intercept term (0.197) is the intersection of the regression function with the vertical axis. The slope is the estimated coefficient (0.141). Each year of education by the father raises the probability that the offspring will be enrolled in tertiary study by 14.1 percentage points.

We can predict the probability that a given individual will be enrolled by plugging his or her father's education into this conditional expectation. For example, two years of post-sec education would give us:

$$\text{Prob} = 0.197 + 0.141(2) = 0.479$$

The problem is that a regression function with any nonzero slope will eventually pass outside the horizontal lines defined by the data points. For example, someone with a father who dropped out of school at age 15 (i.e., X = -2) will have a negative probability of tertiary study:

$$\text{Prob} = 0.197 + 0.141(-2) = -0.085$$

This isn't possible. Someone with a father who has a PhD (i.e., X = 7) will have a probability of tertiary study in excess of one:

$$\text{Prob} = 0.197 + 0.141(7) = 1.184$$

This isn't possible either. Thus, we have a fundamental problem with the LPM and forecasts. As a consequence, we need to explore alternatives to the LPM. We want a technique that estimates a 'regression curve' bounded by zero and one (i.e., it asymptotically approaches these two horizontal lines).

We might also note that the R² statistic isn't very useful in the LPM as a measure of the goodness of fit of this regression function. It is difficult to fit a regression line through two horizontal lines of data points. The intuition is that we're trying to determine the relationship between the probability of this event and some independent variable. But we never observe the true probability. All we see is the eventual outcome of zero or one.

II. The Logit Model

Under the LPM the probability of an event occurring is written:

$$\text{Prob}_i = \beta_0 + \beta_1 X_i$$

Under the logit model this probability is written:

$$\text{Prob}_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_i)}}$$

Note that there is now a difference between the fitted value of the linear index $\beta_0 + \beta_1 X_i$ and the estimated probability. This probability is now a nonlinear function of X. This is the cumulative distribution function (CDF) for the logistic distribution. We need to verify that the probability range is now restricted to lie within the [0, 1] interval.

If $\beta_0 + \beta_1 X_i = +\infty$, then $\text{Prob}_i = \frac{1}{1 + e^{-\infty}} = 1$

When e is raised to a large negative number (in absolute value), this probability approaches one.

If $\beta_0 + \beta_1 X_i = -\infty$, then $\text{Prob}_i = \frac{1}{1 + e^{\infty}} = 0$

When e is raised to a large positive number, this probability approaches zero.

Thus, this logistic regression function asymptotically approaches one and zero. In between these two extremes, we can show this logit model relative to the LPM in the following diagram.

[Diagram: the S-shaped logit curve, bounded by 0 and 1, drawn against the straight LPM line.]
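The bounds on the logistic function can also be checked numerically. The sketch below evaluates the logit probability and its slope at several values of X. The coefficients b0 = -0.5 and b1 = 0.8 are hypothetical, chosen only for illustration, and the slope uses the standard derivative of the logistic function, b1 · Prob · (1 − Prob).

```python
import math


def logit_prob(x, b0, b1):
    """Logistic CDF evaluated at the index b0 + b1 * x."""
    z = b0 + b1 * x
    # Two algebraically equal forms, chosen so exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit_slope(x, b0, b1):
    """Slope of the logit curve at x: b1 * Prob * (1 - Prob)."""
    p = logit_prob(x, b0, b1)
    return b1 * p * (1.0 - p)


b0, b1 = -0.5, 0.8  # hypothetical coefficients, not estimates from the notes
for x in (-50.0, -5.0, 0.0, 0.625, 5.0, 50.0):
    print(f"X = {x:7.3f}  Prob = {logit_prob(x, b0, b1):.6f}  "
          f"slope = {logit_slope(x, b0, b1):.6f}")
```

With these hypothetical coefficients the index is zero at X = 0.625, where Prob = 0.5 and the slope is at its largest. At the extreme values of X the probability flattens out near zero or one.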

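Note that the marginal or incremental effect of X on Y declines at the extremes. This is the slope of the curve at a given point. Contrast this with the constant slope of the LPM. The largest slope of the logit model occurs at the inflection point, where we go from increasing at an increasing rate to increasing at a decreasing rate. This doesn't have to correspond to X = 0: it occurs where Prob = 0.5, that is, where $\beta_0 + \beta_1 X_i = 0$.

Log-Odds Ratio

How do we estimate this logit regression model? One possibility is to convert this nonlinear function into a linear regression function and apply OLS. Begin by writing the probability of not enrolling in tertiary study as:

$$1 - \text{Prob}_i = 1 - \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_i)}} = \frac{e^{-(\beta_0 + \beta_1 X_i)}}{1 + e^{-(\beta_0 + \beta_1 X_i)}}$$

We can now write the 'odds ratio' as:

$$\frac{\text{Prob}_i}{1 - \text{Prob}_i} = \frac{1 / \left(1 + e^{-(\beta_0 + \beta_1 X_i)}\right)}{e^{-(\beta_0 + \beta_1 X_i)} / \left(1 + e^{-(\beta_0 + \beta_1 X_i)}\right)} = \frac{1}{e^{-(\beta_0 + \beta_1 X_i)}}$$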

The trick is to realize that:

$$\frac{1}{e^{-(\beta_0 + \beta_1 X_i)}} = e^{(\beta_0 + \beta_1 X_i)}$$

so that the odds ratio $\text{Prob}_i / (1 - \text{Prob}_i)$ equals $e^{(\beta_0 + \beta_1 X_i)}$. This odds ratio is the probability that an event will occur over the probability that it will not occur. For example, if Prob = 0.75, the odds ratio is 3 or 3:1. If Prob = 0.8, the odds ratio is 4 or 4:1. By taking the natural log of the odds ratio we get:

$$\ln\left(\frac{\text{Prob}_i}{1 - \text{Prob}_i}\right) = \beta_0 + \beta_1 X_i$$

so that the 'log-odds ratio' is a linear function of X, but the probability is still a nonlinear function of X. For example, $\beta_1$ tells us how the log of the odds ratio will change with a one unit change in X.

Estimation

Imagine that we try to use the log odds ratio to estimate the earlier regression model on tertiary enrolments. Plug in observed values for Y or Prob and run OLS. What's wrong with this approach? It doesn't work with our cross section of individuals because we don't observe probabilities, just actual outcomes.

If Prob = 1, then ln(1/0) is undefined
If Prob = 0, then ln(0/1) is undefined

One way to estimate the model is to use the Maximum likelihood (ML) method, which is beyond the scope of this course. Alternatively, we can use a method called Grouped Logit. Suppose we have 'group' rather than 'individual' data (e.g., a cross section of secondary schools). We could estimate the probabilities or frequencies of tertiary enrolments for the graduates of each school.
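A small sketch makes the contrast between individual and grouped data concrete. It computes the odds and log-odds for the two probabilities used in the text, shows that individual 0/1 outcomes give no usable log-odds, and then does the same calculation for grouped frequencies. The school counts are hypothetical and the function names are illustrative.

```python
import math


def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)


def log_odds(p):
    """Natural log of the odds; raises ValueError when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        raise ValueError("log-odds undefined for Prob = 0 or Prob = 1")
    return math.log(odds(p))


for p in (0.75, 0.8):
    print(f"Prob = {p}: odds = {odds(p):.2f}, log-odds = {log_odds(p):.3f}")

# Individual data: the observed outcome is 0 or 1, so no log-odds can be formed.
for outcome in (0, 1):
    try:
        log_odds(outcome)
    except ValueError as err:
        print(f"Y = {outcome}: {err}")

# Grouped data (hypothetical counts): (enrolled, graduates) for three schools.
schools = [(30, 120), (45, 90), (80, 100)]
grouped = [log_odds(enrolled / graduates) for enrolled, graduates in schools]
print([round(v, 3) for v in grouped])
```

As long as no school has a frequency of exactly zero or one, every group yields a finite log-odds value that can serve as the dependent variable.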

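The estimated probability for each school is:

$$\widehat{\text{Prob}}_i = \frac{m_i}{n_i}$$

m = number who attend universities or polytechnics by some age.
n = number who completed secondary school in that class.

Assuming that this estimated probability is not 0 or 1, we can run the following with OLS.

$$\ln\left(\frac{\widehat{\text{Prob}}_i}{1 - \widehat{\text{Prob}}_i}\right) = \hat{\beta}_0 + \hat{\beta}_1 X_i + \varepsilon_i$$

where the 'hats' on the coefficients indicate that these are estimated with 'grouped' data, and that we lose information in aggregating. The disturbances are heteroskedastic:

$$Var(\varepsilon_i) = \frac{1}{n_i \widehat{\text{Prob}}_i (1 - \widehat{\text{Prob}}_i)}$$

We can transform the data by multiplying through by the square root of the weighting variable:

$$n_i \widehat{\text{Prob}}_i (1 - \widehat{\text{Prob}}_i)$$

This WLS procedure will yield more efficient estimators.

III. The Probit Model

The probit model is nothing more than an alternative regression function that also asymptotically approaches the zero and one horizontal lines. The difference is that it is based on a 'normal' rather than a 'logistic' distribution function. Recall that under the logit model the probability that an event will occur is written:

$$\text{Prob}_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_i)}}$$

Under probit, we let $\text{Prob}_i$ be the CDF of a normal distribution: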

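$$\text{Prob}_i = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\beta_0 + \beta_1 X_i} \exp\left(-\frac{t^2}{2}\right) dt$$

where t is a standardised normal variable, with zero mean and unit variance. For this reason, it could equally be called 'Normit'. In general, there is no reason to prefer logit over probit or vice versa. Probit does have a slightly different regression function (although it asymptotically approaches zero and one like logit). It approaches the extreme values faster than logit.

Numerical example. The regression model:

$$LF_i = \beta_0 + \beta_1 M_i + \beta_2 S_i + \varepsilon_i$$

where
LF = 1 if woman in labour force; 0 otherwise.
M = 1 if woman is married; 0 otherwise.
S = number of years of schooling.

1. LPM (OLS). No correction for heteroskedasticity. Standard errors are in parentheses.

$$\widehat{LF}_i = -0.28 - \underset{(0.15)}{0.38}\, M_i + \underset{(0.03)}{0.09}\, S_i$$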

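We can interpret the effect of marital status on labour force participation:

$$E(LF \mid M = 0, S = 12) = -0.28 + 0.09(12) = 0.80$$

$$E(LF \mid M = 1, S = 12) = -0.28 - 0.38 + 0.09(12) = 0.42$$

2. LPM (Weighted Least Squares). Standard errors are in parentheses.

$$\widehat{LF}_i = -0.21 - \underset{(.5)}{0.39}\, M_i + \underset{(.2)}{0.08}\, S_i$$

where the relevant weight is the product of the probabilities of being in and out of the labour force estimated from the previous regression. It's easy to verify the problem of 'unrestricted range' of estimated probabilities under the LPM.

$$E(LF \mid M = 0, S = 16) = -0.21 + 0.08(16) = 1.07$$

The estimated probability of an unmarried woman with 16 years of education being in the labour force is 107%.

3. Logit (Maximum likelihood). We use the same individual data to estimate the equation with logit. The results are represented in terms of the log of the odds ratio (even though maximum likelihood estimation on the individual data was used).

$$\ln\left(\frac{LF_i}{1 - LF_i}\right) = -5.89 - \underset{(1.18)}{2.59}\, M_i + \underset{(0.31)}{0.69}\, S_i$$

4. Probit (Maximum likelihood). Again, the individual data are used with maximum likelihood probit.

$$F^{-1}(LF_i) = -3.44 - \underset{(0.62)}{1.44}\, M_i + \underset{(0.17)}{0.40}\, S_i$$

where $F^{-1}$ is the inverse of the normal CDF.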
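The four estimated equations can be compared by turning each into a predicted probability for a given woman. The sketch below uses the coefficients reported above. For logit and probit it converts the fitted index into a probability with the logistic and standard normal CDFs. The function and key names are illustrative only.

```python
import math


def logistic_cdf(z):
    """CDF of the logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predict(m, s):
    """Predicted labour force participation for marital status m, schooling s."""
    return {
        "lpm_ols": -0.28 - 0.38 * m + 0.09 * s,
        "lpm_wls": -0.21 - 0.39 * m + 0.08 * s,
        "logit": logistic_cdf(-5.89 - 2.59 * m + 0.69 * s),
        "probit": normal_cdf(-3.44 - 1.44 * m + 0.40 * s),
    }


for m, s in ((0, 12), (1, 12), (0, 16)):
    row = ", ".join(f"{k} = {v:.3f}" for k, v in predict(m, s).items())
    print(f"M = {m}, S = {s}: {row}")
```

For an unmarried woman with 16 years of schooling, both LPM predictions exceed one, while the logit and probit predictions stay inside the unit interval.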

We can't compare the magnitudes of the coefficient estimates from the logit and probit, but the t tests are performed in the traditional manner. The t ratios are around 2.2 and 2.3 on M, respectively, and better than 2 on S in both regressions. The magnitudes of the estimated coefficients have no direct economic meaning, because they're related to labour force participation in a nonlinear way.

IV. Questions for Discussion: Q3.2

V. Computing Exercise: Johnson, Ch3