Section: Simple Regression

What regression does

Relationship between variables
- Often in economics we believe that there is a (perhaps causal) relationship between two variables. Usually more than two, but that is deferred to another day. We call this the economic model.

Functional form
- Is the relationship linear? $y = \beta_1 + \beta_2 x$
  - This is a natural first assumption, unless theory rejects it.
  - $\beta_2$ is the slope, which determines whether the relationship between $x$ and $y$ is positive or negative.
  - $\beta_1$ is the intercept or constant term, which determines where the linear relationship intersects the $y$ axis.
- Is it plausible that this is an exact, deterministic relationship? No. Data (almost) never fit exactly along a line. Why?
  - Measurement error (incorrect definition or mismeasurement)
  - Other variables that affect $y$
  - The relationship is not purely linear
  - The relationship may be different for different observations
- So the economic model must be modeled as determining the expected value of $y$: $E(y|x) = \beta_1 + \beta_2 x$. The conditional mean of $y$ given $x$ is $\beta_1 + \beta_2 x$.
- Adding an error term for a stochastic relationship gives us the actual value of $y$: $y = \beta_1 + \beta_2 x + e$.
  - The error term $e$ captures all of the above problems.
  - The error term is considered to be a random variable and is not observed directly.
  - The variance of $e$ is $\sigma^2$, which is the conditional variance of $y$ given $x$: the variance of the conditional distribution of $y$ given $x$.
  - The simplest, but not usually valid, assumption is that the conditional variance is the same for all observations in our sample (homoskedasticity); a simulation sketch follows below.
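To make the model concrete, here is a minimal Python sketch (an illustration added in editing, not part of the original notes) that simulates data from the stochastic relationship above; all parameter values are made up:

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up "true" parameters, for illustration only
beta1, beta2, sigma = 2.0, 0.5, 1.0
N = 100

x = rng.uniform(0, 10, N)            # regressor values
e = rng.normal(0, sigma, N)          # homoskedastic error: same variance for every i
y = beta1 + beta2 * x + e            # actual value = conditional mean + error

cond_mean = beta1 + beta2 * x        # E(y|x), the systematic part of the model
print(y[:5], cond_mean[:5])
```

Each observed $y_i$ scatters above and below the conditional-mean line by the random draw $e_i$.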

- $\beta_2 = \frac{d\,E(y|x)}{dx}$, which means that the expected value of $y$ increases by $\beta_2$ units when $x$ increases by one unit.
- Does it matter which variable is on the left-hand side?
  - At one level, no: $y = \beta_1 + \beta_2 x + e$, so $x = -\frac{\beta_1}{\beta_2} + \frac{1}{\beta_2}y + v$, where $v = -\frac{1}{\beta_2}e$.
  - For purposes of most estimators, yes: we shall see that a critically important assumption is that the error term is independent of the regressors or exogenous variables. Are the errors shocks to $y$ for given $x$, or shocks to $x$ for given $y$? It might not seem like there is much difference, but the assumption is crucial to valid estimation.
  - Exogeneity: $x$ is exogenous with respect to $y$ if shocks to $y$ do not affect $x$, i.e., $y$ does not cause $x$.

Where do the data come from?

Sample and population
- We observe a sample of $N$ observations on $y$ and $x$. Depending on context, these samples may be
  - drawn from a larger population, such as census data or surveys, or
  - generated by a specific data-generating process (DGP), as in time-series observations.
- We usually would like to assume that the observations in our sample are statistically independent, or at least uncorrelated: $\text{cov}(y_i, y_j) = 0,\ i \neq j$.
- We will assume initially (for a few weeks) that the values of $x$ are chosen as in an experiment: they are not random. We will add random regressors soon and discover that they don't change things much as long as $x$ is independent of $e$.

Goals of regression
- True regression line: the actual relationship in the population or DGP. True $\beta_1$ and $\beta_2$, and $f(e)$.
- A sample of observations comes from drawing random realizations of $e$ from $f(e)$ and plotting points appropriately above and below the true regression line.
- We want to find an estimated regression line that comes as close to the true regression line as possible, based on the observed sample of $y$ and $x$ pairs:
  - Estimate values of the parameters $\beta_1$ and $\beta_2$

  - Estimate properties of the probability distribution of the error term $e$
  - Make inferences about the above estimates
  - Use the estimates to make conditional forecasts of $y$
  - Determine the statistical reliability of these forecasts

Summarizing assumptions of the simple regression model
- Assumption #0: (Implicit and unstated) The model as specified applies to all units in the population and therefore all units in the sample.
  - All units in the population under consideration have the same form of the relationship, the same coefficients, and error terms with the same properties. If the United States and Mali are in the population, do they really have the same parameters?
  - This assumption underlies everything we do in econometrics, and thus it must always be considered very carefully in choosing a specification and a sample, and in deciding for what population the results carry implications.
- SR1: $y = \beta_1 + \beta_2 x + e$
- SR2: $E(e) = 0$, so $E(y) = \beta_1 + \beta_2 x$. Note that if $x$ is random, we make these conditional expectations: $E(e|x) = 0$ and $E(y|x) = \beta_1 + \beta_2 x$.
- SR3: $\text{var}(e) = \sigma^2 = \text{var}(y)$. If $x$ is random, this becomes $\text{var}(e|x) = \sigma^2 = \text{var}(y|x)$. We should (and will) consider the more general case in which the variance varies across observations: heteroskedasticity.
- SR4: $\text{cov}(e_i, e_j) = \text{cov}(y_i, y_j) = 0$. This, too, can be relaxed: autocorrelation.
- SR5: $x$ is non-random and takes on at least two values. We will allow random $x$ later and see that $E(e|x) = 0$ implies that $e$ must be uncorrelated with $x$.
- SR6: (optional) $e \sim N(0, \sigma^2)$. This is convenient, but not critical, since the central limit theorem assures that for a wide variety of distributions of $e$, our estimators converge to normal as the sample gets large.

Strategies for obtaining regression estimators
- What is an estimator?

- A rule (formula) for calculating an estimate of a parameter ($\beta_1$, $\beta_2$, or $\sigma^2$) based on the sample values $y$ and $x$.
- Estimators are often denoted by a hat (^) over the variable being estimated: an estimator of $\beta_2$ might be denoted $\hat\beta_2$.
- How might we estimate the coefficients of the simple regression model? Three strategies:
  - Method of least squares
  - Method of moments
  - Method of maximum likelihood
- All three strategies, with the SR assumptions, lead to the same estimator rule: the ordinary least-squares regression estimator $(b_1, b_2, s^2)$.

Method of least squares
- Estimation strategy: make the sum of squared $y$-deviations ("residuals") of the observed values from the estimated regression line as small as possible.
- Given coefficient estimates $b_1, b_2$, residuals are defined as $\hat e_i = y_i - b_1 - b_2 x_i$, or $\hat e_i = y_i - \hat y_i$, with $\hat y_i = b_1 + b_2 x_i$.
- Why not minimize the sum of the residuals?
  - We don't want the sum of residuals to be a large negative number: we could "minimize" the sum of residuals by making all residuals infinitely negative.
  - Many alternative lines make the sum of residuals zero (which is desirable) because positives and negatives cancel out.
- Why use squares rather than absolute values to deal with the cancellation of positives and negatives?
  - The square function is continuously differentiable; the absolute-value function is not. Least-squares estimation is much easier than least-absolute-deviation estimation.
  - The prominence of the Gaussian (normal) distribution in nature and statistical theory focuses us on the variance, which is the expectation of a square.
  - Least-absolute-deviation estimation is occasionally done (a special case of quantile regression), but it is not common.
  - Least-absolute-deviation regression gives less importance to large outliers than least-squares, because squaring gives large emphasis to residuals with large absolute value. Least squares tends to draw the regression line toward these points to eliminate large squared residuals.
- Least-squares criterion function: $S = \sum_{i=1}^{N} \hat e_i^2 = \sum_{i=1}^{N} (y_i - b_1 - b_2 x_i)^2$
- The least-squares estimator $(b_1, b_2)$ is the solution to $\min_{b_1, b_2} S$. Since $S$ is a continuously differentiable function of the estimated parameters, we can

differentiate and set the partial derivatives equal to zero to get the least-squares normal equations:

$$\frac{\partial S}{\partial b_1} = -2\sum_{i=1}^{N}(y_i - b_1 - b_2 x_i) = 0,$$
$$\frac{\partial S}{\partial b_2} = -2\sum_{i=1}^{N} x_i (y_i - b_1 - b_2 x_i) = 0.$$

- Dividing the first condition by $-2N$ gives $\bar y - b_1 - b_2 \bar x = 0$, so $b_1 = \bar y - b_2 \bar x$. Note that this condition assures that the regression line passes through the point $(\bar x, \bar y)$.
- Substituting this expression for $b_1$ into the second condition (and using the identity $\sum x_i(y_i - \bar y) = \sum (x_i - \bar x)(y_i - \bar y)$):

$$\sum_{i=1}^{N} (x_i - \bar x)(y_i - \bar y) - b_2 \sum_{i=1}^{N} (x_i - \bar x)^2 = 0,$$
$$b_2 = \frac{\sum_{i=1}^{N} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{N} (x_i - \bar x)^2} = \frac{\hat\sigma_{XY}}{\hat\sigma_X^2}.$$

- The $b_2$ estimator is the sample covariance of $x$ and $y$ divided by the sample variance of $x$.
- What happens if $x$ is constant across all observations in our sample? The denominator is zero and we can't calculate $b_2$.
  - This is our first encounter with the problem of collinearity: if $x$ is a constant, then $x$ is a linear combination of the other regressor, the constant one that is multiplied by $b_1$.
  - Collinearity (or multicollinearity) will be more of a problem in multiple regression. If it is extreme (or perfect), it means that we can't calculate the slope estimates.
- The above equations are the ordinary least-squares (OLS) coefficient estimators; a numerical sketch follows below.
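As a check on the formulas just derived, here is a minimal Python sketch (added in editing, with made-up data) that computes $b_1$ and $b_2$ directly:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
x = rng.uniform(0, 10, N)                    # made-up regressor values
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, N)    # made-up DGP: beta1 = 2, beta2 = 0.5

# Slope: sample covariance of x and y over sample variance of x
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: forces the line through the point of means (x-bar, y-bar)
b1 = y.mean() - b2 * x.mean()

print(b1, b2)   # should be near 2 and 0.5
```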

Method of moments
- Another general strategy for obtaining estimators is to set estimates of selected population moments equal to their sample counterparts. This is called the method of moments. In order to employ the method of moments, we have to make some specific assumptions about the population/DGP moments.
- Assume $E(e) = 0$. This means that the population/DGP mean of the error term is zero. Corresponding to this assumption about the population mean of $e$ is the sample mean condition $\frac{1}{N}\sum_{i=1}^{N} \hat e_i = 0$. Thus we set the sample mean to the value we have assumed for the population mean.
- Assume $\text{cov}(e, x) = 0$, which is equivalent to $E\left[(x - E(x))\,e\right] = 0$. Corresponding to this assumption about the population covariance between the regressor and the error term is the sample covariance condition $\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar x)\,\hat e_i = 0$. Again, we set the sample moment to the zero value that we have assumed for the population moment.
- Plugging the expression for the residual into the sample moment expressions above:

$$\frac{1}{N}\sum_{i=1}^{N} (y_i - b_1 - b_2 x_i) = 0 \quad\Rightarrow\quad b_1 = \bar y - b_2 \bar x.$$

  This is the same as the intercept estimate equation for the least-squares estimator above.

$$\sum_{i=1}^{N} (x_i - \bar x)(y_i - b_1 - b_2 x_i) = 0 \quad\Rightarrow\quad b_2 = \frac{\sum_{i=1}^{N}(x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{N}(x_i - \bar x)^2}.$$

  This is exactly the same equation as for the OLS estimator.
- Thus, if we assume that $E(e) = 0$ and $\text{cov}(e, x) = 0$ in the population, then the OLS estimator can be derived by the method of moments as well. (Note that both of these moment conditions follow from the extended assumption SR2 that $E(e|x) = 0$.) A numerical check of the two sample moment conditions follows below.
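The two sample moment conditions hold exactly (up to floating-point error) at the OLS estimates, as this Python sketch (added in editing, with made-up data) illustrates:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 1.0 + 0.8 * x + rng.normal(0, 1.5, 200)      # made-up DGP

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
resid = y - b1 - b2 * x                          # OLS residuals

print(resid.mean())                              # sample mean condition: ~0
print(np.mean((x - x.mean()) * resid))           # sample covariance condition: ~0
```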

Method of maximum likelihood
- Consider the joint probability density function of $y$ and $x$, $f(y, x; \beta_1, \beta_2)$. The function is written as conditional on the coefficients to make explicit that the joint distribution of $y$ and $x$ is affected by the parameters.
- This function measures the probability density of any particular combination of $y$ and $x$ values, which can be loosely thought of as how probable that outcome is, given the parameter values. For a given set of parameters, some observations of $y$ and $x$ are less likely than others. For example, if $\beta_1 = 0$ and $\beta_2 < 0$, then it is less likely that we would see observations where $y > 0$ when $x > 0$ than observations with $y < 0$.
- The idea of maximum-likelihood estimation is to choose a set of parameters that makes the likelihood of observing the sample that we actually have as high as possible.
- The likelihood function is just the joint density function turned on its head: $L(\beta_1, \beta_2, \sigma^2; y, x) = f(y, x; \beta_1, \beta_2, \sigma^2)$.
- If the observations are independent random draws from identical probability distributions (they are IID), then the overall sample density (likelihood) function is the product of the density (likelihood) functions of the individual observations:

$$f(x_1, y_1, \ldots, x_N, y_N; \beta_1, \beta_2, \sigma^2) = \prod_{i=1}^{N} f(x_i, y_i; \beta_1, \beta_2, \sigma^2) = \prod_{i=1}^{N} L(\beta_1, \beta_2, \sigma^2; x_i, y_i) = L(\beta_1, \beta_2, \sigma^2; y, x).$$

- If the probability distribution of $e$ conditional on $x$ is Gaussian (normal) with mean zero and variance $\sigma^2$, then

$$L_i = f(y_i | x_i; \beta_1, \beta_2, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - \beta_1 - \beta_2 x_i)^2}{2\sigma^2} \right).$$

- Because of the exponential function, Gaussian likelihood functions are usually manipulated in logs. Note that because the log function is monotonic, maximizing the log-likelihood function is equivalent to maximizing the likelihood function itself.
- For an individual observation: $\ln L_i = -\frac{1}{2}\ln\left(2\pi\sigma^2\right) - \frac{(y_i - \beta_1 - \beta_2 x_i)^2}{2\sigma^2}$.
- Aggregating over the sample:

$$\ln L(\beta_1, \beta_2, \sigma^2; y, x) = \sum_{i=1}^{N} \ln L_i = -\frac{N}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{N} (y_i - \beta_1 - \beta_2 x_i)^2.$$

- The only part of this expression that depends on $\beta$ or on the sample is the final summation. Because of the negative sign, maximizing the likelihood function (with respect to $\beta$) is equivalent to minimizing the summation. But this summation is just the sum of squared residuals that we minimize in OLS. Thus, OLS is MLE if the distribution of $e$ conditional on $x$ is Gaussian with mean zero and constant variance, and if the observations are IID.
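A small Python sketch of the Gaussian log-likelihood (added in editing, with made-up data); evaluating it at the OLS coefficients and at perturbed values shows that OLS maximizes it:

```python
import numpy as np

def gaussian_loglik(b1, b2, sigma2, x, y):
    """Gaussian log-likelihood of an IID sample, conditional on x."""
    resid = y - b1 - b2 * x
    n = len(y)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum(resid ** 2) / (2 * sigma2)

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 100)
y = 1.0 + 0.8 * x + rng.normal(0, 1.0, 100)      # made-up DGP

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
s2_mle = np.sum((y - b1 - b2 * x) ** 2) / len(y) # MLE of sigma^2 divides by N

print(gaussian_loglik(b1, b2, s2_mle, x, y))         # the maximum
print(gaussian_loglik(b1 + 0.1, b2, s2_mle, x, y))   # any perturbation is lower
```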

Evaluating alternative estimators (not important for comparison here, since all three are the same, but are they any good?)
- Desirable criteria:
  - Unbiasedness: the estimator is on average equal to the true value: $E(\hat\beta) = \beta$.
  - Small variance: the estimator is usually close to its expected value: $\text{var}(\hat\beta) = E\left[(\hat\beta - E\hat\beta)^2\right]$.
  - Small RMSE can balance variance with bias: $\text{RMSE} = \sqrt{\text{MSE}}$, where $\text{MSE} = E\left[(\hat\beta - \beta)^2\right]$.
- We will talk about BLUE estimators as minimum-variance within the class of linear unbiased estimators.

Sampling distribution of OLS estimators
- $b_1$ and $b_2$ are random variables: they are functions of the random variables $y$ and $e$. We can think of the probability distribution of $b$ as occurring over repeated random samples from the underlying population or DGP.
- In many (most) cases, we cannot derive the distribution of an estimator theoretically, but must rely on Monte Carlo simulation to estimate it (see the simulation sketch below).
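Here is a minimal Monte Carlo sketch in Python (added in editing; all values made up): holding $x$ fixed and redrawing $e$ many times traces out the sampling distribution of $b_2$.

```python
import numpy as np

rng = np.random.default_rng(3)
beta1, beta2, sigma = 2.0, 0.5, 1.0
N, reps = 50, 5000
x = rng.uniform(0, 10, N)        # x held fixed across samples, as in the notes

slopes = np.empty(reps)
for r in range(reps):
    y = beta1 + beta2 * x + rng.normal(0, sigma, N)   # fresh error draws each sample
    slopes[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(slopes.mean())                                          # ~0.5: unbiasedness
print(slopes.var(), sigma**2 / np.sum((x - x.mean()) ** 2))   # matches the theory below
```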

- Because the OLS estimator (under our assumptions) is linear, we can derive its distribution. We can write the OLS slope estimator as

$$b_2 = \frac{\sum_{i=1}^{N} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{N} (x_i - \bar x)^2} = \frac{\sum_{i=1}^{N} (x_i - \bar x)\left[\beta_2 (x_i - \bar x) + (e_i - \bar e)\right]}{\sum_{i=1}^{N} (x_i - \bar x)^2} = \beta_2 + \frac{\sum_{i=1}^{N} (x_i - \bar x)\,e_i}{\sum_{i=1}^{N} (x_i - \bar x)^2}.$$

  The second step uses the property $y_i - \bar y = \beta_2 (x_i - \bar x) + (e_i - \bar e)$, which follows from $\bar y = \beta_1 + \beta_2 \bar x + \bar e$; the last step uses $\sum (x_i - \bar x)\bar e = 0$.
- For now, we are assuming that $x$ is non-random, as in a controlled experiment. If $x$ is fixed, then the only part of the formula above that is random is $e$.
- The formula shows that the slope estimate is linear in $e$. This means that if $e$ is Gaussian, then the slope estimate will also be Gaussian. Even if $e$ is not Gaussian, the slope estimate will converge to a Gaussian distribution as long as some modest assumptions about its distribution are satisfied.
- Because all the $x$ variables are non-random, they can come outside when we take expectations, so

$$E(b_2) = \beta_2 + \frac{\sum_{i=1}^{N} (x_i - \bar x)\,E(e_i)}{\sum_{i=1}^{N} (x_i - \bar x)^2} = \beta_2.$$

- What about the variance of $b_2$? We will do the details of the analytical work in matrix form because it is easier.

- The variance of the slope estimator is

$$\text{var}(b_2) = E\left[(b_2 - E(b_2))^2\right] = E\left[\left( \frac{\sum (x_i - \bar x)\,e_i}{\sum (x_i - \bar x)^2} \right)^2\right] = \frac{\sigma^2}{\sum_{i=1}^{N} (x_i - \bar x)^2}.$$

- The HGL text provides formulas for the variance of $b_1$ and the covariance between the coefficients:

$$\text{var}(b_1) = \sigma^2 \frac{\sum x_i^2}{N \sum (x_i - \bar x)^2}, \qquad \text{cov}(b_1, b_2) = -\bar x\,\frac{\sigma^2}{\sum (x_i - \bar x)^2}.$$

- Note that (when $\bar x > 0$) the covariance between the slope and intercept estimators is negative: overestimating one will tend to cause us to underestimate the other.
- What determines the variance of $b_2$?
  - Smaller variance of the error → more precise estimators
  - Larger number of observations → more precise estimators
  - More dispersion of the observations around the mean of $x$ → more precise estimators
- What do we know about the overall probability distribution of $b$?
  - If assumption SR6 is satisfied and $e$ is normal, then $b$ is also normal, because it is a linear function of the $e$ variables, and linear functions of normally distributed variables are also normally distributed.
  - If assumption SR6 is not satisfied, then $b$ converges to a normal distribution as $N \to \infty$, provided some weak conditions on the distribution of $e$ are satisfied.
- These expressions are the true variance/covariance of the estimated coefficient vector. However, because we do not know $\sigma^2$, they are not of practical use to us. We need an estimator for $\sigma^2$ in order to calculate a standard error of the coefficients: an estimate of their standard deviation.

- The required estimate in the classical case is

$$s^2 = \frac{1}{N-2}\sum_{i=1}^{N} \hat e_i^2.$$

- We divide by $N - 2$ because this is the number of degrees of freedom in our regression.
  - Degrees of freedom are a very important issue in econometrics. The term refers to how many data points are available in excess of the minimum number required to estimate the model.
  - In this case, it takes minimally two points to define a line, so the smallest possible number of observations for which we can fit a bivariate regression is 2. Any observations beyond 2 make it (generally) impossible to fit a line perfectly through all observations. Thus, $N - 2$ is the number of degrees of freedom in the sample.
  - We always divide sums of squared residuals by the number of degrees of freedom in order to get unbiased variance estimates. For example, in calculating the sample variance, we use $s_z^2 = \frac{1}{N-1}\sum (z_i - \bar z)^2$, because there are $N - 1$ degrees of freedom left after using one to calculate the mean. Here, we have two coefficients to estimate, not just one, so we divide by $N - 2$.
- The standard error of each coefficient is the square root of the corresponding diagonal element of the estimated covariance matrix.
- Note that the HGL text uses an alternative formula based on $\hat\sigma^2 = \frac{1}{N}\sum \hat e_i^2$. This estimator for $\sigma^2$ is biased, because there are only $N - 2$ degrees of freedom in the residuals: 2 are used up in estimating the parameters. In large samples the two are equivalent. A numerical sketch of $s^2$ and the standard errors follows below.
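A minimal Python sketch (added in editing, with made-up data) computing $s^2$ and the standard error of the slope:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100
x = rng.uniform(0, 10, N)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, N)    # made-up DGP with sigma = 1

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
resid = y - b1 - b2 * x

s2 = np.sum(resid ** 2) / (N - 2)            # divide by degrees of freedom, not N
se_b2 = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))
print(s2, se_b2)                             # s2 should be near 1
```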

How good is the OLS estimator?
- Is OLS the best estimator? Under what conditions?
- Under the classical regression assumptions SR1–SR5 (but not necessarily SR6), the Gauss-Markov Theorem shows that the OLS estimator is BLUE: any other estimator that is unbiased and linear in $e$ has higher variance than $b$.
- Note that $(5, 0)$ is an estimator with zero variance, but it is biased in the general case.
- Violation of any of the SR1–SR5 assumptions usually means that there is a better estimator.

Least-squares regression model in matrix notation (from Griffiths, Hill, and Judge, Section 5.4)
- We can write the $i$-th observation of the bivariate linear regression model as $y_i = \beta_1 + \beta_2 x_i + e_i$. Arranging the $N$ observations vertically gives us $N$ such equations:

$$y_1 = \beta_1 + \beta_2 x_1 + e_1,$$
$$y_2 = \beta_1 + \beta_2 x_2 + e_2,$$
$$\vdots$$
$$y_N = \beta_1 + \beta_2 x_N + e_N.$$

- This is a system of $N$ linear equations that can be conveniently rewritten in matrix form. There is no real need for the matrix representation with only one regressor, because the equations are simple, but when we add regressors the matrix notation is more useful.
- Let $y = (y_1, y_2, \ldots, y_N)'$ be an $N \times 1$ column vector, and let $X$ be the $N \times 2$ matrix whose first column is all ones and whose second column holds the $x_i$:

$$X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}.$$

- $\beta = (\beta_1, \beta_2)'$ is a $2 \times 1$ column vector of coefficients, and $e = (e_1, e_2, \ldots, e_N)'$ is an $N \times 1$ vector of the error terms.
- Then $y = X\beta + e$ expresses the system of $N$ equations very compactly. (Write out the matrices and show how the multiplication works for a single observation.)
- In matrix notation, $\hat e = y - Xb$ is the vector of residuals.

- Summing the squares of the elements of a column vector in matrix notation is just the inner product: $\sum_{i=1}^{N} \hat e_i^2 = \hat e'\hat e$, where the prime denotes matrix transpose. Thus we want to minimize this expression for least squares:

$$\hat e'\hat e = (y - Xb)'(y - Xb) = y'y - b'X'y - y'Xb + b'X'Xb = y'y - 2b'X'y + b'X'Xb.$$

- Differentiating with respect to the coefficient vector and setting the result to zero yields $-2X'y + 2X'Xb = 0$, or $X'Xb = X'y$. Pre-multiplying by the inverse of $X'X$ yields the OLS coefficient formula:

$$b = (X'X)^{-1}X'y.$$

  (This is one of the few formulas that you need to memorize.)
- Note the symmetry between the matrix formula and the scalar formula. $X'y$ is the sum of the cross product of the two variables and $X'X$ is the sum of squares of the regressor. The former is in the numerator (and not inverted) and the latter is in the denominator (and inverted).
- In matrix notation, we can express our estimator in terms of $e$ as

$$b = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + e) = \beta + (X'X)^{-1}X'e.$$

A numerical sketch of the matrix formula follows below.
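The matrix formula in code, as a short Python sketch (added in editing, with made-up data); solving the normal equations $X'Xb = X'y$ directly is numerically preferable to forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 100
x = rng.uniform(0, 10, N)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, N)    # made-up DGP

X = np.column_stack([np.ones(N), x])         # column of ones carries the intercept
b = np.linalg.solve(X.T @ X, X.T @ y)        # solves X'X b = X'y
print(b)                                     # b[0] ~ intercept, b[1] ~ slope
```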

- When $x$ is non-stochastic, the covariance matrix of the coefficient estimator is also easy to compute under the OLS assumptions.
- Covariance matrices: the covariance of a vector random variable is a matrix with the variances on the diagonal and the covariances in the off-diagonal positions. For an $M \times 1$ vector random variable $z$, the covariance matrix is the following outer product:

$$\text{cov}(z) = E\left[(z - Ez)(z - Ez)'\right] = \begin{pmatrix} \text{var}(z_1) & \text{cov}(z_1, z_2) & \cdots & \text{cov}(z_1, z_M) \\ \text{cov}(z_2, z_1) & \text{var}(z_2) & \cdots & \text{cov}(z_2, z_M) \\ \vdots & \vdots & \ddots & \vdots \\ \text{cov}(z_M, z_1) & \text{cov}(z_M, z_2) & \cdots & \text{var}(z_M) \end{pmatrix}.$$

- In our regression model, if $e$ is IID with mean zero and variance $\sigma^2$, then $E(e) = 0$ and $\text{cov}(e) = E(ee') = \sigma^2 I_N$, with $I_N$ being the order-$N$ identity matrix.
- We can then compute the covariance matrix of the (unbiased) estimator as

$$\text{cov}(b) = E\left[(b - \beta)(b - \beta)'\right] = E\left[(X'X)^{-1}X'e\,e'X(X'X)^{-1}\right] = (X'X)^{-1}X'E(ee')X(X'X)^{-1} = \sigma^2(X'X)^{-1}X'X(X'X)^{-1} = \sigma^2(X'X)^{-1}.$$

- What happens to $\text{var}(b)$ as $N$ gets large? The summations in $X'X$ have additional terms, so they get larger. This means that the inverse matrix gets smaller and the variance decreases: more observations implies more accurate estimators.
- Note that the variance also increases as the variance of the error term goes up. A more imprecise fit implies less precise coefficient estimates.
- Our estimated covariance matrix of the coefficients is then $s^2(X'X)^{-1}$. The (2, 2) element of this matrix is $s^2 / \sum (x_i - \bar x)^2$, the formula we calculated above for the scalar system.
- Thus, to summarize, when the classical assumptions hold and $e$ is normally distributed, $b \sim N\left(\beta, \sigma^2(X'X)^{-1}\right)$. (A numerical check follows below.)
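A Python sketch (added in editing, with made-up data) computing $s^2(X'X)^{-1}$ and confirming that its (2, 2) element matches the scalar formula:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 100
x = rng.uniform(0, 10, N)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, N)    # made-up DGP

X = np.column_stack([np.ones(N), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b
s2 = resid @ resid / (N - 2)                 # unbiased estimate of sigma^2

cov_b = s2 * np.linalg.inv(X.T @ X)          # estimated covariance matrix
se = np.sqrt(np.diag(cov_b))                 # standard errors on the diagonal
print(cov_b[1, 1], s2 / np.sum((x - x.mean()) ** 2))   # the two formulas agree
```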

Asymptotic properties of the OLS bivariate regression estimator (based on S&W, Chapter 17)
- Convergence in probability (probability limits)
  - Assume that $S_1, S_2, \ldots, S_N, \ldots$ is a sequence of random variables. In practice, they are going to be estimators based on $1, 2, \ldots, N$ observations.
  - $S_N \xrightarrow{p} \mu$ if and only if $\lim_{N \to \infty} \Pr\left[|S_N - \mu| > \delta\right] = 0$ for any $\delta > 0$. Thus, for any small value of $\delta$, we can make the probability that $S_N$ is further from $\mu$ than $\delta$ arbitrarily small by choosing $N$ large enough.
  - If $S_N \xrightarrow{p} \mu$, then we can write $\text{plim}\,S_N = \mu$. This means that the entire probability distribution of $S_N$ converges on the value $\mu$ as $N$ gets large.
  - Estimators that converge in probability to the true parameter value are called consistent estimators.
- Convergence in distribution
  - If the sequence of random variables $\{S_N\}$ has cumulative probability distributions $F_1, F_2, \ldots, F_N, \ldots$, then $S_N \xrightarrow{d} S$ if and only if $\lim_{N \to \infty} F_N(t) = F(t)$ for all $t$ at which $F$ is continuous.
  - If a sequence of random variables converges in distribution to the normal distribution, it is called asymptotically normal.
- Properties of probability limits and convergence in distribution
  - Probability limits are very forgiving. Slutsky's Theorem states that $\text{plim}(S_N + R_N) = \text{plim}\,S_N + \text{plim}\,R_N$, $\text{plim}(S_N R_N) = \text{plim}\,S_N \cdot \text{plim}\,R_N$, and $\text{plim}(S_N / R_N) = \text{plim}\,S_N / \text{plim}\,R_N$ (provided $\text{plim}\,R_N \neq 0$).
  - The continuous-mapping theorem gives us: for continuous functions $g$, $\text{plim}\,g(S_N) = g(\text{plim}\,S_N)$; and if $S_N \xrightarrow{d} S$, then $g(S_N) \xrightarrow{d} g(S)$.
  - Further, we can combine probability limits and convergence in distribution to get: if $\text{plim}\,a_N = a$ and $S_N \xrightarrow{d} S$, then $a_N + S_N \xrightarrow{d} a + S$, $a_N S_N \xrightarrow{d} aS$, and $S_N / a_N \xrightarrow{d} S/a$.
  - These results are very useful, since they mean that asymptotically we can treat any consistent estimator as a constant equal to the true value.
- Central limit theorems
  - There is a variety of central limit theorems with slightly different conditions. The basic result: if $\{S_N\}$ is a sequence of estimators of $\mu$, then for a wide variety of underlying distributions, $\sqrt{N}(S_N - \mu) \xrightarrow{d} N(0, \sigma^2)$, where $\sigma^2$ is the variance of the underlying statistic. (A simulation sketch of consistency follows below.)
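As a quick illustration of consistency (added in editing; the DGP is made up), the OLS slope concentrates around the true value as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(7)
beta2 = 0.5

# The slope estimate settles down near beta2 as the sample size grows (consistency)
for N in [10, 100, 1000, 10000]:
    x = rng.uniform(0, 10, N)
    y = 2.0 + beta2 * x + rng.normal(0, 1.0, N)
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    print(N, b2)
```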

Applying asymptotic theory to the OLS model
- Under conditions more general than the ones we have typically assumed (including, specifically, the finite-kurtosis assumption, but not the homoskedasticity assumption or the assumption of fixed regressors), the OLS estimator satisfies the conditions for consistency and asymptotic normality.

$$\sqrt{N}(b_2 - \beta_2) \xrightarrow{d} N\left(0,\ \frac{\text{var}\left[(x_i - \mu_x)\,e_i\right]}{\left[\text{var}(x_i)\right]^2}\right).$$

  This is the general case, allowing heteroskedasticity. With homoskedasticity, the variance reduces to the usual formula:

$$\sqrt{N}(b_2 - \beta_2) \xrightarrow{d} N\left(0,\ \frac{\sigma^2}{\text{var}(x_i)}\right).$$

- $\text{plim}\,\hat\sigma^2 = \sigma^2$, as proven in Section 17.3.
- $\dfrac{b_2 - \beta_2}{\text{s.e.}(b_2)} \xrightarrow{d} N(0, 1)$.
- Choice for the t statistic:
  - If homoskedastic, with a normal error term, then the exact distribution is $t_{N-2}$.
  - If heteroskedastic or with a non-normal error (with finite 4th moment), then the exact distribution is unknown, but the asymptotic distribution is normal.
  - Which is more reasonable for any given application?

Linearity and nonlinearity
- The OLS estimator is a linear estimator because $b$ is linear in $e$ (which is because $y$ is linear in $e$), not because $y$ is linear in $x$.
- OLS can easily handle nonlinear relationships between $y$ and $x$: $\ln y = \beta_1 + \beta_2 x$, $y = \beta_1 + \beta_2 x^2$, etc.
- Dummy (indicator) variables take the value zero or one. Example: $MALE_i = 1$ if male and 0 if female.
  - $y_i = \beta_1 + \beta_2 MALE_i + e_i$
  - For females, $E(y_i) = \beta_1$. For males, $E(y_i) = \beta_1 + \beta_2$.
  - Thus, $\beta_2$ is the difference between the expected values for males and females. (A numerical sketch follows below.)
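A final Python sketch (added in editing, with hypothetical data): regressing on a 0/1 dummy reproduces the difference in group means exactly.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 200
male = rng.integers(0, 2, N).astype(float)       # hypothetical 0/1 indicator
y = 10.0 + 3.0 * male + rng.normal(0, 2.0, N)    # made-up group difference of 3

X = np.column_stack([np.ones(N), male])
b = np.linalg.solve(X.T @ X, X.T @ y)

# The coefficient on the dummy equals the difference in sample group means
print(b[1], y[male == 1].mean() - y[male == 0].mean())
```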