Applied regression. Dr. Nitiphong Songsrirote



Regression is a very robust statistical methodology that traditionally has used existing relationships between variables to allow prediction of the values of one variable from one or more others.

Examples:
- Sales can be predicted using advertising expenditure.
- Performance on aptitude tests can be used to predict job performance.
- GPA after the first year in a PhD program can be predicted from GMAT score.

In the late 1800s, Sir Francis Galton observed that heights of children of both short and tall parents appeared to come closer to the mean of the group: extraordinary parents gave birth to more ordinary children. Galton considered this to be a "regression to mediocrity". Today we understand that this effect is due to the presence of other height predictors: children with parents of extraordinary height may be ordinary in other height determinants, such as nutrition.

[Figure: scatter diagram of Sales ($) against Units Sold; all points fall on a straight line.]

$Y$ is the dependent or criterion variable. $X$ is the independent or predictor variable. The value of $Y$ is exactly predicted by $X$. No error of prediction exists: a perfect relationship between $X$ and $Y$.

[Figure: scatter diagram showing the relationship between two variables, Entrance Test Score ($X$) and GPA ($Y$).]

$X$ does not perfectly predict $Y$: your GPA at the end of the first year cannot be exactly predicted by your score on an entrance exam. The relationship between Score and GPA appears to be linear.

- Criterion and predictor variables were quantitative and measured at, at least, the interval level.
- A linear relationship must exist between criterion and predictor variables.
- Does the existence of nonlinearity in the independent variable make the model a non-linear regression model?

- Ability to "partial out" the effects of specific predictor variables on the criterion in situations in which predictors are not orthogonal to one another.
- Can establish the unique contributions of each predictor to variance in the criterion.
- Allows identification of spurious relationships.
- Study systems of causal relationships, $Y = f(C, D, E, \text{etc.})$.
- Experimental and nonexperimental designs.
- Causal Modeling, Covariance Structure Modeling (C&C pp. -0).

Form of the data: more than quantitative, interval or ratio.
- Data can range from nominal to ratio.
- Nominally scaled predictors are the biggest departure.
- Nominal variables are traditionally assessed within the context of ANOVA, ANCOVA (by "grouping" the values).
- Can mix scale types in MRC.

Shape of the relationship need not be linear.
- Predictors may be linear or nonlinear.
- Transformations of nonlinear data are possible to produce the linearity required for the regression model.

[Figure: scatter diagram showing the relationship between two variables, Motivation ($X$) and d-s Score ($Y$).]

Motivation ($X$) does not perfectly predict d-s Score ($Y$). The relationship between Motivation and d-s Score appears to be curvilinear.

Investigate conditional relationships.
- Interactions between predictor variables or groups of variables.
- Extends ANOVA, ANCOVA.
- Not limited to interactions between nominally scaled variables.
- Can assess interactions between predictors measured at virtually any level.
- Extremely common in behavioral sciences.

Population Regression Function

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

- $Y_i$ = value of observed response on the $i$th trial;
- $\beta_0$ and $\beta_1$ are parameters;
- $X_i$ = value of predictor on the $i$th trial (a constant);
- $\varepsilon_i$ is a random error term;
- $E\{\varepsilon_i\} = 0$ (expected value of error terms is zero);
- $\sigma^2\{\varepsilon_i\} = \sigma^2$ (variance of error terms is constant);
- $\sigma\{\varepsilon_i, \varepsilon_j\} = 0$ for all $i, j$; $i \neq j$ (error terms do not covary, i.e. are not correlated);
- $i = 1, \ldots, n$.

$E\{Y_i\} = \beta_0 + \beta_1 X_i$

Each $Y_i$ consists of two parts: (1) a constant term predicted by the regression equation; and (2) a random error term unique to $Y_i$. The error term makes $Y_i$ a random variable.

$E\{Y_i\} = E\{\beta_0 + \beta_1 X_i + \varepsilon_i\} = \beta_0 + \beta_1 X_i + E\{\varepsilon_i\} = \beta_0 + \beta_1 X_i$

The regression function predicts the expected value of $Y$ for a given $X$.
- $\beta_1$ = the change in the mean of the probability distribution for $Y$ for each unit increase in $X$.
- $\beta_0$ = $Y$-intercept; the mean of the probability distribution for $Y$ when $X = 0$. Assumes the scope of the model includes $X = 0$.
- Values of $Y_i$ come from a probability distribution with mean $E\{Y_i\} = \beta_0 + \beta_1 X_i$.

[Figure: prediction of GPA at the end of the first year based on GMAT score. Axes: GMAT Score ($X$), First Year GPA ($Y$); the line shown is $E\{Y\}$.]

The regression function specifies the relationship between predictor and response variables in a population.

Values of the regression parameters ($\beta_0$ and $\beta_1$) are estimated from sample data drawn from the population. Data are obtained via:
- Observation
- Experimentation
- Survey

Technique employed to produce estimates $b_0$ and $b_1$ for $\beta_0$ and $\beta_1$, respectively: find those values of $b_0$ and $b_1$ that minimize the sum of all squared error terms,

$Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$

The estimators of $\beta_0$ and $\beta_1$ are the values of $b_0$ and $b_1$ that minimize $Q$ for a set of sample observations.

Assume from the GPA example a trial pair of values for $b_0$ and $b_1$.

[Table: GMAT, GPA, Predicted GPA, Error; the rows are not recoverable.]

$Q = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 = 3.64$

[Figure: GPA against GMAT with the trial line.]

A first trial pair ($b_0$, $b_1$) looks pretty good! Seems quite reasonable, but are there other values of $b_0$ and $b_1$ that provide smaller $Q$'s for the sample data?

[Figure: GPA against GMAT with the least squares line.]

$b_0 = -1.70$; $b_1 = .0084$; $Q = 3.41$. Looks even better! This is the least squares solution that minimizes $Q$. No other values of $b_0$ and $b_1$ will provide a smaller value of $Q$. Let's see a small macro.

[Table: GMAT Score ($X$), GPA ($Y$), Predicted GPA ($\hat{Y}$), Error Terms, Squared Error Terms; the rows are not recoverable.] $Q = 3.41$.

How to find the solution that minimizes $Q$:
- Numerical Search Procedures
- Analytic Procedures
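The trial-and-error comparison of $Q$ values can be sketched as a crude grid search. This is only an illustration: the five data points are hypothetical (the lecture's GMAT table did not survive the transcription), and the function names `q` and `grid_search` are made up for this sketch.

```python
# Hypothetical scores, NOT the lecture's GMAT data.
gmat = [400, 450, 500, 550, 600]
gpa = [1.8, 2.1, 2.6, 2.8, 3.2]


def q(b0, b1, xs=gmat, ys=gpa):
    """Sum of squared errors Q for the candidate line b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def grid_search(b0_values, b1_values):
    """Return (Q, b0, b1) for the grid point with the smallest Q."""
    return min((q(b0, b1), b0, b1) for b0 in b0_values for b1 in b1_values)


b0_grid = [-2.0 + 0.1 * k for k in range(21)]     # -2.0, -1.9, ..., 0.0
b1_grid = [0.005 + 0.0005 * k for k in range(9)]  # 0.005, ..., 0.009
best_q, best_b0, best_b1 = grid_search(b0_grid, b1_grid)
print(best_q, best_b0, best_b1)
```

A grid search only finds the best point on the grid; the analytic solution removes that limitation.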

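Numerical search: unconstrained optimization algorithms
- Systematically search for values of $b_0$ and $b_1$ that minimize $Q$ for a given set of data.
- Spreadsheet solution possible.

Excel example using GMAT data: $b_0 = -5$, $b_1 = 0.05$.

[Table: GMAT Score ($X$), GPA ($Y$), Predicted GPA ($\hat{Y}$), Error Terms ($e$), Squared Error Terms ($e^2$), with $\hat{Y}_i = b_0 + b_1 X_i$ and $e_i = Y_i - \hat{Y}_i$; the rows are not recoverable.]

$Q = \sum_{i=1}^{n} e_i^2 = 86.4$

Analytic procedure: direct solution for the values of $b_0$ and $b_1$ that minimize $Q$. Using calculus we can find a set of simultaneous equations, the "normal equations". The normal equations for $b_0$ and $b_1$ are:

$\sum Y_i = n b_0 + b_1 \sum X_i$

$\sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2$

Solving them gives

$b_1 = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X} = \dfrac{1}{n}\left(\sum Y_i - b_1 \sum X_i\right)$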

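[Table: GMAT Score ($X$), GPA ($Y$), $(X_i - \bar{X})$, $(Y_i - \bar{Y})$, $(X_i - \bar{X})(Y_i - \bar{Y})$, $(X_i - \bar{X})^2$, with Total and Mean rows; the rows are not recoverable.]

$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = 766$

$\sum (X_i - \bar{X})^2 = 91{,}100$

$b_1 = \dfrac{766}{91{,}100} = .0084$

$b_0 = \bar{Y} - b_1 \bar{X} = 2.5 - .0084(500) = -1.70$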
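The closed-form solution of the normal equations is short enough to write out. The sketch below uses hypothetical data (the lecture's rows are lost) and then repeats the arithmetic with the summary figures quoted in the example, as far as they can be read from the transcription (766, 91,100, and means of 500 and 2.5).

```python
def least_squares(xs, ys):
    """Solve the normal equations for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, NOT the lecture's GMAT table.
b0, b1 = least_squares([400, 450, 500, 550, 600], [1.8, 2.1, 2.6, 2.8, 3.2])

# Same formulas applied to the summary statistics quoted in the example.
b1_lecture = 766 / 91_100
b0_lecture = 2.5 - b1_lecture * 500
print(round(b1_lecture, 4), round(b0_lecture, 2))
```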

$\hat{Y}_i = b_0 + b_1 X_i$

- $\hat{Y}$ is the estimate of $E\{Y\}$, the mean response, when the level of the predictor is $X$.
- $b_0$ and $b_1$ are estimates of $\beta_0$ and $\beta_1$, respectively.
- $\hat{Y}_i$ is the fitted value for the $i$th case (i.e. when $X = X_i$).

$b_0 = -1.70$; $b_1 = .0084$, so $\hat{Y} = -1.70 + .0084X$.

[Table: GMAT Score ($X$), GPA ($Y$), Predicted GPA ($\hat{Y}$); the rows are not recoverable.]

Note the difference between observed values and fitted values.

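[Table: GMAT Score ($X$), GPA ($Y$), Predicted GPA ($\hat{Y}$), Error Terms ($e$), Squared Error Terms ($e^2$); the rows are not recoverable.]

$\hat{Y}_i = b_0 + b_1 X_i, \qquad e_i = Y_i - \hat{Y}_i$

- $e_i$ (residual) is the known deviation between the observed value and the fitted value.
- $\varepsilon_i$ (model error term) is the deviation between the observed value and the unknown true regression line.
- $e_i$ is an estimate of $\varepsilon_i$.

Properties of the fitted line follow from properties of the normal equations...

[Table: GMAT Score ($X$), GPA ($Y$), $\hat{Y}$, $e$ and related products, with Sums and Average rows (averages 500 and 2.5); the rows are not recoverable.]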
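The properties that follow from the normal equations are standard results: the residuals sum to zero, they are orthogonal to the predictor and to the fitted values, and the observed and fitted values have the same total. A quick numerical check on hypothetical data (not the lecture's table) might look like this:

```python
# Hypothetical data, NOT the lecture's GMAT table.
xs = [400, 450, 500, 550, 600]
ys = [1.8, 2.1, 2.6, 2.8, 3.2]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in xs]
resid = [y - f for y, f in zip(ys, fitted)]

sum_e = sum(resid)                                   # should be ~0
sum_xe = sum(x * e for x, e in zip(xs, resid))       # should be ~0
sum_fit_e = sum(f * e for f, e in zip(fitted, resid))  # should be ~0
print(sum_e, sum_xe, sum_fit_e, sum(ys) - sum(fitted))
```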

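An unbiased estimator of $\sigma^2$ is MSE:

$MSE = \dfrac{SSE}{df} = \dfrac{\sum (Y_i - \hat{Y}_i)^2}{n - 2} = \dfrac{\sum e_i^2}{n - 2}$

For the GMAT example, $MSE = \dfrac{\sum e_i^2}{n - 2} = \dfrac{\sum e_i^2}{18} = .189$.

[SPSS output: ANOVA table with rows Regression, Residual, Total and columns Sum of Squares, df, Mean Square, F, Sig. Predictors: (Constant), GMAT. Dependent Variable: GPA. The values are not recoverable.]

An extension of the regression model is required to:
- make inferences about estimators;
- conduct significance tests;
- construct confidence intervals around estimates.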

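Regression model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
- $Y_i$ = value of observed response on the $i$th trial;
- $\beta_0$ and $\beta_1$ are parameters;
- $X_i$ = value of predictor on the $i$th trial (a constant);
- $\varepsilon_i$ is a random error term;
- $E\{\varepsilon_i\} = 0$ (expected value of error terms is zero);
- $\sigma^2\{\varepsilon_i\} = \sigma^2$ (variance of error terms is constant);
- $\sigma\{\varepsilon_i, \varepsilon_j\} = 0$ for all $i, j$; $i \neq j$ (error terms do not covary, i.e. are not correlated);
- $i = 1, \ldots, n$.

Normal error regression model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
- $\beta_0$ and $\beta_1$ are parameters;
- $X_i$ = value of predictor on the $i$th trial (a constant); $i = 1, \ldots, n$;
- $Y_i$ = value of observed response on the $i$th trial; independent normal random variables with $E\{Y_i\} = \beta_0 + \beta_1 X_i$ and variance $\sigma^2$;
- $\varepsilon_i$ is a random error term, $N(0, \sigma^2)$;
- $E\{\varepsilon_i\} = 0$ (expected value of error terms is zero);
- $\sigma^2\{\varepsilon_i\} = \sigma^2$ (variance of error terms is constant);
- $\sigma\{\varepsilon_i, \varepsilon_j\} = 0$ for all $i, j$; $i \neq j$ (error terms are independent; do not covary; are not correlated);
- error terms are normally distributed.

Same as the regression model, except that error terms are now assumed to be normally distributed.

Maximum Likelihood Estimation
- Requires the functional form of the probability distribution of the random error terms.
- Provides estimates of the required parameters that are most consistent with the sample data.
- In the case of simple linear regression, the MLE estimators $b_0$ and $b_1$ are BLUE.
- The MLE estimator for $\sigma^2$ is biased but works out OK when the sample size is large.

Inferences in Simple Linear Regression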

Null Hypothesis and Alternative Hypothesis

Usual inference about $\beta_1$:

$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$

$H_0$: the slope of the regression line is 0; there is no linear relationship between $X$ and $Y$.
- The regression line is horizontal.
- The means of the probability distributions for all $Y$ are equal: $E\{Y\} = \beta_0 + (0)X = \beta_0$.
- The probability distributions of all $Y$ are identical.

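Test statistic:

$t^* = \dfrac{b_1 - 0}{s\{b_1\}} = \dfrac{b_1}{s\{b_1\}}$

- Studentized test statistic ($\sigma$ is estimated).
- Distributed as $t$ with $n - 2$ degrees of freedom.

$s^2\{b_1\} = \dfrac{MSE}{\sum (X_i - \bar{X})^2}$

Point estimators: $s^2\{b_1\}$ is an unbiased estimator of $\sigma^2\{b_1\}$, and $s\{b_1\} = \sqrt{s^2\{b_1\}}$.

[Table: GMAT Score ($X$), GPA ($Y$) and deviation columns, with Total and Mean rows (means 500 and 2.5); the rows are not recoverable.]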

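[Table: GMAT Score ($X$), GPA ($Y$), $\hat{Y}$, $e$, $e^2$, with Sums and Average rows; the rows are not recoverable.]

$s^2\{b_1\} = \dfrac{MSE}{\sum (X_i - \bar{X})^2} = \dfrac{.189}{91{,}100}, \qquad s\{b_1\} = .00144$

$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$

Control the risk of Type I error at $\alpha = .05$.

Test statistic: $t^* = \dfrac{b_1}{s\{b_1\}}$

- If $|t^*| \le t(1 - \alpha/2;\, n - 2)$, conclude $H_0$.
- If $|t^*| > t(1 - \alpha/2;\, n - 2)$, conclude not $H_0$.

$t(1 - \alpha/2;\, n - 2) = t(.975;\, 18) = 2.101$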

$b_1 = .0084$; $s\{b_1\} = .00144$

$t^* = .0084/.00144 = 5.83$

$t(.975;\, 18) = 2.101$ (critical $t$)

$|t^*| > t$, therefore conclude not $H_0$. The null hypothesis must be rejected...

[SPSS output: Coefficients table with rows (Constant) and GMAT and columns Unstandardized Coefficients (B, Std. Error), Standardized Coefficients (Beta), t, Sig. Dependent Variable: GPA. The values are not recoverable.]

SPSS computes the probability of the two-tailed $t$ directly: much better!

Assume for the GMAT example that we think that the relationship between GMAT and GPA should always be positive...

Null and alternative hypotheses:

$H_0: \beta_1 \le 0 \qquad H_a: \beta_1 > 0$

- If $t^* \le t(1 - \alpha;\, n - 2)$, conclude $H_0$.
- If $t^* > t(1 - \alpha;\, n - 2)$, conclude not $H_0$.
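The slope test can be sketched in a few lines. Two assumptions are worth flagging: the data set below is hypothetical, and the critical value is passed in by hand (the standard library has no $t$ quantile function), using the tabled $t(.975;\, 18) = 2.101$ from the example. The last lines redo the example's arithmetic from its summary figures as read from the transcription.

```python
import math


def slope_t_test(xs, ys):
    """Return (b1, s{b1}, t*) for H0: beta1 = 0 in simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)                 # unbiased estimator of sigma^2
    s_b1 = math.sqrt(mse / sxx)
    return b1, s_b1, b1 / s_b1


# Hypothetical data, NOT the lecture's GMAT table.
b1, s_b1, t_star = slope_t_test([400, 450, 500, 550, 600],
                                [1.8, 2.1, 2.6, 2.8, 3.2])

# The example's own figures: b1 = 766 / 91,100, MSE = .189, n - 2 = 18.
t_star_lecture = (766 / 91_100) / math.sqrt(0.189 / 91_100)
T_CRIT_975_18 = 2.101                   # tabled value quoted in the lecture
reject_h0 = abs(t_star_lecture) > T_CRIT_975_18
print(round(t_star_lecture, 2), reject_h0)
```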

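- $t^* = 5.83$ (same as before).
- The critical $t$ is now smaller: all 5% is in one tail, rather than spread across two tails. $t(.95;\, 18) = 1.734$.
- $t^* > t$, so conclude not $H_0$.
- $\beta_1$ probably is not less than or equal to 0. We may assume $\beta_1$ is positive.

ANOVA partitions the sum of squares (SS) in the criterion variable into two parts:
- SS that can be attributed to the predictor; and
- Error SS, the sum of squares unique to the criterion.

Total SS in the criterion is SSTO. SS attributed to the predictor is SSR. Error or unique SS is SSE.

$SSTO = \sum (Y_i - \bar{Y})^2, \qquad SSR = \sum (\hat{Y}_i - \bar{Y})^2, \qquad SSE = \sum (Y_i - \hat{Y}_i)^2$

$SSTO = SSR + SSE$, since $(Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$ and $\hat{Y}_i = b_0 + b_1 X_i$.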

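ANOVA table (Source: SS; df; MS; F*):
- Regression: $SSR = \sum (\hat{Y}_i - \bar{Y})^2$; df = 1; $MSR = SSR/1$; $F^* = MSR/MSE$
- Error: $SSE = \sum (Y_i - \hat{Y}_i)^2$; df = $n - 2$; $MSE = SSE/(n - 2)$
- Total: $SSTO = \sum (Y_i - \bar{Y})^2$; df = $n - 1$

$E\{MSE\} = \sigma^2$, i.e. MSE is an unbiased estimator of the error variance.

$E\{MSR\} = \sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$

If $\beta_1 = 0$, MSR and MSE are about the same size and $F^*$ will be small...

[Table: worked columns for the GMAT example with Totals and Averages rows; the rows are not recoverable.]

[SPSS output: ANOVA table with rows Regression, Residual, Total and columns Sum of Squares, df, Mean Square, F, Sig. Predictors: (Constant), GMAT. Dependent Variable: GPA. The values are not recoverable.]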

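The appropriate test is $F$.
- An upper-tail test.
- Under $H_0$, $F^*$ is distributed as $F(1, n - 2)$.
- $H_0: \beta_1 = 0$; $H_a: \beta_1 \neq 0$.

Decision rule:
- If $F^* \le F(1 - \alpha;\, 1, n - 2)$, conclude $H_0$.
- If $F^* > F(1 - \alpha;\, 1, n - 2)$, conclude not $H_0$.

For the GMAT example: $F^* = 34.005$, $F(.95;\, 1, 18) = 4.41$. Conclude not $H_0$.

In simple regression (i.e. a single predictor variable is employed), for a given $\alpha$, $F^* = (t^*)^2$, where $t$ is two-tailed.

General linear test
- Fit a "full model" to the data and obtain $SSE(F) = \sum (Y_i - \hat{Y}_i)^2$. This is simply the SSE obtained from fitting a standard regression line to the data: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$.
- Fit a "reduced model" to the data and obtain $SSE(R)$.
- Consider $H_0$, usually $H_0: \beta_1 = 0$. The model when $H_0$ holds is the reduced model.
- When $\beta_1 = 0$, the model reduces to $Y_i = \beta_0 + \varepsilon_i$.
- Because the best estimator of $\beta_0$ is $\bar{Y}$, $SSE(R) = \sum (Y_i - \bar{Y})^2 = SSTO$.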

Since $SSE(R) = SSTO$, $df_R = n - 1$. Since $SSE(F) = SSE$, $df_F = n - 2$.

$F^* = \dfrac{SSE(R) - SSE(F)}{df_R - df_F} \div \dfrac{SSE(F)}{df_F}$

- If $F^* \le F(1 - \alpha;\, df_R - df_F,\, df_F)$, conclude $H_0$.
- If $F^* > F(1 - \alpha;\, df_R - df_F,\, df_F)$, conclude not $H_0$.

$r^2$ is the coefficient of determination.
- $r^2 = SSR/SSTO = 1 - SSE/SSTO$, with $0 \le r^2 \le 1$.
- $r^2$ is the proportion of variance in the criterion associated with the use of the predictor.
- When all observations fall directly on the regression line, the predictor perfectly explains all variation in the criterion and $r^2 = 1$.
- When the regression line is horizontal ($b_1 = 0$), $SSE = SSTO$ and $r^2 = 0$. (Caveat: what happens when the line is horizontal but all points fall on it?)

[SPSS output: ANOVA table with rows Regression, Residual, Total and columns Sum of Squares, df, Mean Square, F, Sig. Predictors: (Constant), GMAT. Dependent Variable: GPA.]

$r^2 = SSR/SSTO = 6.434/9.84 = .654$

Measures of Strength of Association

[SPSS output: Model Summary with columns R, R Square, Adjusted R Square, Std. Error of the Estimate; R = .809. Predictors: (Constant), GMAT.]

$R$ is the coefficient of correlation.
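The ANOVA decomposition, the $F$ statistic and $r^2$ all come from the same three sums of squares. A minimal sketch on hypothetical data (the function name `anova` is illustrative) follows, along with the example's own ratio $6.434/9.84$.

```python
def anova(xs, ys):
    """Sums of squares, F* and r^2 for a simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    ssto = sum((y - y_bar) ** 2 for y in ys)
    ssr = sum((f - y_bar) ** 2 for f in fitted)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    f_star = (ssr / 1) / (sse / (n - 2))    # MSR / MSE
    return {"SSTO": ssto, "SSR": ssr, "SSE": sse,
            "F": f_star, "r2": ssr / ssto}


# Hypothetical data, NOT the lecture's GMAT table.
table = anova([400, 450, 500, 550, 600], [1.8, 2.1, 2.6, 2.8, 3.2])

# The example's own figures.
r2_lecture = 6.434 / 9.84
print(table, round(r2_lecture, 3))
```

On these hypothetical points $F^* = 245$, which equals the square of the slope's $t^*$, as the $F^* = (t^*)^2$ identity requires.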

Correlation Coefficient: Pearson's Product-Moment Correlation
- $r_{xy}$ = correlation between two continuous variables measured at least at interval level.
- $-1 \le r_{xy} \le +1$
- Actually, ... (prove this as an exercise).

Pearson's product-moment correlation (continued)
- $r = \pm\sqrt{r^2}$
- Correlation between two continuous variables measured at least at interval level.
- Unlike $r^2$, it does not have a clear-cut interpretation.
- Used extensively in behavioral research.
- Inflates the apparent relationship between $X$ and $Y$.

When data are in their original metric, $b_1 = r \dfrac{s_Y}{s_X}$.

When data are standardized, $\hat{z}_Y = r z_X$, so $b_1 = r$.

High $r$ or $r^2$ may not imply strong predictive capability.
- $r$ as high as .9 ($r^2 = .81$) can still have wide confidence intervals for the estimate.
- Always compute confidence or prediction intervals.

High $r$ or $r^2$ always suggests the regression line is a good fit?
- Only if the relationship is linear.
- Can still get relatively high $r$, $r^2$ if the relationship is curvilinear.

Low $r$ or $r^2$ always suggests that $X$ and $Y$ are not related, or are weakly related?
- Only if the relationship is linear.
- Can still get very low $r$, $r^2$ if the relationship is curvilinear.
- Transforming $X$, $Y$, or both prior to constructing the regression model may improve the fit (later topic).

Formulae for r

When $X$ and $Y$ are standardized:

$r_{xy} = \dfrac{\sum z_x z_y}{n}$

When $X$ and $Y$ are in raw score form (non-standardized):

$r_{xy} = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$

Is $r_{xy}$ a least squares estimator or an MLE estimator? Is this estimator unbiased? What is it an estimator of? Isn't this the same as $r$ from a simple linear regression model?

Point biserial $r$ (one variable dichotomous):

$r_{pb} = \dfrac{\bar{Y}_1 - \bar{Y}_0}{sd} \sqrt{pq}$

Phi coefficient (both $X$ and $Y$ dichotomous): computed from the cell frequencies of the $2 \times 2$ table.

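Inferences on Correlation Coefficients
- Bivariate normal population.
- Interpretation of $\rho$ is important.

Testing $H_0: \rho = 0$ (relate this to simple linear regression!). If $H_0$ holds, then $t^*$ given below is distributed as $t(n - 2)$:

$t^* = \dfrac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$

Interval Estimation of $\rho$: the Fisher z transformation
- The sampling distribution of $r$ is complicated when $\rho \neq 0$. Cannot use $t$!
- If $n \ge 25$ then $z' \sim N(\zeta, \sigma^2\{z'\})$ approximately, where

$z' = \dfrac{1}{2} \log_e \dfrac{1 + r}{1 - r}, \qquad E\{z'\} = \zeta = \dfrac{1}{2} \log_e \dfrac{1 + \rho}{1 - \rho}, \qquad \sigma^2\{z'\} = \dfrac{1}{n - 3}$

Estimation of $\rho$ (continued)
- The confidence interval for $\zeta$ is $z' \pm z(1 - \alpha/2)\, \sigma\{z'\}$, with $\sigma\{z'\} = 1/\sqrt{n - 3}$.
- We have to retransform back to $\rho$ in order to get its CI:

$\tanh\left(\operatorname{arctanh}(r_{xy}) - \dfrac{z_{\alpha/2}}{\sqrt{n - 3}}\right) < \rho < \tanh\left(\operatorname{arctanh}(r_{xy}) + \dfrac{z_{\alpha/2}}{\sqrt{n - 3}}\right)$

- See KNN for testing hypotheses about independent samples from two bivariate normal populations.

What if the populations are not Normal? Resort to a non-parametric approach: the famous Spearman Rank Correlation coefficient,

$r_s = \dfrac{\sum (R_{i1} - \bar{R}_1)(R_{i2} - \bar{R}_2)}{\sqrt{\sum (R_{i1} - \bar{R}_1)^2 \sum (R_{i2} - \bar{R}_2)^2}}$

If there are no ties in the ranks then we can use the more commonly found form,

$r_s = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)}$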
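The tanh/arctanh form of the interval translates directly into code. The sketch below is an illustration only; it takes the normal quantile from `statistics.NormalDist` (Python 3.8+) and uses the example's $r = .809$ with $n = 20$, the latter inferred from the 18 error degrees of freedom.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, conf=0.95):
    """Approximate confidence interval for rho via the Fisher z transformation."""
    z_prime = math.atanh(r)              # 0.5 * log((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.tanh(z_prime - z_crit * se), math.tanh(z_prime + z_crit * se)


lower, upper = fisher_ci(0.809, 20)
print(round(lower, 3), round(upper, 3))
```

The interval is not symmetric around $r$, which is expected: the transformation is nonlinear and compresses values near $\pm 1$.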

Hypothesis Test for Population Correlation Coefficient

$H_0$: no association between $X$ and $Y$. $H_a$: there is association between $X$ and $Y$.

The sampling distribution of $r_s$ is available in tables and is not too complicated. However, when $n > 10$ we can use, as in the Normal case,

$t^* = \dfrac{r_s \sqrt{n - 2}}{\sqrt{1 - r_s^2}}$

Spearman's rank correlation coefficient is also used to test for heteroscedasticity.

KNN Ch. 3: Diagnostics and Remedial Measures

Diagnostics for the predictor variable:
- Dot plots
- Sequence plots
- Stem-and-leaf plots

Essentially to check for outlying observations, which will be useful in later diagnosis.

Why look at the residuals?
- Detect nonlinearity of the regression function
- Detect heteroscedasticity (= lack of constant variance)
- Autocorrelation
- Outliers
- Non-normality
- Important predictor variables left out?

Regression model assumptions:
- Errors are independent (have zero covariance)
- Errors have constant variance
- Errors are normally distributed

Diagnostics for Residuals
- Detect non-linearity of the regression function
- Heteroscedasticity
- Autocorrelation
- Outliers
- Non-normality
- Important predictor variables left out?

PLOT OF RESIDUALS
1. against the predictor (if only 1)
2. (Absolute or squared) residual against the predictor
3. against fitted values (for many $X$)
4. against time
5. against omitted predictor variables
6. Box plot
7. Normal probability plot

Normal probability plot: the approximate expected value of the $k$th smallest residual is

$\sqrt{MSE}\; z\!\left(\dfrac{k - .375}{n + .25}\right)$

Tests involving Residuals: the correlation test for normality
- $H_0$: the residuals are normal. $H_A$: the residuals are not normal.
- Compute the correlation between the $e_i$'s and their expected values under normality.
- Use Table B.6. The observed coefficient of correlation should be at least as large as the table value for a given level of significance.

Tests involving Residuals: other tests for normality
- $H_0$: the residuals are normal. $H_A$: the residuals are not normal.
- Anderson-Darling (very powerful, may be used for small sets)
- Ryan-Joiner
- Shapiro-Wilk
- Kolmogorov-Smirnov
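The expected values used in a normal probability plot follow directly from the formula $\sqrt{MSE}\, z[(k - .375)/(n + .25)]$. A small sketch, with a hypothetical MSE and sample size and an illustrative function name:

```python
import math
from statistics import NormalDist


def expected_residuals(mse, n):
    """Approximate expected value of the k-th smallest residual, k = 1..n."""
    std_normal = NormalDist()
    return [math.sqrt(mse) * std_normal.inv_cdf((k - 0.375) / (n + 0.25))
            for k in range(1, n + 1)]


# Hypothetical values: MSE = 0.005 and n = 5 are for illustration only.
expected = expected_residuals(0.005, 5)
print([round(v, 4) for v in expected])
```

Plotting the sorted residuals against these values should give a roughly straight line when the error terms are normal; the correlation between the two is the statistic used in the correlation test for normality.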

Tests involving Residuals (Constancy of Error Variance): the Modified Levene Test
- Partitions the independent variable into two groups (high values and low values), then tests the null $H_0$: the groups have equal variances.
- Similar to a pooled-variance t-test for the difference in two means of independent samples.
- It is robust to departures from normality of the error terms.
- A large sample size is essential so that dependencies of the error terms on each other can be neglected.
- Uses the group median instead of the mean (Why?).

$t_L^* = \dfrac{\bar{d}_1 - \bar{d}_2}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

where $d_{i1} = |e_{i1} - \tilde{e}_1|$, $d_{i2} = |e_{i2} - \tilde{e}_2|$ ($\tilde{e}_1$, $\tilde{e}_2$ are the group medians of the residuals), and

$s^2 = \dfrac{\sum (d_{i1} - \bar{d}_1)^2 + \sum (d_{i2} - \bar{d}_2)^2}{n - 2}$

Now the $d_{i1}$ and $d_{i2}$ are the data points, i.e. the t-test is based on these two sets of data points.

Read the Comments in KNN and go through the Breusch-Pagan test.

F Test for Lack of Fit
- A comparison of the "Full Model" sum of squares error and the "Lack of Fit" sum of squares.
- For best results, requires repeat observations at, at least, one $X$ level.
- Full model: $Y_{ij} = \mu_j + \varepsilon_{ij}$ ($\mu_j$ = mean response when $X = X_j$).
- Reduced model: $Y_{ij} = \beta_0 + \beta_1 X_j + \varepsilon_{ij}$ (Why "Reduced"?).
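The Modified Levene statistic is a two-sample t statistic computed on absolute deviations from the group medians. A minimal sketch, assuming the residuals have already been split into a low-$X$ group and a high-$X$ group (the two lists below are hypothetical):

```python
import math
from statistics import mean, median


def modified_levene_t(e1, e2):
    """t*_L computed from two groups of residuals (low-X and high-X)."""
    d1 = [abs(e - median(e1)) for e in e1]   # deviations from group median
    d2 = [abs(e - median(e2)) for e in e2]
    d1_bar, d2_bar = mean(d1), mean(d2)
    n1, n2 = len(d1), len(d2)
    pooled_var = (sum((d - d1_bar) ** 2 for d in d1)
                  + sum((d - d2_bar) ** 2 for d in d2)) / (n1 + n2 - 2)
    return (d1_bar - d2_bar) / (math.sqrt(pooled_var)
                                * math.sqrt(1 / n1 + 1 / n2))


# Hypothetical residuals: the second group is visibly more spread out.
t_levene = modified_levene_t([-1.0, 0.0, 1.0], [-3.0, 0.0, 3.0])
print(round(t_levene, 4))
```

The result is compared with $t(1 - \alpha/2;\, n - 2)$; with only six points, as here, the large-sample requirement is clearly not met, so the numbers serve only to check the arithmetic.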

$SSE(\text{Full}) = SSPE = \sum_j \sum_i (Y_{ij} - \bar{Y}_j)^2$

(Labeled "Pure Error" since it gives an unbiased estimator of the true error variance. See KNN.)

$SSLF = SSE(\text{Reduced}) - SSPE$, where $SSE(\text{Reduced})$ = SSE from the ordinary least squares regression model.

Test statistic (what is $p$?):

$F^* = \dfrac{SSLF}{c - p} \div \dfrac{SSPE}{n - c}$

Be sure to compare the ANOVA table in KNN with this ANOVA table.

Overview of some Remedial Measures

The problem: simple linear regression is not appropriate.

The solution:
1. Abandon the model ("Eagle to Hawk; abort mission and return to base").
2. Remedy the situation:
   - If the error terms are non-independent, work with a model that calls for correlated error terms (later chapter).
   - If there is heteroscedasticity, use the WLS method to estimate parameters (Ch. 10) or use transformations of the data.
   - If the scatter plot indicates non-linearity, either use a non-linear regression function (Ch. 7) or transform to linear.

NEXT: we will look at one such powerful transformation method.

The Box-Cox Transformation Method

The family of power transforms on $Y$ is given as $Y' = Y^{\lambda}$. The family easily includes simple transforms such as the square root, squared, etc. By definition, when $\lambda = 0$ then $Y' = \log_e Y$.

When the response variable is so transformed, the normal error regression model becomes

$Y_i^{\lambda} = \beta_0 + \beta_1 X_i + \varepsilon_i$

We would like to determine the "best" value of $\lambda$.

Method 1: Maximum likelihood estimation. Maximize the likelihood $L(\beta_0, \beta_1, \sigma^2, \lambda)$ of the normal error model over all four parameters.

Method 2: Numerical search.
- Step 1: Set a value of $\lambda$.
- Step 2: Standardize the observations. If $\lambda \neq 0$: $W_i = K_1 (Y_i^{\lambda} - 1)$. If $\lambda = 0$: $W_i = K_2 \log_e Y_i$, where $K_2 = \left(\prod_{i=1}^{n} Y_i\right)^{1/n}$ and $K_1 = \dfrac{1}{\lambda K_2^{\lambda - 1}}$.
- Step 3: Now regress the set $W$ on the set $X$.
- Step 4: Note the corresponding SSE.
- Step 5: Change $\lambda$ and repeat steps 2 to 4 until the lowest SSE is obtained.

Let's try both methods with the GMAT data. What should we get as the best $\lambda$?
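The numerical search in Method 2 can be sketched as a loop over a grid of $\lambda$ values. The data below are hypothetical and deliberately constructed so that $Y = X^2$, which means the square-root transform ($\lambda = 0.5$) should make the relationship exactly linear; the function names are illustrative.

```python
import math


def ols_sse(xs, ws):
    """SSE from regressing ws on xs by least squares."""
    n = len(xs)
    x_bar, w_bar = sum(xs) / n, sum(ws) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (w - w_bar) for x, w in zip(xs, ws)) / sxx
    b0 = w_bar - b1 * x_bar
    return sum((w - b0 - b1 * x) ** 2 for x, w in zip(xs, ws))


def box_cox_sse(xs, ys, lam):
    """Steps 2-4: standardize Y for this lambda, regress on X, return SSE."""
    k2 = math.exp(sum(math.log(y) for y in ys) / len(ys))  # geometric mean
    if lam == 0:
        ws = [k2 * math.log(y) for y in ys]
    else:
        k1 = 1 / (lam * k2 ** (lam - 1))
        ws = [k1 * (y ** lam - 1) for y in ys]
    return ols_sse(xs, ws)


# Hypothetical data with Y = X^2 exactly.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]
lambdas = [-2 + 0.5 * k for k in range(9)]   # -2.0, -1.5, ..., 2.0
best_lambda = min(lambdas, key=lambda lam: box_cox_sse(xs, ys, lam))
print(best_lambda)
```

The $K_1$ and $K_2$ scaling is what makes the SSE values comparable across different $\lambda$; without it, the transform with the smallest numeric range would win trivially.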

KNN Ch. 4: Simultaneous Inferences and Other Topics

- Confidence intervals are used for a single parameter, confidence regions for two or more parameters.
- The region for ($\beta_0$, $\beta_1$) defines a set of lines.
- Since $b_0$ and $b_1$ are (jointly) normal, the natural confidence region is an ellipse.
- KNN do rectangles (KNN 4.1).

- We want the probability that both intervals are correct to be (at least) .95.
- The basic idea is an error budget ($\alpha = .05$).
- Spend half on $\beta_0$ (.025) and half on $\beta_1$ (.025).
- We use $\alpha = .025$ for the $\beta_0$ CI (97.5% CI) and $\alpha = .025$ for the $\beta_1$ CI (97.5% CI).

So we use

$b_1 \pm t^* s(b_1), \qquad b_0 \pm t^* s(b_0)$

where $t^* = t(.9875;\, n - 2)$ and $.9875 = 1 - (.05)/(2 \cdot 2)$.

- Note we start with a 5% error budget and we have two intervals, so we give 2.5% to each.
- Each interval has two ends, so we again divide by 2.
- So, $.9875 = 1 - (.05)/(2 \cdot 2)$.

- Let the two intervals be $I_1$ and $I_2$.
- We will use cor (= correct) if the interval contains the true parameter value, inc (= incorrect) if not.

$P(\text{both cor}) = 1 - P(\text{at least one inc})$

$P(\text{at least one inc}) = P(I_1 \text{ inc}) + P(I_2 \text{ inc}) - P(\text{both inc}) \le P(I_1 \text{ inc}) + P(I_2 \text{ inc})$

So $P(\text{both cor}) \ge 1 - (P(I_1 \text{ inc}) + P(I_2 \text{ inc}))$.

- So if we use $.05/2$ for each interval, $1 - (P(I_1 \text{ inc}) + P(I_2 \text{ inc})) = 1 - .05 = .95$.
- So $P(\text{both cor})$ is at least .95.
- We will use this idea when we do multiple comparisons in ANOVA.
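The error-budget arithmetic generalizes from two intervals to any number $g$. A small sketch (the function names are illustrative, not from the lecture):

```python
def bonferroni_percentile(alpha, g):
    """t percentile to use for each of g two-sided intervals: 1 - alpha/(2g)."""
    return 1 - alpha / (2 * g)


def joint_coverage_lower_bound(individual_alphas):
    """Bonferroni bound: P(all intervals correct) >= 1 - sum of the alphas."""
    return 1 - sum(individual_alphas)


percentile = bonferroni_percentile(0.05, 2)          # two intervals, 5% budget
bound = joint_coverage_lower_bound([0.025, 0.025])   # each interval gets 2.5%
print(percentile, bound)
```

The bound holds whether or not the intervals are independent, which is why it is safe (if conservative) for $b_0$ and $b_1$, which are correlated.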

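Simultaneous estimation of the mean response $E(Y_h)$ for all $X_h$: use Working-Hotelling (KNN 2.6),

$\hat{Y}_h \pm W\, s(\hat{Y}_h), \qquad W^2 = 2F(1 - \alpha;\, 2, n - 2)$

For simultaneous estimation for a few ($g$) $X_h$, use Bonferroni:

$\hat{Y}_h \pm B\, s(\hat{Y}_h), \qquad B = t(1 - \alpha/(2g);\, n - 2)$

Simultaneous prediction for a few ($g$) $X_h$: use Bonferroni,

$\hat{Y}_h \pm B\, s(\text{pred}), \qquad B = t(1 - \alpha/(2g);\, n - 2)$

or Scheffé,

$\hat{Y}_h \pm S\, s(\text{pred}), \qquad S^2 = gF(1 - \alpha;\, g, n - 2)$

Regression through the origin: $Y_i = \beta_1 X_i + \varepsilon_i$

How to set it up in your stats software:
- Check "Constant is Zero" (Excel)
- Uncheck "Fit Intercept" in Options (Minitab)
- Uncheck "Include Constant in Equation" in Options (SPSS)
- NOINT option in PROC REG (SAS)

Generally not a good idea. Problems with $r^2$ and other statistics. See the cautions in KNN.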

Measurement error:
- For $Y$, this is usually not a problem.
- For $X$, we can get biased estimators of our regression parameters.
- See KNN 4.5.

Inverse prediction, sometimes called calibration:
- Given $Y_h$, predict the corresponding value of $X$, $\hat{X}_h$.
- Solve the fitted equation for $X_h$: $\hat{X}_h = (Y_h - b_0)/b_1$, $b_1 \neq 0$.
- An approximate CI can be given, see KNN.

Choice of $X$ levels:
- Look at the formulas for the variances of the estimators of interest.
- Usually we find $\sum (X_i - \bar{X})^2$ in a denominator.
- So we want to spread out the values of $X$.

Read KNN 4.1 to 4.6 and the associated problems.

Next class we will do all of this with vectors and matrices so that we can generalize to multiple regression. If you are rusty in Linear Algebra: REVIEW KNN 5.1 to 5.7.

Applied Regression Analysis

KNN Ch. 5

- Definition of a matrix: a matrix is a rectangular array of elements arranged in rows and columns.
- Vector: a matrix containing only one column.
- Transpose: when rows become columns and columns become rows.
- Equality of matrices: two matrices A and B are equal if they have the same dimension and all corresponding elements are equal.
- Addition and subtraction: the sum of two matrices is another matrix having elements that are the sum of the corresponding elements in the two matrices.
- Multiplication of a matrix by a scalar.
- Multiplication of a matrix by a matrix.
- Identity matrix $I$ ($r \times r$) and the $J$ ($r \times r$) matrix.
- Rank of a matrix: the maximum number of linearly independent columns.
- Inverse of a matrix: $A^{-1}A = AA^{-1} = I$.

Matrix Approach to Simple Linear Regression

Why important?
- Concise representation.
- Very useful for multiple regression.
- Easy to program and analyze large data sets in powerful software such as SAS, MATLAB etc. Excel and Minitab also have matrix capabilities.

Model and data set representation

$Y = X\beta + \varepsilon$, with $Y$ of dimension $n \times 1$, $X$ of dimension $n \times 2$, $\beta$ of dimension $2 \times 1$ and $\varepsilon$ of dimension $n \times 1$.

Note that these are matrices. All the properties of the simple linear regression model can be derived with this representation.

Some important formulae in matrix format

- $b = (X'X)^{-1} X'Y$
- $\hat{Y} = Xb = HY$, with $H = X(X'X)^{-1}X'$
- $e = Y - \hat{Y} = (I - H)Y$
- $SSE = e'e = Y'(I - H)Y$
- $SSTO = Y'\left[I - \tfrac{1}{n}J\right]Y$
- $SSR = SSTO - SSE = Y'\left[H - \tfrac{1}{n}J\right]Y$

$H$ is idempotent. It is the Hat Matrix. $J$ is a square matrix of all 1s. Quadratic forms! Each of the A matrices is symmetric.

Derivation of $b$:

$Q = (Y - X\beta)'(Y - X\beta) = Y'Y - 2\beta'X'Y + \beta'X'X\beta$

$\dfrac{dQ}{d\beta} = -2X'Y + 2X'X\beta$

This derivative becomes zero at $b$, where:

$-2X'Y + 2X'Xb = 0$

$X'Xb = X'Y$

$(X'X)^{-1}X'Xb = (X'X)^{-1}X'Y$

$b = (X'X)^{-1}X'Y$
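The matrix formulae can be checked by hand for the simple linear case, where $X'X$ is only $2 \times 2$. The sketch below uses plain lists rather than a matrix library so that every step is visible; the data are hypothetical and the helper names are illustrative.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical data, NOT the lecture's GMAT table.
xs = [400, 450, 500, 550, 600]
ys = [1.8, 2.1, 2.6, 2.8, 3.2]

X = [[1, x] for x in xs]          # n x 2 design matrix with intercept column
Y = [[y] for y in ys]             # n x 1 response vector
Xt = transpose(X)
XtX_inv = inverse_2x2(matmul(Xt, X))

b = matmul(XtX_inv, matmul(Xt, Y))      # b = (X'X)^-1 X'Y
H = matmul(matmul(X, XtX_inv), Xt)      # hat matrix H = X (X'X)^-1 X'
HH = matmul(H, H)                       # should equal H (idempotent)
print(b)
```

For real work a linear algebra library would solve $X'Xb = X'Y$ without forming the inverse explicitly, which is numerically safer when predictors are on very different scales.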

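[Worked example: $b = (X'X)^{-1}X'Y$ computed for a small data set; the matrices are not recoverable.]

KNN Ch. 6; CC Ch.

Multiple Regression: An Extension of Simple Linear Regression

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$

Interpretation of parameters is important: for example, how would you interpret $\beta_1$ in the above model?

Can be expressed in short form as

$Y_i = \beta_0 + \sum_{k=1}^{p-1} \beta_k X_{ik} + \varepsilon_i$

The geometric interpretation is a Response Surface.

Meaning of the $Y$-intercept $\beta_0$: if the scope of the model includes $X_1 = 0$, $X_2 = 0$, etc., then $\beta_0$ is the mean response $E\{Y\}$ at $X_1 = 0$, $X_2 = 0$, etc. Otherwise, the $Y$-intercept has no particular meaning.

Meaning of the slope $\beta_k$: indicates the change in the mean response $E\{Y\}$ (expected change in $Y$) per unit increase in $X_k$, when all the other predictors are held constant.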

The Matrix Representation

$Y = X\beta + \varepsilon$

- Polynomial terms
- Qualitative variables
- Non-linear? Is this allowed?

Formulae for Simple Regression Apply

$b = (X'X)^{-1}X'Y$, where $b$ is now $p \times 1$.

$H$ is idempotent. It is the Hat Matrix. Quadratic forms! Each of the A matrices is symmetric.

Tests, Estimation and Diagnostics

All tests and diagnostics are similar to simple regression:
- F-test for regression
- $R^2$ and adjusted $R^2$
- Estimation of mean response and prediction of a new observation
- Simultaneous CIs for several mean responses: Working-Hotelling or Bonferroni
- Prediction of the mean of $m$ new observations at $X_h$
- Prediction of $g$ new observations: Scheffé or Bonferroni
- 3-D scatter plots
- Residual plots
- Correlation test for normality
- Brown-Forsythe (Modified Levene) test for heteroscedasticity
- Breusch-Pagan test for heteroscedasticity
- F-test for lack of fit
- Finally, the Box-Cox procedure as a remedial measure

Caution should be exercised for the prediction not to fall outside of the scope of the model (the observed range of the predictor variables). The point shown in the figure is within the ranges of $X_1$ and $X_2$ individually, but is well outside the joint region of observations. What to do? Wait until we get to leverage values (KNN ch. 10).

[Figure: region covered by $X_1$ and $X_2$ jointly, with the individual range of $X_1$ and the individual range of $X_2$ marked.]

A Different Perspective

A bivariate MR model with standardized variables:

$\hat{z}_Y = \beta_{Y1.2}\, z_1 + \beta_{Y2.1}\, z_2$

where the $\beta$'s are standardized partial regression coefficients and are given as

$\beta_{Y1.2} = \dfrac{r_{Y1} - r_{Y2}\, r_{12}}{1 - r_{12}^2}, \qquad \beta_{Y2.1} = \dfrac{r_{Y2} - r_{Y1}\, r_{12}}{1 - r_{12}^2}$

Note that $b_{Y1.2} = \beta_{Y1.2} \cdot s_Y / s_1$ and $b_{Y2.1} = \beta_{Y2.1} \cdot s_Y / s_2$.

The term "partial" is used because the terms have been adjusted to allow for the correlation between the independent variables. (Check by substituting $r_{12} = 0$.)
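The check suggested above (substituting $r_{12} = 0$) is easy to run. The correlations in this sketch are hypothetical and the function name is illustrative.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized partial regression coefficients for two predictors."""
    denom = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


# Uncorrelated predictors: each beta equals the simple correlation with Y.
uncorrelated = standardized_betas(0.5, 0.3, 0.0)

# Hypothetical correlated predictors: both betas shrink below the simple r.
correlated = standardized_betas(0.6, 0.5, 0.4)
print(uncorrelated, correlated)
```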

A Different Perspective
- The Coefficient of Multiple Determination
- Semi-partial Correlation Coefficients and Venn Diagrams
- Partial Correlation Coefficients and Venn Diagrams
- Separating direct, indirect, spurious and entirely indirect effects

Multiple Regression II

KNN Ch. 7

Extra Sum of Squares

The marginal reduction in SSE when one or several predictor variables are added to the regression model, given that the other variables are already in the model. In what other, equivalent manner can you state the above?

The word "Extra" is used since we would like to know what the marginal contribution (or extra contribution) is of a variable or a set of variables when added as explanatory variables to the regression model.

Decomposition of SSR into ESS

A pictorial representation is also possible. See Fig. 7.1 of KNN.

[Figure: SSTO partitioned into $SSR(X_1)$ and $SSE(X_1)$, and into $SSR(X_1, X_2)$ and $SSE(X_1, X_2)$, with $SSR(X_2 | X_1)$ as the difference.]

Decomposition of SSR into ESS

For two or three explanatory variables the formulae are quite easy. With two variables we have

$SSR(X_2 | X_1) = SSE(X_1) - SSE(X_1, X_2) = SSR(X_1, X_2) - SSR(X_1)$

And with three variables,

$SSR(X_3 | X_1, X_2) = SSE(X_1, X_2) - SSE(X_1, X_2, X_3) = SSR(X_1, X_2, X_3) - SSR(X_1, X_2)$

Note that with three variables we may also have

$SSR(X_2, X_3 | X_1) = SSE(X_1) - SSE(X_1, X_2, X_3)$

To test the hypothesis $H_0: \beta_3 = 0$ vs. $H_a: \beta_3 \neq 0$, the test statistic is

$F^* = \dfrac{SSR(X_3 | X_1, X_2)/1}{SSE(X_1, X_2, X_3)/(n - 4)}$

To test (say) $H_0: \beta_2 = \beta_3 = 0$ vs. $H_a$: not both $\beta_2$ and $\beta_3$ equal 0, the test statistic is

$F^* = \dfrac{SSR(X_2, X_3 | X_1)/2}{SSE(X_1, X_2, X_3)/(n - 4)}$

An alternative view of the first test:
- Considering $X_3$ adjusted for $X_1$ and $X_2$ as the predictor, $SSR(X_3 | X_1, X_2)$ would be the SSR.
- Considering $Y$ adjusted for $X_1$ and $X_2$ as the response variable, $SSE(X_1, X_2)$ would be the SSTO.
- Considering $Y$ adjusted for $X_1$ and $X_2$ as the response variable, and $X_3$ adjusted for $X_1$ and $X_2$ as the predictor, $SSE(X_1, X_2, X_3)$ would be the SSE.

In general, however, we can write

$F^* = \dfrac{R_F^2 - R_R^2}{df_R - df_F} \div \dfrac{1 - R_F^2}{df_F}$

- This form is very convenient to use since we do not have to keep track of the individual sums of squares.
- Also, this form will minimize any errors due to subtraction when calculating the SSRs.

The ANOVA table with decomposition of SSR and three variables (Source of variation: Sum of squares; df; Mean squares):
- Regression: $SSR(X_1, X_2, X_3)$; 3; $MSR(X_1, X_2, X_3)$
- $X_1$: $SSR(X_1)$; 1; $MSR(X_1)$
- $X_2 | X_1$: $SSR(X_2 | X_1)$; 1; $MSR(X_2 | X_1)$
- $X_3 | X_1, X_2$: $SSR(X_3 | X_1, X_2)$; 1; $MSR(X_3 | X_1, X_2)$
- Error: $SSE(X_1, X_2, X_3)$; $n - 4$; $MSE(X_1, X_2, X_3)$
- Total: SSTO; $n - 1$
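The $R^2$ form of the general linear test needs only four numbers. As a sanity check, the sketch below applies it to the simple GMAT regression, taking the reduced model as intercept-only ($R_R^2 = 0$, $df_R = 19$) and the full model as $r^2 = .654$ with $df_F = 18$; those degrees of freedom are inferred from the example's $n - 2 = 18$. The other call uses purely hypothetical values.

```python
def general_linear_f(r2_full, r2_reduced, df_full, df_reduced):
    """F* for a full-versus-reduced model comparison, written in terms of R^2."""
    numerator = (r2_full - r2_reduced) / (df_reduced - df_full)
    denominator = (1 - r2_full) / df_full
    return numerator / denominator


# Simple regression as a special case: close to the example's F* = 34.005
# (the small gap comes from rounding r^2 to three decimals).
f_simple = general_linear_f(0.654, 0.0, 18, 19)

# Hypothetical comparison: dropping one predictor lowers R^2 from .90 to .80.
f_hypothetical = general_linear_f(0.90, 0.80, 17, 18)
print(round(f_simple, 2), round(f_hypothetical, 2))
```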

Another ANOVA Table (what is the difference?)

[Table: same layout as the previous ANOVA table (Source of variation, Sum of squares, df, Mean squares; Regression with 3 df, Error with $n - 4$ df, Total with $n - 1$ df), but with $SSR(X_1, X_2, X_3)$ decomposed with the predictors entered in a different order.]

An Example

[Minitab output: regression equation; coefficient table (Predictor, Coef, StDev, T, P); R-Sq = 95.7%, R-Sq(adj) = 95.6%; Analysis of Variance (Source, DF, SS, MS, F, P); sequential sums of squares (Source, DF, Seq SS). The remaining values are not recoverable.]

Test for a single $\beta_k = 0$, in a general model

Full model with all variables:

$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{k-1} X_{i,k-1} + \beta_k X_{ik} + \beta_{k+1} X_{i,k+1} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$

Compute $SSR(X_1, \ldots, X_{k-1}, X_k, X_{k+1}, \ldots, X_{p-1})$.

Reduced model without $X_k$:

$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{k-1} X_{i,k-1} + \beta_{k+1} X_{i,k+1} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$

Compute $SSR(X_1, \ldots, X_{k-1}, X_{k+1}, \ldots, X_{p-1})$.

The test statistic is

$F^* = \dfrac{SSR(X_k | X_1, \ldots, X_{k-1}, X_{k+1}, \ldots, X_{p-1})/1}{SSE(X_1, \ldots, X_{k-1}, X_k, X_{k+1}, \ldots, X_{p-1})/(n - p)}$

where the numerator sum of squares is $SSR(X_1, \ldots, X_{p-1}) - SSR(X_1, \ldots, X_{k-1}, X_{k+1}, \ldots, X_{p-1})$.

An Example

[Minitab output, full model: R-Sq = 95.7%, R-Sq(adj) = 95.6%. Minitab output, reduced model: R-Sq = 94.8%, R-Sq(adj) = 94.7%. Coefficient and ANOVA tables are not recoverable.]

Test for some $\beta_k = 0$, in a general model

Full model with all variables (see KNN):

$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{q-1} X_{i,q-1} + \beta_q X_{iq} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$

Compute $SSR(X_1, \ldots, X_{q-1}, X_q, \ldots, X_{p-1})$.

Reduced model without the vector $(X_q, \ldots, X_{p-1})$:

$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{q-1} X_{i,q-1} + \varepsilon_i$

Compute

$SSR(X_q, \ldots, X_{p-1} | X_1, \ldots, X_{q-1}) = SSR(X_1, \ldots, X_{q-1}, X_q, \ldots, X_{p-1}) - SSR(X_1, \ldots, X_{q-1})$

The test statistic is

$F^* = \dfrac{SSR(X_q, \ldots, X_{p-1} | X_1, \ldots, X_{q-1})/(p - q)}{SSE(X_1, \ldots, X_{p-1})/(n - p)}$

or,

$F^* = \dfrac{\left(R^2_{Y|1 \ldots p-1} - R^2_{Y|1 \ldots q-1}\right)/(p - q)}{\left(1 - R^2_{Y|1 \ldots p-1}\right)/(n - p)}$

An Example

[Minitab output, full model: R-Sq = 95.7%, R-Sq(adj) = 95.6%. Minitab output, reduced model: R-Sq = 95.3%, R-Sq(adj) = 95.3%. Coefficient and ANOVA tables are not recoverable.]

Test for $\beta_k = \beta_q$, in a general model

Full model with all variables:

$Y_i = \beta_0 + \cdots + \beta_k X_{ik} + \cdots + \beta_q X_{iq} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$

Compute $SSR(X_1, \ldots, X_k, \ldots, X_q, \ldots, X_{p-1})$.

Reduced model with $X_k + X_q$ as a single predictor:

$Y_i = \beta_0 + \cdots + \beta_c (X_{ik} + X_{iq}) + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$

Compute $SSR(X_1, \ldots, X_k + X_q, \ldots, X_{p-1})$.

$F^* = \dfrac{\left[SSR(X_1, \ldots, X_k, \ldots, X_q, \ldots, X_{p-1}) - SSR(X_1, \ldots, X_k + X_q, \ldots, X_{p-1})\right]/1}{SSE(X_1, \ldots, X_k, \ldots, X_q, \ldots, X_{p-1})/(n - p)}$

Also, when testing other linear hypotheses about the coefficients, or even the above hypothesis $\beta_k = \beta_q$, one can use the General Linear Test approach outlined in KNN.

An Example

[Minitab output, full model: R-Sq = 95.7%, R-Sq(adj) = 95.6%. Minitab output, reduced model: R-Sq = 95.7%, R-Sq(adj) = 95.6%. Coefficient and ANOVA tables are not recoverable.]

Coefficients of Partial Determination

Recall the definition of the coefficient of (multiple) determination:

$R^2 = \dfrac{SSR}{SSTO} = 1 - \dfrac{SSE}{SSTO}$

$R^2$ is the proportionate reduction in $Y$ variation when the set of $X$ variables is considered in the model.

Now consider a coefficient of partial determination: $R^2$ for a predictor, given the presence of a set of predictors in the model, measures the marginal contribution of each variable given that the others are already in the model.

A graphical representation of the strength of the relationship between $Y$ and $X_1$, adjusted for $X_2$, is provided by partial regression plots (see HW6).

For a model with two independent variables:

$r^2_{Y1.2} = \dfrac{SSR(X_1 | X_2)}{SSE(X_2)}, \qquad r^2_{Y2.1} = \dfrac{SSR(X_2 | X_1)}{SSE(X_1)}$

Interpret this.

Generalization is easy, e.g.,

$r^2_{Y1.23} = \dfrac{SSR(X_1 | X_2, X_3)}{SSE(X_2, X_3)}, \qquad r^2_{Y3.124} = \dfrac{SSR(X_3 | X_1, X_2, X_4)}{SSE(X_1, X_2, X_4)}$, etc.

Is there an alternate interpretation of the above partial coefficients? What, say, is $r_{Y1.23}$?

An Example; Another Example

[Minitab output for several regressions on the same data: regression equations, coefficient tables (Predictor, Coef, StDev, T, P), Analysis of Variance tables and sequential sums of squares. Legible fit statistics include R-Sq = 95.7% (adj 95.6%), R-Sq = 94.9% (adj 94.9%) and R-Sq = 95.6% (adj 95.5%); the remaining values are not recoverable.]

The Standardized Multiple Regression Model

Why necessary?
- Round-off errors in normal equations calculations (especially when inverting a large X'X matrix). What is the size of this inverse for a given fitted model?
- Lack of comparability of coefficients in regression models (differences in the units involved).
- Especially important in the presence of multicollinearity. The determinant of the X'X matrix is close to zero in this case.

OK. So we have a problem. How do we take care of it?
- The Correlation Transformation:
- Centering: take the difference between each observation and the average, AND
- Scaling: divide the centered observation by the standard deviation of the variable.

You must have noticed that this is nothing but regular standardization. What is the twist?

Standardization:
\[ \frac{Y_i - \bar{Y}}{s_Y}, \qquad \frac{X_{ik} - \bar{X}_k}{s_k} \quad (k = 1, \dots, p-1) \]

Correlation Transformation (the twist is the extra factor $1/\sqrt{n-1}$):
\[ Y_i' = \frac{1}{\sqrt{n-1}} \left( \frac{Y_i - \bar{Y}}{s_Y} \right), \qquad X_{ik}' = \frac{1}{\sqrt{n-1}} \left( \frac{X_{ik} - \bar{X}_k}{s_k} \right) \quad (k = 1, \dots, p-1) \]

Once we have performed the Correlation Transformation, all that remains is to obtain the new regression parameters. The standardized regression model is
\[ Y_i' = \beta_1' X_{i1}' + \dots + \beta_{p-1}' X_{i,p-1}' + \varepsilon_i' \]
where the original parameters can be recovered from the transformation:
\[ \beta_k = \frac{s_Y}{s_k} \beta_k' \quad (k = 1, \dots, p-1), \qquad \beta_0 = \bar{Y} - \beta_1 \bar{X}_1 - \dots - \beta_{p-1} \bar{X}_{p-1} \]

In matrix notation we have some interesting relationships:
\[ \underset{(p-1) \times (p-1)}{X'X} = r_{XX}, \qquad \underset{(p-1) \times 1}{X'Y} = r_{YX} \]
where $r_{XX}$ is the correlation matrix of the (untransformed) X variables and $r_{YX}$ is the vector of correlations between the (untransformed) Y and each X. WHY? Is this surprising?
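The matrix relationships can be checked numerically. The sketch below is illustrative only (the helper names and the data are made up): after the correlation transformation, each variable has a sum of squares of 1 and the sum of cross-products of two variables equals their ordinary correlation, which is why X'X becomes the correlation matrix.

```python
import math
import statistics


def correlation_transform(values):
    """Center, scale by the sample standard deviation, and divide by sqrt(n - 1)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)          # sample standard deviation (n - 1 divisor)
    scale = math.sqrt(n - 1) * sd
    return [(v - mean) / scale for v in values]


def pearson_r(a, b):
    """Ordinary Pearson correlation, computed directly for comparison."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cross = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cross / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


# Hypothetical predictor columns.
x1 = [2.0, 4.0, 5.0, 7.0, 9.0]
x2 = [1.0, 3.0, 2.0, 6.0, 8.0]
t1, t2 = correlation_transform(x1), correlation_transform(x2)

diagonal_entry = sum(a * a for a in t1)                  # diagonal of X'X
off_diagonal_entry = sum(a * b for a, b in zip(t1, t2))  # off-diagonal of X'X
print(diagonal_entry, off_diagonal_entry, pearson_r(x1, x2))
```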

An Example
[Data table: part of the original (unstandardized) data set]
[Minitab regression output: R-Sq = 95.7%, R-Sq(adj) = 95.6%]

An Example (continued)
[Data table: standardized and then correlation-transformed data]
[Minitab regression output: R-Sq = 95.7%, R-Sq(adj) = 95.6%]

Compare this to the regression model obtained from the untransformed variables. What can we say about the two models? Is there a difference in predictive power, or is there a difference in ease of interpretation? Why is b0 = 0? Just by chance?

Multicollinearity

An idealized assumption of the OLS model is that the predictor variables are uncorrelated with one another (strictly, OLS only requires that no predictor be an exact linear combination of the others). When the predictors are correlated, multicollinearity is said to exist. (Think about Venn diagrams for this.)

Note that multicollinearity is strictly a sample phenomenon. We may try to avoid it by doing controlled experiments, but in most social sciences research this is very difficult to do.

Let us first consider the case of uncorrelated predictor variables, i.e., no multicollinearity.
- Usually occurs in controlled experiments.
- In this case the R² between each pair of predictor variables is zero.
- The extra sum of squares (ESS) for each variable is the same as the regression sum of squares obtained when the response is regressed on that variable alone.

An Example
[Minitab regression output with sequential sums of squares for the model containing both predictors]

An Example (continued)
[Minitab regression output for each predictor entered alone, with the sequential sums of squares from the previous slide repeated for comparison]

Multicollinearity (Effects of)
- The regression coefficient of any independent variable cannot be interpreted as usual. One has to take into account which other correlated variables are included in the model.
- The predictive ability of the overall model is usually unaffected.
- The ESS are usually reduced to a great extent.
- The variability of the OLS regression parameter estimates is inflated. (Let us see an intuitive reason for this based on a model with p − 1 = 2.)
\[ \sigma^2\{b_1'\} = \sigma^2\{b_2'\} = \frac{(\sigma')^2}{1 - r_{12}^2} \]
As $r_{12}^2$ approaches 1, the denominator approaches zero and both variances grow without bound.

Note that the standardized regression coefficients have equal standard deviations. Will this be the case even when p − 1 = 3? Or is this just a special-case scenario?

Multicollinearity (Effects of)
- High R², but few significant t-ratios. (By now, you should be able to guess the reason for this.)
- Wider individual confidence intervals for regression parameters. (This is obvious based on what we discussed on the earlier slide.)

[Figure: confidence region for a pair of regression coefficients]
What would you conclude based on the above picture?

Multicollinearity (How to detect it?)

High R² (> 0.8), but few significant t-ratios.
Caveat: there is a particular situation in which the above occurs without any multicollinearity. Thankfully this situation never arises in practice.

High pair-wise correlation (> 0.8) between independent variables.
Caveat: this is a sufficient, but not necessary, condition. For example, consider the case where $r_{12} = 0.5$, $r_{13} = 0.5$ and $r_{23} = -0.5$. We may conclude: no multicollinearity. However, we find that R² = 1 when we regress $X_1$ on $X_2$ and $X_3$ together. This means that $X_1$ is a perfect linear combination of the two other independent variables. In fact the formula for the R² is
\[ R^2_{1 \cdot 23} = \frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2} \]
and one can readily verify that the numbers satisfy this equation.

Due to the above caveat, always examine the partial correlation coefficients.

Multicollinearity (How to detect it?)

Run auxiliary regressions, i.e. regress each of the independent variables on the other independent variables taken together, and conclude whether it is correlated with the others based on the R². With $R_k^2$ denoting the R² from regressing $X_k$ on the remaining predictors, the test statistic is
\[ F = \frac{R_k^2 / (p-2)}{(1 - R_k^2) / (n-p+1)} \]

The Condition Index (CI):
\[ CI = \sqrt{\frac{\text{Maximum eigenvalue}}{\text{Minimum eigenvalue}}} \]
If 10 ≤ CI ≤ 30, there is moderate to strong multicollinearity. CI > 30 means severe multicollinearity.

Multicollinearity (What is the remedy?)
- Rely on joint confidence intervals rather than individual ones.
- A priori information about the relationship between some independent variables? Then include it! For example, suppose $b_1 = b_2$ is known. Then use this in the regression model, which becomes $Y = b_0 + b_1 Z$, where $Z = X_1 + X_2$.
- Data pooling (usually done by combining cross-sectional and time series data). Time series data is notorious for multicollinearity.
- Delete a variable which is causing problems. Caveat: beware of specification bias, which arises when a model is incorrectly specified. For example, in order to explain consumption expenditure, we may include only income and drop wealth since it is highly correlated with income. However, economic theory may postulate that you use both variables.
- First difference transformation of variables from time series data. The regression is run on differences between successive values of the variables rather than on the original variables, e.g. $Y_t - Y_{t-1}$ and $X_t - X_{t-1}$. The logic is that even if $X_1$ and $X_2$ are correlated, there is no reason for their first differences to be correlated too. Caveat: beware of autocorrelation, which usually arises due to this procedure. Also, we lose one degree of freedom due to the differencing.
- Correlation transformation.
- Getting a new sample (Why?) and/or increasing the sample size (Why?).
- Factor Analysis, Principal Components Analysis, Ridge Regression.
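The pair-wise correlation caveat is easy to verify numerically. The sketch below simply evaluates the R² formula for regressing X1 on X2 and X3; the function name is made up for illustration. With all pair-wise correlations at only 0.5 in absolute value, the auxiliary R² is exactly 1, so the tolerance 1 − R² is zero.

```python
def r2_from_correlations(r12, r13, r23):
    """R-squared from regressing X1 on X2 and X3, using pair-wise correlations only."""
    return (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)


# The example from the text: modest pair-wise correlations, perfect collinearity.
r2 = r2_from_correlations(r12=0.5, r13=0.5, r23=-0.5)
tolerance = 1 - r2
print(r2, tolerance)
```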

An Example
[Minitab regression output with predictors Pop and Income: R-Sq = 95.3%; ANOVA table, sequential sums of squares for both orders of entry, and predicted values with 95% CI and 95% PI]
The correlation between Pop and Income is 0.997.

An Example (continued)
- High R².
- Low t-value for one of the slope coefficients.
- Low ESS for that variable (i.e. its SSR given the other predictor).
Clearly, that variable contributes little to the model. Really? Look at its SSR when it is entered alone: it is humungous! A clear case of multicollinearity. Of course we knew the correlation between the two predictors, and this should have made us suspect that something was amiss.

Multicollinearity (Specification Bias)

Types of specification errors:
- Omitting a relevant variable
- Including an unnecessary or irrelevant variable
- Incorrect functional form
- Errors-of-measurement bias
- Incorrect specification of the stochastic error term (this is a model mis-specification error)

More on omitting a relevant variable (under-fitting)

True model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
Fitted model: $Y = \alpha_0 + \alpha_1 X_1 + v$

Consequences of omission:
1. If $r_{12}$ is non-zero, then the estimators of $\beta_0$ and $\beta_1$ are biased and inconsistent.
2. The variance of the estimator of $\alpha_1$ is a biased estimate of the variance of the estimator of $\beta_1$.
3. $\sigma^2$ is incorrectly estimated, so CIs and hypothesis tests are misleading.
4. $E(\hat{\alpha}_1) = \beta_1 + \beta_2 b_{21}$, where $b_{21}$ is the slope from regressing $X_2$ on $X_1$.

KNN Ch. 8

A visual guide to Polynomial Regression and Dummy variables

Quadratic regression model:
\[ \hat{Y} = b_0 + b_1 X + b_2 X^2 \]

3rd-order regression model:
\[ \hat{Y} = b_0 + b_1 X + b_2 X^2 + b_3 X^3 \]

Dummy, or indicator, variables allow for the inclusion of qualitative variables in the model. For example:
\[ I = \begin{cases} 1 & \text{if female} \\ 0 & \text{if male} \end{cases} \]

Model with indicator variable:
\[ \hat{Y} = b_0 + b_1 X + b_2 I \]
Rewrite the model as:
For I = 0, $\hat{Y} = b_0 + b_1 X$
For I = 1, $\hat{Y} = (b_0 + b_2) + b_1 X$

Model with indicator variable and interaction:
\[ \hat{Y} = b_0 + b_1 X + b_2 I + b_3 X I \]
Rewrite the model as:
For I = 0, $\hat{Y} = b_0 + b_1 X$
For I = 1, $\hat{Y} = (b_0 + b_2) + (b_1 + b_3) X$

The problem: show the relationship between Price and Square Feet in 3 neighborhoods.

The solution:
- Come up with good dummy variables and describe each case
- Come up with a full model
- Rewrite the full model as many times as necessary

[Data table: Price and Square Feet (Sq. Ft.) by neighborhood; N = neighborhood]

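# of categories − 1 = # of dummy variables
For example: to model a qualitative variable with 3 categories, you need 3 − 1 = 2 dummy variables.

Put them into the form of a question (yes or no): 1 if yes, 0 if no.
\[ N_2 = \begin{cases} 1 & \text{if neighborhood 2} \\ 0 & \text{otherwise} \end{cases} \qquad N_3 = \begin{cases} 1 & \text{if neighborhood 3} \\ 0 & \text{otherwise} \end{cases} \]

Interaction terms: Sq. Ft. × N2 and Sq. Ft. × N3.

[Data table: the coded data set, with columns for neighborhood code, N2, N3, Price, Sq. Ft., and the interaction terms]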

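Full model:
\[ \widehat{price} = b_0 + b_1\,SqFt + b_2 N_2 + b_3 N_3 + b_4\,(SqFt \cdot N_2) + b_5\,(SqFt \cdot N_3) \]

If neighborhood 1 ($N_2 = 0$, $N_3 = 0$):
\[ \widehat{price} = b_0 + b_1\,SqFt \]

If neighborhood 2 ($N_2 = 1$, $N_3 = 0$):
\[ \widehat{price} = (b_0 + b_2) + (b_1 + b_4)\,SqFt \]
Rewritten this way to emphasize the intercept and the slope.

If neighborhood 3 ($N_2 = 0$, $N_3 = 1$):
\[ \widehat{price} = (b_0 + b_3) + (b_1 + b_5)\,SqFt \]
Rewritten this way to emphasize the intercept and the slope.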
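The rewriting step can be made mechanical. The sketch below assumes the full model with dummies N2 and N3 and their interactions with square footage; the coefficient values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def neighborhood_line(coefs, neighborhood):
    """Return (intercept, slope) of the price vs. square-feet line for one neighborhood.

    coefs = (b0, b1, b2, b3, b4, b5) for the model
    price = b0 + b1*SqFt + b2*N2 + b3*N3 + b4*SqFt*N2 + b5*SqFt*N3
    """
    b0, b1, b2, b3, b4, b5 = coefs
    n2 = 1 if neighborhood == 2 else 0   # dummy: "is this neighborhood 2?"
    n3 = 1 if neighborhood == 3 else 0   # dummy: "is this neighborhood 3?"
    intercept = b0 + b2 * n2 + b3 * n3
    slope = b1 + b4 * n2 + b5 * n3
    return intercept, slope


# Hypothetical fitted coefficients; b5 = 0 means neighborhoods 1 and 3 share a slope.
coefs = (100.0, 0.5, 20.0, -10.0, 0.25, 0.0)
for n in (1, 2, 3):
    print(n, neighborhood_line(coefs, n))
```

Neighborhood 1 is the baseline category, so its line uses b0 and b1 alone; the dummy coefficients shift the intercept and the interaction coefficients shift the slope.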

Prop: Neighborhoods 1 & 2 have the same slope in the price/square footage relationship.
\[ H_0: \beta_4 = 0 \qquad H_a: \beta_4 \neq 0 \]
There is no significant difference in the slopes (or: $b_4$ is not significantly different from zero).
Perform the t-test for $b_4$ (we want to fail to reject):
- Large P-value: fail to reject the null (the hypothesis is supported, not proved)
- Small P-value: reject the null

Prop: Neighborhoods 2 & 3 have different slopes in the price/square footage relationship.
\[ H_0: \beta_4 = \beta_5 \qquad H_a: \beta_4 \neq \beta_5 \]
$b_4$ is significantly different from $b_5$.
Perform the t-test (we want to reject):
- Large P-value: fail to reject the null (the hypothesis is supported, not proved)
- Small P-value: reject the null

\[ H_0: \beta_5 = 0 \qquad H_a: \beta_5 \neq 0 \]
$b_5$ is significantly different from zero.
Perform the t-test (we want to reject):
- Large P-value: fail to reject the null (the hypothesis is supported, not proved)
- Small P-value: reject the null
(If we reject: conclude that $b_5$ is different from zero and that neighborhoods 1 & 3 have different slopes.)

The effect of winning an Oscar on the actor's life expectancy

The players:
1. Oscar statuette
2. Actors winning it
3. Stories about Oscar winners dying at a late age
4. An academic journal

Data collection

Statistical analysis

Some descriptive statistics

Life expectancy was 3.9 years longer for Academy Award winners.

We will now try to replicate this study and verify the results, using simple regression analysis techniques.

Our simple analysis failed to find significance in the effect of winning an Oscar. We did find some significance in gender. Apparently, men die before women! Did you ever wonder why?

Building the Regression Model I: Selection and Validation
KNN Ch. 9

The Model Building Process
- Collect and prepare data
- Reduction of explanatory variables for exploratory/observational studies
- Refine model and select best model
- Validate model; if it passes the checks, then adopt it
All four of the above have several intermediate steps. These are outlined in a figure in KNN Ch. 9 (page 344).

The Model Building Process: Data Collection
- Controlled experiments (levels, treatments)
- Controlled experiments with supplemental variables (incorporate uncontrollable variables in the regression model rather than in the experiment)
- Confirmatory observational studies (hypothesis testing, primary variables and risk factors)
- Exploratory observational studies (measurement errors/problems, duplication of variables, spurious variables and sample size are but some of the issues here)

The Model Building Process: Data Preparation
What are the standard techniques here? It's an easy guess: a rough-cut approach is to look at various plots and identify obvious problems such as outliers, spurious variables, etc.

Preliminary Model Investigation
- Scatter plots and residual plots (For what?)
- Functional forms and transformations (of the entire data, of some explanatory variables, or of the predicted variable?)
- Interactions and ... intuition

The Model Building Process: Reduction of Explanatory Variables
- Generally an issue for controlled experiments with supplemental variables and for exploratory observational studies.
- It is not difficult to guess that for exploratory observational studies this is more serious.
- Identification of good subsets of the explanatory variables, their functional forms and any interactions is perhaps the most difficult problem in multiple regression analysis.
- Need to be careful of specification bias and latent explanatory variables.

The Model Building Process: Model Refinement and Selection
- Diagnostics for candidate models
- Lack-of-fit tests if repeat observations are available
- The best model's number of variables should be used as a benchmark for investigating other models with a similar number of variables

Model Validation
- Robustness and usability of the regression coefficients
- Usability of the regression function. Does it all make sense?

All Possible Regressions: Variable Reduction
- Usually many explanatory variables (p − 1) are present at the outset.
- Select the best subset of these variables.
- Best: the smallest subset of variables which provides an adequate prediction of Y.
- Multicollinearity is usually a problem when all variables are in the model.
- Variable selection may be based on the coefficient of determination $R_p^2$ or on the $SSE_p$ statistic (equivalent procedures).

All Possible Regressions: Variable Reduction
- $R_p^2$ is highest (and $SSE_p$ lowest) when all the variables are in the model.
- One intends to find the point at which adding more variables causes a very small increase in $R_p^2$ or a very small decrease in $SSE_p$.
- Given a value of p, we compute the maximum of $R_p^2$ (or the minimum of $SSE_p$) and then we compare the several maxima (minima).
- See the Surgical Unit Example on page 350 of KNN.

A Simple Example
[Minitab regression output for three candidate models: R-Sq = 95.7%, R-Sq(adj) = 95.6%; R-Sq = 95.6%, R-Sq(adj) = 95.5%; R-Sq = 95.3%, R-Sq(adj) = 95.3%]

All Possible Regressions: Variable Reduction
- $R_p^2$ does not take into account the number of parameters (p) and never decreases as p increases. This is a mathematical property, but it may not make sense practically.
- However, useless explanatory variables can actually worsen the predictive power of the model. How?
- The adjusted coefficient of multiple determination always accounts for the increase in p:
\[ R_a^2 = 1 - \frac{SSE_p / (n-p)}{SSTO / (n-1)} \]
- The $R_a^2$ and $MSE_p$ criteria are equivalent.
- When can $MSE_p$ actually increase with p?

A Simple Example
[Minitab regression output for three candidate models: R-Sq = 99.3%, R-Sq(adj) = 97.%; R-Sq = 98.8%, R-Sq(adj) = 97.7%; R-Sq = 9.%, R-Sq(adj) = 88.3%]
Interesting.

All Possible Regressions: Variable Reduction
The $C_p$ criterion is concerned with the total mean squared error of the n fitted values.
The total error for any fitted value is a sum of bias and random error components: $\hat{Y}_i - \mu_i$ is the total error, where $\mu_i$ is the true mean response of Y when $X = X_i$. The bias is $E\{\hat{Y}_i\} - \mu_i$ and the random error is $\hat{Y}_i - E\{\hat{Y}_i\}$.
Then the total mean squared error is shown to be
\[ \sum_{i=1}^{n} \left[ \left( E\{\hat{Y}_i\} - \mu_i \right)^2 + \sigma^2\{\hat{Y}_i\} \right] \]
When the above is divided by the variance of the actual Y values, i.e. by $\sigma^2$, we get the criterion $\Gamma_p$. The estimator of $\Gamma_p$ is what we shall use, $C_p$:
\[ C_p = \frac{SSE_p}{MSE(X_1, \dots, X_{P-1})} - (n - 2p) \]
- Choose a model with small $C_p$.
- $C_p$ should be as close as possible to p. When all variables are included, then obviously $C_p = p$ (= P).
- If the model has very little bias, then $E\{\hat{Y}_i\} \approx \mu_i$ and $E(C_p) \approx p$.
- When we plot a line through the origin at 45° and plot the $(p, C_p)$ points, then for models with little bias the points will fall almost on the straight line; for models with substantial bias the points will fall much above the line; and if the points fall below the line, then such models have no bias but just some random sampling error.
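The two criteria on this page are one-line formulas, sketched below with hypothetical numbers. The function names are illustrative; `p` counts all parameters in the candidate model (including the intercept) and `mse_full` is the MSE of the model containing every candidate variable.

```python
def adjusted_r2(sse_p, ssto, n, p):
    """Adjusted coefficient of multiple determination for a model with p parameters."""
    return 1 - (sse_p / (n - p)) / (ssto / (n - 1))


def mallows_cp(sse_p, mse_full, n, p):
    """Mallows' C_p for a candidate model with p parameters."""
    return sse_p / mse_full - (n - 2 * p)


# Hypothetical full model: n = 30 cases, P = 5 parameters, SSE = 50, SSTO = 500.
n, big_p, sse_full, ssto = 30, 5, 50.0, 500.0
mse_full = sse_full / (n - big_p)

print(adjusted_r2(sse_full, ssto, n, big_p))
# For the full model itself, C_p equals P by construction.
print(mallows_cp(sse_full, mse_full, n, big_p))
# A hypothetical 3-parameter subset with SSE = 58.
print(mallows_cp(58.0, mse_full, n, 3))
```

In this made-up case the 3-parameter subset has C_p = 5, well above p = 3, which under the rule above would suggest noticeable bias.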

All Possible Regressions: Variable Reduction
The $PRESS_p$ criterion:
\[ PRESS_p = \sum_{i=1}^{n} \left( Y_i - \hat{Y}_{i(i)} \right)^2 \]
- $\hat{Y}_{i(i)}$ is the predicted value of $Y_i$ when the i-th observation is not in the dataset.
- Choose models with small values of $PRESS_p$.
- It may seem that one will have to run n separate regressions in order to calculate $PRESS_p$. Not so, as we will see later.

Best Subsets Algorithm
- Best subsets (a limited number) are identified according to pre-specified criteria.
- Requires much less computational effort than evaluating all possible subsets.
- Provides good subsets along with the best one, which is quite useful.
- When the pool of variables is large, this algorithm can run out of steam. What then? We will see in the ensuing discussion.

A Simple Example
[Minitab Best Subsets Regression output for two response variables, with columns Vars, R-Sq, Adj. R-Sq, C-p and s. Note: s is the square root of $MSE_p$.]

Forward Stepwise Regression
- An iterative procedure.
- Based on the partial F* or t* statistic, one decides whether to add a variable or not.
- One variable at a time is considered.

Before we see the actual algorithm, here are some levers:
- Minimum acceptable F to enter ($F_E$)
- Minimum acceptable F to remove ($F_R$)
- Minimum acceptable tolerance ($T_{min}$)
- Maximum number of iterations (N)

And here is the general form of the test statistic:
\[ F_k^* = \frac{MSR(X_k \mid \text{other X's already in the model})}{MSE(X_k, \text{other X's already in the model})} = \left( \frac{b_k}{s\{b_k\}} \right)^2 \]
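The remark that PRESS does not need n separate regressions rests on the identity that a deleted residual equals e_i / (1 − h_ii), where h_ii is the leverage. The sketch below checks this for simple linear regression, where the leverage has the closed form 1/n + (x_i − x̄)² / S_xx. The helper names and the data are made up for illustration.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def press_by_leverage(x, y):
    """PRESS from a single fit, using deleted residual = e_i / (1 - h_ii)."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b0, b1 = fit_line(x, y)
    total = 0.0
    for xi, yi in zip(x, y):
        residual = yi - (b0 + b1 * xi)
        leverage = 1 / n + (xi - xbar) ** 2 / sxx
        total += (residual / (1 - leverage)) ** 2
    return total


def press_by_refitting(x, y):
    """PRESS the slow way: drop each case, refit, and predict the dropped case."""
    total = 0.0
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
print(press_by_leverage(x, y), press_by_refitting(x, y))
```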

Forward Stepwise Regression
The procedure:
1. Run a simple linear regression of Y on each of the X variables.
2. If none of the individual F values is larger than the cut-off value $F_E$, then stop. Else, enter the variable with the largest F.
3. Now run the regression of Y on each of the remaining variables, given that the variable entered in step 2 is already in the model.
4. Repeat step 2. If a candidate is found, then check for tolerance. If the tolerance $(1 - R_k^2)$ is not larger than the cut-off tolerance value $T_{min}$, then choose a different candidate. If none is available, then terminate. Else, add the candidate variable.
5. Calculate the partial F for the variable entered in step 2, given that the variable entered in step 4 is already in the model. Check if this F is less than $F_R$. If so, then remove the variable entered in step 2. Else keep it. Check if the number of iterations is equal to N. If yes, terminate. If not, then proceed to the next step.
6. Check from the earlier results which is the next candidate variable to enter. If the number of iterations is exceeded, then terminate.

Other Stepwise Regression Procedures
- Backward Stepwise Regression: the exact opposite of the forward procedure. Sometimes preferred to forward stepwise. Think about how this procedure would work. Why, or under which conditions, would you use it instead of forward stepwise?
- Forward Selection: similar to forward stepwise, except that the variable-dropping part is not present.
- Backward Elimination: similar to backward stepwise, except that the variable-adding part is not present.

An Example
Let us go through the example (Fig. 9.7 on page 366 of KNN).

Akaike Information Criterion (AIC)
- Imposes a penalty for adding regressors.
- $AIC = e^{2p/n} \, SSE_p / n$, where $2p/n$ is the penalty factor.
- Harsher penalty than $R_a^2$ (How?).
- The model with the lowest AIC is preferred.
- AIC is used for in-sample and out-of-sample forecasting performance measurement.
- Useful for nested and non-nested models and for determining the lag length in autoregressive models.

Schwarz Information Criterion (SIC)
- $SIC = n^{p/n} \, SSE_p / n$
- Similar to AIC.
- Imposes a stricter penalty than AIC.
- Has similar advantages to AIC.

Model Validation
Checking the prediction ability of the model. Methods for model validation:
1. Collection of new data:
- We select a new sample of size $n^*$ with the same variables;
- Compute the mean squared prediction error:
\[ MSPR = \frac{\sum_{i=1}^{n^*} \left( Y_i - \hat{Y}_i \right)^2}{n^*} \]
2. Comparison of results with theoretical expectations;
3. Data splitting into two data sets: model building and validation.

Outlying Observations
KNN Ch. 10
- At times data sets have observations that are outlying or extreme.
- These outliers usually have a strong effect on the regression analysis.
- We have to identify such observations and then decide if they need to be eliminated or if their influence needs to be reduced.
- When dealing with more than one variable, simple plots (boxplots, scatterplots, etc.) may not be useful for identifying outliers, and we have to use the residuals or functions of the residuals. We will now look at some of these functions.
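The two information criteria can be compared side by side. The sketch below assumes the multiplicative forms AIC = e^(2p/n) · SSE_p / n and SIC = n^(p/n) · SSE_p / n; the function names and the numbers are hypothetical. Because n^(p/n) = e^(p ln(n)/n), the SIC penalty exceeds the AIC penalty whenever ln(n) > 2, i.e. for n of 8 or more.

```python
import math


def aic(sse_p, n, p):
    """Akaike information criterion in its multiplicative form."""
    return math.exp(2 * p / n) * sse_p / n


def sic(sse_p, n, p):
    """Schwarz information criterion in its multiplicative form."""
    return n ** (p / n) * sse_p / n


# Hypothetical candidate models fitted to n = 50 cases.
n = 50
small_model = {"p": 3, "sse": 100.0}
large_model = {"p": 6, "sse": 96.0}

for name, m in (("small", small_model), ("large", large_model)):
    print(name, aic(m["sse"], n, m["p"]), sic(m["sse"], n, m["p"]))
```

With these made-up numbers the small reduction in SSE does not pay for three extra parameters under either criterion, so the smaller model would be preferred.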

Residuals and Semistudentized Residuals
Previously, we examined:
Residuals: $e_i = Y_i - \hat{Y}_i$
Semistudentized residuals: $e_i^* = \dfrac{e_i}{\sqrt{MSE}}$
We will now introduce a few refinements that are more effective in identifying Y outliers. First we need to recall the hat matrix.

Leverages
We previously defined the hat matrix as
\[ H = X (X'X)^{-1} X' \]
Using the hat matrix, $\hat{Y} = HY$ and $e = (I - H)Y$.
The diagonal elements of the hat matrix, $h_{ii}$, with $0 \le h_{ii} \le 1$, are called leverages.
- These are used to detect influential X observations.
- Leverage values are useful for detecting hidden extrapolations when p > 3.

Measures for Y-outlier detection
An estimator of the standard deviation of the i-th residual is
\[ s\{e_i\} = \sqrt{MSE (1 - h_{ii})} \]
Therefore, dividing each residual by its standard deviation, we obtain the studentized residuals:
\[ r_i = \frac{e_i}{\sqrt{MSE (1 - h_{ii})}} \]

Measures for Y-outlier detection
Another effective measure for Y-outlier identification is obtained when we delete observation i, fit the regression function to the remaining n − 1 observations, and obtain the expected value for that observation given its X levels. The difference between the predicted and the actually observed value produces a deleted residual. This can also be expressed using a leverage value.
Deleted residuals:
\[ d_i = Y_i - \hat{Y}_{i(i)} = \frac{e_i}{1 - h_{ii}} \]
Studentized deleted residuals:
\[ t_i = \frac{d_i}{s\{d_i\}} = \frac{e_i}{\sqrt{MSE_{(i)} (1 - h_{ii})}} = e_i \left[ \frac{n - p - 1}{SSE (1 - h_{ii}) - e_i^2} \right]^{1/2} \sim t(n - p - 1) \]

Detection of Outlying Y Observations
Criterion for outliers: in order to establish that the i-th observation is a Y outlier, we have to compare the value of $|t_i|$ with $t(1 - \alpha/2n;\ n - p - 1)$, the $100(1 - \alpha/2n)$-th percentile of the t distribution with (n − p − 1) degrees of freedom.

Outlying X Observations
The average leverage value is
\[ \bar{h} = \frac{\sum_i h_{ii}}{n} = \frac{p}{n} \]
Criterion for outliers: if $h_{ii} > 2p/n$, then observation i is an X outlier.

A Simple Example
[Data table with columns including Pop, Income, Land and Beds]

A Simple Example (continued)
[Minitab regression output: R-Sq = 95.7%, R-Sq(adj) = 95.6%; ANOVA table; case-wise listing with columns Pred., Resid., Stud. Res., Del. Stud. Res. and h]

Influence of Outlying X/Y Observations
Influence on a single fitted value: the influence that case i has on the fitted value $\hat{Y}_i$. Omission is the test: if exclusion causes major changes in the fitted regression function, then the case is indeed influential.
\[ DFFITS_i = \frac{\hat{Y}_i - \hat{Y}_{i(i)}}{\sqrt{MSE_{(i)} h_{ii}}} = t_i \left( \frac{h_{ii}}{1 - h_{ii}} \right)^{1/2} \]
Criteria for influential observations:
- if $|DFFITS_i| > 1$ (small to medium data sets), or
- if $|DFFITS_i| > 2\sqrt{p/n}$ (large data sets).

Influence of Outlying X/Y Observations
An aggregate measure is also required: one which measures the effect of the omission of case i on all n fitted values, not just the i-th fitted value. The statistic is Cook's Distance:
\[ D_i = \frac{\sum_{j=1}^{n} \left( \hat{Y}_j - \hat{Y}_{j(i)} \right)^2}{p \, MSE} = \frac{e_i^2}{p \, MSE} \left[ \frac{h_{ii}}{(1 - h_{ii})^2} \right] \]
Criterion for influential observations: compare $D_i$ with the F distribution with (p, n − p) degrees of freedom. If the percentile (that $D_i$ cuts off from the left side of the distribution curve) is 10 or 20, the observation has little influence; if this percentile is 50 or more, the influence is large.

Influence of outliers on betas
Another measure is required: one which measures the effect of the omission of case i on the OLS estimates of the regression coefficients (betas).
\[ DFBETAS_{k(i)} = \frac{b_k - b_{k(i)}}{\sqrt{MSE_{(i)} \, c_{kk}}} \]
Here, $c_{kk}$ is the k-th diagonal element of $(X'X)^{-1}$.
Criteria for influential observations:
- if $|DFBETAS| > 1$ for small data sets, or
- if $|DFBETAS| > 2/\sqrt{n}$ for large data sets.

A Simple Example
[Minitab regression output: R-Sq = 95.7%, R-Sq(adj) = 95.6%; ANOVA table; case-wise listing with columns Resid., Stud. Res., Del. Stud. Res., h, DFFITS and COOKD]
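The two expressions for Cook's Distance (the definition in terms of shifted fitted values and the shortcut in terms of residuals and leverages) can be checked against each other. The sketch below does this for simple linear regression (p = 2), with made-up data in which the last case sits far from the others in X. Helper names are illustrative.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def cooks_by_formula(x, y):
    """Cook's D from one fit: e_i^2 * h_ii / (p * MSE * (1 - h_ii)^2)."""
    n, p = len(x), 2
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b0, b1 = fit_line(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - xbar) ** 2 / sxx      # leverage of this case
        distances.append(e * e * h / (p * mse * (1 - h) ** 2))
    return distances


def cooks_by_refitting(x, y):
    """Cook's D from its definition: refit without case i, compare all fitted values."""
    n, p = len(x), 2
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)
    distances = []
    for i in range(n):
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        shift = sum((fi - (c0 + c1 * xi)) ** 2 for xi, fi in zip(x, fitted))
        distances.append(shift / (p * mse))
    return distances


# Hypothetical data; the last case has an unusual X value.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 7.0]
d_formula = cooks_by_formula(x, y)
d_refit = cooks_by_refitting(x, y)
print(d_formula)
```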

[Diagram: Quantitative Forecasting splits into Time Series Models (Moving Average, Exponential Smoothing, Trend Models) and Causal Models (Regression)]
(KNN)

Time series data is a sequence of observations collected from a process with equally spaced periods of time.

Contrary to the restrictions placed on cross-sectional data, the major purpose of forecasting with time series is to extrapolate beyond the range of the explanatory variables.

[Diagram: Time Series, then "Trend?". If no: Smoothing Methods (Moving Average, Exponential Smoothing). If yes: Trend Models (Linear, Quadratic, Exponential, Autoregressive)]

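Regression Model with AR(1) error
\[ Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t, \qquad \varepsilon_t = \rho \, \varepsilon_{t-1} + u_t \]
- The errors $u_t$ are independent and normally distributed, $N(0, \sigma^2)$.
- The autoregressive parameter $\rho$ has $|\rho| < 1$.

Multiple Regression Model with AR(1) error
The previous simple regression model can be expanded to accommodate multiple predictors:
\[ Y_t = \beta_0 + \beta_1 X_{t1} + \dots + \beta_{p-1} X_{t,p-1} + \varepsilon_t, \qquad \varepsilon_t = \rho \, \varepsilon_{t-1} + u_t \]

Autoregressive expansion
The autocorrelation parameter $\rho$ is the correlation coefficient between adjacent error terms.
Expanding the definition of $\varepsilon_t$:
\[ \varepsilon_t = \rho \, \varepsilon_{t-1} + u_t = \rho (\rho \, \varepsilon_{t-2} + u_{t-1}) + u_t = \rho^2 \varepsilon_{t-2} + \rho \, u_{t-1} + u_t = \rho^3 \varepsilon_{t-3} + \rho^2 u_{t-2} + \rho \, u_{t-1} + u_t \]
The leading term is the autoregressive component; the remaining terms form the random error component.

Autoregressive expansion
The correlation coefficient diminishes over time, since $|\rho| < 1$. This is why an ACF plot exhibits a diminishing correlation pattern for AR(1) models.
[Figure: ACF and PACF plots]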

Remedial measures for AR errors in regression models
- Cochrane-Orcutt procedure
- Hildreth-Lu procedure
- First differences procedure
All estimates are close to each other; the last procedure is the simplest.

First Differences procedure
\[ Y_t' = Y_t - Y_{t-1}, \qquad X_t' = X_t - X_{t-1} \]
\[ Y_t' = \beta_1' X_t' + u_t \quad \text{(regression through the origin)} \]
Back transformations:
\[ \hat{Y} = b_0 + b_1 X, \qquad b_0 = \bar{Y} - b_1' \bar{X}, \qquad b_1 = b_1' \]

The Blaisdell Company Example (Blaisdell.xls)
[Data table with columns Year, Quarter, t, CompanySales and IndustrySales]

The Blaisdell Company Example
[Fitted first-differences model (regression through the origin) and its back transformation to the original variables]
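The first differences procedure and its back transformation can be sketched as follows. This is an illustrative implementation with a made-up function name and toy data, not the Blaisdell series: the data lie exactly on Y = 3 + 2X, so the procedure should recover those coefficients.

```python
def first_differences_fit(x, y):
    """Fit Y'_t = b1' X'_t through the origin on first differences,
    then back-transform to intercept and slope on the original scale."""
    dx = [x[t] - x[t - 1] for t in range(1, len(x))]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    # Regression through the origin: slope = sum(x'y') / sum(x'^2).
    b1_prime = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    b1 = b1_prime                       # slope is unchanged by the back transformation
    b0 = ybar - b1_prime * xbar         # intercept recovered from the original means
    return b0, b1


# Toy series on an exact line, used only to check the arithmetic.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [3.0 + 2.0 * xi for xi in x]
b0, b1 = first_differences_fit(x, y)
print(b0, b1)
```

Note that one observation is lost to differencing, so the through-origin fit uses n − 1 cases.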

Forecasting
- Forecasts obtained with autoregressive error regression models are conditional on the past observations.
- Using recursive relations, two- or three-step-ahead forecasts can be obtained, but the prediction intervals will expand very fast.

Regression Models with Binary Response Variable
KNN Ch. 14
In many applications the response variable has only two possible outcomes (0/1):
- In a study of liability insurance possession, using age of head of household, amount of liquid assets, and type of occupation of head of household as predictors, the response variable had two possible outcomes: household has liability insurance (= 1), or household does not have liability insurance (= 0).
- The financial status of a firm (sound status, headed toward insolvency) can be coded as 0/1.
- Blood pressure status (high blood pressure, not high blood pressure) can be coded as 0/1.

Meaning of the Response Function for Binary Outcomes
Consider the simple linear regression model
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad Y_i = 0, 1 \]
\[ E\{Y_i\} = \beta_0 + \beta_1 X_i \]
In this case, the expected response $E\{Y_i\}$ has a special meaning. Consider $Y_i$ to be a Bernoulli random variable:
\[ P(Y_i = 1) = \pi_i, \qquad P(Y_i = 0) = 1 - \pi_i \]

Meaning of the Response Function for Binary Outcomes
Using the definition of the expected value of a random variable,
\[ E\{Y_i\} = 1(\pi_i) + 0(1 - \pi_i) = \pi_i = \beta_0 + \beta_1 X_i \]
Therefore, the mean response $E\{Y_i\}$ is the probability that $Y_i = 1$ when the level of the predictor variable is $X_i$.
[Figure: plot of the response function E{Y} = b0 + b1 X, with E{Y} bounded between 0 and 1]

Problems when the Response Variable is Binary
1. Error terms are not normal: at each X level, the error cannot be normally distributed since it takes only 2 possible values, depending on whether Y is 0 or 1.
2. Error variance is not constant: the error variance is a function of X, therefore not constant.
3. Constraints on the response function: we need to find response functions that do not exceed the value of 1 (or fall below 0), and that is not easy.

Link Functions
Cumulative distribution functions have a sigmoid shape that can be helpful as a response function of a regression model with a binary outcome. The inverse of such a function is called a link function.
We want to choose a link function that best fits our data. Goodness-of-fit statistics can be used to compare fits using different link functions:

Name: logit. Link function: $g(\pi) = \log(\pi / (1 - \pi))$. Distribution: logistic. Mean: 0. Variance: $\pi^2 / 3$.
Name: normit/probit. Link function: $g(\pi) = \Phi^{-1}(\pi)$. Distribution: normal. Mean: 0. Variance: 1.
Name: gompit. Link function: $g(\pi) = \log(-\log(1 - \pi))$. Distribution: Gumbel. Mean: $-\gamma$ (Euler constant). Variance: $\pi^2 / 6$.

(In the variance column, $\pi$ denotes the mathematical constant, not a probability.)

Logit transformation
Assumption: the logit transformation of the probabilities of the target value results in a linear relationship with the input variables.

Linear Regression
- Target is an interval variable.
- Input variables have any measurement level.
- Predicted values are the mean of the target variable at the given values of the input variables.

Logistic Regression
- Target is a discrete (binary or ordinal) variable.
- Input variables have any measurement level.
- Predicted values are the probability of a particular level(s) of the target variable at the given values of the input variables.

Interpretation of Parameter Estimates
The interpretation of the parameter estimates depends on:
- The link function
- The reference event (1 or 0)
- The reference factor levels (for numerical factors, the reference level is the smallest value)

The logit link function provides the most natural interpretation of the estimated coefficients:
- The odds of a reference event is the ratio of P(event) to P(not event).
- The estimated coefficient of a predictor (factor or covariate) is the estimated change in the log of P(event)/P(not event) for each unit change in the predictor, assuming the other predictors remain constant.

Generalized Linear Model
\[ E(Y \mid X = x) = g(x; w), \qquad g^{-1}\big( E(Y \mid X = x) \big) = w_0 + w_1 x_1 + \dots + w_p x_p \]
With the logit link, the log odds are linear in the inputs:
\[ \operatorname{logit}(p) = \log\left( \frac{p}{1 - p} \right) = w_0 + w_1 x_1 + \dots + w_p x_p \]
[Figure: fitted probability p (0 to 1.0) and logit(p) plotted against an input, over the training data]
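The log-odds interpretation can be illustrated with a one-predictor logistic model. The weights below are hypothetical, chosen only for the demonstration: raising the predictor by one unit adds w1 to the log odds, so it multiplies the odds by exp(w1) regardless of the starting value.

```python
import math


def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def inv_logit(z):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1 / (1 + math.exp(-z))


# Hypothetical fitted weights for a single predictor x.
w0, w1 = -2.0, 0.7


def odds(x):
    """Odds P(event) / P(not event) at predictor value x."""
    p = inv_logit(w0 + w1 * x)
    return p / (1 - p)


odds_ratio = odds(3.0) / odds(2.0)   # effect of a one-unit increase in x
print(odds_ratio, math.exp(w1))
```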

Odds ratio
\[ \log\left( \frac{p}{1 - p} \right) = w_0 + w_1 x_1 + \dots + w_p x_p \]
A unit change in $x_1$ gives
\[ \log\left( \frac{p'}{1 - p'} \right) = w_0 + w_1 (x_1 + 1) + \dots + w_p x_p = w_1 + \log\left( \frac{p}{1 - p} \right) \]
so the odds are multiplied by $\exp(w_1)$, the odds ratio.

Diagnostics

To identify: poorly fit factor/covariate patterns. Use: Pearson residual. Which measures: the difference between the actual and the predicted observation.
To identify: factor/covariate patterns with a strong influence on parameter estimates. Use: delta beta. Which measures: changes in the coefficients when the j-th factor/covariate pattern is removed, based on Pearson residuals.
To identify: factor/covariate patterns with a large leverage. Use: leverage (Hi). Which measures: leverages of the j-th factor/covariate pattern, a measure of how unusual the predictor values are.

HMEQ Overview
- Determine who should be approved for a home equity loan.
- The target variable is a binary variable that indicates whether an applicant eventually defaulted on the loan.
- The input variables are variables such as the amount of the loan, the amount due on the existing mortgage, the value of the property, and the number of recent credit inquiries.

The consumer credit department of a bank wants to automate the decision-making process for approval of home equity lines of credit. To do this, they will follow the recommendations of the Equal Credit Opportunity Act to create an empirically derived and statistically sound credit scoring model. The model will be based on data collected from recent applicants granted credit through the current process of loan underwriting. The model will be built from predictive modeling tools, but the created model must be sufficiently interpretable so as to provide a reason for any adverse actions (rejections).

The HMEQ data set contains baseline and loan performance information for 5,960 recent home equity loans. The target (BAD) is a binary variable that indicates if an applicant eventually defaulted or was seriously delinquent. This adverse outcome occurred in 1,189 cases (20%). For each applicant, 12 input variables were recorded.

Name (model role, measurement level): description
- BAD (target, binary): 1 = defaulted on loan, 0 = paid back loan
- REASON (input, binary): HomeImp = home improvement, DebtCon = debt consolidation
- JOB (input, nominal): six occupational categories
- LOAN (input, interval): amount of loan request
- MORTDUE (input, interval): amount due on existing mortgage
- VALUE (input, interval): value of current property
- DEBTINC (input, interval): debt-to-income ratio
- YOJ (input, interval): years at present job
- DEROG (input, interval): number of major derogatory reports
- CLNO (input, interval): number of trade lines
- DELINQ (input, interval): number of delinquent trade lines
- CLAGE (input, interval): age of oldest trade line in months
- NINQ (input, interval): number of recent credit inquiries

The credit scoring model computes the probability of a given loan applicant defaulting on loan repayment. A threshold is selected such that all applicants whose probability of default is in excess of the threshold are recommended for rejection.

For model comparison purposes, we added two variables:
- BEHAVIOR (good/bad), which precisely mirrors the 0/1 values in BAD, to see how we can perfectly predict BAD using insider information
- FLIPCOIN (Head/Tail), which is completely random, to see if we can predict BAD using random flips of a coin

Software
- Enterprise-grade (and expensive!) data mining package
- Implemented methodology: Sample-Explore-Modify-Model-Assess (SEMMA)
- Available modeling tools: Logistic Regression and many others, such as Decision Trees, Neural Networks, Clustering, Market-Basket, etc.

Three Logistic Regression nodes were added to the Analysis Diagram. In order to compare them, a Compare node was added.

Perfect Regression is, of course, perfect.
In Baseline Regression, 20% of the borrowers default, regardless of fitted value.
Stepwise Regression is somewhere between the other two models.
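The contrast between a perfect predictor and an uninformative one can be illustrated with a small, fully synthetic Python sketch. The toy data below are not the HMEQ data: they are 20 made-up records with a 20% default rate, a BEHAVIOR column that mirrors BAD, and a FLIPCOIN column built deterministically so that it is unrelated to BAD (a stand-in for a real random coin). The function name is hypothetical.

```python
from collections import defaultdict

def default_rate_by_group(bad, group):
    """Observed default rate (mean of BAD) within each level of a categorical predictor."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for b, g in zip(bad, group):
        totals[g] += 1
        defaults[g] += b
    return {g: defaults[g] / totals[g] for g in totals}

# Toy target: 4 defaults among 20 records (20%).
bad = [1, 0, 0, 0, 0] * 4

# BEHAVIOR mirrors BAD exactly (insider information).
behavior = ["bad" if b == 1 else "good" for b in bad]

# FLIPCOIN stand-in: alternating blocks of five, so each side has the same default rate.
flipcoin = ["Head" if (i // 5) % 2 == 0 else "Tail" for i in range(len(bad))]

perfect = default_rate_by_group(bad, behavior)
baseline = default_rate_by_group(bad, flipcoin)
```

Here `perfect` separates defaulters from non-defaulters completely, while `baseline` shows the same 20% default rate on both sides of the coin, which is what "regardless of fitted value" means for the baseline model. A model built on real inputs would be expected to fall between these two extremes.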


More information

Reminder: Nested models. Lecture 9: Interactions, Quadratic terms and Splines. Effect Modification. Model 1

Reminder: Nested models. Lecture 9: Interactions, Quadratic terms and Splines. Effect Modification. Model 1 Lecture 9: Interactons, Quadratc terms and Splnes An Manchakul amancha@jhsph.edu 3 Aprl 7 Remnder: Nested models Parent model contans one set of varables Extended model adds one or more new varables to

More information

ANSWERS CHAPTER 9. TIO 9.2: If the values are the same, the difference is 0, therefore the null hypothesis cannot be rejected.

ANSWERS CHAPTER 9. TIO 9.2: If the values are the same, the difference is 0, therefore the null hypothesis cannot be rejected. ANSWERS CHAPTER 9 THINK IT OVER thnk t over TIO 9.: χ 2 k = ( f e ) = 0 e Breakng the equaton down: the test statstc for the ch-squared dstrbuton s equal to the sum over all categores of the expected frequency

More information

Chapter 8 Multivariate Regression Analysis

Chapter 8 Multivariate Regression Analysis Chapter 8 Multvarate Regresson Analyss 8.3 Multple Regresson wth K Independent Varables 8.4 Sgnfcance tests of Parameters Populaton Regresson Model For K ndependent varables, the populaton regresson and

More information

Lecture 2: Prelude to the big shrink

Lecture 2: Prelude to the big shrink Lecture 2: Prelude to the bg shrnk Last tme A slght detour wth vsualzaton tools (hey, t was the frst day... why not start out wth somethng pretty to look at?) Then, we consdered a smple 120a-style regresson

More information

Composite Hypotheses testing

Composite Hypotheses testing Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter

More information

However, since P is a symmetric idempotent matrix, of P are either 0 or 1 [Eigen-values

However, since P is a symmetric idempotent matrix, of P are either 0 or 1 [Eigen-values Fall 007 Soluton to Mdterm Examnaton STAT 7 Dr. Goel. [0 ponts] For the general lnear model = X + ε, wth uncorrelated errors havng mean zero and varance σ, suppose that the desgn matrx X s not necessarly

More information

Chapter 8 Indicator Variables

Chapter 8 Indicator Variables Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n

More information