IV. Modeling a Mean: Simple Linear Regression

1 IV. Modeling a Mean: Simple Linear Regression
We have talked about inference for a single mean, for comparing two means, and for comparing several means. What if the mean of one variable depends on the value of another continuous-type variable? In the case of a linear trend (i.e., the value of one outcome tends to increase or decrease linearly with an increase in another variable), we can fit a model that describes that trend. The tool most often used for this kind of analysis is called a linear regression model.

2 Sampling Pairs of Points
In this setting, we have n pairs of points $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$. In other words, we are measuring two variables for each subject, sampling from a population as demonstrated in the schematic below. In this case we are often interested in how the variables correlate or covary. Considering the X variable as the explanatory or independent variable, and the Y variable as the outcome or dependent variable, the question is: How does the average E(Y) depend on the value of X?
[Schematic: a population with means E(X), E(Y) and variances $\sigma_X^2$, $\sigma_Y^2$ is sampled to give $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, from which we compute the sample means $\bar{X}$, $\bar{Y}$ and sample variances $s_X^2$, $s_Y^2$.]

3 Exploratory Analysis
Stat 5100 Linear Regression and Time Series
As indicated on the previous slide, an important first step in exploring the distributions of paired observations is to use summary statistics and univariate charts (e.g., histograms, boxplots) to understand their marginal or individual distributions. Since we're interested in how the variables relate to each other, a key subsequent step is to construct a two-dimensional scatterplot, treating Y as a function of X. Here, each point in the plot corresponds to one observation in the sample, and is determined by the corresponding (X, Y) coordinate of that observation.
[Figure: plotting a point $(X_i, Y_i)$ from the sample, with X on the horizontal axis and Y on the vertical axis.]

4 Patterns of Association
A plot helps us quickly identify relationships between variables that can inform how we model the relationship. For example, what do you observe from the following plot of the % of physically active adults in each U.S. state versus average annual temperature? (Source:

5 Linear versus Nonlinear Associations
For now, as we focus on simple regression models, we will look at examples where the relationship between the two variables of interest is linear. However, in applied research settings one might observe a wide variety of patterns. What is the relationship between X and Y in the scatterplot to the right? Later, when we discuss multivariable models, we will see how such nonlinear relationships can be accommodated using a regression approach.

6 Functional versus Statistical Relationships
It is also important to distinguish between a purely functional association between two variables and what we might term a statistical association. A functional relationship is one that is deterministic, i.e., a given value of X yields the exact same value for Y whenever an experiment is repeated. An example of this is the distance Y travelled by an object in free fall over time T, which is given by $Y = \frac{1}{2} g T^2$, where g is the acceleration due to gravity at or near sea level. On the other hand, a statistical relationship is one for which a given value of X yields different values of Y for repetitions of the same experiment. That is, Y is random, and its distribution may depend upon X. In the linear regression setting, we assume that E(Y) depends linearly upon X.

7 Example IV.A
Engineers were interested in the effect that salt distributed on roadways has on salt concentration in adjacent waterways. They gathered data at 20 locations, measuring the roadway area at each site along with the salt concentration in the nearby river. The data are shown on the following slide (they are also posted on the course website in the file salt.txt). We would like to know whether or not greater roadway area is associated with higher average salt concentration. Why is it natural to designate the explanatory variable and response variable in this way?

8 Example IV.A (cont'd)
[Table: observation number (Obs), Salt Concentration, and Roadway Area for each of the 20 observations.]

9 Example IV.A (cont'd)
The scatterplot for these data is given below:
[Figure: scatterplot of SALT CONCENTRATION (vertical axis) versus ROADWAY AREA (horizontal axis).]

10 Example IV.A (cont'd)
What sort of relationship do you observe between the area of a given road and the amount of salt found in nearby waterways? Based on what we observe, we would like to answer a few questions. These might include: What is the observed average increase in salt concentration for incrementally larger roadway area? Is this average increase statistically significant? That is to say, is the observed correlation real, or can it be attributed to chance?

11 Example IV.A (cont'd)
One way of answering these sorts of questions is to fit a model. That is, since we observe a somewhat linear association (average salt concentration appears to increase linearly with increased road area), we fit such a line to the data. Knowing that a line is determined by a slope and an intercept, the question is: how do we select the best line? A statistical solution to this problem is the so-called least-squares fit, or linear regression of salt concentration on road area. The following slide shows this regression line overlaid on the scatterplot of salt versus area.

12 Example IV.A (cont'd)
[Figure: scatterplot of SALT CONCENTRATION versus ROADWAY AREA with the fitted regression line overlaid.]

13 The Model
As noted earlier, in general we sample pairs of points $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, where X is referred to as the explanatory variable and Y is the response variable. Note that this does not imply that X necessarily causes Y, although that is possible. X and Y may simply be associated, without any causative effect. We typically want to explain changes in the average of Y due to a difference in X. If X and Y appear to be linearly associated, where Y on average increases or decreases linearly with an increase in X, then we may posit the linear model $Y = \beta_0 + \beta_1 X + \varepsilon$, where the intercept $\beta_0$ and slope $\beta_1$ determine the line, and $\varepsilon$ models the variability around the line.

14 The Error Term
The scatterplot in Example IV.A is a typical representation of variables that are linearly associated: the points form a sort of cloud. That is to say, the points do not lie on a straight line, indicating that even though Y tends to increase or decrease linearly with X, a given value of X will not necessarily result in exactly the same value of Y. That is, the relationship is not deterministic. The error term $\varepsilon$ in the model accounts for this variability around the line $\beta_0 + \beta_1 X$. Note that $\beta_0 + \beta_1 X$ is fixed, not random. We further typically assume that $\varepsilon \sim N(0, \sigma^2)$. In other words, given the value X, we have $E(Y) = \beta_0 + \beta_1 X$ and $\mathrm{Var}(Y) = \sigma^2$. Therefore, $Y \sim N(\beta_0 + \beta_1 X, \sigma^2)$.
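
The distinction between the fixed line and the random error can be made concrete with a small simulation. The following is only a sketch: the parameter values (beta0 = 1.0, beta1 = 0.5, sigma = 0.3) and the function name `simulate_y` are hypothetical and do not come from the course data.

```python
import random

def simulate_y(x, beta0, beta1, sigma, rng):
    """Draw one Y from N(beta0 + beta1 * x, sigma^2)."""
    mean = beta0 + beta1 * x     # fixed (non-random) part of the model
    eps = rng.gauss(0.0, sigma)  # random error term with mean 0
    return mean + eps

rng = random.Random(0)  # seeded so the run is reproducible
# Hypothetical parameter values, chosen only for illustration.
draws = [simulate_y(2.0, beta0=1.0, beta1=0.5, sigma=0.3, rng=rng)
         for _ in range(5000)]
sample_mean = sum(draws) / len(draws)  # should land near 1.0 + 0.5 * 2.0 = 2.0
```

Repeating the "experiment" at the same X gives different values of Y each time, but their average settles near the line.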

15 The Model Parameters
What do the terms in the linear model mean? The intercept $\beta_0$ represents the average of Y for X = 0. Although the intercept is mathematically necessary in order to specify the form of the line in the model, it seldom has practical meaning. The slope $\beta_1$ is generally the focus of inference: it represents the change in the average of Y for every one-unit increase in X. Since we are interested in how Y changes with X, a nonzero slope indicates that Y and X are linearly associated. The variance term $\sigma^2$ represents the variability of the data around the line.

16 The Experiment
As always, our object is to infer something about the underlying model parameters by sampling from the population and then analyzing the data. Having posited the regression model, we can think of the sampling in this way:
[Schematic: Population: $Y = \beta_0 + \beta_1 X + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$. Sample: $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$. Estimate $\beta_0$, $\beta_1$, and $\sigma$ from the data.]

17 Example IV.B
Suppose that the purity of a chemical solution Y is related to the amount of catalyst X through a linear regression model with $\beta_0 = 123.0$, $\beta_1 = 2.16$, and with an error standard deviation of $\sigma$ = [value missing]. What is the expected value of the purity when the catalyst level is 20? How much does the average purity change when the catalyst amount is increased by 10? What is the probability that the purity is less than 60 when the catalyst level is 25?

18 In practical research settings, we do not know the actual parameter values. As indicated on the schematic two slides previous, we sample from the population with the posited regression model and then estimate the regression parameters from the data.
Example IV.C
Write out a linear model for the experiment described in Example IV.A. Clearly interpret each of the parameters of the model.

19 How do we estimate the model parameters? That is to say, what is the best line, based on the data? In statistical applications, we choose the line that achieves the minimum squared distance between itself and the collective observed data points. Note that the distance between a given value $Y_i$ and its associated point on the line is given by $Y_i - (\beta_0 + \beta_1 X_i)$. We call this the residual. We compute the estimates of the slope and intercept that minimize the sum of the squared residuals. The resulting estimates are given by
$b_1 = \dfrac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}.$
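
As a minimal sketch of these two formulas, the function below computes the slope and intercept directly from the sums. The five data points are hypothetical and chosen only so the arithmetic is easy to follow; they are not the roadway data.

```python
def least_squares(xs, ys):
    """Return (b0, b1) from the summary-sum formulas for simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of X_i * Y_i minus n * X-bar * Y-bar
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    # Denominator: sum of X_i^2 minus n * X-bar^2
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares(xs, ys)
```

For these points the sums are $\sum X_i Y_i = 66$, $\sum X_i^2 = 55$, $\bar{X} = 3$, and $\bar{Y} = 4$, so the slope is $(66 - 60)/(55 - 45) = 0.6$ and the intercept is $4 - 0.6 \cdot 3 = 2.2$.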

20 Derivation of Parameter Estimates
One way of thinking about how $b_0$ and $b_1$ are derived is to consider direct minimization of the sum of squared residuals:
$Q = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} [Y_i - (\beta_0 + \beta_1 X_i)]^2.$
We sometimes refer to Q as the objective function. How can we minimize this function with respect to $\beta_0$ and $\beta_1$? (Some discussion about this is contained in Section 1.6 of the text, although the technical details are not that important.)
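
One standard route, sketched here under the assumption that the usual calculus argument is what is intended, is to set the two partial derivatives of Q to zero and solve the resulting pair of equations (often called the normal equations):

```latex
\begin{align*}
\frac{\partial Q}{\partial \beta_0}
  &= -2 \sum_{i=1}^{n} \bigl[ Y_i - (\beta_0 + \beta_1 X_i) \bigr] = 0, \\
\frac{\partial Q}{\partial \beta_1}
  &= -2 \sum_{i=1}^{n} X_i \bigl[ Y_i - (\beta_0 + \beta_1 X_i) \bigr] = 0. \\
\intertext{Writing $b_0$, $b_1$ for the solutions gives}
\sum_{i=1}^{n} Y_i &= n b_0 + b_1 \sum_{i=1}^{n} X_i, \\
\sum_{i=1}^{n} X_i Y_i &= b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2. \\
\intertext{The first equation yields $b_0 = \bar{Y} - b_1 \bar{X}$; substituting into the second,}
\sum_{i=1}^{n} X_i Y_i &= (\bar{Y} - b_1 \bar{X})\, n \bar{X} + b_1 \sum_{i=1}^{n} X_i^2
\;\Longrightarrow\;
b_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}.
\end{align*}
```

The result agrees term by term with the slope and intercept estimates stated for the least-squares fit.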

21 Example IV.D
Using the data given in Example IV.A, fit the model specified in Example IV.C. The necessary summary statistics are given below:
[Summary statistics: $\sum X_i$, $\sum Y_i$, $\sum X_i^2$, $n = 20$, and $\sum X_i Y_i$; the numeric values are missing.]
What does the estimated intercept represent in the model fit? Interpret the estimated slope: what does it say about the observed relationship between road area and average salt concentration?

22 Interpreting the Model Fit
Note that once we have obtained our estimates of the intercept and slope, the fitted value for Y is given by $\hat{Y} = b_0 + b_1 X$. There are two ways of viewing such a fitted value: The fitted value is our predicted Y for the given X. The fitted value is our estimate of the average Y for the given X.

23 Example IV.E
Based on the model fit in Example IV.D, what is the predicted salt concentration when the adjacent road area is 0.75? What is our estimated average salt concentration level when the adjacent road area is 0.75? What is the predicted salt concentration when the road area is 2.0? Why should we be cautious about this last prediction?

24 Estimating $\sigma^2$
The last of the three parameters that we need to estimate is the model variance, which represents the variability of the Y's around the regression line. Note that the estimated residuals based upon the model fit are given by
$e_i = Y_i - \hat{Y}_i = Y_i - (b_0 + b_1 X_i), \quad i = 1, \ldots, n.$
Therefore, our estimate of the model variance $\sigma^2$ is the observed average squared residual, also called the mean square error (MSE):
$s^2 = \dfrac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2} = \dfrac{\sum_{i=1}^{n} e_i^2}{n - 2} = \dfrac{SSE}{n - 2}.$
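
A short sketch of this calculation follows. The data are hypothetical (five made-up points, not the roadway data), and the slope is computed in its centered form, which is algebraically the same as the summary-sum formula.

```python
def fit_and_mse(xs, ys):
    """Fit the least-squares line, then return (residuals, SSE, MSE)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # e_i = Y_i - (b0 + b1 * X_i)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in residuals)
    mse = sse / (n - 2)  # divide by n - 2, not n: two parameters were estimated
    return residuals, sse, mse

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
residuals, sse, mse = fit_and_mse(xs, ys)
```

Here the fitted line is $\hat{Y} = 2.2 + 0.6X$, the residuals are $-0.8, 0.6, 1.0, -0.6, -0.2$, and so SSE = 2.4 and MSE = 2.4/3 = 0.8.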

25 Example IV.F
The table below shows both the observed salt concentration and predicted salt concentration (using the fitted line) for the observations in Example IV.A.
[Table: observation number (Obs #), observed salt concentration, and predicted salt concentration for each observation.]

26 Example IV.F (cont'd)
Based on the fitted values, we can see that the residual for the first observation is -2.21, for the second observation the residual is 0.59, and so forth. The average squared residual therefore is given by
$s^2 = \dfrac{\sum_{i=1}^{20} e_i^2}{18} = \dfrac{(-2.21)^2 + (0.59)^2 + \cdots + (1.60)^2}{18}.$

27 Inference for the Slope $\beta_1$
Remember, the fundamental question in a linear regression analysis is whether the dependent and independent variables are linearly associated. As always, there are two aspects to the analysis: Do we observe a slope that is different from zero? That is, does the average of the outcome variable depend on the value of the explanatory variable? Do the data provide evidence that the slope is significantly different from zero? That is, can we infer from our data that the relationship we observe holds for the underlying population? To address the second issue, we need to know something about the distribution of our estimated slope.

28 Distribution of the Estimated Slope
Not surprisingly, it turns out that the estimated slope $b_1$ is approximately normally distributed (provided that the sample is random and, in most cases, that the sample size is sufficiently large). The mean of the distribution of $b_1$ is $\beta_1$. The estimated standard error is given by
$\mathrm{s.e.}(b_1) = s\{b_1\} = s \Big/ \Bigl( \sum_{i=1}^{n} (X_i - \bar{X})^2 \Bigr)^{1/2},$
where $s^2$ is the model MSE (our estimate of the model variance $\sigma^2$). Since we need to rely on the estimated standard error (i.e., $\sigma$ is unknown), we use the $t(n-2)$ distribution to obtain a confidence interval and hypothesis test for $\beta_1$.

29 Confidence Interval and Hypothesis Test for $\beta_1$
A $(1-\alpha)100\%$ confidence interval for $\beta_1$ is therefore given by
$b_1 \pm t(1 - \alpha/2;\, n-2)\, s\{b_1\}.$
We also would like to test the null hypothesis $H_0: \beta_1 = 0$ versus the alternative hypothesis $H_A: \beta_1 \neq 0$. A test statistic for assessing the evidence against $H_0$ is given by
$t = \dfrac{b_1 - 0}{s\{b_1\}}.$
Under $H_0$, this test statistic approximately follows the $t(n-2)$ distribution. The p-value is therefore given by $2P\{t(n-2) > |t|\}$. Note that we can conceivably test against any specific value of $\beta_1$, although 0 is generally the value of interest.
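
The pieces above can be assembled in a few lines. This is a sketch on hypothetical data (five made-up points, so n - 2 = 3), and the critical value 3.182 is the 0.975 quantile of the t distribution with 3 degrees of freedom as read from a standard t table; it is passed in as a parameter rather than computed.

```python
import math

def slope_inference(xs, ys, t_crit):
    """Return (b1, s{b1}, t statistic, confidence interval) for the slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))   # square root of the MSE
    se_b1 = s / math.sqrt(sxx)     # estimated standard error of the slope
    t_stat = (b1 - 0.0) / se_b1    # test of H0: beta1 = 0
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return b1, se_b1, t_stat, ci

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
T_CRIT = 3.182  # t(0.975; 3), taken from a t table
b1, se_b1, t_stat, ci = slope_inference(xs, ys, T_CRIT)
```

With so few points the interval is wide and contains zero, so these made-up data would not provide evidence of a linear association at the 5% level, even though the observed slope is positive.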

30 Example IV.G
Give a 95% confidence interval for the slope parameter in the model of Example IV.C, based upon the observed data given in Example IV.A. Interpret this confidence interval. State the null and alternative hypotheses for testing a linear association between road area and average salt concentration. Explain these hypotheses. Carry out a test of the null hypothesis of no linear association. What is the p-value for this test? Is there evidence of a relationship between road area and average salt concentration?

31 Measuring the Strength of Association
Note that the slope is one measure of the linear association between two continuous variables: it tells you how much the average of the outcome variable changes with respect to a one-unit increase in the explanatory variable. However, the estimated slope tells you nothing about the variability of the points about the line. Correlation is a measure of the strength of association between two variables that reflects the degree of variability around the fitted line. It is another popular summary statistic for illustrating the degree to which variables are linearly associated.

32 The slope itself does not always reflect the strength of association
For example, note that in the two plots below we observe two data sets with approximately the same estimated slope. However, the association in the first case looks much stronger, as the cloud of points clusters more tightly about the regression line.

33 The Correlation Coefficient
The correlation $\rho$ is another population parameter that we can estimate from the data. We typically use r to denote our estimate of $\rho$. The so-called correlation coefficient r has several important features: r has a range of -1 to 1. It is an index, and has no units. The closer r is to 1, the stronger the positive linear association (r = 1 indicates perfect positive correlation). The closer r is to -1, the stronger the negative linear association (r = -1 indicates perfect negative correlation). An r close to zero indicates weak linear association. If r = 0, this means no linear association. r measures linear association only. Two variables can be strongly related in a nonlinear way and nevertheless yield r close to 0.

34 Example IV.H
Plots illustrating various values of r:

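35 Computing r
Our estimated correlation coefficient for two variables X and Y is given by
$r = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \dfrac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sqrt{\bigl(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\bigr)\bigl(\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2\bigr)}}.$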

36 Example IV.I
Given the five summary statistics in Example IV.D, and that $\sum_{i=1}^{n} Y_i^2$ = [value missing], what is the correlation coefficient between salt concentration and roadway area?

37 Inference for r
To carry out a test of $H_0: \rho = 0$ versus the alternative hypothesis $H_A: \rho \neq 0$, we can use this statistic:
$t = \dfrac{r\sqrt{n-2}}{\sqrt{1 - r^2}},$
which approximately follows a $t(n-2)$ distribution. In fact, it turns out that this statistic is algebraically equivalent to the t statistic for testing that the regression slope is equal to zero. The p-value for this test is given by $2P\{t(n-2) > |t|\}$.
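
The equivalence between the correlation test and the slope test can be checked numerically. The sketch below uses hypothetical data (not the roadway data) and computes the t statistic both ways; the identity SSE = Syy - b1 * Sxy is used to shorten the slope calculation.

```python
import math

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Correlation coefficient and its t statistic
r = sxy / math.sqrt(sxx * syy)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Slope and its t statistic
b1 = sxy / sxx
sse = syy - b1 * sxy                 # residual sum of squares
se_b1 = math.sqrt(sse / (n - 2) / sxx)
t_from_slope = b1 / se_b1
```

Both routes give the same value (about 2.12 here), which is why testing $\rho = 0$ and testing $\beta_1 = 0$ lead to the same p-value.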

38 Example IV.J
Carry out a test of the null hypothesis that the salt concentration and road area are not correlated, versus the alternative hypothesis that they are correlated. What is the p-value of this test? Interpret this result in words.

39 Inference for Means and Predictions
In addition to inferences about the slope, we may also want to construct tests and confidence intervals for the regression line itself. We will talk about inference for: (1) the average of Y given a corresponding value of X, and (2) a predicted value Y given a corresponding value of X.

40 Inference for a Mean
Suppose we want to estimate the average of Y for a given value of X, denoted by $X_h$. Our estimated average is
$\hat{Y}_h = b_0 + b_1 X_h.$
This estimate has a standard error given by
$s\{\hat{Y}_h\} = s \left[ \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]^{1/2},$
where $s^2$ is the regression MSE (our estimate of the error variance $\sigma^2$).

41 Inference for a Mean, continued
If $E\{Y_h\}$ represents the actual mean of Y at the value $X_h$, then the statistic
$\dfrac{\hat{Y}_h - E\{Y_h\}}{s\{\hat{Y}_h\}}$
follows a $t(n-2)$ distribution. A $1 - \alpha$ confidence interval for the mean of Y is therefore given by
$\hat{Y}_h \pm t(1 - \alpha/2;\, n-2)\, s\{\hat{Y}_h\}.$

42 Example IV.K
For the roadway data, compute and interpret a 95% confidence interval for the average salt concentration when the corresponding roadway area is 1.0 m².

43 Inference for a Prediction
As opposed to estimating a mean, suppose instead that we want to make a prediction for a single additional observation. Again, as with the mean, our estimated predicted value for a given $X_h$ is computed as
$\hat{Y}_h = b_0 + b_1 X_h.$
However, in this case the estimated prediction has a standard error given by
$s\{\mathrm{pred}\} = s \left[ 1 + \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]^{1/2}.$
Note the difference between this standard error and the one given for an estimated mean. The extra variability arises since here we are estimating a value for a single observation as opposed to an average over many observations.

44 Inference for a Prediction, continued
If $Y_{h(\mathrm{new})}$ represents a randomly sampled value of Y for a corresponding $X_h$, then the statistic
$\dfrac{Y_{h(\mathrm{new})} - \hat{Y}_h}{s\{\mathrm{pred}\}}$
follows a $t(n-2)$ distribution. A $1 - \alpha$ confidence interval for a predicted $Y_{h(\mathrm{new})}$ is therefore given by
$\hat{Y}_h \pm t(1 - \alpha/2;\, n-2)\, s\{\mathrm{pred}\}.$
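
To see how the two standard errors differ in practice, the sketch below computes both at the same $X_h$. The data and the choice $X_h = 4$ are hypothetical, used only to make the comparison concrete.

```python
import math

def interval_standard_errors(xs, ys, x_h):
    """Return (fitted value, s{Y-hat_h}, s{pred}) at the point x_h."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    y_hat_h = b0 + b1 * x_h
    leverage = 1.0 / n + (x_h - x_bar) ** 2 / sxx
    se_mean = math.sqrt(mse * leverage)          # for the mean of Y at x_h
    se_pred = math.sqrt(mse * (1.0 + leverage))  # for one new observation
    return y_hat_h, se_mean, se_pred

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hat_h, se_mean, se_pred = interval_standard_errors(xs, ys, x_h=4)
```

The fitted value is the same in both cases; only the standard error changes. The extra "1 +" inside the prediction formula adds the full error variance of a single observation, so the prediction interval is always the wider of the two.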

45 Example IV.L
For the roadway data, compute and interpret a 95% confidence interval for the predicted salt concentration when the corresponding roadway area is 1.0 m².

46 Confidence and Prediction Bands
Researchers often find it useful to construct a confidence interval for the regression line over the entire range of X-values. We can accomplish this by computing the confidence intervals presented on the previous slides, either for the means or the predictions (depending on the investigative focus). In practice, this is generally done using computer software.

47 Example IV.M
The plot on the following slide illustrates confidence and prediction bands for the roadway data. Note the relative widths of the intervals delimited by both sets of bounds. How do you explain the wider intervals for the prediction bands?

48 [Figure: confidence and prediction bands for the roadway data.]

49 Analysis of Variance (ANOVA) for Regression
Important information about a regression analysis is generally displayed in an ANOVA table. The underlying principle is that the variation of the Y (or outcome) variable arises from two sources:
Total Variation in Y = Variation due to Regression + Unexplained (Residual) Variation

50 Sources of Variation
In more mathematical terms, this relationship can be expressed as:
$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2,$
where:
$SSTO = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \quad SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, \quad SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2.$

51 ANOVA F Test for Regression Coefficients
It turns out that the ANOVA approach provides us with a useful way of testing coefficients and comparing models in a variety of settings (particularly for multiple regression with several variables). For the simple linear regression model, the ANOVA F statistic for testing $H_0: \beta_1 = 0$ versus $H_A: \beta_1 \neq 0$ is given by
$F = \dfrac{MSR}{MSE},$
where MSR is the mean square due to regression, or MSR = SSR/df(Regression); and MSE is the mean squared error $s^2$, or MSE = SSE/df(Error). There are generally n - 1 df associated with SSTO. As we've discussed previously, in the simple model there are n - 2 df for SSE, leaving 1 df for SSR.

52 ANOVA F Test for $\beta_1$ in the Simple Model
For the simple regression model, relatively large values of F provide evidence against the null $H_0: \beta_1 = 0$, and values of F close to 1.0 indicate little or no evidence against the null. The p-value for this F test is determined by computing the upper-tail probability for the observed statistic with respect to the $F(1, n-2)$ distribution. Note that in the simple case, it turns out that the ANOVA F test and the t test for the slope (discussed earlier) are equivalent. That is,
$F = \dfrac{MSR}{MSE} = \left( \dfrac{b_1}{s\{b_1\}} \right)^2 = t^2.$
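
Both the sum-of-squares decomposition and the F = t² identity can be verified on a small example. The sketch below uses hypothetical data (not the roadway data); all variable names are illustrative.

```python
import math

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in xs]

ssto = sum((y - y_bar) ** 2 for y in ys)                # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)             # due to regression
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))     # residual

msr = ssr / 1          # 1 df for regression in the simple model
mse = sse / (n - 2)    # n - 2 df for error
f_stat = msr / mse

t_stat = b1 / (math.sqrt(mse) / math.sqrt(sxx))         # slope t statistic
```

For these points SSTO = 6, SSR = 3.6, and SSE = 2.4, so F = 3.6/0.8 = 4.5, which matches the square of the slope t statistic.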

53 ANOVA Table
All of this is summarized in a table, typically in this familiar form:

Source       Degrees of Freedom   Sum of Squares   Mean Squares   F-statistic   p-value
Regression   1                    SSR              MSR            MSR/MSE
Error        n - 2                SSE              MSE
Total        n - 1                SSTO

54 Example IV.N
The ANOVA table for the roadway data is partially completed below. Can you fill in the missing information?

Source       Degrees of Freedom   Sum of Squares   Mean Squares   F-statistic   p-value
Regression
Error        18
Total

55 Example IV.N (cont'd)
Based on the results of the ANOVA procedure, what are your conclusions regarding the association between roadway area and salt concentration?

56 Checking Model Assumptions
What are some of the underlying assumptions we have discussed with respect to the simple regression model?

57 Residuals and Standardized Residuals
Examining the observed residuals can provide key diagnostic information about whether model assumptions are violated. Recall that the residual $e_i$ for the ith subject is given by
$e_i = Y_i - \hat{Y}_i, \quad i = 1, \ldots, n.$
Since the actual variance of the error terms is $\sigma^2$, its estimate is given by the MSE. It turns out that computing the actual standard deviation of the residuals is a little more complex than simply taking $(MSE)^{1/2}$, but this estimate is not too far off. We therefore define what is referred to as the semistandardized or semistudentized residual as
$e_i^* = \dfrac{e_i}{\sqrt{MSE}}.$
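
A brief sketch of the semistudentized residuals, again on hypothetical data rather than the roadway data:

```python
import math

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # e_i
mse = sum(e * e for e in residuals) / (n - 2)
e_star = [e / math.sqrt(mse) for e in residuals]         # e_i / sqrt(MSE)
```

Dividing by the square root of the MSE puts the residuals on a roughly unit-variance scale, so values far outside about plus or minus 2 stand out regardless of the units of Y.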

58 Exploration of Residuals
A regression analysis is generally accompanied by an examination of the residuals or standardized residuals, to assess the linearity of the relationship between X and Y, the normality of the residuals, the constancy of the residual variance across the range of X, the independence of the residuals, effects of potential outliers (both with respect to X and Y), and whether any additional explanatory factors may have been omitted.

59 Diagnostics for Linearity
A scatterplot is one of the best ways to assess the nature of the X-Y relationship, but a plot of the residuals (versus either the predictor variable X or the fitted values) can also reveal patterns that could indicate nonlinearity. Note the nonlinear pattern in the plots below:

60 Example IV.O
Is there anything in the residual plot (below) for the roadway data to indicate nonlinearity?

61 Evaluating Non-constant Variance
Residual plots can also be very useful in assessing whether the variance remains constant across the range of X. The plots below illustrate a classic pattern where this assumption is not met:
In examining the residual plot in Example IV.O, is there any evidence of nonconstant variance for the roadway data?

62 Evaluating Dependence Between Residuals
This can sometimes be tricky, but dependence most often manifests itself with respect to the sequence, or temporal ordering, of the measurements. Where an investigator knows the order in which observations were sampled, he or she ought to plot the residuals versus sampling sequence to ensure there is no systematic correlation between contiguous observations. Note that information about sampling order may not always be available.

63 Example IV.P
Consider the data plotted below. How do the plots look in terms of linearity and variance?

64 Example IV.P (continued)
The plot below shows the relationship between the residuals and the order of measurements for the data plotted on the previous slide. What do you observe?

65 Outliers
In addition to initial univariate exploratory analyses, residual plots can be useful for identifying outliers. Note that outliers with respect to the distribution of Y or X in a regression setting can potentially influence the model fit in dramatically different ways. In some cases, outliers may not have any appreciable effect on the analysis. Identifying outliers is no reason to simply throw them out; such observations must be examined individually to (hopefully) explain why they have relatively extreme values. An outlier may exist because of miscoding, incorrect sampling, or even just sheer randomness.

66 Example IV.Q
Note the outlier below with respect to the distribution of the Y variable. What effect (if any) does this observation have on the model fit?
[Figure: scatterplot with one point labeled OUTLIER.]

67 Example IV.Q (continued)
A plot below of the residuals for the data on the previous slide clearly identifies the outlier. Interestingly, the observation appears to be exerting very little influence on the model fit.
[Figure: residual plot with one point labeled OUTLIER.]

68 Example IV.R
In the plot below, the outlier is extreme in particular with respect to the distribution of the X variable. These kinds of outliers can be particularly problematic in terms of their influence on model fit.
[Figure: scatterplot with one point labeled OUTLIER.]

69 Normality of Residuals
A conventional univariate analysis (i.e., with summary statistics, boxplots, etc.) can be useful in examining the distribution of residuals. The so-called normal probability plot (also known as a normal quantile or Q-Q plot) is also useful for assessing the normality of residuals. A Q-Q plot for a given sample is constructed by plotting the empirical standardized quantiles for the data against the quantiles that would be expected if the data arose from a normal distribution.
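
The construction can be sketched without any plotting library by computing the coordinate pairs that a Q-Q plot would display. Two choices here are assumptions rather than anything specified in the slides: the plotting positions (k - 0.5)/n, which are one common convention among several, and the use of raw ordered residuals rather than standardized ones (standardizing only rescales the vertical axis). The residual values are hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(values):
    """Return (theoretical normal quantile, ordered value) pairs."""
    n = len(values)
    ordered = sorted(values)
    # Plotting positions (k - 0.5) / n: an assumed, commonly used convention.
    probs = [(k - 0.5) / n for k in range(1, n + 1)]
    theoretical = [NormalDist().inv_cdf(p) for p in probs]
    return list(zip(theoretical, ordered))

# Hypothetical residuals, for illustration only.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
points = normal_qq_points(residuals)
```

Plotting the second coordinate against the first gives the Q-Q plot; if the residuals are approximately normal, the points fall roughly along a straight line.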

70 Example IV.S
A Q-Q plot for the residuals from the roadway data model is shown on the following slide. Note that if the plotted data are at least approximately normally distributed, then the points should roughly follow a straight line. What is your interpretation of this plot?

71 Example IV.S (continued)

72 Example IV.T
The following two slides illustrate examples of Q-Q plots for non-normal data. What is the nature of the deviation from normality in each case?

73 Example IV.T (continued)

74 Example IV.T (continued)

75 Variable Transformations
Problems with nonlinearity, non-constant variance, or nonnormality can frequently be fixed with a simple transformation. Logarithmic and power transformations are the most widely applied. The following example illustrates the utility of this approach.

76 Example IV.U
The data for this example come from a study of water use and household income in Concord, NH, during the summer of 1981 (the dataset is posted as concord.txt on the course website). The following three slides contain a scatterplot with fitted regression line along with two residual plots. What potential problems, if any, do you observe with respect to model assumptions?

77 Example IV.U (continued)
[Figure: scatterplot with fitted regression line of the form $\hat{Y} = b_0 + b_1 X$; the coefficient values are missing.]

78 Example IV.U (continued)

79 Example IV.U (continued)

80 Example IV.U (continued)
In this case, because of the positive skew of the water use distribution, as well as the increasing variance of the residuals, it would be useful to explore a log transformation or a transformation using a power < 1. The following six slides illustrate alternative fits for these data, first using a log transformation, and second with a transformation using a power of 0.3 for water use. What are your conclusions? How do you interpret the fitted coefficients in each case?
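
As a rough illustration of what a log transformation does to the fit, the sketch below regresses log(Y) on X for hypothetical, noise-free data generated as Y = exp(1 + 0.5X). These are not the Concord data, and `fit_line` is an illustrative helper name.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for ys regressed on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b1 * x_bar, b1

# Hypothetical noise-free data following Y = exp(1 + 0.5 * X).
xs = [1, 2, 3, 4, 5]
ys = [math.exp(1.0 + 0.5 * x) for x in xs]

# Fit on the log scale: log(Y) = b0 + b1 * X
b0, b1 = fit_line(xs, [math.log(y) for y in ys])

# On the original scale, a one-unit increase in X multiplies the
# fitted value by exp(b1) rather than adding a fixed amount.
multiplier = math.exp(b1)
```

Under a log transformation the slope is therefore read multiplicatively: here exp(0.5) is about 1.65, i.e., roughly a 65% increase in the fitted value per unit of X.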

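81 Example IV.U (continued)
[Figure: fit using the log transformation, of the form $\log(\hat{Y}) = b_0 + b_1 X$; the coefficient values are missing.]

82 Example IV.U (continued)

83 Example IV.U (continued)

84 Example IV.U (continued)
[Figure: fit using the power transformation (power of 0.3 for water use), of the form $\hat{Y}^{0.3} = b_0 + b_1 X$; the coefficient values are missing.]

85 Example IV.U (continued)

86 Example IV.U (continued)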

87 Additional Notes on Diagnostics
Assessing the normality of residuals can be a bit tricky under certain circumstances. For example, residuals may actually be normally distributed, but plots (such as boxplots or Q-Q plots) can appear nonnormal because of (i) randomness (especially with a small sample size), or (ii) the exclusion of one or more additional key variables. It is usually a good idea to check other assumptions first, such as linearity and constant variance, before checking normality. Even where the outcome variable isn't exactly normally distributed, substantive conclusions based on a regression model fit may still be fundamentally correct given a relatively large sample size. This is in some sense due to the fact that we are estimating an average, meaning that the Central Limit Theorem applies to the distribution of the fitted mean. We have not illustrated this here with an example, but to check the possibility that other variables are additionally associated with Y, we generally begin simply by constructing additional scatterplots.


More information

Statistics for Business and Economics

Statistics for Business and Economics Statstcs for Busness and Economcs Chapter 11 Smple Regresson Copyrght 010 Pearson Educaton, Inc. Publshng as Prentce Hall Ch. 11-1 11.1 Overvew of Lnear Models n An equaton can be ft to show the best lnear

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

Negative Binomial Regression

Negative Binomial Regression STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...

More information
