Unit 9 Regression and Correlation

BIOSTATS, Fall 2015
Unit 9. Regression and Correlation

"Assume that a statistical model such as a linear model is a good first start only" - Gerald van Belle

Is higher blood pressure in the mom associated with a lower birth weight of her baby? Simple linear regression explores the relationship of one continuous outcome (Y = birth weight) with one continuous predictor (X = blood pressure). At the heart of statistics is the fitting of models to observed data, followed by an examination of how they perform.

-1- Somewhat useful. The fitted model is a sufficiently good fit to the data if it permits exploration of hypotheses such as "higher blood pressure during pregnancy is associated with statistically significant lower birth weight" and it permits assessment of confounding, effect modification, and mediation. These are ideas that will be developed in the BIOSTATS 640 unit on Multivariable Linear Regression.

-2- More useful. The fitted model can be used to predict the outcomes of future observations. For example, we might be interested in predicting the birth weight of the baby born to a mom with systolic blood pressure 45 mm Hg.

-3- Most useful. Sometimes, but not so much in public health, the fitted model derives from a physical equation. An example is Michaelis-Menten kinetics. A Michaelis-Menten model is fit to the data for the purpose of estimating the actual rate of a particular chemical reaction.

Hence "A linear model is a good first start only".

Population / Relationships / Modeling / Analysis / Synthesis

Table of Contents

1. Unit Roadmap
2. Learning Objectives
3. Definition of the Linear Regression Model
4. Estimation
5. The Analysis of Variance Table
6. Assumptions for the Straight Line Regression
7. Hypothesis Testing
8. Confidence Interval Estimation
9. Introduction to Correlation
10. Hypothesis Test for Correlation

1. Unit Roadmap

Simple linear regression is used when there is one response (dependent, Y) variable and one explanatory (independent, X) variable, and both are continuous. Examples of explanatory (independent) and response (dependent) variable pairs are height and weight, age and blood pressure, etc.

-1- A simple linear regression analysis begins with a scatterplot of the data to see if a straight line model is appropriate:

$$y = \beta_0 + \beta_1 x$$

where Y = the response or dependent variable and X = the explanatory or independent variable.

-2- The sample data are used to estimate the parameter values and their standard errors.
$\beta_1$ = slope (the change in y per unit change in x)
$\beta_0$ = intercept (the value of y when x = 0)

-3- The fitted model is then compared to the simpler model $y = \beta_0$, which says that y is not linearly related to x.

2. Learning Objectives

When you have finished this unit, you should be able to:

- Explain what is meant by independent versus dependent variable and what is meant by a linear relationship;
- Produce and interpret a scatterplot;
- Define and explain the intercept and slope parameters of a linear relationship;
- Explain the theory of least squares estimation of the intercept and slope parameters of a linear relationship;
- Calculate by hand least squares estimation of the intercept and slope parameters of a linear relationship;
- Explain the theory of the analysis of variance of simple linear regression;
- Calculate by hand the analysis of variance of simple linear regression;
- Explain, compute, and interpret $R^2$ in the context of simple linear regression;
- State and explain the assumptions required for estimation and hypothesis tests in regression;
- Explain, compute, and interpret the overall F-test in simple linear regression;
- Interpret the computer output of a simple linear regression analysis from a package such as R, Stata, SAS, SPSS, Minitab, etc.;
- Define and interpret the value of a Pearson product moment correlation, r;
- Explain the relationship between the Pearson product moment correlation r and the linear regression slope parameter; and
- Calculate by hand confidence interval estimation and statistical hypothesis testing of the Pearson product moment correlation r.

3. Definition of the Linear Regression Model

Unit 8 considered two categorical (discrete) variables, such as smoking (yes/no) and low birth weight (yes/no). It was an introduction to chi-square tests of association.

Unit 9 considers two continuous variables, such as age and weight. It is an introduction to simple linear regression and correlation.

A wonderful introduction to the intuition of linear regression can be found in the text by Freedman, Pisani, and Purves (Statistics. WW Norton & Co., 1978). The following is excerpted from pp 46 and 48 of their text:

"How is weight related to height? For example, there were 411 men aged 18 to 24 in Cycle I of the Health Examination Survey. Their average height was 5 feet 8 inches = 68 inches, with an overall average weight of 158 pounds. But those men who were one inch above average in height had a somewhat higher average weight. Those men who were two inches above average in height had a still higher average weight. And so on. On the average, how much of an increase in weight is associated with each unit increase in height? The best way to get started is to look at the scattergram for these heights and weights. The object is to see how weight depends on height, so height is taken as the independent variable and plotted horizontally ..."

"... The regression line is to a scatter diagram as the average is to a list. The regression line estimates the average value for the dependent variable corresponding to each value of the independent variable."

Linear Regression

Linear regression models the mean $\mu = E[Y]$ of one random variable Y as a linear function of one or more other variables (called predictors or explanatory variables) that are treated as fixed. The estimation and hypothesis testing involved are extensions of ideas and techniques that we have already seen. In linear regression:

- Y is the outcome or dependent variable that we observe. We observe its values for individuals with various combinations of values of a predictor or explanatory variable X.
- There may be more than one predictor X; this will be discussed in BIOSTATS 640.
- In simple linear regression the values of the predictor X are assumed to be fixed.

Often, however, the variables Y and X are both random variables.

Correlation

Correlation considers the association of two random variables. The techniques of estimation and hypothesis testing are the same for linear regression and correlation analyses. Exploring the relationship begins with fitting a line to the points.

Development of a simple linear regression model analysis

Example. Source: Kleinbaum, Kupper, and Muller 1988

The following are observations of age (days) and weight (kg) for n = 11 chicken embryos.

Notation: WT = Y, AGE = X, LOGWT = Z

The data are pairs $(X_i, Y_i)$ where X = AGE and Y = WT:
$(X_1, Y_1) = (6, 0.029)$, ..., $(X_{11}, Y_{11}) = (16, 2.812)$

The table also provides pairs $(X_i, Z_i)$ where X = AGE and Z = LOGWT:
$(X_1, Z_1) = (6, -1.538)$, ..., $(X_{11}, Z_{11}) = (16, 0.449)$

Research question

There are a variety of possible research questions:
(1) Does weight change with age?
(2) In the language of analysis of variance we are asking the following: Can the variability in weight be explained, to a significant extent, by variations in age?
(3) What is a good functional form that relates age to weight?

Tip! Begin with a scatter plot. Here we plot X = AGE versus Y = WT.

We check and learn about the following:
- The average and median of X
- The range and pattern of variability in X
- The average and median of Y
- The range and pattern of variability in Y
- The nature of the relationship between X and Y
- The strength of the relationship between X and Y
- The identification of any points that might be influential

Example, continued

- The plot suggests a relationship between AGE and WT.
- A straight line might fit well, but another model might be better.
- We have adequate ranges of values for both AGE and WT.
- There are no outliers.

The "bowl" shape of our scatter plot suggests that perhaps a better model relates the logarithm of WT (Z = LOGWT) to AGE.

We might have gotten any of a variety of plots.

[Figure: three scatter plots of y versus x. Panel 1: no relationship between X and Y. Panel 2: linear relationship between X and Y. Panel 3: non-linear relationship between X and Y.]

[Figure: three scatter plots of y versus x, each with one outlying point.
Panel 1: note the outlying point. Here, a fit of a linear model will yield an estimated slope that is spuriously non-zero.
Panel 2: note the outlying point. Here, a fit of a linear model will yield an estimated slope that is spuriously near zero.
Panel 3: note the outlying point. Here, a fit of a linear model will yield an estimated slope that is spuriously high.]

Review of the Straight Line

Way back when, in your high school days, you may have been introduced to the straight line function, defined as y = mx + b, where m is the slope and b is the intercept. Nothing new here. All we're doing is changing the notation a bit:
(1) Slope: m → $\beta_1$
(2) Intercept: b → $\beta_0$

[Figure: three lines illustrating Slope > 0, Slope = 0, and Slope < 0.]

Definition of the Straight Line Model: $Y = \beta_0 + \beta_1 X$

Population

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

- $Y = \beta_0 + \beta_1 X$ is the relationship in the population.
- It is measured with error $\varepsilon$, defined as $\varepsilon = [Y] - [\beta_0 + \beta_1 X]$.
- $\beta_0$, $\beta_1$ and $\varepsilon$ are all unknown!

Sample

$$Y = \hat\beta_0 + \hat\beta_1 X + e$$

- $\hat\beta_0$, $\hat\beta_1$ and $e$ are estimates of $\beta_0$, $\beta_1$ and $\varepsilon$. Note: so you know, these may also be written as $b_0$, $b_1$, and $e$.
- The residual $e$ is now the difference between the observed and the fitted (not the true): $e = [Y] - [\hat\beta_0 + \hat\beta_1 X]$.
- $\hat\beta_0$, $\hat\beta_1$ and $e$ are known.

We obtain guesses of these unknowns, called $\hat\beta_0$, $\hat\beta_1$ and $e$, by the method of least squares estimation.

How close did we get? To see if $\hat\beta_0 \approx \beta_0$ and $\hat\beta_1 \approx \beta_1$ we perform regression diagnostics. Regression diagnostics are discussed in BIOSTATS 640.

Notation (sorry)

Y = the outcome or dependent variable
X = the predictor or independent variable
$\mu_Y$ = the expected value of Y for all persons in the population
$\mu_{Y|X=x}$ = the expected value of Y for the sub-population for whom X = x
$\sigma^2_Y$ = variability of Y among all persons in the population
$\sigma^2_{Y|X=x}$ = variability of Y for the sub-population for whom X = x

4. Estimation

Least squares estimation is used to obtain guesses of $\beta_0$ and $\beta_1$. When the outcome Y is distributed normal, least squares estimation is the same as maximum likelihood estimation.

Note: If you are not familiar with maximum likelihood estimation, don't worry. This is introduced in BIOSTATS 640.

"Least Squares", "Close" and Least Squares Estimation

Theoretically, it is possible to draw many lines through an X-Y scatter of points. Which to choose? Least squares estimation is one approach to choosing a line that is "closest" to the data.

- $d_i$ = [observed Y - fitted $\hat Y$] for the i-th person. Perhaps we'd like $d_i$ to be the smallest possible. Note that this is a vertical distance, since it is a distance on the vertical axis:

$$d_i = Y_i - \hat Y_i$$

- Better yet, perhaps we'd like to minimize the squared difference: $d_i^2 = (Y_i - \hat Y_i)^2$ = smallest possible.

- Glitch. We can't minimize each $d_i^2$ separately. In particular, it is not possible to choose common values of $\hat\beta_0$ and $\hat\beta_1$ that minimize $d_1^2 = (Y_1 - \hat Y_1)^2$ for subject 1, and minimize $d_2^2 = (Y_2 - \hat Y_2)^2$ for subject 2, and so on through $d_n^2 = (Y_n - \hat Y_n)^2$ for the nth subject.

- So, instead, we choose values for $\hat\beta_0$ and $\hat\beta_1$ that, upon insertion, minimize the total

$$\sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} (Y_i - \hat Y_i)^2 = \sum_{i=1}^{n} \left(Y_i - [\hat\beta_0 + \hat\beta_1 X_i]\right)^2$$

The quantity

$$\sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} (Y_i - \hat Y_i)^2 = \sum_{i=1}^{n} \left(Y_i - [\hat\beta_0 + \hat\beta_1 X_i]\right)^2$$

has a variety of names:
- residual sum of squares, SSE or SSQ(residual)
- sum of squares about the regression line
- sum of squares due error (SSE)

Least Squares Estimation of the Slope and Intercept

In case you're interested. Consider

$$SSE = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} (Y_i - \hat Y_i)^2 = \sum_{i=1}^{n} \left(Y_i - [\hat\beta_0 + \hat\beta_1 X_i]\right)^2$$

Step #1: Differentiate with respect to $\hat\beta_1$. Set the derivative equal to 0 and solve for $\hat\beta_1$.
Step #2: Differentiate with respect to $\hat\beta_0$. Set the derivative equal to 0, insert $\hat\beta_1$ and solve for $\hat\beta_0$.

Least Squares Estimation Solutions

Note: the estimates are denoted either using Greek letters with a caret or with Roman letters.

Slope, $\hat\beta_1$ or $b_1$:

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2}$$

Intercept, $\hat\beta_0$ or $b_0$:

$$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$$
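The two differentiation steps can be written out as follows. This is a sketch of the standard calculus argument, not a reproduction of the course's own derivation; it solves the intercept equation first purely for algebraic convenience, and it uses the same symbols as the solutions above.

```latex
\begin{aligned}
\frac{\partial\, SSE}{\partial \hat\beta_0}
  &= -2 \sum_{i=1}^{n} \left( Y_i - \hat\beta_0 - \hat\beta_1 X_i \right) = 0
  \;\Rightarrow\; \hat\beta_0 = \bar Y - \hat\beta_1 \bar X \\[4pt]
\frac{\partial\, SSE}{\partial \hat\beta_1}
  &= -2 \sum_{i=1}^{n} X_i \left( Y_i - \hat\beta_0 - \hat\beta_1 X_i \right) = 0 \\[4pt]
\text{Insert } \hat\beta_0:\quad
  & \sum_{i=1}^{n} X_i \left[ (Y_i - \bar Y) - \hat\beta_1 (X_i - \bar X) \right] = 0
  \;\Rightarrow\;
  \hat\beta_1 = \frac{\sum_{i=1}^{n} X_i (Y_i - \bar Y)}{\sum_{i=1}^{n} X_i (X_i - \bar X)} \\[4pt]
\text{Since } \sum_{i=1}^{n} (Y_i - \bar Y) = 0 \text{ and } \sum_{i=1}^{n} (X_i - \bar X) = 0:\quad
  & \hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2}
\end{aligned}
```

The last line subtracts $\bar X \sum (Y_i - \bar Y) = 0$ from the numerator and $\bar X \sum (X_i - \bar X) = 0$ from the denominator, which changes neither value but puts the slope in its centered form.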

A closer look

Some very helpful preliminary calculations:

$$S_{xx} = \sum (X - \bar X)^2 = \sum X^2 - N \bar X^2$$
$$S_{yy} = \sum (Y - \bar Y)^2 = \sum Y^2 - N \bar Y^2$$
$$S_{xy} = \sum (X - \bar X)(Y - \bar Y) = \sum XY - N \bar X \bar Y$$

Note: These expressions make use of summation notation, introduced in Unit 1. The capital "S" indicates summation. In $S_{xy}$, the first subscript "x" is saying $(x - \bar x)$. The second subscript "y" is saying $(y - \bar y)$.

Slope

$$\hat\beta_1 = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2} = \frac{\widehat{\mathrm{cov}}(X, Y)}{\widehat{\mathrm{var}}(X)} = \frac{S_{xy}}{S_{xx}}$$

Intercept

$$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$$

Prediction of Y

$$\hat Y = \hat\beta_0 + \hat\beta_1 X = b_0 + b_1 X$$

Do these estimates make sense?

Slope

$$\hat\beta_1 = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2} = \frac{\widehat{\mathrm{cov}}(X, Y)}{\widehat{\mathrm{var}}(X)}$$

The linear movement in Y with linear movement in X is measured relative to the variability in X.

- $\hat\beta_1 = 0$ says: With a unit change in X, overall Y is as likely to increase as to decrease.
- $\hat\beta_1 \ne 0$ says: With a unit increase in X, Y increases also ($\hat\beta_1 > 0$) or Y decreases ($\hat\beta_1 < 0$).

Intercept

$$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$$

If the linear model is incorrect or, if the true model does not have a linear component, we obtain $\hat\beta_1 = 0$ and $\hat\beta_0 = \bar Y$ as our best guess of an unknown Y.
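As a minimal sketch of the hand calculation, the following Python (not the Stata used in these notes) computes $S_{xx}$, $S_{xy}$, the slope, and the intercept. The function name `least_squares_line` and the five data points are hypothetical, chosen so that the answer can be checked by eye; they are not course data.

```python
def least_squares_line(x, y):
    """Return (b0, b1), the least squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered sums of squares and cross-products
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx          # slope = Sxy / Sxx
    b0 = y_bar - b1 * x_bar   # intercept = Ybar - b1 * Xbar
    return b0, b1


# Hypothetical illustration data lying exactly on the line y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

b0, b1 = least_squares_line(x, y)

# The shortcut form Sxx = sum(X^2) - N * Xbar^2 gives the same value
n = len(x)
x_bar = sum(x) / n
s_xx_centered = sum((xi - x_bar) ** 2 for xi in x)
s_xx_shortcut = sum(xi ** 2 for xi in x) - n * x_bar ** 2

print("intercept b0 =", b0)
print("slope b1     =", b1)
print("Sxx (centered, shortcut) =", s_xx_centered, s_xx_shortcut)
```

Because the points lie exactly on a line, the fit recovers that line; with real data the same function returns the line that minimizes the residual sum of squares.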

Illustration in Stata

Y = WT and X = AGE

    . regress y x

Partial listing of output, annotated: the columns are Coef., Std. Err., t, P>|t|, and [95% Conf. Interval] for the outcome y = WEIGHT. The row labeled x (= AGE) reports the slope $b_1$. The row labeled _cons (= intercept) reports $b_0$.

The fitted line is therefore WT = $b_0$ + $b_1$*AGE. It says that each unit increase in AGE of 1 day is estimated to predict a $b_1$ increase in weight, WT.

Here is an overlay of the fitted line on our scatterplot.

[Figure: Scatter Plot of WT vs AGE, with the fitted line overlaid.]

- As we might have guessed, the straight line model may not be the best choice.
- The "bowl" shape of the scatter plot does have a linear component, however.
- Without the plot, we might have believed the straight line fit is okay.

Illustration in Stata, continued

Z = LOGWT and X = AGE

    . regress z x

Partial listing of output, annotated: the columns are Coef., Std. Err., t, P>|t|, and [95% Conf. Interval] for the outcome Z = LOGWT. The row labeled x (= AGE) reports the slope $b_1$. The row labeled _cons (= INTERCEPT) reports $b_0$.

Thus, the fitted line is LOGWT = $b_0$ + $b_1$*AGE.

Now the overlay plot looks better.

[Figure: scatter plot of LOGWT vs AGE, with the fitted line overlaid.]
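Why a log transformation can straighten a bowl-shaped scatter can be sketched in Python. Everything here is an assumption made for illustration: the data are simulated from an exactly exponential curve with made-up coefficients (-2.7 and 0.2), the logarithm is taken base 10, and `fit_line` and `r_squared` are hypothetical helper names. None of it is the chicken embryo data.

```python
import math


def fit_line(x, y):
    """Least squares intercept and slope for y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def r_squared(x, y):
    """Proportion of the total sum of squares explained by the fitted line."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - sse / sst


# Hypothetical exponential growth: wt = 10 ** (a + b * age)
a_true, b_true = -2.7, 0.2
age = list(range(6, 17))                       # days 6 through 16
wt = [10 ** (a_true + b_true * t) for t in age]
logwt = [math.log10(w) for w in wt]

b0_log, b1_log = fit_line(age, logwt)
r2_raw = r_squared(age, wt)      # straight line fit to the curved data
r2_log = r_squared(age, logwt)   # straight line fit after the transformation

print("fit on log scale: intercept", b0_log, "slope", b1_log)
print("R-squared raw:", r2_raw, " R-squared log:", r2_log)
```

On the log scale the simulated points fall on a straight line, so the fit recovers the generating coefficients and explains more of the variability than the straight line fit to the untransformed values.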

Now You Try: Prediction of Weight from Height

Source: Dixon and Massey (1969)

The data table has columns Individual, Height (X), and Weight (Y).

Preliminary calculations: $\bar X$, $\bar Y$, $\sum X_i^2 = 49,068$, $\sum Y_i^2 = 46,00$, $\sum X_i Y_i = 09,380$, $S_{xx}$, $S_{yy}$, $S_{xy}$.

Slope

$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}}$$

Intercept

$$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X = \bar Y - (5.09)(\bar X)$$

5. The Analysis of Variance Table

Recall the sample variance introduced in Unit 1, Summarizing.

The numerator of the sample variance ($S^2$) of the Y data is $\sum_{i=1}^{n} (Y_i - \bar Y)^2$.

This same quantity $\sum_{i=1}^{n} (Y_i - \bar Y)^2$ is a central figure in regression. It has a new name, several actually:

$\sum_{i=1}^{n} (Y_i - \bar Y)^2$ = "total variability" of the Y's = "total sum of squares" = "total, corrected" = "SSY".
(Note: "corrected" refers to subtracting the mean before squaring.)

The analysis of variance table is all about $\sum_{i=1}^{n} (Y_i - \bar Y)^2$ and partitioning it into two components:
1. Due residual (the individual Y about the individual prediction $\hat Y$)
2. Due regression (the prediction $\hat Y$ about the overall mean $\bar Y$)

Here is the partition (Note: look closely and you'll see that both sides are the same):

$$(Y_i - \bar Y) = (Y_i - \hat Y_i) + (\hat Y_i - \bar Y)$$

Some algebra (not shown) reveals a nice partition of the total variability:

$$\sum (Y_i - \bar Y)^2 = \sum (Y_i - \hat Y_i)^2 + \sum (\hat Y_i - \bar Y)^2$$

Total Sum of Squares = Due Error Sum of Squares + Due Model Sum of Squares

A closer look

Total Sum of Squares = Due Model Sum of Squares + Due Error Sum of Squares

$$\sum (Y_i - \bar Y)^2 = \sum (\hat Y_i - \bar Y)^2 + \sum (Y_i - \hat Y_i)^2$$

- $(Y_i - \bar Y)$ = deviation of $Y_i$ from $\bar Y$ that is to be explained
- $(\hat Y_i - \bar Y)$ = "due model", "signal", "systematic", "due regression"
- $(Y_i - \hat Y_i)$ = "due error", "noise", or "residual"

We seek to explain the total variability $\sum (Y_i - \bar Y)^2$ with a fitted model.

What happens when $\beta_1 \ne 0$?
- A straight line relationship is helpful.
- Best guess is $\hat Y = \hat\beta_0 + \hat\beta_1 X$.
- Due model sum of squares tends to be LARGE because $(\hat Y - \bar Y) = (\hat\beta_0 + \hat\beta_1 X - \bar Y) = (\bar Y - \hat\beta_1 \bar X + \hat\beta_1 X - \bar Y) = \hat\beta_1 (X - \bar X)$.
- Due error sum of squares has to be small → due(model)/due(error) will be large.

What happens when $\beta_1 = 0$?
- A straight line relationship is not helpful.
- Best guess is $\hat Y = \hat\beta_0 = \bar Y$.
- Due error sum of squares tends to be nearly the TOTAL because $(Y - \hat Y) = (Y - \hat\beta_0) = (Y - \bar Y)$.
- Due regression sum of squares has to be small → due(model)/due(error) will be small.

How to Partition the Total Variance

Think: carving a pie into wedges/pieces: (explained) + (remainder).

1. The total pie. The "total" or "total, corrected" refers to the variability of Y about $\bar Y$.
- $\sum (Y_i - \bar Y)^2$ is called the "total sum of squares".
- Degrees of freedom = df = (n-1).
- Division of the "total sum of squares" by its df yields the "total mean square".

2. Carve out the piece of the pie explained by the model. The "regression" or "due model" refers to the variability of $\hat Y$ about $\bar Y$.
- $\sum (\hat Y_i - \bar Y)^2 = \hat\beta_1^2 \sum (X_i - \bar X)^2$ is called the "regression sum of squares".
- Degrees of freedom = df = 1.
- Division of the "regression sum of squares" by its df yields the "regression mean square" or "model mean square". It is an example of a variance component.

3. The remainder of the pie. The "residual" or "due error" refers to the variability of Y about $\hat Y$.
- $\sum (Y_i - \hat Y_i)^2$ is called the "residual sum of squares".
- Degrees of freedom = df = (n-2).
- Division of the "residual sum of squares" by its df yields the "residual mean square".

Source                    df      Sum of Squares                         Mean Square
Regression (due model)    1       SSR = $\sum (\hat Y_i - \bar Y)^2$     SSR/1
Residual (due error)      (n-2)   SSE = $\sum (Y_i - \hat Y_i)^2$        SSE/(n-2)
Total, corrected          (n-1)   SST = $\sum (Y_i - \bar Y)^2$

Tip! Mean square = (Sum of squares)/(degrees of freedom, df)
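The partition and the table entries can be checked numerically. The following Python sketch is an illustration only: the function name `anova_table` and the five data points are hypothetical, picked so that every entry is easy to verify by hand.

```python
def anova_table(x, y):
    """Analysis of variance quantities for simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)                # total, corrected
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # due model
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # due error

    return {
        "SSR": ssr, "df_model": 1, "MSR": ssr / 1,
        "SSE": sse, "df_resid": n - 2, "MSE": sse / (n - 2),
        "SST": sst, "df_total": n - 1,
        "slope": b1, "Sxx": s_xx,
    }


# Hypothetical illustration data (not from the course notes)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

table = anova_table(x, y)
for source, ss, df in [("Regression", "SSR", "df_model"),
                       ("Residual", "SSE", "df_resid"),
                       ("Total", "SST", "df_total")]:
    print(f"{source:<12} df={table[df]}  SS={table[ss]:.4f}")
print("MSR =", table["MSR"], " MSE =", table["MSE"])
```

For these points the regression and residual sums of squares add up to the total, and the regression sum of squares equals the squared slope times $S_{xx}$, as stated in item 2 above.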

Be careful! The question we may ask from an analysis of variance table is a limited one.

Does the fit of the straight line model explain a significant portion of the variability of the individual Y about $\bar Y$? Is this fitted model better than using $\bar Y$ alone?

We are NOT asking: Is the choice of the straight line model correct? Nor: Would another functional form be a better choice?

We'll use a hypothesis test approach (another "proof by contradiction" reasoning, just like we did in Unit 7!).

- Assume, provisionally, the "nothing is going on" null hypothesis that says $\beta_1 = 0$ ("no linear relationship").
- Use least squares estimation to estimate a "closest" line.
- The analysis of variance table provides a comparison of the due regression mean square to the residual mean square.
- Where does least squares estimation take us, vis a vis the slope $\beta_1$?
  If $\beta_1 \ne 0$, then due(regression)/due(residual) will be LARGE.
  If $\beta_1 = 0$, then due(regression)/due(residual) will be SMALL.
- Our p-value calculation will answer the question: If the null hypothesis is true and $\beta_1 = 0$ truly, what were the chances of obtaining a value of due(regression)/due(residual) as large or larger than that observed?

To calculate "chances of extremeness under some assumed null hypothesis" we need a null hypothesis probability model! But did you notice? So far, we have not actually used one!

6. Assumptions for a Straight Line Regression Analysis

In performing least squares estimation, we did not use a probability model. We were doing geometry. Confidence interval estimation and hypothesis testing require some assumptions and a probability model. Here you go!

Assumptions for Simple Linear Regression

1. The separate observations $Y_1, Y_2, \ldots, Y_n$ are independent.
2. The values of the predictor variable X are fixed and measured without error.
3. For each value of the predictor variable X = x, the distribution of values of Y follows a normal distribution with mean equal to $\mu_{Y|X=x}$ and common variance equal to $\sigma^2_{Y|x}$.
4. The separate means $\mu_{Y|X=x}$ lie on a straight line; that is, $\mu_{Y|X=x} = \beta_0 + \beta_1 X$.

At each value of X, there is a population of Y for persons with X = x.

With these assumptions, we can assess the significance of the variance explained by the model.

$$F = \frac{msq(model)}{msq(residual)} \quad \text{with df} = 1, (n-2)$$

When $\beta_1 = 0$:
- Due model MSR has expected value $\sigma^2_{Y|X}$.
- Due residual MSE has expected value $\sigma^2_{Y|X}$.
- F = (MSR)/MSE will be close to 1.

When $\beta_1 \ne 0$:
- Due model MSR has expected value $\sigma^2_{Y|X} + \beta_1^2 \sum (X_i - \bar X)^2$.
- Due residual MSE has expected value $\sigma^2_{Y|X}$.
- F = (MSR)/MSE will be LARGER than 1.

We obtain the analysis of variance table for the model of Z = LOGWT to X = AGE.

Stata illustration with annotations. The table lists SS, df, and MS for the rows Model, Residual, and Total, followed by these summary statistics:
- Number of obs
- F(1, 9) = MSQ(model)/MSQ(residual)
- Prob > F = p-value for the Overall F Test
- R-squared = SSQ(model)/SSQ(TOTAL)
- Adj R-squared = $R^2$ adjusted for n and number of X
- Root MSE = .0807 = square root of MSQ(residual)

This output corresponds to the following. Note: in this example our dependent variable is actually Z, not Y.

Source                    df            Sum of Squares                         Mean Square
Regression (due model)    1             SSR = $\sum (\hat Z_i - \bar Z)^2$     SSR/1 = msr = mean square regression
Residual (due error)      (n-2) = 9     SSE = $\sum (Z_i - \hat Z_i)^2$        SSE/(n-2) = 7.838E-04 = mse = mean square error
Total, corrected          (n-1) = 10    SST = $\sum (Z_i - \bar Z)^2$

Other information in this output:

- R-SQUARED = [(Sum of squares regression)/(Sum of squares total)] = proportion of the "total" that we have been able to explain with the fit = "percent of variance explained by the model". Be careful! As predictors are added to the model, R-SQUARED can only increase. Eventually, we need to "adjust" this measure to take this into account. See ADJUSTED R-SQUARED.

- We also get an overall F test of the null hypothesis that the simple linear model does not explain significantly more variability in LOGWT than the average LOGWT.
  F = MSQ(Regression)/MSQ(Residual) = 4.22063/7.838E-04, with df = 1, 9
  p-value = achieved significance < 0.0001. This is a highly unlikely outcome! → Reject $H_O$. Conclude that the fitted line explains statistically significantly more of the variability in Z = LOGWT than is explained by the intercept-only null hypothesis model.

7. Hypothesis Testing

Straight Line Model: $Y = \beta_0 + \beta_1 X$

1) Overall F-Test

Research Question: Does the fitted model, the $\hat Y$, explain significantly more of the total variability of the Y about $\bar Y$ than does $\bar Y$? A bit of clarification here, in case you're wondering. When the null hypothesis is true, at least two things happen: (1) $\beta_1 = 0$ and (2) the correct model (the null one) says $Y = \beta_0$ + error. In this situation, the least squares estimate of $\beta_0$ turns out to be $\bar Y$ (that seems reasonable, right?).

Assumptions: As before.

$H_O$ and $H_A$:
$H_O: \beta_1 = 0$
$H_A: \beta_1 \ne 0$

Test Statistic:

$$F = \frac{msq(regression)}{msq(residual)}, \quad df = 1, (n-2)$$

Evaluation rule: When the null hypothesis is true, the value of F should be close to 1. Alternatively, when $\beta_1 \ne 0$, the value of F will be LARGER than 1. Thus, our p-value calculation answers: "What are the chances of obtaining our value of the F or one that is larger if we believe the null hypothesis that $\beta_1 = 0$?"

Calculations: For our data, we obtain

$$\text{p-value} = \Pr\left[ F_{1,(n-2)} \ge \frac{msq(model)}{msq(residual)} \,\Big|\, \beta_1 = 0 \right] = \Pr\left[ F_{1,9} \ge F_{observed} \right] << .0001$$

Evaluate: Assumption of the null hypothesis that $\beta_1 = 0$ has led to an extremely unlikely outcome (the observed F-statistic value), with chances of being observed less than 1 chance in 10,000. The null hypothesis is rejected.

Interpret: We have learned that, at least, the fitted straight line model does a much better job of explaining the variability in Z = LOGWT than a model that allows only for the average LOGWT. Later (BIOSTATS 640, Intermediate Biostatistics), we'll see that the analysis does not stop here.

2) Test of the Slope, $\beta_1$

Notes: The overall F test and the test of the slope are equivalent. The test of the slope uses a t-score approach to hypothesis testing. It can be shown that {t-score for slope}$^2$ = {overall F}.

Research Question: Is the slope $\beta_1 = 0$?

Assumptions: As before.

$H_O$ and $H_A$:
$H_O: \beta_1 = 0$
$H_A: \beta_1 \ne 0$

Test Statistic: To compute the t-score, we need an estimate of the standard error of $\hat\beta_1$:

$$\widehat{SE}(\hat\beta_1) = \sqrt{\frac{msq(residual)}{\sum (X_i - \bar X)^2}}$$

Our t-score is therefore:

$$t\text{-score} = \frac{(\text{observed}) - (\text{expected})}{\widehat{se}(\text{expected})} = \frac{\hat\beta_1 - 0}{\widehat{se}(\hat\beta_1)}, \quad df = (n-2)$$

We can find this information in our Stata output. In the row labeled x, the column t equals Coef. divided by Std. Err.

Recall what we mean by a t-score: t = 73.38 says "the estimated slope is estimated to be 73.38 standard error units away from the null hypothesis expected value of zero".

Check that {t-score}$^2$ = {Overall F}: $[73.38]^2 \approx 5384.6$, which is close.

Evaluation rule: When the null hypothesis is true, the value of t should be close to zero. Alternatively, when $\beta_1 \ne 0$, the value of t will be DIFFERENT from 0. Here, our p-value calculation answers: "Under the assumption of the null hypothesis that $\beta_1 = 0$, what were our chances of obtaining a t-statistic value 73.38 standard error units away from its null hypothesis expected value of zero?"

Calculations: For our data, we obtain

$$\text{p-value} = 2 \Pr\left[ t_{(n-2)} \ge \left| \frac{\hat\beta_1 - 0}{\widehat{se}(\hat\beta_1)} \right| \right] = 2 \Pr\left[ t_9 \ge 73.38 \right] << .0001$$

Evaluate: Under the null hypothesis that $\beta_1 = 0$, the chances of obtaining a t-score value that is 73.38 or more standard error units away from the expected value of 0 is less than 1 chance in 10,000.

Interpret: The inference is the same as that for the overall F test. The fitted straight line model does a statistically significantly better job of explaining the variability in LOGWT than the sample mean.

3) Test of the Intercept, $\beta_0$

This addresses the question: Does the straight line relationship pass through the origin? It is rarely of interest.

Research Question: Is the intercept $\beta_0 = 0$?

Assumptions: As before.

$H_O$ and $H_A$:
$H_O: \beta_0 = 0$
$H_A: \beta_0 \ne 0$

Test Statistic: To compute the t-score for the intercept, we need an estimate of the standard error of $\hat\beta_0$:

$$\widehat{SE}(\hat\beta_0) = \sqrt{ msq(residual) \left[ \frac{1}{n} + \frac{\bar X^2}{\sum (X_i - \bar X)^2} \right] }$$

Our t-score is therefore:

$$t\text{-score} = \frac{(\text{observed}) - (\text{expected})}{\widehat{se}(\text{expected})} = \frac{\hat\beta_0 - 0}{\widehat{se}(\hat\beta_0)}, \quad df = (n-2)$$

Again, we can find this information in our Stata output. In the row labeled _cons, the column t equals Coef. divided by Std. Err.

Here, the t value says how many standard error units the estimated intercept lies away from its null hypothesis expected value of zero.

Evaluation rule: When the null hypothesis is true, the value of t should be close to zero. Alternatively, when $\beta_0 \ne 0$, the value of t will be DIFFERENT from 0. Our p-value calculation answers: "Under the assumption of the null hypothesis that $\beta_0 = 0$, what were our chances of obtaining a t-statistic value this many standard error units away from its null hypothesis expected value of zero?"

Calculations:

$$\text{p-value} = 2 \Pr\left[ t_{(n-2)} \ge \left| \frac{\hat\beta_0 - 0}{\widehat{se}(\hat\beta_0)} \right| \right] << .0001$$

Evaluate: Under the null hypothesis that the line passes through the origin, that $\beta_0 = 0$, the chances of obtaining a t-score value that is this many or more standard error units away from the expected value of 0 is less than 1 chance in 10,000, again prompting statistical rejection of the null hypothesis.

Interpret: The inference is that there is statistically significant evidence that the straight line relationship between Z = LOGWT and X = AGE does not pass through the origin.
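The equivalence between the slope t-test and the overall F-test noted in this section can be confirmed numerically. This Python sketch uses a hypothetical five-point data set and a hypothetical function name, `slope_t_and_f`; it computes only the test statistics, not p-values, which would require the t and F distributions.

```python
import math


def slope_t_and_f(x, y):
    """Return (t-score for the slope, overall F) for simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
    msr = ssr / 1          # regression mean square, df = 1
    mse = sse / (n - 2)    # residual mean square, df = n - 2

    se_b1 = math.sqrt(mse / s_xx)   # estimated standard error of the slope
    t_score = (b1 - 0.0) / se_b1    # observed minus expected, in SE units
    f_stat = msr / mse
    return t_score, f_stat


# Hypothetical illustration data (not from the course notes)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

t_score, f_stat = slope_t_and_f(x, y)
print("t-score for slope:", t_score)
print("t-score squared  :", t_score ** 2)
print("overall F        :", f_stat)
```

The squared t-score and F agree up to floating point rounding, which is why the two tests always lead to the same inference in simple linear regression.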

8. Confidence Interval Estimation

Straight Line Model: $Y = \beta_0 + \beta_1 X$

The confidence intervals here have the usual 3 elements (for review, see again Unit 6):
1) Best single guess (estimate)
2) Standard error of the best single guess (SE[estimate])
3) Confidence coefficient: This will be a percentile from the Student t distribution with df = (n-2)

We might want confidence interval estimates of the following 4 parameters:
(1) Slope
(2) Intercept
(3) Mean of subset of population for whom X = $x_0$
(4) Individual response for person for whom X = $x_0$

1) SLOPE

estimate = $\hat\beta_1$

$$\widehat{se}(\hat\beta_1) = \sqrt{\frac{msq(residual)}{\sum (X_i - \bar X)^2}}$$

2) INTERCEPT

estimate = $\hat\beta_0$

$$\widehat{se}(\hat\beta_0) = \sqrt{ msq(residual) \left[ \frac{1}{n} + \frac{\bar X^2}{\sum (X_i - \bar X)^2} \right] }$$

3) MEAN at X = $x_0$

estimate = $\hat Y_{X=x_0} = \hat\beta_0 + \hat\beta_1 x_0$

$$\widehat{se} = \sqrt{ msq(residual) \left[ \frac{1}{n} + \frac{(x_0 - \bar X)^2}{\sum (X_i - \bar X)^2} \right] }$$

4) INDIVIDUAL with X = $x_0$

estimate = $\hat Y_{X=x_0} = \hat\beta_0 + \hat\beta_1 x_0$

$$\widehat{se} = \sqrt{ msq(residual) \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar X)^2}{\sum (X_i - \bar X)^2} \right] }$$

Example, continued: Z = LOGWT to X = AGE. In the Stata fit, the row labeled x reports the 95% CI for the slope $\beta_1$ in the column [95% Conf. Interval].

95% Confidence Interval for the Slope, $\beta_1$

1) Best single guess (estimate) = $\hat\beta_1$
2) Standard error of the best single guess (SE[estimate]) = $se(\hat\beta_1)$ = 0.00268
3) Confidence coefficient = 97.5th percentile of Student t = $t_{.975;\, df=9}$ = 2.26

95% Confidence Interval for Slope $\beta_1$ = Estimate ± (confidence coefficient)*SE
= $\hat\beta_1$ ± (2.26)(0.00268)
= (0.1898, 0.2019)

95% Confidence Interval for the Intercept, $\beta_0$

In the Stata fit, the row labeled _cons reports the 95% CI for the intercept $\beta_0$ in the column [95% Conf. Interval].

1) Best single guess (estimate) = $\hat\beta_0$ = -2.68925
2) Standard error of the best single guess (SE[estimate]) = $se(\hat\beta_0)$
3) Confidence coefficient = 97.5th percentile of Student t = $t_{.975;\, df=9}$ = 2.26

95% Confidence Interval for Intercept $\beta_0$ = Estimate ± (confidence coefficient)*SE
= -2.68925 ± (2.26)*$se(\hat\beta_0)$
= (-2.7585, -2.6200)

For the brave: Stata Example, continued

Confidence Intervals for MEAN of Z at Each Value of X

    . * Fit Z to x
    . regress z x
    . * Save fitted values xb (this is internal to Stata) to a new variable called zhat
    . predict zhat, xb
    . * Obtain SE for MEAN of Z at each X (stdp is internal to Stata) as a new variable called semeanz
    . predict semeanz, stdp
    . * Obtain confidence coefficient = 97.5th percentile of T on df=9
    . generate tmult=invttail(9,.025)
    . * Generate lower and upper 95% CI limits for MEAN of Z at each X
    . generate lowmeanz=zhat-tmult*semeanz
    . generate highmeanz=zhat+tmult*semeanz
    . * List fitted means with 95% CI limits
    . list x z zhat lowmeanz highmeanz, clean

The listing has columns x, z, zhat, lowmeanz, and highmeanz.

Stata Example, continued: Confidence Intervals for INDIVIDUAL PREDICTED Z at Each Value of X

  . * Fit Z to x
  . regress z x
  . * Save fitted values to a new variable called zhat
  . predict zhat, xb
  . ** Obtain SE for INDIVIDUAL PREDICTION of Z at given X (internal to Stata) to a new variable sepredictz
  . predict sepredictz, stdf
  . ** Obtain confidence coefficient = 97.5th percentile of T on df=9
  . generate tmult=invttail(9,.025)
  . ** Generate lower and upper 95% CI limits for INDIVIDUAL PREDICTED Z at Each X
  . generate lowpredictz=zhat-tmult*sepredictz
  . generate highpredictz=zhat+tmult*sepredictz
  . *** List Individual Predictions with 95% CI Limits
  . list x z zhat lowpredictz highpredictz, clean

        x    z    zhat    lowpred~z    highpre~z

9. Introduction to Correlation

Definition of Correlation

A correlation coefficient is a measure of the association between two paired random variables (e.g. height and weight).

The Pearson product moment correlation, in particular, is a measure of the strength of the straight line relationship between the two random variables.

Another correlation measure (not discussed here) is the Spearman correlation. It is a measure of the strength of the monotone increasing (or decreasing) relationship between the two random variables. The Spearman correlation is a non-parametric (meaning model free) measure. It is introduced in BIOSTATS 640, Intermediate Biostatistics.

Formula for the Pearson Product Moment Correlation ρ

Population product moment correlation = ρ
Sample based estimate = r

Some preliminaries:

(1) Suppose we are interested in the correlation between X and Y.

(2) $\widehat{cov}(x,y) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)} = \dfrac{S_{xy}}{(n-1)}$. This is the covariance(x,y).

(3) $\widehat{var}(x) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{(n-1)} = \dfrac{S_{xx}}{(n-1)}$, and similarly

(4) $\widehat{var}(y) = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{(n-1)} = \dfrac{S_{yy}}{(n-1)}$

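Formula for Estimate of Pearson Product Moment Correlation from a Sample

$$\hat{\rho} = r = \frac{\widehat{cov}(x,y)}{\sqrt{\widehat{var}(x)\,\widehat{var}(y)}} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

If you absolutely have to do it by hand, an equivalent (more calculator/Excel friendly) formula is

$$\hat{\rho} = r = \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sqrt{\left[\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]\left[\sum_{i=1}^{n} y_i^2 - \dfrac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right]}}$$

The correlation r can take on values between -1 and +1 only. The units of x and y cancel in the ratio. Thus, the correlation coefficient is said to be dimensionless: it is independent of the units of x or y.

Sign of the correlation coefficient (positive or negative) = Sign of the estimated slope $\hat{\beta}_1$.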
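The definitional formula and the calculator-friendly formula for r should give the same number, and r should not change when x is re-expressed in different units. The sketch below checks both in plain Python. The function names and the height/weight values are hypothetical, chosen only for illustration.

```python
import math

def pearson_r_definitional(x, y):
    """r = S_xy / sqrt(S_xx * S_yy), using deviations from the means."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def pearson_r_computational(x, y):
    """Same r from raw sums (the calculator-friendly form)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

# Hypothetical paired measurements: height in cm, weight in kg.
heights_cm = [150, 160, 165, 172, 180]
weights_kg = [52, 58, 63, 70, 80]

r_def = pearson_r_definitional(heights_cm, weights_kg)
r_calc = pearson_r_computational(heights_cm, weights_kg)

# Re-express height in inches: r is dimensionless, so it should not change.
heights_in = [h / 2.54 for h in heights_cm]
r_inches = pearson_r_definitional(heights_in, weights_kg)

print(round(r_def, 6), round(r_calc, 6), round(r_inches, 6))
```

The raw-sums form is convenient by hand but can lose precision when the values are large relative to their spread, which is one reason software usually works with deviations from the means.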

There is a relationship between the slope of the straight line, $\hat{\beta}_1$, and the estimated correlation r.

Relationship between slope $\hat{\beta}_1$ and the sample correlation r

Tip! This is very handy.

Because

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \quad \text{and} \quad r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

a little algebra reveals that

$$r = \frac{\sqrt{S_{xx}}}{\sqrt{S_{yy}}} \, \hat{\beta}_1$$

Thus, beware!!! A very large (positive or negative) r can accompany a slope that is only slightly different from zero, inasmuch as
- A very large r might reflect a very large S_xx, all other things equal
- A very large r might reflect a very small S_yy, all other things equal.
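The warning above can be made concrete with a small numerical illustration. The sketch below uses hypothetical data in which x is spread over a very wide range, so S_xx is huge: the slope is tiny, yet r is close to 1. The helper name `slope_and_r` and the data are assumptions made for this illustration only.

```python
import math

def slope_and_r(x, y):
    """Return (estimated slope, r, S_xx, S_yy) for paired data."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy), sxx, syy

# Hypothetical data: x ranges over thousands, y over single digits.
x = [0, 1000, 2000, 3000, 4000]
y = [0.0, 1.1, 1.9, 3.0, 4.0]

b1, r, sxx, syy = slope_and_r(x, y)

# The identity from the notes: r = sqrt(S_xx / S_yy) * slope.
r_from_slope = math.sqrt(sxx / syy) * b1

print("slope:", round(b1, 6))
print("r:", round(r, 4))
print("r rebuilt from the slope:", round(r_from_slope, 4))
```

Here the slope is roughly 0.001 per unit of x while r is above 0.99, so the size of r says nothing by itself about how steep the line is.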

10. Hypothesis Test of Correlation

The null hypothesis of zero correlation is equivalent to the null hypothesis of zero slope.

Research Question: Is the correlation ρ = 0? Is the slope β1 = 0?

Assumptions: As before.

H_O and H_A:
H_O: ρ = 0
H_A: ρ ≠ 0

Test Statistic: A little algebra (not shown) yields a very nice formula for the t-score that we need.

$$t\ \text{score} = \frac{r\sqrt{(n-2)}}{\sqrt{1 - r^2}}, \qquad df = (n-2)$$

We can find this information in our output. Recall the first example and the model of Z=LOGWT to X=AGE. The Pearson correlation, r, is the square root of the R-squared in the output, taken with the sign of the estimated slope.

      Source |   SS     df    MS            Number of obs = 11
             |                              F(1, 9)       = [...]
       Model |   [...]                      Prob > F      = [...]
    Residual |   [...]                      R-squared     = 0.9983
             |                              Adj R-squared = [...]
       Total |   [...]                      Root MSE      = .0807

Pearson Correlation, r = $\sqrt{0.9983}$ = 0.9991

Substitution into the formula for the t-score yields

$$t\ \text{score} = \frac{r\sqrt{(n-2)}}{\sqrt{1 - r^2}} = \frac{.9991\sqrt{9}}{\sqrt{1 - .9983}} = \frac{2.9974}{\sqrt{.0017}} = 72.69$$

Note: The value .9991 in the numerator is $r = \sqrt{R^2} = \sqrt{.9983} = .9991$.

This is very close to the value of the t-score that was obtained for testing the null hypothesis of zero slope. The discrepancy is probably rounding error. I did the calculations on my calculator using 4 significant digits. Stata probably used more significant digits - cb.
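The claim that the correlation t-score matches the slope t-score can be checked numerically. The sketch below first evaluates the t-score formula using R-squared = .9983 and df = 9 from the notes (which implies n = 11 for a one-predictor model), then confirms on a small hypothetical data set that the correlation formula and the usual slope test, slope divided by its standard error, agree up to floating-point error. The function names and the toy data are assumptions for illustration.

```python
import math

def t_from_r(r, n):
    """t-score for H0: rho = 0, on n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def slope_t_and_r(x, y):
    """Return (t-score for H0: slope = 0, sample correlation r)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt((sse / (n - 2)) / sxx)   # SE of the estimated slope
    return b1 / se_b1, sxy / math.sqrt(sxx * syy)

# Values taken from the notes: R-squared = .9983, df = n - 2 = 9.
r_notes = math.sqrt(0.9983)
t_notes = t_from_r(r_notes, n=11)
print("t-score from R-squared in the notes:", round(t_notes, 2))

# Hypothetical data to confirm the two t-scores coincide.
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 2.9, 4.2, 4.8, 6.1, 7.0]
t_slope, r = slope_t_and_r(x, y)
t_corr = t_from_r(r, n=len(x))
print("slope t:", round(t_slope, 4), " correlation t:", round(t_corr, 4))
```

With full floating-point precision the first value comes out near 72.7, which is consistent with the explanation in the notes that small differences from the Stata t-score are due to rounding in the hand calculation.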


More information

University of California, Los Angeles Department of Statistics. Simple regression analysis

University of California, Los Angeles Department of Statistics. Simple regression analysis Uiversity of Califoria, Los Ageles Departmet of Statistics Statistics 100C Istructor: Nicolas Christou Simple regressio aalysis Itroductio: Regressio aalysis is a statistical method aimig at discoverig

More information

Overview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions

Overview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions Chapter 9 Slide Ifereces from Two Samples 9- Overview 9- Ifereces about Two Proportios 9- Ifereces about Two Meas: Idepedet Samples 9-4 Ifereces about Matched Pairs 9-5 Comparig Variatio i Two Samples

More information

Statistics 203 Introduction to Regression and Analysis of Variance Assignment #1 Solutions January 20, 2005

Statistics 203 Introduction to Regression and Analysis of Variance Assignment #1 Solutions January 20, 2005 Statistics 203 Itroductio to Regressio ad Aalysis of Variace Assigmet #1 Solutios Jauary 20, 2005 Q. 1) (MP 2.7) (a) Let x deote the hydrocarbo percetage, ad let y deote the oxyge purity. The simple liear

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y.

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y. Testig Statistical Hypotheses Recall the study where we estimated the differece betwee mea systolic blood pressure levels of users of oral cotraceptives ad o-users, x - y. Such studies are sometimes viewed

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

Lecture 5: Parametric Hypothesis Testing: Comparing Means. GENOME 560, Spring 2016 Doug Fowler, GS

Lecture 5: Parametric Hypothesis Testing: Comparing Means. GENOME 560, Spring 2016 Doug Fowler, GS Lecture 5: Parametric Hypothesis Testig: Comparig Meas GENOME 560, Sprig 2016 Doug Fowler, GS (dfowler@uw.edu) 1 Review from last week What is a cofidece iterval? 2 Review from last week What is a cofidece

More information

Median and IQR The median is the value which divides the ordered data values in half.

Median and IQR The median is the value which divides the ordered data values in half. STA 666 Fall 2007 Web-based Course Notes 4: Describig Distributios Numerically Numerical summaries for quatitative variables media ad iterquartile rage (IQR) 5-umber summary mea ad stadard deviatio Media

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE

April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE TERRY SOO Abstract These otes are adapted from whe I taught Math 526 ad meat to give a quick itroductio to cofidece

More information

Correlation. Two variables: Which test? Relationship Between Two Numerical Variables. Two variables: Which test? Contingency table Grouped bar graph

Correlation. Two variables: Which test? Relationship Between Two Numerical Variables. Two variables: Which test? Contingency table Grouped bar graph Correlatio Y Two variables: Which test? X Explaatory variable Respose variable Categorical Numerical Categorical Cotigecy table Cotigecy Logistic Grouped bar graph aalysis regressio Mosaic plot Numerical

More information

Topic 10: Introduction to Estimation

Topic 10: Introduction to Estimation Topic 0: Itroductio to Estimatio Jue, 0 Itroductio I the simplest possible terms, the goal of estimatio theory is to aswer the questio: What is that umber? What is the legth, the reactio rate, the fractio

More information

Lecture 1, Jan 19. i=1 p i = 1.

Lecture 1, Jan 19. i=1 p i = 1. Lecture 1, Ja 19 Review of the expected value, covariace, correlatio coefficiet, mea, ad variace. Radom variable. A variable that takes o alterative values accordig to chace. More specifically, a radom

More information

Chapter 13, Part A Analysis of Variance and Experimental Design

Chapter 13, Part A Analysis of Variance and Experimental Design Slides Prepared by JOHN S. LOUCKS St. Edward s Uiversity Slide 1 Chapter 13, Part A Aalysis of Variace ad Eperimetal Desig Itroductio to Aalysis of Variace Aalysis of Variace: Testig for the Equality of

More information

STAT431 Review. X = n. n )

STAT431 Review. X = n. n ) STAT43 Review I. Results related to ormal distributio Expected value ad variace. (a) E(aXbY) = aex bey, Var(aXbY) = a VarX b VarY provided X ad Y are idepedet. Normal distributios: (a) Z N(, ) (b) X N(µ,

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

Chapter 6 Part 5. Confidence Intervals t distribution chi square distribution. October 23, 2008

Chapter 6 Part 5. Confidence Intervals t distribution chi square distribution. October 23, 2008 Chapter 6 Part 5 Cofidece Itervals t distributio chi square distributio October 23, 2008 The will be o help sessio o Moday, October 27. Goal: To clearly uderstad the lik betwee probability ad cofidece

More information

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio

More information

TMA4245 Statistics. Corrected 30 May and 4 June Norwegian University of Science and Technology Department of Mathematical Sciences.

TMA4245 Statistics. Corrected 30 May and 4 June Norwegian University of Science and Technology Department of Mathematical Sciences. Norwegia Uiversity of Sciece ad Techology Departmet of Mathematical Scieces Corrected 3 May ad 4 Jue Solutios TMA445 Statistics Saturday 6 May 9: 3: Problem Sow desity a The probability is.9.5 6x x dx

More information