REGRESSION AND ANALYSIS OF VARIANCE. Motivation. Module structure

Size: px
Start display at page:

Download "REGRESSION AND ANALYSIS OF VARIANCE. Motivation. Module structure"

Transcription

1 REGRESSION AND ANALYSIS OF VARIANCE 1 Motivatio Objective: Ivestigate associatios betwee two or more variables What tools do you already have? t-test Compariso of meas i two populatios What will we cover i this module? Liear Regressio Associatio of a cotiuous outcome with oe or more predictors (categorical or cotiuous) Aalysis of Variace Compariso of a cotiuous outcome over a fixed umber of groups Logistic Regressio Associatio of a biary outcome with oe or more predictors (categorical or cotiuous) Module structure 1 sessios over.5 days Alteratig i-class ad lab practical sessios, each of approximately 1.5 hour duratio Day 1 Simple liear regressio Day Model checkig Multiple liear regressio ANOVA Day 3 ANCOVA Logistic regressio 3

2 REGRESSION MODELS SIMPLE LINEAR REGRESSION 4 Outlie: Simple Liear Regressio Motivatio The equatio of a straight lie Least Squares Estimatio Iferece About regressio coefficiets About predictios Model Checkig Residual aalysis Outliers & Ifluetial observatios 5 Motivatio: Cholesterol Example Data: Factors related to serum total cholesterol, 4 idividuals, 11 variables > head(cholesterol) ID sex age chol BMI TG APOE rs rs HTN chd Our first goal: Ivestigate the relatioship betwee cholesterol (mg/dl) ad age i adults 6

3 Motivatio: Cholesterol Example 7 Motivatio: Cholesterol Example Is cholesterol associated with age? You could dichotomize age ad compare cholesterol betwee two age groups > group = 1*(age > 55) > group=factor(group,levels=c(,1), labels=c("3-55","56-8")) > table(group) group > boxplot(chol~group,ylab= Total cholesterol(mg/dl) ) 8 mea i group 3-55 mea i group 56-8 Motivatio: Cholesterol Example Is cholesterol associated with age? You could compare mea cholesterol betwee two groups: t-test > t.test(chol ~ group) Welch Two Sample t-test data: chol by group t = , df = , p-value =.315 alterative hypothesis: true differece i meas is ot equal to 95 percet cofidece iterval: sample estimates: mea i group 3-55 mea i group

4 Motivatio: Cholesterol Example Questio: What do the boxplot ad the t-test tell us about the relatioship betwee age ad cholesterol? > t.test(chol ~ group) Welch Two Sample t-test data: chol by group t = , df = , p-value =.315 alterative hypothesis: true differece i meas is ot equal to 95 percet cofidece iterval: sample estimates: mea i group 3-55 mea i group Motivatio: Cholesterol Example Usig the t-test: There is a statistical associatio betwee cholesterol ad age There appears to be a positive associatio betwee cholesterol ad age Is there ay way we could estimate the magitude of this associatio without breakig the cotiuous measure of age ito subgroups? With the t-test, we compared mea cholesterol i two age groups, could we compare mea cholesterol across cotiuous age? 11 Motivatio: Cholesterol Example We might assume that mea cholesterol chages liearly with age: Ca we fid the equatio for a straight lie that best fits these data? 1

5 Liear Regressio A statistical method for modelig the relatioship betwee a cotiuous variable [respose/outcome/depedet] ad other variables [predictors/exposure/idepedet] Most commoly used statistical model Flexible Well-developed ad uderstood properties Easy iterpretatio Buildig block for more geeral models Goals of aalysis: Estimate the associatio betwee respose ad predictors or, Predict respose values give the values of the predictors. We will start our discussio studyig the relatioship betwee a respose ad a sigle predictor Simple liear regressio model 13 The straight lie equatio Y A lie ca be described by two umbers y = b + b 1 x X 14 The straight lie equatio Y b o is the itercept: where the lie crosses the y-axis whe x= X 15

6 The straight lie equatio Y b 1 is the slope: the chage i y correspodig to a uit icrease i x x x+1 X 16 The straight lie equatio Y b 1 is the slope: the chage i y correspodig to a uit icrease i x b +b 1 (x+1) b +b 1 x Differece is b 1 x x+1 X 17 The straight lie equatio Y b 1 is the slope: the chage i y correspodig to a uit icrease i x The same across the etire lie X 18

7 The straight lie equatio Y Two values of x uits apart will have a differece i y values of *b 1 X 19 The straight lie equatio Slope b 1 is the chage i y correspodig to a uit icrease i x Slope gives iformatio about magitude ad directio of the associatio betwee x ad y The straight lie equatio y (b 1 =) No associatio betwee x ad y (values of y are the same regardless of x) y x (b 1 > ) Positive associatio betwee x ad y (values of y icrease as values of x icrease) y x (b 1 < ) Negative associatio betwee x ad y (values of y decrease as values of x icrease) x 1

8 Simple Liear Regressio We ca use liear regressio to model how the mea of a outcome Y chages with the level of a predictor, X The idividual Y observatios will be scattered about the mea We estimate a straight lie describig tred i the mea of a outcome Y as a fuctio of predictor X Simple Liear Regressio I regressio: X is used to predict or explai outcome Y. Respose or depedet variable (Y): variable we wat to predict or explai Explaatory or idepedet or predictor variable (X): attempts to explai the respose Simple Liear Regressio Model: y = b + b1 x + e, e ~ N(, s ) 3 Simple Liear Regressio y = b + b1 x + e, e ~ N(, s ) The model cosists of two compoets: Systematic compoet: E[Y X = x] = β + β 1 x b 1 : slope Mea populatio value of Y at X=x b :itercept Radom compoet: Var[Y X = x]=σ Variace does ot deped o x 4

9 Simple Liear Regressio: Assumptios MODEL: E 1 [ Y X = x] = β + β x Var[ Y X = x] = σ Compare with the boxplots o Slide 8 5 Simple Liear Regressio: Iterpretig model coefficiets Model: E[Y x] = b +b 1 x Var[Y x] = s Questio: How do you iterpret b? Aswer: b = E[Y x=], that is, the mea respose whe x= Your tur: iterpret b 1 6 Simple Liear Regressio: Iterpretig model coefficiets Model: E[Y x] = b +b 1 x Var[Y x] = s Questio: How do you iterpret b 1? Aswer: E[Y x] = β + β 1 x E[Y x+1] = β + β 1 (x+1) = β + β 1 x+ β 1 E[Y x+1] E[Y x] = β 1 idepedet of x (liearity) i.e. β 1 is the differece i the mea respose associated with a oe uit positive differece i x 7

10 Example: Cholesterol ad age Recall: Our motivatig example was to determie if there is a associatio betwee age (a cotiuous predictor) ad cholesterol (a cotiuous outcome) Suppose: We believe they are associated via the liear relatioship E[Y x] = b +b 1 x Questio: How would you iterpret b 1? Aswer: 8 Example: Cholesterol ad age Recall: Our motivatig example was to determie if there is a associatio betwee age (a cotiuous predictor) ad cholesterol (a cotiuous outcome) Suppose: We believe they are associated via the liear relatioship E[Y x] = b +b 1 x Questio: How do you iterpret b 1? Aswer: β 1 is the differece i mea cholesterol associated with a oe year icrease i age 9 Least Squares Estimatio Questio: How to fid a best-fittig lie? 3

11 8 6 y 4 Least Squares Estimatio Questio: How to fid a best-fittig lie? x Method: Least Squares Estimatio Idea: chooses the lie that miimizes the sum of squares of the vertical distaces from the observed poits to the lie. 31 Least Squares Estimatio The least squares regressio lie is give by yˆ = β ˆ + βˆ 1x So the (squared) distace betwee the data (y) ad the least squares regressio lie is D = ( y i yˆ i ) i We estimate β ad β 1 by fidig the values that miimize D 3 Least Squares Estimatio These values are: βˆ = y βˆ x 1 βˆ ( xi x)( yi y) ( xi x) 1 = We estimate the variace as ( y ˆ i yi ) i= 1 i= 1 ˆ ri i= 1 σ = = = ( y βˆ βˆ x ) i 1 i 33

12 Estimated Stadard Errors Recall that whe estimatig parameters, there will be samplig variability i the estimates This is true for regressio parameter estimates Lookig at the formulas for ˆβ ad ˆβ 1, we ca see that these are just complicated meas I repeated samplig we would get differet estimates Kowledge of the samplig distributio of parameter estimates ca help us make iferece about the lie 34 true regressio lie Click here to simulate a data set 35 36

13 Samplig Distributio 37 Samplig Distributio 38 From statistical theory (i the figure, the theoretical distributio is show i pik) " # $ $ % & " # $ $ % & " # $ $ % & " # $ $ % & ) ( 1, ~ ˆ 1) ( 1, ~ ˆ x x s N s x N σ β β σ β β Samplig variability

14 -5 5 desity desity -5 5 desity desity -5 5 desity desity desity Estimated Stadard Errors Estimate the variability of βˆ, βˆ across repeated samplig 1 SE( βˆ ) =σˆ 1 x + ( 1) s x SE( βˆ ) =σˆ 1 1 ( 1) s x 4 Iferece About regressio model parameters Hypothesis testig: H : b j = Test Statistic: Large Samples: βˆ j ( ull hyp) ~ N(,1) se( βˆ ) j Small Samples: ˆ b j - ( ull hyp) ~ t se( ˆ b ) j - Cofidece Itervals: β ˆ ± ( critical value) se( βˆ ) j j [Do t worry about these formulae: we will use R to fit the models] 41 Iferece: Hypothesis Testig Null Hypothesis: b j = T=test statistic Alterative P-Value b j > P(t - >T) desity P-value b j < P(t - <T) desity P-value b j ¹ P(t - > T ) desity P-value 4

15 Iferece: Cofidece Itervals 1 (1-a)% Cofidece Iterval for b j (j=,1) βˆ ± t j, α SE( βˆ ) j Gives itervals that (1- α)1% of the time will cover the true parameter value ( β or β 1 ). We say we are (1- α)1% cofidet the iterval covers β j. 43 Example: Scietific Questio: Is cholesterol associated with age? > fit = lm(chol ~ age) > summary(fit) Call: lm(formula = chol ~ age) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age e-5 *** --- Sigif. codes: ***.1 **.1 * Residual stadard error: 1.69 o 398 degrees of freedom Multiple R-squared:.499, Adjusted R-squared:.3858 F-statistic: 17.1 o 1 ad 398 DF, p-value: 4.5e-5 > cofit(fit).5 % 97.5 % (Itercept) age Example: Scietific Questio: Is cholesterol associated with age? > fit = lm(chol ~ age) > summary(fit) Call: lm(formula = chol ~ age) Residuals: Mi 1Q Media 3Q Max Coefficiets: βˆ 1 =.31 ; Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age e-5 *** --- Sigif. codes: ***.1 **.1 * Residual stadard error: 1.69 o 398 degrees of freedom Multiple R-squared:.499, Adjusted R-squared:.3858 F-statistic: 17.1 o 1 ad 398 DF, p-value: 4.5e-5 Estimates of the model parameters ad stadard errors βˆ = ; se( βˆ ) = 4.6 se( βˆ ) =.8 1 > cofit(fit).5 % 97.5 % (Itercept) age

16 Example: Scietific Questio: Is cholesterol associated with age? > fit = lm(chol ~ age) > summary(fit) Call: lm(formula = chol ~ age) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age e-5 *** --- Sigif. codes: ***.1 **.1 * Residual stadard error: 1.69 o 398 degrees of freedom Multiple R-squared:.499, Adjusted R-squared:.3858 F-statistic: 17.1 o 1 ad 398 DF, p-value: 4.5e-5 95% Cofidece itervals > cofit(fit).5 % 97.5 % (Itercept) age Example: Scietific Questio: Is cholesterol associated with age? What do these models results mea i terms of our scietific questio? Parameter estimates ad cofidece itervals: ˆ b = % CI: (158.5, 175.3) ˆ b = % CI: (.16,.46) bˆ : The estimated average serum cholesterol for someoe of age = is 166.9? Your tur: What about 1 ˆb? 47 Example: Scietific Questio: Is cholesterol associated with age? What do these models results mea i terms of our scietific questio? Parameter estimates ad cofidece itervals: ˆ β = % CI: (158.5, 175.3) ˆ β 1 =.31 95% CI: (.16,.46) Aswer: 1 ˆb : mea cholesterol is estimated to differ by.31 mg/dl for each oe year differece i age. Questio: What about the cofidece itervals? 48

17 Example: Scietific Questio: Is cholesterol associated with age? What do these models results mea i terms of our scietific questio? Parameter estimates ad cofidece itervals: ˆ β = % CI: (158.5, 175.3) ˆ β 1 =.31 95% CI: (.16,.46) Aswer: 95% CIs give us a rage of values that will cover the true itercept ad slope 95% of the time For istace, we ca be 95% cofidet that the true differece i mea cholesterol associated with a oe year differece i age lies betwee.16 ad.46 mg/dl 49 Example: Scietific Questio: Is cholesterol associated with age? Presetatio of the results? The mea serum total cholesterol is sigificatly higher i older idividuals (p <.1). For each additioal year of age, we estimate that the mea total cholesterol differs by approximately.31 mg/dl (95% CI:.16,.46). Note: Emphasis o slope parameter (sig ad magitude) Cofidece iterval Uits for predictor ad respose. Scale matters 5 Iferece for predictios Give estimates βˆ, βˆ 1 we ca fid the predicted value, ŷ i for ay value of x i as ˆ = β ˆ + βˆ yi 1x i Iterpretatio of ŷ i : Estimated mea value of Y at X = x. i Be Cautious: This assumes the model is true. May be a reasoable assumptio withi the rage of your data. It may ot be true outside the rage of your data 51

18 y True model Observed values of x x No data i this rage Would you use the regressio lie to extrapolate?? 5 Be careful of extrapolatig Height (iches) Age It would ot make sese to extrapolate height at age from a study of girls aged 4-9 years 53 Predictio Predictio of the mea E[Y X=x]: Poit Estimate: yˆ =β ˆ ˆ x + β1 Stadard Error: 1 ( x x) se( yˆ) = σˆ + ( x i x) i= 1 Note that as x gets further from x, variace icreases 1 (1-a)% cofidece iterval for E[Y X=x]: ± t se( ˆ) yˆ,1 α / y 54

19 Predictio Predictio of a ew future observatio, y*, at X=x: * Poit Estimate: yˆ =β ˆ ˆ x + β1 Stadard Error: * 1 ( x x) se( yˆ ) = σˆ 1+ + ( x i x) i= 1 1 (1-a)% predictio iterval for a ew future observatio: * * ˆ ± t se( yˆ ) y,1 α / Stadard error for the predictio of a future observatio is bigger: It depeds ot oly o the precisio of the estimated mea, but also o the amout of variability i Y aroud the lie. 55 Cholesterol Example: Predictio Predictio of the mea > predict.lm(fit, ewdata=data.frame(age=c(46,47,48)), iterval="cofidece") fit lwr upr > predict.lm(fit, ewdata=data.frame(age=c(46,47,48)), iterval="predictio") fit lwr upr Predictio of a ew observatio 56 Example: Scietific Questio: Is cholesterol associated with age? Let s iterpret these predictios For x = 46 yˆ = % CI: (178.7, 183.7) yˆ* = % CI: (138.5, 3.9) Questio: How do our iterpretatios for ŷ ad differ? ŷ* 57

20 Example: Scietific Questio: Is cholesterol associated with age? Let s iterpret these predictios For x = 46 Questio: How do our iterpretatios for ŷ ad ŷ* differ? yˆ = % CI: (178.7, 183.7) yˆ* = % CI: (138.5, 3.9) Aswer: The poit estimates represet our predictios for the mea serum cholesterol for idividuals age 46 ( ŷ ) ad for a sigle ew idividual of age 46 ( ŷ* ) 58 Example: Scietific Questio: Is cholesterol associated with age? Let s iterpret these predictios For x = 46 yˆ = % CI: (178.7, 183.7) yˆ* = % CI: (138.5, 3.9) Questio: Why are the cofidece itervals for ŷ ad ŷ* of differig widths? 59 Example: Scietific Questio: Is cholesterol associated with age? Let s iterpret these predictios For x = 46 Questio: Why are the cofidece itervals for ŷ ad ŷ* of differig widths? yˆ = % CI: (178.7, 183.7) yˆ* = % CI: (138.5, 3.9) Aswer: The iterval is broader whe we make a predictio for a cholesterol level for a sigle idividual because it must icorporate radom variability aroud the mea. 6

21 y Simple Liear Regressio: R Give o liear associatio: We could simply use the sample mea to predict E(Y). The variability usig this simple predictio is give by SST (to be defied shortly). Give a liear associatio: The use of X permits a potetially better predictio of Y by usig E(Y X). Questio: What did we gai by usig X? Let s examie this questio with the followig figure 61 Decompositio of sum of squares y y yˆ y y yˆ y x 6 Decompositio of sum of squares It is always true that: y y = ( y yˆ ) + ( yˆ y) i i i i It ca be show that: ( yi y) = ( yi yˆ i ) + i= 1 i= 1 i= 1 ( yˆ y) i SST = SSE + SSR SST: describes the total variatio of the Y i. SSE: describes the variatio of the Y i aroud the regressio lie. SSR: describes the structural variatio; how much of the variatio is due to the regressio relatioship. This decompositio allows a characterizatio of the usefuless of the covariate X i predictig the respose variable Y. 63

22 Simple Liear Regressio: R Give o liear associatio: We could simply use the sample mea to predict E(Y). The variability betwee the data ad this simple predictio is give as SST. Give a liear associatio: The use of X permits a potetially better predictio of Y by usig E(Y X). Questio: What did we gai by usig X? Aswer: We ca aswer this by computig the proportio of the total variatio that ca be explaied by the regressio o X R SSR SST SSE = = = 1 SST SST SSE SST This R is, i fact, the correlatio coefficiet squared. 64 Examples of R Low values of R idicate that the model is ot adequate. However, high values of R do ot mea that the model is adequate 65 Cholesterol Example: Scietific Questio: Ca we predict cholesterol based o age? > fit = lm(chol ~ age) > summary(fit) Call: lm(formula = chol ~ age) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age e-5 *** --- Sigif. codes: ***.1 **.1 * Residual stadard error: 1.69 o 398 degrees of freedom Multiple R-squared:.499, Adjusted R-squared:.3858 F-statistic: 17.1 o 1 ad 398 DF, p-value: 4.5e-5 > cofit(fit).5 % 97.5 % (Itercept) age

23 R =.4 Cholesterol Example: Scietific Questio: Ca we predict cholesterol based o age? What does R tell us about our model for cholesterol? 67 R =.4 Cholesterol Example: Scietific Questio: Ca we predict cholesterol based o age? What does R tell us about our model for cholesterol? Aswer: 4% of the variability i cholesterol is explaied by age. Although mea cholesterol icreases with age, there is much more variability i cholesterol tha age aloe ca explai 68 Cholesterol Example: Scietific Questio: Ca we predict cholesterol based o age? Decompositio of Sum of Squares ad the F-statistic > aova(fit) Aalysis of Variace Table Respose: chol Df Sum Sq Mea Sq F value Pr(>F) SSR= age e-5 *** SSE= Residuals Sigif. codes: ***.1 **.1 * Degrees of freedom Decompositio of the Sum of Squares Mea Squares: SS/df F-statistic: MSR/MSE I simple liear regressio: F-statistic = (t-statistic for slope) Hypothesis beig tested: H : b 1 =, H 1 : b 1 ¹. 69

24 Simple Liear Regressio: Assumptios 1. E[Y x] is related liearly to x. Y s are idepedet of each other 3. Distributio of [Y x] is ormal 4. Var[Y x] does ot deped o x Liearity Idepedece Normality Equal variace Ca we assess if these assumptios are valid? 7 Model Checkig: Residuals (Raw or ustadardized) Residual: differece (r i ) betwee the observed respose ad the predicted respose, that is, r = y yˆ i i = y ( β ˆ + βˆ x ) i i 1 i The residual captures the compoet of the measuremet y i that caot be explaied by x i. 71 Model Checkig: Residuals Residuals ca be used to Idetify poorly fit data poits Idetify uequal variace (heteroscedasticity) Idetify oliear relatioships Idetify additioal variables Examie ormality assumptio 7

25 Model Checkig: Residuals Liearity Idepedece Normality Equal variace Plot residual vs X or vs Ŷ Q: Is there ay tred? Q: Ay scietific cocers? Residual histogram or qq-plot Q: Symmetric? Normal? Plot residual vs X Q: Is there ay patter? 73 Model Checkig: Residuals If the liear model is appropriate we should see a ustructured horizotal bad of poits cetered at zero as see i the figure below Residuals 1 1 Deviatio = residual x 74 Model Checkig: Residuals The model does ot provide a good fit i these cases Residuals Residuals Violatios of the model assumptios? How? 75

26 Simple Liear Regressio: Residual Aalysis: Noliear Associatio True model: y = x^1.7 Plot of Fitted Model: ^ y= x 1 5 Residuals Plot fitted (predictio) vs. residual: ^y= x x Fitted values 76 Simple Liear Regressio: Residual Aalysis: No Costat Variace True model: y = x + errors icreasig with x 1 Plot of Fitted Model: ^y= x Plot fitted (predictio) vs. residual: ^y= x 5 5 Residuals x Fitted values 77 No-costat variace Sometimes variace of y is ot costat across the rage of x (heteroscedasticity) Little effect o poit estimates but variace estimates may be icorrect This may affect cofidece itervals ad p-values To accout for heteroscedasticity we ca Use robust stadard errors Trasform the data Fit a model that does ot assume costat variace (GLM) 78

27 Robust stadard errors Robust stadard errors correctly estimate variability of parameter estimates eve uder ocostat variace These stadard errors use empirical estimates of the variace i y at each x value rather tha assumig this variace is the same for all x values Regressio poit estimates will be uchaged Robust or empirical stadard errors will give correct cofidece itervals ad p-values 79 Simple Liear Regressio: Residual Aalysis: No-ormality of errors QQ-plot Graphical techique that allows us to assess whether or ot a data set follows a give distributio (such as the ormal distributio) The data are plotted agaist a give theoretical distributio Poits should approximately fall i a straight lie Departures from the straight lie idicate departures from the specified distributio. 8 Simple Liear Regressio: Residual Aalysis: No-ormality of errors Costructio of QQ-Plot: (a example) residuals Sort sorted idex(i) Empirical Probab i/ (i-.5)/ pr1 pr Z-quatile Get z-values -.19 (quatiles from -.6 Normal distr.) Plot of sorted residuals (sample quatiles) versus z-quatile (theoretical quatiles) = QQ-plot 81

28 Simple Liear Regressio: Residual Aalysis: No-ormality of errors Residuals Iverse Normal True model: y = x + chi-squared errors Plot of Fitted Model: Q-Q Plot Uder ormality, residuals should fall o the straight lie ^ 8 y= x x Plot of residuals versus fitted values Curvature? Heteroscedasticity? R COMMAND: plot(fit$fitted, fit$residuals) Plot of residuals versus quatiles of a ormal distributio(for > 3) Normality? R COMMAND: qqorm(fit$residuals) Cholesterol-Age example: Residuals fit$fitted fit$residuals Theoretical Quatiles Sample Quatiles 83 Aother example Liear regressio for associatio betwee age ad triglycerides > fit.tg=lm(tg~age)

29 Robust stadard errors Residual aalysis suggests meavariace relatioship Use robust stadard errors to get correct variace estimates fit.tg$fitted fit.tg$residuals 85 Cholesterol example: Robust stadard errors Liear regressio results: Results icorporatig robust SEs: > summary(fit.tg) Call: lm(formula = TG ~ age) Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) e-6 *** age < e-16 *** > summary(fit.tg.ese) Call: gee(formula = TG ~ age, id = seq(1, legth(age))) Coefficiets: Estimate Naive S.E. Naive z Robust S.E. Robust z (Itercept) age Poit estimates are uchaged 86 Cholesterol example: Robust stadard errors Liear regressio results: Results icorporatig robust SEs: > summary(fit.tg) Call: lm(formula = TG ~ age) Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) e-6 *** age < e-16 *** > summary(fit.tg.ese) Call: gee(formula = TG ~ age, id = seq(1, legth(age))) Coefficiets: Estimate Naive S.E. Naive z Robust S.E. Robust z (Itercept) age Stadard errors are corrected 87

30 Trasformatios Some reasos for usig data trasformatios Cotet area kowledge suggests oliearity Origial data suggest oliearity Equal variace assumptio violated Normality assumptio violated Trasformatios may be applied to the respose, predictor or both Be careful with the iterpretatio of the results 88 Cholesterol example: Trasformatios We have see that triglycerides are associated with age but display o-costat variace What about log trasformed triglycerides? 89 Cholesterol example: Trasformatios > summary(fit.tg.l) Call: lm(formula = log(tg) ~ age) Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) <e-16 *** age <e-16 *** fit.tg.l$fitted fit.tg.l$residuals Heteroscedasticity is corrected But iterpretatio of model is more complicated 9

31 Trasformatios Rarely do we kow which trasformatio of the predictor provides best liear fit As always, there is a dager i usig the data to estimate the best trasformatio to use If there is o associatio of ay kid betwee the respose ad the predictor, a liear fit (with a zero slope) is the correct oe Tryig to detect a trasformatio is thus a iformal test for a associatio Multiple testig procedures iflate the Type I error It is best to choose the trasformatio of the predictor o scietific grouds However, sometimes it does t matter it is ofte the case that may fuctios are well approximated by a straight lie over a small rage of the data Other approaches to o-liearity iclude splies ad fractioal polyomials 91 Model Checkig: Outliers vs Ifluetial observatios Outlier: a observatio with a residual that is uusually large (positive or egative) as compared to the other residuals. Ifluetial poit: a observatio that has a otable ifluece i determiig the regressio equatio. Removig such a poit would markedly chage the positio of the regressio lie. Observatios that are somewhat extreme for the value of x ca be ifluetial. 9 Outlier vs Ifluetial observatios y Poit A Lie icludig Poit A ^Y= x 5*X Lie with Poit A removed ^Y=.36+1.x *X Poit A is a outlier, but is ot ifluetial. x 93

32 Outlier vs Ifluetial observatios 8 6 Lie icludig Poit B ^Y= *X Poit B Y 4 Lie with Poit B removed ^Y= *X X Poit B is ifluetial, but ot a outlier. 94 Cholesterol-Age Example: Residuals No extreme outliers 95 Model Checkig: Deletio diagostics Δβ ( i) Δβ ( i) se( βˆ) = βˆ βˆ ( i) : Delta-beta : Stadardized Delta-beta Delta-beta Stadardized delta-beta : tells how much the regressio coefficiet chaged by excludig the i th observatio : approximates how much the t-statistic for a coefficiet chaged by excludig the i th observatio 96

33 ( Cholesterol-Age Example: Deletio diagostics > dfb = dfbeta(fit) > idex=order(abs(dfb[,]),decreasig=t) > cbid(dfb[idex[1:15],],age[idex[1;15]]) (Itercept) age No evidece of ifluetial poits. The largest (i absolute value) delta beta is.15 compared to.31 for the regressio coefficiet. 97 Model Checkig What to do if you fid a outlier ad/or ifluetial observatio: Check it for accuracy Decide (based o scietific judgmet) whether it is best to keep it or omit it If you thik it is represetative, ad likely would have appeared i a larger sample, keep it If you thik it is very uusual ad ulikely to occur agai i a larger sample, omit it Report its existece [whether or ot it is omitted] 98 Simple Liear Regressio: Impact of Violatios of Model Assumptios No Liearity No Normality Estimates Problematic Miimal for most departures. Outliers ca be a problem. Tests/CIs Problematic Miimal for most departures. CIs for correlatio are sesitive. Correctio Trasform or Choose a oliear model. Delete outliers (if warrated) or Use robust regressio Uequal Variaces Miimal impact Variace estimates are wrog, but the effect is usually ot dramatic Trasform or Use robust stadard error Depedece Ofte the estimates are ubiased Variace estimates are wrog Regressio for depedet data 99

34 REGRESSION MODELS MULTIPLE LINEAR REGRESSION 1 Outlie: Multiple Liear Regressio Motivatio Model ad Iterpretatio Estimatio ad Iferece Iteractio 11 Motivatio The respose or depedet variable, Y, may deped o several predictors ot just oe Multiple regressio is a attempt to cosider the simultaeous ifluece of several variables o the respose This may be with the goal of a ubiased estimate of associatio or for better predictio 1

35 Motivatio Why ot fit multiple separate simple liear regressios? If the goal is to estimate the associatio betwee the respose ad a predictor of iterest, a cofouder ca make the observed associatio appear stroger tha the true associatio, weaker tha the true associatio, or eve the reverse of the true associatio How ca we address this: We ca adjust for the effects of the cofouder by addig a correspodig term to our liear regressio If the goal is predictio of the respose, we may be able to improve predictio by icludig additioal variables i the regressio model 13 Motivatio: Cholesterol Example Data > head(cholesterol) ID sex age chol BMI TG APOE rs rs HTN chd Our goal: Ivestigate the relatioship betwee age (years), BMI (kg/m ) ad serum total cholesterol (mg/dl) 14 Motivatio I geeral, the multiple regressio equatio ca be writte as follows: E[Y x 1,x,...,x p ] = β + β 1 x 1 + β x β p x p We use multiple variables whe: The predictor variable is categorical with more tha two groups We eed polyomials, splies or other fuctios to model the shape of the relatioship(s) accurately Estimatig associatio: We wat to adjust for cofoudig by other variables We wat to allow the associatio to differ for differet values of other variables (iteractio) Predictio: we use multiple variables if we thik more tha oe variable will be useful i predictig future outcomes accurately 15

36 Model ad Iterpretatio Model: Y β + β x + β x β + ε = 1 1 p x p where we assume iid ε ~ N (, σ ) Extesio of simple liear regressio Systematic compoet: E [ Y x1,..., x p + β1x1+ β x ]= β β x p p Radom compoet: Var[ Y x1,..., x p ] = σ 16 Model ad Iterpretatio For example, let us assume that there are two predictors i the model ad so E[Y x 1, x ] = b + b 1 x 1 + b x Cosider two observatios with the same value for x, but oe observatio has x 1 oe uit higher, that is, Obs 1: E[Y x 1 =k+1, x =c] = b + b 1 (k+1) + b c Obs : E[Y x 1 =k, x =c] = b + b 1 (k) + b c Thus, E[Y x 1 =k+1, x =c] - E[Y x 1 =k, x =c] = b 1 That is, b 1 is the expected mea chage i y per uit chage i x 1 if x is held costat (adjusted/cotrollig for x ) Similar iterpretatio applies to b 17 Model ad Iterpretatio To facilitate our discussio let s assume we have two predictors with biary values Model: E[ Y x = β + β x + β x 1, x] 1 1 Mea of Y X = X =1 X 1 = b b +b X 1 =1 b +b 1 b +b 1 +b E[Y x 1 =1, x =] - E[Y x 1 =,x =] = b 1 E[Y x 1 =1, x =1] - E[Y x 1 =,x =1] = b 1 E[Y x 1 =, x =1] - E[Y x 1 =,x =] = b E[Y x 1 =1, x =1] - E[Y x 1 =1,x =] = b 18

37 Estimatio Least Squares Estimatio: Chooses the coefficiet estimates that miimize the residual sum of squares ( y y$ i i ) i Computatio more difficult, but statistical software (R) will do that for you 19 Estimatio ad Iferece Iferece About regressio model parameters Hypothesis Testig H : b j = Iterpretatio: Is there a statistically sigificat relatioship betwee the respose y ad x j after adjustig for all other factors (predictors) i the model? ˆ b j - ( ull hyp) Test Statistic: ~ t- p-1 se( ˆ b ) j Note: The square of the t-statistic gives the F-statistic ad the test is kow as the partial F-Test Cofidece Itervals β ˆ ( ) ( ˆ j ± critical value se β j ) 11 Estimatio ad Iferece About the full model Hypotheses H : β β =... = β vs. H 1 : At least oe b j is ot ull 1 = p = Aalysis of variace table Source df SS MS F Regressio p SSR= (y $ i - y) MSR= SSR/p MSR/MSE Residual -p-1 SSE= (y - y $ ) MSE= i i SSE/(-p-1) Total -1 SST= (y i - y) 111

38 Estimatio ad Iferece The F-value is tested agaist a F-distributio with p, -p-1 degrees of freedom If we reject the ull hypothesis, the the predictors do aid i predictig Y [i this aalysis we do ot kow which oes are importat] Failig to reject the ull hypothesis does ot mea that oe of the covariates are importat, sice the effect of oe or more covariates may be "masked" by others. The hard part is choosig which covariates to iclude or exclude. This is kow as the global (multiple) F-test 11 Scietific example: Modelig cholesterol usig age ad BMI We have see that there is a sigificat relatioship betwee age ad cholesterol Ca we better uderstad variability i cholesterol by icorporatig additioal covariates? 113 Scietific example: Modelig cholesterol usig age ad BMI 114

39 Scietific example: Modelig cholesterol usig age ad BMI It appears that BMI icreases with icreasig age Ad cholesterol icreases with icreasig BMI What if we wat to estimate the associatio betwee age ad cholesterol while holdig BMI costat? Multiple regressio 115 Scietific example: Modelig cholesterol usig age ad BMI > fit=lm(chol~age+bmi) > summary(fit) Call: lm(formula = chol ~ age + BMI) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age * BMI *** --- Sigif. codes: ***.1 **.1 * Residual stadard error: 1.34 o 397 degrees of freedom Multiple R-squared:.7351, Adjusted R-squared:.6884 F-statistic: o ad 397 DF, p-value:.6e Scietific example: Modelig cholesterol usig age ad BMI Our estimated regressio equatio is y ˆ = Age +1.43BMI Questio: How do we iterpret the age coefficiet? 117

40 Scietific example: Modelig cholesterol usig age ad BMI Our estimated regressio equatio is y ˆ = Age +1.43BMI Questio: How do we iterpret the age coefficiet? Aswer: This is the estimated average differece i cholesterol associated with a oe year differece i age for two subjects with the same BMI. 118 Scietific example: Modelig cholesterol usig age ad BMI Our estimated regressio equatio is y ˆ = Age +1.43BMI The age coefficiet from our simple liear regressio model was.31. Questio: Why do the estimates from the two models differ? 119 Scietific example: Modelig cholesterol usig age ad BMI Our estimated regressio equatio is y ˆ = Age +1.43BMI The age coefficiet from our simple liear regressio model was.31. Questio: Why do the estimates from the two models differ? Aswer: We are ow coditioig o or cotrollig for BMI so our estimate of the age associatio is amog subjects with the same BMI. 1

41 Scietific example: Modelig cholesterol usig age ad BMI Call: lm(formula = chol ~ age + BMI) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age * BMI *** --- Sigif. codes: ***.1 **.1 * Residual stadard error: 1.34 o 397 degrees of freedom Multiple R-squared:.7351, Adjusted R-squared:.6884 F-statistic: o ad 397 DF, p-value:.6e-7 11 Cholesterol Example: Did addig BMI improve our model? > aova(fit,fit) Aalysis of Variace Table Model 1: chol ~ age Model : chol ~ age + BMI Res.Df RSS Df Sum of Sq F Pr(>F) *** --- Sigif. codes: ***.1 **.1 * How does the model with age ad BMI compare to a model that cotais oly the mea? > fit=lm(chol~1) > aova(fit,fit) Aalysis of Variace Table Model 1: chol ~ 1 Model : chol ~ age + BMI Res.Df RSS Df Sum of Sq F Pr(>F) e-7 *** --- Sigif. codes: ***.1 **.1 * Iteractio ad Liear Regressio Statistical iteractio (aka effect modificatio) occurs whe the relatioship betwee a outcome variable ad oe predictor is differet depedig o the levels of a secod predictor Iteractios are usually ivestigated because of a priori assumptios/hypotheses o the part of the researchers Liear regressio models allow for the iclusio of iteractios with cross-product terms 13

42 Cofoudig vs. Iteractio/Effect Modificatio Data ad scietific uderstadig help distiguish betwee cofoudig ad effect modifyig variables: Cofouder: Associated with predictor ad respose; Associatio betwee respose ad predictor costat across strata of the ew variable Effect modifier/iteractio: Associatio betwee respose ad the predictor varies across strata of the ew variable 14 Cofoudig vs. Iteractio/Effect Modificatio Cofoudig: Estimates of associatio from uadjusted aalysis are markedly differet from estimates of associatio from adjusted aalysis Associatio withi each stratum is similar, but differet from the crude associatio i the combied data (igorig the strata) I liear regressio, these symptoms are diagostic of cofoudig Effect modificatio would show differeces betwee adjusted aalysis ad uadjusted aalysis, but would also show differet associatios i the differet strata 15 Effect Modificatio /Iteractio Eve if preset, effect modificatio may ot always be of iterest i summarizig the effect of a predictor. For example, plecoaril, a ativiral drug, reduced the mea duratio of symptoms i subjects with a commo cold due to rhioviruses but had o effect i subjects whose cold was due to some other aget. I the case of the plecoaril, effect modificatio was importat i checkig that the drug did actually work by ihibitig rhiovirus. However, i cliical use of the drug, it would typically ot be possible to determie the ifectious aget (the tests are expesive ad take loger tha just recoverig from the cold), ad so the average effectiveess of the drug across all colds would be a more importat quatity. 16

43 Graphical Represetatio Y W=1 No parallel lies Iteractio W= X 17 Graphical Represetatio Y W=1 No parallel lies Iteractio W= X 18 Graphical Represetatio Y W=1 Parallel lies No Iteractio W= X 19

44 Graphical Represetatio Y W=1 [Y W=1] Parallel lies No Iteractio W= [Y W=] W is possibly a Cofouder X [X W=] [X W=1] 13 Graphical Represetatio Y W=1 Parallel lies No Iteractio W= X 131 Model ad Iterpretatio: iteractio Assume that there are two predictors i the model E[Y x 1, x ] = b + b 1 x 1 + b x + b 3 x 1 x Cosider two observatios with the same value, c, for x, but oe observatio has x 1 oe uit higher Obs 1: E[Y x 1 =k+1, x =c] = b + b 1 (k+1) + b c + b 3 (k+1)c Obs : E[Y x 1 =k, x =c] = b + b 1 (k) + b c + b 3 kc Thus, E[Y x 1 =k+1, x =c] - E[Y x 1 =k, x =c] = b 1 + b 3 c That is, the differece i meas depeds ow o the value of x 13

45 Model ad Iterpretatio: iteractio Model: E[Y x 1, x ] = b + b 1 x 1 + b x + b 3 x 1 x Differece i Meas: E[Y x 1 =k+1, x =c] - E[Y x 1 =k, x =c] = b 1 + b 3 c The differece i meas depeds o the value of x The differece i meas is b 1 if c=. The differece i meas is b 1 + b 3 if c=1 The differece i meas chages by b 3 for each uit differece i c (that is, i x ) [that is, b 3 is the differece of differeces] H : β 3 = tests for iteractio Model ad Iterpretatio: iteractio Model: E[Y x 1, x ] = b + b 1 x 1 + b x + b 3 x 1 x Aother way to look at this Factor terms ivolvig x 1 : E[Y x 1, x ] = b + (b 1 + b 3 x )x 1 + b x Slope of x 1 chages with x = Differece i meas for each uit differece i x 1 chages with x (for each oe uit differece i x, the differece i meas chages by b 3 ) Cholesterol Example: Does sex affect the age cholesterol relatioship? age chol Male Female

46 Cholesterol Example: Does sex affect the age cholesterol relatioship? We first fit the model with age ad sex terms oly (Male: sex=, Female: sex=1) > fit3 = lm(chol ~ age+sex) > summary(fit3) Call: lm(formula = chol ~ age + sex) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age e-5 *** sex e-7 *** --- Sigif. codes: ***.1 **.1 * Residual stadard error: 1.6 o 397 degrees of freedom Multiple R-squared:.9748, Adjusted R-squared:.993 F-statistic: 1.44 o ad 397 DF, p-value: 1.44e Cholesterol Example: Does sex affect the age cholesterol relatioship? Cholesterol Example: Does sex affect the age cholesterol relatioship? This model idicates that, after cotrollig for the effect of sex, the average cholesterol differs by.3 for each additioal year of age The age effect i this model is very similar to the effect from our simple liear regressio (.31) However, this does ot mea that the age/cholesterol relatioship is the same i males ad females To aswer this questio we must add the iteractio term 138

47 Cholesterol Example: Does sex affect the age cholesterol relatioship? Model with age ad sex mai effects, plus iteractio effect > fit4=lm(chol~age*sex) > summary(fit4) Call: lm(formula = chol ~ age * sex) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age ** sex age:sex Sigif. codes: ***.1 **.1 * Residual stadard error: 1.8 o 396 degrees of freedom Multiple R-squared:.986, Adjusted R-squared:.913 F-statistic: o 3 ad 396 DF, p-value: 6.795e Cholesterol Example: Does sex affect the age cholesterol relatioship? Call: lm(formula = chol ~ age * sex) Residuals: Mi 1Q Media 3Q Max Mea cholesterol for males at age Coefficiets: Estimate Std. Error t value Pr(> t ) < e-16 (Itercept) *** age ** sex age:sex Sigif. codes: ***.1 **.1 * Residual stadard error: 1.8 o 396 degrees of freedom Multiple R-squared:.986, Adjusted R-squared:.913 F-statistic: o 3 ad 396 DF, p-value: 6.795e-9 14 Cholesterol Example: Does sex affect the age cholesterol relatioship? Call: lm(formula = chol ~ age * sex) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age ** sex age:sex Sigif. codes: ***.1 **.1 * Differece i mea cholesterol betwee males ad females at age Residual stadard error: 1.8 o 396 degrees of freedom Multiple R-squared:.986, Adjusted R-squared:.913 F-statistic: o 3 ad 396 DF, p-value: 6.795e-9 141

48 Cholesterol Example: Does sex affect the age cholesterol relatioship? Call: lm(formula = chol ~ age * sex) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age ** sex age:sex Sigif. codes: ***.1 **.1 * Differece i mea cholesterol associated with each oe year chage i age for males Residual stadard error: 1.8 o 396 degrees of freedom Multiple R-squared:.986, Adjusted R-squared:.913 F-statistic: o 3 ad 396 DF, p-value: 6.795e-9 14 Cholesterol Example: Does sex affect the age cholesterol relatioship? Call: lm(formula = chol ~ age * sex) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age ** sex age:sex Sigif. codes: ***.1 **.1 * Differece i chage i mea cholesterol associated with each oe year chage i age for females compared to males Residual stadard error: 1.8 o 396 degrees of freedom Multiple R-squared:.986, Adjusted R-squared:.913 F-statistic: o 3 ad 396 DF, p-value: 6.795e Iterpretatio? Estimated model: Age Sex-.7 Age x Sex Cholesterol Example: Does sex affect the age cholesterol relatioship? Subject 1: Age = a+1, sex = b Subject : Age = a, sex = b Differece i the estimated cholesterol: [ (a+1) (b).7 (a+1)(b)] [ (a) (b).7 (a)(b)] =.33-.7b Sex exerts a small (ot statistically sigificat) effect o the age/cholesterol relatioship I males: Age I females: Age 144

49 We ca also test the sigificace of iteractio terms usig a F-test Addig the iteractio term did ot sigificatly improve model fit > aova(fit3,fit4) Aalysis of Variace Table Model 1: chol ~ age + sex Model : chol ~ age * sex Res.Df RSS Df Sum of Sq F Pr(>F) Cholesterol Example: Does sex affect the age cholesterol relatioship? 145 Cholesterol Example: Does sex affect the age cholesterol relatioship? Age (years) Total cholesterol (mg/dl) Male Female Summary We have cosidered: Simple liear regressio Iterpretatio Estimatio Model checkig Multiple liear regressio Cofoudig Iterpretatio Estimatio Iteractio

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

REGRESSION MODELS ANOVA

REGRESSION MODELS ANOVA REGRESSION MODELS ANOVA 141 Cotiuous Outcome? NO RECAP: Logistic regressio ad other methods YES Liear Regressio Examie mai effects cosiderig predictors of iterest, ad cofouders Test effect modificatio

More information

Stat 139 Homework 7 Solutions, Fall 2015

Stat 139 Homework 7 Solutions, Fall 2015 Stat 139 Homework 7 Solutios, Fall 2015 Problem 1. I class we leared that the classical simple liear regressio model assumes the followig distributio of resposes: Y i = β 0 + β 1 X i + ɛ i, i = 1,...,,

More information

Lecture 22: Review for Exam 2. 1 Basic Model Assumptions (without Gaussian Noise)

Lecture 22: Review for Exam 2. 1 Basic Model Assumptions (without Gaussian Noise) Lecture 22: Review for Exam 2 Basic Model Assumptios (without Gaussia Noise) We model oe cotiuous respose variable Y, as a liear fuctio of p umerical predictors, plus oise: Y = β 0 + β X +... β p X p +

More information

Linear Regression Models

Linear Regression Models Liear Regressio Models Dr. Joh Mellor-Crummey Departmet of Computer Sciece Rice Uiversity johmc@cs.rice.edu COMP 528 Lecture 9 15 February 2005 Goals for Today Uderstad how to Use scatter diagrams to ispect

More information

Response Variable denoted by y it is the variable that is to be predicted measure of the outcome of an experiment also called the dependent variable

Response Variable denoted by y it is the variable that is to be predicted measure of the outcome of an experiment also called the dependent variable Statistics Chapter 4 Correlatio ad Regressio If we have two (or more) variables we are usually iterested i the relatioship betwee the variables. Associatio betwee Variables Two variables are associated

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

Correlation Regression

Correlation Regression Correlatio Regressio While correlatio methods measure the stregth of a liear relatioship betwee two variables, we might wish to go a little further: How much does oe variable chage for a give chage i aother

More information

11 Correlation and Regression

11 Correlation and Regression 11 Correlatio Regressio 11.1 Multivariate Data Ofte we look at data where several variables are recorded for the same idividuals or samplig uits. For example, at a coastal weather statio, we might record

More information

Read through these prior to coming to the test and follow them when you take your test.

Read through these prior to coming to the test and follow them when you take your test. Math 143 Sprig 2012 Test 2 Iformatio 1 Test 2 will be give i class o Thursday April 5. Material Covered The test is cummulative, but will emphasize the recet material (Chapters 6 8, 10 11, ad Sectios 12.1

More information

3/3/2014. CDS M Phil Econometrics. Types of Relationships. Types of Relationships. Types of Relationships. Vijayamohanan Pillai N.

3/3/2014. CDS M Phil Econometrics. Types of Relationships. Types of Relationships. Types of Relationships. Vijayamohanan Pillai N. 3/3/04 CDS M Phil Old Least Squares (OLS) Vijayamohaa Pillai N CDS M Phil Vijayamoha CDS M Phil Vijayamoha Types of Relatioships Oly oe idepedet variable, Relatioship betwee ad is Liear relatioships Curviliear

More information

Lecture 11 Simple Linear Regression

Lecture 11 Simple Linear Regression Lecture 11 Simple Liear Regressio Fall 2013 Prof. Yao Xie, yao.xie@isye.gatech.edu H. Milto Stewart School of Idustrial Systems & Egieerig Georgia Tech Midterm 2 mea: 91.2 media: 93.75 std: 6.5 2 Meddicorp

More information

Regression, Inference, and Model Building

Regression, Inference, and Model Building Regressio, Iferece, ad Model Buildig Scatter Plots ad Correlatio Correlatio coefficiet, r -1 r 1 If r is positive, the the scatter plot has a positive slope ad variables are said to have a positive relatioship

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Explorig Data: Distributios Look for overall patter (shape, ceter, spread) ad deviatios (outliers). Mea (use a calculator): x = x 1 + x 2 + +

More information

Simple Linear Regression

Simple Linear Regression Simple Liear Regressio 1. Model ad Parameter Estimatio (a) Suppose our data cosist of a collectio of pairs (x i, y i ), where x i is a observed value of variable X ad y i is the correspodig observatio

More information

II. Descriptive Statistics D. Linear Correlation and Regression. 1. Linear Correlation

II. Descriptive Statistics D. Linear Correlation and Regression. 1. Linear Correlation II. Descriptive Statistics D. Liear Correlatio ad Regressio I this sectio Liear Correlatio Cause ad Effect Liear Regressio 1. Liear Correlatio Quatifyig Liear Correlatio The Pearso product-momet correlatio

More information

(all terms are scalars).the minimization is clearer in sum notation:

(all terms are scalars).the minimization is clearer in sum notation: 7 Multiple liear regressio: with predictors) Depedet data set: y i i = 1, oe predictad, predictors x i,k i = 1,, k = 1, ' The forecast equatio is ŷ i = b + Use matrix otatio: k =1 b k x ik Y = y 1 y 1

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

Final Examination Solutions 17/6/2010

Final Examination Solutions 17/6/2010 The Islamic Uiversity of Gaza Faculty of Commerce epartmet of Ecoomics ad Political Scieces A Itroductio to Statistics Course (ECOE 30) Sprig Semester 009-00 Fial Eamiatio Solutios 7/6/00 Name: I: Istructor:

More information

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc. Chapter 22 Comparig Two Proportios Copyright 2010, 2007, 2004 Pearso Educatio, Ic. Comparig Two Proportios Read the first two paragraphs of pg 504. Comparisos betwee two percetages are much more commo

More information

Simple Regression. Acknowledgement. These slides are based on presentations created and copyrighted by Prof. Daniel Menasce (GMU) CS 700

Simple Regression. Acknowledgement. These slides are based on presentations created and copyrighted by Prof. Daniel Menasce (GMU) CS 700 Simple Regressio CS 7 Ackowledgemet These slides are based o presetatios created ad copyrighted by Prof. Daiel Measce (GMU) Basics Purpose of regressio aalysis: predict the value of a depedet or respose

More information

Continuous Data that can take on any real number (time/length) based on sample data. Categorical data can only be named or categorised

Continuous Data that can take on any real number (time/length) based on sample data. Categorical data can only be named or categorised Questio 1. (Topics 1-3) A populatio cosists of all the members of a group about which you wat to draw a coclusio (Greek letters (μ, σ, Ν) are used) A sample is the portio of the populatio selected for

More information

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y.

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y. Testig Statistical Hypotheses Recall the study where we estimated the differece betwee mea systolic blood pressure levels of users of oral cotraceptives ad o-users, x - y. Such studies are sometimes viewed

More information

REGRESSION METHODS. Logistic regression

REGRESSION METHODS. Logistic regression REGRESSION METHODS Logistic regressio 233 RECAP: Biary Outcome? NO Cotiuous Outcome? YES Liear Regressio/ANOVA NO Other Methods YES Odds ratio as measure of associatio? Relative risk as measure of associatio?

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Explorig Data: Distributios Look for overall patter (shape, ceter, spread) ad deviatios (outliers). Mea (use a calculator): x = x 1 + x 2 + +

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL/MAY 2009 EXAMINATIONS ECO220Y1Y PART 1 OF 2 SOLUTIONS

UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL/MAY 2009 EXAMINATIONS ECO220Y1Y PART 1 OF 2 SOLUTIONS PART of UNIVERSITY OF TORONTO Faculty of Arts ad Sciece APRIL/MAY 009 EAMINATIONS ECO0YY PART OF () The sample media is greater tha the sample mea whe there is. (B) () A radom variable is ormally distributed

More information

University of California, Los Angeles Department of Statistics. Simple regression analysis

University of California, Los Angeles Department of Statistics. Simple regression analysis Uiversity of Califoria, Los Ageles Departmet of Statistics Statistics 100C Istructor: Nicolas Christou Simple regressio aalysis Itroductio: Regressio aalysis is a statistical method aimig at discoverig

More information

BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9

BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9 BIOS 4110: Itroductio to Biostatistics Brehey Lab #9 The Cetral Limit Theorem is very importat i the realm of statistics, ad today's lab will explore the applicatio of it i both categorical ad cotiuous

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc. Chapter 22 Comparig Two Proportios Copyright 2010 Pearso Educatio, Ic. Comparig Two Proportios Comparisos betwee two percetages are much more commo tha questios about isolated percetages. Ad they are more

More information

Dr. Maddah ENMG 617 EM Statistics 11/26/12. Multiple Regression (2) (Chapter 15, Hines)

Dr. Maddah ENMG 617 EM Statistics 11/26/12. Multiple Regression (2) (Chapter 15, Hines) Dr Maddah NMG 617 M Statistics 11/6/1 Multiple egressio () (Chapter 15, Hies) Test for sigificace of regressio This is a test to determie whether there is a liear relatioship betwee the depedet variable

More information

Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,

More information

Comparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading

Comparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading Topic 15 - Two Sample Iferece I STAT 511 Professor Bruce Craig Comparig Two Populatios Research ofte ivolves the compariso of two or more samples from differet populatios Graphical summaries provide visual

More information

Chapters 5 and 13: REGRESSION AND CORRELATION. Univariate data: x, Bivariate data (x,y).

Chapters 5 and 13: REGRESSION AND CORRELATION. Univariate data: x, Bivariate data (x,y). Chapters 5 ad 13: REGREION AND CORRELATION (ectios 5.5 ad 13.5 are omitted) Uivariate data: x, Bivariate data (x,y). Example: x: umber of years studets studied paish y: score o a proficiecy test For each

More information

Correlation. Two variables: Which test? Relationship Between Two Numerical Variables. Two variables: Which test? Contingency table Grouped bar graph

Correlation. Two variables: Which test? Relationship Between Two Numerical Variables. Two variables: Which test? Contingency table Grouped bar graph Correlatio Y Two variables: Which test? X Explaatory variable Respose variable Categorical Numerical Categorical Cotigecy table Cotigecy Logistic Grouped bar graph aalysis regressio Mosaic plot Numerical

More information

Circle the single best answer for each multiple choice question. Your choice should be made clearly.

Circle the single best answer for each multiple choice question. Your choice should be made clearly. TEST #1 STA 4853 March 6, 2017 Name: Please read the followig directios. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directios This exam is closed book ad closed otes. There are 32 multiple choice questios.

More information

Statistics 203 Introduction to Regression and Analysis of Variance Assignment #1 Solutions January 20, 2005

Statistics 203 Introduction to Regression and Analysis of Variance Assignment #1 Solutions January 20, 2005 Statistics 203 Itroductio to Regressio ad Aalysis of Variace Assigmet #1 Solutios Jauary 20, 2005 Q. 1) (MP 2.7) (a) Let x deote the hydrocarbo percetage, ad let y deote the oxyge purity. The simple liear

More information

Simple Linear Regression

Simple Linear Regression Chapter 2 Simple Liear Regressio 2.1 Simple liear model The simple liear regressio model shows how oe kow depedet variable is determied by a sigle explaatory variable (regressor). Is is writte as: Y i

More information

Final Review. Fall 2013 Prof. Yao Xie, H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech

Final Review. Fall 2013 Prof. Yao Xie, H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech Fial Review Fall 2013 Prof. Yao Xie, yao.xie@isye.gatech.edu H. Milto Stewart School of Idustrial Systems & Egieerig Georgia Tech 1 Radom samplig model radom samples populatio radom samples: x 1,..., x

More information

STA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to:

STA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to: STA 2023 Module 10 Comparig Two Proportios Learig Objectives Upo completig this module, you should be able to: 1. Perform large-sample ifereces (hypothesis test ad cofidece itervals) to compare two populatio

More information

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?

More information

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D.

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D. ample ie Estimatio i the Proportioal Haards Model for K-sample or Regressio ettigs cott. Emerso, M.D., Ph.D. ample ie Formula for a Normally Distributed tatistic uppose a statistic is kow to be ormally

More information

Regression. Correlation vs. regression. The parameters of linear regression. Regression assumes... Random sample. Y = α + β X.

Regression. Correlation vs. regression. The parameters of linear regression. Regression assumes... Random sample. Y = α + β X. Regressio Correlatio vs. regressio Predicts Y from X Liear regressio assumes that the relatioship betwee X ad Y ca be described by a lie Regressio assumes... Radom sample Y is ormally distributed with

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Yig Zhag STA6938-Logistic Regressio Model Topic -Simple (Uivariate) Logistic Regressio Model Outlies:. Itroductio. A Example-Does the liear regressio model always work? 3. Maximum Likelihood Curve

More information

ST 305: Exam 3 ( ) = P(A)P(B A) ( ) = P(A) + P(B) ( ) = 1 P( A) ( ) = P(A) P(B) σ X 2 = σ a+bx. σ ˆp. σ X +Y. σ X Y. σ X. σ Y. σ n.

ST 305: Exam 3 ( ) = P(A)P(B A) ( ) = P(A) + P(B) ( ) = 1 P( A) ( ) = P(A) P(B) σ X 2 = σ a+bx. σ ˆp. σ X +Y. σ X Y. σ X. σ Y. σ n. ST 305: Exam 3 By hadig i this completed exam, I state that I have either give or received assistace from aother perso durig the exam period. I have used o resources other tha the exam itself ad the basic

More information

First, note that the LS residuals are orthogonal to the regressors. X Xb X y = 0 ( normal equations ; (k 1) ) So,

First, note that the LS residuals are orthogonal to the regressors. X Xb X y = 0 ( normal equations ; (k 1) ) So, 0 2. OLS Part II The OLS residuals are orthogoal to the regressors. If the model icludes a itercept, the orthogoality of the residuals ad regressors gives rise to three results, which have limited practical

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

MA238 Assignment 4 Solutions (part a)

MA238 Assignment 4 Solutions (part a) (i) Sigle sample tests. Questio. MA38 Assigmet 4 Solutios (part a) (a) (b) (c) H 0 : = 50 sq. ft H A : < 50 sq. ft H 0 : = 3 mpg H A : > 3 mpg H 0 : = 5 mm H A : 5mm Questio. (i) What are the ull ad alterative

More information

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10 DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 9 Multicolliearity Dr Shalabh Departmet of Mathematics ad Statistics Idia Istitute of Techology Kapur Multicolliearity diagostics A importat questio that

More information

Algebra of Least Squares

Algebra of Least Squares October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal

More information

ECON 3150/4150, Spring term Lecture 3

ECON 3150/4150, Spring term Lecture 3 Itroductio Fidig the best fit by regressio Residuals ad R-sq Regressio ad causality Summary ad ext step ECON 3150/4150, Sprig term 2014. Lecture 3 Ragar Nymoe Uiversity of Oslo 21 Jauary 2014 1 / 30 Itroductio

More information

Interval Estimation (Confidence Interval = C.I.): An interval estimate of some population parameter is an interval of the form (, ),

Interval Estimation (Confidence Interval = C.I.): An interval estimate of some population parameter is an interval of the form (, ), Cofidece Iterval Estimatio Problems Suppose we have a populatio with some ukow parameter(s). Example: Normal(,) ad are parameters. We eed to draw coclusios (make ifereces) about the ukow parameters. We

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

S Y Y = ΣY 2 n. Using the above expressions, the correlation coefficient is. r = SXX S Y Y

S Y Y = ΣY 2 n. Using the above expressions, the correlation coefficient is. r = SXX S Y Y 1 Sociology 405/805 Revised February 4, 004 Summary of Formulae for Bivariate Regressio ad Correlatio Let X be a idepedet variable ad Y a depedet variable, with observatios for each of the values of these

More information

6 Sample Size Calculations

6 Sample Size Calculations 6 Sample Size Calculatios Oe of the major resposibilities of a cliical trial statisticia is to aid the ivestigators i determiig the sample size required to coduct a study The most commo procedure for determiig

More information

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates. 5. Data, Estimates, ad Models: quatifyig the accuracy of estimates. 5. Estimatig a Normal Mea 5.2 The Distributio of the Normal Sample Mea 5.3 Normal data, cofidece iterval for, kow 5.4 Normal data, cofidece

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND.

MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND. XI-1 (1074) MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND. R. E. D. WOOLSEY AND H. S. SWANSON XI-2 (1075) STATISTICAL DECISION MAKING Advaced

More information

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals 7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses

More information

STP 226 ELEMENTARY STATISTICS

STP 226 ELEMENTARY STATISTICS TP 6 TP 6 ELEMENTARY TATITIC CHAPTER 4 DECRIPTIVE MEAURE IN REGREION AND CORRELATION Liear Regressio ad correlatio allows us to examie the relatioship betwee two or more quatitative variables. 4.1 Liear

More information

Statistics 20: Final Exam Solutions Summer Session 2007

Statistics 20: Final Exam Solutions Summer Session 2007 1. 20 poits Testig for Diabetes. Statistics 20: Fial Exam Solutios Summer Sessio 2007 (a) 3 poits Give estimates for the sesitivity of Test I ad of Test II. Solutio: 156 patiets out of total 223 patiets

More information

1 Models for Matched Pairs

1 Models for Matched Pairs 1 Models for Matched Pairs Matched pairs occur whe we aalyse samples such that for each measuremet i oe of the samples there is a measuremet i the other sample that directly relates to the measuremet i

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE IN STATISTICS, 017 MODULE 4 : Liear models Time allowed: Oe ad a half hours Cadidates should aswer THREE questios. Each questio carries

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

INSTRUCTIONS (A) 1.22 (B) 0.74 (C) 4.93 (D) 1.18 (E) 2.43

INSTRUCTIONS (A) 1.22 (B) 0.74 (C) 4.93 (D) 1.18 (E) 2.43 PAPER NO.: 444, 445 PAGE NO.: Page 1 of 1 INSTRUCTIONS I. You have bee provided with: a) the examiatio paper i two parts (PART A ad PART B), b) a multiple choice aswer sheet (for PART A), c) selected formulae

More information

Lesson 11: Simple Linear Regression

Lesson 11: Simple Linear Regression Lesso 11: Simple Liear Regressio Ka-fu WONG December 2, 2004 I previous lessos, we have covered maily about the estimatio of populatio mea (or expected value) ad its iferece. Sometimes we are iterested

More information

A quick activity - Central Limit Theorem and Proportions. Lecture 21: Testing Proportions. Results from the GSS. Statistics and the General Population

A quick activity - Central Limit Theorem and Proportions. Lecture 21: Testing Proportions. Results from the GSS. Statistics and the General Population A quick activity - Cetral Limit Theorem ad Proportios Lecture 21: Testig Proportios Statistics 10 Coli Rudel Flip a coi 30 times this is goig to get loud! Record the umber of heads you obtaied ad calculate

More information

University of California, Los Angeles Department of Statistics. Practice problems - simple regression 2 - solutions

University of California, Los Angeles Department of Statistics. Practice problems - simple regression 2 - solutions Uiversity of Califoria, Los Ageles Departmet of Statistics Statistics 00C Istructor: Nicolas Christou EXERCISE Aswer the followig questios: Practice problems - simple regressio - solutios a Suppose y,

More information

Chapter 12 Correlation

Chapter 12 Correlation Chapter Correlatio Correlatio is very similar to regressio with oe very importat differece. Regressio is used to explore the relatioship betwee a idepedet variable ad a depedet variable, whereas correlatio

More information

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The

More information

Chapter 23: Inferences About Means

Chapter 23: Inferences About Means Chapter 23: Ifereces About Meas Eough Proportios! We ve spet the last two uits workig with proportios (or qualitative variables, at least) ow it s time to tur our attetios to quatitative variables. For

More information

MCT242: Electronic Instrumentation Lecture 2: Instrumentation Definitions

MCT242: Electronic Instrumentation Lecture 2: Instrumentation Definitions Faculty of Egieerig MCT242: Electroic Istrumetatio Lecture 2: Istrumetatio Defiitios Overview Measuremet Error Accuracy Precisio ad Mea Resolutio Mea Variace ad Stadard deviatio Fiesse Sesitivity Rage

More information

Open book and notes. 120 minutes. Cover page and six pages of exam. No calculators.

Open book and notes. 120 minutes. Cover page and six pages of exam. No calculators. IE 330 Seat # Ope book ad otes 120 miutes Cover page ad six pages of exam No calculators Score Fial Exam (example) Schmeiser Ope book ad otes No calculator 120 miutes 1 True or false (for each, 2 poits

More information

MidtermII Review. Sta Fall Office Hours Wednesday 12:30-2:30pm Watch linear regression videos before lab on Thursday

MidtermII Review. Sta Fall Office Hours Wednesday 12:30-2:30pm Watch linear regression videos before lab on Thursday Aoucemets MidtermII Review Sta 101 - Fall 2016 Duke Uiversity, Departmet of Statistical Sciece Office Hours Wedesday 12:30-2:30pm Watch liear regressio videos before lab o Thursday Dr. Abrahamse Slides

More information

A statistical method to determine sample size to estimate characteristic value of soil parameters

A statistical method to determine sample size to estimate characteristic value of soil parameters A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig

More information

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test.

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test. Math 308 Sprig 018 Classes 19 ad 0: Aalysis of Variace (ANOVA) Page 1 of 6 Itroductio ANOVA is a statistical procedure for determiig whether three or more sample meas were draw from populatios with equal

More information

TMA4245 Statistics. Corrected 30 May and 4 June Norwegian University of Science and Technology Department of Mathematical Sciences.

TMA4245 Statistics. Corrected 30 May and 4 June Norwegian University of Science and Technology Department of Mathematical Sciences. Norwegia Uiversity of Sciece ad Techology Departmet of Mathematical Scieces Corrected 3 May ad 4 Jue Solutios TMA445 Statistics Saturday 6 May 9: 3: Problem Sow desity a The probability is.9.5 6x x dx

More information

Lecture 1, Jan 19. i=1 p i = 1.

Lecture 1, Jan 19. i=1 p i = 1. Lecture 1, Ja 19 Review of the expected value, covariace, correlatio coefficiet, mea, ad variace. Radom variable. A variable that takes o alterative values accordig to chace. More specifically, a radom

More information

October 25, 2018 BIM 105 Probability and Statistics for Biomedical Engineers 1

October 25, 2018 BIM 105 Probability and Statistics for Biomedical Engineers 1 October 25, 2018 BIM 105 Probability ad Statistics for Biomedical Egieers 1 Populatio parameters ad Sample Statistics October 25, 2018 BIM 105 Probability ad Statistics for Biomedical Egieers 2 Ifereces

More information

Comparing your lab results with the others by one-way ANOVA

Comparing your lab results with the others by one-way ANOVA Comparig your lab results with the others by oe-way ANOVA You may have developed a ew test method ad i your method validatio process you would like to check the method s ruggedess by coductig a simple

More information

Topic 10: Introduction to Estimation

Topic 10: Introduction to Estimation Topic 0: Itroductio to Estimatio Jue, 0 Itroductio I the simplest possible terms, the goal of estimatio theory is to aswer the questio: What is that umber? What is the legth, the reactio rate, the fractio

More information

SIMPLE LINEAR REGRESSION AND CORRELATION ANALYSIS

SIMPLE LINEAR REGRESSION AND CORRELATION ANALYSIS SIMPLE LINEAR REGRESSION AND CORRELATION ANALSIS INTRODUCTION There are lot of statistical ivestigatio to kow whether there is a relatioship amog variables Two aalyses: (1) regressio aalysis; () correlatio

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

Polynomial Functions and Their Graphs

Polynomial Functions and Their Graphs Polyomial Fuctios ad Their Graphs I this sectio we begi the study of fuctios defied by polyomial expressios. Polyomial ad ratioal fuctios are the most commo fuctios used to model data, ad are used extesively

More information

Biostatistics for Med Students. Lecture 2

Biostatistics for Med Students. Lecture 2 Biostatistics for Med Studets Lecture 2 Joh J. Che, Ph.D. Professor & Director of Biostatistics Core UH JABSOM JABSOM MD7 February 22, 2017 Lecture Objectives To uderstad basic research desig priciples

More information

1036: Probability & Statistics

1036: Probability & Statistics 036: Probability & Statistics Lecture 0 Oe- ad Two-Sample Tests of Hypotheses 0- Statistical Hypotheses Decisio based o experimetal evidece whether Coffee drikig icreases the risk of cacer i humas. A perso

More information

Chapter 2 Descriptive Statistics

Chapter 2 Descriptive Statistics Chapter 2 Descriptive Statistics Statistics Most commoly, statistics refers to umerical data. Statistics may also refer to the process of collectig, orgaizig, presetig, aalyzig ad iterpretig umerical data

More information

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen) Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................

More information

Overview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions

Overview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions Chapter 9 Slide Ifereces from Two Samples 9- Overview 9- Ifereces about Two Proportios 9- Ifereces about Two Meas: Idepedet Samples 9-4 Ifereces about Matched Pairs 9-5 Comparig Variatio i Two Samples

More information

Lecture 5: Parametric Hypothesis Testing: Comparing Means. GENOME 560, Spring 2016 Doug Fowler, GS

Lecture 5: Parametric Hypothesis Testing: Comparing Means. GENOME 560, Spring 2016 Doug Fowler, GS Lecture 5: Parametric Hypothesis Testig: Comparig Meas GENOME 560, Sprig 2016 Doug Fowler, GS (dfowler@uw.edu) 1 Review from last week What is a cofidece iterval? 2 Review from last week What is a cofidece

More information

Assessment and Modeling of Forests. FR 4218 Spring Assignment 1 Solutions

Assessment and Modeling of Forests. FR 4218 Spring Assignment 1 Solutions Assessmet ad Modelig of Forests FR 48 Sprig Assigmet Solutios. The first part of the questio asked that you calculate the average, stadard deviatio, coefficiet of variatio, ad 9% cofidece iterval of the

More information

Analysis of Experimental Data

Analysis of Experimental Data Aalysis of Experimetal Data 6544597.0479 ± 0.000005 g Quatitative Ucertaity Accuracy vs. Precisio Whe we make a measuremet i the laboratory, we eed to kow how good it is. We wat our measuremets to be both

More information

Accuracy assessment methods and challenges

Accuracy assessment methods and challenges Accuracy assessmet methods ad challeges Giles M. Foody School of Geography Uiversity of Nottigham giles.foody@ottigham.ac.uk Backgroud Need for accuracy assessmet established. Cosiderable progress ow see

More information

n but for a small sample of the population, the mean is defined as: n 2. For a lognormal distribution, the median equals the mean.

n but for a small sample of the population, the mean is defined as: n 2. For a lognormal distribution, the median equals the mean. Sectio. True or False Questios (2 pts each). For a populatio the meas is defied as i= μ = i but for a small sample of the populatio, the mea is defied as: = i= i 2. For a logormal distributio, the media

More information

Chapter 11: Asking and Answering Questions About the Difference of Two Proportions

Chapter 11: Asking and Answering Questions About the Difference of Two Proportions Chapter 11: Askig ad Aswerig Questios About the Differece of Two Proportios These otes reflect material from our text, Statistics, Learig from Data, First Editio, by Roxy Peck, published by CENGAGE Learig,

More information

Statistics Lecture 27. Final review. Administrative Notes. Outline. Experiments. Sampling and Surveys. Administrative Notes

Statistics Lecture 27. Final review. Administrative Notes. Outline. Experiments. Sampling and Surveys. Administrative Notes Admiistrative Notes s - Lecture 7 Fial review Fial Exam is Tuesday, May 0th (3-5pm Covers Chapters -8 ad 0 i textbook Brig ID cards to fial! Allowed: Calculators, double-sided 8.5 x cheat sheet Exam Rooms:

More information

Chapter If n is odd, the median is the exact middle number If n is even, the median is the average of the two middle numbers

Chapter If n is odd, the median is the exact middle number If n is even, the median is the average of the two middle numbers Chapter 4 4-1 orth Seattle Commuity College BUS10 Busiess Statistics Chapter 4 Descriptive Statistics Summary Defiitios Cetral tedecy: The extet to which the data values group aroud a cetral value. Variatio:

More information

Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible

More information