REGRESSION AND ANALYSIS OF VARIANCE. Motivation. Module structure
|
|
- Christopher Tate
- 5 years ago
- Views:
Transcription
1 REGRESSION AND ANALYSIS OF VARIANCE 1 Motivatio Objective: Ivestigate associatios betwee two or more variables What tools do you already have? t-test Compariso of meas i two populatios What will we cover i this module? Liear Regressio Associatio of a cotiuous outcome with oe or more predictors (categorical or cotiuous) Aalysis of Variace Compariso of a cotiuous outcome over a fixed umber of groups Logistic Regressio Associatio of a biary outcome with oe or more predictors (categorical or cotiuous) Module structure 1 sessios over.5 days Alteratig i-class ad lab practical sessios, each of approximately 1.5 hour duratio Day 1 Simple liear regressio Day Model checkig Multiple liear regressio ANOVA Day 3 ANCOVA Logistic regressio 3
2 REGRESSION MODELS SIMPLE LINEAR REGRESSION 4 Outlie: Simple Liear Regressio Motivatio The equatio of a straight lie Least Squares Estimatio Iferece About regressio coefficiets About predictios Model Checkig Residual aalysis Outliers & Ifluetial observatios 5 Motivatio: Cholesterol Example Data: Factors related to serum total cholesterol, 4 idividuals, 11 variables > head(cholesterol) ID sex age chol BMI TG APOE rs rs HTN chd Our first goal: Ivestigate the relatioship betwee cholesterol (mg/dl) ad age i adults 6
3 Motivatio: Cholesterol Example 7 Motivatio: Cholesterol Example Is cholesterol associated with age? You could dichotomize age ad compare cholesterol betwee two age groups > group = 1*(age > 55) > group=factor(group,levels=c(,1), labels=c("3-55","56-8")) > table(group) group > boxplot(chol~group,ylab= Total cholesterol(mg/dl) ) 8 mea i group 3-55 mea i group 56-8 Motivatio: Cholesterol Example Is cholesterol associated with age? You could compare mea cholesterol betwee two groups: t-test > t.test(chol ~ group) Welch Two Sample t-test data: chol by group t = , df = , p-value =.315 alterative hypothesis: true differece i meas is ot equal to 95 percet cofidece iterval: sample estimates: mea i group 3-55 mea i group
4 Motivatio: Cholesterol Example Questio: What do the boxplot ad the t-test tell us about the relatioship betwee age ad cholesterol? > t.test(chol ~ group) Welch Two Sample t-test data: chol by group t = , df = , p-value =.315 alterative hypothesis: true differece i meas is ot equal to 95 percet cofidece iterval: sample estimates: mea i group 3-55 mea i group Motivatio: Cholesterol Example Usig the t-test: There is a statistical associatio betwee cholesterol ad age There appears to be a positive associatio betwee cholesterol ad age Is there ay way we could estimate the magitude of this associatio without breakig the cotiuous measure of age ito subgroups? With the t-test, we compared mea cholesterol i two age groups, could we compare mea cholesterol across cotiuous age? 11 Motivatio: Cholesterol Example We might assume that mea cholesterol chages liearly with age: Ca we fid the equatio for a straight lie that best fits these data? 1
5 Liear Regressio A statistical method for modelig the relatioship betwee a cotiuous variable [respose/outcome/depedet] ad other variables [predictors/exposure/idepedet] Most commoly used statistical model Flexible Well-developed ad uderstood properties Easy iterpretatio Buildig block for more geeral models Goals of aalysis: Estimate the associatio betwee respose ad predictors or, Predict respose values give the values of the predictors. We will start our discussio studyig the relatioship betwee a respose ad a sigle predictor Simple liear regressio model 13 The straight lie equatio Y A lie ca be described by two umbers y = b + b 1 x X 14 The straight lie equatio Y b o is the itercept: where the lie crosses the y-axis whe x= X 15
6 The straight lie equatio Y b 1 is the slope: the chage i y correspodig to a uit icrease i x x x+1 X 16 The straight lie equatio Y b 1 is the slope: the chage i y correspodig to a uit icrease i x b +b 1 (x+1) b +b 1 x Differece is b 1 x x+1 X 17 The straight lie equatio Y b 1 is the slope: the chage i y correspodig to a uit icrease i x The same across the etire lie X 18
7 The straight lie equatio Y Two values of x uits apart will have a differece i y values of *b 1 X 19 The straight lie equatio Slope b 1 is the chage i y correspodig to a uit icrease i x Slope gives iformatio about magitude ad directio of the associatio betwee x ad y The straight lie equatio y (b 1 =) No associatio betwee x ad y (values of y are the same regardless of x) y x (b 1 > ) Positive associatio betwee x ad y (values of y icrease as values of x icrease) y x (b 1 < ) Negative associatio betwee x ad y (values of y decrease as values of x icrease) x 1
8 Simple Liear Regressio We ca use liear regressio to model how the mea of a outcome Y chages with the level of a predictor, X The idividual Y observatios will be scattered about the mea We estimate a straight lie describig tred i the mea of a outcome Y as a fuctio of predictor X Simple Liear Regressio I regressio: X is used to predict or explai outcome Y. Respose or depedet variable (Y): variable we wat to predict or explai Explaatory or idepedet or predictor variable (X): attempts to explai the respose Simple Liear Regressio Model: y = b + b1 x + e, e ~ N(, s ) 3 Simple Liear Regressio y = b + b1 x + e, e ~ N(, s ) The model cosists of two compoets: Systematic compoet: E[Y X = x] = β + β 1 x b 1 : slope Mea populatio value of Y at X=x b :itercept Radom compoet: Var[Y X = x]=σ Variace does ot deped o x 4
9 Simple Liear Regressio: Assumptios MODEL: E 1 [ Y X = x] = β + β x Var[ Y X = x] = σ Compare with the boxplots o Slide 8 5 Simple Liear Regressio: Iterpretig model coefficiets Model: E[Y x] = b +b 1 x Var[Y x] = s Questio: How do you iterpret b? Aswer: b = E[Y x=], that is, the mea respose whe x= Your tur: iterpret b 1 6 Simple Liear Regressio: Iterpretig model coefficiets Model: E[Y x] = b +b 1 x Var[Y x] = s Questio: How do you iterpret b 1? Aswer: E[Y x] = β + β 1 x E[Y x+1] = β + β 1 (x+1) = β + β 1 x+ β 1 E[Y x+1] E[Y x] = β 1 idepedet of x (liearity) i.e. β 1 is the differece i the mea respose associated with a oe uit positive differece i x 7
10 Example: Cholesterol ad age Recall: Our motivatig example was to determie if there is a associatio betwee age (a cotiuous predictor) ad cholesterol (a cotiuous outcome) Suppose: We believe they are associated via the liear relatioship E[Y x] = b +b 1 x Questio: How would you iterpret b 1? Aswer: 8 Example: Cholesterol ad age Recall: Our motivatig example was to determie if there is a associatio betwee age (a cotiuous predictor) ad cholesterol (a cotiuous outcome) Suppose: We believe they are associated via the liear relatioship E[Y x] = b +b 1 x Questio: How do you iterpret b 1? Aswer: β 1 is the differece i mea cholesterol associated with a oe year icrease i age 9 Least Squares Estimatio Questio: How to fid a best-fittig lie? 3
11 8 6 y 4 Least Squares Estimatio Questio: How to fid a best-fittig lie? x Method: Least Squares Estimatio Idea: chooses the lie that miimizes the sum of squares of the vertical distaces from the observed poits to the lie. 31 Least Squares Estimatio The least squares regressio lie is give by yˆ = β ˆ + βˆ 1x So the (squared) distace betwee the data (y) ad the least squares regressio lie is D = ( y i yˆ i ) i We estimate β ad β 1 by fidig the values that miimize D 3 Least Squares Estimatio These values are: βˆ = y βˆ x 1 βˆ ( xi x)( yi y) ( xi x) 1 = We estimate the variace as ( y ˆ i yi ) i= 1 i= 1 ˆ ri i= 1 σ = = = ( y βˆ βˆ x ) i 1 i 33
12 Estimated Stadard Errors Recall that whe estimatig parameters, there will be samplig variability i the estimates This is true for regressio parameter estimates Lookig at the formulas for ˆβ ad ˆβ 1, we ca see that these are just complicated meas I repeated samplig we would get differet estimates Kowledge of the samplig distributio of parameter estimates ca help us make iferece about the lie 34 true regressio lie Click here to simulate a data set 35 36
13 Samplig Distributio 37 Samplig Distributio 38 From statistical theory (i the figure, the theoretical distributio is show i pik) " # $ $ % & " # $ $ % & " # $ $ % & " # $ $ % & ) ( 1, ~ ˆ 1) ( 1, ~ ˆ x x s N s x N σ β β σ β β Samplig variability
14 -5 5 desity desity -5 5 desity desity -5 5 desity desity desity Estimated Stadard Errors Estimate the variability of βˆ, βˆ across repeated samplig 1 SE( βˆ ) =σˆ 1 x + ( 1) s x SE( βˆ ) =σˆ 1 1 ( 1) s x 4 Iferece About regressio model parameters Hypothesis testig: H : b j = Test Statistic: Large Samples: βˆ j ( ull hyp) ~ N(,1) se( βˆ ) j Small Samples: ˆ b j - ( ull hyp) ~ t se( ˆ b ) j - Cofidece Itervals: β ˆ ± ( critical value) se( βˆ ) j j [Do t worry about these formulae: we will use R to fit the models] 41 Iferece: Hypothesis Testig Null Hypothesis: b j = T=test statistic Alterative P-Value b j > P(t - >T) desity P-value b j < P(t - <T) desity P-value b j ¹ P(t - > T ) desity P-value 4
15 Iferece: Cofidece Itervals 1 (1-a)% Cofidece Iterval for b j (j=,1) βˆ ± t j, α SE( βˆ ) j Gives itervals that (1- α)1% of the time will cover the true parameter value ( β or β 1 ). We say we are (1- α)1% cofidet the iterval covers β j. 43 Example: Scietific Questio: Is cholesterol associated with age? > fit = lm(chol ~ age) > summary(fit) Call: lm(formula = chol ~ age) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age e-5 *** --- Sigif. codes: ***.1 **.1 * Residual stadard error: 1.69 o 398 degrees of freedom Multiple R-squared:.499, Adjusted R-squared:.3858 F-statistic: 17.1 o 1 ad 398 DF, p-value: 4.5e-5 > cofit(fit).5 % 97.5 % (Itercept) age Example: Scietific Questio: Is cholesterol associated with age? > fit = lm(chol ~ age) > summary(fit) Call: lm(formula = chol ~ age) Residuals: Mi 1Q Media 3Q Max Coefficiets: βˆ 1 =.31 ; Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age e-5 *** --- Sigif. codes: ***.1 **.1 * Residual stadard error: 1.69 o 398 degrees of freedom Multiple R-squared:.499, Adjusted R-squared:.3858 F-statistic: 17.1 o 1 ad 398 DF, p-value: 4.5e-5 Estimates of the model parameters ad stadard errors βˆ = ; se( βˆ ) = 4.6 se( βˆ ) =.8 1 > cofit(fit).5 % 97.5 % (Itercept) age
16 Example: Scietific Questio: Is cholesterol associated with age? > fit = lm(chol ~ age) > summary(fit) Call: lm(formula = chol ~ age) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age e-5 *** --- Sigif. codes: ***.1 **.1 * Residual stadard error: 1.69 o 398 degrees of freedom Multiple R-squared:.499, Adjusted R-squared:.3858 F-statistic: 17.1 o 1 ad 398 DF, p-value: 4.5e-5 95% Cofidece itervals > cofit(fit).5 % 97.5 % (Itercept) age Example: Scietific Questio: Is cholesterol associated with age? What do these models results mea i terms of our scietific questio? Parameter estimates ad cofidece itervals: ˆ b = % CI: (158.5, 175.3) ˆ b = % CI: (.16,.46) bˆ : The estimated average serum cholesterol for someoe of age = is 166.9? Your tur: What about 1 ˆb? 47 Example: Scietific Questio: Is cholesterol associated with age? What do these models results mea i terms of our scietific questio? Parameter estimates ad cofidece itervals: ˆ β = % CI: (158.5, 175.3) ˆ β 1 =.31 95% CI: (.16,.46) Aswer: 1 ˆb : mea cholesterol is estimated to differ by.31 mg/dl for each oe year differece i age. Questio: What about the cofidece itervals? 48
17 Example: Scietific Questio: Is cholesterol associated with age? What do these models results mea i terms of our scietific questio? Parameter estimates ad cofidece itervals: ˆ β = % CI: (158.5, 175.3) ˆ β 1 =.31 95% CI: (.16,.46) Aswer: 95% CIs give us a rage of values that will cover the true itercept ad slope 95% of the time For istace, we ca be 95% cofidet that the true differece i mea cholesterol associated with a oe year differece i age lies betwee.16 ad.46 mg/dl 49 Example: Scietific Questio: Is cholesterol associated with age? Presetatio of the results? The mea serum total cholesterol is sigificatly higher i older idividuals (p <.1). For each additioal year of age, we estimate that the mea total cholesterol differs by approximately.31 mg/dl (95% CI:.16,.46). Note: Emphasis o slope parameter (sig ad magitude) Cofidece iterval Uits for predictor ad respose. Scale matters 5 Iferece for predictios Give estimates βˆ, βˆ 1 we ca fid the predicted value, ŷ i for ay value of x i as ˆ = β ˆ + βˆ yi 1x i Iterpretatio of ŷ i : Estimated mea value of Y at X = x. i Be Cautious: This assumes the model is true. May be a reasoable assumptio withi the rage of your data. It may ot be true outside the rage of your data 51
18 y True model Observed values of x x No data i this rage Would you use the regressio lie to extrapolate?? 5 Be careful of extrapolatig Height (iches) Age It would ot make sese to extrapolate height at age from a study of girls aged 4-9 years 53 Predictio Predictio of the mea E[Y X=x]: Poit Estimate: yˆ =β ˆ ˆ x + β1 Stadard Error: 1 ( x x) se( yˆ) = σˆ + ( x i x) i= 1 Note that as x gets further from x, variace icreases 1 (1-a)% cofidece iterval for E[Y X=x]: ± t se( ˆ) yˆ,1 α / y 54
19 Predictio Predictio of a ew future observatio, y*, at X=x: * Poit Estimate: yˆ =β ˆ ˆ x + β1 Stadard Error: * 1 ( x x) se( yˆ ) = σˆ 1+ + ( x i x) i= 1 1 (1-a)% predictio iterval for a ew future observatio: * * ˆ ± t se( yˆ ) y,1 α / Stadard error for the predictio of a future observatio is bigger: It depeds ot oly o the precisio of the estimated mea, but also o the amout of variability i Y aroud the lie. 55 Cholesterol Example: Predictio Predictio of the mea > predict.lm(fit, ewdata=data.frame(age=c(46,47,48)), iterval="cofidece") fit lwr upr > predict.lm(fit, ewdata=data.frame(age=c(46,47,48)), iterval="predictio") fit lwr upr Predictio of a ew observatio 56 Example: Scietific Questio: Is cholesterol associated with age? Let s iterpret these predictios For x = 46 yˆ = % CI: (178.7, 183.7) yˆ* = % CI: (138.5, 3.9) Questio: How do our iterpretatios for ŷ ad differ? ŷ* 57
20 Example: Scietific Questio: Is cholesterol associated with age? Let s iterpret these predictios For x = 46 Questio: How do our iterpretatios for ŷ ad ŷ* differ? yˆ = % CI: (178.7, 183.7) yˆ* = % CI: (138.5, 3.9) Aswer: The poit estimates represet our predictios for the mea serum cholesterol for idividuals age 46 ( ŷ ) ad for a sigle ew idividual of age 46 ( ŷ* ) 58 Example: Scietific Questio: Is cholesterol associated with age? Let s iterpret these predictios For x = 46 yˆ = % CI: (178.7, 183.7) yˆ* = % CI: (138.5, 3.9) Questio: Why are the cofidece itervals for ŷ ad ŷ* of differig widths? 59 Example: Scietific Questio: Is cholesterol associated with age? Let s iterpret these predictios For x = 46 Questio: Why are the cofidece itervals for ŷ ad ŷ* of differig widths? yˆ = % CI: (178.7, 183.7) yˆ* = % CI: (138.5, 3.9) Aswer: The iterval is broader whe we make a predictio for a cholesterol level for a sigle idividual because it must icorporate radom variability aroud the mea. 6
21 y Simple Liear Regressio: R Give o liear associatio: We could simply use the sample mea to predict E(Y). The variability usig this simple predictio is give by SST (to be defied shortly). Give a liear associatio: The use of X permits a potetially better predictio of Y by usig E(Y X). Questio: What did we gai by usig X? Let s examie this questio with the followig figure 61 Decompositio of sum of squares y y yˆ y y yˆ y x 6 Decompositio of sum of squares It is always true that: y y = ( y yˆ ) + ( yˆ y) i i i i It ca be show that: ( yi y) = ( yi yˆ i ) + i= 1 i= 1 i= 1 ( yˆ y) i SST = SSE + SSR SST: describes the total variatio of the Y i. SSE: describes the variatio of the Y i aroud the regressio lie. SSR: describes the structural variatio; how much of the variatio is due to the regressio relatioship. This decompositio allows a characterizatio of the usefuless of the covariate X i predictig the respose variable Y. 63
22 Simple Liear Regressio: R Give o liear associatio: We could simply use the sample mea to predict E(Y). The variability betwee the data ad this simple predictio is give as SST. Give a liear associatio: The use of X permits a potetially better predictio of Y by usig E(Y X). Questio: What did we gai by usig X? Aswer: We ca aswer this by computig the proportio of the total variatio that ca be explaied by the regressio o X R SSR SST SSE = = = 1 SST SST SSE SST This R is, i fact, the correlatio coefficiet squared. 64 Examples of R Low values of R idicate that the model is ot adequate. However, high values of R do ot mea that the model is adequate 65 Cholesterol Example: Scietific Questio: Ca we predict cholesterol based o age? > fit = lm(chol ~ age) > summary(fit) Call: lm(formula = chol ~ age) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age e-5 *** --- Sigif. codes: ***.1 **.1 * Residual stadard error: 1.69 o 398 degrees of freedom Multiple R-squared:.499, Adjusted R-squared:.3858 F-statistic: 17.1 o 1 ad 398 DF, p-value: 4.5e-5 > cofit(fit).5 % 97.5 % (Itercept) age
23 R =.4 Cholesterol Example: Scietific Questio: Ca we predict cholesterol based o age? What does R tell us about our model for cholesterol? 67 R =.4 Cholesterol Example: Scietific Questio: Ca we predict cholesterol based o age? What does R tell us about our model for cholesterol? Aswer: 4% of the variability i cholesterol is explaied by age. Although mea cholesterol icreases with age, there is much more variability i cholesterol tha age aloe ca explai 68 Cholesterol Example: Scietific Questio: Ca we predict cholesterol based o age? Decompositio of Sum of Squares ad the F-statistic > aova(fit) Aalysis of Variace Table Respose: chol Df Sum Sq Mea Sq F value Pr(>F) SSR= age e-5 *** SSE= Residuals Sigif. codes: ***.1 **.1 * Degrees of freedom Decompositio of the Sum of Squares Mea Squares: SS/df F-statistic: MSR/MSE I simple liear regressio: F-statistic = (t-statistic for slope) Hypothesis beig tested: H : b 1 =, H 1 : b 1 ¹. 69
24 Simple Liear Regressio: Assumptios 1. E[Y x] is related liearly to x. Y s are idepedet of each other 3. Distributio of [Y x] is ormal 4. Var[Y x] does ot deped o x Liearity Idepedece Normality Equal variace Ca we assess if these assumptios are valid? 7 Model Checkig: Residuals (Raw or ustadardized) Residual: differece (r i ) betwee the observed respose ad the predicted respose, that is, r = y yˆ i i = y ( β ˆ + βˆ x ) i i 1 i The residual captures the compoet of the measuremet y i that caot be explaied by x i. 71 Model Checkig: Residuals Residuals ca be used to Idetify poorly fit data poits Idetify uequal variace (heteroscedasticity) Idetify oliear relatioships Idetify additioal variables Examie ormality assumptio 7
25 Model Checkig: Residuals Liearity Idepedece Normality Equal variace Plot residual vs X or vs Ŷ Q: Is there ay tred? Q: Ay scietific cocers? Residual histogram or qq-plot Q: Symmetric? Normal? Plot residual vs X Q: Is there ay patter? 73 Model Checkig: Residuals If the liear model is appropriate we should see a ustructured horizotal bad of poits cetered at zero as see i the figure below Residuals 1 1 Deviatio = residual x 74 Model Checkig: Residuals The model does ot provide a good fit i these cases Residuals Residuals Violatios of the model assumptios? How? 75
26 Simple Liear Regressio: Residual Aalysis: Noliear Associatio True model: y = x^1.7 Plot of Fitted Model: ^ y= x 1 5 Residuals Plot fitted (predictio) vs. residual: ^y= x x Fitted values 76 Simple Liear Regressio: Residual Aalysis: No Costat Variace True model: y = x + errors icreasig with x 1 Plot of Fitted Model: ^y= x Plot fitted (predictio) vs. residual: ^y= x 5 5 Residuals x Fitted values 77 No-costat variace Sometimes variace of y is ot costat across the rage of x (heteroscedasticity) Little effect o poit estimates but variace estimates may be icorrect This may affect cofidece itervals ad p-values To accout for heteroscedasticity we ca Use robust stadard errors Trasform the data Fit a model that does ot assume costat variace (GLM) 78
27 Robust stadard errors Robust stadard errors correctly estimate variability of parameter estimates eve uder ocostat variace These stadard errors use empirical estimates of the variace i y at each x value rather tha assumig this variace is the same for all x values Regressio poit estimates will be uchaged Robust or empirical stadard errors will give correct cofidece itervals ad p-values 79 Simple Liear Regressio: Residual Aalysis: No-ormality of errors QQ-plot Graphical techique that allows us to assess whether or ot a data set follows a give distributio (such as the ormal distributio) The data are plotted agaist a give theoretical distributio Poits should approximately fall i a straight lie Departures from the straight lie idicate departures from the specified distributio. 8 Simple Liear Regressio: Residual Aalysis: No-ormality of errors Costructio of QQ-Plot: (a example) residuals Sort sorted idex(i) Empirical Probab i/ (i-.5)/ pr1 pr Z-quatile Get z-values -.19 (quatiles from -.6 Normal distr.) Plot of sorted residuals (sample quatiles) versus z-quatile (theoretical quatiles) = QQ-plot 81
28 Simple Liear Regressio: Residual Aalysis: No-ormality of errors Residuals Iverse Normal True model: y = x + chi-squared errors Plot of Fitted Model: Q-Q Plot Uder ormality, residuals should fall o the straight lie ^ 8 y= x x Plot of residuals versus fitted values Curvature? Heteroscedasticity? R COMMAND: plot(fit$fitted, fit$residuals) Plot of residuals versus quatiles of a ormal distributio(for > 3) Normality? R COMMAND: qqorm(fit$residuals) Cholesterol-Age example: Residuals fit$fitted fit$residuals Theoretical Quatiles Sample Quatiles 83 Aother example Liear regressio for associatio betwee age ad triglycerides > fit.tg=lm(tg~age)
29 Robust stadard errors Residual aalysis suggests meavariace relatioship Use robust stadard errors to get correct variace estimates fit.tg$fitted fit.tg$residuals 85 Cholesterol example: Robust stadard errors Liear regressio results: Results icorporatig robust SEs: > summary(fit.tg) Call: lm(formula = TG ~ age) Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) e-6 *** age < e-16 *** > summary(fit.tg.ese) Call: gee(formula = TG ~ age, id = seq(1, legth(age))) Coefficiets: Estimate Naive S.E. Naive z Robust S.E. Robust z (Itercept) age Poit estimates are uchaged 86 Cholesterol example: Robust stadard errors Liear regressio results: Results icorporatig robust SEs: > summary(fit.tg) Call: lm(formula = TG ~ age) Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) e-6 *** age < e-16 *** > summary(fit.tg.ese) Call: gee(formula = TG ~ age, id = seq(1, legth(age))) Coefficiets: Estimate Naive S.E. Naive z Robust S.E. Robust z (Itercept) age Stadard errors are corrected 87
30 Trasformatios Some reasos for usig data trasformatios Cotet area kowledge suggests oliearity Origial data suggest oliearity Equal variace assumptio violated Normality assumptio violated Trasformatios may be applied to the respose, predictor or both Be careful with the iterpretatio of the results 88 Cholesterol example: Trasformatios We have see that triglycerides are associated with age but display o-costat variace What about log trasformed triglycerides? 89 Cholesterol example: Trasformatios > summary(fit.tg.l) Call: lm(formula = log(tg) ~ age) Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) <e-16 *** age <e-16 *** fit.tg.l$fitted fit.tg.l$residuals Heteroscedasticity is corrected But iterpretatio of model is more complicated 9
31 Trasformatios Rarely do we kow which trasformatio of the predictor provides best liear fit As always, there is a dager i usig the data to estimate the best trasformatio to use If there is o associatio of ay kid betwee the respose ad the predictor, a liear fit (with a zero slope) is the correct oe Tryig to detect a trasformatio is thus a iformal test for a associatio Multiple testig procedures iflate the Type I error It is best to choose the trasformatio of the predictor o scietific grouds However, sometimes it does t matter it is ofte the case that may fuctios are well approximated by a straight lie over a small rage of the data Other approaches to o-liearity iclude splies ad fractioal polyomials 91 Model Checkig: Outliers vs Ifluetial observatios Outlier: a observatio with a residual that is uusually large (positive or egative) as compared to the other residuals. Ifluetial poit: a observatio that has a otable ifluece i determiig the regressio equatio. Removig such a poit would markedly chage the positio of the regressio lie. Observatios that are somewhat extreme for the value of x ca be ifluetial. 9 Outlier vs Ifluetial observatios y Poit A Lie icludig Poit A ^Y= x 5*X Lie with Poit A removed ^Y=.36+1.x *X Poit A is a outlier, but is ot ifluetial. x 93
32 Outlier vs Ifluetial observatios 8 6 Lie icludig Poit B ^Y= *X Poit B Y 4 Lie with Poit B removed ^Y= *X X Poit B is ifluetial, but ot a outlier. 94 Cholesterol-Age Example: Residuals No extreme outliers 95 Model Checkig: Deletio diagostics Δβ ( i) Δβ ( i) se( βˆ) = βˆ βˆ ( i) : Delta-beta : Stadardized Delta-beta Delta-beta Stadardized delta-beta : tells how much the regressio coefficiet chaged by excludig the i th observatio : approximates how much the t-statistic for a coefficiet chaged by excludig the i th observatio 96
33 ( Cholesterol-Age Example: Deletio diagostics > dfb = dfbeta(fit) > idex=order(abs(dfb[,]),decreasig=t) > cbid(dfb[idex[1:15],],age[idex[1;15]]) (Itercept) age No evidece of ifluetial poits. The largest (i absolute value) delta beta is.15 compared to.31 for the regressio coefficiet. 97 Model Checkig What to do if you fid a outlier ad/or ifluetial observatio: Check it for accuracy Decide (based o scietific judgmet) whether it is best to keep it or omit it If you thik it is represetative, ad likely would have appeared i a larger sample, keep it If you thik it is very uusual ad ulikely to occur agai i a larger sample, omit it Report its existece [whether or ot it is omitted] 98 Simple Liear Regressio: Impact of Violatios of Model Assumptios No Liearity No Normality Estimates Problematic Miimal for most departures. Outliers ca be a problem. Tests/CIs Problematic Miimal for most departures. CIs for correlatio are sesitive. Correctio Trasform or Choose a oliear model. Delete outliers (if warrated) or Use robust regressio Uequal Variaces Miimal impact Variace estimates are wrog, but the effect is usually ot dramatic Trasform or Use robust stadard error Depedece Ofte the estimates are ubiased Variace estimates are wrog Regressio for depedet data 99
34 REGRESSION MODELS MULTIPLE LINEAR REGRESSION 1 Outlie: Multiple Liear Regressio Motivatio Model ad Iterpretatio Estimatio ad Iferece Iteractio 11 Motivatio The respose or depedet variable, Y, may deped o several predictors ot just oe Multiple regressio is a attempt to cosider the simultaeous ifluece of several variables o the respose This may be with the goal of a ubiased estimate of associatio or for better predictio 1
35 Motivatio Why ot fit multiple separate simple liear regressios? If the goal is to estimate the associatio betwee the respose ad a predictor of iterest, a cofouder ca make the observed associatio appear stroger tha the true associatio, weaker tha the true associatio, or eve the reverse of the true associatio How ca we address this: We ca adjust for the effects of the cofouder by addig a correspodig term to our liear regressio If the goal is predictio of the respose, we may be able to improve predictio by icludig additioal variables i the regressio model 13 Motivatio: Cholesterol Example Data > head(cholesterol) ID sex age chol BMI TG APOE rs rs HTN chd Our goal: Ivestigate the relatioship betwee age (years), BMI (kg/m ) ad serum total cholesterol (mg/dl) 14 Motivatio I geeral, the multiple regressio equatio ca be writte as follows: E[Y x 1,x,...,x p ] = β + β 1 x 1 + β x β p x p We use multiple variables whe: The predictor variable is categorical with more tha two groups We eed polyomials, splies or other fuctios to model the shape of the relatioship(s) accurately Estimatig associatio: We wat to adjust for cofoudig by other variables We wat to allow the associatio to differ for differet values of other variables (iteractio) Predictio: we use multiple variables if we thik more tha oe variable will be useful i predictig future outcomes accurately 15
36 Model ad Iterpretatio Model: Y β + β x + β x β + ε = 1 1 p x p where we assume iid ε ~ N (, σ ) Extesio of simple liear regressio Systematic compoet: E [ Y x1,..., x p + β1x1+ β x ]= β β x p p Radom compoet: Var[ Y x1,..., x p ] = σ 16 Model ad Iterpretatio For example, let us assume that there are two predictors i the model ad so E[Y x 1, x ] = b + b 1 x 1 + b x Cosider two observatios with the same value for x, but oe observatio has x 1 oe uit higher, that is, Obs 1: E[Y x 1 =k+1, x =c] = b + b 1 (k+1) + b c Obs : E[Y x 1 =k, x =c] = b + b 1 (k) + b c Thus, E[Y x 1 =k+1, x =c] - E[Y x 1 =k, x =c] = b 1 That is, b 1 is the expected mea chage i y per uit chage i x 1 if x is held costat (adjusted/cotrollig for x ) Similar iterpretatio applies to b 17 Model ad Iterpretatio To facilitate our discussio let s assume we have two predictors with biary values Model: E[ Y x = β + β x + β x 1, x] 1 1 Mea of Y X = X =1 X 1 = b b +b X 1 =1 b +b 1 b +b 1 +b E[Y x 1 =1, x =] - E[Y x 1 =,x =] = b 1 E[Y x 1 =1, x =1] - E[Y x 1 =,x =1] = b 1 E[Y x 1 =, x =1] - E[Y x 1 =,x =] = b E[Y x 1 =1, x =1] - E[Y x 1 =1,x =] = b 18
37 Estimatio Least Squares Estimatio: Chooses the coefficiet estimates that miimize the residual sum of squares ( y y$ i i ) i Computatio more difficult, but statistical software (R) will do that for you 19 Estimatio ad Iferece Iferece About regressio model parameters Hypothesis Testig H : b j = Iterpretatio: Is there a statistically sigificat relatioship betwee the respose y ad x j after adjustig for all other factors (predictors) i the model? ˆ b j - ( ull hyp) Test Statistic: ~ t- p-1 se( ˆ b ) j Note: The square of the t-statistic gives the F-statistic ad the test is kow as the partial F-Test Cofidece Itervals β ˆ ( ) ( ˆ j ± critical value se β j ) 11 Estimatio ad Iferece About the full model Hypotheses H : β β =... = β vs. H 1 : At least oe b j is ot ull 1 = p = Aalysis of variace table Source df SS MS F Regressio p SSR= (y $ i - y) MSR= SSR/p MSR/MSE Residual -p-1 SSE= (y - y $ ) MSE= i i SSE/(-p-1) Total -1 SST= (y i - y) 111
38 Estimatio ad Iferece The F-value is tested agaist a F-distributio with p, -p-1 degrees of freedom If we reject the ull hypothesis, the the predictors do aid i predictig Y [i this aalysis we do ot kow which oes are importat] Failig to reject the ull hypothesis does ot mea that oe of the covariates are importat, sice the effect of oe or more covariates may be "masked" by others. The hard part is choosig which covariates to iclude or exclude. This is kow as the global (multiple) F-test 11 Scietific example: Modelig cholesterol usig age ad BMI We have see that there is a sigificat relatioship betwee age ad cholesterol Ca we better uderstad variability i cholesterol by icorporatig additioal covariates? 113 Scietific example: Modelig cholesterol usig age ad BMI 114
39 Scietific example: Modelig cholesterol usig age ad BMI It appears that BMI icreases with icreasig age Ad cholesterol icreases with icreasig BMI What if we wat to estimate the associatio betwee age ad cholesterol while holdig BMI costat? Multiple regressio 115 Scietific example: Modelig cholesterol usig age ad BMI > fit=lm(chol~age+bmi) > summary(fit) Call: lm(formula = chol ~ age + BMI) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age * BMI *** --- Sigif. codes: ***.1 **.1 * Residual stadard error: 1.34 o 397 degrees of freedom Multiple R-squared:.7351, Adjusted R-squared:.6884 F-statistic: o ad 397 DF, p-value:.6e Scietific example: Modelig cholesterol usig age ad BMI Our estimated regressio equatio is y ˆ = Age +1.43BMI Questio: How do we iterpret the age coefficiet? 117
40 Scietific example: Modelig cholesterol usig age ad BMI Our estimated regressio equatio is y ˆ = Age +1.43BMI Questio: How do we iterpret the age coefficiet? Aswer: This is the estimated average differece i cholesterol associated with a oe year differece i age for two subjects with the same BMI. 118 Scietific example: Modelig cholesterol usig age ad BMI Our estimated regressio equatio is y ˆ = Age +1.43BMI The age coefficiet from our simple liear regressio model was.31. Questio: Why do the estimates from the two models differ? 119 Scietific example: Modelig cholesterol usig age ad BMI Our estimated regressio equatio is y ˆ = Age +1.43BMI The age coefficiet from our simple liear regressio model was.31. Questio: Why do the estimates from the two models differ? Aswer: We are ow coditioig o or cotrollig for BMI so our estimate of the age associatio is amog subjects with the same BMI. 1
41 Scietific example: Modelig cholesterol usig age ad BMI Call: lm(formula = chol ~ age + BMI) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age * BMI *** --- Sigif. codes: ***.1 **.1 * Residual stadard error: 1.34 o 397 degrees of freedom Multiple R-squared:.7351, Adjusted R-squared:.6884 F-statistic: o ad 397 DF, p-value:.6e-7 11 Cholesterol Example: Did addig BMI improve our model? > aova(fit,fit) Aalysis of Variace Table Model 1: chol ~ age Model : chol ~ age + BMI Res.Df RSS Df Sum of Sq F Pr(>F) *** --- Sigif. codes: ***.1 **.1 * How does the model with age ad BMI compare to a model that cotais oly the mea? > fit=lm(chol~1) > aova(fit,fit) Aalysis of Variace Table Model 1: chol ~ 1 Model : chol ~ age + BMI Res.Df RSS Df Sum of Sq F Pr(>F) e-7 *** --- Sigif. codes: ***.1 **.1 * Iteractio ad Liear Regressio Statistical iteractio (aka effect modificatio) occurs whe the relatioship betwee a outcome variable ad oe predictor is differet depedig o the levels of a secod predictor Iteractios are usually ivestigated because of a priori assumptios/hypotheses o the part of the researchers Liear regressio models allow for the iclusio of iteractios with cross-product terms 13
42 Cofoudig vs. Iteractio/Effect Modificatio Data ad scietific uderstadig help distiguish betwee cofoudig ad effect modifyig variables: Cofouder: Associated with predictor ad respose; Associatio betwee respose ad predictor costat across strata of the ew variable Effect modifier/iteractio: Associatio betwee respose ad the predictor varies across strata of the ew variable 14 Cofoudig vs. Iteractio/Effect Modificatio Cofoudig: Estimates of associatio from uadjusted aalysis are markedly differet from estimates of associatio from adjusted aalysis Associatio withi each stratum is similar, but differet from the crude associatio i the combied data (igorig the strata) I liear regressio, these symptoms are diagostic of cofoudig Effect modificatio would show differeces betwee adjusted aalysis ad uadjusted aalysis, but would also show differet associatios i the differet strata 15 Effect Modificatio /Iteractio Eve if preset, effect modificatio may ot always be of iterest i summarizig the effect of a predictor. For example, plecoaril, a ativiral drug, reduced the mea duratio of symptoms i subjects with a commo cold due to rhioviruses but had o effect i subjects whose cold was due to some other aget. I the case of the plecoaril, effect modificatio was importat i checkig that the drug did actually work by ihibitig rhiovirus. However, i cliical use of the drug, it would typically ot be possible to determie the ifectious aget (the tests are expesive ad take loger tha just recoverig from the cold), ad so the average effectiveess of the drug across all colds would be a more importat quatity. 16
43 Graphical Represetatio Y W=1 No parallel lies Iteractio W= X 17 Graphical Represetatio Y W=1 No parallel lies Iteractio W= X 18 Graphical Represetatio Y W=1 Parallel lies No Iteractio W= X 19
44 Graphical Represetatio Y W=1 [Y W=1] Parallel lies No Iteractio W= [Y W=] W is possibly a Cofouder X [X W=] [X W=1] 13 Graphical Represetatio Y W=1 Parallel lies No Iteractio W= X 131 Model ad Iterpretatio: iteractio Assume that there are two predictors i the model E[Y x 1, x ] = b + b 1 x 1 + b x + b 3 x 1 x Cosider two observatios with the same value, c, for x, but oe observatio has x 1 oe uit higher Obs 1: E[Y x 1 =k+1, x =c] = b + b 1 (k+1) + b c + b 3 (k+1)c Obs : E[Y x 1 =k, x =c] = b + b 1 (k) + b c + b 3 kc Thus, E[Y x 1 =k+1, x =c] - E[Y x 1 =k, x =c] = b 1 + b 3 c That is, the differece i meas depeds ow o the value of x 13
45 Model ad Iterpretatio: iteractio Model: E[Y x 1, x ] = b + b 1 x 1 + b x + b 3 x 1 x Differece i Meas: E[Y x 1 =k+1, x =c] - E[Y x 1 =k, x =c] = b 1 + b 3 c The differece i meas depeds o the value of x The differece i meas is b 1 if c=. The differece i meas is b 1 + b 3 if c=1 The differece i meas chages by b 3 for each uit differece i c (that is, i x ) [that is, b 3 is the differece of differeces] H : β 3 = tests for iteractio Model ad Iterpretatio: iteractio Model: E[Y x 1, x ] = b + b 1 x 1 + b x + b 3 x 1 x Aother way to look at this Factor terms ivolvig x 1 : E[Y x 1, x ] = b + (b 1 + b 3 x )x 1 + b x Slope of x 1 chages with x = Differece i meas for each uit differece i x 1 chages with x (for each oe uit differece i x, the differece i meas chages by b 3 ) Cholesterol Example: Does sex affect the age cholesterol relatioship? age chol Male Female
46 Cholesterol Example: Does sex affect the age cholesterol relatioship? We first fit the model with age ad sex terms oly (Male: sex=, Female: sex=1) > fit3 = lm(chol ~ age+sex) > summary(fit3) Call: lm(formula = chol ~ age + sex) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age e-5 *** sex e-7 *** --- Sigif. codes: ***.1 **.1 * Residual stadard error: 1.6 o 397 degrees of freedom Multiple R-squared:.9748, Adjusted R-squared:.993 F-statistic: 1.44 o ad 397 DF, p-value: 1.44e Cholesterol Example: Does sex affect the age cholesterol relatioship? Cholesterol Example: Does sex affect the age cholesterol relatioship? This model idicates that, after cotrollig for the effect of sex, the average cholesterol differs by.3 for each additioal year of age The age effect i this model is very similar to the effect from our simple liear regressio (.31) However, this does ot mea that the age/cholesterol relatioship is the same i males ad females To aswer this questio we must add the iteractio term 138
47 Cholesterol Example: Does sex affect the age cholesterol relatioship? Model with age ad sex mai effects, plus iteractio effect > fit4=lm(chol~age*sex) > summary(fit4) Call: lm(formula = chol ~ age * sex) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age ** sex age:sex Sigif. codes: ***.1 **.1 * Residual stadard error: 1.8 o 396 degrees of freedom Multiple R-squared:.986, Adjusted R-squared:.913 F-statistic: o 3 ad 396 DF, p-value: 6.795e Cholesterol Example: Does sex affect the age cholesterol relatioship? Call: lm(formula = chol ~ age * sex) Residuals: Mi 1Q Media 3Q Max Mea cholesterol for males at age Coefficiets: Estimate Std. Error t value Pr(> t ) < e-16 (Itercept) *** age ** sex age:sex Sigif. codes: ***.1 **.1 * Residual stadard error: 1.8 o 396 degrees of freedom Multiple R-squared:.986, Adjusted R-squared:.913 F-statistic: o 3 ad 396 DF, p-value: 6.795e-9 14 Cholesterol Example: Does sex affect the age cholesterol relatioship? Call: lm(formula = chol ~ age * sex) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age ** sex age:sex Sigif. codes: ***.1 **.1 * Differece i mea cholesterol betwee males ad females at age Residual stadard error: 1.8 o 396 degrees of freedom Multiple R-squared:.986, Adjusted R-squared:.913 F-statistic: o 3 ad 396 DF, p-value: 6.795e-9 141
48 Cholesterol Example: Does sex affect the age cholesterol relatioship? Call: lm(formula = chol ~ age * sex) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age ** sex age:sex Sigif. codes: ***.1 **.1 * Differece i mea cholesterol associated with each oe year chage i age for males Residual stadard error: 1.8 o 396 degrees of freedom Multiple R-squared:.986, Adjusted R-squared:.913 F-statistic: o 3 ad 396 DF, p-value: 6.795e-9 14 Cholesterol Example: Does sex affect the age cholesterol relatioship? Call: lm(formula = chol ~ age * sex) Residuals: Mi 1Q Media 3Q Max Coefficiets: Estimate Std. Error t value Pr(> t ) (Itercept) < e-16 *** age ** sex age:sex Sigif. codes: ***.1 **.1 * Differece i chage i mea cholesterol associated with each oe year chage i age for females compared to males Residual stadard error: 1.8 o 396 degrees of freedom Multiple R-squared:.986, Adjusted R-squared:.913 F-statistic: o 3 ad 396 DF, p-value: 6.795e Iterpretatio? Estimated model: Age Sex-.7 Age x Sex Cholesterol Example: Does sex affect the age cholesterol relatioship? Subject 1: Age = a+1, sex = b Subject : Age = a, sex = b Differece i the estimated cholesterol: [ (a+1) (b).7 (a+1)(b)] [ (a) (b).7 (a)(b)] =.33-.7b Sex exerts a small (ot statistically sigificat) effect o the age/cholesterol relatioship I males: Age I females: Age 144
49 We ca also test the sigificace of iteractio terms usig a F-test Addig the iteractio term did ot sigificatly improve model fit > aova(fit3,fit4) Aalysis of Variace Table Model 1: chol ~ age + sex Model : chol ~ age * sex Res.Df RSS Df Sum of Sq F Pr(>F) Cholesterol Example: Does sex affect the age cholesterol relatioship? 145 Cholesterol Example: Does sex affect the age cholesterol relatioship? Age (years) Total cholesterol (mg/dl) Male Female Summary We have cosidered: Simple liear regressio Iterpretatio Estimatio Model checkig Multiple liear regressio Cofoudig Iterpretatio Estimatio Iteractio
1 Inferential Methods for Correlation and Regression Analysis
1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet
More informationREGRESSION MODELS ANOVA
REGRESSION MODELS ANOVA 141 Cotiuous Outcome? NO RECAP: Logistic regressio ad other methods YES Liear Regressio Examie mai effects cosiderig predictors of iterest, ad cofouders Test effect modificatio
More informationStat 139 Homework 7 Solutions, Fall 2015
Stat 139 Homework 7 Solutios, Fall 2015 Problem 1. I class we leared that the classical simple liear regressio model assumes the followig distributio of resposes: Y i = β 0 + β 1 X i + ɛ i, i = 1,...,,
More informationLecture 22: Review for Exam 2. 1 Basic Model Assumptions (without Gaussian Noise)
Lecture 22: Review for Exam 2 Basic Model Assumptios (without Gaussia Noise) We model oe cotiuous respose variable Y, as a liear fuctio of p umerical predictors, plus oise: Y = β 0 + β X +... β p X p +
More informationLinear Regression Models
Liear Regressio Models Dr. Joh Mellor-Crummey Departmet of Computer Sciece Rice Uiversity johmc@cs.rice.edu COMP 528 Lecture 9 15 February 2005 Goals for Today Uderstad how to Use scatter diagrams to ispect
More informationResponse Variable denoted by y it is the variable that is to be predicted measure of the outcome of an experiment also called the dependent variable
Statistics Chapter 4 Correlatio ad Regressio If we have two (or more) variables we are usually iterested i the relatioship betwee the variables. Associatio betwee Variables Two variables are associated
More informationProperties and Hypothesis Testing
Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.
More informationCorrelation Regression
Correlatio Regressio While correlatio methods measure the stregth of a liear relatioship betwee two variables, we might wish to go a little further: How much does oe variable chage for a give chage i aother
More information11 Correlation and Regression
11 Correlatio Regressio 11.1 Multivariate Data Ofte we look at data where several variables are recorded for the same idividuals or samplig uits. For example, at a coastal weather statio, we might record
More informationRead through these prior to coming to the test and follow them when you take your test.
Math 143 Sprig 2012 Test 2 Iformatio 1 Test 2 will be give i class o Thursday April 5. Material Covered The test is cummulative, but will emphasize the recet material (Chapters 6 8, 10 11, ad Sectios 12.1
More information3/3/2014. CDS M Phil Econometrics. Types of Relationships. Types of Relationships. Types of Relationships. Vijayamohanan Pillai N.
3/3/04 CDS M Phil Old Least Squares (OLS) Vijayamohaa Pillai N CDS M Phil Vijayamoha CDS M Phil Vijayamoha Types of Relatioships Oly oe idepedet variable, Relatioship betwee ad is Liear relatioships Curviliear
More informationLecture 11 Simple Linear Regression
Lecture 11 Simple Liear Regressio Fall 2013 Prof. Yao Xie, yao.xie@isye.gatech.edu H. Milto Stewart School of Idustrial Systems & Egieerig Georgia Tech Midterm 2 mea: 91.2 media: 93.75 std: 6.5 2 Meddicorp
More informationRegression, Inference, and Model Building
Regressio, Iferece, ad Model Buildig Scatter Plots ad Correlatio Correlatio coefficiet, r -1 r 1 If r is positive, the the scatter plot has a positive slope ad variables are said to have a positive relatioship
More informationTABLES AND FORMULAS FOR MOORE Basic Practice of Statistics
TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Explorig Data: Distributios Look for overall patter (shape, ceter, spread) ad deviatios (outliers). Mea (use a calculator): x = x 1 + x 2 + +
More informationSimple Linear Regression
Simple Liear Regressio 1. Model ad Parameter Estimatio (a) Suppose our data cosist of a collectio of pairs (x i, y i ), where x i is a observed value of variable X ad y i is the correspodig observatio
More informationII. Descriptive Statistics D. Linear Correlation and Regression. 1. Linear Correlation
II. Descriptive Statistics D. Liear Correlatio ad Regressio I this sectio Liear Correlatio Cause ad Effect Liear Regressio 1. Liear Correlatio Quatifyig Liear Correlatio The Pearso product-momet correlatio
More information(all terms are scalars).the minimization is clearer in sum notation:
7 Multiple liear regressio: with predictors) Depedet data set: y i i = 1, oe predictad, predictors x i,k i = 1,, k = 1, ' The forecast equatio is ŷ i = b + Use matrix otatio: k =1 b k x ik Y = y 1 y 1
More informationResampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.
Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator
More informationFinal Examination Solutions 17/6/2010
The Islamic Uiversity of Gaza Faculty of Commerce epartmet of Ecoomics ad Political Scieces A Itroductio to Statistics Course (ECOE 30) Sprig Semester 009-00 Fial Eamiatio Solutios 7/6/00 Name: I: Istructor:
More informationChapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.
Chapter 22 Comparig Two Proportios Copyright 2010, 2007, 2004 Pearso Educatio, Ic. Comparig Two Proportios Read the first two paragraphs of pg 504. Comparisos betwee two percetages are much more commo
More informationSimple Regression. Acknowledgement. These slides are based on presentations created and copyrighted by Prof. Daniel Menasce (GMU) CS 700
Simple Regressio CS 7 Ackowledgemet These slides are based o presetatios created ad copyrighted by Prof. Daiel Measce (GMU) Basics Purpose of regressio aalysis: predict the value of a depedet or respose
More informationContinuous Data that can take on any real number (time/length) based on sample data. Categorical data can only be named or categorised
Questio 1. (Topics 1-3) A populatio cosists of all the members of a group about which you wat to draw a coclusio (Greek letters (μ, σ, Ν) are used) A sample is the portio of the populatio selected for
More informationRecall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y.
Testig Statistical Hypotheses Recall the study where we estimated the differece betwee mea systolic blood pressure levels of users of oral cotraceptives ad o-users, x - y. Such studies are sometimes viewed
More informationREGRESSION METHODS. Logistic regression
REGRESSION METHODS Logistic regressio 233 RECAP: Biary Outcome? NO Cotiuous Outcome? YES Liear Regressio/ANOVA NO Other Methods YES Odds ratio as measure of associatio? Relative risk as measure of associatio?
More informationTABLES AND FORMULAS FOR MOORE Basic Practice of Statistics
TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Explorig Data: Distributios Look for overall patter (shape, ceter, spread) ad deviatios (outliers). Mea (use a calculator): x = x 1 + x 2 + +
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science APRIL/MAY 2009 EXAMINATIONS ECO220Y1Y PART 1 OF 2 SOLUTIONS
PART of UNIVERSITY OF TORONTO Faculty of Arts ad Sciece APRIL/MAY 009 EAMINATIONS ECO0YY PART OF () The sample media is greater tha the sample mea whe there is. (B) () A radom variable is ormally distributed
More informationUniversity of California, Los Angeles Department of Statistics. Simple regression analysis
Uiversity of Califoria, Los Ageles Departmet of Statistics Statistics 100C Istructor: Nicolas Christou Simple regressio aalysis Itroductio: Regressio aalysis is a statistical method aimig at discoverig
More informationBIOS 4110: Introduction to Biostatistics. Breheny. Lab #9
BIOS 4110: Itroductio to Biostatistics Brehey Lab #9 The Cetral Limit Theorem is very importat i the realm of statistics, ad today's lab will explore the applicatio of it i both categorical ad cotiuous
More informationStatistics 511 Additional Materials
Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability
More informationChapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.
Chapter 22 Comparig Two Proportios Copyright 2010 Pearso Educatio, Ic. Comparig Two Proportios Comparisos betwee two percetages are much more commo tha questios about isolated percetages. Ad they are more
More informationDr. Maddah ENMG 617 EM Statistics 11/26/12. Multiple Regression (2) (Chapter 15, Hines)
Dr Maddah NMG 617 M Statistics 11/6/1 Multiple egressio () (Chapter 15, Hies) Test for sigificace of regressio This is a test to determie whether there is a liear relatioship betwee the depedet variable
More informationParameter, Statistic and Random Samples
Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,
More informationComparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading
Topic 15 - Two Sample Iferece I STAT 511 Professor Bruce Craig Comparig Two Populatios Research ofte ivolves the compariso of two or more samples from differet populatios Graphical summaries provide visual
More informationChapters 5 and 13: REGRESSION AND CORRELATION. Univariate data: x, Bivariate data (x,y).
Chapters 5 ad 13: REGREION AND CORRELATION (ectios 5.5 ad 13.5 are omitted) Uivariate data: x, Bivariate data (x,y). Example: x: umber of years studets studied paish y: score o a proficiecy test For each
More informationCorrelation. Two variables: Which test? Relationship Between Two Numerical Variables. Two variables: Which test? Contingency table Grouped bar graph
Correlatio Y Two variables: Which test? X Explaatory variable Respose variable Categorical Numerical Categorical Cotigecy table Cotigecy Logistic Grouped bar graph aalysis regressio Mosaic plot Numerical
More informationCircle the single best answer for each multiple choice question. Your choice should be made clearly.
TEST #1 STA 4853 March 6, 2017 Name: Please read the followig directios. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directios This exam is closed book ad closed otes. There are 32 multiple choice questios.
More informationStatistics 203 Introduction to Regression and Analysis of Variance Assignment #1 Solutions January 20, 2005
Statistics 203 Itroductio to Regressio ad Aalysis of Variace Assigmet #1 Solutios Jauary 20, 2005 Q. 1) (MP 2.7) (a) Let x deote the hydrocarbo percetage, ad let y deote the oxyge purity. The simple liear
More informationSimple Linear Regression
Chapter 2 Simple Liear Regressio 2.1 Simple liear model The simple liear regressio model shows how oe kow depedet variable is determied by a sigle explaatory variable (regressor). Is is writte as: Y i
More informationFinal Review. Fall 2013 Prof. Yao Xie, H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech
Fial Review Fall 2013 Prof. Yao Xie, yao.xie@isye.gatech.edu H. Milto Stewart School of Idustrial Systems & Egieerig Georgia Tech 1 Radom samplig model radom samples populatio radom samples: x 1,..., x
More informationSTA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to:
STA 2023 Module 10 Comparig Two Proportios Learig Objectives Upo completig this module, you should be able to: 1. Perform large-sample ifereces (hypothesis test ad cofidece itervals) to compare two populatio
More informationHypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance
Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?
More informationSample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D.
ample ie Estimatio i the Proportioal Haards Model for K-sample or Regressio ettigs cott. Emerso, M.D., Ph.D. ample ie Formula for a Normally Distributed tatistic uppose a statistic is kow to be ormally
More informationRegression. Correlation vs. regression. The parameters of linear regression. Regression assumes... Random sample. Y = α + β X.
Regressio Correlatio vs. regressio Predicts Y from X Liear regressio assumes that the relatioship betwee X ad Y ca be described by a lie Regressio assumes... Radom sample Y is ormally distributed with
More informationSTA6938-Logistic Regression Model
Dr. Yig Zhag STA6938-Logistic Regressio Model Topic -Simple (Uivariate) Logistic Regressio Model Outlies:. Itroductio. A Example-Does the liear regressio model always work? 3. Maximum Likelihood Curve
More informationST 305: Exam 3 ( ) = P(A)P(B A) ( ) = P(A) + P(B) ( ) = 1 P( A) ( ) = P(A) P(B) σ X 2 = σ a+bx. σ ˆp. σ X +Y. σ X Y. σ X. σ Y. σ n.
ST 305: Exam 3 By hadig i this completed exam, I state that I have either give or received assistace from aother perso durig the exam period. I have used o resources other tha the exam itself ad the basic
More informationFirst, note that the LS residuals are orthogonal to the regressors. X Xb X y = 0 ( normal equations ; (k 1) ) So,
0 2. OLS Part II The OLS residuals are orthogoal to the regressors. If the model icludes a itercept, the orthogoality of the residuals ad regressors gives rise to three results, which have limited practical
More informationLecture 2: Monte Carlo Simulation
STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?
More informationFACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures
FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals
More informationMA238 Assignment 4 Solutions (part a)
(i) Sigle sample tests. Questio. MA38 Assigmet 4 Solutios (part a) (a) (b) (c) H 0 : = 50 sq. ft H A : < 50 sq. ft H 0 : = 3 mpg H A : > 3 mpg H 0 : = 5 mm H A : 5mm Questio. (i) What are the ull ad alterative
More informationDS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10
DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set
More informationLINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity
LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 9 Multicolliearity Dr Shalabh Departmet of Mathematics ad Statistics Idia Istitute of Techology Kapur Multicolliearity diagostics A importat questio that
More informationAlgebra of Least Squares
October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal
More informationECON 3150/4150, Spring term Lecture 3
Itroductio Fidig the best fit by regressio Residuals ad R-sq Regressio ad causality Summary ad ext step ECON 3150/4150, Sprig term 2014. Lecture 3 Ragar Nymoe Uiversity of Oslo 21 Jauary 2014 1 / 30 Itroductio
More informationInterval Estimation (Confidence Interval = C.I.): An interval estimate of some population parameter is an interval of the form (, ),
Cofidece Iterval Estimatio Problems Suppose we have a populatio with some ukow parameter(s). Example: Normal(,) ad are parameters. We eed to draw coclusios (make ifereces) about the ukow parameters. We
More informationStatistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.
Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized
More informationS Y Y = ΣY 2 n. Using the above expressions, the correlation coefficient is. r = SXX S Y Y
1 Sociology 405/805 Revised February 4, 004 Summary of Formulae for Bivariate Regressio ad Correlatio Let X be a idepedet variable ad Y a depedet variable, with observatios for each of the values of these
More information6 Sample Size Calculations
6 Sample Size Calculatios Oe of the major resposibilities of a cliical trial statisticia is to aid the ivestigators i determiig the sample size required to coduct a study The most commo procedure for determiig
More informationBig Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.
5. Data, Estimates, ad Models: quatifyig the accuracy of estimates. 5. Estimatig a Normal Mea 5.2 The Distributio of the Normal Sample Mea 5.3 Normal data, cofidece iterval for, kow 5.4 Normal data, cofidece
More informationDiscrete Mathematics for CS Spring 2008 David Wagner Note 22
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig
More informationMOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND.
XI-1 (1074) MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND. R. E. D. WOOLSEY AND H. S. SWANSON XI-2 (1075) STATISTICAL DECISION MAKING Advaced
More information7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals
7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses
More informationSTP 226 ELEMENTARY STATISTICS
TP 6 TP 6 ELEMENTARY TATITIC CHAPTER 4 DECRIPTIVE MEAURE IN REGREION AND CORRELATION Liear Regressio ad correlatio allows us to examie the relatioship betwee two or more quatitative variables. 4.1 Liear
More informationStatistics 20: Final Exam Solutions Summer Session 2007
1. 20 poits Testig for Diabetes. Statistics 20: Fial Exam Solutios Summer Sessio 2007 (a) 3 poits Give estimates for the sesitivity of Test I ad of Test II. Solutio: 156 patiets out of total 223 patiets
More information1 Models for Matched Pairs
1 Models for Matched Pairs Matched pairs occur whe we aalyse samples such that for each measuremet i oe of the samples there is a measuremet i the other sample that directly relates to the measuremet i
More informationEXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY
EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE IN STATISTICS, 017 MODULE 4 : Liear models Time allowed: Oe ad a half hours Cadidates should aswer THREE questios. Each questio carries
More informationMATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4
MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.
More informationINSTRUCTIONS (A) 1.22 (B) 0.74 (C) 4.93 (D) 1.18 (E) 2.43
PAPER NO.: 444, 445 PAGE NO.: Page 1 of 1 INSTRUCTIONS I. You have bee provided with: a) the examiatio paper i two parts (PART A ad PART B), b) a multiple choice aswer sheet (for PART A), c) selected formulae
More informationLesson 11: Simple Linear Regression
Lesso 11: Simple Liear Regressio Ka-fu WONG December 2, 2004 I previous lessos, we have covered maily about the estimatio of populatio mea (or expected value) ad its iferece. Sometimes we are iterested
More informationA quick activity - Central Limit Theorem and Proportions. Lecture 21: Testing Proportions. Results from the GSS. Statistics and the General Population
A quick activity - Cetral Limit Theorem ad Proportios Lecture 21: Testig Proportios Statistics 10 Coli Rudel Flip a coi 30 times this is goig to get loud! Record the umber of heads you obtaied ad calculate
More informationUniversity of California, Los Angeles Department of Statistics. Practice problems - simple regression 2 - solutions
Uiversity of Califoria, Los Ageles Departmet of Statistics Statistics 00C Istructor: Nicolas Christou EXERCISE Aswer the followig questios: Practice problems - simple regressio - solutios a Suppose y,
More informationChapter 12 Correlation
Chapter Correlatio Correlatio is very similar to regressio with oe very importat differece. Regressio is used to explore the relatioship betwee a idepedet variable ad a depedet variable, whereas correlatio
More informationt distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference
EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The
More informationChapter 23: Inferences About Means
Chapter 23: Ifereces About Meas Eough Proportios! We ve spet the last two uits workig with proportios (or qualitative variables, at least) ow it s time to tur our attetios to quatitative variables. For
More informationMCT242: Electronic Instrumentation Lecture 2: Instrumentation Definitions
Faculty of Egieerig MCT242: Electroic Istrumetatio Lecture 2: Istrumetatio Defiitios Overview Measuremet Error Accuracy Precisio ad Mea Resolutio Mea Variace ad Stadard deviatio Fiesse Sesitivity Rage
More informationOpen book and notes. 120 minutes. Cover page and six pages of exam. No calculators.
IE 330 Seat # Ope book ad otes 120 miutes Cover page ad six pages of exam No calculators Score Fial Exam (example) Schmeiser Ope book ad otes No calculator 120 miutes 1 True or false (for each, 2 poits
More informationMidtermII Review. Sta Fall Office Hours Wednesday 12:30-2:30pm Watch linear regression videos before lab on Thursday
Aoucemets MidtermII Review Sta 101 - Fall 2016 Duke Uiversity, Departmet of Statistical Sciece Office Hours Wedesday 12:30-2:30pm Watch liear regressio videos before lab o Thursday Dr. Abrahamse Slides
More informationA statistical method to determine sample size to estimate characteristic value of soil parameters
A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig
More informationBecause it tests for differences between multiple pairs of means in one test, it is called an omnibus test.
Math 308 Sprig 018 Classes 19 ad 0: Aalysis of Variace (ANOVA) Page 1 of 6 Itroductio ANOVA is a statistical procedure for determiig whether three or more sample meas were draw from populatios with equal
More informationTMA4245 Statistics. Corrected 30 May and 4 June Norwegian University of Science and Technology Department of Mathematical Sciences.
Norwegia Uiversity of Sciece ad Techology Departmet of Mathematical Scieces Corrected 3 May ad 4 Jue Solutios TMA445 Statistics Saturday 6 May 9: 3: Problem Sow desity a The probability is.9.5 6x x dx
More informationLecture 1, Jan 19. i=1 p i = 1.
Lecture 1, Ja 19 Review of the expected value, covariace, correlatio coefficiet, mea, ad variace. Radom variable. A variable that takes o alterative values accordig to chace. More specifically, a radom
More informationOctober 25, 2018 BIM 105 Probability and Statistics for Biomedical Engineers 1
October 25, 2018 BIM 105 Probability ad Statistics for Biomedical Egieers 1 Populatio parameters ad Sample Statistics October 25, 2018 BIM 105 Probability ad Statistics for Biomedical Egieers 2 Ifereces
More informationComparing your lab results with the others by one-way ANOVA
Comparig your lab results with the others by oe-way ANOVA You may have developed a ew test method ad i your method validatio process you would like to check the method s ruggedess by coductig a simple
More informationTopic 10: Introduction to Estimation
Topic 0: Itroductio to Estimatio Jue, 0 Itroductio I the simplest possible terms, the goal of estimatio theory is to aswer the questio: What is that umber? What is the legth, the reactio rate, the fractio
More informationSIMPLE LINEAR REGRESSION AND CORRELATION ANALYSIS
SIMPLE LINEAR REGRESSION AND CORRELATION ANALSIS INTRODUCTION There are lot of statistical ivestigatio to kow whether there is a relatioship amog variables Two aalyses: (1) regressio aalysis; () correlatio
More informationCEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering
CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio
More informationPolynomial Functions and Their Graphs
Polyomial Fuctios ad Their Graphs I this sectio we begi the study of fuctios defied by polyomial expressios. Polyomial ad ratioal fuctios are the most commo fuctios used to model data, ad are used extesively
More informationBiostatistics for Med Students. Lecture 2
Biostatistics for Med Studets Lecture 2 Joh J. Che, Ph.D. Professor & Director of Biostatistics Core UH JABSOM JABSOM MD7 February 22, 2017 Lecture Objectives To uderstad basic research desig priciples
More information1036: Probability & Statistics
036: Probability & Statistics Lecture 0 Oe- ad Two-Sample Tests of Hypotheses 0- Statistical Hypotheses Decisio based o experimetal evidece whether Coffee drikig icreases the risk of cacer i humas. A perso
More informationChapter 2 Descriptive Statistics
Chapter 2 Descriptive Statistics Statistics Most commoly, statistics refers to umerical data. Statistics may also refer to the process of collectig, orgaizig, presetig, aalyzig ad iterpretig umerical data
More informationGoodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)
Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................
More informationOverview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions
Chapter 9 Slide Ifereces from Two Samples 9- Overview 9- Ifereces about Two Proportios 9- Ifereces about Two Meas: Idepedet Samples 9-4 Ifereces about Matched Pairs 9-5 Comparig Variatio i Two Samples
More informationLecture 5: Parametric Hypothesis Testing: Comparing Means. GENOME 560, Spring 2016 Doug Fowler, GS
Lecture 5: Parametric Hypothesis Testig: Comparig Meas GENOME 560, Sprig 2016 Doug Fowler, GS (dfowler@uw.edu) 1 Review from last week What is a cofidece iterval? 2 Review from last week What is a cofidece
More informationAssessment and Modeling of Forests. FR 4218 Spring Assignment 1 Solutions
Assessmet ad Modelig of Forests FR 48 Sprig Assigmet Solutios. The first part of the questio asked that you calculate the average, stadard deviatio, coefficiet of variatio, ad 9% cofidece iterval of the
More informationAnalysis of Experimental Data
Aalysis of Experimetal Data 6544597.0479 ± 0.000005 g Quatitative Ucertaity Accuracy vs. Precisio Whe we make a measuremet i the laboratory, we eed to kow how good it is. We wat our measuremets to be both
More informationAccuracy assessment methods and challenges
Accuracy assessmet methods ad challeges Giles M. Foody School of Geography Uiversity of Nottigham giles.foody@ottigham.ac.uk Backgroud Need for accuracy assessmet established. Cosiderable progress ow see
More informationn but for a small sample of the population, the mean is defined as: n 2. For a lognormal distribution, the median equals the mean.
Sectio. True or False Questios (2 pts each). For a populatio the meas is defied as i= μ = i but for a small sample of the populatio, the mea is defied as: = i= i 2. For a logormal distributio, the media
More informationChapter 11: Asking and Answering Questions About the Difference of Two Proportions
Chapter 11: Askig ad Aswerig Questios About the Differece of Two Proportios These otes reflect material from our text, Statistics, Learig from Data, First Editio, by Roxy Peck, published by CENGAGE Learig,
More informationStatistics Lecture 27. Final review. Administrative Notes. Outline. Experiments. Sampling and Surveys. Administrative Notes
Admiistrative Notes s - Lecture 7 Fial review Fial Exam is Tuesday, May 0th (3-5pm Covers Chapters -8 ad 0 i textbook Brig ID cards to fial! Allowed: Calculators, double-sided 8.5 x cheat sheet Exam Rooms:
More informationChapter If n is odd, the median is the exact middle number If n is even, the median is the average of the two middle numbers
Chapter 4 4-1 orth Seattle Commuity College BUS10 Busiess Statistics Chapter 4 Descriptive Statistics Summary Defiitios Cetral tedecy: The extet to which the data values group aroud a cetral value. Variatio:
More informationStat 421-SP2012 Interval Estimation Section
Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible
More information