REGRESSION METHODS. Logistic regression

RECAP: choosing a method
- Binary outcome? If yes, choose the method by the measure of association you want to report:
  - odds ratio: logistic regression
  - relative risk: GLM with log link
  - risk difference: GLM with identity link
- If the outcome is not binary but is continuous: linear regression/ANOVA.
- Otherwise: other methods.

Logistic Regression: Motivation
Many scientific questions of interest involve a binary outcome (e.g. disease/no disease). Let's investigate whether genetic factors are associated with the presence/absence of coronary heart disease (CHD).

Logistic Regression: Motivation
Scientific questions of interest:
- Assess the effect of rs on CHD.
- Assess the effect of cholesterol on CHD.
- Assess the effect of rs on CHD after accounting for cholesterol.

Logistic Regression: Motivation
Scientific question: assess the effect of rs on the risk of CHD.
rs is coded as the number of minor alleles: 0 = C/C, 1 = C/T, 2 = T/T.

Motivation: rs and CHD
Here is a contingency table for the SNP and CHD:
> table(rs, chd)
   chd
rs    0   1
  0 154  48
  1 104  66
  2  15  13
Prevalence of CHD in C/C: 48/(48+154) = 0.238
Prevalence of CHD in C/T: 66/(66+104) = 0.388
Prevalence of CHD in T/T: 13/(13+15) = 0.464
Does the prevalence of CHD differ across the groups? Without using regression, what tool could we use to look for an association between rs and CHD?

Motivation: rs and CHD
Without using regression, we can look for an association with a chi-squared test of independence on the contingency table:
> chisq.test(rs, chd)
    Pearson's Chi-squared test
data:  rs and chd
X-squared = ..., df = 2, p-value = ...
In addition to hypothesis testing, we need to summarize the strength of the association between the two variables.
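As a cross-check on what chisq.test() computes, here is a minimal Python sketch (standard library only, not R's implementation) of the Pearson chi-squared statistic. It uses the rs-by-CHD counts implied by the prevalence calculations: 154 and 48 for C/C, 104 and 66 for C/T, 15 and 13 for T/T. The closed-form p-value exp(-x/2) is valid only for 2 degrees of freedom. R applies its continuity correction only to 2x2 tables, so the plain Pearson statistic is the right comparison for this 3x2 table.

```python
import math

# Counts implied by the prevalence calculations: rows are rs = 0 (C/C),
# 1 (C/T), 2 (T/T); columns are chd = 0 (no CHD) and chd = 1 (CHD).
counts = [
    [154, 48],
    [104, 66],
    [15, 13],
]


def pearson_chisq(table):
    """Return the Pearson chi-squared statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


stat, df = pearson_chisq(counts)
# The chi-squared upper tail has the closed form exp(-x / 2) only when df = 2.
assert df == 2
p_value = math.exp(-stat / 2)
print(f"X-squared = {stat:.3f}, df = {df}, p-value = {p_value:.4f}")
```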

Measures of association for binary outcomes
                Outcome
Exposure       No    Yes
  Yes           a     b
  No            c     d
Risk difference (RD) = P(outcome | exposed) - P(outcome | not exposed) = b/(a+b) - d/(c+d)
Using table(rs, chd): RD(T/T vs C/C) = 13/(13+15) - 48/(48+154) = 0.4643 - 0.2376 ≈ 0.227
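The RD formula maps directly to code. The sketch below is a minimal Python illustration. It uses a hypothetical helper name, risk_difference, and the a/b/c/d cell layout of the 2x2 table, with T/T treated as exposed and C/C as unexposed.

```python
def risk_difference(a, b, c, d):
    """RD = P(outcome | exposed) - P(outcome | not exposed).

    Cells follow the 2x2 layout: the exposed row holds a (no outcome)
    and b (outcome); the unexposed row holds c (no outcome) and d (outcome).
    """
    return b / (a + b) - d / (c + d)


# T/T treated as "exposed", C/C as "unexposed"
rd = risk_difference(a=15, b=13, c=154, d=48)
print(f"RD(T/T vs C/C) = {rd:.4f}")
```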

Measures of association for binary outcomes
Risk difference interpretation:
- Additive difference in the probability (risk) of the outcome between exposed and unexposed; also called the excess risk.
- -1 < RD < 1
- RD = 0: no association; the risk of the outcome is the same for exposed and unexposed.

Measures of association for binary outcomes
Relative risk (RR) = P(outcome | exposed) / P(outcome | not exposed) = [b/(a+b)] / [d/(c+d)]
Using table(rs, chd): RR(T/T vs C/C) = (13/(13+15)) / (48/(48+154)) = 0.4643 / 0.2376 ≈ 1.95
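The same counts give the relative risk. This is a minimal Python sketch with a hypothetical helper name, relative_risk, and the same a/b/c/d layout (T/T exposed, C/C unexposed).

```python
def relative_risk(a, b, c, d):
    """RR = P(outcome | exposed) / P(outcome | not exposed), same cell layout as the 2x2 table."""
    return (b / (a + b)) / (d / (c + d))


rr = relative_risk(a=15, b=13, c=154, d=48)  # T/T vs C/C
print(f"RR(T/T vs C/C) = {rr:.3f}")
```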

Measures of association for binary outcomes
Relative risk interpretation:
- Multiplicative difference in the probability (risk) of the outcome among the exposed compared to the unexposed.
- 0 < RR < ∞
- RR = 1: no association; the risk of the outcome is the same for exposed and unexposed.

Measures of association for binary outcomes
The odds is the ratio of the risk of having an outcome to the risk of not having it: if p is the risk of an outcome, then the odds of the outcome are p/(1-p).
The odds ratio (OR) is the ratio of the odds of the outcome in the exposed to the odds of the outcome in the unexposed:
$\mathrm{OR} = \dfrac{p_1/(1-p_1)}{p_0/(1-p_0)}$, where $p_1$ = risk in the exposed and $p_0$ = risk in the unexposed.
- Like the relative risk, the odds ratio measures association as a ratio (rather than a difference).
- The odds ratio is a ratio of two ratios (i.e. a ratio of odds).
- The OR approximates the RR for rare events.
- The OR is more complicated to interpret than the RR (except for rare events). However, in some study designs (namely, case-control studies) the risk ratio cannot be estimated directly, whereas the odds ratio can always be estimated.

Measures of association for binary outcomes
Say the chance of disease (D) if you're exposed (E) is 0.25. Then the odds of getting D (for those who are exposed) are 0.25/0.75 = 1/3, or 1:3.
Say the chance of disease if you're not exposed is 0.1. Then the odds of getting D (for those who are not exposed) are 0.1/0.9 = 1/9, or 1:9.
Then the disease odds ratio (the ratio of the odds of disease in the exposed to the odds of disease in the unexposed) is (1/3)/(1/9) = 3.
Q: What is the risk ratio here?
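The worked example can be reproduced in a few lines. This minimal Python sketch converts each risk to odds, forms their ratio, and also computes the plain ratio of risks for comparison.

```python
def odds(p):
    """Convert a risk p into odds p / (1 - p)."""
    return p / (1 - p)


p_exposed, p_unexposed = 0.25, 0.10
odds_ratio = odds(p_exposed) / odds(p_unexposed)  # (1/3) / (1/9)
risk_ratio = p_exposed / p_unexposed
print(f"OR = {odds_ratio:.2f}, RR = {risk_ratio:.2f}")
```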

Measures of association for binary outcomes
Odds = p/(1-p)
Odds ratio (OR) = odds(outcome | exposed) / odds(outcome | not exposed)
= {[b/(a+b)] / [a/(a+b)]} / {[d/(c+d)] / [c/(c+d)]} = (b/a)/(d/c) = bc/(ad)
Using table(rs, chd): OR(T/T vs C/C) = (13/15) / (48/154) ≈ 2.78
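The algebra above says the long form of the OR collapses to bc/(ad). This minimal Python sketch (hypothetical helper name odds_ratio) computes both forms on the T/T vs C/C counts so they can be compared.

```python
def odds_ratio(a, b, c, d):
    """OR from the 2x2 layout: odds(outcome | exposed) / odds(outcome | not exposed)."""
    odds_exposed = (b / (a + b)) / (a / (a + b))
    odds_unexposed = (d / (c + d)) / (c / (c + d))
    return odds_exposed / odds_unexposed


or_tt = odds_ratio(a=15, b=13, c=154, d=48)  # T/T vs C/C
shortcut = (13 * 154) / (15 * 48)  # the bc / (ad) shortcut
print(f"OR(T/T vs C/C) = {or_tt:.3f}, bc/(ad) = {shortcut:.3f}")
```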

Measures of association for binary outcomes
Odds ratio interpretation:
- Multiplicative difference in the odds of the outcome between exposed and unexposed.
- 0 < OR < ∞
- OR = 1: no association; the odds of the outcome are the same for exposed and unexposed.

Pros and cons of measures of association
- RD is appealing because it directly communicates the absolute increase in risk; it is often more policy-relevant than relative measures.
- RR is more directly interpretable than OR (most people don't have an intuitive understanding of odds).
- OR is estimable in case-control studies, where RR and RD are not.
- For rare outcomes, OR ≈ RR.
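To see the rare-outcome approximation numerically, the minimal Python sketch below uses a hypothetical scenario in which exposure exactly doubles the risk (RR = 2) at several baseline risks. The OR stays close to 2 only while the baseline risk is small.

```python
def rr_and_or(p1, p0):
    """Return (relative risk, odds ratio) for risks p1 (exposed) and p0 (unexposed)."""
    rr = p1 / p0
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return rr, odds_ratio


# Hypothetical scenario: exposure doubles the risk at every baseline level.
results = {p0: rr_and_or(2 * p0, p0) for p0 in (0.001, 0.01, 0.1, 0.3)}
for p0, (rr, odds_ratio) in results.items():
    print(f"baseline risk {p0:<5}  RR = {rr:.3f}  OR = {odds_ratio:.3f}")
```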

Logistic Regression: Motivation
The chi-squared test is adequate for investigating the association between two categorical variables. But what if we want to investigate the association between a continuous predictor, like cholesterol, and a binary outcome, like CHD? Or what if we want to adjust for potential confounders? Logistic regression provides a tool for both.

Binary outcome and continuous exposure
Objective: estimate the association between a binary outcome and a continuous exposure.
Y = binary response (0 = no, 1 = yes); X = continuous exposure; $p = E(Y \mid X) = P(Y = 1 \mid X)$.
One solution: fit a linear model, $p = \beta_0 + \beta_1 X$. This is just a standard linear model, except that our outcome is binary.
Interpretation of $\beta_1$? Problems with this approach?

Motivating example: CHD and cholesterol
> lm.mod1 <- lm(chd ~ chol, data = cholesterol)
> summary(lm.mod1)
Call:
lm(formula = chd ~ chol, data = cholesterol)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...  ...e-15 ***
chol             ...        ...     ...  < 2e-16 ***
Residual standard error: ... on 398 degrees of freedom
Multiple R-squared:  0.202,  Adjusted R-squared:  0.2
F-statistic: ... on 1 and 398 DF,  p-value: < 2.2e-16
What is the interpretation of the cholesterol parameter estimate?

Binary outcome and continuous exposure
Alternative: use a transformation that maps P(Y = 1 | X) to the real line. Let $\operatorname{logit}(p) = \log[p/(1-p)]$. Then:
- $p \in (0, 1)$
- $p/(1-p) \in (0, \infty)$
- $\log[p/(1-p)] \in (-\infty, \infty)$
[Figure: logit(p) plotted against p]
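The logit and its inverse (often called expit) can be sketched in Python. The names logit and expit are this sketch's own, not part of any library used in the slides. The branch in expit avoids overflow in math.exp for large negative inputs.

```python
import math


def logit(p):
    """Map a probability in (0, 1) to the whole real line."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1 - p))


def expit(eta):
    """Inverse logit: map any real number back into (0, 1), avoiding overflow."""
    if eta >= 0:
        return 1 / (1 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1 + z)


for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(f"p = {p:<5} logit(p) = {logit(p):+.3f}")
```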

Logistic regression
Because $\operatorname{logit}(p) = \log[p/(1-p)]$ ranges over the whole real line, we can regress logit(p) on X, and the fitted p always lies between 0 and 1:
$\operatorname{logit}[E(Y \mid X)] = \log\left[\dfrac{P(Y=1 \mid X)}{1 - P(Y=1 \mid X)}\right] = \beta_0 + \beta_1 X$
It turns out that the slope coefficients in logistic regression are readily interpretable: they are just log odds ratios!

Interpretation of logistic regression parameters
On the log-odds scale:
$\log[\mathrm{odds}(Y=1 \mid X = c+1)] = \beta_0 + \beta_1 (c+1)$
$\log[\mathrm{odds}(Y=1 \mid X = c)] = \beta_0 + \beta_1 c$
Subtracting the second line from the first:
$\log[\mathrm{odds}(Y=1 \mid X = c+1)] - \log[\mathrm{odds}(Y=1 \mid X = c)] = \beta_1$
$\log\left[\dfrac{\mathrm{odds}(Y=1 \mid X = c+1)}{\mathrm{odds}(Y=1 \mid X = c)}\right] = \log(\mathrm{OR}) = \beta_1$
That is, for two observations that differ by one unit in X, there is a difference of $\beta_1$ in their log odds of Y = 1. Equivalently, the log of the ratio of the odds of Y = 1 (i.e. the log OR) for two units that differ in X by one unit is $\beta_1$.

Interpretation of logistic regression parameters
By exponentiating we arrive at a simpler interpretation:
$\exp[\log(\mathrm{OR})] = \exp(\beta_1)$, i.e. $\mathrm{OR} = \exp(\beta_1)$
So for two observations that differ in X by one unit, there is a multiplicative difference of $\exp(\beta_1)$ in their odds of Y = 1. Equivalently, the ratio of the odds of Y = 1 (i.e. the odds ratio) for two observations that differ in X by one unit is $\exp(\beta_1)$.
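A quick numerical check of the derivation: under a logistic model the odds ratio for a one-unit difference in X equals exp(β1), whatever starting value c we pick. The coefficients below are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not fitted values).
beta0, beta1 = -4.0, 0.05


def odds(x):
    """Odds of Y = 1 at X = x under logit P(Y = 1 | X) = beta0 + beta1 * X."""
    return math.exp(beta0 + beta1 * x)


# The odds ratio for a one-unit difference in X does not depend on the starting value c.
ratios = {c: odds(c + 1) / odds(c) for c in (0, 50, 200)}
print(ratios, "exp(beta1) =", math.exp(beta1))
```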

Motivating example: CHD and cholesterol
> glm.mod1 <- glm(chd ~ chol, family = "binomial", data = cholesterol)
> summary(glm.mod1)
Call:
glm(formula = chd ~ chol, family = "binomial", data = cholesterol)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)      ...        ...     ...  < 2e-16 ***
chol             ...        ...     ...  ...e-16 ***
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: ... on 399 degrees of freedom
Residual deviance: ... on 398 degrees of freedom
AIC: ...
Number of Fisher Scoring iterations: 4
What do these results tell us about the relationship between cholesterol and CHD?

Motivating example: CHD and cholesterol
Reading the chol row of the glm.mod1 output: comparing two people who differ in cholesterol by 1 mg/dl, the log odds of CHD are higher by the chol estimate ($\hat\beta_1$) for the individual with higher cholesterol.

Motivating example: CHD and cholesterol
Differences in log odds are pretty spectacularly difficult to interpret! It would be much better to exponentiate the coefficients and report odds ratios:
> exp(coef(glm.mod1))
> exp(confint(glm.mod1))
Waiting for profiling to be done...
              2.5 %   97.5 %
(Intercept)     ...      ...
chol            ...      ...
Comparing two people who differ in cholesterol by 1 mg/dl, the odds of CHD are higher by a factor of 1.06 (95% CI: 1.04, 1.07) for the individual with higher cholesterol.

Motivating example: CHD and cholesterol
A 1 mg/dl difference is very small, so we might be interested in estimating the OR associated with a larger difference, such as 10 mg/dl. Just as in linear regression, we multiply the coefficient by the appropriate factor, and then exponentiate:
> exp(10 * coef(glm.mod1)["chol"])
Comparing two people whose cholesterol levels differ by 10 mg/dl, the person with the higher cholesterol has 1.73 times higher odds of CHD than the person with lower cholesterol.
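Scaling happens on the coefficient (log) scale, so the OR for a 10-unit difference is the 1-unit OR raised to the 10th power, not 10 times it. The minimal Python sketch below checks this with a hypothetical slope of 0.055 per mg/dl. That value is chosen for illustration only and is not the fitted chol estimate.

```python
import math

beta1 = 0.055  # hypothetical log odds ratio per 1 mg/dl, for illustration only

or_per_1 = math.exp(beta1)
or_per_10 = math.exp(10 * beta1)  # scale the coefficient, then exponentiate
print(f"OR per 1 unit = {or_per_1:.3f}, OR per 10 units = {or_per_10:.3f}")
```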

Multivariable logistic regression
Often we are interested in examining the associations between multiple predictors and a binary outcome simultaneously. Multiple logistic regression follows the same pattern as linear regression:
$\operatorname{logit}[E(Y \mid X)] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$
$\exp(\beta_j)$ is interpreted as the OR associated with a one-unit change in the jth predictor, among individuals with the other predictors at the same levels (i.e. holding the other predictors constant, controlling for them, adjusting for them, etc.).

Motivating example
> glm.mod2 <- glm(chd ~ chol + factor(rs), family = "binomial", data = cholesterol)
> summary(glm.mod2)
Call:
glm(formula = chd ~ chol + factor(rs), family = "binomial", data = cholesterol)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)      ...        ...     ...  < 2e-16 ***
chol             ...        ...     ...  ...e-16 ***
factor(rs)1      ...        ...     ...      ... **
factor(rs)2      ...        ...     ...      ... *
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: ... on 399 degrees of freedom
Residual deviance: ... on 396 degrees of freedom
AIC: ...
Number of Fisher Scoring iterations: 4

Motivating example
As we have seen before, exponentiating the coefficients gives us odds ratios:
> exp(coef(glm.mod2))
A one mg/dl increase in cholesterol is associated with 1.06 times higher odds of CHD after adjusting for genotype.
We can also obtain confidence intervals for the odds ratios:
> exp(confint(glm.mod2))
              2.5 %   97.5 %
(Intercept)     ...      ...
chol            ...      ...
factor(rs)1     ...      ...
factor(rs)2     ...      ...

Hypothesis testing for logistic regression
Maximum likelihood is the standard method for estimating the parameters of logistic models. It finds the estimates that maximize the joint probability of the observed data under the chosen model.
The Wald test uses the maximum likelihood estimates (MLEs) and their standard errors to conduct hypothesis tests:
Test $H_0: \beta_j = 0$ (no association) vs. $H_A: \beta_j \neq 0$.
Construct a z-score, $z = \hat\beta_j / \mathrm{SE}(\hat\beta_j)$, which is approximately N(0, 1) under $H_0$.
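glm() fits by Fisher scoring, as the "Number of Fisher Scoring iterations" line in its output indicates. For the canonical logit link, Fisher scoring coincides with Newton-Raphson. The Python sketch below is an illustrative re-implementation for a single predictor, not glm()'s code. It uses simulated, hypothetical data with true intercept -1 and slope 0.8. It finds the MLE, takes standard errors from the inverse information, and forms the Wald z-score and its two-sided p-value.

```python
import math
import random


def score_and_information(b0, b1, xs, ys):
    """Score vector and Fisher information for logit P(Y=1|x) = b0 + b1*x."""
    u0 = u1 = i00 = i01 = i11 = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(b0 + b1 * x)))
        w = p * (1 - p)  # binomial variance, the IRLS weight
        u0 += y - p
        u1 += (y - p) * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (u0, u1), (i00, i01, i11)


def fit_logistic(xs, ys, tol=1e-10, max_iter=50):
    """Maximum likelihood by Newton-Raphson; returns estimates and standard errors."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (u0, u1), (i00, i01, i11) = score_and_information(b0, b1, xs, ys)
        det = i00 * i11 - i01 * i01
        step0 = (i11 * u0 - i01 * u1) / det  # information^{-1} times score
        step1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse information at the MLE.
    _, (i00, i01, i11) = score_and_information(b0, b1, xs, ys)
    det = i00 * i11 - i01 * i01
    return (b0, b1), (math.sqrt(i11 / det), math.sqrt(i00 / det))


# Simulated data (hypothetical): true intercept -1, true slope 0.8.
random.seed(2024)
xs = [random.gauss(0, 1) for _ in range(2000)]
ys = [1 if random.random() < 1 / (1 + math.exp(-(-1 + 0.8 * x))) else 0 for x in xs]

(b0_hat, b1_hat), (se0, se1) = fit_logistic(xs, ys)
z = b1_hat / se1  # Wald statistic for H0: beta1 = 0
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
print(f"b1 = {b1_hat:.3f}, SE = {se1:.3f}, z = {z:.2f}, p = {p_value:.2g}")
```

At the MLE the score is zero, which is a convenient check that the iteration has converged.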

Motivating example
In the summary(glm.mod2) output shown earlier, the "z value" and "Pr(>|z|)" columns are the Wald statistics and p-values for each parameter.

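Likelihood ratio test
The likelihood ratio test (LRT) is useful for comparing nested models. It allows us to test hypotheses about multiple parameters simultaneously, such as $H_0: \beta_1 = \beta_2 = 0$ vs. $H_A$: at least one parameter not equal to 0.
To use the LRT we must fit a nested hierarchy of models. For example:
Model 1: $\operatorname{logit}(p_i) = \beta_0 + \beta_1 \mathrm{chol}_i$
Model 2: $\operatorname{logit}(p_i) = \beta_0 + \beta_1 \mathrm{chol}_i + \beta_2 \mathrm{SNP}_{1i} + \beta_3 \mathrm{SNP}_{2i}$
Here $\mathrm{SNP}_{1i}$ and $\mathrm{SNP}_{2i}$ are indicators for the C/T and T/T genotypes (C/C is the reference), matching factor(rs)1 and factor(rs)2.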

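Likelihood ratio test
The LRT allows us to test the significance of the additional parameters in the larger model. Example: compare Model 1 to Model 2, with $H_0: \beta_2 = \beta_3 = 0$:
$\mathrm{LRT} = -2(L_1 - L_2) \sim \chi^2_2$ under $H_0$
$L_1$ and $L_2$ are the maximized log-likelihoods of Models 1 and 2. The degrees of freedom equal the number of parameters being tested (here 2).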
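The LRT arithmetic is short enough to sketch directly. This minimal Python example uses hypothetical maximized log-likelihoods, not the values from the CHD models. It relies on the fact that for 2 degrees of freedom the chi-squared upper-tail probability is exactly exp(-LRT/2); for other df a general routine such as scipy.stats.chi2.sf would be needed.

```python
import math


def lrt_two_df(loglik_reduced, loglik_full):
    """LRT = -2 (L1 - L2) for nested models differing by two parameters.

    For df = 2 the chi-squared upper-tail probability is exactly exp(-LRT / 2).
    """
    lrt = -2 * (loglik_reduced - loglik_full)
    return lrt, math.exp(-lrt / 2)


# Hypothetical maximized log-likelihoods for Model 1 and Model 2.
lrt, p_value = lrt_two_df(-230.0, -224.0)
print(f"LRT = {lrt:.2f} on 2 df, p-value = {p_value:.4f}")
```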

Example: likelihood ratio test
> library(lmtest)  # lrtest() comes from the lmtest package
> lrtest(glm.mod1, glm.mod2)
Likelihood ratio test
Model 1: chd ~ chol
Model 2: chd ~ chol + factor(rs)
  #Df LogLik Df Chisq Pr(>Chisq)
1   2    ...
2   4    ...  2   ...        ... **
After accounting for cholesterol, there is a statistically significant association between rs and CHD.

Logistic Regression: Assumptions
1. logit(E[Y | x]) is linearly related to x.
2. The Y's are independent of each other.

Summary
We have considered:
- Measures of association for binary outcomes
- Logistic regression: interpretation, estimation, hypothesis testing

REGRESSION METHODS. Generalized linear models

Generalized linear models
So far we have considered:
- Continuous outcomes: linear regression/ANOVA
- Binary outcomes: logistic regression
Generalized linear models (GLMs) provide a way to model:
- Continuous and binary outcomes
- Additional types of outcome variables (e.g. counts)
- Additional functional forms for the relationship between outcomes and predictors

Generalized Linear Models
GLMs allow us to estimate regression models for outcomes arising from exponential family distributions. This family includes many familiar distributions, including the normal, binomial and Poisson. A GLM is specified by three components:
- Outcome distribution
- Linear predictor
- Link function
We will see that linear and logistic regression are both GLMs with a specific choice of outcome distribution and link function!

Outcome distribution
The first step in fitting a GLM is to choose an appropriate distribution for your outcome. Examples:
- Continuous outcome: normal
- Binary outcome: binomial
- Count outcome: Poisson

Linear predictor
After specifying a distribution for the outcome, we specify the linear predictor:
$g[E(Y)] = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$
This is just the systematic piece of our regression model. As in the other regression models we have seen, we need to identify the set of covariates to be included.

Link function
Finally, we specify a link function, g:
$g[E(Y)] = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$
This describes the functional form of the relationship between E(Y) and the linear predictor.
- In linear regression, we use the identity link, $g[E(Y)] = E(Y)$.
- In logistic regression, we use the logit link, $g[E(Y)] = \log\{E(Y)/[1 - E(Y)]\}$.

Generalized linear models
A few example GLMs:
Distribution  Link function  g[E(Y)]                  Model
Normal        Identity       E(Y)                     Linear regression
Binomial      Logit          log[E(Y)/(1 - E(Y))]     Logistic regression
Poisson       Log            log[E(Y)]                Poisson GLM
Gamma         Log            log[E(Y)]                Gamma GLM
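The table pairs each distribution with a link g, and every link has an inverse that maps the linear predictor back to the mean. The Python sketch below encodes only the links listed in the table. The dictionary names LINKS and GLM_TABLE are this sketch's own.

```python
import math

# (link g, inverse link g^{-1}) for the link functions in the table.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: math.log(mu / (1 - mu)), lambda eta: 1 / (1 + math.exp(-eta))),
    "log": (math.log, math.exp),
}

# Distribution -> link, as listed in the table.
GLM_TABLE = {
    "Normal": "identity",
    "Binomial": "logit",
    "Poisson": "log",
    "Gamma": "log",
}

mean = 0.3  # an example value of E(Y)
for family, link_name in GLM_TABLE.items():
    g, g_inv = LINKS[link_name]
    eta = g(mean)  # value on the linear-predictor scale
    print(f"{family:<8} {link_name:<8} g(0.3) = {eta:+.4f}, g^-1(g(0.3)) = {g_inv(eta):.4f}")
```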

Alternatives to logistic regression
The odds ratio is limited by the difficulty of interpreting it, while the relative risk is more interpretable. To estimate a relative risk using regression, we can use the log-linear model:
$\log[E(Y \mid x)] = \beta_0 + \beta_1 x$
This is sometimes referred to as relative risk regression: $\exp(\beta_1)$ is the relative risk associated with a one-unit increase in x.
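Under the log link, the risk ratio for a one-unit increase in x is the same constant, exp(β1), wherever we start. The minimal Python sketch below checks this with hypothetical coefficients.

```python
import math

# Hypothetical log-linear (relative risk regression) coefficients.
beta0, beta1 = -2.5, 0.3


def risk(x):
    """P(Y = 1 | x) = exp(beta0 + beta1 * x) under the log link."""
    return math.exp(beta0 + beta1 * x)


# Relative risk for a one-unit increase in x, at several starting values.
rrs = [risk(x + 1) / risk(x) for x in (0, 1, 2)]
print([round(r, 4) for r in rrs], "exp(beta1) =", round(math.exp(beta1), 4))
```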

Modified Poisson regression
To estimate the relative risk, we could use a binomial GLM with a log link. It turns out that estimation for this model is very challenging, and the results are sensitive to outliers in X. An alternative approach that performs better in practice is modified Poisson regression:
- This method uses a Poisson GLM with a log link.
- Using a Poisson model for binary data gives incorrect standard errors, because the variance of a binary outcome, p(1 - p), differs from the variance of a Poisson outcome (which equals its mean).
- We can combine the Poisson GLM with a robust variance estimator to account for this violation of the model's assumptions.
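The idea behind modified Poisson regression can be sketched in Python. This is not the gee package's implementation. The sketch fits a Poisson GLM with log link by Newton-Raphson, then computes the sandwich variance A^{-1} B A^{-1}, where A = sum(mu x x^T) and B = sum((y - mu)^2 x x^T). For simplicity it uses a single binary exposure, T/T (x = 1) vs C/C (x = 0), instead of the chol + factor(rs) model. The "naive" SE here fixes the Poisson scale at 1, so it will not match gee's Naive S.E. column, which uses an estimated scale parameter.

```python
import math


def mat2_mul(p, q):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def modified_poisson(xs, ys, tol=1e-10, max_iter=100):
    """Poisson GLM with log link for log E(Y|x) = b0 + b1*x, plus sandwich SEs."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(max_iter):
        u0 = u1 = a00 = a01 = a11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            u0 += y - mu
            u1 += (y - mu) * x
            a00 += mu
            a01 += mu * x
            a11 += mu * x * x
        det = a00 * a11 - a01 * a01
        s0 = (a11 * u0 - a01 * u1) / det  # Newton step
        s1 = (a00 * u1 - a01 * u0) / det
        b0, b1 = b0 + s0, b1 + s1
        if max(abs(s0), abs(s1)) < tol:
            break
    # Bread A = sum(mu x x^T) and meat B = sum((y - mu)^2 x x^T) at the estimates.
    a00 = a01 = a11 = m00 = m01 = m11 = 0.0
    for x, y in zip(xs, ys):
        mu = math.exp(b0 + b1 * x)
        r2 = (y - mu) ** 2
        a00, a01, a11 = a00 + mu, a01 + mu * x, a11 + mu * x * x
        m00, m01, m11 = m00 + r2, m01 + r2 * x, m11 + r2 * x * x
    det = a00 * a11 - a01 * a01
    a_inv = [[a11 / det, -a01 / det], [-a01 / det, a00 / det]]
    sandwich = mat2_mul(mat2_mul(a_inv, [[m00, m01], [m01, m11]]), a_inv)
    naive_se = (math.sqrt(a_inv[0][0]), math.sqrt(a_inv[1][1]))  # assumes Poisson variance
    robust_se = (math.sqrt(sandwich[0][0]), math.sqrt(sandwich[1][1]))
    return (b0, b1), naive_se, robust_se


# T/T (x = 1) vs C/C (x = 0): 13 of 28 and 48 of 202 with CHD.
xs = [0] * 202 + [1] * 28
ys = [1] * 48 + [0] * 154 + [1] * 13 + [0] * 15

(b0, b1), naive_se, robust_se = modified_poisson(xs, ys)
print(f"RR = {math.exp(b1):.3f}, naive SE = {naive_se[1]:.3f}, robust SE = {robust_se[1]:.3f}")
```

With one binary exposure, exp(b1) reproduces the empirical RR (about 1.95). The robust variance of b1 reduces to the familiar 1/13 - 1/28 + 1/48 - 1/202 for a log RR, and it is smaller than the Poisson-based variance 1/13 + 1/48.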

Modified Poisson regression
> library(gee)
> glm.rr <- gee(chd ~ chol + factor(rs), family = "poisson", id = seq(1, nrow(cholesterol)), data = cholesterol)
> summary(glm.rr)
 GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
 gee S-function, version 4.13 modified 98/01/27 (1998)
Model:
 Link:                      Logarithm
 Variance to Mean Relation: Poisson
 Correlation Structure:     Independent
Coefficients:
            Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept)      ...
chol             ...
factor(rs)1      ...
factor(rs)2      ...
Estimated Scale Parameter: ...
Number of Iterations: 1
Because each individual is its own cluster (id = 1, ..., n), the "Robust S.E." column gives the sandwich standard errors used by modified Poisson regression.

Modified Poisson regression
The relative risk of CHD associated with a 1 mg/dl increase in cholesterol (for the same genotype) is the exponentiated chol coefficient:
> exp(coef(glm.rr))
Compare this to the odds ratio we obtained earlier using logistic regression:
> exp(coef(glm.mod2))

Relative risk regression: Assumptions
1. $\log(E[Y \mid x]) = \log P(Y=1 \mid x)$ is linearly related to x. Warning: this can lead to predicted probabilities > 1.
2. The Y's are independent of each other.

Risk difference regression
Recall that we also considered fitting a linear model to binary outcome data. This allows us to estimate differences in risk associated with a one-unit difference in the predictor. By using robust standard errors, we can account for violations of the assumptions of normality and equal variance. (gee() uses a gaussian family, i.e. the identity link, by default.)
> glm.rd <- gee(chd ~ chol + factor(rs), id = seq(1, nrow(cholesterol)), data = cholesterol)
> summary(glm.rd)
Coefficients:
            Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept)      ...
chol             ...
factor(rs)1      ...
factor(rs)2      ...
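A minimal Python sketch of risk difference regression follows. It uses ordinary least squares on the binary outcome with an HC0-type sandwich, (X^T X)^{-1} X^T diag(e^2) X (X^T X)^{-1}, which has the same sandwich form that gee's robust SEs take under an independence working correlation. It is illustrative only, not gee's code. As before it uses a single binary exposure (T/T vs C/C) rather than the full model.

```python
import math


def mat2_mul(p, q):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def linear_rd(xs, ys):
    """OLS fit of E(Y|x) = b0 + b1*x with HC0 sandwich (robust) standard errors."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    xtx_inv = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X^T X)^{-1}
    m00 = m01 = m11 = 0.0
    for x, y in zip(xs, ys):
        e2 = (y - b0 - b1 * x) ** 2  # squared residual
        m00, m01, m11 = m00 + e2, m01 + e2 * x, m11 + e2 * x * x
    sandwich = mat2_mul(mat2_mul(xtx_inv, [[m00, m01], [m01, m11]]), xtx_inv)
    return (b0, b1), (math.sqrt(sandwich[0][0]), math.sqrt(sandwich[1][1]))


xs = [0] * 202 + [1] * 28
ys = [1] * 48 + [0] * 154 + [1] * 13 + [0] * 15
(b0, b1), robust_se = linear_rd(xs, ys)
print(f"RD = {b1:.4f}, robust SE = {robust_se[1]:.4f}")
```

With a binary exposure the slope is exactly the difference in proportions, and the robust SE equals the unpooled two-proportion standard error.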

Risk difference regression
A 1 mg/dl difference is very small, so we might be interested in estimating the RD associated with a larger difference, such as 10 mg/dl (10 times the chol coefficient).
- Comparing two people with the same rs genotype whose cholesterol levels differ by 10 mg/dl, the risk of CHD for the person with the higher cholesterol is 9.4 percentage points higher than for the person with lower cholesterol.
- Comparing two people with the same cholesterol level, a person with rs C/T is estimated to have a risk of CHD 14.3 percentage points higher than a person with rs C/C.
- Comparing two people with the same cholesterol level, a person with rs T/T is estimated to have a risk of CHD 21.2 percentage points higher than a person with rs C/C.

Risk difference regression: Assumptions
1. $E[Y \mid x] = P(Y=1 \mid x)$ is linearly related to x. Warning: this can lead to predicted probabilities > 1 or < 0.
2. The Y's are independent of each other.
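The range warnings for the three models can be seen side by side. This Python sketch uses hypothetical coefficients for a generic predictor x, not fitted to the CHD data. The identity link produces "risks" below 0 and above 1, the log link produces values above 1, and the logit link always stays inside (0, 1).

```python
import math


# Hypothetical coefficients for a generic predictor x; not fitted to the CHD data.
def identity_link_risk(x):
    return -0.2 + 0.004 * x  # risk difference regression


def log_link_risk(x):
    return math.exp(-2.3 + 0.01 * x)  # relative risk regression


def logit_link_risk(x):
    return 1 / (1 + math.exp(-(-2.2 + 0.01 * x)))  # logistic regression


for x in (0, 100, 250, 350):
    print(f"x = {x:>3}: identity {identity_link_risk(x):+.3f}, "
          f"log {log_link_risk(x):.3f}, logit {logit_link_risk(x):.3f}")
```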

Summary
We have considered:
- Logistic regression: interpretation, estimation
- Generalized linear models
- Relative risk regression
- Risk difference regression

Module summary
In this module we have covered a variety of regression methods that can be used to analyze continuous and binary outcomes:
- Continuous outcomes: simple linear regression, multiple linear regression, ANOVA
- Binary outcomes: logistic regression, relative risk regression, risk difference regression
These methods are foundational for many statistical analyses, and we hope you will be able to apply them in your future research!

Everything is regression! (Professor Scott Emerson)


Assessment and Modeling of Forests. FR 4218 Spring Assignment 1 Solutions Assessmet ad Modelig of Forests FR 48 Sprig Assigmet Solutios. The first part of the questio asked that you calculate the average, stadard deviatio, coefficiet of variatio, ad 9% cofidece iterval of the

More information

S Y Y = ΣY 2 n. Using the above expressions, the correlation coefficient is. r = SXX S Y Y

S Y Y = ΣY 2 n. Using the above expressions, the correlation coefficient is. r = SXX S Y Y 1 Sociology 405/805 Revised February 4, 004 Summary of Formulae for Bivariate Regressio ad Correlatio Let X be a idepedet variable ad Y a depedet variable, with observatios for each of the values of these

More information

Statistical Hypothesis Testing. STAT 536: Genetic Statistics. Statistical Hypothesis Testing - Terminology. Hardy-Weinberg Disequilibrium

Statistical Hypothesis Testing. STAT 536: Genetic Statistics. Statistical Hypothesis Testing - Terminology. Hardy-Weinberg Disequilibrium Statistical Hypothesis Testig STAT 536: Geetic Statistics Kari S. Dorma Departmet of Statistics Iowa State Uiversity September 7, 006 Idetify a hypothesis, a idea you wat to test for its applicability

More information

Statistics 20: Final Exam Solutions Summer Session 2007

Statistics 20: Final Exam Solutions Summer Session 2007 1. 20 poits Testig for Diabetes. Statistics 20: Fial Exam Solutios Summer Sessio 2007 (a) 3 poits Give estimates for the sesitivity of Test I ad of Test II. Solutio: 156 patiets out of total 223 patiets

More information

Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to

More information

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?

More information

STA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to:

STA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to: STA 2023 Module 10 Comparig Two Proportios Learig Objectives Upo completig this module, you should be able to: 1. Perform large-sample ifereces (hypothesis test ad cofidece itervals) to compare two populatio

More information

Stat 200 -Testing Summary Page 1

Stat 200 -Testing Summary Page 1 Stat 00 -Testig Summary Page 1 Mathematicias are like Frechme; whatever you say to them, they traslate it ito their ow laguage ad forthwith it is somethig etirely differet Goethe 1 Large Sample Cofidece

More information

3/3/2014. CDS M Phil Econometrics. Types of Relationships. Types of Relationships. Types of Relationships. Vijayamohanan Pillai N.

3/3/2014. CDS M Phil Econometrics. Types of Relationships. Types of Relationships. Types of Relationships. Vijayamohanan Pillai N. 3/3/04 CDS M Phil Old Least Squares (OLS) Vijayamohaa Pillai N CDS M Phil Vijayamoha CDS M Phil Vijayamoha Types of Relatioships Oly oe idepedet variable, Relatioship betwee ad is Liear relatioships Curviliear

More information

Lecture 11 Simple Linear Regression

Lecture 11 Simple Linear Regression Lecture 11 Simple Liear Regressio Fall 2013 Prof. Yao Xie, yao.xie@isye.gatech.edu H. Milto Stewart School of Idustrial Systems & Egieerig Georgia Tech Midterm 2 mea: 91.2 media: 93.75 std: 6.5 2 Meddicorp

More information

Circle the single best answer for each multiple choice question. Your choice should be made clearly.

Circle the single best answer for each multiple choice question. Your choice should be made clearly. TEST #1 STA 4853 March 6, 2017 Name: Please read the followig directios. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directios This exam is closed book ad closed otes. There are 32 multiple choice questios.

More information

Simple Linear Regression

Simple Linear Regression Simple Liear Regressio 1. Model ad Parameter Estimatio (a) Suppose our data cosist of a collectio of pairs (x i, y i ), where x i is a observed value of variable X ad y i is the correspodig observatio

More information

¹Y 1 ¹ Y 2 p s. 2 1 =n 1 + s 2 2=n 2. ¹X X n i. X i u i. i=1 ( ^Y i ¹ Y i ) 2 + P n

¹Y 1 ¹ Y 2 p s. 2 1 =n 1 + s 2 2=n 2. ¹X X n i. X i u i. i=1 ( ^Y i ¹ Y i ) 2 + P n Review Sheets for Stock ad Watso Hypothesis testig p-value: probability of drawig a statistic at least as adverse to the ull as the value actually computed with your data, assumig that the ull hypothesis

More information

Regression. Correlation vs. regression. The parameters of linear regression. Regression assumes... Random sample. Y = α + β X.

Regression. Correlation vs. regression. The parameters of linear regression. Regression assumes... Random sample. Y = α + β X. Regressio Correlatio vs. regressio Predicts Y from X Liear regressio assumes that the relatioship betwee X ad Y ca be described by a lie Regressio assumes... Radom sample Y is ormally distributed with

More information

Final Review. Fall 2013 Prof. Yao Xie, H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech

Final Review. Fall 2013 Prof. Yao Xie, H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech Fial Review Fall 2013 Prof. Yao Xie, yao.xie@isye.gatech.edu H. Milto Stewart School of Idustrial Systems & Egieerig Georgia Tech 1 Radom samplig model radom samples populatio radom samples: x 1,..., x

More information

Final Examination Solutions 17/6/2010

Final Examination Solutions 17/6/2010 The Islamic Uiversity of Gaza Faculty of Commerce epartmet of Ecoomics ad Political Scieces A Itroductio to Statistics Course (ECOE 30) Sprig Semester 009-00 Fial Eamiatio Solutios 7/6/00 Name: I: Istructor:

More information

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen) Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................

More information

2 1. The r.s., of size n2, from population 2 will be. 2 and 2. 2) The two populations are independent. This implies that all of the n1 n2

2 1. The r.s., of size n2, from population 2 will be. 2 and 2. 2) The two populations are independent. This implies that all of the n1 n2 Chapter 8 Comparig Two Treatmets Iferece about Two Populatio Meas We wat to compare the meas of two populatios to see whether they differ. There are two situatios to cosider, as show i the followig examples:

More information

y ij = µ + α i + ɛ ij,

y ij = µ + α i + ɛ ij, STAT 4 ANOVA -Cotrasts ad Multiple Comparisos /3/04 Plaed comparisos vs uplaed comparisos Cotrasts Cofidece Itervals Multiple Comparisos: HSD Remark Alterate form of Model I y ij = µ + α i + ɛ ij, a i

More information

Formulas and Tables for Gerstman

Formulas and Tables for Gerstman Formulas ad Tables for Gerstma Measuremet ad Study Desig Biostatistics is more tha a compilatio of computatioal techiques! Measuremet scales: quatitative, ordial, categorical Iformatio quality is primary

More information

TMA4245 Statistics. Corrected 30 May and 4 June Norwegian University of Science and Technology Department of Mathematical Sciences.

TMA4245 Statistics. Corrected 30 May and 4 June Norwegian University of Science and Technology Department of Mathematical Sciences. Norwegia Uiversity of Sciece ad Techology Departmet of Mathematical Scieces Corrected 3 May ad 4 Jue Solutios TMA445 Statistics Saturday 6 May 9: 3: Problem Sow desity a The probability is.9.5 6x x dx

More information

BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9

BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9 BIOS 4110: Itroductio to Biostatistics Brehey Lab #9 The Cetral Limit Theorem is very importat i the realm of statistics, ad today's lab will explore the applicatio of it i both categorical ad cotiuous

More information

University of California, Los Angeles Department of Statistics. Practice problems - simple regression 2 - solutions

University of California, Los Angeles Department of Statistics. Practice problems - simple regression 2 - solutions Uiversity of Califoria, Los Ageles Departmet of Statistics Statistics 00C Istructor: Nicolas Christou EXERCISE Aswer the followig questios: Practice problems - simple regressio - solutios a Suppose y,

More information

Homework for 4/9 Due 4/16

Homework for 4/9 Due 4/16 Name: ID: Homework for 4/9 Due 4/16 1. [ 13-6] It is covetioal wisdom i military squadros that pilots ted to father more girls tha boys. Syder 1961 gathered data for military fighter pilots. The sex of

More information

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The

More information

Ismor Fischer, 1/11/

Ismor Fischer, 1/11/ Ismor Fischer, //04 7.4-7.4 Problems. I Problem 4.4/9, it was show that importat relatios exist betwee populatio meas, variaces, ad covariace. Specifically, we have the formulas that appear below left.

More information

10. Comparative Tests among Spatial Regression Models. Here we revisit the example in Section 8.1 of estimating the mean of a normal random

10. Comparative Tests among Spatial Regression Models. Here we revisit the example in Section 8.1 of estimating the mean of a normal random Part III. Areal Data Aalysis 0. Comparative Tests amog Spatial Regressio Models While the otio of relative likelihood values for differet models is somewhat difficult to iterpret directly (as metioed above),

More information

Simple Random Sampling!

Simple Random Sampling! Simple Radom Samplig! Professor Ro Fricker! Naval Postgraduate School! Moterey, Califoria! Readig:! 3/26/13 Scheaffer et al. chapter 4! 1 Goals for this Lecture! Defie simple radom samplig (SRS) ad discuss

More information

6 Sample Size Calculations

6 Sample Size Calculations 6 Sample Size Calculatios Oe of the major resposibilities of a cliical trial statisticia is to aid the ivestigators i determiig the sample size required to coduct a study The most commo procedure for determiig

More information

Lecture 8: Non-parametric Comparison of Location. GENOME 560, Spring 2016 Doug Fowler, GS

Lecture 8: Non-parametric Comparison of Location. GENOME 560, Spring 2016 Doug Fowler, GS Lecture 8: No-parametric Compariso of Locatio GENOME 560, Sprig 2016 Doug Fowler, GS (dfowler@uw.edu) 1 Review What do we mea by oparametric? What is a desirable locatio statistic for ordial data? What

More information

Chapter 1 (Definitions)

Chapter 1 (Definitions) FINAL EXAM REVIEW Chapter 1 (Defiitios) Qualitative: Nomial: Ordial: Quatitative: Ordial: Iterval: Ratio: Observatioal Study: Desiged Experimet: Samplig: Cluster: Stratified: Systematic: Coveiece: Simple

More information

MidtermII Review. Sta Fall Office Hours Wednesday 12:30-2:30pm Watch linear regression videos before lab on Thursday

MidtermII Review. Sta Fall Office Hours Wednesday 12:30-2:30pm Watch linear regression videos before lab on Thursday Aoucemets MidtermII Review Sta 101 - Fall 2016 Duke Uiversity, Departmet of Statistical Sciece Office Hours Wedesday 12:30-2:30pm Watch liear regressio videos before lab o Thursday Dr. Abrahamse Slides

More information

Outline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression

Outline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression REGRESSION 1 Outlie Liear regressio Regularizatio fuctios Polyomial curve fittig Stochastic gradiet descet for regressio MLE for regressio Step-wise forward regressio Regressio methods Statistical techiques

More information

Regression, Inference, and Model Building

Regression, Inference, and Model Building Regressio, Iferece, ad Model Buildig Scatter Plots ad Correlatio Correlatio coefficiet, r -1 r 1 If r is positive, the the scatter plot has a positive slope ad variables are said to have a positive relatioship

More information

Lecture 5: Parametric Hypothesis Testing: Comparing Means. GENOME 560, Spring 2016 Doug Fowler, GS

Lecture 5: Parametric Hypothesis Testing: Comparing Means. GENOME 560, Spring 2016 Doug Fowler, GS Lecture 5: Parametric Hypothesis Testig: Comparig Meas GENOME 560, Sprig 2016 Doug Fowler, GS (dfowler@uw.edu) 1 Review from last week What is a cofidece iterval? 2 Review from last week What is a cofidece

More information

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10 DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set

More information

Linear Regression Models

Linear Regression Models Liear Regressio Models Dr. Joh Mellor-Crummey Departmet of Computer Sciece Rice Uiversity johmc@cs.rice.edu COMP 528 Lecture 9 15 February 2005 Goals for Today Uderstad how to Use scatter diagrams to ispect

More information

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y.

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y. Testig Statistical Hypotheses Recall the study where we estimated the differece betwee mea systolic blood pressure levels of users of oral cotraceptives ad o-users, x - y. Such studies are sometimes viewed

More information

October 25, 2018 BIM 105 Probability and Statistics for Biomedical Engineers 1

October 25, 2018 BIM 105 Probability and Statistics for Biomedical Engineers 1 October 25, 2018 BIM 105 Probability ad Statistics for Biomedical Egieers 1 Populatio parameters ad Sample Statistics October 25, 2018 BIM 105 Probability ad Statistics for Biomedical Egieers 2 Ifereces

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

Logit regression Logit regression

Logit regression Logit regression Logit regressio Logit regressio models the probability of Y= as the cumulative stadard logistic distributio fuctio, evaluated at z = β 0 + β X: Pr(Y = X) = F(β 0 + β X) F is the cumulative logistic distributio

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test.

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test. Math 308 Sprig 018 Classes 19 ad 0: Aalysis of Variace (ANOVA) Page 1 of 6 Itroductio ANOVA is a statistical procedure for determiig whether three or more sample meas were draw from populatios with equal

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

Question 1: Exercise 8.2

Question 1: Exercise 8.2 Questio 1: Exercise 8. (a) Accordig to the regressio results i colum (1), the house price is expected to icrease by 1% ( 100% 0.0004 500 ) with a additioal 500 square feet ad other factors held costat.

More information

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc. Chapter 22 Comparig Two Proportios Copyright 2010 Pearso Educatio, Ic. Comparig Two Proportios Comparisos betwee two percetages are much more commo tha questios about isolated percetages. Ad they are more

More information

Sample Size Determination (Two or More Samples)

Sample Size Determination (Two or More Samples) Sample Sie Determiatio (Two or More Samples) STATGRAPHICS Rev. 963 Summary... Data Iput... Aalysis Summary... 5 Power Curve... 5 Calculatios... 6 Summary This procedure determies a suitable sample sie

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

Measurement uncertainty of the sound absorption

Measurement uncertainty of the sound absorption Measuremet ucertaity of the soud absorptio coefficiet Aa Izewska Buildig Research Istitute, Filtrowa Str., 00-6 Warsaw, Polad a.izewska@itb.pl 6887 The stadard ISO/IEC 705:005 o the competece of testig

More information

Dr. Maddah ENMG 617 EM Statistics 11/26/12. Multiple Regression (2) (Chapter 15, Hines)

Dr. Maddah ENMG 617 EM Statistics 11/26/12. Multiple Regression (2) (Chapter 15, Hines) Dr Maddah NMG 617 M Statistics 11/6/1 Multiple egressio () (Chapter 15, Hies) Test for sigificace of regressio This is a test to determie whether there is a liear relatioship betwee the depedet variable

More information

Agreement of CI and HT. Lecture 13 - Tests of Proportions. Example - Waiting Times

Agreement of CI and HT. Lecture 13 - Tests of Proportions. Example - Waiting Times Sigificace level vs. cofidece level Agreemet of CI ad HT Lecture 13 - Tests of Proportios Sta102 / BME102 Coli Rudel October 15, 2014 Cofidece itervals ad hypothesis tests (almost) always agree, as log

More information

Chapter 4 - Summarizing Numerical Data

Chapter 4 - Summarizing Numerical Data Chapter 4 - Summarizig Numerical Data 15.075 Cythia Rudi Here are some ways we ca summarize data umerically. Sample Mea: i=1 x i x :=. Note: i this class we will work with both the populatio mea µ ad the

More information

1036: Probability & Statistics

1036: Probability & Statistics 036: Probability & Statistics Lecture 0 Oe- ad Two-Sample Tests of Hypotheses 0- Statistical Hypotheses Decisio based o experimetal evidece whether Coffee drikig icreases the risk of cacer i humas. A perso

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL/MAY 2009 EXAMINATIONS ECO220Y1Y PART 1 OF 2 SOLUTIONS

UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL/MAY 2009 EXAMINATIONS ECO220Y1Y PART 1 OF 2 SOLUTIONS PART of UNIVERSITY OF TORONTO Faculty of Arts ad Sciece APRIL/MAY 009 EAMINATIONS ECO0YY PART OF () The sample media is greater tha the sample mea whe there is. (B) () A radom variable is ormally distributed

More information

Exam II Covers. STA 291 Lecture 19. Exam II Next Tuesday 5-7pm Memorial Hall (Same place as exam I) Makeup Exam 7:15pm 9:15pm Location CB 234

Exam II Covers. STA 291 Lecture 19. Exam II Next Tuesday 5-7pm Memorial Hall (Same place as exam I) Makeup Exam 7:15pm 9:15pm Location CB 234 STA 291 Lecture 19 Exam II Next Tuesday 5-7pm Memorial Hall (Same place as exam I) Makeup Exam 7:15pm 9:15pm Locatio CB 234 STA 291 - Lecture 19 1 Exam II Covers Chapter 9 10.1; 10.2; 10.3; 10.4; 10.6

More information

Important Formulas. Expectation: E (X) = Σ [X P(X)] = n p q σ = n p q. P(X) = n! X1! X 2! X 3! X k! p X. Chapter 6 The Normal Distribution.

Important Formulas. Expectation: E (X) = Σ [X P(X)] = n p q σ = n p q. P(X) = n! X1! X 2! X 3! X k! p X. Chapter 6 The Normal Distribution. Importat Formulas Chapter 3 Data Descriptio Mea for idividual data: X = _ ΣX Mea for grouped data: X= _ Σf X m Stadard deviatio for a sample: _ s = Σ(X _ X ) or s = 1 (Σ X ) (Σ X ) ( 1) Stadard deviatio

More information

Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible

More information

GG313 GEOLOGICAL DATA ANALYSIS

GG313 GEOLOGICAL DATA ANALYSIS GG313 GEOLOGICAL DATA ANALYSIS 1 Testig Hypothesis GG313 GEOLOGICAL DATA ANALYSIS LECTURE NOTES PAUL WESSEL SECTION TESTING OF HYPOTHESES Much of statistics is cocered with testig hypothesis agaist data

More information

Statistics Independent (X) you can choose and manipulate. Usually on x-axis

Statistics Independent (X) you can choose and manipulate. Usually on x-axis Statistics-6000 Variable: are characteristic that ca take o differet values with respect to persos, time, ad place ad types of variables are as follow: Idepedet (X) you ca choose ad maipulate. Usually

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

First, note that the LS residuals are orthogonal to the regressors. X Xb X y = 0 ( normal equations ; (k 1) ) So,

First, note that the LS residuals are orthogonal to the regressors. X Xb X y = 0 ( normal equations ; (k 1) ) So, 0 2. OLS Part II The OLS residuals are orthogoal to the regressors. If the model icludes a itercept, the orthogoality of the residuals ad regressors gives rise to three results, which have limited practical

More information