STK4900/9900 - Lecture 7. Program


1 STK4900/9900 - Lecture 7. Program
1. Logistic regression with one predictor
2. Maximum likelihood estimation
3. Logistic regression with several predictors
4. Deviance and likelihood ratio tests
5. A comment on model fit
Sections 5.1, 5.2 (except 5.2.6), and 5.6
Supplementary material on likelihood and deviance

2 Logistic regression with one predictor
We have data (x_1, y_1), ..., (x_n, y_n)
Here y_i is a binary outcome (0 or 1) for subject i and x_i is a predictor for the subject
We let p(x) = E(y | x) = P(y = 1 | x)
The logistic regression model takes the form:
   p(x) = exp(β0 + β1 x) / (1 + exp(β0 + β1 x))
This gives an "S-shaped" relation between p(x) and x and ensures that p(x) stays between 0 and 1
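As a small illustration (not part of the original slides), the S-shape can be drawn in R for some coefficient values; the numbers below are purely hypothetical:
b0 <- -6; b1 <- 0.12                               # hypothetical values, for illustration only
x  <- seq(20, 80, by=1)
px <- exp(b0 + b1*x)/(1 + exp(b0 + b1*x))          # p(x) as defined above
plot(x, px, type="l", ylab="p(x)", ylim=c(0,1))    # S-shaped curve, always between 0 and 1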

3 The logistic model may alternatively be given in terms of the odds:
   p(x) / (1 - p(x)) = exp(β0 + β1 x)                  (*)
If we consider two subjects with covariate values x + Δ and x, respectively, their odds ratio becomes
   [p(x + Δ) / (1 - p(x + Δ))] / [p(x) / (1 - p(x))] = exp(β0 + β1(x + Δ)) / exp(β0 + β1 x) = exp(β1 Δ)
In particular exp(β1) is the odds ratio corresponding to one unit's increase in the value of the covariate
By (*) the logistic regression model may also be given as:
   log[ p(x) / (1 - p(x)) ] = β0 + β1 x                (**)
Thus the logistic regression model is linear in the log-odds

4 Consider the WCGS study with CHD as outcome and age as predictor (individual age, not grouped age as we considered in Lecture 6)
R commands:
wcgs=read.table("http://...", sep="\t", header=T, na.strings=".")
fit=glm(chd69~age, data=wcgs, family=binomial)
summary(fit)
R output (edited):
            Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)      ...         ...      ...   < 2e-16
age              ...         ...      ...   ...e-11
The odds ratio for a one-year increase in age is exp(β̂1), while the odds ratio for a ten-year increase is exp(10 β̂1)
(The numbers deviate slightly from those on slide 25 from Lecture 6, since there we used mean age for each age group while here we use the individual ages)
How is the estimation performed for the logistic regression model?
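A minimal sketch (assuming the fit object above) of how the one-year and ten-year odds ratios can be computed from the fitted coefficient:
b1 <- coef(fit)["age"]     # estimated regression coefficient for age
exp(b1)                    # odds ratio for a one-year increase in age
exp(10*b1)                 # odds ratio for a ten-year increase in age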

5 Maximum likelihood estimation
Estimation in the logistic model is performed using maximum likelihood estimation
We first describe maximum likelihood estimation for the linear regression model
For ease of presentation, we assume that σ² is known
The density of y_i takes the form (cf. slide 12 from Lecture 1):
   f(y_i) = (1 / sqrt(2π σ²)) exp( -(y_i - μ_i)² / (2σ²) ),   where μ_i = β0 + β1 x_i

6 The likelihood is the simultaneous density
   L = Π_{i=1}^{n} f(y_i)
considered as a function of the parameters β0 and β1 for the observed values of the y_i
We estimate the parameters by maximizing the likelihood. This corresponds to finding the parameters that make the observed y_i as likely as possible
Maximizing the likelihood L is the same as maximizing the log-likelihood l = log L, which is the same as minimizing the sum of squares Σ_{i=1}^{n} (y_i - μ_i)²
For the linear regression model, maximum likelihood estimation coincides with least squares estimation
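A small check of this equivalence (simulated data, not from the slides): fitting a linear model by least squares with lm and by maximum likelihood with glm and the gaussian family gives the same estimates:
set.seed(1)
x <- rnorm(100); y <- 2 + 0.5*x + rnorm(100)       # simulated data for illustration
coef(lm(y~x))                                      # least squares estimates
coef(glm(y~x, family=gaussian))                    # maximum likelihood estimates: the same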

7 We then consider the situation for logistic regression
We have data (x_1, y_1), ..., (x_n, y_n), where y_i is a binary outcome (0 or 1) for subject i and x_i is a predictor
Here we have
   P(y_i = 1 | x_i) = p_i    and    P(y_i = 0 | x_i) = 1 - p_i
where
   p_i = exp(β0 + β1 x_i) / (1 + exp(β0 + β1 x_i))
Thus the distribution of y_i may be written as
   P(y_i | x_i) = p_i^{y_i} (1 - p_i)^{1 - y_i}

8 The likelihood becomes
   L = Π_{i=1}^{n} P(y_i | x_i) = Π_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}
Since p_i = exp(β0 + β1 x_i) / (1 + exp(β0 + β1 x_i)), the likelihood is, for given observations, a function of the unknown parameters β0 and β1
We estimate β0 and β1 by the values of these parameters that maximize the likelihood
These estimates are called the maximum likelihood estimates (MLE) and are denoted β̂0 and β̂1
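To make the maximization concrete, here is a sketch (simulated data, not from the slides) that writes out the log-likelihood above and maximizes it numerically with optim; glm does the same job more efficiently by iteratively reweighted least squares:
negloglik <- function(beta, x, y) {                        # minus the log-likelihood
  p <- exp(beta[1] + beta[2]*x)/(1 + exp(beta[1] + beta[2]*x))
  -sum(y*log(p) + (1-y)*log(1-p))
}
set.seed(1)
x <- rnorm(500); y <- rbinom(500, 1, plogis(-1 + 0.8*x))   # simulated binary data
optim(c(0,0), negloglik, x=x, y=y)$par                     # numerical MLEs of (beta0, beta1)
coef(glm(y~x, family=binomial))                            # essentially the same values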

9 Confidence interval for β1 and odds ratio
95% confidence interval for β1 (based on the normal approximation):
   β̂1 ± 1.96 se(β̂1)
OR = exp(β1) is the odds ratio for one unit's increase in x
We obtain a 95% confidence interval for OR by transforming the lower and upper limits of the confidence interval for β1
In the CHD example we have β̂1 = 0.0744 and se(β̂1) = 0.0113
95% confidence interval for β1: 0.0744 ± 1.96 · 0.0113, i.e. from 0.052 to 0.096
Estimate of odds ratio: OR = exp(0.0744) = 1.077
95% confidence interval for OR: from exp(0.052) = 1.053 to exp(0.096) = 1.101
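A sketch (assuming the fit object from slide 4) of how these intervals can be computed directly from the summary table:
est <- summary(fit)$coef["age","Estimate"]
se  <- summary(fit)$coef["age","Std. Error"]
ci.beta <- est + c(-1.96,1.96)*se     # 95% confidence interval for beta1
exp(est)                              # estimated odds ratio
exp(ci.beta)                          # 95% confidence interval for the odds ratio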

10 R function for computing odds ratios with 95% confidence limits
expcoef=function(glmobj)
{
  regtab=summary(glmobj)$coef
  expcoef=exp(regtab[,1])
  lower=expcoef*exp(-1.96*regtab[,2])
  upper=expcoef*exp(1.96*regtab[,2])
  cbind(expcoef,lower,upper)
}
expcoef(fit)
R output (edited):
             expcoef  lower  upper
(Intercept)      ...    ...    ...
age              ...    ...    ...

11 Wald test for H0: β1 = 0
To test the null hypothesis H0: β1 = 0 versus the two-sided alternative HA: β1 ≠ 0 we often use the Wald test statistic:
   z = β̂1 / se(β̂1)
We reject H0 for large values of |z|
Under H0 the test statistic is approximately standard normal
P-value (two-sided): P = 2 P(Z > |z|) where Z is standard normal
In the CHD example we have β̂1 = 0.0744 and se(β̂1) = 0.0113
Wald test statistic: z = 0.0744 / 0.0113 = 6.58, which is highly significant (cf. slide 4)
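A sketch (again assuming the fit object from slide 4) computing the Wald statistic and its two-sided P-value:
z <- summary(fit)$coef["age","Estimate"]/summary(fit)$coef["age","Std. Error"]
z                        # Wald test statistic
2*pnorm(-abs(z))         # two-sided P-value from the standard normal distribution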

12 Multiple logistic regression
Assume now that for each subject we have a binary outcome y and predictors x1, x2, ..., xp
We let p(x1, x2, ..., xp) = E(y | x1, x2, ..., xp) = P(y = 1 | x1, x2, ..., xp)
Logistic regression model:
   p(x1, x2, ..., xp) = exp(β0 + β1 x1 + β2 x2 + ... + βp xp) / (1 + exp(β0 + β1 x1 + β2 x2 + ... + βp xp))
Alternatively the model may be written:
   log[ p(x1, x2, ..., xp) / (1 - p(x1, x2, ..., xp)) ] = β0 + β1 x1 + β2 x2 + ... + βp xp

13 The logistic model may also be given in terms of the odds:
   p(x1, x2, ..., xp) / (1 - p(x1, x2, ..., xp)) = exp(β0 + β1 x1 + β2 x2 + ... + βp xp)
If we consider two subjects with values x1 + Δ and x1 for the first covariate and the same values for all the others, their odds ratio becomes
   [p(x1+Δ, x2, ..., xp) / (1 - p(x1+Δ, x2, ..., xp))] / [p(x1, x2, ..., xp) / (1 - p(x1, x2, ..., xp))]
   = exp(β0 + β1(x1+Δ) + β2 x2 + ... + βp xp) / exp(β0 + β1 x1 + β2 x2 + ... + βp xp) = exp(β1 Δ)
In particular exp(β1) is the odds ratio corresponding to one unit's increase in the value of the first covariate, holding all other covariates constant
A similar interpretation holds for the other regression coefficients

14 Wald tests and confidence intervals
β̂j = MLE for βj,   se(β̂j) = standard error for β̂j
To test the null hypothesis H0j: βj = 0 we use the Wald test statistic:
   z = β̂j / se(β̂j)
which is approximately N(0,1)-distributed under H0j
95% confidence interval for βj:  β̂j ± 1.96 se(β̂j)
ORj = exp(βj) is the odds ratio for one unit's increase in the value of the j-th covariate, holding all other covariates constant
We obtain a 95% confidence interval for ORj by transforming the lower and upper limits of the confidence interval for βj

15 Consider the WCGS study with CHD as outcome and age, cholesterol (mg/dl), systolic blood pressure (mmHg), body mass index (kg/m²), and smoking (yes, no) as predictors
(as on page 152 in the text book we omit an individual with an unusually high cholesterol value)
R commands:
wcgs.mult=glm(chd69~age+chol+sbp+bmi+smoke, data=wcgs, family=binomial, subset=(chol<600))
summary(wcgs.mult)
R output (edited):
            Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)      ...         ...      ...   < 2e-16
age              ...         ...      ...   ...e-08
chol             ...         ...      ...   ...e-12
sbp              ...         ...      ...   ...e-06
bmi              ...         ...      ...       ...
smoke            ...         ...      ...   ...e-06

16 Odds ratios with confidence intervals
R command (using the function from slide 10):
expcoef(wcgs.mult)
R output (edited):
             expcoef     lower     upper
(Intercept) 4.50e-05       ...       ...
age              ...       ...       ...
chol             ...       ...       ...
sbp              ...       ...       ...
bmi              ...       ...       ...
smoke            ...       ...       ...

17 For a numerical covariate it may be more meaningful to present an odds ratio corresponding to a larger increase than one unit (cf. slide 13)
This is easily achieved by refitting the model with a rescaled covariate
If you (e.g.) want to study the effect of a ten-year increase in age, you fit the model with the covariate age_10=age/10
R commands:
wcgs.resc=glm(chd69~age_10+chol_50+sbp_50+bmi_10+smoke, data=wcgs, family=binomial, subset=(chol<600))
summary(wcgs.resc)
R output (edited):
            Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)      ...         ...      ...   < 2e-16
age_10           ...         ...      ...   ...e-08
chol_50          ...         ...      ...   ...e-12
sbp_50           ...         ...      ...   ...e-06
bmi_10           ...         ...      ...       ...
smoke            ...         ...      ...   ...e-06
Note that the values of the Wald test statistic are not changed (cf. slide 15)
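The rescaled covariates age_10, chol_50, sbp_50 and bmi_10 used above are not in the raw data; a minimal sketch creating them (the names are assumed to match those in the model formula) could be:
wcgs$age_10  <- wcgs$age/10       # age in units of 10 years
wcgs$chol_50 <- wcgs$chol/50      # cholesterol in units of 50 mg/dl
wcgs$sbp_50  <- wcgs$sbp/50       # systolic blood pressure in units of 50 mmHg
wcgs$bmi_10  <- wcgs$bmi/10       # body mass index in units of 10 kg/m^2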

18 Odds ratios with confidence intervals:
R command (using the function from slide 10):
expcoef(wcgs.resc)
R output (edited):
             expcoef     lower     upper
(Intercept)      ...       ...       ...
age_10           ...       ...       ...
chol_50          ...       ...       ...
sbp_50           ...       ...       ...
bmi_10           ...       ...       ...
smoke            ...       ...       ...

19 An aim of the WCGS study was to study the effect on CHD of certain behavioral patterns, denoted A1, A2, B3 and B4
Behavioral pattern is a categorical covariate with four levels, and must be fitted as a factor in R
R commands:
wcgs$behcat=factor(wcgs$behpat)
wcgs.beh=glm(chd69~age_10+chol_50+sbp_50+bmi_10+smoke+behcat, data=wcgs, family=binomial, subset=(chol<600))
summary(wcgs.beh)
R output (edited):
            Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)      ...         ...      ...   < 2e-16
age_10           ...         ...      ...   ...e-07
chol_50          ...         ...      ...   ...e-12
sbp_50           ...         ...      ...   ...e-05
bmi_10           ...         ...      ...       ...
smoke            ...         ...      ...   ...e-05
behcat2          ...         ...      ...       ...
behcat3          ...         ...      ...       ...
behcat4          ...         ...      ...       ...

20 Here we may be interested in:
Testing if behavioral patterns have an effect on CHD risk
Testing if it is sufficient to use two categories for behavioral pattern (A and B)
In general we consider a logistic regression model:
   p(x1, x2, ..., xp) = exp(β0 + β1 x1 + β2 x2 + ... + βp xp) / (1 + exp(β0 + β1 x1 + β2 x2 + ... + βp xp))
Here we want to test the null hypothesis that q of the βj's are equal to zero, or equivalently that there are q linear restrictions among the βj's
Examples (with β6, β7, β8 the coefficients of the three behavioral pattern dummies in the model on slide 19):
   H0: β6 = β7 = β8 = 0   (q = 3), i.e. behavioral pattern has no effect
   H0: β6 = 0 and β7 = β8   (q = 2), i.e. two categories (A and B) suffice
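The first hypothesis is tested with a deviance test on slide 28. For the second, a hedged sketch (assuming behpat is coded 1 = A1, 2 = A2, 3 = B3, 4 = B4; the variable name behAB is introduced here only for illustration) could be:
wcgs$behAB <- factor(ifelse(wcgs$behpat <= 2, "A", "B"))   # two-category behavioral pattern
wcgs.AB <- glm(chd69~age_10+chol_50+sbp_50+bmi_10+smoke+behAB,
               data=wcgs, family=binomial, subset=(chol<600))
anova(wcgs.AB, wcgs.beh, test="Chisq")                     # deviance test of the q = 2 restriction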

21 Deviance and sum of squares
For the linear regression model the sum of squares was a key quantity in connection with testing and for assessing the fit of a model
We want to define a quantity for logistic regression that corresponds to the sum of squares
To this end we start out by considering the relation between the log-likelihood and the sum of squares for the linear regression model
For the linear regression model l = log L takes the form (cf. slide 6):
   l = -(n/2) log(2π σ²) - (1/(2σ²)) Σ_{i=1}^{n} (y_i - μ_i)²
The log-likelihood obtains its largest value for the saturated model, i.e. the model where there are no restrictions on the μ_i

22 For the saturated model the μ_i are estimated by μ̃_i = y_i, and the log-likelihood becomes
   l̃ = -(n/2) log(2π σ²)
For a given specification of the linear regression model the μ_i are estimated by the fitted values, i.e. μ̂_i = ŷ_i, with corresponding log-likelihood
   l̂ = -(n/2) log(2π σ²) - (1/(2σ²)) Σ_{i=1}^{n} (y_i - μ̂_i)²
The deviance for the model is defined as D = 2(l̃ - l̂), and it becomes
   D = (1/σ²) Σ_{i=1}^{n} (y_i - μ̂_i)²
For the linear regression model the deviance is just the sum of squares for the fitted model divided by σ²
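A small numerical aside (not from the slides, simulated data as in the sketch under slide 6): R reports the deviance of a gaussian glm as the residual sum of squares, i.e. the expression above with σ² set to 1:
set.seed(1)
x <- rnorm(100); y <- 2 + 0.5*x + rnorm(100)           # simulated data for illustration
fit.gaus <- glm(y~x, family=gaussian)
c(deviance(fit.gaus), sum(resid(fit.gaus)^2))          # identical: the residual sum of squares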

23 Deviance for binary data
We then consider logistic regression with data (y_i, x_1i, x_2i, ..., x_pi), i = 1, 2, ..., n, where y_i is a binary response and the x_ji are predictors
We introduce
   p_i = P(y_i = 1 | x_1i, x_2i, ..., x_pi)
and note that the log-likelihood (cf. slide 8) l = l(p_1, ..., p_n) is a function of p_1, ..., p_n
For the saturated model, i.e. the model where there are no restrictions on the p_i, the p_i are estimated by p̃_i = y_i, and the log-likelihood takes the value l̃ = l(y_1, ..., y_n) = 0

24 For a fitted logistic regression model we obtain the estimated probabilities
   p̂_i = p̂(x_1i, x_2i, ..., x_pi) = exp(β̂0 + β̂1 x_1i + ... + β̂p x_pi) / (1 + exp(β̂0 + β̂1 x_1i + ... + β̂p x_pi))
and the corresponding value l̂ = l(p̂_1, ..., p̂_n) of the log-likelihood
The deviance for the model is defined as D = 2(l̃ - l̂) = -2 l̂
The deviance itself is not of much use for binary data
But by comparing the deviances of two models, we may check if one gives a better fit than the other.
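A sketch (assuming the logistic fit object from slide 4) verifying that the deviance reported by glm equals -2 times the log-likelihood of the fitted model, since the saturated log-likelihood is 0 for binary data:
phat <- fitted(fit)                                    # estimated probabilities
y <- fit$y                                             # the binary responses
D <- -2*sum(y*log(phat) + (1-y)*log(1-phat))           # -2 * log-likelihood of the fitted model
c(D, deviance(fit))                                    # the two agree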

25 Consider the WCGS study with age, cholesterol, systolic blood pressure, body mass index, smoking and behavioral pattern as predictors (cf. slide 19)
R output (edited):
            Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)      ...         ...      ...   < 2e-16
age_10           ...         ...      ...   ...e-07
chol_50          ...         ...      ...   ...e-12
sbp_50           ...         ...      ...   ...e-05
bmi_10           ...         ...      ...       ...
smoke            ...         ...      ...   ...e-05
behcat2          ...         ...      ...       ...
behcat3          ...         ...      ...       ...
behcat4          ...         ...      ...       ...
Null deviance:     ...  on 3140 degrees of freedom
Residual deviance: ...  on 3132 degrees of freedom
The deviance of the fitted model is denoted "residual deviance" in the output
The "null deviance" is the deviance for the model with no covariates, i.e. for the model where all the p_i are assumed to be equal

26 Deviance and likelihood ratio tests
We want to test the null hypothesis H0 that q of the βj's are equal to zero, or equivalently that there are q linear restrictions among the βj's
To test the null hypothesis, we use the test statistic
   G = D0 - D
where D0 is the deviance under the null hypothesis and D is the deviance for the fitted model (not assuming H0)
We reject H0 for large values of G
To compute P-values, we use that the test statistic G is chi-square distributed (χ²) with q degrees of freedom under H0

27 We will show how we may rewrite G in terms of the likelihood ratio
We have D = -2 l̂ and D0 = -2 l̂0, where l̂ = log L̂ and l̂0 = log L̂0
Here L̂ = max L over the fitted model and L̂0 = max L under H0
Thus
   G = D0 - D = -2 (l̂0 - l̂) = -2 log(L̂0 / L̂)
Thus large values of G correspond to small values of the likelihood ratio L̂0 / L̂, and the test based on G is equivalent to the likelihood ratio test
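A sketch (assuming the model objects wcgs.resc and wcgs.beh from slides 17 and 19) showing that G can be computed either from the deviances or from the log-likelihoods, with the P-value from the chi-square distribution:
G  <- deviance(wcgs.resc) - deviance(wcgs.beh)                # D0 - D
G2 <- as.numeric(-2*(logLik(wcgs.resc) - logLik(wcgs.beh)))   # -2 log(L0/L), the same value
pchisq(G, df=3, lower.tail=FALSE)                             # P-value with q = 3 degrees of freedom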

28 For the model with age, cholesterol, systolic blood pressure, body mass index, smoking, and behavioral pattern as predictors (cf. slide 25) the deviance becomes D = ...
For the model without behavioral pattern (cf. slide 17) the deviance takes the value D0 = ...
The test statistic takes the value: G = D0 - D = 24.8
R commands:
anova(wcgs.resc,wcgs.beh,test="Chisq")
R output (edited):
Analysis of Deviance Table
Model 1: chd69 ~ age_10 + chol_50 + sbp_50 + bmi_10 + smoke
Model 2: chd69 ~ age_10 + chol_50 + sbp_50 + bmi_10 + smoke + behcat
  Resid. Df  Resid. Dev  Df  Deviance  P(>|Chi|)
1       ...         ...
2       ...         ...   3      24.8    ...e-05

29 Model fit for linear regression (review)
1. Linearity
2. Constant variance
3. Independent responses
4. Normally distributed error terms and no outliers
Model fit for logistic regression
1. Linearity: still relevant, see the following slides
2. Heteroscedastic model: Var(y_i | x_i) = p_i (1 - p_i), i.e. the variance depends on E(y_i | x_i) = p_i. However, this non-constant variance is taken care of by the maximum likelihood estimation.
3. Independent responses: see Lecture 10 on Friday.
4. Not relevant: data are binary, so there are no outliers in the responses (but there could well be extreme covariates, i.e. influential observations).

30 Checking linearity for logistic regression
We want to check if the probabilities can be adequately described by the linear expression
   log[ p(x1, x2, ..., xp) / (1 - p(x1, x2, ..., xp)) ] = β0 + β1 x1 + β2 x2 + ... + βp xp
We will discuss 3 approaches:
1. Grouping the covariates
2. Adding square terms or logarithmic terms to the model
3. Extending the model to generalized additive models (GAM):
   log[ p(x1, x2, ..., xp) / (1 - p(x1, x2, ..., xp)) ] = β0 + f1(x1) + f2(x2) + ... + fp(xp)

31 1. Grouping the variables
For a simple illustration we consider the situation where age is the only covariate in the model for CHD, and we want to check if the effect of age is linear (on the log-odds scale)
The procedure will be similar if there are other covariates in addition to age
We may here fit a model considering the age group as a factor (age groups: 35-40, 41-45, 46-50, 51-55, 56-60)
Or we may fit a model where the mean age in each age group is used as a numerical covariate (means: 39.5, 42.9, 47.9, 52.8, 57.3)
We may then use a deviance test to check if the flexible categorical model gives a better fit than the linear numerical one. Here we find no improvement (cf. the P-value of the deviance test in the R output on the next slide)

32 R commands:
fit.catage=glm(chd69~factor(agec), data=wcgs, family=binomial)
wcgs$agem=39.5*(wcgs$agec==0)+42.9*(wcgs$agec==1)+47.9*(wcgs$agec==2)+52.8*(wcgs$agec==3)+57.3*(wcgs$agec==4)
fit.linage=glm(chd69~agem, data=wcgs, family=binomial)
summary(fit.catage)
summary(fit.linage)
anova(fit.linage, fit.catage, test="Chisq")
R output (edited):
                Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)          ...         ...      ...   < 2e-16
factor(agec)1        ...         ...      ...       ...
factor(agec)2        ...         ...      ...       ...
factor(agec)3        ...         ...      ...       ...
factor(agec)4        ...         ...      ...   ...e-05
                Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)          ...         ...      ...   < 2e-16
agem                 ...         ...      ...   ...e-10
Model 1: chd69 ~ agem
Model 2: chd69 ~ factor(agec)
  Resid. Df  Resid. Dev  Df  Deviance  P(>|Chi|)
1       ...         ...
2       ...         ...   3       ...        ...

33 2. Adding square terms or log-terms
The simple model
   log[ p(x) / (1 - p(x)) ] = β0 + β1 x
can be extended to more flexible models such as
   log[ p(x) / (1 - p(x)) ] = β0 + β1 x + β2 x²
or
   log[ p(x) / (1 - p(x)) ] = β0 + β1 x + β2 log(x)
We may then use a deviance test to check if the flexible models give a better fit than the original. Here we find no improvement either: P = 0.79 for the square term, and the P-value for the log term (see the next slide) is also far from significant

34 R commands:
fit=glm(chd69~age, data=wcgs, family=binomial)
fita2=glm(chd69~age+I(age^2), data=wcgs, family=binomial)
fitlog=glm(chd69~age+log(age), data=wcgs, family=binomial)
anova(fit,fita2,test="Chisq")
anova(fit,fitlog,test="Chisq")
R output (edited):
> anova(fit,fita2,test="Chisq")
Model 1: chd69 ~ age
Model 2: chd69 ~ age + I(age^2)
  Resid. Df  Resid. Dev  Df  Deviance  Pr(>Chi)
1       ...         ...
2       ...         ...   1       ...      0.79
> anova(fit,fitlog,test="Chisq")
Model 1: chd69 ~ age
Model 2: chd69 ~ age + log(age)
  Resid. Df  Resid. Dev  Df  Deviance  Pr(>Chi)
1       ...         ...
2       ...         ...   1       ...       ...

35 3. Generalized additive model
In this example just with one covariate:
   log[ p(x1) / (1 - p(x1)) ] = β0 + f1(x1)
where f1(x1) is a smooth function estimated by the program. The approach can easily be extended to several covariates.
We can then
(a) Plot the estimated function with confidence intervals. Will a straight line fit within the confidence limits?
(b) Compare the simple and flexible model by a deviance test.

36 R output (edited):
> library(gam)
> fitgam=gam(chd69~s(age), data=wcgs, family=binomial)
> plot(fitgam,se=T)
> anova(fit,fitgam,test="Chisq")
Analysis of Deviance Table
Model 1: chd69 ~ age
Model 2: chd69 ~ s(age)
  Resid. Df  Resid. Dev  Df  Deviance  Pr(>Chi)
1       ...         ...
2       ...         ...  ...       ...    0.032 *
For these data
(a) The informal graphical check just allows a straight line within the confidence limits.
(b) However, the deviance test gives a weakly significant deviation from linearity (P = 0.032)
There may thus be some unimportant deviation from linearity.

37 Deviance and grouped data
On the slides in Lecture 6 we saw that we got the same estimates and standard errors when we fitted the model with mean age in each age group as a numerical covariate using binary data and grouped data
R commands:
summary(fit.linage)
chd.grouped=read.table("http://...", header=T)
fit.grouped=glm(cbind(chd,no-chd)~agem, data=chd.grouped, family=binomial)
summary(fit.grouped)
R output (edited):
            Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)      ...         ...      ...   < 2e-16
agem             ...         ...      ...   ...e-10
Null deviance:     ...  on 3153 degrees of freedom
Residual deviance: ...  on 3152 degrees of freedom
            Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)      ...         ...      ...   < 2e-16
agem             ...         ...      ...   ...e-10
Null deviance:     ...  on 4 degrees of freedom
Residual deviance: ...  on 3 degrees of freedom

38 We see that the "residual deviance" and the "null deviance" are not the same when we use binary data and when we use grouped data
However, the difference between the two is the same in both cases
As long as we look at differences between deviances, it does not matter whether we use binary or grouped data
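A sketch (assuming the fit.linage and fit.grouped objects from slide 37) illustrating this point: the deviances differ, but the difference between null and residual deviance is the same for binary and grouped data:
fit.linage$null.deviance - fit.linage$deviance        # based on binary data
fit.grouped$null.deviance - fit.grouped$deviance      # based on grouped data: the same value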
