STK4900/ Lecture 7. Program
- Madison Judith Casey
- 5 years ago
Transcription
1 STK4900/ Lecture 7
Program
1. Logistic regression with one predictor
2. Maximum likelihood estimation
3. Logistic regression with several predictors
4. Deviance and likelihood ratio tests
5. A comment on model fit
Sections 5.1, 5.2 (except 5.2.6), and 5.6
Supplementary material on likelihood and deviance
2 Logistic regression with one predictor
We have data $(x_1, y_1), \ldots, (x_n, y_n)$. Here $y_i$ is a binary outcome (0 or 1) for subject $i$ and $x_i$ is a predictor for the subject.
We let $p(x) = E(y \mid x) = P(y = 1 \mid x)$.
The logistic regression model takes the form:
$$p(x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}$$
This gives an "S-shaped" relation between $p(x)$ and $x$ and ensures that $p(x)$ stays between 0 and 1.
3 The logistic model may alternatively be given in terms of the odds:
$$\frac{p(x)}{1 - p(x)} = \exp(\beta_0 + \beta_1 x) \qquad (*)$$
If we consider two subjects with covariate values $x + \Delta$ and $x$, respectively, their odds ratio becomes
$$\frac{p(x + \Delta) / [1 - p(x + \Delta)]}{p(x) / [1 - p(x)]} = \frac{\exp(\beta_0 + \beta_1 (x + \Delta))}{\exp(\beta_0 + \beta_1 x)} = \exp(\beta_1 \Delta)$$
In particular $e^{\beta_1}$ is the odds ratio corresponding to one unit's increase in the value of the covariate.
By (*) the logistic regression model may also be given as:
$$\log\left[\frac{p(x)}{1 - p(x)}\right] = \beta_0 + \beta_1 x \qquad (**)$$
Thus the logistic regression model is linear in the log-odds.
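The odds ratio result can be checked numerically. The following is a minimal sketch in which the coefficients b0 and b1 and the covariate values are hypothetical, chosen only for illustration (they are not the WCGS estimates). It shows that the odds ratio for an increase of delta is exp(b1*delta) whatever the starting value of x.

```r
# Hypothetical coefficients, chosen only for illustration
b0 <- -5
b1 <- 0.07

# Logistic model: p(x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))
p <- function(x) plogis(b0 + b1 * x)
odds <- function(x) p(x) / (1 - p(x))

x <- c(40, 50, 60)   # three starting values of the covariate
delta <- 10          # size of the increase

# Odds ratio between x + delta and x, evaluated at each starting value
odds.ratio <- odds(x + delta) / odds(x)
print(odds.ratio)
print(exp(b1 * delta))
```

The three printed odds ratios should coincide with exp(b1*delta), even though the probabilities p(x) themselves differ between the three starting values.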
4 Consider the WCGS study with CHD as outcome and age as predictor (individual age, not grouped age as we considered in Lecture 6)
R commands:
    wcgs=read.table("http:// ", sep="\t", header=T, na.strings=".")
    fit=glm(chd69~age, data=wcgs, family=binomial)
    summary(fit)
R output (edited):
    Estimate Std. Error z value Pr(>|z|)
    (Intercept) <2e-16
    age e-11
The odds ratio for a one-year increase in age is $e^{\hat\beta_1}$, while the odds ratio for a ten-year increase is $e^{10 \hat\beta_1}$.
(The numbers deviate slightly from those on slide 25 from Lecture 6, since there we used mean age for each age group while here we use the individual ages.)
How is the estimation performed for the logistic regression model?
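5 Maximum likelihood estimation
Estimation in the logistic model is performed using maximum likelihood estimation.
We first describe maximum likelihood estimation for the linear regression model:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \text{ independent}$$
For ease of presentation, we assume that $\sigma^2$ is known.
The density of $y_i$ takes the form (cf slide 12 from Lecture 1):
$$f(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{1}{2\sigma^2}(y_i - \beta_0 - \beta_1 x_i)^2\right\}$$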
6 The likelihood is the simultaneous density
$$L = \prod_{i=1}^{n} f(y_i)$$
considered as a function of the parameters $\beta_0$ and $\beta_1$ for the observed values of the $y_i$.
We estimate the parameters by maximizing the likelihood. This corresponds to finding the parameters that make the observed $y_i$ as likely as possible.
Maximizing the likelihood $L$ is the same as maximizing
$$l = \log L = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2$$
which is the same as minimizing
$$\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2$$
For the linear regression model, maximum likelihood estimation coincides with least squares estimation.
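That maximum likelihood and least squares coincide for the linear regression model can be illustrated numerically. This is a sketch on simulated data (the sample size, true coefficients and sigma are assumptions made for the illustration). It maximizes the normal log-likelihood with the general-purpose optimizer optim() and compares the result with lm().

```r
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
sigma <- 2                                   # treated as known
y <- 1 + 0.5 * x + rnorm(n, sd = sigma)      # simulated responses

# Negative log-likelihood of the normal linear model as a function of (b0, b1)
negloglik <- function(b) {
  mu <- b[1] + b[2] * x
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

# Maximum likelihood by numerical optimization
ml <- optim(c(0, 0), negloglik, method = "BFGS",
            control = list(reltol = 1e-12))$par

# Least squares
lsq <- unname(coef(lm(y ~ x)))

print(rbind(ml, lsq))
```

The two rows should agree up to the numerical accuracy of the optimizer.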
7 We then consider the situation for logistic regression
We have data $(x_1, y_1), \ldots, (x_n, y_n)$, where $y_i$ is a binary outcome (0 or 1) for subject $i$ and $x_i$ is a predictor.
Here we have
$$P(y_i = 1 \mid x_i) = p_i, \qquad P(y_i = 0 \mid x_i) = 1 - p_i$$
where
$$p_i = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}$$
Thus the distribution of $y_i$ may be written as
$$P(y_i \mid x_i) = p_i^{y_i} (1 - p_i)^{1 - y_i}$$
8 The likelihood becomes
$$L = \prod_{i=1}^{n} P(y_i \mid x_i) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$$
Since
$$p_i = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}$$
the likelihood is, for given observations, a function of the unknown parameters $\beta_0$ and $\beta_1$.
We estimate $\beta_0$ and $\beta_1$ by the values of these parameters that maximize the likelihood.
These estimates are called the maximum likelihood estimates (MLE) and are denoted $\hat\beta_0$ and $\hat\beta_1$.
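The maximization can be sketched directly in R. The data below are simulated (the true coefficients -1 and 0.8 are assumptions for the illustration), and optim() is used as a generic optimizer. glm() itself uses a different algorithm (iteratively reweighted least squares), but it maximizes the same likelihood, so the estimates should agree.

```r
set.seed(2)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * x))   # simulated binary outcomes

# Log-likelihood: sum over i of y_i*log(p_i) + (1 - y_i)*log(1 - p_i)
loglik <- function(b) {
  p <- plogis(b[1] + b[2] * x)
  sum(y * log(p) + (1 - y) * log(1 - p))
}

# optim() minimizes, so it is given the negative log-likelihood
ml <- optim(c(0, 0), function(b) -loglik(b), method = "BFGS",
            control = list(reltol = 1e-12))$par

fit <- glm(y ~ x, family = binomial)
print(rbind(optim = ml, glm = unname(coef(fit))))
```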
9 Confidence interval for $\beta_1$ and odds ratio
95% confidence interval for $\beta_1$ (based on the normal approximation):
$$\hat\beta_1 \pm 1.96\, se(\hat\beta_1)$$
$OR = \exp(\beta_1)$ is the odds ratio for one unit's increase in $x$.
We obtain a 95% confidence interval for OR by transforming the lower and upper limits of the confidence interval for $\beta_1$.
In the CHD example we have $\hat\beta_1 = 0.0744$, and the 95% confidence interval for $\beta_1$ runs from 0.052 to 0.096.
Estimate of odds ratio: $\widehat{OR} = \exp(0.0744) \approx 1.077$
95% confidence interval for OR: from $\exp(0.052) \approx 1.053$ to $\exp(0.096) \approx 1.101$
10 R function for computing odds ratio with 95% confidence limits
    expcoef=function(glmobj)
    {
    # table of estimates (column 1) and standard errors (column 2)
    regtab=summary(glmobj)$coef
    expcoef=exp(regtab[,1])
    lower=expcoef*exp(-1.96*regtab[,2])
    upper=expcoef*exp(1.96*regtab[,2])
    cbind(expcoef,lower,upper)
    }
    expcoef(fit)
R output (edited):
    expcoef lower upper
    (Intercept)
    age
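As a cross-check of the function on slide 10, the same limits can be obtained from the built-in confint.default(), which computes Wald intervals using qnorm(0.975) rather than the rounded value 1.96. The sketch below uses simulated data, since the WCGS file is not reproduced here; the variable names and true coefficients are assumptions for the illustration.

```r
set.seed(3)
n <- 400
x <- rnorm(n, mean = 45, sd = 5)
y <- rbinom(n, 1, plogis(-5 + 0.08 * x))    # simulated binary outcomes
fit <- glm(y ~ x, family = binomial)

est <- coef(fit)
se <- sqrt(diag(vcov(fit)))

# Interval on the log-odds scale, transformed to the odds ratio scale
by.hand <- cbind(expcoef = exp(est),
                 lower = exp(est - 1.96 * se),
                 upper = exp(est + 1.96 * se))

# Built-in Wald interval, transformed in the same way
built.in <- exp(cbind(est, confint.default(fit)))

print(by.hand)
print(built.in)
```

The two tables should agree to about four significant digits; the small difference comes only from 1.96 versus qnorm(0.975).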
11 Wald test for $H_0: \beta_1 = 0$
To test the null hypothesis $H_0: \beta_1 = 0$ versus the two-sided alternative $H_A: \beta_1 \neq 0$ we often use the Wald test statistic:
$$z = \frac{\hat\beta_1}{se(\hat\beta_1)}$$
We reject $H_0$ for large values of $|z|$.
Under $H_0$ the test statistic is approximately standard normal.
P-value (two-sided): $P = 2\,P(Z > |z|)$ where $Z$ is standard normal.
In the CHD example the Wald test statistic is $z = \hat\beta_1 / se(\hat\beta_1) = 6.58$, which is highly significant (cf. slide 4).
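The Wald statistic and its two-sided P-value can be computed by hand from the coefficient table and compared with the columns that summary() reports. This is a sketch on simulated data; the true coefficients are assumptions for the illustration.

```r
set.seed(4)
n <- 300
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 0.6 * x))   # simulated binary outcomes
fit <- glm(y ~ x, family = binomial)

tab <- summary(fit)$coefficients
z <- tab[, "Estimate"] / tab[, "Std. Error"]   # Wald test statistic
p.value <- 2 * pnorm(-abs(z))                  # two-sided P-value, 2*P(Z > |z|)

print(cbind(z, p.value))
print(tab[, c("z value", "Pr(>|z|)")])
```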
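12 Multiple logistic regression
Assume now that we for each subject have a binary outcome $y$ and predictors $x_1, x_2, \ldots, x_p$.
We let
$$p(x_1, x_2, \ldots, x_p) = E(y \mid x_1, x_2, \ldots, x_p) = P(y = 1 \mid x_1, x_2, \ldots, x_p)$$
Logistic regression model:
$$p(x_1, x_2, \ldots, x_p) = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}$$
Alternatively the model may be written:
$$\log\left(\frac{p(x_1, x_2, \ldots, x_p)}{1 - p(x_1, x_2, \ldots, x_p)}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$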
13 The logistic model may also be given in terms of the odds:
$$\frac{p(x_1, x_2, \ldots, x_p)}{1 - p(x_1, x_2, \ldots, x_p)} = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)$$
If we consider two subjects with values $x_1 + \Delta$ and $x_1$ for the first covariate and the same values for all the others, their odds ratio becomes
$$\frac{p(x_1 + \Delta, x_2, \ldots, x_p) / [1 - p(x_1 + \Delta, x_2, \ldots, x_p)]}{p(x_1, x_2, \ldots, x_p) / [1 - p(x_1, x_2, \ldots, x_p)]} = \frac{\exp(\beta_0 + \beta_1 (x_1 + \Delta) + \beta_2 x_2 + \cdots + \beta_p x_p)}{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)} = \exp(\beta_1 \Delta)$$
In particular $e^{\beta_1}$ is the odds ratio corresponding to one unit's increase in the value of the first covariate holding all other covariates constant.
A similar interpretation holds for the other regression coefficients.
14 Wald tests and confidence intervals
$\hat\beta_j$ = MLE for $\beta_j$
$se(\hat\beta_j)$ = standard error for $\hat\beta_j$
To test the null hypothesis $H_{0j}: \beta_j = 0$ we use the Wald test statistic:
$$z = \frac{\hat\beta_j}{se(\hat\beta_j)}$$
which is approximately N(0,1)-distributed under $H_{0j}$.
95% confidence interval for $\beta_j$:
$$\hat\beta_j \pm 1.96\, se(\hat\beta_j)$$
$OR_j = \exp(\beta_j)$ is the odds ratio for one unit's increase in the value of the j-th covariate holding all other covariates constant.
We obtain a 95% confidence interval for $OR_j$ by transforming the lower and upper limits of the confidence interval for $\beta_j$.
15 Consider the WCGS study with CHD as outcome and age, cholesterol (mg/dl), systolic blood pressure (mmHg), body mass index (kg/m²), and smoking (yes, no) as predictors (as on page 152 in the text book we omit an individual with an unusually high cholesterol value)
R commands:
    wcgs.mult=glm(chd69~age+chol+sbp+bmi+smoke, data=wcgs, family=binomial, subset=(chol<600))
    summary(wcgs.mult)
R output (edited):
    Estimate Std. Error z value Pr(>|z|)
    (Intercept) <2e-16
    age e-08
    chol e-12
    sbp e-06
    bmi
    smoke e-06
16 Odds ratios with confidence intervals
R command (using the function from slide 10):
    expcoef(wcgs.mult)
R output (edited):
    expcoef lower upper
    (Intercept) 4.50e e e-05
    age
    chol
    sbp
    bmi
    smoke
17 For a numerical covariate it may be more meaningful to present an odds ratio corresponding to a larger increase than one unit (cf. slide 13)
This is easily achieved by refitting the model with a rescaled covariate.
If you (e.g.) want to study the effect of a ten-year increase in age, you fit the model with the covariate age_10=age/10
R commands:
    wcgs.resc=glm(chd69~age_10+chol_50+sbp_50+bmi_10+smoke, data=wcgs, family=binomial, subset=(chol<600))
    summary(wcgs.resc)
R output (edited):
    Estimate Std. Error z value Pr(>|z|)
    (Intercept) <2e-16
    age_ e-08
    chol_ e-12
    sbp_ e-06
    bmi_
    smoke e-06
Note that the values of the Wald test statistic are not changed (cf. slide 15)
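The effect of rescaling can be checked on simulated data. In this sketch the variable age and the true coefficients are assumptions for the illustration, not the WCGS data. The Wald statistic should be unchanged, and the odds ratio for age_10 should equal the one-year odds ratio raised to the tenth power.

```r
set.seed(5)
n <- 500
age <- round(runif(n, 39, 59))
y <- rbinom(n, 1, plogis(-5 + 0.07 * age))   # simulated binary outcomes
age_10 <- age / 10                           # rescaled covariate

fit1 <- glm(y ~ age, family = binomial)
fit10 <- glm(y ~ age_10, family = binomial)

z1 <- summary(fit1)$coefficients["age", "z value"]
z10 <- summary(fit10)$coefficients["age_10", "z value"]
or1 <- unname(exp(coef(fit1)["age"]))        # odds ratio per year
or10 <- unname(exp(coef(fit10)["age_10"]))   # odds ratio per ten years

print(c(z.age = z1, z.age_10 = z10))
print(c(or1.to.the.10th = or1^10, or10 = or10))
```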
18 Odds ratios with confidence intervals:
R command (using the function from slide 10):
    expcoef(wcgs.resc)
R output (edited):
    expcoef lower upper
    (Intercept)
    age_
    chol_
    sbp_
    bmi_
    smoke
19 An aim of the WCGS study was to study the effect on CHD of certain behavioral patterns, denoted A1, A2, B3 and B4
Behavioral pattern is a categorical covariate with four levels, and must be fitted as a factor in R
R commands:
    wcgs$behcat=factor(wcgs$behpat)
    wcgs.beh=glm(chd69~age_10+chol_50+sbp_50+bmi_10+smoke+behcat, data=wcgs, family=binomial, subset=(chol<600))
    summary(wcgs.beh)
R output (edited):
    Estimate Std. Error z value Pr(>|z|)
    (Intercept) <2e-16
    age_ e-07
    chol_ e-12
    sbp_ e-05
    bmi_
    smoke e-05
    behcat
    behcat
    behcat
20 Here we may be interested in:
- Testing if behavioral patterns have an effect on CHD risk
- Testing if it is sufficient to use two categories for behavioral pattern (A and B)
In general we consider a logistic regression model:
$$p(x_1, x_2, \ldots, x_p) = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}$$
Here we want to test the null hypothesis that q of the $\beta_j$'s are equal to zero, or equivalently that there are q linear restrictions among the $\beta_j$'s
Examples:
$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ (q = 4)
$H_0: \beta_1 = \beta_2$ and $\beta_3 = \beta_4$ (q = 2)
21 Deviance and sum of squares
For the linear regression model the sum of squares was a key quantity in connection with testing and for assessing the fit of a model.
We want to define a quantity for logistic regression that corresponds to the sum of squares.
To this end we start out by considering the relation between the log-likelihood and the sum of squares for the linear regression model.
For the linear regression model $l = \log L$ takes the form (cf. slide 6):
$$l = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu_i)^2$$
The log-likelihood obtains its largest value for the saturated model, i.e. the model where there are no restrictions on the $\mu_i$.
22 For the saturated model the $\mu_i$ are estimated by $\tilde\mu_i = y_i$, and the log-likelihood becomes
$$\tilde l = -\frac{n}{2}\log(2\pi\sigma^2)$$
For a given specification of the linear regression model the $\mu_i$ are estimated by the fitted values, i.e. $\hat\mu_i = \hat y_i$, with corresponding log-likelihood
$$\hat l = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \hat\mu_i)^2$$
The deviance for the model is defined as $D = 2(\tilde l - \hat l)$ and it becomes
$$D = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \hat\mu_i)^2$$
For the linear regression model the deviance is just the sum of squares for the fitted model divided by $\sigma^2$.
23 Deviance for binary data
We then consider logistic regression with data $(y_i, x_{1i}, x_{2i}, \ldots, x_{pi})$, $i = 1, 2, \ldots, n$, where $y_i$ is a binary response and the $x_{ji}$ are predictors.
We introduce $p_i = P(y_i = 1 \mid x_{1i}, x_{2i}, \ldots, x_{pi})$ and note that the log-likelihood $l = l(p_1, \ldots, p_n)$ is a function of $p_1, \ldots, p_n$ (cf. slide 8).
For the saturated model, i.e. the model where there are no restrictions on the $p_i$, the $p_i$ are estimated by $\tilde p_i = y_i$ and the log-likelihood takes the value $\tilde l = 0$.
24 For a fitted logistic regression model we obtain the estimated probabilities
$$\hat p_i = \hat p(x_{1i}, x_{2i}, \ldots, x_{pi}) = \frac{\exp(\hat\beta_0 + \hat\beta_1 x_{1i} + \cdots + \hat\beta_p x_{pi})}{1 + \exp(\hat\beta_0 + \hat\beta_1 x_{1i} + \cdots + \hat\beta_p x_{pi})}$$
and the corresponding value $\hat l = l(\hat p_1, \ldots, \hat p_n)$ of the log-likelihood.
The deviance for the model is defined as
$$D = 2(\tilde l - \hat l) = -2\hat l$$
The deviance itself is not of much use for binary data. But by comparing the deviances of two models, we may check if one gives a better fit than the other.
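For binary data the definition can be verified against what R reports as the residual deviance. The following is a sketch on simulated data (the true coefficients are assumptions for the illustration). It uses the fact that the saturated log-likelihood is 0, because with estimated probabilities equal to the observed 0/1 outcomes every term of the log-likelihood is log(1) = 0.

```r
set.seed(6)
n <- 300
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + 0.9 * x))    # simulated binary outcomes
fit <- glm(y ~ x, family = binomial)

p.hat <- fitted(fit)                         # estimated probabilities
l.hat <- sum(y * log(p.hat) + (1 - y) * log(1 - p.hat))   # fitted log-likelihood
l.sat <- 0                                   # saturated log-likelihood for binary data

D <- 2 * (l.sat - l.hat)
print(c(by.hand = D, glm = deviance(fit)))
```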
25 Consider the WCGS study with age, cholesterol, systolic blood pressure, body mass index, smoking and behavioral pattern as predictors (cf slide 19)
R output (edited):
    Estimate Std. Error z value Pr(>|z|)
    (Intercept) <2e-16
    age_ e-07
    chol_ e-12
    sbp_ e-05
    bmi_
    smoke e-05
    behcat
    behcat
    behcat
    Null deviance: on 3140 degrees of freedom
    Residual deviance: on 3132 degrees of freedom
The deviance of the fitted model is denoted "residual deviance" in the output.
The "null deviance" is the deviance for the model with no covariates, i.e. for the model where all the $p_i$ are assumed to be equal.
26 Deviance and likelihood ratio tests
We want to test the null hypothesis $H_0$ that q of the $\beta_j$'s are equal to zero, or equivalently that there are q linear restrictions among the $\beta_j$'s.
To test the null hypothesis, we use the test statistic
$$G = D_0 - D$$
where $D_0$ is the deviance under the null hypothesis and $D$ is the deviance for the fitted model (not assuming $H_0$).
We reject $H_0$ for large values of $G$.
To compute P-values, we use that the test statistic $G$ is chi-square distributed ($\chi^2$) with q degrees of freedom under $H_0$.
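27 We will show how we may rewrite $G$ in terms of the likelihood ratio
We have
$$D = 2(\tilde l - \hat l) \quad \text{and} \quad D_0 = 2(\tilde l - \hat l_0)$$
Here $\hat l = \log \hat L$ and $\hat l_0 = \log \hat L_0$, where
$$\hat L = \max_{\text{model}} L \quad \text{and} \quad \hat L_0 = \max_{H_0} L$$
Thus
$$G = D_0 - D = -2(\hat l_0 - \hat l) = -2\log(\hat L_0 / \hat L)$$
Thus large values of $G$ correspond to small values of the likelihood ratio $\hat L_0 / \hat L$, and the test based on $G$ is equivalent to the likelihood ratio test.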
28 For the model with age, cholesterol, systolic blood pressure, body mass index, smoking, and behavioral pattern as predictors (cf slide 25) the deviance becomes D =
For the model without behavioral pattern (cf slide 17) the deviance takes the value D0 =
The test statistic takes the value: $G = D_0 - D = 24.8$
R commands:
    anova(wcgs.resc, wcgs.beh, test="Chisq")
R output (edited):
    Analysis of Deviance Table
    Model 1: chd69 ~ age_10 + chol_50 + sbp_50 + bmi_10 + smoke
    Model 2: chd69 ~ age_10 + chol_50 + sbp_50 + bmi_10 + smoke + behcat
    Resid. Df Resid. Dev Df Deviance P(>|Chi|)
    e-05
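The test can also be carried out by hand from the two deviances. This is a sketch on simulated data, where the covariate x1, the four-level factor g and the true coefficients are assumptions for the illustration (they stand in for the WCGS covariates and behavioral pattern). It computes G = D0 - D and the chi-square P-value and compares them with the table from anova().

```r
set.seed(7)
n <- 600
x1 <- rnorm(n)
g <- factor(sample(c("a", "b", "c", "d"), n, replace = TRUE))   # four-level factor
y <- rbinom(n, 1, plogis(-1 + 0.5 * x1 + 0.4 * (g == "d")))

fit0 <- glm(y ~ x1, family = binomial)       # model under H0 (no factor)
fit1 <- glm(y ~ x1 + g, family = binomial)   # model with the factor

G <- deviance(fit0) - deviance(fit1)             # G = D0 - D
q.df <- df.residual(fit0) - df.residual(fit1)    # number of restrictions (3 here)
p.value <- pchisq(G, df = q.df, lower.tail = FALSE)
print(c(G = G, q = q.df, p.value = p.value))

tab <- anova(fit0, fit1, test = "Chisq")
print(tab)
```

The second row of the anova() table should show the same degrees of freedom, deviance difference and P-value as the hand computation.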
29 Model fit for linear regression (review)
1. Linearity
2. Constant variance
3. Independent responses
4. Normally distributed error terms and no outliers
Model fit for logistic regression
1. Linearity: Still relevant, see following slides.
2. Heteroscedastic model: $Var(y_i \mid x_i) = p_i(1 - p_i)$, i.e. the variance depends on $E(y_i \mid x_i) = p_i$. However, this non-constant variance is taken care of by the maximum likelihood estimation.
3. Independent responses: See Lecture 10 on Friday.
4. Not relevant: data are binary, so there are no outliers in the responses (but there could well be extreme covariates, influential observations).
30 Checking linearity for logistic regression
We want to check if the probabilities can be adequately described by the linear expression
$$\log\left(\frac{p(x_1, x_2, \ldots, x_p)}{1 - p(x_1, x_2, \ldots, x_p)}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$
We will discuss 3 approaches:
1. Grouping the covariates
2. Adding square terms or logarithmic terms to the model
3. Extending the model to generalized additive models (GAM)
$$\log\left(\frac{p(x_1, x_2, \ldots, x_p)}{1 - p(x_1, x_2, \ldots, x_p)}\right) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_p(x_p)$$
31 1. Grouping the variables
For a simple illustration we consider the situation where age is the only covariate in the model for CHD, and we want to check if the effect of age is linear (on the log-odds scale).
The procedure will be similar if there are other covariates in addition to age.
We may here fit a model considering the age group as a factor (age groups: 35-40, 41-45, 46-50, 51-55, 56-60).
Or we may fit a model where the mean age in each age group is used as numerical covariate (means: 39.5, 42.9, 47.9, 52.8, 57.3).
We may then use a deviance test to check if the flexible categorical model gives a better fit than the linear numerical model. Here we find no improvement.
32 R commands:
    fit.catage=glm(chd69~factor(agec), data=wcgs, family=binomial)
    wcgs$agem=39.5*(wcgs$agec==0)+42.9*(wcgs$agec==1)+47.9*(wcgs$agec==2)+
        52.8*(wcgs$agec==3)+57.3*(wcgs$agec==4)
    fit.linage=glm(chd69~agem, data=wcgs, family=binomial)
    summary(fit.catage)
    summary(fit.linage)
    anova(fit.linage, fit.catage, test="Chisq")
R output (edited):
    Estimate Std. Error z value Pr(>|z|)
    (Intercept) <2e-16
    factor(agec)
    factor(agec)
    factor(agec)
    factor(agec) e-05

    Estimate Std. Error z value Pr(>|z|)
    (Intercept) <2e-16
    agem e-10

    Model 1: chd69 ~ agem
    Model 2: chd69 ~ factor(agec)
    Resid. Df Resid. Dev Df Deviance P(>|Chi|)
33 2. Adding square terms or log-terms
The simple model
$$\log\left[\frac{p(x)}{1 - p(x)}\right] = \beta_0 + \beta_1 x$$
can be extended to more flexible models such as
$$\log\left[\frac{p(x)}{1 - p(x)}\right] = \beta_0 + \beta_1 x + \beta_2 x^2$$
or
$$\log\left[\frac{p(x)}{1 - p(x)}\right] = \beta_0 + \beta_1 x + \beta_2 \log(x)$$
We may then use a deviance test to check if the flexible models give a better fit than the original. Here we do not find any improvement either (p = 0.79 for the model with a square term).
34 R commands:
    fit=glm(chd69~age, data=wcgs, family=binomial)
    fita2=glm(chd69~age+I(age^2), data=wcgs, family=binomial)
    fitlog=glm(chd69~age+log(age), data=wcgs, family=binomial)
    anova(fit, fita2, test="Chisq")
    anova(fit, fitlog, test="Chisq")
R output (edited):
    > anova(fit, fita2, test="Chisq")
    Model 1: chd69 ~ age
    Model 2: chd69 ~ age + I(age^2)
    Resid. Df Resid. Dev Df Deviance Pr(>Chi)

    > anova(fit, fitlog, test="Chisq")
    Model 1: chd69 ~ age
    Model 2: chd69 ~ age + log(age)
    Resid. Df Resid. Dev Df Deviance Pr(>Chi)
35 3. Generalized additive model
In this example just with one covariate:
$$\log\left(\frac{p(x_1)}{1 - p(x_1)}\right) = \beta_0 + f_1(x_1)$$
where $f_1(x_1)$ is a smooth function estimated by the program.
The approach can easily be extended to several covariates.
We can then
(a) Plot the estimated function with confidence intervals. Will a straight line fit within the confidence limits?
(b) Compare the simple and flexible model by a deviance test.
36 R output (edited):
    > library(gam)
    > fitgam=gam(chd69~s(age), data=wcgs, family=binomial)
    > plot(fitgam, se=T)
    > anova(fit, fitgam, test="Chisq")
    Analysis of Deviance Table
    Model 1: chd69 ~ age
    Model 2: chd69 ~ s(age)
    Resid. Df Resid. Dev Df Deviance Pr(>Chi)
    *
For these data
(a) The informal graphical check just allows a straight line within the confidence limits.
(b) However, the deviance test gives a weakly significant deviation from linearity (p = 0.032).
There may thus be some unimportant deviation from linearity.
37 Deviance and grouped data
In Lecture 6 we saw that we got the same estimates and standard errors when we fitted the model with mean age in each age group as numerical covariate using binary data and grouped data.
R commands:
    summary(fit.linage)
    chd.grouped=read.table("http:// ", header=T)
    fit.grouped=glm(cbind(chd,no-chd)~agem, data=chd.grouped, family=binomial)
    summary(fit.grouped)
R output (edited):
    Estimate Std. Error z value Pr(>|z|)
    (Intercept) <2e-16
    agem e-10
    Null deviance: on 3153 degrees of freedom
    Residual deviance: on 3152 degrees of freedom

    Estimate Std. Error z value Pr(>|z|)
    (Intercept) <2e-16
    agem e-10
    Null deviance: on 4 degrees of freedom
    Residual deviance: on 3 degrees of freedom
38 We see that the "residual deviance" and the "null deviance" are not the same when we use binary data and when we use grouped data.
However, the difference between the two is the same in both cases.
As long as we look at differences between deviances, it does not matter whether we used binary or grouped data.
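This invariance can be reproduced on simulated data. In the sketch below the covariate takes the five group means from slide 31, but the outcomes are simulated and the true coefficients are assumptions for the illustration. The same model is fitted to one-row-per-subject data and to data aggregated by covariate value.

```r
set.seed(8)
n <- 1000
agem <- sample(c(39.5, 42.9, 47.9, 52.8, 57.3), n, replace = TRUE)
chd <- rbinom(n, 1, plogis(-6 + 0.08 * agem))   # simulated binary outcomes

# Binary data: one row per subject
fit.bin <- glm(chd ~ agem, family = binomial)

# Grouped data: number of cases (chd) and group size (no) per covariate value
grouped <- data.frame(agem = sort(unique(agem)),
                      chd = as.vector(tapply(chd, agem, sum)),
                      no = as.vector(tapply(chd, agem, length)))
fit.grp <- glm(cbind(chd, no - chd) ~ agem, data = grouped, family = binomial)

# Null deviance minus residual deviance for each data format
diff.bin <- fit.bin$null.deviance - deviance(fit.bin)
diff.grp <- fit.grp$null.deviance - deviance(fit.grp)

print(c(binary = deviance(fit.bin), grouped = deviance(fit.grp)))
print(c(binary = diff.bin, grouped = diff.grp))
```

The residual deviances themselves should differ between the two fits, while the difference between null and residual deviance and the estimated coefficients should agree.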
More information9 Generalized Linear Models
9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models
More informationOct Simple linear regression. Minimum mean square error prediction. Univariate. regression. Calculating intercept and slope
Oct 2017 1 / 28 Minimum MSE Y is the response variable, X the predictor variable, E(X) = E(Y) = 0. BLUP of Y minimizes average discrepancy var (Y ux) = C YY 2u C XY + u 2 C XX This is minimized when u
More informationECON 4130 Supplementary Exercises 1-4
HG Set. 0 ECON 430 Sulementary Exercises - 4 Exercise Quantiles (ercentiles). Let X be a continuous random variable (rv.) with df f( x ) and cdf F( x ). For 0< < we define -th quantile (or 00-th ercentile),
More informationEcon 3790: Business and Economics Statistics. Instructor: Yogesh Uppal
Econ 379: Business and Economics Statistics Instructor: Yogesh Ual Email: yual@ysu.edu Chater 9, Part A: Hyothesis Tests Develoing Null and Alternative Hyotheses Tye I and Tye II Errors Poulation Mean:
More informationMonte Carlo Studies. Monte Carlo Studies. Sampling Distribution
Monte Carlo Studies Do not let yourself be intimidated by the material in this lecture This lecture involves more theory but is meant to imrove your understanding of: Samling distributions and tests of
More informationSTA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3
STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae
More informationSTA 250: Statistics. Notes 7. Bayesian Approach to Statistics. Book chapters: 7.2
STA 25: Statistics Notes 7. Bayesian Aroach to Statistics Book chaters: 7.2 1 From calibrating a rocedure to quantifying uncertainty We saw that the central idea of classical testing is to rovide a rigorous
More informationAI*IA 2003 Fusion of Multiple Pattern Classifiers PART III
AI*IA 23 Fusion of Multile Pattern Classifiers PART III AI*IA 23 Tutorial on Fusion of Multile Pattern Classifiers by F. Roli 49 Methods for fusing multile classifiers Methods for fusing multile classifiers
More informationEstimating Time-Series Models
Estimating ime-series Models he Box-Jenkins methodology for tting a model to a scalar time series fx t g consists of ve stes:. Decide on the order of di erencing d that is needed to roduce a stationary
More informationEstimation of the large covariance matrix with two-step monotone missing data
Estimation of the large covariance matrix with two-ste monotone missing data Masashi Hyodo, Nobumichi Shutoh 2, Takashi Seo, and Tatjana Pavlenko 3 Deartment of Mathematical Information Science, Tokyo
More informationBasic Medical Statistics Course
Basic Medical Statistics Course S7 Logistic Regression November 2015 Wilma Heemsbergen w.heemsbergen@nki.nl Logistic Regression The concept of a relationship between the distribution of a dependent variable
More information" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2
Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the
More informationSurvival Analysis Math 434 Fall 2011
Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup
More information1/24/2008. Review of Statistical Inference. C.1 A Sample of Data. C.2 An Econometric Model. C.4 Estimating the Population Variance and Other Moments
/4/008 Review of Statistical Inference Prepared by Vera Tabakova, East Carolina University C. A Sample of Data C. An Econometric Model C.3 Estimating the Mean of a Population C.4 Estimating the Population
More informationHypothesis Test-Confidence Interval connection
Hyothesis Test-Confidence Interval connection Hyothesis tests for mean Tell whether observed data are consistent with μ = μ. More secifically An hyothesis test with significance level α will reject the
More informationA Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression
Journal of Modern Alied Statistical Methods Volume Issue Article 7 --03 A Comarison between Biased and Unbiased Estimators in Ordinary Least Squares Regression Ghadban Khalaf King Khalid University, Saudi
More informationConvolutional Codes. Lecture 13. Figure 93: Encoder for rate 1/2 constraint length 3 convolutional code.
Convolutional Codes Goals Lecture Be able to encode using a convolutional code Be able to decode a convolutional code received over a binary symmetric channel or an additive white Gaussian channel Convolutional
More informationSTA6938-Logistic Regression Model
Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of
More informationDownloaded from jhs.mazums.ac.ir at 9: on Monday September 17th 2018 [ DOI: /acadpub.jhs ]
Iranian journal of health sciences 013; 1(): 56-60 htt://jhs.mazums.ac.ir Original Article Comaring Two Formulas of Samle Size Determination for Prevalence Studies Hamed Tabesh 1 *Azadeh Saki Fatemeh Pourmotahari
More informationStatistics in medicine
Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu
More informationF O R SOCI AL WORK RESE ARCH
7 TH EUROPE AN CONFERENCE F O R SOCI AL WORK RESE ARCH C h a l l e n g e s i n s o c i a l w o r k r e s e a r c h c o n f l i c t s, b a r r i e r s a n d p o s s i b i l i t i e s i n r e l a t i o n
More informationBMI 541/699 Lecture 22
BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based
More informationJohn Weatherwax. Analysis of Parallel Depth First Search Algorithms
Sulementary Discussions and Solutions to Selected Problems in: Introduction to Parallel Comuting by Viin Kumar, Ananth Grama, Anshul Guta, & George Karyis John Weatherwax Chater 8 Analysis of Parallel
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More informationBIVARIATE DATA data for two variables
(Chapter 3) BIVARIATE DATA data for two variables INVESTIGATING RELATIONSHIPS We have compared the distributions of the same variable for several groups, using double boxplots and back-to-back stemplots.
More informationST505/S697R: Fall Homework 2 Solution.
ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)
More informationA Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46
A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response
More informationLecture 14: Introduction to Poisson Regression
Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why
More informationModelling counts. Lecture 14: Introduction to Poisson Regression. Overview
Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week
More informationLogistic Regressions. Stat 430
Logistic Regressions Stat 430 Final Project Final Project is, again, team based You will decide on a project - only constraint is: you are supposed to use techniques for a solution that are related to
More informationEstimation and Detection
Estimation and Detection Lecture : Detection Theory Unknown Parameters Dr. ir. Richard C. Hendriks //05 Previous Lecture H 0 : T (x) < H : T (x) > Using detection theory, rules can be derived on how to
More informationBusiness Statistics. Lecture 10: Correlation and Linear Regression
Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form
More informationExam Applied Statistical Regression. Good Luck!
Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.
More information7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis
Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression
More informationLecture 15. Hypothesis testing in the linear model
14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma
More informationSAS for Bayesian Mediation Analysis
Paer 1569-2014 SAS for Bayesian Mediation Analysis Miočević Milica, Arizona State University; David P. MacKinnon, Arizona State University ABSTRACT Recent statistical mediation analysis research focuses
More informationMachine Learning: Homework 4
10-601 Machine Learning: Homework 4 Due 5.m. Monday, February 16, 2015 Instructions Late homework olicy: Homework is worth full credit if submitted before the due date, half credit during the next 48 hours,
More informationLower Confidence Bound for Process-Yield Index S pk with Autocorrelated Process Data
Quality Technology & Quantitative Management Vol. 1, No.,. 51-65, 15 QTQM IAQM 15 Lower onfidence Bound for Process-Yield Index with Autocorrelated Process Data Fu-Kwun Wang * and Yeneneh Tamirat Deartment
More informationLogistic Regression. Fitting the Logistic Regression Model BAL040-A.A.-10-MAJ
Logistic Regression The goal of a logistic regression analysis is to find the best fitting and most parsimonious, yet biologically reasonable, model to describe the relationship between an outcome (dependent
More informationdn i where we have used the Gibbs equation for the Gibbs energy and the definition of chemical potential
Chem 467 Sulement to Lectures 33 Phase Equilibrium Chemical Potential Revisited We introduced the chemical otential as the conjugate variable to amount. Briefly reviewing, the total Gibbs energy of a system
More informationGeneral Linear Model (Chapter 4)
General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients
More informationBeyond GLM and likelihood
Stat 6620: Applied Linear Models Department of Statistics Western Michigan University Statistics curriculum Core knowledge (modeling and estimation) Math stat 1 (probability, distributions, convergence
More informationOn split sample and randomized confidence intervals for binomial proportions
On slit samle and randomized confidence intervals for binomial roortions Måns Thulin Deartment of Mathematics, Usala University arxiv:1402.6536v1 [stat.me] 26 Feb 2014 Abstract Slit samle methods have
More informationLogistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression
Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024
More informationInteractions in Logistic Regression
Interactions in Logistic Regression > # UCBAdmissions is a 3-D table: Gender by Dept by Admit > # Same data in another format: > # One col for Yes counts, another for No counts. > Berkeley = read.table("http://www.utstat.toronto.edu/~brunner/312f12/
More informationCategorical data analysis Chapter 5
Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases
More informationGeneralized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model
Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example
More information