STK4900/9900 - Lecture 7

Program

1. Logistic regression with one predictor
2. Maximum likelihood estimation
3. Logistic regression with several predictors
4. Deviance and likelihood ratio tests
5. A comment on model fit

Sections 5.1, 5.2 (except 5.2.6), and 5.6
Supplementary material on likelihood and deviance

We have data $(x_1, y_1), \ldots, (x_n, y_n)$. Here $y_i$ is a binary outcome (0 or 1) for subject $i$, and $x_i$ is a predictor for the subject.

We let
$$p(x) = E(y \mid x) = P(y = 1 \mid x)$$

The logistic regression model takes the form:
$$p(x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}$$

This gives an "S-shaped" relation between $p(x)$ and $x$ and ensures that $p(x)$ stays between 0 and 1.

The logistic model may alternatively be given in terms of the odds:
$$\frac{p(x)}{1 - p(x)} = \exp(\beta_0 + \beta_1 x) \qquad (*)$$

If we consider two subjects with covariate values $x + \Delta$ and $x$, respectively, their odds ratio becomes
$$\frac{p(x+\Delta)\,/\,[1 - p(x+\Delta)]}{p(x)\,/\,[1 - p(x)]} = \frac{\exp(\beta_0 + \beta_1 (x + \Delta))}{\exp(\beta_0 + \beta_1 x)} = \exp(\beta_1 \Delta)$$

In particular $e^{\beta_1}$ is the odds ratio corresponding to one unit's increase in the value of the covariate.

By (*) the logistic regression model may also be given as:
$$\log\left[\frac{p(x)}{1 - p(x)}\right] = \beta_0 + \beta_1 x \qquad (**)$$

Thus the logistic regression model is linear in the log-odds.

Consider the WCGS study with CHD as outcome and age as predictor (individual age, not grouped age as we considered in Lecture 6).

    # Read the WCGS data (tab-separated, "." codes missing values)
    wcgs <- read.table("http://www.uio.no/studier/emner/matnat/math/stk4900/v13/wcgs.txt",
                       sep = "\t", header = TRUE, na.strings = ".")
    # Logistic regression of CHD on individual age
    fit <- glm(chd69 ~ age, data = wcgs, family = binomial)
    summary(fit)

R output (edited):

                 Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)   -5.9395      0.5493  -10.813   < 2e-16
    age            0.0744      0.0113    6.585  4.56e-11

The odds ratio for a one-year increase in age is $e^{0.0744} = 1.077$, while the odds ratio for a ten-year increase is $e^{0.744} = 2.10$. (The numbers deviate slightly from those on slide 25 from Lecture 6, since there we used mean age for each age group while here we use the individual ages.)

How is the estimation performed for the logistic regression model?
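The odds ratio identity for an increase of $\Delta$ in the covariate can be checked numerically. The following is a minimal sketch, assuming the rounded estimates -5.9395 and 0.0744 from the edited output; the age x = 50 and the helper names `p` and `odds` are chosen here only for illustration.

```r
# Sketch: logistic response function and the odds ratio exp(beta1 * delta).
# b0 and b1 are the rounded estimates from the edited R output.
b0 <- -5.9395
b1 <- 0.0744

p    <- function(x) exp(b0 + b1 * x) / (1 + exp(b0 + b1 * x))  # P(y = 1 | x)
odds <- function(x) p(x) / (1 - p(x))

x     <- 50   # arbitrary age, used only for illustration
delta <- 10   # ten-year increase

or_direct  <- odds(x + delta) / odds(x)  # ratio of the two odds
or_formula <- exp(b1 * delta)            # closed form exp(beta1 * delta)

print(c(or_direct = or_direct, or_formula = or_formula))
```

The two numbers agree for any choice of x, which is the point of the derivation: the odds ratio depends on the size of the increase but not on the starting value.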

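Maximum likelihood estimation

Estimation in the logistic model is performed using maximum likelihood estimation.

We first describe maximum likelihood estimation for the linear regression model. For ease of presentation, we assume that $\sigma^2$ is known.

The density of $y_i$ takes the form (cf. Lecture 1):
$$f(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right\}$$

The likelihood is the simultaneous density
$$L = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right\}$$
considered as a function of the parameters $\beta_0$ and $\beta_1$ for the observed values of the $y_i$.

We estimate the parameters by maximizing the likelihood. This corresponds to finding the parameters that make the observed $y_i$ as likely as possible.

Maximizing the likelihood $L$ is the same as maximizing
$$l = \log L = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
which is the same as minimizing
$$\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

For the linear regression model, maximum likelihood estimation coincides with least squares estimation.

We then consider the situation for logistic regression. We have data $(x_1, y_1), \ldots, (x_n, y_n)$, where $y_i$ is a binary outcome (0 or 1) for subject $i$ and $x_i$ is a predictor.

Here we have
$$P(y_i = 1 \mid x_i) = p_i \quad \text{and} \quad P(y_i = 0 \mid x_i) = 1 - p_i$$
where
$$p_i = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}$$

Thus the distribution of $y_i$ may be written as
$$P(y_i \mid x_i) = p_i^{\,y_i} (1 - p_i)^{1 - y_i}$$

The likelihood becomes
$$L = \prod_{i=1}^{n} P(y_i \mid x_i) = \prod_{i=1}^{n} p_i^{\,y_i} (1 - p_i)^{1 - y_i}$$

Since
$$p_i = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}$$
the likelihood is, for given observations, a function of the unknown parameters $\beta_0$ and $\beta_1$.

We estimate $\beta_0$ and $\beta_1$ by the values of these parameters that maximize the likelihood. These estimates are called the maximum likelihood estimates (MLE) and are denoted $\hat\beta_0$ and $\hat\beta_1$.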
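To illustrate that the estimates are simply the parameter values maximizing the likelihood, the log-likelihood can be maximized with a general-purpose optimizer and compared with `glm`. This is only a sketch on simulated data (not the WCGS data); the true values -1 and 0.8, the sample size and the function name `negloglik` are assumptions made for the illustration, and `glm` itself uses iteratively reweighted least squares rather than `optim`.

```r
# Sketch: maximum likelihood for logistic regression by direct numerical
# maximization of the log-likelihood, on simulated data.
set.seed(1)
n <- 500
x <- rnorm(n)                                      # hypothetical predictor
y <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * x))

# Negative log-likelihood. With eta = b0 + b1*x we have
# sum{ y*log(p) + (1-y)*log(1-p) } = sum{ y*eta - log(1 + exp(eta)) }.
negloglik <- function(beta) {
  eta <- beta[1] + beta[2] * x
  -sum(y * eta - log(1 + exp(eta)))
}

opt <- optim(par = c(0, 0), fn = negloglik, method = "BFGS")
fit <- glm(y ~ x, family = binomial)

print(rbind(optim = opt$par, glm = coef(fit)))
```

The two rows should agree up to the numerical tolerance of the optimizer.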

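Confidence interval for $\beta_1$ and odds ratio

95% confidence interval for $\beta_1$ (based on the normal approximation):
$$\hat\beta_1 \pm 1.96 \cdot se(\hat\beta_1)$$

$OR = \exp(\beta_1)$ is the odds ratio for one unit's increase in $x$.

We obtain a 95% confidence interval for OR by transforming the lower and upper limits of the confidence interval for $\beta_1$.

In the CHD example we have $\hat\beta_1 = 0.0744$ and $se(\hat\beta_1) = 0.0113$.

95% confidence interval for $\beta_1$: $0.0744 \pm 1.96 \cdot 0.0113$, i.e. from 0.052 to 0.096.

Estimate of odds ratio: $OR = \exp(0.0744) = 1.077$

95% confidence interval for OR: from $\exp(0.052) = 1.053$ to $\exp(0.096) = 1.101$

R function for computing odds ratio with 95% confidence limits:

    # Odds ratios with 95% confidence limits from a fitted glm object
    expcoef <- function(glmobj)
    {
      regtab <- summary(glmobj)$coef                # estimates and standard errors
      expcoef <- exp(regtab[, 1])                   # odds ratios
      lower <- expcoef * exp(-1.96 * regtab[, 2])   # lower 95% limits
      upper <- expcoef * exp(1.96 * regtab[, 2])    # upper 95% limits
      cbind(expcoef, lower, upper)
    }

    expcoef(fit)

R output (edited):

                 expcoef   lower   upper
    (Intercept)   0.0026  0.0009  0.0077
    age           1.077   1.054   1.101

Wald test for $H_0: \beta_1 = 0$

To test the null hypothesis $H_0: \beta_1 = 0$ versus the two-sided alternative $H_A: \beta_1 \neq 0$ we use the Wald test statistic:
$$z = \frac{\hat\beta_1}{se(\hat\beta_1)}$$

We reject $H_0$ for large values of $|z|$.

Under $H_0$ the test statistic is approximately standard normal.

P-value (two-sided): $P = 2\,P(Z > |z|)$ where $Z$ is standard normal.

In the CHD example we have $\hat\beta_1 = 0.0744$ and $se(\hat\beta_1) = 0.0113$.

Wald test statistic: $z = 0.0744 / 0.0113 = 6.58$, which is highly significant (cf. slide 4).

Multiple logistic regression

Assume now that we for each subject have a binary outcome $y$ and predictors $x_1, x_2, \ldots, x_p$.

We let
$$p(x_1, x_2, \ldots, x_p) = E(y \mid x_1, x_2, \ldots, x_p) = P(y = 1 \mid x_1, x_2, \ldots, x_p)$$

Logistic regression model:
$$p(x_1, x_2, \ldots, x_p) = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_p x_p)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_p x_p)}$$

Alternatively the model may be written:
$$\log\left(\frac{p(x_1, x_2, \ldots, x_p)}{1 - p(x_1, x_2, \ldots, x_p)}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_p x_p$$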

The logistic model may also be given in terms of the odds:
$$\frac{p(x_1, x_2, \ldots, x_p)}{1 - p(x_1, x_2, \ldots, x_p)} = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_p x_p)$$

If we consider two subjects with values $x_1 + \Delta$ and $x_1$ for the first covariate and the same values for all the others, their odds ratio becomes
$$\frac{p(x_1+\Delta, x_2, \ldots, x_p)\,/\,[1 - p(x_1+\Delta, x_2, \ldots, x_p)]}{p(x_1, x_2, \ldots, x_p)\,/\,[1 - p(x_1, x_2, \ldots, x_p)]} = \frac{\exp(\beta_0 + \beta_1 (x_1+\Delta) + \beta_2 x_2 + \ldots + \beta_p x_p)}{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_p x_p)} = \exp(\beta_1 \Delta)$$

In particular $e^{\beta_1}$ is the odds ratio corresponding to one unit's increase in the value of the first covariate holding all other covariates constant.

A similar interpretation holds for the other regression coefficients.

Wald tests and confidence intervals

$\hat\beta_j$ = MLE for $\beta_j$
$se(\hat\beta_j)$ = standard error for $\hat\beta_j$

To test the null hypothesis $H_{0j}: \beta_j = 0$ we use the Wald test statistic:
$$z = \frac{\hat\beta_j}{se(\hat\beta_j)}$$
which is approximately N(0,1)-distributed under $H_{0j}$.

95% confidence interval for $\beta_j$:
$$\hat\beta_j \pm 1.96 \cdot se(\hat\beta_j)$$

$OR_j = \exp(\beta_j)$ is the odds ratio for one unit's increase in the value of the $j$-th covariate holding all other covariates constant.

We obtain a 95% confidence interval for $OR_j$ by transforming the lower and upper limits of the confidence interval for $\beta_j$.

Consider the WCGS study with CHD as outcome and age, cholesterol (mg/dL), systolic blood pressure (mmHg), body mass index (kg/m$^2$), and smoking (yes, no) as predictors (as on page 168 in the text book we omit an individual with an unusually high cholesterol value).

    # Multiple logistic regression, omitting the individual with extreme cholesterol
    wcgs.mult <- glm(chd69 ~ age + chol + sbp + bmi + smoke, data = wcgs,
                     family = binomial, subset = (chol < 600))
    summary(wcgs.mult)

R output (edited):

                 Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)  -12.3110      0.9773  -12.598   < 2e-16
    age            0.0644      0.0119    5.412  6.22e-08
    chol           0.0107      0.0015    7.079  1.45e-12
    sbp            0.0193      0.0041    4.716  2.40e-06
    bmi            0.0574      0.0264    2.179    0.0293
    smoke          0.6345      0.1402    4.526  6.01e-06

Odds ratios with confidence intervals

R command (using the function from slide 10):

    expcoef(wcgs.mult)

R output (edited):

                  expcoef     lower     upper
    (Intercept)  4.50e-06  6.63e-07  3.06e-05
    age          1.067     1.042     1.092
    chol         1.011     1.008     1.014
    sbp          1.019     1.011     1.028
    bmi          1.059     1.006     1.115
    smoke        1.886     1.433     2.482
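The Wald statistic, its two-sided P-value and the transformed confidence limits can be reproduced from a single estimate and its standard error. The sketch below uses the rounded age row (0.0644 and 0.0119), so the results match the edited output only up to rounding; the variable names are chosen for illustration.

```r
# Sketch: Wald test and 95% confidence limits from an estimate and its
# standard error, using the rounded age row of the multiple regression output.
est <- 0.0644
se  <- 0.0119

z  <- est / se                                  # Wald test statistic
pv <- 2 * pnorm(abs(z), lower.tail = FALSE)     # two-sided P-value
ci <- est + c(-1, 1) * 1.96 * se                # 95% interval for beta_j
or <- exp(c(OR = est, lower = ci[1], upper = ci[2]))  # transform to odds ratio scale

print(round(z, 2))
print(signif(pv, 2))
print(round(or, 3))
```

Because the exponential function is increasing, transforming the two limits of the interval for $\beta_j$ gives a valid interval for $OR_j$.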

For a numerical covariate it may be more meaningful to present an odds ratio corresponding to a larger increase than one unit (cf. slide 3).

This is easily achieved by refitting the model with a rescaled covariate.

If you (e.g.) want to study the effect of a ten-year increase in age, you fit the model with the covariate age_10 = age/10.

    # Same model with rescaled covariates (age_10 = age/10, and so on)
    wcgs.resc <- glm(chd69 ~ age_10 + chol_50 + sbp_50 + bmi_10 + smoke, data = wcgs,
                     family = binomial, subset = (chol < 600))
    summary(wcgs.resc)

R output (edited):

                 Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)    -3.006       0.116  -12.598   < 2e-16
    age_10          0.644       0.119    5.412  6.22e-08
    chol_50         0.537       0.076    7.079  1.45e-12
    sbp_50          0.965       0.205    4.716  2.40e-06
    bmi_10          0.574       0.264    2.179    0.0293
    smoke           0.634       0.140    4.526  6.01e-06

Note that the values of the Wald test statistic are not changed (cf. slide 15).

Odds ratios with confidence intervals

R command (using the function from slide 10):

    expcoef(wcgs.resc)

R output (edited):

                 expcoef   lower   upper
    (Intercept)   0.0494  0.0394  0.0620
    age_10        1.9050  1.5085  2.4057
    chol_50       1.7110  1.4746  1.9853
    sbp_50        2.6240  1.7573  3.9181
    bmi_10        1.7760  1.0595  2.9770
    smoke         1.8860  1.4329  2.4824

An aim of the WCGS study was to study the effect on CHD of certain behavioral patterns, denoted A1, A2, B3 and B4.

Behavioral pattern is a categorical covariate with four levels, and must be fitted as a factor in R.

    # Behavioral pattern as a factor with four levels
    wcgs$behcat <- factor(wcgs$behpat)
    wcgs.beh <- glm(chd69 ~ age_10 + chol_50 + sbp_50 + bmi_10 + smoke + behcat, data = wcgs,
                    family = binomial, subset = (chol < 600))
    summary(wcgs.beh)

R output (edited):

                 Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)   -2.7527      0.2259   -12.19   < 2e-16
    age_10         0.6064      0.1199    5.057  4.25e-07
    chol_50        0.5330      0.0764    6.980  2.96e-12
    sbp_50         0.9016      0.2065    4.367  1.26e-05
    bmi_10         0.5536      0.2656    2.084    0.0372
    smoke          0.6047      0.1411    4.285  1.82e-05
    behcat2        0.0660      0.2212    0.298    0.7654
    behcat3       -0.6652      0.2423   -2.746    0.0060
    behcat4       -0.5585      0.3192   -1.750    0.0802

Here we may be interested in:

Testing if behavioral patterns have an effect on CHD risk
Testing if it is sufficient to use two categories for behavioral pattern (A and B)

In general we consider a logistic regression model:
$$p(x_1, x_2, \ldots, x_p) = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_p x_p)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_p x_p)}$$

Here we want to test the null hypothesis that $q$ of the $\beta_j$'s are equal to zero, or equivalently that there are $q$ linear restrictions among the $\beta_j$'s.

Examples:
$$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0 \qquad (q = 4)$$
$$H_0: \beta_1 = \beta_2 \ \text{and} \ \beta_3 = \beta_4 \qquad (q = 2)$$
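The remark that rescaling a covariate leaves the Wald statistics unchanged can be checked on simulated data. The sketch below is an illustration only: the data are generated here, not taken from WCGS, and the true coefficients -5 and 0.08 are arbitrary assumptions.

```r
# Sketch: dividing a covariate by 10 multiplies its coefficient and standard
# error by 10, so the Wald statistic z is unchanged. Simulated data.
set.seed(2)
n   <- 400
age <- round(runif(n, 39, 59))                  # hypothetical ages
y   <- rbinom(n, 1, plogis(-5 + 0.08 * age))
age_10 <- age / 10

fit1  <- glm(y ~ age,    family = binomial)
fit10 <- glm(y ~ age_10, family = binomial)

s1  <- summary(fit1)$coefficients["age", ]
s10 <- summary(fit10)$coefficients["age_10", ]
print(rbind(age = s1, age_10 = s10))
```

The odds ratio $\exp(\hat\beta)$ from the second fit then refers to a ten-year increase, which is often easier to communicate than a one-year odds ratio close to 1.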

Deviance and sum of squares

For the linear regression model the sum of squares was a key quantity in connection with testing and for assessing the fit of a model.

We want to define a quantity for logistic regression that corresponds to the sum of squares.

To this end we start out by considering the relation between the log-likelihood and the sum of squares for the linear regression model.

For the linear regression model $l = \log L$ takes the form (cf. slide 6):
$$l = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - \mu_i)^2$$
where $\mu_i$ is the mean of $y_i$.

The log-likelihood obtains its largest value for the saturated model, i.e. the model where there are no restrictions on the $\mu_i$.

For the saturated model the $\mu_i$ are estimated by $\tilde\mu_i = y_i$, and the log-likelihood becomes
$$\tilde{l} = -\frac{n}{2}\log(2\pi\sigma^2)$$

For a given specification of the linear regression model the $\mu_i$ are estimated by the fitted values, i.e. $\hat\mu_i = \hat{y}_i$, with corresponding log-likelihood
$$\hat{l} = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - \hat\mu_i)^2$$

The deviance for the model is defined as $D = 2(\tilde{l} - \hat{l})$ and it becomes
$$D = \frac{1}{\sigma^2}\sum_{i=1}^{n} (y_i - \hat\mu_i)^2$$

For the linear regression model the deviance is just the sum of squares for the fitted model divided by $\sigma^2$.

Deviance for binary data

We then consider logistic regression with data
$$(y_i, x_{1i}, x_{2i}, \ldots, x_{pi}), \qquad i = 1, 2, \ldots, n$$
where $y_i$ is a binary response and the $x_{ji}$ are predictors.

We introduce $p_i = P(y_i = 1 \mid x_{1i}, x_{2i}, \ldots, x_{pi})$ and note that the log-likelihood $l = l(p_1, \ldots, p_n)$ is a function of $p_1, \ldots, p_n$ (cf. slide 8).

For the saturated model, i.e. the model where there are no restrictions on the $p_i$, the $p_i$ are estimated by $\tilde{p}_i = y_i$ and the log-likelihood takes the value $\tilde{l} = l(\tilde{p}_1, \ldots, \tilde{p}_n)$.

For a fitted logistic regression model we obtain the estimated probabilities
$$\hat{p}_i = \hat{p}(x_{1i}, x_{2i}, \ldots, x_{pi}) = \frac{\exp(\hat\beta_0 + \hat\beta_1 x_{1i} + \ldots + \hat\beta_p x_{pi})}{1 + \exp(\hat\beta_0 + \hat\beta_1 x_{1i} + \ldots + \hat\beta_p x_{pi})}$$
and the corresponding value $\hat{l} = l(\hat{p}_1, \ldots, \hat{p}_n)$ of the log-likelihood.

The deviance for the model is defined as $D = 2(\tilde{l} - \hat{l})$.

The deviance itself is not of much use for binary data. But by comparing the deviances of two models, we may check if one gives a better fit than the other.
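For binary data the saturated log-likelihood equals 0 (each factor of the likelihood becomes $1^1 \cdot 0^0 = 1$), so the deviance is just $-2\hat{l}$. A minimal sketch on simulated data, with arbitrary true coefficients chosen for the illustration, compares this hand computation with the value reported by `glm`:

```r
# Sketch: deviance for binary data computed from the definition D = 2*(l_sat - l_hat)
# and compared with deviance(fit). Simulated data.
set.seed(3)
n <- 300
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 0.5 * x))
fit <- glm(y ~ x, family = binomial)

p_hat <- fitted(fit)                                      # estimated probabilities
l_hat <- sum(y * log(p_hat) + (1 - y) * log(1 - p_hat))   # fitted log-likelihood
l_sat <- 0    # saturated model: p_i estimated by y_i, every term is log(1) = 0

D <- 2 * (l_sat - l_hat)
print(c(by_hand = D, glm = deviance(fit)))
```

This also shows why the deviance by itself is hard to interpret for binary data: the saturated model contributes nothing, so only differences between models are informative.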

Consider the WCGS study with age, cholesterol, systolic blood pressure, body mass index, smoking and behavioral pattern as predictors (cf. slide 19).

R output (edited):

                 Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)   -2.7527      0.2259   -12.19   < 2e-16
    age_10         0.6064      0.1199    5.057  4.25e-07
    chol_50        0.5330      0.0764    6.980  2.96e-12
    sbp_50         0.9016      0.2065    4.367  1.26e-05
    bmi_10         0.5536      0.2656    2.084    0.0372
    smoke          0.6047      0.1411    4.285  1.82e-05
    behcat2        0.0660      0.2212    0.298    0.7654
    behcat3       -0.6652      0.2423   -2.746    0.0060
    behcat4       -0.5585      0.3192   -1.750    0.0802

    Null deviance:     1774.2 on 3140 degrees of freedom
    Residual deviance: 1589.6 on 3132 degrees of freedom

The deviance of the fitted model is denoted "residual deviance" in the output.

The "null deviance" is the deviance for the model with no covariates, i.e. for the model where all the $p_i$ are assumed to be equal.

Deviance and likelihood ratio tests

We want to test the null hypothesis $H_0$ that $q$ of the $\beta_j$'s are equal to zero, or equivalently that there are $q$ linear restrictions among the $\beta_j$'s.

To test the null hypothesis, we use the test statistic
$$G = D_0 - D$$
where $D_0$ is the deviance under the null hypothesis and $D$ is the deviance for the fitted model (not assuming $H_0$).

We reject $H_0$ for large values of $G$.

To compute P-values, we use that the test statistic $G$ is approximately chi-square distributed with $q$ degrees of freedom under $H_0$.

We will show how we may rewrite $G$ in terms of the likelihood ratio.

We have
$$D_0 = 2(\tilde{l} - \hat{l}_0) \quad \text{and} \quad D = 2(\tilde{l} - \hat{l})$$

Here $\hat{l}_0 = \log \hat{L}_0$ and $\hat{l} = \log \hat{L}$, where
$$\hat{L}_0 = \max_{H_0} L \quad \text{and} \quad \hat{L} = \max_{\text{model}} L$$

Thus
$$G = D_0 - D = 2(\tilde{l} - \hat{l}_0) - 2(\tilde{l} - \hat{l}) = 2(\hat{l} - \hat{l}_0) = -2\log\left(\frac{\hat{L}_0}{\hat{L}}\right)$$

Thus large values of $G$ correspond to small values of the likelihood ratio $\hat{L}_0 / \hat{L}$, and the test based on $G$ is equivalent to the likelihood ratio test.

For the model with age, cholesterol, systolic blood pressure, body mass index, smoking, and behavioral pattern as predictors (cf. slide 25) the deviance becomes $D = 1589.6$.

For the model without behavioral pattern (cf. slide 17) the deviance takes the value $D_0 = 1614.4$.

The test statistic takes the value:
$$G = D_0 - D = 1614.4 - 1589.6 = 24.8$$

    anova(wcgs.resc, wcgs.beh, test = "Chisq")

R output (edited):

    Analysis of Deviance Table
    Model 1: chd69 ~ age_10 + chol_50 + sbp_50 + bmi_10 + smoke
    Model 2: chd69 ~ age_10 + chol_50 + sbp_50 + bmi_10 + smoke + behcat
      Resid. Df  Resid. Dev  Df  Deviance  P(>|Chi|)
    1      3135      1614.4
    2      3132      1589.6   3    24.765  1.729e-05
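The P-value of the likelihood ratio test can also be computed by hand from the two deviances. The following sketch uses the rounded deviances 1614.4 and 1589.6, so the result differs slightly from the `anova` output, which works with unrounded values; the variable names are chosen for illustration.

```r
# Sketch: likelihood ratio test from two deviances and the chi-square
# distribution with q degrees of freedom.
D0 <- 1614.4   # deviance under H0 (model without behavioral pattern)
D  <- 1589.6   # deviance for the model with behavioral pattern
q  <- 3        # behcat adds three parameters

G <- D0 - D
p_value <- pchisq(G, df = q, lower.tail = FALSE)   # P(chi-square_q > G)

print(c(G = G, p_value = p_value))
```

Using `lower.tail = FALSE` rather than `1 - pchisq(...)` avoids loss of precision when the P-value is very small.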

Model fit for linear regression (review)

1. Linearity
2. Constant variance
3. Independent responses
4. Normally distributed error terms and no outliers

Model fit for logistic regression

1. Linearity: Still relevant, see following slides.
2. Heteroscedastic model, $Var(y \mid x) = p(x)(1 - p(x))$, i.e. depends on $E(y \mid x) = p(x)$. However this non-constant variance is taken care of by the maximum likelihood estimation.
3. Independent responses: See lecture on Friday.
4. Not relevant, data are binary, no outliers in responses (but there could well be extreme covariates, influential observations).

Checking linearity for logistic regression

We want to check if the probabilities can be adequately described by the linear expression
$$\log\left(\frac{p(x_1, x_2, \ldots, x_p)}{1 - p(x_1, x_2, \ldots, x_p)}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_p x_p$$

We will discuss 3 approaches:

1. Grouping the covariates
2. Adding square terms or logarithmic terms to the model
3. Extending the model to generalized additive models (GAM)
$$\log\left(\frac{p(x_1, x_2, \ldots, x_p)}{1 - p(x_1, x_2, \ldots, x_p)}\right) = \beta_0 + f_1(x_1) + f_2(x_2) + \ldots + f_p(x_p)$$

1. Grouping the variables

For a simple illustration we consider the situation where age is the only covariate in the model for CHD, and we want to check if the effect of age is linear (on the log-odds scale).

The procedure will be similar if there are other covariates in addition to age.

We may here fit a model considering the age group as a factor (age groups: 35-40, 41-45, 46-50, 51-55, 56-60).

Or we may fit a model where the mean age in each age group is used as numerical covariate (means: 39.5, 42.9, 47.9, 52.8, 57.3).

We may then use a deviance test to check if the flexible categorical model gives a better fit than the linear numerical one. Here we find no improvement, p = 0.269.

    # Age group as a factor
    fit.catage <- glm(chd69 ~ factor(agec), data = wcgs, family = binomial)
    # Mean age in each age group as a numerical covariate
    wcgs$agem <- 39.5*(wcgs$agec == 0) + 42.9*(wcgs$agec == 1) + 47.9*(wcgs$agec == 2) +
                 52.8*(wcgs$agec == 3) + 57.3*(wcgs$agec == 4)
    fit.linage <- glm(chd69 ~ agem, data = wcgs, family = binomial)
    summary(fit.catage)
    summary(fit.linage)
    # Deviance test of the linear model against the categorical model
    anova(fit.linage, fit.catage, test = "Chisq")

R output (edited):

                   Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)     -2.8043      0.1850  -15.162   < 2e-16
    factor(agec)1   -0.1315      0.2310   -0.569     0.569
    factor(agec)2    0.5307      0.2235    2.374     0.018
    factor(agec)3    0.8410      0.2275    3.697    0.0002
    factor(agec)4    1.0600      0.2585    4.100  4.13e-05

                 Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)   -5.9466      0.5616  -10.588   < 2e-16
    agem           0.0747      0.0116    6.445  1.15e-10

    Model 1: chd69 ~ agem
    Model 2: chd69 ~ factor(agec)
      Resid. Df  Resid. Dev  Df  Deviance  P(>|Chi|)
    1      3152      1740.2
    2      3149      1736.3   3     3.928      0.269

2. Adding square terms or log-terms

The simple model
$$\log\left[\frac{p(x)}{1 - p(x)}\right] = \beta_0 + \beta_1 x$$
can be extended to more flexible models such as
$$\log\left[\frac{p(x)}{1 - p(x)}\right] = \beta_0 + \beta_1 x + \beta_2 x^2$$
or
$$\log\left[\frac{p(x)}{1 - p(x)}\right] = \beta_0 + \beta_1 x + \beta_2 \log(x)$$

We may then use a deviance test to check if the flexible models give a better fit than the original. Here we find no improvement in either case, p = 0.79 and p = 0.88.

    fit <- glm(chd69 ~ age, data = wcgs, family = binomial)
    # Add a square term and a log-term, respectively
    fita2 <- glm(chd69 ~ age + I(age^2), data = wcgs, family = binomial)
    fitlog <- glm(chd69 ~ age + log(age), data = wcgs, family = binomial)
    anova(fit, fita2, test = "Chisq")
    anova(fit, fitlog, test = "Chisq")

R output (edited):

    > anova(fit, fita2, test = "Chisq")
    Model 1: chd69 ~ age
    Model 2: chd69 ~ age + I(age^2)
      Resid. Df  Resid. Dev  Df  Deviance  Pr(>Chi)
    1      3152      1738.4
    2      3151      1738.3   1  0.069473     0.792

    > anova(fit, fitlog, test = "Chisq")
    Model 1: chd69 ~ age
    Model 2: chd69 ~ age + log(age)
      Resid. Df  Resid. Dev  Df  Deviance  Pr(>Chi)
    1      3152      1738.4
    2      3151      1738.3   1  0.019058    0.8902

However, in these data, age, age^2 and log(age) are strongly correlated.

2b. Adding a less correlated term

R output (edited):

    > fitlogb <- glm(chd69 ~ age + log(age - 38.9), data = wcgs, family = binomial)
    > anova(fit, fitlogb, test = "Chisq")
    Model 1: chd69 ~ age
    Model 2: chd69 ~ age + log(age - 38.9)
      Resid. Df  Resid. Dev  Df  Deviance  Pr(>Chi)
    1      3152      1738.4
    2      3151      1735.8   1    2.5793    0.1083
    >
    > cor(wcgs$age, log(wcgs$age - 38.9))
    [1] 0.852647
    > cor(wcgs$age, log(wcgs$age))
    [1] 0.998432

Still no significant improvement (p = 0.11), but the new covariate is still quite correlated with the original covariate.

3. Generalized additive model

In this example, with just one covariate:
$$\log\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + f(x)$$
where $f(x)$ is a smooth function estimated by the program.

The approach can easily be extended to several covariates.

We can then
(a) Plot the estimated function with confidence intervals. Will a straight line fit within the confidence limits?
(b) Compare the simple and flexible model by a deviance test.
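The deviance test for an added square term, and the strong correlation between age and age^2 noted for approach 2, can both be tried out on simulated data. This is a sketch under stated assumptions: the ages are drawn uniformly between 39 and 59, and the response is generated from a model that is truly linear on the log-odds scale, so the test should usually not reject.

```r
# Sketch: deviance test for a square term on simulated data where the
# log-odds are truly linear in age.
set.seed(5)
n   <- 600
age <- runif(n, 39, 59)                          # hypothetical ages
y   <- rbinom(n, 1, plogis(-5.9 + 0.074 * age))

fit  <- glm(y ~ age,            family = binomial)
fit2 <- glm(y ~ age + I(age^2), family = binomial)

tab <- anova(fit, fit2, test = "Chisq")          # deviance (likelihood ratio) test
G   <- deviance(fit) - deviance(fit2)            # same statistic computed by hand

print(tab)
print(cor(age, age^2))                           # close to 1 over this age range
```

The near-perfect correlation means the square term carries little information beyond the linear term over such a narrow age range, which is the motivation for trying a less correlated term.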

R output (edited):

    > library(gam)
    > fitgam <- gam(chd69 ~ s(age), data = wcgs, family = binomial)
    > plot(fitgam, se = TRUE)
    > anova(fit, fitgam, test = "Chisq")
    Analysis of Deviance Table
    Model 1: chd69 ~ age
    Model 2: chd69 ~ s(age)
      Resid. Df  Resid. Dev  Df  Deviance  Pr(>Chi)
    1      3152      1738.4
    2      3149      1729.6   3    8.7622   0.03263 *

For these data
(a) The informal graphical check just allows a straight line within the confidence limits.
(b) However, the deviance test gives a weakly significant deviation from linearity (p = 0.032).

There may thus be some unimportant deviation from linearity.

Deviance and grouped data

On slides 26-28 in Lecture 6 we saw that we got the same estimates and standard errors when we fitted the model with mean age in each age group as numerical covariate using binary data and grouped data.

    summary(fit.linage)
    # Grouped data: one row per age group with number of CHD cases and group size
    chd.grouped <- read.table("http://www.uio.no/studier/emner/matnat/math/stk4900/v13/chd_grouped.txt",
                              header = TRUE)
    fit.grouped <- glm(cbind(chd, no - chd) ~ agem, data = chd.grouped, family = binomial)
    summary(fit.grouped)

R output (edited), binary data:

                 Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)   -5.9466      0.5616  -10.588   < 2e-16
    agem           0.0747      0.0116    6.445  1.15e-10

    Null deviance:     1781.2 on 3153 degrees of freedom
    Residual deviance: 1740.2 on 3152 degrees of freedom

R output (edited), grouped data:

                 Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)   -5.9466      0.5616  -10.588   < 2e-16
    agem           0.0747      0.0116    6.445  1.15e-10

    Null deviance:     44.95 on 4 degrees of freedom
    Residual deviance: 3.928 on 3 degrees of freedom

We see that the "residual deviance" and the "null deviance" are not the same when we use binary data and when we use grouped data.

However, the difference between the two is the same in both cases.

As long as we look at differences between deviances, it does not matter whether we used binary or grouped data.

Further note that the residual deviance for the model with grouped data is the same as we got on slide 31 when comparing the models with age as a numerical and categorical covariate (based on binary data).

When we use grouped data, the residual deviance can be used as a goodness-of-fit test.
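The relation between binary and grouped fits can be reproduced without the course data files. The sketch below simulates subjects in five age groups; the group means are those given on the slides, while the group sizes and the true coefficients are arbitrary assumptions made for the illustration.

```r
# Sketch: the same logistic model fitted to binary and to grouped data.
# Estimates agree, the deviances differ, but deviance differences agree.
set.seed(4)
agem <- c(39.5, 42.9, 47.9, 52.8, 57.3)       # group mean ages as on the slides
size <- c(200, 300, 250, 150, 100)             # hypothetical group sizes
grp  <- rep(seq_along(agem), times = size)     # group index for each subject
x    <- agem[grp]
y    <- rbinom(length(x), 1, plogis(-6 + 0.075 * x))

# Binary data: one row per subject
fit.bin <- glm(y ~ x, family = binomial)

# Grouped data: one row per age group (number of cases and group size)
grouped <- data.frame(agem = agem,
                      chd  = as.vector(tapply(y, grp, sum)),
                      no   = size)
fit.grp <- glm(cbind(chd, no - chd) ~ agem, data = grouped, family = binomial)

drop.bin <- fit.bin$null.deviance - deviance(fit.bin)
drop.grp <- fit.grp$null.deviance - deviance(fit.grp)

print(rbind(binary = coef(fit.bin), grouped = coef(fit.grp)))
print(c(drop.bin = drop.bin, drop.grp = drop.grp))
```

The two deviances differ because the saturated models differ: one parameter per subject for binary data, one per age group for grouped data. That is also why only the grouped residual deviance is useful as a goodness-of-fit measure.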