Is the cholesterol concentration in blood related to the body mass index (bmi)?



1 Regression problems
The fundamental (statistical) problems of regression are to decide whether an explanatory variable affects the response variable and to estimate the magnitude of the effect.
Major question: Can we justify a causal interpretation of the relation from a regression analysis?
Slide 1/9 Niels Richard Hansen Regression May 4, 2010

2 Cholesterol example
Is the cholesterol concentration in blood related to the body mass index (bmi)? Is it causally related: if I lose weight, will the cholesterol concentration go down?
Data from Exercise 6.4 are considered. The data are observational in nature: a random sample where we just observe the response as well as the explanatory variables.

3 Observational study
The gold standard is a designed experiment: we control all potential explanatory variables and choose their values by a design. If we cannot control everything, we employ randomization: the individuals are randomly assigned a treatment. The purpose is to make the errors independent of (or uncorrelated with) the treatment, so that the treatment effect is estimated faithfully.
An observational study always has a treatment variable whose (causal) effect on the response is in question. However, we cannot control how individuals are treated; this is given by the observations.
To draw causal conclusions and estimate causal effects from observational data is in principle impossible. Doing so anyway will necessarily involve a number of assumptions and a careful inclusion of relevant additional explanatory variables.
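The difference between an observational study and a randomized experiment can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not in the slides: a hypothetical true treatment effect of 1, a single unobserved confounder that also drives the response, and the difference in group means as the estimator.

```python
import random

def difference_in_means(n, randomized, seed=0):
    """Simulate Y = effect*T + 2*C + noise and return mean(Y|T=1) - mean(Y|T=0)."""
    rng = random.Random(seed)
    true_effect = 1.0  # hypothetical causal effect of the treatment
    treated, control = [], []
    for _ in range(n):
        c = rng.gauss(0.0, 1.0)  # unobserved confounder
        if randomized:
            t = rng.random() < 0.5  # assignment by coin flip
        else:
            t = c + rng.gauss(0.0, 1.0) > 0.0  # assignment depends on the confounder
        y = true_effect * t + 2.0 * c + rng.gauss(0.0, 1.0)
        (treated if t else control).append(y)
    return sum(treated) / len(treated) - sum(control) / len(control)

observational = difference_in_means(20000, randomized=False)
experimental = difference_in_means(20000, randomized=True)
print(f"observational estimate: {observational:.2f}")
print(f"randomized estimate:    {experimental:.2f}")
```

Under these assumptions the randomized estimate should land close to the true effect, while the observational estimate is inflated because treated individuals tend to have larger values of the confounder.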

4 Causality
Ultimately, all scientific questions are causal in nature, whereas probability models and statistics are descriptive in nature.
Recent examples from the media: Will it affect my salary if I have a child? Does it matter if I am a woman or a man?

5 Expansion of the log-likelihood function
The approximation
$$l(\beta) \approx l(\beta_0) + U(\beta_0)(\beta - \beta_0) - \tfrac{1}{2}(\beta - \beta_0)^T \mathcal{J}(\beta_0)(\beta - \beta_0)$$
is fundamental. For $\beta_0 = \hat\beta$ we have $U(\hat\beta) = 0$ and get
$$l(\beta) \approx l(\hat\beta) - \tfrac{1}{2}(\beta - \hat\beta)^T \mathcal{J}(\hat\beta)(\beta - \hat\beta).$$
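The quality of the quadratic approximation around the MLE can be checked numerically. The sketch below assumes a one-parameter Poisson model with mean exp(beta) and uses made-up counts; neither the model nor the data are taken from the slides.

```python
import math

y = [2, 4, 3, 5, 1, 3, 4, 2]  # made-up counts, for illustration only
n, total = len(y), sum(y)

def loglik(beta):
    # Poisson log-likelihood with mean exp(beta), dropping the constant -sum(log(y_i!))
    return total * beta - n * math.exp(beta)

def score(beta):
    # U(beta): first derivative of the log-likelihood
    return total - n * math.exp(beta)

def information(beta):
    # J(beta): minus the second derivative of the log-likelihood
    return n * math.exp(beta)

beta_hat = math.log(total / n)  # solves score(beta) = 0

def quadratic(beta):
    # Second-order expansion around beta_hat, where the score vanishes
    return loglik(beta_hat) - 0.5 * information(beta_hat) * (beta - beta_hat) ** 2

for delta in (0.05, 0.1, 0.3):
    b = beta_hat + delta
    print(f"delta={delta}: exact={loglik(b):.4f}  approx={quadratic(b):.4f}")
```

The two columns should agree closely for small deviations from the MLE and drift apart as the deviation grows, since the neglected term is of third order.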

6 Expansion of the score function
Similarly,
$$U(\beta)^T \approx U(\beta_0)^T - \mathcal{J}(\beta_0)(\beta - \beta_0),$$
which for $\beta_0 = \hat\beta$ gives
$$U(\beta)^T \approx -\mathcal{J}(\hat\beta)(\beta - \hat\beta).$$
The approximate distribution of $\hat\beta$ is based on
$$\hat\beta - \beta \approx \mathcal{J}(\hat\beta)^{-1} U(\beta)^T.$$
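The same linearization of the score also suggests a way to compute the MLE: evaluating the information at the current value instead of at the MLE and iterating gives a Newton-type update. This is a standard technique and not something stated on the slides; the one-parameter Poisson model with mean exp(beta * x_i) and the data below are assumptions made for illustration.

```python
import math

# Made-up data for a one-parameter Poisson model with mean exp(beta * x_i)
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [1, 2, 2, 4, 7, 11]

def score(beta):
    # U(beta) = sum_i x_i * (y_i - exp(beta * x_i))
    return sum(xi * (yi - math.exp(beta * xi)) for xi, yi in zip(x, y))

def information(beta):
    # J(beta) = sum_i x_i^2 * exp(beta * x_i)
    return sum(xi * xi * math.exp(beta * xi) for xi in x)

beta = 0.0
for iteration in range(50):
    step = score(beta) / information(beta)  # J(beta)^{-1} U(beta)
    beta += step
    if abs(step) < 1e-12:
        break

print(f"beta_hat = {beta:.6f}, score = {score(beta):.2e}")
```

At convergence the score is numerically zero, which is the defining property of the MLE used in the expansions above.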

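7 Asymptotic distribution
For $n \to \infty$ we need that (i) there is a solution to the score equation $U(\beta) = 0$ with probability tending to 1, (ii) $U(\beta)^T \overset{\text{asymp}}{\sim} N(0, \mathcal{J}(\beta))$ when the true parameter is $\beta$, and (iii) $\mathcal{J}(\hat\beta)\mathcal{J}(\beta)^{-1} \overset{P}{\to} I$ when the true parameter is $\beta$.
Under these assumptions
$$\hat\beta \overset{\text{asymp}}{\sim} N(\beta, \mathcal{J}(\beta)^{-1})$$
and
$$-2(l(\beta) - l(\hat\beta)) \approx (\beta - \hat\beta)^T \mathcal{J}(\hat\beta)(\beta - \hat\beta) \approx (\beta - \hat\beta)^T \mathcal{J}(\beta)(\beta - \hat\beta) \overset{\text{asymp}}{\sim} \chi^2(p).$$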
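The chi-square approximation can be checked by simulation. The sketch below assumes i.i.d. exponential data with rate beta (so p = 1), a model chosen only because the MLE is available in closed form; the sample size, the number of replications and the seed are arbitrary.

```python
import math
import random

def lr_statistic(sample, beta0):
    """2 * (l(beta_hat) - l(beta0)) for i.i.d. exponential data with rate beta."""
    n, total = len(sample), sum(sample)
    beta_hat = n / total  # MLE of the rate

    def loglik(beta):
        return n * math.log(beta) - beta * total

    return 2.0 * (loglik(beta_hat) - loglik(beta0))

rng = random.Random(1)
beta_true, n, replications = 2.0, 200, 2000
stats = [lr_statistic([rng.expovariate(beta_true) for _ in range(n)], beta_true)
         for _ in range(replications)]

critical_value = 3.841  # approximate 95% quantile of chi-square with 1 degree of freedom
rejection_rate = sum(s > critical_value for s in stats) / replications
mean_stat = sum(stats) / replications
print(f"mean of statistic: {mean_stat:.3f}")
print(f"rejection rate at the 5% level: {rejection_rate:.3f}")
```

If the approximation is adequate at this sample size, the mean of the statistic should be near 1 (the mean of a chi-square distribution with 1 degree of freedom) and the rejection rate near 0.05.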

8 Deviance
The saturated model is a maximal model given by either
$$a(y_i) = \mu(\hat\eta_i)$$
or, if we have replicated observations $y_{ij}$, $j = 1, \ldots, n_i$,
$$\frac{1}{n_i}\sum_{j=1}^{n_i} a(y_{ij}) = \mu(\hat\eta_i),$$
where all explanatory variables are the same for $j = 1, \ldots, n_i$. Denote by $l_{\max}(y)$ the log-likelihood at the MLE for the saturated model.
Definition: The deviance is
$$D = 2(l_{\max}(y) - l(\hat\beta)),$$
where $\hat\beta$ is the MLE for a generalized linear model.
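As a concrete instance of the definition, the sketch below computes the deviance of an intercept-only Poisson model, for which the saturated model simply sets each mean equal to its observation. The counts are made up, and the closed-form expression used for comparison is the standard Poisson deviance formula rather than something given on the slides.

```python
import math

y = [0, 2, 1, 4, 3, 6, 2]  # made-up counts

def poisson_loglik(y, mu):
    # Full Poisson log-likelihood; a term with mu_i = 0 and y_i = 0 contributes 0
    total = 0.0
    for yi, mi in zip(y, mu):
        if mi > 0:
            total += yi * math.log(mi) - mi - math.lgamma(yi + 1)
    return total

mu_hat = [sum(y) / len(y)] * len(y)   # fitted means, intercept-only model
l_max = poisson_loglik(y, y)          # saturated model: mu_i = y_i
l_fit = poisson_loglik(y, mu_hat)
deviance = 2.0 * (l_max - l_fit)

# The same quantity from the closed-form Poisson deviance
closed_form = 2.0 * sum(
    (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
    for yi, mi in zip(y, mu_hat)
)
print(f"deviance = {deviance:.4f}, closed form = {closed_form:.4f}")
```

Both routes should give the same number, since the log-factorial terms cancel in the difference of the two log-likelihoods.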

9 Test theory
Consider a null hypothesis $H_0: \beta \in M_0$, where we use $M_0$ to denote Model 0 as a submodel of a general model $M_1$ in the sense of a $q$-dimensional subspace of the $p$-dimensional parameter space. Let $\hat\beta_0$ denote the MLE under $M_0$ and $\hat\beta_1$ the MLE under $M_1$. Then
$$D = D_0 - D_1 = 2(l_{\max}(y) - l(\hat\beta_0; y)) - 2(l_{\max}(y) - l(\hat\beta_1; y)) = 2(l(\hat\beta_1; y) - l(\hat\beta_0; y)).$$
Under $H_0$,
$$\hat\beta_0 \approx P_{M_0}\hat\beta_1 \overset{\text{asymp}}{\sim} N(\beta, P_{M_0}\mathcal{J}(\beta)^{-1}P_{M_0})$$
and
$$D \approx (\hat\beta_0 - \hat\beta_1)^T \mathcal{J}(\hat\beta_1)(\hat\beta_0 - \hat\beta_1) \overset{\text{asymp}}{\sim} \chi^2(p - q).$$
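A small worked example of such a test: the sketch below compares a Poisson model with one common mean (M0, q = 1) against a model with one mean per group (M1, p = 2), so that D is referred to a chi-square distribution with p - q = 1 degree of freedom. The two groups of counts are made up for illustration.

```python
import math

# Made-up counts in two groups
group_a = [3, 5, 2, 4, 6, 3]
group_b = [7, 9, 6, 8, 5, 10]

def poisson_loglik(y, mu):
    # Log-likelihood of i.i.d. Poisson counts with common mean mu > 0
    return sum(yi * math.log(mu) - mu - math.lgamma(yi + 1) for yi in y)

pooled = group_a + group_b
# M0: one common mean (q = 1)
l0 = poisson_loglik(pooled, sum(pooled) / len(pooled))
# M1: one mean per group (p = 2)
l1 = (poisson_loglik(group_a, sum(group_a) / len(group_a))
      + poisson_loglik(group_b, sum(group_b) / len(group_b)))

D = max(0.0, 2.0 * (l1 - l0))  # guard against tiny negative rounding error
# For 1 degree of freedom, P(chi2 > D) = erfc(sqrt(D / 2))
p_value = math.erfc(math.sqrt(D / 2.0))
print(f"D = {D:.3f} on p - q = 1 degree of freedom, p-value = {p_value:.4f}")
```

The saturated log-likelihood never needs to be computed here, because it cancels in the difference of the two deviances.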


Count models
A classical, theoretical argument for the Poisson distribution is the approximation $\mathrm{Binom}(n, p) \approx \mathrm{Pois}(\lambda)$ for large $n$ and small $p$, with $\lambda = np$. This can be extended considerably: if $n$ is large, the $Z_i$ are weakly dependent 0-1-variables and $p_i = P(Z_i = 1)$ is small, then
$$\sum_{i=1}^n Z_i \overset{\text{approx}}{\sim} \mathrm{Pois}\Big(\sum_{i=1}^n p_i\Big).$$
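The binomial case of this approximation can be inspected numerically. The sketch below computes the total variation distance between Binom(n, p) and Pois(np) for a fixed lambda = 3 and growing n; the parameter values are arbitrary. For independent 0-1-variables a classical bound (Le Cam's inequality, not mentioned on the slides) says that this distance is at most the sum of the squared p_i, which here equals n * p^2.

```python
import math

def binom_pmf(k, n, p):
    # Computed on the log scale to avoid overflow in the binomial coefficient
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def pois_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def total_variation(n, p):
    lam = n * p
    diff = sum(abs(binom_pmf(k, n, p) - pois_pmf(k, lam)) for k in range(n + 1))
    # Poisson mass above n, where the binomial has no mass
    poisson_tail = max(0.0, 1.0 - sum(pois_pmf(k, lam) for k in range(n + 1)))
    return 0.5 * (diff + poisson_tail)

for n, p in [(10, 0.3), (100, 0.03), (1000, 0.003)]:
    print(f"n={n:5d}, p={p}: total variation distance = {total_variation(n, p):.5f}")
```

The distance should shrink as n grows and p decreases with np held fixed, in line with the statement that the approximation holds for large n and small p.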