Logistisk regression T.K.

Size: px

Start display at page:

Download "Logistisk regression T.K."

Gertrude Kelley
5 years ago
Views:

1 Föreläsning 13: Logistisk regression T.K

2 Your Learning Outcomes Odds, Odds Ratio, Logit function, Logistic function Logistic regression definition likelihood function: maximum likelihood estimate

3 The setting: Y is a binary r.v.. x is its covariant, predictor or explanatory variable. x can be continuous, categorical. We cannot model the association of Y to x by a direct linear regression, Y = α + βx + e where e is, e.g., N (0, σ 2 ).

4 A Case Y = Bacterial Meningitis or Acute Viral Meningitis. x =cerebrospinal fluid total protein count.

5 A Case Diaz, Armando A and Tomba, Emanuele and Lennarson, Reese and Richard, Rex and Bagajewicz, Miguel J and Harrison, Roger G: Prediction of protein solubility in Escherichia coli using logistic regression, Biotechnology and bioengineering, 105, 2 pp , The authors develop a model for the prediction of the solubility of proteins overexpressed in the bacterium Escherichia coli. The model uses the statistical technique of logistic regression. To build this model, 32 covariates x i that could potentially correlate well with solubility were used. Logistic regression provides the probability p of a certain protein to belong (= Y ) to one set or another.

6 Part I: Concepts Odds, Odds Ratio Logit function Logistic function

7 Odds The odds of a statement (event e.t.c) A is calculated as the probability p(a) of observing A divided by the probability of not observing A: odds of A = p(a) 1 p(a) E.g., in humans an average of 51 boys are born in every 100 births, the odds of a randomly chosen delivery being boy are: odds of a boy = = 1.04 The odds of a certain thing happening are infinite.

8 An Aside: Posterior Odds The posterior probability p(h E ) of H given the evidence E: posterior odds of H = p(h E ) 1 p(h E ) If there are only two alternative hypotheses, H and not-h, then posterior odds of H = p(h E ) p(not-h E ) by Bayes formula = p(e H) p(h) p(e not-h) p(not-h). = likelihood ratio prior odds

9 Odds ratio The odds ratio ψ is the ratio of odds from two different conditions or populations. ψ = odds of A 1 odds of A 2 = p(a 1 ) 1 p(a 1 ) p(a 2 ) 1 p(a 2 )

10 The function logit(p) The logarithmic odds of success is called the logit of p ( ) p logit(p) = ln 1 p

11 The function logit(p) and its inverse ( ) p logit(p) = log 1 p If θ = logit(p), then the inverse function is p = logit 1 (θ) = eθ 1 + e θ

12 The logit(p) and its inverse The function ( ) p logit(p) = log, 0 < p < 1 1 p p = logit 1 (θ) = σ(θ) = eθ 1 + e θ = e θ 1, < θ <, 1 + e θ is called the logistic function or sigmoid. Note that σ(0) = 1 2.

13 Logistic function In biology the logistic function refers to change in size of a species population.

14 The logit(p) and the logistic function Sats The logit function ( ) p θ = logit(p) = log, 0 < p < 1 1 p and the logistic function p = σ(θ) = are inverse functions to each other. 1, < θ <, 1 + e θ

15 The logit(p) and the logistic function ( ) p 1 θ = logit(p) = log, 0 < p < 1 p = σ(θ) =, < θ <, 1 p 1 + e θ

16 The logistic distribution We see that 0 σ(x) 1. σ(x) 1, as x +. σ(x) 0, as x. σ(x) is strictly increasing. Hence σ(x) is the distribution function of a random variable.

17 The logistic distribution If for every x P(ɛ x) = σ(x) = 1, < x < 1 + e x then we say that ɛ has the logistic distribution ɛ logistic(0, 1)

18 The logistic distribution: an exercise Let P U(0, 1) and ɛ = ln P 1 P. Then ɛ logistic(0, 1)

19 The logistic distribution: another exercise ɛ logistic(0, 1) Then P( ɛ x) = P(ɛ x).

20 Simulating ɛ Logistic(0, 1) Simulate p 1,..., p n from the uniform distribution on (0, 1) and then do ɛ i = logit(p i ), i = 1,..., n. In the figure we plot the empirical distribution function of ɛ i for n =

21 Part II: Logistic Regression A regression function log odds of Y a regression function How to generate Y

22 Let β = (β 0, β 1, β 2,..., β p ) be (p + 1) 1 vector and X = (1, X 1, X 2,..., X p ) be a (p + 1) 1 -vector of (predictor) variables. We set, as in multiple regression, β T X = β 0 + β 1 X 1 + β 2 X β p X p Then G (X) = σ(β T X) = T X e = e β βt X 1 + e βt X

23 Logistic regression The predictor variables (X 1, X 2,..., X p ) can be binary, ordinal, categorical or continuous.

24 e βt X G (X) = σ(β T X) = 1 + e βt X By construction 0 < G (X) < 1. Then logit is well defined and logit(g (X)) = ln G (X) 1 G (X) = βt X = β 0 + β 1 X 1 + β 2 X β p X p.

25 Logistic regression Let now Y be a binary random variable such that { 1 with probability G (X) Y = 1 with probability 1 G (X) Definition If the logit of G (X) (or log odds of Y ) is logit(g (X)) = ln G (X) 1 G (X) = β 0 + β 1 X 1 + β 2 X β p X p, then we say that Y follows a logistic regression w.r.t. the predictor variables X = (1, X 1, X 2,..., X p ).

26 Logistic regression: genetic epidemiology Logistic regression { success with probability G (X) Y = failure with probability 1 G (X) logit(g (X)) = ln G (X) 1 G (X) = β 0 + β 1 X 1 + β 2 X β p X p. is extensively applied in biomedical research.

27 Logistic regression P(Y = 1 X) = G (X) = σ(β T X) = e βt X 1 + e βt X 1 P(Y = 0 X) = 1 G (X) = e = e β βt X 1 + e βt X = e βt X. T X

28 Logistic regression: genetic epidemiology Suppose we have two populations, where X i = x 1 in first population and X i = x 2 in the second population, all other predictors are equal in the two populations. Then a medical geneticist finds it useful to calculate the logarithm of the odds ratio ln ψ = ln p 1 1 p 1 ln = β i (x 1 x 2 ) p 2 1 p 2 or ψ = e β i (x 1 x 2 ) Hence a unit change in X i corresponds to e β i β i change in logodds. change in odds and

29 We need the following regression model Y = β T X + ɛ where β T X = β 0 + β 1 X 1 + β 2 X β p X p and ɛ Logistic(0, 1), i.e. the variable Y can be written directly in terms of the linear predictor function and an additive random error variable. The logistic distribution is the probability distribution the random error.

30 Generating model and/or how to simulate Take a continuous latent variable Y (latent= an unobserved random variable) that is given as follows: Y = β T X + ɛ where β T X = β 0 + β 1 X 1 + β 2 X β p X p and ɛ Logistic(0, 1).

31 Define the response Y as the indicator for whether the latent variable is positive: { 1 if Y > 0 i.e. ε < β T X, Y = 0 otherwise. Then Y follows a logistic regression w.r.t. X. We need only to verify that 1 P(Y = 1 X) = 1 + e. βt X

32 P(Y = 1 X) = P(Y > 0 X) (1) = P(β T X + ε > 0) (2) = P(ε > β T X) (3) = P( ε < β T X) (4) = P(ε < β T X) (5) = σ(β T X) (6) 1 = (7) 1 + e βt X where we used that the logistic distribution is symmetric (and continuous), as found in the exercise classes, or Pr ( ε x) = Pr (ε x).

33 Part III: Logistic Regression Special case: β T X = β 0 + β 1 x. Likelihood Maximum Likelihood logisticmle.m

34 We consider the model: β T X = β 0 + β 1 x. P(Y = 1 x) = σ(β T X) = e (β 0+β 1 x)

35 Notation: P(Y = y x) = σ(β T X) y (1 σ(β T X) 1 y = { σ(β T X) if y = 1 1 σ(β T X) if y = 0

36 Data (x i, y i ) n i=1, likelihood function with the notation above L (β 0, β 1 ) = P(Y = y 1 x 1 ) P(Y = y 2 x 2 ) P(Y = y n x n ) = σ(β T X 1 ) y 1 (1 σ(β T X 1 ) 1 y1 σ(β T X n ) y n (1 σ(β T X n ) 1 y n = A B A = σ(β T X 1 ) y1 σ(β T X 2 ) y2 σ(β T X n ) y n B = (1 σ(β T X 1 ) 1 y 1 (1 σ(β T X 2 ) 1 y2 (1 σ(β T X n ) 1 y n

37 = n i=1 + ln L (β 0, β 1 ) = ln A + ln B n i=1 = n y l ln σ(β T X i ) i=1 (1 y i ) ln(1 σ(β T X i )) ln(1 σ(β T X i )) + n i=1 σ(β T X i ) y l ln 1 σ(β T X i )

38 ( ln(1 σ(β T X i )) = ln 1 ( e (β 0+β 1 x) = ln 1 + e (β 0+β 1 x i ) ( ) 1 = ln 1 + e β 0+β 1 = x i ) = ln (e β 0+β 1 x i ) e (β 0+β 1 x i ) )

39 logodds σ(β T X i ) ln 1 σ(β T X i ) = β 0 + β 1 x i

40 In summary: ln L (β 0, β 1 ) = n i=1 y i (β 0 + β 1 x i ) n i=1 ) ln (e β 0+β 1 x i

41 = n i=1 β 1 ln L (β 0, β 1 ) y i x i n i=1 ) ln (e β 0+β 1 x i β 1 ) ln (e β 0+β 1 x i β 1 = e β 0+β 1 x i x i 1 + e β 0+β 1 x i = P(Y = 1 x i ) x i.

42 ln L (β 0, β 1 ) = 0 β 1 n (y i x i P(Y = 1 x i ) x i ) = 0. i=1 In the same manner we can also find n ln L (β 0, β 1 ) = β 0 (y i P(Y = 1 x i )) = 0 i=1 These two equations have no closed form solution w.r.t. β 0 and β 1.

43 Newton-Raphson for MLE: a one-parameter case For one parameter θ, set f (θ) = ln L(θ). We are searching for the solution of Newton-Raphson method f (θ) = d f (θ) = 0. dθ θ new = θ old + f ( θ old ) where a good initial value is desired. f (θ old ),

44 Newton-Raphson for logistic MLE ( β new 0 β new 1 ) = ( β old 0 β old 1 ) ( + H 1 (β old 0, β old β 1 ) 0 ln L ( β old 0, ) βold 1 β 1 ln L ( β old 0, ) βold 1 ). where H 1 (β old (next slide) 0, βold 1 ) is the matrix inverse of the 2 2 matrix

45 Newton-Raphson for logistic MLE H(β old 0, β old 1 ) = 2 β 2 0 ln L ( β old 0, ) βold 1 2 β 1 β 0 ln L ( β old 0, βold 1 ) 2 β 0 β 1 ln L ( β old 2 β 2 1 0, βold 1 ln L ( β old 0, βold 1 ) ) = n i=1 P (Y = 1 x i ) (1 P(Y = 1 x i ) n i=1 P (Y = 1 x i ) (1 P(Y = 1 x i ) right)cdotx i n i=1 P (Y = 1 x i ) (1 P (Y = 1 x i )) x i n i=1 P (Y = 1 x i ) (1 P(Y = 1 x i )) x 2 i

46 [bs,stderr,phat,deviance] = logisticmle(y,x) Input: y - responses, a binary vector, values 0 and 1 x - the covariate, as a vector Output: bs - estimators of beta0 and beta1 stderr - standard error of the estimate = square roots of the diagonal elements of H 1. phat - estimator of p= P(Y=1) deviance - deviance

47 Confidence Interval with degree of confidence 1 α β MLE,i ± λ α/2 stderr( β MLE,i )

Modern Methods of Statistical Learning sf2935 Lecture 5: Logistic Regression T.K

Modern Methods of Statistical Learning sf2935 Lecture 5: Logistic Regression T.K Lecture 5: Logistic Regression T.K. 10.11.2016 Overview of the Lecture Your Learning Outcomes Discriminative v.s. Generative Odds, Odds Ratio, Logit function, Logistic function Logistic regression definition