
Lecture 13: Logistic Regression. T.K. 05.12.2017

Your Learning Outcomes
- Odds, odds ratio, logit function, logistic function
- Logistic regression: definition
- Likelihood function: maximum likelihood estimate

The setting: Y is a binary r.v. and x is its covariate, predictor, or explanatory variable; x can be continuous or categorical. We cannot model the association of Y with x by a direct linear regression $Y = \alpha + \beta x + e$, where e is, e.g., $N(0, \sigma^2)$.

A Case
Y = Bacterial Meningitis or Acute Viral Meningitis. x = cerebrospinal fluid total protein count.

A Case
Diaz, Armando A., Tomba, Emanuele, Lennarson, Reese, Richard, Rex, Bagajewicz, Miguel J., and Harrison, Roger G.: Prediction of protein solubility in Escherichia coli using logistic regression, Biotechnology and Bioengineering, 105(2), pp. 374-383, 2010. The authors develop a model for predicting the solubility of proteins overexpressed in the bacterium Escherichia coli, using the statistical technique of logistic regression. To build this model, 32 covariates $x_i$ that could potentially correlate well with solubility were used. Logistic regression provides the probability p that a given protein belongs (= Y) to one set or the other.

Part I: Concepts
- Odds, odds ratio
- Logit function
- Logistic function

Odds
The odds of a statement (event, etc.) A is the probability $p(A)$ of observing A divided by the probability of not observing A:
$$\text{odds of } A = \frac{p(A)}{1 - p(A)}$$
E.g., since in humans an average of 51 boys are born in every 100 births, the odds of a randomly chosen delivery being a boy are
$$\text{odds of a boy} = \frac{0.51}{0.49} \approx 1.04.$$
The odds of a certain event (one with $p(A) = 1$) are infinite.

An Aside: Posterior Odds
The posterior probability $p(H \mid E)$ of H given the evidence E yields
$$\text{posterior odds of } H = \frac{p(H \mid E)}{1 - p(H \mid E)}.$$
If there are only two alternative hypotheses, H and not-H, then
$$\text{posterior odds of } H = \frac{p(H \mid E)}{p(\text{not-}H \mid E)} \overset{\text{by Bayes' formula}}{=} \frac{p(E \mid H)}{p(E \mid \text{not-}H)} \cdot \frac{p(H)}{p(\text{not-}H)} = \text{likelihood ratio} \times \text{prior odds}.$$

Odds ratio
The odds ratio $\psi$ is the ratio of the odds from two different conditions or populations:
$$\psi = \frac{\text{odds of } A_1}{\text{odds of } A_2} = \frac{p(A_1)/(1 - p(A_1))}{p(A_2)/(1 - p(A_2))}.$$

The function logit(p)
The logarithmic odds of success is called the logit of p:
$$\mathrm{logit}(p) = \ln\left(\frac{p}{1-p}\right).$$

The function logit(p) and its inverse
$$\mathrm{logit}(p) = \log\left(\frac{p}{1-p}\right)$$
If $\theta = \mathrm{logit}(p)$, then the inverse function is
$$p = \mathrm{logit}^{-1}(\theta) = \frac{e^{\theta}}{1 + e^{\theta}}.$$

The logit(p) and its inverse
The function
$$\mathrm{logit}(p) = \log\left(\frac{p}{1-p}\right), \quad 0 < p < 1,$$
has the inverse
$$p = \mathrm{logit}^{-1}(\theta) = \sigma(\theta) = \frac{e^{\theta}}{1 + e^{\theta}} = \frac{1}{1 + e^{-\theta}}, \quad -\infty < \theta < \infty,$$
which is called the logistic function or sigmoid. Note that $\sigma(0) = \frac{1}{2}$.
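A quick numerical check of this inverse pair (a minimal MATLAB sketch; the handle names logit and sigma are ours, not part of the course code):

```matlab
% The logit function and the logistic (sigmoid) function as handles.
logit = @(p) log(p ./ (1 - p));    % maps (0,1) to (-inf, inf)
sigma = @(t) 1 ./ (1 + exp(-t));   % maps (-inf, inf) to (0,1)

p = 0.3;
theta = logit(p);   % log(0.3/0.7), approximately -0.8473
sigma(theta)        % returns 0.3000: sigma inverts logit
sigma(0)            % returns 0.5000, i.e. sigma(0) = 1/2
```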

Logistic function
In biology, the logistic function refers to the change in size of a species population over time.

The logit(p) and the logistic function
Theorem. The logit function
$$\theta = \mathrm{logit}(p) = \log\left(\frac{p}{1-p}\right), \quad 0 < p < 1,$$
and the logistic function
$$p = \sigma(\theta) = \frac{1}{1 + e^{-\theta}}, \quad -\infty < \theta < \infty,$$
are inverse functions of each other.

The logit(p) and the logistic function
$$\theta = \mathrm{logit}(p) = \log\left(\frac{p}{1-p}\right), \quad 0 < p < 1, \qquad p = \sigma(\theta) = \frac{1}{1 + e^{-\theta}}, \quad -\infty < \theta < \infty.$$
[Figure: graphs of the logit function $\theta = \mathrm{logit}(p)$ and its inverse, the logistic function $p = \sigma(\theta)$.]

The logistic distribution
We see that
- $0 \le \sigma(x) \le 1$,
- $\sigma(x) \to 1$ as $x \to +\infty$,
- $\sigma(x) \to 0$ as $x \to -\infty$,
- $\sigma(x)$ is strictly increasing.
Hence $\sigma(x)$ is the distribution function of a random variable.

The logistic distribution
If for every x
$$P(\epsilon \le x) = \sigma(x) = \frac{1}{1 + e^{-x}}, \quad -\infty < x < \infty,$$
then we say that $\epsilon$ has the logistic distribution, $\epsilon \sim \mathrm{Logistic}(0, 1)$.

The logistic distribution: an exercise
Let $P \sim U(0, 1)$ and $\epsilon = \ln\frac{P}{1-P}$. Then $\epsilon \sim \mathrm{Logistic}(0, 1)$.

The logistic distribution: another exercise
Let $\epsilon \sim \mathrm{Logistic}(0, 1)$. Then $P(-\epsilon \le x) = P(\epsilon \le x)$, i.e., the distribution is symmetric about 0.

Simulating $\epsilon \sim \mathrm{Logistic}(0, 1)$
Simulate $p_1, \ldots, p_n$ from the uniform distribution on (0, 1) and then set $\epsilon_i = \mathrm{logit}(p_i)$, $i = 1, \ldots, n$.
[Figure: the empirical distribution function of the $\epsilon_i$ for n = 200.]
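The slide's recipe in code (a minimal MATLAB sketch using only base functions; the overlay of the true cdf is our addition):

```matlab
% Simulate n draws from Logistic(0,1) by applying logit to uniforms.
n = 200;
p = rand(n, 1);                  % p_1, ..., p_n ~ U(0,1)
epsi = log(p ./ (1 - p));        % epsi_i = logit(p_i) ~ Logistic(0,1)

% Empirical distribution function of the epsi_i, as in the figure,
% overlaid with the true cdf sigma(x) = 1/(1 + exp(-x)).
xs = sort(epsi);
F = (1:n)' / n;                  % ECDF values at the sorted sample points
stairs(xs, F); hold on;
plot(xs, 1 ./ (1 + exp(-xs)), 'r--');
legend('empirical cdf', 'logistic cdf', 'Location', 'southeast');
hold off;
```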

Part II: Logistic Regression
- A regression function
- Log odds of Y as a regression function
- How to generate Y

Let $\beta = (\beta_0, \beta_1, \beta_2, \ldots, \beta_p)$ be a $(p+1) \times 1$ vector and $X = (1, X_1, X_2, \ldots, X_p)$ a $(p+1) \times 1$ vector of (predictor) variables. We set, as in multiple regression,
$$\beta^T X = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p.$$
Then
$$G(X) = \sigma(\beta^T X) = \frac{1}{1 + e^{-\beta^T X}} = \frac{e^{\beta^T X}}{1 + e^{\beta^T X}}.$$

Logistic regression
The predictor variables $(X_1, X_2, \ldots, X_p)$ can be binary, ordinal, categorical, or continuous.

$$G(X) = \sigma(\beta^T X) = \frac{e^{\beta^T X}}{1 + e^{\beta^T X}}$$
By construction $0 < G(X) < 1$. Hence the logit is well defined, and
$$\mathrm{logit}(G(X)) = \ln\frac{G(X)}{1 - G(X)} = \beta^T X = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p.$$

Logistic regression
Now let Y be a binary random variable such that
$$Y = \begin{cases} 1 & \text{with probability } G(X) \\ 0 & \text{with probability } 1 - G(X). \end{cases}$$
Definition. If the logit of G(X) (or the log odds of Y) is
$$\mathrm{logit}(G(X)) = \ln\frac{G(X)}{1 - G(X)} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p,$$
then we say that Y follows a logistic regression w.r.t. the predictor variables $X = (1, X_1, X_2, \ldots, X_p)$.

Logistic regression: genetic epidemiology
Logistic regression,
$$Y = \begin{cases} \text{success} & \text{with probability } G(X) \\ \text{failure} & \text{with probability } 1 - G(X), \end{cases}$$
with
$$\mathrm{logit}(G(X)) = \ln\frac{G(X)}{1 - G(X)} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p,$$
is extensively applied in biomedical research.

Logistic regression
$$P(Y = 1 \mid X) = G(X) = \sigma(\beta^T X) = \frac{e^{\beta^T X}}{1 + e^{\beta^T X}}$$
$$P(Y = 0 \mid X) = 1 - G(X) = 1 - \frac{e^{\beta^T X}}{1 + e^{\beta^T X}} = \frac{1}{1 + e^{\beta^T X}}.$$
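As a numerical illustration (a sketch with made-up coefficient and predictor values, not data from the lecture):

```matlab
% Illustrative beta = (beta0, beta1, beta2)' and X = (1, X1, X2)'.
beta = [-1.0; 0.5; 2.0];
X = [1; 1.2; 0.3];                  % note the leading 1 for the intercept
eta = beta' * X;                    % the linear predictor beta'X
P1 = exp(eta) / (1 + exp(eta));     % P(Y = 1 | X) = G(X)
P0 = 1 / (1 + exp(eta));            % P(Y = 0 | X) = 1 - G(X)
P1 + P0                             % returns 1, as the two must sum to 1
```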

Logistic regression: genetic epidemiology
Suppose we have two populations, where $X_i = x_1$ in the first population and $X_i = x_2$ in the second, and all other predictors are equal in the two populations. Then a medical geneticist finds it useful to calculate the logarithm of the odds ratio,
$$\ln \psi = \ln\frac{p_1}{1 - p_1} - \ln\frac{p_2}{1 - p_2} = \beta_i (x_1 - x_2),$$
or $\psi = e^{\beta_i (x_1 - x_2)}$. Hence a unit change in $X_i$ corresponds to a change of $\beta_i$ in the log odds and a multiplicative change of $e^{\beta_i}$ in the odds. E.g., if $\beta_i = 0.4$, a unit increase in $X_i$ multiplies the odds by $e^{0.4} \approx 1.49$.

We need the following regression model:
$$Y^* = \beta^T X + \epsilon,$$
where $\beta^T X = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p$ and $\epsilon \sim \mathrm{Logistic}(0, 1)$; i.e., the variable $Y^*$ can be written directly in terms of the linear predictor function and an additive random error variable. The logistic distribution is the probability distribution of the random error.

Generating model and/or how to simulate
Take a continuous latent variable $Y^*$ (latent = an unobserved random variable) given as follows:
$$Y^* = \beta^T X + \epsilon,$$
where $\beta^T X = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p$ and $\epsilon \sim \mathrm{Logistic}(0, 1)$.

Define the response Y as the indicator of whether the latent variable is positive:
$$Y = \begin{cases} 1 & \text{if } Y^* > 0, \text{ i.e. } -\epsilon < \beta^T X, \\ 0 & \text{otherwise.} \end{cases}$$
Then Y follows a logistic regression w.r.t. X. We need only verify that
$$P(Y = 1 \mid X) = \frac{1}{1 + e^{-\beta^T X}}.$$

$$\begin{aligned}
P(Y = 1 \mid X) &= P(Y^* > 0 \mid X) \\
&= P(\beta^T X + \epsilon > 0) \\
&= P(\epsilon > -\beta^T X) \\
&= P(-\epsilon < \beta^T X) \\
&= P(\epsilon < \beta^T X) \\
&= \sigma(\beta^T X) \\
&= \frac{1}{1 + e^{-\beta^T X}},
\end{aligned}$$
where we used that the logistic distribution is symmetric (and continuous), as found in the exercise classes: $P(-\epsilon \le x) = P(\epsilon \le x)$.
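The latent-variable construction doubles as a simulation recipe. A MATLAB sketch (the parameter values $\beta_0 = -1$, $\beta_1 = 2$ are arbitrary choices of ours):

```matlab
% Generate binary responses from the latent-variable model.
n = 10000;
beta0 = -1; beta1 = 2;                 % arbitrary illustrative values
x = randn(n, 1);                       % some covariate values
u = rand(n, 1);
epsi = log(u ./ (1 - u));              % epsi ~ Logistic(0,1)
ystar = beta0 + beta1 * x + epsi;      % latent variable Y*
y = double(ystar > 0);                 % observed response Y

% Sanity check: among observations with x near 1, the fraction of ones
% should be close to sigma(-1 + 2*1) = 1/(1 + exp(-1)) ~ 0.7311.
mean(y(abs(x - 1) < 0.1))
```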

Part III: Logistic Regression
- Special case: $\beta^T X = \beta_0 + \beta_1 x$
- Likelihood
- Maximum likelihood
- logisticmle.m

We consider the model $\beta^T X = \beta_0 + \beta_1 x$:
$$P(Y = 1 \mid x) = \sigma(\beta^T X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}.$$

Notation:
$$P(Y = y \mid x) = \sigma(\beta^T X)^y \left(1 - \sigma(\beta^T X)\right)^{1-y} = \begin{cases} \sigma(\beta^T X) & \text{if } y = 1 \\ 1 - \sigma(\beta^T X) & \text{if } y = 0. \end{cases}$$

For data $(x_i, y_i)_{i=1}^n$, the likelihood function with the notation above is
$$L(\beta_0, \beta_1) = P(Y = y_1 \mid x_1) \cdot P(Y = y_2 \mid x_2) \cdots P(Y = y_n \mid x_n)$$
$$= \sigma(\beta^T X_1)^{y_1} \left(1 - \sigma(\beta^T X_1)\right)^{1-y_1} \cdots \sigma(\beta^T X_n)^{y_n} \left(1 - \sigma(\beta^T X_n)\right)^{1-y_n} = A \cdot B,$$
where
$$A = \sigma(\beta^T X_1)^{y_1} \sigma(\beta^T X_2)^{y_2} \cdots \sigma(\beta^T X_n)^{y_n},$$
$$B = \left(1 - \sigma(\beta^T X_1)\right)^{1-y_1} \left(1 - \sigma(\beta^T X_2)\right)^{1-y_2} \cdots \left(1 - \sigma(\beta^T X_n)\right)^{1-y_n}.$$

$$\begin{aligned}
\ln L(\beta_0, \beta_1) &= \ln A + \ln B \\
&= \sum_{i=1}^n y_i \ln \sigma(\beta^T X_i) + \sum_{i=1}^n (1 - y_i) \ln\left(1 - \sigma(\beta^T X_i)\right) \\
&= \sum_{i=1}^n \ln\left(1 - \sigma(\beta^T X_i)\right) + \sum_{i=1}^n y_i \ln \frac{\sigma(\beta^T X_i)}{1 - \sigma(\beta^T X_i)}.
\end{aligned}$$

$$\ln\left(1 - \sigma(\beta^T X_i)\right) = \ln\left(1 - \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}\right) = \ln\left(\frac{1}{1 + e^{\beta_0 + \beta_1 x_i}}\right) = -\ln\left(1 + e^{\beta_0 + \beta_1 x_i}\right)$$

The log odds:
$$\ln \frac{\sigma(\beta^T X_i)}{1 - \sigma(\beta^T X_i)} = \beta_0 + \beta_1 x_i.$$

In summary:
$$\ln L(\beta_0, \beta_1) = \sum_{i=1}^n y_i (\beta_0 + \beta_1 x_i) - \sum_{i=1}^n \ln\left(1 + e^{\beta_0 + \beta_1 x_i}\right).$$
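This summary formula translates directly into a (negative) log-likelihood handle; a sketch with tiny made-up data, just to show the evaluation:

```matlab
% Negative log-likelihood from the summary formula above.
negloglik = @(b, y, x) -( sum(y .* (b(1) + b(2) * x)) ...
                        - sum(log(1 + exp(b(1) + b(2) * x))) );

% At beta = (0,0) every observation contributes log(2), whatever y_i is.
y = [1; 0; 1; 1]; x = [0.5; -1.2; 2.0; 0.3];   % made-up data
negloglik([0; 0], y, x)                        % returns 4*log(2) ~ 2.7726
```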

$$\frac{\partial}{\partial \beta_1} \ln L(\beta_0, \beta_1) = \sum_{i=1}^n y_i x_i - \sum_{i=1}^n \frac{\partial}{\partial \beta_1} \ln\left(1 + e^{\beta_0 + \beta_1 x_i}\right),$$
where
$$\frac{\partial}{\partial \beta_1} \ln\left(1 + e^{\beta_0 + \beta_1 x_i}\right) = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}} \cdot x_i = P(Y = 1 \mid x_i) \cdot x_i.$$

Setting the derivative to zero,
$$\frac{\partial}{\partial \beta_1} \ln L(\beta_0, \beta_1) = 0 \iff \sum_{i=1}^n \left(y_i x_i - P(Y = 1 \mid x_i)\, x_i\right) = 0.$$
In the same manner we also find
$$\frac{\partial}{\partial \beta_0} \ln L(\beta_0, \beta_1) = \sum_{i=1}^n \left(y_i - P(Y = 1 \mid x_i)\right) = 0.$$
These two equations have no closed-form solution w.r.t. $\beta_0$ and $\beta_1$.

Newton-Raphson for MLE: a one-parameter case
For one parameter $\theta$, set $f(\theta) = \ln L(\theta)$. We are searching for the solution of
$$f'(\theta) = \frac{d}{d\theta} f(\theta) = 0.$$
The Newton-Raphson method iterates
$$\theta^{\text{new}} = \theta^{\text{old}} - \frac{f'(\theta^{\text{old}})}{f''(\theta^{\text{old}})},$$
where a good initial value is desired.
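A sketch of the iteration on a toy concave f whose maximizer is known (our own example, not course code):

```matlab
% Newton-Raphson for f'(theta) = 0 with f(theta) = -(theta - 2)^2.
fprime  = @(t) -2 * (t - 2);   % f'(theta)
fsecond = @(t) -2;             % f''(theta), constant here
theta = 0;                     % initial value
for k = 1:10
    theta = theta - fprime(theta) / fsecond(theta);
end
theta                          % converges to the maximizer, theta = 2
```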

Newton-Raphson for logistic MLE
$$\begin{pmatrix} \beta_0^{\text{new}} \\ \beta_1^{\text{new}} \end{pmatrix} = \begin{pmatrix} \beta_0^{\text{old}} \\ \beta_1^{\text{old}} \end{pmatrix} + H^{-1}(\beta_0^{\text{old}}, \beta_1^{\text{old}}) \begin{pmatrix} \frac{\partial}{\partial \beta_0} \ln L(\beta_0^{\text{old}}, \beta_1^{\text{old}}) \\ \frac{\partial}{\partial \beta_1} \ln L(\beta_0^{\text{old}}, \beta_1^{\text{old}}) \end{pmatrix},$$
where $H^{-1}(\beta_0^{\text{old}}, \beta_1^{\text{old}})$ is the matrix inverse of the $2 \times 2$ matrix on the next slide.

Newton-Raphson for logistic MLE
$$H(\beta_0^{\text{old}}, \beta_1^{\text{old}}) = -\begin{pmatrix} \frac{\partial^2}{\partial \beta_0^2} \ln L & \frac{\partial^2}{\partial \beta_0 \partial \beta_1} \ln L \\ \frac{\partial^2}{\partial \beta_1 \partial \beta_0} \ln L & \frac{\partial^2}{\partial \beta_1^2} \ln L \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^n P_i (1 - P_i) & \sum_{i=1}^n P_i (1 - P_i)\, x_i \\ \sum_{i=1}^n P_i (1 - P_i)\, x_i & \sum_{i=1}^n P_i (1 - P_i)\, x_i^2 \end{pmatrix},$$
where $P_i = P(Y = 1 \mid x_i)$ and all derivatives of $\ln L$ are evaluated at $(\beta_0^{\text{old}}, \beta_1^{\text{old}})$. This is the observed information matrix.

[bs,stderr,phat,deviance] = logisticmle(y,x)
Input:
  y - responses, a binary vector with values 0 and 1
  x - the covariate, as a vector
Output:
  bs - estimates of beta0 and beta1
  stderr - standard errors of the estimates = square roots of the diagonal elements of $H^{-1}$
  phat - estimate of p = P(Y=1)
  deviance - deviance
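We don't have the course's logisticmle.m itself; the following is a minimal sketch of what such a function might look like, implementing the Newton-Raphson iteration above with H taken as the observed information matrix:

```matlab
function [bs, stderr, phat, deviance] = logisticmle(y, x)
% LOGISTICMLE  ML estimates for simple logistic regression (a sketch,
% not the original course file). Newton-Raphson on (beta0, beta1).
    y = y(:); x = x(:);
    X = [ones(size(x)), x];              % n-by-2 design matrix (1, x_i)
    bs = [0; 0];                         % initial value
    for iter = 1:25
        P = 1 ./ (1 + exp(-X * bs));     % P(Y = 1 | x_i) at current beta
        grad = X' * (y - P);             % score vector (the two equations)
        % Observed information H = sum_i P_i(1-P_i) X_i X_i'
        % (row-wise weighting uses implicit expansion, R2016b+).
        H = X' * (X .* (P .* (1 - P)));
        bs = bs + H \ grad;              % Newton-Raphson update
    end
    P = 1 ./ (1 + exp(-X * bs));
    H = X' * (X .* (P .* (1 - P)));      % information at the MLE
    stderr = sqrt(diag(inv(H)));         % sqrt of diagonal of H^{-1}
    phat = P;                            % fitted P(Y = 1 | x_i)
    deviance = -2 * sum(y .* log(P) + (1 - y) .* log(1 - P));
end
```

For binary 0/1 responses the deviance reduces to $-2 \ln L$, since the saturated model has log-likelihood zero.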

Confidence Interval
A confidence interval with degree of confidence $1 - \alpha$:
$$\hat{\beta}_{\mathrm{MLE},i} \pm \lambda_{\alpha/2} \cdot \mathrm{stderr}(\hat{\beta}_{\mathrm{MLE},i}),$$
where $\lambda_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution.
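For example, a 95% interval ($\alpha = 0.05$) from the logisticmle output could be computed as follows (the erfcinv expression is the base-MATLAB equivalent of the Statistics Toolbox call norminv(1 - alpha/2)):

```matlab
% 95% confidence intervals for beta0 and beta1, given data vectors y, x.
alpha = 0.05;
lambda = -sqrt(2) * erfcinv(2 * (1 - alpha/2));    % ~ 1.9600
[bs, stderr] = logisticmle(y, x);
ci = [bs - lambda * stderr, bs + lambda * stderr]  % one row per beta_i
```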