Introduction to Logistic Regression


[Title-slide figure: misclassification rate (0.20 to 0.50) versus cutoff probability (0.0 to 1.0)]

Introduction to Logistic Regression

Problem & Data Overview

Primary Research Question:
1. What skills are important in winning tennis matches?

Regression Questions:
1. What is Y? Did the player win or lose?
2. What is X? The match statistics.

Exploratory Data Analysis 1: Side-by-side boxplots

Exploratory Data Analysis 2: Scatterplot (Win = 1, Lose = 0)

Exploratory Data Analysis 3: Scatterplot with smooth curve

Exploratory Data Analysis 4: Cross-tabulation of Ace count by Result (0 = loss, 1 = win)

Ace     0    1   Sum
 0      1    0     1
 1      3    1     4
 2      4    0     4
 3      6    5    11
 4      4    3     7
 5      9    7    16
 6      5    4     9
 7      4    4     8
 8      6    6    12
 9      4    7    11
10      5    4     9
11      4    2     6
12      1    2     3
13      4    3     7
14      0    4     4
15      1    1     2
16      0    3     3
17      1    0     1
19      2    0     2
20      1    0     1
21      0    1     1
23      0    2     2
26      1    0     1
29      1    0     1
Sum    67   59   126

Can we use linear regression? Our response is a categorical variable, so can we just use an indicator variable and set

$$Y_i = \begin{cases} 1 & \text{if Win} \\ 0 & \text{otherwise} \end{cases}$$

and then use regular least-squares multiple regression? No, because:
1. predictions will fall outside of {0, 1}
2. the linearity assumption might be violated
3. the errors certainly won't be normal
4. the equal-variance assumption is also likely to be violated.

We need an entirely new regression framework!

Logistic regression

Going back to Day 1, we have the following generic framework for statistical modeling:

$$Y_i \overset{iid}{\sim} p_Y(y_i), \qquad E(y_i) = f(x_{i1}, \ldots, x_{iP})$$

E.g., for simple and multiple linear regression modeling we had

$$Y_i \overset{iid}{\sim} N\!\left(\beta_0 + \sum_{p=1}^{P} x_{ip}\beta_p,\; \sigma^2\right), \qquad E(y_i) = \beta_0 + \sum_{p=1}^{P} x_{ip}\beta_p,$$

where the normal assumption was OK because Y was quantitative.

Logistic regression

What's an appropriate distribution when $Y_i \in \{0, 1\}$?

Bernoulli Distribution: $f(y_i) = p^{y_i}(1-p)^{1-y_i}$

If our response follows a Bernoulli distribution, then $E(y_i) = p = \text{Prob}(Y = 1)$.

So can we just set $E(y_i) = p = \beta_0 + \sum_{p=1}^{P} x_{ip}\beta_p$? No, because p has to be between 0 and 1. We need to choose a different mathematical function than we have used before (one that keeps p between 0 and 1).

Logistic regression

Logistic Regression Model (a Generalized Linear Model):

$$Y_i \overset{ind}{\sim} \text{Bern}(p_i), \qquad \log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j \;\Rightarrow\; p_i = \frac{\exp\{\beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j\}}{1 + \exp\{\beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j\}} \in (0, 1)$$

Here $p_i/(1-p_i)$ is the odds, $\log(p_i/(1-p_i))$ is the logit transform, and the expression for $p_i$ is the logistic function.
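As a quick numerical check, R's built-in plogis() evaluates this logistic function; a minimal sketch (the linear-predictor values below are made up) showing that its output always stays strictly inside (0, 1):

# plogis(x) = exp(x) / (1 + exp(x)), the logistic function
eta <- c(-10, -1, 0, 1, 10)  # hypothetical linear-predictor (log-odds) values
plogis(eta)                  # approx. 0.00005, 0.27, 0.50, 0.73, 0.99995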

Logistic Regression Model: $\log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j$

How do we interpret $\beta_j$?
1. For every unit increase in $x_j$, the log-odds increases by $\beta_j$.
2. Just interpret the sign: if $\beta_j > 0$, then $p_i$ increases as $x_j$ increases.
3. As $x_j$ increases by 1, a player is $\exp\{\beta_j\}$ times more likely (in odds) to win the game.
4. As $x_j$ increases by 1, a player is $100(\exp\{\beta_j\} - 1)\%$ more likely to win.

Logistic Regression Model: $y_i \overset{ind}{\sim} \text{Bern}(p_i)$, $\log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j$

How do we estimate the $\beta_j$'s? We use maximum likelihood (see Stat 340). In this class, we'll let R do it for us.
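In R, fitting is one call to glm() with a binomial family. A minimal sketch, assuming the match data live in a hypothetical data frame tennis with a 0/1 Result column and the match statistics used later in these slides:

# Fit the logistic regression by maximum likelihood
# (glm uses iteratively reweighted least squares under the hood)
fit <- glm(Result ~ FSP + FSW + SSP + SSW + ACE + DBF + NPA + NPW,
           family = binomial, data = tennis)
summary(fit)  # coefficient estimates, standard errors, z-statistics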

Logistic Regression Model: $y_i \overset{ind}{\sim} \text{Bern}(p_i)$, $\log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j$

Example: $\hat\beta_{DBF} = -0.272$. How do we interpret this number?
1. As DBF increases by 1, the log(odds) of winning goes down by 0.272.
2. As DBF increases by 1, the odds of winning go down by $100(e^{-0.272} - 1) \approx -24\%$, i.e., by about 24%.

Logistic Regression Model: $y_i \overset{ind}{\sim} \text{Bern}(p_i)$, $\log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j$

What assumptions are we making? Linear in log-odds (monotone in probability). Check with a scatterplot w/ smoother.


Logistic Regression Model: $y_i \overset{ind}{\sim} \text{Bern}(p_i)$, $\log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j$

What assumptions are we making?
- Linear in log-odds (monotone in probability): check using a jittered scatterplot
- Independence
(The Normality and Equal Variance assumptions from linear regression no longer apply here.)

Logistic Regression Model: $y_i \overset{ind}{\sim} \text{Bern}(p_i)$, $\log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j$

How can we perform variable selection? The same way as before: compare AIC or BIC.
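As a sketch (reusing the hypothetical fit and tennis objects from above; the reduced model is illustrative), AIC() and BIC() work on glm fits just as they do on lm fits, and step() automates the AIC comparison:

fit_small <- glm(Result ~ ACE + DBF, family = binomial, data = tennis)
AIC(fit, fit_small)            # smaller AIC is preferred
BIC(fit, fit_small)            # BIC penalizes extra parameters more heavily
step(fit, direction = "both")  # stepwise search minimizing AIC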

Logistic Regression Model: $y_i \overset{ind}{\sim} \text{Bern}(p_i)$, $\log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j$

How do we build confidence intervals (or perform hypothesis tests) for our effects?

$$\frac{\hat\beta_j}{SE(\hat\beta_j)} \;\dot\sim\; N(0, 1), \qquad \hat\beta_j \pm z^\star\, SE(\hat\beta_j)$$
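In R (a sketch against the hypothetical fit above), the Wald interval $\hat\beta_j \pm z^\star SE(\hat\beta_j)$ can be computed from the coefficient table, or in one call with confint.default():

est <- coef(summary(fit))  # columns: Estimate, Std. Error, z value, Pr(>|z|)
cbind(lower = est[, "Estimate"] - qnorm(0.975) * est[, "Std. Error"],
      upper = est[, "Estimate"] + qnorm(0.975) * est[, "Std. Error"])
confint.default(fit)       # the same Wald intervals in one call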

Logistic Regression Model: $y_i \overset{ind}{\sim} \text{Bern}(p_i)$, $\log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j$

How do we build confidence intervals (or perform hypothesis tests) for our effects?
- The 95% CI for $\beta_{DBF}$ is (-0.487, -0.078).
- How do we interpret this interval?
1. We are 95% confident that as DBF increases by 1, the log(odds) of winning goes down by between 0.078 and 0.487.

2. We are 95% confident that as DBF increases by 1, the odds of winning decrease by between $100(\exp\{-0.487\} - 1) = -38.6\%$ and $100(\exp\{-0.078\} - 1) = -7.5\%$.

Logistic Regression Model: $y_i \overset{ind}{\sim} \text{Bern}(p_i)$, $\log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j$

How do we predict? Predict probabilities:

$$\hat p = \frac{\exp\{\hat\beta_0 + \sum_{p=1}^{P} x_{ip}\hat\beta_p\}}{1 + \exp\{\hat\beta_0 + \sum_{p=1}^{P} x_{ip}\hat\beta_p\}}$$
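In R (continuing the hypothetical fit), type = "response" applies the logistic function to the linear predictor; the new-match values below echo the worked example on a later slide:

p_hat <- predict(fit, type = "response")  # fitted win probabilities
new_match <- data.frame(FSP = 68, FSW = 60, SSP = 79, SSW = 16,
                        ACE = 6, DBF = 2, NPA = 6, NPW = 64)
predict(fit, newdata = new_match, type = "response")  # predicted P(win)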

Logistic Regression Model: $y_i \overset{ind}{\sim} \text{Bern}(p_i)$, $\log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j$

Many times we want to classify, so we set

$$\hat y = \begin{cases} 1 & \text{if } \hat p > c \\ 0 & \text{if } \hat p \le c \end{cases}$$

where c = cutoff probability.

Logistic Regression Model: $y_i \overset{ind}{\sim} \text{Bern}(p_i)$, $\log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j$

How do we choose the cutoff value?
1. $c = 0.5$: the Bayes classifier
2. Choose c to minimize the misclassification rate

$$\frac{1}{n}\sum_{i=1}^{n} I(y_i \ne \hat y_i) = \text{Percent Misclassified}$$
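A sketch of how the curve on the next slide could be traced (hypothetical fit and tennis objects again): compute the in-sample misclassification rate over a grid of cutoffs and take the minimizer:

p_hat <- predict(fit, type = "response")
cutoffs <- seq(0, 1, by = 0.01)
misclass <- sapply(cutoffs, function(c) mean((p_hat > c) != tennis$Result))
plot(cutoffs, misclass, type = "l", xlab = "Cutoff", ylab = "Misclassification")
cutoffs[which.min(misclass)]  # cutoff with the smallest in-sample error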

[Figure: in-sample misclassification rate (0.20 to 0.50) versus cutoff probability (0.0 to 1.0)]

Logistic Regression Model: $y_i \overset{ind}{\sim} \text{Bern}(p_i)$, $\log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j$

Can we build a prediction interval? Sort of: we can build a confidence interval for a predicted probability by untransforming the interval

$$\log\!\left(\frac{\hat p}{1-\hat p}\right) \pm z^\star\, SE\!\left(\log\!\left(\frac{\hat p}{1-\hat p}\right)\right)$$

Steps to building an interval for a predicted probability:

1. Calculate

$$\text{Low} = \log\!\left(\frac{\hat p}{1-\hat p}\right) - z^\star\, SE\!\left(\log\!\left(\frac{\hat p}{1-\hat p}\right)\right), \qquad \text{Up} = \log\!\left(\frac{\hat p}{1-\hat p}\right) + z^\star\, SE\!\left(\log\!\left(\frac{\hat p}{1-\hat p}\right)\right)$$

2. Untransform:

$$\hat p_{Low} = \frac{\exp\{\text{Low}\}}{1 + \exp\{\text{Low}\}}, \qquad \hat p_{Up} = \frac{\exp\{\text{Up}\}}{1 + \exp\{\text{Up}\}}$$
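In R (a sketch; fit and new_match as in the earlier hypothetical snippets), predict() with se.fit = TRUE returns the estimate and its standard error on the link (log-odds) scale, which is exactly what step 1 needs, and plogis() does the untransforming in step 2:

link <- predict(fit, newdata = new_match, type = "link", se.fit = TRUE)
low <- link$fit - qnorm(0.975) * link$se.fit  # step 1: log-odds interval
up  <- link$fit + qnorm(0.975) * link$se.fit
plogis(c(low, up))  # step 2: untransform to a probability interval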

Confidence interval for a probability, example: if a player has the following match statistics:
FSP = 68, FSW = 60, SSP = 79, SSW = 16, ACE = 6, DBF = 2, NPA = 6, NPW = 64,
then the estimated probability of winning is between 66% and 95%.

Note: this was Djokovic vs. Nadal, and Nadal won (Nadal beat the odds).

Logistic Regression Model: $y_i \overset{ind}{\sim} \text{Bern}(p_i)$, $\log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j$

How can we tell how well our model fits? In-sample confusion matrix:

             Predicted Win   Predicted Loss
True Win          49               10
True Loss         14               53

Important Definitions (from the confusion matrix above):
- Sensitivity: percent of true positives correctly predicted (49/59)
- Specificity: percent of true negatives correctly predicted (53/67)
- Positive Predictive Value: percent of predicted wins that were correct (49/63)
- Negative Predictive Value: percent of predicted losses that were correct (53/63)
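These counts and rates can be reproduced with table(); a sketch, continuing with the hypothetical fit and tennis objects and a 0.5 cutoff:

y_hat <- as.integer(predict(fit, type = "response") > 0.5)
cm <- table(Truth = tennis$Result, Predicted = y_hat)
sensitivity <- cm["1", "1"] / sum(cm["1", ])  # true wins correctly predicted
specificity <- cm["0", "0"] / sum(cm["0", ])  # true losses correctly predicted
ppv <- cm["1", "1"] / sum(cm[, "1"])          # predicted wins that were right
npv <- cm["0", "0"] / sum(cm[, "0"])          # predicted losses that were right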

Logistic Regression Model: $y_i \overset{ind}{\sim} \text{Bern}(p_i)$, $\log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j$

How can we tell how well our model fits? Pseudo-$R^2$:

$$R^2_{pseudo} = 1 - \frac{\text{What's Left Over After Modeling}}{\text{Total Variation}} = 1 - \frac{\text{Residual Deviance}}{\text{Null Deviance}}$$

Interpretation: percent of variation in log(p/(1-p)) explained by the model.
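Both deviances are stored on the fitted glm object, so the pseudo-$R^2$ is a one-liner (hypothetical fit again):

1 - fit$deviance / fit$null.deviance  # pseudo R-squared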

Logistic Regression Model: $y_i \overset{ind}{\sim} \text{Bern}(p_i)$, $\log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \sum_{j=1}^{J} x_{ij}\beta_j$

How can we tell how well our model predicts? Cross-validated confusion matrix: repeat the confusion matrix, but first split the data into test and training sets.
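A minimal sketch of one such split (hypothetical tennis data frame again; the 75/25 split and the reduced formula are illustrative choices, not the slides' exact recipe):

set.seed(1)  # make the random split reproducible
test_idx <- sample(nrow(tennis), size = round(0.25 * nrow(tennis)))
train <- tennis[-test_idx, ]
test  <- tennis[test_idx, ]
cv_fit <- glm(Result ~ ACE + DBF, family = binomial, data = train)
test_pred <- as.integer(predict(cv_fit, newdata = test, type = "response") > 0.5)
table(Truth = test$Result, Predicted = test_pred)  # out-of-sample confusion matrix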

End of Tennis Analysis (see webpage for R code)