Multiple regression: Categorical dependent variables

Similar documents
Data Analytics for Social Science

Advanced Quantitative Methods: limited dependent variables

BANA 7046 Data Mining I Lecture 4. Logistic Regression and Classications 1

Performance Evaluation

Evaluation & Credibility Issues

Can a Pseudo Panel be a Substitute for a Genuine Panel?

Generalized Linear Models for Non-Normal Data

Performance Evaluation and Comparison

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes 1

Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit

Performance Measures. Sören Sonnenburg. Fraunhofer FIRST.IDA, Kekuléstr. 7, Berlin, Germany

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

Model Accuracy Measures

Machine Learning Concepts in Chemoinformatics

Single-level Models for Binary Responses

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Linear Regression With Special Variables

Chapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a

Lecture 3 Classification, Logistic Regression

NELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation

Lecture 4 Discriminant Analysis, k-nearest Neighbors

Chapter 1. Modeling Basics

Least Squares Classification

disc choice5.tex; April 11, ffl See: King - Unifying Political Methodology ffl See: King/Tomz/Wittenberg (1998, APSA Meeting). ffl See: Alvarez

13.1 Causal effects with continuous mediator and. predictors in their equations. The definitions for the direct, total indirect,

Goals. PSCI6000 Maximum Likelihood Estimation Multiple Response Model 1. Multinomial Dependent Variable. Random Utility Model

Econometric Modelling Prof. Rudra P. Pradhan Department of Management Indian Institute of Technology, Kharagpur

Lecture 10: Alternatives to OLS with limited dependent variables. PEA vs APE Logit/Probit Poisson

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links

Psych 230. Psychological Measurement and Statistics

Generalized linear models

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

Generalized Models: Part 1

Linear Regression Models P8111

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Pointwise Exact Bootstrap Distributions of Cost Curves

Semiparametric Generalized Linear Models

Smart Home Health Analytics Information Systems University of Maryland Baltimore County

Ch 6: Multicategory Logit Models

Machine Learning Linear Classification. Prof. Matteo Matteucci

Performance evaluation of binary classifiers

STA 216, GLM, Lecture 16. October 29, 2007

Homework 1 Solutions

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

Splitting a predictor at the upper quarter or third and the lower quarter or third

Mid-term exam Practice problems

Handbook of Regression Analysis

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Comparing IRT with Other Models

Introduction to Generalized Models

Week 5: Classification

Introduction to Signal Detection and Classification. Phani Chavali

Post-Estimation Uncertainty

STAT 7030: Categorical Data Analysis

Applied Machine Learning Annalisa Marsico

Lecture 9: Classification, LDA

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

Sampling distributions and the Central Limit. Theorem. 17 October 2016

Lecture 9: Classification, LDA

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

Introduction to Logistic Regression

Linear Regression. Data Model. β, σ 2. Process Model. ,V β. ,s 2. s 1. Parameter Model

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours

Final Exam. Name: Solution:

STAT5044: Regression and Anova

Categorical data analysis Chapter 5

Lecture 9: Classification, LDA

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Logistic regression modeling the probability of success

Review of Panel Data Model Types Next Steps. Panel GLMs. Department of Political Science and Government Aarhus University.

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

Probabilities and distributions

Lecture Slides for INTRODUCTION TO. Machine Learning. ETHEM ALPAYDIN The MIT Press,

Model comparison. Patrick Breheny. March 28. Introduction Measures of predictive power Model selection

Chapter 11. Regression with a Binary Dependent Variable

Business Statistics. Lecture 10: Course Review

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model

Contents. Part I: Fundamentals of Bayesian Inference 1

Introduction to GSEM in Stata

Big Data Analytics: Evaluating Classification Performance April, 2016 R. Bohn. Some overheads from Galit Shmueli and Peter Bruce 2010

Econometrics Problem Set 10

Model Assumptions; Predicting Heterogeneity of Variance

Interpreting and using heterogeneous choice & generalized ordered logit models

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

Computational paradigms for the measurement signals processing. Metodologies for the development of classification algorithms.

Data Mining algorithms

Q1 (12 points): Chap 4 Exercise 3 (a) to (f) (2 points each)

Logistic Regression Models for Multinomial and Ordinal Outcomes

Data-analysis and Retrieval Ordinal Classification

ECON 594: Lecture #6

Introduction To Logistic Regression

Generalized linear models III Log-linear and related models

8 Nominal and Ordinal Logistic Regression

Linear Discriminant Analysis Based in part on slides from textbook, slides of Susan Holmes. November 9, Statistics 202: Data Mining

Dynamic Generalized Linear Models

POLI 618 Notes. Stuart Soroka, Department of Political Science, McGill University. March 2010

Model generation and model selection in credit scoring

Introducing Generalized Linear Models: Logistic Regression

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models

Transcription:

Multiple : Categorical Johan A. Elkink School of Politics & International Relations University College Dublin 28 November 2016

1 2 3 4

Outline 1 2 3 4

models models have a variable consisting of two categories. For example, Vote on a particular law Turning out in an election Approval in a referendum Bankrupt or not

Limited When a variable is not continuous, or is truncated for some reason, a linear model would lead to implausible predictions. For binary we estimate the probability of observing a one: Prediction below 0 and above 1 would not make sense. For any case where the predicted probability is already high, it cannot increase much with a change in X (and vice versa for low probabilities). A linear model would imply high levels of heteroskedasticity.

Estimators A typical approach is to have an estimator that is linear in the parameters i.e. it generates a linear prediction based on X and β but then transforms this linear prediction into one bounded between 0 and 1. Predicted probability 0.0 0.2 0.4 0.6 0.8 1.0 transformation Pr(Y = 1) = 0.5 6 4 2 0 2 4 6 Linear prediction

Outline 1 2 3 4

The most common transformation is the logistic transformation, which relates to the log-odds: log ( ) Pr(yi = 1) = β 1 + β 2 x i1 + β 3 x i2, Pr(y i = 0) which can also be formulated as: Pr(y i = 1) = 1 1 + e (β 1+β 2 x i1 +β 3 x i2 ).

Estimating a logistic Estimating a logistic is straightforward and output will look similar to that of linear. E.g. explaining Yes in the Marriage Equality Referendum. Note the use of continuous and discreet in. Age 25-34 0.152 (0.410) 35-44 0.707 (0.386) 45-54 0.865 (0.390) 55-64 1.084 (0.399) 65+ 1.857 (0.374) Urban 0.305 (0.168) Pro-abortion 0.221 attitude (0.028) intercept 0.358 (0.372) N 851

Outline 1 2 3 4

Derivatives For linear, we interpret using the first derivative i.e. the effect of X on Y is: ŷ x j = β j. In a logistic, however, the derivative is more complicated: ˆπ x j = β j ˆπ(1 ˆπ). Because of the non-linear relationship, the effect of X on Y depends on all other in. Nevertheless, a quick method to interpret logit coefficients is to divide them by 4 to get the slope at ˆπ = 0.5.

Graphical interpretation An alternative method is to plot the relationship between one x and π, holding the other values of X constant (e.g. at the mean, median, etc.). 2 4 6 8 0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 Predicted probability of a yes vote (for 35 44 year old) Pro abortion attitude Probability of yes vote Urban Rural Because the link function g(xβ) is not linear (but instead g(xβ) = 1 1+e Xβ ), the effect of X on y depends on all X.

Fitted values A third useful way of interpreting logit coefficients is by describing typical cases or interesting examples. Age Region P(Yesvote) 18 24 Urban 0.89 35 44 Urban 0.79 65+ Urban 0.59 18 24 Rural 0.85 35 44 Rural 0.74 65+ Rural 0.51 (This assumes attitude towards abortion at the median value.)

Presentation Bottom line: it is much better to present interpretable and understandable inferences, with an indication of the level of uncertainty, than to present simply estimated coefficients. E.g. An increase in automobile support for a Republican senator from $10000 to $20000 in total increases his or her probability to vote for the Corporate Average Fuel Economy standard bill by 11%, give or take 7%, all else equal.

Outline 1 2 3 4

R 2 for logistic Although various authors have proposed pseudo-r 2 estimators that roughly do the same thing as an R 2 for linear, there is no good alternative. They cannot be interpreted as the proportion of variance in Y explained. Instead, it is typically better to look at the quality of the predictions do I get high Pr(Y = 1) for the observed ones in the data and low Pr(Y = 1) for the observed zeros?

Confusion matrix Evaluating the performance of the binary model can be done by using the confusion matrix: Prediction True value 1 0 1 True positive False positive (Type I error) 0 False negative True negative (Type II error)

Confusion matrix Evaluating the performance of the binary model can be done by using the confusion matrix: Prediction True value 1 0 1 True positive False positive Precision: TP TP+FP (Type I error) 0 False negative True negative (Type II error) Sensitivity: TP TN TP+FN Specificity: FP+TN Accuracy: TP+TN N TPR: TP TP+FN FPR: FP FP+TN

Receiver Operating Characteristic curve The accuracy of predictions will depend on the threshold probability variations on default of ˆπ = 0.5 are possible. Depending on the application, it might be better or worse to over- or underestimate ones relative to zeros. The ROC-curve plots, for all possible thresholds, the true positive rate against the false positive rate. An ROC-curve further from (above) the 45 degree line indicates a better predictive performance; any predictions under this line indicate worse than random prediction.

Receiver Operating Characteristic curve Receiver Operating Characteristic curve True positive rate 0.0 0.2 0.4 0.6 0.8 1.0 With age Without age 0.0 0.2 0.4 0.6 0.8 1.0 False positive rate Given the above, we can also calculate the area under the ROC-curve as a measure of prediction quality, called AUC. This is somewhat related to the Gini coefficient for income distributions (G = 2AUC 1).

Variations is the most common and easiest to use. Other models exist for specific uses: Probit similar to logistic, but with less fat tails. Ordered probit similar to probit, but with multiple category ordinal variable. Multinomial logistic similar to logistic, but for a multiple category nominal variable. Poisson / negative binomial models for discrete, positive, such as counts.