Logit Regression and Quantities of Interest


Logit Regression and Quantities of Interest
Stephen Pettigrew
March 4, 2015

Outline
1 Logistics
2 Generalized Linear Models
3 Our example
4 Logit Regression
5 Quantities of Interest
6 Other useful R functions

Logistics

Replication Assignment

You should be in the process of replicating your article. If you haven't started trying to get the data for the article, what are you waiting for?

The replication and preliminary write-up are due Wednesday, March 25, at 6pm. On that day you'll have to submit:
- All data required to do the replication
- A file of R code that replicates all relevant tables, graphs, and numbers in the paper
- A PDF of the original paper
- A PDF of all the tables, graphs, and numbers that you replicated. These should look as close to what's in the original paper as possible. This document should also include a brief description of what you plan to write your final paper about.

New problem set

The new problem set and assessment question will be posted by this evening. It's not due until March 25 (the week after spring break).

Problems with the last problem set

Remember to count the number of parameters you're estimating. When you use optim() or test your function, your vector of starting values should have this many elements in it. So if your model is:

Y_i ~ N(µ_i, σ²)
µ_i = X_i β

how many parameters are you estimating? In the case of question 3 on the problem set, it's 21 (19 X covariates, 1 intercept term, σ²).

Problems with the last problem set

If your model is:

Y_i ~ N(µ_i, σ_i²)
µ_i = X_i β
σ_i² = e^{Z_i γ}

how many are you estimating? In the case of the assessment question, it's 23 (19 X covariates, 1 intercept term for the mean, 2 Z covariates, 1 intercept term for the variance).
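The counting above is easy to sanity-check in code. A minimal sketch (the names k, n.par.1, and n.par.2 are mine; k = 19 is the covariate count from the problem set):

```r
# Count parameters so the optim() starting-value vector has the right length.
k <- 19
n.par.1 <- k + 1 + 1      # betas + intercept + sigma^2            -> 21
n.par.2 <- k + 1 + 2 + 1  # betas + intercept + gammas + intercept -> 23
start.vals <- rep(0, n.par.1)
length(start.vals)  # 21
```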

Generalized Linear Models

Generalized Linear Models

All of the models we've talked about so far (and for most of the rest of the class) belong to a class called generalized linear models (GLMs). Three elements of a GLM (equivalent to Gary's stochastic and systematic components):
- A distribution for Y (stochastic component)
- A linear predictor Xβ (systematic component)
- A link function that relates the linear predictor to the parameters of the distribution (systematic component)

1. Specify a distribution for Y

Assume our data was generated from some distribution. Examples:
- Continuous and unbounded: Normal
- Binary: Bernoulli
- Event count: Poisson, negative binomial
- Duration: Exponential
- Ordered categories: Normal with observation mechanism
- Unordered categories: Multinomial

2. Specify a linear predictor

We are interested in allowing some parameter of the distribution, θ, to vary as a (linear) function of covariates, so we specify a linear predictor:

Xβ = β_0 + x_1 β_1 + x_2 β_2 + ... + x_k β_k

This is analogous to how you would specify an OLS model. You can have interaction terms, squared terms, etc.

3. Specify a link function

The link function is a one-to-one, continuously differentiable function that relates the linear predictor to some parameter θ of the distribution for Y (almost always the mean). Let g(·) be the link function and let E(Y) = θ be the mean of the distribution for Y:

g(θ) = Xβ
θ = g⁻¹(Xβ)

Note that we usually use the inverse link function g⁻¹(Xβ) rather than the link function itself; g⁻¹(Xβ) is the systematic component that we've been talking about all along. (In some texts, g⁻¹(·) is referred to as the link function.)

Examples of Link Functions

Identity:  link: µ = Xβ;             inverse link: µ = Xβ
Logit:     link: ln(π/(1−π)) = Xβ;   inverse link: π = 1 / (1 + e^{−Xβ})
Probit:    link: Φ⁻¹(π) = Xβ;        inverse link: π = Φ(Xβ)
Log:       link: ln(λ) = Xβ;         inverse link: λ = exp(Xβ)
Inverse:   link: λ⁻¹ = Xβ;           inverse link: λ = (Xβ)⁻¹
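To make the link/inverse-link pairing concrete, here is a minimal base-R sketch of the logit pair (the function names are mine; boot::inv.logit, used later, does the same job as the second one):

```r
# Logit link: g(pi) = ln(pi / (1 - pi)) = X beta
logit <- function(p) log(p / (1 - p))
# Inverse logit link: g^{-1}(X beta) = 1 / (1 + e^{-X beta})
inv.logit <- function(xb) 1 / (1 + exp(-xb))

inv.logit(logit(0.25))  # round-trips back to 0.25
inv.logit(0)            # X beta = 0 maps to probability 0.5
```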

Our example

The Data: Congressional Elections

Today we're going to use election data from the 2004 through 2010 congressional House general elections. Available at: tinyurl.com/kbanga2

The Data: Congressional Elections

We'll estimate a model using the 2004-2008 data, and then use the results to predict results in 2010. votes0408.dta is the data that we're going to use to estimate our model. You can read the data into R by using the read.dta() function in the foreign package:

setwd("c:/.../your working directory")
install.packages("foreign")
require(foreign)
votes <- read.dta("votes0408.dta")

Election Data

What's in the data?

> head(votes)
  incwin open freshman incpres
1      1    0        1   64.77
2      1    0        0   66.97
3      1    0        1   58.59
4      1    0        0   71.73
5      1    0        0   39.77
6      1    0        1   64.53

> apply(votes, 2, summary)
        incwin    open freshman incpres
Min.    0.0000 0.00000   0.0000   30.10
1st Qu. 1.0000 0.00000   0.0000   52.98
Median  1.0000 0.00000   0.0000   58.09
Mean    0.9393 0.08824   0.1233   59.08
3rd Qu. 1.0000 0.00000   0.0000   64.26
Max.    1.0000 1.00000   1.0000   95.00

- incwin: dependent variable; did the incumbent (or their party) win this election? {0,1}
- open: is the incumbent running for reelection? {0,1}
- freshman: is the incumbent in his/her first term? {0,1}
- incpres: in the last election, what percentage of votes did the presidential candidate from the incumbent's party receive in this district? (0,100)

Logit Regression

Binary Dependent Variable

Our outcome variable is whether or not the incumbent's party won the House general election. What's the first question we should ask ourselves when we start to model this dependent variable?

1. Specify a distribution for Y:

Y_i ~ Bernoulli(π_i)

p(y | π) = ∏_{i=1}^n π_i^{y_i} (1 − π_i)^{1−y_i}

2. Specify a linear predictor:

X_i β = β_0 + X_{i,open} β_1 + X_{i,freshman} β_2 + X_{i,incpres} β_3

3. Specify a link (or inverse link) function.

Logit: π_i = 1 / (1 + e^{−X_i β})

We could also have chosen several other options:
- Probit: π_i = Φ(X_i β)
- Complementary log-log (cloglog): π_i = 1 − exp(−exp(X_i β))
- Scobit: π_i = (1 + e^{−X_i β})^{−α}

Our model

In the notation of UPM, this is the model we've just defined:

Y_i ~ Bernoulli(π_i)
π_i = 1 / (1 + e^{−X_i β})

where X_i β = β_0 + X_{i,open} β_1 + X_{i,freshman} β_2 + X_{i,incpres} β_3

Log-likelihood of the logit

p(y | π) = ∏_{i=1}^n π_i^{y_i} (1 − π_i)^{1−y_i}

L(π | y) = ∏_{i=1}^n π_i^{y_i} (1 − π_i)^{1−y_i}

ℓ(π | y) = log [ ∏_{i=1}^n π_i^{y_i} (1 − π_i)^{1−y_i} ] = ∑_{i=1}^n log [ π_i^{y_i} (1 − π_i)^{1−y_i} ]

Log-likelihood of the logit

ℓ(β | y, X) = ∑_{i=1}^n log [ π_i^{y_i} (1 − π_i)^{1−y_i} ]

= ∑_{i=1}^n log(π_i^{y_i}) + log((1 − π_i)^{1−y_i})

= ∑_{i=1}^n y_i log(π_i) + (1 − y_i) log(1 − π_i)

= ∑_{i=1}^n y_i log( 1 / (1 + e^{−X_i β}) ) + (1 − y_i) log( 1 − 1 / (1 + e^{−X_i β}) )

Log-likelihood of the logit

= ∑_{i=1}^n y_i log( 1 / (1 + e^{−X_i β}) ) + (1 − y_i) log( e^{−X_i β} / (1 + e^{−X_i β}) )

= ∑_{i=1}^n y_i [ log( 1 / (1 + e^{−X_i β}) ) − log( e^{−X_i β} / (1 + e^{−X_i β}) ) ] + log( e^{−X_i β} / (1 + e^{−X_i β}) )

= ∑_{i=1}^n y_i log( (1 / (1 + e^{−X_i β})) · ((1 + e^{−X_i β}) / e^{−X_i β}) ) + log(e^{−X_i β}) − log(1 + e^{−X_i β})

= ∑_{i=1}^n y_i log( 1 / e^{−X_i β} ) + log(e^{−X_i β}) − log(1 + e^{−X_i β})

Log-likelihood of the logit

= ∑_{i=1}^n y_i log(e^{X_i β}) + log(e^{−X_i β}) − log(1 + e^{−X_i β})

= ∑_{i=1}^n y_i X_i β − X_i β − log(1 + e^{−X_i β})

There are ways to simplify it further, but we'll leave it here for now.

Coding our log-likelihood function

∑_{i=1}^n y_i X_i β − X_i β − log(1 + e^{−X_i β})

logit.ll <- function(par, outcome, covariates){
  if(!all(covariates[,1] == 1)){
    covariates <- as.matrix(cbind(1, covariates))
  }
  xb <- covariates %*% par
  sum(outcome * xb - xb - log(1 + exp(-xb)))
}

Finding the MLE

opt <- optim(par = rep(0, ncol(votes[,2:4]) + 1),
             fn = logit.ll,
             covariates = votes[,2:4],
             outcome = votes$incwin,
             control = list(fnscale = -1),
             hessian = T,
             method = "BFGS")

Point estimate of the MLE:

opt$par
[1] -2.9064379 -2.1266744 -0.3568115  0.1112137

Standard errors of the MLE

Recall from last week that the standard errors are defined as the square root of the diagonal of:

[ −∂²ℓ/∂β² ]⁻¹

where ∂²ℓ/∂β² is the Hessian matrix, evaluated at the MLE.

Standard errors of the MLE

Variance-covariance matrix:

-solve(opt$hessian)
            [,1]          [,2]          [,3]          [,4]
[1,]  0.72666757  0.0144430061 -0.0449794968 -0.0136690621
[2,]  0.01444301  0.1057899296  0.0328375038 -0.0009480586
[3,] -0.04497950  0.0328375038  0.1599947539  0.0002239472
[4,] -0.01366906 -0.0009480586  0.0002239472  0.0002695985

Standard errors:

sqrt(diag(-solve(opt$hessian)))
[1] 0.85244799 0.32525364 0.39999344 0.01641946

Quantities of Interest

Interpreting Logit Results

Here's a nicely formatted table with the regression results from our model:

DV: Incumbent Election Win
            Coefficient     SE
Intercept        -2.906  0.852
Open Seat        -2.127  0.325
Freshman         -0.357  0.400
Pres Vote         0.111  0.016

This table is total garbage.

Interpreting Logit Results

What the heck does it even mean for the coefficient on open seats to be -2.13? "For a one-unit increase in the open seat variable, there is a -2.13 unit change in the log odds of the incumbent winning." And what are log odds? Nobody thinks in terms of log odds, or probit coefficients, or exponential rates.

If there's one thing you take away from this class, it should be this: when you present results, always present your findings in terms of something that has substantive meaning to the reader. For binary outcome models that often means turning your results into predicted probabilities, which is what we'll do now.

If there's a second thing you should take away, it's this: always account for all types of uncertainty when you present your results. We'll spend the rest of today looking at how to do that.

Getting Quantities of Interest

How to present results in a better format than just coefficients and standard errors:
1 Write out your model and estimate β̂_MLE and the Hessian
2 Simulate from the sampling distribution of β̂_MLE to incorporate estimation uncertainty
3 Multiply these simulated βs by some covariates in the model to get Xβ̃
4 Plug Xβ̃ into your inverse link function, g⁻¹(Xβ̃), to put it on the same scale as the parameter(s) in your stochastic function
5 Use the transformed g⁻¹(Xβ̃) to take thousands of draws from your stochastic function and incorporate fundamental uncertainty
6 Store the mean of these simulations, E[y | X]
7 Repeat steps 2 through 6 thousands of times
8 Use the results to make fancy graphs and informative tables

Simulate from the sampling distribution of β̂_MLE

By the central limit theorem, we assume that

β̂_MLE ~ MVNorm(β̂, V̂(β̂))

- β̂ is the vector of our estimates for the parameters, opt$par
- V̂(β̂) is the variance-covariance matrix, -solve(opt$hessian)

We hope that the β̂s we estimated are good estimates of the true βs, but we know that they aren't exactly perfect because of estimation uncertainty. So we account for this uncertainty by simulating βs from the multivariate normal distribution defined above.

Simulate from the sampling distribution of β̂_MLE

Simulate one draw from MVNorm(β̂, V̂(β̂)):

# Install the mvtnorm package if you need to
install.packages("mvtnorm")
require(mvtnorm)
sim.betas <- rmvnorm(n = 1,
                     mean = opt$par,
                     sigma = -solve(opt$hessian))
sim.betas
          [,1]      [,2]       [,3]     [,4]
[1,] -4.233413 -2.389987 0.06549826 0.137009

Untransform Xβ̃

Now we need to choose some values of the covariates that we want predictions about. Let's make predictions about the 2010 data.

votes10 <- read.dta("votes10.dta")
votes10 <- as.matrix(cbind(1, votes10))

We'll first predict the outcome given the covariates in row 1 of votes10:

votes10[1,]
       1     open freshman  incpres
       1        0        1       37

Untransform Xβ̃

Now we multiply our covariates of interest, votes10[1,], by our simulated parameters:

votes10[1,] %*% t(sim.betas)
         [,1]
[1,] 1.087011

If we stopped right here we'd be making two mistakes:
1 1.09 is not the predicted probability (obviously); it's the predicted log odds
2 We haven't done a very good job of accounting for the uncertainty in the model

Untransform Xβ̃

To turn Xβ̃ into a predicted probability we need to plug it into our inverse link function, which was 1 / (1 + exp(−Xβ)):

1 / (1 + exp(-votes10[1,] %*% t(sim.betas)))
          [,1]
[1,] 0.7478185

or

require(boot)
inv.logit(votes10[1,] %*% t(sim.betas))
          [,1]
[1,] 0.7478185

Simulate from the stochastic function

Now we have to account for fundamental uncertainty by simulating from the original stochastic function, Bernoulli:

sim.xb <- inv.logit(votes10[1,] %*% t(sim.betas))
draws <- rbinom(n = 10000, size = 1, prob = sim.xb)
mean(draws)
[1] 0.7464

Is 0.7464 our best guess at the predicted probability of the incumbent winning for this election? Nope.

Store and repeat

Remember, we only took 1 draw of our βs from the multivariate normal distribution. To fully account for estimation uncertainty, we need to take tons of draws of β. To do this we'd need to loop over all the steps I just went through and get the full distribution of predicted probabilities for this case.
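That loop might look like the following base-R sketch. It is self-contained, so beta.hat and vcov are hypothetical stand-ins for opt$par and -solve(opt$hessian), and the multivariate normal draw uses a Cholesky factor instead of mvtnorm::rmvnorm:

```r
set.seed(2138)

# Hypothetical stand-ins for opt$par and -solve(opt$hessian)
beta.hat <- c(-2.91, -2.13, -0.36, 0.11)
vcov <- diag(c(0.727, 0.106, 0.160, 0.00027))
x <- c(1, 0, 1, 37)  # intercept, open, freshman, incpres

n.sims <- 1000
pred.prob <- numeric(n.sims)
for (s in 1:n.sims) {
  # Step 2: one draw from MVNorm(beta.hat, vcov) via the Cholesky factor
  beta.tilde <- beta.hat + drop(t(chol(vcov)) %*% rnorm(length(beta.hat)))
  # Steps 3-4: linear predictor, then untransform with the inverse logit
  pi.tilde <- 1 / (1 + exp(-sum(x * beta.tilde)))
  # Steps 5-6: Bernoulli draws for fundamental uncertainty; store the mean
  pred.prob[s] <- mean(rbinom(1000, size = 1, prob = pi.tilde))
}
mean(pred.prob)
quantile(pred.prob, c(.025, .975))
```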

Speeding up the process

Or, instead of using a loop, let's just vectorize our code:

sim.betas <- rmvnorm(n = 10000,
                     mean = opt$par,
                     sigma = -solve(opt$hessian))
head(sim.betas)
          [,1]      [,2]         [,3]      [,4]
[1,] -2.550772 -2.399970 -1.053165235 0.1068195
[2,] -3.152271 -1.892773  0.239023294 0.1196752
[3,] -3.736406 -2.089426  0.009018314 0.1230741
[4,] -3.467353 -2.245585 -0.060595098 0.1237179
[5,] -2.633138 -2.258514 -0.184834591 0.1097837
[6,] -2.304798 -2.212776 -0.303566255 0.1006882

dim(sim.betas)
[1] 10000     4

Speeding up the process

Now multiply your 1x4 vector of covariates of interest by the transpose of the 10000x4 β matrix:

pred.prob <- votes10[1,] %*% t(sim.betas)

And untransform them:

pred.prob <- c(inv.logit(pred.prob))
head(pred.prob)
[1] 0.6424987 0.6941857 0.6672079 0.6771491 0.6848743

Speeding up the process

Shortcut: for this particular example (and only with binary outcomes) we can skip the last step of simulating from the stochastic component. Why? Because E(y | X) = π; i.e., the parameter is the expected value of the outcome, and that's all we're interested in in this case.

When doesn't this work?

Suppose that y_i ~ Expo(λ_i) where λ_i = exp(X_i β). We could write our likelihood and then optimize to find β̂. Then, for some covariates of interest, X_i, we have a simulated sampling distribution for λ_i which has a mean at E[exp(X_i β̂)].

If y ~ Expo(λ), then E(y) = 1/λ. You might be tempted to say that because E[λ̂_i] = E[exp(X_i β̂)], then Ê[y] = 1 / E[exp(X_i β̂)].

It turns out this isn't the case, because E[1/λ̂] ≠ 1/E[λ̂]. Jensen's inequality: given a random variable X, E[g(X)] ≠ g(E[X]) in general; it's ≤ if g(·) is concave and ≥ if g(·) is convex.

Rule of thumb: if E[Y] = θ, you are safe taking the shortcut.
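A quick simulation makes Jensen's point concrete. This is a sketch with made-up numbers: lam stands in for draws from the sampling distribution of λ̂, here arbitrarily lognormal:

```r
# Since g(x) = 1/x is convex, E[1/lambda] >= 1/E[lambda]:
# averaging 1/lambda over the draws is not the same as inverting the average.
set.seed(1)
lam <- exp(rnorm(100000, mean = 0, sd = 1))  # simulated lambda-hat draws
wrong <- 1 / mean(lam)   # shortcut: invert the average rate
right <- mean(1 / lam)   # average the expected durations draw by draw
c(wrong, right)          # the two disagree noticeably
```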

Look at Our Results

[Histogram of the simulated predicted probabilities of incumbent victory for a freshman incumbent where the presidential candidate got 37%; x-axis: predicted probability, roughly 0.3 to 0.9.]

Look at Our Results

mean(pred.prob)
[1] 0.6960111

quantile(pred.prob, prob = c(.025, .975))
     2.5%     97.5%
0.5126669 0.8422167

mean(pred.prob > .5)
[1] 0.979

Different QOI

What if our QOI was how many incumbents we expect to win in 2010? We'll follow the same steps, but multiply by all 365 observations in votes10 instead of just the first.

Different QOI

sim.betas <- rmvnorm(n = 10000,
                     mean = opt$par,
                     sigma = -solve(opt$hessian))

Now what?

pred.prob <- inv.logit(votes10 %*% t(sim.betas))

What's the dimensionality of pred.prob? 365 x 10000

Different QOI

Now let's turn pred.prob into something useful:

results <- colSums(pred.prob > .5)

What is results? For each of the 10,000 simulations, the number of the 365 races where the incumbent's predicted probability of winning exceeds .5.

Different QOI

[Histogram: projections of the 2010 elections; x-axis: number of seats won by the incumbent's party (out of 365 contested races), roughly 345 to 365; y-axis: proportion of simulations.]

Other useful R functions

glm()

Estimating a logit model using glm() (note that the link is specified inside the family argument, not as a separate argument):

model.glm <- glm(incwin ~ open + freshman + incpres,
                 data = votes, family = binomial(link = "logit"))

Logit is the default link function for the binomial family, but you could also specify probit here if you wanted.

> coef(model.glm)
(Intercept)        open    freshman     incpres
 -2.8935308  -2.1262600  -0.3570745   0.1109699

> opt$par
[1] -2.9064379 -2.1266744 -0.3568115  0.1112137

Zelig

Zelig is an R package that Gary developed a few years ago which is meant to streamline estimating models and getting quantities of interest. There are three main functions you'll use in Zelig:
1 Estimate your model using the zelig() function
2 Set your covariates of interest using the setx() function
3 Simulate QOI using the sim() function

Zelig

Estimate your model:

#install.packages("Zelig")
require(Zelig)
model.zelig <- zelig(incwin ~ open + freshman + incpres,
                     data = votes, model = "logit")

Set your covariates:

values <- setx(model.zelig, open = 0, freshman = 1, incpres = 37)

Simulate your QOI:

sims <- sim(model.zelig, values)

Zelig

[Default Zelig plots: predicted values Y|X (bars at Y=0 and Y=1) and the density of expected values E(Y|X), roughly 0.4 to 0.9.]

Next week: probit, ordered probit, first differences, marginal effects

Questions?