Advanced Quantitative Methods: limited dependent variables
|
|
- Cecilia Hubbard
- 5 years ago
- Views:
Transcription
1 Advanced Quantitative Methods: Limited Dependent Variables I University College Dublin 2 April 2013
2
3 Outline Model Measurement levels
4 Components Model Measurement levels Two components of the model: y N(µ,σ 2 ) Stochastic µ = Xβ Systematic
5 Components Model Measurement levels Two components of the model: y N(µ,σ 2 ) Stochastic µ = Xβ Systematic Generalised version (not necessarily linear): y f(µ,α) Stochastic µ = g(x,β) Systematic (King 1998, 8)
6 Components Model Measurement levels y f(µ,α) Stochastic µ = g(x,β) Systematic Stochastic component: varies over repeated (hypothetical) observations on the same unit. Systematic component: varies across units, but constant given X. (King 1998, 8)
7 Uncertainty Model Measurement levels y f(µ,α) Stochastic µ = g(x,β) Systematic Two types of uncertainty: Estimation uncertainty: lack of knowledge about α and β; can be reduced by increasing n.
8 Uncertainty Model Measurement levels y f(µ,α) Stochastic µ = g(x,β) Systematic Two types of uncertainty: Estimation uncertainty: lack of knowledge about α and β; can be reduced by increasing n. Fundamental uncertainty: represented by stochastic component and exists independent of researcher.
9 Levels of measurement Model Measurement levels Discreet Continuous party choice Interval Ratio Categories in no particular order (examples in cells)
10 Levels of measurement Model Measurement levels Discreet Continuous party choice - education level - - Interval Ratio Categories in a specific order (examples in cells)
11 Levels of measurement Model Measurement levels Discreet Continuous party choice - education level - - Interval how likely to vote... temperature Ratio All values possible (examples in cells)
12 Levels of measurement Model Measurement levels Discreet Continuous party choice - education level - - Interval how likely to vote... temperature Ratio deaths in war ideological distance All values possible, with a meaningful zero point (examples in cells)
13 Levels of measurement Model Measurement levels Discreet Continuous party choice - education level - turnout - Interval how likely to vote... temperature Ratio deaths in war ideological distance Two categories, coded as 0 and 1 (examples in cells)
14 Limited dependent variables Model Measurement levels When a dependent variable is not continuous, or is truncated for some reason, a linear model would lead to implausible predictions.
15 Limited dependent variables Model Measurement levels When a dependent variable is not continuous, or is truncated for some reason, a linear model would lead to implausible predictions. E.g. regressing whether someone voted on a set of independent variables will give somewhat reasonable estimates (see previous homework), but using these estimates to calculate predictions leads to meaningless predictions.
16 Limited dependent variables Model Measurement levels When a dependent variable is not continuous, or is truncated for some reason, a linear model would lead to implausible predictions. E.g. regressing whether someone voted on a set of independent variables will give somewhat reasonable estimates (see previous homework), but using these estimates to calculate predictions leads to meaningless predictions. Furthermore, estimating limited dependent variable data with a linear model implies serious heteroscedasticity.
17 Outline Logistic regression Interpretation Probit regression
18 models Logistic regression Interpretation Probit regression models have a dependent variable consisting of two categories.
19 models Logistic regression Interpretation Probit regression models have a dependent variable consisting of two categories. For example, Vote on a particular law Turning out in an election Approval in a referendum Bankrupt or not
20 Logistic regression Logistic regression Interpretation Probit regression Stochastic: y i = Bernoulli(π i ) = π y i i (1 π i ) 1 y i
21 Logistic regression Logistic regression Interpretation Probit regression Stochastic: y i = Bernoulli(π i ) = π y i i (1 π i ) 1 y i Variance of stochastic component: V(y i ) = π i (1 π i )
22 Logistic regression Logistic regression Interpretation Probit regression Stochastic: y i = Bernoulli(π i ) = π y i i (1 π i ) 1 y i Variance of stochastic component: V(y i ) = π i (1 π i ) Systematic: π i = g(x i β) = 1 1+e x iβ
23 Logistic distribution Logistic regression Interpretation Probit regression 1/(1 + exp( x)) x x 0.5x x
24 Logistic regression Logistic regression Interpretation Probit regression Resulting predictions: are bounded between 0 and 1
25 Logistic regression Logistic regression Interpretation Probit regression Resulting predictions: are bounded between 0 and 1 show limited effect at extremes
26 Logistic regression Logistic regression Interpretation Probit regression Resulting predictions: are bounded between 0 and 1 show limited effect at extremes are a smooth, monotone translation from linear prediction X i β
27 Loglikelihood function Logistic regression Interpretation Probit regression Systematic part: π i = 1 1+e x iβ Stochastic part: P(y i = 1) = π y i i (1 π i ) 1 y i n f(y π i ) = π y i i (1 π i ) 1 y i i=1 l(y) = log = n i=1 π y i i (1 π i ) 1 y i n y i logπ i + i=1 n (1 y i )log(1 π i ) i=1
28 Logistic regression in R Logistic regression Interpretation Probit regression model <- glm(y ~ x1 + x2 + x3, data=data, family=binomial(link="logit")) summary(model)
29 Exercise Logistic regression Interpretation Probit regression Using the asiabaro.dta data set, estimate a logistic regression explaining abstention in elections by trust in the government, satisfaction with democracy, gender, urbanisation, and preference for democracy.
30 Graphical interpretation Logistic regression Interpretation Probit regression Useful for plotting relationship between one x and π, holding the other values of X constant
31 Graphical interpretation Logistic regression Interpretation Probit regression Useful for plotting relationship between one x and π, holding the other values of X constant (e.g. at the mean, median, etc.).
32 Graphical interpretation Logistic regression Interpretation Probit regression Useful for plotting relationship between one x and π, holding the other values of X constant (e.g. at the mean, median, etc.). Because the link function g(xβ) is not linear in this case (but instead g(xβ) = 1 1+e Xβ ), the effect of X on y depends on all X.
33 Graphical: example in R Logistic regression Interpretation Probit regression m <- glm(y ~ x1 + x2 + x3, family=binomial(link="logit")) curve(1/(1+exp(-(coef(m)[1] + coef(m)[2]*x + coef(m)[3]*mean(x2) + coef(m([4]*mean(x3)))), from=0, to=1)
34 Graphical: example Logistic regression Interpretation Probit regression jitter(vote)
35 Fitted values Logistic regression Interpretation Probit regression A second useful way of interpreting logit regression coefficients is by describing typical cases or interesting examples.
36 Fitted values Logistic regression Interpretation Probit regression A second useful way of interpreting logit regression coefficients is by describing typical cases or interesting examples. Party Amount P(Vote = 1) 95% C.I. Republican $ [.69,.95] Democrat $ [.27,.65] Republican $ [.84,.99] Democrat $ [.44,.95]
37 First differences Logistic regression Interpretation Probit regression Basic idea is to calculate and present: ˆπ = g(xβ) g(x β), whereby X differs only in one variable from X.
38 First differences Logistic regression Interpretation Probit regression Basic idea is to calculate and present: ˆπ = g(xβ) g(x β), whereby X differs only in one variable from X. Variable Values Diff 95% C.I. Party Rep, Dem -.34 [-.53,-.11] Amount $10000, $ [.03,.20]
39 Derivatives Logistic regression Interpretation Probit regression δˆπ X j = β jˆπ(1 ˆπ)
40 Derivatives Logistic regression Interpretation Probit regression δˆπ X j = β jˆπ(1 ˆπ) Hence a quick method to interpret logit coefficients is to divide them by 4 to get the slope at ˆπ =.5.
41 Presentation Logistic regression Interpretation Probit regression Bottom line: it is much better to present interpretable and understandable inferences, with an indication of the level of uncertainty, than to present simply estimated coefficients.
42 Presentation Logistic regression Interpretation Probit regression E.g. An increase in automobile support for a Republican senator from $10000 to $20000 in total increases his or her probability to vote for the Corporate Average Fuel Economy standard bill by 11%, give or take 7%, all else equal.
43 Predictions check Logistic regression Interpretation Probit regression One method of checking the predictions is to Group the cases in bins based on their predicted probabilities
44 Predictions check Logistic regression Interpretation Probit regression One method of checking the predictions is to Group the cases in bins based on their predicted probabilities, e.g. 0-10%, 10-20%, etc.
45 Predictions check Logistic regression Interpretation Probit regression One method of checking the predictions is to Group the cases in bins based on their predicted probabilities, e.g. 0-10%, 10-20%, etc. Plot the bins against the number of ones found in this bin.
46 Predictions check Logistic regression Interpretation Probit regression One method of checking the predictions is to Group the cases in bins based on their predicted probabilities, e.g. 0-10%, 10-20%, etc. Plot the bins against the number of ones found in this bin. The closer this is to the 45 degree line, the more accurate the predictions of the probabilities.
47 Predictions check Logistic regression Interpretation Probit regression One method of checking the predictions is to Group the cases in bins based on their predicted probabilities, e.g. 0-10%, 10-20%, etc. Plot the bins against the number of ones found in this bin. The closer this is to the 45 degree line, the more accurate the predictions of the probabilities. In R, you can use the custom function plot.predicted().
48 Logistic regression Interpretation Probit regression m <- glm(..., family = binomial(link = "logit")) phat <- predict(m, type = "response") pgroups <- cut(phat, seq(0, 1,.1)) hist(phat, xlim = c(0,1)) barplot(prop.table(table(pgroups)), ylim = c(0,1))
49 Predictions check Logistic regression Interpretation Probit regression Proportion of outcomes positive Predicted probability
50 Predictions check Logistic regression Interpretation Probit regression 0 s correctly predicted s correctly predicted
51 Exercise Logistic regression Interpretation Probit regression Using the logistic previous logistic regression: 1 What is the slope of the regression line with respect to preference for democracy at ˆπ =.5? 2 Calculate predicted differences for urban vs rural respondent. 3 Calculate first differences for urban vs rural and female vs male. 4 Plot predicted probabilities as a function of preference for democracy.
52 Threshold models Logistic regression Interpretation Probit regression Imagine we have a latent, unobserved variable: y f(µ) µ = Xβ
53 Threshold models Logistic regression Interpretation Probit regression Imagine we have a latent, unobserved variable: y f(µ) µ = Xβ With the following observation mechanism: { 1 if yi > 0 y i = 0 if yi 0.
54 Threshold models Logistic regression Interpretation Probit regression If f(µ i ) is the standardised logistic distribution, we replicate the logit model.
55 Threshold models Logistic regression Interpretation Probit regression If f(µ i ) is the standardised logistic distribution, we replicate the logit model. f(x i β) = e y i µ i (1+e y i µ i ) 2
56 Threshold models Logistic regression Interpretation Probit regression Alternatively, if f(µ i ) is the cumulative standard normal distribution (σ 2 = 1), we have the probit model.
57 Threshold models Logistic regression Interpretation Probit regression Alternatively, if f(µ i ) is the cumulative standard normal distribution (σ 2 = 1), we have the probit model. Hence β can now be interpreted as the effect of an increase by 1 in x on y, whereby the unit of y is one standard deviation.
58 Threshold models Logistic regression Interpretation Probit regression Alternatively, if f(µ i ) is the cumulative standard normal distribution (σ 2 = 1), we have the probit model. Hence β can now be interpreted as the effect of an increase by 1 in x on y, whereby the unit of y is one standard deviation. P(y i = 1 β,x i ) = Φ(x i β), whereby Φ(x) is the cumulative standard normal distribution (i.e. the surface between and x under a normal distribution with µ = 0 and σ 2 = 1).
59 Threshold models Logistic regression Interpretation Probit regression Alternatively, if f(µ i ) is the cumulative standard normal distribution (σ 2 = 1), we have the probit model. Hence β can now be interpreted as the effect of an increase by 1 in x on y, whereby the unit of y is one standard deviation. P(y i = 1 β,x i ) = Φ(x i β), whereby Φ(x) is the cumulative standard normal distribution (i.e. the surface between and x under a normal distribution with µ = 0 and σ 2 = 1). pnorm(xb); predict(m, type="response")
60 Threshold models Logistic regression Interpretation Probit regression
61 Loglikelihood function Logistic regression Interpretation Probit regression Systematic part: π i = Φ(x i β), where Φ(x) is the cumulative standard normal distribution. Stochastic part: P(y i = 1) = π y i i (1 π i ) 1 y i n f(y π i ) = π y i i (1 π i ) 1 y i l(y) = i=1 n y i logπ i + i=1 n (1 y i )log(1 π i ) i=1
62 Exercise Logistic regression Interpretation Probit regression Repeat the previous estimation using probit instead of logit. 1 Plot predicted probabilities with respect to preference for democracy. 2 Evaluate the predictive performance of this model. 3 Add suitdemoc as a dependent variable to the model. Does the predictive performance improve?
63 Outline
64 Ordered data Models where the dependent variable is categorical, and the categories are in a particular order.
65 Ordered data Models where the dependent variable is categorical, and the categories are in a particular order. We can thus take a similar latent variable approach.
66 Ordered probit y i N(µ i,1) µ i = x i β
67 Ordered probit y i N(µ i,1) µ i = x i β With the following observation mechanism: y i = j if τ j 1 y i τ j
68 Ordered probit y i N(µ i,1) µ i = x i β With the following observation mechanism: y i = j if τ j 1 y i τ j Or, alternatively, when we use dummy variables for each category: { 1 if τ j 1 yi τ j y ij = 0 otherwise.
69 Ordered probit y ij = { 1 if τ j 1 y i τ j 0 otherwise.
70 Ordered probit y ij = { 1 if τ j 1 y i τ j 0 otherwise. P(y i = j β,x i ) = Φ(τ j x i β) Φ(τ j 1 x i β)
71 Ordered probit
72 Ordered probit To estimate in R: library(mass) m <- polr(y ~ x1 + x2, method="probit")
73 Ordered probit To estimate in R: library(mass) m <- polr(y ~ x1 + x2, method="probit") To get predicted probabilities: pnorm(m$zeta[2] - xb) - pnorm(m$zeta[1] - xb) predict(m, type="probs")
74 Loglikelihood function Assuming y i can take the values 0,1,2,...,p. P(y i = 0) = Φ(τ 1 x i β) P(y i = p) = Φ(x i β τ p ) P(y i = j 0 < y i < p) = Φ(τ j+1 x i β) Φ(τ j x i β)
75 Loglikelihood function Assuming y i can take the values 0,1,2,...,p. P(y i = 0) = Φ(τ 1 x i β) P(y i = p) = Φ(x i β τ p ) P(y i = j 0 < y i < p) = Φ(τ j+1 x i β) Φ(τ j x i β) l(y) = log(φ(τ 1 x i β))+ log(φ(x i β τ p )) y i =0 y i =p + 0<y i <p log(φ(τ j+1 x i β) Φ(τ j x i β))
76 Exercise Using the asiabaro.dta data set, estimate a model explaining interest in politics by gender, education, and reliance on fate. 1 Perform t-tests for each of the independent variables. Use: se <- sqrt(diag(vcov(m))) 2 Check the standard errors of the τ values are the categories clearly separated? 3 Calculate the predicted probability for males and females of the respondent being very interested in politics.
77 Outline
78 Multinomial model Here the dependent variable consists of multiple categories, without particular order.
79 Multinomial model Here the dependent variable consists of multiple categories, without particular order. A latent variable approach will thus not work.
80 Multinomial model Here the dependent variable consists of multiple categories, without particular order. A latent variable approach will thus not work. The probabilities for a particular case to be in any of those categories still has to be one.
81 Multinomial logit R function: multinom, in package nnet.
82 Multinomial logit R function: multinom, in package nnet. m <- multinom(y ~ x1 + x2) summary(m)
83 Multinomial logit R function: multinom, in package nnet. m <- multinom(y ~ x1 + x2) summary(m) predict(m, type="probs")
84 Multinomial logit 1 p if j = 1 k=1 P(y i = j β,x i ) = ex i β k e x i β j p if j > 1 k=1 ex i β k
85 Multinomial logit 1 p if j = 1 k=1 P(y i = j β,x i ) = ex i β k e x i β j p if j > 1 k=1 ex i β k To get predicted probabilities for a new dataset (X ) - e.g. one that is constant on all variables but one:
86 Multinomial logit 1 p if j = 1 k=1 P(y i = j β,x i ) = ex i β k e x i β j p if j > 1 k=1 ex i β k To get predicted probabilities for a new dataset (X ) - e.g. one that is constant on all variables but one: m <- multinom(y ~ x1 + x2 + x3 + x4) Xb <- X %*% t(coef(m)) denominator <- 1 - rowsums(exp(xb)) probs <- exp(xb) / denominator baseline <- 1 - rowsums(p) p <- cbind(baseline, p)
87 Loglikelihood function 1 p if j = 1 k=1 P(y i = j β,x i ) = ex i β k e x i β j p if j > 1 k=1 ex i β k
88 Loglikelihood function 1 p if j = 1 k=1 P(y i = j β,x i ) = ex i β k e x i β j p if j > 1 k=1 ex i β k l(y) = n p p I(y i = j)x i β j log i=1 j=0 j=0 e x iβ j
89 Exercise Using data set asiabaro.dta, selecting only data from Taiwan, explain party choice by urbanisation, gender, age, and trust in the police. Estimate a multinomial model and interpret the results.
90 Outline
91 models Here the dependent variable is a count (of events). For example, The number of conflicts in a particular period The number of coups d état in the 1980s The number of visits to a psychologist for a respondent
92 models Here the dependent variable is a count (of events). For example, The number of conflicts in a particular period The number of coups d état in the 1980s The number of visits to a psychologist for a respondent Note: Truncated at zero (no negative outcome possible)
93 models Here the dependent variable is a count (of events). For example, The number of conflicts in a particular period The number of coups d état in the 1980s The number of visits to a psychologist for a respondent Note: Truncated at zero (no negative outcome possible) No upper limit
94 models Here the dependent variable is a count (of events). For example, The number of conflicts in a particular period The number of coups d état in the 1980s The number of visits to a psychologist for a respondent Note: Truncated at zero (no negative outcome possible) No upper limit Limited by time, place, or both
95 Poisson regression Stochastic: y i Poisson(λ i )
96 Poisson regression Stochastic: y i Poisson(λ i ) Systematic: λ i = e x iβ
97 Poisson regression Stochastic: y i Poisson(λ i ) Systematic: λ i = e x iβ Poisson distribution: f(k,λ) = λk e λ k!
98 Poisson regression Stochastic: y i Poisson(λ i ) Systematic: λ i = e x iβ Poisson distribution: f(k,λ) = λk e λ k! Note that there is no parameter for the variance. This leads to regular underestimation of the variance (overdispersion, most common) or overestimation of the variance (underdispersion).
99 Poisson regression Stochastic: y i Poisson(λ i ) Systematic: λ i = e x iβ Poisson distribution: f(k,λ) = λk e λ k! Note that there is no parameter for the variance. This leads to regular underestimation of the variance (overdispersion, most common) or overestimation of the variance (underdispersion). glm(y ~ x1 + x2 + x3, family=poisson)
100 Poisson regression The exponential of the coefficients can be interpreted in a multiplicative sense. E.g. if the coefficient of x 1 is 0.012, then e = implies that an increase of x 1 by 1 increases y by 1.2%.
101 Estimating overdispersion The estimated overdispersion of a Poisson model is 1 n k n ( y i ŷ i sd(ŷ i ) )2 = 1 n k i n ( y i e X iβ e X i β )2, i which has a χ 2 n k distribution.
102 Negative binomial The negative binomial model is useful for overdispersed data: Stochastic: y i NegBin(φ,σ 2 )
103 Negative binomial The negative binomial model is useful for overdispersed data: Stochastic: y i NegBin(φ,σ 2 ) Systematic: φ = e x iβ = E(y i )
104 Negative binomial The negative binomial model is useful for overdispersed data: Stochastic: y i NegBin(φ,σ 2 ) Systematic: φ = e xiβ = E(y i ) Variance: V(y X) = φσ 2
105 Negative binomial The negative binomial model is useful for overdispersed data: Stochastic: y i NegBin(φ,σ 2 ) Systematic: φ = e x iβ = E(y i ) Variance: V(y X) = φσ 2 When σ 2 approaches 1, the negative binomial approaches the Poisson distribution.
106 Negative binomial The negative binomial model is useful for overdispersed data: Stochastic: y i NegBin(φ,σ 2 ) Systematic: φ = e x iβ = E(y i ) Variance: V(y X) = φσ 2 When σ 2 approaches 1, the negative binomial approaches the Poisson distribution. library(mass) glm.nb(y ~ x1 + x2 + x3)
Multiple regression: Categorical dependent variables
Multiple : Categorical Johan A. Elkink School of Politics & International Relations University College Dublin 28 November 2016 1 2 3 4 Outline 1 2 3 4 models models have a variable consisting of two categories.
More informationData Analytics for Social Science
Data Analytics for Social Science Johan A. Elkink School of Politics & International Relations University College Dublin 17 October 2017 Outline 1 2 3 4 5 6 Levels of measurement Discreet Continuous Nominal
More informationSingle-level Models for Binary Responses
Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =
More informationLinear Regression With Special Variables
Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:
More informationEconometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit
Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit R. G. Pierse 1 Introduction In lecture 5 of last semester s course, we looked at the reasons for including dichotomous variables
More informationHomework 1 Solutions
36-720 Homework 1 Solutions Problem 3.4 (a) X 2 79.43 and G 2 90.33. We should compare each to a χ 2 distribution with (2 1)(3 1) 2 degrees of freedom. For each, the p-value is so small that S-plus reports
More informationSTAT5044: Regression and Anova
STAT5044: Regression and Anova Inyoung Kim 1 / 18 Outline 1 Logistic regression for Binary data 2 Poisson regression for Count data 2 / 18 GLM Let Y denote a binary response variable. Each observation
More informationST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples
ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will
More informationReview of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models
Chapter 6 Multicategory Logit Models Response Y has J > 2 categories. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. 6.1 Logit Models for Nominal Responses
More informationGeneralized Linear Models Introduction
Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,
More informationGeneralized Linear Models. Last time: Background & motivation for moving beyond linear
Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered
More informationLogit Regression and Quantities of Interest
Logit Regression and Quantities of Interest Stephen Pettigrew March 4, 2015 Stephen Pettigrew Logit Regression and Quantities of Interest March 4, 2015 1 / 57 Outline 1 Logistics 2 Generalized Linear Models
More informationSTA216: Generalized Linear Models. Lecture 1. Review and Introduction
STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general
More informationLinear Regression Models P8111
Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started
More informationLogit Regression and Quantities of Interest
Logit Regression and Quantities of Interest Stephen Pettigrew March 5, 2014 Stephen Pettigrew Logit Regression and Quantities of Interest March 5, 2014 1 / 59 Outline 1 Logistics 2 Generalized Linear Models
More informationdisc choice5.tex; April 11, ffl See: King - Unifying Political Methodology ffl See: King/Tomz/Wittenberg (1998, APSA Meeting). ffl See: Alvarez
disc choice5.tex; April 11, 2001 1 Lecture Notes on Discrete Choice Models Copyright, April 11, 2001 Jonathan Nagler 1 Topics 1. Review the Latent Varible Setup For Binary Choice ffl Logit ffl Likelihood
More informationST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses
ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities
More informationUsing the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes 1
Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes 1 JunXuJ.ScottLong Indiana University 2005-02-03 1 General Formula The delta method is a general
More informationGeneralized logit models for nominal multinomial responses. Local odds ratios
Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π
More informationGoals. PSCI6000 Maximum Likelihood Estimation Multiple Response Model 1. Multinomial Dependent Variable. Random Utility Model
Goals PSCI6000 Maximum Likelihood Estimation Multiple Response Model 1 Tetsuya Matsubayashi University of North Texas November 2, 2010 Random utility model Multinomial logit model Conditional logit model
More informationSTA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3
STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae
More informationGeneralized Linear Models I
Statistics 203: Introduction to Regression and Analysis of Variance Generalized Linear Models I Jonathan Taylor - p. 1/16 Today s class Poisson regression. Residuals for diagnostics. Exponential families.
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationEconometrics II. Seppo Pynnönen. Spring Department of Mathematics and Statistics, University of Vaasa, Finland
Department of Mathematics and Statistics, University of Vaasa, Finland Spring 2018 Part III Limited Dependent Variable Models As of Jan 30, 2017 1 Background 2 Binary Dependent Variable The Linear Probability
More informationLecture 10: Alternatives to OLS with limited dependent variables. PEA vs APE Logit/Probit Poisson
Lecture 10: Alternatives to OLS with limited dependent variables PEA vs APE Logit/Probit Poisson PEA vs APE PEA: partial effect at the average The effect of some x on y for a hypothetical case with sample
More informationAdvanced Quantitative Methods: ordinary least squares
Advanced Quantitative Methods: Ordinary Least Squares University College Dublin 31 January 2012 1 2 3 4 5 Terminology y is the dependent variable referred to also (by Greene) as a regressand X are the
More informationCh 6: Multicategory Logit Models
293 Ch 6: Multicategory Logit Models Y has J categories, J>2. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. In R, we will fit these models using the
More informationLinear Regression. Data Model. β, σ 2. Process Model. ,V β. ,s 2. s 1. Parameter Model
Regression: Part II Linear Regression y~n X, 2 X Y Data Model β, σ 2 Process Model Β 0,V β s 1,s 2 Parameter Model Assumptions of Linear Model Homoskedasticity No error in X variables Error in Y variables
More informationCan a Pseudo Panel be a Substitute for a Genuine Panel?
Can a Pseudo Panel be a Substitute for a Genuine Panel? Min Hee Seo Washington University in St. Louis minheeseo@wustl.edu February 16th 1 / 20 Outline Motivation: gauging mechanism of changes Introduce
More informationLatent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent
Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary
More informationGeneralised linear models. Response variable can take a number of different formats
Generalised linear models Response variable can take a number of different formats Structure Limitations of linear models and GLM theory GLM for count data GLM for presence \ absence data GLM for proportion
More informationLinear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52
Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two
More informationStatistical Methods III Statistics 212. Problem Set 2 - Answer Key
Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423
More informationSTA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random
STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression
More informationPost-Estimation Uncertainty
Post-Estimation Uncertainty Brad 1 1 Department of Political Science University of California, Davis May 12, 2009 Simulation Methods and Estimation Uncertainty Common approach to presenting statistical
More informationNELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation
NELS 88 Table 2.3 Adjusted odds ratios of eighth-grade students in 988 performing below basic levels of reading and mathematics in 988 and dropping out of school, 988 to 990, by basic demographics Variable
More informationGOV 2001/ 1002/ E-200 Section 7 Zero-Inflated models and Intro to Multilevel Modeling 1
GOV 2001/ 1002/ E-200 Section 7 Zero-Inflated models and Intro to Multilevel Modeling 1 Anton Strezhnev Harvard University March 23, 2016 1 These section notes are heavily indebted to past Gov 2001 TFs
More informationReview: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:
Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic
More informationStandard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j
Standard Errors & Confidence Intervals β β asy N(0, I( β) 1 ), where I( β) = [ 2 l(β, φ; y) ] β i β β= β j We can obtain asymptotic 100(1 α)% confidence intervals for β j using: β j ± Z 1 α/2 se( β j )
More informationModels for Heterogeneous Choices
APPENDIX B Models for Heterogeneous Choices Heteroskedastic Choice Models In the empirical chapters of the printed book we are interested in testing two different types of propositions about the beliefs
More informationLogistic & Tobit Regression
Logistic & Tobit Regression Different Types of Regression Binary Regression (D) Logistic transformation + e P( y x) = 1 + e! " x! + " x " P( y x) % ln$ ' = ( + ) x # 1! P( y x) & logit of P(y x){ P(y
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population
More informationCategorical data analysis Chapter 5
Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases
More informationLecture 14: Introduction to Poisson Regression
Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why
More informationModelling counts. Lecture 14: Introduction to Poisson Regression. Overview
Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week
More informationA Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46
A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response
More informationGeneralized linear models
CHAPTER 6 Generalized linear models 6.1 Introduction Generalized linear modeling is a framework for statistical analysis that includes linear and logistic regression as special cases. Linear regression
More informationApplied Health Economics (for B.Sc.)
Applied Health Economics (for B.Sc.) Helmut Farbmacher Department of Economics University of Mannheim Autumn Semester 2017 Outlook 1 Linear models (OLS, Omitted variables, 2SLS) 2 Limited and qualitative
More informationApplied Econometrics Lecture 1
Lecture 1 1 1 Università di Urbino Università di Urbino PhD Programme in Global Studies Spring 2018 Outline of this module Beyond OLS (very brief sketch) Regression and causality: sources of endogeneity
More informationPoisson regression: Further topics
Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to
More informationMaximum Likelihood and. Limited Dependent Variable Models
Maximum Likelihood and Limited Dependent Variable Models Michele Pellizzari IGIER-Bocconi, IZA and frdb May 24, 2010 These notes are largely based on the textbook by Jeffrey M. Wooldridge. 2002. Econometric
More informationLogistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University
Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction
More informationBIOS 625 Fall 2015 Homework Set 3 Solutions
BIOS 65 Fall 015 Homework Set 3 Solutions 1. Agresti.0 Table.1 is from an early study on the death penalty in Florida. Analyze these data and show that Simpson s Paradox occurs. Death Penalty Victim's
More informationNinth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"
Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis" June 2013 Bangkok, Thailand Cosimo Beverelli and Rainer Lanz (World Trade Organization) 1 Selected econometric
More informationRegression with Qualitative Information. Part VI. Regression with Qualitative Information
Part VI Regression with Qualitative Information As of Oct 17, 2017 1 Regression with Qualitative Information Single Dummy Independent Variable Multiple Categories Ordinal Information Interaction Involving
More informationVisualizing Inference
CSSS 569: Visualizing Data Visualizing Inference Christopher Adolph University of Washington, Seattle February 23, 2011 Assistant Professor, Department of Political Science and Center for Statistics and
More information12 Modelling Binomial Response Data
c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual
More informationVisualizing Inference
CSSS 569: Visualizing Data Visualizing Inference Christopher Adolph University of Washington, Seattle February 23, 2011 Assistant Professor, Department of Political Science and Center for Statistics and
More informationSemiparametric Generalized Linear Models
Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student
More informationSTA 216, GLM, Lecture 16. October 29, 2007
STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural
More informationLecture-19: Modeling Count Data II
Lecture-19: Modeling Count Data II 1 In Today s Class Recap of Count data models Truncated count data models Zero-inflated models Panel count data models R-implementation 2 Count Data In many a phenomena
More informationGOV 2001/ 1002/ E-2001 Section 10 1 Duration II and Matching
GOV 2001/ 1002/ E-2001 Section 10 1 Duration II and Matching Mayya Komisarchik Harvard University April 13, 2016 1 Heartfelt thanks to all of the Gov 2001 TFs of yesteryear; this section draws heavily
More informationHigh-Throughput Sequencing Course
High-Throughput Sequencing Course DESeq Model for RNA-Seq Biostatistics and Bioinformatics Summer 2017 Outline Review: Standard linear regression model (e.g., to model gene expression as function of an
More informationGeneralized Linear Models for Non-Normal Data
Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture
More information12 Generalized linear models
12 Generalized linear models In this chapter, we combine regression models with other parametric probability models like the binomial and Poisson distributions. Binary responses In many situations, we
More informationLecture 41 Sections Mon, Apr 7, 2008
Lecture 41 Sections 14.1-14.3 Hampden-Sydney College Mon, Apr 7, 2008 Outline 1 2 3 4 5 one-proportion test that we just studied allows us to test a hypothesis concerning one proportion, or two categories,
More informationPractice Examination # 3
Practice Examination # 3 Sta 23: Probability December 13, 212 This is a closed-book exam so do not refer to your notes, the text, or any other books (please put them on the floor). You may use a single
More informationPoisson Regression. Gelman & Hill Chapter 6. February 6, 2017
Poisson Regression Gelman & Hill Chapter 6 February 6, 2017 Military Coups Background: Sub-Sahara Africa has experienced a high proportion of regime changes due to military takeover of governments for
More informationClass Notes: Week 8. Probit versus Logit Link Functions and Count Data
Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While
More informationGeneralized Linear Modeling - Logistic Regression
1 Generalized Linear Modeling - Logistic Regression Binary outcomes The logit and inverse logit interpreting coefficients and odds ratios Maximum likelihood estimation Problem of separation Evaluating
More informationGeneralized Linear Models (GLZ)
Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the
More informationLoglinear models. STAT 526 Professor Olga Vitek
Loglinear models STAT 526 Professor Olga Vitek April 19, 2011 8 Can Use Poisson Likelihood To Model Both Poisson and Multinomial Counts 8-1 Recall: Poisson Distribution Probability distribution: Y - number
More informationSTAT 7030: Categorical Data Analysis
STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012
More informationSplitting a predictor at the upper quarter or third and the lower quarter or third
Splitting a predictor at the upper quarter or third and the lower quarter or third Andrew Gelman David K. Park July 31, 2007 Abstract A linear regression of y on x can be approximated by a simple difference:
More informationGeneralized linear models
Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data
More informationGauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA
JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter
More informationChapter 11: Models for Matched Pairs
: Models for Matched Pairs Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay
More informationChapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a
Chapter 9 Regression with a Binary Dependent Variable Multiple Choice ) The binary dependent variable model is an example of a a. regression model, which has as a regressor, among others, a binary variable.
More information[Chapter 6. Functions of Random Variables]
[Chapter 6. Functions of Random Variables] 6.1 Introduction 6.2 Finding the probability distribution of a function of random variables 6.3 The method of distribution functions 6.5 The method of Moment-generating
More informationGeneralised Linear Mixed Models
University of Edinburgh November 4, 2014 ANCOVA Bradley-Terry models MANCOVA Meta-analysis Multi-membership models Pedigree analysis: animal models Phylogenetic analysis: comparative approach Random Regression
More informationIntroduction To Logistic Regression
Introduction To Lecture 22 April 28, 2005 Applied Regression Analysis Lecture #22-4/28/2005 Slide 1 of 28 Today s Lecture Logistic regression. Today s Lecture Lecture #22-4/28/2005 Slide 2 of 28 Background
More informationContents. Part I: Fundamentals of Bayesian Inference 1
Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian
More informationGOV 2001/ 1002/ Stat E-200 Section 8 Ordered Probit and Zero-Inflated Logit
GOV 2001/ 1002/ Stat E-200 Section 8 Ordered Probit and Zero-Inflated Logit Solé Prillaman Harvard University March 25, 2015 1 / 56 LOGISTICS Reading Assignment- Becker and Kennedy (1992), Harris and Zhao
More informationBinary Dependent Variables
Binary Dependent Variables In some cases the outcome of interest rather than one of the right hand side variables - is discrete rather than continuous Binary Dependent Variables In some cases the outcome
More informationGeneralized Linear Models
Generalized Linear Models Assumptions of Linear Model Homoskedasticity Model variance No error in X variables Errors in variables No missing data Missing data model Normally distributed error Error in
More informationConsider Table 1 (Note connection to start-stop process).
Discrete-Time Data and Models Discretized duration data are still duration data! Consider Table 1 (Note connection to start-stop process). Table 1: Example of Discrete-Time Event History Data Case Event
More information1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches
Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model
More informationCHAPTER 5. Logistic regression
CHAPTER 5 Logistic regression Logistic regression is the standard way to model binary outcomes (that is, data y i that take on the values 0 or 1). Section 5.1 introduces logistic regression in a simple
More informationWeek 7: Binary Outcomes (Scott Long Chapter 3 Part 2)
Week 7: (Scott Long Chapter 3 Part 2) Tsun-Feng Chiang* *School of Economics, Henan University, Kaifeng, China April 29, 2014 1 / 38 ML Estimation for Probit and Logit ML Estimation for Probit and Logit
More information8 Nominal and Ordinal Logistic Regression
8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on
More informationCount and Duration Models
Count and Duration Models Stephen Pettigrew April 2, 2015 Stephen Pettigrew Count and Duration Models April 2, 2015 1 / 1 Outline Stephen Pettigrew Count and Duration Models April 2, 2015 2 / 1 Logistics
More informationMachine Learning. Lecture 3: Logistic Regression. Feng Li.
Machine Learning Lecture 3: Logistic Regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2016 Logistic Regression Classification
More informationClinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.
Introduction to Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca September 18, 2014 38-1 : a review 38-2 Evidence Ideal: to advance the knowledge-base of clinical medicine,
More informationBayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence
Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns
More informationIntroduction to Statistical Analysis
Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive
More informationRegression Discontinuity Designs
Regression Discontinuity Designs Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Regression Discontinuity Design Stat186/Gov2002 Fall 2018 1 / 1 Observational
More informationEstimated Precision for Predictions from Generalized Linear Models in Sociological Research
Quality & Quantity 34: 137 152, 2000. 2000 Kluwer Academic Publishers. Printed in the Netherlands. 137 Estimated Precision for Predictions from Generalized Linear Models in Sociological Research TIM FUTING
More informationReview. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis
Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,
More informationMULTIVARIATE DISTRIBUTIONS
Chapter 9 MULTIVARIATE DISTRIBUTIONS John Wishart (1898-1956) British statistician. Wishart was an assistant to Pearson at University College and to Fisher at Rothamsted. In 1928 he derived the distribution
More informationApplied Linear Statistical Methods
Applied Linear Statistical Methods (short lecturenotes) Prof. Rozenn Dahyot School of Computer Science and Statistics Trinity College Dublin Ireland www.scss.tcd.ie/rozenn.dahyot Hilary Term 2016 1. Introduction
More information