Post-Estimation Uncertainty


1 Post-Estimation Uncertainty Brad, Department of Political Science, University of California, Davis. May 12, 2009

2

6 Simulation Methods and Estimation Uncertainty
- Common approach to presenting statistical analysis is no longer acceptable.
- The basic problem: recognize there is a lot of uncertainty around our point estimates, and EVEN uncertainty around our estimates of uncertainty.

11 Simulation Methods and Estimation Uncertainty
- Consider a probit function: Pr(y = 1 | x) = F(xβ) = Φ(xβ)   (1)
- Φ is the CDF of the standard normal.
- The probit model is thus given by: Φ⁻¹(p_i) = Σ_k β̂_k x_ik   (2)
- Coefficients are scaled by the inverse normal; that is, they're z-scores.
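A quick numerical sketch of the link in equations (1)-(2), using base R's pnorm()/qnorm() (the standard-normal CDF and its inverse). The coefficient and covariate values here are made-up illustrative numbers, not estimates from any model in these slides:

```r
# Probit: Pr(y = 1 | x) = Phi(x'beta); the linear predictor is a z-score.
beta <- c(-0.5, 0.8)     # hypothetical intercept and slope
x    <- c(1, 1.2)        # hypothetical covariate profile (with constant)
z    <- sum(x * beta)    # linear predictor: a z-score
p    <- pnorm(z)         # predicted probability via the standard-normal CDF

# Inverting the link recovers the z-score, as in equation (2):
all.equal(qnorm(p), z)   # TRUE
```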

12
> z.out <- glm(tempwork ~ education + libcon + latino + income + rep,
+              data = fp, family = binomial(link = "probit"))
> summary(z.out)

Call:
glm(formula = tempwork ~ education + libcon + latino + income +
    rep, family = binomial(link = "probit"), data = fp)

Deviance Residuals:
   Min     1Q Median     3Q    Max

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)
education
libcon
latino                                             e-05 ***
income
rep                                                     **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance:       on 450 degrees of freedom
Residual deviance:       on 445 degrees of freedom
  (549 observations deleted due to missingness)
AIC:

Number of Fisher Scoring iterations: 4

16 Simulation Methods and Estimation Uncertainty
- Coefficients may be starred and reported.
- As is, that's not a whole lot of information.
- The real work begins post-estimation.
- And we know how to do all of this stuff... right?

21 Quantities of Interest
- In general, there are a variety of quantities of interest (QI) to us: E(Y | X), Pr(Y = 1 | X), Ŷ | X, E(Y | X1) − E(Y | X), Pr(Y = 1 | X1)/Pr(Y = 1 | X)... and so forth.
- The issue is that there is considerable uncertainty around any of these quantities.
- A predicted probability, or any predicted quantity from a stochastic model, has a stochastic component around it.
- That is, there are standard errors around all of this stuff.

26 Backing out these Quantities: Zelig
- Part of today's class is devoted to software developed by Imai, King, and Lau: Zelig.
- We don't need Zelig to do the things discussed on the previous slide; however, it makes life a lot easier if we use it (or something like it... spost, for example, in Stata).
- Some theory first (these next few slides draw heavily from King).
- Basic idea: Y_i ~ f(y_i | θ_i, α)   (3)
- This gives a model where Y has a stochastic component with some pdf f(·), model parameters θ, and ancillary parameter(s) α.

32 GLMs
- The random component is linked to a systematic component given by: θ_i = g(x_i, β)   (4)
- This is the systematic component because it relates the mean of the response variable to the linear predictor. As such, g(·) is the systematic component.
- This could be log-odds, OLS, probit, or anything else.
- The systematic component can be non-linear: E(Y) = Pr(Y = 1) = exp(x_iβ)/(1 + exp(x_iβ))
- In the probabilities, this logit is non-linear.

36 Modeling
- We can linearize logit (or probit) by a suitable transformation...
- But when we do, E(Y) becomes more difficult to interpret.
- i.e., see the probit results from before.
- What we normally do:
  - Take a (supposed) random sample to understand some population.
  - Use the sample to estimate some features of the population.
  - Estimates will be highly sensitive to the sample size (bigger samples produce more precise estimates).

41 Modeling using Simulations
- Suppose we estimate our model and then simulate quantities of interest from that particular model?
- Under this approach we try to understand a distribution by taking draws from the observed sample. And, as with bootstrapping or MC methods, we approximate features of the distribution (marginal changes, expected values, and so forth). The more replicates, the more precise our approximation.
- Analogous to what we think we're doing under classical statistical theory.
- Do we need specialized software to do this? Not really, but it helps.

45 Simulation
- In the regression context, draw simulations of β.
- Calculate the systematic component: θ̃ = g(x_i, β̃).
- These two components are what King calls estimation uncertainty.
- Draw from the density f(· | θ̃, α̃) to account for fundamental uncertainty.
- What is the difference? (It's an important distinction.)

50 Simulation
- Do what we did on the previous slide over and over!
- Sounds like what we did with the bootstrap: create B replicates and compute some quantity we called θ̃. We can implement this approach via bootstrapping.
- Important result: as n → ∞, estimation uncertainty tends toward 0; however, fundamental uncertainty always remains.
- Essentially, we're building up a sampling distribution.
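The bootstrap version of building up a sampling distribution for a quantity of interest can be sketched in a few lines of base R. The data and model below are simulated placeholders (not the guest-worker data), and the QI is a predicted probability at a hypothetical covariate value:

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.3 + 0.9 * x))   # simulated binary outcome
d <- data.frame(y, x)

# Bootstrap replicates of a quantity of interest: Pr(y = 1 | x = 1)
B <- 500
qi <- replicate(B, {
  db  <- d[sample(n, replace = TRUE), ]             # resample rows
  fit <- glm(y ~ x, family = binomial, data = db)   # re-estimate
  plogis(sum(coef(fit) * c(1, 1)))                  # predicted prob at x = 1
})

mean(qi)                       # point estimate of the QI
quantile(qi, c(0.025, 0.975))  # percentile interval from the built-up distribution
```

More replicates B sharpen the approximation of the QI's sampling distribution, exactly as the slide describes.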

56 Logit Simulation
- Are income levels related to support for a guest worker program?
- Highest income level is 5; lowest is 1.
- What does the logit estimator look like?
- In the probabilities, logit is a non-linear model.
- E(y) = Pr(y = 1 | x), and y is binary. Let Z = Σ_k β_k x_ik; then Pr(y = 1 | x) = 1/(1 + exp(−Z)) = exp(Z)/(1 + exp(Z))

61 Logit Simulation
- We estimate the model in this form: log(p_i / (1 − p_i)) = Z
- More formally, Y_i ~ Bernoulli(y_i | π_i), implying π_i^{y_i}(1 − π_i)^{1 − y_i}. Obviously, π_i = Pr(y_i = 1).
- This is the stochastic component of the logit estimator (it's in the binomial family).
- The systematic component is: Pr(y = 1 | x) = 1/(1 + exp(−Z)) = exp(Z)/(1 + exp(Z))

65 Logit Simulation
- Step 1: simulate the parameters in Z (β̃).
- Calculate θ̃ = 1/(1 + exp(−x_i β̃)) for some covariate profile. Here, set income to its highest and lowest values.
- Draw simulations of Y_i from the binomial.
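Those steps can be sketched directly with MASS::mvrnorm() (MASS ships with R). The data here are simulated stand-ins for the Field Poll variables, and the high/low profiles are the income extremes mentioned above:

```r
library(MASS)   # for mvrnorm()
set.seed(1)

# Simulated stand-in data: binary support, 5-category income
income <- sample(1:5, 300, replace = TRUE)
y      <- rbinom(300, 1, plogis(-1 + 0.4 * income))
fit    <- glm(y ~ income, family = binomial)

# Step 1: simulate the parameters, beta-tilde ~ N(beta-hat, V-hat)
beta.tilde <- mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))

# Step 2: systematic component at high and low covariate profiles
x.hi <- c(1, 5)   # income at its highest value (with constant)
x.lo <- c(1, 1)   # income at its lowest value
p.hi <- plogis(beta.tilde %*% x.hi)   # 1000 simulated Pr(y = 1 | income = 5)
p.lo <- plogis(beta.tilde %*% x.lo)

# First difference with estimation uncertainty attached:
mean(p.hi - p.lo)
quantile(p.hi - p.lo, c(0.025, 0.975))

# Step 3 (fundamental uncertainty): draw outcomes from the binomial
y.tilde <- rbinom(1000, 1, p.hi)
```

This is, in miniature, what Zelig's setx()/sim() pair automates.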

69 Logit Simulation
- How to simulate?
  - Asymptotic normal
  - Non-parametric bootstrapping
  - Bayesian simulation

74 The Basic Idea
- We can't repeatedly sample from the real world... but we can from our observed data.
- The central limit theorem tells us that with repeated samples, even from a non-normal population, the sampling distribution of some statistic will tend to the normal.
- We approximate this with simulations instead of samples.
- Recall the result that the empirical CDF will tend to the true CDF with enough replicates.
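That last result is easy to check numerically: with enough draws, the empirical CDF sits essentially on top of the true one. A minimal sketch using standard-normal draws:

```r
set.seed(7)
draws <- rnorm(10000)   # replicates from a standard normal
Fhat  <- ecdf(draws)    # empirical CDF

# Largest gap between empirical and true CDF over a grid of points:
grid <- seq(-3, 3, by = 0.01)
max(abs(Fhat(grid) - pnorm(grid)))   # small, and shrinks as draws increase
```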

78 Illustration
- Use the Zelig package with R.
- Zelig is a wrapper that allows a variety of simulations to be conducted post-estimation.
- Works with many of the models we teach in our methods sequence, including matching models.
- Illustration using California Field Poll data.

79
z.out <- zelig(tempwork ~ education + libcon + latino + income + dem,
               model = "logit", data = fp)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)
x.lo <- setx(z.out, income = 1, education = 1:10)
x.hi <- setx(z.out, income = 5, education = 1:10)
s.out <- sim(z.out, x = x.lo, x1 = x.hi, bootstrap = TRUE, num = 1000)

80
Model: logit
Number of simulations: 1000
Mean Values of X (n = 10)
(Intercept) education libcon latino income dem
Mean Values of X1 (n = 10)
(Intercept) education libcon latino income dem
Pooled Expected Values: E(Y|X)
  mean sd 2.5% 97.5%
Pooled Predicted Values: Y|X
Pooled First Differences in Expected Values: E(Y|X1)-E(Y|X)
  mean sd 2.5% 97.5%
Pooled Risk Ratios: P(Y=1|X1)/P(Y=1|X)
  mean sd 2.5% 97.5%
>

85 Uncertainty Estimation: More Details
- Imagine y regressed on X, giving us parameters b = (X′X)⁻¹X′y: the standard linear model.
- Often interested in some quantity of interest, like a prediction. Suppose y^p is a prediction based on some X^p.
- Two kinds of uncertainty:
  - Estimation uncertainty: related to sample size.
  - Fundamental uncertainty: E(Y^p) = µ = X^p β.

88 Uncertainty Estimation: More Details
- In the regression setting, variability decomposes as:
  Y^p = X^p b + ε^p
  var(Y^p) = var(X^p b) + var(ε^p)
           = X^p var(b)(X^p)′ + σ²I
           = σ² X^p (X′X)⁻¹ (X^p)′ + σ²I
           = estimation uncertainty + fundamental uncertainty   (5)
- The distribution of Ŷ^p is: Ŷ^p ~ N(X^p β, X^p var(b)(X^p)′)
- The unconditional distribution is: Ŷ^p ~ N(X^p β, X^p var(b)(X^p)′ + σ²I)

92 Uncertainty Estimation: More Details
- Because quantities are estimated with uncertainty, these quantities have standard errors around them.
- Recall that in the regression setting, the s.e. around a predicted y includes the variance of ε.
- Therefore, the prediction interval is larger than the standard confidence interval.
- Quick illustration using standard approaches.

93
> m1 <- lm(votes1st ~ spend_total + incumb + spend_total:incumb)
> summary(m1)$coeff
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)                                        e-03
spend_total                                        e-51
incumb                                             e-19
spend_total:incumb                                 e-06
>
> # Five-number summary (good for understanding a typical covariate profile)
> fivenum(dail$spend_total)
[1]
> x0 <- c(1, 75000, 1, 75000)  # set some predictor values (cons, spend, incumb, s:i)
> (y0 <- sum(x0 * coef(m1)))   # compute predicted response
[1]
> fivenum(dail$votes1st)       # how typical is this response?
[1]
> quantile(dail$votes1st, .99, na.rm=TRUE)  # versus 99th percentile
99%
> x0.df <- data.frame(incumb=1, spend_total=75000)
> predict(m1, x0.df)
> predict(m1, x0.df, interval="confidence")
  fit lwr upr
> predict(m1, x0.df, interval="prediction")
  fit lwr upr

98 Uncertainty Estimation: More Details
- The C.I. is much narrower than the prediction interval.
- Predicted y may be a q.i. for us.
- Estimation uncertainty in non-linear models... two examples.
- Mean and variance of the Poisson: E(Y) = e^{Xβ}, var(Y) = λ
- Mean and variance of the logit: E(Y) = 1/(1 + e^{−Xβ}), var(Y) = π(1 − π)

103 Uncertainty Estimation: More Details
- Estimation uncertainty requires the delta method.
- A Taylor series approximation can be applied: ŷ^p = g(b) = g(β) + g′(β)(b − β) + ...
- Here, g′(β) is the first derivative of g(β) with respect to β.
- Dropping all but the first two terms gives: var(ŷ^p) ≈ var[g(β)] + var[g′(β)(b − β)] = g′(β) var(b) g′(β)′   (6)
- This is the DELTA METHOD.
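A numerical sketch of equation (6) for the Poisson case, where g(b) = exp(x′b). The coefficients, variance matrix, and profile below are made-up illustrative values, not estimates from any fitted model:

```r
# Hypothetical Poisson "fit": lambda-hat = exp(x'b)
b  <- c(0.2, 0.5)                  # made-up coefficients
V  <- matrix(c(0.04, 0.01,
               0.01, 0.09), 2, 2)  # made-up var(b)
xp <- c(1, 2)                      # prediction profile (constant, covariate)

lambda.hat <- exp(sum(xp * b))     # predicted count, g(b)

# Delta method: var[g(b)] ~ g'(b) var(b) g'(b)'
grad <- xp * lambda.hat            # gradient of exp(x'b) wrt b: x * exp(x'b)
var.lambda <- t(grad) %*% V %*% grad
sqrt(var.lambda)                   # delta-method standard error
```

A confidence interval around the predicted count then follows in the usual way from this standard error.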

107 Delta Method and Poisson
- Poisson: Y ~ e^{−λ}λ^y / y!
- This gives the random component.
- Systematic component: λ = e^{Xβ}
- Fundamental variability: var(Y | λ) = λ

111 Delta Method and Poisson
- Estimation variability (score matrix): g′(β) = ∂e^{Xβ}/∂β = X e^{Xβ}   (7), which gives the element-by-element multiplication.
- The variance of Ŷ^p is: var(Ŷ^p) = (X^p e^{X^p b}) var(b) (X^p e^{X^p b})′
- If we solved this, we'd be using the delta method to analytically find the variance of the function.
- In turn, c.i. or standard errors could be computed around the q.i.

117 Or, use simulation methods
- This is where we were at last week.
- Stochastic component: Y_i ~ f(θ_i, α)
- Systematic component: θ_i = g(x_i, β)
- OLS: Y_i ~ N(µ_i, σ²) and µ_i = X_i β
- Simulated parameter vector: λ̂ = (β̂, α̂).
- By the central limit theorem, simulate λ̃ ~ N(λ̂, V̂(λ̂)).
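For the OLS case, a minimal sketch of this simulation looks as follows. Everything here is simulated, and for simplicity σ is held at its estimate rather than simulated along with β̂ (a fuller treatment would draw the whole λ̂ = (β̂, α̂) vector):

```r
library(MASS)   # for mvrnorm()
set.seed(3)

# Simulated data and a standard linear model
x <- runif(100, 0, 10)
y <- 2 + 1.5 * x + rnorm(100, sd = 2)
fit <- lm(y ~ x)

# Estimation uncertainty: beta-tilde ~ N(beta-hat, V-hat)
beta.tilde <- mvrnorm(1000, coef(fit), vcov(fit))
xp <- c(1, 5)                                  # prediction profile
mu.tilde <- as.vector(beta.tilde %*% xp)       # simulated E(Y | x = 5)

# Add fundamental uncertainty for predicted (not expected) values:
sigma.hat <- summary(fit)$sigma
y.tilde <- rnorm(1000, mean = mu.tilde, sd = sigma.hat)

sd(mu.tilde)   # narrower: estimation uncertainty only
sd(y.tilde)    # wider: estimation + fundamental uncertainty
```

The gap between the two spreads mirrors the confidence-interval vs. prediction-interval contrast shown earlier.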

120 Poisson Example: Benoit data on Frequency of War
- Issue: number of wars as a function of democratization levels.
- Original study published in 1996.
- Compare simulated q.i. to point estimates.

121
> # Poisson model using Zelig
> ## predicted values on Weede dataset from Benoit (1996)
> weede <- read.dta("weede.dta")
> z.out <- zelig(ssal6080 ~ fh73 + lpopln70 + lmilwp70,
+                model="poisson", data=weede)
> (x.out <- setx(z.out, fh73=2:14))
(Intercept) fh73 lpopln70 lmilwp70
> s.out <- sim(z.out, x=x.out)
> summary(s.out)
Model: poisson
Number of simulations: 1000
Mean Values of X (n = 13)
(Intercept) fh73 lpopln70 lmilwp70
Pooled Expected Values: E(Y|X)
  mean sd 2.5% 97.5%
Pooled Predicted Values: Y|X
  mean sd 2.5% 97.5%

122
> ## replicate part of Table 3 from Benoit (1996)
> z.tab2nbpoldem <- zelig(butterw ~ poldem65, model="negbin", data=weede)
> x.tab2nbpoldem <- setx(z.tab2nbpoldem, poldem65=c(0,20,55,85,100))
> s.tab2nbpoldem <- sim(z.tab2nbpoldem, x=x.tab2nbpoldem)
> cbind(apply(s.tab2nbpoldem$qi$ev, 2, mean),
+       apply(s.tab2nbpoldem$qi$ev, 2, sd))
     [,1] [,2]
[1,]            <- POLDEM estimates from simulations
[2,]
[3,]
[4,]
[5,]
> z.tab2nbfh73 <- zelig(butterw ~ fh73, model="negbin", data=weede)
> x.tab2nbfh73 <- setx(z.tab2nbfh73, fh73=c(2,4,7,12,14))
> s.tab2nbfh73 <- sim(z.tab2nbfh73, x=x.tab2nbfh73)
> cbind(apply(s.tab2nbfh73$qi$ev, 2, mean),
+       apply(s.tab2nbfh73$qi$ev, 2, sd))
     [,1] [,2]
[1,]            <- Freedom House estimates from simulations
[2,]
[3,]
[4,]
[5,]
>

123

127 Poisson Example: Benoit data on Frequency of War
- Estimates based on simulation differ slightly from those reported in the original article.
- Why might this be the case? ... Delta method vs. nonparametric simulation.
- Useful to show graphical displays of expected counts.
- Plotting c.i. using Zelig with a Poisson.

128
> ## replicate top part of Figure 1 from Benoit (1996)
> x.tab2nbpoldem <- setx(z.tab2nbpoldem, poldem65=seq(0,100,1))
> s.tab2nbpoldem <- sim(z.tab2nbpoldem, x=x.tab2nbpoldem)
> x.tab2nbfh73 <- setx(z.tab2nbfh73, fh73=2:14)
> s.tab2nbfh73 <- sim(z.tab2nbfh73, x=x.tab2nbfh73)
> par(mfrow=c(2,2), mar=c(4,4,1,1))
> plot.ci(s.tab2nbpoldem, xlab="POLDEM 1965", ylab="Butterworth Wars",
+         ylim=c(0,7))
> points(weede$poldem65, weede$butterw)
> plot.ci(s.tab2nbfh73, xlab="Freedom House 1973", ylab="Butterworth Wars",
+         ylim=c(0,7))
> points(weede$fh73, weede$butterw)
> par(mfrow=c(2,2))
>

129 [Figure: Butterworth wars plotted against POLDEM 1965 and Freedom House 1973, with simulated confidence intervals.]

132 Post-Estimation Uncertainty: Probit
- Consider a probit model.
- Winning, spending, and incumbency in a parliamentary system.
- Compute first differences, etc., for various scenarios.

133
> ## replicate Table 5, Benoit and Marsh (2009) PRQ
> # note: convert.factors=FALSE since this makes dummy vars 0/1 numeric
> dail <- read.dta("dailprobit.dta", convert.factors=FALSE)
>
> z.out <- zelig(wonseat ~ pspend_total*incumb + m, model="probit", data=dail)
>
> x.lo <- setx(z.out, pspend_total=5, incumb=0, m=4)
> x.hi <- setx(z.out, pspend_total=15, incumb=0, m=4)
> summary(s.out <- sim(z.out, x=x.lo, x1=x.hi))
Model: probit
Number of simulations: 1000
Values of X
(Intercept) pspend_total incumb m pspend_total:incumb
Values of X1
(Intercept) pspend_total incumb m pspend_total:incumb
Expected Values: E(Y|X)
  mean sd 2.5% 97.5%
Predicted Values: Y|X
First Differences in Expected Values: E(Y|X1)-E(Y|X)
  mean sd 2.5% 97.5%
Risk Ratios: P(Y=1|X1)/P(Y=1|X)

134
  mean sd 2.5% 97.5%
>
> x.lo <- setx(z.out, pspend_total=0, incumb=1, m=4)
> x.hi <- setx(z.out, pspend_total=5, incumb=1, m=4)
> summary(s.out <- sim(z.out, x=x.lo, x1=x.hi))
Model: probit
Number of simulations: 1000
Values of X
(Intercept) pspend_total incumb m pspend_total:incumb
Values of X1
(Intercept) pspend_total incumb m pspend_total:incumb
Expected Values: E(Y|X)
  mean sd 2.5% 97.5%
Predicted Values: Y|X
First Differences in Expected Values: E(Y|X1)-E(Y|X)
  mean sd 2.5% 97.5%
Risk Ratios: P(Y=1|X1)/P(Y=1|X)
  mean sd 2.5% 97.5%
>
> x.lo <- setx(z.out, pspend_total=5, incumb=1, m=4)

135 > x.hi <- setx(z.out, pspend_total=10, incumb=1, m=4) > summary(s.out <- sim(z.out, x=x.lo, x1=x.hi)) Model: probit Number of simulations: 1000 Values of X (Intercept) pspend_total incumb m pspend_total:incumb Values of X1 (Intercept) pspend_total incumb m pspend_total:incumb Expected Values: E(Y X) mean sd 2.5% 97.5% Predicted Values: Y X First Differences in Expected Values: E(Y X1)-E(Y X) mean sd 2.5% 97.5% Risk Ratios: P(Y=1 X1)/P(Y=1 X) mean sd 2.5% 97.5% > > x.lo <- setx(z.out, pspend_total=10, incumb=1, m=4) > x.hi <- setx(z.out, pspend_total=15, incumb=1, m=4) > summary(s.out <- sim(z.out, x=x.lo, x1=x.hi)) Model: probit

136 Number of simulations: 1000 Values of X (Intercept) pspend_total incumb m pspend_total:incumb Values of X1 (Intercept) pspend_total incumb m pspend_total:incumb Expected Values: E(Y X) mean sd 2.5% 97.5% Predicted Values: Y X First Differences in Expected Values: E(Y X1)-E(Y X) mean sd 2.5% 97.5% Risk Ratios: P(Y=1 X1)/P(Y=1 X) mean sd 2.5% 97.5% > > x.lo <- setx(z.out, pspend_total=5, incumb=1, m=4) > x.hi <- setx(z.out, pspend_total=15, incumb=1, m=4) > summary(s.out <- sim(z.out, x=x.lo, x1=x.hi)) Model: probit Number of simulations: 1000 Values of X (Intercept) pspend_total incumb m pspend_total:incumb

137 Values of X1 (Intercept) pspend_total incumb m pspend_total:incumb Expected Values: E(Y X) mean sd 2.5% 97.5% Predicted Values: Y X First Differences in Expected Values: E(Y X1)-E(Y X) mean sd 2.5% 97.5% Risk Ratios: P(Y=1 X1)/P(Y=1 X) mean sd 2.5% 97.5% >
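The quantities Zelig reports can be approximated by hand with the usual simulation recipe: draw coefficient vectors from their asymptotic sampling distribution and push each draw through the link function. A minimal sketch on simulated data; the variable names only echo the example above, and the data here are made up:

```r
## Simulation-based first differences for a probit model, done by hand.
## Data are simulated; pspend_total, incumb, wonseat merely mirror the
## names in the Benoit and Marsh example above.
library(MASS)
set.seed(42)
n <- 500
pspend_total <- runif(n, 0, 20)
incumb <- rbinom(n, 1, 0.5)
wonseat <- rbinom(n, 1, pnorm(-1 + 0.1 * pspend_total + 0.8 * incumb))
mod <- glm(wonseat ~ pspend_total * incumb, family = binomial(link = "probit"))

## draw 1000 coefficient vectors from N(beta-hat, V(beta-hat))
betas <- mvrnorm(1000, coef(mod), vcov(mod))

## two scenarios: non-incumbent spending 5 vs. 15
## (order: intercept, pspend_total, incumb, pspend_total:incumb)
x.lo <- c(1, 5, 0, 0)
x.hi <- c(1, 15, 0, 0)
p.lo <- pnorm(betas %*% x.lo)
p.hi <- pnorm(betas %*% x.hi)
fd <- p.hi - p.lo              # first differences in expected values

mean(fd)                       # point estimate
quantile(fd, c(0.025, 0.975))  # 95% simulation interval
```

This is essentially what sim() does internally, which is why its "first differences" come with a mean, sd, and percentile interval rather than a single number.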

139 Post-Estimation Uncertainty: Probit Graphical displays Incumbents vs. Non-Incumbents.

140

146 Graphical Displays This figure conveys a lot of information. We can see where the estimated relationship separates between the two groups and where it converges, given some value of x. In general, much easier to visualize. Let's consider some plot-based approaches to analysis. First ROC curves, then effects displays. ROC = receiver operating characteristic.

151 Receiver Operating Characteristic Curves: Basic Ideas Useful for understanding the predictive power of models. Developed in Britain during World War II. Radar receiver operators were assessed on their ability to differentiate signal from noise, i.e., Germans versus flocks of birds. Identifying the factors that influenced operator skill (gain levels, for example) was important.

154 ROC Curves If we care about prediction, or even if we don't, we have to recognize that binary GLMs entail classification problems. Think Type-I and Type-II errors. Suppose p̂ is a prediction of success and q̂ a prediction of failure, and let p and q denote the true values. With binary classification we have four possible outcomes:

  p̂, p : True Positive
  q̂, p : False Negative
  p̂, q : False Positive
  q̂, q : True Negative
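The four cells are easy to tabulate in R from a model's fitted probabilities; a sketch with made-up vectors, using a 0.5 cutoff (one arbitrary choice among many):

```r
## Classify fitted probabilities at a 0.5 cutoff and cross-tabulate
## against the observed outcome; y and phat here are made up.
y    <- c(1, 1, 0, 0, 1, 0, 1, 0)
phat <- c(0.9, 0.6, 0.4, 0.2, 0.3, 0.7, 0.8, 0.1)
pred <- as.numeric(phat > 0.5)
table(predicted = pred, actual = y)

tp <- sum(pred == 1 & y == 1)  # true positives
fn <- sum(pred == 0 & y == 1)  # false negatives
fp <- sum(pred == 1 & y == 0)  # false positives
tn <- sum(pred == 0 & y == 0)  # true negatives
c(TPR = tp / (tp + fn), FPR = fp / (fp + tn))
```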

155

159 ROC Curves Imagine two rates: a true positive rate (TPR) and a false positive rate (FPR). ROC curves plot the relationship between these two rates. In this sense, it is a sensitivity analysis: the TPR is the sensitivity and the FPR is 1 − specificity. These curves are sometimes called sensitivity vs. 1 − specificity graphs.
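An ROC curve simply traces out (FPR, TPR) pairs as the classification threshold sweeps from 1 down to 0. A base-R sketch on simulated data:

```r
## Trace an ROC curve by sweeping the classification threshold;
## the data are simulated for illustration.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(1.5 * x))
phat <- fitted(glm(y ~ x, family = binomial))

thresholds <- seq(1, 0, by = -0.01)
tpr <- sapply(thresholds, function(t) mean(phat[y == 1] >= t))
fpr <- sapply(thresholds, function(t) mean(phat[y == 0] >= t))
plot(fpr, tpr, type = "l", xlab = "False positive rate",
     ylab = "True positive rate")
abline(0, 1, lty = 2)  # random-guessing benchmark
```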

166 ROC Curves Can use ROC curves to evaluate competing models or compare models on subgroups. Implemented in Stata and in R. In R, packages ROCR and Zelig will work. In Zelig, the ROC plot will return a plot of the probability of 0 against the probability of 1. Imagine the two extremes in such a plot. Then imagine a 45-degree line connecting these two points. Models falling on this line would essentially be equivalent to random guesses.
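With the ROCR package mentioned above, the curve takes only a couple of calls. A sketch assuming ROCR is installed; phat and y stand in for a fitted model's predicted probabilities and the observed outcome:

```r
## ROC curve via the ROCR package (assumes install.packages("ROCR")).
library(ROCR)
set.seed(1)
y <- rbinom(200, 1, 0.5)
phat <- plogis(rnorm(200) + y)       # stand-in predicted probabilities

pred <- prediction(phat, y)          # pair predictions with labels
perf <- performance(pred, "tpr", "fpr")
plot(perf)                           # the ROC curve
performance(pred, "auc")@y.values[[1]]  # area under the curve
```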

167

173 ROC Curves and Effects Displays Departures from the 45-degree line indicate the model's improvement over merely guessing along the 45-degree line. Covariates seem to do a better job accounting for challengers than for incumbents. Why? A potentially useful question to try to answer. Comparing alternative models. Illustration: California Field Poll data on Prop. 86 (2006). Begin with a Zelig model.

174 > # Setting education to its mean and adjusting the Latino covariate
>
> x.lo <- setx(z.out, conservative=0, liberal=1, latino=0)
> x.hi <- setx(z.out, conservative=0, liberal=1, latino=1)
> summary(s.out <- sim(z.out, x=x.lo, x1=x.hi))

Model: logit
Number of simulations: 1000
Values of X
  (Intercept) logage conservative liberal latino education smoke100 latino:education
Values of X1
  (Intercept) logage conservative liberal latino education smoke100 latino:education
Expected Values: E(Y|X)
  mean sd 2.5% 97.5%
Predicted Values: Y|X
First Differences in Expected Values: E(Y|X1)-E(Y|X)
  mean sd 2.5% 97.5%
Risk Ratios: P(Y=1|X1)/P(Y=1|X)
  mean sd 2.5% 97.5%

175 Call:
zelig(formula = yeson86 ~ logage + conservative + liberal +
    latino * education + smoke100, model = "logit", data = fp)

Deviance Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)                                        **
logage                                            ***
conservative                                       **
liberal                                           ***
latino                                       e-05 ***
education                                           *
smoke100                                     e-06 ***
latino:education                                   **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance:     on 707 degrees of freedom
Residual deviance: on 700 degrees of freedom
AIC:

Number of Fisher Scoring iterations: 4

179 ROC Curves, Effects Displays With an interaction, it is difficult to understand the effect. We see the conditioning effect is negative: as education increases, the probability of support decreases among Latinos. Side-by-side plots (non-pretty).

180

181

185 ROC Curves Back to ROC plots. Illustrate with a reduced model. Smokers only! Let's inspect an ROC plot.

186

187

192 Effects Displays Lots of our models have higher-order terms: polynomials, quadratics, interactions. It is sometimes difficult to present such results in tabular form. John Fox has developed an R package called effects. Useful with higher-order terms or with multi-equation models.

198 Effects Displays As learned in POL 212, higher-order terms yield complicated standard errors. Effects may be conditional. Plots of predicted effects for the higher-order terms might be useful. Quick illustration using the Prop 86 data. Estimate a model with a conditional effect. Inspect the coefficients and plot them.
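What the effects package automates can also be sketched by hand: hold the other covariates fixed, vary education over its range separately for latino = 0 and latino = 1, and predict. A sketch on simulated data that reuses the variable names (these are not the Prop. 86 estimates):

```r
## Hand-rolled conditional-effect display for a latino:education
## interaction; data and coefficients are simulated for illustration.
set.seed(7)
n <- 400
education <- sample(1:10, n, replace = TRUE)
latino <- rbinom(n, 1, 0.3)
y <- rbinom(n, 1, plogis(0.5 + 0.1 * education + 1.5 * latino
                         - 0.25 * latino * education))
mod <- glm(y ~ latino * education, family = binomial)

## prediction grid: education 1..10 for each group
grid <- expand.grid(education = 1:10, latino = 0:1)
grid$phat <- predict(mod, newdata = grid, type = "response")

## one line per group; the gap between them is the conditional effect
matplot(1:10, matrix(grid$phat, ncol = 2), type = "l", lty = 1:2,
        xlab = "Education", ylab = "Probability(Support)")
legend("topright", legend = c("latino = 0", "latino = 1"), lty = 1:2)
```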

199 > support.mod <- glm(yeson86 ~ logage + conservative + liberal + latino * education + smoke100,
+     family=binomial(link="logit"), data=fp)
>
> summary(support.mod)

Call:
glm(formula = yeson86 ~ logage + conservative + liberal + latino *
    education + smoke100, family = binomial(link = "logit"), data = fp)

Deviance Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)                                        **
logage                                            ***
conservative                                       **
liberal                                           ***
latino                                       e-05 ***
education                                           *
smoke100                                     e-06 ***
latino:education                                   **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance:     on 707 degrees of freedom
Residual deviance: on 700 degrees of freedom
AIC:

Number of Fisher Scoring iterations: 4

200 > Anova(support.mod)
Anova Table (Type II tests)

Response: yeson86
                 LR Chisq Df Pr(>Chisq)
logage                              ***
conservative                         **
liberal                             ***
latino                              ***
education
smoke100                       e-07 ***
latino:education                     **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

202 Effects Displays The conditional effect holds. Consider the ROC plot.

203

204 Effects Displays Use the effects package to plot the conditional effect of education by Latino.

205 eff <- effect("latino:education", support.mod,
    xlevels=list(latino=0:1, education=1:10))
plot(eff, multiline=FALSE, ylab="Probability(Support)", rug=FALSE)

(Thanks AKM for the help!)

206

209 Effects Displays Illustration using Fox's data on race and arrests. Data available in the effects package. Some code.

210 > data(Arrests)
> Arrests$year <- as.factor(Arrests$year)
> arrests.mod <- glm(released ~ employed + citizen + checks + colour*year + colour*age,
+     family=binomial, data=Arrests)
> summary(arrests.mod)

Call:
glm(formula = released ~ employed + citizen + checks + colour *
    year + colour * age, family = binomial, data = Arrests)

Deviance Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)
employedyes                       < 2e-16 ***
citizenyes                          e-07 ***
checks                            < 2e-16 ***
colourWhite                                ***
year
year
year
year
year


More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Methods@Manchester Summer School Manchester University July 2 6, 2018 Generalized Linear Models: a generic approach to statistical modelling www.research-training.net/manchester2018

More information

Generalized Linear Models. stat 557 Heike Hofmann

Generalized Linear Models. stat 557 Heike Hofmann Generalized Linear Models stat 557 Heike Hofmann Outline Intro to GLM Exponential Family Likelihood Equations GLM for Binomial Response Generalized Linear Models Three components: random, systematic, link

More information

Today. HW 1: due February 4, pm. Aspects of Design CD Chapter 2. Continue with Chapter 2 of ELM. In the News:

Today. HW 1: due February 4, pm. Aspects of Design CD Chapter 2. Continue with Chapter 2 of ELM. In the News: Today HW 1: due February 4, 11.59 pm. Aspects of Design CD Chapter 2 Continue with Chapter 2 of ELM In the News: STA 2201: Applied Statistics II January 14, 2015 1/35 Recap: data on proportions data: y

More information

1. Logistic Regression, One Predictor 2. Inference: Estimating the Parameters 3. Multiple Logistic Regression 4. AIC and BIC in Logistic Regression

1. Logistic Regression, One Predictor 2. Inference: Estimating the Parameters 3. Multiple Logistic Regression 4. AIC and BIC in Logistic Regression Logistic Regression 1. Logistic Regression, One Predictor 2. Inference: Estimating the Parameters 3. Multiple Logistic Regression 4. AIC and BIC in Logistic Regression 5. Target Marketing: Tabloid Data

More information

Recap. HW due Thursday by 5 pm Next HW coming on Thursday Logistic regression: Pr(G = k X) linear on the logit scale Linear discriminant analysis:

Recap. HW due Thursday by 5 pm Next HW coming on Thursday Logistic regression: Pr(G = k X) linear on the logit scale Linear discriminant analysis: 1 / 23 Recap HW due Thursday by 5 pm Next HW coming on Thursday Logistic regression: Pr(G = k X) linear on the logit scale Linear discriminant analysis: Pr(G = k X) Pr(X G = k)pr(g = k) Theory: LDA more

More information

Logistic Regression 21/05

Logistic Regression 21/05 Logistic Regression 21/05 Recall that we are trying to solve a classification problem in which features x i can be continuous or discrete (coded as 0/1) and the response y is discrete (0/1). Logistic regression

More information

Regression and Models with Multiple Factors. Ch. 17, 18

Regression and Models with Multiple Factors. Ch. 17, 18 Regression and Models with Multiple Factors Ch. 17, 18 Mass 15 20 25 Scatter Plot 70 75 80 Snout-Vent Length Mass 15 20 25 Linear Regression 70 75 80 Snout-Vent Length Least-squares The method of least

More information

Review of Panel Data Model Types Next Steps. Panel GLMs. Department of Political Science and Government Aarhus University.

Review of Panel Data Model Types Next Steps. Panel GLMs. Department of Political Science and Government Aarhus University. Panel GLMs Department of Political Science and Government Aarhus University May 12, 2015 1 Review of Panel Data 2 Model Types 3 Review and Looking Forward 1 Review of Panel Data 2 Model Types 3 Review

More information

12 Modelling Binomial Response Data

12 Modelling Binomial Response Data c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Introduction to the Analysis of Tabular Data

Introduction to the Analysis of Tabular Data Introduction to the Analysis of Tabular Data Anthropological Sciences 192/292 Data Analysis in the Anthropological Sciences James Holland Jones & Ian G. Robertson March 15, 2006 1 Tabular Data Is there

More information

STAT 526 Spring Midterm 1. Wednesday February 2, 2011

STAT 526 Spring Midterm 1. Wednesday February 2, 2011 STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points

More information

Regression modeling for categorical data. Part II : Model selection and prediction

Regression modeling for categorical data. Part II : Model selection and prediction Regression modeling for categorical data Part II : Model selection and prediction David Causeur Agrocampus Ouest IRMAR CNRS UMR 6625 http://math.agrocampus-ouest.fr/infogluedeliverlive/membres/david.causeur

More information

Checking, Selecting & Predicting with GAMs. Simon Wood Mathematical Sciences, University of Bath, U.K.

Checking, Selecting & Predicting with GAMs. Simon Wood Mathematical Sciences, University of Bath, U.K. Checking, Selecting & Predicting with GAMs Simon Wood Mathematical Sciences, University of Bath, U.K. Model checking Since a GAM is just a penalized GLM, residual plots should be checked, exactly as for

More information

Various Issues in Fitting Contingency Tables

Various Issues in Fitting Contingency Tables Various Issues in Fitting Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Complete Tables with Zero Entries In contingency tables, it is possible to have zero entries in a

More information

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation Background Regression so far... Lecture 23 - Sta 111 Colin Rundel June 17, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical or categorical

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

Regression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102

Regression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102 Background Regression so far... Lecture 21 - Sta102 / BME102 Colin Rundel November 18, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical

More information

Linear Regression With Special Variables

Linear Regression With Special Variables Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

Statistical Prediction

Statistical Prediction Statistical Prediction P.R. Hahn Fall 2017 1 Some terminology The goal is to use data to find a pattern that we can exploit. y: response/outcome/dependent/left-hand-side x: predictor/covariate/feature/independent

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.

More information

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T.

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T. Exam 3 Review Suppose that X i = x =(x 1,, x k ) T is observed and that Y i X i = x i independent Binomial(n i,π(x i )) for i =1,, N where ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T x) This is called the

More information

A strategy for modelling count data which may have extra zeros

A strategy for modelling count data which may have extra zeros A strategy for modelling count data which may have extra zeros Alan Welsh Centre for Mathematics and its Applications Australian National University The Data Response is the number of Leadbeater s possum

More information

Chapter 11. Regression with a Binary Dependent Variable

Chapter 11. Regression with a Binary Dependent Variable Chapter 11 Regression with a Binary Dependent Variable 2 Regression with a Binary Dependent Variable (SW Chapter 11) So far the dependent variable (Y) has been continuous: district-wide average test score

More information

Week 2: Review of probability and statistics

Week 2: Review of probability and statistics Week 2: Review of probability and statistics Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ALL RIGHTS RESERVED

More information

Econometrics II. Seppo Pynnönen. Spring Department of Mathematics and Statistics, University of Vaasa, Finland

Econometrics II. Seppo Pynnönen. Spring Department of Mathematics and Statistics, University of Vaasa, Finland Department of Mathematics and Statistics, University of Vaasa, Finland Spring 2018 Part III Limited Dependent Variable Models As of Jan 30, 2017 1 Background 2 Binary Dependent Variable The Linear Probability

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Generalised linear models. Response variable can take a number of different formats

Generalised linear models. Response variable can take a number of different formats Generalised linear models Response variable can take a number of different formats Structure Limitations of linear models and GLM theory GLM for count data GLM for presence \ absence data GLM for proportion

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Modeling Overdispersion

Modeling Overdispersion James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 Introduction 2 Introduction In this lecture we discuss the problem of overdispersion in

More information

Generalized Linear Models

Generalized Linear Models York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear

More information

BMI 541/699 Lecture 22

BMI 541/699 Lecture 22 BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based

More information

Causal Mediation Analysis in R. Quantitative Methodology and Causal Mechanisms

Causal Mediation Analysis in R. Quantitative Methodology and Causal Mechanisms Causal Mediation Analysis in R Kosuke Imai Princeton University June 18, 2009 Joint work with Luke Keele (Ohio State) Dustin Tingley and Teppei Yamamoto (Princeton) Kosuke Imai (Princeton) Causal Mediation

More information

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Motivation: Why Applied Statistics?

More information

First Year Examination Department of Statistics, University of Florida

First Year Examination Department of Statistics, University of Florida First Year Examination Department of Statistics, University of Florida August 20, 2009, 8:00 am - 2:00 noon Instructions:. You have four hours to answer questions in this examination. 2. You must show

More information

0.1 gamma.mixed: Mixed effects gamma regression

0.1 gamma.mixed: Mixed effects gamma regression 0. gamma.mixed: Mixed effects gamma regression Use generalized multi-level linear regression if you have covariates that are grouped according to one or more classification factors. Gamma regression models

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/ Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/28.0018 Statistical Analysis in Ecology using R Linear Models/GLM Ing. Daniel Volařík, Ph.D. 13.

More information

Multiple regression: Categorical dependent variables

Multiple regression: Categorical dependent variables Multiple : Categorical Johan A. Elkink School of Politics & International Relations University College Dublin 28 November 2016 1 2 3 4 Outline 1 2 3 4 models models have a variable consisting of two categories.

More information

Non-Gaussian Response Variables

Non-Gaussian Response Variables Non-Gaussian Response Variables What is the Generalized Model Doing? The fixed effects are like the factors in a traditional analysis of variance or linear model The random effects are different A generalized

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information

cor(dataset$measurement1, dataset$measurement2, method= pearson ) cor.test(datavector1, datavector2, method= pearson )

cor(dataset$measurement1, dataset$measurement2, method= pearson ) cor.test(datavector1, datavector2, method= pearson ) Tutorial 7: Correlation and Regression Correlation Used to test whether two variables are linearly associated. A correlation coefficient (r) indicates the strength and direction of the association. A correlation

More information