PS 203 Spring 2002 Homework One - Answer Key


1. If you have a home or office computer, download and install WinBUGS. If you don't have your own computer, try running WinBUGS in the Department lab.

2. The data set aes96media (on the PS203 website) contains survey data on voting in the 1996 Australian Federal Election. The variables are:

y, a dummy for whether survey respondent i voted for the Labor Party (1) or not (0)

PID, an indicator of partisanship (1 for Strong Labor, 2 for Weak Labor, 3 for Lean Labor, 4 for Independents, 5 for Lean Conservative, 6 for Weak Conservative, 7 for Strong Conservative)

media, a scale measure of media consumption through the election campaign (0 through 1, corresponding to low through high)

quiz, a scale measure of the respondent's level of political information, ascertained through a series of objective true/false items administered at the end of the survey (0 through 1, corresponding to low through high)

Use the available predictors to model the voting outcomes y, via logit. Use a series of dummy variables for each level of party identification (collapse the weak and strong conservative categories, since no strong conservatives voted Labor). Include an interaction between media and quiz.

(a) Briefly interpret the coefficients and report on the fit of the model to the data.

The party identification dummies perform as expected, with a steady monotonic decreasing pattern from Strong Labor through to Weak/Strong Conservative. The media and quiz coefficients tap the effect of a unit change in one of these variables when the other is set to its lowest level. Thus, the coefficient on the media exposure variable suggests that when political information is at its lowest level (zero), increased media exposure decreases the probability of voting Labor. Alternatively, the coefficient on quiz suggests that when media exposure is at its lowest level (zero), as political information increases, the probability of a Labor vote also decreases. The positive interaction term indicates that as both variables increase, these negative effects eventually turn into positive effects (see the next question and Figure 1).

(b) Use the estimated coefficients to solve for the level of political information z such that, conditional on z, media consumption has no impact on the probability of voting for the ALP.

The logit model can be written as

$$p_i = F(\lambda_i), \qquad \lambda_i = \alpha + x_{i1}\beta_1 + x_{i2}\beta_2 + x_{i1}x_{i2}\beta_3,$$

and we seek $z = x_{i2}$ such that $\partial p_i / \partial x_{i1} = 0$. Note that

$$\frac{\partial p_i}{\partial x_{i1}} = \frac{\partial F(\lambda_i)}{\partial \lambda_i} \frac{\partial \lambda_i}{\partial x_{i1}} = f(\lambda_i)\,(\beta_1 + x_{i2}\beta_3).$$

Since $f(\lambda_i) \neq 0$ for all $\lambda_i$,

$$\frac{\partial p_i}{\partial x_{i1}} = 0 \iff \beta_1 + x_{i2}\beta_3 = 0 \iff z = -\frac{\beta_1}{\beta_3},$$

where $\beta_1$ is the coefficient on media and $\beta_3$ is the coefficient on the interaction between media and quiz. This ratio evaluates to approximately .66. That is, for respondents with a quiz score of .66, there is no relationship between media exposure and the probability of voting Labor. Note that this is a very high level of quiz, corresponding to the 77th to 87th percentiles of this variable. Note also that for values of quiz greater than .66, the effect of media exposure is actually positive. See the contour plot in Figure 1.

(c) Use simulation methods to obtain a 95% confidence bound for z.

To do this, I simply sampled from the multivariate Normal distribution implied for $\beta$ by the MLEs. That is, from a Bayesian perspective, if we have flat priors over $\beta$, then the posterior for $\beta$ is proportional to the likelihood, and so

$$p(\beta \mid \text{data}) \approx N\!\left(\hat\beta_{MLE},\, V(\hat\beta_{MLE})\right).$$

Then we can induce a posterior on $g(\beta)$, denoted $p(g(\beta) \mid \text{data})$, by repeating the following steps many times ($t = 1, \ldots, T$):

i. sample $\beta^{(t)}$ from $p(\beta \mid \text{data})$

ii. form $g^{(t)} = g(\beta^{(t)})$.

In this case, $g(\beta) = -\beta_1/\beta_3$. Since this is a ratio of two (correlated) random variables, $p(g(\beta) \mid \text{data})$ has Cauchy-like properties with extremely heavy tails. In fact, the more Monte Carlo simulations we draw, the further we probe into the heavy tails. With 500,000 draws, the median of $p(g(\beta) \mid \text{data})$, .66, is equal to the value implied by the MLEs, with a 95 percent confidence interval extending outside the unit interval on which quiz is measured. A fifty percent bound (the inter-quartile range) is [.56, .87].
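A minimal sketch of this fit-then-simulate procedure, assuming the data sit in a CSV file named aes96media.csv (a hypothetical filename) with the columns y, PID, media, and quiz described above:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("aes96media.csv")
# Collapse weak (6) and strong (7) conservatives into one category.
df["pid"] = df["PID"].replace({7: 6})

# Logit with PID dummies and a media-by-quiz interaction.
fit = smf.logit("y ~ C(pid) + media * quiz", data=df).fit()
print(fit.summary())

# Simulate from the asymptotic (flat-prior) posterior N(bhat, V(bhat)).
rng = np.random.default_rng(0)
draws = rng.multivariate_normal(fit.params.values,
                                fit.cov_params().values,
                                size=500_000)

names = list(fit.params.index)
b1 = draws[:, names.index("media")]        # coefficient on media
b3 = draws[:, names.index("media:quiz")]   # interaction coefficient
z = -b1 / b3                               # g(b) = -b1/b3

# Heavy-tailed ratio: report quantiles rather than a mean.
print(np.percentile(z, [25, 50, 75]))      # median and inter-quartile range
print(np.percentile(z, [2.5, 97.5]))       # 95 percent bound
```

Summarizing z with its median and inter-quartile range, as in the answer above, avoids the instability that the Cauchy-like tails induce in moment-based summaries.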

Figure 1: Predicted Probabilities of Labor Vote, as a function of Quiz (political information) and Media (media exposure).

3. Consider the lung cancer data presented in class (from Johnson and Albert's Ordinal Data Modeling, p. 35). Eighty-six lung cancer patients and a matched sample of 86 controls were questioned about their smoking habits. The two groups were chosen to represent random samples from a subpopulation of lung-cancer patients and an otherwise similar population of cancer-free individuals. The following table summarises the data:

             Cancer   Control
Smokers          83        72
Nonsmokers        3        14

Let $0 < p_L < 1$ and $0 < p_C < 1$ denote the population proportions of lung-cancer patients and controls who smoke, respectively. Assume a binomial model for the data and independence (both within and across groups).

(a) With uninformative (uniform) priors on $p_L$ and $p_C$, report the posterior means of these parameters, along with 95% credible intervals.

The uninformative (uniform) priors for $p_L$ and $p_C$ are equivalent to Beta(1,1) distributions, yielding posteriors with the following characteristics:

$$p(p_L \mid \text{data}) = \text{Beta}(83 + 1,\, 3 + 1)$$
$$p(p_C \mid \text{data}) = \text{Beta}(72 + 1,\, 14 + 1)$$

Parameter   Mean            Mode (MLE)
$p_L$       84/88 = .955    83/86 = .965
$p_C$       73/88 = .830    72/86 = .837

The 95% credible intervals are the .025 and .975 quantiles of these Beta densities (computed in the sketch below).

(b) Consider the quantity $d = p_L - p_C$. With the same uninformative priors on $p_L$ and $p_C$, summarize the posterior density implied by the model for $d$.

$d$ has a posterior mean of .125, with a 95 percent confidence interval extending from .04 to .22.

(c) Compare your Bayesian inferences about $p_L$, $p_C$ and $d$ with those from a classical, likelihood analysis.

For $p_L$ and $p_C$, see the table above. Via independence across the two groups, the MLE of $d$ is simply the difference of the two within-group point estimates, or $(83 - 72)/86 = 11/86$, or about .128. Note that this point estimate corresponds to the posterior mode obtained with flat priors. To obtain a confidence interval for $d$, I rely on asymptotically-valid normal approximations as summaries of uncertainty in the group-specific MLEs. This is convenient, since the normal is completely characterized by its mean and variance, and so a 95% confidence interval for the MLE of $d$ can be obtained by simply adding/subtracting 1.96 standard errors to the MLE of $d$. By independence across groups, the variance of the MLE of $d$ is

$$V(\hat d_{MLE}) = V(\hat p_{L,MLE} - \hat p_{C,MLE}) = V(\hat p_{L,MLE}) + V(\hat p_{C,MLE}) = \frac{p_L(1 - p_L)}{n} + \frac{p_C(1 - p_C)}{n}.$$

1.96 times the square root of this variance is the half-width of a 95% bound for $d$, which is [.041, .215], not dissimilar from that obtained via the Bayesian simulation procedure. That is, the asymptotic normal approximation is not bad with this sample size (n.b., n = 86).
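A short sketch reproducing these numbers (including the 95% credible intervals for $p_L$ and $p_C$) with scipy, using the counts from the table:

```python
import numpy as np
from scipy import stats

n = 86
pL = stats.beta(83 + 1, 3 + 1)    # posterior for p_L under a Beta(1,1) prior
pC = stats.beta(72 + 1, 14 + 1)   # posterior for p_C under a Beta(1,1) prior

# (a) posterior means and 95% credible intervals
for name, post in [("p_L", pL), ("p_C", pC)]:
    print(name, post.mean(), post.ppf([0.025, 0.975]))

# (b) posterior for d = p_L - p_C, by simulation
rng = np.random.default_rng(0)
d = pL.rvs(size=1_000_000, random_state=rng) - \
    pC.rvs(size=1_000_000, random_state=rng)
print(d.mean(), np.percentile(d, [2.5, 97.5]))

# (c) classical analysis: MLE of d and a 95% normal-approximation interval
phatL, phatC = 83 / n, 72 / n
d_mle = phatL - phatC
se = np.sqrt(phatL * (1 - phatL) / n + phatC * (1 - phatC) / n)
print(d_mle, d_mle - 1.96 * se, d_mle + 1.96 * se)
```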

(d) Prior Sensitivity Analysis: Imagine that after seeing the data in the table (above), a skeptic maintains that he is still not convinced that $p_L > p_C$. Assume this skeptic has an uninformative prior on $p_C$. Find a prior on $p_L$ that rationalizes the skeptic's posterior beliefs.

There is an infinite set of Beta priors for $p_L$ that will rationalize the skeptic's beliefs. To provide a sense of the mapping from prior to posterior, I use a data-equivalent representation of the family of priors, summarizing a Beta$(\alpha, \beta)$ density with its mean and an equivalent sample size. I summarize the mapping from the skeptic's priors (a two-space, since the Beta density takes two parameters) into the posterior probability that $p_L > p_C$; see Figure 2.

Figure 2: Mapping from Prior over $p_L$ to Posterior Mean of $d$. The contour lines connect points in the prior space for $p_L$ (defined as a prior mean and an equivalent prior n) that give rise to the same posterior mean for $d$. For instance, an uninformative prior (prior mean = .5 and prior sample size of zero) yields a posterior mean for $d$ of just over .1. The observed data for lung cancer patients (solid square) and the control group (open circle) are also represented in this prior space for comparison.

4. In generating state-level forecasts of the 2000 presidential vote, Jackman and Rivers used historical election results as priors. For instance, the average of Democratic presidential vote share in 1988, 1992 and 1996 was used to generate a prior for forecasting the 2000 outcomes. For California, this averaging of historical results yields a prior mean for Democratic vote share of 48.4%. Jackman and Rivers complete the specification of their prior by assuming that, after controlling for period-specific national-level shocks, vote shares vary randomly around a stable long-term average level specific to each state. They estimated this within-state random component to have a standard deviation of 3.1 percentage points. This prior information is to be combined with poll numbers from the 2000 election season to generate state-level forecasts. For instance, a Zogby poll of 436 Californian likely voters fielded on August 23, 2000 found 42% support for Gore. Use Bayesian methods to combine the historical prior information with the poll information to come up with a posterior density over Gore support in California. Report the posterior mean and a 95% confidence interval.

Hints: you will have to first convert the prior information into a form suitable for pooling with the poll data (or vice-versa). For instance, if you assume a binomial model for the poll data, then you will have to convert the historical prior information into a conjugate Beta density. On the other hand, you might assume that a normal model and prior is a suitable characterization of the information in the poll and the historical data, in which case you will need to convert the poll information into a form captured by the parameters of a normal distribution.

First, try converting the poll information into a form suitable for pooling (via Bayes Rule) with the historical information. The historical information is expressed as a mean and a standard deviation, which, for convenience, we can interpret as the sufficient statistics of a normal distribution. The poll information can also be expressed in terms of the sufficient statistics of a normal distribution, i.e., mean = .42 and variance

$$\text{var}(p) = \frac{p(1 - p)}{n} = \frac{.42 \times .58}{436} = .000559,$$

while the historical analysis yields a variance of $.031^2 = .000961$. Pooling this information via Bayes Rule yields a variance of

$$v = \left(\frac{1}{.000559} + \frac{1}{.000961}\right)^{-1} = \frac{1}{2830} = .000353,$$

or a standard deviation of 1.88 percentage points. The pooled (or posterior) mean is the precision-weighted average

$$\bar{x} = v \left(\frac{.42}{.000559} + \frac{.484}{.000961}\right) = .444.$$
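A sketch of this precision-weighted pooling, using only the numbers given in the problem:

```python
import numpy as np

# Poll: 42% support among 436 likely voters
p_poll, n_poll = 0.42, 436
v_poll = p_poll * (1 - p_poll) / n_poll        # ~ .000559

# Historical prior: mean 48.4%, sd 3.1 percentage points
m_hist, v_hist = 0.484, 0.031 ** 2             # ~ .000961

# Posterior precision is the sum of the two precisions;
# the posterior mean is the precision-weighted average.
v_post = 1.0 / (1.0 / v_poll + 1.0 / v_hist)
m_post = v_post * (p_poll / v_poll + m_hist / v_hist)

print(m_post, np.sqrt(v_post))                 # ~ .444 and ~ .0188
print(m_post - 1.96 * np.sqrt(v_post),
      m_post + 1.96 * np.sqrt(v_post))         # 95% interval
```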


Another approach is to turn the historical information into a form suitable for pooling with the poll information. This can be done by treating the historical information as the equivalent of a Beta prior for the binomial poll data. The historical information has mean .484 and variance .000961, which we can use to solve for the parameters of a Beta$(\alpha, \beta)$ distribution, noting that

$$\frac{\alpha}{\alpha + \beta} = .484, \qquad \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)} = .000961.$$

Solving for $\alpha$ and $\beta$ yields $\alpha \approx 125.3$ and $\beta \approx 133.6$. The binomial data from the poll can be represented as $y = .42 \times 436 \approx 183$ successes from $n = 436$ trials, and so the posterior is a Beta density with parameters $125.3 + 183 \approx 308$ and $133.6 + 253 \approx 387$. This Beta distribution has mean .444 and variance .000355, or standard deviation about 1.88 percentage points. Note that the two approaches to this problem yield identical answers.
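A companion sketch of this Beta-binomial route, matching the historical mean and variance to a Beta prior and then updating with the poll (the fractional success count y = .42 x 436 is used directly rather than rounded):

```python
from scipy import stats

m, v = 0.484, 0.031 ** 2            # historical mean and variance

# Moment matching: a + b = m(1 - m)/v - 1, then split by the mean.
ab = m * (1 - m) / v - 1
a, b = m * ab, (1 - m) * ab         # ~ 125.3 and ~ 133.6

y, n = 0.42 * 436, 436              # ~ 183 "successes" in 436 trials
post = stats.beta(a + y, b + n - y)
print(post.mean(), post.std())      # ~ .444 and ~ .0188, matching the
                                    # normal-normal pooling above
```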

5. Given data $y = (y_1, y_2, \ldots, y_n)$, consider the model $y_i \overset{iid}{\sim} N(\theta_1 + \theta_2, 1)$, $i = 1, \ldots, n$. Prove that

(a) $\theta_1$ and $\theta_2$ are unidentified.

The likelihood for these iid normal data is

$$p(y; \theta_1, \theta_2, \sigma^2 = 1) = \prod_{i=1}^n p(y_i; \theta_1, \theta_2, \sigma^2 = 1) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-(y_i - \theta_1 - \theta_2)^2}{2}\right] = (2\pi)^{-n/2} \exp\left[-\sum_{i=1}^n \frac{(y_i - \theta_1 - \theta_2)^2}{2}\right],$$

and so

$$\ln p(y) \propto -\frac{1}{2} \sum_{i=1}^n (y_i - \theta_1 - \theta_2)^2 = -\frac{1}{2}\left(\sum y_i^2 + n\theta_1^2 + n\theta_2^2 - 2\theta_1 \sum y_i - 2\theta_2 \sum y_i + 2n\theta_1\theta_2\right).$$

Now we have the following derivatives:

$$\frac{\partial \ln p(y)}{\partial \theta_1} = -n\theta_1 + \sum y_i - n\theta_2, \qquad \frac{\partial \ln p(y)}{\partial \theta_2} = -n\theta_1 + \sum y_i - n\theta_2,$$

$$\frac{\partial^2 \ln p(y)}{\partial \theta_1^2} = -n, \qquad \frac{\partial^2 \ln p(y)}{\partial \theta_2^2} = -n, \qquad \frac{\partial^2 \ln p(y)}{\partial \theta_1 \partial \theta_2} = \frac{\partial^2 \ln p(y)}{\partial \theta_2 \partial \theta_1} = -n,$$

and so the Hessian (the matrix of second derivatives) of the log-likelihood is

$$H = \begin{bmatrix} -n & -n \\ -n & -n \end{bmatrix} = -n\,\boldsymbol{\iota}\boldsymbol{\iota}',$$

where $\boldsymbol{\iota}$ is a vector of ones. This matrix is clearly not of full column rank (column one is a linear combination of column two, and vice-versa), and hence singular. This implies that the likelihood function does not have a unique maximum with respect to $\theta = (\theta_1, \theta_2)$, and so the parameters are not identified.

(b) $\theta_1 + \theta_2$ is identified.

This is rather trivial and I will not elaborate here. The model for the mean is now re-parameterized as $\mu = \theta_1 + \theta_2$. Twice differentiate the log-likelihood function with respect to $\mu$; the second derivative is $-n$, implying that the likelihood over $\mu$ has a unique maximum.

(c) normal priors with finite variances on $\theta_1$ and $\theta_2$ are sufficient to identify $\theta_1$ and $\theta_2$.

Harder problem. The strategy of proof is to note that since a posterior is proportional to a prior times a likelihood, a log-posterior is proportional to the log-prior plus the log-likelihood, and further, the Hessian of the log-posterior equals the Hessian of the log-prior plus the Hessian of the log-likelihood. We have shown that the Hessian of the log-likelihood is not of full rank. It remains to be shown that with proper priors this is no longer the case and, further, that the Hessian of the log-posterior is negative definite, implying a unique posterior mode for $\theta = (\theta_1, \theta_2)$.
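A sketch of that remaining step, assuming independent $N(m_j, \tau_j^2)$ priors on the $\theta_j$ (the $m_j$ and $\tau_j^2$ are generic hyperparameters, not values given in the problem): the Hessian of the log-prior is $\mathrm{diag}(-1/\tau_1^2, -1/\tau_2^2)$, and so the Hessian of the log-posterior is

$$H_{\text{post}} = -n\,\boldsymbol{\iota}\boldsymbol{\iota}' - \begin{bmatrix} 1/\tau_1^2 & 0 \\ 0 & 1/\tau_2^2 \end{bmatrix} = \begin{bmatrix} -n - 1/\tau_1^2 & -n \\ -n & -n - 1/\tau_2^2 \end{bmatrix},$$

which has trace $-2n - 1/\tau_1^2 - 1/\tau_2^2 < 0$ and determinant $n/\tau_1^2 + n/\tau_2^2 + 1/(\tau_1^2 \tau_2^2) > 0$ whenever the prior variances are finite. $H_{\text{post}}$ is therefore negative definite, and the posterior mode is unique.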
