Advanced Quantitative Research Methodology Lecture Notes: Ecological Inference, January 28, 2012

Advanced Quantitative Research Methodology Lecture Notes: Ecological Inference

Gary King
http://gking.harvard.edu
January 28, 2012

© Copyright 2008 Gary King, All Rights Reserved.

Reading

Gary King. A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data. Princeton University Press, 1997.

Preliminaries

Definition: Ecological inference is the process of using aggregate (i.e., "ecological") data to infer discrete individual-level relationships of interest when individual-level data are not available.

History of the Problem:

1. Ogburn and Goltra (1919), in the very first multivariate statistical analysis of politics in a political science journal, made ecological inferences and recognized the problem. The big issue in 1919: are the newly enfranchised women going to take over the political system? They regressed votes in referenda in Oregon precincts on the percent of women in each precinct. But they worried:

   "It is also theoretically possible to gerrymander the precincts in such a way that there may be a negative correlative even though men and women each distribute their votes 50 to 50 on a given measure..." (Ogburn and Goltra, 1919).

Preliminaries: Continued

2. Robinson (1950) clarified the problem, causing:
   (a) several literatures to wither, including studies of local and regional politics through aggregate electoral statistics, in favor of survey research based on national samples.
   (b) the development of a methodological literature devoted to solving the problem.
3. Hundreds of other articles have helped us understand the problem.

History of Solutions: a 45-year war between supporters of
1. Duncan and Davis (1953): a deterministic solution.
2. Goodman (1953, 1959): a statistical solution.
3. For 50 years, no other methods were used in applications.

If you can avoid making ecological inferences, do so!

Some of those who aren't so lucky:
1. Public policy: Applying the Voting Rights Act.
2. History: Who voted for the Nazis?
3. Marketing: What types of people buy your products?
4. Banking: Are banks complying with red-lining laws? Are there areas with certain types of people who might take out loans but have not?
5. Candidates for office: How do good representatives decide what policies they should favor? How can candidates tailor campaign appeals and target voter groups?
6. Sociology: Do the unemployed commit more crimes, or is it just that there are more crimes in unemployed areas?
7. Economics: With some exceptions, most theories are based on assumptions about individuals, but most data are on groups.

If you can avoid making ecological inferences, do so!: Continued

Some of those who aren't so lucky:
8. Education: Do students who attend private schools through a voucher system do as well as students who can afford to attend on their own?
9. Atmospheric physics: How can we tell which types of the vehicles actually on the roads emit more carbon dioxide and carbon monoxide?
10. Oceanography: How many marine organisms of a certain type were collected at a given depth, from fishing nets dropped from the surface down through a variety of depths?
11. Epidemiology: Does radon cause lung cancer?
12. Changes in public opinion: How to use repeated independent cross-sectional surveys to measure individual change?

The Problem: The District Level

Race of Voting             Voting Decision
Age Person       Democrat   Republican   No vote      Total
black                ?           ?           ?        55,054
white                ?           ?           ?        25,706
Total             19,896      10,936      49,928      80,760

The Ecological Inference Problem at the District Level: The 1990 Election to the Ohio State House, District 42. The goal is to infer from the marginal entries (each of which is the sum of the corresponding row or column) to the cell entries. (Note the information in the bounds.)
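The remark about information in the bounds can be made concrete with a short sketch. It assumes only the standard deterministic bounds on a cell count given its row total, column total, and grand total; the function name `cell_bounds` and the dictionary layout are illustrative choices, not part of the lecture.

```python
def cell_bounds(row_total, col_total, grand_total):
    """Deterministic bounds on one cell count of a table with known margins."""
    # A cell cannot exceed its row or column total, and it must absorb
    # whatever part of the column cannot fit in the other rows.
    lower = max(0, row_total + col_total - grand_total)
    upper = min(row_total, col_total)
    return lower, upper

# Margins from the District 42 table.
rows = {"black": 55_054, "white": 25_706}
cols = {"Democrat": 19_896, "Republican": 10_936, "No vote": 49_928}
total = 80_760

bounds = {
    (race, choice): cell_bounds(r, c, total)
    for race, r in rows.items()
    for choice, c in cols.items()
}

for (race, choice), (lo, hi) in bounds.items():
    print(f"{race:5s} {choice:10s} between {lo:6d} and {hi:6d}")
```

Most cells get the trivial lower bound of zero, but the margins alone already force at least 24,222 of the black voting-age population into the "No vote" column.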

The Problem: The Precinct Level

Race of Voting             Voting Decision
Age Person       Democrat   Republican   No vote      Total
black                ?           ?           ?          221
white                ?           ?           ?          484
Total               130          92         483         705

The Ecological Inference Problem at the Precinct Level: Precinct P in District 42 (1 of 131 in the district). The goal is to infer from the margins of a set of tables like this one to the cell entries in each.

The best we could do, circa 1996

                    Estimated Percent of Blacks
Year   District     Voting for the Democratic Candidate
1986      12             95.65%
          23            100.06
          29            103.47
          31             98.92
          42            108.41
          45             93.58
1988      12             95.67
          23            102.64
          29            105.00
          31            100.20
          42            111.05
          45             97.49

Sample Ecological Inferences: All Ohio State House districts where an African American Democrat ran against a white Republican, 1986-1990 (Source: Statement of Gordon G. Henderson, presented as an exhibit in federal court, using Goodman's regression). Figures above 100% are logically impossible.

The best we could do, circa 1996: Continued

                    Estimated Percent of Blacks
Year   District     Voting for the Democratic Candidate
1990      12             94.79%
          14             97.83
          16             94.36
          23            101.09
          25             98.83
          29            103.42
          31            102.17
          36            101.35
          37            101.39
          42            109.63
          45             97.62

Sample Ecological Inferences: All Ohio State House districts where an African American Democrat ran against a white Republican, 1986-1990 (Source: Statement of Gordon G. Henderson, presented as an exhibit in federal court, using Goodman's regression). Figures above 100% are logically impossible.

What Information Does The New Method Provide?

Goodman's Method: One incorrect number (5 standard deviations outside the deterministic bounds).

The New Method:

[Figure: Non-minority Turnout in New Jersey Cities and Towns.]

In contrast to the best existing methods, which provide one (incorrect) number for the entire state, the method offered here gives an accurate estimate of white turnout for all 567 minor civil divisions in the state, a few of which are labeled.

Notation

              Vote             No vote
black      $\beta_i^b$     $1 - \beta_i^b$     $X_i$
white      $\beta_i^w$     $1 - \beta_i^w$     $1 - X_i$
           $T_i$           $1 - T_i$

Notation for Precinct $i$ ($i = 1, \ldots, p$).

Observed variables:
$T_i$ = voter Turnout in precinct $i$
$X_i$ = Black proportion of Voting Age Population in precinct $i$

Unobserved quantities of interest:
$\beta_i^b$ = fraction of blacks who vote in precinct $i$
$\beta_i^w$ = fraction of whites who vote in precinct $i$

Notation: Continued

An accounting identity (a fact, not an assumption):

$$T_i = \beta_i^b X_i + \beta_i^w (1 - X_i) = \beta_i^w + (\beta_i^b - \beta_i^w) X_i$$

Goodman's regression: Run a regression of $T_i$ on $X_i$ and $(1 - X_i)$ (no constant term). Coefficients are intended to be:
$B^b$, District-wide black turnout
$B^w$, District-wide white turnout
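As a rough illustration of Goodman's regression, the sketch below fits $T_i$ on $X_i$ and $(1 - X_i)$ with no constant by solving the two normal equations directly. The precinct values are made up, and the turnout rates are held constant across precincts (0.4 for blacks, 0.7 for whites), which is exactly the case in which the regression recovers them. The name `goodman_regression` is an illustrative choice.

```python
def goodman_regression(x, t):
    """Least-squares fit of t on x and (1 - x) with no constant term.

    Returns (B_b, B_w), the coefficients on x and (1 - x).
    """
    z = [1.0 - xi for xi in x]
    # Sums of squares and cross-products for the 2x2 normal equations.
    sxx = sum(xi * xi for xi in x)
    szz = sum(zi * zi for zi in z)
    sxz = sum(xi * zi for xi, zi in zip(x, z))
    sxt = sum(xi * ti for xi, ti in zip(x, t))
    szt = sum(zi * ti for zi, ti in zip(z, t))
    # Solve by Cramer's rule.
    det = sxx * szz - sxz * sxz
    b_b = (sxt * szz - szt * sxz) / det
    b_w = (szt * sxx - sxt * sxz) / det
    return b_b, b_w

# Hypothetical precincts: fraction black, and turnout generated from the
# accounting identity with constant rates 0.4 (black) and 0.7 (white).
x = [0.1, 0.3, 0.5, 0.7, 0.9]
t = [0.4 * xi + 0.7 * (1 - xi) for xi in x]

b_b, b_w = goodman_regression(x, t)
print(f"B_b = {b_b:.3f}, B_w = {b_w:.3f}")
```

With constant parameters the identity holds exactly with the same coefficients in every precinct, so the fit is perfect; the problems arise once the rates vary across precincts.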

Selected Problems with Goodman's Approach

If we follow Goodman's advice, we won't apply the model.

If we don't follow Goodman's advice and apply it anyway:

1. We know parameters are not constant.

[Figure: scatterplot of $T_i$ (vertical axis, 0 to 1) against $X_i$ (horizontal axis, 0 to 1).]

Precincts in Marion County, Indiana: Voter Turnout for the U.S. Senate by Fraction Black, 1990.

Selected Problems with Goodman's Approach: Continued

The accounting identity, $T_i = \beta_i^b X_i + \beta_i^w (1 - X_i)$, contains no error other than that due to parameter variation. Thus, all scatter around the regression line is due to parameter variation.

2. Goodman's model does not take into account information from the method of bounds or from massive heteroskedasticity in aggregate data. See the graph.

3. Goodman's regression is biased in the presence of aggregation bias: $C(\beta_i^b, X_i) \neq 0$ or $C(\beta_i^w, X_i) \neq 0$. (True in any regression, even if not ecological.)
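A small numerical sketch of item 3, under loudly stated assumptions: the precinct values are invented, both turnout rates are made to rise linearly with $X_i$ (so both covariances are nonzero), and all precincts are assumed to be the same size. Because regressing $T_i$ on $X_i$ and $(1 - X_i)$ without a constant spans the same space as regressing on a constant and $X_i$, the code uses the familiar intercept and slope formulas and then converts them.

```python
n = 19
x = [0.05 * k for k in range(1, n + 1)]       # fraction black: 0.05, ..., 0.95

# Hypothetical precinct-level turnout rates, both correlated with x.
beta_b = [0.5 + 0.4 * xi for xi in x]
beta_w = [0.3 + 0.4 * xi for xi in x]

# Observed turnout from the accounting identity.
t = [bb * xi + bw * (1 - xi) for bb, bw, xi in zip(beta_b, beta_w, x)]

# OLS of t on a constant and x.
x_bar = sum(x) / n
t_bar = sum(t) / n
sxy = sum((xi - x_bar) * (ti - t_bar) for xi, ti in zip(x, t))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = t_bar - slope * x_bar

# Goodman's coefficients: B_w is the intercept, B_b is intercept + slope.
goodman_b_w = intercept
goodman_b_b = intercept + slope

# True district-wide rates, weighting each precinct by its group share
# (equal precinct sizes assumed).
true_b_b = sum(bb * xi for bb, xi in zip(beta_b, x)) / sum(x)
true_b_w = sum(bw * (1 - xi) for bw, xi in zip(beta_w, x)) / sum(1 - xi for xi in x)

print(f"Goodman: B_b = {goodman_b_b:.2f}, B_w = {goodman_b_w:.2f}")
print(f"Truth:   B_b = {true_b_b:.2f}, B_w = {true_b_w:.2f}")
```

In this constructed example the regression fits the data perfectly, yet it overstates black turnout and understates white turnout, which illustrates why a good fit says nothing about aggregation bias.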

Selected Problems with Goodman's Approach: Continued

4. We cannot correct for aggregation bias within Goodman's framework.
   (a) The good idea that doesn't work: since the coefficients vary with $X_i$, let's model that explicitly, hence using $X_i$ to control for the covariation.
   (b) More specifically, even if $C(\beta_i^b, X_i) \neq 0$, if we control for $Z_i$ it might be true that $C(\beta_i^b, X_i \mid Z_i) = 0$. And if $Z_i = X_i$, it's true for sure.
   (c) Take Goodman's regression:
       $$E(T_i) = B^b X_i + B^w (1 - X_i)$$
   (d) Let $B^b = \gamma_0 + \gamma_1 X_i$ and $B^w = \theta_0 + \theta_1 X_i$ and substitute:
       $$E(T_i) = (\gamma_0 + \gamma_1 X_i) X_i + (\theta_0 + \theta_1 X_i)(1 - X_i) = \theta_0 + (\gamma_0 + \theta_1 - \theta_0) X_i + (\gamma_1 - \theta_1) X_i^2$$
   (e) Model is not identified: four parameters need to be estimated ($\gamma_0$, $\gamma_1$, $\theta_0$, and $\theta_1$), but only three can be estimated ($\theta_0$ and the coefficients in parentheses on $X_i$ and $X_i^2$).

5. If the number of people differs across precincts, Goodman's model is not estimating the correct quantity of interest.
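The identification failure in item 4(e) can be checked numerically. The sketch below uses two made-up parameter sets that differ by a shift `delta` (a hypothetical value) and shows that they imply the same expected turnout at every $X_i$, so no amount of aggregate data could tell them apart. The function names are illustrative.

```python
def expected_turnout(x, gamma0, gamma1, theta0, theta1):
    """E(T_i) when B^b = gamma0 + gamma1*x and B^w = theta0 + theta1*x."""
    return (gamma0 + gamma1 * x) * x + (theta0 + theta1 * x) * (1 - x)

def reduced_form(gamma0, gamma1, theta0, theta1):
    """The three coefficients (on 1, x, x**2) that the data can pin down."""
    return (theta0, gamma0 + theta1 - theta0, gamma1 - theta1)

# Two different structural parameter sets (hypothetical values).
params_a = (0.50, 0.20, 0.30, 0.10)
delta = 0.15
params_b = (
    params_a[0] - delta,   # gamma0 shifts down
    params_a[1] + delta,   # gamma1 shifts up
    params_a[2],           # theta0 unchanged
    params_a[3] + delta,   # theta1 shifts up
)

grid = [k / 10 for k in range(11)]
max_gap = max(
    abs(expected_turnout(x, *params_a) - expected_turnout(x, *params_b))
    for x in grid
)

print("reduced form A:", reduced_form(*params_a))
print("reduced form B:", reduced_form(*params_b))
print("largest difference in E(T):", max_gap)
```

Any value of `delta` gives the same result, which is the one-dimensional family of observationally equivalent models left over when four parameters are mapped into three estimable coefficients.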

The Data

[Figure: A Scattercross Graph of Voter Turnout by Fraction Hispanic. Horizontal axis: $X_i$; vertical axis: $T_i$; both run from 0 to 1.]

Solve the accounting identity:
$$T_i = \beta_i^w + (\beta_i^b - \beta_i^w) X_i$$
for the unknowns:
$$\beta_i^w = \left(\frac{T_i}{1 - X_i}\right) - \left(\frac{X_i}{1 - X_i}\right) \beta_i^b$$

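The Data: Continued

Precinct 52: $T_{52} = .19$, $X_{52} = .88$
$$\beta_{52}^w = \frac{T_{52}}{1 - X_{52}} - \frac{X_{52}}{1 - X_{52}} \beta_{52}^b = \frac{.19}{1 - .88} - \frac{.88}{1 - .88} \beta_{52}^b = 1.58 - 7.33\, \beta_{52}^b$$

[Figure: plot with $\beta_i^b$ on the horizontal axis and $\beta_i^w$ on the vertical axis, both from 0 to 1.]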
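The arithmetic for precinct 52 can be reproduced in a few lines. This is a minimal sketch; the function name `tomography_line` is illustrative and not from the lecture.

```python
def tomography_line(t, x):
    """Return (intercept, slope) of beta_w = intercept + slope * beta_b,
    obtained by rearranging T_i = beta_b * X_i + beta_w * (1 - X_i)."""
    if not 0 < x < 1:
        raise ValueError("x must lie strictly between 0 and 1")
    return t / (1 - x), -x / (1 - x)


intercept, slope = tomography_line(0.19, 0.88)
print(round(intercept, 2), round(slope, 2))  # 1.58 -7.33
```

Any pair on this line reproduces the observed turnout: for example, $\beta_{52}^b = 0.1$ gives $\beta_{52}^w \approx 0.85$, and $0.1 \times 0.88 + 0.85 \times 0.12 = 0.19$.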

The Model for Data Without Aggregation Bias, But Robust in its Presence

The Goal: Knowledge of $\beta_i^b$ and $\beta_i^w$ in each precinct.

Begin with the basic accounting identity (not an assumption of linearity):
$$T_i = \beta_i^b X_i + \beta_i^w (1 - X_i)$$
Add three assumptions (in the basic version of the model):

1. $\beta_i^b$ and $\beta_i^w$ are truncated bivariate normal.

   [Figure: three panels, each drawn over $\beta_i^b$ and $\beta_i^w$ from 0 to 1, labeled (a) 0.5 0.5 0.15 0.15 0; (b) 0.1 0.9 0.15 0.15 0; (c) 0.8 0.8 0.6 0.6 0.5.]

   (The 5 parameters of this density need to be estimated by forming the likelihood.)

The Model for Data Without Aggregation Bias, But Robust in its Presence

2. No aggregation bias (a priori): $\beta_i^b$ and $\beta_i^w$ are mean independent of $X_i$. This allows a posteriori aggregation bias (i.e., after conditioning on $T_i$).
3. No spatial autocorrelation: $T_i \mid X_i$ are independent over observations.

Deriving the Likelihood Function

1. The story of the model is that we learn things in order:
   (a) (As in regression), everything is conditional on $X_i$, which means we learn it first.
   (b) Then the world draws $\beta_i^b$ and $\beta_i^w$ from a truncated normal, but we don't get to see them.
   (c) Finally, we learn $T_i$, which is computed via the accounting identity deterministically: $T_i = \beta_i^b X_i + \beta_i^w (1 - X_i)$.
2. The random variable is then $T$ (given $X$), whose distribution is induced by the truncated bivariate normal for $(\beta_i^b, \beta_i^w)$.
3. The five parameters of the truncated bivariate normal need to be estimated:
   $$\psi = \{B^b, B^w, \sigma_b, \sigma_w, \rho\} = \{B, \Sigma\}$$
   These are on the untruncated scale (and not quantities of interest) since:
   $$\mathrm{TN}(\beta_i^b, \beta_i^w \mid B, \Sigma) = N(\beta_i^b, \beta_i^w \mid B, \Sigma)\, \frac{1(\beta_i^b, \beta_i^w)}{R(B, \Sigma)}$$
   where
   $$R(B, \Sigma) = \int_0^1 \int_0^1 N(\beta^b, \beta^w \mid B, \Sigma)\, d\beta^b\, d\beta^w$$
   is the volume above the unit square.
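To make the normalizing constant concrete, the following sketch evaluates $R(B, \Sigma)$ and the truncated density using only the Python standard library. It is one possible numerical approach (a midpoint rule over $\beta^b$ combined with the closed-form conditional normal for $\beta^w$), not necessarily how the author's software computes it. It assumes $|\rho| < 1$, and all function names are illustrative.

```python
import math
from statistics import NormalDist


def bvn_pdf(bb, bw, Bb, Bw, sb, sw, rho):
    """Untruncated bivariate normal density N(beta_b, beta_w | B, Sigma)."""
    zb = (bb - Bb) / sb
    zw = (bw - Bw) / sw
    q = (zb * zb - 2 * rho * zb * zw + zw * zw) / (1 - rho * rho)
    return math.exp(-0.5 * q) / (2 * math.pi * sb * sw * math.sqrt(1 - rho * rho))


def unit_square_volume(Bb, Bw, sb, sw, rho, steps=2000):
    """R(B, Sigma): integrate over beta_b in [0, 1] the marginal density of
    beta_b times P(0 <= beta_w <= 1 | beta_b), using the midpoint rule."""
    marginal = NormalDist(Bb, sb)
    cond_sd = sw * math.sqrt(1 - rho * rho)
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        bb = (k + 0.5) * h
        cond = NormalDist(Bw + rho * sw * (bb - Bb) / sb, cond_sd)
        total += marginal.pdf(bb) * (cond.cdf(1.0) - cond.cdf(0.0)) * h
    return total


def tn_pdf(bb, bw, Bb, Bw, sb, sw, rho):
    """Truncated density: the untruncated density divided by R inside the
    unit square, and zero outside it."""
    if not (0 <= bb <= 1 and 0 <= bw <= 1):
        return 0.0
    return bvn_pdf(bb, bw, Bb, Bw, sb, sw, rho) / unit_square_volume(Bb, Bw, sb, sw, rho)


# Example call with an illustrative parameter vector.
print(unit_square_volume(0.8, 0.8, 0.6, 0.6, 0.5))
```

When much of the untruncated mass falls outside the unit square, $R$ is well below 1, which is why the untruncated parameters in $\psi$ are not themselves the quantities of interest.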

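Deriving the Likelihood Function

4. (From simulations of these parameters, we will compute quantities of interest: $\beta_i^b$, $\beta_i^w$. Details shortly.)
5. The likelihood:
$$
\begin{aligned}
L(\psi \mid T) &\propto \prod_{X_i \in (0,1)} P(T_i \mid \psi) \\
&= \prod_{X_i \in (0,1)} \left( \frac{\text{What we observe}}{\text{What we could have observed}} \right) \\
&= \prod_{X_i \in (0,1)} \left( \frac{\text{Area above line segment}}{\text{Volume above square}} \right) \\
&= \prod_{X_i \in (0,1)} \frac{\left( \dfrac{\text{Area above line}}{\text{Volume above plane}} \right) \left( \dfrac{\text{Area above line segment}}{\text{Area above line}} \right)}{\left( \dfrac{\text{Volume above square}}{\text{Volume above plane}} \right)} \\
&= \prod_{X_i \in (0,1)} N(T_i \mid \mu_i, \sigma_i^2)\, \frac{S(B, \Sigma)}{R(B, \Sigma)}
\end{aligned}
$$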

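Deriving the Likelihood Function

where
$$E(T_i \mid X_i) \equiv \mu_i = B^b X_i + B^w (1 - X_i),$$
$$V(T_i \mid X_i) \equiv \sigma_i^2 = (\sigma_w^2) + (2\sigma_{bw} - 2\sigma_w^2) X_i + (\sigma_b^2 + \sigma_w^2 - 2\sigma_{bw}) X_i^2,$$
$$S(B, \Sigma) = \int_{\max\left(0,\, \frac{T_i - (1 - X_i)}{X_i}\right)}^{\min\left(1,\, \frac{T_i}{X_i}\right)} N\!\left( \beta^b \,\Big|\, B^b + \frac{\omega_i}{\sigma_i^2}\, \epsilon_i,\; \sigma_b^2 - \frac{\omega_i^2}{\sigma_i^2} \right) d\beta^b$$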
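The pieces above can be assembled into one precinct's likelihood contribution. The sketch below is hedged in two ways. First, the slide does not define $\omega_i$ and $\epsilon_i$; the code assumes $\epsilon_i = T_i - \mu_i$ and $\omega_i = \sigma_b^2 X_i + \sigma_{bw}(1 - X_i)$, which is what standard bivariate normal conditioning of $\beta_i^b$ on $T_i$ gives. Second, $R(B, \Sigma)$ is passed in as an argument rather than computed. Function names are illustrative, and the code assumes $0 < X_i < 1$ and $|\rho| < 1$.

```python
import math
from statistics import NormalDist


def moments(x, Bb, Bw, sb, sw, rho):
    """mu_i and sigma_i^2 of T_i given X_i on the untruncated scale."""
    sbw = rho * sb * sw
    mu = Bb * x + Bw * (1 - x)
    var = sw**2 + (2 * sbw - 2 * sw**2) * x + (sb**2 + sw**2 - 2 * sbw) * x**2
    return mu, var


def segment_share(t, x, Bb, Bw, sb, sw, rho):
    """S(B, Sigma): share of the conditional normal of beta_b given T_i that
    lies on the part of the tomography line inside the unit square."""
    sbw = rho * sb * sw
    mu, var = moments(x, Bb, Bw, sb, sw, rho)
    omega = sb**2 * x + sbw * (1 - x)  # assumed: Cov(beta_b, T_i | X_i)
    eps = t - mu                       # assumed: deviation of T_i from its mean
    cond = NormalDist(Bb + omega / var * eps, math.sqrt(sb**2 - omega**2 / var))
    lower = max(0.0, (t - (1 - x)) / x)
    upper = min(1.0, t / x)
    return cond.cdf(upper) - cond.cdf(lower)


def likelihood_term(t, x, psi, r):
    """N(T_i | mu_i, sigma_i^2) * S / R for one precinct; r is R(B, Sigma)."""
    mu, var = moments(x, *psi)
    return NormalDist(mu, math.sqrt(var)).pdf(t) * segment_share(t, x, *psi) / r


# Illustrative parameter vector (B^b, B^w, sigma_b, sigma_w, rho).
psi = (0.5, 0.5, 0.2, 0.2, -0.95)
print(moments(0.88, *psi))
```

The full log-likelihood would sum `math.log(likelihood_term(...))` over precincts with $X_i \in (0,1)$, using the same $R$ for every precinct since $R$ depends only on $\psi$.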

Deriving the Likelihood Function

6. A visual version of the likelihood:

[Figure: plot with $\beta_i^b$ on the horizontal axis and $\beta_i^w$ on the vertical axis, both from 0 to 1.]

The Truncated Bivariate Normal Distribution's Five Parameters Can be Estimated From Aggregate Data: Intuition

[Figure: six plots of $T_i$ against $X_i$, both axes from 0 to 1. Values at the top of each graph:
(a) 0.5 0.5 0.2 0.2 -0.95
(b) 0.5 0.5 0.2 0.2 0.95
(c) 0.7 0.3 0.2 0.2 -0.64
(d) 0.8 0.5 0.1 0.3 0.4
(e) 0.4 0.8 0.3 0.1 -0.48
(f) 0.2 0.7 0.2 0.2 0.57]

Data were randomly generated from the model with parameter values $B^b$, $B^w$, $\sigma_b$, $\sigma_w$, and $\rho$ at the top of each graph. The solid line is the expected value and the dashed lines are at plus and minus one standard deviation.
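Data like those in the figure can be generated by following the story of the model. The sketch below is an approximation under stated assumptions: $X_i$ is drawn uniformly on $[0, 1)$ (the slides do not say how $X_i$ was generated), the truncated bivariate normal is sampled by simple rejection, and the expected-value and standard-deviation lines use the untruncated formulas for $\mu_i$ and $\sigma_i$. Function names are illustrative.

```python
import math
import random


def draw_truncated_betas(rng, Bb, Bw, sb, sw, rho):
    """Draw (beta_b, beta_w) from the bivariate normal, rejecting any draw
    that falls outside the unit square."""
    while True:
        z1 = rng.gauss(0, 1)
        z2 = rng.gauss(0, 1)
        bb = Bb + sb * z1
        bw = Bw + sw * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        if 0 <= bb <= 1 and 0 <= bw <= 1:
            return bb, bw


def simulate_precincts(n, psi, seed=0):
    """Return n (X_i, T_i) pairs; T_i follows from the accounting identity."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()  # assumption: X_i uniform on [0, 1)
        bb, bw = draw_truncated_betas(rng, *psi)
        data.append((x, bb * x + bw * (1 - x)))
    return data


def mean_and_sd(x, psi):
    """Expected value and standard deviation of T_i given X_i (untruncated scale)."""
    Bb, Bw, sb, sw, rho = psi
    sbw = rho * sb * sw
    mu = Bb * x + Bw * (1 - x)
    var = sw**2 + (2 * sbw - 2 * sw**2) * x + (sb**2 + sw**2 - 2 * sbw) * x**2
    return mu, math.sqrt(var)


# Parameter values from panel (a) of the figure.
panel_a = (0.5, 0.5, 0.2, 0.2, -0.95)
sample = simulate_precincts(100, panel_a, seed=1)
print(sample[:3])
```

With $\rho$ near $-1$, as in panel (a), the two rates offset each other and the spread of $T_i$ is narrowest near $X_i = 0.5$; with $\rho$ near $+1$, as in panel (b), the spread stays wide across the whole range.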

Another view of how the data change with the model

[Figure: six tomography plots with $\beta_i^b$ on the horizontal axis and $\beta_i^w$ on the vertical axis, both from 0 to 1. Values at the top of each plot:
(a) 0.5 0.7 0.2 0.2 -0.8
(b) 0.5 0.7 0.2 0.2 0
(c) 0.5 0.7 0.2 0.2 0.8
(d) 0.1 0.1 0.2 0.3 -0.8
(e) 0.9 0.9 0.1 0.3 0
(f) -0.05 -0.05 0.2 0.3 0.8]

Observable Implications for Sample Parameter Values. The numbers at the top of each tomography plot are the parameter values for the distribution from which data were randomly generated: $B^b$, $B^w$, $\sigma_b$, $\sigma_w$, and $\rho$.

Calculating Quantities of Interest: A story of X-Rays and tomography machines; then how to do it

Rearranging the basic accounting identity gives $\beta_i^w$ as a linear function of $\beta_i^b$:
$$\beta_i^w = \left( \frac{T_i}{1 - X_i} \right) - \left( \frac{X_i}{1 - X_i} \right) \beta_i^b$$
Thus, knowing $T_i$ and $X_i$ in one precinct narrows the possible values of $\beta_i^b$, $\beta_i^w$ to one line cut across this figure:

[Figure: A Tomography Plot, with $\beta_i^b$ on the horizontal axis and $\beta_i^w$ on the vertical axis, both from 0 to 1.]
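Because both rates must lie in $[0, 1]$, only a segment of each tomography line is feasible. The sketch below computes the endpoints of that segment. The $\beta_i^b$ limits are the integration limits that appear in $S(B, \Sigma)$; the $\beta_i^w$ limits are written by symmetry and are an addition here, not something stated on the slides. Function names are illustrative.

```python
def beta_b_bounds(t, x):
    """Feasible range of beta_b on the tomography line (requires 0 < x < 1)."""
    lower = max(0.0, (t - (1 - x)) / x)
    upper = min(1.0, t / x)
    return lower, upper


def beta_w_bounds(t, x):
    """Feasible range of beta_w, obtained by swapping the roles of the groups."""
    lower = max(0.0, (t - x) / (1 - x))
    upper = min(1.0, t / (1 - x))
    return lower, upper


# Precinct 52 from the slides: T = .19, X = .88.
print(beta_b_bounds(0.19, 0.88))
print(beta_w_bounds(0.19, 0.88))
```

For precinct 52 the data alone confine $\beta_{52}^b$ to roughly $[0.08, 0.22]$, while $\beta_{52}^w$ can be anywhere in $[0, 1]$: a precinct dominated by one group is informative about that group and nearly silent about the other.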

Calculating Quantities of Interest: A story of X-Rays and tomography machines; then how to do it

[Figure: four panels labeled P 48, P 115, P 195, and P 238, each with a vertical axis from 0 to 25 and $\beta_i^b$ on the horizontal axis from 0 to 1.]

How to Calculate Quantities of Interest

1. Option 1. Simulate only (district-level) aggregate quantities.
   (a) Algorithm to take one draw of the district-level fraction of blacks who vote:
       i. Draw $\psi$ from its posterior or sampling density: an asymptotic normal with mean equal to the point estimates and variance equal to the inverse of the negative Hessian at the maximum.
       ii. Draw $\beta_i^b$ and $\beta_i^w$ from $\mathrm{TN}(\beta_i^b, \beta_i^w \mid B, \Sigma)$, given the simulated parameters, $\psi = \{B, \Sigma\}$.
       iii. Compute the weighted average of the simulated coefficients (weights based on precinct population):
            $$B^b = \sum_{i=1}^{p} \frac{N_i^{b+}}{N_+^{b+}}\, \beta_i^b$$
   (b) Problem: We only get knowledge of the district-wide aggregate, and it's not robust.
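Steps ii and iii of Option 1 can be sketched as follows. Step i is deliberately left out: the code assumes one simulated $\psi$ is already in hand, since drawing it requires the point estimates and Hessian from the fitted model. The precinct counts are hypothetical numbers standing in for $N_i^{b+}$, rejection sampling is just one simple way to draw from the truncated normal, and all names are illustrative.

```python
import math
import random


def draw_truncated_betas(rng, Bb, Bw, sb, sw, rho):
    """Step ii: one (beta_b, beta_w) draw from the truncated bivariate normal,
    by rejecting untruncated draws that fall outside the unit square."""
    while True:
        z1 = rng.gauss(0, 1)
        z2 = rng.gauss(0, 1)
        bb = Bb + sb * z1
        bw = Bw + sw * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        if 0 <= bb <= 1 and 0 <= bw <= 1:
            return bb, bw


def district_draw(psi, precinct_counts, rng):
    """Step iii: population-weighted average of simulated beta_b values."""
    total = sum(precinct_counts)
    aggregate = 0.0
    for n_i in precinct_counts:
        bb, _ = draw_truncated_betas(rng, *psi)
        aggregate += (n_i / total) * bb
    return aggregate


rng = random.Random(0)
psi_draw = (0.5, 0.5, 0.2, 0.2, 0.0)  # hypothetical simulated psi (step i assumed done)
counts = [120, 300, 80]               # hypothetical precinct counts used as weights
print(district_draw(psi_draw, counts, rng))
```

Note that nothing in this procedure uses the observed $T_i$ for any precinct, which is the weakness Option 2 addresses.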

How to Calculate Quantities of Interest

2. Option 2. Use the knowledge that simulations for observation $i$ must come from its tomography line:
   (a) By the story of the model, if we know $T_i$, we learn the entire tomography line (since $X_i$ is known ex ante).
   (b) So we will condition on $T_i$ to make a prediction from the tomography line.
   (c) We could apply the Option 1 algorithm and use rejection sampling (discard simulations of $\beta_i^b$, $\beta_i^w$ that are not on the tomography line), but this would take forever.
   (d) Alternative algorithm for drawing simulations of $\beta_i^b$ and $\beta_i^w$:
       i. Find the expression for $P(\beta_i^b \mid T_i, \psi)$ analytically, which is a particular truncated univariate normal (see King, 1997: Appendix C).
       ii. Draw $\psi$ from its posterior or sampling density (the same multivariate normal as always).
       iii. Insert the simulation into $P(\beta_i^b \mid T_i, \psi)$ and draw out one simulated $\beta_i^b$.
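Step iii of the alternative algorithm can be sketched with an inverse-CDF draw. The code assumes that $P(\beta_i^b \mid T_i, \psi)$ is the univariate normal appearing inside $S(B, \Sigma)$, truncated to that integral's limits, with $\epsilon_i = T_i - \mu_i$ and $\omega_i = \sigma_b^2 X_i + \sigma_{bw}(1 - X_i)$ taken from standard normal conditioning; the exact expression is in King (1997, Appendix C) and is not reproduced here. The $\psi$ value is hypothetical (step ii is assumed done), and $\beta_i^w$ is recovered from the accounting identity, which is an addition to the listed steps.

```python
import math
import random
from statistics import NormalDist


def draw_on_tomography_line(t, x, psi, rng):
    """Draw beta_b from a truncated univariate normal given T_i, then recover
    beta_w from the accounting identity. Requires 0 < x < 1 and |rho| < 1."""
    Bb, Bw, sb, sw, rho = psi
    sbw = rho * sb * sw
    mu = Bb * x + Bw * (1 - x)
    var = sw**2 + (2 * sbw - 2 * sw**2) * x + (sb**2 + sw**2 - 2 * sbw) * x**2
    omega = sb**2 * x + sbw * (1 - x)  # assumed covariance of beta_b and T_i
    cond = NormalDist(Bb + omega / var * (t - mu), math.sqrt(sb**2 - omega**2 / var))
    lower = max(0.0, (t - (1 - x)) / x)
    upper = min(1.0, t / x)
    # Inverse-CDF sampling restricted to the feasible segment.
    p_lo, p_hi = cond.cdf(lower), cond.cdf(upper)
    u = p_lo + rng.random() * (p_hi - p_lo)
    u = min(max(u, 1e-12), 1 - 1e-12)  # inv_cdf needs 0 < u < 1
    bb = min(max(cond.inv_cdf(u), lower), upper)
    bw = (t - bb * x) / (1 - x)
    return bb, bw


rng = random.Random(0)
psi_draw = (0.5, 0.5, 0.2, 0.2, 0.0)  # hypothetical simulated psi
print(draw_on_tomography_line(0.19, 0.88, psi_draw, rng))
```

Every draw lands on the precinct's tomography line by construction, so no simulations are discarded, in contrast to the rejection approach in item (c).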