Advanced Quantitative Research Methodology Lecture Notes: Ecological Inference
Gary King
January 28, 2012
© Copyright 2008 Gary King, All Rights Reserved.

Reading

Gary King. A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data. Princeton University Press, 1997.

Preliminaries

Definition: Ecological inference is the process of using aggregate (i.e., "ecological") data to infer discrete individual-level relationships of interest when individual-level data are not available.

History of the Problem:
1. Ogburn and Goltra (1919), in the very first multivariate statistical analysis of politics in a political science journal, made ecological inferences and recognized the problem. The big issue in 1919: are the newly enfranchised women going to take over the political system? They regressed votes in referenda in Oregon precincts on the percent of women in each precinct. But they worried:
   "It is also theoretically possible to gerrymander the precincts in such a way that there may be a negative correlative even though men and women each distribute their votes 50 to 50 on a given measure..." (Ogburn and Goltra, 1919).

Preliminaries

2. Robinson (1950) clarified the problem, causing:
   (a) several literatures to wither, including studies of local and regional politics through aggregate electoral statistics, in favor of survey research based on national samples.
   (b) the development of a methodological literature devoted to solving the problem.
3. Hundreds of other articles have helped us understand the problem.

History of Solutions: A 45-year war between supporters of
1. Duncan and Davis (1953): a deterministic solution.
2. Goodman (1953, 1959): a statistical solution.
3. For 50 years, no other methods were used in applications.

If you can avoid making ecological inferences, do so!

Some of those who aren't so lucky:
1. Public policy: Applying the Voting Rights Act.
2. History: Who voted for the Nazis?
3. Marketing: What types of people buy your products?
4. Banking: Are banks complying with red-lining laws? Are there areas with certain types of people who might take out loans but have not?
5. Candidates for office: How do good representatives decide what policies they should favor? How can candidates tailor campaign appeals and target voter groups?
6. Sociology: Do the unemployed commit more crimes, or is it just that there are more crimes in unemployed areas?
7. Economics: With some exceptions, most theories are based on assumptions about individuals, but most data are on groups.

If you can avoid making ecological inferences, do so!

Some of those who aren't so lucky:
8. Education: Do students who attend private schools through a voucher system do as well as students who can afford to attend on their own?
9. Atmospheric physics: How can we tell which types of the vehicles actually on the roads emit more carbon dioxide and carbon monoxide?
10. Oceanography: How many marine organisms of a certain type were collected at a given depth, from fishing nets dropped from the surface down through a variety of depths?
11. Epidemiology: Does radon cause lung cancer?
12. Changes in public opinion: How can repeated independent cross-sectional surveys be used to measure individual change?

The Problem: The District Level

Race of Voting               Voting Decision
Age Person        Democrat   Republican   No vote    Total
black                ?           ?           ?       55,054
white                ?           ?           ?       25,706
Total             19,896      10,936      49,928     80,760

The Ecological Inference Problem at the District Level: The 1990 Election to the Ohio State House, District 42. The goal is to infer from the marginal entries (each of which is the sum of the corresponding row or column) to the cell entries. (Note the information in the bounds.)

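The note about bounds can be made concrete with a small sketch. It uses only the marginal totals in the District 42 table; the helper name `cell_bounds` is hypothetical, and the rule it applies (a cell cannot exceed its row or column total, and cannot be smaller than what is left of its column once the other row is exhausted) is the standard deterministic bounds argument for a table with two rows, not code from the lecture.

```python
# Marginal totals from the District 42 table.
black_vap = 55_054   # row total: black voting-age persons
white_vap = 25_706   # row total: white voting-age persons
dem_votes = 19_896   # column total: Democratic votes
no_votes = 49_928    # column total: persons who did not vote


def cell_bounds(row_total, other_row_total, col_total):
    """Deterministic (lower, upper) count for one cell of a two-row table."""
    lower = max(0, col_total - other_row_total)  # other row cannot absorb more than its total
    upper = min(row_total, col_total)            # cell cannot exceed either margin
    return lower, upper


black_dem_bounds = cell_bounds(black_vap, white_vap, dem_votes)
black_novote_bounds = cell_bounds(black_vap, white_vap, no_votes)

print(black_dem_bounds)     # (0, 19896)
print(black_novote_bounds)  # (24222, 49928)
```

In this district the margins say little about black votes for the Democrat, but they do guarantee that at least 24,222 black voting-age persons did not vote.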
The Problem: The Precinct Level

Race of Voting               Voting Decision
Age Person        Democrat   Republican   No vote    Total
black                ?           ?           ?        221
white                ?           ?           ?

The Ecological Inference Problem at the Precinct Level: Precinct P in District 42 (1 of 131 in the district). The goal is to infer from the margins of a set of tables like this one to the cell entries in each.

The best we could do, circa 1996

[Table, continued over two slides. Columns: Year; District; Estimated Percent of Blacks Voting for the Democratic Candidate.]

Sample Ecological Inferences: All Ohio State House districts where an African American Democrat ran against a white Republican. (Source: Statement of Gordon G. Henderson, presented as an exhibit in federal court, using Goodman's regression.) Figures above 100% are logically impossible.

What Information Does The New Method Provide?

Goodman's Method: One incorrect number (5 standard deviations outside the deterministic bounds).

The New Method:
[Figure: Non-minority Turnout in New Jersey Cities and Towns.] In contrast to the best existing methods, which provide one (incorrect) number for the entire state, the method offered here gives an accurate estimate of white turnout for all 567 minor civil divisions in the state, a few of which are labeled.

Notation

            Vote            No vote
black       $\beta_i^b$     $1 - \beta_i^b$     $X_i$
white       $\beta_i^w$     $1 - \beta_i^w$     $1 - X_i$
            $T_i$           $1 - T_i$

Notation for Precinct $i$ ($i = 1, \ldots, p$).

Observed variables:
$T_i$ = voter turnout in precinct $i$
$X_i$ = black proportion of the voting-age population in precinct $i$

Unobserved quantities of interest:
$\beta_i^b$ = fraction of blacks who vote in precinct $i$
$\beta_i^w$ = fraction of whites who vote in precinct $i$

Notation

An accounting identity (a fact, not an assumption):

$T_i = \beta_i^b X_i + \beta_i^w (1 - X_i) = \beta_i^w + (\beta_i^b - \beta_i^w) X_i$

Goodman's regression: Run a regression of $T_i$ on $X_i$ and $(1 - X_i)$ (no constant term). The coefficients are intended to be:
$B^b$, district-wide black turnout
$B^w$, district-wide white turnout

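Goodman's regression as described above can be sketched on simulated precincts. Everything numerical here is an assumption made for illustration (the number of precincts, the turnout rates of 0.40 and 0.60, the spread), and the rates are drawn independently of $X_i$, which is the favorable case for the method.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 500                                    # number of simulated precincts (assumption)
X = rng.uniform(0.05, 0.95, size=p)        # black fraction of voting-age population

# Hypothetical precinct-level turnout rates, drawn independently of X.
beta_b = np.clip(rng.normal(0.40, 0.05, size=p), 0, 1)
beta_w = np.clip(rng.normal(0.60, 0.05, size=p), 0, 1)

# Accounting identity: observed turnout is fully determined by the rates and X.
T = beta_b * X + beta_w * (1 - X)

# Goodman's regression: T on X and (1 - X), with no constant term.
design = np.column_stack([X, 1 - X])
coef, *_ = np.linalg.lstsq(design, T, rcond=None)
B_b_hat, B_w_hat = coef

print(f"estimated B^b = {B_b_hat:.3f}, estimated B^w = {B_w_hat:.3f}")
```

With rates that do not vary systematically with $X_i$, the two coefficients land close to the simulated average rates; the problems listed in the following slides arise when that condition fails.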
Selected Problems with Goodman's Approach

If we follow Goodman's advice, we won't apply the model.
If we don't follow Goodman's advice and apply it anyway:
1. We know the parameters are not constant.

[Figure: scatterplot of $T_i$ against $X_i$. Precincts in Marion County, Indiana: Voter Turnout for the U.S. Senate by Fraction Black.]

Selected Problems with Goodman's Approach

The accounting identity, $T_i = \beta_i^b X_i + \beta_i^w (1 - X_i)$, contains no error other than that due to parameter variation. Thus, all scatter around the regression line is due to parameter variation.
2. Goodman's model does not take into account information from the method of bounds or from massive heteroskedasticity in aggregate data. See the graph.
3. Goodman's regression is biased in the presence of aggregation bias: $C(\beta_i^b, X_i) \neq 0$ or $C(\beta_i^w, X_i) \neq 0$. (True in any regression, even if not ecological.)

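A small simulation can illustrate problem 3. The data-generating choices below are assumptions made only for this sketch: black turnout rises with $X_i$ (so $C(\beta_i^b, X_i) > 0$), white turnout does not depend on $X_i$, and all precincts are taken to have the same population so that district-wide rates are simple $X_i$-weighted averages.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 2000                                   # number of simulated precincts (assumption)
X = rng.uniform(0.05, 0.95, size=p)

# Aggregation bias by construction: black turnout increases with X.
beta_b = 0.20 + 0.40 * X + rng.normal(0, 0.03, size=p)
beta_w = 0.60 + rng.normal(0, 0.03, size=p)
T = beta_b * X + beta_w * (1 - X)          # accounting identity

# True district-wide rates, assuming equal-sized precincts.
true_black = np.sum(beta_b * X) / np.sum(X)
true_white = np.sum(beta_w * (1 - X)) / np.sum(1 - X)

# Goodman's regression (no constant term).
design = np.column_stack([X, 1 - X])
(B_b_hat, B_w_hat), *_ = np.linalg.lstsq(design, T, rcond=None)

print(f"black turnout: true {true_black:.3f}, Goodman {B_b_hat:.3f}")
print(f"white turnout: true {true_white:.3f}, Goodman {B_w_hat:.3f}")
```

In this setup the regression overstates black turnout and understates white turnout, even though the accounting identity holds exactly in every precinct.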
Selected Problems with Goodman's Approach

4. We cannot correct for aggregation bias within Goodman's framework.
   (a) The good idea that doesn't work: since the coefficients vary with $X_i$, let's model that explicitly, hence using $X_i$ to control for the covariation.
   (b) More specifically, even if $C(\beta_i^b, X_i) \neq 0$, if we control for $Z_i$ it might be true that $C(\beta_i^b, X_i \mid Z_i) = 0$. And if $Z_i = X_i$, it's true for sure.
   (c) Take Goodman's regression: $E(T_i) = B^b X_i + B^w (1 - X_i)$
   (d) Let $B^b = \gamma_0 + \gamma_1 X_i$ and $B^w = \theta_0 + \theta_1 X_i$ and substitute:
       $E(T_i) = (\gamma_0 + \gamma_1 X_i) X_i + (\theta_0 + \theta_1 X_i)(1 - X_i)$
       $= \theta_0 + (\gamma_0 + \theta_1 - \theta_0) X_i + (\gamma_1 - \theta_1) X_i^2$
   (e) The model is not identified: four parameters need to be estimated ($\gamma_0$, $\gamma_1$, $\theta_0$, and $\theta_1$), but only 3 can be estimated ($\theta_0$ and the coefficients in parentheses on $X_i$ and $X_i^2$).
5. If the number of people differs across precincts, Goodman's model is not estimating the correct quantity of interest.

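The non-identification in step (e) can be checked numerically. The sketch below uses two made-up parameter vectors (the specific values and the shift `c` are assumptions for illustration) that differ in $\gamma_0$, $\gamma_1$, and $\theta_1$ yet imply the same three estimable coefficients, and therefore the same $E(T_i)$ at every value of $X_i$.

```python
import numpy as np


def expected_turnout(X, gamma0, gamma1, theta0, theta1):
    """E(T_i) after substituting B^b = gamma0 + gamma1*X and B^w = theta0 + theta1*X."""
    B_b = gamma0 + gamma1 * X
    B_w = theta0 + theta1 * X
    return B_b * X + B_w * (1 - X)


def reduced_form(gamma0, gamma1, theta0, theta1):
    """The three coefficients (constant, X, X^2) that the data can pin down."""
    return (theta0, gamma0 + theta1 - theta0, gamma1 - theta1)


X = np.linspace(0, 1, 11)
c = 0.15                                         # arbitrary shift (assumption)
params_a = (0.30, 0.20, 0.60, -0.10)             # (gamma0, gamma1, theta0, theta1)
params_b = (0.30 - c, 0.20 + c, 0.60, -0.10 + c)

same_fit = np.allclose(expected_turnout(X, *params_a), expected_turnout(X, *params_b))
same_coefs = np.allclose(reduced_form(*params_a), reduced_form(*params_b))
print(same_fit, same_coefs)  # True True
```

Any value of `c` gives another parameter vector with an identical fit, which is why no amount of aggregate data can separate the four parameters.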
The Data

[Figure: $T_i$ plotted against $X_i$. A Scattercross Graph of Voter Turnout by Fraction Hispanic.]

Solve the accounting identity:

$T_i = \beta_i^w + (\beta_i^b - \beta_i^w) X_i$

for the unknowns:

$\beta_i^w = \left( \frac{T_i}{1 - X_i} \right) - \left( \frac{X_i}{1 - X_i} \right) \beta_i^b$

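The Data: Continued

Precinct 52: $T_{52} = .19$, $X_{52} = .88$

$\beta_{52}^w = \frac{T_{52}}{1 - X_{52}} - \frac{X_{52}}{1 - X_{52}} \beta_{52}^b$
$= \frac{.19}{1 - .88} - \frac{.88}{1 - .88} \beta_{52}^b$
$= 1.58 - 7.33 \, \beta_{52}^b$

[Figure: the resulting line plotted with $\beta_i^b$ on the horizontal axis and $\beta_i^w$ on the vertical axis.]
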
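The Precinct 52 calculation can be reproduced in a few lines. The function names are hypothetical, and the bounds formulas are the standard deterministic bounds implied by requiring both rates to lie in $[0, 1]$; they are added here as an illustration and assume $0 < X_i < 1$.

```python
def white_rate_given_black(beta_b, T, X):
    """Solve the accounting identity for beta^w at a given beta^b."""
    return T / (1 - X) - (X / (1 - X)) * beta_b


def deterministic_bounds(T, X):
    """Ranges of beta^b and beta^w consistent with T, X and rates in [0, 1]."""
    black = (max(0.0, (T - (1 - X)) / X), min(1.0, T / X))
    white = (max(0.0, (T - X) / (1 - X)), min(1.0, T / (1 - X)))
    return black, white


T52, X52 = 0.19, 0.88
intercept = T52 / (1 - X52)
slope = -X52 / (1 - X52)
black_bounds, white_bounds = deterministic_bounds(T52, X52)

print(f"beta_w = {intercept:.2f} + ({slope:.2f}) * beta_b")  # 1.58 and -7.33
print(black_bounds)   # roughly (0.080, 0.216)
print(white_bounds)   # (0.0, 1.0)
```

Because the precinct is 88% black, the line is steep: black turnout is pinned to a narrow interval while white turnout could be anywhere from 0 to 1.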
The Model for Data Without Aggregation Bias, But Robust in its Presence

The Goal: Knowledge of $\beta_i^b$ and $\beta_i^w$ in each precinct.

Begin with the basic accounting identity (not an assumption of linearity):

$T_i = \beta_i^b X_i + \beta_i^w (1 - X_i)$

Add three assumptions (in the basic version of the model):
1. $\beta_i^b$ and $\beta_i^w$ are truncated bivariate normal.
   [Figure: three example densities, panels (a), (b), and (c), each plotted over $\beta_i^b$ and $\beta_i^w$.]
   (The 5 parameters of this density need to be estimated by forming the likelihood.)

The Model for Data Without Aggregation Bias, But Robust in its Presence

2. No aggregation bias (a priori): $\beta_i^b$ and $\beta_i^w$ are mean independent of $X_i$. This allows a posteriori aggregation bias (i.e., after conditioning on $T_i$).
3. No spatial autocorrelation: $T_i \mid X_i$ are independent over observations.

96 Deriving the Likelihood Function 1. The story of the model is that we learn things in order (a) (As in regression), everything is conditional on X i, which means we learn it first. (b) Then the world draws β b i and β w i from a truncated normal, but we don t get to see them. (c) Finally, we learn T i, which is computed via the accounting identity deterministically: T i = β b i X i + β w i (1 X i ). 2. The random variable is then T (given X ), which is truncated bivarate normal 3. The five parameters of the truncated bivariate normal need to be estimated: ψ = { B b, B w, σ b, σ w, ρ} = { B, Σ} These are on the untruncated scale (and not quantities of interest) since: TN(β b i, β w i B, Σ) = N(β b i, β w i B, Σ) 1(βb i, βw i ) R( B, Σ) Gary King () Ecological Inference 21 / 38
97 Deriving the Likelihood Function 1. The story of the model is that we learn things in order (a) (As in regression), everything is conditional on X i, which means we learn it first. (b) Then the world draws β b i and β w i from a truncated normal, but we don t get to see them. (c) Finally, we learn T i, which is computed via the accounting identity deterministically: T i = β b i X i + β w i (1 X i ). 2. The random variable is then T (given X ), which is truncated bivarate normal 3. The five parameters of the truncated bivariate normal need to be estimated: ψ = { B b, B w, σ b, σ w, ρ} = { B, Σ} These are on the untruncated scale (and not quantities of interest) since: where TN(β b i, β w i B, Σ) = N(β b i, β w i B, Σ) 1(βb i, βw i ) R( B, Σ) Gary King () Ecological Inference 21 / 38
Deriving the Likelihood Function
1. The story of the model is that we learn things in order:
(a) (As in regression), everything is conditional on $X_i$, which means we learn it first.
(b) Then the world draws $\beta_i^b$ and $\beta_i^w$ from a truncated normal, but we don't get to see them.
(c) Finally, we learn $T_i$, which is computed via the accounting identity deterministically: $T_i = \beta_i^b X_i + \beta_i^w (1 - X_i)$.
2. The random variable is then $T$ (given $X$), whose distribution follows from the truncated bivariate normal for $(\beta_i^b, \beta_i^w)$.
3. The five parameters of the truncated bivariate normal need to be estimated:
\[ \breve{\psi} = \{\breve{B}^b, \breve{B}^w, \breve{\sigma}_b, \breve{\sigma}_w, \breve{\rho}\} = \{\breve{B}, \breve{\Sigma}\} \]
These are on the untruncated scale (and not quantities of interest) since:
\[ TN(\beta_i^b, \beta_i^w \mid \breve{B}, \breve{\Sigma}) = N(\beta_i^b, \beta_i^w \mid \breve{B}, \breve{\Sigma}) \, \frac{1(\beta_i^b, \beta_i^w)}{R(\breve{B}, \breve{\Sigma})} \]
where
\[ R(\breve{B}, \breve{\Sigma}) = \int_0^1 \int_0^1 N(\beta^b, \beta^w \mid \breve{B}, \breve{\Sigma}) \, d\beta^b \, d\beta^w \quad \text{(volume above unit square)} \]
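The three-step story above can be sketched as a small simulation. This is only an illustration under stated assumptions: the parameter values, the number of precincts `p`, the uniform distribution used for $X_i$, and the helper name `draw_truncated_bvn` are all hypothetical, and rejection sampling is simply one convenient way to draw from a normal truncated to the unit square. The acceptance rate doubles as a Monte Carlo estimate of $R$, the untruncated normal's volume above the unit square.

```python
import numpy as np

def draw_truncated_bvn(mean, cov, size, rng):
    """Rejection-sample `size` points from a bivariate normal truncated to the unit square.

    Also returns the acceptance rate, a Monte Carlo estimate of R
    (the untruncated normal's volume above the unit square).
    """
    kept, n_proposed, n_kept = [], 0, 0
    while n_kept < size:
        proposal = rng.multivariate_normal(mean, cov, size=size)
        inside = np.all((proposal >= 0.0) & (proposal <= 1.0), axis=1)
        kept.append(proposal[inside])
        n_proposed += size
        n_kept += int(inside.sum())
    return np.concatenate(kept)[:size], n_kept / n_proposed

rng = np.random.default_rng(0)

# Hypothetical parameter values on the untruncated scale.
B = np.array([0.5, 0.7])              # (B^b, B^w)
sd_b, sd_w, rho = 0.20, 0.15, 0.30
Sigma = np.array([[sd_b**2, rho * sd_b * sd_w],
                  [rho * sd_b * sd_w, sd_w**2]])

p = 1000
X = rng.uniform(0.05, 0.95, size=p)                  # (a) X_i is known first
beta, R_hat = draw_truncated_bvn(B, Sigma, p, rng)   # (b) unobserved beta_i^b, beta_i^w
T = beta[:, 0] * X + beta[:, 1] * (1.0 - X)          # (c) accounting identity

print(f"Monte Carlo estimate of R: {R_hat:.3f}")
```

Only `X` and `T` would be available to the analyst; `beta` is kept here purely so the simulated truth can be inspected.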
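Deriving the Likelihood Function
4. (From simulations of these parameters, we will compute quantities of interest: $\beta_i^b$, $\beta_i^w$. Details shortly.)
5. The likelihood:
\begin{align*}
L(\breve{\psi} \mid T) &\propto \prod_{X_i \in (0,1)} P(T_i \mid \breve{\psi}) \\
&= \prod_{X_i \in (0,1)} \left( \frac{\text{What we observe}}{\text{What we could have observed}} \right) \\
&= \prod_{X_i \in (0,1)} \left( \frac{\text{Area above line segment}}{\text{Volume above square}} \right) \\
&= \prod_{X_i \in (0,1)} \left( \frac{\text{Area above line}}{\text{Volume above plane}} \right) \left( \frac{\text{Area above line segment} \,/\, \text{Area above line}}{\text{Volume above square} \,/\, \text{Volume above plane}} \right) \\
&= \prod_{X_i \in (0,1)} N(T_i \mid \mu_i, \sigma_i^2) \, \frac{S(\breve{B}, \breve{\Sigma})}{R(\breve{B}, \breve{\Sigma})}
\end{align*}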
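Deriving the Likelihood Function
where
\[ E(T_i \mid X_i) \equiv \mu_i = \breve{B}^b X_i + \breve{B}^w (1 - X_i), \]
\[ V(T_i \mid X_i) \equiv \sigma_i^2 = (\breve{\sigma}_w^2) + (2\breve{\sigma}_{bw} - 2\breve{\sigma}_w^2) X_i + (\breve{\sigma}_b^2 + \breve{\sigma}_w^2 - 2\breve{\sigma}_{bw}) X_i^2, \]
\[ S(\breve{B}, \breve{\Sigma}) = \int_{\max\left(0, \frac{T_i - (1 - X_i)}{X_i}\right)}^{\min\left(1, \frac{T_i}{X_i}\right)} N\!\left( \beta^b \,\Big|\, \breve{B}^b + \frac{\omega_i}{\sigma_i^2} \epsilon_i, \; \breve{\sigma}_b^2 - \frac{\omega_i^2}{\sigma_i^2} \right) d\beta^b \]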
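A minimal sketch of how the pieces of this likelihood might be evaluated numerically, under assumptions that the slide does not spell out. The slide uses $\omega_i$ and $\epsilon_i$ without defining them; the code assumes $\epsilon_i = T_i - \mu_i$ and $\omega_i = \sigma_b^2 X_i + \sigma_{bw}(1 - X_i)$, the untruncated covariance of $\beta_i^b$ with $T_i$ that the usual conditional-normal formulas require. Function names, the parameter ordering in `psi`, and the data values are hypothetical. The volume above the unit square is obtained from the bivariate normal CDF by inclusion-exclusion.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def moments(psi, X):
    """Untruncated-scale mean and variance of T_i given X_i, plus omega_i (assumed Cov(beta_i^b, T_i))."""
    Bb, Bw, sb, sw, rho = psi
    sbw = rho * sb * sw
    mu = Bb * X + Bw * (1.0 - X)
    var = sw**2 + (2 * sbw - 2 * sw**2) * X + (sb**2 + sw**2 - 2 * sbw) * X**2
    omega = sb**2 * X + sbw * (1.0 - X)
    return mu, var, omega

def volume_above_unit_square(psi):
    """R: bivariate normal probability of the unit square, by inclusion-exclusion on the CDF."""
    Bb, Bw, sb, sw, rho = psi
    sbw = rho * sb * sw
    dist = multivariate_normal(mean=[Bb, Bw], cov=[[sb**2, sbw], [sbw, sw**2]])
    return dist.cdf([1, 1]) - dist.cdf([1, 0]) - dist.cdf([0, 1]) + dist.cdf([0, 0])

def ei_log_likelihood(psi, X, T):
    """Sum over precincts of log N(T_i | mu_i, sigma_i^2) + log S_i - log R."""
    Bb, _, sb, _, _ = psi
    mu, var, omega = moments(psi, X)
    eps = T - mu
    cond_mean = Bb + omega / var * eps            # mean of beta^b given T_i
    cond_sd = np.sqrt(sb**2 - omega**2 / var)     # sd of beta^b given T_i
    lower = np.maximum(0.0, (T - (1.0 - X)) / X)  # integration limits of S
    upper = np.minimum(1.0, T / X)
    S = norm.cdf(upper, cond_mean, cond_sd) - norm.cdf(lower, cond_mean, cond_sd)
    R = volume_above_unit_square(psi)
    return np.sum(norm.logpdf(T, mu, np.sqrt(var)) + np.log(S) - np.log(R))

# Hypothetical parameters (B^b, B^w, sigma_b, sigma_w, rho) and three precincts.
psi = (0.5, 0.7, 0.20, 0.15, 0.30)
X = np.array([0.2, 0.5, 0.8])
T = np.array([0.6, 0.55, 0.5])
loglik = ei_log_likelihood(psi, X, T)
print(f"log-likelihood: {loglik:.4f}")
```

In practice a function like this would be handed to a numerical optimizer. A real implementation would probably also reparameterize the standard deviations and the correlation so the search cannot leave the valid parameter space; that step is not shown here.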
Deriving the Likelihood Function
6. A visual version of the likelihood:
[Figure: plot with $\beta_i^b$ on the horizontal axis and $\beta_i^w$ on the vertical axis.]
The Truncated Bivariate Normal Distribution's Five Parameters Can be Estimated From Aggregate Data: Intuition
[Figure: six panels, (a) through (f), each plotting $T_i$ against $X_i$.]
Data were randomly generated from the model with parameter values $\breve{B}^b$, $\breve{B}^w$, $\breve{\sigma}_b$, $\breve{\sigma}_w$, and $\breve{\rho}$, at the top of each graph. The solid line is the expected value and dashed lines are at plus and minus one standard deviation.
Another view of how the data change with the model
[Figure: six tomography plots, panels (a) through (f), each with $\beta_i^b$ on the horizontal axis and $\beta_i^w$ on the vertical axis.]
Observable Implications for Sample Parameter Values. The numbers at the top of each tomography plot are the parameter values for the distribution from which data were randomly generated: $\breve{B}^b$, $\breve{B}^w$, $\breve{\sigma}_b$, $\breve{\sigma}_w$, and $\breve{\rho}$.
Calculating Quantities of Interest: A story of X-Rays and tomography machines; then how to do it
Rearranging the basic accounting identity gives $\beta_i^w$ as a linear function of $\beta_i^b$:
\[ \beta_i^w = \left( \frac{T_i}{1 - X_i} \right) - \left( \frac{X_i}{1 - X_i} \right) \beta_i^b \]
Thus, knowing $T_i$ and $X_i$ in one precinct narrows the possible values of $\beta_i^b$, $\beta_i^w$ to one line cut across this figure:
[Figure: A Tomography Plot, with $\beta_i^b$ on the horizontal axis and $\beta_i^w$ on the vertical axis.]
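As a small worked illustration, the tomography line and the range of $\beta_i^b$ it permits can be computed directly from one precinct's $T_i$ and $X_i$. The function names and the example precinct are hypothetical. The bounds follow from requiring both $\beta_i^b$ and $\beta_i^w$ to lie in $[0, 1]$ on the line.

```python
def tomography_line(T, X):
    """Intercept and slope of beta_w = intercept + slope * beta_b for one precinct."""
    if not 0.0 < X < 1.0:
        raise ValueError("X must lie strictly between 0 and 1")
    return T / (1.0 - X), -X / (1.0 - X)

def beta_b_bounds(T, X):
    """Range of beta_b for which both beta_b and beta_w stay inside [0, 1]."""
    lower = max(0.0, (T - (1.0 - X)) / X)
    upper = min(1.0, T / X)
    return lower, upper

T_i, X_i = 0.5, 0.8                       # hypothetical precinct
intercept, slope = tomography_line(T_i, X_i)
lo, hi = beta_b_bounds(T_i, X_i)
print(f"beta_w = {intercept:.3f} + ({slope:.3f}) * beta_b, with beta_b in [{lo:.3f}, {hi:.3f}]")
```

For this hypothetical precinct the line is approximately $\beta_i^w = 2.5 - 4\beta_i^b$, and $\beta_i^b$ is confined to roughly $[0.375, 0.625]$. At the two endpoints $\beta_i^w$ equals 1 and 0 respectively.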
Calculating Quantities of Interest: A story of X-Rays and tomography machines; then how to do it
[Figure with $\beta_i^b$ on the horizontal axis.]
How to Calculate Quantities of Interest
1. Option 1. Simulate only (district level) aggregate quantities
(a) Algorithm to take one draw of the district-level fraction of blacks who vote:
i. Draw $\tilde{\psi}$ from its posterior or sampling density: an asymptotic normal with mean equal to the point estimates and variance equal to the inverse of the negative Hessian at the maximum.
ii. Draw $\tilde{\beta}_i^b$ and $\tilde{\beta}_i^w$ from $TN(\beta_i^b, \beta_i^w \mid \tilde{B}, \tilde{\Sigma})$, given the simulated parameters, $\tilde{\psi} = \{\tilde{B}, \tilde{\Sigma}\}$.
iii. Compute the weighted average of the simulated coefficients (weights based on precinct population):
\[ \tilde{B}^b = \sum_{i=1}^{p} \frac{N_i^{b+} \tilde{\beta}_i^b}{N_+^{b+}} \]
(b) Problem: We only get knowledge of the district-wide aggregate, and it's not robust.
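The Option 1 algorithm might be sketched as follows. Everything numerical here is hypothetical: the point estimates `phi_hat`, their covariance `vcov` (standing in for the inverse of the negative Hessian), and the precinct weights `N_b`. As a further assumption of this sketch, the parameters are drawn on a transformed scale (log standard deviations and the inverse hyperbolic tangent of the correlation) so that every draw is a valid covariance matrix; the slide does not state which scale is used.

```python
import numpy as np

def unpack(phi):
    """Map (B^b, B^w, log sigma_b, log sigma_w, arctanh rho) to a mean vector and covariance matrix."""
    Bb, Bw, log_sb, log_sw, z = phi
    sb, sw, rho = np.exp(log_sb), np.exp(log_sw), np.tanh(z)
    mean = np.array([Bb, Bw])
    cov = np.array([[sb**2, rho * sb * sw],
                    [rho * sb * sw, sw**2]])
    return mean, cov

def draw_tn(mean, cov, n, rng):
    """Rejection-sample n points from a bivariate normal truncated to the unit square."""
    out = np.empty((n, 2))
    filled = 0
    while filled < n:
        prop = rng.multivariate_normal(mean, cov, size=n)
        keep = prop[np.all((prop >= 0.0) & (prop <= 1.0), axis=1)]
        take = min(len(keep), n - filled)
        out[filled:filled + take] = keep[:take]
        filled += take
    return out

def simulate_district_Bb(phi_hat, vcov, N_b, n_sims, rng):
    """Repeat steps i-iii to get n_sims draws of the district-level aggregate."""
    sims = np.empty(n_sims)
    for s in range(n_sims):
        phi = rng.multivariate_normal(phi_hat, vcov)         # step i: draw the parameters
        mean, cov = unpack(phi)
        beta = draw_tn(mean, cov, len(N_b), rng)             # step ii: draw precinct betas
        sims[s] = np.sum(N_b * beta[:, 0]) / np.sum(N_b)     # step iii: weighted average
    return sims

rng = np.random.default_rng(1)
phi_hat = np.array([0.5, 0.7, np.log(0.20), np.log(0.15), np.arctanh(0.30)])  # hypothetical estimates
vcov = np.diag(np.full(5, 0.05**2))                                           # hypothetical covariance
N_b = rng.integers(50, 500, size=40).astype(float)                            # hypothetical precinct weights

district_sims = simulate_district_Bb(phi_hat, vcov, N_b, n_sims=200, rng=rng)
print(f"mean {district_sims.mean():.3f}, sd {district_sims.std():.3f}")
```

Note that the precinct draws in step ii never use $T_i$, which is one way to see why this option yields only a district-wide summary.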
How to Calculate Quantities of Interest
2. Option 2. Use the knowledge that simulations for observation $i$ must come from its tomography line:
(a) By the story of the model, if we know $T_i$, we learn the entire tomography line (since $X_i$ is known ex ante).
(b) So we will condition on $T_i$ to make a prediction from the tomography line.
(c) We could apply the Option 1 algorithm and use rejection sampling (discard simulations of $\beta_i^b$, $\beta_i^w$ that are not on the tomography line), but this would take forever.
(d) Alternative algorithm for drawing simulations of $\beta_i^b$ and $\beta_i^w$:
i. Find the expression for $P(\beta_i^b \mid T_i, \breve{\psi})$ analytically, which is a particular truncated univariate normal (see King, 1997: Appendix C).
ii. Draw $\tilde{\psi}$ from its posterior or sampling density (the same multivariate normal as always).
iii. Insert the simulation into $P(\beta_i^b \mid T_i, \tilde{\psi})$ and draw out one simulated $\beta_i^b$.
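A hedged sketch of the Option 2 draw. It assumes that the truncated univariate normal in step i is the same normal that appears inside the integral defining $S$ in the likelihood, with the same limits: mean $B^b + (\omega_i / \sigma_i^2)\,\epsilon_i$, variance $\sigma_b^2 - \omega_i^2 / \sigma_i^2$, truncated to the range of $\beta_i^b$ allowed by the tomography line, where $\epsilon_i = T_i - \mu_i$ and $\omega_i = \sigma_b^2 X_i + \sigma_{bw}(1 - X_i)$ are assumed definitions. The exact expression is in the appendix cited above and is not reproduced on the slide. The data, point estimates, covariance, and the transformed parameter scale are all hypothetical. Once $\beta_i^b$ is drawn, $\beta_i^w$ is read off the tomography line.

```python
import numpy as np
from scipy.stats import truncnorm

def to_natural(phi):
    """Map (B^b, B^w, log sigma_b, log sigma_w, arctanh rho) back to (B^b, B^w, sigma_b, sigma_w, rho)."""
    Bb, Bw, log_sb, log_sw, z = phi
    return Bb, Bw, np.exp(log_sb), np.exp(log_sw), np.tanh(z)

def draw_beta_on_line(psi, X, T, rng):
    """Draw one (beta_b, beta_w) per precinct, conditional on T_i, so each draw lies on its tomography line."""
    Bb, Bw, sb, sw, rho = psi
    sbw = rho * sb * sw
    mu = Bb * X + Bw * (1.0 - X)
    var = sw**2 + (2 * sbw - 2 * sw**2) * X + (sb**2 + sw**2 - 2 * sbw) * X**2
    omega = sb**2 * X + sbw * (1.0 - X)              # assumed Cov(beta_b, T_i)
    loc = Bb + omega / var * (T - mu)                # conditional mean of beta_b
    scale = np.sqrt(sb**2 - omega**2 / var)          # conditional sd of beta_b
    lower = np.maximum(0.0, (T - (1.0 - X)) / X)     # bounds implied by the line
    upper = np.minimum(1.0, T / X)
    a, b = (lower - loc) / scale, (upper - loc) / scale   # truncnorm expects standardized bounds
    beta_b = truncnorm.rvs(a, b, loc=loc, scale=scale, size=X.shape, random_state=rng)
    beta_w = (T - beta_b * X) / (1.0 - X)            # accounting identity solved for beta_w
    return beta_b, beta_w

rng = np.random.default_rng(2)
X = np.array([0.2, 0.5, 0.8])                        # hypothetical precinct data
T = np.array([0.6, 0.55, 0.5])
phi_hat = np.array([0.5, 0.7, np.log(0.20), np.log(0.15), np.arctanh(0.30)])  # hypothetical estimates
vcov = np.diag(np.full(5, 0.05**2))                                           # hypothetical covariance

n_sims = 200
beta_b = np.empty((n_sims, X.size))
beta_w = np.empty((n_sims, X.size))
for s in range(n_sims):
    phi = rng.multivariate_normal(phi_hat, vcov)                          # step ii
    beta_b[s], beta_w[s] = draw_beta_on_line(to_natural(phi), X, T, rng)  # step iii

print("posterior means of beta_b by precinct:", beta_b.mean(axis=0).round(3))
```

Because every simulated pair satisfies the accounting identity, no draws are discarded, which is what makes this approach practical compared with rejection sampling against the tomography line.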
More informationCourse Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model
Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model EPSY 905: Multivariate Analysis Lecture 1 20 January 2016 EPSY 905: Lecture 1 -
More informationBias Variance Trade-off
Bias Variance Trade-off The mean squared error of an estimator MSE(ˆθ) = E([ˆθ θ] 2 ) Can be re-expressed MSE(ˆθ) = Var(ˆθ) + (B(ˆθ) 2 ) MSE = VAR + BIAS 2 Proof MSE(ˆθ) = E((ˆθ θ) 2 ) = E(([ˆθ E(ˆθ)]
More informationStatistical Analysis of Causal Mechanisms
Statistical Analysis of Causal Mechanisms Kosuke Imai Princeton University April 13, 2009 Kosuke Imai (Princeton) Causal Mechanisms April 13, 2009 1 / 26 Papers and Software Collaborators: Luke Keele,