Redistricting Simulation through Markov Chain Monte Carlo

Similar documents
A New Automated Redistricting Simulator Using Markov Chain Monte Carlo

ST 740: Markov Chain Monte Carlo

STAT 425: Introduction to Bayesian Analysis

Markov chain Monte Carlo

17 : Markov Chain Monte Carlo

Markov Chain Monte Carlo

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling

Monte Carlo Inference Methods

Computational statistics

Introduction to Machine Learning CMU-10701

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait

Bayesian Networks in Educational Assessment

Causal Interaction in Factorial Experiments: Application to Conjoint Analysis

Assessing significance in a Markov chain without mixing

Kazuhiko Kakamu Department of Economics Finance, Institute for Advanced Studies. Abstract

Eco517 Fall 2013 C. Sims MCMC. October 8, 2013

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo

Reminder of some Markov Chain properties:

Brief introduction to Markov Chain Monte Carlo

Markov chain Monte Carlo

SAMSI Astrostatistics Tutorial. More Markov chain Monte Carlo & Demo of Mathematica software

A Comparison of Two MCMC Algorithms for Hierarchical Mixture Models

Exact Goodness-of-Fit Testing for the Ising Model

Application of new Monte Carlo method for inversion of prestack seismic data. Yang Xue Advisor: Dr. Mrinal K. Sen

Unpacking the Black-Box: Learning about Causal Mechanisms from Experimental and Observational Studies

Stochastic optimization Markov Chain Monte Carlo

Parallel Tempering I

Who was Bayes? Bayesian Phylogenetics. What is Bayes Theorem?

Bayesian Phylogenetics

Monte Carlo in Bayesian Statistics

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Convex Optimization CMU-10725

SAMPLING ALGORITHMS. In general. Inference in Bayesian models

Statistical Analysis of List Experiments

A = {(x, u) : 0 u f(x)},

CSC 2541: Bayesian Methods for Machine Learning

0.1 factor.bayes: Bayesian Factor Analysis

Bayesian Phylogenetics:

Bayesian phylogenetics. the one true tree? Bayesian phylogenetics

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods

MH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution

Theory of Stochastic Processes 8. Markov chain Monte Carlo

Introduction to MCMC. DB Breakfast 09/30/2011 Guozhang Wang

General Construction of Irreversible Kernel in Markov Chain Monte Carlo

Quantifying Uncertainty

Chapter 12 PAWL-Forced Simulated Tempering

Departamento de Economía Universidad de Chile

The XY-Model. David-Alexander Robinson Sch th January 2012

Regression Discontinuity Designs

Lognormal Measurement Error in Air Pollution Health Effect Studies

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bayesian Inference Technique for Data mining for Yield Enhancement in Semiconductor Manufacturing Data

The Ising model and Markov chain Monte Carlo

Machine Learning for Data Science (CS4786) Lecture 24

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements

Web Appendix for The Dynamics of Reciprocity, Accountability, and Credibility

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

MCMC algorithms for fitting Bayesian models

Finite Population Estimators in Stochastic Search Variable Selection

Statistical Analysis of the Item Count Technique

Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics

0.1 factor.ord: Ordinal Data Factor Analysis

Rare Event Sampling using Multicanonical Monte Carlo

Markov Chain Monte Carlo The Metropolis-Hastings Algorithm

On Bayesian Computation

16 : Markov Chain Monte Carlo (MCMC)

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version)

Markov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017

Causal Interaction in Factorial Experiments: Application to Conjoint Analysis

Monte Carlo Dynamically Weighted Importance Sampling for Spatial Models with Intractable Normalizing Constants

Stat 516, Homework 1

Hamiltonian Monte Carlo

Multimodal Nested Sampling

Bayesian Estimation of Input Output Tables for Russia

Monte Carlo and cold gases. Lode Pollet.

Statistical Methods in Particle Physics Lecture 1: Bayesian methods

1. Capitalize all surnames and attempt to match with Census list. 3. Split double-barreled names apart, and attempt to match first half of name.

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010

Experimental Designs for Identifying Causal Mechanisms

An introduction to Markov Chain Monte Carlo techniques

Bayesian model selection in graphs by using BDgraph package

Markov Networks.

Metropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9

Bayesian Methods for Machine Learning

Advanced Statistical Modelling

Markov Chain Monte Carlo methods

Lecture 8: Computer Simulations of Generalized Ensembles

Bayesian model selection: methodology, computation and applications

Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems

Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies

Manifold Monte Carlo Methods

Monte Carlo Methods. Leon Gu CSD, CMU

Simulated Annealing for Constrained Global Optimization

MCMC and Gibbs Sampling. Kayhan Batmanghelich

Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling

Markov Chain Monte Carlo

Default Priors and Effcient Posterior Computation in Bayesian

Transcription:

Redistricting Simulation through Markov Chain Monte Carlo Kosuke Imai Department of Government Department of Statistics Institute for Quantitative Social Science Harvard University SAMSI Quantifying Gerrymandering Workshop October 8, 2018 Joint work with Benjamin Fifield, Michael Higgins, Jun Kawahara, and Alexander Tarr Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 1 / 27

Motivation and Progress of Our Team s Efforts Redistricting simulation: detect gerrymandering assess impact of constraints (e.g., population, compactness, race) In 2013 when our team started working on the project, many optimization methods existed but there were surprisingly few simulation methods no theoretical justification for standard random seed and grow algorithms Need a simulation method that: 1 samples uniformly from a target population of redistricting maps 2 incorporates common constraints 3 scales to redistricting problems of moderate and large size Paper presented at the 2014 Political Methodology Summer Meeting Open-source R package redist first published at CRAN in May 2015 Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 2 / 27

Overview of the Talk 1 Markov chain Monte Carlo algorithms 2 Validation studies 3 Empirical studies Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 3 / 27

The Random Seed-and-Grow Algorithm Cirincione et. al (2000), Altman & McDonald (2011), Chen & Rodden (2013): 1 Randomly choose a precinct as a seed for each district 2 Identify precincts adjacent to each seed 3 Randomly select adjacent precinct to merge with the seed 4 Repeat steps 2 & 3 until all precincts are assigned 5 Swap precincts around borders to achieve population parity Modify Step 3 to incorporate compactness No theoretical properties known The resulting maps may not be representative of the population Local exploration is difficult Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 4 / 27

Redistricting as a Graph-Cut Problem Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 5 / 27

Step 1: Independently Turn On Each Edge with Prob. q Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 6 / 27

Step 2: Gather Connected Components on Boundaries Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 7 / 27

Step 3: Select Subsets of Components and Propose Swaps Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 8 / 27

Step 4: Accept or Reject the Proposal Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 9 / 27

Step 4: Accept or Reject the Proposal Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 10 / 27

The Theoretical Property of the Algorithm We prove that the algorithm samples uniformly from the population of all valid redistricting plans An extension of the Swendsen-Wang algorithm (Barbu & Zhu, 2005) Metropolis-Hastings move from plan v t 1 vt : ( ) α(v t 1 vt ) = min 1, π(v t v t 1 ) π(v t 1 vt ) min (1, ) (1 q) B(C,v) B(C,v ) where q is the edge cut probability and B(C, v) is # of edges between connected component and its assigned district in redistricting plan v Easy to calculate Exact Metropolis ratio is too costly to evaluate, but approximation appears to work well Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 11 / 27

Incorporating a Population Constraint Want to sample plans where ψ(v k ) = p k p 1 ɛ where p k is population of district k, p is average district population, ɛ is strength of constraint Strategy 1: Only propose valid swaps slow mixing Strategy 2: Oversample certain plans and then reweight 1 Sample from target distribution f rather than the uniform distribution: ( ) f (v) g(v) = exp β V k v ψ(v k ) where β 0 and ψ(v k ) is deviation from parity for district V k 2 (Approximate) Acceptance probability is still easy to calculate, ( α(v v g(v ) ) ) min 1, (1,v) B(C,v ) q) B(C g(v) 3 Discard invalid plans and reweight the rest by 1/g(v) Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 12 / 27

Additional Constraints 1 Compactness (Fryer and Holden 2011): ψ(v k ) i,j V k,i<j p i p j d 2 ij where d ij is the distance between precincts i, j 2 Similarity to the adapted plan: ψ(v k ) = r k 1 where r k (rk ) is the # of precincts in V k (V k of the adapted plan) 3 Any criteria where constraint can be evaluated at each district g(v) = exp β (w 1 ψ 1 (V k ) + w 2 ψ 2 (V k ) + + w L ψ L (V k )) V k v r k Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 13 / 27

Improving the Mixing of the Algorithm Single iteration of the proposed algorithm runs very quickly But, like any MCMC algorithm, convergence may take a long time 1 Swapping multiple connected components more effective than increasing q but still leads to low acceptance ratio 2 Simulated tempering (Geyer and Thompson, 1995) Lower and raise the temperature parameter β as part of MCMC Explores low temperature space before visiting high temperature space 3 Parallel tempering (Geyer 1991) Run multiple chains of the algorithm with different temperatures Use the Metropolis criterion to swap temperatures with adjacent chains Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 14 / 27

Validation Studies based on Florida Data Evaluate algorithms when all valid plans can be enumerated # of precincts: 25 and 50 # of districts: 2 and 3 for the 25 set, and 2 for the 50 set With and without a population constraint of 20% within parity Also, consider simulated and parallel tempering Comparison with the standard random seed-and-grow algorithm via the BARD package (Altman & McDonald 2011) 10,000 draws for each algorithm Republican Dissimilarity Index for each simulated plan: D = 1 2 n w i R i R i=1 R(1 R) Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 15 / 27

Our Algorithm vs. Standard Algorithm Unconstrained Simulations Constrained Simulations 25 Precinct Set Density 0 10 20 30 40 50 60 True Distribution Our Algorithm Standard Algorithm Density 0 10 20 30 40 50 60 0.00 0.05 0.10 0.15 0.20 0.25 Republican Dissimilarity Index 0.00 0.05 0.10 0.15 0.20 0.25 Republican Dissimilarity Index 50 Precinct Set Density 0 10 20 30 40 50 60 Density 0 10 20 30 40 50 60 0.00 0.05 0.10 0.15 0.20 0.25 0.00 0.05 0.10 0.15 0.20 0.25 Republican Dissimilarity Index Republican Dissimilarity Index Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 16 / 27

Simulated and Parallel Tempering Unconstrained Simulations Constrained Simulations (20%) Constrained Simulations (10%) Without Tempering (Algorithm 1) Density 0 10 20 30 40 50 True Distribution Our Algorithm (Hard Constraint) Standard Algorithm Density 0 10 20 30 40 50 Density 0 10 20 30 40 50 0.00 0.10 0.20 0.00 0.10 0.20 0.00 0.10 0.20 Republican Dissimilarity Index Republican Dissimilarity Index Republican Dissimilarity Index With Tempering (Algorithms 2 & 3) Density 0 10 20 30 40 50 True Distribution Simulated Tempering Parallel Tempering Density 0 10 20 30 40 50 0.00 0.10 0.20 0.00 0.10 0.20 Republican Dissimilarity Index Republican Dissimilarity Index Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 17 / 27

More Validation Studies based on Florida Data 26.7 Fully Enumerated Validation Maps for Redistricting Simulation 25 Precinct Validation Subset 70 Precinct Validation Subset Population 25.875 0 5000 10000 15000 25.850 20000 Latitude 26.6 26.5 25.825 25.800 25.775 26.4 25.750 81.8 81.6 81.4 80.20 80.16 80.12 Longitude Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 18 / 27

enumpart (Kawahara et al. 2017) Quickly Computes the Total Number of Solutions 1e+09 How Solutions Change with Map Size and Constraints 2 Districts 3 Districts 4 Districts Number of Solutions 1e+06 1000 10 0 1e+09 1e+06 1000 10 0 1e+09 1e+06 No Constraint 5% 1% 1000 10 0 20 25 30 35 40 20 25 30 35 40 20 25 30 35 40 Number of Precincts Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 19 / 27

Total Number of Solutions and Population Parity 400 300 # < 1% Parity: 8 # < 5% Parity: 192 # < 10% Parity: 927 Distribution of Population Parity for Fully Enumerated Maps 25 Precinct Validation Subset 9e+05 # < 1% Parity: 717060 # < 5% Parity: 3678453 # < 10% Parity: 7867435 70 Precinct Validation Subset Number of Plans 200 100 6e+05 3e+05 0 0e+00 0.00 0.05 0.10 0.15 0.20 0.00 0.05 0.10 0.15 0.20 Distance from Population Parity Total number of solutions without population constraint 1 25 precinct (3 districts) example: 117,688 2 70 precinct (2 districts) example: 44,082,156 Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 20 / 27

Validation Results 1 Divide 70-precincts into 2 districts: no constraint, 5%, 1% 2 4 MCMC chains for 50,000 iterations each: with and without simulated tempering 3 Run 200,000 iterations of random seed-and-grow algorithm Validation Exercises on 70 Precinct Validation Map No Constraint 5% Parity 1% Parity Density 15 10 Truth MCMC MCMC (Simulated Tempering) RSG 5 0 0.0 0.1 0.2 0.3 0.0 0.1 0.2 0.3 0.0 0.1 0.2 0.3 Republican Dissimilarity Index Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 21 / 27

Empirical Studies Apply algorithm to state election data: 1 2 3 New Hampshire: 2 congressional districts, 327 precincts Mississippi: 4 congressional districts, 1,969 precincts 1% (NH) and 5% (MS) deviation from population parity Convergence diagnostics: 1 2 3 Autocorrelation Trace plot Gelman-Rubin multiple chain diagnostic New Hampshire Kosuke Imai (Harvard) Mississippi Redistricting through MCMC SAMSI (Oct. 8, 2018) 22 / 27

New Hampshire: Tempering Works Better Autocorrelation of a Chain Trace of a Chain Gelman Rubin Diagnostic Hard Population Constraint (Algorithm 1) Autocorrelation 1.0 0.5 0.0 0.5 1.0 0 10 20 30 40 50 Lag Republican Dissimilarity (logit transformed) 12 10 8 6 4 0 1000 3000 5000 Iterations shrink factor 1.0 1.5 2.0 2.5 0 1000 3000 5000 last iteration in chain Parallel Tempering (Algorithm 2) Autocorrelation 1.0 0.5 0.0 0.5 1.0 0 10 20 30 40 50 Lag Republican Dissimilarity (logit transformed) 12 10 8 6 4 0 1000 3000 5000 Iterations shrink factor 1.0 1.5 2.0 2.5 0 1000 3000 5000 last iteration in chain Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 23 / 27 median 97.5%

Mississippi: Parallel Tempering, More Challenging Case Autocorrelation of a Chain Trace of a Chain Gelman Rubin Diagnostic Republican Dissmilarity Autocorrelation 1.0 0.5 0.0 0.5 1.0 0 10 20 30 40 50 Lag Republican Dissimilarity (logit transformed) 6.0 5.0 4.0 3.0 0 2000 6000 10000 14000 Iterations shrink factor 1 2 3 4 5 6 7 8 median 97.5% 0 2000 6000 10000 14000 last iteration in chain African American Dissimilarity Autocorrelation 1.0 0.5 0.0 0.5 1.0 0 10 20 30 40 50 Lag African American Dissimilarity (logit transformed) 3.0 2.5 2.0 0 2000 6000 10000 14000 Iterations shrink factor 0 2000 6000 10000 14000 last iteration in chain Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 24 / 27 1 2 3 4 5 6 7 8 median 97.5%

Redistricting Plans that are Similar to the Adapted Plan Question: How does the partisan bias of the adapted plan compare with that of similar plans? Local exploration 0.145 0.135 0.125 Fraction of Precincts Switched Partisan Bias towards Republicans 0 0.05 0.1 0.15 0.2 0.25 0.3 Partisan Bias New Hampshire 0.12 0.14 0.16 0.18 0.20 Fraction of Precincts Switched Absolute Deviation 0 0.05 0.1 0.15 0.2 0.25 0.3 Absolute Deviation from Partisan Symmetry Fraction of Precincts Switched Republican Seats 0 0.05 0.1 0.15 0.2 0.25 0.3 0 1 2 Number of Republican Seats 0.102 0.106 0.110 Fraction of Precincts Switched Partisan Bias towards Republicans 0 0.05 0.1 0.15 0.2 0.25 0.3 Mississippi 0.12 0.14 0.16 0.18 0.20 Fraction of Precincts Switched Absolute Deviation 0 0.05 0.1 0.15 0.2 0.25 0.3 Fraction of Precincts Switched Republican Seats 0 0.05 0.1 0.15 0.2 0.25 0.3 0 1 2 3 4 0.145 0.135 0.125 Fraction of Precincts Switched Partisan Bias towards Republicans 0 0.05 0.1 0.15 0.2 0.25 0.3 Partisan Bias New Hampshire 0.12 0.14 0.16 0.18 0.20 Fraction of Precincts Switched Absolute Deviation 0 0.05 0.1 0.15 0.2 0.25 0.3 Absolute Deviation from Partisan Symmetry Fraction of Precincts Switched Republican Seats 0 0.05 0.1 0.15 0.2 0.25 0.3 0 1 2 Number of Republican Seats 0.102 0.106 0.110 Fraction of Precincts Switched Partisan Bias towards Republicans 0 0.05 0.1 0.15 0.2 0.25 0.3 Mississippi 0.12 0.14 0.16 0.18 0.20 Fraction of Precincts Switched Absolute Deviation 0 0.05 0.1 0.15 0.2 0.25 0.3 Fraction of Precincts Switched Republican Seats 0 0.05 0.1 0.15 0.2 0.25 0.3 0 1 2 3 4 Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 25 / 27

Concluding Remarks Scholars use simulations to characterize the distribution of redistricting plans Commonly used algorithms lack theoretical properties and speed Our MCMC algorithm has: better theoretical properties superior speed better performance in validation studies can do global exploration for small states and local exploration for other states Future research: more validation studies more diffused starting maps larger states with more districts and precincts apply the method to historical redistricting data Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 26 / 27

References 1 Paper at https://imai.fas.harvard.edu/research/redist.html 2 R package at https://github.com/kosukeimai/redist 3 Comments and suggestions: Imai@Harvard.Edu Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 27 / 27