Redistricting Simulation through Markov Chain Monte Carlo

Redistricting Simulation through Markov Chain Monte Carlo Kosuke Imai Department of Government Department of Statistics Institute for Quantitative Social Science Harvard University SAMSI Quantifying Gerrymandering Workshop October 8, 2018 Joint work with Benjamin Fifield, Michael Higgins, Jun Kawahara, and Alexander Tarr Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 1 / 27

Motivation and Progress of Our Team s Efforts Redistricting simulation: detect gerrymandering assess impact of constraints (e.g., population, compactness, race) In 2013 when our team started working on the project, many optimization methods existed but there were surprisingly few simulation methods no theoretical justification for standard random seed and grow algorithms Need a simulation method that: 1 samples uniformly from a target population of redistricting maps 2 incorporates common constraints 3 scales to redistricting problems of moderate and large size Paper presented at the 2014 Political Methodology Summer Meeting Open-source R package redist first published at CRAN in May 2015 Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 2 / 27

Overview of the Talk 1 Markov chain Monte Carlo algorithms 2 Validation studies 3 Empirical studies Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 3 / 27

The Random Seed-and-Grow Algorithm Cirincione et. al (2000), Altman & McDonald (2011), Chen & Rodden (2013): 1 Randomly choose a precinct as a seed for each district 2 Identify precincts adjacent to each seed 3 Randomly select adjacent precinct to merge with the seed 4 Repeat steps 2 & 3 until all precincts are assigned 5 Swap precincts around borders to achieve population parity Modify Step 3 to incorporate compactness No theoretical properties known The resulting maps may not be representative of the population Local exploration is difficult Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 4 / 27

Redistricting as a Graph-Cut Problem Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 5 / 27

Step 1: Independently Turn On Each Edge with Prob. q Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 6 / 27

Step 2: Gather Connected Components on Boundaries Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 7 / 27

Step 3: Select Subsets of Components and Propose Swaps Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 8 / 27

Step 4: Accept or Reject the Proposal Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 9 / 27

Step 4: Accept or Reject the Proposal Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 10 / 27

The Theoretical Property of the Algorithm We prove that the algorithm samples uniformly from the population of all valid redistricting plans An extension of the Swendsen-Wang algorithm (Barbu & Zhu, 2005) Metropolis-Hastings move from plan v t 1 vt : ( ) α(v t 1 vt ) = min 1, π(v t v t 1 ) π(v t 1 vt ) min (1, ) (1 q) B(C,v) B(C,v ) where q is the edge cut probability and B(C, v) is # of edges between connected component and its assigned district in redistricting plan v Easy to calculate Exact Metropolis ratio is too costly to evaluate, but approximation appears to work well Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 11 / 27

Incorporating a Population Constraint Want to sample plans where ψ(v k ) = p k p 1 ɛ where p k is population of district k, p is average district population, ɛ is strength of constraint Strategy 1: Only propose valid swaps slow mixing Strategy 2: Oversample certain plans and then reweight 1 Sample from target distribution f rather than the uniform distribution: ( ) f (v) g(v) = exp β V k v ψ(v k ) where β 0 and ψ(v k ) is deviation from parity for district V k 2 (Approximate) Acceptance probability is still easy to calculate, ( α(v v g(v ) ) ) min 1, (1,v) B(C,v ) q) B(C g(v) 3 Discard invalid plans and reweight the rest by 1/g(v) Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 12 / 27

Additional Constraints 1 Compactness (Fryer and Holden 2011): ψ(v k ) i,j V k,i<j p i p j d 2 ij where d ij is the distance between precincts i, j 2 Similarity to the adapted plan: ψ(v k ) = r k 1 where r k (rk ) is the # of precincts in V k (V k of the adapted plan) 3 Any criteria where constraint can be evaluated at each district g(v) = exp β (w 1 ψ 1 (V k ) + w 2 ψ 2 (V k ) + + w L ψ L (V k )) V k v r k Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 13 / 27

Improving the Mixing of the Algorithm Single iteration of the proposed algorithm runs very quickly But, like any MCMC algorithm, convergence may take a long time 1 Swapping multiple connected components more effective than increasing q but still leads to low acceptance ratio 2 Simulated tempering (Geyer and Thompson, 1995) Lower and raise the temperature parameter β as part of MCMC Explores low temperature space before visiting high temperature space 3 Parallel tempering (Geyer 1991) Run multiple chains of the algorithm with different temperatures Use the Metropolis criterion to swap temperatures with adjacent chains Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 14 / 27

Validation Studies based on Florida Data Evaluate algorithms when all valid plans can be enumerated # of precincts: 25 and 50 # of districts: 2 and 3 for the 25 set, and 2 for the 50 set With and without a population constraint of 20% within parity Also, consider simulated and parallel tempering Comparison with the standard random seed-and-grow algorithm via the BARD package (Altman & McDonald 2011) 10,000 draws for each algorithm Republican Dissimilarity Index for each simulated plan: D = 1 2 n w i R i R i=1 R(1 R) Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 15 / 27

Our Algorithm vs. Standard Algorithm Unconstrained Simulations Constrained Simulations 25 Precinct Set Density 0 10 20 30 40 50 60 True Distribution Our Algorithm Standard Algorithm Density 0 10 20 30 40 50 60 0.00 0.05 0.10 0.15 0.20 0.25 Republican Dissimilarity Index 0.00 0.05 0.10 0.15 0.20 0.25 Republican Dissimilarity Index 50 Precinct Set Density 0 10 20 30 40 50 60 Density 0 10 20 30 40 50 60 0.00 0.05 0.10 0.15 0.20 0.25 0.00 0.05 0.10 0.15 0.20 0.25 Republican Dissimilarity Index Republican Dissimilarity Index Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 16 / 27

Simulated and Parallel Tempering Unconstrained Simulations Constrained Simulations (20%) Constrained Simulations (10%) Without Tempering (Algorithm 1) Density 0 10 20 30 40 50 True Distribution Our Algorithm (Hard Constraint) Standard Algorithm Density 0 10 20 30 40 50 Density 0 10 20 30 40 50 0.00 0.10 0.20 0.00 0.10 0.20 0.00 0.10 0.20 Republican Dissimilarity Index Republican Dissimilarity Index Republican Dissimilarity Index With Tempering (Algorithms 2 & 3) Density 0 10 20 30 40 50 True Distribution Simulated Tempering Parallel Tempering Density 0 10 20 30 40 50 0.00 0.10 0.20 0.00 0.10 0.20 Republican Dissimilarity Index Republican Dissimilarity Index Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 17 / 27

More Validation Studies based on Florida Data 26.7 Fully Enumerated Validation Maps for Redistricting Simulation 25 Precinct Validation Subset 70 Precinct Validation Subset Population 25.875 0 5000 10000 15000 25.850 20000 Latitude 26.6 26.5 25.825 25.800 25.775 26.4 25.750 81.8 81.6 81.4 80.20 80.16 80.12 Longitude Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 18 / 27

enumpart (Kawahara et al. 2017) Quickly Computes the Total Number of Solutions 1e+09 How Solutions Change with Map Size and Constraints 2 Districts 3 Districts 4 Districts Number of Solutions 1e+06 1000 10 0 1e+09 1e+06 1000 10 0 1e+09 1e+06 No Constraint 5% 1% 1000 10 0 20 25 30 35 40 20 25 30 35 40 20 25 30 35 40 Number of Precincts Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 19 / 27

Total Number of Solutions and Population Parity 400 300 # < 1% Parity: 8 # < 5% Parity: 192 # < 10% Parity: 927 Distribution of Population Parity for Fully Enumerated Maps 25 Precinct Validation Subset 9e+05 # < 1% Parity: 717060 # < 5% Parity: 3678453 # < 10% Parity: 7867435 70 Precinct Validation Subset Number of Plans 200 100 6e+05 3e+05 0 0e+00 0.00 0.05 0.10 0.15 0.20 0.00 0.05 0.10 0.15 0.20 Distance from Population Parity Total number of solutions without population constraint 1 25 precinct (3 districts) example: 117,688 2 70 precinct (2 districts) example: 44,082,156 Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 20 / 27

Validation Results 1 Divide 70-precincts into 2 districts: no constraint, 5%, 1% 2 4 MCMC chains for 50,000 iterations each: with and without simulated tempering 3 Run 200,000 iterations of random seed-and-grow algorithm Validation Exercises on 70 Precinct Validation Map No Constraint 5% Parity 1% Parity Density 15 10 Truth MCMC MCMC (Simulated Tempering) RSG 5 0 0.0 0.1 0.2 0.3 0.0 0.1 0.2 0.3 0.0 0.1 0.2 0.3 Republican Dissimilarity Index Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 21 / 27

Empirical Studies Apply algorithm to state election data: 1 2 3 New Hampshire: 2 congressional districts, 327 precincts Mississippi: 4 congressional districts, 1,969 precincts 1% (NH) and 5% (MS) deviation from population parity Convergence diagnostics: 1 2 3 Autocorrelation Trace plot Gelman-Rubin multiple chain diagnostic New Hampshire Kosuke Imai (Harvard) Mississippi Redistricting through MCMC SAMSI (Oct. 8, 2018) 22 / 27

New Hampshire: Tempering Works Better Autocorrelation of a Chain Trace of a Chain Gelman Rubin Diagnostic Hard Population Constraint (Algorithm 1) Autocorrelation 1.0 0.5 0.0 0.5 1.0 0 10 20 30 40 50 Lag Republican Dissimilarity (logit transformed) 12 10 8 6 4 0 1000 3000 5000 Iterations shrink factor 1.0 1.5 2.0 2.5 0 1000 3000 5000 last iteration in chain Parallel Tempering (Algorithm 2) Autocorrelation 1.0 0.5 0.0 0.5 1.0 0 10 20 30 40 50 Lag Republican Dissimilarity (logit transformed) 12 10 8 6 4 0 1000 3000 5000 Iterations shrink factor 1.0 1.5 2.0 2.5 0 1000 3000 5000 last iteration in chain Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 23 / 27 median 97.5%

Mississippi: Parallel Tempering, More Challenging Case Autocorrelation of a Chain Trace of a Chain Gelman Rubin Diagnostic Republican Dissmilarity Autocorrelation 1.0 0.5 0.0 0.5 1.0 0 10 20 30 40 50 Lag Republican Dissimilarity (logit transformed) 6.0 5.0 4.0 3.0 0 2000 6000 10000 14000 Iterations shrink factor 1 2 3 4 5 6 7 8 median 97.5% 0 2000 6000 10000 14000 last iteration in chain African American Dissimilarity Autocorrelation 1.0 0.5 0.0 0.5 1.0 0 10 20 30 40 50 Lag African American Dissimilarity (logit transformed) 3.0 2.5 2.0 0 2000 6000 10000 14000 Iterations shrink factor 0 2000 6000 10000 14000 last iteration in chain Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 24 / 27 1 2 3 4 5 6 7 8 median 97.5%

Redistricting Plans that are Similar to the Adapted Plan Question: How does the partisan bias of the adapted plan compare with that of similar plans? Local exploration 0.145 0.135 0.125 Fraction of Precincts Switched Partisan Bias towards Republicans 0 0.05 0.1 0.15 0.2 0.25 0.3 Partisan Bias New Hampshire 0.12 0.14 0.16 0.18 0.20 Fraction of Precincts Switched Absolute Deviation 0 0.05 0.1 0.15 0.2 0.25 0.3 Absolute Deviation from Partisan Symmetry Fraction of Precincts Switched Republican Seats 0 0.05 0.1 0.15 0.2 0.25 0.3 0 1 2 Number of Republican Seats 0.102 0.106 0.110 Fraction of Precincts Switched Partisan Bias towards Republicans 0 0.05 0.1 0.15 0.2 0.25 0.3 Mississippi 0.12 0.14 0.16 0.18 0.20 Fraction of Precincts Switched Absolute Deviation 0 0.05 0.1 0.15 0.2 0.25 0.3 Fraction of Precincts Switched Republican Seats 0 0.05 0.1 0.15 0.2 0.25 0.3 0 1 2 3 4 0.145 0.135 0.125 Fraction of Precincts Switched Partisan Bias towards Republicans 0 0.05 0.1 0.15 0.2 0.25 0.3 Partisan Bias New Hampshire 0.12 0.14 0.16 0.18 0.20 Fraction of Precincts Switched Absolute Deviation 0 0.05 0.1 0.15 0.2 0.25 0.3 Absolute Deviation from Partisan Symmetry Fraction of Precincts Switched Republican Seats 0 0.05 0.1 0.15 0.2 0.25 0.3 0 1 2 Number of Republican Seats 0.102 0.106 0.110 Fraction of Precincts Switched Partisan Bias towards Republicans 0 0.05 0.1 0.15 0.2 0.25 0.3 Mississippi 0.12 0.14 0.16 0.18 0.20 Fraction of Precincts Switched Absolute Deviation 0 0.05 0.1 0.15 0.2 0.25 0.3 Fraction of Precincts Switched Republican Seats 0 0.05 0.1 0.15 0.2 0.25 0.3 0 1 2 3 4 Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 25 / 27

Concluding Remarks Scholars use simulations to characterize the distribution of redistricting plans Commonly used algorithms lack theoretical properties and speed Our MCMC algorithm has: better theoretical properties superior speed better performance in validation studies can do global exploration for small states and local exploration for other states Future research: more validation studies more diffused starting maps larger states with more districts and precincts apply the method to historical redistricting data Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 26 / 27

References 1 Paper at https://imai.fas.harvard.edu/research/redist.html 2 R package at https://github.com/kosukeimai/redist 3 Comments and suggestions: Imai@Harvard.Edu Kosuke Imai (Harvard) Redistricting through MCMC SAMSI (Oct. 8, 2018) 27 / 27