Bayesian nonparametric predictive approaches for causal inference: Regression Discontinuity Methods


George Karabatsos, University of Illinois-Chicago
ERCIM Conference, 14-16 December 2013, Senate House, University of London
Session ES38: Bayesian Nonparametric Regression (Sunday 15.12.2013, 08:45-10:25)
In collaboration with S.G. Walker.
Research is supported by NSF-MMS Grant SES-1156372.

Introduction: Outline
I. Review the causal inference framework (counterfactual).
II. Randomized studies and non-randomized studies.
    - The regression discontinuity (RD) design for non-randomized studies (Thistlethwaite & Campbell, 1960; Cook, 2008).
    - Causal modeling framework: DAG and extended conditional independence (Dawid, 2002, 2010).
III. Issues of current causal models for RD designs.
IV. Propose a Bayesian nonparametric regression model for RD designs.
    - Sharp RD design (full treatment compliance among subjects).
    - Fuzzy RD design (imperfect treatment compliance).
V. Illustrate the Bayesian nonparametric model on two real data sets.
    - Impact of a new teacher education curriculum on student performance.
    - Impact of basic skills on teaching ability.
VI. Time permitting: consider more recent work on RD-based causal inference, involving the restricted DP mixture of linear regressions model (Wade, Walker, & Petrone, 2013), which more directly exploits the local randomization feature of RD designs.

Introduction: Randomized Studies
Causal inference: a basic aim of scientific research.
Randomized studies: the gold standard of causal inference (Rubin, 2008). Randomization ensures that the (pretreatment) covariate distribution does not differ between treatment subjects and non-treatment subjects. Then any difference between treatment outcomes and non-treatment outcomes is due only to the change in the treatment variable, i.e., it is attributable to the causal effect of the treatment on the outcome.
A randomized study is often infeasible for financial, ethical, or timeliness reasons.

Regression Discontinuity (RD) Design
    Y: outcome
    R: assignment variable
    A = 1(R > r_0): treatment assignment indicator
    T: treatment receipt indicator
Sharp RD: full compliance (A = T). Fuzzy RD: imperfect compliance.
For subjects with R observations located near r_0, treatments are as good as randomly assigned, under mild conditions.

[Figure: Sharp RD Design Illustration. Outcomes Y plotted against the assignment variable R, for non-treatment subjects (T = 0; R < .6) and treatment subjects (T = 1; R >= .6). At the cutoff of .6, the average jump size of Y from the black line (control) to the red line (treatment) is 4.3.]

RD Assumptions for Causal Inference
Characterizing (RD) assumption: $\lim_{r \downarrow r_0} E(T \mid r) \neq \lim_{r \uparrow r_0} E(T \mid r)$.
Sharp RD: $f(t \mid r) = \Pr(T = t \mid r) = t\,1(r \geq r_0) + (1 - t)\,1(r < r_0)$.
Probabilistic DAG for sharp RD: $\Psi_R \in \{\emptyset, r_0\}$ and $\Psi_T \in \{\emptyset, 0, 1\}$ are intervention parameters. $\Psi_Y$ is a general regime parameter that specifies the circumstances of Y: experimental conditions, environment, kind of subject, etc.
The DAG implies the conditional independence properties:
    $R \perp\!\!\!\perp (\Psi_T, \Psi_Y) \mid \Psi_R$
    $T \perp\!\!\!\perp (\Psi_Y, \Psi_R) \mid (R, \Psi_T)$
    $Y \perp\!\!\!\perp (\Psi_T, \Psi_R) \mid (R, T, \Psi_Y)$
Local stability ("SUTVA"): $Y \perp\!\!\!\perp \Psi_Y \mid (R = r_0, T)$, i.e., $f(y \mid r_0, t, \psi_R) = f(y \mid r_0, t)$.
In the idle state, i.e., $\psi_R = \psi_T = \emptyset$, the joint p.d.f. is left in its "undisturbed state": $f(r, t, y) = f(r)\,f(t \mid r)\,f(y \mid r, t)$.
All previous CI assumptions imply a causal property: $Y \perp\!\!\!\perp (\Psi_T, \Psi_R) \mid (R = r_0, T)$.
An intervention regime $(\Psi_R = r_0, \Psi_T = t_0)$, $t_0 \in \{0, 1\}$, modifies $f(r, t, y)$ to $f(y \mid r_0, t_0, \psi_R = r_0, \psi_T = t_0) = f(y \mid r_0, t_0)$.
Causal effect: comparison of functionals of $f(y \mid r_0, T = 1)$ and $f(y \mid r_0, T = 0)$.

RD Assumptions for Causal Inference (continued)
Conditioning on $R = r_0$ is motivated by the following assumption.
Local Randomization (LR) (Lee, 2008): Each subject, described by all unobserved and observed pre-treatment covariates, W, has "imprecise control" over R, i.e., $F_R(r \mid w) = \Pr(R \leq r \mid w)$ is continuous in r at $r_0$, with $0 < F_R(r_0 \mid w) < 1$. Then the p.d.f. of all observed pretreatment covariates, $f(x \mid w)$, is the same for all subjects just to the left and just to the right of the cutoff $r_0$.
Estimate of the causal effect of T on Y:
    Sharp RD: $E(h\{Y\} \mid r^+) - E(h\{Y\} \mid r^-)$, for any chosen functional $h\{\cdot\}$, where $r^+$ denotes the setting $(R = r_0, 1(R \geq r_0) = 1)$ and $r^-$ denotes the setting $(R = r_0, 1(R \geq r_0) = 0)$, as covariates in a regression model.
    Fuzzy RD (imperfect treatment compliance; $f(t \mid r)$ is not a point mass): $[E(h\{Y\} \mid r^+) - E(h\{Y\} \mid r^-)] \,/\, [E(T \mid r^+) - E(T \mid r^-)]$, under the additional assumption of a local exclusion restriction, i.e., conditionally on $R = r_0$, any effect of A on Y is only through T.
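To make the two estimands concrete, here is a minimal sketch that replaces the limits at r+ and r- with plain averages over a narrow window on each side of the cutoff. The function name `rd_effect`, the window rule, and the toy data are illustrative assumptions, not the estimator used in this talk (which relies on a fitted regression model).

```python
import numpy as np

def rd_effect(y, r, t, r0, width, h=lambda v: v):
    """Crude window-average versions of the sharp and fuzzy RD estimands.

    Assumption: observations within `width` of r0 stand in for the limits r+ and r-.
    Returns (sharp_effect, fuzzy_effect).
    """
    y, r, t = (np.asarray(a, dtype=float) for a in (y, r, t))
    right = (r >= r0) & (r < r0 + width)   # subjects just above the cutoff (r+)
    left = (r < r0) & (r >= r0 - width)    # subjects just below the cutoff (r-)
    numerator = h(y[right]).mean() - h(y[left]).mean()   # jump in E(h{Y})
    denominator = t[right].mean() - t[left].mean()       # jump in E(T)
    return numerator, numerator / denominator

# Toy data (hypothetical): the outcome jumps by 2 at the cutoff r0 = 0.6.
r = np.array([0.52, 0.55, 0.58, 0.60, 0.62, 0.65])
y = np.array([1.0, 1.0, 1.0, 3.0, 3.0, 3.0])
t_sharp = (r >= 0.6).astype(float)            # full compliance: T = A
t_fuzzy = np.array([0, 0, 1, 1, 1, 0.0])      # imperfect compliance

sharp, _ = rd_effect(y, r, t_sharp, r0=0.6, width=0.1)
_, fuzzy = rd_effect(y, r, t_fuzzy, r0=0.6, width=0.1)
```

Under full compliance the denominator equals 1, so the fuzzy ratio reduces to the sharp difference; under imperfect compliance the jump in the outcome is scaled up by the smaller jump in treatment receipt.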

Standard Models for RD designs
A standard model for sharp RD designs (e.g., Bloom, 2012):
    $Y_i = \beta_0 + \beta_1(r_i) + \tau\,1(r_i > r_0) + \beta_2(r_i)\,1(r_i > r_0) + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$
$\tau$ is the average causal effect of the treatment; $\beta_1(r_i)$ and $\beta_2(r_i)$ are each linear or polynomial effects of R.
The estimate of the causal effect ($\tau$) can be easily biased by outliers.
Local linear models (Fan & Gijbels, 1996) provide an outlier-resistant alternative (Imbens & Lemieux, 2008). A bandwidth parameter is chosen to assign higher weight to observations that are located close to the cutoff $r_0$ (Imbens & Lemieux, 2008).
The local linear model has been extended to quantile regression, to provide causal effects in terms of quantiles (Frandsen et al., 2012).
Local linear models can estimate the effect in either sharp or fuzzy RD designs. However:
    - Bandwidth choices only have large-sample justifications (Imbens & Kalyanaraman, 2012).
    - The quantile regression method has the quantile-crossing problem.
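As a rough illustration of the standard sharp RD regression and its local linear variant, the sketch below fits an intercept, a linear term in (r - r0), a treatment jump, and a slope change above the cutoff by (weighted) least squares. The function name `sharp_rd_fit`, the triangular kernel, the bandwidth value, and the simulated data are assumptions made for the example; they are not taken from the cited papers.

```python
import numpy as np

def sharp_rd_fit(y, r, r0, bandwidth=None):
    """Estimate the jump at r0 from a linear-in-r model with a slope change.

    bandwidth=None gives the global (unweighted) fit; otherwise a triangular
    kernel down-weights observations far from the cutoff (local linear fit).
    """
    y, r = np.asarray(y, dtype=float), np.asarray(r, dtype=float)
    c = r - r0                          # assignment variable centred at the cutoff
    d = (r > r0).astype(float)          # treatment indicator 1(r > r0)
    X = np.column_stack([np.ones_like(c), c, d, c * d])
    if bandwidth is None:
        w = np.ones_like(c)
    else:
        w = np.clip(1.0 - np.abs(c) / bandwidth, 0.0, None)   # triangular kernel
    sw = np.sqrt(w)                     # weighted least squares via row scaling
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[2]                      # coefficient on the treatment indicator

# Simulated, noise-free data (hypothetical) with a jump of 4.3 at r0 = 0.605.
r = np.linspace(0.0, 1.0, 101)
r0 = 0.605
y = 10.0 + 20.0 * r + 4.3 * (r > r0) + 5.0 * (r - r0) * (r > r0)

tau_global = sharp_rd_fit(y, r, r0)
tau_local = sharp_rd_fit(y, r, r0, bandwidth=0.2)
```

With noise-free data both fits recover the jump; with real data the two estimates can differ, and the local fit depends on the bandwidth, which is the sensitivity noted above.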

Modeling RD designs
For RD designs, a regression model is desired:
    - That is flexible enough to make accurate predictions, while being able to capture $r_0$-local effects. Accurate estimation of causal effects relies on a predictively accurate regression model.
    - That can provide coherent inferences of the causal effect of the treatment (versus the non-treatment) on the outcome Y, in terms of the outcome's mean, variance, chosen quantiles, or probability density (i.e., for general functionals $h\{\cdot\}$ of Y).
    - That involves no quantile-crossing problems.

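IPMW Model for Sharp RD Designs
    $f(y_i \mid r_i) = \sum_{j=-\infty}^{\infty} n(y_i \mid \mu_j, \sigma_j^2)\,\omega_j(\mu(r_i), \sigma(r_i)), \quad i = 1, \ldots, n$
    $\omega_j(\mu(r), \sigma(r)) = \Phi\!\left(\frac{j - \mu(r)}{\sigma(r)}\right) - \Phi\!\left(\frac{j - 1 - \mu(r)}{\sigma(r)}\right)$
    $\mu(r) = \beta_0 + \beta_1 r + \beta_2\,1(r \geq r_0)$
    $\sigma(r) = \exp\{\lambda_0 + \lambda_1 r + \lambda_2\,1(r \geq r_0)\}^{1/2}$
    $(\mu_j, \sigma_j^2) \sim N(\mu_j \mid \mu_\mu, \sigma_\mu^2)\,IG(\sigma_j^2 \mid 1, \beta_\sigma)$
    $(\mu_\mu, \sigma_\mu) \sim N(\mu_\mu \mid \mu_0, \sigma_0^2)\,Un(\sigma_\mu \mid 0, b_{\sigma\mu})$
    $(\beta_\sigma, \beta, \lambda) \sim Ga(\beta_\sigma \mid a_0, b_0) \cdots$
IPMW: Infinite-Probits Mixture Weights model (Karabatsos & Walker, 2012, EJS).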

IPMW Model (Karabatsos-Walker '12)
    $f(y \mid r) = \sum_{j=-\infty}^{\infty} n(y \mid \mu_j, \sigma_j^2)\,\omega_j(\mu(r), \sigma(r))$
    $\mu(r) = \beta_0 + \beta_1 r + \beta_2\,1(r \geq r_0)$
    $\sigma(r) = \exp\{\lambda_0 + \lambda_1 r + \lambda_2\,1(r \geq r_0)\}^{1/2}$
Weights $\omega_j(\mu(r), \sigma(r))$ indicate how well r explains Y. $\sigma(r)$ controls multimodality.
If $\beta_2 \neq 0$ or $\lambda_2 \neq 0$, then there is a regression discontinuity causal effect of T on the p.d.f. of Y.
[Figure: four panels, for $\sigma(r)$ = 1/20, 1/2, 1, and 2. Top row: mixture weight $\omega_j(r)$ against index j. Bottom row: density $f(y \mid r)$ against y.]
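The role of the mixture weights can be explored numerically. The sketch below assumes the infinite-probits form of the weights, w_j = Phi((j - mu(r)) / sigma(r)) - Phi((j - 1 - mu(r)) / sigma(r)), with mu(r) linear in r plus a jump at the cutoff and sigma(r) the square root of an exponentiated linear predictor. The truncation of the index to -50..50, the function names, and the parameter values are illustrative assumptions.

```python
import math

def norm_cdf(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def weight_params(r, beta, lam, r0):
    """mu(r) and sigma(r), each with a discontinuity term at the cutoff r0."""
    d = 1.0 if r >= r0 else 0.0
    mu = beta[0] + beta[1] * r + beta[2] * d
    sigma = math.sqrt(math.exp(lam[0] + lam[1] * r + lam[2] * d))
    return mu, sigma

def ipmw_weights(mu, sigma, j_min=-50, j_max=50):
    """Probit-difference weights over a truncated index range (an approximation)."""
    return {j: norm_cdf((j - mu) / sigma) - norm_cdf((j - 1 - mu) / sigma)
            for j in range(j_min, j_max + 1)}

def mixture_density(y, weights, means, variances):
    """f(y | r) as a weighted sum of normal densities, one per index j."""
    return sum(w * math.exp(-0.5 * (y - means[j]) ** 2 / variances[j])
               / math.sqrt(2.0 * math.pi * variances[j])
               for j, w in weights.items())

# Number of components carrying non-negligible weight, for several sigma(r) values.
active = {s: sum(w > 0.01 for w in ipmw_weights(1.0, s).values())
          for s in (0.05, 0.5, 1.0, 2.0)}
```

A small sigma(r) concentrates the weight on very few components, while a larger sigma(r) spreads it over many, which is how sigma(r) governs the multimodality of f(y | r). A nonzero jump coefficient in either mu(r) or sigma(r) changes the weights at the cutoff and therefore the whole predictive density.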

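Posterior Predictive Inference
Fast MCMC sampling/estimation of the posterior $\Pi(\zeta \mid \text{Data}) \propto \prod_i f(y_i \mid r_i; \zeta)\,\pi(\zeta)$, with $\zeta = ((\mu_j, \sigma_j^2)_{j \in \mathbb{Z}}, \mu_\mu, \sigma_\mu^2, \beta_\sigma, \beta, \lambda)$ (Karabatsos & Walker, 2012).
Inference focuses on the posterior predictive density: $f_n(y \mid r, t) = \int f(y \mid r, t; \zeta)\,d\Pi(\zeta \mid \text{Data})$.
Sharp RD design, causal effect estimate: $E_n(h\{Y\} \mid r^+) - E_n(h\{Y\} \mid r^-)$.
Fuzzy RD design, causal effect estimate: $\{E_n(h\{Y\} \mid r^+) - E_n(h\{Y\} \mid r^-)\} \,/\, \{E_n(T \mid r^+) - E_n(T \mid r^-)\}$.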
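Once samples from the posterior predictive density are available at the two settings r+ and r-, the sharp RD effect on any functional is just a difference of summaries. The sketch below assumes such samples have already been drawn (the sampler is not shown); `rd_functional_effects` and the toy arrays are hypothetical names and data.

```python
import numpy as np

def rd_functional_effects(y_plus, y_minus, quantiles=(0.1, 0.5, 0.9)):
    """Differences of predictive functionals between the r+ and r- settings.

    y_plus, y_minus: samples assumed to come from the posterior predictive
    densities at r+ (treatment) and r- (non-treatment), respectively.
    """
    y_plus = np.asarray(y_plus, dtype=float)
    y_minus = np.asarray(y_minus, dtype=float)
    effects = {
        "mean": y_plus.mean() - y_minus.mean(),
        "variance": y_plus.var() - y_minus.var(),
    }
    for q in quantiles:
        key = f"q{int(round(100 * q))}"
        effects[key] = np.quantile(y_plus, q) - np.quantile(y_minus, q)
    return effects

# Toy predictive samples (hypothetical): treatment shifts the outcome by 2.
y_minus = np.arange(11.0)
y_plus = y_minus + 2.0
effects = rd_functional_effects(y_plus, y_minus)
```

Because every functional is computed from the same pair of predictive distributions, the quantile effects come from proper distribution functions and cannot cross.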

Two Data Applications of the IPMW Model for Causal Inference
Both data sets involve sharp RD designs.
Prior parameter specification: $b_{\sigma\mu} = 5$, for $\sigma_\mu \sim U(0, b_{\sigma\mu})$. Same priors for all other model parameters, as before.
40K samples were retained from 200K MCMC samples, after a 2K burn-in.
For parameters of interest:
    - Trace plots showed good mixing of model parameters.
    - The half-widths of the 95% Monte Carlo confidence intervals were sufficiently small (near .00), according to the sub-sampling batch method (Flegal & Jones, 2011).
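One common way to obtain a Monte Carlo confidence interval half-width for a posterior mean is the non-overlapping batch means estimator. The sketch below is that plain version, offered as an assumption about the general idea rather than the exact procedure used for these analyses; the function name, the number of batches, and the normal critical value 1.96 are illustrative choices.

```python
import numpy as np

def mc_halfwidth(chain, n_batches=30, z=1.96):
    """Half-width of an approximate 95% Monte Carlo CI for the chain mean.

    Uses non-overlapping batch means: the Monte Carlo standard error is the
    standard deviation of the batch means divided by sqrt(n_batches).
    """
    chain = np.asarray(chain, dtype=float)
    size = len(chain) // n_batches                     # batch length
    batches = chain[: size * n_batches].reshape(n_batches, size).mean(axis=1)
    mcse = batches.std(ddof=1) / np.sqrt(n_batches)
    return z * mcse

# Toy chains (hypothetical): a constant chain, and one with two distinct halves.
hw_constant = mc_halfwidth(np.ones(1000))
hw_two_level = mc_halfwidth(np.r_[np.zeros(50), 2.0 * np.ones(50)], n_batches=2)
```

A half-width near zero indicates that running the sampler longer would change the reported posterior mean very little.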

IPMW Data Application #1
A new teacher education curriculum, CTPP (Chicago Teacher Pipeline Partnership), was implemented at one of the four Chicago schools of education, starting in Fall 2010.
Data on n = 347 undergraduate math teaching candidates (90% female) who had just completed a course on how to teach algebra. Pre-CTPP and post-CTPP data (Fall 2007 - Spring 2013).
Dependent variable: Z-score, learning to teach Math assessment.
Covariates:
    TimeF10 = (Year - 2010.9)/10 [2010.9 is the Fall 2010 cutoff]
    CTPP = 1({Year - 2010.9} > 0) [treatment assignment indicator]
IPMW model results: Standardized residuals ranged from -.8 to .8. R-squared = .92. The posterior distributions of the $\beta$ and $\lambda$ slopes for CTPP each concentrate around zero.

IPMW Data Application #1
[Figure: posterior predictive p.d.f. (with 95% bands) of Z_posttest at TimeF10 = 0, for CTPP = 0 (blue) vs. CTPP = 1 (red).]
The new curriculum, compared to the old curriculum, increased the LMT scores, in terms of shifting the density of LMT scores to the right. This shift corresponds to an increase in the mean (from .17 to .20), the 10%ile (-1.43 to -1.35), and the median (.07 to .15), and to a decrease in the variance (1.78 to 1.69).

IPMW Data Application #2
Is there a causal link between basic skills and teaching ability? (Gitomer et al., 2011, J Teacher Education).
Data on n = 205 undergraduate teaching candidates, under CTPP.
Dependent variable: Haberman Z-score on urban teaching ability (persistence; organization & planning; values student learning; theory to practice; at-risk students; approach to students; survive in bureaucracy; explains teacher success; explains student success; fallibility).
Covariates:
    B240d10 = (min[Reading, Language, Math, Write] - 240) / 10
    BasicPass = 1({B240d10 - 240} > 0) = 1(Pass Reading Test)
IPMW model results: Standardized residuals ranged from -1.3 to 1.2. R-squared = .99. For BasicPass, the posterior mean (s.d.) estimate of the $\beta$ slope is 1.49 (1.54), and the posterior mean (s.d.) estimate of the $\lambda$ slope is -.04 (.49).

IPMW Data Application #2
[Figure: posterior predictive p.d.f. (with 95% bands) of Z_haberman at B240d10 = 0, for BasicPass = 0 (blue) vs. BasicPass = 1 (red). Annotation: four clusters of students.]
A detailed inspection revealed that passing the basic skills reading test causally increased the Haberman z-score, in terms of the mean (from .31 to .45), 25%ile (-.65 to -.62), 75%ile (1.30 to 1.43), and 95%ile (2.36 to 2.82).

Conclusions
    - We proposed a Bayesian nonparametric regression model for RD designs.
    - The model provides a way to estimate the causal effect of a treatment (versus non-treatment), in terms of the treatment's regression discontinuity effect on the entire density of the outcome variable.
    - Through the analyses of real data, we showed how the model can provide a causal analysis of how a treatment variable impacts the full distribution of the outcomes, including the mean, variance, quantiles, p.d.f., and so forth.
    - The model can be easily extended for the analysis of discrete-valued or (left- and/or right-) censored outcomes.
    - Manuscript, "A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs": http://arxiv.org/abs/1311.4482
    - User-friendly software has been developed for the model.


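Restricted DP (rdp) Mixture model
    $X_1, \ldots, X_n \mid \mathbf{r}, \rho_n, (\beta_j^*, \sigma_j^{2*})_{j=1}^{k_n} \sim \prod_{j=1}^{k_n} \prod_{i: s_i = j} \text{Normal}(x_i \mid \underline{r}_i^\top \beta_j^*, \sigma_j^{2*})$
    $\pi(\rho_n \mid \mathbf{r}) \propto \frac{\alpha^{k_n}}{k_n!} \prod_{j=1}^{k_n} \frac{1}{n_j}\, 1(s_{r(1)} \leq \cdots \leq s_{r(n)})$
    $(\beta_j^*, \sigma_j^{2*}) \sim \text{Normal}(\beta_j^* \mid \beta_0, \sigma_j^{2*} C^{-1})\,\text{InverseGamma}(\sigma_j^{2*} \mid a, b)$
where:
    - X is the observed pre-treatment covariate (or prognostic/propensity score);
    - $\underline{r}_i = (1, r_i)^\top$ are vectors of the assignment variables (i = 1, ..., n);
    - $(\beta_j^*, \sigma_j^{2*})_{j=1}^{k_n}$ are the $k_n \leq n$ distinct values of the parameters $(\beta_1, \sigma_1^2), \ldots, (\beta_n, \sigma_n^2)$ that are assigned to each of the n subjects, with $k_n$ random;
    - $\rho_n = (s_1, \ldots, s_n)$ is the random partition of the n observations, with $s_i = j$ if $(\beta_i, \sigma_i^2) = (\beta_j^*, \sigma_j^{2*})$, and $n_j = \sum_i 1\{(\beta_i, \sigma_i^2) = (\beta_j^*, \sigma_j^{2*})\}$;
    - $r(1), \ldots, r(n)$ is the permutation of the first n integers that rearranges $(r_1, \ldots, r_n)$ in increasing order, as $r_{r(1)} \leq \cdots \leq r_{r(n)}$, with corresponding values $x_{r(1)}, \ldots, x_{r(n)}$ of x and $s_{r(1)}, \ldots, s_{r(n)}$ of $s_1, \ldots, s_n$.
The rdp has precision $\alpha$ and a Normal-InvGamma baseline distribution.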

Posterior of Random Partition
    $\pi(\rho_n \mid x, \mathbf{r}) \propto \frac{\alpha^{k_n}}{k_n!} \prod_{j=1}^{k_n} \frac{1}{n_j} \frac{|C|^{1/2}}{|C + R_j^\top R_j|^{1/2}} \frac{b^a\,\Gamma(a + n_j/2)}{\Gamma(a)\,(b + V_j^2/2)^{a + n_j/2}}\; 1(s_{r(1)} \leq \cdots \leq s_{r(n)})$
    $V_j^2 = (x_j - \hat{x}_j)^\top W_j (x_j - \hat{x}_j), \quad W_j = I_{n_j} - R_j (C + R_j^\top R_j)^{-1} R_j^\top, \quad \hat{x}_j = R_j \beta_0$
Here $x_j$ is the vector of $x_i$, and $R_j$ is the matrix of $\underline{r}_i = (1, r_i)$, for subjects in cluster j.
The posterior is sampled by an RJ-MCMC algorithm, which either splits or merges a randomly selected cluster.
A causal inference strategy for sharp RD:
    1. Identify the subject $i = i_0$ with observed $r_i$ nearest to the cutoff $r_0$.
    2. For each draw of the partition $\rho_n$ from its posterior $\pi(\rho_n \mid x, \mathbf{r})$, find the cluster where that subject is located.
    3. Within that cluster, use a two-sample test statistic to compare the outcomes ($y_i$) for treatment subjects (having $r_i > r_0$) and the outcomes for non-treatment subjects (having $r_i < r_0$).
    4. Average the two-sample statistics over a large number of RJ-MCMC draws.
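The averaging strategy for sharp RD can be sketched as follows, assuming the partition draws are already available as arrays of cluster labels (the RJ-MCMC sampler itself is not shown). The function name, the rule of skipping draws whose cutoff cluster has no subjects on one side, and the toy data are assumptions made for illustration.

```python
import numpy as np

def cutoff_cluster_statistic(y, r, r0, partitions, stat):
    """Average a two-sample statistic over partition draws.

    For each draw, the statistic compares treatment (r > r0) and non-treatment
    (r < r0) outcomes inside the cluster containing the subject nearest to r0.
    Draws whose cutoff cluster lacks one of the two groups are skipped (an assumption).
    """
    y, r = np.asarray(y, dtype=float), np.asarray(r, dtype=float)
    i0 = int(np.argmin(np.abs(r - r0)))          # subject nearest to the cutoff
    values = []
    for labels in partitions:
        labels = np.asarray(labels)
        in_cluster = labels == labels[i0]        # members of the cutoff cluster
        treated = y[in_cluster & (r > r0)]
        control = y[in_cluster & (r < r0)]
        if len(treated) > 0 and len(control) > 0:
            values.append(stat(treated, control))
    return float(np.mean(values)) if values else float("nan")

# Toy data and two hypothetical partition draws (labels respect the ordering of r).
r = np.array([0.10, 0.40, 0.56, 0.65, 0.80, 0.90])
y = np.array([0.0, 1.0, 2.0, 5.0, 6.0, 7.0])
draws = [[1, 1, 1, 1, 1, 1], [1, 2, 2, 2, 2, 3]]

mean_difference = cutoff_cluster_statistic(
    y, r, 0.6, draws, stat=lambda a, b: a.mean() - b.mean())
```

Any two-sample statistic (a t-statistic, a variance ratio, a difference of quantiles) can be passed as `stat`, and collecting the per-draw values instead of their mean would also give posterior credible intervals.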

Basic skills example (again). rdp: $\alpha$ = 1, vague N-IG baseline prior. Entries are posterior means (95% posterior credible intervals) of various test statistics, comparing treatment outcomes ($y_i$) vs. non-treatment outcomes, for the cluster of subjects around the cutoff $r_0$.

Statistic              Non-Treatment           Treatment
sample size            103.1 (3, 190)          6.7 (2, 16)
mean                   .37 (.07, 1.55)         1.23 (.97, 1.59)
variance               .76 (.01, 1.04)         .47 (.01, 0.85)
interquartile range    1.17 (.18, 1.71)        .90 (.24, 1.41)
skewness               .11 (-1.26, .71)        .03 (-.63, 0.82)
kurtosis               2.69 (1.45, 3.41)       2.06 (1.00, 3.06)
1%ile                  -1.34 (-2.20, 1.47)     .27 (-0.65, 1.47)
10%ile                 -.77 (-1.35, 1.47)      .42 (.01, 1.47)
25%ile                 .20 (-0.65, 1.47)       .74 (.30, 1.47)
50%ile                 .34 (.18, 1.47)         1.22 (1.00, 1.59)
75%ile                 .98 (.53, 1.65)         1.65 (1.47, 2.06)
90%ile                 1.43 (1.24, 1.71)       2.14 (1.71, 2.77)
99%ile                 2.04 (1.71, 2.42)       2.28 (1.71, 2.89)

Two-sample comparison  Statistic               p-value
t-statistic            -2.02 (-4.21, .88)      .19 (.00, .91)
F test, variance       4.86 (.02, 34.45)       .65 (.05, .98)
Pr(Y1 > Y0 | C_r0)     0.70 (.21, .93)
Pr(Y1 < Y0 | C_r0)     0.22 (.04, .67)
KS test                .28 (.05, .98)

References
Bloom, H. (2012). Modern regression discontinuity analysis. Journal of Research on Educational Effectiveness, 5, 43-82.
Cattaneo, M., Frandsen, B., and Titiunik, R. (2013). Randomization Inference in the Regression Discontinuity Design: An Application to the Study of Party Advantages in the U.S. Senate. University of Michigan. February 19th. Unpublished manuscript.
Cook, T. (2008). Waiting for life to arrive: A history of the regression discontinuity design in psychology, statistics and economics. Journal of Econometrics, 142, 636-654.
Dawid, A. (2000). Causal inference without counterfactuals. Journal of the American Statistical Association, 95, 407-424.
Dawid, A. (2002). Influence diagrams for causal modelling and inference. International Statistical Review, 70, 161-189.

Dawid, A. (2010). Beware of the DAG! Journal of Machine Learning Research-Proceedings Track, 6, 59-86.
Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. London: Chapman and Hall/CRC.
Flegal, J.M., and Jones, G.L. (2011). Implementing Markov chain Monte Carlo: Estimating with confidence. In S.P. Brooks, A.E. Gelman, G.L. Jones, and X.L. Meng (Eds.), Handbook of Markov Chain Monte Carlo, pp. 175-197. Boca Raton, FL: CRC Press.
Frandsen, B., Frölich, M., and Melly, B. (2012). Quantile treatment effects in the regression discontinuity design. Journal of Econometrics, 168, 382-395.
Gitomer, D.H., Brown, T.L., and Bonett, J. (2011). Useful signal or unnecessary obstacle? The role of basic skills tests in teacher preparation. Journal of Teacher Education, 62, 431-445.

Hahn, J., Todd, P., and Van der Klaauw, W. (2001). Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica, 69, 201-209.
Imbens, G. and Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79, 933-959.
Imbens, G. W. and Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142, 615-635.
Kalli, M., Griffin, J., and Walker, S.G. (2010). Slice Sampling Mixture Models. Statistics and Computing, 21, 93-105.
Karabatsos, G. and Walker, S. (2012). Adaptive-modal Bayesian nonparametric regression. Electronic Journal of Statistics, 6, 2038-2068.
Lee, D. (2008). Randomized experiments from non-random selection in U.S. House elections. Journal of Econometrics, 142, 675-697.

Lee, D. and Lemieux, T. (2010). Regression Discontinuity Designs in Economics. The Journal of Economic Literature, 48, 281-355.
Rubin, D.B. (2008). For objective causal inference, design trumps analysis. The Annals of Applied Statistics, 2, 808-840.
Thistlethwaite, D. and Campbell, D. (1960). Regression-discontinuity analysis: An alternative to the ex-post facto experiment. Journal of Educational Psychology, 51, 309-317.
Wade, S., Walker, S.G., and Petrone, S. (2013, to appear). A Predictive Study of Dirichlet Process Mixture Models for Curve Fitting. Scandinavian Journal of Statistics.
Wong, V., Steiner, P., and Cook, T. (2013). Analyzing Regression-Discontinuity Designs With Multiple Assignment Variables: A Comparative Study of Four Estimation Methods. Journal of Educational and Behavioral Statistics, 38, 107-141.