Some challenges and results for causal and statistical inference with social network data

slide 1: Some challenges and results for causal and statistical inference with social network data
Elizabeth L. Ogburn
Department of Biostatistics, Johns Hopkins University
May 10, 2013

slide 2
Network data evince complex forms of dependence among observations, making statistical inference difficult.
Two challenges for causal inference using data sampled from a single social network:
  nonparametric identification of causal effects with interference,
  valid statistical inference under network dependence.

slide 3
Computer scientists, physicists, and mathematicians have been researching networks for decades:
  topology, diffusion properties, generation, ...
  very little statistics for outcomes on network nodes.
Statisticians have been researching dependent data for decades:
  time series
  spatial data
  interference
  surprisingly little of this work is directly applicable to networks.

slide 4: outline
Introduction and background.
Motivating example: HopeNet.
  Design and aims.
Briefly describe three distinct types of interference.
Inference in the presence of network dependence.
  Explain why results for spatial dependence are not immediately applicable.
  Point towards some possible solutions to the problem of statistical inference in networks.

slide 5: background
Interference requires a new framework and new tools for causal inference. (Cf. Rubin, 1990; Halloran & Struchiner, 1995; Sobel, 2006; Hong & Raudenbush, 2006; Vansteelandt, 2007; Rosenbaum, 2007; Hudgens & Halloran, 2008; Graham et al., 2010; Manski, 2010; Tchetgen Tchetgen & VanderWeele, 2012; Aronow & Samii, 2012; van der Laan, 2012)
Existing methods for causal inference in the presence of interference have limitations in the social network context.
  Generally assume that treatment is randomized.
  Generally assume multiple independent groups are observed.

slide 6: background
Recent availability of social network data has inspired methods and analyses, but with caveats:
  GEEs and GLMs have been used extensively but are usually inappropriate for network dependence. (Cf. Lyons, 2011; Shalizi & Thomas, 2011; VanderWeele, Ogburn & Tchetgen Tchetgen, 2012; Christakis & Fowler, 2012; Shalizi, 2012)
  Spatial autoregressive (SAR) models parameterize an equilibrium state and generally lack causal interpretations.
  Recent work by van der Laan (2012) harnesses independence assumptions that are reasonable if dependence is due to contagion and network evolution is observed over time; other recent work relies on randomization-based inference. (Cf. Bowers et al., 2013; Lunagomez & Airoldi, 2014; Rosenbaum, 2007)

slide 7: HopeNet Study
Health outcomes, progressive entrepreneurship, and Networks
Nyakabare Parish, Mbarara Province, SW Uganda
8 villages, 2000 adults

slide 8: HopeNet Study
Complete social network census for the adult population of the parish.
Clean water intervention.
Microenterprise intervention.

slide 9: effects of interest
What types of effects are we interested in?
Under what conditions are these effects nonparametrically identifiable? Ogburn & VanderWeele (2012)
Under what conditions can we perform inference about them?

slide 10: definitions
Stationarity: features of the distribution of observations do not depend on location in the network.
Distance $\|i,j\|$ between nodes $i$ and $j$: length of the shortest path connecting $i$ and $j$.
  Other definitions are possible, e.g. taking into account the number of paths or the average path length between $i$ and $j$.
M-dependence: $W_i \perp W_j$ if $\|i,j\| > m$.
Mixing conditions: $\mathrm{Cov}(W_i, W_j) \to 0$ as $\|i,j\| \to \infty$.
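The shortest-path distance and the m-dependence condition defined above can be illustrated with a small sketch. This is only an illustration under assumptions: the adjacency-dictionary representation, the function names `shortest_path_distances` and `pairs_beyond`, and the toy path network are all hypothetical and do not come from the talk.

```python
from collections import deque


def shortest_path_distances(adjacency, source):
    """Return {node: hop distance from source} for every node reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def pairs_beyond(adjacency, m):
    """List unordered node pairs whose distance exceeds m.

    Under m-dependence these are the pairs treated as independent.
    Unreachable pairs are counted as infinitely far apart.
    """
    nodes = sorted(adjacency)
    far = []
    for i in nodes:
        dist = shortest_path_distances(adjacency, i)
        for j in nodes:
            if i < j and dist.get(j, float("inf")) > m:
                far.append((i, j))
    return far


# Hypothetical toy network: the path 0 - 1 - 2 - 3 - 4.
toy = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

On this toy path, only the pairs more than two hops apart would be treated as independent under 2-dependence, which shows how few such pairs a small, well-connected network may offer.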

slide 11: inference with network dependence
Sources of network dependence.
Limitations of methods for spatial dependence.
Some methods for statistical inference in networks:
  local dependence
  weakly dependent clusters
  subsampling
  k-dependence

slide 12: sources of network dependence
Latent variables cause outcomes among close social contacts to be more correlated than among distant contacts. (E.g. homophily, geography, shared culture.)

slide 13: sources of network dependence
Contagion implies information barrier structures, e.g. $[Y_1^t \perp Y_2^t \mid Y_1^{t-2}, Y_2^{t-2}, Y_1^{t-1}, \text{ and } Y_2^{t-1}]$ and $[Y_1^{t-2} \perp Y_3^{t-1}]$.
  When a network is observed at a single time point, this will resemble latent variable dependence.
  If the network is observed before information has had the opportunity to diffuse, m-dependence would be reasonable.

slide 14: Why can't we use spatial dependence results?
Network topology doesn't naturally correspond to Euclidean space.
  In order to embed a network in $\mathbb{R}^d$, we would have to let $d$ grow with sample size $n$.
  Spatial results require $d$ to be fixed or to grow slowly with $n$.
Population growth is usually assumed to occur at the boundaries of the $d$-dimensional space.
  It's not clear how to define boundaries in networks (nor how to define population growth).
Mixing assumptions and m-dependence don't imply bounded correlation structure.
  In spatial data most observations are distant from one another.
  The maximum network-based distance between two observations may be very small.
  The distance distribution may not be right-skewed enough.

slide 15: inference with network dependence
Focusing for now on a traditional frequentist inferential framework, point estimates are generally unbiased; the challenge is for s.e. and interval estimation.
Estimates of s.e. may be unbiased but not consistent.

slide 16: inference with network dependence
For simplicity, suppose our target parameter is a population mean, $\mu$.
The sample mean, $\bar W = \frac{1}{n}\sum_{i=1}^{n} W_i$, is unbiased for $\mu$.
The problem is consistent estimation of the asymptotic variance $\mathrm{Avar}(\bar W)$.
  Agnostic about sources of dependence (contagion vs. latent variables).
  Population growth must preserve key features of the network and data.
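As a reminder of why dependence complicates variance estimation for the sample mean, the standard expansion of its variance may help. This is a textbook identity added for illustration, not a formula taken from the slides:

```latex
\[
\mathrm{Var}(\bar W)
  = \frac{1}{n^{2}} \sum_{i=1}^{n} \mathrm{Var}(W_i)
  + \frac{1}{n^{2}} \sum_{i \neq j} \mathrm{Cov}(W_i, W_j).
\]
```

Under independence the second sum vanishes. Under network dependence it does not, and how many of its terms are nonzero depends on the network's structure.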

slide 17: local dependence
Local dependence: for each observation $W_i$, there is a set of indices $I_i$ such that $W_i \perp W_j$ for $j \in I_i^c$.
  M-dependence is a special case.
Using Stein's method, it is easy to show that CLTs hold for locally dependent data, under restrictions on the size of $I_i$. (Cf. Chen, 1978; Barbour, Karonski & Rucinski, 1989; Rinott & Rotar, 1996; Raic, 2002; Chen & Shao, 2005)
What types of networks are consistent with the restrictions on $I_i$?

slide 18: weakly dependent clusters
If K clusters are asymptotically mean independent from one another, there are two approaches we might consider:
1. t-distribution-based confidence intervals (Ibragimov & Müller, 2010; Bester, Conley & Hansen, 2011). Requires asymptotic normality and mean stationarity at the cluster level.
2. Bootstrap the weakly dependent communities. Stationarity is required only at the cluster level.
In the spatial dependence literature, mean independence is justified with conditions on the relative size of the boundaries and interiors of the clusters; growth in d dimensions uniformly.
These conditions don't translate into the network setting...
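The first approach can be sketched roughly as follows: treat the K cluster means as the observations and form a t interval with K - 1 degrees of freedom. This is a minimal sketch under assumptions, not the cited authors' implementation. The function name `cluster_t_interval` and the toy data are hypothetical, and the t critical value is supplied by the caller so that the sketch needs only the standard library.

```python
import math
import statistics


def cluster_t_interval(cluster_values, t_crit):
    """t-based confidence interval for a mean, using the K cluster means as data.

    cluster_values: list of K lists of outcomes, one list per cluster.
    t_crit: critical value of the t distribution with K - 1 degrees of freedom,
            chosen by the caller (e.g. 2.776 for K = 5 and a 95% interval).
    """
    means = [statistics.fmean(values) for values in cluster_values]
    k = len(means)
    center = statistics.fmean(means)
    # Standard error of the average of the K cluster means.
    se = statistics.stdev(means) / math.sqrt(k)
    return center - t_crit * se, center + t_crit * se


# Hypothetical data: five clusters of outcomes.
clusters = [[1.0, 2.0, 3.0], [2.0, 4.0], [3.0, 3.0, 3.0], [4.0, 6.0], [5.0]]
low, high = cluster_t_interval(clusters, t_crit=2.776)
```

The interval is only as credible as the assumption that the clusters are approximately independent of one another, which is exactly the condition the slide flags as hard to justify in networks.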

slide 19: subsampling
Subsampling has been used in many spatial dependence contexts (cf. Lahiri, 2003; Politis, Romano & Wolf, 1999), but neither the implementation nor the conditions under which it is appropriate are immediately applicable to networks.
Under mild stationarity and dependence conditions, we can subsample to estimate $\mathrm{Avar}(\bar W)$:
1. Select $B$ subsamples of consecutive observations.
2. In each subsample, calculate the subsample variance estimator $\hat\sigma^2_b$.
3. Estimate $\mathrm{Avar}(\bar W)$ with the average of the subsample estimators: $\hat\sigma^2_{\bar W} = \frac{1}{B}\sum_{b=1}^{B} \hat\sigma^2_b$.

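slide 20: subsampling
This is reasonable if, as $n_b \to \infty$ and $n \to \infty$,
1. $\hat\sigma^2_b$ is asymptotically unbiased for $\mathrm{Avar}(\bar W)$. This will hold if key features of the network and the mean and variance of $W$ are stationary over groups smaller than the subsample sizes.
2. $\mathrm{Var}(\hat\sigma^2_{\bar W}) \to 0$, where
$$
\begin{aligned}
\mathrm{Var}\left(\hat\sigma^2_{\bar W}\right)
  &= \frac{1}{B^2}\sum_{b=1}^{B} \mathrm{Var}\left(\hat\sigma^2_b\right) && (1) \\
  &\quad + \frac{1}{B^2}\sum_{\|I_b, I_d\| \le m} 2\,\mathrm{Cov}\left(\hat\sigma^2_b, \hat\sigma^2_d\right) && (2) \\
  &\quad + \frac{1}{B^2}\sum_{\|I_b, I_d\| > m} 2\,\mathrm{Cov}\left(\hat\sigma^2_b, \hat\sigma^2_d\right) && (3)
\end{aligned}
$$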
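The three-step subsampling procedure might be sketched as below. The slides do not say which per-subsample estimator is used or how subsamples are formed in a network, so both are assumptions here: the sketch takes the subsamples as given and uses the scaled squared deviation of each subsample mean from the overall mean, one common choice in the subsampling literature. The function name `subsample_avar` and the toy data are hypothetical.

```python
import statistics


def subsample_avar(values, subsamples):
    """Average of per-subsample estimators of Avar(W-bar).

    values: dict mapping node -> observed outcome W_i.
    subsamples: list of B lists of nodes (how they are selected is left to the caller).
    Per-subsample estimator (an assumption, not taken from the slides):
        size_b * (mean_b - overall_mean) ** 2
    """
    overall_mean = statistics.fmean(list(values.values()))
    estimates = []
    for nodes in subsamples:
        mean_b = statistics.fmean([values[node] for node in nodes])
        estimates.append(len(nodes) * (mean_b - overall_mean) ** 2)
    # Step 3: average the B subsample estimators.
    return statistics.fmean(estimates)


# Hypothetical toy data: four nodes split into two subsamples.
toy_values = {0: 1.0, 1: 3.0, 2: 5.0, 3: 7.0}
toy_subsamples = [[0, 1], [2, 3]]
avar_hat = subsample_avar(toy_values, toy_subsamples)
```

In a network, a natural but untested analogue of "consecutive observations" would be a connected neighborhood of nodes. Whether such subsamples satisfy the two conditions above is the open question the slide raises.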

slide 21: k-dependence
In some settings it may be expedient to estimate $\mathrm{Cov}(W)$ directly.
K-dependence: $\mathrm{Cov}(W_i, W_j) = \sigma_k$, where $k = \|i,j\|$.
Under k-dependence, m-dependence, and mean stationarity, we can get an unbiased and consistent estimate of $\mathrm{Cov}(W)$ by this procedure:
1. For each $k < m$, select pairs of nodes that are $k$ units apart, such that the pairs themselves are at least $m$ units apart from one another.
2. Estimate $\sigma_k$ with the average covariance across the selected pairs.
3. Estimate $\mathrm{Cov}(W)$ with the plug-in estimator.
This doesn't demand as much from m-dependence as other procedures do...
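Steps 1 and 2 of the procedure above could look roughly like this. It is a sketch under assumptions, not the author's code. Pairs are picked greedily in node order, "at least m units apart" is read as every member of a new pair being at distance at least m from every member of earlier pairs, and the covariance is centered at the overall sample mean, which leans on mean stationarity. The names `bfs_distances`, `select_pairs`, and `estimate_sigma_k` and the toy network are hypothetical.

```python
from collections import deque
import statistics


def bfs_distances(adjacency, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def select_pairs(adjacency, k, m):
    """Step 1: greedily pick pairs exactly k apart, each at least m from earlier picks."""
    dist = {node: bfs_distances(adjacency, node) for node in adjacency}
    inf = float("inf")
    chosen, used = [], []
    for i in sorted(adjacency):
        for j in sorted(adjacency):
            if i < j and dist[i].get(j, inf) == k:
                # Accept only if both nodes are far from every node already used.
                if all(dist[u].get(v, inf) >= m for u in used for v in (i, j)):
                    chosen.append((i, j))
                    used.extend((i, j))
    return chosen


def estimate_sigma_k(values, pairs):
    """Step 2: average cross-product of centered outcomes over the selected pairs.

    Raises statistics.StatisticsError if no pairs were selected.
    """
    mean = statistics.fmean(list(values.values()))
    return statistics.fmean([(values[i] - mean) * (values[j] - mean) for i, j in pairs])


# Hypothetical toy network: a path on 12 nodes with alternating 0/1 outcomes.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 12] for i in range(12)}
outcomes = {i: float(i % 2) for i in range(12)}
pairs_k1 = select_pairs(path, k=1, m=3)
sigma_1 = estimate_sigma_k(outcomes, pairs_k1)
```

For step 3, each estimated value would be placed in every entry of the covariance matrix whose node pair is k units apart. That step is omitted here to keep the sketch short.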

slide 22: other directions
Different types of asymptotics: combine infill and increasing domain asymptotics, fractal asymptotics.
Identify low-level conditions for CLTs and LLNs in network settings.
Learn a new, latent distance metric. (E.g. work by Adrian Raftery & colleagues)
Finite sample results using bounded influence: $\mathrm{Var}(W_i) \gg \sum_j \mathrm{Cov}(W_i, W_j)$.

slide 23: Thank you

slide 24: references
Rubin, D. B. (1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Comment: Neyman (1923) and causal inference in experiments and observational studies. Statistical Science 5.
Halloran, M. E. & Struchiner, C. J. (1995). Causal inference in infectious diseases. Epidemiology.
Sobel, M. E. (2006). What do randomized studies of housing mobility demonstrate? Journal of the American Statistical Association 101.
Hong, G. & Raudenbush, S. W. (2008). Causal inference for time-varying instructional treatments. Journal of Educational and Behavioral Statistics 33.
Vansteelandt, S. (2007). On confounding, prediction and efficiency in the analysis of longitudinal and cross-sectional clustered data. Scandinavian Journal of Statistics 34.
Rosenbaum, P. R. (2007). Interference between units in randomized experiments. Journal of the American Statistical Association 102.
Hudgens, M. G. & Halloran, M. E. (2008). Toward causal inference with interference. Journal of the American Statistical Association 103.

slide 25: references
Graham, B. S., Imbens, G. W. & Ridder, G. (2010). Measuring the effects of segregation in the presence of social spillovers: A nonparametric approach. Tech. rep., National Bureau of Economic Research.
Manski, C. F. (In press). Identification of treatment response with social interactions. The Econometrics Journal.
Tchetgen Tchetgen, E. J. & VanderWeele, T. J. (2012). On causal inference in the presence of interference. Statistical Methods in Medical Research 21.
Aronow, P. & Samii, C. (2012). Estimating Causal Effects Under General Interference. Working paper.
van der Laan, M. J. (2012). Causal Inference for Networks. U.C. Berkeley Division of Biostatistics Working Paper Series, Paper 300.
Lyons, R. (2011). The spread of evidence-poor medicine via flawed social-network analyses. Statistics, Politics and Policy 2(1), Article 2.
VanderWeele, T. J., Ogburn, E. L. & Tchetgen Tchetgen, E. J. (2012). Why and When "Flawed" Social Network Analyses Still Yield Valid Tests of no Contagion. Statistics, Politics, and Policy 3(1).

slide 26: references
Christakis, N. A. & Fowler, J. H. (2011). Social contagion theory: examining dynamic social networks and human behavior. Statistics in Medicine.
Shalizi, C. R. & Thomas, A. C. (2011). Homophily and contagion are generically confounded in observational social network studies. Sociological Methods and Research 40.
Shalizi, C. R. (2012). Comment on "Why and When 'Flawed' Social Network Analyses Still Yield Valid Tests of no Contagion". Statistics, Politics, and Policy 3(1).
Ogburn, E. L. & VanderWeele, T. J. (2012). Causal diagrams for interference and contagion. Under revision.
Chen, L. H. Y. (1978). Two central limit problems for dependent random variables. Probability Theory and Related Fields 43(3).
Barbour, A. D., Karoński, M. & Ruciński, A. (1989). A central limit theorem for decomposable random variables with applications to random graphs. Journal of Combinatorial Theory, Series B 47(2).
Rinott, Y. & Rotar, V. (1996). A multivariate CLT for local dependence with $n^{-1/2}\log n$ rate and applications to multivariate graph related statistics. Journal of Multivariate Analysis 56.

slide 27: references
Raic, M. (2003). Normal approximation by Stein's method. In Proceedings of the seventh young statisticians meeting.
Chen, L. H. & Shao, Q. M. (2005). Stein's method for normal approximation. An introduction to Stein's method 4.
Chen, L. H. & Shao, Q. M. (2004). Normal approximation under local dependence. The Annals of Probability 32(3).
Ibragimov, R. & Müller, U. K. (2010). t-statistic based correlation and heterogeneity robust inference. Journal of Business & Economic Statistics 28(4).
Bester, C. A., Conley, T. G. & Hansen, C. B. (2011). Inference with dependent data using cluster covariance estimators. Journal of Econometrics 165(2).
Lahiri, S. N. (2003). Resampling methods for dependent data. Springer.
Politis, D., Romano, J. P. & Wolf, M. (1999). Weak convergence of dependent empirical measures with application to subsampling in function spaces. Journal of Statistical Planning and Inference 79(2).

slide 28: sources of network dependence
1. Latent variables cause outcomes among close social contacts to be more correlated than among distant contacts. (E.g. homophily, geography, shared culture.)
  This is the type of dependence considered in the context of spatial data.
2. Direct interference implies densely overlapping clustering of exposure status (cf. Aronow & Samii, 2012).
  Outcomes are independent conditional on exposure, unless other forms of dependence are also present.

slide 29: sources of network dependence
3. Contagion implies an information barrier structure: $[Y_1^t \perp Y_2^t \mid Y_1^{t-2}, Y_2^{t-2}, Y_1^{t-1}, \text{ and } Y_2^{t-1}]$ and $[Y_1^{t-2} \perp Y_3^{t-1}]$.
  When a network is observed at a single time point, this will resemble latent variable dependence.
  If the network is observed before information has had the opportunity to diffuse, m-dependence would be reasonable.


More information

A Note on Adapting Propensity Score Matching and Selection Models to Choice Based Samples

A Note on Adapting Propensity Score Matching and Selection Models to Choice Based Samples DISCUSSION PAPER SERIES IZA DP No. 4304 A Note on Adapting Propensity Score Matching and Selection Models to Choice Based Samples James J. Heckman Petra E. Todd July 2009 Forschungsinstitut zur Zukunft

More information

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

CAUSAL INFERENCE FOR BINARY DATA WITH INTERFERENCE

CAUSAL INFERENCE FOR BINARY DATA WITH INTERFERENCE CAUSAL INFERENCE FOR BINARY DATA WITH INTERFERENCE Joseph Rigdon A dissertation submitted to the faculty at the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for

More information

Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded

Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded 1 Background Latent confounder is common in social and behavioral science in which most of cases the selection mechanism

More information

arxiv: v1 [stat.me] 15 May 2011

arxiv: v1 [stat.me] 15 May 2011 Working Paper Propensity Score Analysis with Matching Weights Liang Li, Ph.D. arxiv:1105.2917v1 [stat.me] 15 May 2011 Associate Staff of Biostatistics Department of Quantitative Health Sciences, Cleveland

More information

Economics Division University of Southampton Southampton SO17 1BJ, UK. Title Overlapping Sub-sampling and invariance to initial conditions

Economics Division University of Southampton Southampton SO17 1BJ, UK. Title Overlapping Sub-sampling and invariance to initial conditions Economics Division University of Southampton Southampton SO17 1BJ, UK Discussion Papers in Economics and Econometrics Title Overlapping Sub-sampling and invariance to initial conditions By Maria Kyriacou

More information

multilevel modeling: concepts, applications and interpretations

multilevel modeling: concepts, applications and interpretations multilevel modeling: concepts, applications and interpretations lynne c. messer 27 october 2010 warning social and reproductive / perinatal epidemiologist concepts why context matters multilevel models

More information

Assess Assumptions and Sensitivity Analysis. Fan Li March 26, 2014

Assess Assumptions and Sensitivity Analysis. Fan Li March 26, 2014 Assess Assumptions and Sensitivity Analysis Fan Li March 26, 2014 Two Key Assumptions 1. Overlap: 0

More information

Bootstrapping Sensitivity Analysis

Bootstrapping Sensitivity Analysis Bootstrapping Sensitivity Analysis Qingyuan Zhao Department of Statistics, The Wharton School University of Pennsylvania May 23, 2018 @ ACIC Based on: Qingyuan Zhao, Dylan S. Small, and Bhaswar B. Bhattacharya.

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2012 Paper 300 Causal Inference for Networks Mark J. van der Laan Division of Biostatistics, School

More information

Granger Mediation Analysis of Functional Magnetic Resonance Imaging Time Series

Granger Mediation Analysis of Functional Magnetic Resonance Imaging Time Series Granger Mediation Analysis of Functional Magnetic Resonance Imaging Time Series Yi Zhao and Xi Luo Department of Biostatistics Brown University June 8, 2017 Overview 1 Introduction 2 Model and Method 3

More information

Randomization-Based Inference With Complex Data Need Not Be Complex!

Randomization-Based Inference With Complex Data Need Not Be Complex! Randomization-Based Inference With Complex Data Need Not Be Complex! JITAIs JITAIs Susan Murphy 07.18.17 HeartSteps JITAI JITAIs Sequential Decision Making Use data to inform science and construct decision

More information

A Primer on Asymptotics

A Primer on Asymptotics A Primer on Asymptotics Eric Zivot Department of Economics University of Washington September 30, 2003 Revised: October 7, 2009 Introduction The two main concepts in asymptotic theory covered in these

More information

Econ 2148, fall 2017 Instrumental variables I, origins and binary treatment case

Econ 2148, fall 2017 Instrumental variables I, origins and binary treatment case Econ 2148, fall 2017 Instrumental variables I, origins and binary treatment case Maximilian Kasy Department of Economics, Harvard University 1 / 40 Agenda instrumental variables part I Origins of instrumental

More information

SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION

SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION Johns Hopkins University, Dept. of Biostatistics Working Papers 3-3-2011 SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION Michael Rosenblum Johns Hopkins Bloomberg

More information

Combining Difference-in-difference and Matching for Panel Data Analysis

Combining Difference-in-difference and Matching for Panel Data Analysis Combining Difference-in-difference and Matching for Panel Data Analysis Weihua An Departments of Sociology and Statistics Indiana University July 28, 2016 1 / 15 Research Interests Network Analysis Social

More information

The Essential Role of Pair Matching in. Cluster-Randomized Experiments. with Application to the Mexican Universal Health Insurance Evaluation

The Essential Role of Pair Matching in. Cluster-Randomized Experiments. with Application to the Mexican Universal Health Insurance Evaluation The Essential Role of Pair Matching in Cluster-Randomized Experiments, with Application to the Mexican Universal Health Insurance Evaluation Kosuke Imai Princeton University Gary King Clayton Nall Harvard

More information

Casual Mediation Analysis

Casual Mediation Analysis Casual Mediation Analysis Tyler J. VanderWeele, Ph.D. Upcoming Seminar: April 21-22, 2017, Philadelphia, Pennsylvania OXFORD UNIVERSITY PRESS Explanation in Causal Inference Methods for Mediation and Interaction

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University

More information

Technical Track Session I: Causal Inference

Technical Track Session I: Causal Inference Impact Evaluation Technical Track Session I: Causal Inference Human Development Human Network Development Network Middle East and North Africa Region World Bank Institute Spanish Impact Evaluation Fund

More information

CompSci Understanding Data: Theory and Applications

CompSci Understanding Data: Theory and Applications CompSci 590.6 Understanding Data: Theory and Applications Lecture 17 Causality in Statistics Instructor: Sudeepa Roy Email: sudeepa@cs.duke.edu Fall 2015 1 Today s Reading Rubin Journal of the American

More information

Massive Experiments and Observational Studies: A Linearithmic Algorithm for Blocking/Matching/Clustering

Massive Experiments and Observational Studies: A Linearithmic Algorithm for Blocking/Matching/Clustering Massive Experiments and Observational Studies: A Linearithmic Algorithm for Blocking/Matching/Clustering Jasjeet S. Sekhon UC Berkeley June 21, 2016 Jasjeet S. Sekhon (UC Berkeley) Methods for Massive

More information

Implementing Matching Estimators for. Average Treatment Effects in STATA

Implementing Matching Estimators for. Average Treatment Effects in STATA Implementing Matching Estimators for Average Treatment Effects in STATA Guido W. Imbens - Harvard University West Coast Stata Users Group meeting, Los Angeles October 26th, 2007 General Motivation Estimation

More information

Inference in VARs with Conditional Heteroskedasticity of Unknown Form

Inference in VARs with Conditional Heteroskedasticity of Unknown Form Inference in VARs with Conditional Heteroskedasticity of Unknown Form Ralf Brüggemann a Carsten Jentsch b Carsten Trenkler c University of Konstanz University of Mannheim University of Mannheim IAB Nuremberg

More information

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3 University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.

More information

What s New in Econometrics? Lecture 14 Quantile Methods

What s New in Econometrics? Lecture 14 Quantile Methods What s New in Econometrics? Lecture 14 Quantile Methods Jeff Wooldridge NBER Summer Institute, 2007 1. Reminders About Means, Medians, and Quantiles 2. Some Useful Asymptotic Results 3. Quantile Regression

More information

ESTIMATION OF TREATMENT EFFECTS VIA MATCHING

ESTIMATION OF TREATMENT EFFECTS VIA MATCHING ESTIMATION OF TREATMENT EFFECTS VIA MATCHING AAEC 56 INSTRUCTOR: KLAUS MOELTNER Textbooks: R scripts: Wooldridge (00), Ch.; Greene (0), Ch.9; Angrist and Pischke (00), Ch. 3 mod5s3 General Approach The

More information

NBER WORKING PAPER SERIES A NOTE ON ADAPTING PROPENSITY SCORE MATCHING AND SELECTION MODELS TO CHOICE BASED SAMPLES. James J. Heckman Petra E.

NBER WORKING PAPER SERIES A NOTE ON ADAPTING PROPENSITY SCORE MATCHING AND SELECTION MODELS TO CHOICE BASED SAMPLES. James J. Heckman Petra E. NBER WORKING PAPER SERIES A NOTE ON ADAPTING PROPENSITY SCORE MATCHING AND SELECTION MODELS TO CHOICE BASED SAMPLES James J. Heckman Petra E. Todd Working Paper 15179 http://www.nber.org/papers/w15179

More information

A Course in Applied Econometrics. Lecture 10. Partial Identification. Outline. 1. Introduction. 2. Example I: Missing Data

A Course in Applied Econometrics. Lecture 10. Partial Identification. Outline. 1. Introduction. 2. Example I: Missing Data Outline A Course in Applied Econometrics Lecture 10 1. Introduction 2. Example I: Missing Data Partial Identification 3. Example II: Returns to Schooling 4. Example III: Initial Conditions Problems in

More information

DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS

DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS Donald B. Rubin Harvard University 1 Oxford Street, 7th Floor Cambridge, MA 02138 USA Tel: 617-495-5496; Fax: 617-496-8057 email: rubin@stat.harvard.edu

More information

Comparing latent inequality with ordinal health data

Comparing latent inequality with ordinal health data Comparing latent inequality with ordinal health data David M. Kaplan University of Missouri Longhao Zhuo University of Missouri Midwest Econometrics Group October 2018 Dave Kaplan (Missouri) and Longhao

More information

arxiv: v1 [stat.me] 23 May 2017

arxiv: v1 [stat.me] 23 May 2017 Model-free causal inference of binary experimental data Peng Ding and Luke W. Miratrix arxiv:1705.08526v1 [stat.me] 23 May 27 Abstract For binary experimental data, we discuss randomization-based inferential

More information

AN EVALUATION OF PARAMETRIC AND NONPARAMETRIC VARIANCE ESTIMATORS IN COMPLETELY RANDOMIZED EXPERIMENTS. Stanley A. Lubanski. and. Peter M.

AN EVALUATION OF PARAMETRIC AND NONPARAMETRIC VARIANCE ESTIMATORS IN COMPLETELY RANDOMIZED EXPERIMENTS. Stanley A. Lubanski. and. Peter M. AN EVALUATION OF PARAMETRIC AND NONPARAMETRIC VARIANCE ESTIMATORS IN COMPLETELY RANDOMIZED EXPERIMENTS by Stanley A. Lubanski and Peter M. Steiner UNIVERSITY OF WISCONSIN-MADISON 018 Background To make

More information

Lecture 2: Constant Treatment Strategies. Donglin Zeng, Department of Biostatistics, University of North Carolina

Lecture 2: Constant Treatment Strategies. Donglin Zeng, Department of Biostatistics, University of North Carolina Lecture 2: Constant Treatment Strategies Introduction Motivation We will focus on evaluating constant treatment strategies in this lecture. We will discuss using randomized or observational study for these

More information

Implementing Matching Estimators for. Average Treatment Effects in STATA. Guido W. Imbens - Harvard University Stata User Group Meeting, Boston

Implementing Matching Estimators for. Average Treatment Effects in STATA. Guido W. Imbens - Harvard University Stata User Group Meeting, Boston Implementing Matching Estimators for Average Treatment Effects in STATA Guido W. Imbens - Harvard University Stata User Group Meeting, Boston July 26th, 2006 General Motivation Estimation of average effect

More information

Abstract Title Page. Title: Degenerate Power in Multilevel Mediation: The Non-monotonic Relationship Between Power & Effect Size

Abstract Title Page. Title: Degenerate Power in Multilevel Mediation: The Non-monotonic Relationship Between Power & Effect Size Abstract Title Page Title: Degenerate Power in Multilevel Mediation: The Non-monotonic Relationship Between Power & Effect Size Authors and Affiliations: Ben Kelcey University of Cincinnati SREE Spring

More information

Statistical Inference for Data Adaptive Target Parameters

Statistical Inference for Data Adaptive Target Parameters Statistical Inference for Data Adaptive Target Parameters Mark van der Laan, Alan Hubbard Division of Biostatistics, UC Berkeley December 13, 2013 Mark van der Laan, Alan Hubbard ( Division of Biostatistics,

More information

Testing for arbitrary interference on experimentation platforms

Testing for arbitrary interference on experimentation platforms Testing for arbitrary interference on experimentation platforms arxiv:704.090v4 [stat.me] 29 Jan 209 Jean Pouget-Abadie, Guillaume Saint-Jacques, Martin Saveski, Weitao Duan, Ya Xu, Souvik Ghosh, Edoardo

More information

Methods for inferring short- and long-term effects of exposures on outcomes, using longitudinal data on both measures

Methods for inferring short- and long-term effects of exposures on outcomes, using longitudinal data on both measures Methods for inferring short- and long-term effects of exposures on outcomes, using longitudinal data on both measures Ruth Keogh, Stijn Vansteelandt, Rhian Daniel Department of Medical Statistics London

More information

Bayesian Causal Mediation Analysis for Group Randomized Designs with Homogeneous and Heterogeneous Effects: Simulation and Case Study

Bayesian Causal Mediation Analysis for Group Randomized Designs with Homogeneous and Heterogeneous Effects: Simulation and Case Study Multivariate Behavioral Research, 50:316 333, 2015 Copyright C Taylor & Francis Group, LLC ISSN: 0027-3171 print / 1532-7906 online DOI: 10.1080/00273171.2014.1003770 Bayesian Causal Mediation Analysis

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall 2018 1 / 18 Linear Regression and

More information

Introduction to Propensity Score Matching: A Review and Illustration

Introduction to Propensity Score Matching: A Review and Illustration Introduction to Propensity Score Matching: A Review and Illustration Shenyang Guo, Ph.D. School of Social Work University of North Carolina at Chapel Hill January 28, 2005 For Workshop Conducted at the

More information

AFFINELY INVARIANT MATCHING METHODS WITH DISCRIMINANT MIXTURES OF PROPORTIONAL ELLIPSOIDALLY SYMMETRIC DISTRIBUTIONS

AFFINELY INVARIANT MATCHING METHODS WITH DISCRIMINANT MIXTURES OF PROPORTIONAL ELLIPSOIDALLY SYMMETRIC DISTRIBUTIONS Submitted to the Annals of Statistics AFFINELY INVARIANT MATCHING METHODS WITH DISCRIMINANT MIXTURES OF PROPORTIONAL ELLIPSOIDALLY SYMMETRIC DISTRIBUTIONS By Donald B. Rubin and Elizabeth A. Stuart Harvard

More information