network-correlated outcomes

Size: px
Start display at page:

Download "network-correlated outcomes"

Transcription

1 Optimal design of experiments in the presence of network-correlated outcomes arxiv: v1 [stat.me] 3 Jul 2015 Guillaume W. Basse, Edoardo M. Airoldi Department of Statistics Harvard University, Cambridge, MA 02138, USA Guillaume W. Basse is a graduate student in the Department of Statistics at Harvard University (gbasse@fas.harvard.edu). Edoardo M. Airoldi is an Associate Professor of Statistics at Harvard University (airoldi@fas.harvard.edu). This work was partially supported by the National Science Foundation under grants CAREER IIS and IIS , and by the Office of Naval Research under grant YIP N Guillaume W. Basse is a Google Fellow in Statistics. Edoardo M. Airoldi is an Alfred Sloan Research Fellow, and a Shutzer Fellow at the Radcliffe Institute for Advanced Studies. The authors are grateful to Hugh Chipman, Tirthankar Dasgupta, and Donald B. Rubin for insightful and constructive discussion that helped to greatly improve this paper.

2 Abstract We consider the problem of how to assign treatment in a randomized experiment, when the correlation among the outcomes is informed by a network available preintervention. Working within the potential outcome causal framework, we develop a class of models that posit such a correlation structure among the outcomes, and a strategy for allocating treatment optimally, for the goal of minimizing the integrated mean squared error of the estimated average treatment effect. We provide insights into features of the optimal designs via an analytical decomposition of the mean squared error used for optimization. We illustrate how the proposed treatment allocation strategy improves on allocations that ignore the network structure, with extensive simulations. Keywords: Causal inference; Social network data; Design of experiments; Optimal treatment allocation.

3 Contents 1 Introduction Causal inference setup and estimands Related work Contributions Optimal design with network-correlated outcomes An estimator for the average treatment effect and its mean squared error Nuisance parameters and integrated mean squared error Modeling network-correlated outcomes Comparative performance analysis Set-up Performance evaluation under correct prior specifications Robustness under prior mis-specification Quantifying sources of variation in the integrated mean squared error Discussion Decomposition of the mean square error and interpretation An alternative strategy to find good designs based on point priors Alternative conceptualizations to leverage social connections in causal analyses Optimal design and the role of randomization References 20 A Derivation of mean squared error for the proposed models 25 B Integrability of the mean squared error 25

4 1 Introduction The past decade has witnessed a surge of interest in causal analyses in the context of social networks, social media platforms and online advertising (e.g., Christakis and Fowler, 2007; Aral et al., 2009; Bakshy et al., 2011, 2012; Bond et al., 2012; Gui et al., 2015; Phan and Airoldi, 2015). The challenging aspect of these applications, generally, is the presence of connections, or network data, observed pre-intervention, possibly with uncertainty, and often missing. For instance, we might be interested in identifying which users should be offered a social coupon that enables them to play a new online multiplayer game for free, with their Facebook friends. Here, the units of analysis are users and the connections are given by the observed friendship links among users on Facebook. Or we might be interested in identifying which advertisers should be invited to bid in an auction to place ads on a banner on the CNN front page. Here, the units are auctions and a connection amounts to having a number of advertisers participating in a pair of auctions. While there is a well-developed literature on several aspects of the statistical analysis of network data (e.g., Wasserman, 1994; Kolaczyk, 2009; Goldenberg et al., 2010; Bickel and Chen, 2009), the literature about statistical methods and theory for experimentation in the social and information sciences that leverages observed connections is at a budding stage (e.g., Rosenbaum, 2007; Hudgens and Halloran, 2008; Parker, 2011; Aronow and Samii, 2013; Ogburn and VanderWeele, 2014). Moreover, even when considering the average treatment effect, or average exposure effects, as the inferential targets of interest, multiple conceptualizations and assumptions are possible, which require different methodological approaches to achieve valid inference when analyzing the same experiment (Imbens and Rubin, 2015; Airoldi and Rubin, 2015). Here, we consider the problem of how to assign treatment in a causal experiment, when the correlation among the outcomes is informed by a network, available pre-intervention. 1

5 Figure 1: Illustration of different assignments in the presence of observed connections. To illustrate the problem, consider the scenario in Figure 1 where the variance of Y i is V(Y i ) = 1 and the edge between nodes indicate a correlation of 0.9. Suppose that the two possible assignments correspond to the estimators ˆτ a = Y 1 Y 3 and ˆτ b = Y 1 Y 2. Looking only at the variance for now, one can check that V(ˆτ a ) = 2 > 0.2 = V(ˆτ b ), which illustrates two important points: first, the variance of the estimator depends significantly (by a factor of 10 here) on the assignment strategy, and second, the assignment that leverages the correlation structure (right panel) performs better that the assignment that mimics the independent case (left panel). The rest of this Section sets-up the problem, and places our contributions in the context of related work. The proposed methods are presented in Section 2 and illustrated with extensive simulations in Section 3. We discuss the results in Section Causal inference setup and estimands Let G be an undirected graph, and N be the number of nodes. Working within the potential outcomes framework (Rubin, 1974; Holland, 1986; Imbens and Rubin, 2015), we denote by Z= (Z 1,..., Z N ) the assignment vector, where Z i = 1 if unit i is assigned to treatment. Accepting the stable unit treatment value assumption (SUTVA henceforth, Rubin, 1980), the potential outcome of unit i depends only on its own treatment assignment Z i, therefore we can write its potential outcomes as Y i (Z i ). We can then define the vectors Y(0) = (Y 1 (0),..., Y N (0)) and Y(1) = (Y 1 (1),..., Y N (1)) for notational convenience. Causal inference methodology for analyzing randomized experiments, using potential outcomes, can be categorized into two general approaches: model-free and model-based (Imbens 2

6 and Rubin, 2015). The model-free approach, including the classical methodology by Neyman and Fisher (Fisher, 1922, 1954; Neyman, 1990; Speed, 1990; Rubin, 1990), posits the potential outcomes are constants, and the randomness comes from the assignment mechanism. The model-based approach posits the potential outcome vectors Y(0) and Y(1) are random variables (Rubin, 1978, 2005). This implies a joint distribution for (Y(1), Y(0)). Estimands of interest, denoted τ = τ(y(1), Y(0)), are also a random variables in this approach. In this paper we consider a set-up in between. Specifically, we consider a class of degenerate joint distributions for the potential outcomes vectors, where the estimand τ is a constant, Y(0) F (θ) s.t Cov(Y(0)) = Φ Y (G) Y(1) = Y(0) + τ1, where Φ Y (G) is a variance-covariance matrix, not necessarily diagonal, which depends on the network structure, as implied by the model introduced in Equations 7 9, and fully worked out in Equation 18, and θ denotes the parameters of the model for Y(0). 1.2 Related work The problem of design and analysis of causal experiments in situations where social connections (i.e., a social network) are available pre-intervention has recently received considerable attention. Two issues may arise from the availability of social connections: (i) social interference, that is, a situation where the potential outcomes for unit i are a function of the treatment assignments of other units that are related to unit i through the network, or they are function of the observed outcomes of these units, (ii) network-correlated outcomes, an alternative setting where the network informs the correlation among the potential outcomes, because the potential outcomes for unit i are a function of its covariates and those of other 3

7 units. Much of the current literature deals with social interference, as it is relevant to the estimation of peer effects, or peer influence (e.g., see Parker, 2011; Airoldi et al., 2012; Manski, 2013; Toulis and Kao, 2013; Aronow and Samii, 2013; Ugander et al., 2013; Eckles et al., 2014) on social networks and social media platforms (Aral et al., 2009; Bakshy et al., 2012; Bond et al., 2012; Ogburn and VanderWeele, 2014). When the inferential target is some causal measure of peer influence, SUTVA is an untenable assumption, since this assumption implies the causal effects of interference are zero by definition. Here, we focus on the case of network-correlated outcomes, assuming SUTVA holds. The scenario we consider subsumes a recent conceptualization of homophily (McPherson et al., 2001) as a special case. For a more precise discussion of the technical implications and differences between these settings, we refer to Section 4.3. A number of ideas developed in the classical literature on design of experiments can be adapted to the setting of designing experiments in the presence of network-correlated outcomes. Of particular interest to us is the line of work pioneered by Box and Lucas (1959), Draper and Hunter (1967), and Chaloner and Larntz (1989) on the design of experiments with non-linear response surfaces, and on strategy for how to deal with technical problems that arise in those situations. As we will show in Section 2.2, the optimal design in our setting also depends on the values of some unknown parameters. Similarly relevant to our work are the issues raised in Parker (2011). 1.3 Contributions Here, we offer three main contributions. First, we formalize the problem of optimal experimental design in the presence of network-correlated outcomes, and propose a general methodology for dealing with nuisance parameters at the design stage. Second, we introduce a class of models for network-correlated outcomes, for which we can compute the mean 4

8 squared error (MSE henceforth) in closed form, and we show empirically that minimizing the mean squared error implied by these models leads to significantly better treatment assignments than those that are currently considered optimal, but that ignore the network structure available at the design stage. Third, for a specific instance of our model, we show that the mean squared error decomposes into a sum of terms that have a clear interpretation. This decomposition offers valuable insights into desirable and undesirable unit assignments strategies for experiments on networks. 2 Optimal design with network-correlated outcomes We are interested in estimating the average treatment effect, under SUTVA, defined as τ 1 N i ( Yi (Z i = 1) Y i (Z i = 0) ), (1) or, equivalently, τ = Y(1) Y(0), as a consequence of the additivity assumption in Section An estimator for the average treatment effect and its mean squared error In order to estimate the average treatment effect, we propose the following estimator: ˆτ Z = 1 N 1 (Z) {i:z i =1} Y i(1) 1 N 0 (Z) {i:z i =0} Y i(0) (2) where N 0 (Z) and N 1 (Z) are the number of units assigned to the control and the treatment groups, respectively. If we measure the quality of our estimator using its mean squared error, then a natural goal when designing a randomized experiment is to chose the treatment 5

9 assignment vector Z that minimizes the mean squared error, Z opt = argmin Z MSE θ (ˆτ Z, τ). (3) The mean squared error can be written more explicitly as MSE θ (ˆτ Z, τ) = ω(z) Cov θ (Y(Z)) ω(z) (4) 1 + ( N 1 (Z) {i:z i =1} E θ(y i (0)) 1 N 0 (Z) {i:z i =0} E θ(y i (0))) 2 where θ parameterizes the joint model for the potential outcomes, and ω(z) is a vector of contrasts, with entries ω i (Z) = ω i (Z i ) = Z i /N 1 (1 Z i )/N 0 for i = 1... N. If we knew the parameter vector θ, the design problem would thus reduce to an optimization problem with the mean squared error as objective function. Such an optimization could be carried using a stochastic search algorithm over the possible treatment assignments. 2.2 Nuisance parameters and integrated mean squared error In general, the parameter vector θ is an unknown nuisance parameter, so we cannot evaluate the expression in Equation (4) directly. The presence of nuisance parameters in the objective function is a common issue in optimal design problems with nonlinear response surfaces. A popular way to address it is to posit a prior distribution for the nuisance parameters and integrate the mean squared error over the prior support (e.g., see Draper and Hunter, 1967; Chaloner and Verdinelli, 1995). Similarly, we modify our goal for designing an optimal experiment; we will seek the treatment assignment vector Z that minimizes the integrated 6

10 mean squared error, also known as Bayes risk in the optimal Bayesian design literature, Z opt = argmin Z MSE θ (ˆτ Z, τ) π(θ) dθ (5) argmin Z imse π (ˆτ Z, τ). However, it is difficult to obtain a closed form expression for the integrated MSE. Instead, we aim to minimize an estimator of the imse, obtained via Monte Carlo approximation, Z opt = argmin Z 1 n n MSE θi (ˆτ Z, τ) (6) i=1 argmin Z îmse π (ˆτ Z, τ), where θ i π. Conditions for the existence of the integrated mean squared error for the class of models considered in Section 2.3 are given in appendix. We study the robustness of the proposed method to variation in the estimated imse π in Section 3.4. Remark. A special case of this approach is to chose a point mass θ = θ 0 as a prior distribution (Box and Lucas, 1959). This approach is interesting to consider in our analysis, since for the proposed models the mean squared error is available in closed form. We leverage this observation to study the impact of different priors on the type of designs considered optimal in Section Modeling network-correlated outcomes In our exposition so far, we have abstracted away the network, simply assuming the existence of a correlation structure between outcomes. In the context of statistical network analysis (Kolaczyk, 2009; Goldenberg et al., 2010), many scientists have noticed that nodes sharing edges are more likely to share some similarity than unconnected nodes. This phenomenon, often referred to in the literature as homophily (McPherson et al., 2001; Hoff et al., 2001), 7

11 motivates the notion that the network structure might inform the correlation structure among the observed outcomes, when units in the population are connected through a social network. We introduce a simple class of models capturing this idea. A key motivation for the proposed models is that their simplicity leads to a closed form expression for the mean squared error of the estimator for the average treatment effect, which can be easily interpreted as discussed in Section 4.1. Let G be a network with N nodes (i.e., units), and let N i be the size of the closed neighborhood of node i (i.e., number of neighbors of node i, including node i). Define A the adjacency matrix (Kolaczyk, 2009) associated to the network G and A = A + I where I is the identity matrix. We propose the following class of models, which can be subsumed into the framework of Section 1.1, Y i (0) X ind NEF ( j N i X j, Λ( j N i X j ) ), for i = 1... N (7) X j iid NEF ( µ, φ X (µ) ), for j = 1... N (8) Y i (1) = Y i (0) + τ, for i = 1... N, (9) where Λ( j N i X j ) are positive scalars for i = 1... N and imply a diagonal specification for Φ Y X (µ), φ X (µ) is a positive scalar, and NEF(γ, Λ(γ)) denotes the natural exponential family distribution (Morris, 1982) with mean γ and variance function Λ(γ). The natural exponential family includes many popular distributions, making the proposed specifications flexible enough to model both continuous and discrete outcomes (Morris and Lock, 2009). Importantly, for these models, we can write the mean squared error of the estimator of the average treatment effect as MSE(ˆτ Z, τ) = µ 2 (δ N (Z)) 2 + ω(z) [ A Aφ X (µ) + diag(e(λ( j N i X j ))) ] ω(z) (10) 8

12 where the scalar, δ N (Z) = W i =1 N i W N 1 i =0 N i N 0, measures the difference in the average size of the neighborhoods for the units in the treatment and control groups. See the Appendix for details. The following two examples illustrate the flexibility of the class of models we introduced. Example 1 (Normal-Normal model). Consider the following model, Y i X ind Normal ( j N i X j, γ 2 ) X j iid Normal (µ, σ 2 ). This is a continuous instance of the general model introduced above, where we set µ = µ, φ X (µ) = σ 2, and Λ( j N i X i ) = γ 2. We can then write the mean squared error as, ( ) MSE(ˆτ Z, τ) = µ 2 (δ N ) 2 + σ 2 ω ( γ2 I + A A)ω. (11) σ 2 Example 2 (Poisson-Gamma model). Consider the following model, Y i X ind Poisson ( j N i X j ) X j iid λ Gamma (r). This is an discrete instance of the general model introduced above, where we set µ = rλ, φ X (µ) = µλ, and Λ( j N i X i ) = j N i X i. We can then write the mean squared error as, MSE(ˆτ Z, τ) = λ 2 r ( (δ N ) 2 + ω (A A + 1 λ diag(a A))ω ). (12) 9

13 3 Comparative performance analysis Here, we investigate the performance of the proposed methodology, empirically, and compare the MSE achieved by the optimal treatment assignment with the MSE of other design strategies, which either ignore the availability of social connections at the design stage, or that use them in a naïve way by clustering the network and using the clusters as strata. 3.1 Set-up We generated 100 networks from each of four different network models: Erdos-Renyi, Small- World, power-law degree distribution, and stochastic blockmodel (Goldenberg et al., 2010). For each network, we generated outcomes following the normal model in Section 2.3 with priors, γ 2 Inv-Gamma (r γ, λ γ ) µ Normal (µ 0, σ 0 ) σ 2 Inv-Gamma (r σ, λ σ ), where r γ = 3, λ γ = 1, µ 0 = 1, σ 0 = 0.5, λ σ = 1, and r σ = 2. We then evaluated the proposed optimal treatment assignment, as well as the assignments obtained by alternative strategies, using the integrated mean squared error evaluated under the true model. In Section 3.2, we consider the ideal case, in which we know the complete data generating process, and we use an estimate of the true integrated mean squared error as the objective function. In Section 3.3, we investigate the behavior of the proposed optimal treatment allocation strategy in a realistic situation, in which we have no knowledge of the hyper-parameters and we specify them incorrectly. 10

14 3.2 Performance evaluation under correct prior specifications Here, we investigate the performance of the proposed method when we know the full data generating process, and compare it with two other strategies. The first strategy ignores the network structure, and yields balanced randomized designs. The second strategy leverages the network structure by first partitioning the network into clusters, and then performing balanced randomizations within clusters, thus mimicking a stratified design. We evaluate the designs obtained from each method by comparing their imse computed under the true prior. Figure 2 shows a histogram of the imse of the proposed method relative to the imse of the naïve allocation strategy, which completely ignore the network structure, on the X axis. Figure 3 shows a similar histogram of the imse of the proposed method relative to the imse of the stratified randomized allocation, where strata were obtained using spectral clustering. The different panels of both figures report the relative imse results for the four network models separately, as indicated on the top of each panel. The results clearly indicate that the proposed method reduces the imse by at least 25% (the histogram bars fall below 0.75) in the majority of the simulations when compared to the na ve allocation strategy, independently of the network model, and by at least 20% (the histogram bars fall below 0.80) in the majority of the simulations when compared to the stratified allocation strategy, independently of the network model. 3.3 Robustness under prior mis-specification Here, we investigate the behavior of our method when we use the proposed generative model, but specify the wrong hyper-parameters. More in detail, we explored the comparative performance of the proposed methods over a range of ten priors for µ (i.e., we minimize the imse corresponding to the wrong priors), by letting the corresponding pair of hyper-parameters 11

15 ER power law count 0 sbm small world imse/balanced Figure 2: Histograms of the imse achieved with the proposed design relative to the imse obtained with a randomized balanced design, for outcomes generated according to four network models. ER power law count 0 sbm small world imse/spectral Figure 3: Histograms of the imse achieved with the proposed design relative to the imse obtained with a stratified design, for outcomes generated according to four network models. (µ 0, σ 0 ) take the following values: (1,0.5), (2,0.7), (5,1), (7, 1.2), (10,1.5), (15, 2), (20, 2.5), (30,3), (40,4), and (50, 5). The first prior (µ 0 = 1, σ 0 = 0.5) corresponds to the prior used to generate the outcome data, and the last one (µ 0 = 50, σ 0 = 5) corresponds to a prior that is 12

16 ER power law sbm small world true balanced true balanced true balanced true balanced factor(prior) imse Figure 4: Robustness of our method to prior mis-specifications very far from the correct distribution. Note that the integrated mean squared error is most sensitive to the parameter µ in the data generating process, so we are exploring a serious case of prior mis-specification. Figure 4 summarizes the performance of the proposed method when we use the wrong prior, over different network models. The results suggest that, although the performance degrades as the prior used to find the optimal design is further from the truth, as expected, the imse does not degrade substantially, and the proposed assignment significantly outperforms, on average, the random balanced designs provided for comparison in the rightmost column. Another attractive feature of the proposed design strategy is that it reduces the variation in the tails of the achieved imse; that is, we do not observe the kind of long tails we see for the random balanced designs. We offer some intuition behind the empirical robustness of the proposed strategy in Section

17 3.4 Quantifying sources of variation in the integrated mean squared error Here, we carry out analysis of variance on collection of comparative performance evaluation results presented above. The goal is to provide quantitative evidence of the robustness of the proposed optimal treatment assignment strategy to prior misspecification, and to compare the contribution of the sources of variation we considered in our simulation study (namely, treatment allocation strategy, network models, replications, prior specification/misspecification, variability of the estimated imse objective) to the overall integrated mean squared error. The choice of treatment allocation strategy (completely randomized, stratified/spectral, minimum imse) explains most of the variance observed in the simulations (MSS = 11,673), while the choice of prior hyper-parameters and the particular network model regime have little impact (MSS = 10.9 and MSS = 14.2, respectively). As described in Section 3.1, the minimization of imse is performed using a stochastic search algorithm over the space of treatment allocation vectors, Z Z, which means that even though the parameters underlying the optimization problem are the same, the optimal allocation vectors obtained for each replication might differ. A measure of the variation in the integrated mean squared error due to the stochastic nature of this minimization is given by the residuals of the analysis of variance, which is the smallest source of variation in the imse (MSS=3.0) when compared to the factors. This is reassuring. As we discuss in Section 2.2, we do not have a closed form formula for the integrated mean squared error. Instead, we use a Monte-Carlo estimate of this objective used for finding the optimal treatment assignment. Given the stochastic nature of the imse estimation routine, which is separate form the stochastic nature of the stochastic search that finds the optimal treatment assignment, it is difficult to analytically assess the impact of this approximation. 14

18 On the one hand, its magnitude is also captured by the residual sum of squares reported above (MSS=3.0). On the other hand, we can offer additional empirical evidence that the estimation of the IMSE objective is well behaved. We considered pairs of Monte-Carlo estimates, îmse 1 and îmse 2, of the same objective function, imse. If Z A and Z B are any two designs, such that, imse (Z A ) imse (Z B ) then we observe empirically that we have îmse 1 (Z A ) îmse 1 (Z B ) and îmse 2 (Z A ) îmse 2 (Z B ) in the vast majority of simulations. In other words, different estimates of the objective function are unlikely to alter the ranking of treatment assignments, in terms of their imse. So if an assignment is better than another, under an estimate of the objective function, it is also likely to be better under the true imse. The only unexplored source of variation at this point is given by mis-specifications of the model for the outcomes itself. We expect this to be qualitatively smaller than the variation due to the treatment allocation strategy (the main source of variation in the analysis above) because of the flexibility of the proposed model in explaining variation in the data, which can be inferred from its similarities to other models of network data that encode homophily, and to other exchangeable network models(e.g., see Wasserman, 1994; Kolaczyk, 2009; Goldenberg et al., 2010). 4 Discussion 4.1 Decomposition of the mean square error and interpretation The simulations in Section 3.3 show that the proposed method to find optimal treatment allocations is fairly robust to prior mis-specification. Robustness to prior mis-specification was also observed on simulation experiments, not reported here, when optimizing the mean squared error using point priors (Box and Lucas, 1959). Here, we investigate this feature in more details, by studying the analytical form of the MSE under the normal model introduced 15

19 in Section 2.3, MSE Z (ˆτ, τ) = µ 2 (δ N (Z)) 2 + γ 2 ω(z) ω(z) + σ 2 ω(z) A Aω(Z). }{{}}{{} bias 2 variance Below, we expand each of the terms and offer some intuition about what they capture. Interpretation of the bias term. The bias term can be expanded as follows, µ 2 (δ N ) 2 = µ 2 ( 1 N 1 {i:z i =1} N i 1 N 0 {i:z i =0} N i ) 2. (13) The bias is proportional to the difference in the average size of the neighborhoods for the units in the treatment and control groups. Intuitively, this difference measures a lack of balance between the two groups, in terms of their network characteristics specifically, the average degree. Larger values of the mean µ amplify this average imbalance in neighborhood sizes. Since the experimenter does not have control over µ, desirable treatment assignments have minimize bias by balancing the units assigned to treatment and control in terms of neighborhood sizes. Interpretation of the variance terms. The first variance term is, γ 2 ω ω = γ 2 ( 1 N N 0 ), (14) is minimized when N 1 = N 0. Intuitively, this term penalizes lack of balance between the sizes of the treatment and control groups. Larger values of the parameter γ amplify this average imbalance in group sizes. This is consistent with classical results on the optimality of balanced randomizations for estimating the average treatment effect (Fisher, 1954; Imbens and Rubin, 2015). 16

20 The second variance term involves features of the network, σ 2 ω A Aω = σ2 N σ2 N 2 0 {i,j:z i =Z j =1} N i N j (15) {i,j:z i =Z j =0} N i N j (16) 2σ2 N 1 N 0 {i,j:z i =1 and Z j =0} N i N j. (17) The term on right hand side of Equation 15 is proportional to the average number of neighbors in common between pairs of units assigned to the treatment group. The term in Equation 16 is proportional to the average number of neighbors in common between pairs of units assigned to the control group. The term in Equation 17 is proportional to the average number of neighbors in common between pairs of units, one assigned to the treatment and one assigned to the control groups. Intuitively, looking at the signs in front of these three factors, the second variance term is minimized by assigning units with shared neighbors to different groups, and by avoiding assigning treatment or control to entire clusters of units that are densely connected. The interpretation of the factors in the decomposition of the mean squared error presented above suggests an explanation for the empirical robustness to prior mis-specifications (of the proposed methodology for finding optimal designs) reported in Section 3.3. Prior misspecifications can be expected to have negligible effects on the mean squared error, as long as the treatment allocation respects a few key measures of balance. Namely, balance in the sizes of the treatment and control groups, balance in the average size of the neighborhoods for units in the treatment and control groups, balance in the assignment of units with shared neighbors to different groups, balance in assignment of treatment to dense clusters. These quantities are not a function of the prior hyper-parameters, and thus are unaffected by prior mis-specifications. 17

21 4.2 An alternative strategy to find good designs based on point priors An arguably simpler approach to optimal design, which builds on the notion of point priors (Box and Lucas, 1959) noted in Section 2.2, which also requires a prior distribution π(θ) on the parameter space, is the following three-steps procedure. First, consider multiple sets of parameters θ 1,..., θ k, and the associated mean squared error functions MSE θ1,..., MSE θk, and obtain k optimal designs Z (1),..., Z (k) by minimizing each of the k MSE functions separately. Second, evaluate each design under all the MSE functions; let s denote Γ ij MSE θj (Z (i) ). Third, build a loss function for Z, by averaging the mean squared error achieved by Z across the sets of parameters θ 1,..., θ k with weights equal to the density of the corresponding parameter sets under the prior distribution. That is, for each Z (i) we define the following loss, L(Z (i) ) = k j=1 Γ i,j π(θ j ). We then select the design with the smallest loss as the best among the k candidates. This method has the benefit of not relying on a numerical approximation for the integrated mean squared error, but has several limitations in practice. It requires running a stochastic optimization algorithm for each set of parameters. The choice of the k sets of parameters is somewhat arbitrary, and quickly runs into issues if the parameters space is large and/or multidimensional. In simulations (not reported here) this simpler strategy did not perform as well as the proposed approach. 18

22 4.3 Alternative conceptualizations to leverage social connections in causal analyses In Section 1.2 we hinted at how the notions of social interference and network-correlated outcomes relate to and differ form each other. The distinction is subtle, since both settings involve correlation structure among the outcomes. To appreciate the differences, let us consider how the correlation structure arises in typical causal settings where models for the outcomes are considered (Rubin, 1978). Let Σ be a diagonal covariance matrix, and Φ Z (G), Φ Y (G) be two full covariance matrices both functions of the network G. Then we can write, Cov(Y(Z) Z) = Σ = Cov(Y(Z)) Cov(Y(Z) Z) = Φ Y (G) = Cov(Y(Z)) Cov(Y(Z) Z) = Σ no interference network-correlated outcomes (our set-up) interference Cov(Y(Z)) = Φ Z (G) In the absence of interference, the potential outcomes are independent both conditionally and marginally. In our set-up, there is correlation among the potential outcomes conditionally on the assignment, as well as marginally. In the case of interference, the potential outcomes are independent conditionally on the assignment, but not marginally. The correlation arises when integrating out the assignment mechanism. The effects of interference and networkcorrelated outcomes cannot be identified in observational studies, in the absence of any assumptions (Shalizi and Thomas, 2011). Carefully designed experiments are necessary. We offered minimal assumptions and a strategy to design optimal experiments in the presence of network-correlated outcomes. 19

23 4.4 Optimal design and the role of randomization There is a lively debate in the optimal experimental design literature, between those supporting a set of optimal treatment assignments to enable randomization in practice (Harville, 1975; Kempthorne, 1977), mostly in the sciences and healthcare, and those favoring a single optimal treatment assignment (Kiefer, 1959), mostly in industrial engineering. In practice, randomization balances, on average, the effect of unobserved and potentially relevant covariates so called confounders. More generally, randomization improves the robustness of inference to model mis-specifications. In addition, when working within the Bayesian framework, randomization is an ignorable assignment mechanism that greatly simplifies inference (Rubin, 1978). The proposed methodology yields a trivially ignorable assignment mechanism, thus subsequent analyses are not any more complicated than if we had considered randomization instead. We are not leveraging covariates for balance, although these checks are simple to add. However, the proposed strategy is ultimately informed by the model specification, although we have illustrated the robustness to prior mis-specifications. There is an important place for randomization in designing optimal experiments in the presence of network-correlated outcomes. We have recently obtained results in this directions in the case of interference, and we plan on leveraging some of the insights in the setting we considered here, elsewhere. References Airoldi, E. M. and Rubin, D. B. (2015). Principles for leveraging information about a social network when designing and analyzing experiments. Proceedings of the National Academy of Sciences. Forthcoming. Airoldi, E. M., Toulis, P., Kao, E., and Rubin, D. B. (2012). Causal estimation of peer 20

24 influence effects. In NIPS Workshop on Social Network and Social Media Analysis. Aral, S., Muchnik, L., and Sundararajan, A. (2009). Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51): Aronow, P. M. and Samii, C. (2013). Estimating average causal effects under interference between units. arxiv preprint arxiv: Bakshy, E., Hofman, J. M., Mason, W. A., and Watts, D. J. (2011). Everyone s an influencer: quantifying influence on twitter. In Proceedings of the fourth ACM international conference on Web search and data mining, pages ACM. Bakshy, E., Rosenn, I., Marlow, C., and Adamic, L. (2012). The role of social networks in information diffusion. In Proceedings of the 21st international conference on World Wide Web, pages ACM. Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and newman girvan and other modularities. Proceedings of the National Academy of Sciences, 106(50): Bond, R. M., Fariss, C. J., Jones, J. J., Kramer, A. D., Marlow, C., Settle, J. E., and Fowler, J. H. (2012). A 61-million-person experiment in social influence and political mobilization. Nature, 489(7415): Box, G. E. and Lucas, H. (1959). Design of experiments in non-linear situations. Biometrika, pages Chaloner, K. and Larntz, K. (1989). Optimal bayesian design applied to logistic regression experiments. Journal of Statistical Planning and Inference, 21(2): Chaloner, K. and Verdinelli, I. (1995). Bayesian experimental design: A review. Statistical Science, pages

25 Christakis, N. A. and Fowler, J. H. (2007). The spread of obesity in a large social network over 32 years. New England journal of medicine, 357(4): Draper, N. R. and Hunter, W. G. (1967). The use of prior distributions in the design of experiments for parameter estimation in non-linear situations. Biometrika, 54(1-2): Eckles, D., Karrer, B., and Ugander, J. (2014). Design and analysis of experiments in networks: Reducing bias from interference. arxiv preprint arxiv: Fisher, R. A. (1922). On the interpretation of χ 2 from contingency tables, and the calculation of P. Journal of the Royal Statistical Society, 85(1): Fisher, R. A. (1954). Statistical Methods for Research Workers. Oliver and Boyd. Goldenberg, A., Zheng, A. X., Fienberg, S. E., and Airoldi, E. M. (2010). A survey of statistical network models. Foundations and Trends R in Machine Learning, 2(2): Gui, H., Xu, Y., Bhasin, A., and Han, J. (2015). Network A/B testing: From sampling to estimation. In International World Wide Web Conference (WWW), pages , Florence, Italy. Harville, D. A. (1975). Experimental randomization: who needs it? The American Statistician, 29(1): Hoff, P. D., Raftery, A. E., and Handcock, M. S. (2001). Latent space approaches to social network analysis. Journal of the American Statistical Association, 97: Holland, P. W. (1986). Statistics and causal inference. Journal of the American statistical Association, 81(396): Hudgens, M. G. and Halloran, M. E. (2008). Toward causal inference with interference. Journal of the American Statistical Association, 103(482). 22

26 Imbens, G. W. and Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press. Kempthorne, O. (1977). Why randomize? Journal of Statistical Planning and Inference, 1(1):1 25. Kiefer, J. (1959). Optimum experimental designs. Journal of the Royal Statistical Society. Series B (Methodological), pages Kolaczyk, E. D. (2009). Statistical analysis of network data: methods and models. Springer Science & Business Media. Manski, C. F. (2013). Identification of treatment response with social interactions. The Econometrics Journal, 16(1):S1 S23. McPherson, M., Smith-Lovin, L., and Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual review of sociology, pages Morris, C. N. (1982). Natural exponential families with quadratic variance functions. The Annals of Statistics, pages Morris, C. N. and Lock, K. F. (2009). Unifying the named natural exponential families and their relatives. The American Statistician, 63(3): Neyman, J. (1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 5(4): Translated and edited by D. M. Dabrowska and T. P. Speed from Polish original, which appeared in Roczniki Nauk Rolniczych Tom X (1923) 1 51 (Annals of Agricultural Sciences). Ogburn, E. L. and VanderWeele, T. J. (2014). Vaccines, Contagion, and Social Networks. ArXiv e-prints. Parker, B. M. (2011). Design of networked experiments. Presented at the Isaac Newton Institute, Cambridge University, UK. 23

27 Phan, T. Q. and Airoldi, E. M. (2015). A natural experiment of social network formation and dynamics. Proceedings of the National Academy of Sciences, 112(21): Rosenbaum, P. R. (2007). Interference between units in randomized experiments. Journal of the American Statistical Association, 102(477): Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5): Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. The Annals of Statistics, pages Rubin, D. B. (1980). Comment. Journal of the American Statistical Association, 75(371): Rubin, D. B. (1990). Comment: Neyman (1923) and causal inference in experiments and observational studies. Statistical Science, 5(4): Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469): Shalizi, C. R. and Thomas, A. C. (2011). Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research, 40(2): Speed, T. P. (1990). Introcutory remarks on Neyman (1923). Statistical Science, 5(4): Toulis, P. and Kao, E. (2013). Estimation of causal peer influence effects. Journal of Machine Learning Research, W&CP, 28(3): Ugander, J., Karrer, B., Backstrom, L., and Kleinberg, J. (2013). Graph cluster randomization: Network exposure to multiple universes. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages ACM. 24

28 Wasserman, S. (1994). Social network analysis: Methods and applications, volume 8. Cambridge university press. A Derivation of mean squared error for the proposed models First we derive the following correlation for the model in Equations 7 9, N i N j φ X (µ), if i j Cov(Y i, Y j ) = N i N j φ X (µ) + E ( Λ( j N i X j ) ), if i = j, (18) which gives the element i, j of the marginal covariance matrix Φ Y (G). It follows, ( V(ˆτ Z ) = ω (A Aφ X (µ) + diag E ( Λ( j N i X j ) ))) ω. (19) The bias is, Bias(ˆτ Z, τ) = µ 2 (δ N ) 2, (20) which gives the overall mean squared error in Equation 10, ( MSE(ˆτ Z, τ) = µ 2 (δ N ) 2 + ω (A Aφ X (µ) + diag E ( Λ( j N i X j ) ))) ω. B Integrability of the mean squared error In the general model in Equations 7 9, we distinguish terms µ 2, φ X (µ), E(Λ( j N i )) from terms that only depend on the network G and on the assignment vector Z. The latter set of terms are not problematic to integrate since, due to the discrete and finite nature of 25

29 the network, which we assume fixed and available pre-intervention, they can be bounded by constants. However, the former terms all depend on parameters and might raise integrability issues. We note that these terms enter the problem linearly, so it is enough to check that they are integrable individually. For the normal-normal model in example 1 with prior specifications in Section 3.1 we have, µ 2 π(θ) dθ = σ µ 2 0 φx (µ)π(θ) dθ = λ σ r σ 1 E ( Λ( j N i ) ) π(θ) dθ = λ γ r γ 1 if r σ > 1 if r γ > 1. For the poisson-gamma model in example 2 with a Gamma prior for r and an Inverse-Gamma prior for λ, similar derivations confirm the integrability of these quantities as well. 26

Model-assisted design of experiments in the presence of network correlated outcomes

Model-assisted design of experiments in the presence of network correlated outcomes Model-assisted design of experiments in the presence of network correlated outcomes arxiv:1507.00803v4 [stat.me] 18 May 2017 Guillaume W. Basse, Edoardo M. Airoldi Department of Statistics Harvard University,

More information

Measuring Social Influence Without Bias

Measuring Social Influence Without Bias Measuring Social Influence Without Bias Annie Franco Bobbie NJ Macdonald December 9, 2015 The Problem CS224W: Final Paper How well can statistical models disentangle the effects of social influence from

More information

Identification and estimation of treatment and interference effects in observational studies on networks

Identification and estimation of treatment and interference effects in observational studies on networks Identification and estimation of treatment and interference effects in observational studies on networks arxiv:1609.06245v4 [stat.me] 30 Mar 2018 Laura Forastiere, Edoardo M. Airoldi, and Fabrizia Mealli

More information

Some challenges and results for causal and statistical inference with social network data

Some challenges and results for causal and statistical inference with social network data slide 1 Some challenges and results for causal and statistical inference with social network data Elizabeth L. Ogburn Department of Biostatistics, Johns Hopkins University May 10, 2013 Network data evince

More information

DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS

DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS Donald B. Rubin Harvard University 1 Oxford Street, 7th Floor Cambridge, MA 02138 USA Tel: 617-495-5496; Fax: 617-496-8057 email: rubin@stat.harvard.edu

More information

Research Note: A more powerful test statistic for reasoning about interference between units

Research Note: A more powerful test statistic for reasoning about interference between units Research Note: A more powerful test statistic for reasoning about interference between units Jake Bowers Mark Fredrickson Peter M. Aronow August 26, 2015 Abstract Bowers, Fredrickson and Panagopoulos (2012)

More information

arxiv: v1 [math.st] 28 Feb 2017

arxiv: v1 [math.st] 28 Feb 2017 Bridging Finite and Super Population Causal Inference arxiv:1702.08615v1 [math.st] 28 Feb 2017 Peng Ding, Xinran Li, and Luke W. Miratrix Abstract There are two general views in causal analysis of experimental

More information

Econometric Analysis of Network Data. Georgetown Center for Economic Practice

Econometric Analysis of Network Data. Georgetown Center for Economic Practice Econometric Analysis of Network Data Georgetown Center for Economic Practice May 8th and 9th, 2017 Course Description This course will provide an overview of econometric methods appropriate for the analysis

More information

arxiv: v3 [stat.me] 17 Apr 2018

arxiv: v3 [stat.me] 17 Apr 2018 Unbiased Estimation and Sensitivity Analysis for etwork-specific Spillover Effects: Application to An Online etwork Experiment aoki Egami arxiv:1708.08171v3 [stat.me] 17 Apr 2018 Princeton University Abstract

More information

Statistical Models for Causal Analysis

Statistical Models for Causal Analysis Statistical Models for Causal Analysis Teppei Yamamoto Keio University Introduction to Causal Inference Spring 2016 Three Modes of Statistical Inference 1. Descriptive Inference: summarizing and exploring

More information

arxiv: v1 [stat.me] 3 Feb 2017

arxiv: v1 [stat.me] 3 Feb 2017 On randomization-based causal inference for matched-pair factorial designs arxiv:702.00888v [stat.me] 3 Feb 207 Jiannan Lu and Alex Deng Analysis and Experimentation, Microsoft Corporation August 3, 208

More information

RaRE: Social Rank Regulated Large-scale Network Embedding

RaRE: Social Rank Regulated Large-scale Network Embedding RaRE: Social Rank Regulated Large-scale Network Embedding Authors: Yupeng Gu 1, Yizhou Sun 1, Yanen Li 2, Yang Yang 3 04/26/2018 The Web Conference, 2018 1 University of California, Los Angeles 2 Snapchat

More information

arxiv: v1 [math.st] 5 Jun 2015

arxiv: v1 [math.st] 5 Jun 2015 Exact P-values for Network Interference Susan Athey Dean Eckles Guido W. Imbens arxiv:1506.02084v1 [math.st] 5 Jun 2015 June 2015 Abstract We study the calculation of exact p-values for a large class of

More information

An Introduction to Causal Mediation Analysis. Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016

An Introduction to Causal Mediation Analysis. Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016 An Introduction to Causal Mediation Analysis Xu Qin University of Chicago Presented at the Central Iowa R User Group Meetup Aug 10, 2016 1 Causality In the applications of statistics, many central questions

More information

Inference for Average Treatment Effects

Inference for Average Treatment Effects Inference for Average Treatment Effects Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Average Treatment Effects Stat186/Gov2002 Fall 2018 1 / 15 Social

More information

Chaos, Complexity, and Inference (36-462)

Chaos, Complexity, and Inference (36-462) Chaos, Complexity, and Inference (36-462) Lecture 21 Cosma Shalizi 3 April 2008 Models of Networks, with Origin Myths Erdős-Rényi Encore Erdős-Rényi with Node Types Watts-Strogatz Small World Graphs Exponential-Family

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University Joint

More information

Testing for arbitrary interference on experimentation platforms

Testing for arbitrary interference on experimentation platforms Testing for arbitrary interference on experimentation platforms arxiv:704.090v4 [stat.me] 29 Jan 209 Jean Pouget-Abadie, Guillaume Saint-Jacques, Martin Saveski, Weitao Duan, Ya Xu, Souvik Ghosh, Edoardo

More information

Bayesian Inference for Sequential Treatments under Latent Sequential Ignorability. June 19, 2017

Bayesian Inference for Sequential Treatments under Latent Sequential Ignorability. June 19, 2017 Bayesian Inference for Sequential Treatments under Latent Sequential Ignorability Alessandra Mattei, Federico Ricciardi and Fabrizia Mealli Department of Statistics, Computer Science, Applications, University

More information

Nonparametric Bayesian Matrix Factorization for Assortative Networks

Nonparametric Bayesian Matrix Factorization for Assortative Networks Nonparametric Bayesian Matrix Factorization for Assortative Networks Mingyuan Zhou IROM Department, McCombs School of Business Department of Statistics and Data Sciences The University of Texas at Austin

More information

arxiv: v4 [math.st] 20 Jun 2018

arxiv: v4 [math.st] 20 Jun 2018 Submitted to the Annals of Applied Statistics ESTIMATING AVERAGE CAUSAL EFFECTS UNDER GENERAL INTERFERENCE, WITH APPLICATION TO A SOCIAL NETWORK EXPERIMENT arxiv:1305.6156v4 [math.st] 20 Jun 2018 By Peter

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Estimation of the Conditional Variance in Paired Experiments

Estimation of the Conditional Variance in Paired Experiments Estimation of the Conditional Variance in Paired Experiments Alberto Abadie & Guido W. Imbens Harvard University and BER June 008 Abstract In paired randomized experiments units are grouped in pairs, often

More information

Chaos, Complexity, and Inference (36-462)

Chaos, Complexity, and Inference (36-462) Chaos, Complexity, and Inference (36-462) Lecture 21: More Networks: Models and Origin Myths Cosma Shalizi 31 March 2009 New Assignment: Implement Butterfly Mode in R Real Agenda: Models of Networks, with

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Random Effects Models for Network Data

Random Effects Models for Network Data Random Effects Models for Network Data Peter D. Hoff 1 Working Paper no. 28 Center for Statistics and the Social Sciences University of Washington Seattle, WA 98195-4320 January 14, 2003 1 Department of

More information

Link Prediction. Eman Badr Mohammed Saquib Akmal Khan

Link Prediction. Eman Badr Mohammed Saquib Akmal Khan Link Prediction Eman Badr Mohammed Saquib Akmal Khan 11-06-2013 Link Prediction Which pair of nodes should be connected? Applications Facebook friend suggestion Recommendation systems Monitoring and controlling

More information

Optimal Blocking by Minimizing the Maximum Within-Block Distance

Optimal Blocking by Minimizing the Maximum Within-Block Distance Optimal Blocking by Minimizing the Maximum Within-Block Distance Michael J. Higgins Jasjeet Sekhon Princeton University University of California at Berkeley November 14, 2013 For the Kansas State University

More information

Clustering as a Design Problem

Clustering as a Design Problem Clustering as a Design Problem Alberto Abadie, Susan Athey, Guido Imbens, & Jeffrey Wooldridge Harvard-MIT Econometrics Seminar Cambridge, February 4, 2016 Adjusting standard errors for clustering is common

More information

Comment on Article by Scutari

Comment on Article by Scutari Bayesian Analysis (2013) 8, Number 3, pp. 543 548 Comment on Article by Scutari Hao Wang Scutari s paper studies properties of the distribution of graphs ppgq. This is an interesting angle because it differs

More information

arxiv: v1 [math.st] 7 Jan 2014

arxiv: v1 [math.st] 7 Jan 2014 Three Occurrences of the Hyperbolic-Secant Distribution Peng Ding Department of Statistics, Harvard University, One Oxford Street, Cambridge 02138 MA Email: pengding@fas.harvard.edu arxiv:1401.1267v1 [math.st]

More information

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal

More information

Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes

Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes Kosuke Imai Department of Politics Princeton University July 31 2007 Kosuke Imai (Princeton University) Nonignorable

More information

Average Distance, Diameter, and Clustering in Social Networks with Homophily

Average Distance, Diameter, and Clustering in Social Networks with Homophily arxiv:0810.2603v1 [physics.soc-ph] 15 Oct 2008 Average Distance, Diameter, and Clustering in Social Networks with Homophily Matthew O. Jackson September 24, 2008 Abstract I examine a random network model

More information

Conditional randomization tests of causal effects with interference between units

Conditional randomization tests of causal effects with interference between units Conditional randomization tests of causal effects with interference between units Guillaume Basse 1, Avi Feller 2, and Panos Toulis 3 arxiv:1709.08036v3 [stat.me] 24 Sep 2018 1 UC Berkeley, Dept. of Statistics

More information

September 25, Abstract

September 25, Abstract On improving Neymanian analysis for 2 K factorial designs with binary outcomes arxiv:1803.04503v1 [stat.me] 12 Mar 2018 Jiannan Lu 1 1 Analysis and Experimentation, Microsoft Corporation September 25,

More information

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Kosuke Imai Department of Politics Princeton University November 13, 2013 So far, we have essentially assumed

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

arxiv: v1 [stat.ml] 29 Jul 2012

arxiv: v1 [stat.ml] 29 Jul 2012 arxiv:1207.6745v1 [stat.ml] 29 Jul 2012 Universally Consistent Latent Position Estimation and Vertex Classification for Random Dot Product Graphs Daniel L. Sussman, Minh Tang, Carey E. Priebe Johns Hopkins

More information

An Introduction to Causal Analysis on Observational Data using Propensity Scores

An Introduction to Causal Analysis on Observational Data using Propensity Scores An Introduction to Causal Analysis on Observational Data using Propensity Scores Margie Rosenberg*, PhD, FSA Brian Hartman**, PhD, ASA Shannon Lane* *University of Wisconsin Madison **University of Connecticut

More information

Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility

Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility American Economic Review: Papers & Proceedings 2016, 106(5): 400 404 http://dx.doi.org/10.1257/aer.p20161082 Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility By Gary Chamberlain*

More information

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Jon Wakefield Departments of Statistics and Biostatistics University of Washington 1 / 37 Lecture Content Motivation

More information

Foundations of Nonparametric Bayesian Methods

Foundations of Nonparametric Bayesian Methods 1 / 27 Foundations of Nonparametric Bayesian Methods Part II: Models on the Simplex Peter Orbanz http://mlg.eng.cam.ac.uk/porbanz/npb-tutorial.html 2 / 27 Tutorial Overview Part I: Basics Part II: Models

More information

Inferring Causal Direction from Relational Data

Inferring Causal Direction from Relational Data Inferring Causal Direction from Relational Data David Arbour darbour@cs.umass.edu Katerina Marazopoulou kmarazo@cs.umass.edu College of Information and Computer Sciences University of Massachusetts Amherst

More information

Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded

Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded 1 Background Latent confounder is common in social and behavioral science in which most of cases the selection mechanism

More information

Stratified Randomized Experiments

Stratified Randomized Experiments Stratified Randomized Experiments Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Stratified Randomized Experiments Stat186/Gov2002 Fall 2018 1 / 13 Blocking

More information

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke

More information

Causal inference for ordinal outcomes

Causal inference for ordinal outcomes Causal inference for ordinal outcomes Alexander Volfovsky, Edoardo M. Airoldi, and Donald B. Rubin arxiv:1501.01234v1 [stat.me] 6 Jan 2015 Department of Statistics, Harvard University January 7, 2015 Alexander

More information

Lecture Notes 1: Decisions and Data. In these notes, I describe some basic ideas in decision theory. theory is constructed from

Lecture Notes 1: Decisions and Data. In these notes, I describe some basic ideas in decision theory. theory is constructed from Topics in Data Analysis Steven N. Durlauf University of Wisconsin Lecture Notes : Decisions and Data In these notes, I describe some basic ideas in decision theory. theory is constructed from The Data:

More information

Discussion on Fygenson (2007, Statistica Sinica): a DS Perspective

Discussion on Fygenson (2007, Statistica Sinica): a DS Perspective 1 Discussion on Fygenson (2007, Statistica Sinica): a DS Perspective Chuanhai Liu Purdue University 1. Introduction In statistical analysis, it is important to discuss both uncertainty due to model choice

More information

Why experimenters should not randomize, and what they should do instead

Why experimenters should not randomize, and what they should do instead Why experimenters should not randomize, and what they should do instead Maximilian Kasy Department of Economics, Harvard University Maximilian Kasy (Harvard) Experimental design 1 / 42 project STAR Introduction

More information

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking

More information

Robustness to Parametric Assumptions in Missing Data Models

Robustness to Parametric Assumptions in Missing Data Models Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice

More information

AGEC 661 Note Fourteen

AGEC 661 Note Fourteen AGEC 661 Note Fourteen Ximing Wu 1 Selection bias 1.1 Heckman s two-step model Consider the model in Heckman (1979) Y i = X iβ + ε i, D i = I {Z iγ + η i > 0}. For a random sample from the population,

More information

Balancing Covariates via Propensity Score Weighting

Balancing Covariates via Propensity Score Weighting Balancing Covariates via Propensity Score Weighting Kari Lock Morgan Department of Statistics Penn State University klm47@psu.edu Stochastic Modeling and Computational Statistics Seminar October 17, 2014

More information

BOOK REVIEW Sampling: Design and Analysis. Sharon L. Lohr. 2nd Edition, International Publication,

BOOK REVIEW Sampling: Design and Analysis. Sharon L. Lohr. 2nd Edition, International Publication, STATISTICS IN TRANSITION-new series, August 2011 223 STATISTICS IN TRANSITION-new series, August 2011 Vol. 12, No. 1, pp. 223 230 BOOK REVIEW Sampling: Design and Analysis. Sharon L. Lohr. 2nd Edition,

More information

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2009 Paper 251 Nonparametric population average models: deriving the form of approximate population

More information

Causal inference in multilevel data structures:

Causal inference in multilevel data structures: Causal inference in multilevel data structures: Discussion of papers by Li and Imai Jennifer Hill May 19 th, 2008 Li paper Strengths Area that needs attention! With regard to propensity score strategies

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Propensity Score Weighting with Multilevel Data

Propensity Score Weighting with Multilevel Data Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University

More information

Covariate Balancing Propensity Score for General Treatment Regimes

Covariate Balancing Propensity Score for General Treatment Regimes Covariate Balancing Propensity Score for General Treatment Regimes Kosuke Imai Princeton University October 14, 2014 Talk at the Department of Psychiatry, Columbia University Joint work with Christian

More information

Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects

Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects Guanglei Hong University of Chicago, 5736 S. Woodlawn Ave., Chicago, IL 60637 Abstract Decomposing a total causal

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Online Appendix to Yes, But What s the Mechanism? (Don t Expect an Easy Answer) John G. Bullock, Donald P. Green, and Shang E. Ha

Online Appendix to Yes, But What s the Mechanism? (Don t Expect an Easy Answer) John G. Bullock, Donald P. Green, and Shang E. Ha Online Appendix to Yes, But What s the Mechanism? (Don t Expect an Easy Answer) John G. Bullock, Donald P. Green, and Shang E. Ha January 18, 2010 A2 This appendix has six parts: 1. Proof that ab = c d

More information

ANALYTIC COMPARISON. Pearl and Rubin CAUSAL FRAMEWORKS

ANALYTIC COMPARISON. Pearl and Rubin CAUSAL FRAMEWORKS ANALYTIC COMPARISON of Pearl and Rubin CAUSAL FRAMEWORKS Content Page Part I. General Considerations Chapter 1. What is the question? 16 Introduction 16 1. Randomization 17 1.1 An Example of Randomization

More information

Consistency Under Sampling of Exponential Random Graph Models

Consistency Under Sampling of Exponential Random Graph Models Consistency Under Sampling of Exponential Random Graph Models Cosma Shalizi and Alessandro Rinaldo Summary by: Elly Kaizar Remember ERGMs (Exponential Random Graph Models) Exponential family models Sufficient

More information

Dynamic Matrix-Variate Graphical Models A Synopsis 1

Dynamic Matrix-Variate Graphical Models A Synopsis 1 Proc. Valencia / ISBA 8th World Meeting on Bayesian Statistics Benidorm (Alicante, Spain), June 1st 6th, 2006 Dynamic Matrix-Variate Graphical Models A Synopsis 1 Carlos M. Carvalho & Mike West ISDS, Duke

More information

Causal inference from 2 K factorial designs using potential outcomes

Causal inference from 2 K factorial designs using potential outcomes Causal inference from 2 K factorial designs using potential outcomes Tirthankar Dasgupta Department of Statistics, Harvard University, Cambridge, MA, USA 02138. atesh S. Pillai Department of Statistics,

More information

The Bayesian Approach to Multi-equation Econometric Model Estimation

The Bayesian Approach to Multi-equation Econometric Model Estimation Journal of Statistical and Econometric Methods, vol.3, no.1, 2014, 85-96 ISSN: 2241-0384 (print), 2241-0376 (online) Scienpress Ltd, 2014 The Bayesian Approach to Multi-equation Econometric Model Estimation

More information

Structure learning in human causal induction

Structure learning in human causal induction Structure learning in human causal induction Joshua B. Tenenbaum & Thomas L. Griffiths Department of Psychology Stanford University, Stanford, CA 94305 jbt,gruffydd @psych.stanford.edu Abstract We use

More information

Bayesian Inference: Probit and Linear Probability Models

Bayesian Inference: Probit and Linear Probability Models Utah State University DigitalCommons@USU All Graduate Plan B and other Reports Graduate Studies 5-1-2014 Bayesian Inference: Probit and Linear Probability Models Nate Rex Reasch Utah State University Follow

More information

Knockoffs as Post-Selection Inference

Knockoffs as Post-Selection Inference Knockoffs as Post-Selection Inference Lucas Janson Harvard University Department of Statistics blank line blank line WHOA-PSI, August 12, 2017 Controlled Variable Selection Conditional modeling setup:

More information

CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68

CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68 CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68 References 1 L. Freeman, Centrality in Social Networks: Conceptual Clarification, Social Networks, Vol. 1, 1978/1979, pp. 215 239. 2 S. Wasserman

More information

Propensity Score Matching

Propensity Score Matching Methods James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 Methods 1 Introduction 2 3 4 Introduction Why Match? 5 Definition Methods and In

More information

Marginal Specifications and a Gaussian Copula Estimation

Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

Groups of vertices and Core-periphery structure. By: Ralucca Gera, Applied math department, Naval Postgraduate School Monterey, CA, USA

Groups of vertices and Core-periphery structure. By: Ralucca Gera, Applied math department, Naval Postgraduate School Monterey, CA, USA Groups of vertices and Core-periphery structure By: Ralucca Gera, Applied math department, Naval Postgraduate School Monterey, CA, USA Mostly observed real networks have: Why? Heavy tail (powerlaw most

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Simple Linear Regression Stat186/Gov2002 Fall 2018 1 / 18 Linear Regression and

More information

Bayesian Estimation of Prediction Error and Variable Selection in Linear Regression

Bayesian Estimation of Prediction Error and Variable Selection in Linear Regression Bayesian Estimation of Prediction Error and Variable Selection in Linear Regression Andrew A. Neath Department of Mathematics and Statistics; Southern Illinois University Edwardsville; Edwardsville, IL,

More information

How likely is Simpson s paradox in path models?

How likely is Simpson s paradox in path models? How likely is Simpson s paradox in path models? Ned Kock Full reference: Kock, N. (2015). How likely is Simpson s paradox in path models? International Journal of e- Collaboration, 11(1), 1-7. Abstract

More information

Extreme Value Analysis and Spatial Extremes

Extreme Value Analysis and Spatial Extremes Extreme Value Analysis and Department of Statistics Purdue University 11/07/2013 Outline Motivation 1 Motivation 2 Extreme Value Theorem and 3 Bayesian Hierarchical Models Copula Models Max-stable Models

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Likelihood and p-value functions in the composite likelihood context

Likelihood and p-value functions in the composite likelihood context Likelihood and p-value functions in the composite likelihood context D.A.S. Fraser and N. Reid Department of Statistical Sciences University of Toronto November 19, 2016 Abstract The need for combining

More information

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 Nasser Sadeghkhani a.sadeghkhani@queensu.ca There are two main schools to statistical inference: 1-frequentist

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

SIMULATION-BASED SENSITIVITY ANALYSIS FOR MATCHING ESTIMATORS

SIMULATION-BASED SENSITIVITY ANALYSIS FOR MATCHING ESTIMATORS SIMULATION-BASED SENSITIVITY ANALYSIS FOR MATCHING ESTIMATORS TOMMASO NANNICINI universidad carlos iii de madrid UK Stata Users Group Meeting London, September 10, 2007 CONTENT Presentation of a Stata

More information

Econometric analysis of models with social interactions

Econometric analysis of models with social interactions Brendan Kline (UT-Austin, Econ) and Elie Tamer (Harvard, Econ) 2017 - Please see the paper for more details, citations, etc. The setting Social interactions or peer effects, or spillovers, or...: My outcome/choice

More information

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Lucas Janson, Stanford Department of Statistics WADAPT Workshop, NIPS, December 2016 Collaborators: Emmanuel

More information

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:

More information

Empirical Bayes Moderation of Asymptotically Linear Parameters

Empirical Bayes Moderation of Asymptotically Linear Parameters Empirical Bayes Moderation of Asymptotically Linear Parameters Nima Hejazi Division of Biostatistics University of California, Berkeley stat.berkeley.edu/~nhejazi nimahejazi.org twitter/@nshejazi github/nhejazi

More information

Sensitivity analysis and distributional assumptions

Sensitivity analysis and distributional assumptions Sensitivity analysis and distributional assumptions Tyler J. VanderWeele Department of Health Studies, University of Chicago 5841 South Maryland Avenue, MC 2007, Chicago, IL 60637, USA vanderweele@uchicago.edu

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

CompSci Understanding Data: Theory and Applications

CompSci Understanding Data: Theory and Applications CompSci 590.6 Understanding Data: Theory and Applications Lecture 17 Causality in Statistics Instructor: Sudeepa Roy Email: sudeepa@cs.duke.edu Fall 2015 1 Today s Reading Rubin Journal of the American

More information

Identification and Estimation of Causal Effects from Dependent Data

Identification and Estimation of Causal Effects from Dependent Data Identification and Estimation of Causal Effects from Dependent Data Eli Sherman esherman@jhu.edu with Ilya Shpitser Johns Hopkins Computer Science 12/6/2018 Eli Sherman Identification and Estimation of

More information

PEARL VS RUBIN (GELMAN)

PEARL VS RUBIN (GELMAN) PEARL VS RUBIN (GELMAN) AN EPIC battle between the Rubin Causal Model school (Gelman et al) AND the Structural Causal Model school (Pearl et al) a cursory overview Dokyun Lee WHO ARE THEY? Judea Pearl

More information

Statistical and Computational Phase Transitions in Planted Models

Statistical and Computational Phase Transitions in Planted Models Statistical and Computational Phase Transitions in Planted Models Jiaming Xu Joint work with Yudong Chen (UC Berkeley) Acknowledgement: Prof. Bruce Hajek November 4, 203 Cluster/Community structure in

More information

Propensity Score Methods for Causal Inference

Propensity Score Methods for Causal Inference John Pura BIOS790 October 2, 2015 Causal inference Philosophical problem, statistical solution Important in various disciplines (e.g. Koch s postulates, Bradford Hill criteria, Granger causality) Good

More information