network-correlated outcomes

Size: px

Start display at page:

Download "network-correlated outcomes"

Rhoda Horn
6 years ago
Views:

1 Optimal design of experiments in the presence of network-correlated outcomes arxiv: v1 [stat.me] 3 Jul 2015 Guillaume W. Basse, Edoardo M. Airoldi Department of Statistics Harvard University, Cambridge, MA 02138, USA Guillaume W. Basse is a graduate student in the Department of Statistics at Harvard University (gbasse@fas.harvard.edu). Edoardo M. Airoldi is an Associate Professor of Statistics at Harvard University (airoldi@fas.harvard.edu). This work was partially supported by the National Science Foundation under grants CAREER IIS and IIS , and by the Office of Naval Research under grant YIP N Guillaume W. Basse is a Google Fellow in Statistics. Edoardo M. Airoldi is an Alfred Sloan Research Fellow, and a Shutzer Fellow at the Radcliffe Institute for Advanced Studies. The authors are grateful to Hugh Chipman, Tirthankar Dasgupta, and Donald B. Rubin for insightful and constructive discussion that helped to greatly improve this paper.

2 Abstract We consider the problem of how to assign treatment in a randomized experiment, when the correlation among the outcomes is informed by a network available preintervention. Working within the potential outcome causal framework, we develop a class of models that posit such a correlation structure among the outcomes, and a strategy for allocating treatment optimally, for the goal of minimizing the integrated mean squared error of the estimated average treatment effect. We provide insights into features of the optimal designs via an analytical decomposition of the mean squared error used for optimization. We illustrate how the proposed treatment allocation strategy improves on allocations that ignore the network structure, with extensive simulations. Keywords: Causal inference; Social network data; Design of experiments; Optimal treatment allocation.

3 Contents 1 Introduction Causal inference setup and estimands Related work Contributions Optimal design with network-correlated outcomes An estimator for the average treatment effect and its mean squared error Nuisance parameters and integrated mean squared error Modeling network-correlated outcomes Comparative performance analysis Set-up Performance evaluation under correct prior specifications Robustness under prior mis-specification Quantifying sources of variation in the integrated mean squared error Discussion Decomposition of the mean square error and interpretation An alternative strategy to find good designs based on point priors Alternative conceptualizations to leverage social connections in causal analyses Optimal design and the role of randomization References 20 A Derivation of mean squared error for the proposed models 25 B Integrability of the mean squared error 25

4 1 Introduction The past decade has witnessed a surge of interest in causal analyses in the context of social networks, social media platforms and online advertising (e.g., Christakis and Fowler, 2007; Aral et al., 2009; Bakshy et al., 2011, 2012; Bond et al., 2012; Gui et al., 2015; Phan and Airoldi, 2015). The challenging aspect of these applications, generally, is the presence of connections, or network data, observed pre-intervention, possibly with uncertainty, and often missing. For instance, we might be interested in identifying which users should be offered a social coupon that enables them to play a new online multiplayer game for free, with their Facebook friends. Here, the units of analysis are users and the connections are given by the observed friendship links among users on Facebook. Or we might be interested in identifying which advertisers should be invited to bid in an auction to place ads on a banner on the CNN front page. Here, the units are auctions and a connection amounts to having a number of advertisers participating in a pair of auctions. While there is a well-developed literature on several aspects of the statistical analysis of network data (e.g., Wasserman, 1994; Kolaczyk, 2009; Goldenberg et al., 2010; Bickel and Chen, 2009), the literature about statistical methods and theory for experimentation in the social and information sciences that leverages observed connections is at a budding stage (e.g., Rosenbaum, 2007; Hudgens and Halloran, 2008; Parker, 2011; Aronow and Samii, 2013; Ogburn and VanderWeele, 2014). Moreover, even when considering the average treatment effect, or average exposure effects, as the inferential targets of interest, multiple conceptualizations and assumptions are possible, which require different methodological approaches to achieve valid inference when analyzing the same experiment (Imbens and Rubin, 2015; Airoldi and Rubin, 2015). Here, we consider the problem of how to assign treatment in a causal experiment, when the correlation among the outcomes is informed by a network, available pre-intervention. 1

Suppose that the two possible assignments correspond to the estimators ˆτ a = Y 1 Y 3 and ˆτ b = Y 1 Y 2. Looking only at the variance for now, one can check that V(ˆτ a ) = 2 > 0.

5 Figure 1: Illustration of different assignments in the presence of observed connections. To illustrate the problem, consider the scenario in Figure 1 where the variance of Y i is V(Y i ) = 1 and the edge between nodes indicate a correlation of 0.9. Suppose that the two possible assignments correspond to the estimators ˆτ a = Y 1 Y 3 and ˆτ b = Y 1 Y 2. Looking only at the variance for now, one can check that V(ˆτ a ) = 2 > 0.2 = V(ˆτ b ), which illustrates two important points: first, the variance of the estimator depends significantly (by a factor of 10 here) on the assignment strategy, and second, the assignment that leverages the correlation structure (right panel) performs better that the assignment that mimics the independent case (left panel). The rest of this Section sets-up the problem, and places our contributions in the context of related work. The proposed methods are presented in Section 2 and illustrated with extensive simulations in Section 3. We discuss the results in Section Causal inference setup and estimands Let G be an undirected graph, and N be the number of nodes. Working within the potential outcomes framework (Rubin, 1974; Holland, 1986; Imbens and Rubin, 2015), we denote by Z= (Z 1,..., Z N ) the assignment vector, where Z i = 1 if unit i is assigned to treatment. Accepting the stable unit treatment value assumption (SUTVA henceforth, Rubin, 1980), the potential outcome of unit i depends only on its own treatment assignment Z i, therefore we can write its potential outcomes as Y i (Z i ). We can then define the vectors Y(0) = (Y 1 (0),..., Y N (0)) and Y(1) = (Y 1 (1),..., Y N (1)) for notational convenience. Causal inference methodology for analyzing randomized experiments, using potential outcomes, can be categorized into two general approaches: model-free and model-based (Imbens 2

6 and Rubin, 2015). The model-free approach, including the classical methodology by Neyman and Fisher (Fisher, 1922, 1954; Neyman, 1990; Speed, 1990; Rubin, 1990), posits the potential outcomes are constants, and the randomness comes from the assignment mechanism. The model-based approach posits the potential outcome vectors Y(0) and Y(1) are random variables (Rubin, 1978, 2005). This implies a joint distribution for (Y(1), Y(0)). Estimands of interest, denoted τ = τ(y(1), Y(0)), are also a random variables in this approach. In this paper we consider a set-up in between. Specifically, we consider a class of degenerate joint distributions for the potential outcomes vectors, where the estimand τ is a constant, Y(0) F (θ) s.t Cov(Y(0)) = Φ Y (G) Y(1) = Y(0) + τ1, where Φ Y (G) is a variance-covariance matrix, not necessarily diagonal, which depends on the network structure, as implied by the model introduced in Equations 7 9, and fully worked out in Equation 18, and θ denotes the parameters of the model for Y(0). 1.2 Related work The problem of design and analysis of causal experiments in situations where social connections (i.e., a social network) are available pre-intervention has recently received considerable attention. Two issues may arise from the availability of social connections: (i) social interference, that is, a situation where the potential outcomes for unit i are a function of the treatment assignments of other units that are related to unit i through the network, or they are function of the observed outcomes of these units, (ii) network-correlated outcomes, an alternative setting where the network informs the correlation among the potential outcomes, because the potential outcomes for unit i are a function of its covariates and those of other 3

7 units. Much of the current literature deals with social interference, as it is relevant to the estimation of peer effects, or peer influence (e.g., see Parker, 2011; Airoldi et al., 2012; Manski, 2013; Toulis and Kao, 2013; Aronow and Samii, 2013; Ugander et al., 2013; Eckles et al., 2014) on social networks and social media platforms (Aral et al., 2009; Bakshy et al., 2012; Bond et al., 2012; Ogburn and VanderWeele, 2014). When the inferential target is some causal measure of peer influence, SUTVA is an untenable assumption, since this assumption implies the causal effects of interference are zero by definition. Here, we focus on the case of network-correlated outcomes, assuming SUTVA holds. The scenario we consider subsumes a recent conceptualization of homophily (McPherson et al., 2001) as a special case. For a more precise discussion of the technical implications and differences between these settings, we refer to Section 4.3. A number of ideas developed in the classical literature on design of experiments can be adapted to the setting of designing experiments in the presence of network-correlated outcomes. Of particular interest to us is the line of work pioneered by Box and Lucas (1959), Draper and Hunter (1967), and Chaloner and Larntz (1989) on the design of experiments with non-linear response surfaces, and on strategy for how to deal with technical problems that arise in those situations. As we will show in Section 2.2, the optimal design in our setting also depends on the values of some unknown parameters. Similarly relevant to our work are the issues raised in Parker (2011). 1.3 Contributions Here, we offer three main contributions. First, we formalize the problem of optimal experimental design in the presence of network-correlated outcomes, and propose a general methodology for dealing with nuisance parameters at the design stage. Second, we introduce a class of models for network-correlated outcomes, for which we can compute the mean 4

8 squared error (MSE henceforth) in closed form, and we show empirically that minimizing the mean squared error implied by these models leads to significantly better treatment assignments than those that are currently considered optimal, but that ignore the network structure available at the design stage. Third, for a specific instance of our model, we show that the mean squared error decomposes into a sum of terms that have a clear interpretation. This decomposition offers valuable insights into desirable and undesirable unit assignments strategies for experiments on networks. 2 Optimal design with network-correlated outcomes We are interested in estimating the average treatment effect, under SUTVA, defined as τ 1 N i ( Yi (Z i = 1) Y i (Z i = 0) ), (1) or, equivalently, τ = Y(1) Y(0), as a consequence of the additivity assumption in Section An estimator for the average treatment effect and its mean squared error In order to estimate the average treatment effect, we propose the following estimator: ˆτ Z = 1 N 1 (Z) {i:z i =1} Y i(1) 1 N 0 (Z) {i:z i =0} Y i(0) (2) where N 0 (Z) and N 1 (Z) are the number of units assigned to the control and the treatment groups, respectively. If we measure the quality of our estimator using its mean squared error, then a natural goal when designing a randomized experiment is to chose the treatment 5

9 assignment vector Z that minimizes the mean squared error, Z opt = argmin Z MSE θ (ˆτ Z, τ). (3) The mean squared error can be written more explicitly as MSE θ (ˆτ Z, τ) = ω(z) Cov θ (Y(Z)) ω(z) (4) 1 + ( N 1 (Z) {i:z i =1} E θ(y i (0)) 1 N 0 (Z) {i:z i =0} E θ(y i (0))) 2 where θ parameterizes the joint model for the potential outcomes, and ω(z) is a vector of contrasts, with entries ω i (Z) = ω i (Z i ) = Z i /N 1 (1 Z i )/N 0 for i = 1... N. If we knew the parameter vector θ, the design problem would thus reduce to an optimization problem with the mean squared error as objective function. Such an optimization could be carried using a stochastic search algorithm over the possible treatment assignments. 2.2 Nuisance parameters and integrated mean squared error In general, the parameter vector θ is an unknown nuisance parameter, so we cannot evaluate the expression in Equation (4) directly. The presence of nuisance parameters in the objective function is a common issue in optimal design problems with nonlinear response surfaces. A popular way to address it is to posit a prior distribution for the nuisance parameters and integrate the mean squared error over the prior support (e.g., see Draper and Hunter, 1967; Chaloner and Verdinelli, 1995). Similarly, we modify our goal for designing an optimal experiment; we will seek the treatment assignment vector Z that minimizes the integrated 6

10 mean squared error, also known as Bayes risk in the optimal Bayesian design literature, Z opt = argmin Z MSE θ (ˆτ Z, τ) π(θ) dθ (5) argmin Z imse π (ˆτ Z, τ). However, it is difficult to obtain a closed form expression for the integrated MSE. Instead, we aim to minimize an estimator of the imse, obtained via Monte Carlo approximation, Z opt = argmin Z 1 n n MSE θi (ˆτ Z, τ) (6) i=1 argmin Z îmse π (ˆτ Z, τ), where θ i π. Conditions for the existence of the integrated mean squared error for the class of models considered in Section 2.3 are given in appendix. We study the robustness of the proposed method to variation in the estimated imse π in Section 3.4. Remark. A special case of this approach is to chose a point mass θ = θ 0 as a prior distribution (Box and Lucas, 1959). This approach is interesting to consider in our analysis, since for the proposed models the mean squared error is available in closed form. We leverage this observation to study the impact of different priors on the type of designs considered optimal in Section Modeling network-correlated outcomes In our exposition so far, we have abstracted away the network, simply assuming the existence of a correlation structure between outcomes. In the context of statistical network analysis (Kolaczyk, 2009; Goldenberg et al., 2010), many scientists have noticed that nodes sharing edges are more likely to share some similarity than unconnected nodes. This phenomenon, often referred to in the literature as homophily (McPherson et al., 2001; Hoff et al., 2001), 7

11 motivates the notion that the network structure might inform the correlation structure among the observed outcomes, when units in the population are connected through a social network. We introduce a simple class of models capturing this idea. A key motivation for the proposed models is that their simplicity leads to a closed form expression for the mean squared error of the estimator for the average treatment effect, which can be easily interpreted as discussed in Section 4.1. Let G be a network with N nodes (i.e., units), and let N i be the size of the closed neighborhood of node i (i.e., number of neighbors of node i, including node i). Define A the adjacency matrix (Kolaczyk, 2009) associated to the network G and A = A + I where I is the identity matrix. We propose the following class of models, which can be subsumed into the framework of Section 1.1, Y i (0) X ind NEF ( j N i X j, Λ( j N i X j ) ), for i = 1... N (7) X j iid NEF ( µ, φ X (µ) ), for j = 1... N (8) Y i (1) = Y i (0) + τ, for i = 1... N, (9) where Λ( j N i X j ) are positive scalars for i = 1... N and imply a diagonal specification for Φ Y X (µ), φ X (µ) is a positive scalar, and NEF(γ, Λ(γ)) denotes the natural exponential family distribution (Morris, 1982) with mean γ and variance function Λ(γ). The natural exponential family includes many popular distributions, making the proposed specifications flexible enough to model both continuous and discrete outcomes (Morris and Lock, 2009). Importantly, for these models, we can write the mean squared error of the estimator of the average treatment effect as MSE(ˆτ Z, τ) = µ 2 (δ N (Z)) 2 + ω(z) [ A Aφ X (µ) + diag(e(λ( j N i X j ))) ] ω(z) (10) 8

12 where the scalar, δ N (Z) = W i =1 N i W N 1 i =0 N i N 0, measures the difference in the average size of the neighborhoods for the units in the treatment and control groups. See the Appendix for details. The following two examples illustrate the flexibility of the class of models we introduced. Example 1 (Normal-Normal model). Consider the following model, Y i X ind Normal ( j N i X j, γ 2 ) X j iid Normal (µ, σ 2 ). This is a continuous instance of the general model introduced above, where we set µ = µ, φ X (µ) = σ 2, and Λ( j N i X i ) = γ 2. We can then write the mean squared error as, ( ) MSE(ˆτ Z, τ) = µ 2 (δ N ) 2 + σ 2 ω ( γ2 I + A A)ω. (11) σ 2 Example 2 (Poisson-Gamma model). Consider the following model, Y i X ind Poisson ( j N i X j ) X j iid λ Gamma (r). This is an discrete instance of the general model introduced above, where we set µ = rλ, φ X (µ) = µλ, and Λ( j N i X i ) = j N i X i. We can then write the mean squared error as, MSE(ˆτ Z, τ) = λ 2 r ( (δ N ) 2 + ω (A A + 1 λ diag(a A))ω ). (12) 9

13 3 Comparative performance analysis Here, we investigate the performance of the proposed methodology, empirically, and compare the MSE achieved by the optimal treatment assignment with the MSE of other design strategies, which either ignore the availability of social connections at the design stage, or that use them in a naïve way by clustering the network and using the clusters as strata. 3.1 Set-up We generated 100 networks from each of four different network models: Erdos-Renyi, Small- World, power-law degree distribution, and stochastic blockmodel (Goldenberg et al., 2010). For each network, we generated outcomes following the normal model in Section 2.3 with priors, γ 2 Inv-Gamma (r γ, λ γ ) µ Normal (µ 0, σ 0 ) σ 2 Inv-Gamma (r σ, λ σ ), where r γ = 3, λ γ = 1, µ 0 = 1, σ 0 = 0.5, λ σ = 1, and r σ = 2. We then evaluated the proposed optimal treatment assignment, as well as the assignments obtained by alternative strategies, using the integrated mean squared error evaluated under the true model. In Section 3.2, we consider the ideal case, in which we know the complete data generating process, and we use an estimate of the true integrated mean squared error as the objective function. In Section 3.3, we investigate the behavior of the proposed optimal treatment allocation strategy in a realistic situation, in which we have no knowledge of the hyper-parameters and we specify them incorrectly. 10

14 3.2 Performance evaluation under correct prior specifications Here, we investigate the performance of the proposed method when we know the full data generating process, and compare it with two other strategies. The first strategy ignores the network structure, and yields balanced randomized designs. The second strategy leverages the network structure by first partitioning the network into clusters, and then performing balanced randomizations within clusters, thus mimicking a stratified design. We evaluate the designs obtained from each method by comparing their imse computed under the true prior. Figure 2 shows a histogram of the imse of the proposed method relative to the imse of the naïve allocation strategy, which completely ignore the network structure, on the X axis. Figure 3 shows a similar histogram of the imse of the proposed method relative to the imse of the stratified randomized allocation, where strata were obtained using spectral clustering. The different panels of both figures report the relative imse results for the four network models separately, as indicated on the top of each panel. The results clearly indicate that the proposed method reduces the imse by at least 25% (the histogram bars fall below 0.75) in the majority of the simulations when compared to the na ve allocation strategy, independently of the network model, and by at least 20% (the histogram bars fall below 0.80) in the majority of the simulations when compared to the stratified allocation strategy, independently of the network model. 3.3 Robustness under prior mis-specification Here, we investigate the behavior of our method when we use the proposed generative model, but specify the wrong hyper-parameters. More in detail, we explored the comparative performance of the proposed methods over a range of ten priors for µ (i.e., we minimize the imse corresponding to the wrong priors), by letting the corresponding pair of hyper-parameters 11

15 ER power law count 0 sbm small world imse/balanced Figure 2: Histograms of the imse achieved with the proposed design relative to the imse obtained with a randomized balanced design, for outcomes generated according to four network models. ER power law count 0 sbm small world imse/spectral Figure 3: Histograms of the imse achieved with the proposed design relative to the imse obtained with a stratified design, for outcomes generated according to four network models. (µ 0, σ 0 ) take the following values: (1,0.5), (2,0.7), (5,1), (7, 1.2), (10,1.5), (15, 2), (20, 2.5), (30,3), (40,4), and (50, 5). The first prior (µ 0 = 1, σ 0 = 0.5) corresponds to the prior used to generate the outcome data, and the last one (µ 0 = 50, σ 0 = 5) corresponds to a prior that is 12

16 ER power law sbm small world true balanced true balanced true balanced true balanced factor(prior) imse Figure 4: Robustness of our method to prior mis-specifications very far from the correct distribution. Note that the integrated mean squared error is most sensitive to the parameter µ in the data generating process, so we are exploring a serious case of prior mis-specification. Figure 4 summarizes the performance of the proposed method when we use the wrong prior, over different network models. The results suggest that, although the performance degrades as the prior used to find the optimal design is further from the truth, as expected, the imse does not degrade substantially, and the proposed assignment significantly outperforms, on average, the random balanced designs provided for comparison in the rightmost column. Another attractive feature of the proposed design strategy is that it reduces the variation in the tails of the achieved imse; that is, we do not observe the kind of long tails we see for the random balanced designs. We offer some intuition behind the empirical robustness of the proposed strategy in Section

17 3.4 Quantifying sources of variation in the integrated mean squared error Here, we carry out analysis of variance on collection of comparative performance evaluation results presented above. The goal is to provide quantitative evidence of the robustness of the proposed optimal treatment assignment strategy to prior misspecification, and to compare the contribution of the sources of variation we considered in our simulation study (namely, treatment allocation strategy, network models, replications, prior specification/misspecification, variability of the estimated imse objective) to the overall integrated mean squared error. The choice of treatment allocation strategy (completely randomized, stratified/spectral, minimum imse) explains most of the variance observed in the simulations (MSS = 11,673), while the choice of prior hyper-parameters and the particular network model regime have little impact (MSS = 10.9 and MSS = 14.2, respectively). As described in Section 3.1, the minimization of imse is performed using a stochastic search algorithm over the space of treatment allocation vectors, Z Z, which means that even though the parameters underlying the optimization problem are the same, the optimal allocation vectors obtained for each replication might differ. A measure of the variation in the integrated mean squared error due to the stochastic nature of this minimization is given by the residuals of the analysis of variance, which is the smallest source of variation in the imse (MSS=3.0) when compared to the factors. This is reassuring. As we discuss in Section 2.2, we do not have a closed form formula for the integrated mean squared error. Instead, we use a Monte-Carlo estimate of this objective used for finding the optimal treatment assignment. Given the stochastic nature of the imse estimation routine, which is separate form the stochastic nature of the stochastic search that finds the optimal treatment assignment, it is difficult to analytically assess the impact of this approximation. 14

18 On the one hand, its magnitude is also captured by the residual sum of squares reported above (MSS=3.0). On the other hand, we can offer additional empirical evidence that the estimation of the IMSE objective is well behaved. We considered pairs of Monte-Carlo estimates, îmse 1 and îmse 2, of the same objective function, imse. If Z A and Z B are any two designs, such that, imse (Z A ) imse (Z B ) then we observe empirically that we have îmse 1 (Z A ) îmse 1 (Z B ) and îmse 2 (Z A ) îmse 2 (Z B ) in the vast majority of simulations. In other words, different estimates of the objective function are unlikely to alter the ranking of treatment assignments, in terms of their imse. So if an assignment is better than another, under an estimate of the objective function, it is also likely to be better under the true imse. The only unexplored source of variation at this point is given by mis-specifications of the model for the outcomes itself. We expect this to be qualitatively smaller than the variation due to the treatment allocation strategy (the main source of variation in the analysis above) because of the flexibility of the proposed model in explaining variation in the data, which can be inferred from its similarities to other models of network data that encode homophily, and to other exchangeable network models(e.g., see Wasserman, 1994; Kolaczyk, 2009; Goldenberg et al., 2010). 4 Discussion 4.1 Decomposition of the mean square error and interpretation The simulations in Section 3.3 show that the proposed method to find optimal treatment allocations is fairly robust to prior mis-specification. Robustness to prior mis-specification was also observed on simulation experiments, not reported here, when optimizing the mean squared error using point priors (Box and Lucas, 1959). Here, we investigate this feature in more details, by studying the analytical form of the MSE under the normal model introduced 15

19 in Section 2.3, MSE Z (ˆτ, τ) = µ 2 (δ N (Z)) 2 + γ 2 ω(z) ω(z) + σ 2 ω(z) A Aω(Z). }{{}}{{} bias 2 variance Below, we expand each of the terms and offer some intuition about what they capture. Interpretation of the bias term. The bias term can be expanded as follows, µ 2 (δ N ) 2 = µ 2 ( 1 N 1 {i:z i =1} N i 1 N 0 {i:z i =0} N i ) 2. (13) The bias is proportional to the difference in the average size of the neighborhoods for the units in the treatment and control groups. Intuitively, this difference measures a lack of balance between the two groups, in terms of their network characteristics specifically, the average degree. Larger values of the mean µ amplify this average imbalance in neighborhood sizes. Since the experimenter does not have control over µ, desirable treatment assignments have minimize bias by balancing the units assigned to treatment and control in terms of neighborhood sizes. Interpretation of the variance terms. The first variance term is, γ 2 ω ω = γ 2 ( 1 N N 0 ), (14) is minimized when N 1 = N 0. Intuitively, this term penalizes lack of balance between the sizes of the treatment and control groups. Larger values of the parameter γ amplify this average imbalance in group sizes. This is consistent with classical results on the optimality of balanced randomizations for estimating the average treatment effect (Fisher, 1954; Imbens and Rubin, 2015). 16

20 The second variance term involves features of the network, σ 2 ω A Aω = σ2 N σ2 N 2 0 {i,j:z i =Z j =1} N i N j (15) {i,j:z i =Z j =0} N i N j (16) 2σ2 N 1 N 0 {i,j:z i =1 and Z j =0} N i N j. (17) The term on right hand side of Equation 15 is proportional to the average number of neighbors in common between pairs of units assigned to the treatment group. The term in Equation 16 is proportional to the average number of neighbors in common between pairs of units assigned to the control group. The term in Equation 17 is proportional to the average number of neighbors in common between pairs of units, one assigned to the treatment and one assigned to the control groups. Intuitively, looking at the signs in front of these three factors, the second variance term is minimized by assigning units with shared neighbors to different groups, and by avoiding assigning treatment or control to entire clusters of units that are densely connected. The interpretation of the factors in the decomposition of the mean squared error presented above suggests an explanation for the empirical robustness to prior mis-specifications (of the proposed methodology for finding optimal designs) reported in Section 3.3. Prior misspecifications can be expected to have negligible effects on the mean squared error, as long as the treatment allocation respects a few key measures of balance. Namely, balance in the sizes of the treatment and control groups, balance in the average size of the neighborhoods for units in the treatment and control groups, balance in the assignment of units with shared neighbors to different groups, balance in assignment of treatment to dense clusters. These quantities are not a function of the prior hyper-parameters, and thus are unaffected by prior mis-specifications. 17

21 4.2 An alternative strategy to find good designs based on point priors An arguably simpler approach to optimal design, which builds on the notion of point priors (Box and Lucas, 1959) noted in Section 2.2, which also requires a prior distribution π(θ) on the parameter space, is the following three-steps procedure. First, consider multiple sets of parameters θ 1,..., θ k, and the associated mean squared error functions MSE θ1,..., MSE θk, and obtain k optimal designs Z (1),..., Z (k) by minimizing each of the k MSE functions separately. Second, evaluate each design under all the MSE functions; let s denote Γ ij MSE θj (Z (i) ). Third, build a loss function for Z, by averaging the mean squared error achieved by Z across the sets of parameters θ 1,..., θ k with weights equal to the density of the corresponding parameter sets under the prior distribution. That is, for each Z (i) we define the following loss, L(Z (i) ) = k j=1 Γ i,j π(θ j ). We then select the design with the smallest loss as the best among the k candidates. This method has the benefit of not relying on a numerical approximation for the integrated mean squared error, but has several limitations in practice. It requires running a stochastic optimization algorithm for each set of parameters. The choice of the k sets of parameters is somewhat arbitrary, and quickly runs into issues if the parameters space is large and/or multidimensional. In simulations (not reported here) this simpler strategy did not perform as well as the proposed approach. 18

22 4.3 Alternative conceptualizations to leverage social connections in causal analyses In Section 1.2 we hinted at how the notions of social interference and network-correlated outcomes relate to and differ form each other. The distinction is subtle, since both settings involve correlation structure among the outcomes. To appreciate the differences, let us consider how the correlation structure arises in typical causal settings where models for the outcomes are considered (Rubin, 1978). Let Σ be a diagonal covariance matrix, and Φ Z (G), Φ Y (G) be two full covariance matrices both functions of the network G. Then we can write, Cov(Y(Z) Z) = Σ = Cov(Y(Z)) Cov(Y(Z) Z) = Φ Y (G) = Cov(Y(Z)) Cov(Y(Z) Z) = Σ no interference network-correlated outcomes (our set-up) interference Cov(Y(Z)) = Φ Z (G) In the absence of interference, the potential outcomes are independent both conditionally and marginally. In our set-up, there is correlation among the potential outcomes conditionally on the assignment, as well as marginally. In the case of interference, the potential outcomes are independent conditionally on the assignment, but not marginally. The correlation arises when integrating out the assignment mechanism. The effects of interference and networkcorrelated outcomes cannot be identified in observational studies, in the absence of any assumptions (Shalizi and Thomas, 2011). Carefully designed experiments are necessary. We offered minimal assumptions and a strategy to design optimal experiments in the presence of network-correlated outcomes. 19

23 4.4 Optimal design and the role of randomization There is a lively debate in the optimal experimental design literature, between those supporting a set of optimal treatment assignments to enable randomization in practice (Harville, 1975; Kempthorne, 1977), mostly in the sciences and healthcare, and those favoring a single optimal treatment assignment (Kiefer, 1959), mostly in industrial engineering. In practice, randomization balances, on average, the effect of unobserved and potentially relevant covariates so called confounders. More generally, randomization improves the robustness of inference to model mis-specifications. In addition, when working within the Bayesian framework, randomization is an ignorable assignment mechanism that greatly simplifies inference (Rubin, 1978). The proposed methodology yields a trivially ignorable assignment mechanism, thus subsequent analyses are not any more complicated than if we had considered randomization instead. We are not leveraging covariates for balance, although these checks are simple to add. However, the proposed strategy is ultimately informed by the model specification, although we have illustrated the robustness to prior mis-specifications. There is an important place for randomization in designing optimal experiments in the presence of network-correlated outcomes. We have recently obtained results in this directions in the case of interference, and we plan on leveraging some of the insights in the setting we considered here, elsewhere. References Airoldi, E. M. and Rubin, D. B. (2015). Principles for leveraging information about a social network when designing and analyzing experiments. Proceedings of the National Academy of Sciences. Forthcoming. Airoldi, E. M., Toulis, P., Kao, E., and Rubin, D. B. (2012). Causal estimation of peer 20

24 influence effects. In NIPS Workshop on Social Network and Social Media Analysis. Aral, S., Muchnik, L., and Sundararajan, A. (2009). Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51): Aronow, P. M. and Samii, C. (2013). Estimating average causal effects under interference between units. arxiv preprint arxiv: Bakshy, E., Hofman, J. M., Mason, W. A., and Watts, D. J. (2011). Everyone s an influencer: quantifying influence on twitter. In Proceedings of the fourth ACM international conference on Web search and data mining, pages ACM. Bakshy, E., Rosenn, I., Marlow, C., and Adamic, L. (2012). The role of social networks in information diffusion. In Proceedings of the 21st international conference on World Wide Web, pages ACM. Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and newman girvan and other modularities. Proceedings of the National Academy of Sciences, 106(50): Bond, R. M., Fariss, C. J., Jones, J. J., Kramer, A. D., Marlow, C., Settle, J. E., and Fowler, J. H. (2012). A 61-million-person experiment in social influence and political mobilization. Nature, 489(7415): Box, G. E. and Lucas, H. (1959). Design of experiments in non-linear situations. Biometrika, pages Chaloner, K. and Larntz, K. (1989). Optimal bayesian design applied to logistic regression experiments. Journal of Statistical Planning and Inference, 21(2): Chaloner, K. and Verdinelli, I. (1995). Bayesian experimental design: A review. Statistical Science, pages

25 Christakis, N. A. and Fowler, J. H. (2007). The spread of obesity in a large social network over 32 years. New England journal of medicine, 357(4): Draper, N. R. and Hunter, W. G. (1967). The use of prior distributions in the design of experiments for parameter estimation in non-linear situations. Biometrika, 54(1-2): Eckles, D., Karrer, B., and Ugander, J. (2014). Design and analysis of experiments in networks: Reducing bias from interference. arxiv preprint arxiv: Fisher, R. A. (1922). On the interpretation of χ 2 from contingency tables, and the calculation of P. Journal of the Royal Statistical Society, 85(1): Fisher, R. A. (1954). Statistical Methods for Research Workers. Oliver and Boyd. Goldenberg, A., Zheng, A. X., Fienberg, S. E., and Airoldi, E. M. (2010). A survey of statistical network models. Foundations and Trends R in Machine Learning, 2(2): Gui, H., Xu, Y., Bhasin, A., and Han, J. (2015). Network A/B testing: From sampling to estimation. In International World Wide Web Conference (WWW), pages , Florence, Italy. Harville, D. A. (1975). Experimental randomization: who needs it? The American Statistician, 29(1): Hoff, P. D., Raftery, A. E., and Handcock, M. S. (2001). Latent space approaches to social network analysis. Journal of the American Statistical Association, 97: Holland, P. W. (1986). Statistics and causal inference. Journal of the American statistical Association, 81(396): Hudgens, M. G. and Halloran, M. E. (2008). Toward causal inference with interference. Journal of the American Statistical Association, 103(482). 22

26 Imbens, G. W. and Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press. Kempthorne, O. (1977). Why randomize? Journal of Statistical Planning and Inference, 1(1):1 25. Kiefer, J. (1959). Optimum experimental designs. Journal of the Royal Statistical Society. Series B (Methodological), pages Kolaczyk, E. D. (2009). Statistical analysis of network data: methods and models. Springer Science & Business Media. Manski, C. F. (2013). Identification of treatment response with social interactions. The Econometrics Journal, 16(1):S1 S23. McPherson, M., Smith-Lovin, L., and Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual review of sociology, pages Morris, C. N. (1982). Natural exponential families with quadratic variance functions. The Annals of Statistics, pages Morris, C. N. and Lock, K. F. (2009). Unifying the named natural exponential families and their relatives. The American Statistician, 63(3): Neyman, J. (1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 5(4): Translated and edited by D. M. Dabrowska and T. P. Speed from Polish original, which appeared in Roczniki Nauk Rolniczych Tom X (1923) 1 51 (Annals of Agricultural Sciences). Ogburn, E. L. and VanderWeele, T. J. (2014). Vaccines, Contagion, and Social Networks. ArXiv e-prints. Parker, B. M. (2011). Design of networked experiments. Presented at the Isaac Newton Institute, Cambridge University, UK. 23

27 Phan, T. Q. and Airoldi, E. M. (2015). A natural experiment of social network formation and dynamics. Proceedings of the National Academy of Sciences, 112(21): Rosenbaum, P. R. (2007). Interference between units in randomized experiments. Journal of the American Statistical Association, 102(477): Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5): Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. The Annals of Statistics, pages Rubin, D. B. (1980). Comment. Journal of the American Statistical Association, 75(371): Rubin, D. B. (1990). Comment: Neyman (1923) and causal inference in experiments and observational studies. Statistical Science, 5(4): Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469): Shalizi, C. R. and Thomas, A. C. (2011). Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research, 40(2): Speed, T. P. (1990). Introcutory remarks on Neyman (1923). Statistical Science, 5(4): Toulis, P. and Kao, E. (2013). Estimation of causal peer influence effects. Journal of Machine Learning Research, W&CP, 28(3): Ugander, J., Karrer, B., Backstrom, L., and Kleinberg, J. (2013). Graph cluster randomization: Network exposure to multiple universes. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages ACM. 24

28 Wasserman, S. (1994). Social network analysis: Methods and applications, volume 8. Cambridge university press. A Derivation of mean squared error for the proposed models First we derive the following correlation for the model in Equations 7 9, N i N j φ X (µ), if i j Cov(Y i, Y j ) = N i N j φ X (µ) + E ( Λ( j N i X j ) ), if i = j, (18) which gives the element i, j of the marginal covariance matrix Φ Y (G). It follows, ( V(ˆτ Z ) = ω (A Aφ X (µ) + diag E ( Λ( j N i X j ) ))) ω. (19) The bias is, Bias(ˆτ Z, τ) = µ 2 (δ N ) 2, (20) which gives the overall mean squared error in Equation 10, ( MSE(ˆτ Z, τ) = µ 2 (δ N ) 2 + ω (A Aφ X (µ) + diag E ( Λ( j N i X j ) ))) ω. B Integrability of the mean squared error In the general model in Equations 7 9, we distinguish terms µ 2, φ X (µ), E(Λ( j N i )) from terms that only depend on the network G and on the assignment vector Z. The latter set of terms are not problematic to integrate since, due to the discrete and finite nature of 25

29 the network, which we assume fixed and available pre-intervention, they can be bounded by constants. However, the former terms all depend on parameters and might raise integrability issues. We note that these terms enter the problem linearly, so it is enough to check that they are integrable individually. For the normal-normal model in example 1 with prior specifications in Section 3.1 we have, µ 2 π(θ) dθ = σ µ 2 0 φx (µ)π(θ) dθ = λ σ r σ 1 E ( Λ( j N i ) ) π(θ) dθ = λ γ r γ 1 if r σ > 1 if r γ > 1. For the poisson-gamma model in example 2 with a Gamma prior for r and an Inverse-Gamma prior for λ, similar derivations confirm the integrability of these quantities as well. 26

Model-assisted design of experiments in the presence of network correlated outcomes

Model-assisted design of experiments in the presence of network correlated outcomes arxiv:1507.00803v4 [stat.me] 18 May 2017 Guillaume W. Basse, Edoardo M. Airoldi Department of Statistics Harvard University,