September 25, Abstract

Size: px

Start display at page:

Download "September 25, Abstract"

Anne Carr
5 years ago
Views:

1 On improving Neymanian analysis for 2 K factorial designs with binary outcomes arxiv: v1 [stat.me] 12 Mar 2018 Jiannan Lu 1 1 Analysis and Experimentation, Microsoft Corporation September 25, 2018 Abstract 2 K factorialdesignsarewidely adoptedbystatisticiansand the broaderscientificcommunity. In this short note, under the potential outcomes framework (Neyman, 1923; Rubin, 1974), we adopt the partial identification approach and derive the sharp lower bound of the sampling variance of the estimated factorial effects, which leads to an improved Neymanian variance estimator that mitigates the over-estimation issue suffered by the classic Neymanian variance estimator by Dasgupta et al. (2015). Keywords: Partial identification; potential outcome; randomization; robust inference 1. INTRODUCTION Originally introduced for agricultural research at the famous Rothamsted Experimental Station more than a century ago (Fisher, 1926; Yates and Mather, 1963), randomized controlled factorial designs (Fisher, 1935) have been widely adopted by researchers in social, behavior and biomedical sciences to simultaneously assess the main and interactive causal effects of multiple treatment factors. In applied research, a very frequently encountered scenario is where not only the treatment factors but also the analysis endpoint(s) are binary. For instance, Harris (1974) studied whether Address for correspondence: Jiannan Lu, One Microsoft Way, Redmond, Washington , U.S.A. jiannl@microsoft.com 1

2 gender difference and dressing in high status clothing lead to occurrences of various aggressive behaviors after frustrating incidents. Nair et al. (2008) explored how differentiating the format of computer-aided encouragement program (e.g., a single session vs. multiple occurrences, personalized vs. more general feedback and advice) affected the abstinence from smoking. Stampfer et al. (1985) investigated whether aspirin and β carotene could help prevent cardiovascular mortality. For such studies, to guarantee trustworthy discovery and reporting of causal effects that are scientifically meaningful, it is imperative to adopt an interpretable and robust methodology for estimation and inference. Indeed, as pointed out by Brunner and Puri (2001), the analysis of factorial designs is one of the most important and frequently encountered problems in statistics. During recent years, the potential outcomes framework (Neyman, 1923; Rubin, 1974, 1990) has drawn increasing attractions, because it enjoys clear interpretability (causal effects are defined as comparisons between potential outcomes under different treatments), flexible scopes of interests (i.e., finite-population or super-population), and robustness to model mis-specifications (because potential outcomes are treated as fixed constants). In particular, it is worth mentioning that, the only source of randomness under the potential outcomes framework is the treatment assignment itself, which Rubin (1990, 2008) advocated as the gold standard for causal inference. Realizing this salient feature of the potential outcomes framework, Dasgupta et al. (2015) extended it to 2 K factorial designs, and claimed that the proposed randomization-based approach results in better understanding of the estimands and allows greater flexibility in statistical inference of factorial effects, compared to the commonly used linear model based approach. However, as widely acknowledged by the causal inference community and extensively discussed in the existing literature (e.g., Aronow et al., 2014; Ding and Dasgupta, 2016; Ding, 2017; Imbens and Rubin, 2015, Section 6.5), a long-standing and fundamental challenge faced by the Neymanian framework (both in general and for factorial designs) is that, it over-estimates the sampling variances of the estimated factorial effects, because we cannot jointly observe the potential outcomes under different treatments, and therefore directly identify the strengths of association between them. This missing data problem is sometimes referred to as the fundamental problem of causal inference (e.g., Holland, 1986; Imbens and Rubin, 2015, Section 1.3). Among the numerous proposals that mitigate the variance over-estimation of the Neymanian causal inference framework, one solution that completely preserves the randomization-based fla- 2

3 vor is the partial identification approach (Richardson et al., 2014), which is widely employed by both statisticians (e.g., Cheng and Small, 2006; Zhang and Rubin, 2003; Aronow et al., 2014; Lu et al., 2015; Ding and Dasgupta, 2016) and econometricians (e.g., Fan and Park, 2010). The key idea of partial identification, in the context of factorial designs, is that although we cannot directly identify the sampling variances of the estimated factorial effects, we can derive their sharp lower bounds which are identifiable from observed data, which leads to an improved Neymanian variance estimator that guarantees better performance, regardless of the underlying dependency structure of the potential outcomes (however, the extent of performance improvement depends on the dependency structure). Along this line of research, Robins (1988) and Ding and Dasgupta (2016) solved the problem for treatment-control studies (i.e., 2 1 factorial designs), and in a recent paper Lu (2017) proposed the said improved variance estimator for 2 2 factorial designs. Nevertheless, we still need a unifying framework applicable to general 2 K factorial designs, which to our best knowledge is lacking from the existing literature. From a theoretical perspective, it seems non-trivial to generalize the main results in Lu (2017) to arbitrary 2 K factorial designs, because the complexity of the dependency structure of the potential outcomes grow exponentially as K increases. From a practical perspective, although 2 2 factorial designs seem common in applied research, high-order factorial designs were also frequently employed (Berkowitz and Daniels, 1964; Kim et al., 2008; Yuan et al., 2008) (e.g., to screen a large number of candidate treatment factors). In this paper, we fill this theoretical gap by deriving the desired improved Neymanian variance estimator, for arbitrary 2 K factorial designs. We organize the remainder of the paper as follows. Section 2 reviews Dasgupta et al. (2015) s randomization-based causal inference framework for 2 K factorial designs, with a primary focus on binary outcomes. Section 3 first highlights the variance over-estimation issue that the Neymanian causal inference framework suffers, and presents the improved Neymanian variance estimator that achieves unified bias-correction. Section 4 concludes with a discussion. 3

4 2. NEYMANIAN INFERENCE FOR FACTORIAL DESIGNS 2.1. Factorial designs We adapt some materials from Dasgupta et al. (2015) and Lu (2016a,b) to review the Neymanian causal inference framework for 2 k factorial designs. To maintain consistency, we inherit the set of notations from Lu (2017). Consider K( 1) distinct treatment factors with two-levels -1 (placebo) and 1 (active treatment), resulting a total number of J = 2 K treatment combinations, labelled as z 1,...,z J. Their definitions depend on the J J model matrix H = (h 0,...,h J 1 ) (c.f. Wu and Hamada, 2009), constructed as follows: 1. Let h 0 = 1 J ; 2. For k = 1,...,K, construct h k by letting its first 2 K k entries be -1, the next 2 K k entries be 1, and repeating 2 k 1 times; 3. If K 2, order all subsets of {1,...,K} with at least two elements, first by cardinality and then lexicography. For k = 1,...J 1 K, let σ k be the kth subset and h K+k = l σ k h l, where stands for entry-wise product. Given the resulting design matrix H, the jth row of the corresponding sub-matrix (h 1,...,h K ) is the jth treatment combination z j, for j = 1,...,J. For example, for K = 2, H = h 0 h 1 h 2 h 3 h 4 h 5 h 6 h ,

5 Therefore, the treatment combinations are z 1 = ( 1, 1, 1), z 2 = ( 1, 1,1), z 3 = ( 1,1, 1), z 4 = ( 1,1,1), z 5 = (1, 1, 1), z 6 = ( 1, 1,1), z 7 = (1,1, 1) and z 8 = (1,1,1), respectively Potential outcomes and factorial effects Consider 2 K factorial designs with N( 2 K+1 ) experimental units. Under the Stable Unit Treatment Value Assumption (SUTVA, Rubin, 1980), for all i = 1,...,N, we let Y i (z j ) {0,1} be the binary potential outcome of unit i under treatment z j, and Y i = {Y i (z 1 ),...,Y i (z J )}. To simplify future notations, let D k1,...,k J = N J 1 {Yi (z j )=k j } (k 1,...,k J {0,1}). By definition 1 k 1 = k J =0 D k 1,...,k J = N. Consequently, we can uniquely determine the potential outcomes using the following 2 J dimensional vector: D = (D 0,0,...,0,D 0,0,...,1,...,D 1,1,...,0,D 1,1,...,1 ), where the indices are ordered binary representations of {0,...,J 1}, respectively. Furthermore, for all {j 1,...,j s } {1,...,J}, let N j1,...,j s = N N 1 {Yi (z j1 )=1,...,Y i (z js )=1} = s Y i (z jr ). r=1 For j = 1,...,J, let the average potential outcome for z j is p j = N j /N, and let p = (p 1,...,p J ). For all l = 1,...,J 1, Dasgupta et al. (2015) defined the lth individual- and population-level factorial effects as τ il = 2 (K 1) h l Y i (i = 1,...,N); τ l = 2 (K 1) h lp. (1) 2.3. Factorial effects and randomization-based inference We consider a completely randomized treatment assignment. Let n 1,...,n J be positive constants such that n j = N. For all j = 1,...,J, randomly assign n j 2 units to z j. For all i = 1,...,N, 5

6 we let 1, if unit i is assigned to z j, W i (z j ) = 0, otherwise (j = 1,...,J). The observed outcomes for unit i is therefore Y obs i potential outcome for z j be ˆp j = n obs j /n j, where = J W i(z j )Y i (z j ). Let the average observed n obs j = N W i (z j )Y i (z j ) = i:w i (z j )=1 Y obs i (j = 1,...,J). Denote ˆp = (ˆp 1,..., ˆp J ). The randomization-based estimator for the factorial effects is ˆ τ l = 2 (K 1) h lˆp (l = 1,...,J 1). (2) Motivated by the seminal work of Dasgupta et al. (2015), Lu (2016b) derived the sampling variance of the estimator in (2) as where Var(ˆ τ l ) = S 2 j = (N 1) 1 N 1 2 2(K 1) is the variance of potential outcomes for z j, and Sj/n 2 j 1 N S2 ( τ l ), (3) { Yi (z j ) Ȳ(z 1) } 2 = N N 1 p j(1 p j ) N S 2 ( τ l ) = (N 1) 1 (τ il τ l ) 2 is the variance of the lth individual-level factorial effects among the experimental units. Following Dasgupta et al. (2015), Lu (2016b) derived the Neymanian estimator for the sampling variance (3), by substituting S 2 j with its unbiased estimate s 2 j = (n j 1) 1 N W i (z j ){Y obs i Ȳobs (z j )} 2 = n j n j 1ˆp j(1 ˆp j ), 6

7 and the unidentifiable S 2 ( τ l ) with its obvious lower bound, zero: Var Ney (ˆ τ l ) = 2 2(K 1) s 2 j /n j = 2 2(K 1) ˆp j (1 ˆp j ) n j 1, (4) This estimator is conservative, on average over-estimating the true sampling variance by } E{ VarNey (ˆ τ l ) Var(ˆ τ l ) = S 2 ( τ l )/N. The bias is generally positive, unless strict additivity (Dasgupta et al., 2015; Ding and Dasgupta, 2016; Ding, 2017) holds, i.e., τ il = τ i l for all i i. In other words, all experiment units have identical treatment effects. Several researchers (e.g., LaVange et al., 2005; Rigdon and Hudgens, 2015) pointed out that this condition is too strong in practice, especially for binary outcomes. In cases where the strict additivity does not hold, the estimator in (4) might be too conservative, as acknowledged by Aronow et al. (2014). 3. THE IMPROVED NEYMANIAN VARIANCE ESTIMATOR The core idea of the partial identification approach is to derive a non-zero and identifiable lower bound of S 2 ( τ l ). To achieve this goal, we rely on the following lemmas, which are 2 K versions of the corresponding 2 2 versions in Lu (2017). However, it is worth mentioning that, the original proofs of the 2 2 versions in Lu (2017) relied heavily on the inclusion-exclusion principle and Boole s inequality, and therefore are extremely difficult to be generalized to arbitrary 2 K factorial designs. To partially circumstance this technical difficulty, we adopt a methodology that is much simpler and more intuitive than the one used by Lu (2017). Lemma 1. For all l = 1,...,J 1, let h l = (h 1l,...,h Jl ), and S 2 ( τ l ) = 1 2 2(K 1) (N 1) N j + h lj h lj N jj j j N N 1 τ2 l. 7

8 Proof. The proof largely follows Lu (2017). First, by (1) N τil 2 = N 2 2(K 1) (h l Y i) 2 N = 2 2(K 1) h lj Y i (z j ) N = 2 2(K 1) h 2 lj Y i 2 (z j )+ h lj h lj Y i (z j )Y i (z j ) j j = 2 2(K 1) N h 2 lj 2 Y 2 i (z j)+ j j h lj h lj = 2 2(K 1) N j + h lj h lj N jj. j j N Y i (z j )Y i (z j ) Therefore ( N ) S 2 ( τ l ) = (N 1) 1 τil 2 N τ2 l = 1 2 2(K 1) (N 1) N j + h lj h lj N jj N N 1 τ2 l. j j Lemma 2. For all l = 1,...,J 1, N j + h lj h lj N jj h lj N l j j, (5) and the equality in (5) holds if and only if τ il {τ il +2 (K 1) } = 0 (i = 1,...,N) (6) or τ il {τ il 2 (K 1) } = 0 (i = 1,...,N). (7) 8

9 Proof. To prove that (5), we break it down into two parts: N j + h 1j h 1j N jj h 1j N l (8) j j and N j + h 1j h 1j N jj h 1j N l. (9) j j For the equality in (5) to hold, we only need the equality in either (8) or (9) to hold. We first prove (8), and derive the sufficient and necessary condition for the equality to hold. To simplify notations, denote J l = {j : h lj = 1}, J l+ = {j : h lj = 1}. Simple algebra suggests that (8) is equivalent to N j + j J l N jj + N jj j j J l j j J l+ j J l,j J l+ N jj. (10) To prove (10), for all i = 1,...,N let λ il = j J l Y i (z j ) and λ il+ = j J l+ Y i (z j ), which are two integer constant. Therefore, for i = 1,...,N, it is obvious that (λ il λ il+ )+(λ il λ il+ ) 2 0, or equivalently λ il + λ il (λ il 1) 2 + λ il+(λ il+ 1) 2 λ il λ il+. (11) Note that (11) immediately implies (8), because j J l N j = N λ il, j J l,j J l+ N jj = N λ il λ il+, and N jj = λ ils(λ ils 1) 2 j j J ls (s =,+). Moreover, the equality in (8) holds if and only if the equality in (11) holds for all i = 1,...,N, which is equivalent to (6), because by definition λ il λ il+ = 2 k 1 τ il. Similarly, we can prove (9), and its equality holds if and only if (7) holds. The proof is complete. 9

10 With the help of Lemmas 1 2, alongside with the definition of factorial effect in (1), we can derive the main theoretical result of the paper. Theorem 1. The sharp lower bound for S 2 ( τ l ) is SLB( τ 2 l ) = N N 1 max{2 (K 1) τ l τ l 2,0}. (12) The equality in (12) holds if and only if (6) or (7) holds. Proof. The proof follow from the definition of τ l in (1), and Lemmas 1 2. The lower bound in (12) sharp, in the sense that it is the minimum of all possible values of S 2 ( τ l ) compatible with the marginal distributions of the potential outcomes (i.e., N j for j = 1,...,J). To facilitate a better understanding of Theorem 1, we consider two special cases. First, when J = 1, we have the classic treatment-control studies. In this case, Theorem 1 reduces to the main result of Ding and Dasgupta (2016). Moreover, the condition in (6), which is sufficient and necessary to achieve the lower bound in (12), reduces to Y i (1) Y i ( 1) (i = 1,...,N) which is termed monotonicity by Ding and Dasgupta (2016). Second, when J = 2, Theorem 1 reduces to the main result of Lu (2017). Theorem 1 leads to the improved Neymanian variance estimator Var IN (ˆ τ l ) = Var Ney (ˆ τ l ) Ŝ2 LB ( τ l)/n, (13) where the bias-correlation term Ŝ 2 LB( τ l ) = N N 1 max{2 (K 1) ˆ τ l ˆ τ 2 l,0} is the finite-population analogue of the sharp lower bound presented in (12). This bias-correction term is always non-negative, which leads to a guaranteed improvement of variance estimation, for any observed data. As pointed out by several researchers, the consistency of the point estimate ˆ τ l 10

11 guarantees the performance of the improved Neymanian variance estimator in moderately large samples. For some simulated and empirical examples for 2 2 factorial designs, see Lu (2017). 4. CONCLUDING REMARKS Under the potential outcomes framework, we have proposed an improved Neymanian variance estimator for 2 K factorial designs with binary outcomes. Comparing to the classic variance estimator by Dasgupta et al. (2015), the newly proposed estimator guarantees bias-correction, regardless of the underlying dependency structure of the potential outcomes. The core idea behind the new estimator is the sharp lower bound of the sampling variance of the estimated factorial effects. We point out three directions of future research. First, although we focus on binary outcomes, it would be interesting to generalize the current work to general outcomes (e.g., continuous, time to event). Aronow et al. (2014) achieved this goal for treatment-control studies, but generalizing their results to factorial designs seems a highly non-trivial task. Second, in a recent paper Mukerjee et al. (2017) extended the potential outcomes framework to more complex experimental designs beyond 2 K factorial, and it is possible to study partial identification for more complex designs. Third, we can generalize our results beyond the classic difference-in-means type factorial effects, to more nuance causal estimands. REFERENCES Aronow, P., Green, D. P., and Lee, D. K. (2014). Sharp bounds on the variance in randomized experiments. The Annals of Statistics, 42: Berkowitz, L. and Daniels, L. R. (1964). Affecting the salience of the social responsibility norm: Effects of past help on the response to dependency relationships. Journal of Abnormal and Social Psychology, 68:275. Brunner, E. and Puri, M. L. (2001). Nonparametric methods in factorial designs. Statistical Papers, 42:1 52. Cheng, J. and Small, D. S.(2006). Bounds on causal effects in three-arm trials with non-compliance. Journal of the Royal Statistical Society: Series B, 68:

12 Dasgupta, T., Pillai, N., and Rubin, D. B. (2015). Causal inference from 2 k factorial designs using the potential outcomes model. Journal of the Royal Statistical Society: Series B, 77: Ding, P.(2017). A paradox from randomization-based causal inference(with discussions). Statistical Science, 32: Ding, P. and Dasgupta, T.(2016). A potential tale of two by two tables from completely randomized experiments. Journal of American Statistical Association, 111: Fan, Y. and Park, S. S. (2010). Sharp bounds on the distribution of treatment effects and their statistical inference. Econometric Theory, 26: Fisher, R. A. (1926). The arrangement of field experiments. Journal of the Ministry of Agriculture of Great Britain, 33: Fisher, R. A. (1935). The Design of Experiments. Edinburgh: Oliver and Boyd. Harris, M. B. (1974). Mediators between frustration and aggression in a field experiment. Journal of Experimental Social Psychology, 10: Holland, P. W. (1986). Statistics and causal inference. Journal of American Statistical Association, 81: Imbens, G. and Rubin, D. B.(2015). Causal Inference in Statistics, Social, and Biomedical Sciences: An Introduction. New York: Cambridge University Press. Kim, K., Zhang, M., and Li, X. (2008). Effects of temporal and social distance on consumer evaluations. Journal of Consumer Research, 35: LaVange, L. M., Durham, T. A., and Koch, G. (2005). Randomization-based non-parametric methods for the analysis of multi-centre trials. Statistical Methods in Medical Research, 14: Lu, J. (2016a). Covariate adjustment in randomization-based causal inference for 2 k factorial designs. Statistics & Probability Letters, 119: Lu, J. (2016b). On randomization-based and regression-based inferences for 2 k factorial designs. Statistics & Probability Letters, 112:

13 Lu, J. (2017). Sharpening randomization-based causal inference for 2 2 factorial designs with binary outcomes. Statistical Methods in Medical Research, in press. doi: / Lu, J., Ding, P., and Dasgupta, T.(2015). Treatment effects on ordinal outcomes: Causal estimands and sharp bounds. arxiv preprint: Mukerjee, R., Dasgupta, T., and Rubin, D. B. (2017). Using standard tools from finite population sampling to improve causal inference for complex experiments. Journal of the American Statistical Association, in press. doi: / Nair, V., Strecher, V., Fagerlin, A., Ubel, P., Resnicow, K., Murphy, S., Little, R., Chakraborty, B., and Zhang, A. (2008). Screening experiments and the use of fractional factorial designs in behavioral intervention research. American Journal of Public Health, 98: Neyman, J. S. (1923). On the application of probability theory to agricultural experiments. Essay on principles: Section 9 (reprinted edition). Statistical Science, 5: Richardson, A., Hudgens, M. G., Gilbert, P. B., and Fine, J. P. (2014). Nonparametric bounds and sensitivity analysis of treatment effects. Statistical Science, 29:596. Rigdon, J. and Hudgens, M. G. (2015). Randomization inference for treatment effects on a binary outcome. Statistics in Medicine, 34: Robins, J. M. (1988). Confidence intervals for causal parameters. Statistics in Medicine, 7: Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66: Rubin, D. B. (1980). Comment on Randomized analysis of experimental data: The Fisher randomization test by D. Basu. Journal of American Statistical Association, 75: Rubin, D. B. (1990). Formal mode of statistical inference for causal effects. Journal of Statistical Planning and Inference, 25: Rubin, D. B. (2008). For objective causal inference, design trumps analysis. The Annals of Applied Statistics, 2:

14 Stampfer, M. J., Buring, J. E., Willett, W., Rosner, B., Eberlein, K., and Hennekens, C. H. (1985). The 2 2 factorial design: Its application to a randomized trial of aspirin and us physicians. Statistics in Medicine, 4: Wu, C. F. J. and Hamada, M. S. (2009). Experiments: Planning, Analysis, and Optimization. Wiley: New York. Yates, F. and Mather, K. (1963). Ronald Aylmer Fisher. Biographical Memoirs of Fellows of the Royal Society, 9: Yuan, X., Liu, J., Zeng, G., Shi, J., Tong, J., and Huang, G. (2008). Optimization of conversion of waste rapeseed oil with high FFA to bio-diesel using response surface methodology. Renewable Energy, 33: Zhang, J. L. and Rubin, D. B. (2003). Estimation of causal effects via principal stratification when some outcomes are truncated by death. Journal of Educational and Behavioral Statistics, 28:

arxiv: v1 [stat.me] 3 Feb 2017

arxiv: v1 [stat.me] 3 Feb 2017 On randomization-based causal inference for matched-pair factorial designs arxiv:702.00888v [stat.me] 3 Feb 207 Jiannan Lu and Alex Deng Analysis and Experimentation, Microsoft Corporation August 3, 208