Efficient Identification of Heterogeneous Treatment Effects via Anomalous Pattern Detection


1 Efficient Identification of Heterogeneous Treatment Effects via Anomalous Pattern Detection Edward McFowland III Joint work with Sriram Somanchi and Daniel B. Neill 1

2 Treatment Effects 2

3 Treatment Effects 3

4 Treatment Effect Heterogeneity [Figure: control group vs. treatment group outcomes, split by Age = Young / Mid / Old] 4

5 Treatment Effect Heterogeneity [Figure: control group vs. treatment group; Age = Young: (Positive) Effect; Age = Mid: (Negative) Effect; Age = Old: (No) Effect; overall: (No) Effect] Positive and negative effects can cancel 5

6 Treatment Effect Heterogeneity [Figure: control group vs. treatment group; Age = Young: (Positive) Effect; Age = Mid: (No) Effect; Age = Old: (No) Effect; overall: (No) Effect] Positive and negative effects can cancel; the true effect can be masked 6

7 Treatment Effect Heterogeneity [Figure: control group vs. treatment group; Age = Young: (Very) Positive Effect; Age = Mid: (No) Effect; Age = Old: (No) Effect; overall: (Slight) Effect] Positive and negative effects can cancel; the true effect can be masked; effects could really be driven by a subpopulation 7

8 Treatment Effect Heterogeneity [Figure: control group vs. treatment group; Age = Young: (Very) Positive Effect; Age = Mid: (No) Effect; Age = Old: (No) Effect; overall: (Slight) Effect] Positive and negative effects can cancel; the true effect can be masked (Ex: FDA-approved BiDil drug); effects could really be driven by a subpopulation (Ex: Perry Preschool) 8

9 Treatment Effect Heterogeneity (Big Data) [Figure: control group vs. treatment group with many covariates] 9

10 Machine Learning's Contributions Regression Methods: OLS and regularized regression (e.g., LASSO)*, Imai and Ratkovic (2013); Single Tree Methods: Su et al. (2009), Imai and Strauss (2011), Athey and Imbens (2015)*; Ensemble Methods: Green and Kern (2012), Wager and Athey (2015)* * provide asymptotic confidence intervals for inference on the significance of effects 10

11 Limitations Regression Methods: pre-specification of the model; Single Tree Methods: greedy and unstable; Ensemble Methods: fairly uninterpretable / no natural subpopulations; General Limitations: the mean and only the mean (other moments can be affected; Simpson's paradox); risk minimization, not effect maximization; small number of subpopulations considered, with no guarantee on their interestingness; no discovery, only model inspection 11

12 Anomaly Detection Paradigm Identifying when a system deviates from its expected behavior. 12

13 Anomalous Pattern Detection Procedure [Figure: Normal Activity (M0) learned over attributes A1-A6; Test Data → Detect Anomalous Pattern Given M0 → Novel Pattern] 13

14 HTE Pattern Detection [Figure: Control Group defines expected behavior M0; Treatment Group → Detect Anomalous Pattern Given M0 → Novel Pattern] 14

15 HTE Pattern Detection [Figure: Control Group defines expected behavior M0; Treatment Group → Detect Anomalous Subpopulation Given M0 → Novel Pattern] 15

16 HTE Pattern Detection [Figure: Control Group defines expected behavior M0; Treatment Group → Detect Anomalous Subpopulation Given M0 → Subpopulation] 16

17 The Goal [Figure: grid of attribute values by race (Black, White, Hispanic, Asian, Native American, Other) and sex (Male, Female)] Detect a subpopulation (subsets of attribute values) that corresponds to anomalous outcomes for subjects in the treatment group 17

18 The Goal [Figure: grid of attribute values by race and sex] Detect a subpopulation (subsets of attribute values) that corresponds to anomalous outcomes for subjects in the treatment group The Optimization: s_1 ⊆ {a_1, ..., a_t}, ..., s_M ⊆ {a_1, ..., a_p} 18

19 The Goal [Figure: grid of attribute values by race and sex] Detect a subpopulation (subsets of attribute values) that corresponds to anomalous outcomes for subjects in the treatment group The Optimization: s_1 ⊆ {a_1, ..., a_t}, ..., s_M ⊆ {a_1, ..., a_p}; S = s_1 × ... × s_M 19

20 The Goal [Figure: grid of attribute values by race and sex] Detect a subpopulation (subsets of attribute values) that corresponds to anomalous outcomes for subjects in the treatment group The Optimization: s_1 ⊆ {a_1, ..., a_t}, ..., s_M ⊆ {a_1, ..., a_p}; S = s_1 × ... × s_M; S* = argmax_S F(S) 20
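Collecting the pieces built up over the last few slides, the search problem can be written in display form (the value indices are reconstructed from the garbled transcript and are assumptions, but the structure follows the slides):

```latex
s_1 \subseteq \{a_1, \dots, a_t\}, \;\dots,\; s_M \subseteq \{a_1, \dots, a_p\},
\qquad
S = s_1 \times \dots \times s_M,
\qquad
S^{*} = \operatorname*{argmax}_{S} F(S)
```

Here each s_i is a subset of the values of attribute A_i, S is the subpopulation those subsets jointly define, and F is the scan statistic introduced on the following slides.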

21 Treatment Effects Subset Scan (TESS) [Figure: grid of treatment-group outcomes Y_ij by race (Black, White, Hispanic, Asian, Native American, Other) and sex (Male, Female)] I. Compute the statistical anomalousness of each treatment-group subject II. Detect the subpopulation that is collectively the most anomalous 21

22 Treatment Effects Subset Scan (TESS) [Figure: grid of empirical p-values P_ij by race and sex] I. Compute the statistical anomalousness of each treatment-group subject -- this measurement will be a p-value II. Detect the subpopulation that is collectively the most anomalous -- many subjects with significant p-values 22

23 Treatment Effects Subset Scan (TESS) [Figure: grid of control-group outcomes Y_ij by race and sex] I. Compute the statistical anomalousness of each treatment-group subject 1. Estimate the conditional distribution under H0 (from the Control Group) 23

24 Treatment Effects Subset Scan (TESS) [Figure: grid of treatment-group outcomes Y_ij by race and sex] I. Compute the statistical anomalousness of each treatment-group subject 1. Estimate the conditional distribution under H0 2. Compute empirical p-values 24

25 Treatment Effects Subset Scan (TESS) [Figure: grid of treatment-group outcomes Y_ij by race and sex, highlighting the outcome y_i of one subject in the Black/Female cell] I. Compute the statistical anomalousness of each treatment-group subject 1. Estimate the conditional distribution under H0 2. Compute empirical p-values 25

26 Treatment Effects Subset Scan (TESS) [Figure: grid of treatment-group empirical p-values P_ij by race and sex] I. Compute the statistical anomalousness of each treatment-group subject 1. Estimate the conditional distribution under H0 2. Compute empirical p-values 26

27 Treatment Effects Subset Scan (TESS) [Figure: grid of treatment-group empirical p-values P_ij by race and sex] I. Compute the statistical anomalousness of each treatment-group subject 1. Estimate the conditional distribution under H0 2. Compute empirical p-values i. Maps each bin's distribution to the same interval 27

28 Treatment Effects Subset Scan (TESS) [Figure: grid of treatment-group empirical p-values P_ij by race and sex] I. Compute the statistical anomalousness of each treatment-group subject 1. Estimate the conditional distribution under H0 2. Compute empirical p-values i. Maps each bin's distribution to the same interval ii. P_ij ~ Uniform[0,1] under H0 28

29 Treatment Effects Subset Scan (TESS) [Figure: grid of treatment-group empirical p-values P_ij by race and sex] I. Compute the statistical anomalousness of each treatment-group subject 1. Estimate the conditional distribution under H0 2. Compute empirical p-values i. Maps each bin's distribution to the same interval ii. P_ij ~ Uniform[0,1] under H0 iii. For any N p-values, we expect N*α to be significant at level α (see the sketch below) 29
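A minimal sketch of step I as these slides describe it, assuming the conditional distribution under H0 is represented by the within-cell empirical distribution of control-group outcomes; the function and column names are illustrative rather than from the paper, and the randomized tie-breaking is one standard way to make the p-values Uniform[0,1] under H0.

```python
import numpy as np
import pandas as pd

def empirical_p_values(control, treatment, attrs, outcome, seed=0):
    """For each treatment-group subject, the empirical p-value is the probability
    that a control outcome from the same covariate cell is at least as large as
    the observed outcome (randomized at ties, so P_ij ~ Uniform[0,1] under H0).
    Assumes every treatment cell also appears in the control group."""
    rng = np.random.default_rng(seed)
    cells = control.groupby(attrs)[outcome]
    p = np.empty(len(treatment))
    for k, (_, row) in enumerate(treatment.iterrows()):
        y0 = cells.get_group(tuple(row[a] for a in attrs)).to_numpy()
        greater = np.sum(y0 > row[outcome])
        ties = np.sum(y0 == row[outcome])
        p[k] = (greater + rng.uniform(0, ties + 1)) / (len(y0) + 1)
    return pd.Series(p, index=treatment.index, name="p_value")
```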

30 Treatment Effects Subset Scan (TESS) [Figure: grid of treatment-group empirical p-values P_ij by race and sex] I. Compute the statistical anomalousness of each treatment-group subject 1. Estimate the conditional distribution under H0 2. Compute empirical p-values i. Maps each bin's distribution to the same interval ii. P_ij ~ Uniform[0,1] under H0 iii. For any N p-values, we expect N*α to be significant at level α Higher Criticism statistic (see below) 30
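The scan statistic the slide leaves implicit can be written as follows, using the nonparametric scan statistic of McFowland, Speakman, and Neill (2013) with the Higher Criticism score function; the exact variant shown on the slide is an assumption.

```latex
F(S) \;=\; \max_{0 < \alpha \le \alpha_{\max}} \phi_{HC}\bigl(\alpha,\, N_\alpha(S),\, N(S)\bigr),
\qquad
\phi_{HC}(\alpha, N_\alpha, N) \;=\; \frac{N_\alpha - N\alpha}{\sqrt{N\alpha(1-\alpha)}}
```

Here N(S) is the number of treatment-group p-values in subpopulation S and N_α(S) is the number of those p-values no larger than α; under H0 we expect N_α(S) ≈ α N(S), so large positive values of φ_HC indicate an excess of significant p-values.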

31 Treatment Effects Subset Scan (TESS) [Figure: grid of treatment-group empirical p-values P_ij by race and sex] I. Compute the statistical anomalousness of each treatment-group subject II. Discover the subsets of attribute values that define the most anomalous outcomes 1. Maximize F(S) over all choices of s_1, ..., s_M -- naive search is infeasible, O(2^|A_i|) per attribute 31

32 Treatment Effects Subset Scan (TESS) Nonparametric Scan Statistic (NPSS) Have: S ⊆ A_1 × ... × A_M, defined by {s_1, ..., s_M}; Select: F(S); Want: max_S F(S) I. Compute the statistical anomalousness of each treatment-group subject II. Discover the subsets of attribute values that define the most anomalous outcomes 1. Maximize F(S) over all choices of s_1, ..., s_M -- naive search is infeasible, O(2^|A_i|) per attribute 32

33 Treatment Effects Subset Scan (TESS) Nonparametric Scan Statistic (NPSS) Have: S ⊆ A_1 × ... × A_M, defined by {s_1, ..., s_M}; Select: F(S); Want: max_S F(S); There Exists: a priority function G(a_i) I. Compute the statistical anomalousness of each treatment-group subject II. Discover the subsets of attribute values that define the most anomalous outcomes 1. Maximize F(S) over all choices of s_1, ..., s_M -- naive search is infeasible, O(2^|A_i|) per attribute 33

34 Treatment Effects Subset Scan (TESS) Nonparametric Scan Statistic (NPSS) Have: S ⊆ A_1 × ... × A_M, defined by {s_1, ..., s_M}; Select: F(S); Want: max_S F(S); There Exists: a priority function G(a_i); Such That: we only need to consider {Black}, {Black, Hispanic}, {Black, Hispanic, Asian}, ..., {Black, Hispanic, Asian, ..., White} I. Compute the statistical anomalousness of each treatment-group subject II. Discover the subsets of attribute values that define the most anomalous outcomes 1. Maximize F(S) over all choices of s_1, ..., s_M -- naive search is infeasible, O(2^|A_i|) per attribute 34

35 Treatment Effects Subset Scan (TESS) Nonparametric Scan Statistic (NPSS) Have: S ⊆ A_1 × ... × A_M, defined by {s_1, ..., s_M}; Select: F(S); Want: max_S F(S); There Exists: a priority function G(a_i); Such That: we only need to consider {Black}, {Black, Hispanic}, {Black, Hispanic, Asian}, ..., {Black, Hispanic, Asian, ..., White} I. Compute the statistical anomalousness of each treatment-group subject II. Discover the subsets of attribute values that define the most anomalous outcomes 1. Maximize F(S) over all choices of s_1, ..., s_M -- NPSS scans an attribute in O(t log t) (see the sketch below) 35
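A minimal sketch of the O(t log t) single-attribute scan the slide describes, for a fixed significance level α: sort the attribute's values by a priority G(a_i) (here, the fraction of significant p-values) and score only the t prefixes of that ordering instead of all 2^t subsets. Names are illustrative; the full method would also maximize over α and iterate this step across attributes, conditioning on the current subsets of the other attributes.

```python
import numpy as np

def hc_score(n_alpha, n, alpha):
    """Higher Criticism score: standardized excess of p-values <= alpha."""
    return (n_alpha - n * alpha) / np.sqrt(n * alpha * (1 - alpha)) if n else 0.0

def scan_one_attribute(p_values, values, alpha=0.05):
    """Best subset of one attribute's values, scored with hc_score, in O(t log t)."""
    p_values, values = np.asarray(p_values), np.asarray(values)
    uniq = np.unique(values)
    sig = np.array([np.sum((values == v) & (p_values <= alpha)) for v in uniq])
    tot = np.array([np.sum(values == v) for v in uniq])
    order = np.argsort(-(sig / tot))            # priority G(a_i): fraction significant
    best_score, best_subset = -np.inf, None
    cum_sig = cum_tot = 0
    for j, k in enumerate(order):               # grow the subset one value at a time
        cum_sig, cum_tot = cum_sig + sig[k], cum_tot + tot[k]
        score = hc_score(cum_sig, cum_tot, alpha)
        if score > best_score:
            best_score, best_subset = score, set(uniq[order[:j + 1]])
    return best_score, best_subset
```

For example, scan_one_attribute(p, race_values) would return the highest-scoring subset of race values, holding the subsets chosen for the other attributes fixed.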

36 TESS Search Procedure [Figure: grid of treatment-group empirical p-values P_ij by race and sex] I. Compute the statistical anomalousness of each treatment-group subject 1. Estimate the conditional distribution under H0 2. Compute empirical p-values II. Discover the subsets of attribute values that define the most anomalous outcomes 1. Maximize F(S) over all choices of s_1, ..., s_M -- NPSS scans an attribute in O(t log t) 36

37 TESS Search Procedure [Figure: grid of p-values with a candidate subpopulation highlighted (Score = 7.5)] I. Compute the statistical anomalousness of each treatment-group subject 1. Estimate the conditional distribution under H0 2. Compute empirical p-values II. Discover the subsets of attribute values that define the most anomalous outcomes 1. Maximize F(S) over all choices of s_1, ..., s_M -- NPSS scans an attribute in O(t log t) 37

38 TESS Search Procedure [Figure: grid of p-values with a candidate subpopulation highlighted (Score = 8.1)] I. Compute the statistical anomalousness of each treatment-group subject 1. Estimate the conditional distribution under H0 2. Compute empirical p-values II. Discover the subsets of attribute values that define the most anomalous outcomes 1. Maximize F(S) over all choices of s_1, ..., s_M -- NPSS scans an attribute in O(t log t) 38

39 TESS Search Procedure [Figure: grid of p-values with a candidate subpopulation highlighted (Score = 9.3)] I. Compute the statistical anomalousness of each treatment-group subject 1. Estimate the conditional distribution under H0 2. Compute empirical p-values II. Discover the subsets of attribute values that define the most anomalous outcomes 1. Maximize F(S) over all choices of s_1, ..., s_M -- NPSS scans an attribute in O(t log t) 39

40 TESS Search Procedure [Figure: grid of p-values with the detected subpopulation highlighted (Score = 9.3)] I. Compute the statistical anomalousness of each treatment-group subject 1. Estimate the conditional distribution under H0 2. Compute empirical p-values II. Discover the subsets of attribute values that define the most anomalous outcomes 1. Maximize F(S) over all choices of s_1, ..., s_M -- NPSS scans an attribute in O(t log t) Significance of our subpopulation: compare the subpopulation's score to the maximum scores of simulated datasets under H0 (see the sketch below) 40
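A minimal sketch of the significance test the slide describes, assuming the caller supplies a routine that generates one H0 replicate (for example, by drawing each subject's p-value from Uniform[0,1] or re-randomizing treatment labels) and re-runs the full scan on it; simulate_max_score is a placeholder name for that step.

```python
import numpy as np

def randomization_test(observed_score, simulate_max_score, n_sims=1000, seed=0):
    """Compare the detected subpopulation's score to the distribution of
    maximum scan scores obtained from datasets simulated under H0."""
    rng = np.random.default_rng(seed)
    null_max = np.array([simulate_max_score(rng) for _ in range(n_sims)])
    # p-value: fraction of H0 replicates whose best subpopulation scores this high
    return (1 + np.sum(null_max >= observed_score)) / (n_sims + 1)
```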

41 Tennessee Star Analysis (1985) Effect of classroom size on achievement (test scores) 4-year panel (kindergarten to 3rd grade) 6,500 students, 330 classrooms, 80 schools Total of over 11,000 records Treatment Conditions (randomized within school): Regular Size Class (20-25 students); Regular Size + Aide Class (20-25 students); Small Size Class (13-17 students) 42

42 Tennessee Star Analysis (1985) 43

43 Tennessee Star Analysis

                  (1)              (2)
  Treatment       (2.547)          (2.277)
  Sample          All 2nd Grade    All 3rd Grade
  R-squared
  Observations

Notes: All estimates are from OLS models. Standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1 44

44 Tennessee Star Analysis (1985) Detected Subpopulation: grade = 2nd or 3rd; school = inner-city or urban; experience = [10, infinity); other features are considered irrelevant 45

45 Tennessee Star Analysis 46

46 Tennessee Star Analysis 47

47 Tennessee Star Analysis

                  (1)              (2)                       (3)
  Treatment ***   (2.547)          (8.727)                    (2.639)
  Sample          All 2nd Grade    Detected Subpopulation     Undetected Subpopulation
  R-squared
  Observations

Notes: All estimates are from OLS models. Standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1 48

48 Tennessee Star Analysis

                  (1)              (2)                            (3)
  Treatment **    (2.277)          (9.154)                         (2.318)
  Sample          All 3rd Grade    Detected Group (3rd Grade)      Undetected Group (3rd Grade)
  R-squared
  Observations

Notes: All estimates are from OLS models. Standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1 49
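For concreteness, a hedged sketch of the kind of OLS comparison these tables report, using statsmodels; the file name, the column names (score, treatment, detected), and the detected-subpopulation indicator are assumptions for illustration, not artifacts of the original analysis.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("star_third_grade.csv")   # hypothetical extract of the STAR data
samples = {
    "All 3rd Grade": df,
    "Detected Subpopulation": df[df["detected"]],
    "Undetected Subpopulation": df[~df["detected"]],
}
for name, sample in samples.items():
    fit = smf.ols("score ~ treatment", data=sample).fit()
    print(f"{name}: coef={fit.params['treatment']:.2f} (SE={fit.bse['treatment']:.2f})")
```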

49 Conclusion Discovering subpopulations with significant treatment effects can be paramount Machine Learning can flexibly estimate effects, but is limited when the goal is to identify subpopulations with large effects The Anomalous Pattern Detection paradigm offers the ability to overcome some of these limitations Maintains high power to detect by searching over, and combining signal across, many subpopulations 50

50 References
Athey, S., & Imbens, G. W. (2015). Machine Learning Methods for Estimating Heterogeneous Causal Effects. arXiv.
Green, D. P., & Kern, H. L. (2012). Modeling Heterogeneous Treatment Effects in Survey Experiments with Bayesian Additive Regression Trees. Public Opinion Quarterly, 76(3).
Imai, K., & Ratkovic, M. (2013). Estimating Treatment Effect Heterogeneity in Randomized Program Evaluation. The Annals of Applied Statistics, 7(1).
Imai, K., & Strauss, A. (2011). Estimation of Heterogeneous Treatment Effects from Randomized Experiments, with Application to the Optimal Planning of the Get-Out-the-Vote Campaign. Political Analysis, 19(1).
McFowland, E., III, Speakman, S. D., & Neill, D. B. (2013). Fast Generalized Subset Scan for Anomalous Pattern Detection. The Journal of Machine Learning Research, 14(1).
Mehlig, K., et al. CETP TaqIB Genotype Modifies the Association between Alcohol and Coronary Heart Disease: The INTERGENE Case-Control Study. Alcohol, 48(7).
Su, X., Tsai, C.-L., Wang, H., Nickerson, D. M., & Li, B. (2009). Subgroup Analysis via Recursive Partitioning. The Journal of Machine Learning Research, 10.
Wager, S., & Athey, S. (2015). Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests. arXiv.
51
