The Union of Intersections (UoI) Method for Interpretable Data Driven Discovery and Prediction Kristofer E. Bouchard 1,2,3*, Alejandro Bujan 3,

Size: px

Start display at page:

Download "The Union of Intersections (UoI) Method for Interpretable Data Driven Discovery and Prediction Kristofer E. Bouchard 1,2,3*, Alejandro Bujan 3,"

Randolf Atkinson
5 years ago
Views:

1 The Union of Intersections (UoI) Method for Interpretable Data Driven Discovery and Prediction Kristofer E. Bouchard 1,2,3*, Alejandro Bujan 3, Farbod Roosta 4, Prabhat 5, Antoine Snijdes 1, J-H Mao 1, Edward F. Chang 6,7, Michael Mahoney 8, Sharmodeep Bhattacharyya 9 1

2 Understanding the fundamentals of Nature 2

3 New technologies generate vast amounts of data 3

4 Some desired properties of statistical-machine learning predictive allow prediction of the response variable scalable return an answer on large data sets selective only features that influence the response variable are selected accurate estimated parameters are as close to the real value as possible stable return the same values on multiple runs Industry Science X X X X X X X 4

5 Some desired properties of statistical-machine learning predictive allow prediction of the response variable scalable return an answer on large data sets selective only features that influence the response variable are selected accurate estimated parameters are as close to the real value as possible stable return the same values on multiple runs Industry Science X X X X X X X Interpretable Identify features for intervention/synthesis 5

6 Linear regression: the canonical statistical estimation problem input { x B output y model parameters x " = %, +. (1) where, " = (" 1,..., " ( ), % is the n p design matrix and. = (. 1,...,. ( ) are iid random noise terms with. 0(0, ( ). Let 5 be the set of non-zero coefficients of,, 0 be the set of zero coefficients of, and 5 = 7. :,, 8 = " %, 2 + 8,

Linear regression: priors are a double-edged sword input { x B output y @L/@ 3 model parameters 2 1 x " = %, +. (1) where, " = (" 1,..., " ( ), % is the n p design matrix and. = (. 1,...,. ( ) are iid random noise terms with.

7 Linear regression: priors are a double-edged sword input { x B output 3 model parameters 2 1 x " = %, +. (1) where, " = (" 1,..., " ( ), % is the n p design matrix and. = (. 1,...,. ( ) are iid random noise terms with. 0(0, ( ). Let 5 be the set of non-zero coefficients of,, 0 be the set of zero coefficients of, and 5 = 7. Wainwright, ARSA 2014 :,, 8 = " %, 2 + 8,

8 Linear regression: priors are a double-edged sword input { x B output 3 model parameters 2 1 x " = %, +. (1) where, " = (" 1,..., " ( ), % is the n p design matrix and. = (. 1,...,. ( ) are iid random noise terms with. 0(0, ( ). Let 5 be the set of non-zero coefficients of,, 0 be the set of zero coefficients of, and 5 = 7. Wainwright, ARSA 2014 :,, 8 = " %, 2 + 8,

9 Regularization: feature compression 9

10 Ensemble methods: feature expansion 10

11 Union of Intersections (UoI) Method * 0,2 * +! "#$ =! ' ( ) /.,- ',-. /,-( ) q k Bach, BoLasso, ICML, 08 Breiman,Bagging, ML, 96 11

12 Union of Intersections (UoI) Method 12

13 UoILasso : Distributed algorithm for high-dimensional regression * 0,2 * +! "#$ =! ' ( ) /.,- ',-. /,-( ) q k 13

14 UoILasso : Distributed algorithm for high-dimensional regression Slope = cores Python/MPI 14

15 UoILasso: Simulation Results 15

16 UoILasso out performs other methods 16

17 UoILasso out performs other methods 17

18 ) Theorem 1.1. Consider the model in (1). Assume that the distribution of 1 " % &'(, " + is exchangeable and that the UoI Lasso procedure is not worse than random guessing for λ. Also consider that the SRC assumption is satisfied. For large n, let, -./ = /3 -./ 7 89:(<)//. Then, (i) (i) On a set > 1 such that?(> 1 ) (1 (3/<) C 1 ) C2, for,,-./, (ii) ) + % &'( = (7) (ii) Also, on a set Ω 2 such that?(> 2 ) 1 (1 H5<( 3C 1 /<)) C 2, for λ λmin, we have, (3) ) ) (%\% NOPQQ ) % &'( (8) (4) False Positives False Negatives ) where, % NOPQQ = T " T " 2 5Y7, 2. (iii) (iii) On a set > Z = > 1 > 2 such that?(> Z ) 1-45 (1 H5<( 3C 1 /<)) C 2, 1 ] (1 (3/<) C 1 ) C2, both (7) (3) and (4) (8) are satisfied. [ Model Consistency (iv) (iv) On the same set Ω A, we have, [(T &'(\]PNN' T) 2 /3 1,^ (9) (5) Prediction Error

19 Neural Recordings Directly from Human Cortex 19

20 Speech is generated by spatio-temporal patterns of cortical activity 20

21 UoILasso extracts sparse, meaningful graphs UoILasso SCAD 21

22 UoILasso extracts sparse, meaningful graphs UoILasso SCAD 22

23 UoILasso extracts sparse, meaningful graphs Syllable Production 23

24 Identification of genetic basis of complex phenotypes?? Need accurate prediction with small number of factors Very hard: Correlated factors P>>N Collaborative Cross Mice (90% of genetic variability) 24

25 UoILasso identifies a small number genes contributing to behavioral phenotype 25

26 UoILasso identifies a small number of highly predictive parameters 26

27 UoIL1Logistic for classification Drug Discovery Dorothea Prediction Accuracy Selection Ratio (PSR) Parsimony (BIC) L 1 -Logistic 93% 53 x UoI-L 1 Logistic 93% 3 x Cancer Prediction Arcene Prediction Accuracy Selection Ratio (PSR) Parsimony (BIC) L 1 -Logistic 66% 59 x UoI-L 1 Logistic 66% 5 x Parkinsons Prediction Parkinsons Prediction Accuracy Selection Ratio (PSR) Parsimony (BIC) L 1 Logistic 65% UoI-L 1 Logistic 68%

28 Summary/Conclusions 1) UoI: balance feature compression/expansion to maximize prediction accuracy 2) UoI: predictive, scalable, selective, accurate, stable 3) UoILasso: Sparse solutions with no prior 4) UoILasso: regression/ UoIL1Logistic: classification 5) Improved sparse prediction in neuroscience, genetics, disease prediction, drug discovery, UoI: scalable and accurate approximation of L0-norm (an NP-hard problem) 28

29 Future Directions Supervised Learning Generalized Linear Models -Regression -Classification Unsupervised Learning Multiple Input-Multiple Outputs -CCA Non-linear Functions -Random Forests -Random Intersection Trees -MARS Time Series Analysis -Vector Autoregressive -Sparse Identification of Non-linear Dynamics UoI Non-negative matrix Factorization CUR matrix decompositions Convolutional Sparse Coding ADMM/RLA/sub-sampling schemes non-orthogonal designs

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)