Detecting Statistical Interactions from Neural Network Weights

1 Detecting Statistical Interactions from Neural Network Weights. Michael Tsang. Joint work with Dehua Cheng and Yan Liu.

2 Motivation: We seek assurance that a neural network learned the longitude × latitude interaction for predicting housing price. [Figure: housing price markers, $ to $$$$] Michael Author Tsang (USC) Page 2

3 Problem: Can we detect statistical interactions in data by interpreting the trained weights of a multilayer perceptron (MLP)? Doing so would also make the complex behavior of MLPs easier to understand.

4 Statistical Interaction
Statistical interaction (Sorokina et al. 2008): a non-additive grouping of variables in $F(\mathbf{x})$.
For example, $F(\mathbf{x}) = \sin(x_1 + x_2 + x_3) + x_3 x_4 + x_5$ has the interactions {1,2,3} and {3,4}.
By contrast, $F(\mathbf{x}) = \log(x_1 x_2) = \log x_1 + \log x_2$ has no interaction.
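
The non-additivity in the two examples above can be checked numerically. The sketch below is an illustration, not part of the talk: it uses a second-order mixed finite difference, which is zero (up to rounding) whenever a function is additive in the two chosen variables. The helper name `mixed_difference`, the step size, and the evaluation point are all assumptions chosen for the example.

```python
import math

def mixed_difference(f, x, i, j, h=0.5):
    """Second-order mixed finite difference of f in coordinates i and j.

    If f is additive in x[i] and x[j], the four terms cancel and the
    result is zero up to floating-point error, for any step h.
    """
    def shifted(di, dj):
        y = list(x)
        y[i] += di
        y[j] += dj
        return f(y)
    return shifted(h, h) - shifted(h, 0.0) - shifted(0.0, h) + shifted(0.0, 0.0)

# Indices are 0-based: x[0] is x_1, x[1] is x_2, and so on.
def f_interacting(x):
    return math.sin(x[0] + x[1] + x[2]) + x[2] * x[3] + x[4]

def f_additive(x):
    return math.log(x[0] * x[1])

point = [1.0, 2.0, 0.5, 1.5, 3.0]  # arbitrary evaluation point
d34 = mixed_difference(f_interacting, point, 2, 3)  # x_3 and x_4 interact
d15 = mixed_difference(f_interacting, point, 0, 4)  # x_1 and x_5 do not
dlog = mixed_difference(f_additive, point, 0, 1)    # log(x_1 x_2) is additive
```

A nonzero value shows that the pair interacts at that point. A zero value at a single point is only evidence of additivity, so a real check would sample many points.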

6 Core Insight in Nonlinear Networks: the interaction {1,3} should exist. Assumption: first-layer hidden units are especially good at modeling interactions.

7 Neural Interaction Detection (NID) Framework
1. Train MLP with regularization
2. Rank interactions by interpreting weights
3. Find cutoff on the ranking (if desired)

8 Rank Interactions by Interpreting Weights. Two quantities are defined: the interaction strength per hidden unit (for hidden unit i) and an approximation of hidden unit influence. [Equations appear as images in the original slides]
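
The two quantities named on this slide can be written out as follows. This is a reconstruction from the published description of NID, under the assumption that the slides use the same definitions. It is consistent with the term $z_1 \min(w_1, w_2)$ that appears on a later slide. The symbols $\omega_i$, $\mu$, $\mathbf{W}^{(\ell)}$ and $\mathbf{w}^y$ are taken from that description and do not appear in the transcript.

```latex
% Interaction strength of a candidate feature set I at first-layer hidden unit i:
% the unit's influence times an average (here the minimum) of the incoming
% weight magnitudes of the features in I.
\omega_i(\mathcal{I}) = z_i \,\mu\!\left( \left| \mathbf{W}^{(1)}_{i,\mathcal{I}} \right| \right),
\qquad \mu(\cdot) = \min(\cdot)

% Approximate influence of the first-layer hidden units on the output:
% product of the absolute weight matrices of all later layers.
\mathbf{z} = \left| \mathbf{w}^{y} \right|^{\top} \left| \mathbf{W}^{(L)} \right| \cdots \left| \mathbf{W}^{(2)} \right|
```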

12 Ranking Pairwise Interactions [Figure: network with inputs $x_1$, $x_2$, $x_3$, $x_4$]
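
A minimal sketch of how a pairwise ranking could be computed from first-layer weights. It assumes the pairwise strength is the sum, over first-layer hidden units, of the unit's influence times the smaller of the two incoming weight magnitudes. This follows the published NID description and is not shown in the transcript. The names `W1`, `z`, `pairwise_strengths` and `rank`, and all numeric values, are hypothetical.

```python
from itertools import combinations

def pairwise_strengths(W1, z):
    """Score every input pair from first-layer weights.

    W1[s][j] is the weight from input j to first-layer hidden unit s.
    z[s] is the (approximate) influence of hidden unit s on the output.
    Returns {(i, j): strength} with 0-based input indices, i < j.
    """
    n_inputs = len(W1[0])
    strengths = {}
    for i, j in combinations(range(n_inputs), 2):
        # A pair is only as strong at a unit as its weaker incoming weight.
        strengths[(i, j)] = sum(
            z_s * min(abs(row[i]), abs(row[j])) for row, z_s in zip(W1, z)
        )
    return strengths

def rank(strengths):
    """Sort interactions from strongest to weakest."""
    return sorted(strengths.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical weights: unit 0 mostly reads inputs 0 and 1, unit 1 reads 2 and 3.
W1 = [[0.9, 0.8, 0.1, 0.0],
      [0.0, 0.1, 0.7, 0.6]]
z = [1.0, 2.0]
ranking = rank(pairwise_strengths(W1, z))
```

With these made-up numbers the pair (2, 3) ranks first because it shares the more influential hidden unit.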

26 Ranking Higher-Order Interactions
Inputs $x_1, x_2, x_3, x_4$ feed hidden units $h_1, h_2, h_3, h_4$. The weights entering $h_1$ are ordered $w_1 > w_2 > w_3 > w_4$; the weights entering $h_2$ are ordered $w_3 > w_1 > w_2 > w_4$.
Interaction strengths after processing $h_1$ and $h_2$:
{1,2}: $z_1 \min(w_1, w_2) = z_1 w_2$
{1,2,3}: $z_1 w_3 + z_2 w_2$
{1,2,3,4}: $z_1 w_4 + z_2 w_4$
{1,3}: $z_2 w_1$
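
The build-up on these slides can be sketched as a greedy procedure: at each hidden unit, sort the inputs by weight magnitude, take the top-k inputs as a candidate interaction for every k from 2 upward, score the candidate by the unit's influence times its smallest weight, and sum scores for identical candidates across units. The code below is an illustrative reading of the slides, not the authors' implementation. The function name and all numbers are hypothetical, chosen so that the two units reproduce the weight orderings shown above.

```python
def higher_order_strengths(W1, z):
    """Greedy higher-order interaction scoring from first-layer weights.

    W1[s][j] is the weight from input j to hidden unit s; z[s] is the
    influence of unit s. Returns {frozenset_of_inputs: strength} with
    0-based input indices.
    """
    strengths = {}
    for row, z_s in zip(W1, z):
        mags = [abs(w) for w in row]
        # Inputs ordered from largest to smallest weight magnitude.
        order = sorted(range(len(mags)), key=lambda j: mags[j], reverse=True)
        for k in range(2, len(order) + 1):
            candidate = frozenset(order[:k])
            smallest = mags[order[k - 1]]  # minimum over the top-k weights
            strengths[candidate] = strengths.get(candidate, 0.0) + z_s * smallest
    return strengths

# Unit 0 has w_1 > w_2 > w_3 > w_4; unit 1 has w_3 > w_1 > w_2 > w_4.
W1 = [[0.9, 0.7, 0.5, 0.3],
      [0.6, 0.4, 0.8, 0.2]]
z = [1.0, 2.0]
strengths = higher_order_strengths(W1, z)
ranking = sorted(strengths.items(), key=lambda kv: kv[1], reverse=True)
```

Only one candidate per size is generated at each hidden unit, so the number of scored interactions grows linearly with the number of hidden units rather than exponentially with the number of inputs.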

36 Sample Interaction Ranking (columns: Interactions, Strengths): {1,2,3}, {1,2,3,4}, {1,2}, {1,3}

37 Find a Cutoff on the Ranking: Use a generalized additive model with interactions (MLP-Cutoff). [Figure: error versus cutoff K]
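
One way to pick the cutoff K from an error-versus-K curve is a patience-based stopping rule, sketched below. This is an assumed heuristic for illustration: the slide only states that a generalized additive model with interactions is used, and fitting that model is not shown here. The function name, the `patience` and `min_delta` parameters, and the error values are hypothetical.

```python
def find_cutoff(val_errors, patience=2, min_delta=1e-3):
    """Pick the number of top-ranked interactions to keep.

    val_errors[K] is the validation error of the additive model that
    includes the top-K ranked interactions (K = 0 means none).
    Stops once `patience` consecutive values of K fail to improve the
    best error by more than `min_delta`, and returns the best K so far.
    """
    best_k, best_err = 0, val_errors[0]
    for k in range(1, len(val_errors)):
        if val_errors[k] < best_err - min_delta:
            best_k, best_err = k, val_errors[k]
        elif k - best_k >= patience:
            break  # the curve has plateaued
    return best_k

# Hypothetical validation errors for K = 0..5.
errors = [0.50, 0.31, 0.20, 0.1995, 0.201, 0.2005]
cutoff = find_cutoff(errors)
```

Here the error drops sharply for the first two interactions and then flattens, so the rule keeps the top two.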

38 Test Suite of Data-Generating Functions: Complex functions are used in our evaluation.

39 AUC of Pairwise Interaction Strengths. Footnoted references: 1 Fisher 1925, 2 Bien et al. 2013, 3 Sorokina et al. 2008. *$F_6$ plays an important role for this result.
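
For readers unfamiliar with the metric, the sketch below shows how an AUC can be computed for a set of pairwise strengths when the true interactions are known. The ground truth comes from the earlier example function, whose interactions {1,2,3} and {3,4} imply the true pairs (1,2), (1,3), (2,3) and (3,4). The strength values are made up for illustration and are not results from the talk.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen true interaction gets
    a higher score than a randomly chosen non-interaction; ties count half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# True pairwise interactions of sin(x1 + x2 + x3) + x3*x4 + x5 (1-based).
true_pairs = {(1, 2), (1, 3), (2, 3), (3, 4)}

# Hypothetical strengths for all 10 pairs of 5 inputs.
strengths = {
    (1, 2): 0.9, (1, 3): 0.8, (2, 3): 0.7, (3, 4): 0.3, (1, 4): 0.4,
    (1, 5): 0.1, (2, 4): 0.2, (2, 5): 0.05, (3, 5): 0.1, (4, 5): 0.02,
}
pairs = sorted(strengths)
score = auc([strengths[p] for p in pairs], [p in true_pairs for p in pairs])
```

In this toy ranking only one false pair, (1,4), outscores a true pair, so the AUC is 23/24.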

40 Higher-Order Interaction Detection for Synthetic Data

41 Higher-Order Interaction Detection versus Baseline: Detection performance is similar at varying noise levels, and runtime is orders of magnitude faster.

44 Back to our housing problem

45 Pairwise Heat-Maps for Real-World Data. {1,2}: longitude and latitude! {4,7}: hour and working day. Footnoted references: 1 Pace et al. 1997, 2 Fanaee-T et al. 2014, 3 Adam-Bourdarios et al. 2014, 4 Frey et al. 1991.

47 Higher-Order Interaction Detection for Real-World Data: reached the cutoff point and obtained informative interactions.

48 Summary: We proposed Neural Interaction Detection (NID), which detects interactions from neural network weights. NID takes orders of magnitude less time to obtain performance similar to the state-of-the-art baseline.

49 References
Adam-Bourdarios, Claire, et al. "Learning to discover: the Higgs boson machine learning challenge." URL (2014).
Bien, Jacob, Jonathan Taylor, and Robert Tibshirani. "A lasso for hierarchical interactions." Annals of Statistics 41.3 (2013).
Fanaee-T, Hadi, and Joao Gama. "Event labeling combining ensemble detectors and background knowledge." Progress in Artificial Intelligence (2014).
Fisher, Ronald Aylmer. "Statistical methods for research workers." Breakthroughs in Statistics. Springer, New York, NY.
Frey, Peter W., and David J. Slate. "Letter recognition using Holland-style adaptive classifiers." Machine Learning 6.2 (1991).
Pace, R. Kelley, and Ronald Barry. "Sparse spatial autoregressions." Statistics & Probability Letters 33.3 (1997).
Sorokina, Daria, et al. "Detecting statistical interactions with additive groves of trees." Proceedings of the 25th International Conference on Machine Learning. ACM, 2008.

Neural Interaction Detection

Neural Interaction Detection Michael Tsang, Dehua Cheng, Yan Liu Department of Computer Science University of Southern California Los Angeles, CA 008 {tsangm, dehuache, yanliu.cs}@usc.edu Abstract We develop