MACHINE LEARNING FOR CAUSE-EFFECT PAIRS DETECTION. Mehreen Saeed, CLE Seminar, 11 February 2014.



WHY CAUSALITY?
Polio drops can cause polio epidemics (The Nation, January 2014)
A supernova explosion causes a burst of neutrinos (Scientific American, November 2013)
Mobile phones can cause brain tumors (The Telegraph, October 2012)
DDT pesticide may cause Alzheimer's disease (BBC, January 2014)
The price of the dollar going up causes the price of gold to go down (Investopedia.com, March 2011)

OUTLINE: Causality; coefficients for computing causality (independence measures, probabilistic); determining the direction of arrows; transfer learning; causality challenge; conclusions.

OBSERVATIONAL VS. EXPERIMENTAL DATA
Observational data is collected by recording the values of different characteristics of the subjects. Experimental data is collected by changing the values of some characteristics of the subjects, where those values are under the control of an experimenter.
Example: randomly select a group of individuals and collect data on their everyday diet and their health issues, vs. select a group of individuals with diabetes, omit a certain food from their diet, and observe the result.

OBSERVATIONAL VS. EXPERIMENTAL DATA (CONTD)
Observational data is plentiful: Google receives around 2 million requests/minute, Facebook users post around 680,000 pieces of content/minute, email users send around 200,000,000 messages in a minute. Experimental data, by contrast, is expensive, may be unethical, or may simply not be possible to collect.
Fifteen years ago it was thought that inferring causal relationships from observational data was not possible. Research by machine learning scientists such as Judea Pearl has changed this view.
REF: http://mashable.com/2012/06/22/data-created-every-minute/

CAUSALITY: FROM OBSERVATIONAL DATA TO CAUSE-EFFECT DETECTION
X->Y: smoking causes lung cancer
Y->X: lung cancer causes coughing
X ⊥ Y: winning a cricket match and being born in February
X->Z->Y: X ⊥ Y | Z (conditional independence)
X<-Z->Y: X ⊥ Y | Z (conditional independence)

OUTLINE: Causality; coefficients for computing causality (independence measures, probabilistic); determining the direction of arrows; transfer learning; causality challenge; conclusions.

CORRELATION
ρ = ( E(XY) - E(X)E(Y) ) / ( std(X) std(Y) )
[Figure: scatter plots of three pairs. Two causal pairs (X->Y) have correlations of about -0.4 and 0.9; an independent pair (X ⊥ Y) has a correlation of about 0.73.]
Correlation does not necessarily imply causality.
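A minimal sketch of computing this coefficient for one pair of variables (the synthetic data, variable names, and library choices below are illustrative assumptions, not part of the talk):

```python
import numpy as np

def correlation(x, y):
    """Pearson correlation: (E[XY] - E[X]E[Y]) / (std(X) * std(Y))."""
    return (np.mean(x * y) - np.mean(x) * np.mean(y)) / (np.std(x) * np.std(y))

# Illustrative pair where X causes Y; a strongly correlated but non-causal
# pair would produce a similar number, so the coefficient says nothing
# about direction.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(scale=0.5, size=1000)
print(correlation(x, y))           # close to +1
print(np.corrcoef(x, y)[0, 1])     # same value from NumPy's built-in
```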

χ² TEST FOR INDEPENDENCE
[Figure: two scatter plots with chi-square test results. Left: an independent pair (truth X ⊥ Y) with p-value ≈ 0.99, dof = 81, χ² ≈ 52.6. Right: a dependent pair with p-value ≈ 0, dof = 63, χ² ≈ 3255, correlation ≈ 0.59.]
Again, this test does not tell us anything about causal inference.
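One hedged way to run this χ² test on a numerical pair is to bin both variables into a contingency table first; the bin count and the SciPy routine below are my own choices, not necessarily what was used for the slide:

```python
import numpy as np
from scipy.stats import chi2_contingency

def chi2_independence(x, y, bins=10):
    """Bin two numerical variables and run a chi-square test of independence."""
    table, _, _ = np.histogram2d(x, y, bins=bins)
    table = table[table.sum(axis=1) > 0]        # drop empty rows...
    table = table[:, table.sum(axis=0) > 0]     # ...and empty columns
    chi2, p_value, dof, _ = chi2_contingency(table)
    return chi2, p_value, dof

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y_dep = x ** 2 + rng.normal(scale=0.3, size=2000)   # dependent pair
y_ind = rng.normal(size=2000)                        # independent pair
print(chi2_independence(x, y_dep))   # tiny p-value: dependence detected
print(chi2_independence(x, y_ind))   # large p-value: no evidence of dependence
# Either way, the test says nothing about which variable causes the other.
```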

STATISTICAL INDEPENDENCE FOR TWO INDEPENDENT EVENTS: P(XY)=P(X)P(Y)

STATISTICAL INDEPENDENCE (CONTD)
Measuring P(XY) - P(X)P(Y)
[Figure: two scatter plots. For a causal pair (X->Y) the measured gap P(XY) - P(X)P(Y) is about 0.9; for an independent pair (X ⊥ Y) it is about 0.4.]
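A sketch of one way to turn P(XY) - P(X)P(Y) into a single number on data: discretize both variables, estimate the joint and marginal probabilities from the bin counts, and sum the absolute gaps over the grid. The discretization and the aggregation rule are assumptions; the slide does not say how its values were obtained.

```python
import numpy as np

def independence_gap(x, y, bins=10):
    """Sum of |P(x, y) - P(x)P(y)| over a grid of bins; near 0 for independent pairs."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    joint = joint / joint.sum()                 # joint probability table P(X, Y)
    px = joint.sum(axis=1, keepdims=True)       # marginal P(X)
    py = joint.sum(axis=0, keepdims=True)       # marginal P(Y)
    return np.abs(joint - px * py).sum()

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
print(independence_gap(x, np.sin(x) + 0.1 * rng.normal(size=5000)))  # clearly above 0
print(independence_gap(x, rng.normal(size=5000)))                    # close to 0
```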

CAUSALITY & DIRECTION OF ARROWS: X->Y VS. Y->X

CONDITIONAL PROBABILITY
[Figure: histograms of the distribution P(X) and of the conditional distribution P(X | Y > 0.4).]
Does the presence of another variable alter the distribution of X? P(cause, effect) is more likely to be explained by P(cause)P(effect | cause) than by P(effect)P(cause | effect). Also, if P(X) = P(X | Y), it may indicate that X is independent of Y.
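A sketch of the check described above: compare the distribution of X with the distribution of X restricted to, say, Y > 0.4, and quantify the difference. The threshold and the two-sample Kolmogorov-Smirnov test are my own illustrative choices:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = x ** 3 + rng.normal(scale=0.5, size=5000)   # Y depends on X

# P(X) vs. P(X | Y > 0.4): if conditioning on Y changes the distribution of X,
# the two samples differ and the KS p-value is small.
stat, p_value = ks_2samp(x, x[y > 0.4])
print(stat, p_value)                 # small p-value: X and Y are dependent

# For an independent pair the conditional and marginal distributions match,
# so the p-value stays large.
y_ind = rng.normal(size=5000)
print(ks_2samp(x, x[y_ind > 0.4]))
```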

DETERMINING THE DIRECTION OF ARROWS
ANM: fit Y = f(X) + e_x and check the independence of X and e_x to determine the strength of X->Y.
PNL: fit Y = g(f(X) + e_x) and check the independence of X and e_x.
IGCI: if X->Y, then the KL-divergence between P(Y) and a reference distribution is greater than the KL-divergence between P(X) and a reference distribution.
GPI-MML, ANM-MML, ANM-GAUSS: the likelihood of the observed data given X->Y is inversely related to the complexity of P(X) and P(Y|X).
LINGAM: fit Y = aX + e_x and X = bY + e_y; X->Y if a > b.
Note: there are assumptions associated with each method, not stated here.
REF: Statnikov et al., New methods for separating causes from effects in genomics data, BMC Genomics, 2012.

USING REGRESSION
Determining the direction of causality: the idea behind ANM.
Fit Y = f(X) + e_x and fit X = f(Y) + e_y, then check the independence of X and e_x and of Y and e_y.
[Figure: scatter plots of a pair whose true relationship is X->Y, with the fits in both directions.]
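A minimal sketch of this ANM-style check, under stated assumptions: gradient-boosted regression stands in for f, and a simple (biased) HSIC estimate with Gaussian kernels stands in for the independence test between input and residuals. Neither choice comes from the talk.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def hsic(a, b):
    """Biased HSIC estimate with Gaussian kernels (median-heuristic bandwidth)."""
    def gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        bandwidth = np.median(d2[d2 > 0]) + 1e-12
        return np.exp(-d2 / bandwidth)
    n = len(a)
    h = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return np.trace(gram(a) @ h @ gram(b) @ h) / (n * n)

def direction_score(x, y):
    """Fit y = f(x) + e and score the dependence between x and the residuals e."""
    model = GradientBoostingRegressor().fit(x.reshape(-1, 1), y)
    residuals = y - model.predict(x.reshape(-1, 1))
    return hsic(x, residuals)

def anm_direction(x, y):
    """Prefer the direction whose residuals look more independent of the input."""
    return "X->Y" if direction_score(x, y) < direction_score(y, x) else "Y->X"

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=500)
y = x ** 3 + rng.uniform(-1, 1, size=500)   # truth: X -> Y with additive noise
print(anm_direction(x, y))
```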

IDEA BEHIND LINGAM
[Figure: scatter plot of a pair whose true relationship is Y->X, with correlation ≈ 0.58. Fitting linear models in both directions gives y = 0.58x - 0.2 and x = 0.6y + 0.1.]
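A sketch of the simplified rule quoted on the earlier slide (fit linear models in both directions and compare the coefficients). This is only the intuition; the actual LiNGAM algorithm exploits the non-Gaussianity of the noise, typically via ICA, rather than comparing slopes. The data below is synthetic and illustrative.

```python
import numpy as np

def ols_slope(a, b):
    """Ordinary least-squares slope of b = slope * a + intercept."""
    slope, _ = np.polyfit(a, b, deg=1)
    return slope

def lingam_like_direction(x, y):
    """Simplified rule from the slide: fit Y = a*X + e_x and X = b*Y + e_y, X->Y if a > b."""
    a = ols_slope(x, y)
    b = ols_slope(y, x)
    return "X->Y" if a > b else "Y->X"

# Synthetic pair mimicking the slide: Y is the cause (non-Gaussian), X = 0.6*Y + noise.
rng = np.random.default_rng(0)
y = rng.uniform(-1.7, 1.7, size=5000)
x = 0.6 * y + rng.uniform(-1.5, 1.5, size=5000)
print(lingam_like_direction(x, y))   # "Y->X" on this data
```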

OUTLINE: Causality; coefficients for computing causality (independence measures, probabilistic); determining the direction of arrows; transfer learning; causality challenge; conclusions.

TRANSFER LEARNING
Can we use our knowledge from one problem and transfer it to another?
REF: Pan and Yang, A survey on transfer learning, IEEE TKDE, 22(10), 2010.

TRANSFER LEARNING: ONE POSSIBLE VIEW
[Diagram: in the source domain there is a lot of labeled data with known truth values. Features are constructed from it and used to train a classification machine. The same features are then computed in the target domain, and the trained machine produces the output labels.]

CAUSALITY & FEATURE CONSTRUCTION FOR TRANSFER LEARNING
If we know the truth values for the X-Y relationships, then construct features such as:
Independence based: correlation, chi-square, and so on.
Causality based: IGCI, ANM, PNL, and so on.
Statistical: percentiles, medians, and so on.
Machine learning: errors of prediction, and so on.
[Figure: scatter plot of one pair with correlation ≈ -0.37.]

CAUSALITY AND TRANSFER LEARNING: THE WHOLE PICTURE
[Diagram: each training pair (PAIR 1, PAIR 2, PAIR 3, ...) with a known label (X->Y, Y->X, or X ⊥ Y) is converted into a vector of features such as CORR, IG, CHI-SQ and ANM. A classification machine is trained on these feature vectors and then applied to the feature vectors of unlabeled pairs (PAIR i, PAIR j, PAIR k, ...) to produce the output.]
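A sketch of this pipeline under simplifying assumptions: every pair becomes a small feature vector (correlation plus two crude asymmetry features standing in for the causality coefficients), a random forest is trained on pairs with known labels and then applied to unseen pairs. The real challenge entries computed dozens of features; everything below, including the simulated pairs, is illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pair_features(x, y):
    """Tiny feature vector for one (X, Y) pair; real systems use many more coefficients."""
    return np.array([
        np.corrcoef(x, y)[0, 1],                     # independence-based feature
        np.std(x) - np.std(y),                       # crude asymmetry feature
        abs(np.mean(x ** 3)) - abs(np.mean(y ** 3))  # another crude asymmetry feature
    ])

def make_pair(rng, n=500):
    """Simulate one labeled pair: label 1 means X->Y, label 0 means Y->X."""
    cause = rng.uniform(-2, 2, n)
    effect = cause ** 3 + rng.normal(0, 1, n)
    if rng.random() < 0.5:
        return pair_features(cause, effect), 1       # X->Y
    return pair_features(effect, cause), 0           # Y->X

rng = np.random.default_rng(0)
train = [make_pair(rng) for _ in range(400)]
test = [make_pair(rng) for _ in range(100)]
X_train, y_train = map(np.array, zip(*train))
X_test, y_test = map(np.array, zip(*test))

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("accuracy on unseen pairs:", clf.score(X_test, y_test))
```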

OUTLINE: Causality; coefficients for computing causality (independence measures, probabilistic); determining the direction of arrows; transfer learning; causality challenge; conclusions.

CAUSE EFFECT PAIRS CHALLENGE
Pairs generated from artificial and real data (geography, demographics, chemistry, biology, etc.):
Training data: 4050 pairs (truth values: known)
Validation data: 4050 pairs (truth values: unknown)
Test data: 4050 pairs (truth values: unknown)
Variables can be categorical, numerical or binary; the identity of the variables is unknown in all cases.
REF: Guyon, Results and analysis of the 2013 ChaLearn cause-effect pair challenge, NIPS 2013.
REF: http://www.causality.inf.ethz.ch/cause-effect.php

CAUSE EFFECT PAIRS CHALLENGE https://www.kaggle.com/c/cause-effect-pairs

WHAT WERE THE BEST METHODS?
Pre-processing: smoothing, binning, transforms, noise removal, etc.
Feature extraction: independence, entropy, residuals, statistical features, etc.
Dimensionality reduction: feature selection, PCA, ICA, clustering.
Classifiers: random forests, decision trees, neural networks, etc.
REF: Guyon, Results and analysis of the 2013 ChaLearn cause-effect pair challenge, NIPS 2013.

INTERESTING RESULTS... TRANSFER LEARNING
[Table: scores of the top entries on 3648 gene-network cause-effect pairs from the E. coli regulatory network, without and with retraining: Jarfo (0.87, 0.997), FirfiD (0.6, 0.984), ProtoML (0.81, 0.99).]
REF: Guyon, Results and analysis of the 2013 ChaLearn cause-effect pair challenge, NIPS 2013.
REF: http://gnw.sourceforge.net/dreamchallenge.html

CONCLUSIONS
In many cases a single causal coefficient is not enough, so you may have to train a classifier on multiple causal features. Research on causal inference from the past decade has shown that, to a great extent, it is possible to isolate cause and effect pairs from observational data.
THANK YOU

REFERENCES
1. Statnikov et al., New methods for separating causes from effects in genomics data, BMC Genomics, 2012.
2. NIPS 2013 Workshop on Causality, http://clopinet.com/isabelle/projects/nips2013/
3. Pan and Yang, A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering, 22(10), 2010.
4. Kaggle website on machine learning challenges and the cause-effect pairs challenge, www.kaggle.com
5. All datasets are taken from the causality challenge: https://www.kaggle.com/c/cause-effect-pairs