MACHINE LEARNING FOR CAUSE-EFFECT PAIRS DETECTION. Mehreen Saeed, CLE Seminar, 11 February 2014.



WHY CAUSALITY?
Polio drops can cause polio epidemics (The Nation, January 2014)
A supernova explosion causes a burst of neutrinos (Scientific American, November 2013)
Mobile phones can cause brain tumors (The Telegraph, October 2012)
DDT pesticide may cause Alzheimer's disease (BBC, January 2014)
The price of the dollar going up causes the price of gold to go down (Investopedia.com, March 2011)

OUTLINE: Causality; coefficients for computing causality (independence measures, probabilistic); determining the direction of arrows; transfer learning; causality challenge; conclusions.

OBSERVATIONAL VS. EXPERIMENTAL DATA
Observational data is collected by recording the values of different characteristics of the subjects. Experimental data is collected by changing the values of some characteristics of the subjects, where those values are under the control of an experimenter.
Example: randomly select a group of individuals and collect data on their everyday diet and their health issues, vs. select a group of individuals with diabetes, omit a certain food from their diet, and observe the result.

OBSERVATIONAL VS. EXPERIMENTAL DATA (CONTD)
Observational data is plentiful: Google receives around 2 million requests/minute, Facebook users post around 680,000 pieces of content/minute, email users send around 200,000,000 messages in a minute. Experimental data, by contrast, is expensive, may be unethical, or may simply not be possible to collect.
Fifteen years ago it was thought that inferring causal relationships from observational data was not possible. Research by machine learning scientists such as Judea Pearl has changed this view.
REF: http://mashable.com/2012/06/22/data-created-every-minute/

CAUSALITY: FROM OBSERVATIONAL DATA TO CAUSE-EFFECT DETECTION
X->Y: smoking causes lung cancer
Y->X: lung cancer causes coughing
X ⊥ Y: winning a cricket match and being born in February
X->Z->Y: X ⊥ Y | Z (conditional independence)
X<-Z->Y: X ⊥ Y | Z (conditional independence)

OUTLINE: Causality; coefficients for computing causality (independence measures, probabilistic); determining the direction of arrows; transfer learning; causality challenge; conclusions.

CORRELATION
ρ = ( E(XY) - E(X)E(Y) ) / ( std(X) std(Y) )
[Figure: scatter plots of three pairs. Two causal pairs (X->Y) have correlations of about -0.4 and 0.9; an independent pair (X ⊥ Y) has a correlation of about 0.73.]
Correlation does not necessarily imply causality.
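A minimal sketch of computing this coefficient for one pair of variables (the synthetic data, variable names, and library choices below are illustrative assumptions, not part of the talk):

```python
import numpy as np

def correlation(x, y):
    """Pearson correlation: (E[XY] - E[X]E[Y]) / (std(X) * std(Y))."""
    return (np.mean(x * y) - np.mean(x) * np.mean(y)) / (np.std(x) * np.std(y))

# Illustrative pair where X causes Y; a strongly correlated but non-causal
# pair would produce a similar number, so the coefficient says nothing
# about direction.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(scale=0.5, size=1000)
print(correlation(x, y))           # close to +1
print(np.corrcoef(x, y)[0, 1])     # same value from NumPy's built-in
```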

χ² TEST FOR INDEPENDENCE
[Figure: two scatter plots with chi-square test results. Left: an independent pair (truth X ⊥ Y) with p-value ≈ 0.99, dof = 81, χ² ≈ 52.6. Right: a dependent pair with p-value ≈ 0, dof = 63, χ² ≈ 3255, correlation ≈ 0.59.]
Again, this test does not tell us anything about causal inference.
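One hedged way to run this χ² test on a numerical pair is to bin both variables into a contingency table first; the bin count and the SciPy routine below are my own choices, not necessarily what was used for the slide:

```python
import numpy as np
from scipy.stats import chi2_contingency

def chi2_independence(x, y, bins=10):
    """Bin two numerical variables and run a chi-square test of independence."""
    table, _, _ = np.histogram2d(x, y, bins=bins)
    table = table[table.sum(axis=1) > 0]        # drop empty rows...
    table = table[:, table.sum(axis=0) > 0]     # ...and empty columns
    chi2, p_value, dof, _ = chi2_contingency(table)
    return chi2, p_value, dof

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y_dep = x ** 2 + rng.normal(scale=0.3, size=2000)   # dependent pair
y_ind = rng.normal(size=2000)                        # independent pair
print(chi2_independence(x, y_dep))   # tiny p-value: dependence detected
print(chi2_independence(x, y_ind))   # large p-value: no evidence of dependence
# Either way, the test says nothing about which variable causes the other.
```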

STATISTICAL INDEPENDENCE FOR TWO INDEPENDENT EVENTS: P(XY)=P(X)P(Y)

STATISTICAL INDEPENDENCE (CONTD)
Measuring P(XY) - P(X)P(Y)
[Figure: two scatter plots. For a causal pair (X->Y) the measured gap P(XY) - P(X)P(Y) is about 0.9; for an independent pair (X ⊥ Y) it is about 0.4.]
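A sketch of one way to turn P(XY) - P(X)P(Y) into a single number on data: discretize both variables, estimate the joint and marginal probabilities from the bin counts, and sum the absolute gaps over the grid. The discretization and the aggregation rule are assumptions; the slide does not say how its values were obtained.

```python
import numpy as np

def independence_gap(x, y, bins=10):
    """Sum of |P(x, y) - P(x)P(y)| over a grid of bins; near 0 for independent pairs."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    joint = joint / joint.sum()                 # joint probability table P(X, Y)
    px = joint.sum(axis=1, keepdims=True)       # marginal P(X)
    py = joint.sum(axis=0, keepdims=True)       # marginal P(Y)
    return np.abs(joint - px * py).sum()

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
print(independence_gap(x, np.sin(x) + 0.1 * rng.normal(size=5000)))  # clearly above 0
print(independence_gap(x, rng.normal(size=5000)))                    # close to 0
```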

CAUSALITY & DIRECTION OF ARROWS: X->Y VS. Y->X

CONDITIONAL PROBABILITY
[Figure: histograms of the distribution P(X) and of the conditional distribution P(X | Y > 0.4).]
Does the presence of another variable alter the distribution of X? P(cause, effect) is more likely to be explained by P(cause)P(effect | cause) than by P(effect)P(cause | effect). Also, if P(X) = P(X | Y), it may indicate that X is independent of Y.
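A sketch of the check described above: compare the distribution of X with the distribution of X restricted to, say, Y > 0.4, and quantify the difference. The threshold and the two-sample Kolmogorov-Smirnov test are my own illustrative choices:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = x ** 3 + rng.normal(scale=0.5, size=5000)   # Y depends on X

# P(X) vs. P(X | Y > 0.4): if conditioning on Y changes the distribution of X,
# the two samples differ and the KS p-value is small.
stat, p_value = ks_2samp(x, x[y > 0.4])
print(stat, p_value)                 # small p-value: X and Y are dependent

# For an independent pair the conditional and marginal distributions match,
# so the p-value stays large.
y_ind = rng.normal(size=5000)
print(ks_2samp(x, x[y_ind > 0.4]))
```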

DETERMINING THE DIRECTION OF ARROWS
ANM: fit Y = f(X) + e_x and check the independence of X and e_x to determine the strength of X->Y.
PNL: fit Y = g(f(X) + e_x) and check the independence of X and e_x.
IGCI: if X->Y, then the KL-divergence between P(Y) and a reference distribution is greater than the KL-divergence between P(X) and a reference distribution.
GPI-MML, ANM-MML, ANM-GAUSS: the likelihood of the observed data given X->Y is inversely related to the complexity of P(X) and P(Y|X).
LINGAM: fit Y = aX + e_x and X = bY + e_y; X->Y if a > b.
Note: there are assumptions associated with each method, not stated here.
REF: Statnikov et al., New methods for separating causes from effects in genomics data, BMC Genomics, 2012.

USING REGRESSION
Determining the direction of causality: the idea behind ANM.
Fit Y = f(X) + e_x and fit X = f(Y) + e_y, then check the independence of X and e_x and of Y and e_y.
[Figure: scatter plots of a pair whose true relationship is X->Y, with the fits in both directions.]
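A minimal sketch of this ANM-style check, under stated assumptions: gradient-boosted regression stands in for f, and a simple (biased) HSIC estimate with Gaussian kernels stands in for the independence test between input and residuals. Neither choice comes from the talk.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def hsic(a, b):
    """Biased HSIC estimate with Gaussian kernels (median-heuristic bandwidth)."""
    def gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        bandwidth = np.median(d2[d2 > 0]) + 1e-12
        return np.exp(-d2 / bandwidth)
    n = len(a)
    h = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return np.trace(gram(a) @ h @ gram(b) @ h) / (n * n)

def direction_score(x, y):
    """Fit y = f(x) + e and score the dependence between x and the residuals e."""
    model = GradientBoostingRegressor().fit(x.reshape(-1, 1), y)
    residuals = y - model.predict(x.reshape(-1, 1))
    return hsic(x, residuals)

def anm_direction(x, y):
    """Prefer the direction whose residuals look more independent of the input."""
    return "X->Y" if direction_score(x, y) < direction_score(y, x) else "Y->X"

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=500)
y = x ** 3 + rng.uniform(-1, 1, size=500)   # truth: X -> Y with additive noise
print(anm_direction(x, y))
```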

IDEA BEHIND LINGAM
[Figure: scatter plot of a pair whose true relationship is Y->X, with correlation ≈ 0.58. Fitting linear models in both directions gives y = 0.58x - 0.2 and x = 0.6y + 0.1.]
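A sketch of the simplified rule quoted on the earlier slide (fit linear models in both directions and compare the coefficients). This is only the intuition; the actual LiNGAM algorithm exploits the non-Gaussianity of the noise, typically via ICA, rather than comparing slopes. The data below is synthetic and illustrative.

```python
import numpy as np

def ols_slope(a, b):
    """Ordinary least-squares slope of b = slope * a + intercept."""
    slope, _ = np.polyfit(a, b, deg=1)
    return slope

def lingam_like_direction(x, y):
    """Simplified rule from the slide: fit Y = a*X + e_x and X = b*Y + e_y, X->Y if a > b."""
    a = ols_slope(x, y)
    b = ols_slope(y, x)
    return "X->Y" if a > b else "Y->X"

# Synthetic pair mimicking the slide: Y is the cause (non-Gaussian), X = 0.6*Y + noise.
rng = np.random.default_rng(0)
y = rng.uniform(-1.7, 1.7, size=5000)
x = 0.6 * y + rng.uniform(-1.5, 1.5, size=5000)
print(lingam_like_direction(x, y))   # "Y->X" on this data
```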

OUTLINE: Causality; coefficients for computing causality (independence measures, probabilistic); determining the direction of arrows; transfer learning; causality challenge; conclusions.

TRANSFER LEARNING
Can we use our knowledge from one problem and transfer it to another?
REF: Pan and Yang, A survey on transfer learning, IEEE TKDE, 22(10), 2010.

TRANSFER LEARNING: ONE POSSIBLE VIEW
[Diagram: in the source domain there is a lot of labeled data with known truth values. Features are constructed from it and used to train a classification machine. The same features are then computed in the target domain, and the trained machine produces the output labels.]

CAUSALITY & FEATURE CONSTRUCTION FOR TRANSFER LEARNING
If we know the truth values for the X-Y relationships, then construct features such as:
Independence based: correlation, chi-square, and so on.
Causality based: IGCI, ANM, PNL, and so on.
Statistical: percentiles, medians, and so on.
Machine learning: errors of prediction, and so on.
[Figure: scatter plot of one pair with correlation ≈ -0.37.]

CAUSALITY AND TRANSFER LEARNING: THE WHOLE PICTURE
[Diagram: each training pair (PAIR 1, PAIR 2, PAIR 3, ...) with a known label (X->Y, Y->X, or X ⊥ Y) is converted into a vector of features such as CORR, IG, CHI-SQ and ANM. A classification machine is trained on these feature vectors and then applied to the feature vectors of unlabeled pairs (PAIR i, PAIR j, PAIR k, ...) to produce the output.]
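A sketch of this pipeline under simplifying assumptions: every pair becomes a small feature vector (correlation plus two crude asymmetry features standing in for the causality coefficients), a random forest is trained on pairs with known labels and then applied to unseen pairs. The real challenge entries computed dozens of features; everything below, including the simulated pairs, is illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pair_features(x, y):
    """Tiny feature vector for one (X, Y) pair; real systems use many more coefficients."""
    return np.array([
        np.corrcoef(x, y)[0, 1],                     # independence-based feature
        np.std(x) - np.std(y),                       # crude asymmetry feature
        abs(np.mean(x ** 3)) - abs(np.mean(y ** 3))  # another crude asymmetry feature
    ])

def make_pair(rng, n=500):
    """Simulate one labeled pair: label 1 means X->Y, label 0 means Y->X."""
    cause = rng.uniform(-2, 2, n)
    effect = cause ** 3 + rng.normal(0, 1, n)
    if rng.random() < 0.5:
        return pair_features(cause, effect), 1       # X->Y
    return pair_features(effect, cause), 0           # Y->X

rng = np.random.default_rng(0)
train = [make_pair(rng) for _ in range(400)]
test = [make_pair(rng) for _ in range(100)]
X_train, y_train = map(np.array, zip(*train))
X_test, y_test = map(np.array, zip(*test))

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("accuracy on unseen pairs:", clf.score(X_test, y_test))
```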

OUTLINE: Causality; coefficients for computing causality (independence measures, probabilistic); determining the direction of arrows; transfer learning; causality challenge; conclusions.

CAUSE EFFECT PAIRS CHALLENGE
Pairs generated from artificial and real data (geography, demographics, chemistry, biology, etc.):
Training data: 4050 pairs (truth values: known)
Validation data: 4050 pairs (truth values: unknown)
Test data: 4050 pairs (truth values: unknown)
Variables can be categorical, numerical or binary; the identity of the variables is unknown in all cases.
REF: Guyon, Results and analysis of the 2013 ChaLearn cause-effect pair challenge, NIPS 2013.
REF: http://www.causality.inf.ethz.ch/cause-effect.php

CAUSE EFFECT PAIRS CHALLENGE https://www.kaggle.com/c/cause-effect-pairs

WHAT WERE THE BEST METHODS?
Pre-processing: smoothing, binning, transforms, noise removal, etc.
Feature extraction: independence, entropy, residuals, statistical features, etc.
Dimensionality reduction: feature selection, PCA, ICA, clustering.
Classifiers: random forests, decision trees, neural networks, etc.
REF: Guyon, Results and analysis of the 2013 ChaLearn cause-effect pair challenge, NIPS 2013.

INTERESTING RESULTS... TRANSFER LEARNING
[Table: scores of the top entries on 3648 gene-network cause-effect pairs from the E. coli regulatory network, without and with retraining: Jarfo (0.87, 0.997), FirfiD (0.6, 0.984), ProtoML (0.81, 0.99).]
REF: Guyon, Results and analysis of the 2013 ChaLearn cause-effect pair challenge, NIPS 2013.
REF: http://gnw.sourceforge.net/dreamchallenge.html

CONCLUSIONS
In many cases a single causal coefficient is not enough, so you may have to train a classifier on multiple causal features. Research on causal inference from the past decade has shown that, to a great extent, it is possible to isolate cause and effect pairs from observational data.
THANK YOU

REFERENCES
1. Statnikov et al., New methods for separating causes from effects in genomics data, BMC Genomics, 2012.
2. NIPS 2013 Workshop on Causality, http://clopinet.com/isabelle/projects/nips2013/
3. Pan and Yang, A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering, 22(10), 2010.
4. Kaggle website on machine learning challenges and the cause-effect pairs challenge, www.kaggle.com
5. All datasets are taken from the causality challenge: https://www.kaggle.com/c/cause-effect-pairs