Causal Discovery Methods Using Causal Probabilistic Networks

Size: px

Start display at page:

Download "Causal Discovery Methods Using Causal Probabilistic Networks"

Alberta Norton
5 years ago
Views:

1 ausal iscovery Methods Using ausal Probabilistic Networks MINFO 2004, T02: Machine Learning Methods for ecision Support and iscovery onstantin F. liferis & Ioannis Tsamardinos iscovery Systems Laboratory epartment of iomedical Informatics Vanderbilt University 276

2 esire for ausal Knowledge iagnosis Knowing that people with cancer often have yellowstained fingers and feel fatigue, diagnose lung cancer Prevention Need to know that Smoking causes lung cancer to reduce the risk of cancer Treatment Knowing that the presence of protein X causes cancer, inactivate protein X, using medicine Y that causes X to be inactive ausal Knowledge NOT required ausal Knowledge required 277

3 Importance of ausal iscovery Today What SNP combination causes what disease How genes and proteins are organized in complex causal regulatory networks How behaviour causes disease How genotype causes differences in response to treatment How the environment modifies or even supersedes the normal causal function of genes 278

4 What is ausality? Thousands of years old problem, still debated Operational Informal efinition: ssume the existence of a mechanism M capable of setting values for a variable. We say that can be manipulated by M to take the desired values. Variable causes variable, if: in a hypothetical randomized controlled experiment in which is randomly manipulated via M (i.e., all possible values a i of are randomly assigned to via M) we would observe in the sample limit that P(= b = a i ) P(= b =a j ) for some i j. efinition is stochastic Problems: self-referencing, ignores time-dependence, variables that need to be co-manipulated, etc. 279

5 ausation and ssociation What is the relationship between the two? If causes, are and always associated? If is associated with are they always causes or effects of each other? (directly?, indirectly?, conditionally, unconditionally?) 280

6 Statistical Indistinguishability SMOKING LUNG S1 GN SMOKING LUNG S2 GN S3 SMOKING LUNG 281

7 RNOMIZ ONTROLL TRILS SMOKING LUNG S1 GN SMOKING GN LUNG S2 S3 ssociation is still retained even after manipulating Smoking SMOKING LUNG 282

8 RTs re not always feasible! Unethical (smoking) ostly/time consuming (gene manipulation, epidemiology) Impossible (astronomy) xtremely large number 283

9 Large-Scale ausal iscovery without RTs? Heuristics to the rescue What is a heuristic? When the heuristic condition holds, most probably a causal association holds 284

10 Pitfalls of ausal Heuristics If is a robust and strong predictor of T then is likely a cause of T - xample: Feature selection - xample: Predictive Rules - In the example: Tuberculosis is a strong predictor of Lung ancer (when Haemoptysis is included as a predictor), but not causally associated with Lung ancer Lung a Tuberculosis Haemoptysis 285

11 Pitfalls of ausal Heuristics The closer and T are in a causal sense, the stronger their correlation (localizes causality as well) Poor Fitness may be a stronger predictor (univariately) to lung cancer than either Occupation or Smoking individually; yet the latter predictors are causally closer to Lung ancer Smoking Fatigue Occupation Lung ancer Poor Fitness nemia Stress (Potentially) Smallest predictor set with optimal accuracy 286

12 The Problem with ausal iscovery ausal heuristics are unreliable ausation is difficult to define RTs are not always doable Major causal knowledge does not have RT backing! 287

13 Formal omputational ausal iscovery from Observational ata Formal algorithms exist! Most are based on a graphical-probabilistic language called ausal Probabilistic Networks (a.k.a. ausal ayesian Networks ) Well-characterized properties of What types of causal relations they can learn Under which conditions What kind of errors they may make 288

14 Types of ausal iscovery Questions What will be the effect of a manipulation to the system Is causing, causing, or neither? Is causing directly (no other observed variables interfere)? What is the smallest set of variables for optimally effective manipulation of? an we infer the presence of hidden confounder factors/variables? 289

15 Formal Language for Representing ausality ayesian Networks dges: probabilistic dependence Markov ondition: node N is independent from non-descendants given its parents Probabilistic reasoning ausal ayesian Networks dges represent direct causal effects ausal Markov ondition: node N is independent from nondescendants given its direct causes Probabilistic reasoning + causal inferences 290

16 ausal ayesian Networks There may be many (non-causal) Ns that capture the same distribution. ll such Ns have the same edges (ignoring direction) same v- structures Statistically equivalent G G G 291

17 ausal ayesian Networks If there is a (faithful) ausal ayesian Network that captures the data generation process, it has to have the same edges and same v-structures as any (faithful) ayesian Network that is induced by the data. We can infer what the direct causal relations are We can infer some of the directions of the edges Gene1 Gene2 Gene1 Gene2 Gene3 Gene1 Gene2 292

18 Faithfulness When d-separation independence Intuitively, an open path between and means there is association between them in the data Previous discussion holds for faithful Ns only Faithful N is a very large class of Ns 293

19 Learning ayesian Networks: onstraint-ased pproach n edge X Y (of unknown direction) exists, if and only if for all sets of nodes S, ep(x, Y S) (allows discovery of the edges) Test all subsets. If ep(x,y s) holds, add the edge, otherwise do not. If structure contains F, ep(x, Y S), then F and for every set S that F 294

20 Learning ayesian Networks: onstraint-ased pproach Tests of conditional dependences and independencies from the data stimation using G 2 statistic, conditional mutualinformation, etc. Infer structure and orientation from results of tests ased on the assumption these tests are accurate The larger the number of nodes in the conditioning set, the more samples are required to estimate the dependence, Ind(,,,) more sample than Ind(,,) For relatively sparse networks, we can d-separate two nodes conditioned on a couple of variables (sample requirements in the low hundreds) 295

21 Learning ayesian Networks: Searchand-Score Score each possible structure ayesian score: P(Structure ata) Search in the space of all possible Ns structures to find the one that maximizes score. Search space too large. Greedy or local search is typical. Greedy search: add, delete, or reverse the edge that increases the score the most. 296

22 The P algorithm (Spirtes, Glymour, Scheines 1993) Phase I: dge detection Start with a fully connected undirected network For each subset of variables of size n=0, 1, For each remaining edge If there is a subset S of variables still connected to or of size n such Ind(; S), remove edge Phase II: dge orientation For every possible V-structure with missing If ep(, ), orient While no more orientations possible If and missing, orient it as If there is a path. orient the edge as 297

23 Trace xample of the P True Graph urrent candidate graph Start with a fully connected undirected network 298

24 Trace xample of the P True Graph urrent candidate graph For subsets of size 0 For each remaining edge If there is a subset S of variables still connected to or of size n such Ind(; S), remove edge No independencies discovered 299

25 Trace xample of the P True Graph urrent candidate graph For subsets of size 1 For each remaining edge If there is a subset S of variables still connected to or of size n such Ind(; S), remove edge Ind(, ) Ind(, ) Ind(, ) Ind(, ) 300

26 Trace xample of the P True Graph urrent candidate graph For subsets of size 2 For each remaining edge If there is a subset S of variables still connected to or of size n such Ind(; S), remove edge Ind(,,) 301

27 Trace xample of the P True Graph urrent candidate graph Phase II: dge orientation For every possible V-structure: with missing If ep(, ), orient ondition does not hold 302

28 Trace xample of the P True Graph urrent candidate graph Phase II: dge orientation For every possible V-structure: with missing If ep(, ), orient ondition does not hold 303

29 Trace xample of the P True Graph urrent candidate graph Phase II: dge orientation For every possible V-structure with missing If ep(, ), orient ondition does not hold 304

30 Trace xample of the P True Graph urrent candidate graph Phase II: dge orientation For every possible V-structure with missing If ep(, ), orient ondition holds 305

31 Trace xample of the P True Graph urrent candidate graph Final output! 306

32 Min-Max Hill limbing algorithm rown,tsamardinos, liferis, MedInfo 2004 ased on the same ideas as P and uses tests of conditional independence Uses different search strategy to identify interesting independence relations Similar quality results as P but scales up to tens of thousands of variables (P can only handle a couple of hundred variables) 307

33 Local ausal iscovery Max-Min Parents and hildren: returns the parents and children of a target variable Tsamardinos, liferis, Statnikov K 2003 Scales-up to tens of thousands of variables F G I H J K L 308

34 Local ausal iscovery Max-Min Markov lanket: returns the parents and children of a target variable Scales-up to tens of thousands of variables HITON (liferis, Tsamardinos, Statnikov MI 2003) close variant: different heuristic+wrapping with a classifier to optimize for variable selection tasks F G I H J K L 309

35 Local ausal iscovery- ifferent Flavor Mani&ooper 2000, 2001, Silverstein, rin, Motwani, Ullman Rule 1:,, pairwise dependent, Ind(, ), has no causes within the observed variables (e.g. temperature in a gene expression experiment), then Α Rule 2: ep(, ), ep(, ), Ind(, ), ep(, ), then iscovers a coarser causal model (ancestor relations and indirect causality) 310

36 FI ausal iscovery with Hidden onfounders SOIL NV. GN OUPTION SMOKING LUNG Ind(S,L ) ep(s,l SM) Ind(SM,O ) ep(sm,o L) The only consistent model with all tests is one that has a hidden confounder 311

37 Other ausal iscovery lgorithms Large body of work in ayesian (or other) search and score methods; still similar set of assumptions (Neapolitan 2003) Learning with linear Structural quation Models in systems in static equilibria (allows feedback loops) (Richardson, Spirtes 1999) Learning in the presence of selection bias (ooper 1995) Learning from mixtures of experimental and observational data (ooper, Yoo, 1999) 312

38 onclusions It is possible to perform causal discovery from observational data without Randomized ontrolled Trials! Heuristic methods are typically used instead of formal causal discovery methods; their properties and their relative efficacy are unknown ausal discovery algorithms also make assumptions but have well-characterized properties There is a plethora of different algorithms with different properties and assumptions for causal discovery There is still plenty of work to be done 313

39 Suggested Further Reading Neapolitan, Learning ayesian Networks, Prentice Hall, 2003 d. Glymour, ooper, omputation, ausation, and iscovery, I Press/MIT Press, 1999 d. Jordan, Learning in Graphical Models, MIT Press, 1998 Spirtes, Glymour, Scheines, ausation, Prediction, Search, MIT Press

Machine Learning Summer School

Machine Learning Summer School Lecture 1: Introduction to Graphical Models Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ epartment of ngineering University of ambridge, UK