Health Informatics and Biostatistics: Biomarkers
1 Health Informatics and Biostatistics: Biomarkers. Antal Péter, Computational Biomedicine (Combine) workgroup, Department of Measurement and Information Systems, Budapest University of Technology and Economics
2 Overview
- Problems of discovering causal and diagnostic markers
- Genetic association studies
- Tumor marker research
- Aspects, types, and dimensions of biomarkers
- Probabilistic graphical models, causal Bayesian networks
- The value of further information
- Measures of diagnostic performance
- The statistical difficulty of biomarker learning
3 Genetic association studies (GAS). Participants provide clinical/demographic information and a quantitative/binary disease variable; genotyping/sequencing yields genetic variants, single nucleotide polymorphisms (SNPs). SNP_i vs. disease: a (pairwise) statistical test of association. Goal: SNPs relevant for a disease with a complex genetic background. Risch, N. and Merikangas, K. (1996). The future of genetic studies of complex human diseases. Science, 273(5281), pp. 1516-1517.
4 Genetic association data
- Validation SNPs
- Genome-wide polymorphism data (~1M SNPs, <$100)
- Imputed polymorphism data (~5M)
- Exome sequencing (1% of the whole genome, ×10k/×100k, $500-$1000)
- Whole-genome sequencing ($600-$2000)
5 Explained variance (R²): the missing heritability
Disease | Number of loci | Explained heritability | Heritability measure
Type 2 diabetes | 18 | 6% | Sibling recurrence
HDL cholesterol | — | —% | Phenotypic variance
Height | 40 | 5% | Phenotypic variance
Schizophrenia | 5 | 3% | Twin recurrence
6 Tumor markers. New omic markers: genomics, proteomics, metabolomics, ... "Missing the mark", Nature, 2007; "Missing the mark: Why is it so hard to find a test to predict cancer?", Nature, 2011.
7 Aspects of biomarkers: maximum predictivity with minimum redundancy; predictive power; directness; causality; multiple targets; uncertainty.
8 Questions about biomarkers
- Identification of a weakly significant biomarker: among a huge number of irrelevant factors, correction for multiple hypothesis testing
- Identification of weakly significant biomarkers: identification of interactions
- Identification of multitarget biomarkers: identification of a biomarker relevant for multiple aspects
- Identification of context-specific biomarkers: pleiotropic and epistatic interactions
- Identification of pure effect modifiers: biomarkers without main effects, parametrically and structurally (structural)
- Discrimination of direct/indirect biomarkers: strong/weak relevance (structural)
- Discrimination of diagnostic and target biomarkers: causes versus effects
- Estimation of effect size: adjusting for confounding
- Optimal selection of a diagnostic biomarker
- Optimal selection of a sequence of diagnostic biomarkers
9 Question types in medical decision support
- Diagnostic inference: P(Diagnosis | passive observations); the diagnosis with the smallest expected loss given passive observations
- Optimal information gathering: the effect of further information on the inference, P(Diagnosis | observations, new observation); the value of further information
- Therapeutic inference: P(Outcome | observation, intervention)
- Counterfactual inference: P(ImaginedOutcome | observation, intervention, outcome, imagined intervention)
10 Bayesian networks: interpretations. M_P = {I_P,1(X_1; Y_1 | Z_1), ...}; P(M,O,D,S,T) = P(M) P(O|M) P(D|O,M) P(S|D) P(T|S,M). 1. Causal model; 2. graphical representation of (in)dependencies; 3. concise representation of joint distributions; 4. decision network.
11 Motivation: from observational inference. In a Bayesian network, any query corresponding to passive observations can be answered: p(Q=q | E=e), the (conditional) probability of Q=q given that E=e. Note that Q can precede E temporally. X → Y. Specification: p(X), p(Y|X). Joint distribution: p(X,Y). Inferences: p(X), p(Y), p(Y|X), p(X|Y).
12 Motivation: to interventional inference. Perfect intervention: do(X=x), i.e., set X to x. What is the relation of p(Q=q | E=e) and p(Q=q | do(E=e))? X → Y. Specification: p(X), p(Y|X). Joint distribution: p(X,Y). Inferences: p(Y | X=x) = p(Y | do(X=x)), but p(X | Y=y) ≠ p(X | do(Y=y)). What is a formal knowledge representation of a causal model? What is the formal inference method?
13 Principles of causality:
- strong association;
- X precedes Y temporally;
- a plausible explanation, without alternative explanations based on confounding;
- necessity (generally: if the cause is removed, the effect is decreased; counterfactually: y would not have occurred with that much probability if x had not been present);
- sufficiency (generally: if exposure to the cause is increased, the effect is increased; counterfactually: y would have occurred with larger probability if x had been present);
- an autonomous, transportable mechanism.
The probabilistic definition of causation formalizes many of these aspects, but not, for example, the counterfactual ones.
14 Conditional independence. I_P(X;Y|Z), or (X ⊥ Y | Z)_P, denotes that X is independent of Y given Z: P(X,Y | z) = P(X | z) P(Y | z) for all z with P(z) > 0. (Almost) equivalently, I_P(X;Y|Z) iff P(X | z, y) = P(X | z) for all z, y with P(z, y) > 0. Other notation: dependence D_P(X;Y|Z) is defined as the negation of I_P(X;Y|Z). Contextual independence: independence that holds for some, but not all, values z.
15 The independence model of a distribution. The independence map (model) M_P of a distribution P is the set of the valid independence triplets: M_P = {I_P,1(X_1;Y_1|Z_1), ..., I_P,K(X_K;Y_K|Z_K)}. If P(X,Y,Z) is a Markov chain X → Y → Z, then M_P = {D(X;Y), D(Y;Z), I(X;Z|Y)}. Normally/almost always: D(X;Z); exceptionally: I(X;Z). (A numeric check follows below.)
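To make the chain example concrete, here is a minimal numeric sketch (parameters chosen arbitrarily for illustration, not from the lecture) verifying that X → Y → Z satisfies I(X;Z|Y) while X and Z stay marginally dependent:

```python
import numpy as np

p_x = np.array([0.3, 0.7])                    # p(X)
p_y_x = np.array([[0.9, 0.1],                 # p(Y | X=0)
                  [0.2, 0.8]])                # p(Y | X=1)
p_z_y = np.array([[0.6, 0.4],                 # p(Z | Y=0)
                  [0.1, 0.9]])                # p(Z | Y=1)

# Joint distribution of the chain: p(x,y,z) = p(x) p(y|x) p(z|y).
joint = p_x[:, None, None] * p_y_x[:, :, None] * p_z_y[None, :, :]

p_y = joint.sum(axis=(0, 2))                  # marginal p(y)
p_xz_y = joint / p_y[None, :, None]           # p(x,z | y)
p_x_y = joint.sum(axis=2) / p_y               # p(x | y)
p_z_giv_y = joint.sum(axis=0) / p_y[:, None]  # p(z | y)

# I(X;Z|Y): p(x,z|y) factors as p(x|y) p(z|y) for every y.
factored = p_x_y[:, :, None] * p_z_giv_y[None, :, :]
print("I(X;Z|Y):", np.allclose(p_xz_y, factored))   # True

# D(X;Z): marginally X and Z remain dependent for these parameters.
p_xz = joint.sum(axis=1)
p_z = joint.sum(axis=(0, 1))
print("I(X;Z):", np.allclose(p_xz, np.outer(p_x, p_z)))  # False
```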
16 The independence map of a naive Bayesian network. If P(X,Y,Z) is a naive Bayesian network with root Y (X ← Y → Z), then M_P = {D(X;Y), D(Y;Z), I(X;Z|Y)}. Normally/almost always: D(X;Z); exceptionally: I(X;Z).
17 Bayesian networks: the three facets. M_P = {I_P,1(X_1;Y_1|Z_1), ...}; P(M,O,D,S,T) = P(M) P(O|M) P(D|O,M) P(S|D) P(T|S,M). 1. Causal model; 2. graphical representation of (in)dependencies; 3. concise representation of joint distributions.
18 Inferring independencies from structure: d-separation. I_G(X;Y|Z) denotes that X is d-separated (directed-separated) from Y by Z in the directed graph G.
19 d-separation and the global Markov condition: if X and Y are d-separated by Z in G, then X is independent of Y given Z in every distribution P that factorizes according to G. (A testable sketch follows below.)
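The slide's figure is not preserved in this transcription; as a substitute, here is a self-contained sketch that tests d-separation via the classical moralization criterion (X and Y are d-separated by Z iff they are separated in the moralized ancestral graph of X ∪ Y ∪ Z). The graph encoding and function names are ours, not from the lecture:

```python
from collections import deque

def ancestors(dag, nodes):
    """All ancestors of `nodes` in `dag` (dict: node -> list of parents),
    including the nodes themselves."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in dag[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(dag, xs, ys, zs):
    relevant = ancestors(dag, set(xs) | set(ys) | set(zs))
    # Moralize: undirected child-parent edges, plus "marry" co-parents.
    adj = {v: set() for v in relevant}
    for v in relevant:
        ps = [p for p in dag[v] if p in relevant]
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
    # Remove the conditioning set and test reachability with BFS.
    blocked = set(zs)
    seen = set(xs) - blocked
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v in ys:
            return False
        for w in adj[v] - blocked - seen:
            seen.add(w); queue.append(w)
    return True

# v-structure X -> Z <- Y: X and Y are marginally d-separated,
# but conditioning on the collider Z connects them.
dag = {"X": [], "Y": [], "Z": ["X", "Y"]}
print(d_separated(dag, {"X"}, {"Y"}, set()))   # True
print(d_separated(dag, {"X"}, {"Y"}, {"Z"}))   # False
```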
20 Representation of independencies. For certain distributions an exact representation by a Bayesian network is not possible, e.g.:
1. intransitive Markov chain: X → Y → Z;
2. pure multivariate cause: {X, Z} → Y;
3. diamond structure: P(X,Y,Z,V) with M_P = {D(X;Z), D(X;Y), D(V;X), D(V;Z), I(V;Y | {X,Z}), I(X;Z | {V,Y}), ...}.
21 Markov blanket (and boundary). A Markov blanket of Y is a set of variables that renders Y conditionally independent of all remaining variables; a minimal such set is a Markov boundary.
22 The feature subset selection (FSS) problem. A feature X_i is strongly relevant if there exist x_i, y, and s_i = x_1, ..., x_{i-1}, x_{i+1}, ..., x_n such that p(x_i, s_i) > 0 and p(y | x_i, s_i) ≠ p(y | s_i). A feature X_i is weakly relevant if it is not strongly relevant, but there exist x_i, y, and some subset s_i' of s_i such that p(x_i, s_i') > 0 and p(y | x_i, s_i') ≠ p(y | s_i').
23 Biomarkers and the feature subset selection (FSS) problem
24 A Bayesian network definition. A directed acyclic graph (DAG) G is a Bayesian network of the distribution P(U) iff P(U) obeys the global Markov condition with respect to G, and G is minimal (i.e., no edge can be omitted without violating this property).
25 A practical definition
26 Association vs. causation. Causal models compatible with an observed dependency:
- X → Y: X causes Y
- X ← Y: Y causes X
- X ← * → Y: there is a common cause (pure confounding)
- X ← *...* → Y: the causal effect of Y on X is confounded by many factors
From passive observations only M_P = {D(X;Y)} and P(X,Y) are available: X and Y are associated. Reichenbach's Common Cause Principle: a correlation between events X and Y indicates either that X causes Y, or that Y causes X, or that X and Y have a common cause.
27 The building block of causality: the v-structure (arrow of time).
- p(X) p(Z|X) p(Y|Z): X → Z → Y
- p(X|Z) p(Z|Y) p(Y): X ← Z ← Y
- p(X|Z) p(Z) p(Y|Z): X ← Z → Y
These three share the transitive independence model M_P = {D(X;Z), D(Z;Y), D(X;Y), I(X;Y|Z)}.
- p(X) p(Z|X,Y) p(Y): X → Z ← Y, the v-structure, with the intransitive model M_P = {D(X;Z), D(Y;Z), I(X;Y), D(X;Y|Z)}.
Often (confounding): present knowledge renders (otherwise dependent) future states conditionally independent. Ever(?): present knowledge renders (otherwise independent) future states conditionally dependent. (A numeric illustration follows below.)
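A numeric illustration of the v-structure ("explaining away"), with arbitrary illustrative parameters: X and Y are marginally independent, yet become dependent once the common effect Z is observed:

```python
import numpy as np

p_x = np.array([0.5, 0.5])
p_y = np.array([0.5, 0.5])
# p(Z=1 | X, Y): rows indexed by x, columns by y.
p_z1 = np.array([[0.05, 0.6],
                 [0.6, 0.9]])

# Joint p(x, y, z) = p(x) p(y) p(z | x, y).
joint = np.empty((2, 2, 2))
joint[:, :, 1] = p_x[:, None] * p_y[None, :] * p_z1
joint[:, :, 0] = p_x[:, None] * p_y[None, :] * (1 - p_z1)

# Marginal independence: p(x, y) = p(x) p(y).
p_xy = joint.sum(axis=2)
print("I(X;Y):", np.allclose(p_xy, np.outer(p_x, p_y)))             # True

# Conditional dependence given Z=1: p(x,y|z=1) != p(x|z=1) p(y|z=1).
p_xy_z1 = joint[:, :, 1] / joint[:, :, 1].sum()
p_x_z1 = p_xy_z1.sum(axis=1)
p_y_z1 = p_xy_z1.sum(axis=0)
print("I(X;Y|Z=1):", np.allclose(p_xy_z1, np.outer(p_x_z1, p_y_z1)))  # False
```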
28 Observational equivalence of causal models
29 The limits of learnability: compelled edges (can we interpret edges as causal relations? Only the compelled edges, i.e., those oriented the same way in every member of the observational equivalence class.)
30 Interventional inference in causal Bayesian networks.
- (Passive, observational) inference: P(Query | observations)
- Interventional inference: P(Query | observations, interventions)
- Counterfactual inference: P(Query | observations, counterfactual conditionals)
31 Interventions and graph surgery. If G is a causal model, then compute p(y | do(X=x)) by: 1. deleting the incoming edges of X; 2. setting X = x; 3. performing standard Bayesian network inference. (Figure: a mutation/subpopulation/location/disease example, E → X? → Y with a hidden confounder; a numeric sketch follows below.)
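A minimal sketch of steps 1-3 on a hypothetical confounded model C → X, C → Y, X → Y (all names and numbers are illustrative, not from the lecture). Cutting the edge into X leaves p(C) untouched and yields the truncated factorization p(y | do(X=x)) = Σ_c p(c) p(y | x, c), which differs from the observational p(y | X=x) = Σ_c p(c | x) p(y | x, c):

```python
import numpy as np

p_c = np.array([0.4, 0.6])                    # p(C)
p_x_c = np.array([[0.8, 0.2],                 # p(X | C=0)
                  [0.3, 0.7]])                # p(X | C=1)
p_y_xc = np.array([[[0.9, 0.1], [0.5, 0.5]],  # p(Y | X=0, C=c)
                   [[0.6, 0.4], [0.2, 0.8]]]) # p(Y | X=1, C=c)

x = 1
# Observational: conditioning on X=x updates C via the Bayes rule.
p_c_given_x = p_c * p_x_c[:, x]
p_c_given_x /= p_c_given_x.sum()
p_y_obs = (p_c_given_x[:, None] * p_y_xc[x]).sum(axis=0)

# Interventional: delete C -> X (graph surgery), keep the prior p(C).
p_y_do = (p_c[:, None] * p_y_xc[x]).sum(axis=0)

print("p(Y | X=1)     =", p_y_obs)
print("p(Y | do(X=1)) =", p_y_do)             # differs due to confounding
```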
32 Local causal discovery: can we interpret edges as causal relations in the presence of hidden variables? Can we learn causal relations from observational data in the presence of confounders? (Figure: two competing explanations of the smoking-lung cancer association: smoking causes lung cancer, versus a genetic polymorphism that increases both the propensity to smoke and the susceptibility to lung cancer.) Automated, tabula rasa causal inference from (passive) observation is possible in some cases, i.e., hidden, confounding variables can be excluded.
33 Questions about biomarkers (recap of slide 8).
34 Sensitivity of the inference. (Plot: P(Pathology = malignant | E = e) as a function of the evidence e.)
35 Decision theory = probability theory + utility theory. A decision situation consists of: actions; outcomes; probabilities of outcomes; utilities/losses of outcomes (QALY, micromort). Maximum Expected Utility principle (MEU): the best action is the one with maximum expected utility, EU(a_i) = Σ_j p(o_j | a_i) U(o_j | a_i), a* = argmax_i EU(a_i). For experiment design: actions a_i (which experiment), outcomes o_j (e.g., a dataset), probabilities P(o_j | a_i), utilities and costs U(o_j), C(a_i), expected utilities EU(a_i) = Σ_j P(o_j | a_i) U(o_j).
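A toy instance of the MEU principle with made-up actions, probabilities, and utilities (none of these numbers come from the lecture):

```python
import numpy as np

actions = ["treat", "wait"]
utilities = np.array([100.0, -50.0])          # U(o_j) for outcomes [cured, worse]
p_outcome_given_action = np.array([
    [0.8, 0.2],                               # p(o_j | treat)
    [0.4, 0.6],                               # p(o_j | wait)
])

# EU(a_i) = sum_j p(o_j | a_i) U(o_j); pick a* = argmax_i EU(a_i).
expected_utility = p_outcome_given_action @ utilities
best = int(np.argmax(expected_utility))
print(dict(zip(actions, expected_utility)))   # {'treat': 70.0, 'wait': 10.0}
print("a* =", actions[best])                  # a* = treat
```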
36 Maximizing expected utility
39 Value of (perfect) information
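A minimal sketch of computing the value of perfect information for a hypothetical variable W, as VPI(W) = Σ_w p(w) max_a EU(a | w) - max_a EU(a); all numbers are illustrative, not from the lecture:

```python
import numpy as np

p_w = np.array([0.3, 0.7])                    # prior over W (disease yes/no)
# EU(a | w): rows = actions [treat, wait], columns = states of W.
eu_given_w = np.array([[80.0, -10.0],
                       [-40.0, 50.0]])

eu_prior = eu_given_w @ p_w                   # EU(a) without observing W
meu_without = eu_prior.max()                  # best fixed action
meu_with = (p_w * eu_given_w.max(axis=0)).sum()  # best action per value of W

print("MEU without W:", meu_without)          # 23.0
print("MEU with    W:", meu_with)             # 59.0
print("VPI(W)       :", meu_with - meu_without)  # 36.0
```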
41 Extensions: Bayesian learning (predictive inference, parametric inference); the value of further information; sequential decisions; optimal stopping (the secretary problem); the multi-armed bandit problem; Markov decision processes. (Figure: a decision-tree fragment with utilities U_i(e_i) and decisions D_ij.)
42 Characterizing a biomarker/test.
- Sensitivity: p(Prediction=TRUE | Ref=TRUE)
- Specificity: p(Prediction=FALSE | Ref=FALSE)
- PPV: p(Ref=TRUE | Prediction=TRUE)
- NPV: p(Ref=FALSE | Prediction=FALSE)
(Figure: overlapping score distributions of the healthy and diseased populations with a decision threshold t; a computational sketch follows below.)
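The four quantities can be computed directly from a thresholded score; the following sketch uses synthetic labels and scores (the distributions and the threshold t are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
ref = rng.random(1000) < 0.3                  # reference standard (TRUE = diseased)
scores = np.where(ref,
                  rng.normal(2.0, 1.0, 1000), # diseased score distribution
                  rng.normal(0.0, 1.0, 1000)) # healthy score distribution

t = 1.0
pred = scores > t                             # the test calls "disease" above t

tp = np.sum(pred & ref);   fn = np.sum(~pred & ref)
fp = np.sum(pred & ~ref);  tn = np.sum(~pred & ~ref)

print("sensitivity p(pred=T | ref=T):", tp / (tp + fn))
print("specificity p(pred=F | ref=F):", tn / (tn + fp))
print("PPV         p(ref=T | pred=T):", tp / (tp + fp))
print("NPV         p(ref=F | pred=F):", tn / (tn + fn))
```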
43 Questions about biomarkers (recap of slide 8).
44 Why can we learn? "The most incomprehensible thing about the world is that it is at all comprehensible." (Albert Einstein) "No theory of knowledge should attempt to explain why we are successful in our attempt to explain things." (K. R. Popper: Objective Knowledge, 1972) The possibility of learning is an empirical observation.
45 Principles for induction. Epicurus' (342?-270 B.C.) principle of multiple explanations states that one should keep all hypotheses that are consistent with the data. The principle of Occam's razor (sometimes spelt Ockham) states that when inferring causes, entities should not be multiplied beyond necessity. This is widely understood to mean: among all hypotheses consistent with the observations, choose the simplest. In terms of a prior distribution over hypotheses, this is the same as giving simpler hypotheses higher a priori probability and more complex ones lower probability.
46 Bayesian model averaging (Russell & Norvig: Artificial Intelligence, ch. 20)
47 Bayesian model averaging example (Russell & Norvig: Artificial Intelligence)
48 Learning rate for models (Russell & Norvig: Artificial Intelligence)
49 Learning rate for model predictions (Russell & Norvig: Artificial Intelligence)
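The referenced example is not reproduced in this transcription; as a substitute, here is a minimal sketch in the spirit of the Russell & Norvig candy-bag example (ch. 20): five hypotheses h_i about the fraction of lime candies, a prior over them, and a prediction obtained by averaging over the posterior rather than picking one model:

```python
import numpy as np

p_lime_given_h = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # theta_i of each h_i
prior = np.array([0.1, 0.2, 0.4, 0.2, 0.1])             # p(h_i)

posterior = prior.copy()
for _ in range(5):                      # observe 5 lime candies in a row
    posterior *= p_lime_given_h         # likelihood of one more lime under h_i
    posterior /= posterior.sum()        # renormalize p(h_i | d)

# BMA prediction: p(next = lime | d) = sum_i p(lime | h_i) p(h_i | d).
print("posterior:", posterior.round(3))
print("p(next = lime):", float(posterior @ p_lime_given_h))
```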
52 Probably Approximately Correct (PAC) learning. To have at least 1 − δ probability of approximate correctness, require |H| (1 − ε)^n ≤ δ. Expressing the sample size as a function of the accuracy ε and the confidence δ gives a bound on the sample complexity: n ≥ (1/ε)(ln |H| + ln(1/δ)).
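Evaluating the bound directly (the helper function is ours; the example hypothesis space is the decision-tree count from the slide below):

```python
import math

def pac_sample_size(h_size: int, epsilon: float, delta: float) -> int:
    """Samples sufficient so that, with probability >= 1 - delta, any
    hypothesis consistent with the data has error <= epsilon."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

# E.g., decision trees over 6 Boolean attributes: |H| = 2**(2**6) = 2**64.
print(pac_sample_size(2 ** 64, epsilon=0.1, delta=0.05))  # 474
```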
53 Decision trees: one possible representation for hypotheses. E.g., the true tree for deciding whether to wait for a table. (Figure: the WillWait decision tree.)
54 Expressiveness. Decision trees can express any function of the input attributes. E.g., for Boolean functions: truth table row → path to leaf. Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples. Prefer to find more compact decision trees.
55 Hypothesis spaces. How many distinct decision trees are there with n Boolean attributes? = the number of Boolean functions = the number of distinct truth tables with 2^n rows = 2^(2^n). E.g., with 6 Boolean attributes there are 2^64 = 18,446,744,073,709,551,616 trees. How many purely conjunctive hypotheses (e.g., Hungry ∧ ¬Rain)? Each attribute can be in (positive), in (negative), or out, giving 3^n distinct conjunctive hypotheses. A more expressive hypothesis space increases the chance that the target function can be expressed, but also increases the number of hypotheses consistent with the training set, so it may yield worse predictions.
56 Multiple testing problem (MTP). If we perform N tests and our goal is p(FalseRejection_1 or ... or FalseRejection_N) < α, then we have to ensure, e.g., that p(FalseRejection_i) < α/N for all i, at the cost of statistical power (the probability of discovering a true effect)! (A small simulation follows below.)
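A quick simulation of the problem and the α/N (Bonferroni) fix on pure-noise data, where every null hypothesis is true (requires numpy and scipy; all parameters are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
N, alpha, runs = 1000, 0.05, 200
fwer_raw = fwer_bonf = 0
for _ in range(runs):
    # N independent one-sample t-tests on pure noise (the null is true).
    data = rng.normal(size=(N, 30))
    p = stats.ttest_1samp(data, 0.0, axis=1).pvalue
    fwer_raw += (p < alpha).any()           # any false rejection, uncorrected
    fwer_bonf += (p < alpha / N).any()      # any false rejection, Bonferroni

print("FWER uncorrected:", fwer_raw / runs)   # close to 1
print("FWER Bonferroni :", fwer_bonf / runs)  # below alpha
```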
57 Solutions for the MTP: study design (incorporation of priors); corrections; permutation tests (generate perturbed data sets under the null hypothesis by permuting predictors and outcome); false discovery rate, q-value; Bayesian approaches.
58 Corrections for multiple testing
59 Corrections for multiple testing. I have 1,000,000 hypotheses that are not mutually exclusive.
1. I test them all. Correction?
2. I plan to test them all, but I run out of resources after testing only one of them. Correction?
3. I test one of them, and a year later test the others. Correction? If so, when?
4. I only test the first one, because that is the one I suspect. Correction?
5. I run an algorithm that prunes unlikely hypotheses, keeping only 100,000. Correction for 100,000 or for 1,000,000 hypotheses?
(R. Neapolitan, 2010)
60 Permutation testing: permute the outcome/target. (Data layout: outcome Y and predictor variables X_1, ..., X_n over the samples.) A random permutation guarantees the independence of the outcome Y from the predictors, so a random permutation corresponds to an artificial data set drawn from the null model. This yields a direct estimate of the p-value, the probability of observing a data set at least as extreme under the null model with the same sample size: p-value ≈ p(IncompatibilityWithNull(D_N^real) ≤ IncompatibilityWithNull(D_N^perm)).
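A direct implementation of this scheme on synthetic data, using absolute correlation as the incompatibility statistic (both the statistic choice and the data-generating process are ours, for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
x = rng.normal(size=n)                        # predictor
y = 0.3 * x + rng.normal(size=n)              # outcome, weakly linked to x

def statistic(x, y):
    """Incompatibility with the null: absolute Pearson correlation."""
    return abs(np.corrcoef(x, y)[0, 1])

t_real = statistic(x, y)
n_perm = 10_000
t_perm = np.array([statistic(x, rng.permutation(y)) for _ in range(n_perm)])

# Add-one estimate of the p-value: fraction of permuted data sets at least
# as extreme as the real one.
p_value = (1 + np.sum(t_perm >= t_real)) / (1 + n_perm)
print("permutation p-value:", p_value)
```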
61 False discovery rate (FDR) I. Another aspect of multiple hypothesis testing: instead of the probability of a Type I error in any of the tests, control the expected proportion of Type I errors among the rejections at a given significance level (the false discovery rate, FDR). q-value: the minimum FDR at which the test may be called significant.
62 False discovery rate (FDR) II.
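The content of this slide is not preserved in the transcription; as a minimal sketch of FDR control, here is the Benjamini-Hochberg step-up procedure (our implementation, not necessarily the one shown in the lecture): reject the k smallest p-values, where k is the largest index with p_(k) ≤ k·q/N:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected at FDR level q."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    ranked = p[order]
    thresholds = q * np.arange(1, len(p) + 1) / len(p)   # k*q/N for each rank k
    below = np.nonzero(ranked <= thresholds)[0]
    rejected = np.zeros(len(p), dtype=bool)
    if below.size:
        rejected[order[: below[-1] + 1]] = True          # step-up: reject 1..k
    return rejected

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.3, 0.7]
print(benjamini_hochberg(pvals, q=0.05))   # rejects only the two smallest
```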
63 Bayesian-network based Bayesian multilevel analysis (BN-BMLA). Hierarchical statistical questions about typed relevance can be translated to questions about Bayesian network structural features:
- pairwise association: Markov blanket memberships (MBM)
- multivariable analysis: Markov blanket sets (MB)
- multivariable analysis with interactions: Markov blanket subgraphs (MBG)
- complete dependency models: partially directed acyclic graphs (PDAG)
- complete causal models: Bayesian networks (BN)
Hierarchy of levels: BN → PDAG → MBG → MB → MBM.
64 Linked hypothesis classes at different levels of abstraction. The questions of association studies can be formalized via structural features of Bayesian networks:
- pairwise strong relevance: Markov blanket memberships (MBM)
- multivariate strong relevance: Markov blanket sets (k-sub/sup-MBS)
- multifactorial interaction subgraph: Markov blanket subgraphs (MBG, C-RPDAG)
- complete interaction model: partially directed Bayesian network (PDAG)
- complete causal model: Bayesian network (BN)
Linked abstraction levels: DAG ⇒ PDAG ⇒ MBG ⇒ C-RPDAG ⇒ MB ⇒ MBM.