Greedy Biomarker Discovery in the Genome with Applications to Antibiotic Resistance


1 Greedy Biomarker Discovery in the Genome with Applications to Antibiotic Resistance
Alexandre Drouin, Sébastien Giguère, Maxime Déraspe, François Laviolette, Mario Marchand, Jacques Corbeil
Department of Computer Science and Software Engineering, Université Laval; Department of Molecular Medicine, Université Laval; Institute for Research in Immunology and Cancer, Université de Montréal
Greed is Great workshop, ICML 2015, Lille, France, July 10, 2015

2 Outline
1. Introduction: Genomics; Formalization
2. Methods: Data Representation for Genomes; The Set Covering Machine; Risk bounds
3. Results: Dataset Overview; Benchmark
4. Conclusion

3 Introduction

4 Genomics
Study of the entire genetic material of individuals. DNA is composed of four nucleotides (A, T, G, C). DNA molecules are sequenced using DNA sequencers.

5 Cost of sequencing
[figure: cost of genome sequencing over time]
Consequence: more and more data to analyse.

6 Biomarker Discovery: Case-Control Studies
Cancer vs Healthy
Biomarker: a measurable characteristic that is predictive of some biological state.
Motivation: obtain a better understanding of the biological processes involved; develop diagnostic tests, new therapies and drug treatments.


8 Formalization as a Supervised Learning Problem
Data: a sample S = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)} ~ D^m, where
- x ∈ X = {A, T, G, C}* is a genome
- y ∈ {0, 1} is a label (control or case)
- D is the data-generating distribution
Objective:
1. Define a suitable representation for genomes, φ : X → R^d
2. Find a predictor h : R^d → {0, 1} with good generalization performance, i.e., that minimizes the risk R(h) = Pr_{(x,y)~D} [h(φ(x)) ≠ y]


11 The Biomarker Discovery Problem
Additional objective: model interpretability
- Must have a form that is understandable by domain experts (validation/acceptance)
- Some types of models are more easily understood (rule-based vs linear combination)
- Sparsity is essential (lower costs, faster diagnostics)
Challenges
- Extremely high-dimensional feature spaces (d is often > 10^7)
- Many highly correlated features (genes)
- Small sample size (m ≪ d)


13 Methods Alexandre Drouin (Université Laval) Greedy Biomarker Discovery July 10, / 24

14 Genome Representation
Definition: a k-mer is a sequence of k nucleotides. Note: there are 4^k possible sequences (HUGE).
Definition: K is the set of all k-mers present in at least one genome of S. Note: |K| can STILL be HUGE (tens of millions in our case).
We represent each genome x by a binary vector φ(x) ∈ B^|K|, such that φ(x)_j = 1 if k_j ∈ K is a substring of x, and 0 otherwise.


17 Genome Representation (Example)
Bag-of-words representation with
K = {CAGATA, AGATAG, AACAGC, GATAGA, AGAACA, TAGAAC, GAACAG, ATAGAA, TTTCGG, CGATGA, CCGGCT, AAATAC}
For x = CAGATAGAACAGC:
φ(x) = (CAGATA: 1, TTTCGG: 0, AGATAG: 1, GATAGA: 1, CGATGA: 0, AACAGC: 1, ATAGAA: 1, CCGGCT: 0, TAGAAC: 1, GAACAG: 1, AGAACA: 1, AAATAC: 0)
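The example above can be reproduced in a few lines (a minimal sketch; the function names `kmers`, `build_kmer_set`, and `phi` are ours, not from the authors' implementation):

```python
def kmers(genome, k):
    """All distinct length-k substrings of a genome."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}

def build_kmer_set(genomes, k):
    """K: every k-mer present in at least one genome of the sample."""
    K = set()
    for g in genomes:
        K |= kmers(g, k)
    return sorted(K)

def phi(genome, K, k):
    """Binary vector: phi(x)_j = 1 iff k-mer k_j is a substring of x."""
    present = kmers(genome, k)
    return [1 if kj in present else 0 for kj in K]

# The 12 k-mers and the genome from the slide
K = sorted(["CAGATA", "AGATAG", "AACAGC", "GATAGA", "AGAACA", "TAGAAC",
            "GAACAG", "ATAGAA", "TTTCGG", "CGATGA", "CCGGCT", "AAATAC"])
x = "CAGATAGAACAGC"
vec = phi(x, K, 6)
print(sum(vec))  # 8 of the 12 k-mers occur in x
```

All eight 6-mers of the 13-nucleotide genome appear in K, so exactly eight entries of φ(x) are set.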

18 The Set Covering Machine (Marchand and Shawe-Taylor, 2003)
Learns conjunctions or disjunctions of boolean-valued rules r_i : R^d → {True, False}. We use a presence and an absence rule for each k-mer.
Objective: given a set of boolean-valued rules R, find the predictor that minimizes the empirical risk
R_S(h) = (1/m) Σ_{i=1}^{m} I[h(φ(x_i)) ≠ y_i],
while using the smallest subset of R.
This problem is NP-hard (minimum set cover problem). Solution: use a greedy approximation algorithm inspired by that of Chvátal (1979).


23 The Set Covering Machine (Marchand and Shawe-Taylor, 2003)
Greedy Algorithm (Conjunction Case)
1. Start with an empty conjunction
2. Compute a utility function for each rule of R
3. Select the rule with the greatest utility (r*)
4. Remove all the examples for which r*(φ(x)) = False
5. Go to step 2 until one of the following is true: all the negative examples have been removed, or s iterations have been performed (hyperparameter)
Motivation for step 4: the outcome of the conjunction is definitive for any example that is predicted as negative by at least one rule; there is no need to consider these examples in further iterations.

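The five steps can be sketched directly (an illustrative implementation, not the authors' code; `X` is assumed to be a binary k-mer matrix, each candidate rule tests presence or absence of one feature, and the utility `|A| − p·|B|` is the one defined on the next slide):

```python
import numpy as np

def scm_conjunction(X, y, s=5, p=1.0):
    """Greedy SCM training, conjunction case.

    Returns a list of (feature, value) rules; the conjunction predicts
    the positive class iff every rule X[:, feature] == value holds.
    """
    rules = []
    active = np.ones(len(y), dtype=bool)       # examples still considered
    for _ in range(s):
        neg = active & (y == 0)
        pos = active & (y == 1)
        if not neg.any():                      # all negatives covered: stop
            break
        best, best_u = None, -np.inf
        for j in range(X.shape[1]):
            for v in (1, 0):                   # presence rule, absence rule
                out = X[:, j] == v             # True -> rule predicts positive
                A = (neg & ~out).sum()         # negatives correctly rejected
                B = (pos & ~out).sum()         # positives wrongly rejected
                u = A - p * B
                if u > best_u:
                    best, best_u = (j, v), u
        rules.append(best)
        j, v = best
        active &= (X[:, j] == v)               # step 4: drop rejected examples
    return rules

# Toy data: positives are exactly the rows with feature 0 AND feature 1 set
X = np.array([[1, 1, 0], [1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 0]])
y = np.array([1, 1, 0, 0, 0])
print(scm_conjunction(X, y))  # [(0, 1), (1, 1)]
```

On this toy sample the algorithm recovers the two presence rules whose conjunction separates the classes, stopping early once no negative example remains.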

29 The Set Covering Machine (Marchand and Shawe-Taylor, 2003)
Utility Function: for any rule r_i, the utility is given by
U_i = |A_i| − p · |B_i|,
where A_i is the subset of negative examples correctly classified by r_i, B_i is the subset of positive examples incorrectly classified by r_i, and p is a hyperparameter.
Scalability: the complexity is O(m · |R| · s), thus linear in the number of examples and rules. We developed an out-of-core implementation (data is loaded/analysed in blocks).

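Because |A_i| and |B_i| are counts over examples, the utilities are additive across row blocks of the feature matrix, which is the essence of the out-of-core strategy (a sketch under that assumption; the function name and block layout are ours):

```python
import numpy as np

def utilities_out_of_core(blocks, p=1.0):
    """Accumulate U_j = |A_j| - p * |B_j| for the presence rule of each
    feature, streaming (X_block, y_block) pairs so the full matrix
    never has to sit in memory at once."""
    A = B = None
    for Xb, yb in blocks:
        reject = Xb == 0                       # presence rule predicts negative
        a = reject[yb == 0].sum(axis=0)        # negatives correctly rejected
        b = reject[yb == 1].sum(axis=0)        # positives wrongly rejected
        A = a if A is None else A + a
        B = b if B is None else B + b
    return A - p * B

# Two blocks of a tiny 4-example, 2-feature matrix
X = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y = np.array([1, 0, 1, 0])
U = utilities_out_of_core([(X[:2], y[:2]), (X[2:], y[2:])])
print(U)
```

Streaming changes nothing about the result: the accumulated utilities equal those computed on the full matrix in one pass.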

32 Can we expect good generalization?
We can bound the risk of a predictor based on its performance on the training set. The following term bounds the risk of every conjunction h of rules in R with probability 1 − δ.
Occam's Razor Bound:
ε = (1 / (m − r)) [ ln C(m, r) + |h| ln(2 · 4^k) − ln(ζ(r) ζ(|h|) δ) ],
where C(m, r) is the binomial coefficient, r is the number of errors on the training set, |h| is the number of rules in the conjunction, and ζ is any function such that Σ_{b ∈ N} ζ(b) ≤ 1.
The combinatorial term dominates the bound even for classifiers that make few errors. The bound seems to indicate bad generalization performance.

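The domination of the combinatorial term can be checked numerically (a sketch; the values of m, r, |h|, and δ are illustrative, and ζ(b) = 6/(π²(b+1)²) is one valid choice since it sums to 1 over the natural numbers):

```python
from math import comb, log, pi

def zeta(b):
    """One valid zeta: 6 / (pi^2 (b+1)^2), summing to 1 over b >= 0."""
    return 6.0 / (pi ** 2 * (b + 1) ** 2)

def occam_bound(m, r, k, h, delta=0.05):
    """epsilon = (1/(m-r)) [ln C(m,r) + |h| ln(2*4^k)
                            - ln(zeta(r) zeta(|h|) delta)]"""
    return (log(comb(m, r)) + h * log(2 * 4 ** k)
            - log(zeta(r) * zeta(h) * delta)) / (m - r)

# Even a 3-rule model with 2 training errors on 100 examples gets a
# vacuous bound (> 1) for k = 31: the |h| ln(2*4^k) term dominates.
print(occam_bound(m=100, r=2, k=31, h=3))
```

With these numbers the |h| ln(2·4^k) term contributes over 130 nats against fewer than 20 from everything else, so the bound exceeds 1 and is uninformative.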

36 Can we expect good generalization?
In the sample compression framework, the predictor h is specified using a small set of training examples (Z_i):
Sample Compression Bound:
ε = (1 / (m − |h| − r)) [ ln C(m, |h|) + ln C(m − |h|, r) + Σ_{x ∈ Z_i} ln(2 |x|) − ln(ζ(|h|) ζ(r) δ) ],
where r is the number of errors made on S \ Z_i.
The bound no longer depends on k. We can consider exponentially more complex feature spaces without any penalty on the generalization error.

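A companion numeric check for the sample compression bound, in the same illustrative setting as before (the genome lengths, here three compression-set genomes of roughly 4 Mbp, are assumed values; note that k never enters the formula):

```python
from math import comb, log, pi

def zeta(b):
    """One valid zeta: 6 / (pi^2 (b+1)^2), summing to 1 over b >= 0."""
    return 6.0 / (pi ** 2 * (b + 1) ** 2)

def compression_bound(m, h, r, genome_lengths, delta=0.05):
    """epsilon = (1/(m-|h|-r)) [ln C(m,|h|) + ln C(m-|h|, r)
                 + sum_{x in Z_i} ln(2|x|) - ln(zeta(|h|) zeta(r) delta)]"""
    return (log(comb(m, h)) + log(comb(m - h, r))
            + sum(log(2 * L) for L in genome_lengths)
            - log(zeta(h) * zeta(r) * delta)) / (m - h - r)

# Same m, r, |h| as the Occam check; complexity is now paid per
# compression-set genome length |x| rather than per k.
print(compression_bound(m=100, h=3, r=2, genome_lengths=[4_000_000] * 3))
```

Under the same m = 100, r = 2, |h| = 3 the bound drops below 1, illustrating the claim that complex k-mer spaces carry no penalty in this framework.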

40 Results

41 Datasets
[table: number of distinct k-mers and number of examples per dataset; the numeric values were lost in extraction]
Clostridium difficile: Azithromycin, Ceftriaxone, Clarithromycin, Clindamycin, Moxifloxacin
Pseudomonas aeruginosa: Amikacin, Doripenem, Meropenem, Levofloxacin
Streptococcus pneumoniae: Benzylpenicillin, Erythromycin, Tetracycline

42 Comparison to other methods
We compared the SCM to CART and to both L1- and L2-regularized support vector machines. For all algorithms except the SCM, the dimensionality of the feature space had to be reduced. Univariate filter: a χ² test to score each feature and the Benjamini-Yekutieli method to correct for multiple testing. We performed 5-fold nested cross-validation and compared the average risk and number of k-mers in the models over the outer folds.

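The multiple-testing correction in the filter can be sketched as follows (the p-values below are illustrative; in the benchmark each one would come from a χ² test of one k-mer feature against the resistance label, e.g. via scikit-learn's `feature_selection.chi2`):

```python
import numpy as np

def benjamini_yekutieli(pvals, alpha=0.05):
    """Benjamini-Yekutieli step-up procedure: return a boolean mask of
    features kept at FDR level alpha, valid under arbitrary dependence
    between the tests (as expected for overlapping k-mers)."""
    p = np.asarray(pvals)
    m = len(p)
    c_m = np.sum(1.0 / np.arange(1, m + 1))      # harmonic correction factor
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / (m * c_m)
    below = p[order] <= thresh
    keep = np.zeros(m, dtype=bool)
    if below.any():
        cutoff = np.nonzero(below)[0].max()      # largest i with p_(i) <= t_i
        keep[order[:cutoff + 1]] = True
    return keep

pvals = [1e-8, 0.2, 1e-5, 0.04, 0.6]
print(benjamini_yekutieli(pvals))  # keeps only the two smallest p-values
```

The extra harmonic factor c_m makes this stricter than Benjamini-Hochberg, which is the price for validity under the strong dependence between k-mer features.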

46 Benchmark
[table: test risk per dataset for each method and a baseline; the risk values were lost in extraction. The average model sizes (number of k-mers, shown in parentheses in the original) survive:]

Dataset               SCM    χ²+SCM  χ²+CART  χ²+L1SVM  χ²+L2SVM
C. difficile
  Azithromycin        3.2    4.8     6.6      494.6
  Ceftriaxone         2.0    5.6     7.2      277.8
  Clarithromycin      3.0    4.6     7.6      522.6
  Clindamycin         2.0    2.4     2.4      702.2
  Moxifloxacin        1.0    1.8     1.0      173.6
P. aeruginosa
  Amikacin            6.0    9.8     18.8     687.8
  Doripenem           1.4    1.6     25.4     44.8
  Meropenem           1.8    1.8     9.2      233.6     3475.6
  Levofloxacin        1.4    1.8     1.0      180.4
S. pneumoniae
  Benzylpenicillin    1.0    1.2     1.8      295.8
  Erythromycin        2.0    5.6     4.4      299.4
  Tetracycline        1.2    2.2     1.0      479.8
Average               2.2    3.6     7.2      366.0

The SCM tends to learn the sparsest models. On most datasets, the SCM generalizes well and outperforms the baseline. Using univariate filters seems to degrade performance (SCM vs χ² + SCM).


50 The Obtained Models are Interpretable
[figure: biological validation of the k-mers selected for the C. difficile antibiotics (azithromycin, ceftriaxone, clarithromycin, clindamycin, moxifloxacin). Annotations recoverable from the figure include: DNA gyrase subunit A; Tn6194-like transposon; two-component sensor histidine kinase; transposon Tn6110 and Clostridium saccharolyticum 23S rRNA m(2)A-2503 methyltransferase; penicillin-binding protein; conjugative transposon FtsK/SpoIIIE-like protein; ErmB rRNA adenine N-6-methyltransferase; other hypothetical proteins and unmatched k-mers]

51 Conclusion

52 Conclusion
We used the Set Covering Machine to learn from extremely high-dimensional feature spaces with small sample sizes.
Scalability: the Set Covering Machine is the only algorithm that did not require feature selection.
Generalization: the obtained models compare favorably to other learning algorithms in terms of prediction error.
Interpretability: the obtained models are sparse and explicitly highlight the importance of small DNA sequences.
For all these reasons, greed is great!


57 Thank you! Come see me at our poster. Thanks to my co-authors: Sébastien Giguère, Maxime Déraspe, François Laviolette, Mario Marchand, Jacques Corbeil.


More information

Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 3 April 5, 2013 Due: April 19, 2013

Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 3 April 5, 2013 Due: April 19, 2013 Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 3 April 5, 2013 Due: April 19, 2013 A. Kernels 1. Let X be a finite set. Show that the kernel

More information

Variable Selection in Data Mining Project

Variable Selection in Data Mining Project Variable Selection Variable Selection in Data Mining Project Gilles Godbout IFT 6266 - Algorithmes d Apprentissage Session Project Dept. Informatique et Recherche Opérationnelle Université de Montréal

More information

Sparse Approximation and Variable Selection

Sparse Approximation and Variable Selection Sparse Approximation and Variable Selection Lorenzo Rosasco 9.520 Class 07 February 26, 2007 About this class Goal To introduce the problem of variable selection, discuss its connection to sparse approximation

More information

Computational Learning Theory

Computational Learning Theory CS 446 Machine Learning Fall 2016 OCT 11, 2016 Computational Learning Theory Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes 1 PAC Learning We want to develop a theory to relate the probability of successful

More information

Statistical learning theory, Support vector machines, and Bioinformatics

Statistical learning theory, Support vector machines, and Bioinformatics 1 Statistical learning theory, Support vector machines, and Bioinformatics Jean-Philippe.Vert@mines.org Ecole des Mines de Paris Computational Biology group ENS Paris, november 25, 2003. 2 Overview 1.

More information

FACTORIZATION MACHINES AS A TOOL FOR HEALTHCARE CASE STUDY ON TYPE 2 DIABETES DETECTION

FACTORIZATION MACHINES AS A TOOL FOR HEALTHCARE CASE STUDY ON TYPE 2 DIABETES DETECTION SunLab Enlighten the World FACTORIZATION MACHINES AS A TOOL FOR HEALTHCARE CASE STUDY ON TYPE 2 DIABETES DETECTION Ioakeim (Kimis) Perros and Jimeng Sun perros@gatech.edu, jsun@cc.gatech.edu COMPUTATIONAL

More information

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features

More information

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features

More information

Question of the Day. Machine Learning 2D1431. Decision Tree for PlayTennis. Outline. Lecture 4: Decision Tree Learning

Question of the Day. Machine Learning 2D1431. Decision Tree for PlayTennis. Outline. Lecture 4: Decision Tree Learning Question of the Day Machine Learning 2D1431 How can you make the following equation true by drawing only one straight line? 5 + 5 + 5 = 550 Lecture 4: Decision Tree Learning Outline Decision Tree for PlayTennis

More information

Statistical aspects of prediction models with high-dimensional data

Statistical aspects of prediction models with high-dimensional data Statistical aspects of prediction models with high-dimensional data Anne Laure Boulesteix Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie February 15th, 2017 Typeset by

More information

Machine Learning And Applications: Supervised Learning-SVM

Machine Learning And Applications: Supervised Learning-SVM Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine

More information

Building a Prognostic Biomarker

Building a Prognostic Biomarker Building a Prognostic Biomarker Noah Simon and Richard Simon July 2016 1 / 44 Prognostic Biomarker for a Continuous Measure On each of n patients measure y i - single continuous outcome (eg. blood pressure,

More information

Generalization of the PAC-Bayesian Theory

Generalization of the PAC-Bayesian Theory Generalization of the PACBayesian Theory and Applications to SemiSupervised Learning Pascal Germain INRIA Paris (SIERRA Team) Modal Seminar INRIA Lille January 24, 2017 Dans la vie, l essentiel est de

More information

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference CS 229 Project Report (TR# MSB2010) Submitted 12/10/2010 hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference Muhammad Shoaib Sehgal Computer Science

More information

Graph-Based Semi-Supervised Learning

Graph-Based Semi-Supervised Learning Graph-Based Semi-Supervised Learning Olivier Delalleau, Yoshua Bengio and Nicolas Le Roux Université de Montréal CIAR Workshop - April 26th, 2005 Graph-Based Semi-Supervised Learning Yoshua Bengio, Olivier

More information

Maximum Margin Interval Trees

Maximum Margin Interval Trees Maximum Margin Interval Trees Alexandre Drouin Département d informatique et de génie logiciel Université Laval, Québec, Canada alexandre.drouin.8@ulaval.ca Toby Dylan Hocking McGill Genome Center McGill

More information

Model Selection. Frank Wood. December 10, 2009

Model Selection. Frank Wood. December 10, 2009 Model Selection Frank Wood December 10, 2009 Standard Linear Regression Recipe Identify the explanatory variables Decide the functional forms in which the explanatory variables can enter the model Decide

More information

Machine Learning. Regularization and Feature Selection. Fabio Vandin November 14, 2017

Machine Learning. Regularization and Feature Selection. Fabio Vandin November 14, 2017 Machine Learning Regularization and Feature Selection Fabio Vandin November 14, 2017 1 Regularized Loss Minimization Assume h is defined by a vector w = (w 1,..., w d ) T R d (e.g., linear models) Regularization

More information

Generalization, Overfitting, and Model Selection

Generalization, Overfitting, and Model Selection Generalization, Overfitting, and Model Selection Sample Complexity Results for Supervised Classification Maria-Florina (Nina) Balcan 10/03/2016 Two Core Aspects of Machine Learning Algorithm Design. How

More information

Learning Decision Trees

Learning Decision Trees Learning Decision Trees Machine Learning Spring 2018 1 This lecture: Learning Decision Trees 1. Representation: What are decision trees? 2. Algorithm: Learning decision trees The ID3 algorithm: A greedy

More information

Lecture 2 Machine Learning Review

Lecture 2 Machine Learning Review Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things

More information

Support vector machines Lecture 4

Support vector machines Lecture 4 Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Minimax risk bounds for linear threshold functions

Minimax risk bounds for linear threshold functions CS281B/Stat241B (Spring 2008) Statistical Learning Theory Lecture: 3 Minimax risk bounds for linear threshold functions Lecturer: Peter Bartlett Scribe: Hao Zhang 1 Review We assume that there is a probability

More information

Decision trees COMS 4771

Decision trees COMS 4771 Decision trees COMS 4771 1. Prediction functions (again) Learning prediction functions IID model for supervised learning: (X 1, Y 1),..., (X n, Y n), (X, Y ) are iid random pairs (i.e., labeled examples).

More information

Computational Learning Theory. Definitions

Computational Learning Theory. Definitions Computational Learning Theory Computational learning theory is interested in theoretical analyses of the following issues. What is needed to learn effectively? Sample complexity. How many examples? Computational

More information

Supervised Machine Learning (Spring 2014) Homework 2, sample solutions

Supervised Machine Learning (Spring 2014) Homework 2, sample solutions 58669 Supervised Machine Learning (Spring 014) Homework, sample solutions Credit for the solutions goes to mainly to Panu Luosto and Joonas Paalasmaa, with some additional contributions by Jyrki Kivinen

More information

Model Accuracy Measures

Model Accuracy Measures Model Accuracy Measures Master in Bioinformatics UPF 2017-2018 Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA Barcelona, Spain Variables What we can measure (attributes) Hypotheses

More information

Microarray Data Analysis: Discovery

Microarray Data Analysis: Discovery Microarray Data Analysis: Discovery Lecture 5 Classification Classification vs. Clustering Classification: Goal: Placing objects (e.g. genes) into meaningful classes Supervised Clustering: Goal: Discover

More information

STAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă

STAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă STAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă mmp@stat.washington.edu Reading: Murphy: BIC, AIC 8.4.2 (pp 255), SRM 6.5 (pp 204) Hastie, Tibshirani

More information

Machine Learning. Lecture 9: Learning Theory. Feng Li.

Machine Learning. Lecture 9: Learning Theory. Feng Li. Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell

More information

La théorie PAC-Bayes en apprentissage supervisé

La théorie PAC-Bayes en apprentissage supervisé La théorie PAC-Bayes en apprentissage supervisé Présentation au LRI de l université Paris XI François Laviolette, Laboratoire du GRAAL, Université Laval, Québec, Canada 14 dcembre 2010 Summary Aujourd

More information

Algorithms for sparse analysis Lecture I: Background on sparse approximation

Algorithms for sparse analysis Lecture I: Background on sparse approximation Algorithms for sparse analysis Lecture I: Background on sparse approximation Anna C. Gilbert Department of Mathematics University of Michigan Tutorial on sparse approximations and algorithms Compress data

More information

Qualifying Exam in Machine Learning

Qualifying Exam in Machine Learning Qualifying Exam in Machine Learning October 20, 2009 Instructions: Answer two out of the three questions in Part 1. In addition, answer two out of three questions in two additional parts (choose two parts

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 27, 2015 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass

More information

Big Data Analytics. Special Topics for Computer Science CSE CSE Feb 24

Big Data Analytics. Special Topics for Computer Science CSE CSE Feb 24 Big Data Analytics Special Topics for Computer Science CSE 4095-001 CSE 5095-005 Feb 24 Fei Wang Associate Professor Department of Computer Science and Engineering fei_wang@uconn.edu Prediction III Goal

More information

Decision Tree Learning

Decision Tree Learning Decision Tree Learning Goals for the lecture you should understand the following concepts the decision tree representation the standard top-down approach to learning a tree Occam s razor entropy and information

More information

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017 Lecture 7: Interaction Analysis Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 39 Lecture Outline Beyond main SNP effects Introduction to Concept of Statistical Interaction

More information

Comp487/587 - Boolean Formulas

Comp487/587 - Boolean Formulas Comp487/587 - Boolean Formulas 1 Logic and SAT 1.1 What is a Boolean Formula Logic is a way through which we can analyze and reason about simple or complicated events. In particular, we are interested

More information

Decision Trees. Danushka Bollegala

Decision Trees. Danushka Bollegala Decision Trees Danushka Bollegala Rule-based Classifiers In rule-based learning, the idea is to learn a rule from train data in the form IF X THEN Y (or a combination of nested conditions) that explains

More information

Applied Machine Learning Annalisa Marsico

Applied Machine Learning Annalisa Marsico Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 22 April, SoSe 2015 Goals Feature Selection rather than Feature

More information

Evaluation. Andrea Passerini Machine Learning. Evaluation

Evaluation. Andrea Passerini Machine Learning. Evaluation Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain

More information

Final Exam. December 11 th, This exam booklet contains five problems, out of which you are expected to answer four problems of your choice.

Final Exam. December 11 th, This exam booklet contains five problems, out of which you are expected to answer four problems of your choice. CS446: Machine Learning Fall 2012 Final Exam December 11 th, 2012 This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of this exam. Note that there is

More information

Evaluation requires to define performance measures to be optimized

Evaluation requires to define performance measures to be optimized Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation

More information

High-Dimensional Statistical Learning: Introduction

High-Dimensional Statistical Learning: Introduction Classical Statistics Biological Big Data Supervised and Unsupervised Learning High-Dimensional Statistical Learning: Introduction Ali Shojaie University of Washington http://faculty.washington.edu/ashojaie/

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, etworks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Lecture 3: Decision Trees

Lecture 3: Decision Trees Lecture 3: Decision Trees Cognitive Systems - Machine Learning Part I: Basic Approaches of Concept Learning ID3, Information Gain, Overfitting, Pruning last change November 26, 2014 Ute Schmid (CogSys,

More information

Learning with Rejection

Learning with Rejection Learning with Rejection Corinna Cortes 1, Giulia DeSalvo 2, and Mehryar Mohri 2,1 1 Google Research, 111 8th Avenue, New York, NY 2 Courant Institute of Mathematical Sciences, 251 Mercer Street, New York,

More information

Empirical Risk Minimization Algorithms

Empirical Risk Minimization Algorithms Empirical Risk Minimization Algorithms Tirgul 2 Part I November 2016 Reminder Domain set, X : the set of objects that we wish to label. Label set, Y : the set of possible labels. A prediction rule, h:

More information

Scalable Bayesian Event Detection and Visualization

Scalable Bayesian Event Detection and Visualization Scalable Bayesian Event Detection and Visualization Daniel B. Neill Carnegie Mellon University H.J. Heinz III College E-mail: neill@cs.cmu.edu This work was partially supported by NSF grants IIS-0916345,

More information

Decision Tree Learning

Decision Tree Learning Decision Tree Learning Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University References: 1. Machine Learning, Chapter 3 2. Data Mining: Concepts, Models,

More information

Multimodal Deep Learning for Predicting Survival from Breast Cancer

Multimodal Deep Learning for Predicting Survival from Breast Cancer Multimodal Deep Learning for Predicting Survival from Breast Cancer Heather Couture Deep Learning Journal Club Nov. 16, 2016 Outline Background on tumor histology & genetic data Background on survival

More information

Machine Learning in the Data Revolution Era

Machine Learning in the Data Revolution Era Machine Learning in the Data Revolution Era Shai Shalev-Shwartz School of Computer Science and Engineering The Hebrew University of Jerusalem Machine Learning Seminar Series, Google & University of Waterloo,

More information

The Performance of a New Hybrid Classifier Based on Boxes and Nearest Neighbors

The Performance of a New Hybrid Classifier Based on Boxes and Nearest Neighbors The Performance of a New Hybrid Classifier Based on Boxes and Nearest Neighbors Martin Anthony Department of Mathematics London School of Economics and Political Science Houghton Street, London WC2A2AE

More information

Computational Learning Theory: Probably Approximately Correct (PAC) Learning. Machine Learning. Spring The slides are mainly from Vivek Srikumar

Computational Learning Theory: Probably Approximately Correct (PAC) Learning. Machine Learning. Spring The slides are mainly from Vivek Srikumar Computational Learning Theory: Probably Approximately Correct (PAC) Learning Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 This lecture: Computational Learning Theory The Theory

More information

Multiclass Multilabel Classification with More Classes than Examples

Multiclass Multilabel Classification with More Classes than Examples Multiclass Multilabel Classification with More Classes than Examples Ohad Shamir Weizmann Institute of Science Joint work with Ofer Dekel, MSR NIPS 2015 Extreme Classification Workshop Extreme Multiclass

More information

Statistical Machine Learning Hilary Term 2018

Statistical Machine Learning Hilary Term 2018 Statistical Machine Learning Hilary Term 2018 Pier Francesco Palamara Department of Statistics University of Oxford Slide credits and other course material can be found at: http://www.stats.ox.ac.uk/~palamara/sml18.html

More information

Support Vector Machines. Machine Learning Fall 2017

Support Vector Machines. Machine Learning Fall 2017 Support Vector Machines Machine Learning Fall 2017 1 Where are we? Learning algorithms Decision Trees Perceptron AdaBoost 2 Where are we? Learning algorithms Decision Trees Perceptron AdaBoost Produce

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Tobias Pohlen Selected Topics in Human Language Technology and Pattern Recognition February 10, 2014 Human Language Technology and Pattern Recognition Lehrstuhl für Informatik 6

More information

Large-Scale Feature Learning with Spike-and-Slab Sparse Coding

Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Ian J. Goodfellow, Aaron Courville, Yoshua Bengio ICML 2012 Presented by Xin Yuan January 17, 2013 1 Outline Contributions Spike-and-Slab

More information

CSCI-567: Machine Learning (Spring 2019)

CSCI-567: Machine Learning (Spring 2019) CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March

More information

Machine Learning (CS 567) Lecture 2

Machine Learning (CS 567) Lecture 2 Machine Learning (CS 567) Lecture 2 Time: T-Th 5:00pm - 6:20pm Location: GFS118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol

More information

GWAS IV: Bayesian linear (variance component) models

GWAS IV: Bayesian linear (variance component) models GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian

More information

Today. Calculus. Linear Regression. Lagrange Multipliers

Today. Calculus. Linear Regression. Lagrange Multipliers Today Calculus Lagrange Multipliers Linear Regression 1 Optimization with constraints What if I want to constrain the parameters of the model. The mean is less than 10 Find the best likelihood, subject

More information

Plan. Lecture: What is Chemoinformatics and Drug Design? Description of Support Vector Machine (SVM) and its used in Chemoinformatics.

Plan. Lecture: What is Chemoinformatics and Drug Design? Description of Support Vector Machine (SVM) and its used in Chemoinformatics. Plan Lecture: What is Chemoinformatics and Drug Design? Description of Support Vector Machine (SVM) and its used in Chemoinformatics. Exercise: Example and exercise with herg potassium channel: Use of

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 18, 2016 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass

More information

Learning Decision Trees

Learning Decision Trees Learning Decision Trees Machine Learning Fall 2018 Some slides from Tom Mitchell, Dan Roth and others 1 Key issues in machine learning Modeling How to formulate your problem as a machine learning problem?

More information

CS229 Supplemental Lecture notes

CS229 Supplemental Lecture notes CS229 Supplemental Lecture notes John Duchi 1 Boosting We have seen so far how to solve classification (and other) problems when we have a data representation already chosen. We now talk about a procedure,

More information

Model Averaging With Holdout Estimation of the Posterior Distribution

Model Averaging With Holdout Estimation of the Posterior Distribution Model Averaging With Holdout stimation of the Posterior Distribution Alexandre Lacoste alexandre.lacoste.1@ulaval.ca François Laviolette francois.laviolette@ift.ulaval.ca Mario Marchand mario.marchand@ift.ulaval.ca

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Lecture 8. Instructor: Haipeng Luo

Lecture 8. Instructor: Haipeng Luo Lecture 8 Instructor: Haipeng Luo Boosting and AdaBoost In this lecture we discuss the connection between boosting and online learning. Boosting is not only one of the most fundamental theories in machine

More information

Minimization of Boolean Expressions Using Matrix Algebra

Minimization of Boolean Expressions Using Matrix Algebra Minimization of Boolean Expressions Using Matrix Algebra Holger Schwender Collaborative Research Center SFB 475 University of Dortmund holger.schwender@udo.edu Abstract The more variables a logic expression

More information

Variations sur la borne PAC-bayésienne

Variations sur la borne PAC-bayésienne Variations sur la borne PAC-bayésienne Pascal Germain INRIA Paris Équipe SIRRA Séminaires du département d informatique et de génie logiciel Université Laval 11 juillet 2016 Pascal Germain INRIA/SIRRA

More information

Estimating the accuracy of a hypothesis Setting. Assume a binary classification setting

Estimating the accuracy of a hypothesis Setting. Assume a binary classification setting Estimating the accuracy of a hypothesis Setting Assume a binary classification setting Assume input/output pairs (x, y) are sampled from an unknown probability distribution D = p(x, y) Train a binary classifier

More information

Bits of Machine Learning Part 1: Supervised Learning

Bits of Machine Learning Part 1: Supervised Learning Bits of Machine Learning Part 1: Supervised Learning Alexandre Proutiere and Vahan Petrosyan KTH (The Royal Institute of Technology) Outline of the Course 1. Supervised Learning Regression and Classification

More information

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run

More information

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH Hoang Trang 1, Tran Hoang Loc 1 1 Ho Chi Minh City University of Technology-VNU HCM, Ho Chi

More information

ESL Chap3. Some extensions of lasso

ESL Chap3. Some extensions of lasso ESL Chap3 Some extensions of lasso 1 Outline Consistency of lasso for model selection Adaptive lasso Elastic net Group lasso 2 Consistency of lasso for model selection A number of authors have studied

More information

Machine Learning. Linear Models. Fabio Vandin October 10, 2017

Machine Learning. Linear Models. Fabio Vandin October 10, 2017 Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w

More information

Learning from Examples

Learning from Examples Learning from Examples Data fitting Decision trees Cross validation Computational learning theory Linear classifiers Neural networks Nonparametric methods: nearest neighbor Support vector machines Ensemble

More information

Compressed Sensing in Cancer Biology? (A Work in Progress)

Compressed Sensing in Cancer Biology? (A Work in Progress) Compressed Sensing in Cancer Biology? (A Work in Progress) M. Vidyasagar FRS Cecil & Ida Green Chair The University of Texas at Dallas M.Vidyasagar@utdallas.edu www.utdallas.edu/ m.vidyasagar University

More information

Linear discriminant functions

Linear discriminant functions Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative

More information

CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition

CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition Ad Feelders Universiteit Utrecht Department of Information and Computing Sciences Algorithmic Data

More information

Machine Learning. Regularization and Feature Selection. Fabio Vandin November 13, 2017

Machine Learning. Regularization and Feature Selection. Fabio Vandin November 13, 2017 Machine Learning Regularization and Feature Selection Fabio Vandin November 13, 2017 1 Learning Model A: learning algorithm for a machine learning task S: m i.i.d. pairs z i = (x i, y i ), i = 1,..., m,

More information

Introduction to Machine Learning

Introduction to Machine Learning Outline Contents Introduction to Machine Learning Concept Learning Varun Chandola February 2, 2018 1 Concept Learning 1 1.1 Example Finding Malignant Tumors............. 2 1.2 Notation..............................

More information

Lecture Support Vector Machine (SVM) Classifiers

Lecture Support Vector Machine (SVM) Classifiers Introduction to Machine Learning Lecturer: Amir Globerson Lecture 6 Fall Semester Scribe: Yishay Mansour 6.1 Support Vector Machine (SVM) Classifiers Classification is one of the most important tasks in

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Lecture 06 - Regression & Decision Trees Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 14, 2015 Today: The Big Picture Overfitting Review: probability Readings: Decision trees, overfiting

More information