Machine Learning in Action


1 Machine Learning in Action. Tatyana Goldberg, August 16.

2 Machine Learning in Biology. Beijing Genomics Institute in Shenzhen, China (June 2014). GenBank: 173,353,076 DNA sequences; UniProt: 69,560,473 protein sequences; GOLD: 49,092 genomes in total. Biological data is growing much faster than our knowledge of biological processes!

3 Naive Bayes. Car theft example: what are the chances your car gets stolen?

    Color   Brand   Age   Stolen?
    Black   BMW     New   Yes
    Black   BMW     Old   Yes
    Green   BMW     New   Yes
    Green   BMW     Old   No
    Black   Lada    Old   No
    Black   Lada    Old   No
    Green   Lada    New   Yes
    Green   Lada    New   No
    Green   Lada    Old   No
    Green   Lada    New   Yes

Sample vectors x = (x_1, ..., x_n) ∈ X; class labels y ∈ Y.

4-5 Naive Bayes. Sample vectors x = (x_1, ..., x_n) ∈ X; class labels y ∈ Y. (Thomas Bayes)

Bayes' rule: p(y|x) = p(x|y) p(y) / p(x), i.e. posterior = likelihood × prior / evidence.

The "naive" assumption is that the features are conditionally independent given the class:
p(x|y) = ∏_{i=1}^{n} p(x_i|y)

The Naive Bayes classifier adds a decision rule:
y_max = arg max_{y ∈ Y} p(x|y) p(y)

6-12 Naive Bayes: the car theft example worked through with p(y|x) = p(x|y) p(y) / p(x). From the table above:

Priors: p(Yes) = 0.5; p(No) = 0.5

Likelihoods:
p(Black|Yes) = 2/5;  p(Black|No) = 2/5
p(Lada|Yes) = 2/5;   p(Lada|No) = 4/5
p(New|Yes) = 4/5;    p(New|No) = 1/5

Queries (z = p(x) is the evidence; it is identical for both classes, so it never has to be computed):
Black Lada New?  p(Yes|Black, Lada, New) = (0.5 · 2/5 · 2/5 · 4/5)/z = 0.064/z vs. p(No|Black, Lada, New) = (0.5 · 2/5 · 4/5 · 1/5)/z = 0.032/z → predict Yes
Green BMW New?  p(Yes|Green, BMW, New) = 0.14/z vs. p(No|Green, BMW, New) = 0.005/z → predict Yes
Green Lada (Age unknown)?  p(Yes|Green, Lada) = 0.12/z vs. p(No|Green, Lada) = 0.24/z → predict No
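
These numbers can be reproduced with a short Python sketch (my illustration, not code from the lecture). It estimates the priors and per-feature likelihoods by counting and, like the slides, uses no smoothing, so an unseen feature value gets probability 0:

```python
from collections import Counter, defaultdict
from math import prod

def train_naive_bayes(samples, labels):
    """Estimate the priors p(y) and the likelihoods p(x_i|y) by counting."""
    priors = {y: c / len(labels) for y, c in Counter(labels).items()}
    likelihoods = {}
    for y in priors:
        rows = [x for x, lab in zip(samples, labels) if lab == y]
        table = defaultdict(float)  # unseen (feature, value) pairs get probability 0 (no smoothing)
        for i in range(len(samples[0])):
            for value, c in Counter(row[i] for row in rows).items():
                table[(i, value)] = c / len(rows)
        likelihoods[y] = table
    return priors, likelihoods

def classify(x, priors, likelihoods):
    """Return arg max_y p(x|y) p(y); the evidence p(x) cancels out."""
    scores = {y: priors[y] * prod(likelihoods[y][(i, v)] for i, v in enumerate(x))
              for y in priors}
    return max(scores, key=scores.get), scores

# The car theft table from the slides.
cars = [("Black", "BMW", "New"), ("Black", "BMW", "Old"), ("Green", "BMW", "New"),
        ("Green", "BMW", "Old"), ("Black", "Lada", "Old"), ("Black", "Lada", "Old"),
        ("Green", "Lada", "New"), ("Green", "Lada", "New"), ("Green", "Lada", "Old"),
        ("Green", "Lada", "New")]
stolen = ["Yes", "Yes", "Yes", "No", "No", "No", "Yes", "No", "No", "Yes"]

priors, likelihoods = train_naive_bayes(cars, stolen)
print(classify(("Black", "Lada", "New"), priors, likelihoods))
# -> ('Yes', {'Yes': ≈0.064, 'No': ≈0.032}), matching the slide
```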

13-16 Support Vector Machine. (Vladimir Vapnik. Figure: price vs. age scatter with two classes and candidate separating lines.) Linear separation by building a hyperplane; the hyperplane with the maximum margin is the best. Cortes C. and Vapnik V., Machine Learning (1995).

17-24 Support Vector Machine. (Figure: price vs. age scatter with the hyperplane, its normal vector w, offset b, the margin boundaries y = +1 and y = -1, and the support vectors marked.)

Sample vectors x_i = (x_1, ..., x_n)_i, i ∈ {1, ..., N}; class labels y_i ∈ {-1, +1}.
Separating hyperplane: w^T x_i + b = 0, with
w^T x_i + b ≥ +1 for y_i = +1
w^T x_i + b ≤ -1 for y_i = -1
These can be combined into one constraint: y_i (w^T x_i + b) ≥ 1 for all i.
For samples that violate the margin, slack variables ξ_i relax this to y_i (w^T x_i + b) ≥ 1 - ξ_i for all i (ξ < 1: inside the margin but correctly classified; ξ > 1: misclassified).
The optimization yields the decision function f(x) = sgn( Σ_{i: α_i > 0} α_i y_i x_i^T x + b ), where the support vectors are the samples x_i with α_i > 0.
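
For illustration (not part of the lecture), a minimal scikit-learn sketch of the same soft-margin formulation on invented price/age data; SVC with a linear kernel solves the optimization above and exposes w, b, and the support vectors:

```python
import numpy as np
from sklearn.svm import SVC

# Invented toy data: columns are (age, price); labels are +1 / -1.
X = np.array([[1, 90], [2, 80], [3, 85], [8, 30], [9, 25], [10, 35]], dtype=float)
y = np.array([+1, +1, +1, -1, -1, -1])

clf = SVC(kernel="linear", C=1.0)  # C penalizes the slack variables xi_i
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]           # hyperplane w^T x + b = 0
print("support vectors:", clf.support_vectors_)  # the x_i with alpha_i > 0
print("prediction:", clf.predict([[5, 60]]))     # sign of the decision function
```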

25 Localization Predictions Are Indispensable. High-throughput methods cost money and are neither accurate nor complete; hence, computational prediction. (Figure: fluorescence image of the intracellular mitochondrial network, microtubules, and nuclei; a sequence such as MTSHSYYKDRLGFDPNEQQPGSNNSMKRSSSRQTTHHHQSYHHATTSSSQSPARISVSPGGNNGTLEYQQVQRENNW goes into a predictor, which outputs the protein's function.)

26 The Signal Hypothesis. Günter Blobel received the 1999 Nobel Prize in Physiology or Medicine "for the discovery that proteins have intrinsic signals that govern their transport and localization in the cell".

27 Signal Peptide Prediction in Eukaryotes. (Figure: sequence logo created using the SignalP 4.0 data set.) SignalP: prediction of signal peptides and their cleavage sites. Henrik Nielsen, Gunnar von Heijne, Søren Brunak.

28 Data Preprocessing. N-terminal sequences (40 residues) of positive and negative samples:

MKSLKSSTHDVPHPEHVVWAPPAYDEQHHLFFSHGTVLIG +1
MAQSVTAFQAAYISIEVLIALVSVPGNILVIWAVKMNQAL +1
MGRKSLYLLIVGILIAYYIYTPLPDNVEEPWRMMWINAHL +1
MDNDGGAPPPPPTLVVEEPKKAEIRGVAFKELFRFADGLD +1
MGFEPLDWYCKPVPNGVWTKTVDYAFGAYTPCAIDSFVLG +1
MARSSLFTFLCLAVFINGCLSQIEQQSPWEFQGSEVWQQH -1
MAVMAPRTLVLLLSGALALTQTWAGSHSMRYFYTSVSRPG -1
MVEMLPTVAVLVLAVSVVAKDNTTCDGPCGLRFRQNSQAG -1
MNSNLPAENLTIAVNMTKTLPTAVTHGFNSTNDPPSMSIT -1
MALHMILVMVSLLPLLEAQNPEHVNITIGDPITNETLSWL -1
MANKLFLVSATLAFFFLLTNASIYRTIVEVDEDDATNPAG -1

Convert the sequences into feature vectors, e.g. the amino acid composition:
Sequence MPPPAD → 20-dimensional feature vector [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 3, 0, 0, 0, 0, 0]
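
The featurization step admits a very small sketch (assumed code, not from the lecture; the ordering of the 20 positions is a convention, and the slide evidently uses a different ordering than the alphabetical one below):

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one fixed ordering of the 20 amino acids

def composition(sequence):
    """Map a sequence to its 20-dimensional amino acid count vector."""
    return [sequence.count(aa) for aa in AMINO_ACIDS]

print(composition("MPPPAD"))  # 1 for A, D, and M, 3 for P, 0 elsewhere
```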

29 Visualize Your Data. (Figure: 2D scatterplots for all attribute pairs, Attr. 1 through Attr. 4 plus Class; colors are the labels, +1/-1.) Some clusters are easily separable (Class against any attribute, Attr. 1 against Attr. 3), whereas the clusters of Attr. 3 against Attr. 4 are much harder to separate. A scatterplot matrix can be produced as shown below.
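
One way to get such a plot is pandas' scatter_matrix; the sketch below uses made-up stand-in data, since the lecture's data set is not included here:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))               # stand-in for four attributes
y = np.where(X[:, 0] + X[:, 2] > 0, 1, -1)  # stand-in labels

df = pd.DataFrame(X, columns=["Attr. 1", "Attr. 2", "Attr. 3", "Attr. 4"])
colors = np.where(y == 1, "tab:blue", "tab:red")   # color points by class label
scatter_matrix(df, c=colors, diagonal="hist", figsize=(7, 7))
plt.show()
```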

30-31 Performance: Overfitting (theory). From the PredictProtein lecture by Prof. Rost. (Figure: prediction performance as a function of the number of free parameters, on the training data and on a new data set.) Overfitting: the predictor fits the training data too well, typically because there are many more free parameters than samples, and then predicts poorly on new data. Rule of thumb: have more than 10 samples per free parameter. Check for overfitting using cross-validation: split the data set into Train and Test.

32 Stratified k-fold Cross-validation. (Figure: Data split into Train (k-1 folds) and Test (1 fold), yielding a performance estimate.) k-fold: a random partitioning of the data into k equally sized subsets; each subset is used for testing exactly once, with the remaining k-1 subsets used for training. Stratified: each fold has the same proportion of each class value.
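
A hedged scikit-learn sketch of the procedure (stand-in data; the estimator and fold count are illustrative choices):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))    # stand-in for 100 feature vectors
y = np.where(X[:, 0] > 0, 1, -1)  # stand-in labels

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # class proportions preserved per fold
scores = cross_val_score(GaussianNB(), X, y, cv=cv)             # each sample is tested exactly once
print(scores.mean(), scores.std())
```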

33 Performance Metric: AUC.

Confusion matrix (observed vs. predicted):
               predicted +1   predicted -1
observed +1    TP             FN
observed -1    FP             TN

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)

ROC: Receiver Operating Characteristic, the curve of Sensitivity against 1 - Specificity obtained by sliding the decision threshold over the classifier's confidence values. AUC: Area Under the ROC Curve. It is threshold independent and better than accuracy, coverage, F1, etc. (Figure: a perfect classifier hugs the top-left corner, a random one follows the diagonal, a bad one falls below it.)
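
Both the ROC curve and the AUC are one call each in scikit-learn; the labels and confidence scores below are invented for illustration:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# y_true: observed labels (+1/-1); y_score: the classifier's confidence for +1.
y_true  = [+1, +1, +1, -1, -1, -1, +1, -1]
y_score = [0.9, 0.8, 0.35, 0.4, 0.1, 0.2, 0.7, 0.6]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # (1 - specificity, sensitivity) per threshold
print("AUC =", roc_auc_score(y_true, y_score))     # 1.0 perfect, 0.5 random; 0.875 for this toy data
```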

34 WEKA. What is Weka? Weka is a bird found only in New Zealand, and the Waikato Environment for Knowledge Analysis: a machine learning workbench for data mining tasks. 100+ algorithms for classification; 75+ for data pre-processing and analysis; 20+ for clustering, finding association rules, etc.; 25 to assist with feature selection. Java-based and supports multiple platforms. Ian H. Witten, Eibe Frank.

35 WEKA in Action: Live Demo Presentation.

36 ML Workflow. (Flowchart:) Data → select a subset of the data → plot the data, check for abnormalities → invent features → run the classifier → get performance. Happy? If not, iterate. If yes: take the remaining data, run the classifier(s), and compare to random and to competitors. Happy? If not, iterate; if yes, publish the predictor.

37 Goldberg T, Nielsen H, Rost B, et al. (2014). LocTree3 prediction of localization. Nucleic Acids Research.
