Machine Learning in Action
1 Machine Learning in Action. Tatyana Goldberg, August 16
2 Machine Learning in Biology
Beijing Genomics Institute in Shenzhen, China (June 2014). Database sizes: GenBank, 173,353,076 DNA sequences; UniProt, 69,560,473 protein sequences; GOLD, 49,092 genomes in total. Biological data is growing much faster than our knowledge of biological processes!
3 Naive Bayes
Car theft example: what are the chances your car gets stolen?

    Color   Brand   Age   Stolen?
    Black   BMW     New   Yes
    Black   BMW     Old   Yes
    Green   BMW     New   Yes
    Green   BMW     Old   No
    Black   Lada    Old   No
    Black   Lada    Old   No
    Green   Lada    New   Yes
    Green   Lada    New   No
    Green   Lada    Old   No
    Green   Lada    New   Yes

Sample vectors x = (x_1, ..., x_n) ∈ X; class labels y ∈ Y.
4-5 Naive Bayes
Sample vectors x = (x_1, ..., x_n) ∈ X; class labels y ∈ Y. Bayes' rule (Thomas Bayes):

    p(y|x) = p(x|y) · p(y) / p(x)

where p(y|x) is the posterior, p(x|y) the likelihood, p(y) the prior, and p(x) the evidence. The "naive" assumption is that the features are conditionally independent given the class:

    p(x|y) = ∏_{i=1..n} p(x_i|y)

The Naive Bayes classifier adds a decision rule:

    y_max = argmax_{y ∈ Y} p(x|y) · p(y)
6-12 Naive Bayes: worked example
Using the car-theft table, both classes are equally frequent: p(Yes) = 0.5 and p(No) = 0.5. The per-feature likelihoods are:

    p(Black|Yes) = 2/5    p(Black|No) = 2/5
    p(Lada|Yes)  = 2/5    p(Lada|No)  = 4/5
    p(New|Yes)   = 4/5    p(New|No)   = 1/5

Query (Black, Lada, New), writing z for the evidence p(x):

    p(Yes|Black, Lada, New) = 0.5 · 2/5 · 2/5 · 4/5 / z = 0.064/z
    p(No|Black, Lada, New)  = 0.5 · 2/5 · 4/5 · 1/5 / z = 0.032/z  →  predict Yes

Query (Green, BMW, New):

    p(Yes|Green, BMW, New) = 0.5 · 3/5 · 3/5 · 4/5 / z = 0.144/z
    p(No|Green, BMW, New)  = 0.5 · 3/5 · 1/5 · 1/5 / z = 0.012/z  →  predict Yes

Query (Green, Lada, ?) with the Age attribute missing; simply drop the missing feature:

    p(Yes|Green, Lada) = 0.5 · 3/5 · 2/5 / z = 0.12/z
    p(No|Green, Lada)  = 0.5 · 3/5 · 4/5 / z = 0.24/z  →  predict No
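The arithmetic above can be checked with a short script. This is an illustrative sketch in plain Python, not the lecture's own code; the function name and structure are choices made here:

```python
from collections import Counter

# Training data from the car-theft example: (color, brand, age, stolen?)
data = [
    ("Black", "BMW",  "New", "Yes"), ("Black", "BMW",  "Old", "Yes"),
    ("Green", "BMW",  "New", "Yes"), ("Green", "BMW",  "Old", "No"),
    ("Black", "Lada", "Old", "No"),  ("Black", "Lada", "Old", "No"),
    ("Green", "Lada", "New", "Yes"), ("Green", "Lada", "New", "No"),
    ("Green", "Lada", "Old", "No"),  ("Green", "Lada", "New", "Yes"),
]

def naive_bayes_scores(query):
    """Unnormalized p(x|y)·p(y) per class; the common 1/z factor is omitted."""
    class_counts = Counter(row[-1] for row in data)
    scores = {}
    for y, n_y in class_counts.items():
        rows_y = [row for row in data if row[-1] == y]
        score = n_y / len(data)                      # prior p(y)
        for i, value in enumerate(query):
            count = sum(1 for row in rows_y if row[i] == value)
            score *= count / n_y                     # likelihood p(x_i|y)
        scores[y] = score
    return scores

scores = naive_bayes_scores(("Black", "Lada", "New"))
# scores["Yes"] = 0.5 * 2/5 * 2/5 * 4/5 = 0.064
# scores["No"]  = 0.5 * 2/5 * 4/5 * 1/5 = 0.032, so the prediction is "Yes"
```

The decision rule then simply picks the class with the larger score.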
13-16 Support Vector Machine
(Vladimir Vapnik. Figure: two classes of points in the Price/Age plane, separated by a hyperplane.)
- Linear separation by building a hyperplane
- The hyperplane with the maximum margin is the best
Cortes C. and Vapnik V., Machine Learning (1995)
17-21 Support Vector Machine
Sample vectors x_i = (x_1, ..., x_n)_i, i ∈ {1, ..., N}; class labels y_i ∈ {−1, +1}. The separating hyperplane is w^T x_i + b = 0, with

    w^T x_i + b ≥ +1 for y_i = +1
    w^T x_i + b ≤ −1 for y_i = −1

These can be combined into:

    y_i (w^T x_i + b) ≥ 1 for all i

For points that violate the margin (0 < ξ < 1: inside the margin but correctly classified; ξ > 1: misclassified), slack variables ξ_i relax the constraint:

    y_i (w^T x_i + b) ≥ 1 − ξ_i for all i
23-24 Support Vector Machine
Solving the optimization problem yields one multiplier α_i per sample; the support vectors are exactly the x_i with α_i > 0 (they lie on or inside the margin). The decision function uses only them:

    f(x) = sgn( Σ_{i: α_i > 0} α_i y_i x_i^T x + b )
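Evaluating the decision function takes only a few lines. In this sketch the support vectors, multipliers α_i, and bias b are made-up toy values chosen so the boundary is x1 + x2 = 3; they are not the result of an actual optimization:

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def svm_decide(x, support_vectors, alphas, labels, b):
    """Sign of the SVM decision function f(x) = sgn(sum_i a_i y_i <x_i, x> + b)."""
    s = sum(a * y * dot(xi, x)
            for a, y, xi in zip(alphas, labels, support_vectors))
    return 1 if s + b >= 0 else -1

# Toy 2D problem: two support vectors on opposite sides of the line x1 + x2 = 3.
support_vectors = [(1.0, 1.0), (2.0, 2.0)]
labels = [-1, +1]
alphas = [1.0, 1.0]   # assumed values, not optimized
b = -3.0              # chosen so that f(x) = (x1 + x2) - 3

print(svm_decide((3.0, 3.0), support_vectors, alphas, labels, b))  # prints 1
print(svm_decide((0.0, 0.0), support_vectors, alphas, labels, b))  # prints -1
```

Note that points far from the hyperplane never enter the sum: only samples with α_i > 0 contribute, which is why only the support vectors must be stored.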
25 Localization Predictions Indispensable
High-throughput experimental methods cost money, are not accurate, and are not complete; hence computational prediction. (Figure: intracellular mitochondrial network, microtubules, nuclei.) A predictor takes a protein sequence, e.g. MTSHSYYKDRLGFDPNEQQPGSNNSMKRSSSRQTTHHHQSYHHATTSSSQSPARISVSPGGNNGTLEYQQVQRENNW, and predicts the protein's function/localization.
26 The Signal Hypothesis
Günter Blobel received the 1999 Nobel Prize in Physiology or Medicine "for the discovery that proteins have intrinsic signals that govern their transport and localization in the cell".
27 Signal Peptide Prediction (Eukaryotes)
(Figure created using the SignalP 4.0 data set.) SignalP: prediction of signal peptides and their cleavage sites. Henrik Nielsen, Gunnar von Heijne, Søren Brunak.
28 Data Preprocessing
N-terminal sequences (first 40 residues) of positive (+1) and negative (−1) samples:

    MKSLKSSTHDVPHPEHVVWAPPAYDEQHHLFFSHGTVLIG  +1
    MAQSVTAFQAAYISIEVLIALVSVPGNILVIWAVKMNQAL  +1
    MGRKSLYLLIVGILIAYYIYTPLPDNVEEPWRMMWINAHL  +1
    MDNDGGAPPPPPTLVVEEPKKAEIRGVAFKELFRFADGLD  +1
    MGFEPLDWYCKPVPNGVWTKTVDYAFGAYTPCAIDSFVLG  +1
    MARSSLFTFLCLAVFINGCLSQIEQQSPWEFQGSEVWQQH  -1
    MAVMAPRTLVLLLSGALALTQTWAGSHSMRYFYTSVSRPG  -1
    MVEMLPTVAVLVLAVSVVAKDNTTCDGPCGLRFRQNSQAG  -1
    MNSNLPAENLTIAVNMTKTLPTAVTHGFNSTNDPPSMSIT  -1
    MALHMILVMVSLLPLLEAQNPEHVNITIGDPITNETLSWL  -1
    MANKLFLVSATLAFFFLLTNASIYRTIVEVDEDDATNPAG  -1

Convert sequences into feature vectors, e.g. amino acid composition: the sequence MPPPAD becomes the 20-dimensional count vector [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 3, 0, 0, 0, 0, 0].
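The amino-acid-composition encoding can be sketched as follows. The residue ordering used here is the standard alphabetical one-letter order, which is an assumption; the index order on the slide may differ:

```python
# Assumed ordering: the 20 standard amino acids in alphabetical
# one-letter code (the slide may use a different index order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition_vector(sequence):
    """20-dimensional vector of per-residue counts, in AMINO_ACIDS order."""
    return [sequence.count(aa) for aa in AMINO_ACIDS]

vec = composition_vector("MPPPAD")
# MPPPAD contains 1 A, 1 D, 1 M, and 3 P; all other counts are 0
```

In practice the counts are often divided by the sequence length so that vectors from sequences of different lengths are comparable.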
29 Visualize Your Data
2D scatterplots for all attribute pairs (Attr. 1 to Attr. 4 plus the class); colors are the labels (+1/−1). Some clusters are easily separable (class vs. any attribute, Attr. 1 vs. Attr. 3), whereas the clusters of Attr. 3 vs. Attr. 4 are much harder to separate.
30-31 Performance: Overfitting
(Theory: PredictProtein lecture, Prof. Rost.) Overfitting means the predictor fits the training data too well:
- Many more free parameters than samples
- Poor prediction on new data sets
Rule of thumb: number of samples > 10 × number of free parameters. Check for overfitting using cross-validation: split the data set into Train and Test parts.
32 Stratified k-fold Cross-validation
Train on k−1 folds, test on the held-out fold, and average the performance.
- k-fold: randomly partition the data into k equally sized subsets; use each subset for testing exactly once and the remaining k−1 subsets for training.
- Stratified: each fold has the same proportion of each class value.
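The stratification idea can be sketched in a few lines; this minimal version deals the members of each class round-robin across the folds (a real implementation would typically shuffle the data first):

```python
# Sketch: stratified k-fold splitting. Each fold keeps roughly the
# same class proportions as the full data set.
def stratified_folds(labels, k):
    """Partition sample indices into k folds with balanced class ratios."""
    folds = [[] for _ in range(k)]
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        for j, i in enumerate(idx):
            folds[j % k].append(i)   # deal this class round-robin
    return folds

labels = [+1] * 6 + [-1] * 6
folds = stratified_folds(labels, 3)
# each of the 3 folds gets 4 samples: 2 positive and 2 negative
```

Each fold then serves once as the test set while the union of the other k−1 folds is used for training.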
33 Performance Metric: AUC
Confusion matrix (rows: observed, columns: predicted):

                  predicted +1   predicted -1
    observed +1       TP             FN
    observed -1       FP             TN

    Sensitivity = TP / (TP + FN)
    Specificity = TN / (TN + FP)

ROC (Receiver Operating Characteristic): plot sensitivity against (1 − specificity) while sweeping the decision threshold over the classifier's confidence values. AUC (Area under the ROC curve) is threshold independent, which makes it a better summary than accuracy, coverage, F1, etc. A perfect classifier has AUC = 1; a random one has AUC = 0.5.
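AUC also has a direct probabilistic reading: it is the probability that a randomly chosen positive sample gets a higher confidence score than a randomly chosen negative one. A sketch of that computation, with made-up scores rather than lecture data:

```python
# Sketch: AUC as the fraction of (positive, negative) score pairs
# that are ranked correctly; ties count one half.
def roc_auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
# 8 of the 9 positive/negative pairs are ranked correctly: AUC = 8/9
```

Because only the ranking of the scores matters, no single threshold has to be chosen; that is exactly the threshold independence noted above.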
34 WEKA
What is Weka? A weka is a bird found only in New Zealand; here it stands for the Waikato Environment for Knowledge Analysis, a machine learning workbench for data mining tasks:
- 100+ algorithms for classification
- 75+ for data pre-processing and analysis
- 20+ for clustering, finding association rules, etc.
- 25 to assist with feature selection
Java-based and supports multiple platforms. Ian H. Witten, Eibe Frank.
35 WEKA in Action: Live Demo Presentation
36 ML Workflow
1. Select a subset of the data; plot it to spot abnormalities.
2. Invent features.
3. Run classifier(s) on the subset, then on the remaining data.
4. Get the performance; compare to random and to competitors.
5. Happy? If not, go back and iterate; if yes, publish the predictor.
37 Goldberg T, Nielsen H, Rost B et al. (2014). LocTree3 prediction of localization. Nucleic Acids Research
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationA Bayesian Approach to Concept Drift
A Bayesian Approach to Concept Drift Stephen H. Bach Marcus A. Maloof Department of Computer Science Georgetown University Washington, DC 20007, USA {bach, maloof}@cs.georgetown.edu Abstract To cope with
More informationBe able to define the following terms and answer basic questions about them:
CS440/ECE448 Section Q Fall 2017 Final Review Be able to define the following terms and answer basic questions about them: Probability o Random variables, axioms of probability o Joint, marginal, conditional
More informationMachine Learning! in just a few minutes. Jan Peters Gerhard Neumann
Machine Learning! in just a few minutes Jan Peters Gerhard Neumann 1 Purpose of this Lecture Foundations of machine learning tools for robotics We focus on regression methods and general principles Often
More informationIs cross-validation valid for small-sample microarray classification?
Is cross-validation valid for small-sample microarray classification? Braga-Neto et al. Bioinformatics 2004 Topics in Bioinformatics Presented by Lei Xu October 26, 2004 1 Review 1) Statistical framework
More informationIntroduction to Machine Learning
Introduction to Machine Learning Vapnik Chervonenkis Theory Barnabás Póczos Empirical Risk and True Risk 2 Empirical Risk Shorthand: True risk of f (deterministic): Bayes risk: Let us use the empirical
More informationModern Methods of Statistical Learning sf2935 Lecture 5: Logistic Regression T.K
Lecture 5: Logistic Regression T.K. 10.11.2016 Overview of the Lecture Your Learning Outcomes Discriminative v.s. Generative Odds, Odds Ratio, Logit function, Logistic function Logistic regression definition
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationComments. x > w = w > x. Clarification: this course is about getting you to be able to think as a machine learning expert
Logistic regression Comments Mini-review and feedback These are equivalent: x > w = w > x Clarification: this course is about getting you to be able to think as a machine learning expert There has to be
More informationday month year documentname/initials 1
ECE471-571 Pattern Recognition Lecture 13 Decision Tree Hairong Qi, Gonzalez Family Professor Electrical Engineering and Computer Science University of Tennessee, Knoxville http://www.eecs.utk.edu/faculty/qi
More informationIntro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation
Lecture 15. Pattern Classification (I): Statistical Formulation Outline Statistical Pattern Recognition Maximum Posterior Probability (MAP) Classifier Maximum Likelihood (ML) Classifier K-Nearest Neighbor
More informationMachine Learning: A Statistics and Optimization Perspective
Machine Learning: A Statistics and Optimization Perspective Nan Ye Mathematical Sciences School Queensland University of Technology 1 / 109 What is Machine Learning? 2 / 109 Machine Learning Machine learning
More informationData Mining Part 4. Prediction
Data Mining Part 4. Prediction 4.3. Fall 2009 Instructor: Dr. Masoud Yaghini Outline Introduction Bayes Theorem Naïve References Introduction Bayesian classifiers A statistical classifiers Introduction
More informationProbabilistic modeling. The slides are closely adapted from Subhransu Maji s slides
Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationBayesian Learning. CSL603 - Fall 2017 Narayanan C Krishnan
Bayesian Learning CSL603 - Fall 2017 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Bayes Theorem MAP Learners Bayes optimal classifier Naïve Bayes classifier Example text classification Bayesian networks
More informationEvaluating Classifiers. Lecture 2 Instructor: Max Welling
Evaluating Classifiers Lecture 2 Instructor: Max Welling Evaluation of Results How do you report classification error? How certain are you about the error you claim? How do you compare two algorithms?
More information