Modeling with Rules. Cynthia Rudin, Assistant Professor of Statistics, Massachusetts Institute of Technology
1 Modeling with Rules. Cynthia Rudin, Assistant Professor of Statistics, Massachusetts Institute of Technology. Joint work with: David Madigan (Columbia), Allison Chang and Ben Letham (MIT PhD students), Dimitris Bertsimas (MIT), Tyler McCormick (UW), Gene Kogan (Independent)
2 Would like predictive models that are both accurate and interpretable. Accuracy = classification accuracy. Interpretability = ?
3 Would like predictive models that are both accurate and interpretable. Accuracy = classification accuracy. Interpretability = concise (the model is small) and convincing (there are reasons behind each prediction)
4 Modeling with Rules: Decision List. "Traffic jam in Boston?" A decision list with rules on fenway_park=1 (97/100 times), rush_hour, rain=0 & construction, Friday=1 (3/3 times), rain, and an "otherwise" default, each annotated with its empirical accuracy.
5 Modeling with Rules: a dichotomy in the state of the art, accuracy vs. interpretability. Decision Trees (interpretable) vs. Support Vector Machines and Boosted Decision Trees (accurate).
6 Modeling with Rules: Daydreaming. It would be nice if the whole algorithm were interpretable, OR: we want the accuracy of SVM/Boosted DT and the interpretability of Decision Trees.
7 Outline. Part 1: Humans can interpret the predictions, and understand the full algorithm. Sequential Event Prediction with Association Rules (R, Letham, Aouissi, Kogan, Madigan), COLT 2011. Part 2: Bayesian hierarchical modeling with rules. A Hierarchical Model for Association Rule Mining of Sequential Events: An Approach to Automated Medical Symptom Prediction (McCormick, R, Madigan), Annals of Applied Statistics, forthcoming 2012. Part 3: Accurate rule classifiers using MIO. Ordered Rules for Classification: A Discrete Optimization Approach to Associative Classification (Bertsimas, Chang, R), in progress.
8 Association Rule Mining: (Agrawal, Imielinski, Swami, 1993) & (Agrawal and Srikant, 1994)
9 Example rule: Construction=1 & Rain=1 → Traffic=1
10 15 times we saw construction and rain, and 13 out of 15 of those times we also saw traffic. Supp(construction=1 & rain=1) = 15. Supp(traffic=1 & construction=1 & rain=1) = 13. Conf(construction=1 & rain=1 → traffic=1) = 13/15.
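The support and confidence computations on this slide can be sketched in a few lines of Python; the toy traffic data below is made up for illustration and is not the talk's data.

```python
# Toy transaction data (made up): each row is one observed day.
data = [
    {"construction": 1, "rain": 1, "traffic": 1},
    {"construction": 1, "rain": 1, "traffic": 1},
    {"construction": 1, "rain": 1, "traffic": 0},
    {"construction": 1, "rain": 0, "traffic": 0},
    {"construction": 0, "rain": 1, "traffic": 1},
]

def supp(data, conditions):
    """Supp: number of rows satisfying every feature=value condition."""
    return sum(all(row.get(f) == v for f, v in conditions.items()) for row in data)

def conf(data, lhs, rhs):
    """Conf(lhs -> rhs) = Supp(lhs & rhs) / Supp(lhs)."""
    return supp(data, {**lhs, **rhs}) / supp(data, lhs)

lhs = {"construction": 1, "rain": 1}
print(supp(data, lhs))                    # 3 rows have construction and rain
print(conf(data, lhs, {"traffic": 1}))    # 2 of those 3 also have traffic
```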
11 Max Confidence, Min Support Algorithm. Step 1: find all rules a → b where Supp(a) ≥ θ. Step 2: rank rules in descending order of Conf(a → b); recommend the right-hand side of the first rule that applies. Example ranking: the construction & rain rule (13/15 = .867), then rules for rush_hour=0 and Friday=1, then otherwise (.68).
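The two steps (filter by support, rank by confidence, recommend the first applicable right-hand side) can be sketched as follows; the rule set and its counts are hypothetical, not the slide's.

```python
# Each rule is (lhs, rhs): lhs is a tuple of (feature, value) conditions,
# the empty tuple being the "otherwise" rule. Counts are hypothetical.
rules = [
    ((("construction", 1), ("rain", 1)), "traffic"),
    ((("rush_hour", 0),), "no_traffic"),
    ((), "no_traffic"),  # otherwise
]
supp_ = {rules[0]: 15, rules[1]: 25, rules[2]: 50}
conf_ = {rules[0]: 13 / 15, rules[1]: 0.80, rules[2]: 0.68}

def rank_rules(rules, supp_, conf_, theta):
    """Step 1: keep rules with Supp(a) >= theta. Step 2: sort by Conf, descending."""
    return sorted((r for r in rules if supp_[r] >= theta), key=lambda r: -conf_[r])

def predict(ranked, obs):
    """Recommend the right-hand side of the first rule whose lhs holds."""
    for lhs, rhs in ranked:
        if all(obs.get(f) == v for f, v in lhs):
            return rhs

ranked = rank_rules(rules, supp_, conf_, theta=10)
print(predict(ranked, {"construction": 1, "rain": 1, "rush_hour": 1}))  # traffic
```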
12 15 times we saw construction and rain, and 13 out of 15 of those times we also saw traffic. Supp(construction=1 & rain=1) = 15. Supp(traffic=1 & construction=1 & rain=1) = 13. Conf(construction=1 & rain=1 → traffic=1) = 13/15. But which is better: Conf=.99 with Supp=10000, or Conf=1 with Supp=10?
13 AdjustedConf(a → b) := Supp(a & b) / (Supp(a) + K), a Bayesian version of the confidence.
14 Adjusted Confidence Algorithm. Step 1: find all rules a → b. Step 2: rank rules in descending order of AdjustedConf(a → b); recommend the right-hand side of the first rule that applies. Example rankings with K = 5: rush_hour rule …/(25+5); …/(15+5) = .65; otherwise …/(50+5) = .62; Friday=1 …/(17+5) = .55.
15 AdjustedConf(a → b) := Supp(a & b) / (Supp(a) + K). Rare rules can be used. Among rules with similar confidence, it prefers rules with higher support. K encourages larger support, which helps with prediction. (Recall: Conf=.99, Support=10000 vs. Conf=1, Support=10.)
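The Conf=.99/Supp=10000 vs. Conf=1/Supp=10 comparison works out as follows; this is a direct sketch of the adjusted-confidence formula, with the penalty K = 5 chosen arbitrarily for illustration.

```python
def adjusted_conf(supp_ab, supp_a, K):
    # AdjustedConf(a -> b) = Supp(a & b) / (Supp(a) + K)
    return supp_ab / (supp_a + K)

K = 5
big  = adjusted_conf(9900, 10000, K)  # Conf = .99, Supp = 10000 -> ~.9895
tiny = adjusted_conf(10, 10, K)       # Conf = 1,   Supp = 10    -> ~.6667

# Plain confidence ranks the tiny rule first (1 > .99); the adjusted
# confidence flips the order in favor of the well-supported rule.
print(big > tiny)  # True
```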
16 Humans can understand the prediction, and the algorithm. Good for sequential event problems, where a set of events happen in a particular order, e.g., for predicting what a customer will put next into an online shopping cart, or for predicting medical symptoms in a sequence. Having larger K helps with generalization: algorithmic stability (pointwise hypothesis stability) and other learning-theoretic implications. Performs better empirically than the Max-Conf, Min-Support classifiers in our experiments. A Learning Theory Framework for Association Rules and Sequential Events (R, Letham, Kogan, Madigan), SSRN 2011. Sequential Event Prediction with Association Rules (R, Letham, Aouissi, Kogan, Madigan), COLT 2011.
18 Recommender Systems for Medical Conditions. Input medical condition → prediction based on your medical history:
19 Recommender Systems for Medical Conditions: example input conditions (dyspepsia & epigastric pain, gastroesophageal reflux, high blood pressure) with predictions based on medical history (depression, heartburn, high blood pressure).
20 Medical Condition Prediction. A patient's condition sequence over time: heartburn, headache, dyspepsia, fungal infection, heartburn, epigastric pain, hypertension, dyspepsia. Recommendations at successive time points: (1. rhinitis, 2. dyspepsia, 3. low back pain), then (1. dyspepsia, 2. high blood pressure, 3. low back pain), then (1. epigastric pain, 2. heartburn, 3. high blood pressure).
21 Hierarchical Association Rule Model (HARM)
23 Hierarchical Association Rule Model (HARM). i is the patient index, r the rule index of lhs_r → rhs_r. y_ir := Supp_i(rhs_r & lhs_r); n_ir := Supp_i(lhs_r). We'll model y_ir ~ Binomial(n_ir, p_ir), with parameters shared across individuals.
25 Hierarchical Association Rule Model (HARM). i is the patient index, r the rule index of lhs_r → rhs_r. y_ir := Supp_i(rhs_r & lhs_r); n_ir := Supp_i(lhs_r). We'll model y_ir ~ Binomial(n_ir, p_ir), p_ir ~ Beta(π_ir, τ_i). Under this model, E(p_ir | y_ir, n_ir) = (y_ir + π_ir) / (n_ir + π_ir + τ_i).
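The posterior mean above is a shrinkage estimator: with no data it falls back to the prior mean π_ir / (π_ir + τ_i), and with many observations it approaches the raw confidence y_ir / n_ir. A small numeric check (the parameter values are arbitrary):

```python
def posterior_mean(y, n, pi, tau):
    # E(p_ir | y_ir, n_ir) = (y_ir + pi_ir) / (n_ir + pi_ir + tau_i)
    return (y + pi) / (n + pi + tau)

pi, tau = 2.0, 2.0
print(posterior_mean(0, 0, pi, tau))       # no data: prior mean pi/(pi+tau) = 0.5
print(posterior_mean(900, 1000, pi, tau))  # lots of data: close to y/n = 0.9
```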
26 Hierarchical Association Rule Model (HARM). y_ir ~ Binomial(n_ir, p_ir); p_ir ~ Beta(π_ir, τ_i); π_ir = exp(m'_i β_r + γ_i).
29 Hierarchical Association Rule Model (HARM). M is an I × D matrix of observable patient characteristics, and π_ir = exp(m'_i β_r + γ_i). Example: π_ir = exp(β_r,0 + β_r,1 · 1_male + γ_i) = exp(β_r,1 · 1_male) · exp(β_r,0 + γ_i).
31 Hierarchical Association Rule Model (HARM). y_ir ~ Binomial(n_ir, p_ir); p_ir ~ Beta(π_ir, τ_i); π_ir = exp(m'_i β_r + γ_i); log(τ_i) ~ Normal(0, σ_τ²); log(β_rd) ~ Normal(μ_β, σ_β²); log(γ_i) ~ Normal(μ_γ, σ_γ²); diffuse uniform priors on μ_β, σ_β², σ_τ². HARM estimates the posterior distribution (MCMC), then ranks rules by posterior mean.
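As a sanity check on the hierarchy, one can draw from these priors. The sketch below uses only the standard library, with placeholder hyperparameter values (not the paper's), and does not attempt the MCMC posterior fit.

```python
import math
import random

def sample_prior(m_i, d, sigma_tau=1.0, mu_beta=0.0, sigma_beta=0.5,
                 mu_gamma=0.0, sigma_gamma=0.5, seed=0):
    """Draw (pi_ir, tau_i) for one patient/rule pair from the HARM priors:
    log(tau_i) ~ N(0, sigma_tau^2), log(beta_rd) ~ N(mu_beta, sigma_beta^2),
    log(gamma_i) ~ N(mu_gamma, sigma_gamma^2), pi_ir = exp(m_i' beta_r + gamma_i).
    All hyperparameter defaults are placeholders, not fitted values."""
    rng = random.Random(seed)
    tau = math.exp(rng.gauss(0.0, sigma_tau))
    beta = [math.exp(rng.gauss(mu_beta, sigma_beta)) for _ in range(d)]
    gamma = math.exp(rng.gauss(mu_gamma, sigma_gamma))
    pi = math.exp(sum(m * b for m, b in zip(m_i, beta)) + gamma)
    return pi, tau

pi, tau = sample_prior(m_i=[1.0, 0.0], d=2)  # e.g. intercept + male indicator
print(pi > 0 and tau > 0)  # log-normal components keep both positive
```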
32 Hierarchical Association Rule Model (HARM). Data: 43,000 patient encounters, ~2,300 patients (age > 40); pre-existing conditions dealt with separately; used the 25 most common conditions and the 25 least common conditions.
34 For trials = 1:500, form training and test sets: sample ~200 patients and, for each patient, randomly split encounters into training and test. For each patient, iteratively make predictions on test encounters; get 1 point whenever our top 3 recommendations contain the patient's next condition.
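The per-patient scoring rule can be sketched as follows; the stand-in recommender and condition names are hypothetical, not the HARM model itself.

```python
def top3_score(recommend, test_sequence):
    """One point each time the patient's next condition is in our top 3.
    `recommend(history)` returns the current recommendation list."""
    score, history = 0, []
    for nxt in test_sequence:
        if nxt in recommend(history)[:3]:
            score += 1
        history.append(nxt)  # the observed condition joins the history
    return score

# Trivial stand-in model that ignores the history entirely.
model = lambda history: ["dyspepsia", "heartburn", "high blood pressure"]
print(top3_score(model, ["heartburn", "rhinitis", "dyspepsia"]))  # 2
```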
35 (a) All patients: proportion of correct predictions for HARM vs. confidence, adjusted confidence with K = .25, .5, 1, 2, and support-threshold baselines (results figure).
36 Myocardial infarction in patients with hypertension, in treatment (T) and placebo (P) groups, by age group up to over 70: HARM vs. confidence vs. rescaled risk (figure). Key: mean of posterior means, middle 90%, middle half.
37 Myocardial infarction in patients with high cholesterol, in treatment (T) and placebo (P) groups, by age group up to over 70: HARM vs. confidence vs. rescaled risk (figure). Key: mean of posterior means, middle 90%, middle half.
42 Mixed Integer Optimization. MIO/MIP is a style of mathematical programming. Not generally used for ML: a perception from the 1970s that MIOs are intractable. Not all valid MIO formulations are equally strong. Can use LP relaxations for very large scale problems. Association rules historically plagued by combinatorial explosion...
43 Ordered Rules for Classification. Minimize misclassification error; regularize by the height of the highest null rule. Null rules: the higher one predicts the default class and ends the list.
46 MIO Learning Algorithm. Maximize classification accuracy; maximize the rank of the highest null rule (regularization).
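Conceptually, the model the MIO searches over is an ordered rule list terminated by a null rule; evaluating such a list, once learned, is simple. The rules below are hypothetical illustrations, not output of the ORC optimization.

```python
def classify(rule_list, default_class, x):
    """Walk an ordered rule list. A rule is (conditions, label); conditions
    is a tuple of (feature, value) pairs, or None for a null rule, which
    predicts the default class and ends the list."""
    for conds, label in rule_list:
        if conds is None:
            return default_class  # null rule reached: default class, stop
        if all(x.get(f) == v for f, v in conds):
            return label
    return default_class

rule_list = [
    ((("fever", 1), ("cough", 1)), "flu"),  # hypothetical rule
    (None, None),                           # null rule ends the list
]
print(classify(rule_list, "healthy", {"fever": 1, "cough": 1}))  # flu
print(classify(rule_list, "healthy", {"fever": 0, "cough": 1}))  # healthy
```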
47 Experiments. Five algorithms: Logistic Regression (LogReg), Support Vector Machines with RBF kernel (SVM), Classification and Regression Trees (CART), Boosted Decision Trees (AdaBoost), Ordered Rules for Classification (ORC). Several publicly available datasets (UCI); accuracy averaged over 3 folds.
48 Classification Accuracy
49 CART on Tic Tac Toe (decision tree figure, branching on whether individual cells contain x or o).
50 ORC on Tic Tac Toe: eight rules, one per winning line of three x's (rows, columns, diagonals), each predicting "x wins"; rule 9: otherwise, "x does not win". ORC accuracy = 1.
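The eight "x wins" rules correspond to the eight winning lines of the board; a minimal sketch (the board encoding is my own, not the slide's):

```python
# Cells indexed 0..8, row-major. The eight winning lines:
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def x_wins(board):
    """Ordered-rule view: any line of three x's fires an 'x wins' rule;
    otherwise the ninth rule says x does not win."""
    return any(all(board[i] == "x" for i in line) for line in LINES)

print(x_wins(["x", "o", "o", "", "x", "", "o", "", "x"]))    # True (diagonal)
print(x_wins(["x", "o", "x", "o", "o", "x", "x", "x", "o"]))  # False
```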
51 MONK's Problems: 6 integer-valued features taking values 1, 2, 3, 4. In Problem 1, examples are in class 1 if either a1 = a2 or a5 = 1.
52 CART on MONK's Problem 1 (decision tree figure). Examples are in class 1 if either a1 = a2 or a5 = 1.
53 ORC on MONK's Problem 1: a1=3 & a2=3 → 1 (33/33); a1=2 & a2=2 → 1 (30/30); a5=1 → 1 (65/65); a1=1 & a2=1 → 1 (31/31); otherwise, default (152/288). Examples are in class 1 if either a1 = a2 or a5 = 1.
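The slide's decision list can be transcribed directly. The slide shows the "otherwise" rule only with its counts, so labeling the default as class 0 below is an assumption.

```python
def monks1_orc(x):
    """ORC decision list for MONK's Problem 1, as on the slide."""
    rules = [
        (lambda x: x["a1"] == 3 and x["a2"] == 3, 1),  # 33/33
        (lambda x: x["a1"] == 2 and x["a2"] == 2, 1),  # 30/30
        (lambda x: x["a5"] == 1, 1),                   # 65/65
        (lambda x: x["a1"] == 1 and x["a2"] == 1, 1),  # 31/31
    ]
    for cond, label in rules:
        if cond(x):
            return label
    return 0  # otherwise (152/288); class-0 label is an assumption

print(monks1_orc({"a1": 2, "a2": 2, "a5": 3}))  # 1 (a1 = a2)
print(monks1_orc({"a1": 1, "a2": 2, "a5": 3}))  # 0
```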
54 The bottom line: you don't need to sacrifice accuracy to get interpretability.
55 Outline, revisited; plus: current work coming up.
56 Related areas: Association Rules / Associative Classification, Bayesian Analysis, Logical Analysis of Data (LAD), ML algorithms that use rules as features, Decision Lists, Decision Trees.
57 Current Work. Machine Learning for the NYC Power Grid: cover of IEEE Computer, spotlight issue for IEEE TPAMI in February, WIRED Science, Slashdot, US News & World Report... Supervised ranking; equivalences between ranking and classification; ranking with MIO. Reverse-engineering quality rankings (in Businessweek last week). ML algorithms that understand how they will be used for a subsequent task. Several other projects.
58 Thank you!
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationAdvanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting)
Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting) Professor: Aude Billard Assistants: Nadia Figueroa, Ilaria Lauzana and Brice Platerrier E-mails: aude.billard@epfl.ch,
More informationDecision Trees: Overfitting
Decision Trees: Overfitting Emily Fox University of Washington January 30, 2017 Decision tree recap Loan status: Root 22 18 poor 4 14 Credit? Income? excellent 9 0 3 years 0 4 Fair 9 4 Term? 5 years 9
More informationClass 4: Classification. Quaid Morris February 11 th, 2011 ML4Bio
Class 4: Classification Quaid Morris February 11 th, 211 ML4Bio Overview Basic concepts in classification: overfitting, cross-validation, evaluation. Linear Discriminant Analysis and Quadratic Discriminant
More informationApproximation Theoretical Questions for SVMs
Ingo Steinwart LA-UR 07-7056 October 20, 2007 Statistical Learning Theory: an Overview Support Vector Machines Informal Description of the Learning Goal X space of input samples Y space of labels, usually
More informationOutline. What is Machine Learning? Why Machine Learning? 9/29/08. Machine Learning Approaches to Biological Research: Bioimage Informa>cs and Beyond
Outline Machine Learning Approaches to Biological Research: Bioimage Informa>cs and Beyond Robert F. Murphy External Senior Fellow, Freiburg Ins>tute for Advanced Studies Ray and Stephanie Lane Professor
More informationGenerative Model (Naïve Bayes, LDA)
Generative Model (Naïve Bayes, LDA) IST557 Data Mining: Techniques and Applications Jessie Li, Penn State University Materials from Prof. Jia Li, sta3s3cal learning book (Has3e et al.), and machine learning
More informationVBM683 Machine Learning
VBM683 Machine Learning Pinar Duygulu Slides are adapted from Dhruv Batra Bias is the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data
More informationLogis&c Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Logis&c Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More informationProbabilistic Time Series Classification
Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationBig Data Analytics. Special Topics for Computer Science CSE CSE Feb 24
Big Data Analytics Special Topics for Computer Science CSE 4095-001 CSE 5095-005 Feb 24 Fei Wang Associate Professor Department of Computer Science and Engineering fei_wang@uconn.edu Prediction III Goal
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationOnline Learning and Sequential Decision Making
Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Online Learning
More informationParameter learning in CRF s
Parameter learning in CRF s June 01, 2009 Structured output learning We ish to learn a discriminant (or compatability) function: F : X Y R (1) here X is the space of inputs and Y is the space of outputs.
More informationHierarchical Boosting and Filter Generation
January 29, 2007 Plan Combining Classifiers Boosting Neural Network Structure of AdaBoost Image processing Hierarchical Boosting Hierarchical Structure Filters Combining Classifiers Combining Classifiers
More informationLearning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013
Learning Theory Ingo Steinwart University of Stuttgart September 4, 2013 Ingo Steinwart University of Stuttgart () Learning Theory September 4, 2013 1 / 62 Basics Informal Introduction Informal Description
More informationLecture 8. Instructor: Haipeng Luo
Lecture 8 Instructor: Haipeng Luo Boosting and AdaBoost In this lecture we discuss the connection between boosting and online learning. Boosting is not only one of the most fundamental theories in machine
More informationBayesian Hypotheses Testing
Bayesian Hypotheses Testing Jakub Repický Faculty of Mathematics and Physics, Charles University Institute of Computer Science, Czech Academy of Sciences Selected Parts of Data Mining Jan 19 2018, Prague
More informationBayesian Statistics. Debdeep Pati Florida State University. February 11, 2016
Bayesian Statistics Debdeep Pati Florida State University February 11, 2016 Historical Background Historical Background Historical Background Brief History of Bayesian Statistics 1764-1838: called probability
More informationDoes Unlabeled Data Help?
Does Unlabeled Data Help? Worst-case Analysis of the Sample Complexity of Semi-supervised Learning. Ben-David, Lu and Pal; COLT, 2008. Presentation by Ashish Rastogi Courant Machine Learning Seminar. Outline
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationMODULE -4 BAYEIAN LEARNING
MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities
More informationMachine Learning 2nd Edi7on
Lecture Slides for INTRODUCTION TO Machine Learning 2nd Edi7on CHAPTER 9: Decision Trees ETHEM ALPAYDIN The MIT Press, 2010 Edited and expanded for CS 4641 by Chris Simpkins alpaydin@boun.edu.tr h1p://www.cmpe.boun.edu.tr/~ethem/i2ml2e
More informationRelationship between Least Squares Approximation and Maximum Likelihood Hypotheses
Relationship between Least Squares Approximation and Maximum Likelihood Hypotheses Steven Bergner, Chris Demwell Lecture notes for Cmpt 882 Machine Learning February 19, 2004 Abstract In these notes, a
More information