Modeling with Rules. Cynthia Rudin. Assistant Professor of Statistics, Massachusetts Institute of Technology


1 Modeling with Rules. Cynthia Rudin, Assistant Professor of Statistics, Massachusetts Institute of Technology. Joint work with: David Madigan (Columbia), Allison Chang and Ben Letham (MIT PhD students), Dimitris Bertsimas (MIT), Tyler McCormick (UW), Gene Kogan (Independent)

2 Would like predictive models that are both accurate and interpretable. Accuracy = classification accuracy. Interpretability = ?

3 Would like predictive models that are both accurate and interpretable. Accuracy = classification accuracy. Interpretability = concise (the model is small) and convincing (there are reasons behind each prediction).

4 Modeling with Rules: Decision List. Traffic jam in Boston?
fenway park=1 → 1 (97/100 times)
rush_hour=… → … (…/523 times)
rain=0, construction=… → … (…/482 times)
Friday=1 → -1 (3/3 times)
rain=… → … (…/892 times)
otherwise → … (…/15 times)

5 Modeling with Rules: Dichotomy in the State of the Art. Accuracy vs. Interpretability: Support Vector Machines and Boosted Decision Trees on one side, Decision Trees on the other.

6 Modeling with Rules: Daydreaming. It would be nice if the whole algorithm were interpretable, OR: we want the accuracy of SVM/Boosted DT and the interpretability of Decision Trees.

7 Outline
Part 1: Humans can interpret the predictions, and understand the full algorithm. Sequential Event Prediction with Association Rules (R, Letham, Aouissi, Kogan, Madigan), COLT 2011
Part 2: Bayesian hierarchical modeling with rules. A Hierarchical Model for Association Rule Mining of Sequential Events: An Approach to Automated Medical Symptom Prediction (McCormick, R, Madigan), Annals of Applied Statistics, forthcoming 2012
Part 3: Accurate rule classifiers using MIO. Ordered Rules for Classification: A Discrete Optimization Approach to Associative Classification (Bertsimas, Chang, R), in progress

8 Association Rule Mining: (Agrawal, Imielinski, Swami, 1993) & (Agrawal and Srikant, 1994)

9 Construction=1, Traffic=1, Rain=1

10 15 times we saw construction and rain, and 13 out of 15 of those times we also saw traffic.
Supp(construction=1 & rain=1) = 15
Supp(traffic=1 & construction=1 & rain=1) = 13
Conf(construction=1 & rain=1 → traffic=1) = 13/15

11 Max Confidence, Min Support Algorithm
Step 1. Find all rules a → b where Supp(a) ≥ θ.
Step 2. Rank the rules in descending order of Conf(a → b); recommend the right-hand side of the first rule that applies.
Example ranking: construction=1 & rain=1 (13/15 = .867), rush hour=0 (…/25 = …), Friday=1 (…/17 = …), otherwise (…/50 = .68)
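
To make the two steps concrete, here is a minimal Python sketch of support, confidence, and the Max Confidence, Min Support ranking; the toy transactions, candidate rules, and threshold θ = 2 are made up for illustration and are not the talk's traffic data.

```python
# Toy transactions: each is a set of observed conditions (made-up data).
transactions = [
    {"construction", "rain", "traffic"},
    {"construction", "rain", "traffic"},
    {"construction", "rain"},
    {"rain", "traffic"},
    {"rush_hour", "traffic"},
]

def supp(itemset):
    """Support: the number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def conf(lhs, rhs):
    """Confidence of the rule lhs -> rhs: Supp(lhs & rhs) / Supp(lhs)."""
    return supp(lhs | rhs) / supp(lhs)

def max_conf_min_supp(rules, theta):
    """Step 1: keep rules a -> b with Supp(a) >= theta.
    Step 2: rank the survivors by confidence, descending."""
    kept = [(a, b) for a, b in rules if supp(a) >= theta]
    return sorted(kept, key=lambda r: conf(*r), reverse=True)

candidate_rules = [
    ({"construction", "rain"}, {"traffic"}),
    ({"rush_hour"}, {"traffic"}),   # support 1: filtered out by theta=2
    ({"rain"}, {"traffic"}),
]
for a, b in max_conf_min_supp(candidate_rules, theta=2):
    print(sorted(a), "->", sorted(b), f"conf={conf(a, b):.3f}")
```

To make a recommendation, one takes the right-hand side of the first surviving rule whose left-hand side is contained in the current observation.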

12 15 times we saw construction and rain, and 13 out of 15 of those times we also saw traffic.
Supp(construction=1 & rain=1) = 15
Supp(traffic=1 & construction=1 & rain=1) = 13
Conf(construction=1 & rain=1 → traffic=1) = 13/15
But which rule is better: Conf=.99 with Supp=10000, or Conf=1 with Supp=10?

13 AdjustedConf(a → b) := Supp(a & b) / (Supp(a) + K), a Bayesian version of the confidence.

14 Adjusted Confidence Algorithm
Step 1. Find all rules a → b.
Step 2. Rank the rules in descending order of AdjustedConf(a → b); recommend the right-hand side of the first rule that applies.
Example ranking with K = 5: rush hour=… (…/(25+5) = …), construction=1 & rain=1 (13/(15+5) = .65), otherwise (…/(50+5) = .62), Friday=1 (…/(17+5) = .55)

15 AdjustedConf(a → b) := Supp(a & b) / (Supp(a) + K)
Rare rules can still be used. Among rules with similar confidence, it prefers rules with higher support. K encourages larger support, which helps with prediction.
Conf=.99, Support=10000 vs. Conf=1, Support=10
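
A short sketch of how K resolves the slide's comparison (Conf = .99 at support 10000 vs. Conf = 1 at support 10); the numerators are back-calculated from those figures, and the K values are arbitrary:

```python
def adjusted_conf(supp_both, supp_lhs, K):
    """AdjustedConf(a -> b) = Supp(a & b) / (Supp(a) + K); K = 0 is plain confidence."""
    return supp_both / (supp_lhs + K)

for K in (0, 5, 100):
    high_support = adjusted_conf(9900, 10000, K)  # Conf = .99, Supp = 10000
    low_support = adjusted_conf(10, 10, K)        # Conf = 1,   Supp = 10
    winner = "high-support" if high_support > low_support else "low-support"
    print(f"K={K:>3}: {high_support:.3f} vs {low_support:.3f} -> {winner} rule ranks first")
```

At K = 0 the low-support rule wins on raw confidence; any moderate K pushes the well-supported rule to the top.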

16 Humans can understand the prediction, and the algorithm.
Good for sequential event problems, where a set of events happens in a particular order, e.g., predicting what a customer will put next into an online shopping cart, or predicting medical symptoms in a sequence.
Having a larger K helps with generalization: algorithmic stability (pointwise hypothesis stability) and other learning-theoretic implications. It performs better empirically than the Max-Conf, Min-Support classifiers in our experiments.
A Learning Theory Framework for Association Rules and Sequential Events (R, Letham, Kogan, Madigan), SSRN 2011
Sequential Event Prediction with Association Rules (R, Letham, Aouissi, Kogan, Madigan), COLT 2011

17 Outline
Part 1: Humans can interpret the predictions, and understand the full algorithm. Sequential Event Prediction with Association Rules (R, Letham, Aouissi, Kogan, Madigan), COLT 2011
Part 2: Bayesian hierarchical modeling with rules. A Hierarchical Model for Association Rule Mining of Sequential Events: An Approach to Automated Medical Symptom Prediction (McCormick, R, Madigan), Annals of Applied Statistics, forthcoming 2012
Part 3: Accurate rule classifiers using MIO. Ordered Rules for Classification: A Discrete Optimization Approach to Associative Classification (Bertsimas, Chang, R), in progress

18 Recommender Systems for Medical Conditions. [Empty UI mockup:] Input medical condition: … Prediction based on your medical history: …

19 Recommender Systems for Medical Conditions. [UI mockup filled in: input medical condition and predictions based on medical history, listing dyspepsia & epigastric pain, depression, gastroesophageal reflux, heartburn, high blood pressure]

20 Medical Condition Prediction. [Timeline figure: observed conditions over time t: heartburn, headache, dyspepsia, fungal infection, heartburn, epigastric pain, hypertension, dyspepsia; recommendations shown at three points along the sequence:]
Recommendations: 1. rhinitis 2. dyspepsia 3. low back pain
Recommendations: 1. dyspepsia 2. high blood pressure 3. low back pain
Recommendations: 1. epigastric pain 2. heartburn 3. high blood pressure

21 Hierarchical Association Rule Model (HARM)

22 Hierarchical Association Rule Model (HARM). i = patient index; r = rule index, for rules of the form lhs_r → rhs_r.

23 Hierarchical Association Rule Model (HARM). i = patient index; r = rule index, for rules lhs_r → rhs_r.
y_ir := Supp_i(rhs_r & lhs_r), n_ir := Supp_i(lhs_r)
We'll model y_ir ~ Binomial(n_ir, p_ir), shared across individuals.

24 Hierarchical Association Rule Model (HARM). i = patient index; r = rule index, for rules lhs_r → rhs_r.
y_ir := Supp_i(rhs_r & lhs_r), n_ir := Supp_i(lhs_r)
We'll model y_ir ~ Binomial(n_ir, p_ir), p_ir ~ Beta(π_ir, τ_i).

25 Hierarchical Association Rule Model (HARM). i = patient index; r = rule index, for rules lhs_r → rhs_r.
y_ir := Supp_i(rhs_r & lhs_r), n_ir := Supp_i(lhs_r)
We'll model y_ir ~ Binomial(n_ir, p_ir), p_ir ~ Beta(π_ir, τ_i).
Under this model, E(p_ir | y_ir, n_ir) = (y_ir + π_ir) / (n_ir + π_ir + τ_i).
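
The posterior mean on the slide is the standard Beta-Binomial conjugacy computation. As a sketch, reading Beta(π_ir, τ_i) as shape parameters (an assumption, but the one consistent with the stated formula):

```latex
p(p_{ir} \mid y_{ir}, n_{ir})
  \propto \underbrace{p_{ir}^{y_{ir}} (1 - p_{ir})^{n_{ir} - y_{ir}}}_{\text{Binomial likelihood}}
  \cdot \underbrace{p_{ir}^{\pi_{ir} - 1} (1 - p_{ir})^{\tau_i - 1}}_{\text{Beta prior}}
  = p_{ir}^{y_{ir} + \pi_{ir} - 1} (1 - p_{ir})^{n_{ir} - y_{ir} + \tau_i - 1}
```

so the posterior is Beta(y_ir + π_ir, n_ir − y_ir + τ_i), whose mean is the slide's formula. With π_ir = 0 the mean reduces to y_ir / (n_ir + τ_i), the same shrinkage as AdjustedConf with K = τ_i, which is presumably why the earlier slide calls adjusted confidence a Bayesian version of the confidence.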

26 Hierarchical Association Rule Model (HARM). i = patient index; r = rule index, for rules lhs_r → rhs_r.
y_ir := Supp_i(rhs_r & lhs_r), n_ir := Supp_i(lhs_r)
We'll model y_ir ~ Binomial(n_ir, p_ir), p_ir ~ Beta(π_ir, τ_i), π_ir = exp(m'_i β_r + γ_i).

27 Hierarchical Association Rule Model (HARM). π_ir = exp(m'_i β_r + γ_i)

28 Hierarchical Association Rule Model (HARM). M is an I × D matrix of observable characteristics; m_i is its i-th row. π_ir = exp(m'_i β_r + γ_i)

29 Hierarchical Association Rule Model (HARM). M is an I × D matrix of observable characteristics. π_ir = exp(m'_i β_r + γ_i). Example: π_ir = exp(β_r,0 + β_r,1·1_male + γ_i) = exp(β_r,1·1_male)·exp(β_r,0 + γ_i), so exp(β_r,1) acts as a multiplicative adjustment to rule r's prior weight for male patients.

30 Hierarchical Association Rule Model (HARM). i = patient index; r = rule index, for rules lhs_r → rhs_r.
y_ir := Supp_i(rhs_r & lhs_r), n_ir := Supp_i(lhs_r)
We'll model y_ir ~ Binomial(n_ir, p_ir), p_ir ~ Beta(π_ir, τ_i), π_ir = exp(m'_i β_r + γ_i).

31 Hierarchical Association Rule Model (HARM)
y_ir ~ Binomial(n_ir, p_ir)
p_ir ~ Beta(π_ir, τ_i)
π_ir = exp(m'_i β_r + γ_i)
log(τ_i) ~ Normal(0, σ_τ²)
log(β_rd) ~ Normal(μ_β, σ_β²)
log(γ_i) ~ Normal(μ_γ, σ_γ²)
Diffuse uniform priors on μ_β, σ_β², σ_τ².
HARM estimates the posterior distribution (MCMC), then ranks rules by posterior mean.
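
A quick way to sanity-check the hierarchy is to simulate from it exactly as written on the slide. A minimal numpy sketch follows; all sizes and hyperparameter values are made up (the talk puts diffuse priors on them), and Beta(π_ir, τ_i) is again read as shape parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
I, R, D = 50, 20, 2            # patients, rules, covariates (made-up sizes)

# Made-up hyperparameter values standing in for the diffuse priors.
mu_beta, sigma_beta = 0.0, 0.5
mu_gamma, sigma_gamma = 0.0, 0.5
sigma_tau = 0.5

M = np.column_stack([np.ones(I), rng.integers(0, 2, I)])  # intercept + e.g. 1_male

beta = np.exp(rng.normal(mu_beta, sigma_beta, (R, D)))    # log(beta_rd) ~ Normal
gamma = np.exp(rng.normal(mu_gamma, sigma_gamma, I))      # log(gamma_i) ~ Normal
tau = np.exp(rng.normal(0.0, sigma_tau, I))               # log(tau_i)  ~ Normal

pi = np.exp(M @ beta.T + gamma[:, None])                  # pi_ir = exp(m_i' beta_r + gamma_i)

n = rng.integers(1, 30, (I, R))                           # n_ir = Supp_i(lhs_r), made up
p = rng.beta(pi, tau[:, None])                            # p_ir ~ Beta(pi_ir, tau_i)
y = rng.binomial(n, p)                                    # y_ir ~ Binomial(n_ir, p_ir)

# Posterior mean from the earlier slide, used to rank rules per patient:
post_mean = (y + pi) / (n + pi + tau[:, None])
print("top rules for patient 0:", np.argsort(-post_mean[0])[:3])
```

The full model replaces this plug-in ranking with MCMC over β, γ, τ, but the ranking-by-posterior-mean step is the same.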

32 Hierarchical Association Rule Model (HARM). Data: 43,000 patient encounters, ~2,300 patients, age > 40. Pre-existing conditions dealt with separately. Used the 25 most common conditions and the 25 least common conditions.

33 For trials = 1:500: form training and test sets by sampling ~200 patients and, for each patient, randomly splitting encounters into training and test.

34 For trials = 1:500: form training and test sets by sampling ~200 patients and, for each patient, randomly splitting encounters into training and test. For each patient, iteratively make predictions on the test encounters; get 1 point whenever our top 3 recommendations contain the patient's next condition.
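
In code, this evaluation protocol might look like the following sketch. `model.recommend(history, k)` is a hypothetical interface; each patient is assumed to be a time-ordered list of at least two conditions, and at least `sample_size` patients are assumed available.

```python
import random

def evaluate(model, patients, n_trials=500, top_k=3, sample_size=200, seed=0):
    """Sketch of the slide's protocol: repeatedly sample patients, split each
    patient's encounter sequence, and score 1 point whenever the model's
    top-k recommendations contain the patient's actual next condition."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        correct, total = 0, 0
        for patient in rng.sample(patients, sample_size):
            split = rng.randrange(1, len(patient))   # random train/test split point
            history = list(patient[:split])
            for next_condition in patient[split:]:   # iterate over test encounters
                if next_condition in model.recommend(history, k=top_k):
                    correct += 1
                total += 1
                history.append(next_condition)       # reveal the true next condition
        scores.append(correct / total)
    return sum(scores) / len(scores)                 # proportion of correct predictions
```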

35 [Boxplot (a), all patients: proportion of correct predictions for HARM, confidence (Conf.), adjusted confidence with K = .25, .5, 1, 2, and support-threshold baselines (Thresh. = 2, …)]

36 Myocardial infarction in patients with hypertension, in treatment (T) and placebo (P) groups. [Figure: means of posterior means by age group through "over 70", under HARM, confidence, and rescaled risk. Key: mean of posterior means, middle 90%, middle half.]

37 Myocardial infarction in patients with high cholesterol, in treatment (T) and placebo (P) groups. [Figure: means of posterior means by age group through "over 70", under HARM, confidence, and rescaled risk. Key: mean of posterior means, middle 90%, middle half.]

38 Outline
Part 1: Humans can interpret the predictions, and understand the full algorithm. Sequential Event Prediction with Association Rules (R, Letham, Aouissi, Kogan, Madigan), COLT 2011
Part 2: Bayesian hierarchical modeling with rules. A Hierarchical Model for Association Rule Mining of Sequential Events: An Approach to Automated Medical Symptom Prediction (McCormick, R, Madigan), Annals of Applied Statistics, forthcoming 2012
Part 3: Accurate rule classifiers using MIO. Ordered Rules for Classification: A Discrete Optimization Approach to Associative Classification (Bertsimas, Chang, R), in progress

39 Mixed Integer Optimization. MIO/MIP is a style of mathematical programming. It is not generally used for ML, due to a perception from the 1970s that MIOs are intractable.

40 Mixed Integer Optimization. MIO/MIP is a style of mathematical programming. It is not generally used for ML, due to a perception from the 1970s that MIOs are intractable. Not all valid MIO formulations are equally strong.

41 Mixed Integer Optimization. MIO/MIP is a style of mathematical programming. It is not generally used for ML, due to a perception from the 1970s that MIOs are intractable. Not all valid MIO formulations are equally strong. Can use LP relaxations for very large scale problems.

42 Mixed Integer Optimization. MIO/MIP is a style of mathematical programming. It is not generally used for ML, due to a perception from the 1970s that MIOs are intractable. Not all valid MIO formulations are equally strong. Can use LP relaxations for very large scale problems. Association rules have historically been plagued by combinatorial explosion...

43 Ordered Rules for Classification. Minimize misclassification error; regularize by the height of the highest null rule. Null rules: the higher one predicts the default class and ends the list.

44 MIO Learning Algorithm

45 MIO Learning Algorithm. Maximize classification accuracy.

46 MIO Learning Algorithm. Maximize classification accuracy. Maximize the rank of the highest null rule (regularization).
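
Schematically, and only as a reading of these two slides (not the exact formulation in the Bertsimas-Chang-Rudin paper), the MIO searches over orderings of the mined rules:

```latex
\max_{\text{orderings } \pi} \;
  \underbrace{\sum_{i=1}^{n} \mathbf{1}\!\left[\hat{y}_i(\pi) = y_i\right]}_{\text{classification accuracy}}
  \; + \; C \cdot \underbrace{\operatorname{rank}_{\pi}(\text{highest null rule})}_{\text{regularization}}
```

where ŷ_i(π) is the right-hand side of the first rule in the ordering that applies to example i, and pushing a null rule toward the top shortens the effective decision list. In an MIO encoding, the ordering and the first-rule-that-applies indicators would be represented with binary variables and linear constraints.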

47 Experiments. Five algorithms: Logistic Regression (LogReg), Support Vector Machines with RBF kernel (SVM), Classification and Regression Trees (CART), Boosted Decision Trees (AdaBoost), Ordered Rules for Classification (ORC). Several publicly available datasets (UCI). Accuracy averaged over 3 folds.

48 Classification Accuracy [results table not recovered]

49 CART on Tic Tac Toe. [Tree figure not recovered: the nodes test individual board squares (x, o, ~x, ~o); the CART accuracy value did not survive extraction.]

50 ORC on Tic Tac Toe. Rules 1-8 are the eight three-in-a-row patterns of x's (three rows, three columns, two diagonals), each predicting "x wins"; rule 9 predicts "x does not win". ORC accuracy = 1.

51 MONK's Problem 1: six integer-valued features taking values 1, 2, 3, 4. Examples are in class 1 if either a1 = a2 or a5 = 1.
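
The target concept is simple enough to write down directly. A small sketch that enumerates the feature space as the slide describes it (six features, each in {1, 2, 3, 4}; the original UCI MONK's attributes have mixed domain sizes, so this follows the slide rather than the UCI files):

```python
from itertools import product

def monk1_label(a):
    """MONK's problem 1 target as stated on the slide:
    class 1 iff a1 == a2 or a5 == 1 (features are 1-indexed here)."""
    a1, a2, a3, a4, a5, a6 = a
    return 1 if (a1 == a2 or a5 == 1) else 0

# Enumerate the full feature space the slide describes.
examples = [(a, monk1_label(a)) for a in product(range(1, 5), repeat=6)]
print(len(examples), "examples,", sum(lbl for _, lbl in examples), "in class 1")
```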

52 CART on MONK's Problem 1. [Tree figure not recovered.] Examples are in class 1 if either a1 = a2 or a5 = 1.

53 ORC on MONK's Problem 1:
a1=3, a2=3 → 1 (33/33)
a1=2, a2=2 → 1 (30/30)
a5=1 → 1 (65/65)
a1=1, a2=1 → 1 (31/31)
otherwise → -1 (152/288)
Examples are in class 1 if either a1 = a2 or a5 = 1.
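
Read as code, the learned list is just an ordered sequence of guard/label pairs. A sketch hand-transcribed from the slide (the final default label is taken to be -1 in the deck's ±1 coding, an assumption since the sign was garbled in extraction):

```python
# The ORC decision list from the slide, as guard/label pairs.
rules = [
    (lambda a: a[0] == 3 and a[1] == 3, 1),   # a1=3, a2=3 -> 1
    (lambda a: a[0] == 2 and a[1] == 2, 1),   # a1=2, a2=2 -> 1
    (lambda a: a[4] == 1,               1),   # a5=1       -> 1
    (lambda a: a[0] == 1 and a[1] == 1, 1),   # a1=1, a2=1 -> 1
    (lambda a: True,                   -1),   # otherwise  -> default class
]

def predict(a):
    """Return the right-hand side of the first rule that applies."""
    for applies, label in rules:
        if applies(a):
            return label

print(predict((3, 3, 1, 2, 2, 1)))  # a1=a2=3, so the first rule fires: +1
print(predict((1, 2, 1, 2, 2, 1)))  # nothing fires but the default:    -1
```

This is the sense in which the list is "convincing": the rule that fired is itself the reason for the prediction.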

54 The bottom line: you don't need to sacrifice accuracy to get interpretability.

55 Outline
Part 1: Humans can interpret the predictions, and understand the full algorithm. Sequential Event Prediction with Association Rules (R, Letham, Aouissi, Kogan, Madigan), COLT 2011
Part 2: Bayesian hierarchical modeling with rules. A Hierarchical Model for Association Rule Mining of Sequential Events: An Approach to Automated Medical Symptom Prediction (McCormick, R, Madigan), Annals of Applied Statistics, forthcoming 2012
Part 3: Accurate rule classifiers using MIO. Ordered Rules for Classification: A Discrete Optimization Approach to Associative Classification (Bertsimas, Chang, R), in progress
Current work coming up.

56 Association Rules / Associative Classification; Bayesian Analysis; Logical Analysis of Data (LAD); ML algorithms that use rules as features; Decision Lists; Decision Trees

57 Current Work
Machine Learning for the NYC Power Grid: cover of IEEE Computer, spotlight issue for IEEE TPAMI in February, WIRED Science, Slashdot, US News & World Report...
Supervised Ranking; Equivalences between Ranking and Classification; Ranking with MIO
Reverse-Engineering Quality Rankings: in Businessweek last week
ML algorithms that understand how they will be used for a subsequent task
Several other projects

58 Thank you!
