A Maximum Entropy Approach to Species Distribution Modeling. Rob Schapire Steven Phillips Miro Dudík Rob Anderson.

Size: px
Start display at page:

Download "A Maximum Entropy Approach to Species Distribution Modeling. Rob Schapire Steven Phillips Miro Dudík Rob Anderson."

Transcription

1 A Maximum Entropy Approach to Species Distribution Modeling Rob Schapire Steven Phillips Miro Dudík Rob Anderson schapire

2 The Problem: Species Habitat Modeling goal: model distribution of plant or animal species given: occurrence records given: environmental variables precipitation wet days average temperature desired output: map of range

3 Issues no negative examples very limited data sample bias data often collected over a range of years or decades geographic uncertainty realized versus potential range hard versus soft predictions interpretability of constructed model

4 Our Approach assume recorded localities come from probability distribution π try to estimate π apply maximum entropy approach

5 The Maxent Set-up given: samples x i X, x i π features (base functions) f j : X R output: ˆπ = estimate of π, in terms of features

6 Maxent and Habitat Modeling x i = recorded locality X = all localities (discretized) π = distribution of localities inhabited by species assumes locality chosen at random from entire range ignores sample bias and dependence between samples features: use raw environmental variables v k or derived functions linear: f j = v k quadratic: f j = vk 2 product: f j = v k v l 1 if v threshold: f j = k a 0 else # features can become very large (even infinite)

7 This Talk maxent with relaxed constraints new performance guarantees for maxent useful even with very large number of features algorithm and convergence lots of experiments with habitat modeling data

8 The Abstract Framework given: x 1,...,x m X x i π features f 1,...,f n f j : X [0, 1] goal: find ˆπ = estimate of π let π = empirical distribution i.e., π(x) =#{i : x i = x}/m notation: π[f] =expectation of f with respect to π (so π[f] =empirical average of f)

9 Maxent π typically very poor estimate of π but: π[f j ] likely to be reasonable estimate of π[f j ] so: choose distribution ˆπ such that ˆπ[f j ]= π[f j ] for all features f j among these, choose one closest to uniform, i.e., of maximum entropy [Jaynes]

10 Maxent Can Badly Overfit say X = {1,...,n}, n large 1 if x j features: f j (x) = 0 else can show only distribution satisfying constraints is π clear overfitting

11 A More Relaxed Version generally, only expect π[f j ] π[f j ] usually, can estimate upper bound on π[f j ] π[f j ] so: compute ˆπ to maximize H(ˆπ) (= entropy) subject to j : π[f j ] ˆπ[f j ] β j where β j = known upper bound [Kazama & Tsujii]

12 Duality in unrelaxed case, solution is maximum likelihood Gibbs distribution Gibbs distribution: q λ (x) = exp j λ jf j (x) can show solution in relaxed case is Gibbs distribution q λ that minimizes: 1 m ln q λ(x i ) + β j λ j i j }{{} negative log likelihood Z λ }{{} regularization

13 Equivalent Motivations maxent with relaxed constraints log loss with regularization MAP estimate with Laplace prior on weights λ

14 How Good Is Maxent Estimate? would like to bound distance between ˆπ and π bound should be in terms of: # of examples m distance between π and best Gibbs distribution π = q λ complexity of features smoothness of π measure distance using relative entropy RE(p q) =p[ln(p/q)]

15 Bounds for Finite Feature Classes with high probability, for all λ RE(π ˆπ) RE(π π )+O (for choice of β j based only on n and m) λ 1 ln n m e.g., if π best Gibbs distribution with λ 1 K then RE(π ˆπ) RE(π π )+O very moderate in number of features K ln n m

16 Bounds for Infinite Binary Feature Classes assume binary features with VC-dimension d then with high probability, for all λ : RE(π ˆπ) RE(π π )+Õ (for choice of β j based only on d and m) λ 1 d m

17 Main Theorem assume j : π[f j ] π[f j ] β j then RE(π ˆπ) RE(π π )+2 j β j λ j previous results are simple corollaries using standard uniform convergence results

18 Proof of Main Theorem let LL(p q) =p[ ln q] =RE(p q)+h(p) want: LL(π ˆπ) LL(π π )+2 β j λ j j claim: for all λ LL(π q λ ) LL( π q λ ) β j λ j j proof: LL(π q λ ) LL( π q λ ) = π [ λ j f j +lnz λ ] π [ λ j f j +lnz λ ] = λ j ( π[f j ] π[f j ]) β j λ j

19 Proof (cont.) let ˆπ = qˆλ then LL(π ˆπ) = LL(π qˆλ ) LL( π qˆλ )+ β j ˆλ j LL( π q λ )+ β j λ j LL(π q λ )+2 β j λ j = LL(π π )+2 β j λ j

20 Finding an Algorithm want to minimize L(λ) = 1 m i ln q λ(x i )+ j β j λ j no analytical solution instead, iteratively compute λ 1, λ 2,...so that L(λ t ) converges to minimum most algorithms for maxent update all weights λ j simultaneously less practical when very large number of features

21 Sequential-update Algorithm instead update just one weight at a time leads to sparser solution sometimes can search for best weight to update very efficiently analogous to boosting weak learner acts as oracle for choosing function (weak classifier) from large space

22 Idea for Algorithm say add δ to λ j new weight vector λ derive bound F such that L(λ ) L(λ) F (λ,j,δ) choose j, δ to minimize F repeat in our case, can choose F = δ π[f j ]+ln(1+(e δ 1)q λ [f j ]) + β j ( λ j + δ λ j ) possible values for minimizing δ can be found analytically in practice, combine with approximate line search to speed convergence

23 Convergence have that L(λ t+1 ) L(λ t ) min j,δ F (λ t,j,δ)=g(λ t ) enough to show: G(λ) 0 G continuous G(λ) =0 λ satisfies KKT conditions for optimization problem

24 Goals of Experiments does maxent work for this problem? what features work best? how many examples are enough? how to choose β j s? how interpretable are results? what is a biologist s qualitative assessment of results?

25 Breeding Bird Experiments used data from North American Breeding Bird Survey huge dataset with many species we chose 12 ranged from 78 to 2,773 occurence records used seven publicly available environmental variables basic approach: train with half sample points (randomly selected) test on rest repeat 10 times

26 Using Maxent features: linear, quadratic, product, threshold used β j = β σ[f j] m

27 GARP [Stockwell & Noble; Stockwell & Peters] compared to Genetic Algorithm for Ruleset Prediction (GARP) widely used by biologists just about only method that uses positive-only data outputs binary predictions commonly combined with ensemble-like technique ( best subsets ) integer-valued predictions in {0, 1,...,10}

28 How to Compare? maxent outputs probabilities treat each algorithm as providing ranking of localities from most to least habitable compare using ROC curve and area under curve (AUC) AUC = probability randomly chosen positive and negative examples are ranked correctly

29 ROC Curves true positive rate true positive rate Yellow throated Vireo Loggerhead Shrike false positive rate LQP T GARP L LQ false positive rate LQP T GARP L LQ

30 AUC Comparisons maxent LQP maxent T GARP GARP

31 Classification Accuracy maxent threshold chosen to predict same size habitat area as GARP then compute omission rate (fraction of test localities omitted) maxent LQP maxent T GARP GARP

32 How Many Examples Are Enough? (AUC) area under ROC curve (AUC) Gray Vireo Hutton s Vireo Plumbeous V. number of training examples (m) Blue h. Vireo Yellow th. V. Loggerh. Shrike LQP, β=0.1 T, β=1.0 LQ, β=0.1 L, β=0.1 GARP, all available training data

33 How Many Examples Are Enough? (log loss) log loss on test examples Gray Vireo Hutton s Vireo Plumbeous V. number of training examples (m) Blue h. Vireo Yellow th. V. Loggerh. Shrike LQP, β=0.1 LQ, β=0.1 T, β=1.0 L, β=0.1

34 Sensitivity to Regularization Parameter (AUC) Threshold features area under ROC curve (AUC) Hutton s Vireo.75 Blue h. Vireo.75 Yellow th. V Linear, quadratic and product features Hutton s Vireo Blue h. Vireo Yellow th. V. regularization value (β) m=10 m=32 m=100 m=316

35 Sensitivity to Regularization Parameter (log loss) log loss on test examples Threshold features Hutton s Vireo Blue h. Vireo Yellow th. V Linear, quadratic and product features regularization value (β) Hutton s Vireo Blue h. Vireo Yellow th. V m=10 m=32 m=100 m=316

36 Interpreting Maxent Models recall q λ (x) exp( j λ jf j (x)) for each environmental variable, can plot total contribution to exponent aspect elevation slope additive contribution to exponent precipitation avg. temperature no. wet days value of environmental variable threshold, β=1.0 linear+quadratic, β=0.1 threshold, β=0.01

37 Case Study: Microryzomys minutus

38 Case Study (cont.) accurately captures realized range did not predict other wet montane forest areas where could live, but doesn t examined predictions of maxent on six of these (chosen by biologist) found all had characteristics well outside typical range for actual occurence localities e.g., four sites had July precipitation 5 standard deviations above mean

39 Summary maxent provides clean and effective fit to habitat modeling problem works with positive-only examples seems to perform well with limited number of examples provides smooth gradation from most to least habitable theoretical guarantees indicate can be used even with a very large number of features easy to interpret by human expert

40 Some of the Challenges sample bias geographic uncertainty historical data collected with a changing landscape how to check accuracy of model how to produce hard (presence or absence) predictions convergence rates for iterative algorithms analogous performance bounds for conditional maxent

Maximum Entropy and Species Distribution Modeling

Maximum Entropy and Species Distribution Modeling Maximum Entropy and Species Distribution Modeling Rob Schapire Steven Phillips Miro Dudík Also including work by or with: Rob Anderson, Jane Elith, Catherine Graham, Chris Raxworthy, NCEAS Working Group,...

More information

Performance Guarantees for Regularized Maximum Entropy Density Estimation

Performance Guarantees for Regularized Maximum Entropy Density Estimation Appearing in Proceedings of the 17th Annual Conference on Computational Learning Theory, 2004. Performance Guarantees for Regularized Maximum Entropy Density Estimation Miroslav Dudík 1, Steven J. Phillips

More information

Maximum entropy modeling of species geographic distributions. Steven Phillips. with Miro Dudik & Rob Schapire

Maximum entropy modeling of species geographic distributions. Steven Phillips. with Miro Dudik & Rob Schapire Maximum entropy modeling of species geographic distributions Steven Phillips with Miro Dudik & Rob Schapire Modeling species distributions Yellow-throated Vireo occurrence points environmental variables

More information

Correcting sample selection bias in maximum entropy density estimation

Correcting sample selection bias in maximum entropy density estimation To appear in Advances in Neural Information Processing Systems 18, 2006. Correcting sample selection bias in maximum entropy density estimation Miroslav Dudík, Robert E. Schapire Princeton University Department

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

FINAL: CS 6375 (Machine Learning) Fall 2014

FINAL: CS 6375 (Machine Learning) Fall 2014 FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for

More information

CS7267 MACHINE LEARNING

CS7267 MACHINE LEARNING CS7267 MACHINE LEARNING ENSEMBLE LEARNING Ref: Dr. Ricardo Gutierrez-Osuna at TAMU, and Aarti Singh at CMU Mingon Kang, Ph.D. Computer Science, Kennesaw State University Definition of Ensemble Learning

More information

PAC Learning Introduction to Machine Learning. Matt Gormley Lecture 14 March 5, 2018

PAC Learning Introduction to Machine Learning. Matt Gormley Lecture 14 March 5, 2018 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University PAC Learning Matt Gormley Lecture 14 March 5, 2018 1 ML Big Picture Learning Paradigms:

More information

Lecture 8. Instructor: Haipeng Luo

Lecture 8. Instructor: Haipeng Luo Lecture 8 Instructor: Haipeng Luo Boosting and AdaBoost In this lecture we discuss the connection between boosting and online learning. Boosting is not only one of the most fundamental theories in machine

More information

Learning Objectives. c D. Poole and A. Mackworth 2010 Artificial Intelligence, Lecture 7.2, Page 1

Learning Objectives. c D. Poole and A. Mackworth 2010 Artificial Intelligence, Lecture 7.2, Page 1 Learning Objectives At the end of the class you should be able to: identify a supervised learning problem characterize how the prediction is a function of the error measure avoid mixing the training and

More information

Voting (Ensemble Methods)

Voting (Ensemble Methods) 1 2 Voting (Ensemble Methods) Instead of learning a single classifier, learn many weak classifiers that are good at different parts of the data Output class: (Weighted) vote of each classifier Classifiers

More information

Maximum Entropy Density Estimation with Generalized Regularization and an Application to Species Distribution Modeling

Maximum Entropy Density Estimation with Generalized Regularization and an Application to Species Distribution Modeling Journal of Machine Learning Research 8 (007) 117-160 Submitted 4/06; Revised 4/07; Published 6/07 Maximum Entropy Density Estimation with Generalized Regularization and an Application to Species Distribution

More information

Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring /

Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring / Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Combining Classifiers Empirical view Theoretical

More information

Learning theory. Ensemble methods. Boosting. Boosting: history

Learning theory. Ensemble methods. Boosting. Boosting: history Learning theory Probability distribution P over X {0, 1}; let (X, Y ) P. We get S := {(x i, y i )} n i=1, an iid sample from P. Ensemble methods Goal: Fix ɛ, δ (0, 1). With probability at least 1 δ (over

More information

Frank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c /9/9 page 331 le-tex

Frank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c /9/9 page 331 le-tex Frank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c15 2013/9/9 page 331 le-tex 331 15 Ensemble Learning The expression ensemble learning refers to a broad class

More information

CSE 151 Machine Learning. Instructor: Kamalika Chaudhuri

CSE 151 Machine Learning. Instructor: Kamalika Chaudhuri CSE 151 Machine Learning Instructor: Kamalika Chaudhuri Ensemble Learning How to combine multiple classifiers into a single one Works well if the classifiers are complementary This class: two types of

More information

Machine Learning Lecture 5

Machine Learning Lecture 5 Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory

More information

CS229 Supplemental Lecture notes

CS229 Supplemental Lecture notes CS229 Supplemental Lecture notes John Duchi 1 Boosting We have seen so far how to solve classification (and other) problems when we have a data representation already chosen. We now talk about a procedure,

More information

Ensemble Methods. NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan

Ensemble Methods. NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan Ensemble Methods NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan How do you make a decision? What do you want for lunch today?! What did you have last night?! What are your favorite

More information

Learning with multiple models. Boosting.

Learning with multiple models. Boosting. CS 2750 Machine Learning Lecture 21 Learning with multiple models. Boosting. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Learning with multiple models: Approach 2 Approach 2: use multiple models

More information

AdaBoost. S. Sumitra Department of Mathematics Indian Institute of Space Science and Technology

AdaBoost. S. Sumitra Department of Mathematics Indian Institute of Space Science and Technology AdaBoost S. Sumitra Department of Mathematics Indian Institute of Space Science and Technology 1 Introduction In this chapter, we are considering AdaBoost algorithm for the two class classification problem.

More information

Game Theory, On-line prediction and Boosting (Freund, Schapire)

Game Theory, On-line prediction and Boosting (Freund, Schapire) Game heory, On-line prediction and Boosting (Freund, Schapire) Idan Attias /4/208 INRODUCION he purpose of this paper is to bring out the close connection between game theory, on-line prediction and boosting,

More information

Qualifying Exam in Machine Learning

Qualifying Exam in Machine Learning Qualifying Exam in Machine Learning October 20, 2009 Instructions: Answer two out of the three questions in Part 1. In addition, answer two out of three questions in two additional parts (choose two parts

More information

Stochastic Gradient Descent

Stochastic Gradient Descent Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular

More information

Class 4: Classification. Quaid Morris February 11 th, 2011 ML4Bio

Class 4: Classification. Quaid Morris February 11 th, 2011 ML4Bio Class 4: Classification Quaid Morris February 11 th, 211 ML4Bio Overview Basic concepts in classification: overfitting, cross-validation, evaluation. Linear Discriminant Analysis and Quadratic Discriminant

More information

TDT4173 Machine Learning

TDT4173 Machine Learning TDT4173 Machine Learning Lecture 3 Bagging & Boosting + SVMs Norwegian University of Science and Technology Helge Langseth IT-VEST 310 helgel@idi.ntnu.no 1 TDT4173 Machine Learning Outline 1 Ensemble-methods

More information

Incorporating Boosted Regression Trees into Ecological Latent Variable Models

Incorporating Boosted Regression Trees into Ecological Latent Variable Models Incorporating Boosted Regression Trees into Ecological Latent Variable Models Rebecca A. Hutchinson, Li-Ping Liu, Thomas G. Dietterich School of EECS, Oregon State University Motivation Species Distribution

More information

A Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007

A Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007 Decision Trees, cont. Boosting Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 1 st, 2007 1 A Decision Stump 2 1 The final tree 3 Basic Decision Tree Building Summarized

More information

Introduction to Machine Learning Midterm, Tues April 8

Introduction to Machine Learning Midterm, Tues April 8 Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne

More information

FINAL EXAM: FALL 2013 CS 6375 INSTRUCTOR: VIBHAV GOGATE

FINAL EXAM: FALL 2013 CS 6375 INSTRUCTOR: VIBHAV GOGATE FINAL EXAM: FALL 2013 CS 6375 INSTRUCTOR: VIBHAV GOGATE You are allowed a two-page cheat sheet. You are also allowed to use a calculator. Answer the questions in the spaces provided on the question sheets.

More information

Boosting: Algorithms and Applications

Boosting: Algorithms and Applications Boosting: Algorithms and Applications Lecture 11, ENGN 4522/6520, Statistical Pattern Recognition and Its Applications in Computer Vision ANU 2 nd Semester, 2008 Chunhua Shen, NICTA/RSISE Boosting Definition

More information

Machine Learning, Fall 2011: Homework 5

Machine Learning, Fall 2011: Homework 5 0-60 Machine Learning, Fall 0: Homework 5 Machine Learning Department Carnegie Mellon University Due:??? Instructions There are 3 questions on this assignment. Please submit your completed homework to

More information

Ensembles. Léon Bottou COS 424 4/8/2010

Ensembles. Léon Bottou COS 424 4/8/2010 Ensembles Léon Bottou COS 424 4/8/2010 Readings T. G. Dietterich (2000) Ensemble Methods in Machine Learning. R. E. Schapire (2003): The Boosting Approach to Machine Learning. Sections 1,2,3,4,6. Léon

More information

Natural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley

Natural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley Natural Language Processing Classification Classification I Dan Klein UC Berkeley Classification Automatically make a decision about inputs Example: document category Example: image of digit digit Example:

More information

Support vector machines Lecture 4

Support vector machines Lecture 4 Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 1 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification

More information

Generalization, Overfitting, and Model Selection

Generalization, Overfitting, and Model Selection Generalization, Overfitting, and Model Selection Sample Complexity Results for Supervised Classification Maria-Florina (Nina) Balcan 10/03/2016 Two Core Aspects of Machine Learning Algorithm Design. How

More information

Lecture 4: Training a Classifier

Lecture 4: Training a Classifier Lecture 4: Training a Classifier Roger Grosse 1 Introduction Now that we ve defined what binary classification is, let s actually train a classifier. We ll approach this problem in much the same way as

More information

Boosting: Foundations and Algorithms. Rob Schapire

Boosting: Foundations and Algorithms. Rob Schapire Boosting: Foundations and Algorithms Rob Schapire Example: Spam Filtering problem: filter out spam (junk email) gather large collection of examples of spam and non-spam: From: yoav@ucsd.edu Rob, can you

More information

Machine Learning for NLP

Machine Learning for NLP Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline

More information

Ecological Modelling

Ecological Modelling Ecological Modelling 222 (2011) 2796 2811 Contents lists available at ScienceDirect Ecological Modelling journa l h o me pa g e: www.elsevier.com/locate/ecolmodel Species-specific tuning increases robustness

More information

Chapter 14 Combining Models

Chapter 14 Combining Models Chapter 14 Combining Models T-61.62 Special Course II: Pattern Recognition and Machine Learning Spring 27 Laboratory of Computer and Information Science TKK April 3th 27 Outline Independent Mixing Coefficients

More information

Computational Learning Theory

Computational Learning Theory Computational Learning Theory Pardis Noorzad Department of Computer Engineering and IT Amirkabir University of Technology Ordibehesht 1390 Introduction For the analysis of data structures and algorithms

More information

Lecture 9: Classification, LDA

Lecture 9: Classification, LDA Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we

More information

Lecture 9: Classification, LDA

Lecture 9: Classification, LDA Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we

More information

CSCI-567: Machine Learning (Spring 2019)

CSCI-567: Machine Learning (Spring 2019) CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March

More information

Pattern Recognition and Machine Learning. Perceptrons and Support Vector machines

Pattern Recognition and Machine Learning. Perceptrons and Support Vector machines Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lessons 6 10 Jan 2017 Outline Perceptrons and Support Vector machines Notation... 2 Perceptrons... 3 History...3

More information

ECE 5424: Introduction to Machine Learning

ECE 5424: Introduction to Machine Learning ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple

More information

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 9

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 9 Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 9 Slides adapted from Jordan Boyd-Graber Machine Learning: Chenhao Tan Boulder 1 of 39 Recap Supervised learning Previously: KNN, naïve

More information

Accelerating Stochastic Optimization

Accelerating Stochastic Optimization Accelerating Stochastic Optimization Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem and Mobileye Master Class at Tel-Aviv, Tel-Aviv University, November 2014 Shalev-Shwartz

More information

The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet.

The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet. CS 189 Spring 013 Introduction to Machine Learning Final You have 3 hours for the exam. The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet. Please

More information

Boosting. Acknowledgment Slides are based on tutorials from Robert Schapire and Gunnar Raetsch

Boosting. Acknowledgment Slides are based on tutorials from Robert Schapire and Gunnar Raetsch . Machine Learning Boosting Prof. Dr. Martin Riedmiller AG Maschinelles Lernen und Natürlichsprachliche Systeme Institut für Informatik Technische Fakultät Albert-Ludwigs-Universität Freiburg riedmiller@informatik.uni-freiburg.de

More information

Announcements Kevin Jamieson

Announcements Kevin Jamieson Announcements My office hours TODAY 3:30 pm - 4:30 pm CSE 666 Poster Session - Pick one First poster session TODAY 4:30 pm - 7:30 pm CSE Atrium Second poster session December 12 4:30 pm - 7:30 pm CSE Atrium

More information

CptS 570 Machine Learning School of EECS Washington State University. CptS Machine Learning 1

CptS 570 Machine Learning School of EECS Washington State University. CptS Machine Learning 1 CptS 570 Machine Learning School of EECS Washington State University CptS 570 - Machine Learning 1 IEEE Expert, October 1996 CptS 570 - Machine Learning 2 Given sample S from all possible examples D Learner

More information

Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete

More information

Managing Uncertainty in Habitat Suitability Models. Jim Graham and Jake Nelson Oregon State University

Managing Uncertainty in Habitat Suitability Models. Jim Graham and Jake Nelson Oregon State University Managing Uncertainty in Habitat Suitability Models Jim Graham and Jake Nelson Oregon State University Habitat Suitability Modeling (HSM) Also known as: Ecological Niche Modeling (ENM) Species Distribution

More information

Neural Networks and Ensemble Methods for Classification

Neural Networks and Ensemble Methods for Classification Neural Networks and Ensemble Methods for Classification NEURAL NETWORKS 2 Neural Networks A neural network is a set of connected input/output units (neurons) where each connection has a weight associated

More information

A GIS Approach to Modeling Native American Influence on Camas Distribution:

A GIS Approach to Modeling Native American Influence on Camas Distribution: A GIS Approach to Modeling Native American Influence on Camas Distribution: Humans as Environmental Variable Braden Elliott Oregon State University 29 March 2013 Northwest Anthropological Conference Portland,

More information

Machine Learning and Data Mining. Linear classification. Kalev Kask

Machine Learning and Data Mining. Linear classification. Kalev Kask Machine Learning and Data Mining Linear classification Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ = f(x ; q) Parameters q Program ( Learner ) Learning algorithm Change q

More information

The Frank-Wolfe Algorithm:

The Frank-Wolfe Algorithm: The Frank-Wolfe Algorithm: New Results, and Connections to Statistical Boosting Paul Grigas, Robert Freund, and Rahul Mazumder http://web.mit.edu/rfreund/www/talks.html Massachusetts Institute of Technology

More information

Machine Learning Lecture 7

Machine Learning Lecture 7 Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Midterm, Fall 2003

Midterm, Fall 2003 5-78 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more time-consuming and is worth only three points, so do not attempt 9 unless you are

More information

Machine Learning, Fall 2009: Midterm

Machine Learning, Fall 2009: Midterm 10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table

More information

Learning Theory Continued

Learning Theory Continued Learning Theory Continued Machine Learning CSE446 Carlos Guestrin University of Washington May 13, 2013 1 A simple setting n Classification N data points Finite number of possible hypothesis (e.g., dec.

More information

Manfred K. Warmuth - UCSC S.V.N. Vishwanathan - Purdue & Microsoft Research. Updated: March 23, Warmuth (UCSC) ICML 09 Boosting Tutorial 1 / 62

Manfred K. Warmuth - UCSC S.V.N. Vishwanathan - Purdue & Microsoft Research. Updated: March 23, Warmuth (UCSC) ICML 09 Boosting Tutorial 1 / 62 Updated: March 23, 2010 Warmuth (UCSC) ICML 09 Boosting Tutorial 1 / 62 ICML 2009 Tutorial Survey of Boosting from an Optimization Perspective Part I: Entropy Regularized LPBoost Part II: Boosting from

More information

MODELLING AND UNDERSTANDING MULTI-TEMPORAL LAND USE CHANGES

MODELLING AND UNDERSTANDING MULTI-TEMPORAL LAND USE CHANGES MODELLING AND UNDERSTANDING MULTI-TEMPORAL LAND USE CHANGES Jianquan Cheng Department of Environmental & Geographical Sciences, Manchester Metropolitan University, John Dalton Building, Chester Street,

More information

Chapter 18. Decision Trees and Ensemble Learning. Recall: Learning Decision Trees

Chapter 18. Decision Trees and Ensemble Learning. Recall: Learning Decision Trees CSE 473 Chapter 18 Decision Trees and Ensemble Learning Recall: Learning Decision Trees Example: When should I wait for a table at a restaurant? Attributes (features) relevant to Wait? decision: 1. Alternate:

More information

Weighted Classification Cascades for Optimizing Discovery Significance

Weighted Classification Cascades for Optimizing Discovery Significance Weighted Classification Cascades for Optimizing Discovery Significance Lester Mackey Collaborators: Jordan Bryan and Man Yue Mo Stanford University December 13, 2014 Mackey (Stanford) Weighted Classification

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

Learning Theory. Machine Learning CSE546 Carlos Guestrin University of Washington. November 25, Carlos Guestrin

Learning Theory. Machine Learning CSE546 Carlos Guestrin University of Washington. November 25, Carlos Guestrin Learning Theory Machine Learning CSE546 Carlos Guestrin University of Washington November 25, 2013 Carlos Guestrin 2005-2013 1 What now n We have explored many ways of learning from data n But How good

More information

Empirical Risk Minimization Algorithms

Empirical Risk Minimization Algorithms Empirical Risk Minimization Algorithms Tirgul 2 Part I November 2016 Reminder Domain set, X : the set of objects that we wish to label. Label set, Y : the set of possible labels. A prediction rule, h:

More information

1 Review of Winnow Algorithm

1 Review of Winnow Algorithm COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture # 17 Scribe: Xingyuan Fang, Ethan April 9th, 2013 1 Review of Winnow Algorithm We have studied Winnow algorithm in Algorithm 1. Algorithm

More information

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology

More information

Generalization, Overfitting, and Model Selection

Generalization, Overfitting, and Model Selection Generalization, Overfitting, and Model Selection Sample Complexity Results for Supervised Classification MariaFlorina (Nina) Balcan 10/05/2016 Reminders Midterm Exam Mon, Oct. 10th Midterm Review Session

More information

The Perceptron algorithm

The Perceptron algorithm The Perceptron algorithm Tirgul 3 November 2016 Agnostic PAC Learnability A hypothesis class H is agnostic PAC learnable if there exists a function m H : 0,1 2 N and a learning algorithm with the following

More information

Performance Evaluation

Performance Evaluation Performance Evaluation David S. Rosenberg Bloomberg ML EDU October 26, 2017 David S. Rosenberg (Bloomberg ML EDU) October 26, 2017 1 / 36 Baseline Models David S. Rosenberg (Bloomberg ML EDU) October 26,

More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 8: Boosting (and Compression Schemes) Boosting the Error If we have an efficient learning algorithm that for any distribution

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Name (NetID): (1 Point)

Name (NetID): (1 Point) CS446: Machine Learning (D) Spring 2017 March 16 th, 2017 This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of this exam. This exam booklet contains

More information

Lecture 3. STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher

Lecture 3. STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher Lecture 3 STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher Previous lectures What is machine learning? Objectives of machine learning Supervised and

More information

L11: Pattern recognition principles

L11: Pattern recognition principles L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction

More information

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical

More information

Ensemble learning 11/19/13. The wisdom of the crowds. Chapter 11. Ensemble methods. Ensemble methods

Ensemble learning 11/19/13. The wisdom of the crowds. Chapter 11. Ensemble methods. Ensemble methods The wisdom of the crowds Ensemble learning Sir Francis Galton discovered in the early 1900s that a collection of educated guesses can add up to very accurate predictions! Chapter 11 The paper in which

More information

Optimal and Adaptive Algorithms for Online Boosting

Optimal and Adaptive Algorithms for Online Boosting Optimal and Adaptive Algorithms for Online Boosting Alina Beygelzimer 1 Satyen Kale 1 Haipeng Luo 2 1 Yahoo! Labs, NYC 2 Computer Science Department, Princeton University Jul 8, 2015 Boosting: An Example

More information

1 Review of the Perceptron Algorithm

1 Review of the Perceptron Algorithm COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #15 Scribe: (James) Zhen XIANG April, 008 1 Review of the Perceptron Algorithm In the last few lectures, we talked about various kinds

More information

The Boosting Approach to. Machine Learning. Maria-Florina Balcan 10/31/2016

The Boosting Approach to. Machine Learning. Maria-Florina Balcan 10/31/2016 The Boosting Approach to Machine Learning Maria-Florina Balcan 10/31/2016 Boosting General method for improving the accuracy of any given learning algorithm. Works by creating a series of challenge datasets

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

PAC-learning, VC Dimension and Margin-based Bounds

PAC-learning, VC Dimension and Margin-based Bounds More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based

More information

Midterm: CS 6375 Spring 2015 Solutions

Midterm: CS 6375 Spring 2015 Solutions Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an

More information

Decision Trees: Overfitting

Decision Trees: Overfitting Decision Trees: Overfitting Emily Fox University of Washington January 30, 2017 Decision tree recap Loan status: Root 22 18 poor 4 14 Credit? Income? excellent 9 0 3 years 0 4 Fair 9 4 Term? 5 years 9

More information

Learning Ensembles. 293S T. Yang. UCSB, 2017.

Learning Ensembles. 293S T. Yang. UCSB, 2017. Learning Ensembles 293S T. Yang. UCSB, 2017. Outlines Learning Assembles Random Forest Adaboost Training data: Restaurant example Examples described by attribute values (Boolean, discrete, continuous)

More information

Lecture 9: Classification, LDA

Lecture 9: Classification, LDA Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis Jonathan Taylor, 10/12 Slide credits: Sergio Bacallado 1 / 1 Review: Main strategy in Chapter 4 Find an estimate ˆP

More information

Applied Machine Learning Annalisa Marsico

Applied Machine Learning Annalisa Marsico Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 22 April, SoSe 2015 Goals Feature Selection rather than Feature

More information

Midterm. Introduction to Machine Learning. CS 189 Spring You have 1 hour 20 minutes for the exam.

Midterm. Introduction to Machine Learning. CS 189 Spring You have 1 hour 20 minutes for the exam. CS 189 Spring 2013 Introduction to Machine Learning Midterm You have 1 hour 20 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. Please use non-programmable calculators

More information