A Maximum Entropy Approach to Species Distribution Modeling. Rob Schapire, Steven Phillips, Miro Dudík, Rob Anderson.
1 A Maximum Entropy Approach to Species Distribution Modeling
- Rob Schapire, Steven Phillips, Miro Dudík, Rob Anderson
2 The Problem: Species Habitat Modeling
- goal: model the distribution of a plant or animal species
- given: occurrence records
- given: environmental variables (precipitation, wet days, average temperature)
- desired output: map of the species' range
3 Issues
- no negative examples
- very limited data
- sample bias
- data often collected over a range of years or decades
- geographic uncertainty
- realized versus potential range
- hard versus soft predictions
- interpretability of the constructed model
4 Our Approach
- assume recorded localities come from a probability distribution $\pi$
- try to estimate $\pi$
- apply a maximum entropy approach
5 The Maxent Set-up
- given: samples $x_1, \dots, x_m \in X$, with $x_i \sim \pi$
- features (base functions) $f_j : X \to \mathbb{R}$
- output: $\hat\pi$ = estimate of $\pi$, expressed in terms of the features
6 Maxent and Habitat Modeling
- $x_i$ = recorded locality
- $X$ = all localities (discretized)
- $\pi$ = distribution of localities inhabited by the species
  - assumes each locality is chosen at random from the entire range
  - ignores sample bias and dependence between samples
- features: use raw environmental variables $v_k$ or derived functions (one possible encoding is sketched after this slide):
  - linear: $f_j = v_k$
  - quadratic: $f_j = v_k^2$
  - product: $f_j = v_k v_l$
  - threshold: $f_j = 1$ if $v_k \geq a$, and $0$ otherwise
- the number of features can become very large (even infinite)
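A minimal sketch of how such a feature expansion might be built. The function name, array layout, and threshold bookkeeping are assumptions for illustration, not the authors' code; the feature definitions themselves (linear, quadratic, product, threshold) follow the slide.

```python
import numpy as np

def expand_features(V, thresholds):
    """Build maxent feature columns from raw environmental variables.

    V is an (n_localities, n_vars) array of raw variables v_k;
    thresholds maps a variable index k to a list of cutoffs a.
    Returns columns for linear, quadratic, pairwise-product, and
    threshold features, each rescaled to [0, 1] as the abstract
    framework assumes.
    """
    cols = []
    n, d = V.shape
    for k in range(d):
        cols.append(V[:, k])             # linear: f_j = v_k
        cols.append(V[:, k] ** 2)        # quadratic: f_j = v_k^2
    for k in range(d):
        for l in range(k + 1, d):
            cols.append(V[:, k] * V[:, l])   # product: f_j = v_k * v_l
    for k, cuts in thresholds.items():
        for a in cuts:
            cols.append((V[:, k] >= a).astype(float))  # threshold feature
    F = np.column_stack(cols)
    # rescale every column to [0, 1] so that f_j : X -> [0, 1]
    lo, hi = F.min(axis=0), F.max(axis=0)
    return (F - lo) / np.where(hi > lo, hi - lo, 1.0)
```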
7 This Talk
- maxent with relaxed constraints
- new performance guarantees for maxent, useful even with a very large number of features
- algorithm and convergence
- lots of experiments with habitat-modeling data
8 The Abstract Framework
- given: $x_1, \dots, x_m \in X$, with $x_i \sim \pi$
- features $f_1, \dots, f_n$, with $f_j : X \to [0, 1]$
- goal: find $\hat\pi$ = estimate of $\pi$
- let $\tilde\pi$ = empirical distribution, i.e., $\tilde\pi(x) = \#\{i : x_i = x\}/m$
- notation: $p[f]$ = expectation of $f$ with respect to $p$ (so $\tilde\pi[f]$ = empirical average of $f$)
9 Maxent
- $\tilde\pi$ is typically a very poor estimate of $\pi$
- but: $\tilde\pi[f_j]$ is likely to be a reasonable estimate of $\pi[f_j]$
- so: choose a distribution $\hat\pi$ such that $\hat\pi[f_j] = \tilde\pi[f_j]$ for all features $f_j$
- among these, choose the one closest to uniform, i.e., of maximum entropy [Jaynes]
10 Maxent Can Badly Overfit
- say $X = \{1, \dots, n\}$, $n$ large
- features: $f_j(x) = 1$ if $x = j$, and $0$ otherwise
- can show the only distribution satisfying the constraints is $\tilde\pi$
- clear overfitting
11 A More Relaxed Version
- generally, we only expect $\tilde\pi[f_j] \approx \pi[f_j]$
- usually, we can estimate an upper bound on $|\tilde\pi[f_j] - \pi[f_j]|$
- so: compute $\hat\pi$ to maximize $H(\hat\pi)$ (= entropy) subject to $\forall j : |\tilde\pi[f_j] - \hat\pi[f_j]| \leq \beta_j$, where $\beta_j$ = known upper bound [Kazama & Tsujii]
12 Duality
- in the unrelaxed case, the solution is the maximum-likelihood Gibbs distribution
- Gibbs distribution: $q_\lambda(x) = \dfrac{\exp\left(\sum_j \lambda_j f_j(x)\right)}{Z_\lambda}$
- can show the solution in the relaxed case is the Gibbs distribution $q_\lambda$ that minimizes
$$\underbrace{-\frac{1}{m} \sum_i \ln q_\lambda(x_i)}_{\text{negative log likelihood}} \;+\; \underbrace{\sum_j \beta_j |\lambda_j|}_{\text{regularization}}$$
(a numerical sketch of this objective follows this slide)
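A minimal sketch of the Gibbs distribution and the regularized objective over a discretized $X$ small enough to enumerate. The function names and array layout are assumptions for illustration; the formulas are the ones on the slide.

```python
import numpy as np

def gibbs(F, lam):
    """q_lambda(x) proportional to exp(sum_j lam_j * f_j(x)).

    F is an (n_localities, n_features) matrix of feature values f_j(x);
    returns the normalized distribution q_lambda over all localities.
    """
    scores = F @ lam
    scores -= scores.max()        # stabilize the exponentials
    q = np.exp(scores)
    return q / q.sum()            # division by the sum plays the role of Z_lambda

def objective(lam, F, sample_idx, beta):
    """Regularized negative log likelihood from the duality slide:
    L(lambda) = -(1/m) * sum_i ln q_lambda(x_i) + sum_j beta_j * |lambda_j|.

    sample_idx holds the row indices of the m observed localities x_i.
    """
    q = gibbs(F, lam)
    nll = -np.mean(np.log(q[sample_idx]))
    return nll + np.sum(beta * np.abs(lam))
```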
13 Equivalent Motivations
- maxent with relaxed constraints
- log loss with regularization
- MAP estimation with a Laplace prior on the weights $\lambda$
14 How Good Is the Maxent Estimate?
- would like to bound the distance between $\hat\pi$ and $\pi$
- the bound should be in terms of:
  - the number of examples $m$
  - the distance between $\pi$ and the best Gibbs distribution $\pi^* = q_{\lambda^*}$
  - the complexity of the features
  - the smoothness of $\pi^*$
- measure distance using relative entropy: $\mathrm{RE}(p \,\|\, q) = p[\ln(p/q)]$ (a small numerical sketch follows this slide)
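A small sketch of the distance measure used in the bounds, for distributions over a finite, discretized $X$; the function name is a placeholder, and it assumes $q(x) > 0$ wherever $p(x) > 0$.

```python
import numpy as np

def relative_entropy(p, q):
    """RE(p || q) = sum_x p(x) * ln(p(x) / q(x)).

    p and q are probability vectors over the same discretized set of
    localities; terms with p(x) = 0 contribute nothing to the sum.
    """
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```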
15 Bounds for Finite Feature Classes
- with high probability, for all $\lambda^*$:
$$\mathrm{RE}(\pi \,\|\, \hat\pi) \leq \mathrm{RE}(\pi \,\|\, \pi^*) + O\!\left( \|\lambda^*\|_1 \sqrt{\frac{\ln n}{m}} \right)$$
(for a choice of $\beta_j$ based only on $n$ and $m$)
- e.g., if $\pi^*$ is the best Gibbs distribution with $\|\lambda^*\|_1 \leq K$, then
$$\mathrm{RE}(\pi \,\|\, \hat\pi) \leq \mathrm{RE}(\pi \,\|\, \pi^*) + O\!\left( K \sqrt{\frac{\ln n}{m}} \right)$$
- very moderate dependence on the number of features
16 Bounds for Infinite Binary Feature Classes
- assume binary features with VC-dimension $d$
- then with high probability, for all $\lambda^*$:
$$\mathrm{RE}(\pi \,\|\, \hat\pi) \leq \mathrm{RE}(\pi \,\|\, \pi^*) + \tilde{O}\!\left( \|\lambda^*\|_1 \sqrt{\frac{d}{m}} \right)$$
(for a choice of $\beta_j$ based only on $d$ and $m$)
17 Main Theorem
- assume $\forall j : |\pi[f_j] - \tilde\pi[f_j]| \leq \beta_j$
- then
$$\mathrm{RE}(\pi \,\|\, \hat\pi) \leq \mathrm{RE}(\pi \,\|\, \pi^*) + 2 \sum_j \beta_j |\lambda^*_j|$$
- the previous results are simple corollaries, using standard uniform-convergence results
18 Proof of Main Theorem
- let $\mathrm{LL}(p \,\|\, q) = p[-\ln q] = \mathrm{RE}(p \,\|\, q) + H(p)$
- want: $\mathrm{LL}(\pi \,\|\, \hat\pi) \leq \mathrm{LL}(\pi \,\|\, \pi^*) + 2 \sum_j \beta_j |\lambda^*_j|$
- claim: for all $\lambda$,
$$\left| \mathrm{LL}(\pi \,\|\, q_\lambda) - \mathrm{LL}(\tilde\pi \,\|\, q_\lambda) \right| \leq \sum_j \beta_j |\lambda_j|$$
- proof:
$$\mathrm{LL}(\pi \,\|\, q_\lambda) - \mathrm{LL}(\tilde\pi \,\|\, q_\lambda)
= \pi\Big[ -\sum_j \lambda_j f_j + \ln Z_\lambda \Big] - \tilde\pi\Big[ -\sum_j \lambda_j f_j + \ln Z_\lambda \Big]
= \sum_j \lambda_j \big( \tilde\pi[f_j] - \pi[f_j] \big)
\leq \sum_j \beta_j |\lambda_j|$$
19 Proof (cont.)
- let $\hat\pi = q_{\hat\lambda}$; then
$$\begin{aligned}
\mathrm{LL}(\pi \,\|\, \hat\pi) &= \mathrm{LL}(\pi \,\|\, q_{\hat\lambda}) \\
&\leq \mathrm{LL}(\tilde\pi \,\|\, q_{\hat\lambda}) + \sum_j \beta_j |\hat\lambda_j| \\
&\leq \mathrm{LL}(\tilde\pi \,\|\, q_{\lambda^*}) + \sum_j \beta_j |\lambda^*_j| \\
&\leq \mathrm{LL}(\pi \,\|\, q_{\lambda^*}) + 2 \sum_j \beta_j |\lambda^*_j| \\
&= \mathrm{LL}(\pi \,\|\, \pi^*) + 2 \sum_j \beta_j |\lambda^*_j|
\end{aligned}$$
(the first inequality is the claim; the second holds because $\hat\lambda$ minimizes the regularized log loss; the third applies the claim again, this time to $\lambda^*$)
20 Finding an Algorithm
- want to minimize
$$L(\lambda) = -\frac{1}{m} \sum_i \ln q_\lambda(x_i) + \sum_j \beta_j |\lambda_j|$$
- no analytical solution
- instead, iteratively compute $\lambda_1, \lambda_2, \dots$ so that $L(\lambda_t)$ converges to the minimum
- most algorithms for maxent update all weights $\lambda_j$ simultaneously, which is less practical with a very large number of features
21 Sequential-update Algorithm
- instead, update just one weight at a time
- leads to a sparser solution
- sometimes the best weight to update can be found very efficiently
- analogous to boosting: a weak learner acts as an oracle for choosing a function (weak classifier) from a large space
22 Idea for Algorithm
- say we add $\delta$ to $\lambda_j$, giving a new weight vector $\lambda'$
- derive a bound $F$ such that $L(\lambda') - L(\lambda) \leq F(\lambda, j, \delta)$
- choose $j, \delta$ to minimize $F$; repeat
- in our case, we can choose
$$F = -\delta\, \tilde\pi[f_j] + \ln\big( 1 + (e^\delta - 1)\, q_\lambda[f_j] \big) + \beta_j \big( |\lambda_j + \delta| - |\lambda_j| \big)$$
- the possible values of the minimizing $\delta$ can be found analytically (one update step is sketched after this slide)
- in practice, combine with an approximate line search to speed convergence
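A sketch of one sequential-update step under stated assumptions: `pi_f` is the empirical average of $f_j$ and `q_f` its expectation under the current Gibbs distribution, both assumed strictly inside $(0, 1)$. The candidate deltas come from setting the derivative of $F$ to zero with the regularization term at each of its two slopes ($\pm\beta_j$), plus the kink $\delta = -\lambda_j$ that zeroes out the weight; the names and structure are illustrative, not the authors' implementation.

```python
import numpy as np

def F_bound(delta, pi_f, q_f, lam_j, beta_j):
    """Upper bound F on the change in L(lambda) when lambda_j += delta."""
    return (-delta * pi_f
            + np.log1p((np.exp(delta) - 1.0) * q_f)
            + beta_j * (abs(lam_j + delta) - abs(lam_j)))

def best_update(j, pi_f, q_f, lam, beta):
    """Pick the delta for coordinate j that minimizes the bound F.

    Stationary points of F satisfy
        e^delta = p * (1 - q_f) / (q_f * (1 - p)),  p = pi_f -/+ beta_j,
    one for each sign of the regularization slope; delta = -lam[j]
    covers the non-differentiable point of |lam_j + delta|.
    """
    cands = [-lam[j]]
    for b in (-beta[j], +beta[j]):
        p = pi_f + b                      # shifted empirical expectation
        if 0.0 < p < 1.0:
            cands.append(np.log(p * (1.0 - q_f) / (q_f * (1.0 - p))))
    vals = [F_bound(d, pi_f, q_f, lam[j], beta[j]) for d in cands]
    k = int(np.argmin(vals))
    return cands[k], vals[k]              # delta and its guaranteed bound
```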
23 Convergence
- we have $L(\lambda_{t+1}) - L(\lambda_t) \leq \min_{j,\delta} F(\lambda_t, j, \delta) = -G(\lambda_t)$
- enough to show:
  - $G(\lambda) \geq 0$
  - $G$ is continuous
  - $G(\lambda) = 0$ iff $\lambda$ satisfies the KKT conditions for the optimization problem
24 Goals of Experiments
- does maxent work for this problem?
- what features work best?
- how many examples are enough?
- how should the $\beta_j$'s be chosen?
- how interpretable are the results?
- what is a biologist's qualitative assessment of the results?
25 Breeding Bird Experiments
- used data from the North American Breeding Bird Survey, a huge dataset with many species
- we chose 12 species, ranging from 78 to 2,773 occurrence records
- used seven publicly available environmental variables
- basic approach: train with half of the sample points (randomly selected), test on the rest; repeat 10 times
26 Using Maxent
- features: linear, quadratic, product, threshold
- used $\beta_j = \beta \cdot \dfrac{\sigma[f_j]}{\sqrt{m}}$, where $\sigma[f_j]$ is the standard deviation of $f_j$ (a small sketch follows this slide)
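A small sketch of this choice of $\beta_j$, assuming $\sigma[f_j]$ is estimated as the empirical standard deviation of feature $j$ over the sample; the function name and that estimator are assumptions for illustration.

```python
import numpy as np

def betas(F_sample, beta):
    """Per-feature regularization beta_j = beta * sigma[f_j] / sqrt(m).

    F_sample is the (m, n_features) matrix of feature values at the m
    sample localities; beta is the single tuning parameter varied in
    the experiments.
    """
    m = F_sample.shape[0]
    return beta * F_sample.std(axis=0) / np.sqrt(m)
```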
27 GARP [Stockwell & Noble; Stockwell & Peters]
- compared to the Genetic Algorithm for Rule-set Prediction (GARP)
- widely used by biologists
- just about the only method that uses positive-only data
- outputs binary predictions
- commonly combined with an ensemble-like technique ("best subsets"), giving integer-valued predictions in $\{0, 1, \dots, 10\}$
28 How to Compare?
- maxent outputs probabilities
- treat each algorithm as providing a ranking of localities from most to least habitable
- compare using ROC curves and the area under the curve (AUC)
- AUC = probability that a randomly chosen positive and a randomly chosen negative example are ranked correctly (a brute-force sketch follows this slide)
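A brute-force sketch of that probabilistic definition of AUC; the function name is a placeholder, and real code would sort once rather than compare all pairs.

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    sp = np.asarray(scores_pos, dtype=float)[:, None]
    sn = np.asarray(scores_neg, dtype=float)[None, :]
    return float(((sp > sn) + 0.5 * (sp == sn)).mean())
```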
29 ROC Curves
[Figure: ROC curves (true positive rate versus false positive rate) for the Yellow-throated Vireo and the Loggerhead Shrike, comparing maxent feature sets (L, LQ, LQP, T) with GARP.]
30 AUC Comparisons
[Figure: AUC comparisons of maxent LQP, maxent T, and GARP.]
31 Classification Accuracy
- maxent threshold chosen to predict the same size habitat area as GARP (as sketched after this slide)
- then compute the omission rate (fraction of test localities omitted)
[Figure: omission rates for maxent LQP, maxent T, and GARP.]
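A sketch of that thresholding procedure under assumptions: `maxent_scores` ranks all localities, `target_area` is the number of localities GARP predicts as habitat, and the omission rate is the fraction of test localities scoring below the matched threshold. Names and layout are illustrative.

```python
import numpy as np

def omission_rate(maxent_scores, test_idx, target_area):
    """Threshold maxent to predict exactly target_area localities as
    present, then report the fraction of test localities omitted."""
    # threshold = score of the last locality still called "present"
    thresh = np.sort(maxent_scores)[::-1][target_area - 1]
    omitted = maxent_scores[test_idx] < thresh
    return float(omitted.mean())
```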
32 How Many Examples Are Enough? (AUC)
[Figure: area under the ROC curve (AUC) versus number of training examples (m) for six species (Gray Vireo, Hutton's Vireo, Plumbeous Vireo, Blue-headed Vireo, Yellow-throated Vireo, Loggerhead Shrike), comparing LQP (β=0.1), T (β=1.0), LQ (β=0.1), L (β=0.1), and GARP with all available training data.]
33 How Many Examples Are Enough? (log loss)
[Figure: log loss on test examples versus number of training examples (m) for the same six species, comparing LQP (β=0.1), LQ (β=0.1), T (β=1.0), and L (β=0.1).]
34 Sensitivity to Regularization Parameter (AUC)
[Figure: AUC versus regularization value (β) for Hutton's Vireo, Blue-headed Vireo, and Yellow-throated Vireo, with threshold features (top) and linear, quadratic, and product features (bottom), at sample sizes m = 10, 32, 100, 316.]
35 Sensitivity to Regularization Parameter (log loss)
[Figure: log loss on test examples versus regularization value (β) for the same three species, with threshold features and with linear, quadratic, and product features, at m = 10, 32, 100, 316.]
36 Interpreting Maxent Models
- recall $q_\lambda(x) \propto \exp\big( \sum_j \lambda_j f_j(x) \big)$
- for each environmental variable, we can plot its total contribution to the exponent
[Figure: additive contribution to the exponent versus the value of each environmental variable (aspect, elevation, slope, precipitation, average temperature, number of wet days), for threshold features (β=1.0 and β=0.01) and linear+quadratic features (β=0.1).]
37 Case Study: Microryzomys minutus
38 Case Study (cont.)
- accurately captures the realized range
- did not predict other wet montane forest areas where the species could live but does not
- examined maxent's predictions at six of these areas (chosen by a biologist)
- found all had characteristics well outside the typical range of the actual occurrence localities
- e.g., four sites had July precipitation at least 5 standard deviations above the mean
39 Summary
- maxent provides a clean and effective fit to the habitat-modeling problem
- works with positive-only examples
- seems to perform well with a limited number of examples
- provides a smooth gradation from most to least habitable
- theoretical guarantees indicate it can be used even with a very large number of features
- easy for a human expert to interpret
40 Some of the Challenges
- sample bias
- geographic uncertainty
- historical data collected while the landscape was changing
- how to check the accuracy of a model
- how to produce hard (presence or absence) predictions
- convergence rates for iterative algorithms
- analogous performance bounds for conditional maxent