# Introduction to Boosting and Joint Boosting

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Introduction to Boosting and Learning Systems Group, Caltech 2005/04/26, Presentation in EE150 Boosting and

2 Outline Introduction to Boosting 1 Introduction to Boosting Intuition of Boosting Adaptive Boosting (AdaBoost) 2 Boosting and

3 Apple Recognition Problem Intuition of Boosting Adaptive Boosting (AdaBoost) Is this a picture of an apple? We want to teach a class of 6 year olds. Gather photos from NY Apple Asso. and Google Image. Boosting and

4 Our Fruit Class Begins Intuition of Boosting Adaptive Boosting (AdaBoost) Teacher: How would you describe an apple? Michael? Michael: I think apples are circular. (Class): Apples are circular. Boosting and

5 Our Fruit Class Continues Intuition of Boosting Adaptive Boosting (AdaBoost) Teacher: Being circular is a good feature for the apples. However, if you only say circular, you could make several mistakes. What else can we say for an apple? Tina? Tina: It looks like apples are red. (Class): Apples are somewhat circular and somewhat red. Boosting and

6 Our Fruit Class Continues Intuition of Boosting Adaptive Boosting (AdaBoost) Teacher: Yes. Many apples are red. However, you could still make mistakes based on circular and red. Do you have any other suggestions, Joey? Joey: Apples could also be green. (Class): Apples are somewhat circular and somewhat red and possibly green. Boosting and

7 Our Fruit Class Continues Intuition of Boosting Adaptive Boosting (AdaBoost) Teacher: Yes. It seems that apples might be circular, red, green. But you may confuse them with tomatoes or peaches, right? Any more suggestions, Jessica? Jessica: Apples have stems at the top. (Class): Apples are somewhat circular, somewhat red, possibly green, and may have stems at the top. Boosting and

8 Put Intuition to Practice Intuition of Boosting Adaptive Boosting (AdaBoost) Intuition Combine simple rules to approximate complex function. Emphasize incorrect data to focus on valuable information. AdaBoost Algorithm (Freund and Schapire 1997) Input: training data Z = (x i, y i ) N i=1. For t = 1, 2,, T, Learn a simple rule h t from emphasized training data. Get the confidence w t of such rule Emphasize the training data that do not agree with h t. Output: combined function H(x) = T t=1 w th t (x) with normalized w. Boosting and

9 Some More Details Introduction to Boosting Intuition of Boosting Adaptive Boosting (AdaBoost) AdaBoost Algorithm Input: training data Z = (x i, y i ) N i=1. For t = 1, 2,, T, Learn a simple rule h t from emphasized training data. How? Choose a h t H with minimum emphasized error. For example, H could be a set of decision stumps h θ,d,s (x) = s I[(x) d > θ]. Get the confidence w t of such rule How? An h t with lower error should get higher w t. Emphasize the training data that do not agree with h t. Output: combined function H(x) = T t=1 w th t (x) with normalized w. Let s see some demos. Boosting and

10 Why Boosting Works? Intuition of Boosting Adaptive Boosting (AdaBoost) Our intuition is correct. Provably, if each h t is better than a random guess (has error < 1/2), the combined function H(x) could make no error at all! Besides, boosting obtains large y i H(x i ) value on each data: H(x) could separate the data as clearly as possible. Boosting and

11 Multi-Class Boosting () Not very different from binary boosting. Input: training data Z = (x i, y i ) N i=1. For t = 1, 2,, T, For c = 1, 2,, C Learn a rule alone with confidence h t(x, c) from emphasized training data. Emphasize the training data that do not agree with h c,t. Output: combined function H(x, c) = T t=1 h t(x, c). Separate each class with the rest independently. Boosting and

12 Problem of Number of rules for good performance: O(C). For a budget of M rules, can only use M/C rules per class. For example, for fruits, many of the M rules (for apple, orange, tomato, etc.) would be it is circular. : waste of budget. The rules separate each class clearly: not contain mutual information between classes. For example, if we separate apples with other fruits, we have no idea that apples and tomatoes look similar. : each class resides in its own, budget-wasting rules. Boosting and

13 Introduction to Boosting Try to have joint rules. Input: training data Z = (x i, y i ) N i=1. For t = 1, 2,, T, For S {1, 2,, C} Learn a rule alone with confidence h t(x, S) using the classes in S combined together. Pick the rule h t (x, S t ) that achieves the best overall criteria. Emphasize the training data that do not agree with h t (x, S t ). Output: combined function H(x, c) = c S t h t (x, S t ). Separate a cluster of class jointly with the rest. Boosting and

14 Pros of A rule from a cluster of classes: meaningful and often stable. Number of rules for good performance: O(log C). Use the budget efficiently. Boosting and

15 Cons of The algorithm is very slow: S {1, 2,, C} is a loop of size 2 C. Replace the loop by a greedy search. Add the best single class to the cluster. Greedily combine a class to the cluster. Trace O(C 2 ) subsets instead of O(2 C ). Still slow in general, but could speed up when H is simple. For example, the regression stumps ai[(x) d > θ] + b. Boosting and

16 Experiment Framework Goal: detect 21 objects (13 indoor, 6 outdoor, 2 both) in the picture. Boosting and

17 Experiment Framework (Cont d) Extract feature with the following steps Scale the image by σ. Filter (by normalized correlation) with a patch g f. Mark the region to average response by a mask w f. Take the p-norm of average response in the region. Patches: small parts of the known objects randomly generated Example: a feature for the stem of an apple would be a patch (matched filter to stem) with mask at the top portion. Boosting and

18 Introduction to Boosting Experiment Results Similarity between combined classes (head and trash can). Boosting and

19 Experiment Results (Cont d) Save budgets for rules. Boosting and

20 Experiment Results (Cont d) Save needed data. Boosting and

21 Introduction to Boosting Experiment Results (Cont d) Simple rules are shared by more classes. Boosting and

22 Application: Multiview detection Multiview detection: usually consider each view as a class. Independent boosting: cannot allow too many classes (views). Views often share similar rules: joint boosting benefits. Boosting and

23 Result: Multiview detection Less false alarms in detection. Boosting and

24 Result: Multiview detection (Cont d) Significantly better ROC. Boosting and

25 Introduction to Boosting Boosting: reweight examples and combine rules. Independent boosting: separate each class with the rest independently. Joint boosting: find best joint cluster to separate with the rest. More complex algorithm. More meaningful and robust classifiers. Utility of joint boosting: When some of the classes share common rules: e.g. fruits. In multiview object detection: e.g. views of cars. Boosting and

### AdaBoost. Lecturer: Authors: Center for Machine Perception Czech Technical University, Prague

AdaBoost Lecturer: Jan Šochman Authors: Jan Šochman, Jiří Matas Center for Machine Perception Czech Technical University, Prague http://cmp.felk.cvut.cz Motivation Presentation 2/17 AdaBoost with trees

### Learning theory. Ensemble methods. Boosting. Boosting: history

Learning theory Probability distribution P over X {0, 1}; let (X, Y ) P. We get S := {(x i, y i )} n i=1, an iid sample from P. Ensemble methods Goal: Fix ɛ, δ (0, 1). With probability at least 1 δ (over

### Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts

Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 1 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification

### The AdaBoost algorithm =1/n for i =1,...,n 1) At the m th iteration we find (any) classifier h(x; ˆθ m ) for which the weighted classification error m

) Set W () i The AdaBoost algorithm =1/n for i =1,...,n 1) At the m th iteration we find (any) classifier h(x; ˆθ m ) for which the weighted classification error m m =.5 1 n W (m 1) i y i h(x i ; 2 ˆθ

### Boosting: Foundations and Algorithms. Rob Schapire

Boosting: Foundations and Algorithms Rob Schapire Example: Spam Filtering problem: filter out spam (junk email) gather large collection of examples of spam and non-spam: From: yoav@ucsd.edu Rob, can you

### A Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007

Decision Trees, cont. Boosting Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 1 st, 2007 1 A Decision Stump 2 1 The final tree 3 Basic Decision Tree Building Summarized

### A Gentle Introduction to Gradient Boosting. Cheng Li College of Computer and Information Science Northeastern University

A Gentle Introduction to Gradient Boosting Cheng Li chengli@ccs.neu.edu College of Computer and Information Science Northeastern University Gradient Boosting a powerful machine learning algorithm it can

### Classification and Regression Trees

Classification and Regression Trees Ryan P Adams So far, we have primarily examined linear classifiers and regressors, and considered several different ways to train them When we ve found the linearity

### Classification: The rest of the story

U NIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN CS598 Machine Learning for Signal Processing Classification: The rest of the story 3 October 2017 Today s lecture Important things we haven t covered yet Fisher

### Boos\$ng Can we make dumb learners smart?

Boos\$ng Can we make dumb learners smart? Aarti Singh Machine Learning 10-601 Nov 29, 2011 Slides Courtesy: Carlos Guestrin, Freund & Schapire 1 Why boost weak learners? Goal: Automa'cally categorize type

### Does Unlabeled Data Help?

Does Unlabeled Data Help? Worst-case Analysis of the Sample Complexity of Semi-supervised Learning. Ben-David, Lu and Pal; COLT, 2008. Presentation by Ashish Rastogi Courant Machine Learning Seminar. Outline

### Linear Classifiers. Michael Collins. January 18, 2012

Linear Classifiers Michael Collins January 18, 2012 Today s Lecture Binary classification problems Linear classifiers The perceptron algorithm Classification Problems: An Example Goal: build a system that

### Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1

Introduction to Machine Learning Introduction to ML - TAU 2016/7 1 Course Administration Lecturers: Amir Globerson (gamir@post.tau.ac.il) Yishay Mansour (Mansour@tau.ac.il) Teaching Assistance: Regev Schweiger

### PCA FACE RECOGNITION

PCA FACE RECOGNITION The slides are from several sources through James Hays (Brown); Srinivasa Narasimhan (CMU); Silvio Savarese (U. of Michigan); Shree Nayar (Columbia) including their own slides. Goal

### Least Squares Classification

Least Squares Classification Stephen Boyd EE103 Stanford University November 4, 2017 Outline Classification Least squares classification Multi-class classifiers Classification 2 Classification data fitting

### Midterm Exam Solutions, Spring 2007

1-71 Midterm Exam Solutions, Spring 7 1. Personal info: Name: Andrew account: E-mail address:. There should be 16 numbered pages in this exam (including this cover sheet). 3. You can use any material you

### Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Decision Trees. Tobias Scheffer

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Decision Trees Tobias Scheffer Decision Trees One of many applications: credit risk Employed longer than 3 months Positive credit

### Analysis of the Performance of AdaBoost.M2 for the Simulated Digit-Recognition-Example

Analysis of the Performance of AdaBoost.M2 for the Simulated Digit-Recognition-Example Günther Eibl and Karl Peter Pfeiffer Institute of Biostatistics, Innsbruck, Austria guenther.eibl@uibk.ac.at Abstract.

### MIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE

MIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE March 28, 2012 The exam is closed book. You are allowed a double sided one page cheat sheet. Answer the questions in the spaces provided on

### Chapter 18. Decision Trees and Ensemble Learning. Recall: Learning Decision Trees

CSE 473 Chapter 18 Decision Trees and Ensemble Learning Recall: Learning Decision Trees Example: When should I wait for a table at a restaurant? Attributes (features) relevant to Wait? decision: 1. Alternate:

### Kernel Density Topic Models: Visual Topics Without Visual Words

Kernel Density Topic Models: Visual Topics Without Visual Words Konstantinos Rematas K.U. Leuven ESAT-iMinds krematas@esat.kuleuven.be Mario Fritz Max Planck Institute for Informatics mfrtiz@mpi-inf.mpg.de

### Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26

Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar

### ECE521 Lecture7. Logistic Regression

ECE521 Lecture7 Logistic Regression Outline Review of decision theory Logistic regression A single neuron Multi-class classification 2 Outline Decision theory is conceptually easy and computationally hard

### ECE 5984: Introduction to Machine Learning

ECE 5984: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting Readings: Murphy 16.4; Hastie 16 Dhruv Batra Virginia Tech Administrativia HW3 Due: April 14, 11:55pm You will implement

### Linear Classifiers and the Perceptron

Linear Classifiers and the Perceptron William Cohen February 4, 2008 1 Linear classifiers Let s assume that every instance is an n-dimensional vector of real numbers x R n, and there are only two possible

### Boosting. Ryan Tibshirani Data Mining: / April Optional reading: ISL 8.2, ESL , 10.7, 10.13

Boosting Ryan Tibshirani Data Mining: 36-462/36-662 April 25 2013 Optional reading: ISL 8.2, ESL 10.1 10.4, 10.7, 10.13 1 Reminder: classification trees Suppose that we are given training data (x i, y

### Lecture 2. Judging the Performance of Classifiers. Nitin R. Patel

Lecture 2 Judging the Performance of Classifiers Nitin R. Patel 1 In this note we will examine the question of how to udge the usefulness of a classifier and how to compare different classifiers. Not only

### Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

### Machine Learning (CS 567) Lecture 2

Machine Learning (CS 567) Lecture 2 Time: T-Th 5:00pm - 6:20pm Location: GFS118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol

### Machine Learning. VC Dimension and Model Complexity. Eric Xing , Fall 2015

Machine Learning 10-701, Fall 2015 VC Dimension and Model Complexity Eric Xing Lecture 16, November 3, 2015 Reading: Chap. 7 T.M book, and outline material Eric Xing @ CMU, 2006-2015 1 Last time: PAC and

### CS 231A Section 1: Linear Algebra & Probability Review

CS 231A Section 1: Linear Algebra & Probability Review 1 Topics Support Vector Machines Boosting Viola-Jones face detector Linear Algebra Review Notation Operations & Properties Matrix Calculus Probability

### 1 Handling of Continuous Attributes in C4.5. Algorithm

.. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Classification/Supervised Learning Potpourri Contents 1. C4.5. and continuous attributes: incorporating continuous

### Information Gain, Decision Trees and Boosting ML recitation 9 Feb 2006 by Jure

Information Gain, Decision Trees and Boosting 10-701 ML recitation 9 Feb 2006 by Jure Entropy and Information Grain Entropy & Bits You are watching a set of independent random sample of X X has 4 possible

### TDT 4173 Machine Learning and Case Based Reasoning. Helge Langseth og Agnar Aamodt. NTNU IDI Seksjon for intelligente systemer

TDT 4173 Machine Learning and Case Based Reasoning Lecture 6 Support Vector Machines. Ensemble Methods Helge Langseth og Agnar Aamodt NTNU IDI Seksjon for intelligente systemer Outline 1 Wrap-up from last

Optimization and Gradient Descent INFO-4604, Applied Machine Learning University of Colorado Boulder September 12, 2017 Prof. Michael Paul Prediction Functions Remember: a prediction function is the function

### Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

### Midterm, Fall 2003

5-78 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more time-consuming and is worth only three points, so do not attempt 9 unless you are

### Useful Equations for solving SVM questions

Support Vector Machines (send corrections to yks at csail.mit.edu) In SVMs we are trying to find a decision boundary that maximizes the "margin" or the "width of the road" separating the positives from

### Monte Carlo Theory as an Explanation of Bagging and Boosting

Monte Carlo Theory as an Explanation of Bagging and Boosting Roberto Esposito Università di Torino C.so Svizzera 185, Torino, Italy esposito@di.unito.it Lorenza Saitta Università del Piemonte Orientale

### IMBALANCED DATA. Phishing. Admin 9/30/13. Assignment 3: - how did it go? - do the experiments help? Assignment 4. Course feedback

9/3/3 Admin Assignment 3: - how did it go? - do the experiments help? Assignment 4 IMBALANCED DATA Course feedback David Kauchak CS 45 Fall 3 Phishing 9/3/3 Setup Imbalanced data. for hour, google collects

### Midterm: CS 6375 Spring 2015 Solutions

Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an

### the tree till a class assignment is reached

Decision Trees Decision Tree for Playing Tennis Prediction is done by sending the example down Prediction is done by sending the example down the tree till a class assignment is reached Definitions Internal

### Classification Ensemble That Maximizes the Area Under Receiver Operating Characteristic Curve (AUC)

Classification Ensemble That Maximizes the Area Under Receiver Operating Characteristic Curve (AUC) Eunsik Park 1 and Y-c Ivan Chang 2 1 Chonnam National University, Gwangju, Korea 2 Academia Sinica, Taipei,

### Machine Learning and Data Mining. Decision Trees. Prof. Alexander Ihler

+ Machine Learning and Data Mining Decision Trees Prof. Alexander Ihler Decision trees Func-onal form f(x;µ): nested if-then-else statements Discrete features: fully expressive (any func-on) Structure:

### Hybrid Machine Learning Algorithms

Hybrid Machine Learning Algorithms Umar Syed Princeton University Includes joint work with: Rob Schapire (Princeton) Nina Mishra, Alex Slivkins (Microsoft) Common Approaches to Machine Learning!! Supervised

### Learning Decision Trees

Learning Decision Trees CS194-10 Fall 2011 Lecture 8 CS194-10 Fall 2011 Lecture 8 1 Outline Decision tree models Tree construction Tree pruning Continuous input features CS194-10 Fall 2011 Lecture 8 2

### Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

### Midterm. You may use a calculator, but not any device that can access the Internet or store large amounts of data.

INST 737 April 1, 2013 Midterm Name: }{{} by writing my name I swear by the honor code Read all of the following information before starting the exam: For free response questions, show all work, clearly

### CS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning

CS 446 Machine Learning Fall 206 Nov 0, 206 Bayesian Learning Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview Bayesian Learning Naive Bayes Logistic Regression Bayesian Learning So far, we

### An Information-Theoretic Measure of Intrusion Detection Capability

An Information-Theoretic Measure of Intrusion Detection Capability Guofei Gu, Prahlad Fogla, David Dagon, Wenke Lee College of Computing, Georgia Institute of Technology, Atlanta GA 3332 {guofei,prahlad,dagon,wenke}@cc.gatech.edu

### Information Theory, Statistics, and Decision Trees

Information Theory, Statistics, and Decision Trees Léon Bottou COS 424 4/6/2010 Summary 1. Basic information theory. 2. Decision trees. 3. Information theory and statistics. Léon Bottou 2/31 COS 424 4/6/2010

### 2D Image Processing Face Detection and Recognition

2D Image Processing Face Detection and Recognition Prof. Didier Stricker Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de

### Generative v. Discriminative classifiers Intuition

Logistic Regression (Continued) Generative v. Discriminative Decision rees Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University January 31 st, 2007 2005-2007 Carlos Guestrin 1 Generative

### ABC-Boost: Adaptive Base Class Boost for Multi-class Classification

ABC-Boost: Adaptive Base Class Boost for Multi-class Classification Ping Li Department of Statistical Science, Cornell University, Ithaca, NY 14853 USA pingli@cornell.edu Abstract We propose -boost (adaptive

### Machine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang

Machine Learning Basics Lecture 2: Linear Classification Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d.

### 1 Using standard errors when comparing estimated values

MLPR Assignment Part : General comments Below are comments on some recurring issues I came across when marking the second part of the assignment, which I thought it would help to explain in more detail

### Learning from Examples

Learning from Examples Data fitting Decision trees Cross validation Computational learning theory Linear classifiers Neural networks Nonparametric methods: nearest neighbor Support vector machines Ensemble

### Mining Compositional Features for Boosting

Mining Compositional Features for Boosting Junsong Yuan EECS Dept., Northwestern Univ. Evanston, IL, USA j-yuan@northwestern.edu Jiebo Luo Kodak Research Labs Rochester, NY, USA jiebo.luo@kodak.com Ying

### Machine Learning Techniques

Machine Learning Techniques ( 機器學習技法 ) Lecture 15: Matrix Factorization Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University

### Regularization. CSCE 970 Lecture 3: Regularization. Stephen Scott and Vinod Variyam. Introduction. Outline

Other Measures 1 / 52 sscott@cse.unl.edu learning can generally be distilled to an optimization problem Choose a classifier (function, hypothesis) from a set of functions that minimizes an objective function

### Fonts and alphabets licensed from

Fonts and alphabets licensed from www.letteringdelights.com Pumpkins are a great way to teach and reinforce both Language Arts and Science standards. Hands on experiences are some of the best learning

### CS 231A Section 1: Linear Algebra & Probability Review. Kevin Tang

CS 231A Section 1: Linear Algebra & Probability Review Kevin Tang Kevin Tang Section 1-1 9/30/2011 Topics Support Vector Machines Boosting Viola Jones face detector Linear Algebra Review Notation Operations

### 15-388/688 - Practical Data Science: Nonlinear modeling, cross-validation, regularization, and evaluation

15-388/688 - Practical Data Science: Nonlinear modeling, cross-validation, regularization, and evaluation J. Zico Kolter Carnegie Mellon University Fall 2016 1 Outline Example: return to peak demand prediction

### Introduction to Machine Learning

Introduction to Machine Learning Concept Learning Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1 /

### Last Time. Today. Bayesian Learning. The Distributions We Love. CSE 446 Gaussian Naïve Bayes & Logistic Regression

CSE 446 Gaussian Naïve Bayes & Logistic Regression Winter 22 Dan Weld Learning Gaussians Naïve Bayes Last Time Gaussians Naïve Bayes Logistic Regression Today Some slides from Carlos Guestrin, Luke Zettlemoyer

### Distributed Machine Learning. Maria-Florina Balcan Carnegie Mellon University

Distributed Machine Learning Maria-Florina Balcan Carnegie Mellon University Distributed Machine Learning Modern applications: massive amounts of data distributed across multiple locations. Distributed

### Review of Lecture 1. Across records. Within records. Classification, Clustering, Outlier detection. Associations

Review of Lecture 1 This course is about finding novel actionable patterns in data. We can divide data mining algorithms (and the patterns they find) into five groups Across records Classification, Clustering,

### Least Mean Squares Regression

Least Mean Squares Regression Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 Lecture Overview Linear classifiers What functions do linear classifiers express? Least Squares Method

### Introduction and Overview

1 Introduction and Overview How is it that a committee of blockheads can somehow arrive at highly reasoned decisions, despite the weak judgment of the individual members? How can the shaky separate views

### An Introduction to Boosting and Leveraging

An Introduction to Boosting and Leveraging Ron Meir 1 and Gunnar Rätsch 2 1 Department of Electrical Engineering, Technion, Haifa 32000, Israel rmeir@ee.technion.ac.il, http://www-ee.technion.ac.il/ rmeir

### Machine Learning Lecture 7

Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant

### Numerical Learning Algorithms

Numerical Learning Algorithms Example SVM for Separable Examples.......................... Example SVM for Nonseparable Examples....................... 4 Example Gaussian Kernel SVM...............................

### CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring 2011 Lecture 18: HMMs and Particle Filtering 4/4/2011 Pieter Abbeel --- UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore

### CS229 Supplemental Lecture notes

CS229 Supplemental Lecture notes John Duchi Binary classification In binary classification problems, the target y can take on at only two values. In this set of notes, we show how to model this problem

### CS 6901 (Applied Algorithms) Lecture 3

CS 6901 (Applied Algorithms) Lecture 3 Antonina Kolokolova September 16, 2014 1 Representative problems: brief overview In this lecture we will look at several problems which, although look somewhat similar

### NON-TRIVIAL APPLICATIONS OF BOOSTING:

NON-TRIVIAL APPLICATIONS OF BOOSTING: REWEIGHTING DISTRIBUTIONS WITH BDT, BOOSTING TO UNIFORMITY Alex Rogozhnikov YSDA, NRU HSE Heavy Flavour Data Mining workshop, Zurich, 2016 1 BOOSTING a family of machine

### Loop Invariants and Binary Search. Chapter 4.4, 5.1

Loop Invariants and Binary Search Chapter 4.4, 5.1 Outline Iterative Algorithms, Assertions and Proofs of Correctness Binary Search: A Case Study Outline Iterative Algorithms, Assertions and Proofs of

### Adaptive Boosting of Neural Networks for Character Recognition

Adaptive Boosting of Neural Networks for Character Recognition Holger Schwenk Yoshua Bengio Dept. Informatique et Recherche Opérationnelle Université de Montréal, Montreal, Qc H3C-3J7, Canada fschwenk,bengioyg@iro.umontreal.ca

### Multivariate Analysis Techniques in HEP

Multivariate Analysis Techniques in HEP Jan Therhaag IKTP Institutsseminar, Dresden, January 31 st 2013 Multivariate analysis in a nutshell Neural networks: Defeating the black box Boosted Decision Trees:

### Tutorial 2: Power and Sample Size for the Paired Sample t-test

Tutorial 2: Power and Sample Size for the Paired Sample t-test Preface Power is the probability that a study will reject the null hypothesis. The estimated probability is a function of sample size, variability,

### Importance Sampling: An Alternative View of Ensemble Learning. Jerome H. Friedman Bogdan Popescu Stanford University

Importance Sampling: An Alternative View of Ensemble Learning Jerome H. Friedman Bogdan Popescu Stanford University 1 PREDICTIVE LEARNING Given data: {z i } N 1 = {y i, x i } N 1 q(z) y = output or response

### Smart Home Health Analytics Information Systems University of Maryland Baltimore County

Smart Home Health Analytics Information Systems University of Maryland Baltimore County 1 IEEE Expert, October 1996 2 Given sample S from all possible examples D Learner L learns hypothesis h based on

### Atutorialonactivelearning

Atutorialonactivelearning Sanjoy Dasgupta 1 John Langford 2 UC San Diego 1 Yahoo Labs 2 Exploiting unlabeled data Alotofunlabeleddataisplentifulandcheap,eg. documents off the web speech samples images

### Statistical Machine Learning from Data

Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne

### Decision Tree Learning

Decision Tree Learning Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University References: 1. Machine Learning, Chapter 3 2. Data Mining: Concepts, Models,

### Active learning. Sanjoy Dasgupta. University of California, San Diego

Active learning Sanjoy Dasgupta University of California, San Diego Exploiting unlabeled data A lot of unlabeled data is plentiful and cheap, eg. documents off the web speech samples images and video But

### Introduction to Logistic Regression

Introduction to Logistic Regression Guy Lebanon Binary Classification Binary classification is the most basic task in machine learning, and yet the most frequent. Binary classifiers often serve as the

### Random Classification Noise Defeats All Convex Potential Boosters

Philip M. Long Google Rocco A. Servedio Computer Science Department, Columbia University Abstract A broad class of boosting algorithms can be interpreted as performing coordinate-wise gradient descent

### Final Exam, Machine Learning, Spring 2009

Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3

### Regression I: Mean Squared Error and Measuring Quality of Fit

Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving

### Conditional densities, mass functions, and expectations

Conditional densities, mass functions, and expectations Jason Swanson April 22, 27 1 Discrete random variables Suppose that X is a discrete random variable with range {x 1, x 2, x 3,...}, and that Y is

### Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)

Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation

### From inductive inference to machine learning

From inductive inference to machine learning ADAPTED FROM AIMA SLIDES Russel&Norvig:Artificial Intelligence: a modern approach AIMA: Inductive inference AIMA: Inductive inference 1 Outline Bayesian inferences

### Fuzzy Systems. Introduction

Fuzzy Systems Introduction Prof. Dr. Rudolf Kruse Christian Moewes {kruse,cmoewes}@iws.cs.uni-magdeburg.de Otto-von-Guericke University of Magdeburg Faculty of Computer Science Department of Knowledge

### Problems, and How Computer Scientists Solve Them Manas Thakur

Problems, and How Computer Scientists Solve Them PACE Lab, IIT Madras Content Credits Introduction to Automata Theory, Languages, and Computation, 3rd edition. Hopcroft et al. Introduction to the Theory

### Infinite Ensemble Learning with Support Vector Machinery

Infinite Ensemble Learning with Support Vector Machinery Hsuan-Tien Lin and Ling Li Learning Systems Group, California Institute of Technology ECML/PKDD, October 4, 2005 H.-T. Lin and L. Li (Learning Systems

### A brief review of basics of probabilities

brief review of basics of probabilities Milos Hauskrecht milos@pitt.edu 5329 Sennott Square robability theory Studies and describes random processes and their outcomes Random processes may result in multiple

### Machine Learning

Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 13, 2011 Today: The Big Picture Overfitting Review: probability Readings: Decision trees, overfiting