Machine Learning. Ilya Narsky, Caltech


1 Machine Learning. Ilya Narsky, Caltech

2 Lecture 4. Multi-class problems. Multi-class versions of Neural Networks, Decision Trees, Support Vector Machines and AdaBoost. Reduction of a multi-class problem to a set of binary problems.

3 Multi-class problems. Various ad-hoc strategies can be used to reduce a multi-class problem to a set of two-class problems: One against One, One against All. This approach only works if one class clearly dominates. Not always the case. Example: the one-vs-one strategy for 3 classes, where a red arrow stands for "more likely than". [figure: pairwise "more likely than" relations among classes 1, 2 and 3] A unified framework for reducing multi-class problems to binary: Allwein, Schapire and Singer, J. of Machine Learning Research 1 (2000). Alternatively, use a multi-class version of your favorite classifier.
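A minimal Python sketch of the one-against-all strategy, assuming a hypothetical make_learner factory that returns any binary classifier with fit(X, t) and score(X) methods:

```python
import numpy as np

def one_vs_all_train(X, y, classes, make_learner):
    """Train one binary classifier per class: class k against everything else."""
    learners = {}
    for k in classes:
        t = np.where(y == k, 1.0, -1.0)  # relabel events: class k vs the rest
        clf = make_learner()             # hypothetical binary-learner factory
        clf.fit(X, t)
        learners[k] = clf
    return learners

def one_vs_all_classify(X, learners):
    """Assign each event to the class whose binary classifier scores highest."""
    keys = sorted(learners)
    scores = np.column_stack([learners[k].score(X) for k in keys])
    return [keys[i] for i in np.argmax(scores, axis=1)]
```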

4 Multi-class Neural Network. Extension to the multi-class case comes naturally: N(nodes in output layer) = n(classes), but we need only one node for n = 2. The k-th class is encoded as y_k = (0,...,0,1,0,...,0) (1 in the k-th position). The output of a neural net is a vector {f_k(x)}, k = 1,...,n, with 0 <= f_k(x) <= 1. An event is classified to category k if f_k(x) is largest.
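In code, the encoding and the classification rule amount to one-hot targets and an arg-max over the network outputs (a minimal numpy sketch):

```python
import numpy as np

def encode_one_hot(labels, n_classes):
    """k-th class -> unit vector with 1 in the k-th position (0-based labels)."""
    Y = np.zeros((len(labels), n_classes))
    Y[np.arange(len(labels)), labels] = 1.0
    return Y

def classify(net_outputs):
    """net_outputs: (n_events, n_classes) array of f_k(x) values in [0, 1].
    Each event goes to the category with the largest output."""
    return np.argmax(net_outputs, axis=1)

print(encode_one_hot([0, 2], 3))  # [[1,0,0], [0,0,1]]
```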

5 Decision trees. n = 2: split to maximize the decrease in the Gini index 1 - p^2 - q^2, or in the cross-entropy -p*log(p) - q*log(q), with p + q = 1. Any n: Gini index 1 - sum_k p_k^2, or cross-entropy -sum_k p_k*log(p_k), subject to sum_k p_k = 1. Example with 3 classes: a node with A, B and C events per class splits into daughters with A_1, B_1, C_1 and A_2, B_2, C_2 events; the class fractions are p_A = A/(A + B + C), and likewise for B and C. Note that this has nothing to do with binary vs multi-way splits for tree construction. One can use binary splits for multi-class problems and multi-way splits for binary problems. Are multi-way splits a good idea? Depends on who you ask.
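A minimal Python sketch of the two impurity measures for one node's class counts:

```python
import numpy as np

def gini(counts):
    """Gini index 1 - sum_k p_k^2 for the class counts in one tree node."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    return 1.0 - np.sum(p ** 2)

def cross_entropy(counts):
    """Cross-entropy -sum_k p_k log p_k; zero-count classes contribute nothing."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Example: a node with A = 50, B = 30, C = 20 events
print(gini([50, 30, 20]))           # 0.62
print(cross_entropy([50, 30, 20]))  # ~1.03
```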

6 Penalize some misclassifications more than others. L_ij = penalty for misclassifying class i as class j; L_kk = 0. Regular Gini index: 1 - sum_k p_k^2 = sum_{i != j} p_i p_j. Modified Gini index: sum_{i,j} L_ij p_i p_j. In HEP analysis, we don't play this game. (Although it would be interesting to try one day!) But unequal misclassification penalties are often used in the statistics literature.
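A sketch of the modified criterion; the penalty matrix L_harsh below is hypothetical, chosen only to show that equal off-diagonal penalties recover the regular Gini index:

```python
import numpy as np

def modified_gini(counts, L):
    """sum_{i,j} L_ij p_i p_j; with L_kk = 0 and all off-diagonal L_ij = 1
    this reduces to the regular Gini index 1 - sum_k p_k^2."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    return float(p @ L @ p)

L_equal = np.ones((3, 3)) - np.eye(3)   # all misclassifications cost the same
L_harsh = np.array([[0., 1., 5.],       # hypothetical: class 0 <-> class 2
                    [1., 0., 1.],       # confusion is penalized 5x
                    [5., 1., 0.]])
print(modified_gini([50, 30, 20], L_equal))  # 0.62, the regular Gini index
print(modified_gini([50, 30, 20], L_harsh))  # larger: heavy 0<->2 penalty
```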

7 Support Vector Machines. n = 2: f(x) = w_0 + w^T h(x), with y = +1 for signal and y = -1 for background; minimize over w: sum_{n=1}^N [1 - y_n f(x_n)]_+ + lambda*||w||^2. Lee, Lin and Wahba, Multicategory Support Vector Machines, J. of the American Statistical Association 99 (2004). Any n: the k-th class is encoded as y = (0,...,0,1,0,...,0) (1 in the k-th position); f(x) = {f_k(x)} with f_k(x) = w_k0 + w_k^T h_k(x). L(y) is a penalty function: L(y) = (1,...,1,0,1,...,1) (0 in the k-th position) if y belongs to class k. Minimize: sum_{n=1}^N L(y_n) . [f(x_n) - y_n]_+ + lambda * sum_k ||h_k||^2.

8 Multi-class SVM (continued). Minimize over h: sum_{n=1}^N L(y_n) . [f(x_n) - y_n]_+ + lambda * sum_k ||h_k||^2. If y_n is from class k: f_k(x_n) can be large, because the k-th component of L(y_n) is 0 and so does not contribute to the minimized loss; f_i(x_n) for i != k must be as small as possible, otherwise f_i(x_n) > y_n^(i) and the loss increases. One cannot make f_k(x_n) arbitrarily large and f_i(x_n) arbitrarily small because of the penalty term, which enforces smoothness of the solution. After training is completed, an event is classified to category k if f_k(x) is largest among {f_k(x)}.
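A minimal numpy sketch that evaluates this empirical loss for a given score matrix; the penalty argument stands in for sum_k ||h_k||^2, which depends on the function class and is supplied by the caller:

```python
import numpy as np

def multiclass_svm_loss(F, labels, lam, penalty):
    """Empirical loss from the slide: sum_n L(y_n) . [F_n - y_n]_+  + lam * penalty.

    F      : (N, n_classes) matrix of scores f_k(x_n)
    labels : true class index for each event (0-based)
    penalty: stand-in number for sum_k ||h_k||^2
    """
    N, K = F.shape
    Y = np.zeros((N, K))
    Y[np.arange(N), labels] = 1.0   # target: 1 in the true-class slot
    L = 1.0 - Y                     # penalty vector: 0 in the true-class slot
    hinge = np.maximum(F - Y, 0.0)  # [f(x) - y]_+ componentwise
    return float(np.sum(L * hinge)) + lam * penalty
```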

9 AdaBoost (still 2 classes only). Freund and Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, J. of Computer and System Sciences 55 (1997). Misclassification error: eps = (1/N) sum_n I(f(x_n) != y_n), where I(z) = 1 if z is true and 0 otherwise. Iteration m: build classifier f_m(x) with weighted error eps_m = sum_n w_n^(m) I(f_m(x_n) != y_n) < 0.5, the weight of misclassified events. Correctly classified events: w_n^(m+1) = w_n^(m) / (2(1 - eps_m)); misclassified events: w_n^(m+1) = w_n^(m) / (2 eps_m); classifier weight: beta_m = log((1 - eps_m)/eps_m). What do we need to change for multiple classes? Nothing! It is the same algorithm. Except: for 2 classes, one can always build a classifier with eps < 1/2 (if you have a classifier with eps > 1/2, you can flip its output); for multiple classes it is not always possible.
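The weight update and the classifier weight in code, a direct transcription of the formulas above:

```python
import numpy as np

def adaboost_reweight(w, correct, eps):
    """One boosting iteration: scale correctly classified events by
    1/(2(1 - eps)) and misclassified ones by 1/(2 eps). If eps was the
    weighted error, the new weights again sum to one."""
    return np.where(correct, w / (2.0 * (1.0 - eps)), w / (2.0 * eps))

def classifier_weight(eps):
    """beta_m = log((1 - eps_m)/eps_m): positive whenever eps < 1/2."""
    return np.log((1.0 - eps) / eps)
```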

10 AdaBoost.M1 (multiple classes). Solution: train until eps_m >= 1/2 and then abort. This algorithm is known as AdaBoost.M1. OK, this is a solution. But can we come up with something better? (see next slide) What is the output of multi-class AdaBoost? For 2 classes: F(x) = sum_{m=1}^M beta_m f_m(x). For each class k: F^(k)(x) = sum_{m=1}^M beta_m I(f_m(x) = y^(k)). In plain English: compute the average inverse misclassification error over the built classifiers which predict class k for this point, and then choose the class which maximizes this quantity.
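A sketch of this voting rule, assuming each trained classifier is a callable that returns a class index:

```python
import numpy as np

def adaboost_m1_predict(x, classifiers, betas, n_classes):
    """Multi-class AdaBoost.M1 output: each trained classifier casts a vote
    of weight beta_m for the class it predicts; return the arg-max class."""
    votes = np.zeros(n_classes)
    for f, beta in zip(classifiers, betas):
        votes[f(x)] += beta
    return int(np.argmax(votes))
```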

11 AdaBoost.MH: soft score vs hard score. Suppose we built a decision tree for 3 classes: A, B and C. In one of the terminal nodes we have N_A, N_B and N_C training events. Suppose N_A > N_B > N_C. The contribution to the overall classification error from this node is eps = (N_B + N_C)/N (N is the overall size of the training sample). Instead of returning a hard classification label (A, B or C), we can return a soft score, for example f_A = N_A/(N_A + N_B + N_C); same for f_B and f_C. In general, we can define a function -1 < h(x,y) < 1 which represents confidence that the true class label at x is y. What are the advantages of the soft score against the hard score? It can be shown that for any dataset the classification error defined through the soft score is guaranteed to be <= 1/2. It is continuous. It more accurately reflects the amount of misclassification.
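Turning terminal-node counts into soft scores is one line per class (Python sketch):

```python
def leaf_soft_scores(counts):
    """Soft scores from terminal-node class counts, e.g. f_A = N_A/(N_A + N_B + N_C)."""
    total = float(sum(counts.values()))
    return {label: n / total for label, n in counts.items()}

print(leaf_soft_scores({"A": 7, "B": 2, "C": 1}))  # {'A': 0.7, 'B': 0.2, 'C': 0.1}
```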

12 AdaBoost.MH: yet another trick. In the two-class version of AdaBoost, we had one weight per event, a measure of how often the event was assigned to the wrong category by the weak learners. But now we have many wrong categories: w(x_1, A), w(x_1, B), w(x_1, C), ..., w(x_3, C). Example: for event 1 drawn from class A it is easy to discriminate class A from class B, but not so easy to discriminate class A from class C. Solution: introduce one weight for each class for each event: ((x_1, A), +1), ((x_1, B), -1), ((x_1, C), -1); ((x_2, A), -1), ((x_2, B), +1), ((x_2, C), -1); ((x_3, A), -1), ((x_3, B), -1), ((x_3, C), +1). We have transformed a problem with 3 events, 3 classes and D input variables into a problem with 9 events, 2 classes and D+1 input variables. Now build a weak learner and compute h(x,y), y = A, B, C, for each training x using, for example, decision tree leaf purities.
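A sketch of this relabeling: each event expands into one binary example per candidate class, with the candidate class playing the role of the extra (D+1)-th input variable:

```python
def to_binary_problem(events, classes):
    """AdaBoost.MH relabeling: each (x, true_class) event becomes one
    (x, candidate_class) pair per class, with binary target +1 if the
    candidate matches the true class and -1 otherwise."""
    expanded = []
    for x, true_class in events:
        for c in classes:
            expanded.append(((x, c), +1 if c == true_class else -1))
    return expanded

events = [((0.3, 1.2), "A"), ((0.9, 0.1), "B")]
print(to_binary_problem(events, ["A", "B", "C"]))  # 6 binary examples
```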

13 AdaBoost.MH: full algorithm. Schapire and Singer, Improved Boosting Algorithms Using Confidence-rated Predictions, Machine Learning 37 (1999). Notation: N training points, n classes; y^(k) is the class label for class k. Replace each (x_n, y_n) with n points (x_n, k, J(y_n, y^(k))), where J(y_n, y^(k)) = +1 if y_n = y^(k) and -1 otherwise. Iteration 1: w_1(n, k) = 1/(nN); n = 1,...,N; k = 1,...,n. Iteration m: build a classifier with scoring function h_m(x, y); update w_{m+1}(n, k) = w_m(n, k) exp[-beta_m J(y_n, y^(k)) h_m(x_n, y^(k))] / Z_m. AdaBoost response: f^(k)(x) = sum_{m=1}^M beta_m h_m(x, y^(k)); classify x to the k with the largest f^(k)(x). Friedman, Hastie and Tibshirani, Additive logistic regression: A statistical view of boosting, The Annals of Statistics 28 (2000): the AdaBoost.MH algorithm for an n-class problem can be effectively reduced to solving n problems, each class against the rest.
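A numpy sketch of the weight update as reconstructed above; W, J and H are N x n arrays of current weights, +/-1 sign labels and weak-learner confidences:

```python
import numpy as np

def mh_reweight(W, J, H, beta):
    """AdaBoost.MH update: w_{m+1}(n,k) = w_m(n,k) * exp(-beta * J(n,k) * H(n,k)) / Z_m,
    where Z_m normalizes the weights so they sum to one."""
    W_new = W * np.exp(-beta * J * H)
    return W_new / W_new.sum()
```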

14 What have we learned? Many binary classifiers have multi-class versions. For some classifiers (neural net, decision tree), the generalization from binary to multi-class is obvious. For others (SVM, AdaBoost), less straightforward but doable. Multi-class algorithms are the subject of ongoing research. If you switch from grad school in physics to grad school in machine learning (not that I encourage you to), you may very well invent one. But let us look at this through the eyes of a practitioner. Even if you know how to do a multi-class version of your favorite algorithm, do you have means to do so? Do you have a piece of software? Is it easy to write one? What can you do if the answer to both questions above is no?

15 Reduce a multi-class problem to a set of two-class problems using an interaction matrix: an n x C matrix for n classes and C binary classifiers. Allwein, Schapire and Singer, Reducing Multiclass to Binary: A Unifying Approach to Margin Classifiers, J. of Machine Learning Research 1 (2000). For example, for a problem with 4 classes: M_ONE-VS-ALL is a 4 x 4 matrix with +1 on the diagonal and -1 elsewhere (C = n classifiers), and M_ONE-VS-ONE is a 4 x 6 matrix in which each column has +1 for one class of a pair, -1 for the other, and 0 for the classes not in the pair (C = n(n-1)/2 classifiers).
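The two standard matrices are easy to construct (Python sketch; entries follow the Allwein-Schapire-Singer convention of +1, -1 and 0):

```python
import numpy as np
from itertools import combinations

def one_vs_all_matrix(n):
    """n x n interaction matrix: +1 on the diagonal, -1 elsewhere."""
    return 2.0 * np.eye(n) - 1.0

def one_vs_one_matrix(n):
    """n x (n(n-1)/2) matrix: each column trains one class (+1) against
    another (-1); classes not in the pair get 0 and are ignored."""
    pairs = list(combinations(range(n), 2))
    M = np.zeros((n, len(pairs)))
    for c, (i, j) in enumerate(pairs):
        M[i, c], M[j, c] = 1.0, -1.0
    return M

print(one_vs_all_matrix(4))  # 4 columns: C = n classifiers
print(one_vs_one_matrix(4))  # 6 columns: C = n(n-1)/2 classifiers
```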

16 One-vs-one, one-vs-all: what else? One can implement any non-standard strategy. For example: two signals with similar signatures, plus background. We want to separate both signals from the background and then separate them from each other. [figure: a 3 x 2 interaction matrix M with rows for the classes (background, signal 1, signal 2) and one column per binary classifier; the entries are not recoverable from the transcription]

17 How do you classify new events? For each event, you get C numbers, one for each column of the interaction matrix. Compute a user-defined loss (quadratic, exponential, etc.) for each row of the interaction matrix. For example, compute the average quadratic error E_k = (1/C) sum_{c=1}^C (M_kc - f_c(x))^2, k = 1,...,n, and assign event x to the class k which gives the minimal quadratic error E_k.
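The quadratic-error decoding in code; f is the vector of C classifier outputs for one event, M the n x C interaction matrix:

```python
import numpy as np

def classify_by_quadratic_error(f, M):
    """Compute E_k = (1/C) sum_c (M_kc - f_c)^2 for every row (class) of the
    interaction matrix and return the class with the smallest error."""
    E = np.mean((M - f) ** 2, axis=1)
    return int(np.argmin(E))
```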

18 What about predictive power? It is not clear if true multi-class algorithms offer any advantage over multiclass-through-binary methods. Example: multiclass SVM by Lee, Lin and Wahba. [figure: performance comparison of the multiclass SVM, one-vs-rest and nearest-neighbor classifiers]

19 Lecture 5. Variable selection. Genetic algorithms. StatPatternRecognition.
