Machine Learning. Ilya Narsky, Caltech
1 Machine Learning. Ilya Narsky, Caltech
2 Lecture 4: Multi-class problems. Multi-class versions of Neural Networks, Decision Trees, Support Vector Machines and AdaBoost. Reduction of a multi-class problem to a set of binary problems.
3 Multi-class problems
Various ad-hoc strategies can be used to reduce a multi-class problem to a set of two-class problems:
- One against One
- One against All
This approach only works if one class clearly dominates, which is not always the case.
Example: one-vs-one strategy for 3 classes, where a red arrow stands for "more likely than".
[Figure: classes 1, 2 and 3 connected by red arrows]
A unified framework for reducing multi-class problems to binary: Allwein, Schapire and Singer, J. of Machine Learning Research 1 (2000).
Alternatively, use a multi-class version of your favorite classifier.
Ilya Narsky, SLUO Statistics Lectures, August 2006, Lecture 4, Slide 3
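Why one-against-one can fail when no class clearly dominates can be seen in a small sketch. The pairwise outcomes below are hypothetical (they are not taken from the slide) and are chosen to form a cycle; `one_vs_one_vote` is an illustrative helper, not part of any library.

```python
from itertools import combinations

def one_vs_one_vote(classes, prefers):
    """Count pairwise wins; prefers(a, b) returns the class judged more likely."""
    wins = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        wins[prefers(a, b)] += 1
    return wins

# Hypothetical pairwise outcomes forming a cycle:
# 1 beats 2, 2 beats 3, and 3 beats 1.
cycle = {(1, 2): 1, (2, 3): 2, (1, 3): 3}
wins = one_vs_one_vote([1, 2, 3], lambda a, b: cycle[(a, b)])
print(wins)  # each class wins exactly one comparison, so the vote is tied
```

With a cycle like this the majority vote cannot pick a winner, which is the situation the slide warns about.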
4 Multi-class Neural Network
Extension to the multi-class case comes naturally: the number of nodes in the output layer equals the number of classes $K$, but only one node is needed for $K = 2$.
The $k$-th class is encoded as $y_k = \{0, \ldots, 0, 1, 0, \ldots, 0\}$ (1 in the $k$-th position).
The output of a neural net is a vector $\{f_k(x)\}$, $k = 1, \ldots, K$, with $0 \le f_k(x) \le 1$.
An event is classified to category $k$ if $f_k(x)$ is largest.
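The class encoding and the "largest output wins" rule can be sketched as follows. The function names are illustrative, class indices are 0-based here (an assumption made for Python convenience), and the network outputs are made-up numbers.

```python
def one_hot(k, n_classes):
    """Target vector for class k: 1 in position k, 0 elsewhere."""
    return [1.0 if i == k else 0.0 for i in range(n_classes)]

def classify(outputs):
    """Return the index of the largest network output."""
    return max(range(len(outputs)), key=lambda k: outputs[k])

target = one_hot(1, 3)                 # training target for the second class
predicted = classify([0.1, 0.7, 0.2])  # hypothetical outputs f_k(x)
print(target, predicted)
```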
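5 Decision trees
Split criteria for a node:
Two classes ($K = 2$), with $p + q = 1$:
  Gini index: $1 - p^2 - q^2$
  cross-entropy: $p \log p + q \log q$
Any number of classes $K$, subject to $\sum_{k=1}^{K} p_k = 1$:
  Gini index: $1 - \sum_{k=1}^{K} p_k^2$
  cross-entropy: $\sum_{k=1}^{K} p_k \log p_k$
E.g., a node with $A$, $B$ and $C$ events of classes A, B and C is split into daughter nodes with $(A_1, B_1, C_1)$ and $(A_2, B_2, C_2)$ events, and $p_A = A / (A + B + C)$.
The split is chosen to maximize purity. The Gini index as written is an impurity (0 for a pure node), so smaller is purer; the cross-entropy as written is the negative entropy, so larger is purer.
Note that this has nothing to do with binary vs multi-way splits for tree construction. One can use binary splits for multi-class problems and multi-way splits for binary problems. Are multi-way splits a good idea? Depends on who you ask.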
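A minimal sketch of the two node criteria for any number of classes, computed from per-class event counts in a node. The function names are illustrative; the Gini index is taken here as $1 - \sum_k p_k^2$ and the cross-entropy as $\sum_k p_k \log p_k$ (the negative entropy), with $0 \log 0$ treated as 0.

```python
import math

def class_fractions(counts):
    """Convert per-class event counts in a node into fractions p_k."""
    total = sum(counts)
    return [c / total for c in counts]

def gini(counts):
    """Gini index 1 - sum_k p_k^2: 0 for a pure node, larger when mixed."""
    return 1.0 - sum(p * p for p in class_fractions(counts))

def neg_entropy(counts):
    """sum_k p_k log p_k: 0 for a pure node, more negative when mixed."""
    return sum(p * math.log(p) for p in class_fractions(counts) if p > 0)

print(gini([10, 0, 0]), gini([1, 1, 1]))
print(neg_entropy([10, 0, 0]), neg_entropy([1, 1, 1]))
```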
6 Penalize some misclassifications more than others
$L_{ij}$ = penalty for misclassifying class $i$ as class $j$; $L_{kk} = 0$
Regular Gini index: $1 - \sum_k p_k^2 = \sum_{i \ne j} p_i p_j$
Modified Gini index: $\sum_{i,j} L_{ij} p_i p_j$
In HEP analysis, we don't play this game. (Although it would be interesting to try one day!) But unequal misclassification penalties are often used in the statistics literature.
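As a quick check of the relation between the two forms, the sketch below evaluates the modified Gini index for a penalty matrix with zeros on the diagonal. The class fractions and the unequal penalty are made-up values; with all off-diagonal penalties equal to 1 the result reduces to the regular Gini index.

```python
def modified_gini(p, L):
    """Sum over i, j of L[i][j] * p[i] * p[j]."""
    n = len(p)
    return sum(L[i][j] * p[i] * p[j] for i in range(n) for j in range(n))

p = [0.5, 0.3, 0.2]                      # hypothetical class fractions
unit = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # equal penalties, L_kk = 0
regular = 1.0 - sum(x * x for x in p)
print(modified_gini(p, unit), regular)   # the two agree up to rounding

# Hypothetical unequal penalties: confusing class 0 with class 2 costs 5x more.
costly = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]
print(modified_gini(p, costly))
```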
7 Support Vector Machines
Two classes ($K = 2$):
$f(x) = w_0 + w^T h(x)$; $y = +1$ for signal, $y = -1$ for background
$\min_w \sum_{n=1}^{N} [1 - y_n f(x_n)]_+ + \lambda \|w\|^2$
Lee, Lin and Wahba, Multicategory Support Vector Machines, J. of the American Statistical Association 99 (2004)
Any number of classes $K$:
The $k$-th class is encoded as a vector $y$ with 1 in the $k$-th position (Lee, Lin and Wahba set all other components to $-1/(K-1)$).
$f(x) = \{f_k(x)\}_{k=1}^{K}$; $f_k(x) = w_k + h_k(x)$
$L(y)$ is a penalty function: $L(y) = \{1, \ldots, 1, 0, 1, \ldots, 1\}$ (0 in the $k$-th position) if $y$ belongs to class $k$
$\min_h \sum_{n=1}^{N} L(y_n)^T [f(x_n) - y_n]_+ + \lambda \sum_{k=1}^{K} \|h_k\|^2$
8 Multi-class SVM (continued)
$\min_h \sum_{n=1}^{N} L(y_n)^T [f(x_n) - y_n]_+ + \lambda \sum_{k=1}^{K} \|h_k\|^2$
If $y_n$ is from class $k$:
- $f_k(x_n)$ can be large because $L_k(y_n) = 0$, so it does not contribute to the minimized loss.
- $f_i(x_n)$ for $i \ne k$ must be as small as possible, otherwise $f_i(x_n) > y_n^{(i)}$ and the loss is increasing.
- Cannot make $f_k(x_n)$ arbitrarily large and $f_i(x_n)$ arbitrarily small because of the penalty term $\lambda \sum_k \|h_k\|^2$, which enforces smoothness of the solution.
After training is completed, an event is classified to category $k$ if $f_k(x)$ is largest among $\{f_k(x)\}$.
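The per-event data term of the multi-class SVM loss can be sketched as below. This assumes the class coding of Lee, Lin and Wahba (1 for the true class, $-1/(K-1)$ elsewhere) and omits the penalty term; the function names and the sample outputs are illustrative only.

```python
def msvm_code(k, n_classes):
    """Class code: 1 in position k, -1/(K-1) elsewhere (Lee, Lin and Wahba)."""
    return [1.0 if i == k else -1.0 / (n_classes - 1) for i in range(n_classes)]

def msvm_event_loss(f, k):
    """L(y)^T [f(x) - y]_+ for one event of true class k.

    L(y) is 0 for the true class and 1 elsewhere, so only the outputs
    of the wrong classes can contribute.
    """
    y = msvm_code(k, len(f))
    return sum(max(f_i - y_i, 0.0)
               for i, (f_i, y_i) in enumerate(zip(f, y)) if i != k)

print(msvm_event_loss([1.0, -0.5, -0.5], 0))  # outputs match the code: no loss
print(msvm_event_loss([0.0, 0.5, -0.5], 0))   # a wrong class scores too high
```

Note that the output for the true class never enters the loss, which matches the remark that it can be large without penalty from the data term.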
9 AdaBoost (still 2 classes only)
Iteration $m$:
- classifier $f_m(x)$ with weighted error $\varepsilon_m = \sum_{\text{misclassified}} w_n^{(m)} < 0.5$
- correctly classified events: $w_n^{(m+1)} = w_n^{(m)} / (2(1 - \varepsilon_m))$
- misclassified events: $w_n^{(m+1)} = w_n^{(m)} / (2\varepsilon_m)$
- weight of classifier $m$: $\beta_m = \frac{1}{2} \log \frac{1 - \varepsilon_m}{\varepsilon_m}$
What do we need to change for multiple classes? Nothing! It is the same algorithm. Except...
For 2 classes, one can always build a classifier with $\varepsilon < 1/2$. If you have a classifier with $\varepsilon > 1/2$, you can flip its output. For multiple classes this is not always possible.
$\varepsilon = \frac{1}{N} \sum_{n=1}^{N} I(f(x_n) \ne y_n)$, where $I(z) = 1$ if $z$ is true and 0 otherwise.
Freund and Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, J. of Computer and System Sciences 55 (1997)
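One boosting iteration for two classes can be sketched as follows: misclassified events are reweighted by $1/(2\varepsilon)$ and correctly classified ones by $1/(2(1-\varepsilon))$. The factor 1/2 in the classifier weight is an assumption of this sketch (it only rescales the combined response), and `adaboost_step` is an illustrative name.

```python
import math

def adaboost_step(weights, misclassified):
    """One two-class AdaBoost reweighting step.

    weights: event weights summing to 1.
    misclassified: booleans, True where the weak classifier was wrong.
    """
    eps = sum(w for w, bad in zip(weights, misclassified) if bad)
    if not 0.0 < eps < 0.5:
        raise ValueError("weak learner must have 0 < eps < 0.5")
    beta = 0.5 * math.log((1.0 - eps) / eps)  # assumed normalization
    new_weights = [w / (2.0 * eps) if bad else w / (2.0 * (1.0 - eps))
                   for w, bad in zip(weights, misclassified)]
    return eps, beta, new_weights

eps, beta, new_w = adaboost_step([0.25] * 4, [True, False, False, False])
print(eps, beta, new_w)
```

After the update the misclassified events carry half of the total weight, so the next weak learner is forced to pay attention to them.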
10 AdaBoost.M1 (multiple classes)
Solution: train until $\varepsilon \ge 1/2$ and then abort. This algorithm is known as AdaBoost.M1.
Ok, this is a solution. But can we come up with something better? (see next slide)
What is the output of multi-class AdaBoost?
Two classes ($K = 2$): $F(x) = \sum_{m=1}^{M} \beta_m f_m(x)$
Any number of classes $K$: $\{F^{(k)}(x)\}_{k=1}^{K}$, with $F^{(k)}(x) = \sum_{m=1}^{M} \beta_m I(f_m(x) = y^{(k)})$
In plain English: compute the average inverse misclassification error over the built classifiers which predict class $k$ for this point, and then choose the class which maximizes this quantity.
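The multi-class response amounts to a weighted vote: each classifier adds its weight to the class it predicts. A minimal sketch, with made-up classifier weights and predictions and an illustrative function name:

```python
def adaboost_m1_predict(betas, predictions, classes):
    """Sum beta_m over classifiers predicting each class; return the best class."""
    scores = {k: 0.0 for k in classes}
    for beta, label in zip(betas, predictions):
        scores[label] += beta
    best = max(classes, key=lambda k: scores[k])
    return best, scores

# Hypothetical ensemble of three classifiers evaluated at one point x.
label, scores = adaboost_m1_predict([0.8, 0.3, 0.4], ["A", "B", "B"],
                                    ["A", "B", "C"])
print(label, scores)
```

Here class A wins even though two of the three classifiers predict B, because the single classifier predicting A has the larger weight.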
11 AdaBoost.MH: soft score vs hard score
Suppose we built a decision tree for 3 classes, A, B and C. In one of the terminal nodes we have $N_A$, $N_B$ and $N_C$ training events. Suppose $N_A > N_B > N_C$. The contribution to the overall classification error from this node is $\varepsilon = (N_B + N_C)/N$ ($N$ is the overall size of the training sample).
Instead of returning a hard classification label (A, B or C), we can return a soft score, for example $f_A = N_A/(N_A + N_B + N_C)$; same for $f_B$ and $f_C$.
In general, we can define a function $-1 < h(x, y) < 1$ which represents confidence that the true class label at $x$ is $y$.
What are the advantages of soft score against hard score?
- It can be shown that for any dataset the classification error defined through soft score is guaranteed to be $\le 1/2$.
- Continuous.
- More accurately reflects the amount of misclassification.
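The difference between the hard label and the soft score in a terminal node can be sketched with made-up leaf counts. The function names and the training sample size are illustrative assumptions.

```python
def leaf_scores(counts):
    """Soft scores: fraction of training events of each class in the leaf."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def leaf_error(counts, n_train):
    """Contribution of the leaf to the overall error under a hard label."""
    return (sum(counts.values()) - max(counts.values())) / n_train

counts = {"A": 6, "B": 3, "C": 1}   # hypothetical N_A, N_B, N_C
scores = leaf_scores(counts)
hard_label = max(counts, key=counts.get)
print(hard_label, scores, leaf_error(counts, n_train=100))
```

The hard label discards the information that B is three times as likely as C in this leaf; the soft scores keep it.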
12 AdaBoost.MH: Yet another trick
In the two-class version of AdaBoost, we had one weight per event, a measure of how often the event was assigned to the wrong category by the weak learners. But now we have many wrong categories.
Example: for an event drawn from class A it is easy to discriminate class A from class B, but not so easy to discriminate class A from class C.
Solution: introduce one weight for each class for each event.
$(x_1, A)$, $(x_2, B)$, $(x_3, C)$ become
$((x_1, A), +1)$, $((x_1, B), -1)$, $((x_1, C), -1)$
$((x_2, A), -1)$, $((x_2, B), +1)$, $((x_2, C), -1)$
$((x_3, A), -1)$, $((x_3, B), -1)$, $((x_3, C), +1)$
We have transformed a problem with 3 events, 3 classes and $D$ input variables into a problem with 9 events, 2 classes and $D+1$ input variables.
Now build a weak learner and compute $h(x, y)$, $y = A, B, C$, for each training $x$ using, for example, decision tree leaf purities.
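The transformation of the training set can be sketched in a few lines. Event identifiers are placeholder strings standing in for the $D$-dimensional inputs, and `expand_mh` is an illustrative name.

```python
def expand_mh(events, classes):
    """Replace each (x, y) with one ((x, k), +1 or -1) point per class k."""
    return [((x, k), +1 if y == k else -1)
            for x, y in events
            for k in classes]

events = [("x1", "A"), ("x2", "B"), ("x3", "C")]
expanded = expand_mh(events, ["A", "B", "C"])
for point in expanded:
    print(point)
```

The class label becomes an extra input variable, and the new target is simply whether that label is the true one.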
13 AdaBoost.MH full algorithm
Schapire and Singer, Improved Boosting Algorithms Using Confidence-rated Predictions, Machine Learning 37 (1999)
Notation: $N$ training points, $K$ classes, $y^{(k)}$ is the true class label for class $k$.
Replace $(x_n, y_n)$ with $K$ points $\{((x_n, k), J(y_n, y^{(k)}))\}_{k=1}^{K}$, where $J(y_n, y^{(k)}) = +1$ if $y_n = y^{(k)}$ and $-1$ if $y_n \ne y^{(k)}$.
Iteration 1: $w_n^{(k)}(1) = 1/(KN)$; $n = 1, \ldots, N$; $k = 1, \ldots, K$
Iteration $m$: classifier with scoring function $h_m(x, y)$;
$w_n^{(k)}(m+1) = w_n^{(k)}(m) \exp[-\beta_m J(y_n, y^{(k)}) h_m(x_n, y^{(k)})] / Z_m$
AdaBoost response: $\{f^{(k)}(x)\}_{k=1}^{K}$, with $f^{(k)}(x) = \sum_{m=1}^{M} \beta_m h_m(x, y^{(k)})$
Friedman, Hastie and Tibshirani, Additive logistic regression: A statistical view of boosting, The Annals of Statistics 28 (2000): The AdaBoost.MH algorithm for a $K$-class problem can be effectively reduced to solving $K$ problems, each class against the rest.
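A single AdaBoost.MH reweighting step can be sketched as below, assuming the exponential update with a negative sign in the exponent (so that confidently correct scores lose weight) and normalization by the sum of the updated weights. The scores and the value of the classifier weight are made-up numbers.

```python
import math

def mh_update(weights, J, h, beta):
    """Update one weight per (event, class) pair.

    weights: current weights, one per (event, class) pair.
    J: +1 if the class is the true class of the event, -1 otherwise.
    h: soft score of the weak learner for each pair, in (-1, 1).
    """
    raw = [w * math.exp(-beta * j * s) for w, j, s in zip(weights, J, h)]
    Z = sum(raw)  # normalization constant
    return [r / Z for r in raw]

# One event with three classes; the weak learner is wrong about the third.
new_w = mh_update([1 / 3] * 3, J=[+1, -1, -1], h=[0.5, -0.5, 0.5], beta=1.0)
print(new_w)
```

The pair whose score disagrees with its target ends up with the largest weight, so the next weak learner concentrates on exactly that confusion.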
14 What have we learned?
Many binary classifiers have multi-class versions. For some classifiers (neural net, decision tree), generalization from binary to multi-class is obvious. For others (SVM, AdaBoost), it is less straightforward but doable.
Multi-class algorithms are the subject of ongoing research. If you switch from grad school in physics to grad school in machine learning (not that I encourage you to), you may very well invent one.
But let us look at this through the eyes of a practitioner. Even if you know how to do a multi-class version of your favorite algorithm, do you have the means to do so?
- Do you have a piece of software?
- Is it easy to write one?
What can you do if the answer to both questions above is no?
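15 Reducing multi-class to binary: A Unified Approach
Reduce a multi-class problem to a set of two-class problems using an interaction matrix.
Interaction matrix: $K \times C$ matrix for $K$ classes and $C$ binary classifiers.
Allwein, Schapire and Singer, Reducing Multiclass to Binary: A Unifying Approach to Margin Classifiers, J. of Machine Learning Research 1 (2000)
For example, a problem with 4 classes:
$M_{\text{ONE-VS-ALL}} = \begin{pmatrix} 1 & -1 & -1 & -1 \\ -1 & 1 & -1 & -1 \\ -1 & -1 & 1 & -1 \\ -1 & -1 & -1 & 1 \end{pmatrix}$, $C = K$
$M_{\text{ONE-VS-ONE}} = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ -1 & 0 & 0 & 1 & 1 & 0 \\ 0 & -1 & 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 0 & -1 & -1 \end{pmatrix}$, $C = K(K-1)/2$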
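Both standard interaction matrices can be generated for any number of classes. The sketch below uses rows for classes and columns for binary classifiers, with +1 and -1 marking the two sides of each binary problem and 0 marking classes the classifier ignores; the column ordering of the one-vs-one matrix is a choice made here, and the function names are illustrative.

```python
from itertools import combinations

def one_vs_all(n_classes):
    """K x K matrix: +1 on the diagonal, -1 elsewhere."""
    return [[1 if c == k else -1 for c in range(n_classes)]
            for k in range(n_classes)]

def one_vs_one(n_classes):
    """K x K(K-1)/2 matrix: one column per pair (a, b) of classes."""
    pairs = list(combinations(range(n_classes), 2))
    return [[1 if k == a else -1 if k == b else 0 for a, b in pairs]
            for k in range(n_classes)]

for row in one_vs_all(4):
    print(row)
for row in one_vs_one(4):
    print(row)
```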
16 One-vs-one, one-vs-all... what else?
Can implement any non-standard strategy. For example, two signals with similar signatures and background. We want to separate both signals from background and then separate them from each other.
[Figure: interaction matrix M with one row per class (background, signal 1, signal 2) and one column per binary classifier]
17 How do you classify new events?
For each event, you get $C$ numbers, one for each column of the interaction matrix.
Compute a user-defined loss (quadratic, exponential etc.) for each row of the interaction matrix.
For example, compute the average quadratic error
$E_k = \frac{1}{C} \sum_{c=1}^{C} (M_{kc} - f_c(x))^2$; $k = 1, \ldots, K$
and assign event $x$ to the class $k$ which gives the minimal quadratic error $E_k$.
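Decoding with the average quadratic error can be sketched as follows. The interaction matrix here is a hypothetical one for the background plus two signals case (first column: both signals against background; second column: signal 1 against signal 2, ignoring background); its entries and the classifier outputs are assumptions for illustration, not values from the slides.

```python
def decode(matrix, outputs):
    """Return the row index with minimal average quadratic error, and all errors."""
    C = len(outputs)
    errors = [sum((m - f) ** 2 for m, f in zip(row, outputs)) / C
              for row in matrix]
    best = min(range(len(errors)), key=lambda k: errors[k])
    return best, errors

# Hypothetical matrix: rows = background, signal 1, signal 2.
M = [[-1, 0],
     [1, 1],
     [1, -1]]
k, errors = decode(M, [0.8, -0.6])  # made-up outputs of the two classifiers
print(k, errors)
```

With these outputs the first classifier says "signal" and the second leans toward signal 2, so the third row gives the smallest error.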
18 What about predictive power?
It is not clear if true multi-class algorithms offer any advantage over multiclass-through-binary methods.
Example: multiclass SVM by Lee, Lin and Wahba.
[Table: comparison of multiclass SVM, One vs Rest and nearest neighbor]
19 Lecture 5: Variable selection V, Genetic algorithms X, StatPatternRecognition X
More informationSigma notation. 2.1 Introduction
Sigma otatio. Itroductio We use sigma otatio to idicate the summatio process whe we have several (or ifiitely may) terms to add up. You may have see sigma otatio i earlier courses. It is used to idicate
More informationStatistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.
Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized
More information4. Linear Classification. Kai Yu
4. Liear Classificatio Kai Y Liear Classifiers A simplest classificatio model Help to derstad oliear models Argably the most sefl classificatio method! 2 Liear Classifiers A simplest classificatio model
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 3
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture 3 Tolstikhi Ilya Abstract I this lecture we will prove the VC-boud, which provides a high-probability excess risk boud for the ERM algorithm whe
More informationLecture 2: Monte Carlo Simulation
STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?
More informationAlgorithms for Clustering
CR2: Statistical Learig & Applicatios Algorithms for Clusterig Lecturer: J. Salmo Scribe: A. Alcolei Settig: give a data set X R p where is the umber of observatio ad p is the umber of features, we wat
More informationPRACTICE PROBLEMS FOR THE FINAL
PRACTICE PROBLEMS FOR THE FINAL Math 36Q Fall 25 Professor Hoh Below is a list of practice questios for the Fial Exam. I would suggest also goig over the practice problems ad exams for Exam ad Exam 2 to
More informationCS583 Lecture 02. Jana Kosecka. some materials here are based on E. Demaine, D. Luebke slides
CS583 Lecture 02 Jaa Kosecka some materials here are based o E. Demaie, D. Luebke slides Previously Sample algorithms Exact ruig time, pseudo-code Approximate ruig time Worst case aalysis Best case aalysis
More informationModel of Computation and Runtime Analysis
Model of Computatio ad Rutime Aalysis Model of Computatio Model of Computatio Specifies Set of operatios Cost of operatios (ot ecessarily time) Examples Turig Machie Radom Access Machie (RAM) PRAM Map
More informationCSE 4095/5095 Topics in Big Data Analytics Spring 2017; Homework 1 Solutions
CSE 09/09 Topics i ig Data Aalytics Sprig 2017; Homework 1 Solutios Note: Solutios to problems,, ad 6 are due to Marius Nicolae. 1. Cosider the followig algorithm: for i := 1 to α log e do Pick a radom
More informationMath 155 (Lecture 3)
Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a -elemet subset of the set {,,,
More informationw (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.
2 5. Weighted umber of late jobs 5.1. Release dates ad due dates: maximimizig the weight of o-time jobs Oce we add release dates, miimizig the umber of late jobs becomes a sigificatly harder problem. For
More information