4. Linear Classification. Kai Yu
1 4. Linear Classification. Kai Yu
2 Linear Classifiers. The simplest classification model. Helps to understand nonlinear models. Arguably the most useful classification method!
4 Outline. Perceptron Algorithm. Support Vector Machines. Logistic Regression. Summary.
5 Basic Neuron
6 Perceptron Node: Threshold Logic Unit. Inputs x_1, x_2 with weights w_1, w_2, threshold b, output z. z = 1 if Σ_{i=1}^n x_i w_i >= b; z = 0 if Σ_{i=1}^n x_i w_i < b.
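The threshold logic unit above is a few lines of code; the following is an illustrative sketch (the slides themselves give no code):

```python
def perceptron_output(x, w, b):
    """Threshold logic unit: output 1 iff the weighted input sum reaches the threshold b."""
    net = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if net >= b else 0
```

For example, with inputs (.8, .3), weights (.4, -.2), and threshold b = 0, the net input is .26, so the unit fires.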
7 Perceptron Learning Algorithm. Training instances (x_1, x_2, t) with target t; output z = 1 if Σ_{i=1}^n x_i w_i >= b; z = 0 if Σ_{i=1}^n x_i w_i < b.
8 First Training Instance. z = net = .8 × .4 + .3 × (−.2) = .26. (z = 1 if Σ_{i=1}^n x_i w_i >= b; 0 if Σ_{i=1}^n x_i w_i < b.)
9 Second Training Instance. z = net = .4 × .4 + .1 × (−.2) = .14. Weight update: Δw_i = (t − z) · c · x_i.
10 The Perceptron Learning Rule. Δw_ij = c (t_j − z_j) x_i. Least perturbation principle: only change weights if there is an error (learning from mistakes); the change amounts to x_i scaled by a small c. Iteratively apply an example from the training set; each iteration through the training set is an epoch. Continue training until the total training error is less than epsilon. Perceptron Convergence Theorem: guaranteed to find a solution in finite time if a solution exists.
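The epoch loop above can be sketched as follows. This is a minimal illustration, assuming the threshold is folded in as a bias weight on a constant +1 input (a common trick the slides do not spell out):

```python
def train_perceptron(data, c=0.1, epochs=100):
    """Perceptron learning rule: w_i += c * (t - z) * x_i, applied only on errors.
    data: list of (x, t) pairs with x a tuple of features and t in {0, 1}.
    The threshold is folded in as a bias weight on a constant +1 input."""
    n = len(data[0][0])
    w = [0.0] * (n + 1)                      # last entry is the bias weight
    for _ in range(epochs):                  # one pass over the data = one epoch
        errors = 0
        for x, t in data:
            xb = list(x) + [1.0]
            z = 1 if sum(xi * wi for xi, wi in zip(xb, w)) >= 0 else 0
            if z != t:                       # least perturbation: update only on error
                errors += 1
                for i in range(n + 1):
                    w[i] += c * (t - z) * xb[i]
        if errors == 0:                      # converged on the training set
            break
    return w
```

On a linearly separable set such as the AND function, the loop stops once an epoch produces no errors, as the convergence theorem guarantees.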
11 Outline. Perceptron Algorithm. Support Vector Machines. Logistic Regression. Summary.
12 Support Vector Machines: Overview. A powerful method for 2-class classification. Original idea: Vapnik, 1965, for linear classifiers; SVM: Cortes and Vapnik, 1995. Became very hot since the late 90s. Better generalization (less overfitting). Key ideas: use a kernel function to transform low-dimensional training samples to a higher dimension (for the linear separability problem); use quadratic programming (QP) to find the best classifier boundary hyperplane (for a global optimum).
13 Linear Classifiers. x → f → y_est; legend: denotes +1, denotes −1. f(x, w, b) = sign(w · x − b). How would you classify this data?
17 Linear Classifiers. f(x, w, b) = sign(w · x − b). Any of these would be fine... but which is best?
18 Classifier Margin. f(x, w, b) = sign(w · x − b). Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
19 Maximum Margin. Linear SVM: f(x, w, b) = sign(w · x − b). The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called an LSVM).
20 Maximum Margin. Support Vectors are those datapoints that the margin pushes up against. Linear SVM: f(x, w, b) = sign(w · x − b). The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called an LSVM).
21 Computing the margin width. M = margin width. How do we compute M in terms of w and b? Plus-plane = { x : w · x + b = +1 }, minus-plane = { x : w · x + b = −1 }. Margin: M = 2 / √(w · w).
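The width formula is a one-liner in code; `margin_width` below is a hypothetical helper for illustration, not something from the slides:

```python
import math

def margin_width(w):
    """M = 2 / sqrt(w . w): the distance between the plus-plane and the minus-plane."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))
```

For instance, with w = (3, 4), ||w|| = 5, so the margin is 2/5 = 0.4.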
22 Why Maximum Margin? Support Vectors are those datapoints that the margin pushes up against. f(x, w, b) = sign(w · x − b). The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (called an LSVM). 1. Intuitively this feels safest. 2. If we've made a small error in the location of the boundary, this gives us the least chance of causing a misclassification. 3. It also helps generalization. 4. There's some theory that this is a good thing. 5. Empirically it works very very well.
23 Another way to understand max margin. For f(x) = w · x + b, one way to characterize the smoothness of f(x) is its gradient: ∂f(x)/∂x = w. Therefore, the margin measures the smoothness of f(x). As a rule of thumb, machine learning prefers smooth functions: similar x's should have similar y's.
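The smoothness claim can be checked numerically: since the gradient of f(x) = w · x + b is w, the gap |f(x1) − f(x2)| never exceeds ||w|| · ||x1 − x2||. A quick illustrative check (not from the slides):

```python
import math

def lipschitz_gap(w, b, x1, x2):
    """Return (|f(x1) - f(x2)|, ||w|| * ||x1 - x2||) for f(x) = w.x + b.
    The first value is always <= the second: small ||w|| means a smooth f."""
    f = lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b
    dist = math.sqrt(sum((a - c) ** 2 for a, c in zip(x1, x2)))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(f(x1) - f(x2)), norm_w * dist
```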
24 Learning the Maximum Margin Classifier. M = margin width = 2 / √(w · w). Given a guess of w and b we can: compute whether all data points are in the correct half-planes; compute the width of the margin. So now we just need to write a program to search the space of w's and b's to find the widest margin that matches all the data points. How?
25 Learning via Quadratic Programming. QP is a well-studied class of optimization algorithms to maximize a quadratic function of some real-valued variables subject to linear constraints. Minimize both w · w (to maximize M) and the misclassification error.
26 Quadratic Programming. Quadratic criterion: find arg max_u c + d^T u + (u^T R u)/2, subject to n linear inequality constraints
a_11 u_1 + a_12 u_2 + ... + a_1m u_m <= b_1
a_21 u_1 + a_22 u_2 + ... + a_2m u_m <= b_2
...
a_n1 u_1 + a_n2 u_2 + ... + a_nm u_m <= b_n
and subject to e additional linear equality constraints
a_(n+1)1 u_1 + ... + a_(n+1)m u_m = b_(n+1)
...
a_(n+e)1 u_1 + ... + a_(n+e)m u_m = b_(n+e)
27 Learning without Noise. What should our quadratic optimization criterion be? Minimize (1/2) w · w. What should the constraints be? w · x_k + b >= 1 if y_k = +1; w · x_k + b <= −1 if y_k = −1.
28 Solving the Optimization Problem. Find w and b such that Φ(w) = w^T w is minimized and for all (x_i, y_i), i = 1..n: y_i (w^T x_i + b) >= 1. We need to optimize a quadratic function subject to linear constraints. Quadratic optimization problems are a well-known class of mathematical programming problems for which several (non-trivial) algorithms exist. The solution involves constructing a dual problem where a Lagrange multiplier α_i is associated with every inequality constraint in the primal (original) problem: find α_1 ... α_n such that Q(α) = Σ α_i − ½ Σ Σ α_i α_j y_i y_j x_i^T x_j is maximized and (1) Σ α_i y_i = 0, (2) α_i >= 0 for all α_i.
29 Soft Margin Classification. What if the training set is not linearly separable? Slack variables ε_i can be added to allow misclassification of difficult or noisy examples, resulting in a so-called soft margin.
30 Learning Maximum Margin with Noise. M = 2 / √(w · w). Given a guess of w, b we can: compute the sum of distances of points to their correct zones; compute the margin width. Assume R data points, each (x_k, y_k) where y_k = +/−1. What should our quadratic optimization criterion be? Minimize (1/2) w · w + C Σ_{k=1}^R ε_k, where ε_k = distance of error points to their correct place. How many constraints will we have? R. What should they be? w · x_k + b >= 1 − ε_k if y_k = +1; w · x_k + b <= −1 + ε_k if y_k = −1.
31 Soft Margin Classification Mathematically. The old formulation: find w and b such that Φ(w) = w^T w is minimized and for all (x_i, y_i), i = 1..n: y_i (w^T x_i + b) >= 1. The modified formulation incorporates slack variables: find w and b such that Φ(w) = w^T w + C Σ ξ_i is minimized and for all (x_i, y_i), i = 1..n: y_i (w^T x_i + b) >= 1 − ξ_i, ξ_i >= 0. The parameter C can be viewed as a way to control overfitting: it trades off the relative importance of maximizing the margin and fitting the training data.
32 Hinge loss. The soft margin SVM is equivalent to applying a hinge loss: L(w, b) := Σ_{i=1}^n max(1 − y_i (w^T x_i + b), 0). Equivalent unconstrained optimization formulation: min_{w,b} L(w, b) + λ ||w||², with λ = 0.5 / C. [Figure: hinge loss vs. 0-1 loss as a function of y · (w · x + b).]
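One way to minimize this unconstrained hinge objective is plain subgradient descent rather than the QP route; a minimal sketch, assuming the decision function sign(w · x + b) and full-batch updates:

```python
def svm_subgradient(data, lam=0.01, lr=0.1, epochs=200):
    """Minimize sum_i max(1 - y_i (w.x_i + b), 0) + lam * ||w||^2
    by full-batch subgradient descent. data: (x, y) pairs with y in {+1, -1}."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        gw, gb = [2 * lam * wi for wi in w], 0.0   # gradient of the L2 penalty
        for x, y in data:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 1:
                # this point's hinge term is active: add its subgradient
                for i in range(n):
                    gw[i] -= y * x[i]
                gb -= y
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return w, b
```

Points with margin at least 1 contribute nothing to the update, which mirrors the fact that only support vectors matter in the QP solution.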
33 Outline. Perceptron Algorithm. Support Vector Machines. Logistic Regression. Summary.
34 Logistic Regression. Binary response: Y = {+1, −1}. Likelihood: Y_i | X_i ~ Bernoulli(p_i), where p_i is the probability of Y_i = 1 and p_i = 1 / (1 + exp(−W^T X_i)). Then Π_{i=1}^n P(Y_i | X_i) = Π_{i=1}^n 1 / (1 + exp(−Y_i X_i^T W)).
35 Logistic regression. The maximum likelihood estimator (MLE) becomes logistic regression: min_W − Σ_{i=1}^n ln p(Y_i | X_i) = Σ_{i=1}^n ln(1 + exp(−Y_i X_i^T W)). This is a convex optimization problem in terms of W. MAP is regularized logistic regression: min_W Σ_{i=1}^n ln(1 + exp(−Y_i X_i^T W)) + λ ||W||².
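Because the objective is convex, plain gradient descent finds the minimizer; the following is an illustrative sketch (real implementations would use L-BFGS or similar, as the later slides note):

```python
import math

def logistic_fit(data, lam=0.0, lr=0.1, epochs=500):
    """Minimize sum_i ln(1 + exp(-y_i x_i.w)) + lam * ||w||^2 by gradient descent.
    data: (x, y) pairs with y in {+1, -1}; no separate intercept
    (append a constant 1 feature to x if one is needed)."""
    n = len(data[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        g = [2 * lam * wi for wi in w]          # gradient of the L2 penalty
        for x, y in data:
            s = y * sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-s))      # P(correct label | x)
            for i in range(n):
                g[i] -= (1.0 - p) * y * x[i]    # d/dw of ln(1 + exp(-s))
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w
```

Setting lam > 0 gives the MAP (regularized) version from the slide; lam = 0 is the plain MLE.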
36 Outline. Perceptron Algorithm. Support Vector Machines. Logistic Regression. Summary.
37 General formulation of linear classifiers. min_{w,b} L(w, b) + λ ||w||². The objective: empirical loss + regularization. The regularization term is usually the L2 norm, but often also the L1 norm for sparse models. The empirical loss can be hinge loss, logistic loss, smooth hinge loss, or your own invention.
38 Different loss functions. [Figure: loss as a function of the margin f(x) · y, for classification error (0-1), SVM (hinge), logistic regression, least squares, exponential, and modified Huber loss.]
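As a rough sketch, the curves in that figure can be written as functions of the margin m = f(x) · y; the exact scaling in the plot may differ:

```python
import math

def zero_one(m):  return 1.0 if m <= 0 else 0.0        # classification error
def hinge(m):     return max(1.0 - m, 0.0)             # SVM
def logistic(m):  return math.log(1.0 + math.exp(-m))  # logistic regression
def squared(m):   return (1.0 - m) ** 2                # least squares
def exp_loss(m):  return math.exp(-m)                  # exponential (boosting)
```

All of them penalize negative margins (misclassifications), but they differ in how they treat confidently correct points: hinge is exactly zero for m >= 1, while logistic and exponential only decay toward zero.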
39 Comments on linear classifiers. Choosing logistic regression or SVM? Not that big a difference! Logistic regression outputs probabilities. Smooth loss functions, e.g. logistic, are easier to optimize (L-BFGS, ...). Replace the hinge with a differentiable hinge, and you can easily have your own implementation of SVMs. Try different loss functions and regularization terms; the choice depends on the data, e.g., many outliers? Irrelevant features? Structure in the output?
40 Linear classifiers in practice and research. Linear classifiers are simple and scalable: training complexity is linear or sub-linear w.r.t. data size, and classification is simple to implement. They are state-of-the-art for texts, images, ... Their success depends on the quality of features. A useful practice: use a lot of features, learn a sparse w. Hot topic: large-scale linear classification: many data points, many features, many classes; stochastic optimization; parallel implementation.
Dept. of Biomed. Eg. BME81: Iverse Problems i Bioegieerig Kyg Hee Uiv. Optimizatio - We stdy parameter optimizatio algorithms. (1) Geeric problem defiitio - Basic igrediets: M, U, G U R M y R r q R l -
More informationChapter 10: Power Series
Chapter : Power Series 57 Chapter Overview: Power Series The reaso series are part of a Calculus course is that there are fuctios which caot be itegrated. All power series, though, ca be itegrated because
More informationThe z Transform. The Discrete LTI System Response to a Complex Exponential
The Trasform The trasform geeralies the Discrete-time Forier Trasform for the etire complex plae. For the complex variable is sed the otatio: jω x+ j y r e ; x, y Ω arg r x + y {} The Discrete LTI System
More informationThe Bayesian Learning Framework. Back to Maximum Likelihood. Naïve Bayes. Simple Example: Coin Tosses. Given a generative model
Back to Maximum Likelihood Give a geerative model f (x, y = k) =π k f k (x) Usig a geerative modellig approach, we assume a parametric form for f k (x) =f (x; k ) ad compute the MLE θ of θ =(π k, k ) k=
More informationStatistical Learning: Algorithms and Theory. Sayan Mukherjee
Statistical Learig: Algorithms ad Theory Saya Mukherjee Cotets Statistical Learig: Algorithms ad Theory Lecture. Course prelimiaries ad overview Lecture 2. The learig problem 5 2.. Key defiitios i learig
More informationLecture 13: Maximum Likelihood Estimation
ECE90 Sprig 007 Statistical Learig Theory Istructor: R. Nowak Lecture 3: Maximum Likelihood Estimatio Summary of Lecture I the last lecture we derived a risk (MSE) boud for regressio problems; i.e., select
More informationCourse Outline. Curves for Modeling. Graphics Pipeline. Outline of Unit. Motivation. Foundations of Computer Graphics (Spring 2012)
Fodatios of Compter Graphics (Sprig 0) CS 84, Lectre : Crves http://ist.eecs.bereley.ed/~cs84 3D Graphics Pipelie Corse Otlie Modelig Aimatio Rederig Graphics Pipelie Crves for Modelig I HW, HW, draw,
More informationWeek 1, Lecture 2. Neural Network Basics. Announcements: HW 1 Due on 10/8 Data sets for HW 1 are online Project selection 10/11. Suggested reading :
ME 537: Learig-Based Cotrol Week 1, Lecture 2 Neural Network Basics Aoucemets: HW 1 Due o 10/8 Data sets for HW 1 are olie Proect selectio 10/11 Suggested readig : NN survey paper (Zhag Chap 1, 2 ad Sectios
More informationA Course in Machine Learning
A Course i Machie Learig Hal Daumé III 6 LINEAR MODELS The essece of mathematics is ot to make simple thigs complicated, but to make complicated thigs simple. Staley Gudder I Chapter, you leared about
More informationMassachusetts Institute of Technology
Massachusetts Istitute of Techology 6.867 Machie Learig, Fall 6 Problem Set : Solutios. (a) (5 poits) From the lecture otes (Eq 4, Lecture 5), the optimal parameter values for liear regressio give the
More information