FMA901F: Machine Learning Lecture 4: Linear Models for Classification. Cristian Sminchisescu
1 FMA901F: Machine Learning. Lecture 4: Linear Models for Classification. Cristian Sminchisescu
2 Linear Classification. Classification is intrinsically nonlinear because of the training constraints that place non-identical inputs in the same class: differences in the input vector sometimes cause no change in the answer. Linear classification means that the adaptive part is linear. The adaptive part, y(x) = w^T x + w_0 with adaptive linear parameters w, is cascaded with a fixed nonlinearity f to make the decision, y(x) = f(w^T x + w_0). It may also be preceded by a fixed nonlinearity when nonlinear basis functions are used.
3 Approach 1: Discriminant Functions. Use discriminant functions directly, and do not compute probabilities. Convert the input vector into one or more real values so that a simple process (thresholding, or a majority vote) can be applied to assign the input to a class. The real values should be chosen to maximize the useable information about the class label present in the real value. Given discriminant functions y_1(x), ..., y_K(x), classify x as class C_k iff y_k(x) > y_j(x) for all j != k.
4 Approach 2: Class-conditional Probabilities. Infer conditional class probabilities p(C_k | x). Use the conditional distribution to make optimal decisions, e.g. by minimizing some loss function. Example, 2 classes: p(C_1 | x) = 1 / (1 + exp(-a)).
5 Approach 3: Class Generative Models. Compare the probability of the input under separate, class-specific, generative models. Model both the class-conditional densities p(x | C_k) and the prior class probabilities p(C_k). Compute the posterior using Bayes' theorem: posterior for class k = (class-conditional density x class prior) / evidence, i.e. p(C_k | x) = p(x | C_k) p(C_k) / p(x). Example: fit a multivariate Gaussian to the input vectors corresponding to each class, model class prior probabilities by training-data frequency counts, and see which Gaussian makes a test data vector most probable using Bayes' theorem.
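As an illustrative sketch (not code from the lecture), the recipe on this slide, fit one Gaussian per class, set priors from frequency counts, then pick the class whose model makes the input most probable, might look as follows; all function names are my own:

```python
import numpy as np

def fit_gaussian_generative(X, y):
    """Fit one multivariate Gaussian per class plus frequency-count priors."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0),            # class mean
                     np.cov(Xc, rowvar=False),   # class covariance
                     len(Xc) / len(X))           # prior = frequency count
    return params

def log_joint(params, x):
    """log p(x | C_k) + log p(C_k) per class; argmax is the Bayes decision."""
    out = {}
    for c, (mu, cov, prior) in params.items():
        d = x - mu
        inv = np.linalg.inv(cov)
        logdet = np.linalg.slogdet(cov)[1]
        out[c] = -0.5 * (d @ inv @ d) - 0.5 * logdet + np.log(prior)
    return out

def predict(params, x):
    lj = log_joint(params, x)
    return max(lj, key=lj.get)
```

Comparing log p(x | C_k) p(C_k) across classes is equivalent to comparing the posteriors, since the evidence p(x) is a common denominator.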
6 Different Types of Plots in the Course. Weight space: each axis corresponds to a weight; a point is a weight vector; dimensionality = #weights (+1 extra dimension for the loss). Data space: each axis corresponds to an input value; a point is a data vector; a decision surface is a plane; dimensionality = dimensionality of a data vector. Case space (used for the geometric interpretation of least squares, L3): each axis corresponds to a training case; dimensionality = #training cases.
7 2-class case: the decision surface in data space for the linear discriminant function y(x) = w^T x + w_0 is the set where y(x) = 0. The weight vector w is orthogonal to any vector which lies on the decision surface, so w controls the orientation of the decision surface; the bias w_0 controls its displacement from the origin.
8 Representing Target Values: Binary vs. Multiclass. Two classes (N=2): typically use a single real-valued output that has target values of 1 for the positive class and 0 (or -1) for the negative class. For probabilistic class labels, the target can be the probability of the positive class, and the output of the model can be the probability the model assigns to the positive class. For the multiclass case (N>2), we use a vector of N target values containing a single 1 for the correct class and zeros elsewhere. For probabilistic labels we can then use a vector of class probabilities as the target vector.
9 Discriminant Functions for Multiclass. One possibility is to use N-1 binary 1-vs-all discriminants: each function separates one class from the rest. Another possibility is to use N(N-1)/2 binary 1-vs-1 discriminants: each function discriminates between two specific classes, so we have a discriminant for each class pair. Both methods have ambiguities.
10 Problems with Multiclass Discriminant Functions Constructed from Binary Classifiers (1 vs. all, 1 vs. 1). If we base our decision on binary classifiers, we can encounter ambiguities: regions of input space claimed by more than one class, or by none.
11 Simple Solution. Use N discriminant functions y_1(x), ..., y_N(x) and take the max over their responses. Consider linear discriminants y_k(x) = w_k^T x + w_k0. The decision boundary between class k and class j is given by the hyperplane y_k(x) = y_j(x), i.e. (w_k - w_j)^T x + (w_k0 - w_j0) = 0. In this linear case the decision regions are convex: take x_A, x_B in region R_k and x_hat = lambda x_A + (1 - lambda) x_B with 0 <= lambda <= 1. From the linearity of y_k, y_k(x_hat) = lambda y_k(x_A) + (1 - lambda) y_k(x_B). But y_k(x_A) > y_j(x_A) and y_k(x_B) > y_j(x_B), hence y_k(x_hat) > y_j(x_hat), so x_hat also lies inside R_k, and R_k is convex.
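A minimal sketch of the max rule, with made-up weights purely for illustration; the loop numerically checks the convexity argument above on one segment:

```python
import numpy as np

# Hypothetical weights for three linear discriminants y_k(x) = w_k^T x + w_k0.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])
w0 = np.array([0.0, 0.0, 0.5])

def classify(x):
    """Assign x to the class whose linear discriminant responds most strongly."""
    return int(np.argmax(W @ x + w0))

# Points on the segment between two points of region R_0 stay in R_0.
xa, xb = np.array([3.0, 0.5]), np.array([5.0, -1.0])
on_segment = [classify(lam * xa + (1 - lam) * xb)
              for lam in (0.0, 0.25, 0.5, 0.75, 1.0)]
```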
12 Least Squares for Classification. This is not necessarily the right approach in principle, and it does not work as well as more advanced methods, but it is simple: it reduces classification to least-squares regression, and we already know how to do regression. We can solve for the optimal weights using the normal equations (L3). We set the target to be the conditional probability of the class given the input. When there are more than two classes, we treat each class as a separate problem. The justification for using least squares is that it approximates the conditional expectation E[t | x]; for the binary coding scheme, this expectation is given by the vector of posterior probabilities. Unfortunately these are approximated rather poorly (e.g. values outside the range (0,1)), due to the limited flexibility of the model.
13 Least Squares Classification. Assume each class has its own linear model: y_k(x) = w_k^T x + w_k0. Then we can write y(x) = W~^T x~, with the kth column of W~ the (D+1)-dim vector w~_k = (w_k0, w_k^T)^T and x~ = (1, x^T)^T. Given training data {x_n, t_n}, n = 1, ..., N, the nth row of X~ is x~_n^T and the nth row of T is t_n^T. The sum-of-squares error function for classification is E_D(W~) = (1/2) Tr{ (X~ W~ - T)^T (X~ W~ - T) }. Setting the derivative to 0 gives the closed-form solution W~ = (X~^T X~)^{-1} X~^T T = X~+ T, where X~+ is the pseudoinverse of X~. Property: if every target vector in the training set satisfies some linear constraint a^T t_n + b = 0 for some constants a, b, then the model prediction for any value of x satisfies the same constraint, a^T y(x) + b = 0.
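The closed-form pseudoinverse solution above can be sketched in a few lines (an illustration, not the lecture's own code); with 1-of-N targets, the linear-constraint property means the model outputs sum to 1 for any input:

```python
import numpy as np

def least_squares_classifier(X, T):
    """W~ = pinv(X~) T, with a bias column appended to the design matrix."""
    Xt = np.hstack([np.ones((len(X), 1)), X])   # augmented design matrix X~
    return np.linalg.pinv(Xt) @ T               # pseudoinverse solution

def predict(W, X):
    Xt = np.hstack([np.ones((len(X), 1)), X])
    return np.argmax(Xt @ W, axis=1)            # largest output wins
```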
14 Problems with using least squares for classification (figure: logistic regression vs. least-squares regression). Least-squares solutions lack robustness to outliers: if the right answer is 1 and the model says 1.5, it still incurs a loss, so it changes the boundary to avoid being "too correct".
15 For non-Gaussian targets, least-squares regression gives poor decision surfaces (figure: least squares vs. logistic regression). Remember that least squares corresponds to maximum likelihood under a Gaussian conditional distribution. Clearly the binary target vectors have a distribution that is far from Gaussian.
16 Fisher's Linear Discriminant. We can view classification in terms of dimensionality reduction. A simple linear discriminant function is a projection of the D-dimensional data down to 1 dimension. Project: y = w^T x; classify: class C_1 if y >= -w_0, else class C_2. However, projection results in loss of information: classes well separated in the original input space may strongly overlap in 1-d. We will adjust the projection weight vector w to achieve the best separation among classes. But what do we mean by best separation?
17 Fisher's View of Class Separation I. The simplest measure of class separation when projected onto w is the separation of the projected class means. This suggests choosing w so as to maximize m_2 - m_1 = w^T (mu_2 - mu_1), where m_k = w^T mu_k is the projected mean of class k. This can be made arbitrarily large by increasing ||w||. We could handle this by imposing a unit-norm constraint using Lagrange multipliers: max w^T (mu_2 - mu_1) s.t. ||w|| = 1, giving w proportional to mu_2 - mu_1. However, still, if the main direction of variance in each class is not orthogonal to the direction between the means, we will not get good separation (see next slide).
18 Advantage of using Fisher's Criterion. When projected onto the line joining the class means, the classes are not well separated. Fisher chooses a direction that makes the projected classes much tighter, even though their projected means are less far apart.
19 Fisher's View of Class Separation II. Fisher: maximize a function that gives a large separation between the projected class means, while also giving a small variance within each class, thereby minimizing class overlap. Choose the direction maximizing the ratio of between-class variance to within-class variance. This is the direction in which the projected points contain the most information about class membership (under Gaussian assumptions).
20 Fisher's Linear Discriminant. We seek a linear transformation that is best for discrimination: y = w^T x. The projection onto the vector separating the class means seems right: w proportional to mu_2 - mu_1. But we also want small variance within each class: s_k^2 = sum over n in C_k of (y_n - m_k)^2. Fisher's objective function: J(w) = (m_2 - m_1)^2 / (s_1^2 + s_2^2), i.e. between-class separation over within-class variance.
21 Fisher's Linear Discriminant: Derivation. Rewriting the objective in terms of w gives J(w) = (m_2 - m_1)^2 / (s_1^2 + s_2^2) = (w^T S_B w) / (w^T S_W w), with between-class scatter S_B = (mu_2 - mu_1)(mu_2 - mu_1)^T and within-class scatter S_W = sum over n in C_1 of (x_n - mu_1)(x_n - mu_1)^T + sum over n in C_2 of (x_n - mu_2)(x_n - mu_2)^T. Differentiating, J is maximized when (w^T S_B w) S_W w = (w^T S_W w) S_B w. Since w^T S_B w and w^T S_W w are scalars, and S_B w always points in the direction of mu_2 - mu_1, the optimal solution is w proportional to S_W^{-1} (mu_2 - mu_1). The above result is known as Fisher's linear discriminant. Strictly it is not a discriminant, but rather a direction of projection that can be used for classification in conjunction with a decision (e.g. thresholding) operation.
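The closed form w proportional to S_W^{-1}(mu_2 - mu_1) is a few lines of NumPy (an illustrative sketch; the function name is my own). The test below checks that Fisher's direction scores at least as high under J as the naive mean-difference direction:

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher's discriminant direction w ~ S_W^{-1}(m2 - m1), unit-normalized."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)  # within-class scatter
    w = np.linalg.solve(S_W, m2 - m1)
    return w / np.linalg.norm(w)

def fisher_J(v, X1, X2):
    """J(v) = (v^T (m2 - m1))^2 / (v^T S_W v)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    return float((v @ (m2 - m1)) ** 2 / (v @ S_W @ v))
```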
22 Fisher's Linear Discriminant: Computation. The objective is invariant to rescaling of w, so we can choose the denominator to be unity, w^T S_W w = 1, and then maximize w^T S_B w subject to this constraint. This corresponds to the primal Lagrangian L(w, lambda) = w^T S_B w - lambda (w^T S_W w - 1). From the KKT conditions, S_B w = lambda S_W w: a generalized eigenvalue problem, as S_W^{-1} S_B is not symmetric.
23 Fisher's Linear Discriminant: Computation. Given that S_W is symmetric positive definite, we can write S_W = S_W^{1/2} S_W^{1/2}, where S_W^{1/2} is symmetric positive definite as well. Defining v = S_W^{1/2} w, we get (S_W^{-1/2} S_B S_W^{-1/2}) v = lambda v: a regular eigenvalue problem for the symmetric, positive semidefinite matrix S_W^{-1/2} S_B S_W^{-1/2}. We can find solutions v_k and the corresponding w_k = S_W^{-1/2} v_k. Which eigenvector and eigenvalue should we choose? The largest! Why? Transforming to the dual, the constrained objective equals lambda, so we need to maximize over lambda.
24 The Perceptron Model (cca. 1962). Linear discriminant model. The input vector x is first mapped using a fixed nonlinear transformation to give a feature vector phi(x), then used to construct the linear model y(x) = f(w^T phi(x)), where f(a) = +1 if a >= 0 and -1 if a < 0. Typically we use t = +1 for class C_1 and t = -1 for class C_2. The feature vector includes a bias component phi_0(x) = 1.
25 Perceptron Criterion I. The perceptron's algorithm can be motivated by error-function minimization. A natural error would be the number of misclassified patterns. However, this does not lead to a simple learning algorithm, because the error is a piecewise-constant function of w, with discontinuities whenever a change in w causes the decision boundary to move across one of the data points. Gradient methods cannot be immediately applied, as the gradient is zero almost everywhere.
26 Perceptron Criterion II. Patterns in class C_1 will have w^T phi(x_n) > 0; patterns in class C_2 will have w^T phi(x_n) < 0. With the target coding t_n in {+1, -1}, we would hence like all patterns to satisfy w^T phi(x_n) t_n > 0. The perceptron associates zero error with correctly classified patterns, whereas for a misclassified pattern it tries to minimize the quantity -w^T phi(x_n) t_n.
27 Perceptron Criterion III. The perceptron criterion is given by E_P(w) = -sum over n in M of w^T phi(x_n) t_n, where M is the set of misclassified examples. Applying stochastic gradient descent: w^(tau+1) = w^(tau) + eta phi(x_n) t_n. Since the perceptron's decision function is invariant to rescaling of w, we can set eta = 1. As w changes, so will the set of misclassified patterns.
28 Algorithm. We cycle through the training patterns in turn. For each pattern we evaluate the perceptron function output. If the pattern is correctly classified, the weight vector remains unchanged. If the pattern is incorrectly classified: for class C_1 we add the vector phi(x_n) to the current estimate of the weight vector; for class C_2 we subtract the vector phi(x_n) from the current estimate of the weight vector.
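The cycling procedure above can be sketched as follows (a minimal illustration, not the lecture's code); with t in {+1, -1}, adding t_n * phi_n covers both the "add for C_1" and "subtract for C_2" cases:

```python
import numpy as np

def perceptron_train(Phi, t, max_epochs=100):
    """Cycle through patterns; on a mistake add t_n * phi_n to w (eta = 1).
    Phi rows include a bias component; t entries are +1 / -1."""
    w = np.zeros(Phi.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for phi, tn in zip(Phi, t):
            if tn * (w @ phi) <= 0:   # misclassified (or exactly on the boundary)
                w += tn * phi         # perceptron update
                mistakes += 1
        if mistakes == 0:             # a full pass with no errors: converged
            break
    return w
```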
29 Weight and Data Space. Imagine a space in which each axis corresponds to a feature value or to the weight on that feature. A point in this space is a weight vector. Feature vectors are shown in blue, translated away from the origin to reduce clutter. Each training case defines a plane: on one side of the plane the output is wrong. To get all training cases right we need to find a point on the right side of all the planes. This feasible region (if it exists) is a cone with its tip at the origin. (Figure annotations: a feature vector with correct answer = 1; good weights; a feature vector with correct answer = 0; bad weights; the origin.) Slide from Hinton.
30 Perceptron's Convergence. The contribution to the error function from a misclassified pattern is reduced by the update. However, this does not imply that the contributions from other misclassified patterns will have been reduced: the perceptron rule is not guaranteed to reduce the total error function at each stage. Novikoff (1962) proved that the perceptron algorithm converges after a finite number of iterations if the data set is linearly separable. The weight vector is always adjusted by a bounded amount in a direction with which it has a negative dot product, so ||w||^2 can be bounded above by a quantity growing linearly in k, the number of changes to w. But w can also be bounded below: if there exists an (unknown) feasible w*, then every change makes progress in the direction of w* by a positive amount that depends only on the input vectors. Together these can be used to show that the number of updates to the weight vector is bounded by (R / gamma)^2, where R is the maximum norm of an input vector and gamma is the margin achieved by w*.
31 Summary: Perceptron's Convergence. Perceptron convergence theorem: if there exists an exact solution (the data is linearly separable), then the perceptron algorithm is guaranteed to find an exact solution in a finite number of steps. The number of steps could be very large, though: until convergence we cannot distinguish between a non-separable problem and one that is just slow to converge. Even for linearly separable data, there may be many solutions, depending on the parameter initialization and the order in which data points are presented.
32 Perceptron at Work
33 Other Issues with the Perceptron. Does not provide probabilistic outputs. Does not generalize readily to more than 2 classes. Is based on linear combinations of fixed basis functions.
34 What Perceptrons Cannot Learn. The adaptive part of a perceptron cannot even tell if two single-bit features have the same value! Same: (1,1) -> 1; (0,0) -> 1. Different: (1,0) -> 0; (0,1) -> 0. The four feature-output pairs give four inequalities that are impossible to satisfy: w_1 + w_2 >= theta, 0 >= theta, w_1 < theta, w_2 < theta. In data space, the positive and negative cases cannot be separated by a plane. Slide from Hinton.
35 The Logistic Sigmoid (due to its S shape): y = sigma(a) = 1 / (1 + e^{-a}), with a = w^T x. This is also called a squashing function because it maps the entire real axis into a finite interval. For classification, the output is a smooth function of the inputs and the weights. Properties: sigma(0) = 0.5; sigma(-a) = 1 - sigma(a); dy/da = y (1 - y); the inverse is the logit function, a = ln( y / (1 - y) ).
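The listed properties are easy to verify numerically; a small sketch (the function names are my own), with the derivative identity checked against a finite difference in the test:

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid: sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def logit(y):
    """Inverse of the sigmoid: a = ln(y / (1 - y))."""
    return np.log(y / (1.0 - y))
```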
36 Probabilistic Generative Models. Use a class prior and a separate generative model of the input vectors for each class, and compute which model makes a test input vector most probable. The posterior probability of class C_1 is given by: p(C_1 | x) = p(x | C_1) p(C_1) / ( p(x | C_1) p(C_1) + p(x | C_2) p(C_2) ) = 1 / (1 + e^{-a}) = sigma(a), the logistic sigmoid, where a = ln[ p(x | C_1) p(C_1) / ( p(x | C_2) p(C_2) ) ] is called the logit and is given by the log odds.
37 Multiclass Model: Softmax. p(C_k | x) = exp(a_k) / sum_j exp(a_j), where a_k = ln( p(x | C_k) p(C_k) ). This is known as the normalized exponential. It can be viewed as a multiclass generalization of the logistic sigmoid. It is also called a softmax function: it is a smoothed version of "max", since if a_k >> a_j for all j != k, then p(C_k | x) is approximately 1 and p(C_j | x) approximately 0.
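A minimal sketch of the normalized exponential; subtracting max(a) before exponentiating is a standard numerical guard (not mentioned on the slide) that leaves the result unchanged because the shift cancels in the normalizer:

```python
import numpy as np

def softmax(a):
    """Normalized exponential: p_k = exp(a_k) / sum_j exp(a_j)."""
    e = np.exp(a - np.max(a))   # shift for numerical stability; cancels out
    return e / e.sum()
```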
38 Gaussian Class Conditionals. Assume that the input vectors for each class are from a Gaussian distribution, and all classes have the same covariance matrix. The class conditionals are p(x | C_k) = (1/Z) exp( -(1/2) (x - mu_k)^T Sigma^{-1} (x - mu_k) ), with normalizer Z and inverse covariance matrix Sigma^{-1}. For two classes C_1 and C_2, the posterior turns out to be a logistic: p(C_1 | x) = sigma( w^T x + w_0 ), with w = Sigma^{-1} (mu_1 - mu_2) and w_0 = -(1/2) mu_1^T Sigma^{-1} mu_1 + (1/2) mu_2^T Sigma^{-1} mu_2 + ln( p(C_1) / p(C_2) ). The quadratic terms cancel due to the common covariance.
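The closed form for w and w_0 can be checked numerically against a direct Bayes computation; a sketch with made-up means, covariance and priors (all values illustrative):

```python
import numpy as np

def posterior_params(mu1, mu2, Sigma, p1, p2):
    """w, w0 of p(C1|x) = sigma(w^T x + w0) for equal-covariance Gaussian classes."""
    Sinv = np.linalg.inv(Sigma)
    w = Sinv @ (mu1 - mu2)
    w0 = (-0.5 * mu1 @ Sinv @ mu1 + 0.5 * mu2 @ Sinv @ mu2
          + np.log(p1 / p2))
    return w, w0

def gauss(x, mu, Sigma):
    """Multivariate Gaussian density (for the direct Bayes check)."""
    d = x - mu
    Sinv = np.linalg.inv(Sigma)
    norm = np.sqrt((2 * np.pi) ** len(x) * np.linalg.det(Sigma))
    return np.exp(-0.5 * d @ Sinv @ d) / norm
```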
39 Interpretation of Decision Boundaries. The quadratic terms cancel due to the common covariance, so the sigmoid takes a linear function of x as argument: p(C_1 | x) = sigma( w^T x + w_0 ), with w = Sigma^{-1} (mu_1 - mu_2) and w_0 = -(1/2) mu_1^T Sigma^{-1} mu_1 + (1/2) mu_2^T Sigma^{-1} mu_2 + ln( p(C_1) / p(C_2) ). The decision boundaries correspond to surfaces along which the posteriors are constant, so they will be given by linear functions of x; thus decision boundaries are linear functions in input space. The prior probabilities enter only through the bias parameter w_0, so changes in the priors have the effect of making parallel shifts of the decision boundary (more generally, of the parallel contours of constant posterior probability).
40 A picture of the two Gaussian models and the resulting posterior for the red class, p(C_1 | x). The logistic sigmoid in the right-hand plot is coloured using a proportion of red tone given by p(C_1 | x) and a proportion of blue tone given by p(C_2 | x).
41 Class posteriors when the covariance matrices are different for different classes. The decision surface is planar when the covariance matrices are the same, and quadratic when they are not.
42 Effect of using Basis Functions. Centers of Gaussian basis functions (with green iso-contours) in input space. Linear decision boundary (logistic regression) in feature space. The decision boundary induced in input space is nonlinear.
43 Probabilistic Discriminative Models: Logistic Regression. In our discussion of generative approaches, we saw that under general assumptions the class posterior for C_1 can be written as a logistic sigmoid acting on a linear function of the feature vector: p(C_1 | phi) = y(phi) = sigma( w^T phi ), where sigma(a) = 1 / (1 + exp(-a)). In logistic regression, we use the functional form of this generalized linear model explicitly. It has fewer adaptive parameters compared to the generative model: for an M-dimensional feature space, the discriminative model has M parameters, while the generative model has 2M parameters for the means + M(M+1)/2 for the shared(!) covariance. Quadratic versus linear number of parameters!
44 Maximum Likelihood for Logistic Regression. For a dataset { (phi_n, t_n) }, n = 1, ..., N, with t_n in {0, 1} and y_n = sigma( w^T phi_n ), the likelihood is p(t | w) = product over n of y_n^{t_n} (1 - y_n)^{1 - t_n}. Taking the negative log gives the cross-entropy error E(w) = -sum_n [ t_n ln y_n + (1 - t_n) ln(1 - y_n) ], with gradient grad E(w) = sum_n ( y_n - t_n ) phi_n, a similar form to the gradient of the sum-of-squares regression model.
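Plugging the gradient grad E = Phi^T (y - t) into plain gradient descent gives a minimal trainer (an illustrative sketch, not the lecture's code; the step size and step count are arbitrary choices):

```python
import numpy as np

def logreg_gd(Phi, t, lr=0.05, steps=500):
    """Minimize the cross-entropy error by batch gradient descent."""
    w = np.zeros(Phi.shape[1])
    for _ in range(steps):
        y = 1.0 / (1.0 + np.exp(-(Phi @ w)))  # y_n = sigma(w^T phi_n)
        w -= lr * Phi.T @ (y - t)             # grad E(w) = Phi^T (y - t)
    return w
```

Note that for perfectly separable data the maximum-likelihood weights grow without bound, so in practice one stops early or regularizes; the fixed step count above keeps the sketch short.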
45 Iteratively Reweighted Least Squares. The Newton-Raphson update is w_new = w_old - H^{-1} grad E(w). For the logistic model, grad E(w) = Phi^T (y - t) and H = Phi^T R Phi, where Phi is the N x M design matrix with nth row phi_n^T and R is the diagonal matrix with R_nn = y_n (1 - y_n). Then w_new = w_old - (Phi^T R Phi)^{-1} Phi^T (y - t) = (Phi^T R Phi)^{-1} Phi^T R z, with z = Phi w_old - R^{-1} (y - t). It follows that each step solves normal equations with a non-constant weighting matrix R.
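The update can be sketched directly from the formulas above (illustrative code, not from the lecture); the clipping of the diagonal weights is my own guard against R becoming numerically singular when some y_n saturates:

```python
import numpy as np

def irls(Phi, t, iters=20):
    """IRLS / Newton-Raphson: w <- (Phi^T R Phi)^{-1} Phi^T R z,
    with R = diag(y_n (1 - y_n)) and z = Phi w - R^{-1} (y - t)."""
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        y = 1.0 / (1.0 + np.exp(-(Phi @ w)))
        r = np.clip(y * (1.0 - y), 1e-9, None)  # guard: keep R invertible
        R = np.diag(r)
        z = Phi @ w - (y - t) / r               # effective targets
        w = np.linalg.solve(Phi.T @ R @ Phi, Phi.T @ R @ z)
    return w
```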
46 Logistic Regression: Chain Rule for Error Derivatives. With E = -sum_n [ t_n ln y_n + (1 - t_n) ln(1 - y_n) ], y = sigma(a), and a = w^T phi: dE/dy_n = -t_n / y_n + (1 - t_n) / (1 - y_n) = (y_n - t_n) / ( y_n (1 - y_n) ); dy/da = y (1 - y); da/dw = phi. By the chain rule, dE/dw = sum_n ( y_n - t_n ) phi_n.
47 Facts on IRLS. The weighting matrix R is not constant (it depends on w), but the Hessian is positive definite. This means that we have to iterate to find the solution, but the likelihood function is concave in w: we have a unique optimum. The nth component of z can be interpreted as an effective target value obtained by making a local linear approximation to the logistic sigmoid around the current operating point. The elements of the diagonal weighting matrix R can be interpreted as variances. We can interpret IRLS as the solution to a linearized problem in the space of the variable a = w^T phi (the sigmoid argument).
48 Readings: Bishop, Ch. 4, up to 4.3.4
More informationL = n i, i=1. dp p n 1
Exchageable sequeces ad probabilities for probabilities 1996; modified 98 5 21 to add material o mutual iformatio; modified 98 7 21 to add Heath-Sudderth proof of de Fietti represetatio; modified 99 11
More informationChapter 4. Fourier Series
Chapter 4. Fourier Series At this poit we are ready to ow cosider the caoical equatios. Cosider, for eample the heat equatio u t = u, < (4.) subject to u(, ) = si, u(, t) = u(, t) =. (4.) Here,
More informationExponential Families and Bayesian Inference
Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where
More informationJacob Hays Amit Pillay James DeFelice 4.1, 4.2, 4.3
No-Parametric Techiques Jacob Hays Amit Pillay James DeFelice 4.1, 4.2, 4.3 Parametric vs. No-Parametric Parametric Based o Fuctios (e.g Normal Distributio) Uimodal Oly oe peak Ulikely real data cofies
More informationDistributional Similarity Models (cont.)
Distributioal Similarity Models (cot.) Regia Barzilay EECS Departmet MIT October 19, 2004 Sematic Similarity Vector Space Model Similarity Measures cosie Euclidea distace... Clusterig k-meas hierarchical
More informationChapter 7 z-transform
Chapter 7 -Trasform Itroductio Trasform Uilateral Trasform Properties Uilateral Trasform Iversio of Uilateral Trasform Determiig the Frequecy Respose from Poles ad Zeros Itroductio Role i Discrete-Time
More informationRandom Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices
Radom Matrices with Blocks of Itermediate Scale Strogly Correlated Bad Matrices Jiayi Tog Advisor: Dr. Todd Kemp May 30, 07 Departmet of Mathematics Uiversity of Califoria, Sa Diego Cotets Itroductio Notatio
More information6.883: Online Methods in Machine Learning Alexander Rakhlin
6.883: Olie Methods i Machie Learig Alexader Rakhli LECURE 4 his lecture is partly based o chapters 4-5 i [SSBD4]. Let us o give a variat of SGD for strogly covex fuctios. Algorithm SGD for strogly covex
More informationAlgorithms for Clustering
CR2: Statistical Learig & Applicatios Algorithms for Clusterig Lecturer: J. Salmo Scribe: A. Alcolei Settig: give a data set X R p where is the umber of observatio ad p is the umber of features, we wat
More informationLECTURE 17: Linear Discriminant Functions
LECURE 7: Liear Discrimiat Fuctios Perceptro leari Miimum squared error (MSE) solutio Least-mea squares (LMS) rule Ho-Kashyap procedure Itroductio to Patter Aalysis Ricardo Gutierrez-Osua exas A&M Uiversity
More informationNYU Center for Data Science: DS-GA 1003 Machine Learning and Computational Statistics (Spring 2018)
NYU Ceter for Data Sciece: DS-GA 003 Machie Learig ad Computatioal Statistics (Sprig 208) Brett Berstei, David Roseberg, Be Jakubowski Jauary 20, 208 Istructios: Followig most lab ad lecture sectios, we
More informationLinear Regression Demystified
Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to
More informationStatistical Inference Based on Extremum Estimators
T. Rotheberg Fall, 2007 Statistical Iferece Based o Extremum Estimators Itroductio Suppose 0, the true value of a p-dimesioal parameter, is kow to lie i some subset S R p : Ofte we choose to estimate 0
More informationPattern recognition systems Lab 10 Linear Classifiers and the Perceptron Algorithm
Patter recogitio systems Lab 10 Liear Classifiers ad the Perceptro Algorithm 1. Objectives his lab sessio presets the perceptro learig algorithm for the liear classifier. We will apply gradiet descet ad
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 12
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig
More informationChapter 12 EM algorithms The Expectation-Maximization (EM) algorithm is a maximum likelihood method for models that have hidden variables eg. Gaussian
Chapter 2 EM algorithms The Expectatio-Maximizatio (EM) algorithm is a maximum likelihood method for models that have hidde variables eg. Gaussia Mixture Models (GMMs), Liear Dyamic Systems (LDSs) ad Hidde
More informationDiscrete-Time Systems, LTI Systems, and Discrete-Time Convolution
EEL5: Discrete-Time Sigals ad Systems. Itroductio I this set of otes, we begi our mathematical treatmet of discrete-time s. As show i Figure, a discrete-time operates or trasforms some iput sequece x [
More informationa for a 1 1 matrix. a b a b 2 2 matrix: We define det ad bc 3 3 matrix: We define a a a a a a a a a a a a a a a a a a
Math S-b Lecture # Notes This wee is all about determiats We ll discuss how to defie them, how to calculate them, lear the allimportat property ow as multiliearity, ad show that a square matrix A is ivertible
More informationLesson 10: Limits and Continuity
www.scimsacademy.com Lesso 10: Limits ad Cotiuity SCIMS Academy 1 Limit of a fuctio The cocept of limit of a fuctio is cetral to all other cocepts i calculus (like cotiuity, derivative, defiite itegrals
More informationCov(aX, cy ) Var(X) Var(Y ) It is completely invariant to affine transformations: for any a, b, c, d R, ρ(ax + b, cy + d) = a.s. X i. as n.
CS 189 Itroductio to Machie Learig Sprig 218 Note 11 1 Caoical Correlatio Aalysis The Pearso Correlatio Coefficiet ρ(x, Y ) is a way to measure how liearly related (i other words, how well a liear model
More informationDifferentiable Convex Functions
Differetiable Covex Fuctios The followig picture motivates Theorem 11. f ( x) f ( x) f '( x)( x x) ˆx x 1 Theorem 11 : Let f : R R be differetiable. The, f is covex o the covex set C R if, ad oly if for
More informationMCT242: Electronic Instrumentation Lecture 2: Instrumentation Definitions
Faculty of Egieerig MCT242: Electroic Istrumetatio Lecture 2: Istrumetatio Defiitios Overview Measuremet Error Accuracy Precisio ad Mea Resolutio Mea Variace ad Stadard deviatio Fiesse Sesitivity Rage
More informationMachine Learning. Logistic Regression -- generative verses discriminative classifier. Le Song /15-781, Spring 2008
Machie Learig 070/578 Srig 008 Logistic Regressio geerative verses discrimiative classifier Le Sog Lecture 5 Setember 4 0 Based o slides from Eric Xig CMU Readig: Cha. 3..34 CB Geerative vs. Discrimiative
More informationCSCI567 Machine Learning (Fall 2014)
CSCI567 Machie Learig (Fall 2014) Drs. Sha & Liu {feisha,yaliu.cs}@usc.edu October 14, 2014 Drs. Sha & Liu ({feisha,yaliu.cs}@usc.edu) CSCI567 Machie Learig (Fall 2014) October 14, 2014 1 / 49 Outlie Admiistratio
More information1 Review and Overview
DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,
More informationGeometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT
OCTOBER 7, 2016 LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT Geometry of LS We ca thik of y ad the colums of X as members of the -dimesioal Euclidea space R Oe ca
More informationBIOINF 585: Machine Learning for Systems Biology & Clinical Informatics
BIOINF 585: Machie Learig for Systems Biology & Cliical Iformatics Lecture 14: Dimesio Reductio Jie Wag Departmet of Computatioal Medicie & Bioiformatics Uiversity of Michiga 1 Outlie What is feature reductio?
More informationSolution of Final Exam : / Machine Learning
Solutio of Fial Exam : 10-701/15-781 Machie Learig Fall 2004 Dec. 12th 2004 Your Adrew ID i capital letters: Your full ame: There are 9 questios. Some of them are easy ad some are more difficult. So, if
More informationChapter 7: The z-transform. Chih-Wei Liu
Chapter 7: The -Trasform Chih-Wei Liu Outlie Itroductio The -Trasform Properties of the Regio of Covergece Properties of the -Trasform Iversio of the -Trasform The Trasfer Fuctio Causality ad Stability
More informationLecture 2: Monte Carlo Simulation
STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?
More informationEECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1
EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum
More informationLecture 20. Brief Review of Gram-Schmidt and Gauss s Algorithm
8.409 A Algorithmist s Toolkit Nov. 9, 2009 Lecturer: Joatha Keler Lecture 20 Brief Review of Gram-Schmidt ad Gauss s Algorithm Our mai task of this lecture is to show a polyomial time algorithm which
More informationRecitation 4: Lagrange Multipliers and Integration
Math 1c TA: Padraic Bartlett Recitatio 4: Lagrage Multipliers ad Itegratio Week 4 Caltech 211 1 Radom Questio Hey! So, this radom questio is pretty tightly tied to today s lecture ad the cocept of cotet
More informationTEACHER CERTIFICATION STUDY GUIDE
COMPETENCY 1. ALGEBRA SKILL 1.1 1.1a. ALGEBRAIC STRUCTURES Kow why the real ad complex umbers are each a field, ad that particular rigs are ot fields (e.g., itegers, polyomial rigs, matrix rigs) Algebra
More information1 Review and Overview
CS9T/STATS3: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #6 Scribe: Jay Whag ad Patrick Cho October 0, 08 Review ad Overview Recall i the last lecture that for ay family of scalar fuctios F, we
More informationThe Bayesian Learning Framework. Back to Maximum Likelihood. Naïve Bayes. Simple Example: Coin Tosses. Given a generative model
Back to Maximum Likelihood Give a geerative model f (x, y = k) =π k f k (x) Usig a geerative modellig approach, we assume a parametric form for f k (x) =f (x; k ) ad compute the MLE θ of θ =(π k, k ) k=
More informationProblem Set 2 Solutions
CS271 Radomess & Computatio, Sprig 2018 Problem Set 2 Solutios Poit totals are i the margi; the maximum total umber of poits was 52. 1. Probabilistic method for domiatig sets 6pts Pick a radom subset S
More information