Support vector machine revisited


6.867 Machine learning, lecture 8 (Jaakkola)

Lecture topics:
- Support vector machine and kernels
- Kernel optimization, selection

Support vector machine revisited

Our task here is to first turn the support vector machine into its dual form, in which the examples only appear in inner products. To this end, assume we have mapped the examples into feature vectors φ(x) of dimension d and that the resulting training set (φ(x_1), y_1), ..., (φ(x_n), y_n) is linearly separable. Finding the maximum margin linear separator in the feature space now corresponds to solving

    minimize \|\theta\|^2/2 subject to y_t(\theta^T \phi(x_t) + \theta_0) \ge 1, \; t = 1, \ldots, n    (1)

We will discuss later on how slack variables affect the resulting kernel (dual) form; they merely complicate the derivation without changing the procedure. Optimization problems of the above type (convex objective, linear constraints) can be turned into their dual form by means of Lagrange multipliers. Specifically, we introduce a non-negative scalar parameter α_t for each inequality constraint and cast the estimation problem in terms of θ, θ_0, and α = {α_1, ..., α_n}:

    J(\theta, \theta_0; \alpha) = \|\theta\|^2/2 - \sum_{t=1}^{n} \alpha_t \left[ y_t(\theta^T \phi(x_t) + \theta_0) - 1 \right]    (2)

The original minimization problem for θ and θ_0 is recovered by maximizing J(θ, θ_0; α) with respect to α. In other words,

    J(\theta, \theta_0) = \max_{\alpha \ge 0} J(\theta, \theta_0; \alpha)    (3)

where α ≥ 0 means that all the components α_t are non-negative. Let's first verify that J(θ, θ_0) really is equivalent to the original problem. Suppose we set θ and θ_0 such that at least one of the constraints, say the one corresponding to (x_i, y_i), is violated. In that case

    -\alpha_i \left[ y_i(\theta^T \phi(x_i) + \theta_0) - 1 \right] > 0    (4)

for any α_i > 0. Letting α_i → ∞ then gives J(θ, θ_0) = ∞. You can think of the Lagrange multipliers as playing an adversarial role that enforces the margin constraints.
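Before deriving the dual, here is a minimal sketch that solves the primal problem (1) directly as a quadratic program on a toy data set. The cvxpy modeling library and the identity feature map φ(x) = x are assumptions made for illustration; the lecture itself does not prescribe a solver.

```python
# A minimal sketch of the hard-margin primal (Eq. 1), assuming cvxpy.
import numpy as np
import cvxpy as cp

# Toy linearly separable data; phi is taken to be the identity map here.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, d = X.shape

theta = cp.Variable(d)
theta0 = cp.Variable()
# minimize ||theta||^2 / 2  subject to  y_t (theta^T phi(x_t) + theta_0) >= 1
constraints = [y[t] * (X[t] @ theta + theta0) >= 1 for t in range(n)]
cp.Problem(cp.Minimize(cp.sum_squares(theta) / 2), constraints).solve()
print("theta =", theta.value, " theta0 =", theta0.value)
```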

More formally,

    J(\theta, \theta_0) = \begin{cases} \|\theta\|^2/2, & \text{if } y_t(\theta^T \phi(x_t) + \theta_0) \ge 1, \; t = 1, \ldots, n \\ \infty, & \text{otherwise} \end{cases}    (5)

The minimizing θ and θ_0 are therefore those that satisfy the constraints. On the basis of a general set of criteria governing optimality when dealing with Lagrange multipliers, criteria known as the Slater conditions, we can actually switch the maximization over α and the minimization over {θ, θ_0} and get the same answer:

    \min_{\theta, \theta_0} \max_{\alpha \ge 0} J(\theta, \theta_0; \alpha) = \max_{\alpha \ge 0} \min_{\theta, \theta_0} J(\theta, \theta_0; \alpha)    (6)

The left hand side, equivalent to minimizing Eq. (5), is known as the primal form, while the right hand side is the dual form. Let's solve the right hand side by first obtaining θ and θ_0 as functions of the Lagrange multipliers (and the data). To this end,

    \frac{d}{d\theta_0} J(\theta, \theta_0; \alpha) = -\sum_{t=1}^{n} \alpha_t y_t = 0    (7)

    \frac{d}{d\theta} J(\theta, \theta_0; \alpha) = \theta - \sum_{t=1}^{n} \alpha_t y_t \phi(x_t) = 0    (8)

So, again, the solution for θ is in the span of the feature vectors corresponding to the training examples. Substituting this form of the solution for θ back into the objective, and taking into account the constraint corresponding to the optimal θ_0, we get

    J(\alpha) = \min_{\theta, \theta_0} J(\theta, \theta_0; \alpha)    (9)

    = \begin{cases} \sum_{t=1}^{n} \alpha_t - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j [\phi(x_i)^T \phi(x_j)], & \text{if } \sum_{t=1}^{n} \alpha_t y_t = 0 \\ -\infty, & \text{otherwise} \end{cases}    (10)

The dual form of the solution is therefore obtained by maximizing

    \sum_{t=1}^{n} \alpha_t - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j [\phi(x_i)^T \phi(x_j)]    (11)

    subject to \alpha_t \ge 0, \quad \sum_{t=1}^{n} \alpha_t y_t = 0    (12)

This is the dual or kernel form of the support vector machine, and it is also a quadratic optimization problem. The constraints are simpler, however.
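The dual (11)-(12) can be solved with the same kind of tooling. In the sketch below (again assuming cvxpy, with a linear kernel so that φ(x) = x), the quadratic term is written as a sum of squares, ‖Xᵀ(y ⊙ α)‖², so that its convexity is explicit to the solver:

```python
# A sketch of the dual form (Eqs. 11-12), assuming cvxpy and a linear kernel.
import numpy as np
import cvxpy as cp

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

alpha = cp.Variable(n)
# sum_t alpha_t - (1/2) alpha^T diag(y) K diag(y) alpha, with K = X X^T
objective = cp.Maximize(cp.sum(alpha)
                        - 0.5 * cp.sum_squares(X.T @ cp.multiply(y, alpha)))
constraints = [alpha >= 0, y @ alpha == 0]
cp.Problem(objective, constraints).solve()

# Eq. (8): the optimal theta lies in the span of the training feature vectors.
theta_hat = X.T @ (alpha.value * y)
print("alpha =", alpha.value, " theta =", theta_hat)
```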

Moreover, the dimension of the input vectors does not appear explicitly as part of the optimization problem. It is formulated solely on the basis of the Gram matrix:

    K = \begin{bmatrix} \phi(x_1)^T \phi(x_1) & \cdots & \phi(x_1)^T \phi(x_n) \\ \vdots & \ddots & \vdots \\ \phi(x_n)^T \phi(x_1) & \cdots & \phi(x_n)^T \phi(x_n) \end{bmatrix}    (13)

We have already seen that the maximum margin hyperplane can be constructed on the basis of only a subset of the training examples. This should also hold in terms of the feature vectors. How will this be manifested in the α̂_t's? Many of them will be exactly zero due to the optimization. In fact, they are non-zero only for examples (feature vectors) that are support vectors. Once we have solved for the α̂_t's, we can classify any new example according to the discriminant function

    \hat{y}(x) = \hat{\theta}^T \phi(x) + \hat{\theta}_0    (14)

    = \sum_{t=1}^{n} \hat{\alpha}_t y_t [\phi(x_t)^T \phi(x)] + \hat{\theta}_0    (15)

    = \sum_{t \in SV} \hat{\alpha}_t y_t [\phi(x_t)^T \phi(x)] + \hat{\theta}_0    (16)

where SV is the set of support vectors, i.e., those corresponding to non-zero values of α̂_t. We don't know which examples (feature vectors) become support vectors until we have solved the optimization problem. Moreover, the identity of the support vectors will depend on the feature mapping or the kernel function.

But what is θ̂_0? It appeared to drop out of the optimization problem. We can set θ̂_0 after solving for the α̂_t's by looking at the support vectors. Indeed, for all i ∈ SV we should have

    y_i(\hat{\theta}^T \phi(x_i) + \hat{\theta}_0) = y_i \sum_{t \in SV} \hat{\alpha}_t y_t [\phi(x_t)^T \phi(x_i)] + y_i \hat{\theta}_0 = 1    (17)

from which we can easily solve for θ̂_0. In principle, selecting any single support vector would suffice, but since we typically solve the quadratic program over the α_t's only up to some resolution, these constraints may not be satisfied exactly with equality. It is therefore advisable to construct θ̂_0 as the median of the values implied by the individual support vectors.
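Given a solved set of α̂_t's, everything above needs only kernel evaluations. The sketch below builds the Gram matrix (13), identifies the support vectors, sets θ̂_0 to the median of the per-support-vector solutions of Eq. (17), and evaluates the discriminant (16); the function names and the tolerance are illustrative choices, not from the lecture.

```python
# Kernel-form classification utilities -- a sketch; names and the tolerance
# `tol` are illustrative assumptions, not from the lecture.
import numpy as np

def linear_kernel(x, z):
    return float(np.dot(x, z))  # stand-in for any kernel K(x, x')

def gram_matrix(X, kernel):
    # Eq. (13): K_ij = phi(x_i)^T phi(x_j)
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

def fit_offset(alpha_hat, X, y, kernel, tol=1e-6):
    sv = np.flatnonzero(alpha_hat > tol)  # support vectors: alpha_t > 0
    # Eq. (17) rearranged (using y_i^2 = 1):
    #   theta0 = y_i - sum_{t in SV} alpha_t y_t K(x_t, x_i)
    # The median across support vectors absorbs the solver's finite resolution.
    offsets = [y[i] - sum(alpha_hat[t] * y[t] * kernel(X[t], X[i]) for t in sv)
               for i in sv]
    return np.median(offsets), sv

def discriminant(x, alpha_hat, X, y, theta0, kernel, sv):
    # Eq. (16): yhat(x) = sum_{t in SV} alpha_t y_t K(x_t, x) + theta0
    return sum(alpha_hat[t] * y[t] * kernel(X[t], x) for t in sv) + theta0
```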

What is the geometric margin we attain with some kernel function K(x, x') = φ(x)ᵀφ(x')? It is still 1/‖θ̂‖. In kernel form,

    \hat{\gamma}_{\text{geom}} = \left( \sum_{i=1}^{n} \sum_{j=1}^{n} \hat{\alpha}_i \hat{\alpha}_j y_i y_j K(x_i, x_j) \right)^{-1/2}    (18)

Would it make sense to compare the geometric margins we attain with different kernels? We could perhaps use the margin as a criterion for selecting the best kernel function. Unfortunately this won't work without some care. For example, if we multiply all the feature vectors by 2, then the resulting geometric margin will also be twice as large (we merely expanded the space; the relations between the points remain the same). It is necessary to perform some normalization before any such comparison makes sense.

We have so far assumed that the examples in their feature representations are linearly separable. We would also like to have the kernel form of the relaxed support vector machine formulation

    minimize \|\theta\|^2/2 + C \sum_{t=1}^{n} \xi_t    (19)

    subject to y_t(\theta^T \phi(x_t) + \theta_0) \ge 1 - \xi_t, \quad \xi_t \ge 0, \quad t = 1, \ldots, n    (20)

The resulting dual form is very similar to the simple one we derived above. In fact, the only difference is that the Lagrange multipliers α_t are now also bounded from above by C (the same C as in the above primal formulation). Intuitively, the Lagrange multipliers α_t serve to enforce the classification constraints and adopt larger values for constraints that are harder to satisfy. Without any upper limit, they would simply reach infinity for any constraint that cannot be satisfied. The limit C specifies the point at which we should stop trying to satisfy such constraints. More formally, the dual form is to maximize

    \sum_{t=1}^{n} \alpha_t - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j [\phi(x_i)^T \phi(x_j)]    (21)

    subject to 0 \le \alpha_t \le C, \quad \sum_{t=1}^{n} \alpha_t y_t = 0    (22)

The resulting discriminant function has the same form except that the α̂_t values can be different. What about θ̂_0? To solve for θ̂_0 we need to identify classification constraints that are satisfied with equality. These are no longer simply the ones for which α̂_t > 0, but rather those corresponding to 0 < α̂_t < C. In other words, we have to exclude points that violate the margin constraints; these are the ones for which α̂_t = C.
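In code, the only change relative to the hard-margin dual sketch earlier is the upper bound on α; the margin support vectors with 0 < α̂_t < C are then the ones eligible for setting θ̂_0. A sketch, again assuming cvxpy and a linear kernel:

```python
# A sketch of the soft-margin dual (Eqs. 21-22), assuming cvxpy and a
# linear kernel; the toy data are deliberately not linearly separable.
import numpy as np
import cvxpy as cp

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.5, -2.5], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0])  # third point sits on the wrong side
n, C = len(y), 1.0

alpha = cp.Variable(n)
objective = cp.Maximize(cp.sum(alpha)
                        - 0.5 * cp.sum_squares(X.T @ cp.multiply(y, alpha)))
# Only change from the separable case: alpha is also bounded above by C.
constraints = [alpha >= 0, alpha <= C, y @ alpha == 0]
cp.Problem(objective, constraints).solve()

a = alpha.value
margin_sv = np.flatnonzero((a > 1e-6) & (a < C - 1e-6))  # 0 < alpha_t < C
print("alpha =", np.round(a, 3), " margin support vectors:", margin_sv)
```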

Kernel optimization

Whether we are interested in (linear) classification or regression, we are faced with the problem of selecting an appropriate kernel function. A step in this direction might be to tailor a particular kernel a bit better to the available data. We could, for example, introduce additional parameters into the kernel and optimize those parameters so as to improve performance. These parameters could be as simple as the β parameter in the radial basis kernel, weights on each dimension of the input vectors, or something more flexible such as the best convex combination of basic (fixed) kernels. Key to such an approach is the measure we would optimize. Ideally, this measure would be the generalization error, but we obviously have to settle for a surrogate measure. The surrogate measure could be cross-validation or an alternative criterion related to the generalization error (e.g., the margin); a concrete cross-validation sketch appears below.

Kernel selection

We can also explicitly select among possible kernels and cast the problem as a model selection problem. By choosing a kernel we specify the feature vectors on the basis of which linear predictions are made. Each model¹ (class) refers to a set of linear functions (classifiers) based on the chosen feature representation. In many cases the models are nested in the sense that the more complex model contains the simpler one. We will continue from this point in the next lecture.

¹ In statistics, a model is a family/set of distributions or a family/set of linear separators.
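As a concrete instance of tuning a kernel parameter against a surrogate measure, the sketch below grid-searches the width of a radial basis kernel by cross-validated accuracy. scikit-learn is an assumption here, and its `gamma` plays the role of the lecture's β only up to the library's own parameterization:

```python
# Kernel parameter optimization by cross-validation -- a sketch assuming
# scikit-learn; `gamma` stands in for the radial basis kernel's beta.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

best_beta, best_score = None, -np.inf
for beta in [0.01, 0.1, 1.0, 10.0]:
    # Surrogate for the generalization error: 5-fold cross-validated accuracy.
    score = cross_val_score(SVC(kernel="rbf", gamma=beta, C=1.0), X, y, cv=5).mean()
    if score > best_score:
        best_beta, best_score = beta, score

print("selected beta =", best_beta, " cv accuracy =", round(best_score, 3))
```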

Cite as: Tommi Jaakkola, course materials for 6.867 Machine Learning, Fall 2006. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].