Machine Learning: Logistic Regression. Lecture 04
1 Machine Learning: Logistic Regression. Razvan C. Bunescu, School of Electrical Engineering and Computer Science, bunescu@ohio.edu
2 Supervised Learning. Task = learn an unknown function t : X → T that maps input instances x ∈ X to output targets t(x) ∈ T. Classification: the output t(x) ∈ T is one of a finite set of discrete categories. Regression: the output t(x) ∈ T is continuous, or has a continuous component. The target function t(x) is known only through a noisy set of training examples: (x_1, t_1), (x_2, t_2), ..., (x_N, t_N).
3 Supervised Learning. Training: training examples (x_k, t_k) are fed to a learning algorithm, which produces a model h. Testing: test examples (x, t) are fed to the model h to measure generalization performance.
4 Parametric Approaches to Supervised Learning. Task = build a function h(x) such that: h matches t well on the training data => h is able to fit data that it has seen; h also matches t well on test data => h is able to generalize to unseen data. Task = choose h from a "nice" class of functions that depend on a vector of parameters w: h(x) ≡ h_w(x) ≡ h(w, x). What classes of functions are "nice"?
5 Neurons. The soma is the central part of the neuron, where the input signals are combined. Dendrites are cellular extensions where the majority of the input occurs. The axon is a fine, long projection that carries nerve signals to other neurons. Synapses are molecular structures between axon terminals and other neurons, where the communication takes place.
6 Neuron Models
7 Spiking/LIF Neuron Function
8 Neuron Models
9 McCulloch-Pitts Neuron Function. Inputs x_0, x_1, x_2, x_3 are combined as Σ_i w_i x_i and passed through an activation/output function f to produce h(x). Algebraic interpretation: the output of the neuron is a linear combination of inputs from other neurons, rescaled by the synaptic weights. Weights w_i correspond to the synaptic weights (activating or inhibiting). Summation corresponds to combination of signals in the soma. It is often transformed through an activation/output function.
10 Activation Functions. Unit step (Perceptron): f(z) = 0 if z < 0, 1 if z ≥ 0. Logistic (Logistic Regression): f(z) = 1 / (1 + e^(−z)). Identity (Linear Regression): f(z) = z.
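A minimal numpy sketch of these three activation functions (illustrative, not from the slides):

    import numpy as np

    def unit_step(z):
        # Perceptron activation: 0 if z < 0, else 1
        return np.where(z < 0, 0.0, 1.0)

    def logistic(z):
        # Logistic Regression activation: 1 / (1 + e^(-z))
        return 1.0 / (1.0 + np.exp(-z))

    def identity(z):
        # Linear Regression activation
        return z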
11 Linear Regression. Inputs x_1, x_2, x_3 are combined as Σ_i w_i x_i and passed through the identity activation f(z) = z, so h(x) = Σ_i w_i x_i = wᵀx. Polynomial curve fitting is linear regression: x = φ(x) = [1, x, x^2, ..., x^M]ᵀ, h(x) = wᵀx.
12 McCulloch-Pitts Neuron Function. Inputs x_0, x_1, x_2, x_3 are combined as Σ_i w_i x_i and passed through an activation/output function f to produce h(x). Algebraic interpretation: the output of the neuron is a linear combination of inputs from other neurons, rescaled by the synaptic weights. Weights w_i correspond to the synaptic weights (activating or inhibiting). Summation corresponds to combination of signals in the soma. It is often transformed through a monotonic activation/output function.
13 Logistic Regression. Inputs x_0, x_1, x_2, x_3 are combined as Σ_i w_i x_i and passed through the logistic activation f(z) = 1 / (1 + exp(−z)), so h(x) = σ(wᵀx) = 1 / (1 + exp(−wᵀx)). Training set is (x_1, t_1), (x_2, t_2), ..., (x_N, t_N), with x = [1, x_1, x_2, ..., x_k]ᵀ. Can be used for both classification and regression: Classification: T = {C_1, C_2} = {1, 0}. Regression: T = [0, 1] (i.e. output needs to be normalized).
14 Logistic Regression for Binary Classification. Model output can be interpreted as posterior class probabilities: p(C_1|x) = σ(wᵀx) = 1 / (1 + exp(−wᵀx)), p(C_2|x) = 1 − σ(wᵀx) = exp(−wᵀx) / (1 + exp(−wᵀx)). How do we train a logistic regression model? What error/cost function to minimize?
15 Logistic Regression Learning. Learning = finding the "right" parameters w = [w_0, w_1, ..., w_k]. Find w that minimizes an error function E(w) which measures the misfit between h(x_n, w) and t_n. Expect that h(x, w) performing well on training examples x_n ⇒ h(x, w) will perform well on arbitrary test examples x ∈ X. Least squares error function? E(w) = (1/2) Σ_{n=1..N} {h(x_n, w) − t_n}^2. Differentiable => can use gradient descent. Non-convex => not guaranteed to find the global optimum.
16 Maximum Likelihood. Training set is D = {⟨x_n, t_n⟩ | t_n ∈ {0, 1}, n ∈ 1..N}. Let h_n = p(C_1|x_n) = p(t_n = 1|x_n) = σ(wᵀx_n). Maximum likelihood (ML) principle: find parameters w that maximize the likelihood of the labels. The likelihood function is p(t|w) = Π_{n=1..N} h_n^{t_n} (1 − h_n)^{1 − t_n}. The negative log-likelihood (cross-entropy) error function: E(w) = −ln p(t|x, w) = −Σ_{n=1..N} {t_n ln h_n + (1 − t_n) ln(1 − h_n)}.
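A minimal numpy sketch of this cross-entropy error, assuming examples are the rows of X and t is a 0/1 label vector (names are illustrative):

    import numpy as np

    def cross_entropy_error(w, X, t):
        # h_n = sigma(w^T x_n) for every example
        h = 1.0 / (1.0 + np.exp(-X.dot(w)))
        # E(w) = -sum_n { t_n ln h_n + (1 - t_n) ln(1 - h_n) }
        return -np.sum(t * np.log(h) + (1 - t) * np.log(1 - h))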
17 Maximum Likelihood Learning for Logistic Regression. The ML solution is: w_ML = argmax_w p(t|w) = argmin_w E(w), and E(w) is convex in w. The ML solution is given by ∇E(w) = 0. Cannot solve analytically => solve numerically with gradient-based methods: stochastic gradient descent, conjugate gradient, L-BFGS, etc. The gradient is (prove it): ∇E(w) = Σ_{n=1..N} (h_n − t_n) x_n.
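A hedged sketch of batch gradient descent using this gradient; the learning rate eta and iteration count are illustrative choices, not from the slides:

    import numpy as np

    def train_logistic(X, t, eta=0.1, iters=1000):
        # X: N x (k+1) design matrix (first column all ones), t: N labels in {0, 1}
        w = np.zeros(X.shape[1])
        for _ in range(iters):
            h = 1.0 / (1.0 + np.exp(-X.dot(w)))
            grad = X.T.dot(h - t)   # sum_n (h_n - t_n) x_n
            w -= eta * grad
        return w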
18 Regularized Logistic Regression. Use a Gaussian prior over the parameters w = [w_0, w_1, ..., w_M]ᵀ: p(w) = N(w|0, α^(−1) I) = (α / 2π)^((M+1)/2) exp(−(α/2) wᵀw). Bayes' theorem: p(w|t) = p(t|w) p(w) / p(t) ∝ p(t|w) p(w). MAP solution: w_MAP = argmax_w p(t|w) p(w).
19 Regularized Logistic Regression. MAP solution: w_MAP = argmax_w p(t|w) p(w) = argmin_w −ln p(t|w) p(w) = argmin_w −ln p(t|w) − ln p(w) = argmin_w E_D(w) − ln p(w) = argmin_w E_D(w) + (α/2) wᵀw = argmin_w E_D(w) + E_W(w), where E_D(w) = −Σ_{n=1..N} {t_n ln h_n + (1 − t_n) ln(1 − h_n)} is the data term and E_W(w) = (α/2) wᵀw is the regularization term.
20 Regularized Logistic Regression. MAP solution: w_MAP = argmin_w E_D(w) + E_W(w); E(w) is still convex in w. The solution is given by ∇E(w) = 0, with ∇E(w) = ∇E_D(w) + ∇E_W(w) = Σ_{n=1..N} (h_n − t_n) x_n + α w, where h_n = σ(wᵀx_n). Cannot solve analytically => solve numerically: stochastic gradient descent [PRML 3.1.3], Newton-Raphson iterative optimization [PRML 4.3.3], conjugate gradient, L-BFGS.
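The only change from the unregularized gradient is the added α w term; a minimal sketch (names illustrative):

    import numpy as np

    def map_gradient(w, X, t, alpha):
        # grad E(w) = sum_n (sigma(w^T x_n) - t_n) x_n + alpha * w
        h = 1.0 / (1.0 + np.exp(-X.dot(w)))
        return X.T.dot(h - t) + alpha * w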
21 Softmax Regression = Logistic Regression for Multiclass Classification. Multiclass classification: T = {C_1, C_2, ..., C_K} = {1, 2, ..., K}. Training set is (x_1, t_1), (x_2, t_2), ..., (x_N, t_N), with x = [1, x_1, x_2, ..., x_M]ᵀ and t_1, t_2, ..., t_N ∈ {1, 2, ..., K}. One weight vector per class [PRML 4.3.4]: p(C_k|x) = exp(w_kᵀx) / Σ_j exp(w_jᵀx).
22 Softmax Regression (K ≥ 2). Inference: C* = argmax_{C_k} p(C_k|x) = argmax_{C_k} exp(w_kᵀx) / Z(x) = argmax_{C_k} exp(w_kᵀx) = argmax_{C_k} w_kᵀx, where Z(x) = Σ_j exp(w_jᵀx) is a normalization constant. Training using: Maximum Likelihood (ML), or Maximum A Posteriori (MAP) with a Gaussian prior on w.
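Because Z(x) is the same for every class, inference reduces to an argmax over the linear scores; a minimal sketch, assuming the rows of W are the per-class weight vectors:

    import numpy as np

    def predict_class(W, x):
        # argmax_k exp(w_k^T x) / Z(x)  ==  argmax_k w_k^T x
        return np.argmax(W.dot(x))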
23 Softmax Regression. The negative log-likelihood error function is: E_D(w) = −(1/N) Σ_{n=1..N} ln p(t_n|x_n) = −(1/N) Σ_{n=1..N} ln (exp(w_{t_n}ᵀx_n) / Z(x_n)) = −(1/N) Σ_{n=1..N} Σ_{k=1..K} δ_k(t_n) ln (exp(w_kᵀx_n) / Z(x_n)), convex in w, where δ_k(t) = 1 if t = k and 0 if t ≠ k is the Kronecker delta function.
24 Softmax Regression. The ML solution is: w_ML = argmin_w E_D(w). The gradient is (prove it): ∇_{w_k} E_D(w) = −(1/N) Σ_{n=1..N} (δ_k(t_n) − p(C_k|x_n)) x_n = −(1/N) Σ_{n=1..N} (δ_k(t_n) − exp(w_kᵀx_n) / Z(x_n)) x_n, with ∇E_D(w) = [∇_{w_1} E_D(w), ∇_{w_2} E_D(w), ..., ∇_{w_K} E_D(w)].
25 Regularized Softmax Regression. The new cost function is: E(w) = E_D(w) + E_W(w) = −(1/N) Σ_{n=1..N} Σ_{k=1..K} δ_k(t_n) ln (exp(w_kᵀx_n) / Z(x_n)) + (α/2) Σ_{k=1..K} w_kᵀw_k. The new gradient is (prove it): ∇_{w_k} E(w) = −(1/N) Σ_{n=1..N} (δ_k(t_n) − p(C_k|x_n)) x_n + α w_k.
26 Softmax Regression. The ML solution is given by ∇E_D(w) = 0. Cannot solve analytically. Solve numerically, by plugging [cost, gradient] = [E_D(w), ∇E_D(w)] values into general convex solvers: L-BFGS; Newton methods; conjugate gradient; stochastic / minibatch gradient-based methods: gradient descent with / without momentum, AdaGrad, AdaDelta, RMSProp, Adam, ... A sketch of the plug-in pattern follows below.
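One hedged example of handing [cost, gradient] to a general solver, here scipy.optimize.minimize with L-BFGS-B and a toy objective standing in for [E_D, ∇E_D]:

    import numpy as np
    from scipy.optimize import minimize

    def cost_grad(theta):
        # Toy convex objective: cost = ||theta||^2, gradient = 2 * theta
        return np.sum(theta ** 2), 2 * theta

    # jac=True tells scipy that cost_grad returns (cost, gradient) as a pair
    result = minimize(cost_grad, np.ones(5), jac=True, method='L-BFGS-B',
                      options={'maxiter': 100})
    print(result.x)  # converges to the zero vector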
27 Implementation. Need to compute [cost, gradient]: cost = −(1/N) Σ_{n=1..N} Σ_{k=1..K} δ_k(t_n) ln p(C_k|x_n) + (α/2) Σ_{k=1..K} w_kᵀw_k, gradient_k = −(1/N) Σ_{n=1..N} (δ_k(t_n) − p(C_k|x_n)) x_n + α w_k. => need to compute, for k = 1, ..., K, the output p(C_k|x_n) = exp(w_kᵀx_n) / Σ_j exp(w_jᵀx_n). Overflow when the w_kᵀx_n are too large.
28 Implementation: Preventing Overflows. Subtract from each product w_kᵀx the maximum product: c = max_{1≤k≤K} w_kᵀx, so that p(C_k|x) = exp(w_kᵀx − c) / Σ_j exp(w_jᵀx − c).
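A minimal sketch of this numerically stable softmax (names illustrative):

    import numpy as np

    def stable_softmax(scores):
        # scores[k] = w_k^T x; subtracting the max leaves the probabilities unchanged
        c = np.amax(scores)
        e = np.exp(scores - c)
        return e / np.sum(e)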
29 Implementation: Gradient Checking. Want to minimize J(θ), where θ is a scalar. Mathematical definition of the derivative: d/dθ J(θ) = lim_{ε→0} (J(θ + ε) − J(θ − ε)) / (2ε). Numerical approximation of the derivative: d/dθ J(θ) ≈ (J(θ + ε) − J(θ − ε)) / (2ε), where ε is a small constant (e.g. 10^(−4)).
30 Implementation: Gradient Checking. If θ is a vector of parameters θ_i: Compute the numerical derivative with respect to each θ_i. Create a vector v that is ε in position i and 0 everywhere else (how do you do this without a for loop in numpy?). Compute G_num(θ_i) = (J(θ + v) − J(θ − v)) / (2ε). Aggregate all derivatives into the numerical gradient G_num(θ). Compare the numerical gradient G_num(θ) with the implementation of the gradient G_imp(θ): ‖G_num(θ) − G_imp(θ)‖ / ‖G_num(θ) + G_imp(θ)‖ ≤ 10^(−6).
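A hedged sketch of the vector gradient check; one possible answer to the no-for-loop question is that the perturbation vectors are the rows of ε times the identity matrix:

    import numpy as np

    def numerical_gradient(J, theta, eps=1e-4):
        # Rows of eps * I are the perturbation vectors v, built without an explicit loop
        V = eps * np.identity(theta.size)
        return np.array([(J(theta + v) - J(theta - v)) / (2 * eps) for v in V])

    def gradient_check(J, grad_impl, theta, tol=1e-6):
        # Relative difference test from the slide
        g_num = numerical_gradient(J, theta)
        g_imp = grad_impl(theta)
        return np.linalg.norm(g_num - g_imp) / np.linalg.norm(g_num + g_imp) <= tol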
31 Implementation: Vectorization of LR. Version 1: Compute the gradient component-wise. ∇E(w) = Σ_{n=1..N} (h_n − t_n) x_n. Assume example x_n is stored in column X[:,n] of the data matrix X.

    grad = np.zeros(K)
    for n in range(N):
        h = sigmoid(w.dot(X[:,n]))
        temp = h - t[n]
        for k in range(K):
            grad[k] = grad[k] + temp * X[k,n]

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

(Lecture 03)
32 Implementation: Vectorization of LR. Version 2: Compute the gradient, partially vectorized. ∇E(w) = Σ_{n=1..N} (h_n − t_n) x_n.

    grad = np.zeros(K)
    for n in range(N):
        grad = grad + (sigmoid(w.dot(X[:,n])) - t[n]) * X[:,n]

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

(Lecture 03)
33 Implementation: Vectorization of LR. Version 3: Compute the gradient, fully vectorized. ∇E(w) = Σ_{n=1..N} (h_n − t_n) x_n.

    grad = X.dot(sigmoid(w.dot(X)) - t)

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

(Lecture 03)
34 Vectorization of Softmax. Need to compute [cost, gradient]: cost = −(1/N) Σ_{n=1..N} Σ_{k=1..K} δ_k(t_n) ln p(C_k|x_n) + (α/2) Σ_{k=1..K} w_kᵀw_k, gradient_k = −(1/N) Σ_{n=1..N} (δ_k(t_n) − p(C_k|x_n)) x_n + α w_k. => compute the ground truth matrix G such that G[k,n] = δ_k(t_n):

    from scipy.sparse import coo_matrix
    groundTruth = coo_matrix((np.ones(N, dtype=np.uint8),
                              (labels, np.arange(N)))).toarray()
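For intuition, a tiny hedged example of this construction (the labels are illustrative; passing shape makes the K x N dimensions explicit):

    import numpy as np
    from scipy.sparse import coo_matrix

    labels = np.array([2, 0, 1])   # illustrative: t_1 = 2, t_2 = 0, t_3 = 1 (0-based)
    N, K = labels.size, 3
    G = coo_matrix((np.ones(N, dtype=np.uint8),
                    (labels, np.arange(N))), shape=(K, N)).toarray()
    # G[k, n] = 1 exactly when t_n = k, e.g. G[2, 0] == 1 here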
35 Vectorization of Softmax. Compute cost = −(1/N) Σ_{n=1..N} Σ_{k=1..K} δ_k(t_n) ln p(C_k|x_n) + (α/2) Σ_{k=1..K} w_kᵀw_k: Compute the matrix of products w_kᵀx_n. Compute the matrix of w_kᵀx_n − c_n. Compute the matrix of exp(w_kᵀx_n − c_n). Compute the matrix of ln p(C_k|x_n). Compute the log-likelihood.
36 Vectorization of Softmax. Compute grad_k = −(1/N) Σ_{n=1..N} (δ_k(t_n) − p(C_k|x_n)) x_n + α w_k, with Gradient = [grad_1 grad_2 ... grad_K]: Compute the matrix of p(C_k|x_n). Compute the matrix of the gradient of the data term. Compute the matrix of the gradient of the regularization term. A sketch combining these steps follows below.
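A hedged end-to-end sketch of the vectorized cost and gradient, assuming W is K x M (one weight vector per row), X is M x N (examples in columns), and G is the K x N ground truth matrix:

    import numpy as np

    def softmax_cost_grad(W, X, G, alpha):
        N = X.shape[1]
        S = W.dot(X)                     # S[k, n] = w_k^T x_n
        S = S - np.amax(S, axis=0)       # subtract column max c_n to prevent overflow
        P = np.exp(S)
        P = P / np.sum(P, axis=0)        # P[k, n] = p(C_k | x_n)
        cost = -np.sum(G * np.log(P)) / N + 0.5 * alpha * np.sum(W * W)
        grad = -(G - P).dot(X.T) / N + alpha * W
        return cost, grad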
37 Vectorization of Softmax. Useful numpy functions: np.dot, np.amax, np.argmax, np.exp, np.sum, np.log, np.mean.
38 import scipy. scipy.sparse.coo_matrix:

    groundTruth = coo_matrix((np.ones(numCases, dtype=np.uint8),
                              (labels, np.arange(numCases)))).toarray()

scipy.optimize: scipy.optimize.fmin_l_bfgs_b:

    theta, _, _ = fmin_l_bfgs_b(softmaxCost, theta,
                                args=(numClasses, inputSize, decay, images, labels),
                                maxiter=100, disp=1)

Also scipy.optimize.fmin_cg and scipy.optimize.minimize. (Lecture 03)
39 Multiclass Logistic Regression (K ≥ 2). (1) Train one weight vector per class [PRML Chapter 4.3.4]: p(C_k|x) = exp(w_kᵀx) / Σ_j exp(w_jᵀx). (2) More general approach: p(C_k|x) = exp(wᵀφ(x, C_k)) / Σ_j exp(wᵀφ(x, C_j)). Inference: C* = argmax_{C_k} p(C_k|x). (Lecture 07)
40 Logistic Regression (K ≥ 2). (2) Inference in the more general approach: C* = argmax_{C_k} p(C_k|x) = argmax_{C_k} exp(wᵀφ(x, C_k)) / Z(x) = argmax_{C_k} exp(wᵀφ(x, C_k)) = argmax_{C_k} wᵀφ(x, C_k), where Z(x) = Σ_j exp(wᵀφ(x, C_j)) is the partition function. Training using: Maximum Likelihood (ML), or Maximum A Posteriori (MAP) with a Gaussian prior on w. (Lecture 07)
41 Logistic Regression (K ≥ 2) with ML. The negative log-likelihood error function is: E_D(w) = −ln Π_{n=1..N} p(t_n|x_n) = −Σ_{n=1..N} ln (exp(wᵀφ(x_n, t_n)) / Z(x_n)), convex in w. The gradient is (prove it): ∇E_D(w) = [∂E_D/∂w_0, ∂E_D/∂w_1, ..., ∂E_D/∂w_M], with ∂E_D/∂w_i = −Σ_{n=1..N} φ_i(x_n, t_n) + Σ_{n=1..N} Σ_{k=1..K} p(C_k|x_n) φ_i(x_n, C_k). The ML solution is w_ML = argmin_w E_D(w). (Lecture 07)
42 Logistic Regression (K ≥ 2) with ML. Set ∇E_D(w) = 0 ⇒ the ML solution satisfies: Σ_{n=1..N} φ_i(x_n, t_n) = Σ_{n=1..N} Σ_{k=1..K} p(C_k|x_n) φ_i(x_n, C_k) ⇒ for every feature φ_i, the observed value on D should be the same as the expected value on D! Solve numerically: stochastic gradient descent [PRML 3.1.3]; Newton-Raphson iterative optimization (large Hessian!); limited-memory Newton methods (e.g. L-BFGS). (Lecture 07)
43 The Maximum Entropy Principle. The Principle of Insufficient Reason (Principle of Indifference) can be traced back to Pierre Laplace and Jacob Bernoulli. A. L. Berger, S. A. Della Pietra, and V. J. Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1). "Model all that is known and assume nothing about that which is unknown." "Given a collection of facts, choose a model consistent with all the facts, but otherwise as uniform as possible." (Lecture 07)
44 Maximum Likelihood ⇔ Maximum Entropy. (1) Maximize the conditional likelihood: p(t|w) = Π_{n=1..N} p(t_n|x_n, w) = Π_{n=1..N} exp(wᵀφ(x_n, t_n)) / Z(x_n), with w_ML = argmax_w p(t|w). (2) Maximize the conditional entropy: p_ME = argmax_p −Σ_{n=1..N} Σ_{k=1..K} p(C_k|x_n) log p(C_k|x_n), subject to: Σ_{n=1..N} φ(x_n, t_n) = Σ_{n=1..N} Σ_{k=1..K} p(C_k|x_n) φ(x_n, C_k). ⇒ the solution is: p_ME(t|x) = exp(w_MLᵀφ(x, t)) / Z(x) = p_ML(t|x). (Lecture 07)