Feature Selection: Part 1


CSE 546: Machine Learning, Lecture 5
Instructor: Sham Kakade

1 Regression in the high-dimensional setting

How do we learn when the number of features $d$ is greater than the sample size $n$? In the previous lecture, we examined ridge regression and provided a dimension-free rate of convergence. Now let us examine feature selection.

2 Feature Selection

Let us suppose there are $s$ relevant features out of the $d$ possible features. Throughout this analysis, let us assume that:

    $Y = Xw + \eta$,

where $Y \in \mathbb{R}^n$ and $X \in \mathbb{R}^{n \times d}$. We assume that the support of $w$ (the number of nonzero entries) is $s$.

2.1 Loss Minimization (Empirical Risk Minimization)

Define our empirical loss as:

    $\hat{L}(w) = \frac{1}{n} \|Xw - Y\|^2$,

which has no expectation over $Y$.

Suppose we knew the support size $s$. One algorithm is to simply find the estimator which minimizes the empirical loss and has support on only $s$ coordinates. In particular, consider the estimator:

    $\hat{w}_{\text{subset selection}} = \arg\min_{|\mathrm{support}(w)| \le s} \hat{L}(w)$,

where the minimum is over vectors with support size $s$. Computing this estimator is not computationally tractable in general (the naive algorithm runs in time $O(d^s)$). Furthermore, finding the best subset is known to be an NP-hard problem.

How much better is this estimator than the naive estimator? Recall the risk is:

    $R(\hat{w}) = \mathbb{E}_Y \, \|\hat{w} - w\|_\Sigma^2$,

where the expectation is over $Y$. We have the following theorem:

Theorem 2.1. Suppose the support of $w$ is bounded by $s$. We have that the risk is bounded as:

    $R(\hat{w}_{\text{subset selection}}) \le c \, \frac{s \log d}{n} \, \sigma^2$

(where $c$ is a universal constant).
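The subset selection estimator can be sketched directly by brute force, which also shows why the naive algorithm runs in time roughly $d^s$: it fits least squares on every size-$s$ support. A minimal illustration (the function name and toy data below are mine, not from the notes):

```python
# Brute-force best-subset selection: a sketch, not a practical algorithm.
# It enumerates every size-s support, fits least squares on each, and
# keeps the fit with the smallest empirical loss L_hat.
import itertools
import numpy as np

def best_subset(X, Y, s):
    n, d = X.shape
    best_w, best_loss = None, np.inf
    for S in itertools.combinations(range(d), s):   # ~ d^s subsets
        idx = list(S)
        w_S, *_ = np.linalg.lstsq(X[:, idx], Y, rcond=None)
        loss = np.mean((X[:, idx] @ w_S - Y) ** 2)  # empirical loss on this support
        if loss < best_loss:
            w = np.zeros(d)
            w[idx] = w_S
            best_w, best_loss = w, loss
    return best_w

# Toy problem: only features 0 and 2 are relevant (s = 2, d = 5).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
Y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.01 * rng.normal(size=50)
w_hat = best_subset(X, Y, s=2)
print(sorted(np.nonzero(w_hat)[0]))   # the recovered support
```

With tiny noise and a well-separated signal, the brute-force search recovers the true support, but the inner loop already visits $\binom{5}{2} = 10$ subsets; at realistic $d$ and $s$ the count is astronomical.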
2.2 Coordinate dependence?

Clearly, the coordinate system is important here, as the support is defined with respect to this coordinate system. However, note that the scale in each coordinate is irrelevant here. In contrast, note that empirical risk minimization does not depend on the coordinate system.

3 Norms

The $\ell_p$ norm of a vector $x$ is:

    $\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$

The $\ell_0$ norm is defined as:

    $\|x\|_0 = |\{i : x_i \neq 0\}|$,

which is the number of nonzero entries in $x$. Technically, the $\ell_0$ norm is not a norm.

4 Lasso

Let us view the subset selection problem as a regularized problem. A relaxed version of a hard constraint on the size of the subset would be to minimize:

    $\hat{L}(w) + \lambda \|w\|_0$

One can show that for an appropriate choice of $\lambda$ this algorithm also enjoys the same risk guarantee as the hard-constrained subset selection algorithm (up to constants). Unfortunately, minimizing this objective function is also not computationally tractable. A natural convex relaxation is to instead consider minimizing:

    $F(w) = \hat{L}(w) + \lambda \|w\|_1$,

which can be viewed as a convex relaxation of the $\ell_0$ problem. This is referred to as the Lasso.

4.1 Coordinate Scalings

Often it is a good idea to transform the data so that the variance along each coordinate is 1. In other words, for each coordinate $j$, it often makes sense to do the following transformation:

    $X_{i,j} \leftarrow X_{i,j} / Z_j$, where $Z_j = \sqrt{\frac{1}{n} \sum_i X_{i,j}^2}$

Intuitively, this is to remove an arbitrary scale factor. A more precise reason for this will be discussed in the next lecture.
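The coordinate rescaling above is a one-liner in practice. A sketch (the function name is mine) that divides each column by its root mean square so that every coordinate ends up on the same scale:

```python
# Rescale each column j by Z_j = sqrt((1/n) * sum_i X_ij^2), as above.
import numpy as np

def normalize_columns(X):
    Z = np.sqrt((X ** 2).mean(axis=0))   # one Z_j per column
    return X / Z

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 0.1])  # very different scales
Xn = normalize_columns(X)
print((Xn ** 2).mean(axis=0))   # each column now has mean square exactly 1
```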
4.2 Optimization & Coordinate Descent

The 1-dimensional case: Suppose that we are in the 1-dimensional case where each $x_i$ is a scalar and so $w$ is a scalar. The lasso problem is then to minimize:

    $\sum_i (y_i - w x_i)^2 + \lambda |w|$,

where $|w|$ is the absolute value function. To minimize this function, we can again set the gradient to 0 and solve. A subtlety here is that the absolute value function is non-differentiable at 0. Note that for any $w \neq 0$ the gradient is:

    $-2 \sum_i x_i (y_i - w x_i) + \lambda \, \mathrm{sgn}(w)$     (1)

where $\mathrm{sgn}(w)$ is $1$ if $w$ is positive and $-1$ if $w$ is negative.

There are three cases to check. If the minimizer $w$ is positive, then we know that the first-order condition implies that:

    $w = \frac{\sum_i y_i x_i - \lambda/2}{\sum_i x_i^2}$

If we compute the right-hand side and it is positive, then indeed this value is the minimizer. Now suppose the minimizer $w$ is negative; then we know that the first-order condition implies that:

    $w = \frac{\sum_i y_i x_i + \lambda/2}{\sum_i x_i^2}$

If we compute the right-hand side and it is negative, then indeed this value is the minimizer. Now suppose $w$ is 0. Note that $|w|$ is not differentiable at 0. Here, one can show that we must have that:

    $2 \sum_i y_i x_i \in [-\lambda, \lambda]$

So if we compute the left-hand side and it is in the interval $[-\lambda, \lambda]$, then $w = 0$ is a minimizer. To see this, consider any small perturbation so that $w = \epsilon$. Suppose $\epsilon > 0$. For sufficiently small $\epsilon$, the first term in (1) will still be in the interval $[-\lambda, \lambda]$, and so the gradient will be strictly positive (for small $\epsilon$). Thus gradient descent will push us back to 0. Similarly, for $\epsilon < 0$, we will move back to 0. Formally, the subgradient of $|w|$ at 0 can take any value in $[-1, 1]$, each of which gives a valid tangent plane (see the Wikipedia definition).

Coordinate descent: The coordinate descent algorithm for minimizing an objective function $F(w_1, w_2, \ldots, w_d)$ is as follows:

1. Initialize: $w = 0$.
2. Choose a coordinate $i$ (e.g., at random).
3. Update $w$ as follows:

    $w_i \leftarrow \arg\min_{z \in \mathbb{R}} F(w_1, \ldots, w_{i-1}, z, w_{i+1}, \ldots, w_d)$

   where the optimization is over the $i$-th coordinate (holding the other coordinates fixed). Then return to step 2.

Clearly, many natural variants are possible.
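The three cases above give a closed-form scalar update (together they are a soft-threshold at $\lambda/2$), which is exactly what coordinate descent applies to each coordinate's partial residual in turn. A minimal sketch (the function name, fixed sweep count, and toy data are my own choices; a real implementation would add a convergence check):

```python
# Lasso by coordinate descent. For each coordinate i we solve the 1-d
# problem on the partial residual exactly, using the three cases above.
import numpy as np

def lasso_coordinate_descent(X, Y, lam, n_sweeps=100):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_sweeps):
        for i in range(d):
            r = Y - X @ w + X[:, i] * w[i]   # residual with coordinate i removed
            rho = X[:, i] @ r                # plays the role of sum_i y_i x_i
            denom = X[:, i] @ X[:, i]
            if rho > lam / 2:                # positive-minimizer case
                w[i] = (rho - lam / 2) / denom
            elif rho < -lam / 2:             # negative-minimizer case
                w[i] = (rho + lam / 2) / denom
            else:                            # 2*rho in [-lam, lam]: minimizer is 0
                w[i] = 0.0
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
Y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=200)
w_hat = lasso_coordinate_descent(X, Y, lam=10.0)
print(np.nonzero(w_hat)[0])   # concentrates on the true support {0, 3}
```

Note the exact zeros: any coordinate whose correlation with the residual falls inside $[-\lambda/2, \lambda/2]$ is set to 0 outright, which is how the lasso performs feature selection.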
5 Relationship of Lasso to Compressed Sensing

As we discussed earlier, regression can be viewed as finding an approximate solution to an (inconsistent) linear system of equations. In compressed sensing, we are dealing with the setting where the system of equations is consistent, i.e., $Aw = b$ has a solution. Suppose we are in the case where $A$ is of size $n \times d$ and $d > n$, so there are multiple solutions. In particular, we seek the sparsest solution:

    $\min_w \|w\|_0 \quad \text{s.t.} \quad Aw = b$

As before, this problem is not computationally tractable. The convex relaxation is the following optimization problem:

    $\min_w \|w\|_1 \quad \text{s.t.} \quad Aw = b$

Similar to the case of lasso, under certain assumptions this can recover the solution to the $\ell_0$ problem.

6 Greedy Algorithms

There are a variety of greedy algorithms and numerous naming conventions for these algorithms. These algorithms must rely on some stopping condition (or some condition to limit the sparsity level of the solution).

6.1 Stagewise Regression / Matching Pursuit / Boosting

Here, we typically do not regularize our objective function and, instead, directly deal with the empirical loss $\hat{L}(w_1, w_2, \ldots, w_d)$. This class of algorithms for minimizing an objective function $\hat{L}(w_1, w_2, \ldots, w_d)$ is as follows:

1. Initialize: $w = 0$.
2. Choose the coordinate which can result in the greatest decrease in error, i.e.

    $i \leftarrow \arg\min_i \min_{z \in \mathbb{R}} \hat{L}(w_1, \ldots, w_{i-1}, z, w_{i+1}, \ldots, w_d)$

3. Update $w$ as follows:

    $w_i \leftarrow \arg\min_{z \in \mathbb{R}} \hat{L}(w_1, \ldots, w_{i-1}, z, w_{i+1}, \ldots, w_d)$

   where the optimization is over the $i$-th coordinate (holding the other coordinates fixed).
4. While some termination condition is not met, return to step 2.

This termination condition can be looking at the error on some holdout set or simply running the algorithm for some predetermined number of steps.

Variants: Clearly, many variants are possible. Sometimes (for loss functions other than the square loss) it is costly to do the minimization exactly, so we sometimes choose $i$ based on another method (e.g., the magnitude of the gradient of a coordinate). We could also re-optimize the weights of all those features which are currently added.
Also, sometimes we do backward steps where we try to prune away some of the features which were added.

Relation to boosting: In boosting, we sometimes do not explicitly enumerate the set of all features. Instead, we have a weak learner which provides us with a new feature. The importance of this viewpoint is that sometimes it is difficult to enumerate the set of all features (e.g., our features could be decision trees, so our feature vector $x$ could have dimension equal to the number of possible trees). Instead, we just assume some oracle in step 2 which provides us with a feature. There are numerous variants.
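For the square loss, step 2 of the algorithm above has a closed form: re-optimizing coordinate $i$ alone decreases the error by $(x_i^\top r)^2 / \|x_i\|^2$, where $r$ is the current residual, so the greedy choice maximizes that score. A sketch built on that observation (the function name, fixed step count, and toy data are mine):

```python
# Stagewise regression / matching pursuit for the square loss: repeatedly
# pick the coordinate whose 1-d re-optimization most decreases the error,
# update only that coordinate, and stop after a fixed number of steps.
import numpy as np

def matching_pursuit(X, Y, n_steps):
    n, d = X.shape
    w = np.zeros(d)
    col_norms = (X ** 2).sum(axis=0)
    for _ in range(n_steps):
        r = Y - X @ w                          # current residual
        scores = (X.T @ r) ** 2 / col_norms    # error decrease per coordinate
        i = int(np.argmax(scores))             # greedy choice (step 2)
        w[i] += (X[:, i] @ r) / col_norms[i]   # exact 1-d minimization (step 3)
    return w

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 8))
Y = 1.0 * X[:, 1] + 2.0 * X[:, 5] + 0.05 * rng.normal(size=200)
w_hat = matching_pursuit(X, Y, n_steps=10)
print(np.argsort(-np.abs(w_hat))[:2])   # indices of the two largest weights
```

Here the termination condition is simply a fixed number of steps; substituting a holdout-error check, as the notes suggest, is straightforward.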
6.2 Stepwise Regression / Orthogonal Matching Pursuit

Note that the previous algorithm finds $i$ by only checking the improvement in performance keeping all the other variables fixed. At any given iteration, we have some subset $S$ of features whose weights are not 0. Instead, when determining which coordinate to add, we could look at the improvement based on re-optimizing the weights on the full set $S \cup \{i\}$. This is a more costly procedure computationally, though there are some ways to reduce the computational cost.
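A literal sketch of this variant (the function name and toy data are mine): at each step it tries each candidate coordinate, re-fits least squares on $S \cup \{i\}$, and keeps the candidate with the smallest refit loss. This is the costly version described above; practical implementations reduce the per-candidate cost (e.g., by selecting via correlation with the residual and refitting only once per step).

```python
# Stepwise regression / orthogonal matching pursuit, literal version:
# to add a coordinate, re-optimize the weights over the whole candidate
# set S + [i] and pick the i whose refit gives the smallest loss.
import numpy as np

def stepwise_regression(X, Y, s):
    n, d = X.shape
    S, w_S = [], None
    for _ in range(s):
        best = None   # (loss, coordinate, refit weights)
        for i in range(d):
            if i in S:
                continue
            T = S + [i]
            w_T, *_ = np.linalg.lstsq(X[:, T], Y, rcond=None)
            loss = np.mean((X[:, T] @ w_T - Y) ** 2)
            if best is None or loss < best[0]:
                best = (loss, i, w_T)
        S.append(best[1])
        w_S = best[2]
    w = np.zeros(d)
    w[S] = w_S          # weights re-optimized over the final set S
    return w

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 6))
Y = -2.0 * X[:, 2] + 1.0 * X[:, 4] + 0.05 * rng.normal(size=100)
w_hat = stepwise_regression(X, Y, s=2)
print(sorted(np.nonzero(w_hat)[0]))   # the selected support
```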
More informationCOMP th April, 2007 Clement Pang
COMP 540 12 th Aprl, 2007 Cleent Pang Boostng Cobnng weak classers Fts an Addtve Model Is essentally Forward Stagewse Addtve Modelng wth Exponental Loss Loss Functons Classcaton: Msclasscaton, Exponental,
More informationSTAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16
STAT 39: MATHEMATICAL COMPUTATIONS I FALL 218 LECTURE 16 1 why teratve methods f we have a lnear system Ax = b where A s very, very large but s ether sparse or structured (eg, banded, Toepltz, banded plus
More informationBézier curves. Michael S. Floater. September 10, These notes provide an introduction to Bézier curves. i=0
Bézer curves Mchael S. Floater September 1, 215 These notes provde an ntroducton to Bézer curves. 1 Bernsten polynomals Recall that a real polynomal of a real varable x R, wth degree n, s a functon of
More informationLECTURE 9 CANONICAL CORRELATION ANALYSIS
LECURE 9 CANONICAL CORRELAION ANALYSIS Introducton he concept of canoncal correlaton arses when we want to quantfy the assocatons between two sets of varables. For example, suppose that the frst set of
More information2.3 Nilpotent endomorphisms
s a block dagonal matrx, wth A Mat dm U (C) In fact, we can assume that B = B 1 B k, wth B an ordered bass of U, and that A = [f U ] B, where f U : U U s the restrcton of f to U 40 23 Nlpotent endomorphsms
More informationEcon107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)
I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes
More informationMultilayer Perceptron (MLP)
Multlayer Perceptron (MLP) Seungjn Cho Department of Computer Scence and Engneerng Pohang Unversty of Scence and Technology 77 Cheongamro, Namgu, Pohang 37673, Korea seungjn@postech.ac.kr 1 / 20 Outlne
More information