CS 229, Public Course Problem Set #3 Solutions: Learning Theory and Unsupervised Learning
1. Uniform convergence and Model Selection

In this problem, we will prove a bound on the error of a simple model selection procedure. Let there be a binary classification problem with labels $y \in \{0,1\}$, and let $H_1 \subseteq H_2 \subseteq \cdots \subseteq H_k$ be $k$ different finite hypothesis classes ($|H_i| < \infty$). Given a dataset $S$ of $m$ i.i.d. training examples, we will divide it into a training set $S_{\mathrm{train}}$ consisting of the first $(1-\beta)m$ examples, and a hold-out cross-validation set $S_{\mathrm{cv}}$ consisting of the remaining $\beta m$ examples. Here, $\beta \in (0,1)$.

Let $\hat h_i = \arg\min_{h \in H_i} \hat\varepsilon_{S_{\mathrm{train}}}(h)$ be the hypothesis in $H_i$ with the lowest training error (on $S_{\mathrm{train}}$). Thus, $\hat h_i$ would be the hypothesis returned by training (with empirical risk minimization) using hypothesis class $H_i$ and dataset $S_{\mathrm{train}}$. Also let $h_i^\star = \arg\min_{h \in H_i} \varepsilon(h)$ be the hypothesis in $H_i$ with the lowest generalization error.

Suppose that our algorithm first finds all the $\hat h_i$'s using empirical risk minimization, then uses the hold-out cross-validation set to select the hypothesis from the set $\{\hat h_1, \dots, \hat h_k\}$ with minimum cross-validation error. That is, the algorithm will output

$$\hat h = \arg\min_{h \in \{\hat h_1, \dots, \hat h_k\}} \hat\varepsilon_{S_{\mathrm{cv}}}(h).$$

For this question you will prove the following bound. Let any $\delta > 0$ be fixed. Then with probability at least $1 - \delta$, we have that

$$\varepsilon(\hat h) \le \min_{i=1,\dots,k}\left( \varepsilon(h_i^\star) + \sqrt{\frac{2}{(1-\beta)m}\log\frac{4|H_i|}{\delta}} \right) + \sqrt{\frac{2}{\beta m}\log\frac{4k}{\delta}}.$$

(a) Prove that with probability at least $1 - \delta/2$, for all $\hat h_i$,

$$\left|\varepsilon(\hat h_i) - \hat\varepsilon_{S_{\mathrm{cv}}}(\hat h_i)\right| \le \sqrt{\frac{1}{2\beta m}\log\frac{4k}{\delta}}.$$

Answer: For each $\hat h_i$, the empirical error on the cross-validation set, $\hat\varepsilon_{S_{\mathrm{cv}}}(\hat h_i)$, represents the average of $\beta m$ random variables with mean $\varepsilon(\hat h_i)$, so by the Hoeffding inequality, for any $\hat h_i$,

$$P\left(\left|\varepsilon(\hat h_i) - \hat\varepsilon_{S_{\mathrm{cv}}}(\hat h_i)\right| \ge \gamma\right) \le 2\exp(-2\gamma^2 \beta m).$$

As in the class notes, to ensure that this holds for all $\hat h_i$ simultaneously, we take the union bound over all $k$ of the $\hat h_i$'s:

$$P\left(\exists\, i \text{ s.t. } \left|\varepsilon(\hat h_i) - \hat\varepsilon_{S_{\mathrm{cv}}}(\hat h_i)\right| \ge \gamma\right) \le 2k\exp(-2\gamma^2 \beta m).$$
Setting this term equal to $\delta/2$ and solving for $\gamma$ yields

$$\gamma = \sqrt{\frac{1}{2\beta m}\log\frac{4k}{\delta}},$$

proving the desired bound.

(b) Use part (a) to show that with probability at least $1 - \delta/2$,

$$\varepsilon(\hat h) \le \min_{i=1,\dots,k} \varepsilon(\hat h_i) + \sqrt{\frac{2}{\beta m}\log\frac{4k}{\delta}}.$$

Answer: Let $j = \arg\min_i \varepsilon(\hat h_i)$. Using part (a), with probability at least $1 - \delta/2$,

$$\begin{aligned}
\varepsilon(\hat h) &\le \hat\varepsilon_{S_{\mathrm{cv}}}(\hat h) + \sqrt{\tfrac{1}{2\beta m}\log\tfrac{4k}{\delta}} \\
&= \min_i \hat\varepsilon_{S_{\mathrm{cv}}}(\hat h_i) + \sqrt{\tfrac{1}{2\beta m}\log\tfrac{4k}{\delta}} \\
&\le \hat\varepsilon_{S_{\mathrm{cv}}}(\hat h_j) + \sqrt{\tfrac{1}{2\beta m}\log\tfrac{4k}{\delta}} \\
&\le \varepsilon(\hat h_j) + 2\sqrt{\tfrac{1}{2\beta m}\log\tfrac{4k}{\delta}} \\
&= \min_{i=1,\dots,k} \varepsilon(\hat h_i) + \sqrt{\tfrac{2}{\beta m}\log\tfrac{4k}{\delta}}.
\end{aligned}$$

(c) Let $j = \arg\min_i \left( \varepsilon(h_i^\star) + \sqrt{\frac{2}{(1-\beta)m}\log\frac{4|H_i|}{\delta}} \right)$. We know from class that for $H_j$, with probability at least $1 - \delta/2$,

$$\left|\varepsilon(h) - \hat\varepsilon_{S_{\mathrm{train}}}(h)\right| \le \sqrt{\frac{1}{2(1-\beta)m}\log\frac{4|H_j|}{\delta}}, \quad \forall\, h \in H_j,$$

which in particular implies $\varepsilon(\hat h_j) \le \varepsilon(h_j^\star) + \sqrt{\frac{2}{(1-\beta)m}\log\frac{4|H_j|}{\delta}}$. Use this to prove the final bound given at the beginning of this problem.

Answer: The bounds in parts (a) and (c) both hold simultaneously with probability $(1 - \delta/2)^2 = 1 - \delta + \delta^2/4 > 1 - \delta$, so with probability greater than $1 - \delta$, chaining the result of part (b) with the uniform convergence bound above gives

$$\varepsilon(\hat h) \le \varepsilon(h_j^\star) + \sqrt{\frac{2}{(1-\beta)m}\log\frac{4|H_j|}{\delta}} + \sqrt{\frac{2}{\beta m}\log\frac{4k}{\delta}},$$

which is equivalent to the bound we want to show.
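A brief aside (our addition, not part of the official solution): the two "good" events can also be combined with a simple union bound, which sidesteps any independence consideration. Writing $A$ and $B$ for the events that the bounds in parts (a) and (c) hold,

$$P(A \cap B) \ge 1 - P(A^c) - P(B^c) \ge 1 - \tfrac{\delta}{2} - \tfrac{\delta}{2} = 1 - \delta.$$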
2. VC Dimension

Let the input domain of a learning problem be $X = \mathbb{R}$. Give the VC dimension for each of the following classes of hypotheses. In each case, if you claim that the VC dimension is $d$, then you need to show that the hypothesis class can shatter $d$ points, and explain why there are no $d+1$ points it can shatter.

$h(x) = 1\{a < x\}$, with parameter $a \in \mathbb{R}$.

Answer: VC-dimension = 1.
(a) It can shatter the point $\{0\}$, by choosing $a$ to be $-1$ and $1$.
(b) It cannot shatter any two points $\{x_1, x_2\}$, $x_1 < x_2$, because the labeling $x_1 = 1$, $x_2 = 0$ cannot be realized.

$h(x) = 1\{a < x < b\}$, with parameters $a, b \in \mathbb{R}$.

Answer: VC-dimension = 2.
(a) It can shatter the points $\{0, 2\}$, by choosing $(a,b)$ to be $(3,5)$, $(-1,1)$, $(1,3)$, $(-1,3)$, which realize the four possible labelings.
(b) It cannot shatter any three points $\{x_1, x_2, x_3\}$, $x_1 < x_2 < x_3$, because the labeling $x_1 = x_3 = 1$, $x_2 = 0$ cannot be realized.

$h(x) = 1\{a \sin x > 0\}$, with parameter $a \in \mathbb{R}$.

Answer: VC-dimension = 1. ($a$ controls the amplitude of the sine curve.)
(a) It can shatter the point $\{\pi/2\}$, by choosing $a$ to be $1$ and $-1$.
(b) It cannot shatter any two points $\{x_1, x_2\}$, since the labelings of $x_1$ and $x_2$ flip together as the sign of $a$ changes. If $x_1 = x_2 = 1$ for some $a$, then we cannot achieve $x_1 \neq x_2$; if $x_1 \neq x_2$ for some $a$, then we cannot achieve $x_1 = x_2 = 1$ (the labeling $x_1 = x_2 = 0$ can always be achieved by setting $a = 0$).

$h(x) = 1\{\sin(x + a) > 0\}$, with parameter $a \in \mathbb{R}$.

Answer: VC-dimension = 2. ($a$ controls the phase of the sine curve.)
(a) It can shatter the points $\{\pi/4, 3\pi/4\}$, by choosing $a$ to be $0$, $\pi/2$, $\pi$, and $3\pi/2$.
(b) It cannot shatter any three points $\{x_1, x_2, x_3\}$. Since sine has a period of $2\pi$, define $\tilde{x}_i = x_i \bmod 2\pi$. W.l.o.g., assume $\tilde{x}_1 < \tilde{x}_2 < \tilde{x}_3$. If the labeling $x_1 = x_2 = x_3 = 1$ can be realized, then the labeling $x_1 = x_3 = 1$, $x_2 = 0$ will not be realizable. Notice the similarity to the second class above.
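As a quick numerical illustration (our addition, not part of the original solution), the following MATLAB snippet checks that the interval class $1\{a < x < b\}$ realizes all four labelings of the points $\{0, 2\}$ using the $(a,b)$ pairs given in the answer:

% Sanity check: h(x) = 1{a < x < b} shatters the two points {0, 2}.
pts = [0 2];
ab  = [3 5; -1 1; 1 3; -1 3];    % should realize labelings 00, 10, 01, 11
for i = 1:size(ab,1)
  labels = (ab(i,1) < pts) & (pts < ab(i,2));
  disp(labels);                  % prints [0 0], [1 0], [0 1], [1 1] in turn
end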
3. $\ell_1$ regularization for least squares

In the previous problem set, we looked at the least squares problem where the objective function is augmented with an additional regularization term $\lambda\|\theta\|_2^2$. In this problem we'll consider a similar regularized objective, but this time with a penalty on the $\ell_1$ norm of the parameters, $\lambda\|\theta\|_1$, where $\|\theta\|_1$ is defined as $\sum_i |\theta_i|$. That is, we want to minimize the objective

$$J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(\theta^T x^{(i)} - y^{(i)}\right)^2 + \lambda\sum_{i=1}^{n}|\theta_i|.$$

There has been a great deal of recent interest in $\ell_1$ regularization, which, as we will see, has the benefit of outputting sparse solutions (i.e., many components of the resulting $\theta$ are equal to zero). The $\ell_1$ regularized least squares problem is more difficult than the unregularized or $\ell_2$ regularized cases, because the $\ell_1$ term is not differentiable. However, many efficient algorithms have been developed for this problem that work very well in practice. One very straightforward approach, which we have already seen in class, is the coordinate descent method. In this problem you'll derive and implement a coordinate descent algorithm for $\ell_1$ regularized least squares, and apply it to test data.

(a) Here we'll derive the coordinate descent update for a given $\theta_i$. Given the $X$ and $y$ matrices, as defined in the class notes, as well as a parameter vector $\theta$, how can we adjust $\theta_i$ so as to minimize the optimization objective? To answer this question, we'll rewrite the optimization objective above as

$$J(\theta) = \tfrac{1}{2}\|X\theta - y\|_2^2 + \lambda\|\theta\|_1 = \tfrac{1}{2}\|X_i\theta_i + \bar{X}_i\bar{\theta}_i - y\|_2^2 + \lambda|\theta_i| + \lambda\|\bar{\theta}_i\|_1,$$

where $X_i \in \mathbb{R}^m$ denotes the $i$th column of $X$, and $\bar{\theta}_i$ is equal to $\theta$ except with $\theta_i = 0$; all we have done in rewriting the above expression is to make the $\theta_i$ term explicit in the objective. However, this still contains the $|\theta_i|$ term, which is non-differentiable and therefore difficult to optimize. To get around this we make the observation that the sign of $\theta_i$ must either be non-negative or non-positive. But if we knew the sign of $\theta_i$, then $|\theta_i|$ becomes just a linear term. That is, we can rewrite the objective as

$$J(\theta) = \tfrac{1}{2}\|X_i\theta_i + \bar{X}_i\bar{\theta}_i - y\|_2^2 + \lambda\|\bar{\theta}_i\|_1 + \lambda s_i\theta_i,$$

where $s_i$ denotes the sign of $\theta_i$, $s_i \in \{-1, 1\}$. In order to update $\theta_i$, we can just compute the optimal $\theta_i$ for both possible values of $s_i$ (making sure that we restrict the optimal $\theta_i$ to obey the sign restriction we used to solve for it), then look to see which achieves the best objective value. For each of the possible values of $s_i$, compute the resulting optimal value of $\theta_i$. [Hint: to do this, you can fix $s_i$ in the above equation, then differentiate with respect to $\theta_i$ to find the best value. Finally, clip $\theta_i$ so that it lies in the allowable range; i.e., for $s_i = 1$, you need to clip $\theta_i$ such that $\theta_i \ge 0$.]

Answer: For $s_i = 1$,

$$J(\theta) = \tfrac{1}{2}\operatorname{tr}\!\left[(X_i\theta_i + \bar{X}_i\bar{\theta}_i - y)^T(X_i\theta_i + \bar{X}_i\bar{\theta}_i - y)\right] + \lambda\|\bar{\theta}_i\|_1 + \lambda\theta_i = \tfrac{1}{2}\left(X_i^T X_i\,\theta_i^2 + 2X_i^T(\bar{X}_i\bar{\theta}_i - y)\,\theta_i + \|\bar{X}_i\bar{\theta}_i - y\|_2^2\right) + \lambda\|\bar{\theta}_i\|_1 + \lambda\theta_i,$$

so

$$\frac{\partial J(\theta)}{\partial \theta_i} = X_i^T X_i\,\theta_i + X_i^T(\bar{X}_i\bar{\theta}_i - y) + \lambda,$$

which means the optimal $\theta_i$ is given by

$$\theta_i = \max\left\{\frac{-X_i^T(\bar{X}_i\bar{\theta}_i - y) - \lambda}{X_i^T X_i},\ 0\right\}.$$

Similarly, for $s_i = -1$, the optimal $\theta_i$ is given by

$$\theta_i = \min\left\{\frac{-X_i^T(\bar{X}_i\bar{\theta}_i - y) + \lambda}{X_i^T X_i},\ 0\right\}.$$
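A side remark (our addition, not part of the official solution): the two sign cases combine into the familiar soft-thresholding form

$$\theta_i = \frac{S_\lambda\!\left(-X_i^T(\bar{X}_i\bar{\theta}_i - y)\right)}{X_i^T X_i}, \qquad S_\lambda(u) = \operatorname{sign}(u)\max(|u| - \lambda,\ 0),$$

so the update sets $\theta_i$ exactly to zero whenever the correlation of the $i$th feature with the partial residual has magnitude at most $\lambda$; this is where the sparsity of the solution comes from.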
(b) Implement the above coordinate descent algorithm using the updates you found in the previous part. We have provided a skeleton theta = l1ls(X, y, lambda) function in the q3/ directory. To implement the coordinate descent algorithm, you should repeatedly iterate over all the $\theta_i$'s, adjusting each as you found above. You can terminate the process when $\theta$ changes by less than $10^{-5}$ after all $n$ of the updates.

Answer: The following is our implementation of l1ls.m:

function theta = l1ls(X, y, lambda)

m = size(X,1);
n = size(X,2);
theta = zeros(n,1);
old_theta = ones(n,1);

while (norm(theta - old_theta) > 1e-5)
  old_theta = theta;
  for i = 1:n,
    % compute the two candidate values for theta(i)
    theta(i) = 0;
    theta_i(1) = max((-X(:,i)'*(X*theta - y) - lambda) / (X(:,i)'*X(:,i)), 0);
    theta_i(2) = min((-X(:,i)'*(X*theta - y) + lambda) / (X(:,i)'*X(:,i)), 0);

    % get the objective value for both possible thetas
    theta(i) = theta_i(1);
    obj_theta(1) = 0.5*norm(X*theta - y)^2 + lambda*norm(theta,1);
    theta(i) = theta_i(2);
    obj_theta(2) = 0.5*norm(X*theta - y)^2 + lambda*norm(theta,1);

    % pick the theta which minimizes the objective
    [min_obj, min_ind] = min(obj_theta);
    theta(i) = theta_i(min_ind);
  end
end

(c) Test your implementation on the data provided in the q3/ directory. The [X, y, theta_true] = load_data; function will load all the data; the data was generated by y = X*theta_true + 0.5*randn(m,1), but theta_true is sparse, so that very few of the columns of X actually contain relevant features. Run your l1ls.m implementation on this data set over a range of $\lambda$ values. Comment briefly on how this algorithm might be used for feature selection.

Answer: For $\lambda = 1$, our implementation of $\ell_1$ regularized least squares recovers the exact sparsity pattern of the true parameter that generated the data. In contrast, using any amount of $\ell_2$ regularization still leads to $\theta$'s that contain no zeros. This suggests that $\ell_1$ regularization could be very useful as a feature selection algorithm: we could run $\ell_1$ regularized least squares to see which coefficients are non-zero, then select only these features for use with either least squares or possibly a completely different machine learning algorithm.
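To make the feature-selection use concrete, here is a minimal driver sketch (our addition; the $\lambda$ grid is an assumed example, while load_data and l1ls are the functions named above):

% Sweep lambda and count the surviving (non-zero) coefficients; as lambda
% grows, more entries of theta are driven exactly to zero.
[X, y, theta_true] = load_data;
lambdas = [0.001 0.01 0.1 1 10];     % assumed grid, not from the problem set
for i = 1:length(lambdas)
  theta = l1ls(X, y, lambdas(i));
  fprintf('lambda = %g: %d non-zero coefficients\n', ...
          lambdas(i), sum(abs(theta) > 1e-8));
end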
4. K-Means Clustering

In this problem you'll implement the K-means clustering algorithm on a synthetic data set. There is code and data for this problem in the q4/ directory. Run load X.dat; to load the data file for clustering. Implement the [clusters, centers] = k_means(X, k) function in this directory. As input, this function takes the $m \times n$ data matrix X and the number of clusters k. It should output an $m$-element vector, clusters, which indicates which of the clusters each data point belongs to, and a $k \times n$ matrix, centers, which contains the centroids of each cluster. Run the algorithm on the data provided, with k = 3 and k = 4. Plot the cluster assignments and centroids for each iteration of the algorithm using the draw_clusters(X, clusters, centroids) function. For each k, be sure to run the algorithm several times using different initial centroids.

Answer: The following is our implementation of k_means.m:

function [clusters, centroids] = k_means(X, k)

m = size(X,1);
n = size(X,2);

oldcentroids = zeros(k,n);
centroids = X(ceil(rand(k,1)*m),:);    % k random data points as initial centroids

while (norm(oldcentroids - centroids) > 1e-5)
  oldcentroids = centroids;

  % compute cluster assignments
  for i = 1:m,
    dists = sum((repmat(X(i,:), k, 1) - centroids).^2, 2);
    [min_dist, clusters(i,1)] = min(dists);
  end

  draw_clusters(X, clusters, centroids);
  pause(0.1);

  % compute cluster centroids
  for i = 1:k,
    centroids(i,:) = mean(X(clusters == i, :));
  end
end

Below we show the centroid evolution for two typical runs with k = 3 (the figures are omitted in this transcription). Note that the different starting positions of the clusters lead to different final clusterings.
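As a usage sketch (our addition; the file and function names follow the problem statement, while the distortion comparison is ours):

% Run k-means several times with different random initializations and
% compare the final distortions (sum of squared distances to centroids).
load X.dat;                  % loads the data matrix X
k = 3;                       % also try k = 4
for trial = 1:5
  [clusters, centroids] = k_means(X, k);
  J = sum(sum((X - centroids(clusters,:)).^2));
  fprintf('trial %d: distortion = %g\n', trial, J);
end

Because the objective is non-convex, different restarts can converge to different local optima; keeping the run with the lowest distortion is the usual heuristic.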
5. The Generalized EM algorithm

When attempting to run the EM algorithm, it may sometimes be difficult to perform the M-step exactly; recall that we often need to implement numerical optimization to perform the maximization, which can be costly. Therefore, instead of finding the global maximum of our lower bound on the log-likelihood, an alternative is to just increase this lower bound a little bit, by taking one step of gradient ascent, for example. This is commonly known as the Generalized EM (GEM) algorithm.

Put slightly more formally, recall that the M-step of the standard EM algorithm performs the maximization

$$\theta := \arg\max_\theta \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}.$$

The GEM algorithm, in contrast, performs the following update in the M-step:

$$\theta := \theta + \alpha\,\nabla_\theta \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})},$$

where $\alpha$ is a learning rate which we assume is chosen small enough such that we do not decrease the objective function when taking this gradient step.

(a) Prove that the GEM algorithm described above converges. To do this, you should show that the likelihood is monotonically improving, as it does for the EM algorithm; i.e., show that $\ell(\theta^{(t+1)}) \ge \ell(\theta^{(t)})$.

Answer: We use the same logic as for the standard EM algorithm. Specifically, just as for EM, we have for the GEM algorithm that

$$\begin{aligned}
\ell(\theta^{(t+1)}) &\ge \sum_i \sum_{z^{(i)}} Q_i^{(t)}(z^{(i)}) \log\frac{p(x^{(i)}, z^{(i)}; \theta^{(t+1)})}{Q_i^{(t)}(z^{(i)})} \\
&\ge \sum_i \sum_{z^{(i)}} Q_i^{(t)}(z^{(i)}) \log\frac{p(x^{(i)}, z^{(i)}; \theta^{(t)})}{Q_i^{(t)}(z^{(i)})} \\
&= \ell(\theta^{(t)}),
\end{aligned}$$

where as in EM the first line holds due to Jensen's inequality, and the last line holds because we choose the $Q$ distribution to make this hold with equality. The only difference between EM and GEM is the logic as to why the second line holds: for EM it held because $\theta^{(t+1)}$ was chosen to maximize this quantity, but for GEM it holds by our assumption that we take a gradient step small enough so as not to decrease the objective function.
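For intuition (our addition): writing $F(\theta)$ for the lower bound, a first-order expansion shows why a small enough step cannot decrease it, assuming $F$ is smooth:

$$F\big(\theta + \alpha\nabla_\theta F(\theta)\big) = F(\theta) + \alpha\|\nabla_\theta F(\theta)\|^2 + O(\alpha^2) \ge F(\theta)$$

for sufficiently small $\alpha > 0$.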
(b) Instead of using the EM algorithm at all, suppose we just want to apply gradient ascent to maximize the log-likelihood directly. In other words, we are trying to maximize the (non-convex) function

$$\ell(\theta) = \sum_i \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta),$$

so we could simply use the update

$$\theta := \theta + \alpha\,\nabla_\theta \sum_i \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta).$$

Show that this procedure in fact gives the same update as the GEM algorithm described above.

Answer: Differentiating the log-likelihood directly, we get

$$\frac{\partial}{\partial\theta_j}\sum_i \log\sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta) = \sum_i \frac{1}{\sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta)} \sum_{z^{(i)}} \frac{\partial}{\partial\theta_j} p(x^{(i)}, z^{(i)}; \theta) = \sum_i \sum_{z^{(i)}} \frac{1}{p(x^{(i)}; \theta)} \frac{\partial}{\partial\theta_j} p(x^{(i)}, z^{(i)}; \theta).$$

For the GEM algorithm,

$$\frac{\partial}{\partial\theta_j} \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = \sum_i \sum_{z^{(i)}} \frac{Q_i(z^{(i)})}{p(x^{(i)}, z^{(i)}; \theta)} \frac{\partial}{\partial\theta_j} p(x^{(i)}, z^{(i)}; \theta).$$

But the E-step of the GEM algorithm chooses

$$Q_i(z^{(i)}) = p(z^{(i)} \mid x^{(i)}; \theta) = \frac{p(x^{(i)}, z^{(i)}; \theta)}{p(x^{(i)}; \theta)},$$

so

$$\sum_i \sum_{z^{(i)}} \frac{Q_i(z^{(i)})}{p(x^{(i)}, z^{(i)}; \theta)} \frac{\partial}{\partial\theta_j} p(x^{(i)}, z^{(i)}; \theta) = \sum_i \sum_{z^{(i)}} \frac{1}{p(x^{(i)}; \theta)} \frac{\partial}{\partial\theta_j} p(x^{(i)}, z^{(i)}; \theta),$$

which is the same as the derivative of the log-likelihood.
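To make the GEM update concrete, here is a minimal MATLAB sketch (our addition, not part of the problem set): a two-component one-dimensional Gaussian mixture with equal weights and unit variances, where only the means are learned. The responsibilities w play the role of $Q_i$, and the M-step is a single gradient step rather than an exact maximization. All names and constants here are illustrative assumptions.

% GEM for a 1-D mixture of two unit-variance Gaussians with weights 0.5;
% mu = [mu(1); mu(2)] are the unknown component means.
X = [randn(50,1) - 2; randn(50,1) + 2];   % synthetic data
mu = [-0.5; 0.5];                         % initial means
alpha = 0.05;                             % learning rate, assumed small

for iter = 1:200
  % E-step: responsibilities Q_i(z) = p(z | x_i; mu)
  % (the 1/sqrt(2*pi) normalizing constants cancel in the ratio)
  p1 = 0.5 * exp(-(X - mu(1)).^2 / 2);
  p2 = 0.5 * exp(-(X - mu(2)).^2 / 2);
  w = p1 ./ (p1 + p2);                    % posterior probability of z = 1

  % Generalized M-step: one gradient step on the lower bound;
  % the gradient w.r.t. mu_j is sum_i w_ij * (x_i - mu_j)
  grad = [sum(w .* (X - mu(1))); sum((1 - w) .* (X - mu(2)))];
  mu = mu + alpha * grad;
end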