Homework 2 Solutions: EM, Mixture Models, PCA, Duality


CMU 10-715: Machine Learning (Fall 2015)
http://www.cs.cmu.edu/~bapoczos/classes/ml10715_2015fall/
OUT: Oct 5, 2015    DUE: Oct 19, 2015, 10:20 AM

1 An EM algorithm for a Mixture of Bernoullis [Eric; 25 pts]

In this section, you will derive an expectation-maximization (EM) algorithm to cluster black and white images. The inputs x_i can be thought of as vectors of binary values corresponding to black and white pixel values, and the goal is to cluster the images into groups. You will be using a mixture of Bernoullis model to tackle this problem. For the sake of brevity, you do not need to substitute previously derived expressions into later problems. For example, beyond question 1.1.1 you may use P(x_i | p_k) in your answers.

1.1 Mixture of Bernoullis

1. (2 pts) Consider a vector of binary random variables, x \in \{0, 1\}^D. Assume each variable x_d is drawn from a Bernoulli(p_d) distribution, so P(x_d = 1) = p_d. Let p \in [0, 1]^D be the resulting vector of Bernoulli parameters. Write an expression for P(x | p).

   P(x | p) = \prod_{d=1}^{D} p_d^{x_d} (1 - p_d)^{1 - x_d}

2. (2 pts) Now suppose we have a mixture of K Bernoulli distributions: each vector x_i is drawn from some vector of Bernoulli random variables with parameters p_k; we will call this Bernoulli(p_k). Let p = \{p_1, ..., p_K\}. Assume a distribution \pi over the selection of which set of Bernoulli parameters p_k is chosen. Write an expression for P(x_i | p, \pi).

   Let A_k be the event that x_i was drawn from p_k. Then:

   P(x_i | p, \pi) = \sum_k P(x_i, A_k | p, \pi) = \sum_k P(x_i | A_k, p, \pi) P(A_k | p, \pi) = \sum_{k=1}^{K} \pi_k P(x_i | p_k)

3. (2 pts) Finally, suppose we have inputs X = \{x_i\}_{i=1,...,N}. Using the above, write an expression for the log likelihood of the data X, \log P(X | \pi, p).

   \log L(p, \pi) = \sum_{i=1}^{N} \log P(x_i | p, \pi) = \sum_{i=1}^{N} \log \left( \sum_{k=1}^{K} \pi_k P(x_i | p_k) \right)
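
As a quick numerical companion to the log likelihood above (not part of the assignment), the following Octave sketch evaluates \log P(X | \pi, p) for a mixture of Bernoullis. It assumes the same conventions as Section 2 below: Xs is an N x D binary matrix, p is a K x D matrix of Bernoulli parameters, and mix_p is a K x 1 vector of mixture weights; the function name mob_loglik is ours, not part of the handout.

    function ll = mob_loglik(Xs, p, mix_p)
      % log P(x_i | p_k) for every pair (i, k): an N x K matrix whose entries are
      % sum_d [ x_id*log(p_kd) + (1 - x_id)*log(1 - p_kd) ].
      log_px = Xs * log(p)' + (1 - Xs) * log(1 - p)';
      % log [ sum_k pi_k P(x_i | p_k) ] via log-sum-exp for numerical stability.
      a = log_px + log(mix_p(:)');            % add log pi_k to each column
      m = max(a, [], 2);
      ll = sum(m + log(sum(exp(a - m), 2)));  % total log likelihood of the data
    end

The log-sum-exp step is the same trick recommended later in the programming hints; summing the raw products would underflow for realistic D.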

1.2 Expectation step

1. (4 pts) Now, we introduce the latent variables for the EM algorithm. Let z_i \in \{0, 1\}^K be an indicator vector, such that z_{ik} = 1 if x_i was drawn from Bernoulli(p_k), and 0 otherwise. Let Z = \{z_i\}_{i=1,...,N}. What is P(z_i | \pi)? What is P(x_i | z_i, p, \pi)?

   Let A_{ik} be the event that x_i was drawn from p_k. Then:

   P(z_i | \pi) = \prod_{k=1}^{K} \pi_k^{z_{ik}}

   P(x_i | z_i, p, \pi) = \prod_{k=1}^{K} P(x_i | p_k, A_{ik})^{z_{ik}} = \prod_{k=1}^{K} P(x_i | p_k)^{z_{ik}}

2. (2 pts) Using the above two quantities, derive the likelihood of the data and the latent variables, P(Z, X | \pi, p).

   P(Z, X | \pi, p) = \prod_{i=1}^{N} P(x_i, z_i | \pi, p) = \prod_{i=1}^{N} P(x_i | z_i, \pi, p) P(z_i | \pi) = \prod_{i=1}^{N} \prod_{k=1}^{K} \left( \pi_k P(x_i | p_k) \right)^{z_{ik}}

3. (5 pts) Let \eta(z_{ik}) = E[z_{ik} | x_i, \pi, p]. Show that

   \eta(z_{ik}) = \frac{ \pi_k \prod_{d=1}^{D} p_{kd}^{x_{id}} (1 - p_{kd})^{1 - x_{id}} }{ \sum_{j=1}^{K} \pi_j \prod_{d=1}^{D} p_{jd}^{x_{id}} (1 - p_{jd})^{1 - x_{id}} }

   Let p', \pi' be the new parameters that we would like to maximize, so p, \pi are from the previous iteration. Use this to derive the following final expression for the E step in the expectation-maximization algorithm:

   E[ \log P(Z, X | p', \pi') | X, p, \pi ] = \sum_{i=1}^{N} \sum_{k=1}^{K} \eta(z_{ik}) \left[ \log \pi'_k + \sum_{d=1}^{D} \left( x_{id} \log p'_{kd} + (1 - x_{id}) \log (1 - p'_{kd}) \right) \right]

   Solution. Since z_{ik} is a 0/1 indicator,

   \eta(z_{ik}) = E[z_{ik} | x_i, \pi, p] = P(z_{ik} = 1 | x_i, \pi, p) = \frac{ P(x_i | z_{ik} = 1, \pi, p) P(z_{ik} = 1 | \pi, p) }{ \sum_{j=1}^{K} P(x_i | z_{ij} = 1, \pi, p) P(z_{ij} = 1 | \pi, p) } = \frac{ \pi_k \prod_{d=1}^{D} p_{kd}^{x_{id}} (1 - p_{kd})^{1 - x_{id}} }{ \sum_{j=1}^{K} \pi_j \prod_{d=1}^{D} p_{jd}^{x_{id}} (1 - p_{jd})^{1 - x_{id}} }

   Next we compute the complete-data log likelihood:

   \log P(Z, X | \pi', p') = \sum_{i=1}^{N} \sum_{k=1}^{K} z_{ik} \left[ \log \pi'_k + \log P(x_i | p'_k) \right] = \sum_{i=1}^{N} \sum_{k=1}^{K} z_{ik} \left[ \log \pi'_k + \sum_{d=1}^{D} \left( x_{id} \log p'_{kd} + (1 - x_{id}) \log (1 - p'_{kd}) \right) \right]

   Taking the expected value and replacing E[z_{ik} | x_i, \pi, p] with \eta(z_{ik}) finishes the solution.
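
The responsibilities \eta(z_{ik}) are exactly what the E step of the programming part computes. Below is a minimal Octave sketch of that step, normalizing \log \pi_k + \log P(x_i | p_k) across k in log space (the log-sum-exp trick recommended in the hints of Section 2). The interface matches the eta = Estep(Xs, p, mix_p) signature given there, but the implementation is only an illustration, not the official starter code.

    function eta = Estep(Xs, p, mix_p)
      % Unnormalized log responsibilities: log pi_k + log P(x_i | p_k), an N x K matrix.
      log_r = Xs * log(p)' + (1 - Xs) * log(1 - p)' + log(mix_p(:)');
      % Normalize each row in log space so that sum_k eta(i,k) = 1.
      m   = max(log_r, [], 2);
      r   = exp(log_r - m);
      eta = r ./ sum(r, 2);
    end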

1.3 Maximization step

1. (4 pts) We need to maximize the above expression with respect to \pi', p'. First, show that the value of p' that maximizes the E step is

   p'_{kd} = \frac{ \sum_{i=1}^{N} \eta(z_{ik}) x_{id} }{ N_k },   where N_k = \sum_{i=1}^{N} \eta(z_{ik}).

   Solution. Setting the derivative to 0:

   \frac{\partial}{\partial p'_{kd}} E[ \log P(Z, X | \pi', p') ] = \sum_{i=1}^{N} \eta(z_{ik}) \left( \frac{x_{id}}{p'_{kd}} - \frac{1 - x_{id}}{1 - p'_{kd}} \right) = 0

   Multiply by the denominators:

   \sum_{i=1}^{N} \eta(z_{ik}) \left( x_{id} (1 - p'_{kd}) - (1 - x_{id}) p'_{kd} \right) = \sum_{i=1}^{N} \eta(z_{ik}) ( x_{id} - p'_{kd} ) = 0

   Solving for p'_{kd} results in p'_{kd} = \sum_i \eta(z_{ik}) x_{id} / N_k.

2. (4 pts) Show that the value of \pi' that maximizes the E step is

   \pi'_k = \frac{N_k}{N}.

   The exponential families notation may be useful. Alternatively, you can use Lagrange multipliers.

   Solution. We only need to maximize \sum_{i=1}^{N} \sum_{k=1}^{K} \eta(z_{ik}) \log \pi'_k, since the rest is not a function of \pi'. In order to keep \pi' a distribution, we require \sum_k \pi'_k = 1. Let \lambda be the dual variable for this constraint:

   L(\pi', \lambda) = \sum_{i=1}^{N} \sum_{k=1}^{K} \eta(z_{ik}) \log \pi'_k + \lambda \left( 1 - \sum_{k=1}^{K} \pi'_k \right)

   Taking the derivative w.r.t. \pi'_k:

   \frac{\partial L(\pi', \lambda)}{\partial \pi'_k} = \frac{ \sum_i \eta(z_{ik}) }{ \pi'_k } - \lambda = 0

   Solve for \pi'_k and we get \pi'_k = N_k / \lambda. Now we solve for \lambda by substituting this back in:

   L(\lambda) = \sum_{k=1}^{K} N_k \left( \log N_k - \log \lambda \right) + \lambda - \sum_{k=1}^{K} N_k

   Take the derivative w.r.t. \lambda:

   - \frac{ \sum_{k=1}^{K} N_k }{ \lambda } + 1 = 0

   Solve for \lambda: \lambda = \sum_{k=1}^{K} N_k = \sum_k \sum_i \eta(z_{ik}) = N, since the responsibilities of each data point sum to one. Therefore \pi'_k = N_k / N.
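
The two closed-form updates translate directly into a couple of matrix operations. The Octave sketch below implements the unsmoothed M step; its argument list (Xs, eta) is a simplification of the handout's [p, mix_p] = Mstep(Xs, model, alpha1, alpha2) interface, and the Dirichlet-smoothed variant required by the autograder appears in Section 2.

    function [p, mix_p] = Mstep(Xs, eta)
      % N_k = sum_i eta(i,k), a 1 x K row vector of soft cluster sizes.
      Nk = sum(eta, 1);
      % p(k,d) = sum_i eta(i,k) * x_id / N_k      (K x D)
      p = (eta' * Xs) ./ Nk(:);
      % pi_k = N_k / N                            (K x 1)
      mix_p = Nk(:) / size(Xs, 1);
    end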

2 Clustering Images of Numbers [Eric; 25 pts]

In this section you will use the above algorithm to cluster images of numbers. You will be using the MNIST dataset. Each input is a binary vector corresponding to black and white pixels, and is a flattened version of the 28x28 pixel image. We will use the following conventions:

- N is the number of datapoints, D is the dimension of each input, and K is the number of clusters.
- Xs is an N x D matrix of the input data, where row i is the pixel data for picture i.
- p is a K x D matrix of Bernoulli parameters, where row k is the vector of parameters for the k-th mixture of Bernoullis.
- mix_p is a K x 1 vector containing the distribution over the various mixtures.
- eta is an N x K matrix containing the results of the E step, so eta(i, k) = \eta(z_{ik}).
- clusters is an N x 1 vector containing the final cluster labels of the input data. Each label is a number from 1 to K.

2.1 Programming

1. Implement the E step of the algorithm within eta = Estep(Xs, p, mix_p), saving your calculated values within eta.
2. Implement the M step of the algorithm within [p, mix_p] = Mstep(Xs, model, alpha1, alpha2). p and mix_p returned by this function should contain the new values that maximize the E step. alpha1, alpha2 are Dirichlet smoothing parameters (explained below).
3. Implement clusters = MoBlabels(Xs, p, mix_p). This function will take in the estimated parameters and return the resulting labels that cluster the data.
4. Some hints and tips:
- The autograder will separately grade the accuracy of your implementation for the E-step (4 pts), the M-step (4 pts), and the M-step with smoothing (2 pts). Finally, it will run your implementation for several iterations on a subset of the MNIST dataset (8 pts).
- A small subset of the MNIST dataset is provided for your convenience.
- Use the log operator to make your calculations more numerically stable. In particular, pay attention to the calculation of \eta(z_{ik}).
- You will need to avoid zeros in \pi and p, or else you will take log(0). Use Dirichlet prior smoothing with the parameters \alpha_1, \alpha_2 when updating these variables:

   p_{kd} = \frac{ \sum_i \eta(z_{ik}) x_{id} + \alpha_1 }{ N_k + \alpha_1 D },   \pi_k = \frac{ N_k + \alpha_2 }{ N + \alpha_2 K }

- Initialize your parameters p_k by randomly sampling from a Uniform(0, 1) distribution and normalizing each p_k to have unit length, and set \pi_k = 1/K.
- Running your implementation on the MNIST dataset with K = 10 clusters and \alpha_1 = \alpha_2 = 10^{-8} for 20 iterations should give you results similar to the following (our reference code runs in approximately 20 seconds). [Figure: reference cluster means omitted.]
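
Putting the pieces together, one full run of the algorithm looks roughly like the sketch below, which follows the initialization and smoothing instructions above: the smoothed M step is inlined, the Estep sketch from Section 1.2 is reused, and the final line plays the role of MoBlabels by assigning each image to its most responsible mixture. The variable Xs is assumed to be loaded already, and the iteration count is illustrative.

    K = 10; alpha1 = 1e-8; alpha2 = 1e-8; iters = 20;
    [N, D] = size(Xs);                     % Xs: N x D binary data, loaded beforehand

    % Initialization: p_k ~ Uniform(0,1), normalized to unit length; pi_k = 1/K.
    p = rand(K, D);
    p = p ./ sqrt(sum(p.^2, 2));
    mix_p = ones(K, 1) / K;

    for t = 1:iters
      eta = Estep(Xs, p, mix_p);                           % responsibilities eta(i,k)
      Nk  = sum(eta, 1)';                                  % soft cluster sizes, K x 1
      p     = (eta' * Xs + alpha1) ./ (Nk + alpha1 * D);   % smoothed Bernoulli parameters
      mix_p = (Nk + alpha2) / (N + alpha2 * K);            % smoothed mixture weights
    end

    [best, clusters] = max(eta, [], 2);    % hard labels: index of the most responsible mixture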

2.2 Analysis

1. (5 pts) For each cluster k, reshape the pixels p_k into a 28x28 matrix and print the resulting grayscale images. What do you see? Explain, and include one such image in your writeup. See software/octave/doc/interpreter/representing-images.html for help on printing the matrix, or use the provided helper function show_clusters(p, a, b), which will print the mixtures in p in an a x b grid.

2. (2 pts) Using your implemented MoBlabels function, cluster the data. Using the true labels given in ytrain, how many unique digits does each cluster typically have? Are there any clusters that picked out exactly one digit?

3 Kernel PCA [Fish; 25 pts]

Principal component analysis (PCA) is used to emphasize variation and is often used to make data easy to explore and visualize. Suppose we have data x_1, x_2, ..., x_N \in R^d with zero mean. PCA aims at finding a new coordinate system such that the greatest variance under some projection of the data comes to lie on the first coordinate, the second greatest variance on the second coordinate, and so on. The basis vectors of the new coordinate system are the eigenvectors of the covariance matrix, i.e.,

   \frac{1}{N} \sum_{i=1}^{N} x_i x_i^T = \sum_{i=1}^{d} \lambda_i v_i v_i^T,

where v_1, v_2, ..., v_d are orthogonal (<v_i, v_j> = 0 for i \neq j). Then we can plot our data points in the new coordinate system. The position of each data point in the new coordinate system can be derived by projecting x onto the basis vectors of the new coordinate system, i.e., v_1, v_2, ..., v_d.

3.1 Kernel PCA

Often we will want to make a linear or non-linear transformation of the data so that the data are projected to a higher dimensional space. Suppose there is a transformation function \phi: R^d -> R^l. We map all the data to the new space through this function, so we have \phi(x_1), \phi(x_2), ..., \phi(x_N). We again want to do PCA on the transformed data in the new space. How can we do this?
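
For reference, the plain PCA just described takes only a few lines in Octave; the kernel PCA questions below replace this explicit covariance eigendecomposition with operations on a kernel matrix. The snippet is purely illustrative and uses synthetic data.

    % X: N x d data matrix, one (zero-mean) data point per row.
    X = randn(200, 5) * diag([3 2 1 0.5 0.1]);   % synthetic example
    X = X - mean(X, 1);                          % enforce zero mean

    C = (X' * X) / size(X, 1);                   % d x d covariance matrix (1/N) sum_i x_i x_i'
    [V, L] = eig(C);                             % columns of V are eigenvectors v_i
    [lam, order] = sort(diag(L), 'descend');     % sort by decreasing variance
    V = V(:, order);

    scores = X * V;                              % coordinates of each point in the new basis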

1. Write out the covariance matrix, C, for \phi(x_1), \phi(x_2), ..., \phi(x_N). Define \bar{\phi}(x) = \frac{1}{N} \sum_{j=1}^{N} \phi(x_j).

   C = \frac{1}{N} \sum_{i=1}^{N} \left( \phi(x_i) - \bar{\phi}(x) \right) \left( \phi(x_i) - \bar{\phi}(x) \right)^T

2. Now we want to find the basis vectors of the orthogonalized new space by solving for the eigenvectors of C. Using the definition of an eigenvector, \lambda v = C v, explain why v is a linear combination of \phi(x_1) - \bar{\phi}(x), \phi(x_2) - \bar{\phi}(x), ..., \phi(x_N) - \bar{\phi}(x).

   Solution. From the eigenvector equation,

   \lambda v = C v = \frac{1}{N} \sum_{i=1}^{N} \left( \phi(x_i) - \bar{\phi}(x) \right) \left( \phi(x_i) - \bar{\phi}(x) \right)^T v = \frac{1}{N} \sum_{i=1}^{N} < \phi(x_i) - \bar{\phi}(x), v > \left( \phi(x_i) - \bar{\phi}(x) \right),

   so (for \lambda \neq 0) v is a linear combination of these vectors,

   v = \sum_{i=1}^{N} \alpha_i \left( \phi(x_i) - \bar{\phi}(x) \right),   with \alpha_i = \frac{1}{\lambda N} < \phi(x_i) - \bar{\phi}(x), v >.

3. Before starting to derive \alpha and find v, we introduce the kernel function. A kernel function has the form k(x_i, x_j) = < \phi(x_i), \phi(x_j) >. Notice that we do not need to know the function \phi to calculate k(x_i, x_j); we can simply define a function of x_i and x_j with k(x_i, x_j) = k(x_j, x_i). A classic example is the radial basis function (RBF) kernel, k(x_i, x_j) = exp( - ||x_i - x_j||^2 / (2 \sigma^2) ). In order to use these kernel functions, we need to get rid of all the \phi(x_i) and replace them with the kernel function. A kernel matrix is defined as K \in R^{N x N} with K_{ij} = k(x_i, x_j). Since we are dealing with non-centered data, we first use a modified kernel matrix \tilde{K} with \tilde{K}_{ij} = < \phi(x_i) - \bar{\phi}(x), \phi(x_j) - \bar{\phi}(x) >. Using the results of questions 3.1.1 and 3.1.2 and the definition of an eigenvector, show that

   N \lambda \tilde{K} \alpha = \tilde{K}^2 \alpha.

   Solution. Define the matrix \Phi = ( \phi(x_1) - \bar{\phi}(x), \phi(x_2) - \bar{\phi}(x), ..., \phi(x_N) - \bar{\phi}(x) ). So we have v = \Phi \alpha, C = \frac{1}{N} \Phi \Phi^T and \tilde{K} = \Phi^T \Phi. Rewriting \lambda v = C v with these notations, we have

   \lambda \Phi \alpha = \frac{1}{N} \Phi \Phi^T \Phi \alpha.

   Multiplying both sides by \Phi^T, we have

   \lambda \Phi^T \Phi \alpha = \frac{1}{N} \Phi^T \Phi \Phi^T \Phi \alpha,   i.e.,   N \lambda \tilde{K} \alpha = \tilde{K}^2 \alpha.

4. Show that the solutions \alpha of

   N \lambda \alpha = \tilde{K} \alpha

   are also solutions of the equation N \lambda \tilde{K} \alpha = \tilde{K}^2 \alpha.

   Solution. If \alpha satisfies N \lambda \alpha = \tilde{K} \alpha, then multiplying both sides by \tilde{K} gives N \lambda \tilde{K} \alpha = \tilde{K}^2 \alpha, so \alpha also satisfies the previous equation. Thus, by solving for the eigenvectors of \tilde{K}, we get \alpha.

5. Show that

   \tilde{K} = \left( I - \frac{1}{N} e e^T \right) K \left( I - \frac{1}{N} e e^T \right),

   where e = (1, 1, ..., 1)^T \in R^N.

   Solution. Define \Theta = ( \phi(x_1), \phi(x_2), ..., \phi(x_N) ), so that \Phi = \Theta ( I - \frac{1}{N} e e^T ). Rewriting \tilde{K}:

   \tilde{K} = \left( \Theta - \frac{1}{N} \Theta e e^T \right)^T \left( \Theta - \frac{1}{N} \Theta e e^T \right)
             = \Theta^T \Theta - \frac{1}{N} \Theta^T \Theta e e^T - \frac{1}{N} e e^T \Theta^T \Theta + \frac{1}{N^2} e e^T \Theta^T \Theta e e^T
             = K - \frac{1}{N} K e e^T - \frac{1}{N} e e^T K + \frac{1}{N^2} e e^T K e e^T
             = \left( I - \frac{1}{N} e e^T \right) K \left( I - \frac{1}{N} e e^T \right).

6. After obtaining \alpha, we get the un-normalized basis vector v. To normalize it, what is the factor you need to multiply v by?

   ||v||^2 = v^T v = \alpha^T \Phi^T \Phi \alpha = \alpha^T \tilde{K} \alpha,

   so we need to multiply v by 1 / \sqrt{ \alpha^T \tilde{K} \alpha }. You should not simply write "divide by ||v||", because you do not know what \phi is; again, you need to obtain everything by going through the kernel function.

7. For a data point x, what is its position in the normalized new space? Explain why you can get the new coordinates without explicitly calculating \phi(x).

   The projection onto the basis vector \Phi \alpha is

   \frac{1}{ \sqrt{ \alpha^T \tilde{K} \alpha } } \left( \phi(x) - \bar{\phi}(x) \right)^T \Phi \alpha = \frac{1}{ \sqrt{ \alpha^T \tilde{K} \alpha } } \sum_{i=1}^{N} \alpha_i \tilde{k}(x, x_i),

   where \tilde{k}(x, x_i) = < \phi(x) - \bar{\phi}(x), \phi(x_i) - \bar{\phi}(x) >, so only the kernel function is needed: expanding \tilde{k}(x, x_i) gives k(x, x_i) - \frac{1}{N} \sum_j k(x, x_j) - \frac{1}{N} \sum_j k(x_i, x_j) + \frac{1}{N^2} \sum_{j,l} k(x_j, x_l), which can be computed entirely from k.
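
The whole derivation condenses into a short computational recipe: build K, center it, take its leading eigenvectors \alpha, rescale each by 1 / \sqrt{ \alpha^T \tilde{K} \alpha }, and project with the kernel. The Octave sketch below assumes an RBF kernel and is meant only to illustrate the formulas above, not as required code.

    function [Z, A, Kc] = kernel_pca(X, sigma, m)
      % X: N x d data, sigma: RBF bandwidth, m: number of components to keep.
      N  = size(X, 1);
      sq = sum(X.^2, 2);
      K  = exp(-(sq + sq' - 2 * (X * X')) / (2 * sigma^2));   % K_ij = k(x_i, x_j)

      J  = eye(N) - ones(N) / N;        % I - (1/N) e e'
      Kc = J * K * J;                   % centered kernel matrix K~

      [A, L] = eig((Kc + Kc') / 2);     % symmetrize to guard against round-off
      [lam, order] = sort(diag(L), 'descend');   % eigenvalues of K~ equal N*lambda
      A = A(:, order(1:m));             % eigenvectors alpha solving N*lambda*alpha = K~ alpha

      % Normalize each alpha so the implicit basis vector v = Phi*alpha has unit norm:
      % v'v = alpha' K~ alpha, so divide alpha by sqrt(alpha' K~ alpha).
      for j = 1:m
        A(:, j) = A(:, j) / sqrt(A(:, j)' * Kc * A(:, j));
      end

      Z = Kc * A;                       % projections of the training points onto the new basis
    end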

4 Huber function and its application in solving the dual problem [Fish; 35 pts]

4.1 Huber function

Define a function

   B(x, d) = \min_{\lambda \geq 0} \; \lambda + \frac{x^2}{4 (\lambda + d)},

where d > 0.

1. (3 pts) Show that

   B(x, d) = \frac{x^2}{4d}  if |x| \leq 2d,   and   B(x, d) = |x| - d  if |x| > 2d,

   with the minimizer \lambda^* = \max(0, |x|/2 - d). B is often called a Huber function.

   Solution. Differentiating the objective with respect to \lambda, we have 1 - x^2 / (4 (\lambda + d)^2) = 0. Solving, we get \lambda = |x|/2 - d. To match the constraint that \lambda \geq 0, we need |x| \geq 2d. Besides, with \lambda = |x|/2 - d, we have

   B(x, d) = \frac{|x|}{2} - d + \frac{x^2}{4 (|x|/2)} = \frac{|x|}{2} - d + \frac{|x|}{2} = |x| - d.

   On the other side, if |x| < 2d, notice that g(\lambda) = \lambda + x^2 / (4 (\lambda + d)) is an increasing function, since g'(\lambda) = 1 - x^2 / (4 (\lambda + d)^2) \geq 1 - x^2 / (4 d^2) > 0, which means the minimum value is obtained at \lambda = 0. So

   \min_{\lambda \geq 0} \; \lambda + \frac{x^2}{4 (\lambda + d)} = \frac{x^2}{4d}   at \lambda = 0.

2. (3 pts) Is the function B convex in x? (Assume d is a fixed constant.)

   In the range |x| < 2d, \partial B / \partial x = x / (2d) and \partial^2 B / \partial x^2 = 1 / (2d) > 0, so the function is convex in this range. Next we examine the part where |x| > 2d: \partial B / \partial x = sign(x) and \partial^2 B / \partial x^2 = 0, so it is convex there. Then we examine the points where |x| = 2d: the derivatives approaching from the directions |x| < 2d and |x| > 2d are both sign(x), so the derivative exists, and the second derivative is again 0. So the function is also convex at these points. We thus conclude that the function is convex by showing that the second derivative is greater than or equal to zero at all points.

3. (3 pts) Derive the gradient of the function B at any given point (x, d). Remember d > 0.

   \nabla B(x, d) = \left( \frac{\partial B}{\partial x}, \frac{\partial B}{\partial d} \right) =
   \left( \frac{x}{2d}, -\frac{x^2}{4 d^2} \right)  if |x| \leq 2d,   and   \left( sign(x), -1 \right)  if |x| > 2d.

   (The two cases agree at |x| = 2d, so the gradient is continuous.)

4. (3 pts) The function B is often used as a penalty term in classification or regression problems, which have the form

   \min_x \; L(x) + B(x, d),

   where L is the loss function and d is a parameter with positive value. Describe how the parameter d affects the penalty term B on the solution.

   When d is large, B acts like an \ell_2 penalty; when d is small, it acts like an \ell_1 penalty.
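
Because both the closed form and the gradient of B are piecewise, it is worth checking them numerically against the variational definition. The Octave sketch below does exactly that; the function name huber_B is ours, and the constants match the definition B(x, d) = min_{\lambda \geq 0} \lambda + x^2 / (4(\lambda + d)) used in this section.

    function [B, gx, gd] = huber_B(x, d)
      % Closed form of B(x,d) = min_{lambda>=0} lambda + x^2/(4*(lambda+d)),  d > 0,
      % together with its gradient (dB/dx, dB/dd).
      if abs(x) <= 2*d
        B  = x^2 / (4*d);
        gx = x / (2*d);
        gd = -x^2 / (4*d^2);
      else
        B  = abs(x) - d;
        gx = sign(x);
        gd = -1;
      end
    end

    % Quick numerical check against the variational definition:
    % x = 3; d = 0.7; lam = linspace(0, 50, 1e6);
    % [huber_B(x, d), min(lam + x^2 ./ (4*(lam + d)))]   % the two values should agree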

4.2 An application of using the Huber loss to solve a dual problem

Consider the optimization problem

   \min_x \; \sum_{i=1}^{n} d_i x_i^2 + r_i x_i   s.t.  a^T x = 1,  |x_i| \leq 1  for i = 1, ..., n,

where each d_i > 0.

1. (3 pts) Show that the problem has strictly feasible solutions if and only if ||a||_1 > 1.

   By Hölder's inequality, we have a^T x \leq ||a||_1 ||x||_\infty, and ||x||_\infty < 1 implies a^T x < ||a||_1; so a strictly feasible point (a^T x = 1 with every |x_i| < 1) can exist only if ||a||_1 > 1. For the other direction, if ||a||_1 > 1, you can set x_i = sign(a_i) / ||a||_1: then a^T x = 1 and |x_i| = 1 / ||a||_1 < 1, so this is a strictly feasible solution to the problem.

2. (3 pts) Re-express |x_i| \leq 1 as x_i^2 \leq 1, for i = 1, ..., n. Write down the dual of this problem.

   The Lagrange function for this problem is

   L(x, u, v) = \sum_i \left( d_i x_i^2 + r_i x_i \right) + u \left( a^T x - 1 \right) + \sum_i v_i \left( x_i^2 - 1 \right).

   The dual problem is \sup_{u, v \geq 0} \inf_x L(x, u, v) (u is unconstrained because a^T x = 1 is an equality constraint). Now we solve for x so that we have a function that only contains u, v:

   \frac{\partial L(x, u, v)}{\partial x_i} = 2 d_i x_i + r_i + u a_i + 2 v_i x_i = 0,   so   x_i = - \frac{ r_i + u a_i }{ 2 ( d_i + v_i ) }.

   Plug this into the Lagrange function and we have

   \inf_x L(x, u, v) = - \sum_i \frac{ ( r_i + u a_i )^2 }{ 4 ( d_i + v_i ) } - \sum_i v_i - u.

   So the dual problem is

   \max_{u, v} \; - \sum_i \frac{ ( r_i + u a_i )^2 }{ 4 ( d_i + v_i ) } - \sum_i v_i - u   s.t.  v_i \geq 0  for i = 1, ..., n.

   Or you can write it as

   \min_{u, v} \; u + \sum_i \left( \frac{ ( r_i + u a_i )^2 }{ 4 ( d_i + v_i ) } + v_i \right)   s.t.  v_i \geq 0  for i = 1, ..., n.

3. Write down the KKT conditions for this problem. Do they characterize the optimal solution?

   Yes: since the problem is convex and (when ||a||_1 > 1) strictly feasible, the KKT conditions characterize the optimal solution. They are:
   - Stationarity: \nabla_x L(x, u, v) = 0, i.e., 2 ( d_i + v_i ) x_i + r_i + u a_i = 0 for all i.
   - Primal feasibility: x_i^2 \leq 1 for all i, and a^T x = 1.
   - Dual feasibility: v_i \geq 0 for all i.
   - Complementary slackness: v_i ( x_i^2 - 1 ) = 0 for all i.

4. (5 pts) Show that we can further reduce the dual problem to the one-dimensional convex problem

   \min_\mu \; \mu + \sum_i B( r_i + \mu a_i, d_i ),

   where B is the Huber function defined in the previous section.

   Solution. Look at the minimization form of the dual above: for a fixed u = \mu, each v_i appears only in the term v_i + ( r_i + \mu a_i )^2 / ( 4 ( d_i + v_i ) ), and minimizing this term over v_i \geq 0 is exactly the definition of B, where v_i plays the role of \lambda, r_i + \mu a_i plays the role of x, and d_i plays the role of d. Minimizing over v first therefore leaves the one-dimensional problem above, and it is convex in \mu because each B( r_i + \mu a_i, d_i ) is a convex function composed with an affine map.

5. (3 pts) Describe an algorithm to solve this dual problem. What is the time complexity of your algorithm?

   We have shown that the Huber function is convex and derived its gradient in question 4.1.3, so we can simply apply gradient descent to solve it. The time complexity is O(kn), where k is the number of gradient steps, since each step costs O(n) to evaluate the gradient.

6. (3 pts) How can you recover an optimal primal solution x after solving the dual?

   After deriving u and v (the optimal v_i is the minimizing \lambda of the i-th Huber term, v_i = \max(0, |r_i + u a_i| / 2 - d_i)), we can recover x through the stationarity condition derived in question 4.2.2:

   x_i = - \frac{ r_i + u a_i }{ 2 ( d_i + v_i ) }.
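
A concrete version of the procedure in the last two questions: run gradient descent on the one-dimensional dual objective \mu + \sum_i B(r_i + \mu a_i, d_i), then recover v from the minimizing \lambda of each Huber term and x from the stationarity condition. The Octave sketch reuses huber_B from Section 4.1; the step size and iteration count are placeholders, and the whole thing illustrates the bookkeeping rather than a tuned solver.

    function [x, u, v] = solve_via_dual(d, r, a, iters, step)
      % min_x sum_i d_i x_i^2 + r_i x_i   s.t. a'x = 1, |x_i| <= 1,
      % solved through the one-dimensional dual  min_mu  mu + sum_i B(r_i + mu*a_i, d_i).
      u = 0;
      for t = 1:iters
        g = 1;                                   % d/dmu of the leading "mu" term
        for i = 1:numel(r)
          [Bi, gx] = huber_B(r(i) + u*a(i), d(i));
          g = g + gx * a(i);                     % chain rule through x = r_i + mu*a_i
        end
        u = u - step * g;                        % gradient step, O(n) per iteration
      end
      % Recover the remaining dual variables and the primal solution.
      v = max(0, abs(r + u*a)/2 - d);            % minimizing lambda of each Huber term
      x = -(r + u*a) ./ (2*(d + v));             % stationarity condition
    end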
