Homework 2 Solutions: EM, Mixture Models, PCA, Duality
CMU 10-715: Machine Learning (Fall 2015)
http://www.cs.cmu.edu/~bapoczos/classes/ml10715_2015fall/
OUT: Oct 5, 2015. DUE: Oct 19, 2015, 10:20 AM.

1 An EM algorithm for a Mixture of Bernoullis (Eric; 25 pts)

In this section, you will derive an expectation-maximization (EM) algorithm to cluster black and white images. The inputs $x_i$ can be thought of as vectors of binary values corresponding to black and white pixel values, and the goal is to cluster the images into $K$ groups. You will be using a mixture of Bernoullis model to tackle this problem. For the sake of brevity, you do not need to substitute in previously derived expressions in later problems. For example, beyond question 1.1.1 you may use $P(x_i \mid p_k)$ in your answers.

1.1 Mixture of Bernoullis

1. (2 pts) Consider a vector of binary random variables, $x \in \{0,1\}^D$. Assume each variable $x_d$ is drawn from a Bernoulli($p_d$) distribution, so $P(x_d = 1) = p_d$. Let $p \in [0,1]^D$ be the resulting vector of Bernoulli parameters. Write an expression for $P(x \mid p)$.

$$P(x \mid p) = \prod_{d=1}^{D} p_d^{x_d} (1 - p_d)^{1 - x_d}$$

2. (2 pts) Now suppose we have a mixture of $K$ Bernoulli distributions: each vector $x_i$ is drawn from some vector of Bernoulli random variables with parameters $p_k$; we will call this Bernoulli($p_k$). Let $p = \{p_1, \ldots, p_K\}$. Assume a distribution $\pi$ over the selection of which set of Bernoulli parameters $p_k$ is chosen. Write an expression for $P(x_i \mid p, \pi)$.

Let $A_k$ be the event that $x = x_i$ was drawn from $p_k$. Then:

$$P(x \mid p, \pi) = \sum_k P(x, A_k \mid p, \pi) = \sum_k P(x \mid A_k, p, \pi) P(A_k \mid p, \pi) = \sum_{k=1}^{K} \pi_k P(x \mid p_k)$$

3. (2 pts) Finally, suppose we have inputs $X = \{x_i\}_{i=1,\ldots,N}$. Using the above, write an expression for the log likelihood of the data $X$, $\log P(X \mid \pi, p)$.

$$\log L(p, \pi) = \sum_{i=1}^{N} \log P(x_i \mid p, \pi) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k P(x_i \mid p_k)$$
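As a quick worked instance of this product form (numbers chosen arbitrarily for illustration): for $D = 3$, $x = (1, 0, 1)$ and $p = (0.9, 0.2, 0.5)$,

```latex
P(x \mid p) = p_1^{1}(1-p_1)^{0} \cdot p_2^{0}(1-p_2)^{1} \cdot p_3^{1}(1-p_3)^{0}
            = 0.9 \cdot 0.8 \cdot 0.5 = 0.36
```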
1.2 Expectation step

1. (4 pts) Now, we introduce the latent variables for the EM algorithm. Let $z_i \in \{0,1\}^K$ be an indicator vector, such that $z_{ik} = 1$ if $x_i$ was drawn from Bernoulli($p_k$), and 0 otherwise. Let $Z = \{z_i\}_{i=1,\ldots,N}$. What is $P(z_i \mid \pi)$? What is $P(x_i \mid z_i, p, \pi)$?

Let $A_i^k$ be the event that $x_i$ was drawn from $p_k$. Then:

$$P(z_i \mid \pi) = \prod_{k=1}^{K} \pi_k^{z_{ik}}, \qquad P(x_i \mid z_i, p, \pi) = \prod_{k=1}^{K} P(x_i \mid p_k)^{z_{ik}}$$

2. (2 pts) Using the above two quantities, derive the likelihood of the data and the latent variables, $P(Z, X \mid \pi, p)$.

$$P(Z, X \mid \pi, p) = \prod_{i=1}^{N} P(x_i, z_i \mid \pi, p) = \prod_{i=1}^{N} P(x_i \mid z_i, \pi, p) P(z_i \mid \pi) = \prod_{i=1}^{N} \prod_{k=1}^{K} \left[ \pi_k P(x_i \mid p_k) \right]^{z_{ik}}$$

3. (5 pts) Let $\eta(z_{ik}) = E[z_{ik} \mid x_i, \pi, p]$. Show that

$$\eta(z_{ik}) = \frac{\pi_k \prod_{d=1}^{D} p_{kd}^{x_{id}} (1 - p_{kd})^{1 - x_{id}}}{\sum_{j=1}^{K} \pi_j \prod_{d=1}^{D} p_{jd}^{x_{id}} (1 - p_{jd})^{1 - x_{id}}}$$

Let $\hat{p}, \hat{\pi}$ be the new parameters that we would like to maximize, so $p, \pi$ are from the previous iteration. Use this to derive the following final expression for the E step in the expectation-maximization algorithm:

$$E[\log P(Z, X \mid \hat{p}, \hat{\pi}) \mid X, p, \pi] = \sum_{i=1}^{N} \sum_{k=1}^{K} \eta(z_{ik}) \left[ \log \hat{\pi}_k + \sum_{d=1}^{D} \left( x_{id} \log \hat{p}_{kd} + (1 - x_{id}) \log(1 - \hat{p}_{kd}) \right) \right]$$

Solution:

$$\eta(z_{ik}) = P(z_{ik} = 1 \mid x_i, \pi, p) = \frac{P(x_i \mid z_{ik} = 1, \pi, p)\, P(z_{ik} = 1 \mid \pi, p)}{\sum_{j=1}^{K} P(x_i \mid z_{ij} = 1, \pi, p)\, P(z_{ij} = 1 \mid \pi, p)} = \frac{\pi_k \prod_{d=1}^{D} p_{kd}^{x_{id}} (1 - p_{kd})^{1 - x_{id}}}{\sum_{j=1}^{K} \pi_j \prod_{d=1}^{D} p_{jd}^{x_{id}} (1 - p_{jd})^{1 - x_{id}}}$$

Next we compute the log likelihood:

$$\log P(Z, X \mid \hat{\pi}, \hat{p}) = \sum_{i=1}^{N} \sum_{k=1}^{K} z_{ik} \left[ \log \hat{\pi}_k + \log P(x_i \mid \hat{p}_k) \right] = \sum_{i=1}^{N} \sum_{k=1}^{K} z_{ik} \left[ \log \hat{\pi}_k + \sum_{d=1}^{D} \left( x_{id} \log \hat{p}_{kd} + (1 - x_{id}) \log(1 - \hat{p}_{kd}) \right) \right]$$

Taking the expected value and replacing $E[z_{ik}]$ with $\eta(z_{ik})$ finishes the solution.
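In practice the products over the $D$ pixels underflow long before $D = 784$, so $\eta(z_{ik})$ should be computed in the log domain. A minimal Octave sketch of this computation (the function and variable names here are my own illustration, not the assignment's starter code):

```octave
% E-step responsibilities eta(i,k) = eta(z_ik), computed via log-sum-exp.
% Xs: N x D binary matrix, p: K x D Bernoulli parameters, mix_p: K x 1 weights.
% Assumes smoothing (Section 2) keeps the entries of p away from 0 and 1.
function eta = estep_responsibilities(Xs, p, mix_p)
  logP = Xs * log(p)' + (1 - Xs) * log(1 - p)';  % N x K: log P(x_i | p_k)
  logW = logP + log(mix_p(:)');                  % add log pi_k to each column
  m = max(logW, [], 2);                          % per-row shift for stability
  eta = exp(logW - m);                           % now safe to exponentiate
  eta = eta ./ sum(eta, 2);                      % normalize rows to sum to 1
end
```

Shifting each row by its maximum leaves the normalized ratio unchanged but prevents the exponentials of large negative log-probabilities from flushing to zero.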
1.3 Maximization step

1. (4 pts) We need to maximize the above expression with respect to $\hat{\pi}, \hat{p}$. First, show that the value of $\hat{p}$ that maximizes the E step is

$$\hat{p}_{kd} = \frac{\sum_{i=1}^{N} \eta(z_{ik}) x_{id}}{N_k}, \qquad \text{where } N_k = \sum_{i=1}^{N} \eta(z_{ik}).$$

Setting the derivative to 0:

$$\frac{\partial}{\partial \hat{p}_{kd}} E[\log P(Z, X \mid \hat{\pi}, \hat{p})] = \sum_{i=1}^{N} \eta(z_{ik}) \left( \frac{x_{id}}{\hat{p}_{kd}} - \frac{1 - x_{id}}{1 - \hat{p}_{kd}} \right) = 0$$

Multiplying by the denominators:

$$\sum_{i=1}^{N} \eta(z_{ik}) \left( x_{id}(1 - \hat{p}_{kd}) - (1 - x_{id}) \hat{p}_{kd} \right) = \sum_{i=1}^{N} \eta(z_{ik}) \left( x_{id} - \hat{p}_{kd} \right) = 0$$

Solving for $\hat{p}_{kd}$ results in $\hat{p}_{kd} = \sum_i \eta(z_{ik}) x_{id} / N_k$.

2. (4 pts) Show that the value of $\hat{\pi}$ that maximizes the E step is $\hat{\pi}_k = N_k / N$. The exponential families notation may be useful. Alternatively, you can use Lagrange multipliers.

We only need to maximize $\sum_{i=1}^{N} \sum_{k=1}^{K} \eta(z_{ik}) \log \hat{\pi}_k$, since the rest is not a function of $\hat{\pi}$. In order to keep $\hat{\pi}$ a distribution, we require $\sum_k \hat{\pi}_k = 1$. Let $\lambda$ be the dual variable for this constraint:

$$L(\hat{\pi}, \lambda) = \sum_{i=1}^{N} \sum_{k=1}^{K} \eta(z_{ik}) \log \hat{\pi}_k + \lambda \left( 1 - \sum_{k=1}^{K} \hat{\pi}_k \right)$$

Taking the derivative w.r.t. $\hat{\pi}_k$ and setting it to zero:

$$\frac{\partial L(\hat{\pi}, \lambda)}{\partial \hat{\pi}_k} = \frac{\sum_{i=1}^{N} \eta(z_{ik})}{\hat{\pi}_k} - \lambda = 0$$

Solving for $\hat{\pi}_k$ gives $\hat{\pi}_k = \sum_i \eta(z_{ik}) / \lambda = N_k / \lambda$. Substituting into the constraint, $\sum_k N_k / \lambda = 1$, so $\lambda = \sum_{k=1}^{K} N_k = N$. Therefore $\hat{\pi}_k = N_k / N$.
2 Clustering Images of Numbers (Eric; 25 pts)

In this section you will use the above algorithm to cluster images of numbers. You will be using the MNIST dataset. Each input is a binary vector corresponding to black and white pixels, and is a flattened version of the 28x28 pixel image. We will use the following conventions:

- N is the number of datapoints, D is the dimension of each input, and K is the number of clusters.
- Xs is an N x D matrix of the input data, where row i is the pixel data for picture i.
- p is a K x D matrix of Bernoulli parameters, where row k is the vector of parameters for the kth mixture of Bernoullis.
- mix_p is a K x 1 vector containing the distribution over the various mixtures.
- eta is an N x K matrix containing the results of the E step, so eta(i,k) = $\eta(z_{ik})$.
- clusters is an N x 1 vector containing the final cluster labels of the input data. Each label is a number from 1 to K.

2.1 Programming (20 pts)

1. Implement the E step of the algorithm within [eta] = Estep(Xs, p, mix_p), saving your calculated values within eta.
2. Implement the M step of the algorithm within [p, mix_p] = Mstep(Xs, model, alpha1, alpha2). p and mix_p returned by this function should contain the new values that maximize the E step. alpha1, alpha2 are Dirichlet smoothing parameters (explained below).
3. Implement [clusters] = MoBlabels(Xs, p, mix_p). This function will take in the estimated parameters and return the resulting labels that cluster the data.
4. Some hints and tips:
- The autograder will separately grade the accuracy of your implementation for the E-step (4 pts), the M-step (4 pts), and the M-step with smoothing (4 pts). Finally, it will run your implementation for several iterations on a subset of the MNIST dataset (8 pts).
- A small subset of the MNIST dataset is provided for your convenience.
- Use the log operator to make your calculations more numerically stable. In particular, pay attention to the calculation of $\eta(z_{ik})$.
- You will need to avoid zeros in $\pi$ and $p$, or else you will take $\log(0)$. Use Dirichlet prior smoothing with the parameters $\alpha_1, \alpha_2$ when updating these variables:

$$\hat{p}_{kd} = \frac{\sum_i \eta(z_{ik}) x_{id} + \alpha_1}{N_k + \alpha_1 D}, \qquad \hat{\pi}_k = \frac{N_k + \alpha_2}{N + \alpha_2 K}$$

- Initialize your parameters p by randomly sampling from a Uniform(0,1) distribution and normalizing each $p_k$ to have unit length, and set $\pi_k = 1/K$.
- Running your implementation on the MNIST dataset with K = 10 clusters and $\alpha_1 = \alpha_2 = 10^{-8}$ for 10 iterations should give you some results similar to the following (our reference code runs in approximately 10 seconds). [Figure of the learned cluster means rendered as grayscale digit images; not reproduced in this transcription.]
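A sketch of the corresponding smoothed M step and the labeling function, reusing the E-step sketch from Section 1 (again, these names and argument lists are illustrative rather than the assignment's starter-code interface; each function would live in its own .m file):

```octave
% M step with Dirichlet smoothing, following the two update equations above.
% Xs: N x D data, eta: N x K responsibilities, alpha1/alpha2: smoothing params.
function [p, mix_p] = mstep_smoothed(Xs, eta, alpha1, alpha2)
  [N, K] = size(eta);
  D = size(Xs, 2);
  Nk = sum(eta, 1)';                              % K x 1 effective cluster counts
  p = (eta' * Xs + alpha1) ./ (Nk + alpha1 * D);  % K x D smoothed Bernoulli params
  mix_p = (Nk + alpha2) / (N + alpha2 * K);       % K x 1 smoothed mixture weights
end

% Hard labels: assign each image to the mixture with largest responsibility.
function clusters = mob_labels(Xs, p, mix_p)
  eta = estep_responsibilities(Xs, p, mix_p);
  [~, clusters] = max(eta, [], 2);                % N x 1 labels in 1..K
end
```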
2.2 Analysis

1. (5 pts) For each cluster, reshape the $p_k$ pixels into a 28x28 matrix and print the resulting grayscale images. What do you see? Explain, and include one such image in your writeup. See http://www.gnu.org/software/octave/doc/interpreter/Representing-Images.html for help on printing the matrix, or use the provided helper function show_clusters(p, a, b), which will print the mixtures in p in an a x b grid.

2. (2 pts) Using your implemented MoBlabels function, cluster the data. Using the true labels given in ytrain, how many unique digits does each cluster typically have? Are there any clusters that picked out exactly one digit?
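For the visualization in question 2.2.1, displaying the learned means takes only a few lines; a hypothetical sketch (using a random stand-in for p, and assuming K = 10 and row-major flattening — transpose the reshaped matrix if your images come out mirrored):

```octave
% Render each cluster's Bernoulli mean vector as a 28x28 grayscale image.
p = rand(10, 28 * 28);                 % stand-in for the learned K x D parameters
K = size(p, 1);
for k = 1:K
  subplot(2, 5, k);                    % 2 x 5 grid, assuming K = 10
  imagesc(reshape(p(k, :), 28, 28)');  % reshape is column-major, hence transpose
  colormap(gray); axis image; axis off;
end
```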
3 Kernel PCA (Fish; 15 pts)

Principal component analysis (PCA) is used to emphasize variation and is often used to make data easy to explore and visualize. Suppose we have data $x_1, x_2, \ldots, x_N \in R^d$ with zero mean. PCA aims at finding a new coordinate system such that the greatest variance under some projection of the data comes to lie on the first coordinate, the second greatest variance on the second coordinate, and so on. The basis vectors of the new coordinate system are the eigenvectors of the covariance matrix, i.e.,

$$\frac{1}{N} \sum_{i=1}^{N} x_i x_i^T = \sum_{i=1}^{d} \lambda_i v_i v_i^T,$$

where $v_1, v_2, \ldots, v_d$ are orthogonal ($\langle v_i, v_j \rangle = 0$ for $i \neq j$). Then we can plot our data points in the new coordinate system. The position of each data point in the new coordinate system is derived by projecting $x$ onto the basis of the new coordinate system, i.e., onto $v_1, v_2, \ldots, v_d$.

3.1 Kernel PCA

Often we will want to make a linear or non-linear transformation of the data so that it is projected into a higher-dimensional space. Suppose there is a transformation function $\phi: R^d \to R^l$. We map all the data to a new space through this function, and we have $\phi(x_1), \phi(x_2), \ldots, \phi(x_N)$. We again want to do PCA on the transformed data in the new space. How can we do this?

1. (2 pts) Write out the covariance matrix $C$ for $\phi(x_1), \phi(x_2), \ldots, \phi(x_N)$. Define $\bar{\phi}(x) = \frac{1}{N} \sum_{j=1}^{N} \phi(x_j)$.

$$C = \frac{1}{N} \sum_{i=1}^{N} \left( \phi(x_i) - \bar{\phi}(x) \right) \left( \phi(x_i) - \bar{\phi}(x) \right)^T$$

2. (2 pts) Now we want to find the basis vectors of the orthogonalized new space by solving for the eigenvectors of $C$. Using the definition of an eigenvector, $\lambda v = C v$, explain why $v$ is a linear combination of $\phi(x_1) - \bar{\phi}(x), \phi(x_2) - \bar{\phi}(x), \ldots, \phi(x_N) - \bar{\phi}(x)$.

$$\lambda v = C v = \frac{1}{N} \sum_{i=1}^{N} \left( \phi(x_i) - \bar{\phi}(x) \right) \left( \phi(x_i) - \bar{\phi}(x) \right)^T v = \frac{1}{N} \sum_{i=1}^{N} \left\langle \phi(x_i) - \bar{\phi}(x), v \right\rangle \left( \phi(x_i) - \bar{\phi}(x) \right)$$

Dividing by $\lambda$, $v$ is a linear combination of these vectors, so we can write $v = \sum_{i=1}^{N} \alpha_i \left( \phi(x_i) - \bar{\phi}(x) \right)$.

3. (2 pts) Before starting to derive $\alpha$ and find $v$, we introduce the kernel function here. A kernel function has the form $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$. Notice that we do not need to know the function $\phi$ to calculate $k(x_i, x_j)$: we can simply define a function of $x_i$ and $x_j$ with $k(x_i, x_j) = k(x_j, x_i)$. A classic example is the radial basis function (RBF) kernel, $k(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / \sigma^2)$. In order to use these kernel functions, we need to get rid of all the $\phi(x_i)$ and replace them with the kernel function. A kernel matrix is defined as $K \in R^{N \times N}$ with $K_{ij} = k(x_i, x_j)$. Since we are dealing with non-centered data, we first use a modified kernel matrix $\tilde{K}_{ij} = \langle \phi(x_i) - \bar{\phi}(x), \phi(x_j) - \bar{\phi}(x) \rangle$. By using the results in questions 3.1.1 and 3.1.2 and the definition of eigenvectors, show that

$$N \lambda \tilde{K} \alpha = \tilde{K}^2 \alpha. \tag{6}$$

Define a matrix $\Phi = \left[ \phi(x_1) - \bar{\phi}(x), \phi(x_2) - \bar{\phi}(x), \ldots, \phi(x_N) - \bar{\phi}(x) \right]$. So we have $v = \Phi \alpha$, $C = \frac{1}{N} \Phi \Phi^T$ and $\tilde{K} = \Phi^T \Phi$. Rewriting $\lambda v = C v$ with these notations, we have

$$\lambda \Phi \alpha = \frac{1}{N} \Phi \Phi^T \Phi \alpha \tag{7}$$

Multiplying both sides by $\Phi^T$, we have

$$\lambda \Phi^T \Phi \alpha = \frac{1}{N} \Phi^T \Phi \Phi^T \Phi \alpha \tag{8}$$

$$N \lambda \tilde{K} \alpha = \tilde{K}^2 \alpha \tag{9}$$

4. (2 pts) Show that the solutions $\alpha$ of

$$N \lambda \alpha = \tilde{K} \alpha \tag{10}$$

are also solutions for equation (6).

If $\alpha$ satisfies (10), then multiplying both sides of (10) by $\tilde{K}$ shows that it also satisfies (6).

5. (2 pts) Show that

$$\tilde{K} = \left( I - \tfrac{1}{N} e e^T \right) K \left( I - \tfrac{1}{N} e e^T \right),$$

where $e$ is the all-ones vector. Thus, by solving for the eigenvectors of $\tilde{K}$, we get $\alpha$.

Define $\Theta = \left[ \phi(x_1), \phi(x_2), \ldots, \phi(x_N) \right]$, so that $\Phi = \Theta - \frac{1}{N} \Theta e e^T$. Rewriting $\tilde{K}$:

$$\tilde{K} = \left( \Theta - \tfrac{1}{N} \Theta e e^T \right)^T \left( \Theta - \tfrac{1}{N} \Theta e e^T \right) = \Theta^T \Theta - \tfrac{1}{N} e e^T \Theta^T \Theta - \tfrac{1}{N} \Theta^T \Theta e e^T + \tfrac{1}{N^2} e e^T \Theta^T \Theta e e^T$$

$$= K - \tfrac{1}{N} e e^T K - \tfrac{1}{N} K e e^T + \tfrac{1}{N^2} e e^T K e e^T = \left( I - \tfrac{1}{N} e e^T \right) K \left( I - \tfrac{1}{N} e e^T \right)$$

6. (2 pts) After obtaining $\alpha$, we get the un-normalized basis vector $v$. To normalize it, what is the factor you need to multiply $v$ by?

$$\|v\|^2 = v^T v = \alpha^T \Phi^T \Phi \alpha = \alpha^T \tilde{K} \alpha \tag{16}$$

We need to normalize $v$ by multiplying it by $1 / \sqrt{\alpha^T \tilde{K} \alpha}$. You should not simply write "divide by $\|v\|$", because you do not know what $\phi$ is; again, you need to obtain everything by going through the kernel functions.

7. (2 pts) For a data point $x$, what is its position in the normalized new space? Explain why you can get the new coordinate without explicitly calculating $\phi(x)$.

The projection onto the basis vector $\Phi \alpha / \sqrt{\alpha^T \tilde{K} \alpha}$ is

$$\frac{1}{\sqrt{\alpha^T \tilde{K} \alpha}} \left\langle \phi(x) - \bar{\phi}(x), \Phi \alpha \right\rangle = \frac{1}{\sqrt{\alpha^T \tilde{K} \alpha}} \sum_{i=1}^{N} \alpha_i \tilde{k}(x, x_i), \tag{17}$$

so only the kernel function is needed. You can also expand out $\tilde{k}$ and calculate it using $k$.
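The whole recipe — build $K$, double-center it as in question 3.1.5, take eigenvectors, normalize as in question 3.1.6, and project as in question 3.1.7 — fits in a few lines. A toy Octave sketch with an RBF kernel (the toy data and all names are mine):

```octave
% Kernel PCA on toy data: center K, eigendecompose, normalize, project.
N = 100;
X = randn(N, 5);                          % toy inputs, one row per point
sq = sum(X .^ 2, 2);
K = exp(-(sq + sq' - 2 * X * X'));        % RBF kernel matrix with sigma^2 = 1
J = eye(N) - ones(N) / N;                 % centering matrix I - (1/N) e e'
Kt = J * K * J;                           % K~ from question 3.1.5
[A, L] = eig((Kt + Kt') / 2);             % symmetrize against round-off
[lam, idx] = sort(diag(L), 'descend');    % leading eigenpairs first
A = A(:, idx);
ncomp = 2;
Y = zeros(N, ncomp);                      % projected coordinates
for j = 1:ncomp
  a = A(:, j) / sqrt(A(:, j)' * Kt * A(:, j));  % question 3.1.6 normalization
  Y(:, j) = Kt * a;                       % question 3.1.7 for the training points
end
```

For a training point $x_i$, the projection $\sum_j \alpha_j \tilde{k}(x_i, x_j)$ is just row $i$ of $\tilde{K}\alpha$, which is why the loop body needs no access to $\phi$ at all.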
4 Huber function and its application in solving a dual problem (Fish; 35 pts)

4.1 Huber function

Define a function

$$B(x, d) = \min_{\lambda \geq 0} \ \frac{\lambda}{2} + \frac{x^2}{2(\lambda + d)}, \tag{18}$$

where $d > 0$.

1. (3 pts) Show that

$$B(x, d) = \begin{cases} \dfrac{x^2}{2d} & \text{if } |x| \leq d \\[4pt] |x| - \dfrac{d}{2} & \text{if } |x| > d \end{cases} \tag{19}$$

with the minimizer $\lambda^* = \max(0, |x| - d)$. $B$ is often called a Huber function.

Differentiating the objective of (18) with respect to $\lambda$, we have $\frac{1}{2} - \frac{x^2}{2(\lambda + d)^2} = 0$. Solving, we get $\lambda = |x| - d$. To match the constraint that $\lambda \geq 0$, we need $|x| \geq d$. Besides, with $\lambda = |x| - d$, we have

$$B(x, d) = \frac{|x| - d}{2} + \frac{x^2}{2|x|} = \frac{|x| - d}{2} + \frac{|x|}{2} = |x| - \frac{d}{2}.$$

On the other side, if $|x| < d$, notice that $g(\lambda) = \frac{\lambda}{2} + \frac{x^2}{2(\lambda + d)}$ is an increasing function of $\lambda \geq 0$, since $g'(\lambda) = \frac{1}{2} - \frac{x^2}{2(\lambda + d)^2} > 0$ whenever $\lambda + d > |x|$, which means the minimum value is obtained at $\lambda = 0$. So we have $\min_{\lambda \geq 0} \frac{\lambda}{2} + \frac{x^2}{2(\lambda + d)} = \frac{x^2}{2d}$ at $\lambda = 0$.
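A quick numerical sanity check of the closed form (19) against the variational definition (18), minimizing over a fine grid of $\lambda$ values (a sketch; the grid's upper bound just needs to exceed $|x| - d$):

```octave
% Compare B(x,d) from closed form (19) with brute-force minimization of (18).
d = 1.5;
lam = linspace(0, 20, 200001);                  % dense grid over lambda >= 0
for x = [-3, -1, 0, 0.7, 2, 5]
  Bvar = min(lam / 2 + x^2 ./ (2 * (lam + d))); % variational value, eq. (18)
  if abs(x) <= d
    Bclosed = x^2 / (2 * d);                    % quadratic branch
  else
    Bclosed = abs(x) - d / 2;                   % linear branch
  end
  fprintf('x = %5.2f  variational = %.6f  closed form = %.6f\n', x, Bvar, Bclosed);
end
```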
2. (3 pts) Is the function $B$ convex in $x$? (Assume $d$ is a fixed constant.)

In the range $|x| < d$, $\frac{\partial B}{\partial x} = \frac{x}{d}$ and $\frac{\partial^2 B}{\partial x^2} = \frac{1}{d} > 0$, so the function is convex in this range. Next we examine the part where $|x| > d$: there $\frac{\partial B}{\partial x} = \mathrm{sign}(x)$ and $\frac{\partial^2 B}{\partial x^2} = 0$, so it is convex there as well. Then we examine the points where $|x| = d$: the derivatives approached from the directions $|x| < d$ and $|x| > d$ are both $\mathrm{sign}(x)$, so the derivative exists and is continuous at these points. The first derivative is therefore nondecreasing everywhere, and we conclude that the function is convex.

3. (3 pts) Derive the gradient of the function $B$ with respect to $x$ at any given point $(x, d)$. Remember $d > 0$.

$$\frac{\partial B(x, d)}{\partial x} = \begin{cases} -1 & \text{if } x < -d \\ \dfrac{x}{d} & \text{if } -d \leq x \leq d \\ 1 & \text{if } x > d \end{cases}$$

4. (3 pts) The function $B$ is often used as a penalty term in classification or regression problems, which have the form

$$\min_x \ L(x) + \sum_i B(x_i, d), \tag{20}$$

where $L$ is the loss function, and $d$ is a parameter with positive value. Describe how the parameter $d$ affects the penalty term $B$ on the solution.

When $d$ is large, it acts like an $\ell_2$ penalty. When $d$ is small, it acts like an $\ell_1$ penalty.

4.2 An application of using the Huber loss to solve a dual problem

Consider the optimization problem

$$\min_x \ \sum_{i=1}^{n} \left( d_i x_i^2 + r_i x_i \right) \quad \text{s.t.} \quad a^T x = 1, \ |x_i| \leq 1 \ \text{for } i = 1, \ldots, n.$$

1. (3 pts) Show that the problem has strictly feasible solutions if and only if $\|a\|_1 > 1$.

By Hölder's inequality (the $\ell_1$/$\ell_\infty$ analogue of Cauchy-Schwarz), we have $a^T x \leq \|a\|_1 \|x\|_\infty$, so if $\|x\|_\infty < 1$ then $a^T x = 1$ forces $\|a\|_1 > 1$. For the other direction, if $\|a\|_1 > 1$, you can set $x_i = \mathrm{sign}(a_i) / \|a\|_1$, and this is a strictly feasible solution to the question.

2. (3 pts) Re-express $|x_i| \leq 1$ as $x_i^2 \leq 1$ for $i = 1, \ldots, n$. Write down the dual of this problem.

The Lagrange function for this question is

$$L(x, u, v) = \sum_i \left( d_i x_i^2 + r_i x_i \right) + u \left( a^T x - 1 \right) + \sum_i v_i \left( x_i^2 - 1 \right). \tag{24}$$

The dual problem is

$$\sup_{u, v \geq 0} \ \inf_x \ L(x, u, v). \tag{25}$$

Now we solve for $x$ so that we can have a function that only contains $u, v$:

$$\frac{\partial L(x, u, v)}{\partial x_i} = 2 d_i x_i + r_i + u a_i + 2 v_i x_i = 0, \tag{26}$$

$$x_i = -\frac{r_i + u a_i}{2 (d_i + v_i)}. \tag{27}$$
Plugging this into the Lagrange function, we have

$$\inf_x L(x, u, v) = \sum_i \left[ (d_i + v_i) x_i^2 + (r_i + u a_i) x_i \right] - u - \sum_i v_i = \sum_i \left[ \frac{(r_i + u a_i)^2}{4 (d_i + v_i)} - \frac{(r_i + u a_i)^2}{2 (d_i + v_i)} \right] - u - \sum_i v_i \tag{28}$$

$$= -\sum_i \frac{(r_i + u a_i)^2}{4 (d_i + v_i)} - \sum_i v_i - u. \tag{29}$$

So the dual problem is

$$\max_{u, v} \ -\sum_i \left[ \frac{(r_i + u a_i)^2}{4 (d_i + v_i)} + v_i \right] - u \quad \text{s.t.} \ v_i \geq 0 \ \text{for } i = 1, \ldots, n, \tag{30}$$

or you can write it as

$$\min_{u, v} \ u + \sum_i \left[ \frac{(r_i + u a_i)^2}{4 (d_i + v_i)} + v_i \right] \quad \text{s.t.} \ v_i \geq 0 \ \text{for } i = 1, \ldots, n. \tag{32}$$

3. (3 pts) Write down the KKT conditions for this problem. Do they characterize the optimal solution?

Yes: the problem is convex and, by question 4.2.1, strictly feasible, so Slater's condition holds and the KKT conditions characterize the optimal solution.

- Stationarity: $\nabla_x L(x, u, v) = 0$.
- Primal feasibility: $x_i^2 \leq 1$ for all $i$, and $a^T x = 1$.
- Dual feasibility: $v_i \geq 0$ for $i = 1, \ldots, n$.
- Complementary slackness: $v_i (x_i^2 - 1) = 0$ for $i = 1, \ldots, n$.

4. (5 pts) Show that we can further reduce the dual problem to a one-dimensional convex problem

$$\min_\mu \ \mu + \sum_i B(r_i + \mu a_i, 2 d_i), \tag{34}$$

where $B$ is the Huber function defined in the previous section.

Look at (30): each term is exactly in this form, where $2 v_i$ plays the role of $\lambda$ and $r_i + u a_i$ plays the role of $x$, so minimizing over $v_i \geq 0$ yields $B(r_i + \mu a_i, 2 d_i)$ with $\mu = u$.

5. (3 pts) Describe an algorithm to solve this dual problem. What is the time complexity of your algorithm?

We have shown that the Huber function is convex and derived its gradient in question 4.1.3, so we can simply apply gradient descent to solve it. The time complexity is $O(kn)$, where $k$ is the number of steps.

6. (3 pts) How can you recover an optimal primal solution $x$ after solving the dual?

After deriving $u, v$, we can recover $x$ from the stationarity condition (27):

$$x_i = -\frac{r_i + u a_i}{2 (d_i + v_i)}. \tag{35}$$
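A sketch of this pipeline on toy data: gradient descent on (34) using the Huber gradient from question 4.1.3, then recovery of each $v_i$ from the Huber minimizer of question 4.1.1, $\lambda^* = \max(0, |r_i + u a_i| - 2 d_i)$ with $v_i = \lambda^*/2$, and finally $x$ from (35). The data, step size, and names here are illustrative:

```octave
% Solve min_mu mu + sum_i B(r_i + mu*a_i, 2*d_i) by gradient descent,
% then recover the primal solution of the original constrained QP.
a = [2; -1; 1.5; 0.5];     % norm(a,1) = 5 > 1, so strictly feasible
r = [0.3; -0.2; 0.1; 0.4];
d = [1; 2; 0.5; 1];        % d_i > 0
dB = @(x, kap) (abs(x) <= kap) .* (x ./ kap) + (abs(x) > kap) .* sign(x);
mu = 0; step = 0.05;
for t = 1:5000                          % plain gradient descent on (34)
  g = 1 + a' * dB(r + mu * a, 2 * d);   % h'(mu) = 1 + sum_i a_i B'(r_i + mu a_i)
  mu = mu - step * g;
end
xt = r + mu * a;                        % Huber arguments at the dual optimum
v = max(0, (abs(xt) - 2 * d) / 2);      % v_i = lambda*/2 from question 4.1.1
x = -xt ./ (2 * (d + v));               % primal recovery via (35)
fprintf('a''*x = %.6f (should be ~1), max |x_i| = %.4f (should be <= 1)\n', ...
        a' * x, max(abs(x)));
```

At the dual optimum $h'(\mu) = 0$, and substituting the recovered $x$ shows $a^T x = 1$, so the printed check doubles as a convergence diagnostic.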
More information