Homework 2: EM, Mixture Models, PCA, Duality
CMU 10-715: Machine Learning (Fall 2015)
OUT: Oct 5, 2015
DUE: Oct 19, 2015, 10:20 AM

Guidelines

The homework is due at 10:20 am on Monday, October 19. Each student will be given two late days that can be spent on any homeworks, but at most one late day per homework. Once you have used up your late days for the term, late homework submissions will receive no credit.

Submit both a paper copy and an electronic copy through the submission website: https://autolab.cs.cmu.edu/courses/10715-f15. You can sign in using your Andrew credentials. You should make sure to edit your account information and choose a nickname/handle. This handle will be used to display your results for any competition-style questions on the class leaderboard.

Some questions will be autograded. Please make sure to carefully follow the submission instructions for these questions.

We recommend that you typeset your solutions using software such as LaTeX. If you write by hand, ensure your handwriting is clear and legible. The TAs will not invest undue effort to decrypt bad handwriting.

Programming guidelines:

Octave: You must write submitted code in Octave. Octave is a free scientific programming language, with syntax similar to that of MATLAB. Installation instructions can be found on the Octave website. (You can develop your code in MATLAB if you prefer, but you must test it in Octave before submitting, or it may fail in the autograder.)

Autograding: This problem is autograded using the CMU Autolab system. The code which you write will be executed remotely against a suite of tests, and the results used to automatically assign you a grade. To make sure your code executes correctly on our servers, you should avoid using libraries which are not present in the basic Octave install.

Submission Instructions: For each programming question you will be given a function signature. You will be asked to write a single Octave function which satisfies the signature.
In the code handout linked above, we have provided you with a single folder containing stubs for each of the functions you need to complete. Do not modify the structure of this directory or rename these files. Complete each of these functions, then compress this directory as a tar file and submit to Autolab online. You may submit code as many times as you like. When you download the files, you should confirm that the autograder is functioning correctly by compressing and submitting the directory of stubs provided. This should result in a grade of zero for all questions.

SUBMISSION CHECKLIST
- Submission executes on our machines in less than 20 minutes.
- Submission is smaller than 2000K.
- Submission is a .tar file.
- Submission returns matrices of the exact dimension specified.
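The compress-and-submit step above can be done from the command line. A minimal sketch, assuming the stub directory is named `handin` (the actual name comes from the code handout):

```shell
# Package the (hypothetical) handin directory as a tar file for Autolab.
tar -cf handin.tar handin/

# Optional: list the archive contents to confirm the directory structure
# was preserved (Autolab expects the original layout of the stub folder).
tar -tf handin.tar
```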
1 An EM Algorithm for a Mixture of Bernoullis [Eric; 25 pts]

In this section, you will derive an expectation-maximization (EM) algorithm to cluster black and white images. The inputs x^(i) can be thought of as vectors of binary values corresponding to black and white pixel values, and the goal is to cluster the images into groups. You will be using a mixture of Bernoullis model to tackle this problem. For the sake of brevity, you do not need to substitute in previously derived expressions in later problems. For example, in later questions you may use P(x^(i) | p^(k)) in your answers.

1.1 Mixture of Bernoullis

1. (2pts) Consider a vector of binary random variables, x ∈ {0,1}^D. Assume each variable x_d is drawn from a Bernoulli(p_d) distribution, so P(x_d = 1) = p_d. Let p ∈ (0,1)^D be the resulting vector of Bernoulli parameters. Write an expression for P(x | p).

2. (2pts) Now suppose we have a mixture of K Bernoulli distributions: each vector x^(i) is drawn from some vector of Bernoulli random variables with parameters p^(k); we will call this Bernoulli(p^(k)). Let p = {p^(1), ..., p^(K)}. Assume a distribution π(k) over the selection of which set of Bernoulli parameters p^(k) is chosen. Write an expression for P(x^(i) | p, π).

3. (2pts) Finally, suppose we have inputs X = {x^(i)}_{i=1,...,n}. Using the above, write an expression for the log likelihood of the data X, log P(X | π, p).

1.2 Expectation step

1. (4pts) Now, we introduce the latent variables for the EM algorithm. Let z^(i) ∈ {0,1}^K be an indicator vector, such that z^(i)_k = 1 if x^(i) was drawn from a Bernoulli(p^(k)), and 0 otherwise. Let Z = {z^(i)}_{i=1,...,n}. What is P(z^(i) | π)? What is P(x^(i) | z^(i), p, π)?

2. (2pts) Using the above two quantities, derive the likelihood of the data and the latent variables, P(Z, X | π, p).

3. (5pts) Let η(z^(i)_k) = E[z^(i)_k | x^(i), π, p].
Show that

    η(z^(i)_k) = [ π_k ∏_{d=1}^D (p^(k)_d)^{x^(i)_d} (1 − p^(k)_d)^{1 − x^(i)_d} ] / [ Σ_{j=1}^K π_j ∏_{d=1}^D (p^(j)_d)^{x^(i)_d} (1 − p^(j)_d)^{1 − x^(i)_d} ]

Let p, π be the new parameters that we would like to maximize, and let p̃, π̃ be the parameters from the previous iteration. Use this to derive the following final expression for the E step in the expectation-maximization algorithm, where η is computed using the old parameters p̃, π̃:

    E[log P(Z, X | p, π) | X, p̃, π̃] = Σ_{i=1}^N Σ_{k=1}^K η(z^(i)_k) [ log π_k + Σ_{d=1}^D ( x^(i)_d log p^(k)_d + (1 − x^(i)_d) log(1 − p^(k)_d) ) ]

1.3 Maximization step

1. (4pts) We need to maximize the above expression with respect to π, p. First, show that the value of p that maximizes the E step is

    p^(k) = ( Σ_{i=1}^N η(z^(i)_k) x^(i) ) / N_k,  where N_k = Σ_{i=1}^N η(z^(i)_k).

2. (4pts) Show that the value of π that maximizes the E step is

    π_k = N_k / Σ_{k'} N_{k'}

The exponential families notation may be useful. Alternatively, you can use Lagrange multipliers.
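To make the E and M updates above concrete, here is a small illustrative sketch of one EM iteration for a mixture of Bernoullis in plain Python. (The assignment itself must be solved in Octave; the function and variable names here are hypothetical, and this sketch omits the smoothing and log-space tricks discussed in Section 2.)

```python
def em_step(X, p, pi):
    """One EM iteration for a mixture of Bernoullis.

    X  : list of N binary vectors (length D each)
    p  : list of K Bernoulli parameter vectors (length D each), entries in (0, 1)
    pi : list of K mixture weights summing to 1
    Returns updated (p, pi) and the responsibilities eta (N x K).
    """
    N, K, D = len(X), len(p), len(X[0])

    # E step: eta[i][k] is proportional to pi_k * prod_d p_kd^x_id (1 - p_kd)^(1 - x_id),
    # normalized over k (the expression for eta derived in question 1.2.3).
    eta = []
    for x in X:
        w = []
        for k in range(K):
            lik = pi[k]
            for d in range(D):
                lik *= p[k][d] if x[d] == 1 else (1.0 - p[k][d])
            w.append(lik)
        s = sum(w)
        eta.append([wk / s for wk in w])

    # M step: p^(k) is the eta-weighted mean of the data, pi_k is proportional to N_k.
    Nk = [sum(eta[i][k] for i in range(N)) for k in range(K)]
    new_p = [[sum(eta[i][k] * X[i][d] for i in range(N)) / Nk[k] for d in range(D)]
             for k in range(K)]
    new_pi = [Nk[k] / N for k in range(K)]
    return new_p, new_pi, eta
```

Each row of eta sums to one, and the updated pi sums to one, matching the constraint that Lagrange multipliers enforce in question 1.3.2.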
2 Clustering Images of Numbers [Eric; 25 pts]

In this section you will use the above algorithm to cluster images of numbers. You will be using the MNIST dataset. Each input is a binary vector corresponding to black and white pixels, and is a flattened version of the 28x28 pixel image. We will use the following conventions:

- N is the number of datapoints, D is the dimension of each input, and K is the number of clusters.
- Xs is an N x D matrix of the input data, where row i is the pixel data for picture i.
- p is a K x D matrix of Bernoulli parameters, where row k is the vector of parameters for the kth mixture of Bernoullis.
- mix_p is a K x 1 vector containing the distribution over the various mixtures.
- eta is an N x K matrix containing the results of the E step, so eta[i,k] = η(z^(i)_k).
- clusters is an N x 1 vector containing the final cluster labels of the input data. Each label is a number from 1 to K.

2.1 Programming (20pts)

1. Implement the E step of the algorithm within [eta] = Estep(Xs, p, mix_p), saving your calculated values within eta.

2. Implement the M step of the algorithm within [p, mix_p] = Mstep(Xs, model, alpha1, alpha2). p and mix_p returned by this function should contain the new values that maximize the E step. alpha1, alpha2 are Dirichlet smoothing parameters (explained below).

3. Implement [clusters] = MoBlabels(Xs, p, mix_p). This function will take in the estimated parameters and return the resulting labels that cluster the data.

4. Some hints and tips:

The autograder will grade the accuracy of your implementation separately for the E-step (4pts), M-step (4pts), and M-step with smoothing (2pts). Finally, it will run your implementation for several iterations on a subset of the MNIST dataset (8pts). A small subset of the MNIST dataset is provided for your convenience.

Use the log operator to make your calculations more numerically stable. In particular, pay attention to the calculation of η(z^(i)_k). You will need to avoid zeros in π and p, or else you will take log(0) = −∞.
Use Dirichlet prior smoothing with the parameters α1, α2 when updating these variables:

    p^(k)_d = ( Σ_{i=1}^N η(z^(i)_k) x^(i)_d + α1 ) / ( N_k + α1 D )

    π_k = ( N_k + α2 ) / ( Σ_{k'} N_{k'} + α2 K )

Initialize your parameters p by randomly sampling from a Uniform(0, 1) distribution and normalizing each p^(k) to have unit length, and set π_k = 1/K.

Running your implementation on the MNIST dataset with K = 20 clusters and α1 = α2 = 10^−8 for 20 iterations should give you results similar to the following (our reference code runs in approximately 10 seconds):
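The numerical-stability hint above (compute η in log space) is usually implemented with the log-sum-exp trick. A minimal Python illustration (the function name is hypothetical; the graded code must be Octave):

```python
import math

def normalize_log_weights(log_w):
    """Given unnormalized log-weights log_w[k] = log(pi_k) + log P(x | p^(k)),
    return the normalized responsibilities eta_k without underflow.

    Subtracting the max before exponentiating keeps every exponent <= 0,
    so exp() never overflows, and the largest term is exactly 1.
    """
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    s = sum(w)
    return [wk / s for wk in w]
```

For a 784-pixel binary image, the per-cluster log-likelihoods are easily around −500 or smaller; exponentiating them directly would underflow every weight to 0.0, while the shifted version normalizes correctly.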
2.2 Analysis

1. (5pts) For each cluster, reshape the pixels into a 28x28 matrix and print the resulting grayscale images. What do you see? Explain, and include one such image in your writeup. See software/octave/doc/interpreter/representing-images.html for help on printing the matrix, or use the provided helper function show_clusters(p, a, b), which will print the mixtures in p in an a x b grid.

2. (2pts) Using your implemented MoBlabels function, cluster the data. Using the true labels given in ytrain, how many unique digits does each cluster typically have? Are there any clusters that picked out exactly one digit?

3 Kernel PCA [Fish; 15 pts]

Principal component analysis (PCA) is used to emphasize variation and is often used to make data easy to explore and visualize. Suppose we have data x_1, x_2, ..., x_N ∈ R^d with zero mean. PCA aims at finding a new coordinate system such that the greatest variance by some projection of the data comes to lie on the first coordinate, the second greatest variance on the second coordinate, and so on. The basis vectors of the new coordinate system are the eigenvectors of the covariance matrix, i.e.,

    (1/N) Σ_{i=1}^N x_i x_i^T = Σ_i λ_i v_i v_i^T,   (1)

where v_1, v_2, ..., v_d are orthogonal (⟨v_i, v_j⟩ = 0 for i ≠ j). Then we can plot our data points in the new coordinate system. The position of each data point in the new coordinate system can be derived by projecting x onto the basis vectors of the new coordinate system, i.e., v_1, v_2, ..., v_d.

3.1 Kernel PCA

Often we will want to make a linear or non-linear transformation of the data so that the points are projected into a higher-dimensional space. Suppose there is a transformation function φ : R^d → R^l. We map all the data to the new space through this function, obtaining φ(x_1), φ(x_2), ..., φ(x_N). We again want to do PCA on the transformed data in the new space. How can we do this?

1. (2pts) Write out the covariance matrix, C, for φ(x_1), φ(x_2), ..., φ(x_N). (Define φ̄(x) = (1/N) Σ_{j=1}^N φ(x_j).)
2. (2pts) Now we want to find the basis vectors of the new space by solving for the eigenvectors of C. Using the definition of an eigenvector, λv = Cv, explain why v is a linear combination of (φ(x_1) − φ̄(x)), (φ(x_2) − φ̄(x)), ..., (φ(x_N) − φ̄(x)). Since v is a linear combination of these vectors, v = Σ_{i=1}^N α_i (φ(x_i) − φ̄(x)).

3. (2pts) Before starting to derive α and find v, we introduce the kernel function here. A kernel function has the form k(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩. Notice that we do not need to know the function φ to calculate k(x_i, x_j). We can simply define a function that is related to x_i, x_j and satisfies k(x_i, x_j) = k(x_j, x_i). A classic example is the radial basis function (RBF) kernel, k(x_i, x_j) = exp(−‖x_i − x_j‖² / 2σ²). In order to use these kernel functions, we need to get rid of all the φ(x_i) and replace them with the kernel function. A kernel matrix is defined as K ∈ R^{N×N} with K_ij = k(x_i, x_j). Since we are dealing with non-centered data, we first use a modified kernel matrix K̃ with K̃_ij = ⟨φ(x_i) − φ̄(x), φ(x_j) − φ̄(x)⟩. By using the results of the previous questions and the definition of eigenvectors, show that

    N λ K̃ α = K̃² α.   (2)

4. (2pts) Show that the solutions α of equation (2) are also solutions of

    N λ α = K̃ α.   (3)

5. (3pts) Show that

    K̃ = (I − ee^T) K (I − ee^T),   (4)

where e is the vector with all entries equal to 1/√N. Thus, by solving for the eigenvectors of K̃, we get α.

6. (2pts) After obtaining α, we get the un-normalized basis vector v. To normalize it, what is the factor by which you need to multiply v?

7. (2pts) For a data point x, what is its position in the normalized new space? Explain why you can get the new coordinates without explicitly calculating φ(x).

4 Huber Function and Its Application in Solving Dual Problems [Fish; 35pts]

4.1 Huber function

Define a function

    B(x, d) = min_{λ ≥ 0} [ λ/2 + x² / (2(λ + d)) ],   (5)

where d > 0.

1. (3pts) Show that

    B(x, d) = x²/(2d)   if |x| ≤ d,
              |x| − d/2  if |x| > d,   (6)

with the minimizer λ* = max(0, |x| − d). B is often called a Huber function.

2. (3pts) Is the function B convex in x? (Assume d is a fixed constant.)

3.
(3pts) Derive the gradient of the function B at any given point (x, d). (Remember that d > 0.)
4. (3pts) The function B is often used as a penalty term in classification or regression problems, which have the form

    min_x L(x) + B(x, d),   (7)

where L is the loss function, and d is a parameter with positive value. Describe how the parameter d affects the penalty term B on the solution.

4.2 An application of the Huber loss in solving a dual problem

Consider the optimization problem

    min_x Σ_{i=1}^n ( (1/2) d_i x_i² + r_i x_i )   (8)

    s.t. a^T x = 1,  x_i ∈ [−1, 1] for i = 1, 2, ..., n.   (9)

1. (3pts) Show that the problem has strictly feasible solutions if and only if ‖a‖₁ > 1.

2. (3pts) Re-express x_i ∈ [−1, 1] as x_i² ≤ 1, for i = 1, 2, ..., n. Write down the dual of this problem.

3. (6pts) Write down the KKT conditions for this problem. Do they characterize the optimal solution?

4. (5pts) Show that we can further reduce the dual problem to the one-dimensional convex problem

    min_μ  μ + Σ_{i=1}^n B(r_i + μ a_i, d_i),   (10)

where B is the Huber function defined in the previous section.

5. (3pts) Describe an algorithm to solve this dual problem. What is the time complexity of your algorithm?

6. (3pts) How can you recover an optimal primal solution x after solving the dual?
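As a sanity check on Section 4, here is a small Python sketch (hypothetical names; not part of the Octave submission) that implements the closed form of B from equation (6) and minimizes a one-dimensional convex objective of the form in equation (10) by ternary search, one natural answer to question 4.2.5:

```python
def huber_B(x, d):
    """Closed-form Huber function: x^2/(2d) for |x| <= d, else |x| - d/2."""
    ax = abs(x)
    return ax * ax / (2.0 * d) if ax <= d else ax - d / 2.0

def solve_dual(r, a, d, lo=-1e6, hi=1e6, iters=200):
    """Minimize f(mu) = mu + sum_i B(r_i + mu * a_i, d_i) over mu.

    f is convex in mu (a sum of convex functions composed with affine maps,
    plus a linear term), so ternary search over a bracketing interval works.
    """
    def f(mu):
        return mu + sum(huber_B(ri + mu * ai, di)
                        for ri, ai, di in zip(r, a, d))

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return (lo + hi) / 2.0
```

Note that since B(x, d) grows like |x| for large |x|, this objective is bounded below exactly when ‖a‖₁ > 1, which is consistent with the strict feasibility condition of question 4.2.1. Each of the O(log((hi − lo)/ε)) ternary-search steps costs O(n), for O(n log((hi − lo)/ε)) total.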
Utah State University DigitalCommons@USU Founations of Wave Phenomena Physics, Department of 1-1-2004 05 The Continuum Limit an the Wave Equation Charles G. Torre Department of Physics, Utah State University,
More informationAP Calculus. Derivatives and Their Applications. Presenter Notes
AP Calculus Derivatives an Their Applications Presenter Notes 2017 2018 EDITION Copyright 2017 National Math + Science Initiative, Dallas, Texas. All rights reserve. Visit us online at www.nms.org Copyright
More informationLecture XII. where Φ is called the potential function. Let us introduce spherical coordinates defined through the relations
Lecture XII Abstract We introuce the Laplace equation in spherical coorinates an apply the metho of separation of variables to solve it. This will generate three linear orinary secon orer ifferential equations:
More informationSolving the Schrödinger Equation for the 1 Electron Atom (Hydrogen-Like)
Stockton Univeristy Chemistry Program, School of Natural Sciences an Mathematics 101 Vera King Farris Dr, Galloway, NJ CHEM 340: Physical Chemistry II Solving the Schröinger Equation for the 1 Electron
More informationKernel Principal Component Analysis
Kernel Principal Component Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationThermal conductivity of graded composites: Numerical simulations and an effective medium approximation
JOURNAL OF MATERIALS SCIENCE 34 (999)5497 5503 Thermal conuctivity of grae composites: Numerical simulations an an effective meium approximation P. M. HUI Department of Physics, The Chinese University
More informationA note on asymptotic formulae for one-dimensional network flow problems Carlos F. Daganzo and Karen R. Smilowitz
A note on asymptotic formulae for one-imensional network flow problems Carlos F. Daganzo an Karen R. Smilowitz (to appear in Annals of Operations Research) Abstract This note evelops asymptotic formulae
More informationDiagonalization of Matrices Dr. E. Jacobs
Diagonalization of Matrices Dr. E. Jacobs One of the very interesting lessons in this course is how certain algebraic techniques can be use to solve ifferential equations. The purpose of these notes is
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More information1 A Support Vector Machine without Support Vectors
CS/CNS/EE 53 Advanced Topics in Machine Learning Problem Set 1 Handed out: 15 Jan 010 Due: 01 Feb 010 1 A Support Vector Machine without Support Vectors In this question, you ll be implementing an online
More informationCalculus and optimization
Calculus an optimization These notes essentially correspon to mathematical appenix 2 in the text. 1 Functions of a single variable Now that we have e ne functions we turn our attention to calculus. A function
More informationCSC411 Fall 2018 Homework 5
Homework 5 Deadline: Wednesday, Nov. 4, at :59pm. Submission: You need to submit two files:. Your solutions to Questions and 2 as a PDF file, hw5_writeup.pdf, through MarkUs. (If you submit answers to
More informationCOMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017
COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY
More informationAnalyzing Tensor Power Method Dynamics in Overcomplete Regime
Journal of Machine Learning Research 18 (2017) 1-40 Submitte 9/15; Revise 11/16; Publishe 4/17 Analyzing Tensor Power Metho Dynamics in Overcomplete Regime Animashree Ananumar Department of Electrical
More informationSupport Vector Machines
EE 17/7AT: Optimization Models in Engineering Section 11/1 - April 014 Support Vector Machines Lecturer: Arturo Fernandez Scribe: Arturo Fernandez 1 Support Vector Machines Revisited 1.1 Strictly) Separable
More informationRotation by 90 degree counterclockwise
0.1. Injective an suejective linear map. Given a linear map T : V W of linear spaces. It is natrual to ask: Problem 0.1. Is that possible to fin an inverse linear map T 1 : W V? We briefly review those
More informationAnnouncements - Homework
Announcements - Homework Homework 1 is graded, please collect at end of lecture Homework 2 due today Homework 3 out soon (watch email) Ques 1 midterm review HW1 score distribution 40 HW1 total score 35
More informationPhysics 251 Results for Matrix Exponentials Spring 2017
Physics 25 Results for Matrix Exponentials Spring 27. Properties of the Matrix Exponential Let A be a real or complex n n matrix. The exponential of A is efine via its Taylor series, e A A n = I + n!,
More informationu!i = a T u = 0. Then S satisfies
Deterministic Conitions for Subspace Ientifiability from Incomplete Sampling Daniel L Pimentel-Alarcón, Nigel Boston, Robert D Nowak University of Wisconsin-Maison Abstract Consier an r-imensional subspace
More informationFinal Exam: Sat 12 Dec 2009, 09:00-12:00
MATH 1013 SECTIONS A: Professor Szeptycki APPLIED CALCULUS I, FALL 009 B: Professor Toms C: Professor Szeto NAME: STUDENT #: SECTION: No ai (e.g. calculator, written notes) is allowe. Final Exam: Sat 1
More informationOutcomes. Unit 14. Review of State Machines STATE MACHINES OVERVIEW. State Machine Design
4. Outcomes 4.2 Unit 4 tate Machine Design I can create a state iagram to solve a sequential problem I can implement a working state machine given a state iagram 4.3 Review of tate Machines 4.4 TATE MACHINE
More informationConnection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis
Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Alvina Goh Vision Reading Group 13 October 2005 Connection of Local Linear Embedding, ISOMAP, and Kernel Principal
More informationAPPROXIMATE SOLUTION FOR TRANSIENT HEAT TRANSFER IN STATIC TURBULENT HE II. B. Baudouy. CEA/Saclay, DSM/DAPNIA/STCM Gif-sur-Yvette Cedex, France
APPROXIMAE SOLUION FOR RANSIEN HEA RANSFER IN SAIC URBULEN HE II B. Bauouy CEA/Saclay, DSM/DAPNIA/SCM 91191 Gif-sur-Yvette Ceex, France ABSRAC Analytical solution in one imension of the heat iffusion equation
More informationMidterm. Introduction to Machine Learning. CS 189 Spring You have 1 hour 20 minutes for the exam.
CS 189 Spring 2013 Introduction to Machine Learning Midterm You have 1 hour 20 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. Please use non-programmable calculators
More informationInfluence of weight initialization on multilayer perceptron performance
Influence of weight initialization on multilayer perceptron performance M. Karouia (1,2) T. Denœux (1) R. Lengellé (1) (1) Université e Compiègne U.R.A. CNRS 817 Heuiasyc BP 649 - F-66 Compiègne ceex -
More informationDerivative Methods: (csc(x)) = csc(x) cot(x)
EXAM 2 IS TUESDAY IN QUIZ SECTION Allowe:. A Ti-30x IIS Calculator 2. An 8.5 by inch sheet of hanwritten notes (front/back) 3. A pencil or black/blue pen Covers: 3.-3.6, 0.2, 3.9, 3.0, 4. Quick Review
More information