13 Principal Components Analysis
We now discuss an unsupervised learning algorithm, called Principal Components Analysis, or PCA. The method is unsupervised because we are learning a mapping without any examples of what the mapping looks like; all we see are the outputs, and we want to estimate both the mapping and the inputs.

PCA is primarily a tool for dealing with high-dimensional data. If our measurements are 17-dimensional, or 30-dimensional, or 10,000-dimensional, manipulating the data can be extremely difficult. Quite often, the actual data can be described by a much lower-dimensional representation that captures all of the structure of the data. PCA is perhaps the simplest approach for finding such a representation, and yet it is also very fast and effective, resulting in it being very widely used.

There are several ways in which PCA can help:

Visualization: PCA provides a way to visualize the data, by projecting the data down to two or three dimensions that you can plot, in order to get a better sense of the data. Furthermore, the principal component vectors sometimes provide insight as to the nature of the data as well.

Preprocessing: Learning complex models of high-dimensional data is often very slow, and also prone to overfitting: the number of parameters in a model is usually exponential in the number of dimensions, meaning that very large data sets are required for higher-dimensional models. This problem is generally called the curse of dimensionality. PCA can be used to first map the data to a low-dimensional representation before applying a more sophisticated algorithm to it. With PCA one can also whiten the representation, which rebalances the weights of the data to give better performance in some cases.

Modeling: PCA learns a representation that is sometimes used as an entire model, e.g., a prior distribution for new data.

Compression: PCA can be used to compress data, by replacing the data with its low-dimensional representation.

13.1 The model and learning

In PCA, we assume we are given N data vectors {y_i}, where each vector is D-dimensional: y_i ∈ R^D. Our goal is to replace these vectors with lower-dimensional vectors {x_i} of dimensionality C, where C < D. We assume that they are related by a linear transformation:

    y = W x + b = \sum_{j=1}^{C} w_j x_j + b                                    (1)

The matrix W can be viewed as containing a set of C basis vectors, W = [w_1, ..., w_C]. If we also assume Gaussian noise in the measurements, this model is the same as the linear regression model studied earlier, but now the x_i's are unknown in addition to the linear parameters.
To learn the model, we solve the following constrained least-squares problem:

    arg min_{W, b, {x_i}} \sum_i \| y_i - (W x_i + b) \|^2                      (2)

    subject to W^T W = I                                                        (3)

The constraint W^T W = I requires that we obtain an orthonormal mapping W; it is equivalent to saying that

    w_i^T w_j = 1 if i = j, and 0 if i ≠ j                                      (4)

This constraint is required to resolve an ambiguity in the mapping: if we did not require W to be orthonormal, then the objective function would be underconstrained (why?). Note that an ambiguity remains in the learning even with this constraint (which one?), but this ambiguity is not very important.

The x_i coordinates are often called latent coordinates.

The algorithm for minimizing this objective function is as follows:

1. Let b = (1/N) \sum_i y_i.
2. Let K = (1/N) \sum_i (y_i - b)(y_i - b)^T.
3. Let V Λ V^T = K be the eigenvector decomposition of K. Λ is a diagonal matrix of eigenvalues (Λ = diag(λ_1, ..., λ_D)). The matrix V contains the eigenvectors, V = [V_1, ..., V_D], and is orthonormal: V^T V = I.
4. Assume that the eigenvalues are sorted from largest to smallest (λ_i ≥ λ_{i+1}). If this is not the case, sort them (and their corresponding eigenvectors).
5. Let W be the matrix of the first C eigenvectors: W = [V_1, ..., V_C].
6. Let x_i = W^T (y_i - b), for all i.

13.2 Reconstruction

Suppose we have learned a PCA model, and are given a new value y_new; how do we estimate its corresponding x_new? This can be done by minimizing

    \| y_new - (W x_new + b) \|^2                                               (5)

This is a linear least-squares problem, and can be solved with standard methods (in MATLAB, implemented by the backslash operator). However, W is orthonormal, and thus its transpose is its pseudoinverse, so the solution is given simply by

    x_new = W^T (y_new - b)                                                     (6)
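To make the six steps concrete, here is a minimal NumPy sketch of the fitting algorithm and of the projection in Equation (6). The notes themselves give no code (only a passing mention of MATLAB's backslash operator), so the function names, the toy data, and the choice of np.linalg.eigh (appropriate since K is symmetric) are our own.

```python
import numpy as np

def pca_fit(Y, C):
    """Fit PCA to an N x D data matrix Y (rows are the y_i), keeping C components."""
    N = Y.shape[0]
    b = Y.mean(axis=0)                   # step 1: b = (1/N) sum_i y_i
    Yc = Y - b
    K = Yc.T @ Yc / N                    # step 2: sample covariance K
    lam, V = np.linalg.eigh(K)           # step 3: eigendecomposition of symmetric K
    order = np.argsort(lam)[::-1]        # step 4: sort eigenvalues, largest first
    lam, V = lam[order], V[:, order]
    W = V[:, :C]                         # step 5: first C eigenvectors as columns
    X = Yc @ W                           # step 6: rows are x_i = W^T (y_i - b)
    return W, b, X, lam

def pca_project(W, b, y_new):
    """Equation (6): latent coordinates of a new point."""
    return W.T @ (y_new - b)

# Toy usage: 500 points in R^5 lying near a 2-dimensional affine subspace.
rng = np.random.default_rng(0)
Y = (rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5))
     + 10.0 + 0.01 * rng.normal(size=(500, 5)))
W, b, X, lam = pca_fit(Y, C=2)
y0_rec = W @ pca_project(W, b, Y[0]) + b   # reconstruct the first point
print(np.linalg.norm(Y[0] - y0_rec))       # small: Y is nearly 2-dimensional
```

Because W has orthonormal columns, projection and reconstruction are just two matrix products; no linear system needs to be solved.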
13.3 Properties of PCA

Mean-zero coefficients. One can show that the PCA coefficients that represent the training data, i.e., {x_i}_{i=1}^N, are mean zero:

    mean(x) ≡ (1/N) \sum_i x_i = (1/N) \sum_i W^T (y_i - b)                     (7)
            = (1/N) W^T ( \sum_i y_i - N b )                                    (8)
            = 0                                                                 (9)

Variance maximization. PCA can also be defined in the following way; in fact, this is the original definition of PCA, and the one that is often meant when people discuss PCA. However, this formulation is exactly equivalent to the one discussed above. In this view, we wish to find the first principal component w_1 so as to maximize the variance of the first coordinate of the data:

    var(x_1) = (1/N) \sum_i x_{1,i}^2 = (1/N) \sum_i ( w_1^T (y_i - b) )^2      (10)

such that \|w_1\|^2 = 1. Then, we choose the second principal component to be a unit vector orthogonal to the first component, while maximizing the variance of x_2. The remaining principal components are defined in the same recursive way, so that each component w_i is a unit vector orthogonal to all previous basis vectors.

Uncorrelated coefficients. It is straightforward to show that the covariance matrix of the PCA coefficients is just the upper-left C × C submatrix of Λ, i.e., the diagonal matrix containing the C leading eigenvalues of K:

    cov(x) ≡ (1/N) \sum_i ( W^T (y_i - b) )( W^T (y_i - b) )^T                  (11)
           = (1/N) W^T ( \sum_i (y_i - b)(y_i - b)^T ) W                        (12)
           = W^T K W                                                            (13)
           = W^T V Λ V^T W                                                      (14)
           = Λ̃                                                                 (15)

where Λ̃ is the diagonal matrix containing the C leading eigenvalues of Λ. This simple derivation also shows that the marginal variances of the PCA coefficients are given by the eigenvalues, i.e., var(x_j) = λ_j.
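All three properties are easy to confirm numerically. The following standalone sketch (the toy data and variable names are our own illustration, not from the notes) checks that the coefficients have zero mean, that their sample covariance is the diagonal matrix of leading eigenvalues, and that the marginal variances match those eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, C = 1000, 6, 3
Y = rng.normal(size=(N, C)) @ rng.normal(size=(C, D)) + rng.normal(size=D)

b = Y.mean(axis=0)
K = (Y - b).T @ (Y - b) / N
lam, V = np.linalg.eigh(K)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]
X = (Y - b) @ V[:, :C]                              # PCA coefficients

print(np.allclose(X.mean(axis=0), 0))               # mean-zero coefficients, Eqs. (7)-(9)
print(np.allclose(X.T @ X / N, np.diag(lam[:C])))   # cov(x) = leading eigenvalues, Eq. (15)
print(np.allclose(X.var(axis=0), lam[:C]))          # var(x_j) = lambda_j
```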
Out-of-subspace error. The total variance in the data is given by the sum of the eigenvalues of the sample covariance matrix K. The variance captured by the PCA subspace representation is the sum of the first C eigenvalues; the total amount of variance lost in the representation is the sum of the remaining eigenvalues. In fact, one can show that the least-squares error in the approximation to the original data provided by the optimal (ML) model parameters W*, {x_i*}, and b* is given by

    \sum_i \| y_i - (W* x_i* + b*) \|^2 = N \sum_{j=C+1}^{D} λ_j                (16)

When learning a PCA model it is common to use the ratio of the total LS error to the total variance in the training data (i.e., the sum of all eigenvalues). One needs to choose C to be large enough that this ratio is small (often 0.1 or less).

13.4 Whitening

Whitening is a preprocess that replaces the data with a representation that has zero mean and unit covariance, and is often useful as a data preprocessing step. Given measurements {y_i}, we replace them with {z_i} given by

    z_i = Λ̃^{-1/2} W^T (y_i - b) = Λ̃^{-1/2} x_i                               (17)

where Λ̃ is the diagonal matrix of the first C eigenvalues. Then the sample mean of the z_i's is zero:

    mean(z) = mean( Λ̃^{-1/2} x ) = Λ̃^{-1/2} mean(x) = 0                        (18)

To derive the sample covariance, we first compute the covariance of the untruncated values z̄_i ≡ Λ^{-1/2} V^T (y_i - b):

    cov(z̄) ≡ (1/N) \sum_i Λ^{-1/2} V^T (y_i - b)(y_i - b)^T V Λ^{-1/2}          (19)
            = Λ^{-1/2} V^T ( (1/N) \sum_i (y_i - b)(y_i - b)^T ) V Λ^{-1/2}     (20)
            = Λ^{-1/2} V^T K V Λ^{-1/2}                                         (21)
            = Λ^{-1/2} V^T V Λ V^T V Λ^{-1/2}                                   (22)
            = I                                                                 (23)

Since z_i is just the first C elements of z̄_i, z also has sample covariance I.
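Both facts, the out-of-subspace error of Equation (16) and the whitening identities, can be verified in a few lines. Again, the toy data below is our own sketch; note that with K defined using 1/N, the total squared reconstruction error is N times the sum of the trailing eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, C = 2000, 8, 3
Y = rng.normal(size=(N, C)) @ rng.normal(size=(C, D)) + 0.1 * rng.normal(size=(N, D))

b = Y.mean(axis=0)
Yc = Y - b
lam, V = np.linalg.eigh(Yc.T @ Yc / N)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]
W = V[:, :C]

# Out-of-subspace error, Equation (16): the summed squared residual
# equals N times the sum of the trailing eigenvalues.
Y_hat = (Yc @ W) @ W.T + b
err = np.sum((Y - Y_hat) ** 2)
print(np.isclose(err, N * lam[C:].sum()))

# Whitening, Equation (17): z_i = diag(lam_1..lam_C)^(-1/2) W^T (y_i - b).
Z = (Yc @ W) / np.sqrt(lam[:C])
print(np.allclose(Z.mean(axis=0), 0.0))         # zero mean, Eq. (18)
print(np.allclose(Z.T @ Z / N, np.eye(C)))      # identity covariance, Eqs. (19)-(23)
```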
13.5 Modeling

PCA is sometimes used to model data likelihood, e.g., we can use it as a form of prior. For example, suppose we have noisy measurements of some y values and wish to estimate their true values. If we parameterize the unknown y values by their corresponding x values instead, then we constrain the estimated values to lie in the low-dimensional subspace of the original data. However, this approach implies a uniform prior over x values, which may be inadequate, while being intolerant to deviations from the subspace. A better approach, with an inherently probabilistic model, is described below.

13.6 Probabilistic PCA

Probabilistic PCA is a way to estimate a probability distribution p(y); in fact, it is a form of Gaussian distribution. In particular, we assume the following probability distribution:

    x ∼ N(0, I)                                                                 (24)
    y = W x + b + n,   n ∼ N(0, σ² I)                                           (25)

where x and n are assumed to be statistically independent. The model says that the low-dimensional coordinates x (i.e., the underlying causes) come from a unit Gaussian distribution, and the y measurements are a linear function of these low-dimensional causes, plus Gaussian noise. Note that we no longer require W to be orthonormal (in part because we now constrain the magnitude of the x variables).

Since any linear transformation of a Gaussian variable is itself Gaussian, y must also be Gaussian. This distribution is

    p(y) = ∫ p(x, y) dx = ∫ p(y | x) p(x) dx = ∫ G(y; W x + b, σ² I) G(x; 0, I) dx   (26)

Evaluating this integral would give us p(y); however, there is a simpler way to obtain the Gaussian distribution. Since we know that y is Gaussian, all we need to do is derive its mean and covariance, which can be done as follows (using the fact that mathematical expectation is linear):

    mean(y) = E[y] = E[W x + b + n]                                             (27)
            = W E[x] + b + E[n]                                                 (28)
            = b                                                                 (29)

    cov(y) = E[(y - b)(y - b)^T]                                                (30)
           = E[(W x + b + n - b)(W x + b + n - b)^T]                            (31)
           = E[(W x + n)(W x + n)^T]                                            (32)
           = E[W x x^T W^T] + E[W x n^T] + E[n x^T W^T] + E[n n^T]              (33)
           = W E[x x^T] W^T + W E[x] E[n^T] + E[n] E[x^T] W^T + σ² I            (34)
           = W W^T + σ² I                                                       (35)

Hence,

    y ∼ N(b, W W^T + σ² I)                                                      (36)
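As a quick sanity check of this derivation, one can draw samples from the generative model (24)-(25) and compare their sample mean and covariance to b and W W^T + σ² I. The particular W, b, and σ below are arbitrary values chosen for illustration, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(3)
D, C, sigma = 4, 2, 0.5
W = rng.normal(size=(D, C))
b = rng.normal(size=D)

n_samples = 200_000
X = rng.normal(size=(n_samples, C))               # x ~ N(0, I), Eq. (24)
noise = sigma * rng.normal(size=(n_samples, D))   # n ~ N(0, sigma^2 I)
Y = X @ W.T + b + noise                           # y = Wx + b + n, Eq. (25)

print(np.allclose(Y.mean(axis=0), b, atol=0.02))  # mean(y) = b, Eq. (29)
emp_cov = np.cov(Y, rowvar=False)
print(np.allclose(emp_cov, W @ W.T + sigma**2 * np.eye(D), atol=0.05))  # Eq. (35)
```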
[Figure 1: Visualization of the PPCA mapping for a 1D-to-2D model. A Gaussian in 1D is mapped to a line, and then blurred with 2D noise. (Figure from Pattern Recognition and Machine Learning by Chris Bishop.)]

In other words, learning a PPCA model is equivalent to learning a particular form of Gaussian distribution. This is illustrated in Figure 1. The PPCA model is not as general as learning a full Gaussian model with a D × D covariance matrix; however, it uses fewer numbers to represent the Gaussian (CD + 1 versus D²/2 + D/2; why?). Because the representation is more compact, it can be estimated from smaller datasets, and requires less memory to store the model. These differences will be significant when D is large; e.g., if D = 100, the full covariance matrix would require 5050 parameters and thus hundreds of thousands of data points to estimate reliably. However, if the effective dimensionality is, say, 2 or 3, then the PPCA representation will have only a few hundred parameters and will require many fewer measurements.

Learning. The PPCA model can be learned by maximum likelihood, i.e., by minimizing

    L(W, b, σ²) = - ln \prod_{i=1}^{N} G(y_i; b, W W^T + σ² I)                  (37)
                = (1/2) \sum_i (y_i - b)^T (W W^T + σ² I)^{-1} (y_i - b)
                  + (N/2) ln( (2π)^D |W W^T + σ² I| )                           (38)

This can be optimized in closed form. The solution is very similar to the conventional PCA case:

1. Let b = (1/N) \sum_i y_i.
2. Let K = (1/N) \sum_i (y_i - b)(y_i - b)^T.
3. Let V Λ V^T = K be the eigenvector decomposition of K. Λ is a diagonal matrix of eigenvalues (Λ = diag(λ_1, ..., λ_D)). The matrix V contains the eigenvectors, V = [V_1, ..., V_D], and is orthonormal: V^T V = I.
4. Assume that the eigenvalues are sorted from largest to smallest (λ_i ≥ λ_{i+1}). If this is not the case, sort them (and their corresponding eigenvectors).
5. Let σ² = (1/(D - C)) \sum_{j=C+1}^{D} λ_j. In words, the estimated noise variance is equal to the average marginal data variance over all directions that are orthogonal to the C principal directions (i.e., this is the average variance, per dimension, of the data that is lost in the approximation of the data by the C-dimensional subspace).
6. Let Ṽ be the matrix comprising the first C eigenvectors, Ṽ = [V_1, ..., V_C], and let Λ̃ be the diagonal matrix with the C leading eigenvalues, Λ̃ = diag(λ_1, ..., λ_C).
7. Let W = Ṽ (Λ̃ - σ² I)^{1/2}.
8. Let x_i = W^T (y_i - b), for all i.

Note that this solution is similar to that in the conventional PCA case with whitening, except that (a) the noise variance is estimated, and (b) the noise is removed from the variances of the remaining eigenvalues.

An alternative optimization. In the above learning algorithm, we marginalized out x when estimating PPCA. In other words, we maximized

    p(y_{1:N} | W, b, σ²) = ∫ p(y_{1:N}, x_{1:N} | W, b, σ²) dx_{1:N}           (39)
                          = ∫ p(y_{1:N} | x_{1:N}, W, b, σ²) p(x_{1:N}) dx_{1:N}  (40)
                          = \prod_i ∫ p(y_i | x_i, W, b, σ²) p(x_i) dx_i        (41)

instead of maximizing

    p(y_{1:N}, x_{1:N} | W, b, σ²) = \prod_i p(y_i, x_i | W, b, σ²)             (42)
                                   = \prod_i p(y_i | x_i, W, b, σ²) p(x_i)      (43)

By integrating out x, we are estimating fewer parameters and thus can get better estimates. Loosely speaking, doing so might be viewed as being more Bayesian. Suppose we did instead try to
estimate the x_i's together with the model parameters:

    L(x_{1:N}, W, b, σ²) = - ln p(y_{1:N}, x_{1:N} | W, b, σ²)                  (44)
                         = (1/(2σ²)) \sum_i \| y_i - (W x_i + b) \|^2
                           + (1/2) \sum_i \| x_i \|^2 + (ND/2) ln σ² + ND ln 2π (45)

Now, suppose we are optimizing this objective function, and we have some estimates for W and x. We can always reduce the objective function by the replacement

    W ← 2 W                                                                     (46)
    x_i ← x_i / 2                                                               (47)

This leaves the reconstruction term W x_i unchanged while shrinking \sum_i \| x_i \|^2, and by applying the replacement arbitrarily many times we can make the values of x infinitesimally small. This indicates that the objective function is degenerate; using it will yield very poor results. Note, however, that this arises from the same model as before, but without marginalizing out x. This illustrates a general principle: the more parameters you estimate (instead of marginalizing out), the greater the danger of biased and/or degenerate solutions.
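To close the chapter, here is a minimal sketch of the closed-form maximum-likelihood solution (steps 1-8 above), applied to data sampled from a known PPCA model. The code and its names are our own illustration; since W is identifiable only up to a rotation of the latent space, we compare the implied covariance W W^T + σ² I rather than W itself.

```python
import numpy as np

def ppca_fit(Y, C):
    """Closed-form ML estimate of the PPCA parameters (steps 1-8)."""
    N = Y.shape[0]
    b = Y.mean(axis=0)                           # step 1
    Yc = Y - b
    K = Yc.T @ Yc / N                            # step 2
    lam, V = np.linalg.eigh(K)                   # step 3
    order = np.argsort(lam)[::-1]                # step 4
    lam, V = lam[order], V[:, order]
    sigma2 = lam[C:].mean()                      # step 5: average trailing eigenvalue
    V_C, lam_C = V[:, :C], lam[:C]               # step 6
    W = V_C @ np.diag(np.sqrt(lam_C - sigma2))   # step 7: W = V~ (L~ - sigma^2 I)^(1/2)
    X = Yc @ W                                   # step 8
    return W, b, sigma2, X

# Sample from a known PPCA model, then recover its parameters.
rng = np.random.default_rng(4)
D, C, sigma_true = 6, 2, 0.3
W_true = rng.normal(size=(D, C))
b_true = rng.normal(size=D)
N = 100_000
Y = (rng.normal(size=(N, C)) @ W_true.T + b_true
     + sigma_true * rng.normal(size=(N, D)))

W_hat, b_hat, sigma2_hat, _ = ppca_fit(Y, C)
print(sigma2_hat, sigma_true**2)                 # estimated noise variance, close to 0.09
print(np.allclose(W_hat @ W_hat.T + sigma2_hat * np.eye(D),
                  W_true @ W_true.T + sigma_true**2 * np.eye(D), atol=0.05))
```

With enough data the estimated noise variance approaches the true σ², and the implied Gaussian covariance approaches the true W W^T + σ² I, as Equation (36) predicts.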
Copyright © 2015 Aaron Hertzmann, David J. Fleet and Marcus Brubaker