COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017
|
|
- Giles Wilson
- 5 years ago
- Views:
Transcription
1 COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University
2 PRINCIPAL COMPONENT ANALYSIS
3 DIMENSIONALITY REDUCTION We re given data x 1,..., x n, where x R d. This data is often high-dimensional, but the information doesn t use the full d dimensions. For example, we could represent the above images with three numbers since they have three degrees of freedom. Two for shifts and a third for rotation. Principal component analysis can be thought of as a way of automatically mapping data x i into some new low-dimensional coordinate system. It capture most of the information in the data in a few dimensions Extensions allow us to handle missing data, and unwrap the data.
4 PRINCIPAL COMPONENT ANALYSIS Example: How can we approximate this data using a unit-length vector q? x i q is a unit-length vector, q T q = 1. q (q T x i )q Red dot: The length, q T x i, to the axis after projecting x onto the line defined by q. The vector (q T x i )q takes q and stretches it to the corresponding red dot. So what s a good q? How about minimizing the squared approximation error, q = arg min q x i qq T x i 2 subject to q T q = 1 qq T x i = (q T x i )q : The approximation of x i by stretching q to the red dot.
5 PCA : THE FIRST PRINCIPAL COMPONENT This is related to the problem of finding the largest eigenvalue, q = arg min q = arg min q x i qq T x i 2 s.t. q T q = 1 ( ) xi T x i q T x i xi T q }{{} = XX T We ve defined X = [x 1,..., x n ]. Since the first term doesn t depend on q and we have a negative sign in front of the second term, equivalently we solve q = arg max q q T (XX T )q subject to q T q = 1 This is the eigendecomposition problem: q is the first eigenvector of XX T λ = q T (XX T )q is the first eigenvalue
6 PCA: GENERAL The general form of PCA considers K eigenvectors, q = arg min q = arg min q x i xi T x i K { (xi T q k )q k 2 s.t. q T 1, k = k k q k = 0, k k k=1 }{{} approximates x K k=1 q T k ( ) x i xi T } {{ } = XX T q k The vectors in Q = [q 1,..., q K ] give us a K dimensional subspace with which to represent the data: q T 1 x K x proj =., x (q T k x)q k = Qx proj q T k=1 Kx The eigenvectors of (XX T ) can be learned using built-in software.
7 EIGENVALUES, EIGENVECTORS AND THE SVD An equivalent formulation of the problem is to find (λ, q) such that (XX T )q = λq Since (XX T ) is a PSD matrix, there are r min{d, n} pairs, λ 1 λ 2 λ r > 0, q T k q k = 1, q T k q k = 0 Why is (XX T ) PSD? Using the SVD, X = USV T, we have that (XX T ) = US 2 U T Q = U, λ i = (S 2 ) ii 0 Preprocessing: Usually we first subtract off the mean of each dimension of x.
8 PCA: EXAMPLE OF PROJECTING FROM R 3 TO R 2 For this data, most information (structure in the data) can be captured in R 2. (left) The original data in R 3. The hyperplane is defined by q 1 and q 2. [ ] (right) The new coordinates for the data: x i x proj x T i = i q 1 xi T. q 2
9 EXAMPLE: DIGITS Data: images of handwritten 3 s (as vectors in R 256 ) Mean λ 1 = λ 2 = λ 3 = λ 4 = Above: The first four eigenvectors q and their eigenvalues λ. Original M = 1 M = 10 M = 50 M = 250 Above: Reconstructing a 3 using the first M 1 eigenvectors plus the mean, and approximation M 1 x mean + (x T q k )q k k=1
10 PROBABILISTIC PCA
11 PCA AND THE SVD We ve discussed how any matrix X has a singular value decomposition, X = USV T, U T U = I, V T V = I and S is a diagonal matrix with non-negative entries. Therefore, XX T = US 2 U T (XX T )U = US 2 U is a matrix of eigenvectors, and S 2 is a diagonal matrix of eigenvalues.
12 A MODELING APPROACH TO PCA Using the SVD perspective of PCA, we can also derive a probabilistic model for the problem and use the EM algorithm to learn it. This model will have the advantages of: Handling the problem of missing data Allowing us to learn additional parameters such as noise Provide a framework that could be extended to more complex models Gives distributions used to characterize uncertainty in predictions etc.
13 PROBABILISTIC PCA In effect, this is a new matrix factorization model. With the SVD, we had X = USV T. We now approximate X WZ, where W is a d K matrix. In different settings this is called a factor loadings matrix, or a dictionary. It s like the eigenvectors, but no orthonormality. The ith column of Z is called zi R K. Think of it as a low-dimensional representation of x i. The generative process of Probabilistic PCA is x i N(Wz i, σ 2 I), z i N(0, I). In this case, we don t know W or any of the z i.
14 THE LIKELIHOOD Maximum likelihood Our goal is to find the maximum likelihood solution of the matrix W under the marginal distribution, i.e., with the z i vectors integrated out, W ML = arg max W ln p(x 1,..., x n W) = arg max W ln p(x i W). This is intractable because p(x i W) = N(x i 0, σ 2 I + WW T ), N(x i 0, σ 2 I + WW T 1 ) = (2π) d 2 σ 2 I + WW T 1 2 e 1 2 xt (σ 2 I+WW T ) 1 x We can set up an EM algorithm that uses the vectors z 1,..., z n.
15 EM FOR PROBABILISTIC PCA Setup The marginal log likelihood can be expressed using EM as ln p(x i, z i W) dz i = + q(z i ) ln p(x i, z i W) q(z i ) q(z i ) ln dz i L q(z i ) p(z i x i, W) dz i KL EM Algorithm: Remember that EM has two iterated steps 1. Set q(z i ) = p(z i x i, W) for each i (making KL = 0) and calculate L 2. Maximize L with respect to W Again, for this to work well we need that we can calculate the posterior distribution p(z i x i, W), and maximizing L is easy, i.e., we update W using a simple equation
16 THE ALGORITHM EM for Probabilistic PCA Given: Data x 1:n, x i R d and model x i N(Wz i, σ 2 ), z i N(0, I), z R K Output: Point estimate of W and posterior distribution on each z i E-Step: Set each q(z i ) = p(z i x i, W) = N(z i µ i, Σ i ) where Σ i = (I + W T W/σ 2 ) 1, µ i = Σ i W T x i /σ 2 M-Step: Update W by maximizing the objective L from the E-step [ ] [ W = x i µ T i σ 2 I + (µ i µ T i + Σ i ) ] 1 Iterate E and M steps until increase in n ln p(x i W) is small. Comment: The probabilistic framework gives a way to learn K and σ 2 as well.
17 EXAMPLE: IMAGE PROCESSING = 8 x 8 patch X data matrix, e.g., 64 x 262,144 For image problems such as denoising or inpainting (missing data) Extract overlapping patches (e.g., 8 8) and vectorize to construct X Model with a factor model such as Probabilistic PCA Approximate x i Wµ i, where µ i is the posterior mean of z i Reconstruct the image by replacing x i with Wµ i (and averaging)
18 EXAMPLE: DENOISING Noisy image on left, denoised image on right. The noise variance parameter σ 2 was learned for this example.
19 E XAMPLE : M ISSING DATA Another somewhat extreme example: I Image is (RGB dimension) I Throw away 80% at random I (left) Missing data, (middle) reconstruction, (right) original image
20 KERNEL PCA
21 KERNEL PCA We ve seen how we can take an algorithm that uses dot products, x T x, and generalize with a nonlinear kernel. This generalization can be made to PCA. Recall: With PCA we find the eigenvectors of the matrix n x ix T i = XX T. Let φ(x) be a feature mapping from R d to R D, where D d We want to solve the eigendecomposition [ ] φ(x i )φ(x i ) T q k = λ k q k without having to work in the higher dimensional space. That is, how can we do PCA without explicitly using φ( ) and q?
22 KERNEL PCA Notice that we can reorganize the operations of the eigendecomposition φ(x i ) ( φ(x i ) T ) q k /λk = q k }{{} = a ki That is, the eigenvector q k = n a kiφ(x i ) for some vector a k R n. The trick is that instead of learning q k, we ll learn a k. Plug this equation for q k back into the first equation: N φ(x i ) j=1 a kj φ(x i ) T φ(x j ) }{{} = K(x i,x j) = λ k a ki φ(x i ) and multiply both sides by φ(x l ) T for each l {1,..., n}.
23 KERNEL PCA When we multiply the following by φ(x l ) T for l = 1..., n: N φ(x i ) j=1 we get a new set of linear equations a kj φ(x i ) T φ(x j ) }{{} = K(x i,x j) = λ k a ki φ(x i ) K 2 a k = λ k Ka k Ka k = λ k a k where K is the n n kernel matrix constructed on the data. Because K is guaranteed to be PSD because it is a matrix of dot-products, the LHS and RHS above share a solution for (λ k, a k ). Now perform regular PCA, but on the kernel matrix K instead of the data matrix XX T. We summarize the algorithm on the following slide.
24 KERNEL PCA ALGORITHM Kernel PCA Given: Data x 1,..., x n, x R d, and a kernel function K(x i, x j ). Construct: The kernel matrix on the data, e.g., K ij = b exp Solve: The eigendecomposition { xi xj 2 c }. Ka k = λ k a k for the first r n eigenvector/eigenvalue pairs (λ 1, a 1 ),..., (λ r, a r ). Output: A new coordinate system for x i by (implicitly) mapping φ(x i ) and then projecting q T k φ(x i) x i projection λ 1 a 1i. λ r a ri where a ki is the ith dimension of the kth eigenvector a k.
25 KERNEL PCA AND NEW DATA Q: How do we handle new data, x 0? Before, we could take the eigenvectors q k and project x T 0 q k, but a k is different here. A: Recall the relationship of a k to q k in kernel PCA is q k = a ki φ(x i ). We used the kernel trick to avoid working with or even defining φ(x i ). As with regular PCA, after mapping x 0 we want to project onto eigenvectors x 0 projection φ(x 0 ) T q 1. φ(x 0 ) T q r Plugging in for q k : φ(x 0 ) T q k = a ki φ(x 0 ) T φ(x i ) = a ki K(x 0, x i ).
26 EXAMPLE RESULTS An example of kernel PCA using the Gaussian kernel. (left) Original data, colored for reference (but may be classes) (middle) New coordinates using kernel width c = 2 (right) New coordinates using kernel width c = 10 Terminology: What we are doing is closely related to spectral clustering and can be considered an instance of manifold learning.
COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017
COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS
More informationIntroduction to Machine Learning
10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what
More informationPCA, Kernel PCA, ICA
PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per
More informationMachine Learning - MT & 14. PCA and MDS
Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)
More informationExpectation Maximization
Expectation Maximization Machine Learning CSE546 Carlos Guestrin University of Washington November 13, 2014 1 E.M.: The General Case E.M. widely used beyond mixtures of Gaussians The recipe is the same
More informationLecture 7: Con3nuous Latent Variable Models
CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 7: Con3nuous Latent Variable Models All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/
More informationComputer Vision Group Prof. Daniel Cremers. 6. Mixture Models and Expectation-Maximization
Prof. Daniel Cremers 6. Mixture Models and Expectation-Maximization Motivation Often the introduction of latent (unobserved) random variables into a model can help to express complex (marginal) distributions
More informationVariational Autoencoders
Variational Autoencoders Recap: Story so far A classification MLP actually comprises two components A feature extraction network that converts the inputs into linearly separable features Or nearly linearly
More informationPCA and admixture models
PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 13: Learning in Gaussian Graphical Models, Non-Gaussian Inference, Monte Carlo Methods Some figures
More informationFactor Analysis (10/2/13)
STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.
More informationManifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA
Manifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inria.fr http://perception.inrialpes.fr/
More informationData Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis
Data Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture
More informationLecture 24: Principal Component Analysis. Aykut Erdem May 2016 Hacettepe University
Lecture 4: Principal Component Analysis Aykut Erdem May 016 Hacettepe University This week Motivation PCA algorithms Applications PCA shortcomings Autoencoders Kernel PCA PCA Applications Data Visualization
More informationConnection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis
Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Alvina Goh Vision Reading Group 13 October 2005 Connection of Local Linear Embedding, ISOMAP, and Kernel Principal
More informationLecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26
Principal Component Analysis Brett Bernstein CDS at NYU April 25, 2017 Brett Bernstein (CDS at NYU) Lecture 13 April 25, 2017 1 / 26 Initial Question Intro Question Question Let S R n n be symmetric. 1
More informationPrincipal Component Analysis
CSci 5525: Machine Learning Dec 3, 2008 The Main Idea Given a dataset X = {x 1,..., x N } The Main Idea Given a dataset X = {x 1,..., x N } Find a low-dimensional linear projection The Main Idea Given
More informationAdvanced Introduction to Machine Learning CMU-10715
Advanced Introduction to Machine Learning CMU-10715 Principal Component Analysis Barnabás Póczos Contents Motivation PCA algorithms Applications Some of these slides are taken from Karl Booksh Research
More informationCS181 Midterm 2 Practice Solutions
CS181 Midterm 2 Practice Solutions 1. Convergence of -Means Consider Lloyd s algorithm for finding a -Means clustering of N data, i.e., minimizing the distortion measure objective function J({r n } N n=1,
More informationIntroduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin
1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)
More informationCSC411 Fall 2018 Homework 5
Homework 5 Deadline: Wednesday, Nov. 4, at :59pm. Submission: You need to submit two files:. Your solutions to Questions and 2 as a PDF file, hw5_writeup.pdf, through MarkUs. (If you submit answers to
More informationCS281 Section 4: Factor Analysis and PCA
CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we
More informationPrincipal components analysis COMS 4771
Principal components analysis COMS 4771 1. Representation learning Useful representations of data Representation learning: Given: raw feature vectors x 1, x 2,..., x n R d. Goal: learn a useful feature
More informationLatent Variable Models and EM Algorithm
SC4/SM8 Advanced Topics in Statistical Machine Learning Latent Variable Models and EM Algorithm Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/atsml/
More informationCOMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017
COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University SOFT CLUSTERING VS HARD CLUSTERING
More informationUnsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent
Unsupervised Machine Learning and Data Mining DS 5230 / DS 4420 - Fall 2018 Lecture 7 Jan-Willem van de Meent DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Dimensionality Reduction Goal:
More informationCS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works
CS68: The Modern Algorithmic Toolbox Lecture #8: How PCA Works Tim Roughgarden & Gregory Valiant April 20, 206 Introduction Last lecture introduced the idea of principal components analysis (PCA). The
More informationLatent Variable Models and EM algorithm
Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic
More informationUnsupervised Learning
2018 EE448, Big Data Mining, Lecture 7 Unsupervised Learning Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html ML Problem Setting First build and
More informationThe Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017
The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the
More informationCOMS 4721: Machine Learning for Data Science Lecture 18, 4/4/2017
COMS 4721: Machine Learning for Data Science Lecture 18, 4/4/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University TOPIC MODELING MODELS FOR TEXT DATA
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 8 Continuous Latent Variable
More informationParametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a
Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a
More informationClustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning
Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades
More informationAnnouncements. stuff stat 538 Zaid Hardhaoui. Statistics. cry. spring. g fa. inference. VC dimension covering
Announcements spring Convex Optimization next quarter ML stuff EE 578 Margam FaZe CS 547 Tim Althoff Modeling how to formulate real world problems as convex optimization Data science constrained optimization
More informationPrincipal Component Analysis
B: Chapter 1 HTF: Chapter 1.5 Principal Component Analysis Barnabás Póczos University of Alberta Nov, 009 Contents Motivation PCA algorithms Applications Face recognition Facial expression recognition
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationNonlinear Dimensionality Reduction
Nonlinear Dimensionality Reduction Piyush Rai CS5350/6350: Machine Learning October 25, 2011 Recap: Linear Dimensionality Reduction Linear Dimensionality Reduction: Based on a linear projection of the
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More informationLecture: Face Recognition and Feature Reduction
Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 11-1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed
More informationStatistical Machine Learning
Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x
More information4 Bias-Variance for Ridge Regression (24 points)
Implement Ridge Regression with λ = 0.00001. Plot the Squared Euclidean test error for the following values of k (the dimensions you reduce to): k = {0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500,
More informationData Mining Techniques
Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 12 Jan-Willem van de Meent (credit: Yijun Zhao, Percy Liang) DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Linear Dimensionality
More informationDimensionality Reduction with Principal Component Analysis
10 Dimensionality Reduction with Principal Component Analysis Working directly with high-dimensional data, such as images, comes with some difficulties: it is hard to analyze, interpretation is difficult,
More informationAdvanced Introduction to Machine Learning
10-715 Advanced Introduction to Machine Learning Homework 3 Due Nov 12, 10.30 am Rules 1. Homework is due on the due date at 10.30 am. Please hand over your homework at the beginning of class. Please see
More informationCPSC 340: Machine Learning and Data Mining. More PCA Fall 2017
CPSC 340: Machine Learning and Data Mining More PCA Fall 2017 Admin Assignment 4: Due Friday of next week. No class Monday due to holiday. There will be tutorials next week on MAP/PCA (except Monday).
More informationLECTURE 16: PCA AND SVD
Instructor: Sael Lee CS549 Computational Biology LECTURE 16: PCA AND SVD Resource: PCA Slide by Iyad Batal Chapter 12 of PRML Shlens, J. (2003). A tutorial on principal component analysis. CONTENT Principal
More informationSupport Vector Machines. Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar
Data Mining Support Vector Machines Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar 02/03/2018 Introduction to Data Mining 1 Support Vector Machines Find a linear hyperplane
More informationPrincipal Component Analysis (PCA)
Principal Component Analysis (PCA) Salvador Dalí, Galatea of the Spheres CSC411/2515: Machine Learning and Data Mining, Winter 2018 Michael Guerzhoy and Lisa Zhang Some slides from Derek Hoiem and Alysha
More informationKernel methods for comparing distributions, measuring dependence
Kernel methods for comparing distributions, measuring dependence Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Principal component analysis Given a set of M centered observations
More informationPrincipal Component Analysis
Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used
More informationMachine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling
Machine Learning B. Unsupervised Learning B.2 Dimensionality Reduction Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University
More informationMachine Learning for Signal Processing Sparse and Overcomplete Representations. Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013
Machine Learning for Signal Processing Sparse and Overcomplete Representations Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013 1 Key Topics in this Lecture Basics Component-based representations
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA. Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA Tobias Scheffer Overview Principal Component Analysis (PCA) Kernel-PCA Fisher Linear Discriminant Analysis t-sne 2 PCA: Motivation
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationLecture: Face Recognition and Feature Reduction
Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab 1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed in the
More informationCSC411: Final Review. James Lucas & David Madras. December 3, 2018
CSC411: Final Review James Lucas & David Madras December 3, 2018 Agenda 1. A brief overview 2. Some sample questions Basic ML Terminology The final exam will be on the entire course; however, it will be
More informationDimensionality Reduc1on
Dimensionality Reduc1on contd Aarti Singh Machine Learning 10-601 Nov 10, 2011 Slides Courtesy: Tom Mitchell, Eric Xing, Lawrence Saul 1 Principal Component Analysis (PCA) Principal Components are the
More informationSingular Value Decomposition and Principal Component Analysis (PCA) I
Singular Value Decomposition and Principal Component Analysis (PCA) I Prof Ned Wingreen MOL 40/50 Microarray review Data per array: 0000 genes, I (green) i,i (red) i 000 000+ data points! The expression
More informationData Mining Techniques
Data Mining Techniques CS 6220 - Section 2 - Spring 2017 Lecture 4 Jan-Willem van de Meent (credit: Yijun Zhao, Arthur Gretton Rasmussen & Williams, Percy Liang) Kernel Regression Basis function regression
More informationProbabilistic & Unsupervised Learning
Probabilistic & Unsupervised Learning Week 2: Latent Variable Models Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College
More informationMachine Learning for Signal Processing Sparse and Overcomplete Representations
Machine Learning for Signal Processing Sparse and Overcomplete Representations Abelino Jimenez (slides from Bhiksha Raj and Sourish Chaudhuri) Oct 1, 217 1 So far Weights Data Basis Data Independent ICA
More informationStatistics 202: Data Mining. c Jonathan Taylor. Week 2 Based in part on slides from textbook, slides of Susan Holmes. October 3, / 1
Week 2 Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Part I Other datatypes, preprocessing 2 / 1 Other datatypes Document data You might start with a collection of
More informationPart I. Other datatypes, preprocessing. Other datatypes. Other datatypes. Week 2 Based in part on slides from textbook, slides of Susan Holmes
Week 2 Based in part on slides from textbook, slides of Susan Holmes Part I Other datatypes, preprocessing October 3, 2012 1 / 1 2 / 1 Other datatypes Other datatypes Document data You might start with
More information(a)
Chapter 8 Subspace Methods 8. Introduction Principal Component Analysis (PCA) is applied to the analysis of time series data. In this context we discuss measures of complexity and subspace methods for
More informationMLCC 2015 Dimensionality Reduction and PCA
MLCC 2015 Dimensionality Reduction and PCA Lorenzo Rosasco UNIGE-MIT-IIT June 25, 2015 Outline PCA & Reconstruction PCA and Maximum Variance PCA and Associated Eigenproblem Beyond the First Principal Component
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationCSC 411 Lecture 12: Principal Component Analysis
CSC 411 Lecture 12: Principal Component Analysis Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 12-PCA 1 / 23 Overview Today we ll cover the first unsupervised
More informationLecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan
Lecture 3: Latent Variables Models and Learning with the EM Algorithm Sam Roweis Tuesday July25, 2006 Machine Learning Summer School, Taiwan Latent Variable Models What to do when a variable z is always
More informationPrincipal Component Analysis
Principal Component Analysis Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Nina Balcan] slide 1 Goals for the lecture you should understand
More informationCOMS 4721: Machine Learning for Data Science Lecture 20, 4/11/2017
COMS 4721: Machine Learning for Data Science Lecture 20, 4/11/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University SEQUENTIAL DATA So far, when thinking
More informationPrincipal Component Analysis (PCA) CSC411/2515 Tutorial
Principal Component Analysis (PCA) CSC411/2515 Tutorial Harris Chan Based on previous tutorial slides by Wenjie Luo, Ladislav Rampasek University of Toronto hchan@cs.toronto.edu October 19th, 2017 (UofT)
More informationIntroduction PCA classic Generative models Beyond and summary. PCA, ICA and beyond
PCA, ICA and beyond Summer School on Manifold Learning in Image and Signal Analysis, August 17-21, 2009, Hven Technical University of Denmark (DTU) & University of Copenhagen (KU) August 18, 2009 Motivation
More informationData Mining. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395
Data Mining Dimensionality reduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 42 Outline 1 Introduction 2 Feature selection
More information4 Bias-Variance for Ridge Regression (24 points)
2 count = 0 3 for x in self.x_test_ridge: 4 5 prediction = np.matmul(self.w_ridge,x) 6 ###ADD THE COMPUTED MEAN BACK TO THE PREDICTED VECTOR### 7 prediction = self.ss_y.inverse_transform(prediction) 8
More informationData Preprocessing Tasks
Data Tasks 1 2 3 Data Reduction 4 We re here. 1 Dimensionality Reduction Dimensionality reduction is a commonly used approach for generating fewer features. Typically used because too many features can
More informationMACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA
1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR
More informationPrincipal Component Analysis and Linear Discriminant Analysis
Principal Component Analysis and Linear Discriminant Analysis Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1/29
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationESANN'2001 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2001, D-Facto public., ISBN ,
Sparse Kernel Canonical Correlation Analysis Lili Tan and Colin Fyfe 2, Λ. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong. 2. School of Information and Communication
More information14 Singular Value Decomposition
14 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing
More informationPCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani
PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given
More information7 Principal Component Analysis
7 Principal Component Analysis This topic will build a series of techniques to deal with high-dimensional data. Unlike regression problems, our goal is not to predict a value (the y-coordinate), it is
More informationDimensionality Reduction
Lecture 5 1 Outline 1. Overview a) What is? b) Why? 2. Principal Component Analysis (PCA) a) Objectives b) Explaining variability c) SVD 3. Related approaches a) ICA b) Autoencoders 2 Example 1: Sportsball
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More informationNonlinear Dimensionality Reduction
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Kernel PCA 2 Isomap 3 Locally Linear Embedding 4 Laplacian Eigenmap
More informationVariational Inference (11/04/13)
STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further
More informationSupervised Learning Coursework
Supervised Learning Coursework John Shawe-Taylor Tom Diethe Dorota Glowacka November 30, 2009; submission date: noon December 18, 2009 Abstract Using a series of synthetic examples, in this exercise session
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationNeuroscience Introduction
Neuroscience Introduction The brain As humans, we can identify galaxies light years away, we can study particles smaller than an atom. But we still haven t unlocked the mystery of the three pounds of matter
More informationPrincipal Component Analysis (PCA)
Principal Component Analysis (PCA) Additional reading can be found from non-assessed exercises (week 8) in this course unit teaching page. Textbooks: Sect. 6.3 in [1] and Ch. 12 in [2] Outline Introduction
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More information1 Linearity and Linear Systems
Mathematical Tools for Neuroscience (NEU 34) Princeton University, Spring 26 Jonathan Pillow Lecture 7-8 notes: Linear systems & SVD Linearity and Linear Systems Linear system is a kind of mapping f( x)
More informationKernel methods, kernel SVM and ridge regression
Kernel methods, kernel SVM and ridge regression Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Collaborative Filtering 2 Collaborative Filtering R: rating matrix; U: user factor;
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT68 Winter 8) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
More informationCollaborative Filtering: A Machine Learning Perspective
Collaborative Filtering: A Machine Learning Perspective Chapter 6: Dimensionality Reduction Benjamin Marlin Presenter: Chaitanya Desai Collaborative Filtering: A Machine Learning Perspective p.1/18 Topics
More informationSTATISTICAL LEARNING SYSTEMS
STATISTICAL LEARNING SYSTEMS LECTURE 8: UNSUPERVISED LEARNING: FINDING STRUCTURE IN DATA Institute of Computer Science, Polish Academy of Sciences Ph. D. Program 2013/2014 Principal Component Analysis
More informationCS229 Lecture notes. Andrew Ng
CS229 Lecture notes Andrew Ng Part X Factor analysis When we have data x (i) R n that comes from a mixture of several Gaussians, the EM algorithm can be applied to fit a mixture model. In this setting,
More informationLecture VIII Dim. Reduction (I)
Lecture VIII Dim. Reduction (I) Contents: Subset Selection & Shrinkage Ridge regression, Lasso PCA, PCR, PLS Lecture VIII: MLSC - Dr. Sethu Viayakumar Data From Human Movement Measure arm movement and
More information