CPSC 340: Machine Learning and Data Mining. More PCA Fall 2016


Admin A2/Midterm: Grades/solutions posted. Midterms can be viewed during office hours. Assignment 4: Due Monday. Extra office hours: Thursdays from 4:30-5:30 in ICICS X836.

1. Decision trees
2. Naïve Bayes classification
3. Ordinary least squares regression
4. Logistic regression
5. Support vector machines
6. Ensemble methods
7. Clustering algorithms
8. Principal component analysis
9. Singular value decomposition
10. Independent component analysis
http://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html

Last Week: Principal Component Analysis (PCA) PCA is a linear model for unsupervised learning. Represents features as linear combination of latent factors: But we're learning the latent factors W and latent features z_i. Can also be viewed as an approximate matrix factorization:
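The formulas here are images in the original slides; in the notation this course typically uses (a reconstruction, so treat the exact symbols as assumptions), the model and its matrix-factorization view are:

    x_{ij} \approx \sum_{c=1}^{k} z_{ic} w_{cj}
    \qquad \Longleftrightarrow \qquad
    X \approx Z W,

where X is n-by-d, Z is n-by-k (the latent features z_i), and W is k-by-d (the latent factors).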

Last Week: Principal Component Analysis (PCA) PCA is a linear model for unsupervised learning. Represents features as linear combination of latent factors: Uses: dimensionality reduction, visualization, factor discovery. http://infoproc.blogspot.ca/2008/11/european-genetic-substructure.html https://new.edu/resources/big-5-personality-traits

Last Week: Principal Component Analysis (PCA) PCA is a linear model for unsupervised learning. Represents features as linear combination of latent factors: Uses: dimensionality reduction, visualization, factor discovery. http://www.prismtc.co.uk/superheroes-pca/

Maximizing Variance vs. Minimizing Error Our synthesis view is that PCA minimizes approximation error. Makes connection to k-means and our tricks for linear regression. Classic analysis view: PCA maximizes variance in z_i space. You pick W to explain as much variance in the data as possible.
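As a quick sanity check on why the two views agree, here is a minimal NumPy sketch (my own illustration, not from the slides; the random data is made up): for centered data, the per-example reconstruction error of the top component plus the variance of the projections equals the total variance.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
    X -= X.mean(axis=0)                       # center each feature

    # Top principal component from the SVD of the centered data.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    w = Vt[0]                                 # unit-norm direction

    recon_error = np.sum((X - np.outer(X @ w, w)) ** 2)   # synthesis view
    proj_var = np.var(X @ w)                               # analysis view
    total_var = np.sum(np.var(X, axis=0))
    print(recon_error / len(X) + proj_var, total_var)      # the two numbers match

Minimizing the first term is therefore the same as maximizing the second.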

Choosing Number of Latent Factors Common approach to choosing k: Compute error with k=0: Compare to error with non-zero k: Gives a number between 0 and 1, indicating how much of the variance remains unexplained. If you want to explain 90% of variance, choose the smallest k where the ratio is < 0.10.
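A minimal sketch of that ratio in NumPy (my own illustration, not the course's code), using the fact that the squared singular values of the centered data matrix give the per-component contributions to the error:

    import numpy as np

    def variance_remaining(X, k):
        # Fraction of variance NOT explained by the top-k PCA factors:
        # error with k factors divided by error with k = 0.
        Xc = X - X.mean(axis=0)                    # center each feature
        s = np.linalg.svd(Xc, compute_uv=False)    # singular values
        return np.sum(s[k:] ** 2) / np.sum(s ** 2)

You would then pick the smallest k with variance_remaining(X, k) < 0.10 if you want to explain 90% of the variance.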

PCA Computation The PCA objective with general d and k: 3 common ways to solve this problem: Singular value decomposition (SVD): the classic non-iterative approach. Alternating between updating W and updating Z. Stochastic gradient: gradient descent based on a random i and j. (Or just plain gradient descent.) The objective is not convex, but all of these methods work with random initialization.
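Here is a sketch of the "alternate between updating W and updating Z" approach in NumPy (an illustration under assumed notation, not the course's implementation). With one factor fixed, the optimal other factor is a closed-form least-squares solve, so each update can only decrease the objective.

    import numpy as np

    def pca_alternating(X, k, iters=100, seed=0):
        Xc = X - X.mean(axis=0)                      # center each feature
        rng = np.random.default_rng(seed)
        W = rng.standard_normal((k, Xc.shape[1]))    # random initialization
        for _ in range(iters):
            Z = Xc @ W.T @ np.linalg.inv(W @ W.T)    # optimal Z given W
            W = np.linalg.inv(Z.T @ Z) @ Z.T @ Xc    # optimal W given Z
        return W, Z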

PCA Computation The PCA objective with general d and k: Where we've subtracted the mean µ_j from each feature. At test time, to find the optimal Z given W for new data:
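The test-time expression is an image on the slide; reconstructing it from the least-squares problem (so the exact symbols here are my assumption), it should be

    \hat{Z} = \tilde{X}\, W^{\top} (W W^{\top})^{-1},
    \qquad \tilde{x}_{ij} = x_{ij} - \mu_j,

i.e., finding the code for new data only requires solving a small k-by-k linear system, with the training means µ_j and the factors W held fixed.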

PCA Non-Uniqueness We have the scaling problem:
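Concretely (my own illustration of the scaling problem, not the slide's equation): for any α ≠ 0,

    \left(\tfrac{1}{\alpha} Z\right) (\alpha W) = Z W,

so the objective cannot tell W apart from αW; one common fix is to constrain each factor to have unit norm.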

PCA Non-Uniqueness But with multiple PCs, we have new problems: Factors could be non-orthogonal (components interfere with each other): You can still rotate the factors and also have label switching. A fix is to fit the PCs sequentially (can be done with SVD approach):
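The rotation and label-switching issues have the same form (again a reconstruction, not the slide's own equation): for any orthogonal k-by-k matrix Q,

    (Z Q^{\top})(Q W) = Z W,

so even unit-norm factors can be rotated or permuted without changing the approximation. Fitting the PCs sequentially, each with unit norm and orthogonal to the previous ones, pins down an ordered basis that is unique up to sign.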

Basis, Orthogonality, Sequential Fitting http://setosa.io/ev/principal-component-analysis

Do we need all this math?

Colour Opponency in the Human Eye The classic model of the eye has 4 types of photoreceptors: Rods (more sensitive to brightness). L-Cones (most sensitive to red). M-Cones (most sensitive to green). S-Cones (most sensitive to blue). Two problems with this system: Correlation between receptors (not orthogonal). Particularly between red/green. We have 4 receptors for 3 colours. http://oneminuteastronomer.com/astro-course-day-5/ https://en.wikipedia.org/wiki/color_visio

Colour Opponency in the Human Eye Bipolar and ganglion cells seem to code using opponent colors: a 3-variable orthogonal basis: This is similar to PCA (d = 4, k = 3). http://oneminuteastronomer.com/astro-course-day-5/ https://en.wikipedia.org/wiki/color_visio http://5sensesnews.blogspot.ca/

Colour Opponency Representation

Application: Face Detection Consider the problem of face detection: Classic methods use eigenfaces as a basis: PCA applied to images of faces. https://developer.apple.com/library/content/documentation/graphicsimaging/conceptual/coreimaging/ci_detect_faces/ci_detect_faces.html

Eigenfaces Collect a bunch of images of faces under different conditions:
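A minimal eigenfaces sketch in NumPy (the file name, image layout, and choice of k are hypothetical, just to show the shape of the computation):

    import numpy as np

    faces = np.load("faces.npy")                # hypothetical: n x (height*width), one flattened face per row
    mean_face = faces.mean(axis=0)
    U, s, Vt = np.linalg.svd(faces - mean_face, full_matrices=False)

    k = 25
    eigenfaces = Vt[:k]                         # each row reshapes back into an image
    codes = (faces - mean_face) @ eigenfaces.T  # k numbers describe each face
    approx = mean_face + codes @ eigenfaces     # rank-k reconstruction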


Representing Faces But how should we represent faces? K-means: Grandmother cell: one neuron = one face. Almost certainly not true: too few neurons. PCA: Distributed representation. Coded by the pattern of a group of neurons. Can represent more concepts. But PCA uses positive/negative cancelling parts. Non-negative matrix factorization (NMF): Latent-factor model where W and Z are non-negative. Example of sparse coding: coded by a small number of neurons in the group. NMF makes objects out of parts. http://www.columbia.edu/~jwp2128/teaching/w4721/papers/nmf_nature.pdf
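For contrast, a hedged sketch of the NMF alternative using scikit-learn's NMF class (the faces.npy file and the 25 components are hypothetical; pixel intensities must be non-negative for this to apply):

    import numpy as np
    from sklearn.decomposition import NMF

    faces = np.load("faces.npy")                # hypothetical non-negative face matrix, n x d
    model = NMF(n_components=25, init="nndsvd", max_iter=500)
    Z = model.fit_transform(faces)              # n x k, non-negative codes
    W = model.components_                       # k x d, non-negative "parts"

Because both Z and W are constrained to be non-negative, the rows of W tend to look like localized parts of a face rather than the global positive/negative patterns PCA produces.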

Representing Faces Why sparse coding? Parts are intuitive, and brains seem to use sparse representations. Energy efficiency if using a sparse code. Increases the number of concepts you can memorize? Some evidence for this in the fruit fly olfactory system. http://www.columbia.edu/~jwp2128/teaching/w4721/papers/nmf_nature.pdf

Summary The analysis view of PCA is that it maximizes variance. We can choose k to explain x% of the variance in the data. Orthogonal basis and sequential fitting of PCs: Leads to non-redundant PCs with unique directions. Biological motivation for orthogonal and/or sparse latent factors. Non-negative matrix factorization leads to a sparse latent-factor model. Next time: modifying PCA so it splits faces into eyes, mouths, etc.