Unsupervised Learning: Linear Dimension Reduction

1 Unsupervised Learning: Linear Dimension Reduction

2 Unsupervised Learning covers Clustering & Dimension Reduction (simplifying the complex) and Generation (creating something from nothing). In clustering and dimension reduction we only have the inputs of the function we want to learn; in generation we only have its outputs, and the input is some code. These slides cover Clustering & Dimension Reduction.

3 Clustering. Open question: how many clusters do we need? K-means clusters $X = \{x^1, \dots, x^n, \dots, x^N\}$ into K clusters: initialize the cluster centers $c^i$, $i = 1, 2, \dots, K$ (as K random $x^n$ from X); then repeat the following two steps. For all $x^n$ in X, set $b^n_i = 1$ if $x^n$ is closest to $c^i$ and $b^n_i = 0$ otherwise; then update every center as $c^i = \sum_n b^n_i x^n \,/\, \sum_n b^n_i$. A code sketch follows below.
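
The K-means loop above can be written in a few lines of numpy. This is a minimal sketch: the data matrix, K, the iteration count, and the random seed are all illustrative assumptions.

```python
import numpy as np

def kmeans(X, K, n_iters=50, seed=0):
    """K-means: X is (N, d); returns centers (K, d) and assignments (N,)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # K random x^n from X
    for _ in range(n_iters):
        # b_i^n = 1 iff x^n is closest to center c^i
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # c^i = sum_n b_i^n x^n / sum_n b_i^n
        for i in range(K):
            members = X[assign == i]
            if len(members) > 0:
                centers[i] = members.mean(axis=0)
    return centers, assign

# Toy usage: 100 random 2-D points grouped into 3 clusters
centers, assign = kmeans(np.random.default_rng(1).normal(size=(100, 2)), K=3)
```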

4 Clustering: Hierarchical Agglomerative Clustering (HAC). Step 1: build a tree by repeatedly merging the two most similar clusters, until everything is joined at a single root. Step 2: pick a threshold, and cut the tree at that level to obtain the clusters. A sketch using an off-the-shelf implementation follows below.
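
A minimal sketch of the two HAC steps; using scipy here is my own choice (the slides do not name a library), and the data and threshold are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(20, 5))   # 20 toy examples

# Step 1: build the tree (agglomerative merges, here with average linkage)
tree = linkage(X, method='average')

# Step 2: pick a threshold on the merge distance to cut the tree into clusters
labels = fcluster(tree, t=2.0, criterion='distance')
print(labels)
```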

5 Distributed Representation. With clustering, an object must belong to exactly one cluster (e.g. "Gon is an Enhancer"). A distributed representation instead describes the object with a vector of attribute strengths, e.g. Gon is: Enhancement 0.70, Emission 0.25, Transformation 0.05, Manipulation 0.00, Conjuration 0.00, Specialization 0.00. Dimension reduction produces this kind of lower-dimensional distributed representation.

6 Dimension Reduction. Data that looks 3-D may actually lie on a 2-D surface curled up in the 3-D space, so it can be unrolled and described in 2-D without losing information.

7 Dimension Reduction. In MNIST, a digit image is 28 x 28 dimensions, but most 28 x 28-dim vectors do not look like digits at all, so the digits actually occupy a much smaller region and can be described with far fewer dimensions.

8 Dimension Reduction: find a function mapping $x$ to $z$, where the dimension of $z$ is smaller than that of $x$. One option is feature selection: simply keep a subset of the original dimensions (e.g. drop $x_1$ and keep $x_2$), which only helps when some dimensions carry little information. Another is principal component analysis (PCA) [Bishop, Chapter 12], a linear map $z = Wx$.

9 Principal Component Analysis (PCA)

10 PCA. $z = Wx$. Reduce to 1-D: $z_1 = w^1 \cdot x$. Project all the data points $x$ onto $w^1$ to obtain a set of $z_1$. We want the variance of $z_1$ to be as large as possible: maximize $\mathrm{Var}(z_1) = \sum_{z_1} (z_1 - \bar{z}_1)^2$ subject to $\|w^1\|_2 = 1$.

11 PCA. $z = Wx$. Project all the data points $x$ onto $w^1$ and obtain a set of $z_1$; we want the variance of $z_1$ as large as possible. Reduce to 1-D: $z_1 = w^1 \cdot x$, maximize $\mathrm{Var}(z_1) = \sum_{z_1} (z_1 - \bar{z}_1)^2$ subject to $\|w^1\|_2 = 1$. For the second dimension, $z_2 = w^2 \cdot x$: maximize $\mathrm{Var}(z_2) = \sum_{z_2} (z_2 - \bar{z}_2)^2$ subject to $\|w^2\|_2 = 1$ and $w^1 \cdot w^2 = 0$. Stacking the rows, $W = \begin{bmatrix} (w^1)^T \\ (w^2)^T \\ \vdots \end{bmatrix}$ is an orthogonal matrix.

12 Warning of Math

13 PCA. $z_1 = w^1 \cdot x$, so $\bar{z}_1 = \frac{1}{N}\sum z_1 = \frac{1}{N}\sum w^1 \cdot x = w^1 \cdot \frac{1}{N}\sum x = w^1 \cdot \bar{x}$. Then $\mathrm{Var}(z_1) = \frac{1}{N}\sum_{z_1}(z_1 - \bar{z}_1)^2 = \frac{1}{N}\sum_{x}\big(w^1 \cdot x - w^1 \cdot \bar{x}\big)^2 = \frac{1}{N}\sum\big(w^1 \cdot (x - \bar{x})\big)^2$. Using $(a \cdot b)^2 = (a^T b)^2 = a^T b\,a^T b = a^T b\,(a^T b)^T = a^T b\,b^T a$, this becomes $\frac{1}{N}\sum (w^1)^T (x - \bar{x})(x - \bar{x})^T w^1 = (w^1)^T \Big[\frac{1}{N}\sum (x - \bar{x})(x - \bar{x})^T\Big] w^1 = (w^1)^T \mathrm{Cov}(x)\, w^1$. So: find $w^1$ maximizing $(w^1)^T S w^1$ subject to $\|w^1\|_2^2 = (w^1)^T w^1 = 1$, where $S = \mathrm{Cov}(x)$.

14 Find $w^1$ maximizing $(w^1)^T S w^1$ subject to $(w^1)^T w^1 = 1$. $S = \mathrm{Cov}(x)$ is symmetric and positive semi-definite (non-negative eigenvalues). Using a Lagrange multiplier [Bishop, Appendix E]: $g(w^1) = (w^1)^T S w^1 - \alpha\big((w^1)^T w^1 - 1\big)$. Setting $\partial g(w^1)/\partial w^1_1 = 0$, $\partial g(w^1)/\partial w^1_2 = 0$, ... gives $S w^1 - \alpha w^1 = 0$, i.e. $S w^1 = \alpha w^1$, so $w^1$ is an eigenvector of $S$. Then $(w^1)^T S w^1 = \alpha (w^1)^T w^1 = \alpha$; choosing the maximum one, $w^1$ is the eigenvector of the covariance matrix $S$ corresponding to the largest eigenvalue $\lambda_1$.
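
The conclusion above can be checked numerically: the direction of maximum variance is the top eigenvector of the covariance matrix, and the attained variance is its eigenvalue. A small numpy sketch with made-up data (the data matrix is an illustrative assumption):

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(500, 3)) @ np.diag([3.0, 1.0, 0.3])
S = np.cov(X, rowvar=False)             # S = Cov(x)
eigvals, eigvecs = np.linalg.eigh(S)    # eigh: S is symmetric PSD
w1 = eigvecs[:, np.argmax(eigvals)]     # eigenvector with the largest eigenvalue
z1 = (X - X.mean(axis=0)) @ w1          # z_1 = w^1 . (x - x_bar) for every example
print(z1.var(ddof=1), eigvals.max())    # Var(z_1) equals lambda_1
```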

15 Find $w^2$ maximizing $(w^2)^T S w^2$ subject to $(w^2)^T w^2 = 1$ and $(w^2)^T w^1 = 0$. $g(w^2) = (w^2)^T S w^2 - \alpha\big((w^2)^T w^2 - 1\big) - \beta\big((w^2)^T w^1 - 0\big)$. Setting $\partial g(w^2)/\partial w^2_1 = 0$, $\partial g(w^2)/\partial w^2_2 = 0$, ... gives $S w^2 - \alpha w^2 - \beta w^1 = 0$. Left-multiplying by $(w^1)^T$: $(w^1)^T S w^2 - \alpha (w^1)^T w^2 - \beta (w^1)^T w^1 = 0$. Here $(w^1)^T S w^2 = \big((w^1)^T S w^2\big)^T = (w^2)^T S^T w^1 = (w^2)^T S w^1 = \lambda_1 (w^2)^T w^1 = 0$ (using $S w^1 = \lambda_1 w^1$), $(w^1)^T w^2 = 0$, and $(w^1)^T w^1 = 1$, so $\beta = 0$. Therefore $S w^2 - \alpha w^2 = 0$, i.e. $S w^2 = \alpha w^2$: $w^2$ is the eigenvector of the covariance matrix $S$ corresponding to the 2nd largest eigenvalue $\lambda_2$.

16 PCA - decorrelation. $z = Wx$ gives $\mathrm{Cov}(z) = D$, a diagonal matrix, i.e. the components of $z$ are uncorrelated. Indeed $\mathrm{Cov}(z) = \frac{1}{N}\sum (z - \bar{z})(z - \bar{z})^T = W S W^T$ with $S = \mathrm{Cov}(x)$, and $W S W^T = W S\,[w^1 \cdots w^K] = W\,[S w^1 \cdots S w^K] = W\,[\lambda_1 w^1 \cdots \lambda_K w^K] = [\lambda_1 W w^1 \cdots \lambda_K W w^K] = [\lambda_1 e_1 \cdots \lambda_K e_K] = D$, a diagonal matrix (since $W w^k = e_k$, the k-th standard basis vector).
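
A quick numerical check of the decorrelation property: projecting centered data onto the eigenvectors of S gives a z whose covariance is (numerically) diagonal. The mixing matrix used to build the toy data is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ np.array([[2., 1., 0.],
                                          [0., 1., 1.],
                                          [0., 0., .5]])   # correlated toy data
S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
W = eigvecs[:, ::-1].T                 # rows are w^1, w^2, ... (descending eigenvalues)
Z = (X - X.mean(axis=0)) @ W.T         # z = W(x - x_bar) for every example
print(np.round(np.cov(Z, rowvar=False), 3))   # ~ diagonal matrix D of the eigenvalues
```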

17 End of Warning

18 PCA: Another Point of View. Suppose there is a set of basic components $u^1, u^2, u^3, u^4, u^5, \dots$ (e.g. basic strokes in a digit image). A digit image $x$ can then be written as $x \approx c_1 u^1 + c_2 u^2 + \cdots + c_K u^K + \bar{x}$, so the coefficient vector $(c_1, c_2, \dots, c_K)$ can represent the digit image in place of its raw pixels.

19 PCA: Another Point of View. Write $x - \bar{x} \approx c_1 u^1 + c_2 u^2 + \cdots + c_K u^K = \hat{x}$, with reconstruction error $\|(x - \bar{x}) - \hat{x}\|_2$. Find $u^1, \dots, u^K$ minimizing the total error $L = \min_{u^1, \dots, u^K} \sum \big\|(x - \bar{x}) - \sum_{k=1}^{K} c_k u^k\big\|_2$. In PCA, $z = Wx$ where the rows of $W$ are $(w^1)^T, (w^2)^T, \dots, (w^K)^T$, and these PCA components $w^1, w^2, \dots, w^K$ are exactly the $u^1, u^2, \dots, u^K$ that minimize $L$. Proof in [Bishop, Chapter 12].

20 $x - \bar{x} \approx c_1 u^1 + c_2 u^2 + \cdots + c_K u^K = \hat{x}$; reconstruction error $\|(x - \bar{x}) - \hat{x}\|_2$. Find $u^1, \dots, u^K$ minimizing the error. Writing one equation per example: $x^1 - \bar{x} \approx c^1_1 u^1 + c^1_2 u^2 + \cdots$, $x^2 - \bar{x} \approx c^2_1 u^1 + c^2_2 u^2 + \cdots$, $x^3 - \bar{x} \approx c^3_1 u^1 + c^3_2 u^2 + \cdots$, and so on. Collecting the centered examples as the columns of a matrix $X$, this is the factorization $X \approx [u^1 \; u^2 \; \cdots]\,[c^n_k]$, and we want the factors minimizing the error.

21 Minimizing the error of this factorization is solved by singular value decomposition (SVD): $X_{M \times N} \approx U_{M \times K}\,\Sigma_{K \times K}\,V_{K \times N}$, where the K columns of U are a set of orthonormal eigenvectors corresponding to the K largest eigenvalues of $X X^T$. This is exactly the solution of PCA.
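
A minimal numpy sketch of this SVD view of PCA; the toy data, the layout (centered examples as columns of X, matching the M x N convention above), and the value of K are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 6)) @ rng.normal(size=(6, 6))   # 100 examples, 6 features
X = (data - data.mean(axis=0)).T                             # M x N: centered examples as columns

U, s, Vt = np.linalg.svd(X, full_matrices=False)
K = 2
components = U[:, :K]                  # orthonormal eigenvectors of X X^T (largest eigenvalues)
codes = components.T @ X               # c_k = (x - x_bar) . u^k for every example
X_hat = components @ codes             # rank-K reconstruction
print(np.linalg.norm(X - X_hat))       # the minimized reconstruction error
```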

22 PCA looks like a neural network with one hidden layer (linear activation function) — an autoencoder. If $w^1, w^2, \dots, w^K$ are the components $u^1, u^2, \dots, u^K$, then $\hat{x} = \sum_{k=1}^{K} c_k w^k$ approximates $x - \bar{x}$, and to minimize the reconstruction error the coefficients are $c_k = (x - \bar{x}) \cdot w^k$. For K = 2: the input $x - \bar{x}$ (components $x_1, x_2, x_3$) is mapped to the hidden unit $c_1$ through the weights $w^1_1, w^1_2, w^1_3$, and mapped back to the output through the same weights.

23 PCA looks like a neural network with one hidden layer (linear activation function) — an autoencoder. If $w^1, w^2, \dots, w^K$ are the components $u^1, u^2, \dots, u^K$, the coefficients minimizing the reconstruction error are $c_k = (x - \bar{x}) \cdot w^k$. For K = 2 the network computes $c_1$ and $c_2$ from $x - \bar{x}$ through the weights $w^1$ and $w^2$, and reconstructs $x - \bar{x}$ from them through the same weights. Could we instead minimize the reconstruction error with gradient descent? And the network can be made deep: a deep autoencoder.
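
Below is a minimal sketch of training such a linear one-hidden-layer autoencoder by gradient descent with numpy; the toy data, learning rate, iteration count, and the use of untied encoder/decoder weights are my own illustrative assumptions (the slide draws tied weights). Gradient descent on this objective finds a good K-dimensional subspace, but the learned weights are not forced to be the orthonormal PCA components.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200)) * np.array([[3.], [2.], [1.], [.5], [.1]])  # 5-D data, examples as columns
Xc = X - X.mean(axis=1, keepdims=True)            # subtract x_bar

d, N = Xc.shape
K = 2
W_enc = rng.normal(scale=0.1, size=(K, d))        # encoder weights
W_dec = rng.normal(scale=0.1, size=(d, K))        # decoder weights
lr = 0.01

for step in range(5000):
    C = W_enc @ Xc                                # hidden codes c
    R = Xc - W_dec @ C                            # residual (x - x_bar) - x_hat
    grad_dec = -(2 / N) * R @ C.T                 # d(mean squared error)/dW_dec
    grad_enc = -(2 / N) * W_dec.T @ R @ Xc.T      # d(mean squared error)/dW_enc
    W_dec -= lr * grad_dec                        # gradient descent steps
    W_enc -= lr * grad_enc

print(np.mean(R ** 2))                            # reconstruction error after training
```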

24 Weakness of PCA. PCA is unsupervised: with no labels, the direction of maximum variance may mix together points that actually belong to different classes (supervised LDA addresses this). PCA is also linear: it cannot unroll a non-linear manifold such as the S-curve, so non-linear dimension reduction is covered in the following lectures. (Figure reference: hapter7/fig_s_manifold_pca.html)

25 PCA - Pokémon. Inspired from: pal-component-analysis-of-pokemon-data. 800 Pokémon, 6 features for each (HP, Atk, Def, Sp Atk, Sp Def, Speed). How many principal components? Compute the ratio $\lambda_i / (\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 + \lambda_6)$ for each of the eigenvalues $\lambda_1, \dots, \lambda_6$; the ratios suggest that using 4 components is good enough.
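
A sketch of that eigenvalue-ratio computation in numpy. The data here is a random placeholder standing in for the 800 x 6 Pokémon stats table referenced above.

```python
import numpy as np

# Placeholder for the 800 Pokemon x 6 stats (HP, Atk, Def, Sp Atk, Sp Def, Speed) table
X = np.random.default_rng(0).normal(size=(800, 6))

eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]   # lambda_1 >= ... >= lambda_6
ratios = eigvals / eigvals.sum()                              # lambda_i / (lambda_1 + ... + lambda_6)
print(np.round(ratios, 3))   # keep enough components to cover most of the variance
```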

26 PCA - Pokémon. Each principal component is a weight vector over the 6 features (HP, Atk, Def, Sp Atk, Sp Def, Speed) and can be interpreted by reading off its weights: PC1 weights all the stats positively, so it measures overall strength; another component weights Def positively while weighting Speed negatively, i.e. defense at the cost of speed.

27 PCA - Pokémon. Continuing the interpretation of the components over (HP, Atk, Def, Sp Atk, Sp Def, Speed): one component weights Sp Def positively while weighting Atk and HP negatively, i.e. special defense at the cost of attack and HP, and another weights HP strongly, i.e. high vitality.

28 PCA - MNIST. Using 30 components, each digit image $\approx a_1 w^1 + a_2 w^2 + \cdots + a_{30} w^{30}$. The components $w^i$, reshaped back into images, are the "eigen-digits".
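
A small sketch of computing eigen-digits with scikit-learn. It uses the bundled 8 x 8 digits dataset as a stand-in for 28 x 28 MNIST, which is an assumption on my part.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()                               # 8x8 digit images, flattened to 64 dims
pca = PCA(n_components=30)
codes = pca.fit_transform(digits.data)               # a_1, ..., a_30 for every image
eigen_digits = pca.components_                       # the components w^1, ..., w^30
reconstruction = pca.inverse_transform(codes[:1])    # a_1 w^1 + a_2 w^2 + ... + mean image
print(eigen_digits.shape, reconstruction.shape)      # (30, 64) (1, 64)
```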

29 PCA - Face. Using 30 components on face images gives the "eigen-faces". (Source: ing08/assignment3.html)

30 What happens in PCA? In $\approx a_1 w^1 + a_2 w^2 + \cdots$ the weights $a_i$ can be any real number, so PCA adds up and subtracts components (images); the components therefore need not look like parts of digits. Non-negative matrix factorization (NMF) instead forces $a_1, a_2, \dots$ to be non-negative, giving a purely additive combination, and forces $w^1, w^2, \dots$ to be non-negative, making the components look more like parts of digits. Ref: Daniel D. Lee and H. Sebastian Seung, "Algorithms for non-negative matrix factorization," Advances in Neural Information Processing Systems.
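
A minimal NMF sketch with scikit-learn; the random non-negative matrix below is a placeholder for a stack of flattened digit images, and the solver settings are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

# Rows: images (flattened, non-negative pixel values); columns: pixels
V = np.abs(np.random.default_rng(0).normal(size=(100, 64)))

model = NMF(n_components=30, init='nndsvd', max_iter=500)
A = model.fit_transform(V)           # non-negative weights a_1, a_2, ...
W = model.components_                # non-negative components w^1, w^2, ... ("parts")
print(np.linalg.norm(V - A @ W))     # error of the additive reconstruction
```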

31 NMF on MNIST

32 NMF on Face

33 Matrix Factorization

34 Matrix Factorization. The number in each cell of the table is how many figures of a given anime character each person (A, B, C, D, E) owns. There are some common factors behind the otakus and the characters that explain these counts.

35 Matrix Factorization. The factors are latent: they are not directly observable, and no one states them explicitly. For example, each otaku and each character may lean toward the tsundere (傲) type or the airhead (呆) type, and an otaku tends to buy figures of the characters whose type matches his or her own preference.

36 Give each otaku A, B, C, D, E a latent vector $r^A, r^B, r^C, r^D, r^E$ and each character 1, 2, 3, 4 a latent vector $r^1, r^2, r^3, r^4$ (e.g. along the tsundere/airhead axes). The counts form a matrix $X$ with M rows (number of otakus) and N columns (number of characters); the number of latent factors is K. Assume each count is an inner product: $r^A \cdot r^1 \approx 5$, $r^B \cdot r^1 \approx 4$, $r^C \cdot r^1 \approx 1$, and in general $n_{A1}, n_{A2}, n_{B1}, n_{B2}, \dots$ So the M x N matrix $X$ is approximated by the product of an M x K matrix (rows $r^A, r^B, \dots$) and a K x N matrix (columns $r^1, r^2, \dots$), minimizing the error; this can be solved by singular value decomposition (SVD).

37 Some entries of the table may be unknown (marked "?"): e.g. A owns 5, 3, ?, 1 figures of characters 1-4; B: 4, 3, ?, 1; C: 1, 1, ?, 5; D: 1, ?, 4, 4; and E's first entry is also ?. We still assume $n_{ij} \approx r^i \cdot r^j$ (e.g. $r^A \cdot r^1 \approx 5$, $r^B \cdot r^1 \approx 4$, $r^C \cdot r^1 \approx 1$), but SVD cannot skip missing entries, so instead minimize $L = \sum_{(i,j)} \big(r^i \cdot r^j - n_{ij}\big)^2$ over the defined values only, and find the $r^i$ and $r^j$ by gradient descent.

38 Assume the dimension of every $r$ is 2 (there are two latent factors). Learning by gradient descent gives a 2-D vector $r^A, \dots, r^E$ for each otaku and a 2-D vector $r^1, \dots, r^4$ for each character (春日, 炮姐, 姐寺, 小唯), and the missing entries "?" in the table can then be predicted as $r^i \cdot r^j$.

39 More about Matrix Factorization: considering the individual characteristics. Instead of $r^A \cdot r^1 \approx 5$, use $r^A \cdot r^1 + b_A + b_1 \approx 5$, where $b_A$ captures how much otaku A likes to buy figures in general and $b_1$ captures how popular character 1 is. Minimize $L = \sum_{(i,j)} \big(r^i \cdot r^j + b_i + b_j - n_{ij}\big)^2$ and find $r^i, r^j, b_i, b_j$ by gradient descent (regularization can be added). Ref: Matrix Factorization Techniques for Recommender Systems.
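
A minimal numpy sketch of this biased matrix factorization trained by stochastic gradient descent on the observed entries only. The toy table roughly follows slide 37 (with 0 standing in for the unknown "?" entries, and row E's values made up); K, the learning rate, and the regularization strength are also illustrative assumptions.

```python
import numpy as np

# Toy ratings table: rows = otakus A..E, columns = characters 1..4, 0 = unknown ("?")
table = [[5, 3, 0, 1],
         [4, 3, 0, 1],
         [1, 1, 0, 5],
         [1, 0, 4, 4],
         [0, 1, 5, 4]]
observed = [(i, j, v) for i, row in enumerate(table) for j, v in enumerate(row) if v > 0]

M, N, K = 5, 4, 2
rng = np.random.default_rng(0)
r_row = rng.normal(scale=0.1, size=(M, K))   # r^i for each otaku
r_col = rng.normal(scale=0.1, size=(N, K))   # r^j for each character
b_row, b_col = np.zeros(M), np.zeros(N)      # individual biases b_i, b_j
lr, lam = 0.05, 0.01

for epoch in range(500):
    for i, j, n_ij in observed:
        err = r_row[i] @ r_col[j] + b_row[i] + b_col[j] - n_ij
        # gradient descent step (factor of 2 folded into lr) with L2 regularization
        ri = r_row[i].copy()
        r_row[i] -= lr * (err * r_col[j] + lam * r_row[i])
        r_col[j] -= lr * (err * ri + lam * r_col[j])
        b_row[i] -= lr * err
        b_col[j] -= lr * err

# Predict an unknown entry, e.g. otaku A (row 0) and character 3 (column 2)
print(r_row[0] @ r_col[2] + b_row[0] + b_col[2])
```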

40 Matrix Factorization for topic analysis: latent semantic analysis (LSA). The table has terms as rows (e.g. investment, stock, president, election, legislator) and documents as columns (Doc 1-4), i.e. character → document and otaku → word. The number in the table is the term frequency, weighted by inverse document frequency. The latent factors are topics (e.g. finance, politics). Probabilistic variants: probabilistic latent semantic analysis (PLSA) — Thomas Hofmann, "Probabilistic Latent Semantic Indexing", SIGIR, 1999 — and latent Dirichlet allocation (LDA) — Blei, David M.; Ng, Andrew Y.; Jordan, Michael I. (January 2003), "Latent Dirichlet Allocation", Journal of Machine Learning Research, 3(4-5).
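
A brief LSA sketch with scikit-learn: TF-IDF weighted term counts factorized with a truncated SVD whose latent factors play the role of topics. The tiny English document set is a made-up stand-in for the finance/politics example in the table.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["stocks and investment news",
        "the president and the election",
        "election of the legislators",
        "investment in the stock market"]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)          # term frequency weighted by inverse document frequency
lsa = TruncatedSVD(n_components=2)     # 2 latent factors = topics
doc_topics = lsa.fit_transform(X)      # each document expressed over the topics
print(np.round(doc_topics, 2))
```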

41 More Related Approaches (not introduced here): Multidimensional Scaling (MDS) [Alpaydin, Chapter 6.7] — only needs the distances between objects; Probabilistic PCA [Bishop, Chapter 12.2]; Kernel PCA [Bishop, Chapter 12.3] — a non-linear version of PCA; Canonical Correlation Analysis (CCA) [Alpaydin, Chapter 6.9]; Independent Component Analysis (ICA); Linear Discriminant Analysis (LDA) [Alpaydin, Chapter 6.8] — supervised.

42 Acknowledgement. Thanks to 彭冲 for finding an error in the cited material, and to Hsiang-Chih Cheng for finding errors on the slides.
