Unsupervised Learning: Linear Dimension Reduction
1 Unsupervised Learning: Linear Dimension Reduction
2 Unsupervised Learning covers two families of problems. Clustering & Dimension Reduction ("simplifying the complex"): we only have the function's input (the data) and want a compact output such as a cluster label or a code. Generation ("creating something from nothing"): we only have the function's output (examples) and want to learn the function that produces them. These slides cover Clustering & Dimension Reduction.
3 Clustering: K-means. Open question: how many clusters do we need? Cluster X = {x^1, ..., x^n, ..., x^N} into K clusters:
Initialize the cluster centers c^i, i = 1, 2, ..., K (e.g., as K random x^n drawn from X).
Repeat:
  For all x^n in X: set b_i^n = 1 if x^n is closest to c^i, and b_i^n = 0 otherwise.
  Update all centers: c^i = ( Σ_n b_i^n x^n ) / ( Σ_n b_i^n ).
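A minimal NumPy sketch of this loop (the function name, convergence check, and usage data are my own, not from the slides):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """X: (N, d) data matrix. Returns centers (K, d) and hard assignments (N,)."""
    rng = np.random.default_rng(seed)
    # Initialize centers with K random examples from X
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # b_i^n = 1 for the closest center; stored here as one index per example
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)  # (N, K)
        assign = dists.argmin(axis=1)                                          # (N,)
        new_centers = np.array([X[assign == i].mean(axis=0) if np.any(assign == i)
                                else centers[i] for i in range(K)])
        if np.allclose(new_centers, centers):   # stop once the centers no longer move
            break
        centers = new_centers
    return centers, assign

# Usage: centers, labels = kmeans(np.random.randn(500, 2), K=3)
```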
4 Clustering: Hierarchical Agglomerative Clustering (HAC). Step 1: build a tree by repeatedly merging the two closest clusters, starting from the individual examples and ending at a single root. Step 2: pick a threshold; cutting the tree at that height determines how many clusters you get.
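A small SciPy sketch of the two HAC steps (the linkage method and threshold value are arbitrary choices for illustration, not taken from the slides):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.randn(50, 2)

# Step 1: build the tree (dendrogram) by agglomerative merging
Z = linkage(X, method="average")          # 'average' linkage is one possible choice

# Step 2: pick a threshold; cutting the tree at that height yields the clusters
labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)
```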
5 Distributed Representation. Clustering forces an object to belong to exactly one cluster ("Gon ( 小傑 ) is an Enhancer"), which throws information away. A distributed representation instead describes the object by its degree of membership in every attribute, e.g. Gon is: Enhancement ( 強化系 ) 0.70, Emission ( 放出系 ) 0.25, Transmutation ( 變化系 ) 0.05, Manipulation ( 操作系 ) 0.00, Conjuration ( 具現化系 ) 0.00, Specialization ( 特質系 ) 0.00. Dimension reduction produces this kind of representation.
6 Dimension Reduction. The data may look 3-D, but it actually lies on a 2-D surface, so two dimensions suffice to describe it (figure: somapcnxtrue1.png).
7 Dimension Reduction. In MNIST, a digit image has 28 x 28 = 784 dimensions, yet most 28 x 28 vectors are not digits; the digits occupy only a small, lower-dimensional part of that space, so they can be described with far fewer dimensions.
8 Dimension Reduction. Find a function that maps x to z, where the dimension of z is smaller than that of x. Feature selection: simply keep a subset of the original dimensions (e.g., select x_2 and drop x_1), which only works when some dimensions carry little information. Principal component analysis (PCA) [Bishop, Chapter 12]: a linear map, z = Wx.
9 Principal Component Analysis (PCA)
10 PCA: z = Wx. Reduce to 1-D: z_1 = w^1 · x, with the constraint ||w^1||_2 = 1. Project all the data points x onto w^1 to obtain a set of z_1 values. We want the variance of z_1 to be as large as possible, so that the projection keeps the data spread out rather than collapsing it: maximize Var(z_1) = Σ_{z_1} (z_1 - z̄_1)^2.
11 PCA: z = Wx. Reduce to 2-D: z_1 = w^1 · x and z_2 = w^2 · x. Project all the data points x onto w^1 and obtain a set of z_1; we want Var(z_1) = Σ_{z_1} (z_1 - z̄_1)^2 as large as possible, subject to ||w^1||_2 = 1. Then we want Var(z_2) = Σ_{z_2} (z_2 - z̄_2)^2 as large as possible, subject to ||w^2||_2 = 1 and w^1 · w^2 = 0. Stacking the rows gives W = [ (w^1)^T ; (w^2)^T ; ... ], an orthogonal matrix.
12 Warning of Math
13 PCA. Since z_1 = w^1 · x,
z̄_1 = (1/N) Σ z_1 = (1/N) Σ w^1 · x = w^1 · ( (1/N) Σ x ) = w^1 · x̄
Var(z_1) = (1/N) Σ_{z_1} (z_1 - z̄_1)^2 = (1/N) Σ_x ( w^1 · x - w^1 · x̄ )^2 = (1/N) Σ ( w^1 · (x - x̄) )^2
Using (a · b)^2 = (a^T b)^2 = a^T b (a^T b)^T = a^T b b^T a, this becomes
Var(z_1) = (w^1)^T [ (1/N) Σ (x - x̄)(x - x̄)^T ] w^1 = (w^1)^T Cov(x) w^1 = (w^1)^T S w^1, where S = Cov(x).
Goal: find w^1 maximizing (w^1)^T S w^1 subject to ||w^1||_2^2 = (w^1)^T w^1 = 1.
14 Find w^1 maximizing (w^1)^T S w^1 subject to (w^1)^T w^1 = 1. Here S = Cov(x) is symmetric and positive-semidefinite, so its eigenvalues are non-negative. Using a Lagrange multiplier [Bishop, Appendix E]:
g(w^1) = (w^1)^T S w^1 - α ( (w^1)^T w^1 - 1 )
Setting ∂g/∂w^1_1 = 0, ∂g/∂w^1_2 = 0, ... gives S w^1 - α w^1 = 0, i.e. S w^1 = α w^1, so w^1 is an eigenvector of S.
Then (w^1)^T S w^1 = α (w^1)^T w^1 = α, and we choose the maximum: w^1 is the eigenvector of the covariance matrix S corresponding to the largest eigenvalue λ_1.
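The derivation identifies w^1 as the top eigenvector of S. One standard way to compute it numerically is power iteration; this is a minimal sketch of my own (the method is not mentioned in the slides) and assumes the largest eigenvalue is strictly dominant:

```python
import numpy as np

def top_eigvec(S, n_iters=1000, seed=0):
    """Power iteration: eigenvector of S with the largest eigenvalue, plus that eigenvalue."""
    w = np.random.default_rng(seed).standard_normal(S.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iters):
        w = S @ w
        w /= np.linalg.norm(w)      # re-normalize: keeps the constraint ||w||_2 = 1
    return w, w @ S @ w             # lambda_1 = (w^1)^T S w^1

# Usage on a toy covariance matrix
S = np.cov(np.random.randn(200, 5), rowvar=False)
w1, lam1 = top_eigvec(S)
print(lam1, np.linalg.eigvalsh(S).max())   # the two largest-eigenvalue estimates agree
```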
15 Find w^2 maximizing (w^2)^T S w^2 subject to (w^2)^T w^2 = 1 and (w^2)^T w^1 = 0. With two Lagrange multipliers:
g(w^2) = (w^2)^T S w^2 - α ( (w^2)^T w^2 - 1 ) - β ( (w^2)^T w^1 - 0 )
Setting ∂g/∂w^2_1 = 0, ∂g/∂w^2_2 = 0, ... gives S w^2 - α w^2 - β w^1 = 0.
Left-multiply by (w^1)^T: (w^1)^T S w^2 - α (w^1)^T w^2 - β (w^1)^T w^1 = 0, where (w^1)^T w^2 = 0 and (w^1)^T w^1 = 1.
Since (w^1)^T S w^2 is a scalar, (w^1)^T S w^2 = ( (w^1)^T S w^2 )^T = (w^2)^T S^T w^1 = (w^2)^T S w^1 = λ_1 (w^2)^T w^1 = 0, using S w^1 = λ_1 w^1.
Therefore β = 0, and S w^2 - α w^2 = 0, i.e. S w^2 = α w^2: w^2 is the eigenvector of the covariance matrix S corresponding to the 2nd largest eigenvalue λ_2.
16 PCA decorrelation. With z = Wx, Cov(z) = D is a diagonal matrix, i.e. the different components of z are uncorrelated:
Cov(z) = (1/N) Σ (z - z̄)(z - z̄)^T = W S W^T, where S = Cov(x).
Since the rows of W are the eigenvectors, W^T = [ w^1 ... w^K ] (eigenvectors as columns), and
W S W^T = W [ S w^1 ... S w^K ] = W [ λ_1 w^1 ... λ_K w^K ] = [ λ_1 W w^1 ... λ_K W w^K ] = [ λ_1 e_1 ... λ_K e_K ] = D,
a diagonal matrix, because W w^k = e_k (the k-th standard basis vector) by orthonormality of the rows of W.
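A small NumPy check of this property, with names of my own: compute the top-K covariance eigenvectors, project the data, and confirm the covariance of z comes out (approximately) diagonal.

```python
import numpy as np

def pca_eig(X, K):
    """X: (N, d). Returns W (K, d), whose rows are the top-K eigenvectors of Cov(x), and the mean."""
    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    S = Xc.T @ Xc / len(X)                  # covariance matrix S = Cov(x)
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh: eigenvalues returned in ascending order
    order = np.argsort(eigvals)[::-1][:K]   # indices of the K largest eigenvalues
    return eigvecs[:, order].T, x_bar

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 5))   # correlated toy data
W, x_bar = pca_eig(X, K=3)
Z = (X - x_bar) @ W.T                       # z = W (x - x_bar)
print(np.round(np.cov(Z, rowvar=False), 3)) # approximately diagonal: diag(lambda_1, ..., lambda_K)
```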
17 End of Warning
18 PCA, Another Point of View. Assume a set of basic components u^1, u^2, u^3, u^4, ..., u^K (for digit images: basic strokes). An image x is approximately a weighted sum of components plus the mean image:
x ≈ c_1 u^1 + c_2 u^2 + ... + c_K u^K + x̄
For example, a digit image might use components u^1, u^3, and u^5. The weight vector (c_1, c_2, ..., c_K) then represents the digit image.
19 PCA, Another Point of View. Write x - x̄ ≈ c_1 u^1 + c_2 u^2 + ... + c_K u^K = x̂. The reconstruction error is || (x - x̄) - x̂ ||_2. Find u^1, ..., u^K minimizing the total error
L = min_{u^1, ..., u^K} Σ || (x - x̄) - Σ_{k=1}^K c_k u^k ||_2
PCA gives z = Wx, where the rows of W are (w^1)^T, (w^2)^T, ..., (w^K)^T; these w^1, w^2, ..., w^K are exactly the components u^1, u^2, ..., u^K that minimize L (proof in [Bishop, Chapter 12]).
20 With x - x̄ ≈ c_1 u^1 + c_2 u^2 + ... + c_K u^K = x̂ and reconstruction error || (x - x̄) - x̂ ||_2, write one equation per example:
x^1 - x̄ ≈ c_1^1 u^1 + c_2^1 u^2 + ...
x^2 - x̄ ≈ c_1^2 u^1 + c_2^2 u^2 + ...
x^3 - x̄ ≈ c_1^3 u^1 + c_2^3 u^2 + ...
Collect the centered examples as the columns of a matrix X; then X is approximated by [ u^1 u^2 ... ] times a coefficient matrix (column j holds c_1^j, c_2^j, ...), and we minimize the approximation error.
21 In matrix form, X (M x N) is approximated by U (M x K) times a K x K matrix times V (K x N), minimizing the error. This is exactly the truncated singular value decomposition (SVD) of X: the K columns of U are a set of orthonormal eigenvectors of X X^T corresponding to its K largest eigenvalues. This is the solution of PCA.
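A short NumPy illustration (mine, not from the slides) that the SVD of the data matrix and the covariance eigendecomposition give the same principal directions. Note the slide stacks examples as the columns of X, while the sketch below stacks them as rows, so the directions appear in V rather than U:

```python
import numpy as np

X = np.random.default_rng(1).standard_normal((200, 10))   # 200 examples (rows), 10 dims
Xc = X - X.mean(axis=0)
N = len(Xc)

# Route 1: eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / N)
W_eig = eigvecs[:, np.argsort(eigvals)[::-1]]        # columns sorted by eigenvalue, descending

# Route 2: SVD of the centered data matrix
U, sing, Vt = np.linalg.svd(Xc, full_matrices=False)
W_svd = Vt.T                                          # right singular vectors

K = 3
for k in range(K):    # identical directions up to sign
    assert np.allclose(np.abs(W_eig[:, k]), np.abs(W_svd[:, k]), atol=1e-6)
print(np.sort(eigvals)[::-1][:K])     # top eigenvalues ...
print(sing[:K] ** 2 / N)              # ... equal the squared singular values divided by N
```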
22 Autoencoder. PCA looks like a neural network with one hidden layer and a linear activation function. If w^1, w^2, ..., w^K are the components u^1, u^2, ..., u^K, the reconstruction is x̂ = Σ_{k=1}^K c_k w^k, and the code that minimizes the reconstruction error is c_k = (x - x̄) · w^k. For K = 2: the input x - x̄ = (x_1, x_2, x_3) is mapped to the hidden value c_1 through the weights (w^1_1, w^1_2, w^1_3), and the same weights map c_1 back toward the reconstruction x̂.
23 Autoencoder (continued). For K = 2 there is a second hidden value c_2 with weights (w^2_1, w^2_2, w^2_3). The network can be trained by gradient descent to minimize the reconstruction error, although the weights found this way are not guaranteed to be the orthonormal components that PCA produces. The advantage of the network view is that it can be made deep: a Deep Autoencoder.
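A minimal sketch of that training procedure: a one-hidden-layer linear autoencoder fit by full-batch gradient descent. The untied encoder/decoder weights, learning rate, and iteration count are my own assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10)) @ rng.standard_normal((10, 10))  # toy data
x_bar = X.mean(axis=0)
Xc = X - x_bar                          # work with x - x_bar, as in the slides

K, d, N = 2, Xc.shape[1], len(Xc)
E = rng.standard_normal((K, d)) * 0.1   # encoder weights: c = E (x - x_bar)
D = rng.standard_normal((K, d)) * 0.1   # decoder weights: x_hat = D^T c

lr = 0.05
for step in range(2000):
    Z = Xc @ E.T                 # codes,            shape (N, K)
    R = Z @ D                    # reconstructions,  shape (N, d)
    err = R - Xc                 # reconstruction error
    loss = (err ** 2).mean()
    grad_D = Z.T @ err * (2 / (N * d))           # dL/dD
    grad_E = (err @ D.T).T @ Xc * (2 / (N * d))  # dL/dE
    D -= lr * grad_D
    E -= lr * grad_E

print("final reconstruction MSE:", loss)
```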
24 Weakness of PCA. PCA is unsupervised: it may project two differently labeled groups onto the same region, whereas LDA, which uses the labels, can keep them apart. PCA is also linear: it can only project a curved manifold flat, not unfold it; non-linear dimension reduction is covered in the following lectures. (Figure source: ...hapter7/fig_s_manifold_pca.html)
25 PCA on Pokémon. Inspired by: ...pal-component-analysis-of-pokemon-data. 800 Pokémon, 6 features each (HP, Atk, Def, Sp Atk, Sp Def, Speed). How many principal components should we keep? Compute the ratio of each eigenvalue to the total, λ_i / (λ_1 + λ_2 + λ_3 + λ_4 + λ_5 + λ_6), for λ_1 through λ_6; the ratios for the last eigenvalues are small, so using 4 components is good enough.
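A hedged sketch of this component-selection rule: compute the eigenvalue ratios of the feature covariance matrix and keep enough components to explain most of the variance. The feature array below is a placeholder, not the actual Pokémon data:

```python
import numpy as np

def explained_variance_ratios(X):
    """X: (n_samples, n_features). Returns each eigenvalue's share of the total variance."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / len(X)
    eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]   # descending
    return eigvals / eigvals.sum()

# Placeholder stand-in for the 800 x 6 feature matrix (HP, Atk, Def, Sp Atk, Sp Def, Speed)
X = np.random.default_rng(0).random((800, 6)) * np.array([100., 150., 120., 130., 120., 110.])
ratios = explained_variance_ratios(X)
print(ratios)                       # lambda_i / (lambda_1 + ... + lambda_6)
print(np.cumsum(ratios))            # keep the first K whose cumulative ratio is large enough
```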
26 PCA on Pokémon: interpreting the principal-component weights on (HP, Atk, Def, Sp Atk, Sp Def, Speed). PC1 corresponds to overall strength ( 強度 ). Another component corresponds to defense at the cost of speed ( 防禦, 犧牲速度 ): its Def weight is positive while its Speed weight is negative.
27 PCA on Pokémon (continued). One more component corresponds to special defense at the cost of attack and HP ( 特殊防禦, 犧牲攻擊和生命 ), and another to high vitality / HP ( 生命力強 ).
28 PCA on MNIST. Using 30 components, each digit image ≈ a_1 w^1 + a_2 w^2 + ...; the component images w^i are the eigen-digits.
29 PCA on face images. Using 30 components, the component images are the eigen-faces. (Source: ...ing08/assignment3.html)
30 What happens in PCA? Each image ≈ a_1 w^1 + a_2 w^2 + ..., and the weights a_i can be any real number, so PCA may add up and also subtract components (images). As a result, the components themselves need not look like parts of digits. Non-negative matrix factorization (NMF) forces the weights a_1, a_2, ... to be non-negative, making the combination purely additive, and forces the components w^1, w^2, ... to be non-negative as well; the learned components then look more like parts of digits. Ref: Daniel D. Lee and H. Sebastian Seung, "Algorithms for non-negative matrix factorization," Advances in Neural Information Processing Systems.
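A compact sketch of the Lee-Seung multiplicative-update algorithm referenced above, for the Frobenius-norm objective; the initialization, epsilon, and iteration count are my own choices:

```python
import numpy as np

def nmf(V, K, n_iters=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (m, n) as W (m, K) @ H (K, n), with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, K)) + eps
    H = rng.random((K, n)) + eps
    for _ in range(n_iters):
        # Multiplicative updates keep W and H non-negative at every step
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.abs(np.random.randn(100, 50))      # stand-in for a pixels-by-images matrix
W, H = nmf(V, K=10)
print(np.linalg.norm(V - W @ H))          # reconstruction error after the updates
```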
31 NMF on MNIST
32 NMF on Face
33 Matrix Factorization
34 Matrix Factorization. The number in the table is the number of figures of each character that each person (otakus A, B, C, D, E) has bought. There are some common latent factors behind the otakus and the characters that explain these numbers.
35 Matrix Factorization. The factors are latent: they are not directly observable, and nobody states them explicitly ("no one cares"). For example, otakus and characters can both be described by attributes such as tsundere ( 傲 ) versus airhead ( 呆 ); an otaku tends to buy figures of the characters whose attributes match his or her own preference.
36 Give each otaku A, ..., E a latent vector r^A, ..., r^E and each character a latent vector r^1, r^2, r^3, r^4, all of dimension K (the number of latent factors). Let M be the number of otakus and N the number of characters. The model assumes each purchase count is roughly an inner product of latent vectors: n_A1 ≈ r^A · r^1 (e.g., r^A · r^1 ≈ 5, r^B · r^1 ≈ 4, r^C · r^1 ≈ 1). In matrix form, the M x N table X is approximated by an M x K matrix (rows r^A, ..., r^E) times a K x N matrix (columns r^1, ..., r^N); when every entry is observed, minimizing the error of this factorization can be done with singular value decomposition (SVD).
37 When some entries of the table are unknown (marked '?'), SVD cannot be applied directly. Instead, minimize
L = Σ_{(i,j)} ( r^i · r^j - n_ij )^2
where the sum runs only over the (i, j) pairs whose value n_ij is defined, and find the latent vectors r^i (otakus) and r^j (characters) by gradient descent.
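A minimal sketch of that objective, masking the missing entries and running plain gradient descent; the toy table, learning rate, and latent dimension below are invented for illustration:

```python
import numpy as np

# Toy otaku-by-character table; np.nan marks the undefined '?' entries
X = np.array([[5., 3., np.nan, 1.],
              [4., 3., np.nan, 1.],
              [1., 1., np.nan, 5.],
              [1., np.nan, 4., 4.],
              [np.nan, 1., 5., 4.]])
mask = ~np.isnan(X)                    # only these entries contribute to the loss
K = 2                                  # number of latent factors
rng = np.random.default_rng(0)
R_otaku = rng.standard_normal((X.shape[0], K)) * 0.1   # rows: r^A, ..., r^E
R_char = rng.standard_normal((K, X.shape[1])) * 0.1    # columns: r^1, ..., r^4

lr = 0.01
for step in range(10000):
    pred = R_otaku @ R_char
    err = np.where(mask, pred - np.nan_to_num(X), 0.0)  # zero error on missing entries
    R_otaku -= lr * (err @ R_char.T)      # dL/dR_otaku (up to a constant factor of 2)
    R_char -= lr * (R_otaku.T @ err)      # dL/dR_char

print(np.round(R_otaku @ R_char, 1))      # predictions also fill in the '?' entries
```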
38 Assume the latent vectors r all have dimension 2, i.e. there are two factors. After training, each otaku A, B, C, D, E and each character ( 春日, 炮姐, 姐寺, 小唯 ) has a learned 2-dimensional latent vector, which can be plotted and compared, and the unknown '?' entries of the table can then be predicted from the corresponding inner products.
39 More about Matrix Factorization: considering the individual characteristics. Instead of r^A · r^1 ≈ 5, use r^A · r^1 + b_A + b_1 ≈ 5, where b_A captures how much otaku A likes to buy figures in general and b_1 captures how popular character 1 is. Minimize
L = Σ_{(i,j)} ( r^i · r^j + b_i + b_j - n_ij )^2
over the defined entries, and find r^i, r^j, b_i, b_j by gradient descent (regularization can also be added). Ref: Matrix Factorization Techniques For Recommender Systems.
40 Matrix Factorization for Topic Analysis: Latent Semantic Analysis (LSA). Replace the characters by documents (Doc 1, ..., Doc 4) and the otakus by words ( 投資 invest, 股票 stock, 總統 president, 選舉 election, 立委 legislator). The number in the table is the term frequency of the word in the document, usually weighted by inverse document frequency. The latent factors are now topics, e.g. finance ( 財經 ) versus politics ( 政治 ). Related probabilistic models: Probabilistic Latent Semantic Analysis (PLSA), Thomas Hofmann, "Probabilistic Latent Semantic Indexing," SIGIR, 1999; Latent Dirichlet Allocation (LDA), Blei, Ng, and Jordan, "Latent Dirichlet Allocation," Journal of Machine Learning Research 3 (2003).
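A toy NumPy sketch of LSA on a made-up term-document count matrix; the counts, the idf formula, and the choice of K = 2 topics are my own illustrative assumptions:

```python
import numpy as np

# Tiny term-document count matrix (terms x documents); values are term frequencies
terms = ["invest", "stock", "president", "election"]
X = np.array([[5., 4., 0., 0.],    # invest
              [6., 3., 0., 1.],    # stock
              [0., 0., 4., 5.],    # president
              [0., 1., 5., 6.]])   # election

# Weight by inverse document frequency, as mentioned in the slide
idf = np.log(X.shape[1] / np.count_nonzero(X, axis=1))
Xw = X * idf[:, None]

# Truncated SVD: the K largest singular directions act as the latent "topics"
U, s, Vt = np.linalg.svd(Xw, full_matrices=False)
K = 2
print("term-topic weights:\n", np.round(U[:, :K], 2))            # which terms load on each topic
print("doc-topic weights:\n", np.round((np.diag(s[:K]) @ Vt[:K]).T, 2))
```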
41 More Related Approaches, Not Introduced Here. Multidimensional Scaling (MDS) [Alpaydin, Chapter 6.7]: only needs the distances between objects, not their feature vectors. Probabilistic PCA [Bishop, Chapter 12.2]. Kernel PCA [Bishop, Chapter 12.3]: a non-linear version of PCA. Canonical Correlation Analysis (CCA) [Alpaydin, Chapter 6.9]. Independent Component Analysis (ICA). Linear Discriminant Analysis (LDA) [Alpaydin, Chapter 6.8]: a supervised method.
42 Acknowledgement. Thanks to 彭冲 for spotting errors in the cited materials, and thanks to Hsiang-Chih Cheng for spotting errors on the slides.