Semi-Supervised Laplacian Regularization of Kernel Canonical Correlation Analysis
1 Semi-Supervised Laplacian Regularization of Kernel Canonical Correlation Analysis. Matthew B. Blaschko, Christoph H. Lampert, & Arthur Gretton. Max Planck Institute for Biological Cybernetics, Tübingen, Germany. ECML PKDD, Antwerp, Belgium, September 16, 2008.
2 Overview: Paired Data. Kernel Canonical Correlation Analysis. Semi-Supervised Laplacian Regularization. Model Selection. Results. Summary and Outlook.
3 Paired Datasets. Realistic data comes in many different modalities: text, images... We call data paired if the samples come in more than one such representation at the same time, e.g. images + captions, audio + transcript, multi-language text documents, MRI + CT scans. We assume a latent aspect that relates the representations. (Diagram: a latent variable Z generating the two views X and Y.)
4 Paired Datasets. A fully paired dataset has correspondences for all samples: $X = \{x_1, x_2, \ldots, x_n\}$, $Y = \{y_1, y_2, \ldots, y_n\}$.
5 Paired Datasets. However, data is often only partially paired: $\hat{X} = \{x_1, \ldots, x_n, x_{n+1}, \ldots, x_{n+p_x}\}$, $\hat{Y} = \{y_1, \ldots, y_n, y_{n+1}, \ldots, y_{n+p_y}\}$, where only the first $n$ samples of each view are in correspondence.
7 Review of Canonical Correlation Analysis. Principal Component Analysis (PCA): single dataset $x_1, \ldots, x_n$; find projections that maximize the variance of the projected data; a simple eigenvalue problem. Canonical Correlation Analysis (CCA): fully paired data $(x_1, y_1), \ldots, (x_n, y_n)$; find projection directions $w_x$ and $w_y$ that maximize the correlation between the projected data:
$$\max_{w_x, w_y} \frac{w_x^T C_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x}\,\sqrt{w_y^T C_{yy} w_y}}$$
$C_{xx}/C_{yy}/C_{xy}$: (cross-)correlation matrices; a generalized eigenvalue problem. Supervised situation: $y_i \in \{-1, +1\}$ ground-truth labels $\Rightarrow$ CCA $\equiv$ LDA!
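For concreteness, a minimal sketch of linear CCA as the generalized eigenvalue problem above, in Python (NumPy/SciPy). The function name and the small diagonal term `eps` (a numerical-stability jitter, not part of the plain CCA objective) are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def linear_cca(X, Y, eps=1e-6):
    """Minimal linear CCA sketch: leading projection directions
    (w_x, w_y) and the canonical correlation rho.

    X: (n, d_x) and Y: (n, d_y) are mean-centered data matrices.
    eps: small Tikhonov jitter for numerical stability (assumption)."""
    n = X.shape[0]
    Cxx = X.T @ X / n + eps * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # Reduced generalized eigenproblem for w_x:
    #   Cxy Cyy^{-1} Cyx w_x = rho^2 Cxx w_x
    M = Cxy @ np.linalg.solve(Cyy, Cxy.T)
    vals, vecs = eigh(M, Cxx)            # eigenvalues ascending
    w_x = vecs[:, -1]
    rho = np.sqrt(max(vals[-1], 0.0))
    # Recover w_y up to scale: w_y ∝ Cyy^{-1} Cyx w_x, then normalize.
    w_y = np.linalg.solve(Cyy, Cxy.T @ w_x)
    w_y /= np.sqrt(w_y @ Cyy @ w_y)
    return w_x, w_y, rho
```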
8 Why is Correlation better than Variance? Toy dataset: 1 signal direction, 1 (high-variance) noise direction. (Figure: original data for X and Y with the maximal-variance and maximal-correlation directions overlaid.) CCA can ignore noise that is uncorrelated between X and Y.
9 Kernelization: Kernel Canonical Correlation Analysis. Kernelize CCA to use it for arbitrary input domains and (latent) data embeddings $\phi_x : \mathcal{X} \to \mathcal{H}_X$, $\phi_y : \mathcal{Y} \to \mathcal{H}_Y$, with $k_x(x_i, x_j) = \langle \phi_x(x_i), \phi_x(x_j) \rangle$, $k_y(y_i, y_j) = \langle \phi_y(y_i), \phi_y(y_j) \rangle$, and $w_x = \sum_i \alpha_i \phi_x(x_i)$, $w_y = \sum_i \beta_i \phi_y(y_i)$. KCCA = CCA in $\mathcal{H}_X/\mathcal{H}_Y$: solve
$$\max_{\alpha, \beta} \frac{\alpha^T K_x K_y \beta}{\sqrt{\alpha^T K_x^2 \alpha}\,\sqrt{\beta^T K_y^2 \beta}}$$
where $K_x$, $K_y$ are the kernel matrices of X, Y. This is equivalent to the constrained optimization problem
$$\max_{\alpha, \beta} \alpha^T K_x K_y \beta \quad \text{s.t.} \quad \alpha^T K_x^2 \alpha = 1 \text{ and } \beta^T K_y^2 \beta = 1.$$
10-11 Need for Regularization. Problem: the maximization is degenerate for invertible $K_x$ or $K_y$:
$$\max_{\alpha, \beta} \alpha^T K_x K_y \beta \quad \text{s.t.} \quad \alpha^T K_x^2 \alpha = 1 \text{ and } \beta^T K_y^2 \beta = 1$$
allows $\alpha$, $\beta$ with perfect correlation but without learning anything! We need to regularize. Tikhonov regularization:
$$\max_{w_x, w_y} \frac{w_x^T C_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x + \varepsilon_x \|w_x\|^2}\,\sqrt{w_y^T C_{yy} w_y + \varepsilon_y \|w_y\|^2}} = \max_{\alpha, \beta} \frac{\alpha^T K_x K_y \beta}{\sqrt{\alpha^T (K_x^2 + \varepsilon_x K_x)\, \alpha}\,\sqrt{\beta^T (K_y^2 + \varepsilon_y K_y)\, \beta}}$$
The regularization parameters $\varepsilon_x$, $\varepsilon_y$ need to be model selected.
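A minimal sketch of regularized KCCA, assuming precomputed centered kernel matrices `Kx`, `Ky`. The joint block formulation of the generalized eigenproblem is one standard way to solve it, not necessarily the authors' implementation, and the extra jitter on `B` is a stability assumption.

```python
import numpy as np
from scipy.linalg import eigh

def kcca(Kx, Ky, eps_x=0.1, eps_y=0.1):
    """Tikhonov-regularized KCCA sketch: leading coefficient vectors
    (alpha, beta) and the regularized canonical correlation rho.

    Kx, Ky: (n, n) centered kernel matrices of the two views."""
    n = Kx.shape[0]
    zero = np.zeros((n, n))
    # Joint generalized eigenproblem:
    # [ 0     KxKy ][a]        [ Kx^2 + eps_x Kx        0         ][a]
    # [ KyKx  0    ][b] = rho  [ 0              Ky^2 + eps_y Ky   ][b]
    A = np.block([[zero, Kx @ Ky], [Ky @ Kx, zero]])
    Bx = Kx @ Kx + eps_x * Kx
    By = Ky @ Ky + eps_y * Ky
    B = np.block([[Bx, zero], [zero, By]])
    B += 1e-8 * np.eye(2 * n)   # jitter keeps B positive definite (assumption)
    vals, vecs = eigh(A, B)
    alpha, beta = vecs[:n, -1], vecs[n:, -1]
    return alpha, beta, vals[-1]
```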
13 Proposed Laplacian Regularization. Manifold assumption: the data lie on a lower-dimensional manifold $\mathcal{M} \subset \mathcal{H}$. Use the Laplace operator $\Delta_{\mathcal{M}}$ to measure smoothness along $\mathcal{M}$. Laplacian regularization: approximate $\Delta_{\mathcal{M}}$ by the graph Laplacian $L = D^{-1/2}(D - W)D^{-1/2}$ ($W$ similarity matrix, $D$ diagonal matrix with $D_{ii} = \sum_j W_{ij}$):
$$\max_{\alpha, \beta} \alpha^T K_x K_y \beta \quad \text{s.t.} \quad \alpha^T \big(\underbrace{K_x K_x + \varepsilon_x K_x}_{\text{Tikhonov}} + \underbrace{\gamma_x K_x L_x K_x}_{\text{Laplacian}}\big)\,\alpha = 1, \quad \beta^T \big(K_y K_y + \varepsilon_y K_y + \gamma_y K_y L_y K_y\big)\,\beta = 1.$$
This favours projections that vary smoothly with respect to the manifold structure.
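A minimal sketch of the normalized graph Laplacian defined above, built here from a dense Gaussian similarity matrix; the bandwidth `sigma` and the dense construction are illustrative assumptions (k-nearest-neighbour graphs are a common sparser alternative).

```python
import numpy as np

def normalized_laplacian(X, sigma=1.0):
    """L = D^{-1/2} (D - W) D^{-1/2} for a Gaussian similarity graph.

    X: (m, d) data matrix; sigma: kernel bandwidth (assumed)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
    W = np.exp(-d2 / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)                       # no self-similarity
    d = W.sum(axis=1)                              # degrees D_ii
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    return D_inv_sqrt @ (np.diag(d) - W) @ D_inv_sqrt
```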
14-15 Partially Paired Data Revisited. Making use of unpaired data: we don't need correspondences to compute the Laplacian, and more data allows a better estimate of the manifold structure. Notation: paired training data $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_n\}$; additional unpaired data $\{x_{n+1}, \ldots, x_{n+p_x}\}$ and $\{y_{n+1}, \ldots, y_{n+p_y}\}$. Data matrix $X = (x_1, \ldots, x_n)^T \in \mathcal{H}_X^n$; extended data matrix $\hat{X} = (x_1, \ldots, x_{n+p_x})^T \in \mathcal{H}_X^{n+p_x}$; kernel matrix $K_x = XX^T \in \mathbb{R}^{n \times n}$; extended kernel matrices $K_{\hat{x}x} = \hat{X}X^T \in \mathbb{R}^{(n+p_x) \times n}$, $K_{\hat{x}\hat{x}} = \hat{X}\hat{X}^T \in \mathbb{R}^{(n+p_x) \times (n+p_x)}$, etc.
16 Semi-Supervised Laplacian Regularization. Semi-supervised KCCA with Laplacian regularization:
$$\max_{\alpha, \beta} \alpha^T K_{\hat{x}x} K_{\hat{y}y}^T \beta$$
$$\text{s.t.} \quad \alpha^T \Big(K_{\hat{x}x} K_{\hat{x}x}^T + \underbrace{\varepsilon_x K_{\hat{x}\hat{x}}}_{\text{Tikhonov}} + \underbrace{\tfrac{\gamma_x}{(n + p_x)^2}\, K_{\hat{x}\hat{x}} L_{\hat{x}} K_{\hat{x}\hat{x}}}_{\text{semi-supervised Laplacian}}\Big)\,\alpha = 1, \quad \beta^T \Big(K_{\hat{y}y} K_{\hat{y}y}^T + \varepsilon_y K_{\hat{y}\hat{y}} + \tfrac{\gamma_y}{(n + p_y)^2}\, K_{\hat{y}\hat{y}} L_{\hat{y}} K_{\hat{y}\hat{y}}\Big)\,\beta = 1.$$
This finds data projections that achieve high correlation when applied to $X$, $Y$ and vary smoothly on the manifolds defined by $\hat{X}$, $\hat{Y}$. Four regularization parameters $\varepsilon_x$, $\varepsilon_y$, $\gamma_x$, $\gamma_y$ $\Rightarrow$ model selection.
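Putting the pieces together, a sketch of the semi-supervised objective as a generalized eigenproblem. Argument names like `Kxhx` stand for $K_{\hat{x}x}$ etc., matching the notation above; the block construction mirrors the earlier KCCA sketch and is an illustrative solver, not the authors' code.

```python
import numpy as np
from scipy.linalg import eigh

def ss_laplacian_kcca(Kxhx, Kxhxh, Lx, Kyhy, Kyhyh, Ly,
                      eps_x, eps_y, gam_x, gam_y):
    """Semi-supervised Laplacian-regularized KCCA (sketch).

    Kxhx:  (n+p_x, n)      kernel between extended and paired x-data
    Kxhxh: (n+p_x, n+p_x)  kernel on extended x-data
    Lx:    (n+p_x, n+p_x)  graph Laplacian on extended x-data
    (and analogously Kyhy, Kyhyh, Ly for the y-view)."""
    mx = Kxhx.shape[0]
    my = Kyhy.shape[0]
    # Regularized blocks from the two constraints:
    Bx = Kxhx @ Kxhx.T + eps_x * Kxhxh + (gam_x / mx**2) * (Kxhxh @ Lx @ Kxhxh)
    By = Kyhy @ Kyhy.T + eps_y * Kyhyh + (gam_y / my**2) * (Kyhyh @ Ly @ Kyhyh)
    # Cross term couples the views through the n paired samples only:
    Cxy = Kxhx @ Kyhy.T                       # shape (n+p_x, n+p_y)
    A = np.block([[np.zeros((mx, mx)), Cxy],
                  [Cxy.T, np.zeros((my, my))]])
    B = np.block([[Bx, np.zeros((mx, my))],
                  [np.zeros((my, mx)), By]])
    B += 1e-8 * np.eye(mx + my)               # jitter (stability assumption)
    vals, vecs = eigh(A, B)
    return vecs[:mx, -1], vecs[mx:, -1], vals[-1]
```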
18 Model Selection. Supervised scenario: cross-validation or similar. Unsupervised scenario: dependence maximization $\Rightarrow$ HSNIC. HSNIC: a kernel measure of dependence between random variables (Fukumizu, Bach, & Gretton 2007), the Hilbert-Schmidt norm of the normalized cross-covariance operator $V_{xy}$. Closely related to KCCA: $\mathrm{HSNIC}(x, y) = \|V_{xy}\|_{HS}^2$. Regularization with Laplace-Beltrami operators on the data manifolds:
$$V_{xy} = \big(\underbrace{\Sigma_{xx}}_{\text{covariance operator}} + \underbrace{\varepsilon_x I}_{\text{Tikhonov}} + \underbrace{\gamma_x \Delta_{\mathcal{M}_x}}_{\text{Laplacian}}\big)^{-\frac{1}{2}}\, \Sigma_{xy}\, \big(\Sigma_{yy} + \varepsilon_y I + \gamma_y \Delta_{\mathcal{M}_y}\big)^{-\frac{1}{2}}$$
For a finite sample set there is a closed-form expression $\hat{V}_{xy}$ (see paper or poster).
19 Model Selection: Concluded. $\|\hat{V}_{xy}\|_{HS}^2$ estimates the correlation achievable by KCCA. But beware of overfitting: it has a trivial maximum at $\varepsilon_x = \varepsilon_y = \gamma_x = \gamma_y = 0$. A better model selection criterion: maximize the increase in correlation due only to the data pairing,
$$\rho(\varepsilon_x, \varepsilon_y, \gamma_x, \gamma_y) = \|\hat{V}_{xy}\|_{HS}^2 - \|\hat{V}_{x\Pi(y)}\|_{HS}^2$$
with $\Pi(y)$ randomly reshuffled version(s) of $y$.
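A sketch of the reshuffling criterion. The dependence statistic below is the Tikhonov-only empirical form $\mathrm{Tr}[R_x R_y]$ with $R = \tilde{K}(\tilde{K} + n\varepsilon I)^{-1}$ on centered kernels (following Fukumizu et al. 2007); the paper's Laplacian term would enter the regularizer analogously. Averaging over several random permutations, and all function names, are illustrative assumptions.

```python
import numpy as np

def hsnic(Kx, Ky, eps=1e-3):
    """Empirical normalized dependence Tr[Rx Ry] (Tikhonov-only sketch)."""
    n = Kx.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    Kx_c, Ky_c = H @ Kx @ H, H @ Ky @ H
    Rx = Kx_c @ np.linalg.inv(Kx_c + n * eps * np.eye(n))
    Ry = Ky_c @ np.linalg.inv(Ky_c + n * eps * np.eye(n))
    return np.trace(Rx @ Ry)

def pairing_gain(Kx, Ky, eps=1e-3, n_perm=10, seed=0):
    """rho = HSNIC(x, y) - mean HSNIC(x, Pi(y)) over random reshufflings."""
    rng = np.random.default_rng(seed)
    base = hsnic(Kx, Ky, eps)
    null = []
    for _ in range(n_perm):
        p = rng.permutation(Kx.shape[0])
        null.append(hsnic(Kx, Ky[np.ix_(p, p)], eps))  # reshuffle y samples
    return base - np.mean(null)
```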
21 Example Results: $\ell_2$ norm of cross correlations. Four datasets of images + associated text; multiple train/test splits; varying percentage of correspondences. Example results on UIUC-ISD Bass, 6 semantic classes: perform (semi-supervised) KCCA on the training set, then measure correlations and scatter ratios on the test set (larger is better). (Plots: cross correlation, image scatter ratio, and text scatter ratio as a function of the percentage of data with correspondences, with and without Laplacian regularization.) More results in the paper.
22-23 Summary & Future Work. Summary: Laplacian regularization for KCCA; it allows a straightforward semi-supervised extension; model selection via a closed-form expression for the Laplacian-regularized kernel independence measure (HSNIC); experimental results show that the projections better respect the latent aspect. Future work: an improved model selection criterion (the criterion used here is somewhat heuristic); other applications of Laplacian-regularized HSNIC: causality inference, ICA. Thank you.
24-25 More about $V_{xy}$. Covariance operator: $\Sigma_{xy} : \mathcal{H}_Y \to \mathcal{H}_X$ is defined by $\langle f, \Sigma_{xy} g \rangle_{\mathcal{H}_X} = \mathbb{E}_{x,y}[f(x)g(y)] - \mathbb{E}_x[f(x)]\,\mathbb{E}_y[g(y)]$ for all $f \in \mathcal{H}_X$, $g \in \mathcal{H}_Y$. The normalized cross-covariance operator $V_{xy}$ is such that $\Sigma_{xy} = \Sigma_{xx}^{\frac{1}{2}} V_{xy} \Sigma_{yy}^{\frac{1}{2}}$.
26 Model Selection: Laplacian Regularized HSNIC. Empirical estimate of $V_{xy}$:
$$\hat{V}_{xy} = \Big(\tfrac{1}{n} X^T X + \varepsilon_x I + \tfrac{\gamma_x}{m_x^2} \hat{X}^T L_{\hat{x}} \hat{X}\Big)^{-\frac{1}{2}} \Big(\tfrac{1}{n} X^T Y\Big) \Big(\tfrac{1}{n} Y^T Y + \varepsilon_y I + \tfrac{\gamma_y}{m_y^2} \hat{Y}^T L_{\hat{y}} \hat{Y}\Big)^{-\frac{1}{2}}$$
where $m_x = n + p_x$, $m_y = n + p_y$. This gives the closed-form solution
$$\|\hat{V}_{xy}\|_{HS}^2 = \mathrm{Tr}\big[\hat{V}_{xy} \hat{V}_{xy}^T\big] = \mathrm{Tr}[M_x M_y],$$
where $M_x$ and $M_y$ are expressed in closed form in terms of the (extended) kernel matrices and graph Laplacians; see the paper for the full expressions.
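For intuition, a naive finite-dimensional sketch of $\|\hat{V}_{xy}\|_{HS}^2$ computed directly from the definition above, assuming explicit (centered) feature matrices rather than kernels so the matrix square roots are cheap; the kernelized closed form in the paper avoids ever forming these $d \times d$ matrices. Function names are illustrative.

```python
import numpy as np

def inv_sqrt(A):
    """Inverse matrix square root of a symmetric PD matrix via eigh."""
    w, U = np.linalg.eigh(A)
    return (U / np.sqrt(np.maximum(w, 1e-12))) @ U.T

def vhat_hs_norm(X, Y, Xhat, Yhat, Lx, Ly, eps_x, eps_y, gam_x, gam_y):
    """||Vhat_xy||_HS^2 from the operator definition (explicit features).

    X: (n, d_x), Y: (n, d_y) paired, centered feature matrices;
    Xhat: (m_x, d_x), Yhat: (m_y, d_y) extended feature matrices;
    Lx, Ly: graph Laplacians on the extended samples."""
    n, mx, my = X.shape[0], Xhat.shape[0], Yhat.shape[0]
    Ax = X.T @ X / n + eps_x * np.eye(X.shape[1]) \
         + (gam_x / mx**2) * (Xhat.T @ Lx @ Xhat)
    Ay = Y.T @ Y / n + eps_y * np.eye(Y.shape[1]) \
         + (gam_y / my**2) * (Yhat.T @ Ly @ Yhat)
    V = inv_sqrt(Ax) @ (X.T @ Y / n) @ inv_sqrt(Ay)
    return np.sum(V**2)        # squared Hilbert-Schmidt (Frobenius) norm
```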
More information