Tensor Canonical Correlation Analysis and Its Applications

1 Tensor Canonical Correlation Analysis and Its Applications Presenter: Yong LUO. This work was done while Yong LUO was a Research Fellow at Nanyang Technological University, Singapore.

2 Outline Y. Luo, D. C. Tao, R. Kotagiri, C. Xu, and Y. G. Wen, Tensor Canonical Correlation Analysis for Multi-view Dimension Reduction, IEEE Transactions on Knowledge and Data Engineering (T-KDE), vol. 27, no. 11, 2015. Y. Luo, Y. G. Wen, and D. C. Tao, On Combining Side Information and Unlabeled Data for Heterogeneous Multi-task Metric Learning, International Joint Conference on Artificial Intelligence (IJCAI), 2016.

3 Multi-view dimension reduction (MVDR) Dimension reduction (DR): find a low-dimensional representation for high-dimensional data. Benefits: reduces the chance of over-fitting, reduces computational cost, etc. Approaches: feature selection (IG, MI, sparse learning, etc.) and feature transformation (PCA, LDA, LE, etc.)

4 MVDR Real-world objects usually contain information from multiple sources, and different kinds of features can be extracted from them. Traditional DR methods cannot effectively handle multiple types of features; the common workaround is feature concatenation.

5 MVDR Multi-view learning: learn to fuse multiple distinct feature representations. Families: weighted view combination, multi-view dimension reduction, view agreement exploration. Multi-view dimension reduction includes multi-view feature selection and multi-view subspace learning, which seeks a low-dimensional common subspace to compactly represent the heterogeneous data. One of the most representative models: CCA.

6 Canonical correlation analysis (CCA) Objective of CCA: correlation maximization in the common subspace. With canonical variables $z_{1n} = x_{1n}^T h_1$ and $z_{2n} = x_{2n}^T h_2$, CCA solves
$$\arg\max_{h_1, h_2} \rho = \mathrm{corr}(z_1, z_2) = \frac{h_1^T C_{12} h_2}{\sqrt{h_1^T C_{11} h_1}\,\sqrt{h_2^T C_{22} h_2}}.$$
H. Hotelling, Relations between two sets of variates, Biometrika, 1936. D. P. Foster, et al., Multi-view dimensionality reduction via canonical correlation analysis, Tech. Rep., 2008.
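As a concrete illustration, a minimal NumPy sketch of the standard two-view CCA solution: whiten each view's covariance and take the top singular vectors of the whitened cross-covariance. The data shapes and the eps regularizer are arbitrary assumptions, not values from the slides.

```python
import numpy as np

def cca(X1, X2, eps=1e-3):
    """Two-view CCA. X1: d1 x N, X2: d2 x N, columns are samples.
    Returns the first pair of canonical directions (h1, h2) and rho."""
    X1 = X1 - X1.mean(axis=1, keepdims=True)
    X2 = X2 - X2.mean(axis=1, keepdims=True)
    N = X1.shape[1]
    C11 = X1 @ X1.T / N + eps * np.eye(X1.shape[0])   # regularized covariances
    C22 = X2 @ X2.T / N + eps * np.eye(X2.shape[0])
    C12 = X1 @ X2.T / N

    def inv_sqrt(C):  # symmetric inverse square root of an SPD matrix
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    K = inv_sqrt(C11) @ C12 @ inv_sqrt(C22)           # whitened cross-covariance
    U, s, Vt = np.linalg.svd(K)
    h1 = inv_sqrt(C11) @ U[:, 0]                      # un-whiten the directions
    h2 = inv_sqrt(C22) @ Vt[0, :]
    return h1, h2, s[0]                               # s[0] is the canonical correlation

X1, X2 = np.random.randn(10, 500), np.random.randn(8, 500)
h1, h2, rho = cca(X1, X2)
print(rho)
```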

7 Generalizations of CCA to several views CCA-MAXVAR generalizes CCA to $M \geq 2$ views:
$$\arg\min_{z, \{\alpha_m\}, \{h_m\}} \frac{1}{M}\sum_{m=1}^M \|z - \alpha_m z_m\|_2^2, \quad \text{s.t. } \|z\|_2 = 1,$$
where $z_m = X_m^T h_m$ is the vector of canonical variables for the $m$-th view, and $z$ is a centroid representation. Solutions can be obtained using the SVD of $X_m$. J. R. Kettenring, Canonical analysis of several sets of variables, Biometrika, 1971.

8 Generalizations of CCA to several views CCA-LS:
$$\arg\min_{\{h_m\}_{m=1}^M} \frac{1}{2M(M-1)}\sum_{p,q=1}^{M} \|X_p^T h_p - X_q^T h_q\|_2^2, \quad \text{s.t. } \frac{1}{M}\sum_{m=1}^M h_m^T C_{mm} h_m = 1.$$
Equivalent to CCA-MAXVAR, but can be solved efficiently and adaptively based on LS regression. J. Via et al., A learning algorithm for adaptive canonical correlation analysis of several data sets, Neural Networks, 2007.

9 The proposed TCCA framework Main drawback of CCA-MAXVAR and CCA-LS: only the statistics (correlation information) between pairs of views are explored, while high-order statistics are ignored. Tensor CCA directly maximizes the high-order correlation between all views. [Figure: pairwise correlation between views versus high-order tensor correlation.]

10 The proposed TCCA framework for MVDR [Figure: TCCA pipeline. Three views of $N$ samples, e.g., LAB color $X_1$, wavelet texture $X_2$, and SIFT $X_3$, are used to build the covariance tensor $\mathcal{C}_{123}$; it is approximated by a sum of rank-1 terms $\sum_{k=1}^r \lambda_k\, u_1^k \circ u_2^k \circ u_3^k$; the resulting mappings $U_1, U_2, U_3$ project the views into $Z_1, Z_2, Z_3$, which are concatenated into the $3r$-dimensional common representation $Z$.]

11 Tensor basics A tensor is a generalization of vectors and matrices to an arbitrary number of modes (an N-way array). Scalar: order-0 tensor. Vector: order-1 tensor. Matrix: order-2 tensor. [Figure: an order-3 tensor.]

12 Tensor basics Tensor-matrix multiplication: the $m$-mode product of an $I_1 \times I_2 \times \cdots \times I_M$ tensor $\mathcal{A}$ and a $J_m \times I_m$ matrix $U$ is a tensor $\mathcal{B} = \mathcal{A} \times_m U$ of size $I_1 \times \cdots \times I_{m-1} \times J_m \times I_{m+1} \times \cdots \times I_M$ with elements
$$\mathcal{B}(i_1, \ldots, i_{m-1}, j_m, i_{m+1}, \ldots, i_M) = \sum_{i_m=1}^{I_m} \mathcal{A}(i_1, i_2, \ldots, i_M)\, U(j_m, i_m).$$
The product of $\mathcal{A}$ and a sequence of matrices $\{U_m\}$ is $\mathcal{B} = \mathcal{A} \times_1 U_1 \times_2 U_2 \cdots \times_M U_M$.
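As an illustration, a minimal NumPy sketch of the $m$-mode product implemented directly from the elementwise definition above; the shapes are arbitrary examples and the mode index is 0-based in code.

```python
import numpy as np

def mode_m_product(A, U, m):
    """m-mode product B = A x_m U (m is 0-based): contract mode m of A
    with the columns of U, then move the new J_m axis back to position m."""
    B = np.tensordot(A, U, axes=([m], [1]))   # tensordot appends U's row axis last
    return np.moveaxis(B, -1, m)

# Example: a 3 x 4 x 5 tensor multiplied along mode 1 by a 2 x 4 matrix
A = np.random.randn(3, 4, 5)
U = np.random.randn(2, 4)
print(mode_m_product(A, U, 1).shape)  # (3, 2, 5)
```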

13 Tensor basics Tensor-vector multiplication: the contracted $m$-mode product of $\mathcal{A}$ and an $I_m$-vector $u$ is an $I_1 \times \cdots \times I_{m-1} \times I_{m+1} \times \cdots \times I_M$ tensor $\mathcal{B} = \mathcal{A}\,\bar{\times}_m\, u$ of order $M-1$ with entries
$$\mathcal{B}(i_1, \ldots, i_{m-1}, i_{m+1}, \ldots, i_M) = \sum_{i_m=1}^{I_m} \mathcal{A}(i_1, i_2, \ldots, i_M)\, u(i_m).$$
Tensor-tensor multiplication: outer product, contracted product, inner product. Frobenius norm of a tensor:
$$\|\mathcal{A}\|_F^2 = \langle \mathcal{A}, \mathcal{A} \rangle = \sum_{i_1=1}^{I_1}\sum_{i_2=1}^{I_2}\cdots\sum_{i_M=1}^{I_M} \mathcal{A}(i_1, i_2, \ldots, i_M)^2.$$
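A matching sketch (again with purely illustrative shapes) of the contracted $m$-mode product with a vector and of the tensor Frobenius norm:

```python
import numpy as np

def mode_m_vec_product(A, u, m):
    """Contracted m-mode product A xbar_m u: contracts mode m (0-based)
    of tensor A with vector u, reducing the order by one."""
    return np.tensordot(A, u, axes=([m], [0]))

A = np.random.randn(3, 4, 5)
u = np.random.randn(4)
print(mode_m_vec_product(A, u, 1).shape)        # (3, 5)
print(np.sum(A ** 2), np.linalg.norm(A) ** 2)   # squared Frobenius norm, two ways
```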

14 Tensor basics Matricization: the mode-$m$ matricization of $\mathcal{A}$ is the $I_m \times (I_1 \cdots I_{m-1} I_{m+1} \cdots I_M)$ matrix $A_{(m)}$ whose columns are the mode-$m$ fibers of $\mathcal{A}$. [Figure: the mode-1 (frontal), mode-2 (horizontal), and mode-3 matricizations $A_{(1)}$, $A_{(2)}$, $A_{(3)}$ of an order-3 tensor, obtained by row-wise and column-wise vectorization.]

15 Tensor basics Matricization property: the $m$-mode multiplication $\mathcal{B} = \mathcal{A} \times_m U$ can be carried out as a matrix multiplication by storing the tensors in matricized form, i.e., $B_{(m)} = U A_{(m)}$. A series of $m$-mode products can be expressed with Kronecker products:
$$\mathcal{B} = \mathcal{A} \times_1 U_1 \times_2 U_2 \cdots \times_M U_M \;\Longleftrightarrow\; B_{(m)} = U_m A_{(m)} \bigl(U_{c_1} \otimes U_{c_2} \otimes \cdots \otimes U_{c_{M-1}}\bigr)^T,$$
where $(c_1, c_2, \ldots, c_{M-1}) = (m+1, m+2, \ldots, M, 1, 2, \ldots, m-1)$ is a forward cyclic ordering of the tensor modes.
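The following sketch checks the single-mode identity $B_{(m)} = U A_{(m)}$ numerically. Note that the unfolding below flattens the remaining modes in NumPy's default order rather than the forward cyclic ordering of the slide; for this identity any fixed ordering works as long as it is used consistently.

```python
import numpy as np

def unfold(A, m):
    """Mode-m matricization A_(m): mode m indexes the rows, the remaining
    modes are flattened into the columns."""
    return np.moveaxis(A, m, 0).reshape(A.shape[m], -1)

def fold(A_m, shape, m):
    """Inverse of unfold: rebuild a tensor of the given shape."""
    full = (A_m.shape[0],) + tuple(s for i, s in enumerate(shape) if i != m)
    return np.moveaxis(A_m.reshape(full), 0, m)

# m-mode product through the matricized form B_(m) = U A_(m)
A = np.random.randn(3, 4, 5)
U = np.random.randn(2, 4)
B = fold(U @ unfold(A, 1), A.shape, 1)

# Sanity check against the elementwise definition of the mode product
B_ref = np.einsum('ilk,jl->ijk', A, U)
print(np.allclose(B, B_ref))  # True
```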

16 TCCA formulation Optimization problem: maximize the correlation between the canonical variables $z_m = X_m^T h_m$, $m = 1, \ldots, M$:
$$\arg\max_{\{h_m\}} \rho = \mathrm{corr}(z_1, z_2, \ldots, z_M) = (z_1 \odot z_2 \odot \cdots \odot z_M)^T e, \quad \text{s.t. } z_m^T z_m = 1,\; m = 1, \ldots, M,$$
where $\odot$ is the element-wise product and $e$ is the all-ones vector. Equivalent formulation:
$$\arg\max_{\{h_m\}} \rho = \mathcal{C}_{12\cdots M}\,\bar{\times}_1 h_1^T\,\bar{\times}_2 h_2^T \cdots \bar{\times}_M h_M^T, \quad \text{s.t. } h_m^T (C_{mm} + \varepsilon I) h_m = 1,\; m = 1, \ldots, M,$$
where the covariance tensor is $\mathcal{C}_{12\cdots M} = \frac{1}{N}\sum_{n=1}^N x_{1n} \circ x_{2n} \circ \cdots \circ x_{Mn}$.
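To make the covariance tensor concrete, a small synthetic sketch for $M = 3$ views; all shapes are made-up examples, and in practice the views should be centered first.

```python
import numpy as np

# Covariance tensor C_{123} = (1/N) sum_n x_1n o x_2n o x_3n for three views.
N, d1, d2, d3 = 500, 20, 15, 10
X1 = np.random.randn(d1, N)   # columns are (centered) samples of view 1
X2 = np.random.randn(d2, N)   # view 2
X3 = np.random.randn(d3, N)   # view 3

C = np.einsum('in,jn,kn->ijk', X1, X2, X3) / N   # d1 x d2 x d3 covariance tensor
print(C.shape)  # (20, 15, 10)
```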

17 TCCA formulation Reformulation: let $\mathcal{M} = \mathcal{C}_{12\cdots M} \times_1 \tilde{C}_{11}^{-1/2} \times_2 \tilde{C}_{22}^{-1/2} \cdots \times_M \tilde{C}_{MM}^{-1/2}$ and $u_m = \tilde{C}_{mm}^{1/2} h_m$, where $\tilde{C}_{mm} = C_{mm} + \varepsilon I$. Main solution:
$$\arg\max_{\{u_m\}} \rho = \mathcal{M}\,\bar{\times}_1 u_1^T\,\bar{\times}_2 u_2^T \cdots \bar{\times}_M u_M^T, \quad \text{s.t. } u_m^T u_m = 1,\; m = 1, \ldots, M.$$
If we define $\widehat{\mathcal{M}} = \rho\, u_1 \circ u_2 \circ \cdots \circ u_M$, the problem becomes the best rank-1 approximation [Lathauwer et al., 2000a]
$$\arg\min_{\{u_m\}} \|\mathcal{M} - \widehat{\mathcal{M}}\|_F^2,$$
solved by alternating least squares (ALS), the higher-order power method (HOPM), etc. L. De Lathauwer et al., On the best rank-1 and rank-$(R_1, R_2, \ldots, R_N)$ approximation of higher-order tensors, SIAM J. Matrix Anal. Appl., 2000.
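A minimal sketch (for $M = 3$, with a random tensor standing in for $\mathcal{M}$) of the higher-order power method for the best rank-1 approximation; it omits convergence checks, deflation for further components, and the whitening step.

```python
import numpy as np

def rank1_hopm(M, n_iter=100):
    """Higher-order power method (an ALS scheme) for the best rank-1
    approximation M ~ rho * u1 o u2 o u3 of an order-3 tensor."""
    u1 = np.linalg.svd(M.reshape(M.shape[0], -1))[0][:, 0]   # init from mode-1 unfolding
    u2 = np.random.randn(M.shape[1]); u2 /= np.linalg.norm(u2)
    u3 = np.random.randn(M.shape[2]); u3 /= np.linalg.norm(u3)
    for _ in range(n_iter):
        u1 = np.einsum('ijk,j,k->i', M, u2, u3); u1 /= np.linalg.norm(u1)
        u2 = np.einsum('ijk,i,k->j', M, u1, u3); u2 /= np.linalg.norm(u2)
        u3 = np.einsum('ijk,i,j->k', M, u1, u2); u3 /= np.linalg.norm(u3)
    rho = np.einsum('ijk,i,j,k->', M, u1, u2, u3)            # attained correlation
    return rho, u1, u2, u3

rho, u1, u2, u3 = rank1_hopm(np.random.randn(20, 15, 10))
print(rho)
```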

18 TCCA solution Remaining solutions: obtained by recursively maximizing the same correlation as in the main TCCA problem. All solutions together form the best sum of rank-1 approximations, i.e., a rank-$r$ CP decomposition of $\mathcal{M}$:
$$\mathcal{M} \approx \sum_{k=1}^r \rho_k\, u_1^k \circ u_2^k \circ \cdots \circ u_M^k.$$
Projected data: $Z_m = X_m^T \tilde{C}_{mm}^{-1/2} U_m$, where $U_m = [u_m^1, \ldots, u_m^r]$.
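A sketch of the projection step for one view, under the assumption that a factor matrix $U_1$ has already been obtained (here it is random, purely for illustration):

```python
import numpy as np

def inv_sqrt(C):
    """Symmetric inverse square root of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

d1, N, r, eps = 20, 500, 5, 1e-3
X1 = np.random.randn(d1, N)               # view 1, columns are samples
C11 = X1 @ X1.T / N + eps * np.eye(d1)    # C_tilde_11 = C_11 + eps * I
U1 = np.random.randn(d1, r)               # stand-in for [u_1^1, ..., u_1^r]
Z1 = X1.T @ inv_sqrt(C11) @ U1            # Z_1 = X_1^T C_tilde_11^{-1/2} U_1
print(Z1.shape)  # (500, 5)
```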

19 KTCCA formulation Non-linear extension: apply a non-linear feature mapping $\phi$ to each view, $\Phi(X_m) = [\phi(x_{m1}), \phi(x_{m2}), \ldots, \phi(x_{mN})]$. Canonical variables: $z_m = \Phi(X_m)^T h_m$. Representer theorem: $h_m = \Phi(X_m) a_m$. Optimization problem:
$$\arg\max_{\{a_m\}} \rho = \mathcal{K}_{12\cdots M}\,\bar{\times}_1 a_1^T\,\bar{\times}_2 a_2^T \cdots \bar{\times}_M a_M^T, \quad \text{s.t. } a_m^T (K_{mm}^2 + \varepsilon K_{mm}) a_m = 1,\; m = 1, \ldots, M,$$
where the constraint matrix can be factorized as $K_{mm}^2 + \varepsilon K_{mm} = L_m^T L_m$.

20 KTCCA solution Reformulation: let $\mathcal{S} = \mathcal{K}_{12\cdots M} \times_1 L_1^{-T} \times_2 L_2^{-T} \cdots \times_M L_M^{-T}$ and $b_m = L_m a_m$, so that
$$\arg\max_{\{b_m\}} \rho = \mathcal{S}\,\bar{\times}_1 b_1^T\,\bar{\times}_2 b_2^T \cdots \bar{\times}_M b_M^T, \quad \text{s.t. } b_m^T b_m = 1,\; m = 1, \ldots, M,$$
which is solved by ALS. Projected data: $Z_m = K_{mm} L_m^{-1} B_m$, $m = 1, \ldots, M$.

21 Experimental setup Datasets SecStr: protein secondary structure prediction. 84K instances, 100 as labeled, an additional 1200K unlabeled. 3 views: attributes based on the left, middle, and right context generated from the sequence window of amino acids; each view is 105-D. Advertisement classification: 3279 instances, 100 as labeled. 3 views: features based on the terms in the images (588-D), terms in the current URL (495-D), and terms in the anchor URL (472-D). Web image annotation: images, {4, 6, 8} labeled instances for each of 10 concepts. 3 views: 500-D SIFT visual words, 144-D color, 128-D wavelet. Classifiers: RLS and KNN. Evaluation criterion: prediction/classification/annotation accuracy.

22 Experimental setup Compared methods BSF: best single-view feature. CAT: concatenation of the normalized features. FRAC: a recent multi-view feature selection algorithm. CCA: applied to each of the $M(M-1)/2$ subsets of two views; CCA (BST): the best subset; CCA (AVG): the average performance over all subsets. CCA-LS: a traditional generalization of CCA to several views. DSE: a popular unsupervised multi-view DR method. SSMVD: a recent unsupervised multi-view DR method. TCCA: the proposed method.

23 Experimental results and analysis Protein secondary structure prediction [Figures: accuracy vs. dimension with 84K and with 1.3M unlabeled samples] Learning a common subspace > CAT > BSF. SSMVD and CCA-LS are comparable, as are DSE and CCA (BST). TCCA is the best at most dimensions, and its performance does not decrease significantly when the dimension is high.

24 Experimental results and analysis Web image annotation [Figures: linear and non-linear results] DSE is comparable to CCA (BST) and CCA (AVG). TCCA > SSMVD, and is better than the other CCA-based methods. Non-linear > linear.

25 Conclusions and discussion Conclusions: Finding a common subspace for all views using the CCA-based strategy is often better than simply concatenating all the features, especially when the feature dimension is high. Examining more statistics, which may require more unlabeled data to be utilized, often leads to better performance; by exploring the high-order statistics, the proposed TCCA outperforms the other methods. Discussion: Can the common subspace be used for knowledge transfer between different views?

26 Distance metric learning (DML) Goal: learn an appropriate distance function over the input space to reflect relationships between data. Useful in many ML algorithms, e.g., clustering, classification, and information retrieval. Most common DML scheme: Mahalanobis metric learning, which amounts to learning a linear transformation:
$$d_A(x_i, x_j) = (x_i - x_j)^T A (x_i - x_j) = \|U^T x_i - U^T x_j\|_2^2, \quad A = U U^T.$$
Non-linear and local DML are able to capture complex structure in the data.
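A tiny numerical check (with purely illustrative shapes) that the Mahalanobis distance with $A = U U^T$ equals the squared Euclidean distance after the linear map $U^T$:

```python
import numpy as np

d, r = 10, 3
U = np.random.randn(d, r)
A = U @ U.T                                      # a PSD Mahalanobis metric

x_i, x_j = np.random.randn(d), np.random.randn(d)
diff = x_i - x_j
d_metric = diff @ A @ diff                       # (x_i - x_j)^T A (x_i - x_j)
d_mapped = np.sum((U.T @ x_i - U.T @ x_j) ** 2)  # ||U^T x_i - U^T x_j||_2^2
print(np.isclose(d_metric, d_mapped))  # True
```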

27 Transfer DML (TDML) Motivation: DML needs a large amount of side information to learn a robust distance metric, but the training samples are insufficient in the task/domain of interest (target task/domain), while we have abundant labeled data in certain related, but different, tasks/domains (source tasks/domains). Goal: utilize the metrics obtained from the source tasks/domains to help metric learning in the target tasks/domains.

28 Homogeneous TDML (HoTDML) Data of the source domain and target domain are drawn from different distributions (but lie in the same feature space). Examples [Pan and Yang, 2010]: Web document classification: university website -> new website. Indoor WiFi localization: WiFi signal-strength values change in different time periods or on different devices. Sentiment classification: the distribution of reviews among different types of products can be very different. Challenge: how to utilize the source information appropriately given the different distributions, or find a subspace in which the distribution difference is reduced. S. J. Pan and Q. Yang, A survey on transfer learning, IEEE TKDE, 2010.

29 Heterogeneous TDML (HeTDML) Data of the source domain and target domain lie in different feature spaces, and may have different semantics. Examples: multi-lingual document classification (labeled reviews in Spanish are scarce, labeled reviews in English are abundant, and the goal is to classify reviews in Spanish), multi-view classification or retrieval, etc. Challenge: how to find correspondences or common representations for the different domains.

30 HeTDML existing solutions Heterogeneous transfer learning (HTL) approaches usually transform heterogeneous features into a common subspace, and the transformation can be used to derive a metric. Groups: Heterogeneous domain adaptation (HDA), which improves the performance in the target domain, and most HDA methods only handle two domains; Heterogeneous multi-task learning (HMTL), which improves the performance of all domains simultaneously.

31 Heterogeneous multi-task metric learning (HMTML) Limitations of existing HMTL approaches: they do not optimize w.r.t. the metric, mainly focus on utilizing the side information, and can only explore the pairwise relationships between different domains; the high-order statistics that can only be obtained by simultaneously examining all domains are ignored. Our method handles an arbitrary number of domains and directly optimizes w.r.t. the metrics, makes use of large amounts of unlabeled data to build domain connections, and explores high-order statistics between all domains.

32 HMTML framework [Figure: labeled data $D_1^L, \ldots, D_M^L$ from heterogeneous domains (e.g., English and German documents) and unlabeled data $D^U$. Each domain's metric is decomposed as $A_m = U_m U_m^T$; the unlabeled samples $x_{mn}^U$ of each domain $X_m$ are projected by $U_m$ into representations $z_{mn}^U$ in $Z_m$, and the mappings $U_1, \ldots, U_M$ are coupled through tensor-based correlation maximization.]

33 HMTML formulation Optimization problem, general formulation:
$$\arg\min_{\{A_m\}_{m=1}^M} F(\{A_m\}) = \sum_{m=1}^M \Psi(A_m) + \gamma R(A_1, A_2, \ldots, A_M), \quad \text{s.t. } A_m \succeq 0,\; m = 1, 2, \ldots, M,$$
where $\Psi(A_m) = \frac{1}{N_m(N_m-1)}\sum_{i<j} L(A_m; x_{mi}, x_{mj}, y_{mij})$ is the empirical loss w.r.t. $A_m$, and $R(A_1, A_2, \ldots, A_M)$ enforces information transfer across different domains.

34 Knowledge transfer by high-order correlation maximization Main idea: decompose $A_m$ as $A_m = U_m U_m^T$, and use $U_m$ to project the unlabeled data points of the different domains into a common subspace, where the correlation of all domains is maximized. Formulation:
$$\arg\max_{\{U_m\}_{m=1}^M} \frac{1}{N_U}\sum_{n=1}^{N_U} \mathrm{corr}\bigl(z_{1n}^U, z_{2n}^U, \ldots, z_{Mn}^U\bigr),$$
where $\mathrm{corr}(z_{1n}^U, z_{2n}^U, \ldots, z_{Mn}^U) = (z_{1n}^U \odot z_{2n}^U \odot \cdots \odot z_{Mn}^U)^T e$ is the correlation of the projected representations $z_{mn}^U = U_m^T x_{mn}^U$.

35 Knowledge transfer by high-order correlation maximization Reformulation: let $\mathcal{G} = \mathcal{E}_r \times_1 U_1 \times_2 U_2 \cdots \times_M U_M$ be the covariance tensor of the mappings (with $\mathcal{E}_r$ the order-$M$ diagonal tensor of ones), and let $\mathcal{C}_n^U = x_{1n}^U \circ x_{2n}^U \circ \cdots \circ x_{Mn}^U$ be the covariance tensor of the representations for the $n$-th unlabeled sample. Then
$$\arg\max_{\{U_m\}_{m=1}^M} \frac{1}{N_U}\sum_{n=1}^{N_U} \mathrm{corr}\bigl(z_{1n}^U, z_{2n}^U, \ldots, z_{Mn}^U\bigr) = \arg\max_{\{U_m\}_{m=1}^M} \frac{1}{N_U}\sum_{n=1}^{N_U} \mathcal{G}\,\bar{\times}_1 (x_{1n}^U)^T \cdots \bar{\times}_M (x_{Mn}^U)^T \;\text{[Luo et al., 2015]}$$
$$= \arg\min_{\{U_m\}_{m=1}^M} \Bigl\|\frac{1}{N_U}\sum_{n=1}^{N_U} \mathcal{C}_n^U - \mathcal{G}\Bigr\|_F^2 \;\text{[Lathauwer et al., 2000b]}.$$
Y. Luo et al., Tensor Canonical Correlation Analysis for Multi-view Dimension Reduction, IEEE TKDE, 2015. L. De Lathauwer et al., A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl., 2000.
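To illustrate the coupling term, a small synthetic sketch for $M = 3$ domains that evaluates $\|\frac{1}{N_U}\sum_n \mathcal{C}_n^U - \mathcal{G}\|_F^2$; the dimensions, data, and mappings below are made-up placeholders.

```python
import numpy as np

N_U, d1, d2, d3, r = 200, 12, 10, 8, 4
X = [np.random.randn(d, N_U) for d in (d1, d2, d3)]   # unlabeled data per domain
U = [np.random.randn(d, r) for d in (d1, d2, d3)]     # mappings, A_m = U_m U_m^T

# Average outer-product tensor (1/N_U) * sum_n x_1n o x_2n o x_3n
C_avg = np.einsum('in,jn,kn->ijk', X[0], X[1], X[2]) / N_U

# G = E_r x_1 U_1 x_2 U_2 x_3 U_3, i.e., G(i,j,k) = sum_c U_1(i,c) U_2(j,c) U_3(k,c)
G = np.einsum('ic,jc,kc->ijk', U[0], U[1], U[2])

coupling = np.sum((C_avg - G) ** 2)   # squared Frobenius norm of the difference
print(coupling)
```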

36 HMTML formulation Specific optimization problem:
$$\arg\min_{\{U_m\}_{m=1}^M} F(\{U_m\}) = \sum_{m=1}^M \frac{1}{N_m}\sum_{k=1}^{N_m} g\!\left(y_{mk}\bigl(1 - \delta_{mk}^T U_m U_m^T \delta_{mk}\bigr)\right) + \gamma \Bigl\|\frac{1}{N_U}\sum_{n=1}^{N_U} \mathcal{C}_n^U - \mathcal{G}\Bigr\|_F^2 + \sum_{m=1}^M \gamma_m \|U_m\|_1.$$
This corresponds to finding a subspace where the representations of all domains are close to each other. Knowledge is transferred in this subspace, so different domains can help each other in learning the mapping $U_m$, or equivalently the metric $A_m$.

37 HMTML solution Rewrite $\|\mathcal{C}_n^U - \mathcal{G}\|_F^2$ as an expression w.r.t. $U_m$. Using the matricizing property [Lathauwer et al., 2000a], $\mathcal{G} = \mathcal{E}_r \times_1 U_1 \times_2 U_2 \cdots \times_M U_M$ gives $G_{(m)} = U_m B_{(m)}$, where $\mathcal{B} = \mathcal{E}_r \times_1 U_1 \cdots \times_{m-1} U_{m-1} \times_{m+1} U_{m+1} \cdots \times_M U_M$. Hence
$$\|\mathcal{C}_n^U - \mathcal{G}\|_F^2 = \|(C_n^U)_{(m)} - G_{(m)}\|_F^2 = \|(C_n^U)_{(m)} - U_m B_{(m)}\|_F^2.$$
Alternate over the $U_m$ and solve each subproblem w.r.t. $U_m$ by projected gradient descent. L. De Lathauwer et al., On the best rank-1 and rank-$(R_1, R_2, \ldots, R_N)$ approximation of higher-order tensors, SIAM J. Matrix Anal. Appl., 2000.

38 Experiments Datasets and features Reuters multilingual collection (RMLC): 6 categories, 3 domains: English (EN), Italian (IT), Spanish (SP); number of documents: EN=18758, IT=24039, SP=12342; TF-IDF features, with PCA preprocessing to find comparable and high-level patterns for transfer. NUS-WIDE: subset of 12 animal concepts, images + tags; {SIFT, wavelet, tag} representations with PCA preprocessing, where each representation is treated as a domain. Evaluation criteria: accuracy, MacroF1.

39 Experiments Compared methods EU: Euclidean distance between samples based on their original feature representations. RDML: an efficient and competitive DML algorithm that does not make use of any additional information from other domains. DAMA: constructs mappings $U_m$ to link multiple heterogeneous domains using manifold alignment. MTDA: the multi-task extension of linear discriminant analysis. HMTML: the proposed method.

40 Experiments Average performance of all domains w.r.t. the number of common factors: although the labeled samples in each domain are scarce, learning the distance metric separately using RDML can still improve the performance significantly.

41 Experiments Average performance of all domains w.r.t. the number of common factors: all three heterogeneous transfer learning approaches achieve much better performance than RDML, which indicates that it is useful to leverage information from other domains in DML.

42 Experiments Average performance of all domains w.r.t. the number of common factors: HMTML outperforms both DAMA and MTDA at most numbers of common factors, which indicates that the factors learned by our method are more expressive than those of the other approaches.

43 Experiments Performance for individual domains RDML improves the performance in each domain, and the improvements are similar for different domains, since there is no communication between them

44 Experiments Performance for individual domains: the transfer learning methods yield much larger improvements than RDML in the domains where the discriminative ability of the original representations is not very good. This demonstrates that knowledge is successfully transferred between different domains.

45 Experiments Performance for individual domains The discriminative domain obtains little benefit from the other relatively non-discriminative domains in DAMA and MTDA, while in the proposed HMTML, we still achieve significant improvements

46 Conclusions The labeled data deficiency problem can be alleviated by learning metrics for multiple heterogeneous domains simultaneously. The shared knowledge of different domains exploited by the transfer learning methods can benefit each domain if appropriate common factors are discovered, and the high-order statistics (correlation information) are critical in discovering such factors.

47 Thank You! Q & A
