Performance Limits on the Classification of Kronecker-structured Models

Ishan Jindal and Matthew Nokleby
Electrical and Computer Engineering, Wayne State University

Motivation: Subspace models
- (Unions of) subspace models are useful for many signal processing and machine learning problems.
- Represent signals via an (overcomplete) dictionary: y = Ax + z.
- Offers a compact representation.
- Efficient algorithms for classification, clustering, and dictionary learning.
- Good understanding of fundamental performance limits.
[Basri and Jacobs, "Lambertian Reflectance and Linear Subspaces," 2003]
[Aharon et al., "K-SVD: An Algorithm for Designing Overcomplete Dictionaries," 2006]
[Zhang and Li, "Discriminative K-SVD for Dictionary Learning," 2010]
[Heckel et al., "Dimensionality-Reduced Subspace Clustering," 2015]
[Nokleby, Rodrigues, Calderbank, "Discrimination on the Grassmann Manifold," 2015]

Motivation: Kronecker-structured models
- Matrix-valued subspace model: Y = A X B^T + Z.
- Constituent dictionaries for the rows and the columns of the signal.
- Accounts for multidimensional data structure.
- Useful for video inpainting and tomographic image reconstruction.
- Even more compact representation.
- Efficient dictionary learning methods.
- What is the classification performance?
[Figure: video data cube with "time" and "space" axes]
[Tsiligkaridis and Hero, "Covariance Estimation via Kronecker Product Expansion," 2013]
[Hawe et al., "Separable Dictionary Learning," 2013]
[Shakeri et al., "Minimax Lower Bounds for Kronecker-Structured Dictionary Learning," 2016]

Contributions
- Information-theoretic analysis of the classification performance of K-S models.
- Performance metric, the diversity order: how fast does the classification error rate decay as the SNR approaches infinity?
- Main results: an exact derivation of the diversity order, and the diversity gap relative to unstructured subspace models.
- Not in this talk: the (classification) capacity, i.e., how many K-S subspaces one can discern in the limit of large signal dimension.
[Nokleby, Rodrigues, Calderbank, "Discrimination on the Grassmann Manifold," 2015]

Problem Definition: Signal model
- The signal belongs to one of L classes.
- Signal model for class l, 1 ≤ l ≤ L: Y = A_l X B_l^T + Z, where
  Y ∈ R^{m_1 × m_2}: signal of interest,
  A_l ∈ R^{m_1 × n_1}: column subspace basis,
  B_l ∈ R^{m_2 × n_2}: row subspace basis,
  X ∈ R^{n_1 × n_2}: coefficients,
  Z ∈ R^{m_1 × m_2}: model noise,
  with n_1 < m_1 and n_2 < m_2.
- Signal columns live near an n_1-dimensional subspace; signal rows live near an n_2-dimensional subspace.

Problem Definition: Signal model
- It is useful to work with the vectorized signal model: y = (B_l ⊗ A_l) x + z, where y = vec(Y) ∈ R^M, x = vec(X) ∈ R^N, z = vec(Z) ∈ R^M, M = m_1 m_2, and N = n_1 n_2.
- The signal lives near an n_1 n_2-dimensional subspace with a Kronecker-structured dictionary.
- Classification is over a restricted set of subspace models.
[Figure: the m_1 × m_2 signal expressed as a combination of n_1 × n_2 basis elements]
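
A minimal NumPy check of the vectorization identity behind this step, vec(A X B^T) = (B ⊗ A) vec(X) with column-major vec; the dimensions below are illustrative assumptions, not values from the talk:

```python
import numpy as np

# Illustrative dimensions (assumption); any m1 > n1, m2 > n2 works.
m1, n1, m2, n2 = 6, 2, 5, 3
rng = np.random.default_rng(0)

A = rng.standard_normal((m1, n1))   # column-subspace basis
B = rng.standard_normal((m2, n2))   # row-subspace basis
X = rng.standard_normal((n1, n2))   # coefficient matrix

# Matrix form of the noise-free model: Y = A X B^T
Y = A @ X @ B.T

# Vectorized form: vec(Y) = (B kron A) vec(X), using column-major (Fortran-order) vec
D = np.kron(B, A)                   # K-S dictionary, shape (m1*m2, n1*n2)
print(np.allclose(Y.flatten(order="F"), D @ X.flatten(order="F")))  # True
```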

Problem Definition: Classification model
- Let the coefficients and the i.i.d. noise be Gaussian: x ~ N(0, I), z ~ N(0, σ² I).
- The noise variance σ² quantifies the deviation from the K-S model.
- The class-conditional distribution is Gaussian: p(y | A_l, B_l) = N(0, (B_l ⊗ A_l)(B_l ⊗ A_l)^T + σ² I).
- Equivalent to classification of a Gaussian mixture model with matrix normal components.
- A classifier estimates the class label l from y (e.g., by the ML rule).
- We want to understand the probability of error: P_e = (1/L) Σ_{l=1}^{L} Pr( l̂ ≠ l | y ~ p(y | A_l, B_l) ).
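
A sketch of the ML rule for this Gaussian mixture, assuming the class covariances D_l D_l^T + σ² I are known; the helper name ml_classify and the dimensions are illustrative assumptions:

```python
import numpy as np

def ml_classify(y, dictionaries, sigma2):
    """Return the maximum-likelihood class label for a vectorized sample y,
    assuming zero-mean Gaussian classes with covariance D_l D_l^T + sigma2*I."""
    scores = []
    for D in dictionaries:
        cov = D @ D.T + sigma2 * np.eye(D.shape[0])
        _, logdet = np.linalg.slogdet(cov)
        # Log-likelihood up to an additive constant shared by all classes
        scores.append(-0.5 * (logdet + y @ np.linalg.solve(cov, y)))
    return int(np.argmax(scores))

# Illustrative usage with two randomly drawn K-S classes (dimensions are assumptions)
rng = np.random.default_rng(1)
m1, n1, m2, n2, sigma2 = 6, 2, 5, 2, 1e-2
dicts = [np.kron(rng.standard_normal((m2, n2)), rng.standard_normal((m1, n1)))
         for _ in range(2)]
y = dicts[0] @ rng.standard_normal(n1 * n2) + np.sqrt(sigma2) * rng.standard_normal(m1 * m2)
print(ml_classify(y, dicts, sigma2))  # 0 with high probability
```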

Diversity Order
- How fast does the error probability decay in the model noise?
- Fix L and the bases A_l, B_l for each class.
- Definition: the diversity order is the exponent d = lim_{σ² → 0} -log(P_e) / ((1/2) log(1/σ²)).
- The diversity order isolates the impact of the subspace model on performance.
- For standard subspace models (m_2 = n_2 = 1): d = min{n_1, m_1 - n_1}.
- "Equivalent K-S subspace" conjecture: d = min{n_1 n_2, m_1 m_2 - n_1 n_2}.
[Nokleby, Rodrigues, Calderbank, "Discrimination on the Grassmann Manifold," 2015]

Bhattacharyya Bound
- Main analytical tool: the Bhattacharyya bound.
- Lemma (Bhattacharyya bound): For two zero-mean Gaussian classes with covariances Σ_1 and Σ_2 and equal priors, the pairwise maximum-likelihood classification error is bounded by
  P_e ≤ (1/2) ( |Σ_1|^{1/2} |Σ_2|^{1/2} / |(Σ_1 + Σ_2)/2| )^{1/2}.
- The bound is tight w.r.t. the exponent of the pairwise error as P_e → 0.
- The union bound is exponentially tight, too.
- Diversity order: the worst pairwise exponent of this bound.
[Kailath, "The Divergence and Bhattacharyya Distance Measures in Signal Selection," 1967]
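
A small numerical sketch of this lemma, evaluated in log space for stability; the function name and the test covariances are illustrative assumptions:

```python
import numpy as np

def bhattacharyya_bound(cov1, cov2):
    """Bhattacharyya upper bound on the pairwise ML error for two zero-mean
    Gaussians with equal priors:
        P_e <= 0.5 * (|S1|^(1/2) |S2|^(1/2) / |(S1 + S2)/2|)^(1/2)."""
    _, ld1 = np.linalg.slogdet(cov1)
    _, ld2 = np.linalg.slogdet(cov2)
    _, ldm = np.linalg.slogdet(0.5 * (cov1 + cov2))
    # Exponentiate the log of the bound for numerical stability
    return 0.5 * np.exp(0.25 * (ld1 + ld2) - 0.5 * ldm)

# Example with two arbitrary positive-definite covariances
rng = np.random.default_rng(2)
M1, M2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
print(bhattacharyya_bound(M1 @ M1.T + np.eye(4), M2 @ M2.T + np.eye(4)))
```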

Diversity Order: Bhattacharyya bound
- Recall the class-conditional distributions: p(y | A_l, B_l) = N(0, (B_l ⊗ A_l)(B_l ⊗ A_l)^T + σ² I).
- Define the Kronecker-structured dictionary for each class: D_l = B_l ⊗ A_l ∈ R^{m_1 m_2 × n_1 n_2}.
- The pairwise error probability between classes i and j is bounded by
  P_e(D_i, D_j) ≤ (1/2) ( |D_i D_i^T + σ² I|^{1/2} |D_j D_j^T + σ² I|^{1/2} / |(D_i D_i^T + D_j D_j^T + 2σ² I)/2| )^{1/2}.
- Simplifying, for almost every class pair the bound decays as (σ²)^{(r_ij - n_1 n_2)/2}, with a constant governed by λ_ij, where
  r_ij: rank of D_i D_i^T + D_j D_j^T,
  λ_ij: smallest nonzero eigenvalue of D_i D_i^T + D_j D_j^T.
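
As a quick check of this step, one can evaluate the bound for two randomly drawn K-S classes as σ² shrinks and compare its log-slope to (r_ij - n_1 n_2)/2; the dimensions and noise grid below are illustrative assumptions:

```python
import numpy as np

# Evaluate the Bhattacharyya bound for two random K-S classes as sigma^2 -> 0.
# Its log should fall off with slope (r_ij - n1*n2)/2 in log(sigma^2).
rng = np.random.default_rng(3)
m1, n1, m2, n2 = 6, 2, 6, 2
Di = np.kron(rng.standard_normal((m2, n2)), rng.standard_normal((m1, n1)))
Dj = np.kron(rng.standard_normal((m2, n2)), rng.standard_normal((m1, n1)))
r_ij = np.linalg.matrix_rank(Di @ Di.T + Dj @ Dj.T)

for sigma2 in [1e-1, 1e-2, 1e-3, 1e-4]:
    S_i = Di @ Di.T + sigma2 * np.eye(m1 * m2)
    S_j = Dj @ Dj.T + sigma2 * np.eye(m1 * m2)
    _, ld_i = np.linalg.slogdet(S_i)
    _, ld_j = np.linalg.slogdet(S_j)
    _, ld_m = np.linalg.slogdet(0.5 * (S_i + S_j))
    log_bound = np.log(0.5) + 0.25 * (ld_i + ld_j) - 0.5 * ld_m
    # Ratio tends to (r_ij - n1*n2)/2 as sigma2 -> 0
    print(sigma2, r_ij, log_bound / np.log(sigma2))
```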

Rank Analysis
- Theorem: For any K-S classification problem, the diversity order is d = r - n_1 n_2, where r = min_{i≠j} rank(D_i D_i^T + D_j D_j^T).
- Almost every dictionary D_l has rank n_1 n_2.
- In general: rank(D_1 D_1^T + D_2 D_2^T) = rank(D_1) + rank(D_2) - dim(R(D_1) ∩ R(D_2)).

Rank of the Sum of K-S Dictionaries
- Lemma: Let the intersections of the column and row range spaces satisfy dim[R(A_i) ∩ R(A_j)] = x and dim[R(B_i) ∩ R(B_j)] = y. Then dim[R(B_i ⊗ A_i) ∩ R(B_j ⊗ A_j)] = x y.
- Intuition: if the row or column subspaces are disjoint, the entire K-S subspaces are disjoint.
- Corollary: For almost every pair of K-S dictionaries, rank(D_i D_i^T + D_j D_j^T) = 2 n_1 n_2 - [2 n_1 - m_1]^+ [2 n_2 - m_2]^+.
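
A numerical sanity check of the corollary for randomly drawn (generic) dictionaries; the dimensions are chosen so the correction term [2n_1 - m_1]^+ [2n_2 - m_2]^+ is nonzero, and the helper name is an illustrative assumption:

```python
import numpy as np

def ks_rank_formula(m1, n1, m2, n2):
    """Predicted rank of D_i D_i^T + D_j D_j^T for generic K-S dictionaries:
    2*n1*n2 - max(2*n1 - m1, 0) * max(2*n2 - m2, 0)."""
    return 2 * n1 * n2 - max(2 * n1 - m1, 0) * max(2 * n2 - m2, 0)

rng = np.random.default_rng(5)
m1, n1, m2, n2 = 5, 3, 4, 3  # here 2*n1 > m1 and 2*n2 > m2, so the correction is active
Di = np.kron(rng.standard_normal((m2, n2)), rng.standard_normal((m1, n1)))
Dj = np.kron(rng.standard_normal((m2, n2)), rng.standard_normal((m1, n1)))

empirical = np.linalg.matrix_rank(Di @ Di.T + Dj @ Dj.T)
print(empirical, ks_rank_formula(m1, n1, m2, n2))  # should agree for generic draws
```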

Main Result: Diversity Order
- Corollary: For almost every K-S classification problem, the diversity order is d = n_1 n_2 - [2 n_1 - m_1]^+ [2 n_2 - m_2]^+.
- How does this compare to the unstructured conjecture d = min{n_1 n_2, m_1 m_2 - n_1 n_2} = n_1 n_2 - [2 n_1 n_2 - m_1 m_2]^+ ?
- When the subspace dimensions are small, the diversity orders coincide.
- For larger dimensions, the K-S diversity order is smaller than that of unstructured subspaces.
- The gap is bilinear in the constituent dimensions, hence quadratic in the overall dimension.
[Nokleby, Rodrigues, Calderbank, "Discrimination on the Grassmann Manifold," 2015]
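
The two formulas are easy to compare directly; a small sketch (function names and example dimensions are illustrative assumptions) showing one case with no gap and one where the K-S diversity order falls below the unstructured value:

```python
def diversity_ks(m1, n1, m2, n2):
    # K-S diversity order from the corollary above
    return n1 * n2 - max(2 * n1 - m1, 0) * max(2 * n2 - m2, 0)

def diversity_unstructured(m1, n1, m2, n2):
    # Conjectured diversity order for an unstructured subspace of the same size
    return min(n1 * n2, m1 * m2 - n1 * n2)

# Small subspaces (2*n_i <= m_i): no gap
print(diversity_ks(8, 3, 8, 3), diversity_unstructured(8, 3, 8, 3))  # 9 9
# Larger subspaces: K-S diversity is smaller, gap = 2*2 = 4
print(diversity_ks(8, 5, 8, 5), diversity_unstructured(8, 5, 8, 5))  # 21 25
```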

Numerical Results: Synthetic data
- Synthetic data: L = 2 K-S bases drawn randomly; ML classification performance plotted vs. SNR.
- Numerically validates the diversity results.
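
A rough sketch of such a synthetic experiment, under assumed dimensions, noise levels, and trial counts (not the talk's actual settings): draw two K-S classes, classify with the ML rule, and read the diversity order off the high-SNR slope of -log(P_e) versus (1/2) log(1/σ²):

```python
import numpy as np

rng = np.random.default_rng(6)
m1, n1, m2, n2, L = 6, 2, 6, 2, 2
dicts = [np.kron(rng.standard_normal((m2, n2)), rng.standard_normal((m1, n1)))
         for _ in range(L)]

def error_rate(sigma2, trials=5000):
    """Monte Carlo estimate of the ML classification error at noise level sigma2."""
    covs = [D @ D.T + sigma2 * np.eye(m1 * m2) for D in dicts]
    logdets = [np.linalg.slogdet(C)[1] for C in covs]
    invs = [np.linalg.inv(C) for C in covs]
    errors = 0
    for _ in range(trials):
        label = rng.integers(L)
        y = dicts[label] @ rng.standard_normal(n1 * n2) \
            + np.sqrt(sigma2) * rng.standard_normal(m1 * m2)
        scores = [-0.5 * (logdets[k] + y @ invs[k] @ y) for k in range(L)]
        errors += int(np.argmax(scores) != label)
    return errors / trials

# On a log-log scale the slope approaches the diversity order; resolving very
# small error rates at high SNR requires many more trials than used here.
for sigma2 in [1.0, 0.3, 0.1, 0.03]:
    print(sigma2, error_rate(sigma2))
```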

Numerical Results: YaleB
- Learned K-S dictionaries from 10 samples per class of the YaleB face dataset.
- Compared to standard subspace classification and classification from SIFT features.
- Much more compact representation with competitive classification accuracy.
[Table: test accuracy (%) and number of parameters for subspace, SIFT [1], and K-S classifiers on YaleB faces]
[Figure: learned K-S dictionary elements]
[1] Ramirez et al., "Classification and Clustering via Dictionary Learning," 2010

Conclusion
Summary:
- Studied the classification performance of Kronecker-structured subspace models.
- Derived the exact diversity order.
- Found that the error rate decays more slowly (lower diversity order) than for unstructured subspace models.
Future work:
- Discriminative K-S dictionary/subspace learning.
- Good feature extraction for K-S data.
- Higher-order tensor models.
