Performance Limits on the Classification of Kronecker-structured Models
1 Performance Limits on the Classification of Kronecker-structured Models. Ishan Jindal and Matthew Nokleby, Electrical and Computer Engineering, Wayne State University.
2-3 Motivation: Subspace models. (Unions of) subspace models are useful for signal processing and machine learning problems. Represent signals via an (overcomplete) dictionary: $y = Ax + z$. Subspace models offer a compact representation, admit efficient algorithms for classification, clustering, and dictionary learning, and come with a good understanding of fundamental performance limits.
[Basri and Jacobs, "Lambertian Reflectance and Linear Subspaces," 2003] [Aharon et al., "K-SVD: An Algorithm for Designing Overcomplete Dictionaries," 2006] [Zhang and Li, "Discriminative K-SVD for Dictionary Learning," 2010] [Heckel et al., "Dimensionality-Reduced Subspace Clustering," 2015] [MN, Rodrigues, Calderbank, "Discrimination on the Grassmann Manifold," 2015]
4-8 Motivation: Kronecker-structured models. Matrix-valued subspace model: $Y = A X B^T + Z$, with constituent dictionaries for the rows and for the columns. This accounts for multidimensional data structure and is useful for video inpainting and tomographic image reconstruction. [Figure: video data organized along space and time axes.] K-S models give an even more compact representation and admit efficient dictionary learning methods. What is the classification performance?
[Tsiligkaridis and Hero, "Covariance Estimation via Kronecker Product Expansion," 2013] [Hawe et al., "Separable Dictionary Learning," 2013] [Shakeri et al., "Minimax Lower Bounds for Kronecker-Structured Dictionary Learning," 2016]
9-10 Contributions. Information-theoretic analysis of the classification performance of K-S models. Performance metric: the diversity order, i.e., how fast the classification error rate decays as the SNR approaches infinity. Main results: an exact derivation of the diversity order, and a characterization of the diversity gap relative to unstructured subspace models. Not in this talk: the (classification) capacity, i.e., how many K-S subspaces one can discern in the limit of large signal dimension.
[MN, Rodrigues, Calderbank, "Discrimination on the Grassmann Manifold," 2015]
11 Problem Definition: Signal model. The signal belongs to one of $L$ classes. Signal model for class $l$, $1 \le l \le L$:
$Y = A_l X B_l^T + Z$,
where $Y \in \mathbb{R}^{m_1 \times m_2}$ is the signal of interest, $A_l \in \mathbb{R}^{m_1 \times n_1}$ is the column subspace basis, $B_l \in \mathbb{R}^{m_2 \times n_2}$ is the row subspace basis, $X \in \mathbb{R}^{n_1 \times n_2}$ contains the coefficients, and $Z \in \mathbb{R}^{m_1 \times m_2}$ is the model noise, with $n_1 < m_1$ and $n_2 < m_2$. Signal columns live near an $n_1$-dimensional subspace; signal rows live near an $n_2$-dimensional subspace.
12 Problem Definition: Signal model. It is useful to work with the vectorized signal model:
$y = (B_l \otimes A_l)x + z$,
where $y = \mathrm{vec}(Y) \in \mathbb{R}^M$, $x = \mathrm{vec}(X) \in \mathbb{R}^N$, $z = \mathrm{vec}(Z) \in \mathbb{R}^M$, with $M = m_1 m_2$ and $N = n_1 n_2$. The signal lives near an $n_1 n_2$-dimensional subspace with a Kronecker-structured dictionary, so classification is over a restricted set of subspace models. [Figure: an $m_1 \times m_2$ signal expressed as a sum of $n_1 n_2$ basis elements.]
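A minimal NumPy sketch (the dimensions are illustrative assumptions, not from the talk) verifying the vectorization identity $\mathrm{vec}(A X B^T) = (B \otimes A)\,\mathrm{vec}(X)$ that underlies this step:

```python
import numpy as np

rng = np.random.default_rng(0)
m1, m2, n1, n2 = 6, 5, 3, 2              # illustrative dimensions
A = rng.standard_normal((m1, n1))        # column-subspace basis
B = rng.standard_normal((m2, n2))        # row-subspace basis
X = rng.standard_normal((n1, n2))        # coefficient matrix

Y = A @ X @ B.T                          # matrix-valued signal (noise-free)
# vec() stacks columns, i.e. column-major ("F") order
y = np.kron(B, A) @ X.flatten(order="F")
assert np.allclose(Y.flatten(order="F"), y)
```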
13-14 Problem Definition: Classification model. Let the coefficients and the i.i.d. noise be Gaussian: $x \sim \mathcal{N}(0, I)$, $z \sim \mathcal{N}(0, \sigma^2 I)$. The noise variance quantifies the deviation from the K-S model. The class-conditional distribution is Gaussian:
$p(y \mid A_l, B_l) = \mathcal{N}(0, (B_l \otimes A_l)(B_l \otimes A_l)^T + \sigma^2 I)$.
This is equivalent to classification of a Gaussian mixture model with matrix normal components. The classifier estimates the class label $\hat{l}$ from $y$ (e.g., by the ML rule). We want to understand the probability of error:
$P_e = \frac{1}{L} \sum_{l=1}^{L} \Pr(\hat{l} \ne l \mid y \sim p(y \mid A_l, B_l))$.
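This setup is easy to simulate; a Monte Carlo sketch (dimensions, random bases, and trial count are illustrative assumptions) that draws signals from a random class and applies the ML rule under the Gaussian class-conditionals to estimate $P_e$:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
m1, m2, n1, n2, L, sigma2 = 6, 5, 3, 2, 2, 1e-2
M = m1 * m2

# random K-S dictionaries D_l = B_l (x) A_l and class covariances D_l D_l^T + sigma^2 I
Ds = [np.kron(rng.standard_normal((m2, n2)), rng.standard_normal((m1, n1)))
      for _ in range(L)]
covs = [D @ D.T + sigma2 * np.eye(M) for D in Ds]

trials, errors = 2000, 0
for _ in range(trials):
    l = rng.integers(L)                               # true class (uniform prior)
    y = Ds[l] @ rng.standard_normal(n1 * n2) \
        + np.sqrt(sigma2) * rng.standard_normal(M)    # y = D_l x + z
    # ML rule: pick the class with the largest Gaussian log-likelihood
    ll = [multivariate_normal.logpdf(y, mean=np.zeros(M), cov=C) for C in covs]
    errors += int(np.argmax(ll) != l)
print("estimated P_e:", errors / trials)
```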
15-16 Diversity Order. How fast does the error probability decay in the model noise? Fix $L$ and the bases $A_l, B_l$ for each class. Definition: the diversity order is the exponent
$d = \lim_{\sigma^2 \to 0} \frac{-\log P_e}{\frac{1}{2}\log(1/\sigma^2)}$.
The diversity order isolates the impact of the subspace model on performance. For standard subspace models ($m_2 = n_2 = 1$): $d = \min\{n_1, m_1 - n_1\}$. The conjectured equivalent for K-S subspaces: $d = \min\{n_1 n_2, m_1 m_2 - n_1 n_2\}$.
[MN, Rodrigues, Calderbank, "Discrimination on the Grassmann Manifold," 2015]
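Operationally, the exponent can be estimated by regressing $-\log P_e$ against $\frac{1}{2}\log(1/\sigma^2)$ over a grid of noise levels; a sketch (the helper `estimate_pe` is hypothetical, e.g. built from the Monte Carlo classifier above, and the finite-grid slope only approximates the $\sigma^2 \to 0$ limit):

```python
import numpy as np

def empirical_diversity(estimate_pe, sigma2_grid):
    """Slope of -log P_e versus (1/2) log(1/sigma^2) over a noise grid."""
    x = 0.5 * np.log(1.0 / np.asarray(sigma2_grid))
    y = -np.log([estimate_pe(s2) for s2 in sigma2_grid])
    slope, _ = np.polyfit(x, y, 1)       # d is the slope as sigma^2 -> 0
    return slope

# usage: d_hat = empirical_diversity(estimate_pe, np.logspace(-2, -5, 6))
```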
17-18 Bhattacharyya Bound. Main analytical tool: the Bhattacharyya bound. Lemma (Bhattacharyya bound): for two zero-mean Gaussian classes with covariances $\Sigma_1$ and $\Sigma_2$ and equal priors, the pairwise maximum-likelihood classification error is bounded by
$P_e \le \frac{1}{2}\left(\frac{\left|\frac{\Sigma_1 + \Sigma_2}{2}\right|}{|\Sigma_1|^{1/2}|\Sigma_2|^{1/2}}\right)^{-1/2}$.
The bound is tight w.r.t. the exponent of the pairwise error as $P_e \to 0$, and the union bound is exponentially tight, too. The diversity order is governed by the worst pairwise exponent of this bound.
[Kailath, "The Divergence and Bhattacharyya Distance Measures in Signal Selection," 1967]
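The lemma translates directly into a few lines of NumPy; a sketch (using log-determinants for numerical stability, an implementation choice rather than anything specified in the talk):

```python
import numpy as np

def bhattacharyya_bound(S1, S2):
    """P_e <= (1/2) * ( |(S1+S2)/2| / (|S1|^(1/2) |S2|^(1/2)) )^(-1/2)
    for two zero-mean Gaussians with covariances S1, S2 and equal priors."""
    _, ld_avg = np.linalg.slogdet(0.5 * (S1 + S2))
    _, ld1 = np.linalg.slogdet(S1)
    _, ld2 = np.linalg.slogdet(S2)
    return 0.5 * np.exp(-0.5 * (ld_avg - 0.5 * (ld1 + ld2)))
```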
19-20 Diversity Order: Bhattacharyya bound. Recall the class-conditional distributions
$p(y \mid A_l, B_l) = \mathcal{N}(0, (B_l \otimes A_l)(B_l \otimes A_l)^T + \sigma^2 I)$,
and define a Kronecker-structured dictionary for each class: $D_l = B_l \otimes A_l \in \mathbb{R}^{m_1 m_2 \times n_1 n_2}$. The pairwise error probability between classes $i$ and $j$ is bounded:
$P_e(D_i, D_j) \le \frac{1}{2}\left(\frac{\left|\frac{D_i D_i^T + D_j D_j^T + 2\sigma^2 I}{2}\right|}{|D_i D_i^T + \sigma^2 I|^{1/2}\,|D_j D_j^T + \sigma^2 I|^{1/2}}\right)^{-1/2}$.
Simplifying, we obtain, for almost every class pair,
$P_e(D_i, D_j) \le 2^{n_1 n_2}\,\lambda_{ij, r_{ij}}^{-r_{ij}/2}\,\sigma^{r_{ij} - n_1 n_2}$,
where $r_{ij}$ is the rank of $D_i D_i^T + D_j D_j^T$ and $\lambda_{ij, r_{ij}}$ is its smallest nonzero eigenvalue.
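Combining the bound with these rank quantities gives a quick numerical check of the predicted decay; a sketch reusing the `bhattacharyya_bound` helper from the sketch above (dimensions and random dictionaries are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
m1, m2, n1, n2 = 6, 5, 3, 2
M = m1 * m2
Di = np.kron(rng.standard_normal((m2, n2)), rng.standard_normal((m1, n1)))
Dj = np.kron(rng.standard_normal((m2, n2)), rng.standard_normal((m1, n1)))

r = np.linalg.matrix_rank(Di @ Di.T + Dj @ Dj.T)   # r_ij; generically 2*n1*n2 here
print("predicted diversity r - n1*n2 =", r - n1 * n2)

for sigma2 in (1e-2, 1e-3, 1e-4):
    pe = bhattacharyya_bound(Di @ Di.T + sigma2 * np.eye(M),
                             Dj @ Dj.T + sigma2 * np.eye(M))
    print(sigma2, pe)   # log(pe) should fall roughly as (r - n1*n2) * log(sigma)
```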
21-22 Rank Analysis. Theorem: for any K-S classification problem, the diversity order is
$d = r - n_1 n_2$, where $r = \min_{i \ne j} \mathrm{rank}(D_i D_i^T + D_j D_j^T)$.
Almost every dictionary $D_l$ has rank $n_1 n_2$. For positive semidefinite matrices,
$\mathrm{rank}(D_1 + D_2) = \mathrm{rank}(D_1) + \mathrm{rank}(D_2) - \dim(R(D_1) \cap R(D_2))$.
23-24 Rank of the Sum of K-S Dictionaries. Lemma: let the intersections of the column and row range spaces be
$\dim[R(A_i) \cap R(A_j)] = x$ and $\dim[R(B_i) \cap R(B_j)] = y$. Then
$\dim[R(B_i \otimes A_i) \cap R(B_j \otimes A_j)] = xy$.
Intuition: if the row or column subspaces are disjoint, the entire K-S subspaces are disjoint. Corollary: for almost every pair of K-S dictionaries,
$\mathrm{rank}(D_i D_i^T + D_j D_j^T) = 2 n_1 n_2 - [2 n_1 - m_1]^+ [2 n_2 - m_2]^+$.
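The lemma can be spot-checked numerically via the identity $\dim[R(U) \cap R(V)] = \mathrm{rank}(U) + \mathrm{rank}(V) - \mathrm{rank}([U\ V])$; a sketch with randomly drawn (hence generic) bases and dimensions chosen, as an assumption, so that both intersections are nontrivial:

```python
import numpy as np

def intersection_dim(U, V):
    """dim[R(U) ∩ R(V)] = rank(U) + rank(V) - rank([U V])."""
    r = np.linalg.matrix_rank
    return r(U) + r(V) - r(np.hstack([U, V]))

rng = np.random.default_rng(2)
m1, m2, n1, n2 = 5, 4, 3, 3           # chosen so both intersections are nonempty
Ai, Aj = rng.standard_normal((m1, n1)), rng.standard_normal((m1, n1))
Bi, Bj = rng.standard_normal((m2, n2)), rng.standard_normal((m2, n2))

x = intersection_dim(Ai, Aj)          # generically [2*n1 - m1]^+ = 1
y = intersection_dim(Bi, Bj)          # generically [2*n2 - m2]^+ = 2
xy = intersection_dim(np.kron(Bi, Ai), np.kron(Bj, Aj))
print(x, y, xy)                       # the lemma predicts xy == x * y
```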
25 Main Result: Diversity Order. Corollary: for almost any K-S classification problem, the diversity order is
$d = n_1 n_2 - [2 n_1 - m_1]^+ [2 n_2 - m_2]^+$.
How does this compare to the unstructured conjecture $d = \min\{n_1 n_2, m_1 m_2 - n_1 n_2\} = n_1 n_2 - [2 n_1 n_2 - m_1 m_2]^+$? When the subspace dimensions are small, the diversity is the same; for larger dimensions, the K-S diversity order is smaller than that of unstructured subspaces. The gap is bilinear in the dimensions, i.e., quadratic in the overall dimension.
[MN, Rodrigues, Calderbank, "Discrimination on the Grassmann Manifold," 2015]
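A concrete instance of the comparison (the dimensions are illustrative only):

```python
def d_ks(m1, m2, n1, n2):             # this talk's K-S diversity order
    return n1 * n2 - max(2 * n1 - m1, 0) * max(2 * n2 - m2, 0)

def d_unstructured(m1, m2, n1, n2):   # conjectured unstructured diversity order
    return n1 * n2 - max(2 * n1 * n2 - m1 * m2, 0)

print(d_ks(5, 5, 3, 3), d_unstructured(5, 5, 3, 3))   # 8 vs 9: gap of 1
print(d_ks(4, 4, 3, 3), d_unstructured(4, 4, 3, 3))   # 5 vs 7: gap of 2
```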
26 Numerical Results: Synthetic data. Synthetic data with $L = 2$ K-S bases drawn randomly; the ML classification performance is plotted vs. SNR, which validates the diversity results numerically.
27 Numerical Results: YaleB. Learned K-S dictionaries from 10 samples per class of the YaleB face dataset, compared against standard subspace classification and classification from SIFT features. K-S gives a much more compact representation with competitive classification accuracy. [Table: test accuracy (%) and number of parameters on YaleB faces for Subspaces, SIFT, and K-S.] [Figure: learned K-S dictionary elements.]
[Ramirez et al., "Classification and Clustering via Dictionary Learning," 2010]
28 Conclusion. Summary: studied the classification performance of Kronecker-structured subspace models, derived the exact diversity order, and found that the error rate is higher than that of unstructured subspace models. Future work: discriminative K-S dictionary/subspace learning, good feature extraction for K-S data, and higher-order tensor models.
More information