On the Spectrum of Random Features Maps of High Dimensional Data


1 On the Spectrum of Random Features Maps of High Dimensional Data. ICML 2018, Stockholm, Sweden. Zhenyu Liao, Romain Couillet. L2S, CentraleSupélec, Université Paris-Saclay, France; GSTATS IDEX DataScience Chair, GIPSA-lab, Université Grenoble-Alpes, France.

2 Outline: 1. Problem Statement 2. Main Results 3. Summary

3 Problem Setup
Random projection/random feature maps for feature extraction: data vectors X = [x_1, ..., x_T] in R^{p x T}, a random matrix W in R^{n x p} and an entry-wise nonlinearity sigma(.) produce the feature vectors Sigma = sigma(WX) in R^{n x T}.
Figure: illustration of random feature maps.
Objective: the Gram matrix of random features G = (1/n) Sigma^T Sigma (sample covariance matrix in feature space):
what kind of data information is extracted?
what is the impact of different nonlinearities?
how to perform clustering with G, and what do its eigenvectors look like?
With RMT: for large n, p, T, the eigenspectrum of G is determined only by
1. the average kernel matrix Phi_{i,j} = E_w[G_{i,j}] = E_w[ sigma(w^T x_i) sigma(w^T x_j) ] (a function of X),
2. the ratios between n, p, T. [1]
[1] Cosme Louart, Zhenyu Liao, and Romain Couillet. A Random Matrix Approach to Neural Networks. The Annals of Applied Probability 28, no. 2 (2018).
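To make these objects concrete, here is a minimal NumPy sketch (not the authors' code; the dimensions, the placeholder data and the ReLU choice are illustrative assumptions) that builds the random features Sigma = sigma(WX), the Gram matrix G, and a Monte Carlo estimate of Phi = E_w[G]:

```python
import numpy as np

rng = np.random.default_rng(0)
p, T, n = 256, 128, 512                       # data dimension, sample size, feature dimension

X = rng.standard_normal((p, T)) / np.sqrt(p)  # placeholder data matrix with T columns in R^p
sigma = lambda t: np.maximum(t, 0.0)          # ReLU, one of the nonlinearities studied here

W = rng.standard_normal((n, p))               # standard Gaussian random projection
Sigma = sigma(W @ X)                          # random feature matrix, n x T
G = Sigma.T @ Sigma / n                       # Gram matrix of the random features, T x T

# Monte Carlo estimate of the averaged kernel Phi_ij = E_w[sigma(w^T x_i) sigma(w^T x_j)],
# obtained by averaging G over independent draws of W.
Phi = np.zeros((T, T))
n_draws = 100
for _ in range(n_draws):
    S = sigma(rng.standard_normal((n, p)) @ X)
    Phi += S.T @ S / n
Phi /= n_draws

# The slide's claim: for n, p, T large, the eigenspectrum of G is governed by Phi
# together with the ratios between n, p and T.
print(np.linalg.eigvalsh(G)[-3:])
print(np.linalg.eigvalsh(Phi)[-3:])
```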

4 Some Known Facts
Objective: spectral characterization of Phi, with Phi_{i,j} = E_w[ sigma(w^T x_i) sigma(w^T x_j) ]. For standard Gaussian W: integral calculus on R^p.
Table: Phi_{i,j} for commonly used sigma(.), with angle = x_i^T x_j / (||x_i|| ||x_j||).
sigma(t) = t : x_i^T x_j
sigma(t) = max(t, 0) : (1/2pi) ||x_i|| ||x_j|| ( angle arccos(-angle) + sqrt(1 - angle^2) )
sigma(t) = |t| : (2/pi) ||x_i|| ||x_j|| ( angle arcsin(angle) + sqrt(1 - angle^2) )
sigma(t) = ς_+ max(t, 0) + ς_- max(-t, 0) : (1/4)(ς_+ - ς_-)^2 x_i^T x_j + ((ς_+ + ς_-)^2 / 2pi) ||x_i|| ||x_j|| ( angle arcsin(angle) + sqrt(1 - angle^2) )
sigma(t) = 1_{t>0} : 1/2 - (1/2pi) arccos(angle)
sigma(t) = sign(t) : (2/pi) arcsin(angle)
sigma(t) = ς_2 t^2 + ς_1 t + ς_0 : ς_2^2 ( 2 (x_i^T x_j)^2 + ||x_i||^2 ||x_j||^2 ) + ς_1^2 x_i^T x_j + ς_2 ς_0 ( ||x_i||^2 + ||x_j||^2 ) + ς_0^2
sigma(t) = cos(t) : exp( -(||x_i||^2 + ||x_j||^2)/2 ) cosh(x_i^T x_j)
sigma(t) = sin(t) : exp( -(||x_i||^2 + ||x_j||^2)/2 ) sinh(x_i^T x_j)
sigma(t) = erf(t) : (2/pi) arcsin( 2 x_i^T x_j / sqrt( (1 + 2||x_i||^2)(1 + 2||x_j||^2) ) )
sigma(t) = exp(-t^2/2) : 1 / sqrt( (1 + ||x_i||^2)(1 + ||x_j||^2) - (x_i^T x_j)^2 )
These are (still) highly nonlinear functions of the data x!
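As a sanity check, any row of this table can be verified numerically; the sketch below (an illustration with arbitrary vectors, not taken from the slides) compares the ReLU closed form with a Monte Carlo average over w ~ N(0, I_p):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 100
x_i = rng.standard_normal(p) / np.sqrt(p)
x_j = rng.standard_normal(p) / np.sqrt(p)

# Closed form for sigma(t) = max(t, 0), with angle = x_i^T x_j / (||x_i|| ||x_j||).
ni, nj = np.linalg.norm(x_i), np.linalg.norm(x_j)
angle = x_i @ x_j / (ni * nj)
closed_form = ni * nj / (2 * np.pi) * (angle * np.arccos(-angle) + np.sqrt(1 - angle ** 2))

# Monte Carlo estimate of E_w[max(w^T x_i, 0) max(w^T x_j, 0)] over w ~ N(0, I_p).
n_draws, chunk, acc = 1_000_000, 50_000, 0.0
for _ in range(n_draws // chunk):
    w = rng.standard_normal((chunk, p))
    acc += np.sum(np.maximum(w @ x_i, 0.0) * np.maximum(w @ x_j, 0.0))
monte_carlo = acc / n_draws

print(closed_form, monte_carlo)   # the two values should agree closely
```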

5 Dig Deeper into the Average Kernel Phi
Data Model: consider data from a K-class Gaussian mixture model, x_i in C_a, x_i = mu_a/sqrt(p) + omega_i, with omega_i ~ N(0, C_a/p), a = 1, ..., K, of statistical mean mu_a and covariance C_a.
Non-trivial Classification (Neyman-Pearson minimal): for p large, we require ||mu_a - mu_b|| = O(1), ||C_a|| = O(1) and tr(C_a - C_b)/sqrt(p) = O(1). As a consequence,
||x_i||^2 = ||omega_i||^2 + ( ||mu_a||^2/p + 2 mu_a^T omega_i / sqrt(p) )
          = tr(C_a)/p + ( ||omega_i||^2 - tr(C_a)/p ) + ( ||mu_a||^2/p + 2 mu_a^T omega_i / sqrt(p) )
with tr(C_a)/p = O(1), ||omega_i||^2 - tr(C_a)/p = O(p^{-1/2}) and the mean terms O(p^{-1}).
If these conditions are relaxed, classification becomes too easy: it would suffice to compare the norms ||x_i|| and ||x_j||!
This in fact reveals a more intrinsic property of high dimensional data. Curse of dimensionality: little difference in Euclidean distance between pairs!
Denote C° = sum_{a=1}^K (T_a/T) C_a and C_a° = C_a - C° for a = 1, ..., K. Then ||x_i||^2 = tau + O(p^{-1/2}) with tau = tr(C°)/p, and
||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i^T x_j ~ 2 tau:
almost constant distance, no matter whether the points come from the same or different classes!
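A short sketch of this data model (with hypothetical means and covariances chosen to satisfy the non-trivial-classification scaling) illustrates the concentration of pairwise distances around 2 tau:

```python
import numpy as np

rng = np.random.default_rng(2)
p, T_a, K = 512, 128, 2

# Hypothetical class statistics: ||mu_1 - mu_2|| = O(1), tr(C_1 - C_2)/sqrt(p) = O(1).
mus = [np.zeros(p), np.concatenate(([2.0, 2.0], np.zeros(p - 2)))]
Cs = [np.eye(p), (1 + 3 / np.sqrt(p)) * np.eye(p)]

X, labels = [], []
for a in range(K):
    # x_i = mu_a / sqrt(p) + omega_i with omega_i ~ N(0, C_a / p)
    Omega = np.linalg.cholesky(Cs[a] / p) @ rng.standard_normal((p, T_a))
    X.append(mus[a][:, None] / np.sqrt(p) + Omega)
    labels += [a] * T_a
X = np.concatenate(X, axis=1)                    # p x T data matrix

C0 = sum(Cs) / K                                 # C° (equal class proportions here)
tau = np.trace(C0) / p

# All pairwise squared distances concentrate around 2 * tau, whatever the classes.
norms = np.sum(X ** 2, axis=0)
D = norms[:, None] + norms[None, :] - 2 * X.T @ X
off_diag = D[~np.eye(D.shape[0], dtype=bool)]
print(2 * tau, off_diag.mean(), off_diag.std())
```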

6 Dig Deeper into the Average Kernel Phi
Why do things still work? The statistical information is hidden in smaller-order terms!
x_i^T x_j = omega_i^T omega_j + mu_a^T mu_b/p + mu_a^T omega_j/sqrt(p) + mu_b^T omega_i/sqrt(p), with omega_i^T omega_j = O(p^{-1/2}) and the mean terms O(p^{-1}), so that
||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i^T x_j ~ 2 tau - 2 ( omega_i^T omega_j + mu_a^T mu_b/p + mu_a^T omega_j/sqrt(p) + mu_b^T omega_i/sqrt(p) ).
Small entry-wise does not mean small in matrix form (in operator norm): repeated across the entries of a large matrix, these terms add up, so spectral clustering works!
Moreover, concentration brings simplifications: for the ReLU nonlinearity,
Phi_{i,j} = (1/2pi) ||x_i|| ||x_j|| ( angle arccos(-angle) + sqrt(1 - angle^2) ), with angle = x_i^T x_j / (||x_i|| ||x_j||).
Concentration: angle ~ 0 and ||x_i|| ||x_j|| ~ tau + information terms (mu_a, C_a)!
Blessing of Dimensionality: high dimensional concentration + Taylor expansion to linearize Phi!
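To make the linearization explicit, here is the expansion of the ReLU entry of the table when ||x_i||^2, ||x_j||^2 ~ tau and angle is small (a sketch; the linear coefficient 1/4 is exactly the d_1 of the ReLU row on the next slide, while the quadratic term combines with the norm fluctuations to produce d_2):

```latex
\Phi_{i,j}
  = \frac{\|x_i\|\|x_j\|}{2\pi}\Big(\angle\arccos(-\angle)+\sqrt{1-\angle^2}\Big)
  = \frac{\|x_i\|\|x_j\|}{2\pi} + \frac{x_i^{\sf T}x_j}{4}
    + \frac{(x_i^{\sf T}x_j)^2}{4\pi\,\|x_i\|\|x_j\|} + O(\angle^3)
  \simeq \frac{\tau}{2\pi} + \frac{x_i^{\sf T}x_j}{4} + \frac{(x_i^{\sf T}x_j)^2}{4\pi\tau}
    + \text{(norm fluctuation terms)}.
```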

7 Main Results
Asymptotic Equivalent of Phi: for all sigma(.) listed in the table above, we have, as n, p, T grow to infinity at the same rate, ||Phi - Phi~|| -> 0 almost surely, with
Phi~ = d_1 ( Omega + M J^T/sqrt(p) )^T ( Omega + M J^T/sqrt(p) ) + d_2 U B U^T + d_0 I_T,
U = [ J/sqrt(p), phi ], B = [ t t^T + 2S , t ; t^T , 1 ].
Table: coefficients d_i in Phi~ for different sigma(.).
sigma(t) = t : d_1 = 1, d_2 = 0
sigma(t) = max(t, 0) : d_1 = 1/4, d_2 = 1/(8 pi tau)
sigma(t) = |t| : d_1 = 0, d_2 = 1/(2 pi tau)
sigma(t) = ς_+ max(t, 0) + ς_- max(-t, 0) : d_1 = (ς_+ - ς_-)^2 / 4, d_2 = (ς_+ + ς_-)^2 / (8 pi tau)
sigma(t) = 1_{t>0} : d_1 = 1/(2 pi tau), d_2 = 0
sigma(t) = sign(t) : d_1 = 2/(pi tau), d_2 = 0
sigma(t) = ς_2 t^2 + ς_1 t + ς_0 : d_1 = ς_1^2, d_2 = ς_2^2
sigma(t) = cos(t) : d_1 = 0, d_2 = e^{-tau}/4
sigma(t) = sin(t) : d_1 = e^{-tau}, d_2 = 0
sigma(t) = erf(t) : d_1 = 4/(pi (2 tau + 1)), d_2 = 0
sigma(t) = exp(-t^2/2) : d_1 = 0, d_2 = 1/(4 (tau + 1)^3)
With J = [j_1, ..., j_K], j_a the canonical vector of class C_a ((j_a)_i = 1 if x_i is in C_a, useful for clustering); Omega, phi: random fluctuations of the data; M = [mu_1, ..., mu_K], t = { tr(C_a°)/sqrt(p) }_{a=1}^K, S = { tr(C_a C_b)/p }_{a,b=1}^K: statistical information from the data distribution.
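The informative part of Phi~ can be assembled directly from these ingredients. The sketch below is an illustration only: the class statistics are hypothetical, phi is taken as the vector of centered squared norms ||omega_i||^2 - tr(C_a)/p (an assumption consistent with "random fluctuations of the data"), and the d_0 I_T term is dropped since it only shifts eigenvalues without changing eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(3)
p, T_a, K = 400, 100, 2
T = K * T_a
tau = 1.0                                        # tr(C°)/p is about 1 for the choices below
d1, d2 = 0.25, 1 / (8 * np.pi * tau)             # ReLU row of the coefficient table

mus = [np.zeros(p), np.concatenate(([2.0], np.zeros(p - 1)))]
Cs = [np.eye(p), (1 + 4 / np.sqrt(p)) * np.eye(p)]

# Random fluctuations Omega = [omega_1, ..., omega_T], class indicators J, means M.
Omega = np.concatenate(
    [np.linalg.cholesky(Cs[a] / p) @ rng.standard_normal((p, T_a)) for a in range(K)], axis=1)
J = np.kron(np.eye(K), np.ones((T_a, 1)))        # (J)_{i,a} = 1 if x_i belongs to class a
M = np.column_stack(mus)                         # p x K matrix of class means

C0 = sum((T_a / T) * Cs[a] for a in range(K))    # C°
t = np.array([np.trace(Cs[a] - C0) / np.sqrt(p) for a in range(K)])
S = np.array([[np.trace(Cs[a] @ Cs[b]) / p for b in range(K)] for a in range(K)])
# Assumed form of phi: centered squared norms of the omega_i.
phi = np.sum(Omega ** 2, axis=0) - J @ np.array([np.trace(Cs[a]) / p for a in range(K)])

A = Omega + M @ J.T / np.sqrt(p)
U = np.column_stack([J / np.sqrt(p), phi])
B = np.block([[np.outer(t, t) + 2 * S, t[:, None]],
              [t[None, :], np.ones((1, 1))]])

Phi_tilde = d1 * A.T @ A + d2 * U @ B @ U.T      # informative part of Phi~
print(np.linalg.eigvalsh(Phi_tilde)[-4:])        # isolated eigenvalues carry the class structure
```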

8 Consequence
(Coefficients d_1, d_2: see the table on the previous slide.)
A natural classification of sigma(.):
mean-oriented (d_1 != 0, d_2 = 0): t, 1_{t>0}, sign(t), sin(t) and erf(t) separate classes through the difference in means M;
covariance-oriented (d_1 = 0, d_2 != 0): |t|, cos(t) and exp(-t^2/2) track differences in covariances through t, S;
balanced (both d_1, d_2 != 0): the ReLU function max(t, 0), the Leaky ReLU function ς_+ max(t, 0) + ς_- max(-t, 0) and the quadratic function ς_2 t^2 + ς_1 t + ς_0 make use of both statistics.
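A small helper (a sketch; the formulas are transcribed from the coefficient table above and the short names are mine) makes this three-way classification explicit:

```python
import numpy as np

def d_coefficients(name, tau, zeta=(1.0, 0.0), coeffs=(1.0, 1.0, 0.0)):
    """Return (d_1, d_2) for a given nonlinearity and the resulting attribute."""
    zp, zm = zeta              # Leaky ReLU slopes (zeta_+, zeta_-)
    c2, c1, c0 = coeffs        # quadratic sigma(t) = c2 t^2 + c1 t + c0
    table = {
        "t":     (1.0, 0.0),
        "relu":  (0.25, 1 / (8 * np.pi * tau)),
        "abs":   (0.0, 1 / (2 * np.pi * tau)),
        "lrelu": ((zp - zm) ** 2 / 4, (zp + zm) ** 2 / (8 * np.pi * tau)),
        "step":  (1 / (2 * np.pi * tau), 0.0),
        "sign":  (2 / (np.pi * tau), 0.0),
        "quad":  (c1 ** 2, c2 ** 2),
        "cos":   (0.0, np.exp(-tau) / 4),
        "sin":   (np.exp(-tau), 0.0),
        "erf":   (4 / (np.pi * (2 * tau + 1)), 0.0),
        "gauss": (0.0, 1 / (4 * (tau + 1) ** 3)),
    }
    d1, d2 = table[name]
    kind = "balanced" if d1 and d2 else "mean-oriented" if d1 else "covariance-oriented"
    return d1, d2, kind

for name in ["sin", "cos", "relu"]:
    print(name, d_coefficients(name, tau=1.0))
```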

9 Numerical Validations: Gaussian Data
Example: Gaussian mixture data of four classes, N(mu_1, C_1), N(mu_1, C_2), N(mu_2, C_1) and N(mu_2, C_2), with the Leaky ReLU function ς_+ max(t, 0) + ς_- max(-t, 0).
Case 1: ς_+ = -ς_- = 1 (equivalent to the linear map sigma(t) = t); as expected for a mean-oriented map, only the two mean groups are separated.
Case 2: ς_+ = ς_- = 1 (equivalent to sigma(t) = |t|); as expected for a covariance-oriented map, only the two covariance groups are separated.
[Figure: the two leading eigenvectors of G, plotted per class C_1 to C_4, for Cases 1 and 2.]
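A runnable sketch of this four-class experiment (class statistics, sample sizes and the use of scikit-learn's k-means are illustrative choices, not the slides' exact setup); switching (ς_+, ς_-) between the three cases reproduces the qualitative behavior described here and on the next slide:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
p, n, T_a = 256, 1024, 64

mu = [np.concatenate(([4.0], np.zeros(p - 1))), np.concatenate(([0.0, 4.0], np.zeros(p - 2)))]
C = [np.eye(p), (1 + 6 / np.sqrt(p)) * np.eye(p)]
classes = [(0, 0), (0, 1), (1, 0), (1, 1)]       # (mean index, covariance index) per class

X, y = [], []
for k, (a, b) in enumerate(classes):
    X.append(mu[a][:, None] / np.sqrt(p)
             + np.linalg.cholesky(C[b] / p) @ rng.standard_normal((p, T_a)))
    y += [k] * T_a
X, y = np.concatenate(X, axis=1), np.array(y)    # p x T data matrix and T labels

zeta_plus, zeta_minus = 1.0, -1.0                # Case 1; try (1, 1) for Case 2, (1, 0) for Case 3
sigma = lambda t: zeta_plus * np.maximum(t, 0) + zeta_minus * np.maximum(-t, 0)

W = rng.standard_normal((n, p))
Sigma = sigma(W @ X)
G = Sigma.T @ Sigma / n

_, vecs = np.linalg.eigh(G)
features = vecs[:, -4:]                          # a few leading eigenvectors of G
pred = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)
# Class-by-cluster contingency table: block structure reveals which classes are separated.
print(np.vstack([np.bincount(pred[y == k], minlength=4) for k in range(4)]))
```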

10 Numerical Validations: Gaussian Data
Case 3: ς_+ = 1, ς_- = 0 (the ReLU function); being balanced, it exploits both means and covariances, and the two leading eigenvectors jointly separate all four classes.
[Figure: the two leading eigenvectors of G per class C_1 to C_4, and the 2D scatter plot of eigenvector 2 against eigenvector 1.]

11 Numerical Validations: Real Datasets
[Figure: the MNIST image database.] [Figure: the epileptic EEG datasets (time series).]
Reproducibility: codes available at

12 Numerical Validations: Real Datasets
Table: empirical estimation of the differences in means (via M^T M) and in covariances (via t t^T + 2S) for the MNIST and epileptic EEG datasets.
Table: clustering accuracies on the MNIST dataset (T = 64 / T = 128).
mean-oriented: t 88.94% / 87.30%; 1_{t>0} 8.94% / 85.56%; sign(t) 83.34% / 85.%; sin(t) 87.81% / 87.50%; erf(t) 87.8% / 86.59%
covariance-oriented: |t| 60.41% / 57.81%; cos(t) 59.56% / 57.7%; exp(-t^2/2) 60.44% / 58.67%
balanced: ReLU(t) 85.7% / 8.7%
Table: clustering accuracies on the EEG dataset (T = 64 / T = 128).
mean-oriented: t 70.31% / 69.58%; 1_{t>0} % / 63.47%; sign(t) 64.63% / 63.03%; sin(t) 70.34% / 68.%; erf(t) 70.59% / 67.70%
covariance-oriented: |t| 99.69% / 99.50%; cos(t) 99.38% / 99.36%; exp(-t^2/2) 99.81% / 99.77%
balanced: ReLU(t) 87.91% / 90.97%
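For completeness, here is a sketch of how such clustering accuracies can be computed once cluster labels are obtained (for instance from the k-means step of the earlier sketch): the predicted clusters are matched to the ground-truth classes by an optimal assignment on the confusion matrix.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Fraction of correctly clustered samples under the best cluster-to-class matching."""
    K = int(max(y_true.max(), y_pred.max())) + 1
    confusion = np.zeros((K, K), dtype=int)
    for t, c in zip(y_true, y_pred):
        confusion[t, c] += 1
    rows, cols = linear_sum_assignment(-confusion)   # maximize the matched counts
    return confusion[rows, cols].sum() / len(y_true)

# Toy example: a perfect clustering up to a relabelling of the clusters.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([2, 2, 0, 0, 1, 1])
print(clustering_accuracy(y_true, y_pred))           # -> 1.0
```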

13 Numerical Validations: Real Datasets
Figure: leading eigenvector of Phi for the MNIST (top) and EEG (bottom) datasets, for classes C_1 and C_2: simulation (mean with a width of one standard deviation) against the theory for Gaussian mixture data with the same statistics.

14 Summary
Take-away messages:
concentration of high dimensional data helps handle the nonlinearity;
different nonlinearities fall into three attributes: mean-oriented, covariance-oriented and balanced;
the choice of nonlinearity can be optimized as a function of the data (quadratic and LReLU);
novel insight into the understanding of neural networks for high dimensional data.
Future work:
study of the eigenvalue distribution;
the (asymptotic) behavior of the leading eigenvectors;
combination of different types of nonlinearities, e.g., sin + cos gives the Gaussian kernel;
directly linking sigma(.) to the coefficients d_0, d_1 and d_2.

15 Thank you! Poster #6
