On the Spectrum of Random Features Maps of High Dimensional Data
1 On the Spectrum of Random Features Maps of High Dimensional Data
ICML 2018, Stockholm, Sweden
Zhenyu Liao, Romain Couillet
L2S, CentraleSupélec, Université Paris-Saclay, France
GSTATS IDEX DataScience Chair, GIPSA-lab, Université Grenoble-Alpes, France
Z. Liao, R. Couillet (CentraleSupélec & UG-A), On the Spectrum of RFM of High Dimensional Data, ICML 2018, Stockholm, Sweden, 1 / 18
2 Outline
1. Problem Statement
2. Main Results
3. Summary
3 Problem Setup
Random projection / random feature maps for feature extraction:
data vectors X = [x_1, ..., x_T] ∈ R^{p×T}; random matrix W ∈ R^{n×p}; σ(·) applied entry-wise; feature vectors Σ = σ(WX) ∈ R^{n×T}.
(Figure: illustration of random feature maps.)
Objective: study the Gram matrix of random features G ≡ (1/n) Σ^T Σ (the sample covariance matrix in feature space):
- what kind of data information is extracted?
- what is the impact of different nonlinearities?
- how can clustering be performed with G, and what do its eigenvectors look like?
With random matrix theory (RMT): for large n, p, T, the eigenspectrum of G is determined only by¹
- the average kernel matrix Φ, with Φ_ij ≡ E_w[G_ij] = E_w[σ(w^T x_i) σ(w^T x_j)] (a function of X), and
- the ratios between n, p and T.
¹ Cosme Louart, Zhenyu Liao, and Romain Couillet. "A Random Matrix Approach to Neural Networks." The Annals of Applied Probability 28(2), 2018.
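The setup above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the dimensions p, T, n and the choice of ReLU as the entry-wise nonlinearity are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p, T, n = 64, 128, 256   # data dimension, number of samples, number of random features

X = rng.standard_normal((p, T))      # data matrix X = [x_1, ..., x_T] in R^{p x T}
W = rng.standard_normal((n, p))      # random projection matrix W in R^{n x p}
Sigma = np.maximum(W @ X, 0.0)       # entry-wise nonlinearity (here ReLU): Sigma = sigma(WX)
G = Sigma.T @ Sigma / n              # Gram matrix of random features, in R^{T x T}

# G is symmetric positive semi-definite, like any sample covariance matrix
eigvals = np.linalg.eigvalsh(G)
```

Its eigenvalues and eigenvectors are precisely the objects whose large-n, p, T behavior the talk characterizes.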
4 Some Known Facts
Objective: spectral characterization of Φ, with Φ_ij = E_w[σ(w^T x_i) σ(w^T x_j)].
For standard Gaussian W, this is an integral over R^p.
Table: Φ_ij for commonly used σ(·), with ∠ ≡ x_i^T x_j / (‖x_i‖ ‖x_j‖).
- σ(t) = t: Φ_ij = x_i^T x_j
- σ(t) = max(t, 0): Φ_ij = (1/2π) ‖x_i‖ ‖x_j‖ (∠ arccos(−∠) + √(1 − ∠²))
- σ(t) = |t|: Φ_ij = (2/π) ‖x_i‖ ‖x_j‖ (∠ arcsin(∠) + √(1 − ∠²))
- σ(t) = ς₊ max(t, 0) + ς₋ max(−t, 0): Φ_ij = ((ς₊ − ς₋)²/4) x_i^T x_j + ((ς₊ + ς₋)²/4) (2/π) ‖x_i‖ ‖x_j‖ (∠ arcsin(∠) + √(1 − ∠²))
- σ(t) = 1_{t>0}: Φ_ij = 1/2 − (1/2π) arccos(∠)
- σ(t) = sign(t): Φ_ij = (2/π) arcsin(∠)
- σ(t) = ς₂ t² + ς₁ t + ς₀: Φ_ij = ς₂² (2 (x_i^T x_j)² + ‖x_i‖² ‖x_j‖²) + ς₁² x_i^T x_j + ς₂ ς₀ (‖x_i‖² + ‖x_j‖²) + ς₀²
- σ(t) = cos(t): Φ_ij = exp(−(‖x_i‖² + ‖x_j‖²)/2) cosh(x_i^T x_j)
- σ(t) = sin(t): Φ_ij = exp(−(‖x_i‖² + ‖x_j‖²)/2) sinh(x_i^T x_j)
- σ(t) = erf(t): Φ_ij = (2/π) arcsin( 2 x_i^T x_j / √((1 + 2‖x_i‖²)(1 + 2‖x_j‖²)) )
- σ(t) = exp(−t²/2): Φ_ij = 1 / √((1 + ‖x_i‖²)(1 + ‖x_j‖²) − (x_i^T x_j)²)
These are (still) highly nonlinear functions of the data X!
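Closed forms of this kind can be checked by Monte Carlo integration over w ~ N(0, I_p). A sketch for the ReLU entry, where the sample count and the two test vectors are arbitrary choices made for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([1.0, 0.5, -0.3, 0.2])
y = np.array([-0.2, 1.0, 0.4, 0.1])

# Monte Carlo estimate of Phi = E_w[ max(w^T x, 0) max(w^T y, 0) ], w ~ N(0, I_4)
W = rng.standard_normal((500_000, 4))
phi_mc = np.mean(np.maximum(W @ x, 0) * np.maximum(W @ y, 0))

# closed form: (1/2pi) ||x|| ||y|| ( ang * arccos(-ang) + sqrt(1 - ang^2) ),
# with ang the cosine of the angle between x and y
nx, ny = np.linalg.norm(x), np.linalg.norm(y)
ang = x @ y / (nx * ny)
phi_cf = nx * ny / (2 * np.pi) * (ang * np.arccos(-ang) + np.sqrt(1 - ang ** 2))
```

With half a million samples, the two values agree to a few decimal places.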
5 Dig Deeper into the Average Kernel Φ
Data model: consider data drawn from a K-class Gaussian mixture model:
x_i ∈ C_a ⇔ x_i = µ_a/√p + ω_i, with ω_i ~ N(0, C_a/p), a = 1, ..., K,
where class C_a has statistical mean µ_a and covariance C_a.
Non-trivial classification regime: for p large, ‖µ_a − µ_b‖ = O(1), ‖C_a‖ = O(1) and tr(C_a − C_b)/√p = O(1).
As a consequence,
‖x_i‖² = ‖ω_i‖² [O(1)] + ‖µ_a‖²/p + 2 µ_a^T ω_i/√p [O(p⁻¹)]
       = tr(C_a)/p [O(1)] + (‖ω_i‖² − tr(C_a)/p) [O(p^{−1/2})] + ‖µ_a‖²/p + 2 µ_a^T ω_i/√p [O(p⁻¹)].
If these rates are relaxed, classification becomes too easy: it would suffice to compare the norms ‖x_i‖ and ‖x_j‖!
This in fact reveals a more intrinsic property of high dimensional data:
Curse of dimensionality: little difference in Euclidean distance between pairs!
Denote C° ≡ Σ_{a=1}^K (T_a/T) C_a and C°_a ≡ C_a − C° for a = 1, ..., K. Then ‖x_i‖² = τ + O(p^{−1/2}) with τ ≡ tr(C°)/p, and
‖x_i − x_j‖² = ‖x_i‖² + ‖x_j‖² − 2 x_i^T x_j ≈ 2τ:
almost constant distance, whether x_i and x_j come from the same class or from different ones!
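This distance-concentration phenomenon is easy to reproduce numerically. Below is a small sketch under the growth rates above, with two classes and scalar covariances; all constants (p, T, the mean separation, the trace gap) are illustrative choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(2)
p, T = 4096, 64                        # large dimension, moderate sample count

mu1 = np.zeros(p)
mu2 = np.zeros(p); mu2[0] = 4.0        # ||mu1 - mu2|| = 4 = O(1)
c1 = 1.0
c2 = 1.0 + 3.0 / np.sqrt(p)            # tr(C1 - C2)/sqrt(p) = -3 = O(1)

# x_i = mu_a/sqrt(p) + omega_i, omega_i ~ N(0, c_a/p * I)
X1 = mu1[:, None] / np.sqrt(p) + np.sqrt(c1 / p) * rng.standard_normal((p, T // 2))
X2 = mu2[:, None] / np.sqrt(p) + np.sqrt(c2 / p) * rng.standard_normal((p, T // 2))
X = np.concatenate([X1, X2], axis=1)

# all squared pairwise distances concentrate around 2*tau, tau = tr(C°)/p
tau = (c1 + c2) / 2
n2 = (X ** 2).sum(axis=0)
sq = n2[:, None] + n2[None, :] - 2.0 * (X.T @ X)   # squared distances
off = sq[~np.eye(T, dtype=bool)]                   # off-diagonal entries only
rel_spread = np.abs(off / (2 * tau) - 1).max()
```

Despite the two classes differing in both mean and covariance, every pairwise distance sits within a few percent of 2τ.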
6 Dig Deeper into the Average Kernel Φ
Why does classification still work? The statistical information is hidden in smaller-order terms!
‖x_i − x_j‖² = ‖x_i‖² + ‖x_j‖² − 2 x_i^T x_j = 2τ + {−2 ω_i^T ω_j and norm fluctuations} [O(p^{−1/2})] + {µ_a^T µ_b/p, µ_a^T ω_j/√p, µ_b^T ω_i/√p cross terms} [O(p⁻¹)].
Small entry-wise does not mean small in matrix (operator) norm: repeated across a large T×T matrix, these small terms accumulate, and spectral clustering works!
Moreover, concentration brings simplifications: for Φ_ij = E_w[σ(w^T x_i) σ(w^T x_j)] with the ReLU nonlinearity,
Φ_ij = (1/2π) ‖x_i‖ ‖x_j‖ (∠ arccos(−∠) + √(1 − ∠²)), with ∠ = x_i^T x_j/(‖x_i‖ ‖x_j‖).
Concentration: ∠ ≈ 0 and ‖x_i‖² ≈ ‖x_j‖² ≈ τ, so Φ_ij concentrates around a τ-dependent constant plus the information terms in (µ_a, C_a)!
Blessing of dimensionality: high dimensional concentration + Taylor expansion linearize Φ!
7 Main Results
Asymptotic equivalent of Φ: for all σ(·) listed in the table above, as n, p, T → ∞ at the same rate,
‖Φ − Φ̃‖ → 0 almost surely, with
Φ̃ ≡ d₁ (Ω + M J^T/√p)^T (Ω + M J^T/√p) + d₂ U B U^T + d₀ I_T,
U ≡ [J/√p, φ], B ≡ [ t t^T + 2S, t ; t^T, 1 ].
Table: coefficients (d₁, d₂) in Φ̃ for different σ(·).
- t: d₁ = 1, d₂ = 0
- max(t, 0): d₁ = 1/4, d₂ = 1/(8πτ)
- |t|: d₁ = 0, d₂ = 1/(2πτ)
- ς₊ max(t, 0) + ς₋ max(−t, 0): d₁ = (ς₊ − ς₋)²/4, d₂ = (ς₊ + ς₋)²/(8πτ)
- 1_{t>0}: d₁ = 1/(2πτ), d₂ = 0
- sign(t): d₁ = 2/(πτ), d₂ = 0
- ς₂ t² + ς₁ t + ς₀: d₁ = ς₁², d₂ = ς₂²
- cos(t): d₁ = 0, d₂ = e^{−τ}/4
- sin(t): d₁ = e^{−τ}, d₂ = 0
- erf(t): d₁ = 4/(π(2τ + 1)), d₂ = 0
- exp(−t²/2): d₁ = 0, d₂ = 1/(4(τ + 1)³)
Here J ≡ [j₁, ..., j_K], with j_a the canonical (indicator) vector of class C_a, i.e. (j_a)_i = δ_{x_i ∈ C_a} (useful for clustering); Ω and φ collect the random fluctuations of the data; M ≡ [µ₁, ..., µ_K], t ≡ {tr(C°_a)/√p}_{a=1}^K and S ≡ {tr(C_a C_b)/p}_{a,b=1}^K carry the statistical information of the data distribution.
8 Consequences
(Coefficients d₁ and d₂ as in the table of the previous slide.)
A natural classification of σ(·):
- mean-oriented (d₁ ≠ 0, d₂ = 0): t, 1_{t>0}, sign(t), sin(t) and erf(t) separate classes through differences in means M;
- covariance-oriented (d₁ = 0, d₂ ≠ 0): |t|, cos(t) and exp(−t²/2) track differences in covariances through t and S;
- balanced (both d₁ ≠ 0 and d₂ ≠ 0): the ReLU function max(t, 0), the Leaky ReLU function ς₊ max(t, 0) + ς₋ max(−t, 0) and the quadratic function ς₂t² + ς₁t + ς₀ make use of both statistics!
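The mean- versus covariance-oriented distinction has a simple deterministic illustration: sign features are invariant to the scale of the input, so they are blind to information carried only by norms (hence by covariance traces), while absolute-value features retain it. A quick sketch with arbitrary dimensions:

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 16, 1000
W = rng.standard_normal((n, p))
x = rng.standard_normal(p)

# sign(w^T (2x)) = sign(w^T x): the features cannot distinguish x from 2x,
# so sign(t) is blind to norm (covariance-trace) differences -> mean-oriented
feat_sign_x = np.sign(W @ x)
feat_sign_2x = np.sign(W @ (2 * x))

# |w^T (2x)| = 2 |w^T x|: absolute-value features keep the scale information
feat_abs_x = np.abs(W @ x)
feat_abs_2x = np.abs(W @ (2 * x))
```

This is the entry-wise counterpart of d₂ = 0 for sign(t) and d₁ = 0 for |t| in the coefficient table.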
9 Numerical Validations: Gaussian Data
Example: Gaussian mixture data from four classes N(µ₁, C₁), N(µ₁, C₂), N(µ₂, C₁) and N(µ₂, C₂), with the Leaky ReLU function ς₊ max(t, 0) + ς₋ max(−t, 0).
Case 1: ς₊ = −ς₋ = 1 (equivalent to the linear map σ(t) = t).
(Figure: leading two eigenvectors of G; classes C₁–C₄.)
Case 2: ς₊ = ς₋ = 1 (equivalent to σ(t) = |t|).
(Figure: leading two eigenvectors of G; classes C₁–C₄.)
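The three parameterizations used in these experiments reduce to familiar maps, which is easy to verify as a sanity check (the grid of test points is arbitrary):

```python
import numpy as np

def lrelu(t, s_plus, s_minus):
    """Leaky ReLU family: sigma(t) = s_plus * max(t, 0) + s_minus * max(-t, 0)."""
    return s_plus * np.maximum(t, 0) + s_minus * np.maximum(-t, 0)

t = np.linspace(-3.0, 3.0, 101)
case1 = lrelu(t, 1, -1)   # Case 1: equals the linear map t   -> purely mean-oriented
case2 = lrelu(t, 1, 1)    # Case 2: equals |t|                -> purely covariance-oriented
case3 = lrelu(t, 1, 0)    # Case 3: equals ReLU max(t, 0)     -> balanced
```

So a single two-parameter family interpolates between the three behaviors of the previous slide.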
10 Numerical Validations: Gaussian Data
Case 3: ς₊ = 1, ς₋ = 0 (the ReLU function).
(Figure: leading two eigenvectors of G and their two-dimensional projection; classes C₁–C₄.)
11 Numerical Validations: Real Datasets
(Figure: the MNIST image database. Figure: the epileptic EEG time-series datasets.)
Reproducibility: codes available at
12 Numerical Validations: Real Datasets
Table: empirical estimates of the differences in means (M^T M) and in covariances (t t^T + 2S) of the MNIST and epileptic EEG datasets (numerical entries not legible).
Table: clustering accuracies on the MNIST dataset (T = 64 / T = 128).
- mean-oriented: t: 88.94% / 87.30%; 1_{t>0}: 82.94% / 85.56%; sign(t): 83.34% / 85.2%; sin(t): 87.81% / 87.50%; erf(t): 87.8% / 86.59%
- covariance-oriented: |t|: 60.41% / 57.81%; cos(t): 59.56% / 57.7%; exp(−t²/2): 60.44% / 58.67%
- balanced: ReLU(t): 85.7% / 82.7%
Table: clustering accuracies on the EEG dataset (T = 64 / T = 128).
- mean-oriented: t: 70.31% / 69.58%; 1_{t>0}: (illegible) / 63.47%; sign(t): 64.63% / 63.03%; sin(t): 70.34% / 68.2%; erf(t): 70.59% / 67.70%
- covariance-oriented: |t|: 99.69% / 99.50%; cos(t): 99.38% / 99.36%; exp(−t²/2): 99.81% / 99.77%
- balanced: ReLU(t): 87.91% / 90.97%
13 Numerical Validations: Real Datasets
(Figure: leading eigenvector of Φ for the MNIST (top) and EEG (bottom) datasets, with simulated mean/standard deviation and the theoretical prediction for Gaussian mixture data of the same statistics, shown with a width of ±1 standard deviation; classes C₁, C₂.)
14 Summary
Take-away messages:
- concentration of high dimensional data is what lets us handle the nonlinearity;
- nonlinearities fall into three families: mean-oriented, covariance-oriented and balanced;
- the choice of nonlinearity can be optimized as a function of the data (quadratic and Leaky ReLU);
- novel insight into the understanding of neural networks for high dimensional data.
Future work:
- study of the eigenvalue distribution;
- the (asymptotic) behavior of the leading eigenvectors;
- combinations of different types of nonlinearities, e.g., sin + cos → Gaussian kernel;
- directly linking σ(·) to the coefficients d₀, d₁ and d₂.
15 Thank You
Thank you! Poster #6.
Techniques for Dimensionality Reduction PCA and Other Matrix Factorization Methods Outline Principle Compoments Analysis (PCA) Example (Bishop, ch 12) PCA as a mixture model variant With a continuous latent
More informationDimension Reduction Techniques. Presented by Jie (Jerry) Yu
Dimension Reduction Techniques Presented by Jie (Jerry) Yu Outline Problem Modeling Review of PCA and MDS Isomap Local Linear Embedding (LLE) Charting Background Advances in data collection and storage
More informationEcon 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines
Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the
More informationStatistical Convergence of Kernel CCA
Statistical Convergence of Kernel CCA Kenji Fukumizu Institute of Statistical Mathematics Tokyo 106-8569 Japan fukumizu@ism.ac.jp Francis R. Bach Centre de Morphologie Mathematique Ecole des Mines de Paris,
More informationAnalysis of Spectral Kernel Design based Semi-supervised Learning
Analysis of Spectral Kernel Design based Semi-supervised Learning Tong Zhang IBM T. J. Watson Research Center Yorktown Heights, NY 10598 Rie Kubota Ando IBM T. J. Watson Research Center Yorktown Heights,
More informationLearning SVM Classifiers with Indefinite Kernels
Learning SVM Classifiers with Indefinite Kernels Suicheng Gu and Yuhong Guo Dept. of Computer and Information Sciences Temple University Support Vector Machines (SVMs) (Kernel) SVMs are widely used in
More informationData Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis
Data Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture
More informationConvex Optimization M2
Convex Optimization M2 Lecture 8 A. d Aspremont. Convex Optimization M2. 1/57 Applications A. d Aspremont. Convex Optimization M2. 2/57 Outline Geometrical problems Approximation problems Combinatorial
More informationJordan normal form notes (version date: 11/21/07)
Jordan normal form notes (version date: /2/7) If A has an eigenbasis {u,, u n }, ie a basis made up of eigenvectors, so that Au j = λ j u j, then A is diagonal with respect to that basis To see this, let
More informationMultisurface Proximal Support Vector Machine Classification via Generalized Eigenvalues
Multisurface Proximal Support Vector Machine Classification via Generalized Eigenvalues O. L. Mangasarian and E. W. Wild Presented by: Jun Fang Multisurface Proximal Support Vector Machine Classification
More information