Research Program. Romain COUILLET. 11 February. CentraleSupélec, Université Paris-Sud 11. 1 / 38
2 Outline Curriculum Vitae Research Project : Learning in Large Dimensions Axis 1 : Robust Estimation in Large Dimensions Axis 2 : Classification in Large Dimensions Axis 3 : Random Matrices and Neural Networks Axis 4 : Graphs 2 / 38
3 Curriculum Vitae/ 3/38 Outline Curriculum Vitae Research Project : Learning in Large Dimensions Axis 1 : Robust Estimation in Large Dimensions Axis 2 : Classification in Large Dimensions Axis 3 : Random Matrices and Neural Networks Axis 4 : Graphs 3 / 38
4 Curriculum Vitae/ 4/38 Education and Professional Experience
Professional Experience
Full Professor since Jan, CentraleSupélec, Gif-sur-Yvette, France. Telecom Department, LANEAS group, Division Signals & Stats.
Diplomas
Habilitation à Diriger des Recherches, Feb. Place: University Paris-Saclay, France. Topic: Robust Estimation in the Large Random Matrix Regime.
PhD in Physics, Nov. Place: CentraleSupélec, Gif-sur-Yvette, France. Topic: Application of Random Matrix Theory to Future Wireless Flexible Networks. Advisor: Mérouane Debbah.
Engineer and Master Diplomas, Mar. Place: Telecom ParisTech, Paris, France. Grade: Very Good (Très Bien). Topic: Communications, embedded systems, computer science.
4 / 38
5 Curriculum Vitae/ 5/38 Teaching Activities and Research Projects
Teaching Activities
ENS Cachan, Cachan, France, since 2013. Courses: Master MVA, 18 hrs/year.
CentraleSupélec, Gif-sur-Yvette, France, since 2011. Courses: PhD level, 18 hrs/year; Master SAR, research seminars, 24 hrs/year; Undergraduate, lectures + practical courses, 70 hrs/year. Advising: interns, undergraduate projects, 8/year.
Research: Projects
HUAWEI RMTin5G, 100% (PI)
ANR RMT4GRAPH, 100% (PI)
ERC MORE, 50%
ANR DIONISOS, 25%
Research: Community Life
Special session organizations: 4
IEEE Senior Member since 2015
IEEE SPTM technical committee member since 2014
IEEE TSP Associate Editor since 2015
Member of GRETSI since
5 / 38
6 Curriculum Vitae/ 6/38 PhD students Axel MÜLLER (research engineer at HUAWEI Labs, Paris) Subject Random matrix models for multi-cellular wireless communications Advising 50%, with M. Debbah (CentraleSupélec) Publications 3 articles in IEEE-JSTSP, -TIT, -TSP, 5 IEEE conferences Awards 1 best student paper award. Julia VINOGRADOVA (postdoc at Linköping University, Sweden) Subject Random matrix theory applied to detection and estimation in antenna arrays Advising 50%, with W. Hachem (Telecom ParisTech) Publications 2 articles in IEEE-TSP, 2 IEEE conferences Azary ABBOUD (postdoc at INRIA, France) Subject Distributed optimization in smart grids Advising 33%, with M. Debbah and H. Siguerdidjane (CentraleSupélec) Publications 1 article in IEEE-TSP, 1 IEEE conference Gil KATZ Subject Interactive communications for distributed computation Advising 33%, with M. Debbah and P. Piantanida (CentraleSupélec) Publications 1 IEEE conference Hafiz TIOMOKO ALI Subject Random matrices in machine learning Advising 100% Publications 2 IEEE conferences. 6 / 38
7 Curriculum Vitae/ 7/38 Research Activities
Publication Record (as of February 1st, 2016)
Publications: Books: 1, Chapters: 3, Journals: 36, Conferences: 53, Patents: 4.
Citations: 1256 (five best: 282, 204, 84, 49, 33)
Indices: h-index: 17, i10-index: 25
Subjects: Mathematics: random matrix theory, statistics. Applications: machine learning, signal processing, communications.
Chart: journal articles per area (Telecom, Signal, Maths/stats, Learning, submitted).
7 / 38
8 Curriculum Vitae/ 8/38 Research Activities
Prizes and Awards
IEEE Senior Member 2016
CNRS Bronze Medal (section INS2I) 2013
IEEE ComSoc Outstanding Young Researcher Award (EMEA region) 2013
EEA/GdR ISIS/GRETSI PhD thesis award 2011
Paper Awards
Second prize of the IEEE Australia Council Student Paper Contest 2013
Best Student Paper Award Finalist of the IEEE Asilomar Conference 2011
Best Student Paper Award of the ValueTools Conference
8 / 38
9 Research Project : Learning in Large Dimensions/ 9/38 Outline Curriculum Vitae Research Project : Learning in Large Dimensions Axis 1 : Robust Estimation in Large Dimensions Axis 2 : Classification in Large Dimensions Axis 3 : Random Matrices and Neural Networks Axis 4 : Graphs 9 / 38
21 Research Project : Learning in Large Dimensions/ 10/38 Context
The BigData Challenge
1. dramatic increase in data dimension (and number)
2. importance of outlying and missing data
3. data heterogeneity
Limitations of classical tools
1. classical statistics limited by the "small but numerous data" hypothesis
2. techniques relying on poorly robust empirical estimates, or on barely usable robust approaches
3. bipolarity between: powerful model-based techniques (signal processing approach) and ad-hoc data-based techniques (machine learning/stats approach)
Our approach
1. development of methods and mathematical tools to handle large and numerous datasets
2. revisit robust statistics in large dimensions
3. (for lack of a better approach) better understand and improve ad-hoc techniques on simple but large dimensional models.
10 / 38
22 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 11/38 Outline Curriculum Vitae Research Project : Learning in Large Dimensions Axis 1 : Robust Estimation in Large Dimensions Axis 2 : Classification in Large Dimensions Axis 3 : Random Matrices and Neural Networks Axis 4 : Graphs 11 / 38
27 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 12/38 Axis 1 : Robust Estimation in Large Dimensions
Baseline scenario: x_1, ..., x_n ∈ R^p i.i.d. with E[x_1] = 0, E[x_1 x_1^T] = C_p, but potentially heavy tailed, with existence of outliers.
Several estimators for C_p:
ML estimator in the Gaussian case: the sample covariance matrix
  Ĉ_p = (1/n) Σ_{i=1}^n x_i x_i^T,
very practical and used in many methods, but very sensitive to outliers.
[Huber 67; Maronna 76] Robust estimators (outliers or heavy tails):
  Ĉ_p = (1/n) Σ_{i=1}^n u( (1/p) x_i^T Ĉ_p^{-1} x_i ) x_i x_i^T.
[Pascal 13; Chen 11] Regularized versions for large data (all n, p):
  Ĉ_p(ρ) = (1 − ρ) (1/n) Σ_{i=1}^n x_i x_i^T / ( (1/p) x_i^T Ĉ_p^{-1}(ρ) x_i ) + ρ I_p.
12 / 38
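The regularized estimator above is only implicitly defined (Ĉ_p(ρ) appears on both sides), and is computed in practice by fixed-point iteration. Below is a minimal NumPy sketch of that iteration; the function name, iteration count, and stopping rule are our illustrative choices, and the convergence of this scheme is precisely what the cited [Pascal 13; Chen 11] analyses establish.

```python
import numpy as np

def regularized_scatter(X, rho, n_iter=100, tol=1e-8):
    """Fixed-point iteration for the regularized robust scatter estimator
    C(rho) = (1-rho) * (1/n) * sum_i x_i x_i^T / ((1/p) x_i^T C(rho)^{-1} x_i)
             + rho * I_p,
    with X of shape (n, p) and rho in (0, 1]. Sketch only."""
    n, p = X.shape
    C = np.eye(p)
    for _ in range(n_iter):
        Cinv = np.linalg.inv(C)
        # quadratic forms (1/p) x_i^T C^{-1} x_i for all i at once
        q = np.einsum('ij,jk,ik->i', X, Cinv, X) / p
        C_new = (1 - rho) * (X.T / q) @ X / n + rho * np.eye(p)
        if np.linalg.norm(C_new - C) < tol * np.linalg.norm(C):
            C = C_new
            break
        C = C_new
    return C
```

With rho = 1 the estimator degenerates to the identity; smaller rho moves it toward the (unregularized) Maronna-type fixed point.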
33 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 13/38 Axis 1 : Robust Estimation in Large Dimensions
Problem and Objectives
Ill-understood estimators, difficult to use (implicit definition of Ĉ_p)
Only known results are for fixed p and n: not appropriate for BigData.
We thus need to:
study Ĉ_p as n, p → ∞
exploit the double-concentration effect to better understand Ĉ_p.
Results and Perspectives
Asymptotic approximation of Ĉ_p by a tractable equivalent model
Second order statistics (CLT type) for Ĉ_p
Study of elliptical cases and outliers, regularized or not.
Applications: radar array processing (impulsiveness due to clutter), financial data processing
Joint mean and covariance estimation
Study of robust regression
More generally, deeper study of iterative methods in large dimensions (such as AMP).
13 / 38
34 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 14/38 Axis 1 : Robust Estimation in Large Dimensions
Theorem ([Couillet, Pascal, Silverstein 15], Maronna Estimator). For x_i = √τ_i w_i, with τ_i impulsive and w_i orthogonal and isotropic, ‖w_i‖² = p, we have ‖Ĉ_p − Ŝ_p‖ → 0 almost surely in spectral norm, where
  Ĉ_p = (1/n) Σ_{i=1}^n u( (1/p) x_i^T Ĉ_p^{-1} x_i ) x_i x_i^T
  Ŝ_p = (1/n) Σ_{i=1}^n v(τ_i γ_p) x_i x_i^T
with v(t) similar to u(t) and γ_p the unique solution of
  1 = (1/n) Σ_{i=1}^n γ_p v(τ_i γ_p) / ( 1 + c γ_p v(τ_i γ_p) ).
14 / 38
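The equivalent matrix Ŝ_p is explicit once γ_p is known, and γ_p solves a scalar equation whose left-hand side is increasing in γ_p, so bisection applies. The sketch below makes illustrative assumptions not stated on the slide: the weight v(t) = (1+α)/(α+t) is one standard Maronna-type choice (the slide only says v is "similar to u"), the τ_i are drawn from a Gamma law, and a root below gamma_max is assumed to exist.

```python
import numpy as np

def solve_gamma(tau, c, v, gamma_max=1e6, iters=200):
    """Bisection for the unique gamma > 0 solving
    1 = (1/n) * sum_i gamma*v(tau_i*gamma) / (1 + c*gamma*v(tau_i*gamma)).
    Sketch: assumes the left-hand side crosses 1 below gamma_max."""
    def h(g):
        t = g * v(tau * g)
        return np.mean(t / (1 + c * t)) - 1.0
    lo, hi = 0.0, 1.0
    while h(hi) < 0 and hi < gamma_max:   # expand bracket until sign change
        hi *= 2
    for _ in range(iters):                # standard bisection
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative weight v (our choice) and impulsive tau drawn from a Gamma law.
rng = np.random.default_rng(0)
alpha, c = 0.2, 0.5
tau = rng.gamma(0.5, 2.0, size=2000)
v = lambda t: (1 + alpha) / (alpha + t)
gamma = solve_gamma(tau, c, v)
```

Given γ_p, forming Ŝ_p is a single weighted outer-product sum, which is what makes the theorem operational.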
40 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 15/38 Axis 1 : Robust Estimation in Large Dimensions
Consequences
The difficult object Ĉ_p is made tractable thanks to Ŝ_p
Analysis and optimization become possible by replacing Ĉ_p with Ŝ_p.
Figure: eigenvalues of Ĉ_p, eigenvalues of Ŝ_p, and the limiting density; n = 2500, p = 500, C_p = diag(I_125, 3 I_125, 10 I_250), τ_i ∼ Γ(0.5, 2) i.i.d.
15 / 38
44 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 16/38 Axis 1 : Robust Estimation in Large Dimensions
Application to Detection in Radars
Hypothesis testing under impulsive noise: purely noisy inputs x_1, ..., x_n, x_i = √τ_i w_i, and a new datum
  y = √τ w under H_0,  y = s + √τ w under H_1.
Robust detector: decide H_1 when T_p(ρ) > γ, where
  T_p(ρ) = |y^* Ĉ_p^{-1}(ρ) s| / √( (y^* Ĉ_p^{-1}(ρ) y)(s^* Ĉ_p^{-1}(ρ) s) )
and
  Ĉ_p(ρ) = (1 − ρ) (1/n) Σ_{i=1}^n x_i x_i^* / ( (1/p) x_i^* Ĉ_p^{-1}(ρ) x_i ) + ρ I_p.
Objectives: performance analysis; find the optimal regularization parameter ρ.
16 / 38
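The statistic T_p(ρ) compares y and s in the metric of Ĉ_p^{-1}(ρ), and by Cauchy–Schwarz it always lies in [0, 1]. A minimal sketch of the statistic follows; to keep it self-contained we plug in a simple linearly shrunk sample covariance as a stand-in for the robust fixed-point Ĉ_p(ρ) of the slide, and the signal amplitude and all parameter values are our own illustrative assumptions.

```python
import numpy as np

def anmf_statistic(y, s, C_inv):
    """Normalized matched-filter-type statistic
    T = |y^T C^{-1} s| / sqrt((y^T C^{-1} y)(s^T C^{-1} s)),
    here for real-valued data; lies in [0, 1] by Cauchy-Schwarz."""
    num = abs(y @ C_inv @ s)
    den = np.sqrt((y @ C_inv @ y) * (s @ C_inv @ s))
    return num / den

rng = np.random.default_rng(1)
p, n, rho = 20, 40, 0.5
X = rng.standard_normal((n, p))                 # pure-noise training data
C = (1 - rho) * X.T @ X / n + rho * np.eye(p)   # stand-in for robust C_p(rho)
C_inv = np.linalg.inv(C)
s = np.ones(p) / np.sqrt(p)                     # steering vector
y_h0 = rng.standard_normal(p)                   # H0: noise only
y_h1 = 10.0 * s + rng.standard_normal(p)        # H1: signal present
T0 = anmf_statistic(y_h0, s, C_inv)
T1 = anmf_statistic(y_h1, s, C_inv)
# Decide H1 when the statistic exceeds a threshold gamma set for a
# target false-alarm rate.
```

The slide's performance analysis is precisely about choosing that threshold (and ρ) so that the false-alarm rate is controlled in the large n, p regime.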
45 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 17/38 Axis 1 : Robust Estimation in Large Dimensions
Figure: false alarm rate P(√p T_p(ρ) > γ) as a function of ρ, for p = 20 (left) and p = 100 (right), s = p^{-1/2} [1, ..., 1]^T, [C_p]_ij = 0.7^{|i−j|}, p/n = 1/2; theory versus empirical estimator, for several values of γ.
17 / 38
46 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 18/38 Axis 1 : Robust Estimation in Large Dimensions
Figure: false alarm rate P(T_p(ρ̂_p) > Γ) as a function of Γ, with ρ̂_p the best estimated ρ, for p = 20 and p = 100, s = p^{-1/2} [1, ..., 1]^T, p/n = 1/2, and [C_p]_ij = 0.7^{|i−j|}; theory versus detector.
18 / 38
47 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 19/38 Axis 1 : Robust Estimation in Large Dimensions
Theoretical Results
R. Couillet, M. McKay, "Large Dimensional Analysis and Optimization of Robust Shrinkage Covariance Matrix Estimators", Elsevier Journal of Multivariate Analysis, vol. 131.
R. Couillet, F. Pascal, J. W. Silverstein, "The Random Matrix Regime of Maronna's M-estimator with elliptically distributed samples", Elsevier Journal of Multivariate Analysis, vol. 139.
D. Morales-Jimenez, R. Couillet, M. McKay, "Large Dimensional Analysis of Robust M-Estimators of Covariance with Outliers", IEEE Transactions on Signal Processing, vol. 63, no. 21.
R. Couillet, A. Kammoun, F. Pascal, "Second order statistics of robust estimators of scatter. Application to GLRT detection for elliptical signals", Elsevier Journal of Multivariate Analysis, vol. 143.
Applications
R. Couillet, A. Kammoun, F. Pascal, "Second order statistics of robust estimators of scatter. Application to GLRT detection for elliptical signals", Elsevier Journal of Multivariate Analysis, vol. 143.
R. Couillet, "Robust spiked random matrices and a robust G-MUSIC estimator", Elsevier Journal of Multivariate Analysis, vol. 140.
19 / 38
48 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 20/38 Axis 1 : Robust Estimation in Large Dimensions
L. Yang, R. Couillet, M. McKay, "A Robust Statistics Approach to Minimum Variance Portfolio Optimization", IEEE Transactions on Signal Processing, vol. 63, no. 24.
A. Kammoun, R. Couillet, F. Pascal, M.-S. Alouini, "Optimal Design of the Adaptive Normalized Matched Filter Detector", submitted to IEEE Transactions on Information Theory, 2015, arXiv preprint.
20 / 38
49 Research Project : Learning in Large Dimensions/Axis 2 : Classification in Large Dimensions 21/38 Outline Curriculum Vitae Research Project : Learning in Large Dimensions Axis 1 : Robust Estimation in Large Dimensions Axis 2 : Classification in Large Dimensions Axis 3 : Random Matrices and Neural Networks Axis 4 : Graphs 21 / 38
54 Research Project : Learning in Large Dimensions/Axis 2 : Classification in Large Dimensions 22/38 Axis 2 : Classification in Large Dimensions
Baseline scenario: x_1, ..., x_n ∈ R^p belonging to k classes C_1, ..., C_k, to identify:
in a supervised manner: numerous labelled data (e.g., support vector machines)
in an unsupervised manner: no labelled data (e.g., kernel spectral clustering)
in a semi-supervised manner: (few) labelled data (e.g., harmonic function method).
Spectral Algorithms
Data are often not linearly separable.
Numerous methods are based on kernel matrices K ∈ R^{n×n}, with (for instance) K_ij = f(‖x_i − x_j‖²) and f some function (often decreasing).
Spectral methods consist in:
extracting the dominant eigenvectors of K (spectral clustering)
solving an optimization problem based on K (support vector machines)
computing a linear functional of K (semi-supervised methods).
22 / 38
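As a toy illustration of the kernel-plus-dominant-eigenvector pipeline just described, the sketch below builds a Gaussian kernel (one common choice of decreasing f; the normalization of distances by p is a convention of the large-dimensional regime, our assumption here) and separates a two-class Gaussian mixture by the sign of the second dominant eigenvector. In general a clustering step such as k-means would be run on the spectral embedding instead.

```python
import numpy as np

def spectral_embedding(X, k, f=lambda d2: np.exp(-d2 / 2.0)):
    """Build K_ij = f(||x_i - x_j||^2 / p), normalize as D^{-1/2} K D^{-1/2},
    and return the k dominant eigenvectors (the spectral embedding).
    Kernel choice and normalization are common defaults, not prescribed
    by the slides."""
    n, p = X.shape
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / p
    K = f(d2)
    d = K.sum(axis=1)
    Ln = K / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(Ln)   # eigenvalues in ascending order
    return vecs[:, -k:]            # k dominant eigenvectors

# Toy two-class Gaussian mixture; no labels are used (unsupervised).
rng = np.random.default_rng(0)
p, m = 10, 50
mu = np.zeros(p); mu[0] = 4.0
X = np.vstack([rng.standard_normal((m, p)) + mu,
               rng.standard_normal((m, p)) - mu])
emb = spectral_embedding(X, 2)
# The top eigenvector is Perron-like (all one sign); the second one
# splits the two classes by sign.
labels = (emb[:, 0] > 0).astype(int)
```

This is exactly the regime the axis studies: how well such eigenvectors separate Gaussian mixtures when n and p are both large.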
59 Research Project : Learning in Large Dimensions/Axis 2 : Classification in Large Dimensions 23/38 Axis 2 : Classification in Large Dimensions
Problems and Objectives
The matrix K is very difficult to analyze for small dimensional x_i (even Gaussian)
Only a qualitative understanding of the tools, which are difficult to optimize.
We need here to:
study K as n, p → ∞
exploit the double-concentration to better understand K
deduce the quantitative performance of learning methods
improve performances as well as the methods themselves.
Results and Perspectives
Development of tools for the large dimensional analysis of kernel matrices.
Thorough analysis of spectral clustering performance on (large) Gaussian mixtures.
Optimization for subspace clustering (new approach, undergoing patenting by HUAWEI).
Generalization to the semi-supervised case.
Study of support vector machines in this context.
Generalization to more realistic models, deeper comparison to real datasets.
23 / 38
60 Research Project : Learning in Large Dimensions/Axis 2 : Classification in Large Dimensions 24/38 Axis 2 : Classification in Large Dimensions
Model: consider the Laplacian matrix (core of the Ng–Weiss–Jordan algorithm)
  L = n D^{-1/2} K D^{-1/2} − n D^{1/2} 1_n 1_n^T D^{1/2} / (1_n^T D 1_n)
where D = diag(K 1_n) and, for x_i ∈ C_a, x_i ∼ N(µ_a, C_a), with n_a = |C_a|.
Theorem ([Couillet, Benaych 16], Equivalent for the Laplacian Matrix). As n, p → ∞, under appropriate hypotheses, ‖L − L̂‖ → 0 almost surely, with
  L̂ = −2 (f′(τ)/f(τ)) [ (1/p) P W^T W P + U B U^T ] + α(τ) I_n
where τ = (2/(np)) Σ_a n_a tr C_a, W = [w_1, ..., w_n] (x_i = µ_a + w_i), P = I_n − (1/n) 1_n 1_n^T, U = [ (1/√p) J, Φ, ψ ], and B has upper-left block
  B_11 = M^T M + ( 5 f′(τ)/(8 f(τ)) − f″(τ)/(2 f′(τ)) ) t t^T − ( f″(τ)/f′(τ) ) T + ( p f(τ) α(τ) )/( 2 n f′(τ) ) 1_k 1_k^T.
Important notations:
J = [j_1, ..., j_k], with j_a the canonical (indicator) vector of class C_a
M = [µ°_1, ..., µ°_k], with µ°_a = µ_a − Σ_{b=1}^k (n_b/n) µ_b
t = [ (1/√p) tr C°_1, ..., (1/√p) tr C°_k ]^T, with C°_a = C_a − Σ_{b=1}^k (n_b/n) C_b
T = { (1/p) tr C°_a C°_b }_{a,b=1}^k.
24 / 38
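The Laplacian of the theorem can be formed directly from K. The sketch below does so, together with a quick sanity check (our addition, not from the slides): the second term subtracts the rank-one component along D^{1/2} 1_n, so that this non-informative Perron-like direction lies in the null space of L.

```python
import numpy as np

def modified_laplacian(K):
    """The Laplacian studied on the slide:
    L = n D^{-1/2} K D^{-1/2} - n D^{1/2} 1 1^T D^{1/2} / (1^T D 1),
    with D = diag(K 1). The rank-one correction removes the
    non-informative Perron-like direction D^{1/2} 1."""
    n = K.shape[0]
    d = K.sum(axis=1)              # degrees, i.e. diag of D
    sq = np.sqrt(d)                # the vector D^{1/2} 1
    return n * K / np.outer(sq, sq) - n * np.outer(sq, sq) / d.sum()

# Demo on a Gaussian kernel built from arbitrary data (our toy setup).
rng = np.random.default_rng(2)
X = rng.standard_normal((60, 8))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 8
K = np.exp(-d2 / 2)
L = modified_laplacian(K)
sq = np.sqrt(K.sum(axis=1))
# L is symmetric and annihilates D^{1/2} 1, so its dominant eigenvectors
# carry the class information studied in the theorem.
```

This is why the theorem's equivalent L̂ involves only the centered matrix P W^T W P and the low-rank informative part U B U^T.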
Research Project : Learning in Large Dimensions/Axis 2 : Classification in Large Dimensions 25/38

Axis 2 : Classification in Large Dimensions

Consequences :
- thorough understanding of the eigenvector structure
- important consequences for the kernel choice (depends on the derivatives f^(l)(τ))

Application to real data : MNIST database (μ_a, C_a evaluated from the full database)

Figure : Four leading eigenvectors of D^{-1/2} K D^{-1/2} for the MNIST dataset (red), equivalent Gaussian model (black), and asymptotic results (blue).

25 / 38
Research Project : Learning in Large Dimensions/Axis 2 : Classification in Large Dimensions 26/38

Axis 2 : Classification in Large Dimensions

Figure : 2D plots of the eigenvectors of L (eigenvector 2 vs. eigenvector 1, and eigenvector 3 vs. eigenvector 2), MNIST database. Theoretical 1-σ and 2-σ standard deviations in blue. Classes 0, 1, 2 in colors.

26 / 38
Research Project : Learning in Large Dimensions/Axis 3 : Random Matrices and Neural Networks 28/38

Axis 3 : Random Matrices and Neural Networks

Baseline Scenario : study of large recurrent neural networks (RNN and ESN)
- dynamical model x_t = S(W x_{t-1} + m u_t + η ε_t) with
  - an n-node network with connectivity W ∈ R^{n×n}
  - an activation function S
  - internal noise ε_t (essentially a biological model)
- only the readout ω ∈ R^n is trained (depth-1 network), by least-squares regression ω = (X X^T)^{-1} X r for a T-long training pair (u, r), r ∈ R^T, and X = [x_1, ..., x_T].

Performance Measures : quadratic errors in training and in test
- memory : training MSE = ||r - X^T ω||² for training pairs (u, r), r ∈ R^T
- generalization : test MSE = ||r̂ - X̂^T ω||² for test pairs (û, r̂), r̂ ∈ R^T̂, with ω = ω(u, r).

28 / 38
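This pipeline can be sketched in a few lines (a toy linear network, i.e. S taken as the identity, with W = σZ for Z Haar-distributed, a one-step-memory task r_t = u_{t-1}, and illustrative parameter values; none of these choices are prescribed by the slide):

```python
import numpy as np

rng = np.random.default_rng(1)

n, T, T_test = 100, 400, 400
sigma, eta = 0.9, 0.01

# Reservoir W = sigma * Z with Z Haar-distributed (QR of a Ginibre matrix,
# sign-corrected so that Z is exactly Haar).
Z, R = np.linalg.qr(rng.normal(size=(n, n)))
Z = Z * np.sign(np.diag(R))
W = sigma * Z
m = rng.normal(size=n) / np.sqrt(n)   # input connectivity, ||m|| close to 1

def run(u):
    """State trajectory x_t = W x_{t-1} + m u_t + eta * eps_t (linear activation)."""
    X = np.zeros((n, len(u)))
    x = np.zeros(n)
    for t, ut in enumerate(u):
        x = W @ x + m * ut + eta * rng.normal(size=n)
        X[:, t] = x
    return X

# One-step-memory task: r_t = u_{t-1}, on an i.i.d. Gaussian input.
u = rng.normal(size=T + T_test + 1)
u_train, r_train = u[1:T + 1], u[:T]
u_test, r_test = u[T + 1:], u[T:T + T_test]

X = run(u_train)
omega = np.linalg.solve(X @ X.T, X @ r_train)      # least-squares readout
train_mse = np.mean((r_train - X.T @ omega) ** 2)
test_mse = np.mean((r_test - run(u_test).T @ omega) ** 2)
print(train_mse, test_mse)
```

With n/T = 0.25 < 1 the Gram matrix XX^T is invertible, and both errors stay small since the delay-1 signal is well within the network's memory at this noise level.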
Research Project : Learning in Large Dimensions/Axis 3 : Random Matrices and Neural Networks 29/38

Axis 3 : Random Matrices and Neural Networks

Problems and Objectives
- performance evaluation so far essentially qualitative
- difficulty linked to the randomness in W and ε_t
- we need here to :
  - study the asymptotic performance as n, T, T̂ → ∞
  - understand the effect of the hyper-parameters defining W
  - generalize the study to more advanced models.

Results and Perspectives
- asymptotic deterministic equivalents for the training and test MSE
- multiple new consequences and intuitions
- proposal of improved structures for W
- generalization to the non-linear setting
- introduction of external memory, back-propagation
- analogous study of deep networks, extreme learning machines, auto-encoders, etc.

29 / 38
Research Project : Learning in Large Dimensions/Axis 3 : Random Matrices and Neural Networks 30/38

Axis 3 : Random Matrices and Neural Networks

Theorem ([Couillet, Wainrib 16], Training MSE for fixed W)
As n, T → ∞ with n/T → c < 1,

MSE = (1/T) r^T ( I_T + R̃ + (1/η²) U^T { m^T (W^i)^T R^{-1} W^j m }_{i,j=0}^{T-1} U )^{-1} r + o(1)

where U_{ij} = u_{i-j} and R, R̃ are the solutions of

R̃ = { (c/n) tr( S_{i-j} R^{-1} ) }_{i,j=1}^T,  R = (1/T) Σ_{q=-∞}^{+∞} tr( J^q (I_T + R̃)^{-1} ) S_q

with [J^q]_{ij} ≡ δ_{i+q,j} and S_q ≡ Σ_{k≥0} W^{k+(-q)_+} (W^{k+q_+})^T.

Corollaries :
- for c = 0 (so that R = S_0 = Σ_{k≥0} W^k (W^k)^T),

MSE = (1/T) r^T ( I_T + (1/η²) U^T { m^T (W^i)^T S_0^{-1} W^j m }_{i,j=0}^{T-1} U )^{-1} r + o(1)

- for W = σZ with Z Haar and m, ||m|| = 1, independent of W,

MSE = (1-c) (1/T) r^T ( I_T + (1/η²) U^T diag{ (1-σ²) σ^{2(i-1)} }_{i=1}^T U )^{-1} r + o(1).

30 / 38
Research Project : Learning in Large Dimensions/Axis 3 : Random Matrices and Neural Networks 31/38

Axis 3 : Random Matrices and Neural Networks

Figure : Train and test NMSE versus η² for prediction of the Mackey-Glass model, W = σZ, σ = .9, Z Haar; panels n = 200, T = T̂ = 400 and n = 400. Monte Carlo simulations against theory (fixed W) and theory (limit).

31 / 38
Research Project : Learning in Large Dimensions/Axis 3 : Random Matrices and Neural Networks 32/38

Axis 3 : Random Matrices and Neural Networks

Consequences : the analysis suggests the choice W = diag(W_1, ..., W_k) with W_j = σ_j Z_j, Z_j ∈ R^{n_j × n_j} Haar, leading to the change of memory curve

MC(τ) : (1-σ²) σ^{2τ}  →  ( Σ_{j=1}^k c_j σ_j^{2τ} ) / ( Σ_{j=1}^k c_j (1-σ_j²)^{-1} ),  c_j = n_j/n.

Figure : Memory curve MC(τ; W) for W = diag(W_1, W_2, W_3), W_j = σ_j Z_j, Z_j ∈ R^{n_j × n_j} Haar, with σ_1 = .99, n_1/n = .01; σ_2 = .9, n_2/n = .1; σ_3 = .5, n_3/n = .89, compared to MC(τ; W_i^+) for the matrices W_i^+ = σ_i Z_i^+, Z_i^+ ∈ R^{n×n} Haar.

32 / 38
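Taking the displayed memory-curve expression at face value (a sketch; the normalization c_j = n_j/n is an assumption consistent with the figure's notation), the single-scale and multi-scale designs can be compared numerically:

```python
import numpy as np

def memory_curve(tau, sigmas, fractions):
    """MC(tau) = sum_j c_j sigma_j^(2 tau) / sum_j c_j / (1 - sigma_j^2)."""
    tau = np.asarray(tau, float)[:, None]
    s = np.asarray(sigmas, float)[None, :]
    c = np.asarray(fractions, float)[None, :]
    return (c * s ** (2 * tau)).sum(axis=1) / (c / (1 - s ** 2)).sum()

tau = np.arange(60)
single = memory_curve(tau, [0.9], [1.0])               # W = .9 Z: MC = (1-s^2) s^(2 tau)
multi = memory_curve(tau, [0.99, 0.9, 0.5], [0.01, 0.1, 0.89])  # slide's 3-block design

# The multi-scale design trades a little mid-range memory for a much heavier
# tail; the total memory sum_tau MC(tau) equals 1 in both cases.
print(single[:3], multi[:3])
```

The single-block case recovers (1-σ²)σ^{2τ} exactly, which is a quick consistency check of the formula.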
Research Project : Learning in Large Dimensions/Axis 4 : Graphs 33/38 Outline Curriculum Vitae Research Project : Learning in Large Dimensions Axis 1 : Robust Estimation in Large Dimensions Axis 2 : Classification in Large Dimensions Axis 3 : Random Matrices and Neural Networks Axis 4 : Graphs 33 / 38
Research Project : Learning in Large Dimensions/Axis 4 : Graphs 34/38

Axis 4 : Graphs

Baseline Scenario : analysis of inference methods for large graphs
- community detection on realistic graphs (spectral methods)
- analysis of signal-processing-on-graphs methods

Tools :
- methods based on the adjacency A, the Laplacian L = D - A, or the modularity M = A - E[A]; e.g., if A_ij ∼ Bern(q_i q_j), then M = A - q q^T
- spectrum and eigenvectors of A, L fundamental to inference methods
- optimization, regression, PCA, etc., based on spectral properties.

34 / 38
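A small sketch (numpy assumed; the sizes and the distribution of the q_i are arbitrary choices) illustrates why the modularity is preferred to the raw adjacency under degree heterogeneity: with no communities at all, the top eigenvector of A is essentially the degree profile q, while that of M = A - qq^T is not.

```python
import numpy as np

rng = np.random.default_rng(2)

# Degree-heterogeneous random graph with NO communities: A_ij ~ Bern(q_i q_j).
n = 1000
q = rng.uniform(0.2, 0.8, size=n)
P = np.outer(q, q)
A = (rng.uniform(size=(n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                       # symmetric adjacency, no self-loops

M = A - P                         # modularity with E[A] = q q^T known here;
                                  # in practice q is estimated from the degrees

v_A = np.linalg.eigh(A)[1][:, -1]
v_M = np.linalg.eigh(M)[1][:, -1]
qn = q / np.linalg.norm(q)
print("top-eigvec alignment with q:  A:", abs(v_A @ qn), " M:", abs(v_M @ qn))
```

Any spectral method applied directly to A would thus spend its leading eigenvector on degrees rather than on community structure.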
Research Project : Learning in Large Dimensions/Axis 4 : Graphs 35/38

Axis 4 : Graphs

Problems and Objectives
- community detection still based on methods designed for homogeneous graphs
- signal processing on graphs so far studied in a purely deterministic fashion
- we need here to :
  - develop and analyze community-detection algorithms for realistic graphs
  - analyze the performance of signal-processing-on-graphs methods

Results and Perspectives
- new algorithms (and their analysis) for community detection with heterogeneous nodes
- extension to signal-processing-on-graphs problems.

35 / 38
Research Project : Learning in Large Dimensions/Axis 4 : Graphs 36/38

Axis 4 : Graphs

Model : graph G with n nodes and k classes, with, for i ∈ C_a, j ∈ C_b,

A_ij ∼ Bern(q_i q_j C_ab), where C_ab = 1 + n^{-1/2} Γ_ab, Γ_ab = O(1).

Limitations of classical approaches : the normalized modularity (with q̂ = (1/n) A 1_n)

L = (1/√n) diag(q̂)^{-1} [ A - q̂ q̂^T / ((1/n) 1_n^T q̂) ] diag(q̂)^{-1}

Figure : 2nd eigenvector of A (top, degree bias) and 1st eigenvector of L (bottom, corrected bias) with bimodal q_i, 2 classes.

36 / 38
Research Project : Learning in Large Dimensions/Axis 4 : Graphs 37/38

Axis 4 : Graphs

Theorem ([Tiomoko Ali, Couillet 16], Limiting deterministic equivalent)
As n → ∞, ||L - L̄|| → 0 almost surely, with

L̄ = (1/√n) [ (1/m_q²) D^{-1} X D^{-1} + U Λ U^T ]

where D = diag({q_i}), m_q = lim_n (1/n) Σ_i q_i, and

U = [ (1/√n) D^{-1} J, (1/(n m_q)) X 1_n ],
Λ = [ (I_k - 1_k c^T) Γ (I_k - c 1_k^T), 1_k ; 1_k^T, 0 ],
J = [j_1, ..., j_k], j_a = [0, ..., 0, 1, ..., 1, 0, ..., 0]^T ∈ R^n the canonical (indicator) vector of class C_a.

Consequences :
- detection based on the eigenvalues of Γ
- alignment of the eigenvectors to the j_a.

37 / 38
Research Project : Learning in Large Dimensions/Axis 4 : Graphs 38/38 End. Thank you. 38 / 38
More informationGraphs in Machine Learning
Graphs in Machine Learning Michal Valko INRIA Lille - Nord Europe, France Partially based on material by: Ulrike von Luxburg, Gary Miller, Doyle & Schnell, Daniel Spielman January 27, 2015 MVA 2014/2015
More informationUnsupervised Learning
2018 EE448, Big Data Mining, Lecture 7 Unsupervised Learning Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html ML Problem Setting First build and
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationNovel spectrum sensing schemes for Cognitive Radio Networks
Novel spectrum sensing schemes for Cognitive Radio Networks Cantabria University Santander, May, 2015 Supélec, SCEE Rennes, France 1 The Advanced Signal Processing Group http://gtas.unican.es The Advanced
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationData Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs
Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture
More informationLinear Regression and Its Applications
Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start
More informationLecture 1: From Data to Graphs, Weighted Graphs and Graph Laplacian
Lecture 1: From Data to Graphs, Weighted Graphs and Graph Laplacian Radu Balan February 5, 2018 Datasets diversity: Social Networks: Set of individuals ( agents, actors ) interacting with each other (e.g.,
More informationBagging and Other Ensemble Methods
Bagging and Other Ensemble Methods Sargur N. Srihari srihari@buffalo.edu 1 Regularization Strategies 1. Parameter Norm Penalties 2. Norm Penalties as Constrained Optimization 3. Regularization and Underconstrained
More informationData Mining Techniques
Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!
More informationOutline lecture 2 2(30)
Outline lecture 2 2(3), Lecture 2 Linear Regression it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic Control
More informationKernel-Based Contrast Functions for Sufficient Dimension Reduction
Kernel-Based Contrast Functions for Sufficient Dimension Reduction Michael I. Jordan Departments of Statistics and EECS University of California, Berkeley Joint work with Kenji Fukumizu and Francis Bach
More informationManifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA
Manifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inria.fr http://perception.inrialpes.fr/
More informationUnsupervised Anomaly Detection for High Dimensional Data
Unsupervised Anomaly Detection for High Dimensional Data Department of Mathematics, Rowan University. July 19th, 2013 International Workshop in Sequential Methodologies (IWSM-2013) Outline of Talk Motivation
More informationInference For High Dimensional M-estimates: Fixed Design Results
Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49
More informationBack to the future: Radial Basis Function networks revisited
Back to the future: Radial Basis Function networks revisited Qichao Que, Mikhail Belkin Department of Computer Science and Engineering Ohio State University Columbus, OH 4310 que, mbelkin@cse.ohio-state.edu
More informationCourse 495: Advanced Statistical Machine Learning/Pattern Recognition
Course 495: Advanced Statistical Machine Learning/Pattern Recognition Deterministic Component Analysis Goal (Lecture): To present standard and modern Component Analysis (CA) techniques such as Principal
More informationProbabilistic & Unsupervised Learning
Probabilistic & Unsupervised Learning Week 2: Latent Variable Models Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College
More information(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann
(Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for
More informationClassification. The goal: map from input X to a label Y. Y has a discrete set of possible values. We focused on binary Y (values 0 or 1).
Regression and PCA Classification The goal: map from input X to a label Y. Y has a discrete set of possible values We focused on binary Y (values 0 or 1). But we also discussed larger number of classes
More informationTropical Graph Signal Processing
Tropical Graph Signal Processing Vincent Gripon To cite this version: Vincent Gripon. Tropical Graph Signal Processing. 2017. HAL Id: hal-01527695 https://hal.archives-ouvertes.fr/hal-01527695v2
More informationProbabilistic Graphical Models
10-708 Probabilistic Graphical Models Homework 3 (v1.1.0) Due Apr 14, 7:00 PM Rules: 1. Homework is due on the due date at 7:00 PM. The homework should be submitted via Gradescope. Solution to each problem
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationModel Selection and Geometry
Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model
More informationPatch similarity under non Gaussian noise
The 18th IEEE International Conference on Image Processing Brussels, Belgium, September 11 14, 011 Patch similarity under non Gaussian noise Charles Deledalle 1, Florence Tupin 1, Loïc Denis 1 Institut
More informationWHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY,
WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY, WITH IMPLICATIONS FOR TRAINING Sanjeev Arora, Yingyu Liang & Tengyu Ma Department of Computer Science Princeton University Princeton, NJ 08540, USA {arora,yingyul,tengyu}@cs.princeton.edu
More informationRobust Principal Component Analysis
ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M
More informationPredicting Graph Labels using Perceptron. Shuang Song
Predicting Graph Labels using Perceptron Shuang Song shs037@eng.ucsd.edu Online learning over graphs M. Herbster, M. Pontil, and L. Wainer, Proc. 22nd Int. Conf. Machine Learning (ICML'05), 2005 Prediction
More informationA GLRT FOR RADAR DETECTION IN THE PRESENCE OF COMPOUND-GAUSSIAN CLUTTER AND ADDITIVE WHITE GAUSSIAN NOISE. James H. Michels. Bin Liu, Biao Chen
A GLRT FOR RADAR DETECTION IN THE PRESENCE OF COMPOUND-GAUSSIAN CLUTTER AND ADDITIVE WHITE GAUSSIAN NOISE Bin Liu, Biao Chen Syracuse University Dept of EECS, Syracuse, NY 3244 email : biliu{bichen}@ecs.syr.edu
More informationCorrelation Preserving Unsupervised Discretization. Outline
Correlation Preserving Unsupervised Discretization Jee Vang Outline Paper References What is discretization? Motivation Principal Component Analysis (PCA) Association Mining Correlation Preserving Discretization
More informationExploiting Sparse Non-Linear Structure in Astronomical Data
Exploiting Sparse Non-Linear Structure in Astronomical Data Ann B. Lee Department of Statistics and Department of Machine Learning, Carnegie Mellon University Joint work with P. Freeman, C. Schafer, and
More informationScalable robust hypothesis tests using graphical models
Scalable robust hypothesis tests using graphical models Umamahesh Srinivas ipal Group Meeting October 22, 2010 Binary hypothesis testing problem Random vector x = (x 1,...,x n ) R n generated from either
More informationA central limit theorem for an omnibus embedding of random dot product graphs
A central limit theorem for an omnibus embedding of random dot product graphs Keith Levin 1 with Avanti Athreya 2, Minh Tang 2, Vince Lyzinski 3 and Carey E. Priebe 2 1 University of Michigan, 2 Johns
More informationProbabilistic Graphical Models for Image Analysis - Lecture 1
Probabilistic Graphical Models for Image Analysis - Lecture 1 Alexey Gronskiy, Stefan Bauer 21 September 2018 Max Planck ETH Center for Learning Systems Overview 1. Motivation - Why Graphical Models 2.
More informationDeep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang
Deep Learning Basics Lecture 7: Factor Analysis Princeton University COS 495 Instructor: Yingyu Liang Supervised v.s. Unsupervised Math formulation for supervised learning Given training data x i, y i
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationIntroduction PCA classic Generative models Beyond and summary. PCA, ICA and beyond
PCA, ICA and beyond Summer School on Manifold Learning in Image and Signal Analysis, August 17-21, 2009, Hven Technical University of Denmark (DTU) & University of Copenhagen (KU) August 18, 2009 Motivation
More informationSTA 414/2104: Lecture 8
STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationCS 340 Lec. 18: Multivariate Gaussian Distributions and Linear Discriminant Analysis
CS 3 Lec. 18: Multivariate Gaussian Distributions and Linear Discriminant Analysis AD March 11 AD ( March 11 1 / 17 Multivariate Gaussian Consider data { x i } N i=1 where xi R D and we assume they are
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More information