
1 Research Program. Romain COUILLET. CentraleSupélec, Université Paris-Sud 11. 11 February 2016. 1 / 38

2 Outline Curriculum Vitae Research Project : Learning in Large Dimensions Axis 1 : Robust Estimation in Large Dimensions Axis 2 : Classification in Large Dimensions Axis 3 : Random Matrices and Neural Networks Axis 4 : Graphs 2 / 38

3 Curriculum Vitae/ 3/38 Outline Curriculum Vitae Research Project : Learning in Large Dimensions Axis 1 : Robust Estimation in Large Dimensions Axis 2 : Classification in Large Dimensions Axis 3 : Random Matrices and Neural Networks Axis 4 : Graphs 3 / 38

4 Curriculum Vitae/ 4/38 Education and Professional Experience
Professional Experience
- Full Professor since Jan., CentraleSupélec, Gif-sur-Yvette, France. Telecom Department, LANEAS group, Division Signals & Stats.
Diplomas
- Habilitation à Diriger des Recherches (Feb.). Place : University Paris-Saclay, France. Topic : Robust Estimation in the Large Random Matrix Regime.
- PhD in Physics (Nov.). Place : CentraleSupélec, Gif-sur-Yvette, France. Topic : Application of Random Matrix Theory to Future Wireless Flexible Networks. Advisor : Mérouane Debbah.
- Engineer and Master Diplomas (Mar.). Place : Telecom ParisTech, Paris, France. Grade : Very Good (Très Bien). Topic : Communications, embedded systems, computer science.
4 / 38

5 Curriculum Vitae/ 5/38 Teaching Activities and Research Projects
Teaching Activities
- ENS Cachan, Cachan, France, since 2013. Courses : Master MVA, 18 hrs/year.
- CentraleSupélec, Gif-sur-Yvette, France, since 2011. Courses : PhD level, 18 hrs/year; Master SAR, research seminars, 24 hrs/year; undergraduate, lectures + practical courses, 70 hrs/year. Advising : interns, undergraduate projects, 8/year.
Research : Projects
- HUAWEI RMTin5G, 100% (PI); ANR RMT4GRAPH, 100% (PI); ERC MORE, 50%; ANR DIONISOS, 25%.
Research : Community Life
- Special session organizations : 4. IEEE Senior Member since 2015. IEEE SPTM technical committee member since 2014. IEEE TSP Associate Editor since 2015. Member of GRETSI.
5 / 38

6 Curriculum Vitae/ 6/38 PhD students
- Axel MÜLLER (research engineer at HUAWEI Labs, Paris). Subject : Random matrix models for multi-cellular wireless communications. Advising : 50%, with M. Debbah (CentraleSupélec). Publications : 3 articles in IEEE-JSTSP, -TIT, -TSP, 5 IEEE conferences. Awards : 1 best student paper award.
- Julia VINOGRADOVA (postdoc at Linköping University, Sweden). Subject : Random matrix theory applied to detection and estimation in antenna arrays. Advising : 50%, with W. Hachem (Telecom ParisTech). Publications : 2 articles in IEEE-TSP, 2 IEEE conferences.
- Azary ABBOUD (postdoc at INRIA, France). Subject : Distributed optimization in smart grids. Advising : 33%, with M. Debbah and H. Siguerdidjane (CentraleSupélec). Publications : 1 article in IEEE-TSP, 1 IEEE conference.
- Gil KATZ. Subject : Interactive communications for distributed computation. Advising : 33%, with M. Debbah and P. Piantanida (CentraleSupélec). Publications : 1 IEEE conference.
- Hafiz TIOMOKO ALI. Subject : Random matrices in machine learning. Advising : 100%. Publications : 2 IEEE conferences.
6 / 38

7 Curriculum Vitae/ 7/38 Research Activities Publication Record (as of February 1st, 2016)
Publications : Books : 1, Chapters : 3, Journals : 36, Conferences : 53, Patents : 4.
Citations : 1256 (five best : 282, 204, 84, 49, 33). Indices : h-index : 17, i10-index : 25.
Subjects : Mathematics : random matrix theory, statistics. Applications : machine learning, signal processing, communications.
Figure : Journal articles per area (Telecom, Signal, Maths/stats, Learning; including submitted).
7 / 38

8 Curriculum Vitae/ 8/38 Research Activities Prizes and Awards
- IEEE Senior Member, 2016
- CNRS Bronze Medal (section INS2I), 2013
- IEEE ComSoc Outstanding Young Researcher Award (EMEA region), 2013
- EEA/GdR ISIS/GRETSI PhD thesis award, 2011
Paper Awards
- Second prize of the IEEE Australia Council Student Paper Contest, 2013
- Best Student Paper Award Finalist of the IEEE Asilomar Conference, 2011
- Best Student Paper Award of the ValueTools Conference
8 / 38

9 Research Project : Learning in Large Dimensions/ 9/38 Outline Curriculum Vitae Research Project : Learning in Large Dimensions Axis 1 : Robust Estimation in Large Dimensions Axis 2 : Classification in Large Dimensions Axis 3 : Random Matrices and Neural Networks Axis 4 : Graphs 9 / 38

10 Research Project : Learning in Large Dimensions/ 10/38 Context
The BigData Challenge
1. dramatic increase in the dimension (and number) of data
2. importance of outlying and missing data
3. data heterogeneity
Limitations of classical tools
1. classical statistics limited by the "small but numerous data" hypothesis
2. techniques relying on poorly robust empirical estimates, or on robust approaches that are barely usable
3. a divide between :
   - powerful model-based techniques (signal processing approach)
   - ad-hoc data-based techniques (machine learning/statistics approach)
Our approach
1. develop methods and mathematical tools to handle large and numerous datasets
2. revisit robust statistics in large dimensions
3. (for lack of a better approach) better understand and improve ad-hoc techniques on simple but large dimensional models.
10 / 38

22 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 11/38 Outline Curriculum Vitae Research Project : Learning in Large Dimensions Axis 1 : Robust Estimation in Large Dimensions Axis 2 : Classification in Large Dimensions Axis 3 : Random Matrices and Neural Networks Axis 4 : Graphs 11 / 38

23 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 12/38 Axis 1 : Robust Estimation in Large Dimensions
Baseline Scenario : x_1, ..., x_n ∈ R^p i.i.d. with E[x_1] = 0, E[x_1 x_1^T] = C_p, but
- potentially heavy tailed
- existence of outliers
Several Estimators for C_p
- ML estimator in the Gaussian case : the sample covariance matrix
    \hat{C}_p = \frac{1}{n} \sum_{i=1}^n x_i x_i^T
  very practical, used in many methods, but very sensitive to outliers
- [Huber 67 ; Maronna 76] Robust Estimators (outliers or heavy tails)
    \hat{C}_p = \frac{1}{n} \sum_{i=1}^n u\left( \frac{1}{p} x_i^T \hat{C}_p^{-1} x_i \right) x_i x_i^T
- [Pascal 13 ; Chen 11] Regularized Versions for Large Data (all n, p)
    \hat{C}_p(\rho) = (1-\rho) \frac{1}{n} \sum_{i=1}^n \frac{x_i x_i^T}{\frac{1}{p} x_i^T \hat{C}_p^{-1}(\rho) x_i} + \rho I_p
12 / 38
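These estimators are only implicitly defined (Ĉ_p appears on both sides of its defining equation) and are computed in practice by fixed-point iteration. The following is a minimal numpy sketch of both iterations; the weight function u(t) = (1+α)/(α+t), the stopping rule and all parameter values are illustrative assumptions rather than choices made in the slides.

import numpy as np

def maronna_estimator(X, alpha=0.2, n_iter=50, tol=1e-6):
    # Fixed-point iteration for Maronna's M-estimator of scatter.
    # X is the (n, p) data matrix with rows x_i; u(t) = (1+alpha)/(alpha+t)
    # is one classical choice of weight function (an assumption, not from the slides).
    n, p = X.shape
    C = np.eye(p)
    for _ in range(n_iter):
        d = np.einsum('ij,jk,ik->i', X, np.linalg.inv(C), X) / p  # (1/p) x_i^T C^{-1} x_i
        C_new = (X * ((1 + alpha) / (alpha + d))[:, None]).T @ X / n
        if np.linalg.norm(C_new - C) < tol * np.linalg.norm(C):
            return C_new
        C = C_new
    return C

def regularized_estimator(X, rho, n_iter=50):
    # Fixed point for C(rho) = (1-rho) (1/n) sum_i x_i x_i^T / ((1/p) x_i^T C(rho)^{-1} x_i) + rho I_p.
    n, p = X.shape
    C = np.eye(p)
    for _ in range(n_iter):
        d = np.einsum('ij,jk,ik->i', X, np.linalg.inv(C), X) / p
        C = (1 - rho) * (X / d[:, None]).T @ X / n + rho * np.eye(p)
    return C

For the elliptical setting of the next slides one would, for instance, draw τ_i ~ Γ(.5, 2) and x_i = √τ_i w_i with w_i ~ N(0, C_p), and compare the spectra of the sample covariance matrix and of these robust estimates.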

28 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 13/38 Axis 1 : Robust Estimation in Large Dimensions
Problem and Objectives
- Ill-understood estimators, difficult to use (implicit definition of Ĉ_p)
- Only known results are for fixed p and n : not appropriate in BigData.
- We thus need to :
  - study Ĉ_p as n, p → ∞
  - exploit the double-concentration effect to better understand Ĉ_p.
Results and Perspectives
- Asymptotic approximation of Ĉ_p by a tractable equivalent model
- Second order statistics (CLT type) for Ĉ_p
- Study of elliptical cases, outliers, regularized or not
- Applications :
  - radar array processing (impulsiveness due to clutter)
  - financial data processing
- Joint mean and covariance estimation
- Study of robust regression
- More generally, deeper study of iterative methods in large dimensions (such as AMP).
13 / 38

34 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 14/38 Axis 1 : Robust Estimation in Large Dimensions
Theorem ([Couillet,Pascal,Silverstein 15] Maronna Estimator)
For x_i = \sqrt{\tau_i} w_i, with τ_i impulsive and w_i orthogonal and isotropic, \|w_i\| = \sqrt{p},
    \| \hat{C}_p - \hat{S}_p \| \to 0  almost surely, in spectral norm,
where
    \hat{C}_p = \frac{1}{n} \sum_{i=1}^n u\left( \frac{1}{p} x_i^T \hat{C}_p^{-1} x_i \right) x_i x_i^T,
    \hat{S}_p = \frac{1}{n} \sum_{i=1}^n v(\tau_i \gamma_p)\, x_i x_i^T,
with v(t) similar to u(t) and γ_p the unique solution of
    1 = \frac{1}{n} \sum_{i=1}^n \frac{\gamma_p v(\tau_i \gamma_p)}{1 + c\, \gamma_p v(\tau_i \gamma_p)}.
14 / 38

35 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 15/38 Axis 1 : Robust Estimation in Large Dimensions
Consequences
- The difficult object Ĉ_p is made tractable thanks to Ŝ_p
- Analysis and optimization become possible by replacing Ĉ_p with Ŝ_p.
Figure : Eigenvalues of Ĉ_p, eigenvalues of Ŝ_p, and limiting density, for n = 2500, p = 500, C_p = diag(I_125, 3 I_125, 10 I_250), τ_i ~ Γ(.5, 2) i.i.d.
15 / 38

41 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 16/38 Axis 1 : Robust Estimation in Large Dimensions
Application to Detection in Radars
Hypothesis testing under impulsive noise : purely noisy inputs x_1, ..., x_n, x_i = \sqrt{\tau_i} w_i, and a new datum
    y = \sqrt{\tau} w  (H_0),    y = s + \sqrt{\tau} w  (H_1).
Robust detector T_p(ρ), given by T_p(ρ) ≷_{H_0}^{H_1} γ, where
    T_p(\rho) = \frac{\left| y^* \hat{C}_p^{-1}(\rho)\, s \right|}{\sqrt{y^* \hat{C}_p^{-1}(\rho)\, y}\, \sqrt{s^* \hat{C}_p^{-1}(\rho)\, s}},
    \hat{C}_p(\rho) = (1-\rho) \frac{1}{n} \sum_{i=1}^n \frac{x_i x_i^*}{\frac{1}{p} x_i^* \hat{C}_p^{-1}(\rho) x_i} + \rho I_p.
Objectives
- performance analysis
- find the optimal regularization parameter ρ.
16 / 38
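A minimal sketch of how the statistic T_p(ρ) can be evaluated numerically, reusing the same fixed-point iteration for the regularized scatter estimate; the data model, the choice ρ = 0.5 and the real-valued (rather than complex) setting are illustrative assumptions.

import numpy as np

def regularized_scatter(X, rho, n_iter=50):
    # Fixed point for C_p(rho), built from the pure-noise samples x_1, ..., x_n (rows of X).
    n, p = X.shape
    C = np.eye(p)
    for _ in range(n_iter):
        d = np.einsum('ij,jk,ik->i', X, np.linalg.inv(C), X) / p
        C = (1 - rho) * (X / d[:, None]).T @ X / n + rho * np.eye(p)
    return C

def detector_statistic(y, s, X, rho):
    # T_p(rho) = |y^T C^{-1} s| / sqrt((y^T C^{-1} y)(s^T C^{-1} s)).
    Cinv = np.linalg.inv(regularized_scatter(X, rho))
    return abs(y @ Cinv @ s) / np.sqrt((y @ Cinv @ y) * (s @ Cinv @ s))

# Toy usage under H0: impulsive noise x_i = sqrt(tau_i) w_i, steering vector s = p^{-1/2} [1, ..., 1]^T.
rng = np.random.default_rng(0)
p, n = 20, 40
X = np.sqrt(rng.gamma(0.5, 2.0, size=n))[:, None] * rng.standard_normal((n, p))
s = np.ones(p) / np.sqrt(p)
y = np.sqrt(rng.gamma(0.5, 2.0)) * rng.standard_normal(p)
print("T_p(rho=0.5) =", detector_statistic(y, s, X, rho=0.5))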

45 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 17/38 Axis 1 : Robust Estimation in Large Dimensions
Figure : False alarm rate P(\sqrt{p}\, T_p(\rho) > \gamma) as a function of ρ, for p = 20 (left) and p = 100 (right), with s = p^{-1/2} [1, ..., 1]^T, [C_p]_{ij} = 0.7^{|i-j|}, p/n = 1/2. Curves : theory, empirical estimator, detector.
17 / 38

46 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 18/38 Axis 1 : Robust Estimation in Large Dimensions
Figure : False alarm rate P(T_p(\hat{\rho}_p) > \Gamma) as a function of Γ, with \hat{\rho}_p the best estimated ρ, for p = 20 and p = 100, s = p^{-1/2} [1, ..., 1]^T, p/n = 1/2 and [C_p]_{ij} = 0.7^{|i-j|}. Curves : theory, detector.
18 / 38

47 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 19/38 Axis 1 : Robust Estimation in Large Dimensions
Theoretical Results (clickable title links)
- R. Couillet, M. McKay, "Large Dimensional Analysis and Optimization of Robust Shrinkage Covariance Matrix Estimators", Elsevier Journal of Multivariate Analysis, vol. 131.
- R. Couillet, F. Pascal, J. W. Silverstein, "The Random Matrix Regime of Maronna's M-estimator with elliptically distributed samples", Elsevier Journal of Multivariate Analysis, vol. 139.
- D. Morales-Jimenez, R. Couillet, M. McKay, "Large Dimensional Analysis of Robust M-Estimators of Covariance with Outliers", IEEE Transactions on Signal Processing, vol. 63, no. 21.
- R. Couillet, A. Kammoun, F. Pascal, "Second order statistics of robust estimators of scatter. Application to GLRT detection for elliptical signals", Elsevier Journal of Multivariate Analysis, vol. 143.
Applications (clickable title links)
- R. Couillet, A. Kammoun, F. Pascal, "Second order statistics of robust estimators of scatter. Application to GLRT detection for elliptical signals", Elsevier Journal of Multivariate Analysis, vol. 143.
- R. Couillet, "Robust spiked random matrices and a robust G-MUSIC estimator", Elsevier Journal of Multivariate Analysis, vol. 140.
19 / 38

48 Research Project : Learning in Large Dimensions/Axis 1 : Robust Estimation in Large Dimensions 20/38 Axis 1 : Robust Estimation in Large Dimensions
- L. Yang, R. Couillet, M. McKay, "A Robust Statistics Approach to Minimum Variance Portfolio Optimization", IEEE Transactions on Signal Processing, vol. 63, no. 24.
- A. Kammoun, R. Couillet, F. Pascal, M.-S. Alouini, "Optimal Design of the Adaptive Normalized Matched Filter Detector", submitted to IEEE Transactions on Information Theory, 2015, arXiv preprint.
20 / 38

49 Research Project : Learning in Large Dimensions/Axis 2 : Classification in Large Dimensions 21/38 Outline Curriculum Vitae Research Project : Learning in Large Dimensions Axis 1 : Robust Estimation in Large Dimensions Axis 2 : Classification in Large Dimensions Axis 3 : Random Matrices and Neural Networks Axis 4 : Graphs 21 / 38

50 Research Project : Learning in Large Dimensions/Axis 2 : Classification in Large Dimensions 22/38 Axis 2 : Classification in Large Dimensions
Baseline Scenario : x_1, ..., x_n ∈ R^p belonging to k classes C_1, ..., C_k to be identified
- in a supervised manner : many labelled data (e.g., support vector machines)
- in an unsupervised manner : no labelled data (e.g., kernel spectral clustering)
- in a semi-supervised manner : (few) labelled data (e.g., the harmonic function method).
Spectral Algorithms
- Data are often not linearly separable
- Numerous methods are based on kernel matrices K ∈ R^{n×n}, with (for instance) K_ij = f(‖x_i − x_j‖²) and f some (often decreasing) function
- Spectral methods consist in :
  - extracting the dominant eigenvectors of K (spectral clustering)
  - solving an optimization problem based on K (support vector machines)
  - computing a linear functional of K (semi-supervised methods).
22 / 38
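A minimal, fully numpy sketch of kernel spectral clustering in the spirit described above, for two classes; the Gaussian kernel f(t) = exp(−t/(2p)) and all parameters are illustrative choices (for k > 2 classes one would run k-means on the k dominant eigenvectors instead of taking a sign).

import numpy as np

def normalized_kernel(X, f=None):
    # K_ij = f(||x_i - x_j||^2) and its symmetric normalization D^{-1/2} K D^{-1/2}.
    n, p = X.shape
    if f is None:
        f = lambda t: np.exp(-t / (2 * p))  # Gaussian kernel, an illustrative choice
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0)
    K = f(dist2)
    d = K.sum(axis=1)
    return K / np.sqrt(np.outer(d, d))

# Toy usage: two Gaussian classes differing in their means.
rng = np.random.default_rng(0)
p, n = 100, 200
mu = 2 / np.sqrt(p)
X = np.vstack([rng.standard_normal((n // 2, p)) + mu,
               rng.standard_normal((n // 2, p)) - mu])
eigval, eigvec = np.linalg.eigh(normalized_kernel(X))
labels = (eigvec[:, -2] > 0).astype(int)  # second dominant eigenvector carries the classes
print("cluster sizes:", np.bincount(labels))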

55 Research Project : Learning in Large Dimensions/Axis 2 : Classification in Large Dimensions 23/38 Axis 2 : Classification in Large Dimensions
Problems and Objectives
- The matrix K is very difficult to analyze for small dimensional x_i (even Gaussian)
- Only a qualitative understanding of the tools, which are difficult to optimize.
We need here to :
- study K as n, p → ∞
- exploit the double concentration to better understand K
- deduce the quantitative performance of learning methods
- improve the performance as well as the methods themselves.
Results and Perspectives
- Development of tools for the large dimensional analysis of kernel matrices
- Thorough analysis of spectral clustering performance on (large) Gaussian mixtures
- Optimization for subspace clustering (new approach, patent pending with HUAWEI)
- Generalization to the semi-supervised case
- Study of support vector machines in this context
- Generalization to more realistic models, deeper comparison with real datasets.
23 / 38

60 Research Project : Learning in Large Dimensions/Axis 2 : Classification in Large Dimensions 24/38 Axis 2 : Classification in Large Dimensions
Model : consider the Laplacian matrix (core of the Ng-Jordan-Weiss algorithm)
    L = n D^{-1/2} K D^{-1/2} - n \frac{D^{1/2} 1_n 1_n^T D^{1/2}}{1_n^T D 1_n}
where D = diag(K 1_n) and, for x_i ∈ C_a, x_i ~ N(μ_a, \frac{1}{p} C_a), with |C_a| = n_a.
Theorem ([Couillet,Benaych 16] Equivalent to Laplacian Matrix)
As n, p → ∞, under appropriate hypotheses, ‖L − L̂‖ → 0 almost surely, with
    \hat{L} = 2 \frac{f'(\tau)}{f(\tau)} \left[ \frac{1}{p} P W^T W P + U B U^T \right] + \alpha(\tau) I_n
where τ = \frac{2}{np} \sum_a n_a \mathrm{tr}\, C_a, W = [w_1, ..., w_n] (with x_i = μ_a + w_i), P = I_n - \frac{1}{n} 1_n 1_n^T,
    U = \left[ \frac{1}{\sqrt{p}} J, \Phi, \psi \right],
and B is a block matrix whose upper-left block is
    B_{11} = M^T M + \left( \frac{5 f'(\tau)}{8 f(\tau)} - \frac{f''(\tau)}{2 f'(\tau)} \right) t t^T + \frac{f''(\tau)}{f'(\tau)} T + p \frac{f(\tau) \alpha(\tau)}{2 f'(\tau) n} 1_k 1_k^T.
Important Notations :
- \frac{1}{\sqrt{p}} J = [j_1, ..., j_k], with j_a the canonical vector of class C_a
- M = [\mu_1^\circ, ..., \mu_k^\circ], with \mu_a^\circ = \mu_a - \sum_{b=1}^k \frac{n_b}{n} \mu_b
- t = \left[ \frac{1}{\sqrt{p}} \mathrm{tr}\, C_1^\circ, ..., \frac{1}{\sqrt{p}} \mathrm{tr}\, C_k^\circ \right]^T, with C_a^\circ = C_a - \sum_{b=1}^k \frac{n_b}{n} C_b
- T = \left\{ \frac{1}{p} \mathrm{tr}\, C_a^\circ C_b^\circ \right\}_{a,b=1}^k.
24 / 38
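As a complement, a short sketch of how the matrix L of the theorem is formed from a kernel matrix K, following the definition reconstructed above; the kernel f(t) = exp(−t/2) and the data used here are purely illustrative.

import numpy as np

def laplacian_L(K):
    # L = n D^{-1/2} K D^{-1/2} - n (D^{1/2} 1_n 1_n^T D^{1/2}) / (1_n^T D 1_n), with D = diag(K 1_n).
    n = K.shape[0]
    d = K.sum(axis=1)
    return n * K / np.sqrt(np.outer(d, d)) - n * np.outer(np.sqrt(d), np.sqrt(d)) / d.sum()

# Illustrative usage on data scaled so that ||x_i - x_j||^2 = O(1).
rng = np.random.default_rng(1)
n, p = 300, 150
X = rng.standard_normal((n, p)) / np.sqrt(p)
sq = np.sum(X ** 2, axis=1)
K = np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0) / 2)  # f(t) = exp(-t/2)
print("four largest eigenvalues of L:", np.linalg.eigvalsh(laplacian_L(K))[-4:])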

66 Research Project : Learning in Large Dimensions/Axis 2 : Classification in Large Dimensions 25/38 Axis 2 : Classification in Large Dimensions
Consequences :
- thorough understanding of the eigenvector structure
- important consequences for the kernel choice (which depends on the derivatives f^{(l)}(τ)).
Application to real data : the MNIST database (μ_a, C_a evaluated from the full database).
Figure : Four leading eigenvectors of D^{-1/2} K D^{-1/2} for the MNIST dataset (red), the equivalent Gaussian model (black), and the asymptotic results (blue).
25 / 38

72 Research Project : Learning in Large Dimensions/Axis 2 : Classification in Large Dimensions 26/38 Axis 2 : Classification in Large Dimensions
Figure : 2D plots of the eigenvectors of L (eigenvector 2 vs. eigenvector 1, and eigenvector 3 vs. eigenvector 2), MNIST database. Theoretical 1-σ and 2-σ standard deviations in blue. Classes 0, 1, 2 in colors.
26 / 38

73 Research Project : Learning in Large Dimensions/Axis 3 : Random Matrices and Neural Networks 27/38 Outline Curriculum Vitae Research Project : Learning in Large Dimensions Axis 1 : Robust Estimation in Large Dimensions Axis 2 : Classification in Large Dimensions Axis 3 : Random Matrices and Neural Networks Axis 4 : Graphs 27 / 38

74 Research Project : Learning in Large Dimensions/Axis 3 : Random Matrices and Neural Networks 28/38 Axis 3 : Random Matrices and Neural Networks
Baseline Scenario : study of large recurrent neural networks (RNN and ESN)
- dynamical model x_t = S(W x_{t-1} + m u_t + η ε_t) with
  - an n-node network with connectivity W ∈ R^{n×n}
  - activation function S
  - internal noise ε_t (essentially a biological model)
- readout ω ∈ R^n, the only trained part (depth-1 NN), obtained by LS regression
    ω = (X X^T)^{-1} X r
  for a T-long training pair (u, r), u, r ∈ R^T, and X = [x_1, ..., x_T].
Performance Measures : quadratic errors in training and testing
- memory, training : MSE = ‖r − X^T ω‖² for the training pair (u, r), u, r ∈ R^T
- generalization, test : MSE = ‖r̂ − X̂^T ω‖² for test pairs (û, r̂), û, r̂ ∈ R^{T̂}, with ω = ω(u, r).
28 / 38
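A minimal simulation sketch of this setup: a fixed reservoir W = σZ with Z orthogonal (Haar-like), noisy state updates, and a readout trained by least squares; the one-step-ahead prediction of a noisy sine (standing in for the Mackey-Glass series used later) and all hyper-parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n, T, T_test = 200, 400, 400
sigma, eta = 0.9, 1e-2

Z, _ = np.linalg.qr(rng.standard_normal((n, n)))   # Haar-like orthogonal connectivity
W = sigma * Z
m = rng.standard_normal(n) / np.sqrt(n)            # input weights

def run_reservoir(u, S=np.tanh):
    # x_t = S(W x_{t-1} + m u_t + eta eps_t); returns X = [x_1, ..., x_T] of shape (n, T).
    x, X = np.zeros(n), np.zeros((n, len(u)))
    for t, ut in enumerate(u):
        x = S(W @ x + m * ut + eta * rng.standard_normal(n))
        X[:, t] = x
    return X

u_all = np.sin(0.1 * np.arange(T + T_test + 1)) + 0.01 * rng.standard_normal(T + T_test + 1)
u_train, r_train = u_all[:T], u_all[1:T + 1]        # one-step-ahead prediction task
u_test, r_test = u_all[T:T + T_test], u_all[T + 1:T + T_test + 1]

X = run_reservoir(u_train)
omega = np.linalg.solve(X @ X.T, X @ r_train)       # omega = (X X^T)^{-1} X r
X_test = run_reservoir(u_test)
print("train MSE:", np.mean((r_train - X.T @ omega) ** 2))
print("test  MSE:", np.mean((r_test - X_test.T @ omega) ** 2))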

79 Research Project : Learning in Large Dimensions/Axis 3 : Random Matrices and Neural Networks 29/38 Axis 3 : Random Matrices and Neural Networks
Problems and Objectives
- Performance evaluation is essentially qualitative
- Difficulty linked to the randomness in W and ε_t.
We need here to :
- study asymptotic performance as n, T, T̂ → ∞
- understand the effects of the hyper-parameters defining W
- generalize the study to more advanced models.
Results and Perspectives
- Asymptotic deterministic equivalents for the training and testing MSE
- Multiple new consequences and intuitions
- Proposition of improved structures for W
- Generalization to the non-linear setting
- Introduction of external memory, back-propagation
- Analogous study of deep networks, extreme learning machines, auto-encoders, etc.
29 / 38

86 Research Project : Learning in Large Dimensions/Axis 3 : Random Matrices and Neural Networks 30/38 Axis 3 : Random Matrices and Neural Networks
Theorem ([Couillet,Wainrib 16] Training MSE for fixed W)
As n, T → ∞ with n/T → c < 1,
    MSE = \frac{1}{T} r^T \left( I_T + R + \frac{1}{\eta^2} U^T \left\{ m^T (W^i)^T \tilde{R}^{-1} W^j m \right\}_{i,j=0}^{T-1} U \right)^{-1} r + o(1)
where U_{ij} = u_{i-j} and R, \tilde{R} are the solutions of
    R = \left\{ \frac{c}{n} \mathrm{tr}\left( S_{i-j} \tilde{R}^{-1} \right) \right\}_{i,j=1}^{T}, \qquad \tilde{R} = \sum_q \frac{1}{T} \mathrm{tr}\left( J^q (I_T + R)^{-1} \right) S_q
with [J^q]_{ij} = \delta_{i+q,j} and S_q \equiv \sum_{k \geq 0} W^{k+(-q)^+} \left( W^{k+q^+} \right)^T.
Corollaries :
- for c = 0 (so that \tilde{R} = S_0 = \sum_{k \geq 0} W^k (W^k)^T),
    MSE = \frac{1}{T} r^T \left( I_T + \frac{1}{\eta^2} U^T \left\{ m^T (W^i)^T S_0^{-1} W^j m \right\}_{i,j=0}^{T-1} U \right)^{-1} r + o(1)
- for W = σZ with Z Haar and m independent of W, ‖m‖ = 1,
    MSE = (1-c) \frac{1}{T} r^T \left( I_T + \frac{1}{\eta^2} U^T \mathrm{diag}\left\{ (1-\sigma^2) \sigma^{2(i-1)} \right\}_{i=1}^{T} U \right)^{-1} r + o(1).
30 / 38

89 Research Project : Learning in Large Dimensions/Axis 3 : Random Matrices and Neural Networks 31/38 Axis 3 : Random Matrices and Neural Networks
Figure : Train and test NMSE as a function of η² for prediction of the Mackey-Glass model, W = σZ, σ = .9, Z Haar; n = 200, T = T̂ = 400 (left) and n = 400 (right). Curves : Monte Carlo, theory (fixed W), theory (limit).
31 / 38

90 Research Project : Learning in Large Dimensions/Axis 3 : Random Matrices and Neural Networks 32/38 Axis 3 : Random Matrices and Neural Networks
Consequences : the analysis suggests the choice W = diag(W_1, ..., W_k), W_j = σ_j Z_j, Z_j ∈ R^{n_j×n_j} Haar, leading to the change
    (1-\sigma^2)\sigma^{2\tau} \;\longrightarrow\; MC(\tau) = \frac{\sum_{j=1}^k c_j \sigma_j^{2\tau}}{\sum_{j=1}^k \frac{c_j}{1-\sigma_j^2}}.
Figure : Memory curve MC(τ; W) for W = diag(W_1, W_2, W_3), W_j = σ_j Z_j, Z_j ∈ R^{n_j×n_j} Haar, σ_1 = .99, n_1/n = .01, σ_2 = .9, n_2/n = .1, σ_3 = .5, n_3/n = .89, compared with MC(τ; W_i^+) for the matrices W_i^+ = σ_i Z_i^+, Z_i^+ ∈ R^{n×n} Haar.
32 / 38
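The memory-curve expression above is straightforward to evaluate; below is a small sketch comparing the multi-scale reservoir of the figure with single-scale alternatives, where c_j is taken to be the proportion n_j/n (an assumption consistent with the figure's parameters) and MC reduces to (1−σ²)σ^{2τ} when k = 1.

import numpy as np

def memory_curve(tau, sigmas, cs):
    # MC(tau) = sum_j c_j sigma_j^(2 tau) / sum_j c_j / (1 - sigma_j^2).
    sigmas, cs = np.asarray(sigmas, float), np.asarray(cs, float)
    return np.sum(cs * sigmas ** (2 * tau)) / np.sum(cs / (1 - sigmas ** 2))

taus = np.arange(50)
mc_mix = [memory_curve(t, [0.99, 0.9, 0.5], [0.01, 0.1, 0.89]) for t in taus]  # W of the figure
mc_high = [memory_curve(t, [0.99], [1.0]) for t in taus]                       # W_1^+ = .99 Z_1^+
mc_low = [memory_curve(t, [0.5], [1.0]) for t in taus]                         # W_3^+ = .5 Z_3^+
print("MC(0), MC(10), MC(40), multi-scale :", mc_mix[0], mc_mix[10], mc_mix[40])
print("MC(0), MC(10), MC(40), sigma = .99 :", mc_high[0], mc_high[10], mc_high[40])
print("MC(0), MC(10), MC(40), sigma = .5  :", mc_low[0], mc_low[10], mc_low[40])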

91 Research Project : Learning in Large Dimensions/Axis 4 : Graphs 33/38 Outline Curriculum Vitae Research Project : Learning in Large Dimensions Axis 1 : Robust Estimation in Large Dimensions Axis 2 : Classification in Large Dimensions Axis 3 : Random Matrices and Neural Networks Axis 4 : Graphs 33 / 38

92 Research Project : Learning in Large Dimensions/Axis 4 : Graphs 34/38 Axis 4 : Graphs
Baseline Scenario : analysis of inference methods for large graphs
- community detection on realistic graphs (spectral methods)
- analysis of signal processing on graphs methods.
Tools :
- methods based on the adjacency matrix A, the Laplacian L, or the modularity matrix M :
    L = D − A,    M = A − E[A]    (e.g., if A_ij ~ Bern(q_i q_j), then M = A − q q^T)
- the spectrum and eigenvectors of A, L are fundamental to inference methods
- optimization, regression, PCA, etc., based on spectral properties.
34 / 38
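A minimal sketch of modularity-based spectral community detection as outlined above, on a toy graph with homogeneous connection probabilities q_i; the degree-based estimate of q and the two-class contrast are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
n = 1000
labels_true = np.repeat([0, 1], n // 2)
q = np.full(n, 0.5)                               # homogeneous intrinsic probabilities

# Adjacency A_ij ~ Bern(q_i q_j C_ab), with a mild intra/inter class contrast.
C = np.array([[1.2, 0.8], [0.8, 1.2]])
P = np.outer(q, q) * C[np.ix_(labels_true, labels_true)]
A = np.triu((rng.random((n, n)) < P).astype(float), 1)
A = A + A.T                                       # symmetric adjacency, no self-loops

d = A.sum(axis=1)
q_hat = d / np.sqrt(d.sum())                      # degree-based estimate of q (illustrative)
M = A - np.outer(q_hat, q_hat)                    # modularity matrix, estimate of A - E[A]
v = np.linalg.eigh(M)[1][:, -1]                   # dominant eigenvector of M
labels_hat = (v > 0).astype(int)
print("recovery accuracy:",
      max(np.mean(labels_hat == labels_true), np.mean(labels_hat != labels_true)))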

99 Research Project : Learning in Large Dimensions/Axis 4 : Graphs 35/38 Axis 4 : Graphs
Problems and Objectives
- Community detection relies on methods designed for homogeneous graphs
- Signal processing on graphs has been studied in a purely deterministic manner.
We need here to :
- develop and analyze community detection algorithms for realistic graphs
- analyze the performance of signal processing on graphs methods.
Results and Perspectives
- New algorithms (and their analysis) for community detection with heterogeneous nodes
- Exportation to signal processing on graphs problems.
35 / 38

106 Research Project : Learning in Large Dimensions/Axis 4 : Graphs 36/38 Axis 4 : Graphs
Model : graph G with n nodes and k classes, with, for i ∈ C_a, j ∈ C_b,
    A_{ij} \sim \mathrm{Bern}(q_i q_j C_{ab}),  where  C_{ab} = 1 + n^{-1/2} \Gamma_{ab},  \Gamma_{ab} = O(1).
Limitations of classical approaches :
Normalized modularity (with \hat{q} = \frac{1}{n} A 1_n) :
    L = \mathrm{diag}(\hat{q})^{-1} \left[ A - \frac{\hat{q} \hat{q}^T}{\frac{1}{n} 1_n^T \hat{q}} \right] \mathrm{diag}(\hat{q})^{-1}
Figure : 2nd eigenvector of A (top : degree bias) and 1st eigenvector of L (bottom : corrected bias), for bimodal q_i, 2 classes. Classes C_1 and C_2 in colors.
36 / 38
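A small simulation sketch of this model, comparing the dominant eigenvector of a raw modularity A − q̂q̂^T/((1/n)1_n^T q̂) with that of the degree-normalized version diag(q̂)^{-1}[·]diag(q̂)^{-1} reconstructed above; the bimodal q_i, the affinity matrix Γ and the median-split labelling rule are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(3)
n = 2000
labels = np.repeat([0, 1], n // 2)
q = np.where(rng.random(n) < 0.5, 0.3, 0.8)           # bimodal intrinsic probabilities q_i

Gamma = np.array([[4.0, -4.0], [-4.0, 4.0]])          # class affinities, C_ab = 1 + Gamma_ab / sqrt(n)
P = np.clip(np.outer(q, q) * (1 + Gamma[np.ix_(labels, labels)] / np.sqrt(n)), 0, 1)
A = np.triu((rng.random((n, n)) < P).astype(float), 1)
A = A + A.T                                           # symmetric adjacency, no self-loops

q_hat = A.sum(axis=1) / n                             # hat q = (1/n) A 1_n
M = A - np.outer(q_hat, q_hat) / q_hat.mean()         # A - hat q hat q^T / ((1/n) 1_n^T hat q)
L = M / np.outer(q_hat, q_hat)                        # diag(hat q)^{-1} [.] diag(hat q)^{-1}

for name, B in [("modularity", M), ("normalized modularity", L)]:
    v = np.linalg.eigh(B)[1][:, -1]                   # dominant eigenvector
    lab = (v > np.median(v)).astype(int)
    acc = max(np.mean(lab == labels), np.mean(lab != labels))
    print(name, ": recovery accuracy", round(acc, 3))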

108 Research Project : Learning in Large Dimensions/Axis 4 : Graphs 37/38 Axis 4 : Graphs
Theorem ([Tiomoko Ali,Couillet 16] Limiting Deterministic Equivalent)
As n → ∞, ‖L − L̄‖ → 0 almost surely, with
    \bar{L} = \frac{1}{m_q^2} \left[ \frac{1}{\sqrt{n}} D^{-1} X D^{-1} + U \Lambda U^T \right]
where D = diag({q_i}), m_q = \lim_n \frac{1}{n} \sum_i q_i, and
    U = \left[ \frac{J}{\sqrt{n}},\; D^{-1} \frac{X 1_n}{\sqrt{n}\, m_q} \right], \qquad
    \Lambda = \begin{bmatrix} (I_k - 1_k c^T) \Gamma (I_k - c 1_k^T) & 1_k \\ 1_k^T & 0 \end{bmatrix},
with J = [j_1, ..., j_k], j_a = [0, ..., 0, 1, ..., 1, 0, ..., 0]^T ∈ R^n the canonical vector of class C_a.
Consequences :
- detection based on the eigenvalues of Γ
- alignment of the eigenvectors with the vectors j_a.
37 / 38

110 Research Project : Learning in Large Dimensions/Axis 4 : Graphs 38/38 End Thank you. 38 / 38


High Dimensional Minimum Risk Portfolio Optimization High Dimensional Minimum Risk Portfolio Optimization Liusha Yang Department of Electronic and Computer Engineering Hong Kong University of Science and Technology Centrale-Supélec June 26, 2015 Yang (HKUST)

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Least squares regularized or constrained by L0: relationship between their global minimizers. Mila Nikolova

Least squares regularized or constrained by L0: relationship between their global minimizers. Mila Nikolova Least squares regularized or constrained by L0: relationship between their global minimizers Mila Nikolova CMLA, CNRS, ENS Cachan, Université Paris-Saclay, France nikolova@cmla.ens-cachan.fr SIAM Minisymposium

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Convex Optimization in Classification Problems

Convex Optimization in Classification Problems New Trends in Optimization and Computational Algorithms December 9 13, 2001 Convex Optimization in Classification Problems Laurent El Ghaoui Department of EECS, UC Berkeley elghaoui@eecs.berkeley.edu 1

More information

Empirical properties of large covariance matrices in finance

Empirical properties of large covariance matrices in finance Empirical properties of large covariance matrices in finance Ex: RiskMetrics Group, Geneva Since 2010: Swissquote, Gland December 2009 Covariance and large random matrices Many problems in finance require

More information

The Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017

The Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017 The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the

More information

Reservoir Computing and Echo State Networks

Reservoir Computing and Echo State Networks An Introduction to: Reservoir Computing and Echo State Networks Claudio Gallicchio gallicch@di.unipi.it Outline Focus: Supervised learning in domain of sequences Recurrent Neural networks for supervised

More information

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data Sri Lankan Journal of Applied Statistics (Special Issue) Modern Statistical Methodologies in the Cutting Edge of Science Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations

More information

Gaussian Process Optimization with Mutual Information

Gaussian Process Optimization with Mutual Information Gaussian Process Optimization with Mutual Information Emile Contal 1 Vianney Perchet 2 Nicolas Vayatis 1 1 CMLA Ecole Normale Suprieure de Cachan & CNRS, France 2 LPMA Université Paris Diderot & CNRS,

More information

STAT 518 Intro Student Presentation

STAT 518 Intro Student Presentation STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible

More information

On corrections of classical multivariate tests for high-dimensional data. Jian-feng. Yao Université de Rennes 1, IRMAR

On corrections of classical multivariate tests for high-dimensional data. Jian-feng. Yao Université de Rennes 1, IRMAR Introduction a two sample problem Marčenko-Pastur distributions and one-sample problems Random Fisher matrices and two-sample problems Testing cova On corrections of classical multivariate tests for high-dimensional

More information

Estimation of large dimensional sparse covariance matrices

Estimation of large dimensional sparse covariance matrices Estimation of large dimensional sparse covariance matrices Department of Statistics UC, Berkeley May 5, 2009 Sample covariance matrix and its eigenvalues Data: n p matrix X n (independent identically distributed)

More information

Detection of Anomalies in Texture Images using Multi-Resolution Features

Detection of Anomalies in Texture Images using Multi-Resolution Features Detection of Anomalies in Texture Images using Multi-Resolution Features Electrical Engineering Department Supervisor: Prof. Israel Cohen Outline Introduction 1 Introduction Anomaly Detection Texture Segmentation

More information

ESANN'2001 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2001, D-Facto public., ISBN ,

ESANN'2001 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2001, D-Facto public., ISBN , Sparse Kernel Canonical Correlation Analysis Lili Tan and Colin Fyfe 2, Λ. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong. 2. School of Information and Communication

More information

Research Overview. Kristjan Greenewald. February 2, University of Michigan - Ann Arbor

Research Overview. Kristjan Greenewald. February 2, University of Michigan - Ann Arbor Research Overview Kristjan Greenewald University of Michigan - Ann Arbor February 2, 2016 2/17 Background and Motivation Want efficient statistical modeling of high-dimensional spatio-temporal data with

More information

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms François Caron Department of Statistics, Oxford STATLEARN 2014, Paris April 7, 2014 Joint work with Adrien Todeschini,

More information

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin 1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)

More information

A spectral clustering algorithm based on Gram operators

A spectral clustering algorithm based on Gram operators A spectral clustering algorithm based on Gram operators Ilaria Giulini De partement de Mathe matiques et Applications ENS, Paris Joint work with Olivier Catoni 1 july 2015 Clustering task of grouping

More information

On corrections of classical multivariate tests for high-dimensional data

On corrections of classical multivariate tests for high-dimensional data On corrections of classical multivariate tests for high-dimensional data Jian-feng Yao with Zhidong Bai, Dandan Jiang, Shurong Zheng Overview Introduction High-dimensional data and new challenge in statistics

More information

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement Simon Leglaive 1 Laurent Girin 1,2 Radu Horaud 1 1: Inria Grenoble Rhône-Alpes 2: Univ. Grenoble Alpes, Grenoble INP,

More information

A SIRV-CFAR Adaptive Detector Exploiting Persymmetric Clutter Covariance Structure

A SIRV-CFAR Adaptive Detector Exploiting Persymmetric Clutter Covariance Structure A SIRV-CFAR Adaptive Detector Exploiting Persymmetric Clutter Covariance Structure Guilhem Pailloux 2 Jean-Philippe Ovarlez Frédéric Pascal 3 and Philippe Forster 2 ONERA - DEMR/TSI Chemin de la Hunière

More information

8.1 Concentration inequality for Gaussian random matrix (cont d)

8.1 Concentration inequality for Gaussian random matrix (cont d) MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration

More information

CS534 Machine Learning - Spring Final Exam

CS534 Machine Learning - Spring Final Exam CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the

More information

Machine Learning 2nd Edition

Machine Learning 2nd Edition INTRODUCTION TO Lecture Slides for Machine Learning 2nd Edition ETHEM ALPAYDIN, modified by Leonardo Bobadilla and some parts from http://www.cs.tau.ac.il/~apartzin/machinelearning/ The MIT Press, 2010

More information

Data Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis

Data Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis Data Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture

More information

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti

More information

Graphs in Machine Learning

Graphs in Machine Learning Graphs in Machine Learning Michal Valko INRIA Lille - Nord Europe, France Partially based on material by: Ulrike von Luxburg, Gary Miller, Doyle & Schnell, Daniel Spielman January 27, 2015 MVA 2014/2015

More information

Unsupervised Learning

Unsupervised Learning 2018 EE448, Big Data Mining, Lecture 7 Unsupervised Learning Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html ML Problem Setting First build and

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft

More information

Novel spectrum sensing schemes for Cognitive Radio Networks

Novel spectrum sensing schemes for Cognitive Radio Networks Novel spectrum sensing schemes for Cognitive Radio Networks Cantabria University Santander, May, 2015 Supélec, SCEE Rennes, France 1 The Advanced Signal Processing Group http://gtas.unican.es The Advanced

More information

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs

Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

Lecture 1: From Data to Graphs, Weighted Graphs and Graph Laplacian

Lecture 1: From Data to Graphs, Weighted Graphs and Graph Laplacian Lecture 1: From Data to Graphs, Weighted Graphs and Graph Laplacian Radu Balan February 5, 2018 Datasets diversity: Social Networks: Set of individuals ( agents, actors ) interacting with each other (e.g.,

More information

Bagging and Other Ensemble Methods

Bagging and Other Ensemble Methods Bagging and Other Ensemble Methods Sargur N. Srihari srihari@buffalo.edu 1 Regularization Strategies 1. Parameter Norm Penalties 2. Norm Penalties as Constrained Optimization 3. Regularization and Underconstrained

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!

More information

Outline lecture 2 2(30)

Outline lecture 2 2(30) Outline lecture 2 2(3), Lecture 2 Linear Regression it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic Control

More information

Kernel-Based Contrast Functions for Sufficient Dimension Reduction

Kernel-Based Contrast Functions for Sufficient Dimension Reduction Kernel-Based Contrast Functions for Sufficient Dimension Reduction Michael I. Jordan Departments of Statistics and EECS University of California, Berkeley Joint work with Kenji Fukumizu and Francis Bach

More information

Manifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA

Manifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA Manifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inria.fr http://perception.inrialpes.fr/

More information

Unsupervised Anomaly Detection for High Dimensional Data

Unsupervised Anomaly Detection for High Dimensional Data Unsupervised Anomaly Detection for High Dimensional Data Department of Mathematics, Rowan University. July 19th, 2013 International Workshop in Sequential Methodologies (IWSM-2013) Outline of Talk Motivation

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

Back to the future: Radial Basis Function networks revisited

Back to the future: Radial Basis Function networks revisited Back to the future: Radial Basis Function networks revisited Qichao Que, Mikhail Belkin Department of Computer Science and Engineering Ohio State University Columbus, OH 4310 que, mbelkin@cse.ohio-state.edu

More information

Course 495: Advanced Statistical Machine Learning/Pattern Recognition

Course 495: Advanced Statistical Machine Learning/Pattern Recognition Course 495: Advanced Statistical Machine Learning/Pattern Recognition Deterministic Component Analysis Goal (Lecture): To present standard and modern Component Analysis (CA) techniques such as Principal

More information

Probabilistic & Unsupervised Learning

Probabilistic & Unsupervised Learning Probabilistic & Unsupervised Learning Week 2: Latent Variable Models Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College

More information

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann (Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for

More information

Classification. The goal: map from input X to a label Y. Y has a discrete set of possible values. We focused on binary Y (values 0 or 1).

Classification. The goal: map from input X to a label Y. Y has a discrete set of possible values. We focused on binary Y (values 0 or 1). Regression and PCA Classification The goal: map from input X to a label Y. Y has a discrete set of possible values We focused on binary Y (values 0 or 1). But we also discussed larger number of classes

More information

Tropical Graph Signal Processing

Tropical Graph Signal Processing Tropical Graph Signal Processing Vincent Gripon To cite this version: Vincent Gripon. Tropical Graph Signal Processing. 2017. HAL Id: hal-01527695 https://hal.archives-ouvertes.fr/hal-01527695v2

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 10-708 Probabilistic Graphical Models Homework 3 (v1.1.0) Due Apr 14, 7:00 PM Rules: 1. Homework is due on the due date at 7:00 PM. The homework should be submitted via Gradescope. Solution to each problem

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Model Selection and Geometry

Model Selection and Geometry Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model

More information

Patch similarity under non Gaussian noise

Patch similarity under non Gaussian noise The 18th IEEE International Conference on Image Processing Brussels, Belgium, September 11 14, 011 Patch similarity under non Gaussian noise Charles Deledalle 1, Florence Tupin 1, Loïc Denis 1 Institut

More information

WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY,

WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY, WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY, WITH IMPLICATIONS FOR TRAINING Sanjeev Arora, Yingyu Liang & Tengyu Ma Department of Computer Science Princeton University Princeton, NJ 08540, USA {arora,yingyul,tengyu}@cs.princeton.edu

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

Predicting Graph Labels using Perceptron. Shuang Song

Predicting Graph Labels using Perceptron. Shuang Song Predicting Graph Labels using Perceptron Shuang Song shs037@eng.ucsd.edu Online learning over graphs M. Herbster, M. Pontil, and L. Wainer, Proc. 22nd Int. Conf. Machine Learning (ICML'05), 2005 Prediction

More information

A GLRT FOR RADAR DETECTION IN THE PRESENCE OF COMPOUND-GAUSSIAN CLUTTER AND ADDITIVE WHITE GAUSSIAN NOISE. James H. Michels. Bin Liu, Biao Chen

A GLRT FOR RADAR DETECTION IN THE PRESENCE OF COMPOUND-GAUSSIAN CLUTTER AND ADDITIVE WHITE GAUSSIAN NOISE. James H. Michels. Bin Liu, Biao Chen A GLRT FOR RADAR DETECTION IN THE PRESENCE OF COMPOUND-GAUSSIAN CLUTTER AND ADDITIVE WHITE GAUSSIAN NOISE Bin Liu, Biao Chen Syracuse University Dept of EECS, Syracuse, NY 3244 email : biliu{bichen}@ecs.syr.edu

More information

Correlation Preserving Unsupervised Discretization. Outline

Correlation Preserving Unsupervised Discretization. Outline Correlation Preserving Unsupervised Discretization Jee Vang Outline Paper References What is discretization? Motivation Principal Component Analysis (PCA) Association Mining Correlation Preserving Discretization

More information

Exploiting Sparse Non-Linear Structure in Astronomical Data

Exploiting Sparse Non-Linear Structure in Astronomical Data Exploiting Sparse Non-Linear Structure in Astronomical Data Ann B. Lee Department of Statistics and Department of Machine Learning, Carnegie Mellon University Joint work with P. Freeman, C. Schafer, and

More information

Scalable robust hypothesis tests using graphical models

Scalable robust hypothesis tests using graphical models Scalable robust hypothesis tests using graphical models Umamahesh Srinivas ipal Group Meeting October 22, 2010 Binary hypothesis testing problem Random vector x = (x 1,...,x n ) R n generated from either

More information

A central limit theorem for an omnibus embedding of random dot product graphs

A central limit theorem for an omnibus embedding of random dot product graphs A central limit theorem for an omnibus embedding of random dot product graphs Keith Levin 1 with Avanti Athreya 2, Minh Tang 2, Vince Lyzinski 3 and Carey E. Priebe 2 1 University of Michigan, 2 Johns

More information

Probabilistic Graphical Models for Image Analysis - Lecture 1

Probabilistic Graphical Models for Image Analysis - Lecture 1 Probabilistic Graphical Models for Image Analysis - Lecture 1 Alexey Gronskiy, Stefan Bauer 21 September 2018 Max Planck ETH Center for Learning Systems Overview 1. Motivation - Why Graphical Models 2.

More information

Deep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang

Deep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang Deep Learning Basics Lecture 7: Factor Analysis Princeton University COS 495 Instructor: Yingyu Liang Supervised v.s. Unsupervised Math formulation for supervised learning Given training data x i, y i

More information

Linear Models for Classification

Linear Models for Classification Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,

More information

Introduction PCA classic Generative models Beyond and summary. PCA, ICA and beyond

Introduction PCA classic Generative models Beyond and summary. PCA, ICA and beyond PCA, ICA and beyond Summer School on Manifold Learning in Image and Signal Analysis, August 17-21, 2009, Hven Technical University of Denmark (DTU) & University of Copenhagen (KU) August 18, 2009 Motivation

More information

STA 414/2104: Lecture 8

STA 414/2104: Lecture 8 STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

CS 340 Lec. 18: Multivariate Gaussian Distributions and Linear Discriminant Analysis

CS 340 Lec. 18: Multivariate Gaussian Distributions and Linear Discriminant Analysis CS 3 Lec. 18: Multivariate Gaussian Distributions and Linear Discriminant Analysis AD March 11 AD ( March 11 1 / 17 Multivariate Gaussian Consider data { x i } N i=1 where xi R D and we assume they are

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information