An Introduction to Nonlinear Principal Component Analysis

Adam Monahan
School of Earth and Ocean Sciences, University of Victoria

Overview
- Dimensionality reduction
- Principal Component Analysis
- Nonlinear PCA
  - theory
  - implementation
- Applications of NLPCA
  - Lorenz attractor
  - NH Tropospheric LFV
- Conclusions

Dimensionality Reduction
- Climate datasets made up of time series at individual stations/geographical locations
- Typical dataset has P ~ O(10^3) time series
- Organised structure in atmosphere/ocean flows
  - time series at different locations not independent
  - data does not fill out an isotropic cloud of points in R^P, but clusters around a lower-dimensional surface (reflecting the "attractor")
- Goal of dimensionality reduction in climate diagnostics is to characterise such structures in climate datasets

Dimensionality Reduction
Realising this goal has both theoretical and practical difficulties:
- Theoretical:
  - what is the precise definition of "structure"?
  - how to formulate an appropriate statistical model?
- Practical:
  - many important observational climate datasets are quite short, with O(10)-O(1000) statistical degrees of freedom
  - what degree of structure can be robustly diagnosed with existing data?

Principal Component Analysis
- A classical approach to dimensionality reduction is principal component analysis (PCA)
- Look for the M-dimensional hyperplane approximation, optimal in the least-squares sense:
  X(t) = Σ_{k=1}^{M} ⟨X(t), e_k⟩ e_k + ε(t)
  - minimising E{‖ε‖²}
  - inner product often (not always) the simple dot product
- Vectors e_k are the empirical orthogonal functions (EOFs)

Principal Component Analysis (figure)

Principal Component Analysis
- Operationally, EOFs are found as eigenvectors of the covariance matrix (in an appropriate norm)
- PCA is an optimally efficient characterisation of Gaussian data
- More generally: PCA provides optimally parsimonious data compression for any dataset whose distribution lies along orthogonal axes
- But what if the underlying low-dimensional structure is curved rather than straight? (cigars vs. bananas)
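For concreteness, a minimal numerical sketch (not from the talk) of PCA as described above, with the inner product taken to be the simple dot product: the EOFs e_k are eigenvectors of the sample covariance matrix, the principal component time series are the projections ⟨X(t), e_k⟩, and the fraction of variance explained comes from the eigenvalues. Array shapes and the synthetic example are illustrative assumptions.

```python
import numpy as np

def pca(X, M):
    """Minimal PCA sketch. Rows of X are observations X(t); columns are the P variables.
    Returns the leading M EOFs (columns of `eofs`), the principal component
    time series <X(t), e_k>, and the fraction of variance they explain."""
    Xc = X - X.mean(axis=0)                   # remove the time mean
    C = np.cov(Xc, rowvar=False)              # P x P sample covariance matrix
    evals, evecs = np.linalg.eigh(C)          # eigendecomposition (C is symmetric)
    order = np.argsort(evals)[::-1]           # largest eigenvalues first
    evals, evecs = evals[order], evecs[:, order]
    eofs = evecs[:, :M]                       # leading M EOFs
    pcs = Xc @ eofs                           # PC time series <X(t), e_k>
    var_frac = evals[:M].sum() / evals.sum()  # fraction of variance explained
    return eofs, pcs, var_frac

# synthetic anisotropic Gaussian cloud: one long axis, two short ones
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3)) * np.array([3.0, 1.0, 0.3])
eofs, pcs, var_frac = pca(X, M=1)
print(var_frac)   # close to 0.9: one EOF captures most of the variance
```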

Nonlinear Low-Dimensional Structure (figure)

Nonlinear PCA
- An approach to diagnosing nonlinear low-dimensional structure is Nonlinear PCA (NLPCA)
- Goal: find functions (with M < P)
  s_f : R^P → R^M,   f : R^M → R^P
  such that
  X(t) = (f ∘ s_f)(X(t)) + ε(t)
  where
  - E{‖ε‖²} is minimised
  - f(λ) is the approximation manifold
  - λ(t) = s_f(X(t)) is the manifold parameterisation (time series)

Nonlinear PCA
[Schematic: X(t) mapped by s_f to λ(t) = s_f(X(t)), then by f to the reconstruction X̂(t) = (f ∘ s_f)(X(t)). From Monahan, Fyfe, and Pandolfo (2003)]

Nonlinear PCA
- As with PCA, the fraction of variance explained is a measure of the quality of the approximation
- PCA is a special case of NLPCA
- When implemented, NLPCA should reduce to PCA if:
  - data is Gaussian
  - not enough data is available to robustly characterise non-Gaussian structure
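The quoted "fraction of variance explained" can be computed the same way for a PCA or an NLPCA reconstruction; a small helper sketch, assuming X and its approximation are arrays with time along the first axis:

```python
import numpy as np

def variance_explained(X, X_hat):
    """Fraction of the variance of X captured by the approximation X_hat:
    1 - E{||eps||^2} / E{||X - <X>||^2}, with eps = X - X_hat."""
    Xc = X - X.mean(axis=0)
    eps = X - X_hat
    return 1.0 - (eps ** 2).sum() / (Xc ** 2).sum()
```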

NLPCA: Implementation
- Implemented NLPCA using neural networks (convenient, not necessary)
- Parameter estimation more difficult than for PCA
  - PCA model is linear in its statistical parameters: Y = MX, so the variational problem has a unique analytic solution
  - NLPCA model is nonlinear in its model parameters, so the solution
    - may not be unique
    - must be found through numerical minimisation
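As an illustration of the neural-network implementation, here is a self-contained sketch of a bottleneck autoencoder version of NLPCA: the encoding map s_f is a tanh hidden layer followed by a linear M-dimensional bottleneck, the decoding map f is a tanh hidden layer followed by a linear output, and the parameters are found by numerically minimising the mean squared reconstruction error. The layer width H, the optimiser (finite-difference L-BFGS, chosen only to keep the sketch short), and the absence of regularisation and of the ensemble/validation machinery described later are all simplifications, not the settings of the original analysis.

```python
import numpy as np
from scipy.optimize import minimize

def nlpca(X, M=1, H=4, seed=0, maxiter=500):
    """Sketch of NLPCA as a P -> H -> M -> H -> P autoencoder.
    Returns lambda(t) = s_f(X(t)) and the reconstruction (f o s_f)(X(t))."""
    T, P = X.shape
    shapes = [(H, P), (H,), (M, H), (M,), (H, M), (H,), (P, H), (P,)]
    sizes = [int(np.prod(s)) for s in shapes]
    splits = np.cumsum(sizes)[:-1]

    def unpack(theta):
        return [w.reshape(s) for w, s in zip(np.split(theta, splits), shapes)]

    def forward(theta, X):
        W1, b1, W2, b2, W3, b3, W4, b4 = unpack(theta)
        h1 = np.tanh(X @ W1.T + b1)     # encoding hidden layer
        lam = h1 @ W2.T + b2            # bottleneck: lambda(t) = s_f(X(t))
        h2 = np.tanh(lam @ W3.T + b3)   # decoding hidden layer
        Xhat = h2 @ W4.T + b4           # reconstruction (f o s_f)(X(t))
        return lam, Xhat

    def cost(theta):                    # mean squared reconstruction error E{||eps||^2}
        _, Xhat = forward(theta, X)
        return np.mean((X - Xhat) ** 2)

    rng = np.random.default_rng(seed)
    theta0 = 0.1 * rng.standard_normal(sum(sizes))
    res = minimize(cost, theta0, method="L-BFGS-B", options={"maxiter": maxiter})
    return forward(res.x, X)

# Example: a noisy 1-D curve embedded in R^2, which a straight EOF cannot follow
rng = np.random.default_rng(1)
t = rng.uniform(0.0, 1.5 * np.pi, 400)
X = np.column_stack([np.cos(t), np.sin(t)]) + 0.05 * rng.standard_normal((400, 2))
lam, Xhat = nlpca(X, M=1, H=4)
```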

NLPCA: Parameter Estimation
Two fundamental issues regarding parameter estimation, common to all statistical models:
- Reproducibility:
  - model must be robust to the introduction of new data
  - new observations shouldn't fundamentally change the model
- Classifiability:
  - model must be robust to details of the optimisation procedure
  - model shouldn't depend on initial parameter values

NLPCA: Synthetic Gaussian Data (figure)

Applications of NLPCA: Lorenz Attractor (figures)
- Scatterplots
- 1D PCA approximation (60%)
- 1D NLPCA approximation (76%)
- 2D PCA approximation (94%)
- 2D NLPCA approximation (97%)

Applications of NLPCA: NH Tropospheric LFV (figures)
- EOFs 1 and 2 of 10-day lowpass-filtered 500 hPa geopotential height
- 1D NLPCA approximation: spatial structure (PCA: 14.8%; NLPCA: 18.4%)
- 1D NLPCA approximation: pdf of time series
- 1D NLPCA approximation: regime maps
- 1D NLPCA approximation: interannual variability

NLPCA: Limitations and Drawbacks
- Parameter estimation in NLPCA (as in any nonlinear statistical model) must be done very carefully to ensure a robust approximation
  - analysis is time-consuming and data-hungry
  - insufficiently careful analysis leads to spurious results (e.g. Christiansen, 2005)
- Theoretical underpinning of NLPCA is weak
  - no rigorous theory of sampling variability
- Information theory may provide new tools with
  - better sampling properties
  - better theoretical basis

Conclusions
- Traditional PCA is optimal for dimensionality reduction only if the data distribution falls along orthogonal axes
- Can define a nonlinear generalisation, NLPCA, which can robustly characterise nonlinear low-dimensional structure in datasets
- NLPCA approximations can provide a fundamentally different characterisation of data than PCA approximations
- Implementation of NLPCA is difficult and lacking in underlying theory; it represents a first attempt at a big (and challenging) problem

Acknowledgements
William Hsieh (UBC), Lionel Pandolfo (UBC), John Fyfe (CCCma), Qiaobin Teng (CCCma), Benyang Tang (JPL)

Parameter Estimation in NLPCA
- An ensemble approach was taken
- For a large number N (≈ 50) of trials:
  - data was randomly split into training and validation sets (taking autocorrelation into account)
  - a random initial parameter set was selected
- For each ensemble member, the iterative minimisation procedure was carried out until either:
  - the error over the training data stopped changing, or
  - the error over the validation data started increasing
- Method does not look for a global error minimum
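A sketch of this ensemble procedure, using scikit-learn's multilayer perceptron as the autoencoder purely for brevity. The network sizes, tolerance, and number of trials are illustrative, and the random sample-wise split below does not respect autocorrelation as the procedure in the talk does.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_ensemble(X, M=1, H=4, n_trials=50, train_frac=0.7, tol=1.2, seed=0):
    """For each trial: a fresh random train/validation split and random initial
    parameters; training stops early once the internal validation error stops
    improving; the trial is kept as a candidate model only if its validation
    error is comparable to its training error (no sign of overfitting)."""
    rng = np.random.default_rng(seed)
    T = X.shape[0]
    candidates = []
    for trial in range(n_trials):
        perm = rng.permutation(T)
        n_tr = int(train_frac * T)
        train, val = perm[:n_tr], perm[n_tr:]
        # P -> H -> M -> H -> P bottleneck autoencoder trained to reproduce its
        # input (here the bottleneck is tanh rather than linear, a simplification)
        net = MLPRegressor(hidden_layer_sizes=(H, M, H), activation="tanh",
                           solver="adam", max_iter=2000, early_stopping=True,
                           n_iter_no_change=20, random_state=trial)
        net.fit(X[train], X[train])
        err_tr = np.mean((X[train] - net.predict(X[train])) ** 2)
        err_va = np.mean((X[val] - net.predict(X[val])) ** 2)
        if err_va <= tol * err_tr:   # validation error comparable to training error
            candidates.append(net)
    return candidates
```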

Parameter Estimation in NLPCA
- An ensemble member becomes a candidate model if ε²_validation ≲ ε²_training (validation error not substantially larger than training error)
- Candidate models are then compared:
  - if they share the same shape and orientation, the approximation is robust
  - if they differ in shape and orientation, the approximation is not robust
- If the approximation is not robust, the model is simplified and the procedure repeated until a robust model is found
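The "same shape and orientation" comparison is made by inspecting the candidate approximations; one crude numerical proxy (an assumption of this sketch, not the talk's criterion) is the spread between the reconstructions produced by the candidate models returned by the ensemble sketch above:

```python
import numpy as np

def candidate_spread(X, candidates):
    """Relative RMS spread between candidate reconstructions of the same data.
    Values near zero: the candidates trace essentially the same manifold
    (robust approximation).  Large values: the candidates disagree, so the
    model should be simplified and the ensemble procedure repeated."""
    recons = np.stack([net.predict(X) for net in candidates])  # (n_candidates, T, P)
    return recons.std(axis=0).mean() / X.std()
```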

Parameter Estimation in NLPCA
- Procedure will ultimately yield the PCA solution if no robust non-Gaussian structure is present
- Such a careful procedure is necessary to avoid finding spurious non-Gaussian structure

Applications of NLPCA: Tropical Pacific SST (figures)
- EOF patterns
- 1D NLPCA approximation: spatial structure
