A cautionary note on the use of Nonlinear Principal Component Analysis to identify circulation regimes.


Bo Christiansen
Danish Meteorological Institute, Copenhagen, Denmark

B. Christiansen, Danish Meteorological Institute, Climate Research Division, Lyngbyvej 100, DK-2100 Copenhagen Ø, Denmark. (boc@dmi.dk)

Abstract. Recent studies of regime behaviour in extra-tropical variability have been based on a nonlinear extension to Principal Component Analysis. Multimodality has been identified in the Nonlinear Principal Component, and this multimodality has been interpreted as evidence for the existence of multiple circulation regimes. We show that multimodality is abundant in Nonlinear Principal Component Analysis when it is applied to sufficiently isotropic data, even if these data are inherently unimodal. We recommend that Nonlinear Principal Component Analysis should not be used for the detection of multimodality and regime behaviour.

1. Introduction

Linear multivariate statistical analysis methods are often used in the atmospheric sciences to extract leading patterns from high-dimensional data sets. A popular method is Principal Component Analysis (PCA), which by a rotation in phase space finds the directions that maximise the variance of the data. For highly nonlinear data the limitations of such linear methods are obvious, and recently nonlinear extensions to the linear multivariate methods have become popular. In particular, an extension to PCA known as Nonlinear Principal Component Analysis (NLPCA) was introduced to the atmospheric sciences through a study of a low-dimensional chaotic system (Monahan 2000). The method has subsequently been applied to the El Niño-Southern Oscillation (Hsieh 2001), the Quasi-Biennial Oscillation (Hamilton and Hsieh 2002), and the northern hemisphere extra-tropical circulation (Monahan et al. 2000, 2001, 2003, Teng et al. 2004).

The existence of regimes in the extra-tropical low-frequency variability may have important consequences for our understanding of climate and climate change (Palmer 1999). However, the existence of such regimes is still debated (Corti et al. 1999, Christiansen 2002, Stephenson et al. 2004). Regimes are often inferred from multimodality in a probability density estimate. As the probability density can only be reliably estimated in one and two dimensions, such studies are often based on heavily truncated data where only the directions spanned by the leading Principal Components are retained. Monahan et al. 2000, 2001, 2003 extended this method and studied multimodality in the leading component of NLPCA. The purpose of this note is to show that NLPCA often leads to spurious bimodality and that the use of NLPCA to detect atmospheric regime behaviour is therefore error-prone.

We will see that NLPCA produces spurious bimodality if the input data are sufficiently isotropic and that the spurious bimodality is a robust feature of the NLPCA. In section 2 we give a brief introduction to NLPCA. In section 3 we discuss some theoretical limitations of the NLPCA. In section 4 we present numerical experiments showing that NLPCA very often reports strong bimodality for unimodal distributions. The numerical experiments include both idealised data and observations of the northern hemisphere stratospheric circulation.

2. Nonlinear Principal Component Analysis

Here we briefly describe the method of Nonlinear Principal Component Analysis. More details can be found in the recent review by Hsieh 2004.

Nonlinear Principal Component Analysis was proposed by Kramer 1991. Given is the data set $x_j$, $j = 1, 2, \ldots, n$, with $x = (x_1, x_2, \ldots, x_l)$. The data set can be seen as $n$ (often temporal) samples in an $l$-dimensional phase space. As for linear PCA, one searches for a function $f$ from $R^l$ to $R^l$, where $R$ is the set of real numbers, such that the mean square error $\sum_j \| x_j - f(x_j) \|^2$ is minimised. In linear PCA $f$ is restricted to be linear. In NLPCA this restriction is lifted, and the challenge lies in obtaining a balance between the smoothness of $f$ and the size of the error.

Kramer 1991 proposed that $f$ is defined by a feed-forward neural network with three hidden layers. The layout of the neural network is shown in Fig. 1. The input and output layers contain $l$ neurons, the second and fourth layers contain $m$ neurons, and the central bottleneck layer contains a single neuron. Letting $N_j$ be the number of neurons in the $j$th layer and $u^j = (u^j_1, u^j_2, \ldots, u^j_{N_j})$ be the state of that layer, then $u^j = s^j(w^j u^{j-1} + b^j)$. The weight, $w^j$, is an $N_j \times N_{j-1}$ matrix and the bias, $b^j$, is a vector of length $N_j$. For the transfer functions we choose the hyperbolic tangent for $s^1$ and $s^3$ and the identity function for $s^2$ and $s^4$. The data set $x_j$ is fed into the input layer and the weights and biases are adjusted by numerical methods to minimise the error.

The neural network is then a composition of two nonlinear maps, $f = f_2 \circ f_1$, with $f_1$ from $R^l$ to $R$ and $f_2$ from $R$ to $R^l$. The state of the bottleneck neuron, $u^3$, is called the score or the Nonlinear Principal Component and is often denoted by $\lambda$. The map $f_2$ defines a one-dimensional curve in $R^l$ called the NLPCA mode. The projection of the original data on the NLPCA mode is known as the NLPCA approximation and is described by the map $f_1$.

3. Theoretical limitations and considerations

The way chosen to parameterise the functions $f_1$ and $f_2$, i.e., the architecture of the neural network, has important consequences for the characteristics of the Nonlinear Principal Component, $\lambda$, and for the characteristics of the possible NLPCA modes. Here we first describe how the architecture of the first layers favours multimodality in the distribution of $\lambda$ even for normally distributed inputs. We then describe some restrictions on the possible curves in $R^l$ when there are 2 neurons in the fourth layer. This choice of $m$ was often made in recent literature. We conclude the section with a very general consideration on the ambiguity of the Nonlinear Principal Component $\lambda$ (Malthouse 1998).

The state of the bottleneck layer, the Nonlinear Principal Component $\lambda$, is a linear combination of hyperbolic tangents, $\lambda = \sum_{i=1}^{m} w^2_i \tanh v_i + b^2$, where the superscript 2 is a layer index and the $v_i$ are linear combinations of the states of the input neurons. If the inputs are normally distributed, the $v_i$ will also be normally distributed. However, the distributions of the terms $\tanh v_i$ can be bimodal due to the nonlinearity of the hyperbolic tangent. The distributions are only unimodal if the $v_i$ are small, so that they fall on the linear part of the hyperbolic tangent. In fact, if $v$ is normally distributed with zero mean and variance $\sigma^2$, then the distribution of $\tanh v$ is proportional to $\exp(-v^2/(2\sigma^2))/(1 - \tanh^2 v)$, which is bimodal for $\sigma^2 > 1/2$ and unimodal otherwise. Thus, even with normally distributed inputs, the nonlinearity of the neural network (which is the basic feature of all neural networks, also in the field of NLPCA) will produce multimodality. It is seen that for $m = 2$ up to four peaks can be present in the distribution of $\lambda$ if the inputs are normally distributed.

The class of NLPCA modes which can be described by the neural network is determined by the number, $m$, of hidden neurons in the fourth layer. When $m = 1$ only straight lines can be described, while any continuous curve can be described for $m \to \infty$ (Malthouse 1998). For $m = 2$ the curves are described by $x_i = \tanh t$ and $x_j = d_j \tanh t + \tanh(at + b)$ in every two-dimensional projection $(x_i, x_j)$ of $R^l$. This holds up to a translation, scaling and rotation because the map from the fourth layer to the output layer is affine. Here $t$ is linearly related to $\lambda$. For every two-dimensional projection these curves will have either 0, 1 or 2 turning points in each direction, i.e., points where $\partial x_i/\partial \lambda$ is zero. It can also be seen that the curves will have the same asymptotic slope for $t \to \infty$ as for $t \to -\infty$. Figure 2 shows some examples of possible curves. Note that the most complex curve possible for $m = 2$ is the Z-shaped curve in Fig. 2d with two parallel outer branches.

A general consideration of direct importance for the subject of this paper was pointed out by Malthouse 1998. The neural network implementation of the NLPCA determines a parameterisation $\lambda$, with $\lambda = f_1(x_{in})$ and $x_{out} = f_2(\lambda)$, which minimises the error. However, any other parameterisation $s = g(\lambda)$, where $g$ is invertible, will have the same value of the error, although the maps $s = g \circ f_1(x_{in})$ and $x_{out} = f_2 \circ g^{-1}(s)$ may not be realizable for the neural network. The distribution of $s$ will not necessarily have the same characteristics as the distribution of $\lambda$. As an example, choosing $g(\lambda) = \int_{-\infty}^{\lambda} P(x)\,dx$, where $P$ is the distribution of $\lambda$, will make $s$ uniformly distributed even if $P$ is bimodal. Therefore, only the order of $\lambda$ is interpretable but not its magnitude. Often $\lambda$ is not used directly but a transformation to the arc length is performed. This choice of parameterisation may improve the NLPCA in some applications (Newbigging et al. 2003), but does not remove the basic ambiguity of the NLPCA.

4. Numerical examples

In this section we show that NLPCA regularly produces strongly bimodal parameterisations even if the underlying distribution is unimodal. We focus on two examples. The first is a two-dimensional Gaussian distribution and the second is an analysis of the low-frequency variability of the northern hemisphere stratospheric extra-tropical geopotential height. The latter example is motivated by the NLPCA study of Monahan et al. 2003, who reported three circulation regimes in the stratospheric circulation.

Minimisation is carried out by the Broyden-Fletcher-Goldfarb-Shanno algorithm. To avoid the problem of getting stuck in a local minimum, the minimisation is repeated 1000 times starting from different initial conditions and the best fit is chosen. The resulting Nonlinear Principal Component is transformed into the arc length with values between 0 and 1. However, this transformation is not important for our conclusions, as the bimodality reported with the arc length parameterisation is also present with the original Nonlinear Principal Component.

Two-dimensional probability density estimates are calculated with the kernel density estimate procedure with a Gaussian kernel, using the algorithm based on the Fast Fourier Transform (Silverman 1986) with a smoothing parameter of 0.2. Here the probability density estimates are only shown for illustrative purposes and the precise value of the smoothing parameter is not important. One-dimensional histograms are calculated with a bin width of $1/\sqrt{n}$, where $n$ is the number of samples.

4.1. Two-dimensional Gaussian distributions

We first consider the two-dimensional Gaussian distribution. We want to study how the degree of isotropy in the distribution influences the NLPCA. We focus on Gaussian distributions with centres at $(0, 0)$ and widths $(1, c)$, where $c$ is a constant between 0 and 1. For different values of $c$ we randomly draw 1000 independent pairs of numbers from this distribution and calculate the NLPCA.

Figures 3 and 4 show results for $c = 0.2$ and $c = 0.8$, respectively. The upper panels show the two-dimensional probability distributions and the lower panels show the histograms of the Nonlinear Principal Components. For $c = 0.2$ the two-dimensional probability distribution estimate is unimodal, while for $c = 0.8$ some deviations from unimodality are visible due to sampling variability. Overlaid on the probability density functions are the NLPCA mode and the NLPCA approximation to the data.

For $c = 0.2$ the NLPCA mode is a straight horizontal line. The data are almost perfectly projected vertically onto this curve and the NLPCA basically simulates a linear least squares fit. Accordingly, the Nonlinear Principal Component is almost normally distributed.

For $c = 0.8$ the situation is quite different. Now the NLPCA mode is Z-shaped and the histogram of the Nonlinear Principal Component is strongly bimodal. The two peaks correspond to the two outer branches of the NLPCA mode, while the middle branch is very sparsely populated. Points to the lower right (upper left) of a curve centred in between and parallel with the two outer branches are projected onto the lower right (upper left) branch of the NLPCA mode. The projections are almost perpendicular and the NLPCA basically simulates a least squares fit to two parallel lines.

We have repeated the calculations for many sets of 1000 randomly drawn pairs, and the results described above are typical for the two values of $c$. For $c = 0.2$ the NLPCA mode is always an almost straight horizontal line and the distribution of the Nonlinear Principal Component is always unimodal. For $c = 0.8$ the NLPCA mode is always Z-shaped and the distribution of the Nonlinear Principal Component is always bimodal. The orientation of the Z-shape varies, with a seeming affinity for the orientation in Fig. 4 and its reflections about the symmetry axes of the two-dimensional Gaussian distribution. We will discuss the reproducibility of the results in more detail in section 4.3. The results do not depend on the number of random pairs and can be reproduced with 500 or 5000 pairs instead of 1000.

For $c = 0.5$ the results of the NLPCA are less robust. Now some sets of 1000 randomly drawn pairs result in a straight horizontal NLPCA mode with a unimodal Nonlinear Principal Component, while other sets result in a Z-shaped NLPCA mode with a bimodal Nonlinear Principal Component.

In this subsection we have shown that the NLPCA can produce strongly bimodal Nonlinear Principal Components even if the input data are Gaussian. If the input data are sufficiently isotropic, NLPCA will always find bimodality.

4.2. The extra-tropical circulation

Nonlinear Principal Component Analysis was used by Monahan et al. 2001, 2003 to study the dynamical structure of the northern hemisphere extra-tropical variability. They sub-sampled the original daily geopotential heights from the National Centers for Environmental Prediction/National Center for Atmospheric Research reanalysis (Kalnay et al. 1996) to a coarser grid of 72 longitudes and 36 latitudes, removed the annual cycle, low-pass filtered, and selected the December to February seasons. Then they calculated the leading linear PCs north of 20°N and used those as inputs to the NLPCA. With this approach Monahan et al. 2001, 2003 found three regimes in both the stratosphere and the troposphere. We will here discuss the stratospheric results, although we have obtained similar results for the troposphere.

We use the same procedure as Monahan et al. except that we have used a 30-day low-pass filter instead of a 10-day filter in order to reproduce their results in detail. The same choice was made in Christiansen 2002. We calculate the two leading linear PCs, $a_1$ and $a_2$, from the 20 hPa geopotential height and normalise them both with the standard deviation of $a_1$. By definition the difference between the joint probability density $P(a_1, a_2)$ and the product $P(a_1)P(a_2)$ of the marginal probabilities is zero if $a_1$ and $a_2$ are statistically independent. This difference is shown in the upper panel of Fig. 5 and it is almost identical to Fig. 10 of Monahan et al. 2003.

The distributions of the two leading PCs are both unimodal. The distribution of the first PC, $a_1$, is skewed towards larger values, while the distribution of the second PC cannot be distinguished from a Gaussian distribution. The NLPCA mode, which is overlaid on the probability density, is Z-shaped and the probability distribution of the Nonlinear Principal Component (lower panel of Fig. 5) is highly bimodal.

Our NLPCA approximation does not have the same orientation as the NLPCA approximation of Monahan et al. 2003. However, the orientations of the two NLPCA approximations are almost identical up to a reflection about the horizontal axis, about which the distribution is approximately symmetric. As the relative width of the distributions of the two linear PCs is 0.64, this ambiguity should be expected from the experience of the previous subsection.

Monahan et al. 2003 argue that the orientation of and the peaks in the difference between the joint probability density $P(a_1, a_2)$ and the product $P(a_1)P(a_2)$ of the marginal probabilities support the results of their NLPCA. However, as shown in Christiansen 2002, this difference does not have solid physical meaning as it is not preserved under an orthonormal rotation.

As an additional test we have constructed surrogate data $(x_1, x_2)$, where $x_1$ is randomly drawn from $P(a_1)$ and $x_2$ is randomly drawn from $P(a_2)$. Consequently $x_1$ and $x_2$ are statistically independent and $(x_1, x_2)$ follows the distribution $P(a_1)P(a_2)$, where both $P(a_1)$ and $P(a_2)$ are unimodal. The resulting NLPCA approximation and the distribution of the Nonlinear Principal Component (Fig. 6) resemble those of the original data (Fig. 5), although the only possible deviation from unimodality in the surrogate data is due to chance. This result is not sensitive to the sample size. While Fig. 6 is based on a sample of 1000 points, similar results are found with samples of 300 points.

4.3. Reproducibility and overfitting

We saw in section 4.1 that NLPCA produces bimodality and Z-shaped NLPCA modes even for Gaussian distributed data if the isotropy is large enough. We also saw that the orientation of the Z-shape for $c = 0.8$ was not uniquely determined due to the symmetry of the data. For $c = 1$ the orientation of the Z-shaped NLPCA mode is completely random, as should be expected. If the symmetry of the data is lifted, e.g., by drawing the data sets (with a sample size of 300 or 1000) from an asymmetric distribution, the NLPCA mode is again Z-shaped with a bimodal Nonlinear Principal Component, but now its orientation does not vary among the different realizations (not shown).

There is still hope that a procedure that carefully tests the reproducibility of the NLPCA would reject the spurious bimodality, as such a test could certainly reject the bimodality in our experiments with Gaussian distributed data. To address the possibility of such a method we need to compare the NLPCA of the atmospheric data with an NLPCA of carefully constructed surrogate data. These surrogate data should resemble the atmospheric data but be drawn from a unimodal distribution. In particular, and in contrast to the data analysed in Fig. 6, the surrogate data should now have the same serial correlations as the atmospheric data.

We have constructed a two-dimensional Gaussian data set with the same number of samples, the same serial correlations, and the same widths as the original PCs from the 20 hPa geopotential height. To this end we have used the linear method described in Winkler et al. 2001. We proceed by performing the NLPCA on 25 different subsets of both the original and the surrogate data. The subsets each contain 80 % of the data (as in Monahan 2000). If the NLPCA of the surrogate data were less reproducible than the NLPCA of the original data, it would suggest that the Z-shaped NLPCA mode actually reflects some structure in the data. However, as illustrated by the four realizations in Fig. 7, there is no difference between the reproducibility of the NLPCA of the original data and the NLPCA of the surrogate data. The same result is found if the atmospheric data are compared to the skewed surrogate data described at the end of section 4.2 with sample sizes of either 300 or 1000 points. Therefore, statistical tests like those described in Monahan 2000 will not be able to reject the spurious bimodality detected by the NLPCA.
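The two data-handling steps used in these tests (drawing surrogate pairs from the product of the marginals, as in section 4.2, and forming 25 subsets that each hold 80 % of the samples) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the code used for the paper: the function names `marginal_surrogate` and `random_subsets` are hypothetical, the marginals are approximated by resampling the observed values with replacement, the subsets are assumed to be drawn at random without replacement, and the NLPCA fit itself is left out. The serially correlated Gaussian surrogate of Winkler et al. 2001 is not reproduced here.

```python
import random


def marginal_surrogate(a1, a2, n, rng):
    """Draw n pairs (x1, x2) whose components are resampled independently
    (with replacement) from the empirical marginals of a1 and a2.

    Assumption: resampling observed values stands in for drawing from
    P(a1) and P(a2); any dependence between a1 and a2 is destroyed.
    """
    x1 = [rng.choice(a1) for _ in range(n)]
    x2 = [rng.choice(a2) for _ in range(n)]
    return list(zip(x1, x2))


def random_subsets(data, n_subsets=25, fraction=0.8, rng=None):
    """Return n_subsets random subsets, each holding `fraction` of the samples.

    Assumption: each subset is drawn without replacement and independently
    of the other subsets.
    """
    rng = rng or random.Random()
    size = int(round(fraction * len(data)))
    return [rng.sample(data, size) for _ in range(n_subsets)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-ins for the two leading PCs (not the reanalysis data);
    # 0.64 mimics the relative width quoted in section 4.2.
    a1 = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    a2 = [rng.gauss(0.0, 0.64) for _ in range(1000)]
    surrogate = marginal_surrogate(a1, a2, 1000, rng)
    subsets = random_subsets(surrogate, rng=rng)
    print(len(subsets), len(subsets[0]))
```

In such a setup an NLPCA fit would be run on every subset of both the original and the surrogate pairs, and the spread of the resulting modes compared between the two.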

One could argue that the bimodality and the Z-shaped NLPCA modes are due to overfitting, which is a well-known problem with neural network methods (Hsieh 2004). One approach to avoid overfitting is to add a penalty term to the mean square error and minimise the sum $\frac{1}{nl} \sum_j \| x_j - f(x_j) \|^2 + p \sum_{j=1}^{4} \sum_{i=1}^{N_j} (w^j_i)^2$. The penalty term is proportional to the sum of the squared weights, and the inclusion of this term will reduce the nonlinearity of the neural network. We have repeated the NLPCA in section 4.2 of the 20 hPa geopotential height and the surrogate data with different values of the penalty parameter $p$. With strong penalty the NLPCA mode is a straight line. With decreasing penalty the NLPCA mode first takes the form of a curved line before it adopts the Z-shape. The same pattern is found for both the real data and the surrogate data (Fig. 8). Only for the strongest penalty, where the NLPCA mode is a straight line, will the distribution of the Nonlinear Principal Component be unimodal.

5. Discussion

Recently, Nonlinear Principal Component Analysis has been introduced to the atmospheric sciences, and the observed multimodality in the Nonlinear Principal Component has been taken as evidence for regimes in both the tropospheric and stratospheric circulation. We have reviewed the usefulness of NLPCA as a tool for the detection of multimodality and found severe problems and limitations. Our review included both theoretical arguments and numerical simulations based on both idealised data sets and observations of the northern hemisphere stratospheric circulation.

Theoretically, we have shown that the Nonlinear Principal Component easily becomes multimodal even if the input is normally distributed. The multimodality is a consequence of the nonlinearity of the NLPCA. We also saw that with two neurons in the fourth layer the NLPCA approximation can only be a straight line, a simple curve, or a Z-shaped structure. Finally, we reiterated a limitation of the method first reported by Malthouse 1998. This limitation states that only the order of the Nonlinear Principal Component is interpretable but not its magnitude.

Numerical simulations based on Gaussian distributed data sets confirmed that the NLPCA produces multimodality when fed with unimodal data. We saw that multimodality resulted when the distribution of the input data was sufficiently isotropic. The multimodality of the Nonlinear Principal Component was accompanied by a Z-shaped NLPCA mode with a sparsely populated middle branch, so that the NLPCA approximation effectively consisted of two parallel line pieces.

We repeated the NLPCA analysis of the northern hemisphere stratospheric circulation reported by Monahan et al. 2003. Like Monahan et al. 2003, we found bimodality in the distribution of the Nonlinear Principal Component and a Z-shaped NLPCA approximation in the space spanned by the two leading PCs. We also found similar bimodality and a Z-shaped NLPCA approximation for data drawn randomly from the product of the marginal distributions of the original PCs.

The two-dimensional, uncorrelated Gaussian data studied in section 4.1 were constructed to offer a simple and clean test case. We saw that the Z-shaped NLPCA mode appeared in all realizations but that its orientation varied, with an affinity for certain orientations determined by the symmetry of the data. If the symmetry of the data is lifted, e.g., by drawing the numbers from an asymmetric distribution, the NLPCA mode is again Z-shaped with a bimodal Nonlinear Principal Component, but now its orientation does not vary among the different realizations. We elaborated on this point in section 4.3, where we saw that the atmospheric data and appropriate unimodal surrogate data have the same reproducibility, so that attempts to validate the method by training it on one part of the data and testing it on the remaining part will fail to reject spurious bimodality.

In the present paper we have focused on the bivariate case. However, additional experiments with data sets of higher dimensions show that the NLPCA also produces spurious bimodality there when the data are sufficiently isotropic and the penalty factor is small enough. We also note that the spurious multimodality is robust to changes in the architecture of the neural network, such as the transfer functions and the number of neurons, $m$, in the second and fourth layers.

We conclude that the NLPCA abundantly produces spurious multimodality and that it should not be used for the detection of multimodality and regime behaviour.

Acknowledgments. The author would like to acknowledge insightful comments provided by Matt Newman during the review process, which substantially improved the manuscript. This work was supported by the Danish Climate Centre. The NCEP Reanalysis data were provided by the NOAA-CIRES Climate Diagnostics Center, Boulder, Colorado, USA, from their Web site at http://www.cdc.noaa.gov/.

References

Corti, S., F. Molteni, and T. N. Palmer, Signature of recent climate change in frequencies of natural atmospheric circulation regimes, Nature, 398, 799-802, 1999.

Christiansen, B., On the physical nature of the Arctic Oscillation, Geophys. Res. Lett., 29(16), 10.1029/2001GL015208, 2002.

Hamilton, K., and W. W. Hsieh, Representation of the QBO in the tropical stratospheric wind by nonlinear principal component analysis, J. Geophys. Res., 107(D15), 4232, doi:10.1029/2001JD001250, 2002.
Hsieh, W. W., Nonlinear principal component analysis by neural networks, Tellus, A53, 599-615, 2001.
Hsieh, W. W., Nonlinear multivariate and time series analysis by neural network methods, Rev. Geophys., 42, RG1003, 2004.
Kalnay, E., et al., The NCEP/NCAR 40-year reanalysis project, Bull. Am. Meteorol. Soc., 77, 437-471, 1996.
Kramer, M. A., Nonlinear principal component analysis using autoassociative neural networks, AIChE J., 37, 233-243, 1991.
Malthouse, E. C., Limitations of nonlinear PCA as performed with generic neural networks, IEEE Trans. Neural Networks, 9, 165-173, 1998.
Monahan, A. H., Nonlinear Principal Component Analysis by neural networks: Theory and application to the Lorenz system, J. Climate, 13, 821-835, 2000.
Monahan, A. H., J. Fyfe, and G. M. Flato, A regime view of northern hemisphere atmospheric variability and change under global warming, Geophys. Res. Lett., 27, 1139-1142, 2000.
Monahan, A. H., L. Pandolfo, and J. Fyfe, The preferred structure of variability of the northern hemisphere atmospheric circulation, Geophys. Res. Lett., 28, 1019-1022, 2001.
Monahan, A. H., L. Pandolfo, and J. Fyfe, The vertical structure of wintertime climate regimes of the northern hemisphere extratropical atmosphere, J. Climate, 16, 2005-2020, 2003.

Newbigging, S. C., L. A. Mysak, and W. W. Hsieh, Improvements to the Non-linear Principal Component Analysis method, with applications to ENSO and QBO, Atmosphere-Ocean, 41, 291-299, 2003.
Palmer, T. N., A nonlinear dynamical perspective on climate change, J. Climate, 12, 575-591, 1999.
Silverman, B. W., Density estimation for statistics and data analysis, 175 pp., Chapman and Hall, 1986.
Stephenson, D. B., A. Hannachi, and A. O'Neill, On the existence of multiple climate regimes, Q. J. R. Meteorol. Soc., 130, 583-605, 2004.
Teng, Q., A. H. Monahan, and J. C. Fyfe, Effects of time averaging on climate regimes, Geophys. Res. Lett., 31, L22203, doi:10.1029/2004GL020840, 2004.
Winkler, C. R., M. Newman, and P. D. Sardeshmukh, A linear model of wintertime low-frequency variability. Part I: Formulation and forecast skill, J. Climate, 14, 4474-4494, 2001.

[Figure 1 diagram: layers 1 to 5 with l, m, 1, m, and l neurons; activation functions for layers 2 to 5 are nonlinear, linear, nonlinear, and linear.]

Figure 1. The architecture of the neural network for Nonlinear Principal Component Analysis. The network has 5 layers, and the middle bottleneck layer consists of a single neuron. The number of neurons in the input and output layers is l. The second and fourth layers have m neurons. The activation function is linear for the bottleneck layer and for the output layer and nonlinear for layers 2 and 4.
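The architecture in Figure 1 can be illustrated with a minimal forward-pass sketch. This is not the author's implementation: the choice of tanh as the nonlinear activation, the random (untrained) weights, and the names `init_params`, `encode`, and `decode` are assumptions made for illustration only.

```python
import numpy as np

def init_params(l, m, rng):
    """Random, untrained weights for an l-m-1-m-l network (illustrative only)."""
    return {
        "W1": rng.normal(size=(m, l)), "b1": rng.normal(size=m),  # layer 1 -> 2
        "w2": rng.normal(size=m),      "b2": rng.normal(),        # layer 2 -> 3
        "w3": rng.normal(size=m),      "b3": rng.normal(size=m),  # layer 3 -> 4
        "W4": rng.normal(size=(l, m)), "b4": rng.normal(size=l),  # layer 4 -> 5
    }

def encode(x, p):
    """Map samples x of shape (n, l) to the single bottleneck value, shape (n,)."""
    h = np.tanh(x @ p["W1"].T + p["b1"])  # layer 2: nonlinear (tanh assumed)
    return h @ p["w2"] + p["b2"]          # layer 3: one linear neuron

def decode(u, p):
    """Map bottleneck values u of shape (n,) to outputs of shape (n, l)."""
    g = np.tanh(np.outer(u, p["w3"]) + p["b3"])  # layer 4: nonlinear (tanh assumed)
    return g @ p["W4"].T + p["b4"]               # layer 5: linear output

rng = np.random.default_rng(0)
params = init_params(l=2, m=2, rng=rng)
x = rng.normal(size=(5, 2))   # five hypothetical two-dimensional samples
u = encode(x, params)         # bottleneck values, one per sample
y = decode(u, params)         # network output for each sample
```

In this sketch every output component is a weighted sum of m tanh functions of the single bottleneck value u, which is why the curve traced by `decode` has only a limited repertoire of shapes when m = 2.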

Figure 2. Some possible shapes of the output of the neural network with m = 2. The shapes are limited by a maximum of two turning points in each direction. The asymptotic slopes are the same at both ends of the curves. However, this is not discernible in a) and b), where the asymptotic behaviour is limited to very short parts at the edges of the curves.

Figure 3. Results from an NLPCA of a two-dimensional Gaussian distribution with centre (0, 0) and width (1, 0.2). In the upper panel the coloured contours (arbitrarily chosen) show the probability density estimate, the yellow curve the NLPCA mode, and the filled circles the NLPCA approximation. The lower panel shows the histogram of the Nonlinear Principal Component (arc length). The numbers of neurons in the neural net are l = 2 and m = 2.

Figure 4. As in Fig. 3, but for a two-dimensional Gaussian distribution with centre (0, 0) and width (1, 0.8).

Figure 5. Results from an NLPCA of the northern hemisphere wintertime 20 hPa geopotential height in the space spanned by the two leading linear principal components. In the upper panel the contours (arbitrary scaling) show the difference between the joint probability density P(a_1, a_2) and the product P(a_1)P(a_2) of the marginal probabilities, the yellow curve the NLPCA mode, and the filled circles the NLPCA approximation. The lower panel shows the histogram of the Nonlinear Principal Component (arc length). The numbers of neurons in the neural net are l = 2 and m = 2.

Figure 6. As in Fig. 5, except that the 1000 data points are drawn randomly from the distribution P(a_1)P(a_2), where P(a_1) and P(a_2) are the marginal distributions of the two leading linear principal components of the northern hemisphere wintertime 20 hPa geopotential height. The arbitrarily scaled contours show the two-dimensional probability density.
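One way to draw points from a product of marginal distributions, as in Figure 6, is to resample each principal component independently of the other. The sketch below is an assumption about how this could be done, not the procedure used in the paper: it resamples with replacement from the empirical marginals, and the function name `product_of_marginals_sample` and the synthetic input `pcs` are hypothetical.

```python
import numpy as np

def product_of_marginals_sample(pcs, n, rng):
    """Draw n points whose columns are resampled independently from the
    columns of pcs, so each marginal is kept but the dependence between
    columns is removed (bootstrap resampling is an assumption here)."""
    pcs = np.asarray(pcs)
    cols = [rng.choice(pcs[:, j], size=n, replace=True)
            for j in range(pcs.shape[1])]
    return np.column_stack(cols)

rng = np.random.default_rng(1)

# Hypothetical stand-in for the two leading PCs: the second column
# depends nonlinearly on the first, so the joint density is not a product.
pcs = rng.normal(size=(500, 2))
pcs[:, 1] += pcs[:, 0] ** 2

surrogate = product_of_marginals_sample(pcs, n=1000, rng=rng)
```

Because each column is drawn with its own independent set of indices, any structure in the surrogate that an analysis attributes to dependence between the two components cannot come from the data themselves.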

Figure 7. Results from NLPCA of the 20 hPa geopotential height (left panel) and surrogate data (right panel) for different subsets including 80% of the samples. The contours (arbitrarily scaled) show the probability density P(a_1, a_2). The surrogate data are drawn from a two-dimensional Gaussian distribution and have the same serial correlations and the same widths as the atmospheric data.

Figure 8. Results from NLPCA of the 20 hPa geopotential height (left panel) and the surrogate data (right panel) for different values of the penalty parameter p. From top to bottom, p = 0.0001, 0.001, 0.01, and 0.1. In the right panel the arbitrarily scaled contours show the probability density P(a_1, a_2), and in the left panel the contours show P(a_1, a_2) - P(a_1)P(a_2).