Can principal component analysis provide atmospheric circulation or teleconnection patterns?

INTERNATIONAL JOURNAL OF CLIMATOLOGY
Int. J. Climatol. : 3 () Published online 4 July in Wiley InterScience (DOI:./joc.54)

Rosa H. Compagnucci (a)* and Michael B. Richman (b)
(a) Departamento de Ciencias de la Atmósfera y los Océanos, Universidad de Buenos Aires/CONICET, Ciudad Universitaria, pabellón, (4) Ciudad de Buenos Aires, República Argentina
(b) School of Meteorology and Cooperative Institute for Mesoscale Meteorological Studies, The University of Oklahoma, David L. Boren Blvd., Suite 59, Norman, OK 3, USA

ABSTRACT: This investigation examines principal component (PC) methodology and the interpretation of the displays it provides, such as eigenvalue magnitudes, loadings and scores. The key question posed is: to what extent can S- and T-mode decompositions of a dispersion matrix support the kinds of interpretations typically placed on them? To answer it, a series of experiments is designed from various amalgamations of three distinct synoptic flow patterns. Because these flow patterns are known a priori, subtle alterations of the methodology can be tested to determine three things: whether the S- and T-mode decompositions are equivalent, how well each mode recovers the flow patterns or teleconnections, and how each mode should be interpreted. The findings are examined in two contexts: how well they classify the flow patterns, and how well they provide meaningful teleconnections. Both correlation and covariance dispersion matrices are used to determine the differences that arise from standardization. Unrotated and rotated results are also included. Because a variety of commonly applied methodologies is examined, the results hold for a wide range of studies.
Key findings are that, for any set of flow patterns, eigenvalue degeneracy can affect one mode (but not the other) or both modes, at times causing pattern intermixing. Similarly, such degeneracy is found in one or both dispersion matrices. Congruence coefficients provide a measure of validity by matching the PC loadings to the parent correlations and covariances. This matching is vital because the loadings exhibit dipoles that have historically been interpreted as physically meaningful, whereas the present work indicates that they may arise purely from the methodology. Overall, we find that S-mode results can be interpreted as teleconnection patterns and T-mode results as flow patterns, provided the analyses are well designed and meticulously scrutinized for methodological problems. Copyright Royal Meteorological Society

KEY WORDS: principal component analysis; T-mode; S-mode; atmospheric circulation; teleconnections; regionalization; circulation patterns; circulation change

Received May ; Accepted May

* Correspondence to: Rosa H. Compagnucci, Departamento de Ciencias de la Atmósfera y los Océanos, Universidad de Buenos Aires/CONICET, Ciudad Universitaria, pabellón, (4) Ciudad de Buenos Aires, República Argentina. E-mail: rhc@at.fcen.uba.ar
Both authors contributed equally to this paper.

1. Introduction

Multivariate statistical techniques such as principal component analysis (PCA) or empirical orthogonal functions (EOF) are used routinely in the atmospheric literature. Applications include classification (Dyer, 95), circulation change (Compagnucci and Vargas, 99), teleconnections (Wallace and Gutzler, 9), spatial variability (Kutzbach, 9; Stidd, 9) and analogue forecasting (Grimmer, 93). Despite this wide scope of applications, most studies since Lorenz (95), who applied EOF, have followed the same basic procedure without considering the precise objectives of the study. Investigators rarely consider how the many methodological decisions in PCA affect the results and their interpretation.

One such decision needs to be considered from the outset. The scientific objective of any study should determine the mode in which the data matrix is decomposed. This decision irrevocably influences the results of the analyses and their interpretation. Preisendorfer (9) mentions six basic modes of decomposition (O, P, Q, R, S and T). The vast majority of applications in meteorology are space–time studies, so a decision must be made whether to express the data in an S-mode or a T-mode approach. The S-mode treats the time series (m times) at each of the n stations (or gridpoints) as the variables; the domain is the geographical area. Conversely, the T-mode treats the spatial field at each of the m times, defined by all n stations (or gridpoints), as the variables; the domain is time (Preisendorfer and Mobley, 9). As a consequence, an S-mode analysis produces an n × n similarity matrix, whereas a T-mode analysis produces an m × m similarity matrix. Each choice leads the investigator down a different interpretation path.
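The two modes differ only in which dimension of the data array is treated as the variable. The sketch below is a minimal NumPy illustration on random numbers. The array sizes (m = 50, n = 36) and the use of `numpy.cov` are illustrative choices, not the authors' data or code. It shows that the same raw array gives an n × n dispersion matrix in S-mode and an m × m one in T-mode.

```python
import numpy as np

# Hypothetical raw data: m = 50 daily maps of n = 36 grid points
# (random numbers stand in for sea level pressure).
rng = np.random.default_rng(0)
m, n = 50, 36
X = rng.normal(size=(m, n))        # rows = times, columns = grid points

# S-mode: each grid-point time series (a column of X) is a variable,
# so the dispersion matrix is n x n and the domain is space.
cov_S = np.cov(X, rowvar=False)

# T-mode: each map (a row of X) is a variable, so the data matrix is the
# transpose, each map is centred on its own spatial mean, and the
# dispersion matrix is m x m.
cov_T = np.cov(X.T, rowvar=False)

print(cov_S.shape, cov_T.shape)    # (36, 36) (50, 50)
```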

The next decision is to select the appropriate similarity matrix. The most common choices are the covariance and the correlation dispersion matrices. Bretherton et al. (99) introduced singular value decomposition (SVD) to the meteorological community for EOF resolution. Mathematically, either dispersion matrix can be expressed implicitly in SVD for the S- or T-mode. This amounts to expressing the data as deviations from the means (covariance), or as deviations divided by the standard deviations (correlation). In S-mode, these statistics are computed over the time series of each station; in T-mode, over the spatial field (all stations) at each time. The covariance and correlation matrices emphasize different characteristics of the data presented to the eigenanalysis, so they lead to distinct results and interpretations. In the literature, most applications compute EOFs from covariances among time series, or from the SVD, and are therefore implicitly S-mode applications. Most methodological papers have also been S-mode. Buell (95, 99) examined how the results depend on the shape of the spatial domain, and North et al. (9) established error bars for the eigenvalues to document spatial intermixing of the associated eigenvectors. Furthermore, Preisendorfer et al. (9), Overland and Preisendorfer (9), Preisendorfer (9), Richman et al. (99) and Jolliffe () motivate their methodological discussions from an S-mode perspective. In recent years, methods rooted in the S-mode have grown increasingly sophisticated. They include various similarity matrices (von Storch and Zwiers, 999), complex techniques (Horel, 94), non-linear methods (Penland, 99), extended EOF (Weare and Nasstrom, 9), principal oscillation patterns and principal interaction patterns (Hasselmann, 9), and extended SVD (Kudora and Kodera, 999).
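The sketch below illustrates the covariance/correlation choice and its SVD equivalent on a hypothetical S-mode array in which the stations have very different variances. The data and station scalings are assumptions made for illustration. The sketch checks that the SVD of the deviation (standardized) data reproduces the eigenvalues and leading eigenvector of the covariance (correlation) matrix, using eigenvalue = s²/(m − 1). It also shows why the choice matters: the leading covariance eigenvector is dominated by the highest-variance station, whereas every variable has unit variance in the correlation analysis.

```python
import numpy as np

def eig_desc(C):
    """Eigenvalues/eigenvectors of a symmetric matrix, largest first."""
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Hypothetical S-mode data: m times x n stations, with station standard
# deviations that differ strongly so that standardization matters.
rng = np.random.default_rng(1)
m, n = 50, 6
X = rng.normal(size=(m, n)) * np.array([1.0, 1.0, 2.0, 4.0, 8.0, 16.0])

# Deviation form (-> covariance) and standardized form (-> correlation),
# using each station's time mean and standard deviation (S-mode).
Z_dev = X - X.mean(axis=0)
Z_std = Z_dev / X.std(axis=0, ddof=1)

lam_cov, V_cov = eig_desc(np.cov(X, rowvar=False))
lam_cor, V_cor = eig_desc(np.corrcoef(X, rowvar=False))

# SVD of each data matrix reproduces the corresponding eigenanalysis:
# eigenvalues = s**2 / (m - 1); eigenvectors = right singular vectors.
_, s_dev, Qt_dev = np.linalg.svd(Z_dev, full_matrices=False)
_, s_std, _ = np.linalg.svd(Z_std, full_matrices=False)

print(np.allclose(s_dev**2 / (m - 1), lam_cov))              # True
print(np.allclose(s_std**2 / (m - 1), lam_cor))              # True
print(np.allclose(np.abs(Qt_dev[0]), np.abs(V_cov[:, 0])))   # True (sign is arbitrary)
# The covariance PC1 is dominated by the highest-variance station, while the
# correlation eigenvalues sum to n because every variable has unit variance.
print(np.argmax(np.abs(V_cov[:, 0])), lam_cor.sum())
```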
In contrast, fewer studies have used the T-mode, and such analyses are relatively straightforward. The earliest, by Vargas and Compagnucci (93) and Richman (93), were methodological papers that explained the advantages of the T-mode for map classification. Compagnucci and Vargas (9, 99), Huth (993, 99), Drosdowsky (993a, 993b), Bartzokas et al. (994), Bartzokas and Metaxas (99) and Compagnucci and Salles (99) demonstrate useful applications of the T-mode. The focus of this investigation is to clarify the efficacy of the S- and T-modes of decomposition over the range of methodological options that follow from selecting a specific mode. At first glance, a straightforward interpretation would appear to be within easy reach. However, the nuances of the methodology can rapidly become intertwined with the structure of the data, making a clear answer difficult to discern. A concrete example is the interpretation of Figure in Richman's (9) results. That figure shows pattern similarity between the input modes and the covariance-based rotated S-mode principal component (PC) loadings for the simplified artificial atmospheric circulation dataset created by Vargas and Compagnucci (93). Two problems have arisen in the subsequent interpretation of these findings. The first is that the figure in Richman's (9) article was intended to be didactic, and there is no indication of how well that result generalizes. Richman (9, p. ) noted that, in a geopotential height versus time dataset, the T-mode provides map-circulation patterns and the S-mode produces regionalization patterns. The second problem is that, despite this, some researchers believe that rotated S-mode PCA loadings or EOFs can be interpreted as map-circulation patterns. This view is stated in Yarnal (993) and adopted by others who follow his methodology (e.g. Slonosky et al., ).
That misinterpretation of Richman's (9) example may be fostered by the apparent pattern equivalence between the two modes for the particular dataset analysed. To clarify the interpretation of the two most common modes of decomposition (S- and T-mode) used for either EOF or PCA, this research presents the matrix formulation of these approaches and discusses the implications of selecting a given mode. It is the underlying details, which can go unappreciated, that affect the results and interpretation of each mode. The choice of mode is the first processing decision in an EOF analysis or PCA, so it affects every choice that follows. Therefore, to document the impact of each decision on the results, the experiments must be carried out in a confirmatory framework where the answer is known a priori. Accordingly, a series of known artificial constructs of atmospheric circulation is used to draw distinctions between the S- and T-modes. Cattell and Sullivan (9) defined a dataset with known properties used in controlled experiments as a plasmode. The rationale for using these controlled data is to determine how well PCA can recover the different characteristics of the input dataset. In the meteorological literature, Richman (9) introduced plasmodes to test PCA. Vargas and Compagnucci (93) extended the idea by formulating an input dataset that produces a group of fields depicting circular, zonal and meridional circulation, and used it to illustrate some differences between the S- and T-modes for one particular experiment. Richman (9) noted the usefulness of this enhancement and applied the same plasmode to show the impact of rotation. The remainder of this work is organized as follows. Section 2 details the methodology and the construction of the plasmodes. Section 3 presents the results of the S- and T-mode analyses of each plasmode and the linkages between the two.
The key results of the work are summarized in Section 4.

2. Differences in the mathematical expression of S- and T-mode decomposition in PCA

2.1. PCA and SVD definitions and equivalence

Suppose the columns of the input data matrix $Z$ are treated as mathematical variables, and $Z$ contains $n$ grid-point time series of $m$ time steps. Then $Z_S$ is of order $m \times n$ in S-mode and $Z_T$ is of order $n \times m$ in T-mode. In matrix form, the PCA model is $Z = F A^{T}$. Here $F$ is a matrix whose columns are the new variables, known as PC scores, and $A$ is a matrix whose columns are the PC loadings that relate the variables in the input matrix ($Z$) to the PC score matrix ($F$) (Green, 9; Richman, 9). For a specific decomposition, the model is $Z_S = F_S A_S^{T}$ in S-mode and $Z_T = F_T A_T^{T}$ in T-mode. For a covariance (correlation) input dispersion matrix, $Z$ is in deviation (standardized) form. That is, each element is a departure from its column mean (that departure divided by the column standard deviation). The elements of $A$ are then the covariances (correlations) between the variables in $Z$ and the PCs in $F$.

[Figure 1. Spatial fields: examples of zonal, meridional and circular fields for PLASMODE 1 (a–f, low noise level added) and PLASMODE 2 (g–l, high noise level added), and PLASMODE 3 daily fields (m). Panels: (a) meridional (direct), (b) meridional (inverse), (c) zonal (direct), (d) zonal (inverse), (e) circular (direct), (f) circular (inverse), (g) zonal (direct), (h) zonal (inverse), (i) meridional (direct), (j) meridional (inverse), (k) circular (direct), (l) circular (inverse), (m) Plasmode 3.]

The raw S-mode data matrix is the transpose of the raw T-mode data matrix. However, nearly all PCA is based on a dispersion matrix. For such PCA, the direct correspondence between the S- and T-modes holds only when the means and standard deviations of the columns of $Z$ equal the means and standard deviations of the corresponding rows of $Z$. This is impossible if the data matrix is not square. Typically, $Z_S$ and $Z_T$ therefore lead to different covariance (correlation) dispersion matrices.

SVD is mathematically equivalent to PCA and has gained widespread use among atmospheric scientists. Under SVD, a matrix $Z$ is expressed as $Z = U S Q^{T}$ (Mestas Nuñez, ). Here $U$ and $Q$ are the left and right singular vectors (eigenvector matrices), respectively, and $S$ is the diagonal matrix of singular values (the square roots of the eigenvalues multiplied by the degrees of freedom). PCA uses the covariance (correlation) dispersion matrix explicitly; SVD incorporates it implicitly through data expressed in anomaly (standardized anomaly) form. In S-mode, $Z_S = U_S S_S Q_S^{T}$, where the columns of $Z_S$ are the deviation (standardized) grid-point time series. In T-mode, $Z_T = U_T S_T Q_T^{T}$, where the columns of $Z_T$ are the fields for each time step in deviation (standardized) form. The matrices $U_S$, $U_T$, $Q_S$ and $Q_T$ are orthonormal ($U^{T}U = I$ and $Q^{T}Q = I$). It follows that, in S-mode, $Z_S Z_S^{T} U_S = U_S S_S^{2}$ and $Z_S^{T} Z_S Q_S = Q_S S_S^{2}$. In T-mode, $Z_T Z_T^{T} U_T = U_T S_T^{2}$ and $Z_T^{T} Z_T Q_T = Q_T S_T^{2}$.
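The following sketch illustrates two of the statements above on hypothetical random data. First, centring $Z_S$ on station time means and $Z_T$ on map spatial means generally produces matrices that are not transposes of each other, even for a square array. Second, the SVD of $Z_S$ satisfies $Z_S Z_S^{T} U_S = U_S S_S^{2}$ and $Z_S^{T} Z_S Q_S = Q_S S_S^{2}$. Only deviation (covariance-type) scaling is shown; the same checks are assumed to carry over to standardized data.

```python
import numpy as np

# Hypothetical square array, rows = times, columns = grid points.
rng = np.random.default_rng(2)
m, n = 36, 36
X = rng.normal(size=(m, n))

# S-mode deviation matrix: remove each grid point's time mean (column means).
Z_S = X - X.mean(axis=0)
# T-mode deviation matrix: each map is a variable, so remove each map's
# spatial mean (the column means of X.T).
Z_T = X.T - X.T.mean(axis=0)

# Even for a square array, the two centred matrices are not transposes.
print(np.allclose(Z_S, Z_T.T))                 # False for generic data

# SVD identities for the S-mode (Z_S = U_S S_S Q_S^T).
U_S, s_S, Qt_S = np.linalg.svd(Z_S)
Q_S = Qt_S.T
S2 = np.diag(s_S**2)
print(np.allclose(Z_S @ Z_S.T @ U_S, U_S @ S2))   # Z_S Z_S^T U_S = U_S S_S^2
print(np.allclose(Z_S.T @ Z_S @ Q_S, Q_S @ S2))   # Z_S^T Z_S Q_S = Q_S S_S^2
```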
Thus, the following special property arises: $Q_S$ and $U_S$ ($Q_T$ and $U_T$) are the eigenvectors of $Z_S^{T} Z_S$ and of its counterpart $Z_S Z_S^{T}$ ($Z_T^{T} Z_T$ and $Z_T Z_T^{T}$), respectively (Preisendorfer, 9). A cautionary note is therefore needed for SVD: despite this property, the S- and T-mode solutions differ whenever $Z_S$ differs from $Z_T^{T}$. The same holds when the covariance or correlation dispersion matrices are used directly. SVD can now be written in the form defined above for PCA. With $D_S = S_S^{2}/(m-1)$, $F_S = \sqrt{m-1}\,U_S$ and $A_S = Q_S D_S^{1/2}$, we have $Z_S = F_S D_S^{1/2} Q_S^{T} = F_S A_S^{T}$. For T-mode, $D_T = S_T^{2}/(n-1)$, $F_T = \sqrt{n-1}\,U_T$, $A_T = Q_T D_T^{1/2}$ and $Z_T = F_T A_T^{T}$. Hence, any findings in this work for PCA apply equally to SVD. In S-mode, each column of the standardized PC score matrix $F_S$ ($m \times n$) is a time series. In T-mode, each column of $F_T$ ($n \times m$) can be mapped as a spatial field. Correspondingly, each column of the PC loading matrix $A_S$ ($n \times n$) can be mapped as a spatial field in S-mode, whereas each column of $A_T$ ($m \times m$) can be plotted as a time series in T-mode. The development above shows that $F_S \neq A_T$ and $F_T \neq A_S$, even though both members of the first pair are time series and both members of the second pair are spatial fields.

2.2. Range of analyses presented

The examples that follow use both modes of decomposition. They show PC scores, PC loadings and eigenvalues (with explained variances) for unrotated and Varimax-rotated solutions (Kaiser, 95). The Varimax criterion is one of many available rotation algorithms (Richman, 9). It is chosen because it is the rotation most commonly applied in atmospheric research. Varimax is an analytic formulation that embodies some of the principles of simple structure (Thurstone, 94). It seeks an orthonormal rotation matrix $T$ that maximizes, summed over the columns of $B = AT$, the variance of the squared loadings in each column. Simple structure is the idea that PC loadings are most easily interpreted when they are simplified as much as the data allow. That is, the PCs are rotated into a position with as many near-zero loadings as possible and relatively few large loadings on each PC. Varimax departs from Thurstone's definition of simple structure because it defines simplicity in terms of variances (hence, near-zero and near-unity loadings). The formulation and properties of rotation, with its advantages and disadvantages, are given in Richman (9, 9), Mestas Nuñez () and Jolliffe (9, ).

2.3. Tests to compare the PC results and the input dataset

To interpret the results, the degree of correspondence with the input data must be assessed. In T-mode, the PC scores are compared with the underlying flow patterns; in S-mode, the PC score time series are compared with the grid-point time series of the input data. One way to do this is to compare the input patterns qualitatively with the PC scores, as discussed in Salles et al. () for T-mode and in Compagnucci et al. () for S-mode. A quantitative method is to match the PC loadings to the corresponding patterns from the parent dispersion matrix using the coefficient of congruence (CC) (Borg and Groenen, 99), which measures the similarity between the two.

2.4. Plasmodes description

Seven plasmodes are used to demonstrate the differences between the S- and T-modes, and between the covariance and correlation dispersion matrices. The synthetic data are sea level pressure fields constructed to mimic three hypothetical, mutually uncorrelated flow patterns: zonal, meridional and circular. Each plasmode has the same number of homogeneously spaced gridpoints on its network as time steps of flow patterns, so that correspondence between S-mode and T-mode can be shown. Each flow pattern has a direct and an inverse type. The inverse type is derived by reflecting each grid-point value about a constant mean field of hPa. When the same number of direct and inverse types is used, the mean field is a constant hPa.
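The direct/inverse construction can be sketched as a reflection $x_{\mathrm{inv}} = 2C - x_{\mathrm{dir}}$ about the constant mean $C$. In the sketch below, the value of C, the 6 × 6 grid and the linear-gradient field are hypothetical placeholders; the actual plasmode values are not reproduced here. The sketch shows that equal numbers of direct and inverse maps give a mean field identical to C, whereas unequal numbers do not.

```python
import numpy as np

# Hypothetical placeholders: the plasmodes' constant mean value is not
# reproduced here, so 1000 hPa and a 10-hPa amplitude are assumed.
C = 1000.0
ny, nx = 6, 6                       # hypothetical 6 x 6 grid

# A hypothetical direct type with a linear west-east pressure gradient.
x = np.linspace(-1.0, 1.0, nx)
direct = C + 10.0 * np.tile(x, (ny, 1))
inverse = 2.0 * C - direct          # reflect every grid-point value about C

# Equal numbers of direct and inverse maps: the mean field is exactly C.
balanced = np.stack([direct] * 6 + [inverse] * 6)
# Unequal numbers (as for plasmodes 6 and 7 in Table I): the mean departs from C.
unbalanced = np.stack([direct] * 4 + [inverse] * 8)

print(np.allclose(balanced.mean(axis=0), C))    # True
print(np.allclose(unbalanced.mean(axis=0), C))  # False
```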
In T-mode, the plasmodes defined in this study lead to a correlation matrix whose entries are the integers +1, -1 or 0 (except for plasmode 3), with linear dependency among the columns. To avoid this and obtain a nearly full-rank solution, Gaussian noise of the form N(,) was added to each observation. Plasmodes 1 and 3 have a lower noise level than the others. Given the high signal-to-noise ratio, the eigenvalues ($\lambda$) are expected to satisfy $\lambda_1 > \lambda_2 > \lambda_3 > \lambda_4 > \dots$.

Plasmode 1 is essentially the same as that used by Vargas and Compagnucci (93) and Richman (9), and it serves as a baseline in the present work. Its flow patterns are shown in Figure 1(a–f), and the sequence in which the types are included is shown in Table I. The direct and inverse meridional types (Figure 1(a–b)) differ by a gradient of hPa: the difference is 4 hPa on the left side of the domain and zero on the right side. The zonal types (Figure 1(c–d)) are analogous, with the difference on the lower edge and none on the upper edge. The cyclonic and anticyclonic flows (Figure 1(e–f)) have hPa in the corners and 99 and 3 hPa, respectively, in the centre. This can be pictured as the surface of a square drum oscillating up and down with maximum displacement in the centre. Because each pattern occurs with the same frequency, the first three eigenvalues will be very close ('degenerate multiplets'), and the unrotated PC loadings will be intermixed (North et al., 9), at least in the T-mode. Rotating the PCs may resolve the degeneracy (Cheng et al., ).

Plasmode 2 (types shown in Figure 1(g–l)) is similar to plasmode 1, but the sequencing of the zonal and meridional patterns differs (Table I). Furthermore, in this plasmode the hPa line of null variation between the direct and inverse zonal types lies on the lower edge of the field (Figure 1(g–h)). The meridional patterns (Figure 1(i–j)) are identical to those in plasmode 1 but occupy time steps 13–24. The most relevant difference between the two plasmodes lies in the anticyclonic and cyclonic patterns (Figure 1(k–l)). In plasmode 2, the greatest variation between the circular cyclonic fields occurs on the edges of the domain (Figure 2(a)). Plasmode 2 was created to uncover the impact of differences in the magnitudes of the circular flow patterns. In addition, this plasmode has the potential for a degenerate unrotated solution.

Table I. PLASMODE construction details, including the basic flow types (Figure 1) and the composition and sequence of each type. For PLASMODES 1 and 3, A denotes a meridional flow, B a meridional inverse, C a zonal flow, D a zonal inverse, E a cyclonic flow and F an anticyclonic flow. For PLASMODES 2 and 4 through 7, G denotes a zonal flow, H a zonal inverse, I a meridional flow, J a meridional inverse, K a cyclonic flow and L an anticyclonic flow.

PLASMODE   Sequence composition
1          AAAAAA BBBBBB CCCCCC DDDDDD EEEEEE FFFFFF
2          GGGGGG HHHHHH IIIIII JJJJJJ KKKKKK LLLLLL
3          Linear progression of types from A, E, D, B, F, C to A (see text)
4          GGGGGGGGGG HHHHHHHHHH IIIII JJJJJ KKK LLL
5          GGGGGG HHHHHH III JJJ KKKKKKKKK LLLLLLLLL
6          GGGG HHHHHHHH II JJJJ KKKKKK LLLLLLLLLLLL
7          GGGGGGGGGGGGGGGGGGGG IIIII JJJJJ KKK LLL

Plasmode 3 presents a linear change from zonal to cyclonic to meridional flows (Figure 1(m)). Plasmodes 1 and 2, in contrast, mimic stationary flow without transitions between types. Plasmode 3 therefore emulates more realistically the evolving flow patterns of the atmosphere, with neither stationary periods nor abrupt changes. Note, however, that the types generating this sequence are exactly the same six flow types as in plasmode 1. Plasmodes 4 and 5 have the same flows as plasmode 2 (Figure 1(g–l)), but each of the three main flows has a different frequency (Table I) to avoid a degenerate solution. Plasmode 4 has a larger contribution from zonal flow, whereas plasmode 5 has the highest frequency of circular flow. The aim is to determine how changes in the frequency of each pattern affect the ability of the S- and T-modes to retrieve the basic input patterns. Plasmode 6 has the same flows as plasmode 2 (Figure 1(g–l)). It differs from all previous experiments in having a non-constant mean field, because different numbers of direct and inverse types are included for each pattern (Table I).
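Table I can be checked mechanically. The sketch below encodes each sequence as run lengths. Plasmode 3 is omitted because it is a continuous progression rather than a list of discrete types. The sketch assumes the direct/inverse pairings given in the table caption (A/B, C/D, E/F and G/H, I/J, K/L). It reports the number of time steps in each plasmode and whether every flow has as many direct as inverse maps. Only plasmodes 6 and 7 are unbalanced, which is consistent with their non-constant mean fields.

```python
from collections import Counter

# Table I sequences written as (type letter, run length) pairs.
TABLE_I = {
    1: [("A", 6), ("B", 6), ("C", 6), ("D", 6), ("E", 6), ("F", 6)],
    2: [("G", 6), ("H", 6), ("I", 6), ("J", 6), ("K", 6), ("L", 6)],
    4: [("G", 10), ("H", 10), ("I", 5), ("J", 5), ("K", 3), ("L", 3)],
    5: [("G", 6), ("H", 6), ("I", 3), ("J", 3), ("K", 9), ("L", 9)],
    6: [("G", 4), ("H", 8), ("I", 2), ("J", 4), ("K", 6), ("L", 12)],
    7: [("G", 20), ("I", 5), ("J", 5), ("K", 3), ("L", 3)],
}
# (direct, inverse) pairs: plasmode 1 uses A-F, the others use G-L.
PAIRS_1 = [("A", "B"), ("C", "D"), ("E", "F")]
PAIRS_G = [("G", "H"), ("I", "J"), ("K", "L")]

def summarize(runs, pairs):
    """Return (number of time steps, True if every flow has as many
    direct as inverse maps, which keeps the mean field constant)."""
    sequence = "".join(letter * length for letter, length in runs)
    counts = Counter(sequence)          # missing letters count as 0
    balanced = all(counts[d] == counts[i] for d, i in pairs)
    return len(sequence), balanced

results = {p: summarize(runs, PAIRS_1 if p == 1 else PAIRS_G)
           for p, runs in TABLE_I.items()}
for p, (steps, balanced) in results.items():
    print(f"plasmode {p}: {steps} time steps, balanced={balanced}")
```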
Because each pattern contains uneven numbers of direct and inverse flows, the mean pattern for each group (Figure 2(b)) is no longer hPa over the entire domain, and the mean must be considered in the physical interpretation. Experiments with this dataset can detect sensitivity to the residence time of each type. Plasmode 7 is similar to plasmode 4, except that the zonal flow includes only direct types, with no inverses (Table I). The absence of inverse types for this flow produces a non-constant mean field (Figure (a)), which reflects circulations seen in parts of South America.

3. Results

In our examples, all the plasmodes differ in their spatial variance, their temporal variance, or both (Figure 2). In T-mode, the decomposition is controlled by the spatial variance in the temporal domain; in S-mode, by the temporal variance in the spatial domain.

[Figure 2. (a) Variation on the edges of the domain between the circular cyclonic fields of PLASMODES 1 and 2; (b) mean field for (b1) PLASMODE 6 and (b2) PLASMODE 7; (c1–c7) temporal variance at each grid point and (d1–d7) spatial variance of each snapshot, one panel per PLASMODE, each labelled with its total variance.]

3.1. Plasmodes 1 and 2: stationary with degenerate solutions

These plasmodes account for approximately the same amount of spatial variance, but the location of the variance maximum and the direction of its gradient differ (Figure 2(c1) and (c2)). The temporal variances are also similar for both plasmodes. However, plasmode 1 has less total variance than plasmode 2 because less noise was added to it (Figure 2). For unrotated T-mode with a correlation dispersion matrix, both plasmode 1 (Figure 3(a)) and plasmode 2 (Figure (a)) have three significant eigenvalues of approximately the same magnitude (about 33% each, Table II). This matches the number of input flows, and the remaining eigenvalues are exceedingly small. These leading eigenvalues form a statistically degenerate multiplet (North et al., 9). Its impact can be seen in the three PC scores (plasmode 1, Figure 3(a); plasmode 2, Figure (a)), which do not match any of the three input patterns well (Figure 1(a–f) and (g–l)). The degeneracy also affects the PC loadings (time series): their behaviour is inconsistent with the order in which the spatial patterns were inserted into the dataset. Therefore, in an exploratory setting where the correct groupings of the input patterns and their timing are unknown, interpreting either the PC loadings or the PC scores would lead to erroneous conclusions, because neither the spatial patterns nor the temporal sequence is realistic. Note that the disparities between the unrotated T-mode results for the two plasmodes do not indicate real differences between them. For plasmode 1, the CCs between the time-series loadings and the temporal correlations among the input flow patterns range from qualitatively borderline (PC =.) through poor (PC3 =.) to terrible (PC =.). In plasmode 2 (Figures ), the CCs improve somewhat over plasmode 1 (PC1 =.3, PC2 =.9 and PC3 =.).

[Figure 3. Principal component analysis in T-mode for PLASMODE 1 (PC score maps with isopleths each.5). Panels: (a) unrotated, correlation matrix; (b) Varimax rotated, correlation matrix; (c) unrotated, covariance matrix; (d) Varimax rotated, covariance matrix. Each panel shows the PC score maps, eigenvalues (λ), explained variances and PC loading time series for PCs 1–3.]

Results for unrotated T-mode with a covariance matrix (Figure 3(c) for plasmode 1; Figure (c) for plasmode 2) are rather different. The variances of the first two unrotated PCs are nearly identical (Table II), which suggests a statistically degenerate couplet. The third PC, which accounts for approximately half the variance of each of the first two, should therefore be distinct.
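The term 'statistically degenerate multiplet' used above comes from North et al.'s rule of thumb. The rule approximates the sampling error of an eigenvalue as δλ ≈ λ(2/N)^{1/2} and treats neighbouring eigenvalues as effectively degenerate when their spacing is comparable to or smaller than that error. The sketch below applies a simple 'spacing smaller than δλ_k' test. The eigenvalues are hypothetical values shaped like the correlation and covariance T-mode cases described above; the paper's actual values are not reproduced. N = 36 is an assumed sample size. With spatially correlated grid points the effective N would be smaller, which makes degeneracy more likely.

```python
import numpy as np

def north_degenerate_pairs(eigvals, n_samples):
    """Flag adjacent eigenvalues that North et al.'s rule of thumb would
    treat as an effectively degenerate pair.

    The sampling error of each eigenvalue is approximated as
    lambda * sqrt(2 / N); a pair (k, k+1) is flagged when the spacing
    lambda_k - lambda_{k+1} is smaller than the error of lambda_k.
    Returned indices are 1-based, as in lambda_1, lambda_2, ...
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    err = lam * np.sqrt(2.0 / n_samples)
    spacing = lam[:-1] - lam[1:]
    return [(int(k) + 1, int(k) + 2) for k in np.flatnonzero(spacing < err[:-1])]

# Hypothetical eigenvalues: three nearly equal leading values (correlation-like)
# and two nearly equal values followed by one about half as large (covariance-like).
print(north_degenerate_pairs([12.1, 12.0, 11.9, 0.001], n_samples=36))  # [(1, 2), (2, 3)]
print(north_degenerate_pairs([15.0, 14.6, 7.3, 0.1], n_samples=36))     # [(1, 2)]
```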
This behaviour arises because the variance in the temporal domain is indistinguishable between the zonal and meridional flow patterns (4 gpm) and smaller for the circular flow (9 gpm), in both plasmode 1 (Figure 2(d1)) and plasmode 2 (Figure 2(d2)). Hence, a range of degeneracy can be seen in the first two PC scores and loading time series, while the third PC score and loading capture the third input pattern well. Although degeneracy between the first two PCs is visible, the CCs for plasmode 1 are qualitatively good (PC1 and PC2 =.9). The third PC is not degenerate and matches excellently (PC3 is nearly 1). For plasmode 2, the CCs of the first two PCs increase (PC1 =.94, PC2 =.959 and PC3 =.999) as the eigenvalue spacing increases. Richman (9) notes that the CC is biased high. A high CC should therefore be supplemented by a visual comparison between the input pattern with the highest absolute loading and the corresponding PC score pattern, to ensure a faithful correspondence.

For plasmodes 1 and 2, Varimax rotation with correlation (Figure 3(b) for plasmode 1; Figure (b) for plasmode 2) and with covariance (Figure 3(d); Figure (d)) leaves the percentage of variance explained by each PC essentially the same as in the corresponding unrotated case (Table II). Nevertheless, the rotated PC scores change dramatically and now mimic the input flow patterns perfectly, and the PC loadings correctly capture the time-step behaviour of the spatial patterns. The CCs are nearly 1, indicating an excellent fit to both the correlation and the covariance input matrices. In both groups of rotated PCs, the results clearly show the differences between plasmodes 1 and 2 that arise from the sequencing of the flow patterns (Table I). However, these results do not capture the differences between the two circular patterns that arise from the location of the highest variance. All of the PC scores in this analysis (unrotated, rotated, covariance, correlation) can be faulted for the location of the zero line. Because the PC scores are standardized, this line falls in the centre of the domain, where it could be wrongly interpreted as a zero anomaly.

[Figure 4. S-mode for PLASMODE 1 (PC loading maps with isopleths each.5 for correlation and each. for covariance). Panels: (a) unrotated, correlation matrix; (b) Varimax rotated, correlation matrix; (c) unrotated, covariance matrix; (d) Varimax rotated, covariance matrix. Each panel shows the PC loading maps, eigenvalues (λ), explained variances and PC score time series for PCs 1–3.]

Unrotated S-mode analyses of plasmodes 1 (Figure 4(a) and (c)) and 2 (Figure 9(a) and (c)) give results different from the T-mode ones. Again, the first three eigenvalues are large and explain almost all the variance (Table II). In T-mode, however, the variances are similar for the first three correlation-based PCs (Figure 3(a)) and for the two leading covariance-based PCs (Figure 3(c)).
In contrast to T-mode, in the unrotated S-mode the leading PC explains the majority of the variance, and the remaining two together explain less than one-third of the variance, with closely spaced eigenvalues (Figure 4(a) and (c)). In S-mode, the PC loadings and scores within each of the correlation- and covariance-based analyses differ considerably between the two plasmodes (Figures 4(a), (c) and 9(a), (c)). This disparity arises, despite the seemingly similar flow patterns, because the grid-point time series are not the same (Figure for plasmode and Figure for plasmode). It is also reflected in the corresponding point correlation and covariance structures (Figures 5 and for plasmode and Figures and for plasmode). The unrotated correlation-based PC loading patterns (Figures 4(a) and 9(a)) match specific grid-point correlation fields well for both plasmodes. The best matches for PC1 occur in the centre of the domain (Figure 5, grid points 5,,,) for plasmode (Figure 4(a)) and in the northwest corner (Figure, grid points,,,) for plasmode (Figure 9(a)). At those locations, the grid-point time series (Figures and) resemble the PC score patterns of plasmodes and. PCs 2 and 3 explain considerably less variance than PC1 (Figures 4(a) and 9(a); Table II). Furthermore, the CCs for PC1 are .99 and .999 for plasmodes and, respectively. For the second and third PCs, the CC matches (. and .93 for plasmode and .354 and .39 for plasmode) are worse than for a random match. Despite similar generating flow patterns and frequencies, and comparable amounts of variance explained, the PC loadings of plasmodes and differ greatly, which may be erroneously interpreted as different circulation characteristics. In both plasmodes, the second and third PC loadings represent bipolar patterns, i.e., dipoles. Examination of the corresponding correlation maps (Figures 5 and ) shows that such dipole patterns do not exist in the data and are an artifact of unrotated S-mode PCA.

Figure 5. Correlation maps between grid point j and each grid point (isopleths every .5) for PLASMODE; one panel per grid point j.

Varimax rotation (Figures 4(b) and 9(b)) is applied next to determine whether the second and third PCs are irrelevant or whether the dipole solution is inadequate owing to misalignment of the PCs and the orthogonality constraint when the significant correlations or covariances are positive (Karl and Koscielny, 9). After rotation, the CCs are excellent (.993, .993 and .99) for the three PCs of plasmode and relatively lower (., .94 and .95) for plasmode. Note that, for the first rotated PC, the CC actually decreases slightly (in the third decimal place) relative to its unrotated counterpart, whereas the other two increase dramatically. The rotated PC loadings have no dipoles and, consequently, the zero line disappears (Figures 4(b) and 9(b) for plasmodes and). Another difference between the plasmodes, both before and after rotation, is that the spatial clusters differ for all three PC loadings, as do the corresponding PC score time series (Figures 4 and 9). Furthermore, the rotated PCs explain similar variances in plasmode, whereas in plasmode the first PC accounts for an additional % of the explained variance. Clearly, examining the eigenvalues without probing the PC loadings and scores is precarious.
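The dipoles are forced by orthogonality rather than by the data, as a small sketch illustrates (Python with NumPy; the exponential decay-with-distance correlation model, the 6 x 6 grid and the length scale are assumptions for illustration, not the plasmode data). When every grid-point correlation is positive, the leading eigenvector has a single sign (Perron-Frobenius theorem), so every later eigenvector, being orthogonal to it, must change sign somewhere, even though no negative correlation exists anywhere in the field. The function name is hypothetical.

```python
import numpy as np


def unrotated_smode_loadings(n_side=6, length_scale=3.0):
    """Unrotated S-mode loadings for a hypothetical all-positive correlation field.

    Grid-point correlations decay exponentially with distance, so every
    entry of the correlation matrix is positive (an assumed model).
    """
    yy, xx = np.mgrid[0:n_side, 0:n_side]
    pts = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    R = np.exp(-dist / length_scale)
    evals, evecs = np.linalg.eigh(R)
    order = np.argsort(evals)[::-1]              # largest eigenvalue first
    evals, evecs = evals[order], evecs[:, order]
    loadings = evecs * np.sqrt(np.clip(evals, 0.0, None))
    return R, loadings


R, L = unrotated_smode_loadings()
for k in range(3):
    col = L[:, k]
    print(f"PC{k + 1}: min {col.min():+.3f}, max {col.max():+.3f}")
```

On this symmetric square grid the second and third eigenvalues are expected to be (nearly) equal, so the orientation of the PC2 and PC3 dipoles is essentially arbitrary, another reason not to read physical meaning into unrotated S-mode dipoles.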

Figure 6. Covariance maps between grid point j and each grid point (isopleths, each ) for PLASMODE; one panel per grid point j.

What can be said is that similar eigenvalues are a necessary but not sufficient condition for claiming that the same characteristics (e.g. teleconnections or circulations) are present in two datasets. For covariance, the unrotated S-mode PC loadings for plasmodes and (Figures 4(c) and 9(c)) have spatial fields controlled by the spatial variance field (Figure (c) and (c)), in which the lower left quadrant of the domain in plasmode, and the upper left corner in plasmode, have the highest values. Therefore, the PC1 loadings (Figure 4(c)) are largest in that location. In both plasmodes, the second PC loadings have a constant gradient with a zero line in the centre, forming a dipole, and the third PC loadings are also dipoles. The loadings of PCs 2 and 3 are unlike any of the input covariance fields, which have positive coefficients (Figures and for plasmodes and, respectively). Furthermore, the CCs for the first three PCs are .99, .35 and .3 for plasmode and .999, .9 and .3 for plasmode. These results are similar to those for the unrotated correlation-based S-mode. Varimax rotation (Figures 4(d) and 9(d) for plasmodes and) changes the variance explained by each PC in the same way as in the correlation case. The variance explained by the leading PC decreases by about a factor of from the unrotated solution, while the second and third rotated PCs show considerably higher values. After rotation, the dipole structure of the loadings disappears. The CCs for PC1, PC2 and PC3 are .959, .95 and .933 (plasmode) and .9, .9 and .93 (plasmode), somewhat lower than those for the correlation case.

Figure 7. Grid-point time series for PLASMODE.

The rotated covariance-based loadings for plasmode bear some resemblance to the rotated T-mode PC scores and to the input flow patterns, whereas the S-mode PC scores bear some resemblance to the rotated T-mode PC loadings. This might lead an unsuspecting investigator to assume tacitly that S-mode scores (loadings) can be interpreted as T-mode loadings (scores). Such an interpretation would be incorrect, since the variance explained changes from S- to T-mode (e.g. in Figure 4(d) vs 5 in Figure 3(d) for PC). Moreover, this apparent equivalence between T-mode and S-mode patterns cannot be generalized to unrotated or orthogonally rotated solutions. For example, in plasmode, which differs somewhat in construction from plasmode (Section ), the rotated covariance S-mode solution differs more from the T-mode solution than it does in plasmode. A three-dimensional geometric display of the T- and S-mode solutions (Figure 3) was created to provide insight into the degree to which the input variables can be recovered by the covariance-based PC loadings of plasmode.
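The dependence of the unrotated covariance-based S-mode loadings on the spatial variance field, described above, can be reproduced with a toy example (Python with NumPy). The correlation structure, the 6 x 6 grid and the threefold standard deviation in one corner quadrant are all hypothetical, as are the helper names; the point is only that covariance-based PC1 loadings migrate toward the high-variance region, while correlation-based PCA, which standardizes each grid point first, does not see the variance field at all.

```python
import numpy as np


def leading_loadings(dispersion):
    """PC1 loadings: leading eigenvector scaled by sqrt(eigenvalue), sign made positive."""
    evals, evecs = np.linalg.eigh(dispersion)
    k = int(np.argmax(evals))
    v = evecs[:, k]
    if v.sum() < 0:                      # eigenvector sign is arbitrary
        v = -v
    return v * np.sqrt(evals[k])


def quadrant_share(loadings, mask):
    """Fraction of the summed squared loadings that falls on the masked grid points."""
    return float((loadings[mask] ** 2).sum() / (loadings ** 2).sum())


# Hypothetical 6 x 6 grid with positive, distance-decaying correlations.
yy, xx = np.mgrid[0:6, 0:6]
pts = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
R = np.exp(-dist / 3.0)

# Hypothetical variance field: one corner quadrant has three times the standard deviation.
inflated = (xx.ravel() < 3) & (yy.ravel() < 3)
std = np.where(inflated, 3.0, 1.0)
C = R * np.outer(std, std)               # covariance = D R D, with D = diag(std)

cor_pc1 = leading_loadings(R)            # correlation-based S-mode
cov_pc1 = leading_loadings(C)            # covariance-based S-mode
print(f"corner-quadrant share of PC1 squared loadings: "
      f"correlation {quadrant_share(cor_pc1, inflated):.2f}, "
      f"covariance {quadrant_share(cov_pc1, inflated):.2f}")
```

Standardizing C recovers R exactly, which is why the correlation-based analysis is insensitive to the imposed variance field.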
When the T-mode loadings are rotated, the PC scores recover the input flow patterns with full fidelity, except for the aforementioned zero-line shift (Figure (b) and (d)). The reason for this improvement over the unrotated results can be seen in the geometry: the unrotated three-dimensional axes do not line up with the covariance constellations (Figure 3(a)), but the rotation aligns them perfectly (Figure 3(b)), resulting in a strong simple structure in T-mode and near-perfect CCs. In S-mode, the CCs do not match as well, and the poor performance of the rotation can be traced directly to the lack of simple structure in the swarm of points of the covariance constellation, viewed as a three-dimensional plot of both the unrotated and rotated loadings (Figures 3(c) and (d), respectively). Interestingly, CC matches in the borderline-to-good range are precisely what Richman (9, Table IX) predicts for a Varimax rotation of data with weak simple structure.

3.. Plasmode 3: non-stationary flow based on plasmode

The flows in plasmode 3 (Figure (m)) are linear combinations of the three flow patterns of plasmode (Figure (a-f)). Both plasmodes have similar patterns of correlation and covariance among grid points (not shown here for plasmode 3).

Figure 8. Principal Component Analysis in T-Mode for PLASMODE (PC score maps with isopleths every .5). Panels: (a) unrotated T-mode, correlation matrix; (b) Varimax-rotated T-mode, correlation matrix; (c) unrotated T-mode, covariance matrix; (d) Varimax-rotated T-mode, covariance matrix.

The unrotated T-mode PC scores and loadings (Figure 4(a) and (c)) based on the correlation matrix differ from those of plasmode (Figure 3(a)). Furthermore, whereas plasmode has three statistically degenerate eigenvalues, in this plasmode they are distinct, in particular between PCs and 3. The CCs range from an excellent match for PC1 (close to 1) to borderline for PC2 (.5) and very poor for PC3 (.5). The unrotated covariance-based PC scores (Figure 4(c)) appear the same as those of plasmode (Figure 3(c)). Nonetheless, the eigenvalues and PC loadings differ considerably, and neither the covariance- nor the correlation-based analysis captures the generating input flow patterns well. The rotated correlation-based T-mode PC scores fit the generating input patterns well, and the PC loadings match the transitions between flows well (Figure 4(b)). The rotation clearly extracts the proper sequencing and temporal magnitude of the plasmode flow patterns.
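Varimax, the orthogonal rotation used throughout, seeks the rotation of the retained loadings that maximizes the variance of the squared loadings within each PC, which rewards the simple structure discussed above. A minimal NumPy sketch follows; it assumes the widely used SVD-based iteration without Kaiser row normalization, which may differ from the implementation used in the study, and the 12 x 3 loading matrix and the helper `rotation` are hypothetical.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=500, tol=1e-10):
    """Orthogonally rotate a (variables x PCs) loading matrix to maximize the varimax criterion.

    Raw varimax (no Kaiser row normalization) via the common SVD-based
    iteration; gamma=1 gives varimax, gamma=0 quartimax.
    """
    p, k = loadings.shape
    T = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        L = loadings @ T
        target = L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        T = u @ vt                        # nearest orthogonal matrix to the target
        crit_old, crit = crit, s.sum()
        if crit_old != 0 and crit / crit_old < 1 + tol:
            break
    return loadings @ T, T


def rotation(axis, degrees):
    """Hypothetical helper: 3 x 3 rotation matrix about coordinate axis 0, 1 or 2."""
    c, s = np.cos(np.radians(degrees)), np.sin(np.radians(degrees))
    i, j = [a for a in range(3) if a != axis]
    M = np.eye(3)
    M[i, i], M[i, j], M[j, i], M[j, j] = c, -s, s, c
    return M


# Hypothetical loadings with perfect simple structure: each variable loads on one PC.
B = np.zeros((12, 3))
B[0:4, 0] = [0.9, 0.8, 0.85, 0.7]
B[4:8, 1] = [0.75, 0.9, 0.8, 0.6]
B[8:12, 2] = [0.8, 0.7, 0.9, 0.65]
A = B @ rotation(2, 20.0) @ rotation(0, 15.0)   # "unrotated" loadings hiding that structure

rotated, T = varimax(A)
print(np.round(rotated, 2))
```

Because T is orthogonal, each variable's communality and the total sum of squared loadings are unchanged; only the split among the rotated PCs moves, which is why rotated solutions are summarized by sums of squared loadings rather than eigenvalues, as in Table II.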
In the rotated correlation-based T-mode solution, for example, the first PC loading measures the meridional flow contribution: its values decrease towards zero at first, vanish on day, remain precisely at zero through day 3, when there is no meridional contribution, and then increase to a maximum on day 9, when the flow returns to % meridional. However, careful inspection of the input flows (Figure (m)) and the PC loadings (Figure 4(b)) indicates that relatively large loadings (..) can occur for flows that are not close to the PC score patterns (e.g. for the zonal-flow PC score, compare the plasmode patterns on day or with the magnitude of the PC loadings). In such cases, the squared loading measures the contribution of the flow to the PC score, and a linear combination of the PC score patterns, weighted by the loadings, may be necessary to reproduce the flow. Since the sum of the squared loadings is the variance explained, this means that, in an evolving flow, the explained variance cannot be taken as indicative of the number of input patterns of the specific flow type represented by a PC score. However, the explained variance does give an estimate of the number of patterns having some contribution of zonal, meridional and circular flow. The analyst should be vigilant when interpreting such PC loadings and the eigenvalues/explained variance. For T-mode covariance-based PCA, the unrotated results (Figure 4(c)) are more similar to those obtained for plasmode (Figure 3(c)), and, again, the PC scores do not capture the morphology of the input plasmode flow patterns well. After rotation, the PC scores (Figure 4(d)) appear to have the correct shapes, and the PC loadings are improved over their unrotated counterparts. However, owing to the confounding influence of the variance fields (not shown), these loadings do not correspond as well to the sequencing and evolution of the patterns as their correlation-based counterparts do.

Table II. Eigenvalues (unrotated solution)/sum of squared loadings (rotated solution) and explained variance (%, in brackets) for PLASMODE analysed in T- and S-Mode, using correlation and covariance input matrices. Perfect zonal, meridional and circular PC spatial patterns are indicated by the letters Z, M and C, respectively.

Plasmode | PC | T-Mode cor. unrotated | T-Mode cor. rotated | T-Mode cov. unrotated | T-Mode cov. rotated | S-Mode cor. unrotated | S-Mode cor. rotated | S-Mode cov. unrotated | S-Mode cov. rotated
1 | 1st | . (33.4) | . (33.3) M | 5. (4.9) | 5.4 (4.9) Z | .9 (9.9) | . (3.44) | (3.44) | .3 (39.5)
1 | 2nd | .9 (33.) | . (33.) Z | 54. (4.9) | 54.9 (4.) M | 4.3 (.9) | . (33.5) | 54.9 (.) | 9. (.5)
1 | 3rd | .9 (33.) | . (33.3) C | 3.3 (.4) | 3.3 (.) C | .99 (.3) | .4 (3.) | .5 (.44) | 34. (3.5)
2 | 1st | .99 (33.3) | .93 (33.5) M | 5.9 (4.) | 5. (4.) Z | 9.3 (.9) | 5.3 (4.4) | (3.4) | 99.4 (43.)
2 | 2nd | .95 (33.9) | .93 (33.5) Z | 5.3 (4.4) | 5.3 (4.3) M | 3.3 (9.4) | . (9.) | 53.5 (.5) | 4.9 (.5)
2 | 3rd | .5 (3.3) C | .3 (3.) C | .3 (.4) | .3 (.4) C | 3.3 (9.9) | . (.) | 39.4 (.9) | .9 (.)
3 | 1st | .5 (4.3) | 3.3 (.95) M | 4. (5.) | 39.9 (4.) Z | 9.9 (.3) | 4. (4.) | 35. (.3) | 94. (43.5)
3 | 2nd | 4.34 (39.3) | 3. (.3) Z | (35.3) | 39.9 (4.9) M | 4.5 (.) | 5. (4.) | 4. (.9) | 9.3 (43.)
3 | 3rd | 5.9 (4.) | 9.4 (.3) C | 4. (.) | 49. (.) C | .4 (4.) | .5 (.) | 5. (3.44) | 5.4 (3.)
4 | 1st | 9. (55.) Z | 9. (55.) Z | 9. (.4) Z | 9. (.4) Z | .4 (.) | 9. (54.4) | 43.3 (3.9) | 35.9 (.)
4 | 2nd | 9.95 (.3) M | 9.95 (.3) M | 4. (3.9) M | 4. (3.9) M | 4.95 (3.5) | . (3.5) | 53. (.4) | 4.4 (3.)
4 | 3rd | 5.9 (.43) C | 5.9 (.43) C | 3. (.34) C | 3. (.39) C | . (.) | 5. (4.) | .9 (4.44) | 45. (.9)
5 | 1st | . (49.34) C | . (49.34) C | 5.3 (45.45) Z | 5.3 (45.45) Z | 3.53 (4.) | 4.5 (4.4) | (.4) | . (4.39)
5 | 2nd | .93 (33.5) Z | .93 (33.5) Z | 39. (3.) C | 39. (3.) C | 3.55 (9.) | .99 (33.3) | 5.4 (.55) | . (33.)
5 | 3rd | 5.9 (.) M | 5.9 (.) M | .9 (.59) M | .9 (.59) M | .4 (5.) | 9. (.) | 3. (4.9) | 55.3 (.)
6 | 1st | .5 (49.33) C | .5 (49.33) C | (45.3) Z | (45.3) Z | 3. (5.59) | 4.5 (4.3) | 5.9 (.) | 49.5 (4.9)
6 | 2nd | .93 (33.) Z | .93 (33.) Z | 393. (3.) C | 393. (3.) C | 3.3 (9.4) | . (3.) | 49. (.4) | 4.5 (33.)
6 | 3rd | 5.9 (.) M | 5.9 (.) M | 9. (.) M | 9. (.) M | . (5.) | 9.53 (.4) | . (4.4) | 53. (.93)
7 | 1st | 9. (55.) Z | 9. (55) Z | 955. (.) Z | 955. (.) Z | .5 (9.) | 5.94 (44.) | 334. (.) | 49.9 (4.)
7 | 2nd | 9.94 (.) M | 9.94 (.) M | 45. (3.) M | 45. (3.) M | 4.9 (.) | .3 (35.) | 45.3 (.) | 5.4 (4.3)
7 | 3rd | 5.9 (.44) C | 5.9 (.44) C | 3. (.43) C | 3. (.43) C | 3. (9.5) | .9 (.) | 34. (5.) | 5 (.)

Figure 9. S-Mode for PLASMODE (PC loading maps with isopleths every .5 for correlations and every . for covariance). Panels: (a) unrotated S-mode, correlation matrix; (b) Varimax-rotated S-mode, correlation matrix; (c) unrotated S-mode, covariance matrix; (d) Varimax-rotated S-mode, covariance matrix.

In S-mode, the unrotated and rotated correlation-based (Figure 5(a) and (b)) and covariance-based (Figure 5(c) and (d)) results produced PC loadings similar to those obtained for plasmode (Figure 4), since both plasmodes have similar dispersion matrices. For that reason, the same observations made for plasmode apply here. The main differences between plasmodes and 3 lie in the PC scores (Figures 4 and 5) and the explained variance. In plasmode 3, the PC score time series make the transition between types more slowly than in plasmode. For the unrotated plasmode 3 analyses, the CCs are nearly 1, . and .33 for PCs 1, 2 and 3, respectively, for correlation, and nearly 1, .343 and . for covariance. After rotation, the CCs are .999, .999 and .94 for the correlation-based PCs and .93, .93 and .93 for the covariance-based PCs.
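Assuming CC denotes the congruence coefficient, i.e. the cosine between an input pattern and a PC pattern computed without removing their means, the matching used throughout can be sketched as follows (Python with NumPy; the function names and test patterns are hypothetical). The second part illustrates one simple way in which a CC can be high for patterns whose anomalies are unrelated: a shared offset dominates the uncentred cross-product, which is one reason a high CC should be backed by visual comparison.

```python
import numpy as np


def congruence(a, b):
    """Congruence coefficient: cosine between two patterns, without removing means."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def best_matches(inputs, pcs):
    """For each PC pattern, the index of the input pattern with largest |CC| and that CC."""
    cc = np.array([[congruence(p, q) for q in pcs] for p in inputs])
    best = np.abs(cc).argmax(axis=0)
    return best, cc[best, np.arange(cc.shape[1])]


# Hypothetical check: PCs that are a sign-flipped permutation of the inputs.
inputs = np.eye(3)
pcs = [-inputs[2], inputs[0], inputs[1]]
print(best_matches(inputs, pcs))

# Two patterns sharing a large offset but with exactly uncorrelated anomalies.
rng = np.random.default_rng(1)
e1 = rng.standard_normal(36)
e1 -= e1.mean()
e2 = rng.standard_normal(36)
e2 -= e2.mean()
e2 -= (e2 @ e1) / (e1 @ e1) * e1         # remove the part of e2 correlated with e1
a, b = 10.0 + e1, 10.0 + e2
print(f"CC = {congruence(a, b):.3f}, Pearson r = {np.corrcoef(a, b)[0, 1]:.3f}")
```

The offset pair scores a CC close to 1 even though its anomaly correlation is zero; PC patterns are usually anomaly fields, but the example shows why a high CC alone does not prove a faithful match.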
As was the case in plasmode, the CCs for plasmode 3 improve for rotated PCs 2 and 3, reaching the good-to-excellent range.

Plasmodes 4 and 5: changes in the percentage of each flow pattern

Both plasmodes comprise the same input patterns as plasmode, but with different frequencies, and with the same number of direct and inverse types within each pattern. The rationale is to investigate the variation due to changes in the percentage (or frequency) of each pattern (Table I and discussion in Section ). Plasmode 4 has 55.5% zonal and only .% circular flows, whereas plasmode 5 has 5% circular flows. Recall that in plasmode they are equal. Therefore, non-degenerate T-mode results are expected for plasmodes 4 and 5. The unrotated T-mode PC solutions for the correlation- and covariance-based analyses are the same as those obtained for plasmode with Varimax-rotated T-mode. Hence, the reader is referred to Figure (b) and (d) for the corresponding scores and loadings; only the differences from plasmode are discussed here. One of the most significant results for the correlation-based PCs is that the eigenvalues are effectively well separated (Table II) and the percentage of variance explained by each PC corresponds to the percentage of the input flow patterns matched by the corresponding PC score. In plasmode 4, the first and third PC scores are the zonal and circular patterns, respectively, explaining 55. and .4% of the variance, whereas in plasmode 5 the first PC score is the circular flow, accounting for 49.34% of the variance. Therefore, changes in circulation due solely to changes in the frequencies of the input patterns appear to be captured well by this approach. Such agreement is advantageous when cataloging the frequency of flow


Dimensionality Reduction Techniques (DRT) Dimensionality Reduction Techniques (DRT) Introduction: Sometimes we have lot of variables in the data for analysis which create multidimensional matrix. To simplify calculation and to get appropriate,

More information

Semiblind Source Separation of Climate Data Detects El Niño as the Component with the Highest Interannual Variability

Semiblind Source Separation of Climate Data Detects El Niño as the Component with the Highest Interannual Variability Semiblind Source Separation of Climate Data Detects El Niño as the Component with the Highest Interannual Variability Alexander Ilin Neural Networks Research Centre Helsinki University of Technology P.O.

More information

Lecture: Face Recognition and Feature Reduction

Lecture: Face Recognition and Feature Reduction Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab 1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed in the

More information

Principal Component Analysis vs. Independent Component Analysis for Damage Detection

Principal Component Analysis vs. Independent Component Analysis for Damage Detection 6th European Workshop on Structural Health Monitoring - Fr..D.4 Principal Component Analysis vs. Independent Component Analysis for Damage Detection D. A. TIBADUIZA, L. E. MUJICA, M. ANAYA, J. RODELLAR

More information

1 A factor can be considered to be an underlying latent variable: (a) on which people differ. (b) that is explained by unknown variables

1 A factor can be considered to be an underlying latent variable: (a) on which people differ. (b) that is explained by unknown variables 1 A factor can be considered to be an underlying latent variable: (a) on which people differ (b) that is explained by unknown variables (c) that cannot be defined (d) that is influenced by observed variables

More information

Meteorol. Appl. 6, (1999)

Meteorol. Appl. 6, (1999) Meteorol. Appl. 6, 253 260 (1999) Meteorological situations associated with significant temperature falls in Buenos Aires: an application to the daily consumption of residential natural gas Gustavo Escobar

More information

Received 3 October 2001 Revised 20 May 2002 Accepted 23 May 2002

Received 3 October 2001 Revised 20 May 2002 Accepted 23 May 2002 INTERNATIONAL JOURNAL OF CLIMATOLOGY Int. J. Climatol. 22: 1687 178 (22) Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 1.12/joc.811 A GRAPHICAL SENSITIVITY ANALYSIS FOR STATISTICAL

More information

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given

More information

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas Dimensionality Reduction: PCA Nicholas Ruozzi University of Texas at Dallas Eigenvalues λ is an eigenvalue of a matrix A R n n if the linear system Ax = λx has at least one non-zero solution If Ax = λx

More information

LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS

LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS NOTES FROM PRE- LECTURE RECORDING ON PCA PCA and EFA have similar goals. They are substantially different in important ways. The goal

More information

Machine Learning (Spring 2012) Principal Component Analysis

Machine Learning (Spring 2012) Principal Component Analysis 1-71 Machine Learning (Spring 1) Principal Component Analysis Yang Xu This note is partly based on Chapter 1.1 in Chris Bishop s book on PRML and the lecture slides on PCA written by Carlos Guestrin in

More information

Exploratory Factor Analysis and Principal Component Analysis

Exploratory Factor Analysis and Principal Component Analysis Exploratory Factor Analysis and Principal Component Analysis Today s Topics: What are EFA and PCA for? Planning a factor analytic study Analysis steps: Extraction methods How many factors Rotation and

More information

Introduction to Principal Component Analysis (PCA)

Introduction to Principal Component Analysis (PCA) Introduction to Principal Component Analysis (PCA) NESAC/BIO NESAC/BIO Daniel J. Graham PhD University of Washington NESAC/BIO MVSA Website 2010 Multivariate Analysis Multivariate analysis (MVA) methods

More information

A CLASSIFICATION OF AMBIENT CLIMATIC CONDITIONS DURING EXTREME SURGE EVENTS OFF WESTERN EUROPE

A CLASSIFICATION OF AMBIENT CLIMATIC CONDITIONS DURING EXTREME SURGE EVENTS OFF WESTERN EUROPE INTERNATIONAL JOURNAL OF CLIMATOLOGY Int. J. Climatol. 19: 725 744 (1999) A CLASSIFICATION OF AMBIENT CLIMATIC CONDITIONS DURING EXTREME SURGE EVENTS OFF WESTERN EUROPE TOM HOLT* Climatic Research Unit,

More information

Lecture 8. Principal Component Analysis. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 13, 2016

Lecture 8. Principal Component Analysis. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 13, 2016 Lecture 8 Principal Component Analysis Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza December 13, 2016 Luigi Freda ( La Sapienza University) Lecture 8 December 13, 2016 1 / 31 Outline 1 Eigen

More information

A non-gaussian decomposition of Total Water Storage (TWS), using Independent Component Analysis (ICA)

A non-gaussian decomposition of Total Water Storage (TWS), using Independent Component Analysis (ICA) Titelmaster A non-gaussian decomposition of Total Water Storage (TWS, using Independent Component Analysis (ICA Ehsan Forootan and Jürgen Kusche Astronomical Physical & Mathematical Geodesy, Bonn University

More information

Penalized varimax. Abstract

Penalized varimax. Abstract Penalized varimax 1 Penalized varimax Nickolay T. Trendafilov and Doyo Gragn Department of Mathematics and Statistics, The Open University, Walton Hall, Milton Keynes MK7 6AA, UK Abstract A common weakness

More information

Principal Component Analysis (PCA) Principal Component Analysis (PCA)

Principal Component Analysis (PCA) Principal Component Analysis (PCA) Recall: Eigenvectors of the Covariance Matrix Covariance matrices are symmetric. Eigenvectors are orthogonal Eigenvectors are ordered by the magnitude of eigenvalues: λ 1 λ 2 λ p {v 1, v 2,..., v n } Recall:

More information

PCA, Kernel PCA, ICA

PCA, Kernel PCA, ICA PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per

More information

Image Registration Lecture 2: Vectors and Matrices

Image Registration Lecture 2: Vectors and Matrices Image Registration Lecture 2: Vectors and Matrices Prof. Charlene Tsai Lecture Overview Vectors Matrices Basics Orthogonal matrices Singular Value Decomposition (SVD) 2 1 Preliminary Comments Some of this

More information

Contents 1 Introduction 4 2 Examples of EOF-analyses SST in the tropical Atlantic SST in the tropical Indian Oc

Contents 1 Introduction 4 2 Examples of EOF-analyses SST in the tropical Atlantic SST in the tropical Indian Oc A Cautionary Note on the Interpretation of EOFs Dietmar Dommenget and Mojib Latif Max Planck Institut fur Meteorologie Bundesstr. 55, D-20146 Hamburg email: dommenget@dkrz.de submitted to J. Climate August

More information

Principal Component Analysis (PCA) Theory, Practice, and Examples

Principal Component Analysis (PCA) Theory, Practice, and Examples Principal Component Analysis (PCA) Theory, Practice, and Examples Data Reduction summarization of data with many (p) variables by a smaller set of (k) derived (synthetic, composite) variables. p k n A

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

Unsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent

Unsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent Unsupervised Machine Learning and Data Mining DS 5230 / DS 4420 - Fall 2018 Lecture 7 Jan-Willem van de Meent DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Dimensionality Reduction Goal:

More information

Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012

Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012 Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component

More information

UCLA STAT 233 Statistical Methods in Biomedical Imaging

UCLA STAT 233 Statistical Methods in Biomedical Imaging UCLA STAT 233 Statistical Methods in Biomedical Imaging Instructor: Ivo Dinov, Asst. Prof. In Statistics and Neurology University of California, Los Angeles, Spring 2004 http://www.stat.ucla.edu/~dinov/

More information

Vector Space Models. wine_spectral.r

Vector Space Models. wine_spectral.r Vector Space Models 137 wine_spectral.r Latent Semantic Analysis Problem with words Even a small vocabulary as in wine example is challenging LSA Reduce number of columns of DTM by principal components

More information

Mid-troposphere variables and their association with daily local precipitation

Mid-troposphere variables and their association with daily local precipitation Meteorol. Appl. 6, 273 282 (1999) Mid-troposphere variables and their association with daily local precipitation N E Ruiz, W M Vargas, Departamento de Ciencias de la Atmósfera, FCEyN, Universidad de Buenos

More information

Computation. For QDA we need to calculate: Lets first consider the case that

Computation. For QDA we need to calculate: Lets first consider the case that Computation For QDA we need to calculate: δ (x) = 1 2 log( Σ ) 1 2 (x µ ) Σ 1 (x µ ) + log(π ) Lets first consider the case that Σ = I,. This is the case where each distribution is spherical, around the

More information

THE SIGNIFICANCE OF SYNOPTIC PATTERNS IDENTIFIED BY THE KIRCHHOFER TECHNIQUE: A MONTE CARLO APPROACH

THE SIGNIFICANCE OF SYNOPTIC PATTERNS IDENTIFIED BY THE KIRCHHOFER TECHNIQUE: A MONTE CARLO APPROACH INTERNATIONAL JOURNAL OF CLIMATOLOGY Int. J. Climatol. 19: 619 626 (1999) THE SIGNIFICANCE OF SYNOPTIC PATTERNS IDENTIFIED BY THE KIRCHHOFER TECHNIQUE: A MONTE CARLO APPROACH ROBERT K. KAUFMANN*, SETH

More information

Covariance and Principal Components

Covariance and Principal Components COMP3204/COMP6223: Computer Vision Covariance and Principal Components Jonathon Hare jsh2@ecs.soton.ac.uk Variance and Covariance Random Variables and Expected Values Mathematicians talk variance (and

More information

Announcements (repeat) Principal Components Analysis

Announcements (repeat) Principal Components Analysis 4/7/7 Announcements repeat Principal Components Analysis CS 5 Lecture #9 April 4 th, 7 PA4 is due Monday, April 7 th Test # will be Wednesday, April 9 th Test #3 is Monday, May 8 th at 8AM Just hour long

More information

Independent Component Analysis and Its Application on Accelerator Physics

Independent Component Analysis and Its Application on Accelerator Physics Independent Component Analysis and Its Application on Accelerator Physics Xiaoying Pang LA-UR-12-20069 ICA and PCA Similarities: Blind source separation method (BSS) no model Observed signals are linear

More information

VAR Model. (k-variate) VAR(p) model (in the Reduced Form): Y t-2. Y t-1 = A + B 1. Y t + B 2. Y t-p. + ε t. + + B p. where:

VAR Model. (k-variate) VAR(p) model (in the Reduced Form): Y t-2. Y t-1 = A + B 1. Y t + B 2. Y t-p. + ε t. + + B p. where: VAR Model (k-variate VAR(p model (in the Reduced Form: where: Y t = A + B 1 Y t-1 + B 2 Y t-2 + + B p Y t-p + ε t Y t = (y 1t, y 2t,, y kt : a (k x 1 vector of time series variables A: a (k x 1 vector

More information

Characteristics of Snowfall over the Eastern Half of the United States and Relationships with Principal Modes of Low-Frequency Atmospheric Variability

Characteristics of Snowfall over the Eastern Half of the United States and Relationships with Principal Modes of Low-Frequency Atmospheric Variability 234 JOURNAL OF CLIMATE Characteristics of Snowfall over the Eastern Half of the United States and Relationships with Principal Modes of Low-Frequency Atmospheric Variability MARK C. SERREZE, MARTYN P.

More information

CHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the

CHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the CHAPTER 4 VARIABILITY ANALYSES Chapter 3 introduced the mode, median, and mean as tools for summarizing the information provided in an distribution of data. Measures of central tendency are often useful

More information

Forecast comparison of principal component regression and principal covariate regression

Forecast comparison of principal component regression and principal covariate regression Forecast comparison of principal component regression and principal covariate regression Christiaan Heij, Patrick J.F. Groenen, Dick J. van Dijk Econometric Institute, Erasmus University Rotterdam Econometric

More information

Exploratory Factor Analysis and Principal Component Analysis

Exploratory Factor Analysis and Principal Component Analysis Exploratory Factor Analysis and Principal Component Analysis Today s Topics: What are EFA and PCA for? Planning a factor analytic study Analysis steps: Extraction methods How many factors Rotation and

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

WINTER TEMPERATURE COVARIANCES IN THE MIDDLE AND THE LOWER TROPOSPHERE OVER EUROPE AND THE NORTH ATLANTIC OCEAN

WINTER TEMPERATURE COVARIANCES IN THE MIDDLE AND THE LOWER TROPOSPHERE OVER EUROPE AND THE NORTH ATLANTIC OCEAN INTERNATIONAL JOURNAL OF CLIMATOLOGY Int. J. Climatol. 21: 679 696 (2001) DOI: 10.1002/joc.651 WINTER TEMPERATURE COVARIANCES IN THE MIDDLE AND THE LOWER TROPOSPHERE OVER EUROPE AND THE NORTH ATLANTIC

More information

A Tutorial on Data Reduction. Principal Component Analysis Theoretical Discussion. By Shireen Elhabian and Aly Farag

A Tutorial on Data Reduction. Principal Component Analysis Theoretical Discussion. By Shireen Elhabian and Aly Farag A Tutorial on Data Reduction Principal Component Analysis Theoretical Discussion By Shireen Elhabian and Aly Farag University of Louisville, CVIP Lab November 2008 PCA PCA is A backbone of modern data

More information

Notes on Latent Semantic Analysis

Notes on Latent Semantic Analysis Notes on Latent Semantic Analysis Costas Boulis 1 Introduction One of the most fundamental problems of information retrieval (IR) is to find all documents (and nothing but those) that are semantically

More information

Principal Component Analysis (PCA) of AIRS Data

Principal Component Analysis (PCA) of AIRS Data Principal Component Analysis (PCA) of AIRS Data Mitchell D. Goldberg 1, Lihang Zhou 2, Walter Wolf 2 and Chris Barnet 1 NOAA/NESDIS/Office of Research and Applications, Camp Springs, MD 1 QSS Group Inc.

More information

Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x =

Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x = Linear Algebra Review Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1 x x = 2. x n Vectors of up to three dimensions are easy to diagram.

More information

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis: Uses one group of variables (we will call this X) In

More information

10.5 ATMOSPHERIC AND OCEANIC VARIABILITY ASSOCIATED WITH GROWING SEASON DROUGHTS AND PLUVIALS ON THE CANADIAN PRAIRIES

10.5 ATMOSPHERIC AND OCEANIC VARIABILITY ASSOCIATED WITH GROWING SEASON DROUGHTS AND PLUVIALS ON THE CANADIAN PRAIRIES 10.5 ATMOSPHERIC AND OCEANIC VARIABILITY ASSOCIATED WITH GROWING SEASON DROUGHTS AND PLUVIALS ON THE CANADIAN PRAIRIES Amir Shabbar*, Barrie Bonsal and Kit Szeto Environment Canada, Toronto, Ontario, Canada

More information

Neuroscience Introduction

Neuroscience Introduction Neuroscience Introduction The brain As humans, we can identify galaxies light years away, we can study particles smaller than an atom. But we still haven t unlocked the mystery of the three pounds of matter

More information

Atmospheric patterns for heavy rain events in the Balearic Islands

Atmospheric patterns for heavy rain events in the Balearic Islands Adv. Geosci., 12, 27 32, 2007 Author(s) 2007. This work is licensed under a Creative Commons License. Advances in Geosciences Atmospheric patterns for heavy rain events in the Balearic Islands A. Lana,

More information

Singular Value Decomposition

Singular Value Decomposition Chapter 6 Singular Value Decomposition In Chapter 5, we derived a number of algorithms for computing the eigenvalues and eigenvectors of matrices A R n n. Having developed this machinery, we complete our

More information

SHORT COMMUNICATION EXPLORING THE RELATIONSHIP BETWEEN THE NORTH ATLANTIC OSCILLATION AND RAINFALL PATTERNS IN BARBADOS

SHORT COMMUNICATION EXPLORING THE RELATIONSHIP BETWEEN THE NORTH ATLANTIC OSCILLATION AND RAINFALL PATTERNS IN BARBADOS INTERNATIONAL JOURNAL OF CLIMATOLOGY Int. J. Climatol. 6: 89 87 (6) Published online in Wiley InterScience (www.interscience.wiley.com). DOI:./joc. SHORT COMMUNICATION EXPLORING THE RELATIONSHIP BETWEEN

More information

Discriminative Direction for Kernel Classifiers

Discriminative Direction for Kernel Classifiers Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering

More information

Recei ed 24 December 1999 Re ised 30 August 2000 Accepted 31 August INTRODUCTION

Recei ed 24 December 1999 Re ised 30 August 2000 Accepted 31 August INTRODUCTION INTERNATIONAL JOURNAL OF CLIMATOLOGY Int. J. Climatol. 21: 419 437 (2001) DOI: 10.1002/joc.606 THE SPATIAL AND TEMPORAL BEHAVIOUR OF THE LOWER STRATOSPHERIC TEMPERATURE OVER THE SOUTHERN HEMISPHERE: THE

More information

System 1 (last lecture) : limited to rigidly structured shapes. System 2 : recognition of a class of varying shapes. Need to:

System 1 (last lecture) : limited to rigidly structured shapes. System 2 : recognition of a class of varying shapes. Need to: System 2 : Modelling & Recognising Modelling and Recognising Classes of Classes of Shapes Shape : PDM & PCA All the same shape? System 1 (last lecture) : limited to rigidly structured shapes System 2 :

More information

Discriminant analysis and supervised classification

Discriminant analysis and supervised classification Discriminant analysis and supervised classification Angela Montanari 1 Linear discriminant analysis Linear discriminant analysis (LDA) also known as Fisher s linear discriminant analysis or as Canonical

More information

Explaining Correlations by Plotting Orthogonal Contrasts

Explaining Correlations by Plotting Orthogonal Contrasts Explaining Correlations by Plotting Orthogonal Contrasts Øyvind Langsrud MATFORSK, Norwegian Food Research Institute. www.matforsk.no/ola/ To appear in The American Statistician www.amstat.org/publications/tas/

More information

Multivariate Fundamentals: Rotation. Exploratory Factor Analysis

Multivariate Fundamentals: Rotation. Exploratory Factor Analysis Multivariate Fundamentals: Rotation Exploratory Factor Analysis PCA Analysis A Review Precipitation Temperature Ecosystems PCA Analysis with Spatial Data Proportion of variance explained Comp.1 + Comp.2

More information

Concentration Ellipsoids

Concentration Ellipsoids Concentration Ellipsoids ECE275A Lecture Supplement Fall 2008 Kenneth Kreutz Delgado Electrical and Computer Engineering Jacobs School of Engineering University of California, San Diego VERSION LSECE275CE

More information

Example Linear Algebra Competency Test

Example Linear Algebra Competency Test Example Linear Algebra Competency Test The 4 questions below are a combination of True or False, multiple choice, fill in the blank, and computations involving matrices and vectors. In the latter case,

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 4 for Applied Multivariate Analysis Outline 1 Eigen values and eigen vectors Characteristic equation Some properties of eigendecompositions

More information