Constructed Analogs and Linear Regression


JULY 2013    TIPPETT AND DELSOLE    2519

MICHAEL K. TIPPETT
International Research Institute for Climate and Society, Columbia University, Palisades, New York, and Center of Excellence for Climate Change Research, Department of Meteorology, King Abdulaziz University, Jeddah, Saudi Arabia

TIMOTHY DELSOLE
George Mason University, Fairfax, Virginia, and Center for Ocean-Land-Atmosphere Studies, Calverton, Maryland

(Manuscript received 6 August 2012, in final form 25 November 2012)

ABSTRACT

The constructed analog procedure produces a statistical forecast that is a linear combination of past predictand values. The weights used to form the linear combination depend on the current predictor value and are chosen so that the linear combination of past predictor values approximates the current predictor value. The properties of the constructed analog method have previously been described as being distinct from those of linear regression. However, here the authors show that standard implementations of the constructed analog method give forecasts that are identical to linear regression forecasts. A consequence of this equivalence is that constructed analog forecasts based on many predictors tend to suffer from overfitting just as in linear regression. Differences between linear regression and constructed analog forecasts only result from implementation choices, especially ones related to the preparation and truncation of data. Two particular constructed analog implementations are shown to correspond to principal component regression and ridge regression. The equality of linear regression and constructed analog forecasts is illustrated in a Niño-3.4 prediction example, which also shows that increasing the number of predictors results in low-skill, high-variance forecasts, even at long leads, behavior typical of overfitting.
Alternative definitions of the analog weights lead naturally to nonlinear extensions of linear regression such as local linear regression.

Corresponding author address: M. K. Tippett, International Research Institute for Climate and Society, The Earth Institute of Columbia University, Lamont Campus, 61 Route 9W, Palisades, NY. E-mail: tippett@iri.columbia.edu

1. Introduction

A general prediction problem is to find the best estimate of a quantity y given a related quantity x. We refer to the vectors y and x as the predictand and predictor, respectively. Examples of typical earth science prediction problems are as follows: x is the current sea surface temperature and y is its future state (Penland and Magorian 1993); x is a prescribed CO2 concentration and y is global surface temperature (Krueger and Von Storch 2011); x is a large-scale climate feature and y is an associated small-scale climate feature (Robertson et al. 2012). In principle, the probability distribution of y for a particular value of the predictor, $x = x_0$ (the conditional distribution), can be computed from physical laws or estimated from data. In either case, the mean of that distribution (the conditional mean) is the best forecast in the sense of minimizing the expected squared error. When x and y have a joint Gaussian distribution, the best forecast, as well as its uncertainty, is given by linear regression (LR).

The idea of conditional averaging is also found in the constructed analog (CA) method (Van den Dool 1994, 2006), a statistical forecast method that has been applied in a variety of geophysical problems (e.g., Van den Dool et al. 2003; Maurer and Hidalgo 2008; Hawkins et al. 2011). A prediction $y_{CA}$ is made for a particular value of the predictor $x = x_0$ by searching through historical data for values of y corresponding to values of x that are close to $x_0$, so-called analogs.
The CA method expresses the current predictor state $x_0$ as a weighted linear combination of past states and makes a prediction by applying those same weights to the corresponding values of y, an averaging procedure reminiscent of the conditional mean.

The CA has previously been described as differing from LR in two fundamental ways. First, it has been claimed that by making no assumption of a linear relation between predictor and predictand, CA captures nonlinearity. Second, it has been claimed that since CA is not based on minimizing the mean squared error of the predictions, there is no danger of overfitting. Here we show that typical implementations of CA do not have these properties and that, in fact, CA forecasts are identical to LR forecasts.

The paper is organized as follows. In section 2 we review the least squares problems that arise in the formulations of LR and CA, and use the matrix pseudoinverse to show that simple (without predictor truncation or regularization) implementations of the two methods give identical forecasts. In section 3, we identify situations where the simple implementation overfits the data and show that a recommended CA implementation is the same as principal component regression. In section 4, we show that another common CA implementation corresponds to ridge regression. In section 5, we show that LR and CA predictions of the Niño-3.4 index are identical and may have large variance even at long leads. In section 6, we present and illustrate some nonlinear regression methods that follow naturally from modifications to CA. A summary and discussion are given in section 7.

MONTHLY WEATHER REVIEW, VOLUME 141. DOI: /MWR-D. © 2013 American Meteorological Society.

2. Linear regression, constructed analogs, and pseudoinverses

We use the following matrix notation for the training data. Let X be the $N_x \times N_t$ matrix of predictor data; $N_x$ is the number of predictor variables and $N_t$ is the number of time samples. Each column of X contains the predictor variables (x) at a particular time; each row of X contains the time series of a particular predictor variable. Likewise, let Y be the $N_y \times N_t$ matrix of predictand data; $N_y$ is the number of predictand variables. Let $x_0$ be the $N_x \times 1$ column vector of predictor variables to be used in a forecast. We assume that predictors and predictands are expressed as anomalies.
More generally, a row of ones can be included in X to account for an intercept term.

Linear regression finds the $N_y \times N_x$ matrix A of regression coefficients such that the norm of the residuals

$$\|Y - AX\|^2 \tag{1}$$

is minimized. The notation $\|\cdot\|^2$ denotes the square of the Frobenius norm, which is the sum of the squares of the entries of the matrix or vector to which it is applied. The linear regression forecast $y_{LR}$ is

$$y_{LR} = A x_0. \tag{2}$$

Practically, computing the matrix of regression coefficients by direct minimization of (1) may be ill posed (there is no unique solution when $N_x > N_t$) or ill advised (overfitting can lead to poor performance in independent data when $N_x$ is comparable to $N_t$).

The CA method also involves a linear least squares minimization problem. In the CA method, $x_0$ is expressed as a weighted sum of past states (columns of X), and a prediction is formed by applying those same weights to the columns of Y. Specifically, CA finds the $N_t \times 1$ column vector a of weights that minimizes

$$\|x_0 - Xa\|^2, \tag{3}$$

and then makes a prediction $y_{CA}$ by applying those weights to the columns of Y:

$$y_{CA} = Ya. \tag{4}$$

The linear least squares problems appearing in the formulations of LR and CA appear quite different. For instance, the matrix A of LR coefficients multiplies the data on the left to combine different predictors, and the vector a of CA weights multiplies the data on the right to combine different times. Also, CA involves fitting $x_0$ while LR fits Y. One of the least squares problems is always underdetermined and the other overdetermined unless $N_x = N_t$.

We will use the pseudoinverse of the data matrix X to solve both linear least squares problems and show that the resulting LR and CA forecasts are identical. In particular, (1) is minimized by

$$A = YX^+, \tag{5}$$

where $X^+$ is the pseudoinverse of X, a quantity that we will define and discuss later (Hansen 1998). When $N_x > N_t$, (1) is underdetermined, its minimizer is not unique, and $A = YX^+$ is the minimizer with minimum value of $\|A\|^2$. The pseudoinverse commutes with the transpose in that $(X^+)^T = (X^T)^+$. For this reason, the minimizer of (3) can also be expressed using the pseudoinverse, and

$$a = X^+ x_0. \tag{6}$$

When $N_x < N_t$, (3) is underdetermined, and $a = X^+ x_0$ is the minimizer with minimum norm. We refer to these direct minimizing solutions as providing simple implementations of LR and CA. Substituting the simple minimizers of (5) and (6) into the definitions of the LR and CA predictions, (2) and (4), respectively, we see that

$$y_{LR} = Ax_0 = YX^+x_0 = Ya = y_{CA}. \tag{7}$$
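The identity in (7) can be checked numerically. The following is a minimal sketch, assuming synthetic Gaussian data and NumPy; the function names `lr_forecast` and `ca_forecast`, the matrix sizes, and the cutoff `rcond=1e-10` are illustrative choices, not part of the original method description. The two least squares problems (1) and (3) are solved separately with `numpy.linalg.lstsq`, which returns the minimum-norm solution when a problem is underdetermined or rank deficient, and the two forecasts are then compared.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for the synthetic data


def lr_forecast(X, Y, x0, rcond=1e-10):
    """Simple LR: minimize ||Y - A X||^2 over A, then forecast A x0."""
    # Solve the transposed problem X^T A^T = Y^T; lstsq picks the
    # minimum-norm minimizer, which corresponds to A = Y X^+.
    A_T, *_ = np.linalg.lstsq(X.T, Y.T, rcond=rcond)
    return A_T.T @ x0


def ca_forecast(X, Y, x0, rcond=1e-10):
    """Simple CA: minimize ||x0 - X a||^2 over a, then forecast Y a."""
    a, *_ = np.linalg.lstsq(X, x0, rcond=rcond)  # corresponds to a = X^+ x0
    return Y @ a


forecasts = {}
for n_x, n_t in [(5, 40), (60, 40)]:  # fewer and more predictors than samples
    X = rng.standard_normal((n_x, n_t))
    Y = rng.standard_normal((3, n_t))
    # Express predictors and predictands as anomalies (remove the time mean).
    X -= X.mean(axis=1, keepdims=True)
    Y -= Y.mean(axis=1, keepdims=True)
    x0 = rng.standard_normal(n_x)
    forecasts[(n_x, n_t)] = (lr_forecast(X, Y, x0), ca_forecast(X, Y, x0))

for shape, (y_lr, y_ca) in forecasts.items():
    print(shape, "LR and CA forecasts agree:", np.allclose(y_lr, y_ca))
```

The explicit `rcond` matters in the case $N_x > N_t$: removing the time mean makes X rank deficient, and the cutoff keeps the numerically zero singular value from being inverted.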

Remarkably, the simple linear regression and constructed analog predictions are identical.

3. Connection to principal component regression

While the simple LR implementation does solve the least squares problem and find the best fit to the data, it does so using all of the predictors. Such an approach is ill advised when the number of predictors is comparable to the number of samples since overfitting may result in poor predictions on independent data. To see this point more clearly, let us return to the matter of actually defining the pseudoinverse.

The pseudoinverse of X is defined using its singular value decomposition (SVD):

$$X = USV^T, \tag{8}$$

where U and V are orthogonal square matrices of size $N_x \times N_x$ and $N_t \times N_t$, respectively, and S is a diagonal $N_x \times N_t$ matrix with nonnegative entries (Golub and Van Loan 1996). The so-called economical SVD is

$$X = \dot{U}\dot{S}\dot{V}^T, \tag{9}$$

where $\dot{U}$ and $\dot{V}$ retain the columns of U and V, respectively, corresponding to the nonzero diagonal elements of S, and the elements of the square diagonal matrix $\dot{S}$ are strictly positive; the number of positive diagonal entries of S is at most $\min(N_x, N_t - 1)$ for anomaly data. The pseudoinverse of X is defined to be

$$X^+ = \dot{V}\dot{S}^{-1}\dot{U}^T. \tag{10}$$

The matrix $\dot{S}$ is square with positive diagonal entries and is thus invertible. Therefore, the simple LR and CA forecasts are

$$y_{LR} = y_{CA} = YX^+x_0 = Y\dot{V}\dot{S}^{-1}\dot{U}^Tx_0. \tag{11}$$

In the language of principal component analysis (PCA), the columns of the matrices $\dot{U}\dot{S}/\sqrt{N_t}$ and $\sqrt{N_t}\,\dot{V}$ are the empirical orthogonal functions (EOFs) and principal components (PCs), respectively, of the anomaly data X. The factors of $\sqrt{N_t}$ serve to normalize the PCs to have unit variance since the columns of $\dot{V}$ are unit vectors with zero mean. Principal component regression (PCR) arises from taking the PCs as predictors rather than the original data in X.
If we were to use all of the PCs as predictors [simple PCR (SPCR)], we would find the matrix $A_{SPCR}$ of regression coefficients that minimizes

$$\left\|Y - A_{SPCR}\sqrt{N_t}\,\dot{V}^T\right\|^2. \tag{12}$$

This linear least squares problem can be solved by finding the pseudoinverse of $\sqrt{N_t}\,\dot{V}^T$, which is $\dot{V}/\sqrt{N_t}$. Therefore,

$$A_{SPCR} = \frac{1}{\sqrt{N_t}}\,Y\dot{V}. \tag{13}$$

The simple PCR forecast $y_{SPCR}$ is obtained by applying $A_{SPCR}$ to the PC amplitudes of $x_0$, which are $\sqrt{N_t}\,\dot{S}^{-1}\dot{U}^Tx_0$. Therefore,

$$y_{SPCR} = A_{SPCR}\sqrt{N_t}\,\dot{S}^{-1}\dot{U}^Tx_0 = Y\dot{V}\dot{S}^{-1}\dot{U}^Tx_0 = YX^+x_0 = y_{LR} = y_{CA}. \tag{14}$$

Therefore, the LR and CA forecasts with the simple minimizers are the same as the simple PCR forecast, which uses all of the PCs as predictors. Such an approach overfits the data and has poor prediction skill on independent data unless the number of samples is substantially larger than the number of predictors.

To obtain more robust CA weights in the case where the number $N_x$ of predictors is comparable to or exceeds the number $N_t$ of samples, Van den Dool (2006) proposed projecting $x_0$ and X onto a truncated set of EOFs. We use the tilde notation to denote such a truncation, with $\tilde{X} = \ddot{U}\ddot{S}\ddot{V}^T$ and $\tilde{x}_0 = \ddot{U}\ddot{U}^Tx_0$, and the double-dot notation to denote the truncation of the SVD. Computing the CA weights $\tilde{a}$ with the truncated data gives

$$\tilde{a} = \tilde{X}^+\tilde{x}_0 = \ddot{V}\ddot{S}^{-1}\ddot{U}^T\tilde{x}_0, \tag{15}$$

and applying them to Y gives as prediction

$$\tilde{y}_{CA} = Y\tilde{a} = Y\ddot{V}\ddot{S}^{-1}\ddot{U}^T\tilde{x}_0 = \tilde{A}_{PCR}\sqrt{N_t}\,\ddot{S}^{-1}\ddot{U}^T\tilde{x}_0, \tag{16}$$

where $\tilde{A}_{PCR} = Y\ddot{V}/\sqrt{N_t}$ and $\sqrt{N_t}\,\ddot{S}^{-1}\ddot{U}^T\tilde{x}_0$ are the (truncated) PC amplitudes of $x_0$. From the previous discussion leading to (14) we recognize (16) as the PCR forecast based on the truncated set of PCs. Computing CA weights with data projected onto a truncated set of EOFs gives the same forecast as PCR using the same truncated set of PCs.
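The equality of the truncated CA forecast (16) and the PCR forecast can be checked in the same way. The sketch below assumes synthetic anomaly data and an arbitrary truncation at `k = 5` EOFs; all variable names and sizes are illustrative. The CA branch follows (15) literally by forming the truncated data and taking a pseudoinverse, while the PCR branch regresses Y on the leading unit-variance PCs and applies the coefficients to the PC amplitudes of $x_0$.

```python
import numpy as np

rng = np.random.default_rng(1)      # arbitrary seed
n_x, n_t, n_y, k = 50, 30, 2, 5     # illustrative sizes; k EOFs retained

X = rng.standard_normal((n_x, n_t))
Y = rng.standard_normal((n_y, n_t))
X -= X.mean(axis=1, keepdims=True)  # anomalies
Y -= Y.mean(axis=1, keepdims=True)
x0 = rng.standard_normal(n_x)

# Truncated SVD: leading k left vectors, singular values, right vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k].T

# CA with truncated data, as in (15): weights from pinv of the truncated X.
X_trunc = (Uk * sk) @ Vk.T          # U S V^T restricted to k modes
x0_trunc = Uk @ (Uk.T @ x0)         # projection of x0 onto the k EOFs
a_trunc = np.linalg.pinv(X_trunc, rcond=1e-10) @ x0_trunc
y_ca = Y @ a_trunc

# PCR: regress Y on the k unit-variance PCs, apply to PC amplitudes of x0.
pcs = np.sqrt(n_t) * Vk.T           # k x n_t predictor matrix
A_pcr_T, *_ = np.linalg.lstsq(pcs.T, Y.T, rcond=None)
pc_x0 = np.sqrt(n_t) * (Uk.T @ x0) / sk
y_pcr = A_pcr_T.T @ pc_x0

print("truncated CA equals PCR:", np.allclose(y_ca, y_pcr))
```

Because the PCs are orthogonal with squared norm $N_t$, the fitted coefficient matrix also reproduces the closed form $\tilde{A}_{PCR} = Y\ddot{V}/\sqrt{N_t}$ used in (16).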
The choice of the number of PCs to use in the calculation of the CA weights has exactly the same effect on the forecast as the choice of the number of PCs to use in PCR. In both cases, using too many PCs leads to overfitting.

4. Connection to ridge regression

Another approach to the linear least squares problems in (1) and (3) that appear in the formulations of LR and CA is ridge regression, also known as Tikhonov regularization. The regularized solutions of (1) and (3) are

$$A_\delta = YX^T(XX^T + \delta I)^{-1} \tag{17}$$

and

$$a_\delta = (X^TX + \delta I)^{-1}X^Tx_0, \tag{18}$$

respectively, where I is the appropriately sized identity matrix and the ridge parameter $\delta$ is a positive scalar (Hansen 1998). The regularized solutions are well defined irrespective of the parameters $N_x$ and $N_t$. The matrix $A_\delta$ is precisely that used in ridge regression, and Van den Dool (2006) suggested using $a_\delta$ in CA. Remarkably, the resulting forecasts $y_{LR,\delta}$ and $y_{CA,\delta}$ are identical:

$$y_{CA,\delta} = Ya_\delta = Y(X^TX + \delta I)^{-1}X^Tx_0 = YX^T(XX^T + \delta I)^{-1}x_0 = A_\delta x_0 = y_{LR,\delta}, \tag{19}$$

where we have used the push-through matrix identity $(X^TX + \delta I)^{-1}X^T = X^T(XX^T + \delta I)^{-1}$. Use of ridging in computing the CA weights or in computing the LR coefficients results in identical forecasts.

The ridge regression solution is directly related to the pseudoinverse-based solution since an equivalent definition of the pseudoinverse is

$$X^+ = \lim_{\delta \to 0}(X^TX + \delta I)^{-1}X^T = \lim_{\delta \to 0}X^T(XX^T + \delta I)^{-1}. \tag{20}$$

Consequently,

$$\lim_{\delta \to 0}y_{CA,\delta} = \lim_{\delta \to 0}y_{LR,\delta} = y_{CA} = y_{LR}. \tag{21}$$

The ridge regression forecast in the limit of $\delta$ going to zero is the same as the LR or CA forecast with a simple minimizer. This result is consistent with the interpretation of ridge regression as solving the least squares problems subject to a constraint on the size of the solution (DelSole 2007).

5. Example: Niño-3.4 prediction

A typical application of CA and LR is the prediction of the Niño-3.4 index (Van den Dool 2006).
We consider forecasts made in the beginning of July and take as predictors the gridded April–June sea surface temperature (SST) anomaly in the region from 40°S to 40°N from the extended reconstructed SST (ERSST) dataset, version 3b (Smith and Reynolds 2004). The historical data used to form X come from a 49-yr period, and the anomalies are computed with respect to the same period. The predictand y is the 3-month average of the Niño-3.4 anomaly taken from the extended Kaplan dataset (Kaplan et al. 1998) at leads extending to lead 22, where July–September 2005 is denoted the zero-month lead forecast. Here the initial condition $x_0$ is the April–June 2005 SST anomaly, and y consists of the Niño-3.4 index from April–June 2005 to April–June 2007, 25 leads. Forecasts are made based on varying numbers of area-weighted EOFs; no ridging is used.

Figure 1 shows that CA and PCR forecasts based on the same number of EOFs are identical. On the other hand, forecasts based on different numbers of EOFs can vary greatly. Forecasts using 10 EOFs show little variability, while those with 25 or more show considerable variability. This particular set of forecasts verifies well against observations out to a lead of nearly two years.

The skill of forecasts made in July for the following March–May (lead 8) was computed over the historical period using the entire dataset (in sample) and using leave-one-out cross validation (CV) applied to the LR coefficients and CA weights; the PCs were computed using the full dataset. The CV skill of the 10-EOF forecasts is the highest, and as the number of EOFs increases, the resulting forecasts have lower CV skill and greater variance (Table 1). On the other hand, the in-sample correlation increases as the number of EOFs increases, and the in-sample ratio of forecast to climatological variance is equal to the square of the in-sample correlation.
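The in-sample and cross-validated diagnostics described above can be reproduced in miniature. The sketch below is not the ERSST/Kaplan calculation: it assumes synthetic predictors with one predictable pattern, a scalar predictand, 49 samples, and a helper `fit_predict` introduced only for illustration. As in the text, the PCs are computed once from the full dataset and leave-one-out cross validation is applied to the regression coefficients only; for simplicity the anomalies are not recomputed within each cross-validation fold.

```python
import numpy as np

rng = np.random.default_rng(2)   # arbitrary seed
n_x, n_t = 200, 49               # many grid points, 49 samples (synthetic)

signal = rng.standard_normal(n_t)
pattern = rng.standard_normal(n_x)
X = np.outer(pattern, signal) + rng.standard_normal((n_x, n_t))
y = signal + rng.standard_normal(n_t)
X -= X.mean(axis=1, keepdims=True)   # anomalies
y -= y.mean()

# Unit-variance PCs of the full dataset (rows of pcs).
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pcs = np.sqrt(n_t) * Vt


def fit_predict(P_train, y_train, P_test):
    """Least squares fit of y on the PCs in P_train, applied to P_test."""
    coef, *_ = np.linalg.lstsq(P_train.T, y_train, rcond=None)
    return coef @ P_test


rows = []
for k in (2, 10, 25, 40):
    P = pcs[:k]
    fc_in = fit_predict(P, y, P)                      # in-sample forecasts
    fc_cv = np.array([                                # leave-one-out forecasts
        fit_predict(np.delete(P, i, axis=1), np.delete(y, i), P[:, i])
        for i in range(n_t)
    ])
    rows.append((
        k,
        np.corrcoef(fc_in, y)[0, 1],   # correlation (in sample)
        np.corrcoef(fc_cv, y)[0, 1],   # correlation (CV)
        fc_in.var() / y.var(),         # variance ratio (in sample)
        fc_cv.var() / y.var(),         # variance ratio (CV)
    ))

for k, r_in, r_cv, v_in, v_cv in rows:
    print(f"{k:3d} PCs  r_in={r_in:.2f}  r_cv={r_cv:.2f}  "
          f"var_in={v_in:.2f}  var_cv={v_cv:.2f}")
```

Two properties hold by construction for any such data: the in-sample correlation cannot decrease as PCs are added, and the in-sample variance ratio equals the squared in-sample correlation. The cross-validated columns are where overfitting would be expected to show up as the number of PCs grows, though the exact numbers depend on the synthetic data.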
The variance of the cross-validated forecasts is greater than the climatological variance when 25 or more EOFs are used. The reason for this behavior is that the in-sample explained variance and the variance of the regression coefficient estimates, both of which are increasing functions of the number of predictors, contribute to the variance of the cross-validated forecasts. The behavior of the CV forecasts, especially those with more than 10 EOFs, is consistent with overfitting: the in-sample skill is substantially greater than the CV skill, and the CV skill is inconsistent with the forecast variance.

TABLE 1. Skill and ratio of forecast to climatological variance of in-sample and leave-one-out cross-validated (CV) forecasts made in the beginning of July for the following March–May average (lead 8) of the Niño-3.4 index during the historical period. Columns: EOFs; Correlation (in sample); Correlation (CV); Ratio of forecast to climatological variance (in sample); Ratio of forecast to climatological variance (CV).

FIG. 1. Constructed analog (CA) and principal component regression (PCR) forecasts along with observations (obs) of the three-month-average Niño-3.4 index. Forecasts are made at the beginning of July and extend through the final April–June season of the forecast range. The numbers in the legend indicate the number of EOFs retained.

6. Nonlinear CA

Our demonstration that CA forecasts are identical to LR forecasts depends on the weights being defined as the solution of the least squares problem in (3), and on a particular solution being chosen in the underdetermined case. Other characterizations of the weights lead to quite different methods.

Before considering other methods of computing weights, we examine the properties of the CA weights in more detail, focusing on the case when $N_x < N_t$. In this case, (3) does not have a unique solution and using the pseudoinverse or ridge regression selects a particular solution for the weights. The simple minimizer weights are

$$a = \dot{V}\dot{S}^{-1}\dot{U}^Tx_0. \tag{22}$$

The form of (22) means that for any $x_0$, the vector a of weights is a linear combination of the columns of $\dot{V}$. Since the columns of $\dot{V}$ span the same linear space as the rows of X, the weights are a linear combination of the rows of X. In other words, for some $N_x \times 1$ vector b,

$$a = X^Tb; \tag{23}$$

in particular, $b = \dot{U}\dot{S}^{-2}\dot{U}^Tx_0$, and in the case that the predictors are PCs, $b = x_0$. Equation (23) means that the weights, viewed as a function of the data, lie on a hyperplane perpendicular to the $(N_x + 1) \times 1$ vector $[b^T, -1]^T$: the ith weight is $a_i = b^Tx_i$, where $x_i$ is the ith column of X. Because the weights are linear functions of the data, data with values near $x_0$ do not receive the largest weights, nor do data far from $x_0$ receive the smallest weights. The CA weights do not measure the distance of $x_0$ to the training data values.
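The structure of the weights in (22) and (23) can be seen in a small numerical sketch. It assumes synthetic data with $N_x = 3$ and $N_t = 20$, and it deliberately takes $x_0$ to be an exact copy of one training column (index `j = 7`, an arbitrary choice) so that the weight assigned to that column can be inspected.

```python
import numpy as np

rng = np.random.default_rng(3)   # arbitrary seed
n_x, n_t, j = 3, 20, 7           # illustrative sizes; j marks the copied column

X = rng.standard_normal((n_x, n_t))
x0 = X[:, j].copy()              # x0 coincides with training column j

# Minimum-norm CA weights, as in (22).
a = np.linalg.pinv(X) @ x0

# The vector b of (23): b = U S^{-2} U^T x0, so that a = X^T b.
U, s, _ = np.linalg.svd(X, full_matrices=False)
b = U @ ((U.T @ x0) / s**2)

print("a equals X^T b:        ", np.allclose(a, X.T @ b))
print("weights reproduce x0:  ", np.allclose(X @ a, x0))
print("weight on column j:    ", a[j])
print("largest |weight| index:", int(np.argmax(np.abs(a))))
```

The weights reproduce $x_0$ exactly, yet the weight on column j is strictly less than one and the remaining weights are nonzero, because each weight is the linear function $b^Tx_i$ of the data rather than a measure of closeness to $x_0$.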
In particular, if $x_0$ is a natural analog and has the same values as a column of X, the weights are not concentrated on that column of X.

Modifying the definition of the CA weights, so that they are a function of the distance between the data and $x_0$, results in nonlinear statistical prediction algorithms with weights that depend nonlinearly on the data. Importantly, in the case when $N_x < N_t$, such a modification requires changing neither the least squares problem in (3) nor the forecast equation in (4), but rather involves constructing alternative solutions to (3), that is, ones without the constraint that $\|a\|^2$ be minimized. For instance, in the k-nearest neighbors (KNN) algorithm, the elements of the weight vector are all zero except for those corresponding to the k columns of X that are closest to $x_0$, which have value 1/k (Hastie et al. 2009). Explicitly, the ith KNN weight is

$$a_{KNN,i} = \begin{cases} \dfrac{1}{k}, & \text{if } x_i \in C_k(x_0) \\ 0, & \text{otherwise,} \end{cases} \tag{24}$$

where $C_k(x_0)$ is the set of k columns of X nearest to $x_0$. The KNN prediction is the average of the columns of Y corresponding to the k columns of X nearest to $x_0$.

Kernel methods generalize KNN by using weights that are a smoothly decreasing function of the distance between the columns of X and $x_0$. In particular, the ith kernel smoother (KS) weight is

$$a_{KS,i} = \frac{K(x_i, x_0, \lambda)}{\sum_{j=1}^{N_t}K(x_j, x_0, \lambda)}, \tag{25}$$

where the kernel function $K(x, x_0, \lambda)$ is a smoothly decreasing, positive function of the distance between x and $x_0$, and $\lambda$ is a parameter that determines how quickly the kernel function decreases to zero.

Local linear regression (LLR) is another kernel method and computes the weights using generalized least squares with data close to $x_0$ receiving more emphasis. Specifically,

$$a_{LLR} = W^{1/2}(XW^{1/2})^+x_0, \tag{26}$$

where the matrix W is an $N_t \times N_t$ diagonal matrix that depends on $x_0$ and whose ith diagonal entry is

$$W_{ii} = K(x_i, x_0, \lambda). \tag{27}$$
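The weight definitions (24) to (27) translate directly into code. The sketch below is one possible implementation under stated assumptions: a Gaussian kernel with $\lambda$ as its standard deviation, Euclidean distance between columns, univariate synthetic data with a row of ones for the intercept, and a hypothetical forecast point `x0 = 0.5`. The function names are illustrative.

```python
import numpy as np


def knn_weights(X, x0, k):
    """Eq. (24): weight 1/k on the k columns of X nearest to x0, else 0."""
    dist = np.linalg.norm(X - x0[:, None], axis=0)
    a = np.zeros(X.shape[1])
    a[np.argsort(dist)[:k]] = 1.0 / k
    return a


def gaussian_kernel(X, x0, lam):
    """Assumed kernel K(x_i, x0, lam): Gaussian in the distance, std lam."""
    dist2 = np.sum((X - x0[:, None]) ** 2, axis=0)
    return np.exp(-0.5 * dist2 / lam**2)


def ks_weights(X, x0, lam):
    """Eq. (25): kernel values normalized to sum to one."""
    K = gaussian_kernel(X, x0, lam)
    return K / K.sum()


def llr_weights(X, x0, lam, rcond=1e-10):
    """Eqs. (26)-(27): a = W^{1/2} (X W^{1/2})^+ x0 with W_ii = K(x_i, x0, lam)."""
    w_half = np.sqrt(gaussian_kernel(X, x0, lam))
    return w_half * (np.linalg.pinv(X * w_half, rcond=rcond) @ x0)


rng = np.random.default_rng(4)               # arbitrary seed
x = rng.standard_normal(30)                  # 30 univariate samples
X = np.vstack([np.ones_like(x), x])          # row of ones = intercept term
x0 = np.array([1.0, 0.5])                    # hypothetical forecast point

a_ca = np.linalg.pinv(X) @ x0                # simple CA weights, eq. (6)
a_knn = knn_weights(X, x0, k=5)
a_ks = ks_weights(X, x0, lam=0.35)
a_llr = llr_weights(X, x0, lam=0.35)

for name, a in [("CA", a_ca), ("KNN", a_knn), ("KS", a_ks), ("LLR", a_llr)]:
    print(f"{name:3s} sum of weights = {a.sum():.3f}")
```

Given predictand data Y, each forecast would be `Y @ a` as in (4). With a very wide kernel W approaches the identity matrix, and the LLR weights reduce to the simple CA weights, which is one way to see LLR as a localized version of CA.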

We applied these methods to 30 samples of univariate data generated by

$$y = x + 0.8x^3 + \epsilon, \tag{28}$$

where x and $\epsilon$ are Gaussian distributed with mean zero and unit variance. A row of ones is included in X to account for a possible intercept term. Figure 2a shows that LR/CA fails to capture the nonlinear relation. The KNN fit with k = 5 is noisy and piecewise constant with discontinuities. A Gaussian kernel smoother (GKS) with a standard deviation of 0.35 and LLR (with the same Gaussian kernel) give similar results, with LLR showing an advantage near the boundaries of the data. It is important to note that the performance of KNN depends on the choice of k, while the performance of the GKS and LLR depends on the kernel parameter $\lambda$. Here we have selected fairly arbitrary values for these parameters that give good performance. However, like the regression coefficients, these parameters should be chosen objectively in a way that avoids overfitting.

The CA, KNN, GKS, and LLR weights are quite different for the chosen value of $x_0$, as shown in Fig. 2b. The sum of the weights is one for all methods due to the intercept term. A clear feature of the CA weights is that they are a linear function of the data values and display no maximum near $x_0$. This behavior is general, as discussed earlier. The KNN weights are zero except for the five data points nearest to $x_0$, where they are 1/5. The GKS weights have their largest values near $x_0$ and decrease to zero as the distance to $x_0$ increases. The LLR weights are locally linear near $x_0$ with values that go to zero far from $x_0$.

FIG. 2. (a) Data (plus signs) generated by (28) fit by linear regression (LR)/constructed analog (CA), k-nearest neighbors (KNN), Gaussian kernel smoother (GKS), and local linear regression (LLR). The truth curve is the expected value of y given x. (b) The CA, KNN, GKS, and LLR weights for the chosen value of $x_0$. The LLR weights are divided by 4 for display purposes.

7. Summary and discussion

While the constructed analog (CA) statistical forecast method has previously been described as having properties that are distinct from those of linear regression (LR; Van den Dool 2006), we have shown here that, with comparable treatment of the data, CA and LR produce identical forecasts, and therefore the properties of CA are the same as those of LR. In particular, CA forecasts are linear functions of the predictors and subject to overfitting. When EOF truncation is used in the CA calculation, the resulting forecast is the same as that given by principal component regression (PCR) based on the same EOFs. Likewise, using ridging in the calculation of CA weights results in the same forecast as does ridge regression. These results were illustrated in an example where sea surface temperature was used to predict the Niño-3.4 index. The CA and PCR forecasts based on the same number of PCs are identical. When many PCs were used, the forecasts show high variance, even at long leads, but low cross-validated skill, a symptom of overfitting. The equivalence between LR and CA depends on the precise definition of the weights. Allowing the weights to depend nonlinearly on the data leads naturally to generalizations of CA such as kernel smoothers and local linear regression, which we have illustrated with an example.

In practice, LR forecasts are observed to differ from CA forecasts. Moreover, forecasts from different implementations of LR also differ. For instance, LR-based statistical forecasts of ENSO, including CA, have quite different properties (Barnston et al. 2012). Use of distinct datasets may explain some of these differences. However, it must be recognized that many linear regression forecasts, with significant variations in skill, can be constructed from a given dataset of predictors and predictands. There are two primary sources of variety.
First, the predictors or predictands can be truncated, and the regression developed on the truncated data. Principal component analysis and canonical correlation analysis are commonly used methods for truncating the data that enter an LR. The resulting forecasts depend on the truncation choices, as illustrated here in the Niño-3.4 example where the forecasts depend strongly on the number of principal components retained as predictors. Linear inverse models and autoregressive methods usually project both the predictors and predictands onto EOFs (DelSole and Chang 2003); CA generally only projects the predictors, thus leading to different forecasts. Second, there are a variety of methods for estimating the LR coefficients. In addition to the classic least squares method, there are shrinkage methods like ridge and lasso (Hastie et al. 2009). The CA often uses ridge; PCR does not, again leading to different forecasts. Appropriate choices of data truncation and coefficient estimation method are key to developing a skillful LR forecast.

Acknowledgments. The authors thank Huug van den Dool for his generous and helpful comments, and two anonymous reviewers for their useful suggestions. MKT is supported by grants from the National Oceanic and Atmospheric Administration (Grants NA05OAR and NA08OAR) and the Office of Naval Research (Grant N). TD gratefully acknowledges support from grants from the NSF (Grant ), the National Oceanic and Atmospheric Administration (Grant NA09OAR), and the National Aeronautics and Space Administration (Grant NNX09AN50G). The views expressed herein are those of the authors and do not necessarily reflect the views of NOAA or any of its subagencies.

REFERENCES

Barnston, A. G., M. K. Tippett, M. L. L'Heureux, S. Li, and D. G. DeWitt, 2012: Skill of real-time seasonal ENSO model predictions during Is our capability increasing? Bull. Amer. Meteor. Soc., 93.
DelSole, T., 2007: A Bayesian framework for multimodel regression. J. Climate, 20.
DelSole, T., and P. Chang, 2003: Predictable component analysis, canonical correlation analysis, and autoregressive models. J. Atmos. Sci., 60.
Golub, G. H., and C. F. Van Loan, 1996: Matrix Computations. 3rd ed. The Johns Hopkins University Press, 694 pp.
Hansen, P., 1998: Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion. Society for Industrial and Applied Mathematics, 247 pp.
Hastie, T., R. Tibshirani, and J. Friedman, 2009: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 768 pp.
Hawkins, E., J. Robson, R. Sutton, D. Smith, and N. Keenlyside, 2011: Evaluating the potential for statistical decadal predictions of sea surface temperatures with a perfect model approach. Climate Dyn., 37.
Kaplan, A., M. A. Cane, Y. Kushnir, A. C. Clement, M. B. Blumenthal, and B. Rajagopalan, 1998: Analyses of global sea surface temperature. J. Geophys. Res., 103 (C9).
Krueger, O., and J.-S. Von Storch, 2011: A simple empirical model for decadal climate prediction. J. Climate, 24.
Maurer, E. P., and H. G. Hidalgo, 2008: Utility of daily vs. monthly large-scale climate data: An intercomparison of two statistical downscaling methods. Hydrol. Earth Syst. Sci., 12.
Penland, C., and T. Magorian, 1993: Prediction of Niño-3 sea surface temperatures using linear inverse modeling. J. Climate, 6.
Robertson, A. W., J.-H. Qian, M. K. Tippett, V. Moron, and A. Lucero, 2012: Downscaling of seasonal rainfall over the Philippines: Dynamical versus statistical approaches. Mon. Wea. Rev., 140.
Smith, T. M., and R. W. Reynolds, 2004: Improved extended reconstruction of SST ( ). J. Climate, 17.
Van den Dool, H., 1994: Searching for analogues, how long must we wait? Tellus, 46A.
Van den Dool, H., 2006: Empirical Methods in Short-Term Climate Prediction. Oxford University Press, 240 pp.
Van den Dool, H., J. Huang, and Y. Fan, 2003: Performance and analysis of the constructed analogue method applied to U.S. soil moisture over J. Geophys. Res., 108, 8617, doi: /2002JD.
