Statistical downscaling daily rainfall statistics from seasonal forecasts using canonical correlation analysis or a hidden Markov model?

Andrew W. Robertson
International Research Institute for Climate and Society (IRI), Palisades, NY

Koen Verbist
Water Centre for Arid Zones in Latin America and the Caribbean (CAZALAC), La Serena, Chile

Outline

Goal: downscale retrospective GCM seasonal forecasts of precipitation to a network of rainfall stations over central Chile, to predict daily rainfall statistics pertinent to dry-land management.

Targeted statistics: winter-season rainfall total, rainfall frequency, mean daily intensity; drought indices: number of heavy rainfall days, (daily) accumulated precipitation deficit.

2 methods:
1. Linear regression of seasonal statistics using CCA
2. Hidden Markov model (HMM) of daily rainfall sequences

Experiment design

Obs data: 42 daily rainfall stations in central Chile, May-August, 1981-2005
Seasonal forecasts: CFS (T63) May-August gridded precipitation [20-40°S, 65-85°W], initialized on April 1

[Figure panels: Rainfall seasonality; Rainfall vs ENSO]

Multiple linear regression via canonical correlation analysis (CCA)

Regress the seasonal-average observed rainfall fields $y$ onto the GCM forecast fields $x$: $y = Ax + \varepsilon$.
Expand $x$ and $y$ in truncated principal component (PC) time series $V_x$ and $V_y$, and standardize the PCs.
The singular value decomposition $V_y^T V_x = R M S^T$ identifies linear combinations of the observation and predictor PCs with maximum correlation and uncorrelated time series (Barnett and Preisendorfer, 1987).
These new pattern variables give a diagonal regression matrix whose coefficients are correlations: $(V_y R) = M (V_x S)$.
The CCA modes with low correlation are neglected.
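The PC-based CCA steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names, the (time x space) array layout, and the choice to obtain the PCs from an SVD of the anomaly fields are all assumptions made for this example.

```python
import numpy as np

def standardized_pcs(field, n_modes):
    """Leading PC time series of a (time x space) field, scaled to unit variance."""
    anomalies = field - field.mean(axis=0)
    # SVD of the anomaly matrix: the columns of u are the PC time series.
    u, _, _ = np.linalg.svd(anomalies, full_matrices=False)
    n_time = field.shape[0]
    return u[:, :n_modes] * np.sqrt(n_time - 1)

def cca_from_pcs(x_field, y_field, n_x, n_y):
    """CCA in truncated PC space (hypothetical helper, for illustration only)."""
    vx = standardized_pcs(x_field, n_x)   # predictor PCs, V_x
    vy = standardized_pcs(y_field, n_y)   # observation PCs, V_y
    n_time = vx.shape[0]
    # Cross-correlation matrix of the PCs, decomposed as R M S^T.
    cross = vy.T @ vx / (n_time - 1)
    r, m, st = np.linalg.svd(cross, full_matrices=False)
    y_canon = vy @ r       # canonical variates of the observations, V_y R
    x_canon = vx @ st.T    # canonical variates of the predictor, V_x S
    return x_canon, y_canon, m

def predict_canonical(x_canon, m, n_keep):
    """Regress with the leading n_keep modes; the coefficients are the correlations."""
    return x_canon[:, :n_keep] * m[:n_keep]
```

Because the PCs are standardized and mutually uncorrelated, the singular values in `m` are the canonical correlations, so each retained mode is predicted by scaling its predictor variate by the corresponding entry of `m`.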

Leading canonical modes with 4 X- and 5 Y-PCs

Pearson correlation skill using CCA
Leave-5-out cross-validation
Seasonal amount = (no. of wet days) × (mean intensity on wet days)
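The decomposition of seasonal amount into wet-day count and mean wet-day intensity can be illustrated for a single station and season. This is a sketch only: the function name and the 1 mm wet-day threshold are assumptions, since the slides do not state the threshold used.

```python
def seasonal_statistics(daily_rain_mm, wet_threshold_mm=1.0):
    """Wet-day count, mean wet-day intensity, and their product for one season.

    wet_threshold_mm is a hypothetical value chosen for illustration.
    """
    wet = [r for r in daily_rain_mm if r >= wet_threshold_mm]
    n_wet = len(wet)
    intensity = sum(wet) / n_wet if n_wet else 0.0
    # The product equals the rainfall accumulated on wet days only.
    return {"wet_days": n_wet, "mean_intensity": intensity, "amount": n_wet * intensity}
```

Note that the identity is exact only when the seasonal amount is accumulated over wet days; rainfall below the threshold is excluded from all three statistics in this sketch.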

Hidden Markov model of daily rainfall

Simplify the joint probability distribution of all the daily data (42 stations, 123 days, 25 years) by introducing a latent discrete state variable $S_t$ with a Markovian daily time dependence:

$$p(R_{1:T}, S_{1:T}) = \left[ p(S_1) \prod_{t=2}^{T} p(S_t \mid S_{t-1}) \right] \left[ \prod_{t=1}^{T} p(R_t \mid S_t) \right]$$

[Graphical model: a chain of hidden states $S_1, S_2, \dots, S_T$ linked by $P(S_t \mid S_{t-1})$, each emitting the observed daily rainfall field $R_t$ through $P(R_t \mid S_t)$]
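The factorization of the joint distribution can be evaluated directly for a given state sequence. The sketch below is illustrative only: the function name, the argument layout, and the user-supplied emission function are assumptions, as the slides do not specify the emission distribution.

```python
import math

def joint_log_prob(states, rain, initial, transition, emission_prob):
    """log p(R_1:T, S_1:T) under the HMM factorization.

    initial[i]          = p(S_1 = i)
    transition[j][i]    = p(S_t = i | S_{t-1} = j)
    emission_prob(r, i) = p(R_t = r | S_t = i), supplied by the caller
    """
    logp = math.log(initial[states[0]])
    # Markov chain term: product over t = 2..T of p(S_t | S_{t-1}).
    for prev, cur in zip(states, states[1:]):
        logp += math.log(transition[prev][cur])
    # Emission term: product over t = 1..T of p(R_t | S_t).
    for s, r in zip(states, rain):
        logp += math.log(emission_prob(r, s))
    return logp

# Toy usage: two states, rainfall occurrence (0 = dry, 1 = wet) at one station,
# with a hypothetical Bernoulli emission per state.
p_wet = [0.1, 0.8]
bernoulli = lambda r, i: p_wet[i] if r == 1 else 1.0 - p_wet[i]
example = joint_log_prob(
    states=[0, 1, 1],
    rain=[0, 1, 1],
    initial=[0.6, 0.4],
    transition=[[0.7, 0.3], [0.4, 0.6]],
    emission_prob=bernoulli,
)
```

Working in log space avoids underflow when the product runs over a full season of daily values.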

Non-homogeneous HMM conditioned on GCM forecasts

$$P(S_t = i \mid S_{t-1} = j, X_t = x) = \frac{\exp(\sigma_{ji} + \rho_i x)}{\sum_{k=1}^{K} \exp(\sigma_{jk} + \rho_k x)}$$

$X_t$: exogenous variable at day $t$ (GCM rainfall PC-1)
$S_{t-1} \to S_t$: Markovian dependence; $S_t$ is the state at day $t$
$R_t$: rainfall field at day $t$

Use the seasonal-mean PC-1 of GCM precipitation [5-40°S, 100-50°W], repeated daily
15 CFS members × 10 stochastic realizations = 150 simulations
4-state model
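The input-dependent transition probabilities can be computed as a row-wise softmax. The following is a minimal sketch for a scalar predictor x; the function name and the parameter values in the usage lines are hypothetical, not fitted values from the study.

```python
import math

def nhmm_transition_matrix(sigma, rho, x):
    """Return P with P[j][i] = P(S_t = i | S_{t-1} = j, X_t = x).

    sigma[j][i] are the baseline transition parameters and rho[i] the
    sensitivity of state i to the exogenous variable x.
    """
    n_states = len(rho)
    matrix = []
    for j in range(n_states):
        logits = [sigma[j][i] + rho[i] * x for i in range(n_states)]
        top = max(logits)  # subtract the maximum for numerical stability
        weights = [math.exp(v - top) for v in logits]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix

# Hypothetical 2-state parameters, chosen only to show the effect of x.
sigma_demo = [[0.0, 0.0], [0.0, 0.0]]
rho_demo = [0.0, 1.0]
neutral = nhmm_transition_matrix(sigma_demo, rho_demo, x=0.0)
shifted = nhmm_transition_matrix(sigma_demo, rho_demo, x=1.0)
```

With x = 0 the matrix reduces to the homogeneous HMM defined by sigma alone; a positive x raises the probability of entering states with larger rho.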

NHMM rainfall states

[Figure panels: Daily state sequence; Sea level pressure anomaly composites]
State 4: 2.7% of days, 56% of total rainfall, 0.52 correlation with ENSO

Pearson correlation skill using NHMM
Leave-3-out cross-validation
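Leave-k-out cross-validation and the Pearson skill score can be sketched as below. The slides do not say how the withheld years are chosen, so withholding contiguous blocks of k years is an assumption of this example, as are the function names.

```python
import math

def leave_k_out_folds(n_years, k):
    """Yield (train, test) year indices, withholding contiguous blocks of k years.

    Contiguous blocks are an assumption; other schemes (for example a
    window centred on each target year) are equally plausible.
    """
    for start in range(0, n_years, k):
        test = list(range(start, min(start + k, n_years)))
        train = [i for i in range(n_years) if i not in test]
        yield train, test

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)
```

In use, the model would be refitted on each `train` set, the withheld `test` years predicted, and `pearson` applied once to the full series of cross-validated predictions against the observations.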

Summary comparison of correlation skill
[Figure panels: via CCA; via NHMM]

Drought indices computed from NHMM daily rainfall sequences

http://iri.columbia.edu/climate/tools

Summary

Canonical correlation analysis (CCA) is applied to the time series of the targeted seasonal statistic, calculated from daily station rainfall observations, together with the GCM (here CFS) retrospective forecasts of gridded seasonal-mean precipitation.

A non-homogeneous hidden Markov model (NHMM) is trained on the daily station observations, using the GCM's seasonal-mean precipitation as a predictor; the targeted seasonal statistic is then computed from a large ensemble of stochastic daily rainfall sequences generated by the NHMM.

Both methods perform quite similarly under cross-validation.

Although more complex, the NHMM directly provides probabilistic information and the daily rainfall sequences required for water balance modeling.