E-AIMS. Global ocean analysis and forecasting: OSE/OSSEs results and recommendations

Research Project co-funded by the European Commission Research Directorate-General
7th Framework Programme
Project No. 284391
E-AIMS: Euro-Argo Improvements for the GMES Marine Service

Global ocean analysis and forecasting: OSE/OSSEs results and recommendations

Ref: E-AIMS: D3.313-v2
Date: August 2015
Author: Elisabeth Rémy, Victor Turpin (Mercator Ocean) and Stéphanie Guinehut (CLS)
Co-ordinator: Institut Français de Recherche pour l'exploitation de la Mer - France

Table of Contents

1. Introduction
  1.1. General presentation
  1.2. Applicable documents
  1.3. Operational oceanography
2. Results of OSEs experiments
  2.1. OSE experiments conducted at Mercator Ocean
    2.1.1. Experimental setup
    2.1.2. OSE results
    2.1.3. OSE conclusions
  2.2. OSE experiments conducted at CLS
    2.2.1. ARMOR3D observation-based system
    2.2.2. DFS in ARMOR3D
    2.2.3. Experiments
    2.2.4. Results
    2.2.5. Summary
3. Results of OSSEs experiments
  3.1. OSSE conducted at Mercator Ocean
    3.1.1. Experimental setup
    3.1.2. OSSE results
4. Recommendations
5. References

1. Introduction

1.1. General presentation

This document contains a synthesis of the OSE/OSSE experimental results and recommendations from the operational global ocean analysis and forecasting center of the Copernicus Marine Service (MyOcean project). It is the deliverable D3.313 identified in the description of work DA-1, which was due by the end of September 2014 (T0+21) and was shifted to mid-December 2014, T0 being the 1st of January 2013.

1.2. Applicable documents

DA-1: Annex 1 to the grant agreement No. 312642: Description of work, dated 24 April 2012.

1.3. Operational oceanography

Following the international Global Ocean Data Assimilation Experiment (GODAE) strategic plan (2001), processing is operational whenever it is done in a routine and regular way, with a pre-determined systematic approach and constant monitoring of performance.

This document collects a synthesis of the Observing System Evaluations (OSE) and Observing System Simulation Experiments (OSSE) for the global ocean, using an assimilative model that provides short-term forecasts (Mercator Ocean) and an observation-based system derived from observations and statistical methods (CLS).

The quality of the ocean state estimates relies heavily on temperature and salinity profiles to constrain the ocean interior. The Argo data set is now one of the major observation components of the operational ocean analysis systems, and its future evolution can help improve ocean analyses and forecasts. The impact of different types of observations on the operational systems, including the Argo observations, has been assessed in the past. As the operational assimilation systems and observation arrays evolve quickly, such studies should be repeated regularly. For the operational systems, the fit to the assimilated observations is routinely monitored to check the quality level of the products.

Dedicated experiments such as OSEs are useful tools to study the impact of the current observation array, and they also help to check the ability of the assimilation system to benefit from the assimilated observations. An important aspect of E-AIMS is to build recommendations on the possible future deployment of Argo floats. OSSEs, in which observations are simulated, are useful to evaluate the impact of a future observation array.

In the framework of E-AIMS, different operational centers focus on a specific aspect or region of interest. Mercator Ocean evaluates the sensitivity of global physical analyses to deep Argo floats or to an increased number of profiles. Results of the OSEs and OSSEs are presented in the next sections.

2. Results of OSEs experiments

2.1. OSE experiments conducted at Mercator Ocean

In order to better understand the impact of the current Argo array on the analysis and forecast quality, OSEs are run with and without the Argo floats, all other components being the same. The analyzed fields of the different experiments can then be compared. The OSEs dedicated to the impact of the Argo floats in the global ocean are set up with the current operational global ¼° system, which also assimilates altimetry and SST observations. They cover the year 2012, and analyses are produced weekly. Such experiments also help to verify that the information contained in the observations is well used during the assimilation process.

We assessed the impact of the Argo array by comparing the observations with the estimated temperature and salinity fields before and after assimilation. The physical impact of Argo float assimilation is also quantified by comparing the water mass description in different regions of the ocean. These comparisons demonstrate the importance of the Argo observing system for the performance of the ¼° real-time ocean forecasting system. Results will of course depend on the model resolution, its ability to reproduce the observed physical processes, the atmospheric forcing and the other data sets assimilated. SST and SLA also constrain the temperature and salinity fields, since the multivariate assimilation scheme projects their information onto temperature and salinity at depth.

2.1.1. Experimental setup

The ocean forecasting system used in this study is the current operational ocean forecasting system running at Mercator Ocean (hereafter called PSY3V3R3; Lellouche et al., 2013). It has been operational since 2007 and uses version 3.1 of NEMO. The model configuration is based on a ¼° tripolar ORCA grid with a horizontal resolution of 27 km at the Equator, 21 km at Cape Hatteras (mid-latitudes) and 6 km toward the Ross and Weddell Seas.
The vertical discretization has 50 levels, with a layer thickness increasing from 1 m at the surface to 450 m at the bottom and 22 levels within the upper 100 m.

The data assimilation scheme has been developed at Mercator Ocean and implemented in several ocean configurations with a 7-day window. It is derived from the reduced-order Kalman filter, with a 3-D multivariate modal decomposition of the forecast error and a local analysis. The calculated increment is applied progressively using the Incremental Analysis Update (IAU) method. A 3D-Var scheme provides a correction of the slowly evolving large-scale biases in temperature and salinity below the mixed layer.

The assimilated observations consist of along-track SLA from four altimeters (Envisat, Jason-1, Jason-2, CryoSat), distributed by AVISO, during the year 2012. The CNES-CLS09 mean dynamic topography (MDT), derived from observations, is used as a reference for SLA assimilation. SST at ¼° resolution comes from NCEP and NOAA and includes AVHRR and AMSR observations. In situ temperature and salinity profiles from the Coriolis data center are also assimilated.

Six experiments have been performed, withholding different in situ datasets, from January 2012 to December 2012, corresponding to 50 analysis cycles of 7 days. In five experiments, altimetry and SST data are assimilated. In the last one (FreeRun), no data are assimilated. In the reference experiment (RunRef), all the datasets are assimilated, as in the operational system. One simulation (RunNoInsitu) assimilates only satellite altimetry and SST, with no in situ data.

The in situ data set has been separated into two sub-datasets: the Argo dataset and the non-Argo in situ dataset. The Argo dataset has in turn been divided into two nearly equal parts to study the impact of 50% of the current Argo array on our system. To divide the dataset while keeping a coherent spatial and temporal resolution, we use the platform number as criterion: floats with odd and even platform numbers are separated. With these three new datasets, three OSEs have been built. RunNoArgo assimilates every dataset except the Argo dataset. RunArgo2 assimilates 50% of the Argo array (the odd dataset). Another run (RunArgo) assimilates the Argo dataset as its only in situ dataset.

Figure 1: Spatial distribution of the 2012 in situ dataset divided into 3 sub-datasets: 1a shows the odd Argo profiles, 1b the even Argo profiles and 1c the other in situ observations.

Table 1 summarizes the experiment design. This report focuses on RunRef, RunNoArgo and RunArgo2. The other experiments have been used to verify some hypotheses but will not be detailed here.

Experiment     SST   SLA   Argo               Other in situ data
RunRef         X     X     X                  X
RunNoInsitu    X     X
RunNoArgo      X     X                        X
RunArgo2       X     X     50% of the array   X
FreeRun
RunArgo        X     X     X

Table 1: Assimilated datasets in the different OSE experiments
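The odd/even split by platform number described above can be sketched as follows. This is only an illustration, not the Mercator Ocean code: the profile records, the `platform_number` key and the function name are assumptions made for the example.

```python
def split_by_platform_parity(profiles):
    """Split profiles into (odd, even) lists by float platform number.

    Each profile is a dict with a hypothetical "platform_number" key.
    All profiles of a given float end up in the same subset.
    """
    odd, even = [], []
    for profile in profiles:
        number = int(profile["platform_number"])
        (odd if number % 2 else even).append(profile)
    return odd, even


# Made-up records: two floats with a few cycles each.
profiles = [
    {"platform_number": "6900001", "cycle": 1},
    {"platform_number": "6900001", "cycle": 2},
    {"platform_number": "6900002", "cycle": 1},
    {"platform_number": "6900002", "cycle": 2},
    {"platform_number": "6900002", "cycle": 3},
]
odd_profiles, even_profiles = split_by_platform_parity(profiles)
print(len(odd_profiles), len(even_profiles))
```

Splitting by float rather than by individual profile keeps the whole time series of a float in one subset, which is one plausible way to preserve a coherent spatial and temporal sampling in both halves.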

The first part of this report focuses on the physical fields of the global ocean. We compare daily-average temperature and salinity fields of the three experiments at 1000 m and calculate the 6-month mean and RMS of their differences. We then compare the integrated temperature and salinity in the Argo layer (800-2000 m) to get a better view of the global impact of the Argo observations on our analyses at those depths.

The second part focuses on the comparison between model and in situ observations. We assess the importance of the Argo array in our system by comparing the whole in situ dataset with the daily-average temperature and salinity forecast fields of each OSE. Because the observations have not yet been assimilated when the forecast is produced, comparing them with the forecast fields is a way of using independent data to evaluate the impact of Argo assimilation. RMS maps and time series of the differences (innovations) have been produced. We mainly focus on the 800-2000 m layer, where most of the observations are Argo profiles. The number of data used to calculate the RMS is the same for each experiment, so that profiles, time series and maps are comparable with one another.

The last part discusses the water mass description in specific regions, such as the Gibraltar Strait outflow and the Labrador Sea convection area, where the impact of Argo assimilation is significant. We will see that Argo assimilation is necessary for a better representation of water masses in these highly dynamic regions.

2.1.2. OSE results

2.1.2.1. Analysed fields

In this first part, we compare the analyzed temperature and salinity fields. RunRef is used as the reference experiment since it assimilates all the datasets. The other experiments are compared with RunRef in terms of differences, using 1-day average fields.

Figure 2: Salinity (a, c) and temperature (b, d) at 1000 m on 14/12/2012 for RunRef (a, b) and RunNoArgo (c, d)

Figure 2 shows temperature and salinity at 1000 m for RunRef and RunNoArgo. At the global scale, it is not easy to point out the differences between these two experiments. To assess the impact of the assimilation of Argo profiles, we therefore focus on the differences between OSEs. Figure 3 and Figure 4 show maps of the mean and RMS differences between temperature and salinity fields at 1000 m depth, calculated over the last 6 months of the experiments using 1-day average fields.

The North Atlantic Ocean, the Agulhas Current and the Gulf of Aden are the most impacted areas at 1000 m. The RMS differences are around 0.5 °C and 0.1 PSU in these regions when no Argo data are assimilated, and they decrease strongly when half of the profiles are assimilated.

The description of most of the western boundary currents (Gulf Stream, Agulhas Current, Brazil Current) is improved when Argo profiles are assimilated. This is not the case for the Kuroshio Current. One reason could be that the Kuroshio is densely sampled by instruments other than Argo (see Figure 1), so that in this area and at that depth Argo profiles do not provide more information to our system than the other in situ instruments.

These figures point out the importance of Argo float sampling for these regions. The mean and RMS temperature and salinity differences become more localized but remain high when only half of the profiles are assimilated. The improvement brought by the assimilation of the other half of the Argo array is significant.

Figure 3: Mean and RMS of the temperature differences between RunRef and RunNoArgo, and between RunRef and RunArgo2, at 1000 m for the last 6 months of the experiments (columns: mean, RMS; rows: RunRef - RunNoArgo, RunRef - RunArgo2).

Figure 4: Mean and RMS of the salinity differences between RunRef and RunNoArgo, and between RunRef and RunArgo2, at 1000 m for the last 6 months of the experiments (columns: mean, RMS; rows: RunRef - RunNoArgo, RunRef - RunArgo2).

Figure 5 and Figure 6 show, for the month of December, the integrated temperature and salinity differences in the 800-2000 m layer between RunRef and each experiment, calculated from 1-month average T and S fields. In this layer, most of the data come from Argo profiles. The temperature differences are spread over the global ocean and can easily reach +/-0.2 °C. The most impacted regions are the Mediterranean outflow, the Agulhas Current, the circumpolar current and the Gulf Stream, where the dynamics are very intense in this layer.

The Indian Ocean is also strongly impacted by the assimilation of Argo profiles. Equatorial regions, on the other hand, behave differently: the impact of Argo assimilation there is very low compared with the mid-latitude oceans, which is mainly related to the ocean activity at that depth. The intensity and the spatial extent of these differences are strongly reduced when half of the Argo profiles are assimilated, for temperature and, to a lesser extent, for salinity.
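One way to obtain a layer value such as the 800-2000 m differences discussed above is a thickness-weighted vertical mean over the model cells that overlap the layer. The sketch below assumes this interpretation; the function name, the cell interfaces and the difference values are made up for the illustration and are not taken from PSY3V3R3.

```python
import numpy as np


def layer_mean(values, cell_top, cell_bottom, z_min=800.0, z_max=2000.0):
    """Thickness-weighted mean of `values` between depths z_min and z_max.

    values      : one value per model cell (e.g. a temperature difference)
    cell_top    : depth of the upper interface of each cell, in metres
    cell_bottom : depth of the lower interface of each cell, in metres
    """
    values = np.asarray(values, dtype=float)
    top = np.asarray(cell_top, dtype=float)
    bottom = np.asarray(cell_bottom, dtype=float)
    # Thickness of each cell that lies inside the [z_min, z_max] layer.
    overlap = np.clip(np.minimum(bottom, z_max) - np.maximum(top, z_min), 0.0, None)
    if overlap.sum() == 0.0:
        raise ValueError("no model cell overlaps the requested layer")
    return float(np.sum(values * overlap) / overlap.sum())


# Made-up column: five cells and a temperature difference (degC) in each.
cell_top = [0.0, 500.0, 1000.0, 1500.0, 2000.0]
cell_bottom = [500.0, 1000.0, 1500.0, 2000.0, 3000.0]
difference = [1.0, 0.5, 0.2, 0.1, 0.0]
argo_layer_difference = layer_mean(difference, cell_top, cell_bottom)
print(argo_layer_difference)
```

Cells that only partly overlap the layer contribute in proportion to the overlapping thickness, so the result does not depend on where the model interfaces fall relative to 800 m and 2000 m.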

Salinity differences are concentrated in the North Atlantic Ocean, the Arabian Sea, the circumpolar currents and the south-west Atlantic. All these regions are characterized by high dynamic activity at that depth, and Argo floats help to constrain the system toward a better representation of the water masses. The Antarctic region south of the Crozet Islands is a particular case: the impact of the assimilation of Argo floats is large there, and the differences may come from a conflict between Argo floats and other observations such as sea mammal profilers.

The assimilation of half of the Argo salinity profiles has a large impact on the global ocean description. In many regions, assimilating 50% of the salinity profiles seems sufficient to get good results.

Figure 5: December mean temperature differences between RunRef and RunNoArgo, and between RunRef and RunArgo2, in the 800-2000 m layer

Figure 6: December mean salinity differences between RunRef and RunNoArgo, and between RunRef and RunArgo2, in the 800-2000 m layer

2.1.2.2. Statistics on observation-model misfit

In this second part, we present RMS differences between observations and forecast or analyzed fields. These statistics are computed with the same number of data (assimilated or not) for every experiment, to guarantee a homogeneous evaluation of the performance of the system. The Levitus 2009 climatology is also used as a reference representing our basic knowledge of the ocean.

Figure 7 and Figure 8 present the RMS innovations of temperature and salinity between in situ observations and forecast fields, both raw and normalized by the reference run. The interest of comparing observations with forecast fields is that the observations are independent: they have not yet been assimilated, so they can be used to assess the improvement of the analysis.

In both experiments, the impact of Argo assimilation from the surface to 2000 m depth is significant. For temperature, the improvement brought by Argo assimilation ranges from 20% in the 0-1000 m layer to 30% in the Argo layer (1000-2000 m). In terms of global RMS temperature innovation, the assimilation of 50% of the Argo array seems to impact the forecast field as much as the assimilation of the full array. This global value can hide large regional disparities.

Similar results are found for the global RMS of the salinity innovation. The improvement of the system is around 30% from the surface to 2000 m and grows with depth, as there is little salinity information below 1000 m without Argo. For salinity, assimilating 50% of the Argo array is equivalent, at the global scale, to assimilating the full array in our ¼° system.
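The raw and normalized RMS innovations discussed above can be illustrated with a short sketch. It assumes that the forecast of every run has already been interpolated to the same set of observation points, so that all runs are scored on identical data; the function names and the numbers are hypothetical and do not reproduce the report's results.

```python
import numpy as np


def rms(values):
    """Root mean square of a 1-D set of values."""
    values = np.asarray(values, dtype=float)
    return float(np.sqrt(np.mean(values ** 2)))


def rms_innovations(observations, forecasts_by_run, reference="RunRef"):
    """Raw RMS innovation per run, and the same normalized by the reference run.

    The innovation is taken as observation minus forecast at the
    observation points; every run must use the same observations.
    """
    obs = np.asarray(observations, dtype=float)
    raw = {
        run: rms(obs - np.asarray(forecast, dtype=float))
        for run, forecast in forecasts_by_run.items()
    }
    normalized = {run: value / raw[reference] for run, value in raw.items()}
    return raw, normalized


# Made-up temperatures (degC) at four observation points.
observations = np.array([10.0, 10.0, 10.0, 10.0])
forecasts_by_run = {
    "RunRef": observations + np.array([0.3, -0.3, 0.3, -0.3]),
    "RunNoArgo": observations + np.array([0.45, -0.45, 0.45, -0.45]),
}
raw, normalized = rms_innovations(observations, forecasts_by_run)
print(raw, normalized)
```

A normalized value above 1 for a run means that its forecast is further from the observations than the reference forecast at the same points.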

Figure 7: Raw and normalized mean RMS innovation between observations and forecast temperature fields for RunRef (RUN1), RunNoArgo (RUN11) and RunArgo2 (RUN12).

Figure 8: Raw and normalized mean RMS innovation between observations and forecast salinity fields for RunRef (RUN1), RunNoArgo (RUN11) and RunArgo2 (RUN12).

We now present those statistics as maps. The mean and RMS of the analysis-observation misfits are computed over the last 6 months of the experiment on a regular 2° x 2° grid. The size of each box represents the number of residuals used to compute the statistics, and the color indicates the value of the RMS or mean in the box. These figures show that the spatial distribution of the error is not homogeneous. In general, the western boundary currents and the tropics show the largest values. Patterns are generally the same for temperature and salinity.
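The gridded statistics described above (count, mean and RMS of the residuals in each 2° x 2° box) can be sketched with NumPy as follows. The function name and the sample residuals are assumptions made for the illustration; the actual processing chain used for the report is not described in this document.

```python
import numpy as np


def bin_misfits(lon, lat, misfit, box=2.0):
    """Count, mean and RMS of misfits on a regular box x box degree grid.

    Returns three arrays of shape (n_lat_boxes, n_lon_boxes); boxes with
    no data hold NaN in the mean and RMS arrays.
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    misfit = np.asarray(misfit, dtype=float)
    nx, ny = int(round(360.0 / box)), int(round(180.0 / box))

    # Box indices; longitudes are wrapped to [0, 360), latitude 90N goes in the last row.
    ix = np.floor((lon % 360.0) / box).astype(int) % nx
    iy = np.clip(np.floor((lat + 90.0) / box).astype(int), 0, ny - 1)

    count = np.zeros((ny, nx))
    total = np.zeros((ny, nx))
    total_sq = np.zeros((ny, nx))
    # np.add.at accumulates correctly when several residuals share a box.
    np.add.at(count, (iy, ix), 1)
    np.add.at(total, (iy, ix), misfit)
    np.add.at(total_sq, (iy, ix), misfit ** 2)

    with np.errstate(invalid="ignore", divide="ignore"):
        mean = total / count
        rms = np.sqrt(total_sq / count)
    return count, mean, rms


# Made-up residuals: two in the same box, one elsewhere.
count, mean, rms = bin_misfits(
    lon=[0.5, 1.5, 10.0], lat=[0.5, 1.0, -45.0], misfit=[1.0, -1.0, 2.0]
)
print(count[45, 0], mean[45, 0], rms[45, 0])
```

Keeping the count alongside the mean and RMS allows the number of residuals per box to be displayed, as is done with the box size in the maps.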

Figure 9: Spatial distribution of the mean and RMS temperature differences between RunRef and in situ observations in the 0-300 m and 700-2000 m layers: a) RMS temperature differences in the 0-300 m layer; b) RMS temperature differences in the 700-2000 m layer; c) mean temperature differences in the 0-300 m layer; d) mean temperature differences in the 700-2000 m layer.

Figure 10: Spatial distribution of the RMS and mean salinity differences between RunRef and in situ observations in the 0-300 m and 700-2000 m layers: a) RMS salinity differences in the 0-300 m layer; b) RMS salinity differences in the 700-2000 m layer; c) mean salinity differences in the 0-300 m layer; d) mean salinity differences in the 700-2000 m layer.

2.1.2.3. Specific water mass representation

We focus on regions where Argo data assimilation improves the description of water masses.

Figure 11: Comparison of the Mediterranean outflow salinity at 1000 m for RunRef and RunNoArgo for the month of September

Figure 11 shows the analyzed salinity at 1000 m in the vicinity of the Portuguese coast for the month of September, calculated from the 1-day average salinity files. The Mediterranean outflow is kept at the right depth when Argo floats are assimilated. It is well known that NEMO in z-coordinates does not correctly represent the vertical position of the Mediterranean waters: the model tends to place the Mediterranean water shallower than observed, and this is what happens when Argo is not assimilated. Argo observations help the system to correct this bias of the free model. The zonal spread of the Mediterranean outflow is also a good indicator of the representation of the Mediterranean water: the salty water is kept closer to the coast with Argo. Argo is thus a very important observing system for a better description of the water masses at depth.

Figure 12: Comparison of a numerical mooring at 56.15°W, 60.58°N for RunRef and RunNoArgo

Figure 12 shows salinity time series from a numerical mooring at 56.15°W, 60.58°N in the Labrador Sea, in the reference experiment (left) and in the experiment where no Argo profiles are assimilated (right). Judging by the depth of the mixing and by the restratification episode, the convection is better described when Argo floats are assimilated. Argo profiles are important in such a system as they allow a better representation of the restratification.

2.1.3. OSE conclusions

Observing System Experiments were carried out with the real-time Mercator Ocean ¼° global ocean system to quantify the impact of Argo data assimilation. We considered the effect of Argo data assimilation on the 0-2000 m layer, focusing on the 0-300 m layer and on the 700-2000 m layer, where Argo observations are almost the only in situ observing system. The different OSEs cover the year 2012.

The quality of the analyses without Argo observations was first assessed using independent, non-assimilated Argo data. This pointed out the weaknesses of the system when only SST, SLA and non-Argo in situ data are assimilated. Without Argo data, large errors are found in the western boundary currents, the ACC, the Mediterranean and Red Sea outflows and the tropics.

The impact of Argo data assimilation was then evaluated through the comparison of the analyzed temperature and salinity fields over the last 6 months of the different experiments. The comparison of the RunRef and RunNoArgo experiments points out the high sensitivity of the analysis to Argo data assimilation. The 6-month RMS differences of daily fields between these experiments easily reach 1 °C and 0.1 PSU at 100 m, and 0.3 °C and 0.05 PSU at 1000 m. The location of the main discrepancies is strongly correlated with high-variability regions, both at the surface and at depth.
Large impacts are also found in the Red Sea, Amazon and Mediterranean outflow regions. The regions sensitive to Argo data assimilation coincide well with the regions where there is a large difference between the fields analyzed without Argo observations and the in situ observations.

The evolution of integrated values such as heat content anomaly or mean salinity anomaly from different regions and depths shows that Argo data assimilation has a large impact on the estimated evolution of heat content and mean salinity anomaly in both the surface layer from 0 to 300 m and the 700-2000 m layer, the latter being mainly unobserved without Argo observations.

Finally, we evaluated the skill of the PSY3 forecasting system by computing the RMS differences between in situ observations and forecast fields for the different OSEs. With Argo assimilation, the differences between observations and forecast fields are reduced by about 20% in the 0-300 m layer and by 20% to 65% in the 700-2000 m layer. Results at depth show the importance of the global spatial coverage of Argo to constrain the 3-D temperature and salinity corrections deduced from the assimilation of SLA and SST and to make them more realistic.

We show that the impact of Argo assimilation on the analyzed global ¼° temperature and salinity fields corresponds to an improvement of the analysis and forecast fields in terms of innovations and residuals with respect to in situ observations. This shows the ability of the data assimilation system to take advantage of the Argo observations. The continuous improvement of the system skill from half of the Argo array to the full Argo array also indicates that all observations are needed to constrain our system.

These performances highlight the major importance of Argo data assimilation for operational oceanography. A decrease in the existing coverage of the Argo array would lead to a degradation of our PSY3 global ocean analyses and forecasts.

Finally, it is important to recall that results from OSEs depend on the modelling and data assimilation system used. One should thus be cautious about deriving general statements on observing systems unless consistent results based on several systems are obtained. This is the approach promoted by the GODAE OceanView OSE/OSSE task team. As our results are consistent with and complement those obtained by other teams (e.g. Lea et al., 2014), we believe that our statements on the major contribution of Argo to global ocean analysis and forecasting systems are robust and can be generalized to other systems.

2.2. OSE experiments conducted at CLS

CLS has developed a multivariate data analysis system that merges satellite observations (altimetry and sea surface temperature) and in situ observations (Argo, moorings, CTDs, XBTs, etc.) through linear regression and optimal interpolation (ARMOR3D system, described in Guinehut et al., 2012). This observation-based system is the result of more than ten years of work during which OSEs and OSSEs have been conducted (Guinehut et al., 2002; Guinehut et al., 2004; Guinehut et al., 2012).

The ARMOR3D observation-based system is used as part of E-AIMS to assess the impact of Argo observations, together with satellite observations, on the mapping of temperature and salinity fields, using Degree of Freedom of Signal (DFS) diagnostics. DFS is an influence-matrix diagnostic that provides a measure of the gain in information brought by the observations. It is complementary to the approach developed in the past in the ARMOR3D system (see E-AIMS D3.312: Rémy and Guinehut, 2013).

Several experiments are conducted. The first one studies the impact of the in situ observing system (XBTs, CTDs, Argo, moorings, gliders...) together with the satellite observations (2 datasets). The second one studies the impact of the Argo observing system together with the other in situ observations and the satellite measurements (3 datasets). The third one tests the sensitivity of the results to the error prescribed on the satellite dataset.

2.2.1. ARMOR3D observation-based system

ARMOR3D T/S fields result from a combination of satellite and in situ data processed in two steps (Guinehut et al., 2012):

Step 1: Satellite data (SLA + SST) are projected onto the vertical via a multiple linear regression method and covariances deduced from historical observations. This step gives synthetic fields.
Step 2: These synthetic fields are combined with in situ T/S profiles via an optimal interpolation method. This leads to the ARMOR3D combined fields.

The first step of the method consists in deriving synthetic temperature profiles from the surface down to 1500 m depth from altimeter and SST data, through a multiple linear regression method and covariances calculated from historical data. For synthetic salinity profiles, the method uses only altimeter data. Pre-processing of the altimeter SLA includes the extraction of the steric part of the SLA using regression coefficients deduced from an altimeter/in situ comparison study (Guinehut et al., 2006; Dhomps et al., 2011). This first step is implemented as anomalies from the Arivo monthly climatology ARV11 (http://wwz.ifremer.fr/lpo/so-Argo/Products/Global-Ocean-T-S/ARV11-climatology; see also Gaillard et al., 2008, 2012 and Kolodziejczyk and Gaillard, 2012).

The second step of the method consists in combining the synthetic profiles with in situ temperature and salinity profiles using an optimal interpolation (OI) method (Bretherton et al., 1976). To gain maximum benefit from the qualities of both data sets, namely the accurate information given by in situ T/S profiles and the mesoscale variability given by the synthetic T/S profiles, a precise statistical description of the errors of these observations has been introduced in the optimal interpolation method. The in situ profiles are considered almost perfect, so a very low white noise is applied to them. The synthetic profiles, which stand for the remote-sensing (altimeter and SST) observations, are not direct measurements but are derived from the regression method; correlated errors are therefore applied in order to correct long-wavelength errors or biases present in the synthetic fields and introduced by the regression method. This second step is implemented as anomalies from the synthetic fields.

Analyses are performed weekly on a regular ¼° horizontal grid with 24 vertical levels from the surface down to 1500 m depth.

The altimeter data are from the SSALTO/DUACS center. They consist of gridded Sea Level Anomaly (SLA) products obtained from an optimal combination of all available satellite altimeters (http://www.aviso.oceanobs.com/fileadmin/documents/data/tools/hdbk_duacs.pdf). The delayed-mode version of the product is used. Satellite SST data are from the daily Reynolds L4 analysis, which combines AVHRR, AMSR and in situ observations and is distributed by the National Climatic Data Center at NOAA (Reynolds et al., 2007). In situ T/S profiles are from the CORA3.3 database distributed by the Coriolis data center, which includes Argo floats, XBTs, CTDs, moorings, gliders and sea mammal profiles (Cabanes et al., 2013).

2.2.2. DFS in ARMOR3D

In order to assess the impact of in situ observations, together with satellite observations, on the mapping of temperature and salinity fields, Degree of Freedom of Signal (DFS) diagnostics are derived from the ARMOR3D method. DFS is an influence-matrix diagnostic, first developed for the atmosphere (Cardinali et al., 2004) and now used for the ocean in data assimilation systems (Oke et al., 2009; Sakov et al., 2012) and in the altimeter DUACS system (Dibarboure et al., 2011). It provides a measure of the gain in information brought by the observations. DFS is calculated as the trace of the HK matrix, H being the observation operator and K the Kalman gain matrix. The optimal interpolation method used in the ARMOR3D system relies on a Gauss-Markov estimator that gives direct access to the HK matrix, as it is explicitly computed along with the error covariance matrix or formal mapping error (Bretherton et al., 1976).

A DFS is computed for each HK matrix, that is, for each grid point in the case of a suboptimal optimal interpolation method. It is thus possible to use this metric to assess the local gain in information provided to the mapping by each dataset (in situ, satellite). Partial DFS are associated with a particular dataset and are computed from the partial trace of the HK matrix, taking only the elements associated with the dataset to be analyzed. The partial DFS associated with dataset i is written DFS(i). Two metrics have been studied in particular:

1- The fraction of the overall information coming from a given dataset: DFS(i) / Σ_j DFS(j), the sum being taken over all datasets.
2- The fraction of the information from the dataset actually exploited by the optimal interpolation method, i.e. the amount of information not lost to duplicate data and measurement error: DFS(i) / N(i), N(i) being the actual number of observations from dataset i.

2.2.3. Experiments

Three experiments are conducted. The first one studies the impact of the in situ observing system as a whole together with the satellite observations. Two datasets are thus considered to compute DFS. The ARMOR3D reprocessing covering the 1993-2012 period is used for this first analysis.

The second experiment studies the impact of the Argo observing system together with the other in situ observations and the satellite measurements. Three datasets are considered here and results from a 2-year period (2008-2009) are analysed.

The third experiment tests the sensitivity of the results to the error prescribed on the synthetic fields. Three datasets are also considered here (Argo, other in situ and satellite) and one date (04/06/2008) is studied.

2.2.4. Results

2.2.4.1. 2 datasets

The method is first illustrated for two specific analyses of the temperature field at 100 m: one near the beginning of the 1993-2012 period, for the 5th of June 1996, and one toward the end of the period, for the 4th of June 2008, 12 years later. The first date shows very poor coverage by in situ observations, particularly in the Southern Ocean. In June 1996, the in situ observing system is mainly composed of moorings and XBT lines (Figure 13). In contrast, the second date shows a very good and spatially homogeneous sampling by in situ observations, mainly due to the Argo observations (Figure 15, Figure 20).

Figure 13: T fields at 100 m for the 05/06/1996 analysis (in °C). Panels: synthetic T at 100 m; in situ T observations at 100 m; combined T at 100 m; difference between combined and synthetic T.
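Before turning to the DFS results, the two metrics defined in section 2.2.2 can be made concrete with a small numerical sketch. It assumes the usual optimal interpolation form of the influence matrix, HK = HBH^T (HBH^T + R)^-1, where HBH^T is the signal covariance between observation points and R the observation error covariance; the function name, the covariance values and the dataset labels are made up and do not come from ARMOR3D.

```python
import numpy as np


def dfs_metrics(signal_cov, error_cov, dataset_ids):
    """Partial DFS metrics per dataset for one analysis.

    signal_cov  : H B H^T, signal covariance between observation points
    error_cov   : R, observation error covariance
    dataset_ids : dataset label of each observation
    """
    signal_cov = np.asarray(signal_cov, dtype=float)
    error_cov = np.asarray(error_cov, dtype=float)
    ids = np.asarray(dataset_ids)

    # Influence matrix HK = H B H^T (H B H^T + R)^-1 (assumed OI form).
    hk = signal_cov @ np.linalg.inv(signal_cov + error_cov)
    diagonal = np.diag(hk)
    total_dfs = diagonal.sum()  # DFS = trace(HK)

    metrics = {}
    for name in sorted(set(dataset_ids)):
        mask = ids == name
        partial = diagonal[mask].sum()  # partial trace for this dataset
        metrics[name] = {
            "dfs": float(partial),
            "fraction_of_information": float(partial / total_dfs),
            "fraction_exploited": float(partial / mask.sum()),
        }
    return metrics


# Case 1: three uncorrelated observations, two accurate and one noisy.
independent = dfs_metrics(
    signal_cov=np.eye(3),
    error_cov=np.diag([0.25, 0.25, 4.0]),
    dataset_ids=["in_situ", "in_situ", "satellite"],
)

# Case 2: two perfectly correlated (duplicate) accurate observations.
duplicated = dfs_metrics(
    signal_cov=np.ones((2, 2)),
    error_cov=0.25 * np.eye(2),
    dataset_ids=["in_situ", "in_situ"],
)
print(independent, duplicated)
```

In the second case the exploited fraction drops well below that of the first case even though the error level is unchanged, which illustrates how redundant observations lower DFS(i)/N(i).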

For the 5th of June 1996, DFS metrics for the temperature field at 100 m show that the information comes from the in situ dataset where in situ observations are available (Figure 14 top left) and from the satellite dataset otherwise (Figure 14 bottom left). The term "satellite" is in fact a slight misnomer, since the information comes from the satellite dataset where it is available and from the climatology everywhere else. Results additionally show that the ARMOR3D method considers the in situ observing system to contain redundant information in the tropical Pacific Ocean and in the Gulf Stream area, where the fraction of information content exploited by the OI system is below 0.4 (Figure 14 top right). Everywhere else, about 80% of the information from in situ observations is exploited. In contrast, only 20 to 30% of the information content from the satellite dataset is exploited by the OI system, meaning that most of the information is lost by the ARMOR3D method because of duplicate data and high measurement error (Figure 14 bottom right).

Figure 14: DFS metrics for the T field at 100 m of the 05/06/1996 analysis. Global means +/- 1 std are also indicated (units: x100%):

                         Fraction of information content   Fraction of information content exploited by OI system
From in situ dataset     0.34 ± 0.34                       0.64 ± 0.24
From satellite dataset   0.66 ± 0.34                       0.22 ± 0.07

Figure 15: Same as Figure 13 for the 04/06/2008 analysis (in °C). Panels: synthetic T at 100 m; in situ T observations at 100 m; combined T at 100 m; difference between combined and synthetic T.

For 4 June 2008, as the in situ observing system is very well developed, more than 80 % of the information comes from the in situ dataset between 65°S and 65°N (Figure 16 top left). The information comes from the climatology in the Arctic Ocean, near the Antarctic shelves and also in the Gulf of Mexico, where no in situ observation is available (Figure 16 bottom left). As for 5 June 1996, the ARMOR3D method finds redundant information in the in situ observing system in the three tropical oceans (Indian, Pacific and Atlantic) as well as in the Kuroshio and Gulf Stream areas (Figure 16 top right). Again, everywhere else, about 80 % of the information from in situ observations is exploited, and only 20 to 30 % of the information content from the satellite dataset is exploited by the OI system (Figure 16 bottom right).

Figure 16: Same as Figure 14 for the 04/06/2008 analysis (units: x100%). From the in situ dataset: fraction of information content 0.65 ± 0.33, fraction of information content exploited by the OI system 0.74 ± 0.16. From the satellite dataset: fraction of information content 0.35 ± 0.33, fraction of information content exploited by the OI system 0.19 ± 0.06.

Except at the surface (i.e. depth of zero), DFS metrics for the temperature field of 4 June 2008 show very similar values as a function of depth (Figure 17). As global means, 60 % of the information comes from the in situ dataset and 40 % comes from the satellite dataset. Those values are associated with a standard deviation (std) of the order of 30 %. The optimal interpolation method exploits between 80 % (at the surface) and 85 % (at depth) of the information from in situ observations. Those values are associated with a std of the order of 15 % down to 600 m, which decreases to less than 10 % at depth. Of the information from satellite observations, 20 % +/- 6 % is exploited by the OI system. This value is constant over depth since the parameters of the OI system (error covariances, correlation scales) are the same at all depths.

Figure 17: DFS metrics as a function of depth for the T field of the 04/06/2008 analysis (global means +/- 1 std) (units: x100%). Panels: fraction of information content and fraction of information content exploited by the OI system, from the satellite dataset and from the in situ dataset.

For the 1993-2012 period, time series of DFS metrics, as global means for the temperature field at 100 m depth, indicate that 1/3 of the overall information comes from the in situ dataset at the beginning of the period and that this number increases to 2/3 when the Argo observing system is fully deployed. The synthetic field dataset completes the information with 2/3 at the beginning of the period and then 1/3 (Figure 18). Those values are associated with a quite large standard deviation of 30 %.

The fraction of the information from the in situ dataset actually exploited by the optimal interpolation method is quite constant over time, with mean values around 65 %. This number is dictated by the correlation scales used in the optimal interpolation method and by the space/time distribution of observations. The associated mean standard deviation is of the order of 20 %. In some areas, redundant in situ observations indeed show lower values, whereas isolated observations (as in the Southern Ocean) have values close to 100 %.

The fraction of information content from the synthetic field dataset actually exploited by the optimal interpolation method is also quite constant over time, with mean values around 20 % and an associated mean standard deviation of the order of 6 %. These numbers are dictated by the way the synthetic fields are used (i.e. as first guess for step 2 of the method) and by the measurement errors applied to those fields.

Figure 18: DFS metrics for the 1993-2012 period and the T field at 100 m depth (global means +/- 1 std) (units: x100%). Panels: fraction of information content and fraction of information content exploited by the OI system, from the satellite dataset and from the in situ dataset.

When the same DFS metrics are averaged over the 65°S-65°N area, the fraction of information content coming from the in situ dataset increases by 10 % and the one coming from the satellite dataset correspondingly decreases by 10 % (Figure 19). As there is almost no in situ observation available in the Arctic Ocean, the information there comes only from the climatology for the full 1993-2012 period, which artificially biases the global information content towards the satellite dataset. For the 65°S-65°N area, the mean standard deviation of the fraction of information content decreases from 30 % at the beginning of the period to 18 % when the Argo observing system is well developed.
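The effect of restricting the average to a latitude band can be illustrated with a minimal sketch. It assumes a cosine-of-latitude area weighting, which is a common choice but is not stated in this report, and the function name `band_mean_std` and the toy values are purely illustrative.

```python
import math

def band_mean_std(values, lats, lat_min=-65.0, lat_max=65.0):
    """Area-weighted mean and std of a gridded metric inside a latitude band.

    values : metric at each grid point (None marks points to skip, e.g. land)
    lats   : latitude of each grid point, in degrees
    The cos(latitude) weight is an assumption made for this sketch.
    """
    pairs = [(v, math.cos(math.radians(lat)))
             for v, lat in zip(values, lats)
             if v is not None and lat_min <= lat <= lat_max]
    wsum = sum(w for _, w in pairs)
    mean = sum(v * w for v, w in pairs) / wsum
    var = sum(w * (v - mean) ** 2 for v, w in pairs) / wsum
    return mean, math.sqrt(var)

# Toy in situ fraction of information content: zero at the two polar points.
values = [0.0, 0.8, 0.6, 0.0]
lats = [-80.0, 0.0, 60.0, 85.0]
band_mean, band_std = band_mean_std(values, lats)
global_mean, global_std = band_mean_std(values, lats, -90.0, 90.0)
```

In this toy case the band mean is higher than the global mean because the polar points, where the in situ dataset brings no information, are excluded.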

Figure 19: DFS metrics for the 1993-2012 period and the T field at 100 m depth (65°S-65°N means +/- 1 std) (units: x100%). Panels: fraction of information content and fraction of information content exploited by the OI system, from the satellite dataset and from the in situ dataset.

2.2.4.2. 3 datasets

As for the 2 datasets experiment, the 3 datasets experiment is first illustrated for 4 June 2008 and for the temperature field at 100 m. In June 2008, the Argo observing system is very well deployed (Figure 20 middle right) and the in situ observing system is completed by the tropical moorings, some XBT lines and a few observations from marine mammals in the south Indian Ocean (Figure 20 middle left).

Figure 20: Same as Figure 15. The in situ observations have been separated into two sets: Argo and other (in °C). Panels: synthetic T at 100 m; in situ T observations at 100 m; other in situ T observations at 100 m; Argo in situ T observations at 100 m; combined T at 100 m; difference between combined and synthetic T.

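Figure 21: DFS metrics for the T field at 100 m of the 04/06/2008 analysis. Global means +/- 1 std and 65°S-65°N means +/- 1 std are also indicated (units: x100%). Values are given as global mean / 65°S-65°N mean. From the in situ Argo dataset: fraction of information content 0.56 ± 0.30 / 0.67 ± 0.19, fraction of information content exploited by the OI system 0.82 ± 0.08 / 0.82 ± 0.08. From the in situ other dataset: fraction of information content 0.09 ± 0.13 / 0.11 ± 0.13, fraction of information content exploited by the OI system 0.49 ± 0.24 / 0.49 ± 0.24. From the satellite dataset: fraction of information content 0.35 ± 0.33 / 0.21 ± 0.19, fraction of information content exploited by the OI system 0.19 ± 0.06 / 0.20 ± 0.06.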

Considering the 65°S-65°N area and the temperature field at 100 m of 4 June 2008, 67 % of the information content comes from the Argo observing system, 11 % from the other in situ instruments and 21 % from the satellite dataset (Figure 21 left). Almost no redundancy is found in the Argo dataset, since the mean information content exploited by the ARMOR3D method is 82 %, with values above 90 % almost everywhere. High densities of Argo floats are found in the western tropical Pacific Ocean and the North Indian Ocean, where slight redundancy is visible (Figure 21 top right). As for the 2 datasets experiment, redundant information in the other in situ dataset is visible in the three tropical oceans and in the Gulf Stream and Kuroshio regions. This dataset nevertheless complements the Argo observing system well in mid-latitude regions (Figure 21 middle right). Results for the satellite dataset are identical to those of the 2 datasets experiment (Figure 21 bottom).

For the 2008-2009 period, 65°S-65°N means of the two DFS metrics show very stable values (Figure 22). The fraction of information content coming from the satellite dataset decreases slightly but continuously over the two-year period, to the benefit of the in situ other dataset, as the Argo dataset shows very stable values (Figure 22 left). At the same time, slightly more redundancy is found in the in situ other dataset, with the exploited fraction going from 50 % at the beginning of 2008 to values lower than 40 % at the end of 2009.

Figure 22: DFS metrics for the 2008-2009 period and the T field at 100 m depth (65°S-65°N means +/- 1 std) (units: x100%). Panels: fraction of information content and fraction of information content exploited by the OI system, from the in situ Argo dataset, from the in situ other dataset and from the satellite dataset.

2.2.4.3. Impact of the error on synthetic fields

The objective of this third experiment is to test the sensitivity of the results to the errors applied to the synthetic fields during the optimal interpolation. DFS metrics are largely dependent on the applied error covariances and correlation scales. As already mentioned, to gain maximum benefit from the qualities of both datasets, namely the accurate information given by in situ T/S profiles and the mesoscale variability given by the synthetic profiles, a precise statistical description of the errors of these observations is introduced in the optimal interpolation method.

In particular, the synthetic profiles simulating remote-sensing (altimeter and SST) observations are not direct measurements but are derived from the regression method. Correlated errors therefore have to be applied to correct long-wavelength errors or biases present in the synthetic fields and introduced by the regression method. These errors are decomposed into a long-wavelength error, which is perfectly correlated and assumed constant in the subdomain (defined as three times the space correlation scale) around the point to be estimated, and an error correlated spatially at space correlation scales proportional to the signal correlation scales. For the in situ profiles, since these observations are considered almost perfect, a very low white noise is applied. The error covariances are therefore expressed in the following form:

$$\langle \varepsilon_i \varepsilon_j \rangle = \delta_{ij}\, b^2$$

for points i, j not from the same source of data and for points i, j from in situ observations, and

$$\langle \varepsilon_i \varepsilon_j \rangle = \delta_{ij}\, b^2 + E_{LW} + E_{CS}\, C(r,t)$$

for points i, j from synthetic observations deduced from remote-sensing measurements, where $b^2$, $E_{LW}$ and $E_{CS}$ are the variances of the white measurement noise, of the long-wavelength error and of the spatially correlated error, respectively.

E_LW and E_CS were estimated a few years ago from a comparison study between collocated in situ and synthetic profiles. An example is shown in Figure 23 for the year 2008. The comparison was performed on 58 357 profiles. The temperature error, as a percentage of signal variance, varies with depth: about 30 % at the surface, around 60 % in the mixed layer, decreasing to around 50 % between 500 and 800 m depth and then increasing up to 80 % at 1500 m depth. Mean values fixed at 20 % of the signal variance, constant over depth, were chosen for E_LW and E_CS.

Figure 23: Top: Geographical distribution of the in situ observations for the year 2008, in 1°x1° boxes. Bottom: Mean, rms (left, in °C) and rms as percentage of in situ minus ARIVO variance (right, in %) of the in situ minus fields differences for the temperature fields. In red, from the ARIVO climatology; in blue, from the synthetic fields; in green, from the ARMOR3D combined fields.
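As an illustration of the error covariance form given in this section, the sketch below assembles the matrix for a small mixed set of in situ and synthetic observations. It is not the ARMOR3D code: the Gaussian shape of C(r,t), the 300 km and 15 day scales, the one-dimensional geometry and all function and parameter names are assumptions made for the example.

```python
import numpy as np

def gaussian_correlation(r_km, dt_days, L_km=300.0, T_days=15.0):
    """Assumed space-time correlation C(r, t); the Gaussian form is illustrative."""
    return np.exp(-(r_km / L_km) ** 2 - (dt_days / T_days) ** 2)

def error_covariance(x_km, t_days, is_synthetic, b2, e_lw, e_cs):
    """Observation error covariance <eps_i eps_j> for mixed observations.

    x_km, t_days : positions along a line (km) and times (days), one per observation
    is_synthetic : True for synthetic (satellite-derived) points, False for in situ
    b2           : white-noise variance of each observation
    e_lw, e_cs   : long-wavelength and spatially correlated error variances
    """
    x = np.asarray(x_km, dtype=float)
    t = np.asarray(t_days, dtype=float)
    syn = np.asarray(is_synthetic, dtype=bool)
    # White noise term delta_ij * b^2, present for every observation.
    R = np.diag(np.asarray(b2, dtype=float))
    # Correlated terms apply only to pairs where both points are synthetic.
    r = np.abs(x[:, None] - x[None, :])
    dt = np.abs(t[:, None] - t[None, :])
    both_synthetic = np.outer(syn, syn)
    R += both_synthetic * (e_lw + e_cs * gaussian_correlation(r, dt))
    return R

# One in situ point and two synthetic points 100 km apart, same date.
R = error_covariance([0.0, 100.0, 200.0], [0.0, 0.0, 0.0],
                     [False, True, True], [0.01, 0.05, 0.05],
                     e_lw=0.2, e_cs=0.2)
```

In this toy setup the in situ observation keeps a purely diagonal error, while the two synthetic observations share both the constant long-wavelength term and a distance-dependent term.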

This value of 20 % of the signal variance is revisited here. Maps of the differences between collocated in situ and synthetic fields have been calculated and are now applied at each depth. For the T field at 100 m, E_LW and E_CS have maximum values in the western boundary current regions and in the tropics, with values around 2.5 °C². Values are lower than 0.5 °C² almost everywhere else (Figure 24 right). Previous values, corresponding to 20 % of the signal variance (Figure 24 left and Figure 25), were higher in the tropics and lower in mid-latitudes.

Figure 24: Long-wavelength error (E_LW) and spatially correlated error (E_CS) for the T field at 100 m. Left: old, Right: new (in °C²).

Figure 25: T variance at 100 m (in °C²).

Again, consider the 65°S-65°N area for the temperature field at 100 m of 4 June 2008, with three datasets as in the second experiment. Now 70 % of the information content comes from the Argo observing system, compared to 67 % in the previous experiment, and 11 % comes from the other in situ instruments, as in the previous experiment. By construction, the information content coming from the satellite dataset decreases to 18 %, from 21 % in the previous experiment (compare Figure 21 left and Figure 26 left). With the new long-wavelength error and the new spatially correlated error, the influence of the Argo observing system is slightly increased and that of the satellite observing system is slightly decreased. The fraction of information content exploited by the OI system is unchanged for the two in situ datasets and is slightly decreased for the satellite dataset, meaning that the system now considers this dataset to contain more redundant information.

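Figure 26: Same as Figure 21 when the new error on the synthetic field is used. 65°S-65°N means +/- 1 std are indicated (units: x100%). From the in situ Argo dataset: fraction of information content 0.70 ± 0.17, fraction of information content exploited by the OI system 0.82 ± 0.08. From the in situ other dataset: fraction of information content 0.11 ± 0.13, fraction of information content exploited by the OI system 0.49 ± 0.24. From the satellite dataset: fraction of information content 0.18 ± 0.16, fraction of information content exploited by the OI system 0.18 ± 0.05.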

As the long-wavelength and the spatially correlated errors now vary as a function of depth, their impact is also expected to be visible as a function of depth. This is shown in Figure 27 for the 65°S-65°N region and in Figure 28 for the tropics (20°S-20°N). The major impact is visible below the mixed layer depth, in the main thermocline. The fraction of information content from the Argo dataset is slightly increased and the one from the satellite dataset is slightly decreased. The fraction of information content from the satellite dataset exploited by the OI system is no longer represented by a straight line but is more consistent with the vertical dynamics of the ocean.

Figure 27: DFS metrics as a function of depth for the T field of the 04/06/2008 analysis (65°S-65°N means +/- 1 std) with the new errors (green) and the old errors (black) (units: x100%). Panels: fraction of information content and fraction of information content exploited by the OI system, from the satellite dataset, from the in situ other dataset and from the in situ Argo dataset.

Figure 28: Same as Figure 27 with the mean limited to the 20°S-20°N region. Panels: fraction of information content and fraction of information content exploited by the OI system, from the satellite dataset, from the in situ other dataset and from the in situ Argo dataset.

2.2.5. Summary

In order to assess the impact of Argo observations, together with satellite observations, on the mapping of temperature and salinity fields, Degrees of Freedom of Signal (DFS) have been derived from the ARMOR3D observation-based system (Guinehut et al., 2012). DFS is an influence matrix diagnostic that provides a measure of the gain in information brought by the observations. Several experiments have been conducted and two metrics have been studied.

When two datasets are considered, in situ (including Argo, XBTs, moorings,...) and satellite, results for the temperature field at 100 m and the 1993-2012 period show that for the global ocean, 1/3 of the overall information comes from the in situ dataset at the beginning of the period and that this number increases to 2/3 when the Argo observing system is fully deployed. The satellite dataset completes the information with 2/3 at the beginning of the period and then 1/3.

When three datasets are considered, Argo, other in situ (including XBTs, moorings,...) and satellite, results for the temperature field at 100 m and the 2008-2009 period show that for the 65°S-65°N area, most of the information comes from the Argo observing system (67 %), then from the satellite dataset (21 %) and finally from the other in situ instruments (11 %). Almost no redundancy is found in the Argo dataset, apart from the Bay of Bengal and the far western part of the tropical Pacific Ocean, where the density of the Argo network is very high. Redundant information is found in the other in situ dataset in the three tropical oceans and in the Gulf Stream and Kuroshio regions. This dataset nevertheless complements the Argo observing system well in mid-latitude regions. For the satellite dataset, only 20 to 30 % of the information content is exploited by the ARMOR3D method, meaning that most of the information is lost because of duplicate data and high measurement error.
Results vary slightly between the surface and 1500 m depth but the main conclusions remain. Moreover, it has been shown that a better representation of the errors on the synthetic fields (i.e. satellite dataset) induces a more realistic vertical structure of the results. Results presented here are largely driven by the parameters (correlation scales, error covariances on the different observing systems) used in the optimal interpolation method and thus by the characteristics of the field to be reconstructed. It is, for example, not very surprising that redundant information is found in the other in situ dataset in the three tropical oceans, since the method uses large correlation scales in these regions, up to 500 km in latitude and 700 km in longitude, with a temporal correlation scale fixed at 15 days. Specific studies with adapted parameters should thus be carried out for a better evaluation of the different observing systems, particularly the tropical ocean observing system.
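To make the two DFS metrics summarised above concrete, here is a minimal sketch for a generic optimal interpolation. It assumes the usual definition of DFS as the trace of the influence matrix HK, and it assumes that the "fraction of information content" is a dataset's partial DFS divided by the total DFS while the "fraction exploited" is the partial DFS divided by the number of observations in that dataset. These normalisations and all names are assumptions of this example, not a transcription of the ARMOR3D code.

```python
import numpy as np

def dfs_metrics(signal_cov_obs, error_cov, labels):
    """Partial DFS diagnostics per dataset for an optimal interpolation.

    signal_cov_obs : H B H^T, signal covariance between observation points
    error_cov      : R, observation error covariance
    labels         : dataset name of each observation
    """
    # Influence matrix H K = H B H^T (H B H^T + R)^-1.
    HK = signal_cov_obs @ np.linalg.inv(signal_cov_obs + error_cov)
    self_sensitivity = np.diag(HK)      # one value per observation
    total_dfs = self_sensitivity.sum()  # DFS = trace(H K)
    labels = np.asarray(labels)
    out = {}
    for name in sorted(set(labels.tolist())):
        mask = labels == name
        partial = self_sensitivity[mask].sum()
        out[name] = {
            "fraction_of_information": partial / total_dfs,
            "fraction_exploited": partial / mask.sum(),
        }
    return out

# Two independent observations: an accurate one and a noisy one.
independent = dfs_metrics(np.eye(2), np.diag([0.25, 4.0]),
                          ["in_situ", "satellite"])

# Two accurate observations of the same signal: redundancy lowers the
# exploited fraction of each (about 0.44 instead of 0.8).
redundant = dfs_metrics(np.ones((2, 2)), 0.25 * np.eye(2),
                        ["in_situ", "in_situ"])
```

The second case shows why dense, strongly correlated observations can have a low exploited fraction even when their individual errors are small.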

3. Results of OSSEs experiments

3.1. OSSE conducted at Mercator Ocean

The Argo float ocean observation array is in constant evolution. The array observes most of the world ocean, with a spatial and temporal sampling of about 3°x3° every 10 days. Floats dive down to 2000 m depth. This appears to be a strong limitation, as the deep ocean is evolving but the current in situ sampling does not allow a good description of its variability. The deep ocean is also one of the key areas which can be impacted by climate change. The thermohaline circulation at depth is not well observed. Altimetry data give us some insight that has to be completed with in situ profiles.

To test the impact of a possible future extension of the Argo array, OSSEs were conducted with the Mercator Ocean global ¼° ocean analysis and forecasting system. This system aims at describing the large-scale physical properties of the ocean and their variability. The OSSEs allow us to test the sensitivity of our system to deep measurements. The observations are simulated with a 1/12° global ocean model simulation and then assimilated in the ¼° global system. Such experiments allow an exact 3D field comparison with the reference run taken as the true ocean. We can then draw conclusions on the ability of our system to reconstruct the simulated 1/12° ocean depending on the deep Argo floats assimilated. It is important to recall that results from OSSEs depend on the analysis system (ocean model physics and configuration) and on the model used to simulate the observations. One should thus be cautious in deriving general statements from observing system simulation experiments.

3.1.1. Experimental setup

Both model configurations are based on a global ORCA grid. The spatial resolution of the high-resolution configuration (1/12°) goes from 10 km at the equator to 2 km close to the grid poles.
The spatial resolution of the eddy-permitting configuration goes from 28 km at the equator to 6 km close to the northern grid pole. The 1/12° simulation, used to simulate the observations, was initialized in October 2006 with the Levitus 2005 climatology. It is forced by real-time ECMWF forcing with bulk formulae. The mass budget follows an analytical seasonal cycle. The ¼° simulation (T323) was initialized in January 1989 from the Levitus 98 climatology. It is forced by ERA-Interim atmospheric fluxes with bulk formulae. The mass budget is not controlled. Both configurations have the same 50 vertical levels, going from 1 m thickness at the surface to 450 m at the bottom. The deep ocean water properties can differ in those two simulations for different reasons:
- Initialization fields: different versions of the Levitus climatology (1998 vs 2005),
- Running time from initialization: a model drift can appear,
- Different atmospheric forcings.

Figure 29 shows the mean temperature difference, from 4000 m down to the model ocean bottom, between the simulation assimilating Argo floats down to 2000 m depth and the global 1/12° simulation used to simulate the observations. Differences are located in the Atlantic Ocean and around the circumpolar ocean, as expected from the thermohaline circulation. The location of those differences is compared to a map of the mean local heat fluxes through 4000 m implied by abyssal warming below 4000 m from the 1990s to the 2000s. It is interesting to note that the difference between the simulated observations and the model solution shows a similar pattern, except south of the Denmark Strait, where our simulation shows a signal not seen in the other estimate. The amplitude of the simulated differences is much larger than the observed one from the 1990s to the 2000s (Figure 30).

Figure 29: Mean temperature differences below 4000 m between RunNoDeep and RunTruth.

Figure 30: (a) Mean local heat fluxes through 4000 m implied by abyssal warming below 4000 m from the 1990s to the 2000s within each of the 24 sampled basins (black numbers and color bar) with 95% confidence intervals. The local contribution to the heat flux through 1000 m south of the SAF (magenta line) implied by deep Southern Ocean warming from 1000 to 4000 m is also given (magenta number) with its 95% confidence interval. Basin boundaries (thick gray lines) and 4000-m isobath (thin black lines) are also shown (from Purkey and Johnson, 2010).

Observations are simulated with the daily output of the global ocean simulation at 1/12°. We computed the model equivalent at the position in time and space of the existing in situ observations of 2009 in the CORA 3.4 database. The model values are added to the original files. All model equivalents of Argo profiles were extended down to the ocean bottom. New variables are created: HDCST_TEMP, HDCST_SAL and HDCST_PRES. The stored temperature is an in situ temperature: the model potential temperature was transformed into in situ temperature to mimic the native file format of the CORA files.

Figure 31: Temperature innovation at 3220 m depth for a given week (14 October 2009) when 1/9 of the Argo floats are diving below 2000 m depth.

Figure 32: Temperature innovation at 3220 m depth for a given week (14 October 2009) when all Argo floats are diving below 2000 m depth.

We ran the assimilation system over the year 2009, loading the simulated in situ observations only. There is no data assimilation south of 60°S. Different experiments were done to assess the impact of the data coverage at depth and of its density; they are presented in Table 2. We first simulate an Argo observation array with all floats going down to 2000 m depth, to represent the present situation; we then extend the assimilated profiles down to 4000 m depth for all the floats. Finally, we simulate the case where only one third of the floats measure down to 4000 m, and only for one profile out of three. The density of observations between 2000 m and 4000 m is then reduced by a factor of 9. This keeps a spatially uniform, though sparser, global coverage.
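The factor of 9 follows from keeping deep data for one float in three and, for those floats, one profile in three. A minimal sketch of such a thinning is given below. The selection rule (modulo on a float identifier and a cycle number), the dictionary layout and the function name are hypothetical; the report does not describe how the deep floats and profiles were chosen.

```python
def subsample_deep_profiles(profiles, shallow_limit=2000.0, deep_limit=4000.0, every=3):
    """Truncate simulated profiles so that only 1/(every*every) of them go deep.

    profiles : list of dicts with keys 'float_id', 'cycle', 'depths', 'values'
               (depths in metres, positive downwards)
    A profile keeps data down to deep_limit only if both its float id and its
    cycle number are multiples of `every` (hypothetical selection rule);
    all other profiles are cut at shallow_limit.
    """
    thinned = []
    for p in profiles:
        deep = (p["float_id"] % every == 0) and (p["cycle"] % every == 0)
        limit = deep_limit if deep else shallow_limit
        kept = [(z, v) for z, v in zip(p["depths"], p["values"]) if z <= limit]
        thinned.append({
            "float_id": p["float_id"],
            "cycle": p["cycle"],
            "depths": [z for z, _ in kept],
            "values": [v for _, v in kept],
            "deep": deep,
        })
    return thinned

# Toy array: 9 floats with 9 cycles each, three levels per simulated profile.
toy = [{"float_id": f, "cycle": c,
        "depths": [1000.0, 3000.0, 5000.0], "values": [4.0, 2.0, 1.0]}
       for f in range(9) for c in range(9)]
thinned = subsample_deep_profiles(toy)
deep_fraction = sum(p["deep"] for p in thinned) / len(thinned)
```

With this toy array, 9 of the 81 profiles keep data below 2000 m, which is the 11 % quoted for the sparse experiments.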

As the variability of the deep ocean, in both time and space, is expected to be smaller than at the surface, a coarser sampling could be sufficient.

Table 2: OSSE experiments

Experiment | Argo up to 2000 m | Argo up to 4000 m | Argo up to ocean bottom
Run1 Classic | 100% | 0% | 0%
Run2 all 4000m | 100% | 100% | 0%
Run3 1/9 bottom | 100% | 11% | 11%
Run4 1/9 4000m | 100% | 11% | 0%

We assimilate only the simulated in situ observations, in order to focus on their contribution without introducing the problem of the coherency between the sea level observations and the in situ observations, which constrain the dynamic height in different ways. This is a complex problem in data assimilation systems, as the assimilation of SLA requires the use of an MDT.

3.1.2. OSSE results

We mainly focus on the ability of the data assimilation system to control the deep ocean heat and salt content depending on the simulated observation array.

3.1.2.1. Maps

Figure 33 represents the mean temperature error estimate for the different OSSE experiments and for different depth ranges. When no Argo profile goes deeper than 2000 m, which is supposed to represent the present situation, the model-observation misfits are small in the Pacific Ocean, slightly larger in the Indian Ocean and significantly larger, up to 1 °C and locally more, in the Antarctic Circumpolar Ocean and in the North Atlantic. Differences are large scale, as expected in the deep ocean; the small-scale noise over topographic features is due to differences in bathymetry representation between the ¼° and 1/12° model configurations.
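Layer-averaged error maps of the kind discussed above could be computed along the following lines. This is only a sketch: it assumes that the assimilation run and the reference ("truth") run have already been interpolated to a common grid, and it uses an unweighted mean over the model levels in the layer, whereas a thickness-weighted mean may be preferable. All names are illustrative.

```python
import numpy as np

def layer_error_stats(t_run, t_truth, depths, z_top, z_bottom):
    """Time-mean error (bias) and time std of the layer-mean temperature error.

    t_run, t_truth : arrays of shape (time, depth, lat, lon) on a common grid
    depths         : depth of each level in metres, positive downwards
    Returns two (lat, lon) maps: bias and standard deviation of the error.
    """
    depths = np.asarray(depths, dtype=float)
    in_layer = (depths >= z_top) & (depths < z_bottom)
    err = np.asarray(t_run, dtype=float) - np.asarray(t_truth, dtype=float)
    # Unweighted average over the levels inside the layer (an assumption).
    layer_err = np.nanmean(err[:, in_layer], axis=1)
    return np.nanmean(layer_err, axis=0), np.nanstd(layer_err, axis=0)

# Toy example: two dates, four levels, a single grid point.
depths = [1000.0, 2500.0, 3500.0, 5000.0]
truth = np.zeros((2, 4, 1, 1))
run = np.array([[0.0, 0.5, 1.5, 9.0],
                [0.0, 1.5, 2.5, 9.0]]).reshape(2, 4, 1, 1)
bias, std = layer_error_stats(run, truth, depths, 2000.0, 4000.0)
```

In the toy example only the 2500 m and 3500 m levels fall inside the 2000-4000 m layer, so the large error placed at 5000 m does not affect the result.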

Figure 33: Mean deep ocean temperature errors in the different OSSEs for different depth ranges. Panels: temperature bias for 2000-4000 m and 4000-6000 m, for the run with all Argo up to 4000 m, the run with 1/9 Argo up to 4000 m and the run with Argo up to 2000 m.

Figure 34: Deep ocean temperature RMS error in the different OSSEs for different depth ranges. Panels: temperature standard deviation for 2000-4000 m and 4000-6000 m, for the run with all Argo up to 4000 m, the run with 1/9 Argo up to 4000 m and the run with Argo up to 2000 m.

The bias existing in the 2000-4000 m layer in the simulation without Argo data assimilation below 2000 m is largely reduced in all experiments with profiles assimilated down to 4000 m depth. A small, spatially homogeneous cold bias appears, which remains unexplained. The bias existing in the 4000-5500 m layer is reduced if the Argo profiles, all or 1/9, are assimilated down to 4000 m depth.

3.1.2.2. SEEK increment and 3D-Var bias correction increment

The data assimilation systems produce weekly corrections that are added to the model forecast.

The data assimilation system SAM relies on local inversion: for a sparse array of observations this hypothesis can be very restrictive, which is the case in this study. The corrections are localized around the observation points. Even when the observed profiles do not extend deeper than 2000 m depth, SAM produces corrections below that depth, whereas the 3D-Var bias correction does not (Figure 35). The 3D-Var bias correction is built to estimate large-scale temperature and salinity biases below the thermocline. The correlation length scale is prescribed to 6° at that depth. When Argo observations reach 4000 m depth, the bias correction is in agreement with the innovations (Figure 36) and has large scales (Figure 37).

Figure 35: SAM2 and 3D-Var bias temperature increment at 3220 m for a week in October 2009 for the simulation with Argo floats diving down to 2000 m depth.

Figure 36: Temperature innovation in October 2009 for the simulation with Argo floats diving down to 4000 m depth.