Markov chain Monte Carlo methods for high dimensional inversion in remote sensing

J. R. Statist. Soc. B (2004) 66, Part 3, pp. 591–617

H. Haario and M. Laine, University of Helsinki, Finland
M. Lehtinen, University of Oulu, Sodankylä, Finland
E. Saksman, University of Jyväskylä, Finland
and J. Tamminen, Finnish Meteorological Institute, Helsinki, Finland

[Read before The Royal Statistical Society at a meeting organized by the Research Section on 'Statistical approaches to inverse problems' on Wednesday, December 10th, 2003, Professor J. T. Kent in the Chair]

Summary. We discuss the inversion of the gas profiles (ozone, NO3, NO2, aerosols and neutral density) in the upper atmosphere from spectral occultation measurements. The data are produced by the 'Global ozone monitoring of occultation of stars' instrument on board the Envisat satellite, which was launched in March 2002. The instrument measures the attenuation of light spectra along various horizontal paths from about 100 km down to 10 km. The new feature is that these data allow the inversion of the gas concentration height profiles. A short introduction is given to the present operational data management procedure, with examples of the first real data inversion. Several solution options for a more comprehensive statistical inversion are presented. A direct inversion leads to a non-linear model with hundreds of parameters to be estimated. The problem is solved with an adaptive single-step Markov chain Monte Carlo algorithm. Another approach is to divide the problem into several non-linear problems of smaller dimension, to run parallel adaptive Markov chain Monte Carlo chains for them and to solve the gas profiles in repetitive linear steps. The effect of grid size is discussed, and we show how the prior regularization takes the grid size into account in a way that effectively leads to a grid-independent inversion.
Keywords: Adaptive Markov chain Monte Carlo algorithms; Atmospheric remote sensing; Global ozone monitoring of occultation of stars satellite instrument; High dimensional Markov chain Monte Carlo methods; Inverse problems; Regularization

Address for correspondence: H. Haario, Department of Mathematics, University of Helsinki, PO Box 4, Yliopistonkatu 5, 0014 Helsinki, Finland. E-mail: haario@cc.helsinki.fi

© 2004 Royal Statistical Society 1369–7412/04/66591

1. Introduction

Atmospheric research is increasingly based on remote sensing measurements. As a good example of this, in March 2002 the European Space Agency launched a large satellite called Envisat into a polar orbit at 800 km height. On board Envisat there are 10 instruments for studying the Earth's environment, three of which are especially targeted at atmospheric research. These three instruments, Global ozone monitoring of occultation of stars (GOMOS), the Michelson interferometer for passive atmospheric sounding and the Scanning imaging absorption spectrometer for atmospheric chartography, all use different remote sensing methods to study the atmosphere at heights between 10 and 100 km.

The upper atmosphere changes slowly; therefore continuous and long measurement series are required to detect these changes. Moreover, the volumes of data are large and measurements are typically local in both time and space. So a systematic further analysis is needed to tie the data together into a large, global and dynamic map of the atmosphere. Currently, data assimilation methods (the extended Kalman filter and four-dimensional variational methods) are applied and further developed to answer these challenges. Their success depends strongly on how correctly the error covariances of the local inversions are estimated at each assimilation step. It is also important that we can combine the information measured with different instruments in a proper way. This requirement makes it crucial that the error estimates of the various measurement results are correct.

Here we discuss the use of Markov chain Monte Carlo (MCMC) methods for this purpose in the case of the GOMOS instrument data. A straightforward application of basic MCMC methods is not feasible here; new MCMC methods are required. The amount of GOMOS data is enormous and the characteristic features of the data vary from measurement to measurement. Therefore, methods which do not require tuning of the proposal distribution are necessary. When applying MCMC methods this can be achieved by using adaptive algorithms. Another obstacle to the use of standard MCMC methods, which is typical for inverse problems in general, is the high dimensionality of the problem. Indeed, the dimension of the function-valued unknown is infinite in principle, and the practical number of unknowns is determined by the discretization of the problem.
Naturally, the final result should be independent of the discretization.

The present operational retrieval method for the GOMOS data yields point estimates for the gas profiles in a two-step algorithm, together with standard error bar estimates. Here we shall extend this approach by running parallel low dimensional adaptive MCMC chains (using the adaptive Metropolis (AM) algorithm from Haario et al. (2001)) and combining the results to arrive at high dimensional inversion profiles, without actually running high dimensional chains. The method is fast and robust, but limited to the specific characteristics of the operational inversion of GOMOS data. As a more general approach to deal with high dimensions, we have successfully applied the single-component adaptive Metropolis (SCAM) algorithm from Haario et al. (2003). The results of both methods are compared below.

The prior information (here the assumed mean concentrations of gas at various heights or the smoothness of the profiles) typically leads to a Tikhonov-type regularization (or ridge regression). The inversion result depends on both the discretization and the regularization weight parameter. Here we combine the choice of the regularization parameter with the discretization interval in a way that leads to an inversion result that is essentially independent of the discretization.

In Section 2 we first describe the measurement principle of the GOMOS instrument and the related inverse problems. The adaptive MCMC algorithms are described in Section 3. The parallel MCMC approach is applied to the operational algorithm in Section 4. Section 5 describes the application of the SCAM algorithm and gives comparisons between the various approaches. Finally, the regularization and the discretization issues are discussed more closely in Section 6.

2. Global ozone monitoring by occultation of stars instrument to study the atmosphere

2.1. Measurement principle of the instrument

The GOMOS instrument is the first operational satellite instrument which uses stellar light to study the chemical composition of the atmosphere (Bertaux et al., 1991). Its main objective is to measure vertical profiles of ozone, but it can also detect the air density, NO2, NO3 and aerosols. Minor gases like OClO and BrO might also be detected in certain atmospheric conditions.

The stellar occultation measurement principle of GOMOS is demonstrated in Fig. 1. The optical spectrometer of GOMOS detects light coming from a star at ultraviolet and visible wavelengths. The GOMOS charge-coupled device detector measures the intensity of stellar light by counting photons. A reference intensity spectrum $I_\lambda^{\mathrm{reference}}$ at various wavelengths $\lambda$ is first measured above the atmosphere. In the atmosphere the stellar light is attenuated (absorbed or scattered) by the different gases. As the satellite moves along the orbit, the stellar spectra $I_\lambda$ are repeatedly measured at different tangential altitudes while the ray path traverses the atmosphere. Dividing the stellar spectrum measured through the atmosphere by the reference spectrum gives the transmission spectrum

$$T_\lambda = \frac{I_\lambda}{I_\lambda^{\mathrm{reference}}}.$$

Here the specific stellar features in the spectrum have vanished and only the atmospheric contribution is present. The transmission spectrum includes the fingerprints of the various gases along the ray path, from which the amounts of the gases may be inverted.

Fig. 1. Principle of the stellar occultation measurement: the reference spectrum is measured above the atmosphere and the attenuated spectrum through the atmosphere; by dividing the latter by the reference spectrum we obtain the transmission spectrum; the tangential altitude of the measurement is denoted by z; the atmosphere is (locally) presented as spherical layers around the Earth (the thickness of the atmosphere is largely exaggerated in the figure)
Owing to the measurement geometry of the GOMOS instrument, the sampling resolution (denser than 1.7 km) is better than in earlier remote sensing methods that have been used to study the global atmospheric composition. During 1 day the Envisat satellite orbits the Earth about 15 times, resulting in roughly 4 occultations (in which a star sets behind the Earth's limb) at different geographical locations. There are about 0 stars which can be used as possible sources of light for GOMOS. In one occultation, the transmission is measured at about 10 different wavelengths at typically different tangential altitudes between 10 and 100 km, resulting in around 100000 data points. Depending on atmospheric conditions, the number of gases whose data are to be inverted ranges from 3 to 7. Depending on the discretization of the height profiles of the gases, we obtain around 0 0 unknown parameters for one occultation. For more details on GOMOS see http://envisat.esa.int/instruments/gomos/ or Harris (2001).

2.2. Modelling

The transmission at wavelength $\lambda$, measured along the ray path $l$, includes a term $T^{\mathrm{abs}}_{\lambda,l}$ due to absorption and scattering by gases (aerosols and air density are treated here as gases) and a term $T^{\mathrm{ref}}_{\lambda,l}$ due to refractive attenuation and scintillation:

$$T_{\lambda,l} = T^{\mathrm{abs}}_{\lambda,l}\, T^{\mathrm{ref}}_{\lambda,l}.$$

The dependence of the transmission on the gas densities along the line of sight $l$ is given by Beer's law:

$$T^{\mathrm{abs}}_{\lambda,l} = \exp\left[ -\int_l \sum_{\mathrm{gas}} \alpha^{\mathrm{gas}}_\lambda\{z(s)\}\, \rho^{\mathrm{gas}}\{z(s)\}\, \mathrm{d}s \right], \tag{1}$$

where $\rho^{\mathrm{gas}}(z)$ gives the density of the gas at altitude $z$ and $\alpha$ denotes the cross-sections. Each gas has typical wavelength ranges where it is active by absorbing, scattering or emitting light. The cross-sections reflect this behaviour and their values are considered to be known from laboratory measurements. In equation (1) the sum is over the different gases and the integral is taken over the ray path.

As the GOMOS detector measures counts, the measurement noise is, in principle, Poisson distributed. However, since the number of counts is high owing to the dominating instrumental background noise, the noise has been approximated as normally distributed. The amount of background noise can be estimated by measurements of dark sky. It is also assumed that the noise is independent at each wavelength and each height; see European Space Agency (2002) and Bertaux (1999) for more details and a justification of the assumptions. These assumptions agree reasonably well with the first measured data sets, although some non-idealities, presumably due to scintillation effects, emerge in certain situations.
The measurements are thus modelled by

$$y_{\lambda,l} = T^{\mathrm{abs}}_{\lambda,l}\, T^{\mathrm{ref}}_{\lambda,l} + \varepsilon_{\lambda,l},$$

with independent Gaussian noise $\varepsilon_{\lambda,l} \sim N(0, \sigma^2_{\lambda,l})$, $\lambda = \lambda_1, \ldots, \lambda_\Lambda$, $l = l_1, \ldots, l_M$. The likelihood function for the gas profiles then reads

$$P\{\rho(z)\} \propto \exp\{ -(T - y)^{\mathrm{T}} C^{-1} (T - y)/2 \}, \tag{2}$$

with $C = \mathrm{diag}(\sigma^2_{\lambda,l})$, $y = (y_{\lambda,l})$ and $T = (T_{\lambda,l})$. The inversion problem is to estimate the gas profiles $\rho(z) = (\rho^{\mathrm{gas}}(z))$, $\mathrm{gas} = 1, \ldots, n_{\mathrm{gas}}$.

2.3. The operational algorithm

For clarity, certain technical details which are not relevant for this paper, such as the instrumental point spread function, finite exposure time and chromatic refraction, have been omitted from the modelling of the GOMOS measurements above (see European Space Agency (2002) for a complete discussion of these practical details). In addition, several simplifying assumptions are made in the operational GOMOS data processing algorithm.

The effects due to refraction, $T^{\mathrm{ref}}$, are approximated in the data processing. The refractive attenuation is estimated by using the analysis of the neutral density obtained from the European Centre for Medium-Range Weather Forecasts. The transmission due to the scintillation is approximated by using data coming from the two fast photometers of the GOMOS instrument, which operate in blue and red light with a high frequency of 1 kHz (see European Space Agency (2002) for details). As a consequence of these approximations, the transmission term $T^{\mathrm{abs}}$ alone contains the parameters to be estimated in the present GOMOS retrieval algorithm.

Owing to the size of the problem, the operational data processing algorithms are based on separating the problem into two parts (see for example Kyrölä (1999) or Kyrölä et al. (1993)). The separation of the inverse problem is possible if we can approximate the temperature-dependent cross-sections sufficiently well with representative cross-sections (e.g. cross-sections at the temperature of the tangent point of the ray path). In the operational algorithm this is assumed, so the cross-sections become constant on each ray path and we may write

$$T^{\mathrm{abs}}_{\lambda,l} = \exp\left( -\sum_{\mathrm{gas}} \alpha^{\mathrm{gas}}_{\lambda,l}\, N^{\mathrm{gas}}_l \right), \qquad \lambda = \lambda_1, \ldots, \lambda_\Lambda, \tag{3}$$

where the horizontally integrated gas densities, the line densities, are given by

$$N^{\mathrm{gas}}_l = \int_l \rho^{\mathrm{gas}}\{z(s)\}\, \mathrm{d}s, \qquad l = l_1, \ldots, l_M. \tag{4}$$

Solving equation (3) for the line densities $N^{\mathrm{gas}}_l$ is called the spectral inversion, and solving equation (4) for the local constituent densities $\rho^{\mathrm{gas}}$ is called the vertical inversion.

The spectral inversion (3) is non-linear. Linearization by taking logarithms is not quite straightforward, since in every occultation there are situations where the attenuated signal from the star is small in comparison with the background noise. In the operational algorithm the pre-estimated background noise constants are subtracted from the measurements in a preprocessing step (level 1b processing; see European Space Agency (2002)). Recall that the measurement noise in the transmission data is assumed to be normally distributed and uncorrelated in both the time and the wavelength directions. So the line density vector $N_l = (N^{\mathrm{gas}}_l)$, $\mathrm{gas} = 1, \ldots, n_{\mathrm{gas}}$, in the likelihood function

$$P(N_l) \propto \exp[ -\{T_l(N_l) - y_l\}^{\mathrm{T}} C_l^{-1} \{T_l(N_l) - y_l\}/2 ], \tag{5}$$

with $T_l = (T_{\lambda,l})$, $\lambda = 1, \ldots, \Lambda$, is fitted to the data $y_l = (y_{\lambda,l})$, $C_l = \mathrm{diag}(\sigma^2_{\lambda,l})$, $\lambda = 1, \ldots, \Lambda$, separately for each line of sight $l$.
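As an illustration of equations (3) and (5), the spectral forward model and the Gaussian log-likelihood for a single line of sight might be coded as below. This is a minimal sketch rather than the operational code: the function names `transmission` and `log_likelihood` and all numerical values (cross-sections, line densities, noise level) are made up for the example, and the refraction term is assumed to have been removed beforehand.

```python
import numpy as np


def transmission(cross_sections, line_densities):
    """Equation (3): T[lam] = exp(-sum_gas alpha[lam, gas] * N[gas])."""
    return np.exp(-cross_sections @ line_densities)


def log_likelihood(line_densities, y, sigma, cross_sections):
    """Logarithm of expression (5), up to an additive constant.

    sigma holds the noise standard deviations (scalar or one per wavelength),
    i.e. C_l = diag(sigma**2).
    """
    residual = (transmission(cross_sections, line_densities) - y) / sigma
    return -0.5 * np.sum(residual**2)


# Hypothetical toy problem: 3 wavelengths, 2 gases (values are illustrative only).
alpha = np.array([[1.0, 0.0],
                  [0.5, 0.5],
                  [0.0, 2.0]])
N_true = np.array([0.2, 0.1])
y_obs = transmission(alpha, N_true)  # noise-free synthetic data
print(log_likelihood(N_true, y_obs, 0.01, alpha))
```

A function of this form can be passed either to a least squares optimizer or, as in the later sections, to an MCMC sampler as the target density for one altitude.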
In the operational algorithm the fit of expression (5) is done by the Levenberg–Marquardt algorithm.

The vertical inversion problem is a linear problem. It can be considered as an atmospheric tomography problem, where the gas density profiles are computed from the integrated line densities. By discretizing the atmosphere into layers (see Fig. 1) and assuming constant (or, for example, linearly interpolated) gas densities inside the layers, the problem can be solved separately for each gas as the linear inversion problem

$$A \rho^{\mathrm{gas}} = N^{\mathrm{gas}}, \tag{6}$$

where $N^{\mathrm{gas}} = (N^{\mathrm{gas}}_l)$, $l = l_1, \ldots, l_M$. With constant gas densities inside the layers, the matrix $A$ contains the lengths of the line of sight in the layers and depends on the discretization. In the present operational retrieval algorithm the discretization is fixed so that the number of layers is the same as the number of measurement lines in each occultation ($\rho^{\mathrm{gas}} = (\rho^{\mathrm{gas}}_i)$, $i = 1, \ldots, M$). So the matrix of the linear problem becomes invertible and the solution is easily computed.

In the operational spectral inversion it is assumed that the posterior distribution around the maximum of expression (5) is sufficiently close to quadratic that it can be described by a covariance matrix. This covariance matrix is computed with the iterative Levenberg–Marquardt method. In the operational linear vertical inversion each gas is treated separately, so the correlations between the posterior distributions of different gases are no longer taken into account.
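For constant densities inside the layers, the matrix $A$ of equation (6) can be built from elementary geometry. The sketch below is an illustration under stated assumptions, not the operational discretization: rays are taken as straight lines (refraction is ignored), the layers are spherical shells, the Earth radius of 6371 km and the 10 km layer grid are chosen for the example, and the name `path_length_matrix` is hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def path_length_matrix(tangent_alts_km, layer_edges_km,
                       earth_radius_km=EARTH_RADIUS_KM):
    """A[l, j] = length (km) of ray l inside the spherical layer j.

    Ray l is a straight line with tangent altitude tangent_alts_km[l];
    layer j lies between layer_edges_km[j] and layer_edges_km[j + 1].
    """
    r_t = earth_radius_km + np.asarray(tangent_alts_km, dtype=float)[:, None]
    r_edges = earth_radius_km + np.asarray(layer_edges_km, dtype=float)
    r_lo = r_edges[:-1][None, :]
    r_hi = r_edges[1:][None, :]
    # Distance along the ray from the tangent point to radius r is
    # sqrt(r^2 - r_t^2); it is clipped to zero for shells below the ray.
    half_hi = np.sqrt(np.clip(r_hi**2 - r_t**2, 0.0, None))
    half_lo = np.sqrt(np.clip(r_lo**2 - r_t**2, 0.0, None))
    return 2.0 * (half_hi - half_lo)  # factor 2: the ray crosses each shell twice


# As many layers as lines of sight, with tangent points on the lower layer edges.
edges = np.arange(10.0, 101.0, 10.0)      # 10, 20, ..., 100 km
A = path_length_matrix(edges[:-1], edges)  # 9 x 9, upper triangular
rho = np.linspace(1.0, 2.0, 9)             # made-up density profile
N = A @ rho                                # line densities, equation (4)
rho_back = np.linalg.solve(A, N)           # vertical inversion, equation (6)
```

With this choice of grid the matrix is triangular with a positive diagonal, which is why the operational linear step is invertible and cheap.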

The covariance matrix of the retrieved local density values, computed as simple Gaussian error propagation for the linear step, hence includes only correlations between different altitudes. Fig. 2 shows ozone profiles produced from the real GOMOS data by the operational algorithm.

Fig. 2. Two ozone profiles (altitude (km) against local density (1/cm^3)) measured by the GOMOS instrument on September 17th, 2002, close to Hawaii: occultation R02967/S009 is performed by using a bright star and R02967/S143 with a more typical star; the 1-σ error bars are 68.3% probability regions; the data retrieval is done during the GOMOS phase at ACRI with the GOPR software, version 5.0 (level 1b) and version 5.3 (level 2) (this version is different from the first release that will be provided to Envisat users after the commissioning phase)

2.4. Need for Markov chain Monte Carlo methods for global ozone monitoring of occultation of stars

Although the present operational approach, which follows the usual tradition of non-linear fitting, gives reasonable results, it is unsatisfactory with respect to both the statistical treatment and the way in which the a priori knowledge on the gas profiles should be employed. The regularization and discretization of the gas profiles are treated in Section 6. For poorly identified situations the results become unstable, as the fitted values depend on, for instance, the tuning of the optimization algorithm, the initial values or the stopping criteria. The linearized covariance matrix may or may not correctly approximate the true probability region of the line density values. Positivity constraints must be applied in the inversion; at high altitudes the gas concentrations tend to 0 and the amounts of the minor gases are typically small even at lower altitudes.
The approximate covariances are misleading if the fitted maximum likelihood values come close to the positivity bounds.

The MCMC solution extends the present operational solution in several respects. First, it can be used to validate the error bounds that are obtained by the linearizations of the fitting procedure. Indeed, for occultation data from bright stars the results for well-identified gases practically coincide. Unfortunately, only a minority of the available stars belong to this class. For dim stars the gas profiles are more poorly identified, and in simulations the solutions by MCMC sampling have been shown to be more robust against the relatively large noise in the data (Haario et al., 1999; Tamminen and Kyrölä, 2001). The MCMC methods also allow us to calculate correctly the posterior distributions that become non-Gaussian owing to the prior constraints. Finally, the emerging real data may reveal features that are not anticipated in the present retrieval algorithms: non-Gaussian correlated noise, for instance. The eventual generalizations or corrections may be naturally incorporated in the MCMC approach.

3. Adaptive Markov chain Monte Carlo algorithms

In certain applications we need to perform a large number of simulations with varying underlying target functions. This is especially so in inverse problems which are solved repeatedly for many data sets, a prime example being the GOMOS data processing that is considered here. The amount of data is enormous and the induced posterior distributions vary greatly, depending on atmospheric conditions and the level of noise in the data. Examples of real transmission data with two stars, a bright and a dim star, are presented in Fig. 3. The noise level is clearly larger in the dim star data.

Using standard Metropolis–Hastings methods in such problems leads to a laborious and time-consuming tuning of the proposal distributions, which is typically done by trial and error. Here we solve this problem by utilizing adaptive MCMC algorithms that can be used in an automatic way for large data sets and in fairly large dimensions.

Let us first describe the AM algorithm (introduced in Haario et al. (2001)). It was originally motivated by problems that are encountered in applying standard MCMC methods in the GOMOS inversion.
The AM algorithm works like the basic Metropolis algorithm with a d-dimensional Gaussian proposal distribution centred at the current point. The only difference is that the AM algorithm updates the covariance matrix of the proposal distribution over time by employing the information learned so far. More exactly, let $\pi$ denote the target density on $\mathbb{R}^d$ and assume that at time $t-1$ we have sampled the states $X_0, X_1, \ldots, X_{t-1}$, where $X_0$ is the initial state. A candidate point $Y$ is then sampled from the Gaussian proposal distribution $q_t(\cdot \mid X_0, \ldots, X_{t-1})$. The candidate point $Y$ is accepted with probability

$$\min\{1, \pi(Y)/\pi(X_{t-1})\},$$

in which case we set $X_t = Y$, and otherwise $X_t = X_{t-1}$. The proposal distribution $q_t(\cdot \mid X_0, \ldots, X_{t-1})$ employed in the AM algorithm is centred at the current point $X_{t-1}$ and has covariance $C_t = C_t(X_0, \ldots, X_{t-1})$, where (for $t \ge t_0$, i.e. after an initial period)

$$C_t = s_d\, \mathrm{cov}(X_0, \ldots, X_{t-1}) + s_d\, \varepsilon I_d. \tag{7}$$

Here $\mathrm{cov}(X_0, \ldots, X_{t-1})$ denotes the empirical covariance matrix determined by the points $X_0, \ldots, X_{t-1} \in \mathbb{R}^d$ and $I_d$ is the d-dimensional identity matrix. The small parameter $\varepsilon > 0$ prevents the covariance from shrinking to 0 (in our experience the algorithm is not sensitive to the actual value of $\varepsilon$). As a basic choice for the scaling parameter we adopted the value $s_d = 2.4^2/d$ from Gelman et al. (1996), where it was shown that in a certain sense this choice optimizes the mixing properties of the Metropolis search in the case of Gaussian targets and Gaussian proposals.

Fig. 3. Real measured transmission spectra (transmission against wavelength (nm)) at km tangent height, corrected for refractive attenuation and scintillation (the level of noise varies according to the brightness of the star; it is planned that the GOMOS instrument will follow stars up to magnitude 4): (a) transmission spectrum when the star is bright (stellar magnitude 0.5); (b) transmission spectrum when the star is dimmer (stellar magnitude 2.9)

A novel feature of the AM algorithm is its non-Markovian nature, which follows from the natural adaptation scheme (7) that is employed. This non-Markovianity makes the (correct) convergence of the algorithm non-obvious. However, the following theorem can be shown to hold.

Theorem 1. Assume that the target density $\pi$ has bounded support and is bounded from above. Then the AM chain simulates the target distribution properly: for any bounded and measurable function $f: \mathbb{R}^d \to \mathbb{R}$ it holds almost surely that

$$\lim_{n \to \infty} \frac{1}{n+1} \{ f(X_0) + f(X_1) + \ldots + f(X_n) \} = \int_{\mathbb{R}^d} f(x)\, \pi(x)\, \mathrm{d}x.$$

We refer to Haario et al. (2001), theorem 1, for a more exact statement and for a proof of the result. The basic idea behind the proof is that the adaptation scheme turns out to be asymptotically Markovian; the change in the covariance slows down over time. Thus, the adaptation is almost constant along time intervals of increasing length, and this explains heuristically why convergence takes place. The actual proof, however, is somewhat technical. In Andrieu and Robert (2001) the idea of non-Markovian simulation is generalized and carried further. The references in Haario et al. (2001) and Andrieu and Robert (2001) link to other works on adaptive MCMC algorithms.

The AM algorithm has been applied successfully in many different applications besides GOMOS, where the dimension is not very high. The SCAM algorithm, introduced in Haario et al. (2003), is a variant of AM which appears to perform well even in relatively high dimensions. In the SCAM algorithm we follow the standard single-component Metropolis rule, with one-dimensional Gaussian proposal distributions whose variances are adapted individually, analogously to the AM algorithm. Thus, when deciding the ith co-ordinate $X^i_t$ ($i = 1, \ldots, d$) of the tth state $X_t$ we apply the one-dimensional Metropolis step.

(a) Sample $y^i$ from the one-dimensional normally distributed proposal distribution $q_t$ centred at $X^i_{t-1}$ with variance $C^i_t$. Here, after an initial period,

$$C^i_t = s_1\, \mathrm{var}(X^i_0, \ldots, X^i_{t-1}) + s_1\, \varepsilon.$$

(b) Accept the candidate point $y^i$ with probability

$$\min\left\{ 1, \frac{\pi(X^1_t, \ldots, X^{i-1}_t, y^i, X^{i+1}_{t-1}, \ldots, X^d_{t-1})}{\pi(X^1_t, \ldots, X^{i-1}_t, X^i_{t-1}, X^{i+1}_{t-1}, \ldots, X^d_{t-1})} \right\},$$

in which case we set $X^i_t = y^i$, and otherwise $X^i_t = X^i_{t-1}$.

The counterpart of theorem 1 for the SCAM algorithm is analogous (see Haario et al. (2003), theorem 1). As usual with one-step algorithms, we must additionally assume that the geometry of the situation allows the algorithm to move around the whole support of the target distribution. The practical performance of the SCAM algorithm naturally depends on the nature (non-linearity) of the underlying target distribution. This has been studied in various test cases in Haario et al. (2003).

4. Parallel Markov chain Monte Carlo methods

The operational data processing of GOMOS is done in two steps, as discussed in Section 2.3. Next we follow the operational approach, but we employ the MCMC approach for the spectral inversion part. The posterior distribution of the line densities is transformed to the posterior distribution of the gas density profiles by solving the linear vertical inversion (6) for all the MCMC samples of the distribution of the line densities.

More precisely, because in the operational approach the measurement error is assumed to be Gaussian and independent in time (i.e. at successive altitudes), the distributions of the line densities $N = (N_l)$ are independent and can be sampled separately, i.e. we can run separate and independent MCMC chains for each altitude. The number of parameters in each spectral inversion problem is equal to the number of gases that we are detecting. As this is a low dimensional problem, the length of the MCMC chain can be moderate and the computational central processor unit (CPU) time remains low.

Let $N^{(i)}_l$, $i = 1, \ldots, K$, be an MCMC chain of length $K$, i.e. a sample from the posterior distribution of the line densities at altitude $l$, with likelihood (5). The posterior distribution of the gas density profiles is obtained by solving the vertical inversion problem (6) for each sampled set $N^{(i)} = (N^{(i)}_1, \ldots, N^{(i)}_M)$, i.e. by picking the ith element from each independent chain representing the line densities at different altitudes. This procedure is repeated for $i = 1, \ldots, K$, yielding a sample from the posterior distribution of the gas densities.
In this step, if no prior restrictions between the gases are set, the gases can be treated separately.

The procedure can now be described as consisting of two parts.

(a) Solve the $M$ non-linear inversion problems (3) independently of each other by applying the AM algorithm from Section 3, using expression (5) as the likelihood function. This will produce $M$ chains $N^{(i)}_l$, $l = l_1, \ldots, l_M$, $i = 1, \ldots, K$.

(b) Solve the linear vertical inversions (6),

$$A \rho^{\mathrm{gas},(i)} = N^{\mathrm{gas},(i)}, \qquad i = 1, \ldots, K, \tag{8}$$

separately for each gas, with $N^{\mathrm{gas},(i)}$ obtained from step (a). This will produce samples $\rho^{(i)}$, $i = 1, \ldots, K$, from the posterior of $\rho$.

Equation (8) could equally be written as a larger system to solve for all the gases simultaneously. This is needed if any prior dependences between the gases are included. In the absence of prior dependences, the above sampling procedure yields the posterior correlations between the gases, even with a separate treatment of the gases. An example of the parallel approach will be given in the next section, where the same inversion problem is also solved directly with a high dimensional MCMC scheme (Fig. 4).

Step (b) can be achieved either by sampling values randomly from the MCMC chains or just by using each value of the chain sequentially. The latter choice allows us to discard the spectral inversion result (and to conserve memory) as soon as it is used: we just run $M$ parallel MCMC simulations and after each step solve the linear vertical inversion, saving only the results of the linear step. This does not affect the AM algorithm, as the covariance update (7) can be done recursively.
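The AM sampler of Section 3, with the empirical covariance in expression (7) updated recursively as noted above, might be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `adaptive_metropolis`, the arguments `c0` (initial proposal covariance), `t0`, `eps` and `seed`, and the standard normal test target are assumptions made for the example. The test target also has unbounded support, so it falls outside the exact conditions of theorem 1.

```python
import numpy as np


def adaptive_metropolis(log_target, x0, n_steps, c0, t0=100, eps=1e-8, seed=0):
    """Adaptive Metropolis chain; returns the states and the final covariance."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    s_d = 2.4**2 / d                       # scaling from Gelman et al. (1996)
    chain = np.empty((n_steps + 1, d))
    chain[0] = x
    logp = log_target(x)
    mean = x.copy()                        # running mean of X_0, ..., X_{t-1}
    cov = np.zeros((d, d))                 # running covariance of X_0, ..., X_{t-1}
    prop_cov = np.asarray(c0, dtype=float)
    for t in range(1, n_steps + 1):
        if t >= t0:                        # expression (7), after the initial period
            prop_cov = s_d * cov + s_d * eps * np.eye(d)
        y = rng.multivariate_normal(x, prop_cov)
        logp_y = log_target(y)
        if np.log(rng.uniform()) < logp_y - logp:   # Metropolis acceptance
            x, logp = y, logp_y
        chain[t] = x
        # Recursive update: no need to store or revisit earlier states.
        n = t + 1                          # number of states sampled so far
        delta = x - mean
        mean = mean + delta / n
        cov = cov * (n - 2) / (n - 1) + np.outer(delta, delta) / n
    return chain, cov


# Hypothetical two-dimensional standard normal target.
chain, cov = adaptive_metropolis(lambda x: -0.5 * np.sum(x**2),
                                 np.zeros(2), 5000, 0.1 * np.eye(2))
```

Because only the running mean and covariance are kept, each of the $M$ parallel chains needs a fixed amount of memory regardless of the chain length, which is what allows the spectral inversion samples to be discarded as soon as the linear step has used them.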

Fig. 4. Inverted gas profiles (altitude (km) against gas density (1/cm^3) and relative error (%)), showing the true profile, the mean value of the parallel chain and the mean value of the SCAM algorithm chain (the discretization is fixed as in the operational algorithm; no regularization is applied; the 95% probability regions were taken from the parallel MCMC chain; practically the same results were obtained with the SCAM chain): (a), (d) air; (b), (e) ozone; (c), (f) aerosol; (d), (e), (f) relative errors of the parallel MCMC method when compared with the true profiles

Linear or non-linear prior information can be used in the MCMC algorithm in step (a) without trouble. In step (b), linear priors can be used in the procedure without changing the linear nature of the problem. Since in step (b) we are doing plain Monte Carlo simulation of the posterior distribution of the gas densities, without any Markov chain algorithm, inequality constraints can also be used by simply dropping those simulation results where the constraints are not met.
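Step (b) with a positivity constraint can be illustrated in a few lines. The sketch below is an assumption-laden example rather than the operational code: the name `propagate_line_density_samples`, the 2 x 2 matrix and the two samples are invented so that the arithmetic is easy to follow.

```python
import numpy as np


def propagate_line_density_samples(A, line_density_samples):
    """Map line density samples (K x M) to density profiles via equation (8).

    Profiles with any negative density are dropped, which imposes the
    positivity constraint by plain rejection.
    """
    # Solve A rho = N for all K samples at once (samples become columns).
    profiles = np.linalg.solve(A, line_density_samples.T).T
    keep = np.all(profiles >= 0.0, axis=1)
    return profiles[keep], keep


# Made-up example: two layers, two samples of the line densities.
A = np.array([[2.0, 1.0],
              [0.0, 1.0]])
samples = np.array([[3.0, 1.0],    # gives rho = (1, 1): kept
                    [1.0, 2.0]])   # gives rho = (-0.5, 2): rejected
profiles, keep = propagate_line_density_samples(A, samples)
```

Rejection of this kind is only efficient when most of the samples satisfy the constraints; the fraction kept, `keep.mean()`, gives a quick check.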

This procedure effectively makes the high dimensional problem computationally easy. In the non-linear part we use low dimensional MCMC sampling and need to evaluate only the direct model formula, without inversion. With a typical present personal computer the AM run for the spectral inversion for one line of sight, with chain length 10000, takes a few seconds and, if needed, all runs for one occultation can easily be distributed to separate processors. The high dimensional vertical part is a fast linear operation with negligible CPU demand. Since the GOMOS instrument produces a new occultation data set roughly every 3 min, we see that the parallel MCMC method for the inversion easily meets the CPU time limits that are required for on-line operational use. We shall discuss issues about regularization and prior information below, but we note here that they do not essentially affect the CPU time demands.

5. One-step retrieval

The GOMOS retrieval problem can also be solved directly in one step, without dividing the problem into two steps as is done in the operational data processing and in the parallel MCMC approach discussed above. The one-step inversion of the GOMOS data involves solving a high dimensional inversion problem with expression (2) as the likelihood function (we assume here again that the transmission due to refraction, $T^{\mathrm{ref}}$, can be estimated as is done in the operational algorithm). The advantage of this approach is that we do not have to assume that the cross-sections for different gases, $\alpha^{\mathrm{gas}}$, which actually depend on atmospheric temperature and therefore also on altitude $z$, are constant along each ray path. This approximation is an additional source of error in the inversion. If the problem is solved in the original form (1), such approximations need not be made.
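A discretized form of equation (1) with layer-dependent cross-sections can be written compactly. The following is a sketch under assumptions, not the authors' forward model: densities are taken to be constant inside layers, `A` holds the path lengths of each ray in each layer, and the name `one_step_transmission`, the array layout and the random test values are hypothetical.

```python
import numpy as np


def one_step_transmission(A, cross_sections, densities):
    """Discretized equation (1).

    A              : (M, J) path length of ray l in layer j
    cross_sections : (G, W, J) alpha for gas g, wavelength w, layer j
    densities      : (G, J) density of gas g in layer j
    returns        : (M, W) transmissions T[l, w]
    """
    # Optical depth tau[l, w] = sum over gases g and layers j of
    # A[l, j] * alpha[g, w, j] * rho[g, j].
    tau = np.einsum("lj,gwj,gj->lw", A, cross_sections, densities)
    return np.exp(-tau)


# Made-up sizes: 4 rays, 4 layers, 2 gases, 5 wavelengths.
rng = np.random.default_rng(1)
A = np.triu(rng.uniform(0.5, 1.5, size=(4, 4)))
alpha = rng.uniform(0.0, 0.1, size=(2, 5, 4))
rho = rng.uniform(0.5, 2.0, size=(2, 4))
T = one_step_transmission(A, alpha, rho)
```

When the cross-sections do not vary with the layer index, this expression reduces to the two-step form: the line densities are `rho @ A.T` and equation (3) is applied to them.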
The size of the one-step problem, typically around 100000 measurements and 3 unknowns, is problematic for MCMC algorithms, and the need for an automatic algorithm creates extra requirements. The tuning of the proposal distribution is essential in obtaining efficient sampling, especially in high dimensions. We solved this problem by applying the adaptive MCMC algorithm SCAM, which was described in Section 3.

The SCAM algorithm is easily applied to the GOMOS one-step problem. No tuning of the sampling algorithm parameters was required. In the GOMOS problem, rather strong correlations between gases at certain altitudes are known to exist, and we chose to run the SCAM algorithm in two stages. In the first stage, the correlations between parameters were roughly estimated. In the next stage, the sampling was continued in the new co-ordinates given by the correlated directions. The SCAM algorithm was run for chain lengths of 00 and 10000 samples, with very similar results.

As an example, Fig. 4 presents a synthetic case including transmission data (1417 wavelengths) from 30 altitudes, altogether 42510 data points. The number of gases included is 3. By discretizing the atmosphere into as many layers as there are transmission spectra, we end up with a problem with 90 unknown parameters. Gas profiles are calculated with both the parallel MCMC method and the one-step SCAM method. The error bounds in Fig. 4 are obtained with the parallel MCMC chains.

For an illustration of the covariance structure of the estimated profiles, Fig. 5 exhibits the correlation coefficients of all the gas profiles at all heights, as computed from the MCMC chain of the one-step approach. The results are from the same synthetic example as that in Fig. 4. For example, the bottom left-hand corner of Fig. 5(a) shows correlations in air density at different altitudes, and the top left-hand corner shows correlations between air (horizontal) and aerosols (vertical) at different altitudes.
The main conclusion is that certain gases correlate with each other at the same heights, whereas the concentrations of each gas correlate (negatively) at neighbouring heights. Figs 5(b) and 5(c) exhibit examples of the situation in detail, as two-dimensional marginal posteriors. Recall that no a priori smoothness was assumed here; regularization would make the off-diagonal bandwidths larger.

Fig. 5. (a) Correlation coefficients (one-step method computed by the SCAM algorithm; parameter indices 1–30 refer to air density from low altitudes to high altitudes, 31–60 to ozone and 61–90 to aerosols), (b) scaled marginal posterior distributions of ozone at two successive layers at 24 and 26 km and (c) scaled marginal posterior distributions of aerosols at 26 km. Axes: (b) ozone at 26 km against ozone at 24 km; (c) air at 26 km against aerosols at 26 km.

The example was constructed under the assumptions of the operational algorithm, so the parallel and one-step methods should give essentially the same results. This can indeed be seen in Fig. 4. The small differences in the estimates might be attributed to the relatively short chain length that was used in the high dimensional SCAM chain. The correlation plot computed by using the parallel MCMC approach was also practically identical to that of the one-step SCAM chain in Fig. 5.

This example shows that computing the posterior distributions of the unknown parameters of the original GOMOS one-step problem is also possible in practice. The large dimensionality of the problem does not prevent the full posterior distributions from being solved. With the SCAM algorithm no hand tuning is needed (e.g. ad hoc tuning of the proposal distributions to obtain desired acceptance ratios). The SCAM algorithm can be run in a fully automatic way in the GOMOS one-step inversion, also for different stars with varying noise levels.

Different methods of regularization can also be implemented easily in this approach; however, if the dimension of the problem rises (e.g. if the discretization grid is denser), the computing time naturally also grows. A 90-dimensional run with chain length 00 took roughly 90 min on a standard personal computer, so the one-step method cannot be applied to on-line operational use unless considerably faster computers are available. However, the present two-step operational algorithm, as well as the parallel MCMC extension of it, requires that the error structure of the measurements between different altitudes can be assumed independent.
Moreover, the cross-section terms in the model were assumed constant (temperature independent) at each altitude. The one-step inversion is free of these limitations, so it can be used as an off-line validation method for the operational retrieval methods. This is important at the moment, when the real data from the Envisat satellite, with possible unexpected non-idealities in the error structure, are becoming available. It is also quite possible that improvements in the scintillation correction will lead to a time-dependent noise structure, in which case the one-step solution becomes crucial.

6. Discretization and regularization

Inverse problems are often unstable, whence they are referred to as ill-posed problems. In such a situation stable solutions can be obtained by feeding in additional a priori information. Because of the structure of the present problem (a stable coefficient matrix and highly peaked coefficients on the diagonal), the vertical inversion is actually rather well posed in the operational formulation. However, it becomes unstable as soon as the number of unknowns in the discretization becomes larger than the number of measurements that are available.

It is common in inversion algorithms to fix the discretization on the basis of the measurement grid. The choice of the grid then implies a certain degree of regularization for the problem. This is unsatisfactory, as the regularization should rather be based on available physical knowledge about the unknowns. Moreover, a unique feature of GOMOS is that the sampling resolution varies depending on the direction in which the star is setting. Stars which are setting far from the orbital plane descend slowly and the sampling resolution becomes very dense, only a few hundred metres.
A standard way of applying regularization methods, tuning the regularization parameters to achieve satisfactory results, would again lead to excessive work, with the smoothing parameters selected separately for each measurement grid and discretization grid. To avoid these difficulties we shall define the regularization in a grid-independent way.

Let us first fix some notation. The direct problem is discretized by a linear interpolation in a grid of height step $\Delta z$. The unknown gas density values at these grid points are denoted by $x_i$. A natural way to regularize function-valued unknowns is to suppose that the neighbouring discretized values cannot be too different. A way of stating this is to write

$$x_{i+1} - x_i = 0 \pm \varepsilon_i^{\mathrm{reg}}(\Delta z), \qquad (9)$$

where the random variables $\varepsilon_i^{\mathrm{reg}}$ are mutually independent, zero mean and Gaussian. To mention another alternative, we may assume that second (or higher) differences of the discretized values are small:

$$2x_i - x_{i-1} - x_{i+1} = 0 \pm \varepsilon_i^{\mathrm{reg}}(\Delta z^3), \qquad (10)$$

with again a Gaussian error structure. Or we might have a priori information about the actual values of $x_i$ and we could limit the deviation around the prior values $\hat{x}_i$:

$$x_i = \hat{x}_i \pm \varepsilon_i^{\mathrm{reg}}(\Delta z^{-1}). \qquad (11)$$

The fact that we have included the discretization step $\Delta z = z_{i+1} - z_i$ in our formulae is interesting. Usually this is not done, and it is fairly generally taken as a heuristic rule that the inversion problem becomes ever more unstable as we make the discretization step smaller. If we define the a priori information as above, this is no longer true, and the inversion problem stays essentially uniformly stable independent of the step size $\Delta z$. Intuitively, the explanation is that we are, in a sense, defining the prior for an underlying continuous profile.

To be more specific, recall that the Brownian motion process is characterized by continuity and the condition that its increments are centred Gaussian variables that satisfy $\mathrm{var}\{x(t) - x(t')\} = \mathrm{constant} \times |t - t'|$. Moreover, non-overlapping increments must be independent, which means that the two variables $x(t) - x(t')$ and $x(t') - x(t'')$ are independent if $t > t' > t''$.
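The Brownian motion property just described can be checked numerically. The sketch below is an illustration only, under stated assumptions: the function name, the total height, the value of the regularization error bar and the choice of fixing the lowest grid value at zero are all hypothetical and not taken from the paper. It draws profiles whose increments have variance $\Delta z\,\sigma_{\mathrm{reg}}^2$ on a coarse and on a fine grid and compares the variance of the difference between the top and the bottom of the profile.

```python
import numpy as np

def sample_prior_profiles(n_steps, height, sigma_reg, n_samples, rng):
    """Draw profiles with x_0 = 0 and x_{i+1} - x_i ~ N(0, dz * sigma_reg**2)."""
    dz = height / n_steps
    increments = rng.normal(0.0, sigma_reg * np.sqrt(dz),
                            size=(n_samples, n_steps))
    profiles = np.cumsum(increments, axis=1)
    # Prepend the fixed starting value x_0 = 0 (an assumption of this sketch).
    return np.hstack([np.zeros((n_samples, 1)), profiles])

rng = np.random.default_rng(1)
height, sigma_reg = 10.0, 0.5  # illustrative values, not from the paper

coarse = sample_prior_profiles(10, height, sigma_reg, 20000, rng)
fine = sample_prior_profiles(100, height, sigma_reg, 20000, rng)

# Sample variance of x(height) - x(0) on each grid. Both estimates should be
# close to sigma_reg**2 * height, whatever the step size.
var_coarse = coarse[:, -1].var()
var_fine = fine[:, -1].var()
```

Because the increment variance is proportional to the step, refining the grid leaves the spread of the profile over a fixed height range unchanged, which is the sense in which the prior is defined for the underlying continuous profile.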
The Brownian motion conditions above clearly correspond to condition (9), and one may check that generating random samples with only the prior knowledge (9) does produce samples of discretized Brownian motion. Similarly, if we give a suitable (almost diagonal) covariance structure to the $\varepsilon_i^{\mathrm{reg}}$, model (10) corresponds to samples from integrals of Brownian motion, and thus from a smoother family of curves. Finally, in equations (11) we have a white noise error structure.

Because formulae (9)–(11) represent consistent ways to model function-valued random variables such as white noise, Brownian motion and its integral discretely, it is easy to believe that the a posteriori distributions should show some organized limit behaviour as $\Delta z \to 0$. Related work on the convergence of the a posteriori distributions in a mathematically precise sense can be found in Lasanen (2002). Here, our purpose is to demonstrate the methods with numerical experiments with the gas profiles. An earlier practical application was published in D'Ambrogi et al. (1999).

As an example, suppose that the variances for the mutually independent regularization variables $\varepsilon_i^{\mathrm{reg}}$ in equation (9) are of the form

$$\mathrm{var}(\varepsilon_i^{\mathrm{reg}}) = \sigma_{\mathrm{reg}}^2\, C(z_i)^2, \qquad (12)$$

where $C(z)$ is a height-dependent function making the strength of the regularization dependent on height and $\sigma_{\mathrm{reg}}$ is the discretization-independent regularization error bar. Then the regularization equations can be expressed in matrix form as

$$\sum_j A^{\mathrm{reg}}_{ij} x_j = 0 \pm \varepsilon_i^{\mathrm{reg}}, \qquad (13)$$

where $A^{\mathrm{reg}}(\Delta z)$ is the two-diagonal matrix producing the first difference. Now the regularization covariance matrix is

$$C^{\mathrm{reg}} = \Delta z\, \sigma_{\mathrm{reg}}^2\, \mathrm{diag}\{C(z)^2\}, \qquad (14)$$

and by well-known inversion formulae the statistical inversion solution for independent and identically distributed measurements $m$ with error variances $\sigma_m^2$ is given by

$$\hat{x} = \left( A^T \sigma_m^{-2} A + A^{\mathrm{reg}\,T} (C^{\mathrm{reg}})^{-1} A^{\mathrm{reg}} \right)^{-1} A^T \sigma_m^{-2} m, \qquad (15)$$

where the second term in the sum inside the inverse represents the regularization. Observe that the same formulae hold in all cases (9)–(11); just the matrices $A^{\mathrm{reg}}$ and $C^{\mathrm{reg}}$ change in an obvious manner.

In the numerical solutions we vary both the resolution (discretization step size) $\Delta z$ and the regularization error bar $\sigma_{\mathrm{reg}}$. We also let the regularization strength vary as a function of height according to some function $C(z)$, as we know in advance, for example, that the ozone concentration will vary more at certain heights than at others. The final selection of $C(z)$ would require a more extensive geophysical discussion. Here we only show how the grid independence works with a fixed choice for $C(z)$ resembling the actual expected ozone profile.

Let us apply the above approach to the linear part of the parallel inversion, using the same synthetic example as before. Fig. 6 shows the true profiles that were used in the simulation together with the example of regularized inversion. The grey area shows the 95% posterior limits of the estimated profiles.

Fig. 6. Regularized inverted gas profiles (the measurement grid is as in Fig. 4; uniform retrieval grid with discretization points): (a), (b), (c) true profiles and the mean values of the parallel MCMC chain (dotted line, although not visually different from the true profiles), together with 95% probability regions; (d), (e), (f) relative errors of the parallel chain estimates. Axes: altitude (km) against gas density (1/cm^3) in (a)–(c) and against relative error (%) in (d)–(f).

The choice of the regularization in a grid-independent way does not introduce any numerical difficulties into the problem. The final formulae for the solution are the same as would appear using standard first-difference Tikhonov regularization, for example. The new issue discussed here is a proper definition of the regularization parameter, one that in the discretization takes into account the chosen underlying continuous parameter model. Fig. 7 shows two examples: a part of the aerosol density profile from Fig. 6 with two different grids, but fixed (different on the two plots) regularization levels.

Fig. 7. Discretization independence: a part of the aerosol density profile, with increasing number of discretization points. Shown are the inverted gas profiles calculated without any extra smoothness priors, with their 95% posterior limits, and the gas profiles with different grids used in the vertical inversion, with their 95% posterior limits (dotted), which also coincide for both grids; a prior for both the first and the second differences is used. Panel (a) has larger regularization error bars than panel (b). Axes: altitude (km) against aerosol density (1/cm^3).
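Formula (15), with the first-difference prior (9) and the covariance (14), can be sketched in a few lines. The example is a toy under stated assumptions, not the GOMOS vertical inversion: the forward model simply observes the profile at the measurement heights by linear interpolation, $C(z)$ is taken to be 1, and the function names, the sine-shaped test profile and the values of $\sigma_m$ and $\sigma_{\mathrm{reg}}$ are hypothetical.

```python
import numpy as np

def interpolation_matrix(z_meas, z_grid):
    """Rows linearly interpolate the grid values x to the measurement heights."""
    A = np.zeros((len(z_meas), len(z_grid)))
    for k, z in enumerate(z_meas):
        j = min(np.searchsorted(z_grid, z, side="right") - 1, len(z_grid) - 2)
        w = (z - z_grid[j]) / (z_grid[j + 1] - z_grid[j])
        A[k, j], A[k, j + 1] = 1.0 - w, w
    return A

def regularized_inverse(A, m, sigma_m, dz, sigma_reg, C=None):
    """Formula (15) with C_reg = dz * sigma_reg**2 * diag(C**2), as in (14)."""
    n = A.shape[1]
    A_reg = np.diff(np.eye(n), axis=0)  # (n-1) x n first-difference matrix
    c = np.ones(n - 1) if C is None else np.asarray(C, dtype=float)
    C_reg_inv = np.diag(1.0 / (dz * sigma_reg**2 * c**2))
    lhs = A.T @ A / sigma_m**2 + A_reg.T @ C_reg_inv @ A_reg
    rhs = A.T @ m / sigma_m**2
    return np.linalg.solve(lhs, rhs)

# Synthetic measurements at 15 heights (illustrative values only).
rng = np.random.default_rng(2)
z_meas = np.linspace(0.0, 1.0, 15)
sigma_m, sigma_reg = 0.05, 2.0
m = np.sin(2.0 * np.pi * z_meas) + sigma_m * rng.normal(size=z_meas.size)

# Solve on three retrieval grids; the two finer ones have more unknowns
# than measurements, and the measurement heights are nodes of every grid.
solutions = {}
for n in (15, 29, 57):
    z_grid = np.linspace(0.0, 1.0, n)
    A = interpolation_matrix(z_meas, z_grid)
    dz = z_grid[1] - z_grid[0]
    solutions[n] = regularized_inverse(A, m, sigma_m, dz, sigma_reg)
```

In this toy setting the estimates at the shared heights, `solutions[15]`, `solutions[29][::2]` and `solutions[57][::4]`, agree up to rounding even though the finer grids are underdetermined without the prior. With a height-dependent $C(z)$ the agreement would be expected to hold only approximately, improving as the grids are refined.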

7. Conclusions

The data from the GOMOS instrument on board the recently launched Envisat satellite were due to be released for scientific use during 2004. The measurements will be used both for local gas profile inversions and for global assimilation purposes. We have presented the principles of the present operational local retrieval algorithm, together with several new, more extensive ways of solving the GOMOS inversion problem.

The parallel and the one-step MCMC approaches can both fully solve the inversion problem, with different conditions on the error structure and modelling assumptions. The parallel MCMC approach yields results essentially as efficiently as can be calculated with operational Levenberg–Marquardt methods, with the same underlying assumptions. The SCAM algorithm can solve the problem under more general modelling assumptions. Both approaches employ new adaptive MCMC methods, which are necessary for such a complex measurement system with huge masses of data.

We also demonstrate how to define the regularization in a way that yields results that are essentially independent of the spacing of the model grid. This helps us to understand the sources of instability in inverse problems and to isolate the choice of calculation grid from the question of how strong the regularization should be.

Acknowledgements

This work has been supported by the Finnish Academy 'Mathematical methods and modelling in the sciences' project and by the National Technology Agency Tekes within the GOMOS project. We thank the GOMOS expert support laboratory teams and especially Erkki Kyrölä, Jean-Loup Bertaux, Alain Hauchecorne, Didier Fussen and Odile Fanton d'Andon. Finally, we are grateful for the instructive comments of the referees.

References

Andrieu, C. and Robert, C. P. (2001) Controlled MCMC for optimal sampling.
Bertaux, J. L. (1999) GOMOS ATBD level 1b. In Proc. Eur. Symp. Atmospheric Measurements from Space, vol. WPP-161, pp. 115–123. Noordwijk: European Space Agency.
Bertaux, J. L., Megie, G., Widemann, T., Chassefiere, E., Pellinen, R., Kyrölä, E., Korpela, S. and Simon, P. (1991) Monitoring of ozone trend by stellar occultations: the GOMOS instrument. Adv. Space Res., 11, 237–242.
D'Ambrogi, B., Mäenpää, S. and Markkanen, M. (1999) Discretization independent retrieval of atmospheric ozone profile. Geophysica, 35, 87–99.
European Space Agency (2002) GOMOS Product Handbook. European Space Agency. (Available from http://envisat.esa.int/dataproducts/gomos/cntr.htm.)
Gelman, A. G., Roberts, G. O. and Gilks, W. R. (1996) Efficient Metropolis jumping rules. In Bayesian Statistics 5 (eds J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith), pp. 599 8. New York: Oxford University Press.
Haario, H., Saksman, E. and Tamminen, J. (1999) Adaptive proposal distribution for random walk Metropolis algorithm. Comput. Statist., 14, 375–395.
Haario, H., Saksman, E. and Tamminen, J. (2001) An adaptive Metropolis algorithm. Bernoulli, 7, 223–242.
Haario, H., Saksman, E. and Tamminen, J. (2003) Componentwise adaptation for high dimensional MCMC. Comput. Statist., to be published.
Harris, R. A. (ed.) (2001) Envisat GOMOS: an instrument for global atmospheric ozone monitoring. Technical Report SP-1244. European Space Agency, Noordwijk.
Kyrölä, E. (1999) GOMOS ATBD level 2. In Proc. Eur. Symp. Atmospheric Measurements from Space, vol. WPP-161, pp. 125–137. Noordwijk: European Space Agency.
Kyrölä, E., Sihvola, E., Kotivuori, Y., Tikka, M., Tuomi, T. and Haario, H. (1993) Inverse theory for occultation measurements, 1, spectral inversion. J. Geophys. Res., 98, 7367–7381.
Lasanen, S. (2002) Discretizations of generalized random variables with applications to inverse problems. PhD Thesis. University of Oulu, Sodankylä.
Tamminen, J. and Kyrölä, E. (2001) Bayesian solution for nonlinear and non-gaussian inverse problems by Markov chain Monte Carlo method. J. Geophys. Res., 106, 14377–14390.