Feb 21 and 25: Local weighted least squares: Quadratic loess smoother

An example of weighted least squares fitting of data to a simple model, for the purposes of simultaneous smoothing and interpolation, is the quadratic loess smoother. In one dimension, this can be used to smooth, filter or interpolate a time series of values that may or may not be at regular sampling intervals. An application in two or more dimensions could be to produce a gridded analysis or climatology from a set of data observed at irregularly spaced locations and times, such as a set of shipboard hydrographic observations of temperature, salinity, or other ocean properties.

The model is a local quadratic function, which can be written in terms of coordinates centered at the estimation point, x_o, y_o:

y = a_1 + a_2 (x - x_o) + a_3 (y - y_o) + a_4 (x - x_o)(y - y_o) + a_5 (x - x_o)^2 + a_6 (y - y_o)^2

which has a design matrix with rows

E_i = [ 1   (x_i - x_o)   (y_i - y_o)   (x_i - x_o)(y_i - y_o)   (x_i - x_o)^2   (y_i - y_o)^2 ]

data vector d = [d_1, ..., d_N]^T, and coefficient vector a = [a_1, ..., a_6]^T.

The flexibility to arbitrarily choose a linear model and a set of weights means the least squares fitting approach can be tailored to meet a user's notion of what constitutes a rational model, based on some a priori knowledge of the processes being observed and modeled. An example of this is the Climatology of the Australian Regional Seas (CARS):

Ridgway, K., J. R. Dunn and J. L. Wilkin (2002), Ocean interpolation by 4-dimensional weighted least squares: Application to the waters around Australasia, Journal of Atmospheric and Oceanic Technology, 19, 1357-1375.
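Before moving on, here is a minimal Matlab sketch of assembling the design matrix above for one estimation point and solving the (for now unweighted) least squares problem. The data are synthetic and the variable names are illustrative, not from the course scripts; the loess weights are introduced next.

% Synthetic scattered observations d at locations (xi, yi)
xi = rand(50,1); yi = rand(50,1);
d  = sin(2*pi*xi).*cos(2*pi*yi) + 0.1*randn(50,1);
xo = 0.5; yo = 0.5;                           % estimation point
dx = xi - xo; dy = yi - yo;                   % coordinates centered on (xo, yo)
E  = [ones(50,1) dx dy dx.*dy dx.^2 dy.^2];   % N-by-6 design matrix
a  = E\d;                                     % least squares solution for a_1 ... a_6
Dhat = a(1);                                  % fitted value at (xo, yo) is simply a_1

Because the coordinates are centered on the estimation point, the fitted value at (x_o, y_o) is just the constant term a_1.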

Additional terms in the model can be included provided they have a form that can be expressed with the design matrix, coefficients a_k, and data coordinates (including, e.g., observation times t_i).

Length and time scales of variability

The quadratic loess smoother requires an a priori choice be made for the scales (usually length or time) to apply in the selection of the smoothing weights. The smoother can be interpreted as a filter, since the linear weighting procedure is effectively implemented as a convolution of the weights with the data. It is more general in the sense that the data do not have to be at regular intervals, because the weights are computed simply as a function of the normalized distance r. It has been shown empirically that the effective cutoff frequency f_c of the quadratic loess smoother, when it is interpreted as a filter, is f_c ~ L^-1, where L is the half width (i.e. the normalization scale) used in the loess smoother.

If the loess smoother is to be used to deliberately remove certain scales of variability (i.e. as a filter), then selection of L is straightforward. However, if the objective is to use the smoother to do the best possible job of interpolating gaps in the data, then the smoothing scale should be adapted to the natural length or time scales of variability in the underlying physical process being observed.

The weighting can be viewed in terms of the model equations to which we seek the least squares best solution for the parameters a. We weight the rows of the matrix equation Ea = d by weights w_i, which can be summarized as a weighting matrix W = diag(w), and solve

W E a = W d,   i.e.   Ehat a = W d

>> a = Ehat\(W*d);

The classic quadratic loess smoother uses the tricube weighting function:

w_i = (1 - r_i^3)^3   for r_i < 1,   w_i = 0 otherwise

where r might be normalized in one of two ways:

(1) r is a normalized Cartesian distance with prescribed smoothing scale L:

r = [ ((x - x_o)/L)^2 + ((y - y_o)/L)^2 ]^(1/2)

(2) r is normalized differently for each estimation point x_o, y_o, after finding the distance r*_max that encloses the nearest N data points:

r* = [ (x - x_o)^2 + (y - y_o)^2 ]^(1/2)
r*_sorted = sort(r*)
r*_max = r*_sorted(N)
r = r* / r*_max

Since the data coordinates are transformed to be with respect to the estimation location, x_o, y_o, the final loess estimate is simply a_1. Two-dimensional spatial mapping using a loess filter is demonstrated in the Matlab script jw_lecture_loess2d.m.
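The pieces above combine into a complete local estimate. A minimal sketch of the weighted fit at a single point, using the tricube weights and the Cartesian normalization of option (1); the synthetic data and variable names are illustrative, not those of the course script:

% Quadratic loess estimate at one point (xo, yo)
xi = rand(200,1); yi = rand(200,1);               % data locations
d  = cos(2*pi*xi).*sin(2*pi*yi) + 0.1*randn(200,1);
L  = 0.3;                                         % smoothing half-width
xo = 0.5; yo = 0.5;                               % estimation point
dx = xi - xo; dy = yi - yo;
r  = sqrt((dx/L).^2 + (dy/L).^2);                 % normalized distance, option (1)
w  = (1 - r.^3).^3 .* (r < 1);                    % tricube weights; zero for r >= 1
E  = [ones(size(dx)) dx dy dx.*dy dx.^2 dy.^2];   % local quadratic design matrix
W  = diag(w);
a  = (W*E)\(W*d);                                 % weighted least squares, Ehat\(W*d)
Dhat = a(1);                                      % loess estimate at (xo, yo)

Looping this calculation over a grid of estimation points produces the smoothed, gridded map that the two-dimensional demonstration script illustrates.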

Optimal Interpolation / Objective Analysis / Gauss-Markov smoothing

John's old scanned notes on OI

This brings us to the method of optimal interpolation (OI), also known as objective mapping or Gauss-Markov smoothing. [See Emery and Thomson, section 4.2.] Optimal interpolation estimates the field being observed at an arbitrary location and time through a linear combination of the available data. The weights used are chosen so that the expected error of the estimate is a minimum in the least squares sense, and the estimate itself is unbiased (i.e. has the same mean as the true field). OI is therefore sometimes referred to as the Best Linear Unbiased Estimator (BLUE) of a field. The underlying covariance length and time scales of the data and true field enter into the computation of the linear weights.

Important concepts in optimal interpolation:

- OI produces the best linear unbiased estimate of a field from a set of arbitrarily distributed observations.
- Central to the estimation procedure is knowledge of the underlying covariance between the data and the process being observed (the model-data covariance), and of the data being used with themselves (the data-data covariance).
- The data-data covariance includes an a priori estimate of the uncertainty (error) in the observations. The model-data and data-data covariance patterns should be similar. If the data errors are independent, the error variance simply adds to the diagonal of the data-data covariance matrix. If the data errors are correlated, off-diagonal elements of the data-data covariance matrix would differ from the model-data covariance, but in practice this is seldom if ever considered.
- Frequently, the covariance is assumed to be homogeneous and isotropic, in which case it becomes simply a function of the distance separating the locations of the data and model (grid) points. If valid, assumptions of homogeneity and isotropy facilitate the estimation of the shape of the covariance function by taking an ensemble of data covariances binned according to spatial and/or temporal lags.
- The OI method produces an objective estimate of the expected error in the result.
- The OI technique can be formulated to simultaneously interpolate different but related data types (e.g. winds and geopotential heights) provided there is a linear relationship between the model and data (e.g. geostrophic winds, or ocean currents, computed from geopotential, or sea surface, heights). In the case of geostrophic turbulence, the assumption of isotropy dictates a fixed relationship between the covariance of the individual velocity components and streamfunction. Simultaneously estimating multiple variables has the advantage that known physical constraints (e.g. continuity, geostrophy) can be incorporated into the mapping procedure, thereby producing results that are balanced kinematically and/or dynamically.

Example: Combining altimeter sea surface height observations and velocity observations from sequential satellite imagery: Wilkin, J. L., M. M. Bowen and W. J. Emery (2002), Mapping mesoscale currents by optimal interpolation of satellite radiometer and altimeter data, Ocean Dynamics, 52, 95-103.

John's old scanned notes

Optimal interpolation exploits knowledge of the autocorrelation of a process to determine the relative weight to be given to a set of data (in a weighted sum) to estimate the true field at a certain location (in space and time). The autocorrelation essentially indicates which data are near and which are far from the estimation point.

The problem: Estimate some variable, D, at location(s) x_a, t on the basis of a set of neighboring observations (the data) d at locations x_b, t. The data are assumed to be observations of the true field with some observational error:

d(x,t) = D(x,t) + n(x,t)

The measurement errors n are assumed unbiased, <n> = 0, and uncorrelated with the field being observed, D.

[In practice it is desirable to first remove any well resolved (long space/time scale) deterministic signals from the data, so that the interpolation is applied to a data set with reduced variance. For example, a seasonal cycle or spatial variability of very long wavelength.]

We denote by Dhat(x) the estimate of the true value at location x (and time t), and will compute this as a linear weighted sum of the data:

Dhat(x) = Dbar + (d - dbar)^T w(x) = Dbar + w^T(x) (d - dbar)

where the weights w(x) are not specified (yet), and the dependence on x emphasizes that the weights will be different for every estimation location. The assumption that we have unbiased data implies that the mean of the data, dbar, will be a valid estimate of the mean of the field, Dbar.

The weights w are selected so as to minimize the expected mean square error between the linear weighted estimate, Dhat(x), and the true value of the variable being observed, D(x). (Of course, we don't actually know what this true value is; if we did we probably wouldn't be bothering with all this.) Therefore, we minimize

n^2 = <(D(x) - Dhat(x))^2>    (true minus estimate)

n^2 = <(D - Dbar - (d - dbar)^T w)^2>
    = <(D - Dbar)^2> - w^T <(d - dbar)(D - Dbar)> - <(D - Dbar)(d - dbar)^T> w + w^T <(d - dbar)(d - dbar)^T> w

Here, <(d - dbar)(d - dbar)^T> is the data-data covariance matrix, which we denote as C.

The 2nd through 4th terms are of the form

-A^T B - B^T A + A^T C A,   with A = w and B = <(d - dbar)(D - Dbar)>

and we denote <(D - Dbar)(d - dbar)^T> as the model-data covariance matrix, C_md; this is the covariance of the true field at the estimation location, D(x), with all the data, d (hence it is a row vector the same length as the data), and B = C_md^T.

The identity of completing the square for a simple quadratic algebraic equation finds the constants k_1, k_2 that rearrange

a x^2 + b x + c = a(...)^2 + constant = a(x + k_1)^2 + k_2

When completing the square for the matrix expression above, it can be shown that it rearranges to

A^T C A - B^T A - A^T B = (A - C^-1 B)^T C (A - C^-1 B) - B^T C^-1 B

[You can verify this by expanding the line above and simplifying, noting that C is symmetric, C^T = C, and C C^-1 = I.]

Since C^T = C (because it is a covariance matrix), it follows that

n^2 = <(D - Dbar)^2> + (w - C^-1 C_md^T)^T C (w - C^-1 C_md^T) - C_md C^-1 C_md^T

The second term is quadratic (non-negative), and the expected value of n^2 is minimized by making this term zero. This gives us the optimal weights w:

w - C^-1 C_md^T = 0,   or   w = C^-1 C_md^T

The Best Linear Unbiased Estimate (BLUE) is then

Dhat = Dbar + w^T (d - dbar) = Dbar + C_md C^-1 (d - dbar)
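A toy example helps fix the recipe. The following Matlab sketch (synthetic 1-D data; the Markov covariance, scales and variances are illustrative assumptions, not the course example) computes the optimal weights and the BLUE on a regular grid:

% Toy 1-D optimal interpolation
a  = 0.2; s2 = 1.0; e2 = 0.05;              % covariance scale, signal and error variance
xd = rand(30,1);                            % data locations
d  = sin(2*pi*xd) + sqrt(e2)*randn(30,1);   % noisy observations of a "true" field
xm = linspace(0,1,101)';                    % estimation (model) grid
Cfun = @(r) s2*(1 + r/a).*exp(-r/a);        % Markov covariance function
Cdd  = Cfun(abs(xd - xd')) + e2*eye(30);    % data-data covariance, error on the diagonal
Cmd  = Cfun(abs(xm - xd'));                 % model-data covariance (101-by-30)
w    = Cdd\Cmd';                            % optimal weights w = C^-1 * Cmd', per grid point
Dhat = mean(d) + w'*(d - mean(d));          % BLUE at every grid point

Each column of w holds the weights for one estimation location, exactly as w(x) differed from point to point in the derivation above.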

In practice, the data-data covariance matrix can be very large and expensive to invert. Typically, it has a much larger dimension than the model, which would be a grid of coordinates on which we are computing our climatology or analysis. It is more computationally efficient to compute the product of the inverse covariance with the data directly by solving, in a least squares sense, the problem

C w* = (d - dbar)

by a Matlab matrix left divide:

>> ws = Cdd\(d-dbar);

which gives us the product w* = C^-1 (d - dbar), and the estimate is then calculated as

Dhat = Dbar + C_md w* = Dbar + C_md C^-1 (d - dbar)

However, we would still need the data-data covariance inverse to make a formal estimate of the expected error in the analysis.
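As a concrete illustration of this shortcut, continuing the toy 1-D example above (a sketch, with the same illustrative names):

ws   = Cdd\(d - mean(d));          % w* = Cdd^-1 (d - dbar), a single solve
Dhat = mean(d) + Cmd*ws;           % estimate at every grid point, Dbar + Cmd * w*

The single N-vector solve replaces the N-by-M weight matrix of the previous sketch, which is why it is cheaper when the model grid (M points) is large.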

Optimal interpolation example

The Matlab scripts cov_mercator.m and oi_mercator.m demonstrate fitting a covariance function to a set of synthetic data, and using this function to optimally interpolate to a regular grid. The data used in this example is ocean temperature taken from the French Mercator operational ocean forecast system for the North Atlantic.

cov_mercator: The script cov_mercator.m loads the example Mercator snapshot from a mat file, subsamples the data to a small (3%) subset, and adds some normally distributed random noise to emulate instrument error (or unresolved high frequency physical variability, due e.g. to internal waves in the case of in situ ocean temperature observations). The lon/lat coordinates of the sub-sampled data set are converted to simple x,y coordinates w.r.t. the southwest corner of the data range, and the separation distance r between all data pairs is computed so that a binned lagged covariance as a function of r can be estimated from the data themselves, i.e.

C(r) = <d(x) d(x + r)>

Two functional forms (Gaussian and Markov) for C(r) are fitted to the estimated covariance using Matlab's fminsearch function. Note that the covariance at r = 0 is not used in the fit because it includes the effect of the independent observational, or error, variance.

Gaussian:  C(r) = s^2 exp(-r^2/a^2)
Markov:    C(r) = s^2 (1 + r/a) exp(-r/a)

where s^2 is the signal variance, i.e. the variance of the true field at zero lag. The apparent error variance, e^2, is calculated from the difference between the data variance at r = 0 (i.e. var[data]) and the signal variance as r -> 0 indicated by the y-intercept of the functional fit, i.e. C(0):

e^2 = var[data] - s^2

oi_mercator: Using the fitted Markov covariance function, normalized model-data (C_md) and data-data (C_dd) covariance matrices are computed. The data-data covariance is augmented on the diagonal with the ratio of error to signal variance. The optimal interpolation fit to the data is computed by direct inversion of C_dd, and also by the Matlab matrix-left-divide operation to compute the product C_dd^-1 (d - dbar) directly.
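A sketch of the covariance-fitting step in the spirit of cov_mercator.m (entirely synthetic data; the binning and starting guess are illustrative assumptions, not the course script):

% Fit the Markov form to a binned lagged covariance with fminsearch
xd = rand(500,1);
d  = sin(2*pi*xd) + 0.2*randn(500,1);
dd = d - mean(d);
R  = abs(xd - xd');                          % all pairwise separations
P  = dd*dd';                                 % all pairwise covariance products
edges = 0:0.05:0.5; nb = numel(edges)-1;
Cb = zeros(1,nb); rb = zeros(1,nb);
for k = 1:nb                                 % bin-average by lag, excluding r = 0
    in = R > 0 & R >= edges(k) & R < edges(k+1);
    Cb(k) = mean(P(in)); rb(k) = mean(R(in));
end
markov = @(p,r) p(1)*(1 + r/p(2)).*exp(-r/p(2));   % p = [s2, a]
cost = @(p) sum((markov(p,rb) - Cb).^2);           % misfit to the binned covariance
p  = fminsearch(cost, [var(d) 0.2]);               % fitted signal variance s2 and scale a
e2 = var(d) - p(1);                                % apparent error variance, var[data] - s2

The r = 0 pairs are excluded from the binning for the reason given above: the zero-lag data variance includes the observational error variance, which the functional fit should extrapolate across.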

Expected errors of the OI are calculated and qualitatively compared to the actual residuals of the fit to show that, as expected, approximately 65% of the residuals fall within the expected errors.

If the data set (N points) is large, the matrix sizes may become too large for practical handling in Matlab (or any other language), because the OI problem has to solve matrix inversions or simultaneous equations of dimension N x N. This may be handled by dividing the model grid into subdomains, like tiles, with subsets of the data limited to only those points that fall within the model tile plus a halo region around the tile. The halo region should be at least one covariance scale wide to ensure smoothness at the tile boundaries. The data-data matrix C_dd has to be computed anew for each tile, but computing e.g. order(10) OI operations on order(N/10) data elements may be faster than one OI on order(N) data elements, because the computational effort of the matrix operations scales with N^3.

If there are too many data within a few covariance scales to practically invert C_dd, then it is likely that there are more data than necessary to resolve the mapped field. This indicates that a shorter covariance scale can probably be used. Alternatively, it is probably safe to decimate the data (just use less of it, thereby making N smaller) or average the observations in small bins. For independent errors, the binning step will reduce the expected error (the noise variance) of the binned values, and this information can be carried through the analysis.

Expected errors

A posteriori testing of error estimates can be done to see whether the proportion of residuals within the expected errors is statistically consistent. More in-depth tests would compare the results to a set of independent data, such as from another instrument, or data withheld from the OI itself. See Walker and Wilkin (1998) for an example of checking the validity of error estimates through a Chi-squared test. The expected error is computed in the demonstration script oi_mercator.m.

The expected error in the analysis, or estimate, at location x_k is given by

e^2(x_k) = s^2 - c_md C^-1 c_md^T

where s^2 is the variance of the signal (the true solution) and c_md is the covariance of the model (at location x_k) with the data d, i.e. it is the k-th row of C_md. The vector of all estimated error variances is

e^2 = diag( s^2 I - C_md C^-1 C_md^T )

The optimal interpolation analysis of the data therefore states that our best linear unbiased estimate of the true signal is Dhat +/- e.
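Continuing the toy 1-D example above (same Cdd, Cmd and s2; a sketch, not the oi_mercator.m code), the expected error at every grid point follows in one line:

e2map = s2 - diag(Cmd*(Cdd\Cmd'));     % expected error variance at each grid point
e     = sqrt(max(e2map,0));            % expected RMS error, for the estimate Dhat +/- e

Far from any data, Cmd tends to zero and e2map tends to s2, recovering the limiting behavior discussed below.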

If we were able to make some independent analysis of the error in Dhat, such as when we have actually fabricated the data to test the method as in the oi_mercator.m script, then we expect that for about 68% of the estimates the true value D will fall within Dhat +/- e.

From the equation for e^2 we see that the maximum the expected error variance can be is simply the signal variance. This occurs when there are no data within a few covariance scales of the estimation location and c_md is 0. In this case our best estimate is just the background field, and our uncertainty is the full variance of the signal; basically, the OI is unable to inform us. If we have some data close (in terms of covariance scale) to the estimation location, and c_md is greater than zero, then the expected error is less than the signal variance and the OI has helped us.

If the data error variance is small, the diagonal elements of C^-1 are large, and this further decreases the expected error. So having better quality data improves the skill of the estimate.
