Application of Principal Component Analysis to TES data


1 Application of Principal Component Analysis to TES data. Clive D. Rodgers, Clarendon Laboratory, University of Oxford. Madison, Wisconsin, 27th April

2 My take on the PCA business

3 What is the best estimate of a spectrum given many measured spectra? Simple approach using optimal estimation: the forward model is y = x + ε, where x is the true spectrum, y is the measurement and ε is the noise. The maximum a posteriori estimate of x is x_r = x_a + S_a (S_a + S_ε)⁻¹ (y − x_a).
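
A minimal NumPy sketch of this estimator; the dimensions, prior covariance and noise covariance below are made up purely for illustration:

```python
import numpy as np

# MAP estimate x_r = x_a + S_a (S_a + S_e)^{-1} (y - x_a), as on slide 3.
rng = np.random.default_rng(0)
m = 200                                        # spectral elements (illustrative)
x_a = np.zeros(m)                              # prior mean spectrum
S_a = np.diag(rng.uniform(0.5, 2.0, m))        # prior covariance (illustrative)
S_e = 0.1 * np.eye(m)                          # noise covariance (illustrative)
y = rng.multivariate_normal(x_a, S_a + S_e)    # one simulated measurement

# Solve (S_a + S_e) z = y - x_a rather than forming the inverse explicitly.
z = np.linalg.solve(S_a + S_e, y - x_a)
x_r = x_a + S_a @ z
```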

4 Continued. If we have a large sample of spectra, we expect that x_a = <x + ε> = <y>; we can estimate S_a + S_ε from the statistics of y; and we should have a good idea of S_ε a priori. But: S_a (S_a + S_ε)⁻¹ will be a large matrix; S_a found from S_a + S_ε and S_ε is likely to be ill-conditioned; and S_a is likely to have only a small number of eigenvalues greater than the noise.

5 Singular Vectors (or Principal Components). Let the ensemble of (y − <y>) be the N columns of a matrix Y. Represent Y by its singular value decomposition Y = U Λ Vᵀ, where Λ is diagonal, UᵀU = I and VᵀV = I. The j-th individual spectrum y_j is then y_j = <y> + Σ_i u_i λ_i v_ji. The spectrum is represented as a sum of the columns u_i of U, with coefficients λ_i v_ji. Because UᵀU = I, we can compute the coefficients λ_i v_ji for any spectrum as Uᵀ(y_j − <y>).
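
A short NumPy sketch of this representation on random stand-in spectra, checking that the expansion coefficients λ_i v_ji of a spectrum are exactly Uᵀ(y_j − <y>):

```python
import numpy as np

# Build Y from an ensemble of spectra (columns), take its SVD, and recover
# the expansion coefficients of one spectrum. Dimensions are made up.
rng = np.random.default_rng(1)
m, N = 500, 100                          # spectral elements, number of spectra
Y_raw = rng.normal(size=(m, N))          # stand-in for measured spectra
y_mean = Y_raw.mean(axis=1, keepdims=True)
Y = Y_raw - y_mean                       # ensemble of (y - <y>) as columns

U, lam, Vt = np.linalg.svd(Y, full_matrices=False)   # Y = U diag(lam) Vt

# Coefficients of spectrum j are lam_i * v_ji, i.e. U^T (y_j - <y>).
j = 0
coeffs = U.T @ Y[:, j]
assert np.allclose(coeffs, lam * Vt[:, j])
```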

6 What do we expect? (1/N) Y Yᵀ is the covariance matrix S_y of the spectra. The left singular vectors U are the same as the eigenvectors of Y Yᵀ, and the singular values are the square roots of its eigenvalues. In the linear case with independent constant noise, S_y would be S_y = S_a + σ_ε² I, where S_a has rank n and I is of dimension m >> n, n being the degrees of freedom of the atmosphere. The eigenvalues of S_y are λ_i²/N, so the eigenvalues of S_a should be λ_i²/N − σ_ε².
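
A quick numerical check of the eigenvalue relation, again on made-up data:

```python
import numpy as np

# Verify that the eigenvalues of S_y = (1/N) Y Y^T are lam_i^2 / N,
# where lam_i are the singular values of Y. Dimensions are illustrative.
rng = np.random.default_rng(2)
m, N = 300, 80
Y = rng.normal(size=(m, N))
Y -= Y.mean(axis=1, keepdims=True)

U, lam, Vt = np.linalg.svd(Y, full_matrices=False)
S_y = Y @ Y.T / N
eigvals = np.linalg.eigvalsh(S_y)[::-1]         # descending order
assert np.allclose(eigvals[:len(lam)], lam**2 / N, atol=1e-8)
```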

7 Reconstructing Spectra. We can drop the terms with λ_i²/N ≈ σ_ε², i.e. i > n, without significant loss: they correspond to noise only. Better, multiply the retained terms by something like (λ_i² − N σ_ε²)/λ_i². So spectra can be reconstructed from the first few coefficients, and the noise can be reconstructed from the rest. Reconstructed spectra have much reduced noise.
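
A sketch of this denoising step on synthetic data: a rank-n "atmosphere" plus noise, with the SVD truncated at n terms and the tapering weights above applied. The signal model and noise level are invented for illustration:

```python
import numpy as np

# Truncate the SVD at rank n and apply the Wiener-like weights
# (lam_i^2 - N*sigma^2) / lam_i^2 to the retained terms.
rng = np.random.default_rng(3)
m, N, n, sigma = 400, 200, 5, 0.1
signal = rng.normal(size=(m, n)) @ rng.normal(size=(n, N))   # rank-n "atmosphere"
Y = signal + sigma * rng.normal(size=(m, N))                 # add noise
y_mean = Y.mean(axis=1, keepdims=True)

U, lam, Vt = np.linalg.svd(Y - y_mean, full_matrices=False)
w = np.clip(1.0 - N * sigma**2 / lam**2, 0.0, None)          # tapering weights
w[n:] = 0.0                                                  # drop noise-only terms
Y_denoised = y_mean + U @ (w[:, None] * (lam[:, None] * Vt))
```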

8 Information Content. The Shannon information content of a single spectrum relative to the ensemble is H = ½ Σ_i ln(λ_i²/(N σ_ε²)). The degrees of freedom for signal is d_s = Σ_i (1 − N σ_ε²/λ_i²). But reality isn't quite like that...
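
A small helper implementing these two sums. The slide's formula is in nats while slide 12 quotes bits, so this sketch uses log₂; summing only the terms above the noise floor is my reading of the intent:

```python
import numpy as np

def information_content(lam, N, sigma):
    """Shannon information (bits) and degrees of freedom for signal,
    summed over the singular values that exceed the noise floor."""
    snr2 = lam**2 / (N * sigma**2)
    keep = snr2 > 1.0                      # only terms above the noise
    H_bits = 0.5 * np.sum(np.log2(snr2[keep]))
    d_s = np.sum(1.0 - 1.0 / snr2[keep])
    return H_bits, d_s
```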

9 Tropospheric Emission Spectrometer. Connes-type four-port Fourier transform spectrometer. Nadir view: 0.06 cm⁻¹; limb view: 0.015 cm⁻¹. 16-element detector arrays in 4 focal planes: 1A ( ), 1B ( ), 2A ( ), 2B ( ). Footprint 0.5 × 5 km at nadir, 2.3 × 23 km at the limb. Global Survey mode: along-track nadir, 216 views/orbit.

10 SVD Applications: information content analysis; validation; instrument performance; forward model; de-noised spectra; retrieval?

11 A typical example. Run (Sept 2004?), Channel 1B ( cm⁻¹), views × 16 detectors.

12 An example of Information Content from Channel 1B2. Using one orbit: 1152 spectra, each with 3951 elements ( cm⁻¹). [Table: singular value index n, λ, information and degrees of freedom for signal, with cumulative totals. Total: 14.5 bits, 7.05 d.f.] [Figure: singular values and the residual of a fit to the noise asymptote, against SV number.]

13 How do you fit noise to the asymptote? I haven't used enough spectra. Compare the expected distribution of singular values of independent Gaussian noise for a finite number of samples: simulation versus theory (there is an explicit formula). Fit the expected distribution to the long tail.
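
A sketch comparing simulated noise singular values with an explicit limiting law. I am assuming the "explicit formula" meant here is the Marchenko-Pastur distribution, which gives the limiting support and density of the squared singular values:

```python
import numpy as np

# Simulate the singular values of an m x N matrix of independent Gaussian
# noise and compare with the Marchenko-Pastur limit (my assumption for the
# formula referred to on slide 13). Dimensions are illustrative.
rng = np.random.default_rng(4)
m, N, sigma = 400, 100, 1.0
lam = np.linalg.svd(sigma * rng.normal(size=(m, N)), compute_uv=False)

# Marchenko-Pastur support for the eigenvalues of (1/m) Y^T Y:
q = N / m
lo, hi = sigma**2 * (1 - np.sqrt(q))**2, sigma**2 * (1 + np.sqrt(q))**2
print("observed range:", (lam**2 / m).min(), (lam**2 / m).max())
print("theoretical support:", lo, hi)
```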

14 Validation, an early example. Run (Sept 2004?), Channel 1B ( cm⁻¹), views × 16 detectors.

15 Singular Vectors

16 Singular vectors × λ

17 V-vectors

18 Features. Most of the variation is in the first singular vector. The first six are: data spikes (identified); data spikes (unidentified); pixel-dependent variation in the spectra; scan-direction dependence.

19 Singular Vector 6. Systematic variation across the detector array, so it must be an artefact; it suggests a systematic error in the ILS. How is it related to the mean spectrum? Do a least-squares fit to find the function that, when convolved with the mean spectrum, gives SV6.
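
A sketch of that fit: build a shift (convolution) matrix from the mean spectrum and solve for a short kernel by linear least squares. The helper name, the kernel length, and the data are all placeholders of my own:

```python
import numpy as np

def fit_kernel(mean_spec, sv6, half_width=10):
    """Find a short kernel h such that (mean spectrum) convolved with h
    best fits SV6, in the least-squares sense."""
    m = len(mean_spec)
    taps = 2 * half_width + 1
    # Column k of A holds the mean spectrum shifted by (k - half_width).
    A = np.zeros((m, taps))
    for k in range(taps):
        shift = k - half_width
        lo, hi = max(0, shift), min(m, m + shift)
        A[lo:hi, k] = mean_spec[lo - shift:hi - shift]
    h, *_ = np.linalg.lstsq(A, sv6, rcond=None)
    return h
```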

20 SV6

21 SV6

22 SV6: suggests the derivative of the ILS.

23 Curious behaviour of the singular values of an ensemble of 2B1 spectra. [Figure, with annotations: "What is this?" and "This is the atmospheric variability"; axis labelled in cm⁻¹.]

24 Features. It occurs at around SV 200. I have used 100 sequences, 1600 spectra, which implies two of these SVs per sequence; confirmed by trying other numbers of sequences. The vectors look like noise. Only 2B1 does this.

25 With 64 sequences, plotted in 9/04. The shoulder here is at 64: there was only one per sequence then.

26 Examine pixel noise. Look at the spectrum for each pixel minus the mean over all pixels: this removes atmospheric structure, though the differences will be slightly correlated. The singular vectors of the difference look like noise. The coefficients of the singular vectors for one set of 16 do not look like noise for two of the SVs. Covariance & correlation matrices.
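
A sketch of the diagnostic: remove the per-view pixel mean, then form the 16 × 16 covariance and correlation matrices between pixels. Array shapes and data are stand-ins:

```python
import numpy as np

# Subtract the mean over the 16 pixels of each view to remove atmospheric
# structure, then compute the covariance and correlation between pixels.
rng = np.random.default_rng(5)
n_views, n_pix, m = 100, 16, 500
spectra = rng.normal(size=(n_views, n_pix, m))       # stand-in measurements

diff = spectra - spectra.mean(axis=1, keepdims=True)  # pixel minus pixel-mean
flat = diff.transpose(1, 0, 2).reshape(n_pix, -1)     # pixels x (views*channels)
cov = np.cov(flat)                                    # 16 x 16 pixel covariance
corr = cov / np.sqrt(np.outer(np.diag(cov), np.diag(cov)))
```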

27 SV coefficients

28 2B1 noise is significantly correlated between pixels

29 Deductions. The change from 1 to 2 SVs per sequence implies something in the processing rather than the instrument. Each pixel has its own independent noise, plus noise correlated between all the pixels of each sequence. The correlations depend on the distance between the pixels, quasi-sinusoidally. It is not an odd-pixel/even-pixel effect.

30 SVs of Residual Spectra. The difference between the measured spectrum and that calculated from the retrieval, not just in the microwindows used. Courtesy of Reinhard Beer & Susan S. Kulawik.

31 [Figure: Mean residual radiance for Run 2147 (2004 September 21) from 220 spectra between 30°S and 30°N. Residual radiance (watts/cm²/sr/cm⁻¹) against frequency (cm⁻¹); labelled features: residual O₃, residual silicate surface feature, "over-fitted" HDO, unmodeled F12, residual CH₄; retrieval microwindows marked.]

32 [Figure: Singular values of Run 2147 full-filter residuals ("W" matrix diagonal; first 51 elements of 568 only), singular value against singular value number.]

33 [Figure: U singular vector amplitude against index (time order): the quasi-periodicity is due to the orbital variation of total radiance. V singular vector amplitude against frequency (cm⁻¹): note the strong resemblance to the grand average.]

34 [Figure: U singular vector 1 amplitude against index (time order): this pattern of roughly orbits has been observed in other diagnostics; the cause is under investigation. V singular vector 1 amplitude against frequency (cm⁻¹): the discontinuity at 1120 cm⁻¹ occurs at the overlap of two different filters.]

35 [Figure: U singular vector 2 amplitude against index (time order): positive peaks align with land scenes (green bars). V singular vector 2 amplitude against frequency (cm⁻¹): the broad feature is a component of the unmodeled surface silicate emissivity.]

36 Retrieval: some untested thoughts. Possible methods include: retrieve from the denoised spectrum in the usual way, but with the noise covariance reduced in some ad hoc way (not optimal, inefficient); ditto, but with the correct error covariance (the covariance matrix is singular); select a subset of channels according to information content (straightforward if linear, not otherwise); retrieve from the SV representation coefficients (needs a special forward model for efficiency: PCRTM, or a regression or neural-net method).

37 Characterising a denoised spectrum. Reconstruct y with the first r singular vectors, U_r, giving y_r = U_r U_rᵀ y. The measurement function is y_r = U_r U_rᵀ F(x) + U_r U_rᵀ ε. We assume that U_r U_rᵀ F(x) ≈ F(x), so y_r = F(x) + U_r U_rᵀ ε = F(x) + ε_r. The covariance of ε_r is S_εr = U_r U_rᵀ S_ε U_r U_rᵀ, which is of rank r.
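
A sketch of the projection and its rank-r error covariance, with stand-in singular vectors and an illustrative S_ε:

```python
import numpy as np

# Project a spectrum onto the first r singular vectors and form the rank-r
# noise covariance of the reconstruction. U and S_e are made up here.
rng = np.random.default_rng(6)
m, N, r = 300, 120, 10
Y = rng.normal(size=(m, N))
U, _, _ = np.linalg.svd(Y - Y.mean(axis=1, keepdims=True), full_matrices=False)
U_r = U[:, :r]

y = Y[:, 0]
P = U_r @ U_r.T                 # projector onto the retained subspace
y_r = P @ y                     # denoised (reconstructed) spectrum

S_e = 0.01 * np.eye(m)          # illustrative noise covariance
S_er = P @ S_e @ P              # covariance of eps_r; rank r
assert np.linalg.matrix_rank(S_er) == r
```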

38 Retrieve from a denoised spectrum. We can use all or some of the spectrum: microwindows; channels selected by information content; channels selected by Xu Liu's approach. Define a selection operator M, and retrieve from the m-element subset y_m = M y_r. The subset depends on the whole of the original spectrum, y_m = M U_r U_rᵀ y, but we assume that we can model it as M F(x). The error covariance of ε_m is S_m = M U_r U_rᵀ S_ε U_r U_rᵀ Mᵀ, which may be singular if m > r.

39 Do another SVD. Use the SVD of M U_r U_rᵀ = L Λ Rᵀ to give Lᵀ y_m = Lᵀ M F(x) + Λ Rᵀ ε. Drop the elements of Lᵀ y_m with zero (or small) singular values, leaving at most p = min(r, m) elements; this gives L_pᵀ y_m. Retrieve from the remaining elements L_pᵀ y_m with L_pᵀ M F(x) as the forward model. L_pᵀ can be precomputed. The error covariance is Λ_p R_pᵀ S_ε R_p Λ_p. If S_ε = σ² I, this reduces to σ² Λ_p².
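
A sketch of slides 38-39 combined: a random channel subset for M, the second SVD, and the reduced measurement vector. Everything here (dimensions, channel choice, σ) is invented for illustration:

```python
import numpy as np

# Select a channel subset with M, take the SVD of M U_r U_r^T, and keep the
# p components with non-negligible singular values.
rng = np.random.default_rng(7)
n_chan, r, m_sel, sigma = 300, 10, 40, 0.1
U, _, _ = np.linalg.svd(rng.normal(size=(n_chan, n_chan)))
U_r = U[:, :r]

sel = rng.choice(n_chan, size=m_sel, replace=False)   # channels in the subset
M = np.eye(n_chan)[sel]                               # selection operator

L, lam, Rt = np.linalg.svd(M @ U_r @ U_r.T, full_matrices=False)
p = int(np.sum(lam > 1e-10 * lam[0]))                 # keep non-zero SVs only
L_p, lam_p = L[:, :p], lam[:p]

# Transformed measurement L_p^T (M y_r); error covariance sigma^2 * Lam_p^2
y = rng.normal(size=n_chan)                           # stand-in spectrum
z = L_p.T @ (M @ (U_r @ (U_r.T @ y)))
S_z = sigma**2 * np.diag(lam_p**2)                    # for S_e = sigma^2 I
```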

40 Comments. This allows us to retrieve with the minimum number of evaluations of the forward model, and the minimum length of measurement vector, with very little extra matrix manipulation. I haven't tried it yet, so I don't know what the catch is.

41 Summary/Comments. PCA does not improve the information content of a measurement, but it does allow us to use it more efficiently. It is very useful for validation: it separates out independent sources of variation, e.g. artefacts, omitted physics, ... Denoised spectra guide the eye to real features that might otherwise not be seen. Data compression.
