Least-Squares Spectral Analysis Theory Summary Reerence: Mtamakaya, J. D. (2012). Assessment o Atmospheric Pressure Loading on the International GNSS REPRO1 Solutions Periodic Signatures. Ph.D. dissertation, Department o Geodesy and Geomatics Engineering, Technical Report No. 282, University o New Brunswick, Fredericton, New Brunswick, Canada, 208 pp. 1. LEAST-SQUARES SPECTRAL ANALYSIS Least Squares Spectral Analysis is a powerul sotware developed at the University o New Brunswick, Fredericton. LSSA was irst developed by Vaníček (1968, 1971) as an alternative to the classical Fourier methods (Pagiatakis, 1998). It has been revised by Wells, Vaníček and Pagiatakis in 1985 in order to eliminate certain bugs and to be more versatile (Wells, 1985). In this study version LSSA v. 5.02 is used developed by Spiros Pagiatakis. LSSA has several advantages over the commonly known Fourier transorm. Table 1 summarizes these and also the limitations o the Fourier transorm based on Pagiatakis [1998, 2008]. Table 1. Advantages o LSSA in comparison with the classical Fourier transorm Time series with unequally spaced values can be analyzed without preprocessing Time series does not have to be continuous (allowance o data gaps) Time series with an associated covariance matrix can be analyzed Time series can have datum shits and trends - 1 - LSSA FOURIER TRANSFORM
No limitations or the length o the time series Higher accuracy achieved with longer time series Statistical testing on the signiicance o spectral peaks can be perormed Perhaps the most essential advantage o LSSA is the allowance o data gaps in the observed time series, what is common in GPS pseudorange and carrier-phase observations. The original time series can be directly used without interpolating or adding artiicial data to ill the data gaps, hence not degrading the results o the spectral analysis. LSSA also permits to add bigger weights to statistically more signiicant observations and smaller weight to the other ones. This can be useul in geodetic applications when analyzing pseudorange and carrier-phase observations, assigning more weight to the more precise carrier-phase and less weight to pseudorange observations. Let us discuss the LSSA and the other great properties which this sotware oers in the next section. 4.1 Theory o signal and noise The observed time series can be considered to be composed o signal, what we are interested to study, and noise which obscure the original signal. The noise can be random or periodic. Idealizing, we can think about the random noise as a white noise what is uncorrelated, has a constant spectral density, and it may or may not have a Gaussian distribution. Usually in practice we deal with a non-white random noise. Systematic noise is a noise what can be described by a certain unctional orm. It can be periodic (colored) or non-periodic. Non-periodic noise can be datum-shits (osets) and trends (linear, exponential, quadratic, etc.), and it causes the statistical properties o the series to be a unction o time (Pagiatakis, 1998). - 2 -
4.2 Mathematics o the least-squares spectrum Let us consider a discrete time series (ti) in Hilbert space, (H) where ti is a vector o observation times and i 12,,, m, m is the number o data points in the time series. We assume that a ully populated covariance matrix C is available or the time series. One o the main obectives o LSSA is to detect unknown periodic signals in, especially when contains systematic variations o unknown magnitude whose unctional orms are known. Ater Vaníček (1971) we can call these known constituents systematic noise since their presence is nuisance rom the point o view o detecting the unknown periodic signal (Vaníček, 1981). The idea is to approximate with another unction g using the least squares principle, so that the dierences between and g (the vector o residuals r) are minimum in the least-squares sense. Thus the time series can be modeled by g as ollows (Pagiatakis, 1998): g Φ x, (9) where x is the vector o unknowns and is the design matrix (matrix o normal equations) which consist o several column vectors: 12,,m] called based unctions, each o which is a know unction o the same dimension as (Wells, 1985). Matrix speciies the unctional orm o both signal and (systematic) noise. Using the standard least-squares notation we can write or the residual series (Pagiatakis, 1998). - 3 -
T 1 C 1 Φ Φ T C 1 ˆ r gˆ Φ Φ. (10) In equation (10), ĝ is the orthogonal proection o onto the subspace S (rom Hilbert space) generated by the column vector o (see Figure 1 or illustration). It ollows rom the proection theorem that ˆr gˆ. This means that has been decomposed into a signal ĝ and noise rˆ. Figure 1. Orthogonal proection o vectors ĝ and rˆ In Figure 1 the red arrow represents the orthogonal proection o ĝ into. Then the inner product (dot product) o these two vectors is:, ˆg ˆg cos (11) and the orthogonal proection is then:,gˆ ˆg cos. (12) - 4 -
From equation (11) one can conclude that i the angle between ĝ and is 0 degrees, then ĝ ully represents. Thus the ratio o the length o the orthogonal proection multiplied by 100% and the length o tells how much the least-squares approximation o g represents in terms o percent. The ratio is smaller than unity and can be expressed as ollows: s, gˆ, gˆ 2 T C 1 gˆ T C 1 (13) where s is the least-squares spectrum in percentage ( 0 s 1). The larger the s (closer to 1) the better is the least-squares it to the data (Pagiatakis, 2008). In spectral analysis it is usual to search or periodic signals which can be expressed in terms o sine and cosine base unctions. Speciically or LSSA we know (t) and we use: Φ cos t i,sin t i (14) where is a spectral requency or which spectral values spectral requencies can be written as: orthogonal proection o onto S will be dierent or each independently rom the rest (Pagiatakis, 1998). For each compute: s are desired. The vector o where 12,,, k (Wells, 1985). The and each is tried or which we want s we - 5 -
g( 1 2 ) xˆ cos t xˆ sin t, (15) where xˆ 1 ˆx is determined rom (Wells, 1985): xˆ 2 T 1 1 T 1 Φ C Φ Φ C ˆ x. (16) Then the LSS is deined by T 1 C gˆ, 12,,, k. (17) T C s 1 When calculating the LSS there is a simultaneous least-squares solution or the parameters o the process. This is a rigorous approach to the hidden periodicities: the parameters o the assumed linear system driven by noise are determined simultaneously with the amplitudes and phases o the periodic components, and with other parameters that describe systematic noise (Pagiatakis, 1998). For example the approximating unction can be a sine wave with known requency: g acos( 2 ) (18) where a is the amplitude and is the phase or a speciic requency which gives the best approximation (least-squares it) to. For dierent requencies the LSS s varies (s is a unction o - 6 -
requency) and so it becomes the representation o how well the sine wave its the data. The time series is best approximated when s equals to 1 or a certain predeined requency (Pagiatakis, 2008). 4.3 Least-squares spectrum evaluation Table 1 also lists that LSSA allows perorming statistical testing on the signiicance o spectral peaks. This is very important since we need to know which peak is statistically signiicant and which one can be suppressed. Vaníček [1971] derived the expected (mean) spectral value o white noise in the LSS or series consisting o statistically independent random values. He pointed out the possibility o deriving magnitudes (threshold values) above which spectral peaks are statistically signiicant (Pagiatakis, 1998). Postulating that series has been derived rom a population o random variables ollowing the multidimensional normal distribution n(0, C). This null hypothesis (H0) implies that i: v 2Fv,, Qn Qs 2 (19) then the series will contain statistically insigniicant noise. In equation (18) Q is a random variable with subscripts n and s reerring to signal and noise, respectively; v is the degree o reedom and F stands or F-distribution and is the signiicance level. The alternative hypothesis H1 is: v 2Fv,, Qn Qs 2. (20) - 7 -
It makes sense to use only the lower tail end o F, since large values o the ratio Q n Q s (upper tail o F) imply the signal is signiicantly larger than the noise and hence can not be detected. Then the statistically signiicant spectral peaks will satisy the ollowing inequality (Pagiatakis, 1998): 1 v s 1 F v, 2, 2. (21) Once a statistically signiicant signal is detected it should be related to the physics o the observed system. Once it is understood the detected signal becomes systematic noise and then urther hidden periodicities can be searched (Vaníček, 1981). 4.4 Input and output parameters or LSSA The input parameters to compute the spectrum consist o the time series, the limits and density o the spectral band to be produced and the known constituents base unctions (representing the systematic noise unctional orms) (Vaníček, 1985). The input ile, which contains the userdeined parameters, is called lssa.in. There are eight blocks o parameters that need to be speciied or the analysis o the series beore each run. Let us shortly describe the parameters to be deined in lssa.in (Vaníček, 1985). First, the user must enter the name o the ile to be analyzed, or example data_ser.dat, the number o data points, the units o time o the series and the units o the values o the series. The - 8 -
irst line in the data_ser.dat is a text line containing the name o the proect, ollowed by three columns. The irst column stands or the time o the series (years, days, seconds, etc.), the second column is the time series itsel and the third column represents the standard deviation o the time series i known. I the standard deviations are unknown the user can replace them with values he/she may think are reasonable. The units o the standard deviations are the same as the units o the time series. Next, the user can identiy the known constituents in the time series according to his/her best knowledge. There are our optional types o known constituents build in the algorithm. The sotware makes it possible or the user to speciy his/her own unctions as well. The known constituents build in LSSA are the ollows: Random constant (Datum Shits). The user can speciy the number o datum biases in the time series and the epochs when the shits are suspected. The minimum number o shits is one which represents the beginning o the time series. Linear trend. The user can decide whether to calculate linear trend or not. Periodic signals. Certain periodic constituents can be orced (itted) to the series. The result is a sine wave given by its amplitude and phase, along with a statistical test on its signiicance. The user can enter the number o sine waves to be orced, the periods and names o each sine wave. User speciied I the user believes that some nonlinear trend (exponential, polynomial) exists he/she can speciy it here. In general, a polynomial it o maximum order 5 can be achieved. - 9 -
The other parameters to be deined in lssa.in are: Processes Certain random processes can be tried on the series. I an autoregressive (AR) process up to the order 5 is suspected then the user should switch to 1. In this case the coeicient o the AR process along with a statistical test is calculated. The latter indicates on output whether it is signiicant or not. In addition a statistical test is perormed on the calculated coeicient to speciy whether it is dierent rom unity (random walk). I the coeicient is statistically equal to unity then it is a random walk. Characteristics o series This block gives inormation about the covariance o the series. Only a diagonal covariance matrix is accepted. I the a-priori variance actor o the input series is known the user should switch to 1. I so, the values o the series are considered equally weighted and the weight is set to unity. Otherwise the weight o each value o the series is calculated as the inverse o the variance. As mentioned above the standard deviations o the time series are given in the 3 rd column o the input series. Spectrum characteristics The user can deine the output spectrum, as well as the number o spectral values in each band. The user should keep in mind that the spectrum is band limited between the undamental requency and the Nyquist requency. Statistics The user can deine the critical level or statistical testing. Usually values =0.05 or=0.01 are used or critical levels to detect signiicant peaks in spectrum. - 10 -
Ater executing LSSA using the input ile lssa.in with properly deined parameters, several output iles are created. These are: lssa.out, residual.dat, spectrum.dat and hist.dat. Lssa.out contains the summary o results together with the statistical tests. The spectrum is described in three dierent orms: percentage variance (least-squares spectrum), power spectral density in db (what is identical with the Fourier transorm when the series is equally spaced), and power spectral density in unit 2 /requency (where unit is the unit o the time series e.g. cycles/day etc.). The resulting spectrum is given in six columns which list the period o spectrum in units o the time series, requency in cycles/unit time, idelity (in units o time). I a signiicant peak is present in the spectrum it shows up in this column. The 4 th column stands or the least-squares spectrum in percentage variance, next ollows with the power spectral density in db and last, the power spectral density in unit 2 /requency is given. The spectrum is also printed with asterisks or quick identiication o peaks Residual.dat lists the input and residual series. There are our columns which contain the time, the input time series, normalized residual series (ater removing trends, periods, processes, etc.) and the standard deviation o the residuals. - 11 -
Spectrum.dat contains the aorementioned 6 columns o the output spectrum as described in lssa.out. This ile can be easily exported into Matlab or other mathematical package used or plotting or any other urther analysis. Hist.dat contains the histogram o the normalized residual series. - 12 -