Spectral Analysis of Non-Uniformly Sampled Data: A New Approach Versus the Periodogram

Size: px

Start display at page:

Download "Spectral Analysis of Non-Uniformly Sampled Data: A New Approach Versus the Periodogram"

Ralph Lamb
6 years ago
Views:

1 IEEE TRANSACTIONS ON SIGNAL PROCESSING 1 Spectral Analysis of Non-Uniformly Sampled Data: A New Approach Versus the Periodogram Petre Stoica, Fellow, IEEE, Jian Li, Fellow, IEEE, Hao He Abstract We begin by revisiting the periodogram to explain why arguably the plain least-squares periodogram (LSP) is preferable to the classical Fourier periodogram, from a data-fitting viewpoint, as well as to the frequently-used form of LSP due to Lomb and Scargle, from a computational standpoint. Then we go on to introduce a new enhanced method for spectral analysis of non-uniformly sampled data sequences. This new method can be interpreted as an iteratively weighted LSP that makes use of a data-dependent weighting matrix built from the most recent spectral estimate. This approach is referred to as the real-valued iterative adaptive approach (RIAA). LSP and RIAA are non-parametric methods that can be used for the spectral analysis of general data sequences with both continuous and discrete spectra. However, they are most suitable for data sequences with discrete spectra (i.e., sinusoidal data), which is the case we emphasize in this paper. In the latter type of data, we present a procedure for obtaining a parametric spectral estimate, from the LSP or RIAA non-parametric estimate, by means of a Bayesian information criterion (BIC). The use of BIC for the said purpose can be viewed as a way of testing the significance of the dominant peaks of the LS or RIAA periodograms, a problem for which there was hardly any satisfactory solution available. Manuscript March 008. This work was supported in part by the Swedish Research Council (VR) and the National Science Foundation under Grant No. CCF and ECCS Petre Stoica is with the Department of Information Technology, Uppsala University, Uppsala, Sweden. Phone: ; Fax: ; ps@it.uu.se. Jian Li is with the Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL , USA. Phone: (35) 39-64; Fax: (35) ; li@dsp.ufl.edu. Please address all correspondence to Jian Li. Hao He is with the Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL , USA. haohe@ufl.edu.

2 IEEE TRANSACTIONS ON SIGNAL PROCESSING Finally, we discuss a possible strategy for designing the sampling pattern of future measurements, based on the spectral estimate obtained from the already available observations. To compare the performances of LSP and RIAA, as well as of their parametric versions proposed in the paper, and to illustrate the type of results that can be obtained with the sampling pattern design strategy presented here, we use both simulated and measured data (the latter are radial velocity data for an extrasolar star). Index Terms Periodogram, Non-uniformly sampled data, Real-valued, Lease-squares, Iterative adaptive approach, Spectral estimation, BIC. I. INTRODUCTORY DISCUSSIONS AND THE PERIODOGRAM A. The Data Let {y(t n )} N denote the sequence of observations whose spectral analysis is our main goal. We assume that the observation times {t n } N are given, that y(t n ) R (n = 1,..., N), and that a possible non-zero mean has been removed from {y(t n )} N, so that N y(t n ) = 0. We will also assume throughout this paper that the data sequence consists of a finite number of sinusoidal components and of noise, which is a case of interest in many applications. Note that, while this assumption is not strictly necessary for the non-parametric spectral analysis methods discussed in this paper, these methods perform most satisfactorily when it is satisfied. B. The Fourier Periodogram by: The classical Fourier transform-based Periodogram (FP) associated with {y(t n )} N is given P F (ω) = 1 N y(t n )e jωtn, (1) where ω is the frequency variable and where, depending on the applications, the normalization factor might be different from 1/N. It can be readily verified that P F (ω) can be obtained from the solution to the following least-squares (LS) data fitting problem: ˆβ(ω) = arg min β(ω) P F (ω) = ˆβ(ω), y(t n ) β(ω)e jωtn. ()

3 IEEE TRANSACTIONS ON SIGNAL PROCESSING 3 Because {y(t n )} N R, the LS criterion above can be rewritten as (below β(ω) = β(ω) e jφ(ω) ): [y(t n ) β(ω) cos(ωt n + φ(ω))] + β(ω) N sin (ωt n + φ(ω)). (3) Minimization of the first term in Eq. (3) makes sense, given the sinusoidal data assumption made previously. However, the same cannot be said about the second term in Eq. (3), which has no data fitting interpretation and hence only acts as an additive data-independent perturbation on the first term. C. The LS Periodogram It follows from the discussion in the previous subsection that in the case of real-valued (sinusoidal) data, considered in this paper, the use of FP is not completely suitable, and that a more satisfactory spectral estimate should be obtained by solving the following LS fitting problem: min α 0 φ [0,π] [y(t n ) α cos(ωt n + φ)], (4) (we omit the dependence of α and φ on ω, for notational simplicity). Using we can re-parameterize the LS criterion in terms of a and b: min a,b a = α cos φ and b = α sin φ, (5) [y(t n ) a cos(ωt n ) b sin(ωt n )]. (6) The solution to the minimization problem in Eq. (6) is well known to be: â = R 1 r, (7) ˆb where and R = cos(ωt [ n) cos(ωt n ) sin(ωt n ) sin(ωt n ) ], (8) r = cos(ωt n) y(t n ). (9) sin(ωt n )

4 IEEE TRANSACTIONS ON SIGNAL PROCESSING 4 The power of the sinusoidal component with frequency ω, corresponding to (â, ˆb), is given by: 1 [ ] â N ˆb cos(ωt n ) = 1 [ ] â sin(ωt n ) N ˆb R â ˆb The LS periodogram is accordingly given by: (where we have re-instated the dependence on ω, for clarity). = 1 N rt R 1 r. (10) P LS (ω) = 1 N rt (ω)r 1 (ω)r(ω), (11) The LSP has been discussed, for example, in [1] [6], under different forms and including various generalized versions. In particular, the papers [4][6] introduced a special case of LSP that has received significant attention in the subsequent literature. To describe this special instance of LSP, let us re-parameterize Eq. (4) as follows: min α 0 φ [0,π] where φ is chosen as the solution to the equation: [y(t n ) α cos(ωt n + φ + φ)], (1) cos(ωt n + φ) sin(ωt n + φ) = 0, (13) ( φ can be obtained in an explicit form, see [4][6]). Re-parameterizing Eq. (1) in terms of ã = α cos( φ) and b = α sin( φ), and making use of Eq. (13), we obtain (similarity to Eq. (7)): ˆã = ˆ b R 1 r, (14) where and R = cos (ωt n + φ) 0 0 sin (ωt n + φ), (15) r = cos(ωt n + φ) sin(ωt n + φ) y(t n ). (16) The power of the sinusoidal component with frequency ω, corresponding to ˆã and ˆ b above, of course is identical to that associated with â and ˆb in Eq. (7) (i.e., it is given by Eq. (11)), as the best LS sinusoidal fit to the data is unique. Then the question is whether the form of LSP

5 IEEE TRANSACTIONS ON SIGNAL PROCESSING 5 obtained from Eq. (14), i.e., r(ω) R 1 (ω) r(ω), is preferable to the plain LSP in Eq. (10) on computational or any other grounds. In the words of [4], for numerical work it is simpler to use Eq. (7) than Eq. (14). We basically agree, as we briefly explain next. Eq. (7) requires the inversion of a matrix, whereas this is not needed in Eq. (14) where the matrix is diagonal; however, Eq. (14) requires the computation of φ (note that φ depends on ω), which may be slightly trickier than the inversion of the matrix in Eq. (7). More importantly, the elements of the vector r in Eq. (7) are equal to the real and imaginary parts of N y(t n )e jωtn. There are fast algorithms, similar to the FFT, termed nonuniform FFT (NFFT) that can be used to compute the latter quantity and therefore the vector r (see, e.g., [7][8][9]). The same algorithms can be used to evaluate the vector r appearing in Eq. (14), which is given by the real and imaginary parts of e j φ(ω) y(t n )e jωtn, (17) but at a computational cost a bit larger than that required for r. To conclude this point, Eq. (7) may indeed be slightly more preferable than Eq. (14) from a computational standpoint (despite the fact that Eq. (14) avoids the matrix inversion operation). The suggestion in both [4] and [6] was that, again in the words of [4], the use of Eq. (14) would facilitate the statistical description of the LSP. We do not find this suggestion to be justified. Indeed, the statistical analysis in either [4] or [6], which used Eq. (14) to show that NP LS (ω) χ () (i.e., that NP LS (ω) is distributed as a chi-square random variable with degrees of freedom) under the assumption that {y(t n )} N is a normal white noise sequence with unit variance, is not simpler than the following proof of the same result, based on Eq. (7). To prove the result under discussion, first observe that under the assumption made on {y(t n )} N, the vector r is normally distributed with mean zero and covariance matrix equal to R: r N (0, R). This observation implies that (here R 1 is a symmetric square root of R 1 ): R 1 r N (0, I), (18) from which it follows at once that NP LS (ω) = R 1 r χ (), (19) (for any given ω, and under the normal white data assumption).

6 IEEE TRANSACTIONS ON SIGNAL PROCESSING 6 In view of the above argument, we tend to recommend the use of the plain LSP in lieu of its version in [4] and [6]; this is in contrast to what has been usually done so far in the literature where the latter periodogram was favored on unclear grounds, sometimes even missing the fact that the final result is the same as that obtained with the plain LSP. D. Spectral Window and Frequency Range Consider a sinusoidal component with frequency ω, different from ω, that might be present in the data: ā cos( ωt n ) + b sin( ωt n ), n = 1,..., N, (0) (if no such component exists in the data, then ā = b = 0). The effect of Eq. (0) on the LS estimate in Eq. (7) is to introduce an error term, or leakage, given by: N R 1 cos(ωt [ ] n) ā cos( ωt n ) sin( ωt n ). (1) sin(ωt n ) b In general, Eq. (1) is a function of both ω and ω that is relatively difficult to deal with. To simplify, we make use of the fact that, asymptotically in N and for non-pathological sampling patterns, R N I, () and cos(ωt [ ] n) ā cos( ωt n ) sin( ωt n ) sin(ωt n ) b ā cos(ωt n) cos( ωt n ). (3) b sin(ωtn ) sin( ωt n ) Next, assuming that ω and ω are not both very close to zero, we have that N cos(ω+ ω)t n 0, which implies that cos(ωt n ) cos( ωt n ) = 1 [cos((ω ω)t n ) + cos((ω + ω)t n )] 1 cos((ω ω)t n ), (4)

7 IEEE TRANSACTIONS ON SIGNAL PROCESSING 7 and sin(ωt n ) sin( ωt n ) = 1 [cos((ω ω)t n ) cos((ω + ω)t n )] 1 cos((ω ω)t n ). (5) The right-hand side in the above two equations can be further approximated as: 1 cos((ω ω)t n ) = 1 [ N ] Re e j(ω ω)tn 1 e j(ω ω)tn. (6) Inserting Eq. (4) (6) into Eq. (3) leads to the following approximation of the error term in Eq. (1): 1 ā N b e j(ω ω)tn. (7) We can express the size of this error as the (spectral) Euclidean norm of the vector in Eq. (7): 1 N P ( ω) e j(ω ω)tn, (8) where P ( ω) = ā (ω) + b (ω). As might have been expected, Eq. (8) is the same as the error term that would be caused by a complex sinusoid, (ā + j b)e j ωt, on the Fourier periodogram in Eq. (1) evaluated as ω. By analogy with the uniformly sampled data case, the part of Eq. (8) that depends on the sampling pattern can be written as a function of a single frequency variable, viz. W (ω) = e jωtn, (9) is called the spectral window (as a quick motivation for this name, note that W (ω) above is proportional to the squared magnitude of the Fourier transform of the sampling window N δ(t t n ), where δ( ) denotes the Kronecker delta). Next we discuss, following [10], the use of the spectral window to define the maximum frequency interval, let us say [0, ω max ], that can be dealt with unambiguously, for a given set of sampling times {t n }. The spectral window attains its maximum value of N at ω = 0:

8 IEEE TRANSACTIONS ON SIGNAL PROCESSING 8 W (0) = N W (ω), ω. We evaluate W (ω) for ω > 0 to find the smallest frequency value, let us say ω 0, at which the spectral window has a peak whose height is close to N (if we cannot find any such peak even for ω 1, then we set ω 0 = ). Theoretically, we should consider W (ω 0 ) = N. However, in applications with relatively low signal-to-noise ratios, we may choose W (ω) < N, perhaps even as low as 0.8N or smaller. In any case, once ω 0 is computed, then the claim is that ω max = ω 0 should be the largest frequency of the data sequence in question. To motivate this claim, we must verify that no aliasing can take place if and only if Eq. (30) is satisfied. However, this follows ( easily from the fact that if Eq. (30) is true, then ω ω < ω 0 for any ω, ω ω 0, ω ) 0, ω ω; and that if Eq. (30) does not hold, then the frequency ω = ω ( 0 +, for any 0, ω ) 0, is aliased as ω = ω [ 0 + ω, ω ω 0, ω ] 0. πn π Note that often in the literature, ω max is chosen as or as. As noted t N t 1 min k p (t k t p ) in [10], ω max in Eq. (30) is typically much larger than either of these choices. Regarding the smallest frequency separation that can be safely detected, let us say ω, this can be approximated by the requirement that W ( ω) = N ; W (ω) > N (30) for ω [0, ω]. (31) For non-pathological sampling patterns, we typically have a reasonable approximation: ω = π t N t 1. (3) Because ω above is essentially the resolution limit of the LSP, we may want to evaluate the periodogram function on a grid with a step (much) smaller than Eq. (3), such as ω/10. II. CONTRIBUTIONS AND OUTLOOK As is well known, the spectral estimates obtained with either FP or LSP suffer from both local and global (or distant) leakage problems. Local leakage is due to the width of the main beam of the spectral window, and it is what limits the resolution capability of the periodogram. Global leakage is due to the sidelobes of the spectral window, and is what causes spurious peaks to occur (which leads to false alarms ) and small peaks to drown in the leakage from large peaks (which leads to misses ).

9 IEEE TRANSACTIONS ON SIGNAL PROCESSING 9 Additionally, there is no satisfactory procedure for testing the significance of the periodogram peaks. In the uniformly sampled data case, there is a relatively well-established test for the significance of the most dominant peak of the periodogram, see [11] [13] and the references therein. In the non-uniformly sampled data case, [6] has proposed a test that mimics the uniform data case mentioned above. However, it appears that the said test is not really applicable to the non-uniform data case, see [11] and the references therein. As a matter of fact, even if the test were applicable, it would only be able to decide whether {y(t n )} are white noise samples, and not whether the data sequences contains one or several sinusoidal components (we remark in passing on the fact that, even in the uniform data case, testing the existence of multiple sinusoidal components, i.e., the significance of the second largest peaks of the periodogram, and so forth, is rather intricate [1][13]). The only way of correcting the test, to make it applicable to nonuniform data, appears to be via Monte-Carlo simulations, which may be a rather computationally intensive task (see the lucid discussion in [11] on this aspect). The main contribution of the present paper is the introduction of a new method for spectral estimation and detection in the non-uniformly sampled data case, that does not suffer from the above drawbacks of the periodogram (i.e., poor resolution due to local leakage through the mainlobe of the spectral window, significant global leakage through the sidelobes, and lack of satisfactory tests for the significance of the dominant peaks). A preview of what the paper contains is as follows: The Amplitude and Phase EStmation (APES) method, proposed in [14] for uniformly sampled data, has significantly less leakage (both local and global) than the periodogram. We follow here the ideas in [15] and [16] to extend APES to the non-uniformly sampled data case. The so-obtained generalized method is referred as RIAA. Both LSP and RIAA provide nonparametric spectral estimates in the form of an estimated amplitude spectrum (or periodogram P (ω)). We use the frequencies and amplitudes corresponding to the dominant peaks of P (ω) (first the largest one, then the second largest, and so on) in a Bayesian Information Criterion (BIC), see, e.g., [17] and the references therein, to decide which peaks we should retain and which ones we can discard. The combined methods, viz. LSP+BIC and RIAA+BIC, provide parametric spectral estimates in the form of a number of estimated sinusoidal components that are deemed to fit the data well. Therefore, the use of BIC in the outlined manner not only bypasses the need for

10 IEEE TRANSACTIONS ON SIGNAL PROCESSING 10 testing the significance of the periodogram peaks in the manner of [6] (which would be an intractable problem for RIAA, and almost an intractable one for LSP as well see [11]), but it also provides additional information in the form of an estimated number of sinusoidal components, which no periodogram test of the type discussed in the cited references can really provide. Finally, we present a method for designing an optimal sampling pattern that minimizes an objective function based on the spectral window. In doing so, we assume that a sufficient number of observations are already available, for which we can get a reasonably accurate spectral estimate. We make use of this spectral estimate to design the sampling times when future measurements would be performed. The literature is relatively scarce in papers that approach the sampling pattern design problem. One reason for this may be that, as explain later on, spectral window-based criteria are relatively insensitive to the sampling pattern, unless prior information (such as a spectral estimate) is assumed to be available as in this paper. Another reason may be the fact that measurement plans might be difficult to realize in some applications, due to factors that are beyond the control of the experimenter. However, this is not a serious problem for the sampling pattern design strategy proposed here which is flexible enough to tackle cases with missed measurements by revising the measurement plan on the fly. III. RIAA AND BIC A. RIAA Let ω denote the step size of the grid considered for the frequency variable, see Eq. (3) and the subsequent comment, and let K = ω max / ω denote the number of the grid points needed to cover the frequency interval [0, ω max ], where x denotes the largest integer less than or equal to x; also, let ω k = k ω, k = 1,..., K. (33) The uniform frequency grid in Eq. (33) will be the one used in most applications, and this is why we have mentioned it here, but we should stress that the spectral analysis methods discussed in this paper can be used with any type of frequency grid without any conceptual modification (only the implementation code will depend on whether the grid is uniform or non-uniform).

11 IEEE TRANSACTIONS ON SIGNAL PROCESSING 11 The following additional notation will be needed as well in this section: y(t 1 ) [ ] y =., A k = c k s k, θ k = a(ω k), b(ω k ) y(t N ) cos(ω k t 1 ) sin(ω k t 1 ) c k =., s k =.. (34) cos(ω k t N ) sin(ω k t N ) Using this notation, we can rewrite the LS fitting criterion in Eq. (6) in the following vector form (for ω = ω k ): y A k θ k, k = 1,..., K, (35) where denotes the Euclidean norm. The LS estimate of θ k in Eq. (7) can be rewritten as: ˆθ k = (A T k A k ) 1 A T k y. (36) Besides a possible sinusoidal component with frequency ω k, as considered in Eq. (35), the data y might contain other sinusoidal components with frequencies different from ω k, as well as noise. Regarding the latter, we do not consider a noise component of y explicitly, but rather implicitly via its contributions to the data spectrum at {ω k } K k=1; for typical values of the signal-to-noise ratio, these noise contributions to the spectrum are comparatively small. Let us define: Q k = K p=1,p k (a p + b p)a p A T p. (37) which can be thought of as the covariance matrix of the other possible components in y, besides the sinusoidal component with frequency ω k considered in Eq. (35). Remark: In some applications, the covariance matrix of the noise component of y is known (or, rather, can be assumed with a good approximation) to be: σ 1 0 Σ =..... ; with {σ n} N given. (38) 0 σn In such cases, we can simply add Σ to the matrix Q k in Eq. (37).

12 IEEE TRANSACTIONS ON SIGNAL PROCESSING 1 Assuming that Q k is available, and that it is invertible, it would make sense to consider the following weighted LS (WLS) criterion, instead of Eq. (35), [y A k θ k ] T Q 1 k [y A kθ k ]. (39) As is well known that the estimate of θ k obtained by minimizing Eq. (39) is more accurate, under quite general conditions, than the LS estimate obtained from Eq. (35). Note that a necessary condition for Q 1 k K, via reducing the grid size ω. to exist is that (K 1) N, which is easily satisfied in general by increasing The vector θ k that minimizes Eq. (39) is given by: ˆθ k = (A T k Q 1 k A k) 1 (A T k Q 1 y). (40) Similarly to Eq. (10), the WLS periodogram, corresponding to Eq. (40), can be defined as: k P WLS (ω k ) = 1 N ˆθ T k (A T k A k )ˆθ k. (41) The WLS estimate in Eq. (40) appears to require the inversion of an N N matrix Q k, for each value of k = 1,..., K. For N 1, this would be computationally a rather intensive task. To show how we can simply reduce the computational complexity of Eq. (40), let us introduce the matrix: A simple calculation shows that: K Γ = (a p + b p)a p A T p. = Q k + (a k + b k)a k A T k. (4) p=1 Q 1 k A k = Γ 1 A k (I + D k A T k Q 1 k A k). (43) To verify this equation, pre-multiply it with the Γ in Eq. (4) and observe that ΓQ 1 k A k = A k + A k D k A 1 k Q 1 k A k = A k (I + D k A T k Q 1 k A k). (44) Inserting Eq. (43) into Eq. (40) yields the following alternative expression for the WLS estimate: ˆθ k = (A T k Γ 1 A k ) 1 (A T k Γ 1 y), (45) which is computationally more appealing than Eq. (40), as Γ 1 in Eq. (45) needs to be computed only once for all values of k = 1,..., K.

13 IEEE TRANSACTIONS ON SIGNAL PROCESSING 13 Next, we explain how to circumvent the problem that Γ in Eq. (45) depends on the very quantities that we want to estimate, viz. the {θ k } K k=1, and consequently ˆθ k cannot be implemented directly. The only apparent solution to this problem is some form of iterative process, for example the RIAA algorithm outlined in Table I. TABLE I THE RIAA ALGORITHM Iteration Initialization Use the LS method (see Eq. (36)) to obtain initial estimate of {θ k }, denoted by {ˆθ 0 k} Let {ˆθ i k} denotes the estimates of {θ k } at the i th iteration (i = 0, 1,,...), and let ˆΓ i denote the estimate of Γ obtained from {ˆθ i k}. For i = 0, 1,,..., compute: ˆθ i+1 k until a given number of iterations is performed. Periodogram Calculation: = [A T k (ˆΓ i ) 1 A k ] 1 [A T k (ˆΓ i ) 1 y], k = 1,..., K, (46) Let {ˆθ I k} denote the estimates of {θ k } obtained by the iterative process outlined above (here, I denotes the iteration number at which the iteration process is stopped). Use {ˆθ I k} to compute the RIAA periodogram as (see Eq. (41)): P RIAA(ω k ) = 1 N (ˆθ I k) T (A T k A k )(ˆθ I k), k = 1,..., K. (47) In most applications, the RIAA algorithm is expected to require no more than 10 0 iterations (see, e.g., the numerical examples in Section V). Note that we might think of replacing ˆΓ i in Eq. (46) by: ˆΓ i,i+1 k 1 K k = A ˆDi+1 p p A T p + A p ˆDi p A T p, (48) p=1 p=k which makes use not only of the i th iteration estimates of {θ k } but also of the (i + 1) st iteration estimates as they become available. However, the estimates of Γ in Eq. (48) depends on k (as indicated by notation), which would complicate the computation of i+1 ˆθ k : indeed, while the matrix inversion lemma could be used to compute (Γ i,i+1 k ) 1 from (Γ i,i+1 k 1 ) 1, the computation

14 IEEE TRANSACTIONS ON SIGNAL PROCESSING 14 cost compared with that of Eq. (46) would increase, and this without any guarantee for better performance. Consequently, we will use the RIAA with the simpler estimate of Γ in Eq. (46). B. RIAA and APES In what follows, we make use of the analogy between RIAA and the amplitude and phase estimate (APES) periodogram (see, e.g., [14][18]) to provide some additional insights into the expected behavior of RIAA. In particular, we use this analogy to explain intuitively why RIAA is expected to have less (both local and global) leakage than LSP. We begin by noting that there is an essential difference between RIAA and APES: the latter uses an estimate of the residual matrix Q k obtained from multiple realizations of the data vector y, whereas RIAA estimates Q k using a theoretical formula for this matrix, along with the most recent spectral estimate available; note that in the spectral analysis problem considered in this paper, we dispose of only one realization of y, so APES would not be applicable directly. However, once this difference is realized (as well as the fact that APES was derived in [14][18] for complex-valued data, whereas RIAA assumes real-valued data, which complicates its formulas to some extent), Eq. (38) used by RIAA to estimate θ k looks similar to the corresponding equation of APES (see [14][18]). The implication of this observation is that we should be able to obtain the estimate in Eq. (38) in the manner in which the APES estimate is usually derived (see the cited papers). To describe this alternative manner, let Ĥ k denote an N matrix that is the solution to the following constrained minimization problem: min H k f(h T k Q k H k ) (49) s.t. H T k A k = I, (50) where f is a monotonically increasing function on the domain of positive definite matrices (which means that X Y, i.e., the difference matrix X Y is positive semi-definite, implies f(x) f(y)), for example the trace or the determinant. We use Ĥk, in the manner of APES, to obtain an estimate of θ k as: ˆθ k = ĤT k y. (51) To prove the equivalence between Eq. (51) and Eq. (40), we need to show that the solution to Eq. (49) is given by: Ĥ T k = (A T k Q 1 k A k) 1 A T k Q 1 k, (5)

15 IEEE TRANSACTIONS ON SIGNAL PROCESSING 15 or, equivalently, that: H T k Q k H k ĤT k Q k Ĥ k = (A T k Q 1 k A k) 1, (53) for any H k that satisfies the constraint in Eq. (50). Using the fact that H T k A k = I, we can write Eq. (53) in the following form: H T k Q k H k H T k A k (A T k Q 1 k A k) 1 A T k H k 0, (54) which is equivalent (by standard Schur complement results, see, e.g., [13]) to: HT k Q k H k H T k A k 0, (55) A T k H k A T k Q 1 k A or yet HT k Q 1 k A T k Q 1 k [ Q 1 H k ] Q 1 A k 0. (56) Because the matrix in the left-hand side of Eq. (56) is evidently positive semi-definite, the proof that Eq. (5) is the solution to Eq. (49), and therefore that ˆθ k in Eq. (51) coincides with the WLS estimate of θ k, is concluded. Remark: Making use of the constraint, we can reformulate the minimization problem in Eq. (49) as: min H k f(h T k ΓH k ) (57) s.t. H T k A k = I (58) whose solution, viz. Ĥ T k = (A T k ΓA k ) 1 A T k Γ 1, (59) must coincide with Eq. (5). This observation provides another proof of the equivalence between the WLS estimate in Eq. (40) and Eq. (45). The intuition afforded by the above APES-like derivation of WLS is interesting. The matrix Ĥ T k that solves the optimization problem in Eq. (49) can be viewed as a filter that passes the sinusoidal component of current interest (with frequency ω k ) without any distortion (by Eq. (50), and attenuates all the other components in y as much as possible (by Eq. (49)). To illustrate the latter property of Ĥk, we use the fact that, by assumption, the data contains a finite (usually small) number of sinusoidal components. This means that there are only a limited number of

16 IEEE TRANSACTIONS ON SIGNAL PROCESSING 16 frequencies which contribute significant terms to Q k. Let ω p be one of these frequencies. Then, in view of Eq. (49), Ĥ T k should be nearly orthogonal to A p, i.e., Ĥ T k A p 0, which means that Ĥ T k acts as an annihilating filter for any strong sinusoidal component in y whose frequency is different from ω k. This observation explains why RIAA can be expected to have significantly reduced leakage problems compared with LSP: note that the filtering matrix used by LSP, viz. (A T k A k ) 1 A T k is data-independent, and as such it cannot eliminate the sinusoidal components in y, with frequencies different from ω k, as well as ĤT k does. C. labelsec:biclsp+bic and RIAA+BIC As already mentioned in the previous section, the use of the standard χ -test (borrowed from the uniform data case, see [1][13] and the references therein) to decide whether or not the most dominant peak of LSP is significant, as suggested in [6], faces a number of problems in the non-uniformly sampled data case. First, in the latter case, the test is in general invalid (see, e.g., [11]). Second, and more important in our opinion, even in the uniformly sampled data case, the test can hardly be used to decide if there are sinusoidal components in the data (and, if yes, how many of them), but rather only to check whether or not the data consist of white noise alone. An additional problem is that for RIAA, the distribution of the corresponding spectral estimate is unknown (even in the uniform data case), and hence a χ -type of test would not be applicable even approximately. The BIC rule is an alternative to standard statistical tests of hypothesis testing, such as the χ test discussed above, which can be used in the problem of interest here, albeit in an approximate manner (see below for details). To explain how this can be done, we let {P (ω k )} K k=1 denote the values taken by either the LSP or the RIAA at the points of the frequency grid, and let { ω m, ă m, b m } M m=1 (60) denote the frequencies and the amplitude and phase related parameters corresponding to the M largest peaks of {P (ω k )} K k=1, arranged in a decreasing order of their associated periodogram values: P ( ω 1 ) P ( ω ) P ( ω M ), (61) for M 1. Under the idealizing assumptions that the data sequence consists of a finite number of sinusoidal components and of normal white noise, and that { ω m, ă m, b m } M m=1 are the maximum

17 IEEE TRANSACTIONS ON SIGNAL PROCESSING 17 likelihood (ML) estimates for the frequencies and the amplitude and phase related parameters of M such sinusoidal components, the BIC rule estimates M as follows: where ˆM = arg min M=0,1,,... BIC(M), (6) BIC(M) [ =N ln y(t n ) ] M (ă m cos( ω m t n ) + b m sin( ω m t n )) m=1 + ρm ln N}, (63) and where ρ = ρ ML = 5, (64) (see, e.g., [13][17] and the references therein). Note that BIC is made of two terms: a LS data fitting term that decreases as M increases, and a complexity penalization term which increases with increasing M; therefore, the BIC estimate of M, i.e., ˆM in Eq. (6), is obtained by a trade-off between in-sample fitting accuracy and complexity of the sinusoidal data description. Also note that if ˆM = 0 is selected, then, according to BIC, the data consists of only white noise. In applications, the idealizing assumptions made above do not hold exactly. Indeed, usually the noise in the data is not white and normally distributed. Moreover, the { ω m, ă m, b m } M m=1 obtained from the peaks of the periodogram are not ML estimates (unless M = 1 and the noise is white and normal). Despite these problems, BIC can still be used for estimating M (i.e., essentially for testing the significance of the peaks of LSP or RIAA), albeit only in an approximate manner, as explained next. It follows from the derivation of BIC (see, e.g., the cited works) that the large complexity penalty corresponding to the ρ in Eq. (64) is mainly due to the high accuracy of the ML estimates. Clearly, as ρ increases, the probability of false alarms decreases, whereas the probability of misses increases; as accurate estimation method, such as the ML, should allow a detection rule to operate at quite a small probability of false alarm without more than an insignificant increase in the probability of miss this explains intuitively why the value of ρ in BIC, corresponding to the ML, is relatively high.

18 IEEE TRANSACTIONS ON SIGNAL PROCESSING 18 By comparison, the frequencies and amplitude and phase related parameters estimated with RIAA and LSP are expected to be less accurate and, respectively, significantly less accurate than the ML estimates. Consequently, we suggest the use of BIC with smaller values of ρ in the case of LSP and RIAA than the ρ ML = 5 used in the case of ML, namely: ρ RIAA = 3 and ρ LSP = (65) Admittedly, these choices are somewhat ad-hoc, but the corresponding BIC rules are simple to use and they appear to provide accurate estimates of M (see Section V for details). IV. SAMPLING PATTERN DESIGN We consider the following situation: the data samples {y(t n )} N have already been collected, and a spectral estimate, {P (ω k )} K k=1, has been obtained based on them; we want to choose the sampling times for future measurements, {x f } F f=1, in an optimal manner according to some objective function. Let {τ g } G g=1 denote a set of possible times when future measurements of y(t) could in principle be performed. Naturally, we assume that G > F and that {x f } F f=1 {τ g } G g=1 = S (66) Note that any known estimates on the set of possible future sampling times, such as black out dates and the like, can be taken into account by properly selecting S. The discussion in the previous sections has evidenced quite clearly the fact that the performance of LSP depends, to a large extent, on the shape of the spectral window (its mainlobe width and sidelobe heights), and so does the performance of RIAA, but to a smaller extent. Consequently, a spectral window-based function should be a logical choice of an objective to optimize with respect to {x f } F f=1. The spectral window corresponding to the set of actual sampling times {t n } N and possible future sampling times {τ g } G g=1 is given by (see Eq. (9)): G W (ω k ) = e jω kt n + e jω kτ g g=1, k = 1,..., K, (67) and the integrated spectral window by K K G W (ω k ) = e jω kt n + e jω kτ g k=1 k=1 g=1 (68)

19 IEEE TRANSACTIONS ON SIGNAL PROCESSING 19 We might think of minimizing Eq. (68) with respect to {x f } F f=1 S. Furthermore, because Eq. (68) does not depend on any prior information, we might even think of minimizing Eq. (68) with respect to {x f } F f=1 for N = 0 (in which case we would design the entire set of sampling times). However, doing so is not advisable. The reason is that the integrated spectral window is rather insensitive to the choice of {x f }. To see why this is so, observe that Eq. (68) is the discrete approximation of the following integral function (to within a multiplication constant): [ u T ωmax ] v(ω)v (ω) dω u, (69) 0 where ( ) denotes the conjugate transpose, and e jωt 1. 1 u =. ; v(ω) = e jωt N. (70) e jωτ 1 1. e jωτ G For non-pathological choices of ω max, {t n } and {τ g }, ωmax v(ω)v (ω) dω I, (71) 0 so that Eq. (69), and therefore Eq. (68), is approximately constant. This observation extends the well-known fact that in the uniform sampling case the integrated spectral window is constant (a fact that in particular implies that if the sidelobe heights are reduced, then the mainlobe width must increase, and vice versa). It follows from the previous discussion that in order to get a meaningful spectral windowbased objective function for the design of {x f } F f=1, we must assume that N > 0 and that prior information, e.g., in the form of a spectral estimate {P (ω k )}, is available. Under this condition, the spectral window (which now is a two-dimensional function) is given at ω = ω p by (see Eq. (8)): P (ω k ) e j(ωp ω k)t n + G e j(ωp ω k)τ g, g=1 p, k = 1,..., K, (7)

20 IEEE TRANSACTIONS ON SIGNAL PROCESSING 0 and therefore the integrated spectral window (at ω = ω p ) is K G P (ω k ) e j(ωp ω k)t n + k=1 e j(ωp ω k)τ g g=1. (73) Compared with Eq. (68) (which can be obtained from Eq. (73) by setting P (ω k ) = 1), the above function has a more appealing character: for a given ω p, the sidelobe corresponding to ω k is weighted by the power in the data sequence at ω k ; therefore, the larger the said power, the more emphasis will be put on minimizing the level of the corresponding sidelobe of the spectral window. Remark: Instead of the weighted -norm type of function in Eq. (73), we might want to consider a weighted -norm objective: max P (ω k) e j(ωp ω k)t n + k [1,K] G e j(ωp ω k)τ g. (74) g=1 The largest (weighted) sidelobe that will result from the use of Eq. (74) will of course be smaller than the largest (weighted) sidelobe that is obtained by using Eq. (73). However, many other (weighted) sidelobes of the former design may be larger than those of the latter, a fact that suggests that Eq. (73) may be preferable in many cases. The function in Eq. (73) still depends on ω p. Because the sampling times cannot be frequency dependent, we must eliminate the dependence of the objective function on the frequency. This can be done by integrating Eq. (73) with respect to ω p as well, preferably over a set of frequencies of main interest, let us say Ω: K G P (ω k ) e j(ωp ω k)t n + e j(ωp ω k)τ g. (75) ω p Ω k=1 g=1 If P (ω p ) takes on rather different values over Ω, we may even want to weight Eq. (75) in the following way: ω p Ω 1 P (ω p ) K G P (ω k ) e j(ωp ω k)t n + e j(ωp ω k)τ g. (76) k=1 g=1 to normalize the terms associated with different values of ω p Ω. In the discussion so far, we have been able to define what appears to be a reasonable objective function that has the form of an integrated weighted spectral window. The problem that remains is how to minimize this function with respect to {x f } F f=1 S. For relatively small values of G

21 IEEE TRANSACTIONS ON SIGNAL PROCESSING 1 (which is equal to the cardinality of S), the minimization problem in question can be solved by checking all possible combinations of {x f } F f=1 in S. However, such an exhaustive combinational search might not be feasible for larger values of G (and of F ). In the latter case, we can proceed in the following approximate but simpler manner. Let {µ g } G g=1 be Boolean variables: µ g {0, 1}. Making use of these variables, we can write the problem of minimizing Eq. (75) with respect to {x f } F f=1 S in the following way: K G min P (ω k ) e j(ωp ω k)t n + µ g e j(ωp ω k)τ g {µ g} G g=1 ω p Ω k=1 s.t. µ g {0, 1} G µ g = F. (77) g=1 (We have used the form Eq. (75) of the objective function, and not the one in Eq. (76), just for notational simplicity). Because the Boolean constraint is what makes the problem in Eq. (77) hard, we relax it to: 0 µ g 1, g = 1,..., G. (78) The problem Eq. (77) with the constraint µ g {0, 1} replaced by Eq. (78) is a linearly constrained quadratic program that can be solved efficiently, for instance by means of the public domain software in [19]. Once the solution {µ g } to the said program has been found, we choose its F largest elements and set them to 1 (and the rest to 0). Letting g 1 < g < < g F denote the indices of these F largest elements of {τ g } G g=1, we use the following sampling times x f = τ gf, f = 1,..., F (79) as an approximation of the solution to the minimization problem of Eq. (75). (See, e.g., [0] for a general discussion on this type of approximation of the solution to a Boolean optimization problem.) Finally, we remark on the possible use of a data taper, for a given sampling pattern. LSP with a data taper can be viewed as a special case of weighted LSP corresponding to a diagonal weighting matrix (whose diagonal elements are determined by the taper). Consequently, a well chosen taper may improve the performance of LSP. However, for RIAA this can hardly be the case, as this method is based on a general and theoretically optimal weighting matrix. Therefore, because the main emphasis here is on RIAA, we do not consider data tapers in this paper. g=1

22 IEEE TRANSACTIONS ON SIGNAL PROCESSING V. NUMERICAL ILLUSTRATIONS AND CONCLUDING REMARKS A. Methods Not Included in the Present Study The main goal of this section is to illustrate the differences in performance between LSP and RIAA, as well as between LSP+BIC and RIAA+BIC. The literature on the spectral analysis of non-uniform data contains several other methods that we do not consider in this performance study, and we explain here why. First, note that we have emphasized the case of sinusoidal data in this paper, and therefore methods designed for non-uniform data with continuous spectra, such as ARMA sequences, fall outside the scope of the paper. Of the existing methods for non-uniform sinusoidal data, the CAP, MUSIC and ESPRIT methods introduced in [15] appear to be the closest in spirit to the RIAA proposed here (see the cited paper for explanations of the acronyms used to designate these methods). Indeed, all these methods make use of the estimated covariance matrix that is computed in the first iteration of RIAA from LSP. In fact, CAP is essentially identical (by using a proper parameter) to the first iteration of RIAA. MUSIC and ESPRIT, on the other hand, are parametric methods that require a guess of the number of sinusoidal components present in the data, otherwise they cannot be used; furthermore, they make the assumption that the noise in the data sequence is white, and they may perform poorly if this assumption is not satisfied. Several different versions of a cyclic algorithm for estimating the parameters of sinusoidal sequences by minimizing a non-linear LS fitting criterion have been discussed, apparently in an independent manner, in [1][][5][1] and []. The method of this type introduced in [1] was called RELAX, because cyclic algorithms are sometimes referred to as relaxation algorithms in the optimization literature, and we will use this name in the following discussion to designate a generic algorithm in this class. RELAX is a powerful and usually quite accurate parametric method; in particular, note that RELAX coincides with the ML under the assumption that the noise is white and normally distributed, and that RELAX appears to be relatively insensitive to the violation of the said noise assumption. Being a parametric method, RELAX is thus comparable with the LSP+BIC and RIAA+BIC methods discussed in this paper. However, unlike the latter methods, RELAX does not provide any non-parametric (intermediate) estimate. In our opinion, this is a drawback: indeed, a parametric spectral model can never provide an exact description of

23 IEEE TRANSACTIONS ON SIGNAL PROCESSING 3 real-world data; on the other hand, a non-parametric spectral estimate (such as the one obtained with RIAA) is much more robust, as it does not suffer from the mis-modeling problems of a parametric method, and therefore it is often good to have a non-parametric estimate at least as a back-up for and cross-checking of a parametric estimation method. Another class of methods, that have been recently proposed for solving the spectral analysis problem dealt with in this paper, make use of sparsity enforcing-based techniques (see, e.g., [15] and [3]). While these methods have a seemingly non-parametric flavor, in actuality, they are essentially parametric and do not provide any (intermediate) non-parametric spectral estimate, unlike LSP+BIC and RIAA+BIC. Consequently, they suffer from the same drawback as the RELAX approach discussed in the previous paragraph. B. Simulated-Data Examples We consider a real-valued data sequence consisting of M = 3 sinusoidal components with frequencies 0.1, 0.4 and 0.41 Hz, and amplitudes, 4 and 5, respectively. The phases of the 3 sinusoids are assumed independently and uniformly distributed over [0, π] and the additive noise is assumed to be white Gaussian with mean 0 and variance σ = We define the signal-to-noise ratio (SNR) of each sinusoid as: SNR m = 10 log 10 ( α m / σ where α m is the amplitude of the m th SNR = 9 db and SNR 3 = 31 db in our examples. ) db, m = 1,, 3, (80) sinusoidal component. Hence we have SNR 1 = 3 db, The sampling pattern is modeled as a Poisson process with parameter λ = 0.1 s 1, that is, the sampling times arrive with intervals following the exponential distribution with mean µ = 1/λ = 10 s. We choose the sampling number N = 64. Figure 1 shows the sampling pattern, which we will fix for our Monte-Carlo trials below. The Poisson process brings about a highly irregular sampling pattern. The sampling intervals range from to s, with mean value t avg = s. Such a dynamic range prohibits us from finding a ω 0 in the spectral window W (ω) such that W (ω 0 ) is close to N (see the discussions in Section I-D). Therefore, we set f max = (although for convenience we do not plot the frequencies larger than 0.6 Hz in the following figures of this subsection). The size of the frequency grid f is chosen to be Hz.

24 IEEE TRANSACTIONS ON SIGNAL PROCESSING 4 Fig. 1. The sampling pattern fixed in all Monte-Carlo trials. The height of each stem is the sampling interval of the corresponding sampling index. Figure presents the amplitude estimates averaged in 100 independent realizations. LSP nearly buries the smallest sinusoid by its relatively high sidelobes while RIAA successfully resolves all three sinusoids. We further show in Figure 3 the overlapped estimates from the first 15 Monte-Carlo trials. Note that RIAA suffers from much less variations from one trial to another. Next, we apply BIC to select dominant peaks in the spectra obtained above by both LSP and RIAA. We let P D denote the probability of detecting the correct number of sinusoids, P FA the probability of making false alarms, and P M the probability of missing. Figure 4 shows the scatter plots of the estimates for the dominant sinusoids obtained via using LSP with BIC and RIAA with BIC. LSP s amplitude estimates of the two closely-spaced sinusoids are mostly biased and the smallest sinusoid is almost totally missed. For LSP, the missing probability P M is as high as 0.93 (even we choose ρ = 0), P FA is 0.00, and P D is only Compared to LSP, RIAA shows much better stability and accuracy, as illustrated in Figure 4(b). For RIAA, P M = 0.00, P FA = 0.0, and P D = We also observe that usually after 15 iterations RIAA s performance does not improve obviously. So in the above and all subsequent examples, we terminate the loop of RIAA after 15 iterations.

25 IEEE TRANSACTIONS ON SIGNAL PROCESSING 5 (a) (b) Fig.. Amplitude estimates averaged in 100 Monte-Carlo trials. The solid line is the estimated spectrum and the circles represent the true frequencies and amplitudes of the 3 sinusoids. (a) LSP, (b) RIAA. C. Sampling Pattern Design Examples In this subsection, we present sampling pattern design examples. Suppose we have already collected data at sampling times {t n } N=3, which are randomly selected from 1,,..., 50 seconds. The set of sampling times of possible future measurements {τ g } G=64 g=1 are randomly selected

(a) LSP, (b) RIAA. from 51, 5,..., 00 seconds.

26 IEEE TRANSACTIONS ON SIGNAL PROCESSING 6 (a) (b) Fig. 3. Amplitude estimates on top of each other in 15 Monte-Carlo trials. The solid line is the estimated spectrum and the circles represent the true frequencies and amplitudes of the 3 sinusoids. (a) LSP, (b) RIAA. from 51, 5,..., 00 seconds. The spectral window corresponding to {t n } N=3 is plotted in Figure 5 and the smallest frequency f 0 at which the spectral window has a peak bigger than 0.8N is approximately 0.99 Hz. According to Eq. (30), the largest frequency in question f max = f 0 / 0.5 Hz.

27 IEEE TRANSACTIONS ON SIGNAL PROCESSING 7 (a) (b) Fig. 4. Scatter plot of dominant sinusoidal components selected by BIC in 100 Monte-Carlo trials. Dots symbolize the estimates while circles represent the true frequencies and amplitudes of the 3 sinusoids. (a) LSP with BIC, ρ = 0, (b) RIAA with BIC, ρ = 3.

28 IEEE TRANSACTIONS ON SIGNAL PROCESSING 8 Fig. 5. The spectral window corresponding to sampling times {t n} N=3. Consider a real-valued data sequence consisting of M = 3 sinusoidal components with frequencies 0.1, 0.4 and 0.41 Hz, and amplitudes, 4 and 5, respectively. The phases of the 3 sinusoids are assumed independently and uniformly distributed over [0, π] and the additive noise is assumed to be white Gaussian with mean 0 and variance σ = The estimated amplitude spectra are shown in Figure 6. Due to the shortage of samples, LSP mixes up the two closely-spaced sinusoids and generates a large false peak at the frequency 0 Hz. RIAA provides reasonable estimates, although the amplitudes estimates are obviously downward biased. Figure 7 shows the scatter plot. The corresponding probabilities P D, P FA and P M are 0.17, 0.01 and 0.8 for LSP and 0.70, 0.0 and 0.8 for RIAA, respectively. We now apply the optimal sampling design technique in Section IV to choose {x f } F =3 f=1 from {τ g } G=64 g=1. The frequency set of our main interest is chosen as Ω p = π [0.35, 0.45] Hz, because we aim to resolve the two closely-spaced sinusoids furthermore. The sampling patterns before and after our optimal design are shown in Figure 8, from which we notice that small and large sampling intervals interweave in the new pattern to raise the resolution. With only the new sampling times {x f } F =3 f=1, we obtain the estimated amplitude spectra shown in Figure 9 and the scatter plots in Figure 10. We now get the new probabilities: P D, P FA and P M are 0.10, 0.00 and 0.90 for LSP; and 0.91, 0.09 and 0.00 for RIAA. All 3 sinusoidal components are

29 IEEE TRANSACTIONS ON SIGNAL PROCESSING 9 (a) (b) Fig. 6. Amplitude estimates from the initial 3 sampling times, averaged in 100 Monte-Carlo trials. The solid line is the estimated spectrum and the circles represent the true frequencies and amplitudes of the 3 sinusoids. (a) LSP, (b) RIAA.

30 IEEE TRANSACTIONS ON SIGNAL PROCESSING 30 (a) (b) Fig. 7. Scatter plot of dominant sinusoidal components selected by BIC in 100 Monte-Carlo trials, from the initial 3 sampling times. Dots symbolize the estimates while circles represent the true frequencies and amplitudes of the 3 sinusoids. (a) LSP with BIC, ρ = 0, (b) RIAA with BIC, ρ = 3.

31 IEEE TRANSACTIONS ON SIGNAL PROCESSING 31 now clearly resolved by RIAA and the amplitudes estimates are also quite accurate. Interestingly, LSP trades off the detection of the weakest sinusoid for the resolution of the two closely-spaced sinusoids. This compromise is a direct result of the choice of Ω p used in the optimization for sampling pattern design. (a) (b) Fig. 8. Sampling patterns which are fixed in all Monte-Carlo trials. The height of each stem is the sampling interval of the corresponding sampling index. (a) Initial 3 samples, (b) Designed 3 samples.

32 IEEE TRANSACTIONS ON SIGNAL PROCESSING 3 (a) (b) Fig. 9. Amplitude estimates from the designed 3 sampling times, averaged in 100 Monte-Carlo trials. The solid line is the estimated spectrum and the circles represent the true frequencies and amplitudes of the 3 sinusoids. (a) LSP, (b) RIAA.

33 IEEE TRANSACTIONS ON SIGNAL PROCESSING 33 (a) (b) Fig. 10. Scatter plot of dominant sinusoidal components selected by BIC in 100 Monte-Carlo trials, from the designed 3 sampling times. Dots symbolize the estimates while circles represent the true frequencies and amplitudes of the 3 sinusoids. (a) LSP with BIC, ρ = 0, (b) RIAA with BIC, ρ = 3.

34 IEEE TRANSACTIONS ON SIGNAL PROCESSING 34 D. Radial Velocity Data Examples Finally, we provide examples using measured astronomical data, which is used in the discovery of an extrasolar planet named ET-1 (see [4] and the references therein). Figure 11 plots the data, together with the sampling pattern. We notice the highly-irregular sampling intervals, which range from 0.01 to days due to varying conditions during a total of days of observation. Figure 1 shows the corresponding spectral window. Similarly to the above two subsections, we readily get f max 0.5 cycles/day and we are concerned with the frequency interval [0.01, 0.5] cycles/day (corresponding to the periods between and 100 days). Figure 13 shows the power spectra estimated by LSP and RIAA. Although LSP and RIAA almost show the same dominant peak, the relatively large sidelobe levels given by LSP are not desirable. We also follow the custom of astronomy and re-plot the spectra with respect to days in a semi-log scale in Figure 14. Figure 13 and Figure 14 show clearly the sharp peaks and low sidelobes of RIAA. The dominant peak selected by RIAA with BIC is located at Hz (4.1 days), which is consistent with the result (4.11 days) in [4]. We further synthesize the data using the sinusoidal component picked by BIC. Figure 15 shows the comparison between the original data sequence y and our synthesized one ŷ. Although the sampling intervals are highly irregular, the two sequences fit well with the normalized mean squared error y ŷ / y = In order to demonstrate more clearly the advantages of RIAA over LSP, we do the estimation again using only the first 16 of the complete 38 samples in the data set. The estimated power spectra are shown in Figure 16, where spurious peaks smear LSP s estimates but RIAA still succeeds with the only dominant peak and low sidelobes. It is worth mentioning that the first 16 samples only cover an observation period of days, which is around half of the total observation time. This implies that the observation efficiency can be improved because RIAA does not require the observation time as long as the traditional periodogram does. In this subsection we add σ I to the covariance matrix Γ in Eq. (4). We choose σ = 1 N y ŷi within the i th iteration of RIAA, where y is the original data sequence, N is the data length, ŷ i is the synthesized data sequence from the dominant peaks selected by BIC in the i th iteration. We remark that highly irregular samplings in reality sometimes render Γ illconditioning and σ I mainly focuses on improving the stability of RIAA. In most cases without

35 IEEE TRANSACTIONS ON SIGNAL PROCESSING 35 (a) (b) Fig. 11. Radial velocity data of HD 10195, which is the star that ET-1 orbits. (a) Data sequence. Circles represent the samples. (b) Sampling pattern of the data sequence. The height of each stem is the sampling interval of the corresponding sampling index.

IEEE TRANSACTIONS ON SIGNAL PROCESSING 36 Fig. 1. The spectral window corresponding to the radial velocity data of HD 10195.

36 IEEE TRANSACTIONS ON SIGNAL PROCESSING 36 Fig. 1. The spectral window corresponding to the radial velocity data of HD pathological sampling intervals (like other parts of this section), we have similar results no matter we add σ I or not. REFERENCES [1] F. J. M. Barning, The numerical analysis of the light-curve of 1 lacertae, Bulletin of the Astronomical Institutes of the Netherlands, vol. 17, no. 1, pp. 8, August [] P. Vanicek, Approximate spectral analysis by least-squares fit, Astrophysics and Space Science, vol. 4, no. 4, pp , August [3], Further development and properties of the spectral analysis by least-squares, Astrophysics and Space Science, vol. 1, no. 1, pp , July [4] N. R. Lomb, Least-squares frequency analysis of unequally spaced data, Astrophysics and Space Science, vol. 39, no., pp , February [5] S. Ferraz-Mello, Estimation of periods from unequally spaced observations, Astronomical Journal, vol. 86, no. 4, pp , April [6] J. D. Scargle, Studies in astronomical time series analysis. ii - statistical aspects of spectral analysis of unevenly spaced data, Astrophysical Journal, vol. 63, pp , December 198. [7] W. H. Press and G. B. Rybicki, Fast algorithm for spectral analysis of unevenly sampled data, Astrophysical Journal, vol. 338, pp , March [8] J. A. Fessler and B. P. Sutton, Nonuniform fast Fourier transforms using min-max interpolation, IEEE Transactions on Signal Processing, vol. 51, pp , February 003.

37 IEEE TRANSACTIONS ON SIGNAL PROCESSING 37 (a) (b) Fig. 13. Power spectrum estimates for the radial velocity data of HD The solid line is the estimated spectrum and the circle is the dominant peak selected by BIC ( ˆM = 1 here, i.e., BIC only picks one peak). (a) LSP, ρ =. (b) RIAA, ρ = 3.

38 IEEE TRANSACTIONS ON SIGNAL PROCESSING 38 Fig. 14. scale. Power spectrum estimates by LSP (dashed line) and RIAA (solid line), with respect to 1/f (days) and in a semi-log Fig. 15. Data synthesis from the picked dominant sinusoidal component by RIAA. The dashed line is the original data sequence y and the solid line is our synthesized one ŷ.

IEEE TRANSACTIONS ON SIGNAL PROCESSING 39 Fig. 16. Power spectrum estimates by LSP (dashed line) and RIAA (solid line), using only the first 16 of the complete 38 samples in the data set. [9] N.

39 IEEE TRANSACTIONS ON SIGNAL PROCESSING 39 Fig. 16. Power spectrum estimates by LSP (dashed line) and RIAA (solid line), using only the first 16 of the complete 38 samples in the data set. [9] N. Nguyen and Q. Liu, The regular fourier matrices and nonuniform fast Fourier transforms, SIAM journal on Scientific Computing, vol. 1, no. 1, pp , 000. [10] L. Eyer and P. Bartholdi, Variable stars: Which Nyquist frequency? Astronomy and Astrophysics, Supplement Series, vol. 135, pp. 1 3, February [11] F. A. M. Frescura, C. A. Engelbrecht, and B. S. Frank, Significance tests for periodogram peaks, E-print arxiv:0706.5, June 007. [1] M. B. Priestley, Spectral Analysis and Time Series. Academic Pr, June [13] P. Stoica and R. L. Moses, Spectral Analysis of Signals. Prentice Hall, April 005. [14] J. Li and P. Stoica, An adaptive filtering approach to spectral estimation and SAR imaging, IEEE Transcations on Signal Processng, vol. 44, no. 6, June [15] P. Stoica and N. Sandgren, Spectral analysis of irregularly-sampled data: Paralleling the regularly-sampled data approaches, Digital Signal Processng, vol. 16, no. 6, pp , November 006. [16] T. Yardibi, M. Xue, J. Li, P. Stoica, and A. B. Baggeroer, Iterative adaptive approach for sparse signal representation with sensing applications, IEEE Transcations on Aerospace and Electronic Systems, submitted in 007. [17] P. Stoica and Y. Selen, Model-order selection: a review of information criterion rules, Signal Processng Magazine, IEEE, vol. 1, no. 4, pp , July 004. [18] P. Stoica, H. Li, and J. Li, A new derivation of the APES filter, Signal Processng Letters, IEEE, vol. 6, no. 8, pp , August [19] J. F. Sturm, Using SeDuMi 1.0, A Matlab toolbox for optimization over symmetric cones, Optimization Methods and Software, vol. 11, pp , [0] S. P. Boyd and L. Vandenberghe, Convex Optimization. Cambridge Univ. Press, 004.

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 57, NO. 3, MARCH X/$ IEEE

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL 57, NO 3, MARCH 2009 843 Spectral Analysis of Nonuniformly Sampled Data: A New Approach Versus the Periodogram Petre Stoica, Fellow, IEEE, Jian Li, Fellow, IEEE,