Discrimination and Classification of Nonstationary Time Series Using the SLEX Model

Size: px
Start display at page:

Download "Discrimination and Classification of Nonstationary Time Series Using the SLEX Model"

Transcription

1 JASA jasa v.2003/0/20 Prn:23/03/2004; 5:42 F:jasatm034r2.tex; (Aiste) p. Discrimination and Classification of Nonstationary ime Series Using the SLEX Model 60 Hsiao-Yun HUANG, Hernando OMBAO, and David S. SOFFER 4 63 Statistical discrimination for nonstationary random processes is important in many applications. Our goal was to develop a discriminant scheme that can extract local features of the time series, is consistent, and is computationally efficient. Here, we propose a discriminant scheme based on the SLEX (smooth localized complex exponential) library. he SLEX library forms a collection of Fourier-type bases 8 that are simultaneously orthogonal and localized in both time and frequency domains. hus, the SLEX library has the ability to extract 67 9 local spectral features of the time series. he first step in our procedure, which is the feature extraction step based on work by Saito, is to 68 0 find a basis from the SLEX library that can best illuminate the difference between two or more classes of time series. In the next step, we 69 construct a discriminant criterion that is related to the Kullback Leibler divergence between the SLEX spectra of the different classes. he 70 discrimination criterion is based on estimates of the SLEX spectra that are computed using the SLEX basis selected in the feature extraction step. We show that the discrimination method is consistent and demonstrate via finite sample simulation studies that our proposed method 3 performs well. Finally, we apply our method to a seismic waves dataset with the primary purpose of classifying the origin of an unknown 72 4 seismic recording as either an earthquake or an explosion KEY WORDS: Kullback Leibler divergence; Likelihood ratio; Nonstationary random process; Seismic time series; SLEX library INRODUCION he extension of classical pattern-recognition techniques to experimental time series is a problem of great practical interest. A series of observations indexed in time often produces a pattern that may form a basis for discriminating between different classes of events. As an example, Figure shows regional (00 2,000 km) recordings of a typical Scandinavian earthquake, a mining explosion, and an event of unknown origin measured by stations in Scandinavia. he unknown event took place near the Russian nuclear test facility in Novaya Zemlya. he problem of discriminating between mining explosions and earthquakes is a reasonable proxy for the problem of discriminating between nuclear explosions and earthquakes. his latter problem is of critical importance for monitoring a comprehensive test-ban treaty. An extensive discussion of this problem can be found in Shumway and Stoffer (2000), chapter 5; the data are available on the website for the text, and its mirrors. ime series classification problems are not restricted to geophysical applications, but occur under many and varied circumstances in other fields. raditionally, detecting a signal embedded in a noise series has been analyzed in the engineering literature by statistical pattern recognition techniques. he historical approaches to the problem of discriminating among different classes of time series can be divided into two distinct categories. he optimality approach, as found in the engineering and statistics literature, makes specific Gaussian assumptions about the probability density functions of the separate groups and then develops solutions that satisfy welldefined minimum error criteria. ypically, in the time series case, we might assume the difference between classes is expressed through differences in the theoretical mean and covariance functions, and use likelihood methods to develop an optimal classification function. A second class of techniques, which might be described as a feature extraction approach, proceeds more heuristically by looking at quantities that tend to be good visual discriminators for well-separated populations and have some basis in physical theory or intuition. Less attention is paid to finding functions that are approximations to some welldefined optimality criterion. When analyzing time series data, both time domain and frequency domain approaches are typically available; this is true in the case of discrimination and classification. For relatively short stationary series, a time domain approach that follows conventional multivariate discriminant analysis is viable; an example is given in Shumway and Stoffer (2000), example 5.. For longer stationary time series, a frequency domain approach is computationally easier because it reduces the dimension of the problem. For nonstationary time series, the reduction in dimension is essential for computational efficiency. here has been extensive research on the discrimination problem of stationary time series. For example, Shumway (982) reviewed many different discriminant methods for time series, including time domain and frequency domain approaches. Kakizawa, Shumway, and aniguchi (998) proposed the method of discrimination for multivariate time series by using the Kullback Leibler discrimination information and the Chernoff information measure. Pulli (996) considered using the ratio of spectra to process the discriminant problem between earthquakes and explosions. hese methods are developed for stationary time series. A discussion of the current state of the art for stationary series can be found in Shumway and Stoffer (2000), section 5.7. In many practical problems, however, the time series are realizations of nonstationary random processes. For example, it is clear that the seismic waves displayed in Figure have variance and spectrum that change over time. Various models of 22 8 Hsiao-Yun Huang is Assistant Professor, Department of Statistics and Information Science, Fu-Jen Catholic University, aiwan ( stat202@mails. erature. Priestley (965) was the first to introduce the concept 52 nonstationary random processes have been proposed in the lit- 53 fju.edu.tw). Hernando Ombao is Assistant Professor, Department of Statistics, University of Illinois, Champaign, IL 6820 ( ombao@uiuc.edu). of a Cramér representation with time-varying transfer function. 2 David S. Stoffer is Professor, Department of Statistics, University of Pittsburgh, his idea was later refined by Dahlhaus (997), who established Pittsburgh, PA 5260 ( stoffer@pitt.edu). his research was supported 56 in part by National Science Foundation grant 0025 and National Institute of 5 57 Mental Health grant 62298, and it is based on Hsiao-Yun Huang s Ph.D. Dissertation, University of Pittsburgh. he authors thank Professor aniguchi for Journal of the American Statistical Association 0 American Statistical Association 6 supplying an advanced copy of Sakiyama and aniguchi (2003), and thank the???? 0, Vol. 0, No. 0, heory and Methods 59 associate editor and reviewers for suggestions that improved the presentation. DOI 0.98/00 8

2 JASA jasa v.2003/0/20 Prn:23/03/2004; 5:42 F:jasatm034r2.tex; (Aiste) p. 2 2 Journal of the American Statistical Association,???? Figure. Example of Seismic Recordings an asymptotic framework for locally stationary processes. Recently, Ombao, Raz, von Sachs, and Guo (2002) introduced the SLEX (smooth localized complex exponentials) model of a nonstationary random process. he SLEX model uses the SLEX vectors (orthogonal and localized Fourier vectors) as stochastic building blocks in the Cramér representation. Sakiyama and aniguchi (2003) and Shumway (2003) discussed discrimination for Dahlhaus locally stationary time series. We approach the problem of discrimination and classification of nonstationary time series using the SLEX model. One very important consideration in using the SLEX library is that it employs computationally efficient algorithms. In particular, it uses the fast Fourier transform algorithm to compute the SLEX transform and the best basis algorithm (BBA) of Coifman and Wickerhauser (992) to search for the best basis for discrimination between classes. Our discriminant scheme contains two parts: a feature extraction part and a classification part. he feature extraction step consists of selecting a basis from the SLEX library that illuminates the most differences between the groups. We follow Saito (994), who proposed a criterion that is based on the Kullback Leibler divergence. his method is aimed at selecting the basis from any library of orthogonal bases that can best present the time series as clouds in space with maximal distance (distance is between the normalized time-varying spectrum). Because the SLEX functions are local, they are able to extract local spectral features of the time series. After selecting a basis using training datasets, we compute the SLEX periodogram of the time series that we need to classify. We propose a classification criterion which we show to be asymptotically consistent. he essential idea is that an observed time series is assigned to a class (population or group) c if the Kullback Leibler divergence between the estimated spectrum (SLEX periodogram computed from the data) and the spectrum of c is smaller than that between the estimated spectrum and the spectra of any other class. In the next section we present our procedure for selecting the best local discriminant basis from the SLEX library. In Section 3, we discuss our discriminant criterion. Finally, in Section 4, we present some simulation studies and an analysis of seismic recordings. 2. SELECING HE BES LOCAL DISCRIMINAN BASIS FROM HE SLEX LIBRARY he first step in our proposed method is to choose a basis from the SLEX library, which is a collection of bases; each basis consists of orthogonal and localized basis vectors. he SLEX vector is a time and frequency localized orthogonal generalization of the Fourier (complex exponential) basis vectors. It is ideal for analyzing nonstationary time series. Ombao, Raz, von Sachs, and Mallow (200) used the SLEX to estimate the spectral density matrix of a bivariate nonstationary time series. Moreover, Ombao et al. (2002) introduced a model of a nonstationary random process that has a spectral representation in terms of the SLEX. he SLEX library is a rich collection of localized bases. hus it is able to extract local spectral features of the time series and is well suited to the problem of discrimination and classification of nonstationary time series. 2. he SLEX Basis Vectors Here, we give a brief overview of the SLEX transform. For details, we refer the reader to Ombao et al. (200) for specific applications to time series and to Wickerhauser (994) for a broader discussion on local trigonometric transforms. Fourier basis vectors are perfectly localized in frequency and hence are ideal for representing stationary time series. However, they cannot adequately represent nonstationary time series, that is, the time series with spectra that change over time. In this article, we will use the SLEX basis vectors, which are simultaneously orthogonal and localized in time and frequency. hey are constructed by applying a projection operator on the Fourier vectors. It turns out that the action of a projection operator on any periodic vector is identical to applying two specially constructed smooth windows to the Fourier vectors. A SLEX basis vector ψ S,ω (t) that has support on the discrete time block

3 JASA jasa v.2003/0/20 Prn:23/03/2004; 5:42 F:jasatm034r2.tex; (Aiste) p. 3 Huang, Ombao, and Stoffer: Discrimination and Classification Using the SLEX Model Figure 2. Real and Imaginary Parts of a SLEX Waveform With Support on Block Indexed by ime t {52,...,,023} With ɛ = 32. block b on level j to be S(j,b) and M ɛ = /2 J +, which is the same for all levels j. We denote the j = /2 j. he SLEX 70 S ={α 0 ɛ +,...,α ɛ} and oscillates at frequency ω has the form ( ψ S,ω (t) = S,+ (t) exp i2πω t ) S ( + S, (t) exp i2πω t ), (2.) S where ω [ /2, /2], S =α α 0, and ɛ is a small overlap between two consecutive time blocks (the size of ɛ is given in Sec. 2.2). he windows S,+ (t) and S, (t) are the two particularly constructed smooth windows, which take the form ( ) ( ) t S,+ (t) = r 2 α0 r 2 α t, (2.2) ɛ ɛ ( ) ( ) t α0 α0 t S, (t) = r r ɛ ɛ ( ) ( ) t α α t r r, (2.3) ɛ ɛ 22 8 where r( ) is called a rising cut-off function. In our implementation, we use the sine rising cut-off function ( ) π r(u)= sin ( + u), where u [, ]. (2.4) 4 34 t 93 Other types of rising cut-off functions may be used. See Wickerhauser (994) for details. he SLEX basis vectors are defined at dyadic blocks that overlap. One important property of the SLEX vectors is that they remain orthogonal despite the overlap. Moreover, in our implementation, we use the fundamental frequencies ω k = k/ S, k = S /2 +,..., S /2, where S is the length of time block S. An example of a SLEX basis vector is given in Figure he SLEX Library he SLEX library is a collection of bases, each having orthogonal vectors with time support that is obtained by segmenting the time series, of length, in a dyadic manner. he library is constructed by first specifying the finest resolution level J (smallest time block has length /2 J ). At resolution level j (where j = 0,...,J), the time series is divided into 2 j overlapping blocks. he amount of overlap is set to vectors on block S(j,b) are allowed to oscillate at different fundamental frequencies ω k = k/m j (where k = M j /2 +,...,M j /2). o illustrate this further, consider the bottom of Figure 3, where the SLEX library is constructed by setting J = 2. In this example, the SLEX library consists of five orthogonal bases and we enumerate the support of these basis vectors: () S(0, 0); (2) S(, 0) S(, ); (3)S(2, 0) S(2, ) S(2, 2) S(2, 3); (4) S(, 0) S(2, 2) S(2, 3); (5)S(2, 0) S(2, ) S(, ). Clearly, the SLEX basis vectors are allowed to have different lengths of support (or different time and frequency resolutions). Moreover, each basis captures local spectral features of the time series that are useful for discrimination and classification. In the latter part of this section, we discuss our criterion for selecting a basis from the SLEX library. 2.3 he SLEX Coefficients and Periodograms he SLEX transform consists of the set of coefficients that correspond to all the SLEX vectors defined in the library. he SLEX coefficients on block S = S(j,b) are defined by θ S,k = X t, ψ S,ωk (t), (2.5) Mj where the fundamental frequency is ω k = k/m j and k = M j /2 +,...,M j /2. he SLEX periodogram, an analog of the Fourier periodogram, is defined to be α S,k = θ S,k 2. (2.6) 2.4 he Discriminant Criterion We use the best local discriminant criterion that was described by Saito (994) as a criterion for selecting the best basis. In the actual search over the SLEX library, we can employ either the algorithm used by Saito or the BBA of Coifman and Wickerhauser (992). Both of these are computationally efficient algorithms, each of order O(). Intuitively, a best basis for discrimination is that which gives the largest disparity between two or more different classes. One well-known measure of disparity is relative entropy (also known as cross-entropy, Kullback Leibler divergence, or I divergence), which we now define. Let p ={p i } n i= and q = {q i } n i= be two nonnegative sequences that satisfy i p i = i q i =. he relative entropy between p and q is defined to be 56 5 Figure 3. he SLEX Library Constructed With Level J = 2. he n 58 shaded blocks S(, 0) S(2, 2) S(2, 3) represent one basis (out of 7 I(p, q) = p i log p i. 59 the five in this library). q i 8 i=

4 JASA jasa v.2003/0/20 Prn:23/03/2004; 5:42 F:jasatm034r2.tex; (Aiste) p. 4 4 Journal of the American Statistical Association,???? 0 By convention, we define log 0 =,log(x/0) =+ for x>0, and 0 (± ) = 0. Relative entropy satisfies I(p, q) 0 and the equality holds if and only if p q. his quantity is not a metric because it is not commutative and does not satisfy the triangle inequality. However, it measures the discrepancy of p from q. Moreover, relative entropy, an additive measure in the sense that I(p, q) = n i= I(p i,q i ), enables us to use this divergence measure in the best basis algorithm. For C 2 classes of time series, we define C 70 I(p,...,p C ) = C I(p l, p k ). Now, let {x c i }N c i= be a set of N c training signals belonging to class c, forc =,...,C. hen the time frequency energy map of class c, denoted by Ɣ c, is a table of real values specified by the triplet (j,b,k)as / 78 N c Nc criterion at the parent block is greater than or equal to the corresponding sum at the children blocks, then the children blocks Ɣ c (j,b,k) α (j,b),k i x c i 2, (2.7) i= i= do not separate the populations better than the parent block, and 22 8 where α i 23 (j,b),k 82 is the periodogram of the ith time series computed at level j = 0,...,J in block b = 0,...,2 j, and frequency ω k,wherek = M j /2 +,...,M j /2. Note that the time frequency energy map for a class c is in fact average normalized periodograms, that is, for any basis B in each class c, (j,b,k) B Ɣ c (j,b,k) =. he time frequency energy map gives the location (in time and frequency) where the energy in a particular class is mostly concentrated. hus, our procedure for classification is based on the time frequency energy concentration of the signals. After computing the time frequency energy map of each class, we then calculate the divergence between the C signals at block S(j,b) as 37 M 96 j /2 j,b = I[Ɣ (j,b,k),...,ɣ C (j,b,k)]. 39 k= M j /2+ 98 In this case, j,b gives a local measure of disparity between 42 the time frequency energy maps of the C classes in each time 0 X t, = M i /2 θ i,k, ψ i,k (t)z i,k, (2.8) 43 block S(j,b). Our procedure selects the basis that consists of Mi 02 Si B k= M i /2+ blocks S(j,b), where, essentially, the distance between classes, j,b, is maximized. 2.5 he Algorithm for Selecting the Best Basis In this section, we give the specific steps for selecting the best basis. Saito (994) developed the local discriminant basis algorithm (LDBA), which we apply to the SLEX library. Let A j,b represent the best basis (maximizer of the local discriminant criterion) and let B j,b denote the SLEX vectors at level j in block b. he algorithm for selecting the best basis is as follows: 56 5 Step 0. Specify the maximum depth of decomposition J. Step. Construct time frequency energy maps Ɣ c for c =,...,C. Step 2. Set A J,b = B J,b. Determine the A j,b for b = 0,...,2 j andj = J,...,0 by the following rule: If j,b j+,2b + j+,2b+,thena j,b = B j,b, else A j,b = A j+,2b A j+,2b+ and set j,b = j+,2b + j+,2b+. Step 3. Extract the blocks S(j,b ) that are defined by A(j, b) in the previous step. he SLEX periodograms computed at the blocks defined by the best basis will be used to construct the classification rule. he algorithm essentially compares a parent block against its children blocks; for example, the parent block S(j,b) ver- 3 l= k=l+ 72 sus its children blocks S(j +, 2b) S(j +, 2b + ). Ifthe value of the local discriminant criterion at the parent block is smaller than the corresponding sum at the children blocks, then the children blocks separate the populations better than the parent block. hus the children blocks are chosen in favor of their parent in the best basis B. If the value of the local discriminant the parent block would be selected in favor of its children. Finally, we point out the connection between LDBA and the BBA. he basis that maximizes the I divergence is equivalent to the basis that minimizes the negative I divergence. One may use the BBA to search for the best local discriminant basis in the SLEX library. hus, our procedure is computationally efficient and can handle massive time series datasets. 2.6 he SLEX Model We now describe the SLEX model for a nonstationary time series X t, for t = 0,...,. Let B be the best local discriminant basis selected from the SLEX library and let S i be the blocks in B. Note that S i is a particular dyadic segmentation of the time series. Define M i to be the number of points on the block S i.letj be the highest time resolution level in B, that is, the smallest time block in B has length /2 J. he frequencies defined on S i are the grid frequencies ω ki = k i /M i for k i = M i /2 +,...,M i /2. he spectral representation of X t, is where θ i,k, is the transfer function on time block S i and frequency k; ψ i,k is the SLEX basis vector oscillating at frequency k and having support at block S i ;andz i,k is an orthonormal random process with finite fourth moment he SLEX Spectrum. he SLEX spectrum is defined analogously to the spectrum of a stationary process. It is the square of the modulus of the time-varying transfer function. It is defined on rescaled time [0, ]. Letu be in an interval I [0, ] such that [u ] is in some time block S i on B.he SLEX spectrum is f (u, ω k ) = θ i,k, 2 [u ] S i.note that for a fixed frequency ω k, f (u, ω k ) is constant within each time subinterval (or block). his is because for each fixed, the SLEX model gives an explicit partitioning of the time frequency plane as determined by the blocks i S i in the basis B. 59 8

5 JASA jasa v.2003/0/20 Prn:23/03/2004; 5:42 F:jasatm034r2.tex; (Aiste) p. 5 Huang, Ombao, and Stoffer: Discrimination and Classification Using the SLEX Model 5 Ombao et al. (2002) developed asymptotic theory on the SLEX model. In this article, we discuss only the ideas that are essential in proving the consistency of our discrimination 60 4 procedure. As, f (u, ω k ) approaches the limiting 63 5 spectrum f(u,ω), which is independent of. he limiting = M i /2{ } α i,k log f (u i,ω k ) +. (2.0) 64 f (u i,ω k ) 6 Si B k=0 65 log-spectrum is defined subsequently. As increases, the number of observations in each block also increases. However, the number of blocks in B should also be allowed to increase if we want to model a limiting spectrum that, as a function of time, has an infinite wavelet expansion. hus, we need to allow J to increase as increases, but at a rate that is slower than K = log 2 ( ). he infinite wavelet expansion in Ombao et al. (2002) was defined on log f(u,ω) instead of f(u,ω) to guarantee positivity of the spectrum. Let the Haar wavelet basis of L 2 ([0, ]) be {ϕ0,0 } {ϕ l,m} l 0,m 0,whereϕ is the father Haar wavelet and ϕ is the mother wavelet. he infinite wavelet expansion of the log-limiting SLEX spectrum is log f(u,ω)= lim log f (u, ω) ( = lim β,0 (ω)ϕ0,0 (u) 22 is consistent, that is, the probability of incorrectly classifying a 8 24 J 2 l 83 + β l,m (ω)ϕ l,m (u) ), (2.9) where the coefficients are β,0 (ω) = log f(u,ω)ϕ0,0 (u) du, 0 respectively. Our rule is to classify the time series x into if p [ α(x)] p 2 [ α(x)]. In terms of the log-likelihood ratio, the β l,m (ω) = log f(u,ω)ϕ l,m (u) du Ombao et al. (2002) demonstrated that for a given, log f(u,ω), and SLEX basis B with finest time resolution level J,logf (u, ω) arises as a finite wavelet expansion of log f(u,ω)truncated to a level J. As a final remark, Ombao et al. (2002) showed that the SLEX model and the Dahlhaus (997) model of a locally stationary process are asymptotically mean-squared equivalent. he equivalence implies nonstationary processes that have smoothly time-varying spectrum that can be modeled using the SLEX model. For example, one may model autoregressive moving average processes with parameters that vary smoothly over time by a SLEX model with a spectrum that varies with time but is piecewise constant within small time blocks he Likelihood of the SLEX Periodograms. We now 48 derive the likelihood of the SLEX periodograms. When the orthonormal random increment processes, z i,k,arecomplex = M i /2{ } log f (u α i,k i,ω k ) + 08 f 50 Si B k=0 (u (3.2) i,ω k ) 09 Gaussian, then the SLEX coefficients, θ i,k, are independent complex Gaussian. Moreover, the SLEX periodograms α i,k = θ i,k 2 are distributed as f (u i,ω k )V i,k,wherev i,k are independent over i and k 0 and are distributed as χ2 2/2when 54 k 0,M i /2, and χ 3 2 when k = 0,M i/2. Because the periodograms at each level j and block b are symmetric about the = M i /2{ } 55 log f 2 (u α i,k i,ω k ) + 4 f 2 56 Si B k=0 (u. (3.3) i,ω k ) 5 zero frequency (k = 0), we consider only the periodograms at nonnegative frequencies. Denote α ={ α i,k }.Letp( α) be the joint density of the periodograms under the SLEX process and let f be the spectrum. hen the log-likelihood of the periodograms is l(f α) In the next section, we develop a classification rule that is based on the log-likelihood ratio. 3. HE CLASSIFICAION RULE Let x ={x 0,...,x } be a time series that we wish to classify as a realization of either or 2 with time-varying spectra f or f 2, respectively. We consider only the case where there are C = 2 classes of time series, although our method is applicable for C>2. Our classification criterion is based on a frequency domain approach, that is, we use the spectrum as the basic signature for classifying time series. We extract the SLEX periodograms computed from the blocks in the basis selected in the feature extraction step. In this section, we derive our criterion using both the log-likelihood ratio and the Kullback Leibler divergence. We then show that this criterion time series goes to zero as the length of the time series. 3. Classification Rule as Log-Likelihood Ratio 26 l=0 m=0 he basic idea of our approach is as follows. We first ex- 85 tract the SLEX periodograms, which we denote as α(x), from 28 the blocks of the best basis selected. We then form the like- 87 lihood under and 2, denoted as p [ α(x)] and p 2 [ α(x)], discriminant statistic is D (f,f2 ; x) = logp [ α(x)] p 2 [ α(x)] 59 8 (3.) and the classification rule is to assign x as a realization of if D 0. Otherwise, it is assigned to 2. In this section, we derive the discrimination statistic (loglikelihood ratio) and then show that this criterion is consistent, that is, the probability of incorrectly classifying a time series goes to zero as the length of the time series. 3.. he SLEX Likelihood Ratio. Denote α ={ α i,k }.Let p ( α) and p 2 ( α) be the density under processes and 2, respectively, and let f and f 2 be the spectra under these two processes. hen the log-likelihoods under these two densities are, respectively, l(f α) and l(f 2 α) Our classification rule is based on the likelihood ratio, that is, we classify x into if l(f α) l(f 2 α); otherwise,itis classified into 2. Following (3.) (3.3), the discriminant sta-

6 JASA jasa v.2003/0/20 Prn:23/03/2004; 5:42 F:jasatm034r2.tex; (Aiste) p. 6 6 Journal of the American Statistical Association,???? 0 tistic is 60 M i /2 D (f,f2 ; x) = { log f 2(u i,ω k ) f (u i,ω k ) 4 63 i 5 S i B k= M i /2+ 64 [ ]} + α i,k f 2(u i,ω k ) f (u. (3.4) i,ω k ) We classify x into if D (f,f2 ; x) 0. Remark. First, we note that D (f,f2 ; x) 2 [l(f α) l(f 2 α)]. In the computation of the log-likelihood, we do not explicitly include the negative frequencies, because the periodograms are symmetric about k = Remark 2. o simplify the computation of the log-likelihood 74 KL( α,f 6 l(f 75 ) and l(f 2 ) = M i /2 α i,k log ), we ignored the fact that the distribution of i S f i B k= M i /2+ (u i,ω k ) α i,k /f (u i,ω k ) at k = 0andM i /2isχ 2, whereas for k =,...,M i /2, the ratio is χ2 2/2. However, for M i sufficiently large, we expect the difference to be practically negligible. α i,k Remark 3. he derivation of D (f,f2 ; x) in the SLEX 22 approach is inspired by the work of Sakiyama and aniguchi 8 23 (2003), who derived a discriminant statistic for the Dahlhaus KL( α,f 2 ) = M i /2 { α i,k log (997) model of a locally stationary random process. Denote 83 i S f i B k= M i /2+ 2(u i,ω k ) the spectrum for the Dahlhaus model as f D (u, ω). he likelihood is obtained by segmenting the time series into N time blocks, each having length M, and then computing the Fourier periodograms P M (b, ω),atblocksb =,...,N. heir classification criterion is 27 α i,k 86 D (f,f 2 ; x) 32 /2 9 = N N log f 2 D (u j,ω) his leads to the same classification rule discussed in Sec- 34 j= /2 fd (u j,ω) tion 3.. following (3.4), namely classify x into if D 0; 93 [ ] + P M (u j,ω) fd 2 (u j,ω) fd (u dω, (3.5) j,ω) which is comparable to (3.4). Remark 4. In practice, the true spectra, f and f 2, are unknown. We replace these quantities in D by the SLEX periodograms that are averaged across time series replicates. Suppose that there are N l independent time series from class l for l =, 2. Denote the periodograms computed from the We now state the results pertinent to the consistency of our classification scheme based on the SLEX model. Denote the probability of misclassifying a time series x to process 2, given that the true process is, to be P (2 ) = P {D (f,f2 ; x) <0 }. Similarly, we denote P ( 2) = P {D (f,f2 ; x) 0 2}. hen, under appropriate condi- 45 mth time series in the training dataset from class l to be α l,m i,k. tions, 04 hen the estimate of f l (u i,ω k ) is f 49 l(u i,ω k ) = N l α l,m lim i,k, l=, 2. (3.6) P ( 2) = 0. (3.0) N l 08 m= 50 are replaced 09 Essentially, we perform averaging across subjects, which gives the similar desired effect of reducing the variance when smoothing over frequency. However, if N is small, we might not achieve the desired reduction in the variance and thus we may need to do an extra smoothing step, which is over frequency. Hence, we may use the weighted form 56 5 f l(u i,ω k ) = M W s f l (u i,ω k s ), where the 2M + is the span of the weights, W s 0, and Ms= M W s =. he estimate f 2(u i,ω k ) can be obtained in a similar manner if N 2 is small. 3.2 he Classification Rule as Kullback Leibler Divergence he discriminant statistic, D given in (3.4), can be derived using the Kullback Leibler divergence. As before, let f and f 2 represent the spectra of and 2, respectively. Let the periodogram of the time series x,tobeclassifiedintoeither and 2, be denoted by α ={ α i,k }. he Kullback Leibler divergences between α and f and f 2 are 59 8 s= M and { [ + α i,k f (u i,ω k ) ]} [ + α i,k f 2(u i,ω k ) ]}. We classify the series x into if its periodogram α is closer to f than f 2;thatis,ifKL( α,f ) KL( α,f2 ) or if D (f,f2 ; x) = KL( α,f2 ) KL( α,f ) 0. otherwise, classify x into Consistency heorem lim P (2 ) = 0, (3.9) Moreover, (3.9) and (3.0) hold when f and f 2 by the consistent estimates defined in (3.6) if, in addition, we have N l for l =, 2. he conditions and proofs of these results are given in Appendix B. 4. SIMULAION RESULS AND DAA ANALYSIS 4. Simulation Results A number of numerical experiments were performed to test how well our proposed method was able to classify time series.

7 JASA jasa v.2003/0/20 Prn:23/03/2004; 5:42 F:jasatm034r2.tex; (Aiste) p. 7 Huang, Ombao, and Stoffer: Discrimination and Classification Using the SLEX Model 7 able. Misclassification Rate for the First Simulation Study Where Π 60 2 Is Gaussian White Noise and Π 2 Is AR() With Parameter φ 6 φ Error rate We discuss a few of simulation experiments here. In the first experiment we had two classes, where processes from were Gaussian white noise and processes from 2 were stationary first-order autoregressive with parameter φ. he length of each time series was set to =,024 and the training set for each group consisted of eight time series. Several simulation studies were performed by varying φ. Using the training datasets, we selected the best local discriminant basis and then set up the discriminant criterion. We then generated 50 time series datasets Figure 4. he Coefficient a t;τ versus t for τ =.5 ( ), τ =.4 ( ), 75 7 from each of and 2. In able we report the error rate, τ =.3 ( ), and τ =.2 ( ). 76 which is the number of incorrect classifications out of a total of 00 trials. Note that there is a small misclassification rate when the value of φ is close to 0. For example, when φ =±., there is some misclassification error because these AR() processes are close to white noise. In the second experiment, processes from were defined by { () Y t if t / Y t = 27 Y (2) (4.) 86 t if /2 + t, where Y t () is white noise and Y (2) is AR() with parameter φ =.. he processes from 2 were defined by { X () t if t /2 X t = (4.2) 29 t 88 X t (2) if /2 + t, t is AR() with parameter where X t () is white noise and X (2) φ =.. We generated N = 8, 6 training datasets from each group, and 2, with = 52,,024, 2,048. In each case Figure 5. A Simulated Slowly Changing AR(2) Dataset From Π. we generated 50 time series datasets from and another 50 from 2. he misclassification rates are displayed in able 2, and again we note the favorable results. In the third experiment, the processes from and 2 were slowly time-varying AR(2) processes. he processes from were generated as Y t = a t;τ Y t.8y t 2 + ɛ t, t =,...,,024, where a t;τ =.8[ τ cos(πt/,024)], τ =.5, and ɛ t were iid standard normal. We performed three subexperiments, where 2 was defined by the processes Y t = a t;τ Y t.8y t 2 + ɛ t, t =,...,,024, where τ takes on the three different values τ =.4,.3,.2. he plots of a t;τ for the various values of τ are shown in Figure 4. A simulated dataset from process is shown in Figure 5 and that from 2 with τ =.3 is shown in Figure able 2. Misclassification Rate for the Second Simulation Study 4 Where Π Is (4.) and Π 2 Is (4.2) ,024,024 2,048 2, N Error rate Figure 6. A Simulated Slowly Changing AR(2) Dataset From Π 2 With 7 59 τ =.3. 8

8 JASA jasa v.2003/0/20 Prn:23/03/2004; 5:42 F:jasatm034r2.tex; (Aiste) p. 8 8 Journal of the American Statistical Association,???? 0 3 τ Error rate We generated 0 datasets from each of and 2 as the training data. From the training dataset, we obtained the best basis and the estimated spectra. hen we generated 0 new data for each category to evaluate the error rate. he simulation results are shown in able 3. We can see that the error rates are zero when the τ =.3,.2in 2. here was a misclassification error for the case where the τ s are close for both processes, that is, τ =.4 for 2 and τ =.5for Data Analysis As discussed in the Introduction, discriminating between nuclear explosions and earthquakes is a problem of critical importance for monitoring a comprehensive test-ban treaty. We apply the proposed SLEX methodology to construct the discriminant rule for classifying a time series as either an explosion or an earthquake. Because the proliferation of nuclear explosions is monitored in regional distances (00 2,000 km) nowadays, the data on mining explosions can serve as a reasonable proxy. A dataset constructed by Blandford (993) that able 3. he Simulation Results for the Slowly Varying comprises regional (00 2,000 km) recordings of several typi- 60 AR(2) Experiment cal Scandinavian earthquakes and mining explosions measured by stations in Scandinavia are used in this study. A list of these events (eight earthquakes and eight explosions) and an extra event of uncertain origin that was located in the Novaya Zemlya region of Russia (called the NZ event) were given by Kakizawa et al. (998). he problem was discussed in detail by Shumway and Stoffer (2000, chap. 5) and the data are availiable online from the website of the text (the URL is given in the Introduction). he earthquake and explosion seismic recordings are given in Figures 7 and 8, respectively. he SLEX spectral estimates for one earthquake and one explosion time series are given in Figures 9 and 0, respectively, whereas the SLEX spectral estimate of the NZ event is given in Figure. o evaluate our method, we used the holdout procedure; that is, we removed one time series (at a time) to be used for classification and used all the remaining time series in the training dataset (excluding the unknown NZ event) to construct the classification rule. For each holdout time series, we performed the following steps in sequence: We selected a basis for discrimination, we extracted the SLEX periodograms computed at the blocks in this basis, we constructed the classification rule, and, finally, we assigned the holdout time series. We repeated the process for each of the holdout time series. In the Figure 7. Seismic Recordings of Earthquake Origin. 8

9 JASA jasa v.2003/0/20 Prn:23/03/2004; 5:42 F:jasatm034r2.tex; (Aiste) p. 9 Huang, Ombao, and Stoffer: Discrimination and Classification Using the SLEX Model Figure 8. Seismic Recordings of Explosion Origin. 92 classification criterion, we estimated the spectra for each group by first averaging the SLEX periodograms across subjects and then smoothing the averaged SLEX periodograms across frequency. Denote the estimated spectra for the earthquake and the explosion classes to be, respectively, f and f 2. he classification criterion was D ( f, f 2 ; x) and the testing data was classified as an earthquake event if D ( f, f 2 ; x) 0andas an explosion if D ( f, f 2 ; x)<0. he results are reported in Figure 9. he SLEX Spectral Estimate of an Earthquake ime Series Figure 0. he SLEX Spectral Estimate of an Explosion ime Series. 8

10 JASA jasa v.2003/0/20 Prn:23/03/2004; 5:42 F:jasatm034r2.tex; (Aiste) p. 0 0 Journal of the American Statistical Association,???? 0 60 As mentioned in Section 2.6, the existence of the limiting SLEX 4 63 spectrum depends on three assumptions, which we state now for completeness. 5 Details can be found in Ombao et al. (2002) Assumption. For f(u, ω) as a function of ω [ /2, /2], uniformly in u [0, ], we assume a Hölder condition of order µ (0, ] 66 with constant L>0: 0 f(u,ω) f(u,ω ) L ω ω µ. 69 Assumption 2. here exists a hierarchical collection I of dyadic 70 2 subintervals of [0, ], I = {[ 2 l m, 2 l (m + ) ) : l = 0,,...; m = 0,...,2 l }, 5 and a subset of intervals, say I v =[u v,u v+ ) I, such that v I v = 74 6 [0, ] and that f(u,ω), as a function of u, is Hölder of order 0 <s v < 75 7 on I v for all ω [ /2, /2]. Moreover, the transition points u v between the intervals I 76 Figure. he SLEX Spectral Estimate of the NZ Event. 8 v and I v allow for a finite number of possible 77 jumps of finite height. 20 Assumption 3. (a) Either the maximum depth J is fixed or, if 79 able 4, where we denote Eqk to be the kth earthquake in the 2 J = J is allowed to grow as a function of (i.e., J ), then 80 dataset and Expk to be the kth explosion in the dataset. he 22 we must have 2 J / 0as.Furthermore,J J 2. 8 proposed method had a zero misclassification rate. Moreover, (b) he length M 23 j of each segment S j satisfies M J /M j for 82 the unknown NZ event here is classified as an explosion, which j = 0,,...,J agrees with the result in Kakizawa et al. (998). 5. CONCLUSION In Section 3.3 we claimed that our discrimination rule is consistent in the sense of (3.9) and (3.0). We now prove these results un- 28 In this article, we proposed the SLEX method for discrimination and classification of nonstationary time series that is based spectrum, f to the SLEX limiting spectrm, f (recall Remark 4 in 87 der Assumptions 3. First, we state a theorem that relates the SLEX 30 on the SLEX transform. he SLEX transform is localized in Sec. 2.6). 89 both time and frequency domains; thus our method is able to heorem (Ombao et al. 2002). Under Assumptions 3, given the extract local features of the data. Moreover, our method is computationally efficient and hence is able to handle large datasets. SLEX limiting spectrum f, and a sequence of frequencies ω k, ω SLEX model as defined in Section 2.6, with SLEX spectrum f 33 and 92 Our procedure computes the Kullback Leibler distance of the as, we have the following situations: time frequency energy maps (normalized SLEX periodograms 36. Let u I v and let 2 J µ/(µ+sv).hen 95 or the estimated normalized spectrum) of the classes and selects 37 log f 96 the basis from the SLEX library that can best discriminate between classes of nonstationary time series. Our classification uniformly in u I v. (u, ω k, ) log f(u,ω) =O ( s vµ/(µ+s v ) ) 40 rule, which is based on the Kullback Leibler distance between 2. Define s := inf v s v and let 2 J µ/(µ+sv).hen 99 4 the estimated spectra of the groups and the time series that [ needs to be classified, is shown to be consistent; that is, the log f (u, ω k, ) log f(u,ω) ] 2 du = O ( s vµ/(µ+s v ) ) probability of misclassification goes to zero as the length of o aid in the proof of the consistency of our method, we first define the following functions. Let 03 the time series goes to infinity. Finite sample simulation studies and data analysis demonstrate that the method performs well in APPENDIX A: ASSUMPIONS ON HE LIMIING SPECRUM OF HE SLEX MODEL APPENDIX B: PROOF OF CONSISENCY φ (u, ω) = /f 2 46 practice. (u, ω) /f (u, ω) 05 and 48 φ(u,ω)= /f 2 (u, ω) /f (u, ω) able 4. Result of Holdout Procedure for Classifying 08 Seismic Recordings Define 5 Holdout series D 0 ( f, f 2 ; x) Holdout series D ( f, f 2 ; x) H (φ ) = M i /2 φ (u i,ω k ) α i,k, 52 Eq.526 Exp.5096 i S i B k= M i /2+ 53 Eq2.627 Exp Eq3.832 Exp /2 3 Eq4.03 Exp H(φ) = φ(u,ω)f (u, ω) dω du, Eq Exp /2 56 Eq6.789 Exp where α i,k is the SLEX periodogram of x, i 57 Eq7.66 Exp S i represents the blocks 6 corresponding to the best basis B Eq Exp8.044,andf is the SLEX limiting spectrum that corresponds, generically, to class (i.e., x ). Simi- NZ larly, f 8 represents the SLEX spectrum that corresponds to.he

11 JASA jasa v.2003/0/20 Prn:23/03/2004; 5:42 F:jasatm034r2.tex; (Aiste) p. Huang, Ombao, and Stoffer: Discrimination and Classification Using the SLEX Model proofs of the following lemmas are brief to save space; explicit details Proof. Without loss of generality, we prove the result for x can be obtained from the authors. Set 6 Lemma. Under the established notation and conditions, 4 R 63 ([ ] 5 M µ ) ([ 2 J ] µ ) = M i /2 log f 2(u i,ω k ) 64 E 6 H (φ ) = H(φ)+ O + O + O(b s ), i S f i B k= M i /2+ (u i,ω k ). 65 hen where E 8 denotes expection with respect to (wrt), M := sup i M i, D 67 b 9 := M J /,ands := inf v s v. (f,f2 ; x) = R + H (φ ) 68 and 0 Proof. 69 E { D (f 70 E H (φ ) = M i /2,f2 ; x) } = R + E{H (φ ) }. M i φ (u i,ω k )f 3 72 i 4 S M (u i,ω k ) Hence, i i B k= M i /2+ E {[ D (f,f2 ; x) E{D (f,f2 ; x)}] 2 } = M i /2 M i [φ(u i,ω k ) + O(b s 75 7 i S M )] = Var{H (φ ) } 0 i i B k= M i /2+ as by Lemma [f (u i,ω k ) + O(b s )] We are now ready to establish the main result on the consistency of 9 our discrimination rule. 78 = [ M /2 i φ(u i,ω)f (u i,ω)dω heorem 2. Under the established notation and conditions, 2 80 i S /2 i B lim 22 P (2 ) = lim P {D (f 8 ([ 23 2 J ] µ ) ],f2 ; x)<0 }=0, 82 + O + O(b s 24 ) lim P ( 2) = lim P {D (f,f2 ; x) 0 2}= / = φ(u,ω)f Proof. We prove the first result; the second result follows in a similar manner. Using Lemma, we have 85 (u, ω) dω du 27 0 /2 86 ([ ] 28 M µ ) ([ 2 J ] µ ) 87 + O + O + O(b s 29 ). E [ D (f,f2 ; x) ] 88 Lemma 2. Under the established notation and conditions, 3 = M i /2 [ log f 2(u i,ω k ) 90 ( 32 b 9 Var 33 {H (φ )}=O( s ) ( ) + O 2 J ) i + O, S f i B k= M i /2+ (u i,ω k ) + f ] (u i,ω k ) f 2(u i,ω k ) 92 /2 [ 34 = log f 2 (u, ω) 93 where Var 35 denotes variance wrt. 0 /2 f (u, ω) + f ] (u, ω) f 2 (u, ω) dωdu Proof. Let P 95 i = M i /2 k= M i /2+ φ ([ ] (u i,ω k ) α i,k.hen M µ ) ([ 2 J ] µ ) ([ ] s ) M 37 + O + O + O. 96 { M i /2 } 39 Var {H (φ )}=Var φ (u i,ω k ) α However, i,k i S i B k= M i /2+ 99 log f 2 (u, ω) 42 = 0 2 Var (P i ) + f (u, ω) + f (u, ω) f 2 (u, ω) 0, 43 i 2 Cov (P i,p j ) so that i j E [ D 03 in obvious notation. Using arguments similar to the proof of Lemma (f,f2 ; x) ] C 0 (A.) 45 and the result of Lemma, we can show 04 as. In light of (A.) and Lemma 3, the result follows immediately. 47 ( b Var {P i }=O( s ) ) + O Corollary. Let f i l(u i,ω k ) for l =, 2 be the estimates given in (3.6) and let N = min{n,n 2 }. hen, under the established notation and conditions, 08 and 5 ( 2 J ) Cov (P i,p j ) = O. lim lim P { D ( f N, f 2; x) } < 0 = 0, i j 54 Lemma 3. Let D (f lim lim P { D ( f 3,f2 ; x) denote the discriminant statistic defined in (3.4), where f 4 N, f 2; x) } 0 2 = and f 2 are the SLEX spectra in classes Proof. By the strong law of large numbers, D ( f 56 and 2, respectively. hen under the established notation and conditions, as, (f,f2 ; x) as N, and the result follows immediately from, f 2 as ; x) 5 D heorem D (f,f2 ; x) E{ D (f p,f2 ; x) i} 0 forx i,i=, 2. [Received May Revised January 2004.] 8

12 JASA jasa v.2003/0/20 Prn:23/03/2004; 5:42 F:jasatm034r2.tex; (Aiste) p. 2 2 Journal of the American Statistical Association,???? 0 REFERENCES Pulli, J. J. (996), Extracting and Processing Signal Parameters for Regional Seismic Event Identification, in Monitoring a Comprehensive est Blandford, R. R. (993), Discrimination of Earthquakes and Explosions, 3 Report AFAC-R HQ, Air Force echnical Applications Center, Ban reaty, eds. E. S. Husebye and A. M. Dainty, Dordrecht: Kluwer, 62 4 Patrick Air Force Base, FL. pp Coifman, R., and Wickerhauser, M. (992), Entropy Based Algorithms Saito, N. (994), Local Feature Extraction and Its Applications, Ph.D. dissertation, Yale University, Dept. of Mathematics. 5 for Best Basis Selection, IEEE ransactions on Information heory, 32, Sakiyama, K., and aniguchi, M. (2003), Discrimination for Locally Stationary Random Processes, Journal of Multivariate Analysis, in press Dahlhaus, R. (997), Fitting ime Series Models to Nonstationary Processes, he Annals of Statistics, 25, Kakizawa, Y., Shumway, R. H., and aniguchi, M. (998), Discrimination and Shumway, R. H. (982), Discriminant Analysis for ime Series, in Classification, Pattern Recognition and Reduction of Dimensionality, Handbook of Clustering for Multivariate ime Series, Journal of the American Statistical Association, 93, Statistics, Vol. 2, eds. P. R. Krishnaiah and L. N. Kanal, Amsterdam: North- Ombao, H., Raz, J., von Sachs, R., and Guo, W. (2002), he SLEX Model of Holland, pp. 46. Nonstationary Processes, Annals of the Institute of Statistical Mathematics, 70 (2003), ime Frequency Clustering and Discriminant Analysis, Statistics and Probability Letters, in press , Ombao, H., Raz, J., von Sachs, R., and Malow, B. (200), Automatic Statistical 3 Shumway, R. H., and Stoffer, D. S. (2000), ime Series Analysis and Its Applications, New York: Springer-Verlag Analysis of Bivariate Nonstationary ime Series, Journal of the American 4 Statistical Association, 96, Priestley, M. (965), Evolutionary Spectra and Nonstationary Processes, Wickerhauser, M. (994), Adapted Wavelet Analysis From heory to Software, 74 Journal of the Royal Statistical Society, Ser. B, 28, New York: IEEE Press

Discrimination and Classification of Nonstationary Time Series using the SLEX Model

Discrimination and Classification of Nonstationary Time Series using the SLEX Model Discrimination and Classification of Nonstationary ime Series using the SLEX Model Hsiao-Yun Huang, Hernando Ombao, and David S. Stoffer Abstract Statistical discrimination for nonstationary random processes

More information

Series. Andrés M. Alonso 1, David Casado 1, Sara López-Pintado 2 and Juan Romo 1. ERCIM, United Kingdom, 2011

Series. Andrés M. Alonso 1, David Casado 1, Sara López-Pintado 2 and Juan Romo 1. ERCIM, United Kingdom, 2011 Robust Alonso 1, David Casado 1, Sara 2 1 1 Universidad Carlos III de Madrid 28903 Getafe (Madrid), Spain 2 Columbia University New York, NY 10032, USA ERCIM, United Kingdom, 2011 Outline Time series

More information

Consistent classification of non-stationary time series using stochastic wavelet representations

Consistent classification of non-stationary time series using stochastic wavelet representations Consistent classification of non-stationary time series using stochastic wavelet representations Piotr Fryzlewicz 1 and Hernando Ombao 2 December 23, 2007 Abstract A method is proposed for classifying

More information

FOR. Keywords: time series, classification, integrated periodogram, data depth

FOR. Keywords: time series, classification, integrated periodogram, data depth Working Paper 8-74 Statistics and Econometrics Series 27 December 28 Departamento de Estadística Universidad Carlos III de Madrid Calle Madrid, 126 2893 Getafe (Spain) Fax (34) 91 624-98-49 A FUNCTIONAL

More information

Fourier Analysis of Stationary and Non-Stationary Time Series

Fourier Analysis of Stationary and Non-Stationary Time Series Fourier Analysis of Stationary and Non-Stationary Time Series September 6, 2012 A time series is a stochastic process indexed at discrete points in time i.e X t for t = 0, 1, 2, 3,... The mean is defined

More information

Time-Frequency Clustering and Discriminant Analysis

Time-Frequency Clustering and Discriminant Analysis Time-Frequency Clustering and Discriminant Analysis Robert H. humway Abstract: We consider the use of time-varying spectra for classification and clustering of non-stationary time series. In particular,

More information

Adapted Feature Extraction and Its Applications

Adapted Feature Extraction and Its Applications saito@math.ucdavis.edu 1 Adapted Feature Extraction and Its Applications Naoki Saito Department of Mathematics University of California Davis, CA 95616 email: saito@math.ucdavis.edu URL: http://www.math.ucdavis.edu/

More information

THE SPECTRAL ANALYSIS OF NONSTATIONARY CATEGORICAL TIME SERIES USING LOCAL SPECTRAL ENVELOPE

THE SPECTRAL ANALYSIS OF NONSTATIONARY CATEGORICAL TIME SERIES USING LOCAL SPECTRAL ENVELOPE THE SPECTRAL ANALYSIS OF NONSTATIONARY CATEGORICAL TIME SERIES USING LOCAL SPECTRAL ENVELOPE by Hyewook Jeong BS, Sookmyung Women s University, 1996 MS, Sookmyung Women s University, 1999 Submitted to

More information

Online Appendix. j=1. φ T (ω j ) vec (EI T (ω j ) f θ0 (ω j )). vec (EI T (ω) f θ0 (ω)) = O T β+1/2) = o(1), M 1. M T (s) exp ( isω)

Online Appendix. j=1. φ T (ω j ) vec (EI T (ω j ) f θ0 (ω j )). vec (EI T (ω) f θ0 (ω)) = O T β+1/2) = o(1), M 1. M T (s) exp ( isω) Online Appendix Proof of Lemma A.. he proof uses similar arguments as in Dunsmuir 979), but allowing for weak identification and selecting a subset of frequencies using W ω). It consists of two steps.

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

BALANCING GAUSSIAN VECTORS. 1. Introduction

BALANCING GAUSSIAN VECTORS. 1. Introduction BALANCING GAUSSIAN VECTORS KEVIN P. COSTELLO Abstract. Let x 1,... x n be independent normally distributed vectors on R d. We determine the distribution function of the minimum norm of the 2 n vectors

More information

8.2 Harmonic Regression and the Periodogram

8.2 Harmonic Regression and the Periodogram Chapter 8 Spectral Methods 8.1 Introduction Spectral methods are based on thining of a time series as a superposition of sinusoidal fluctuations of various frequencies the analogue for a random process

More information

Machine Learning on temporal data

Machine Learning on temporal data Machine Learning on temporal data Classification rees for ime Series Ahlame Douzal (Ahlame.Douzal@imag.fr) AMA, LIG, Université Joseph Fourier Master 2R - MOSIG (2011) Plan ime Series classification approaches

More information

Statistics of Stochastic Processes

Statistics of Stochastic Processes Prof. Dr. J. Franke All of Statistics 4.1 Statistics of Stochastic Processes discrete time: sequence of r.v...., X 1, X 0, X 1, X 2,... X t R d in general. Here: d = 1. continuous time: random function

More information

Functional Mixed Effects Spectral Analysis

Functional Mixed Effects Spectral Analysis Joint with Robert Krafty and Martica Hall June 4, 2014 Outline Introduction Motivating example Brief review Functional mixed effects spectral analysis Estimation Procedure Application Remarks Introduction

More information

Multiscale and multilevel technique for consistent segmentation of nonstationary time series

Multiscale and multilevel technique for consistent segmentation of nonstationary time series Multiscale and multilevel technique for consistent segmentation of nonstationary time series Haeran Cho Piotr Fryzlewicz University of Bristol London School of Economics INSPIRE 2009 Imperial College London

More information

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539 Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory

More information

Introduction to Signal Processing

Introduction to Signal Processing to Signal Processing Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Intelligent Systems for Pattern Recognition Signals = Time series Definitions Motivations A sequence

More information

DETECTING PROCESS STATE CHANGES BY NONLINEAR BLIND SOURCE SEPARATION. Alexandre Iline, Harri Valpola and Erkki Oja

DETECTING PROCESS STATE CHANGES BY NONLINEAR BLIND SOURCE SEPARATION. Alexandre Iline, Harri Valpola and Erkki Oja DETECTING PROCESS STATE CHANGES BY NONLINEAR BLIND SOURCE SEPARATION Alexandre Iline, Harri Valpola and Erkki Oja Laboratory of Computer and Information Science Helsinki University of Technology P.O.Box

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Model Selection and Geometry

Model Selection and Geometry Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model

More information

Information geometry for bivariate distribution control

Information geometry for bivariate distribution control Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

MATH 205C: STATIONARY PHASE LEMMA

MATH 205C: STATIONARY PHASE LEMMA MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)

More information

Local Spectral Analysis via a Bayesian Mixture of Smoothing Splines

Local Spectral Analysis via a Bayesian Mixture of Smoothing Splines Local Spectral Analysis via a Bayesian Mixture of Smoothing Splines Ori Rosen, David S. Stoffer, and Sally Wood In many practical problems, time series are realizations of nonstationary random processes.

More information

Stochastic Structural Dynamics Prof. Dr. C. S. Manohar Department of Civil Engineering Indian Institute of Science, Bangalore

Stochastic Structural Dynamics Prof. Dr. C. S. Manohar Department of Civil Engineering Indian Institute of Science, Bangalore Stochastic Structural Dynamics Prof. Dr. C. S. Manohar Department of Civil Engineering Indian Institute of Science, Bangalore Lecture No. # 33 Probabilistic methods in earthquake engineering-2 So, we have

More information

1 Procedures robust to weak instruments

1 Procedures robust to weak instruments Comment on Weak instrument robust tests in GMM and the new Keynesian Phillips curve By Anna Mikusheva We are witnessing a growing awareness among applied researchers about the possibility of having weak

More information

Data-driven methods in application to flood defence systems monitoring and analysis Pyayt, A.

Data-driven methods in application to flood defence systems monitoring and analysis Pyayt, A. UvA-DARE (Digital Academic Repository) Data-driven methods in application to flood defence systems monitoring and analysis Pyayt, A. Link to publication Citation for published version (APA): Pyayt, A.

More information

Detecting abrupt changes in a piecewise locally stationary time series

Detecting abrupt changes in a piecewise locally stationary time series Journal of Multivariate Analysis 99 (2008) 191 214 www.elsevier.com/locate/jmva Detecting abrupt changes in a piecewise locally stationary time series Michael Last a,, Robert Shumway b a National Institute

More information

2014:05 Incremental Greedy Algorithm and its Applications in Numerical Integration. V. Temlyakov

2014:05 Incremental Greedy Algorithm and its Applications in Numerical Integration. V. Temlyakov INTERDISCIPLINARY MATHEMATICS INSTITUTE 2014:05 Incremental Greedy Algorithm and its Applications in Numerical Integration V. Temlyakov IMI PREPRINT SERIES COLLEGE OF ARTS AND SCIENCES UNIVERSITY OF SOUTH

More information

9 Brownian Motion: Construction

9 Brownian Motion: Construction 9 Brownian Motion: Construction 9.1 Definition and Heuristics The central limit theorem states that the standard Gaussian distribution arises as the weak limit of the rescaled partial sums S n / p n of

More information

Lecture Notes 5: Multiresolution Analysis

Lecture Notes 5: Multiresolution Analysis Optimization-based data analysis Fall 2017 Lecture Notes 5: Multiresolution Analysis 1 Frames A frame is a generalization of an orthonormal basis. The inner products between the vectors in a frame and

More information

Research Article Least Squares Estimators for Unit Root Processes with Locally Stationary Disturbance

Research Article Least Squares Estimators for Unit Root Processes with Locally Stationary Disturbance Advances in Decision Sciences Volume, Article ID 893497, 6 pages doi:.55//893497 Research Article Least Squares Estimators for Unit Root Processes with Locally Stationary Disturbance Junichi Hirukawa and

More information

ASYMPTOTIC NORMALITY OF THE QMLE ESTIMATOR OF ARCH IN THE NONSTATIONARY CASE

ASYMPTOTIC NORMALITY OF THE QMLE ESTIMATOR OF ARCH IN THE NONSTATIONARY CASE Econometrica, Vol. 7, No. March, 004), 4 4 ASYMPOIC NORMALIY OF HE QMLE ESIMAOR OF ARCH IN HE NONSAIONARY CASE BY SØREN OLVER JENSEN AND ANDERS RAHBEK We establish consistency and asymptotic normality

More information

THE information capacity is one of the most important

THE information capacity is one of the most important 256 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 1, JANUARY 1998 Capacity of Two-Layer Feedforward Neural Networks with Binary Weights Chuanyi Ji, Member, IEEE, Demetri Psaltis, Senior Member,

More information

Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction

Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction Xiaodong Lin 1 and Yu Zhu 2 1 Statistical and Applied Mathematical Science Institute, RTP, NC, 27709 USA University of Cincinnati,

More information

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical

More information

Motivating the Covariance Matrix

Motivating the Covariance Matrix Motivating the Covariance Matrix Raúl Rojas Computer Science Department Freie Universität Berlin January 2009 Abstract This note reviews some interesting properties of the covariance matrix and its role

More information

where r n = dn+1 x(t)

where r n = dn+1 x(t) Random Variables Overview Probability Random variables Transforms of pdfs Moments and cumulants Useful distributions Random vectors Linear transformations of random vectors The multivariate normal distribution

More information

Feature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size

Feature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size Feature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size Berkman Sahiner, a) Heang-Ping Chan, Nicholas Petrick, Robert F. Wagner, b) and Lubomir Hadjiiski

More information

ON THE BOUNDEDNESS BEHAVIOR OF THE SPECTRAL FACTORIZATION IN THE WIENER ALGEBRA FOR FIR DATA

ON THE BOUNDEDNESS BEHAVIOR OF THE SPECTRAL FACTORIZATION IN THE WIENER ALGEBRA FOR FIR DATA ON THE BOUNDEDNESS BEHAVIOR OF THE SPECTRAL FACTORIZATION IN THE WIENER ALGEBRA FOR FIR DATA Holger Boche and Volker Pohl Technische Universität Berlin, Heinrich Hertz Chair for Mobile Communications Werner-von-Siemens

More information

On Input Design for System Identification

On Input Design for System Identification On Input Design for System Identification Input Design Using Markov Chains CHIARA BRIGHENTI Masters Degree Project Stockholm, Sweden March 2009 XR-EE-RT 2009:002 Abstract When system identification methods

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

Adjusted Empirical Likelihood for Long-memory Time Series Models

Adjusted Empirical Likelihood for Long-memory Time Series Models Adjusted Empirical Likelihood for Long-memory Time Series Models arxiv:1604.06170v1 [stat.me] 21 Apr 2016 Ramadha D. Piyadi Gamage, Wei Ning and Arjun K. Gupta Department of Mathematics and Statistics

More information

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Applied Mathematical Sciences, Vol. 4, 2010, no. 62, 3083-3093 Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Julia Bondarenko Helmut-Schmidt University Hamburg University

More information

Finding Robust Solutions to Dynamic Optimization Problems

Finding Robust Solutions to Dynamic Optimization Problems Finding Robust Solutions to Dynamic Optimization Problems Haobo Fu 1, Bernhard Sendhoff, Ke Tang 3, and Xin Yao 1 1 CERCIA, School of Computer Science, University of Birmingham, UK Honda Research Institute

More information

The Dirichlet s P rinciple. In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation:

The Dirichlet s P rinciple. In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation: Oct. 1 The Dirichlet s P rinciple In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation: 1. Dirichlet s Principle. u = in, u = g on. ( 1 ) If we multiply

More information

Stochastic Spectral Approaches to Bayesian Inference

Stochastic Spectral Approaches to Bayesian Inference Stochastic Spectral Approaches to Bayesian Inference Prof. Nathan L. Gibson Department of Mathematics Applied Mathematics and Computation Seminar March 4, 2011 Prof. Gibson (OSU) Spectral Approaches to

More information

Consistency of the maximum likelihood estimator for general hidden Markov models

Consistency of the maximum likelihood estimator for general hidden Markov models Consistency of the maximum likelihood estimator for general hidden Markov models Jimmy Olsson Centre for Mathematical Sciences Lund University Nordstat 2012 Umeå, Sweden Collaborators Hidden Markov models

More information

Evgeny Spodarev WIAS, Berlin. Limit theorems for excursion sets of stationary random fields

Evgeny Spodarev WIAS, Berlin. Limit theorems for excursion sets of stationary random fields Evgeny Spodarev 23.01.2013 WIAS, Berlin Limit theorems for excursion sets of stationary random fields page 2 LT for excursion sets of stationary random fields Overview 23.01.2013 Overview Motivation Excursion

More information

Minimax lower bounds I

Minimax lower bounds I Minimax lower bounds I Kyoung Hee Kim Sungshin University 1 Preliminaries 2 General strategy 3 Le Cam, 1973 4 Assouad, 1983 5 Appendix Setting Family of probability measures {P θ : θ Θ} on a sigma field

More information

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis. Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar

More information

Latent Variable Models and EM algorithm

Latent Variable Models and EM algorithm Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic

More information

Wavelet Methods for Time Series Analysis. Motivating Question

Wavelet Methods for Time Series Analysis. Motivating Question Wavelet Methods for Time Series Analysis Part VII: Wavelet-Based Bootstrapping start with some background on bootstrapping and its rationale describe adjustments to the bootstrap that allow it to work

More information

Evolutionary Power Spectrum Estimation Using Harmonic Wavelets

Evolutionary Power Spectrum Estimation Using Harmonic Wavelets 6 Evolutionary Power Spectrum Estimation Using Harmonic Wavelets Jale Tezcan Graduate Student, Civil and Environmental Engineering Department, Rice University Research Supervisor: Pol. D. Spanos, L.B.

More information

Linear Models for Classification

Linear Models for Classification Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,

More information

3 Integration and Expectation

3 Integration and Expectation 3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ

More information

A NEW PROOF OF THE WIENER HOPF FACTORIZATION VIA BASU S THEOREM

A NEW PROOF OF THE WIENER HOPF FACTORIZATION VIA BASU S THEOREM J. Appl. Prob. 49, 876 882 (2012 Printed in England Applied Probability Trust 2012 A NEW PROOF OF THE WIENER HOPF FACTORIZATION VIA BASU S THEOREM BRIAN FRALIX and COLIN GALLAGHER, Clemson University Abstract

More information

Power Spectral Density of Digital Modulation Schemes

Power Spectral Density of Digital Modulation Schemes Digital Communication, Continuation Course Power Spectral Density of Digital Modulation Schemes Mikael Olofsson Emil Björnson Department of Electrical Engineering ISY) Linköping University, SE-581 83 Linköping,

More information

Jeffrey s Divergence Between Complex-Valued Sinusoidal Processes

Jeffrey s Divergence Between Complex-Valued Sinusoidal Processes 07 5th European Signal Processing Conference EUSIPCO Jeffrey s Divergence Between Complex-Valued Sinusoidal Processes Eric Grivel, Mahdi Saleh and Samir-Mohamad Omar Bordeaux university - INP Bordeaux

More information

Spectral Analysis of Random Processes

Spectral Analysis of Random Processes Spectral Analysis of Random Processes Spectral Analysis of Random Processes Generally, all properties of a random process should be defined by averaging over the ensemble of realizations. Generally, all

More information

ELEMENTS OF PROBABILITY THEORY

ELEMENTS OF PROBABILITY THEORY ELEMENTS OF PROBABILITY THEORY Elements of Probability Theory A collection of subsets of a set Ω is called a σ algebra if it contains Ω and is closed under the operations of taking complements and countable

More information

The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models

The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population Health

More information

Uncertainty. Jayakrishnan Unnikrishnan. CSL June PhD Defense ECE Department

Uncertainty. Jayakrishnan Unnikrishnan. CSL June PhD Defense ECE Department Decision-Making under Statistical Uncertainty Jayakrishnan Unnikrishnan PhD Defense ECE Department University of Illinois at Urbana-Champaign CSL 141 12 June 2010 Statistical Decision-Making Relevant in

More information

NOTES AND PROBLEMS IMPULSE RESPONSES OF FRACTIONALLY INTEGRATED PROCESSES WITH LONG MEMORY

NOTES AND PROBLEMS IMPULSE RESPONSES OF FRACTIONALLY INTEGRATED PROCESSES WITH LONG MEMORY Econometric Theory, 26, 2010, 1855 1861. doi:10.1017/s0266466610000216 NOTES AND PROBLEMS IMPULSE RESPONSES OF FRACTIONALLY INTEGRATED PROCESSES WITH LONG MEMORY UWE HASSLER Goethe-Universität Frankfurt

More information

EM Algorithm. Expectation-maximization (EM) algorithm.

EM Algorithm. Expectation-maximization (EM) algorithm. EM Algorithm Outline: Expectation-maximization (EM) algorithm. Examples. Reading: A.P. Dempster, N.M. Laird, and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc.,

More information

A class of non-convex polytopes that admit no orthonormal basis of exponentials

A class of non-convex polytopes that admit no orthonormal basis of exponentials A class of non-convex polytopes that admit no orthonormal basis of exponentials Mihail N. Kolountzakis and Michael Papadimitrakis 1 Abstract A conjecture of Fuglede states that a bounded measurable set

More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

Fractal functional regression for classification of gene expression data by wavelets

Fractal functional regression for classification of gene expression data by wavelets Fractal functional regression for classification of gene expression data by wavelets Margarita María Rincón 1 and María Dolores Ruiz-Medina 2 1 University of Granada Campus Fuente Nueva 18071 Granada,

More information

Akaike criterion: Kullback-Leibler discrepancy

Akaike criterion: Kullback-Leibler discrepancy Model choice. Akaike s criterion Akaike criterion: Kullback-Leibler discrepancy Given a family of probability densities {f ( ; ψ), ψ Ψ}, Kullback-Leibler s index of f ( ; ψ) relative to f ( ; θ) is (ψ

More information

COMPUTATIONAL MULTI-POINT BME AND BME CONFIDENCE SETS

COMPUTATIONAL MULTI-POINT BME AND BME CONFIDENCE SETS VII. COMPUTATIONAL MULTI-POINT BME AND BME CONFIDENCE SETS Until now we have considered the estimation of the value of a S/TRF at one point at the time, independently of the estimated values at other estimation

More information

Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses

Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses Ann Inst Stat Math (2009) 61:773 787 DOI 10.1007/s10463-008-0172-6 Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses Taisuke Otsu Received: 1 June 2007 / Revised:

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

STAD57 Time Series Analysis. Lecture 23

STAD57 Time Series Analysis. Lecture 23 STAD57 Time Series Analysis Lecture 23 1 Spectral Representation Spectral representation of stationary {X t } is: 12 i2t Xt e du 12 1/2 1/2 for U( ) a stochastic process with independent increments du(ω)=

More information

Stochastic Process II Dr.-Ing. Sudchai Boonto

Stochastic Process II Dr.-Ing. Sudchai Boonto Dr-Ing Sudchai Boonto Department of Control System and Instrumentation Engineering King Mongkuts Unniversity of Technology Thonburi Thailand Random process Consider a random experiment specified by the

More information

Forecasting Wind Ramps

Forecasting Wind Ramps Forecasting Wind Ramps Erin Summers and Anand Subramanian Jan 5, 20 Introduction The recent increase in the number of wind power producers has necessitated changes in the methods power system operators

More information

Wavelet Methods for Time Series Analysis. Part IX: Wavelet-Based Bootstrapping

Wavelet Methods for Time Series Analysis. Part IX: Wavelet-Based Bootstrapping Wavelet Methods for Time Series Analysis Part IX: Wavelet-Based Bootstrapping start with some background on bootstrapping and its rationale describe adjustments to the bootstrap that allow it to work with

More information

More Empirical Process Theory

More Empirical Process Theory More Empirical Process heory 4.384 ime Series Analysis, Fall 2008 Recitation by Paul Schrimpf Supplementary to lectures given by Anna Mikusheva October 24, 2008 Recitation 8 More Empirical Process heory

More information

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY Time Series Analysis James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY PREFACE xiii 1 Difference Equations 1.1. First-Order Difference Equations 1 1.2. pth-order Difference Equations 7

More information

SZEGÖ ASYMPTOTICS OF EXTREMAL POLYNOMIALS ON THE SEGMENT [ 1, +1]: THE CASE OF A MEASURE WITH FINITE DISCRETE PART

SZEGÖ ASYMPTOTICS OF EXTREMAL POLYNOMIALS ON THE SEGMENT [ 1, +1]: THE CASE OF A MEASURE WITH FINITE DISCRETE PART Georgian Mathematical Journal Volume 4 (27), Number 4, 673 68 SZEGÖ ASYMPOICS OF EXREMAL POLYNOMIALS ON HE SEGMEN [, +]: HE CASE OF A MEASURE WIH FINIE DISCREE PAR RABAH KHALDI Abstract. he strong asymptotics

More information

In this chapter, we provide an introduction to covariate shift adaptation toward machine learning in a non-stationary environment.

In this chapter, we provide an introduction to covariate shift adaptation toward machine learning in a non-stationary environment. 1 Introduction and Problem Formulation In this chapter, we provide an introduction to covariate shift adaptation toward machine learning in a non-stationary environment. 1.1 Machine Learning under Covariate

More information

Multiresolution Models of Time Series

Multiresolution Models of Time Series Multiresolution Models of Time Series Andrea Tamoni (Bocconi University ) 2011 Tamoni Multiresolution Models of Time Series 1/ 16 General Framework Time-scale decomposition General Framework Begin with

More information

CALCULATION METHOD FOR NONLINEAR DYNAMIC LEAST-ABSOLUTE DEVIATIONS ESTIMATOR

CALCULATION METHOD FOR NONLINEAR DYNAMIC LEAST-ABSOLUTE DEVIATIONS ESTIMATOR J. Japan Statist. Soc. Vol. 3 No. 200 39 5 CALCULAION MEHOD FOR NONLINEAR DYNAMIC LEAS-ABSOLUE DEVIAIONS ESIMAOR Kohtaro Hitomi * and Masato Kagihara ** In a nonlinear dynamic model, the consistency and

More information

Bayesian Regularization

Bayesian Regularization Bayesian Regularization Aad van der Vaart Vrije Universiteit Amsterdam International Congress of Mathematicians Hyderabad, August 2010 Contents Introduction Abstract result Gaussian process priors Co-authors

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Spectral Analysis. Jesús Fernández-Villaverde University of Pennsylvania

Spectral Analysis. Jesús Fernández-Villaverde University of Pennsylvania Spectral Analysis Jesús Fernández-Villaverde University of Pennsylvania 1 Why Spectral Analysis? We want to develop a theory to obtain the business cycle properties of the data. Burns and Mitchell (1946).

More information

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory Part V 7 Introduction: What are measures and why measurable sets Lebesgue Integration Theory Definition 7. (Preliminary). A measure on a set is a function :2 [ ] such that. () = 2. If { } = is a finite

More information

REAL AND COMPLEX ANALYSIS

REAL AND COMPLEX ANALYSIS REAL AND COMPLE ANALYSIS Third Edition Walter Rudin Professor of Mathematics University of Wisconsin, Madison Version 1.1 No rights reserved. Any part of this work can be reproduced or transmitted in any

More information

A Note on the comparison of Nearest Neighbor Gaussian Process (NNGP) based models

A Note on the comparison of Nearest Neighbor Gaussian Process (NNGP) based models A Note on the comparison of Nearest Neighbor Gaussian Process (NNGP) based models arxiv:1811.03735v1 [math.st] 9 Nov 2018 Lu Zhang UCLA Department of Biostatistics Lu.Zhang@ucla.edu Sudipto Banerjee UCLA

More information

ADAPTIVE SPECTRAL ANALYSIS OF REPLICATED NONSTATIONARY TIME SERIES

ADAPTIVE SPECTRAL ANALYSIS OF REPLICATED NONSTATIONARY TIME SERIES ADAPTIVE SPECTRAL ANALYSIS OF REPLICATED NONSTATIONARY TIME SERIES ROBERT KRAFTY Department of Biostatistics University of Pittsburgh Joint work with: Scott Bruce Department of Statistical Science, Temple

More information

Stochastic Processes. M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno

Stochastic Processes. M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno Stochastic Processes M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno 1 Outline Stochastic (random) processes. Autocorrelation. Crosscorrelation. Spectral density function.

More information

Introduction to Probability

Introduction to Probability LECTURE NOTES Course 6.041-6.431 M.I.T. FALL 2000 Introduction to Probability Dimitri P. Bertsekas and John N. Tsitsiklis Professors of Electrical Engineering and Computer Science Massachusetts Institute

More information

Lecture 9. d N(0, 1). Now we fix n and think of a SRW on [0,1]. We take the k th step at time k n. and our increments are ± 1

Lecture 9. d N(0, 1). Now we fix n and think of a SRW on [0,1]. We take the k th step at time k n. and our increments are ± 1 Random Walks and Brownian Motion Tel Aviv University Spring 011 Lecture date: May 0, 011 Lecture 9 Instructor: Ron Peled Scribe: Jonathan Hermon In today s lecture we present the Brownian motion (BM).

More information

1. Fundamental concepts

1. Fundamental concepts . Fundamental concepts A time series is a sequence of data points, measured typically at successive times spaced at uniform intervals. Time series are used in such fields as statistics, signal processing

More information

Detection of structural breaks in multivariate time series

Detection of structural breaks in multivariate time series Detection of structural breaks in multivariate time series Holger Dette, Ruhr-Universität Bochum Philip Preuß, Ruhr-Universität Bochum Ruprecht Puchstein, Ruhr-Universität Bochum January 14, 2014 Outline

More information

Qualifying Exam in Machine Learning

Qualifying Exam in Machine Learning Qualifying Exam in Machine Learning October 20, 2009 Instructions: Answer two out of the three questions in Part 1. In addition, answer two out of three questions in two additional parts (choose two parts

More information

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January

More information

Residual Bootstrap for estimation in autoregressive processes

Residual Bootstrap for estimation in autoregressive processes Chapter 7 Residual Bootstrap for estimation in autoregressive processes In Chapter 6 we consider the asymptotic sampling properties of the several estimators including the least squares estimator of the

More information

Independent Component Analysis and Unsupervised Learning

Independent Component Analysis and Unsupervised Learning Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent

More information

PROPERTIES OF THE EMPIRICAL CHARACTERISTIC FUNCTION AND ITS APPLICATION TO TESTING FOR INDEPENDENCE. Noboru Murata

PROPERTIES OF THE EMPIRICAL CHARACTERISTIC FUNCTION AND ITS APPLICATION TO TESTING FOR INDEPENDENCE. Noboru Murata ' / PROPERTIES OF THE EMPIRICAL CHARACTERISTIC FUNCTION AND ITS APPLICATION TO TESTING FOR INDEPENDENCE Noboru Murata Waseda University Department of Electrical Electronics and Computer Engineering 3--

More information