A Fast Agorithm for Computing the Fourier Spectrum of a Fractiona Period arxiv:1604.01589v3 [math.sp] 2 Nov 2017 Jiasong Wang 1, Changchuan Yin 2, 1.Department of Mathematics, Nanjing University, Nanjing, Jiangsu 210093, China 2.Department of Mathematics, Statistics and Computer Science The University of Iinois at Chicago, Chicago, IL 60607-7045, USA Corresponding author, Emai: cyin1@uic.edu Abstract The Fourier spectrum at a fractiona period is often examined when extracting features from bioogica sequences and time series. It refects the inner information structure of the sequences. A fractiona period is not uncommon in time series. A typica exampe is the 3.6 period in protein sequences, which determines the α-heix secondary structure. Computing the spectrum of a fractiona period offers a high-resoution insight into a time series. It has thus become an important approach in genomic anaysis. However, computing Fourier spectra of fractiona periods by the traditiona Fourier transform is computationay expensive. In this paper, we present a nove, fast agorithm for directy computing the fractiona period spectrum (FPS) of time series. The agorithm is based on the periodic distribution of signa strength at periodic positions of the time series. We provide theoretica anaysis, deduction, and specia techniques for reducing the computationa costs of the agorithm. The anaysis of the computationa compexity of the agorithm shows that the agorithm is much faster than traditiona Fourier transform. Our agorithm can be appied directy in computing fractiona periods in time series from a broad of research fieds. The computer programs of the FPS agorithm are avaiabe at https://github.com/cyinbox/fps. Keywords: Fourier transform, periodicity, fractiona period spectrum, DNA sequences, protein sequences Preprint submitted to arxiv.org November 3, 2017
1. Introduction Signa processing approaches have been widey appied in the periodicity anaysis of time series, and thus becoming a major research too for bioinformatics (Anastassiou, 2001; Chen et a., 2003). The main requisite in appying signa processing to symboic sequences is mapping the sequences onto numerica time series. For exampe, we previousy presented a numerica representation of DNA sequences without degeneracy (Yau et a., 2003). After numerica mapping, mathematica methods can be empoyed to study features, structures, and functions of the symboic sequences. The most common signa processing approach is Fourier transform (Wech, 1967), which has been widey used to study periodicity and repetitive regions in symboic DNA sequences (Siverman and Linsker, 1986; Anastassiou, 2001), as we as in genome comparison (Yin et a., 2014a; Yin and Yau, 2015). We have studied the mathematica properties of the Fourier spectrum for symboic sequences (Wang et a., 2012, 2014). For a review of mathematica methods for the study of bioogica sequences, one may refer to our book (Wang and Yan, 2013). Periods in bioogica sequences can be categorized into two types: integer periods and fractiona periods. For integer periods, the 3-base periodicity of protein-coding regions is often used in gene finding (Tiwari et a., 1997; Yin and Yau, 2005, 2007; Yin, 2015); 2-base periodicity was found in introns of genomes (Arquès and Miche, 1987). 2-period exists in protein sequence regions for beta-sheet structures. As indicated by their name, fractiona periods do not correspond to an integer number of cyces in numerica series. Fractiona periods are prevaent and important in structures and functions of protein sequences and genomes. For exampe, the 3.6-period in protein sequences determines the α-heix secondary structure (Eisenberg et a., 1984; Gruber et a., 2005; Leonov and Arkin, 2005; Yin and Yau, 2017). Strong 10.4- or 10.5-base periodicity in genomes are associated with nuceosomes (Trifonov, 1998; Saih and Trifonov, 2015). The 6.5-base periodicity is present in C. eegans introns (Messaoudi et a., 2013). Fractiona period spectrum may offer high resoution and precise features of sequences. However, identifying fractiona period features in a arge genome is chaenging. The demand of the periodicity research of DNA and protein sequences motivates us to investigate advanced techniques for fractiona periodicity anaysis of time series, which correspond to bioogica symboic sequences. With the quicky growing voume of genomic sequences of human and 2
mode organisms, there is a strong demand to characterize not ony integer periods but aso fractiona periods in genomic and protein sequences. We previousy proposed a method, periodic power spectrum (PPS), which can directy compute Fourier power spectrum based on periodic distributions of signa strength on periodic positions (Wang et a., 2012; Yin and Wang, 2016). The main advantage the PPS method is that it can avoid spectra eakage and reduce background noise, which both appear in power spectrum in Fourier transform. Thus, the PPS method can capture a atent integer periodicities in DNA sequences. We have appied this method in detection of atent periodicities in different genome eements, incuding exons and microsateite DNA sequences. This paper extends the PPS method to suggest a nove method for fast and directy cacuating Fourier spectrum of a fractiona period. This method wi be aso hepfu in directy computing the fractiona period spectrum for any time series from science and technoogy fieds outside of bioinformatics. 2. Methods Deduced 2.1. Fourier transform and congruence derivative sequence For a numerica sequence x of ength m, consisting of x 0,x 1,,x m 1, its discrete Fourier transform (DFT) at frequency k is defined as (Wech, 1967) X(k) = m 1 x j e i2πkj/m,i = 1, k = 0,1,...,m 1 (2.1) And its Fourier power spectrum at frequency k is defined as (Wech, 1967) PS(k) = X (k)x(k) m 1 m 1 = ( x j e i2πjk/m ) ( x j e i2πjk/m ) i = 1,k = 0,1,...,m 1 (2.2) where indicates the compex conjugate. Because our method is appied to researching the periodicity of sequence by using Fourier spectrum, for convenience, the frequency k/m is caed period m/k. In this paper, we directy define the DFT spectrum by period instead of frequency. 3
We want to compute Fourier spectrum at period /k, > k and,k are smaer than m. Using traditiona formua of Fourier transform, we need to cacuate Fourier transform X(k) = m 1 Then, its spectrum at period /k or at frequency k is m 1 PS(k) = ( x j e i2πjk x j e i2πjk (2.3) m 1 ) ( x j e i2πjk ) (2.4) We wi introduce a fast method for directy computing the Fourier spectrum at period /k. We previousy demonstrate that the Fourier power spectrum of a numerica sequence is determined by the distribution of signa strength at periodic positions in this sequence (Wang et a., 2012; Yin and Wang, 2016). The distribution can be represented and measured by the congruence derivative sequence of the origina sequence, which is defined as foows. Definition 2.1. For a rea number sequence x of ength m, if two positive integers n and satisfy m = n, then the congruence derivative sequence, y, of x has a ength of and its eement is defined as foows n 1 y t = x j+t,t = 0,1,... 1 (2.5) The foowing deduced theorems indicate that the congruence derivative sequence of ength of the numerica sequence x can be used to compute periodic power spectrum at period. The Fourier transform of the congruence derivative sequence y is 1 Y(k) = y t e i2πtk/,k = 0,1,... 1 (2.6) The reationship between Fourier transform of the origina sequence and its congruence derivative sequence was introduced in (Wang et a., 2012; Yin and Wang, 2016). The DFT of sequence x at frequency n = m/ is 4
the same as the DFT of the congruence derivative sequence y at frequency. X(n) = X(m/) = Y(1) (2.7) This formua is the mathematica ground of the PPS method. The foowing theorem extends the concusion to a genera case. Theorem 2.1. For a rea number sequence x of ength m, suppose n = m/, then the DFT of the numerica sequence x at frequency kn is X(kn) = Y(k),0 k 1 (2.8) Proof. We know 1 1 n 1 1 n 1 Y(k) = y t e i2πtk/ = x j+t e i2πtk/ = x j+t e i2π(j+t)k/ = = X(kn) m 1 r=0 x r e i2πrkn/m The proof of Theorem 2.1 indicates that the property of conjugate symmetry in DFT of a rea sequence is preserved. We thus have the foowing coroary. Coroary 2.1. X(kn) = X (( k)n),1 k 1 (2.9) According to formua 2.2, Fourier power spectrum of sequence x at frequency kn is defined as PS(kn) = X (kn)x(kn) = X (km/)x(km/) = Y (k)y(k). (2.10) Definition 2.2. If m = n, the DFT spectrum of numerica sequence x at fractiona period /k is named the fractiona period spectrum (FPS) and is written as FPS(/k) = Y (k)y(k),1 k 1 Definition 2.3. For a rea number sequence of ength, y t,t = 0,1,..., 1, its sef q-shift summation is defined by 1 z,q = y t y t+q (2.11) 5
with t+q taken moduo, 0 q <. If we write sequencey t,t = 0,1,..., 1, as a vector, y T = [y 0,y 1,,y 1 ], then z,q can be reaized by the autocorreation of vector y. For a specia case, z,0 is equa to the inner product of vector y, or z,0 = y T y. From the definition of DFT of the congruence derivative sequence (formua 2.6), we have as 1 1 1 Y(k) = y t e i2πtk/ = y t cos(2πtk/) i y t sin(2πtk/) (2.12) From the definition of FPS, FPS(/k) = Y (k)y(k), we notice that 1 1 FPS(/k) = ( y t cos(2πtk/)) 2 +( y t sin(2πtk/)) 2 (2.13) which is rea quadratic form of variabes, y 0,y 1,...,y 1, can be written FPS(/k) = y t Ay (2.14) where the coefficient matrix A = a rs is of quadratic form, in which a rs = cos((r ) 2kπ )cos((s ) 2kπ )+sin((r ) 2kπ )sin((s ) 2kπ ) r,s = 1,2,..., (2.15) After reorganizing formua (2.15), we obtain the foowing theorem about the coefficient matrix A. Theorem 2.2. The coefficient matrix of the quadratic form, A, is a symmetric matrix. Proof. According to some fundamenta identities of trigonometry, a rs = cos((r ) 2kπ )cos((s ) 2kπ )+sin((r ) 2kπ )sin((s ) 2kπ ) woud be equa to cos((r (s )) 2kπ ) = cos((r s) 2kπ ), i.e., a rs = cos((r s) 2kπ ). It is cear that a rs = a sr, so the coefficient matrix A is symmetric. 6
Theorem 2.3. For a rea number sequence x of ength m, if m = n and its congruence derivative sequence is y t,t = 0,1,, 1, then FPS(/k) of the sequence x can be expressed as foows. If is an odd number, If is an even number, FPS(/k) = z,0 +2 2 1 FPS(/k) = z,0 +2 q=1 2 1 q=1 z,q cos((q) 2kπ z,q cos((q) 2kπ ). )+2cos(( 2 )2kπ 2 1 ) y t y t+ 2 Proof. It is cear the diagona eements of the coefficient matrix A are unit eements. Due to the symmetry of matrix A, we need to check the ower trianguar matrix of A under the diagona of A, which is sufficient for proof. The eement of the ower trianguar matrix ofa, a rs = cos((r s) 2kπ ),r = 2,3,...,,r > s, aso has some kind of symmetry. The resut is as foows. Whenis an odd number, a 21 = a 32 = a 43 =... = a ( 1) = cos((1) 2kπ ) and a 21,a 32,...,a ( 1) aso are equa to a 1. Because a rs = a t r s t,r > s,0 t < r s, simiary, a 31 = a 42 =... = a ( 2) = a ( 1)1 = a 2 = cos((2) 2kπ )..., then a (+1)/21 = a (+3)/22 =...,= a (+1)/2 = a (+3)/21 =...,a ( 1)/2 = cos((( 1)/2) 2kπ ). When is an even number, we have a 21 = a 32 = a 43 =... = a ( 1) = a 1 = cos((1) 2kπ ) and a 31 = a 42 =... = a ( 2) = a ( 1)1 = a 2 = cos((2) 2kπ ). In genera, a /21 = a (/2+1)2 =...,= a (/2+1) = a (/2+2)1 = a (/2+3)2 =...,= a (/2 1) = cos((/2 1) 2kπ ), and a (/2+1)1 = a (/2+2)2 =...,= a /2 = cos((/2) 2kπ ). We then substitute those resuts into the coefficient matrix of the quadratic form A for the FPS(/k). The foowing formuas can be obtained. 7
If is an odd number, FPS(/k) = y 2 0 +y 2 1 +y 2 2 + +y 2 1 +2cos((1) 2kπ )(y 0 y 1 +y 1 y 2 +y 2 y 3 +y 3 y 4 y 1 y 0 ) +2cos((2) 2kπ )(y 0 y 2 +y 1 y 3 +y 2 y 4 ++y 3 y 5 y 2 y 0 +y 1 y 1 ) +2cos((3) 2kπ )(y 0 y 3 +y 1 y 4 +y 2 y 5 ++y 3 y 6 y 3 y 0 +y 2 y 1 +y 1 y 2 ) +2cos((( 1)/2) 2kπ )(y 0 y ( 1)/2 +y 1 y (+1)/2 +y 2 y (+3)/2 + y ( ( 1)/2) y 0 + +y 1 y ( 3)/2 ). (2.16) If is an even number, FPS(/k) = y 2 0 +y2 1 +y2 2 + +y2 1 +2cos((1) 2kπ )(y 0 y 1 +y 1 y 2 +y 2 y 3 +y 3 y 4 y 1 y 0 ) +2cos((2) 2kπ )(y 0 y 2 +y 1 y 3 +y 2 y 4 ++y 3 y 5 y 2 y 0 +y 1 y 1 ) +2cos((3) 2kπ )(y 0 y 3 +y 1 y 4 +y 2 y 5 ++y 3 y 6 y 3 y 0 +y 2 y 1 +y 1 y 2 ) +2cos((/2 1) 2kπ )(y 0 y /2 1 +y 1 y /2 +y 2 y /2+1 + y 3 y /2 2 +y 2 y /2 1 +y 1 y /2 ) +2cos((/2) 2kπ )(y 0 y /2 +y 1 y /2+1 +y 2 y /2+2 + y /2 1 y 1 ). (2.17) The ast term in equation (2.17), cos((/2) 2kπ )(y 0 y /2 +y 1 y /2+1 +y 2 y /2+2 + +y /2 1 y 1 ) can be written as cos((/2) 2kπ ) /2 1 y t y t+/2. Noticing the definition of z,q, the formuas (2.16) and (2.17) may be written as foows. If is an odd number, FPS(/k) = z,0 +2 8 2 1 q=1 z,q cos((q) 2kπ ) (2.18)
If is an even number, 2 1 FPS(/k) = z,0 +2 q=1 z,q cos((q) 2kπ )+2cos(( 2 )2kπ 2 1 ) y t y t+ 2 (2.19) Based on Theorem 2.3 and Coroary 2.1, FPS(/k) may be re-written by foowing theorem, which may improve the technique for computing F P S(/k). Theorem 2.4. FPS(/k) = FPS(/( k),)1 k 1 (2.20) According to Theorem 2.4, to compute a of the FPS(/k), 1 k 1, we ony require about haf of computing costs. For k = 1, noticing formuas (2.18) and (2.19) and the symmetry of FPS(/k), we need to cacuate( 1)/2 coefficients of quadraticsy 0,y 1,...,y 1 when is an odd number; and /2 coefficients when is an even number. It is vauabe to notice on the property of a cosine function, cos((q) 2kπ ) = cos((q) 2( k)π ),0 k 1. (2.21) Certainy, it is the foundation of Theorem 2.4 and an important formua to prove the foowing theorem. Theorem 2.5. The coefficient matrix of quadratic form A when k = 1 is sufficient for computing F P S(/k) when k 1. Proof. Without oss of generaity, we show the case when is an odd number. For a fixed k, k 1, from Theorem 2.4, the range of k shoud be in 2 k ( 1)/2 and the ( 1)/2 coefficients of quadratics of y 0,y 1,,y 1 are cos((t) 2kπ ),t = 1,2,..., 1 2 For k = 1, we obtain the coefficients cos((q) 2π 1 ),q = 1,2,...,. 2 If the ( 1)/2 coefficients, cos((t) 2kπ ),t = 1,2,,( 1)/2 can be represented by cos((q) 2π ),q = 1,2,,( 1)/2, then, the proposition is proved. It is known that the interva of k is [2,( 1)/2], so, for t = 1,cos((k) 2π ), the first coefficient of FPS(/k), is one of cos((q) 2π ),q = 1,2,...,( 1)/2. 9
When t 1, we may have one of two resuts: if tk in the interva [1,( 1)/2], it is obvious that cos((t) 2kπ ) is one of cos((q) 2π ),q = 1,2,...,( 1)/2; otherwise, if tk > ( 1)/2, based on formua (2.21) since tk beongs to [1,( 1)/2]. 3. Fast Agorithm 3.1. Fast agorithm for computing fractiona period spectrum at period /k Based on Theorem 2.5, the agorithm shoud do a buid-in data in the computer. The data contains the coefficients of quadratics, when k = 1 and for periods, = 2,3,..., to store in the computer. The buit-in data save much computing time because the agorithm does not need to repeatedy find the coefficients for next user. The programs sha design two functions. One function is to obtain the congruence derivative sequence (CDS) (y). The other function Z(, q) is to yied the autocorreation of the congruence derivative sequence with two parameters and q. A MATLAB functionrem(p,) may be used forpandare integer number if the programming is written by MATLAB, where the rem(p,) means the remainder after p divided by. Given input origina sequence x, the function CDS (y) yieds the congruence derivative sequence of the sequence x for a period in the computation. Suppose we need to cacuate the fractiona period /k, we then have the foowing agorithm. If is odd number, compete the oop for ordering to take out ( 1)/2 coefficients of quadratics, cos((1) 2π ),cos((2)2π),,cos((( 1)/2)2π ), from the buid-in data. From t = 1 to ( 1)/2, do if r = rem(tk,) > ( 1)/2 then takecos(( r) 2π) ese takecos((r)2π). At the same time, the2cos(()2π) vaue times the Z(,t) and to do summation corresponding to t order unti t = ( 1)/2. The computed summation added with Z(,0) is the FPS(/k). If is even number, doing the oop for ordering to take out /2 coefficients of quadratics, cos((1) 2π ),cos((2)2π),,cos((/2)2π )), from the buid-in data, i.e., t = 1 to ( 1)/2 do if r = rem(tk,) > ()/2 then take cos( r) 2π ese take cos((r) 2π ). At the same time, the 2cos(()2π ) vaue times the Z(,t) to do summation corresponding to t order unti t = (/2 1). FPS(/k) wi be the computed summation added with two quantities, 2cos((/2) 2kπ ) /2 1 timing y t y t+/2, and Z(,0). 10
To assess the effectiveness of proposed FPS agorithm in capturing fractiona periodicities in digita signas, we compute the FPS spectrum of the simuated sinusoida signa. The sinusoida signa of ength N = 300, consists of sine and cosine signas with fractiona periodicities 3.7 and 5.6, respectivey. The signa is corrupted by white Gaussian noise (Equation (3.1)). x(n) = sin(2π n + π)+cos(2π n + 3π)+noise 3.7 4 5.6 4 n = 1,2,,300 (3.1) Since the periodic signa is buried by the white Gaussian noise, two periodicities 3.7 and 5.6 in the origina signa are hidden by noise (Fig.1 (a)). The proposed FPS spectrum of the signa can ceary discover two pronounced peaks at positions 3.7 and 5.6 (Fig.1 (b)), corresponding to two periodicities 3.7 and 5.6 in the origina signa. The power spectrum of this simuation sequence by the proposed method agrees with the DFT (Fig.1 (b) and (c)). This resut demonstrates the effectiveness of the FPS agorithm. 3.2. Computationa compexity of the FPS agorithm Recaing the process for computing FPS(/k) of a time series by using the new agorithm, the first step is to yied the congruence derivative sequence y j, j = 0,1,..., 1, from the origina sequence x of ength m. This computing process ony needs pus operation and do not use mutipication. The second step in the agorithm is to compute z,q when is an even or odd number, the agorithm needs approximate /2( + 1) mutipications. Finay, using formuas (2.18, 2.19) needs /2 mutipications to determine the vaue of z,q cos((q)2kπ/). So the tota computationa quantity for the FPS agorithm invoves /2( + 1) + /2 = 2 /2 + mutipications, i.e., the computationa compexity is then O( 2 +). Using the traditiona Fourier transform to compute F P S(/k), we need to cacuate the Fourier transform at period /k by the formuas 2.3 and 2.4. From (2.3), we must cacuate m exponents, i2πjk/,j = 0,1,2,...,m 1, and ca exp function m times to figure out e i2πjk/,j = 0,1,2,...,m 1, if using MATLAB. It is known that to compute function e i2πjk/ costs much time. Then to figure out m 1 e i2πjk/ needs 2m mutipications and the time for caing m exp functions. So, according to formua (2.4) cacuating out the Fourier spectrum at period /k needs 2m + 1 mutipications, i.e., O(2m+ 1). If F P S(/k) is computed for whoe fractiona periods, traditiona Fourier transform needs /2O(2m + 1) operations, i.e. O(m + ). 11
4 3 2 1 Signa 0 1 2 3 4 0 50 100 150 200 250 300 Time (second) 3 x 104 (a) 2.5 Power spectrum 2 1.5 1 0.5 0 1 2 3 4 5 6 7 8 9 10 Periodicity 3 x 104 (b) 2.5 Power spectrum 2 1.5 1 0.5 0 1 2 3 4 5 6 7 8 9 10 Periodicity (c) 12 Figure 1: Fractiona period spectrum anaysis of sinusoida signa coapsed with white noise. (a) origina sinusoida signa. (b) Fractiona period spectrum of the sinusoida signa by the FPS agorithm. (c) Fractiona period spectrum of the sinusoida signa by DFT.
In summary, the anaysis of computationa compexity indicates that the FPS agorithm is much faster than traditiona Fourier transform for computing fractiona periodic spectrum because of O( 2 +) << O(m+). 4. Concusions A fast agorithm for computing the FPS of bioogica sequences is introduced in this paper. First, the mathematica deduction and proof are presented. Then, based on the mathematica theory, the technique of buitin data is suggested and some functions are introduced, so that a simpe program can be designed. The anaysis of computationa compicity shows that this fast agorithm is much faster than traditiona Fourier transform. The fast agorithm wi be aso hepfu in computing the FPS for any time series from the science and technoogy fied other than bioinformatics. In addition, the congruence derivative sequence of origina sequences can be used to yied Ramanujan transform easiy (Hua et a., 2014; Yin et a., 2014b), which do not have any computing cost. The computation for identifying periods of the bioogica sequence or other time series woud be done by Fourier transform and Ramanujan transform simutaneousy. 13
References Anastassiou, D., 2001. Genomic signa processing. Signa Processing Magazine, IEEE 18 (4), 8 20. Arquès, D. G., Miche, C. J., 1987. Periodicities in introns. Nuceic Acids Research 15 (18), 7581 7592. Chen, J., Li, H., Sun, K., Kim, B., 2003. How wi bioinformatics impact signa processing research? Signa Processing Magazine, IEEE 20 (6), 106 206. Eisenberg, D., Weiss, R., Terwiiger, T., 1984. The hydrophobic moment detects periodicity in protein hydrophobicity. Proc. Nat. Acad. Sci. 81, 140 144. Gruber, M., Söding, J., Lupas, A. N., 2005. Repper repeats and their periodicities in fibrous proteins. Nuceic Acids Research 33 (supp 2), W239 W243. Hua, W., Wang, J., Zhao, J., 2014. Discrete ramanujan transform for distinguishing the protein coding regions from other regions. Moecuar and ceuar probes 28 (5), 228 236. Leonov, H., Arkin, I. T., 2005. A periodicity anaysis of transmembrane heices. Bioinformatics 21 (11), 2604 2610. Messaoudi, I., Ouesati, A. E., Lachiri, Z., 2013. Detection of the 6.5-base periodicity in the c. eegans introns based on the frequency chaos game signa and the compex moret waveet anaysis. Internationa Journa of Scientific Engineering and Technoogy 2 (12), 1247 1251. Saih, B., Trifonov, E. N., 2015. Strong nuceosomes reside in meiotic centromeres of c. eegans. Journa of Biomoecuar Structure and Dynamics 33 (2), 365 373. Siverman, B., Linsker, R., 1986. A measure of DNA periodicity. Journa of theoretica bioogy 118 (3), 295 300. Tiwari, S., Ramachandran, S., Bhattacharya, A., Bhattacharya, S., Ramaswamy, R., 1997. Prediction of probabe genes by Fourier anaysis of genomic sequences. Bioinformatics 13 (3), 263 270. 14
Trifonov, E. N., 1998. 3-, 10.5-, 200-and 400-base periodicities in genome sequences. Physica A: Statistica Mechanics and its Appications 249 (1), 511 516. Wang, J., Dang, C., Yin, C., 2014. A comparative study of the signa-tonoise ratios of different representations for symboic sequences. Journa of Appied Mathematics and Bioinformatics 372, 135 145. Wang, J., Liu, G., Zhao, J., 2012. Some features of Fourier spectrum for symboic sequences. Numerica Mathematics A Journa of Chinese Universities (4), 341 356. Wang, J., Yan, M., 2013. Numerica Methods in Bioinformatics: An Introduction. Science Press. Wech, P., 1967. The use of fast fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Transactions on Audio and Eectroacoustics 15 (2), 70 73. Yau, S. S.-T., Wang, J., Niknejad, A., Lu, C., Jin, N., Ho, Y.-K., 2003. DNA sequence representation without degeneracy. Nuceic Acids Research 31 (12), 3078 3080. Yin, C., 2015. Representation of DNA sequences in genetic codon context with appications in exon and intron prediction. Journa of Bioinformatics and Computationa Bioogy 13 (2), 1550004 2. Yin, C., Chen, Y., Yau, S. S.-T., 2014a. A measure of DNA sequence simiarity by Fourier transform with appications on hierarchica custering. Journa of Theoretica Bioogy 359, 18 28. Yin, C., Wang, J., 2016. Periodic power spectrum with appications in detection of atent periodicities in DNA sequences. Journa of Mathematica Bioogy 73 (5), 1053 1079. Yin, C., Yau, S. S.-T., 2005. A Fourier characteristic of coding sequences: origins and a non-fourier approximation. Journa of Computationa Bioogy 12 (9), 1153 1165. 15
Yin, C., Yau, S. S.-T., 2007. Prediction of protein coding regions by the 3-base periodicity anaysis of a DNA sequence. Journa of Theoretica Bioogy 247 (4), 687 694. Yin, C., Yau, S. S.-T., 2015. An improved mode for whoe genome phyogenetic anaysis by Fourier transform. Journa of Theoretica Bioogy 359 (21), 18 28. Yin, C., Yau, S. S.-T., 2017. A coevoution anaysis for identifying proteinprotein interactions by Fourier transform. PoS ONE 12 (4), e0174862. Yin, C., Yin, X. E., Wang, J., 2014b. A nove method for comparative anaysis of DNA sequences by Ramanujan-Fourier transform. Journa of Computationa Bioogy 21 (12), 867 879. 16