arxiv: v3 [math.sp] 2 Nov 2017

Similar documents
A Simple and Efficient Algorithm of 3-D Single-Source Localization with Uniform Cross Array Bing Xue 1 2 a) * Guangyou Fang 1 2 b and Yicai Ji 1 2 c)

Partial permutation decoding for MacDonald codes

(f) is called a nearly holomorphic modular form of weight k + 2r as in [5].

2M2. Fourier Series Prof Bill Lionheart

Investigation on spectrum of the adjacency matrix and Laplacian matrix of graph G l

BP neural network-based sports performance prediction model applied research

A proposed nonparametric mixture density estimation using B-spline functions

Math 124B January 17, 2012

FOURIER SERIES ON ANY INTERVAL

DISTRIBUTION OF TEMPERATURE IN A SPATIALLY ONE- DIMENSIONAL OBJECT AS A RESULT OF THE ACTIVE POINT SOURCE

CS229 Lecture notes. Andrew Ng

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete

An Algorithm for Pruning Redundant Modules in Min-Max Modular Network

Analysis of Emerson s Multiple Model Interpolation Estimation Algorithms: The MIMO Case

Discrete Techniques. Chapter Introduction

Research Article On the Lower Bound for the Number of Real Roots of a Random Algebraic Equation

A Brief Introduction to Markov Chains and Hidden Markov Models

CONSTRUCTION AND APPLICATION BASED ON COMPRESSING DEPICTION IN PROFILE HIDDEN MARKOV MODEL

Discrete Techniques. Chapter Introduction

Course 2BA1, Section 11: Periodic Functions and Fourier Series

Some Measures for Asymmetry of Distributions

T.C. Banwell, S. Galli. {bct, Telcordia Technologies, Inc., 445 South Street, Morristown, NJ 07960, USA

Cryptanalysis of PKP: A New Approach

A Robust Voice Activity Detection based on Noise Eigenspace Projection

STABILITY OF A PARAMETRICALLY EXCITED DAMPED INVERTED PENDULUM 1. INTRODUCTION

A Two-Parameter Trigonometric Series

(This is a sample cover image for this issue. The actual cover is not yet available at this time.)

Explicit overall risk minimization transductive bound

International Journal of Mass Spectrometry

Distributed average consensus: Beyond the realm of linearity

Math 124B January 31, 2012

Lecture Note 3: Stationary Iterative Methods

Fitting Algorithms for MMPP ATM Traffic Models

Gauss Law. 2. Gauss s Law: connects charge and field 3. Applications of Gauss s Law

14 Separation of Variables Method

Converting Z-number to Fuzzy Number using. Fuzzy Expected Value

SEA Subsystem Attribute Correction Based on Stiffness Multipliers

A NOTE ON QUASI-STATIONARY DISTRIBUTIONS OF BIRTH-DEATH PROCESSES AND THE SIS LOGISTIC EPIDEMIC

b n n=1 a n cos nx (3) n=1

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries

Week 6 Lectures, Math 6451, Tanveer

The influence of temperature of photovoltaic modules on performance of solar power plant

A Novel Learning Method for Elman Neural Network Using Local Search

Haar Decomposition and Reconstruction Algorithms

arxiv: v1 [math.co] 17 Dec 2018

A Solution to the 4-bit Parity Problem with a Single Quaternary Neuron

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel

THE THREE POINT STEINER PROBLEM ON THE FLAT TORUS: THE MINIMAL LUNE CASE

FRST Multivariate Statistics. Multivariate Discriminant Analysis (MDA)

8 Digifl'.11 Cth:uits and devices

On the evaluation of saving-consumption plans

Supplementary Material: An energy-speed-accuracy relation in complex networks for biological discrimination

MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES

A Novel Approach to Security Enhancement of Chaotic DSSS Systems

Minimizing Total Weighted Completion Time on Uniform Machines with Unbounded Batch

Soft Clustering on Graphs

Data Search Algorithms based on Quantum Walk

Improving the Reliability of a Series-Parallel System Using Modified Weibull Distribution

12.2. Maxima and Minima. Introduction. Prerequisites. Learning Outcomes

FREQUENCY modulated differential chaos shift key (FM-

Iterative Decoding Performance Bounds for LDPC Codes on Noisy Channels

Simplified analysis of EXAFS data and determination of bond lengths

4 Separation of Variables

arxiv: v1 [math.ca] 6 Mar 2017

Maximum likelihood decoding of trellis codes in fading channels with no receiver CSI is a polynomial-complexity problem

Unit 48: Structural Behaviour and Detailing for Construction. Deflection of Beams

Efficiently Generating Random Bits from Finite State Markov Chains

Numerical Simulation for Optimizing Temperature Gradients during Single Crystal Casting Process

An explicit Jordan Decomposition of Companion matrices

AALBORG UNIVERSITY. The distribution of communication cost for a mobile service scenario. Jesper Møller and Man Lung Yiu. R June 2009

Schedulability Analysis of Deferrable Scheduling Algorithms for Maintaining Real-Time Data Freshness

II. PROBLEM. A. Description. For the space of audio signals

Identification of macro and micro parameters in solidification model

Intuitionistic Fuzzy Optimization Technique for Nash Equilibrium Solution of Multi-objective Bi-Matrix Games

Research Article Numerical Range of Two Operators in Semi-Inner Product Spaces

AST 418/518 Instrumentation and Statistics

Componentwise Determination of the Interval Hull Solution for Linear Interval Parameter Systems

Tracking Control of Multiple Mobile Robots

Approach to Identifying Raindrop Vibration Signal Detected by Optical Fiber

Scalable Spectrum Allocation for Large Networks Based on Sparse Optimization

Unconditional security of differential phase shift quantum key distribution

C. Fourier Sine Series Overview

Research of Data Fusion Method of Multi-Sensor Based on Correlation Coefficient of Confidence Distance

Statistical Learning Theory: A Primer

HYDROGEN ATOM SELECTION RULES TRANSITION RATES

Available online at ScienceDirect. Procedia Computer Science 96 (2016 )

On prime divisors of remarkable sequences

Formulas for Angular-Momentum Barrier Factors Version II

Approximated MLC shape matrix decomposition with interleaf collision constraint

Laboratory Exercise 1: Pendulum Acceleration Measurement and Prediction Laboratory Handout AME 20213: Fundamentals of Measurements and Data Analysis

Related Topics Maxwell s equations, electrical eddy field, magnetic field of coils, coil, magnetic flux, induced voltage

Paragraph Topic Classification

XSAT of linear CNF formulas

New Efficiency Results for Makespan Cost Sharing

Applied and Computational Harmonic Analysis

In-plane shear stiffness of bare steel deck through shell finite element models. G. Bian, B.W. Schafer. June 2017

STA 216 Project: Spline Approach to Discrete Survival Analysis

Maximizing Sum Rate and Minimizing MSE on Multiuser Downlink: Optimality, Fast Algorithms and Equivalence via Max-min SIR

Effect of transport ratio on source term in determination of surface emission coefficient

Efficient Algorithms for Pairing-Based Cryptosystems

Transcription:

A Fast Agorithm for Computing the Fourier Spectrum of a Fractiona Period arxiv:1604.01589v3 [math.sp] 2 Nov 2017 Jiasong Wang 1, Changchuan Yin 2, 1.Department of Mathematics, Nanjing University, Nanjing, Jiangsu 210093, China 2.Department of Mathematics, Statistics and Computer Science The University of Iinois at Chicago, Chicago, IL 60607-7045, USA Corresponding author, Emai: cyin1@uic.edu Abstract The Fourier spectrum at a fractiona period is often examined when extracting features from bioogica sequences and time series. It refects the inner information structure of the sequences. A fractiona period is not uncommon in time series. A typica exampe is the 3.6 period in protein sequences, which determines the α-heix secondary structure. Computing the spectrum of a fractiona period offers a high-resoution insight into a time series. It has thus become an important approach in genomic anaysis. However, computing Fourier spectra of fractiona periods by the traditiona Fourier transform is computationay expensive. In this paper, we present a nove, fast agorithm for directy computing the fractiona period spectrum (FPS) of time series. The agorithm is based on the periodic distribution of signa strength at periodic positions of the time series. We provide theoretica anaysis, deduction, and specia techniques for reducing the computationa costs of the agorithm. The anaysis of the computationa compexity of the agorithm shows that the agorithm is much faster than traditiona Fourier transform. Our agorithm can be appied directy in computing fractiona periods in time series from a broad of research fieds. The computer programs of the FPS agorithm are avaiabe at https://github.com/cyinbox/fps. Keywords: Fourier transform, periodicity, fractiona period spectrum, DNA sequences, protein sequences Preprint submitted to arxiv.org November 3, 2017

1. Introduction Signa processing approaches have been widey appied in the periodicity anaysis of time series, and thus becoming a major research too for bioinformatics (Anastassiou, 2001; Chen et a., 2003). The main requisite in appying signa processing to symboic sequences is mapping the sequences onto numerica time series. For exampe, we previousy presented a numerica representation of DNA sequences without degeneracy (Yau et a., 2003). After numerica mapping, mathematica methods can be empoyed to study features, structures, and functions of the symboic sequences. The most common signa processing approach is Fourier transform (Wech, 1967), which has been widey used to study periodicity and repetitive regions in symboic DNA sequences (Siverman and Linsker, 1986; Anastassiou, 2001), as we as in genome comparison (Yin et a., 2014a; Yin and Yau, 2015). We have studied the mathematica properties of the Fourier spectrum for symboic sequences (Wang et a., 2012, 2014). For a review of mathematica methods for the study of bioogica sequences, one may refer to our book (Wang and Yan, 2013). Periods in bioogica sequences can be categorized into two types: integer periods and fractiona periods. For integer periods, the 3-base periodicity of protein-coding regions is often used in gene finding (Tiwari et a., 1997; Yin and Yau, 2005, 2007; Yin, 2015); 2-base periodicity was found in introns of genomes (Arquès and Miche, 1987). 2-period exists in protein sequence regions for beta-sheet structures. As indicated by their name, fractiona periods do not correspond to an integer number of cyces in numerica series. Fractiona periods are prevaent and important in structures and functions of protein sequences and genomes. For exampe, the 3.6-period in protein sequences determines the α-heix secondary structure (Eisenberg et a., 1984; Gruber et a., 2005; Leonov and Arkin, 2005; Yin and Yau, 2017). Strong 10.4- or 10.5-base periodicity in genomes are associated with nuceosomes (Trifonov, 1998; Saih and Trifonov, 2015). The 6.5-base periodicity is present in C. eegans introns (Messaoudi et a., 2013). Fractiona period spectrum may offer high resoution and precise features of sequences. However, identifying fractiona period features in a arge genome is chaenging. The demand of the periodicity research of DNA and protein sequences motivates us to investigate advanced techniques for fractiona periodicity anaysis of time series, which correspond to bioogica symboic sequences. With the quicky growing voume of genomic sequences of human and 2

mode organisms, there is a strong demand to characterize not ony integer periods but aso fractiona periods in genomic and protein sequences. We previousy proposed a method, periodic power spectrum (PPS), which can directy compute Fourier power spectrum based on periodic distributions of signa strength on periodic positions (Wang et a., 2012; Yin and Wang, 2016). The main advantage the PPS method is that it can avoid spectra eakage and reduce background noise, which both appear in power spectrum in Fourier transform. Thus, the PPS method can capture a atent integer periodicities in DNA sequences. We have appied this method in detection of atent periodicities in different genome eements, incuding exons and microsateite DNA sequences. This paper extends the PPS method to suggest a nove method for fast and directy cacuating Fourier spectrum of a fractiona period. This method wi be aso hepfu in directy computing the fractiona period spectrum for any time series from science and technoogy fieds outside of bioinformatics. 2. Methods Deduced 2.1. Fourier transform and congruence derivative sequence For a numerica sequence x of ength m, consisting of x 0,x 1,,x m 1, its discrete Fourier transform (DFT) at frequency k is defined as (Wech, 1967) X(k) = m 1 x j e i2πkj/m,i = 1, k = 0,1,...,m 1 (2.1) And its Fourier power spectrum at frequency k is defined as (Wech, 1967) PS(k) = X (k)x(k) m 1 m 1 = ( x j e i2πjk/m ) ( x j e i2πjk/m ) i = 1,k = 0,1,...,m 1 (2.2) where indicates the compex conjugate. Because our method is appied to researching the periodicity of sequence by using Fourier spectrum, for convenience, the frequency k/m is caed period m/k. In this paper, we directy define the DFT spectrum by period instead of frequency. 3

We want to compute Fourier spectrum at period /k, > k and,k are smaer than m. Using traditiona formua of Fourier transform, we need to cacuate Fourier transform X(k) = m 1 Then, its spectrum at period /k or at frequency k is m 1 PS(k) = ( x j e i2πjk x j e i2πjk (2.3) m 1 ) ( x j e i2πjk ) (2.4) We wi introduce a fast method for directy computing the Fourier spectrum at period /k. We previousy demonstrate that the Fourier power spectrum of a numerica sequence is determined by the distribution of signa strength at periodic positions in this sequence (Wang et a., 2012; Yin and Wang, 2016). The distribution can be represented and measured by the congruence derivative sequence of the origina sequence, which is defined as foows. Definition 2.1. For a rea number sequence x of ength m, if two positive integers n and satisfy m = n, then the congruence derivative sequence, y, of x has a ength of and its eement is defined as foows n 1 y t = x j+t,t = 0,1,... 1 (2.5) The foowing deduced theorems indicate that the congruence derivative sequence of ength of the numerica sequence x can be used to compute periodic power spectrum at period. The Fourier transform of the congruence derivative sequence y is 1 Y(k) = y t e i2πtk/,k = 0,1,... 1 (2.6) The reationship between Fourier transform of the origina sequence and its congruence derivative sequence was introduced in (Wang et a., 2012; Yin and Wang, 2016). The DFT of sequence x at frequency n = m/ is 4

the same as the DFT of the congruence derivative sequence y at frequency. X(n) = X(m/) = Y(1) (2.7) This formua is the mathematica ground of the PPS method. The foowing theorem extends the concusion to a genera case. Theorem 2.1. For a rea number sequence x of ength m, suppose n = m/, then the DFT of the numerica sequence x at frequency kn is X(kn) = Y(k),0 k 1 (2.8) Proof. We know 1 1 n 1 1 n 1 Y(k) = y t e i2πtk/ = x j+t e i2πtk/ = x j+t e i2π(j+t)k/ = = X(kn) m 1 r=0 x r e i2πrkn/m The proof of Theorem 2.1 indicates that the property of conjugate symmetry in DFT of a rea sequence is preserved. We thus have the foowing coroary. Coroary 2.1. X(kn) = X (( k)n),1 k 1 (2.9) According to formua 2.2, Fourier power spectrum of sequence x at frequency kn is defined as PS(kn) = X (kn)x(kn) = X (km/)x(km/) = Y (k)y(k). (2.10) Definition 2.2. If m = n, the DFT spectrum of numerica sequence x at fractiona period /k is named the fractiona period spectrum (FPS) and is written as FPS(/k) = Y (k)y(k),1 k 1 Definition 2.3. For a rea number sequence of ength, y t,t = 0,1,..., 1, its sef q-shift summation is defined by 1 z,q = y t y t+q (2.11) 5

with t+q taken moduo, 0 q <. If we write sequencey t,t = 0,1,..., 1, as a vector, y T = [y 0,y 1,,y 1 ], then z,q can be reaized by the autocorreation of vector y. For a specia case, z,0 is equa to the inner product of vector y, or z,0 = y T y. From the definition of DFT of the congruence derivative sequence (formua 2.6), we have as 1 1 1 Y(k) = y t e i2πtk/ = y t cos(2πtk/) i y t sin(2πtk/) (2.12) From the definition of FPS, FPS(/k) = Y (k)y(k), we notice that 1 1 FPS(/k) = ( y t cos(2πtk/)) 2 +( y t sin(2πtk/)) 2 (2.13) which is rea quadratic form of variabes, y 0,y 1,...,y 1, can be written FPS(/k) = y t Ay (2.14) where the coefficient matrix A = a rs is of quadratic form, in which a rs = cos((r ) 2kπ )cos((s ) 2kπ )+sin((r ) 2kπ )sin((s ) 2kπ ) r,s = 1,2,..., (2.15) After reorganizing formua (2.15), we obtain the foowing theorem about the coefficient matrix A. Theorem 2.2. The coefficient matrix of the quadratic form, A, is a symmetric matrix. Proof. According to some fundamenta identities of trigonometry, a rs = cos((r ) 2kπ )cos((s ) 2kπ )+sin((r ) 2kπ )sin((s ) 2kπ ) woud be equa to cos((r (s )) 2kπ ) = cos((r s) 2kπ ), i.e., a rs = cos((r s) 2kπ ). It is cear that a rs = a sr, so the coefficient matrix A is symmetric. 6

Theorem 2.3. For a rea number sequence x of ength m, if m = n and its congruence derivative sequence is y t,t = 0,1,, 1, then FPS(/k) of the sequence x can be expressed as foows. If is an odd number, If is an even number, FPS(/k) = z,0 +2 2 1 FPS(/k) = z,0 +2 q=1 2 1 q=1 z,q cos((q) 2kπ z,q cos((q) 2kπ ). )+2cos(( 2 )2kπ 2 1 ) y t y t+ 2 Proof. It is cear the diagona eements of the coefficient matrix A are unit eements. Due to the symmetry of matrix A, we need to check the ower trianguar matrix of A under the diagona of A, which is sufficient for proof. The eement of the ower trianguar matrix ofa, a rs = cos((r s) 2kπ ),r = 2,3,...,,r > s, aso has some kind of symmetry. The resut is as foows. Whenis an odd number, a 21 = a 32 = a 43 =... = a ( 1) = cos((1) 2kπ ) and a 21,a 32,...,a ( 1) aso are equa to a 1. Because a rs = a t r s t,r > s,0 t < r s, simiary, a 31 = a 42 =... = a ( 2) = a ( 1)1 = a 2 = cos((2) 2kπ )..., then a (+1)/21 = a (+3)/22 =...,= a (+1)/2 = a (+3)/21 =...,a ( 1)/2 = cos((( 1)/2) 2kπ ). When is an even number, we have a 21 = a 32 = a 43 =... = a ( 1) = a 1 = cos((1) 2kπ ) and a 31 = a 42 =... = a ( 2) = a ( 1)1 = a 2 = cos((2) 2kπ ). In genera, a /21 = a (/2+1)2 =...,= a (/2+1) = a (/2+2)1 = a (/2+3)2 =...,= a (/2 1) = cos((/2 1) 2kπ ), and a (/2+1)1 = a (/2+2)2 =...,= a /2 = cos((/2) 2kπ ). We then substitute those resuts into the coefficient matrix of the quadratic form A for the FPS(/k). The foowing formuas can be obtained. 7

If is an odd number, FPS(/k) = y 2 0 +y 2 1 +y 2 2 + +y 2 1 +2cos((1) 2kπ )(y 0 y 1 +y 1 y 2 +y 2 y 3 +y 3 y 4 y 1 y 0 ) +2cos((2) 2kπ )(y 0 y 2 +y 1 y 3 +y 2 y 4 ++y 3 y 5 y 2 y 0 +y 1 y 1 ) +2cos((3) 2kπ )(y 0 y 3 +y 1 y 4 +y 2 y 5 ++y 3 y 6 y 3 y 0 +y 2 y 1 +y 1 y 2 ) +2cos((( 1)/2) 2kπ )(y 0 y ( 1)/2 +y 1 y (+1)/2 +y 2 y (+3)/2 + y ( ( 1)/2) y 0 + +y 1 y ( 3)/2 ). (2.16) If is an even number, FPS(/k) = y 2 0 +y2 1 +y2 2 + +y2 1 +2cos((1) 2kπ )(y 0 y 1 +y 1 y 2 +y 2 y 3 +y 3 y 4 y 1 y 0 ) +2cos((2) 2kπ )(y 0 y 2 +y 1 y 3 +y 2 y 4 ++y 3 y 5 y 2 y 0 +y 1 y 1 ) +2cos((3) 2kπ )(y 0 y 3 +y 1 y 4 +y 2 y 5 ++y 3 y 6 y 3 y 0 +y 2 y 1 +y 1 y 2 ) +2cos((/2 1) 2kπ )(y 0 y /2 1 +y 1 y /2 +y 2 y /2+1 + y 3 y /2 2 +y 2 y /2 1 +y 1 y /2 ) +2cos((/2) 2kπ )(y 0 y /2 +y 1 y /2+1 +y 2 y /2+2 + y /2 1 y 1 ). (2.17) The ast term in equation (2.17), cos((/2) 2kπ )(y 0 y /2 +y 1 y /2+1 +y 2 y /2+2 + +y /2 1 y 1 ) can be written as cos((/2) 2kπ ) /2 1 y t y t+/2. Noticing the definition of z,q, the formuas (2.16) and (2.17) may be written as foows. If is an odd number, FPS(/k) = z,0 +2 8 2 1 q=1 z,q cos((q) 2kπ ) (2.18)

If is an even number, 2 1 FPS(/k) = z,0 +2 q=1 z,q cos((q) 2kπ )+2cos(( 2 )2kπ 2 1 ) y t y t+ 2 (2.19) Based on Theorem 2.3 and Coroary 2.1, FPS(/k) may be re-written by foowing theorem, which may improve the technique for computing F P S(/k). Theorem 2.4. FPS(/k) = FPS(/( k),)1 k 1 (2.20) According to Theorem 2.4, to compute a of the FPS(/k), 1 k 1, we ony require about haf of computing costs. For k = 1, noticing formuas (2.18) and (2.19) and the symmetry of FPS(/k), we need to cacuate( 1)/2 coefficients of quadraticsy 0,y 1,...,y 1 when is an odd number; and /2 coefficients when is an even number. It is vauabe to notice on the property of a cosine function, cos((q) 2kπ ) = cos((q) 2( k)π ),0 k 1. (2.21) Certainy, it is the foundation of Theorem 2.4 and an important formua to prove the foowing theorem. Theorem 2.5. The coefficient matrix of quadratic form A when k = 1 is sufficient for computing F P S(/k) when k 1. Proof. Without oss of generaity, we show the case when is an odd number. For a fixed k, k 1, from Theorem 2.4, the range of k shoud be in 2 k ( 1)/2 and the ( 1)/2 coefficients of quadratics of y 0,y 1,,y 1 are cos((t) 2kπ ),t = 1,2,..., 1 2 For k = 1, we obtain the coefficients cos((q) 2π 1 ),q = 1,2,...,. 2 If the ( 1)/2 coefficients, cos((t) 2kπ ),t = 1,2,,( 1)/2 can be represented by cos((q) 2π ),q = 1,2,,( 1)/2, then, the proposition is proved. It is known that the interva of k is [2,( 1)/2], so, for t = 1,cos((k) 2π ), the first coefficient of FPS(/k), is one of cos((q) 2π ),q = 1,2,...,( 1)/2. 9

When t 1, we may have one of two resuts: if tk in the interva [1,( 1)/2], it is obvious that cos((t) 2kπ ) is one of cos((q) 2π ),q = 1,2,...,( 1)/2; otherwise, if tk > ( 1)/2, based on formua (2.21) since tk beongs to [1,( 1)/2]. 3. Fast Agorithm 3.1. Fast agorithm for computing fractiona period spectrum at period /k Based on Theorem 2.5, the agorithm shoud do a buid-in data in the computer. The data contains the coefficients of quadratics, when k = 1 and for periods, = 2,3,..., to store in the computer. The buit-in data save much computing time because the agorithm does not need to repeatedy find the coefficients for next user. The programs sha design two functions. One function is to obtain the congruence derivative sequence (CDS) (y). The other function Z(, q) is to yied the autocorreation of the congruence derivative sequence with two parameters and q. A MATLAB functionrem(p,) may be used forpandare integer number if the programming is written by MATLAB, where the rem(p,) means the remainder after p divided by. Given input origina sequence x, the function CDS (y) yieds the congruence derivative sequence of the sequence x for a period in the computation. Suppose we need to cacuate the fractiona period /k, we then have the foowing agorithm. If is odd number, compete the oop for ordering to take out ( 1)/2 coefficients of quadratics, cos((1) 2π ),cos((2)2π),,cos((( 1)/2)2π ), from the buid-in data. From t = 1 to ( 1)/2, do if r = rem(tk,) > ( 1)/2 then takecos(( r) 2π) ese takecos((r)2π). At the same time, the2cos(()2π) vaue times the Z(,t) and to do summation corresponding to t order unti t = ( 1)/2. The computed summation added with Z(,0) is the FPS(/k). If is even number, doing the oop for ordering to take out /2 coefficients of quadratics, cos((1) 2π ),cos((2)2π),,cos((/2)2π )), from the buid-in data, i.e., t = 1 to ( 1)/2 do if r = rem(tk,) > ()/2 then take cos( r) 2π ese take cos((r) 2π ). At the same time, the 2cos(()2π ) vaue times the Z(,t) to do summation corresponding to t order unti t = (/2 1). FPS(/k) wi be the computed summation added with two quantities, 2cos((/2) 2kπ ) /2 1 timing y t y t+/2, and Z(,0). 10

To assess the effectiveness of proposed FPS agorithm in capturing fractiona periodicities in digita signas, we compute the FPS spectrum of the simuated sinusoida signa. The sinusoida signa of ength N = 300, consists of sine and cosine signas with fractiona periodicities 3.7 and 5.6, respectivey. The signa is corrupted by white Gaussian noise (Equation (3.1)). x(n) = sin(2π n + π)+cos(2π n + 3π)+noise 3.7 4 5.6 4 n = 1,2,,300 (3.1) Since the periodic signa is buried by the white Gaussian noise, two periodicities 3.7 and 5.6 in the origina signa are hidden by noise (Fig.1 (a)). The proposed FPS spectrum of the signa can ceary discover two pronounced peaks at positions 3.7 and 5.6 (Fig.1 (b)), corresponding to two periodicities 3.7 and 5.6 in the origina signa. The power spectrum of this simuation sequence by the proposed method agrees with the DFT (Fig.1 (b) and (c)). This resut demonstrates the effectiveness of the FPS agorithm. 3.2. Computationa compexity of the FPS agorithm Recaing the process for computing FPS(/k) of a time series by using the new agorithm, the first step is to yied the congruence derivative sequence y j, j = 0,1,..., 1, from the origina sequence x of ength m. This computing process ony needs pus operation and do not use mutipication. The second step in the agorithm is to compute z,q when is an even or odd number, the agorithm needs approximate /2( + 1) mutipications. Finay, using formuas (2.18, 2.19) needs /2 mutipications to determine the vaue of z,q cos((q)2kπ/). So the tota computationa quantity for the FPS agorithm invoves /2( + 1) + /2 = 2 /2 + mutipications, i.e., the computationa compexity is then O( 2 +). Using the traditiona Fourier transform to compute F P S(/k), we need to cacuate the Fourier transform at period /k by the formuas 2.3 and 2.4. From (2.3), we must cacuate m exponents, i2πjk/,j = 0,1,2,...,m 1, and ca exp function m times to figure out e i2πjk/,j = 0,1,2,...,m 1, if using MATLAB. It is known that to compute function e i2πjk/ costs much time. Then to figure out m 1 e i2πjk/ needs 2m mutipications and the time for caing m exp functions. So, according to formua (2.4) cacuating out the Fourier spectrum at period /k needs 2m + 1 mutipications, i.e., O(2m+ 1). If F P S(/k) is computed for whoe fractiona periods, traditiona Fourier transform needs /2O(2m + 1) operations, i.e. O(m + ). 11

4 3 2 1 Signa 0 1 2 3 4 0 50 100 150 200 250 300 Time (second) 3 x 104 (a) 2.5 Power spectrum 2 1.5 1 0.5 0 1 2 3 4 5 6 7 8 9 10 Periodicity 3 x 104 (b) 2.5 Power spectrum 2 1.5 1 0.5 0 1 2 3 4 5 6 7 8 9 10 Periodicity (c) 12 Figure 1: Fractiona period spectrum anaysis of sinusoida signa coapsed with white noise. (a) origina sinusoida signa. (b) Fractiona period spectrum of the sinusoida signa by the FPS agorithm. (c) Fractiona period spectrum of the sinusoida signa by DFT.

In summary, the anaysis of computationa compexity indicates that the FPS agorithm is much faster than traditiona Fourier transform for computing fractiona periodic spectrum because of O( 2 +) << O(m+). 4. Concusions A fast agorithm for computing the FPS of bioogica sequences is introduced in this paper. First, the mathematica deduction and proof are presented. Then, based on the mathematica theory, the technique of buitin data is suggested and some functions are introduced, so that a simpe program can be designed. The anaysis of computationa compicity shows that this fast agorithm is much faster than traditiona Fourier transform. The fast agorithm wi be aso hepfu in computing the FPS for any time series from the science and technoogy fied other than bioinformatics. In addition, the congruence derivative sequence of origina sequences can be used to yied Ramanujan transform easiy (Hua et a., 2014; Yin et a., 2014b), which do not have any computing cost. The computation for identifying periods of the bioogica sequence or other time series woud be done by Fourier transform and Ramanujan transform simutaneousy. 13

References Anastassiou, D., 2001. Genomic signa processing. Signa Processing Magazine, IEEE 18 (4), 8 20. Arquès, D. G., Miche, C. J., 1987. Periodicities in introns. Nuceic Acids Research 15 (18), 7581 7592. Chen, J., Li, H., Sun, K., Kim, B., 2003. How wi bioinformatics impact signa processing research? Signa Processing Magazine, IEEE 20 (6), 106 206. Eisenberg, D., Weiss, R., Terwiiger, T., 1984. The hydrophobic moment detects periodicity in protein hydrophobicity. Proc. Nat. Acad. Sci. 81, 140 144. Gruber, M., Söding, J., Lupas, A. N., 2005. Repper repeats and their periodicities in fibrous proteins. Nuceic Acids Research 33 (supp 2), W239 W243. Hua, W., Wang, J., Zhao, J., 2014. Discrete ramanujan transform for distinguishing the protein coding regions from other regions. Moecuar and ceuar probes 28 (5), 228 236. Leonov, H., Arkin, I. T., 2005. A periodicity anaysis of transmembrane heices. Bioinformatics 21 (11), 2604 2610. Messaoudi, I., Ouesati, A. E., Lachiri, Z., 2013. Detection of the 6.5-base periodicity in the c. eegans introns based on the frequency chaos game signa and the compex moret waveet anaysis. Internationa Journa of Scientific Engineering and Technoogy 2 (12), 1247 1251. Saih, B., Trifonov, E. N., 2015. Strong nuceosomes reside in meiotic centromeres of c. eegans. Journa of Biomoecuar Structure and Dynamics 33 (2), 365 373. Siverman, B., Linsker, R., 1986. A measure of DNA periodicity. Journa of theoretica bioogy 118 (3), 295 300. Tiwari, S., Ramachandran, S., Bhattacharya, A., Bhattacharya, S., Ramaswamy, R., 1997. Prediction of probabe genes by Fourier anaysis of genomic sequences. Bioinformatics 13 (3), 263 270. 14

Trifonov, E. N., 1998. 3-, 10.5-, 200-and 400-base periodicities in genome sequences. Physica A: Statistica Mechanics and its Appications 249 (1), 511 516. Wang, J., Dang, C., Yin, C., 2014. A comparative study of the signa-tonoise ratios of different representations for symboic sequences. Journa of Appied Mathematics and Bioinformatics 372, 135 145. Wang, J., Liu, G., Zhao, J., 2012. Some features of Fourier spectrum for symboic sequences. Numerica Mathematics A Journa of Chinese Universities (4), 341 356. Wang, J., Yan, M., 2013. Numerica Methods in Bioinformatics: An Introduction. Science Press. Wech, P., 1967. The use of fast fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Transactions on Audio and Eectroacoustics 15 (2), 70 73. Yau, S. S.-T., Wang, J., Niknejad, A., Lu, C., Jin, N., Ho, Y.-K., 2003. DNA sequence representation without degeneracy. Nuceic Acids Research 31 (12), 3078 3080. Yin, C., 2015. Representation of DNA sequences in genetic codon context with appications in exon and intron prediction. Journa of Bioinformatics and Computationa Bioogy 13 (2), 1550004 2. Yin, C., Chen, Y., Yau, S. S.-T., 2014a. A measure of DNA sequence simiarity by Fourier transform with appications on hierarchica custering. Journa of Theoretica Bioogy 359, 18 28. Yin, C., Wang, J., 2016. Periodic power spectrum with appications in detection of atent periodicities in DNA sequences. Journa of Mathematica Bioogy 73 (5), 1053 1079. Yin, C., Yau, S. S.-T., 2005. A Fourier characteristic of coding sequences: origins and a non-fourier approximation. Journa of Computationa Bioogy 12 (9), 1153 1165. 15

Yin, C., Yau, S. S.-T., 2007. Prediction of protein coding regions by the 3-base periodicity anaysis of a DNA sequence. Journa of Theoretica Bioogy 247 (4), 687 694. Yin, C., Yau, S. S.-T., 2015. An improved mode for whoe genome phyogenetic anaysis by Fourier transform. Journa of Theoretica Bioogy 359 (21), 18 28. Yin, C., Yau, S. S.-T., 2017. A coevoution anaysis for identifying proteinprotein interactions by Fourier transform. PoS ONE 12 (4), e0174862. Yin, C., Yin, X. E., Wang, J., 2014b. A nove method for comparative anaysis of DNA sequences by Ramanujan-Fourier transform. Journa of Computationa Bioogy 21 (12), 867 879. 16