arxiv: v1 [physics.comp-ph] 21 Feb 2018


rspa.royalsocietypublishing.org, Proc R Soc A

Research Article submitted to journal

Data-Driven Forecasting of High-Dimensional Chaotic Systems with Long-Short Term Memory Networks

Pantelis R. Vlachas 1, Wonmin Byeon 1, Zhong Y. Wan 2, Themistoklis P. Sapsis 2, Petros Koumoutsakos 1

1 Chair of Computational Science, ETH Zurich, Clausiusstrasse 33, Zurich, CH-8092, Switzerland
2 Department of Mechanical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, United States

Subject Areas: Mechanical Engineering

Keywords: Data-driven forecasting, Long-Short Term Memory, Gaussian Processes, T21 barotropic climate model, Lorenz 96

Author for correspondence: Petros Koumoutsakos, petros@ethz.ch

We introduce a data-driven forecasting method for high dimensional, chaotic systems using Long-Short Term Memory (LSTM) recurrent neural networks. The proposed LSTM neural networks perform inference of high dimensional dynamical systems in their reduced order space and are shown to be an effective set of non-linear approximators of their attractor. We demonstrate the forecasting performance of the LSTM and compare it with Gaussian processes (GPs) in time series obtained from the Lorenz 96 system, the Kuramoto-Sivashinsky equation and a prototype climate model. The LSTM networks outperform the GPs in short-term forecasting accuracy in all applications considered. A hybrid architecture, extending the LSTM with a mean stochastic model (MSM-LSTM), is proposed to ensure convergence to the invariant measure. This novel hybrid method is fully data-driven and extends the forecasting capabilities of LSTM networks.

© The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License by/4.0/, which permits unrestricted use, provided the original author and source are credited.

1. Introduction

Natural systems, ranging from atmospheric climate and ocean circulation to organisms and cells, involve complex dynamics extending over multiple spatio-temporal scales. Centuries-old efforts to comprehend and forecast the dynamics of such systems have spurred developments in large scale simulations, dimensionality reduction techniques and a multitude of forecasting methods. The goals of understanding and prediction have been complementing each other but have been hindered by the high dimensionality and chaotic behavior of these systems. In recent years we observe a convergence of these approaches due to advances in computing power, algorithmic innovations and the ample availability of data. Major beneficiaries of this convergence are data-driven dimensionality reduction methods [2-7], model identification procedures [9-13] and forecasting techniques [14-19] that aim to provide precise short term predictions while capturing the long term statistics of these systems. Successful forecasting methods address the highly nonlinear energy transfer mechanisms between modes not captured effectively by the dimensionality reduction methods.

The pioneering technique of analog forecasting proposed in [20] inspired widespread research in non-parametric prediction approaches. Two dynamical system states are called analogues if they resemble one another on the basis of a specific criterion. This class of methods uses a training set of historical observations of the system. The system evolution is predicted using the evolution of the closest analogue from the training set corrected by an error term. This approach has led to promising results in practice [21] but the selection of the resemblance criterion to pick the optimal analogue is far from straightforward. Moreover, the geometrical association between the current state and the training set is not exploited. More recently [22], analog forecasting is performed using a weighted combination of data-points based on a localized kernel that quantifies the similarity of the new point and the weighted combination.
This technique exploits the local geometry instead of selecting a single optimal analogue. Similar kernel-based methods [23,24] use diffusion maps to globally parametrize a low dimensional manifold capturing the slower time scales. Moreover, non-trivial interpolation schemes are investigated in order to encode the system dynamics in this reduced order space as well as map them to the full space (lifting). Although the geometrical structure of the data is taken into account, the solution of an eigen-system with a size proportional to the training data is required, rendering the approach computationally expensive. In addition, the inherent uncertainty due to sparse observations in certain regions of the attractor introduces prediction errors which cannot be modeled in a deterministic context. In [25] a method based on Gaussian process regression (GPR) [26] was proposed for prediction and uncertainty quantification in the reduced order space. The technique is based on a training set that sparsely samples the attractor. Stochastic predictions exploit the geometrical relationship between the current state and the training set, assuming a Gaussian prior over the modeled latent variables. A key advantage of GPR is that uncertainty bounds can be analytically derived from the hyper-parameters of the framework. Moreover, in [25] a Mean Stochastic Model (MSM) is used for under-sampled regions of the attractor to ensure accurate modeling of the steady state in the long term regime. However, the resulting inference and training have a quadratic cost $O(N^2)$ in terms of the number of data samples $N$.

Some of the earlier approaches to capture the evolution of time series in chaotic systems using recurrent neural networks were developed during the inception of the Long-Short Term Memory networks (LSTM) [27]. However, to the best of our knowledge, these methods have been used only on low-dimensional chaotic systems [34]. Similarly, other machine learning algorithms such as Echo State Networks [36,37] and radial basis functions [38,39] have been successful, albeit only for low order dynamical systems.
In this work, we propose LSTM based methods that exploit information of the recent history of the reduced order state to predict the high-dimensional dynamics. Time-series data are used to train the model while no knowledge of the underlying system equations is required. Inspired by Takens' theorem [40] an embedding space is constructed using time delayed versions of the reduced order variable. The proposed method tries to identify an approximate forecasting rule globally for the reduced order space. In contrast to GPR [25], the method has a deterministic output while its training cost scales linearly with the number of training samples and it exhibits an $O(1)$ inference computational cost. Moreover, following [25], LSTM is combined with a MSM, to cope with attractor regions that are not captured in the training set. In attractor regions under-represented in the training set, the MSM is used to guarantee convergence to the invariant measure and avoid an exponential growth of the prediction error. The effectiveness of the proposed hybrid method in accurate short term prediction and capturing the long-term behavior is shown in the Lorenz 96 system and the Kuramoto-Sivashinsky system. Finally the method is also tested on predictions of a prototypical climate model.

The structure of the paper is as follows: In Section 2 we explain how the LSTM can be employed for modeling and prediction of a reference dynamical system and a blended LSTM-MSM technique is introduced. In Section 3 three other state of the art methods, GPR, MSM and the hybrid GPR-MSM scheme are presented and two comparison metrics are defined. The proposed LSTM technique and its LSTM-MSM extension are benchmarked in three complex chaotic systems in Section 4. In Section 5 we discuss the computational complexity of training and inference in LSTM. Finally, Section 6 offers a summary and discusses future research directions.

2. Long-Short Term Memory (LSTM) Recurrent Neural Networks

The LSTM was introduced in order to regularize the training of recurrent neural networks (RNNs) [27].
RNNs contain loops that allow information to be passed between consecutive temporal steps (see Figure 1) and can be expressed as:

$$h_t = \sigma_h\left(W_{hi}\, i_t + W_{hh}\, h_{t-1} + b_h\right), \tag{2.1}$$

$$o_t = \sigma_o\left(W_{oh}\, h_t + b_o\right), \tag{2.2}$$

where $i_t$, $o_t$ and $h_t$ are the input, the output and the hidden state of the RNN at time step $t$, while $D$ represents a delay block and $W_{hi}$, $W_{hh}$, $W_{oh}$ are the input-to-hidden, hidden-to-hidden and hidden-to-output weight matrices. Moreover, $\sigma_h$ and $\sigma_o$ are the hidden and output activation functions, while $b_h$ and $b_o$ are the respective biases. Temporal dependencies are captured by the hidden-to-hidden weight matrix $W_{hh}$, which couples two consecutive hidden states together. The RNN can be viewed in its unfolded form in Figure 2.

Figure 1: RNN

Figure 2: RNN unfolded in time

In many practical applications, RNNs suffer from the vanishing (or exploding) gradient problem and have failed to capture long term dependencies [41,42]. Today the RNNs owe their renaissance largely to the LSTM, which copes effectively with the aforementioned problem using gates. The LSTM has been successfully applied in sequence modeling [32], speech recognition [28-30], hand-writing recognition [31] and language translation [33].
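As a rough illustration of the recurrence in Eqs. (2.1)-(2.2), the following NumPy sketch unrolls a plain RNN over a short sequence. The choices $\sigma_h = \tanh$, an identity output activation, a zero initial hidden state, and all sizes and random weights are assumptions made for the example; the text does not fix them.

```python
import numpy as np

def rnn_forward(inputs, W_hi, W_hh, W_oh, b_h, b_o):
    """Apply Eqs. (2.1)-(2.2) along a sequence of shape (T, n_in)."""
    h = np.zeros(W_hh.shape[0])                     # h_0 = 0 (assumption)
    hiddens, outputs = [], []
    for i_t in inputs:
        h = np.tanh(W_hi @ i_t + W_hh @ h + b_h)    # Eq. (2.1), sigma_h = tanh (assumption)
        o_t = W_oh @ h + b_o                        # Eq. (2.2), identity output (assumption)
        hiddens.append(h)
        outputs.append(o_t)
    return np.array(outputs), np.array(hiddens)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out, T = 3, 5, 2, 4
W_hi = rng.normal(size=(n_hidden, n_in))
W_hh = rng.normal(size=(n_hidden, n_hidden))
W_oh = rng.normal(size=(n_out, n_hidden))
b_h, b_o = np.zeros(n_hidden), np.zeros(n_out)
seq = rng.normal(size=(T, n_in))

outputs, hiddens = rnn_forward(seq, W_hi, W_hh, W_oh, b_h, b_o)
print(outputs.shape, hiddens.shape)
```

The only coupling between time steps is the term `W_hh @ h`, which is why repeated multiplication by $W_{hh}$ during back-propagation can shrink or blow up gradients.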

The equations of the LSTM are

$$g^f_t = \sigma_f\left(W_f [h_{t-1}, i_t] + b_f\right), \tag{2.3}$$

$$g^i_t = \sigma_i\left(W_i [h_{t-1}, i_t] + b_i\right), \tag{2.4}$$

$$\tilde{C}_t = \tanh\left(W_C [h_{t-1}, i_t] + b_C\right), \tag{2.5}$$

$$C_t = g^f_t \odot C_{t-1} + g^i_t \odot \tilde{C}_t, \tag{2.6}$$

$$g^o_t = \sigma_h\left(W_h [h_{t-1}, i_t] + b_h\right), \tag{2.7}$$

$$h_t = g^o_t \odot \tanh(C_t), \tag{2.8}$$

where $g^f_t$, $g^i_t$ and $g^o_t$ are the gate signals (forget, input and output gates), $i_t$ is the input, $h_t$ is the hidden state, $C_t$ is the cell state, $\odot$ denotes the elementwise product, while $W_f$, $b_f$, $W_i$, $b_i$, $W_C$, $b_C$, $W_h$ and $b_h$ are weight matrices and biases of appropriate dimensions. The activation functions $\sigma_f$, $\sigma_i$ and $\sigma_h$ are sigmoids. For a more detailed explanation of the LSTM architecture refer to [27]. The hidden state satisfies $h_t \in \mathbb{R}^h$, with $h$ the number of hidden units. In practice we want the output to have a specific dimension $d_o$. For this reason, a trivial fully connected final layer without activation function is added

$$o_t = W_{oh}\, h_t, \tag{2.9}$$

with $W_{oh} \in \mathbb{R}^{d_o \times h}$. In the following we refer to the LSTM hidden and cell states ($h_t$ and $C_t$) jointly as LSTM states.

In this work, we consider the reduced order problem where the system state is projected in the reduced order space. Moreover, the system is considered to be autonomous, while $\dot{z}_t = dz_t/dt$ is the system state derivative at time step $t$. The LSTM model is trained using time series data from the system to predict the state derivative $o_t \,\hat{=}\, \dot{z}_t = dz_t/dt$ at time $t$, using delayed versions of the reference reduced model state $z_t$. It is a solely data-driven approach and no explicit information regarding the form of the underlying equations is required.

(a) Training and inference

The available time series data are divided into two separate sets, the training dataset and the validation dataset, i.e. $z_t^{train}, \dot{z}_t^{train}$, $t \in \{1, \dots, N_{train}\}$, and $z_t^{val}, \dot{z}_t^{val}$, $t \in \{1, \dots, N_{val}\}$. $N_{train}$ and $N_{val}$ are the number of training and validation samples respectively. This data is stacked in batches as

$$\underbrace{I^{train}_t = \begin{pmatrix} z^{train}_{t+d-1} \\ z^{train}_{t+d-2} \\ \vdots \\ z^{train}_{t} \end{pmatrix}}_{\text{Input batch}}, \qquad \underbrace{o^{train}_t = \dot{z}^{train}_{t+d-1}}_{\text{Output batch}}, \tag{2.10}$$

for $t \in \{1, 2, \dots, N_{train} - d + 1\}$, in order to form the training (and validation) input and output of the LSTM. These training batches are used to optimize the parameters of the LSTM (weights and biases) in order to learn the mapping $I_t \to o_t$. The training proceeds by optimising the network weights iteratively for each batch (training of one epoch). The training loss function is a weighted version of the root mean square error, i.e.

$$\text{loss} = \sqrt{\frac{1}{d_o} \sum_{i=1}^{d_o} w_i \left(o^{train,i}_t - o^{i}_t\right)^2},$$

where $d_o$ is the dimension of the output of the LSTM, and the weights $w_i$ are selected according to the significance of each output component, e.g. energy of each component. Moreover, the LSTM is trained using truncated Back-propagation Through Time (BPTT) [35]. The BPTT is truncated after layer $d$. As a consequence, the LSTM is trained to predict the derivative at time $t$ using information from the previous $d$ time steps.

An important issue is how to select the hidden state dimension $h$ and how to initialize the LSTM states at the truncation layer $d$. A small $h$ reduces the expressive capabilities of the LSTM and deteriorates inference performance. On the other hand, a big $h$ leads to fast overfitting, an upturn in the generalization error and increased computational cost of training. For this reason, $h$ has to be tuned depending on the observed data (training and validation). For the truncation layer $d$, there are two alternatives, namely stateless and stateful LSTM. In stateless LSTM the LSTM states at layer $d$ are initialized to zero. As a consequence, the LSTM can only capture dependencies up to $d$ previous time steps. In the second variant, the stateful LSTM, the state is always propagated for $p$ time steps in the future and then reinitialized to zero, to help the LSTM capture longer dependencies. In this work, the systems considered exhibit chaotic behavior and the dependencies are inherently short term, as the states in two time steps that differ significantly can be considered statistically independent. For this reason, the short temporal dependencies can be captured without propagating the hidden state for a long horizon. As a consequence, we consider only the stateless variant $p = 0$. We also applied stateful LSTM without any significant improvement so we omit the results for brevity.

Optimization during training is performed using the Adam stochastic optimization method [1] with an adaptive learning rate (initial learning rate η = .1). Training is stopped when convergence of the training error is detected or the maximum of 1 epochs is reached. The LSTM model with the smallest validation error is selected to avoid over-fitting.

The trained LSTM model can be used to forecast the system state in the next time steps in an iterative fashion. The history of the system up to time step $d$, i.e. $z_1^{true}, \dots, z_d^{true}$, is assumed to be known. We initialize the LSTM states with $h_0$ and $C_0$ and we use the trained LSTM to predict the derivative $\dot{z}_d^{pred}$. By integrating the derivative with a reference time difference $dt$ and initial condition $z_d^{true}$ the value $z_{d+1}^{pred}$ is obtained.
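The LSTM update of Eqs. (2.3)-(2.9) and the stateless iterative forecast described above can be sketched in NumPy as follows. This is not the authors' implementation. The parameters are random and untrained, the helper `random_params` and all sizes are hypothetical, and forward Euler is assumed for the integration step, since the text only states that the derivative is integrated with a time difference dt.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(i_t, h_prev, C_prev, p):
    """One LSTM step, Eqs. (2.3)-(2.8); p is a dict of weights and biases."""
    v = np.concatenate([h_prev, i_t])               # [h_{t-1}, i_t]
    g_f = sigmoid(p["W_f"] @ v + p["b_f"])          # forget gate, Eq. (2.3)
    g_i = sigmoid(p["W_i"] @ v + p["b_i"])          # input gate, Eq. (2.4)
    C_tilde = np.tanh(p["W_C"] @ v + p["b_C"])      # candidate state, Eq. (2.5)
    C = g_f * C_prev + g_i * C_tilde                # cell state, Eq. (2.6)
    g_o = sigmoid(p["W_h"] @ v + p["b_h"])          # output gate, Eq. (2.7)
    h = g_o * np.tanh(C)                            # hidden state, Eq. (2.8)
    return h, C

def predict_derivative(window, p):
    """Stateless inference: zero LSTM states, d delayed inputs, then Eq. (2.9)."""
    n_hidden = p["b_f"].size
    h, C = np.zeros(n_hidden), np.zeros(n_hidden)
    for z in window:                                # oldest state first
        h, C = lstm_step(z, h, C, p)
    return p["W_oh"] @ h                            # linear read-out, Eq. (2.9)

def iterative_forecast(history, p, dt, n_steps):
    """Roll the model forward from the last d known states."""
    window = [np.asarray(z, dtype=float) for z in history]
    predictions = []
    for _ in range(n_steps):
        z_dot = predict_derivative(window, p)
        z_next = window[-1] + dt * z_dot            # forward Euler (assumption)
        predictions.append(z_next)
        window = window[1:] + [z_next]              # slide the delay window
    return np.array(predictions)

def random_params(n_in, n_hidden, rng, scale=0.1):
    """Hypothetical untrained parameters, used only to exercise the code."""
    p = {}
    for name in ("f", "i", "C", "h"):
        p["W_" + name] = scale * rng.normal(size=(n_hidden, n_hidden + n_in))
        p["b_" + name] = np.zeros(n_hidden)
    p["W_oh"] = scale * rng.normal(size=(n_in, n_hidden))
    return p

rng = np.random.default_rng(0)
d, n_in, n_hidden = 4, 3, 8                         # illustrative sizes
params = random_params(n_in, n_hidden, rng)
history = rng.normal(size=(d, n_in))                # stands in for z_1, ..., z_d
forecast = iterative_forecast(history, params, dt=0.01, n_steps=5)
print(forecast.shape)
```

Because the states are reset to zero before every prediction, each forecast step depends only on the d most recent reduced order states, which is the stateless behavior described in the text.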
This value is used for the next prediction in an iterative fashion as illustrated in Figure 3. In stateful LSTM, initial values for $h_0$ and $C_0$ can be obtained by teacher forcing the LSTM for a few time steps, propagating values from the known history and ignoring the outputs. In stateless LSTM, $h_0$ and $C_0$ are initialized with zero vectors.

Figure 3: Iterative prediction using LSTM

(b) Mean Stochastic Model (MSM) and Hybrid LSTM-MSM

The MSM is a powerful data-driven method used to quantify uncertainty and perform forecasts in turbulent systems with high intrinsic attractor dimensionality [25,43]. It is parametrized a priori to capture global statistical information of the attractor by design, while its computational complexity is very low compared to LSTM or GPR. The concept behind MSM is to model each component of the state $z^i$ independently with an Ornstein-Uhlenbeck (OU) process that captures the energy spectrum and the damping time scales of the statistical equilibrium. The process takes the following form

$$dz^i = c_i z^i\, dt + \xi_i\, dW_i, \tag{2.11}$$

where $c_i$, $\xi_i$ are parameters fitted to the centered training data and $W_i$ is a Wiener process. In the statistical steady state the mean, energy and damping time scale of the process are given by

$$\mu_i = E[z^i] = 0, \qquad E_i = E[z^i (z^i)^*] = -\frac{\xi_i^2}{2 c_i}, \qquad T_i = -\frac{1}{c_i}. \tag{2.12}$$

In order to fit the model parameters $c_i$, $\xi_i$ we directly estimate the variance $E[z^i (z^i)^*]$ from the time series training data and the decorrelation time using

$$T_i = \frac{1}{E[z^i (z^i)^*]} \int_0^\infty E[z^i(t)\,(z^i)^*(t+\tau)]\, d\tau. \tag{2.13}$$

After computing these two quantities we replace in (2.12) and solve with respect to $c_i$ and $\xi_i$. Since the MSM is modelled a priori to mimic the global statistical behavior of the attractor, forecasts made with MSM can never escape. This is not the case with LSTM and GPR, as prediction errors accumulate and iterative forecasts escape the attractor fast due to the chaotic dynamics, although short term predictions are accurate. This problem has been addressed with respect to GPR in [25]. In order to cope effectively with this problem we introduce a hybrid LSTM-MSM technique that prevents forecasts from diverging from the attractor. The state dependent decision rule for forecasting in LSTM-MSM is given by

$$\dot{z}_t = \begin{cases} (\dot{z}_t)_{LSTM}, & \text{if } p^{train}(z_t) = \prod_i p^{train}_i(z^i_t) > \delta, \\ (\dot{z}_t)_{MSM}, & \text{otherwise,} \end{cases} \tag{2.14}$$

where $p^{train}(z_t)$ is an approximation of the probability density function of the training dataset and $\delta \approx .1$ a constant threshold tuned based on $p^{train}(z_t)$. We approximate $p^{train}(z_t)$ using a mixture of Gaussian kernels. This hybrid architecture exploits the advantages of LSTM and MSM. In case there is a high probability that the state $z^i$ lies close to the training dataset (interpolation) the LSTM, having memorized the local dynamics, is used to perform inference. This ensures accurate LSTM short-term predictions. On the other hand, close to the boundaries the attractor is only sparsely sampled, $p^{train}(z^i) < \delta$, and errors from LSTM predictions would lead to divergence.
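The parameter fit of Eqs. (2.12)-(2.13) and the decision rule (2.14) can be illustrated with the NumPy sketch below. It rests on several assumptions that are not in the text: real-valued state components, an autocovariance integral cut at its first zero crossing and evaluated with the trapezoidal rule, a fixed kernel bandwidth, and placeholder callables `lstm_rhs` and `msm_rhs` (the latter keeps only the OU drift and omits the noise term). Solving (2.12) gives $c_i = -1/T_i$ and $\xi_i = \sqrt{2E_i/T_i}$, which is what `fit_msm` returns.

```python
import numpy as np

def fit_msm(z, dt, max_lag):
    """Estimate OU parameters (c_i, xi_i) for each column of z, shape (N, n).

    Assumes the series is sampled finely enough that several lags are positive.
    """
    z = z - z.mean(axis=0)                          # centre the training data
    N, n = z.shape
    c, xi = np.empty(n), np.empty(n)
    for i in range(n):
        # Empirical autocovariance E[z(t) z(t + tau)] at lags 0 .. max_lag - 1.
        acov = np.array([np.mean(z[: N - k, i] * z[k:, i]) for k in range(max_lag)])
        crossing = np.nonzero(acov <= 0.0)[0]
        if crossing.size:                           # cut at first zero crossing (assumption)
            acov = acov[: crossing[0]]
        E = acov[0]                                 # variance E_i
        integral = dt * (acov.sum() - 0.5 * (acov[0] + acov[-1]))  # trapezoidal rule
        T = integral / E                            # decorrelation time, Eq. (2.13)
        c[i] = -1.0 / T                             # from T_i = -1 / c_i
        xi[i] = np.sqrt(2.0 * E / T)                # from E_i = -xi_i^2 / (2 c_i)
    return c, xi

def kde_1d(samples, x, bandwidth):
    """Gaussian-kernel density estimate of one state component at x."""
    u = (x - samples) / bandwidth
    return np.mean(np.exp(-0.5 * u ** 2)) / (bandwidth * np.sqrt(2.0 * np.pi))

def p_train(z, train, bandwidth):
    """Product of the marginal densities, as in Eq. (2.14)."""
    return np.prod([kde_1d(train[:, i], z[i], bandwidth) for i in range(z.size)])

def hybrid_derivative(z, train, bandwidth, delta, lstm_rhs, msm_rhs):
    """Eq. (2.14): LSTM where the training data are dense, MSM elsewhere."""
    if p_train(z, train, bandwidth) > delta:
        return lstm_rhs(z)
    return msm_rhs(z)

rng = np.random.default_rng(0)
train = rng.standard_normal((5000, 2))              # synthetic stand-in for training states
c = np.array([-1.0, -1.0])                          # illustrative OU drift coefficients
lstm_rhs = lambda z: np.zeros_like(z)               # placeholder for a trained LSTM
msm_rhs = lambda z: c * z                           # OU drift only, noise omitted
inside = hybrid_derivative(np.zeros(2), train, 0.3, 0.01, lstm_rhs, msm_rhs)
outside = hybrid_derivative(np.full(2, 6.0), train, 0.3, 0.01, lstm_rhs, msm_rhs)
print(inside, outside)
```

In the usage lines, the state at the origin sits in a densely sampled region and is routed to the LSTM placeholder, while the state far outside the sampled region falls back to the MSM drift.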
In this case, MSM guarantees that forecasting trajectories remain close to the attractor, and that we converge to the statistical invariant measure in the long-term.

3. Benchmark and Performance Measures

The performance of the proposed LSTM based prediction mechanism is benchmarked against the following state-of-the-art methods:

• Mean Stochastic Model (MSM)
• Gaussian Process Regression (GPR)
• Mixed Model (GPR-MSM)

In order to guarantee that the prediction performance is independent of the initial condition selected, for all applications and all performance measures considered the average value of each measure for a number of different initial conditions sampled independently and uniformly from the attractor is reported. The ground truth trajectory is obtained by integrating the discretized reference equation starting from each initial condition, and projecting the states to the reduced order space. The reference equation and the projection method are of course application dependent.

From each initial condition, we generate an empirical Gaussian ensemble of dimension $N_{en}$ around the initial condition with a small variance $\sigma_{en}$. This noise represents the uncertainty in the knowledge of the initial system state. We forecast the evolution of the ensemble by iteratively predicting the derivatives and integrating (deterministically for each ensemble member for the LSTM, stochastically for GPR) and we keep track of the mean. The ensemble size $N_{en}$ is selected in the order of 5, which is the usual choice in environmental science, e.g. weather prediction and short term climate prediction [44].

The ground truth trajectory at each time instant $z$ is then compared with the predicted ensemble mean $\tilde{z}$. As a comparison measure we use the root mean square error (RMSE) defined as

$$\text{RMSE}(z_k) = \sqrt{\frac{1}{V} \sum_{i=1}^{V} \left(z^i_k - \tilde{z}^i_k\right)^2},$$

where index $k$ denotes the $k$-th component of the reduced order state $z$, $i$ is the initial condition, and $V$ is the total number of initial conditions. The RMSE is computed at each time instant for each component $k$ of the reduced order state, resulting in error curves that describe the evolution of error with time. Moreover, we use the mean Anomaly Correlation (AC) [47] over $V$ initial conditions to quantify the pattern correlation of the predicted trajectories with the ground-truth. The AC is defined as

$$\text{AC} = \frac{1}{V} \sum_{i=1}^{V} \frac{\sum_{k=1}^{r_{dim}} w_k \left(z^i_k - \bar{z}_k\right)\left(\tilde{z}^i_k - \bar{z}_k\right)}{\sqrt{\sum_{k=1}^{r_{dim}} w_k \left(z^i_k - \bar{z}_k\right)^2}\, \sqrt{\sum_{k=1}^{r_{dim}} w_k \left(\tilde{z}^i_k - \bar{z}_k\right)^2}}, \tag{3.1}$$

where $k$ refers to the mode number, $i$ refers to the initial condition, $w_k$ are mode weights selected according to the energies of the modes after dimensionality reduction and $\bar{z}_k$ is the time average of the respective mode, considered as reference. This score ranges from -1.0 to 1.0. If the forecast is perfect, the score equals 1.0. The AC coefficient is a widely used forecasting accuracy score in the meteorological community [46].

4. Applications

In this section, the effectiveness of the proposed method is demonstrated with respect to three chaotic dynamical systems, exhibiting different levels of chaos, from weakly chaotic to fully turbulent, i.e. the Lorenz 96 system, the Kuramoto-Sivashinsky equation and a prototypical barotropic climate model.

(a) The Lorenz 96 System

In [45] a model of the large-scale behaviour of the mid-latitude atmosphere is introduced. This model describes the time evolution of the components $X_j$ for $j \in \{0, 1, \dots, J-1\}$ of a spatially discretized (over a single latitude circle) atmospheric variable.
In the following we refer to this model as the Lorenz 96. The Lorenz 96 is usually used ([25,46] and references therein) as a toy problem to benchmark methods for weather prediction. The system of differential equations that governs the Lorenz 96 is defined as

$$\frac{dX_j}{dt} = (X_{j+1} - X_{j-2})\, X_{j-1} - X_j + F, \tag{4.1}$$

for $j \in \{0, 1, \dots, J-1\}$, where the indices are periodic, i.e. $X_{-1} = X_{J-1}$, $X_{-2} = X_{J-2}$. In our analysis $J = 40$. The right-hand side of (4.1) consists of a non-linear advective term $(X_{j+1} - X_{j-2}) X_{j-1}$, a linear (dissipative) term $-X_j$ and a positive external forcing term $F$. The discrete energy of the system remains constant throughout time and the Lorenz 96 states $X_j$ remain bounded. By increasing the external forcing parameter $F$ the behavior that the system exhibits changes from periodic ($F < 1$) to weakly chaotic ($F = 4$) to end up in fully turbulent regimes ($F = 16$). We refer to $X_j$ as the states of the Lorenz 96 model. These regimes can be observed in Figure 4.

Figure 4: Lorenz 96 contour plots (axes x and t) for different forcing regimes F (panels F = 4, F = 8, F = 16). Chaoticity rises with bigger values of F.

Following [25,44] we apply a shifting and scaling to standardize the Lorenz 96 states $X_j$. The discrete or Dirichlet energy is given by $E = \frac{1}{2} \sum_{j=1}^{J} X_j^2$. In order for the scaled Lorenz 96 states to have zero mean and unit energy we transform them using

$$\tilde{X}_j = \frac{X_j - \bar{X}}{\sqrt{E_p}}, \qquad d\tilde{t} = \sqrt{E_p}\, dt, \tag{4.2}$$

where $E_p$ is the average energy fluctuation, i.e.

$$E_p = \frac{1}{2T} \sum_{j=0}^{J-1} \int_{T_0}^{T_0+T} (X_j - \bar{X})^2\, dt. \tag{4.3}$$

In this way the scaled energy is $\tilde{E} = \frac{1}{2} \sum_{j=0}^{J-1} \tilde{X}_j^2 = 1$ and the scaled variables have zero mean $\bar{\tilde{X}} = \frac{1}{J} \sum_{j=0}^{J-1} \tilde{X}_j = 0$, with $\bar{X}$ the mean state. The scaled Lorenz 96 states $\tilde{X}_j$ obey the following differential equation

$$\frac{d\tilde{X}_j}{d\tilde{t}} = \frac{F - \bar{X}}{E_p} + \frac{(\tilde{X}_{j+1} - \tilde{X}_{j-2})\,\bar{X} - \tilde{X}_j}{\sqrt{E_p}} + (\tilde{X}_{j+1} - \tilde{X}_{j-2})\, \tilde{X}_{j-1}. \tag{4.4}$$

(i) Dimensionality Reduction: Discrete Fourier Transform

Firstly, the Discrete Fourier Transform (DFT) is applied to the energy standardized Lorenz 96 states $\tilde{X}_j$. The Fourier coefficients $\hat{X}_k \in \mathbb{C}$ are given by

$$\hat{X}_k = \frac{1}{J} \sum_{j=0}^{J-1} \tilde{X}_j\, e^{-2\pi i k j / J}, \tag{4.5}$$

while the Lorenz 96 states can be recovered from the Fourier coefficients using the inverse DFT

$$\tilde{X}_j = \sum_{k=0}^{J-1} \hat{X}_k\, e^{2\pi i k j / J}. \tag{4.6}$$

After applying the DFT to the Lorenz 96 states we end up with a symmetric energy spectrum that can be uniquely characterized by $J/2 + 1$ ($J$ is considered to be an even number) coefficients $\hat{X}_k$ for $k \in K = \{0, 1, \dots, J/2\}$. In our case $J = 40$, thus we end up with $|K| = 21$ complex coefficients $\hat{X}_k \in \mathbb{C}$. These coefficients are referred to as the Fourier modes or simply modes. The Fourier energy of each mode is defined as

$$E_k = \text{Var}(\hat{X}_k) = E\left[\left(\hat{X}_k(\tilde{t}) - \bar{\hat{X}}_k\right)\left(\hat{X}_k(\tilde{t}) - \bar{\hat{X}}_k\right)^*\right]. \tag{4.7}$$

The energy spectrum of the Lorenz 96 system is plotted in Figure 5 for different values of the forcing term $F$. We take into account only the $r_{dim} = 6$ modes corresponding to the highest energies and the rest of the modes are truncated. For the different forcing regimes F = 4, 6, 8, 16, the six most energetic modes correspond to approximately 89%, 57.8%, 52% and 43.8% of the total energy respectively. The space where the reduced variables live in is referred to as the reduced order phase space and the most energetic modes are denoted as $\hat{X}^r_k$ for $k \in \{1, \dots, r_{dim}\}$. As shown in [48] the most energetic modes are not necessarily the ones that capture better the dynamics of the model. However, in this work we are not interested in an optimal reduced space representation, but rather in the effectiveness of a prediction model given this space. The respective wavenumbers of the most energetic modes as well as their energy are given in Table 1. The truncated modes are ignored for now. Nevertheless, their effect can be modelled stochastically as in [25].

Figure 5: Energy spectrum $E_k$ (against wavenumber k) and cumulative energy in % with respect to the number of most energetic modes used, for different forcing regimes of the Lorenz 96 system. As the forcing increases, more chaoticity is introduced to the system. Legend: F = 4; F = 8; F = 16.

Table 1: Most energetic Fourier modes used in the reduced order phase space

Forcing    Wavenumbers k
F = 4      7, 10, 14, 9, 17, 16
F = 6      8, 7, 9, 10, 11, 6
F = 8      8, 9, 7, 10, 11, 6
F = 16     8, 9, 10, 7, 11, 6

Since each Fourier mode $\hat{X}^r_k$ is a complex number, it consists of a real part and an imaginary part. By stacking these real and imaginary parts of the $r_{dim}$ truncated modes we end up with the $2 r_{dim}$ dimensional reduced model state

$$\mathbf{X} \equiv \left[\text{Re}(\hat{X}^r_1), \dots, \text{Re}(\hat{X}^r_{r_{dim}}), \text{Im}(\hat{X}^r_1), \dots, \text{Im}(\hat{X}^r_{r_{dim}})\right]^T. \tag{4.8}$$

Assuming that $X_j^t$ for $j \in \{0, 1, \dots, J-1\}$ are the Lorenz 96 states at time instant $t$, the mapping $X_j^t, \forall j \to \mathbf{X}^t$ is unique and the reduced model state of the Lorenz 96 has a specific vector value. For high dimensions, Fourier Transform is equivalent to Principal Component Analysis.
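The chain from Eq. (4.1) to the reduced state (4.8) can be sketched in NumPy as below. The time step, trajectory length, number of discarded steps and the perturbed initial condition are illustrative assumptions, not the values used in the paper. The mean state is taken over both space and time, which is also an assumption.

```python
import numpy as np

def lorenz96_rhs(X, F):
    """Right-hand side of Eq. (4.1) with periodic indices."""
    return (np.roll(X, -1) - np.roll(X, 2)) * np.roll(X, 1) - X + F

def rk4_step(X, F, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = lorenz96_rhs(X, F)
    k2 = lorenz96_rhs(X + 0.5 * dt * k1, F)
    k3 = lorenz96_rhs(X + 0.5 * dt * k2, F)
    k4 = lorenz96_rhs(X + dt * k3, F)
    return X + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def simulate(X0, F, dt, n_steps, n_discard):
    """Integrate and drop the first n_discard states as transients."""
    X, traj = np.asarray(X0, dtype=float), []
    for step in range(n_steps):
        X = rk4_step(X, F, dt)
        if step >= n_discard:
            traj.append(X)
    return np.array(traj)

def reduce_order(traj, r_dim):
    """Scale (4.2)-(4.3), transform (4.5), truncate and stack (4.8)."""
    J = traj.shape[1]
    X_bar = traj.mean()                                        # mean state (assumption: space and time)
    E_p = 0.5 * np.mean(np.sum((traj - X_bar) ** 2, axis=1))   # Eq. (4.3) as a time average
    X_tilde = (traj - X_bar) / np.sqrt(E_p)                    # Eq. (4.2)
    X_hat = np.fft.fft(X_tilde, axis=1)[:, : J // 2 + 1] / J   # Eq. (4.5), k = 0 .. J/2
    E_k = np.var(X_hat, axis=0)                                # Eq. (4.7), |.|^2 for complex input
    modes = np.argsort(E_k)[::-1][:r_dim]                      # most energetic wavenumbers
    kept = X_hat[:, modes]
    Z = np.concatenate([kept.real, kept.imag], axis=1)         # Eq. (4.8)
    return Z, modes, X_tilde

rng = np.random.default_rng(0)
J, F, r_dim = 40, 8.0, 6
X0 = F + 0.01 * rng.standard_normal(J)                         # perturbed fixed point
traj = simulate(X0, F, dt=0.01, n_steps=3000, n_discard=1000)  # illustrative lengths
Z, modes, X_tilde = reduce_order(traj, r_dim)
print(Z.shape, np.sort(modes))
```

`np.fft.fft` uses the same sign convention as (4.5) but no normalization, hence the division by J. With such a short trajectory the selected wavenumbers need not match Table 1.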
(ii) Training and Prediction in Lorenz 96

The reduced Lorenz 96 system states $\mathbf{X}^t$ are considered as the true reference states $z_t$. The LSTM is trained to forecast the derivative of the reduced order state $dz_t/dt$ as in [34]. In the following we analyze the influence of the truncation layer $d$ and the number of hidden units $h$ of the LSTM with respect to the chaotic Lorenz 96 system.

The influence of $d$ on training and performance of the LSTM model is the following. On the one hand, selecting a large $d$ makes the training more challenging, for two reasons. Firstly, the LSTM has more layers and secondly more noise (irrelevant information) might be included in the input, rendering suboptimal prediction performance. On the other hand, selecting a small $d$ might lead to an input sequence with poor information content, leading to low prediction performance. Increasing the number of hidden nodes $h$ raises the expressiveness of the LSTM, but makes it easier to overfit the training set. A stateless LSTM is used. The back-propagation truncation horizon is set to d = 1 and we use h = 2.

In order to obtain training data for the LSTM, we integrate the Lorenz 96 system state Eq. (4.1) starting from an initial condition $X_j^0$ for $j \in \{0, 1, \dots, J-1\}$ using a Runge-Kutta 4th order method with a time step dt = .1 up to T = 51. In this way a time series $X_j^t$, $t \in \{0, 1, \dots\}$ is constructed. Using the scaling and dimensionality reduction method explained in Section 4(a)(i) we construct the reduced order state time series $\mathbf{X}^t$, $t \in \{0, 1, \dots\}$, using the mapping $X_j^t, \forall j \to \mathbf{X}^t$. From this time series we discard the first $10^4$ initial time steps to avoid transients, ending up with a time series with N_train = 5 samples. A similar but independent process is repeated for the validation set.

(iii) Results

The trained LSTM models are used for prediction based on the iterative procedure explained in Section 2. In this section, we demonstrate the forecasting capabilities of LSTM and compare it with the state of the art. 1 different initial conditions are simulated. For each initial condition, an ensemble with size N_en = 5 is considered by perturbing it with a normal noise with variance σ_en = .1. In Figures 6a, 6b, and 6c we report the mean RMSE prediction error of the most energetic mode $\hat{X}^r_1 \in \mathbb{C}$, scaled with $E_p$, for the forcing regimes $F \in \{4, 8, 16\}$ for the first N = 1 time steps (T = .1). In the RMSE the complex norm $\|v\|^2 = v v^*$ is taken into account. The 1% of the standard deviation of the attractor is also plotted for reference (1%σ). As F increases, the system becomes more chaotic and difficult to predict. As a consequence, the number of prediction steps that remain under the 1%σ threshold is decreased.
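The evaluation procedure (Gaussian ensembles around each initial condition, RMSE per component, and the mean AC of Eq. (3.1)) can be sketched as follows. The array layout `(V, T, n)`, the synthetic data and the uniform mode weights are assumptions made for the example. A real evaluation would replace the synthetic forecast with the ensemble mean of model predictions.

```python
import numpy as np

def make_ensemble(z0, n_members, variance, rng):
    """Gaussian ensemble around an initial condition z0."""
    return z0 + np.sqrt(variance) * rng.standard_normal((n_members, z0.size))

def rmse(z_true, z_pred):
    """RMSE over V initial conditions; inputs have shape (V, T, n), output (T, n)."""
    return np.sqrt(np.mean(np.abs(z_true - z_pred) ** 2, axis=0))

def anomaly_correlation(z_true, z_pred, z_bar, w):
    """Mean AC of Eq. (3.1); inputs (V, T, n), z_bar and w of shape (n,), output (T,)."""
    a = z_true - z_bar                              # anomalies of the ground truth
    b = z_pred - z_bar                              # anomalies of the forecast
    num = np.sum(w * a * b, axis=-1)
    den = np.sqrt(np.sum(w * a ** 2, axis=-1) * np.sum(w * b ** 2, axis=-1))
    return np.mean(num / den, axis=0)               # average over initial conditions

rng = np.random.default_rng(0)
V, T, n = 10, 20, 4                                 # illustrative sizes
truth = rng.standard_normal((V, T, n))              # synthetic stand-in for true trajectories
z_bar, w = truth.mean(axis=(0, 1)), np.ones(n)      # reference mean and uniform weights

# Synthetic "forecast": ensemble mean of noisy copies of the truth.
members = truth[None] + 0.1 * rng.standard_normal((50, V, T, n))
forecast = members.mean(axis=0)

print(rmse(truth, forecast).shape, anomaly_correlation(truth, forecast, z_bar, w).shape)
```

`np.abs` is used inside the RMSE so that the same function also covers complex modes with the norm $\|v\|^2 = v v^*$.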
The LSTM models extend this predictability horizon for all forcing regimes compared to GPR and MSM. However, when LSTM is combined with MSM the short term prediction performance is compromised. Nevertheless, hybrid LSTM-MSM models outperform GPR methods in short term prediction accuracy.

In Figures 6d, 6e, and 6f, the RMSE error for T = 2 is plotted. The standard deviation from the attractor σ is plotted for reference. We can observe the following. The prediction performance of the LSTM in the quasi-periodic regime F = 4 is clearly superior to all other approaches. Blending LSTM with MSM guarantees accurate modeling of the steady state in the long term, but leads to a performance compromise in the short-term. LSTM-MSM outperforms GPR-MSM. In all forcing regimes, both GPR and LSTM eventually diverge, while MSM, and blended GPR-MSM, LSTM-MSM schemes remain close to the attractor in the long term as expected. For F = 8 although the RMSE error in the short-term is smaller for LSTM, GPR remains for a longer period close to the attractor (e.g. T = .75 for F = 8). However, when blended schemes are taken into account, LSTM-MSM shows superior performance in the short-term and slightly better performance in the long term compared to GPR-MSM.

In Figures 6g, 6h, and 6i, the mean AC over 1 initial conditions is given. The predictability threshold of 0.6 is also plotted. After crossing this critical threshold, the methods do not predict better than a trivial mean predictor. For F = 4 GPR methods show inferior performance compared to LSTM approaches as analyzed previously in the RMSE comparison. However, for F = 8 LSTM models do not predict better than the mean after T ≈ .35, while GPR shows better performance. In turn, when blended with MSM the compromise in the performance for GPR-MSM is much bigger compared to LSTM-MSM. The LSTM-MSM scheme shows slightly superior performance than GPR-MSM during the entire relevant time period (AC > 0.6). For the fully turbulent regime F = 16, LSTM shows comparable performance with both GPR and MSM and all methods converge as chaoticity rises, since the intrinsic dimensionality of the system attractor increases and the system becomes inherently unpredictable.

Figure 6: Mean RMSE of the most energetic mode and mean AC over 1 initial conditions for the Lorenz 96 system. Panels (a)-(c): short-term RMSE; panels (d)-(f): long-term RMSE; panels (g)-(i): AC; for F = 4 (wavenumber k = 7), F = 8 (wavenumber k = 8) and F = 16. Legend: 1% of the standard deviation from the attractor; standard deviation from the attractor; AC predictability threshold; MSM; GPR; GPR-MSM; LSTM; LSTM-MSM.

In Figure 7, the evolution of the mean RMSE over 1 initial conditions of the wavenumbers k = 8, 9, 10, 11 of the Lorenz 96 with forcing F = 8 is plotted. In contrast to GPR, the RMSE error of LSTM is much lower in the moderate and low energy wavenumbers k = 9, 10, 11 compared to the most energetic mode k = 8. This difference among modes is not observed in GPR. This can be attributed to the highly non-linear energy transfer mechanisms between these lower energy modes as opposed to the Gaussian and locally linear energy transfers of the most energetic mode.

As illustrated before, the hybrid LSTM-MSM architecture effectively combines the accurate short-term prediction performance of LSTM with the long-term stability of MSM. The percentage of ensemble members in the hybrid scheme explained by LSTM is plotted with respect to time in Figure 8. In parallel with the GPR results presented in [25], the slope of the percentage drop increases with F up to time t ≈ 1.5. However, in contrast to the results from GPR reported in [25], LSTM shows a more stable behavior as a bigger percentage of the ensembles is explained by it compared to GPR in general. This is because LSTM is a local nonlinear attractor approximator and can better capture the mean local dynamics, while GPR is locally linear.

Figure 7: Mean RMSE of the most energetic mode (k = 8) and medium and low energy modes (k = 9, 10, 11) over 1 initial conditions for the Lorenz 96 system with forcing F = 8. Panels (a)-(d), one per wavenumber. Legend: 1% of the standard deviation from the attractor; MSM; GPR; GPR-MSM; LSTM; LSTM-MSM.

Figure 8: Average percentage over 5 initial conditions of the ensemble members evaluated using LSTM dynamics over time (axes: LSTM dynamics % against time) for different Lorenz 96 forcing regimes in the hybrid LSTM-MSM method. Legend: F = 4; F = 8; F = 16.

(b) Kuramoto-Sivashinsky Equation

The Kuramoto-Sivashinsky (K-S) system is extensively used in many scientific fields to model a multitude of chaotic physical phenomena. It was first derived by Kuramoto [49,50] as a turbulence model of the phase gradient of a slowly varying amplitude in a reaction-diffusion type medium with negative viscosity coefficient. Later, Sivashinsky [51] studied the spontaneous instabilities of the plane front of a laminar flame, ending up with the K-S equation, while in [52] the K-S equation is found to describe the surface behavior of viscous liquid in a vertical flow.

For our study, we restrict ourselves to the one dimensional K-S equation with boundary and initial conditions given by

$$
\begin{aligned}
\frac{\partial u}{\partial t} &= -\nu \frac{\partial^4 u}{\partial x^4} - \frac{\partial^2 u}{\partial x^2} - u \frac{\partial u}{\partial x}, \\
u(0, t) &= u(L, t) = \left. \frac{\partial u}{\partial x} \right|_{x=0} = \left. \frac{\partial u}{\partial x} \right|_{x=L} = 0, \\
u(x, 0) &= u_0(x),
\end{aligned}
\tag{4.9}
$$

where $u(x, t)$ is the modeled quantity of interest depending on a spatial variable $x \in [0, L]$ and time $t \in [0, \infty)$. The negative viscosity is modeled by the parameter $\nu > 0$. We impose Dirichlet and second-type boundary conditions to guarantee ergodicity [53]. In order to spatially discretize (4.9) we use a grid size $\Delta x$ with $D = L/\Delta x$ the number of nodes. Further, we denote with $u_i = u(i \Delta x)$ the value of $u$ at node $i \in \{0, \dots, D\}$. Discretization using a second order finite differences scheme yields

$$
\frac{d u_i}{d t} = -\nu \frac{u_{i-2} - 4 u_{i-1} + 6 u_i - 4 u_{i+1} + u_{i+2}}{\Delta x^4} - \frac{u_{i+1} - 2 u_i + u_{i-1}}{\Delta x^2} - \frac{u_{i+1}^2 - u_{i-1}^2}{4 \Delta x}.
\tag{4.10}
$$

Further, we impose $u_0 = u_{D+1} = 0$ and add ghost nodes $u_{-1} = u_1$, $u_{D+2} = u_D$ to account for the Dirichlet and second-type boundary conditions. In our analysis, the number of nodes is $D = 512$. The Kuramoto-Sivashinsky equation exhibits different levels of chaos depending on the bifurcation parameter $\tilde{L} = L / (2 \pi \sqrt{\nu})$ [54]. Higher values of $\tilde{L}$ lead to more chaotic systems [25].

Figure 9: Contour plots of $u(x, t)$ for different values of $\nu$ in steady state. Chaoticity rises with smaller values of $\nu$.

In our analysis the spatial variable bound is held constant to $L = 16$ and the chaoticity level is controlled through the negative viscosity $\nu$, where a smaller value leads to a system with a
higher level of chaos (see Figure 9). The temporal average of the state and the cumulative energy are plotted in Figure 10. As $\nu$ declines, chaoticity in the system rises and higher oscillations of the mean towards the Dirichlet boundary conditions are observed, while the number of modes needed to capture most of the energy is higher. In our study, we consider two values, namely $\nu = 1/10$ and $\nu = 1/16$, to benchmark the prediction skills of the proposed method. The discretized equation (4.10) is integrated with a time interval $dt = .2$ up to $T = 11$. The data points up to $T = 1$ are discarded as initial transients. Half of the remaining data ($N = 25$ samples) are used for training and the other half for validation.

Figure 10: Temporal average $\bar{u}$ and cumulative mode (PCA) energy for different values of $\nu$. Legend: $1/\nu = 10$; $1/\nu = 16$; $1/\nu = 36$.

(i) Dimensionality Reduction: Singular Value Decomposition

The dimensionality of the problem is reduced using Singular Value Decomposition (SVD). By subtracting the temporal mean $\bar{u}$ and stacking the data, we end up with the data matrix $U \in \mathbb{R}^{N \times 513}$, where $N$ is the number of data samples ($N = 5$ in our case). Performing SVD on $U$ leads to

$$
U = M \Sigma V^T, \quad M \in \mathbb{R}^{N \times N}, \quad \Sigma \in \mathbb{R}^{N \times 513}, \quad V \in \mathbb{R}^{513 \times 513},
\tag{4.11}
$$

with $\Sigma$ diagonal, with descending diagonal elements. The right singular vectors corresponding to the $r_{dim}$ largest singular values are the first columns of $V = [V_r, V_{-r}]$. Stacking these singular vectors yields $V_r \in \mathbb{R}^{513 \times r_{dim}}$. Assuming that $u_t \in \mathbb{R}^{513}$ is a vector of the discretized values of $u(x, t)$ at time $t$, in order to get a reduced order representation corresponding to the components with the highest energies (singular values) we multiply

$$
c = V_r^T u, \quad c \in \mathbb{R}^{r_{dim}}.
\tag{4.12}
$$

Applying SVD on the data matrix $U$ is equivalent to Principal Component Analysis on the covariance matrix, as in [25]. The percentage of cumulative energy with respect to the number of components (modes) considered is plotted in Figure 10. Further, the 90% threshold is plotted.
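The discretization (4.10) and the SVD reduction (4.11)-(4.12) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' code: the function names `ks_rhs` and `svd_reduce` are hypothetical, the state vector is assumed to hold the interior nodes $u_1, \dots, u_D$, and "energy" is assumed to mean the squared singular values.

```python
import numpy as np


def ks_rhs(u, nu, dx):
    """Right-hand side of the discretized K-S equation (4.10).

    Assumption: `u` holds the interior values u_1..u_D. The boundary
    values u_0 = u_{D+1} = 0 and the ghost nodes u_{-1} = u_1,
    u_{D+2} = u_D are appended here.
    """
    # Padded layout: [u_{-1}, u_0, u_1, ..., u_D, u_{D+1}, u_{D+2}]
    p = np.concatenate(([u[0], 0.0], u, [0.0, u[-1]]))
    c = p[2:-2]                    # u_i
    m1, p1 = p[1:-3], p[3:-1]      # u_{i-1}, u_{i+1}
    m2, p2 = p[:-4], p[4:]         # u_{i-2}, u_{i+2}
    d4 = (m2 - 4 * m1 + 6 * c - 4 * p1 + p2) / dx**4
    d2 = (p1 - 2 * c + m1) / dx**2
    adv = (p1**2 - m1**2) / (4 * dx)
    return -nu * d4 - d2 - adv


def svd_reduce(U, energy=0.9):
    """Project mean-subtracted snapshots (rows of U) as in (4.11)-(4.12).

    Assumption: r_dim is chosen as the smallest number of modes whose
    squared singular values reach the requested energy fraction.
    """
    _, s, Vt = np.linalg.svd(U, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    r_dim = min(int(np.searchsorted(cum, energy)) + 1, len(s))
    V_r = Vt[:r_dim].T             # columns: leading right singular vectors
    return U @ V_r, V_r            # rows of U @ V_r are c_t = V_r^T u_t
```

A time integrator (for instance a standard explicit or exponential scheme) would call `ks_rhs` at every step; the choice of integrator is not specified in this section and is left open here.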
In our study, we pick the $r_{dim} = 2$ (out of 512) most energetic modes, as they explain approximately 90% of the total energy. The reduced model state is then given by:

$$
c = [c_1, \dots, c_{r_{dim}}]^T.
\tag{4.13}
$$

(ii) Results

We train stateless LSTM models with $h = 1$ and $d = 5$. For testing, starting from 1 initial conditions uniformly sampled from the attractor, we generate a Gaussian ensemble of dimension $N = 5$ centered around the initial condition in the original space with standard deviation of
15 σ =.1. This ensemble is prpagated using the LSTM predictin mdels, and GPR, MSM and GPR-MSM mdels trained as in [25]. The rt mean square errr between the predicted ensemble mean and the grund-truth is pltted in Figures 11a, 11b fr different values f the parameter ν. All methds reach the invariant measure much faster fr 1/ν = 16 cmpared t the less chatic regime 1/ν = 1 (nte the different integratin times T = 4 fr 1/ν = 1, while T = 1.5 fr 1/ν = 16). In bth chatic regimes 1/ν = 1 and 1/ν = 16, the reduced rder LSTM utperfrms all ther methds in the shrt term befre escaping the attractr. Hwever, in the lng term, LSTM des nt stabilize and will eventually diverge faster than GPR (see Figure 11b). Blending LSTM with MSM alleviates the prblem and bth accurate shrt term predictins and lng term stability is attained. Mrever, the hybrid LSTM-MSM has better frecasting capabilities cmpared t GPR. The need fr blending LSTM with MSM in the KS equatin is less imperative as the system is less chatic than the Lrenz 96 and LSTM methds diverge much slwer, while they sufficiently capture the cmplex nnlinear dynamics. As the intrinsic dimensinality f the attractr rises LSTM diverges faster. The mean Anmaly Crrelatin (3.1) is pltted with respect t time in Figures 11c and 11d fr ν = 1 and 16 respectively. The evlutin f the AC justifies the afrementined analysis. The mean AC f the trajectry predicted with LSTM remains abve the predictability threshld f.6 fr a highest time duratin cmpared t ther methds. This predictability hrizn is apprximately 2.5 fr ν = 1/1 and.6 fr ν = 1/16, since the chaticity f the system rises and accurate predictins becme mre challenging. 15 rspa.ryalscietypublishing.rg Prc R Sc A 5 v = 1/1 6 v = 1/ RMSE(α1) 3 2 RMSE(α1) (a) (b) 1. v = 1/1 1. v = 1/16 Anmaly Crrelatin Anmaly Crrelatin (c) (d) Figure 11: Mean RMSE f the mst energetic mde and mean AC ver 1 initial cnditins fr the K-S equatin with 1/ν = 1 (11a,11c) and 1/ν = 16 ( 11c,11d). 
Standard deviatin frm the attractr ; AC predictability threshld ; MSM ; GPR ; GPR-MSM ; LSTM ; LSTM-MSM
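The evaluation protocol described above (a Gaussian ensemble around each initial condition, then the error of the ensemble mean against the reference trajectory) can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and the averaging over initial conditions is assumed to be a root mean square over squared errors, which this section does not spell out.

```python
import numpy as np


def gaussian_ensemble(x0, n_members, sigma, rng):
    """Draw perturbed initial conditions x0 + sigma * N(0, I)."""
    return x0 + sigma * rng.standard_normal((n_members, x0.size))


def ensemble_mean_rmse(forecasts, truth):
    """RMSE of the ensemble-mean forecast as a function of lead time.

    forecasts: array (n_ic, n_members, n_steps), predictions of one mode
    truth:     array (n_ic, n_steps), reference values of the same mode
    returns:   array (n_steps,)
    """
    mean_forecast = forecasts.mean(axis=1)          # average over members
    sq_err = (mean_forecast - truth) ** 2
    return np.sqrt(sq_err.mean(axis=0))             # average over initial conditions
```

Each ensemble member would be advanced by the forecasting model under test (LSTM, GPR, MSM or a hybrid) before the forecasts are stacked into the `forecasts` array.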

For the hybrid LSTM-MSM, the percentage of the ensemble members that are explained by LSTM dynamics is plotted in Figure 12. The quotient drops slower for $1/\nu = 10$ in the long run, as the intrinsic dimensionality of the attractor is smaller and trajectories diverge slower. However, in the beginning the LSTM percentage is higher for $1/\nu = 16$, as the MSM drives initial conditions close to the boundary faster towards the attractor due to the higher damping coefficients compared to the case $1/\nu = 10$. This explains the initial kink in the graph for $1/\nu = 16$. The slow damping coefficients for $1/\nu = 10$ do not allow the MSM to drive the trajectories back to the attractor at a faster pace than the diffusion caused by the LSTM forecasting.

Figure 12: Mean over 1 initial conditions of the percentage of ensemble members explained by the LSTM dynamics for the Kuramoto-Sivashinsky equation ($T = 1.5$). Legend: $1/\nu = 10$; $1/\nu = 16$.

(c) A Barotropic Climate Model

In this section, we examine a standard barotropic climate model [55] originating from a realistic winter circulation. The model equations are given by

$$
\frac{\partial \zeta}{\partial t} = -J(\psi, \zeta + f + h) + k_1 \zeta + k_2 \Delta^3 \zeta + \zeta^*,
\tag{4.14}
$$

where $\psi$ is the streamfunction, $\zeta = \Delta \psi$ the relative vorticity, $f$ the Coriolis parameter, $\zeta^*$ a constant vorticity forcing, while $k_1$ and $k_2$ are the Ekman damping and the scale-selective damping coefficient. $J$ is the Jacobi operator given by

$$
J(a, b) = \left( \frac{\partial a}{\partial \lambda} \frac{\partial b}{\partial \mu} - \frac{\partial a}{\partial \mu} \frac{\partial b}{\partial \lambda} \right),
\tag{4.15}
$$

where $\mu$ and $\lambda$ denote the sine of the geographical latitude and the longitude respectively. The equation of the barotropic model (4.14) is non-dimensionalized using the radius of the earth as unit length and the inverse of the earth angular velocity as time unit. The non-dimensional orography $h$ is related to the real Northern Hemisphere orography $h'$ by $h = 2 \sin(\phi_0) A_0 h' / H$, where $\phi_0$ is a fixed latitude of 45°N, $A_0$ is a factor expressing the strength of the surface wind blowing across the orography, and $H$ a scale height [55]. The streamfunction $\psi$ is expanded into a spherical harmonics series and truncated at wavenumber 21, while modes with an even total wavenumber are excluded, avoiding currents across the equator and ending up with a hemispheric model with 231 degrees of freedom.

The training data are obtained by integrating Eq. (4.14) for $10^5$ days after an initial spin-up period of 1 days, using a fourth-order Adams-Bashforth integration scheme with a 45-min time step in accordance with [25], with $k_1 = 15$ days, while $k_2$ is selected such that wavenumber 21 is damped at a time scale of 3 days. In this way we end up with a time series $\zeta_t$ with $10^4$ samples. The spherical surface is discretized into a mesh of size $D$ with equally spaced latitude and longitude. From the gathered data, 90% is used for training and 10% for validation. The mean and variance of the statistical steady state are shown in Figure 13a.
(i) Dimensionality Reduction: Classical Multidimensional Scaling

The original problem dimension of 231 is reduced using a generalized version of the classical multidimensional scaling method [56]. The procedure tries to identify an embedding with a lower dimensionality such that the pairwise inner products of the dataset are preserved. Assuming that the dataset consists of points $\zeta_i$, $i \in \{1, \dots, N\}$, whose reduced order representation is denoted with $y_i$, the procedure is equivalent to the solution of the following optimization problem

$$
\underset{y_1, \dots, y_N}{\text{minimize}} \; \sum_{i<j} \left( \langle \zeta_i, \zeta_j \rangle_\zeta - \langle y_i, y_j \rangle_y \right)^2,
\tag{4.16}
$$

where $\langle \cdot, \cdot \rangle_\zeta$ and $\langle \cdot, \cdot \rangle_y$ denote some well defined inner product of the original space $\zeta$ and the embedding space $y$ respectively. Problem (4.16) minimizes the total squared error between pairwise products. In case both products are the scalar products, the solution of (4.16) is equivalent to PCA. Assuming only $\langle \cdot, \cdot \rangle_y$ is the scalar product, problem (4.16) also accepts an analytic solution. Let $W_{ij} = \langle \zeta_i, \zeta_j \rangle_\zeta$ be the coefficients of the Gram matrix, $|k_1| \geq |k_2| \geq \dots \geq |k_N|$ its eigenvalues sorted in descending absolute value and $u_1, u_2, \dots, u_N$ the respective eigenvectors. The optimal $d$-dimensional embedding for a point $\zeta_n$ is given by

$$
y_n = \begin{bmatrix} k_1^{1/2} u_1^n \\ k_2^{1/2} u_2^n \\ \vdots \\ k_d^{1/2} u_d^n \end{bmatrix},
\tag{4.17}
$$
where $u_m^n$ denotes the $n$-th component of the $m$-th eigenvector. The optimality of (4.17) can be proven by the Eckart-Young-Mirsky theorem, as problem (4.16) is equivalent to finding the best rank-$d$ approximation in the Frobenius norm. In our problem, the standard kinetic energy product is used to preserve the nonlinear symmetries of the system dynamics [25]:

$$
\langle \zeta_i, \zeta_j \rangle_\zeta = \int_S \nabla \psi_i \cdot \nabla \psi_j \, dS = -\int_S \zeta_i \psi_j \, dS = -\int_S \zeta_j \psi_i \, dS,
\tag{4.18}
$$

where the last identities are derived using partial integration and the fact that $\zeta = \Delta \psi$. The energy spectrum of the modes of the reduced order space $y$ is plotted in Figure 13b.

Figure 13: Mean, variance and energy distribution of the barotropic model at statistical steady state. Panels: (a) mean and variance; (b) energy per mode; (c) cumulative energy in % against the number of modes used.

Solution (4.17) is only optimal with respect to the $N$ training data points used to construct the Gram matrix. In order to calculate the embedding for a new point, it is convenient to compute the empirical orthogonal functions (EOFs), which form an orthonormal basis of the reduced order space $y$ [25]. The EOFs are given by

$$
\phi_m = \sum_{n=1}^{N} k_m^{-1/2} u_m^n \zeta_n,
\tag{4.19}
$$

where $m$ runs from 1 to $d$. The EOFs are sorted in descending order according to their energy level. The first four EOFs are plotted in Figure 14. EOF analysis has been used to identify individual realistic climatic modes, known as teleconnections, such as the Arctic Oscillation (AO) [57,58]. The first EOF is characterized by a center of action over the Arctic that is surrounded by a zonal symmetric structure in mid-latitudes. This pattern resembles the Arctic Oscillation/Northern Hemisphere Annular Mode (AO/NAM) [57] and explains approximately 13.5% of the total energy. The second, third and fourth EOFs are quantitatively very similar to the East Atlantic/West Russia [59], the Pacific/North America (PNA) [60] and the Tropical/Northern Hemisphere (TNH) [61] patterns and account for 11.4%, 10.4% and 7.1% of the total energy respectively. Since these EOFs feature realistic climate teleconnections, performing accurate predictions of them is of high practical importance.

Figure 14: The four most energetic empirical orthogonal functions of the barotropic model.

As a consequence of the orthogonality of the EOFs with respect to the kinetic energy product, the reduced representation $y$ of a new state $\zeta$ can be recovered from

$$
y = \begin{bmatrix} \langle \zeta, \phi_1 \rangle_\zeta \\ \langle \zeta, \phi_2 \rangle_\zeta \\ \vdots \\ \langle \zeta, \phi_d \rangle_\zeta \end{bmatrix}.
\tag{4.20}
$$

In essence, the EOFs act as an orthogonal basis of the reduced order space and the new state $\zeta$ is projected onto this basis. Only the $d$ coefficients corresponding to the most energetic EOFs form the reduced order state $y$. In our study, the dimensionality of the reduced space is $r_{dim} = 30$, as $\phi_{30}$ contains only 3.65% of the energy of $\phi_1$, while the 30 most energetic modes contain approximately 82% of the total energy, as depicted in Figure 13c.
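The embedding (4.17), the EOFs (4.19) and the projection of a new state (4.20) can be checked numerically with a small sketch. The assumptions here are loud ones: the kinetic energy product is replaced by a generic weighted inner product $\langle a, b \rangle = a^T M b$ with a symmetric positive definite matrix `M` standing in for it, the function names are hypothetical, and the $d$ retained eigenvalues are assumed to be positive.

```python
import numpy as np


def mds_embedding(Z, M, d):
    """Classical MDS of the rows of Z under <a, b> = a^T M b.

    Returns the training embedding Y of (4.17), shape (N, d), and the
    EOFs of (4.19) as the columns of an array of shape (D, d).
    """
    W = Z @ M @ Z.T                            # Gram matrix W_ij = <z_i, z_j>
    k, u = np.linalg.eigh(W)                   # eigenvalues in ascending order
    order = np.argsort(np.abs(k))[::-1][:d]    # d largest in absolute value
    k, u = k[order], u[:, order]
    Y = u * np.sqrt(k)                         # y_n = [k_m^{1/2} u_m^n]_m
    eofs = (Z.T @ u) / np.sqrt(k)              # phi_m = sum_n k_m^{-1/2} u_m^n z_n
    return Y, eofs


def project(z, eofs, M):
    """Reduced state of a new point, y_m = <z, phi_m>, as in (4.20)."""
    return eofs.T @ (M @ z)
```

With this construction the EOFs are orthonormal under the chosen inner product, and projecting a training point with `project` reproduces its row of `Y`, which is the consistency between (4.17) and (4.20) that the text relies on.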

(ii) Training and Prediction

The reduced order state that we want to predict using the LSTM consists of the 30 components of $y$. A stateless LSTM with $h = 14$ hidden units is considered, while the truncated back-propagation horizon is set to $d = 1$. The prototypical system is less chaotic than the K-S equation and the Lorenz 96, which enables us to use more hidden units. The reason is that, as chaoticity is decreased, trajectories sampled from the attractor as training and validation datasets become more interconnected and the task is inherently easier and less prone to overfitting. In the extreme case of a periodic system, the information would be identical. 5 points are randomly picked from the attractor as initial conditions for testing. A Gaussian ensemble with a small variance ($\sigma_{en} = .1$) along each dimension is formed and marched using the reduced-order GPR, MSM, mixed GPR-MSM and LSTM methods.

(iii) Results

The RMSE of the four most energetic reduced order space variables $y_i$ for $i \in \{1, \dots, 4\}$ is plotted in Figure 15. The LSTM takes 4 to 5 h to reach the attractor, while GPR based methods generally take 3 to 4 h. In contrast, the MSM reaches the attractor already after 1 hour. This implies that the LSTM can better capture the non-linear dynamics compared to GPR. Note that the barotropic model is much less chaotic than the Lorenz 96 system with $F = 16$, where all methods show comparable prediction performance. Blended LSTM models with MSM are omitted here, as LSTM models only reach the attractor standard deviation towards the end of the simulated time and MSM-LSTM shows identical performance.

Figure 15: Mean RMSE of the most energetic EOFs ($y_1$ to $y_4$, panels a to d) over 5 initial conditions for the barotropic climate model, plotted against time in hours. Legend: standard deviation from the attractor; MSM; GPR; GPR-MSM; LSTM.

5. A Comment on Computational Cost of Prediction

The computational cost of making a single prediction can be quantified by the number of operations (multiplications and additions) needed. In GPR based approaches the computational cost in Landau notation is $O(N^2)$, where $N$ is the number of samples used in training. For the GPR methods illustrated in the previous section, $N \approx 25$. The GPR models the global dynamics by uniformly sampling the attractor and "carries" this training dataset at each time instant to identify the geometric relation between the input and the training dataset and make (exact) probabilistic inference on the output. In contrast, LSTM learns the behavior by adjusting its parameters, which leads to a prediction complexity that does not depend on the number of samples used for training. The inference complexity is roughly $O(d_i \cdot d \cdot h + d \cdot h^2)$, where $d_i$ is the dimension of each input, $d$ is the number of inputs and $h$ is the number of hidden units. This complexity is significantly smaller than that of GPR, which translates to faster prediction. Especially in real-time applications that require fast short-term predictions of a complex system, the LSTM has an advantage. However, it is to be expected that the LSTM is more prone to diverge from the attractor, as there is no guarantee that the infrequent training samples near the attractor limits were memorized. This remark explains the faster divergence of LSTM in the more turbulent regimes considered in Section 4.

6. Conclusions

We propose a data-driven method, based on long-short term memory networks, for modeling and prediction in the reduced space of chaotic dynamical systems. The LSTM uses the short term history of the reduced order variable to predict the state derivative and uses it for one-step prediction. The network is trained on time-series data and it requires no prior knowledge of the underlying governing equations. Using the trained network, long-term predictions are made by iteratively predicting one step forward.
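The iterative forecasting loop summarized above can be sketched in plain Python. Everything named here is an assumption for illustration: `derivative_model` is a hypothetical stand-in for the trained LSTM, and the one-step update is taken to be a forward Euler step with the predicted derivative, which is one natural reading of the text rather than a confirmed detail.

```python
def iterative_forecast(history, derivative_model, dt, n_steps):
    """Roll a one-step predictor forward in the reduced space.

    history: list of the last d reduced-order states (lists of floats)
    derivative_model: callable mapping such a history window to the
        predicted state derivative (hypothetical stand-in for the LSTM)
    """
    window = [list(state) for state in history]
    trajectory = []
    for _ in range(n_steps):
        dz = derivative_model(window)
        # One-step prediction from the latest state and its predicted derivative
        nxt = [z + dt * g for z, g in zip(window[-1], dz)]
        trajectory.append(nxt)
        window = window[1:] + [nxt]    # slide the short-term history
    return trajectory
```

Because each prediction is fed back as input, any one-step error is carried into all later steps, which is the error propagation mechanism discussed in this section.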
The features of the proposed technique are showcased through comparisons with GPR and MSM on benchmark cases. Three applications are considered: the Lorenz 96 system, the Kuramoto-Sivashinsky equation and a barotropic climate model. The chaoticity of these systems ranges from weakly chaotic to fully turbulent, ensuring a complete simulation study. Comparison measures include the RMSE and AC between the predicted trajectories and trajectories of the real dynamics.

In all cases, the proposed approach performs better in short term predictions, as the LSTM is more efficient in capturing the local dynamics and complex interactions between the modes. However, the prediction error propagates fast and the prediction, similar to GPR, does not converge to the invariant measure. Furthermore, in the cases of increased chaoticity the LSTM diverges faster than GPR. This may be attributed to the absence of certain attractor regions from the training data, insufficient training, and propagation of the exponentially increasing prediction error. To mitigate this effect, LSTM is also combined with MSM, following ideas presented in [25], in order to guarantee convergence to the invariant measure. Blending LSTM or GPR with MSM leads to a deterioration in the short term prediction performance, but the steady-state statistical behavior is captured. The hybrid LSTM-MSM exhibits slightly better performance than GPR-MSM in all systems considered in this study.

In the Kuramoto-Sivashinsky equation LSTM can capture the local dynamics better than in Lorenz 96, due to the lower intrinsic dimensionality of the attractor. The LSTM shows comparable forecasting accuracy with GPR in the barotropic model. There, the intrinsic dimensionality is significantly smaller than in Kuramoto-Sivashinsky and Lorenz 96, and both methods can effectively capture the dynamics. Moreover, the prediction error does not propagate as rapidly as in Lorenz 96, and the blended LSTM-MSM scheme is omitted.
Future directions include modeling the lower energy modes and interpolation errors using a stochastic component in the LSTM to improve the forecasting accuracy. Another possible


More information

Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff

Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeff Reading: Chapter 2 STATS 202: Data mining and analysis September 27, 2017 1 / 20 Supervised vs. unsupervised learning In unsupervised

More information

Math Foundations 20 Work Plan

Math Foundations 20 Work Plan Math Fundatins 20 Wrk Plan Units / Tpics 20.8 Demnstrate understanding f systems f linear inequalities in tw variables. Time Frame December 1-3 weeks 6-10 Majr Learning Indicatrs Identify situatins relevant

More information

(1.1) V which contains charges. If a charge density ρ, is defined as the limit of the ratio of the charge contained. 0, and if a force density f

(1.1) V which contains charges. If a charge density ρ, is defined as the limit of the ratio of the charge contained. 0, and if a force density f 1.0 Review f Electrmagnetic Field Thery Selected aspects f electrmagnetic thery are reviewed in this sectin, with emphasis n cncepts which are useful in understanding magnet design. Detailed, rigrus treatments

More information

A Matrix Representation of Panel Data

A Matrix Representation of Panel Data web Extensin 6 Appendix 6.A A Matrix Representatin f Panel Data Panel data mdels cme in tw brad varieties, distinct intercept DGPs and errr cmpnent DGPs. his appendix presents matrix algebra representatins

More information

and the Doppler frequency rate f R , can be related to the coefficients of this polynomial. The relationships are:

and the Doppler frequency rate f R , can be related to the coefficients of this polynomial. The relationships are: Algrithm fr Estimating R and R - (David Sandwell, SIO, August 4, 2006) Azimith cmpressin invlves the alignment f successive eches t be fcused n a pint target Let s be the slw time alng the satellite track

More information

Aerodynamic Separability in Tip Speed Ratio and Separability in Wind Speed- a Comparison

Aerodynamic Separability in Tip Speed Ratio and Separability in Wind Speed- a Comparison Jurnal f Physics: Cnference Series OPEN ACCESS Aerdynamic Separability in Tip Speed Rati and Separability in Wind Speed- a Cmparisn T cite this article: M L Gala Sants et al 14 J. Phys.: Cnf. Ser. 555

More information

Distributions, spatial statistics and a Bayesian perspective

Distributions, spatial statistics and a Bayesian perspective Distributins, spatial statistics and a Bayesian perspective Dug Nychka Natinal Center fr Atmspheric Research Distributins and densities Cnditinal distributins and Bayes Thm Bivariate nrmal Spatial statistics

More information
