Long Term Time Series Prediction with Multi-Input Multi-Output Local Learning
Gianluca Bontempi
Machine Learning Group, Département d'Informatique, Faculté des Sciences
ULB, Université Libre de Bruxelles, 1050 Bruxelles, Belgium
gbonte@ulb.ac.be

Abstract. Existing approaches to long term time series forecasting are based either on iterated one-step-ahead predictors or on direct predictors. In both cases the modeling techniques used to implement these predictors are multi-input single-output techniques. This paper discusses the limits of single-output approaches when the predictor is expected to return a long series of future values, and presents a multi-output approach to long term prediction. The motivation for this work is the fact that, when predicting multiple steps ahead of a time series, it can be worthwhile to exploit the information that one future series value carries about another future value. We propose a multi-output extension of our previous work on Lazy Learning, called LL-MIMO, and we introduce an averaging strategy over several long term predictors to improve the final accuracy. In order to show the effectiveness of the method, we present the results obtained on the three training time series of the ESTSP 08 competition.

1 Introduction

A regular time series is a sequence of measurements y_t of an observable taken at equal time intervals. Both a deterministic and a stochastic interpretation of the forecasting problem on the basis of a historical dataset exist. The deterministic interpretation is supported by the well-known Takens theorem [13], which implies that for a wide class of deterministic systems there exists a diffeomorphism (one-to-one differentiable mapping) between a finite window of the time series {y_{t-1}, y_{t-2}, ..., y_{t-m}} (lag vector) and the state of the dynamic system underlying the series.
This means that in theory there exists a multi-input single-output mapping (delay coordinate embedding) f: R^m -> R such that:

y_{t+1} = f(y_{t-d}, y_{t-d-1}, ..., y_{t-d-m+1})    (1)

where m (the embedding dimension) is the number of past values taken into consideration and d is the lag time. This formulation returns a state space description, where in the m-dimensional space the time series evolution is a trajectory and each point represents a temporal pattern of length m. The representation (1) does not take into account any noise component, since it assumes that a deterministic process f can accurately describe the time series. Note, however, that this is only one possible way of representing the time series phenomenon and that alternative representations should not be discarded a priori. In fact, once we assume that we do not have access to an accurate model of the function f, it is reasonable to extend the deterministic formulation (1) to a statistical Nonlinear AutoRegressive (NAR) formulation [8]

y_{t+1} = f(y_{t-d}, y_{t-d-1}, ..., y_{t-d-m+1}) + w(t)    (2)
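In practice this embedding is what turns forecasting into a supervised learning problem: the observed series is unrolled into input/output pairs by sliding the lag vector along it. A minimal sketch (the helper name and its interface are ours, for illustration only, not code from the paper):

```python
import numpy as np

def embed(series, m, d=0, H=1):
    """Turn a univariate series into pairs (X_i, Y_i): X_i holds the m
    lagged values y_{t-d}, ..., y_{t-d-m+1}, Y_i the next H values.
    Hypothetical helper for illustration only."""
    series = np.asarray(series, dtype=float)
    X, Y = [], []
    # t is the "present time"; we need m lags behind and H values ahead
    for t in range(d + m - 1, len(series) - H):
        X.append(series[t - d - m + 1 : t - d + 1][::-1])  # most recent lag first
        Y.append(series[t + 1 : t + H + 1])
    return np.array(X), np.array(Y)

X, Y = embed(np.arange(20.0), m=3, d=0, H=2)
print(X[0], Y[0])  # [2. 1. 0.] [3. 4.]
```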
where the missing information is lumped into a noise term w. In the rest of the paper we will refer to the formulation (2) as a general representation of the time series, which includes the case (1) as a particular instance. The success of a reconstruction approach starting from a set of observed data depends on the choice of the hypothesis that approximates f and on the choice of the order m and of the lag time d. In this paper we will address only the problem of modeling f, assuming that the values of m and d are available a priori or selected by conventional model selection techniques. Good references on order selection are [7, 16].

A model of the mapping (2) can be used for two objectives: one-step prediction and iterated prediction. In the first case, the m previous values of the series are assumed to be available and the problem is equivalent to a problem of function estimation. In the case of iterated prediction, the predicted output is fed back as an input to the following prediction, so the inputs consist of predicted values as opposed to actual observations of the original time series. A prediction iterated H times returns an H-step-ahead forecast. Examples of iterated approaches are recurrent neural networks [17] and local learning iterated techniques [9, 12]. Another way to perform H-step-ahead forecasting is to have a model which returns a direct forecast at time t+h, h = 1, ..., H:

y_{t+h} = f_h(y_{t-d}, y_{t-d-1}, ..., y_{t-d-m+1})

Direct methods often require high functional complexity in order to emulate the system. In some cases the direct prediction method yields better results than the iterated one [16]. An example of a combination of local techniques of iterated and direct type is provided by Sauer [15]. Iterated and direct techniques for multi-step-ahead prediction share a common feature: from historical data they model a multi-input single-output mapping, whose output is the variable y_{t+1} in the iterated case and the variable y_{t+h} in the direct case.
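The iterated strategy fits a one-step model and feeds its own outputs back as inputs. A sketch, where `model` stands in for any learner trained on (lag vector, next value) pairs; the interface is our assumption:

```python
import numpy as np

def iterated_forecast(model, last_window, H):
    """Iterated H-step-ahead prediction: each one-step estimate is fed
    back as the most recent lag. `model` maps a lag vector
    [y_t, ..., y_{t-m+1}] to an estimate of y_{t+1} (assumed interface)."""
    window = list(last_window)           # most recent value first
    preds = []
    for _ in range(H):
        y_next = model(np.array(window))
        preds.append(y_next)
        window = [y_next] + window[:-1]  # prediction replaces the oldest lag
    return np.array(preds)

# toy one-step model: for the series y_t = t, the next value is y_t + 1
preds = iterated_forecast(lambda w: w[0] + 1.0, [10.0, 9.0, 8.0], H=3)
print(preds)  # [11. 12. 13.]
```

Note the risk this makes explicit: after the first step the model is evaluated on its own noisy outputs rather than on observed values.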
This paper advocates that when a very long term prediction is at stake and a stochastic setting is assumed, the modeling of a single-output mapping neglects the existence of stochastic dependencies between future values (e.g. between y_{t+h} and y_{t+h+1}) and consequently biases the prediction accuracy. A possible way to remedy this shortcoming is to move from the modeling of a single-output mapping to the modeling of multi-output dependencies. This requires the adoption of a multi-output technique, where the predicted value is no longer a scalar quantity but a vector of future values of the time series. When there are multiple outputs it is common, apart from some exceptions [11], to treat the prediction problem as a set of independent problems, one per output. Unfortunately this is not effective if the output noises are correlated, as is the case in a time series. The contribution of this paper is a simple extension of the Lazy Learning paradigm to the multi-output setting [5, 2]. Lazy Learning (LL) is a local modeling technique which is query-based, in the sense that the whole learning procedure (i.e. structural and parametric identification) is deferred until a prediction is required. In previous works we presented an original Lazy Learning algorithm [5, 2] that automatically selects, on a query-by-query basis, the optimal number of neighbors. Iterated versions of Lazy Learning were successfully applied to multi-step-ahead time series prediction [4, 6]. This
paper presents instead a multi-output version of LL for the prediction of multiple and dependent outputs in the context of long term prediction.

2 Multi-step-ahead and multi-output models

Let us consider a stochastic time series of dimension m described by the stochastic dependency

y_{t+1} = f(y_{t-d}, y_{t-d-1}, ..., y_{t-d-m+1}) + w(t) = f(X) + w(t)    (3)

where w is a zero-mean noise term and X denotes the lag vector X = {y_{t-d}, y_{t-d-1}, ..., y_{t-d-m+1}}. Suppose we have measured the series up to time t and that we intend to forecast the next H, H >= 1, values. The problem of predicting the next H values boils down to the estimation of the distribution of the H-dimensional random vector Y = {y_{t+1}, ..., y_{t+H}} conditional on the value of X. In other terms, the stochastic dependency (2) between a future value of the time series and the past observed values X induces the existence of a multivariate conditional probability p(Y|X), where Y ∈ R^H and X ∈ R^m. This distribution can be highly complex in the case of a large dimensionality m of the series and a long prediction horizon H. An easy way to visualize and reason about this complex conditional distribution is a probabilistic graphical model. Probabilistic graphical models [10] are graphs in which nodes represent random variables and the lack of arcs represents conditional independence assumptions. For instance, the probabilistic dependencies which characterize a multi-step-ahead prediction problem for a time series of dimension m = 2, lag time d = 0 and horizon H = 3 can be represented by the graphical model in Figure 1. Note that in this figure X = {y_t, y_{t-1}} and Y = {y_{t+1}, y_{t+2}, y_{t+3}}. The graph shows that y_{t-1} has a direct influence on y_{t+1} but only an indirect influence on y_{t+2}. At the same time, y_{t+1} and y_{t+3} are not conditionally independent given the vector X = {y_t, y_{t-1}}. Any forecasting method which aims to perform multi-step-ahead prediction implements (often in an implicit manner) an estimator of the highly multivariate conditional distribution p(Y|X).
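The dependence between future values given X, which single-output methods ignore, is easy to exhibit numerically. The following Monte Carlo sketch (our own toy example, with m = 1 and a linear f, not an experiment from the paper) shows that y_{t+1} and y_{t+2} remain strongly correlated even after conditioning on the lag vector:

```python
import numpy as np

# Simulate many continuations of the NAR series y_{t+1} = f(y_t) + w(t)
# from the same conditioning value y_t = x, then measure the residual
# correlation between the two future steps.
rng = np.random.default_rng(0)
f = lambda y: 0.9 * y                    # toy f; noise std 0.5
x = 1.0                                  # observed lag vector X = {y_t} = {1.0}
n = 100_000
y1 = f(x) + 0.5 * rng.normal(size=n)     # draws of y_{t+1} | X
y2 = f(y1) + 0.5 * rng.normal(size=n)    # draws of y_{t+2} | X
rho = np.corrcoef(y1, y2)[0, 1]
print(rho > 0.5)  # True: far from conditionally independent
```

For this linear toy case the theoretical conditional correlation is about 0.67, which is what the simulation recovers.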
The graphical model representation can help in visualizing the differences between the two most common multi-step-ahead approaches, the iterated one and the direct one. The iterated prediction approach replaces the unknown random variables {y_{t+1}, ..., y_{t+H-1}} with their estimates {ŷ_{t+1}, ..., ŷ_{t+H-1}}. In graphical terms this method models an approximation (Figure 2) of the real conditional distribution, in which the topology of conditional dependencies is preserved though non-observable variables are replaced by their noisy estimators. The direct prediction approach transforms the problem of modeling the multivariate distribution p(Y|X) into H distinct and parallel problems, in which the target conditional distribution is p(y_{t+h}|X), h = 1, ..., H. The topology of the dependencies of the original conditional distribution is then altered, as shown in Figure 3. Note
Fig. 1: Graphical model representation of the conditional distribution p(Y|X) for H = 3, m = 2, d = 0.

Fig. 2: Graphical model representation of the distribution modeled by the iterated approach in the H = 3, m = 2, d = 0 prediction problem.
Fig. 3: Graphical model representation of the distribution modeled by the direct approach in the H = 3, m = 2, d = 0 prediction problem.

that in graphical model terminology this is equivalent to making the conditional independence assumption

p(Y|X) = p({y_{t+1}, ..., y_{t+H}} | X) = Π_{h=1}^{H} p(y_{t+h}|X)

Such an assumption is well known in the machine learning literature, since it is exploited by the Naive Bayes classifier to simplify multivariate classification problems. Figures 2 and 3 visualize the disadvantages associated with the adoption of the iterated and the direct method, respectively. Iterated methods may suffer from low performance in long-horizon tasks. This is due to the fact that they are essentially models tuned with a one-step-ahead criterion and are therefore not able to take the temporal behavior into account. In terms of the bias/variance decomposition, we can say that the iterated approach returns an unbiased estimator of the conditional distribution p(Y|X), since it preserves the dependencies between the components of the vector Y, though it suffers from high variance because of the propagation and amplification of the prediction error. On the other side, direct methods, by making an assumption of conditional independence, neglect complex dependency patterns existing between the variables in Y and consequently return a biased estimator of the multivariate distribution p(Y|X). In order to overcome these shortcomings, this paper proposes a multi-input multi-output approach in which the modeling procedure no longer targets single-output mappings (like y_{t+1} = f(X) + w or y_{t+h} = f_h(X) + w) but the multi-output mapping

Y = F(X) + W

where F: R^m -> R^H and the covariance of the noise vector W is not necessarily diagonal [11]. The multi-output model is expected to return a multivariate estimation of the joint distribution p(Y|X) and, by taking into account the dependencies between the components of Y, to reduce the bias of the direct estimator. However, it is worth noting that, in the case of a large forecasting
horizon H, the dimensionality of Y is large too, and the multivariate estimation could be vulnerable to large variance. A possible countermeasure to such a side effect is the adoption of combination strategies, which are well reputed to reduce variance in the case of low-bias estimators. The idea of combining predictors is well known in the time series literature [15]. What is original here is that a multi-output approach makes a large number of estimators available once the prediction horizon H is long. Think for example of the case where H = 20 and we want to estimate the value y_{t+10}. A simple way to make such an estimate more robust and accurate is to compute and combine several long term estimators which have a horizon larger than 10 (e.g. all the predictors with horizon between 10 and 20). For multi-output prediction problems the availability of learning algorithms is much more limited than in the single-output case [11]. Most existing approaches do what is actually done by the direct approach, that is, they decompose the problem into several multi-input single-output problems by making the assumption of conditional independence. What we propose here is to remove this assumption by using a multivariate estimation of the conditional distribution. For this purpose we adopt a nearest neighbor estimation approach, in which the problem of adjusting the size of the neighborhood (bandwidth) is solved by a strategy successfully adopted in our previous work on the Lazy Learning algorithm [5, 2].

3 A locally constant method for multi-output regression

We discuss here a locally constant multi-output regression method to implement a multi-step-ahead predictor. The idea is to return, instead of a scalar, a vector which smooths the continuations of the trajectories which, at time t, most resemble the trajectory X. This method is a multi-output extension of the Lazy Learning algorithm [5, 2] and is referred to as LL-MIMO.
The adoption of a local approach to solve a prediction task requires the definition of a set of model parameters (e.g. the number of neighbors, the kernel function, the parametric family, the distance metric). In the local learning literature different methods exist to select the adequate configuration automatically [1, 2] by adopting tools and techniques from the field of linear statistical analysis. One of these tools is the PRESS statistic, which is a simple, well-founded and economical way to perform leave-one-out (l-o-o) cross-validation and to assess the generalization performance of local linear models. By assessing the performance of each local model, alternative configurations can be tested and compared in order to select the best one in terms of expected prediction. This is known as the winner-takes-all approach to model selection. An alternative to the winner-takes-all approach was proposed in [5, 2] and consists in combining several local models, using the PRESS leave-one-out error to weight the contribution of each term. This appears to be particularly effective in large-variance settings [3], as is presumably the case in a stochastic multi-step-ahead task. LL-MIMO extends the bandwidth combination strategy to the multi-output case, where H denotes both the horizon of the long term prediction and the number of outputs. What we propose is a combination of local approximators with different bandwidths, where the weighting criterion depends on the multiple
step leave-one-out errors e_h, h = 1, ..., H, computed over the horizon H. In order to apply local learning to time series forecasting, the time series is embedded into a dataset D_N made of N pairs (X_i, Y_i), where X_i is a temporal pattern of length m and the vector Y_i is the consecutive temporal pattern of length H. Suppose the series is measured up to time t and assume for simplicity that the lag d = 0. Let X = {y_t, ..., y_{t-m+1}} denote the lag embedding vector at time t. Given a metric on the space R^m, let us order the set of vectors X_i increasingly with respect to their distance to X, and denote by [j] the index of the j-th closest neighbor of X. For a given number k of neighbors, the H-step prediction is a vector whose h-th component is the average

Ŷ_h^k = (1/k) Σ_{j=1}^{k} Y_h^{[j]}

where Y^{[j]} is the output vector of the j-th closest neighbor of X in the training set D_N. We can associate to the estimation Ŷ^k a multi-step leave-one-out error

E_k = (1/H) Σ_{h=1}^{H} e_h^2

where e_h is the leave-one-out error of the constant model used to approximate the output at step h. In the case of a constant model the l-o-o residual of the j-th neighbor is easy to derive [3]:

e_h^{[j]} = k (Y_h^{[j]} - Ŷ_h^k) / (k - 1)

Though the optimal number of neighbors k is not known a priori, in [5, 2] we showed that an effective strategy consists in (i) allowing k to vary in a set {k_1, ..., k_b} and (ii) returning a prediction which is a combination of the predictions Ŷ^{k_i} for each bandwidth k_i, i = 1, ..., b. If we adopt as combination strategy the generalized ensemble method proposed in [14], the outcome of the LL-MIMO algorithm is a vector of size H whose h-th term is

ŷ_{t+h} = Ŷ_h = ( Σ_{i=1}^{b} ζ_i Ŷ_h^{k_i} ) / ( Σ_{i=1}^{b} ζ_i ),   h = 1, ..., H    (4)

where the weights are the inverses of the multiple-step l-o-o mean square errors: ζ_i = 1/E_{k_i}.

4 Experiments and final considerations

The LL-MIMO approach has been tested by applying it to the prediction of the three time series from the ESTSP 08 Competition.
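Before looking at the results, the whole LL-MIMO predictor of eq. (4) can be sketched compactly. How E_k averages the leave-one-out residuals over neighbors as well as over the horizon is our reading of the derivation above, so treat that detail as an assumption rather than the paper's exact implementation:

```python
import numpy as np

def ll_mimo(X_train, Y_train, x_query, k_values=(5, 7, 9)):
    """Locally constant multi-output prediction: for each bandwidth k,
    average the H-dimensional outputs of the k nearest neighbors, then
    combine the bandwidths with weights zeta = 1 / E_k as in eq. (4)."""
    order = np.argsort(np.linalg.norm(X_train - x_query, axis=1))
    preds, weights = [], []
    for k in k_values:
        Yk = Y_train[order[:k]]            # outputs of the k nearest neighbors
        Y_hat = Yk.mean(axis=0)            # locally constant H-vector prediction
        loo = k * (Yk - Y_hat) / (k - 1)   # l-o-o residuals of a constant model
        E_k = np.mean(loo ** 2)            # multi-step l-o-o mean squared error
        preds.append(Y_hat)
        weights.append(1.0 / E_k)
    w = np.array(weights)
    return (w[:, None] * np.array(preds)).sum(axis=0) / w.sum()

rng = np.random.default_rng(1)
X_tr = rng.normal(size=(40, 3))
Y_tr = np.stack([X_tr[:, 0], X_tr[:, 0] + X_tr[:, 1]], axis=1)
Y_tr = Y_tr + 0.01 * rng.normal(size=Y_tr.shape)   # toy dataset with H = 2
y_hat = ll_mimo(X_tr, Y_tr, X_tr[0])
print(y_hat.shape)  # (2,)
```

The returned vector is the full H-step forecast, so the dependencies between outputs enter through the shared neighborhood rather than through H separately tuned models.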
The first time series (ESTSP1) has a training set of 354 three-dimensional vectors and the task is to predict the continuation of the third variable for H = 18 steps. The second time
series (ESTSP2) has a training set of 1300 values and the task is to predict the continuation for H = 100 steps. The third time series (ESTSP3) has a training set of values and the task is to predict the continuation for H = 200 steps. The experimental session aims to compare the following methods on a long term prediction task: (i) a conventional iterated approach, (ii) a direct approach, (iii) a multi-output LL-MIMO approach, (iv) a combination of several LL-MIMO predictors (denoted by LL-MIMO-COMB), and (v) a combination of the LL-MIMO and the iterated approach (denoted by LL-MIMO-IT). In the strategy LL-MIMO-COMB the prediction at time t+h is

ŷ_{t+h} = ( Σ_{H_j = h}^{H} Ŷ_h^{(H_j)} ) / (H - h + 1)

where Ŷ_h^{(H_j)} is the prediction of a multi-output LL-MIMO with horizon H_j. In the strategy LL-MIMO-IT the prediction is the average

ŷ_{t+h} = ( Ŷ_h^{(H)} + Ŷ_h^{it} ) / 2

where Ŷ^{it} is the prediction returned by an iterated scheme. The rationale behind these two averaging methods is the reduction of the variance, as discussed at the end of Section 2. Note that in all the considered techniques the learner is implemented by the same local learning technique, which combines a set of constant models whose numbers of neighbors range in the same interval [5, k_b], with k_b a parameter of the algorithm. In order to perform a correct comparison, all the techniques are tested under the same conditions in terms of test intervals, embedding order m, values of k_b and lag time d.
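The LL-MIMO-COMB average is straightforward once a family of multi-output predictors is available. In the sketch below, `predict` is a hypothetical callable standing in for LL-MIMO run with horizon H_j:

```python
import numpy as np

def ll_mimo_comb(predict, h, H):
    """Average the h-th component (1-based) of every multi-output
    predictor whose horizon H_j satisfies h <= H_j <= H."""
    vals = [predict(Hj)[h - 1] for Hj in range(h, H + 1)]
    return sum(vals) / (H - h + 1)

# toy forecaster: the Hj-horizon prediction of step i is i + 0.1 * Hj
toy = lambda Hj: np.array([i + 0.1 * Hj for i in range(1, Hj + 1)])
print(ll_mimo_comb(toy, h=2, H=4))
```

Each averaged predictor shares the low bias of LL-MIMO, so the combination mainly buys a reduction in variance, as argued at the end of Section 2.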
In detail: the series ESTSP1 is used to assess the five techniques on the last portion of the training set of size H = 18, for values of m ranging from 5 to 20, values of d ranging from 0 to 1 and k_b ranging from 10 to 25; the series ESTSP2 is used to assess the five techniques on the last portion of the training set of size H = 100, for values of m ranging from 5 to 35, values of d ranging from 0 to 1 and k_b ranging from 10 to 25; the series ESTSP3 is used to assess the five techniques on the last portion of the training set of size H = 200, for m ∈ {20, 50, 80, ..., 200}, values of d ranging from 0 to 2 and k_b ranging from 10 to 15. Table 1 compares the average NMSE (Normalized Mean Squared Error, where the normalization is done with respect to the variance of the entire series) prediction errors of the five techniques for the three datasets. The bold notation designates the technique which is significantly better than all the others (at the 0.05 significance level of the permutation test). Table 2 compares the minimum of the NMSE prediction errors attained by the five techniques over all the different configurations in terms of dimension m, lag time d and number of neighbors k_b. The experimental results show that for long term prediction tasks the LL-MIMO-COMB and LL-MIMO-IT strategies, i.e. the averaging formulations
Table 1: Average NMSE of the predictions for the three time series. The bold notation stands for significantly better than all the others at the 0.05 significance level of the paired permutation test.

Test data   LL-IT   LL-DIR   LL-MIMO   LL-MIMO-COMB   LL-MIMO-IT
ESTSP1
ESTSP2
ESTSP3      1.63e e e e e-2

Table 2: Minimum NMSE of the predictions for the three time series.

Test data   LL-IT   LL-DIR   LL-MIMO   LL-MIMO-COMB   LL-MIMO-IT
ESTSP1
ESTSP2
ESTSP3      1.00e e e e e-2

of the LL-MIMO algorithm, can outperform conventional direct and iterated methods. LL-MIMO alone does not emerge as a competitive algorithm, probably because of the excessive variance induced by the large dimensionality. The low-bias nature of LL-MIMO, however, makes this approach a good candidate for averaging, as demonstrated by the good performance of LL-MIMO-COMB and LL-MIMO-IT. On the basis of these experiences we decided to submit to the Competition the LL-MIMO-IT prediction of the continuation of ESTSP2, and the LL-MIMO-COMB predictions of the continuations of ESTSP1 and ESTSP3. A plot of the LL-MIMO-COMB prediction on the last portion of ESTSP3 is shown in Figure 4. We hope that the final validation provided by the Competition continuation series will confirm the importance of multi-output strategies in long term time series forecasting.

References

[1] C. G. Atkeson, A. W. Moore, and S. Schaal. Locally weighted learning. Artificial Intelligence Review, 11(1-5):11-73.
[2] M. Birattari, G. Bontempi, and H. Bersini. Lazy learning meets the recursive least-squares algorithm. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, NIPS 11, Cambridge, MA: MIT Press.
[3] G. Bontempi. Local Learning Techniques for Modeling, Prediction and Control. PhD thesis, IRIDIA, Université Libre de Bruxelles.
[4] G. Bontempi, M. Birattari, and H. Bersini. Lazy learning for iterated time series prediction. In J. A. K. Suykens and J.
Vandewalle, editors, Proceedings of the International Workshop on Advanced Black-Box Techniques for Nonlinear Modeling, Katholieke Universiteit Leuven, Belgium, 1998.
Fig. 4: ESTSP3: time series (line) vs. LL-MIMO-COMB prediction (dots).

[5] G. Bontempi, M. Birattari, and H. Bersini. Lazy learning for modeling and control design. International Journal of Control, 72(7/8).
[6] G. Bontempi, M. Birattari, and H. Bersini. Local learning for iterated time-series prediction. In I. Bratko and S. Dzeroski, editors, Machine Learning: Proceedings of the Sixteenth International Conference, pages 32-38, San Francisco, CA: Morgan Kaufmann Publishers.
[7] M. Casdagli, S. Eubank, J. D. Farmer, and J. Gibson. State space reconstruction in the presence of noise. Physica D, 51:52-98.
[8] J. Fan and Q. Yao. Nonlinear Time Series. Springer.
[9] J. D. Farmer and J. J. Sidorowich. Predicting chaotic time series. Physical Review Letters, 59(8).
[10] F. V. Jensen. Bayesian Networks and Decision Graphs. Springer.
[11] J. M. Matias. Multi-output nonparametric regression. In Progress in Artificial Intelligence.
[12] J. McNames, J. Suykens, and J. Vandewalle. Winning contribution of the K.U. Leuven time-series prediction competition. International Journal of Bifurcation and Chaos, to appear.
[13] N. H. Packard, J. P. Crutchfield, J. D. Farmer, and R. S. Shaw. Geometry from a time series. Physical Review Letters, 45(9).
[14] M. P. Perrone and L. N. Cooper. When networks disagree: Ensemble methods for hybrid neural networks. In R. J. Mammone, editor, Artificial Neural Networks for Speech and Vision. Chapman and Hall.
[15] T. Sauer. Time series prediction by using delay coordinate embedding. In A. S. Weigend and N. A. Gershenfeld, editors, Time Series Prediction: Forecasting the Future and Understanding the Past. Addison Wesley, Harlow, UK.
[16] A. Sorjamaa, J. Hao, N. Reyhani, Y. Ji, and A. Lendasse. Methodology for long-term prediction of time series. Neurocomputing.
[17] R. Williams and D. Zipser. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1, 1989.
Continuity and Differentiability Workseet (Be sure tat you can also do te grapical eercises from te tet- Tese were not included below! Typical problems are like problems -3, p. 6; -3, p. 7; 33-34, p. 7;
More informationHomework 1 Due: Wednesday, September 28, 2016
0-704 Information Processing and Learning Fall 06 Homework Due: Wednesday, September 8, 06 Notes: For positive integers k, [k] := {,..., k} denotes te set of te first k positive integers. Wen p and Y q
More information1. Questions (a) through (e) refer to the graph of the function f given below. (A) 0 (B) 1 (C) 2 (D) 4 (E) does not exist
Mat 1120 Calculus Test 2. October 18, 2001 Your name Te multiple coice problems count 4 points eac. In te multiple coice section, circle te correct coice (or coices). You must sow your work on te oter
More informationSymmetry Labeling of Molecular Energies
Capter 7. Symmetry Labeling of Molecular Energies Notes: Most of te material presented in tis capter is taken from Bunker and Jensen 1998, Cap. 6, and Bunker and Jensen 2005, Cap. 7. 7.1 Hamiltonian Symmetry
More informationBounds on the Moments for an Ensemble of Random Decision Trees
Noname manuscript No. (will be inserted by te editor) Bounds on te Moments for an Ensemble of Random Decision Trees Amit Durandar Received: / Accepted: Abstract An ensemble of random decision trees is
More informationCopyright c 2008 Kevin Long
Lecture 4 Numerical solution of initial value problems Te metods you ve learned so far ave obtained closed-form solutions to initial value problems. A closedform solution is an explicit algebriac formula
More informationBootstrap confidence intervals in nonparametric regression without an additive model
Bootstrap confidence intervals in nonparametric regression witout an additive model Dimitris N. Politis Abstract Te problem of confidence interval construction in nonparametric regression via te bootstrap
More informationIntuition Bayesian Classification
Intuition Bayesian Classification More ockey fans in Canada tan in US Wic country is Tom, a ockey ball fan, from? Predicting Canada as a better cance to be rigt Prior probability P(Canadian=5%: reflect
More information1 The concept of limits (p.217 p.229, p.242 p.249, p.255 p.256) 1.1 Limits Consider the function determined by the formula 3. x since at this point
MA00 Capter 6 Calculus and Basic Linear Algebra I Limits, Continuity and Differentiability Te concept of its (p.7 p.9, p.4 p.49, p.55 p.56). Limits Consider te function determined by te formula f Note
More informationContinuous Stochastic Processes
Continuous Stocastic Processes Te term stocastic is often applied to penomena tat vary in time, wile te word random is reserved for penomena tat vary in space. Apart from tis distinction, te modelling
More informationOnline Learning: Bandit Setting
Online Learning: Bandit Setting Daniel asabi Summer 04 Last Update: October 0, 06 Introduction [TODO Bandits. Stocastic setting Suppose tere exists unknown distributions ν,..., ν, suc tat te loss at eac
More informationCORRELATION TEST OF RESIDUAL ERRORS IN FREQUENCY DOMAIN SYSTEM IDENTIFICATION
IAC Symposium on System Identification, SYSID 006 Marc 9-3 006, Newcastle, Australia CORRELATION TEST O RESIDUAL ERRORS IN REQUENCY DOMAIN SYSTEM IDENTIICATION István Kollár *, Ri Pintelon **, Joan Scouens
More information3.1 Extreme Values of a Function
.1 Etreme Values of a Function Section.1 Notes Page 1 One application of te derivative is finding minimum and maimum values off a grap. In precalculus we were only able to do tis wit quadratics by find
More informationHandling Missing Data on Asymmetric Distribution
International Matematical Forum, Vol. 8, 03, no. 4, 53-65 Handling Missing Data on Asymmetric Distribution Amad M. H. Al-Kazale Department of Matematics, Faculty of Science Al-albayt University, Al-Mafraq-Jordan
More informationEnsembles of Nearest Neighbor Forecasts
Ensembles of Nearest Neighbor Forecasts Dragomir Yankov 1, Dennis DeCoste 2, and Eamonn Keogh 1 1 University of California, Riverside CA 92507, USA, {dyankov,eamonn}@cs.ucr.edu, 2 Yahoo! Research, 3333
More informationAn Empirical Bayesian interpretation and generalization of NL-means
Computer Science Tecnical Report TR2010-934, October 2010 Courant Institute of Matematical Sciences, New York University ttp://cs.nyu.edu/web/researc/tecreports/reports.tml An Empirical Bayesian interpretation
More informationFundamentals of Concept Learning
Aims 09s: COMP947 Macine Learning and Data Mining Fundamentals of Concept Learning Marc, 009 Acknowledgement: Material derived from slides for te book Macine Learning, Tom Mitcell, McGraw-Hill, 997 ttp://www-.cs.cmu.edu/~tom/mlbook.tml
More informationBootstrap prediction intervals for Markov processes
arxiv: arxiv:0000.0000 Bootstrap prediction intervals for Markov processes Li Pan and Dimitris N. Politis Li Pan Department of Matematics University of California San Diego La Jolla, CA 92093-0112, USA
More informationNatural Language Understanding. Recap: probability, language models, and feedforward networks. Lecture 12: Recurrent Neural Networks and LSTMs
Natural Language Understanding Lecture 12: Recurrent Neural Networks and LSTMs Recap: probability, language models, and feedforward networks Simple Recurrent Networks Adam Lopez Credits: Mirella Lapata
More information2.1 THE DEFINITION OF DERIVATIVE
2.1 Te Derivative Contemporary Calculus 2.1 THE DEFINITION OF DERIVATIVE 1 Te grapical idea of a slope of a tangent line is very useful, but for some uses we need a more algebraic definition of te derivative
More informationAnalysis of Solar Generation and Weather Data in Smart Grid with Simultaneous Inference of Nonlinear Time Series
Te First International Worksop on Smart Cities and Urban Informatics 215 Analysis of Solar Generation and Weater Data in Smart Grid wit Simultaneous Inference of Nonlinear Time Series Yu Wang, Guanqun
More informationFlavius Guiaş. X(t + h) = X(t) + F (X(s)) ds.
Numerical solvers for large systems of ordinary differential equations based on te stocastic direct simulation metod improved by te and Runge Kutta principles Flavius Guiaş Abstract We present a numerical
More informationMinimizing D(Q,P) def = Q(h)
Inference Lecture 20: Variational Metods Kevin Murpy 29 November 2004 Inference means computing P( i v), were are te idden variables v are te visible variables. For discrete (eg binary) idden nodes, exact
More informationConsider a function f we ll specify which assumptions we need to make about it in a minute. Let us reformulate the integral. 1 f(x) dx.
Capter 2 Integrals as sums and derivatives as differences We now switc to te simplest metods for integrating or differentiating a function from its function samples. A careful study of Taylor expansions
More informationPreface. Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.
Preface Here are my online notes for my course tat I teac ere at Lamar University. Despite te fact tat tese are my class notes, tey sould be accessible to anyone wanting to learn or needing a refreser
More informationProbabilistic Graphical Models Homework 1: Due January 29, 2014 at 4 pm
Probabilistic Grapical Models 10-708 Homework 1: Due January 29, 2014 at 4 pm Directions. Tis omework assignment covers te material presented in Lectures 1-3. You must complete all four problems to obtain
More informationEFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS
Statistica Sinica 24 2014, 395-414 doi:ttp://dx.doi.org/10.5705/ss.2012.064 EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Jun Sao 1,2 and Seng Wang 3 1 East Cina Normal University,
More informationPOLYNOMIAL AND SPLINE ESTIMATORS OF THE DISTRIBUTION FUNCTION WITH PRESCRIBED ACCURACY
APPLICATIONES MATHEMATICAE 36, (29), pp. 2 Zbigniew Ciesielski (Sopot) Ryszard Zieliński (Warszawa) POLYNOMIAL AND SPLINE ESTIMATORS OF THE DISTRIBUTION FUNCTION WITH PRESCRIBED ACCURACY Abstract. Dvoretzky
More informationNew Streamfunction Approach for Magnetohydrodynamics
New Streamfunction Approac for Magnetoydrodynamics Kab Seo Kang Brooaven National Laboratory, Computational Science Center, Building 63, Room, Upton NY 973, USA. sang@bnl.gov Summary. We apply te finite
More informationOn the Identifiability of the Post-Nonlinear Causal Model
UAI 9 ZHANG & HYVARINEN 647 On te Identifiability of te Post-Nonlinear Causal Model Kun Zang Dept. of Computer Science and HIIT University of Helsinki Finland Aapo Hyvärinen Dept. of Computer Science,
More informationNew Distribution Theory for the Estimation of Structural Break Point in Mean
New Distribution Teory for te Estimation of Structural Break Point in Mean Liang Jiang Singapore Management University Xiaou Wang Te Cinese University of Hong Kong Jun Yu Singapore Management University
More informationMath 1241 Calculus Test 1
February 4, 2004 Name Te first nine problems count 6 points eac and te final seven count as marked. Tere are 120 points available on tis test. Multiple coice section. Circle te correct coice(s). You do
More informationTwo Spirals Two Gaussians Letters
12 1 8 6 4 2 Two Spirals Two Gaussians Letters Figure 8: Number of examples needed for average error to reac.3. From left to rigt: random, uncertainty, maximal distance and lookaead sampling metods. contains
More information1. Which one of the following expressions is not equal to all the others? 1 C. 1 D. 25x. 2. Simplify this expression as much as possible.
004 Algebra Pretest answers and scoring Part A. Multiple coice questions. Directions: Circle te letter ( A, B, C, D, or E ) net to te correct answer. points eac, no partial credit. Wic one of te following
More informationOn Local Linear Regression Estimation of Finite Population Totals in Model Based Surveys
American Journal of Teoretical and Applied Statistics 2018; 7(3): 92-101 ttp://www.sciencepublisinggroup.com/j/ajtas doi: 10.11648/j.ajtas.20180703.11 ISSN: 2326-8999 (Print); ISSN: 2326-9006 (Online)
More informationOverdispersed Variational Autoencoders
Overdispersed Variational Autoencoders Harsil Sa, David Barber and Aleksandar Botev Department of Computer Science, University College London Alan Turing Institute arsil.sa.15@ucl.ac.uk, david.barber@ucl.ac.uk,
More informationTime (hours) Morphine sulfate (mg)
Mat Xa Fall 2002 Review Notes Limits and Definition of Derivative Important Information: 1 According to te most recent information from te Registrar, te Xa final exam will be eld from 9:15 am to 12:15
More informationArtificial Neural Network Model Based Estimation of Finite Population Total
International Journal of Science and Researc (IJSR), India Online ISSN: 2319-7064 Artificial Neural Network Model Based Estimation of Finite Population Total Robert Kasisi 1, Romanus O. Odiambo 2, Antony
More informationDeep Belief Network Training Improvement Using Elite Samples Minimizing Free Energy
Deep Belief Network Training Improvement Using Elite Samples Minimizing Free Energy Moammad Ali Keyvanrad a, Moammad Medi Homayounpour a a Laboratory for Intelligent Multimedia Processing (LIMP), Computer
More informationLearning to Reject Sequential Importance Steps for Continuous-Time Bayesian Networks
Learning to Reject Sequential Importance Steps for Continuous-Time Bayesian Networks Jeremy C. Weiss University of Wisconsin-Madison Madison, WI, US jcweiss@cs.wisc.edu Sriraam Natarajan Indiana University
More informationA Jump-Preserving Curve Fitting Procedure Based On Local Piecewise-Linear Kernel Estimation
A Jump-Preserving Curve Fitting Procedure Based On Local Piecewise-Linear Kernel Estimation Peiua Qiu Scool of Statistics University of Minnesota 313 Ford Hall 224 Curc St SE Minneapolis, MN 55455 Abstract
More informationLearning based super-resolution land cover mapping
earning based super-resolution land cover mapping Feng ing, Yiang Zang, Giles M. Foody IEEE Fellow, Xiaodong Xiuua Zang, Siming Fang, Wenbo Yun Du is work was supported in part by te National Basic Researc
More informationThe Verlet Algorithm for Molecular Dynamics Simulations
Cemistry 380.37 Fall 2015 Dr. Jean M. Standard November 9, 2015 Te Verlet Algoritm for Molecular Dynamics Simulations Equations of motion For a many-body system consisting of N particles, Newton's classical
More informationDifferentiation in higher dimensions
Capter 2 Differentiation in iger dimensions 2.1 Te Total Derivative Recall tat if f : R R is a 1-variable function, and a R, we say tat f is differentiable at x = a if and only if te ratio f(a+) f(a) tends
More informationSolution for the Homework 4
Solution for te Homework 4 Problem 6.5: In tis section we computed te single-particle translational partition function, tr, by summing over all definite-energy wavefunctions. An alternative approac, owever,
More informationFast optimal bandwidth selection for kernel density estimation
Fast optimal bandwidt selection for kernel density estimation Vikas Candrakant Raykar and Ramani Duraiswami Dept of computer science and UMIACS, University of Maryland, CollegePark {vikas,ramani}@csumdedu
More informationMulti-User Communication: Capacity, Duality, and Cooperation. Nihar Jindal Stanford University Dept. of Electrical Engineering
Multi-User Communication: Capacity, Duality, and Cooperation Niar Jindal Stanford University Dept. of Electrical Engineering February 3, 004 Wireless Communication Vision Cellular Networks Wireless LAN
More informationAverage Rate of Change
Te Derivative Tis can be tougt of as an attempt to draw a parallel (pysically and metaporically) between a line and a curve, applying te concept of slope to someting tat isn't actually straigt. Te slope
More informationBob Brown Math 251 Calculus 1 Chapter 3, Section 1 Completed 1 CCBC Dundalk
Bob Brown Mat 251 Calculus 1 Capter 3, Section 1 Completed 1 Te Tangent Line Problem Te idea of a tangent line first arises in geometry in te context of a circle. But before we jump into a discussion of
More informationHOW TO DEAL WITH FFT SAMPLING INFLUENCES ON ADEV CALCULATIONS
HOW TO DEAL WITH FFT SAMPLING INFLUENCES ON ADEV CALCULATIONS Po-Ceng Cang National Standard Time & Frequency Lab., TL, Taiwan 1, Lane 551, Min-Tsu Road, Sec. 5, Yang-Mei, Taoyuan, Taiwan 36 Tel: 886 3
More informationSECTION 1.10: DIFFERENCE QUOTIENTS LEARNING OBJECTIVES
(Section.0: Difference Quotients).0. SECTION.0: DIFFERENCE QUOTIENTS LEARNING OBJECTIVES Define average rate of cange (and average velocity) algebraically and grapically. Be able to identify, construct,
More informationLecture XVII. Abstract We introduce the concept of directional derivative of a scalar function and discuss its relation with the gradient operator.
Lecture XVII Abstract We introduce te concept of directional derivative of a scalar function and discuss its relation wit te gradient operator. Directional derivative and gradient Te directional derivative
More informationA MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES
A MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES Ronald Ainswort Hart Scientific, American Fork UT, USA ABSTRACT Reports of calibration typically provide total combined uncertainties
More informationTHE IDEA OF DIFFERENTIABILITY FOR FUNCTIONS OF SEVERAL VARIABLES Math 225
THE IDEA OF DIFFERENTIABILITY FOR FUNCTIONS OF SEVERAL VARIABLES Mat 225 As we ave seen, te definition of derivative for a Mat 111 function g : R R and for acurveγ : R E n are te same, except for interpretation:
More informationEDML: A Method for Learning Parameters in Bayesian Networks
: A Metod for Learning Parameters in Bayesian Networks Artur Coi, Kaled S. Refaat and Adnan Darwice Computer Science Department University of California, Los Angeles {aycoi, krefaat, darwice}@cs.ucla.edu
More informationPre-Calculus Review Preemptive Strike
Pre-Calculus Review Preemptive Strike Attaced are some notes and one assignment wit tree parts. Tese are due on te day tat we start te pre-calculus review. I strongly suggest reading troug te notes torougly
More informationarxiv: v1 [math.oc] 18 May 2018
Derivative-Free Optimization Algoritms based on Non-Commutative Maps * Jan Feiling,, Amelie Zeller, and Cristian Ebenbauer arxiv:805.0748v [mat.oc] 8 May 08 Institute for Systems Teory and Automatic Control,
More informationAdaptive Neural Filters with Fixed Weights
Adaptive Neural Filters wit Fixed Weigts James T. Lo and Justin Nave Department of Matematics and Statistics University of Maryland Baltimore County Baltimore, MD 150, U.S.A. e-mail: jameslo@umbc.edu Abstract
More informationChapter 2 Ising Model for Ferromagnetism
Capter Ising Model for Ferromagnetism Abstract Tis capter presents te Ising model for ferromagnetism, wic is a standard simple model of a pase transition. Using te approximation of mean-field teory, te
More informationFunction Composition and Chain Rules
Function Composition and s James K. Peterson Department of Biological Sciences and Department of Matematical Sciences Clemson University Marc 8, 2017 Outline 1 Function Composition and Continuity 2 Function
More informationMath 312 Lecture Notes Modeling
Mat 3 Lecture Notes Modeling Warren Weckesser Department of Matematics Colgate University 5 7 January 006 Classifying Matematical Models An Example We consider te following scenario. During a storm, a
More information