Stochastic Volatility Models with Auto-regressive Neural Networks


AUSTRALIAN NATIONAL UNIVERSITY
PROJECT REPORT

Stochastic Volatility Models with Auto-regressive Neural Networks

Author: Aditya KHAIRE
Supervisor: Adj/Prof. Hanna SUOMINEN
Co-Supervisor: Dr. Young LEE

A report submitted in fulfillment of the requirements for the subject Special Topics in Computing (COMP6470) in the Department of Computer Science

October 28, 2016


AUSTRALIAN NATIONAL UNIVERSITY

Abstract

Dr. Weifa Liang
Department of Computer Science

Stochastic Volatility Models with Auto-regressive Neural Networks
by Aditya KHAIRE

Financial time series data often exhibit high volatility, which makes them unpredictable and their future values very hard to forecast. Econometrics offers several models for the stochastic volatility of such data, but they focus mainly on describing the mean and variance of the time-dependent series; predicting the variance of the next time interval is not their main concern. The algorithm proposed in this project focuses on predicting the variance of a time-dependent data set.


Acknowledgements

I would like to express my gratitude to my supervisor, Adj/Prof. Hanna SUOMINEN, and my co-supervisor, Dr. Young LEE, who gave me the opportunity to work on the topic Stochastic Volatility Models with Auto-regressive Neural Networks; the project also led me to a great deal of research through which I learned many new things, and I am really thankful to them. The thesis was partly carried out at National Information and Communication Technology Australia (NICTA) and its successor Data61. I also express my gratitude to Mr. Kar Wai Lim, my advisor at NICTA/Data61 and the ANU. Finally, I would like to thank my parents and friends, who helped me a lot in finishing this project within the limited time frame.


Contents

Abstract
Acknowledgements

1 Introduction
  1.1 Introduction
  1.2 Literature Review

2 Gaussian Process Volatility Model
  2.1 Auto-regressive Models
  2.2 Stationarity
  2.3 Stochastic Process
  2.4 Gaussian Processes
  2.5 The Stochastic Volatility Model
    2.5.1 The priors
  2.6 Heteroscedasticity
  2.7 Generalized Auto-regressive Conditionally Heteroscedastic (GARCH)
    2.7.1 ARCH(1) Processes
    2.7.2 GARCH(1,1) Process
  2.8 Model GARCH Process
  2.9 Results on GARCH

3 Auto Regressive Neural Network (AR-NN)
  3.1 Feed Forward Neural Network
  3.2 Design of Auto Regressive Neural Network
    3.2.1 Time Series Data Set
    3.2.2 Weight-space
    3.2.3 Activation Function
    3.2.4 Back Propagation Algorithm
    3.2.5 Practical Reducibility
  3.3 Function
  3.4 Results
    3.4.1 Alpha (when α = 1, |α| ≤ 1, α > 1)
    3.4.2 Discussion on Activation Function
  3.5 Prediction Results

4 Comparison of Models
  4.1 Comparison between SV, GARCH and AR-NN

5 Summary
  Conclusion
  Recommendation
  Future Work

Independent Study Contract

Bibliography

List of Figures

2.1 Prediction and Observation for the GARCH model
3.1 Auto Regressive Neural Network
3.2 AUD/USD daily returns
3.3 Error
3.4 Error Function with α = 1
3.5 Error Function with |α| ≤ 1
3.6 Error Function with α > 1
3.7 Logistic Activation Function
3.8 tanh Activation Function
3.9 AR-NN Prediction and Observed
3.10 Error on Train set
Contract
Contract Page
Contract Page


List of Tables

2.1 Values for parameters ω, α, β
2.2 Error on GARCH model
Number of iterations required for convergence


List of Abbreviations

MCMC     Markov Chain Monte Carlo
AR-NN    Auto-regressive Neural Network
GARCH    Generalized Auto-Regressive Conditional Heteroskedasticity
ARCH     Auto-Regressive Conditional Heteroskedasticity
SV       Stochastic Volatility
NN       Neural Network
AUD/USD  Australian Dollar / U.S. Dollar


Chapter 1

Introduction

1.1 Introduction

Financial time series data from sources such as the stock market or foreign exchange are time dependent and have the time index as one of their variables. It is often observed in financial time series that large changes tend to be followed by further large changes, while small changes are followed by small changes; this phenomenon is referred to as volatility clustering (Nicolas Chapados, 2012). Stochastic volatility models are used for modelling such financial data, with Gaussian noise included as an additive factor. When non-linear stochastic volatility is modelled with a linear state space, the model becomes complicated, because the Gaussian noise is transformed through the non-linearity and no longer maps to a Gaussian distribution.

The approach for predicting time series data differs from that for time-independent data, which can be transformed and is easy to manipulate; the same cannot be assumed when dealing with time-dependent data. Studies have shown that time series data are easy to model when they are linear with a uniform distribution over time, but when a non-linear time series has to be handled it is usually converted into a linear form and then modelled with a linear distribution. In the linear approach, Bayesian inference is the most suitable way of modelling the posterior mean and variance: the posterior mean corresponds to the time-varying standard deviation and can be modelled through the stochastic volatility model.

Neural models are non-linear in nature because of the activation functions used in the hidden layer of the network. Mapping non-linear financial data onto a neural model has its own advantages, since a neural network has the universal approximation property: with a proper selection of prior parameters the model can achieve a reasonably good predictor. The normal feed-forward neural network cannot be used directly for time series data, because it depends only on non-linear factors, which makes it difficult to map time-indexed data. A different neural model, the Auto-regressive Neural Network, is therefore used in this project for prediction analysis. The proposed Auto-regressive Neural model differs from the feed-forward model in that it combines a linear model with additive Gaussian noise when building the final model. The prediction from the neural model can be compared with the posterior mean from the linear Gaussian model, which in turn minimizes the mean squared error loss.

1.2 Literature Review

In recent years the financial market has become more volatile after the 2008 crisis and has remained unstable for a long time since then. There was no single specific reason for the crisis, but during this period financial returns were extremely high or low in comparison with normal periods. When building a portfolio on risk assessment, volatility and other factors always depend on time series analysis. These are major factors in the financial market and have

always been unpredictable, because statistical models are inefficient at predicting them. As pointed out in (Nicolas Chapados, 2012), large changes in the observations are followed by large changes, while small changes are followed by small changes; this is referred to as volatility clustering.

Time series analysis is a field in economics which tries to estimate various quantities by observing past values. Time series data are difficult to model because their mean and variance are non-constant. Broadly, there are two types of models in time series analysis: linear models and non-linear models. In linear models, shocks are uncorrelated but are not assumed to be identical and independent; in non-linear models, shocks are assumed to be identical and independent (Ruppert, 2001). Financial series are stochastic in nature with non-constant mean and variance, so the usual regression approach is not beneficial in this field. Approaches used for modelling stochastic volatility in the financial market include Bayesian inference (Nicolas Chapados, 2012) and stochastic volatility with Markov Chain Monte Carlo (MCMC) (Nicolas Chapados, 2012). These models are complex in nature and difficult to build. Hence the most common model has been Auto-Regressive Conditional Heteroskedasticity (ARCH), and a more recently used model is GARCH (Generalized Auto-Regressive Conditional Heteroskedasticity), an improved version of ARCH. These models are important in financial time series analysis because they exhibit the volatility clustering phenomenon, whereas models such as Bayesian inference and stochastic volatility with MCMC are too complicated and not very popular for real-world time series modelling. Relatively less complex models are preferred in the financial market because they are stable, capture some non-linearity and are easier to implement. Along with the advantages of modelling volatility with GARCH, it has a significant disadvantage: it is not able to model the hidden non-linearity in the data, because the GARCH model relies only on the previous forecast and variance to predict the current value. GARCH/ARCH models are used specifically for non-linearity in the data or for structural regression coefficients (Dietz, 2010).

The models discussed above were never developed as prediction models, but rather to calculate the posterior variance and mean of the time series data. Predicting the non-linear volatility in financial time series is complex, as there is no specific neural network designed for such prediction. However, the neural network has a property called universal approximation (Dietz, 2010), which is helpful in approximating any function and can thus be utilised in prediction analysis. The universal approximation property says that a network with a continuous, bounded and non-constant activation function can approximate any function. This property makes neural networks suitable for various applications and inspired us to explore them in the field of financial time series analysis. We do not use the normal feed-forward neural network, as it does not complement the linearity in the time series data; instead, the model proposed in (Dietz, 2010) serves as the foundation for our modelling.

Chapter 2

Gaussian Process Volatility Model

2.1 Auto-regressive Models

A time series is a sequence of variables measured over time at uniformly spaced intervals, for example monthly, daily or yearly. Time series data are Markov dependent with higher-order lags. In univariate state-space models a time series is represented by the auto-regressive model AR(p). The series {y_t} means y measured at time t with a uniform time spacing. The AR(1) model regresses y_t on the past value y_{t-1}, the previous value of the series:

y_t = φ y_{t-1} + ε_t    (2.1)

where ε_t ~ N(0, 1) is additive white noise with µ = 0 and σ² = 1, uncorrelated with the past values y_{t-1} of the AR series. ε_t represents the new contribution to y_t; these terms are known as the random shocks or innovations of the series. The equation is termed auto-regressive because it is actually a linear regression model for y_t in terms of y_{t-1}; that is, y_t is modelled as a regression on its own past y_{t-1}. The value of φ strongly affects the behaviour of the AR(1) process. If |φ| < 1, the weights given to shocks ε_t that occurred a long time ago are extremely small, which makes the series stationary, so that the mean and variance of the model remain constant as t grows. If |φ| > 1, the weights given to distant shocks are much greater than those given to more recent ones; the model is said to be explosive, as the series mean and variance tend to grow exponentially with t. Finally, if φ = 1, the model is neither stationary nor explosive and is called a random walk.

2.2 Stationarity

An AR(p) process is stationary if the following properties of the time series hold:

- E[y_t] = 0 for all t;
- Var(y_t) = Σ_{j=0}^{∞} σ² ψ_j² for all t, which can only hold if the weights ψ_j decay rapidly to 0 as j → ∞;
- Cov(y_t, y_{t-k}) = γ(k) is a function of the ψ_j weights and depends only on the lag k, not on t.

When the order p ≥ 3, the restrictions on the coefficients φ increase and the model becomes much more unstable when trying to predict; moreover, if the model over-fits the training data, the generalization error measured on the test data will increase.
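To make the role of φ concrete, the short simulation below (not part of the original report; the parameter choices are illustrative) generates AR(1) series following equation (2.1) for a stationary, a random-walk and an explosive value of φ.

```python
import numpy as np

def simulate_ar1(phi, n=500, sigma=1.0, seed=0):
    """Simulate y_t = phi * y_{t-1} + eps_t with eps_t ~ N(0, sigma^2), as in eq. (2.1)."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n)
    eps = rng.normal(0.0, sigma, size=n)
    for t in range(1, n):
        y[t] = phi * y[t - 1] + eps[t]
    return y

for phi in (0.5, 1.0, 1.05):          # stationary, random walk, explosive
    y = simulate_ar1(phi)
    print(f"phi={phi:5.2f}  mean={y.mean():10.3f}  var={y.var():14.3f}")
```

With |φ| < 1 the sample mean and variance stay bounded, while the explosive case grows rapidly, matching the discussion above.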

2.3 Stochastic Process

A stochastic process is a collection of random variables {X_t}, indexed by a parameter t in an index set T, where t represents time. A stochastic process that depends on time is a simple process which evolves at specific times according to specific probabilistic rules. Thus the state space assumed in the time-dependent analysis evolves as a stationary process in which the probabilistic rules of the transition matrix stay constant. If π is a stationary distribution and P is a transition probability,

P(X, Y) = p for all t ≥ 0.    (2.2)

A measure µ, with µ(x) = q_x π(x), that is stationary for X is also stationary for Y. For a discrete-time process, the random variable X_n depends on earlier values of the process X_{n-1}, X_{n-2}, ..., so the conditional distributions have the form

Pr(X_{t_k} | X_{t_{k-1}}, X_{t_{k-2}}, ..., X_{t_1})    (2.3)

for some set of times t_k > t_{k-1} > t_{k-2} > ... > t_1. Time-dependent stochastic processes that satisfy the Markov property obey

Pr(X_{t_k} | X_{t_{k-1}}, X_{t_{k-2}}, ..., X_{t_1}) = Pr(X_{t_k} | X_{t_{k-1}}).    (2.4)

Stochastic processes that satisfy the Markov property are easy to model; the stock exchange and the exchange rate, which are also time dependent, are examples.

2.4 Gaussian Processes

A Gaussian process (GP) is a generalization of the Gaussian distribution in which probability is distributed over functions with a specified mean and covariance; a GP extends the multivariate Gaussian distribution to infinite dimensionality. Definition: a Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution. Let x ∈ R^D index into the real process f(x). We write

f(x) ~ GP(m(·), k(·, ·))    (2.5)

where the functions m(·) and k(·, ·) are, respectively, the mean and covariance functions:

m(x) = E[f(x)],   k(x_1, x_2) = E[(f(x_1) − m(x_1))(f(x_2) − m(x_2))].    (2.6)

Data received from different sources are never consistent and contain errors, so each observation y can be thought of as a function f(x) of the data x with an additive noise model,

y = f(x) + ε,   ε ~ N(0, σ_n²).    (2.7)

The volatility measurements obtained from the log-range are normally distributed around the true log-volatility, and hence this equation is assumed to hold with the function f(·) representing the log-volatility, where y is the observed value for x modelled on the time index. For modelling stochastic volatility with Gaussian processes, the problem is cast as a regression from the time indexes to the volatility measurements obtained from the log-range, and the data for estimation are the pairs D = {(t_i, y_i)}, where t_i is the time index and y_i would be

the value obtained from the formulation

y_t = f(t) + σ_n² ε_t.    (2.8)

2.5 The Stochastic Volatility Model

The observation at time t is given by

y_t = e^{h_t / 2} ε_t,    (2.9)

for t = 1, 2, ..., T, where ε_t ~ N(0, 1). Note that the state h_t is called the log-volatility. The states are assumed to evolve according to a stationary process

h_t = µ_h + φ_h (h_{t-1} − µ_h) + η_t    (2.10)

for t = 2, 3, ..., T, where η_t ~ N(0, σ_h²) and is independent of ε at all leads and lags. Hence the conditional variance of y_t is given by

Var(y_t | h_t) = (e^{h_t / 2})² Var(ε_t) = e^{h_t}.    (2.11)

We further assume that |φ_h| < 1, and that the states are initialized from

h_1 ~ N(µ_h, σ_h² / (1 − φ_h²)),    (2.12)

which is the stationary distribution of the process.

2.5.1 The priors

We assume independent prior distributions for µ_h, φ_h and σ_h², i.e.,

p(µ_h, φ_h, σ_h²) = p(µ_h) p(φ_h) p(σ_h²).    (2.13)

Specifically, we use the following independent prior distributions:

µ_h ~ N(µ_h0, V_µ),    (2.14)
φ_h ~ N(φ_h0, V_φ) 1{|φ_h| < 1},    (2.15)
σ_h² ~ IG(ν_h, S_h),    (2.16)

where 1{·} denotes the indicator function and IG the inverse-gamma distribution. The stationarity condition |φ_h| < 1 is imposed through the prior distribution of φ_h. To model stochastic volatility we have invoked the R package stochvol, which has straightforward functions for the auto-regressive AR(1) SV analysis. We use the svsample function to train the model on the AUD/USD daily returns data set, and then the prediction function predict to predict a number of days ahead equal to the number of test data points. The mean squared error (MSE) is used to calculate the prediction error between the test data points and the predicted data points. This model is compared with the other models in Chapter 4 during the model comparison (Kastner, 2016).
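As a concrete illustration of equations (2.9)-(2.10), the following sketch simulates returns from the stochastic volatility model. It is illustrative only: the parameter values and the function name are assumptions, not taken from the report or from the stochvol package.

```python
import numpy as np

def simulate_sv(T=1000, mu_h=-9.0, phi_h=0.95, sigma_h=0.2, seed=1):
    """Simulate y_t = exp(h_t / 2) * eps_t with an AR(1) log-volatility state h_t."""
    rng = np.random.default_rng(seed)
    h = np.empty(T)
    # h_1 drawn from the stationary distribution N(mu_h, sigma_h^2 / (1 - phi_h^2)), eq. (2.12)
    h[0] = rng.normal(mu_h, sigma_h / np.sqrt(1.0 - phi_h**2))
    for t in range(1, T):
        h[t] = mu_h + phi_h * (h[t - 1] - mu_h) + rng.normal(0.0, sigma_h)  # eq. (2.10)
    y = np.exp(h / 2.0) * rng.normal(0.0, 1.0, size=T)                      # eq. (2.9)
    return y, h

y, h = simulate_sv()
print("sample variance of returns:", y.var())
print("mean conditional variance exp(h_t):", np.exp(h).mean())
```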

2.6 Heteroscedasticity

Consider a sequence of output random variables Y_t and a sequence of input random variables X_t for which the conditional variance Var(Y_t | X_t) is non-constant over time t; a model with constant variance σ² is then not adequate to capture this phenomenon. Heteroscedasticity arises in two forms, conditional and unconditional. Conditional heteroscedasticity is identified as non-constant volatility when future high and low returns cannot be identified; unconditional heteroscedasticity is identified as independent volatility where future returns can be identified.

2.7 Generalized Auto-regressive Conditionally Heteroscedastic (GARCH)

GARCH time series models are widely used in the financial market for capturing the random volatility of returns. GARCH calculates the forecast y_t based on the squared past forecast value y²_{t-1} and the past variance σ²_{t-1}. The model is not difficult to understand, but it is worth focusing on the reasons for taking the past forecast and variance values into consideration. The main purpose of GARCH is to model the variance of financial returns. To understand the GARCH model, we first go through a simpler approach, the ARCH model (D. Ruppert, 2011). ARCH(1) models the conditional variance by looking at past values, similar to the linear AR(1) model discussed in Section 2.1. The linear regression model with constant variance σ² and expectation equal to 0 is given as

Y_t = f(X_t) + ε_t.    (2.17)

The conditional variance is constant, Var(Y_t | X_t) = σ², and f is the conditional expectation of Y_t given X_t. Equation (2.17) can be modified to allow conditional heteroscedasticity in the model:

Y_t = f(X_t) + σ(X_t) ε_t,    (2.18)

where ε_t has conditional mean equal to 0 and variance equal to 1. σ(X_t) must be non-negative, since it is a standard deviation. If the function σ(·) is linear then its coefficients have to be constrained so that the standard deviation remains non-negative; since controlling the coefficients in this way would be difficult, a non-linear non-negativity approach is used instead. The same conditional-variance approach is also used for the GARCH method.

2.7.1 ARCH(1) Processes

We have to consider the Gaussian noise in the ARCH(1) model. When adding the noise into the model, we take the Gaussian noise to have constant mean and variance,

E(ε_t | ε_{t-1}) = 0    (2.19)

and

Var(ε_t | ε_{t-1}) = 1.    (2.20)

This property of the white Gaussian noise is called homoskedasticity. The process a_t in the ARCH(1) model is given as

a_t = √(ω + α_1 a²_{t-1}) ε_t.    (2.21)
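Before continuing the derivation, a minimal simulation of the ARCH(1) recursion (2.21) is sketched below; the parameter values and the function name are assumptions made for illustration.

```python
import numpy as np

def simulate_arch1(T=2000, omega=0.1, alpha1=0.6, seed=2):
    """Simulate a_t = sqrt(omega + alpha1 * a_{t-1}^2) * eps_t with eps_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    a = np.zeros(T)
    eps = rng.normal(0.0, 1.0, size=T)
    for t in range(1, T):
        sigma2_t = omega + alpha1 * a[t - 1] ** 2   # conditional variance, eq. (2.26)
        a[t] = np.sqrt(sigma2_t) * eps[t]
    return a

a = simulate_arch1()
# For a stationary ARCH(1) process the unconditional variance is omega / (1 - alpha1).
print("sample variance:", a.var(), " theoretical:", 0.1 / (1 - 0.6))
```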

From equation (2.21) the expectation of a_t is equal to zero and its conditional standard deviation is equal to √(ω + α_1 a²_{t-1}), where, to keep the standard deviation positive, the variance coefficients must satisfy ω > 0 and α_1 ≥ 0. For the model a_t to be stationary, α_1 < 1. Equation (2.21) can also be rewritten as

a²_t = (ω + α_1 a²_{t-1}) ε²_t.    (2.22)

Equation (2.22) is similar to an AR(1) but with squared terms and noise with mean 1. The conditional variance for ARCH(1) is

σ²_t = Var(a_t | a_{t-1}),    (2.23)

and as the noise is independent of the past values a_{t-1}, the expectation of equation (2.21) is zero and E(ε²_t) = Var(ε_t) = 1. The conditional variance is therefore

σ²_t = E{(ω + α_1 a²_{t-1}) ε²_t | a_{t-1}}    (2.24)
     = (ω + α_1 a²_{t-1}) E(ε²_t | a_{t-1})    (2.25)
     = ω + α_1 a²_{t-1}.    (2.26)

To understand the GARCH model, note that the variance derived in equation (2.26) is the same variance used by the model. If a_{t-1} has a large magnitude then σ_t will also be large; this tends to make a_t large as well, and makes the volatility propagate from a_t to a_{t+1}. Similarly, if a_{t-1} is small in magnitude then σ_t is small and a_t will also be small in magnitude: there is a proportional relation between a_t and σ_{t+1}. Thus, writing the ARCH(1) variance model in terms of y_t conditional on y_{t-1}, with variance at time t and series mean equal to 0,

Var(y_t | y_{t-1}) = σ²_t = ω + α_1 y²_{t-1},    (2.27)
σ_t = √(σ²_t) = √(ω + α_1 y²_{t-1}),    (2.28)
y_t = σ_t ε_t.    (2.29)

2.7.2 GARCH(1,1) Process

GARCH uses the same approach as the ARCH model of relying on the past value y_{t-1} to predict the new y_t, but with the addition of the past variance σ²_{t-1} of the forecast to the model:

σ²_t = ω + α_1 y²_{t-1} + β_1 σ²_{t-1}.    (2.30)

In equation (2.30), β_1 multiplies σ²_{t-1}, the past variance, which is added into the model to make the ARCH model a generalized model for any time series data. Since ε_t from the past values changes the magnitude of y_t, the past variance is added into the GARCH model.

2.8 Model GARCH Process

To model the variance σ_t from the GARCH model, the constraints on the variance should be satisfied, and for this the selection of the parameters ω, α, β needs to adhere to the constraints; care should be taken so that the variance never becomes negative. The ranges ω ≥ 0, 0 ≤ α_1 ≤ 1 and 0 ≤ β_1 ≤ 1 are required to maintain a non-negative variance. To train

and test the model, we use the AUD/USD daily returns data set. The data set is split into training and testing sets using a split function, and all of it is normalized to reduce the effect of outliers. An optimize function is used to run the GARCH model multiple times and find the mean squared error between the observed forecast and the predicted forecast on the training set. The optimize function returns the values for ω, α, β, and these values are then used for GARCH prediction. The GARCH prediction function predicts the variance for the specific times t corresponding to the test data set. The mean squared error (MSE) is used as the cost function between the predicted values and the test data. As GARCH is a simple model, it is not able to capture the hidden non-linearity in the financial data. We go through the results in Section 2.9.

2.9 Results on GARCH

The financial data set is split into 3/4 as training set and 1/4 as test set for the GARCH model. The parameters ω, α and β need to be selected to get the optimal prediction on the test set. We use an optimization routine in the Python code to run the GARCH model iteratively and measure the cost function, minimizing the error over the selected parameter values for ω, α and β. The initial value for all the parameters at the start of the optimization process is 1 for ω, α and β; y²_{t-1} is the square of the lagged value of y_t. The variance is calculated as

σ²_t = ω + α_1 y²_{t-1} + β_1 σ²_{t-1},    (2.31)

and y_t is then calculated by taking the square root of the variance σ²_t:

y_t = √(σ²_t).    (2.32)

In the actual GARCH model, to calculate y_t we need to multiply this by Gaussian noise ε_t ~ N(0, 1); in the training phase, however, we do not add the Gaussian noise, because otherwise the trained model never reaches the global minimum of the cost function, which would give less efficient values of ω, α and β. Table 2.1 gives the values of the parameters after training, and these values can now be used for the prediction model. The GARCH function for prediction is the same as the GARCH function for training; the only difference is that we no longer use the optimize function to run the training model repeatedly. The errors calculated on the prediction model and the training model are given in Table 2.2.

TABLE 2.1: Values for parameters ω, α, β (Observed and Predicted)

In Figure 2.1, the two plots are based on the test set. There are some outliers in the observation plot which the prediction model is not able to capture; the reason is that the hidden non-linearity in the data set is not properly captured by the GARCH model. The total prediction error is 0.33, which is comparatively small, since the model is fairly simple to implement with few factors to be considered. If the input lag y_{t-1} is increased to y_{t-i} then the model becomes complex and it is difficult to find the global

minimum of the cost function for the training model. This will have a larger impact on the test data set, as the generalization error will increase drastically.

TABLE 2.2: Error on GARCH model (Test Set error)

FIGURE 2.1: Prediction and Observation for the GARCH model
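The training procedure described in Sections 2.8-2.9 (splitting the returns 3/4 : 1/4 and choosing ω, α₁, β₁ by minimizing the mean squared error on the training split) can be sketched as follows. This is an illustrative reconstruction rather than the author's code: the use of scipy.optimize.minimize, the helper names, the placeholder data and the stationary starting point are assumptions (the report initializes all parameters at 1).

```python
import numpy as np
from scipy.optimize import minimize

def garch_filter(params, y):
    """GARCH(1,1) conditional variance recursion, eq. (2.31), for a return series y."""
    omega, alpha1, beta1 = params
    sigma2 = np.empty_like(y)
    sigma2[0] = y.var()                      # initialize with the sample variance
    for t in range(1, len(y)):
        sigma2[t] = omega + alpha1 * y[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return sigma2

def train_mse(params, y_train):
    """MSE between the observed series and sqrt(sigma_t^2), as in eqs. (2.31)-(2.32)."""
    omega, alpha1, beta1 = params
    if omega < 0 or alpha1 < 0 or beta1 < 0 or alpha1 + beta1 >= 1:
        return np.inf                        # keep the recursion non-negative and stationary
    y_hat = np.sqrt(garch_filter(params, y_train))
    return np.mean((y_train - y_hat) ** 2)

# y stands in for the normalized AUD/USD daily-return series (placeholder data here).
y = np.random.default_rng(3).normal(0, 1, 1000)
split = int(0.75 * len(y))                   # 3/4 train, 1/4 test
y_train, y_test = y[:split], y[split:]

res = minimize(train_mse, x0=[0.1, 0.1, 0.8], args=(y_train,), method="Nelder-Mead")
test_mse = np.mean((y_test - np.sqrt(garch_filter(res.x, y_test))) ** 2)
print("omega, alpha1, beta1:", res.x, " test MSE:", test_mse)
```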


Chapter 3

Auto Regressive Neural Network (AR-NN)

3.1 Feed Forward Neural Network

In linear time series analysis, the auto-regressive AR(p) model can be viewed as a network with two layers: the input layer, which contains the independent variables, and the output layer, which contains the dependent variable together with a constant term called the bias. A linear auto-regressive model with two lags is given by

y_t = α_1 y_{t-1} + α_2 y_{t-2}.    (3.1)

Since the linear AR model is not sufficient for prediction, a non-linear part has to be added to the linear model. The non-linear part is modelled in the hidden layer of the neural network, added between the input and output layers. AR-NN models are therefore those in which there is a direct connection between the input and output layers, as in the linear model, as well as a connection between the input and output layers through a hidden layer for the non-linear model. The non-linear function F(·) extending the linear model is given as

y_t = α_1 y_{t-1} + α_2 y_{t-2} + F(y_{t-1}, y_{t-2}).    (3.2)

Equation (3.2) can be modelled using a feed-forward neural network with an extension of the linear AR model to build an Auto-regressive Neural Network. We use a three-layer feed-forward neural network with the back-propagation algorithm as the method to update the weight vectors of the network. The main purpose of the neural network in this project is to predict the time series data set. We selected an AUD/USD data set to predict the daily returns, with the variance of the prediction to be compared against the observed values. A neural network is a non-linear model because of the activation functions used in the hidden layer. The hidden layer uses non-linear activation functions, and since they are non-linear the values from the hidden nodes are bounded within a range. We use the neural network as a regression model for predicting the time series vector. A NN has the universal approximation property, which means it can approximate the target as closely as possible from the input attributes, given an appropriate selection of hidden layer units. A normal feed-forward network has a set of input attributes and a target value, and using the input attributes we try to make the predicted output as close as possible to the target vector. We implement a special case of the neural network called the Auto-regressive Neural Network; the reason for building this network model is that there are no separate input attributes for a time series data set. As this is a time series data set, we use the same target vector, lagged, as our input attributes.

FIGURE 3.1: Auto Regressive Neural Network (AR-NN), with input nodes i_1, i_2 and a bias node, hidden nodes h_1, h_2, and output node o_1

The design of a neural network is something of an art, as there are various factors to be considered and no specific criteria or constraints when building the model. To implement the model, several components such as gradient descent, back-propagation and the cost function have to be considered in the design. In Figure 3.1, the input layer nodes i_t, t = 1, 2, ..., T, act as the input to the model, with connections from the nodes i_t to the hidden layer nodes h_t and the output layer node o_t. The network is called non-linear because of the activation functions used in the hidden nodes; for the linear part, connections are made directly between the input nodes and the output node. The figure is an approximation of the model used for prediction: weights are assigned to the edges connecting the nodes. These weight vectors are chosen in the initial state and updated after predicting the output vector, which is fed to the batch gradient algorithm to calculate the cost function. The aim of the model is to minimize the error and bring the prediction as close as possible to the target vector.

3.2 Design of Auto Regressive Neural Network

To design the AR-NN for regression output, we have to choose prior parameters for the design, and the model has three layers, consisting of the input layer, hidden layer and output layer, sometimes called the first, second and third layers respectively. The input layer consists of a number of nodes equal to the dimension (number of attributes) of the data set and acts as the input X_t for the model. The output layer can have a number of nodes that depends on the type of learning; since we use this model for regression, only one node y_t is required at the output layer. The decision on the number of hidden layer nodes is something of an art, as there is no specific criterion for it. The input vector enters the network through the input layer and is forwarded to the hidden layer after being combined with the weight parameters w_ij on the input side. At each hidden node there is a non-linear activation function f(·), which acts as the triggering function of the layer. The outputs from the hidden units are transformed through the activation function and are non-linear in nature.

These vectors are then multiplied by the output layer weight vector w_jk and transferred to the output layer. The equations for the input side of the network are as follows:

a_j = Σ_{i=1}^{D} w^(1)_{ji} x_i + w^(1)_{j0},    (3.3)

z_j = h(a_j),    (3.4)

where i = 1, ..., D indexes the dimensions of the first (input) layer. a_j represents the activation of a single node of the next layer, in our case the hidden layer. We add the bias value w_{j0} to the nodes to handle the output offset in this equation. These activations a_j are then transformed through the non-linear activation function given in equation (3.4). The above equations are on the input side of the model; the values are transformed again from the hidden layer to the output layer through the output-side weight vector and activation functions to predict the output vector. This yields the complete equation of the neural network model,

y_k(X, W) = σ( Σ_{j=1}^{M} w^(2)_{kj} h( Σ_{i=1}^{D} w^(1)_{ji} x_i + w^(1)_{j0} ) + w^(2)_{k0} ).    (3.5)

In the above equation, layer one and layer two, indicated by the superscripts (1) and (2), are combined to form the non-linear equation for predicting the value. This is a normal feed-forward network; to design an Auto-regressive Neural Network, the feed-forward network has to be extended by adding a linear term to the equation. The linear input term is used with a multiplicative parameter α and is connected directly to the output layer, so no non-linear factor is involved. This linear part is sometimes called the memory of the network, used for storing the previous output values and supplying them to the output node at the next time interval. The estimated equation for the auto-regressive model is

y_t = α_0 + Σ_{i=1}^{n} α_i y_{t-i} + σ( Σ_{j=1}^{M} w^(2)_{kj} h( Σ_{i=1}^{D} w^(1)_{ji} y_{t-i} + w^(1)_{j0} ) + w^(2)_{k0} ) + ε_t.    (3.6)

The input to the equation is y_{t-i}, a lag of the output y_t, and this lag can be increased to bring the predicted output closer to the target vector. Care should be taken with the lag input, as it might over-fit or under-fit the model and perform poorly on the test or unknown data set. To build the whole model, all the parameters need to be initialized and defined according to the specified criteria. In the following subsections there is a detailed discussion of the input time series data set, the weight-space, the back-propagation algorithm and the activation functions.

3.2.1 Time Series Data Set

The data set of interest is the financial returns series of daily asset returns in AUD/USD from January 2005 to December 2012. The data set has only one variable, representing the returns; as it is time-dependent time series data, it has a time-varying variance and a constant time spacing. The input to the model is the past values of the series, used to forecast the current value. Time series data require a different approach to prediction: since the prediction depends only on the time index, the data are difficult to model in the way other feed-forward models do. In time-dependent data the current value depends on the previous data point, which is similar to a Markov process. When one past series value y_{t-1}

is used for the prediction, the series is an auto-regressive AR(1) model, where the 1 indicates that one lag of the y_t vector is used. The linear AR(1) model can be written as

y_t = α_0 + α_1 y_{t-1},    (3.7)

where α_0 is the offset and α_1 the weight factor for y_{t-1}; this equation resembles a linear regression with y_{t-1} as the input for y_t.

FIGURE 3.2: AUD/USD daily returns from January 2005 to December 2012

From Figure 3.2, the variance of the daily returns is mostly within ±2%, and there are moves of more than ±8% around 2009. These drastic returns have to be captured by the model, as they signify an important event in the financial market: they occurred during the financial crisis caused by the recession in the US market. The event continues over the next two years, as the returns remain in a high variance range of ±4%.

Because of the bounded value range of the activation functions, the AUD/USD data set is scaled onto the range [0, 1]. This is done using the mean-variance method, where ȳ_t (or µ) is the mean of the data set y_t and σ_t is the square root of the variance of y_t with respect to ȳ_t, giving

y_t = (y_t − µ) / σ.    (3.8)

These two quantities are taken into account when scaling the data.
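The scaling step (3.8) and the construction of lagged inputs from the single returns series can be sketched as follows; the function names and the two-lag choice are assumptions made for illustration rather than the author's code.

```python
import numpy as np

def normalize_data(y):
    """Mean-variance scaling of the returns series, as in equation (3.8)."""
    return (y - y.mean()) / y.std()

def make_lagged_inputs(y, n_lags=2):
    """Build input rows [y_{t-1}, ..., y_{t-n_lags}] and targets y_t from one series."""
    X = np.column_stack([y[n_lags - k - 1:len(y) - k - 1] for k in range(n_lags)])
    t = y[n_lags:]
    return X, t

# y stands in for the AUD/USD daily returns (random placeholder data here).
y = normalize_data(np.random.default_rng(4).normal(0, 0.01, size=2000))
X, t = make_lagged_inputs(y, n_lags=2)
print(X.shape, t.shape)   # (1998, 2) and (1998,): the inputs are the lagged targets
```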

The model behaves better if all the variables are scaled: if the range of the observed values is much higher than the range of the activation function, only the linear values dominate the process. The initial weight parameter values do not depend on the observed values, and if the input vectors are not scaled and the initial weights are not sufficiently small, the output from the activation function may flip between its upper and lower bounds. Transforming the series data set might lose some information, but prediction of the series gets much better if both the range of the activation function and the series data lie in a bounded range. This is not a strict criterion to follow, but the cost function with a scaled series gives a good approximation of the observed values.

3.2.2 Weight-space

The weight matrix W consists of the weight vectors w from all the layers of the model. The initialization of the weight vectors is important: if the values are initialized randomly, the network model requires more iterations to reach a minimum cost. The ratio of the number of weight parameters w to hidden units h is about 2n, and with random weights this makes the computation costly for the neural model. The updating of the weight space depends on the type of gradient descent algorithm applied to the output prediction. We decided to use the batch gradient approach, in which the weights are updated after computing the gradient of the cost function with respect to the weight parameters W^(1) and W^(2) for the entire training set. The weight matrix W is updated as

W = W − η ∇_W J(W),    (3.9)

where η is the learning rate. In our model, weights are initialized from the Gaussian normal distribution N(0, 1) with µ = 0 and σ² = 1. Weights can be initialized from any random distribution, but Gaussian is the distribution most often preferred in the studies done so far on regression functions. If the weights are greater than one, it usually takes more iterations to reduce the cost function. Another design consideration with the weight matrix W is that every input unit is connected to every output unit with a weight value w_ij; there are strategies for connecting units that reduce the W matrix computations. The strategy used here is to connect all the units to each other and initialize the weights accordingly. We do not use any basis functions for the data set, as studies have shown that adding more hidden units is equivalent to adding basis functions. Concerning the selection of hidden units, the usual approach is trial and error with an arbitrary number of neurons, all other factors being fixed, while constantly monitoring the approximation error. A rule of thumb is to choose the number of hidden layer units equal to the median of the numbers of input and output variables,

h = (n + 1) / 2.    (3.10)

This method has no technical justification, but when we tried this approach our results were better. We started the hidden unit selection with more than 20 units, and the results from the hidden activation function always overshot the range [-1, 1], so the prediction at the output unit was all ones. After taking equation (3.10) into consideration, we chose the number of hidden units to be around 3 to 5, and the results from the output unit were better, though some changes were needed in the type of activation function chosen. The approach used here was to increase the number of units in the hidden layer step by step while monitoring the error function.
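The weight initialization, the hidden-unit rule of thumb (3.10) and the batch-gradient update (3.9) can be sketched as below; the learning-rate value and the helper names are assumptions, and the gradient is left as a placeholder.

```python
import numpy as np

def init_weights(n_inputs, n_hidden, seed=0):
    """Initialize all weights from N(0, 1), as described in Section 3.2.2."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0, size=(n_hidden, n_inputs))   # input -> hidden
    W2 = rng.normal(0.0, 1.0, size=n_hidden)                # hidden -> output
    alpha = rng.normal(0.0, 1.0, size=n_inputs)             # direct linear part
    return W1, W2, alpha

n_inputs = 2                            # two lagged inputs y_{t-1}, y_{t-2}
n_hidden = max(1, (n_inputs + 1) // 2)  # rule of thumb (3.10); the report settled on 3-5 units
W1, W2, alpha = init_weights(n_inputs, n_hidden)

# Batch gradient update, eq. (3.9): W <- W - eta * grad_J(W),
# where grad_J is the gradient of the cost over the whole training set.
eta = 0.01
grad_W1 = np.zeros_like(W1)             # placeholder gradient for this sketch
W1 = W1 - eta * grad_W1
print("hidden units:", n_hidden, " W1 shape:", W1.shape)
```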

Selecting the cost function for the neural model was difficult, as there is no specific cost function prescribed for a time-dependent non-linear data set. We chose the squared error and the mean squared error (MSE) to measure the performance of the prediction model. The mean squared error is more effective than the plain squared error, and the ease of taking the derivative of the MSE in back-propagation was the main criterion for the selection:

E = (1/2) Σ_k (y_k − t_k)².    (3.11)

To avoid over-fitting the model to the training data set, we added a regularization parameter λ to the cost function. The purpose of the regularization parameter is to make sure the model is penalized if it tries to over-fit the data, which indirectly improves the generalization error measured on the test data. The weight update after adding the regularization term becomes

W = W − η (∇_W J(W) + λ W).    (3.12)

The value of λ is fixed during the prior initialization, and the same fixed value is used for all iterations. The criterion used for selecting λ is trial and error, but if we increase the value of λ the model might under-fit, so care has to be taken when selecting it. As over-fitting is a major problem for a prediction model, this small change in the cost function improves the generalization error.

3.2.3 Activation Function

Choosing the activation function for the model is important in order to concretize the AR-NN function. The selection of the activation function depends on the universal approximation property, which says that any continuous, bounded and non-constant activation function can approximate any Borel measurable function from one finite-dimensional space to another with any desired non-zero amount of error, provided the network has enough hidden units. The derivatives of the feed-forward network can also approximate the derivatives of the function well. The relevant notion of Borel measurability is that any continuous function on a closed and bounded subset of R^n is Borel measurable. Basis functions are not used in the AR-NN model, as they become more complicated with the non-linearity of the function. The bounded activation function generally used in NN models is the sigmoid, which is bounded in the range [-1, 1]. The logistic function is one sigmoid function,

σ(a) = (1 + exp(−a))^{-1},    (3.13)

with logistic: R → [0, 1]; another sigmoid function is the hyperbolic tangent (tanh),

σ(a) = (exp(a) − exp(−a)) / (exp(a) + exp(−a)).    (3.14)

A linear activation function, called the identity function, is sometimes used at the output units of a regression-based prediction model. We tried this activation function, but there was no significant improvement in the cost function value. The two sigmoid activation functions can be used for the hidden units and the output units, possibly as a mixture, with the hidden units using the logistic function and the output units using tanh; the choice depends on how much the error is reduced by the selected activation functions. Sigmoid functions reduce the effect of outliers because they squash the vector into the range [-1, 1].
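The two activation functions (3.13)-(3.14) and the regularized squared-error cost can be written directly in code. The sketch below is illustrative; the function names are assumptions, and λ = 0.1 follows the value quoted later in Section 3.5.

```python
import numpy as np

def logistic(a):
    """Logistic sigmoid, eq. (3.13): R -> [0, 1]."""
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):
    """Hyperbolic tangent, eq. (3.14): R -> [-1, 1] (equivalent to np.tanh)."""
    return (np.exp(a) - np.exp(-a)) / (np.exp(a) + np.exp(-a))

def cost(y_pred, t, weights, lam=0.1):
    """Half squared error, eq. (3.11), plus an L2 penalty 0.5 * lam * ||W||^2
    whose gradient is the lam * W term appearing in eq. (3.12)."""
    data_term = 0.5 * np.sum((y_pred - t) ** 2)
    reg_term = 0.5 * lam * sum(np.sum(W ** 2) for W in weights)
    return data_term + reg_term

a = np.linspace(-3, 3, 7)
print(logistic(a))          # values in (0, 1)
print(tanh(a))              # values in (-1, 1)
```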

3.2.4 Back Propagation Algorithm

The back-propagation algorithm is a learning procedure for feed-forward neural networks by which the network can map a set of inputs to a set of outputs. The mapping is specified by giving the desired activation function on the units in the hidden and output layers. Learning is carried out iteratively by adjusting the coupling strengths in the network so as to minimize the difference between the actual output vector and the desired output vector. The learning process is repeated until the network responds to each input vector with an output vector that is sufficiently close to the desired one. The weights are initialized randomly from some prior distribution, and we need to update them after every iteration. The main purpose of implementing back-propagation is to minimize the cost function C with respect to the weights W and biases b, and this is achieved through the partial derivatives of the cost function. After calculating the partial derivatives of the cost function, the weight space and biases are updated so as to bring the predicted output closer to the target data.

Before discussing the algorithm, we need to fix the cost function used at the output unit. We consider the squared loss function for the model, which has the form

(1/2) Σ_{i=1}^{n} (t − y_t)²,    (3.15)

where y_t is the predicted vector and t is the target vector; we take the squared difference of the two, sum over all the data points, and multiply the whole expression by 1/2. The factor 1/2 is used because, when taking the partial derivative of the cost function during gradient descent, the factor of 2 from the squared term is cancelled by the fraction. We have initialized weight vectors on two sets of edges, one between the input layer and the hidden units and another between the hidden units and the output layer, so we have to compute the weight-space update twice with the same value of the error function. The partial derivative of the cost function is

Σ_{i=1}^{n} (t − y_t), sometimes written as Σ_{i=1}^{n} (y_t − t).    (3.16)

This expression gives the total error between the output and target vectors, and this difference is propagated into the immediately preceding layer. Since we use an activation function on each unit in a layer, we need to take the derivative of the activation function while doing the update. We use different activation functions in the model to compare the errors, but for the calculation here we use the sigmoid function σ(z) = 1/(1 + exp(−z)). Combining equation (3.16) with the sigmoid derivative, we can update the weight vector w^(2), as these weights belong to the layer immediately before the output. The equations for the weight-space vector between the output and hidden layers, and hence for δ_o, are

a^(2) = σ(z^(2)),    (3.17)
∂a^(2)/∂z^(2) = σ(z^(2)) (1 − σ(z^(2))),    (3.18)
∂E/∂z^(2) = −(t − y_t) σ(z^(2)) (1 − σ(z^(2))),    (3.19)
δ_o = (t − y_t) σ(z^(2)) (1 − σ(z^(2))).    (3.20)

We can then update all the weights at the different hidden units:

ΔW^(2) = δ_o a^(2).    (3.21)

With the above equations we have updated the weight vector at the hidden units; we now back-propagate the same error from the hidden layer to the previous, input layer. We use the same set of equations for updating the weight-space vector W^(1); the corresponding deltas can be written as

a^(1) = σ(z^(1)),    (3.22)
∂a^(1)/∂z^(1) = σ(z^(1)) (1 − σ(z^(1))),    (3.23)
∂E/∂z^(1) = −(t − y_t) σ(z^(1)) (1 − σ(z^(1))),    (3.24)
δ_h = (t − y_t) σ(z^(1)) (1 − σ(z^(1))),    (3.25)

and we can update all the weights at the different hidden units using δ_h and the input vector y_{t-i}:

ΔW^(1) = δ_h y_{t-i}.    (3.26)

Finally, all the weights on the edges W^(1) and W^(2) of the layers are updated with the batch gradient approach. After every back-propagation pass over the training samples the error should decrease slowly as the weights are updated; for this network model it requires more than about 10,000 iterations for the weights to stabilize near a local minimum, after which the error never goes below a certain level and remains in that range for a long time. There are two choices for tackling this problem: either make the algorithm run for a fixed number of iterations, or, if the error has not been decreasing for a long time, stop it at that particular step. The change made in the algorithm is to continuously track the error range and take the decision based on the cost function (Bishop, 2006).

3.2.5 Practical Reducibility

When the AR-NN is extended to include additive noise, the distribution of the noise term is positive everywhere on the range (−∞, ∞), and thus the AR-NN forms an irreducible and aperiodic Markov chain. The chain is aperiodic because it does not cycle between a set of values at specified multiples of t. It is irreducible because it is impossible to reduce the range of Y_t from the entire real line (−∞, ∞) to a smaller finite set: the noise is additive, it does not depend on Y_t, and ε_t can take any value in the range. Therefore, even if the model converges, the noise term ensures that Y_t is irreducible and aperiodic (Dietz, 2010).
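The back-propagation update derived in Section 3.2.4 can be written compactly in code. The sketch below is an illustrative reconstruction under assumptions (logistic activations everywhere, a single output unit, no bias terms, one training example per step) and is not the author's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(y_lags, target, W1, W2, eta=0.01):
    """One gradient step on the squared loss (3.15) for a single training example."""
    # Forward pass through the non-linear part of the network.
    z1 = W1 @ y_lags                 # hidden pre-activations
    a1 = sigmoid(z1)                 # hidden activations
    z2 = W2 @ a1                     # output pre-activation
    y_pred = sigmoid(z2)             # output activation

    # Output delta, eq. (3.20): (t - y) * sigma'(z2).
    delta_o = (target - y_pred) * y_pred * (1.0 - y_pred)
    # Hidden deltas, analogous to eq. (3.25), with the error passed back through W2.
    delta_h = (W2 * delta_o) * a1 * (1.0 - a1)

    # Updates following eqs. (3.21) and (3.26); adding eta * delta * activation
    # is equivalent to descending the gradient of the squared error.
    W2 = W2 + eta * delta_o * a1
    W1 = W1 + eta * np.outer(delta_h, y_lags)
    return W1, W2, y_pred

rng = np.random.default_rng(5)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=3)
W1, W2, y_pred = backprop_step(np.array([0.1, -0.2]), 0.05, W1, W2)
print("prediction after one step:", y_pred)
```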

3.3 Function

Now that the architecture of the model has been built in the sections above, we discuss the steps required to implement the model. The data used are the AUD/USD returns, with functions created for normalizing the data and calculating the loss functions. The equations needed to model the Auto-regressive Neural Network combine all the algorithms included in the design of the model to make a prediction model for the time-dependent non-linear data set. The linear equation for the model, with a linear weight vector and two lags of the observed forecast data, is

y_t = α_1 y_{t-1} + α_2 y_{t-2}.    (3.27)

This equation is needed for prediction at the output unit; before that, the non-linear part of the model equations has to be provided. With the weight matrix W consisting of the weight vectors w^(1) and w^(2), the input side, indicated by superscript (1), is

z^(1)_j = Σ_{i=1}^{D} w^(1)_{ji} y_{t-i} + w^(1)_{j0},    (3.28)

where i = 1, ..., D indexes the dimensions of the input matrix and j indexes the units of the hidden layer. This quantity is then transformed into non-linear form through the activation function,

a^(1)_j = σ(z^(1)_j).    (3.29)

We used the sigmoid function as the activation function of the hidden units to transform the linear value from the input vector into the range [−1, 1]. The transformed values from the hidden units are then combined with the next weight vector w^(2),

z^(2)_k = Σ_{j=1}^{M} w^(2)_{kj} a^(1)_j + w^(2)_{k0},    (3.30)

where k = 1, ..., M indexes the units of the output layer; as we have only one unit at the output layer, M = 1. The final output is passed through the activation function,

a^(2)_k = σ(z^(2)_k).    (3.31)

Equation (3.31) is the non-linear transformation of the input vector; it now needs to be combined with the linear equation (3.27) and the additive noise ε_t. At the output layer the linear and non-linear parts are combined through the activation function for prediction: equation (3.32) combines all the derived equations for predicting y_t. The data set is normalized through a function with mean µ and variance σ² to remove the outliers. The weight vectors w_ij and w_kj are initialized from the random Gaussian distribution N(0, 1), and α_i is also initialized from a Gaussian distribution. α_i could be initialized to fixed values, but as the data set is non-linear, fixing its value does not improve the error function. Two different constant values are used for the regularization parameter, and in this way two functions have been created for the predictions. There are two separate calculations for the linear and non-linear parts of the model: for the non-linear part, the input vector is fed into the hidden layer, combined with the weight values, and transformed by the activation function. After the prediction of the output vector, Algorithm 1 describes the model for predicting the time series data; the factors that need to be initialized are those of the equation considered for the prediction (Dietz, 2010):

y_t = α_0 + Σ_{i=1}^{n} α_i y_{t-i} + σ( Σ_{j=1}^{M} w^(2)_{kj} h( Σ_{i=1}^{D} w^(1)_{ji} y_{t-i} + w^(1)_{j0} ) + w^(2)_{k0} ) + ε_t.    (3.32)

For updating the weight matrix over the training samples, the back-propagation algorithm is invoked with gradient descent on the cost function; the same steps as in Section 3.2.4 are followed to update the weight matrices w^(1) and w^(2):

∂E/∂z^(1) = −(t − y_t) σ(z^(1)) (1 − σ(z^(1))),    (3.33)
δ_h = (t − y_t) σ(z^(1)) (1 − σ(z^(1))),    (3.34)
ΔW^(1) = δ_h y_{t-i}.    (3.35)

Algorithm 1: Training AR-NN
  Data: time series data of AUD/USD daily returns
  Result: predicting the time series and minimizing the error
  Data <- NormalizeData(Data)
  for all loop <= num_passes do
      hidden h <- combine input vector with weight vector W
      output Y <- combine h and W with the linear weight matrix and α
      invoke the back-propagation algorithm
      error e <- CalculateLossFunction(Y)
      if error in range then
          W <- min + delta
      else
          W <- max + delta
      end
  end
  return model

In Algorithm 1, we first initialize the prior parameters W, α, λ and η, select the number of units for the hidden layer, and choose the activation functions for the hidden and output units. The data set is passed through the function NormalizeData to obtain a normalized data set used for all further operations. Data sets with one and two lags, called y_{t-1} and y_{t-2}, are created and set as the inputs of the model. In the for loop two things are calculated: first the hidden units, by combining the input vector with the weight values, and then, at the output unit, the values from the hidden units together with the linear values from the input vector, combined with α and transformed by the activation function. At that point the back-propagation algorithm is invoked to compute the gradient descent step on the error function and back-propagate the error into the previous layers by differentiating through the weight vectors of each previous layer. The weights of the layer immediately before the error function get updated strongly, while the weights of the first layer of the network are updated by a very small amount; as the iterations increase, the weights start to stabilize and the error function stays within a specific range for a long time, toggling between values in that range. The total error calculated through CalculateLossFunction is used for updating the weight vectors for the next iteration. The model is run for more than 10,000 iterations, because with fewer iterations the weights do not stabilize, and because of the noise included when invoking the prediction and error functions the AR-NN model will not reach a tightly bounded local minimum anyway. It is therefore better to run the algorithm for more iterations to stabilize the weights and reach a steadily low error value for the network. From Figure 3.3, the prediction error fluctuates strongly over the range [0, 10,000] iterations; as the iterations increase, the error stabilizes in the range [1000, 1100] after 30,000 iterations, and since the neural model cannot have a global minimum, and because of the additive noise in the error function, the error fluctuates within that range and never settles at a local minimum.
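Putting the pieces together, a self-contained sketch of the training loop of Algorithm 1 is given below. It is illustrative only: the two-lag input, tanh activations, learning rate, regularization value, iteration count and placeholder data are assumptions consistent with the text, not the author's actual code.

```python
import numpy as np

def train_arnn(y, n_lags=2, n_hidden=3, eta=0.05, lam=0.1, num_passes=10000, seed=0):
    """Batch-gradient training of the AR-NN of eq. (3.32) on a returns series y."""
    rng = np.random.default_rng(seed)
    # Lagged inputs X[t] = [y_{t-1}, ..., y_{t-n_lags}] and targets t = y_t.
    X = np.column_stack([y[n_lags - k - 1:len(y) - k - 1] for k in range(n_lags)])
    t = y[n_lags:]

    W1 = rng.normal(size=(n_hidden, n_lags)); b1 = np.zeros(n_hidden)
    w2 = rng.normal(size=n_hidden);           b2 = 0.0
    alpha = rng.normal(size=n_lags);          alpha0 = 0.0

    for _ in range(num_passes):
        # Forward pass: linear (memory) part plus non-linear hidden-layer part.
        z1 = X @ W1.T + b1            # (N, H)
        a1 = np.tanh(z1)
        z2 = a1 @ w2 + b2             # (N,)
        a2 = np.tanh(z2)
        y_hat = alpha0 + X @ alpha + a2

        # Backward pass: batch gradients of the regularized squared error.
        e = (y_hat - t) / len(t)
        d2 = e * (1.0 - a2 ** 2)                      # delta at the output unit
        d1 = np.outer(d2, w2) * (1.0 - a1 ** 2)       # deltas at the hidden units

        w2 -= eta * (a1.T @ d2 + lam * w2)
        b2 -= eta * d2.sum()
        W1 -= eta * (d1.T @ X + lam * W1)
        b1 -= eta * d1.sum(axis=0)
        alpha -= eta * (X.T @ e)
        alpha0 -= eta * e.sum()
    return (W1, b1, w2, b2, alpha, alpha0), y_hat

# Placeholder series standing in for the normalized AUD/USD returns;
# the report runs the loop for more than 10,000 passes.
y = np.random.default_rng(1).normal(0, 1, 800)
params, fit = train_arnn(y, num_passes=2000)
print("train MSE:", np.mean((fit - y[2:]) ** 2))
```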

FIGURE 3.3: Error

3.4 Results

In the following section we discuss the results from the AR-NN model, considering the different parameters that affect them. The subsections which follow discuss each parameter in detail, focusing on the cost function and the posterior mean and variance, as these parameters define the performance of the network architecture. The neural network cannot achieve a global minimum, since the activation functions used in the units of all the layers act as non-linear functions, each with its own bounding range. Some parameters are predefined in the model, as the model performance is not measured against them and they are not important prior information to be considered in designing the model.

3.4.1 Alpha

As discussed for the linear auto-regressive model, the linear weight α changes y_t between being stationary, explosive or a random walk, so we tried to see the impact of α on the model. The mean of the observed data set, after normalizing it with the mean and variance, is of the order of 10⁻⁴, and σ² = 1.

When α = 1

At first, the α vector was kept constant at α = 1.

FIGURE 3.4: Error Function with α = 1

From Figure 3.4, after 20,000 iterations the error starts to stabilize in the range [2600, 2800]; because the linear weight vector is constant at 1, the contribution of the linear part is fixed across the iterations, so the error does not fluctuate much and remains in the same range for a long time. The mean of the predicted data set is µ = 1.018 and the variance σ² = 2.93; the predicted µ and σ² are far from the observed values, and this approach would not perform well on the test data set.

When |α| ≤ 1

We next consider |α| ≤ 1, for which the predicted set has µ = 0.23 and σ² = 1.35.

FIGURE 3.5: Error Function with |α| ≤ 1

In Figure 3.5, at the start of the iterations there is a high deviation of the cost function, which continues for 80,000 iterations, after which the error starts to be minimized with minimal deviation. The mean and variance of this run are very close to those of the observed data set. This model might be over-fitted, but it performed better than the previous run.

When α > 1

FIGURE 3.6: Error Function with α > 1

Finally, α is considered above 1, that is α > 1; for this condition there is no specific value to be considered, so we chose α = 2 as the fixed parameter value. As α grows, the mean and variance grow exponentially, and from Figure 3.6 the cost value never stabilizes: it starts at around 12,500 and after 80,000 iterations it still stays at around 10,000, never reducing below that range. Compared with the other two,

this model has µ = 2.01 and σ² = 8.87, higher than the other two predicted sets. So when implementing the neural model for prediction it is better to keep α in the range [-1, 1], and one approach for this is to use a Gaussian distribution for the selection of the prior.

3.4.2 Discussion on Activation Function

Having discussed the two types of activation functions in Section 3.2.3, we now explore the results from the model using these activation functions; the results for the two functions are quite different. First, we consider the logistic activation function on the hidden units and output units, with the formula

σ(a) = (1 + exp(−a))^{-1}.    (3.36)

The prediction plot for the logistic activation function in Figure 3.7 shows that it was able to capture the outliers in the forecast data, such as the high variance of daily returns during the financial crisis from 2008 onwards. The mean µ of the prediction is … and the variance σ² is …; the model is not able to predict the normal target points. Secondly, Figure 3.8 shows that the predicted output from the tanh activation is able to predict the vector close to the observed values, with µ = … and σ² = 1.30, while the mean of the observed vector is µ = 5.109 × 10⁻⁴. The tanh activation performs better than the logistic function, as it was able to capture the outliers of the observed data. The tanh activation function predicts better on the selected data set than the logistic function, but there is no specific criterion for the selection of the activation function, and trial and error is the best approach for selecting it.

3.5 Prediction Results

In the AR-NN model we choose the prediction configuration through prior parameters such as the initialization of the weight vectors W, the choice of activation function σ(·), the linear weight vector α and the regularization

3.5 Prediction Results

In the AR-NN model, prediction depends on prior choices such as the initialization of the weight vectors W, the activation function σ(·), the linear weight vector α, and the regularization parameter. The AR-NN model we chose uses a Gaussian distribution for the weight vector W and tanh as the activation function for the hidden and output layers of the network. The regularization parameter added to the cost function is 0.1; with this value the prediction results from the model are optimized. Figure 3.9 shows two plots, of predicted and observed values, with variance on the y-axis and time on the x-axis. There is high variance between data points 200 and 300 on the observed plot; this high volatility is of major concern for the financial market, and our model is able to capture this variance, although not with a magnitude as high as in the observed plot. Addressing this would require more hidden units to capture the non-linearity in the data. The neural network cannot reach the global minimum error because of the activation function, so we have to use trial and error to reduce the error consistently and reach a minimum range; at the same time the model should not over-fit, as over-fitting affects prediction and ultimately increases the prediction error. Figure 3.10 shows the error calculated as the Mean Squared Error (MSE); the main aim is to reduce the training-set error to its minimum range, which usually takes more than 10,000 iterations.
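To illustrate the quantity tracked in Figure 3.10, the following sketch computes a mean squared error with an L2 weight penalty of 0.1; the function name and the argument name lam are assumptions made for this illustration and are not taken from the report.

import numpy as np

def regularized_mse(predictions, targets, weights, lam=0.1):
    """Mean squared error plus an L2 penalty on the network weights."""
    mse = np.mean((predictions - targets) ** 2)
    penalty = lam * np.sum(weights ** 2)
    return mse + penalty

# Dummy values: the penalty discourages large weights, which is the
# mechanism used here against over-fitting on the training set.
preds = np.array([0.9, 1.2, -0.3])
targets = np.array([1.0, 1.0, 0.0])
weights = np.array([0.2, -0.5, 0.1])
print(regularized_mse(preds, targets, weights))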

FIGURE 3.7: Logistic Activation Function

FIGURE 3.8: tanh Activation Function

FIGURE 3.9: AR-NN Prediction and Observed

FIGURE 3.10: Error on Train set
