Gaussian Process Regression Models for Predicting Stock Trends


M. Todd Farrell (mtfarrell@mit.edu)    Andrew Correa (acorrea@mit.edu)

December 5, 2007

1 Introduction

Historical stock price data is a massive amount of time-series data with little-to-no noise. From all this relatively clean data it should be possible to predict accurate estimates of future stock prices. A number of different supervised learning methods have been tried for predicting future stock prices, both for possible monetary gain and because it is an interesting research question. Examples of recent research are K. G. Srinivasa et al. [3], who used a neuro-genetic algorithm, M. A. Kaboudan [1], who used genetic programming, and Robert J. Yan et al. [5], who tried competitive learning techniques.

We use regression to capture changes in stock price prediction as a function of a covariance (kernel, or Gram) matrix. For our purposes it is natural to think of the price of a stock as being some function over time. Loosely, a Gaussian process can be considered to define a distribution over functions, with the inference step taking place directly in the space of functions [2]. Thus, by using a GP to extend a function beyond known price data, we can predict whether stocks will rise or fall the next day, and by how much.

We conduct two experiments, both of which are carried out with data taken from 8 stocks (adbe, adsk, erts, msft, orcl, sap, symc, and vrsn) over a multi-year period. Our experiments keep track of two important features of stocks: their price, and whether their prices are rising or falling. The former is much more important in many cases of stock trading, since it is useful to know by how much you are winning or losing money.

2 Method

Non-kernel regression problems try to predict a continuous response y_t ∈ R from some input vector x_t ∈ R^d. Kernel regression problems, on the other hand, map their input vectors into feature vectors via functions φ(x_t) ∈ R^n for some feature expansion φ(·) and input vector x_t ∈ R^d. In this simplified model without a bias term, the continuous prediction is y_t = θ^T φ(x_t) + ε, where ε is additive noise; the prediction depends directly on the feature expansion used and thus on the kernel K = φ(·)φ(·)^T. The situation is similar for Gaussian processes, and many of the same rules for both linear and kernel regression apply. There are two ways to view regression in the case of Gaussian processes (GPs), the weight-space view and the function-space view [2]. We choose the function-space view.
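To make the function-space view concrete, the short sketch below (our illustration, not code from the original project) draws a few sample functions from a zero-mean GP prior with a squared exponential kernel over a grid of trading days; the hyperparameter values are arbitrary placeholders.

```python
import numpy as np

def se_kernel(t1, t2, sigma=1.0, rho=10.0):
    """Squared exponential covariance between two sets of times."""
    d = t1[:, None] - t2[None, :]
    return sigma**2 * np.exp(-d**2 / (2.0 * rho**2))

# A GP defines a distribution over functions: any finite set of function
# values has a joint Gaussian distribution whose covariance is the kernel
# evaluated at the corresponding inputs.
t = np.arange(100.0)                          # 100 trading days
K = se_kernel(t, t) + 1e-6 * np.eye(len(t))   # small jitter for numerical stability
samples = np.random.multivariate_normal(np.zeros(len(t)), K, size=3)
print(samples.shape)  # (3, 100): three sample "price curves" drawn from the prior
```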

2.1 Gaussian Process

A Gaussian process is any collection of random variables where an arbitrary finite subset of the variables has a joint Gaussian distribution. A Gaussian distribution is entirely specified by its mean and covariance; similarly, a Gaussian process is completely specified by its mean function m(x) and covariance function k(x, x'). That is,

    m(x) = E[f(x)]
    k(x, x') = E[(f(x) - m(x))(f(x') - m(x'))],

and we can write the Gaussian process in the form

    f(x) ~ GP(m(x), k(x, x')).

To simplify notation and calculation the mean m(x) is taken to be 0, so we have

    f(x) ~ GP(0, K).

Stock data does not have a mean of 0, and it would not make sense for it to, as stocks cannot have negative prices. We nevertheless used a Gaussian prior with mean 0 and covariance given by our kernel, i.e. N(0, K), which we justified by shifting our stock price data such that its mean was 0. Not doing so would lead the GP to believe that the variance of our predictive data is quite high, since the actual mean of a stock price is much higher than 0 (one would hope).

The closing prices for stocks are assumed to be noise-free. This is a special case, since most realistic training data contains noise. Here we know the pairs {(t, f_i(t)) : t = 1, ..., n} for n training points, where t is the time at which we evaluate the true closing price f_i(t) of stock i. There is a joint distribution over the training outputs f and the predictions f̂. The prior is centered with mean 0, so

    \begin{bmatrix} f \\ \hat{f} \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} K(X,X) & K(X,\hat{X}) \\ K(\hat{X},X) & K(\hat{X},\hat{X}) \end{bmatrix}\right)

where K(X, X) is an n × n Gram matrix, X is the n × d set of training input vectors, and X̂ is the corresponding set of test input vectors. Functions drawn from this joint prior are only useful for prediction if they agree with the observed training points; this limits the types of functions acceptable for making predictions on the data. Function samples could be drawn from the distribution and matched against the training data points to see if they fit; however, for large data sets such as ours, this is computationally infeasible. Instead, consider calculating the conditional distribution of the values f̂ given the observations f, X, and X̂. This takes the form

    \hat{f} \mid \hat{X}, X, f \sim \mathcal{N}\left(K(\hat{X},X)\,K(X,X)^{-1}f,\;\; K(\hat{X},\hat{X}) - K(\hat{X},X)\,K(X,X)^{-1}K(X,\hat{X})\right).

This is the noise-free prediction for f̂ over the posterior [2].
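As a concrete illustration of the noise-free conditioning step above, here is a minimal sketch (ours, not the authors' implementation) that centers a price series, conditions the GP on the observed days, and returns the posterior mean and covariance at test days before shifting the mean back; it reuses the same toy squared exponential kernel as the earlier sketch, and all hyperparameter values are placeholders.

```python
import numpy as np

def se_kernel(t1, t2, sigma=1.0, rho=10.0):
    """Toy squared exponential kernel (same form as in the earlier sketch)."""
    d = t1[:, None] - t2[None, :]
    return sigma**2 * np.exp(-d**2 / (2.0 * rho**2))

def gp_posterior(t_train, f_train, t_test, kernel):
    """Noise-free GP posterior: mean K(X̂,X) K(X,X)^{-1} f and the matching covariance."""
    K = kernel(t_train, t_train) + 1e-8 * np.eye(len(t_train))  # jitter for stability
    K_s = kernel(t_test, t_train)          # K(X̂, X)
    K_ss = kernel(t_test, t_test)          # K(X̂, X̂)
    alpha = np.linalg.solve(K, f_train)    # K(X,X)^{-1} f without forming the inverse
    mean = K_s @ alpha
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, cov

# Center the closing prices so the zero-mean prior is sensible, predict the next
# five days, then translate the posterior mean back to price space.
t_train = np.arange(30.0)
prices = 20.0 + np.cumsum(np.random.randn(30))   # stand-in for real closing prices
offset = prices.mean()
mean, cov = gp_posterior(t_train, prices - offset, np.arange(30.0, 35.0),
                         lambda a, b: se_kernel(a, b, sigma=1.0, rho=5.0))
predicted_prices = mean + offset
```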

2.2 Kernel Selection

For the stock price data set we chose to try several different kernels to see which ones produced the best fit to the training data. There are several classes of kernel that can be used [2]. Below are the few considered in this project.

2.2.1 Squared Exponential Covariance

    K(t,t') = \sigma^2 \exp\left(-\frac{(t-t')^2}{2\rho^2}\right)

where \rho is the characteristic length-scale of the covariance function and \sigma^2 controls the overall (vertical) scale of the curve. The Squared Exponential (SE) covariance is infinitely differentiable. This means there is an implicit assumption about how smooth the estimated functions are; that is, higher-frequency variation will be smoothed over by this function.

2.2.2 Matern Class Covariance

    K(t,t') = \frac{2^{1-\nu}}{\Gamma(\nu)}\left(\frac{\sqrt{2\nu}\,|t-t'|}{\rho}\right)^{\nu} K_\nu\!\left(\frac{\sqrt{2\nu}\,|t-t'|}{\rho}\right)

where K_\nu is a modified Bessel function. This covariance function is an alternative to the SE function. While it can model rougher, higher-order behavior, it is not infinitely differentiable, and it approximates the SE function as \nu \to \infty. In our implementation we use only one specific case of this covariance function, \nu = 3/2, which is

    K_{\nu=3/2}(t,t') = \left(1 + \frac{\sqrt{3}\,|t-t'|}{\ell}\right)\exp\left(-\frac{\sqrt{3}\,|t-t'|}{\ell}\right).

K_{\nu=3/2} is one of the more interesting cases for machine learning applications because the value of \nu is neither too high nor too low. When \nu is too high, the predicted function becomes very smooth and is hard to distinguish from the SE case; when it is too low, the function becomes very rough and jagged. Either extreme makes unrealistic assumptions about f̂, thus \nu = 3/2 is a good value.

2.2.3 Rational Quadratic Covariance

    k_{RQ}(t,t') = \left(1 + \frac{(t-t')^2}{2\alpha\ell^2}\right)^{-\alpha}

The k_{RQ} covariance approaches the SE covariance function in the limit \alpha \to \infty. It can be viewed as an infinite sum (scale mixture) of SE covariance functions with different length-scales.

2.2.4 Marginal Likelihood Gradient

Given a kernel, it is important to find the set of hyperparameters that best fits the observed data. To do this, maximize the marginal likelihood using its gradient,

    \frac{\partial}{\partial\theta_j}\log p(y \mid X,\theta) = \frac{1}{2}\,\mathrm{tr}\!\left((\alpha\alpha^T - K^{-1})\,\frac{\partial K}{\partial\theta_j}\right), \qquad \text{where } \alpha = K^{-1}y

[2]. The inversion of the matrix K is very computationally intensive, with O(n^3) complexity. This is a large drawback, particularly for large data sets such as ours. However, once K^{-1} is known, the computational cost of the gradient with respect to each hyperparameter drops to O(n^2).
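For concreteness, the sketch below (our own, with arbitrary hyperparameter values) implements the three covariance functions above and evaluates the marginal-likelihood gradient formula for a single scalar hyperparameter; a full implementation would compute the gradient for every hyperparameter and hand it to a gradient-based optimizer.

```python
import numpy as np

def se(t1, t2, sigma, rho):
    """Squared exponential covariance."""
    d = t1[:, None] - t2[None, :]
    return sigma**2 * np.exp(-d**2 / (2.0 * rho**2))

def matern32(t1, t2, ell):
    """Matern covariance with nu = 3/2."""
    r = np.abs(t1[:, None] - t2[None, :])
    a = np.sqrt(3.0) * r / ell
    return (1.0 + a) * np.exp(-a)

def rational_quadratic(t1, t2, alpha, ell):
    """Rational quadratic covariance (scale mixture of SE kernels)."""
    d2 = (t1[:, None] - t2[None, :])**2
    return (1.0 + d2 / (2.0 * alpha * ell**2))**(-alpha)

def log_ml_gradient(K, dK_dtheta, y):
    """d/dθ_j log p(y|X,θ) = 1/2 tr((αα^T - K^{-1}) ∂K/∂θ_j) with α = K^{-1} y."""
    K_inv = np.linalg.inv(K)        # the O(n^3) step; reused for every hyperparameter
    alpha = K_inv @ y
    return 0.5 * np.trace((np.outer(alpha, alpha) - K_inv) @ dK_dtheta)

# Example: gradient of the log marginal likelihood w.r.t. the SE length-scale rho.
t = np.arange(50.0)
y = np.sin(t / 7.0)                              # stand-in for centered closing prices
sigma, rho = 1.0, 5.0
K = se(t, t, sigma, rho) + 1e-6 * np.eye(len(t))
d2 = (t[:, None] - t[None, :])**2
dK_drho = se(t, t, sigma, rho) * d2 / rho**3     # analytic derivative of the SE kernel
print(log_ml_gradient(K, dK_drho, y))
```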

3 Experiments

We performed two experiments showing two different methods of stock prediction using Gaussian processes. The first uses a ±1 labeling to produce a simple metric of trend (where the label is +1 when the stock is rising and -1 when it is falling). The second uses regression on the value of the stock at the close of the trading day to predict the price of that stock for the following day(s). To fit the hyperparameters mentioned in Section 2.2 we used an iterative optimization of the marginal likelihood with each different kernel.

3.1 Prediction of Up/Down Trend

In this experiment we see if we can predict stock prices by looking only at stock trend data (i.e. "Is this stock rising or falling?"). Prediction of this property is highly sensitive to the amount of training data used. As shown over the large, multi-year time range in Figure 2, the price fluctuates regularly, averaging out to a 50% chance that the predicted label ŷ will be +1 and a 50% chance it will be -1. This particular prediction method is more useful on smaller data sets. Consider a Matern class covariance function trained on one month of stock price data, as in Figure 3. After regression is performed there is an indication of trend for many stocks: take stock 6, and notice how the prediction curve is falling but remains above 0, indicating that the model is becoming less sure there will be a gain in closing price. If the data is biased towards a +1 label then the next day's prediction is +1. As the variance grows for predictions further beyond the first day, each label becomes roughly equally likely. This is likely due to the model's inability to predict a label effectively once the number of days passed has increased.

3.2 Single-Stock Price Prediction

This experiment consists of three parts: training data regression, test data price prediction, and prediction assessment. The model assumes that the training data has zero mean, so the training outputs were recentered by calculating TrainingData = TrainingData - mean(TrainingData), leaving the outputs centered about zero. After the means and variances are predicted we translate the data back. This gives results like those in Table 1.

    Time Period   Squared Exponential   Matern ν = 3/2   Rational Quadratic
    1 day         Stock 7               Stock 7          no stock predicted
    1 month       Stock 7               Stock 7          Stock 7
    6 months      Stock 7               Stock 7          Stock 7
    1 year        Stock 8               Stock 8          Stock 7
    3 years       Stock 8               Stock 8          Stock 7

    Table 1: The stock chosen by each covariance function for each training time span in Experiment 2. Each choice carries a regret, measured as described below.

We measured penalty in terms of regret. Since we had the actual next-day (day t+1) prices of the test stocks, we were able to find which stock would have been the best to invest in the day before (day t). We then took the difference between the day-t-to-day-(t+1) price gain of that best stock and the gain of the stock we predicted as the regret. In the case when our model predicted that no stock would gain any money the next day and no stock did, we gave no penalty. If the model was wrong in this prediction, however, the penalty became the gain of the best stock, just as in the normal case.
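The sketch below (our reconstruction under assumptions, not the authors' code) shows one way the selection-and-regret evaluation just described could be wired up: predict each stock's next closing price with the noise-free GP posterior from the earlier sketch (it assumes the `gp_posterior` and `se_kernel` helpers defined there), pick the stock with the largest predicted gain, and score the pick against the stock that actually gained the most.

```python
import numpy as np

def predict_next_close(prices, kernel):
    """Predict tomorrow's close from a window of past closes with a zero-mean GP."""
    t = np.arange(float(len(prices)))
    offset = prices.mean()                       # recenter, as in Section 3.2
    mean, _ = gp_posterior(t, prices - offset, np.array([float(len(prices))]), kernel)
    return mean[0] + offset                      # translate back to price space

def pick_stock_and_regret(history, actual_next, kernel):
    """history: stock -> array of past closes; actual_next: stock -> true day t+1 close."""
    predicted_gain = {s: predict_next_close(p, kernel) - p[-1] for s, p in history.items()}
    actual_gain = {s: actual_next[s] - history[s][-1] for s in history}
    best_gain = max(actual_gain.values())
    if max(predicted_gain.values()) <= 0:        # model says no stock will gain money
        return None, (0.0 if best_gain <= 0 else best_gain)
    pick = max(predicted_gain, key=predicted_gain.get)
    return pick, best_gain - actual_gain[pick]   # regret relative to the best pick
```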

4 Conclusions

The two experiments conducted here show the power of using Gaussian processes for predicting stock prices. There were a number of different ways to test how to make a good prediction based on the training data. By varying the lengths of time considered for training, we achieved different results and accuracies. Using a long time period in experiment 2 resulted in good stock predictions, while using shorter times resulted in sub-optimal predictions. In experiment 1, on the other hand, the converse was true. The use of long time periods in the first experiment did not produce interesting results because, over a long time period, the regression essentially predicts only the mean label value when predicting trend. For experiment 1, where the goal was to classify between two types of activity, it was better to consider shorter time scales to avoid this averaging effect. A possible future direction is using a larger set of trend classes rather than restricting ourselves to mere growth and loss trends.

The method in experiment 2 showed that a Matern class or Squared Exponential class covariance function predicted the optimal stock with less or equal regret compared with the Rational Quadratic covariance function. The Squared Exponential function makes stronger assumptions about the smoothness of the data than the Matern class covariance. The Rational Quadratic is a combination of Squared Exponential functions at different length-scales; this must play a role in how ineffective this covariance function was at predicting the optimal stock pick. It seems that if a large enough training set is used the Rational Quadratic covariance will eventually give stock 8 as a result; however, there was not enough time to test this. Given the data, it appears that either the Matern class or the Squared Exponential class is appropriate for use in prediction. Practically speaking, however, Gaussian processes are not the most efficient method to use, because the computational load for making predictions is high. To make money in the stock market with a machine learning technique, a faster method of computation would be required.

References

[1] M. A. Kaboudan. Genetic programming prediction of stock prices. Comput. Econ., 16(3):207-236, 2000.

[2] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[3] K. G. Srinivasa, K. R. Venugopal, and L. M. Patnaik. An efficient fuzzy based neuro-genetic algorithm for stock market prediction. Int. J. Hybrid Intell. Syst., 3(2):63-81, 2006.

[4] R. von Mises. Mathematical Theory of Probability and Statistics. 1964.

[5] Robert J. Yan and Charles X. Ling. Machine learning for stock selection. In KDD '07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1038-1042, New York, NY, USA, 2007. ACM.

[Figure 1 panels: GP price predictions for Stock 1 through Stock 8, each annotated with its fitted hyperparameters σ and ρ.]

Figure 1: The above figures show the results of running a prediction on a large multi-year dataset.

[Figure 2 panels: stock predictions with the Matern class covariance for Stock 1, Stock 2, and Stock 3, each annotated with its fitted hyperparameters.]

Figure 2: Running label classification on a large training set produces poor predictions. In a practical setting some stocks are highly regular and vary only a small amount for years at a time. This is reflected in the fact that the labels appear to have a 50% chance of being labeled +1 and a 50% chance of being labeled -1 over time.

[Figure 3 panels: stock trends with the Matern class covariance for Stock 1 through Stock 8, each showing the prediction, its fitted hyperparameters, and the target labels.]

Figure 3: Running label classification on a smaller dataset produces more meaningful results. The 95% confidence intervals show a trend towards either a +1 or -1 label.

[Figure 4 panels: predictions for stock 7 and stock 8 under the Squared Exponential, Matern class, and Rational Quadratic covariance functions, each annotated with its fitted hyperparameters.]

Figure 4: The comparison between stock 7 and stock 8. Both were picked by the model; choosing stock 7 incurred a regret while choosing stock 8 caused no loss.
