Gaussian Process Regression Models for Predicting Stock Trends
M. Todd Farrell∗ and Andrew Correa†

December 5, 2007

1 Introduction

Historical stock price data is a massive amount of time-series data with little-to-no noise. From all this relatively clean data it should be possible to predict accurate estimates of future stock prices. A number of different supervised learning methods have been tried to predict future stock prices, both for possible monetary gain and because it is an interesting research question. Examples of recent research are K. G. Srinivasa et al. [3], who used a neuro-genetic algorithm, M. A. Kaboudan [1], who used genetic programming, and Robert J. Yan et al. [5], who tried competitive learning techniques.

We use regression to capture changes in stock price prediction as a function of a covariance (kernel, or Gram) matrix. For our purposes it is natural to think of the price of a stock as being some function over time. Loosely, a Gaussian process can be considered to define a distribution over functions, with the inference step taking place directly in the space of functions [2]. Thus, by using a GP to extend a function beyond known price data, we can predict whether stocks will rise or fall the next day, and by how much.

We conduct two experiments, both of which are carried out with data taken from 8 stocks¹ over a multi-year period. Our experiments keep track of two important features of stocks: their price, and whether their prices are rising or falling. The former is much more important in many cases of stock trading, since it is useful to know by how much you are winning or losing money.

2 Method

Non-kernel regression problems try to predict a continuous response y_t ∈ R from some input vector x_t ∈ R^d. Kernel regression problems, on the other hand, map their input vectors into feature vectors via functions φ(x_t) ∈ R^n for some feature expansion φ(·) and input vector x_t ∈ R^d. The continuous prediction y_t = θ^T φ(x_t) + ε in this simplified model (without a bias term, and with additive noise ε) depends directly on the feature expansion used, and thus on the kernel K = φ(·)φ(·)^T. The situation is similar in the case of Gaussian processes, and many of the same rules for both linear and kernel regression apply. There are two ways to view regression in the case of Gaussian processes (GPs): the weight-space view and the function-space view [2]. We choose the function-space view.

2.1 Gaussian Process

A Gaussian process is any collection of random variables where an arbitrary subset of the variables has a joint Gaussian distribution.

∗mtfarrel@mit.edu
†acorrea@mit.edu
¹These stocks are: adbe, adsk, erts, msft, orcl, sap, symc, and vrsn.
A Gaussian distribution is entirely specified by its mean and covariance; similarly, a Gaussian process is completely specified by its mean function m(x) and covariance function k(x, x′). That is,

    m(x) = E[f(x)]

and

    k(x, x′) = E[(f(x) − m(x))(f(x′) − m(x′))].

We can write the Gaussian process in the form

    f(x) ∼ GP(m(x), k(x, x′)).

To simplify notation and calculation, the mean m(x) is taken to be 0.² Thus we have

    f(x) ∼ GP(0, K).

The closing prices for stocks are assumed to be noise-free. This is a special case, since in realistic processes most training data contains noise. In this case we know the pairs {(t, f_i(t)) : t = 1, …, n} for n training points, where t is the time at which we evaluate the true price f_i(t) of stock i at closing time. There is a joint distribution over the training outputs f and the predictions f̂. The prior is centered with mean 0, so

    [f; f̂] ∼ N( 0, [K(X, X), K(X, X̂); K(X̂, X), K(X̂, X̂)] ),

where each covariance block of X and X̂ elements is a Gram matrix, X is the n × d set of training input vectors, and X̂ is the corresponding set of test input vectors. Functions drawn from this joint prior distribution have the property that they agree with the observed training points. This limits the types of functions acceptable for making predictions on the data. Function samples could be drawn from the distribution and matched against the training data points to see if there is a fit; however, for large data sets such as ours, this is computationally infeasible. Instead, consider calculating the conditional distribution of the values f̂ given the observations f, X, X̂. This takes the form

    f̂ | X̂, X, f ∼ N( K(X̂, X) K(X, X)⁻¹ f,  K(X̂, X̂) − K(X̂, X) K(X, X)⁻¹ K(X, X̂) ).

This is the noise-free prediction for f̂ over the posterior [2].

2.2 Kernel Selection

For the stock price data set we chose to try several different kernels to see which ones produced the best fit to the training data. There are several classes of kernels that can be used [2]. Below are the few considered in this project.

²Stock data does not have a mean of 0. It would not make sense, as stocks cannot have negative prices. However, we used a Gaussian that assumed a mean of 0 with the variance of our kernel (i.e., N(0, K)). We did this by shifting our stock price data such that its mean was 0. Not doing so would lead the GP to believe that our predictive data's variance was quite high, since the actual mean of a stock price is much higher than 0 (one would hope).
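Before turning to the individual kernels, here is a minimal NumPy sketch of the noise-free posterior prediction given above. This is a hypothetical illustration rather than the authors' implementation: it uses the squared exponential kernel defined in Section 2.2.1 below, and the hyperparameter values and synthetic price series are arbitrary stand-ins.

```python
import numpy as np

def se_kernel(t1, t2, sigma=1.0, rho=10.0):
    """Squared exponential covariance between two vectors of times (Sec. 2.2.1)."""
    d = t1[:, None] - t2[None, :]
    return sigma**2 * np.exp(-d**2 / (2.0 * rho**2))

def gp_posterior(t_train, f_train, t_test, kernel=se_kernel):
    """Noise-free GP posterior mean and covariance at the test times."""
    K = kernel(t_train, t_train)          # K(X, X)
    Ks = kernel(t_test, t_train)          # K(X*, X)
    Kss = kernel(t_test, t_test)          # K(X*, X*)
    # Solve against K via a Cholesky factorization instead of an explicit inverse;
    # the tiny jitter keeps K positive definite despite the noise-free assumption.
    L = np.linalg.cholesky(K + 1e-9 * np.eye(len(t_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f_train))   # K^{-1} f
    v = np.linalg.solve(L, Ks.T)
    mean = Ks @ alpha                     # K(X*,X) K(X,X)^{-1} f
    cov = Kss - v.T @ v                   # K(X*,X*) - K(X*,X) K(X,X)^{-1} K(X,X*)
    return mean, cov

# Extrapolate a mean-centered closing-price series two days past the data.
t = np.arange(20.0)
prices = 30.0 + np.cumsum(np.random.randn(20))   # stand-in for real closing prices
f = prices - prices.mean()                       # shift to zero mean (see footnote 2)
mu, cov = gp_posterior(t, f, np.array([20.0, 21.0]))
print(mu + prices.mean())                        # predicted prices
print(np.sqrt(np.maximum(np.diag(cov), 0.0)))    # predictive standard deviations
```

With noise-free observations the Gram matrix can be nearly singular when training inputs lie close together, which is why the sketch adds a small jitter before factorizing.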
2.2.1 Squared Exponential Covariance

    K(t, t′) = σ² exp( −(t − t′)² / (2ρ²) ),

where ρ is the characteristic length-scale of the covariance function and σ controls the smoothness of the curve. The squared exponential covariance is infinitely differentiable. This means there is some implicit assumption about how smooth the estimated functions are; that is, higher-order noise will be smoothed over by this function.

2.2.2 Matérn Class Covariance

    K(t, t′) = (2^(1−ν) / Γ(ν)) ( √(2ν) |t − t′| / ρ )^ν K_ν( √(2ν) |t − t′| / ρ ),

where K_ν is a modified Bessel function. This covariance function is an alternative to the squared exponential (SE) covariance function. While it can model higher-order terms, it is not infinitely differentiable, and it approximates the SE function as ν → ∞. In our implementation we use only a specific case of this covariance function, ν = 3/2, which is

    K_{ν=3/2}(t, t′) = ( 1 + √3 |t − t′| / ℓ ) exp( −√3 |t − t′| / ℓ ).

K_{ν=3/2} is more interesting in most machine learning applications since the value of ν is not too high. When ν is too low, the predicted function will have very many jagged edges; on the other hand, when it is too high, the function will be too smooth. Either case makes unrealistic assumptions about f̂, so ν = 3/2 is a good value.

2.2.3 Rational Quadratic Covariance

    k_RQ(t, t′) = ( 1 + (t − t′)² / (2αℓ²) )^(−α).

The k_RQ covariance becomes the SE covariance function in the limit α → ∞, and can be viewed as an infinite sum of SE covariance functions with different length-scales.

2.2.4 Marginal Likelihood Gradient

Given a kernel, it is important to find the set of hyperparameters that best fits the observed data. To do this, maximize the marginal likelihood using its gradient,

    ∂/∂θ_j log p(y | X, θ) = (1/2) tr( (αα^T − K⁻¹) ∂K/∂θ_j ),

where α = K⁻¹ y [2]. Inverting the matrix K is very computationally intensive, with O(n³) complexity. This is a large drawback, particularly for large data sets such as ours. However, once K⁻¹ is known, the computational complexity of prediction drops to O(n²).
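The three covariance functions and the marginal likelihood gradient above translate directly into code. The following is a sketch under the same caveats as before (not the authors' implementation; the default hyperparameter values are arbitrary):

```python
import numpy as np

def k_se(r, sigma=1.0, rho=10.0):
    """Squared exponential covariance as a function of the lag r = t - t'."""
    return sigma**2 * np.exp(-r**2 / (2.0 * rho**2))

def k_matern32(r, l=10.0):
    """Matern covariance for the special case nu = 3/2."""
    a = np.sqrt(3.0) * np.abs(r) / l
    return (1.0 + a) * np.exp(-a)

def k_rq(r, alpha=2.0, l=10.0):
    """Rational quadratic covariance: a scale mixture of SE kernels."""
    return (1.0 + r**2 / (2.0 * alpha * l**2)) ** (-alpha)

def log_ml_and_grad(K, dK_dtheta_j, y):
    """Log marginal likelihood and its partial derivative for one hyperparameter.

    Implements d/d(theta_j) log p(y|X,theta) = 0.5 tr((aa^T - K^{-1}) dK/d(theta_j))
    with a = K^{-1} y.
    """
    n = len(y)
    L = np.linalg.cholesky(K + 1e-9 * np.eye(n))       # the O(n^3) step
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))    # a = K^{-1} y
    log_ml = (-0.5 * y @ a
              - np.log(np.diag(L)).sum()               # = -0.5 log|K|
              - 0.5 * n * np.log(2.0 * np.pi))
    K_inv = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(n)))
    grad_j = 0.5 * np.trace((np.outer(a, a) - K_inv) @ dK_dtheta_j)
    return log_ml, grad_j
```

A gradient-based optimizer would call log_ml_and_grad once per hyperparameter per iteration; the Cholesky factorization of K is the O(n³) bottleneck discussed above, and a production implementation would factor once per iteration and reuse the result across all θ_j.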
3 Experiments

We performed two experiments showing two different methods of stock prediction using Gaussian processes. The first uses a ±1 labeling to produce a simple metric of trend (where the label is +1 when the stock is rising and −1 when it is falling). The second uses regression on the value of the stock at the close of the trading day to predict the price of that stock for the following day(s). To fit the hyperparameters mentioned in Section 2.2, we used an iterative optimization of the marginal likelihood with each different kernel.

3.1 Prediction of Up/Down Trend

In this experiment we see if we can predict stock prices by looking at only stock trend data (i.e., "Is this stock rising or falling?"). Prediction of this property is highly sensitive to the amount of training data used. As shown over the large time range in Figure 2 (from /6/3 to /8/6), the price fluctuates regularly and averages out to a 50% chance that the predicted label ŷ will be +1 and a 50% chance it will be −1. This particular prediction method is more useful on smaller data sets. Consider a Matérn class covariance function trained on one month of stock price data, as in Figure 3. After regression is performed there is an indication for many stocks: take stock 6, and notice how the prediction curve is going down but remains above 0, indicating that the model is becoming less sure there will be a gain in closing price. If the data is biased towards a +1 label, then the next day's prediction is +1. As the variance increases for further predictions beyond the first day, the trend moves towards being about equally likely to be +1 or −1. This is likely due to our model's growing uncertainty in predicting a label effectively as the number of days passed increases.

3.2 Single-Stock Price Prediction

This experiment consists of three parts: training data regression, test data price prediction, and prediction assessment. The model assumes that the training data has zero mean, so the training outputs were recentered by calculating TrainingData = TrainingData − mean(TrainingData), leaving the outputs centered about zero. After the means and variances are predicted, we translate the data back. This gives results like those in Table 1.

Table 1: Experiment 2 results — the stock chosen by each covariance function for each training time span.

Time Period | Squared Exponential | Matérn ν = 3/2 | Rational Quadratic
1 day       | Stock 7             | Stock 7        | no stock predicted
1 month     | Stock 7             | Stock 7        | Stock 7
6 months    | Stock 7             | Stock 7        | Stock 7
1 year      | Stock 8             | Stock 8        | Stock 7
3 years     | Stock 8             | Stock 8        | Stock 7

We measured penalty in terms of regret, as sketched below.
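The following is a minimal sketch of this regret measure; the exact rule is spelled out in the next paragraph, and treating the "no stock" option as a zero-gain benchmark (the floor at zero below) is an assumption rather than something stated in the text.

```python
import numpy as np

def regret(pick, gains):
    """Regret for one trading day.

    pick:  index of the stock the model chose, or None if it predicted
           that no stock would gain money the next day.
    gains: realized day-(t+1) minus day-t closing prices, one entry per stock.
    """
    # Assumption: the "best stock" benchmark includes the option of holding
    # nothing, so its gain is floored at zero.
    best = max(gains.max(), 0.0)
    if pick is None:
        return best              # 0 if no stock gained, else the missed gain
    return best - gains[pick]

# Example: three stocks move +2.0, -0.5, +1.2 and the model picks the third.
print(regret(2, np.array([2.0, -0.5, 1.2])))   # -> 0.8 (stock 0 was better)
print(regret(None, np.array([-1.0, -0.2])))    # -> 0.0 (correctly sat out)
```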
Since we had the actual next-day (day t + 1) prices of the test stocks, we were able to find which stock would have been the best to invest in the day before (day t). We then compared the change in price between time t and t + 1 for the best stock and for the stock we predicted, and took the difference. In the case when our model predicted that no stock would gain any money the next day and no stock did, we gave no penalty. If the model was wrong in this prediction, however, the penalty became the gain of the best stock, just as in the normal case.

4 Conclusions

The two experiments conducted here show the power of using Gaussian processes for predicting stock prices. There were a number of different ways to test how to make a good prediction based on the training data. By varying the lengths of time considered for training, we achieved different results and accuracies. Using a long time period in experiment 2 resulted in good stock predictions, while using shorter times resulted in sub-optimal predictions. In experiment 1, on the other hand, the converse was true. The use of long time periods in the first experiment did not produce interesting results because, over a long time period, regression only predicted the mean value of f when predicting trend. For experiment 1, where the goal was to classify between two types of activity, it was better to consider shorter time scales to avoid this averaging effect. A possible future direction is using a larger set of trend labels rather than restricting ourselves to mere growth and loss trends.

The method in experiment 2 showed that a Matérn class or squared exponential class covariance function predicted the optimal stock with less than or equal regret compared with the rational quadratic covariance function. The squared exponential function makes stronger assumptions about the smoothness of the data than the Matérn class covariance does. The rational quadratic is a combination of squared exponential functions at different length-scales; this must play a role in how ineffective this covariance function was at predicting the optimal stock pick. It seems that if a large enough training set is used, the rational quadratic covariance will eventually also give stock 8 as a result; however, there was not enough time to test this. Given the data, it appears that either the Matérn class or the squared exponential class is appropriate for use in prediction. Practically speaking, however, Gaussian processes are not the most efficient method to use, because the computational load for predictions is high. To make money in the stock market using a machine learning technique, a faster method of computation would be required.

References

[1] M. A. Kaboudan. Genetic programming prediction of stock prices. Computational Economics, 16(3):207–236, 2000.

[2] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[3] K. G. Srinivasa, K. R. Venugopal, and L. M. Patnaik. An efficient fuzzy based neuro-genetic algorithm for stock market prediction. International Journal of Hybrid Intelligent Systems, 3(2):63–81, 2006.

[4] R. von Mises. Mathematical Theory of Probability and Statistics. Academic Press, 1964.

[5] Robert J. Yan and Charles X. Ling. Machine learning for stock selection. In KDD '07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1038–1042, New York, NY, USA, 2007. ACM.
[Figure 1 consists of eight panels, one per stock (Stock 1 through Stock 8), each showing the prediction and the fitted hyperparameters σ and ρ.]

Figure 1: The above figures show the results of running a prediction on a large multi-year dataset.
[Figure 2 consists of panels of stock predictions with the Matérn class covariance (Stocks 1–3), each annotated with its fitted hyperparameters.]

Figure 2: Running label classification on a large training set produces poor predictions. In a practical setting some stocks are highly regular and vary only a small amount for years at a time. This is reflected in the fact that, over time, the labels appear to have a 50% chance of being labeled +1 and a 50% chance of being labeled −1.
[Figure 3 consists of eight panels of stock trends with the Matérn class covariance (Stock 1 through Stock 8), each annotated with its fitted hyperparameters σ and ρ and the target labels.]

Figure 3: Running label classification on a smaller dataset produces more meaningful results. The 95% confidence intervals show a trend towards either a +1 or −1 label.
[Figure 4 consists of panels comparing the predictions for stock 7 and stock 8 under the squared exponential, Matérn class, and rational quadratic covariances, each annotated with its fitted hyperparameters.]

Figure 4: The comparison between stock 7 and stock 8. Both were picked by the model; choosing stock 7 caused a small regret while choosing stock 8 caused no loss.