Learning Gaussian Process Models from Uncertain Data
Patrick Dallaire, Camille Besse, and Brahim Chaib-draa
DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada

Abstract. The traditional formulation of supervised learning generally assumes that only the output data are uncertain. However, this assumption might be too strong for some learning tasks. This paper investigates the use of a Gaussian Process prior to infer consistent models given uncertain data. By assuming a Gaussian distribution with known variances over the inputs and a Gaussian covariance function, it is possible to marginalize out the input uncertainty and keep an analytical posterior distribution over functions. We demonstrate the properties of the method on a synthetic problem and on a more realistic one, which consists in learning the dynamics of the well-known cart-pole problem, and compare its performance against a classic Gaussian Process. The results show a large improvement in mean squared error as well as in the consistency of the regression.

Key words: Gaussian Processes, Noisy Inputs, Dynamical Systems

1 Introduction

Whenever a regression is performed with a statistical model on noisy inputs, the quality of the estimated model may suffer if no attention is paid to the uncertainty of the training set. This may occur in two different ways: one due to training with noisy inputs, and the other due to extra noise in the outputs caused by the noise over the inputs. Statisticians have already investigated this problem in several ways: total least squares [1] changes the cost of the regression problem to encourage the regressor to minimize the error due to noise on the outputs as well as on the inputs; the error-in-variables model [2] deals directly with noisy inputs by creating correlated virtual variables that thus have correlated noises.
Recent work in machine learning has also addressed this problem, either by attempting to learn the entire input distribution [3], by integrating over chosen noisy points using an estimated distribution during training [4], or by de-noising the inputs by accounting for the noise while training the model [5]. In this paper, we investigate an approach, pioneered by Girard [6], in which, rather than only predicting from noisy inputs, we learn from these inputs by marginalizing out the input uncertainty while keeping an analytical posterior distribution over functions. This approach achieves two goals. First, it shows that we are able to learn and make
predictions from noisy inputs. Second, the method is applied to the well-known problem of balancing a pole on a cart, where the task is to learn the 5-dimensional nonlinear dynamics of the system. Results show that taking the uncertainty of the inputs into account makes the regression consistent and drastically reduces the mean squared error.

This paper is structured as follows. First, we formalize the problem of learning with noisy inputs and introduce some notation about Gaussian Processes and the regression model. In Section 3, we present experimental results on a difficult artificial problem and on a more realistic problem. Section 4 discusses the results and concludes the paper.

2 Preliminaries

A Gaussian Process (GP) is a stochastic process used in machine learning to describe a distribution directly over the function space. It provides a probabilistic approach to the learning task and has the useful property of giving uncertainty estimates along with its predictions. The interested reader is referred to [7] for more information on GPs.

2.1 Gaussian Process regression

By using a GP prior, it is assumed that the joint distribution of any finite set of observations given their inputs is multivariate Gaussian. Thus, a GP is fully specified by its mean and covariance functions. Assume that a set of training data $\mathcal{D} = \{x_i, y_i\}_{i=1}^N$ is available, where $x_i \in \mathbb{R}^D$ and $y_i$ is a scalar observation such that

$$y_i = f(x_i) + \epsilon_i \quad (1)$$

and where $\epsilon_i$ is white Gaussian noise. For convenience, we will use the notation $X = [x_1, \ldots, x_N]$ for the inputs and $y = [y_1, \ldots, y_N]$ for the outputs. Under the GP prior model with zero mean function, the joint distribution of the training set is $y \mid X \sim \mathcal{N}(0, K)$, where $K$ is the covariance matrix whose entries $K_{ij}$ are given by the covariance function $C(x_i, x_j)$.
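As a minimal sketch of the prior $y \mid X \sim \mathcal{N}(0, K)$ described above, the following Python/NumPy snippet builds $K$ entry by entry from a covariance function and draws one vector of observations from the prior. The helper names (`covariance_matrix`, `squared_exponential`), the unit length-scale and the noise variance of 0.01 are illustrative assumptions, not values from the paper; the squared exponential is used only because it is the kernel the paper adopts in Eq. (4).

```python
import numpy as np

def squared_exponential(x_i, x_j, length_scale=1.0, signal_var=1.0):
    """Assumed covariance function C(x_i, x_j), without the noise term."""
    d = x_i - x_j
    return signal_var * np.exp(-0.5 * (d @ d) / length_scale**2)

def covariance_matrix(X, cov_fn):
    """Build K with K[i, j] = cov_fn(X[i], X[j]) for an (N, D) input array."""
    N = X.shape[0]
    K = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            K[i, j] = cov_fn(X[i], X[j])
    return K

rng = np.random.default_rng(0)
X = np.linspace(-3.0, 3.0, 20).reshape(-1, 1)   # 20 one-dimensional inputs
K = covariance_matrix(X, squared_exponential)
K_noisy = K + 0.01 * np.eye(len(X))             # white noise on the diagonal
y = rng.multivariate_normal(np.zeros(len(X)), K_noisy)  # one draw of y | X
```

The explicit double loop mirrors the definition of $K_{ij}$; a vectorized construction would be preferable for large $N$.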
This multivariate Gaussian distribution over the training observations can be used to compute the posterior distribution over functions. Predictions are then made using the posterior mean and its associated measure of uncertainty, given by the posterior covariance. For a test input $x_*$, the posterior distribution is $f_* \mid x_*, X, y \sim \mathcal{N}(\mu(x_*), \sigma^2(x_*))$ with mean and variance functions given by

$$\mu(x_*) = k_*^\top K^{-1} y \quad (2)$$

$$\sigma^2(x_*) = C(x_*, x_*) - k_*^\top K^{-1} k_* \quad (3)$$

where $k_*$ is the $N \times 1$ vector of covariances between $x_*$ and the training inputs $X$. Although many covariance functions can be used to define a GP prior, we will use for the remainder of this paper the squared exponential, which is one of the most widely used kernel functions. The chosen kernel function

$$C(x_i, x_j) = \sigma_f^2 \exp\left(-\tfrac{1}{2}(x_i - x_j)^\top W^{-1} (x_i - x_j)\right) + \sigma_\epsilon^2 \delta_{ij} \quad (4)$$
is parameterized by a vector of hyperparameters $\theta = [W, \sigma_f^2, \sigma_\epsilon^2]$, where $W$ is the diagonal matrix of characteristic length-scales, which accounts for a different covariance measure in each input dimension, $\sigma_f^2$ is the signal variance and $\sigma_\epsilon^2$ is the noise variance. Varying these hyperparameters influences the interpretation of the training data by modifying the shapes of the functions allowed by the GP prior. It can be difficult to fix the hyperparameters of a kernel function a priori and expect them to fit the observed data correctly. A common way to estimate the hyperparameters is to maximize the log likelihood of the observations $y$ [7]. The function to maximize is

$$\log p(y \mid X, \theta) = -\tfrac{1}{2} y^\top K^{-1} y - \tfrac{1}{2} \log |K| - \tfrac{N}{2} \log 2\pi \quad (5)$$

since the joint distribution of the observations is a multivariate Gaussian. The maximization can be done using conjugate gradient methods to find an acceptable local maximum.

2.2 Learning with uncertain inputs

As stated in the introduction, the assumption that only the outputs are noisy is not sufficient for some learning tasks. Consider the case where the inputs are uncertain and where each input value comes with a variance estimate. Girard [6] has shown that, for normally distributed inputs and the squared exponential kernel function, integrating over the input distribution analytically is feasible. Consider the case where the inputs are a set of Gaussian distributions rather than a set of point estimates. The true input value $x_i$ is then not observable, but we have access to its distribution $\mathcal{N}(u_i, \Sigma_i)$. Accounting for these input distributions is done by solving

$$C_n = \iint C(x_i, x_j)\, p(x_i)\, p(x_j)\, dx_i\, dx_j \quad (6)$$

where $p(x_i) = \mathcal{N}(u_i, \Sigma_i)$ and $p(x_j) = \mathcal{N}(u_j, \Sigma_j)$.
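For the squared exponential kernel, the double integral in Eq. (6) has a closed form, and a quick numerical check can make it concrete. The sketch below, in Python/NumPy, compares that closed form with a Monte Carlo estimate of Eq. (6). The function names, the example means and covariances, and the sample size are assumptions chosen for illustration, and the independent noise term $\sigma_\epsilon^2 \delta_{ij}$ is left out since it does not depend on the inputs.

```python
import numpy as np

def expected_se_kernel(u_i, S_i, u_j, S_j, W, sf2):
    """Closed-form value of Eq. (6) for Gaussian inputs (noise term omitted)."""
    d = u_i - u_j
    S = S_i + S_j
    quad = d @ np.linalg.solve(W + S, d)
    # |I + W^{-1} (S_i + S_j)|^{1/2} shrinks the covariance as inputs get noisier
    norm = np.sqrt(np.linalg.det(np.eye(len(d)) + np.linalg.solve(W, S)))
    return sf2 * np.exp(-0.5 * quad) / norm

def monte_carlo_kernel(u_i, S_i, u_j, S_j, W, sf2, n=200_000, seed=0):
    """Estimate Eq. (6) by averaging the kernel over sampled input pairs."""
    rng = np.random.default_rng(seed)
    x_i = rng.multivariate_normal(u_i, S_i, size=n)
    x_j = rng.multivariate_normal(u_j, S_j, size=n)
    d = x_i - x_j
    quad = np.einsum("nd,nd->n", d, np.linalg.solve(W, d.T).T)
    return sf2 * np.mean(np.exp(-0.5 * quad))

# Hypothetical two-dimensional example
W = np.diag([1.0, 2.0])
u_i, S_i = np.array([0.3, -0.5]), np.diag([0.25, 0.10])
u_j, S_j = np.array([1.0, 0.2]), np.diag([0.50, 0.30])

closed = expected_se_kernel(u_i, S_i, u_j, S_j, W, sf2=1.0)
estimate = monte_carlo_kernel(u_i, S_i, u_j, S_j, W, sf2=1.0)
print(closed, estimate)
```

The two values should agree up to Monte Carlo error, and setting both input covariances to zero makes `expected_se_kernel` reduce to the noise-free part of Eq. (4).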
Since Eq. (6) involves integrations over products of Gaussians, the resulting kernel function is computed exactly as

$$C_n((u_i, \Sigma_i), (u_j, \Sigma_j)) = \frac{\sigma_f^2 \exp\left(-\tfrac{1}{2}(u_i - u_j)^\top (W + \Sigma_i + \Sigma_j)^{-1} (u_i - u_j)\right)}{\left| I + W^{-1}(\Sigma_i + \Sigma_j) \right|^{1/2}} + \sigma_\epsilon^2 \delta_{ij} \quad (7)$$

which is again a squared exponential¹. It is easy to see that this new kernel function is a generalization of Eq. (4): letting the covariance matrices of both inputs tend to zero recovers it. Hence, it is possible to learn from a combination of noise-free and uncertain inputs. Theoretically, learning from uncertain data is as difficult as in the noise-free case, although it might require more data.

¹ In fact, the noise term is not a part of the integration since it models an independent noise process, and thus it remains in the new kernel.

The posterior distribution over functions is found using the same equations with the new covariance function. The hyperparameters can be learned with the log-likelihood as well, but it is now riddled with many local maxima. Standard conjugate gradient methods will quickly lead to a local maximum that might not explain the data properly. An improper local maximum that often occurs
is to interpret the observations as highly noisy. In this case, the matrix $W$ tends to have large values on its diagonal, meaning that most dimensions are irrelevant, and the value of $\sigma_\epsilon^2$ is overestimated so as to transfer the input error to the output dimension. A solution to this difficulty is to find a maximum a posteriori (MAP) estimate of the hyperparameters. Placing a prior over the hyperparameters acts as a regularization term that prevents improper local maxima. In the experiments, we chose a prior from the exponential family in order to get a simpler log posterior function to maximize.

3 Experiments

In our experiments, we compare the performance of the Gaussian Process using input uncertainty and the standard Gaussian Process, which uses only the point estimates. We first evaluate the behavior of each method on a one-dimensional synthetic problem and then compare their performance on a harder problem, which consists in learning the nonlinear dynamics of a cart-pole system.

3.1 Synthetic Problem: Sincsig

In order to easily visualize the behavior of both GP priors, we chose a one-dimensional function for the first learning example. The function is composed of a sinc and a sigmoid function as

$$y = \begin{cases} \mathrm{sinc}(x) & \text{if } x \ldots \\ \ldots\,[1 + \exp(-10x - 5)]\,\ldots & \text{otherwise} \end{cases} \quad (8)$$

and we will refer to it as the Sincsig function. The evaluation was conducted on randomly drawn training sets of different sizes. We uniformly sampled $N$ inputs in $[-10, 10]$, which are the noise-free inputs $\{x_i\}_{i=1}^N$. The observation set is then constructed by sampling each output according to $y_i \sim \mathcal{N}(\mathrm{sincsig}(x_i), \sigma_y^2)$. The uncertain inputs are computed by sampling the noise variance $\sigma_{x_i}^2$ to be applied to each input. For each noise-free $x_i$, we sampled the noisy input according to $u_i \sim \mathcal{N}(x_i, \sigma_{x_i}^2)$.
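The sampling procedure just described can be sketched in Python/NumPy as follows. Two things here are assumptions rather than facts taken from the paper. First, `sincsig` is only a stand-in target (sin(x)/x for x ≥ 0 and a logistic sigmoid of 10x + 5 otherwise), since the exact branch condition and constants of Eq. (8) are not legible in this copy. Second, the default noise levels (output standard deviation 0.1, input standard deviations uniform in [0.5, 2.5]) are the settings the paper reports for its first experiment. The function and argument names are hypothetical.

```python
import numpy as np

def sincsig(x):
    """ASSUMED stand-in for Eq. (8): sin(x)/x for x >= 0, a sigmoid otherwise."""
    x = np.asarray(x, dtype=float)
    sinc_part = np.sinc(x / np.pi)            # np.sinc(z) = sin(pi z) / (pi z)
    sig_part = 1.0 / (1.0 + np.exp(-10.0 * x - 5.0))
    return np.where(x >= 0.0, sinc_part, sig_part)

def make_training_set(N, sigma_y=0.1, sigma_x_range=(0.5, 2.5), seed=0):
    """Return noisy input means u, input variances, noisy outputs y, true inputs x."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-10.0, 10.0, size=N)             # noise-free inputs
    y = rng.normal(sincsig(x), sigma_y)              # y_i ~ N(sincsig(x_i), sigma_y^2)
    sigma_x = rng.uniform(*sigma_x_range, size=N)    # per-input noise std
    u = rng.normal(x, sigma_x)                       # u_i ~ N(x_i, sigma_x_i^2)
    return u, sigma_x**2, y, x

u, var_x, y, x_true = make_training_set(50)
```

Only `u`, `var_x` and `y` would be given to a learner; `x_true` is returned purely so that the effect of the input noise can be inspected.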
It is easy to see that $x_i \mid u_i, \sigma_{x_i}^2 \sim \mathcal{N}(u_i, \sigma_{x_i}^2)$, and therefore we have a complete training set defined as $\mathcal{D} = \{(u_i, \sigma_{x_i}^2), y_i\}_{i=1}^N$. Figure 1 shows a typical example of a training data set (crosses), with the real function to be regressed (solid line) and the result of the regression (thin line) for the GP using input uncertainty (top) and the classic GP (bottom). Error bars indicate that the classic GP is not consistent with the data since it does not take into account the variance of the noise on the inputs.

The first experiment was conducted with an output noise standard deviation $\sigma_y = 0.1$ and training sets of different sizes. The input noise standard deviations $\sigma_{x_i}$ were sampled uniformly in [0.5, 2.5]. We chose these standard deviations so that artificially adding some independent noise over the outputs during the optimisation process can explain the noise over the inputs. All comparisons of the GP using input uncertainty and the standard GP were done by training both on the same random data sets².

² Note that the standard Gaussian Process regression does not use the variances of the inputs.

Fig. 1. The Sincsig function with the regressions of the GP using input uncertainty and the standard GP
Fig. 2. Results on the Sincsig Problem: (a) Mean Squared Error, σ_y = …; (b) Mean Squared Error, σ_y (0.5, 2.5)

Figure 2(a) shows the averaged mean squared error over 25 randomly chosen training sets for different values of N. Results show that when very few data are available, both processes explain the outputs with a lot of noise over the outputs. As expected, when the size of the data set increases, the standard GP optimizes its hyperparameters so as to explain the noisy inputs by very noisy outputs, while the GP using input uncertainty correctly explains the noise on the inputs and selects the less noisy ones so as to minimize the mean squared error.

In the second experiment, in order to emphasize the impact of noisy inputs, we assumed that the Gaussian Processes now know the variance of the noise on the observations. Therefore, the noise hyperparameter $\sigma_\epsilon^2$ is set to zero, since the processes know exactly the noise matrix to be added when computing the covariance matrix. For each output, the standard deviation $\sigma_{y_i}$ is then uniformly sampled in [0.2, 0.5]. Figure 2(b) shows the performance of both methods. Not allowing noisy data to be explained by the independent noise process has two effects: first, it prevents noisy inputs from being explained by noisy outputs when only few data are available, and second, it forces the GP using input uncertainty to use the information on input variance whatever the size of the data set. Let us now look at the results on a real nonlinear dynamical system.

3.2 The Cart Pole Problem

We now consider the harder problem of learning the cart-pole dynamics. Figure 3 gives a picture of the system whose dynamics we try to learn. The state is defined by the position ($\varphi$) of the cart, its velocity ($\dot\varphi$), the pole's angle ($\alpha$) and its angular velocity ($\dot\alpha$). There is also a control input which is used to apply lateral forces on the cart. Following the equations in [9] that govern the dynamics, we used Euler's method to update the system's state:

Fig. 3.
The cart-pole balancing problem

$$\ddot\alpha = \frac{g \sin\alpha + \cos\alpha \left( \dfrac{-F - m_p l \dot\alpha^2 \sin\alpha}{m_c + m_p} \right)}{l \left( \dfrac{4}{3} - \dfrac{m_p \cos^2\alpha}{m_c + m_p} \right)}$$

$$\ddot\varphi = \frac{F + m_p l \left( \dot\alpha^2 \sin\alpha - \ddot\alpha \cos\alpha \right)}{m_c + m_p}$$

where $g$ is the gravitational acceleration, $F$ the force associated with the action, $l$ the half-length of the pole, $m_p$ the mass of the pole and $m_c$ the mass of the cart. For this problem, the training sets were sampled exactly as in the Sincsig case. State-action pairs were uniformly sampled on their respective domains. The outputs were obtained with the true dynamical system and then perturbed with sampled noises whose variances are assumed known. Since the output variances are also known, the training set can be seen
as Gaussian input distributions that map to Gaussian output distributions. Therefore, one might use a sequence of Gaussian belief states as a training set in order to learn a partially observable dynamical system. Following this idea, there is no reason for the output distributions to have a significantly smaller variance than the input distributions.

In this experiment, the input and output noise standard deviations were uniformly sampled in [0.5, 2.5] for each dimension. Every output dimension was treated independently by using a Gaussian Process prior for each of them. Figure 4 shows the averaged mean squared error over 25 randomly chosen training sets for different values of N for each dimension.

[Figure 4 panels, each plotted against the number of training data: (a) Position, (b) Velocity, (c) Pole Angle, (d) Angular Velocity]
Fig. 4. Mean Squared Error results on the Cart-pole problem

Learning the kernel hyperparameters

As stated at the end of Section 2.2, it is possible to learn the hyperparameters given a training set. Since conjugate gradient methods performed poorly for the optimization of the log likelihood for the GP using input uncertainty, we preferred stochastic optimization methods for this task. In every experiment, we thus maximized the log posterior instead of the log likelihood. A gamma $\Gamma(2, 1)$ prior distribution was placed over all characteristic length-scale terms in $W$ and a normal $\mathcal{N}(0, 1)$ prior distribution was placed over the signal standard deviation $\sigma_f$.
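A minimal sketch of the MAP objective described above, written in Python/NumPy: the log marginal likelihood of Eq. (5) plus the log densities of the Gamma(2, 1) and N(0, 1) priors. Several details are assumptions made for illustration: the prior is placed on length-scales $\ell_d$ with $W = \mathrm{diag}(\ell_d^2)$, Gamma(2, 1) is read as shape 2 and scale 1, the noise variance is given no prior, the standard kernel of Eq. (4) is used (the uncertain-input kernel of Eq. (7) would take its place for noisy inputs), and the naive random search merely stands in for the unspecified stochastic optimizer. All function names are hypothetical.

```python
import numpy as np

def se_cov(X, length_scales, sigma_f, noise_var):
    """Eq. (4) for all pairs of rows of X, with W = diag(length_scales**2)."""
    Z = X / length_scales
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return sigma_f**2 * np.exp(-0.5 * sq) + noise_var * np.eye(len(X))

def log_marginal_likelihood(X, y, length_scales, sigma_f, noise_var):
    """Eq. (5), evaluated through a Cholesky factor K = L L^T."""
    L = np.linalg.cholesky(se_cov(X, length_scales, sigma_f, noise_var))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # K^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))              # log |K|
    return -0.5 * y @ alpha - 0.5 * log_det - 0.5 * len(y) * np.log(2 * np.pi)

def log_prior(length_scales, sigma_f):
    """Gamma(2, 1) on each length-scale, N(0, 1) on sigma_f (assumed reading)."""
    lp_len = np.sum(np.log(length_scales) - length_scales)  # log(l * exp(-l))
    lp_sf = -0.5 * sigma_f**2 - 0.5 * np.log(2 * np.pi)
    return lp_len + lp_sf

def log_posterior(X, y, length_scales, sigma_f, noise_var):
    """Unnormalized log posterior over the hyperparameters."""
    return (log_marginal_likelihood(X, y, length_scales, sigma_f, noise_var)
            + log_prior(length_scales, sigma_f))

def random_search(X, y, n_trials=200, seed=0):
    """Naive stochastic search: sample candidates, keep the best log posterior."""
    rng = np.random.default_rng(seed)
    best, best_val = None, -np.inf
    for _ in range(n_trials):
        cand = (rng.gamma(2.0, 1.0, size=X.shape[1]) + 1e-3,  # length-scales
                abs(rng.normal()) + 1e-3,                      # sigma_f
                rng.uniform(1e-3, 1.0))                        # noise variance
        val = log_posterior(X, y, *cand)
        if val > best_val:
            best, best_val = cand, val
    return best, best_val

# Hypothetical toy data, only to exercise the objective
rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(30, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.1, size=30)
theta, value = random_search(X, y)
```

Because the priors add a penalty that grows with large length-scales and large signal amplitude, candidates that explain the data as pure noise are scored lower than under the likelihood alone, which is the regularizing effect the text attributes to the MAP estimate.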
Compared to previous work on the subject [6, 10], which uses isotropic hyperparameters in the kernel function, we applied automatic relevance determination, which considerably improves performance without increasing the complexity of the kernel.

4 Discussion

Results for the synthetic problem are presented in Figures 1, 2(a) and 2(b). These results first show that using knowledge of the noise on the inputs improves the consistency of the regression compared with the standard Gaussian Process, since the error bars of the GP using input uncertainty completely include the real function while those of the standard GP do not. Second, the GP using input uncertainty is also able to discriminate which noise comes from the input and
which comes from the output, as shown in Figures 2(a) and 2(b). As the standard GP does not assume any noise on the input, it always assumes that the noise comes from the outputs and thus learns a large noise hyperparameter, which also increases its mean squared error.

Problems with this approach arise as soon as the hyperparameters have to be optimised. Indeed, the log-likelihood function is riddled with local maxima that cannot be avoided using classic gradient methods. An interesting avenue would be to look at natural gradient approaches [11]. Another direction for future work is the application of this approach to learning continuous Hidden Markov Models, as well as continuous POMDPs, by using the belief state as a noisy input [12].

To conclude, we proposed a Gaussian Process model for regression that is able to learn with noise on the inputs and on the outputs, and to predict with lower mean squared error than previous approaches while remaining consistent with the true function. Results on a synthetic problem illustrate the advantages of the method, while results on the cart-pole problem show the applicability of the approach to learning real nonlinear dynamical systems, largely outperforming previous methods.

References

1. Golub, G., Loan, C.V.: An Analysis of the Total Least Squares Problem. SIAM J. Numer. Anal. 17 (1980)
2. Carroll, R., Ruppert, D., Stefanski, L.: Measurement Error in Nonlinear Models. Chapman and Hall (1995)
3. Ghahramani, Z., Jordan, M.I.: Supervised learning from incomplete data via an EM approach. In: NIPS. (1993)
4. Tresp, V., Ahmad, S., Neuneier, R.: Training Neural Networks with Deficient Data. In: NIPS. (1993)
5. Quiñonero-Candela, J., Roweis, S.T.: Data imputation and robust training with Gaussian processes (2003)
6. Girard, A.: Approximate Methods for Propagation of Uncertainty with Gaussian Process Model. PhD thesis, University of Glasgow, Glasgow, UK (2004)
7. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. The MIT Press (December 2006)
8. Girard, A., Rasmussen, C.E., Quiñonero-Candela, J., Murray-Smith, R.: Gaussian Process Priors with Uncertain Inputs - Application to Multiple-Step Ahead Time Series Forecasting. In: NIPS. (2002)
9. Florian, R.: Correct Equations for the Dynamics of the Cart-pole System. Technical report, Center for Cognitive and Neural Studies (2007)
10. Quiñonero-Candela, J.: Learning with Uncertainty - Gaussian Processes and Relevance Vector Machines. PhD thesis, Technical University of Denmark, Denmark (2004)
11. Roux, N.L., Manzagol, P.A., Bengio, Y.: Topmoumoute Online Natural Gradient Algorithm. In: NIPS. (2008)
12. Dallaire, P., Besse, C., Chaib-draa, B.: GP-POMDP: Bayesian Reinforcement Learning in Continuous POMDPs with Gaussian Processes. In: Proc. of IEEE/RSJ Inter. Conf. on Intelligent Robots and Systems. (2009) To appear.
INFINITE MIXTURES OF MULTIVARIATE GAUSSIAN PROCESSES SHILIANG SUN Department of Computer Science and Technology, East China Normal University 500 Dongchuan Road, Shanghai 20024, China E-MAIL: slsun@cs.ecnu.edu.cn,
More informationConfidence Estimation Methods for Neural Networks: A Practical Comparison
, 6-8 000, Confidence Estimation Methods for : A Practical Comparison G. Papadopoulos, P.J. Edwards, A.F. Murray Department of Electronics and Electrical Engineering, University of Edinburgh Abstract.
More informationManaging Uncertainty
Managing Uncertainty Bayesian Linear Regression and Kalman Filter December 4, 2017 Objectives The goal of this lab is multiple: 1. First it is a reminder of some central elementary notions of Bayesian
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationProbabilistic numerics for deep learning
Presenter: Shijia Wang Department of Engineering Science, University of Oxford rning (RLSS) Summer School, Montreal 2017 Outline 1 Introduction Probabilistic Numerics 2 Components Probabilistic modeling
More informationThe Variational Gaussian Approximation Revisited
The Variational Gaussian Approximation Revisited Manfred Opper Cédric Archambeau March 16, 2009 Abstract The variational approximation of posterior distributions by multivariate Gaussians has been much
More informationLecture 6: Graphical Models: Learning
Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)
More informationExpectation Propagation Algorithm
Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,
More informationTalk on Bayesian Optimization
Talk on Bayesian Optimization Jungtaek Kim (jtkim@postech.ac.kr) Machine Learning Group, Department of Computer Science and Engineering, POSTECH, 77-Cheongam-ro, Nam-gu, Pohang-si 37673, Gyungsangbuk-do,
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationMachine Learning Summer School
Machine Learning Summer School Lecture 3: Learning parameters and structure Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Department of Engineering University of Cambridge,
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters
More informationoutput dimension input dimension Gaussian evidence Gaussian Gaussian evidence evidence from t +1 inputs and outputs at time t x t+2 x t-1 x t+1
To appear in M. S. Kearns, S. A. Solla, D. A. Cohn, (eds.) Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 999. Learning Nonlinear Dynamical Systems using an EM Algorithm Zoubin
More informationLecture 5: GPs and Streaming regression
Lecture 5: GPs and Streaming regression Gaussian Processes Information gain Confidence intervals COMP-652 and ECSE-608, Lecture 5 - September 19, 2017 1 Recall: Non-parametric regression Input space X
More informationProbabilistic Graphical Models Lecture 20: Gaussian Processes
Probabilistic Graphical Models Lecture 20: Gaussian Processes Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 30, 2015 1 / 53 What is Machine Learning? Machine learning algorithms
More informationGaussian with mean ( µ ) and standard deviation ( σ)
Slide from Pieter Abbeel Gaussian with mean ( µ ) and standard deviation ( σ) 10/6/16 CSE-571: Robotics X ~ N( µ, σ ) Y ~ N( aµ + b, a σ ) Y = ax + b + + + + 1 1 1 1 1 1 1 1 1 1, ~ ) ( ) ( ), ( ~ ), (
More informationSTAT 518 Intro Student Presentation
STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible
More informationBayesian Learning in Undirected Graphical Models
Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ Work with: Iain Murray and Hyun-Chul
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationBAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS
BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS Oloyede I. Department of Statistics, University of Ilorin, Ilorin, Nigeria Corresponding Author: Oloyede I.,
More informationGaussian Process for Internal Model Control
Gaussian Process for Internal Model Control Gregor Gregorčič and Gordon Lightbody Department of Electrical Engineering University College Cork IRELAND E mail: gregorg@rennesuccie Abstract To improve transparency
More informationProbabilistic & Bayesian deep learning. Andreas Damianou
Probabilistic & Bayesian deep learning Andreas Damianou Amazon Research Cambridge, UK Talk at University of Sheffield, 19 March 2019 In this talk Not in this talk: CRFs, Boltzmann machines,... In this
More informationPattern Recognition and Machine Learning. Bishop Chapter 6: Kernel Methods
Pattern Recognition and Machine Learning Chapter 6: Kernel Methods Vasil Khalidov Alex Kläser December 13, 2007 Training Data: Keep or Discard? Parametric methods (linear/nonlinear) so far: learn parameter
More informationIntroduction to Gaussian Processes
Introduction to Gaussian Processes Neil D. Lawrence GPSS 10th June 2013 Book Rasmussen and Williams (2006) Outline The Gaussian Density Covariance from Basis Functions Basis Function Representations Constructing
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationSystem identification and control with (deep) Gaussian processes. Andreas Damianou
System identification and control with (deep) Gaussian processes Andreas Damianou Department of Computer Science, University of Sheffield, UK MIT, 11 Feb. 2016 Outline Part 1: Introduction Part 2: Gaussian
More informationProbability and Estimation. Alan Moses
Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.
More informationCSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes
CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes Roger Grosse Roger Grosse CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes 1 / 55 Adminis-Trivia Did everyone get my e-mail
More informationAlgorithmisches Lernen/Machine Learning
Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines
More informationHow to build an automatic statistician
How to build an automatic statistician James Robert Lloyd 1, David Duvenaud 1, Roger Grosse 2, Joshua Tenenbaum 2, Zoubin Ghahramani 1 1: Department of Engineering, University of Cambridge, UK 2: Massachusetts
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More informationVariational Model Selection for Sparse Gaussian Process Regression
Variational Model Selection for Sparse Gaussian Process Regression Michalis K. Titsias School of Computer Science University of Manchester 7 September 2008 Outline Gaussian process regression and sparse
More informationProbability and Information Theory. Sargur N. Srihari
Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal
More informationOutline Lecture 2 2(32)
Outline Lecture (3), Lecture Linear Regression and Classification it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic
More informationActive and Semi-supervised Kernel Classification
Active and Semi-supervised Kernel Classification Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London Work done in collaboration with Xiaojin Zhu (CMU), John Lafferty (CMU),
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationStochastic Variational Inference for Gaussian Process Latent Variable Models using Back Constraints
Stochastic Variational Inference for Gaussian Process Latent Variable Models using Back Constraints Thang D. Bui Richard E. Turner tdb40@cam.ac.uk ret26@cam.ac.uk Computational and Biological Learning
More informationMathematical Formulation of Our Example
Mathematical Formulation of Our Example We define two binary random variables: open and, where is light on or light off. Our question is: What is? Computer Vision 1 Combining Evidence Suppose our robot
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ 1 Bayesian paradigm Consistent use of probability theory
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationBayesian Regression Linear and Logistic Regression
When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we
More informationq is a function of the inputs corresponding to the same cases p and q. In general, a stationary (depends only on the distance between points in the in
DYAMIC SYSTEMS IDETIFICATI WITH GAUSSIA PRCESSES J. Kocijan ;, A. Girard, 3, B. Banko,R.Murray-Smith 3; Jozef Stefan Institute, Ljubljana, Slovenia ova GoricaPolytechnic, ova Gorica, Slovenia 3 University
More informationBayesian Networks BY: MOHAMAD ALSABBAGH
Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional
More information