Learning Gaussian Process Models from Uncertain Data

Patrick Dallaire, Camille Besse, and Brahim Chaib-draa
DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada

Abstract. The traditional formulation of supervised learning generally assumes that only the output data are uncertain. However, this assumption may be too strong for some learning tasks. This paper investigates the use of a Gaussian Process prior to infer consistent models from uncertain data. By assuming a Gaussian distribution with known variances over the inputs and a Gaussian covariance function, it is possible to marginalize out the input uncertainty and keep an analytical posterior distribution over functions. We demonstrate the properties of the method on a synthetic problem and on a more realistic one, which consists in learning the dynamics of the well-known cart-pole system, and compare the performance against a classic Gaussian Process. A large improvement of the mean squared error is reported, along with the consistency of the resulting regression.

Key words: Gaussian Processes, Noisy Inputs, Dynamical Systems

1 Introduction

As soon as a regression has to be performed with a statistical model on noisy inputs, the quality of the estimated model may suffer if no attention is paid to the uncertainty of the training set. This can happen in two ways: one due to training with noisy inputs, and the other due to the extra noise in the outputs induced by the noise on the inputs.

Statisticians have already investigated this problem in several ways: total least squares [1] changes the cost of the regression problem to encourage the regressor to minimize the error due to noise on the outputs as well as on the inputs; the errors-in-variables model [2] deals directly with noisy inputs by creating correlated virtual variables that thus have correlated noises. Recent work in machine learning has also addressed this problem, either by attempting to learn the entire input distribution [3], by integrating over chosen noisy points using an estimated distribution during training [4], or by de-noising the inputs by accounting for the noise while training the model [5].

In this paper, we investigate an approach, pioneered by Girard [6], in which, rather than only predicting from noisy inputs, we learn from these inputs by marginalizing out the input uncertainty while keeping an analytical posterior distribution over functions. This approach achieves two goals. First, it shows that we are able to learn and make predictions from noisy inputs. Second, the method is applied to the well-known problem of balancing a pole on a cart, where the task is to learn the 5-dimensional nonlinear dynamics of the system. Results show that taking into account the uncertainty of the inputs makes the regression consistent and drastically reduces the mean squared error.

This paper is structured as follows. First, we formalize the problem of learning with noisy inputs and introduce some notation about Gaussian Processes and the regression model. In Section 3, we present experimental results on a difficult artificial problem and on a more realistic problem. Section 4 discusses the results and concludes the paper.

2 Preliminaries

A Gaussian Process (GP) is a stochastic process used in machine learning to describe a distribution directly in function space. It provides a probabilistic approach to the learning task and has the appealing property of giving uncertainty estimates along with its predictions. The interested reader is referred to [7] for more information on GPs.

2.1 Gaussian Process regression

Under a GP prior, it is assumed that the joint distribution of a finite set of observations given their inputs is multivariate Gaussian. A GP is thus fully specified by its mean and covariance functions. Assume that a set of training data D = {x_i, y_i}_{i=1}^N is available, where x_i ∈ R^D and y_i is a scalar observation such that

    y_i = f(x_i) + ε_i                                                        (1)

where ε_i is white Gaussian noise. For convenience, we write X = [x_1, ..., x_N] for the inputs and y = [y_1, ..., y_N] for the outputs. Under a GP prior with zero mean function, the joint distribution of the training set is y | X ~ N(0, K), where K is the covariance matrix whose entries K_ij are given by the covariance function C(x_i, x_j). This multivariate Gaussian distribution over the training observations can be used to compute the posterior distribution over functions. Making a prediction then amounts to using the posterior mean together with its associated measure of uncertainty, given by the posterior covariance. For a test input x_*, the posterior distribution is f_* | x_*, X, y ~ N(μ(x_*), σ²(x_*)) with mean and variance functions given by

    μ(x_*) = k_*ᵀ K⁻¹ y                                                       (2)
    σ²(x_*) = C(x_*, x_*) − k_*ᵀ K⁻¹ k_*                                      (3)

where k_* is the N x 1 vector of covariances between x_* and the training inputs X. Although many covariance functions can be used to define a GP prior, we use for the remainder of this paper the squared exponential, one of the most widely used kernel functions. The chosen kernel function

    C(x_i, x_j) = σ_f² exp(−½ (x_i − x_j)ᵀ W⁻¹ (x_i − x_j)) + σ_ε² δ_ij       (4)

is parameterized by a vector of hyperparameters θ = [W, σ_f², σ_ε²], where W is the diagonal matrix of characteristic length-scales, which accounts for a different covariance measure in each input dimension, σ_f² is the signal variance and σ_ε² is the noise variance. Varying these hyperparameters changes the interpretation of the training data by modifying the shapes of the functions allowed by the GP prior.

It may be difficult to fix the hyperparameters of a kernel function a priori and expect them to fit the observed data correctly. A common way to estimate the hyperparameters is to maximize the log likelihood of the observations y [7]. Since the joint distribution of the observations is multivariate Gaussian, the function to maximize is

    log p(y | X, θ) = −½ yᵀ K⁻¹ y − ½ log|K| − (N/2) log 2π                   (5)

The maximization can be carried out with conjugate gradient methods to find an acceptable local maximum.
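As a concrete illustration of Section 2.1, the following is a minimal NumPy sketch of the predictive equations (2)-(3) and the log likelihood (5) under the squared exponential kernel (4). Function and variable names (sq_exp_kernel, gp_predict, W_diag, and so on) are illustrative and not taken from the paper; a sketch under these assumptions, not a reference implementation.

```python
import numpy as np

def sq_exp_kernel(A, B, W_diag, sigma_f2):
    """Squared exponential kernel of Eq. (4), without the noise term.
    W_diag holds the diagonal of the length-scale matrix W."""
    A_s = A / np.sqrt(W_diag)
    B_s = B / np.sqrt(W_diag)
    d2 = (np.sum(A_s**2, axis=1)[:, None]
          + np.sum(B_s**2, axis=1)[None, :]
          - 2.0 * A_s @ B_s.T)
    return sigma_f2 * np.exp(-0.5 * d2)

def gp_predict(X, y, X_star, W_diag, sigma_f2, sigma_eps2):
    """Posterior mean and variance of Eqs. (2)-(3) at the test inputs X_star."""
    K = sq_exp_kernel(X, X, W_diag, sigma_f2) + sigma_eps2 * np.eye(len(X))
    k_star = sq_exp_kernel(X, X_star, W_diag, sigma_f2)            # N x M
    mu = k_star.T @ np.linalg.solve(K, y)
    # C(x_*, x_*) = sigma_f2 + sigma_eps2 for the kernel of Eq. (4)
    var = (sigma_f2 + sigma_eps2) - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return mu, var

def log_likelihood(X, y, W_diag, sigma_f2, sigma_eps2):
    """Log likelihood of the observations, Eq. (5), maximized to fit the hyperparameters."""
    K = sq_exp_kernel(X, X, W_diag, sigma_f2) + sigma_eps2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * len(y) * np.log(2 * np.pi)
```

In practice a GP library would be used instead of this bare-bones version; the sketch only makes the role of each hyperparameter in (4)-(5) explicit.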

2.2 Learning with uncertain inputs

As stated in the introduction, the assumption that only the outputs are noisy is not sufficient for some learning tasks. Consider the case where the inputs are uncertain and where each input value comes with a variance estimate. It has been shown by Girard [6] that, for normally distributed inputs and with the squared exponential as kernel function, integrating over the input distribution can be done analytically. Consider the case where the inputs are a set of Gaussian distributions rather than a set of point estimates. The true input value x_i is then not observable, but we have access to its distribution N(u_i, Σ_i). Accounting for these input distributions is done by solving

    C_n = ∫∫ C(x_i, x_j) p(x_i) p(x_j) dx_i dx_j                              (6)

where p(x_i) = N(u_i, Σ_i) and p(x_j) = N(u_j, Σ_j). Since (6) involves integrals over products of Gaussians, the resulting kernel function can be computed exactly as

    C_n((u_i, Σ_i), (u_j, Σ_j)) = σ_f² |I + W⁻¹(Σ_i + Σ_j)|^(−1/2) exp(−½ (u_i − u_j)ᵀ (W + Σ_i + Σ_j)⁻¹ (u_i − u_j)) + σ_ε² δ_ij    (7)

which is again a squared exponential. (The noise term is not part of the integration, since it models an independent noise process; it therefore remains unchanged in the new kernel.) It is easy to see that this new kernel function is a generalization of (4), obtained by letting the covariance matrices of both inputs tend to zero. Hence, it is possible to learn from a combination of noise-free and uncertain inputs. Theoretically, learning from uncertain data is as difficult as in the noise-free case, although it may require more data. The posterior distribution over functions is found with the same equations, using the new covariance function.

The hyperparameters can also be learned through the log likelihood, but it is now riddled with many local maxima. Standard conjugate gradient methods quickly lead to a local maximum that may not explain the data properly. A common improper local maximum is to interpret the observations as highly noisy: the matrix W tends to have large values on its diagonal, meaning that most dimensions are deemed irrelevant, and σ_ε² is overestimated so as to transfer the input error to the output dimension. A solution to this difficulty is to compute a maximum a posteriori (MAP) estimate of the hyperparameters. Placing a prior over the hyperparameters then acts as a regularization term that prevents improper local maxima. In the experiments, we chose a prior from the exponential family in order to get a simpler log posterior function to maximize.
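A hedged sketch of the expected kernel (7), assuming the reconstruction above (a determinant correction together with length-scales inflated by the summed input covariances), might look as follows; the names are illustrative and the independent noise term is added on the diagonal exactly as in the standard case.

```python
import numpy as np

def noisy_input_kernel(u_i, S_i, u_j, S_j, W_diag, sigma_f2):
    """Expected squared exponential under Gaussian input uncertainty, Eq. (7),
    excluding the independent noise term.
    u_i, u_j: input means (D,); S_i, S_j: input covariances (D, D)."""
    W = np.diag(W_diag)
    S = S_i + S_j                                          # summed input covariances
    d = u_i - u_j
    det_term = np.linalg.det(np.eye(len(W_diag)) + np.linalg.solve(W, S)) ** -0.5
    quad = d @ np.linalg.solve(W + S, d)                   # (u_i-u_j)^T (W+S)^-1 (u_i-u_j)
    return sigma_f2 * det_term * np.exp(-0.5 * quad)
```

With S_i = S_j = 0 the determinant term equals 1 and the expression reduces to the noise-free kernel (4), which is how noise-free and uncertain inputs can be mixed in the same Gram matrix.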

3 Experiments

In our experiments, we compare the performance of the Gaussian Process that accounts for input uncertainty (hereafter the uncertain-input GP) with that of the standard GP, which uses only the input point estimates. We first evaluate the behavior of each method on a one-dimensional synthetic problem and then compare their performance on a harder problem, which consists in learning the nonlinear dynamics of a cart-pole system.

3.1 Synthetic Problem: Sincsig

In order to easily visualize the behavior of both GP priors, we chose a one-dimensional function for the first learning example. The function is composed of a sinc and a sigmoid,

    y = sinc(x)                   if x ≤ 0
    y = [1 + exp(−10x − 5)]⁻¹     otherwise                                   (8)

and we refer to it as the Sincsig function. The evaluation was conducted on randomly drawn training sets of different sizes. We uniformly sampled N inputs in [−10, 10], which are the noise-free inputs {x_i}_{i=1}^N. The observation set is then constructed by sampling each output according to y_i ~ N(sincsig(x_i), σ_y²). The uncertain inputs are obtained by sampling the noise level σ²_{x_i} to be applied to each input: for each noise-free x_i, we sample the noisy input according to u_i ~ N(x_i, σ²_{x_i}). It is easy to see that x_i | u_i, σ²_{x_i} ~ N(u_i, σ²_{x_i}), and we therefore have a complete training set defined as D = {(u_i, σ²_{x_i}), y_i}_{i=1}^N.

Figure 1 shows a typical training data set (crosses), together with the true function to be regressed (solid line) and the result of the regression (thin line) for the uncertain-input GP (top) and the classic GP (bottom). The error bars indicate that the standard GP is not consistent with the data, since it does not take into account the noise variance on the inputs.
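To make the setup concrete, the following is a hedged sketch of the data-generation procedure just described, assuming the reconstruction of Eq. (8) above (sinc for non-positive inputs, a sigmoid otherwise) and taking sinc(x) = sin(x)/x; all names and the choice of random generator are illustrative.

```python
import numpy as np

def sincsig(x):
    """Sincsig target, Eq. (8): sin(x)/x for x <= 0, a sigmoid otherwise."""
    x = np.asarray(x, dtype=float)
    sigmoid = 1.0 / (1.0 + np.exp(-np.clip(10.0 * x + 5.0, -500, 500)))
    return np.where(x <= 0, np.sinc(x / np.pi), sigmoid)   # np.sinc(z) = sin(pi z)/(pi z)

def make_training_set(N, sigma_y=0.1, noise_range=(0.5, 2.5), seed=0):
    """Build D = {(u_i, sigma_xi^2), y_i} with noisy inputs and noisy outputs."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-10.0, 10.0, size=N)           # latent noise-free inputs
    y = rng.normal(sincsig(x), sigma_y)            # noisy observations of the function
    sigma_x = rng.uniform(*noise_range, size=N)    # per-input noise standard deviations
    u = rng.normal(x, sigma_x)                     # observed (noisy) input locations
    return u, sigma_x**2, y
```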

Fig. 1. The Sincsig function with the uncertain-input GP and standard GP regressions.

Fig. 2. Results on the Sincsig problem: (a) mean squared error in the first experiment (σ_y = 0.1); (b) mean squared error in the second experiment.

The first experiment was conducted with an output noise standard deviation σ_y = 0.1 and different sizes of training sets. The input noise standard deviations σ_{x_i} were sampled uniformly in [0.5, 2.5]. We chose these standard deviations so that artificially adding independent noise on the outputs during the optimization process can explain the noise on the inputs. All comparisons between the uncertain-input GP and the standard GP were carried out by training both on the same random data sets (note that standard Gaussian Process regression does not use the input variances). Figure 2(a) shows the averaged mean squared error over 25 randomly chosen training sets for different values of N. The results show that when very few data points are available, both processes explain the observations with a large amount of output noise. As expected, when the size of the data set increases, the standard GP optimizes its hyperparameters so as to explain the noisy inputs by very noisy outputs, while the uncertain-input GP correctly attributes the noise to the inputs and selects the less noisy ones so as to minimize the mean squared error.

In the second experiment, in order to emphasize the impact of noisy inputs, we assume that the Gaussian processes now know the noise variance on the observations. The noise hyperparameter σ_ε² is therefore set to zero, since the processes know exactly the noise matrix to be added when computing the covariance matrix. For each output, the standard deviation σ_{y_i} is uniformly sampled in [0.2, 0.5]. Figure 2(b) shows the performance of the uncertain-input GP and the standard GP. Not allowing the processes to explain noisy data through the independent noise term has two effects: first, it prevents the standard GP from explaining noisy inputs by noisy outputs when only few data are available; second, it forces the uncertain-input GP to use the information on input variance whatever the size of the data set. Let us now look at the results on a real nonlinear dynamical system.

3.2 The Cart Pole Problem

We now consider the harder problem of learning the cart-pole dynamics. Figure 3 depicts the system whose dynamics we try to learn. The state is defined by the position of the cart (ϕ), its velocity (ϕ̇), the pole's angle (α) and its angular velocity (α̇). There is also a control input, used to apply lateral forces on the cart.

Fig. 3. The cart-pole balancing problem.

Following the equations in [9] governing the dynamics, we used Euler's method to update the system's state:

    α̈ = [ g sin α + cos α (−F − m_p l α̇² sin α) / (m_c + m_p) ] / [ l (4/3 − m_p cos²α / (m_c + m_p)) ]
    ϕ̈ = [ F + m_p l (α̇² sin α − α̈ cos α) ] / (m_c + m_p)

where g is the gravitational acceleration, F the force associated with the action, l the half-length of the pole, m_p the mass of the pole and m_c the mass of the cart.

For this problem, the training sets were sampled exactly as in the Sincsig case. State-action pairs were uniformly sampled on their respective domains. The outputs were obtained with the true dynamical system and then perturbed with sampled noise whose variance is assumed known. Since the output variances are known, the training set can be seen as Gaussian input distributions that map to Gaussian output distributions. One might therefore use a sequence of Gaussian belief states as a training set in order to learn a partially observable dynamical system. Following this idea, there is no reason for the output distributions to have a significantly smaller variance than the input distributions.
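The following is a minimal sketch of the Euler update described above, using the dynamics equations as reconstructed from [9]; the time step and physical constants are illustrative placeholders, not values reported in the paper.

```python
import numpy as np

def cartpole_step(state, F, dt=0.02, g=9.81, l=0.5, m_p=0.1, m_c=1.0):
    """One Euler step of the cart-pole dynamics.
    state = (phi, phi_dot, alpha, alpha_dot); parameter values are illustrative."""
    phi, phi_dot, alpha, alpha_dot = state
    sin_a, cos_a = np.sin(alpha), np.cos(alpha)
    total = m_c + m_p
    # Angular acceleration of the pole
    alpha_dd = (g * sin_a + cos_a * (-F - m_p * l * alpha_dot**2 * sin_a) / total) \
               / (l * (4.0 / 3.0 - m_p * cos_a**2 / total))
    # Linear acceleration of the cart
    phi_dd = (F + m_p * l * (alpha_dot**2 * sin_a - alpha_dd * cos_a)) / total
    # Euler update of the state
    return (phi + dt * phi_dot, phi_dot + dt * phi_dd,
            alpha + dt * alpha_dot, alpha_dot + dt * alpha_dd)
```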

In this experiment, the input and output noise standard deviations were uniformly sampled in [0.5, 2.5] for each dimension. Every output dimension was treated independently, using a separate Gaussian Process prior for each of them. Figure 4 shows the averaged mean squared error over 25 randomly chosen training sets for different values of N, for each dimension.

Fig. 4. Mean squared error results on the cart-pole problem: (a) position, (b) velocity, (c) pole angle, (d) angular velocity.

Learning the kernel hyperparameters

As stated at the end of Section 2.2, it is possible to learn the hyperparameters given a training set. Since conjugate gradient methods performed poorly for the optimization of the log likelihood in the uncertain-input case, we preferred stochastic optimization methods for this task. In every experiment, we therefore maximized the log posterior instead of the log likelihood. A gamma Γ(2, 1) prior distribution was placed over all characteristic length-scale terms in W, and a normal N(0, 1) prior distribution was placed over the signal standard deviation σ_f. In contrast to previous work on the subject [6, 10], which uses isotropic hyperparameters in the kernel function, we applied automatic relevance determination, which improves the performance considerably without increasing the complexity of the kernel.
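As a sketch of the MAP objective just described, one can simply add the log priors to the log likelihood of Eq. (5); the version below reuses the hypothetical log_likelihood function from the earlier sketch, writes the prior densities up to additive constants, and leaves the treatment of the noise variance σ_ε² as in the likelihood. It is an illustration under those assumptions, not the paper's implementation.

```python
import numpy as np

def log_posterior(X, y, W_diag, sigma_f, sigma_eps2):
    """Log posterior over hyperparameters: Eq. (5) plus a Gamma(2, 1) log prior on each
    length-scale term of W and a N(0, 1) log prior on the signal standard deviation."""
    loglik = log_likelihood(X, y, W_diag, sigma_f**2, sigma_eps2)
    log_prior_W = np.sum(np.log(W_diag) - W_diag)   # Gamma(2, 1): log p(w) = log w - w + const
    log_prior_f = -0.5 * sigma_f**2                 # N(0, 1):     log p(s) = -s^2 / 2 + const
    return loglik + log_prior_W + log_prior_f
```

For the uncertain-input GP the same objective applies, with the covariance matrix K built from the expected kernel of Eq. (7) instead of Eq. (4).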

4 Discussion

Results for the synthetic problem are presented in Figures 1, 2(a) and 2(b). These results first show that using the knowledge of the noise on the inputs improves the consistency of the regression over the standard Gaussian Process: the error bars produced by the uncertain-input GP completely include the real function, while those of the standard GP do not. Second, the uncertain-input GP is also able to discriminate which part of the noise comes from the inputs and which part comes from the outputs, as shown in Figures 2(a) and 2(b). Since the standard GP does not assume any noise on the inputs, it always attributes the noise to the outputs, and thus learns a large noise hyperparameter, which also increases its mean squared error.

The main difficulty with this approach arises as soon as the hyperparameters have to be optimized. Indeed, the log-likelihood function is riddled with local maxima that cannot be avoided using classic gradient methods. An interesting avenue would be to look at natural gradient approaches [11]. Another direction for future work concerns the application of this work to the learning of continuous Hidden Markov Models, as well as continuous POMDPs, by using the belief state as a noisy input [12].

To conclude, we proposed a Gaussian Process model for regression that is able to learn with noise on the inputs as well as on the outputs, and to predict with a lower mean squared error than permitted by previous approaches while remaining consistent with the true function. Results on a synthetic problem illustrate the advantages of the method, while results on the cart-pole problem show the applicability of the approach to the learning of real nonlinear dynamical systems, largely outperforming previous methods.

References

1. Golub, G., Van Loan, C.: An Analysis of the Total Least Squares Problem. SIAM J. Numer. Anal. 17 (1980)
2. Caroll, R., Ruppert, D., Stefanski, L.: Measurement Error in Nonlinear Models. Chapman and Hall (1995)
3. Ghahramani, Z., Jordan, M.I.: Supervised Learning from Incomplete Data via an EM Approach. In: NIPS (1993)
4. Tresp, V., Ahmad, S., Neuneier, R.: Training Neural Networks with Deficient Data. In: NIPS (1993)
5. Quiñonero-Candela, J., Roweis, S.T.: Data Imputation and Robust Training with Gaussian Processes (2003)
6. Girard, A.: Approximate Methods for Propagation of Uncertainty with Gaussian Process Models. PhD thesis, University of Glasgow, Glasgow, UK (2004)
7. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. The MIT Press (2006)
8. Girard, A., Rasmussen, C.E., Quiñonero-Candela, J., Murray-Smith, R.: Gaussian Process Priors with Uncertain Inputs - Application to Multiple-Step Ahead Time Series Forecasting. In: NIPS (2002)
9. Florian, R.: Correct Equations for the Dynamics of the Cart-Pole System. Technical report, Center for Cognitive and Neural Studies (2007)
10. Quiñonero-Candela, J.: Learning with Uncertainty - Gaussian Processes and Relevance Vector Machines. PhD thesis, Technical University of Denmark, Denmark (2004)
11. Roux, N.L., Manzagol, P.A., Bengio, Y.: Topmoumoute Online Natural Gradient Algorithm. In: NIPS (2008)
12. Dallaire, P., Besse, C., Chaib-draa, B.: GP-POMDP: Bayesian Reinforcement Learning in Continuous POMDPs with Gaussian Processes. In: Proc. of IEEE/RSJ Inter. Conf. on Intelligent Robots and Systems (2009) To appear.
