CS-E3210 Machine Learning: Basic Principles

1 CS-E3210 Machine Learning: Basic Principles, Lecture 4: Regression II. Slides by Markus Heinonen, Department of Computer Science, Aalto University, School of Science. Autumn (Period I) 2017. 1 / 61

2 Today's introduction: we still want to learn a continuous hypothesis function $h(x^{(i)}) \approx y^{(i)}$ from $N$ observed data points $(x^{(1)}, y^{(1)}), \ldots, (x^{(N)}, y^{(N)})$. Today we consider kernel regression and Bayesian methods: in kernel regression a non-linear hypothesis is learned, and in Bayesian learning we learn a distribution of hypotheses instead of a single hypothesis. 2 / 61

3 Outline: 1. Kernel regression (1D example, 2D example); 2. Bayesian learning; 3. Bayesian linear regression. 3 / 61

4 Kernel smoothing: $h(x) = \frac{\sum_{i=1}^N y^{(i)} K(x, x^{(i)})}{\sum_{l=1}^N K(x, x^{(l)})} = \sum_{i=1}^N y^{(i)} \bar{K}(x, x^{(i)})$, where the normalised kernel $\bar{K}(x, x^{(i)}) = \frac{K(x, x^{(i)})}{\sum_{l=1}^N K(x, x^{(l)})}$ sums to one over the data. The hypothesis becomes a weighted average of the $y^{(i)}$, and every data point becomes a basis point (cf. Lecture 3). We assume a Gaussian kernel $K_\sigma(x, x') = \exp\left(-\frac{1}{2}\frac{\|x - x'\|^2}{\sigma^2}\right)$. 4 / 61
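To make the estimator above concrete, here is a minimal NumPy sketch of Nadaraya-Watson kernel smoothing with the Gaussian kernel, assuming 1D inputs; the function names and the toy data are illustrative, not from the lecture material.

```python
import numpy as np

def gaussian_kernel(x, x_basis, sigma):
    """K_sigma(x, x') = exp(-0.5 * (x - x')^2 / sigma^2) for scalar inputs."""
    return np.exp(-0.5 * (x - x_basis) ** 2 / sigma ** 2)

def nw_kernel_regression(x_query, x_train, y_train, sigma):
    """Nadaraya-Watson estimate h(x) = sum_i y_i K(x, x_i) / sum_l K(x, x_l)."""
    weights = gaussian_kernel(x_query[:, None], x_train[None, :], sigma)  # (Q, N)
    weights /= weights.sum(axis=1, keepdims=True)  # normalise rows to sum to one
    return weights @ y_train                       # weighted average of outputs

# toy usage: predict rent-like outputs on a grid of query points
x_train = np.array([20., 35., 50., 65., 80.])
y_train = np.array([400., 550., 700., 820., 950.])
x_query = np.linspace(15., 85., 8)
print(nw_kernel_regression(x_query, x_train, y_train, sigma=10.0))
```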

5 Outline (current section: kernel regression, 1D example). 5 / 61

6 Kernel on 1D rent data: the kernel $K_\sigma(x, x^{(3)}) = \exp\left(-\frac{1}{2}\frac{(x - x^{(3)})^2}{\sigma^2}\right)$ gives similarities to neighbouring points (a univariate Gaussian kernel function since $x$ is scalar). 6 / 61

7 Kernel on 1D rent data: the kernel $K_\sigma(x, x^{(6)}) = \exp\left(-\frac{1}{2}\frac{(x - x^{(6)})^2}{\sigma^2}\right)$ gives similarities to neighbouring points. 7 / 61

8 Kernel on 1D rent data: the kernel $K_\sigma(x, x^{(9)}) = \exp\left(-\frac{1}{2}\frac{(x - x^{(9)})^2}{\sigma^2}\right)$ gives similarities to neighbouring points. 8 / 61

9 Kernel on 1D rent data: the normalised kernel $\bar{K}_\sigma(x, x^{(9)}) = \frac{K_\sigma(x, x^{(9)})}{\sum_{l=1}^N K_\sigma(x, x^{(l)})}$ scales the similarities to percentages (note the different color scale). 9 / 61

10 Kernel on 1D rent data: the kernel matrix and the normalised (rows sum to one) kernel matrix of the 11 data point inputs. 10 / 61
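A short sketch of how such a kernel matrix and its row-normalised version can be computed; the 11 inputs below are made-up placeholders for the rent data, and the helper name is hypothetical.

```python
import numpy as np

def gaussian_kernel_matrix(x, sigma):
    """Pairwise Gaussian kernel matrix K[i, l] = exp(-0.5 * (x_i - x_l)^2 / sigma^2)."""
    diffs = x[:, None] - x[None, :]
    return np.exp(-0.5 * diffs ** 2 / sigma ** 2)

x = np.linspace(20.0, 100.0, 11)           # 11 hypothetical 1D inputs (apartment sizes)
K = gaussian_kernel_matrix(x, sigma=10.0)  # raw similarities
K_norm = K / K.sum(axis=1, keepdims=True)  # normalised kernel matrix
print(np.allclose(K_norm.sum(axis=1), 1.0))  # rows sum to one -> True
```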

11 Kernel on 1D rent data: kernel regression $h(x) = \sum_{i=1}^N y^{(i)} \bar{K}_\sigma(x, x^{(i)})$. 11 / 61

12 Kernel on 1D rent data: kernel regression $h(x) = \sum_{i=1}^N y^{(i)} \bar{K}_\sigma(x, x^{(i)})$ with $\sigma = 50$; we get linear regression as $\sigma$ increases. 12 / 61

13 Kernel on 1D rent data: kernel regression $h(x) = \sum_{i=1}^N y^{(i)} \bar{K}_\sigma(x, x^{(i)})$ with a still larger $\sigma$: we get a constant hypothesis. 13 / 61

14 Kernel on 1D rent data: kernel regression $h(x) = \sum_{i=1}^N y^{(i)} \bar{K}_\sigma(x, x^{(i)})$ with $\sigma = 5$. 14 / 61

15 Kernel on 1D rent data: kernel regression $h(x) = \sum_{i=1}^N y^{(i)} \bar{K}_\sigma(x, x^{(i)})$ with $\sigma = 1$; nearest neighbour smoothing as $\sigma \to 0$. 15 / 61
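A small self-contained check of the two bandwidth limits discussed on these slides, using made-up 1D data: with a small $\sigma$ the estimate essentially returns the nearest neighbour's output, and with a very large $\sigma$ it approaches the global mean of the outputs (a constant hypothesis).

```python
import numpy as np

def nw_estimate(x_query, x_train, y_train, sigma):
    """Nadaraya-Watson estimate at a single query point."""
    w = np.exp(-0.5 * (x_query - x_train) ** 2 / sigma ** 2)
    return np.sum(w * y_train) / np.sum(w)

x_train = np.array([20., 35., 50., 65., 80.])
y_train = np.array([400., 550., 700., 820., 950.])
x0 = 37.0

print(nw_estimate(x0, x_train, y_train, sigma=1.0))   # small sigma: ~ nearest neighbour's y (550)
print(nw_estimate(x0, x_train, y_train, sigma=1e4))   # large sigma: ~ global mean of y (684)
print(y_train[np.argmin(np.abs(x_train - x0))], y_train.mean())
```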

16 Outline (current section: kernel regression, 2D example). 16 / 61

17 2D rent data with kernel regression: kernel regression $h(x) = \sum_{i=1}^N y^{(i)} \bar{K}_\sigma(x, x^{(i)})$ with $\sigma = 2$, a nearest neighbour model. 17 / 61

18 2D rent data with kernel regression: kernel regression $h(x) = \sum_{i=1}^N y^{(i)} \bar{K}_\sigma(x, x^{(i)})$ with $\sigma = 5$. 18 / 61

19 2D rent data with kernel regression: kernel regression $h(x) = \sum_{i=1}^N y^{(i)} \bar{K}_\sigma(x, x^{(i)})$. 19 / 61

20 2D rent data with kernel regression: kernel regression $h(x) = \sum_{i=1}^N y^{(i)} \bar{K}_\sigma(x, x^{(i)})$. 20 / 61

21 2D rent data with kernel regression: kernel regression $h(x) = \sum_{i=1}^N y^{(i)} \bar{K}_\sigma(x, x^{(i)})$. 21 / 61

22 2D rent data with kernel regression: kernel regression $h(x) = \sum_{i=1}^N y^{(i)} \bar{K}_\sigma(x, x^{(i)})$ with $\sigma = 50$; we get linear regression. 22 / 61

23 2D rent data with kernel regression: the data supports the hypothesis function, or surface, $h(x)$; kernel regression interpolates the observed outputs (e.g. rents) based on input geometry (similarity). 23 / 61

24 Kernel method summary: there are no parameters (like $w$) in kernel regression (!), $h(x) = \frac{\sum_{i=1}^N y^{(i)} K(x, x^{(i)})}{\sum_{l=1}^N K(x, x^{(l)})}$; all datapoints act as parameters ($N$-dimensional model); the input features $x^{(i)}_j$ are not directly weighted by parameters $w_j$, instead they go inside the kernel; the kernel can have hyperparameters, e.g. the variance $\sigma^2$ (Lectures 7 & 8: selection of hyperparameters). The 3 pillars of ML: neural networks, Bayesian learning, kernel methods (see the course CS-E Kernel Methods in Machine Learning). 24 / 61

25 Parametric vs non-parametric methods: a parametric method compresses all information in the dataset into a set of model parameters, e.g. the linear regression parameters $w$ with hypothesis $h(x) = w^T x$; the model (of size $d$) is smaller than the data ($Nd$); the model is a low-rank explanation of the data and is often interpretable; a new prediction $h(x^{(N+1)}) = w^T x^{(N+1)}$ does not depend on the datapoints $(x^{(i)}, y^{(i)})$. A non-parametric method uses the dataset as its parameters; the whole dataset needs to be stored (increased memory); the model has no interpretable parameters; e.g. the NW kernel method, where $h(x^{(N+1)}) = \sum_{i=1}^N y^{(i)} \frac{K(x^{(N+1)}, x^{(i)})}{\sum_l K(x^{(N+1)}, x^{(l)})}$ depends on all observed datapoints. 25 / 61

26 ID card of NW kernel regression: input/feature space $\mathcal{X} = \mathbb{R}^d$; target space $\mathcal{Y} = \mathbb{R}$; function family $h(x) = \frac{\sum_{i=1}^N y^{(i)} K_\theta(x, x^{(i)})}{\sum_{l=1}^N K_\theta(x, x^{(l)})}$; multiple choices of kernel function $K_\theta(x, x^{(i)})$; the selection of kernel hyperparameters $\theta$ is crucial; choosing kernel parameters to minimize empirical risk can lead to overfitting (Lectures 7 & 8). 26 / 61

27 Outline (current section: Bayesian learning). 27 / 61

28 Statistical learning: previously we studied regression as deterministic function fitting (minimize empirical risk); we can also treat regression as a statistical problem of inferring a distribution $w \sim p(\cdot)$ of the random-variable parameter after observing data $(X, y)$. The key concepts of statistics: probability density function (pdf) $p(\theta) \ge 0$; probability $P(a \le \theta \le b) = \int_a^b p(\theta)\,d\theta \in [0, 1]$; expectation $E_{p(\theta)}[g(\theta)] = \int g(\theta)\, p(\theta)\,d\theta \in \mathbb{R}$. Read DL book chapter 3! 28 / 61

29 Some common continuous distributions 29 / 61

30 Generative model: let's consider the task of estimating the ratio $\theta \in [0, 1]$ of genders of newborn babies, given a dataset of $N$ observed newborn genders $y = \{y^{(1)}, \ldots, y^{(N)}\}$ with $y^{(i)} \in \{0, 1\}$; the global true ratio is perhaps $\theta \approx 0.52$. We assume that each birth results in a boy or a girl with a Bernoulli probability $p(y^{(i)} \mid \theta) = \mathrm{Ber}(y^{(i)} \mid \theta) = \theta^{y^{(i)}} (1 - \theta)^{1 - y^{(i)}}$; assume all births are independent, $p(y^{(i)}, y^{(j)}) = p(y^{(i)})\, p(y^{(j)})$; assume all births $i$ follow the same $\mathrm{Ber}(\theta)$ distribution; the parameter $\theta$ contains everything needed to compute the probability of an observation $p(y^{(i)} \mid \theta)$. Assume we observe $N = 5$ births $y = (0, 0, 1, 0, 0)^T$. 30 / 61

31 Data likelihood: the data likelihood $p(y \mid \theta)$ is the probability of seeing this data $y$ assuming parameters $\theta$: $p(y \mid \theta) = \prod_{i=1}^N \mathrm{Ber}(y^{(i)} \mid \theta) = \prod_{i=1}^N \theta^{y^{(i)}} (1 - \theta)^{1 - y^{(i)}}$. In maximum likelihood (ML) inference we maximise the (log) likelihood: $\theta_{ML} = \arg\max_\theta \log p(y \mid \theta) = \frac{1}{N} \sum_{i=1}^N y^{(i)} = 0.2$. 31 / 61
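A tiny sketch of this maximum likelihood computation for the five observed births, comparing the closed form $\frac{1}{N}\sum_i y^{(i)}$ with a brute-force search over the log-likelihood; the variable names are illustrative.

```python
import numpy as np

y = np.array([0, 0, 1, 0, 0])            # N = 5 observed births

# closed-form ML estimate: mean of the Bernoulli observations
theta_ml = y.mean()                      # 0.2

# brute-force check: maximise the log-likelihood over a grid of theta values
thetas = np.linspace(0.001, 0.999, 999)
loglik = y.sum() * np.log(thetas) + (len(y) - y.sum()) * np.log(1 - thetas)
print(theta_ml, thetas[np.argmax(loglik)])   # both approximately 0.2
```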

32 Prior distribution: the ML gender ratio $\theta_{ML} = 0.2$ is unlikely! If we add more data, the ML solution will surely converge to a reasonable value around $\theta \approx 0.5$; but what if we don't have more data? We should also consider our prior beliefs on $\theta$, which would reject $\theta = 0.2$. Let's encode the prior belief as a distribution $p(\theta) = \mathrm{Beta}(\theta \mid \alpha, \beta)$ and subjectively choose $\alpha = \beta = 20$. How do we combine prior and likelihood? 32 / 61

33 Posterior distribution combines prior and likelihood. Bayes' rule $p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}$, hence $p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}$ gives the posterior distribution $p(\theta \mid y)$; the evidence $p(y)$ is constant wrt $\theta$; exactly what we wanted! Maximum a posteriori (MAP) inference: $\theta_{MAP} = \arg\max_\theta \underbrace{p(\theta \mid y)}_{\propto\, p(y \mid \theta)\, p(\theta)}$. 33 / 61

34 Bayesian estimators: the maximum likelihood estimator $\theta_{ML} = \arg\max_\theta \log p(y \mid \theta)$; the maximum a posteriori estimator $\theta_{MAP} = \arg\max_\theta \log p(\theta \mid y)$ (a "maximum a priori" estimator is not used). 34 / 61

35 Data decreases parameter variance 35 / 61

36 Solving posteriors: the crucial part of Bayesian modelling is solving the posterior. The naive solution is to search for the $\theta$ that maximises the posterior $p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$, which is not feasible if the parameter space is large. Conjugate distributions are combinations of priors and likelihoods that have known analytical posterior distributions; for instance, with $p(y \mid \theta) = \prod_{i=1}^N \mathrm{Ber}(y^{(i)} \mid \theta)$ and $p(\theta) = \mathrm{Beta}(\alpha, \beta)$ the posterior is $p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} = \mathrm{Beta}\!\left(\alpha + \sum_{i=1}^N y^{(i)},\ \beta + N - \sum_{i=1}^N y^{(i)}\right)$. 36 / 61
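A short sketch of this conjugate Beta-Bernoulli update for the running example ($\alpha = \beta = 20$ and the five births above), assuming SciPy is available for the Beta distribution.

```python
import numpy as np
from scipy import stats

y = np.array([0, 0, 1, 0, 0])
alpha, beta = 20, 20                      # subjective prior Beta(20, 20)

# conjugate update: Beta(alpha + sum(y), beta + N - sum(y))
alpha_post = alpha + y.sum()              # 21
beta_post = beta + len(y) - y.sum()       # 24
posterior = stats.beta(alpha_post, beta_post)

theta_map = (alpha_post - 1) / (alpha_post + beta_post - 2)  # mode of the Beta posterior
print(theta_map, posterior.mean())        # ~0.465 and ~0.467, pulled back towards 0.5
```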

37 ID card for Bayesian modelling: a data set of observations $y = (y^{(i)})_{i=1}^N$; define a generative probability model $p(y \mid \theta) \in \mathbb{R}_+$; define a likelihood $p(y \mid \theta)$ of observing dataset $y$ given model $\theta$; define a prior belief $p(\theta)$ on parameter values; solve the posterior belief $p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \propto p(y \mid \theta)\, p(\theta)$ of parameters $\theta$ given observations $y$; several likelihood/prior pairs have known posterior solutions. 37 / 61

38 Outline (current section: Bayesian linear regression). 38 / 61

39 Bayesian linear regression (BLR) recipe: Bayesian linear regression has several differences to deterministic linear regression; the parameters $w$ are modeled as a distribution $p(w)$, and predictions $h(x)$ are distributions $p(y \mid w)$. BLR recipe: (1) we need to define a likelihood $p(y \mid w)$; (2) we need to define a prior $p(w)$; (3) the optimal parameters are represented by the posterior $p(w \mid y)$. 39 / 61

40 Bayesian linear regression: data and hypothesis. Assume a 1D dataset with $x = (x^{(1)}, \ldots, x^{(N)})^T \in \mathbb{R}^N$ and $y = (y^{(1)}, \ldots, y^{(N)})^T \in \mathbb{R}^N$; assume the bias trick $\phi(x) = (1, x)^T \in \mathbb{R}^2$; assume parameters $w = (w_0, w_1)^T \in \mathbb{R}^2$ and linear regression $h_w(x) = w_0 + w_1 x = \sum_{j=0}^1 w_j \phi_j(x) = w^T \phi(x)$; the data feature matrix is $\Phi = (\phi(x^{(1)}), \ldots, \phi(x^{(N)}))^T \in \mathbb{R}^{N \times 2}$. 40 / 61

41 Bayesian linear regression: variance model. Assume the variance model $y = \underbrace{h_w(x)}_{\text{clean output}} + \underbrace{\varepsilon}_{\text{perturbation}}$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$. If $A \sim \mathcal{N}(\mu, \sigma^2)$, then the new random variable $B = cA + d \sim \mathcal{N}(c\mu + d, c^2 \sigma^2)$ for any real $c, d$; this rule gives $y = h_w(x) + \varepsilon \sim \mathcal{N}(w^T \phi(x), \sigma^2)$. 41 / 61

42 Bayesian linear regression: (1) likelihood. The likelihood is then $p(y \mid w, \sigma) = \prod_{i=1}^N p(y^{(i)} \mid w, \sigma) = \prod_{i=1}^N \mathcal{N}(y^{(i)} \mid w^T \phi(x^{(i)}), \sigma^2) = \mathcal{N}(y \mid \Phi w, \sigma^2 I_N) = \frac{1}{\sqrt{(2\pi)^N |\sigma^2 I|}} \exp\left(-\frac{1}{2} (y - \Phi w)^T (\sigma^2 I)^{-1} (y - \Phi w)\right) \propto \exp\left(-\frac{1}{2\sigma^2} (y - \Phi w)^T (y - \Phi w)\right)$. 42 / 61

43 Bayesian linear regression: (2) prior. A bivariate Gaussian prior for the parameters $w = (w_0, w_1)^T$: $p(w) = \mathcal{N}\!\left(\underbrace{\begin{pmatrix} w_0 \\ w_1 \end{pmatrix}}_{w} \,\middle|\, \underbrace{\begin{pmatrix} m_{0,0} \\ m_{0,1} \end{pmatrix}}_{m_0}, \underbrace{\begin{pmatrix} \sigma_0^2 & 0 \\ 0 & \sigma_1^2 \end{pmatrix}}_{S_0}\right) \propto \exp\left(-\frac{1}{2} (w - m_0)^T S_0^{-1} (w - m_0)\right)$. 43 / 61

44 Bayesian linear regression: (3) posterior. The (conjugate) posterior is now (DL book 5.6) $p(w \mid y, \sigma) = \frac{p(y \mid w, \sigma)\, p(w)}{p(y)} \propto p(y \mid w, \sigma)\, p(w) = \mathcal{N}(y \mid \Phi w, \sigma^2 I)\, \mathcal{N}(w \mid m_0, S_0) \propto \exp\left(-\frac{1}{2\sigma^2} (y - \Phi w)^T (y - \Phi w)\right) \exp\left(-\frac{1}{2} (w - m_0)^T S_0^{-1} (w - m_0)\right) \propto \exp\left(-\frac{1}{2} (w - m_N)^T S_N^{-1} (w - m_N)\right) \propto \mathcal{N}(w \mid m_N, S_N)$, where $S_N = (S_0^{-1} + \sigma^{-2} \Phi^T \Phi)^{-1}$ and $m_N = S_N (S_0^{-1} m_0 + \sigma^{-2} \Phi^T y)$. 44 / 61

45 BLR summary: (1) likelihood, (2) prior and (3) posterior. Data likelihood $p(y \mid w, \sigma) = \prod_{i=1}^N \mathcal{N}(y^{(i)} \mid w^T \phi(x^{(i)}), \sigma^2) = \mathcal{N}(y \mid \Phi w, \sigma^2 I_N)$; parameter prior $p(w) = \mathcal{N}(w \mid m_0, S_0)$; parameter posterior $p(w \mid y, \sigma) = \frac{p(y \mid w, \sigma)\, p(w)}{p(y \mid \sigma)} = \mathcal{N}(w \mid m_N, S_N)$, where $m_N = S_N (S_0^{-1} m_0 + \sigma^{-2} \Phi^T y)$ and $S_N = (S_0^{-1} + \sigma^{-2} \Phi^T \Phi)^{-1}$. 45 / 61
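A compact NumPy sketch of these posterior formulas with the bias-trick features $\phi(x) = (1, x)$ and a diagonal prior as in the slides; the data values below are made up, not the lecture's rent dataset.

```python
import numpy as np

def blr_posterior(x, y, m0, S0, sigma):
    """Posterior N(m_N, S_N) for Bayesian linear regression with features phi(x) = (1, x)."""
    Phi = np.column_stack([np.ones_like(x), x])           # N x 2 feature matrix (bias trick)
    S0_inv = np.linalg.inv(S0)
    S_N = np.linalg.inv(S0_inv + Phi.T @ Phi / sigma**2)  # S_N = (S0^-1 + sigma^-2 Phi^T Phi)^-1
    m_N = S_N @ (S0_inv @ m0 + Phi.T @ y / sigma**2)      # m_N = S_N (S0^-1 m0 + sigma^-2 Phi^T y)
    return m_N, S_N

x = np.array([23., 30., 42., 55., 61., 70.])              # e.g. apartment sizes
y = np.array([520., 610., 700., 810., 880., 990.])        # e.g. rents
m0 = np.zeros(2)
S0 = np.diag([300.0**2, 10.0**2])                         # prior std 300 for w0, 10 for w1
m_N, S_N = blr_posterior(x, y, m0, S0, sigma=100.0)
print(m_N)                     # posterior mean of (w0, w1)
print(np.sqrt(np.diag(S_N)))   # posterior standard deviations
```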

46 BLR Prior ($\sigma = 100$): let's pick $m_0 = (0, 0)^T$ and $\sigma_0 = 300$ and $\sigma_1 = 10$ (why?); let's sample 5 parameters from the prior, $w^{(j)} \sim \mathcal{N}(m_0, S_0)$, $j = 1, \ldots, 5$, and plot the 5 hypotheses $h^{(j)}(x) = w^{(j)T} \phi(x)$, $j = 1, \ldots, 5$. 46 / 61
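A sketch of this prior-sampling step (plotting omitted; the sampled hypotheses are simply evaluated on a small grid). The random seed and grid are arbitrary choices, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
m0 = np.zeros(2)
S0 = np.diag([300.0**2, 10.0**2])    # prior covariance: std 300 for w0, std 10 for w1

w_samples = rng.multivariate_normal(m0, S0, size=5)          # 5 draws w^(j) ~ N(m0, S0)
x_grid = np.linspace(20.0, 80.0, 4)
Phi_grid = np.column_stack([np.ones_like(x_grid), x_grid])   # phi(x) = (1, x)

for j, w in enumerate(w_samples):
    print(f"h^({j+1})(x) =", Phi_grid @ w)   # sampled hypothesis evaluated on the grid
```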

47 BLR, first data point: let's sample 5 new parameters from the posterior, $w^{(j)} \sim \mathcal{N}(m_1, S_1)$, where $m_1 = \sigma^{-2} S_1 \Phi_{1:1}^T y_{1:1}$ and $S_1 = (\alpha^{-2} I + \sigma^{-2} \Phi_{1:1}^T \Phi_{1:1})^{-1}$, and draw these 5 hypotheses $h^{(j)}(x) = w^{(j)T} \phi(x)$. 47 / 61

48 BLR, two data points: posterior after seeing (random) 2 out of 11 datapoints. 48 / 61

49 BLR 3 data points 49 / 61

50 BLR 5 data points 50 / 61

51 BLR 7 data points 51 / 61

52 BLR, all data points: the posterior has converged to $p(w \mid y, \sigma) = \mathcal{N}(m_{11}, S_{11})$ with $m_{11} = \begin{pmatrix} E[w_0 \mid y, \sigma] \\ E[w_1 \mid y, \sigma] \end{pmatrix} = \begin{pmatrix} 441 \\ 7.2 \end{pmatrix}$ and covariance $S_{11} = \begin{pmatrix} 4185 & \cdots \\ \cdots & \cdots \end{pmatrix}$. 52 / 61

53 BLR final model: let's zoom into the posterior and draw 100 samples; the posterior represents all hypotheses that match the data and our prior assumptions (!). Which one should we predict with? All of them, and take the average! 53 / 61

54 BLR final model: the predictive hypothesis $h(x) = E_{w \sim p(w \mid y)}[h_w(x)] = \int w^T \phi(x)\, \mathcal{N}(w \mid m_N, S_N)\, dw$, giving $h(x) \sim \mathcal{N}(m_N^T \phi(x),\ \phi(x)^T S_N \phi(x))$; finally $y(x) = h(x) + \varepsilon \sim \mathcal{N}(m_N^T \phi(x),\ \phi(x)^T S_N \phi(x) + \sigma^2)$. 54 / 61
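A small self-contained sketch of this predictive distribution, using the same hypothetical data and prior as the earlier BLR sketch and evaluating the predictive mean and standard deviation at a new input.

```python
import numpy as np

# same hypothetical setup as the earlier BLR sketch
x = np.array([23., 30., 42., 55., 61., 70.])
y = np.array([520., 610., 700., 810., 880., 990.])
sigma = 100.0
Phi = np.column_stack([np.ones_like(x), x])
S0_inv = np.linalg.inv(np.diag([300.0**2, 10.0**2]))
S_N = np.linalg.inv(S0_inv + Phi.T @ Phi / sigma**2)
m_N = S_N @ (Phi.T @ y / sigma**2)        # m0 = 0 here

def predictive(x_new):
    """Predictive N(m_N^T phi(x), phi(x)^T S_N phi(x) + sigma^2) at a new input."""
    phi = np.array([1.0, x_new])
    mean = m_N @ phi
    var = phi @ S_N @ phi + sigma**2
    return mean, np.sqrt(var)

print(predictive(45.0))   # predictive mean and standard deviation at x = 45
```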

55 What did we learn? The posterior $p(w \mid y, \sigma)$ is the belief in any specific parameter value $w = (w_0, w_1)$ after observing data; our solution is a distribution instead of a single value $w$. Instead of predicting with an optimal $w_{MAP} = \arg\max_w p(w \mid y) = m_N = (S_0^{-1} + \sigma^{-2} \Phi^T \Phi)^{-1} (S_0^{-1} m_0 + \sigma^{-2} \Phi^T y)$, we average the predictions from all posterior solutions. The posterior concentrates as data increases; the predictive distribution averages over all hypotheses, giving a more robust model with no need to choose a single parameter value; there is rarely a true underlying parameter value, but rather a continuum of compatible parameters. 55 / 61

56 On priors: a prior is a subjective distribution defined by the modeler. Bayesian theory argues that the prior should be subjective and represent our prior beliefs about which hypotheses are expected; our analysis is thus not objective, but it does not need to be! The posterior encodes our degree of belief in certain models/parameters given our very explicit prior assumptions and the data. A wide prior is recommended: we don't want to exclude hypotheses. The prior assigns a probability to each hypothesis in the hypothesis space. Priors can also be used as a way to constrain the parameter values not to have crazy extreme values, or to favour simple models (Lectures 7 & 8). 56 / 61

57 Regularising with priors: a prior that constrains parameters to nice values is a regulariser. Assume the 1st data point is $x^{(1)} = 0.6$, $y^{(1)} = 1$: the likelihood is now an infinite band in the $(w_0, w_1)$ plane along the line $w_1 x^{(1)} + w_0 = y^{(1)}$. We assume huge values are clearly wrong; a zero-mean prior encodes this assumption. 57 / 61
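To make the regulariser reading concrete, here is a short derivation that is not on the slide but follows directly from the BLR formulas above, assuming an isotropic zero-mean prior $m_0 = 0$, $S_0 = \alpha^2 I$:

```latex
\log p(w \mid y, \sigma)
  = -\tfrac{1}{2\sigma^2}\,\|y - \Phi w\|^2
    - \tfrac{1}{2\alpha^2}\,\|w\|^2 + \text{const},
\qquad\text{so}\qquad
w_{\mathrm{MAP}}
  = \arg\min_w \Big( \|y - \Phi w\|^2 + \underbrace{\tfrac{\sigma^2}{\alpha^2}}_{\lambda}\,\|w\|^2 \Big),
```

i.e. the MAP estimate coincides with ridge (L2-regularised) least squares with regularisation strength $\lambda = \sigma^2 / \alpha^2$.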

58 ID card of Bayesian linear regression: input/feature space $\mathcal{X} = \mathbb{R}^d$, target space $\mathcal{Y} = \mathbb{R}$, feature mapping $\phi(x) \in \mathbb{R}^n$; linear function plus noise $y = w^T \phi(x) + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$; Normal prior $p(w) = \mathcal{N}(w \mid m_0, S_0)$; Normal likelihood $p(y \mid w, \sigma) = \mathcal{N}(y \mid \Phi w, \sigma^2 I_N)$; Normal posterior $p(w \mid y, \sigma) = \mathcal{N}(w \mid m_N, S_N)$, where $m_N = S_N (S_0^{-1} m_0 + \sigma^{-2} \Phi^T y)$ and $S_N = (S_0^{-1} + \sigma^{-2} \Phi^T \Phi)^{-1}$; predictive posterior $h(x) \sim \mathcal{N}(m_N^T \phi(x),\ \phi(x)^T S_N \phi(x) + \sigma^2)$; MAP solution $w_{MAP} = m_N$. The 3 pillars of ML: neural networks, Bayesian learning, kernel methods (see the courses CS-E Machine Learning: Advanced Probabilistic Methods and CS-E Bayesian Data Analysis). 58 / 61

59 Recap of course so far, hypothesis space: the hypothesis space of linear regression is $\mathcal{H} = \{h(x) = w^T x \text{ where } x \in \mathbb{R}^d\}$; each unique parameter vector $w$ gives a linear hypothesis $h_w(x)$, hence the number of possible hypotheses is the number of $d$-dimensional real-valued vectors (infinite!); the hypothesis space grows with the input dimension $d$. In Bayesian learning the hypothesis space does not change, but the hypotheses $h_w(\cdot)$ have prior probabilities $p(h_w) = p(w)$. The concept of hypothesis space will be discussed in Lectures 7 & 8. 59 / 61

60 Recap of course so far, loss function: the loss quantifies the error we make when predicting $h(x^{(i)})$ for the $i$th datapoint when its true value was $y^{(i)}$; the square loss function is $L((x, y), h(\cdot)) = (y - h(x))^2$; the empirical risk is the average square error over the dataset, $E(h_w \mid X, y) = \frac{1}{N} \sum_{i=1}^N L((x^{(i)}, y^{(i)}), h(\cdot)) = \frac{1}{N} \sum_{i=1}^N (y^{(i)} - h(x^{(i)}))^2$; the hypothesis that minimizes the empirical risk has the optimal fit to the data $X, y$; in Bayesian learning the role of the loss function is played by the likelihood. 60 / 61
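A minimal sketch of this empirical risk computation under the square loss; the hypothesis, weights and data below are toy values for illustration only.

```python
import numpy as np

def empirical_risk(h, X, y):
    """Average squared loss (1/N) * sum_i (y_i - h(x_i))^2 over the dataset."""
    predictions = np.array([h(x) for x in X])
    return np.mean((y - predictions) ** 2)

# toy linear hypothesis h(x) = w^T x with bias-trick features x = (1, size)
w = np.array([400.0, 7.0])
X = np.array([[1.0, 25.0], [1.0, 40.0], [1.0, 60.0]])
y = np.array([600.0, 700.0, 830.0])
print(empirical_risk(lambda x: w @ x, X, y))
```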

61 Next steps: next lecture is Classification I at 10:15. DL book: read chapters 5.5 and 5.6. More information on kernel methods: Hastie's book, chapter 6.1 (+ 6.2 & 6.3); Bishop's book, chapter 6.3. Bayesian linear regression: Bishop's book, chapter 3.3. Remember the post-lecture feedback questionnaire for this lecture. 61 / 61
