Bayesian Inference (A Rough Guide)


1 Bayesian Inference (A Rough Guide)
Anil C. Kokaram, anil.kokaram@tcd.ie
Electrical and Electronic Engineering Dept., University of Dublin, Trinity College.
See for more information.

2 Bayesian Inference
- The best way to think about problem solving (my opinion).
- Well established in Signal Processing since the late 1980s.
- Grew with the rapid increase in the computational power of computers, which enabled diabolical Monte Carlo techniques to become practical.
- Well known in Statistics (see Rev. Bayes) and in fact just another rule of probability.
- "Bayesian Inference" is sometimes used as a euphemism for "using probability to solve your problem".
- Important texts: [1] Numerical Bayesian Inference, J. O Ruanaidh; [2] Numerical Recipes; [3] Image Analysis, Random Fields... by Wilson, which is extremely good.

3 Probability and Marbles
[Figure: box A containing blue and green marbles, and boxes B containing red and yellow marbles, with example outcomes or realisations of the random variables A, B.]
- Consider two random variables A, B, which each have two outcomes or realisations, (blue/green) and (red/yellow) respectively.
- Outcomes are generated by first selecting a marble from box A, then selecting a marble from a box B. But which box B is used depends on what colour you select from box A.

4 Probability and Marbles
[Figure: box A (40 blue, 60 green marbles) and boxes B (red and yellow marbles).]
- Probability is just a number. A probability density function expresses the probability of all outcomes of a R.V. and must obey the following equation:

    ∫ p(A) dA = 1

- In our example

    ∫ p(A) dA = p(A = blue) + p(A = green) = 0.4 + 0.6 = 1.0

- What is p(B = red | A = blue)? The probability of realising or observing a red marble from r.v. B GIVEN that a blue marble was drawn from the probability distribution for r.v. A (i.e. a blue marble was observed or realised as the outcome from r.v. A).
- p(B = red | A = blue) is the CONDITIONAL probability of observing a red marble from box B GIVEN a particular outcome from A. It is the probability of B conditioned on A.
- p(B = red, A = blue) is the JOINT probability of observing a red marble from B AND a blue marble from A.
- p(B, A) is the JOINT probability distribution of A AND B.

5 Probability and Marbles
[Figure: a table of ten example outcomes or realisations of the random variables A, B.]
- We can measure p(B = red | A = blue), or p(B = red, A = blue), by simulating the system (with Matlab say) and observing the frequencies of the various outcomes.
- In this example set of simulated outcomes, p(B = red | A = blue) = 3/5 and p(B = red, A = blue) = 3/10.
- We would need lots of example outcomes to be sure of our measurements. But we can calculate these values using the laws of probability.
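A minimal Matlab sketch of such a simulation is given below. The values p(A = blue) = 0.4 and p(B = red | A = blue) = 0.1 come from the model on the following slides; p(B = red | A = green) is not stated on the slides, so the value used here is an assumption made purely for illustration.

```matlab
% Sketch: simulate the two-box marble system and measure frequencies.
% p(A=blue) = 0.4 and p(B=red|A=blue) = 0.1 are read off the model;
% p(B=red|A=green) = 0.5 is an ASSUMED value for illustration only.
N = 1e5;                          % number of simulated draws
pA_blue    = 0.4;                 % box A: 40 blue, 60 green
pRed_blue  = 0.1;                 % p(B=red | A=blue)
pRed_green = 0.5;                 % p(B=red | A=green), assumed

A_blue = rand(N,1) < pA_blue;     % true where a blue marble came from box A
pRed   = pRed_blue*A_blue + pRed_green*(~A_blue);
B_red  = rand(N,1) < pRed;        % true where a red marble came from box B

% empirical frequencies vs the values given by the laws of probability
cond_est  = sum(B_red & A_blue) / sum(A_blue);   % approx p(B=red | A=blue) = 0.1
joint_est = sum(B_red & A_blue) / N;             % approx p(B=red , A=blue) = 0.04
fprintf('p(B=red|A=blue) ~ %.3f, p(B=red,A=blue) ~ %.3f\n', cond_est, joint_est);
```

With enough draws the measured frequencies settle close to the values calculated from the laws of probability on the next slides.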

6 Probability and Marbles
[Figure: marble boxes A and B as before.]

    p(B = red | A = blue) = 0.1    (just by reading it off from our model)

    p(B = red, A = blue) = p(B = red | A = blue) p(A = blue) = 0.1 × 0.4 = 0.04

    p(B = red | A = blue) = p(B = red, A = blue) / p(A = blue) = 0.04 / 0.4 = 0.1

This is an important equation in conditional probability:

    p(B | A) = p(B, A) / p(A) = p(A, B) / p(A)
    p(A | B) = p(A, B) / p(B) = p(B, A) / p(B)

7 Probability and Bayes Theorem
[Figure: marble boxes A and B as before.]
- Bayes' theorem turns out to be just another law of conditional probability.
- What is p(A = blue | B = red)? Hmmm... we can't just read that off easily now eh? It's sort of upside down...
- Given that you have observed that B = red, what is the probability that a blue marble was drawn from A?
- This is the kind of thing that you end up having to answer a lot in signal processing. Given that the corrupted speech signal at this point is 0.6 volts, what is the probability that the actual signal is 1.0 volts?

8 Probability and Bayes Theorem
[Figure: marble boxes A and B as before.]

    p(A = blue | B = red) = p(B = red, A = blue) / p(B = red)
                          = p(B = red | A = blue) p(A = blue) / p(B = red)

We can find p(B = red | A = blue) easily now:

    p(A = blue | B = red) = (0.1 × 0.4) / p(B = red)

This is Bayes' Theorem:

    p(B | A) = p(A | B) p(B) / p(A)    (1)

It turns the potentially tricky problem of estimating p(B | A) into the hopefully easier problem of estimating p(A | B).

9 Probability and Bayes Theorem
This is Bayes' Theorem:

    p(B | A) = p(A | B) p(B) / p(A)    (2)

- p(A | B) is the Likelihood.
- p(B) is the Prior.
- p(A) is the normalising factor, sometimes called the Evidence.
- p(B | A) is the Posterior distribution, because you are asking questions after the fact.

10 Probability and Bayes Theorem
[Figure: marble boxes A and B as before.]

    p(A = blue | B = red) = p(B = red | A = blue) p(A = blue) / p(B = red)

    p(B = red) = ∫ p(B = red, A) dA
               = p(B = red, A = blue) + p(B = red, A = green)
               = p(B = red | A = blue) p(A = blue) + p(B = red | A = green) p(A = green)

This is marginalisation, or integrating out one of the variables from a joint distribution:

    p(B) = ∫ p(B, A) dA
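The same calculation can be checked numerically; a small sketch follows. The value of p(B = red | A = green) is not given on the slides, so the number used below is an assumption chosen only to make the arithmetic concrete.

```matlab
% Sketch: Bayes' theorem with marginalisation for the marble example.
% p(A=blue) = 0.4 and p(B=red|A=blue) = 0.1 are from the slides;
% p(B=red|A=green) = 0.5 is an ASSUMED value for illustration only.
pA      = [0.4 0.6];             % p(A=blue), p(A=green)
pRed_gA = [0.1 0.5];             % p(B=red|A=blue), p(B=red|A=green) (second assumed)

pRed      = sum(pRed_gA .* pA);          % marginalisation: p(B=red)
post_blue = pRed_gA(1)*pA(1) / pRed;     % posterior p(A=blue | B=red)
fprintf('p(B=red) = %.3f, p(A=blue|B=red) = %.3f\n', pRed, post_blue);
```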

11 Line Fitting
- Want to expose all the details using a very simple example.
- Given N (time) samples of observed data y_n.
- Observation model: y_n = m n + c + e_n, with e_n ~ N(0, σ_e²), i.e. e_n is a Normally distributed (Gaussian) variable with zero mean and variance σ_e².
- Want to estimate m assuming we know c and σ_e², given the observations y_n as well as the model above.
- Remember we can assemble the data into vector form, y = m x + c + e, where x = [1 2 3 ... N]ᵀ.
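A short Matlab sketch of this observation model is below, using the parameter values quoted on the experiments slide later (c = 3, m = tan(40°), σ_e = 10); the number of samples N is an assumption, since it is not specified on the slides.

```matlab
% Sketch: simulate data from the observation model y_n = m*n + c + e_n.
N       = 100;                    % ASSUMED number of samples
n       = (1:N)';                 % the vector x = [1 2 3 ... N]'
m_true  = tan(40*pi/180);
c_true  = 3;
sigma_e = 10;

y = m_true*n + c_true + sigma_e*randn(N,1);    % observed data

plot(n, y, '.', n, m_true*n + c_true, '-');    % observations and the underlying line
xlabel('n'); ylabel('y_n'); legend('observations', 'true line');
```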

12 Line Fitting: Bayes Vs LS
Bayes: Big Picture. Choose m to maximise p(m | y, θ), where θ = [c, σ_e²]:

    p(m | y, θ) = p(y | m, θ) p(m | θ) / p(y | θ)  ∝  p(y | m, θ) p(m | θ)

Least Squares: Big Picture (for m, c). Minimise ||y − m x − c||² w.r.t. m, c. Differentiate and set to 0:

    [ Σ_k k²   Σ_k k ] [ m ]   [ Σ_k k y_k ]
    [ Σ_k k    Σ_k 1 ] [ c ] = [ Σ_k y_k   ]

- Gives you m, c in one shot.
- Doesn't give you σ_e² unless you measure Var(y − m̂ x − ĉ).
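A Matlab sketch of the least-squares one-shot solution via these normal equations (the data-generation lines simply repeat the earlier simulation sketch):

```matlab
% Sketch: least-squares estimate of m and c via the normal equations.
N = 100; k = (1:N)';
y = tan(40*pi/180)*k + 3 + 10*randn(N,1);    % simulated data as before

A   = [sum(k.^2) sum(k); sum(k) N];          % [sum k^2, sum k; sum k, sum 1]
rhs = [sum(k.*y); sum(y)];
mc  = A \ rhs;                               % solves for [m; c] in one shot
m_ls = mc(1);  c_ls = mc(2);

% sigma_e^2 is not part of the LS solution; estimate it from the residuals
sigma2_est = var(y - m_ls*k - c_ls);
```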

13 Likelihood

    p(m | y, θ) ∝ p(y | m, θ) p(m | θ),    y_n = m n + c + e_n

- The Likelihood typically arises directly from the observation model. The likelihood connects our parameters to the observations.
- Knowing m and θ, it is e_n that determines y_n, so

    p(y | m, θ) = p(e) = p(e_1, e_2, e_3, e_4, e_5, ...)

- But the noise at each sample is independent of all the other samples, so

    p(y | m, θ) = p(e_1) p(e_2) p(e_3) p(e_4) ... = Π_k p(e_k)
                = Π_k (1/√(2πσ_e²)) exp( −(y_k − m k − c)² / (2σ_e²) )
                = (2πσ_e²)^(−N/2) exp( −Σ_k (y_k − m k − c)² / (2σ_e²) )
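A small Matlab sketch that evaluates the logarithm of this likelihood for a candidate slope m may help make the formula concrete; the function name and the test values below are illustrative only.

```matlab
% Sketch: log of the likelihood above for a candidate slope m,
% given data y (column vector) and known c and sigma_e.
loglik = @(m, y, c, sigma_e) ...
    -0.5*length(y)*log(2*pi*sigma_e^2) ...
    - sum((y - m*(1:length(y))' - c).^2) / (2*sigma_e^2);

% example: compare two candidate slopes on simulated data
N = 100; k = (1:N)';
y  = tan(40*pi/180)*k + 3 + 10*randn(N,1);
L1 = loglik(0.8, y, 3, 10);                  % near the true slope
L2 = loglik(0.2, y, 3, 10);                  % far from the true slope
```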

14 Prior for m
- The Prior reflects what we know about our parameters before we observe anything. It captures our life-knowledge or intuition or bias about what we feel the answer should be.
- It is a powerful idea because Bayes' theorem is a recipe that shows us how to incorporate this information quantitatively into problem solving.
- So there are many choices for the prior, and they all depend on what you know about your problem. Three examples follow (with a short code sketch after this list).
- If you know nothing (and you never know nothing) then you may choose a UNIFORM prior, i.e. p(m) = α, say.
- If you feel m is between 0.1 and 0.4 then you may choose a box prior, p(m) = 3 (u(m − 0.1) − u(m − 0.4)).
- Maybe you know m should be near some value m̄, so p(m) = (1/√(2πσ_m²)) exp( −(m − m̄)² / (2σ_m²) ).
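A quick Matlab sketch of the three example priors above, evaluated on a grid; the values of α, m̄ and σ_m below are illustrative assumptions.

```matlab
% Sketch: the three example priors for m evaluated over a grid of m values.
m = linspace(-0.5, 1, 600)';

p_uniform = 0.5*ones(size(m));                 % flat prior, alpha = 0.5 (assumed)
p_box     = 3*((m >= 0.1) & (m <= 0.4));       % box prior between 0.1 and 0.4
m_bar = 0.25; sigma_m = 0.1;                   % ASSUMED prior mean and std
p_gauss   = exp(-(m - m_bar).^2/(2*sigma_m^2)) / sqrt(2*pi*sigma_m^2);

plot(m, p_uniform, m, p_box, m, p_gauss);
xlabel('m'); ylabel('p(m)');
legend('uniform', 'box (0.1 to 0.4)', 'Gaussian');
```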

15 Using a uniform prior for m
Let's stupidly say we know nothing: then we are just doing Maximum Likelihood (ML) estimation.

    p(m | y, θ) ∝ p(y | m, θ) p(m | θ) = p(y | m, θ) α
                ∝ (1/(2πσ_e²)^(N/2)) exp( −Σ_k (y_k − m k − c)² / (2σ_e²) )
                ∝ exp( −Σ_k (y_k − m k − c)² / (2σ_e²) )

16 Using a uniform prior for m (cont.)
Choosing m by maximising p(m | y, θ) implies choosing m to minimise the following expression:

    E(m) = Σ_k (y_k − m k − c)² / (2σ_e²)

    ∂E(m)/∂m = −Σ_k 2 (y_k − m̂ k − c) k / (2σ_e²) = 0

    ⇒  m̂ Σ_k k² = Σ_k k (y_k − c)

    ⇒  m̂ = Σ_k k (y_k − c) / Σ_k k²
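The closed-form ML estimate is one line of Matlab once the data are assembled into a vector (the data-generation lines repeat the earlier simulation sketch):

```matlab
% Sketch: closed-form ML estimate of the slope m with c and sigma_e known.
N = 100; k = (1:N)'; c = 3;
y = tan(40*pi/180)*k + c + 10*randn(N,1);    % simulated data as before

m_hat_ml = sum(k.*(y - c)) / sum(k.^2);      % m_hat = sum_k k(y_k - c) / sum_k k^2
```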

17 Using a pulse prior for m
Now let's say we know the range of values for m, i.e. a box prior... Maximum A-Posteriori (MAP) Estimation.

    p(m | y, θ) ∝ p(y | m, θ) p(m | θ) = p(y | m, θ) f(m)
                ∝ exp( −Σ_k (y_k − m k − c)² / (2σ_e²) )    for m in [0.1, 0.4]

Choosing m by maximising p(m | y, θ) implies choosing m to minimise the following expression:

    E(m) = Σ_k (y_k − m k − c)² / (2σ_e²)    for m in [0.1, 0.4]

- Tricky to impose that constraint in a closed form solution. So instead how about just exhaustive search? Choose m = 0.1 : 0.001 : 0.4 say, and pick the m that gives the smallest E(m).
- Note how computationally expensive it is to evaluate the likelihood. This is because you typically have a lot of data, and hence this is usually the main drain in solutions of this kind.
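A Matlab sketch of the exhaustive search is below; the true slope is set to tan(20°) here, an assumption made purely so that it lies inside the box prior.

```matlab
% Sketch: MAP estimate of m with a box prior, by exhaustive search over a grid.
N = 100; k = (1:N)'; c = 3; sigma_e = 10;
y = tan(20*pi/180)*k + c + sigma_e*randn(N,1);   % simulated data (slope inside the box)

m_grid = 0.1:0.001:0.4;                % the support of the box prior
E = zeros(size(m_grid));
for i = 1:length(m_grid)
    E(i) = sum((y - m_grid(i)*k - c).^2) / (2*sigma_e^2);   % E(m) from this slide
end
[~, idx] = min(E);
m_hat_map = m_grid(idx);               % the m giving the smallest E(m)
```

Note that the expensive part is exactly the likelihood evaluation inside the loop, once per grid point.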

18 Using a Gaussian Prior for m
Now let's say we already roughly know what m is, up to some variance σ_m²: Maximum A-Posteriori (MAP) Estimation.

    p(m | y, θ) ∝ p(y | m, θ) p(m | θ)
                = p(y | m, θ) (1/√(2πσ_m²)) exp( −(m − m̄)² / (2σ_m²) )
                ∝ exp( −Σ_k (y_k − m k − c)² / (2σ_e²) ) exp( −(m − m̄)² / (2σ_m²) )
                ∝ exp( −[ Σ_k (y_k − m k − c)² / (2σ_e²) + (m − m̄)² / (2σ_m²) ] )

Choosing m by maximising p(m | y, θ) implies choosing m to minimise the following expression (dropping the common factor 1/(2σ_e²σ_m²)):

    E(m) = σ_m² Σ_k (y_k − m k − c)² + σ_e² (m − m̄)²

19 Using a Gaussian Prior for m
Differentiating and solving...

    E(m) = σ_m² Σ_k (y_k − m k − c)² + σ_e² (m − m̄)²

    ∂E(m)/∂m = −Σ_k 2 σ_m² (y_k − m̂ k − c) k + 2 σ_e² (m̂ − m̄) = 0

    ⇒  m̂ [ σ_m² Σ_k k² + σ_e² ] = σ_m² Σ_k k (y_k − c) + σ_e² m̄

    ⇒  m̂ = [ σ_m² Σ_k k (y_k − c) + σ_e² m̄ ] / [ σ_m² Σ_k k² + σ_e² ]
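A Matlab sketch of this closed-form MAP estimate; the prior mean m̄ and standard deviation σ_m used below are illustrative assumptions.

```matlab
% Sketch: closed-form MAP estimate of m with a Gaussian prior N(m_bar, sigma_m^2).
N = 100; k = (1:N)'; c = 3; sigma_e = 10;
y = tan(40*pi/180)*k + c + sigma_e*randn(N,1);   % simulated data as before
m_bar = 0.8; sigma_m = 0.2;                      % ASSUMED prior mean and std

m_hat_map = (sigma_m^2*sum(k.*(y - c)) + sigma_e^2*m_bar) / ...
            (sigma_m^2*sum(k.^2)       + sigma_e^2);
```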

20 Some experiments
- Matlab code available.
- The experiment uses c = 3, m = tan(40°), σ_e = 10.0.
- Can set what you like for the priors and see the effect of the priors on the solution.

Exercises:
- Derive an expression for p(c | m, θ) using a Gaussian prior on c with mean c̄ and variance σ_c². Derive the closed form solution for the estimate of c given the other parameters by maximising the p(c | m, θ) that you have derived.
- Write Matlab code that simulates a corrupted signal and calculates the estimate for c using c = 3, m = tan(20°), σ_e = 10.0, σ_c = 3, c̄ = 3.
- Generate many realisations of your estimate ĉ by using many realisations of the noise e_k and solving for the parameter c. Hence generate a histogram of your estimates and show how close that distribution is to the p(c | m, θ) that you have derived.
- Using a Laplacian prior on c, i.e. p(c) = (1/Z) exp(−k|c|), derive an expression for p(c | m, θ).
- The noise in the model is now Laplacian, with p(e_n) = (1/Z) exp(−k|e_n|); derive an expression for p(c | m, θ) using the same prior as above.
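One possible Matlab sketch of the simulation part of these exercises is below. The closed-form estimate of c used inside the loop is written by analogy with the MAP estimate of m above, not from the derivation the exercise asks for, and the number of samples N is an assumption.

```matlab
% Sketch: many noise realisations, estimate c each time with a Gaussian prior
% N(c_bar, sigma_c^2), then histogram the estimates.
N = 100; n = (1:N)';                  % ASSUMED number of samples
m = tan(20*pi/180); sigma_e = 10; c_true = 3;
c_bar = 3; sigma_c = 3;               % prior on c, from the exercise

R = 5000;                             % number of noise realisations
c_hat = zeros(R,1);
for r = 1:R
    y = m*n + c_true + sigma_e*randn(N,1);
    % MAP estimate of c (Gaussian prior), written by analogy with the estimate of m:
    c_hat(r) = (sigma_c^2*sum(y - m*n) + sigma_e^2*c_bar) / ...
               (sigma_c^2*N            + sigma_e^2);
end
hist(c_hat, 50);                      % histogram of the estimates
xlabel('c estimate'); ylabel('count');
```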

21 Final Comments
- Bayesian inference is only really useful if you are using a non-uniform prior.
- The success of your solution always depends on your models and priors. Bayes cannot help you if you make bad choices.
- If you are making only Gaussian choices for your distributions then you might just be doing the same thing as a straightforward least squares / Lagrange multiplier approach.
- In pictures, a Laplacian prior for the noise (like DFDs and wavelet coefficients) is almost always better, but inevitably very difficult to manipulate.
- We need to look at marginalisation, sampling, MCMC and priors suitable for 2D next.
