Bayesian Inference (A Rough Guide)
1 Bayesian Inference (A Rough Guide)
Anil C. Kokaram, anil.kokaram@tcd.ie, Electrical and Electronic Engineering Dept., University of Dublin, Trinity College. See for more information.
2 Bayesian Inference
- The best way to think about problem solving (my opinion).
- Well established in Signal Processing since the late 1980s.
- Grew with the rapid increase in the computational power of computers: enabled diabolical Monte Carlo techniques to become practical.
- Well known in Statistics (see Rev. Bayes) and in fact just another rule of probability.
- "Bayesian Inference" is sometimes used as a euphemism for "using probability to solve your problem".
- Important texts: [1] Numerical Bayesian Inference, J. Ó Ruanaidh; [2] Numerical Recipes; [3] Image Analysis, Random Fields... by Wilson, which is extremely good.
3 Probability and Marbles
[Figure: box A holds blue and green marbles (40 blue, 60 green); box B holds red and yellow marbles, and its mix depends on the colour drawn from A.]
Consider two random variables A, B, which each have two outcomes or realisations: (blue/green) and (red/yellow) respectively.
Outcomes are generated by first selecting a marble from box A, then selecting a marble from box B. But box B depends on what colour you selected from box A.
4 Probability and Marbles
[Figure: box A with 40 blue and 60 green marbles; box B with red and yellow marbles, 10 red out of 100 when a blue was drawn from A.]
A probability is just a number. A probability density function expresses the probability of all outcomes of a random variable and must obey the following equation: ∫ p(a) da = 1.
In our example: ∫ p(a) da = p(a = blue) + p(a = green) = 0.4 + 0.6 = 1.0.
What is p(b = red | a = blue)? The probability of realising or observing a red marble from r.v. B GIVEN that a blue marble was drawn from the probability distribution for r.v. A (i.e. a blue marble was observed or realised as the outcome from r.v. A).
p(b = red | a = blue) is the CONDITIONAL probability of observing a red marble from box B GIVEN a particular outcome from A. It is the probability of B conditioned on A.
p(b = red, a = blue) is the JOINT probability of observing a red marble from B AND a blue marble from A.
p(b, a) is the JOINT probability distribution of A AND B.
5 Probability and Marbles
[Figure: a table of simulated outcome pairs (A, B) for the two random variables.]
We can measure p(b = red | a = blue), or p(b = red, a = blue), by simulating the system (with Matlab say) and observing the frequencies of the various outcomes.
In this example of a set of simulated outcomes, p(b = red | a = blue) = 3/5 and p(b = red, a = blue) = 3/10.
We would need lots of example outcomes to be sure of our measurements. But we can calculate these values using the laws of probability.
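The frequency-counting idea is easy to try. Below is a minimal Matlab simulation sketch: p(a = blue) = 0.4 and p(b = red | a = blue) = 0.1 come from the model on the next slides, but p(b = red | a = green) = 0.5 is an assumed value purely for illustration, since the slides do not give it.

```matlab
% Minimal simulation sketch. p(a=blue)=0.4 and p(b=red|a=blue)=0.1 are
% taken from the slides; p(b=red|a=green)=0.5 is an assumed value.
N = 1e6;
a_blue = rand(N,1) < 0.4;                  % draw from box A
p_red  = 0.5*ones(N,1);                    % assumed p(b=red|a=green)
p_red(a_blue) = 0.1;                       % box B depends on the outcome of A
b_red  = rand(N,1) < p_red;                % draw from box B

joint = mean(a_blue & b_red)               % -> approx 0.04 = p(b=red, a=blue)
cond  = sum(a_blue & b_red) / sum(a_blue)  % -> approx 0.1  = p(b=red | a=blue)
```

With a million trials the frequencies settle close to the calculated values; with only ten trials (as on the slide) they can be badly off, which is exactly the point made above.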
6 Probability and Marbles
[Figure: box A with 40 blue and 60 green marbles; box B (after a blue draw) with 10 red marbles out of 100.]
p(b = red | a = blue) = 0.1, just by reading it off from our model.
p(b = red, a = blue) = p(b = red | a = blue) p(a = blue) = 0.1 × 0.4 = 0.04
p(b = red | a = blue) = p(b = red, a = blue) / p(a = blue) = 0.04 / 0.4 = 0.1
This is an important equation in conditional probability:
p(b | a) = p(b, a) / p(a) = p(a, b) / p(a)
p(a | b) = p(a, b) / p(b) = p(b, a) / p(b)
7 Probability and Bayes Theorem
[Figure: box A with 40 blue and 60 green marbles; box B with red and yellow marbles, 10 red out of 100 after a blue draw.]
Bayes theorem turns out to be just another law of conditional probability.
What is p(a = blue | b = red)? Hmm... we can't just read that off easily now, eh? It's sort of upside down...
Given that you have observed that B = red, what is the probability that a blue marble was drawn from A?
This is the kind of thing that you end up having to answer a lot in signal processing. Given that the corrupted speech signal at this point is 0.6 volts, what is the probability that the actual signal is 1.0 volts?
8 Probability and Bayes Theorem
[Figure: the two-box marble model again.]
p(a = blue | b = red) = p(b = red, a = blue) / p(b = red)
                      = p(b = red | a = blue) p(a = blue) / p(b = red)
We can find p(b = red | a = blue) easily now:
                      = 0.1 × 0.4 / p(b = red)
This is Bayes Theorem:
p(b | a) = p(a | b) p(b) / p(a)    (1)
It turns the potentially tricky problem of estimating p(b | a) into the hopefully easier problem of estimating p(a | b).
9 Probability and Bayes Theorem
This is Bayes Theorem:
p(b | a) = p(a | b) p(b) / p(a)    (2)
p(a | b) is the Likelihood. p(b) is the Prior. p(a) is the normalising factor, sometimes called the Evidence. p(b | a) is the Posterior distribution, because you are asking questions after the fact.
10 Probability and Bayes Theorem
[Figure: the two-box marble model again.]
p(a = blue | b = red) = p(b = red | a = blue) p(a = blue) / p(b = red)
So we need the denominator:
p(b = red) = ∫ p(b = red, a) da
           = p(b = red, a = blue) + p(b = red, a = green)
           = p(b = red | a = blue) p(a = blue) + p(b = red | a = green) p(a = green)
This is marginalisation, or integrating out one of the variables from a joint distribution:
p(b) = ∫ p(b, a) da
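As a quick sanity check, the whole calculation fits in a few lines of Matlab. The value p(b = red | a = green) = 0.5 is again an assumption, since the slides elide it.

```matlab
% Bayes' theorem on the marble model. p(b=red|a=green)=0.5 is assumed.
p_a   = [0.4 0.6];               % p(a = [blue, green])
p_r_a = [0.1 0.5];               % p(b=red | a = [blue, green]); green value assumed
p_red = sum(p_r_a .* p_a);       % marginalisation: p(b=red)
post  = p_r_a(1)*p_a(1) / p_red  % p(a=blue | b=red) by Bayes' theorem
```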
11 Line Fitting
Want to expose all the details using a very simple example.
Given N (time) samples of observed data $y_n$.
Observation model: $y_n = mn + c + e_n$, where $e_n \sim N(0, \sigma_e^2)$, i.e. $e_n$ is a normally distributed Gaussian variable with zero mean and variance $\sigma_e^2$.
Want to estimate m assuming we know c, $\sigma_e^2$, and given the observations $y_n$ as well as the model above.
Remember we can assemble the data into vector form: $\mathbf{y} = m\mathbf{x} + c + \mathbf{e}$, where $\mathbf{x} = [1\ 2\ 3\ \ldots\ N]^T$.
12 Line Fitting: Bayes vs LS
Bayes: Big Picture. Choose m to maximise p(m | y, θ), where θ = [c, σ_e²]:
$$p(m | y, \theta) = \frac{p(y | m, \theta)\,p(m | \theta)}{p(y | \theta)} \propto p(y | m, \theta)\,p(m | \theta)$$
Least Squares: Big Picture (for m, c). Minimise ‖y − mx − c‖² w.r.t. m, c. Differentiate, set to 0:
$$\begin{bmatrix} \sum_k k^2 & \sum_k k \\ \sum_k k & \sum_k 1 \end{bmatrix} \begin{bmatrix} m \\ c \end{bmatrix} = \begin{bmatrix} \sum_k k y_k \\ \sum_k y_k \end{bmatrix}$$
Gives you m, c in one shot. Doesn't give you σ_e² unless you measure VAR(y − m̂x − ĉ).
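As a concrete illustration, here is a minimal sketch of the one-shot least squares solve via the normal equations above; the values of m_true, c_true, sigma_e and the sample count are arbitrary demo choices.

```matlab
% One-shot least squares fit of y_k = m*k + c + e_k via the normal equations.
N = 100; m_true = 0.35; c_true = 3; sigma_e = 10;   % arbitrary demo values
k = (1:N)';
y = m_true*k + c_true + sigma_e*randn(N,1);

A  = [sum(k.^2) sum(k); sum(k) N];
b  = [sum(k.*y); sum(y)];
mc = A \ b;                              % [m; c] in one shot
sigma2_est = var(y - mc(1)*k - mc(2))    % noise variance measured after the fact
```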
13 Likelihood
$$p(m | y, \theta) \propto p(y | m, \theta)\,p(m | \theta), \qquad y_n = mn + c + e_n$$
The Likelihood typically arises directly from the observation model. The likelihood connects our parameters to the observations. Knowing m, θ, it is $e_n$ that determines $y_n$, so
$$p(y | m, \theta) = p(e) = p(e_1, e_2, e_3, e_4, e_5, \ldots)$$
But the noise at each sample is independent of all the other samples, so
$$= p(e_1)\,p(e_2)\,p(e_3)\,p(e_4)\ldots = \prod_k p(e_k) = \prod_k \frac{1}{\sqrt{2\pi\sigma_e^2}} \exp\left(-\frac{(y_k - mk - c)^2}{2\sigma_e^2}\right) = \frac{1}{(2\pi\sigma_e^2)^{N/2}} \exp\left(-\frac{\sum_k (y_k - mk - c)^2}{2\sigma_e^2}\right)$$
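In code the likelihood is best handled in the log domain, since the product of N small exponentials underflows quickly. A minimal sketch of the log-likelihood derived above:

```matlab
% Log of p(y|m,theta) for the line model; k is the sample index vector.
loglik = @(y, k, m, c, s2e) ...
    -numel(y)/2 * log(2*pi*s2e) - sum((y - m*k - c).^2) / (2*s2e);
```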
14 Prior for m
The Prior reflects what we know about our parameters before we observe anything. It captures our life-knowledge or intuition or bias about what we feel the answer should be. It is a powerful idea because Bayes theorem is a recipe that shows us how to incorporate this information quantitatively into problem solving.
So there are many choices for the prior, and they all depend on what you know about your problem (a sketch of each choice follows below).
- If you know nothing (and you never know nothing) then you may choose a UNIFORM prior, i.e. p(m) = α say.
- If you feel m is between 0.1 and 0.4 then you may choose a box prior $p(m) = \frac{1}{0.3}\left(u(m - 0.1) - u(m - 0.4)\right)$.
- Maybe you know m should be near some value $\bar{m}$, so $p(m) = \frac{1}{\sqrt{2\pi\sigma_m^2}} \exp\left(-\frac{(m - \bar{m})^2}{2\sigma_m^2}\right)$.
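A sketch of the three choices as Matlab function handles; the values of mbar and sigma_m are placeholders standing in for whatever your prior knowledge suggests.

```matlab
% Three candidate priors for m. mbar and sigma_m are assumed values.
p_unif  = @(m) ones(size(m));                    % uniform (alpha = 1)
p_box   = @(m) (1/0.3) * (m >= 0.1 & m <= 0.4);  % box on [0.1, 0.4]
mbar = 0.25; sigma_m = 0.05;                     % assumed prior knowledge
p_gauss = @(m) exp(-(m - mbar).^2 / (2*sigma_m^2)) / sqrt(2*pi*sigma_m^2);
```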
15 Using a uniform prior for m
Let's stupidly say we know nothing: then we are just doing Maximum Likelihood (ML) estimation.
$$p(m | y, \theta) \propto p(y | m, \theta)\,p(m | \theta) = p(y | m, \theta)\,\alpha \propto \frac{1}{(2\pi\sigma_e^2)^{N/2}} \exp\left(-\frac{\sum_k (y_k - mk - c)^2}{2\sigma_e^2}\right) \propto \exp\left(-\frac{\sum_k (y_k - mk - c)^2}{2\sigma_e^2}\right)$$
16 Choosing m by maximizing p(m | y, θ) implies choosing m to minimise the following expression:
$$E(m) = \frac{\sum_k (y_k - mk - c)^2}{2\sigma_e^2}$$
$$\frac{\partial E(m)}{\partial m} = -\sum_k 2(y_k - \hat{m}k - c)\,k = 0$$
$$\hat{m} \sum_k k^2 = \sum_k k(y_k - c) \quad \Rightarrow \quad \hat{m} = \frac{\sum_k k(y_k - c)}{\sum_k k^2}$$
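The closed-form ML estimate is one line once the data are simulated; the sketch below just reuses the experiment values from the later slide (c = 3, m = tan(40°), σ_e = 10) and an assumed N = 100.

```matlab
% ML estimate of the slope, with c and sigma_e known.
N = 100; c = 3; m_true = tan(40*pi/180); sigma_e = 10;
k = (1:N)';
y = m_true*k + c + sigma_e*randn(N,1);   % simulate the observations
m_hat = sum(k .* (y - c)) / sum(k.^2)    % closed-form ML estimate
```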
17 Using a pulse prior for m
Now let's say we know the range of values for m, i.e. a box prior... Maximum A Posteriori (MAP) estimation:
$$p(m | y, \theta) \propto p(y | m, \theta)\,p(m | \theta) = p(y | m, \theta)\,f(m) \propto \exp\left(-\frac{\sum_k (y_k - mk - c)^2}{2\sigma_e^2}\right) \quad \text{for } m \in [0.1, 0.4]$$
Choosing m by maximizing p(m | y, θ) implies choosing m to minimise the following expression:
$$E(m) = \frac{\sum_k (y_k - mk - c)^2}{2\sigma_e^2} \quad \text{for } m \in [0.1, 0.4]$$
Tricky to impose that constraint in a closed form solution. So instead how about just exhaustive search? Choose m = 0.1 : 0.001 : 0.4 say, and pick the m that gives the smallest E(m).
Note how computationally expensive it is to evaluate the likelihood. This is because you typically have a lot of data, and hence this is usually the main drain in solutions of this kind.
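A sketch of the exhaustive search. The true slope here is tan(20°) ≈ 0.36 rather than tan(40°), an assumed choice so that it actually lies inside the box prior's support; everything else follows the slide.

```matlab
% MAP by exhaustive search under the box prior on [0.1, 0.4].
N = 100; c = 3; m_true = tan(20*pi/180); sigma_e = 10;
k = (1:N)';
y = m_true*k + c + sigma_e*randn(N,1);

m_grid = 0.1:0.001:0.4;                  % the box prior's support
E = zeros(size(m_grid));
for i = 1:numel(m_grid)                  % likelihood evaluated per grid point:
    E(i) = sum((y - m_grid(i)*k - c).^2) / (2*sigma_e^2);   % the expensive bit
end
[~, imin] = min(E);                      % smallest E(m) <=> largest posterior
m_map = m_grid(imin)
```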
18 Using a Gaussian Prior for m
Now let's say we already roughly know what m is, up to some variance σ_m²: Maximum A Posteriori (MAP) estimation.
$$p(m | y, \theta) \propto p(y | m, \theta)\,p(m | \theta) = p(y | m, \theta)\,\frac{1}{\sqrt{2\pi\sigma_m^2}}\exp\left(-\frac{(m - \bar{m})^2}{2\sigma_m^2}\right)$$
$$\propto \exp\left(-\frac{\sum_k (y_k - mk - c)^2}{2\sigma_e^2}\right) \exp\left(-\frac{(m - \bar{m})^2}{2\sigma_m^2}\right) = \exp\left(-\frac{\sum_k (y_k - mk - c)^2}{2\sigma_e^2} - \frac{(m - \bar{m})^2}{2\sigma_m^2}\right)$$
Choosing m by maximizing p(m | y, θ) implies choosing m to minimise the following expression:
$$E(m) = \sigma_m^2 \sum_k (y_k - mk - c)^2 + \sigma_e^2 (m - \bar{m})^2$$
19 Using a Gaussian Prior for m
Differentiating and solving...
$$E(m) = \sigma_m^2 \sum_k (y_k - mk - c)^2 + \sigma_e^2 (m - \bar{m})^2$$
$$\frac{\partial E(m)}{\partial m} = -\sum_k 2\sigma_m^2 (y_k - \hat{m}k - c)\,k + 2\sigma_e^2(\hat{m} - \bar{m}) = 0$$
$$\hat{m}\left[\sigma_m^2 \sum_k k^2 + \sigma_e^2\right] = \sigma_m^2 \sum_k k(y_k - c) + \sigma_e^2 \bar{m}$$
$$\hat{m} = \frac{\sigma_m^2 \sum_k k(y_k - c) + \sigma_e^2 \bar{m}}{\sigma_m^2 \sum_k k^2 + \sigma_e^2}$$
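The closed-form MAP estimate in Matlab; mbar and sigma_m are assumed prior values, and you can check that letting σ_m grow large recovers the ML answer from slide 16.

```matlab
% Closed-form MAP estimate of m under the Gaussian prior N(mbar, sigma_m^2).
N = 100; c = 3; m_true = tan(40*pi/180); sigma_e = 10;
mbar = 0.8; sigma_m = 0.1;               % assumed prior knowledge
k = (1:N)';
y = m_true*k + c + sigma_e*randn(N,1);
m_map = (sigma_m^2 * sum(k .* (y - c)) + sigma_e^2 * mbar) ...
      / (sigma_m^2 * sum(k.^2)         + sigma_e^2)
```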
20 Some experiments
Matlab code available. The experiment uses c = 3, m = tan(40°), σ_e = 10.0. You can set what you like for the priors and see the effect of the priors on the solution.
Exercises:
- Derive an expression for p(c | y, m, θ) using a Gaussian prior on c with variance σ_c² and mean c̄. Derive the closed form solution for the estimate of c given the other parameters by maximizing the p(c | y, m, θ) that you have derived.
- Write Matlab code that simulates a corrupted signal and calculates the estimate for c using c = 3, m = tan(20°), σ_e = 10.0, σ_c = 3, c̄ = 3.
- Generate many realisations of your estimate ĉ by using many realisations of the noise e_k and solving for the parameter c. Hence generate a histogram of your estimates and show how close that distribution is to the p(c | y, m, θ) that you have derived.
- Using a Laplacian prior on c, i.e. p(c) = (1/Z) exp(−k|c|), derive an expression for p(c | y, m, θ).
- The noise in the model is now Laplacian, with p(e_n) = (1/Z) exp(−k|e_n|); derive an expression for p(c | y, m, θ) using the same prior as above.
(A sketch of the simulation exercise follows below.)
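A sketch of the second and third exercises. The closed-form estimate of c used inside the loop is my own derivation by analogy with the estimate of m on slide 19, so treat it as a hypothesis to check against your own working: ĉ = (σ_c² Σ_k (y_k − mk) + σ_e² c̄) / (N σ_c² + σ_e²).

```matlab
% Simulation exercise: histogram of the MAP estimate of c over many
% noise realisations. The c_hat formula is derived by analogy with
% slide 19 and should be checked against your own derivation.
N = 100; c = 3; m = tan(20*pi/180); sigma_e = 10; sigma_c = 3; cbar = 3;
k = (1:N)';
R = 5000;                                % number of noise realisations
c_hat = zeros(R,1);
for r = 1:R
    y = m*k + c + sigma_e*randn(N,1);    % one corrupted signal
    c_hat(r) = (sigma_c^2*sum(y - m*k) + sigma_e^2*cbar) ...
             / (N*sigma_c^2 + sigma_e^2);
end
hist(c_hat, 50)                          % compare with the derived p(c | y, m, theta)
```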
21 Final Comments
- Bayesian inference is only really useful if you are using a non-uniform prior.
- The success of your solution always depends on your models and priors. Bayes cannot help you if you make bad choices.
- If you are making only Gaussian choices for your distributions then you might just be doing the same thing as a straightforward least squares/Lagrange multiplier approach.
- In pictures, a Laplacian prior for the noise (like DFDs and wavelet coefficients) is almost always better, but inevitably very difficult to manipulate.
- We need to look at marginalisation, sampling, MCMC and priors suitable for 2D next.
Markov Random Fields (A Rough Guide)
1 Markov Random Fields (A Rough Guide)
Anil C. Kokaram, anil.kokaram@tcd.ie, Electrical and Electronic Engineering Dept., University of Dublin, Trinity College.