Bayesian Inference. Chapter 2: Conjugate models


1 Bayesian Inference. Chapter 2: Conjugate models

Conchi Ausín and Mike Wiper, Department of Statistics, Universidad Carlos III de Madrid. Master in Business Administration and Quantitative Methods / Master in Mathematical Engineering.

2 Objective

In this class we study the situations in which Bayesian statistics is easy!

3 Conjugate models

Yesterday we looked at a coin tossing example. We found that a particular beta prior distribution led to a beta posterior. This is an example of a conjugate family of prior distributions.

4 Coin tossing problems

In coin tossing problems, the likelihood function has the form

f(x | θ) = c θ^x (1 − θ)^(n−x),

where x is the number of observed heads, n is the number of observed tosses and c is a constant determined by the experimental design. Therefore, it is clear that a beta prior,

f(θ) = (1 / B(a, b)) θ^(a−1) (1 − θ)^(b−1),

implies that the posterior is also beta:

f(θ | x) ∝ θ^(a+x−1) (1 − θ)^(b+n−x−1),

so that θ | x ~ Beta(a + x, b + n − x).
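To make this concrete, here is a minimal Python sketch (an editorial addition, not from the original slides) of the conjugate update, using an illustrative Beta(5, 5) prior and the running coin example's data of x = 9 heads in n = 12 tosses:

```python
from scipy import stats

a, b = 5, 5    # illustrative beta prior parameters
x, n = 9, 12   # observed heads and tosses (the running coin example)

# Conjugate update: Beta(a, b) prior -> Beta(a + x, b + n - x) posterior.
posterior = stats.beta(a + x, b + n - x)
print(posterior.mean())            # posterior mean of θ
print(posterior.interval(0.95))    # central 95% credible interval
```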

5 Advantages of conjugate priors I: simplicity of calculation

Using a beta prior in this context has a number of advantages. Given that we know the properties of the beta distribution, prior to posterior inference is equivalent to a change of parameter values. Prediction is also straightforward in the same way. If θ ~ Beta(a, b) and X | θ ~ Binomial(n, θ), then

P(X = x) = (n choose x) B(a + x, b + n − x) / B(a, b)

for x = 0, ..., n.
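A short sketch of this prior predictive (beta-binomial) probability; the parameter values are illustrative only:

```python
from scipy.special import beta as B, comb

def beta_binomial_pmf(x, n, a, b):
    """Prior predictive P(X = x) when X | θ ~ Binomial(n, θ), θ ~ Beta(a, b)."""
    return comb(n, x) * B(a + x, b + n - x) / B(a, b)

# Sanity check: the probabilities sum to 1 (illustrative a = b = 5, n = 12).
print(sum(beta_binomial_pmf(x, 12, 5, 5) for x in range(13)))
```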

6 Advantages of conjugate priors II: interpretability

We can see that a (respectively b) in the prior plays the same role as x (respectively n − x). Therefore we can think of the information represented by the prior as equivalent to the information in n tosses of the coin with x heads and n − x tails. This gives one way of thinking about how to elicit sensible values for a and b: to how many tosses of a coin, and how many heads, does my prior information equate? A problem is that people are often overconfident.

7 Prior elicitation

The previous method is a little artificial. If we are asking a real expert to provide information, it is better to ask questions about observable quantities. For example: what would be the average number of heads to occur in 100 tosses of the coin? What about the standard deviation? Then, assuming a beta prior, we can solve

µ = 100 a / (a + b),

σ = 100 √( ab / ((a + b)² (a + b + 1)) ).

Many people don't understand means and standard deviations, so it could be even better to ask about modes or medians or quartiles.
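These two equations invert in closed form. The sketch below assumes the expert's mean and standard deviation refer to 100·θ (i.e., it ignores the extra binomial sampling noise in the number of heads); the example numbers are hypothetical:

```python
def elicit_beta(mean_heads, sd_heads, n=100):
    """Solve mean = n*a/(a+b) and sd = n*sqrt(ab/((a+b)^2 (a+b+1))) for (a, b)."""
    m = mean_heads / n              # implied E[θ]
    v = (sd_heads / n) ** 2         # implied Var(θ)
    s = m * (1 - m) / v - 1         # implied prior "sample size" a + b
    return m * s, (1 - m) * s

# A hypothetical expert expects 60 heads in 100 tosses, give or take 10:
print(elicit_beta(60, 10))          # roughly (13.8, 9.2)
```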

8 Haldane's prior

Recalling the role of a and b also gives a reasonable way of defining a default, non-informative prior by letting a, b → 0. In this case we have the prior distribution

f(θ) ∝ 1 / (θ(1 − θ)) for 0 < θ < 1,

and the posterior is θ | x ~ Beta(x, n − x), with mean E[θ | x] = x/n = θ̂, the MLE. This prior is improper! Should we care? What if we only observe a sample of heads (or tails)? Then the posterior would be improper too! This is a big problem in modern Bayesian statistics.

9 Other ways of choosing a default objective prior

Given the Principle of Insufficient Reason we saw yesterday, a uniform prior seems a natural choice. However, if we know nothing about θ, shouldn't we also know nothing about, for example, ϑ = log(θ / (1 − θ))? If θ ~ Uniform(0, 1), then the laws of probability imply that the density of ϑ is

f(ϑ) = e^ϑ / (1 + e^ϑ)²,

which is clearly not uniform. Uniform priors are sensible as default options for discrete variables, but here it is not so clear.

10 Jeffreys prior

Let X | θ ~ f(· | θ). Then the Jeffreys prior is

f(θ) ∝ √I(θ), where I(θ) = −E_X[ d²/dθ² log f(X | θ) ]

is the expected Fisher information.

Let X | θ ~ Binomial(n, θ). Then the Jeffreys prior is θ ~ Beta(1/2, 1/2).

Let X | θ ~ Negative Binomial(r, θ). The Jeffreys prior is f(θ) ∝ 1 / (θ (1 − θ)^(1/2)).

The prior depends on the experimental design. This doesn't comply with the stopping rule principle! There is no truly objective prior!
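As a check on the binomial case, a small sympy sketch (an editorial addition) that derives the expected Fisher information and hence the Jeffreys prior kernel:

```python
import sympy as sp

theta, n, x = sp.symbols('theta n x', positive=True)
loglik = x * sp.log(theta) + (n - x) * sp.log(1 - theta)  # binomial log-likelihood kernel

# Observed information, then take expectations using E[X] = n*theta:
info = -sp.diff(loglik, theta, 2)
expected_info = sp.simplify(info.subs(x, n * theta))
print(expected_info)              # n/(theta*(1 - theta)), up to algebraic form
print(sp.sqrt(expected_info))     # Jeffreys kernel ∝ θ^(-1/2) (1 - θ)^(-1/2)
```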

11 Example

The following plot gives the posterior densities of θ for our coin tossing example given the Haldane (blue), uniform (green), Jeffreys I (red) and Jeffreys II (brown) priors. [Figure: the four posterior densities f(θ | x).] Posterior means for θ are 0.75, 0.714 and 0.72 respectively. In small samples the prior can make a (small) difference...

12 Example

[Figure: posterior densities f(θ | x).] ...but in our Chinese babies example it is impossible to differentiate between the posteriors, and the posterior means all coincide to 4 decimal places.

13 Advantages of conjugate priors III: mixtures are still conjugate

A single beta prior might not represent prior beliefs well. A mixture of k (sufficiently many) betas can, and the posterior is then still a mixture of k betas. Suppose we set f(θ) = 0.5 Beta(5, 5) + 0.5 Beta(8, 1) in the coin tossing problem of yesterday. Then, given the observed data (9 heads and 3 tails), we have

f(θ | x) ∝ θ^9 (1 − θ)^3 [ (0.5 / B(5, 5)) θ^(5−1) (1 − θ)^(5−1) + (0.5 / B(8, 1)) θ^(8−1) ]

∝ (1 / B(5, 5)) θ^(14−1) (1 − θ)^(8−1) + (1 / B(8, 1)) θ^(17−1) (1 − θ)^(4−1)

= w Beta(14, 8) + (1 − w) Beta(17, 4),

where w = (B(14, 8)/B(5, 5)) / (B(14, 8)/B(5, 5) + B(17, 4)/B(8, 1)).
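Numerically, each component updates conjugately and the mixture weights are re-weighted by the components' marginal likelihoods; a minimal sketch with the numbers above:

```python
from scipy import stats
from scipy.special import beta as B

components = [(5, 5), (8, 1)]          # prior mixture components
weights = [0.5, 0.5]                   # prior mixture weights
x, tails = 9, 3                        # observed heads and tails

posts = [(a + x, b + tails) for a, b in components]
w = [p * B(a + x, b + tails) / B(a, b) for p, (a, b) in zip(weights, components)]
total = sum(w)
w = [wi / total for wi in w]           # normalized posterior weights
print(w)
print(sum(wi * stats.beta(a, b).mean() for wi, (a, b) in zip(w, posts)))
```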

14 Example

[Figure: the prior (black), scaled likelihood (blue) and posterior (red) densities of θ.]

15 When do conjugate priors exist?

Conjugate priors are associated with exponential family distributions:

f(x | θ) = C(x) D(θ) exp( E(x)ᵀ F(θ) ).

A conjugate prior is then

f(θ) ∝ D(θ)^a exp( bᵀ F(θ) ).

Given a sample of size n,

f(θ | x) ∝ D(θ)^(a+n) exp( (b + n Ē)ᵀ F(θ) ),

where Ē = (1/n) Σᵢ E(xᵢ) is the vector of sufficient statistics. Letting a, b → 0 gives a natural, objective prior.

16 Rare events models

Consider models associated with rare events (a Poisson process). The likelihood function takes the form

f(x | θ) = c θ^n e^(−xθ),

where n represents the number of events to have occurred in a time period of length x and c depends on the experimental design. Therefore, a gamma distribution θ ~ Gamma(a, b), that is,

f(θ) = (b^a / Γ(a)) θ^(a−1) e^(−bθ) for 0 < θ < ∞,

is conjugate. The posterior distribution is then θ | x ~ Gamma(a + n, b + x).

17 The information in the prior is easily interpretable: a represents the prior equivalent of the number of rare events to occur in a time period of length b. Letting a, b → 0 gives the natural default prior f(θ) ∝ 1/θ. (This is the Jeffreys prior for exponential data but not for Poisson data.) In this case, given n observed events in time x, the posterior is θ | x ~ Gamma(n, x), with mean n/x, which is equal to the MLE in experiments of this type.
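A minimal sketch of this update; the event count and the observation period below are hypothetical, purely for illustration:

```python
from scipy import stats

n_events, period = 136, 88.0   # hypothetical: n events in a period of length x
a, b = 0.0, 0.0                # a, b -> 0 recovers the default prior f(θ) ∝ 1/θ

# Conjugate update: Gamma(a, b) prior -> Gamma(a + n, b + x) posterior
# (scipy parametrizes the gamma by shape and scale = 1/rate).
posterior = stats.gamma(a + n_events, scale=1.0 / (b + period))
print(posterior.mean())            # equals n/x, the MLE, under the default prior
print(posterior.interval(0.95))    # 95% credible interval for the rate θ
```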

18 Example: Software failure data

The CSIAC database provides data showing the times between 136 successive software failures. The diagram shows a histogram of the data and a classical, plug-in estimator (blue) of the predictive distribution of x, as well as the Bayesian posterior predictive density given a Jeffreys prior (red). The Bayesian and classical predictors are indistinguishable. [Figure: histogram with the two predictive densities.]

19 Example: Inference for a queueing system

The M/M/1 queueing system assumes that arrivals occur according to a Poisson process with rate λ; there is a single server; service occurs on a first-come, first-served basis; and service times are exponential with mean service time 1/µ. The system is stable if ρ = λ/µ < 1. In this case, the equilibrium distribution of the number of people in the system, N, is geometric: N ~ Geometric(1 − ρ). The time spent in the system by an arriving customer is W ~ Exponential(µ − λ).

20 Example

Hall (1991) provides collected inter-arrival and service time data for 98 users of an automatic teller machine in Berkeley, California. We shall assume that the inter-arrival times and the service times both follow exponential distributions. The sufficient statistics were n_a = n_s = 98, together with the total observed inter-arrival time x_a and the total observed service time x_s = 81.35 minutes. Given default priors for λ and µ, the posterior distributions are

λ | x ~ Gamma(98, x_a), µ | x ~ Gamma(98, 81.35).

21 It is easy to calculate the posterior probability that the system is stable, remembering that the ratio of two χ² random variables, each divided by its degrees of freedom, is F distributed. Since 2 x_a λ | x ~ χ²(196) and 2 x_s µ | x ~ χ²(196), we have

P(ρ < 1 | x) = P(λ/µ < 1 | x) = P(F(196, 196) < x_a/x_s),

which turns out to be very high. Given this is so high, it makes sense to consider the equilibrium distributions.
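A sketch of this stability calculation. The total inter-arrival time x_a below is a made-up stand-in (the actual value did not survive transcription); x_s = 81.35 is from the slide:

```python
from scipy import stats

na = ns = 98
xa = 142.0      # hypothetical total inter-arrival time (assumption)
xs = 81.35      # total observed service time, from the slide

# ρ = λ/µ = (xs/xa) * F(2*na, 2*ns), so P(ρ < 1 | x) = P(F < xa/xs):
print(stats.f(2 * na, 2 * ns).cdf(xa / xs))
```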

22 [Figure: estimated equilibrium distributions, p_n for the number of people in the system and F_w for the c.d.f. of the time spent in the system.]

23 Normal models

Consider a sample from a normal distribution, X | µ, σ ~ Normal(µ, σ²). The likelihood function is

f(x | µ, σ) ∝ σ^(−n) exp( −(1/(2σ²)) [ (n − 1)s² + n(x̄ − µ)² ] ).

Rewrite in terms of the precision, τ = 1/σ². Then

f(x | µ, τ) ∝ τ^(n/2) exp( −(τ/2) [ (n − 1)s² + n(x̄ − µ)² ] ).

Define f(µ, τ) = f(τ) f(µ | τ) and assume τ ~ Gamma(a/2, b/2) and µ | τ ~ Normal(m, 1/(cτ)). The marginal distribution of µ is a (scaled, shifted) Student's t.

24 A posteriori, we have

µ | τ, x ~ Normal( (cm + n x̄)/(c + n), 1/((c + n)τ) ),

τ | x ~ Gamma( (a + n)/2, (b + (n − 1)s² + (cn/(c + n))(m − x̄)²)/2 ).

The conditional posterior precision of µ is the sum of the prior precision (cτ) and the precision of the MLE (nτ), and the posterior mean is a weighted average of the prior mean (m) and the MLE (x̄). A default prior is obtained by letting a, b, c → 0, which implies f(µ, τ) ∝ 1/τ and

µ | τ, x ~ Normal( x̄, 1/(nτ) ),

τ | x ~ Gamma( (n − 1)/2, (n − 1)s²/2 ).

Then

(µ − x̄) / (s/√n) | x ~ Student's t with n − 1 degrees of freedom (boring).
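A compact sketch of the conjugate update formulas on this slide; the data here are simulated stand-ins, not the slides' dataset:

```python
import numpy as np

def normal_gamma_update(x, m, c, a, b):
    """Posterior (m', c', a', b') for µ|τ ~ N(m, 1/(cτ)), τ ~ Gamma(a/2, b/2)."""
    n, xbar, s2 = len(x), np.mean(x), np.var(x, ddof=1)
    m_post = (c * m + n * xbar) / (c + n)
    c_post = c + n
    a_post = a + n
    b_post = b + (n - 1) * s2 + (c * n / (c + n)) * (m - xbar) ** 2
    return m_post, c_post, a_post, b_post

x = np.random.default_rng(0).normal(98.2, 0.7, 130)   # simulated temperatures
print(normal_gamma_update(x, m=98.6, c=1.0, a=0.01, b=0.01))
```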

25 One sample example

The normal core body temperature of a healthy adult is supposed to be 98.6 degrees Fahrenheit (37 degrees Celsius) on average. A normal model for temperatures, say X | µ, τ ~ Normal(µ, 1/τ), has been proposed. Mackowiak et al (1992) measured the core body temperatures of 130 individuals, giving a sample mean temperature x̄ with standard deviation s. Thus, a classical 95% confidence interval for µ is x̄ ± 1.96 s/√130, and the hypothesis that the true mean is equal to 98.6 is rejected.

26 Consider a prior for µ centred on 98.6, for example µ | τ ~ Normal(98.6, 1/τ) with f(τ) ∝ 1/τ. The posterior mean and a 95% credible interval for µ still exclude 98.6, so there still appears to be evidence against the hypothesis. Also, the classical plug-in density for X (blue) and the Bayesian posterior predictive density (red) are almost identical. [Figure: the two predictive densities.]

27 An odd feature of the conjugate prior

Under the conjugate prior, the prior precision of µ is proportional to the model precision, τ. This may be restrictive and unrealistic in practical applications. A more natural prior for µ might be Normal(m, 1/c), independent of τ. Then the joint posterior distribution looks nasty:

f(µ, τ | x) ∝ τ^((a+n)/2 − 1) exp( −(τ/2) [ b + (n − 1)s² + n(x̄ − µ)² ] − (c/2)(µ − m)² ).

What can we do?

28 In our problem, both conditional posterior distributions are available:

µ | τ, x ~ Normal( (cm + nτ x̄)/(c + nτ), 1/(c + nτ) ),

τ | µ, x ~ Gamma( (a + n)/2, (b + (n − 1)s² + n(x̄ − µ)²)/2 ).

Both these distributions are straightforward to sample from. Can we use this to give a Monte Carlo sample from the posterior?

29 Introduction to Gibbs sampling

A Gibbs sampler is a technique for sampling a multivariate distribution when it is straightforward to sample from the conditionals. Assume that we have a distribution f(θ), where θ = (θ_1, ..., θ_k). Let θ_{−i} represent the remaining elements of θ when θ_i is removed, and assume that we can sample from θ_i | θ_{−i}.

30 The Gibbs sampler

The Gibbs sampler proceeds by starting from (arbitrary) initial values and successively sampling the conditional distributions:

1. Set initial values θ^(0) = (θ_1^(0), ..., θ_k^(0)). Set t = 0.
2. For i = 1, ..., k: generate θ_i^(t+1) from θ_i | θ_{−i}, conditioning on the most recently sampled values of the other components.
3. Set t = t + 1 and go to 2.

As t → ∞, the sampled values approach a simple Monte Carlo sample from f(θ).
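Applied to the semi-conjugate normal model of slide 28, the two conditionals give the following sketch of a Gibbs sampler (the data are simulated stand-ins, not the slides' dataset):

```python
import numpy as np

def gibbs_normal(x, m, c, a, b, iters=5000, burn=1000, seed=1):
    """Gibbs sampler for X ~ N(µ, 1/τ) with µ ~ N(m, 1/c) independent of τ."""
    rng = np.random.default_rng(seed)
    n, xbar, s2 = len(x), np.mean(x), np.var(x, ddof=1)
    mu, tau, draws = xbar, 1.0 / s2, []
    for t in range(iters):
        # µ | τ, x ~ Normal((c*m + n*τ*x̄)/(c + n*τ), 1/(c + n*τ))
        prec = c + n * tau
        mu = rng.normal((c * m + n * tau * xbar) / prec, np.sqrt(1.0 / prec))
        # τ | µ, x ~ Gamma((a + n)/2, rate = (b + (n-1)s² + n(x̄ - µ)²)/2)
        rate = (b + (n - 1) * s2 + n * (xbar - mu) ** 2) / 2.0
        tau = rng.gamma((a + n) / 2.0, 1.0 / rate)
        if t >= burn:
            draws.append((mu, tau))
    return np.array(draws)

x = np.random.default_rng(0).normal(98.2, 0.7, 130)   # simulated data
out = gibbs_normal(x, m=98.6, c=1.0, a=0.0, b=0.0)
print(np.percentile(out[:, 0], [2.5, 50, 97.5]))       # posterior summary for µ
```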

31 The example revisited

Consider now that we use the independent priors µ ~ Normal(98.6, 1) and f(τ) ∝ 1/τ. An estimated 95% posterior interval for µ, based on a Gibbs sample, is then very similar to that obtained in the conjugate case. The diagram shows the estimated posterior density (green) and the posterior given the conjugate prior (red); both densities are very similar. [Figure: the two posterior densities of µ.]

32 Two samples: the Behrens-Fisher problem

For most simple one and two sample problems, when the usual default prior for µ, τ is used, posterior means and intervals for µ coincide with their frequentist counterparts. An exception is the following two sample problem. Consider the model

X | µ_1, τ_1 ~ N(µ_1, 1/τ_1), Y | µ_2, τ_2 ~ N(µ_2, 1/τ_2),

with priors f(µ_i, τ_i) ∝ 1/τ_i and independent samples of size n_i for i = 1, 2. Then

(µ_1 − x̄) / (s_1/√n_1) | x ~ Student's t(n_1 − 1),

and similarly for µ_2. Therefore, if δ = µ_1 − µ_2, we have

δ = x̄ − ȳ + (s_1/√n_1) T_1 − (s_2/√n_2) T_2,

where T_1 and T_2 are independent Student's t variables.

33 The distribution of δ is a scaled, shifted difference of two Student's t variables. Quantiles etc. can be calculated to a given precision by, e.g., Monte Carlo. Writing

δ* = (δ − (x̄ − ȳ)) / √( s_1²/n_1 + s_2²/n_2 )

gives δ* = sin(w) T_1 − cos(w) T_2, where

w = tan⁻¹( (s_1/√n_1) / (s_2/√n_2) ),

a Behrens-Fisher distribution. This problem is difficult to solve classically. Usually a t approximation to the sampling distribution is used, but... the quality of the approximation depends on the true variance ratio.
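A minimal Monte Carlo sketch for δ; the two-sample summary statistics below are hypothetical placeholders (the slides' values were lost in transcription):

```python
import numpy as np

def delta_draws(n1, xbar1, s1, n2, xbar2, s2, ndraws=100_000, seed=2):
    """Draws of δ = (x̄ - ȳ) + (s1/√n1) T1 - (s2/√n2) T2, with T_i ~ t(n_i - 1)."""
    rng = np.random.default_rng(seed)
    t1 = rng.standard_t(n1 - 1, ndraws)
    t2 = rng.standard_t(n2 - 1, ndraws)
    return (xbar1 - xbar2) + s1 / np.sqrt(n1) * t1 - s2 / np.sqrt(n2) * t2

# Hypothetical summaries for the two groups:
d = delta_draws(65, 98.10, 0.70, 65, 98.39, 0.74)
print(np.percentile(d, [2.5, 97.5]))   # estimated 95% credible interval for δ
```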

34 Example

Returning to the normal body temperature example, the histograms indicate that there may be a difference between the sexes. [Figure: histograms of temperatures for men and women.] The sample means for men and women differ.

35 An approximate classical 95% confidence interval for the mean difference suggests that the true mean for women is higher than that for men. Using the Bayesian approach as earlier (based on simulated values of δ), we obtain an estimate of the posterior density of δ, and a Bayesian 95% credible interval can be estimated from the same draws. [Figure: estimated posterior density of δ.]

36 Multinomial models

The multinomial distribution is the extension of the binomial distribution to dice throwing problems. Assume a die with k faces, where face i has probability θ_i, is thrown n times, and let X be the k × 1 vector such that X_i is the number of times face i occurs. Then

P(X = x | θ) = (n! / ∏_{i=1}^k x_i!) ∏_{i=1}^k θ_i^{x_i},

where x = (x_1, ..., x_k), x_i ∈ Z⁺ with Σ_{i=1}^k x_i = n, and 0 ≤ θ_i ≤ 1 with Σ_{i=1}^k θ_i = 1. Consider a Dirichlet prior, θ ~ Dirichlet(a), where a = (a_1, ..., a_k) and a_i > 0:

f(θ) = ( Γ(Σ_{i=1}^k a_i) / ∏_{i=1}^k Γ(a_i) ) ∏_{i=1}^k θ_i^{a_i − 1}.

Then θ | x ~ Dirichlet(a + x).

37 Example

After the recent abdication of the King of Spain in favour of his son, 20minutos.es launched a survey asking whether this was the correct decision (X_1 = 3698 votes), whether the King should have waited longer (X_2 = 347), or whether he should have considered other options such as a referendum (X_3 = 2446).¹ Let θ = (θ_1, θ_2, θ_3) and assume a Dirichlet(1/2, 1/2, 1/2) prior. The posterior distribution is Dirichlet(3698.5, 347.5, 2446.5). Consider the difference θ_1 − θ_3, reflecting (?) the difference between Monarchists and Republicans. We have E[θ_1 − θ_3 | x] = (3698.5 − 2446.5)/6492.5 ≈ 0.193. [Figure: posterior density of θ_1 − θ_3.]

¹ Votes at 12:30 on 5th May.
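The posterior summaries follow directly from the Dirichlet posterior above; a short sketch reproducing the expectation and adding a simulated credible interval:

```python
import numpy as np

a_post = np.array([3698.5, 347.5, 2446.5])   # Dirichlet posterior from the survey

# E[θ_i | x] = a_i / Σa, so the posterior mean difference is:
print((a_post[0] - a_post[2]) / a_post.sum())          # ≈ 0.193

# Monte Carlo draws give the full posterior of θ1 - θ3:
draws = np.random.default_rng(3).dirichlet(a_post, 100_000)
print(np.percentile(draws[:, 0] - draws[:, 2], [2.5, 97.5]))
```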

38 The Dirichlet process prior

Sometimes we do not wish to assume a parametric model for the data generating distribution. How can we do this in a Bayesian context? Assume X | F ~ F and define a Dirichlet process prior for F. If the support of X is C, then for any partition C = C_1 ∪ C_2 ∪ ... ∪ C_k with k ∈ N, we suppose that

(F(C_1), F(C_2), ..., F(C_k)) ~ Dirichlet( aF_0(C_1), aF_0(C_2), ..., aF_0(C_k) ),

where a > 0 and F_0 is a baseline, prior mean c.d.f. We write F ~ Dirichlet process(a, F_0). Given a sample x_1, ..., x_n, we have

F | x ~ Dirichlet process( a + n, (aF_0 + nF̂)/(a + n) ),

where F̂ is the empirical c.d.f. The posterior mean is a weighted average of the empirical c.d.f. and the prior mean.
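A sketch of the posterior mean c.d.f. as this weighted average, using the settings of the example on the next slide (a = 5, uniform F_0, and 20 simulated Beta(2, 1) observations standing in for the data):

```python
import numpy as np

a, n = 5, 20
x = np.random.default_rng(4).beta(2, 1, n)        # simulated data

def posterior_mean_cdf(t):
    """(a*F0 + n*Fhat)/(a + n), with F0 the Uniform(0,1) c.d.f."""
    F0 = np.clip(t, 0.0, 1.0)
    Fhat = np.mean(x[:, None] <= t, axis=0)       # empirical c.d.f. at each t
    return (a * F0 + n * Fhat) / (a + n)

print(posterior_mean_cdf(np.linspace(0, 1, 5)))
```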

39 Example

The following plot shows the prior (green), posterior (red), empirical (blue) and true (black) c.d.f.s when 20 data were generated from a Beta(2, 1) distribution and a Dirichlet process prior with a = 5 and F_0 a uniform distribution was used. [Figure: the four c.d.f.s.]

40 Summary and next chapter

In this chapter we have illustrated the basic properties of conjugate models. When these exist, they allow for simple interpretation and straightforward inference. Unfortunately, conjugate priors do not always exist, for example if data are t or F distributed. Then we need numerical techniques like Gibbs sampling, which we study in more detail in the next chapter.
