Introduction to Bayesian thinking


1 Introduction to Bayesian thinking. Statistics seminar, Rodrigo Díaz, Geneva Observatory, April 11th, 2016.

2 Agenda (I) Part I. Basics. General concepts & history of Bayesian statistics. Bayesian inference. Part II. Likelihoods. Constructing likelihoods. Part III. Priors. General concepts and problems. Subjective vs objective priors. Assigning priors.

3 Agenda (II) Part IV. Posterior. Conjugate priors. The right prior for a given likelihood. MCMC. Part V. Model comparison. Model evidence and Bayes factor. The philosophy of integrating over parameter space. Estimating the model evidence. Dos and don'ts. Point estimate: BIC. Importance sampling and a family of methods. Nested sampling.

4 Part I basics

5 Statistical inference. [Figure adapted from Gregory (2005): a testable hypothesis (theory) makes predictions via deductive inference; observations (data) feed statistical inference (hypothesis testing, parameter estimation), which updates the hypothesis.]

6 Statistical inference requires a probability theory: frequentist or Bayesian.

7 Thomas Bayes. First appearance of the product rule (the basis for Bayes' theorem; An Essay towards solving a Problem in the Doctrine of Chances):
$p(H_i \mid I, D) = \frac{p(D \mid H_i, I)\, p(H_i \mid I)}{p(D \mid I)}$
Pierre-Simon Laplace. Wide application of Bayes' rule. Principle of insufficient reason (non-informative priors). Primitive version of the Bernstein-von Mises theorem. Laplace's inverse probability is largely rejected for ~100 years. The reign of frequentist probability: Fisher, Pearson, etc.

8 Harold Jeffreys. Objective Bayesian probability revived. Jeffreys rule for priors. (1940s onwards) R. T. Cox, George Pólya, E. T. Jaynes: plausible reasoning, reasoning with uncertainty. Probability theory as an extension of Aristotelian logic. The product and sum rules deduced from basic principles. MAXENT priors. See E. T. Jaynes, Probability Theory: The Logic of Science.

9 Statistical inference. [Figure adapted from Gregory (2005): a testable hypothesis (theory) makes predictions via deductive inference; observations (data) feed statistical inference (hypothesis testing, parameter estimation), which updates the hypothesis.]

10 The three desiderata. Rules of deductive reasoning: strong syllogisms from Aristotelian logic. The brain works using plausible inference (the woman-in-the-wood-cabin story). The rules for plausible reasoning are deduced from three simple desiderata on how the mind of a thinking robot should work (Cox-Pólya-Jaynes). I. Degrees of plausibility are represented by real numbers. II. Qualitative correspondence with common sense. III. If a conclusion can be reasoned out in more than one way, then every possible way must lead to the same result.

11 "...if degrees of plausibility are represented by real numbers, then there is a uniquely determined set of quantitative rules for conducting inference. That is, any other rules whose results conflict with them will necessarily violate an elementary, and nearly inescapable, desideratum of rationality or consistency. But the final result was just the standard rules of probability theory, given already by Bernoulli and Laplace, so why all the fuss? The important new feature was that these rules were now seen as uniquely valid principles of logic in general, making no reference to 'chance' or 'random variables'; so their range of application is vastly greater than had been supposed in the conventional probability theory that was developed in the early 20th century. As a result, the imaginary distinction between 'probability theory' and 'statistical inference' disappears, and the field achieves not only logical unity and simplicity, but far greater technical power and flexibility in applications."

12 The basic rules of probability theory.
Sum: $p(A + B) = p(A) + p(B) - p(AB)$
Product: $p(AB) = p(A \mid B)\,p(B) = p(B \mid A)\,p(A)$
Once these rules are recognised as general ways of manipulating any proposition, a world of applications opens up. "Aristotelian deductive logic is the limiting form of our rules for plausible reasoning." - E. T. Jaynes

13 "...if degrees of plausibility are represented by real numbers, then there is a uniquely determined set of quantitative rules for conducting inference. That is, any other rules whose results conflict with them will necessarily violate an elementary, and nearly inescapable, desideratum of rationality or consistency. But the final result was just the standard rules of probability theory, given already by Bernoulli and Laplace, so why all the fuss? The important new feature was that these rules were now seen as uniquely valid principles of logic in general, making no reference to 'chance' or 'random variables'; so their range of application is vastly greater than had been supposed in the conventional probability theory that was developed in the early 20th century. As a result, the imaginary distinction between 'probability theory' and 'statistical inference' disappears, and the field achieves not only logical unity and simplicity, but far greater technical power and flexibility in applications."

14 The Bayesian recipe.
Sum: $p(A + B) = p(A) + p(B) - p(AB)$
Product: $p(AB) = p(A \mid B)\,p(B) = p(B \mid A)\,p(A)$
Bayes' theorem: $p(A \mid B) = \frac{p(B \mid A)\,p(A)}{p(B)}$

15 The Bayesian recipe.
Law of total probability: $p(A \mid I) = \sum_i p(A \mid B_i, I)\,p(B_i \mid I)$, for $\{B_i\}$ exclusive and exhaustive.

16 The Bayesian recipe.
Sum: $p(A + B) = p(A) + p(B) - p(AB)$
Product: $p(AB) = p(A \mid B)\,p(B) = p(B \mid A)\,p(A)$
Bayes' theorem: $p(A \mid B) = \frac{p(B \mid A)\,p(A)}{p(B)}$
Law of total probability: $p(A \mid I) = \sum_i p(A \mid B_i, I)\,p(B_i \mid I)$, for $\{B_i\}$ exclusive and exhaustive.

17 The Bayesian recipe.
Sum: $p(A + B) = p(A) + p(B) - p(AB)$
Product: $p(AB) = p(A \mid B)\,p(B) = p(B \mid A)\,p(A)$
Bayes' theorem: $p(A \mid B) = \frac{p(B \mid A)\,p(A)}{p(B)}$
Law of total probability: $p(A \mid I) = \sum_i p(A \mid B_i, I)\,p(B_i \mid I)$, for $\{B_i\}$ exclusive and exhaustive.
Normalisation: $\sum_i p(B_i \mid I) = 1$, for $\{B_i\}$ exclusive and exhaustive.
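
As a concrete illustration of how the sum and product rules combine into Bayes' theorem, here is a minimal sketch for a discrete hypothesis space (the numbers are invented for illustration, not taken from the talk):

```python
# Hedged sketch: Bayes' theorem on a discrete hypothesis space with made-up numbers.
prior = {"H1": 0.7, "H2": 0.2, "H3": 0.1}          # p(H_i | I), sums to 1 (normalisation)
likelihood = {"H1": 0.05, "H2": 0.60, "H3": 0.90}  # p(D | H_i, I)

# Law of total probability: p(D | I) = sum_i p(D | H_i, I) p(H_i | I)
p_D = sum(likelihood[h] * prior[h] for h in prior)

# Bayes' theorem: p(H_i | I, D) = p(D | H_i, I) p(H_i | I) / p(D | I)
posterior = {h: likelihood[h] * prior[h] / p_D for h in prior}
print(posterior)  # the posteriors again sum to 1
```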

18 "Bayes", "Bayesian", MCMC

19

20 Two basic tasks of statistical inference: the learning process (parameter estimation) and decision making (model comparison).

21 Learning process. Bayesian probability represents a state of knowledge. Notation: $\theta$: parameter vector; $H_i$: hypothesis; $I$: information; $D$: data.
Continuous parameter space: prior $p(\theta \mid H_i, I)$, posterior $p(\theta \mid D, H_i, I)$.
Discrete space (hypothesis space): prior $p(H_i \mid I)$, posterior $p(H_i \mid I, D)$.

22 Learning process. $\theta$: parameter vector; $H_i$: hypothesis; $I$: information; $D$: data. Enter the likelihood function:
$p(\theta \mid H_i, I, D) = \frac{p(D \mid \theta, H_i, I)\, p(\theta \mid H_i, I)}{p(D \mid H_i, I)}$
Posterior: $p(\theta \mid D, H_i, I) \propto \mathcal{L}(\theta)\, p(\theta \mid H_i, I)$, i.e. likelihood times prior.
The proportionality constant has many names: marginal likelihood, global likelihood, model evidence, prior predictive. Hard to compute. Important.
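
A minimal numerical sketch of "posterior proportional to likelihood times prior" for a single continuous parameter; the Gaussian measurement model, uniform prior and grid below are assumptions made for illustration, not the example used in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=1.3, scale=0.5, size=20)   # simulated measurements of an unknown mean
theta = np.linspace(-5.0, 5.0, 2001)             # parameter grid

log_prior = np.full_like(theta, -np.log(10.0))   # uniform prior on [-5, 5]
log_like = np.array([np.sum(-0.5 * (data - t) ** 2 / 0.5 ** 2) for t in theta])

log_post = log_like + log_prior                  # unnormalised log posterior
log_post -= log_post.max()                       # avoid underflow before exponentiating
post = np.exp(log_post)
post /= np.trapz(post, theta)                    # normalise on the grid

print("posterior mean:", np.trapz(theta * post, theta))
```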

23 Optimising the learning process. The likelihood needs to be selective for the learning process to be effective. [Figure from P. Gregory (2005): two panels showing the prior $p(H_0 \mid M_1, I)$, the likelihood $p(D \mid H_0, M_1, I)$ and the posterior $p(H_0 \mid D, M_1, I)$ versus the parameter $H_0$; in one panel the posterior is dominated by prior information, in the other by the data.]

24 Decision making. Bayes' theorem is also the basis for model comparison:
$p(H_i \mid I, D) = \frac{p(D \mid H_i, I)\, p(H_i \mid I)}{p(D \mid I)}$,
but now $p(D \mid H_i, I) = \int p(D \mid \theta, H_i, I)\, p(\theta \mid H_i, I)\, d^n\theta$.
Computation of the evidence cannot be escaped.

25 Decision making. Model comparison consists in computing the ratio of the posteriors (odds ratio) of two competing hypotheses:
$\frac{p(H_i \mid D, I)}{p(H_j \mid D, I)} = \underbrace{\frac{p(D \mid H_i, I)}{p(D \mid H_j, I)}}_{\text{Bayes factor}} \times \underbrace{\frac{p(H_i \mid I)}{p(H_j \mid I)}}_{\text{prior odds}}$

26 Part II likelihoods

27 The likelihood function. $\theta$: parameter vector; $H_i$: hypothesis; $I$: information; $D$: data.
$p(\theta \mid H_i, I, D) = \frac{p(D \mid \theta, H_i, I)\, p(\theta \mid H_i, I)}{p(D \mid H_i, I)}$; posterior $\propto \mathcal{L}(\theta) \times$ prior.
The likelihood is the probability of obtaining the data D, for a given prior information I and a set of parameters θ. Remember, the likelihood is not a probability distribution for the parameter vector θ (for that you need the prior).

28 Optimising the learning process. The likelihood needs to be selective for the learning process to be effective. [Figure from P. Gregory (2005): two panels showing the prior $p(H_0 \mid M_1, I)$, the likelihood $p(D \mid H_0, M_1, I)$ and the posterior $p(H_0 \mid D, M_1, I)$ versus the parameter $H_0$; in one panel the posterior is dominated by prior information, in the other by the data.]

29 Likelihood function. $\theta$: parameter vector; $H_i$: hypothesis; $I$: information; $D$: data. Ingredients: physical model (analytic model, simulations); statistical (non-deterministic) model (unknown errors / jitter, instrument systematics, complex physics such as activity, ...); error statistics (covariances, non-Gaussianity).
$p(D \mid \theta, H_i, I) = \mathcal{L}(\theta) \overset{\text{indep., Gauss.}}{\propto} \exp(-\chi^2/2)$

30 Constructing the likelihood. The data: $D = D_1, D_2, \ldots, D_n = \{D_i\}$, where $D_i$: the i-th measurement is in the infinitesimal range $y_i$ to $y_i + dy_i$.
The errors: $E_i$: the i-th error is in the infinitesimal range $e_i$ to $e_i + de_i$; $p(E_i \mid \theta, H, I) = f_E(e_i)$, the probability distribution of statement $E_i$. Most used $f_E$: $f_E(e_i) = N(0, \sigma_i^2)$.
The model: $M_i$: the i-th model value is in the infinitesimal range $m_i$ to $m_i + dm_i$; $p(M_i \mid \theta, H, I) = f_M(m_i)$, the probability distribution of statement $M_i$.

31 Constructing the likelihood. The data: $D = D_1, D_2, \ldots, D_n = \{D_i\}$. We want to build the probability distribution $p(D \mid \theta, H, I) = p(D_1, D_2, \ldots, D_n \mid \theta, H, I)$. Write $y_i = m_i + e_i$. It can be shown that
$p(D_i \mid \theta, H, I) = \int dm_i\, f_M(m_i)\, f_E(y_i - m_i)$ (the convolution equation).

32 Constructing the likelihood. $p(D_i \mid \theta, H, I) = \int dm_i\, f_M(m_i)\, f_E(y_i - m_i)$. But for a deterministic model, $m_i$ is obtained from a (usually analytic) function f without any uncertainty (say, a Keplerian curve for RV measurements): $m_i = f(x_i)$, so $f_M(m_i) = \delta(m_i - f(x_i))$. Then
$p(D_i \mid \theta, H, I) = \int dm_i\, \delta(m_i - f(x_i))\, f_E(y_i - m_i) = f_E(y_i - f(x_i)) = p(E_i \mid \theta, H, I)$,
and $p(D \mid \theta, H, I) = p(D_1, D_2, \ldots, D_n \mid \theta, H, I) = p(E_1, E_2, \ldots, E_n \mid \theta, H, I)$.

33 Constructing the likelihood. $p(D \mid \theta, H, I) = p(D_1, \ldots, D_n \mid \theta, H, I) = p(E_1, \ldots, E_n \mid \theta, H, I)$. Now, for independent errors,
$p(D \mid \theta, H, I) = p(E_1, \ldots, E_n \mid \theta, H, I) = p(E_1 \mid \theta, H, I) \cdots p(E_n \mid \theta, H, I) = \prod_{i=1}^{n} p(E_i \mid \theta, H, I)$.
And if, in addition, the errors are distributed normally:
$p(D \mid \theta, H, I) \overset{\text{indep., Gauss.}}{\propto} \exp(-\chi^2/2)$, with $\chi^2 = \sum_i (y_i - f(x_i))^2 / \sigma_i^2$.
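
A hedged sketch of this independent-Gaussian (chi-square) likelihood; the linear model f and the toy data are illustrative assumptions, not the Keplerian radial-velocity model mentioned in the talk:

```python
import numpy as np

def chi2(params, x, y, sigma):
    """chi^2 = sum_i (y_i - f(x_i))^2 / sigma_i^2 for a toy linear model f(x) = a*x + b."""
    a, b = params
    return np.sum((y - (a * x + b)) ** 2 / sigma ** 2)

def log_likelihood(params, x, y, sigma):
    """ln p(D | theta, H, I) for independent Gaussian errors."""
    norm = np.sum(np.log(sigma)) + 0.5 * y.size * np.log(2.0 * np.pi)
    return -0.5 * chi2(params, x, y, sigma) - norm

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
sigma = np.full_like(x, 0.3)
y = 2.0 * x + 1.0 + sigma * rng.normal(size=x.size)  # simulated data
print(log_likelihood((2.0, 1.0), x, y, sigma))
```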

34 Constructing the likelihood. Back to the convolution equation: $p(D_i \mid \theta, H, I) = \int dm_i\, f_M(m_i)\, f_E(y_i - m_i)$. For a non-deterministic model, $M_i$ is distributed: $M_i$: the i-th model value is in the infinitesimal range $m_i$ to $m_i + dm_i$, with $p(M_i \mid \theta, H, I) = f_M(m_i)$, the probability distribution of statement $M_i$. E.g. adding instrumental error / resolution: $f_M(m_i) = N(f(x_i), \sigma_{\mathrm{inst}}^2)$.

35 Part III priors

36 xkcd.com

37 Prior probabilities $p(H_i \mid I)$. $H_i$: hypothesis (can be continuous); $I$: information. Prior information I is always present: the term "prior" does not necessarily mean earlier in time. Philosophical controversy on how to assign priors: subjective vs. objective views. No single universal rule, but a few accepted methods. Informative priors: usually based on the output from previous observations (what was the prior of the first analysis?). Ignorance priors: required as a starting point for the theory.

38 Assigning ignorance priors. 1. Principle of indifference. Given n mutually exclusive, exhaustive hypotheses $\{H_i\}$, with $i = 1, \ldots, n$, the PoI states: $p(H_i \mid I) = 1/n$.

39 Assigning ignorance priors. 2. Transformation groups: location and scale parameters. For certain types of parameters (location and scale), total ignorance can be represented as invariance under a certain (group of) transformation. Location: position of the highest tree along a river. The problem must be invariant under a translation $X' = X + c$:
$p(X \mid I)\,dX = p(X' \mid I)\,dX' = p(X' \mid I)\,d(X + c) = p(X' \mid I)\,dX \;\Rightarrow\; p(X \mid I) = \text{constant}$. Uniform prior.

40 Assigning ignorance priors. 2. Transformation groups: location and scale parameters. For certain types of parameters (location and scale), total ignorance can be represented as invariance under a certain (group of) transformation. Scale: lifetime of a new bacteria, or a Poisson rate. The problem must be invariant under scaling $X' = aX$:
$p(X \mid I)\,dX = p(X' \mid I)\,dX' = p(X' \mid I)\,d(aX) = a\,p(X' \mid I)\,dX \;\Rightarrow\; p(X \mid I) \propto 1/X$. Jeffreys prior.

41 Assigning ignorance priors. 3. Jeffreys rule. Besides location and scale parameters, little more can be said using transformation invariance. Jeffreys priors use the Fisher information; they are parameterisation invariant, but show strange behaviour in many dimensions. Observed Fisher information: $I_D(\theta) = -\frac{d^2 \log L_D(\theta)}{d\theta^2}$. But D is not known when we have to define a prior; use the expectation value over D: $I(\theta) = -\mathrm{E}_D\!\left[\frac{d^2 \log L_D(\theta)}{d\theta^2}\right]$.

42 Assigning ignorance priors. 3. Jeffreys rule says: $p(\theta \mid I) \propto \sqrt{I(\theta)}$. Examples: mean μ of a Normal distribution with known variance σ²: $p(\mu \mid \sigma^2, I) \propto \text{constant}$. Rate λ of a Poisson distribution: $p(\lambda \mid I) \propto 1/\sqrt{\lambda}$. Exercise: scale of a Normal with known mean value?
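
A worked sketch of the exercise (a standard result, not spelled out on the slide): for a Normal likelihood with known mean μ and unknown scale σ,

```latex
% Jeffreys prior for the scale sigma of a Normal with known mean mu.
\log L_D(\sigma) = -n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2 + \mathrm{const},
\qquad
\frac{d^2 \log L_D}{d\sigma^2} = \frac{n}{\sigma^2} - \frac{3}{\sigma^4}\sum_{i=1}^{n}(y_i-\mu)^2 .
% Taking the expectation over the data, E[(y_i - mu)^2] = sigma^2, so
I(\sigma) = -\mathrm{E}_D\!\left[\frac{d^2 \log L_D}{d\sigma^2}\right]
          = -\frac{n}{\sigma^2} + \frac{3n}{\sigma^2} = \frac{2n}{\sigma^2}
\quad\Longrightarrow\quad
p(\sigma \mid I) \propto \sqrt{I(\sigma)} \propto \frac{1}{\sigma},
```

which agrees with the scale-invariance (transformation-group) argument above.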

43 Assigning ignorance priors. 3. Jeffreys rule: $p(\theta \mid I) \propto \sqrt{I(\theta)}$, with $I(\theta) = -\mathrm{E}_D\!\left[\frac{d^2 \log L_D(\theta)}{d\theta^2}\right]$. Favours parts of parameter space where the data provide more information. Is invariant under reparametrisation. Works fine only in one dimension. See more examples here: en.wikipedia.org/wiki/Jeffreys_prior

44 Assigning ignorance priors. 4. Maximum entropy. Uses information theory to define a measure of the information content of a distribution: $S = -\sum_{i} p_i \log(p_i)$. Maximise the entropy subject to constraints (use Lagrange multipliers). Defined strictly for discrete distributions.
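
A hedged numerical sketch of a MAXENT prior: the distribution over the faces of a die constrained to have a prescribed mean (Jaynes's dice example; the mean value 4.5 and the optimiser settings are illustrative assumptions, not from the talk):

```python
import numpy as np
from scipy.optimize import minimize

faces = np.arange(1, 7)                       # discrete hypothesis space: die faces 1..6

def neg_entropy(p):
    return np.sum(p * np.log(p))              # minimise -S = sum_i p_i log p_i

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},          # normalisation
    {"type": "eq", "fun": lambda p: np.sum(p * faces) - 4.5},  # mean constraint
]
p0 = np.full(6, 1.0 / 6.0)                    # start from the uniform (unconstrained MAXENT) solution
res = minimize(neg_entropy, p0, bounds=[(1e-10, 1.0)] * 6, constraints=constraints)
print(res.x)                                  # MAXENT probabilities, skewed towards high faces
```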

45 Assigning ignorance priors. 5. Reference priors. Recent development in the field (Bernardo; Bernardo, Berger, & Sun). Similar to MAXENT, but maximise the Kullback-Leibler divergence between prior and posterior: priors which maximise the expected difference with the posterior. The rigorous definition is complicated and subtle. In 1-D, reference priors are the Jeffreys priors. In multiple dimensions, reference priors behave better than Jeffreys priors.

46 Agenda (II) Part IV. Posterior. Conjugate priors. The right prior for a given likelihood. MCMC. Part V. Model comparison. Model evidence and Bayes factor. The philosophy of integrating over parameter space. Estimating the model evidence. Dos and don'ts. BIC, Chib & Jeliazkov, etc. Importance sampling and a family of methods.

47 Part IV sampling the posterior / MCMC

48 Sampling from the posterior. $p(\theta \mid D, H_i, I) \propto \mathcal{L}(\theta)\, p(\theta \mid H_i, I)$. The posterior distribution is proportional to the likelihood times the prior. The normalising constant (called model evidence, marginal likelihood, etc.) is of importance when comparing different models. The posterior contains all the information on a given model a Bayesian statistician can get for a given set of priors and data. The posterior is only analytical in a few cases: conjugate priors. Other methods are needed to sample from the posterior. Remember, most Bayesian computations can be reduced to expectation values with respect to the posterior.

49 Markov Chain Monte Carlo. $\theta$: parameter vector; $H_i$: hypothesis; $I$: information; $D$: data. $p(\theta \mid D, H_i, I) \propto \mathcal{L}(\theta)\, p(\theta \mid H_i, I)$. Metropolis-Hastings: 1. Compute $\mathcal{L}(\theta_0)\, p(\theta_0 \mid I)$ at the current point $\theta_0$. 2. Create a proposal point $\theta^*$ from the proposal distribution $q(\theta_0, \theta^*)$. 3. Compute $\mathcal{L}(\theta^*)\, p(\theta^* \mid I)$. 4. Compute the ratio $r = \frac{\mathcal{L}(\theta^*)\, p(\theta^* \mid I)}{\mathcal{L}(\theta_0)\, p(\theta_0 \mid I)}$.

50 Markov Chain Monte Carlo. $\theta$: parameter vector; $H_i$: hypothesis; $I$: information; $D$: data. $p(\theta \mid D, H_i, I) \propto \mathcal{L}(\theta)\, p(\theta \mid H_i, I)$. Metropolis-Hastings: 1. Compute $\mathcal{L}(\theta_0)\, p(\theta_0 \mid I)$ at the current point $\theta_0$. 2. Create a proposal point $\theta^*$ from $q(\theta_0, \theta^*)$. 3. Compute $\mathcal{L}(\theta^*)\, p(\theta^* \mid I)$. 4. Compute the ratio $r = \frac{\mathcal{L}(\theta^*)\, p(\theta^* \mid I)}{\mathcal{L}(\theta_0)\, p(\theta_0 \mid I)}$. 5. Accept the proposal with probability min(1, r).

51 Markov Chain Monte Carlo. $\theta$: parameter vector; $H_i$: hypothesis; $I$: information; $D$: data. $p(\theta \mid D, H_i, I) \propto \mathcal{L}(\theta)\, p(\theta \mid H_i, I)$. Metropolis-Hastings (continued): steps 1-4 as before, then 5. Accept the proposal with probability min(1, r); if accepted, the chain moves to $\theta^*$, otherwise it stays at $\theta_0$.

52 Markov Chain Monte Carlo. $\theta$: parameter vector; $H_i$: hypothesis; $I$: information; $D$: data. $p(\theta \mid D, H_i, I) \propto \mathcal{L}(\theta)\, p(\theta \mid H_i, I)$. Metropolis-Hastings with proposal $q(\theta_0, \theta^*)$. Algorithms: Metropolis-Hastings, Gibbs sampling, slice sampling, Hybrid Monte Carlo. Codes: pymc, emcee, kombine, cobmcmc.
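
A minimal Metropolis-Hastings sketch in Python; the Gaussian likelihood, uniform prior, step size and chain length are illustrative assumptions (a real analysis would rather use one of the codes listed above):

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=1.3, scale=0.5, size=50)        # simulated data set

def log_posterior(theta):
    """Unnormalised log posterior: Gaussian likelihood times a uniform prior on [-10, 10]."""
    if not -10.0 < theta < 10.0:
        return -np.inf                                 # outside the prior support
    return -0.5 * np.sum((data - theta) ** 2 / 0.5 ** 2)

n_steps, step_size = 10_000, 0.2
chain = np.empty(n_steps)
theta0 = 0.0                                           # starting point
logp0 = log_posterior(theta0)

for i in range(n_steps):
    theta_star = theta0 + step_size * rng.normal()     # symmetric Gaussian proposal q
    logp_star = log_posterior(theta_star)
    # accept with probability min(1, r), r = posterior(theta*) / posterior(theta0)
    if np.log(rng.uniform()) < logp_star - logp0:
        theta0, logp0 = theta_star, logp_star
    chain[i] = theta0

print(chain[2000:].mean(), chain[2000:].std())         # posterior mean and width after burn-in
```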

53 5 minutes to LEARN, a lifetime to MASTER.

54 But beware: non-convergence, bad mixing. The dark side of MCMC are they. Problems with correlations and multi-modal distributions.

55 The problem with correlations. If parameters exhibit correlations, then the step size must be small to reach the demanded fraction of accepted jumps. [Figure: random-walk proposals in a correlated 2-D posterior, showing rejected and accepted proposals.] One then needs a very long chain to explore the entire posterior. Or, more relevantly, the entire posterior will not be explored thoroughly (i.e. reduced error bars!).

56 MCMC: the Good, the Bad, and the Ugly. Visual inspection of traces. [Figure: example traces showing good mixing, marginal mixing, and no convergence.]

57 MCMC: the Good, the Bad, and the Ugly Evaluation of the chain auto-correlations Image credit: E. Ford
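
A hedged sketch of how the chain autocorrelation can be evaluated; the AR(1) toy chain stands in for a real MCMC trace, and emcee/pymc provide more careful estimators:

```python
import numpy as np

def autocorrelation(chain, max_lag=200):
    x = chain - chain.mean()
    acf = np.array([np.sum(x[: x.size - k] * x[k:]) for k in range(max_lag)])
    return acf / acf[0]                                # normalised so that rho(0) = 1

def integrated_autocorr_time(chain, max_lag=200):
    rho = autocorrelation(chain, max_lag)
    return 1.0 + 2.0 * np.sum(rho[1:])                 # crude estimate, no windowing

rng = np.random.default_rng(0)
chain = np.empty(5000)
chain[0] = 0.0
for i in range(1, chain.size):                         # AR(1) toy chain, true tau = (1+0.9)/(1-0.9) = 19
    chain[i] = 0.9 * chain[i - 1] + rng.normal()
print(integrated_autocorr_time(chain))
```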

58 Multi-modal posteriors. Difficult for the sampler to move across modes if they are separated by a low-probability barrier. Image credit: E. Ford

59 Multi-modal posteriors. Run as many chains as possible starting from significantly different places in the prior space. Be paranoid! You can always be missing modes.

60

61 Part V model comparison

62 Decision making. Bayes' theorem is also the basis for model comparison:
$p(H_i \mid I, D) = \frac{p(D \mid H_i, I)\, p(H_i \mid I)}{p(D \mid I)}$,
but now $p(D \mid H_i, I) = \int p(D \mid \theta, H_i, I)\, p(\theta \mid H_i, I)\, d^n\theta$.
Computation of the model evidence (a.k.a. marginal likelihood, global likelihood, prior predictive, ...) cannot be escaped.

63 Decision making. Model comparison consists in computing the ratio of the posteriors (odds ratio) of two competing hypotheses:
$\frac{p(H_i \mid D, I)}{p(H_j \mid D, I)} = \underbrace{\frac{p(D \mid H_i, I)}{p(D \mid H_j, I)}}_{\text{Bayes factor}} \times \underbrace{\frac{p(H_i \mid I)}{p(H_j \mid I)}}_{\text{prior odds}}$

64 A built-in Occam's razor. The Bayes factor naturally punishes models with more parameters:
$p(D \mid H_i, I) = \int p(D \mid \theta, H_i, I)\, p(\theta \mid H_i, I)\, d\theta$.
E.g.: model $M_0$, without free parameters; model $M_1$, with one parameter θ, such that $M_0 = M_1(\theta = \theta_0)$. Then $p(D \mid M_0, I) = p(D \mid \theta_0, M_1, I)$.
Write $\mathcal{L}(\theta) = p(D \mid \theta, M_1, I)$, and take a flat prior $p(\theta \mid M_1, I) = 1/\Delta\theta$ over a range $\Delta\theta$:
$p(D \mid M_1, I) = \int p(D \mid \theta, M_1, I)\, p(\theta \mid M_1, I)\, d\theta = \frac{1}{\Delta\theta} \int p(D \mid \theta, M_1, I)\, d\theta \approx \frac{\delta\theta}{\Delta\theta}\, p(D \mid \hat{\theta}, M_1, I)$,
where $\hat{\theta}$ is the best-fit value and $\delta\theta$ the width of the likelihood peak. Hence
$B_{10} \approx \frac{p(D \mid \hat{\theta}, M_1, I)}{p(D \mid \theta_0, M_1, I)} \times \frac{\delta\theta}{\Delta\theta}$.
The factor $\delta\theta/\Delta\theta \le 1$ is the Occam's factor, one per parameter. [Figure from Gregory (2005): likelihood $\mathcal{L}(\theta)$ of width $\delta\theta$ within a prior range $\Delta\theta$, versus the parameter θ.]

65 Estimating the marginal likelihood. $p(D \mid H_i, I) = \int p(D \mid \theta, H_i, I)\, p(\theta \mid H_i, I)\, d^n\theta$. The marginal likelihood is a k-dimensional integral over parameter space. Accounting for the size of parameter space brings many good things to Bayesian statistics, but the integral is intractable. Large number of techniques to estimate the value of the integral: asymptotic estimates (Laplace approximation, BIC); importance sampling; Chib & Jeliazkov; nested sampling.

66 Estimating the marginal likelihood. Point estimates: BIC $= -2 \ln L_{\max} + k \ln N$ (Bayesian Information Criterion). Some nice properties are lost, such as reasonable parameter penalisation (see Liddle 2007). The error term of the BIC is generally O(1): even for large samples, it does not produce the right value.
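
A hedged one-liner for the BIC as a point estimate; the numbers in the example call are invented for illustration:

```python
import numpy as np

def bic(max_log_likelihood, n_params, n_data):
    """BIC = -2 ln L_max + k ln N (lower is better)."""
    return -2.0 * max_log_likelihood + n_params * np.log(n_data)

# Example: compare two hypothetical fits to the same 50 data points.
print(bic(-120.3, 2, 50), bic(-118.9, 4, 50))
```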

67 Importance Sampling (I). Used to obtain moments of distributions using samples from another distribution. The evidence is the expectation value of the likelihood over the prior space: $p(D \mid H_i, I) = \int p(D \mid \theta, H_i, I)\, p(\theta \mid H_i, I)\, d^n\theta$. The simplest estimate of the evidence is to take samples from the prior and compute the average of $p(D \mid \theta, H_i, I)$:
$\hat{p}(D \mid H_i, I) = \frac{1}{S} \sum_{s=1}^{S} p(D \mid \theta^{(s)}, H_i, I)$.
But this estimator is extremely inefficient if the likelihood is concentrated relative to the prior; a large number of samples is needed, which is computationally expensive.
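
A hedged sketch of this prior arithmetic-mean estimator, reusing the toy Gaussian likelihood and uniform prior assumed in the Metropolis-Hastings sketch above (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=1.3, scale=0.5, size=50)

def log_likelihood(theta):
    return np.sum(-0.5 * (data - theta) ** 2 / 0.5 ** 2
                  - 0.5 * np.log(2.0 * np.pi * 0.5 ** 2))

S = 100_000
theta_prior = rng.uniform(-10.0, 10.0, size=S)                # samples from the prior
log_L = np.array([log_likelihood(t) for t in theta_prior])
log_evidence = np.logaddexp.reduce(log_L) - np.log(S)         # log of the average likelihood
print(log_evidence)
```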

68 Importance Sampling (II). More generally, we can choose a distribution g(θ) (from which we can sample) and express the integral as an expectation value over that distribution (dropping the conditional I for notational simplicity):
$\int p(\theta \mid H_i)\, p(D \mid \theta, H_i)\, d\theta = \int \underbrace{\frac{p(\theta \mid H_i)}{g(\theta)}}_{w_g(\theta)}\, p(D \mid \theta, H_i)\, g(\theta)\, d\theta = \mathrm{E}_{g(\theta)}\!\left[w_g(\theta)\, p(D \mid \theta, H_i)\right] \approx \frac{1}{S} \sum_{s=1}^{S} w_g(\theta^{(s)})\, p(D \mid \theta^{(s)}, H_i)$,
where $\theta^{(s)}$ is sampled from g(θ), the importance sampling function.

69 Importance Sampling (III). Different choices of g(θ) give different estimators. $g(\theta) = p(\theta \mid H_i, I)$ (the prior) gives $\hat{p}(D \mid H_i, I) = \frac{1}{S} \sum_{s} p(D \mid \theta^{(s)}, H_i, I)$. $g(\theta) = p(\theta \mid D, H_i, I)$ (the posterior) gives the harmonic mean: $\hat{p}(D \mid H_i, I) = S \left[\sum_{s} p(D \mid \theta^{(s)}, H_i, I)^{-1}\right]^{-1}$. The harmonic mean does not satisfy a Gaussian central limit theorem (Kass & Raftery 1995); the sum is dominated by the occasional small terms in the likelihood; its variance can be infinite; it is insensitive to diffuse priors (i.e. priors much larger than the likelihood).

70 Importance Sampling (IV). Perrakis et al. (2014) use the product of the posterior marginal distributions: $g(\theta) = \prod_{j=1}^{k} p(\theta_j \mid D, H, I)$, which leads to the estimator
$\hat{p}(D \mid H, I) = \frac{1}{S} \sum_{s=1}^{S} \frac{\mathcal{L}(\theta^{(s)})\, p(\theta^{(s)} \mid H, I)}{\prod_{j=1}^{k} p(\theta_j^{(s)} \mid D, H, I)}$.
Behaves much better than the harmonic mean. Requires estimating the marginal densities in the denominator; different techniques are available (moment-matching, kernel estimation, etc.). Samples from the marginal distributions are readily obtained from MCMC samples.

71 Estimating the marginal likelihood. Estimators based on importance sampling (IS):
- ISF: prior. Leads to: mean estimate. But: very inefficient.
- ISF: posterior. Leads to: harmonic mean estimate. But: dominated by points with low likelihood.
- ISF: posterior & prior mixture. Leads to: mixture estimate. But: requires drawing from both posterior and prior. Kass & Raftery (1995).
- ISF: posterior x 2. Leads to: truncated-posterior mixture estimate. But: inconsistent or only as good as the HM. Tuomi & Jones (2012).

72 Nested sampling (Skilling 2007). $p(D \mid H_i, I) = \int p(D \mid \theta, H_i, I)\, p(\theta \mid H_i, I)\, d^n\theta$. A non-trivial change of variables. Define the prior volume X: $dX = p(\theta \mid I, H)\, d^n\theta$, $X(\lambda) = \int_{\mathcal{L}(\theta) > \lambda} p(\theta \mid I, H)\, d^n\theta$. Then the integral is a simple 1-D integral over prior volume: $p(D \mid H, I) = \int_0^1 \mathcal{L}(X)\, dX$.

73 Nested sampling (Skilling 2007). $p(D \mid H, I) = \int_0^1 \mathcal{L}(X)\, dX$.

74 Nested sampling (Skilling 2007). Algorithm for $p(D \mid H, I) = \int_0^1 \mathcal{L}(X)\, dX$. Start with N points $\theta_1, \ldots, \theta_N$ from the prior; initialise Z = 0, $X_0 = 1$. Repeat for i = 1, 2, ..., j: record the lowest of the current likelihood values as $L_i$; set $X_i = \exp(-i/N)$ (crude) or sample it to get the uncertainty; set $w_i = X_{i-1} - X_i$ (simple) or $(X_{i-1} - X_{i+1})/2$ (trapezoidal); increment Z by $L_i w_i$; then replace the point of lowest likelihood by a new one drawn from within $\mathcal{L}(\theta) > L_i$, in proportion to the prior $\pi(\theta)$. Finally, increment Z by $N^{-1}\,(L(\theta_1) + \ldots + L(\theta_N))\, X_j$. Two keys: assign $X_i$; draw with the condition $\mathcal{L} > L_i$.
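
A hedged sketch of this loop, with the crude $X_i$ assignment and a brute-force rejection step for the constrained draw (real codes use ellipsoids, slice sampling, etc.); the 1-D Gaussian likelihood and uniform prior are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(loc=1.3, scale=0.5, size=50)

def log_likelihood(theta):
    return np.sum(-0.5 * (data - theta) ** 2 / 0.5 ** 2
                  - 0.5 * np.log(2.0 * np.pi * 0.5 ** 2))

def sample_prior():
    return rng.uniform(-5.0, 5.0)                      # uniform prior on [-5, 5]

N, n_iter = 100, 600
live = np.array([sample_prior() for _ in range(N)])    # N live points from the prior
live_logL = np.array([log_likelihood(t) for t in live])

logZ, logX_prev = -np.inf, 0.0                         # Z = 0, X_0 = 1
for i in range(1, n_iter + 1):
    worst = np.argmin(live_logL)                       # lowest likelihood L_i
    logX = -i / N                                      # crude X_i = exp(-i/N)
    logw = np.log(np.exp(logX_prev) - np.exp(logX))    # w_i = X_{i-1} - X_i
    logZ = np.logaddexp(logZ, live_logL[worst] + logw) # Z += L_i * w_i
    logX_prev = logX
    while True:                                        # replace worst point: prior draw with L > L_i
        theta_new = sample_prior()
        logL_new = log_likelihood(theta_new)
        if logL_new > live_logL[worst]:
            break
    live[worst], live_logL[worst] = theta_new, logL_new

# final increment: Z += mean(L of remaining live points) * X_j
logZ = np.logaddexp(logZ, np.logaddexp.reduce(live_logL) - np.log(N) + logX_prev)
print("log evidence ~", logZ)
```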

75 Nested sampling (Skilling 2007). Drawing a new point with the condition $\mathcal{L} > L_i$. Ellipsoidal nested sampling (Mukherjee et al. 2006): uses ellipsoidal contours around the active points; becomes inefficient at high dimensions or for multi-modal posteriors. MultiNest algorithm (Feroz, Hobson, Bridges, 2009): ellipsoids break up as needed to improve efficiency; still problems at high dimensions (the curse goes on).

76 Nested sampling (Skilling 2007). Assigning $X_i$. $p(D \mid H, I) = \int_0^1 \mathcal{L}(X)\, dX$. Use the statistical properties of randomly distributed points: $X_0 = 1$, $X_i = t_i X_{i-1}$, $p(t_i) = N t_i^{N-1}$, $\mathrm{E}[\log t] = -1/N$. Take the mean value $t_i = \exp(-1/N)$ (so $X_i = \exp(-i/N)$), OR draw from the distribution to account for the uncertainty.

77
