Foundations of Statistical Inference


1 Foundations of Statistical Inference. Julien Berestycki, Department of Statistics, University of Oxford. MT 2016.

2 Lecture 6: Bayesian Inference.

5 Ideas of probability. The majority of the statistics you have learned so far is called classical or Frequentist: the probability of an event is defined as the limiting proportion of successes in an infinite number of repeatable trials. By contrast, in subjective Bayesian inference, probability is a measure of the strength of belief. We treat parameters as random variables. Before collecting any data we assume that there is uncertainty about the value of a parameter, and this uncertainty is formalised by specifying a pdf (or pmf) for the parameter. We then conduct an experiment to collect data that give us information about the parameter, and use Bayes' Theorem to combine our prior beliefs with the data into an updated assessment of our uncertainty about the parameter.

10 The history of Bayesian Statistics. Bayesian methods originated with Bayes and Laplace (late 1700s to mid 1800s). In the early 1920s, Fisher put forward an opposing viewpoint: that statistical inference must be based entirely on probabilities with a direct experimental interpretation, i.e. the repeated sampling principle. In 1939 Jeffreys' book The Theory of Probability started a resurgence of interest in Bayesian inference. This interest continued to grow over the following decades, especially as problems with the Frequentist approach started to emerge. The development of simulation-based inference has transformed Bayesian statistics in recent decades, and it now plays a prominent part in modern statistics.

15 Problems with Frequentist Inference - I. Frequentist inference generally does not condition on the observed data. A confidence interval is a set-valued function $C(X) \subseteq \Theta$ of the data $X$ which covers the parameter, $\theta \in C(X)$, for a fraction $1-\alpha$ of repeated draws of $X$ taken under the null $H_0$. This is not the same as the statement that, given data $X = x$, the interval $C(x)$ covers $\theta$ with probability $1-\alpha$; yet that is the type of statement we might wish to make. (Observe that this statement makes sense iff $\theta$ is a r.v.)

Example 1. Suppose $X_1, X_2 \sim U(\theta - 1/2, \theta + 1/2)$ and let $X_{(1)}, X_{(2)}$ be the order statistics. Then $C(X) = [X_{(1)}, X_{(2)}]$ is an $\alpha = 1/2$ level CI for $\theta$. Suppose that in your data $X = x$ you find $x_{(2)} - x_{(1)} > 1/2$ (this happens in a quarter of data sets). Then $\theta \in [x_{(1)}, x_{(2)}]$ with probability one.
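A minimal simulation sketch of Example 1 (the true value of $\theta$, the seed, and all names are illustrative assumptions, not from the lecture). It checks that the interval covers $\theta$ in about half of repeated data sets overall, but in every data set whose interval is wider than 1/2:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.7            # illustrative "true" parameter value
n_rep = 100_000

x = rng.uniform(theta - 0.5, theta + 0.5, size=(n_rep, 2))
lo, hi = x.min(axis=1), x.max(axis=1)

covered = (lo <= theta) & (theta <= hi)
wide = (hi - lo) > 0.5

print("P(width > 1/2):", wide.mean())                       # about 1/4
print("unconditional coverage:", covered.mean())            # about 1/2
print("coverage given width > 1/2:", covered[wide].mean())  # exactly 1
```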

18 Problems with Frequentist Inference - II. Frequentist inference depends on data that were never observed.

The likelihood principle. Suppose that two experiments relating to $\theta$, $E_1$ and $E_2$, give rise to data $y_1$ and $y_2$ such that the corresponding likelihoods are proportional, that is, for all $\theta$,
$$L(\theta; y_1, E_1) = c\,L(\theta; y_2, E_2).$$
Then the two experiments lead to identical conclusions about $\theta$.

Key point. MLEs respect the likelihood principle, i.e. the MLEs for $\theta$ are identical in both experiments. But significance tests do not respect the likelihood principle.

21 Consider a pure test of significance where we specify just $H_0$. We must choose a test statistic $T(x)$ and define the p-value for data $T(x) = t$ as
$$\text{p-value} = P(T(X) \text{ at least as extreme as } t \mid H_0).$$
The choice of $T(X)$ amounts to a statement about the direction of likely departures from the null, which requires some consideration of alternative models.

Note 1. The calculation of the p-value involves a sum (or integral) over data that were not observed, and this can depend upon the form of the experiment.

Note 2. A p-value is not $P(H_0 \mid T(X) = t)$.

28 Example. A Bernoulli trial succeeds with probability $p$.

$E_1$: fix $n_1$ Bernoulli trials, count the number $y_1$ of successes.
$E_2$: count the number $n_2$ of Bernoulli trials needed to obtain a fixed number $y_2$ of successes.

$$L(p; y_1, E_1) = \binom{n_1}{y_1} p^{y_1} (1-p)^{n_1-y_1} \quad \text{(binomial)}$$
$$L(p; n_2, E_2) = \binom{n_2-1}{y_2-1} p^{y_2} (1-p)^{n_2-y_2} \quad \text{(negative binomial)}$$

If $n_1 = n_2 = n$ and $y_1 = y_2 = y$ then $L(p; y_1, E_1) \propto L(p; n_2, E_2)$, so the MLEs for $p$ are the same under $E_1$ and $E_2$.
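A quick numerical sketch of this proportionality, using scipy (note the assumption about scipy's negative binomial parametrization, which counts failures rather than trials):

```python
import numpy as np
from scipy.stats import binom, nbinom

n, y = 12, 3
p = np.linspace(0.01, 0.99, 99)

L1 = binom.pmf(y, n, p)        # L(p; y, E1), binomial
# scipy's nbinom counts failures before the y-th success, so N = n trials
# corresponds to n - y failures
L2 = nbinom.pmf(n - y, y, p)   # L(p; n, E2), negative binomial

ratio = L1 / L2
print("ratio constant in p:", np.allclose(ratio, ratio[0]))  # True
print("MLEs:", p[np.argmax(L1)], p[np.argmax(L2)])           # 0.25 0.25
```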

32 Example (continued). But significance tests contradict this. Test $H_0: p = 1/2$ against $H_1: p < 1/2$, and suppose $n = 12$ and $y = 3$. The p-value based on $E_1$ is
$$P\Big(Y \le y \mid \theta = \tfrac12\Big) = \sum_{k=0}^{y} \binom{n}{k} (1/2)^k (1-1/2)^{n-k} = 0.073,$$
while the p-value based on $E_2$ is
$$P\Big(N \ge n \mid \theta = \tfrac12\Big) = \sum_{k=n}^{\infty} \binom{k-1}{y-1} (1/2)^y (1-1/2)^{k-y} = 0.033,$$
so we reach different conclusions at significance level 0.05.

Note. The p-values disagree because they sum over portions of two different sample spaces.
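A sketch of the two p-value calculations with scipy (again assuming scipy's failure-counting parametrization of the negative binomial):

```python
from scipy.stats import binom, nbinom

n, y = 12, 3

p1 = binom.cdf(y, n, 0.5)          # E1: P(Y <= 3) under Bin(12, 1/2)
# E2: P(N >= 12); in scipy's parametrization, N >= 12 trials means
# at least n - y = 9 failures before the 3rd success
p2 = nbinom.sf(n - y - 1, y, 0.5)

print(round(p1, 3), round(p2, 3))  # 0.073 0.033
```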

37 Bayesian Inference - Revision. Likelihood $f(x \mid \theta)$ and prior distribution $\pi(\theta)$ for $\vartheta$. The posterior distribution of $\vartheta$ at $\vartheta = \theta$, given $x$, is
$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{\int f(x \mid \theta)\,\pi(\theta)\,d\theta}, \qquad \pi(\theta \mid x) \propto f(x \mid \theta)\,\pi(\theta),$$
that is, posterior $\propto$ likelihood $\times$ prior. The same form holds for $\theta$ continuous ($\pi(\theta \mid x)$ a pdf) or discrete ($\pi(\theta \mid x)$ a pmf). We call $\int f(x \mid \theta)\,\pi(\theta)\,d\theta$ the marginal likelihood.

Likelihood principle. Notice that, if we base all inference on the posterior distribution, then we respect the likelihood principle: if two likelihood functions are proportional, the constant cancels top and bottom in Bayes' rule, and the two posterior distributions are the same.
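A minimal grid-approximation sketch of this update (the uniform prior and binomial likelihood below are illustrative choices, not from the slides), showing the marginal likelihood acting as the normalising constant:

```python
import numpy as np

theta = np.linspace(0.001, 0.999, 999)   # grid over the parameter space
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)              # uniform prior density
lik = theta**3 * (1 - theta)**9          # e.g. binomial likelihood, x=3, n=12

marginal = (lik * prior).sum() * dtheta  # the marginal likelihood f(x)
posterior = lik * prior / marginal       # so the posterior integrates to 1

print((theta * posterior).sum() * dtheta)  # posterior mean, about 4/14 = 0.286
```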

39 Example 1. $X \sim \mathrm{Bin}(n, \vartheta)$ for known $n$ and unknown $\vartheta$. Suppose our prior knowledge about $\vartheta$ is represented by a Beta distribution on $(0,1)$, where $\theta$ is a trial value for $\vartheta$:
$$\pi(\theta) = \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}, \qquad 0 < \theta < 1.$$

40 [Figure: Beta prior densities for several parameter choices, including $(a,b) = (1,1)$, $(20,20)$, $(1,3)$ and $(8,\,\cdot)$.]

47 Example 1. Prior probability density:
$$\pi(\theta) = \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}, \qquad 0 < \theta < 1.$$
Likelihood:
$$f(x \mid \theta) = \binom{n}{x} \theta^{x}(1-\theta)^{n-x}, \qquad x = 0, \dots, n.$$
Posterior probability density:
$$\pi(\theta \mid x) \propto \text{likelihood} \times \text{prior} \propto \theta^{a-1}(1-\theta)^{b-1}\,\theta^{x}(1-\theta)^{n-x} = \theta^{a+x-1}(1-\theta)^{n-x+b-1}.$$
Here the posterior has the same form as the prior (conjugacy), with parameters $a, b$ replaced by $a+x,\ b+n-x$, so
$$\pi(\theta \mid x) = \frac{\theta^{a+x-1}(1-\theta)^{n-x+b-1}}{B(a+x,\ b+n-x)}.$$
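The conjugate update is just parameter arithmetic; a minimal sketch (function name ours, values illustrative):

```python
def beta_binomial_update(a, b, x, n):
    """Beta(a, b) prior + x successes in n binomial trials -> Beta posterior."""
    return a + x, b + n - x

# uniform Beta(1, 1) prior, x = 3 successes in n = 12 trials
print(beta_binomial_update(1, 1, 3, 12))   # (4, 10)
```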

50 Example 1. For a Beta distribution with parameters $a, b$,
$$\mu = \frac{a}{a+b}, \qquad \sigma^2 = \frac{ab}{(a+b)^2(a+b+1)}.$$
The posterior mean and variance are therefore
$$\frac{a+X}{a+b+n}, \qquad \frac{(a+X)(b+n-X)}{(a+b+n)^2(a+b+n+1)}.$$
Suppose $X = n$ and we set $a = b = 1$ for our prior. Then the posterior mean is $(n+1)/(n+2)$, i.e. when we observe events of just one type our point estimate is not 0 or 1 (which is sensible, especially for small sample sizes).
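A short sketch of these posterior summaries, including the $X = n$, $a = b = 1$ special case (names are ours):

```python
def post_mean_var(a, b, x, n):
    a2, b2 = a + x, b + n - x
    mean = a2 / (a2 + b2)
    var = a2 * b2 / ((a2 + b2)**2 * (a2 + b2 + 1))
    return mean, var

n = 10
print(post_mean_var(1, 1, n, n))   # mean (n+1)/(n+2) = 11/12, not 1
```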

51 Example 1. For large $n$, the posterior mean and variance are approximately
$$\frac{X}{n}, \qquad \frac{X(n-X)}{n^3}.$$
In classical statistics,
$$\hat\theta = \frac{X}{n}, \qquad \frac{\hat\theta(1-\hat\theta)}{n} = \frac{X(n-X)}{n^3}.$$

55 Example 1. Suppose $\vartheta = 0.3$ but the prior mean is 0.7 with std 0.1. Suppose data $X \sim \mathrm{Bin}(n, \vartheta)$ with $n = 10$ (yielding $X = 3$) and then $n = 100$ (yielding $X = 30$, say).

[Figure: prior density and posterior densities for $n = 10$ and $n = 100$, plotted against $\theta$.]

As $n$ increases, the likelihood overwhelms the information in the prior.
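A sketch reproducing this figure, assuming the prior with mean 0.7 and std 0.1 is Beta(14, 6) (which has $a/(a+b) = 0.7$ and variance exactly 0.01; the slides do not state the parameters):

```python
import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt

a, b = 14, 6                       # assumed prior: mean 0.7, std 0.1
grid = np.linspace(0, 1, 500)

plt.plot(grid, beta.pdf(grid, a, b), label="prior")
for n, x in [(10, 3), (100, 30)]:
    plt.plot(grid, beta.pdf(grid, a + x, b + n - x), label=f"post n={n}")
plt.axvline(0.3, linestyle="--")   # the true value of theta
plt.xlabel("theta")
plt.legend()
plt.show()
```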

62 Example 2 - Conjugate priors. Normal distribution when the mean and variance are unknown. Let $\tau = 1/\sigma^2$ and $\theta = (\tau, \mu)$; $\tau$ is called the precision. For the prior, $\tau$ has a Gamma distribution with parameters $\alpha, \beta > 0$, and conditional on $\tau$, $\mu$ has a $N(\nu, \frac{1}{k\tau})$ distribution for some $k > 0$, $\nu \in \mathbb{R}$. The prior is
$$\pi(\tau, \mu) = \frac{\beta^\alpha}{\Gamma(\alpha)} \tau^{\alpha-1} e^{-\beta\tau} \cdot (2\pi)^{-1/2} (k\tau)^{1/2} \exp\Big\{-\frac{k\tau}{2}(\mu-\nu)^2\Big\},$$
or
$$\pi(\tau, \mu) \propto \tau^{\alpha-1/2} \exp\Big[-\tau\Big\{\beta + \frac{k}{2}(\mu-\nu)^2\Big\}\Big].$$
The likelihood is
$$f(x \mid \mu, \tau) = (2\pi)^{-n/2} \tau^{n/2} \exp\Big\{-\frac{\tau}{2} \sum_{i=1}^n (x_i - \mu)^2\Big\}.$$

64 Example 2 - Conjugate priors. Thus
$$\pi(\tau, \mu \mid x) \propto \tau^{\alpha + n/2 - 1/2} \exp\Big[-\tau\Big\{\beta + \frac{k}{2}(\mu-\nu)^2 + \frac{1}{2}\sum_{i=1}^n (x_i-\mu)^2\Big\}\Big].$$
Complete the square to see that
$$k(\mu-\nu)^2 + \sum (x_i-\mu)^2 = (k+n)\Big(\mu - \frac{k\nu + n\bar x}{k+n}\Big)^2 + \frac{nk}{n+k}(\bar x - \nu)^2 + \sum (x_i - \bar x)^2.$$

67 Example 2 - Completing the square. The posterior dependence on $\mu$ is entirely in the factor
$$\exp\Big[-\tau\Big\{\beta + \frac{k}{2}(\mu-\nu)^2 + \frac{1}{2}\sum_{i=1}^n (x_i-\mu)^2\Big\}\Big].$$
The idea is to write this as
$$c \exp\Big[-\tau\Big\{\beta' + \frac{k'}{2}(\nu'-\mu)^2\Big\}\Big]$$
so that, conditional on $\tau$, we recognise a Normal density. Indeed,
$$k(\mu-\nu)^2 + \sum (x_i-\mu)^2 = \mu^2(k+n) - \mu\Big(2k\nu + 2\sum x_i\Big) + \dots = (k+n)\Big(\mu - \frac{k\nu + n\bar x}{k+n}\Big)^2 + \frac{nk}{n+k}(\bar x-\nu)^2 + \sum (x_i-\bar x)^2.$$

68 Example 2 - Conjugate priors. Thus the posterior is
$$\pi(\tau, \mu \mid x) \propto \tau^{\alpha'-1/2} \exp\Big[-\tau\Big\{\beta' + \frac{k'}{2}(\nu'-\mu)^2\Big\}\Big]$$
where
$$\alpha' = \alpha + \frac{n}{2}, \qquad \beta' = \beta + \frac{nk}{2(n+k)}(\bar x-\nu)^2 + \frac{1}{2}\sum (x_i-\bar x)^2, \qquad k' = k + n, \qquad \nu' = \frac{k\nu + n\bar x}{k+n}.$$
This is the same form as the prior, so the class is a conjugate prior.
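A sketch of this Normal-Gamma update as code (the function name, prior settings, and simulated data are illustrative assumptions; the formulas follow the slide):

```python
import numpy as np

def normal_gamma_update(alpha, beta_, k, nu, x):
    """Return (alpha', beta', k', nu') given data x and a Normal-Gamma prior."""
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    alpha2 = alpha + n / 2
    beta2 = (beta_ + n * k * (xbar - nu) ** 2 / (2 * (n + k))
             + ((x - xbar) ** 2).sum() / 2)
    return alpha2, beta2, k + n, (k * nu + n * xbar) / (k + n)

rng = np.random.default_rng(1)
print(normal_gamma_update(2.0, 1.0, 1.0, 0.0, rng.normal(5.0, 2.0, size=50)))
```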

71 Example 2 - Continued. If we are interested in the posterior distribution of $\mu$ alone,
$$\pi(\mu \mid x) = \int \pi(\tau, \mu \mid x)\,d\tau = \int \pi(\mu \mid \tau, x)\,\pi(\tau \mid x)\,d\tau.$$
Here we have a simplification if we assume $2\alpha = m \in \mathbb{N}$. Then $\tau = W/2\beta$ and $\mu = \nu + Z/\sqrt{k\tau}$ with $W \sim \chi^2_m$ and $Z \sim N(0,1)$. Recalling that $Z/\sqrt{W/m} \sim t_m$ (Student's $t$ with $m$ d.f.), we see that the prior of $\mu$ is given by
$$\sqrt{\frac{km}{2\beta}}\,(\mu - \nu) \sim t_m$$
and the posterior by
$$\sqrt{\frac{k'm'}{2\beta'}}\,(\mu - \nu') \sim t_{m'}, \qquad m' = m + n.$$
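A simulation sketch of this marginal (all parameter values are illustrative): draw $\tau$ from a Gamma($\alpha$, rate $\beta$), then $\mu \mid \tau \sim N(\nu, 1/(k\tau))$, and compare a tail probability of the scaled deviation with the $t_m$ distribution, $m = 2\alpha$:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(2)
alpha, beta_, k, nu = 3.0, 2.0, 1.5, 0.0     # so m = 2*alpha = 6
m = 2 * alpha

tau = rng.gamma(shape=alpha, scale=1 / beta_, size=200_000)  # Gamma(alpha, rate beta)
mu = rng.normal(nu, 1 / np.sqrt(k * tau))                    # mu | tau ~ N(nu, 1/(k*tau))

z = np.sqrt(k * m / (2 * beta_)) * (mu - nu)
print((z > 1.0).mean(), t.sf(1.0, df=m))     # the two tail probabilities should agree
```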

74 Example: estimating the probability of female birth given placenta previa. Result of a German study: 980 births, 437 female. In the general population the proportion of female births is 0.485. Using a uniform (Beta(1,1)) prior, the posterior is Beta(438, 544):
post. mean = 0.446, post. std dev = 0.016, central 95% post. interval = [0.415, 0.477].
Sensitivity to the proposed prior can be examined by varying the "prior sample size" implied by the Beta parameters.
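A sketch of the posterior summaries for this example, computed directly from the Beta(438, 544) posterior:

```python
from scipy.stats import beta

post = beta(438, 544)
print(round(post.mean(), 3), round(post.std(), 3))       # 0.446 0.016
print([round(q, 3) for q in post.ppf([0.025, 0.975])])   # [0.415, 0.477]
```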
