Foundations of Statistical Inference
Julien Berestycki
Department of Statistics, University of Oxford
SB2a, MT 2016
Lecture 6: Bayesian Inference
Ideas of probability

The majority of the statistics you have learned so far is classical, or Frequentist: the probability of an event is defined as the proportion of successes in an infinite number of repeatable trials.

By contrast, in subjective Bayesian inference, probability is a measure of the strength of belief.

We treat parameters as random variables. Before collecting any data we assume that there is uncertainty about the value of a parameter; this uncertainty can be formalised by specifying a pdf (or pmf) for the parameter. We then conduct an experiment to collect data that give us information about the parameter, and use Bayes' Theorem to combine our prior beliefs with the data into an updated assessment of our uncertainty about the parameter.
The history of Bayesian statistics

Bayesian methods originated with Bayes and Laplace (late 1700s to mid 1800s).

In the early 1920s, Fisher put forward an opposing viewpoint: statistical inference must be based entirely on probabilities with a direct experimental interpretation, i.e. the repeated sampling principle.

In 1939, Jeffreys' book The Theory of Probability started a resurgence of interest in Bayesian inference. This continued over the following decades, especially as problems with the Frequentist approach started to emerge.

The development of simulation-based inference has transformed Bayesian statistics in recent decades, and it now plays a prominent part in modern statistics.
Problems with Frequentist inference - I

Frequentist inference generally does not condition on the observed data.

A confidence interval is a set-valued function C(X) ⊆ Θ of the data X which covers the parameter, θ ∈ C(X), for a fraction 1 − α of repeated draws of X taken under the null H0.

This is not the same as the statement that, given data X = x, the interval C(x) covers θ with probability 1 − α. But this is the type of statement we might wish to make. (Observe that this statement makes sense iff θ is a random variable.)

Example 1. Suppose X1, X2 ~ U(θ − 1/2, θ + 1/2), and let X(1), X(2) be the order statistics. Then C(X) = [X(1), X(2)] is a 50% (α = 1/2) confidence interval for θ. Suppose that in your data X = x you find x(2) − x(1) > 1/2 (this happens in a quarter of data sets). Then θ ∈ [x(1), x(2)] with probability one.
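Example 1 can be checked by a quick Monte Carlo simulation (a sketch; the simulation is not part of the lecture). It estimates the overall coverage of [X(1), X(2)], the fraction of data sets with spread greater than 1/2, and the coverage conditional on that event, which is exactly one.

```python
import random

random.seed(1)

def simulate(n_rep=100_000, theta=0.0):
    cover_all = 0    # data sets where theta is in [x(1), x(2)]
    wide = 0         # data sets with x(2) - x(1) > 1/2
    cover_wide = 0   # coverage among the wide data sets
    for _ in range(n_rep):
        x1 = random.uniform(theta - 0.5, theta + 0.5)
        x2 = random.uniform(theta - 0.5, theta + 0.5)
        lo, hi = min(x1, x2), max(x1, x2)
        covered = lo <= theta <= hi
        cover_all += covered
        if hi - lo > 0.5:
            wide += 1
            cover_wide += covered
    return cover_all / n_rep, wide / n_rep, cover_wide / wide

overall, frac_wide, conditional = simulate()
```

The unconditional coverage comes out near 1/2 and the wide-spread fraction near 1/4, while the conditional coverage is exactly 1: both observations lie within 1/2 of θ, so an interval longer than 1/2 must contain θ.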
Problems with Frequentist inference - II

Frequentist inference depends on data that were never observed.

The likelihood principle. Suppose that two experiments E1, E2 relating to θ give rise to data y1, y2 such that the corresponding likelihoods are proportional, that is, for all θ,
L(θ; y1, E1) = c L(θ; y2, E2).
Then the two experiments lead to identical conclusions about θ.

Key point: MLEs respect the likelihood principle, i.e. the MLEs for θ are identical in both experiments. But significance tests do not respect the likelihood principle.
Consider a pure test of significance where we specify just H0. We must choose a test statistic T(x), and define the p-value for data T(x) = t as
p-value = P(T(X) at least as extreme as t | H0).
The choice of T(X) amounts to a statement about the direction of likely departures from the null, which requires some consideration of alternative models.

Note 1. The calculation of the p-value involves a sum (or integral) over data that were not observed, and this can depend on the form of the experiment.

Note 2. A p-value is not P(H0 | T(X) = t).
Example. A Bernoulli trial succeeds with probability p.

E1: fix n1 Bernoulli trials, count the number y1 of successes.
E2: count the number n2 of Bernoulli trials needed to get a fixed number y2 of successes.

L(p; y1, E1) = (n1 choose y1) p^y1 (1 − p)^(n1 − y1)   (binomial)
L(p; n2, E2) = (n2 − 1 choose y2 − 1) p^y2 (1 − p)^(n2 − y2)   (negative binomial)

If n1 = n2 = n and y1 = y2 = y then L(p; y1, E1) ∝ L(p; n2, E2), so the MLEs for p are the same under E1 and E2.
Example. But significance tests contradict each other: e.g. test H0: p = 1/2 against H1: p < 1/2, and suppose n = 12 and y = 3.

The p-value based on E1 is
P(Y ≤ y | p = 1/2) = Σ_{k=0}^{y} (n choose k) (1/2)^k (1 − 1/2)^(n − k) = 0.073,

while the p-value based on E2 is
P(N ≥ n | p = 1/2) = Σ_{k=n}^{∞} (k − 1 choose y − 1) (1/2)^y (1 − 1/2)^(k − y) = 0.033,

so the two experiments lead to different conclusions at significance level α = 0.05.

Note. The p-values disagree because they sum over portions of two different sample spaces.
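The two p-values above are easy to verify numerically (a sketch, not lecture code). For E2, the event N ≥ n is equivalent to observing fewer than y successes in the first n − 1 trials, which turns the infinite negative binomial tail into a finite binomial sum.

```python
from math import comb

n, y = 12, 3
p = 0.5

# E1 (binomial): P(Y <= y) with Y ~ Bin(n, p)
p1 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(y + 1))

# E2 (negative binomial): P(N >= n), where N is the number of trials
# needed for y successes; equivalently, fewer than y successes in the
# first n - 1 trials.
p2 = sum(comb(n - 1, j) * p**j * (1 - p)**(n - 1 - j) for j in range(y))
```

This gives p1 ≈ 0.073 and p2 ≈ 0.033, straddling the conventional 0.05 threshold despite identical likelihoods.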
Bayesian inference - Revision

Likelihood f(x|θ) and prior distribution π(θ) for ϑ. The posterior distribution of ϑ at ϑ = θ, given x, is
π(θ|x) = f(x|θ) π(θ) / ∫ f(x|θ) π(θ) dθ,   i.e.   π(θ|x) ∝ f(x|θ) π(θ):
posterior ∝ likelihood × prior.

The same form holds for θ continuous (π(θ|x) a pdf) or discrete (π(θ|x) a pmf). We call ∫ f(x|θ) π(θ) dθ the marginal likelihood.

Likelihood principle. Notice that if we base all inference on the posterior distribution, then we respect the likelihood principle: if two likelihood functions are proportional, the constant cancels top and bottom in Bayes' rule, and the two posterior distributions are the same.
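For a parameter taking finitely many values, Bayes' rule is a one-line computation. The following sketch (the function name and the toy coin example are illustrative, not from the lecture) normalises likelihood × prior over a discrete grid:

```python
# Posterior over a discrete set of parameter values: posterior ∝ likelihood × prior.
def posterior(prior, likelihood):
    """prior: dict theta -> pi(theta); likelihood: dict theta -> f(x|theta)."""
    unnorm = {t: likelihood[t] * prior[t] for t in prior}
    z = sum(unnorm.values())  # marginal likelihood f(x), the normalising constant
    return {t: w / z for t, w in unnorm.items()}

# Coin with three candidate biases, one observed head: f(head | theta) = theta.
prior = {0.2: 1/3, 0.5: 1/3, 0.8: 1/3}
post = posterior(prior, {t: t for t in prior})
```

Scaling the likelihood by any constant c leaves `post` unchanged, since c cancels in the normalisation: a direct finite illustration of the likelihood principle.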
Example 1

X ~ Bin(n, ϑ) for known n and unknown ϑ. Suppose our prior knowledge about ϑ is represented by a Beta distribution on (0, 1), and θ is a trial value for ϑ:
π(θ) = θ^(a−1) (1 − θ)^(b−1) / B(a, b),   0 < θ < 1.

[Figure: Beta prior densities for (a, b) = (1, 1), (20, 20), (1, 3) and (8, ...).]
Example 1

Prior probability density:
π(θ) = θ^(a−1) (1 − θ)^(b−1) / B(a, b),   0 < θ < 1.

Likelihood:
f(x|θ) = (n choose x) θ^x (1 − θ)^(n−x),   x = 0, ..., n.

Posterior probability density:
π(θ|x) ∝ likelihood × prior
       ∝ θ^(a−1) (1 − θ)^(b−1) × θ^x (1 − θ)^(n−x)
       = θ^(a+x−1) (1 − θ)^(b+n−x−1).

Here the posterior has the same form as the prior (conjugacy), with parameters a, b replaced by a + x, b + n − x, so
π(θ|x) = θ^(a+x−1) (1 − θ)^(b+n−x−1) / B(a + x, b + n − x).
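The conjugate update above is just parameter arithmetic, so it can be sketched in a couple of lines (function name and example numbers are illustrative):

```python
# Beta-Binomial conjugate update: prior Beta(a, b), x successes in n trials,
# posterior Beta(a + x, b + n - x).
def beta_binomial_update(a, b, x, n):
    return a + x, b + (n - x)

# Uniform Beta(1, 1) prior, 7 successes out of 10 trials.
a_post, b_post = beta_binomial_update(1, 1, 7, 10)
post_mean = a_post / (a_post + b_post)   # (a + x) / (a + b + n)
```

No integration is needed: conjugacy reduces the whole Bayesian computation to updating (a, b).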
Example 1

For a Beta distribution with parameters a, b,
µ = a / (a + b),   σ² = ab / ((a + b)² (a + b + 1)).

The posterior mean and variance are therefore
(a + X) / (a + b + n)   and   (a + X)(b + n − X) / ((a + b + n)² (a + b + n + 1)).

Suppose X = n and we set a = b = 1 for our prior. Then the posterior mean is (n + 1)/(n + 2), i.e. when we observe events of just one type our point estimate is not 0 or 1 (which is sensible, especially for small sample sizes).
Example 1

For large n, the posterior mean and variance are approximately X/n and X(n − X)/n³. In classical statistics the corresponding estimates are
θ̂ = X/n,   θ̂(1 − θ̂)/n = X(n − X)/n³.
Example 1

Suppose ϑ = 0.3 but the prior mean is 0.7 with standard deviation 0.1. Suppose the data are X ~ Bin(n, ϑ) with n = 10 (yielding X = 3) and then n = 100 (yielding X = 30, say).

[Figure: prior density and posterior densities for n = 10 and n = 100.]

As n increases, the likelihood overwhelms the information in the prior.
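This example can be reproduced numerically (a sketch, not lecture code): invert the Beta mean and variance formulas to recover the prior parameters, which here give Beta(14, 6), then apply the conjugate update for each data set.

```python
# Recover the Beta(a, b) prior with mean 0.7 and sd 0.1, then apply the
# conjugate Beta-Binomial update for the two data sets above.
mu, sd = 0.7, 0.1
s = mu * (1 - mu) / sd**2 - 1     # a + b, from mu = a/(a+b), sd^2 = ab/((a+b)^2 (a+b+1))
a, b = mu * s, (1 - mu) * s       # Beta(14, 6)

for n, x in [(10, 3), (100, 30)]:
    a_post, b_post = a + x, b + n - x
    mean = a_post / (a_post + b_post)
    print(f"n={n}: posterior mean = {mean:.3f}")
```

The posterior mean moves from 0.7 (prior) to about 0.57 at n = 10 and about 0.37 at n = 100, illustrating how the data progressively overwhelm a misplaced prior.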
Example 2 - Conjugate priors

Normal distribution when the mean and variance are unknown. Let τ = 1/σ² and θ = (τ, µ); τ is called the precision. The prior: τ has a Gamma distribution with parameters α, β > 0, and conditional on τ, µ has a N(ν, 1/(kτ)) distribution for some k > 0, ν ∈ R. The prior is
π(τ, µ) = (β^α / Γ(α)) τ^(α−1) e^(−βτ) × (2π)^(−1/2) (kτ)^(1/2) exp{−(kτ/2)(µ − ν)²},
or
π(τ, µ) ∝ τ^(α−1/2) exp[−τ{β + (k/2)(µ − ν)²}].

The likelihood is
f(x|µ, τ) = (2π)^(−n/2) τ^(n/2) exp{−(τ/2) Σ_{i=1}^{n} (x_i − µ)²}.
Example 2 - Conjugate priors

Thus
π(τ, µ|x) ∝ τ^(α+n/2−1/2) exp[−τ{β + (k/2)(µ − ν)² + (1/2) Σ_{i=1}^{n} (x_i − µ)²}].

Complete the square to see that
k(µ − ν)² + Σ (x_i − µ)² = (k + n)(µ − (kν + n x̄)/(k + n))² + (nk/(n + k))(x̄ − ν)² + Σ (x_i − x̄)².
Example 2 - Completing the square

The posterior dependence on µ is entirely in the factor
exp[−τ{β + (k/2)(µ − ν)² + (1/2) Σ_{i=1}^{n} (x_i − µ)²}].

The idea is to write this as
c exp[−τ{β′ + (k′/2)(ν′ − µ)²}]
for new constants β′, k′, ν′, so that (conditional on τ) we recognise a Normal density:
k(µ − ν)² + Σ (x_i − µ)² = µ²(k + n) − µ(2kν + 2Σx_i) + ...
  = (k + n)(µ − (kν + n x̄)/(k + n))² + (nk/(n + k))(x̄ − ν)² + Σ (x_i − x̄)².
Example 2 - Conjugate priors

Thus the posterior is
π(τ, µ|x) ∝ τ^(α′−1/2) exp[−τ{β′ + (k′/2)(ν′ − µ)²}],
where
α′ = α + n/2
β′ = β + (1/2)(nk/(n + k))(x̄ − ν)² + (1/2) Σ (x_i − x̄)²
k′ = k + n
ν′ = (kν + n x̄)/(k + n).

This is the same form as the prior, so the class is a conjugate prior.
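The four update equations above translate directly into code (a sketch; the function name and example numbers are illustrative):

```python
# Normal-Gamma conjugate update: prior parameters (alpha, beta, k, nu),
# data xs, posterior parameters (alpha', beta', k', nu') as derived above.
def normal_gamma_update(alpha, beta, k, nu, xs):
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)  # sum of squares about the sample mean
    alpha_p = alpha + n / 2
    beta_p = beta + 0.5 * n * k / (n + k) * (xbar - nu) ** 2 + 0.5 * ss
    k_p = k + n
    nu_p = (k * nu + n * xbar) / (k + n)
    return alpha_p, beta_p, k_p, nu_p

params = normal_gamma_update(alpha=2.0, beta=1.0, k=1.0, nu=0.0, xs=[1.0, 2.0, 3.0])
```

Note that ν′ is a precision-weighted average of the prior mean ν (weight k) and the sample mean x̄ (weight n), so the prior acts like k pseudo-observations at ν.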
Example 2 - Continued

If we are interested in the posterior distribution of µ alone,
π(µ|x) = ∫ π(τ, µ|x) dτ = ∫ π(µ|τ, x) π(τ|x) dτ.

Here we have a simplification if we assume 2α = m ∈ N. Then τ = W/(2β) and µ = ν + Z/√(kτ) with W ~ χ²_m and Z ~ N(0, 1). Recalling that Z √(m/W) ~ t_m (Student's t with m degrees of freedom), we see that under the prior
√(km/(2β)) (µ − ν) ~ t_m,
and under the posterior
√(k′m′/(2β′)) (µ − ν′) ~ t_{m′},   m′ = m + n.
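The t_m marginal can be checked by simulation (a sketch under assumed prior values; not lecture code): draw τ from its Gamma representation W/(2β) with W ~ χ²_m, draw µ | τ ~ N(ν, 1/(kτ)), and compare the variance of √(km/(2β))(µ − ν) with the t_m variance m/(m − 2).

```python
import random

random.seed(0)

# Check that sqrt(k*m/(2*beta)) * (mu - nu) ~ t_m under the prior.
m, beta, k, nu = 6, 2.0, 1.5, 0.0   # assumes 2*alpha = m = 6
N = 100_000
vals = []
for _ in range(N):
    w = sum(random.gauss(0, 1) ** 2 for _ in range(m))  # W ~ chi^2_m
    tau = w / (2 * beta)                                # tau = W / (2*beta)
    mu = random.gauss(nu, 1 / (k * tau) ** 0.5)         # mu | tau ~ N(nu, 1/(k*tau))
    vals.append((k * m / (2 * beta)) ** 0.5 * (mu - nu))

var = sum(v * v for v in vals) / N   # should be close to m/(m-2) = 1.5
```

The empirical variance lands near 1.5, the variance of a t distribution with 6 degrees of freedom, consistent with the derivation above.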
Example: estimating the probability of female birth given placenta previa

Result of a German study: 980 births, 437 females. In the general population the proportion of female births is 0.485.

Using a uniform (Beta(1, 1)) prior, the posterior is Beta(438, 544):
posterior mean = 0.446
posterior std. dev. = 0.016
central 95% posterior interval = [0.415, 0.477]

Sensitivity to the proposed prior: a + b plays the role of a "prior sample size".
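The posterior summaries can be reproduced by simulating from Beta(438, 544) (a sketch; the original figures were presumably computed analytically or with standard software):

```python
import random

random.seed(0)

# Posterior Beta(438, 544) for the placenta previa example: simulate draws
# and read off the mean and the central 95% interval from the order statistics.
a, b = 438, 544
draws = sorted(random.betavariate(a, b) for _ in range(100_000))

mean = sum(draws) / len(draws)
lo, hi = draws[2_500], draws[97_500]   # empirical 2.5% and 97.5% quantiles
```

The simulated mean is about 0.446 and the interval about [0.415, 0.477], and the general-population value 0.485 lies outside it.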
More informationUnobservable Parameter. Observed Random Sample. Calculate Posterior. Choosing Prior. Conjugate prior. population proportion, p prior:
Pi Priors Unobservable Parameter population proportion, p prior: π ( p) Conjugate prior π ( p) ~ Beta( a, b) same PDF family exponential family only Posterior π ( p y) ~ Beta( a + y, b + n y) Observed
More informationModule 22: Bayesian Methods Lecture 9 A: Default prior selection
Module 22: Bayesian Methods Lecture 9 A: Default prior selection Peter Hoff Departments of Statistics and Biostatistics University of Washington Outline Jeffreys prior Unit information priors Empirical
More informationStatistical Theory MT 2007 Problems 4: Solution sketches
Statistical Theory MT 007 Problems 4: Solution sketches 1. Consider a 1-parameter exponential family model with density f(x θ) = f(x)g(θ)exp{cφ(θ)h(x)}, x X. Suppose that the prior distribution has the
More informationBayesian Statistics. Debdeep Pati Florida State University. February 11, 2016
Bayesian Statistics Debdeep Pati Florida State University February 11, 2016 Historical Background Historical Background Historical Background Brief History of Bayesian Statistics 1764-1838: called probability
More informationConditional distributions
Conditional distributions Will Monroe July 6, 017 with materials by Mehran Sahami and Chris Piech Independence of discrete random variables Two random variables are independent if knowing the value of
More informationPart III. A Decision-Theoretic Approach and Bayesian testing
Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to
More informationBayesian Inference: Posterior Intervals
Bayesian Inference: Posterior Intervals Simple values like the posterior mean E[θ X] and posterior variance var[θ X] can be useful in learning about θ. Quantiles of π(θ X) (especially the posterior median)
More informationPart 3 Robust Bayesian statistics & applications in reliability networks
Tuesday 9:00-12:30 Part 3 Robust Bayesian statistics & applications in reliability networks by Gero Walter 69 Robust Bayesian statistics & applications in reliability networks Outline Robust Bayesian Analysis
More informationSTAT 425: Introduction to Bayesian Analysis
STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 1) Fall 2017 1 / 10 Lecture 7: Prior Types Subjective
More informationWeakness of Beta priors (or conjugate priors in general) They can only represent a limited range of prior beliefs. For example... There are no bimodal beta distributions (except when the modes are at 0
More informationInference for a Population Proportion
Al Nosedal. University of Toronto. November 11, 2015 Statistical inference is drawing conclusions about an entire population based on data in a sample drawn from that population. From both frequentist
More informationECE521 W17 Tutorial 6. Min Bai and Yuhuai (Tony) Wu
ECE521 W17 Tutorial 6 Min Bai and Yuhuai (Tony) Wu Agenda knn and PCA Bayesian Inference k-means Technique for clustering Unsupervised pattern and grouping discovery Class prediction Outlier detection
More informationStat 5102 Final Exam May 14, 2015
Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions
More informationBayesian statistics, simulation and software
Module 3: Bayesian principle, binomial model and conjugate priors Department of Mathematical Sciences Aalborg University 1/14 Motivating example: Spelling correction (Adapted from Bayesian Data Analysis
More informationLecture 2: Statistical Decision Theory (Part I)
Lecture 2: Statistical Decision Theory (Part I) Hao Helen Zhang Hao Helen Zhang Lecture 2: Statistical Decision Theory (Part I) 1 / 35 Outline of This Note Part I: Statistics Decision Theory (from Statistical
More informationCS 361: Probability & Statistics
October 17, 2017 CS 361: Probability & Statistics Inference Maximum likelihood: drawbacks A couple of things might trip up max likelihood estimation: 1) Finding the maximum of some functions can be quite
More informationChapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1
Chapter 4 HOMEWORK ASSIGNMENTS These homeworks may be modified as the semester progresses. It is your responsibility to keep up to date with the correctly assigned homeworks. There may be some errors in
More informationCOS513 LECTURE 8 STATISTICAL CONCEPTS
COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions
More informationCSC321 Lecture 18: Learning Probabilistic Models
CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling
More informationSTAT 830 Bayesian Estimation
STAT 830 Bayesian Estimation Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Bayesian Estimation STAT 830 Fall 2011 1 / 23 Purposes of These
More informationStat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.
Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 CS students: don t forget to re-register in CS-535D. Even if you just audit this course, please do register.
More informationBasics on Probability. Jingrui He 09/11/2007
Basics on Probability Jingrui He 09/11/2007 Coin Flips You flip a coin Head with probability 0.5 You flip 100 coins How many heads would you expect Coin Flips cont. You flip a coin Head with probability
More informationStatistical Theory MT 2006 Problems 4: Solution sketches
Statistical Theory MT 006 Problems 4: Solution sketches 1. Suppose that X has a Poisson distribution with unknown mean θ. Determine the conjugate prior, and associate posterior distribution, for θ. Determine
More informationBayesian statistics. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Bayesian statistics DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall15 Carlos Fernandez-Granda Frequentist vs Bayesian statistics In frequentist statistics
More informationHPD Intervals / Regions
HPD Intervals / Regions The HPD region will be an interval when the posterior is unimodal. If the posterior is multimodal, the HPD region might be a discontiguous set. Picture: The set {θ : θ (1.5, 3.9)
More informationBayesian Approach 2. CSC412 Probabilistic Learning & Reasoning
CSC412 Probabilistic Learning & Reasoning Lecture 12: Bayesian Parameter Estimation February 27, 2006 Sam Roweis Bayesian Approach 2 The Bayesian programme (after Rev. Thomas Bayes) treats all unnown quantities
More informationData Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Why uncertainty? Why should data mining care about uncertainty? We
More information1 Introduction. P (n = 1 red ball drawn) =
Introduction Exercises and outline solutions. Y has a pack of 4 cards (Ace and Queen of clubs, Ace and Queen of Hearts) from which he deals a random of selection 2 to player X. What is the probability
More informationStatistical Data Analysis Stat 3: p-values, parameter estimation
Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,
More informationError analysis for efficiency
Glen Cowan RHUL Physics 28 July, 2008 Error analysis for efficiency To estimate a selection efficiency using Monte Carlo one typically takes the number of events selected m divided by the number generated
More informationGeneral Bayesian Inference I
General Bayesian Inference I Outline: Basic concepts, One-parameter models, Noninformative priors. Reading: Chapters 10 and 11 in Kay-I. (Occasional) Simplified Notation. When there is no potential for
More informationBayes Factors for Grouped Data
Bayes Factors for Grouped Data Lizanne Raubenheimer and Abrie J. van der Merwe 2 Department of Statistics, Rhodes University, Grahamstown, South Africa, L.Raubenheimer@ru.ac.za 2 Department of Mathematical
More informationCS 361: Probability & Statistics
March 14, 2018 CS 361: Probability & Statistics Inference The prior From Bayes rule, we know that we can express our function of interest as Likelihood Prior Posterior The right hand side contains the
More informationIntroduction to Probability and Statistics (Continued)
Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:
More informationStatistics: Learning models from data
DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial
More informationStat 451 Lecture Notes Numerical Integration
Stat 451 Lecture Notes 03 12 Numerical Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange 2 Updated: February 11, 2016 1 / 29
More informationOutline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution
Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model
More informationLecture 7 October 13
STATS 300A: Theory of Statistics Fall 2015 Lecture 7 October 13 Lecturer: Lester Mackey Scribe: Jing Miao and Xiuyuan Lu 7.1 Recap So far, we have investigated various criteria for optimal inference. We
More informationBeta statistics. Keywords. Bayes theorem. Bayes rule
Keywords Beta statistics Tommy Norberg tommy@chalmers.se Mathematical Sciences Chalmers University of Technology Gothenburg, SWEDEN Bayes s formula Prior density Likelihood Posterior density Conjugate
More information(4) One-parameter models - Beta/binomial. ST440/550: Applied Bayesian Statistics
Estimating a proportion using the beta/binomial model A fundamental task in statistics is to estimate a proportion using a series of trials: What is the success probability of a new cancer treatment? What
More informationA Discussion of the Bayesian Approach
A Discussion of the Bayesian Approach Reference: Chapter 10 of Theoretical Statistics, Cox and Hinkley, 1974 and Sujit Ghosh s lecture notes David Madigan Statistics The subject of statistics concerns
More informationBoxplots A boxplot, or box-and-whisker plot, is a convenient way of summarising data, particularly when the data is made up of several groups.
Boxplots A boxplot, or box-and-whisker plot, is a convenient way of summarising data, particularly when the data is made up of several groups. 75 8 85 9 95 Boxplot largest value (there are no extreme values)
More information1 Hypothesis Testing and Model Selection
A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection
More informationBayesian inference: what it means and why we care
Bayesian inference: what it means and why we care Robin J. Ryder Centre de Recherche en Mathématiques de la Décision Université Paris-Dauphine 6 November 2017 Mathematical Coffees Robin Ryder (Dauphine)
More informationHypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006
Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)
More informationSTAT J535: Chapter 5: Classes of Bayesian Priors
STAT J535: Chapter 5: Classes of Bayesian Priors David B. Hitchcock E-Mail: hitchcock@stat.sc.edu Spring 2012 The Bayesian Prior A prior distribution must be specified in a Bayesian analysis. The choice
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationOne-parameter models
One-parameter models Patrick Breheny January 22 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/17 Introduction Binomial data is not the only example in which Bayesian solutions can be worked
More informationAn Introduction to Bayesian Linear Regression
An Introduction to Bayesian Linear Regression APPM 5720: Bayesian Computation Fall 2018 A SIMPLE LINEAR MODEL Suppose that we observe explanatory variables x 1, x 2,..., x n and dependent variables y 1,
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2
Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate
More informationLinear Models A linear model is defined by the expression
Linear Models A linear model is defined by the expression x = F β + ɛ. where x = (x 1, x 2,..., x n ) is vector of size n usually known as the response vector. β = (β 1, β 2,..., β p ) is the transpose
More informationLECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)
LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered
More informationBayesian Inference. STA 121: Regression Analysis Artin Armagan
Bayesian Inference STA 121: Regression Analysis Artin Armagan Bayes Rule...s! Reverend Thomas Bayes Posterior Prior p(θ y) = p(y θ)p(θ)/p(y) Likelihood - Sampling Distribution Normalizing Constant: p(y
More informationBayesian Inference: Concept and Practice
Inference: Concept and Practice fundamentals Johan A. Elkink School of Politics & International Relations University College Dublin 5 June 2017 1 2 3 Bayes theorem In order to estimate the parameters of
More informationLecture 1: Introduction
Principles of Statistics Part II - Michaelmas 208 Lecturer: Quentin Berthet Lecture : Introduction This course is concerned with presenting some of the mathematical principles of statistical theory. One
More informationConfidence Intervals. First ICFA Instrumentation School/Workshop. Harrison B. Prosper Florida State University
Confidence Intervals First ICFA Instrumentation School/Workshop At Morelia,, Mexico, November 18-29, 2002 Harrison B. Prosper Florida State University Outline Lecture 1 Introduction Confidence Intervals
More informationProbability. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. August 2014
Probability Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh August 2014 (All of the slides in this course have been adapted from previous versions
More informationECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4
ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian
More informationPhysics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester
Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability
More informationBayesian Inference for Normal Mean
Al Nosedal. University of Toronto. November 18, 2015 Likelihood of Single Observation The conditional observation distribution of y µ is Normal with mean µ and variance σ 2, which is known. Its density
More informationIntroduction to Machine Learning
Introduction to Machine Learning Generative Models Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1
More informationFinal Examination. STA 215: Statistical Inference. Saturday, 2001 May 5, 9:00am 12:00 noon
Final Examination Saturday, 2001 May 5, 9:00am 12:00 noon This is an open-book examination, but you may not share materials. A normal distribution table, a PMF/PDF handout, and a blank worksheet are attached
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationStatistics for scientists and engineers
Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3
More information