Foundations of Statistical Inference
Julien Berestycki
Department of Statistics, University of Oxford
SB2a, MT 2016
Lecture 6: Bayesian Inference
Ideas of probability

The majority of the statistics you have learned so far is classical, or Frequentist: the probability of an event is defined as the proportion of successes in an infinite number of repeatable trials.

By contrast, in subjective Bayesian inference, probability is a measure of the strength of belief.

We treat parameters as random variables. Before collecting any data we assume that there is uncertainty about the value of a parameter; this uncertainty can be formalised by specifying a pdf (or pmf) for the parameter. We then conduct an experiment to collect data that give us information about the parameter, and use Bayes' Theorem to combine our prior beliefs with the data into an updated assessment of our uncertainty about the parameter.
The history of Bayesian statistics

Bayesian methods originated with Bayes and Laplace (late 1700s to mid 1800s).

In the early 1920s, Fisher put forward an opposing viewpoint: statistical inference must be based entirely on probabilities with a direct experimental interpretation, i.e. the repeated sampling principle.

In 1939, Jeffreys' book The Theory of Probability started a resurgence of interest in Bayesian inference. This continued over the following decades, especially as problems with the Frequentist approach started to emerge.

The development of simulation-based inference has transformed Bayesian statistics in recent decades, and it now plays a prominent part in modern statistics.
Problems with Frequentist inference - I

Frequentist inference generally does not condition on the observed data.

A confidence interval is a set-valued function C(X) ⊆ Θ of the data X which covers the parameter, θ ∈ C(X), for a fraction 1 − α of repeated draws of X taken under the null H0.

This is not the same as the statement that, given data X = x, the interval C(x) covers θ with probability 1 − α. But this is the type of statement we might wish to make. (Observe that this statement makes sense iff θ is a random variable.)

Example 1. Suppose X1, X2 ~ U(θ − 1/2, θ + 1/2), and let X(1), X(2) be the order statistics. Then C(X) = [X(1), X(2)] is a 50% (α = 1/2) confidence interval for θ. Suppose that in your data X = x you find x(2) − x(1) > 1/2 (this happens in a quarter of data sets). Then θ ∈ [x(1), x(2)] with probability one.
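Example 1 can be checked by a quick Monte Carlo simulation (a sketch; the simulation is not part of the lecture). It estimates the overall coverage of [X(1), X(2)], the fraction of data sets with spread greater than 1/2, and the coverage conditional on that event, which is exactly one.

```python
import random

random.seed(1)

def simulate(n_rep=100_000, theta=0.0):
    cover_all = 0    # data sets where theta is in [x(1), x(2)]
    wide = 0         # data sets with x(2) - x(1) > 1/2
    cover_wide = 0   # coverage among the wide data sets
    for _ in range(n_rep):
        x1 = random.uniform(theta - 0.5, theta + 0.5)
        x2 = random.uniform(theta - 0.5, theta + 0.5)
        lo, hi = min(x1, x2), max(x1, x2)
        covered = lo <= theta <= hi
        cover_all += covered
        if hi - lo > 0.5:
            wide += 1
            cover_wide += covered
    return cover_all / n_rep, wide / n_rep, cover_wide / wide

overall, frac_wide, conditional = simulate()
```

The unconditional coverage comes out near 1/2 and the wide-spread fraction near 1/4, while the conditional coverage is exactly 1: both observations lie within 1/2 of θ, so an interval longer than 1/2 must contain θ.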
Problems with Frequentist inference - II

Frequentist inference depends on data that were never observed.

The likelihood principle. Suppose that two experiments E1, E2 relating to θ give rise to data y1, y2 such that the corresponding likelihoods are proportional, that is, for all θ,
L(θ; y1, E1) = c L(θ; y2, E2).
Then the two experiments lead to identical conclusions about θ.

Key point: MLEs respect the likelihood principle, i.e. the MLEs for θ are identical in both experiments. But significance tests do not respect the likelihood principle.
Consider a pure test of significance where we specify just H0. We must choose a test statistic T(x), and define the p-value for data T(x) = t as
p-value = P(T(X) at least as extreme as t | H0).
The choice of T(X) amounts to a statement about the direction of likely departures from the null, which requires some consideration of alternative models.

Note 1. The calculation of the p-value involves a sum (or integral) over data that were not observed, and this can depend on the form of the experiment.

Note 2. A p-value is not P(H0 | T(X) = t).
Example. A Bernoulli trial succeeds with probability p.

E1: fix n1 Bernoulli trials, count the number y1 of successes.
E2: count the number n2 of Bernoulli trials needed to get a fixed number y2 of successes.

L(p; y1, E1) = (n1 choose y1) p^y1 (1 − p)^(n1 − y1)   (binomial)
L(p; n2, E2) = (n2 − 1 choose y2 − 1) p^y2 (1 − p)^(n2 − y2)   (negative binomial)

If n1 = n2 = n and y1 = y2 = y then L(p; y1, E1) ∝ L(p; n2, E2), so the MLEs for p are the same under E1 and E2.
Example. But significance tests contradict each other: e.g. test H0: p = 1/2 against H1: p < 1/2, and suppose n = 12 and y = 3.

The p-value based on E1 is
P(Y ≤ y | p = 1/2) = Σ_{k=0}^{y} (n choose k) (1/2)^k (1 − 1/2)^(n − k) = 0.073,

while the p-value based on E2 is
P(N ≥ n | p = 1/2) = Σ_{k=n}^{∞} (k − 1 choose y − 1) (1/2)^y (1 − 1/2)^(k − y) = 0.033,

so the two experiments lead to different conclusions at significance level α = 0.05.

Note. The p-values disagree because they sum over portions of two different sample spaces.
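The two p-values above are easy to verify numerically (a sketch, not lecture code). For E2, the event N ≥ n is equivalent to observing fewer than y successes in the first n − 1 trials, which turns the infinite negative binomial tail into a finite binomial sum.

```python
from math import comb

n, y = 12, 3
p = 0.5

# E1 (binomial): P(Y <= y) with Y ~ Bin(n, p)
p1 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(y + 1))

# E2 (negative binomial): P(N >= n), where N is the number of trials
# needed for y successes; equivalently, fewer than y successes in the
# first n - 1 trials.
p2 = sum(comb(n - 1, j) * p**j * (1 - p)**(n - 1 - j) for j in range(y))
```

This gives p1 ≈ 0.073 and p2 ≈ 0.033, straddling the conventional 0.05 threshold despite identical likelihoods.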
Bayesian inference - Revision

Likelihood f(x|θ) and prior distribution π(θ) for ϑ. The posterior distribution of ϑ at ϑ = θ, given x, is
π(θ|x) = f(x|θ) π(θ) / ∫ f(x|θ) π(θ) dθ,   i.e.   π(θ|x) ∝ f(x|θ) π(θ):
posterior ∝ likelihood × prior.

The same form holds for θ continuous (π(θ|x) a pdf) or discrete (π(θ|x) a pmf). We call ∫ f(x|θ) π(θ) dθ the marginal likelihood.

Likelihood principle. Notice that if we base all inference on the posterior distribution, then we respect the likelihood principle: if two likelihood functions are proportional, the constant cancels top and bottom in Bayes' rule, and the two posterior distributions are the same.
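For a parameter taking finitely many values, Bayes' rule is a one-line computation. The following sketch (the function name and the toy coin example are illustrative, not from the lecture) normalises likelihood × prior over a discrete grid:

```python
# Posterior over a discrete set of parameter values: posterior ∝ likelihood × prior.
def posterior(prior, likelihood):
    """prior: dict theta -> pi(theta); likelihood: dict theta -> f(x|theta)."""
    unnorm = {t: likelihood[t] * prior[t] for t in prior}
    z = sum(unnorm.values())  # marginal likelihood f(x), the normalising constant
    return {t: w / z for t, w in unnorm.items()}

# Coin with three candidate biases, one observed head: f(head | theta) = theta.
prior = {0.2: 1/3, 0.5: 1/3, 0.8: 1/3}
post = posterior(prior, {t: t for t in prior})
```

Scaling the likelihood by any constant c leaves `post` unchanged, since c cancels in the normalisation: a direct finite illustration of the likelihood principle.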
Example 1

X ~ Bin(n, ϑ) for known n and unknown ϑ. Suppose our prior knowledge about ϑ is represented by a Beta distribution on (0, 1), and θ is a trial value for ϑ:
π(θ) = θ^(a−1) (1 − θ)^(b−1) / B(a, b),   0 < θ < 1.

[Figure: Beta prior densities for (a, b) = (1, 1), (20, 20), (1, 3) and (8, ...).]
Example 1

Prior probability density:
π(θ) = θ^(a−1) (1 − θ)^(b−1) / B(a, b),   0 < θ < 1.

Likelihood:
f(x|θ) = (n choose x) θ^x (1 − θ)^(n−x),   x = 0, ..., n.

Posterior probability density:
π(θ|x) ∝ likelihood × prior
       ∝ θ^(a−1) (1 − θ)^(b−1) × θ^x (1 − θ)^(n−x)
       = θ^(a+x−1) (1 − θ)^(b+n−x−1).

Here the posterior has the same form as the prior (conjugacy), with parameters a, b replaced by a + x, b + n − x, so
π(θ|x) = θ^(a+x−1) (1 − θ)^(b+n−x−1) / B(a + x, b + n − x).
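The conjugate update above is just parameter arithmetic, so it can be sketched in a couple of lines (function name and example numbers are illustrative):

```python
# Beta-Binomial conjugate update: prior Beta(a, b), x successes in n trials,
# posterior Beta(a + x, b + n - x).
def beta_binomial_update(a, b, x, n):
    return a + x, b + (n - x)

# Uniform Beta(1, 1) prior, 7 successes out of 10 trials.
a_post, b_post = beta_binomial_update(1, 1, 7, 10)
post_mean = a_post / (a_post + b_post)   # (a + x) / (a + b + n)
```

No integration is needed: conjugacy reduces the whole Bayesian computation to updating (a, b).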
Example 1

For a Beta distribution with parameters a, b,
µ = a / (a + b),   σ² = ab / ((a + b)² (a + b + 1)).

The posterior mean and variance are therefore
(a + X) / (a + b + n)   and   (a + X)(b + n − X) / ((a + b + n)² (a + b + n + 1)).

Suppose X = n and we set a = b = 1 for our prior. Then the posterior mean is (n + 1)/(n + 2), i.e. when we observe events of just one type our point estimate is not 0 or 1 (which is sensible, especially for small sample sizes).
Example 1

For large n, the posterior mean and variance are approximately X/n and X(n − X)/n³. In classical statistics the corresponding estimates are
θ̂ = X/n,   θ̂(1 − θ̂)/n = X(n − X)/n³.
Example 1

Suppose ϑ = 0.3 but the prior mean is 0.7 with standard deviation 0.1. Suppose the data are X ~ Bin(n, ϑ) with n = 10 (yielding X = 3) and then n = 100 (yielding X = 30, say).

[Figure: prior density and posterior densities for n = 10 and n = 100.]

As n increases, the likelihood overwhelms the information in the prior.
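This example can be reproduced numerically (a sketch, not lecture code): invert the Beta mean and variance formulas to recover the prior parameters, which here give Beta(14, 6), then apply the conjugate update for each data set.

```python
# Recover the Beta(a, b) prior with mean 0.7 and sd 0.1, then apply the
# conjugate Beta-Binomial update for the two data sets above.
mu, sd = 0.7, 0.1
s = mu * (1 - mu) / sd**2 - 1     # a + b, from mu = a/(a+b), sd^2 = ab/((a+b)^2 (a+b+1))
a, b = mu * s, (1 - mu) * s       # Beta(14, 6)

for n, x in [(10, 3), (100, 30)]:
    a_post, b_post = a + x, b + n - x
    mean = a_post / (a_post + b_post)
    print(f"n={n}: posterior mean = {mean:.3f}")
```

The posterior mean moves from 0.7 (prior) to about 0.57 at n = 10 and about 0.37 at n = 100, illustrating how the data progressively overwhelm a misplaced prior.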
Example 2 - Conjugate priors

Normal distribution when the mean and variance are unknown. Let τ = 1/σ² and θ = (τ, µ); τ is called the precision. The prior: τ has a Gamma distribution with parameters α, β > 0, and conditional on τ, µ has a N(ν, 1/(kτ)) distribution for some k > 0, ν ∈ R. The prior is
π(τ, µ) = (β^α / Γ(α)) τ^(α−1) e^(−βτ) × (2π)^(−1/2) (kτ)^(1/2) exp{−(kτ/2)(µ − ν)²},
or
π(τ, µ) ∝ τ^(α−1/2) exp[−τ{β + (k/2)(µ − ν)²}].

The likelihood is
f(x|µ, τ) = (2π)^(−n/2) τ^(n/2) exp{−(τ/2) Σ_{i=1}^{n} (x_i − µ)²}.
Example 2 - Conjugate priors

Thus
π(τ, µ|x) ∝ τ^(α+n/2−1/2) exp[−τ{β + (k/2)(µ − ν)² + (1/2) Σ_{i=1}^{n} (x_i − µ)²}].

Complete the square to see that
k(µ − ν)² + Σ (x_i − µ)² = (k + n)(µ − (kν + n x̄)/(k + n))² + (nk/(n + k))(x̄ − ν)² + Σ (x_i − x̄)².
Example 2 - Completing the square

The posterior dependence on µ is entirely in the factor
exp[−τ{β + (k/2)(µ − ν)² + (1/2) Σ_{i=1}^{n} (x_i − µ)²}].

The idea is to write this as
c exp[−τ{β′ + (k′/2)(ν′ − µ)²}]
for new constants β′, k′, ν′, so that (conditional on τ) we recognise a Normal density:
k(µ − ν)² + Σ (x_i − µ)² = µ²(k + n) − µ(2kν + 2Σx_i) + ...
  = (k + n)(µ − (kν + n x̄)/(k + n))² + (nk/(n + k))(x̄ − ν)² + Σ (x_i − x̄)².
Example 2 - Conjugate priors

Thus the posterior is
π(τ, µ|x) ∝ τ^(α′−1/2) exp[−τ{β′ + (k′/2)(ν′ − µ)²}],
where
α′ = α + n/2
β′ = β + (1/2)(nk/(n + k))(x̄ − ν)² + (1/2) Σ (x_i − x̄)²
k′ = k + n
ν′ = (kν + n x̄)/(k + n).

This is the same form as the prior, so the class is a conjugate prior.
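The four update equations above translate directly into code (a sketch; the function name and example numbers are illustrative):

```python
# Normal-Gamma conjugate update: prior parameters (alpha, beta, k, nu),
# data xs, posterior parameters (alpha', beta', k', nu') as derived above.
def normal_gamma_update(alpha, beta, k, nu, xs):
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)  # sum of squares about the sample mean
    alpha_p = alpha + n / 2
    beta_p = beta + 0.5 * n * k / (n + k) * (xbar - nu) ** 2 + 0.5 * ss
    k_p = k + n
    nu_p = (k * nu + n * xbar) / (k + n)
    return alpha_p, beta_p, k_p, nu_p

params = normal_gamma_update(alpha=2.0, beta=1.0, k=1.0, nu=0.0, xs=[1.0, 2.0, 3.0])
```

Note that ν′ is a precision-weighted average of the prior mean ν (weight k) and the sample mean x̄ (weight n), so the prior acts like k pseudo-observations at ν.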
Example 2 - Continued

If we are interested in the posterior distribution of µ alone,
π(µ|x) = ∫ π(τ, µ|x) dτ = ∫ π(µ|τ, x) π(τ|x) dτ.

Here we have a simplification if we assume 2α = m ∈ N. Then τ = W/(2β) and µ = ν + Z/√(kτ) with W ~ χ²_m and Z ~ N(0, 1). Recalling that Z √(m/W) ~ t_m (Student's t with m degrees of freedom), we see that under the prior
√(km/(2β)) (µ − ν) ~ t_m,
and under the posterior
√(k′m′/(2β′)) (µ − ν′) ~ t_{m′},   m′ = m + n.
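The t_m marginal can be checked by simulation (a sketch under assumed prior values; not lecture code): draw τ from its Gamma representation W/(2β) with W ~ χ²_m, draw µ | τ ~ N(ν, 1/(kτ)), and compare the variance of √(km/(2β))(µ − ν) with the t_m variance m/(m − 2).

```python
import random

random.seed(0)

# Check that sqrt(k*m/(2*beta)) * (mu - nu) ~ t_m under the prior.
m, beta, k, nu = 6, 2.0, 1.5, 0.0   # assumes 2*alpha = m = 6
N = 100_000
vals = []
for _ in range(N):
    w = sum(random.gauss(0, 1) ** 2 for _ in range(m))  # W ~ chi^2_m
    tau = w / (2 * beta)                                # tau = W / (2*beta)
    mu = random.gauss(nu, 1 / (k * tau) ** 0.5)         # mu | tau ~ N(nu, 1/(k*tau))
    vals.append((k * m / (2 * beta)) ** 0.5 * (mu - nu))

var = sum(v * v for v in vals) / N   # should be close to m/(m-2) = 1.5
```

The empirical variance lands near 1.5, the variance of a t distribution with 6 degrees of freedom, consistent with the derivation above.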
Example: estimating the probability of female birth given placenta previa

Result of a German study: 980 births, 437 females. In the general population the proportion of female births is 0.485.

Using a uniform (Beta(1, 1)) prior, the posterior is Beta(438, 544):
posterior mean = 0.446
posterior std. dev. = 0.016
central 95% posterior interval = [0.415, 0.477]

Sensitivity to the proposed prior: a + b plays the role of a "prior sample size".
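The posterior summaries can be reproduced by simulating from Beta(438, 544) (a sketch; the original figures were presumably computed analytically or with standard software):

```python
import random

random.seed(0)

# Posterior Beta(438, 544) for the placenta previa example: simulate draws
# and read off the mean and the central 95% interval from the order statistics.
a, b = 438, 544
draws = sorted(random.betavariate(a, b) for _ in range(100_000))

mean = sum(draws) / len(draws)
lo, hi = draws[2_500], draws[97_500]   # empirical 2.5% and 97.5% quantiles
```

The simulated mean is about 0.446 and the interval about [0.415, 0.477], and the general-population value 0.485 lies outside it.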
More informationUnobservable Parameter. Observed Random Sample. Calculate Posterior. Choosing Prior. Conjugate prior. population proportion, p prior:
Pi Priors Unobservable Parameter population proportion, p prior: π ( p) Conjugate prior π ( p) ~ Beta( a, b) same PDF family exponential family only Posterior π ( p y) ~ Beta( a + y, b + n y) Observed
More informationModule 22: Bayesian Methods Lecture 9 A: Default prior selection
Module 22: Bayesian Methods Lecture 9 A: Default prior selection Peter Hoff Departments of Statistics and Biostatistics University of Washington Outline Jeffreys prior Unit information priors Empirical
More informationStatistical Theory MT 2007 Problems 4: Solution sketches
Statistical Theory MT 007 Problems 4: Solution sketches 1. Consider a 1-parameter exponential family model with density f(x θ) = f(x)g(θ)exp{cφ(θ)h(x)}, x X. Suppose that the prior distribution has the
More informationBayesian Statistics. Debdeep Pati Florida State University. February 11, 2016
Bayesian Statistics Debdeep Pati Florida State University February 11, 2016 Historical Background Historical Background Historical Background Brief History of Bayesian Statistics 1764-1838: called probability
More informationConditional distributions
Conditional distributions Will Monroe July 6, 017 with materials by Mehran Sahami and Chris Piech Independence of discrete random variables Two random variables are independent if knowing the value of
More informationPart III. A Decision-Theoretic Approach and Bayesian testing
Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to
More informationBayesian Inference: Posterior Intervals
Bayesian Inference: Posterior Intervals Simple values like the posterior mean E[θ X] and posterior variance var[θ X] can be useful in learning about θ. Quantiles of π(θ X) (especially the posterior median)
More informationPart 3 Robust Bayesian statistics & applications in reliability networks
Tuesday 9:00-12:30 Part 3 Robust Bayesian statistics & applications in reliability networks by Gero Walter 69 Robust Bayesian statistics & applications in reliability networks Outline Robust Bayesian Analysis
More informationSTAT 425: Introduction to Bayesian Analysis
STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 1) Fall 2017 1 / 10 Lecture 7: Prior Types Subjective
More informationWeakness of Beta priors (or conjugate priors in general) They can only represent a limited range of prior beliefs. For example... There are no bimodal beta distributions (except when the modes are at 0
More informationInference for a Population Proportion
Al Nosedal. University of Toronto. November 11, 2015 Statistical inference is drawing conclusions about an entire population based on data in a sample drawn from that population. From both frequentist
More informationECE521 W17 Tutorial 6. Min Bai and Yuhuai (Tony) Wu
ECE521 W17 Tutorial 6 Min Bai and Yuhuai (Tony) Wu Agenda knn and PCA Bayesian Inference k-means Technique for clustering Unsupervised pattern and grouping discovery Class prediction Outlier detection
More informationStat 5102 Final Exam May 14, 2015
Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions
More informationBayesian statistics, simulation and software
Module 3: Bayesian principle, binomial model and conjugate priors Department of Mathematical Sciences Aalborg University 1/14 Motivating example: Spelling correction (Adapted from Bayesian Data Analysis
More informationLecture 2: Statistical Decision Theory (Part I)
Lecture 2: Statistical Decision Theory (Part I) Hao Helen Zhang Hao Helen Zhang Lecture 2: Statistical Decision Theory (Part I) 1 / 35 Outline of This Note Part I: Statistics Decision Theory (from Statistical
More informationCS 361: Probability & Statistics
October 17, 2017 CS 361: Probability & Statistics Inference Maximum likelihood: drawbacks A couple of things might trip up max likelihood estimation: 1) Finding the maximum of some functions can be quite
More informationChapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1
Chapter 4 HOMEWORK ASSIGNMENTS These homeworks may be modified as the semester progresses. It is your responsibility to keep up to date with the correctly assigned homeworks. There may be some errors in
More informationCOS513 LECTURE 8 STATISTICAL CONCEPTS
COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions
More informationCSC321 Lecture 18: Learning Probabilistic Models
CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling
More informationSTAT 830 Bayesian Estimation
STAT 830 Bayesian Estimation Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Bayesian Estimation STAT 830 Fall 2011 1 / 23 Purposes of These
More informationStat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.
Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 CS students: don t forget to re-register in CS-535D. Even if you just audit this course, please do register.
More informationBasics on Probability. Jingrui He 09/11/2007
Basics on Probability Jingrui He 09/11/2007 Coin Flips You flip a coin Head with probability 0.5 You flip 100 coins How many heads would you expect Coin Flips cont. You flip a coin Head with probability
More informationStatistical Theory MT 2006 Problems 4: Solution sketches
Statistical Theory MT 006 Problems 4: Solution sketches 1. Suppose that X has a Poisson distribution with unknown mean θ. Determine the conjugate prior, and associate posterior distribution, for θ. Determine
More informationBayesian statistics. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Bayesian statistics DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall15 Carlos Fernandez-Granda Frequentist vs Bayesian statistics In frequentist statistics
More informationHPD Intervals / Regions
HPD Intervals / Regions The HPD region will be an interval when the posterior is unimodal. If the posterior is multimodal, the HPD region might be a discontiguous set. Picture: The set {θ : θ (1.5, 3.9)
More informationBayesian Approach 2. CSC412 Probabilistic Learning & Reasoning
CSC412 Probabilistic Learning & Reasoning Lecture 12: Bayesian Parameter Estimation February 27, 2006 Sam Roweis Bayesian Approach 2 The Bayesian programme (after Rev. Thomas Bayes) treats all unnown quantities
More informationData Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Why uncertainty? Why should data mining care about uncertainty? We
More information1 Introduction. P (n = 1 red ball drawn) =
Introduction Exercises and outline solutions. Y has a pack of 4 cards (Ace and Queen of clubs, Ace and Queen of Hearts) from which he deals a random of selection 2 to player X. What is the probability
More informationStatistical Data Analysis Stat 3: p-values, parameter estimation
Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,
More informationError analysis for efficiency
Glen Cowan RHUL Physics 28 July, 2008 Error analysis for efficiency To estimate a selection efficiency using Monte Carlo one typically takes the number of events selected m divided by the number generated
More informationGeneral Bayesian Inference I
General Bayesian Inference I Outline: Basic concepts, One-parameter models, Noninformative priors. Reading: Chapters 10 and 11 in Kay-I. (Occasional) Simplified Notation. When there is no potential for
More informationBayes Factors for Grouped Data
Bayes Factors for Grouped Data Lizanne Raubenheimer and Abrie J. van der Merwe 2 Department of Statistics, Rhodes University, Grahamstown, South Africa, L.Raubenheimer@ru.ac.za 2 Department of Mathematical
More informationCS 361: Probability & Statistics
March 14, 2018 CS 361: Probability & Statistics Inference The prior From Bayes rule, we know that we can express our function of interest as Likelihood Prior Posterior The right hand side contains the
More informationIntroduction to Probability and Statistics (Continued)
Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:
More informationStatistics: Learning models from data
DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial
More informationStat 451 Lecture Notes Numerical Integration
Stat 451 Lecture Notes 03 12 Numerical Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange 2 Updated: February 11, 2016 1 / 29
More informationOutline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution
Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model
More informationLecture 7 October 13
STATS 300A: Theory of Statistics Fall 2015 Lecture 7 October 13 Lecturer: Lester Mackey Scribe: Jing Miao and Xiuyuan Lu 7.1 Recap So far, we have investigated various criteria for optimal inference. We
More informationBeta statistics. Keywords. Bayes theorem. Bayes rule
Keywords Beta statistics Tommy Norberg tommy@chalmers.se Mathematical Sciences Chalmers University of Technology Gothenburg, SWEDEN Bayes s formula Prior density Likelihood Posterior density Conjugate
More information(4) One-parameter models - Beta/binomial. ST440/550: Applied Bayesian Statistics
Estimating a proportion using the beta/binomial model A fundamental task in statistics is to estimate a proportion using a series of trials: What is the success probability of a new cancer treatment? What
More informationA Discussion of the Bayesian Approach
A Discussion of the Bayesian Approach Reference: Chapter 10 of Theoretical Statistics, Cox and Hinkley, 1974 and Sujit Ghosh s lecture notes David Madigan Statistics The subject of statistics concerns
More informationBoxplots A boxplot, or box-and-whisker plot, is a convenient way of summarising data, particularly when the data is made up of several groups.
Boxplots A boxplot, or box-and-whisker plot, is a convenient way of summarising data, particularly when the data is made up of several groups. 75 8 85 9 95 Boxplot largest value (there are no extreme values)
More information1 Hypothesis Testing and Model Selection
A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection
More informationBayesian inference: what it means and why we care
Bayesian inference: what it means and why we care Robin J. Ryder Centre de Recherche en Mathématiques de la Décision Université Paris-Dauphine 6 November 2017 Mathematical Coffees Robin Ryder (Dauphine)
More informationHypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006
Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)
More informationSTAT J535: Chapter 5: Classes of Bayesian Priors
STAT J535: Chapter 5: Classes of Bayesian Priors David B. Hitchcock E-Mail: hitchcock@stat.sc.edu Spring 2012 The Bayesian Prior A prior distribution must be specified in a Bayesian analysis. The choice
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationOne-parameter models
One-parameter models Patrick Breheny January 22 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/17 Introduction Binomial data is not the only example in which Bayesian solutions can be worked
More informationAn Introduction to Bayesian Linear Regression
An Introduction to Bayesian Linear Regression APPM 5720: Bayesian Computation Fall 2018 A SIMPLE LINEAR MODEL Suppose that we observe explanatory variables x 1, x 2,..., x n and dependent variables y 1,
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2
Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate
More informationLinear Models A linear model is defined by the expression
Linear Models A linear model is defined by the expression x = F β + ɛ. where x = (x 1, x 2,..., x n ) is vector of size n usually known as the response vector. β = (β 1, β 2,..., β p ) is the transpose
More informationLECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)
LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered
More informationBayesian Inference. STA 121: Regression Analysis Artin Armagan
Bayesian Inference STA 121: Regression Analysis Artin Armagan Bayes Rule...s! Reverend Thomas Bayes Posterior Prior p(θ y) = p(y θ)p(θ)/p(y) Likelihood - Sampling Distribution Normalizing Constant: p(y
More informationBayesian Inference: Concept and Practice
Inference: Concept and Practice fundamentals Johan A. Elkink School of Politics & International Relations University College Dublin 5 June 2017 1 2 3 Bayes theorem In order to estimate the parameters of
More informationLecture 1: Introduction
Principles of Statistics Part II - Michaelmas 208 Lecturer: Quentin Berthet Lecture : Introduction This course is concerned with presenting some of the mathematical principles of statistical theory. One
More informationConfidence Intervals. First ICFA Instrumentation School/Workshop. Harrison B. Prosper Florida State University
Confidence Intervals First ICFA Instrumentation School/Workshop At Morelia,, Mexico, November 18-29, 2002 Harrison B. Prosper Florida State University Outline Lecture 1 Introduction Confidence Intervals
More informationProbability. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. August 2014
Probability Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh August 2014 (All of the slides in this course have been adapted from previous versions
More informationECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4
ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian
More informationPhysics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester
Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability
More informationBayesian Inference for Normal Mean
Al Nosedal. University of Toronto. November 18, 2015 Likelihood of Single Observation The conditional observation distribution of y µ is Normal with mean µ and variance σ 2, which is known. Its density
More informationIntroduction to Machine Learning
Introduction to Machine Learning Generative Models Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1
More informationFinal Examination. STA 215: Statistical Inference. Saturday, 2001 May 5, 9:00am 12:00 noon
Final Examination Saturday, 2001 May 5, 9:00am 12:00 noon This is an open-book examination, but you may not share materials. A normal distribution table, a PMF/PDF handout, and a blank worksheet are attached
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationStatistics for scientists and engineers
Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3
More information