9 Bayesian inference
9.1 Subjective probability

This is probability regarded as degree of belief. A subjective probability of an event $A$ is assessed as $p$ if you are prepared to stake $pM$ to win $M$ and equally prepared to accept a stake of $pM$ to win $M$. In other words the bet is fair and you are assumed to behave rationally.

Kolmogorov's axioms

How does subjective probability fit in with the fundamental axioms? Let $\mathcal{A}$ be the set of all subsets of a countable sample space $\Omega$. Then

(i) $P(A) \ge 0$ for every $A \in \mathcal{A}$;

(ii) $P(\Omega) = 1$;
(iii) if $\{A_\lambda : \lambda \in \Lambda\}$ is a countable set of mutually exclusive events belonging to $\mathcal{A}$, then
$$P\Big(\bigcup_{\lambda \in \Lambda} A_\lambda\Big) = \sum_{\lambda \in \Lambda} P(A_\lambda).$$

Obviously the subjective interpretation has no difficulty in conforming with (i) and (ii). Axiom (iii) is slightly less obvious. Suppose we have two events $A$ and $B$ such that $A \cap B = \emptyset$. Consider a stake of $p_A M$ to win $M$ if $A$ occurs and a stake $p_B M$ to win $M$ if $B$ occurs. The total stake for bets on $A$ or $B$ occurring is $p_A M + p_B M$ to win $M$ if $A$ or $B$ occurs. Thus we have $(p_A + p_B)M$ to win $M$ and so
$$P(A \cup B) = P(A) + P(B).$$

Conditional probability

Define $p_B$, $p_{AB}$, $p_{A|B}$ such that $p_B M$ is the fair stake for $M$ if $B$ occurs; $p_{AB} M$ is the fair stake for $M$ if $A$ and $B$ occur; $p_{A|B} M$ is the fair stake for $M$ if $A$ occurs given $B$ has occurred, otherwise the bet is off. Consider the three alternative outcomes: $B$ does not occur (so the conditional bet is off), $B$ occurs but $A$ does not, and both $A$ and $B$ occur. If $G_1, G_2, G_3$ are the corresponding gains when stakes $M_1, M_2, M_3$ are placed on $A \mid B$, $B$ and $A \cap B$ respectively, then
$$\begin{aligned}
\bar B:\quad & -p_B M_2 - p_{AB} M_3 = G_1 \\
B \cap \bar A:\quad & -p_{A|B} M_1 + (1 - p_B) M_2 - p_{AB} M_3 = G_2 \\
A \cap B:\quad & (1 - p_{A|B}) M_1 + (1 - p_B) M_2 + (1 - p_{AB}) M_3 = G_3
\end{aligned}$$
The principle of rationality does not allow the possibility of setting the stakes to obtain a sure profit. If the coefficient matrix of this linear system were invertible, the stakes could be chosen to produce any gains whatever, including sure positive ones; hence the determinant must vanish:
$$\begin{vmatrix} 0 & -p_B & -p_{AB} \\ -p_{A|B} & 1 - p_B & -p_{AB} \\ 1 - p_{A|B} & 1 - p_B & 1 - p_{AB} \end{vmatrix} = 0,$$
which reduces to
$$p_{AB} - p_B\, p_{A|B} = 0,$$
i.e.
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) \ne 0.$$
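The coherence argument above can be checked numerically: the gains matrix is singular exactly when $p_{AB} = p_B\, p_{A|B}$. A minimal sketch, with invented numerical probabilities:

```python
# Numeric check of the "no sure profit" argument: the 3x3 coefficient matrix
# of the gains G1, G2, G3 in the stakes M1, M2, M3 must be singular, which
# forces p_AB = p_B * p_(A|B).

def gain_matrix(p_B, p_AB, p_A_given_B):
    """Rows: outcomes (not B), (B only), (A and B); columns: stakes M1, M2, M3."""
    return [
        [0.0,              -p_B,     -p_AB],
        [-p_A_given_B,     1 - p_B,  -p_AB],
        [1 - p_A_given_B,  1 - p_B,  1 - p_AB],
    ]

def det3(m):
    a, b, c = m[0]; d, e, f = m[1]; g, h, i = m[2]
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

# Coherent assessments: P(B) = 0.5, P(A|B) = 0.4, so P(A and B) = 0.2.
coherent = det3(gain_matrix(0.5, 0.2, 0.4))
# Incoherent assessments (P(A and B) = 0.3) leave the matrix invertible,
# so a bettor could solve for stakes giving any desired sure gains.
incoherent = det3(gain_matrix(0.5, 0.3, 0.4))
print(coherent, incoherent)
```

Expanding the determinant symbolically gives exactly $p_{AB} - p_B\, p_{A|B}$, which is what the second call returns (up to rounding).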
9.2 Estimation formulated as a decision problem

$\mathcal{A}$: the action space, the set of all possible actions $a \in \mathcal{A}$.

$\Theta$: the parameter space, consisting of all possible states of nature, only one of which occurs or will occur (this true state being unknown).

$L$: the loss function, having domain $\Theta \times \mathcal{A}$ (the set of all possible consequences $(\theta, a)$) and codomain $\mathbb{R}$.

$\mathcal{R}_X$: the sample space, containing all possible realisations $x \in \mathcal{R}_X$ of a random variable $X$ belonging to the family $\{f(x; \theta) : \theta \in \Theta\}$.

$\mathcal{D}$: the decision space, consisting of all possible decision functions $\delta \in \mathcal{D}$, each decision function having domain $\mathcal{R}_X$ and codomain $\mathcal{A}$.

The basic idea

The true state of the world is unknown when the action is chosen. The loss which results from each of the possible consequences $(\theta, a)$ is known. Data are collected to provide information about the unknown $\theta$, and a choice of action is taken for each possible observation of $X$. Decision theory is concerned with the criteria for making this choice. The form of the loss function is crucial: it depends upon the nature of the problem.

Example: You are a fashion buyer and must stock up with trendy dresses. The true number you will sell is $\theta$ (unknown).
You stock up with $a$ dresses. If you overstock you must sell off at your end-of-season sale and lose $A$ per dress. If you understock you lose the profit of $B$ per dress. Thus
$$L(\theta, a) = \begin{cases} A(a - \theta), & a \ge \theta, \\ B(\theta - a), & a \le \theta. \end{cases}$$

Special forms of loss function

If $A = B$, you have modular loss,
$$L(\theta, a) = A\,|\theta - a|.$$
To suit comparison with classical inference, quadratic loss is often used:
$$L(\theta, a) = C(\theta - a)^2.$$

The risk function

This is a measure of the quality of a decision function. Suppose we choose a decision $\delta(x)$ (i.e. choose $a = \delta(x)$). The function $R$ with domain $\Theta \times \mathcal{D}$ and codomain $\mathbb{R}$ defined by
$$R(\theta, \delta) = \int_{\mathcal{R}_X} L(\theta, \delta(x))\, f(x; \theta)\,dx$$
is called the risk function. In the discrete case we may write this as
$$R(\theta, \delta) = \sum_{x \in \mathcal{R}_X} L(\theta, \delta(x))\, p(x; \theta).$$
In either case,
$$R(\theta, \delta) = E\left[L(\theta, \delta(X))\right].$$

Example: Suppose $X_1, \ldots, X_n$ is a random sample from $N(\theta, \theta)$ and suppose we assume quadratic loss.
Consider
$$\delta_1(X) = \bar X, \qquad \delta_2(X) = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2.$$
We know that $E(\bar X) = \theta$, so
$$R(\theta, \delta_1) = E\left[(\theta - \bar X)^2\right] = V(\bar X) = \frac{\theta}{n}.$$
But $E[\delta_2(X)] = \theta$, so that
$$R(\theta, \delta_2) = E\left[(\theta - \delta_2(X))^2\right] = V[\delta_2(X)] = \frac{\theta^2}{(n-1)^2}\, V\left[\frac{\sum_{i=1}^n (X_i - \bar X)^2}{\theta}\right] = \frac{\theta^2}{(n-1)^2} \cdot 2(n-1) = \frac{2\theta^2}{n-1},$$
since $\sum (X_i - \bar X)^2/\theta \sim \chi^2(n-1)$, which has variance $2(n-1)$.

[Figure: the two risk functions plotted against $\theta$: $R(\theta, \delta_1)$ is linear and $R(\theta, \delta_2)$ is quadratic in $\theta$, and they cross at $\theta = (n-1)/2n$.]

Without knowing $\theta$, how do we choose between $\delta_1$ and $\delta_2$?

Two sensible criteria

1. Take precautions against the worst.
2. Take into account any further information you may have.
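The two risk formulas can be checked by simulation. A sketch, with $\theta$, $n$ and the replication count chosen purely for illustration:

```python
# Monte Carlo check of R(theta, delta1) = theta/n and
# R(theta, delta2) = 2*theta^2/(n-1) for samples from N(theta, theta).
import random

def simulate_risks(theta, n, reps=20000, seed=1):
    rng = random.Random(seed)
    sd = theta ** 0.5          # the variance of each observation is theta
    loss1 = loss2 = 0.0
    for _ in range(reps):
        xs = [rng.gauss(theta, sd) for _ in range(n)]
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
        loss1 += (theta - xbar) ** 2
        loss2 += (theta - s2) ** 2
    return loss1 / reps, loss2 / reps

theta, n = 2.0, 10
r1, r2 = simulate_risks(theta, n)
print(r1, theta / n)                  # estimate vs theory: theta/n = 0.2
print(r2, 2 * theta**2 / (n - 1))     # estimate vs theory: 2*theta^2/9
```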
Criterion 1 leads to minimax: choose the decision leading to the smallest maximum risk, i.e. choose $\delta^* \in \mathcal{D}$ such that
$$\sup_{\theta \in \Theta} \{R(\theta, \delta^*)\} = \inf_{\delta \in \mathcal{D}} \sup_{\theta \in \Theta} \{R(\theta, \delta)\}.$$
Minimax rules have disadvantages, the main one being that they treat Nature as an adversary. Minimax guards against the worst case, but the value of $\theta$ should not be regarded as a value deliberately chosen by Nature to make the risk large. Fortunately, in many situations minimax rules turn out to be reasonable in practice, if not in concept.

Criterion 2 involves introducing beliefs about $\theta$ into the analysis. If $\pi(\theta)$ represents our beliefs about $\theta$, then define the Bayes risk $r(\pi, \delta)$ as
$$r(\pi, \delta) = \begin{cases} \int_\Theta R(\theta, \delta)\,\pi(\theta)\,d\theta & \text{if } \theta \text{ is continuous}, \\ \sum_{\theta \in \Theta} R(\theta, \delta)\,\pi(\theta) & \text{if } \theta \text{ is discrete}. \end{cases}$$
Now choose $\delta^*$ such that
$$r(\pi, \delta^*) = \inf_{\delta \in \mathcal{D}} \{r(\pi, \delta)\}.$$

Bayes rule: The Bayes rule with respect to a prior $\pi$ is the decision rule $\delta^*$ that minimises the Bayes risk among all possible decision rules.

[Photograph: Bayes' tomb.]
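Applying criterion 2 to the earlier $N(\theta, \theta)$ example: given a prior for $\theta$, the Bayes risks of $\delta_1$ and $\delta_2$ can be compared directly. A sketch, with an Exp(1) prior chosen purely for illustration:

```python
# Bayes risks r(pi, delta1) and r(pi, delta2) under an (invented) Exp(1)
# prior on theta, using the risk functions R(theta, delta1) = theta/n and
# R(theta, delta2) = 2*theta^2/(n-1) derived for N(theta, theta) samples.
from math import exp

def bayes_risk(risk_fn, h=1e-4, upper=20.0):
    """Midpoint-rule integral of R(theta) * exp(-theta) over (0, upper)."""
    total, theta = 0.0, h / 2
    while theta < upper:
        total += risk_fn(theta) * exp(-theta) * h
        theta += h
    return total

n = 10
r1 = bayes_risk(lambda t: t / n)               # E(theta)/n = 1/n = 0.1
r2 = bayes_risk(lambda t: 2 * t**2 / (n - 1))  # 2*E(theta^2)/(n-1) = 4/9
print(r1, r2)   # delta1 has the smaller Bayes risk under this prior
```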
9.2.3 Decision terminology

$\delta$ is called a Bayes decision function; $\delta(X)$ is a Bayes estimator; $\delta(x)$ is a Bayes estimate.

If a decision function $\delta''$ is such that there exists a decision function $\delta'$ satisfying
$$R(\theta, \delta') \le R(\theta, \delta'') \quad \forall\, \theta \in \Theta$$
and
$$R(\theta, \delta') < R(\theta, \delta'') \quad \text{for some } \theta \in \Theta,$$
then $\delta''$ is dominated by $\delta'$ and $\delta''$ is said to be inadmissible. All other decisions are admissible.
9.3 Prior and posterior distributions

The prior distribution represents our initial belief in the parameter $\theta$. This means that $\theta$ is regarded as a random variable with p.d.f. $\pi(\theta)$. From Bayes' theorem,
$$f(x \mid \theta)\,\pi(\theta) = \pi(\theta \mid x)\, f(x)$$
or
$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{\int_\Theta f(x \mid \theta)\,\pi(\theta)\,d\theta}.$$
$\pi(\theta \mid x)$ is called the posterior p.d.f. It represents our updated belief in $\theta$, having taken account of the data. We write the expression in the form
$$\pi(\theta \mid x) \propto f(x \mid \theta)\,\pi(\theta)$$
or
$$\text{Posterior} \propto \text{Likelihood} \times \text{Prior}.$$

Example: Suppose $X_1, X_2, \ldots, X_n$ is a random sample from a Poisson distribution, Poi($\theta$), where $\theta$ has the prior distribution
$$\pi(\theta) = \frac{\theta^{\alpha-1} e^{-\theta}}{\Gamma(\alpha)}, \qquad \theta \ge 0,\ \alpha > 0.$$
The posterior distribution is given by
$$\pi(\theta \mid x) \propto \frac{\theta^{\sum x_i}\, e^{-n\theta}}{\prod x_i!} \cdot \frac{\theta^{\alpha-1} e^{-\theta}}{\Gamma(\alpha)},$$
which simplifies to
$$\pi(\theta \mid x) \propto \theta^{\sum x_i + \alpha - 1}\, e^{-(n+1)\theta}, \qquad \theta \ge 0.$$
This is recognisable as a $\Gamma$-distribution and, by comparison with $\pi(\theta)$, we can write
$$\pi(\theta \mid x) = \frac{(n+1)^{\sum x_i + \alpha}\, \theta^{\sum x_i + \alpha - 1}\, e^{-(n+1)\theta}}{\Gamma\!\left(\sum x_i + \alpha\right)}, \qquad \theta \ge 0,$$
which represents our posterior belief in $\theta$.
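Because the Gamma prior is conjugate to the Poisson likelihood, the update above is just arithmetic on the parameters. A sketch, with invented data and an invented $\alpha$:

```python
# The Poisson-Gamma update above: with a Gamma(alpha, 1) prior (rate 1, as in
# the notes) and data x1..xn ~ Poi(theta), the posterior is
# Gamma(sum(x) + alpha, n + 1).
def poisson_gamma_posterior(xs, alpha, rate=1.0):
    """Return (shape, rate) of the Gamma posterior for theta."""
    return sum(xs) + alpha, rate + len(xs)

xs = [3, 1, 4, 1, 5]                       # invented counts
shape, rate = poisson_gamma_posterior(xs, alpha=2.0)
print(shape, rate)      # 16.0 6
print(shape / rate)     # posterior mean (sum(x) + alpha)/(n + 1)
```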
9.4 Decisions

We have already defined the Bayes risk to be
$$r(\pi, \delta) = \int_\Theta R(\theta, \delta)\,\pi(\theta)\,d\theta$$
where
$$R(\theta, \delta) = \int_{\mathcal{R}_X} L(\theta, \delta(x))\, f(x \mid \theta)\,dx,$$
so that
$$r(\pi, \delta) = \int_\Theta \int_{\mathcal{R}_X} L(\theta, \delta(x))\, f(x \mid \theta)\,dx\; \pi(\theta)\,d\theta = \int_{\mathcal{R}_X} \left[\int_\Theta L(\theta, \delta(x))\, \pi(\theta \mid x)\,d\theta\right] f(x)\,dx.$$
For any given $x$, minimising the Bayes risk is therefore equivalent to minimising
$$\int_\Theta L(\theta, \delta(x))\, \pi(\theta \mid x)\,d\theta,$$
i.e. we minimise the posterior expected loss.

Example: Take the posterior distribution in the previous example and, for no special reason, assume quadratic loss
$$L(\theta, \delta(x)) = [\theta - \delta(x)]^2.$$
Then we minimise
$$\int_\Theta [\theta - \delta(x)]^2\, \pi(\theta \mid x)\,d\theta$$
by differentiating with respect to $\delta$ to obtain
$$-2\int_\Theta [\theta - \delta(x)]\, \pi(\theta \mid x)\,d\theta = 0 \quad\Rightarrow\quad \delta(x) = \int_\Theta \theta\, \pi(\theta \mid x)\,d\theta.$$
This is the mean of the posterior distribution:
$$\delta(x) = \int_0^\infty \theta\, \frac{(n+1)^{\sum x_i + \alpha}\, \theta^{\sum x_i + \alpha - 1}\, e^{-(n+1)\theta}}{\Gamma\!\left(\sum x_i + \alpha\right)}\, d\theta = \frac{(n+1)^{\sum x_i + \alpha}\, \Gamma\!\left(\sum x_i + \alpha + 1\right)}{\Gamma\!\left(\sum x_i + \alpha\right)(n+1)^{\sum x_i + \alpha + 1}} = \frac{\sum x_i + \alpha}{n+1}.$$

Theorem: Suppose that $\delta^\pi$ is a decision rule that is Bayes with respect to some prior distribution $\pi$. If the risk function of $\delta^\pi$ satisfies
$$R(\theta, \delta^\pi) \le r(\pi, \delta^\pi) \quad \text{for all } \theta \in \Theta,$$
then $\delta^\pi$ is a minimax decision rule.

Proof: Suppose $\delta^\pi$ is not minimax. Then there is a decision rule $\delta'$ such that
$$\sup_{\theta \in \Theta} R(\theta, \delta') < \sup_{\theta \in \Theta} R(\theta, \delta^\pi).$$
For this decision rule we have, since, if $f(x) \le k$, then $E(f(X)) \le k$,
$$r(\pi, \delta') \le \sup_{\theta \in \Theta} R(\theta, \delta') < \sup_{\theta \in \Theta} R(\theta, \delta^\pi) \le r(\pi, \delta^\pi),$$
contradicting the statement that $\delta^\pi$ is Bayes with respect to $\pi$. Hence $\delta^\pi$ is minimax.

Example: Let $X_1, \ldots, X_n$ be a random sample from a Bernoulli distribution $B(1, p)$. Let $Y = \sum X_i$ and consider estimating $p$ using the loss function
$$L(p, a) = \frac{(p - a)^2}{p(1-p)},$$
a loss which penalises mistakes more if $p$ is near 0 or 1. The m.l.e. is $\hat p = Y/n$ and
$$R(p, \hat p) = E\left[\frac{(p - \hat p)^2}{p(1-p)}\right] = \frac{1}{p(1-p)} \cdot \frac{p(1-p)}{n} = \frac{1}{n}.$$
The risk is constant and therefore, once we exhibit a prior for which $\hat p$ is Bayes, the theorem shows it is minimax. [NB A decision rule with constant risk is called an equalizer rule.]

We now show that $\hat p$ is the Bayes estimator with respect to the $U(0,1)$ prior, which will imply that $\hat p$ is minimax. The posterior is
$$\pi(p \mid y) \propto p^y (1-p)^{n-y},$$
so we want the value of $a$ which minimises
$$\int_0^1 \frac{(p - a)^2}{p(1-p)}\, p^y (1-p)^{n-y}\,dp.$$
Differentiation with respect to $a$ yields
$$a = \frac{\int_0^1 p^y (1-p)^{n-y-1}\,dp}{\int_0^1 p^{y-1} (1-p)^{n-y-1}\,dp}.$$
Now a $\beta$-distribution with parameters $\alpha, \beta$ has p.d.f.
$$f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, \qquad x \in [0, 1],$$
so
$$a = \frac{\Gamma(y+1)\,\Gamma(n-y)}{\Gamma(n+1)} \cdot \frac{\Gamma(n)}{\Gamma(y)\,\Gamma(n-y)}$$
and, using the identity $\Gamma(\alpha + 1) = \alpha\,\Gamma(\alpha)$,
$$a = \frac{y}{n},$$
i.e. $\hat p = Y/n$.
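That the weighted-loss risk of $\hat p = Y/n$ really is the constant $1/n$ can be verified by exact enumeration over the binomial distribution of $Y$. A sketch, with $n$ and the grid of $p$ values invented for illustration:

```python
# Exact check that the risk of phat = Y/n under L(p,a) = (p-a)^2 / (p(1-p))
# is 1/n for every p: enumerate Y ~ Bin(n, p).
from math import comb

def risk(p, n):
    total = 0.0
    for y in range(n + 1):
        prob = comb(n, y) * p**y * (1 - p)**(n - y)
        total += prob * (p - y / n) ** 2 / (p * (1 - p))
    return total

n = 12
for p in (0.1, 0.37, 0.5, 0.9):
    print(risk(p, n))   # each value equals 1/n = 1/12 up to rounding
```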
9.5 Bayes estimates

(i) The mean of $\pi(\theta \mid x)$ is the Bayes estimate with respect to quadratic loss.

Proof: Choose $\delta(x)$ to minimise $\int_\Theta [\theta - \delta(x)]^2\, \pi(\theta \mid x)\,d\theta$ by differentiating with respect to $\delta$ and equating to zero. Then
$$\delta(x) = \int_\Theta \theta\, \pi(\theta \mid x)\,d\theta,$$
since $\int_\Theta \pi(\theta \mid x)\,d\theta = 1$.

(ii) The median of $\pi(\theta \mid x)$ is the Bayes estimate with respect to modular loss (i.e. absolute value loss).

Proof: Choose $\delta(x)$ to minimise
$$\int_\Theta |\theta - \delta(x)|\, \pi(\theta \mid x)\,d\theta = \int_{-\infty}^{\delta(x)} (\delta(x) - \theta)\, \pi(\theta \mid x)\,d\theta + \int_{\delta(x)}^{\infty} (\theta - \delta(x))\, \pi(\theta \mid x)\,d\theta.$$
Differentiate with respect to $\delta$ and equate to zero:
$$\int_{-\infty}^{\delta(x)} \pi(\theta \mid x)\,d\theta = \int_{\delta(x)}^{\infty} \pi(\theta \mid x)\,d\theta,$$
so that
$$2\int_{-\infty}^{\delta(x)} \pi(\theta \mid x)\,d\theta = \int_{-\infty}^{\infty} \pi(\theta \mid x)\,d\theta = 1,$$
i.e. $\int_{-\infty}^{\delta(x)} \pi(\theta \mid x)\,d\theta = \tfrac12$, and $\delta(x)$ is the median of $\pi(\theta \mid x)$.
(iii) The mode of $\pi(\theta \mid x)$ is the Bayes estimate with respect to zero-one loss (i.e. loss of zero for a correct estimate and loss of one for any incorrect estimate).

Proof: The loss function has the form
$$L(\theta, \delta(x)) = \begin{cases} 1, & \text{if } \delta(x) \ne \theta, \\ 0, & \text{if } \delta(x) = \theta. \end{cases}$$
This is only a useful idea for a discrete distribution. With $\delta(x) = \theta'$,
$$\sum_{\theta \in \Theta} L(\theta, \delta(x))\, \pi(\theta \mid x) = \sum_{\theta \in \Theta \setminus \{\theta'\}} \pi(\theta \mid x) = 1 - \pi(\theta' \mid x).$$
Minimisation means maximising $\pi(\theta' \mid x)$, so $\delta(x)$ is taken to be the most likely value, i.e. the mode.

Remember that $\pi(\theta \mid x)$ represents our most up-to-date belief in $\theta$ and is all that is needed for inference.
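Results (i) and (ii) can be illustrated numerically: on a grid approximation to a posterior, brute-force minimisation of the expected loss recovers the mean under quadratic loss and the median under absolute loss. A sketch, with a Beta(3, 7) posterior invented for illustration:

```python
# Grid check: the minimiser of posterior expected quadratic loss is the
# posterior mean; the minimiser of expected absolute loss is the median.
def beta_grid(a, b, m=401):
    """Midpoint grid and normalised weights for a Beta(a, b) density."""
    xs = [(i + 0.5) / m for i in range(m)]
    ws = [x**(a - 1) * (1 - x)**(b - 1) for x in xs]
    s = sum(ws)
    return xs, [w / s for w in ws]

xs, ws = beta_grid(3, 7)

def exp_loss(d, loss):
    return sum(w * loss(x, d) for x, w in zip(xs, ws))

best_quad = min(xs, key=lambda d: exp_loss(d, lambda x, d: (x - d) ** 2))
best_abs = min(xs, key=lambda d: exp_loss(d, lambda x, d: abs(x - d)))

mean = sum(x * w for x, w in zip(xs, ws))   # Beta(3,7) mean is 3/10
print(best_quad, mean)
print(best_abs)    # close to the Beta(3,7) median (roughly 0.287)
```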
9.6 Credible intervals

Definition: The interval $(\theta_1, \theta_2)$ is a $100(1-\alpha)\%$ credible interval for $\theta$ if
$$\int_{\theta_1}^{\theta_2} \pi(\theta \mid x)\,d\theta = 1 - \alpha, \qquad 0 \le \alpha \le 1.$$

[Figure: a posterior density $\pi(\theta \mid x)$ with two different intervals $(\theta_1, \theta_2)$ and $(\theta_1', \theta_2')$, each containing posterior probability $1 - \alpha$.]

Both $(\theta_1, \theta_2)$ and $(\theta_1', \theta_2')$ are $100(1-\alpha)\%$ credible intervals: a credible interval is not unique.

Definition: The interval $(\theta_1, \theta_2)$ is a $100(1-\alpha)\%$ highest posterior density interval if

(i) $(\theta_1, \theta_2)$ is a $100(1-\alpha)\%$ credible interval;

(ii) for all $\theta \in (\theta_1, \theta_2)$ and $\theta' \notin (\theta_1, \theta_2)$, $\pi(\theta \mid x) \ge \pi(\theta' \mid x)$.

Obviously this defines the shortest possible $100(1-\alpha)\%$ credible interval.
[Figure: a highest posterior density interval $(\theta_1, \theta_2)$ containing posterior probability $1 - \alpha$, obtained by cutting the density $\pi(\theta \mid x)$ at a horizontal level.]
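Both kinds of interval are easy to compute on a grid. A sketch comparing an equal-tailed credible interval with a highest posterior density interval, using an invented Beta(3, 7) posterior:

```python
# Equal-tailed vs highest-posterior-density 95% intervals for a Beta(3, 7)
# posterior, computed on a grid. The HPD interval is never longer.
def beta_pdf_grid(a, b, m=4000):
    xs = [(i + 0.5) / m for i in range(m)]
    ws = [x**(a - 1) * (1 - x)**(b - 1) for x in xs]
    s = sum(ws)
    return xs, [w / s for w in ws]

def equal_tailed(xs, ws, alpha):
    cum, lo, hi = 0.0, None, None
    for x, w in zip(xs, ws):
        cum += w
        if lo is None and cum >= alpha / 2:
            lo = x
        if hi is None and cum >= 1 - alpha / 2:
            hi = x
    return lo, hi

def hpd(xs, ws, alpha):
    # keep the highest-density grid points until they hold mass 1 - alpha;
    # for a unimodal density these points form an interval
    order = sorted(range(len(xs)), key=lambda i: -ws[i])
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += ws[i]
        if mass >= 1 - alpha:
            break
    return xs[min(kept)], xs[max(kept)]

xs, ws = beta_pdf_grid(3, 7)
et = equal_tailed(xs, ws, 0.05)
hp = hpd(xs, ws, 0.05)
print(et, hp)
print(et[1] - et[0] >= hp[1] - hp[0])   # True: HPD is the shortest
```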
9.7 Features of Bayesian models

(i) If there is a sufficient statistic, say $T$, the posterior distribution will depend upon the data only through $T$.

(ii) There is a natural parametric family of priors such that the posterior distributions also belong to this family. Such priors are called conjugate priors.

Example: Binomial likelihood, beta prior. A binomial likelihood has the form
$$p(x \mid \theta) = \binom{n}{x}\, \theta^x (1-\theta)^{n-x}, \qquad x = 0, 1, \ldots, n,$$
and a beta prior is
$$\pi(\theta) = \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a, b)}, \qquad \theta \in [0, 1],$$
where
$$B(a, b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a + b)},$$
so that, since Posterior $\propto$ Likelihood $\times$ Prior,
$$\pi(\theta \mid x) \propto \theta^{x + a - 1}(1-\theta)^{n - x + b - 1},$$
and comparison with the prior leads to
$$\pi(\theta \mid x) = \frac{\theta^{x + a - 1}(1-\theta)^{n - x + b - 1}}{B(x + a,\, n - x + b)}, \qquad \theta \in [0, 1],$$
which is also a beta distribution.
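In code, the conjugate update is simply parameter arithmetic. A sketch with invented data:

```python
# The conjugate beta update above: observing x successes in n Bernoulli
# trials sends a Beta(a, b) prior to a Beta(a + x, b + n - x) posterior.
def beta_binomial_update(a, b, x, n):
    return a + x, b + n - x

a, b = 1.0, 1.0                              # uniform prior = Beta(1, 1)
a, b = beta_binomial_update(a, b, x=7, n=10) # invented data
print(a, b)          # 8.0 4.0
print(a / (a + b))   # posterior mean 2/3
```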
9.8 Hypothesis testing

The Bayesian only has a concept of a hypothesis test in certain very specific situations. This is because of the alien idea of having to regard a parameter as a fixed point. However the concept has some validity in an example such as the following.

Example: Water samples are taken from a reservoir and checked for bacterial content. These provide information about $\theta$, the proportion of infected samples. If it is believed that bacterial density is such that more than 1% of samples are infected, there is a legal obligation to change the chlorination procedure. The Bayesian procedure is

(i) use the data to obtain the posterior distribution of $\theta$;

(ii) calculate $P(\theta > 0.01)$.

EC rules are formulated in statistical terms and, in such cases, allow for a 5% significance level. This could be interpreted in a Bayesian sense as allowing for a wrong decision with probability no more than 0.05, so order increased chlorination if $P(\theta > 0.01) > 0.95$.

Where for some reason (e.g. a legal reason) a fixed parameter value is of interest because it defines a threshold, a Bayesian concept of a test is possible. It is usually performed either by calculating a credible bound or a credible interval and examining it in relation to that threshold value or values, or by calculating posterior probabilities based on the threshold or thresholds.
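Step (ii) is a single posterior tail probability. A sketch of the reservoir example, with a uniform prior and invented sample counts:

```python
# Bayesian "test" for the reservoir example: with a Beta posterior for the
# infected proportion theta (uniform prior, invented counts), compute
# P(theta > 0.01) on a grid and compare it with the decision threshold.
def beta_tail_prob(a, b, c, m=20000):
    """P(theta > c) for Beta(a, b), by midpoint-rule integration."""
    xs = [(i + 0.5) / m for i in range(m)]
    ws = [x**(a - 1) * (1 - x)**(b - 1) for x in xs]
    total = sum(ws)
    return sum(w for x, w in zip(xs, ws) if x > c) / total

# uniform prior, 3 infected samples out of 100 -> Beta(4, 98) posterior
p = beta_tail_prob(4, 98, 0.01)
print(p)            # the posterior probability that theta exceeds 1%
print(p > 0.95)     # True here, so order increased chlorination
```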
9.9 Inferences for normal distributions

Unknown mean, known variance

Consider data from $N(\theta, \sigma^2)$ and a prior $\theta \sim N(\tau, \kappa^2)$, so
$$f(x \mid \theta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\sum (x_i - \theta)^2\right)$$
and
$$\pi(\theta) = \frac{1}{\sqrt{2\pi}\,\kappa} \exp\left(-\frac{1}{2\kappa^2}(\theta - \tau)^2\right), \qquad \theta \in \mathbb{R}.$$
Since Posterior $\propto$ Likelihood $\times$ Prior,
$$\pi(\theta \mid x, \sigma^2, \tau, \kappa^2) \propto \exp\left(-\frac{n}{2\sigma^2}(\theta^2 - 2\theta\bar x) - \frac{1}{2\kappa^2}(\theta^2 - 2\theta\tau)\right).$$
But
$$\frac{n}{2\sigma^2}(\theta^2 - 2\theta\bar x) + \frac{1}{2\kappa^2}(\theta^2 - 2\theta\tau) = \frac12\left(\frac{n}{\sigma^2} + \frac{1}{\kappa^2}\right)\theta^2 - \left(\frac{n\bar x}{\sigma^2} + \frac{\tau}{\kappa^2}\right)\theta = \frac12\left(\frac{n}{\sigma^2} + \frac{1}{\kappa^2}\right)\left(\theta - \frac{n\bar x/\sigma^2 + \tau/\kappa^2}{n/\sigma^2 + 1/\kappa^2}\right)^2 + \text{(terms not involving } \theta),$$
so the posterior p.d.f. is proportional to
$$\exp\left[-\frac12\left(\frac{n}{\sigma^2} + \frac{1}{\kappa^2}\right)\left(\theta - \frac{n\bar x/\sigma^2 + \tau/\kappa^2}{n/\sigma^2 + 1/\kappa^2}\right)^2\right].$$
This means that
$$\theta \mid x, \sigma^2, \tau, \kappa^2 \;\sim\; N\left(\frac{n\bar x/\sigma^2 + \tau/\kappa^2}{n/\sigma^2 + 1/\kappa^2},\ \left(\frac{n}{\sigma^2} + \frac{1}{\kappa^2}\right)^{-1}\right).$$
The posterior mean is a weighted average. The prior mean $\tau$ has weight $1/\kappa^2$: this is 1/(prior variance),
and the observed sample mean $\bar x$ has weight $n/\sigma^2$: this is 1/(variance of the sample mean). This is reasonable because
$$\lim_{n \to \infty} \frac{n\bar x/\sigma^2 + \tau/\kappa^2}{n/\sigma^2 + 1/\kappa^2} = \bar x \qquad \text{and} \qquad \lim_{\kappa^2 \to \infty} \frac{n\bar x/\sigma^2 + \tau/\kappa^2}{n/\sigma^2 + 1/\kappa^2} = \bar x.$$
The posterior mean tends to $\bar x$ as the amount of data swamps the prior, or as prior beliefs become very vague.

Unknown variance, known mean

We need to assign a prior p.d.f. $\pi(\sigma^2)$. The $\Gamma$ family of p.d.f.s provides a flexible set of shapes over $[0, \infty)$. We take
$$\tau = \frac{1}{\sigma^2} \sim \Gamma\left(\frac{\nu}{2}, \frac{\nu\lambda}{2}\right),$$
where $\lambda$ and $\nu$ are chosen to provide suitable location and shape. In other words, the prior p.d.f. of $\tau = 1/\sigma^2$ is chosen to be of the form
$$\pi(\tau) = \frac{(\nu\lambda)^{\nu/2}\, \tau^{\nu/2 - 1}}{2^{\nu/2}\,\Gamma(\nu/2)} \exp\left(-\frac{\nu\lambda\tau}{2}\right), \qquad \tau \ge 0,$$
or
$$\pi(\sigma^2) = \frac{(\nu\lambda)^{\nu/2}\, (\sigma^2)^{-\nu/2 - 1}}{2^{\nu/2}\,\Gamma(\nu/2)} \exp\left(-\frac{\nu\lambda}{2\sigma^2}\right).$$
$\tau = 1/\sigma^2$ is called the precision. The posterior p.d.f. is given by
$$\pi(\sigma^2 \mid x, \theta) \propto f(x \mid \theta, \sigma^2)\, \pi(\sigma^2),$$
and the right-hand side, as a function of $\sigma^2$, is proportional to
$$(\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\sum (x_i - \theta)^2\right) (\sigma^2)^{-\nu/2 - 1} \exp\left(-\frac{\nu\lambda}{2\sigma^2}\right).$$
Thus $\pi(\sigma^2 \mid x, \theta)$ is proportional to
$$(\sigma^2)^{-(\nu + n)/2 - 1} \exp\left(-\frac{1}{2\sigma^2}\left[\nu\lambda + \sum (x_i - \theta)^2\right]\right).$$
This is the same as the prior except that $\nu$ is replaced by $\nu + n$ and $\lambda$ is replaced by
$$\frac{\nu\lambda + \sum (x_i - \theta)^2}{\nu + n}.$$
Consider the random variable
$$W = \frac{\nu\lambda + \sum (x_i - \theta)^2}{\sigma^2}.$$
Clearly
$$f_W(w) = C\, w^{(\nu + n)/2 - 1}\, e^{-w/2}, \qquad w \ge 0.$$
This is a $\chi^2$-distribution with $\nu + n$ degrees of freedom. In other words,
$$\frac{\nu\lambda + \sum (x_i - \theta)^2}{\sigma^2} \;\sim\; \chi^2(\nu + n).$$
Again this agrees with intuition: as $n \to \infty$ we approach the classical estimate, and also as $\nu \to 0$, which corresponds to vague prior knowledge.

Mean and variance unknown

We now have to assign a joint prior p.d.f. $\pi(\theta, \sigma^2)$ to obtain a joint posterior p.d.f.
$$\pi(\theta, \sigma^2 \mid x) \propto f(x \mid \theta, \sigma^2)\, \pi(\theta, \sigma^2).$$
We shall look at the special case of a non-informative prior. From theoretical considerations beyond the scope of this course, a non-informative prior may be approximated by
$$\pi(\theta, \sigma^2) \propto \frac{1}{\sigma^2}.$$
Note that this is not to be thought of as an expression of prior beliefs, but rather as an approximate prior which allows the posterior to be dominated by the data. Then
$$\pi(\theta, \sigma^2 \mid x) \propto (\sigma^2)^{-n/2 - 1} \exp\left(-\frac{1}{2\sigma^2}\sum (x_i - \theta)^2\right).$$
The right-hand side may be written in the form
$$(\sigma^2)^{-n/2 - 1} \exp\left(-\frac{1}{2\sigma^2}\left[\sum (x_i - \bar x)^2 + n(\bar x - \theta)^2\right]\right).$$
We can use this form to integrate out, in turn, $\theta$ and $\sigma^2$. Using
$$\int_{-\infty}^{\infty} \exp\left(-\frac{n}{2\sigma^2}(\theta - \bar x)^2\right) d\theta = \sqrt{\frac{2\pi\sigma^2}{n}},$$
we find
$$\pi(\sigma^2 \mid x) \propto (\sigma^2)^{-(n-1)/2 - 1} \exp\left(-\frac{1}{2\sigma^2}\sum (x_i - \bar x)^2\right)$$
and, putting $W = \sum (x_i - \bar x)^2/\sigma^2$,
$$f_W(w) = C\, w^{(n-1)/2 - 1}\, e^{-w/2}, \qquad w \ge 0,$$
so we see that
$$\frac{\sum (x_i - \bar x)^2}{\sigma^2} \;\sim\; \chi^2(n - 1).$$
To integrate out $\sigma^2$, make the substitution $\sigma^2 = \tau^{-1}$. Then $\pi(\theta \mid x)$ is proportional to
$$\int_0^\infty \tau^{n/2 - 1} \exp\left(-\frac{\tau}{2}\left[\sum (x_i - \bar x)^2 + n(\bar x - \theta)^2\right]\right) d\tau.$$
From the definition of the $\Gamma$-function,
$$\int_0^\infty y^{\alpha - 1} e^{-\beta y}\, dy = \frac{\Gamma(\alpha)}{\beta^\alpha},$$
and we see, since only the terms in $\theta$ are relevant,
$$\pi(\theta \mid x) \propto \left(\sum (x_i - \bar x)^2 + n(\bar x - \theta)^2\right)^{-n/2},$$
or
$$\pi(\theta \mid x) \propto \left(1 + \frac{t^2}{n - 1}\right)^{-n/2},$$
where
$$t = \frac{\sqrt{n}\,(\theta - \bar x)}{\sqrt{\sum (x_i - \bar x)^2/(n - 1)}}.$$
This is a $t$-distribution with $n - 1$ degrees of freedom.
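Two of the results in this section can be checked numerically: the precision-weighted posterior for a normal mean with known variance, and the $\chi^2(n-1)$ marginal for the variance under the non-informative prior. A sketch, with all numerical values invented for illustration:

```python
# Checks of two results from this section.
import random

def normal_posterior(xbar, n, sigma2, tau, kappa2):
    """Posterior mean and variance of theta for N(theta, sigma2) data with a
    N(tau, kappa2) prior: the precision-weighted average derived above."""
    w_data, w_prior = n / sigma2, 1.0 / kappa2
    return ((w_data * xbar + w_prior * tau) / (w_data + w_prior),
            1.0 / (w_data + w_prior))

# prior N(0, 1); n = 4 observations with mean 2 and known sigma^2 = 1
m, v = normal_posterior(xbar=2.0, n=4, sigma2=1.0, tau=0.0, kappa2=1.0)
print(m, v)   # 1.6 0.2

# sum((x_i - xbar)^2)/sigma^2 should average n - 1 (chi-square, n - 1 df)
rng = random.Random(0)
n, sigma, reps, acc = 8, 2.0, 40000, 0.0
for _ in range(reps):
    xs = [rng.gauss(5.0, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    acc += sum((x - xbar) ** 2 for x in xs) / sigma**2
print(acc / reps)   # close to n - 1 = 7
```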
More informationHypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3
Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest
More informationHPD Intervals / Regions
HPD Intervals / Regions The HPD region will be an interval when the posterior is unimodal. If the posterior is multimodal, the HPD region might be a discontiguous set. Picture: The set {θ : θ (1.5, 3.9)
More informationBayesian Inference. STA 121: Regression Analysis Artin Armagan
Bayesian Inference STA 121: Regression Analysis Artin Armagan Bayes Rule...s! Reverend Thomas Bayes Posterior Prior p(θ y) = p(y θ)p(θ)/p(y) Likelihood - Sampling Distribution Normalizing Constant: p(y
More informationProbability. Lecture Notes. Adolfo J. Rumbos
Probability Lecture Notes Adolfo J. Rumbos October 20, 204 2 Contents Introduction 5. An example from statistical inference................ 5 2 Probability Spaces 9 2. Sample Spaces and σ fields.....................
More informationChapter 4. Bayesian inference. 4.1 Estimation. Point estimates. Interval estimates
Chapter 4 Bayesian inference The posterior distribution π(θ x) summarises all our information about θ to date. However, sometimes it is helpful to reduce this distribution to a few key summary measures.
More informationThe Bayesian Paradigm
Stat 200 The Bayesian Paradigm Friday March 2nd The Bayesian Paradigm can be seen in some ways as an extra step in the modelling world just as parametric modelling is. We have seen how we could use probabilistic
More informationLecture 25: Review. Statistics 104. April 23, Colin Rundel
Lecture 25: Review Statistics 104 Colin Rundel April 23, 2012 Joint CDF F (x, y) = P [X x, Y y] = P [(X, Y ) lies south-west of the point (x, y)] Y (x,y) X Statistics 104 (Colin Rundel) Lecture 25 April
More informationHypothesis Test. The opposite of the null hypothesis, called an alternative hypothesis, becomes
Neyman-Pearson paradigm. Suppose that a researcher is interested in whether the new drug works. The process of determining whether the outcome of the experiment points to yes or no is called hypothesis
More informationPeter Hoff Minimax estimation November 12, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11
Contents 1 Motivation and definition 1 2 Least favorable prior 3 3 Least favorable prior sequence 11 4 Nonparametric problems 15 5 Minimax and admissibility 18 6 Superefficiency and sparsity 19 Most of
More informationSTAT215: Solutions for Homework 2
STAT25: Solutions for Homework 2 Due: Wednesday, Feb 4. (0 pt) Suppose we take one observation, X, from the discrete distribution, x 2 0 2 Pr(X x θ) ( θ)/4 θ/2 /2 (3 θ)/2 θ/4, 0 θ Find an unbiased estimator
More informationStatistics Ph.D. Qualifying Exam: Part I October 18, 2003
Statistics Ph.D. Qualifying Exam: Part I October 18, 2003 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. 1 2 3 4 5 6 7 8 9 10 11 12 2. Write your answer
More informationBayesian Inference: Posterior Intervals
Bayesian Inference: Posterior Intervals Simple values like the posterior mean E[θ X] and posterior variance var[θ X] can be useful in learning about θ. Quantiles of π(θ X) (especially the posterior median)
More informationDefinition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution.
Hypothesis Testing Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution. Suppose the family of population distributions is indexed
More informationInference for a Population Proportion
Al Nosedal. University of Toronto. November 11, 2015 Statistical inference is drawing conclusions about an entire population based on data in a sample drawn from that population. From both frequentist
More informationCSC321 Lecture 18: Learning Probabilistic Models
CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling
More information(3) Review of Probability. ST440/540: Applied Bayesian Statistics
Review of probability The crux of Bayesian statistics is to compute the posterior distribution, i.e., the uncertainty distribution of the parameters (θ) after observing the data (Y) This is the conditional
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More information2016 SISG Module 17: Bayesian Statistics for Genetics Lecture 3: Binomial Sampling
2016 SISG Module 17: Bayesian Statistics for Genetics Lecture 3: Binomial Sampling Jon Wakefield Departments of Statistics and Biostatistics University of Washington Outline Introduction and Motivating
More informationBayesian inference. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. April 10, 2017
Bayesian inference Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark April 10, 2017 1 / 22 Outline for today A genetic example Bayes theorem Examples Priors Posterior summaries
More informationLecture 17: The Exponential and Some Related Distributions
Lecture 7: The Exponential and Some Related Distributions. Definition Definition: A continuous random variable X is said to have the exponential distribution with parameter if the density of X is e x if
More information10. Exchangeability and hierarchical models Objective. Recommended reading
10. Exchangeability and hierarchical models Objective Introduce exchangeability and its relation to Bayesian hierarchical models. Show how to fit such models using fully and empirical Bayesian methods.
More informationLinear Models A linear model is defined by the expression
Linear Models A linear model is defined by the expression x = F β + ɛ. where x = (x 1, x 2,..., x n ) is vector of size n usually known as the response vector. β = (β 1, β 2,..., β p ) is the transpose
More informationMAS3301 Bayesian Statistics
MAS331 Bayesian Statistics M. Farrow School of Mathematics and Statistics Newcastle University Semester 2, 28-9 1 9 Conjugate Priors II: More uses of the beta distribution 9.1 Geometric observations 9.1.1
More informationFoundations of Statistical Inference
Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 32 Lecture 14 : Variational Bayes
More informationLecture 3. Univariate Bayesian inference: conjugate analysis
Summary Lecture 3. Univariate Bayesian inference: conjugate analysis 1. Posterior predictive distributions 2. Conjugate analysis for proportions 3. Posterior predictions for proportions 4. Conjugate analysis
More informationFinal Examination a. STA 532: Statistical Inference. Wednesday, 2015 Apr 29, 7:00 10:00pm. Thisisaclosed bookexam books&phonesonthefloor.
Final Examination a STA 532: Statistical Inference Wednesday, 2015 Apr 29, 7:00 10:00pm Thisisaclosed bookexam books&phonesonthefloor Youmayuseacalculatorandtwo pagesofyourownnotes Do not share calculators
More informationPreliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com
1 School of Oriental and African Studies September 2015 Department of Economics Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com Gujarati D. Basic Econometrics, Appendix
More informationBayesian inference: what it means and why we care
Bayesian inference: what it means and why we care Robin J. Ryder Centre de Recherche en Mathématiques de la Décision Université Paris-Dauphine 6 November 2017 Mathematical Coffees Robin Ryder (Dauphine)
More informationf(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain
0.1. INTRODUCTION 1 0.1 Introduction R. A. Fisher, a pioneer in the development of mathematical statistics, introduced a measure of the amount of information contained in an observaton from f(x θ). Fisher
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 2. Statistical Schools Adapted from slides by Ben Rubinstein Statistical Schools of Thought Remainder of lecture is to provide
More informationBayesian Inference. Chapter 2: Conjugate models
Bayesian Inference Chapter 2: Conjugate models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in
More informationSTAT 512 sp 2018 Summary Sheet
STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}
More informationFundamental Theory of Statistical Inference
Fundamental Theory of Statistical Inference G. A. Young January 2012 Preface These notes cover the essential material of the LTCC course Fundamental Theory of Statistical Inference. They are extracted
More informationBayesian Statistics. Debdeep Pati Florida State University. February 11, 2016
Bayesian Statistics Debdeep Pati Florida State University February 11, 2016 Historical Background Historical Background Historical Background Brief History of Bayesian Statistics 1764-1838: called probability
More informationInterval Estimation. Chapter 9
Chapter 9 Interval Estimation 9.1 Introduction Definition 9.1.1 An interval estimate of a real-values parameter θ is any pair of functions, L(x 1,..., x n ) and U(x 1,..., x n ), of a sample that satisfy
More information