Notes on Jaynes's Treatment of the Marginalization Paradox
Kevin S. Van Horn, Leuther Analytics. November 23, 2003.

In Chapter 15 of Probability Theory: The Logic of Science [2] (abbreviated PTLOS), Jaynes discusses the Marginalization Paradox (MP) of Dawid, Stone, and Zidek [1] (abbreviated DSZ), going into details for two specific examples. In the MP we have a model with four variables: $\zeta$, $\eta$, $y$, and $z$. $\zeta$ is the parameter of interest, $\eta$ is a nuisance parameter, and $y$ and $z$ are data. $\zeta$ and $\eta$ are assigned independent priors $\pi_1(\zeta)$ and $\pi_2(\eta)$, and we are given $p(y, z \mid \zeta, \eta)$. Suppose now that $\pi_2(\eta)$ belongs to a certain class of improper priors; the MP is that, deriving $p(\zeta \mid z)$ in two different but apparently equally legitimate ways, we get different answers. This result has been used to argue that one should not use improper priors at all in Bayesian analyses.

Jaynes argues that the problem is not the use of an improper prior, but that the two derivations implicitly condition on different states of information, and hence we should expect to get different answers. In this note I show that Jaynes is wrong: the problem does in fact stem from the use of an improper prior. To Jaynes's credit, however, the paradox is resolved by carefully following the procedures he advocates in PTLOS.

1 The change-point problem

In PTLOS, Jaynes looks specifically at a particular parameterization of the change-point problem, wherein $y$ is a scalar, $z$ is a vector of length $n-1$, and we have

  $p(y, z \mid \zeta, \eta, I_1) = c^n(\zeta)\, \eta^n\, y^{n-1} \exp(-\eta y\, Q(\zeta, z))$   (1)

for a function $c$ and a positive function $Q$. Here $I_1$ is the state of information of a Bayesian $B_1$ who knows about all four variables and the above sampling distribution. Integrating out $y$, we find

  $p(z \mid \zeta, \eta, I_1) = \dfrac{c^n(\zeta)\,(n-1)!}{Q(\zeta, z)^n}$.   (2)
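A quick numerical sanity check of the step from (1) to (2): since $\int_0^\infty y^{n-1} e^{-\eta y Q}\, dy = (n-1)!/(\eta Q)^n$, the factor $\eta^n$ in (1) cancels and the marginal for $z$ no longer depends on $\eta$. The particular values of $n$, $Q$, and $\eta$ below are arbitrary illustrative choices:

```python
import math

def trapezoid(f, a, b, steps):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

n, Q = 5, 3.0                          # arbitrary illustrative values
exact = math.factorial(n - 1) / Q**n   # (n-1)!/Q^n: the eta-free result in (2)

results = {}
for eta in (0.7, 2.0):                 # two different nuisance-parameter values
    # eta^n * integral_0^infinity of y^(n-1) exp(-eta*y*Q) dy,
    # truncated where the integrand is negligibly small
    upper = 50.0 / (eta * Q)
    integral = trapezoid(lambda y: y**(n - 1) * math.exp(-eta * y * Q),
                         0.0, upper, 100_000)
    results[eta] = eta**n * integral

print(results, exact)
```

Both entries of `results` agree with `exact` to high accuracy, confirming that the $\eta$-dependence cancels.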
Since the right-hand side of (2) has no dependence on $\eta$, Jaynes follows DSZ in concluding (PTLOS) that

  $p(z \mid \zeta, I_1) = \dfrac{c^n(\zeta)\,(n-1)!}{Q(\zeta, z)^n}$.   (3)

Now, this is the first point where we need to exercise some caution. How do we justify the last step above? Let's carefully apply the rules of probability theory. To get $p(z \mid \zeta, I_1)$ from $p(z \mid \zeta, \eta, I_1)$ we must multiply by $p(\eta \mid \zeta, I_1)$ and integrate out $\eta$:

  $p(z \mid \zeta, I_1) = \int d\eta\, p(z, \eta \mid \zeta, I_1)$
  $\qquad = \int d\eta\, p(z \mid \eta, \zeta, I_1)\, p(\eta \mid \zeta, I_1)$
  $\qquad = p(z \mid \eta_0, \zeta, I_1) \int d\eta\, p(\eta \mid \zeta, I_1)$
  $\qquad = p(z \mid \eta_0, \zeta, I_1)$
  $\qquad = \dfrac{c^n(\zeta)\,(n-1)!}{Q(\zeta, z)^n}$,

for any $0 < \eta_0 < \infty$. The third equality follows because $p(z \mid \eta, \zeta, I_1)$ has no dependence on $\eta$.

What if $p(\eta \mid \zeta, I_1)$ is improper? The usual procedure would be to replace $=$ with $\propto$ above, in the hope that the end result will be a proper distribution, and then normalize when we are done. The problem, however, is that by definition the integral on the third line above diverges. Thus, the above derivation of $p(z \mid \zeta)$ is invalid when using an improper prior over $\eta$. In such circumstances, Jaynes tells us that we must consider the sequence of proper priors whose limit our improper prior represents, go back and do our analysis with these proper priors, and see if our result converges to some well-defined limit. If it does converge, we take this limit as our result for the improper prior. If it does not converge, then we cannot use the improper prior. In this case we get the same answer at every point in whatever sequence of proper priors we might choose, and so we have our well-defined limit. Thus we can justify (3) even for improper priors.

At this point Jaynes, again following DSZ, postulates another Bayesian $B_2$ who does not know of the existence of $\eta$ or $y$; his knowledge is limited to $\zeta$, $z$, the prior $\pi_1(\zeta)$, and the marginal sampling distribution (3). $B_2$ then applies Bayes' rule to derive (PTLOS 15.7)

  $p(\zeta \mid z, I_2) \propto \dfrac{\pi_1(\zeta)\, c^n(\zeta)}{[Q(\zeta, z)]^n}$,   (4)

where $I_2$ represents the state of information of $B_2$. Suppose now that $B_1$ assigns an improper prior for $\eta$:

  $\pi_2(\eta) \propto \eta^k, \quad 0 < \eta < \infty$.
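The divergence blocking the naive derivation is concrete: for $\pi_2(\eta) \propto \eta^k$, the normalization integral $\int_0^\infty \eta^k\, d\eta$ diverges for every real $k$ — at the upper limit when $k \geq -1$ and at the lower limit when $k \leq -1$. A small sketch using the closed-form partial integrals (the sample values of $k$ are arbitrary):

```python
import math

def partial_integral(k, a, b):
    """Closed form of the integral of eta^k over [a, b], with 0 < a < b."""
    if k == -1:
        return math.log(b) - math.log(a)
    return (b**(k + 1) - a**(k + 1)) / (k + 1)

# k >= -1: the integral over [1, T] grows without bound as T -> infinity
grows_at_infinity = {k: [partial_integral(k, 1.0, T) for T in (1e2, 1e4, 1e6)]
                     for k in (0.5, -1.0)}

# k <= -1: the integral over [eps, 1] grows without bound as eps -> 0
blows_up_at_zero = {k: [partial_integral(k, eps, 1.0) for eps in (1e-2, 1e-4, 1e-6)]
                    for k in (-1.0, -2.5)}

print(grows_at_infinity, blows_up_at_zero)
```

Each list of partial integrals increases without bound as its cutoff is pushed toward the offending endpoint, so no choice of $k$ yields a proper prior on $(0, \infty)$.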
Applying Bayes' rule with $p(\zeta, \eta \mid I_1) = \pi_1(\zeta)\, \pi_2(\eta)$ and the sampling distribution (1) to obtain $p(\zeta, \eta \mid y, z, I_1)$, then integrating out $\eta$, Jaynes and DSZ obtain the marginal posterior for $\zeta$ (PTLOS)

  $p(\zeta \mid y, z, I_1) \propto \dfrac{\pi_1(\zeta)\, c^n(\zeta)}{[Q(\zeta, z)]^{n+k+1}}$   (5)

and thereby conclude, since (5) has no dependence on $y$, that

  $p(\zeta \mid z, I_1) \propto \dfrac{\pi_1(\zeta)\, c^n(\zeta)}{[Q(\zeta, z)]^{n+k+1}}$.   (6)

(4) and (6) are clearly different posterior distributions for $\zeta$. Jaynes asserts that (6) is in fact correct for $B_1$, and attributes the difference with (4) to the different prior information of $B_1$ and $B_2$.

To see that Jaynes's analysis is incorrect, it suffices to note that

  $p(z \mid \zeta, I_1) = p(z \mid \zeta, I_2)$
  $p(\zeta \mid I_1) = p(\zeta \mid I_2)$;

$B_1$ is also entitled to apply Bayes' rule to the above, and doing so, obtains the same answer as $B_2$:

  $p(\zeta \mid z, I_1) \propto \dfrac{\pi_1(\zeta)\, c^n(\zeta)}{[Q(\zeta, z)]^n}$.   (7)

Since this contradicts (6), we do indeed have a paradox to resolve. We resolve the paradox as follows:

- We show that the step from (5) to (6) is invalid: taking this step involves integrating $p(y \mid z, I_1)$, which itself is improper, and hence the integral diverges. Thus we cannot obtain (6).
- We take the conservative approach of defining the sequence of proper priors whose limit $\pi_2(\eta)$ represents, then deriving $p(\zeta \mid y, z)$ for each of these proper priors; this gives us (7), and not (6), as the limit.

2 An overlooked impropriety

As we did with (3), let us carefully go through the steps we would have to take to derive (6) from (5):

  $p(\zeta \mid z, I_1) = \int dy\, p(\zeta, y \mid z, I_1)$
  $\qquad = \int dy\, p(\zeta \mid y, z, I_1)\, p(y \mid z, I_1)$
  $\qquad = p(\zeta \mid y_0, z, I_1) \int dy\, p(y \mid z, I_1)$
  $\qquad = p(\zeta \mid y_0, z, I_1)$

for any $0 < y_0 < \infty$, since $p(\zeta \mid y, z, I_1)$ has no dependence on $y$. As in the discussion of (3), the above derivation fails if $p(y \mid z, I_1)$ is improper, as the integral on the second-to-last line diverges.

We now show that $p(y \mid z, I_1)$ is, indeed, improper, given the improper prior over $\eta$. Applying (1) and the improper prior $\pi_2(\eta) \propto \eta^k$, we obtain

  $p(y \mid z, I_1) \propto p(y, z \mid I_1)$
  $\qquad = \sum_\zeta \pi_1(\zeta) \int_0^\infty d\eta\, \eta^k\, p(y, z \mid \zeta, \eta, I_1)$
  $\qquad \propto y^{n-1} \sum_\zeta \pi_1(\zeta)\, c^n(\zeta) \int_0^\infty d\eta\, \eta^{n+k} \exp(-\eta y\, Q(\zeta, z))$
  $\qquad = y^{n-1} \sum_\zeta \pi_1(\zeta)\, c^n(\zeta)\, \dfrac{\Gamma(n+k+1)}{[y\, Q(\zeta, z)]^{n+k+1}}$
  $\qquad \propto y^{-k-2}$,

which is improper. In the above we used the equality $\int_0^\infty x^{\alpha-1} e^{-\beta x}\, dx = \Gamma(\alpha)/\beta^\alpha$, which should be familiar as the normalization constant for the gamma distribution. Thus we cannot obtain (6) from (5), and the paradox disappears. I believe this to be the heart of the Marginalization Paradox: the implicit use of a divergent integral when an improper prior for a parameter leads to an improper distribution for a data variable.

3 Take it to the limit

The divergent integral in Section 2 tells us that we need to return to basics; that is, we need to repeat the analysis with a sequence of proper priors, and see if the results tend to some well-defined limit. Following Jaynes, we look at proper priors of the form

  $\pi_2(\eta) \propto \eta^a \exp(-b\eta)$   (8)

for $b > 0$ and $a > -1$; our previous improper prior corresponds to $a = k$ and $b \to 0$. Let us write $I_1(b)$ for the state of information corresponding to use of this prior for a specific value $b$. Corresponding to (5) we now have

  $p(\zeta \mid y, z, I_1(b)) \propto \dfrac{\pi_1(\zeta)\, c^n(\zeta)}{[b + y\, Q(\zeta, z)]^{n+a+1}}$,   (9)

which is just the corresponding equation in PTLOS. We must be careful here, though. We are going to multiply by $p(y \mid z, I_1(b))$ and then integrate out $y$ in order to obtain $p(\zeta \mid z, I_1(b))$. In using $\propto$ instead of $=$ we have dropped factors that do not depend on $\zeta$, but may depend on $y$; we need the $y$-dependent factors to get the correct result. So
we shall put the normalization constant back in:

  $p(\zeta \mid y, z, I_1(b)) = \dfrac{\pi_1(\zeta)\, c^n(\zeta)\, [b + y\, Q(\zeta, z)]^{-(n+a+1)}}{\sum_{\zeta'} \pi_1(\zeta')\, c^n(\zeta')\, [b + y\, Q(\zeta', z)]^{-(n+a+1)}}$.   (10)

Now let us again use (1) to derive the marginal distribution of $y$ conditional on $z$, only this time for a proper prior:

  $p(y \mid z, I_1(b)) \propto p(y, z \mid I_1(b))$
  $\qquad = \sum_\zeta \pi_1(\zeta) \int_0^\infty d\eta\, p(\eta \mid I_1(b))\, p(y, z \mid \zeta, \eta, I_1(b))$
  $\qquad \propto y^{n-1} \sum_\zeta \pi_1(\zeta)\, c^n(\zeta) \int_0^\infty d\eta\, \eta^{n+a} \exp(-\eta\, [b + y\, Q(\zeta, z)])$
  $\qquad \propto y^{n-1} \sum_\zeta \pi_1(\zeta)\, c^n(\zeta)\, [b + y\, Q(\zeta, z)]^{-(n+a+1)}$.   (11)

The proportionality constant depends only on $z$ and $b$; in particular, it doesn't depend on $\zeta$. We now combine (11) with (10) to obtain

  $p(\zeta \mid z, I_1(b)) = \int_0^\infty dy\, p(\zeta \mid y, z, I_1(b))\, p(y \mid z, I_1(b))$
  $\qquad \propto \pi_1(\zeta)\, c^n(\zeta) \int_0^\infty dy\, y^{n-1}\, [b + y\, Q(\zeta, z)]^{-(n+a+1)}$.   (12)

To finish the derivation, we need the following facts, obtained by use of Mathematica:

  $\int_0^y t^r\, (b + ut)^{-m}\, dt = \dfrac{y^{r+1}\, b^{-m}}{r+1}\, {}_2F_1(r+1, m; r+2; -uy/b)$   (13)

  ${}_2F_1(\alpha, \beta; \gamma; s) = \dfrac{\Gamma(\gamma)}{\Gamma(\beta)\,\Gamma(\gamma-\beta)} \int_0^1 t^{\beta-1}\, (1-t)^{\gamma-\beta-1}\, (1 - ts)^{-\alpha}\, dt$.   (14)

From (14) we find

  ${}_2F_1(\alpha, \beta; \gamma; 0) = \dfrac{\Gamma(\gamma)}{\Gamma(\beta)\,\Gamma(\gamma-\beta)} \int_0^1 t^{\beta-1}\, (1-t)^{\gamma-\beta-1}\, dt = \dfrac{\Gamma(\gamma)}{\Gamma(\beta)\,\Gamma(\gamma-\beta)} \cdot \dfrac{\Gamma(\beta)\,\Gamma(\gamma-\beta)}{\Gamma(\gamma)} = 1$;   (15)

furthermore, as $y \to \infty$, using the symmetry of ${}_2F_1$ in its first two arguments,

  $y^{r+1}\, {}_2F_1(r+1, m; r+2; -uy/b) = \kappa \int_0^1 t^{m-1}\, (1-t)^{r-m+1}\, y^{r+1}\, (1 + uyt/b)^{-(r+1)}\, dt$
  $\qquad \to \kappa\, (b/u)^{r+1} \int_0^1 t^{m-r-2}\, (1-t)^{r-m+1}\, dt$
  $\qquad = \kappa\, \Gamma(m-r-1)\, \Gamma(r-m+2)\, (b/u)^{r+1}$
  $\qquad = \dfrac{\Gamma(r+2)\, \Gamma(m-r-1)}{\Gamma(m)}\, (b/u)^{r+1}$,   (16)

where $\kappa \equiv \Gamma(r+2)/[\Gamma(m)\,\Gamma(r-m+2)]$. Combining (13), (15), and (16), we obtain

  $\int_0^\infty y^r\, (b + uy)^{-m}\, dy = \dfrac{\Gamma(r+2)\, \Gamma(m-r-1)}{(r+1)\, \Gamma(m)}\, b^{r-m+1}\, u^{-(r+1)}$.   (17)

Substituting this into (12), with $r = n-1$, $u = Q(\zeta, z)$, and $m = n+a+1$, we obtain

  $p(\zeta \mid z, I_1(b)) \propto \dfrac{\pi_1(\zeta)\, c^n(\zeta)}{[Q(\zeta, z)]^n}$,

which accords with (7). The above holds regardless of $b$, so it is also the limit as $b \to 0$. Thus (7), and not (6), is the correct solution for $B_1$.

4 What went wrong: non-uniform convergence

The reader may still be puzzled as to how $p(\zeta \mid z, I_1)$ could differ from $p(\zeta \mid y, z, I_1)$ if the latter has no dependence on $y$. To clarify what is happening we look again at what happens with the proper priors (8) as $b \to 0$ and we approach the improper prior $\eta^a$. The problem is one of non-uniform convergence. Comparing (5) and (9), we have

  $\pi_1(\zeta)\, c^n(\zeta)\, [b + y\, Q(\zeta, z)]^{-(n+a+1)} \approx \pi_1(\zeta)\, c^n(\zeta)\, y^{-(n+a+1)}\, [Q(\zeta, z)]^{-(n+a+1)}$

only for $y \gg b/Q(\zeta, z)$. For any fixed value of $y$, $p(\zeta \mid y, z, I_1(b))$ does indeed tend to (5) as $b \to 0$; nonetheless, for all values of $b$, no matter how small, there remain values of $y$ for which $p(\zeta \mid y, z, I_1(b))$ differs greatly from (5).

Is this important? Since the only way to get rid of the conditioning on $y$ is to multiply by $p(y \mid z, I_1(b))$ and then integrate out $y$, to obtain (6) we need to have $P(y \leq Cb/Q(\zeta, z) \mid z, I_1(b)) \to 0$, for any $\zeta$, $z$, and constant $C > 0$, as $b \to 0$.

In Figure 1 we plot $p(u \mid z, I_1(b))$, where $u \equiv y\bar{Q}/b$ and $\bar{Q}$ is the average of $Q(\zeta, z)$, $1 \leq \zeta \leq n$. We chose $a = n = 1$, $c = 1.5$, and randomly generated the vector $z$ as follows: $x_i$, $1 \leq i \leq n/2$, were drawn from an exponential distribution with mean 1; $x_i$, $n/2 < i \leq n$, were drawn from an exponential distribution with mean $1/c$; and $z_i \equiv x_i/x_1$.

[Figure 1: $p(u \mid z, I_1(b))$, where $u \equiv y\bar{Q}/b$.]

The ratios $Q(\zeta, z)/\bar{Q}$ range from 0.819 to 1.167; thus $u \approx 1$ corresponds roughly to $y \approx b/Q(\zeta, z)$. We see then that in this case there is substantial probability mass in the region $y \leq b/Q(\zeta, z)$ for all $\zeta$, which is precisely the region in which (5) differs substantially from (9). We have not specified $b$, because $p(u \mid z, I_1(b))$ has no dependence on $b$! Thus as $b \to 0$ we find that, for any $\zeta$, $P(u \leq C\bar{Q}/Q(\zeta, z) \mid z, I_1(b))$ remains constant, and hence so does $P(y \leq Cb/Q(\zeta, z) \mid z, I_1(b))$. To see this, note from (11) and the definition of $u$ that

  $p(u \mid z, I_1(b)) \propto u^{n-1} \sum_\zeta \pi_1(\zeta)\, c^n(\zeta)\, [b + bu\, Q(\zeta, z)/\bar{Q}]^{-(n+a+1)}$
  $\qquad \propto u^{n-1} \sum_\zeta \pi_1(\zeta)\, c^n(\zeta)\, [1 + u\, Q(\zeta, z)/\bar{Q}]^{-(n+a+1)}$,

which has no dependence on $b$.

5 DSZ Example #5

Jaynes analyzes a second specific MP example in PTLOS; again, the apparent paradox can be traced to an overlooked improper distribution, and the paradox is resolved by noting that the integral we must take to obtain the paradox diverges. In this case we have iid variables $x_i$, $1 \leq i \leq n$, each normally distributed with mean $\mu$ and variance $\sigma^2$. We then define $\zeta = \mu/\sigma$, $\nu = \sigma$, $y = R$, and $z = r$, where

  $R^2 \equiv n\,(\bar{x}^2 + s^2)$
  $r \equiv n\bar{x}/R$
  $\bar{x} \equiv n^{-1} \sum_{i=1}^n x_i$
  $s^2 \equiv n^{-1} \sum_{i=1}^n (x_i - \bar{x})^2$.

The paradox arises when we use an improper prior of the form

  $p(\mu, \sigma) \propto \sigma^{-\gamma-1}, \quad \gamma > 0$,

taken as the limit of the proper prior

  $p(\mu, \sigma) \propto \sigma^{-\gamma-1} \exp(-\beta/\sigma - \alpha\mu^2)$

as $\alpha, \beta \to 0$. Similarly to the first example, $p(\zeta \mid r)$ is derived by first finding $p(\zeta \mid r, R) \propto h(\zeta, r)$, noting that this has no dependence on $R$, and concluding that $p(\zeta \mid r) \propto h(\zeta, r)$ also. As before, this last step is invalid, and therefore the paradox cannot be created, because $p(R \mid r)$ is improper when we use the improper prior for $\mu$ and $\sigma$. We now present the proof that $p(R \mid r)$ is improper.

Theorem: Let $x_i \sim N(\mu, \sigma)$ for $1 \leq i \leq n$, and define $S^2 = \sum_i (x_i - \bar{x})^2$. Then $\bar{x}$ and $S^2$ are independent given $\mu$ and $\sigma$; $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$; and $S^2 \sim \sigma^2 \chi^2_{n-1}$.

Proof: This is a standard result in statistics; see, for example, M. G. Bulmer, Principles of Statistics.

We have $s^2 = S^2/n$; define also $u = S^2\sigma^{-2} = n\sigma^{-2} s^2$. Then $du = n\sigma^{-2}\, ds^2$, and combining this with the above theorem we have

  $p(\bar{x} \mid \mu, \sigma) \propto \sigma^{-1} \exp\!\left(-\dfrac{n(\bar{x} - \mu)^2}{2\sigma^2}\right)$
  $p(u \mid \mu, \sigma) \propto u^{(n-3)/2}\, e^{-u/2}$
  $p(s^2 \mid \mu, \sigma) \propto \sigma^{-n+1}\, s^{n-3} \exp\!\left(-\dfrac{n}{2\sigma^2}\, s^2\right)$
  $p(\bar{x}, s^2 \mid \mu, \sigma) \propto \sigma^{-n}\, s^{n-3} \exp\!\left(-\dfrac{n}{2\sigma^2}\left[s^2 + (\bar{x} - \mu)^2\right]\right)$.

We will later remove the conditioning on $\mu$ and $\sigma$ by multiplying by the priors on these variables and integrating them out. For this reason we explicitly show any factors dependent on $\mu$ or $\sigma$, rather than absorbing them into the proportionality constants above.

We now want to do a change of variables from $\bar{x}$ and $s^2$ to $r$ and $R$. To compute the Jacobian, we first find

  $\dfrac{\partial R}{\partial \bar{x}} = \dfrac{1}{2}\left[n(\bar{x}^2 + s^2)\right]^{-1/2} \cdot 2n\bar{x} = n\bar{x}R^{-1}$
  $\dfrac{\partial R}{\partial s^2} = \dfrac{1}{2}\left[n(\bar{x}^2 + s^2)\right]^{-1/2} \cdot n = \dfrac{1}{2}\, nR^{-1}$
  $\dfrac{\partial r}{\partial \bar{x}} = \dfrac{Rn - n\bar{x}\cdot n\bar{x}R^{-1}}{R^2} = \dfrac{n}{R} - \dfrac{n^2\bar{x}^2}{R^3}$
  $\dfrac{\partial r}{\partial s^2} = -n\bar{x}R^{-2} \cdot \dfrac{1}{2}\, nR^{-1} = -\dfrac{n^2\bar{x}}{2R^3}$.

Then the Jacobian is the absolute value of

  $\dfrac{\partial R}{\partial \bar{x}}\dfrac{\partial r}{\partial s^2} - \dfrac{\partial R}{\partial s^2}\dfrac{\partial r}{\partial \bar{x}} = n\bar{x}R^{-1}\left(-\dfrac{n^2\bar{x}}{2R^3}\right) - \dfrac{1}{2}nR^{-1}\left(\dfrac{n}{R} - \dfrac{n^2\bar{x}^2}{R^3}\right)$
  $\qquad = -\dfrac{n^3\bar{x}^2}{2R^4} - \dfrac{n^2}{2R^2} + \dfrac{n^3\bar{x}^2}{2R^4} = -\dfrac{1}{2}\, n^2 R^{-2}$,

giving $d\bar{x}\, ds^2 = 2n^{-2}R^2\, dr\, dR$. We now combine this with the fact that

  $\bar{x} = n^{-1} rR$
  $s = \left[n^{-1}R^2 - \bar{x}^2\right]^{1/2} = \left[n^{-1}R^2 - n^{-2}r^2R^2\right]^{1/2} = n^{-1}R\, (n - r^2)^{1/2}$

to obtain

  $p(R, r \mid \mu, \sigma) \propto R^2\, p(\bar{x}, s^2 \mid \mu, \sigma)$
  $\qquad \propto R^2\, \sigma^{-n}\, s^{n-3} \exp\!\left(-\dfrac{n}{2\sigma^2}\left[s^2 + \bar{x}^2 + \mu^2 - 2\bar{x}\mu\right]\right)$
  $\qquad \propto R^2\, \sigma^{-n}\, s^{n-3} \exp\!\left(-\dfrac{1}{2\sigma^2}\left[R^2 - 2\mu rR + n\mu^2\right]\right)$
  $\qquad \propto R^{n-1}\, \sigma^{-n}\, v(r) \exp\!\left(-\dfrac{1}{2\sigma^2}\left[R^2 - 2\mu rR + n\mu^2\right]\right)$,

where $v(r) \equiv (n - r^2)^{(n-3)/2}$. Now we remove the $\mu$ dependence by multiplying $p(R, r \mid \mu, \sigma)$ by the prior over $\mu$ and then integrating out $\mu$:

  $f(r, R, \mu, \sigma) \equiv \exp\!\left(-\dfrac{1}{2\sigma^2}\left[R^2 - 2\mu rR + n\mu^2\right]\right) p(\mu)$
  $\qquad \propto \exp\!\left(-\dfrac{1}{2\sigma^2}\left[R^2 - 2\mu rR + (n + 2\sigma^2\alpha)\mu^2\right]\right)$.

Let $\lambda \equiv n + 2\sigma^2\alpha$ and $m \equiv rR/\lambda$; then

  $f(r, R, \mu, \sigma) \propto \exp\!\left(-\dfrac{1}{2\sigma^2}\left[R^2(1 - r^2/\lambda) + \lambda(\mu - m)^2\right]\right)$

and

  $\int f(r, R, \mu, \sigma)\, d\mu \propto \lambda^{-1/2}\, \sigma \exp\!\left(-\dfrac{R^2}{2\sigma^2}(1 - r^2/\lambda)\right)$.

Now, we are interested in what happens as $\alpha, \beta \to 0$ simultaneously. As $\beta \to 0$ we see that the prior over $\sigma$ goes to $p(\sigma) \propto \sigma^{-\gamma-1}$, and as we approach this limit it becomes highly probable that $\sigma$ is very near zero. Recall that $\gamma > 0$ is fixed. Thus as $\alpha, \beta \to 0$ we can assume that $\alpha\sigma^2$ is very small. Then

  $\lambda^{-1} \approx n^{-1}(1 - 2n^{-1}\alpha\sigma^2)$,

hence

  $1 - r^2/\lambda \approx 1 - n^{-1}r^2 + 2n^{-2}r^2\alpha\sigma^2$,

and so

  $\dfrac{R^2}{2\sigma^2}(1 - r^2/\lambda) \approx \dfrac{R^2}{2\sigma^2}(1 - r^2/n) + \dfrac{\alpha r^2 R^2}{n^2}$.

Likewise, we have

  $\lambda^{-1/2} \approx n^{-1/2}(1 - n^{-1}\alpha\sigma^2)$.

So we have

  $\int f(r, R, \mu, \sigma)\, d\mu \propto g(\alpha, \sigma)\, h(\alpha, r, R)\, \sigma \exp\!\left(-\dfrac{R^2}{2\sigma^2}\, w(r)\right)$,

where, for $\alpha, \beta \ll 1$,

  $w(r) \equiv 1 - r^2/n$
  $g(\alpha, \sigma) \equiv 1 - n^{-1}\alpha\sigma^2$
  $h(\alpha, r, R) \equiv \exp(-n^{-2}\alpha r^2 R^2)$.

Note that $-\sqrt{n} \leq r \leq \sqrt{n}$ and so $w(r) \leq 1$. This gives us

  $p(R, r \mid \sigma) \propto R^{n-1}\, v(r)\, h(\alpha, r, R)\, g(\alpha, \sigma)\, \sigma^{-n+1} \exp\!\left(-\dfrac{R^2}{2\sigma^2}\, w(r)\right)$.
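The Jacobian computed above is easy to check numerically: with $R = [n(\bar{x}^2 + s^2)]^{1/2}$ and $r = n\bar{x}/R$, central finite differences should reproduce $\partial(R, r)/\partial(\bar{x}, s^2) = -n^2/(2R^2)$. A sketch at an arbitrary test point:

```python
import math

n = 7
xbar, s2 = 0.8, 1.3   # arbitrary test point

def R(xb, s2v):
    return math.sqrt(n * (xb * xb + s2v))

def r(xb, s2v):
    return n * xb / R(xb, s2v)

def partial(f, xb, s2v, which, eps=1e-6):
    """Central finite-difference partial derivative of f(xbar, s^2)."""
    if which == 'xbar':
        return (f(xb + eps, s2v) - f(xb - eps, s2v)) / (2 * eps)
    return (f(xb, s2v + eps) - f(xb, s2v - eps)) / (2 * eps)

# Determinant of the 2x2 matrix of partials [[R_xbar, R_s2], [r_xbar, r_s2]]
jac = (partial(R, xbar, s2, 'xbar') * partial(r, xbar, s2, 's2')
       - partial(R, xbar, s2, 's2') * partial(r, xbar, s2, 'xbar'))
exact = -0.5 * n**2 / R(xbar, s2)**2   # the analytic value -n^2/(2 R^2)
print(jac, exact)
```

The finite-difference determinant matches the analytic value, supporting the change-of-variables factor $d\bar{x}\, ds^2 = 2n^{-2}R^2\, dr\, dR$.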
Now we remove the $\sigma$ dependence by multiplying by the prior over $\sigma$ and then integrating out $\sigma$:

  $F(\alpha, r, R, \sigma) \equiv p(\sigma)\, g(\alpha, \sigma)\, \sigma^{-n+1} \exp\!\left(-\dfrac{R^2}{2\sigma^2}\, w(r)\right) = F_1(r, R, \sigma) - F_2(\alpha, r, R, \sigma)$

  $F_1(r, R, \sigma) \equiv p(\sigma)\, \sigma^{-n+1} \exp\!\left(-\dfrac{R^2}{2\sigma^2}\, w(r)\right) = Z^{-1}\, \sigma^{-n-\gamma}\, e^{-\beta/\sigma} \exp\!\left(-\dfrac{R^2}{2\sigma^2}\, w(r)\right)$

  $F_2(\alpha, r, R, \sigma) \equiv n^{-1}\alpha\sigma^2\, F_1(r, R, \sigma)$,

where $Z$ is the normalization constant for the prior over $\sigma$. We now note that

  $1 > e^{-\beta/\sigma} > e^{-\beta}\, e^{-\beta/\sigma^2}$.

(Proof: $1 < \sigma + 1/\sigma$, hence $1/\sigma < 1 + 1/\sigma^2$, hence $-\beta/\sigma > -\beta - \beta/\sigma^2$.) This means that

  $Z^{-1}\sigma^{-n-\gamma} \exp\!\left(-\dfrac{R^2}{2\sigma^2} w(r)\right) > F_1(r, R, \sigma) > e^{-\beta}\, Z^{-1}\sigma^{-n-\gamma} \exp\!\left(-\dfrac{R^2}{2\sigma^2} w(r) - \dfrac{\beta}{\sigma^2}\right)$.

Applying the change of variables $\rho = \sigma^2$, with $d\rho = 2\sigma\, d\sigma$, we find

  $(2Z)^{-1} \int_0^\infty \rho^{-(n+\gamma+1)/2} \exp\!\left(-\dfrac{R^2 w(r)}{2\rho}\right) d\rho \;>\; \int_0^\infty F_1(r, R, \sigma)\, d\sigma \;>\; (2Z)^{-1} e^{-\beta} \int_0^\infty \rho^{-(n+\gamma+1)/2} \exp\!\left(-\dfrac{R^2 w(r)/2 + \beta}{\rho}\right) d\rho$.

We've now matched the form of an inverted-gamma distribution for $\rho$, so the integrals are just the corresponding normalization constants:

  $\dfrac{\Gamma((n+\gamma-1)/2)}{2Z\, [R^2 w(r)/2]^{(n+\gamma-1)/2}} \;>\; \int_0^\infty F_1(r, R, \sigma)\, d\sigma \;>\; \dfrac{e^{-\beta}\, \Gamma((n+\gamma-1)/2)}{2Z\, [R^2 w(r)/2 + \beta]^{(n+\gamma-1)/2}}$.

Likewise, we find for $F_2$ that

  $\dfrac{\alpha\, \Gamma((n+\gamma-3)/2)}{2nZ\, [R^2 w(r)/2]^{(n+\gamma-3)/2}} \;>\; \int_0^\infty F_2(\alpha, r, R, \sigma)\, d\sigma \;>\; \dfrac{\alpha\, e^{-\beta}\, \Gamma((n+\gamma-3)/2)}{2nZ\, [R^2 w(r)/2 + \beta]^{(n+\gamma-3)/2}}$.

We see that as $\alpha, \beta \to 0$, if $w(r) \neq 0$ then

  $\int_0^\infty F(\alpha, r, R, \sigma)\, d\sigma \to C_1\, [R^2 w(r)]^{-(n+\gamma-1)/2}$

for some constant $C_1$. Then

  $p(R \mid r) \propto p(R, r) \propto v(r)\, R^{n-1}\, h(\alpha, r, R) \int_0^\infty F(\alpha, r, R, \sigma)\, d\sigma \to C_1\, v(r)\, R^{n-1}\, [R^2 w(r)]^{-(n+\gamma-1)/2} \propto R^{-\gamma}$,

which is improper.

What about the case $w(r) = 0$? This occurs when $r = \pm\sqrt{n}$, that is to say, when all the $x_i$ are identical. As previously mentioned, as $\beta \to 0$ the prior for $\sigma$ becomes concentrated near 0, and so the $x_i$ tend to group closely together in the prior predictive distribution. Thus, the prior predictive distribution for $w(r)$ tends to $\delta(w(r))$, making $w(r) = 0$ a case of interest. For $w(r) = 0$ we obtain

  $F_1(r, R, \sigma) = Z^{-1}\, \sigma^{-n-\gamma}\, e^{-\beta/\sigma}$.

Thus $F(\alpha, r, R, \sigma)$ has no dependence on $r$ or $R$, nor does its integral over $\sigma$, and so we find

  $p(R \mid r) \propto p(R, r) \propto v(r)\, R^{n-1}\, h(\alpha, r, R) \int_0^\infty F(\alpha, r, R, \sigma)\, d\sigma \propto R^{n-1}\, h(\alpha, r, R) \to R^{n-1}$,

which is also improper.

6 Conclusion

What have we learned from this? Certainly, we see that the use of improper priors can be a tricky business. However, avoiding improper priors altogether may be an extreme solution. In a sense, the usual rule for dealing with improper priors, more-or-less explicitly laid out in PTLOS, still protects us from the Marginalization Paradox. That rule has two parts:

- Make sure you carefully follow the rules of probability theory; don't skip steps that seem obvious or unimportant.
- If you ever find yourself trying to evaluate a divergent integral, stop immediately; you must abandon the improper prior and use a proper prior. Optionally, you may do the analysis for a sequence of proper priors and see if the solutions tend to a well-defined limit.

This note does not, of course, constitute a proof that the above simple rule will always protect one from inconsistencies in the use of improper priors. It seems clear that some foundational work remains to be done on the safe use of improper priors. In particular, the following questions need to be answered:

- What, exactly, do we mean when we say that an improper prior is the limit of a particular sequence of proper priors? This is a trickier question than one might think; in particular, it appears that no reasonable definition can be given without reference to some class of likelihood functions with which the prior is to be combined.
- If $\pi_i$, $1 \leq i < \infty$, are a sequence of proper priors whose limit is the improper prior $\pi$, what rules for use of improper priors will guarantee that a solution obtained using $\pi$ is the well-defined limit of the sequence of solutions obtained by use of the priors $\pi_i$?

References

[1] Dawid, A. P., Stone, M., and Zidek, J. V. (1973). Marginalization paradoxes in Bayesian and structural inference (with discussion). J. Roy. Statist. Soc. B 35, 189-233.

[2] Jaynes, E. T. (2003). Probability Theory: The Logic of Science. Cambridge University Press.
More informationOverall Objective Priors
Overall Objective Priors Jim Berger, Jose Bernardo and Dongchu Sun Duke University, University of Valencia and University of Missouri Recent advances in statistical inference: theory and case studies University
More informationDefault priors and model parametrization
1 / 16 Default priors and model parametrization Nancy Reid O-Bayes09, June 6, 2009 Don Fraser, Elisabeta Marras, Grace Yun-Yi 2 / 16 Well-calibrated priors model f (y; θ), F(y; θ); log-likelihood l(θ)
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationStrong Lens Modeling (II): Statistical Methods
Strong Lens Modeling (II): Statistical Methods Chuck Keeton Rutgers, the State University of New Jersey Probability theory multiple random variables, a and b joint distribution p(a, b) conditional distribution
More informationCHAPTER 6 SOME CONTINUOUS PROBABILITY DISTRIBUTIONS. 6.2 Normal Distribution. 6.1 Continuous Uniform Distribution
CHAPTER 6 SOME CONTINUOUS PROBABILITY DISTRIBUTIONS Recall that a continuous random variable X is a random variable that takes all values in an interval or a set of intervals. The distribution of a continuous
More informationHierarchical models. Dr. Jarad Niemi. August 31, Iowa State University. Jarad Niemi (Iowa State) Hierarchical models August 31, / 31
Hierarchical models Dr. Jarad Niemi Iowa State University August 31, 2017 Jarad Niemi (Iowa State) Hierarchical models August 31, 2017 1 / 31 Normal hierarchical model Let Y ig N(θ g, σ 2 ) for i = 1,...,
More informationGibbs Sampling in Endogenous Variables Models
Gibbs Sampling in Endogenous Variables Models Econ 690 Purdue University Outline 1 Motivation 2 Identification Issues 3 Posterior Simulation #1 4 Posterior Simulation #2 Motivation In this lecture we take
More informationIntroduction to Maximum Likelihood Estimation
Introduction to Maximum Likelihood Estimation Eric Zivot July 26, 2012 The Likelihood Function Let 1 be an iid sample with pdf ( ; ) where is a ( 1) vector of parameters that characterize ( ; ) Example:
More informationDavid Giles Bayesian Econometrics
David Giles Bayesian Econometrics 1. General Background 2. Constructing Prior Distributions 3. Properties of Bayes Estimators and Tests 4. Bayesian Analysis of the Multiple Regression Model 5. Bayesian
More informationChapter 5. Chapter 5 sections
1 / 43 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions
More informationSAMPLE CHAPTER. Avi Pfeffer. FOREWORD BY Stuart Russell MANNING
SAMPLE CHAPTER Avi Pfeffer FOREWORD BY Stuart Russell MANNING Practical Probabilistic Programming by Avi Pfeffer Chapter 9 Copyright 2016 Manning Publications brief contents PART 1 INTRODUCING PROBABILISTIC
More informationAlgebra Exam. Solutions and Grading Guide
Algebra Exam Solutions and Grading Guide You should use this grading guide to carefully grade your own exam, trying to be as objective as possible about what score the TAs would give your responses. Full
More informationSequences and infinite series
Sequences and infinite series D. DeTurck University of Pennsylvania March 29, 208 D. DeTurck Math 04 002 208A: Sequence and series / 54 Sequences The lists of numbers you generate using a numerical method
More informationHypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3
Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest
More informationThe exponential family: Conjugate priors
Chapter 9 The exponential family: Conjugate priors Within the Bayesian framework the parameter θ is treated as a random quantity. This requires us to specify a prior distribution p(θ), from which we can
More informationHarrison B. Prosper. CMS Statistics Committee
Harrison B. Prosper Florida State University CMS Statistics Committee 08-08-08 Bayesian Methods: Theory & Practice. Harrison B. Prosper 1 h Lecture 3 Applications h Hypothesis Testing Recap h A Single
More informationPhysics 403. Segev BenZvi. Choosing Priors and the Principle of Maximum Entropy. Department of Physics and Astronomy University of Rochester
Physics 403 Choosing Priors and the Principle of Maximum Entropy Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Odds Ratio Occam Factors
More informationMore Spectral Clustering and an Introduction to Conjugacy
CS8B/Stat4B: Advanced Topics in Learning & Decision Making More Spectral Clustering and an Introduction to Conjugacy Lecturer: Michael I. Jordan Scribe: Marco Barreno Monday, April 5, 004. Back to spectral
More informationBayesian Inference. STA 121: Regression Analysis Artin Armagan
Bayesian Inference STA 121: Regression Analysis Artin Armagan Bayes Rule...s! Reverend Thomas Bayes Posterior Prior p(θ y) = p(y θ)p(θ)/p(y) Likelihood - Sampling Distribution Normalizing Constant: p(y
More information1 Data Arrays and Decompositions
1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is
More informationHypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33
Hypothesis Testing Econ 690 Purdue University Justin L. Tobias (Purdue) Testing 1 / 33 Outline 1 Basic Testing Framework 2 Testing with HPD intervals 3 Example 4 Savage Dickey Density Ratio 5 Bartlett
More informationPhysics 225 Relativity and Math Applications. Fall Unit 7 The 4-vectors of Dynamics
Physics 225 Relativity and Math Applications Fall 2011 Unit 7 The 4-vectors of Dynamics N.C.R. Makins University of Illinois at Urbana-Champaign 2010 Physics 225 7.2 7.2 Physics 225 7.3 Unit 7: The 4-vectors
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 8 10/1/2008 CONTINUOUS RANDOM VARIABLES
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 8 10/1/2008 CONTINUOUS RANDOM VARIABLES Contents 1. Continuous random variables 2. Examples 3. Expected values 4. Joint distributions
More informationGAUSSIAN PROCESS REGRESSION
GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The
More informationOne sided tests. An example of a two sided alternative is what we ve been using for our two sample tests:
One sided tests So far all of our tests have been two sided. While this may be a bit easier to understand, this is often not the best way to do a hypothesis test. One simple thing that we can do to get
More informationLimiting Distributions
Limiting Distributions We introduce the mode of convergence for a sequence of random variables, and discuss the convergence in probability and in distribution. The concept of convergence leads us to the
More informationUniversity of Toronto
A Limit Result for the Prior Predictive by Michael Evans Department of Statistics University of Toronto and Gun Ho Jang Department of Statistics University of Toronto Technical Report No. 1004 April 15,
More informationLesson 28: A Focus on Square Roots
now Lesson 28: A Focus on Square Roots Student Outcomes Students solve simple radical equations and understand the possibility of extraneous solutions. They understand that care must be taken with the
More informationPart 2: One-parameter models
Part 2: One-parameter models 1 Bernoulli/binomial models Return to iid Y 1,...,Y n Bin(1, ). The sampling model/likelihood is p(y 1,...,y n ) = P y i (1 ) n P y i When combined with a prior p( ), Bayes
More informationHOW TO CREATE A PROOF. Writing proofs is typically not a straightforward, algorithmic process such as calculating
HOW TO CREATE A PROOF ALLAN YASHINSKI Abstract We discuss how to structure a proof based on the statement being proved Writing proofs is typically not a straightforward, algorithmic process such as calculating
More informationThis exam is closed book and closed notes. (You will have access to a copy of the Table of Common Distributions given in the back of the text.
TEST #3 STA 5326 December 4, 214 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. (You will have access to
More informationProf. Thistleton MAT 505 Introduction to Probability Lecture 13
Prof. Thistleton MAT 55 Introduction to Probability Lecture 3 Sections from Text and MIT Video Lecture: Sections 5.4, 5.6 http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-4- probabilisticsystems-analysis-and-applied-probability-fall-2/video-lectures/lecture-8-continuousrandomvariables/
More informationStat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.
Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 Suggested Projects: www.cs.ubc.ca/~arnaud/projects.html First assignement on the web: capture/recapture.
More informationProbability, Entropy, and Inference / More About Inference
Probability, Entropy, and Inference / More About Inference Mário S. Alvim (msalvim@dcc.ufmg.br) Information Theory DCC-UFMG (2018/02) Mário S. Alvim (msalvim@dcc.ufmg.br) Probability, Entropy, and Inference
More informationParametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a
Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a
More informationPart 4: Multi-parameter and normal models
Part 4: Multi-parameter and normal models 1 The normal model Perhaps the most useful (or utilized) probability model for data analysis is the normal distribution There are several reasons for this, e.g.,
More informationCommon ontinuous random variables
Common ontinuous random variables CE 311S Earlier, we saw a number of distribution families Binomial Negative binomial Hypergeometric Poisson These were useful because they represented common situations:
More information1.4 Mathematical Equivalence
1.4 Mathematical Equivalence Introduction a motivating example sentences that always have the same truth values can be used interchangeably the implied domain of a sentence In this section, the idea of
More information1 A simple example. A short introduction to Bayesian statistics, part I Math 217 Probability and Statistics Prof. D.
probabilities, we ll use Bayes formula. We can easily compute the reverse probabilities A short introduction to Bayesian statistics, part I Math 17 Probability and Statistics Prof. D. Joyce, Fall 014 I
More informationSecond-Order Homogeneous Linear Equations with Constant Coefficients
15 Second-Order Homogeneous Linear Equations with Constant Coefficients A very important class of second-order homogeneous linear equations consists of those with constant coefficients; that is, those
More information