Notes on Jaynes's Treatment of the Marginalization Paradox


Kevin S. Van Horn
Leuther Analytics
November 23, 2003

In Chapter 15 of Probability Theory: The Logic of Science [2] (abbreviated PTLOS), Jaynes discusses the Marginalization Paradox (MP) of Dawid, Stone, and Zidek [1] (abbreviated DSZ), going into detail for two specific examples. In the MP we have a model with four variables: ζ, η, y, and z. ζ is the parameter of interest, η is a nuisance parameter, and y and z are data. ζ and η are assigned independent priors π_1(ζ) and π_2(η), and we are given p(y, z | ζ, η). Suppose now that π_2(η) belongs to a certain class of improper priors; the MP is that, deriving p(ζ | z) in two different but apparently equally legitimate ways, we get different answers. This result has been used to argue that one should not use improper priors at all in Bayesian analyses.

Jaynes argues that the problem is not the use of an improper prior, but that the two derivations implicitly condition on different states of information, and hence we should expect to get different answers. In this note I show that Jaynes is wrong: the problem does in fact stem from the use of an improper prior. To Jaynes's credit, however, the paradox is resolved by carefully following the procedures he advocates in PTLOS.

1  The change-point problem

In PTLOS, Jaynes looks specifically at a particular parameterization of the change-point problem, wherein y is a scalar, z is a vector of length n - 1, and we have

    p(y, z | ζ, η, I_1) = c^{n-ζ} η^n y^{n-1} exp(-η y Q(ζ, z))                    (1)

for a constant c and positive function Q. Here I_1 is the state of information of a Bayesian B_1 who knows about all four variables and the above sampling distribution. Integrating out y, we find

    p(z | ζ, η, I_1) = c^{n-ζ} (n-1)! / Q(ζ, z)^n.                                 (2)
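
As a quick sanity check of the marginalization step from (1) to (2), the y-integral can be evaluated numerically. The sketch below is not from the original note; it uses arbitrary illustrative values of n, η, c, ζ, and Q(ζ, z), since the note does not fix them, and compares the numerical integral with the closed form in (2).

    # Numerical check that integrating (1) over y gives (2), for illustrative values.
    from math import factorial, exp
    from scipy.integrate import quad

    n, eta, c, zeta, Q = 5, 2.0, 1.5, 3, 4.0   # arbitrary illustrative choices

    def joint(y):
        # Right-hand side of (1): c^(n-zeta) * eta^n * y^(n-1) * exp(-eta*y*Q)
        return c**(n - zeta) * eta**n * y**(n - 1) * exp(-eta * y * Q)

    numeric, _ = quad(joint, 0.0, float("inf"))
    closed_form = c**(n - zeta) * factorial(n - 1) / Q**n      # right-hand side of (2)
    print(numeric, closed_form)    # the two values agree to quadrature accuracy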

Since the right-hand side of (2) has no dependence on η, Jaynes follows DSZ in concluding that

    p(z | ζ, I_1) = c^{n-ζ} (n-1)! / Q(ζ, z)^n.                                    (3)

Now, this is the first point where we need to exercise some caution. How do we justify the last step above? Let's carefully apply the rules of probability theory. To get p(z | ζ, I_1) from p(z | ζ, η, I_1) we must multiply by p(η | ζ, I_1) and integrate out η:

    p(z | ζ, I_1) = ∫ dη p(z, η | ζ, I_1)
                  = ∫ dη p(z | η, ζ, I_1) p(η | ζ, I_1)
                  = p(z | η, ζ, I_1) ∫ dη p(η | ζ, I_1)
                  = p(z | η, ζ, I_1)
                  = c^{n-ζ} (n-1)! / Q(ζ, z)^n,

for any 0 < η < ∞. The third equality follows because p(z | η, ζ, I_1) has no dependence on η.

What if p(η | ζ, I_1) is improper? The usual procedure would be to replace = with ∝ above, in the hope that the end result will be a proper distribution, and then normalize when we are done. The problem, however, is that by definition the integral on the third line above diverges. Thus, the above derivation of p(z | ζ) is invalid when using an improper prior over η.

In such circumstances, Jaynes tells us that we must consider the sequence of proper priors whose limit our improper prior represents, go back and do our analysis with these proper priors, and see if our result converges to some well-defined limit. If it does converge, we take this limit as our result for the improper prior. If it does not converge, then we cannot use the improper prior. In this case we get the same answer at every point in whatever sequence of proper priors we might choose, and so we have our well-defined limit. Thus we can justify (3) even for improper priors.

At this point Jaynes, again following DSZ, postulates another Bayesian B_2 who does not know of the existence of η or y; his knowledge is limited to ζ, z, the prior π_1(ζ), and the marginal sampling distribution (3). B_2 then applies Bayes' rule to derive (PTLOS 15.7)

    p(ζ | z, I_2) ∝ π_1(ζ) c^{n-ζ} / [Q(ζ, z)]^n,                                  (4)

where I_2 represents the state of information of B_2.

Suppose now that B_1 assigns an improper prior for η: π_2(η) ∝ η^{-k}, 0 < η < ∞.
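
Before proceeding, note that this prior has infinite total mass, which is exactly the kind of divergence that blocked the naive derivation of (3) above. A minimal sympy sketch (not from the note; k is set to an arbitrary illustrative value, since the note leaves it general) confirms the divergence of the normalization integral:

    # Check that the improper prior eta^(-k) cannot be normalized: its integral diverges.
    import sympy as sp

    eta, X = sp.symbols("eta X", positive=True)
    k = sp.Rational(1, 2)                           # arbitrary illustrative exponent
    tail = sp.integrate(eta**(-k), (eta, 1, X))     # = 2*sqrt(X) - 2 for this k
    print(sp.limit(tail, X, sp.oo))                 # oo: the prior has infinite mass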

Applying Bayes' Rule with p(ζ, η | I_1) = π_1(ζ) π_2(η) and the sampling distribution (1) to obtain p(ζ, η | y, z, I_1), then integrating out η, Jaynes and DSZ obtain the marginal posterior for ζ

    p(ζ | y, z, I_1) ∝ π_1(ζ) c^{n-ζ} / [Q(ζ, z)]^{n-k+1}                          (5)

and thereby conclude, since (5) has no dependence on y, that

    p(ζ | z, I_1) ∝ π_1(ζ) c^{n-ζ} / [Q(ζ, z)]^{n-k+1}.                            (6)

(4) and (6) are clearly different posterior distributions for ζ. Jaynes asserts that (6) is in fact correct for B_1, and attributes the difference with (4) to the different prior information of B_1 and B_2.

To see that Jaynes's analysis is incorrect, it suffices to note that

    p(z | ζ, I_1) = p(z | ζ, I_2)
    p(ζ | I_1) = p(ζ | I_2);

B_1 is also entitled to apply Bayes' Rule to the above, and doing so, obtains the same answer as B_2:

    p(ζ | z, I_1) ∝ π_1(ζ) c^{n-ζ} / [Q(ζ, z)]^n.                                  (7)

Since this contradicts (6), we do indeed have a paradox to resolve. We resolve the paradox as follows:

  - We show that the step from (5) to (6) is invalid: taking this step involves integrating p(y | z, I_1), which itself is improper, and hence the integral diverges. Thus we cannot obtain (6).

  - We take the conservative approach of defining the sequence of proper priors whose limit π_2(η) represents, then deriving p(ζ | z) for each of these proper priors; this gives us (7), and not (6), as the limit.
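
To make the conflict concrete, the sketch below (not from the note) evaluates the unnormalized posteriors (6) and (7) over ζ ∈ {1, ..., n} for arbitrary illustrative values of n, k, c, a uniform π_1, and made-up positive values standing in for Q(ζ, z); the two normalized distributions clearly disagree.

    # Compare the two candidate posteriors (7) (= B_2's answer (4)) and (6) on made-up inputs.
    import numpy as np

    n, k, c = 6, 2, 1.5
    zeta = np.arange(1, n + 1)
    Q = np.array([4.0, 3.5, 3.0, 3.2, 3.8, 4.5])   # hypothetical Q(zeta, z) values, one per zeta

    post7 = c**(n - zeta) / Q**n                   # unnormalized (7): exponent n
    post6 = c**(n - zeta) / Q**(n - k + 1)         # unnormalized (6): exponent n - k + 1
    post7 /= post7.sum()
    post6 /= post6.sum()
    print(np.round(post7, 3))
    print(np.round(post6, 3))                      # different distributions over zeta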

2  An overlooked impropriety

As we did with (3), let us carefully go through the steps we would have to take to derive (6) from (5):

    p(ζ | z, I_1) = ∫ dy p(ζ, y | z, I_1)
                  = ∫ dy p(ζ | y, z, I_1) p(y | z, I_1)
                  = p(ζ | y, z, I_1) ∫ dy p(y | z, I_1)
                  = p(ζ | y, z, I_1)

for any 0 < y < ∞, since p(ζ | y, z, I_1) has no dependence on y. As in the discussion of (3), the above derivation fails if p(y | z, I_1) is improper, as the integral on the second-to-last line diverges.

We now show that p(y | z, I_1) is, indeed, improper, given the improper prior over η. Applying (1) and the improper prior π_2(η) ∝ η^{-k}, we obtain

    p(y | z, I_1) ∝ p(y, z | I_1)
                  ∝ ∑_ζ ∫ dη p(y, z | ζ, η, I_1) π_1(ζ) η^{-k}
                  = y^{n-1} ∑_ζ π_1(ζ) c^{n-ζ} ∫ dη η^{n-k} exp(-η y Q(ζ, z))
                  = y^{n-1} ∑_ζ π_1(ζ) c^{n-ζ} Γ(n-k+1) / [y Q(ζ, z)]^{n-k+1}
                  ∝ y^{k-2},

which is improper. In the above we used the equality ∫_0^∞ x^{α-1} e^{-βx} dx = Γ(α)/β^α, which should be familiar as the normalization constant for the gamma distribution.

Thus we cannot obtain (6) from (5), and the paradox disappears. I believe this to be the heart of the Marginalization Paradox: the implicit use of a divergent integral when an improper prior for a parameter leads to an improper distribution for a data variable.
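
The two integrals used above are easy to confirm symbolically. A small sympy sketch (illustrative values only, not from the note) checks the gamma normalization identity and verifies that ∫_0^∞ y^{k-2} dy diverges, so p(y | z, I_1) cannot be normalized:

    # Verify the gamma-integral identity and the impropriety of y^(k-2) on (0, infinity).
    import sympy as sp

    x, y, Y = sp.symbols("x y Y", positive=True)
    alpha = sp.Rational(7, 2)                       # arbitrary illustrative alpha > 0
    beta_val = sp.Integer(3)                        # arbitrary illustrative beta > 0

    lhs = sp.integrate(x**(alpha - 1) * sp.exp(-beta_val * x), (x, 0, sp.oo))
    rhs = sp.gamma(alpha) / beta_val**alpha
    print(sp.simplify(lhs - rhs))                   # 0: the identity holds

    k = sp.Integer(2)                               # arbitrary illustrative k
    mass = sp.integrate(y**(k - 2), (y, 0, Y))      # = Y for k = 2
    print(sp.limit(mass, Y, sp.oo))                 # oo: p(y | z, I_1) is improper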

3  Take it to the limit

The divergent integral in Section 2 tells us that we need to return to basics; that is, we need to repeat the analysis with a sequence of proper priors, and see if the results tend to some well-defined limit. Following Jaynes, we look at proper priors of the form

    π_2(η) ∝ η^a exp(-bη)                                                          (8)

for b > 0 and a > -1; our previous improper prior corresponds to a = -k and b → 0. Let us write I_1(b) for the state of information corresponding to use of this prior for a specific value b. Corresponding to (5) we now have

    p(ζ | y, z, I_1(b)) ∝ π_1(ζ) c^{n-ζ} / [b + y Q(ζ, z)]^{n+a+1},                (9)

which is just the corresponding result in PTLOS. We must be careful here, though. We are going to multiply by p(y | z, I_1(b)) and then integrate out y in order to obtain p(ζ | z, I_1(b)). In using ∝ instead of = we have dropped factors that do not depend on ζ, but may depend on y; we need the y-dependent factors to get the correct result. So we shall put the normalization constant back in:

    p(ζ | y, z, I_1(b)) = [π_1(ζ) c^{n-ζ} / (b + y Q(ζ, z))^{n+a+1}] / [∑_{ζ'} π_1(ζ') c^{n-ζ'} / (b + y Q(ζ', z))^{n+a+1}].    (10)

Now let us again use (1) to derive the marginal distribution of y conditional on z, only this time for a proper prior:

    p(y | z, I_1(b)) ∝ p(y, z | I_1(b))
                     = ∑_ζ π_1(ζ) ∫ dη p(η | I_1(b)) p(y, z | ζ, η, I_1(b))
                     ∝ y^{n-1} ∑_ζ π_1(ζ) c^{n-ζ} ∫ dη η^{n+a} exp(-η (b + y Q(ζ, z)))
                     ∝ y^{n-1} ∑_ζ π_1(ζ) c^{n-ζ} / (b + y Q(ζ, z))^{n+a+1}.       (11)

The proportionality constant depends only on z and b; in particular, it doesn't depend on ζ. We now combine (11) with (10) to obtain

    p(ζ | z, I_1(b)) = ∫ dy p(ζ | y, z, I_1(b)) p(y | z, I_1(b))
                     ∝ π_1(ζ) c^{n-ζ} ∫ dy y^{n-1} / (b + y Q(ζ, z))^{n+a+1}.      (12)

To finish the derivation, we need the following facts, obtained by use of Mathematica:

    ∫_0^y s^r (b + u s)^{-m} ds = [y^{r+1} b^{-m} / (r+1)] ₂F₁(r+1, m, r+2, -u y / b)    (13)

    ₂F₁(α, β, γ, s) ≡ [Γ(γ) / (Γ(β) Γ(γ-β))] ∫_0^1 t^{β-1} (1-t)^{γ-β-1} (1 - t s)^{-α} dt.    (14)
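
Fact (13) can be spot-checked numerically without Mathematica. The sketch below (not from the note; parameter values chosen arbitrarily) compares a direct quadrature of the left-hand side of (13) with the hypergeometric expression on the right, using SciPy's hyp2f1:

    # Numerical spot-check of (13): finite integral of s^r (b + u s)^(-m) versus the 2F1 form.
    from scipy.integrate import quad
    from scipy.special import hyp2f1

    r, m, b, u, y = 3.0, 9.0, 2.0, 1.5, 1.0        # arbitrary illustrative values

    lhs, _ = quad(lambda s: s**r * (b + u * s)**(-m), 0.0, y)
    rhs = y**(r + 1) * b**(-m) / (r + 1) * hyp2f1(r + 1, m, r + 2, -u * y / b)
    print(lhs, rhs)                                # agree to quadrature accuracy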

From (14) we find

    ₂F₁(α, β, γ, 0) = [Γ(γ) / (Γ(β) Γ(γ-β))] ∫_0^1 t^{β-1} (1-t)^{γ-β-1} dt
                    = [Γ(γ) / (Γ(β) Γ(γ-β))] [Γ(β) Γ(γ-β) / Γ(γ)]
                    = 1;                                                            (15)

furthermore, as y → ∞, we have

    y^{r+1} ₂F₁(r+1, m, r+2, -u y / b)
        = κ y^{r+1} ∫_0^1 t^{m-1} (1-t)^{r-m+1} (1 + u y t / b)^{-(r+1)} dt
        = κ ∫_0^1 t^{m-1} (1-t)^{r-m+1} (y^{-1} + u t / b)^{-(r+1)} dt
        → κ (b/u)^{r+1} ∫_0^1 t^{m-r-2} (1-t)^{r-m+1} dt
        = κ (b/u)^{r+1} Γ(m-r-1) Γ(r-m+2)
        = [Γ(r+2) Γ(m-r-1) / Γ(m)] (b/u)^{r+1},                                    (16)

where κ ≡ Γ(r+2) / (Γ(m) Γ(r-m+2)). Combining (13), (15), and (16), we obtain

    ∫_0^∞ y^r (b + u y)^{-m} dy = [Γ(r+2) Γ(m-r-1) / ((r+1) Γ(m))] b^{-m} (b/u)^{r+1}.    (17)

Substituting this into (12), with r = n-1, u = Q(ζ, z), and m = n+a+1, we obtain

    p(ζ | z, I_1(b)) ∝ π_1(ζ) c^{n-ζ} / [Q(ζ, z)]^n,

which accords with (7). The above holds regardless of b, so it is also the limit as b → 0. Thus (7), and not (6), is the correct solution for B_1.
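
As a check on (17), and hence on the facts feeding it, the sketch below (not from the note; arbitrary illustrative values with m > r + 1 so the integral converges) compares a direct quadrature over (0, ∞) with the gamma-function closed form:

    # Numerical spot-check of (17): integral of y^r (b + u y)^(-m) over (0, infinity).
    from math import gamma
    from scipy.integrate import quad

    r, m, b, u = 3.0, 9.0, 2.0, 1.5                # arbitrary illustrative values, m > r + 1

    lhs, _ = quad(lambda y: y**r * (b + u * y)**(-m), 0.0, float("inf"))
    rhs = gamma(r + 2) * gamma(m - r - 1) / ((r + 1) * gamma(m)) * b**(-m) * (b / u)**(r + 1)
    print(lhs, rhs)                                # agree to quadrature accuracy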

4  What went wrong: non-uniform convergence

The reader may still be puzzled as to how p(ζ | z, I_1) could differ from p(ζ | y, z, I_1) if the latter has no dependence on y. To clarify what is happening we look again at what happens with the proper priors (8) as b → 0 and we approach the improper prior η^a. The problem is one of non-uniform convergence. Comparing (5) and (9), we have

    π_1(ζ) c^{n-ζ} / [b + y Q(ζ, z)]^{n+a+1} ≈ π_1(ζ) c^{n-ζ} y^{-(n+a+1)} / [Q(ζ, z)]^{n+a+1}

only for y ≫ b / Q(ζ, z). For any fixed value of y, p(ζ | y, z, I_1(b)) does indeed tend to (5) as b → 0; nonetheless, for all values of b, no matter how small, there remain values of y for which p(ζ | y, z, I_1(b)) differs greatly from (5).

Is this important? Since the only way to get rid of the conditioning on y is to multiply by p(y | z, I_1(b)) and then integrate out y, to obtain (6) we need to have P(y ≤ C b / Q(ζ, z) | z, I_1(b)) → 0, for any ζ, z, and constant C > 0, as b → 0.

In Figure 1 we plot p(u | z, I_1(b)), where u ≡ y Q̄ / b and Q̄ is the average of Q(ζ, z), 1 ≤ ζ ≤ n. We chose a = 0, c = 1.5, and randomly generated the vector z as follows: x_i, 1 ≤ i ≤ n/2, were drawn from an exponential distribution with mean 1; x_i, n/2 < i ≤ n, were drawn from an exponential distribution with mean 1/c; and z_i ≡ x_i / x_1.

    [Figure 1: p(u | z, I_1(b)), where u ≡ y Q̄ / b.]

The ratios Q(ζ, z)/Q̄ range from 0.819 to 1.167; thus u ≈ 1 corresponds roughly to y ≈ b / Q(ζ, z). We see then that in this case there is substantial probability mass in the region y ≲ b / Q(ζ, z) for all ζ, which is precisely the region in which (5) differs substantially from (9). We have not specified b, because p(u | z, I_1(b)) has no dependence on b! Thus as b → 0 we find that, for any ζ, P(u ≤ C Q̄ / Q(ζ, z) | z, I_1(b)) remains constant, and hence so does P(y ≤ C b / Q(ζ, z) | z, I_1(b)). To see this, note from (11) and the definition of u that

    p(u | z, I_1(b)) ∝ u^{n-1} ∑_ζ π_1(ζ) c^{n-ζ} / [b + b u Q(ζ, z) / Q̄]^{n+a+1}
                     ∝ u^{n-1} ∑_ζ π_1(ζ) c^{n-ζ} / [1 + u Q(ζ, z) / Q̄]^{n+a+1},

which has no dependence on b.
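
The sketch below (not from the note) reproduces the spirit of the Figure 1 experiment. One caveat: the note never writes Q(ζ, z) explicitly, so the form used here is an assumption consistent with the change-point model implied by (1), in which the first ζ observations have rate η and the remaining n - ζ have rate cη; n and the two values of b are also illustrative choices.

    # Sketch of the Figure-1 experiment: the shape of p(u | z, I_1(b)) does not depend on b.
    # Q(zeta, z) below is an ASSUMED form (see lead-in); the note does not define it explicitly.
    import numpy as np

    rng = np.random.default_rng(0)
    n, a, c = 100, 0, 1.5
    x = np.concatenate([rng.exponential(1.0, n // 2),          # mean 1
                        rng.exponential(1.0 / c, n - n // 2)])  # mean 1/c
    z = x / x[0]                                                # z_i = x_i / x_1

    zetas = np.arange(1, n + 1)
    Q = np.array([z[:zt].sum() + c * z[zt:].sum() for zt in zetas])   # assumed Q(zeta, z)
    Qbar = Q.mean()
    u_grid = np.linspace(0.25, 2.5, 10)

    def pdf_u(b):
        # Unnormalized p(u | z, I_1(b)) on the grid, with y = u * b / Qbar and uniform pi_1.
        vals = np.array([u**(n - 1) * np.sum(c**(n - zetas) / (b + b * u * Q / Qbar)**(n + a + 1))
                         for u in u_grid])
        return vals / vals.sum()          # crude normalization over the grid

    print(np.allclose(pdf_u(0.5), pdf_u(2.0)))   # True: b cancels out of the shape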

5  DSZ Example #5

Jaynes analyzes a second specific MP example in PTLOS; again, the apparent paradox can be traced to an overlooked improper distribution, and the paradox is resolved by noting that the integral we must take to obtain the paradox diverges.

In this case we have iid variables x_i, with a normal distribution for each x_i with mean µ and variance σ². We then define ζ = µ/σ, ν = σ, y = R, and z = r, where

    R² ≡ n (x̄² + s²)
    r ≡ n x̄ / R
    x̄ ≡ n^{-1} ∑_{i=1}^n x_i
    s² ≡ n^{-1} ∑_{i=1}^n (x_i - x̄)².

The paradox arises when we use an improper prior of the form

    p(µ, σ) ∝ σ^{-γ-1},  γ > 0,

taken as the limit of the proper prior

    p(µ, σ) ∝ σ^{-γ-1} exp(-β/σ - αµ²)

as α, β → 0. Similarly to the first example, p(ζ | r) is derived by first finding p(ζ | r, R) ∝ h(ζ, r), noting that this has no dependence on R, and concluding that p(ζ | r) ∝ h(ζ, r) also. As before, this last step is invalid, and therefore the paradox cannot be created, because p(R | r) is improper when we use the improper prior for µ and σ. We now present the proof that p(R | r) is improper.

Theorem: Let x_i ~ N(µ, σ) for 1 ≤ i ≤ n, and define S² = ∑_i (x_i - x̄)². Then x̄ and S² are independent given µ and σ; x̄ ~ N(µ, σ/√n); S²/σ² ~ χ²_{n-1}.

Proof: This is a standard result in statistics; see, for example, M. G. Bulmer, Principles of Statistics.

We have s² = S²/n; define also u = S²/σ² = n σ^{-2} s². Then du = n σ^{-2} ds², and combining this with the above theorem we have

    p(x̄ | µ, σ) ∝ σ^{-1} exp(-n (x̄ - µ)² / (2σ²))
    p(u | µ, σ) ∝ u^{(n-3)/2} e^{-u/2}
    p(s² | µ, σ) ∝ σ^{-n+1} s^{n-3} exp(-n s² / (2σ²))
    p(x̄, s² | µ, σ) ∝ σ^{-n} s^{n-3} exp(-n (s² + (x̄ - µ)²) / (2σ²)).

We will later remove the conditioning on µ and σ by multiplying by the priors on these variables and integrating them out. For this reason we explicitly show any factors dependent on µ or σ, rather than absorbing them into the proportionality constants above.
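
The theorem quoted above is easy to check by simulation. A minimal sketch (not from the note; arbitrary µ, σ, n, and a sample size chosen for speed) verifies the three claims approximately: near-zero correlation between x̄ and S², the stated mean and spread of x̄, and the χ²_{n-1} mean and variance for S²/σ²:

    # Monte Carlo sanity check of the theorem: xbar ~ N(mu, sigma/sqrt(n)),
    # S^2/sigma^2 ~ chi^2_{n-1}, and xbar independent of S^2 (checked via correlation).
    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma, n, reps = 0.7, 2.0, 8, 200_000       # arbitrary illustrative values

    x = rng.normal(mu, sigma, size=(reps, n))
    xbar = x.mean(axis=1)
    S2 = ((x - xbar[:, None])**2).sum(axis=1)

    print(np.corrcoef(xbar, S2)[0, 1])              # ~ 0 (independence)
    print(xbar.mean(), xbar.std())                  # ~ mu, ~ sigma/sqrt(n)
    print((S2 / sigma**2).mean(), (S2 / sigma**2).var())   # ~ n-1, ~ 2(n-1)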

We now want to do a change of variables from x̄ and s² to r and R. To compute the Jacobian, we first find

    ∂R/∂x̄  = (1/2) [n (x̄² + s²)]^{-1/2} (2 n x̄) = n x̄ R^{-1}
    ∂R/∂s² = (1/2) [n (x̄² + s²)]^{-1/2} n = (1/2) n R^{-1}
    ∂r/∂x̄  = (R n - n x̄ · n x̄ R^{-1}) / R² = n/R - n² x̄² R^{-3}
    ∂r/∂s² = -n x̄ R^{-2} · (1/2) n R^{-1} = -(1/2) n² x̄ R^{-3}.

Then the Jacobian is the absolute value of

    (∂R/∂x̄)(∂r/∂s²) - (∂R/∂s²)(∂r/∂x̄)
        = n x̄ R^{-1} · (-(1/2) n² x̄ R^{-3}) - (1/2) n R^{-1} (n R^{-1} - n² x̄² R^{-3})
        = -(1/2) n³ x̄² R^{-4} - (1/2) n² R^{-2} + (1/2) n³ x̄² R^{-4}
        = -(1/2) n² R^{-2},

giving dx̄ ds² = 2 n^{-2} R² dR dr. We now combine this with the facts

    x̄ = n^{-1} r R
    s = [n^{-1} R² - x̄²]^{1/2} = [n^{-1} R² - n^{-2} r² R²]^{1/2} = n^{-1} R (n - r²)^{1/2}

to obtain

    p(R, r | µ, σ) ∝ R² p(x̄, s² | µ, σ)
                   ∝ R² σ^{-n} s^{n-3} exp(-n (s² + x̄² + µ² - 2 x̄ µ) / (2σ²))
                   ∝ R² σ^{-n} s^{n-3} exp(-(R² - 2 µ r R + n µ²) / (2σ²))
                   ∝ R^{n-1} σ^{-n} v(r) exp(-(R² - 2 µ r R + n µ²) / (2σ²)),

where v(r) ≡ (n - r²)^{(n-3)/2}.
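
The Jacobian computation can be confirmed symbolically. A short sympy sketch (not from the note; it treats x̄ and s² as independent positive symbols, as in the derivation) reproduces the value n²/(2R²) in absolute value:

    # Symbolic check of the Jacobian of (R, r) with respect to (xbar, s2).
    import sympy as sp

    n, xbar, s2 = sp.symbols("n xbar s2", positive=True)
    R = sp.sqrt(n * (xbar**2 + s2))
    r = n * xbar / R

    J = sp.Matrix([[sp.diff(R, xbar), sp.diff(R, s2)],
                   [sp.diff(r, xbar), sp.diff(r, s2)]])
    det = sp.simplify(J.det())
    print(det)                                   # equals -n/(2*(xbar**2 + s2)), i.e. -n^2/(2 R^2)
    print(sp.simplify(det + n**2 / (2 * R**2)))  # 0: |det| = n^2 / (2 R^2)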

Now we remove the µ dependence by multiplying p(R, r | µ, σ) by the prior over µ and then integrating out µ:

    f(r, R, µ, σ) ≡ exp(-(R² - 2 µ r R + n µ²) / (2σ²)) p(µ)
                  ∝ exp(-(R² - 2 µ r R + (n + 2σ²α) µ²) / (2σ²)).

Let λ ≡ n + 2σ²α and m ≡ r R / λ; then

    f(r, R, µ, σ) ∝ exp(-(R² (1 - r²/λ) + λ (µ - m)²) / (2σ²))

and

    ∫ f(r, R, µ, σ) dµ ∝ λ^{-1/2} σ exp(-(R²/(2σ²)) (1 - r²/λ)).

Now, we are interested in what happens as α, β → 0 simultaneously. As β → 0 we see that the prior over σ goes to p(σ) ∝ σ^{-γ-1}, and as we approach this limit it becomes highly probable that σ is very near zero. Recall that γ > 0 is fixed. Thus as α, β → 0 we can assume that ασ² is very small. Then, for α, β ≪ 1,

    λ^{-1} ≈ n^{-1} (1 - 2 n^{-1} ασ²),

hence

    1 - r²/λ ≈ 1 - n^{-1} r² + 2 n^{-2} r² ασ²,

and so

    (R²/(2σ²)) (1 - r²/λ) ≈ (R²/(2σ²)) (1 - r²/n) + α r² R² / n².

Likewise, we have

    λ^{-1/2} ≈ n^{-1/2} (1 - n^{-1} ασ²).

So we have

    ∫ f(r, R, µ, σ) dµ ∝ σ exp(-(R²/(2σ²)) w(r)) g(α, σ) h(α, r, R),

where

    w(r) ≡ 1 - r²/n
    g(α, σ) ≡ 1 - n^{-1} ασ²
    h(α, r, R) ≡ exp(-α r² R² / n²).

Note that -√n ≤ r ≤ √n and so 0 ≤ w(r) ≤ 1. This gives us

    p(R, r | σ) ∝ R^{n-1} v(r) h(α, r, R) g(α, σ) σ^{-n+1} exp(-(R²/(2σ²)) w(r)).
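
The completing-the-square step above can be spot-checked numerically. The sketch below (not from the note; arbitrary illustrative values for R, r, n, α, σ) compares a direct quadrature over µ with the closed form, including the exact constant √(2πσ²/λ) that the derivation absorbs into the proportionality:

    # Numerical spot-check of the Gaussian integral over mu used to remove the mu dependence.
    from math import exp, sqrt, pi
    from scipy.integrate import quad

    R, r, n, alpha, sigma = 3.0, 1.2, 5.0, 0.3, 0.8   # arbitrary illustrative values
    lam = n + 2 * sigma**2 * alpha

    def integrand(mu):
        return exp(-(R**2 - 2 * mu * r * R + lam * mu**2) / (2 * sigma**2))

    lhs, _ = quad(integrand, -float("inf"), float("inf"))
    rhs = sqrt(2 * pi * sigma**2 / lam) * exp(-(R**2 / (2 * sigma**2)) * (1 - r**2 / lam))
    print(lhs, rhs)                                    # agree to quadrature accuracy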

Now we remove the σ dependence by multiplying by the prior over σ and then integrating out σ:

    F(α, r, R, σ) ≡ p(σ) g(α, σ) σ^{-n+1} exp(-(R²/(2σ²)) w(r))
                  = F_1(r, R, σ) + F_2(α, r, R, σ)
    F_1(r, R, σ) ≡ p(σ) σ^{-n+1} exp(-(R²/(2σ²)) w(r))
                 = Z^{-1} σ^{-n-γ} exp(-β/σ - (R²/(2σ²)) w(r))
    F_2(α, r, R, σ) ≡ -n^{-1} ασ² F_1(r, R, σ),

where Z is the normalization constant for the prior over σ. We now note that

    1 > exp(-β/σ) > exp(-β) exp(-β/σ²)

(Proof: 1 < σ + 1/σ, hence 1/σ < 1 + 1/σ², hence -β/σ > -β - β/σ².) This means that

    Z^{-1} σ^{-n-γ} exp(-(R²/(2σ²)) w(r)) > F_1(r, R, σ) > e^{-β} Z^{-1} σ^{-n-γ} exp(-(R²/(2σ²)) w(r) - β/σ²).

Applying the change of variables ρ = σ², with dρ = 2σ dσ, we find

    (2Z)^{-1} ∫ ρ^{-(n+γ+1)/2} exp(-(R²/(2ρ)) w(r)) dρ
        > ∫ F_1(r, R, σ) dσ
        > (2Z)^{-1} e^{-β} ∫ ρ^{-(n+γ+1)/2} exp(-(R²/(2ρ)) w(r) - β/ρ) dρ.

We've now matched the form of an inverted-gamma distribution for ρ, so the integrals are just the corresponding normalization constants:

    Γ((n+γ-1)/2) / (2Z [R² w(r)/2]^{(n+γ-1)/2})
        > ∫ F_1(r, R, σ) dσ
        > e^{-β} Γ((n+γ-1)/2) / (2Z [R² w(r)/2 + β]^{(n+γ-1)/2}).
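
The inverted-gamma normalization behind these bounds, ∫_0^∞ ρ^{-(n+γ+1)/2} exp(-A/ρ) dρ = Γ((n+γ-1)/2)/A^{(n+γ-1)/2} with A = R² w(r)/2, is easy to spot-check. The sketch below (not from the note) uses arbitrary illustrative values of n, γ, and A:

    # Numerical spot-check of the inverted-gamma integral behind the bounds on F_1.
    from math import exp, gamma
    from scipy.integrate import quad

    n, gam, A = 6.0, 1.5, 2.0                      # arbitrary illustrative values
    s = (n + gam - 1) / 2                          # shape parameter of the inverted gamma

    lhs, _ = quad(lambda rho: rho**(-(n + gam + 1) / 2) * exp(-A / rho), 0.0, float("inf"))
    rhs = gamma(s) / A**s
    print(lhs, rhs)                                # agree to quadrature accuracy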

Likewise, we find for F_2 that

    -α Γ((n+γ+1)/2) / (2nZ [R² w(r)/2]^{(n+γ+1)/2})
        < ∫ F_2(α, r, R, σ) dσ
        < -α Γ((n+γ+1)/2) / (2nZ e^β [R² w(r)/2 + β]^{(n+γ+1)/2}).

We see that as α, β → 0, if w(r) ≠ 0 then

    ∫ F(α, r, R, σ) dσ → C_1 [R² w(r)]^{-(n+γ-1)/2}

for some constant C_1. Then

    p(R | r) ∝ p(R, r)
             ∝ v(r) R^{n-1} h(α, r, R) ∫ F(α, r, R, σ) dσ
             → C_1 v(r) R^{n-1} [R² w(r)]^{-(n+γ-1)/2}
             ∝ R^{-γ},

which is improper.

What about the case w(r) = 0? This occurs when r = ±√n, that is to say, when all the x_i are identical. As previously mentioned, as β → 0 the prior for σ becomes concentrated near 0, and so the x_i tend to group closely together in the prior predictive distribution. Thus, the prior predictive distribution for w(r) tends to δ(w(r)), making w(r) = 0 a case of interest. For w(r) = 0 we obtain

    F_1(r, R, σ) = Z^{-1} σ^{-n-γ} exp(-β/σ).

Thus F(α, r, R, σ) has no dependence on r or R, nor does its integral over σ, and so we find

    p(R | r) ∝ p(R, r)
             ∝ v(r) R^{n-1} h(α, r, R) ∫ F(α, r, R, σ) dσ
             ∝ R^{n-1} h(α, r, R)
             → R^{n-1},

which is also improper.
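
The R^{-γ} tail behaviour, and hence the impropriety of p(R | r), can also be seen numerically. The sketch below (not from the note; arbitrary illustrative n, γ, w(r), and a small β) evaluates R^{n-1} ∫ F_1(r, R, σ) dσ at a few values of R and checks that it scales like R^{-γ}:

    # Numerical check that R^(n-1) * integral of F_1 over sigma scales like R^(-gamma).
    from math import exp
    from scipy.integrate import quad
    import numpy as np

    n, gam, w, beta = 6.0, 1.5, 0.4, 1e-6          # arbitrary illustrative values, beta small

    def f1_integral(R):
        # integral over sigma of sigma^(-n-gamma) * exp(-beta/sigma - R^2 * w / (2 sigma^2))
        val, _ = quad(lambda s: s**(-n - gam) * exp(-beta / s - R**2 * w / (2 * s**2)),
                      0.0, float("inf"))
        return val

    Rs = np.array([1.0, 2.0, 4.0, 8.0])
    vals = np.array([R**(n - 1) * f1_integral(R) for R in Rs])
    print(vals * Rs**gam)     # approximately constant, so p(R | r) behaves like R^(-gamma)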

6  Conclusion

What have we learned from this? Certainly, we see that the use of improper priors can be a tricky business. However, avoiding improper priors altogether may be an extreme solution. In a sense, the usual rule for dealing with improper priors, more-or-less explicitly laid out in PTLOS, still protects us from the Marginalization Paradox. That rule has two parts:

  - Make sure you carefully follow the rules of probability theory; don't skip steps that seem obvious or unimportant.

  - If you ever find yourself trying to evaluate a divergent integral, stop immediately; you must abandon the improper prior and use a proper prior. Optionally, you may do the analysis for a sequence of proper priors and see if the solutions tend to a well-defined limit.

This note does not, of course, constitute a proof that the above simple rule will always protect one from inconsistencies in the use of improper priors. It seems clear that some foundational work remains to be done on the safe use of improper priors. In particular, the following questions need to be answered:

  - What, exactly, do we mean when we say that an improper prior is the limit of a particular sequence of proper priors? This is a trickier question than one might think; in particular, it appears that no reasonable definition can be given without reference to some class of likelihood functions with which the prior is to be combined.

  - If π_i, 1 ≤ i < ∞, are a sequence of proper priors whose limit is the improper prior π, what rules for the use of improper priors will guarantee that a solution obtained using π is the well-defined limit of the sequence of solutions obtained by use of the priors π_i?

References

[1] Dawid, A. P., Stone, M., and Zidek, J. V. (1973). Marginalization paradoxes in Bayesian and structural inference (with discussion). J. Roy. Statist. Soc. B, 35.

[2] Jaynes, E. T. (2003). Probability Theory: The Logic of Science, Cambridge University Press.
