Part III. A Decision-Theoretic Approach and Bayesian testing
Chapter 10: Bayesian Inference as a Decision Problem

The decision-theoretic framework starts with the following situation: we would like to choose between various possible actions after observing data. Let Θ denote the set of all possible states of nature (values of the parameter), and let D denote the set of all possible decisions (actions). A loss function is any function

L : Θ × D → [0, ∞);

the idea is that L(θ, d) gives the cost (penalty) associated with decision d if the true state of the world is θ.

10.1 Inference as a decision problem

Denote by f(x | θ) the sampling distribution, for a sample x ∈ X. Let L(θ, δ) be our loss function. Often the decision d is to evaluate or estimate a function h(θ) as accurately as possible.

For point estimation, we want to evaluate h(θ) = θ; our set of decisions is D = Θ, the parameter space; and L(θ, d) is the loss in reporting d when θ is true.

For hypothesis testing, if we want to test H0 : θ ∈ Θ0, then our possible decisions are D = {accept H0, reject H0}.
We would like to evaluate the function

h(θ) = 1 if θ ∈ Θ0, and 0 otherwise,

as accurately as possible. Our loss function is

L(θ, accept H0) = l00 if θ ∈ Θ0, and l01 otherwise;
L(θ, reject H0) = l10 if θ ∈ Θ0, and l11 otherwise.

Note: l01 is the loss from a Type II error (accepting H0 although it is false), and l10 is the loss from a Type I error (rejecting H0 although it is true).

10.2 Decision rules and Bayes estimators

In general we would like to find a decision rule δ : X → D, a function which makes a decision based on the data. We would like to choose δ such that we incur only a small loss. In general there is no δ that uniformly minimizes L(θ, δ(x)).

In a Bayesian setting, for a prior π and data x ∈ X, the posterior expected loss of a decision d is a function of the data x, defined as

ρ(π, d | x) = ∫_Θ L(θ, d) π(θ | x) dθ.

For a prior π, the integrated risk of a decision rule δ is the real number

r(π, δ) = ∫_Θ ∫_X L(θ, δ(x)) f(x | θ) dx π(θ) dθ.

We prefer δ1 to δ2 if and only if r(π, δ1) < r(π, δ2).

Proposition: An estimator minimizing r(π, δ) can be obtained by selecting, for every x ∈ X, the value δ(x) that minimizes the posterior expected loss ρ(π, δ(x) | x).
Proof (additional material):

r(π, δ) = ∫_Θ ∫_X L(θ, δ(x)) f(x | θ) dx π(θ) dθ
= ∫_X ∫_Θ L(θ, δ(x)) π(θ | x) p(x) dθ dx
= ∫_X ρ(π, δ(x) | x) p(x) dx.

(Recall that p(x) = ∫_Θ f(x | θ) π(θ) dθ.) This proves the assertion.

A Bayes estimator associated with prior π and loss L is any estimator δ^π which minimizes r(π, δ): for every x ∈ X,

δ^π(x) = arg min_d ρ(π, d | x).

Then r(π) = r(π, δ^π) is called the Bayes risk. This is valid for proper priors, and for improper priors if r(π) < ∞. If r(π) = ∞ one can define a generalized Bayes estimator as the minimizer, for every x, of ρ(π, d | x).

Fact: for strictly convex loss functions, Bayes estimators are unique.

10.3 Some common loss functions

In principle the loss function is part of the problem specification. A very popular choice is squared error loss, L(θ, d) = (θ − d)². This loss function is convex and penalizes large deviations heavily.

Proposition: The Bayes estimator δ^π associated with prior π under squared error loss is the posterior mean,

δ^π(x) = E^π(θ | x) = ∫_Θ θ f(x | θ) π(θ) dθ / ∫_Θ f(x | θ) π(θ) dθ.

To see this, recall that for any random variable Y, E((Y − a)²) is minimized by a = E(Y).

Another common choice is absolute error loss, L(θ, d) = |θ − d|.

Proposition: The posterior median is a Bayes estimator under absolute error loss.
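As a quick numerical illustration of these two propositions, the following sketch (assuming numpy and scipy are available; the Beta(4, 3) posterior is an arbitrary illustrative choice) approximates ρ(π, d | x) over a grid of decisions d, and checks that it is minimized near the posterior mean under squared error loss and near the posterior median under absolute error loss.

```python
import numpy as np
from scipy import stats

posterior = stats.beta(4, 3)                   # stands in for pi(theta | x)
theta = posterior.rvs(size=100_000, random_state=0)

grid = np.linspace(0.01, 0.99, 99)             # candidate decisions d
sq_loss = [np.mean((theta - d) ** 2) for d in grid]    # Monte Carlo rho, squared error
abs_loss = [np.mean(np.abs(theta - d)) for d in grid]  # Monte Carlo rho, absolute error

print(grid[np.argmin(sq_loss)], posterior.mean())      # both close to 4/7 = 0.571
print(grid[np.argmin(abs_loss)], posterior.median())   # both close to 0.579
```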
10.4 Bayesian testing

Let f(x | θ) be our sampling distribution, x ∈ X, θ ∈ Θ, and suppose that we want to test H0 : θ ∈ Θ0. Then the set of possible decisions is D = {accept H0, reject H0} = {1, 0}, where 1 stands for acceptance. We use the loss function

L(θ, φ) =
  0  if θ ∈ Θ0, φ = 1;
  a0 if θ ∈ Θ0, φ = 0;
  0  if θ ∉ Θ0, φ = 0;
  a1 if θ ∉ Θ0, φ = 1.

Proposition: Under this loss function, the Bayes decision rule associated with a prior distribution π is

φ^π(x) = 1 if P^π(θ ∈ Θ0 | x) > a1/(a0 + a1), and 0 otherwise.

Note: in the special case a0 = a1, the rule states that we accept H0 if P^π(θ ∈ Θ0 | x) > 1/2.

Proof (additional material): The posterior expected loss is

ρ(π, φ | x) = a0 P^π(θ ∈ Θ0 | x) 1(φ(x) = 0) + a1 P^π(θ ∉ Θ0 | x) 1(φ(x) = 1)
= a0 P^π(θ ∈ Θ0 | x) + 1(φ(x) = 1) (a1 − (a0 + a1) P^π(θ ∈ Θ0 | x)),

and a1 − (a0 + a1) P^π(θ ∈ Θ0 | x) < 0 if and only if P^π(θ ∈ Θ0 | x) > a1/(a0 + a1).

Example: X ~ Bin(n, θ), Θ0 = [0, 1/2), π(θ) = 1. Then

P^π(θ < 1/2 | x) = ∫_0^{1/2} θ^x (1 − θ)^{n−x} dθ / ∫_0^1 θ^x (1 − θ)^{n−x} dθ
= (1/B(x + 1, n − x + 1)) ∫_0^{1/2} θ^x (1 − θ)^{n−x} dθ, where B(x + 1, n − x + 1) = x!(n − x)!/(n + 1)!,

which can be evaluated for particular n and x, and compared with the acceptance level a1/(a0 + a1).
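In this example the posterior under the uniform prior is Beta(x + 1, n − x + 1), so the posterior probability of Θ0 is just a Beta cdf. A minimal sketch (assuming scipy; the values of n, x, a0, a1 are illustrative):

```python
from scipy.stats import beta

def bayes_test(x, n, a0=1.0, a1=1.0):
    # P(theta < 1/2 | x) under the uniform prior: Beta(x+1, n-x+1) cdf at 1/2
    post_prob = beta.cdf(0.5, x + 1, n - x + 1)
    accept_H0 = post_prob > a1 / (a0 + a1)
    return post_prob, accept_H0

print(bayes_test(x=3, n=10))   # (0.8867..., True): accept H0 when a0 = a1
```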
Example: X ~ N(θ, σ²), with σ² known, and θ ~ N(µ, τ²). We already calculated that

π(θ | x) = N(µ(x), w²), where µ(x) = (σ²µ + τ²x)/(σ² + τ²) and w² = σ²τ²/(σ² + τ²).

To test H0 : θ < 0, compute

P^π(θ < 0 | x) = P^π((θ − µ(x))/w < −µ(x)/w) = Φ(−µ(x)/w).

Let z_{a0,a1} be the a1/(a0 + a1) quantile of the standard normal distribution; then we accept H0 if −µ(x)/w > z_{a0,a1}, equivalently if µ(x) < −z_{a0,a1} w, or

x < −(σ²/τ²) µ − (1 + σ²/τ²) z_{a0,a1} w.

For σ² = 1, µ = 0 and τ² → ∞, we accept H0 if x < −z_{a0,a1}.

Compare this to the frequentist test (again with σ² = 1): accept H0 if x < z_{1−α} = −z_α. This corresponds to a1/(a0 + a1) = α, that is, a0/a1 = 1/α − 1. So a0/a1 = 19 for α = 0.05, and a0/a1 = 99 for α = 0.01, for example.

Note: if the prior probability of H0 is 0, then so is the posterior probability. Testing H0 : θ = θ0 against H1 : θ > θ0 often really means testing H0 : θ ≤ θ0 against H1 : θ > θ0, which is natural to test in a Bayesian setting.

Definition: The Bayes factor for testing H0 : θ ∈ Θ0 against H1 : θ ∈ Θ1 is

B^π(x) = [P^π(θ ∈ Θ0 | x) / P^π(θ ∈ Θ1 | x)] / [P^π(θ ∈ Θ0) / P^π(θ ∈ Θ1)].

It measures the extent to which the data x change the odds of Θ0 relative to Θ1.
If B^π(x) > 1 the data add support to H0; if B^π(x) < 1 the data add support to H1; if B^π(x) = 1 the data do not help to distinguish between H0 and H1. Note that the Bayes factor still depends on the prior π.

Special case: if H0 : θ = θ0 and H1 : θ = θ1, then

B^π(x) = f(x | θ0) / f(x | θ1)

is the likelihood ratio.

More generally,

B^π(x) = [∫_{Θ0} π(θ) f(x | θ) dθ / ∫_{Θ0} π(θ) dθ] / [∫_{Θ1} π(θ) f(x | θ) dθ / ∫_{Θ1} π(θ) dθ]
= [∫_{Θ0} π(θ) f(x | θ) / P^π(θ ∈ Θ0) dθ] / [∫_{Θ1} π(θ) f(x | θ) / P^π(θ ∈ Θ1) dθ]
= p(x | θ ∈ Θ0) / p(x | θ ∈ Θ1),

the ratio of how likely the data are under H0 to how likely they are under H1. Compare this to the frequentist likelihood ratio,

Λ(x) = max_{θ ∈ Θ0} f(x | θ) / max_{θ ∈ Θ1} f(x | θ):

in Bayesian statistics we average instead of taking maxima.

Note: with φ^π from the Proposition, and ρ0 = P^π(θ ∈ Θ0), ρ1 = P^π(θ ∈ Θ1),

B^π(x) = [P^π(θ ∈ Θ0 | x) / (1 − P^π(θ ∈ Θ0 | x))] / (ρ0/ρ1),

and so

φ^π(x) = 1 if and only if B^π(x) > (a1/a0) / (ρ0/ρ1).
Also, by inverting the equality, it follows that

P^π(θ ∈ Θ0 | x) = (1 + (ρ1/ρ0) B^π(x)^{-1})^{-1}.

Example: Let X ~ Bin(n, p), H0 : p = 1/2, H1 : p ≠ 1/2. Choose as prior an atom of size ρ0 at 1/2, and otherwise uniform; then

B^π(x) = p(x | p = 1/2) / p(x | p ∈ Θ1) = [(n choose x) 2^{-n}] / [(n choose x) B(x + 1, n − x + 1)],

so that

P(p = 1/2 | x) = (1 + ((1 − ρ0)/ρ0) (x!(n − x)!/(n + 1)!) 2^n)^{-1}.

If ρ0 = 1/2, n = 5, x = 3: B^π(x) = 15/8 > 1, so the data add support to H0; the posterior probability of H0 is 15/23 > 1/2.

We could instead have chosen an alternative prior: an atom of size ρ0 at 1/2, and otherwise Beta(1/2, 1/2); this prior favours values of p near 0 and 1. For n = 10 one can then tabulate P(p = 1/2 | x) against x = 0, 1, ..., 10. [Table omitted.]

Example: Let X ~ N(θ, σ²), with σ² known, and H0 : θ = 0. We choose as prior a point mass ρ0 at θ = 0, and otherwise N(0, τ²). Then

B^π(x)^{-1} = p(x | θ ≠ 0) / p(x | θ = 0) = [(σ² + τ²)^{-1/2} exp(−x²/(2(σ² + τ²)))] / [σ^{-1} exp(−x²/(2σ²))]
and

P(θ = 0 | x) = (1 + ((1 − ρ0)/ρ0) √(σ²/(σ² + τ²)) exp(τ²x²/(2σ²(σ² + τ²))))^{-1}.

Example: ρ0 = 1/2, τ = σ; put z = x/σ, and tabulate P(θ = 0 | z) against z; do the same for τ = 10σ (a more diffuse prior). [Tables omitted.] Under the more diffuse prior a given x gives stronger support for H0 than under the tighter prior.

Note: for x fixed, ρ0 > 0 and τ² → ∞, we have P(θ = 0 | x) → 1.

For the noninformative prior π(θ) ∝ 1 we have

p(x | π) = ∫ (2πσ²)^{-1/2} e^{−(x−θ)²/(2σ²)} dθ = ∫ (2πσ²)^{-1/2} e^{−(θ−x)²/(2σ²)} dθ = 1,

and so

P(θ = 0 | x) = (1 + ((1 − ρ0)/ρ0) √(2πσ²) exp(x²/(2σ²)))^{-1},

which is not equal to 1.
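Both point-null calculations are easy to check numerically. The sketch below (assuming numpy and scipy are available) reproduces B^π(x) = 15/8 and P(p = 1/2 | x) = 15/23 in the binomial example, and then evaluates P(θ = 0 | x) in the normal example for growing τ², showing how a fixed observation lends H0 more and more posterior support as the prior on the alternative becomes more diffuse.

```python
import numpy as np
from scipy.special import comb, beta as beta_fn

# Binomial: point mass at 1/2, uniform prior on the alternative.
n, x, rho0 = 5, 3, 0.5
m0 = comb(n, x) * 0.5 ** n                    # p(x | p = 1/2)
m1 = comb(n, x) * beta_fn(x + 1, n - x + 1)   # p(x | p ~ U[0, 1])
B = m0 / m1
print(B, 1 / (1 + ((1 - rho0) / rho0) / B))   # 1.875 = 15/8, 0.652... = 15/23

# Normal: point mass at 0, N(0, tau^2) on the alternative.
def post_prob_null(x, tau2, sigma2=1.0, rho0=0.5):
    inv_B = np.sqrt(sigma2 / (sigma2 + tau2)) * \
        np.exp(tau2 * x ** 2 / (2 * sigma2 * (sigma2 + tau2)))
    return 1 / (1 + ((1 - rho0) / rho0) * inv_B)

for tau2 in [1.0, 100.0, 10_000.0]:
    print(tau2, post_prob_null(x=1.96, tau2=tau2))   # increases towards 1
```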
Lindley's paradox: Let X̄ ~ N(θ, σ²/n), H0 : θ = 0, with n fixed. If x̄/(σ/√n) is large enough to reject H0 in the classical test, then for large enough τ² the Bayes factor will be larger than 1, indicating support for H0. In contrast, if σ², τ² and x̄ are fixed, and n → ∞ in such a way that x̄/(σ/√n) = k_α stays fixed, which is just significant at level α in the classical test, then B^π(x) → ∞. Results which are just significant at some fixed level in the classical test will, for large n, actually be much more likely under H0 than under H1. A very diffuse prior proclaims great scepticism, which may overwhelm the contrary evidence of the observations.

10.5 Least favourable Bayesian answers

The choice of prior affects the result. Suppose that we want to test H0 : θ = θ0 against H1 : θ ≠ θ0, and the prior probability on H0 is ρ0 = 1/2. Which prior g on H1 would, after observing x, be least favourable to H0? Let G be a family of priors on H1; put

B(x, G) = inf_{g ∈ G} f(x | θ0) / ∫_Θ f(x | θ) g(θ) dθ

and

P(x, G) = f(x | θ0) / (f(x | θ0) + sup_{g ∈ G} ∫_Θ f(x | θ) g(θ) dθ) = (1 + 1/B(x, G))^{-1}.

A Bayesian analysis with prior g ∈ G on H1 will then assign posterior probability at least P(x, G) to H0 (for ρ0 = 1/2). If θ̂ is the m.l.e. of θ, and if G_A is the set of all prior distributions, then

B(x, G_A) = f(x | θ0) / f(x | θ̂(x)) and P(x, G_A) = (1 + f(x | θ̂(x)) / f(x | θ0))^{-1}.
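For a normal location model the infimum over G_A is attained by the prior putting all its mass at the m.l.e. θ̂ = x, so the bounds are immediate to compute. A sketch (assuming scipy; the chosen x values are the usual critical points and are illustrative only):

```python
import numpy as np
from scipy.stats import norm

def least_favourable_bounds(x, theta0=0.0):
    # X ~ N(theta, 1): the denominator of B(x, G_A) is the likelihood
    # evaluated at its maximum, theta_hat = x.
    B = norm.pdf(x, loc=theta0) / norm.pdf(x, loc=x)
    P = 1.0 / (1.0 + 1.0 / B)      # lower bound on P(H0 | x) for rho0 = 1/2
    return B, P

for x in [1.645, 1.96, 2.576]:      # two-sided p-values 0.10, 0.05, 0.01
    print(x, least_favourable_bounds(x))
```

Even these least favourable lower bounds on P(H0 | x) (for example, about 0.128 at x = 1.96) are far larger than the corresponding two-sided p-values, which is one sense in which the Bayesian answer rejects less readily.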
Other natural families are G_S, the set of distributions symmetric around θ0, and G_SU, the set of unimodal distributions symmetric around θ0.

Example: Let X ~ N(θ, 1), H0 : θ = θ0, H1 : θ ≠ θ0; then we can calculate and compare the p-value, P(x, G_A) and P(x, G_SU). [Table omitted.] The Bayesian approach will typically reject H0 less frequently than the frequentist approach.

Consider the general test problem H0 : θ = θ0 against H1 : θ = θ1, and consider repetitions in which one uses the most powerful test at level α = 0.01. In frequentist analysis, only 1% of the true H0 will be rejected. But this does not say anything about the proportion of errors made when rejecting! Example: suppose that the probability of a Type II error is 0.99, and that θ0 and θ1 occur equally often; then about half of the rejections of H0 will be in error.

10.6 Comparison with frequentist hypothesis testing

Frequentist hypothesis testing:
- treats H0 and H1 asymmetrically: fix the Type I error, then minimize the Type II error;
- UMP tests do not always exist (e.g., for general two-sided tests);
- p-values:
  - have no intrinsic optimality; the space of p-values lacks a decision-theoretic foundation,
  - are routinely misinterpreted,
  - do not take the Type II error into account;
- confidence regions: are a pre-data measure, and can often have very different post-data coverage probabilities.
Example: Let X ~ N(θ, 1/2), H0 : θ = −1, H1 : θ = 1, and suppose that x = 0. Then the UMP p-value is 0.072, but the p-value for the test of H1 against H0 takes exactly the same value.

Example: Let X1, ..., Xn be i.i.d. N(θ, σ²), with both θ and σ² unknown. Then a 100(1 − α)% confidence interval for θ is

C = (x̄ − t_{α/2} s/√n, x̄ + t_{α/2} s/√n).

For n = 2, α = 0.5, the pre-data coverage probability is 0.5. However, Brown (Ann. Math. Stat. 38, 1967) showed that

P(θ ∈ C | |x̄|/s < 1 + √2) > 2/3.

Bayesian testing compares the probability of the actual data under the two hypotheses.
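A small Monte Carlo check (assuming numpy and scipy; θ = 0 and σ = 1 without loss of generality) of Brown's conditional-coverage statement for n = 2, α = 0.5:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)
n, alpha, theta = 2, 0.5, 0.0
tq = t.ppf(1 - alpha / 2, df=n - 1)          # t quantile; here t_{0.75, 1} = 1
x = rng.normal(theta, 1.0, size=(500_000, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)                    # sample standard deviation
covered = np.abs(xbar - theta) <= tq * s / np.sqrt(n)
cond = np.abs(xbar) / s < 1 + np.sqrt(2)     # Brown's conditioning event
print(covered.mean())                        # ~0.50: pre-data coverage
print(covered[cond].mean())                  # > 2/3: post-data coverage
```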
Chapter 11: Hierarchical and Empirical Bayesian Methods

11.1 Hierarchical Bayes

A hierarchical model consists of modelling a parameter θ through randomness at different levels; for example,

θ | β ~ π1(θ | β), where β ~ π2(β),

so that

π(θ) = ∫ π1(θ | β) π2(β) dβ.

When dealing with complicated posterior distributions, rather than evaluating the integrals we might use simulation to approximate them. For simulation in hierarchical models, we first simulate β and then, given β, we simulate θ. We hope that the distribution of β is easy to simulate from, and also that the conditional distribution of θ given β is easy to simulate from. This approach is particularly useful for MCMC (Markov chain Monte Carlo) methods; see next term.

11.2 Empirical Bayes

Let x ~ f(x | θ). The empirical Bayes method chooses a convenient prior family π(θ | λ) (typically conjugate), where λ is a hyperparameter, so that

p(x | λ) = ∫ f(x | θ) π(θ | λ) dθ.
Rather than specifying λ, we estimate λ by λ̂, for example by frequentist methods based on p(x | λ), and we substitute λ̂ for λ; π(θ | x, λ̂) is called a pseudo-posterior. We plug it into Bayes' Theorem for inference. The empirical Bayes approach
* is neither fully Bayesian nor fully frequentist;
* depends on λ̂: different λ̂ will lead to different procedures;
* if λ̂ is consistent, then asymptotically it will lead to a coherent Bayesian analysis;
* often outperforms classical estimators in empirical terms.

Example (James–Stein estimators): Let Xi ~ N(θi, 1) be independent given θi, for i = 1, ..., p, where p ≥ 3. In vector notation, X ~ N(θ, Ip). Here the vector θ is random; assume that we have realizations θi, i = 1, ..., p. The obvious estimate for θi (which is the least-squares estimate) is θ̂i = xi, leading to θ̂ = X. Abbreviate X = x. Our decision rule would hence be δ(x) = x. But this is not the best rule!

Assume that θi ~ N(0, τ²); then p(x | τ²) = N(0, (1 + τ²)Ip), and the posterior for θ given the data is

θ | x ~ N((τ²/(1 + τ²)) x, (τ²/(1 + τ²)) Ip).

Under quadratic loss, the Bayes estimator δ(x) of θ is the posterior mean, (τ²/(1 + τ²)) x. In the empirical Bayes approach, we would use the m.l.e. for τ², which is

τ̂² = (‖x‖²/p − 1) 1(‖x‖² > p),
and the empirical Bayes estimator is the estimated posterior mean,

δ^{EB}(x) = (τ̂²/(1 + τ̂²)) x = (1 − p/‖x‖²)_+ x.

This is the truncated James–Stein estimator; it can be shown to outperform the estimator δ(x) = x.

Alternatively, the best unbiased estimator of 1/(1 + τ²) is (p − 2)/‖x‖², giving

δ^{EB}(x) = (1 − (p − 2)/‖x‖²) x.

This is the James–Stein estimator. It can be shown that under quadratic loss the James–Stein estimator has smaller integrated risk than δ(x) = x. Note that both estimators tend to shrink towards 0. It is now known to be a very general phenomenon that when comparing three or more populations, the sample mean is not the best estimator. Shrinkage estimators are an active area of research.

Bayesian computation of posterior probabilities can be very computer-intensive; see the MCMC and Applied Bayesian Statistics course.
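A quick simulation sketch (assuming numpy; p = 10 and the particular θ are arbitrary illustrative choices) comparing the estimated quadratic risk of δ(x) = x with the James–Stein and truncated James–Stein estimators above:

```python
import numpy as np

rng = np.random.default_rng(0)
p, reps = 10, 50_000
theta = np.full(p, 0.5)                        # true means (illustrative)
x = rng.normal(theta, 1.0, size=(reps, p))
norm2 = (x ** 2).sum(axis=1, keepdims=True)    # ||x||^2 in each replication

estimators = {
    "delta(x) = x": x,
    "James-Stein": (1 - (p - 2) / norm2) * x,
    "truncated JS": np.clip(1 - p / norm2, 0.0, None) * x,
}
for name, d in estimators.items():
    risk = ((d - theta) ** 2).sum(axis=1).mean()   # Monte Carlo quadratic risk
    print(name, risk)   # delta(x) = x has risk p = 10; both JS variants do better
```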
Chapter 12: Principles of Inference

12.1 The Likelihood Principle

The Likelihood Principle: the information brought by an observation x about θ is entirely contained in the likelihood function L(θ | x). From this it follows that if x1 and x2 are two observations with likelihoods L1(θ | x1) and L2(θ | x2), and if

L1(θ | x1) = c(x1, x2) L2(θ | x2),

then x1 and x2 must lead to identical inferences.

Example: We know that π(θ | x) ∝ f(x | θ) π(θ). If f1(x | θ) ∝ f2(x | θ) as a function of θ, then both give the same posterior, so they lead to the same Bayesian inference.

Example (binomial versus negative binomial):

(a) Let X ~ Bin(n, θ) be the number of successes in n independent trials, with p.m.f.

f1(x | θ) = (n choose x) θ^x (1 − θ)^{n−x};

then

π(θ | x) ∝ (n choose x) θ^x (1 − θ)^{n−x} π(θ) ∝ θ^x (1 − θ)^{n−x} π(θ).
(b) Let N ~ NegBin(x, θ) be the number of independent trials until x successes, with p.m.f.

f2(n | θ) = ((n − 1) choose (x − 1)) θ^x (1 − θ)^{n−x},

and again

π(θ | n) ∝ θ^x (1 − θ)^{n−x} π(θ).

Bayesian inference about θ does not depend on whether a binomial or a negative binomial sampling scheme was used.

M.l.e.'s satisfy the likelihood principle, but many frequentist procedures do not!

Example (Bin(n, θ)-sampling): We observe (x1, ..., xn) = (0, ..., 0, 1). An unbiased estimate for θ is θ̂ = 1/n. If instead we view n as Geometric(θ), then the only unbiased estimator for θ is θ̂ = 1(n = 1).

Unbiasedness typically violates the likelihood principle: it involves integrals over the sample space, so it depends on the value of f(x | θ) for values of x other than the observed value.

Example: (a) We observe a binomial random variable with n = 12: 9 heads and 3 tails. Suppose that we want to test H0 : θ = 1/2 against H1 : θ > 1/2. We can calculate that the UMP test has P(X ≥ 9) ≈ 0.073. (b) If instead we continue tossing until 3 tails are recorded, and observe that N = 12 tosses are needed, then the underlying distribution is negative binomial, and P(N ≥ 12) ≈ 0.0327.

12.2 The conditionality perspective

Example (Cox, 1958): A scientist wants to measure a physical quantity θ. Machine 1 gives measurements X1 ~ N(θ, 1), but is often busy. Machine 2 gives measurements X2 ~ N(θ, 100). The availability of machine 1 is beyond the scientist's control, and independent of the object to be measured. Assume that on any given occasion machine 1 is available with probability 1/2; if available, the scientist chooses machine 1.
A standard 95% confidence interval is about (x − 16.4, x + 16.4), because of the possibility that machine 2 was used.

Conditionality Principle: if two experiments on the parameter θ are available, and if one of these two experiments is selected with probability 1/2, then the resulting inference on θ should depend only on the selected experiment.

The conditionality principle is satisfied in Bayesian analysis. In the frequentist approach, we could condition on an ancillary statistic, but such a statistic is not always available.

A related principle is the Stopping Rule Principle (SRP). A stopping rule is a random variable that tells us when to stop the experiment; this random variable depends only on the outcomes of the first n experiments (it does not look into the future). The stopping rule principle states that if a sequence of experiments is directed by a stopping rule, then, given the resulting sample, the inference about θ should not depend on the nature of the stopping rule.

The likelihood principle implies the SRP. The SRP is satisfied in Bayesian inference, but it is not always satisfied in frequentist analysis.
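The binomial versus negative binomial example above makes this concrete: the same data (9 heads, 3 tails) yield proportional likelihoods, yet the frequentist p-value depends on the stopping rule. A minimal sketch (assuming scipy):

```python
from scipy.stats import binom

# Fixed n = 12 tosses: p-value P(X >= 9) for X ~ Bin(12, 1/2).
p_fixed_n = 1 - binom.cdf(8, 12, 0.5)

# Toss until the 3rd tail: N >= 12 means at most 2 tails in the first 11 tosses.
p_stop_at_3_tails = binom.cdf(2, 11, 0.5)

print(p_fixed_n, p_stop_at_3_tails)   # 0.0730 vs 0.0327
```

At level 0.05 the two stopping rules lead to opposite conclusions from identical data, while the Bayesian posterior is the same in both cases.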