Are Bayesian Inferences Weak for Wasserman s Example?

Size: px

Start display at page:

Download "Are Bayesian Inferences Weak for Wasserman s Example?"

Austin Brown
5 years ago
Views:

1 Are Bayesian Inferences Weak for Wasserman s Example? Longhai Li longhai@math.usask.ca Department of Mathematics and Statistics University of Saskatchewan Saskatoon, Saskatchewan, S7N 5E6 Canada Quebec City, SSC May 2010

2 Introduction In the textbook by Wasserman (2004), under a Section titled Strengths and Weaknesses of Bayesian Inference (pages ), he used a simple example to demonstrate the weaknesses of Bayesian inferences in high-dimensional and nonparametric problems. I have carried out a simple Bayesian analysis for this example, obtaining a simple Bayes estimate, and used a frequentist method mean square error, to compare with Horwitz-Thompson he suggested to use for this problem. Are Bayesian Inferences Weak for Wasserman s Example? 2/19

3 Wasserman s Example The model in the example may be appropriate for sampling survey problems. Observation Y i is modeled by a mixture distribution of a huge number, B, of Bernoulli distributions, parameterized by θ b, for b = 1,...,B: X i Uniform(1,...,B), (1) Y i X i Bernoulli(θ Xi ). (2) In practice it is possible that some of these Y i are unobserved, for example when those sampled individuals refuse to respond. Let ξ b denote the probability that the individual indexed by b will respond to the question. Then, given X i, the distribution of R i is R i X i Bernoulli(ξ Xi ). (3) In addition, we assume that R i and Y i are independently distributed given X i. Interested Parameter: ϕ = 1 B B b=1 θ b = P(Y i = 1). (4) Are Bayesian Inferences Weak for Wasserman s Example? 3/19

4 Likelihood Function of the Example The likelihood function of θ and ξ based on the data D is the product of the joint distributions of either (X i,r i = 1,Y i ) or (X i,r i = 0), for i = 1,...,n: L(θ,ξ; D) = 1 B n n i=1 ξ R i X i (1 ξ Xi ) 1 R i {i: R i =1} θ Y i X i (1 θ Xi ) 1 Y i (5) Wasserman s argument based on this likelihood function is rephrased as follows: The above likelihood function is relevant to at most n different θ b. Therefore, when B is greatly larger than n, the likelihood function contains information of only a tiny fraction of θ. The posterior distribution of θ is almost equal to the whatever prior distribution, therefore cannot lead to a good inference for the interested parameter, ϕ. Are Bayesian Inferences Weak for Wasserman s Example? 4/19

5 Horwitz-Thompson Estimator Wasswerman suggested the following estimator for ϕ: ˆϕ HT = 1 n R i Y i, (6) n ξ Xi where, when R i = 0, Y i is imaginary and can be assigned arbitrarily. This estimator treats both observed Y i = 0, and sets missing Y i to 0, and count 1/ξ Xi times each observed Y i = 1. Note that he assumes that the parameter ξ is known. i=1 One can easily show that this estimate has mean ϕ, by iterative expectation formula. This estimator is also consistent since its MSE converges to 0 as n : MSE(ˆϕ HT ) = 1 ( ) n Var Ri Y i (7) ξ Xi = 1 ( ( ) Ri Y i E ) ϕ 2 n ξx 2 (8) i ( ) = 1 1 B θ b ϕ 2. (9) n B ξ b b=1 Are Bayesian Inferences Weak for Wasserman s Example? 5/19

6 A Bayesian Analysis Are Bayesian Inferences Weak for Wasserman s Example? 6/19

7 Prior Specification I will assume that θ and ξ are independent given some hyperparameters. The prior for θ: θ 1,...,θ B α T,f IID Beta(θ α T f, α T (1 f)), (10) f,α T Beta(f α F,α F ) π T (α T ). (11) Some properties of the above prior: E(θ b ) = f (12) Var(θ b ) = f(1 f) α T +1 (13) θ b f,α T d Bernoulli(f), as α T (14) The prior for ξ is denoted by π ξ (ξ). We don tneed to specifyit as it will be unrelated to the posterior of θ once we assume that θ and ξ are independent. Similar for π T (α T ). Are Bayesian Inferences Weak for Wasserman s Example? 7/19

8 Posterior Analysis When B is very large, ϕ is very close to f from LLN. I will turn to find the posterior distribution of f given D. I will start with joint distribution of data and parameters: P(D,θ,ξ,f,α T α F ) θ Y i X i (1 θ Xi ) 1 Y i {i R i =1} B [ Beta(θb α T f, α T (1 f)) ] Beta(f α F,α F )π T (α T ) b=1 n i=1 θ R i X i (1 θ Xi ) 1 R i π ξ (ξ) Let I X = {X 1,...,X n }. Since B is greatly larger than n, I will assume that all X i s are distinct for finding an approximate estimate of the posterior of f. Are Bayesian Inferences Weak for Wasserman s Example? 8/19

9 Posterior Analysis (Cont d) I will next integrate θ away from the above joint distribution: (1) Those θ b for b I X can be integrated away from their prior, leading to 1. (2) Those θ b for b I X can be integrated away too, leading to a Bernoulli distribution for Y i : 1 0 θ Y i(b) b (1 θ b ) 1 Y i(b) Beta(θ b α T f,α T (1 f))dθ b = f Yi(b) (1 f) 1 Yi(b). We can also integrate ξ away, leading to an expression unrelated to f. After integrating away ξ and θ, we obtain the joint distribution of data and f: P(D,f α F ) = c {i:r i =1} f Y i (1 f) 1 Y i Beta(f α F,α F ), (15) Are Bayesian Inferences Weak for Wasserman s Example? 9/19

10 Bayes Estimator The posterior distribution of f is therefore a Beta distribution: where P(f D,α F ) = Beta(f n 1 +α F,n 0 +α F ), (16) n 1 = n 0 = n Y i = R i Y i, (17) {i:r i =1} i=1 (1 Y i ) = {i:r i =1} i=1 n R i n 1. (18) The best guess for f that minimizes the expected square loss is the mean of the posterior distribution of f, which leads to the Bayes estimator for f: ˆϕ BS = n 1 +α F n 0 +n 1 +2α F. (19) I will compare ˆϕ HT and ˆϕ BS with criterion of mean square error. Are Bayesian Inferences Weak for Wasserman s Example? 10/19

11 End of Bayesian Analysis Are Bayesian Inferences Weak for Wasserman s Example? 11/19

12 Comparison of MSEs (1) In the first comparison, I set ξ all equal to δ. For each δ, a set of θ are drawn from a transformed normal random numbers: θ b Φ(Z b ), Z b N(0,0.5 2 ),for b = 1,...,B (20) Are Bayesian Inferences Weak for Wasserman s Example? 12/19

13 Comparison of MSEs (1) δ = 0.02 δ = 0.09 δ = H T Bayes δ = 0.23 δ = 0.29 δ = Mean Square Error δ = δ = δ = 0.57 δ = 0.64 δ = 0.71 δ = δ = 0.84 δ = 0.91 δ = ϕ Are Bayesian Inferences Weak for Wasserman s Example? 13/19

14 Comparison of MSEs (2) In the second comparison, I generated pairs of θ and ξ from transformed multivariate normal numbers with different correlations. The MSE with similar ξ and similar correlations (generated from the same parameters) are plotted in a graph against the true values of ϕ: Are Bayesian Inferences Weak for Wasserman s Example? 14/19

15 Comparison of MSEs (2) λ = 0.04, ρ = 0.69 λ = 0.04, ρ = 0 λ = 0.04, ρ = H T Bayes λ = 0.14, ρ = 0.76 λ = 0.14, ρ = 0 λ = 0.14, ρ = Mean Square Error λ = 0.5, ρ = λ = 0.5, ρ = λ = 0.5, ρ = 0.8 λ = 0.86, ρ = 0.76 λ = 0.86, ρ = 0 λ = 0.86, ρ = λ = 0.96, ρ = 0.69 λ = 0.96, ρ = 0 λ = 0.96, ρ = ϕ Are Bayesian Inferences Weak for Wasserman s Example? 15/19

16 Histograms of the Simulated Estimators I generated three pairs of nearly independent θ and ξ with high ϕ values. With each pair, I simulated 5000 values of two estimators, and drew their histograms: Are Bayesian Inferences Weak for Wasserman s Example? 16/19

17 Histograms of the Simulated Estimators λ = 0.09, ϕ = 0.91, ρ = 0.01 Histogram of 5000 Horwitz Thompson samples Histogram of 5000 Bayes samples Density Density ϕ^ λ = 0.5, ϕ = 0.91, ρ = 0 ϕ^ Histogram of 5000 Horwitz Thompson samples Histogram of 5000 Bayes samples Density Density ϕ^ λ = 0.91, ϕ = 0.91, ρ = 0 ϕ^ Histogram of 5000 Horwitz Thompson samples Histogram of 5000 Bayes samples Density Density ϕ^ ϕ^ Are Bayesian Inferences Weak for Wasserman s Example? 17/19

18 Conclusion From my comparisons, the simple Bayes estimator isn t weak for Wasserman s example. Indeed, it is stronger than Horwitz-Thompson estimator for most parameter configurations. Indeed, Bayesian inferences have been applied successfully to many high-dimensional and nonparametric problems. Appropriate Bayesian inferences can avoid overfitting problem of MLE, easily model data with complex structure, naturally use prior knowledge to improve inference, and automatically consider uncertainty in inference. They therefore show superiority in many hard problems. Are Bayesian Inferences Weak for Wasserman s Example? 18/19

19 Thank You! Questions and Comments? Are Bayesian Inferences Weak for Wasserman s Example? 19/19

An Introduction to Bayesian Linear Regression

An Introduction to Bayesian Linear Regression APPM 5720: Bayesian Computation Fall 2018 A SIMPLE LINEAR MODEL Suppose that we observe explanatory variables x 1, x 2,..., x n and dependent variables y 1,