The beta density, Bayes, Laplace, and Pólya

Saad Mneimneh

1 The beta density as a conjugate form

Suppose that k is a binomial random variable with index n and parameter p, i.e.

P(k | p) = \binom{n}{k} p^k (1-p)^{n-k}

Applying Bayes's rule, we have:

f(p | k) ∝ p^k (1-p)^{n-k} f(p)

Therefore, a prior of the form

f(p) ∝ p^{α-1} (1-p)^{β-1}

is a conjugate prior, since the posterior will have the form:

f(p | k) ∝ p^{k+α-1} (1-p)^{n-k+β-1}

It is not hard to show that

\int_0^1 p^{α-1} (1-p)^{β-1} dp = \frac{Γ(α) Γ(β)}{Γ(α+β)}

Let's denote the above by B(α, β). Therefore, f(p) = Be(α, β), where Be(α, β) is called the beta density with parameters α > 0 and β > 0, and is given by:

f(p) = \frac{1}{B(α, β)} p^{α-1} (1-p)^{β-1}

Note that the beta density can also be viewed as the posterior for p after observing α-1 successes and β-1 failures, given a uniform prior on p (here both α and β are integers):

f(p | α-1 successes, β-1 failures) ∝ p^{α-1} (1-p)^{β-1}
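As a quick numerical sanity check of the conjugate update above, here is a minimal sketch (not part of the original notes; the numbers n, k, α, β are arbitrary, and it assumes NumPy and SciPy are available). It normalizes likelihood × prior on a grid and compares the result with the closed-form Beta(k+α, n-k+β) posterior.

```python
import numpy as np
from scipy import stats

# Hypothetical numbers chosen only for illustration.
n, k = 10, 7              # trials and observed successes
alpha, beta = 2.0, 3.0    # Beta prior parameters

p = np.linspace(1e-6, 1 - 1e-6, 2001)

# Unnormalized posterior: binomial likelihood times Beta(alpha, beta) prior.
unnorm = stats.binom.pmf(k, n, p) * stats.beta.pdf(p, alpha, beta)
posterior_grid = unnorm / np.trapz(unnorm, p)

# Conjugacy says the posterior is Beta(k + alpha, n - k + beta).
posterior_closed = stats.beta.pdf(p, k + alpha, n - k + beta)

print("max abs difference:", np.max(np.abs(posterior_grid - posterior_closed)))
```

The printed difference is essentially grid-integration error, which is the point: the posterior keeps the beta form with updated parameters.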

Example: Consider an urn containing red and black balls. The probability of a red ball is p, but p is unknown. The prior on p is uniform between 0 and 1 (no specific knowledge). We repeatedly draw balls with replacement. What is the posterior density for p after observing α-1 red balls and β-1 black balls?

f(p | α-1 red, β-1 black) ∝ \binom{α+β-2}{α-1} p^{α-1} (1-p)^{β-1}

Therefore, f(p) = Be(α, β). Note that both α and β need to be equal to at least 1. For instance, after drawing one red ball only (α = 2, β = 1), the posterior will be f(p) = 2p. Here's a table listing some possible observations:

observation        posterior
α = 1, β = 1       f(p) = 1
α = 2, β = 1       f(p) = 2p
α = 2, β = 2       f(p) = 6p(1-p)
α = 3, β = 1       f(p) = 3p^2
α = 3, β = 2       f(p) = 12p^2(1-p)
α = 3, β = 3       f(p) = 30p^2(1-p)^2
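Each entry in the table is just the normalized beta density (1/B(α, β)) p^{α-1} (1-p)^{β-1} with B(α, β) = Γ(α)Γ(β)/Γ(α+β). The following small sketch (illustrative only, not from the notes; it assumes SciPy) regenerates the constants 1, 2, 6, 3, 12, 30 appearing in the table.

```python
from scipy.special import gamma

def beta_normalizer(a, b):
    """Return 1 / B(a, b), the constant multiplying p^(a-1) (1-p)^(b-1)."""
    return gamma(a + b) / (gamma(a) * gamma(b))

# (alpha, beta) pairs from the table above.
for a, b in [(1, 1), (2, 1), (2, 2), (3, 1), (3, 2), (3, 3)]:
    c = beta_normalizer(a, b)
    print(f"alpha={a}, beta={b}: f(p) = {c:g} * p^{a - 1} * (1-p)^{b - 1}")
```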

2 Laplace's rule of succession

In 1774, Laplace claimed that an event which has occurred n times, and has not failed thus far, will occur again with probability (n+1)/(n+2). This is known as Laplace's rule of succession. Laplace applied this result to the sunrise problem: What is the probability that the sun will rise tomorrow?

Let X_1, X_2, ... be a sequence of independent Bernoulli trials with parameter p. Note that this notion of independence is conditional on p. More precisely:

P(X_1 = b_1, X_2 = b_2, ..., X_n = b_n | p) = \prod_{i=1}^{n} P(X_i = b_i | p)

In fact, X_i and X_j are not independent, because by observing X_i one could say something about p, and hence about X_j. This is a consequence of the Bayesian approach, which treats p itself as a random variable (unknown).

Let S_n = \sum_{i=1}^{n} X_i. We would like to find the following probability:

P(X_{n+1} = 1 | S_n = k)

Observe that:

P(X_{n+1} = 1 | S_n = k) = \int_0^1 P(X_{n+1} = 1 | p, S_n = k) f(p | S_n = k) dp
                         = \int_0^1 P(X_{n+1} = 1 | p) f(p | S_n = k) dp
                         = \int_0^1 p f(p | S_n = k) dp

Therefore, we need to find the posterior density of p. Assuming we know nothing about p initially, we will adopt the uniform prior f(p) = 1 between 0 and 1. Applying Bayes' rule:

f(p | S_n = k) ∝ P(S_n = k | p) f(p) ∝ p^k (1-p)^{n-k}

We conclude that:

f(p | S_n = k) = \frac{1}{B(k+1, n-k+1)} p^{(k+1)-1} (1-p)^{(n-k+1)-1}

Finally,

P(X_{n+1} = 1 | S_n = k) = \int_0^1 p f(p | S_n = k) dp = \frac{k+1}{n+2}

We obtain Laplace's result by setting k = n.
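As an illustrative cross-check (not from the notes; the values of n and k are arbitrary, and it assumes NumPy/SciPy), the posterior-mean integral \int_0^1 p f(p | S_n = k) dp can be evaluated numerically and compared with (k+1)/(n+2).

```python
import numpy as np
from scipy import stats

n, k = 20, 20   # hypothetical: n trials, all successes (Laplace's case k = n)

p = np.linspace(0.0, 1.0, 200001)
# Posterior under a uniform prior is Beta(k+1, n-k+1).
posterior = stats.beta.pdf(p, k + 1, n - k + 1)

numeric = np.trapz(p * posterior, p)   # E[p | S_n = k] by quadrature
closed = (k + 1) / (n + 2)

print(f"numeric = {numeric:.6f}, (k+1)/(n+2) = {closed:.6f}")
```

With k = n = 20 both values are 21/22 ≈ 0.9545, matching Laplace's rule.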

3 Generalization

Consider a coin toss that can result in head, tail, or edge. We denote by p the probability of head, and by q the probability of tail; thus the probability of edge is 1-p-q. Observe that p, q ∈ [0, 1] and p + q ≤ 1. In n coin tosses, the probability of observing k_1 heads and k_2 tails (and thus n - k_1 - k_2 edges) is given by the multinomial probability mass function (this generalizes the binomial):

P(k_1, k_2) = \binom{n}{k_1} \binom{n-k_1}{k_2} p^{k_1} q^{k_2} (1-p-q)^{n-k_1-k_2}

The Dirichlet density is a generalization of beta and is conjugate to the multinomial. For instance:

f(p, q) = \frac{Γ(α+β+γ)}{Γ(α) Γ(β) Γ(γ)} p^{α-1} q^{β-1} (1-p-q)^{γ-1}

4 Pólya's urn

Pólya's urn represents a generalization of a Binomial random variable. Consider the following scheme: An urn contains b black and r red balls. The ball drawn is always replaced, and, in addition, c balls of the color drawn are added to the urn. When c = 0, n drawings are equivalent to n independent Bernoulli processes with p = b/(b+r). However, with c ≠ 0, the Bernoulli processes are dependent, each with a parameter that depends on the sequence of previous drawings. For instance, if the first ball is black, the (conditional) probability of a black ball at the second drawing is (b+c)/(b+c+r). The probability of the sequence black, black is, therefore,

\frac{b}{b+r} \cdot \frac{b+c}{b+c+r}

Let X_n be a random variable denoting the number of black balls drawn in n trials. What is P(X_n = k)? It is easy to show that all sequences with k black balls have the same probability p_k and, therefore,

P(X_n = k) = \binom{n}{k} p_k

We now compute p_k as:

p_k = \frac{\prod_{i=1}^{k} [b + (i-1)c] \cdot \prod_{i=1}^{n-k} [r + (i-1)c]}{\prod_{i=1}^{n} [b + r + (i-1)c]}

Rewriting in terms of the Gamma function (assuming c > 0), we have:

p_k = \frac{\prod_{i=1}^{k} [b/c + i - 1] \cdot \prod_{i=1}^{n-k} [r/c + i - 1]}{\prod_{i=1}^{n} [(b+r)/c + i - 1]}
    = \frac{Γ(b/c + k) Γ(r/c + n - k)}{Γ(b/c) Γ(r/c)} \cdot \frac{Γ((b+r)/c)}{Γ((b+r)/c + n)}
    = \frac{Γ(b/c + k) Γ(r/c + n - k)}{Γ(b/c + r/c + n)} \cdot \frac{Γ(b/c + r/c)}{Γ(b/c) Γ(r/c)}
    = \frac{B(b/c + k, r/c + n - k)}{B(b/c, r/c)}

Therefore, the important parameters are b/c and r/c. Note that we can rewrite the above as (verify it):

p_k = \int_0^1 p^k (1-p)^{n-k} \, Be(b/c, r/c) \, dp

where Be(b/c, r/c) denotes the beta density in p. So,

P(X_n = k) = \binom{n}{k} \int_0^1 p^k (1-p)^{n-k} \, Be(b/c, r/c) \, dp
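To make the P(X_n = k) formula concrete, here is a small simulation sketch (illustrative only; the urn parameters b, r, c, n are arbitrary choices, and it assumes NumPy/SciPy). It compares the empirical distribution of the number of black draws with the beta-binomial expression \binom{n}{k} B(b/c + k, r/c + n - k)/B(b/c, r/c).

```python
import numpy as np
from scipy.special import comb, betaln

rng = np.random.default_rng(0)

b, r, c, n = 2, 3, 1, 10   # hypothetical urn: 2 black, 3 red, add c = 1 ball per draw
trials = 200_000

def polya_black_count(b, r, c, n, rng):
    """Simulate one run of Pólya's urn; return the number of black balls drawn in n draws."""
    black, red, count = b, r, 0
    for _ in range(n):
        if rng.random() < black / (black + red):
            black += c
            count += 1
        else:
            red += c
    return count

counts = np.bincount([polya_black_count(b, r, c, n, rng) for _ in range(trials)],
                     minlength=n + 1)
empirical = counts / trials

k = np.arange(n + 1)
exact = comb(n, k) * np.exp(betaln(b / c + k, r / c + n - k) - betaln(b / c, r / c))

for ki in k:
    print(f"k={ki:2d}  empirical={empirical[ki]:.4f}  formula={exact[ki]:.4f}")
```

The two columns agree up to Monte Carlo noise, which is the content of the identity P(X_n = k) = \binom{n}{k} p_k above.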

5 Pólya's urn generates beta

We now show that Pólya's urn generates a beta distribution in the limit. For this, we will consider lim_{n→∞} X_n/n. First note that we can write P(X_n = k) as follows:

P(X_n = k) = \frac{Γ(b/c + r/c)}{Γ(b/c) Γ(r/c)} \cdot \frac{Γ(k + b/c)}{Γ(k + 1)} \cdot \frac{Γ(n - k + r/c)}{Γ(n - k + 1)} \cdot \frac{Γ(n + 1)}{Γ(n + b/c + r/c)}

Using Stirling's approximation Γ(x+1) ≈ \sqrt{2πx} (x/e)^x as x goes to infinity, we can conclude that when x goes to infinity,

\frac{Γ(x + a)}{Γ(x + b)} ≈ x^{a-b}

Therefore, when n → ∞ (with k = xn for some 0 < x < 1),

P(X_n = k) ≈ \frac{1}{B(b/c, r/c)} k^{b/c - 1} (n - k)^{r/c - 1} n^{1 - b/c - r/c}

Now,

P(X_n/n ≤ x) = P(X_n = 0) + P(X_n = 1) + ... + P(X_n = ⌊xn⌋) ≈ \int_0^x n P(X_n = un) du

since the sum is a Riemann sum with spacing 1/n. As n goes to infinity, 1/n goes to zero; therefore:

P(lim_{n→∞} X_n/n ≤ x) = lim_{n→∞} [P(X_n = 0) + P(X_n = 1) + ... + P(X_n = ⌊xn⌋)] = \int_0^x lim_{n→∞} n P(X_n = un) du

And since k = un, we can replace k by un in the limiting expression we obtained for P(X_n = k) to get:

P(lim_{n→∞} X_n/n ≤ x) = \int_0^x \frac{1}{B(b/c, r/c)} u^{b/c - 1} (1 - u)^{r/c - 1} du

It is rather interesting that this limiting property of Pólya's urn depends on the initial condition. Even more interesting is that if Y = lim_{n→∞} X_n/n, then conditioned on Y = p we have independent Bernoulli trials with parameter p (stated without proof):

P(X_n = k | Y = p) = \binom{n}{k} p^k (1-p)^{n-k}
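The limiting statement can be illustrated empirically. The sketch below (not part of the notes; the urn parameters and run counts are arbitrary, and it assumes NumPy/SciPy) simulates X_n/n for a fairly large n and compares its empirical CDF with the Beta(b/c, r/c) CDF at a few points.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

b, r, c, n = 2, 3, 1, 1000   # hypothetical initial urn and a large number of draws
runs = 2000

def polya_fraction(b, r, c, n, rng):
    """Fraction of black balls drawn in n Pólya-urn draws."""
    black, red, count = b, r, 0
    for _ in range(n):
        if rng.random() < black / (black + red):
            black += c
            count += 1
        else:
            red += c
    return count / n

fractions = np.array([polya_fraction(b, r, c, n, rng) for _ in range(runs)])

# Compare the empirical CDF of X_n / n with the Beta(b/c, r/c) CDF.
for x in (0.2, 0.4, 0.6, 0.8):
    empirical = np.mean(fractions <= x)
    limit = stats.beta.cdf(x, b / c, r / c)
    print(f"x={x:.1f}  empirical={empirical:.3f}  Beta(b/c, r/c) cdf={limit:.3f}")
```

Changing the initial counts b and r changes the limiting beta, which is exactly the dependence on the initial condition noted above.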