More on Bayes and conjugate forms

Saad Mneimneh

1 A cool function, Γ(x) (Gamma)

The Gamma function is defined as follows:

$$\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt$$

For $x > 1$, if we integrate by parts ($\int u\,dv = uv - \int v\,du$), we have:

$$\Gamma(x) = -t^{x-1}e^{-t}\Big|_0^\infty + (x-1)\int_0^\infty t^{x-2} e^{-t}\,dt = (x-1)\int_0^\infty t^{x-2} e^{-t}\,dt = (x-1)\Gamma(x-1)$$

Note also that $\Gamma(1) = \int_0^\infty e^{-t}\,dt = 1$. We conclude that if $x$ is a positive integer, $\Gamma(x) = (x-1)!$. Therefore, the Gamma function represents a generalization of the factorial function, which is defined only on the non-negative integers.

Given any $a > 0$, the product of the following $k$ terms can now be expressed as:

$$a(a+1)(a+2)\cdots(a+k-1) = \frac{\Gamma(a+k)}{\Gamma(a)}$$

Perhaps the most famous value of the Gamma function at a non-integer is $\Gamma(1/2) = \sqrt{\pi}$. The Gamma function is a component of various probability density functions, as we will see later on.
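These identities are easy to confirm numerically. The short Python sketch below uses the standard library's math.gamma; the particular values of a and k are arbitrary illustrative choices, not anything prescribed above.

```python
import math

# Gamma generalizes the factorial: Gamma(n) = (n-1)! for positive integers n.
for n in range(1, 8):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# The most famous non-integer value: Gamma(1/2) = sqrt(pi).
assert math.isclose(math.gamma(0.5), math.sqrt(math.pi))

# Product formula: a(a+1)...(a+k-1) = Gamma(a+k)/Gamma(a) for any a > 0.
a, k = 2.7, 5          # arbitrary illustrative choices
prod = 1.0
for i in range(k):
    prod *= a + i
assert math.isclose(prod, math.gamma(a + k) / math.gamma(a))

print("All Gamma identities verified numerically.")
```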

2 Chi-squared density

Let $X_1,\ldots,X_n$ be independent normal variables such that $X_i \sim N(0,1)$, and consider the sum $V = X_1^2 + \cdots + X_n^2$. What is the probability density of $V$? For simplicity, let us first stop the sum at $X_1^2$, i.e. find the probability density of $X^2$ when $X \sim N(0,1)$. For small $\delta > 0$,

$$P(-\sqrt{y}-\delta \le X \le -\sqrt{y}) + P(\sqrt{y} \le X \le \sqrt{y}+\delta) = P\big(y \le X^2 \le (\sqrt{y}+\delta)^2\big)$$

$$f_X(-\sqrt{y})\,\delta + f_X(\sqrt{y})\,\delta \approx f_{X^2}(y)\,\big(\delta^2 + 2\sqrt{y}\,\delta\big)$$

Taking the limit as $\delta \to 0$, we have:

$$f_{X^2}(y) = \frac{\varphi(\sqrt{y}) + \varphi(-\sqrt{y})}{2\sqrt{y}} = \frac{1}{2\sqrt{\pi}}\Big(\frac{y}{2}\Big)^{-1/2} e^{-y/2}$$

The expression above is arranged to reveal a form similar to the integrand of the Gamma function. Therefore, using the change of variable $t = y/2$:

$$\int_0^\infty \frac{1}{2\sqrt{\pi}}\Big(\frac{y}{2}\Big)^{-1/2} e^{-y/2}\,dy = \frac{1}{\sqrt{\pi}}\int_0^\infty t^{-1/2} e^{-t}\,dt = \frac{\Gamma(1/2)}{\sqrt{\pi}} = 1$$

This shows that the density integrates to 1.

Another way of obtaining the above density is to use an appropriate change of variable. To illustrate this general technique, assume that $f_X(x)$ is given, and that we are interested in finding $f_Y(y)$ where $y = g(x)$. Observe that $dy = g'(x)\,dx$ and, therefore, $dx = dy/g'(x)$. Hence,

$$\int f_X(x)\,dx = \int \frac{f_X(x)}{|g'(x)|}\,dy = 1$$

with appropriate adjustment of the range of integration. If $x = g^{-1}(y)$ is uniquely obtained from $y$, then we can write the above as follows:

$$\int \frac{f_X(g^{-1}(y))}{|g'(g^{-1}(y))|}\,dy = 1$$

implying that

$$f_Y(y) = \frac{f_X(g^{-1}(y))}{|g'(g^{-1}(y))|}$$

If, however, $x$ is not uniquely obtained from $y$, then we add up the contributions of all solutions. Let us apply this to our example where $f_X(x) = \varphi(x)$ and $y = g(x) = x^2$. We have $g'(x) = 2x$ and $x = \pm\sqrt{y}$, so $|g'(x)| = 2\sqrt{y}$ and

$$f_Y(y) = \frac{f_X(\sqrt{y}) + f_X(-\sqrt{y})}{2\sqrt{y}}$$

Since $f_X(x) = \varphi(x)$ is symmetric, we get

$$f_Y(y) = \frac{\varphi(\sqrt{y})}{\sqrt{y}} = \frac{1}{2\sqrt{\pi}}\Big(\frac{y}{2}\Big)^{-1/2} e^{-y/2}$$

which is as obtained before.
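As a quick numerical check of this derivation, the density just obtained coincides with the chi-squared density with one degree of freedom, and squared standard normal samples follow it. A minimal Python sketch using numpy and scipy.stats; the evaluation points and sample size are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import chi2

# Density of Y = X^2 derived above, for X ~ N(0,1).
def f_Y(y):
    return (1.0 / (2.0 * np.sqrt(np.pi))) * (y / 2.0) ** (-0.5) * np.exp(-y / 2.0)

# 1) The derived formula coincides with the chi-squared density with 1 degree of freedom.
ys = np.array([0.1, 0.5, 1.0, 2.0, 5.0])
print(np.allclose(f_Y(ys), chi2.pdf(ys, df=1)))        # True

# 2) Monte Carlo check: squaring standard normal draws reproduces that distribution.
rng = np.random.default_rng(0)
y_samples = rng.standard_normal(200_000) ** 2
for c in (0.5, 1.0, 2.0, 4.0):
    print(f"P(X^2 <= {c}): empirical {np.mean(y_samples <= c):.4f}, "
          f"theoretical {chi2.cdf(c, df=1):.4f}")
```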

The above also suggests that we can generalize the form of the density for arbitrary $k$ as follows:

$$\frac{1}{2\,\Gamma(k/2)}\Big(\frac{y}{2}\Big)^{k/2-1} e^{-y/2}$$

with $f_{X^2}(y)$ being the special case $k = 1$. This is called the $\chi^2$ (Chi-squared) density with parameter $k$, or $k$ degrees of freedom. It turns out that the $\chi^2_k$ density with $k = n$ is exactly the density of $V$:

$$V = X_1^2 + \cdots + X_n^2 \sim \chi^2_n$$

This is because the sum of two independent $\chi^2$ random variables with parameters $k_1$ and $k_2$ is a $\chi^2$ random variable with parameter $k = k_1 + k_2$. To prove this, one can use the transform of a $\chi^2$ random variable. Recall that the transform of a random variable $Z$ is $E[e^{sZ}]$. Therefore, the transform of $Z_1 + Z_2$ is

$$E[e^{s(Z_1+Z_2)}] = E[e^{sZ_1+sZ_2}] = E[e^{sZ_1}e^{sZ_2}] = E[e^{sZ_1}]\,E[e^{sZ_2}]$$

if $Z_1$ and $Z_2$ are independent. It is easy to show that the transform of a $\chi^2_k$ random variable is $(1-2s)^{-k/2}$, and therefore the transform of the sum of two independent $\chi^2$ random variables with parameters $k_1$ and $k_2$ is $(1-2s)^{-k_1/2}(1-2s)^{-k_2/2} = (1-2s)^{-(k_1+k_2)/2}$, which is the transform of a $\chi^2_k$ random variable with $k = k_1 + k_2$.
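The additivity argument is easy to check by simulation: square and sum standard normal draws, then compare the empirical transform $E[e^{sV}]$ with the closed form $(1-2s)^{-n/2}$. A minimal Python sketch; the choices of n, s, and the number of replications are arbitrary.

```python
import numpy as np

# V = X_1^2 + ... + X_n^2 for independent standard normals should be chi-squared with n df.
rng = np.random.default_rng(1)
n = 5
V = np.sum(rng.standard_normal((500_000, n)) ** 2, axis=1)

print("sample mean:", round(V.mean(), 3), " (chi2_n mean = n =", n, ")")
print("sample var :", round(V.var(), 3), " (chi2_n variance = 2n =", 2 * n, ")")

# The transform of chi2_k is (1 - 2s)^(-k/2) for s < 1/2; compare at a few values of s.
for s in (-0.5, 0.1, 0.2):
    empirical = np.mean(np.exp(s * V))
    closed_form = (1 - 2 * s) ** (-n / 2)
    print(f"s = {s:+.1f}: E[e^(sV)] ~ {empirical:.4f}, (1-2s)^(-n/2) = {closed_form:.4f}")
```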

Example: Fairness of dice. Consider throwing a pair of dice, and let $S$ be the random variable corresponding to the total obtained on a single throw. If the pair of dice is fair, $S$ takes the following values with the given probabilities:

  s      2     3     4     5     6     7     8     9     10    11    12
  p_s   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Assume that we throw the pair of dice $n = 144$ times, and let $Y_s$ be the random variable corresponding to the number of times we obtain a particular value $s$ for $S$. Note that $Y_s$ is the sum of $n$ Bernoulli trials with success probability $p_s$ as given in the above table. We now record, from an actual experiment of $n = 144$ throws, the observed count $y_s$ for each total $s$; the corresponding expected counts are:

  s      2    3    4    5    6    7    8    9    10   11   12
  np_s   4    8    12   16   20   24   20   16   12   8    4

How can we test whether or not the given pair of dice is loaded? For $s = 2,\ldots,12$, and in general for $s = 1,\ldots,k$, consider the following random variable:

$$Z_s = \frac{Y_s - np_s}{\sqrt{np_s(1-p_s)}}$$

where $Y_s$ is the number of times outcome $s$ occurs, and $p_s$ is the corresponding probability (thus $np_s$ is the expected number of times $s$ should occur). A natural approach is to consider $\sum_s Z_s^2$ and see whether this sum is (probabilistically) too high or too low. By the central limit theorem, $Z_s \approx N(0,1)$ when $n$ is large. Therefore, $\sum_s Z_s^2 \sim \chi^2_k$. But the $Z_s$ are not completely independent! For instance, $Y_k$ can be computed if $Y_1,\ldots,Y_{k-1}$ are known (because $\sum_s Y_s = n$). Instead, let us consider

$$Z_s = \frac{Y_s - np_s}{\sqrt{np_s}}$$

and, for simplicity, assume we only have two possible outcomes ($Y_1 + Y_2 = n$, $p_1 + p_2 = 1$). Then

$$Z_1^2 + Z_2^2 = \frac{(Y_1 - np_1)^2}{np_1} + \frac{(Y_2 - np_2)^2}{np_2} = \frac{(Y_1 - np_1)^2}{np_1} + \frac{(-Y_1 + np_1)^2}{n(1-p_1)} = \frac{(Y_1 - np_1)^2}{np_1(1-p_1)} = Z^2$$

where $Z = \frac{Y_1 - np_1}{\sqrt{np_1(1-p_1)}}$. We know that $Z^2 \sim \chi^2_1$. In general (proof omitted), if $k - d$ of the variables are sufficient to determine all $k$, then

$$V = \sum_{i=1}^k Z_i^2 \sim \chi^2_{k-d}$$

Applying this to our dice problem ($k = 11$, $d = 1$), we compute $V = 7.4583$. From the table for $\chi^2_{10}$, we can find $P(V \ge 7.4583) = \int_{7.4583}^\infty \chi^2_{10}(y)\,dy$, and see that $V = 7.4583$ falls between the entries for 5% and 95%, so it is not significantly high or significantly low; thus the pair of dice is satisfactory (not loaded).
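The goodness-of-fit computation can be sketched in Python as follows, with scipy.stats.chi2 playing the role of the table lookup. The observed counts below are made-up illustrative numbers (they are not the counts from the experiment above); only the expected counts np_s follow from fairness.

```python
import numpy as np
from scipy.stats import chi2

# Probabilities of the totals 2..12 for a fair pair of dice, and expected counts for n = 144 throws.
p = np.array([1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]) / 36
n = 144
expected = n * p                         # 4, 8, 12, 16, 20, 24, 20, 16, 12, 8, 4

# Hypothetical observed counts, for illustration only; they sum to 144.
observed = np.array([1, 7, 15, 13, 24, 21, 23, 17, 10, 6, 7])

# Pearson statistic V = sum (Y_s - n p_s)^2 / (n p_s).
V = np.sum((observed - expected) ** 2 / expected)

# 11 categories minus 1 constraint (the counts sum to n) gives 10 degrees of freedom.
lo, hi = chi2.ppf([0.05, 0.95], df=10)
print(f"V = {V:.4f}; 5% and 95% points of chi2_10: {lo:.2f}, {hi:.2f}")
print("not significant (dice look fair)" if lo < V < hi else "significant")
```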

3 Chi-squared prior

To keep a Bayesian spirit, the $\chi^2$ density provides a conjugate prior in a number of cases. For instance, let $x_1,\ldots,x_n$ be independent Poisson random variables with parameter $\lambda$. Then

$$P(x_1,\ldots,x_n \mid \lambda) \propto \lambda^T e^{-n\lambda}$$

where $T = \sum_i x_i$. If $2m\lambda \sim \chi^2_k$ for a prior, i.e.

$$f(\lambda) = \frac{m}{\Gamma(k/2)}\,(m\lambda)^{k/2-1}\, e^{-m\lambda}$$

then

$$f(\lambda \mid x_1,\ldots,x_n) \propto \lambda^T e^{-n\lambda}\,(m\lambda)^{k/2-1}\, e^{-m\lambda} \propto \big((n+m)\lambda\big)^{(k+2T)/2 - 1}\, e^{-(n+m)\lambda}$$

so that

$$2(n+m)\lambda \mid x_1,\ldots,x_n \sim \chi^2_{k+2T}$$
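To make the update concrete, here is a small Python sketch. The prior hyperparameters k and m and the Poisson counts are hypothetical choices made purely for illustration; scipy.stats.chi2 supplies the quantiles.

```python
from scipy.stats import chi2

# Conjugate update for Poisson data: if 2*m*lambda ~ chi2_k a priori and T is the sum of
# the n observations, then 2*(n + m)*lambda | data ~ chi2_{k + 2T}.
k, m = 2, 1                      # hypothetical prior: 2*lambda ~ chi2_2, i.e. an Exp(1) prior on lambda
data = [3, 1, 4, 2, 5, 2, 3, 4]  # made-up Poisson counts, for illustration only
n, T = len(data), sum(data)

df_post = k + 2 * T
scale = 2 * (n + m)

# 95% equal-tailed credible interval for lambda from the chi-squared quantiles.
lo, hi = chi2.ppf([0.025, 0.975], df=df_post) / scale
print(f"posterior: {scale}*lambda | data ~ chi2_{df_post}")
print(f"95% credible interval for lambda: ({lo:.3f}, {hi:.3f})")
```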

Now consider $n$ independent Gaussian random variables with a variance $\sigma^2$ that is unknown (the mean $\mu$ is known):

$$X_i \mid \sigma^2 \sim N(\mu, \sigma^2)$$

What would be an appropriate (conjugate) prior for $\sigma^2$? Note that

$$f(\sigma^2 \mid x_1,\ldots,x_n) \propto \frac{1}{\sigma^n}\, e^{-\frac{\sum_i (x_i-\mu)^2}{2\sigma^2}}\, f(\sigma^2)$$

Let $S = \sum_i (x_i - \mu)^2$; then:

$$f(\sigma^2 \mid x_1,\ldots,x_n) \propto (1/\sigma^2)^{n/2}\, e^{-S/2\sigma^2}\, f(\sigma^2)$$

Therefore,

$$f(\sigma^2 \mid x_1,\ldots,x_n)\,\frac{d\sigma^2}{d(1/\sigma^2)} \propto (1/\sigma^2)^{n/2}\, e^{-S/2\sigma^2}\, f(\sigma^2)\,\frac{d\sigma^2}{d(1/\sigma^2)}$$

The term $d\sigma^2/d(1/\sigma^2)$ adjusts for changing the variable from $\sigma^2$ to $1/\sigma^2$, so that integration of the density is with respect to $1/\sigma^2$ rather than $\sigma^2$. We get:

$$f(1/\sigma^2 \mid x_1,\ldots,x_n) \propto (1/\sigma^2)^{n/2}\, e^{-S/2\sigma^2}\, f(1/\sigma^2)$$

If $S'/\sigma^2 \sim \chi^2_k$ for some $S'$, then

$$f(1/\sigma^2) \propto (1/\sigma^2)^{k/2-1}\, e^{-S'/2\sigma^2}$$

Therefore,

$$f(1/\sigma^2 \mid x_1,\ldots,x_n) \propto (1/\sigma^2)^{\frac{n+k}{2}-1}\, e^{-(S+S')/2\sigma^2}$$

$$(S+S')/\sigma^2 \mid x_1,\ldots,x_n \sim \chi^2_{n+k}$$

Example: Consider a sample of $n$ observations. We can compute $\bar{x}$ and, for simplicity, assume that this is the real value of $\mu$; in this case $S = \sum_i (x_i - \mu)^2 = 664$. Now one may argue that knowledge of $k$ and $S'$ is not available. In this case, let $k = 0$ and $S' = 0$ to get the following (improper) prior:

$$f(1/\sigma^2) \propto (1/\sigma^2)^{-1}$$

Therefore, the posterior is given by $664/\sigma^2 \mid x_1,\ldots,x_n \sim \chi^2_n$, and from the $\chi^2$ tables we see that, with probability 0.95, $\sigma^2$ falls in an interval whose upper endpoint is roughly 75.

This improper prior corresponds to a uniform prior on $\log\sigma^2$:

$$f(\log\sigma^2) \propto (1/\sigma^2)^{-1}\Big/\left|\frac{d(\log\sigma^2)}{d(1/\sigma^2)}\right| = (1/\sigma^2)^{-1}\Big/\left|\frac{d(\log(1/\sigma^2))}{d(1/\sigma^2)}\right| = 1$$

which is uniform in $\log\sigma^2 \in (-\infty,+\infty)$. This prompts the following question: is the improper uniform prior equivalent to some conjugate form? In other words, is there a function $g(\theta)$ such that $f(\theta \mid x)$ is a valid density when $f(g(\theta)) \propto 1$? This is equivalent to the following statement (why?):

$$f(\theta \mid x) \propto f(x \mid \theta)\,\frac{dg(\theta)}{d\theta}$$

The answer of course depends on $f(x \mid \theta)$. For our example, it is clear that we need

$$\frac{dg(1/\sigma^2)}{d(1/\sigma^2)} = (1/\sigma^2)^{-1}$$

which is satisfied (up to a sign, which does not matter here) by $g(1/\sigma^2) = \log\sigma^2$. If we recall the example of deciding on the mean $\mu$ of independent Gaussian samples, $g(\mu) = \mu$ is appropriate.

Example: Let $f(x \mid \lambda) = \lambda e^{-\lambda x}$ (the exponential density). Then

$$f(\lambda \mid x) \propto \lambda e^{-\lambda x}\,\frac{dg(\lambda)}{d\lambda}$$

Obviously, $g(\lambda) = \log\lambda$ provides a valid posterior density, since then $f(\lambda \mid x) \propto e^{-\lambda x}$, which is exponential. Therefore, the improper uniform prior $f(\log\lambda) \propto 1$ works.
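Returning to the variance example above, the interval read off the $\chi^2$ table can be computed directly from the quantiles. The Python sketch below assumes, purely for illustration, a sample size of n = 20 together with S = 664; since the sample size is an assumed value, the endpoints it prints are not meant to reproduce the numbers quoted from the tables.

```python
from scipy.stats import chi2

# With the improper prior f(1/sigma^2) ~ (1/sigma^2)^(-1), the posterior satisfies
# S/sigma^2 | data ~ chi2_n, where S = sum (x_i - mu)^2.
S = 664.0      # as in the example above
n = 20         # assumed sample size, for illustration only

# 95% equal-tailed interval: S/sigma^2 lies between the 2.5% and 97.5% chi-squared quantiles,
# so sigma^2 lies between S divided by the upper quantile and S divided by the lower quantile.
q_lo, q_hi = chi2.ppf([0.025, 0.975], df=n)
print(f"95% credible interval for sigma^2: ({S / q_hi:.1f}, {S / q_lo:.1f})")
```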