Lecture 5: Moment Generating Functions


IB Paper 7: Probability and Statistics
Carl Edward Rasmussen
Department of Engineering, University of Cambridge
February 28th, 2018

Moment Generating Functions

The computation of the central moments (e.g. expectation and variance), as well as of combinations of random variables such as sums, is useful, but can be tedious because of the sums or integrals involved.

Example: the expectation of the Binomial is

$$E[X] = \sum_{r=0}^{n} r\, p(X = r) = \sum_{r=0}^{n} r\, {}^nC_r\, p^r (1-p)^{n-r} = \sum_{r=1}^{n} \frac{r\, n!}{(n-r)!\, r!}\, p^r (1-p)^{n-r}$$

$$= np \sum_{r=1}^{n} \frac{(n-1)!}{(n-r)!\,(r-1)!}\, p^{r-1} (1-p)^{n-r} = np \sum_{\tilde r=0}^{\tilde n} \frac{\tilde n!}{(\tilde n - \tilde r)!\, \tilde r!}\, p^{\tilde r} (1-p)^{\tilde n - \tilde r} = np,$$

where $\tilde n = n - 1$ and $\tilde r = r - 1$, and using the fact that the Binomial normalizes to one.

Moment generating functions are a neat mathematical trick which sometimes sidesteps these tedious calculations.
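As a quick sanity check of the sum above, here is a minimal sympy sketch (not from the lecture; the value n = 10 is an arbitrary choice) that evaluates the expectation by direct summation:

```python
import sympy as sp

p = sp.symbols('p', positive=True)
n = 10  # arbitrary concrete value, for illustration

# E[X] = sum_r r * nCr * p^r * (1 - p)^(n - r), summed term by term
EX = sum(r * sp.binomial(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1))
print(sp.simplify(EX))  # 10*p, i.e. n*p as derived above
```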

The Discrete Moment Generating Function

For a discrete random variable, we define the moment generating function

$$g(z) = \sum_r z^r p(r).$$

This is useful, since when differentiated w.r.t. $z$ an extra factor $r$ appears in the sum, thus

$$g'(z) = \sum_r r z^{r-1} p(r), \quad \text{and} \quad g''(z) = \sum_r r(r-1) z^{r-2} p(r).$$

So $g'(1) = \sum_r r\, p(r)$ and $g''(1) = \sum_r (r^2 - r)\, p(r)$, and therefore

$$E[R] = g'(1), \quad \text{and} \quad E[R^2] = g''(1) + g'(1).$$
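These identities are easy to verify symbolically. The following sketch is my own illustration (the fair die is an arbitrary example distribution, not from the lecture):

```python
import sympy as sp

z = sp.symbols('z')
pmf = {r: sp.Rational(1, 6) for r in range(1, 7)}  # fair die, for illustration
g = sum(z**r * pr for r, pr in pmf.items())        # g(z) = sum_r z^r p(r)

g1 = sp.diff(g, z).subs(z, 1)     # g'(1) = E[R]
g2 = sp.diff(g, z, 2).subs(z, 1)  # g''(1) = E[R^2] - E[R]
print(g1)       # 7/2, the mean of a die roll
print(g2 + g1)  # 91/6 = E[R^2]
```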

The Binomial Distribution

The Binomial has

$$g(z) = \sum_r {}^nC_r\, z^r p^r (1-p)^{n-r} = \sum_r {}^nC_r\, (pz)^r (1-p)^{n-r} = (q + pz)^n,$$

by the Binomial theorem, where we have defined $q = 1 - p$. Thus, we have

$$g'(z) = np(q + pz)^{n-1}, \quad \text{and} \quad g''(z) = n(n-1)p^2 (q + pz)^{n-2}.$$

So $g'(1) = np$ and $g''(1) = n(n-1)p^2$, giving $E[X] = np$ and $E[X^2] = n^2p^2 - np^2 + np$, which combine to

$$E[X] = np, \quad \text{and} \quad V[X] = E[X^2] - E[X]^2 = np - np^2 = npq.$$
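The closed form makes the differentiation mechanical; here is a short symbolic check of the mean and variance, a sketch of my own with fully symbolic $n$ and $p$:

```python
import sympy as sp

z, n, p = sp.symbols('z n p', positive=True)
q = 1 - p
g = (q + p * z)**n  # Binomial moment generating function

EX = sp.diff(g, z).subs(z, 1)           # E[X] = g'(1)
EX2 = sp.diff(g, z, 2).subs(z, 1) + EX  # E[X^2] = g''(1) + g'(1)
print(sp.simplify(EX))           # n*p
print(sp.simplify(EX2 - EX**2))  # n*p*(1 - p) = n*p*q
```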

Some Discrete Moment Generating Functions

distribution | symbol   | probability                           | moment generating function
Bernoulli    | Ber(p)   | p(x) = p^x (1-p)^(1-x)                | g(z) = q + zp
Binomial     | B(n, p)  | p(r) = nCr p^r (1-p)^(n-r)            | g(z) = (q + zp)^n
Poisson      | Po(λ)    | p(r) = exp(-λ) λ^r / r!               | g(z) = exp(λ(z - 1))

where we have defined q = 1 - p.
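As one check on the table, the Poisson entry follows from summing the series $\sum_r z^r e^{-\lambda} \lambda^r / r!$; sympy can evaluate this directly (my own sketch, assuming sympy resolves the infinite sum):

```python
import sympy as sp

z = sp.symbols('z')
lam = sp.symbols('lambda', positive=True)
r = sp.symbols('r', integer=True, nonnegative=True)

# g(z) = sum_r z^r exp(-lambda) lambda^r / r!
g = sp.summation(z**r * sp.exp(-lam) * lam**r / sp.factorial(r), (r, 0, sp.oo))
print(sp.simplify(g))  # exp(lambda*z - lambda), i.e. exp(lambda*(z - 1))
```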

Sums of Random Variables

Example: consider $W = X + Y$, where $X \sim \text{Po}(\lambda_x)$ and $Y \sim \text{Po}(\lambda_y)$ are independent Poisson distributed. Then

$$p(W = w) = \sum_{x=0}^{w} p(X = x)\, p(Y = w - x) = \sum_{x=0}^{w} \exp(-\lambda_x) \frac{\lambda_x^x}{x!}\, \exp(-\lambda_y) \frac{\lambda_y^{w-x}}{(w-x)!}$$

$$= \frac{\exp(-\lambda_x - \lambda_y)}{w!} \sum_{x=0}^{w} \frac{w!}{x!\,(w-x)!}\, \lambda_x^x \lambda_y^{w-x} = \exp(-\lambda_x - \lambda_y)\, \frac{(\lambda_x + \lambda_y)^w}{w!} = \text{Po}(\lambda_x + \lambda_y),$$

where the last sum is $(\lambda_x + \lambda_y)^w$ by the binomial theorem, i.e. the Poisson distribution is closed under addition.
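A quick numerical confirmation of this closure property, a sketch of my own (the rates 2 and 3 are arbitrary illustrative values):

```python
import numpy as np
from scipy.stats import poisson

lam_x, lam_y = 2.0, 3.0  # illustrative rates
w = np.arange(30)

# Convolve the two PMFs directly, as in the derivation above
conv = [sum(poisson.pmf(x, lam_x) * poisson.pmf(k - x, lam_y)
            for x in range(k + 1)) for k in w]
print(np.allclose(conv, poisson.pmf(w, lam_x + lam_y)))  # True
```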

Sums using Moment Generating Functions

Now $W = X + Y$, then

$$g_w(z) = \sum_w z^w \sum_x p(X = x)\, p(Y = w - x) = \sum_w \sum_x z^x p(X = x)\, z^{w-x} p(Y = w - x)$$

$$= \sum_x z^x p(X = x) \sum_y z^y p(Y = y) = g_x(z)\, g_y(z).$$

I.e., the sum of independent random variables has a moment generating function which is the product of the individual moment generating functions.

Example: we see immediately that the sum of two independent Poissons is Poisson with $\lambda = \lambda_x + \lambda_y$, as $g(z) = \exp(\lambda(z - 1))$.
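Since multiplying generating functions convolves the underlying PMFs, sums of discrete variables can be computed mechanically. A minimal sketch of my own, using two fair dice as the example:

```python
import numpy as np

# Generating-function coefficients of a fair die: coefficient of z^r is p(r)
die = np.array([0, 1, 1, 1, 1, 1, 1]) / 6.0

# Multiplying the generating functions = convolving the coefficients,
# giving the PMF of the sum of two independent dice
pmf_sum = np.convolve(die, die)
print(pmf_sum[7])  # P(sum = 7) = 6/36 ≈ 0.1667
```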

Moment Generating Functions in the Continuous Case

For continuous distributions¹

$$g(s) = \int \exp(sx)\, p(x)\, dx,$$

which is related to the two-sided Laplace transform. We have

$$g'(s) = \int x \exp(sx)\, p(x)\, dx, \quad \text{and} \quad g''(s) = \int x^2 \exp(sx)\, p(x)\, dx,$$

and so on, which gives $E[X] = g'(0)$ and $E[X^2] = g''(0)$.

Also, the sum of two independent continuous random variables, whose density is the convolution of the individual probability densities, has a moment generating function which is the product of the individual moment generating functions, similar to the discrete case and to Laplace transforms from signal analysis.

¹ In the past a different definition $g(s) = \int \exp(-sx)\, p(x)\, dx$ was used.
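As an illustration of the continuous definition, this sketch (my own, assuming sympy's `conds='none'` option to suppress the convergence condition $s < \lambda$) derives the exponential's moment generating function and reads off its moments:

```python
import sympy as sp

s, x = sp.symbols('s x')
lam = sp.symbols('lambda', positive=True)

# g(s) = integral of exp(sx) * lambda * exp(-lambda x) over x > 0;
# the integral converges for s < lambda
g = sp.integrate(sp.exp(s * x) * lam * sp.exp(-lam * x), (x, 0, sp.oo),
                 conds='none')
print(sp.simplify(g))                            # lambda/(lambda - s)
print(sp.diff(g, s).subs(s, 0))                  # E[X] = 1/lambda
print(sp.simplify(sp.diff(g, s, 2).subs(s, 0)))  # E[X^2] = 2/lambda^2
```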

Moment Generating Functions in the Continuous Case

distribution | symbol    | probability                              | moment generating function
Uniform      | Uni(a, b) | p(x) = 1/(b - a)                         | g(s) = (exp(bs) - exp(as)) / (s(b - a))
Exponential  | Ex(λ)     | p(x) = λ exp(-λx)                        | g(s) = λ/(λ - s)
Gaussian     | N(µ, σ²)  | p(x) = exp(-(x - µ)²/(2σ²)) / √(2πσ²)    | g(s) = exp(sµ + s²σ²/2)

The moment generating functions for shifted and scaled random variables are

$$Y = X + \beta: \quad g_y(s) = \exp(\beta s)\, g_x(s), \qquad Y = \alpha X: \quad g_y(s) = g_x(\alpha s),$$

which are both verified by plugging into the definition.
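The Gaussian entry in the table can be checked by carrying out the integral, which sympy handles by completing the square (my own sketch, assuming sympy evaluates the integral in closed form):

```python
import sympy as sp

s, x, mu = sp.symbols('s x mu', real=True)
sigma = sp.symbols('sigma', positive=True)

# g(s) = integral of exp(sx) times the N(mu, sigma^2) density
pdf = sp.exp(-(x - mu)**2 / (2 * sigma**2)) / sp.sqrt(2 * sp.pi * sigma**2)
g = sp.integrate(sp.exp(s * x) * pdf, (x, -sp.oo, sp.oo))
print(sp.simplify(g))  # exp(mu*s + s**2*sigma**2/2), matching the table
```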

The Multivariate Gaussian

The multivariate Gaussian in $D$ dimensions, where $\mathbf{x}$ is a vector of length $D$, has probability density

$$p(\mathbf{x}) = N(\boldsymbol\mu, \Sigma) = (2\pi)^{-D/2}\, |\Sigma|^{-1/2} \exp\big( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol\mu)^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol\mu) \big),$$

where $\boldsymbol\mu$ is the mean vector of length $D$ and $\Sigma$ is the $D \times D$ covariance matrix. The covariance matrix is positive definite and symmetric. Its entries $\Sigma_{ij}$ are the covariances between different coordinates of $\mathbf{x}$:

$$\Sigma_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)] = E[x_i x_j] - \mu_i \mu_j.$$

In a Gaussian, if all covariances $\Sigma_{ij}$, $i \neq j$, are zero, then $\Sigma$ is diagonal and the components $x_i$ are independent, since then $p(\mathbf{x}) = \prod_i p(x_i)$.
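To make the density concrete, here is a small numerical sketch of my own (the mean, covariance, and evaluation point are illustrative values, not from the lecture) evaluating the formula and comparing against scipy:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative 2-D example
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])  # symmetric, positive definite

x = np.array([0.5, 0.5])
D = len(mu)
d = x - mu
# Density computed directly from the formula above ...
p = np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)) \
    / np.sqrt((2 * np.pi)**D * np.linalg.det(Sigma))
# ... agrees with scipy's implementation
print(np.isclose(p, multivariate_normal(mu, Sigma).pdf(x)))  # True
```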

The Gaussian Distribution

In the multivariate Gaussian, the equi-probability contours are ellipses. The axis directions are given by the eigenvectors of the covariance matrix, and their lengths are proportional to the square roots of the corresponding eigenvalues.
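The eigendecomposition that determines these ellipse axes is one line in numpy; a sketch of my own, reusing the illustrative covariance from above:

```python
import numpy as np

Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])  # same illustrative covariance as above

vals, vecs = np.linalg.eigh(Sigma)  # eigh: for symmetric matrices
print(vecs)           # columns are the ellipse axis directions
print(np.sqrt(vals))  # axis lengths are proportional to these
```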

Correlation and independence

The covariance matrix is sometimes written as

$$\Sigma = \begin{bmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{bmatrix},$$

where $-1 < \rho < 1$ is the correlation coefficient. When $\rho < 0$, the variables are anti-correlated; when $\rho = 0$, uncorrelated; when $\rho > 0$, positively correlated.

Independence: $p(X, Y) = p(X)\, p(Y)$.

Note: independence implies uncorrelated, but not vice versa. Example: let $X \sim N(0, 1)$ and $Y = \pm X$ (with a random sign, independent of $X$). Here $X$ and $Y$ are uncorrelated, but not independent.
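The random-sign example is easy to see in simulation; a minimal sketch of my own:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = rng.choice([-1.0, 1.0], size=x.size) * x  # random independent sign flip

print(np.corrcoef(x, y)[0, 1])        # ≈ 0: uncorrelated
print(np.corrcoef(x**2, y**2)[0, 1])  # 1.0: y^2 = x^2, so clearly dependent
```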

Conditionals and Marginals of a Gaussian

[Figure: a joint Gaussian shown with one of its conditionals (left panel) and one of its marginals (right panel).]

Both the conditionals and the marginals of a joint Gaussian are again Gaussian.

Conditionals and Marginals of a Gaussian

Recall marginalization: $p(x) = \int p(x, y)\, dy$. For Gaussians:

$$p(x, y) = N\Big( \begin{bmatrix} a \\ b \end{bmatrix}, \begin{bmatrix} A & C \\ C^\top & B \end{bmatrix} \Big) \;\Longrightarrow\; p(x) = N(a, A).$$

And conditioning:

$$p(x \mid y) = N\big( a + C B^{-1} (y - b),\; A - C B^{-1} C^\top \big).$$
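These formulas translate directly into code. A minimal sketch of my own (the 1-D block values are made-up illustrative numbers):

```python
import numpy as np

def gaussian_condition(a, b, A, B, C, y):
    """Mean and covariance of p(x | y) for the joint Gaussian above."""
    K = C @ np.linalg.inv(B)  # B is tiny here, so explicit inverse is fine
    return a + K @ (y - b), A - K @ C.T

# Illustrative 1-D blocks
a, b = np.array([0.0]), np.array([1.0])
A = np.array([[1.0]]); B = np.array([[2.0]]); C = np.array([[0.5]])
mean, cov = gaussian_condition(a, b, A, B, C, y=np.array([2.0]))
print(mean, cov)  # [0.25] [[0.875]]
```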

The Central Limit Theorem

If $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables with mean $\mu$ and variance $\sigma^2$, then in the limit of large $n$

$$X_1 + X_2 + \ldots + X_n \sim N(n\mu, n\sigma^2),$$

regardless of the actual distribution of the $X_i$. Note: as we expect, the means and the variances add up. Equivalently,

$$\frac{X_1 + X_2 + \ldots + X_n - n\mu}{\sigma \sqrt{n}} \sim N(0, 1).$$

The Central Limit Theorem can be proven by examining the moment generating function.

Central Limit Theorem Example

The distribution of

$$\bar X_n = \frac{Y_1 + Y_2 + \ldots + Y_n - n\mu}{\sigma \sqrt{n}},$$

where $Y_i \sim \text{Ex}(1)$, for different values of $n$.

[Figure: densities of $\bar X_n$ for $n = 1, 2, 3, 5, 10$ and $n \to \infty$, converging to the standard Gaussian on the range $-3 \leq x \leq 3$.]

Even for quite small values of $n$ we get a good approximation by the Gaussian.
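The figure is easy to reproduce by simulation. A sketch of my own: it draws standardized sums of Ex(1) variables (for which $\mu = \sigma = 1$) and checks how quickly the mass within one standard deviation approaches the Gaussian value of about 0.683:

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (1, 2, 3, 5, 10):
    # Standardized sums of n Ex(1) draws (mu = sigma = 1)
    y = rng.exponential(size=(100_000, n))
    xbar = (y.sum(axis=1) - n) / np.sqrt(n)
    # P(|X| < 1) tends to ≈ 0.683 for a standard Gaussian
    print(n, np.round(np.mean(np.abs(xbar) < 1), 3))
```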