MIT Spring 2016

1 Exponential Families II. MIT, Dr. Kempthorne, Spring 2016

2 Outline: 1. Exponential Families II

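3 Expectation and Variance

Let $U$ ($k \times 1$) and $V$ ($l \times 1$) be random vectors.

If $A$ ($m \times k$) and $B$ ($m \times l$) are nonrandom, then
$$E(AU + BV) = A\,E(U) + B\,E(V).$$

If $U = c$ with probability 1, then $E(U) = c$.

For a random vector $U$, if $E(|U|^2) = \sum_{i=1}^{k} E(U_i^2) < \infty$, define the variance of $U$ by
$$\mathrm{Var}(U) = E\left[(U - E(U))(U - E(U))^T\right] = \|\mathrm{Cov}(U_i, U_j)\| \quad (k \times k).$$

For $A$ ($m \times k$) as above:
$$\mathrm{Var}(AU) = A\,\mathrm{Var}(U)\,A^T \quad (m \times m).$$

For $c$ ($k \times 1$) a constant vector, $\mathrm{Var}(U + c) = \mathrm{Var}(U)$.

For $a$ ($k \times 1$) a constant vector,
$$\mathrm{Var}(a^T U) = \mathrm{Var}\Big(\sum_{j=1}^{k} a_j U_j\Big) = a^T \mathrm{Var}(U)\, a = \sum_{i,j} a_i a_j \mathrm{Cov}(U_i, U_j).$$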
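
The quadratic-form identity for $\mathrm{Var}(a^T U)$ can be checked numerically. The sketch below is only an illustration: the three-point distribution `points` and the vector `a` are made-up values, not part of the lecture, and expectations are computed by exact summation over the support.

```python
# Hypothetical discrete distribution for U = (U1, U2): (value, probability) pairs.
points = [((0.0, 1.0), 0.2), ((1.0, 0.0), 0.3), ((2.0, 3.0), 0.5)]
a = (2.0, -1.0)  # arbitrary constant vector (an assumption for this example)


def expect(f):
    """E[f(U)] by exact summation over the support."""
    return sum(p * f(u) for u, p in points)


mean = [expect(lambda u, i=i: u[i]) for i in range(2)]

# Var(U)[i][j] = Cov(U_i, U_j)
var_u = [
    [expect(lambda u, i=i, j=j: (u[i] - mean[i]) * (u[j] - mean[j])) for j in range(2)]
    for i in range(2)
]

# Left side: variance of the scalar a^T U computed directly.
m = expect(lambda u: a[0] * u[0] + a[1] * u[1])
lhs = expect(lambda u: (a[0] * u[0] + a[1] * u[1] - m) ** 2)

# Right side: the quadratic form a^T Var(U) a.
rhs = sum(a[i] * a[j] * var_u[i][j] for i in range(2) for j in range(2))

print(lhs, rhs)
```

The two printed numbers should agree up to floating-point rounding.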

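4 Expectation and Variance

Proposition B.5.1. If $E[|U|^2] < \infty$, then $\mathrm{Var}(U)$ is positive definite if and only if
$$P[a^T U + b = 0] < 1 \quad \text{for every } a \neq 0 \text{ and } b \in \mathbb{R}.$$

Proof. $\mathrm{Var}(U)$ is not positive definite iff $a^T \mathrm{Var}(U)\, a = 0$ for some $a \neq 0$, which is equivalent to $\mathrm{Var}(a^T U) = 0$, that is, to $a^T U$ being constant with probability 1.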

5 Covariance

Definition: For random vectors $U$ ($k \times 1$) and $V$ ($l \times 1$), define the covariance of $U$ and $V$ by
$$\mathrm{Cov}(U, V) = E\left[(U - E(U))(V - E(V))^T\right] \quad (k \times l)$$
(must assume $E|U|^2 < \infty$ and $E|V|^2 < \infty$).

If $U$ and $V$ are independent, then $\mathrm{Cov}(U, V) = 0$.

For nonrandom $A, a, B, b$:
$$\mathrm{Cov}(AU + a, BV + b) = A\,\mathrm{Cov}(U, V)\,B^T.$$

If $U$ and $W$ are random ($k \times 1$) vectors, then
$$\mathrm{Var}(U + W) = \mathrm{Var}(U) + \mathrm{Cov}(U, W) + \mathrm{Cov}(W, U) + \mathrm{Var}(W),$$
and if $U$ and $W$ are independent,
$$\mathrm{Var}(U + W) = \mathrm{Var}(U) + \mathrm{Var}(W).$$
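
As a small sanity check of the decomposition of $\mathrm{Var}(U + W)$, the sketch below works in the scalar case $k = 1$ with a made-up joint distribution `joint` in which $U$ and $W$ are dependent. The helper names are hypothetical and only serve this example.

```python
# Hypothetical joint distribution of scalar (U, W): ((u, w), probability).
joint = [((0.0, 0.0), 0.1), ((0.0, 1.0), 0.4), ((1.0, 0.0), 0.2), ((1.0, 1.0), 0.3)]


def expect(f):
    """E[f(U, W)] by exact summation over the joint support."""
    return sum(p * f(u, w) for (u, w), p in joint)


def cov(f, g):
    """Cov(f(U, W), g(U, W)) by exact summation."""
    mf, mg = expect(f), expect(g)
    return expect(lambda u, w: (f(u, w) - mf) * (g(u, w) - mg))


def U(u, w):
    return u


def W(u, w):
    return w


def S(u, w):
    return u + w


var_sum = cov(S, S)  # Var(U + W) computed directly
decomposed = cov(U, U) + cov(U, W) + cov(W, U) + cov(W, W)

print(var_sum, decomposed, cov(U, W))
```

Because $\mathrm{Cov}(U, W) \neq 0$ here, dropping the cross terms would give the wrong variance; they vanish only under independence (or, more generally, zero covariance).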

6 Moment Generating Functions

Moment-Generating Function of a Random Vector

Let $T = (T_1, T_2, \ldots, T_k)^T$ be a ($k \times 1$) random vector. For $s = (s_1, s_2, \ldots, s_k)^T \in \mathbb{R}^k$, define
$$M(s) \equiv E\left[e^{s^T T}\right].$$

$M(s)$ is the moment-generating function (mgf) of $T$.

The mgf may not exist for a given $T$. If it does exist, it is defined for $s$ in some ball centered at $s = 0$.

Define the characteristic function (cf) of $T$:
$$\varphi(s) = E\left[e^{i s^T T}\right] = E[\cos(s^T T)] + i\,E[\sin(s^T T)].$$

The cf always exists.

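7 Moment Generating Functions

Theorem B.5.1. Let $S = \{s : M(s) < \infty\}$. Then:

$S$ is convex.

If $S$ has a nonempty interior $S^0$ (contains a sphere $S(0, c)$, $c > 0$), then $M$ is analytic on $S^0$.

If $S^0 \neq \emptyset$, then $E[|T|^p] < \infty$ for all $p$, and if $i_1 + i_2 + \cdots + i_k = p$,
$$\frac{\partial^p M(s)}{\partial s_1^{i_1} \cdots \partial s_k^{i_k}}\bigg|_{s=0} = E\left[T_1^{i_1} \cdots T_k^{i_k}\right].$$
In particular,
$$\frac{\partial M}{\partial s_j}(s = 0) = E(T_j), \quad \text{so the gradient at } 0 \text{ is } E[T],$$
$$\frac{\partial^2 M}{\partial s_i \partial s_j}(s = 0) = E(T_i T_j), \quad \text{so the Hessian at } 0 \text{ is } E[T T^T].$$

If $S^0$ is nonempty, then $M(s)$ determines the distribution of $T$ uniquely.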
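
The derivative formulas can be illustrated with finite differences. This is a rough sketch under assumptions: the bivariate distribution `points` is invented for the example, the mgf is computed by exact summation, and the step size `h` is an arbitrary choice that trades truncation error against rounding error.

```python
import math

# Hypothetical discrete distribution for T = (T1, T2): (value, probability) pairs.
points = [((0.0, 1.0), 0.2), ((1.0, 0.0), 0.3), ((2.0, 3.0), 0.5)]


def mgf(s1, s2):
    """M(s) = E[exp(s^T T)] by exact summation over the support."""
    return sum(p * math.exp(s1 * t1 + s2 * t2) for (t1, t2), p in points)


h = 1e-4  # finite-difference step (an assumption, not tuned)

# Central difference for dM/ds1 at s = 0.
d1 = (mgf(h, 0.0) - mgf(-h, 0.0)) / (2 * h)

# Central difference for the mixed partial d^2 M / (ds1 ds2) at s = 0.
d12 = (mgf(h, h) - mgf(h, -h) - mgf(-h, h) + mgf(-h, -h)) / (4 * h * h)

# Moments computed directly for comparison.
e_t1 = sum(p * t1 for (t1, _), p in points)
e_t1t2 = sum(p * t1 * t2 for (t1, t2), p in points)

print(d1, e_t1)
print(d12, e_t1t2)
```

Each printed pair should agree to several decimal places, matching $\partial M / \partial s_1 (0) = E(T_1)$ and $\partial^2 M / \partial s_1 \partial s_2 (0) = E(T_1 T_2)$.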

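8 Moment Generating Functions

Definition: The Cumulant Generating Function of the random vector $T$ with mgf $M_T(s)$ is
$$K(s) = K_T(s) = \log M_T(s).$$

The cumulants are
$$c_{i_1, \ldots, i_k} = c_{i_1, \ldots, i_k}(T) = \frac{\partial^p K(s)}{\partial s_1^{i_1} \cdots \partial s_k^{i_k}}\bigg|_{s=0}, \quad p = i_1 + \cdots + i_k.$$

In the bivariate case ($k = 2$), where $\mu = E[T]$ and $\tau_{i,j} = E[(T_1 - \mu_1)^i (T_2 - \mu_2)^j]$:

$c_{1,0} = \mu_1$, $c_{0,1} = \mu_2$
$c_{2,0} = \tau_{2,0} = \mathrm{Var}(T_1)$, $c_{0,2} = \tau_{0,2} = \mathrm{Var}(T_2)$
$c_{1,1} = \tau_{1,1} = \mathrm{Cov}(T_1, T_2)$
$c_{3,0} = \tau_{3,0} = E[(T_1 - \mu_1)^3]$, $c_{0,3} = \tau_{0,3} = E[(T_2 - \mu_2)^3]$
$c_{4,0} = \tau_{4,0} - 3\tau_{2,0}^2$, $c_{0,4} = \tau_{0,4} - 3\tau_{0,2}^2$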
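
One way to see the cumulant formulas at work is a univariate Poisson variable, whose cumulant generating function is $K(s) = \lambda(e^s - 1)$, so every cumulant equals $\lambda$. The Poisson example and the rate `lam` are assumptions introduced here for illustration; the sketch computes central moments by truncated summation and applies $c_4 = \tau_4 - 3\tau_2^2$.

```python
import math

lam = 2.5  # hypothetical Poisson rate


def pmf(x):
    """Poisson(lam) probability mass function."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


# Truncated support: the tail mass beyond 80 is negligible for lam = 2.5.
support = range(0, 80)
mu = sum(x * pmf(x) for x in support)


def tau(r):
    """r-th central moment E[(T - mu)^r] by truncated summation."""
    return sum((x - mu) ** r * pmf(x) for x in support)


c2 = tau(2)                     # second cumulant = variance
c3 = tau(3)                     # third cumulant = third central moment
c4 = tau(4) - 3 * tau(2) ** 2   # fourth cumulant

print(mu, c2, c3, c4)
```

All four printed values should be close to `lam`, which is consistent with the table of bivariate cumulants restricted to a single coordinate.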

9 Sums of Independent Random Vectors

If $U$ and $V$ are independent ($k \times 1$) random vectors, then
$$M_{U+V}(s) = M_U(s)\, M_V(s),$$
$$K_{U+V}(s) = K_U(s) + K_V(s).$$
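
The product rule for mgfs can be verified by brute force in the scalar case. In the sketch below, `u_dist` and `v_dist` are made-up pmfs, and the distribution of $U + V$ is obtained by convolving them, which is exactly what independence permits.

```python
import math
from itertools import product

u_dist = {0: 0.5, 1: 0.5}           # hypothetical pmf of U
v_dist = {0: 0.2, 1: 0.3, 2: 0.5}   # hypothetical pmf of V


def mgf(dist, s):
    """M(s) = E[exp(s X)] for a finite discrete distribution."""
    return sum(p * math.exp(s * x) for x, p in dist.items())


# pmf of U + V under independence: convolve the two pmfs.
sum_dist = {}
for (u, pu), (v, pv) in product(u_dist.items(), v_dist.items()):
    sum_dist[u + v] = sum_dist.get(u + v, 0.0) + pu * pv

s = 0.7  # arbitrary evaluation point
lhs = mgf(sum_dist, s)
rhs = mgf(u_dist, s) * mgf(v_dist, s)

# The same statement on the log scale: cumulant generating functions add.
k_lhs = math.log(lhs)
k_rhs = math.log(mgf(u_dist, s)) + math.log(mgf(v_dist, s))

print(lhs, rhs)
print(k_lhs, k_rhs)
```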

10 Multivariate Normal Distributions

Definition B.6.1: A random vector $U$ ($k \times 1$) has a $k$-variate normal distribution iff $U$ can be written as
$$U = \mu + AZ,$$
where $\mu$ and $A$ are constant and $Z = (Z_1, \ldots, Z_k)^T$ with $Z_i$ iid $N(0, 1)$.

Definition B.6.2: A random vector $U$ ($k \times 1$) has a $k$-variate normal distribution iff for every ($k \times 1$) nonrandom $a$,
$$a^T U = \sum_{i=1}^{k} a_i U_i$$
has a univariate normal distribution.

The moment generating function of $U$ is
$$M_U(s) = \exp\left\{ s^T \mu + \tfrac{1}{2}\, s^T \Sigma\, s \right\},$$
where $\mu = E[U]$ and $\Sigma = \mathrm{Var}(U) = A A^T$.
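
A Monte Carlo sketch can connect Definition B.6.1 to the mgf formula. Everything specific here is an assumption for illustration: the values of `mu`, `A`, and `s`, the sample size `n`, and the fixed seed. The code draws $U = \mu + AZ$ and compares the sample average of $e^{s^T U}$ with $\exp\{s^T \mu + \tfrac{1}{2} s^T \Sigma s\}$.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

mu = (1.0, -0.5)               # hypothetical mean vector
A = ((1.0, 0.0), (0.5, 0.8))   # hypothetical 2x2 matrix

# Sigma = A A^T
sigma = [[sum(A[i][r] * A[j][r] for r in range(2)) for j in range(2)] for i in range(2)]


def draw():
    """One draw of U = mu + A Z with Z_1, Z_2 iid N(0, 1)."""
    z = (random.gauss(0.0, 1.0), random.gauss(0.0, 1.0))
    return tuple(mu[i] + sum(A[i][r] * z[r] for r in range(2)) for i in range(2))


s = (0.3, -0.2)  # arbitrary evaluation point
n = 100_000

# Monte Carlo estimate of E[exp(s^T U)].
mc = sum(math.exp(s[0] * u[0] + s[1] * u[1]) for u in (draw() for _ in range(n))) / n

# Closed form exp{s^T mu + (1/2) s^T Sigma s}.
exact = math.exp(
    sum(s[i] * mu[i] for i in range(2))
    + 0.5 * sum(s[i] * sigma[i][j] * s[j] for i in range(2) for j in range(2))
)

print(mc, exact)
```

The estimate is random, so only approximate agreement should be expected; a larger `n` tightens it.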


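13 Theorem. Let $\mathcal{P}$ be a canonical $k$-parameter exponential family generated by $(T, h)$, with corresponding natural parameter space $\mathcal{E}$ and function $A(\eta)$. Then:

$\mathcal{E}$ is convex.

$A : \mathcal{E} \to \mathbb{R}$ is convex.

If $\mathcal{E}$ has nonempty interior $\mathcal{E}^0 \subset \mathbb{R}^k$, and $\eta_0 \in \mathcal{E}^0$, then $T(X)$ has under $\eta_0$ a mgf given by
$$M(s) = \exp\{A(\eta_0 + s) - A(\eta_0)\},$$
valid for all $s$ such that $\eta_0 + s \in \mathcal{E}$. (Because $\eta_0$ is an interior point, this set of $s$ includes a ball about $0$.)

Corollary. Under the conditions of the theorem,
$$E_{\eta_0}[T(X)] = \dot{A}(\eta_0), \qquad \mathrm{Var}_{\eta_0}[T(X)] = \ddot{A}(\eta_0),$$
where
$$\dot{A}(\eta_0) = \left\| \frac{\partial A}{\partial \eta_j}(\eta_0) \right\| \quad \text{and} \quad \ddot{A}(\eta_0) = \left\| \frac{\partial^2 A}{\partial \eta_i \partial \eta_j}(\eta_0) \right\|.$$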
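
The theorem and corollary can be tried out on a one-parameter family. The sketch below assumes the Poisson family in canonical form, $p(x \mid \eta) = \frac{1}{x!} \exp\{\eta x - e^{\eta}\}$ with $T(x) = x$ and $A(\eta) = e^{\eta}$; this example, the choice of `eta0`, and the finite-difference step are not from the slides.

```python
import math


def A(eta):
    """Log-partition function of the canonical Poisson family (assumed example)."""
    return math.exp(eta)


eta0 = math.log(2.0)  # natural parameter corresponding to Poisson mean 2


def pmf(x):
    """h(x) * exp{eta0 * x - A(eta0)} with h(x) = 1/x!."""
    return math.exp(eta0 * x - A(eta0)) / math.factorial(x)


support = range(0, 100)  # truncated support; the tail mass is negligible here

# mgf of T(X) = X under eta0: direct summation versus exp{A(eta0 + s) - A(eta0)}.
s = 0.4
mgf_direct = sum(math.exp(s * x) * pmf(x) for x in support)
mgf_formula = math.exp(A(eta0 + s) - A(eta0))

# Corollary: mean and variance from numerical derivatives of A at eta0.
h = 1e-4
a_dot = (A(eta0 + h) - A(eta0 - h)) / (2 * h)
a_ddot = (A(eta0 + h) - 2 * A(eta0) + A(eta0 - h)) / (h * h)
mean = sum(x * pmf(x) for x in support)
var = sum((x - mean) ** 2 * pmf(x) for x in support)

print(mgf_direct, mgf_formula)
print(mean, a_dot)
print(var, a_ddot)
```

For this family both derivatives equal $e^{\eta_0} = 2$, which matches the familiar fact that a Poisson variable has equal mean and variance.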

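14 Example: Multinomial Distribution

Multinomial Distribution: $X = (X_1, X_2, \ldots, X_q) \sim \mathrm{Multinomial}(n, \theta = (\theta_1, \theta_2, \ldots, \theta_q))$,
$$p(x \mid \theta) = \frac{n!}{x_1! \cdots x_q!}\, \theta_1^{x_1} \theta_2^{x_2} \cdots \theta_q^{x_q},$$
where

$q$ is a given positive integer,
$\theta = (\theta_1, \ldots, \theta_q)$ with $\sum_{j=1}^{q} \theta_j = 1$,
$n$ is a given positive integer,
$\sum_{i=1}^{q} X_i = n$.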

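15 Example: Multinomial Distribution

$$p(x \mid \theta) = \frac{n!}{x_1! \cdots x_q!}\, \theta_1^{x_1} \theta_2^{x_2} \cdots \theta_q^{x_q}$$
$$= \frac{n!}{x_1! \cdots x_q!} \exp\Big\{ \log(\theta_1) x_1 + \cdots + \log(\theta_{q-1}) x_{q-1} + \log\Big(1 - \sum_{j=1}^{q-1} \theta_j\Big) \Big[ n - \sum_{j=1}^{q-1} x_j \Big] \Big\}$$
$$= h(x) \exp\Big\{ \sum_{j=1}^{q-1} \eta_j(\theta)\, T_j(x) - B(\theta) \Big\}$$
$$= h(x) \exp\Big\{ \sum_{j=1}^{q-1} \eta_j\, T_j(x) - A(\eta) \Big\},$$
where:

$h(x) = \dfrac{n!}{x_1! \cdots x_q!}$
$\eta(\theta) = (\eta_1(\theta), \eta_2(\theta), \ldots, \eta_{q-1}(\theta))$
$\eta_j(\theta) = \log\Big(\theta_j \Big/ \Big(1 - \sum_{k=1}^{q-1} \theta_k\Big)\Big)$, $j = 1, \ldots, q-1$
$T(x) = (X_1, X_2, \ldots, X_{q-1}) = (T_1(x), T_2(x), \ldots, T_{q-1}(x))$
$B(\theta) = -n \log\Big(1 - \sum_{j=1}^{q-1} \theta_j\Big)$ and $A(\eta) = +n \log\Big(1 + \sum_{j=1}^{q-1} e^{\eta_j}\Big)$

The derivatives of $A$ are
$$\dot{A}(\eta)_j = n\, \frac{e^{\eta_j}}{1 + \sum_{k=1}^{q-1} e^{\eta_k}} = n\, \frac{\theta_j \big/ \big(1 - \sum_{k=1}^{q-1} \theta_k\big)}{1 + \sum_{k=1}^{q-1} \theta_k \big/ \big(1 - \sum_{k=1}^{q-1} \theta_k\big)} = n \theta_j,$$
$$\ddot{A}(\eta)_{i,j} = -n \theta_i \theta_j \ (i \neq j) \quad \text{and} \quad \ddot{A}(\eta)_{i,i} = n \theta_i (1 - \theta_i).$$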
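
The closed forms for $\dot{A}$ and $\ddot{A}$ can be cross-checked against finite differences of $A(\eta) = n \log(1 + \sum_j e^{\eta_j})$. This is a sketch with made-up values of `n` and `theta` (so $q = 3$); the helper names `gradient` and `hessian` and the step sizes are hypothetical choices for this example.

```python
import math

n = 10
theta = (0.2, 0.3, 0.5)  # hypothetical cell probabilities (q = 3)
q = len(theta)

# Natural parameters eta_j = log(theta_j / theta_q), j = 1..q-1,
# using theta_q = 1 - sum_{k<q} theta_k.
eta = [math.log(theta[j] / theta[q - 1]) for j in range(q - 1)]


def A(e):
    """A(eta) = n * log(1 + sum_j exp(eta_j))."""
    return n * math.log(1.0 + sum(math.exp(x) for x in e))


def gradient(e, j, h=1e-5):
    """Central-difference approximation of dA/d eta_j."""
    up, down = list(e), list(e)
    up[j] += h
    down[j] -= h
    return (A(up) - A(down)) / (2 * h)


def hessian(e, i, j, h=1e-3):
    """Central-difference approximation of d^2 A / (d eta_i d eta_j)."""
    def at(di, dj):
        out = list(e)
        out[i] += di
        out[j] += dj
        return A(out)
    return (at(h, h) - at(h, -h) - at(-h, h) + at(-h, -h)) / (4 * h * h)


grad = [gradient(eta, j) for j in range(q - 1)]
hess = [[hessian(eta, i, j) for j in range(q - 1)] for i in range(q - 1)]

print(grad)  # compare with n * theta_j
print(hess)  # compare with n * theta_i * (1 - theta_i) and -n * theta_i * theta_j
```

With these values the targets are $n\theta = (2, 3)$ for the gradient, and $1.6$, $2.1$ on the diagonal and $-0.6$ off the diagonal of the Hessian.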

16 Rank of Exponential Family

Defining the Rank of an Exponential Family

Every $k$-parameter exponential family is also a $k'$-parameter exponential family for any $k' > k$. The minimal value of $k$ defines the rank of the exponential family.

The family has rank $k$ when the generating statistic $T(X)$ is $k$-dimensional and the collection $\{1, T_1(X), T_2(X), \ldots, T_k(X)\}$ is linearly independent with positive probability, i.e.,
$$P\Big[ \sum_{j=1}^{k} a_j T_j(X) = a_{k+1} \,\Big|\, \eta \Big] < 1, \quad \text{unless all } a_j = 0.$$

Note: the set of positive support on $\mathcal{X}$ does not depend on $\eta$.

17 Rank of Exponential Family

Theorem. Let $\mathcal{P} = \{q(x, \eta), \eta \in \mathcal{E}\}$ be a canonical exponential family generated by $(T(X), h(X))$ with natural parameter space $\mathcal{E}$ such that $\mathcal{E}$ is open. Then the following statements are equivalent:

$\mathcal{P}$ is of rank $k$.
$\eta$ is an identifiable parameter.
$\mathrm{Var}(T \mid \eta)$ is positive definite.
$\eta \mapsto \dot{A}(\eta)$ is 1-to-1 on $\mathcal{E}$.
$A(\eta)$ is strictly convex on $\mathcal{E}$.

Note: $\mathcal{E}$ open $\implies$ $\dot{A}$ is defined on all of $\mathcal{E}$.

Corollary. If $\mathcal{P}$ is of rank $k$ under Theorem 1.6.4, then:

$\mathcal{P}$ may be uniquely parametrized by $\mu(\eta) \equiv E[T(X) \mid \eta]$.
$\log[q(x, \eta)]$ is strictly concave in $\eta$ on $\mathcal{E}$.

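18 p-variate Gaussian Family

Let $Y$ be a ($p \times 1$) random vector with a $p$-variate Gaussian distribution,
$$Y \sim N_p(\mu, \Sigma),$$
where $\mu = E[Y]$ and $\Sigma = \mathrm{Var}(Y)$ is positive definite, of rank $p$.

The density of $Y$ is
$$p(Y, \mu, \Sigma) = |\det(\Sigma)|^{-1/2} (2\pi)^{-p/2} \exp\left\{ -\tfrac{1}{2} (Y - \mu)^T \Sigma^{-1} (Y - \mu) \right\}.$$

Taking logs:
$$\log[p(Y, \mu, \Sigma)] = -\tfrac{1}{2} Y^T \Sigma^{-1} Y + [\Sigma^{-1} \mu]^T Y - \tfrac{1}{2} \left[ \mu^T \Sigma^{-1} \mu + \log[\det(\Sigma)] + p \log[2\pi] \right].$$

Defining $\Sigma^{-1} = \|\sigma^{i,j}\|$, we can write the first two terms as
$$-\Big[ \sum_{i<j} \sigma^{i,j} Y_i Y_j + \tfrac{1}{2} \sum_{i} \sigma^{i,i} Y_i^2 \Big] + \sum_{i=1}^{p} \Big[ \sum_{j=1}^{p} \sigma^{i,j} \mu_j \Big] Y_i.$$

The parameter space dimension is $k = p + p(p+1)/2 = p(p+3)/2$.

The sufficient statistics are $\left[ (Y_1, \ldots, Y_p),\ \{Y_i Y_j,\ 1 \le i \le j \le p\} \right]$.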

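19 p-variate Gaussian Family (continued)

$h(Y) \equiv 1$ (up to the constant factor $(2\pi)^{-p/2}$)
$\theta = (\mu, \Sigma)$
$B(\theta) = \tfrac{1}{2} \left[ \log[|\det(\Sigma)|] + \mu^T \Sigma^{-1} \mu \right]$

For a sample $Y_1, \ldots, Y_n$ of iid $N_p(\mu, \Sigma)$ random vectors, the data $X = (Y_1, Y_2, \ldots, Y_n)$ follows the $k = p(p+3)/2$ parameter exponential family with
$$T = \Big( \sum_i Y_i,\ \mathrm{LowerTriangle}\Big( \sum_i Y_i Y_i^T \Big) \Big)$$
($\mathrm{LowerTriangle}(\cdot)$ refers to matrix elements along and below the diagonal).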
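
To make the sufficient statistic concrete, the sketch below computes $T = (\sum_i Y_i, \mathrm{LowerTriangle}(\sum_i Y_i Y_i^T))$ for a tiny made-up sample with $p = 2$ and confirms that it has $p(p+3)/2$ components. The function name `sufficient_statistics` and the data are hypothetical.

```python
def sufficient_statistics(sample):
    """Return (sum of Y_i, lower triangle of sum of Y_i Y_i^T) for a list of p-vectors.

    The lower triangle is stored as a dict keyed by (row, column) with column <= row.
    """
    p = len(sample[0])
    total = [sum(y[a] for y in sample) for a in range(p)]
    lower = {
        (a, b): sum(y[a] * y[b] for y in sample)
        for a in range(p)
        for b in range(a + 1)
    }
    return total, lower


# Hypothetical data: n = 3 observations of a p = 2 dimensional vector.
sample = [(1.0, 2.0), (0.0, -1.0), (3.0, 1.0)]
total, lower = sufficient_statistics(sample)

p = len(total)
dimension = len(total) + len(lower)  # should equal p * (p + 3) / 2

print(total)
print(lower)
print(dimension, p * (p + 3) // 2)
```

Only these $p(p+3)/2$ numbers (together with $n$) are needed to evaluate the Gaussian likelihood; the individual observations can be discarded.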

20 Conjugate Families of Prior Distributions

Let $X_1, \ldots, X_n$ be a sample from the $k$-parameter exponential family
$$p(x \mid \theta) = \Big[ \prod_{i=1}^{n} h(x_i) \Big] \exp\Big\{ \sum_{j=1}^{k} \eta_j(\theta) \sum_{i=1}^{n} T_j(x_i) - n B(\theta) \Big\},$$
where $\theta$ is $k$-dimensional.

Treat $\theta$ as the variable of interest in $p(x \mid \theta)$.

Treat $n$ and the $T_j$ as parameters in $p(x \mid \theta)$.

Find a normalizing function:
$$\omega(t) = \int \cdots \int \exp\Big\{ \sum_{j=1}^{k} t_j \eta_j(\theta) - t_{k+1} B(\theta) \Big\}\, d\theta_1 \cdots d\theta_k,$$
and set
$$\Omega = \{ (t_1, \ldots, t_{k+1}) : 0 < \omega(t_1, \ldots, t_{k+1}) < \infty \}.$$

Proposition. The $(k+1)$-parameter exponential family given by
$$\pi_t(\theta) = \exp\Big\{ \sum_{j=1}^{k} t_j \eta_j(\theta) - t_{k+1} B(\theta) - \log[\omega(t)] \Big\},$$
where $t = (t_1, \ldots, t_{k+1}) \in \Omega$, is a conjugate prior to $p(x \mid \theta)$.
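
A Bernoulli sample gives a compact illustration of the construction. This example is an assumption added here, not taken from the slides: with $k = 1$, $\eta(\theta) = \log(\theta / (1 - \theta))$, $T(x) = x$ and $B(\theta) = -\log(1 - \theta)$, the prior kernel is $\theta^{t_1} (1 - \theta)^{t_2 - t_1}$, so $\omega(t)$ is a Beta function and the posterior hyperparameters are $(t_1 + \sum_i x_i,\ t_2 + n)$. The sketch checks $\omega(t)$ by a midpoint rule; the function names and the grid size are hypothetical.

```python
import math


def eta(theta):
    """Natural parameter of the Bernoulli family."""
    return math.log(theta / (1.0 - theta))


def B(theta):
    """B(theta) = -log(1 - theta) for the Bernoulli family."""
    return -math.log(1.0 - theta)


def unnormalized_prior(theta, t):
    """exp{t1 * eta(theta) - t2 * B(theta)} = theta^t1 * (1 - theta)^(t2 - t1)."""
    t1, t2 = t
    return math.exp(t1 * eta(theta) - t2 * B(theta))


def omega(t, grid=20_000):
    """Midpoint-rule approximation of the normalizing integral over (0, 1)."""
    step = 1.0 / grid
    return step * sum(unnormalized_prior((i + 0.5) * step, t) for i in range(grid))


def posterior_hyperparameters(t, xs):
    """Conjugate update: t1 gains sum of T(x_i) = x_i, t2 gains the sample size n."""
    t1, t2 = t
    return (t1 + sum(xs), t2 + len(xs))


t_prior = (2.0, 5.0)   # hypothetical prior hyperparameters
xs = [1, 0, 1, 1]      # hypothetical Bernoulli observations
t_post = posterior_hyperparameters(t_prior, xs)

# Closed form: omega(t) = Gamma(t1 + 1) * Gamma(t2 - t1 + 1) / Gamma(t2 + 2).
t1, t2 = t_prior
omega_closed = math.gamma(t1 + 1) * math.gamma(t2 - t1 + 1) / math.gamma(t2 + 2)

print(omega(t_prior), omega_closed)
print(t_post)
```

In this parametrization $\pi_t$ is the Beta$(t_1 + 1,\ t_2 - t_1 + 1)$ density, and the update only shifts $t$, which is what makes the family conjugate.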

21 MIT OpenCourseWare Mathematical Statistics Spring 2016 For information about citing these materials or our Terms of Use, visit:
