2. Multivariate Distributions


Random vectors: mean, covariance matrix, linear transformations, dependence measures (a short introduction to the probability tools for multivariate statistics).

Multidimensional normal distribution, mixture models (some well-known examples of multivariate probability distributions).

Advanced Course in Statistics. Lecturer: Amparo Baíllo

2.1. Random vectors

Multivariate data are the result of observing a random vector, a vector X = (X_1, ..., X_p)' whose components X_j, j = 1, ..., p, are random variables (r.v.) on the same probability space (Ω, A, P). Similarly, a random matrix is a matrix whose elements are r.v. The probability distribution of a random vector or matrix is characterized by the joint distribution of its components. In particular, the distribution function of a random vector X is

F(x_1, ..., x_p) = P{X_1 ≤ x_1, ..., X_p ≤ x_p}, for (x_1, ..., x_p) ∈ R^p.

In general, we will only work with continuous random vectors, whose probability distribution is characterized by the density function f = f(x_1, ..., x_p), satisfying:

1. f(x_1, ..., x_p) ≥ 0 for all (x_1, ..., x_p) ∈ R^p;
2. ∫_{R^p} f(x_1, ..., x_p) dx_1 ... dx_p = 1;
3. f(x_1, ..., x_p) = ∂^p F(x_1, ..., x_p) / (∂x_1 ... ∂x_p).

The marginal distribution of each component X_j, j = 1, ..., p, is its probability distribution as an individual random variable. Its density function is

f_j(x_j) = ∫_{R^{p-1}} f(x_1, ..., x_p) dx_1 ... dx_{j-1} dx_{j+1} ... dx_p, for x_j ∈ R.

More generally, given the partition X = (X^(1), X^(2))', with X^(1) = (X_1, ..., X_r)' and X^(2) = (X_{r+1}, ..., X_p)', the marginal density of X^(1) is

f_{X^(1)}(x_1, ..., x_r) = ∫_{R^{p-r}} f(x_1, ..., x_p) dx_{r+1} ... dx_p.

Two random matrices X_1 and X_2 are independent if the elements of X_1 (as a collection of r.v.) are independent of the elements of X_2. (The elements within X_1 or X_2 need not be independent.) In particular, given the partition X = (X^(1), X^(2))', the vectors X^(1) and X^(2) are independent if

F(x_1, ..., x_p) = F_{X^(1)}(x_1, ..., x_r) F_{X^(2)}(x_{r+1}, ..., x_p), for all x_1, ..., x_p,

or, equivalently, if

f(x_1, ..., x_p) = f_{X^(1)}(x_1, ..., x_r) f_{X^(2)}(x_{r+1}, ..., x_p), for all x_1, ..., x_p.

2.1.1 Expectation

The expected value of a random vector (resp. matrix) is the vector (resp. matrix) of expected values of each of its components (the marginal expectations). For the random vector X = (X_1, ..., X_p)',

μ := E(X) = (E(X_1), ..., E(X_p))' = (μ_1, ..., μ_p)',

where μ_j := E(X_j) = ∫_R x f_j(x) dx. The expectation is a linear function:

1. If A is a q × p constant matrix, X is a p-dimensional random vector and b is a q-dimensional constant vector, then E(AX + b) = A E(X) + b.
2. If X and Y are random matrices of the same dimension, then E(X + Y) = E(X) + E(Y).
3. If X is a q × p random matrix and A, B are constant matrices of adequate dimensions, then E(AXB) = A E(X) B.

If X_1 and X_2 are conformable independent matrices, then E(X_1 X_2) = E(X_1) E(X_2).

2.1.2 Covariance matrix

The variance-covariance matrix (or simply covariance matrix) of a random vector X = (X_1, ..., X_p)' with expectation μ is

Σ = V(X) := E((X − μ)(X − μ)') = E(XX') − μμ'

  = [ σ_11  σ_12  ...  σ_1p ]
    [ σ_21  σ_22  ...  σ_2p ]
    [ ...                   ]
    [ σ_p1  σ_p2  ...  σ_pp ],

where σ_jj = V(X_j) is the variance of the r.v. X_j and σ_jk = Cov(X_j, X_k) is the covariance of X_j and X_k, j, k = 1, ..., p. Thus Σ is a symmetric matrix.

Some properties of the covariance matrix:

1. If A is a q × p constant matrix, X is a p-dimensional random vector and b is a q-dimensional constant vector, then V(AX + b) = A V(X) A'.
2. Σ = V(X) is always nonnegative definite.
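As a quick numerical check of property 1 (a minimal simulation sketch; A, b and Σ below are arbitrary choices for illustration), the empirical covariance of AX + b should match A V(X) A':

library(mvtnorm)
set.seed(1)
Sigma = matrix(c(2, 0.5, 0.5, 1), ncol = 2)   # V(X)
A = matrix(c(1, 0, 1, 1), ncol = 2)           # arbitrary 2 x 2 matrix
b = c(-1, 3)                                  # arbitrary shift
X = rmvnorm(n = 1e5, sigma = Sigma)
Y = t(A %*% t(X) + b)                         # row i is A X_i + b
cov(Y)                                        # empirical V(AX + b)
A %*% Sigma %*% t(A)                          # theoretical A V(X) A'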

2.1.3 Correlation matrix

Let X = (X_1, ..., X_p)' be a random vector with covariance matrix Σ and with 0 < σ_jj = V(X_j) < ∞, j = 1, ..., p. Define D := diag(σ_11, ..., σ_pp). Then the correlation matrix of X is

ρ = [ 1     ρ_12  ...  ρ_1p ]
    [ ρ_21  1     ...  ρ_2p ]
    [ ...                   ]
    [ ρ_p1  ρ_p2  ...  1    ] = D^{-1/2} Σ D^{-1/2},

where ρ_jk is the correlation of X_j and X_k, j, k = 1, ..., p, and D^{-1/2} := diag(σ_11^{-1/2}, ..., σ_pp^{-1/2}).

Observe that, if Z := D^{-1/2}(X − μ), where μ = E(X), then V(Z) = ρ.
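In R, the rescaling D^{-1/2} Σ D^{-1/2} is available as the base function cov2cor(); a small illustration with an arbitrary Σ:

Sigma = matrix(c(4, 1.2, 1.2, 1), ncol = 2)
Dinv = diag(1 / sqrt(diag(Sigma)))   # D^{-1/2}
Dinv %*% Sigma %*% Dinv              # correlation matrix by hand
cov2cor(Sigma)                       # the same with base R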

2.1.4 Dependence measures

More generally, the (cross-)covariance between the p-dimensional random vector X_1 and the q-dimensional random vector X_2, with means μ_1 and μ_2 respectively, is the p × q matrix given by

Cov(X_1, X_2) = E((X_1 − μ_1)(X_2 − μ_2)').

Some properties of the cross-covariance:

1. If A and B are constant matrices and c and d are constant vectors, then Cov(AX_1 + c, BX_2 + d) = A Cov(X_1, X_2) B'.
2. If X_1, X_2 and X_3 are random vectors, then Cov(X_1 + X_2, X_3) = Cov(X_1, X_3) + Cov(X_2, X_3).
3. If X_1 and X_2 are independent, then Cov(X_1, X_2) = 0_{p×q}.

Pearson's product-moment covariance measures linear dependence and, for the multivariate normal distribution, a diagonal covariance matrix implies independence of the components of the random vector. In general, however, Pearson's correlation matrix does not characterize independence.

Székely et al. (2007) introduced two dependence coefficients, distance covariance and distance correlation, that measure all types of dependence between random vectors X and Y of arbitrary (and possibly different) dimensions.

Suppose that X in R^p and Y in R^q are random vectors. The characteristic function of X is

f̂_X(t) := E(e^{i⟨t,X⟩}) = ∫_{R^p} e^{i⟨t,x⟩} dF_X(x).

Let f̂_Y be the characteristic function of Y, and denote the joint characteristic function of (X, Y) by f̂_{X,Y}. Then X and Y are independent if and only if f̂_{X,Y} = f̂_X f̂_Y.

Distance covariance is defined as a measure of the discrepancy between f̂_{X,Y} and f̂_X f̂_Y:

||f̂_{X,Y}(t, s) − f̂_X(t) f̂_Y(s)||²_w = ∫_{R^{p+q}} |f̂_{X,Y}(t, s) − f̂_X(t) f̂_Y(s)|² w(t, s) dt ds.

The only integrable weight function w that makes this definition scale and rotation invariant is proportional to the reciprocal of ||t||_p^{1+p} ||s||_q^{1+q}, where ||·||_p denotes the Euclidean norm in R^p. The distance covariance between random vectors X and Y with E||X||_p < ∞ and E||Y||_q < ∞ is the square root of

V²(X, Y) = (1/(c_p c_q)) ∫_{R^{p+q}} |f̂_{X,Y}(t, s) − f̂_X(t) f̂_Y(s)|² / (||t||_p^{1+p} ||s||_q^{1+q}) dt ds,   (1)

with c_p := π^{(p+1)/2} / Γ((p+1)/2).

Similarly, the distance variance is defined as the square root of V²(X) := V²(X, X). The distance correlation between random vectors X and Y with E||X||_p < ∞ and E||Y||_q < ∞ is the square root of

R²(X, Y) := V²(X, Y) / √(V²(X) V²(Y)),   if V²(X) V²(Y) > 0,
R²(X, Y) := 0,                           if V²(X) V²(Y) = 0.   (2)

Theorem 3 in Székely et al. (2007): If E||X||_p < ∞ and E||Y||_q < ∞, then 0 ≤ R ≤ 1, and R(X, Y) = 0 if and only if X and Y are independent.

For an observed random sample {(X_i, Y_i), i = 1, ..., n} of (X, Y), natural estimators of the unknown characteristic functions are

f̂_X^n(t) := ∫_{R^p} e^{i⟨t,x⟩} dF_X^n(x) = (1/n) ∑_{i=1}^n e^{i⟨t,X_i⟩},   f̂_Y^n(s) := (1/n) ∑_{i=1}^n e^{i⟨s,Y_i⟩}

and

f̂_{X,Y}^n(t, s) := (1/n) ∑_{i=1}^n e^{i⟨t,X_i⟩ + i⟨s,Y_i⟩},

where F_X^n denotes the empirical distribution function of X_1, ..., X_n.

The empirical distance covariance is defined as the square root of

V²_n(X, Y) := ||f̂_{X,Y}^n(t, s) − f̂_X^n(t) f̂_Y^n(s)||²_w.

Székely et al. (2007) used the asymptotic properties of the empirical distance covariance to test the independence of X and Y:

H_0: X and Y are independent
H_1: X and Y are dependent

Corollary 2 of Székely et al. (2007): If E||X||_p < ∞, E||Y||_q < ∞ and X and Y are independent, then

n V²_n(X, Y) / S_2 →_d Q as n → ∞,   (3)

where Q is a certain, known quadratic form of centered Gaussian random variables with E(Q) = 1 and

S_2 := ( (1/n²) ∑_{i,k=1}^n ||X_i − X_k||_p ) ( (1/n²) ∑_{i,k=1}^n ||Y_i − Y_k||_q ).

The test statistic in (3) is a particular case of the so-called energy statistics, functions of distances between statistical observations (see Székely and Rizzo 2013).
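The test based on (3) is implemented in the R package energy by Rizzo and Székely. A minimal sketch (the sample size and the nonlinear dependence below are arbitrary choices for illustration):

library(energy)
set.seed(1)
n = 100
x = matrix(rnorm(n * 2), ncol = 2)   # X in R^2
y = cbind(x[, 1]^2, rnorm(n))        # Y depends on X, but only nonlinearly
cor(x[, 1], y[, 1])                  # Pearson correlation is close to 0
dcor(x, y)                           # distance correlation detects the dependence
dcov.test(x, y, R = 199)             # permutation test of H_0: independence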

2.2. Examples of multidimensional distributions

2.2.1 Multidimensional normal distribution

The random vector X = (X_1, ..., X_p)' follows a p-dimensional normal distribution with mean μ and covariance matrix Σ, and we denote it by X ~ N_p(μ, Σ), if its density function is

f(x; μ, Σ) = (2π)^{-p/2} |Σ|^{-1/2} e^{−(x − μ)' Σ^{-1} (x − μ)/2},   (4)

where x = (x_1, ..., x_p)' and −∞ < x_i < ∞, i = 1, ..., p.

Example (Bivariate normal density): We evaluate the bivariate (p = 2) normal density in terms of the individual parameters μ_1 = E(X_1), μ_2 = E(X_2), σ_11 = V(X_1), σ_22 = V(X_2) and ρ_12 = Cor(X_1, X_2) = σ_12 / √(σ_11 σ_22).

The determinant and inverse of the matrix

Σ = [ σ_11               √(σ_11 σ_22) ρ_12 ]
    [ √(σ_11 σ_22) ρ_12  σ_22              ]

are, respectively,

|Σ| = σ_11 σ_22 (1 − ρ_12²)

and

Σ^{-1} = 1/(σ_11 σ_22 (1 − ρ_12²)) [ σ_22                −√(σ_11 σ_22) ρ_12 ]
                                   [ −√(σ_11 σ_22) ρ_12  σ_11               ].

Thus

(x − μ)' Σ^{-1} (x − μ)
  = 1/(σ_11 σ_22 (1 − ρ_12²)) [ σ_22 (x_1 − μ_1)² + σ_11 (x_2 − μ_2)² − 2 ρ_12 √(σ_11 σ_22) (x_1 − μ_1)(x_2 − μ_2) ]
  = 1/(1 − ρ_12²) [ (x_1 − μ_1)²/σ_11 + (x_2 − μ_2)²/σ_22 − 2 ρ_12 (x_1 − μ_1)(x_2 − μ_2)/√(σ_11 σ_22) ].

Consequently, the bivariate normal density is

f(x_1, x_2) = 1/(2π √(σ_11 σ_22 (1 − ρ_12²))) exp{ −1/(2(1 − ρ_12²)) [ (x_1 − μ_1)²/σ_11 + (x_2 − μ_2)²/σ_22 − 2 ρ_12 (x_1 − μ_1)(x_2 − μ_2)/√(σ_11 σ_22) ] }.

Observe that, if ρ_12 = 0 (X_1 and X_2 are uncorrelated), then

f(x_1, x_2) = 1/(2π √(σ_11 σ_22)) exp{ −(1/2) [ (x_1 − μ_1)²/σ_11 + (x_2 − μ_2)²/σ_22 ] }
            = 1/√(2π σ_11) exp{ −(x_1 − μ_1)²/(2σ_11) } · 1/√(2π σ_22) exp{ −(x_2 − μ_2)²/(2σ_22) }
            = f_1(x_1) f_2(x_2).

Since the joint density f(x_1, x_2) can be expressed as the product of the marginal densities, we conclude that X_1 and X_2 are actually independent r.v.

split.screen(c(2, 3))

screen(1)  ## bivariate standard normal pdf
library(mvtnorm)
x = y = seq(-5, 5, length = 50)
f = function(x, y) { dmvnorm(cbind(x, y)) }
z = outer(x, y, f)
par(mai = c(0.1, 0.1, 0.1, 0.1))
persp(x, y, z, theta = 5, phi = 50, expand = 0.5, col = "lightblue")

screen(2)  ## contours of the bivariate normal pdf
x = y = seq(-5, 5, length = 150)
z = outer(x, y, f)
par(mai = c(0.5, 0.5, 0.5, 0.5))
contour(x, y, z, nlevels = 20, col = rainbow(20))

screen(3)  ## normal data
X = rmvnorm(n = 100, sigma = matrix(c(1, 0, 0, 1), ncol = 2))
par(mai = c(0.5, 0.5, 0.5, 0.5))
plot(X[, 1], X[, 2], pch = 19, xlab = expression(x[1]), ylab = expression(x[2]))

screen(4)  ## pdf with correlation 0.7
x = y = seq(-5, 5, length = 50)
Sigma = matrix(c(1, 0.7, 0.7, 1), ncol = 2)
f = function(x, y) { dmvnorm(cbind(x, y), sigma = Sigma) }
z = outer(x, y, f)
par(mai = c(0.1, 0.1, 0.1, 0.1))
persp(x, y, z, theta = 5, phi = 50, expand = 0.5, col = "lightblue")

screen(5)  ## contours of the correlated pdf
x = y = seq(-5, 5, length = 150)
z = outer(x, y, f)
par(mai = c(0.5, 0.5, 0.5, 0.5))
contour(x, y, z, nlevels = 20, col = rainbow(20))

screen(6)  ## correlated normal data
X = rmvnorm(n = 100, sigma = Sigma)
par(mai = c(0.5, 0.5, 0.5, 0.5))
plot(X[, 1], X[, 2], pch = 19, xlab = expression(x[1]), ylab = expression(x[2]))

[Figure: perspective plots, contour plots and scatterplots of the two bivariate normal distributions (uncorrelated, and with correlation 0.7) produced by the code above.]

Properties of the multivariate normal distribution

Let X ~ N_p(μ, Σ).

1. The normal density has a global maximum at μ and is symmetric with respect to μ, in the sense that f(μ + a) = f(μ − a) for all a ∈ R^p.

2. Linear combinations of a multivariate normal are also normally distributed: if A is a q × p constant matrix and d is a q × 1 constant vector, then AX + d ~ N_q(Aμ + d, AΣA'). Consequently, all subsets of the components of X are normally distributed.

3. Zero correlation between normal vectors is equivalent to independence: if X = (X_1', X_2')', then X_1 and X_2 are independent if and only if Cov(X_1, X_2) = 0.

4. If Σ > 0, there exists a linear transformation of X with mean 0 and covariance matrix equal to the identity.

5. Contours of constant density for the multivariate normal distribution are ellipsoids centered at the population mean. If Σ > 0, then:

(a) The level sets of the probability density f are the ellipsoids {x ∈ R^p : (x − μ)' Σ^{-1} (x − μ) = c²}. These ellipsoids are centered at μ and have axes ±c √λ_i e_i, where (λ_i, e_i), i = 1, ..., p, are the eigenvalue-eigenvector pairs of Σ.

(b) (X − μ)' Σ^{-1} (X − μ) follows a χ²_p distribution. Thus P{(X − μ)' Σ^{-1} (X − μ) ≤ χ²_{p;α}} = 1 − α, for any 0 < α < 1.

The Mahalanobis distance d_M of a point x ∈ R^p to the mean μ of a p-dimensional distribution with covariance matrix Σ is defined by

d²_M(x) = (x − μ)' Σ^{-1} (x − μ).

It is a statistical distance in the sense that it takes into account the variability of the distribution (unlike the Euclidean distance).
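A minimal sketch of property 5 in R (the Σ below is an arbitrary illustration): eigen() returns the pairs (λ_i, e_i) defining the ellipsoid axes, and the base function mahalanobis() computes d²_M(x), whose empirical distribution can be compared with χ²_p:

library(mvtnorm)
set.seed(1)
Sigma = matrix(c(1, 0.7, 0.7, 1), ncol = 2)
mu = c(0, 0)
eigen(Sigma)                                    # eigenvalues and eigenvectors of Sigma
X = rmvnorm(n = 1e4, mean = mu, sigma = Sigma)
d2 = mahalanobis(X, center = mu, cov = Sigma)   # (x - mu)' Sigma^{-1} (x - mu)
mean(d2 <= qchisq(0.95, df = 2))                # should be close to 0.95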

6. If X ~ N_p(μ, Σ), then any linear combination of its components, a'X = a_1 X_1 + a_2 X_2 + ... + a_p X_p, is distributed as N(a'μ, a'Σa). Conversely, if a'X is distributed as N(a'μ, a'Σa) for every a ∈ R^p, then X must follow a N_p(μ, Σ) distribution.

7. Let X_1, ..., X_n be mutually independent N_p(μ_j, Σ) random vectors and let c_1, ..., c_n be real constants. Then V = c_1 X_1 + ... + c_n X_n follows a N_p(∑_{j=1}^n c_j μ_j, (∑_{j=1}^n c_j²) Σ) distribution.

8. Given a random sample X_1, ..., X_n from X ~ N_p(μ, Σ), the maximum likelihood estimators (m.l.e.) of μ and Σ are, respectively,

μ̂ = X̄ := (1/n) ∑_{i=1}^n X_i   and   Σ̂ = S_n := (1/n) ∑_{i=1}^n (X_i − X̄)(X_i − X̄)'.
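Both m.l.e. are one-liners in R; note that cov() uses the unbiased 1/(n − 1) scaling, so S_n requires the factor (n − 1)/n (a small sketch on simulated data):

library(mvtnorm)
set.seed(1)
n = 200
X = rmvnorm(n, mean = c(1, -1), sigma = matrix(c(2, 0.5, 0.5, 1), ncol = 2))
mu.hat = colMeans(X)               # sample mean vector
Sigma.hat = (n - 1) / n * cov(X)   # m.l.e. S_n (cov() divides by n - 1)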

9. The Central Limit Theorem: Let X_1, ..., X_n be independent observations from a population with mean μ and nonsingular covariance matrix Σ. Then

√n (X̄ − μ) →_d N_p(0, Σ) as n → ∞

and

n (X̄ − μ)' S^{-1} (X̄ − μ) →_d χ²_p as n → ∞.

The normality assumption on a sample from X can be assessed by:

- examining the univariate marginal distributions of the components of X, which should be Gaussian;
- examining the bivariate scatterplots of the pairs of components of X, which should have an elliptical appearance;
- checking whether the Mahalanobis distances d²_i = (x_i − x̄)' S_n^{-1} (x_i − x̄) follow a χ²_p distribution (see the sketch below).

If the data are clearly non-normal, we can consider the possibility of taking nonlinear transformations of the variables. There are multiple proposals in the literature to test the multivariate normality assumption (see Székely and Rizzo 2005; McAssey 2013 and references therein).
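A minimal sketch of the Mahalanobis-distance check in R (here on simulated data; with a real data set, replace X by the observed data matrix): under multivariate normality the ordered d²_i should line up with the χ²_p quantiles.

library(mvtnorm)
set.seed(1)
n = 100; p = 3
X = rmvnorm(n, sigma = diag(p))
d2 = mahalanobis(X, center = colMeans(X), cov = cov(X))
qqplot(qchisq(ppoints(n), df = p), sort(d2),
       xlab = "chi-squared quantiles", ylab = "ordered Mahalanobis distances")
abline(0, 1)   # points should lie close to this line under normality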

Example: Mass, snout-vent length and hind limb span of 25 lizards.

[Figure: pairwise scatterplots of the three variables.]

Example: Concentration of Selenium in the teeth and liver of 20 whales (Delphinapterus leucas) at Mackenzie Delta, Northwest Territories.

[Figure: scatterplot of Selenium concentration in teeth versus liver.]

2.2.2 Distributions associated to the multivariate normal

Correspondences between the univariate and the multivariate situations:

Univariate case    Multivariate case
N(μ, σ)            N_p(μ, Σ)
χ²_n               W_p(Σ, n)
F(m, n)            Λ(p, a, b)
t                  T²

Wishart distribution

Given a random sample of independent random vectors X_1, ..., X_n from a N_p(0, Σ) distribution, the Wishart distribution W_p(Σ, n) is that of the random p × p matrix

Q = ∑_{i=1}^n X_i X_i'.

Properties:

1. If Q_1 ~ W_p(Σ, n_1) and Q_2 ~ W_p(Σ, n_2) are independent, then Q_1 + Q_2 ~ W_p(Σ, n_1 + n_2).

2. Fisher's Theorem: If X_1, ..., X_n are independent N_p(μ, Σ) random vectors, then
   i) the sample mean vector X̄ and the sample covariance matrix S_n are independent;
   ii) X̄ ~ N_p(μ, (1/n) Σ);
   iii) n S_n ~ W_p(Σ, n − 1).
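Base R can draw directly from the Wishart distribution via rWishart() (a quick sketch with arbitrary parameters); the average of the draws should approach E(Q) = n Σ, here with n = 10 degrees of freedom:

set.seed(1)
Sigma = matrix(c(1, 0.3, 0.3, 1), ncol = 2)
Q = rWishart(n = 1000, df = 10, Sigma = Sigma)   # 2 x 2 x 1000 array of draws
apply(Q, c(1, 2), mean)                          # close to 10 * Sigma
10 * Sigma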

Wilks' Lambda

This is the distribution of the determinants ratio

Λ = |A| / |A + B| = 1 / |I + A^{-1} B| ~ Λ(p, a, b),

where A ~ W_p(Σ, a) and B ~ W_p(Σ, b) are independent, Σ is nonsingular and a ≥ p.

Properties:

1. Bartlett's approximation: For large a,

−(a + b − (p + b + 1)/2) log Λ(p, a, b) ≈ χ²_{pb}.

Hotelling's T²

This is the distribution of the r.v.

T² = n Z' Q^{-1} Z ~ T²(p, n),

where Z ~ N_p(0, I) and Q ~ W_p(I, n) are independent.

Properties:

1. If p = 1, then T²(1, n) is the square of a Student t distribution with n degrees of freedom.

2. ((n − p + 1)/(np)) T²(p, n) = F(p, n − p + 1).

3. Hotelling's distribution is invariant under affine transformations: if X ~ N_p(μ, Σ) and R ~ W_p(Σ, n) are independent, then n (X − μ)' R^{-1} (X − μ) ~ T²(p, n).

4. Given a random sample of independent random vectors X_1, ..., X_n from a N_p(μ, Σ) distribution,

n (X̄ − μ)' S_n^{-1} (X̄ − μ) ~ T²(p, n − 1).

5. Let X_1, ..., X_{n_1} and Y_1, ..., Y_{n_2} be two random samples of independent random vectors from a N_p(μ_1, Σ) and a N_p(μ_2, Σ) distribution, respectively. If μ_1 = μ_2, then

(n_1 n_2/(n_1 + n_2)) (X̄ − Ȳ)' S_p^{-1} (X̄ − Ȳ) ~ T²(p, n_1 + n_2 − 2),

where

S_p = (n_1 S_{x,n_1} + n_2 S_{y,n_2}) / (n_1 + n_2 − 2)   (5)

is the pooled covariance matrix.

These two properties will be used in hypothesis tests about mean vectors of Gaussian distributions.

Inferences about the mean

Case 1. Let X_1, ..., X_n be a sample of independent random vectors from a N_p(μ, Σ) distribution. Fix μ_0 ∈ R^p and consider the test

H_0: μ = μ_0.   (6)

Under H_0, the test statistic

n (X̄ − μ_0)' S_n^{-1} (X̄ − μ_0) ~ T²(p, n − 1)

or, equivalently,

((n − p)/p) (X̄ − μ_0)' S_n^{-1} (X̄ − μ_0) ~ F(p, n − p).

This provides us with a rejection region for the test (6).
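A minimal sketch of Case 1 in R, using the F form of the statistic (simulated data; mu0 is the hypothesized mean):

library(mvtnorm)
set.seed(1)
n = 50; p = 2
X = rmvnorm(n, mean = c(0.3, 0), sigma = diag(p))
mu0 = c(0, 0)
Sn = (n - 1) / n * cov(X)                              # m.l.e. of Sigma
Fstat = (n - p) / p * mahalanobis(colMeans(X), mu0, Sn)
pval = 1 - pf(Fstat, df1 = p, df2 = n - p)             # reject H_0 for small pval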

Case 2. Let X_1, ..., X_{n_1} and Y_1, ..., Y_{n_2} be two independent samples of independent random vectors from a N_p(μ_1, Σ) and a N_p(μ_2, Σ) distribution, respectively. Consider the test

H_0: μ_1 = μ_2.   (7)

Under H_0, the test statistic

(n_1 n_2/(n_1 + n_2)) (X̄ − Ȳ)' S_p^{-1} (X̄ − Ȳ) ~ T²(p, n_1 + n_2 − 2),

where S_p is given in (5). This is equivalent to

((n_1 + n_2 − p − 1)/(p (n_1 + n_2 − 2))) (n_1 n_2/(n_1 + n_2)) (X̄ − Ȳ)' S_p^{-1} (X̄ − Ȳ) ~ F(p, n_1 + n_2 − p − 1).

This provides us with a rejection region for the test (7).
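A matching sketch for Case 2 (again on simulated data; both samples share the covariance matrix, as the test assumes):

library(mvtnorm)
set.seed(2)
n1 = 40; n2 = 60; p = 2
X = rmvnorm(n1, mean = c(0, 0), sigma = diag(p))
Y = rmvnorm(n2, mean = c(0.5, 0), sigma = diag(p))
Sp = ((n1 - 1) * cov(X) + (n2 - 1) * cov(Y)) / (n1 + n2 - 2)   # pooled covariance (5)
T2 = n1 * n2 / (n1 + n2) * mahalanobis(colMeans(X), colMeans(Y), Sp)
Fstat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
pval = 1 - pf(Fstat, df1 = p, df2 = n1 + n2 - p - 1)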

Case 3. Assume we have g data matrices from g independent multivariate normal populations:

Sample   Size      Mean   Covariance   Distribution
X_1      n_1 × p   x̄_1    S_1          N_p(μ_1, Σ)
X_2      n_2 × p   x̄_2    S_2          N_p(μ_2, Σ)
...
X_g      n_g × p   x̄_g    S_g          N_p(μ_g, Σ)

The global sample mean vector and sample covariance matrix are

x̄ = (1/n) ∑_{i=1}^g n_i x̄_i,   S = (1/(n − g)) ∑_{i=1}^g n_i S_i,   with n = ∑_{i=1}^g n_i.

Consider the test

H_0: μ_1 = μ_2 = ... = μ_g.   (8)

Let us introduce the following matrices:

B = ∑_{i=1}^g n_i (x̄_i − x̄)(x̄_i − x̄)'   (between-groups dispersion),

W = ∑_{i=1}^g ∑_{k=1}^{n_i} (x_{ik} − x̄_i)(x_{ik} − x̄_i)' = ∑_{i=1}^g n_i S_i   (intra-groups dispersion).

Under H_0, B ~ W_p(Σ, g − 1) and W ~ W_p(Σ, n − g) are independent. The test statistic

Λ = |W| / |W + B| ~ Λ(p, n − g, g − 1)

can be approximated by the F distribution via Rao's asymptotic approximation: if Λ ~ Λ(p, a, b), then

((1 − Λ^{1/β}) / Λ^{1/β}) · ((αβ − 2γ)/(pb)) ≈ F(pb, αβ − 2γ),

where α = a + b − (p + b + 1)/2, β² = (p²b² − 4)/(p² + b² − 5), γ = (pb − 2)/4.
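In practice, this Wilks test is available through base R's manova(); summary() with test = "Wilks" reports Λ together with an F approximation. A small sketch on simulated data with g = 3 groups:

library(mvtnorm)
set.seed(1)
X = rbind(rmvnorm(30, mean = c(0, 0)),
          rmvnorm(30, mean = c(1, 0)),
          rmvnorm(30, mean = c(0, 1)))
group = factor(rep(1:3, each = 30))
fit = manova(X ~ group)
summary(fit, test = "Wilks")   # Wilks' Lambda and its F approximation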

2.2.3 Mixture models

Let k > 0 be an integer. A p-dimensional random vector X has a k-component finite mixture distribution if its probability density (or mass) function is given by

f(x) = ∑_{j=1}^k π_j f_j(x),   (9)

where f_j, j = 1, ..., k, are probability densities (or mass functions) and 0 ≤ π_j ≤ 1, j = 1, ..., k, are constants such that ∑_{j=1}^k π_j = 1. The f_j are the component densities of the mixture and the π_j are the mixing proportions or weights.

In the definition of a mixture model, the number k of components is considered fixed, but in many applications the value of k is unknown and has to be inferred from the data.

The key to generating random vectors with density (9) is as follows. Define a discrete r.v. Z taking values 1, 2, ..., k with probabilities π_1, π_2, ..., π_k, respectively, and suppose that the conditional density of X given Z = j is f_j. Then the unconditional density of X is (9).

Equivalently, we can define the discrete random vector Z = (Z_1, ..., Z_k)', with each Z_j taking the value 0 or 1, ∑_{j=1}^k Z_j = 1 and π_j equal to the probability that component Z_j of Z is 1. Then Z follows a multinomial distribution with parameters (π_1, ..., π_k), and we suppose that f_j is the conditional density of X given that the j-th component of Z is 1.

In many applications the component densities f_j are specified to belong to some parametric family; the resulting model is called a parametric mixture. In particular, the component densities are frequently assumed to belong to the same parametric family, as in mixtures of Gaussian densities. Parametric mixture models can be viewed as a semiparametric compromise between a single parametric family (case k = 1) and a nonparametric model such as kernel density estimation (case k = n).

Example: Mixture of three bivariate normal distributions

n = 200                      # sample size
Size = 1                     # one draw per multinomial observation
Prob = c(0.5, 0.3, 0.2)      # mixing weights
NumComp = length(Prob)       # number of components in the mixture
C = rmultinom(n, Size, Prob)     # component labels (one column per observation)
SizeComp = apply(C, 1, sum)      # sample size of each component

library(mvtnorm)
X = matrix(rep(0, n * 2), nrow = n)
X[C[1, ] == 1, ] = rmvnorm(n = SizeComp[1], sigma = matrix(c(1, 0, 0, 1), ncol = 2))
X[C[2, ] == 1, ] = rmvnorm(n = SizeComp[2], mean = c(3, 5), sigma = matrix(c(3, 1, 1, 1), ncol = 2))
X[C[3, ] == 1, ] = rmvnorm(n = SizeComp[3], mean = c(4, -3), sigma = matrix(c(0.5, 0.1, 0.1, 2), ncol = 2))

panel.hist = function(x, ...) {
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(usr[1:2], 0, 1.5))
  h <- hist(x, plot = FALSE)
  breaks <- h$breaks; nb <- length(breaks)
  y <- h$counts; y <- y / max(y)
  rect(breaks[-nb], 0, breaks[-1], y, col = "cyan", ...)
}
pairs(X, cex = 1.5, pch = 20, bg = "light blue",
      diag.panel = panel.hist, cex.labels = 2, font.labels = 2)

[Figure: pairwise scatterplot with marginal histograms of the simulated sample from the three-component normal mixture.]

Thus, a mixture is a candidate distribution to model a population with several subpopulations.

Example: Times between Old Faithful eruptions (Y, var 2) and duration of eruptions (X, var 1).

[Figure: scatterplot of the two variables with marginal histograms.]

2.3. Maximum likelihood estimation

Let x_1, ..., x_n denote a sample from a multivariate parametric model with density (or mass) function f(x; ψ), where ψ = (ψ_1, ..., ψ_k)' denotes the vector of unknown parameters. The maximum likelihood estimator (m.l.e.) of ψ is ψ̂, the maximizer of the likelihood function

L(ψ; x_1, ..., x_n) = ∏_{i=1}^n f(x_i; ψ).

MLE for the Gaussian distribution

Let X_1, ..., X_n be a random sample from a normal population with mean μ and covariance Σ. Then the m.l.e. of μ and Σ are, respectively,

μ̂ = X̄   and   Σ̂ = S_n = (1/n) ∑_{i=1}^n (X_i − X̄)(X_i − X̄)'.

Proof (Johnson and Wichern 2007): The likelihood is

L(μ, Σ) = ∏_{i=1}^n (2π)^{-p/2} |Σ|^{-1/2} e^{−(x_i − μ)' Σ^{-1} (x_i − μ)/2}
        = (2π)^{-np/2} |Σ|^{-n/2} e^{−∑_{i=1}^n (x_i − μ)' Σ^{-1} (x_i − μ)/2}.

The m.l.e. of μ is the minimizer of

∑_{i=1}^n (x_i − μ)' Σ^{-1} (x_i − μ)
  = ∑_{i=1}^n tr[Σ^{-1} (x_i − μ)(x_i − μ)']
  = tr[Σ^{-1} ( ∑_{i=1}^n (x_i − x̄)(x_i − x̄)' + n (x̄ − μ)(x̄ − μ)' )]
  = ∑_{i=1}^n tr[Σ^{-1} (x_i − x̄)(x_i − x̄)'] + n (x̄ − μ)' Σ^{-1} (x̄ − μ).

Since Σ^{-1} is positive definite, the distance (x̄ − μ)' Σ^{-1} (x̄ − μ) > 0 unless μ = x̄. Thus the m.l.e. of μ is μ̂ = x̄. It remains to maximize, over Σ,

L(μ̂, Σ) = (2π)^{-np/2} |Σ|^{-n/2} e^{−tr[Σ^{-1} ∑_{i=1}^n (x_i − x̄)(x_i − x̄)']/2}.

Auxiliary result: Given a p × p symmetric positive definite matrix B and a scalar b > 0, it holds that, for all positive definite p × p matrices Σ,

(1/|Σ|^b) e^{−tr(Σ^{-1} B)/2} ≤ (1/|B|^b) (2b)^{pb} e^{−bp},

with equality if Σ = (1/(2b)) B.

We apply this auxiliary result with b = n/2 and B = ∑_{i=1}^n (x_i − x̄)(x_i − x̄)' and conclude that the maximum occurs at Σ̂ = S_n.

MLE for a parametric mixture model

Consider a parametric mixture model

f(x; ψ) = ∑_{j=1}^k π_j f_j(x; θ_j),   (10)

where ψ = (π_1, ..., π_{k−1}, ξ')' and ξ is the vector containing all the parameters in θ_1, ..., θ_k known a priori to be different. We want to obtain the m.l.e. of the parameters in model (10) based on a sample x_1, ..., x_n from f. The log-likelihood for ψ is

log L(ψ; x_1, ..., x_n) = ∑_{i=1}^n log( ∑_{j=1}^k π_j f_j(x_i; θ_j) ).

Computing the m.l.e. would require solving the likelihood equation

∂ log L(ψ)/∂ψ = 0,

not an easy task (see Section 2.8 in McLachlan and Peel 2000).

The Expectation-Maximization (EM) algorithm of Dempster et al. (1977) provides an iterative scheme for computing the m.l.e. of the parameters ψ in a parametric mixture. The EM algorithm is designed for incomplete data, so the key is to regard the mixture data x_1, ..., x_n as incomplete, since the associated component label vectors z_1, ..., z_n are not available. Here z_i = (z_{i1}, ..., z_{ik})' is a k-dimensional vector with z_{ij} = 1 or 0 according to whether x_i did or did not arise from the j-th component of the mixture, i = 1, ..., n, j = 1, ..., k.

The complete data sample is therefore declared to be x_{c1}, ..., x_{cn}, where x_{ci} = (x_i', z_i')'. Then the complete-data log-likelihood for ψ is given by

log L_c(ψ) = ∑_{i=1}^n ∑_{j=1}^k z_{ij} (log π_j + log f_j(x_i; θ_j)).

E-step: The algorithm starts with an initial guess ψ^(0) for ψ. In general, denote by ψ^(g) the approximation to ψ after the g-th iteration of the algorithm. The E-step requires computing the conditional expectation of log L_c(ψ) given the sample x_1, ..., x_n, under the current approximation to ψ:

Q(ψ; ψ^(g)) = E_{ψ^(g)}(log L_c(ψ) | x_1, ..., x_n)
            = ∑_{i=1}^n ∑_{j=1}^k E_{ψ^(g)}(Z_{ij} | x_1, ..., x_n) (log π_j + log f_j(x_i; θ_j)).

It can be proved that, for i = 1, ..., n and j = 1, ..., k,

E_{ψ^(g)}(Z_{ij} | x_1, ..., x_n) = P_{ψ^(g)}{Z_{ij} = 1 | x_1, ..., x_n} = π_j^(g) f_j(x_i; θ_j^(g)) / ∑_{l=1}^k π_l^(g) f_l(x_i; θ_l^(g)) =: τ_{ij}^(g).

This is the posterior probability that the i-th member of the sample, X_i, belongs to the j-th component of the mixture.

Then

Q(ψ; ψ^(g)) = ∑_{i=1}^n ∑_{j=1}^k τ_{ij}^(g) (log π_j + log f_j(x_i; θ_j)).

M-step: The updated estimate ψ^(g+1) is obtained as the global maximizer of Q(ψ; ψ^(g)) with respect to ψ. Specifically,

π_j^(g+1) = (1/n) ∑_{i=1}^n τ_{ij}^(g),

and ξ^(g+1) is obtained as an appropriate root of

∑_{i=1}^n ∑_{j=1}^k τ_{ij}^(g) ∂ log f_j(x_i; θ_j)/∂ξ = 0.

The E- and M-steps are alternated repeatedly until the difference L(ψ^(g+1)) − L(ψ^(g)) (≥ 0) is small enough.

In the case of a normal mixture with heteroscedastic components,

f(x; ψ) = ∑_{j=1}^k π_j f(x; μ_j, Σ_j),

the M-step update ξ^(g+1) has a closed form:

μ_j^(g+1) = ∑_{i=1}^n τ_{ij}^(g) x_i / ∑_{i=1}^n τ_{ij}^(g)

and

Σ_j^(g+1) = ∑_{i=1}^n τ_{ij}^(g) (x_i − μ_j^(g+1))(x_i − μ_j^(g+1))' / ∑_{i=1}^n τ_{ij}^(g).

Remark: We have assumed that the number k of components in the mixture fitted to the sample is known or fixed in advance. There are techniques for choosing the optimal number of components (see, e.g., Chapter 6 in McLachlan and Peel 2000; Claeskens and Hjort 2008).
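A minimal EM sketch in R for a bivariate normal mixture, following the updates above (an illustration, not production code: k, the initialization and the stopping rule are arbitrary choices, and a careful implementation would guard against degenerate components):

library(mvtnorm)
set.seed(1)
X = rbind(rmvnorm(100, mean = c(0, 0)), rmvnorm(100, mean = c(4, 4)))
n = nrow(X); k = 2
pi.g = rep(1 / k, k)                      # initial mixing weights
mu.g = X[sample(n, k), , drop = FALSE]    # initial means: k random observations
Sigma.g = list(cov(X), cov(X))            # initial covariance matrices
loglik = -Inf
repeat {
  # E-step: posterior probabilities tau[i, j]
  dens = sapply(1:k, function(j) pi.g[j] * dmvnorm(X, mu.g[j, ], Sigma.g[[j]]))
  loglik.new = sum(log(rowSums(dens)))    # log-likelihood at current parameters
  tau = dens / rowSums(dens)
  # M-step: closed-form updates for the weights, means and covariances
  for (j in 1:k) {
    w = tau[, j]
    pi.g[j] = mean(w)
    mu.g[j, ] = colSums(w * X) / sum(w)
    Xc = sweep(X, 2, mu.g[j, ])
    Sigma.g[[j]] = crossprod(Xc * sqrt(w)) / sum(w)
  }
  # stop when the log-likelihood gain is negligible
  if (loglik.new - loglik < 1e-8) break
  loglik = loglik.new
}
pi.g; mu.g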

We can use the R package mclust to fit a normal mixture to a data set.

Example: Times between Old Faithful eruptions (Y) and duration of eruptions (X).

library(mclust)
Datos = read.table("Datos-geyser.txt", header = TRUE)
XY = cbind(Datos$X, Datos$Y)
# Normal mixture fitting with 2 components
faithfulDens = densityMclust(XY, G = 2, modelNames = "VVV")
summary(faithfulDens, parameters = TRUE)

Density estimation via Gaussian finite mixture modeling

Mclust VVV (ellipsoidal, varying volume, shape, and orientation) model with 2 components:

  log.likelihood   n   df   BIC   ICL

Clustering table:

Mixing probabilities:

Means:
     [,1] [,2]
[1,]
[2,]

Variances:
[,,1]
     [,1] [,2]
[1,]
[2,]
[,,2]
     [,1] [,2]
[1,]
[2,]

plot(faithfulDens, XY, xlab = "X", ylab = "Y")

[Figure: contour plot of the fitted two-component mixture density over the Old Faithful data.]

plot(faithfulDens, type = "persp", col = grey(0.8))

[Figure: perspective plot of the fitted mixture density.]

References

Claeskens, G. and Hjort, N.L. (2008). Model Selection and Model Averaging. Cambridge University Press.

Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39.

Johnson, R.A. and Wichern, D.W. (2007). Applied Multivariate Statistical Analysis. Prentice Hall.

McAssey, M.P. (2013). An empirical goodness-of-fit test for multivariate distributions. Journal of Applied Statistics, 40.

McLachlan, G. and Peel, D. (2000). Finite Mixture Models. Wiley.

Peña, D. (2002). Análisis de datos multivariantes. McGraw-Hill.

Székely, G.J. and Rizzo, M.L. (2005). A new test for multivariate normality. Journal of Multivariate Analysis, 93.

Székely, G.J. and Rizzo, M.L. (2013). Energy statistics: a class of statistics based on distances. Journal of Statistical Planning and Inference, 143.

Székely, G.J., Rizzo, M.L. and Bakirov, N.K. (2007). Measuring and testing independence by correlation of distances. Annals of Statistics, 35.


More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Multivariate Analysis and Likelihood Inference

Multivariate Analysis and Likelihood Inference Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density

More information

Canonical Correlation Analysis of Longitudinal Data

Canonical Correlation Analysis of Longitudinal Data Biometrics Section JSM 2008 Canonical Correlation Analysis of Longitudinal Data Jayesh Srivastava Dayanand N Naik Abstract Studying the relationship between two sets of variables is an important multivariate

More information

2. Matrix Algebra and Random Vectors

2. Matrix Algebra and Random Vectors 2. Matrix Algebra and Random Vectors 2.1 Introduction Multivariate data can be conveniently display as array of numbers. In general, a rectangular array of numbers with, for instance, n rows and p columns

More information

Stat 206: the Multivariate Normal distribution

Stat 206: the Multivariate Normal distribution Stat 6: the Multivariate Normal distribution James Johndrow (adapted from Iain Johnstone s notes) 16-11- Introduction The multivariate normal distribution plays a central role in multivariate statistics

More information

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics The candidates for the research course in Statistics will have to take two shortanswer type tests

More information

Statistical Methods in Particle Physics

Statistical Methods in Particle Physics Statistical Methods in Particle Physics Lecture 3 October 29, 2012 Silvia Masciocchi, GSI Darmstadt s.masciocchi@gsi.de Winter Semester 2012 / 13 Outline Reminder: Probability density function Cumulative

More information

University of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians

University of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians Engineering Part IIB: Module F Statistical Pattern Processing University of Cambridge Engineering Part IIB Module F: Statistical Pattern Processing Handout : Multivariate Gaussians. Generative Model Decision

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

Algorithms for Uncertainty Quantification

Algorithms for Uncertainty Quantification Algorithms for Uncertainty Quantification Tobias Neckel, Ionuț-Gabriel Farcaș Lehrstuhl Informatik V Summer Semester 2017 Lecture 2: Repetition of probability theory and statistics Example: coin flip Example

More information

Bayesian Inference. Chapter 9. Linear models and regression

Bayesian Inference. Chapter 9. Linear models and regression Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering

More information

Factor Analysis and Kalman Filtering (11/2/04)

Factor Analysis and Kalman Filtering (11/2/04) CS281A/Stat241A: Statistical Learning Theory Factor Analysis and Kalman Filtering (11/2/04) Lecturer: Michael I. Jordan Scribes: Byung-Gon Chun and Sunghoon Kim 1 Factor Analysis Factor analysis is used

More information

Random Matrices and Multivariate Statistical Analysis

Random Matrices and Multivariate Statistical Analysis Random Matrices and Multivariate Statistical Analysis Iain Johnstone, Statistics, Stanford imj@stanford.edu SEA 06@MIT p.1 Agenda Classical multivariate techniques Principal Component Analysis Canonical

More information

Quick Tour of Basic Probability Theory and Linear Algebra

Quick Tour of Basic Probability Theory and Linear Algebra Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions

More information

MAS223 Statistical Inference and Modelling Exercises

MAS223 Statistical Inference and Modelling Exercises MAS223 Statistical Inference and Modelling Exercises The exercises are grouped into sections, corresponding to chapters of the lecture notes Within each section exercises are divided into warm-up questions,

More information

A Squared Correlation Coefficient of the Correlation Matrix

A Squared Correlation Coefficient of the Correlation Matrix A Squared Correlation Coefficient of the Correlation Matrix Rong Fan Southern Illinois University August 25, 2016 Abstract Multivariate linear correlation analysis is important in statistical analysis

More information

Dependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline.

Dependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline. MFM Practitioner Module: Risk & Asset Allocation September 11, 2013 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y

More information