2. Multivariate Distributions
Random vectors: mean, covariance matrix, linear transformations, dependence measures (a short introduction to the probability tools for multivariate statistics). Multidimensional normal distribution, mixture models (some well-known examples of multivariate probability distributions).

Advanced Course in Statistics. Lecturer: Amparo Baíllo.
2.1. Random vectors

Multivariate data are the result of observing a random vector, a vector $X = (X_1, \ldots, X_p)'$ whose components $X_j$, $j = 1, \ldots, p$, are random variables (r.v.) defined on the same probability space $(\Omega, \mathcal{A}, P)$. Similarly, a random matrix is a matrix whose elements are r.v.

The probability distribution of a random vector or matrix is characterized by the joint distribution of its components. In particular, the distribution function of a random vector $X$ is
\[
F(x_1, \ldots, x_p) = P\{X_1 \le x_1, \ldots, X_p \le x_p\}, \quad (x_1, \ldots, x_p) \in \mathbb{R}^p.
\]
In general, we will only work with continuous random vectors, whose probability distribution is characterized by the density function $f = f(x_1, \ldots, x_p)$, satisfying

1. $f(x_1, \ldots, x_p) \ge 0$ for all $(x_1, \ldots, x_p) \in \mathbb{R}^p$;
2. $\int_{\mathbb{R}^p} f(x_1, \ldots, x_p)\, dx_1 \cdots dx_p = 1$;
3. $f(x_1, \ldots, x_p) = \dfrac{\partial^p F(x_1, \ldots, x_p)}{\partial x_1 \cdots \partial x_p}$.
The marginal distribution of each component $X_j$, $j = 1, \ldots, p$, is its probability distribution as an individual random variable. Its density function is
\[
f_j(x_j) = \int_{\mathbb{R}^{p-1}} f(x_1, \ldots, x_p)\, dx_1 \cdots dx_{j-1}\, dx_{j+1} \cdots dx_p, \quad x_j \in \mathbb{R}.
\]
More generally, given the partition
\[
X = \begin{pmatrix} X^{(1)} \\ X^{(2)} \end{pmatrix},
\]
with $X^{(1)} = (X_1, \ldots, X_r)'$ and $X^{(2)} = (X_{r+1}, \ldots, X_p)'$, the marginal density of $X^{(1)}$ is
\[
f_{X^{(1)}}(x_1, \ldots, x_r) = \int_{\mathbb{R}^{p-r}} f(x_1, \ldots, x_p)\, dx_{r+1} \cdots dx_p.
\]
Two random matrices $X_1$ and $X_2$ are independent if the elements of $X_1$ (as a collection of r.v.) are independent of the elements of $X_2$. (The elements within $X_1$ or $X_2$ need not be independent.) In particular, given the partition $X = [X^{(1)}, X^{(2)}]'$, the vectors $X^{(1)}$ and $X^{(2)}$ are independent if
\[
F(x_1, \ldots, x_p) = F_{X^{(1)}}(x_1, \ldots, x_r)\, F_{X^{(2)}}(x_{r+1}, \ldots, x_p) \quad \text{for all } x_1, \ldots, x_p,
\]
or, equivalently, if
\[
f(x_1, \ldots, x_p) = f_{X^{(1)}}(x_1, \ldots, x_r)\, f_{X^{(2)}}(x_{r+1}, \ldots, x_p) \quad \text{for all } x_1, \ldots, x_p.
\]
2.1.1 Expectation

The expected value of a random vector (resp. matrix) is the vector (resp. matrix) of expected values of each of its components (the marginal expectations). For the random vector $X = (X_1, \ldots, X_p)'$,
\[
\mu := E(X) = (E(X_1), \ldots, E(X_p))' = (\mu_1, \ldots, \mu_p)',
\]
where $\mu_j := E(X_j) = \int_{\mathbb{R}} x\, f_j(x)\, dx$.

The expectation is a linear function:

1. If $A$ is a $q \times p$ constant matrix, $X$ is a $p$-dimensional random vector and $b$ is a $q$-dimensional constant vector, then $E(AX + b) = A\,E(X) + b$.
2. If $X$ and $Y$ are random matrices of the same dimension, then $E(X + Y) = E(X) + E(Y)$.
3. If $X$ is a $q \times p$ random matrix and $A$, $B$ are constant matrices of adequate dimensions, then $E(AXB) = A\,E(X)\,B$.

If $X_1$ and $X_2$ are conformable independent random matrices, then $E(X_1 X_2) = E(X_1)\,E(X_2)$.
2.1.2 Covariance matrix

The variance-covariance matrix (or simply covariance matrix) of a random vector $X = (X_1, \ldots, X_p)'$ with expectation $\mu$ is
\[
\Sigma = V(X) := E\big((X - \mu)(X - \mu)'\big) = E(XX') - \mu\mu'
= \begin{pmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp}
\end{pmatrix},
\]
where $\sigma_{jj} = V(X_j)$ is the variance of the r.v. $X_j$ and $\sigma_{jk} = \mathrm{Cov}(X_j, X_k)$ is the covariance of $X_j$ and $X_k$, $j, k = 1, \ldots, p$. Thus $\Sigma$ is a symmetric matrix.

Some properties of the covariance matrix:

1. If $A$ is a $q \times p$ constant matrix, $X$ is a $p$-dimensional random vector and $b$ is a $q$-dimensional constant vector, then $V(AX + b) = A\,V(X)\,A'$.
2. $\Sigma = V(X)$ is always nonnegative definite.
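Property 1 can be checked numerically. This is an illustrative base-R sketch (the matrix A, the vector b and the simulated sample are our own arbitrary choices, not from the text); for the sample covariance the identity even holds exactly:

```r
set.seed(42)
n <- 1000
# Simulated sample: each row is one observation of a 2-dimensional X
X <- cbind(rnorm(n, sd = 2), rnorm(n, sd = 1))
A <- matrix(c(1, 2, 0, 1), nrow = 2)   # arbitrary 2 x 2 constant matrix
b <- c(3, -1)                          # arbitrary constant vector
Y <- t(A %*% t(X) + b)                 # row i of Y is A x_i + b
# Sample version of V(AX + b) = A V(X) A' (b drops out):
max(abs(cov(Y) - A %*% cov(X) %*% t(A)))  # essentially zero
```

Since the sample covariance is itself bilinear in the data, the identity holds exactly here, up to floating-point error.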
2.1.3 Correlation matrix

Let $X = (X_1, \ldots, X_p)'$ be a random vector with covariance matrix $\Sigma$ and with $0 < \sigma_{jj} = V(X_j) < \infty$, $j = 1, \ldots, p$. Define $D := \mathrm{diag}(\sigma_{11}, \ldots, \sigma_{pp})$. Then the correlation matrix of $X$ is
\[
\rho = \begin{pmatrix}
1 & \rho_{12} & \cdots & \rho_{1p} \\
\rho_{21} & 1 & \cdots & \rho_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{p1} & \rho_{p2} & \cdots & 1
\end{pmatrix} = D^{-1/2}\, \Sigma\, D^{-1/2},
\]
where $\rho_{jk}$ is the correlation of $X_j$ and $X_k$, $j, k = 1, \ldots, p$, and $D^{-1/2} := \mathrm{diag}(\sigma_{11}^{-1/2}, \ldots, \sigma_{pp}^{-1/2})$.

Observe that, if $Z := D^{-1/2}(X - \mu)$, where $\mu = E(X)$, then $V(Z) = \rho$.
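The relation $\rho = D^{-1/2}\Sigma D^{-1/2}$ is exactly what base R's cov2cor() computes, which gives a quick check (the 2×2 matrix below is an arbitrary example of our own):

```r
# Correlation matrix from a covariance matrix via D^{-1/2} Sigma D^{-1/2}
Sigma <- matrix(c(4, 2, 2, 9), nrow = 2)   # arbitrary covariance matrix
Dinvhalf <- diag(1 / sqrt(diag(Sigma)))    # D^{-1/2}
rho <- Dinvhalf %*% Sigma %*% Dinvhalf
rho[1, 2]                                  # 2 / (2 * 3) = 1/3
all.equal(rho, cov2cor(Sigma), check.attributes = FALSE)  # TRUE
```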
2.1.4 Dependence measures

More generally, the (cross-)covariance between the $p$-dimensional random vector $X_1$ and the $q$-dimensional random vector $X_2$, with means $\mu_1$ and $\mu_2$ respectively, is the $p \times q$ matrix given by
\[
\mathrm{Cov}(X_1, X_2) = E\big((X_1 - \mu_1)(X_2 - \mu_2)'\big).
\]
Some properties of the cross-covariance:

1. If $A$ and $B$ are constant matrices and $c$ and $d$ are constant vectors, then $\mathrm{Cov}(AX_1 + c, BX_2 + d) = A\,\mathrm{Cov}(X_1, X_2)\,B'$.
2. If $X_1$, $X_2$ and $X_3$ are random vectors, then $\mathrm{Cov}(X_1 + X_2, X_3) = \mathrm{Cov}(X_1, X_3) + \mathrm{Cov}(X_2, X_3)$.
3. If $X_1$ and $X_2$ are independent, then $\mathrm{Cov}(X_1, X_2) = 0_{p \times q}$.
Pearson's product-moment covariance measures linear dependence and, for the multivariate normal distribution, a diagonal covariance matrix implies independence of the random vector components. In general, however, Pearson's correlation matrix does not characterize independence. Székely et al. (2007) introduced two dependence coefficients, distance covariance and distance correlation, that measure all types of dependence between random vectors $X$ and $Y$ of arbitrary (and possibly different) dimensions.

Suppose that $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$ are random vectors. The characteristic function of $X$ is
\[
\hat f_X(t) := E\big(e^{i \langle t, X \rangle}\big) = \int_{\mathbb{R}^p} e^{i \langle t, x \rangle}\, dF_X(x).
\]
Let $\hat f_Y$ be the characteristic function of $Y$, and denote the joint characteristic function of $(X, Y)$ by $\hat f_{X,Y}$. Then $X$ and $Y$ are independent if and only if $\hat f_{X,Y} = \hat f_X \hat f_Y$.
Distance covariance is defined as a measure of the discrepancy between $\hat f_{X,Y}$ and $\hat f_X \hat f_Y$:
\[
\|\hat f_{X,Y}(t, s) - \hat f_X(t)\hat f_Y(s)\|_w^2 = \int_{\mathbb{R}^{p+q}} |\hat f_{X,Y}(t, s) - \hat f_X(t)\hat f_Y(s)|^2\, w(t, s)\, dt\, ds.
\]
The only integrable weight function $w$ that makes this definition scale and rotation invariant is proportional to the reciprocal of $|t|_p^{1+p}\, |s|_q^{1+q}$, where $|\cdot|_p$ denotes the Euclidean norm on $\mathbb{R}^p$.

The distance covariance between random vectors $X$ and $Y$ with $E|X|_p < \infty$ and $E|Y|_q < \infty$ is the square root of
\[
\mathcal{V}^2(X, Y) = \frac{1}{c_p c_q} \int_{\mathbb{R}^{p+q}} \frac{|\hat f_{X,Y}(t, s) - \hat f_X(t)\hat f_Y(s)|^2}{|t|_p^{1+p}\, |s|_q^{1+q}}\, dt\, ds, \tag{1}
\]
with $c_p := \dfrac{\pi^{(p+1)/2}}{\Gamma\big(\frac{p+1}{2}\big)}$.
Similarly, distance variance is defined as the square root of $\mathcal{V}^2(X) = \mathcal{V}^2(X, X)$. The distance correlation between random vectors $X$ and $Y$ with $E|X|_p < \infty$ and $E|Y|_q < \infty$ is the square root of
\[
\mathcal{R}^2(X, Y) :=
\begin{cases}
\dfrac{\mathcal{V}^2(X, Y)}{\sqrt{\mathcal{V}^2(X)\,\mathcal{V}^2(Y)}}, & \mathcal{V}^2(X)\,\mathcal{V}^2(Y) > 0, \\[2mm]
0, & \mathcal{V}^2(X)\,\mathcal{V}^2(Y) = 0.
\end{cases} \tag{2}
\]
Theorem 3 in Székely et al. (2007): If $E|X|_p < \infty$ and $E|Y|_q < \infty$, then $0 \le \mathcal{R} \le 1$, and $\mathcal{R}(X, Y) = 0$ if and only if $X$ and $Y$ are independent.
For an observed random sample $\{(X_i, Y_i),\ i = 1, \ldots, n\}$ of $(X, Y)$, natural estimators of the unknown characteristic functions are
\[
\hat f_X^n(t) := \int_{\mathbb{R}^p} e^{i \langle t, x \rangle}\, dF_X^n(x) = \frac{1}{n} \sum_{i=1}^n e^{i \langle t, X_i \rangle}, \qquad
\hat f_Y^n(s) := \frac{1}{n} \sum_{i=1}^n e^{i \langle s, Y_i \rangle},
\]
and
\[
\hat f_{X,Y}^n(t, s) := \frac{1}{n} \sum_{i=1}^n e^{i \langle t, X_i \rangle + i \langle s, Y_i \rangle},
\]
where $F_X^n$ denotes the empirical distribution function of $X_1, \ldots, X_n$. The empirical distance covariance is defined as the square root of
\[
\mathcal{V}_n^2(X, Y) := \|\hat f_{X,Y}^n(t, s) - \hat f_X^n(t)\hat f_Y^n(s)\|_w^2.
\]
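The empirical $\mathcal{V}_n^2$ also has a closed algebraic form as an average of products of double-centered Euclidean distance matrices (Székely et al. 2007), which is how it is computed in practice. A minimal base-R sketch (the function name dcov2 and the simulated example are our own; the energy package provides a full implementation):

```r
# Empirical squared distance covariance V_n^2(X, Y) computed from
# double-centered pairwise distance matrices.
dcov2 <- function(X, Y) {
  A <- as.matrix(dist(as.matrix(X)))  # pairwise distances |X_i - X_k|_p
  B <- as.matrix(dist(as.matrix(Y)))  # pairwise distances |Y_i - Y_k|_q
  # Double centering: subtract row and column means, add the grand mean
  A <- A - outer(rowMeans(A), colMeans(A), "+") + mean(A)
  B <- B - outer(rowMeans(B), colMeans(B), "+") + mean(B)
  mean(A * B)                         # V_n^2(X, Y)
}
set.seed(1)
x <- rnorm(200)
y <- x^2 + rnorm(200, sd = 0.1)  # strongly dependent, almost uncorrelated
c(pearson = cor(x, y),
  dcor = sqrt(dcov2(x, y) / sqrt(dcov2(x, x) * dcov2(y, y))))
```

Here the Pearson correlation is close to zero while the empirical distance correlation is clearly positive, illustrating that distance correlation detects nonlinear dependence.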
Székely et al. (2007) used the asymptotic properties of the empirical distance covariance to test the independence of $X$ and $Y$:
\[
H_0: X \text{ and } Y \text{ are independent} \qquad \text{vs.} \qquad H_1: X \text{ and } Y \text{ are dependent.}
\]
Corollary 2 of Székely et al. (2007): If $E|X|_p < \infty$ and $E|Y|_q < \infty$ and $X$ and $Y$ are independent, then
\[
\frac{n\, \mathcal{V}_n^2(X, Y)}{S_2} \xrightarrow[n \to \infty]{d} Q, \tag{3}
\]
where $Q$ is a certain, known quadratic form of centered Gaussian random variables with $E(Q) = 1$ and
\[
S_2 := \left( \frac{1}{n^2} \sum_{i,k=1}^n |X_i - X_k|_p \right) \left( \frac{1}{n^2} \sum_{i,k=1}^n |Y_i - Y_k|_q \right).
\]
The test statistic in (3) is a particular case of the so-called energy statistics, functions of distances between statistical observations (see Székely and Rizzo 2013).
2.2. Examples of multidimensional distributions

2.2.1 Multidimensional normal distribution

The random vector $X = (X_1, \ldots, X_p)'$ follows a $p$-dimensional normal distribution with mean $\mu$ and covariance matrix $\Sigma$, denoted $X \sim N_p(\mu, \Sigma)$, if its density function is
\[
f(x; \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-(x - \mu)' \Sigma^{-1} (x - \mu)/2}, \tag{4}
\]
where $x = (x_1, \ldots, x_p)'$ and $-\infty < x_i < \infty$, $i = 1, \ldots, p$.

Example (Bivariate normal density): We express the bivariate ($p = 2$) normal density in terms of the individual parameters $\mu_1 = E(X_1)$, $\mu_2 = E(X_2)$, $\sigma_{11} = V(X_1)$, $\sigma_{22} = V(X_2)$ and $\rho_{12} = \mathrm{Cor}(X_1, X_2) = \sigma_{12}/\sqrt{\sigma_{11}\sigma_{22}}$.
The determinant and inverse of the matrix
\[
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}
= \begin{pmatrix} \sigma_{11} & \sqrt{\sigma_{11}\sigma_{22}}\, \rho_{12} \\ \sqrt{\sigma_{11}\sigma_{22}}\, \rho_{12} & \sigma_{22} \end{pmatrix}
\]
are, respectively, $|\Sigma| = \sigma_{11}\sigma_{22}(1 - \rho_{12}^2)$ and
\[
\Sigma^{-1} = \frac{1}{\sigma_{11}\sigma_{22}(1 - \rho_{12}^2)}
\begin{pmatrix} \sigma_{22} & -\sqrt{\sigma_{11}\sigma_{22}}\, \rho_{12} \\ -\sqrt{\sigma_{11}\sigma_{22}}\, \rho_{12} & \sigma_{11} \end{pmatrix}.
\]
Thus
\[
\begin{aligned}
(x - \mu)' \Sigma^{-1} (x - \mu)
&= (x_1 - \mu_1,\ x_2 - \mu_2)\, \frac{1}{\sigma_{11}\sigma_{22}(1 - \rho_{12}^2)}
\begin{pmatrix} \sigma_{22} & -\sqrt{\sigma_{11}\sigma_{22}}\, \rho_{12} \\ -\sqrt{\sigma_{11}\sigma_{22}}\, \rho_{12} & \sigma_{11} \end{pmatrix}
\begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix} \\
&= \frac{\sigma_{22}(x_1 - \mu_1)^2 + \sigma_{11}(x_2 - \mu_2)^2 - 2\rho_{12}\sqrt{\sigma_{11}\sigma_{22}}\,(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_{11}\sigma_{22}(1 - \rho_{12}^2)} \\
&= \frac{1}{1 - \rho_{12}^2} \left[ \frac{(x_1 - \mu_1)^2}{\sigma_{11}} + \frac{(x_2 - \mu_2)^2}{\sigma_{22}} - 2\rho_{12}\, \frac{x_1 - \mu_1}{\sqrt{\sigma_{11}}}\, \frac{x_2 - \mu_2}{\sqrt{\sigma_{22}}} \right].
\end{aligned}
\]
Consequently, the bivariate normal density is
\[
f(x_1, x_2) = \frac{1}{2\pi \sqrt{\sigma_{11}\sigma_{22}(1 - \rho_{12}^2)}}
\exp\left\{ -\frac{1}{2(1 - \rho_{12}^2)} \left[ \frac{(x_1 - \mu_1)^2}{\sigma_{11}} + \frac{(x_2 - \mu_2)^2}{\sigma_{22}} - 2\rho_{12}\, \frac{x_1 - \mu_1}{\sqrt{\sigma_{11}}}\, \frac{x_2 - \mu_2}{\sqrt{\sigma_{22}}} \right] \right\}.
\]
Observe that, if $\rho_{12} = 0$ ($X_1$ and $X_2$ are uncorrelated), then
\[
f(x_1, x_2) = \frac{1}{2\pi \sqrt{\sigma_{11}\sigma_{22}}} \exp\left\{ -\frac{1}{2} \left[ \frac{(x_1 - \mu_1)^2}{\sigma_{11}} + \frac{(x_2 - \mu_2)^2}{\sigma_{22}} \right] \right\}
= \frac{1}{\sqrt{2\pi\sigma_{11}}}\, e^{-\frac{(x_1 - \mu_1)^2}{2\sigma_{11}}} \cdot \frac{1}{\sqrt{2\pi\sigma_{22}}}\, e^{-\frac{(x_2 - \mu_2)^2}{2\sigma_{22}}}
= f_1(x_1)\, f_2(x_2).
\]
Since the joint density $f(x_1, x_2)$ can be expressed as the product of the marginal densities, we conclude that $X_1$ and $X_2$ are actually independent r.v.
```r
## Perspective plots, contour plots and simulated samples for two
## bivariate normal densities: independent components and rho = 0.7.
library(mvtnorm)
split.screen(c(2, 3))

screen(1)  # bivariate normal pdf
x <- y <- seq(-5, 5, length = 50)
f <- function(x, y) dmvnorm(cbind(x, y))
z <- outer(x, y, f)
par(mai = c(0.1, 0.1, 0.1, 0.1))
persp(x, y, z, theta = 5, phi = 50, expand = 0.5, col = "lightblue")

screen(2)  # contours of the bivariate normal pdf
x <- y <- seq(-5, 5, length = 150)
z <- outer(x, y, f)
par(mai = c(0.5, 0.5, 0.5, 0.5))
contour(x, y, z, nlevels = 20, col = rainbow(20))

screen(3)  # normal data
X <- rmvnorm(n = 100, sigma = matrix(c(1, 0, 0, 1), ncol = 2))
par(mai = c(0.5, 0.5, 0.5, 0.5))
plot(X[, 1], X[, 2], pch = 19,
     xlab = expression(x[1]), ylab = expression(x[2]))

screen(4)  # pdf with correlated components
x <- y <- seq(-5, 5, length = 50)
Sigma <- matrix(c(1, 0.7, 0.7, 1), ncol = 2)
f <- function(x, y) dmvnorm(cbind(x, y), sigma = Sigma)
z <- outer(x, y, f)
par(mai = c(0.1, 0.1, 0.1, 0.1))
persp(x, y, z, theta = 5, phi = 50, expand = 0.5, col = "lightblue")

screen(5)  # contours of the correlated bivariate normal pdf
x <- y <- seq(-5, 5, length = 150)
z <- outer(x, y, f)
par(mai = c(0.5, 0.5, 0.5, 0.5))
contour(x, y, z, nlevels = 20, col = rainbow(20))

screen(6)  # correlated normal data
X <- rmvnorm(n = 100, sigma = Sigma)
par(mai = c(0.5, 0.5, 0.5, 0.5))
plot(X[, 1], X[, 2], pch = 19,
     xlab = expression(x[1]), ylab = expression(x[2]))
close.screen(all = TRUE)
```
[Figure: perspective plots, contour plots and scatterplots of simulated samples for the two bivariate normal densities ($\rho_{12} = 0$ and $\rho_{12} = 0.7$).]
Properties of the multivariate normal distribution

Let $X \sim N_p(\mu, \Sigma)$.

1. The normal density has a global maximum at $\mu$ and is symmetric with respect to $\mu$ in the sense that $f(\mu + a) = f(\mu - a)$ for all $a \in \mathbb{R}^p$.
2. Linear combinations of a multivariate normal are also normally distributed: if $A$ is a $(q \times p)$ constant matrix and $d$ is a $(q \times 1)$ constant vector, then $AX + d \sim N_q(A\mu + d, A\Sigma A')$. Consequently, all subsets of the components of $X$ are normally distributed.
3. Zero correlation between normal vectors is equivalent to independence: if $X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$, then $X_1$ and $X_2$ are independent if and only if $\mathrm{Cov}(X_1, X_2) = 0$.
4. If $\Sigma > 0$, there exists a linear transformation of $X$ with mean $0$ and covariance matrix equal to the identity.
5. Contours of constant density for the multivariate normal distribution are ellipsoids centered at the population mean. If $\Sigma > 0$, then

   (a) the level sets of the probability density $f$ are the ellipsoids $\{x \in \mathbb{R}^p : (x - \mu)' \Sigma^{-1} (x - \mu) = c^2\}$. These ellipsoids are centered at $\mu$ and have axes $\pm c \sqrt{\lambda_i}\, e_i$, where $(\lambda_i, e_i)$, $i = 1, \ldots, p$, are the eigenvalue-eigenvector pairs of $\Sigma$;

   (b) $(X - \mu)' \Sigma^{-1} (X - \mu)$ follows a $\chi^2_p$ distribution. Thus $P\{(X - \mu)' \Sigma^{-1} (X - \mu) \le \chi^2_{p;\alpha}\} = 1 - \alpha$, for any $0 < \alpha < 1$.

The Mahalanobis distance $d_M$ of a point $x \in \mathbb{R}^p$ to the mean $\mu$ of a $p$-dimensional distribution with covariance matrix $\Sigma$ is defined by
\[
d_M^2(x) = (x - \mu)' \Sigma^{-1} (x - \mu).
\]
It is a statistical distance in the sense that it takes into account the variability of the distribution (unlike the Euclidean distance).
6. If $X \sim N_p(\mu, \Sigma)$, then any linear combination of variables $a'X = a_1 X_1 + a_2 X_2 + \cdots + a_p X_p$ is distributed as $N(a'\mu, a'\Sigma a)$. Conversely, if $a'X$ is distributed as $N(a'\mu, a'\Sigma a)$ for every $a \in \mathbb{R}^p$, then $X$ must follow a $N_p(\mu, \Sigma)$ distribution.

7. Let $X_1, \ldots, X_n$ be mutually independent $N_p(\mu_j, \Sigma)$ random vectors and let $c_1, \ldots, c_n$ be real constants. Then
\[
V = c_1 X_1 + \cdots + c_n X_n \sim N_p\left( \sum_{j=1}^n c_j \mu_j,\ \Big( \sum_{j=1}^n c_j^2 \Big) \Sigma \right).
\]

8. Given a random sample $X_1, \ldots, X_n$ from $X \sim N_p(\mu, \Sigma)$, the maximum likelihood estimators (m.l.e.) of $\mu$ and $\Sigma$ are, respectively,
\[
\hat\mu = \bar X = \frac{1}{n} \sum_{i=1}^n X_i \quad \text{and} \quad \hat\Sigma = S_n = \frac{1}{n} \sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)'.
\]
9. The Central Limit Theorem: Let $X_1, \ldots, X_n$ be independent observations from a population with mean $\mu$ and nonsingular covariance matrix $\Sigma$. Then
\[
\sqrt{n}\, (\bar X - \mu) \xrightarrow[n \to \infty]{d} N_p(0, \Sigma)
\quad \text{and} \quad
n\, (\bar X - \mu)'\, S^{-1}\, (\bar X - \mu) \xrightarrow[n \to \infty]{d} \chi^2_p.
\]
The normality assumption on a sample from $X$ can be assessed by

- examining the univariate marginal distributions of the components of $X$, which should be Gaussian;
- examining the bivariate scatterplots of the pairs of components of $X$, which should have an elliptical appearance;
- checking whether the Mahalanobis distances $d_i^2 = (x_i - \bar x)'\, S_n^{-1}\, (x_i - \bar x)$ follow a $\chi^2_p$ distribution.

If the data are clearly non-normal, we can consider the possibility of taking nonlinear transformations of the variables. There are multiple proposals in the literature to test the multivariate normality assumption (see Székely and Rizzo 2005; McAssey 2013 and references therein).
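The third check can be sketched in base R as a chi-squared Q-Q plot of the Mahalanobis distances (simulated data stand in for a real sample here; with real data, replace X below by the $n \times p$ data matrix):

```r
set.seed(7)
n <- 200; p <- 3
X <- matrix(rnorm(n * p), nrow = n)   # placeholder multivariate normal data
Sn <- cov(X) * (n - 1) / n            # S_n, the divisor-n covariance
d2 <- mahalanobis(X, colMeans(X), Sn) # d_i^2 = (x_i - xbar)' S_n^{-1} (x_i - xbar)
# Under multivariate normality the d_i^2 should align with chi^2_p quantiles
qqplot(qchisq(ppoints(n), df = p), sort(d2),
       xlab = "chi-squared(p) quantiles",
       ylab = "ordered Mahalanobis distances")
abline(0, 1)
```

Marked departures from the 45-degree line suggest non-normality.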
Example: Mass, snout-vent length and hind limb span of 25 lizards. [Figure: pairwise scatterplot matrix of the three variables.]
Example: Concentration of Selenium in the teeth and liver of 20 whales (Delphinapterus leucas) at Mackenzie Delta, Northwest Territories. [Figure: scatterplot of the two Selenium concentrations.]
2.2.2 Distributions associated to the multivariate normal

Correspondences between the univariate and the multivariate situations:

Univariate case      Multivariate case
$N(\mu, \sigma)$     $N_p(\mu, \Sigma)$
$\chi^2_n$           $W_p(\Sigma, n)$
$F(m, n)$            $\Lambda(p, a, b)$
$t$                  $T^2$
Wishart distribution

Given a random sample of independent random vectors $X_1, \ldots, X_n$ from a $N_p(0, \Sigma)$ distribution, the Wishart distribution $W_p(\Sigma, n)$ is that of the random $p \times p$ matrix
\[
Q = \sum_{i=1}^n X_i X_i'.
\]
Properties:

1. If $Q_1 \sim W_p(\Sigma, n_1)$ and $Q_2 \sim W_p(\Sigma, n_2)$ are independent, then $Q_1 + Q_2 \sim W_p(\Sigma, n_1 + n_2)$.
2. Fisher's Theorem: If $X_1, \ldots, X_n$ are independent $N_p(\mu, \Sigma)$ random vectors, then
   (i) the sample mean vector $\bar X$ and the sample covariance matrix $S_n$ are independent;
   (ii) $\bar X \sim N_p(\mu, \frac{1}{n}\Sigma)$;
   (iii) $n S_n \sim W_p(\Sigma, n - 1)$.
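Base R can simulate Wishart matrices directly with rWishart(). Since $Q = \sum_i X_i X_i'$ with $X_i \sim N_p(0, \Sigma)$ implies $E(Q) = n\Sigma$, a Monte Carlo check is easy (a sketch; the particular $\Sigma$, degrees of freedom and number of replicates are arbitrary choices of ours):

```r
set.seed(3)
p <- 2; n <- 5
Sigma <- matrix(c(2, 1, 1, 3), nrow = p)
W <- rWishart(5000, df = n, Sigma = Sigma)  # 5000 draws from W_p(Sigma, n)
Ebar <- apply(W, c(1, 2), mean)             # Monte Carlo estimate of E(Q)
Ebar / n                                    # should be close to Sigma
```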
Wilks' Lambda

This is the distribution of the following ratio of determinants:
\[
\Lambda = \frac{|A|}{|A + B|} = \frac{1}{|I + A^{-1}B|} \sim \Lambda(p, a, b),
\]
where $A \sim W_p(\Sigma, a)$ and $B \sim W_p(\Sigma, b)$ are independent, $\Sigma$ is nonsingular and $a \ge p$.

Properties:

1. Bartlett's approximation: for large $a$,
\[
-\left( a + b - \frac{p + b + 1}{2} \right) \log \Lambda(p, a, b) \approx \chi^2_{pb}.
\]
Hotelling's $T^2$

It is the distribution of the r.v.
\[
T^2 = n\, Z'\, Q^{-1}\, Z \sim T^2(p, n),
\]
where $Z \sim N_p(0, I)$ and $Q \sim W_p(I, n)$ are independent.

Properties:

1. If $p = 1$, then $T^2(1, n)$ is the square of a Student $t$ distribution with $n$ degrees of freedom.
2. $\dfrac{n - p + 1}{np}\, T^2(p, n) = F(p, n - p + 1)$.
3. Hotelling's distribution is invariant under affine transformations, that is, if $X \sim N_p(\mu, \Sigma)$ and $R \sim W_p(\Sigma, n)$ are independent, then $n (X - \mu)'\, R^{-1}\, (X - \mu) \sim T^2(p, n)$.
4. Given a random sample of independent random vectors $X_1, \ldots, X_n$ from a $N_p(\mu, \Sigma)$ distribution,
\[
n\, (\bar X - \mu)'\, S_n^{-1}\, (\bar X - \mu) \sim T^2(p, n - 1).
\]
5. Let $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ be two random samples of independent random vectors from a $N_p(\mu_1, \Sigma)$ and a $N_p(\mu_2, \Sigma)$ distribution, respectively. If $\mu_1 = \mu_2$, then
\[
\frac{n_1 n_2}{n_1 + n_2}\, (\bar X - \bar Y)'\, S_p^{-1}\, (\bar X - \bar Y) \sim T^2(p, n_1 + n_2 - 2),
\]
where
\[
S_p = \frac{n_1 S_{x,n_1} + n_2 S_{y,n_2}}{n_1 + n_2} \tag{5}
\]
is the pooled covariance matrix.

These two properties will be used in hypothesis tests about mean vectors of Gaussian distributions.
Inferences about the mean

Case 1. Let $X_1, \ldots, X_n$ be a sample of independent random vectors from a $N_p(\mu, \Sigma)$. Fix $\mu_0 \in \mathbb{R}^p$ and consider the test
\[
H_0: \mu = \mu_0. \tag{6}
\]
Under $H_0$, the test statistic satisfies
\[
n\, (\bar X - \mu_0)'\, S_n^{-1}\, (\bar X - \mu_0) \sim T^2(p, n - 1)
\]
or, equivalently,
\[
\frac{n - p}{p}\, (\bar X - \mu_0)'\, S_n^{-1}\, (\bar X - \mu_0) \sim F(p, n - p).
\]
This provides us with a rejection region for the test (6).
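A base-R sketch of this one-sample test, using the F-form of the statistic stated above with $S_n$ (the function name and the simulated data are our own choices; the p-value is valid for Gaussian data):

```r
# One-sample test of H0: mu = mu0 via
# ((n - p)/p) (xbar - mu0)' S_n^{-1} (xbar - mu0) ~ F(p, n - p) under H0.
hotelling_one <- function(X, mu0) {
  n <- nrow(X); p <- ncol(X)
  Sn <- cov(X) * (n - 1) / n                 # S_n, divisor n
  Fstat <- ((n - p) / p) * mahalanobis(colMeans(X), mu0, Sn)
  c(F = Fstat, p.value = pf(Fstat, p, n - p, lower.tail = FALSE))
}
set.seed(11)
X <- matrix(rnorm(50 * 2), ncol = 2)  # true mean is (0, 0)
hotelling_one(X, mu0 = c(0, 0))       # H0 true: large p-value expected
hotelling_one(X + 5, mu0 = c(0, 0))   # H0 badly violated: tiny p-value
```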
Case 2. Let $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ be two independent samples of independent random vectors from a $N_p(\mu_1, \Sigma)$ and a $N_p(\mu_2, \Sigma)$ distribution, respectively. Consider the test
\[
H_0: \mu_1 = \mu_2. \tag{7}
\]
Under $H_0$, the test statistic satisfies
\[
\frac{n_1 n_2}{n_1 + n_2}\, (\bar X - \bar Y)'\, S_p^{-1}\, (\bar X - \bar Y) \sim T^2(p, n_1 + n_2 - 2),
\]
where $S_p$ is given in (5). This is equivalent to
\[
\frac{n_1 + n_2 - p - 1}{p\, (n_1 + n_2 - 2)} \cdot \frac{n_1 n_2}{n_1 + n_2}\, (\bar X - \bar Y)'\, S_p^{-1}\, (\bar X - \bar Y) \sim F(p, n_1 + n_2 - p - 1).
\]
This provides us with a rejection region for the test (7).
Case 3. Assume we have $g$ data matrices from $g$ independent multivariate normal populations:

Sample   Size             Mean        Covariance   Distribution
$X_1$    $n_1 \times p$   $\bar x_1$  $S_1$        $N_p(\mu_1, \Sigma)$
$X_2$    $n_2 \times p$   $\bar x_2$  $S_2$        $N_p(\mu_2, \Sigma)$
...
$X_g$    $n_g \times p$   $\bar x_g$  $S_g$        $N_p(\mu_g, \Sigma)$

The global sample mean vector and sample covariance matrix are
\[
\bar x = \frac{1}{n} \sum_{i=1}^g n_i \bar x_i, \qquad S = \frac{1}{n - g} \sum_{i=1}^g n_i S_i, \qquad \text{with } n = \sum_{i=1}^g n_i.
\]
Consider the test
\[
H_0: \mu_1 = \mu_2 = \cdots = \mu_g. \tag{8}
\]
Let us introduce the following matrices:
\[
B = \sum_{i=1}^g n_i (\bar x_i - \bar x)(\bar x_i - \bar x)' \quad \text{(between-groups dispersion)},
\]
\[
W = \sum_{i=1}^g \sum_{k=1}^{n_i} (x_{ik} - \bar x_i)(x_{ik} - \bar x_i)' = \sum_{i=1}^g n_i S_i \quad \text{(intra-groups dispersion)}.
\]
Under $H_0$, $B \sim W_p(\Sigma, g - 1)$ and $W \sim W_p(\Sigma, n - g)$ are independent. The test statistic
\[
\Lambda = \frac{|W|}{|W + B|} \sim \Lambda(p, n - g, g - 1)
\]
can be approximated by the $F$ distribution via Rao's asymptotic approximation: if $\Lambda \sim \Lambda(p, a, b)$, then
\[
\frac{1 - \Lambda^{1/\beta}}{\Lambda^{1/\beta}} \cdot \frac{\alpha\beta - 2\gamma}{pb} \approx F(pb,\ \alpha\beta - 2\gamma),
\]
where
\[
\alpha = a + b - \frac{p + b + 1}{2}, \qquad \beta^2 = \frac{p^2 b^2 - 4}{p^2 + b^2 - 5}, \qquad \gamma = \frac{pb - 2}{4}.
\]
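$\Lambda$ can be cross-checked against base R's manova(), which reports Wilks' $\Lambda$ together with an $F$ approximation (a sketch on simulated data; the group structure and the mean shift are our own choices):

```r
set.seed(5)
g <- 3; p <- 2; ni <- 40
grp <- factor(rep(1:g, each = ni))
X <- matrix(rnorm(g * ni * p), ncol = p)
X[grp == 2, 1] <- X[grp == 2, 1] + 1        # shift the mean of group 2
fit <- manova(X ~ grp)
W <- crossprod(residuals(fit))              # intra-groups dispersion
B <- crossprod(sweep(fitted(fit), 2, colMeans(X)))  # between-groups dispersion
Lambda <- det(W) / det(W + B)               # Wilks' Lambda by hand
summary(fit, test = "Wilks")                # same Lambda, with an F approximation
```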
2.2.3 Mixture models

Let $k > 0$ be an integer. A $p$-dimensional random vector $X$ has a $k$-component finite mixture distribution if its probability density (or mass) function is given by
\[
f(x) = \sum_{j=1}^k \pi_j f_j(x), \tag{9}
\]
where $f_j$, $j = 1, \ldots, k$, are probability densities (or mass functions) and $0 \le \pi_j \le 1$, $j = 1, \ldots, k$, are constants such that $\sum_{j=1}^k \pi_j = 1$. The $f_j$ are the component densities of the mixture and the $\pi_j$ are the mixing proportions or weights.

In the definition of a mixture model, the number $k$ of components is considered fixed, but in many applications the value of $k$ is unknown and has to be inferred from the data.
The key to generating random vectors with density (9) is as follows. We define the discrete r.v. $Z$ taking values $1, 2, \ldots, k$ with probabilities $\pi_1, \pi_2, \ldots, \pi_k$, respectively, and suppose that the conditional density of $X$ given $Z = j$ is $f_j$. Then the unconditional density of $X$ is (9).

Equivalently, we can define the discrete random vector $Z = (Z_1, \ldots, Z_k)'$, with each $Z_j$ taking value 0 or 1, $\sum_{j=1}^k Z_j = 1$ and $\pi_j$ equal to the probability that component $Z_j$ of $Z$ is 1. Then $Z$ follows a multinomial distribution with parameters $(\pi_1, \ldots, \pi_k)$ and we suppose that $f_j$ is the conditional density of $X$ given that the $j$-th component of $Z$ is 1.
In many applications the component densities $f_j$ are specified to belong to some parametric family; the resulting model is called a parametric mixture. In particular, the component densities are frequently assumed to belong to the same parametric family, as in mixtures of Gaussian densities. Parametric mixture models can be viewed as a semiparametric compromise between a single parametric family (case $k = 1$) and a nonparametric model such as kernel density estimation (case $k = n$).
Example: Mixture of three bivariate normal distributions

```r
library(mvtnorm)
n <- 200                     # sample size
Size <- 1                    # number of trials per multinomial draw
Prob <- c(0.5, 0.3, 0.2)     # mixing weights
NumComp <- length(Prob)      # number of components in the mixture
C <- rmultinom(n, Size, Prob)    # component labels (one-hot columns)
SizeComp <- apply(C, 1, sum)     # sample size drawn from each component
X <- matrix(0, nrow = n, ncol = 2)
X[C[1, ] == 1, ] <- rmvnorm(n = SizeComp[1],
                            sigma = matrix(c(1, 0, 0, 1), ncol = 2))
X[C[2, ] == 1, ] <- rmvnorm(n = SizeComp[2], mean = c(3, 5),
                            sigma = matrix(c(3, 1, 1, 1), ncol = 2))
X[C[3, ] == 1, ] <- rmvnorm(n = SizeComp[3], mean = c(4, -3),
                            sigma = matrix(c(0.5, 0.1, 0.1, 2), ncol = 2))

panel.hist <- function(x, ...) {
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(usr[1:2], 0, 1.5))
  h <- hist(x, plot = FALSE)
  breaks <- h$breaks; nb <- length(breaks)
  y <- h$counts; y <- y / max(y)
  rect(breaks[-nb], 0, breaks[-1], y, col = "cyan", ...)
}
pairs(X, cex = 1.5, pch = 20, bg = "light blue",
      diag.panel = panel.hist, cex.labels = 2, font.labels = 2)
```
[Figure: scatterplot matrix, with marginal histograms on the diagonal, of a sample from the three-component bivariate normal mixture.]
Thus, a mixture is a candidate distribution to model a population with several subpopulations.

Example: Times between Old Faithful eruptions ($Y$, var 2) and duration of eruptions ($X$, var 1). [Figure: scatterplot of the two variables, showing two clear groups of observations.]
2.3. Maximum likelihood estimation

Let $x_1, \ldots, x_n$ denote a sample from a multivariate parametric model with density (or mass) function $f(x; \psi)$, where $\psi = (\psi_1, \ldots, \psi_k)'$ denotes the vector of unknown parameters. The maximum likelihood estimator (m.l.e.) of $\psi$ is $\hat\psi$, the maximizer of the likelihood function
\[
L(\psi; x_1, \ldots, x_n) = \prod_{i=1}^n f(x_i; \psi).
\]

MLE for the Gaussian distribution

Let $X_1, \ldots, X_n$ be a random sample from a normal population with mean $\mu$ and covariance $\Sigma$. Then the m.l.e. of $\mu$ and $\Sigma$ are, respectively,
\[
\hat\mu = \bar X \quad \text{and} \quad \hat\Sigma = S_n = \frac{1}{n} \sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)'.
\]
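In base R the m.l.e. differs from colMeans()/cov() only in the covariance divisor ($n$ instead of $n - 1$), which the following sketch makes explicit (simulated data):

```r
set.seed(2)
n <- 500
X <- cbind(rnorm(n, mean = 1), rnorm(n, mean = -1))
mu_hat <- colMeans(X)                 # xbar
Sigma_hat <- cov(X) * (n - 1) / n     # S_n: rescale cov() from divisor n-1 to n
# Direct computation from the definition of S_n:
Xc <- sweep(X, 2, mu_hat)             # center the columns
Sn_direct <- crossprod(Xc) / n
all.equal(Sigma_hat, Sn_direct, check.attributes = FALSE)  # TRUE
```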
Proof (Johnson and Wichern 2007): The likelihood is
\[
L(\mu, \Sigma) = \prod_{i=1}^n \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-(x_i - \mu)' \Sigma^{-1} (x_i - \mu)/2}
= \frac{1}{(2\pi)^{np/2} |\Sigma|^{n/2}}\, e^{-\sum_{i=1}^n (x_i - \mu)' \Sigma^{-1} (x_i - \mu)/2}.
\]
The m.l.e. of $\mu$ is the minimizer of
\[
\begin{aligned}
\sum_{i=1}^n (x_i - \mu)' \Sigma^{-1} (x_i - \mu)
&= \sum_{i=1}^n \mathrm{tr}\big[ \Sigma^{-1} (x_i - \mu)(x_i - \mu)' \big] \\
&= \mathrm{tr}\Big[ \Sigma^{-1} \Big( \sum_{i=1}^n (x_i - \bar x)(x_i - \bar x)' + n (\bar x - \mu)(\bar x - \mu)' \Big) \Big] \\
&= \sum_{i=1}^n \mathrm{tr}\big[ \Sigma^{-1} (x_i - \bar x)(x_i - \bar x)' \big] + n\, (\bar x - \mu)' \Sigma^{-1} (\bar x - \mu).
\end{aligned}
\]
Since $\Sigma^{-1}$ is positive definite, the distance $(\bar x - \mu)' \Sigma^{-1} (\bar x - \mu) > 0$ unless $\mu = \bar x$. Thus the m.l.e. of $\mu$ is $\hat\mu = \bar x$. It remains to maximize over $\Sigma$
\[
L(\hat\mu, \Sigma) = \frac{1}{(2\pi)^{np/2} |\Sigma|^{n/2}}\, e^{-\mathrm{tr}\left[ \Sigma^{-1} \sum_{i=1}^n (x_i - \bar x)(x_i - \bar x)' \right]/2}.
\]
Auxiliary result: Given a $p \times p$ symmetric positive definite matrix $B$ and a scalar $b > 0$, it holds that, for all positive definite $\Sigma$ ($p \times p$),
\[
\frac{1}{|\Sigma|^b}\, e^{-\mathrm{tr}(\Sigma^{-1} B)/2} \le \frac{1}{|B|^b}\, (2b)^{pb}\, e^{-bp},
\]
with equality if $\Sigma = \frac{1}{2b} B$. We apply this auxiliary result with $b = n/2$ and $B = \sum_{i=1}^n (x_i - \bar x)(x_i - \bar x)'$ and conclude that the maximum occurs at $\hat\Sigma = S_n$.
MLE for a parametric mixture model

Consider a parametric mixture model
\[
f(x; \psi) = \sum_{j=1}^k \pi_j f_j(x; \theta_j), \tag{10}
\]
where $\psi = (\pi_1, \ldots, \pi_{k-1}, \xi')'$ and $\xi$ is the vector containing all the parameters in $\theta_1, \ldots, \theta_k$ known a priori to be different. We want to obtain the m.l.e. of the parameters in model (10) based on a sample $x_1, \ldots, x_n$ from $f$. The log-likelihood for $\psi$ is
\[
\log L(\psi; x_1, \ldots, x_n) = \sum_{i=1}^n \log\Big( \sum_{j=1}^k \pi_j f_j(x_i; \theta_j) \Big).
\]
Computing the m.l.e. would require solving the likelihood equation
\[
\frac{\partial \log L(\psi)}{\partial \psi} = 0,
\]
not an easy task (see Section 2.8 in McLachlan and Peel 2000).
The Expectation-Maximization (EM) algorithm of Dempster et al. (1977) provides an iterative scheme for computing the m.l.e. of the parameters $\psi$ in a parametric mixture. The EM algorithm is designed for incomplete data, so the key is to consider the mixture data $x_1, \ldots, x_n$ as incomplete, since the associated component label vectors $z_1, \ldots, z_n$ are not available. Here $z_i = (z_{i1}, \ldots, z_{ik})'$ is a $k$-dimensional vector with $z_{ij} = 1$ or 0 according to whether $x_i$ did or did not arise from the $j$-th component of the mixture, $i = 1, \ldots, n$, $j = 1, \ldots, k$.

The complete data sample is therefore declared to be $x_{c1}, \ldots, x_{cn}$, where $x_{ci} = (x_i', z_i')'$. Then the complete-data log-likelihood for $\psi$ is given by
\[
\log L_c(\psi) = \sum_{i=1}^n \sum_{j=1}^k z_{ij} \big( \log \pi_j + \log f_j(x_i; \theta_j) \big).
\]
E-step: The algorithm starts with an initial guess $\psi^{(0)}$ for $\psi$. In general, denote by $\psi^{(g)}$ the approximate value of $\psi$ after the $g$-th iteration of the algorithm. The E-step requires computing the conditional expectation of $\log L_c(\psi)$ given the sample $x_1, \ldots, x_n$, under the current approximation for $\psi$:
\[
Q(\psi; \psi^{(g)}) = E_{\psi^{(g)}}\big( \log L_c(\psi) \mid x_1, \ldots, x_n \big)
= \sum_{i=1}^n \sum_{j=1}^k E_{\psi^{(g)}}(z_{ij} \mid x_1, \ldots, x_n) \big( \log \pi_j + \log f_j(x_i; \theta_j) \big).
\]
It can be proved that, for $i = 1, \ldots, n$, $j = 1, \ldots, k$,
\[
E_{\psi^{(g)}}(z_{ij} \mid x_1, \ldots, x_n) = P_{\psi^{(g)}}\{ z_{ij} = 1 \mid x_1, \ldots, x_n \}
= \frac{\pi_j^{(g)} f_j(x_i; \theta_j^{(g)})}{\sum_{l=1}^k \pi_l^{(g)} f_l(x_i; \theta_l^{(g)})} := \tau_{ij}^{(g)}.
\]
This is the posterior probability that the $i$-th member of the sample, $X_i$, belongs to the $j$-th component of the mixture.
Then
\[
Q(\psi; \psi^{(g)}) = \sum_{i=1}^n \sum_{j=1}^k \tau_{ij}^{(g)} \big( \log \pi_j + \log f_j(x_i; \theta_j) \big).
\]
M-step: The updated estimate $\psi^{(g+1)}$ is obtained as the global maximizer of $Q(\psi; \psi^{(g)})$ with respect to $\psi$. Specifically,
\[
\pi_j^{(g+1)} = \frac{1}{n} \sum_{i=1}^n \tau_{ij}^{(g)},
\]
and $\xi^{(g+1)}$ is obtained as an appropriate root of
\[
\sum_{i=1}^n \sum_{j=1}^k \tau_{ij}^{(g)}\, \frac{\partial \log f_j(x_i; \theta_j)}{\partial \xi} = 0.
\]
The E- and M-steps are repeatedly alternated until the difference $L(\psi^{(g+1)}) - L(\psi^{(g)})\ (\ge 0)$ is small enough.
In the case of a normal mixture with heteroscedastic components,
\[
f(x; \psi) = \sum_{j=1}^k \pi_j f(x; \mu_j, \Sigma_j),
\]
the M-step update $\xi^{(g+1)}$ has a closed form:
\[
\mu_j^{(g+1)} = \frac{\sum_{i=1}^n \tau_{ij}^{(g)} x_i}{\sum_{i=1}^n \tau_{ij}^{(g)}}
\quad \text{and} \quad
\Sigma_j^{(g+1)} = \frac{\sum_{i=1}^n \tau_{ij}^{(g)} (x_i - \mu_j^{(g+1)})(x_i - \mu_j^{(g+1)})'}{\sum_{i=1}^n \tau_{ij}^{(g)}}.
\]
Remark: We have assumed that the number $k$ of components in the mixture fitted to the sample is known or fixed in advance. There are techniques for choosing the optimal number of components (see, e.g., Chapter 6 in McLachlan and Peel 2000; Claeskens and Hjort 2008).
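The E- and M-steps above can be sketched for a two-component univariate normal mixture in base R (the function name, starting values and simulated data are our own choices; a production fit would also monitor the log-likelihood and guard against degenerate components):

```r
# Minimal EM for a two-component univariate normal mixture.
em_norm2 <- function(x, pi1 = 0.5, mu = c(-1, 1), sd = c(1, 1), iter = 200) {
  for (g in seq_len(iter)) {
    # E-step: posterior probabilities tau_i1 (and tau_i2 = 1 - tau_i1)
    d1 <- pi1 * dnorm(x, mu[1], sd[1])
    d2 <- (1 - pi1) * dnorm(x, mu[2], sd[2])
    tau <- d1 / (d1 + d2)
    # M-step: closed-form updates for weights, means and variances
    pi1 <- mean(tau)
    mu  <- c(sum(tau * x) / sum(tau), sum((1 - tau) * x) / sum(1 - tau))
    sd  <- sqrt(c(sum(tau * (x - mu[1])^2) / sum(tau),
                  sum((1 - tau) * (x - mu[2])^2) / sum(1 - tau)))
  }
  list(pi = c(pi1, 1 - pi1), mu = mu, sd = sd)
}
set.seed(4)
x <- c(rnorm(300, -2, 1), rnorm(700, 3, 1))  # true weights 0.3 and 0.7
em_norm2(x)  # estimates close to the true parameters
```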
We can use the R package mclust for normal mixture fitting to a data set.

Example: Times between Old Faithful eruptions ($Y$) and duration of eruptions ($X$).

```r
library(mclust)
Datos <- read.table("Datos-geyser.txt", header = TRUE)
XY <- cbind(Datos$X, Datos$Y)
# Normal mixture fitting with 2 components
faithfulDens <- densityMclust(XY, G = 2, modelNames = "VVV")
summary(faithfulDens, parameters = TRUE)
```

```
Density estimation via Gaussian finite mixture modeling

Mclust VVV (ellipsoidal, varying volume, shape, and orientation)
model with 2 components:

 log.likelihood   n  df  BIC  ICL
```
The summary also reports the clustering table, the mixing probabilities, and the estimated component means and covariance matrices.
```r
plot(faithfulDens, XY, xlab = "X", ylab = "Y")
```

[Figure: contour plot of the fitted two-component mixture density over the scatterplot of the data.]
```r
plot(faithfulDens, type = "persp", col = grey(0.8))
```

[Figure: perspective plot of the fitted mixture density.]
References

Claeskens, G. and Hjort, N.L. (2008). Model Selection and Model Averaging. Cambridge University Press.

Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39.

Johnson, R.A. and Wichern, D.W. (2007). Applied Multivariate Statistical Analysis. Prentice Hall.

McAssey, M.P. (2013). An empirical goodness-of-fit test for multivariate distributions. Journal of Applied Statistics, 40.

McLachlan, G. and Peel, D. (2000). Finite Mixture Models. Wiley.

Peña, D. (2002). Análisis de datos multivariantes. McGraw-Hill.

Székely, G.J. and Rizzo, M.L. (2005). A new test for multivariate normality. Journal of Multivariate Analysis, 93.

Székely, G.J. and Rizzo, M.L. (2013). Energy statistics: a class of statistics based on distances. Journal of Statistical Planning and Inference, 143.

Székely, G.J., Rizzo, M.L. and Bakirov, N.K. (2007). Measuring and testing independence by correlation of distances. Annals of Statistics, 35.
More information