

Chapter 9

MULTIVARIATE DISTRIBUTIONS

John Wishart (1898-1956). British statistician. Wishart was an assistant to Pearson at University College and to Fisher at Rothamsted. In 1928 he derived the distribution which bears his name. As a professor of Statistics and Agriculture at Cambridge, he made outstanding contributions to experimental design as well. He combined his academic work with that of a consultant for international organizations involved in the application of statistical methods to agriculture.

9.1 BASIC CONCEPTS

A central problem in data analysis is deciding whether the properties found in a sample can be generalized to the population from which it was taken. In order to carry out this extrapolation we need to build a model of the system which generates the data, meaning that we assume a probability distribution for the random variable in the population. This chapter reviews some basic concepts for constructing multivariate statistical models and presents the distributions which will be used for inference in the following chapters.

9.1.1 Vector random variables

A vector random variable is the result of observing $p$ characteristics of an item in a population. For example, if we observe the age and weight of students at a university we will have the values of a bivariate random variable; if we observe the number of workers, the sales and the profit of the companies in a sector, we will have a trivariate vector.

We say that we have the joint distribution of a random vector when the following are specified:

1. The sample space of the possible values. Representing each value by a point in the space $R^p$ of dimension $p$, the sample space is, in general, a subset of this space.

2. The probabilities of each possible result of the sample space.

We say that a $p$-dimensional vector variable is discrete when each of the $p$ scalar variables which comprise it is discrete as well. For example, eye and hair color make up a discrete bivariate variable. Analogously, we say that the variable is continuous if its components are. When some of its components are discrete and others are continuous we say that the variable is mixed. For example, the variable formed by gender (0 = male, 1 = female), height and weight is a mixed trivariate variable. In this chapter, for the sake of simplicity and except when otherwise indicated, we will assume that the vector variable is continuous.

9.1.2 Joint distribution

The joint distribution function of a vector random variable, $F(x)$, is defined at the point $x_0 = (x_1^0, \ldots, x_p^0)$ by

$$F(x_0) = P(x \le x_0) = P(x_1 \le x_1^0, \ldots, x_p \le x_p^0),$$

where $P(x \le x_0)$ represents the probability that the variable takes values less than or equal to the particular value under consideration, $x_0$. Thus the distribution function accumulates the probabilities of the values which are less than or equal to the point considered, and it is therefore non-decreasing.

Although the distribution function is of great theoretical interest, in practice it is more useful to work with the density function for continuous variables or with the probability function for discrete variables. Let $p(x_0)$ be the probability function of a discrete variable, defined by $p(x_0) = P(x = x_0) = P(x_1 = x_1^0, \ldots, x_p = x_p^0)$. We say that the vector $x$ is absolutely continuous if there is a density function, $f(x)$, which satisfies

$$F(x_0) = \int_{-\infty}^{x_0} f(x)\,dx, \qquad (9.1)$$

where $dx = dx_1 \cdots dx_p$ and the integral is a multiple integral in dimension $p$. The probability density has the usual interpretation of a density: mass per unit of volume. Thus the joint density function must verify:

(a) $f(x) = f(x_1, \ldots, x_p) \ge 0$. The density is always non-negative.

(b) $\int f(x)\,dx = \int \cdots \int f(x_1, \ldots, x_p)\,dx_1 \cdots dx_p = 1$.

If we multiply the density at each point by the element of volume in $p$ dimensions (if $p = 2$ it will be the area of a rectangle, if $p = 3$ the volume of a parallelepiped, etc.) and we sum (integrate) over all points with non-zero density, we obtain the total probability mass, which is standardized to one.

The probability of outcomes defined as subsets of the sample space is equal to the total probability corresponding to the subset. These probabilities are calculated by integrating the density function over the subset. For example, for a bivariate variable and the outcome $A = (a < x_1 \le b;\; c < x_2 \le d)$:

$$P(A) = \int_a^b \int_c^d f(x_1, x_2)\,dx_2\,dx_1,$$

whereas, in general,

$$P(A) = \int_A f(x)\,dx.$$

In this chapter, in order to simplify the notation, we will use the letter $f$ when referring to the density function of any variable and will indicate the variable by the argument of the function, so that $f(x_1)$ is the density function of the variable $x_1$ and $f(x_1, x_2)$ is the density function of the bivariate variable $(x_1, x_2)$.

9.1.3 Marginal and conditional distributions

Given a $p$-dimensional random vector $(x_1, \ldots, x_p)$, we call the univariate distribution of each component $x_i$ its marginal distribution. In the marginal distribution each component is considered individually, ignoring the values of the remaining components. For example, for bivariate continuous variables the marginal distributions are obtained as

$$f(x_1) = \int f(x_1, x_2)\,dx_2, \qquad (9.2)$$

$$f(x_2) = \int f(x_1, x_2)\,dx_1, \qquad (9.3)$$

and represent the density function of each variable ignoring the values taken by the other. As mentioned earlier, the letter $f$ refers generically to the density function. The functions $f(x_1)$ and $f(x_1, x_2)$ are in general totally distinct and share only the fact that they are density functions, so that $f(\cdot) \ge 0$, $\int f(x_1)\,dx_1 = 1$ and $\int\!\int f(x_1, x_2)\,dx_1\,dx_2 = 1$.

In order to justify (9.2), we calculate the probability that the variable $x_1$ belongs to an interval $(a, b]$ starting from the joint distribution. Thus:

$$P(a < x_1 \le b) = P(a < x_1 \le b;\; -\infty < x_2 < \infty) = \int_a^b dx_1 \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 = \int_a^b f(x_1)\,dx_1,$$

which justifies (9.2). We can see that in this equation $x_1$ is any fixed value. Suppose that the accuracy of the measurement of $x_1$ is $\Delta x_1$, meaning that we will say that a value $x_1$ has occurred if a value is observed in the interval $x_1 \pm \Delta x_1/2$. The probability of this value is the density value at the center of the interval, $f(x_1)$, multiplied by the length of the base, $\Delta x_1$. If we multiply both sides of equation (9.2) by the constant $\Delta x_1$, we have on the left-hand side $f(x_1)\Delta x_1$, which is the probability of this value of $x_1$ calculated with its univariate distribution. On the right-hand side we have the sum of the probabilities of all the pairs of possible values $(x_1, x_2)$ when $x_1$ is fixed and $x_2$ takes all possible values. These probabilities are given by $f(x_1, x_2)\,dx_2\,\Delta x_1$, and summing over all possible values of $x_2$ we again obtain the probability of the value $x_1$.

If $x = (x_1, x_2)$, where $x_1$ and $x_2$ are themselves vector variables, the conditional distribution of $x_1$ for a given value of the variable $x_2 = x_2^0$ is defined by

$$f(x_1 \mid x_2^0) = \frac{f(x_1, x_2^0)}{f(x_2^0)}, \qquad (9.4)$$

assuming that $f(x_2^0) > 0$. This definition is consistent with the concept of conditional probability and with the density function of a variable. Assume, in order to simplify, that both variables are scalar. Then, multiplying both sides by $\Delta x_1$, we have

$$f(x_1 \mid x_2^0)\,\Delta x_1 = \frac{f(x_1, x_2^0)\,\Delta x_1\,\Delta x_2}{f(x_2^0)\,\Delta x_2},$$

and the first member represents the conditional probability, which is expressed as the ratio of the joint probability to the marginal probability.

From this definition it can be deduced that

$$f(x_1, x_2) = f(x_2 \mid x_1)\,f(x_1). \qquad (9.5)$$

The marginal distribution of $x_2$ can then be calculated, according to (9.3) and (9.5), as

$$f(x_2) = \int f(x_2 \mid x_1)\,f(x_1)\,dx_1, \qquad (9.6)$$

which has a clear intuitive interpretation.

If we multiply both sides of (9.6) by $\Delta x_2$, the element of volume, then on the left-hand side we have $f(x_2)\Delta x_2$, the probability of the given value of $x_2$. Formula (9.6) tells us that this probability can be calculated by first obtaining the probability of the value $x_2$ for each possible value of $x_1$, given by $f(x_2 \mid x_1)\Delta x_2$, and then multiplying each of these values by the probability of $x_1$, $f(x_1)\,dx_1$, which is equivalent to averaging the conditional probabilities with respect to the distribution of $x_1$.

As a result of (9.5) and (9.6), the conditional distribution $f(x_1 \mid x_2)$ can be expressed as

$$f(x_1 \mid x_2) = \frac{f(x_2 \mid x_1)\,f(x_1)}{\int f(x_2 \mid x_1)\,f(x_1)\,dx_1}, \qquad (9.7)$$

which is Bayes' theorem for density functions and constitutes a fundamental tool in Bayesian inference, which we will study in Chapter 11. For discrete variables the concepts are similar, but the integrals are replaced with sums, as shown in the example below.

Example: Table 9.1 gives the joint distribution of the discrete random variables $x_1$, the vote for one of four possible political parties, whose four possible values are $P_1$, $P_2$, $P_3$ and $P_4$, and $x_2$, the level of voter income, which takes the three values H (high), M (medium) and L (low). We calculate the marginal distributions, the conditional distribution of the votes for people of low income, and the conditional distribution of income for those who voted for party $P_4$.

Table 9.1. Joint distribution of votes ($P_1$ to $P_4$, rows) and income (H, M, L, columns) in a population.

In order to calculate the marginal distributions, we add a row and a column to the table in which we include the totals resulting from summing the rows and columns; this gives Table 9.2. For example, the marginal distribution of income indicates that the probability of high income is .2, of middle income .6 and of low income .2. We see that the marginal distributions are the totals obtained in the margins of the table (which explains their name) by adding the joint probabilities by rows and columns.

Table 9.2. Joint and marginal distribution of votes and income in a population.

In order to calculate the conditional distribution of votes for people with low income, we divide each cell of the low-income column by the total of that column. The resulting distribution is shown in Table 9.3.

  P_1   P_2   P_3   P_4
  .05   .20   .35   .40

Table 9.3. Conditional distribution of votes for people with low income.

For example, the value .05 is the result of dividing .01, the joint probability of low income and voting for $P_1$, by the marginal probability of low income, .2. This table indicates that the party preferred by people with low income is $P_4$, with 40% of the votes, followed by $P_3$ with 35%.

Table 9.4 gives the conditional distribution of income for the voters of party $P_4$. The most numerous group of voters for this party is the middle-income group (52.63%), followed by low income (42.11%) and high income (5.26%).

        H       M       L       Total
  P_4   .0526   .5263   .4211   1

Table 9.4. Conditional distribution of income for people who voted for $P_4$.
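The calculation can be reproduced numerically. The sketch below is only illustrative: the numerical entries of Table 9.1 did not survive the transcription, so the joint probabilities used here are hypothetical stand-ins chosen to be consistent with the marginals and conditional distributions quoted in the text.

```python
import numpy as np

# Hypothetical joint probability table p(vote, income); rows P1..P4, columns H, M, L.
joint = np.array([
    [0.04, 0.15, 0.01],   # P1
    [0.05, 0.15, 0.04],   # P2
    [0.10, 0.20, 0.07],   # P3
    [0.01, 0.10, 0.08],   # P4
])

marginal_votes = joint.sum(axis=1)    # sum over income -> P(vote)
marginal_income = joint.sum(axis=0)   # sum over votes  -> P(income)

cond_votes_given_low = joint[:, 2] / marginal_income[2]   # P(vote | income = L)
cond_income_given_P4 = joint[3, :] / marginal_votes[3]    # P(income | vote = P4)

print(marginal_income)                   # [0.2 0.6 0.2]
print(cond_votes_given_low)              # [0.05 0.2  0.35 0.4 ]   (Table 9.3)
print(cond_income_given_P4.round(4))     # [0.0526 0.5263 0.4211]  (Table 9.4)
```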

9.1.4 Independence

A fundamental concept in the study of random variables is that of independence. We say that two random vectors $x_1$, $x_2$ are independent if the value taken by one has no influence on the value taken by the other and vice versa. In other words, the distribution of $x_2$ does not depend on $x_1$ and is the same for any value of $x_1$. This is expressed mathematically as

$$f(x_2 \mid x_1) = f(x_2), \qquad (9.8)$$

which indicates that the conditional distribution is identical to the marginal one. Using (9.5), an equivalent definition of independence between two random vectors $x_1$, $x_2$ is

$$f(x_1, x_2) = f(x_1)\,f(x_2), \qquad (9.9)$$

meaning that two random vectors are independent if their joint distribution (their joint probability) is the product of the marginal distributions (of the individual probabilities).

In general, we say that the random variables $x_1, \ldots, x_p$ with joint density $f(x_1, \ldots, x_p)$ are independent if

$$f(x_1, \ldots, x_p) = f(x_1)\,f(x_2) \cdots f(x_p). \qquad (9.10)$$

Joint independence is a very strong condition: if $x_1, \ldots, x_p$ are independent, the same is true for any subset of variables $(x_1, \ldots, x_h)$ with $h \le p$, as well as for any set of functions of the individual variables, $g_1(x_1), \ldots, g_p(x_p)$. When the variables are independent we gain nothing from a joint study of them, and it is advisable to study them individually. It is easy to prove that if the variables $x_1$ and $x_2$ are independent and we build new variables $y_1 = g_1(x_1)$, $y_2 = g_2(x_2)$, where the first is a function of $x_1$ only and the second of $x_2$ only, then the variables $y_1$, $y_2$ are also independent.

9.1.5 The curse of dimensionality

The curse of dimensionality is a term coined by the mathematician R. Bellman to describe how the complexity of a problem increases as the dimension of the variables involved increases. In multivariate statistical analysis this problem appears in various ways. First, as the dimension increases the space becomes increasingly empty, making any inference from the data more difficult. This is a consequence of the fact that when the dimension of the space increases so does its volume (or hypervolume, in general), and since the total probability mass is one, the density of the random variable must diminish. As a result, the probability density of a high-dimensional random variable is very low over most of the space, or, equivalently, the space grows progressively emptier.

To illustrate the problem, assume that the density of a $p$-dimensional variable is uniform on the hypercube $[0, 1]^p$ and that all components are independent. For example, samples of this variable can be produced by taking sets of $p$ random numbers between zero and one. Let us consider the probability that a random value of this variable falls in the hypercube $[0, 0.9]^p$. For $p = 1$, the scalar case, this probability is $0.9$; for $p = 10$ it drops to $0.9^{10} = 0.35$, and for $p = 30$ it is $0.9^{30} = 0.04$. We can see that as the dimension of the space increases, any fixed set progressively empties.

A second problem is that the number of parameters needed to describe the data also increases with the dimension. To represent the mean and the covariance matrix in dimension $p$ we need $p + p(p+1)/2 = p(p+3)/2$ parameters, which is of order $p^2$. Thus the complexity of the data, measured by the number of parameters needed to represent them, grows in this case with the square of the dimension of the space.
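A quick computation illustrates both effects just described: the probability mass that the uniform distribution on $[0,1]^p$ assigns to the hypercube $[0, 0.9]^p$, and the number of parameters $p(p+3)/2$ needed for a mean vector plus a covariance matrix.

```python
# Probability of the hypercube [0, 0.9]^p under a uniform density on [0, 1]^p,
# and the parameter count p(p+3)/2 for a mean vector plus covariance matrix.
for p in (1, 10, 30):
    print(p, round(0.9 ** p, 2), p * (p + 3) // 2)
# 1  0.9  2
# 10 0.35 65
# 30 0.04 495
```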

For example, a sample of size 100 is a large sample for a one-dimensional variable, but it is quite small for a vector variable with $p = 20$. As a general rule, multivariate procedures need a ratio $n/p > 10$, and it is preferable for this ratio to be greater than 20. The result of an increase in dimension is an increase in the uncertainty of the problem: the joint forecast of the values of the variable becomes more difficult. In practice this problem diminishes if the variables are strongly dependent among themselves, since the probability density is then concentrated in certain regions of the space, defined by the dependence relationship, instead of being spread throughout the sample space. This dependence can be exploited, extending the methods seen in earlier chapters, in order to reduce the dimension of the space of the variables and thus avoid the curse of dimensionality.

9.2 PROPERTIES OF VECTOR VARIABLES

9.2.1 Mean vector

We use the term expectation or mean vector, $\mu$, of a multidimensional random variable $x$ to refer to the vector whose components are the expectations or means of the components of the random variable. We write the mean vector as

$$\mu = E[x], \qquad (9.11)$$

where it is understood that the expectation acting on a vector or matrix is the result of applying this operator to (taking the means of) each of its components. If the variable is continuous,

$$\mu = E[x] = \int x f(x)\,dx.$$

The expectation is a linear operator, meaning that for any matrix $A$ and vector $b$ we have

$$E[Ax + b] = A\,E[x] + b.$$

If $x = (x_1, x_2)$ we also have that, for scalars $a$ and $b$,

$$E[a x_1 + b x_2] = a\,E[x_1] + b\,E[x_2],$$

and if $x_1$ and $x_2$ are independent,

$$E[x_1 x_2] = E[x_1]\,E[x_2].$$

9.2.2 Expectation of a function

Generalizing the idea of expectation, if we have a scalar function $y = g(x)$ of a vector of random variables, the mean value of this function is calculated as

$$E[y] = \int y f(y)\,dy = \int \cdots \int g(x)\,f(x_1, \ldots, x_p)\,dx_1 \cdots dx_p. \qquad (9.12)$$

The first integral takes into account that $y$ is scalar, and if we know its density function $f(y)$ the expectation can be calculated in the usual way. The second shows that it is not necessary to compute $f(y)$ in order to determine the average value of $g(x)$: it is enough to weight the possible values by their probabilities. This definition is consistent, and it is easy to check that both methods lead to the same result. If $x = (x_1, x_2)$ and we define $y_1 = g_1(x_1)$, $y_2 = g_2(x_2)$, then if $x_1$ and $x_2$ are independent

$$E[y_1 y_2] = E(g_1(x_1))\,E(g_2(x_2)).$$

9.2.3 Variance and covariance matrix

We use the term covariance matrix of a random vector $x = (x_1, \ldots, x_p)$ of $R^p$ with mean vector $\mu = (\mu_1, \ldots, \mu_p)$ to refer to the square matrix of order $p$ given by

$$V_x = E[(x - \mu)(x - \mu)']. \qquad (9.13)$$

The matrix $V_x$ contains in its diagonal the variances of the components, which are denoted by $\sigma_i^2$. Outside the diagonal it contains the covariances between pairs of variables, denoted by $\sigma_{ij}$. The covariance matrix is symmetric and positive semidefinite. This means that for any vector $\omega$ it holds that $\omega' V_x \omega \ge 0$. In order to prove this property we define the one-dimensional variable

$$y = (x - \mu)'\omega,$$

where $\omega$ is an arbitrary vector of $R^p$. The variable $y$ has expectation zero, because $E(y) = E[(x - \mu)]'\omega = 0$, and its variance must be non-negative:

$$var(y) = E[y^2] = \omega' E[(x - \mu)(x - \mu)']\,\omega = \omega' V_x \omega \ge 0.$$
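The property can also be checked numerically. The following minimal sketch, using a simulated sample, verifies that $\omega' V_x \omega \ge 0$ for an arbitrary vector $\omega$ and that all eigenvalues of a covariance matrix are non-negative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))        # a sample from a 3-dimensional random vector
V = np.cov(X, rowvar=False)           # sample covariance matrix

omega = rng.normal(size=3)            # an arbitrary direction
print(omega @ V @ omega >= 0)                      # True: the quadratic form is non-negative
print(np.all(np.linalg.eigvalsh(V) >= -1e-12))     # True: eigenvalues >= 0 up to rounding
```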

We call the mean variance the average of the variances, $tr(V_x)/p$; the generalized variance, $|V_x|$; and the effective variance, $VP = |V_x|^{1/p}$, which is a global measure of the joint variability of all the variables that takes their dependence structure into account. The interpretation of these measures is similar to that studied in Chapter 3 for data distributions.

9.2.4 Transformations of random vectors

When working with density functions of random vectors it is important to remember that, as in the univariate case, the density function has dimensions: if $p = 1$, the univariate case, it is probability per unit of length; if $p = 2$, probability per unit of area; if $p = 3$, per unit of volume; and if $p > 3$, per unit of hypervolume. Therefore, if we change the units of measurement of the variables, the density function must also be modified. In general, let $x$ be a vector of $R^p$ with density function $f_x(x)$ and let $y$ be another vector of $R^p$ defined by the transformation

$$y_1 = g_1(x_1, \ldots, x_p), \;\ldots,\; y_p = g_p(x_1, \ldots, x_p),$$

where we assume that the inverse functions $x_1 = h_1(y_1, \ldots, y_p), \ldots, x_p = h_p(y_1, \ldots, y_p)$ exist and that all the functions involved are differentiable. Then it can be proved that the density function of the vector $y$ is given by

$$f_y(y) = f_x(x)\left|\frac{dx}{dy}\right|, \qquad (9.14)$$

where we use $f_y$ and $f_x$ to represent the density functions of the variables $y$ and $x$ in order to avoid confusion. The term $|dx/dy|$ is the Jacobian of the transformation (which adjusts the probability for the change of scale of the measurement), given by the determinant

$$\frac{dx}{dy} = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \cdots & \dfrac{\partial x_1}{\partial y_p} \\ \vdots & & \vdots \\ \dfrac{\partial x_p}{\partial y_1} & \cdots & \dfrac{\partial x_p}{\partial y_p} \end{vmatrix},$$

which we assume is different from zero in the range of the transformation. An important case is that of linear transformations of the variable.

If we take $y = Ax$, where $A$ is a non-singular square matrix, the derivatives of the components of $x$ with respect to $y$ are obtained from $x = A^{-1}y$, and are thus given by the elements of the matrix $A^{-1}$. The Jacobian of the transformation is $|A^{-1}| = |A|^{-1}$, and the density function of the new variable $y$ is

$$f_y(y) = f_x(A^{-1}y)\,|A|^{-1}, \qquad (9.15)$$

an expression which indicates that, in order to obtain the density function of the variable $y$, we substitute $A^{-1}y$ into the density function of the variable $x$ and divide the result by the determinant of the matrix $A$.

9.2.5 Expectations of linear transformations

Let $x$ be a random vector of dimension $p$ and define a new random vector $y$ of dimension $m$ ($m \le p$) by

$$y = Ax, \qquad (9.16)$$

where $A$ is a rectangular matrix of dimensions $m \times p$. Letting $\mu_x$, $\mu_y$ be the mean vectors and $V_x$, $V_y$ the covariance matrices, we have

$$\mu_y = A\mu_x, \qquad (9.17)$$

which follows immediately by taking expectations in (9.16). Also,

$$V_y = A V_x A', \qquad (9.18)$$

where $A'$ is the transpose of $A$. Indeed, applying the definition of the covariance matrix and equations (9.16) and (9.17),

$$V_y = E[(y - \mu_y)(y - \mu_y)'] = E[A(x - \mu_x)(x - \mu_x)'A'] = A V_x A'.$$

Example: Clients of a transportation service evaluated the following aspects: punctuality ($x_1$), quickness ($x_2$) and cleanliness ($x_3$). The means, on a scale of zero to ten, were 7, 8 and 8.5 respectively, with variance covariance matrix $V_x$. Two indicators of service quality are built. The first is the average of the three scores, and the second is the difference between the average of punctuality and quickness, which indicates the reliability of the service, and cleanliness, which indicates comfort. Calculate the mean vector and covariance matrix of these two indicators.

The expression for the first indicator is

$$y_1 = \frac{x_1 + x_2 + x_3}{3}$$

and for the second

$$y_2 = \frac{x_1 + x_2}{2} - x_3.$$

These two equations can be written in matrix form as

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 1/3 & 1/3 & 1/3 \\ 1/2 & 1/2 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$

The mean vector is

$$\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} = \begin{bmatrix} 1/3 & 1/3 & 1/3 \\ 1/2 & 1/2 & -1 \end{bmatrix}\begin{bmatrix} 7 \\ 8 \\ 8.5 \end{bmatrix} = \begin{bmatrix} 7.83 \\ -1 \end{bmatrix},$$

and the value 7.83 is a global measure of the average quality of the service, while minus one is the reliability/comfort indicator. The variance covariance matrix is obtained as $V_y = A V_x A'$, with $A$ the coefficient matrix above; the result indicates that the variability of both indicators is similar and that they are negatively related, since their covariance is negative.
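The computation of (9.17) and (9.18) for this example can be sketched as follows. The numerical entries of $V_x$ did not survive the transcription, so the matrix below is only a hypothetical stand-in; the means and the coefficient matrix are those of the example.

```python
import numpy as np

A = np.array([[1/3, 1/3, 1/3],
              [1/2, 1/2, -1.0]])     # coefficients of the two quality indicators
mu_x = np.array([7.0, 8.0, 8.5])     # means given in the example

# Hypothetical covariance matrix of (punctuality, quickness, cleanliness).
V_x = np.array([[1.0, 0.5, 0.6],
                [0.5, 1.0, 0.6],
                [0.6, 0.6, 1.2]])

mu_y = A @ mu_x                      # equation (9.17)
V_y = A @ V_x @ A.T                  # equation (9.18)
print(mu_y.round(2))                 # [ 7.83 -1.  ]
print(V_y.round(3))                  # e.g. [[ 0.733 -0.1  ]  [-0.1    0.75 ]]
```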

9.3 DEPENDENCE BETWEEN RANDOM VARIABLES

9.3.1 Conditional expectations

The expectation of a vector $x_1$ conditional on a given value of another vector $x_2$ is the expectation of the distribution of $x_1$ conditional on $x_2$, and is given by

$$E[x_1 \mid x_2] = \int x_1 f(x_1 \mid x_2)\,dx_1.$$

In general, this expression is a function of the value $x_2$. When $x_2$ is a fixed value, the conditional expectation is a constant. If $x_2$ is a random variable, the conditional expectation is also a random variable.

The expectation of a random vector $x_1$ can be calculated from the conditional expectations in two steps: in the first we calculate the expectation of $x_1$ conditional on $x_2$; the result is a random function which depends on the random variable $x_2$. In the second, we calculate the expectation of this function with respect to the distribution of $x_2$. Then:

$$E(x_1) = E[E(x_1 \mid x_2)]. \qquad (9.19)$$

This expression indicates that the expectation of a random variable can be obtained by averaging the conditional expectations by their probability of appearance; in other words, the expectation of the conditional mean is the marginal or unconditional expectation.

Proof:

$$E(x_1) = \int x_1 f(x_1)\,dx_1 = \int\!\!\int x_1 f(x_1, x_2)\,dx_1\,dx_2 = \int\!\!\int x_1 f(x_1 \mid x_2)f(x_2)\,dx_1\,dx_2 = \int f(x_2)\left[\int x_1 f(x_1 \mid x_2)\,dx_1\right]dx_2 = \int E[x_1 \mid x_2]\,f(x_2)\,dx_2 = E[E(x_1 \mid x_2)].$$

9.3.2 Conditional variances

The variance of $x_1$ conditional on $x_2$ is defined as the variance of the distribution of $x_1$ conditional on $x_2$. We use the notation $Var(x_1 \mid x_2) = V_{1/2}$, and this matrix has the properties of the covariance matrices studied earlier.

If $x_1$ is scalar, its variance can also be calculated from the properties of the conditional distribution. Specifically, it can be expressed as the sum of two terms: the first associated with the conditional means and the second with the conditional variances. In order to obtain this expression we start with the decomposition

$$x_1 - \mu_1 = [x_1 - E(x_1 \mid x_2)] + [E(x_1 \mid x_2) - \mu_1],$$

where $x_2$ is any random vector with finite conditional expectation $E(x_1 \mid x_2)$. Squaring this expression and taking expectations on both sides with respect to both $x_1$ and $x_2$, we have

$$var(x_1) = E[x_1 - E(x_1 \mid x_2)]^2 + E[E(x_1 \mid x_2) - \mu_1]^2 + 2E[(x_1 - E(x_1 \mid x_2))(E(x_1 \mid x_2) - \mu_1)].$$

The double product is zero, because

$$E[(x_1 - E(x_1 \mid x_2))(E(x_1 \mid x_2) - \mu_1)] = \int (E(x_1 \mid x_2) - \mu_1)\left[\int (x_1 - E(x_1 \mid x_2)) f(x_1 \mid x_2)\,dx_1\right] f(x_2)\,dx_2 = 0,$$

since the integral in brackets is null.

On the other hand, as in (9.19), $E[E(x_1 \mid x_2)] = E(x_1) = \mu_1$, so the second term is the expectation of the square of the random variable $E(x_1 \mid x_2)$ minus its mean, $\mu_1$; it is therefore the variance of the variable $E(x_1 \mid x_2)$. The first term can be evaluated by taking first the expectation with respect to $(x_1 \mid x_2)$, which leads to the variance $var(x_1 \mid x_2)$, and then the expected value of this variable with respect to $x_2$. We then obtain:

$$var(x_1) = E[var(x_1 \mid x_2)] + var[E(x_1 \mid x_2)]. \qquad (9.20)$$

This expression is known as the variance decomposition, since it decomposes the variability of the variable into two main sources of variation. On the one hand, variability exists because the variances of the conditional distributions, $var(x_1 \mid x_2)$, may be different, and the first term averages these variances. On the other hand, variability also exists because the means of the conditional distributions may be different, and the second term includes the differences between the conditional means, $E(x_1 \mid x_2)$, and the total mean, $\mu_1$, through the term $var[E(x_1 \mid x_2)]$.

We can see that the variance of the variable $x_1$ is, in general, greater than the average of the variances of the conditional distributions, due to the fact that in the conditionals the variability is measured from the conditional means, $E(x_1 \mid x_2)$, whereas $var(x_1)$ measures the variability with respect to the global mean, $\mu_1$. If all the conditional means are equal to $\mu_1$, which happens for example when $x_1$ and $x_2$ are independent, then the term $var[E(x_1 \mid x_2)]$ is zero and the variance is the weighted average of the conditional variances. If $E(x_1 \mid x_2)$ is not constant, the variance of $x_1$ increases with the variability of the conditional means.

This decomposition of the variance appears in the analysis of variance of univariate linear models:

$$\sum (x_i - \bar{x})^2/n = \sum (x_i - \hat{x}_i)^2/n + \sum (\hat{x}_i - \bar{x})^2/n,$$

where, in this expression, $\hat{x}_i$ is the estimate of the conditional mean in the linear model. The total variability, which corresponds to $var(x_1)$, is decomposed into two uncorrelated terms. On one side, the average of the estimates of $var(x_1 \mid x_2)$, which is calculated by averaging the differences between the variable and the conditional mean. On the other, the variability of the conditional expectations, which in linear models is estimated through the differences $\hat{x}_i - \bar{x}$.
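A simulation gives a quick check of (9.20). The model below is assumed only for illustration: $x_2 \sim N(0,1)$ and $x_1 \mid x_2 \sim N(2x_2, 1)$, so that $E[var(x_1 \mid x_2)] = 1$, $var[E(x_1 \mid x_2)] = 4$ and $var(x_1) = 5$.

```python
import numpy as np

rng = np.random.default_rng(0)
x2 = rng.normal(size=1_000_000)
x1 = 2 * x2 + rng.normal(size=x2.size)   # x1 | x2 ~ N(2*x2, 1)

print(np.var(x1))                        # close to 5
print(1 + np.var(2 * x2))                # E[var(x1|x2)] + var[E(x1|x2)], also close to 5
```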

9.3.3 Correlation matrix

The correlation matrix of a random vector $x$ with covariance matrix $V_x$ is defined by

$$R_x = D^{-1/2} V_x D^{-1/2},$$

where $D = diag(\sigma_1^2, \ldots, \sigma_p^2)$ is the diagonal matrix which contains the variances of the variables. The correlation matrix is then a square, symmetric matrix with ones in the diagonal and the correlation coefficients between pairs of variables outside the diagonal. The simple, or linear, correlation coefficients are given by

$$\rho_{ij} = \frac{\sigma_{ij}}{\sigma_i\sigma_j}.$$

The correlation matrix is also positive semidefinite. A global measure of the linear correlations existing in the set of variables is the dependence, defined by

$$D_x = 1 - |R_x|^{1/(p-1)},$$

whose interpretation for random variables is analogous to that presented in Chapter 3 for statistical variables. For $p = 2$ the matrix $R_x$ has ones in the diagonal and the coefficient $\rho_{12}$ outside, so that $|R_x| = 1 - \rho_{12}^2$, and the dependence $D_x = 1 - (1 - \rho_{12}^2) = \rho_{12}^2$ coincides with the coefficient of determination. Just as we saw in Chapter 3, in the general case where $p > 2$ the dependence is a geometric average of coefficients of determination.

9.3.4 Multiple correlation

A linear measure of the ability to predict $y$ using a linear function of the variables $x$ is the multiple correlation coefficient. Assuming, without loss of generality, that the variables have zero mean, we define the best linear prediction of $y$ as the function $\beta'x$ which minimizes $E(y - \beta'x)^2$. It can be shown that

$$\beta = V_x^{-1}V_{xy},$$

where $V_x$ is the covariance matrix of $x$ and $V_{xy}$ is the vector of covariances between $y$ and $x$. The simple correlation coefficient between the scalar variable $y$ and $\beta'x$ is called the multiple correlation coefficient. It can be proved that if we let $\sigma_{ii}$ be the diagonal terms of the covariance matrix $V$ of a vector variable and $\sigma^{ii}$ the diagonal terms of the matrix $V^{-1}$, then the multiple correlation coefficient $R_{i.R}$ between each variable ($i$) and the rest ($R$) is given by

$$R_{i.R}^2 = 1 - \frac{1}{\sigma_{ii}\,\sigma^{ii}}.$$

In particular, if $E(y \mid x)$ is a linear function of $x$, then $E(y \mid x) = \beta'x$ and $R_{i.R}^2$ can also be calculated as $1 - \sigma_{y|x}^2/\sigma_y^2$, where $\sigma_{y|x}^2$ is the variance of the conditional distribution $y \mid x$ and $\sigma_y^2$ is the marginal variance of $y$.
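The following sketch computes, for an assumed covariance matrix, the correlation matrix, the dependence $D_x$ and the squared multiple correlation of each variable with the rest using the formula above.

```python
import numpy as np

V = np.array([[2.0, 0.8, 0.4],     # an assumed covariance matrix, for illustration
              [0.8, 1.0, 0.3],
              [0.4, 0.3, 1.5]])
p = V.shape[0]

D_inv_half = np.diag(1 / np.sqrt(np.diag(V)))
R = D_inv_half @ V @ D_inv_half                  # R_x = D^{-1/2} V_x D^{-1/2}
dependence = 1 - np.linalg.det(R) ** (1 / (p - 1))

V_inv = np.linalg.inv(V)
R2_multiple = 1 - 1 / (np.diag(V) * np.diag(V_inv))   # R^2 of each variable on the rest

print(R.round(3))
print(round(dependence, 3))
print(R2_multiple.round(3))
```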

9.3.5 Partial correlations

Let us assume that we obtain the best linear approximation to a vector of variables $x_1$ of dimension $p_1 \times 1$ starting from another vector of variables $x_2$ of dimension $p_2 \times 1$. Supposing that the variables have zero mean, this implies calculating a vector $Bx_2$, where $B$ is a coefficient matrix of dimensions $p_1 \times p_2$ such that $\sum_{j=1}^{p_1} E(x_{1j} - \beta_j' x_2)^2$ is minimum, where $x_{1j}$ is the $j$-th component of the vector $x_1$ and $\beta_j'$ the $j$-th row of the matrix $B$. We let $V_{1/2}$ be the covariance matrix of the variable $x_1 - Bx_2$. If we standardize this covariance matrix in order to obtain correlations, the resulting correlation coefficients are called partial correlation coefficients between the components of $x_1$ given the variables $x_2$. The square, symmetric matrix of order $p_1$

$$R_{1/2} = D_{1/2}^{-1/2} V_{1/2} D_{1/2}^{-1/2}$$

is called the partial correlation matrix between the components of the vector $x_1$ when we control for (or condition on) the vector $x_2$, where $D_{1/2} = diag(\sigma_{1/2}^2, \ldots, \sigma_{p_1/2}^2)$ and $\sigma_{j/2}^2$ is the variance of the variable $x_{1j} - \beta_j' x_2$. In particular, if $E(x_1 \mid x_2)$ is linear in $x_2$, then $E(x_1 \mid x_2) = Bx_2$ and $V_{1/2}$ is the covariance matrix of the conditional distribution of $x_1 \mid x_2$.

9.4 MULTINOMIAL DISTRIBUTION

Suppose that we observe elements which we can classify into two classes, $A$ and $\bar{A}$: for example, newborns in a hospital classified as male ($A$) or female ($\bar{A}$), the days in a month as rainy ($A$) or not ($\bar{A}$), or the objects manufactured by a machine as good ($A$) or defective ($\bar{A}$). We assume that the process which generates the elements is stable, so that there is a constant probability that elements of either class appear, $P(A) = p = \text{constant}$, and that the process has no memory, in other words $P(A \mid A) = P(A \mid \bar{A})$. Suppose that we observe random elements of this process and define the variable

$$x = \begin{cases} 1, & \text{if the observation belongs to class } A \\ 0, & \text{otherwise.} \end{cases}$$

This variable follows a Bernoulli distribution, with $P(x = 1) = p$ and $P(x = 0) = 1 - p$. If we observe $n$ elements instead of one and define the variable $y = \sum_{i=1}^n x_i$, that is, we count the number of elements among the $n$ which belong to the first class, then the variable $y$ follows a binomial distribution with

$$P(y = r) = \frac{n!}{r!(n-r)!}\,p^r(1-p)^{n-r}.$$

We can generalize this distribution by allowing for $G$ classes instead of two. Let $p = (p_1, \ldots, p_G)$ be the vector of probabilities of belonging to each class, where $\sum p_j = 1$.

We can now define the $G$ random variables

$$x_j = \begin{cases} 1, & \text{if the observation belongs to group } j \\ 0, & \text{otherwise,} \end{cases} \qquad j = 1, \ldots, G,$$

and the result of an observation is a value of the vector of $G$ variables $x = (x_1, \ldots, x_G)$, which always takes the form $x = (0, \ldots, 1, \ldots, 0)$, since only one of the $G$ components can take the value one, namely the one associated with the class observed for this element. As a result, the components of this random variable are not independent, since they are bound by the equation

$$\sum_{j=1}^G x_j = 1.$$

In order to describe the result of the observation it would be enough to define $G - 1$ variables, as is done in the Bernoulli distribution, where a single variable is defined when there are two classes, since the value of the last variable is determined once the rest are known. Nevertheless, with more than two classes it is customary to work with the $G$ variables, and the distribution of the multivariate variable thus defined is called the point multinomial. Its probability function is

$$P(x_1, \ldots, x_G) = p_1^{x_1}\cdots p_G^{x_G} = \prod_j p_j^{x_j}.$$

Since only one of the $x_j$ is different from zero, the probability that the $j$-th is one is precisely $p_j$, the probability that the observed element belongs to class $j$.

Generalizing this distribution, let $(x_1, \ldots, x_n)$ be a sample of $n$ values of this point multinomial variable, which results from classifying $n$ elements of a sample into the $G$ classes. We use the term multinomial distribution to refer to the sum variable

$$y = \sum_{i=1}^n x_i,$$

which indicates the number of elements of the sample that fall into each of the classes. The components of this variable, $y = (y_1, \ldots, y_G)$, represent the observed frequencies of each class and can take the values $y_i = 0, 1, \ldots, n$, but they are always subject to the restriction

$$\sum y_i = n, \qquad (9.21)$$

and their probability function is

$$P(y_1 = n_1, \ldots, y_G = n_G) = \frac{n!}{n_1!\cdots n_G!}\,p_1^{n_1}\cdots p_G^{n_G},$$

where $\sum n_i = n$.

The combinatorial term takes into account the permutations of $n$ elements when there are $n_1, \ldots, n_G$ repetitions. It can be verified that

$$E(y) = np = \mu_y \qquad \text{and} \qquad Var(y) = n\left[diag(p) - pp'\right] = diag(\mu_y) - \frac{1}{n}\mu_y\mu_y',$$

where $diag(p)$ is a square matrix with the elements of $p$ in the diagonal and zeros outside. This matrix is singular, since the elements of $y$ are bound by the restriction (9.21). It is easy to verify that the marginal distributions are binomial, with

$$E[y_j] = np_j, \qquad SD[y_j] = \sqrt{np_j(1 - p_j)}.$$

Additionally, any conditional distribution is multinomial. With $G - 1$ variables, for example, when $y_G$ takes the fixed value $n_G$, we have a multinomial in the $G - 1$ remaining variables with sample size $n' = n - n_G$. The conditional distribution of $y_1, y_2$ when $y_3 = n_3, \ldots, y_G = n_G$ is binomial, with $n' = n - n_3 - n_4 - \cdots - n_G$, and so on.

Example: In a quality control process, elements can have three types of defects: slight ($A_1$), medium ($A_2$) and serious ($A_3$), and it is known that among the defective elements the probabilities are $p_1 = P(A_1) = 0.7$, $p_2 = P(A_2) = 0.2$ and $p_3 = P(A_3) = 0.1$. Calculate the probability that among the next three defective elements there will be exactly one with a serious defect.

The possible combinations of defects in the next three elements with exactly one serious defect are, without taking into account the order of appearance, $A_1A_1A_3$, $A_1A_2A_3$ and $A_2A_2A_3$, and their probabilities according to the multinomial distribution are:

$$P(x_1 = 2, x_2 = 0, x_3 = 1) = \frac{3!}{2!\,0!\,1!}\,0.7^2 \times 0.2^0 \times 0.1 = 0.147$$

$$P(x_1 = 1, x_2 = 1, x_3 = 1) = \frac{3!}{1!\,1!\,1!}\,0.7 \times 0.2 \times 0.1 = 0.084$$

$$P(x_1 = 0, x_2 = 2, x_3 = 1) = \frac{3!}{0!\,2!\,1!}\,0.7^0 \times 0.2^2 \times 0.1 = 0.012$$

Then:

$$P(x_3 = 1) = 0.147 + 0.084 + 0.012 = 0.243.$$

The same result can also be obtained using the binomial for $(\bar{A}_3, A_3)$ with probabilities $(0.9, 0.1)$:

$$P(x_3 = 1) = \binom{3}{1}\,0.1 \times 0.9^2 = 0.243.$$
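This example can be reproduced directly with scipy's multinomial and binomial distributions; the sketch below uses the probabilities given in the text.

```python
from scipy.stats import multinomial, binom

# P(exactly one serious defect among the next 3 defective items),
# with class probabilities (slight, medium, serious) = (0.7, 0.2, 0.1).
p = [0.7, 0.2, 0.1]
cases = [(2, 0, 1), (1, 1, 1), (0, 2, 1)]
prob = sum(multinomial.pmf(c, n=3, p=p) for c in cases)
print(round(prob, 3))                    # 0.243

# Equivalent computation with the marginal binomial of the serious-defect count
print(round(binom.pmf(1, 3, 0.1), 3))    # 0.243
```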

9.5 THE DIRICHLET DISTRIBUTION

The Dirichlet distribution is introduced in order to represent variables that lie between zero and one and whose sum is equal to one. Such data are known as compositional data. For example, suppose that we are studying the relative weight that consumers assign to a set of quality attributes, and that the evaluation of the importance of those attributes is carried out on a scale from zero to one. Thus, with three attributes a client can assign the scores (0.6, 0.3, 0.1), indicating that the first attribute has an importance of 60%, the second 30% and the third 10%. Other examples of this type of data are the proportion of time invested in certain activities, or the percentage composition of different substances found in a group of products. In all of these cases the data are continuous variable vectors $x = (x_1, \ldots, x_G)$ such that, by construction, $0 \le x_j \le 1$ and there is a restriction equation:

$$\sum_{j=1}^G x_j = 1.$$

An appropriate distribution to represent these situations is the Dirichlet distribution, whose density function is

$$f(x_1, \ldots, x_G) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)\Gamma(\alpha_2)\cdots\Gamma(\alpha_G)}\,x_1^{\alpha_1 - 1}\cdots x_G^{\alpha_G - 1},$$

where $\Gamma(\cdot)$ is the gamma function, $\alpha = (\alpha_1, \ldots, \alpha_G)$ is the vector of parameters that characterizes the distribution, and

$$\alpha_0 = \sum_{j=1}^G \alpha_j.$$

It can be proved that

$$E(x) = \alpha/\alpha_0 = \mu_x,$$

so that the parameters $\alpha_j$ indicate the relative expectation of each component, and

$$Var(x) = \frac{1}{\alpha_0 + 1}\left(\frac{1}{\alpha_0}diag(\alpha) - \frac{1}{\alpha_0^2}\alpha\alpha'\right).$$

This expression indicates that the variance of each component is

$$var(x_j) = \frac{\alpha_j(\alpha_0 - \alpha_j)}{\alpha_0^2(\alpha_0 + 1)},$$

and we see that the parameter $\alpha_0$ determines the variance of the components and that these variances decrease rapidly as $\alpha_0$ grows.
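A short simulation can be used to check the moment formulas above. The parameter vector below is an assumption chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([6.0, 3.0, 1.0])      # assumed parameters; mean = alpha/a0 = (0.6, 0.3, 0.1)
a0 = alpha.sum()
x = rng.dirichlet(alpha, size=200_000)

print(x.mean(axis=0).round(3))         # close to alpha / a0
print(x.var(axis=0).round(4))          # close to the theoretical variances below
print((alpha * (a0 - alpha) / (a0**2 * (a0 + 1))).round(4))
```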

The Dirichlet variables, as with multinomial ones, are subject to a restriction equation; they are therefore not linearly independent and their covariance matrix is singular. The covariances between two components are

$$cov(x_i, x_j) = -\frac{\alpha_i\alpha_j}{\alpha_0^2(\alpha_0 + 1)},$$

and these covariances also diminish with $\alpha_0$, but they increase when the expectations of the variables increase. The reader can appreciate the similarity between the multinomial formulas for probabilities, means and variances and those of the Dirichlet. This similarity is due to the fact that in both cases we classify the results into $G$ groups. The difference is that in the multinomial case we count how many of the $n$ observations appear in each group, whereas in the Dirichlet we measure the proportion that the element contains of each class. In the Dirichlet distribution the parameter $\alpha_0$ plays a role similar to that of the sample size, and the ratios $\alpha_j/\alpha_0$ to that of the probabilities.

9.6 THE K-DIMENSIONAL NORMAL DISTRIBUTION

The density function of the scalar normal random variable is

$$f(x) = (\sigma^2)^{-1/2}(2\pi)^{-1/2}\exp\left\{-\tfrac{1}{2}(x - \mu)^2\sigma^{-2}\right\},$$

and we write $x \sim N(\mu, \sigma^2)$ to indicate that $x$ has a normal distribution with mean $\mu$ and variance $\sigma^2$. Generalizing this function, we say that a vector $x$ follows a $p$-dimensional normal distribution if its density function is

$$f(x) = |V|^{-1/2}(2\pi)^{-p/2}\exp\left\{-\tfrac{1}{2}(x - \mu)'V^{-1}(x - \mu)\right\}. \qquad (9.22)$$

We write $x \sim N_p(\mu, V)$. Figure 9.1 shows a bivariate normal with $\mu = (0, 0)$ and

$$V = \begin{bmatrix} 1 & 1/\sqrt{3} \\ 1/\sqrt{3} & 1 \end{bmatrix},$$

together with its marginal distributions.

Figure 9.1: Representation of the bivariate normal distribution and its marginals.

The principal properties of the multivariate normal are:

1. The distribution is symmetric around $\mu$. The symmetry can be proven by replacing $x$ with $\mu \pm a$ in the density and observing that $f(\mu + a) = f(\mu - a)$.

2. The distribution has a single maximum at $\mu$. Since $V$ is positive definite, the term $(x - \mu)'V^{-1}(x - \mu)$ in the exponent is always positive, and the density $f(x)$ is maximum when that term is zero, which occurs when $x = \mu$.

3. The mean of the normal random vector is $\mu$ and its covariance matrix is $V$. These properties, which can be rigorously proven, are deduced from the comparison of the univariate and multivariate densities.

4. If $p$ random variables have a joint normal distribution and are uncorrelated, they are independent. The proof of this property is obtained by taking a diagonal matrix $V$ in (9.22) and verifying that $f(x) = f(x_1)\cdots f(x_p)$.

5. Any $p$-dimensional normal vector $x$ with non-singular matrix $V$ can be converted, using a linear transformation, into a $p$-dimensional normal vector $z$ with mean vector $0$ and variance covariance matrix equal to the identity, $I$. We call the density of $z$ the standard $p$-dimensional normal, which is given by

$$f(z) = \frac{1}{(2\pi)^{p/2}}\exp\left\{-\tfrac{1}{2}z'z\right\} = \prod_{i=1}^{p}\frac{1}{(2\pi)^{1/2}}\exp\left\{-\tfrac{1}{2}z_i^2\right\}. \qquad (9.23)$$

The proof of this property is the following. Since $V$ is positive definite, there is a square, symmetric matrix $A$, its square root, which verifies

$$V = AA'. \qquad (9.24)$$

Defining the new variable

$$z = A^{-1}(x - \mu), \qquad (9.25)$$

then $x = \mu + Az$ and, according to (9.14), the density function of $z$ is

$$f_z(z) = f_x(\mu + Az)\,|A|;$$

then, using $AV^{-1}A = I$, (9.23) is obtained. Therefore, any vector of normal variables $x$ in $R^p$ can be transformed into another vector of $R^p$ of independent normal variables with unit variance.

6. The marginal distributions are normal. If the variables are independent, the proof of this property is immediate. A general proof can be seen, for example, in Mardia et al. (1979).

7. Any subset of $h < p$ variables is $h$-dimensional normal. This is an extension of the previous property and is proved in the same way.

8. If $y$ is a $(k \times 1)$ vector, $k \le p$, defined by $y = Ax$, where $A$ is a $(k \times p)$ matrix, then $y$ is $k$-dimensional normal. In particular, any scalar variable $y = a'x$ (where $a$ is a non-zero $p \times 1$ vector) has a normal distribution. The proof can be seen, for example, in Mardia et al. (1979).

9. By cutting the density function with parallel hyperplanes the level curves are obtained. Their equation is

$$(x - \mu)'V^{-1}(x - \mu) = \text{constant}.$$

The level curves are therefore ellipses. If we consider that all the points on a level curve are at the same distance from the center of the distribution, the implied distance is called the Mahalanobis distance, and it is given by

$$D^2 = (x - \mu)'V^{-1}(x - \mu). \qquad (9.26)$$

As an illustration, take the simple case of the two univariate distributions shown in Figure 9.2. The observation $x = 3$, indicated with an X in the graph, is closer in Euclidean distance to the center of distribution A, which is zero, than to the center of B, which is ten. Nevertheless, with the Mahalanobis distance, the distance of point X to distribution A, which has a standard deviation of one, is $(3 - 0)^2/1^2 = 9$, whereas the distance to the center of B, which has a standard deviation of ten, is $(3 - 10)^2/10^2 = 0.7^2 = 0.49$; with this distance, point X is much closer to distribution B. This reflects the fact that it is much more likely that this point comes from distribution B than from A.

10. The Mahalanobis distance is distributed as a $\chi^2$ with $p$ degrees of freedom. In order to prove this, we apply the transformation (9.25) and, since $V^{-1} = A^{-1}A^{-1}$, we obtain $D^2 = z'z = \sum z_i^2$, where each $z_i$ is $N(0, 1)$. Thus $D^2 \sim \chi^2_p$.
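The univariate illustration and the $\chi^2_p$ property can both be checked numerically. The 3-dimensional covariance matrix in the second part is an assumption used only for the Monte Carlo check.

```python
import numpy as np
from scipy.stats import chi2

# Univariate example: A has mean 0 and sd 1, B has mean 10 and sd 10, point x = 3.
x = 3.0
print((x - 0) ** 2 / 1 ** 2)     # 9.0   (Mahalanobis distance to A)
print((x - 10) ** 2 / 10 ** 2)   # 0.49  (Mahalanobis distance to B)

# Monte Carlo check that D^2 = (x-mu)' V^{-1} (x-mu) ~ chi-squared with p d.f.
rng = np.random.default_rng(0)
mu = np.zeros(3)
V = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
X = rng.multivariate_normal(mu, V, size=100_000)
D2 = np.einsum('ij,jk,ik->i', X - mu, np.linalg.inv(V), X - mu)
print(np.mean(D2 <= chi2.ppf(0.95, df=3)))   # close to 0.95
```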

Figure 9.2: Point X is closer to the center of distribution A with the Euclidean distance, but with the Mahalanobis distance it is closer to that of B.

9.6.1 Conditional distributions

We split a random vector into two parts, $x = (x_1, x_2)$, where $x_1$ is a vector of dimension $p_1$ and $x_2$ of dimension $p_2$, with $p_1 + p_2 = p$. We also split the covariance matrix of the vector $x$ into blocks linked to these two vectors:

$$V = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix}, \qquad (9.27)$$

where, for example, $V_{11}$, the covariance matrix of the vector $x_1$, is square of order $p_1$; $V_{12}$, the covariance matrix between the vectors $x_1$ and $x_2$, has dimensions $p_1 \times p_2$; and $V_{22}$, the covariance matrix of the vector $x_2$, is square of order $p_2$. Suppose we wish to calculate the conditional distribution of the vector $x_1$ given the values of the vector $x_2$. It can be proved that this distribution is normal, with mean

$$E[x_1 \mid x_2] = \mu_1 + V_{12}V_{22}^{-1}(x_2 - \mu_2) \qquad (9.28)$$

and covariance matrix

$$Var[x_1 \mid x_2] = V_{11} - V_{12}V_{22}^{-1}V_{21}. \qquad (9.29)$$

In order to interpret these expressions, we first consider the bivariate case, where both variables are scalar with zero mean. The conditional mean is then

$$E[x_1 \mid x_2] = \sigma_{12}\sigma_{22}^{-1}x_2,$$

which is the usual expression for the regression line with slope $\beta = \sigma_{12}/\sigma_{22}$. The conditional variance around the regression line is

$$var[x_1 \mid x_2] = \sigma_{11} - \sigma_{12}^2/\sigma_{22} = \sigma_1^2(1 - \rho^2),$$

where $\rho = \sigma_{12}/(\sigma_{11}^{1/2}\sigma_{22}^{1/2})$ is the correlation coefficient between the two variables. This expression indicates that the variability of the conditional distribution is always less than that of the marginal one, and the reduction in variability increases with $\rho^2$.

We now suppose that $x_1$ is scalar but $x_2$ is a vector. The expression for the conditional mean provides the equation of the multiple regression

$$E[x_1 \mid x_2] = \mu_1 + \beta'(x_2 - \mu_2),$$

where $\beta = V_{22}^{-1}V_{21}$, with $V_{21}$ being the vector of covariances between $x_1$ and the components of $x_2$.

The variance of this conditional distribution is

$$var[x_1 \mid x_2] = \sigma_1^2(1 - R^2),$$

where $R^2 = V_{12}V_{22}^{-1}V_{21}/\sigma_1^2$ is the multiple correlation coefficient. In the general case these expressions correspond to the set of multiple regressions of the components of $x_1$ on the variables $x_2$, which is known as multivariate regression.

Proof. The conditional distribution is

$$f(x_1 \mid x_2) = \frac{f(x_1, x_2)}{f(x_2)}.$$

Since the distributions $f(x_1, x_2)$ and $f(x_2)$ are multivariate normal, when the quotient is calculated we will have the quotient of the determinants and the difference of the exponents. We begin by calculating the exponents, which are

$$(x - \mu)'V^{-1}(x - \mu) - (x_2 - \mu_2)'V_{22}^{-1}(x_2 - \mu_2). \qquad (9.30)$$

We are going to decompose the first quadratic form into the terms corresponding to $x_1$ and $x_2$. To do this we partition $(x - \mu)$ as $(x_1 - \mu_1, x_2 - \mu_2)$, we partition $V$ as in (9.27), and we use the expression for the inverse of a partitioned matrix (see Section 2.2.3). Writing $B = V_{11} - V_{12}V_{22}^{-1}V_{21}$, which is the matrix appearing in (9.29), and calculating the product, we get

$$(x - \mu)'V^{-1}(x - \mu) = (x_1 - \mu_1)'B^{-1}(x_1 - \mu_1) - (x_1 - \mu_1)'B^{-1}V_{12}V_{22}^{-1}(x_2 - \mu_2) - (x_2 - \mu_2)'V_{22}^{-1}V_{21}B^{-1}(x_1 - \mu_1) + (x_2 - \mu_2)'V_{22}^{-1}(x_2 - \mu_2) + (x_2 - \mu_2)'V_{22}^{-1}V_{21}B^{-1}V_{12}V_{22}^{-1}(x_2 - \mu_2).$$

The fourth term of this expansion cancels in the difference (9.30), and the remaining four can be grouped as

$$\left(x_1 - \mu_1 - V_{12}V_{22}^{-1}(x_2 - \mu_2)\right)'B^{-1}\left(x_1 - \mu_1 - V_{12}V_{22}^{-1}(x_2 - \mu_2)\right).$$

This shows that the exponent of the distribution corresponds to a normal variable with mean vector and covariance matrix equal to those given in (9.28) and (9.29). We now prove that the quotient of determinants leads to the same covariance matrix. According to Section 2.3.5,

$$|V| = |V_{22}|\,|V_{11} - V_{12}V_{22}^{-1}V_{21}| = |V_{22}|\,|B|.$$

Since $|V_{22}|$ appears in the denominator, the quotient leaves the single term $|B|$. Finally, the normalizing constant gives $(2\pi)^{-p/2 + p_2/2} = (2\pi)^{-p_1/2}$. In conclusion, the resulting expression is the density function of a multivariate normal distribution of order $p_1$, with mean vector given by (9.28) and covariance matrix given by (9.29).

Example: The spending of a group of consumers on two products $(x, y)$ follows a bivariate normal distribution with means 2 and 3 euros respectively and covariance matrix

$$\begin{bmatrix} 1 & 0.8 \\ 0.8 & 2 \end{bmatrix}.$$

Calculate the conditional distribution of spending on product $y$ for consumers who spend 4 euros on product $x$.

The conditional distribution is $f(y \mid x = 4) = f(4, y)/f_x(4)$. The marginal distribution of $x$ is normal, $N(2, 1)$. The terms of the joint distribution $f(x, y)$ are

$$|V|^{1/2} = \left(\sigma_1^2\sigma_2^2(1 - \varrho^2)\right)^{1/2} = \sigma_1\sigma_2\sqrt{1 - \varrho^2},$$

$$V^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1 - \varrho^2)}\begin{bmatrix} \sigma_2^2 & -\varrho\sigma_1\sigma_2 \\ -\varrho\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix},$$

where in this example $\sigma_1^2 = 1$, $\sigma_2^2 = 2$ and $\varrho = 0.8/\sqrt{2} = 0.566$. The exponent of the bivariate normal $f(x, y)$ is

$$A = -\frac{1}{2(1 - \varrho^2)}\left\{\left(\frac{x - \mu_1}{\sigma_1}\right)^2 + \left(\frac{y - \mu_2}{\sigma_2}\right)^2 - 2\varrho\frac{(x - \mu_1)(y - \mu_2)}{\sigma_1\sigma_2}\right\}.$$

As a result we have

$$f(y \mid x) = \frac{\left(\sigma_1\sigma_2\sqrt{1 - \varrho^2}\right)^{-1}(2\pi)^{-1}\exp\{A\}}{\sigma_1^{-1}(2\pi)^{-1/2}\exp\left\{-\frac{1}{2}\left(\frac{x - \mu_1}{\sigma_1}\right)^2\right\}} = \frac{1}{\sigma_2\sqrt{1 - \varrho^2}\sqrt{2\pi}}\exp\left\{-\frac{1}{2}B\right\},$$

where the resulting term in the exponent, which we denote by $B$, is

$$B = \frac{1}{1 - \varrho^2}\left[\left(\frac{x - \mu_1}{\sigma_1}\right)^2 + \left(\frac{y - \mu_2}{\sigma_2}\right)^2 - 2\varrho\frac{(x - \mu_1)(y - \mu_2)}{\sigma_1\sigma_2} - \left(\frac{x - \mu_1}{\sigma_1}\right)^2(1 - \varrho^2)\right]$$

$$= \frac{1}{1 - \varrho^2}\left[\frac{y - \mu_2}{\sigma_2} - \varrho\frac{x - \mu_1}{\sigma_1}\right]^2 = \frac{1}{\sigma_2^2(1 - \varrho^2)}\left[y - \left(\mu_2 + \varrho\frac{\sigma_2}{\sigma_1}(x - \mu_1)\right)\right]^2.$$

This exponent corresponds to a normal distribution with mean

$$E[y \mid x] = \mu_2 + \varrho\frac{\sigma_2}{\sigma_1}(x - \mu_1),$$

which is the regression line, and standard deviation

$$SD[y \mid x] = \sigma_2\sqrt{1 - \varrho^2}.$$

For $x = 4$:

$$E[y \mid 4] = 3 + 0.8\,(4 - 2) = 4.6.$$

Since there is a positive correlation of 0.566 between the spending on both products, the consumers who spend more on one also spend, on average, more on the other. The variability of the conditional distribution is

$$Var[y \mid 4] = \sigma_2^2(1 - \varrho^2) = 2(1 - 0.32) = 1.36,$$

which is less than the variance of the marginal distribution, because conditioning gives us more information.
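Equations (9.28) and (9.29) give the same result directly. The sketch below applies them to the spending example.

```python
import numpy as np

# Conditional distribution of y given x for the spending example:
# mu = (2, 3), V = [[1, 0.8], [0.8, 2]], conditioning on x = 4.
mu = np.array([2.0, 3.0])
V = np.array([[1.0, 0.8],
              [0.8, 2.0]])
x_obs = 4.0

cond_mean = mu[1] + V[1, 0] / V[0, 0] * (x_obs - mu[0])   # equation (9.28)
cond_var = V[1, 1] - V[1, 0] * V[0, 1] / V[0, 0]          # equation (9.29)

print(cond_mean)   # 4.6
print(cond_var)    # 1.36
```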

Figure 9.3: Density of the standard bivariate normal.

9.7 ELLIPTICAL DISTRIBUTIONS

The multivariate normal distribution is a particular case of a family of distributions frequently used in multivariate analysis: the elliptical distributions. As an introduction, we first look at the simplest case, spherical distributions.

9.7.1 Spherical distributions

We say that a vector variable $x = (x_1, \ldots, x_p)$ follows a spherical distribution if its density function depends on the variable only through the Euclidean distance $x'x = \sum_{i=1}^p x_i^2$. This property implies that:

1. The equiprobability contours of the distribution are spheres centered at the origin.

2. The distribution is invariant under rotations. Thus, if we define a new variable $y = Cx$, where $C$ is an orthogonal matrix, the density of the variable $y$ is the same as that of $x$.

One example of a spherical distribution, studied in the previous section, is the standard multivariate normal, whose density is

$$f(x) = \frac{1}{(2\pi)^{p/2}}\exp\left(-\tfrac{1}{2}x'x\right) = \prod_{i=1}^p\frac{1}{(2\pi)^{1/2}}\exp\left(-\tfrac{1}{2}x_i^2\right).$$

This density is shown for the bivariate case in Figure 9.3. Here the two scalar variables which form the vector are independent. This property is characteristic of the normal distribution since, in general, the components of spherical distributions are dependent.

Figure 9.4: Density of the bivariate double exponential.

For example, the multivariate Cauchy distribution, given by

$$f(x) = \frac{\Gamma\left(\frac{p+1}{2}\right)}{\pi^{(p+1)/2}(1 + x'x)^{(p+1)/2}}, \qquad (9.31)$$

has heavier tails than the normal distribution, as in the univariate case. It is easy to prove that this function cannot be written as a product of univariate Cauchy densities: its components are uncorrelated, but they are not independent. Another important spherical distribution is the double exponential. In the bivariate case this distribution has density function

$$f(x) = \frac{1}{2\pi}\exp\left(-\sqrt{x'x}\right)$$

and, although the density function may appear similar to the normal, its tails are much heavier. This distribution is shown in Figure 9.4.

9.7.2 Elliptical distributions

If the variable $x$ has a spherical distribution, $A$ is a square matrix of dimension $p$ and $m$ is a vector of dimension $p$, then the variable

$$y = m + Ax \qquad (9.32)$$

has an elliptical distribution. Since a spherical variable has mean zero and covariance matrix $cI$, it follows that an elliptical variable has mean $m$ and covariance matrix $V = cAA'$. Elliptical distributions have the following properties:

1. Their density function depends on the variable only through the Mahalanobis distance $(y - m)'V^{-1}(y - m)$.

2. The equiprobability contours are ellipsoids centered at the point $m$.

The general multivariate normal distribution is the best known of the elliptical distributions. Another member of this family is the multivariate t distribution. Although there are different versions of this distribution, the most common is constructed by dividing each component of a vector of multivariate normal variables $N_p(m, V)$ by the same scalar variable: the square root of a chi-squared variable divided by its degrees of freedom.
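The construction (9.32) is easy to simulate. The sketch below generates an elliptical (in this case normal) sample from a spherical one; the values of $m$ and $A$ are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m = np.array([1.0, 2.0])
A = np.array([[1.0, 0.0],
              [0.8, 0.6]])            # any square matrix; the covariance is V = A A'

x = rng.normal(size=(100_000, 2))     # spherical draws (standard bivariate normal)
y = m + x @ A.T                       # elliptical draws, y = m + A x

print(y.mean(axis=0).round(2))            # close to m
print(np.cov(y, rowvar=False).round(2))   # close to A @ A.T
```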


More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Chp 4. Expectation and Variance

Chp 4. Expectation and Variance Chp 4. Expectation and Variance 1 Expectation In this chapter, we will introduce two objectives to directly reflect the properties of a random variable or vector, which are the Expectation and Variance.

More information

Probability and Distributions

Probability and Distributions Probability and Distributions What is a statistical model? A statistical model is a set of assumptions by which the hypothetical population distribution of data is inferred. It is typically postulated

More information

Joint Distributions. (a) Scalar multiplication: k = c d. (b) Product of two matrices: c d. (c) The transpose of a matrix:

Joint Distributions. (a) Scalar multiplication: k = c d. (b) Product of two matrices: c d. (c) The transpose of a matrix: Joint Distributions Joint Distributions A bivariate normal distribution generalizes the concept of normal distribution to bivariate random variables It requires a matrix formulation of quadratic forms,

More information

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets

More information

3 Continuous Random Variables

3 Continuous Random Variables Jinguo Lian Math437 Notes January 15, 016 3 Continuous Random Variables Remember that discrete random variables can take only a countable number of possible values. On the other hand, a continuous random

More information

Lecture 2: Review of Probability

Lecture 2: Review of Probability Lecture 2: Review of Probability Zheng Tian Contents 1 Random Variables and Probability Distributions 2 1.1 Defining probabilities and random variables..................... 2 1.2 Probability distributions................................

More information

STA2603/205/1/2014 /2014. ry II. Tutorial letter 205/1/

STA2603/205/1/2014 /2014. ry II. Tutorial letter 205/1/ STA263/25//24 Tutorial letter 25// /24 Distribution Theor ry II STA263 Semester Department of Statistics CONTENTS: Examination preparation tutorial letterr Solutions to Assignment 6 2 Dear Student, This

More information

Stat 5101 Notes: Algorithms (thru 2nd midterm)

Stat 5101 Notes: Algorithms (thru 2nd midterm) Stat 5101 Notes: Algorithms (thru 2nd midterm) Charles J. Geyer October 18, 2012 Contents 1 Calculating an Expectation or a Probability 2 1.1 From a PMF........................... 2 1.2 From a PDF...........................

More information

Probability Theory and Statistics. Peter Jochumzen

Probability Theory and Statistics. Peter Jochumzen Probability Theory and Statistics Peter Jochumzen April 18, 2016 Contents 1 Probability Theory And Statistics 3 1.1 Experiment, Outcome and Event................................ 3 1.2 Probability............................................

More information

TAMS39 Lecture 2 Multivariate normal distribution

TAMS39 Lecture 2 Multivariate normal distribution TAMS39 Lecture 2 Multivariate normal distribution Martin Singull Department of Mathematics Mathematical Statistics Linköping University, Sweden Content Lecture Random vectors Multivariate normal distribution

More information

Covariance. Lecture 20: Covariance / Correlation & General Bivariate Normal. Covariance, cont. Properties of Covariance

Covariance. Lecture 20: Covariance / Correlation & General Bivariate Normal. Covariance, cont. Properties of Covariance Covariance Lecture 0: Covariance / Correlation & General Bivariate Normal Sta30 / Mth 30 We have previously discussed Covariance in relation to the variance of the sum of two random variables Review Lecture

More information

Algorithms for Uncertainty Quantification

Algorithms for Uncertainty Quantification Algorithms for Uncertainty Quantification Tobias Neckel, Ionuț-Gabriel Farcaș Lehrstuhl Informatik V Summer Semester 2017 Lecture 2: Repetition of probability theory and statistics Example: coin flip Example

More information

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Theorems Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Notes on Random Vectors and Multivariate Normal

Notes on Random Vectors and Multivariate Normal MATH 590 Spring 06 Notes on Random Vectors and Multivariate Normal Properties of Random Vectors If X,, X n are random variables, then X = X,, X n ) is a random vector, with the cumulative distribution

More information

The Multivariate Normal Distribution. In this case according to our theorem

The Multivariate Normal Distribution. In this case according to our theorem The Multivariate Normal Distribution Defn: Z R 1 N(0, 1) iff f Z (z) = 1 2π e z2 /2. Defn: Z R p MV N p (0, I) if and only if Z = (Z 1,..., Z p ) T with the Z i independent and each Z i N(0, 1). In this

More information

3d scatterplots. You can also make 3d scatterplots, although these are less common than scatterplot matrices.

3d scatterplots. You can also make 3d scatterplots, although these are less common than scatterplot matrices. 3d scatterplots You can also make 3d scatterplots, although these are less common than scatterplot matrices. > library(scatterplot3d) > y par(mfrow=c(2,2)) > scatterplot3d(y,highlight.3d=t,angle=20)

More information

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R In probabilistic models, a random variable is a variable whose possible values are numerical outcomes of a random phenomenon. As a function or a map, it maps from an element (or an outcome) of a sample

More information

Multivariate Random Variable

Multivariate Random Variable Multivariate Random Variable Author: Author: Andrés Hincapié and Linyi Cao This Version: August 7, 2016 Multivariate Random Variable 3 Now we consider models with more than one r.v. These are called multivariate

More information

01 Probability Theory and Statistics Review

01 Probability Theory and Statistics Review NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2 Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate

More information

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline. Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,

More information

BASICS OF PROBABILITY

BASICS OF PROBABILITY October 10, 2018 BASICS OF PROBABILITY Randomness, sample space and probability Probability is concerned with random experiments. That is, an experiment, the outcome of which cannot be predicted with certainty,

More information

Lectures on Elementary Probability. William G. Faris

Lectures on Elementary Probability. William G. Faris Lectures on Elementary Probability William G. Faris February 22, 2002 2 Contents 1 Combinatorics 5 1.1 Factorials and binomial coefficients................. 5 1.2 Sampling with replacement.....................

More information

Lecture 25: Review. Statistics 104. April 23, Colin Rundel

Lecture 25: Review. Statistics 104. April 23, Colin Rundel Lecture 25: Review Statistics 104 Colin Rundel April 23, 2012 Joint CDF F (x, y) = P [X x, Y y] = P [(X, Y ) lies south-west of the point (x, y)] Y (x,y) X Statistics 104 (Colin Rundel) Lecture 25 April

More information

Multiple Random Variables

Multiple Random Variables Multiple Random Variables This Version: July 30, 2015 Multiple Random Variables 2 Now we consider models with more than one r.v. These are called multivariate models For instance: height and weight An

More information

III - MULTIVARIATE RANDOM VARIABLES

III - MULTIVARIATE RANDOM VARIABLES Computational Methods and advanced Statistics Tools III - MULTIVARIATE RANDOM VARIABLES A random vector, or multivariate random variable, is a vector of n scalar random variables. The random vector is

More information

THE QUEEN S UNIVERSITY OF BELFAST

THE QUEEN S UNIVERSITY OF BELFAST THE QUEEN S UNIVERSITY OF BELFAST 0SOR20 Level 2 Examination Statistics and Operational Research 20 Probability and Distribution Theory Wednesday 4 August 2002 2.30 pm 5.30 pm Examiners { Professor R M

More information

ACM 116: Lectures 3 4

ACM 116: Lectures 3 4 1 ACM 116: Lectures 3 4 Joint distributions The multivariate normal distribution Conditional distributions Independent random variables Conditional distributions and Monte Carlo: Rejection sampling Variance

More information

Preliminary Statistics Lecture 3: Probability Models and Distributions (Outline) prelimsoas.webs.com

Preliminary Statistics Lecture 3: Probability Models and Distributions (Outline) prelimsoas.webs.com 1 School of Oriental and African Studies September 2015 Department of Economics Preliminary Statistics Lecture 3: Probability Models and Distributions (Outline) prelimsoas.webs.com Gujarati D. Basic Econometrics,

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Introduction to Normal Distribution

Introduction to Normal Distribution Introduction to Normal Distribution Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 17-Jan-2017 Nathaniel E. Helwig (U of Minnesota) Introduction

More information

Statistics for scientists and engineers

Statistics for scientists and engineers Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3

More information

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1 Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is

More information

This does not cover everything on the final. Look at the posted practice problems for other topics.

This does not cover everything on the final. Look at the posted practice problems for other topics. Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry

More information

Topic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr.

Topic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr. Topic 2: Probability & Distributions ECO220Y5Y: Quantitative Methods in Economics Dr. Nick Zammit University of Toronto Department of Economics Room KN3272 n.zammit utoronto.ca November 21, 2017 Dr. Nick

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 8. For any two events E and F, P (E) = P (E F ) + P (E F c ). Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 Sample space. A sample space consists of a underlying

More information

Statistics, Data Analysis, and Simulation SS 2015

Statistics, Data Analysis, and Simulation SS 2015 Statistics, Data Analysis, and Simulation SS 2015 08.128.730 Statistik, Datenanalyse und Simulation Dr. Michael O. Distler Mainz, 27. April 2015 Dr. Michael O. Distler

More information

Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01: Probability and Statistics for Engineers Spring 2013 Contents 1 Joint Probability Distributions 2 1.1 Two Discrete

More information

Discrete Distributions

Discrete Distributions Discrete Distributions STA 281 Fall 2011 1 Introduction Previously we defined a random variable to be an experiment with numerical outcomes. Often different random variables are related in that they have

More information

Notes for Math 324, Part 19

Notes for Math 324, Part 19 48 Notes for Math 324, Part 9 Chapter 9 Multivariate distributions, covariance Often, we need to consider several random variables at the same time. We have a sample space S and r.v. s X, Y,..., which

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

ECON Fundamentals of Probability

ECON Fundamentals of Probability ECON 351 - Fundamentals of Probability Maggie Jones 1 / 32 Random Variables A random variable is one that takes on numerical values, i.e. numerical summary of a random outcome e.g., prices, total GDP,

More information

CHAPTER 6 SOME CONTINUOUS PROBABILITY DISTRIBUTIONS. 6.2 Normal Distribution. 6.1 Continuous Uniform Distribution

CHAPTER 6 SOME CONTINUOUS PROBABILITY DISTRIBUTIONS. 6.2 Normal Distribution. 6.1 Continuous Uniform Distribution CHAPTER 6 SOME CONTINUOUS PROBABILITY DISTRIBUTIONS Recall that a continuous random variable X is a random variable that takes all values in an interval or a set of intervals. The distribution of a continuous

More information

Elements of Probability Theory

Elements of Probability Theory Short Guides to Microeconometrics Fall 2016 Kurt Schmidheiny Unversität Basel Elements of Probability Theory Contents 1 Random Variables and Distributions 2 1.1 Univariate Random Variables and Distributions......

More information

Introduction to bivariate analysis

Introduction to bivariate analysis Introduction to bivariate analysis When one measurement is made on each observation, univariate analysis is applied. If more than one measurement is made on each observation, multivariate analysis is applied.

More information

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems

More information

Stat 206: Sampling theory, sample moments, mahalanobis

Stat 206: Sampling theory, sample moments, mahalanobis Stat 206: Sampling theory, sample moments, mahalanobis topology James Johndrow (adapted from Iain Johnstone s notes) 2016-11-02 Notation My notation is different from the book s. This is partly because

More information

Random Vectors 1. STA442/2101 Fall See last slide for copyright information. 1 / 30

Random Vectors 1. STA442/2101 Fall See last slide for copyright information. 1 / 30 Random Vectors 1 STA442/2101 Fall 2017 1 See last slide for copyright information. 1 / 30 Background Reading: Renscher and Schaalje s Linear models in statistics Chapter 3 on Random Vectors and Matrices

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

Introduction to bivariate analysis

Introduction to bivariate analysis Introduction to bivariate analysis When one measurement is made on each observation, univariate analysis is applied. If more than one measurement is made on each observation, multivariate analysis is applied.

More information

4. CONTINUOUS RANDOM VARIABLES

4. CONTINUOUS RANDOM VARIABLES IA Probability Lent Term 4 CONTINUOUS RANDOM VARIABLES 4 Introduction Up to now we have restricted consideration to sample spaces Ω which are finite, or countable; we will now relax that assumption We

More information

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Definitions Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Lecture Notes Part 2: Matrix Algebra

Lecture Notes Part 2: Matrix Algebra 17.874 Lecture Notes Part 2: Matrix Algebra 2. Matrix Algebra 2.1. Introduction: Design Matrices and Data Matrices Matrices are arrays of numbers. We encounter them in statistics in at least three di erent

More information

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality.

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality. 88 Chapter 5 Distribution Theory In this chapter, we summarize the distributions related to the normal distribution that occur in linear models. Before turning to this general problem that assumes normal

More information

Basics on Probability. Jingrui He 09/11/2007

Basics on Probability. Jingrui He 09/11/2007 Basics on Probability Jingrui He 09/11/2007 Coin Flips You flip a coin Head with probability 0.5 You flip 100 coins How many heads would you expect Coin Flips cont. You flip a coin Head with probability

More information

ME 597: AUTONOMOUS MOBILE ROBOTICS SECTION 2 PROBABILITY. Prof. Steven Waslander

ME 597: AUTONOMOUS MOBILE ROBOTICS SECTION 2 PROBABILITY. Prof. Steven Waslander ME 597: AUTONOMOUS MOBILE ROBOTICS SECTION 2 Prof. Steven Waslander p(a): Probability that A is true 0 pa ( ) 1 p( True) 1, p( False) 0 p( A B) p( A) p( B) p( A B) A A B B 2 Discrete Random Variable X

More information

STAT 4385 Topic 01: Introduction & Review

STAT 4385 Topic 01: Introduction & Review STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics

More information

2. Matrix Algebra and Random Vectors

2. Matrix Algebra and Random Vectors 2. Matrix Algebra and Random Vectors 2.1 Introduction Multivariate data can be conveniently display as array of numbers. In general, a rectangular array of numbers with, for instance, n rows and p columns

More information

Statistical Methods in Particle Physics

Statistical Methods in Particle Physics Statistical Methods in Particle Physics Lecture 3 October 29, 2012 Silvia Masciocchi, GSI Darmstadt s.masciocchi@gsi.de Winter Semester 2012 / 13 Outline Reminder: Probability density function Cumulative

More information

Formulas for probability theory and linear models SF2941

Formulas for probability theory and linear models SF2941 Formulas for probability theory and linear models SF2941 These pages + Appendix 2 of Gut) are permitted as assistance at the exam. 11 maj 2008 Selected formulae of probability Bivariate probability Transforms

More information

Today. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion

Today. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion Today Probability and Statistics Naïve Bayes Classification Linear Algebra Matrix Multiplication Matrix Inversion Calculus Vector Calculus Optimization Lagrange Multipliers 1 Classical Artificial Intelligence

More information

Interpreting Regression Results

Interpreting Regression Results Interpreting Regression Results Carlo Favero Favero () Interpreting Regression Results 1 / 42 Interpreting Regression Results Interpreting regression results is not a simple exercise. We propose to split

More information

1: PROBABILITY REVIEW

1: PROBABILITY REVIEW 1: PROBABILITY REVIEW Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 1: Probability Review 1 / 56 Outline We will review the following

More information

Review of Probability Theory

Review of Probability Theory Review of Probability Theory Arian Maleki and Tom Do Stanford University Probability theory is the study of uncertainty Through this class, we will be relying on concepts from probability theory for deriving

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Regression. Oscar García

Regression. Oscar García Regression Oscar García Regression methods are fundamental in Forest Mensuration For a more concise and general presentation, we shall first review some matrix concepts 1 Matrices An order n m matrix is

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

More information

p(z)

p(z) Chapter Statistics. Introduction This lecture is a quick review of basic statistical concepts; probabilities, mean, variance, covariance, correlation, linear regression, probability density functions and

More information