MULTIVARIATE PROBABILITY DISTRIBUTIONS


1. PRELIMINARIES

1.1. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined on this sample space. We will assign an indicator random variable to the result of tossing the coin: if it comes up heads we assign a value of one, and if it comes up tails we assign a value of zero. Consider the following random variables.

X1: The number of dots appearing on the die.
X2: The sum of the number of dots on the die and the indicator for the coin.
X3: The value of the indicator for tossing the coin.
X4: The product of the number of dots on the die and the indicator for the coin.

There are twelve sample points associated with this experiment, where the first element of the pair is the number on the die and the second is whether the coin comes up heads or tails.

E1: 1H   E2: 2H   E3: 3H   E4: 4H   E5: 5H   E6: 6H
E7: 1T   E8: 2T   E9: 3T   E10: 4T  E11: 5T  E12: 6T

Random variable X1 has six possible outcomes, each with probability 1/6. Random variable X3 has two possible outcomes, each with probability 1/2. Consider the values of X2 for each of the sample points. The possible outcomes and the probabilities for X2 are as follows.

TABLE 1. Probability of X2
Value of Random Variable    Probability
1                           1/12
2                           1/6
3                           1/6
4                           1/6
5                           1/6
6                           1/6
7                           1/12

The possible outcomes and the probabilities for X4 are as follows.

TABLE 2. Probability of X4
Value of Random Variable    Probability
0                           1/2
1                           1/12
2                           1/12
3                           1/12
4                           1/12
5                           1/12
6                           1/12
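As a quick check of Tables 1 and 2, the following short Python sketch (not part of the original notes) enumerates the twelve equally likely sample points and tabulates the distributions of X2 and X4.

```python
from fractions import Fraction
from collections import Counter

# Enumerate the 12 equally likely sample points: (die, coin indicator).
# The indicator is 1 for heads and 0 for tails.
points = [(die, coin) for die in range(1, 7) for coin in (0, 1)]
p = Fraction(1, len(points))          # each point has probability 1/12

# X2 = die + indicator, X4 = die * indicator
pmf_X2 = Counter()
pmf_X4 = Counter()
for die, coin in points:
    pmf_X2[die + coin] += p
    pmf_X4[die * coin] += p

# X2 takes the values 1 and 7 with probability 1/12 and 2 through 6 with probability 1/6.
print(sorted(pmf_X2.items()))
# X4 takes the value 0 with probability 1/2 and 1 through 6 with probability 1/12.
print(sorted(pmf_X4.items()))
```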

1.2. Bivariate Random Variables. Now consider the intersection of X1 and X2. We call this intersection a bivariate random variable. For a general bivariate case we write this as P(X1 = x1, X2 = x2). We can write the probability distribution in the form of a table as follows for the above example.

TABLE 3. Joint Probability of X1 and X2
                         X2
X1       1      2      3      4      5      6      7
1      1/12   1/12     0      0      0      0      0
2        0    1/12   1/12     0      0      0      0
3        0      0    1/12   1/12     0      0      0
4        0      0      0    1/12   1/12     0      0
5        0      0      0      0    1/12   1/12     0
6        0      0      0      0      0    1/12   1/12

For the example, P(X1 = 3, X2 = 3) = 1/12, which is the probability of sample point E9.

2. PROBABILITY DISTRIBUTIONS FOR DISCRETE MULTIVARIATE RANDOM VARIABLES

2.1. Definition. If X1 and X2 are discrete random variables, the function given by p(x1, x2) = P(X1 = x1, X2 = x2) for each pair of values (x1, x2) within the range of X1 and X2 is called the joint (or bivariate) probability distribution for X1 and X2. Specifically we write

p(x1, x2) = P(X1 = x1, X2 = x2),   −∞ < x1 < ∞, −∞ < x2 < ∞.   (1)
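The joint distribution in Table 3 can be built the same way. The sketch below (again an illustration, not part of the original notes) collects the probability of each (x1, x2) pair and confirms that P(X1 = 3, X2 = 3) = 1/12.

```python
from fractions import Fraction
from collections import Counter

# Joint distribution of X1 (die) and X2 (die + coin indicator).
points = [(die, coin) for die in range(1, 7) for coin in (0, 1)]
p = Fraction(1, len(points))

joint = Counter()
for die, coin in points:
    joint[(die, die + coin)] += p     # accumulate probability of each (x1, x2) pair

print(joint[(3, 3)])                  # 1/12, the probability of sample point E9

# Summing a row of the table recovers P(X1 = 3) = 1/6.
print(sum(prob for (x1, _), prob in joint.items() if x1 == 3))
```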

In the single-variable case, the probability function for a discrete random variable X assigns non-zero probabilities to a countable number of distinct values of X in such a way that the sum of the probabilities is equal to 1. Similarly, in the bivariate case the joint probability function p(x1, x2) assigns non-zero probabilities to only a countable number of pairs of values (x1, x2). Further, the non-zero probabilities must sum to 1.

2.2. Properties of the Joint Probability (or Density) Function.

Theorem 1. If X1 and X2 are discrete random variables with joint probability function p(x1, x2), then
(i) p(x1, x2) ≥ 0 for all x1, x2,
(ii) Σ_{x1, x2} p(x1, x2) = 1, where the sum is over all values (x1, x2) that are assigned non-zero probabilities.

Once the joint probability function has been determined for discrete random variables X1 and X2, calculating joint probabilities involving X1 and X2 is straightforward.

2.3. Example. Roll a red die and a green die. Let

X1 = number of dots on the red die
X2 = number of dots on the green die

There are 36 points in the sample space.

TABLE 4. Possible Outcomes of Rolling a Red Die and a Green Die. (First number in pair is number on red die.)

Red\Green    1      2      3      4      5      6
1          (1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)
2          (2,1)  (2,2)  (2,3)  (2,4)  (2,5)  (2,6)
3          (3,1)  (3,2)  (3,3)  (3,4)  (3,5)  (3,6)
4          (4,1)  (4,2)  (4,3)  (4,4)  (4,5)  (4,6)
5          (5,1)  (5,2)  (5,3)  (5,4)  (5,5)  (5,6)
6          (6,1)  (6,2)  (6,3)  (6,4)  (6,5)  (6,6)

Each of the 36 sample points is equally likely. The probability of a particular pair such as (1, 1) is 1/36; the probability of (2, 3) is also 1/36. Now consider P(2 ≤ X1 ≤ 3, 1 ≤ X2 ≤ 2). This is given as

P(2 ≤ X1 ≤ 3, 1 ≤ X2 ≤ 2) = p(2, 1) + p(2, 2) + p(3, 1) + p(3, 2) = 4/36 = 1/9
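A small sketch of Example 2.3 (not part of the original notes), assuming the bounds used in the reconstruction above, 2 ≤ X1 ≤ 3 and 1 ≤ X2 ≤ 2:

```python
from fractions import Fraction

# Joint pmf of the red die (X1) and green die (X2): 36 equally likely pairs.
p = {(x1, x2): Fraction(1, 36) for x1 in range(1, 7) for x2 in range(1, 7)}

# P(2 <= X1 <= 3, 1 <= X2 <= 2) = p(2,1) + p(2,2) + p(3,1) + p(3,2)
prob = sum(p[(x1, x2)] for x1 in (2, 3) for x2 in (1, 2))
print(prob)   # 1/9
```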

4 MULTIVARIATE PROBABILITY DISTRIBUTIONS.4. Example. Consider the example of tossing a coin and rolling a die from section. Now consider P ( X, X ). This is given as P ( X 4, X 5) p(, ) + p(, 4) + p(, 5) + p(, ) + p(, 4) + p(, 5) + p(4, ) + p(4, 4) + p(4, 5) 5.5. Example. Two caplets are selected at random from a bottle containing three aspirin, two sedative, and four cold caplets. If X and Y are, respectively, the numbers of aspirin and sedative caplets included among the two caplets drawn from the bottle, find the probabilities associated with all possible pairs of values of X and Y? The possible pairs are (, ), (, ), (, ), (, ), (, ), and (, ). To find the probability associated with (, ), for example, observe that we are concerned with the event of getting one of the three aspirin caplets, none of the two sedative caplets, and hence, one of the four cold caplets. The number of ways in which this can be done is ( )( )( ) 4 and the total number of ways in which two of the nine caplets can be selected is ( ) 9. Since those possibilities are all equally likely by virtue of the assumption that the selection is random, it follows that the probability associated with (, ) is. Similarly, the probability associated with (, ) is ( )( )( 4 ) and, continuing this way, we obtain the values shown in the following table: TABLE 5. Joint Probability of Drawing Aspirin (X ) and Sedative Caplets (Y ). y 9 x We can also represent this joint probability distribution as a formula

We can also represent this joint probability distribution as a formula:

p(x, y) = C(3, x) C(2, y) C(4, 2 − x − y) / C(9, 2),   x = 0, 1, 2;  y = 0, 1, 2;  0 ≤ x + y ≤ 2

3. DISTRIBUTION FUNCTIONS FOR DISCRETE MULTIVARIATE RANDOM VARIABLES

3.1. Definition of the Distribution Function. If X1 and X2 are discrete random variables, the function given by

F(x1, x2) = P[X1 ≤ x1, X2 ≤ x2] = Σ_{u1 ≤ x1} Σ_{u2 ≤ x2} p(u1, u2),   −∞ < x1 < ∞, −∞ < x2 < ∞   (2)

where p(u1, u2) is the value of the joint probability function of X1 and X2 at (u1, u2), is called the joint distribution function, or the joint cumulative distribution, of X1 and X2.

3.2. Examples.

3.2.1. Example 1. Consider the experiment of tossing a red and a green die, where X1 is the number on the red die and X2 is the number on the green die. Now find F(2, 3) = P(X1 ≤ 2, X2 ≤ 3). This is given by summing as in the definition (equation 2).

F(2, 3) = P[X1 ≤ 2, X2 ≤ 3] = Σ_{u1 ≤ 2} Σ_{u2 ≤ 3} p(u1, u2)
        = p(1, 1) + p(1, 2) + p(1, 3) + p(2, 1) + p(2, 2) + p(2, 3)
        = 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 1/6

3.2.2. Example 2. Consider Example 2.5 from Section 2. The joint probability distribution is given in Table 5, which is repeated here for convenience.

TABLE 5. Joint Probability of Drawing Aspirin (X) and Sedative (Y) Caplets.
                x
y         0       1       2
0        1/6     1/3     1/12
1        2/9     1/6      0
2        1/36     0       0

The joint probability distribution is

MULTIVARIATE PROBABILITY DISTRIBUTIONS p(x, y) ( )( ) 4 x y)( x y, x,, ; y,, ; (x + y) For this problem find F (, ) P (X, Y ). This is given by F (, ) P [X, Y ] u p(u, u ) u p(, ) + p(, ) + p(, ) + p(, ) + p(, ) + p(, ) + 9 + + + + + 8 + + + 4. PROBABILITY DISTRIBUTIONS FOR CONTINUOUS BIVARIATE RANDOM VARIABLES 4.. Definition of a Joint Probability Density Function. A bivariate function with values f(x, x ) defined over the x x -plane is called a joint probability density function of the continuous random variables X and X if, and only if, P [(X, X ) A] A f(x, x ) dx dx for any region A the x x -plane () 4.. Properties of the Joint Probability (or Density) Function in the Continuous Case. Theorem. A bivariate function can serve as a joint probability density function of a pair of continuous random variables X and X if its values, f(x, x ), satisfy the conditions (i) f(x, x ) (ii) f(x, x )dx dx for < x <, < x < 4.. Example of a Joint Probability Density Function. Given the joint probability density function { x f(x, x ) x < x <, < x < elsewhere of the two random variables, X and X, find P [(X, X ) A], where A is the region {(x, x ) < x < 4, < x < }. We find the probability by integrating the double integral over the relevant region, i.e.,

MULTIVARIATE PROBABILITY DISTRIBUTIONS 7 P ( < X < 4, < X < ) Integrate the inner integral first. 4 4 4 f(x, x ) dx dx x x dx dx + x x dx dx 4 dx dx P ( < X < 4, < X < ) Now integrate the remaining integral 4 x x dx dx (x ) x dx ( ( () 4 ( () 4 54 4 x dx ) x ) dx ( ) ) 7 x dx 4 P ( < X < 4, < X < ) 54 54 4 x dx 8 x ( 54 8 ( 54 8 48 8 8 ) () ) () ( ) ( ) 54 8 9 ( ) 8

8 MULTIVARIATE PROBABILITY DISTRIBUTIONS This probability is the volume under the surface f(x, x ) x x and above the rectangular set {(x, x ) < x < 4, < x < } in the x x -plane. We can see this area in figure FIGURE. Probability that ( < X < 4, < X < ) 4.4. Definition of a Joint Distribution Function. If X and X are continuous random variables, the function given by F (x, x ) P [X x, X x ] x x f(u, u ) du du < x < < x < (4) where f(u, u ) is the value of the joint probability function of X and X at (u, u ) is called the joint distribution function, or the joint cumulative distribution of X and X. If the joint distribution function is continuous everywhere and partially differentiable with respect to x and x for all but a finite set of values then f(x, x ) wherever these partial derivatives exist. x x F (x, x ) (5)

4.5. Properties of the Joint Distribution Function.

Theorem 3. If X1 and X2 are random variables with joint distribution function F(x1, x2), then
(i) F(−∞, −∞) = F(−∞, x2) = F(x1, −∞) = 0
(ii) F(∞, ∞) = 1
(iii) If a < b and c < d, then F(a, c) ≤ F(b, d)
(iv) If a > x1 and b > x2, then F(a, b) − F(a, x2) − F(x1, b) + F(x1, x2) ≥ 0

Part (iv) follows because

F(a, b) − F(a, x2) − F(x1, b) + F(x1, x2) = P[x1 < X1 ≤ a, x2 < X2 ≤ b] ≥ 0

Note also that

F(∞, ∞) = lim_{x1→∞} lim_{x2→∞} F(x1, x2) = 1

implies that the joint density function f(x1, x2) must be such that the integral of f(x1, x2) over all values of (x1, x2) is 1.

4.6. Examples of a Joint Distribution Function and Density Functions.

4.6.1. Deriving a Distribution Function from a Joint Density Function. Consider a joint density function for X1 and X2 given by

f(x1, x2) = x1 + x2   for 0 < x1 < 1, 0 < x2 < 1
f(x1, x2) = 0         elsewhere

This has a positive value in the square bounded by the horizontal and vertical axes and the vertical and horizontal lines at one. It is zero elsewhere. We will therefore need to find the value of the distribution function for five different regions:

the second, third and fourth quadrants;
the square defined by the vertical and horizontal lines at one;
the area between the vertical axis and a vertical line at one and above a horizontal line at one in the first quadrant;
the area between the horizontal axis and a horizontal line at one and to the right of a vertical line at one in the first quadrant;
the area in the first quadrant not previously mentioned.

This can be diagrammed as follows.

[Figure: the five regions of the x1 x2-plane, showing quadrants II, III and IV, the unit square 0 < x1 < 1, 0 < x2 < 1, and the two strips above and to the right of it.]

We find the distribution function by integrating the joint density function. If either x1 < 0 or x2 < 0, it follows that

F(x1, x2) = 0

For 0 < x1 < 1 and 0 < x2 < 1, we get

F(x1, x2) = ∫_0^{x2} ∫_0^{x1} (s + t) ds dt = (1/2) x1 x2 (x1 + x2)

for x1 > 1 and 0 < x2 < 1, we get

F(x1, x2) = ∫_0^{x2} ∫_0^{1} (s + t) ds dt = (1/2) x2 (x2 + 1)

for 0 < x1 < 1 and x2 > 1, we get

F(x1, x2) = ∫_0^{1} ∫_0^{x1} (s + t) ds dt = (1/2) x1 (x1 + 1)

and for x1 > 1 and x2 > 1 we get

F(x1, x2) = ∫_0^{1} ∫_0^{1} (s + t) ds dt = 1
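The region-by-region integrations can be checked symbolically. The following sketch (an illustration, not from the original notes) uses sympy to reproduce two of the cases and to confirm that differentiating the distribution function recovers the density, as in equation (5).

```python
import sympy as sp

s, t, x1, x2 = sp.symbols("s t x1 x2", positive=True)
f = s + t   # joint density on the unit square

# F(x1, x2) for 0 < x1 < 1, 0 < x2 < 1
F_inside = sp.integrate(f, (s, 0, x1), (t, 0, x2))
print(sp.factor(F_inside))            # x1*x2*(x1 + x2)/2

# For x1 > 1 the inner integral runs over the full support of s.
F_right = sp.integrate(f, (s, 0, 1), (t, 0, x2))
print(sp.factor(F_right))             # x2*(x2 + 1)/2

# Differentiating recovers the density inside the square.
print(sp.expand(sp.diff(F_inside, x1, x2)))   # x1 + x2
```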

Because the joint distribution function is everywhere continuous, the boundaries between any two of these regions can be included in either one, and we can write

F(x1, x2) = 0                           for x1 ≤ 0 or x2 ≤ 0
F(x1, x2) = (1/2) x1 x2 (x1 + x2)       for 0 < x1 < 1, 0 < x2 < 1
F(x1, x2) = (1/2) x2 (x2 + 1)           for x1 ≥ 1, 0 < x2 < 1
F(x1, x2) = (1/2) x1 (x1 + 1)           for 0 < x1 < 1, x2 ≥ 1
F(x1, x2) = 1                           for x1 ≥ 1, x2 ≥ 1

4.7. Deriving a Joint Density Function from a Distribution Function. Consider two random variables X1 and X2 whose joint distribution function is given by

F(x1, x2) = (1 − e^{−x1})(1 − e^{−x2})   for x1 > 0 and x2 > 0
F(x1, x2) = 0                            elsewhere

Partial differentiation yields

∂²F(x1, x2) / ∂x1 ∂x2 = e^{−(x1 + x2)}

for x1 > 0 and x2 > 0 and 0 elsewhere, so we find that the joint probability density of X1 and X2 is given by

f(x1, x2) = e^{−(x1 + x2)}   for x1 > 0 and x2 > 0
f(x1, x2) = 0                elsewhere

5. MULTIVARIATE DISTRIBUTIONS FOR CONTINUOUS RANDOM VARIABLES

5.1. Joint Density of Several Random Variables. The k-dimensional random variable (X1, X2, ..., Xk) is said to be a continuous k-dimensional random variable if there exists a function f(·, ·, ..., ·) such that

F(x1, x2, ..., xk) = ∫_{−∞}^{xk} ... ∫_{−∞}^{x2} ∫_{−∞}^{x1} f(u1, u2, ..., uk) du1 ... duk   (6)

for all (x1, x2, ..., xk), where

F(x1, x2, x3, ...) = P[X1 ≤ x1, X2 ≤ x2, X3 ≤ x3, ...]

The function f(·) is defined to be a joint probability density function. It has the following properties:

f(x1, x2, ..., xk) ≥ 0
∫_{−∞}^{∞} ... ∫_{−∞}^{∞} f(x1, x2, ..., xk) dx1 ... dxk = 1   (7)

In order to make clear the variables over which f is defined, it is sometimes written

f(x1, x2, ..., xk) = f_{X1, X2, ..., Xk}(x1, x2, ..., xk)   (8)
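The example in section 4.7 can be verified the same way; this sketch (not from the original notes) differentiates F to recover the density and checks that the density integrates to one over its support.

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", positive=True)
F = (1 - sp.exp(-x1)) * (1 - sp.exp(-x2))   # joint CDF from section 4.7

# The joint density is the mixed partial derivative of F.
f = sp.simplify(sp.diff(F, x1, x2))
print(f)                                     # exp(-x1 - x2)

# The density integrates to one over the support, as required.
total = sp.integrate(f, (x1, 0, sp.oo), (x2, 0, sp.oo))
print(total)                                 # 1
```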

MULTIVARIATE PROBABILITY DISTRIBUTIONS. MARGINAL DISTRIBUTIONS.. Example Problem. Consider the example of tossing a coin and rolling a die from section. The probability of any particular pair, (x, x ) is given in the Table. TABLE. Joint and Marginal Probabilities of X and X. X 4 5 7 X 4 5 Notice that we have summed the columns and the rows and placed these sums at the bottom and right hand side of the table. The sum in the first column is the probability that X. The sum in the sixth row is the probability that X. Specifically the column totals are the probabilities that X will take on the values,,,..., 7. They are the values g(x ) p(x, x ) for x,,,..., 7 x In the same way, the row totals are the probabilities that X will take on the values in its space. Because these numbers are computed in the margin of the table, they are called marginal probabilities... Marginal Distributions for Discrete Random Variables. If X and X are discrete random variables and p(x, x ) is the value of their joint distribution function at (x, x ), the function given by g(x ) x p(x, x ) (9) for each x within the range of X is called the marginal distribution of X. Correspondingly, the function given by

MULTIVARIATE PROBABILITY DISTRIBUTIONS h(x ) x p(x, x ) () for each x within the range of X is called the marginal distribution of X... Marginal Distributions for Continuous Random Variables. If X and Y are jointly continuous random variables, then the functions f X ( ) and f Y ( ) are called the marginal probability density functions. The subscripts remind us that f X is defined for the random variable X. Intuitively, the marginal density is the density that results when we ignore any information about the random outcome Y. The marginal densities are obtained by integration of the joint density f X (x) f Y (y) f X, Y (x, y) dy f X, Y (x, y) dx In a similar fashion for a k-dimensional random variable X () f X (x ) f X (x )...... f(x, x,... ) dx dx... dx k f(x, x,... ) dx dx... dx k ().4. Example. Let the joint density of two random variables x and x be given by f(x, x ) { x e x x, x otherwise What are the marginal densities of x and x? First find the marginal density for x. f (x ) Now find the marginal density for x. x e x e x e x x e x dx

4 MULTIVARIATE PROBABILITY DISTRIBUTIONS f (x ) x e x dx x e x ( x e ) x e x.5. Example. Let the joint density of two random variables x and y be given by { (x + 4y) < x <, < y < f(x, y) otherwise What are the marginal densities of x and y? First find the marginal density for x. f X (x) (x + 4y) dy (xy + y ) (x + ) () Now find the marginal density for y. f Y (y) (x + ) (x + 4y) dx ( ) x + 4xy ( ) 4 + 8y () ( + 8y) 7. CONDITIONAL DISTRIBUTIONS 7.. Conditional Probability Functions for Discrete Distributions. We have previously shown that the conditional probability of A given B can be obtained by dividing the probability of the intersection by the probability of B, specifically,

MULTIVARIATE PROBABILITY DISTRIBUTIONS 5 P (A B) P (A B) P (B) Now consider two random variables X and Y. We can write the probability that X x and Y y as () P (X x Y y) P (X x, Y y) P (Y y) p (x, y) h(y) (4) provided P (Y y), where p(x, y) is the value of joint probability distribution of X and Y at (x, y) and h(y) is the value of the marginal distribution of Y at y. We can then define a conditional distribution of X given Y y as follows. If p(x, y) is the value of the joint probability distribution of the discrete random variables X and Y at (x, y) and h(y) is the value for the marginal distribution of Y at y, then the function given by p (x y) p (x, y) h(y) h(y) (5) for each x within the range of X, is called the conditional distribution of X given Y y. 7.. Example for discrete distribution. Consider the example of tossing a coin and rolling a die from section. The probability of any particular pair, (x, x ) is given in the following table where x is the value on the die and x is the sum of the number on the die and an indicator that is one if the coin is a head and zero otherwise. The data is in the Table is repeated here for convenience. Consider the probability that x given that x 4. We compute this as follows. For the example, P (X, X ), which is the probability of sample point E 9. p (x x ) p( 4) p (x, x ) h(x ) p(, 4) h(4) We can then make a table for the conditional probability function for x. We do the same for X given X in table 8. 7.. Conditional Distribution Functions for Continuous Distributions.

MULTIVARIATE PROBABILITY DISTRIBUTIONS TABLE. Joint and Marginal Probabilities of X and X. X 4 5 7 X 4 5 TABLE 7. Probability Function for X given X. X 4 5 7 X 4 5 7... Discussion. In the continuous case, the idea of a conditional distribution takes on a slightly different meaning than in the discrete case. If X and X are both continuous, P (X x X x ) is not defined because the probability of any one point is identically zero. It make sense however to define a conditional distribution function, i.e., P (X x X x )

MULTIVARIATE PROBABILITY DISTRIBUTIONS 7 TABLE 8. Probability Function for X given X. X 4 5 7 X 4 5 because the value of X is known when we compute the value the probability that X is less than some specific value. 7... Definition of a Continuous Distribution Function. If X and X are jointly continuous random variables with joint density function f(x, x ), then the conditional distribution function of X given X x is F (x x ) P (X x X x ) () We can obtain the unconditional distribution function by integrating the conditional one over x. This is done as follows. F (x ) F (x x )f X (x ) dx (7) We can also find the probability that X is less than x is the usual fashion as F (x ) x f X (t ) dt (8) But the marginal distribution inside the integral is obtained by integrating the joint density over the range of x. Specifically, This implies then that f X (t ) f X X (t, x ) dx (9) F (x ) x f X X (t, x ) dt dx ()
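To make the conditional probability density function defined just above concrete, here is a small symbolic sketch (not part of the original notes) using the density f(x1, x2) = x1 + x2 on the unit square from section 4.6; the conditional density of X1 given X2 = x2 works out to (x1 + x2)/(x2 + 1/2).

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", positive=True)
f_joint = x1 + x2            # density on 0 < x1 < 1, 0 < x2 < 1 (section 4.6)

# Marginal density of X2, obtained by integrating out x1.
f_X2 = sp.integrate(f_joint, (x1, 0, 1))          # x2 + 1/2

# Conditional density of X1 given X2 = x2.
f_cond = sp.simplify(f_joint / f_X2)
print(f_cond)                                     # (x1 + x2)/(x2 + 1/2)

# It integrates to one in x1 for any fixed x2, as a conditional density must.
print(sp.simplify(sp.integrate(f_cond, (x1, 0, 1))))   # 1
```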

8 MULTIVARIATE PROBABILITY DISTRIBUTIONS Now compare the integrand in equation with that in equation 7 to conclude that F (x x ) f X (x ) F (x x ) x x f X X (t, x ) dt f X X (t, x ) dt f X (x ) We call the integrand in the second line of () the conditional density function of X given X x. We denote it by f(x x ) orf X X (x x ). Specifically Let X and X be jointly continuous random variables with joint probability density f X X (x, x ) and marginal densities f X (x ) and f X (x ), respectively. For any x such that f X (x ) >, the conditional probability density function of X given X x, is defined to be () And similarly f X X (x x ) f X X (x, x ) f X (x ) f(x, x ) f(x ) f X X (x x ) f X X (x, x ) f X (x ) f(x, x ) f(x ) () () 7.4. Example. Let the joint density of two random variables x and y be given by { (x + 4y) < x <, < y < f(x, y) otherwise The marginal density of x is f X (x) (x + ) while the marginal density of y is f y (y) ( + 8y). Now find the conditional distribution of x given y. This is given by f X Y (x y) f(x, y) f(y) (x + 4y) (8y + ) (x + 4y) ( + 8y) for < x < and < y <. Now find the probability that X given that y. First determine the density function when y as follows

f(x, y) / f(y) = 2(x + 4y) / (8y + 1) = 2(x + 4(1/2)) / (8(1/2) + 1)

Then

f(x | y = 1/2) = 2(x + 2) / (4 + 1) = (2/5)(x + 2)

and

P(X ≤ 1/2 | Y = 1/2) = ∫_0^{1/2} (2/5)(x + 2) dx = (2/5)(x²/2 + 2x) |_0^{1/2} = (2/5)(1/8 + 1) = 9/20

8. INDEPENDENT RANDOM VARIABLES

8.1. Discussion. We have previously shown that two events A and B are independent if the probability of their intersection is the product of their individual probabilities, i.e.

P(A ∩ B) = P(A) P(B)   (24)

In terms of random variables X and Y, consistency with this definition would imply that

P(a ≤ X ≤ b, c ≤ Y ≤ d) = P(a ≤ X ≤ b) P(c ≤ Y ≤ d)   (25)

That is, if X and Y are independent, the joint probability can be written as the product of the marginal probabilities. We then have the following definition. Let X have distribution function F_X(x), Y have distribution function F_Y(y), and let X and Y have joint distribution function F(x, y). Then X and Y are said to be independent if, and only if,

F(x, y) = F_X(x) F_Y(y)   (26)

for every pair of real numbers (x, y). If X and Y are not independent, they are said to be dependent.

8.2. Independence Defined in Terms of Density Functions.

MULTIVARIATE PROBABILITY DISTRIBUTIONS 8... Discrete Random Variables. If X and Y are discrete random variables with joint probability density function p(x, y) and marginal density functions p X (x) and p Y (y), respectively, then X and Y are independent if, and only if p X,Y (x, y) p X (x)p Y (y) p(x)p(y) (7) for all pairs of real numbers (x, y). 8... Continuous Bivariate Random Variables. If X and Y are continuous random variables with joint probability density function f(x, y) and marginal density functions f X (x) and f Y (y), respectively then X and Y are independent if and only if f X,Y (x, y) f X (x)f Y (y) f(x)f(y) (8) for all pairs of real numbers (x, y). 8.. Continuous Multivariate Random Variables. In a more general context the variables X, X,..., X k are independent if, and only if k f X, X,..., X k (x, x,..., x k ) f Xi (x i ) i f X (x )f X (x )... f Xk (x k ) (9) f(x )f(x )... f(x k ) In other words two random variables are independent if the joint density is equal to the product of the marginal densities. 8.4. Examples. 8.4.. Example Rolling a Die and Tossing a Coin. Consider the previous example where we rolled a die and tossed a coin. X is the number on the die, X is the number of the die plus the value of the indicator on the coin (H ). Table 8 is repeated here for convenience. For independence p(x, y) p(x)p(y) for all values of x and x. To show that the variables are not independent, we only need show that p(x, y) p(x)p(y) Consider p(, ). If we multiply the marginal probabilities we obtain ( ) ( )

MULTIVARIATE PROBABILITY DISTRIBUTIONS TABLE 8. Probability Function for X given X. X 4 5 7 X 4 5 8.4.. Example A Continuous Multiplicative Joint Density. Let the joint density of two random variables x and x be given by f(x x ) { x e x x, x otherwise The marginal density for x is given by f (x ) x e x dx x e x e x e x

MULTIVARIATE PROBABILITY DISTRIBUTIONS The marginal density for x is given by f (x ) x e x dx x e x ( x e ) x e x It is clear the joint density is the product of the marginal densities. 8.4.. Example. Let the joint density of two random variables x and y be given by x log[y] f(x, y) ( + log[] log[4] ) x, y 4 otherwise First find the marginal density for x. x log[y] f X (x) ( + log[] log[4] ) dy x y ( log[y] ) ( + log[] log[4] ) 4 x 4 ( log[4] ) + x ( log[] ) ( + log[] log[4] ) x ( ( log[] ) 4 ( log[4] )) ( + log[] log[4] ) x( log[] 4 log[4] + 4 ) ( + log[] log[4] ) x( log[] 4 log[4] + ) ( + log[] log[4] ) x ( ( + log[] log[4] )) x Now find the marginal density for y. ( + log[] log[4] )

MULTIVARIATE PROBABILITY DISTRIBUTIONS x log[y] f Y (y) ( + log[] log[4] ) dx x log[y] ( + log[] log[4] ) log[y] + ( + log[] log[4] ) log[y] ( + log[] log[4] ) It is clear the joint density is the product of the marginal densities. 8.4.4. Example 4. Let the joint density of two random variables X and Y be given by 5 x + y x, y f(x, y) otherwise First find the marginal density for x. f X (x) ( 5 x + ) y dy ( 5 x y + ) y ( 5 x + ) 5 x + 5 Now find the marginal density for y.

4 MULTIVARIATE PROBABILITY DISTRIBUTIONS f Y (y) ( 5 x + ) y dx ( 5 x + ) xy ( 5 + ) y ( 5 + ) y The product of the marginal densities is not the joint density. 8.4.5. Example 5. Let the joint density of two random variables X and Y be given by f(x, y) Find the marginal density of X. f X (x) { e (x+y) x y, y otherwise x e (x+y) dy e (x+y) x e (x+ ) + e x e x The marginal density of Y is obtained as follows. ( e (x+x)) f Y (y) y e (x+y) dx e (x+y) y e (y+y) e y + e y e y ( e y) ( e (+y))

MULTIVARIATE PROBABILITY DISTRIBUTIONS 5 We can show that this is a proper density function by integrating it over the range of x and y. x e (x+y) dy dx [ e (x+y) ] dx e x dx e x e [ e ] + x Or in the other order as follows. y e (x+y) dx dy [ e (x+y) y ] dy [ e y e y] dy e y [ e y] [ e + ] [ e + ] [ + ] [ + ]

MULTIVARIATE PROBABILITY DISTRIBUTIONS 8.5. Separation of a Joint Density Function. 8.5.. Theorem 4. Theorem 4. Let X and X have a joint density function f(x, x ) that is positive if, and only if, a x b and c x d, for constants a, b, c and d; and f(x, x ) otherwise. Then X and X are independent random variables if, and only if f(x, x ) g(x )h(x ) where g(x ) is a non-negative function of x alone and h(x ) is a non-negative function of x alone. Thus if we can separate the joint density into two multiplicative terms, one depending on x alone and one on x alone, we know the random variables are independent without showing that these functions are actually the marginal densities. 8.5.. Example. Let the joint density of two random variables x and y be given by { 8x x f(x, y), y otherwise We can write f(x, y) as g(x)h(y), where { x x g(x) otherwise { 8 y h(y) otherwise These functions are not density functions because they do not integrate to one. / x dx / x 8 8 dy 8y 8 The marginal densities as defined below do sum to one. { 8x x f X (x) otherwise { y f Y (y) otherwise 9.. Definition. 9. EXPECTED VALUE OF A FUNCTION OF RANDOM VARIABLES

MULTIVARIATE PROBABILITY DISTRIBUTIONS 7 9... Discrete Case. Let X (X, X,..., X k ) be a k-dimensional discrete random variable with probability function p(x, x,..., x k ). Let g(,,..., ) be a function of the k random variables (X, X,..., X k ). Then the expected value of g(x, X,..., X k ) is E [ g(x, X,..., X k ) ] x k g(x,..., x k )p(x, x,..., x k ) () x k x x 9... Continuous Case. Let X (X, X,..., X k ) be a k-dimensional random variable with density f(x, x,..., x k ). Let g(,,..., ) be a function of the k random variables (X, X,..., X k ). Then the expected value of g(x, X,..., X k ) is E [g(x, X..., X k )]... x k x k... x x g(x,..., x k )f X,..., X k (x,..., x k )dx... dx k g(x,..., x k )f X,..., X k (x,..., x k ) dx... dx k if the integral is defined. Similarly, if g(x) is a bounded real function on the interval [a, b] then () E(g(X)) b a g(x) df (x) b a g df () where the integral is in the sense of Lebesque and can be loosely interpreted asf(x) dx. Consider as an example g(x,..., x k ) x i. Then E [g(x,..., X k )] E[X i ]... x i f(x,..., x k ) dx... dx k () x i f Xi (x i ) dx i because integration over all the other variables gives the marginal density of x i. 9.. Example. Let the joint density of two random variables x and x be given by { x e x x, x f(x x ) otherwise The marginal density for x is given by

8 MULTIVARIATE PROBABILITY DISTRIBUTIONS f (x ) The marginal density for x is given by f (x ) x e x e x e x x e x dx x e x dx x e x ( x e ) x e x We can find the expected value of X by integrating the joint density or the marginal density. First with the joint density. E [X ] x x e x dx dx Consider the inside integral first. We will need a u dv substitution to evaluate the integral. Let then u x x and dv e x dx du x dx and v e x Then x x e x dx x x e x x e x dx x x e x + x e x dx Now integrate with respect to x. + x e x x

MULTIVARIATE PROBABILITY DISTRIBUTIONS 9 E [X ] x dx x Now find it using the marginal density of x. Integrate as follows: E [X ] x e x dx We will need to use a u dv substitution to evaluate the integral. Let u x and dv e x dx then du dx and v e x Then x e x dx x e x e x dx x e x + e x dx + e x e e + We can likewise show that the expected value of x is. Now consider E [x x ]. We can obtain it as E [X X ] x x e x dx dx Consider the inside integral first. We will need a u dv substitution to evaluate the integral. Let then u x x and dv e x dx du x dx and v e x

MULTIVARIATE PROBABILITY DISTRIBUTIONS Then x x e x dx x x e x x e x dx x x e x + x e x dx Now integrate with respect to x. + x e x x 9.. Properties of Expectation. 9... Constants. Theorem 5. Let c be a constant. Then E [X X ] x dx x E[c] x c c x y cf(x, y) dy dx y f(x, y) dy dx (4) 9... Theorem. Theorem. Let g(x, X ) be a function of the random variables X and X and let a be a constant. Then E[ag(X, X )] ag(x, x )f(x, x ) dx dx x x a g(x, x )fx, x ) dx dx x x ae[g(x, X )] (5)

MULTIVARIATE PROBABILITY DISTRIBUTIONS 9... Theorem. Theorem 7. Let X and Y denote two random variables defined on the same probability space and let f(x, y) be their joint density. Then E[aX + by ] y a + b y y x (ax + by)f(x, y) dx dy x x xf(x, y) dx dy yf(x, y) dx dy ae[x] + be[y ] () In matrix notation we can write this as [ ] [ ] x E(x ) E[a a ] [a x a ] a E(x ) µ + a µ (7) 9..4. Theorem. Theorem 8. Let X and Y denote two random variables defined on the same probability space and let g (X, Y ), g (X, Y ), g (X, Y ),..., g k (X, Y ) be functions of (X, Y ). Then E[g (X, Y ) + g (X, Y ) + + g k (X, Y )] 9..5. Independence. E[g (X, Y )] + E[g (X, Y )] + + E[g k (X, Y )] (8) Theorem 9. Let X and X be independent random variables and g(x ) and h(x ) be functions of X and X, respectively. Then provided that the expectations exist. E [g(x )h(x )] E [g(x )] E [h(x )] (9) Proof: Let f(x, x ) be the joint density of X and X. The product g(x )h(x ) is a function of X and X. Therefore we have

MULTIVARIATE PROBABILITY DISTRIBUTIONS E [ g(x )h(x ) ] g(x )h(x )f(x, x ) dx dx x x x g(x )h(x )f X (x )f X (x ) dx dx x [ ] g(x )f X (x ) x h(x )f X (x ) dx x dx g(x )f X (x ) ( E [h(x )] ) dx x E [h(x )] g(x )f X (x ) dx x E [h(x )] E [g(x )] (4). VARIANCE, COVARIANCE AND CORRELATION.. Variance of a Single Random Variable. The variance of a random variable X with mean µ is given by [ (X var(x) σ ) ] E E(X) E [ (X µ) ] (x µ) f(x) dx [ ] x f(x) dx xf(x)dx E(x ) E (x) (4) The variance is a measure of the dispersion of the random variable about the mean... Covariance.... Definition. Let X and Y be any two random variables defined in the same probability space. The covariance of X and Y, denoted cov[x, Y ] or σ X, Y, is defined as

MULTIVARIATE PROBABILITY DISTRIBUTIONS cov[x, Y ] E [(X µ X )(Y µ Y )] E[XY ] E[µ X Y ] E[Xµ Y ] + E[µ Y µ X ] E[XY ] E[X]E[Y ] [ xyf(x, y) dx dy [ xyf(x, y) dx dy xf(x, y) dx dy xf X (x, y) dx yf Y (x, y) dy ] yf(x, y) dx dy ] The covariance measures the interaction between two random variables, but its numerical value is not independent of the units of measurement of X and Y. Positive values of the covariance imply that X increases when Y increases; negative values indicate X decreases as Y decreases. (4)... Examples. (i) Let the joint density of two random variables x and x be given by { x e x x, x f(x x ) otherwise We showed in Example 9. that E [X ] E [X ] E [X X ] The covariance is then given by cov[x, Y ] E[XY ] E[X]E[Y ] ( ) () (ii) Let the joint density of two random variables x and x be given by { f(x x ) x x, x otherwise First compute the expected value of X X as follows.

4 MULTIVARIATE PROBABILITY DISTRIBUTIONS E [X X ] 4 x x dx dx ( 8 x x 8 8 x dx 4 9 x dx 8 x 8 ) dx Then compute expected value of X as follows E [X ] ( x dx dx x 4 dx dx x ) dx

MULTIVARIATE PROBABILITY DISTRIBUTIONS 5 Then compute the expected value of X as follows. E [X ] x x dx dx ( x x 4 x dx x dx x ) dx 9 The covariance is then given by cov[x, Y ] E[XY ] E[X]E[Y ] ( ) ( ) 4 (iii) Let the joint density of two random variables x and x be given by f(x x ) { 8 x x x otherwise

MULTIVARIATE PROBABILITY DISTRIBUTIONS First compute the expected value of X X as follows. E [X X ] x 8 x x dx dx ( ) 4 x x dx x ( 4 4 x ) 4 x4 dx ( x ) 8 x4 dx ( x ) 4 x5 4 4 4 5 5 Then compute expected value of X as follows E [X ] x 8 x dx dx ( x 8 x x 8 x dx x4 48 ) dx

MULTIVARIATE PROBABILITY DISTRIBUTIONS 7 Then compute the expected value of X as follows. E [X ] x 8 x x dx dx ( ) x x dx x ( x ) x dx ( 4 x ) x dx ( 8 x ) 4 x4 8 48 4 9 4 48 4 48 4 4 The covariance is then given by cov[x, Y ] E[XY ] E[X]E[Y ] ( ) ( ) 5 4 5 9 8 48 4 45 4 4.. Correlation. The correlation coefficient, denoted by ρ[x, Y ], or ρ X, Y of random variables X and Y is defined to be ρ X, Y cov[x, Y ] σ X σ Y (4) provided that cov[x, Y ], σ X and σ Y exist, and σ X, σ Y are positive. The correlation coefficient between two random variables is a measure of the interaction between them. It also has the property of being independent of the units of measurement and being bounded between negative one and one. The sign of the correlation coefficient is the same as the

8 MULTIVARIATE PROBABILITY DISTRIBUTIONS sign of the covariance. Thus ρ > indicates that X increases as X increases and ρ indicates perfect correlation, with all the points falling on a straight line with positive slope. If ρ, there is no correlation and the covariance is zero..4. Independence and Covariance..4.. Theorem. Theorem. If X and Y are independent random variables, then Proof: We know from equation 4 that cov[x, Y ]. (44) cov[x, Y ] E[XY ] E[X]E[Y ] (45) We also know from equation 9 that if X and Y are independent, then Let g(x) X and h(y ) Y to obtain Substituting into equation 45 we obtain E [g(x)h(y )] E [g(x)] E [h(y )] (4) E [XY ] E [X] E [Y ] (47) cov[x, Y ] E[X]E[Y ] E[X]E[Y ] (48) The converse of Theorem is not true, i.e., cov[x, Y ] does not imply X and Y are independent..4.. Example. Consider the following discrete probability distribution. x - - 5 x 8 5 5 8 5

MULTIVARIATE PROBABILITY DISTRIBUTIONS 9 These random variables are not independent because the joint probabilities are not the product of the marginal probabilities. For example p X X [, ] p X ( )p X ( ) ( ) ( ) 5 5 5 5 Now compute the covariance between X and X. First find E[X ] as follows ( ) ( ) ( ) 5 5 E[X ] ( ) + () + () Similarly for the expected value of X. Now compute E[X X ] as follows The covariance is then ( ) ( ) ( ) 5 5 E[X ] ( ) + () + () ( E[X X ] ( )( ) ( + ()( ) ( + ()( ) ) ( ) ( + ( )() + ( )() ( ) ) + ()()() + ()() + ) ( ) + ()() + ()() cov[x, Y ] E[XY ] E[X]E[Y ] () () ( In this case the covariance is zero, but the variables are not independent..5. Sum of Variances var[a x + a x ]. var[a x + a x ] a var(x ) + a var(x ) + a a cov(x, x ) ) ) a σ + a a σ + a σ [ ] [ ] σ σ a [a a ] σ σ a [ [ ] ] x var [a a ] x (49)
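The sum-of-variances formula in equation (49) can be checked numerically; the following sketch (not part of the original notes) uses an arbitrary pair of correlated variables.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Two correlated random variables (an arbitrary choice for illustration).
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)

a1, a2 = 2.0, -3.0
lhs = np.var(a1 * x1 + a2 * x2)
rhs = (a1 ** 2) * np.var(x1) + (a2 ** 2) * np.var(x2) \
      + 2 * a1 * a2 * np.cov(x1, x2, bias=True)[0, 1]
print(lhs, rhs)   # the two sides agree up to floating-point error
```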

4 MULTIVARIATE PROBABILITY DISTRIBUTIONS.. The Expected Value and Variance of Linear Functions of Random Variables.... Theorem. Theorem. Let Y, Y,..., Y n and X, X,..., X m be random variables with E[Y i ] µ i and E[X j ] ξ j. Define U n m a i Y i and U b j X j (5) i j for constants a, a,..., a n and b, b,..., b m. Then the following three results hold: (i) E[U ] n i a iµ i (ii) var[u ] n i a i var[y i] + a i a j cov[y i, Y j ] i<j where the double sum is over all pairs (i, j) with i < j. (iii) cov[u, U ] n i m j a ib j cov[y i, X j ]. Proof: (i) We want to show that E[U ] n a i µ i i Write out the E[U ] as follows: [ n ] E[U ] E a i Y i i n E [a i Y i ] i n a i E [Y i ] i n a i µ i i (5) using Theorems 8 as appropriate.

(ii) Write out the var[u ] as follows: MULTIVARIATE PROBABILITY DISTRIBUTIONS 4 [ n ] n var(u ) E[U E(U )] E a i Y i a i µ i [ n E a i (Y i µ i ) i i i ] n E a i (Y i µ i ) + a i a j (Y i µi)(y j µ j ) i i j (5) n a i E(Y i µ i ) + a i a j E[(Y i µ i )(Y j µ j )] i i j By definitions of variance and covariance, we have var(u ) n a i V (Y i ) + a i a j cov(y i, Y j ) (5) i i j Because cov(y i, Y j ) cov(y j, Y i ) we can write var(u ) n a i V (Y i ) + a i a j cov(y i, Y j ) (54) i i<j Similar steps can be used to obtain (iii).

4 MULTIVARIATE PROBABILITY DISTRIBUTIONS (iii) We have cov(u, U ) E {[U E(U )] [U E(U )]} ( n ) n m E a i Y i a i µ i b j X j i i j m b j ξ j j [ n E a i (Y i µ i )] m b j (x j ξ j ) i i j j n m E a i b i (Y i µ i )(X j ξ j ) (55) n m a i b i E [(Y i µ i )(X j ξ j )] i j n m a i b i cov(y i, X j ) i j. CONDITIONAL EXPECTATIONS.. Definition. If X and X are any two random variables, the conditional expectation of g(x ), given that X x, is defined to be E [g(x ) X ] if X and X are jointly continuous and g(x )f(x x ) dx (5) E [g(x ) X ] x g(x )p(x x ) (57) if X and X are jointly discrete... Example. Let the joint density of two random variables X and Y be given by f(x, y) { x, y, x + y otherwise

MULTIVARIATE PROBABILITY DISTRIBUTIONS 4 We can find the marginal density of y by integrating the joint density with respect to x as follows f Y (y) y x y f(x, y) dx dx ( y), y We find the conditional density of X given that Y y by forming the ratio f X Y (x y) f(x, y) f Y (y) ( y) ( y), x y We then form the expected value by multiplying the density by x and then integrating over x. E [X Y ] y ( y) ( y) ( y) ( y) x ( y) dx y ( x x dx y ) ( ) ( y) We can find the unconditional expected value of X by multiplying the marginal density of y by this expected value and integrating over y as follows

44 MULTIVARIATE PROBABILITY DISTRIBUTIONS E[X] E Y [ E[X Y ] ] y ( ) ( y) dy ( y) dy ( y) [ ( ) ( ) ] [ ] We can show this directly by multiplying the joint density by x and then integrating over x and y. E[X] y ( x y x dx dy ( y) dy ) dy ( y) dy ( y) [ ( ) ( ) ] [ ] The fact that we can find the expected value of X using the conditional distribution of X given Y is due to the following theorem.

11.3. Theorem.

Theorem 12. Let X and Y denote random variables. Then

E[X] = E_Y[ E_{X|Y}[X | Y] ]   (58)

The inner expectation is with respect to the conditional distribution of X given Y and the outer expectation is with respect to the distribution of Y.

Proof: Suppose that X and Y are jointly continuous with joint density f_{XY}(x, y) and marginal densities f_X(x) and f_Y(y), respectively. Then

E[X] = ∫ ∫ x f_{XY}(x, y) dx dy
     = ∫ ∫ x f_{X|Y}(x | y) f_Y(y) dx dy
     = ∫ [ ∫ x f_{X|Y}(x | y) dx ] f_Y(y) dy
     = ∫ E[X | Y = y] f_Y(y) dy
     = E_Y[ E_{X|Y}[X | Y] ]   (59)

The proof is similar for the discrete case.

11.4. Conditional Variance.

11.4.1. Definition. Just as we can compute a conditional expected value, we can compute a conditional variance. The idea is that the variance of the random variable X may be different for different values of Y. We define the conditional variance as follows.

var[X | Y = y] = E[ (X − E[X | Y = y])² | Y = y ] = E[X² | Y = y] − ( E[X | Y = y] )²   (60)

We can write the variance of X as a function of the expected value of the conditional variance. This is sometimes useful for specific problems.

11.4.2. Theorem.

Theorem 13. Let X and Y denote random variables. Then

var[X] = E[ var[X | Y = y] ] + var[ E[X | Y = y] ]   (61)

4 MULTIVARIATE PROBABILITY DISTRIBUTIONS Proof: First note the following three definitions var[x Y ] E[X Y ] [ E[X Y ] ] E [ var[x Y ] ] E [ E[X Y ] ] E {[ E[X Y ] ] } var [ E[X Y ] ] E {[ E[X Y ] ] } { E [ E[X Y ] ]} (a) (b) (c) The variance of X is given by var[x] E[X ] [ E[X] ] We can find the expected value of a variable by taking the expected value of the conditional expectation as in Theorem. For this problem we can write E[X ] as the expected value of the conditional expectation of X given Y. Specifically, () and E[X ] E Y { EX Y [X Y ] } (4) [ E[X] ] [ EY { EX Y [X Y ] }] (5) Write () substituting in (4) and (5) as follows var[x] E[X ] [ E[X] ] E Y { EX Y [X Y ] } [ E Y { EX Y [X Y ] }] () Now subtract and add E {[ E(X Y ) ] } to the right hand side of equation as follows var[x] E Y { EX Y [X Y ] } [ E Y { EX Y [X Y ] }] E Y { EX Y [X Y ] } E {[ E(X Y ) ] } + E {[ E(X Y ) ] } [ EY { EX Y [X Y ] }] (7) Now notice that the first two terms in equation 7 are the same as the right hand side of equation b which is ( E [ var[x Y ] ]). Then notice that the second two terms in equation 7 are the same as the right hand side of equation c which is ( var [ E[X Y ] ]). We can then write var[x] as var[x] E Y { EX Y [X Y ] } [ E Y { EX Y [X Y ] }] E [ var[x Y ] ] + var [ E[X Y ] ] (8)
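Theorem 13 can also be checked by simulation. The sketch below (not from the original notes) uses an arbitrary joint distribution whose conditional mean and variance are known in closed form; the distributional choices are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# An arbitrary joint distribution: Y uniform on (0, 1), and X | Y = y normal
# with mean y and standard deviation 1 + y.
y = rng.uniform(0.0, 1.0, n)
x = rng.normal(loc=y, scale=1.0 + y)

# Left-hand side of Theorem 13.
total_var = x.var()

# Right-hand side: E[var(X|Y)] + var(E[X|Y]), computed from the known
# conditional moments var(X | Y = y) = (1 + y)^2 and E[X | Y = y] = y.
rhs = np.mean((1.0 + y) ** 2) + np.var(y)

print(total_var, rhs)   # the two numbers agree up to simulation error
```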

MULTIVARIATE PROBABILITY DISTRIBUTIONS 47.4.. Example. Let the joint density of two random variables X and Y be given by f(x, y) { 4 (x + y) x, y otherwise We can find the marginal density of x by integrating the joint density with respect to y as follows f X (x) f(x, y) dy (x + y) dy 4 ) (xy + y 4 ( 4x + 4 ) 4 (9) (4x + ), x 4 We can find the marginal density of y by integrating the joint density with respect to x as follows: f Y (y) 4 f(x, y) dy (x + y) dx 4 ( x + xy ) (7) ( + y), y 4

48 MULTIVARIATE PROBABILITY DISTRIBUTIONS We find the expected value of X by multiplying the conditional density by x and then integrating over x E[X] x (4x + ) dx 4 ( 4x + x ) dx 4 ) ( 4 4 x + x ( ) 4 4 + 4 7 To find the variance of X, we first need to find the E[X ]. We do this as follows E[X ] 4 4 5 The variance of X is then given by ( 7 ) 4 x (4x + ) dx ( 4x + x ) dx 4 (x 4 + ) x ( + ) 4 ( 5 var(x) E [ (X E(X)) ] E(x ) E (x) ) (7) (7) 5 ( ) 7 5 49 44 44 49 44 44 (7)

MULTIVARIATE PROBABILITY DISTRIBUTIONS 49 We find the conditional density of X given that Y y by forming the ratio f X Y (x y) f(x, y) f Y (y) 4 4 (x + y) ( + y) (x + y) ( + y) (74) We then form the expected value of X given Y by multiplying the density by x and then integrating over x. E [X Y ] (x + y) x ( + y) dx (x + xy) dx + y ( + y x + ) x y ( ( + y) + ) y ( + y ) ( + y) (4 + y) ( + y) ( ) (4 + y) ( + y) (75) We can find the unconditional expected value of X by multiplying the marginal density of y by this expected value and integrating over y as follows:

5 MULTIVARIATE PROBABILITY DISTRIBUTIONS E[X] E Y [ E[X Y ] ] 4 (4 + y) ( + y) dy ( + y) 4 (4 + y)( + y) ( + y) (4 + y) dy 4 (4y + ) y dy (7) 4 (8 + ) 4 4 4 7 We find the conditional variance by finding the expected value of X given Y and then subtracting the square of E[X Y ]. Now square E[X Y ]. E[X Y ] (x + y) x ( + y) dx (x + x y) dx + y ( + y x4 + ) x y ( + y + ) y ( + y ) + y ( ) ( ) + y + y ( ( ) ) (4 + y) E [X Y ] ( + y) (4 + y) ( + y) (77) (78)

Now subtract equation 78 from equation 77 MULTIVARIATE PROBABILITY DISTRIBUTIONS 5 ( ) ( ) + y var[x Y ] (4 + y) + y ( + y) ( ) ( ) (8 + y)( + y) (4 + y) ( + y) ( y + y + 8 ( + 4y + 9y ) ) ( + y) (79) y + y + ( + y) For example, if y, we obtain var[x Y ] y + y + ( + y) 44 y (8) To find the expected value of this variance we need to multiply the expression in equation 8 by the marginal density of Y and then integrate over the range of Y. E [ var[x Y ] ] 44 y + y + ( + y) ( + y) dy 4 y + y + ( + y) dy (8) Consider first the indefinite integral. y + y + z dy (8) ( + y) This integral would be easier to solve if ( + y) in the denominator could be eliminated. This would be the case if it could be factored out of the numerator. One way to do this is

5 MULTIVARIATE PROBABILITY DISTRIBUTIONS carry out the specified division. y + + y y + y + y + y + + y y + y y + y + (y + ) + y Now substitute equation 8 into equation 8 as follows y + y + z dy ( + y) [ (y + ) ] dy + y (8) (84) y + y log[ + y] Now compute the expected value of the variance as E [ var[x Y ] ] 44 44 44 [ y y + y + ( + y) dy + y log[ + y] [( ) ] + log[] log[] [ ] log[] 44 To compute the variance of E[X Y ] we need to find E Y [( E[X Y ] ) ] and then subtract ( EY [ E[X Y ] ]). First find the second term. The expected value of X given Y comes from equation 75. E[X Y ] ( ) (4 + y) ( + y) We found the expected value of E[X Y ] in equation 7. We repeat the derivation here by multiplying E[X Y ] by the marginal density of Y and then integrating over the range of Y. ] (85) (8)

MULTIVARIATE PROBABILITY DISTRIBUTIONS 5 ( ) ( E Y E[X Y ] ) (4 + y) ( + y) (4 + y) dy 4 (4y + ) y (y + ) dy 4 4 4 ( 8 + ) (87) 4 (4) 7 Now find the first term E Y ( (E[X Y ] ) ) 44 44 ( ) (4 + y) ( + y) (y + ) dy 4 (4 + y) + y dy 9y + 4y + + y Now find the indefinite integral by first simplifying the integrand using long division. dy (88) 9y + 4y + + y Now carry out the division + y 9y + 4y + (89) 9y + 5 + y 9y + 4y + 9y + 4y + + y 9y + 9y 5y + 5y + 5 (9y + 5) + + y (9)

54 MULTIVARIATE PROBABILITY DISTRIBUTIONS Now substitute in equation 9 into equation 88 as follows ( (E[X ) ) E Y Y ] 9y + 4y + dy 44 + y 44 9y + 5 + + y dy [ 9y ] 44 + 5y + log[y + ] [ ] 44 + + log[] [ ] 48 + log[] 44 (9) The variance is obtained by subtracting the square of (87) from (9) var [ E[X Y ] ] ( (E[X ) ) ( ) E Y Y ] + (E ) Y E[X Y ] ( ) [ ] 7 48 + log[] 44 [ ] 49 48 + log[] 44 44 [ ] log[] 44 (9) We can show that the sum of (85) and (9) is equal to the var[x ] as in Theorem : var[x] E [ var[x Y y] ] + var [ E[X Y y] ] [ ] [ ] log[] + log[] 44 44 log[] + log[] 44 44 (9) which is the same as in equation 7.

12. CAUCHY-SCHWARZ INEQUALITY

12.1. Statement of Inequality. For any functions g(x) and h(x) and cumulative distribution function F(x), the following holds:

( ∫ g(x) h(x) dF(x) )² ≤ ( ∫ g(x)² dF(x) ) ( ∫ h(x)² dF(x) )   (94)

where x is a vector random variable.

12.2. Proof. Form a linear combination of g(x) and h(x), square it and then integrate as follows:

∫ ( t g(x) + h(x) )² dF(x) ≥ 0   (95)

The inequality holds because of the square and dF(x) ≥ 0. Now expand the integrand in (95) to obtain

t² ∫ g(x)² dF(x) + 2t ∫ g(x) h(x) dF(x) + ∫ h(x)² dF(x) ≥ 0   (96)

This is a quadratic in t which holds for all t. Now define t as follows:

t = − ∫ g(x) h(x) dF(x) / ∫ g(x)² dF(x)   (97)

and substitute in (96):

( ∫ g(x) h(x) dF(x) )² / ∫ g(x)² dF(x) − 2 ( ∫ g(x) h(x) dF(x) )² / ∫ g(x)² dF(x) + ∫ h(x)² dF(x) ≥ 0

⇒ ∫ h(x)² dF(x) ≥ ( ∫ g(x) h(x) dF(x) )² / ∫ g(x)² dF(x)

⇒ ( ∫ g(x) h(x) dF(x) )² ≤ ( ∫ g(x)² dF(x) ) ( ∫ h(x)² dF(x) )   (98)

12.3. Corollary. Consider two random variables X1 and X2 and the expectation of their product. Using (98) we obtain

( E(X1 X2) )² ≤ E(X1²) E(X2²)   (99)

5 MULTIVARIATE PROBABILITY DISTRIBUTIONS.4. Corollary. cov(x X ) < ( var(x ) ) ( var(x ) ) () Proof: Apply (98) to the centered random variables g(x) X µ and h(x) X µ where µ i E(X i ).
