Bivariate Distributions — 23 October 2017 — lecture based on Hogg, Tanis, Zimmerman: Probability and Statistical Inference (9th ed.)
Bivariate Distributions of the Discrete Type The Correlation Coefficient Conditional Distributions Bivariate Distributions of the Continuous Type The Bivariate Normal Distribution
Bivariate Distributions of the Discrete Type
Up to now, we have taken a single measurement on each item under observation. In many practical cases it is possible (and often very desirable) to take several measurements on each item. For example, we may observe university students to obtain information about physical characteristics such as height x and weight y; we then want to determine a relation y = u(x) and say something about the variation of the points around that curve.
DEFINITION Let X and Y be two random variables of the discrete type, and let S denote the corresponding two-dimensional space of X and Y. The probability that X = x and Y = y is denoted by f(x,y) = P(X = x, Y = y). The function f(x,y) is called the joint probability mass function (joint pmf) of X and Y and has the following properties:
(a) 0 ≤ f(x,y) ≤ 1;
(b) Σ_{(x,y)∈S} f(x,y) = 1;
(c) P[(X,Y) ∈ A] = Σ_{(x,y)∈A} f(x,y), where A is a subset of the space S.
EXAMPLE Roll a pair of fair dice. For each of the 36 sample points, each with probability 1/36, let X denote the smaller and Y the larger outcome on the dice. For example, if the outcome is (3,2), then the observed values are X = 2, Y = 3. The event {X = 2, Y = 3} could occur in one of two ways, (3,2) or (2,3), so its probability is 1/36 + 1/36 = 2/36. If the outcome is (2,2), then the observed values are X = 2, Y = 2. Since the event {X = 2, Y = 2} can occur in only one way, P(X = 2, Y = 2) = 1/36. The joint pmf of X and Y is therefore
f(x,y) = 1/36, 1 ≤ x = y ≤ 6,
f(x,y) = 2/36, 1 ≤ x < y ≤ 6,
when x and y are integers.
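As a quick sanity check (a sketch, not from the textbook), we can enumerate all 36 equally likely outcomes and confirm that the tabulated probabilities match the closed-form pmf above:

```python
from fractions import Fraction
from collections import defaultdict

# Enumerate all 36 equally likely outcomes of two fair dice and
# tabulate the joint pmf of X = min(roll) and Y = max(roll).
joint = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(min(d1, d2), max(d1, d2))] += Fraction(1, 36)

# The enumeration should reproduce the closed form:
# f(x, y) = 1/36 when x = y, and 2/36 when x < y.
def f(x, y):
    return Fraction(1, 36) if x == y else Fraction(2, 36)

assert all(joint[(x, y)] == f(x, y) for (x, y) in joint)
assert sum(joint.values()) == 1   # property (b) of a joint pmf
```

Exact `Fraction` arithmetic avoids any floating-point tolerance in the check.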
DEFINITION Let X and Y have the joint probability mass function f(x,y) with space S. The probability mass function of X alone, which is called the marginal probability mass function of X, is defined by
f_X(x) = Σ_y f(x,y) = P(X = x), x ∈ S_X,
where the summation is taken over all possible y values for each given x in the x space S_X; that is, the summation is over all (x,y) in S with a given x value. Similarly, the marginal probability mass function of Y is defined by
f_Y(y) = Σ_x f(x,y) = P(Y = y), y ∈ S_Y,
where the summation is taken over all possible x values for each given y in the y space S_Y.
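Applying this definition to the dice example (an illustrative sketch): summing the joint pmf over the other variable produces the two marginals, each of which must itself sum to 1.

```python
from fractions import Fraction

# Joint pmf of the dice example: X = smaller, Y = larger of two fair dice.
def f(x, y):
    if 1 <= x == y <= 6:
        return Fraction(1, 36)
    if 1 <= x < y <= 6:
        return Fraction(2, 36)
    return Fraction(0)

# Marginals: sum the joint pmf over the other variable.
f_X = {x: sum(f(x, y) for y in range(1, 7)) for x in range(1, 7)}
f_Y = {y: sum(f(x, y) for x in range(1, 7)) for y in range(1, 7)}

assert f_X[1] == Fraction(11, 36)   # the smaller die is 1 in 11 of 36 outcomes
assert f_Y[6] == Fraction(11, 36)   # the larger die is 6 in 11 of 36 outcomes
assert sum(f_X.values()) == 1 and sum(f_Y.values()) == 1
```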
DEFINITION The random variables X and Y are independent if and only if, for every x ∈ S_X and every y ∈ S_Y,
P(X = x, Y = y) = P(X = x) P(Y = y)
or, equivalently, f(x,y) = f_X(x) f_Y(y); otherwise, X and Y are said to be dependent.
It is possible to define a probability histogram for a joint pmf as we did for a single random variable. Suppose that X and Y have a joint pmf f(x,y) with space S, where S is a set of pairs of integers. At each point (x,y) in S, construct a rectangular column that is centered at (x,y) and has a one-by-one-unit base and height equal to f(x,y); then f(x,y) is equal to the volume of this rectangular column, and the sum of the volumes of all the columns in the probability histogram is equal to 1. For example,
f(x,y) = xy²/30, x = 1, 2, 3, y = 1, 2.
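A short check of the example pmf (the exponent was lost in the slide extraction; f(x,y) = xy²/30 is the reconstruction used here): the column heights, i.e. the total volume of the histogram, sum to 1.

```python
from fractions import Fraction

# Reconstructed example pmf: f(x, y) = x*y**2 / 30 on x = 1,2,3 and y = 1,2.
support = [(x, y) for x in (1, 2, 3) for y in (1, 2)]
f = {(x, y): Fraction(x * y**2, 30) for (x, y) in support}

# Column heights of the probability histogram must sum to 1 (total volume).
total = sum(f.values())
assert total == 1
```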
NOTE Sometimes it is convenient to replace the symbols X and Y representing random variables by X_1 and X_2. Let X_1 and X_2 be random variables of the discrete type with the joint pmf f(x_1, x_2) on the space S. If u(X_1, X_2) is a function of these two random variables, then
E[u(X_1, X_2)] = Σ_{(x_1,x_2)∈S} u(x_1, x_2) f(x_1, x_2),
if it exists, is called the mathematical expectation (or expected value) of u(X_1, X_2).
The following mathematical expectations, if they exist, have special names:
A. If u_i(X_1, X_2) = X_i for i = 1, 2, then E[u_i(X_1, X_2)] = E[X_i] = μ_i is called the mean of X_i, for i = 1, 2.
B. If u_i(X_1, X_2) = (X_i − μ_i)² for i = 1, 2, then E[u_i(X_1, X_2)] = E[(X_i − μ_i)²] = σ_i² = Var(X_i) is called the variance of X_i, for i = 1, 2.
The mean μ_i and the variance σ_i² can be computed either from the joint pmf f(x_1, x_2) or from the marginal pmf f_i(x_i), i = 1, 2.
EXTENSION OF THE BINOMIAL DISTRIBUTION TO A TRINOMIAL DISTRIBUTION There are three mutually exclusive and exhaustive ways for an experiment to terminate: perfect, seconds, defective. We repeat the experiment n independent times, and the probabilities p_X, p_Y, p_Z = 1 − p_X − p_Y remain the same from trial to trial. In the n trials, let
X = number of perfect items,
Y = number of seconds,
Z = n − X − Y = number of defectives.
If x and y are nonnegative integers such that x + y ≤ n, then the probability of having x perfects, y seconds and n − x − y defectives, in one specified order, is
p_X^x p_Y^y (1 − p_X − p_Y)^{n−x−y}.
However, if we want P(X = x, Y = y), then we must recognize that the event {X = x, Y = y} can occur in
n!/(x! y! (n − x − y)!)
different ways. The trinomial pmf is therefore
f(x,y) = P(X = x, Y = y) = [n!/(x! y! (n − x − y)!)] p_X^x p_Y^y (1 − p_X − p_Y)^{n−x−y},
where x and y are nonnegative integers such that x + y ≤ n. Without summing, we know that X is b(n, p_X) and Y is b(n, p_Y); since f(x,y) ≠ f_X(x) f_Y(y), X and Y are dependent.
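The claim that X is marginally b(n, p_X) can be checked numerically (a sketch with assumed parameters n = 10, p_X = 0.5, p_Y = 0.3): summing the trinomial joint pmf over y recovers the binomial pmf of X.

```python
from math import comb, factorial

def trinomial_pmf(x, y, n, pX, pY):
    """Joint pmf of (X, Y) for the trinomial distribution."""
    if x < 0 or y < 0 or x + y > n:
        return 0.0
    coef = factorial(n) // (factorial(x) * factorial(y) * factorial(n - x - y))
    return coef * pX**x * pY**y * (1 - pX - pY)**(n - x - y)

n, pX, pY = 10, 0.5, 0.3   # assumed illustrative parameters
# Summing the joint pmf over y recovers the binomial marginal of X.
for x in range(n + 1):
    marg = sum(trinomial_pmf(x, y, n, pX, pY) for y in range(n - x + 1))
    binom = comb(n, x) * pX**x * (1 - pX)**(n - x)
    assert abs(marg - binom) < 1e-12
```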
The Correlation Coefficient
We introduced the mathematical expectation of a function of two random variables X, Y:
μ_X = E[X], μ_Y = E[Y], σ_X² = E[(X − μ_X)²], σ_Y² = E[(Y − μ_Y)²].
A. If u(X,Y) = (X − μ_X)(Y − μ_Y), then E[u(X,Y)] = E[(X − μ_X)(Y − μ_Y)] = σ_XY = Cov(X,Y) is called the covariance of X and Y.
B. If the standard deviations σ_X and σ_Y are positive, then
ρ = Cov(X,Y)/(σ_X σ_Y) = σ_XY/(σ_X σ_Y)
is called the correlation coefficient of X and Y.
It is convenient that the mean and the variance of X can be computed from either the joint pmf (or pdf) or the marginal pmf (or pdf) of X:
μ_X = E[X] = Σ_x Σ_y x f(x,y) = Σ_x x Σ_y f(x,y) = Σ_x x f_X(x).
To compute the covariance, however, we need the joint pmf (or pdf):
E[(X − μ_X)(Y − μ_Y)] = E[XY − μ_X Y − μ_Y X + μ_X μ_Y] = E[XY] − μ_X E[Y] − μ_Y E[X] + μ_X μ_Y,
because E is a linear (distributive) operator. Thus,
Cov(X,Y) = E[XY] − μ_X μ_Y − μ_Y μ_X + μ_X μ_Y = E[XY] − μ_X μ_Y.
Since ρ = Cov(X,Y)/(σ_X σ_Y), we also have
E[XY] = μ_X μ_Y + ρ σ_X σ_Y;
the expected value of the product of two random variables is equal to the product μ_X μ_Y of their means plus their covariance ρ σ_X σ_Y.
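These identities can be illustrated on the dice joint pmf from earlier (a numerical sketch): computing the moments directly from the joint pmf confirms E[XY] = μ_X μ_Y + Cov(X,Y), and that min and max of two dice are positively correlated.

```python
from itertools import product

# Dice joint pmf: X = smaller, Y = larger outcome of two fair dice.
def f(x, y):
    if x == y:
        return 1 / 36
    if x < y:
        return 2 / 36
    return 0.0

S = [(x, y) for x, y in product(range(1, 7), repeat=2) if f(x, y) > 0]
mu_X = sum(x * f(x, y) for x, y in S)
mu_Y = sum(y * f(x, y) for x, y in S)
sd_X = sum((x - mu_X)**2 * f(x, y) for x, y in S) ** 0.5
sd_Y = sum((y - mu_Y)**2 * f(x, y) for x, y in S) ** 0.5
cov = sum((x - mu_X) * (y - mu_Y) * f(x, y) for x, y in S)
rho = cov / (sd_X * sd_Y)
E_XY = sum(x * y * f(x, y) for x, y in S)

# Identity from the slide: E[XY] = mu_X*mu_Y + rho*sigma_X*sigma_Y.
assert abs(E_XY - (mu_X * mu_Y + rho * sd_X * sd_Y)) < 1e-12
```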
ρ = Σ_x Σ_y (x − μ_X)(y − μ_Y) f(x,y) / (σ_X σ_Y)
Interpretation of the sign of the correlation coefficient: if positive probabilities are assigned to pairs (x,y) in which both x and y are either simultaneously above or simultaneously below their respective means, then the corresponding terms in the summation that defines ρ are positive, because both factors (x − μ_X) and (y − μ_Y) will be positive or both will be negative. If, on the one hand, the points (x,y) which yield large positive products (x − μ_X)(y − μ_Y) contain most of the probability of the distribution, then the correlation coefficient will tend to be positive. If, on the other hand, the points (x,y) in which one component is below its mean and the other above its mean have most of the probability, then the correlation coefficient will tend to be negative, because the products (x − μ_X)(y − μ_Y) having higher probabilities are negative.
Consider the following problem: think of the points (x,y) in the space S and their corresponding probabilities. Consider all possible lines in two-dimensional space, each with finite slope, that pass through the point (μ_X, μ_Y) associated with the means; these lines are of the form y = μ_Y + b(x − μ_X). For each point (x_0, y_0) in S with f(x_0, y_0) > 0, consider the vertical distance from that point to one of these lines. Since y_0 is the height of the point above the x-axis and μ_Y + b(x_0 − μ_X) is the height of the point on the line that is directly above or below (x_0, y_0), the absolute value of the difference of these two heights,
|y_0 − μ_Y − b(x_0 − μ_X)|,
is the vertical distance from the point (x_0, y_0) to the line y = μ_Y + b(x − μ_X).
Let us now square each such distance and take the weighted average of all the squares; that is, consider the mathematical expectation
E{[(Y − μ_Y) − b(X − μ_X)]²} = K(b).
The problem is to find the line (that is, the b) which minimizes this expectation of the square [(Y − μ_Y) − b(X − μ_X)]². This is an application of the principle of least squares, and the resulting line is called the least squares regression line.
K(b) = E[(Y − μ_Y)² − 2b(X − μ_X)(Y − μ_Y) + b²(X − μ_X)²] = σ_Y² − 2bρσ_Xσ_Y + b²σ_X²,
because E is a linear operator and E[(X − μ_X)(Y − μ_Y)] = ρσ_Xσ_Y.
K(b) = σ_Y² − 2bρσ_Xσ_Y + b²σ_X². The derivative
K′(b) = −2ρσ_Xσ_Y + 2bσ_X²
equals zero at b = ρσ_Y/σ_X, and since K″(b) = 2σ_X² > 0, K attains its minimum there. Consequently, the least squares regression line is
y = μ_Y + ρ(σ_Y/σ_X)(x − μ_X).
If ρ > 0, the slope is positive; if ρ < 0, the slope is negative.
The value of the minimum is
K(ρσ_Y/σ_X) = σ_Y² − 2(ρσ_Y/σ_X)ρσ_Xσ_Y + (ρσ_Y/σ_X)²σ_X² = σ_Y² − 2ρ²σ_Y² + ρ²σ_Y² = σ_Y²(1 − ρ²).
Since K(b) ≥ 0 for every b, we have σ_Y²(1 − ρ²) ≥ 0, hence 0 ≤ ρ² ≤ 1 and −1 ≤ ρ ≤ 1. If ρ = 0, then K(ρσ_Y/σ_X) = σ_Y²; if ρ is close to 1 or −1, then K(ρσ_Y/σ_X) is relatively small.
Thus the vertical deviations of the points with positive probability from the line y = μ_Y + ρ(σ_Y/σ_X)(x − μ_X) are small when ρ is close to 1 or −1: ρ measures the amount of linearity in the probability distribution.
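The minimization can be checked numerically (a sketch with assumed values σ_X = 2, σ_Y = 3, ρ = 0.6): a grid search over slopes b finds no value of K(b) below K(ρσ_Y/σ_X) = σ_Y²(1 − ρ²).

```python
# K(b) = sigma_Y^2 - 2*b*rho*sigma_X*sigma_Y + b^2*sigma_X^2 is minimized
# at b* = rho*sigma_Y/sigma_X, where K attains sigma_Y^2 * (1 - rho^2).
sigma_X, sigma_Y, rho = 2.0, 3.0, 0.6   # assumed illustrative values

def K(b):
    return sigma_Y**2 - 2 * b * rho * sigma_X * sigma_Y + b**2 * sigma_X**2

b_star = rho * sigma_Y / sigma_X
# A coarse grid search finds no b with smaller K than b_star.
grid = [i / 100 for i in range(-500, 501)]
assert all(K(b) >= K(b_star) - 1e-12 for b in grid)
assert abs(K(b_star) - sigma_Y**2 * (1 - rho**2)) < 1e-9
```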
Suppose that X and Y are independent, so that f(x,y) ≡ f_X(x) f_Y(y), and suppose we want to find the expected value of the product u(X) v(Y):
E[u(X) v(Y)] = Σ_{S_X} Σ_{S_Y} u(x) v(y) f(x,y) = Σ_{S_X} Σ_{S_Y} u(x) v(y) f_X(x) f_Y(y)
= [Σ_{S_X} u(x) f_X(x)] [Σ_{S_Y} v(y) f_Y(y)] = E[u(X)] E[v(Y)].
The correlation coefficient of two independent variables is therefore zero, since
Cov(X,Y) = E[(X − μ_X)(Y − μ_Y)] = E[X − μ_X] E[Y − μ_Y] = 0.
!!! Independence implies a zero correlation coefficient, but a zero correlation coefficient does not necessarily imply independence !!!
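The classic counterexample for the last warning (an added illustration, not from the slide): take X uniform on {−1, 0, 1} and Y = X². The covariance is zero, yet Y is a function of X, so the two are clearly dependent.

```python
# X uniform on {-1, 0, 1}, Y = X**2: zero covariance but dependent.
pmf = {(-1, 1): 1 / 3, (0, 0): 1 / 3, (1, 1): 1 / 3}   # joint pmf of (X, Y)

mu_X = sum(x * p for (x, y), p in pmf.items())          # = 0
mu_Y = sum(y * p for (x, y), p in pmf.items())          # = 2/3
cov = sum((x - mu_X) * (y - mu_Y) * p for (x, y), p in pmf.items())
assert abs(cov) < 1e-15                                  # zero correlation

# Dependence: f(0, 0) = 1/3, but f_X(0) * f_Y(0) = (1/3) * (1/3) = 1/9.
fX0 = sum(p for (x, y), p in pmf.items() if x == 0)
fY0 = sum(p for (x, y), p in pmf.items() if y == 0)
assert abs(pmf[(0, 0)] - fX0 * fY0) > 0.1                # pmf does not factor
```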
Conditional Distributions
Let X and Y have a joint discrete distribution with pmf f(x,y) on space S; the marginal probability mass functions are f_X(x) and f_Y(y) with spaces S_X and S_Y. Let event A = {X = x} and event B = {Y = y}, (x,y) ∈ S, so that A ∩ B = {X = x, Y = y}. Because P(A ∩ B) = P(X = x, Y = y) = f(x,y) and P(B) = P(Y = y) = f_Y(y) > 0 (since y ∈ S_Y), the conditional probability of event A given event B is
P(A|B) = P(A ∩ B)/P(B) = f(x,y)/f_Y(y).
DEFINITION The conditional probability mass function of X, given that Y = y, is defined by
g(x|y) = f(x,y)/f_Y(y), provided that f_Y(y) > 0.
Similarly, the conditional probability mass function of Y, given that X = x, is defined by
h(y|x) = f(x,y)/f_X(x), provided that f_X(x) > 0.
Clearly h(y|x) ≥ 0, and if we sum h(y|x) over y for a fixed x, we obtain
Σ_y h(y|x) = Σ_y f(x,y)/f_X(x) = f_X(x)/f_X(x) = 1.
Thus h(y|x) satisfies the conditions of a probability mass function, and we can compute conditional probabilities such as
P(a < Y < b | X = x) = Σ_{y: a<y<b} h(y|x)
and conditional mathematical expectations such as
E[u(Y) | X = x] = Σ_y u(y) h(y|x).
Special conditional mathematical expectations: the conditional mean of Y, given that X = x, is defined by
μ_{Y|x} = E(Y|x) = Σ_y y h(y|x),
and the conditional variance of Y, given that X = x, by
σ²_{Y|x} = E{[Y − E(Y|x)]² | x} = Σ_y [y − E(Y|x)]² h(y|x),
which can be computed as σ²_{Y|x} = E[Y²|x] − [E(Y|x)]². The conditional mean μ_{X|y} and the conditional variance σ²_{X|y} are given by similar expressions.
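These definitions can be exercised on the dice example (an illustrative sketch at the assumed value x = 2): h(y|2) is a genuine pmf, and the shortcut σ²_{Y|x} = E[Y²|x] − [E(Y|x)]² agrees with the direct computation.

```python
# Conditional pmf h(y|x) for the dice example, and the shortcut
# Var(Y|x) = E[Y^2|x] - [E(Y|x)]^2, checked at x = 2.
def f(x, y):
    if x == y:
        return 1 / 36
    if x < y:
        return 2 / 36
    return 0.0

x = 2
fX = sum(f(x, y) for y in range(1, 7))            # marginal f_X(2) = 9/36
h = {y: f(x, y) / fX for y in range(1, 7)}        # conditional pmf of Y given X = 2

E_Y = sum(y * p for y, p in h.items())
E_Y2 = sum(y**2 * p for y, p in h.items())
var_direct = sum((y - E_Y)**2 * p for y, p in h.items())
assert abs(sum(h.values()) - 1) < 1e-12           # h(.|x) is a genuine pmf
assert abs(var_direct - (E_Y2 - E_Y**2)) < 1e-12  # the shortcut formula
```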
Suppose that the conditional mean is a linear function of x; that is, E(Y|x) = a + bx. Let us find the constants a and b in terms of the characteristics μ_X, μ_Y, σ_X², σ_Y² and ρ. We assume that the respective standard deviations σ_X and σ_Y are both positive, so that the correlation coefficient exists.
Σ_y y h(y|x) = Σ_y y f(x,y)/f_X(x) = a + bx, for x ∈ S_X,
so
Σ_y y f(x,y) = (a + bx) f_X(x), for x ∈ S_X.
Summing over x ∈ S_X gives
Σ_{x∈S_X} Σ_y y f(x,y) = Σ_{x∈S_X} (a + bx) f_X(x), that is, μ_Y = a + bμ_X.
Multiplying Σ_y y f(x,y) = (a + bx) f_X(x) by x and summing over x gives
Σ_x Σ_y x y f(x,y) = Σ_x (ax + bx²) f_X(x), that is, E[XY] = aE[X] + bE[X²],
or, equivalently,
μ_X μ_Y + ρσ_Xσ_Y = aμ_X + b(μ_X² + σ_X²).
Together with μ_Y = a + bμ_X, the solution of these two equations is
a = μ_Y − ρ(σ_Y/σ_X)μ_X and b = ρ(σ_Y/σ_X).
Thus, if E(Y|x) is linear, it is given by
E(Y|x) = μ_Y + ρ(σ_Y/σ_X)(x − μ_X).
E(Y|x) = μ_Y + ρ(σ_Y/σ_X)(x − μ_X), and, by symmetry,
E(X|y) = μ_X + ρ(σ_X/σ_Y)(y − μ_Y).
If the conditional mean of Y, given that X = x, is linear, it is exactly the same as the best-fitting line (the least squares regression line) considered previously.
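The trinomial distribution is a case where E(Y|x) really is linear in x, so it provides a check of the formula (a numerical sketch with assumed parameters n = 8, p_X = 0.4, p_Y = 0.35; all moments are computed from the joint pmf, not assumed):

```python
from math import factorial

# For the trinomial, E(Y|x) is linear in x; check it against
# mu_Y + rho*(sigma_Y/sigma_X)*(x - mu_X) with numerically computed moments.
n, pX, pY = 8, 0.4, 0.35   # assumed illustrative parameters

def f(x, y):
    if x < 0 or y < 0 or x + y > n:
        return 0.0
    c = factorial(n) // (factorial(x) * factorial(y) * factorial(n - x - y))
    return c * pX**x * pY**y * (1 - pX - pY)**(n - x - y)

S = [(x, y) for x in range(n + 1) for y in range(n - x + 1)]
mu_X = sum(x * f(x, y) for x, y in S)
mu_Y = sum(y * f(x, y) for x, y in S)
s_X = sum((x - mu_X)**2 * f(x, y) for x, y in S) ** 0.5
s_Y = sum((y - mu_Y)**2 * f(x, y) for x, y in S) ** 0.5
rho = sum((x - mu_X) * (y - mu_Y) * f(x, y) for x, y in S) / (s_X * s_Y)

for x in range(n + 1):
    fX = sum(f(x, y) for y in range(n - x + 1))
    cond_mean = sum(y * f(x, y) for y in range(n - x + 1)) / fX
    assert abs(cond_mean - (mu_Y + rho * (s_Y / s_X) * (x - mu_X))) < 1e-9
```

Note that ρ comes out negative here: more perfects in n trials leaves fewer trials for seconds.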
Bivariate Distributions of the Continuous Type
The idea of joint distributions of two random variables of the discrete type can be extended to two random variables of the continuous type; the definitions are the same except that integrals replace summations. The joint probability density function (joint pdf) of two continuous-type random variables is an integrable function f(x,y) with the following properties:
(a) f(x,y) ≥ 0, where f(x,y) = 0 when (x,y) is not in the support (space) S of X and Y;
(b) ∫∫ f(x,y) dx dy = 1, integrating over the whole plane;
(c) P[(X,Y) ∈ A] = ∫∫_A f(x,y) dx dy, where {(X,Y) ∈ A} is an event defined in the plane.
P[(X,Y) ∈ A] is the volume of the solid over the region A in the xy-plane and bounded by the surface z = f(x,y).
The respective marginal pdfs of continuous-type random variables X and Y are given by
f_X(x) = ∫ f(x,y) dy, x ∈ S_X,
f_Y(y) = ∫ f(x,y) dx, y ∈ S_Y.
In the definitions of mathematical expectations, summation is replaced with integration. X and Y are independent if and only if the joint pdf factors into the product of their marginal pdfs:
f(x,y) ≡ f_X(x) f_Y(y), x ∈ S_X, y ∈ S_Y.
Let X and Y have a distribution of the continuous type with joint pdf f(x,y) and marginal pdfs f_X(x) and f_Y(y). The conditional pdf, mean and variance of Y, given that X = x, are
h(y|x) = f(x,y)/f_X(x), provided that f_X(x) > 0,
E(Y|x) = ∫ y h(y|x) dy,
Var(Y|x) = E{[Y − E(Y|x)]² | x} = ∫ [y − E(Y|x)]² h(y|x) dy = E[Y²|x] − [E(Y|x)]².
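A small continuous illustration (an assumed example, not from the slide): take f(x,y) = x + y on the unit square, so f_X(x) = x + 1/2 and E(Y|x) = (x/2 + 1/3)/(x + 1/2). A midpoint Riemann sum for ∫ y h(y|x) dy reproduces this value.

```python
# f(x, y) = x + y on [0,1]^2 (assumed example):
# f_X(x) = x + 1/2 and E(Y|x) = (x/2 + 1/3) / (x + 1/2).
N = 20_000
dy = 1.0 / N

def E_Y_given(x):
    fX = x + 0.5                       # marginal pdf of X at x
    total = 0.0
    for i in range(N):
        y = (i + 0.5) * dy             # midpoint of the i-th cell
        total += y * (x + y) * dy      # contribution y * f(x, y) dy
    return total / fX                  # = integral of y * h(y|x) dy

x = 0.7
exact = (x / 2 + 1 / 3) / (x + 0.5)
assert abs(E_Y_given(x) - exact) < 1e-6
```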
If E(Y|x) is linear, then E(Y|x) = μ_Y + ρ(σ_Y/σ_X)(x − μ_X); if E(X|y) is linear, then E(X|y) = μ_X + ρ(σ_X/σ_Y)(y − μ_Y).
The Bivariate Normal Distribution
Let X and Y be random variables with joint pdf f(x,y) of the continuous type and marginal pdfs f_X(x) and f_Y(y). Suppose we have an application in which we can make the following three assumptions about the conditional distribution of Y, given X = x:
(a) It is normal for each real x.
(b) Its mean, E(Y|x), is a linear function of x.
(c) Its variance is constant; that is, it does not depend upon the given value of x.
Assumption (b) implies E(Y|x) = μ_Y + ρ(σ_Y/σ_X)(x − μ_X). Assumption (c) implies
σ²_{Y|x} = ∫ [y − μ_Y − ρ(σ_Y/σ_X)(x − μ_X)]² h(y|x) dy
(which we now multiply by f_X(x) and integrate over x).
Since σ²_{Y|x} is constant, multiplying by f_X(x) and integrating over x leaves the left-hand side equal to σ²_{Y|x}:
σ²_{Y|x} = ∫∫ [y − μ_Y − ρ(σ_Y/σ_X)(x − μ_X)]² h(y|x) f_X(x) dy dx,
and since h(y|x) f_X(x) = f(x,y), this is
σ²_{Y|x} = E{[(Y − μ_Y) − ρ(σ_Y/σ_X)(X − μ_X)]²}.
σ²_{Y|x} = E{[(Y − μ_Y) − ρ(σ_Y/σ_X)(X − μ_X)]²}. Using the fact that the expectation E is a linear operator and recalling that E[(X − μ_X)(Y − μ_Y)] = ρσ_Xσ_Y, we have
σ²_{Y|x} = σ_Y² − 2ρ(σ_Y/σ_X)ρσ_Xσ_Y + ρ²(σ_Y²/σ_X²)σ_X² = σ_Y² − 2ρ²σ_Y² + ρ²σ_Y² = σ_Y²(1 − ρ²).
These facts about the conditional mean and variance, together with assumption (a), require that the conditional pdf of Y, given X = x, be
h(y|x) = [1/(σ_Y √(2π) √(1 − ρ²))] exp{ −[y − μ_Y − ρ(σ_Y/σ_X)(x − μ_X)]² / [2σ_Y²(1 − ρ²)] }, −∞ < y < ∞,
for every real x. Up to this point, nothing has been said about the distribution of X other than that it has mean μ_X and positive variance σ_X².
Suppose we assume that the distribution of X is also normal; that is, the marginal pdf of X is
f_X(x) = [1/(σ_X √(2π))] exp[ −(x − μ_X)²/(2σ_X²) ], −∞ < x < ∞.
The joint pdf of X and Y is then given by the product
f(x,y) = h(y|x) f_X(x) = [1/(2πσ_Xσ_Y √(1 − ρ²))] exp[ −q(x,y)/2 ],
where
q(x,y) = [1/(1 − ρ²)] { [(x − μ_X)/σ_X]² − 2ρ[(x − μ_X)/σ_X][(y − μ_Y)/σ_Y] + [(y − μ_Y)/σ_Y]² }.
A joint pdf of this form is called a bivariate normal pdf.
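The conditional construction above also gives a way to simulate a bivariate normal pair (a sketch with assumed parameters): draw X from its normal marginal, then draw Y from the conditional normal with mean μ_Y + ρ(σ_Y/σ_X)(x − μ_X) and variance σ_Y²(1 − ρ²). The sample correlation should then come out near ρ.

```python
import random

# Sample (X, Y) via the marginal of X and the conditional of Y given X = x.
random.seed(0)
mu_X, mu_Y, sigma_X, sigma_Y, rho = 1.0, -2.0, 2.0, 0.5, 0.7  # assumed values

n = 100_000
xs, ys = [], []
for _ in range(n):
    x = random.gauss(mu_X, sigma_X)
    cond_mean = mu_Y + rho * (sigma_Y / sigma_X) * (x - mu_X)
    cond_sd = sigma_Y * (1 - rho**2) ** 0.5
    xs.append(x)
    ys.append(random.gauss(cond_mean, cond_sd))

mx = sum(xs) / n
my = sum(ys) / n
sx = (sum((v - mx)**2 for v in xs) / n) ** 0.5
sy = (sum((v - my)**2 for v in ys) / n) ** 0.5
r = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n * sx * sy)
assert abs(r - rho) < 0.02   # sample correlation close to the true rho
```

The marginal variance of Y also comes out right, since Var(Y) = ρ²σ_Y² + σ_Y²(1 − ρ²) = σ_Y².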
THEOREM If X and Y have a bivariate normal distribution with correlation coefficient ρ, then X and Y are independent if and only if ρ = 0.
Indeed, when ρ = 0, q(x,y) = [(x − μ_X)/σ_X]² + [(y − μ_Y)/σ_Y]², so
f(x,y) = h(y|x) f_X(x) = [1/(2πσ_Xσ_Y)] exp[ −q(x,y)/2 ]
factors as f_X(x) f_Y(y); in particular, h(y|x) is then a normal pdf with mean μ_Y and variance σ_Y², free of x.