Bivariate distributions


Bivariate distributions. Lecture (October 2017) based on Hogg, Tanis & Zimmerman: Probability and Statistical Inference (9th ed.)

Outline: Bivariate Distributions of the Discrete Type | The Correlation Coefficient | Conditional Distributions | Bivariate Distributions of the Continuous Type | The Bivariate Normal Distribution

Up to now, we have taken a single measurement on each item under observation. In many practical cases it is possible (and often very desirable) to take several measurements on each item. For example, we may observe university students to obtain information on physical characteristics such as height x and weight y; we then want to determine a relation y = u(x) and say something about the variation of the points around that curve.

DEFINITION Let X and Y be two random variables of the discrete type, and let S denote their corresponding two-dimensional space. The probability that X = x and Y = y is denoted by f(x,y) = P(X = x, Y = y). The function f(x,y) is called the joint probability mass function (joint pmf) of X and Y and has the following properties:
(a) 0 ≤ f(x,y) ≤ 1;
(b) Σ_{(x,y)∈S} f(x,y) = 1;
(c) P[(X,Y) ∈ A] = Σ_{(x,y)∈A} f(x,y), where A is a subset of the space S.

EXAMPLE Roll a pair of fair dice. For each of the 36 sample points, each with probability 1/36, let X denote the smaller and Y the larger outcome on the dice. For example, if the outcome is (3,2), then the observed values are X = 2, Y = 3. The event {X = 2, Y = 3} can occur in one of two ways, (3,2) or (2,3), so its probability is 1/36 + 1/36 = 2/36. If the outcome is (2,2), then the observed values are X = 2, Y = 2. Since the event {X = 2, Y = 2} can occur in only one way, P(X = 2, Y = 2) = 1/36. The joint pmf of X and Y is therefore
f(x,y) = 1/36 if 1 ≤ x = y ≤ 6, and f(x,y) = 2/36 if 1 ≤ x < y ≤ 6,
where x and y are integers.
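A minimal sketch (not part of the lecture) that enumerates the 36 outcomes and tabulates this joint pmf; the variable names are mine:

```python
from collections import Counter
from fractions import Fraction

# Enumerate the 36 equally likely outcomes and tabulate the joint pmf of
# X = smaller outcome, Y = larger outcome.
counts = Counter((min(d1, d2), max(d1, d2))
                 for d1 in range(1, 7) for d2 in range(1, 7))
joint_pmf = {xy: Fraction(n, 36) for xy, n in counts.items()}

print(joint_pmf[(2, 2)])        # 1/36: only the outcome (2,2)
print(joint_pmf[(2, 3)])        # 1/18 = 2/36: the outcomes (2,3) and (3,2)
print(sum(joint_pmf.values()))  # 1, as required of a joint pmf
```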

DEFINITION Let X and Y have the joint probability mass function f(x,y) with space S. The probability mass function of X alone, called the marginal probability mass function of X, is defined by
f_X(x) = Σ_y f(x,y) = P(X = x), x ∈ S_X,
where the summation is taken over all possible y values for each given x in the x space S_X; that is, the summation is over all (x,y) in S with a given x value. Similarly, the marginal probability mass function of Y is defined by
f_Y(y) = Σ_x f(x,y) = P(Y = y), y ∈ S_Y,
where the summation is taken over all possible x values for each given y in the y space S_Y.

DEFINITION The random variables X and Y are independent if and only if, for every x ∈ S_X and every y ∈ S_Y,
P(X = x, Y = y) = P(X = x) P(Y = y)
or, equivalently, f(x,y) = f_X(x) f_Y(y). Otherwise, X and Y are said to be dependent.

It is possible to define a probability histogram for a joint pmf just as we did for a single random variable. Suppose that X and Y have a joint pmf f(x,y) with space S, where S is a set of pairs of integers. At each point (x,y) in S, construct a rectangular column centered at (x,y) with a one-by-one-unit base and height equal to f(x,y); then f(x,y) equals the volume of this column, and the sum of the volumes of all the columns in the probability histogram is equal to 1. For example,
f(x,y) = x y²/30, x = 1, 2, 3, y = 1, 2.
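A quick check of this example pmf (my own sketch, not from the slides): it sums to 1, its marginals follow from the definitions above, and in this particular example the joint pmf happens to factor, so X and Y are independent.

```python
# Check of the example pmf f(x,y) = x*y**2/30 for x = 1,2,3 and y = 1,2.
pmf = {(x, y): x * y**2 / 30 for x in (1, 2, 3) for y in (1, 2)}
print(sum(pmf.values()))                                   # 1.0

f_X = {x: sum(p for (xx, _), p in pmf.items() if xx == x) for x in (1, 2, 3)}
f_Y = {y: sum(p for (_, yy), p in pmf.items() if yy == y) for y in (1, 2)}
print(f_X)   # f_X(x) = x/6
print(f_Y)   # f_Y(y) = y**2/5

# Here the joint pmf factors, f(x,y) = f_X(x)*f_Y(y), so X and Y are independent.
print(all(abs(pmf[(x, y)] - f_X[x] * f_Y[y]) < 1e-12 for x in (1, 2, 3) for y in (1, 2)))
```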

NOTE Sometimes it is convenient to replace the symbols X and Y representing random variables by X_1 and X_2. Let X_1 and X_2 be random variables of the discrete type with joint pmf f(x_1, x_2) on the space S. If u(X_1, X_2) is a function of these two random variables, then
E[u(X_1, X_2)] = Σ_{(x_1,x_2)∈S} u(x_1, x_2) f(x_1, x_2),
if it exists, is called the mathematical expectation (or expected value) of u(X_1, X_2).

The following mathematical expectations, if they exist, have special names:
A. If u_i(X_1, X_2) = X_i for i = 1, 2, then E[u_i(X_1, X_2)] = E[X_i] = μ_i is called the mean of X_i, for i = 1, 2.
B. If u_i(X_1, X_2) = (X_i - μ_i)² for i = 1, 2, then E[u_i(X_1, X_2)] = E[(X_i - μ_i)²] = σ_i² = Var(X_i) is called the variance of X_i, for i = 1, 2.
The mean μ_i and the variance σ_i² can be computed either from the joint pmf f(x_1, x_2) or from the marginal pmf f_i(x_i), i = 1, 2.
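A small illustration of that last remark (reusing the xy²/30 example; the helper names are mine): the mean and variance of X_1 computed from the joint pmf agree with those computed from its marginal pmf.

```python
pmf = {(x1, x2): x1 * x2**2 / 30 for x1 in (1, 2, 3) for x2 in (1, 2)}

mu1 = sum(x1 * p for (x1, _), p in pmf.items())                 # mean from the joint pmf
var1 = sum((x1 - mu1)**2 * p for (x1, _), p in pmf.items())     # variance from the joint pmf

marg1 = {x1: sum(p for (xx, _), p in pmf.items() if xx == x1) for x1 in (1, 2, 3)}
mu1_m = sum(x1 * p for x1, p in marg1.items())                  # mean from the marginal pmf
var1_m = sum((x1 - mu1_m)**2 * p for x1, p in marg1.items())    # variance from the marginal pmf

print(mu1, mu1_m)    # both 7/3
print(var1, var1_m)  # both 5/9
```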

EXTENSION OF THE BINOMIAL DISTRIBUTION TO A TRINOMIAL DISTRIBUTION Suppose there are three mutually exclusive and exhaustive ways for an experiment to terminate: perfect, seconds, defective. We repeat the experiment n independent times, and the probabilities p_X, p_Y, p_Z = 1 - p_X - p_Y remain the same from trial to trial. In the n trials, let
X = number of perfect items,
Y = number of seconds,
Z = n - X - Y = number of defectives.
If x and y are nonnegative integers such that x + y ≤ n, then the probability of obtaining x perfects, y seconds and n - x - y defectives, in one particular order, is
p_X^x p_Y^y (1 - p_X - p_Y)^(n-x-y).

EXTENSION OF THE BINOMIAL DISTRIBUTION TO A TRINOMIAL DISTRIBUTION However, if we want P(X = x, Y = y), we must recognize that the event {X = x, Y = y} can occur in
n!/(x! y! (n-x-y)!)
different ways. The trinomial pmf is therefore
f(x,y) = P(X = x, Y = y) = [n!/(x! y! (n-x-y)!)] p_X^x p_Y^y (1 - p_X - p_Y)^(n-x-y),
where x and y are nonnegative integers such that x + y ≤ n. Without summing, we know that X is b(n, p_X) and Y is b(n, p_Y); since the product of these marginal pmfs does not equal f(x,y), X and Y are dependent.
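A sketch of this pmf (the function name and parameter values are mine), together with a numerical check that the marginal of X is indeed binomial b(n, p_X); it uses math.comb, available in Python 3.8+:

```python
from math import comb

def trinomial_pmf(x, y, n, p_x, p_y):
    """P(X = x, Y = y) = n!/(x! y! (n-x-y)!) * p_x^x * p_y^y * (1 - p_x - p_y)^(n-x-y)."""
    if x < 0 or y < 0 or x + y > n:
        return 0.0
    coef = comb(n, x) * comb(n - x, y)          # equals n!/(x! y! (n-x-y)!)
    return coef * p_x**x * p_y**y * (1 - p_x - p_y)**(n - x - y)

n, p_x, p_y = 10, 0.5, 0.3
x = 4
marginal = sum(trinomial_pmf(x, y, n, p_x, p_y) for y in range(n - x + 1))
binomial = comb(n, x) * p_x**x * (1 - p_x)**(n - x)
print(marginal, binomial)                       # the two agree, so X is b(n, p_x)
```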

Bivariate Distributions of the Discrete Type | The Correlation Coefficient | Conditional Distributions | Bivariate Distributions of the Continuous Type | The Bivariate Normal Distribution

We have introduced the mathematical expectation of a function of two random variables X, Y. Recall
μ_X = E[X], μ_Y = E[Y], σ_X² = E[(X - μ_X)²], σ_Y² = E[(Y - μ_Y)²].
A. If u(X,Y) = (X - μ_X)(Y - μ_Y), then
E[u(X,Y)] = E[(X - μ_X)(Y - μ_Y)] = σ_XY = Cov(X,Y)
is called the covariance of X and Y.
B. If the standard deviations σ_X and σ_Y are positive, then
ρ = Cov(X,Y)/(σ_X σ_Y) = σ_XY/(σ_X σ_Y)
is called the correlation coefficient of X and Y.

It is convenient that the mean and the variance of X can be computed from either the joint pmf (or pdf) or the marginal pmf (or pdf) of X:
μ_X = E[X] = Σ_x Σ_y x f(x,y) = Σ_x x [Σ_y f(x,y)] = Σ_x x f_X(x).
To compute the covariance, however, we need the joint pmf (or pdf):
E[(X - μ_X)(Y - μ_Y)] = E[XY - μ_X Y - μ_Y X + μ_X μ_Y] = E[XY] - μ_X E[Y] - μ_Y E[X] + μ_X μ_Y,
because E is a linear (distributive) operator. Thus
Cov(X,Y) = E[XY] - μ_X μ_Y - μ_X μ_Y + μ_X μ_Y = E[XY] - μ_X μ_Y.
Since ρ = Cov(X,Y)/(σ_X σ_Y), we also have E[XY] = μ_X μ_Y + ρ σ_X σ_Y; that is, the expected value of the product of two random variables equals the product of their means, μ_X μ_Y, plus their covariance ρ σ_X σ_Y.
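A sketch (my own, continuing the earlier dice example; the helper E is mine) that computes Cov(X,Y) = E[XY] - μ_X μ_Y and ρ from the joint pmf:

```python
from itertools import product

# Joint pmf of X = smaller and Y = larger outcome of two fair dice.
pmf = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (min(d1, d2), max(d1, d2))
    pmf[key] = pmf.get(key, 0) + 1/36

def E(g):
    """Expectation of g(X, Y) computed from the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
sigma_x = E(lambda x, y: (x - mu_x)**2) ** 0.5
sigma_y = E(lambda x, y: (y - mu_y)**2) ** 0.5
cov = E(lambda x, y: x * y) - mu_x * mu_y       # Cov(X,Y) = E[XY] - mu_X*mu_Y
rho = cov / (sigma_x * sigma_y)
print(cov, rho)                                 # both positive: small X tends to occur with small Y
```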

ρ = Σ_x Σ_y (x - μ_X)(y - μ_Y) f(x,y) / (σ_X σ_Y).
Interpretation of the sign of the correlation coefficient: if positive probabilities are assigned to pairs (x,y) in which both x and y are simultaneously above or simultaneously below their respective means, then the corresponding terms in the summation that defines ρ are positive, because both factors x - μ_X and y - μ_Y will be positive or both will be negative. If, on the one hand, the points (x,y) which yield large positive products (x - μ_X)(y - μ_Y) contain most of the probability of the distribution, then the correlation coefficient will tend to be positive. If, on the other hand, the points (x,y) in which one component is below its mean and the other above its mean have most of the probability, then the correlation coefficient will tend to be negative, because the products (x - μ_X)(y - μ_Y) with higher probabilities are negative.

Consider the following problem: think of the points (x,y) in the space S and their corresponding probabilities. Consider all possible lines in two-dimensional space, each with finite slope, that pass through the point (μ_X, μ_Y) associated with the means; these lines are of the form y = μ_Y + b(x - μ_X). For each point (x_0, y_0) in S (so that f(x_0, y_0) > 0), consider the vertical distance from that point to one of these lines. Since y_0 is the height of the point above the x-axis and μ_Y + b(x_0 - μ_X) is the height of the point on the line directly above or below (x_0, y_0), the absolute value of the difference of these two heights,
|y_0 - μ_Y - b(x_0 - μ_X)|,
is the vertical distance from the point (x_0, y_0) to the line y = μ_Y + b(x - μ_X).

Let us now square the distance and take the weighted average of all such squares; that is, consider the mathematical expectation
K(b) = E{[Y - μ_Y - b(X - μ_X)]²}.
The problem is to find the line (that is, the b) which minimizes this expectation of the squared deviation [Y - μ_Y - b(X - μ_X)]²; this is an application of the principle of least squares, and the resulting line is sometimes called the least squares regression line. Expanding,
K(b) = E[(Y - μ_Y)² - 2b(X - μ_X)(Y - μ_Y) + b²(X - μ_X)²] = σ_Y² - 2bρσ_Xσ_Y + b²σ_X²,
because E is a linear operator and E[(X - μ_X)(Y - μ_Y)] = ρσ_Xσ_Y.

K(b) = E{[Y - μ_Y - b(X - μ_X)]²} = σ_Y² - 2bρσ_Xσ_Y + b²σ_X².
The derivative K'(b) = -2ρσ_Xσ_Y + 2bσ_X² equals zero at b = ρσ_Y/σ_X, and since K''(b) = 2σ_X² > 0, K(b) attains its minimum at this b. Consequently, the least squares regression line is
y = μ_Y + ρ(σ_Y/σ_X)(x - μ_X).
If ρ > 0, the slope is positive; if ρ < 0, the slope is negative.

K(b) = σ_Y² - 2bρσ_Xσ_Y + b²σ_X². The value of the minimum is
K(ρσ_Y/σ_X) = σ_Y² - 2(ρσ_Y/σ_X)ρσ_Xσ_Y + (ρσ_Y/σ_X)²σ_X² = σ_Y² - 2ρ²σ_Y² + ρ²σ_Y² = σ_Y²(1 - ρ²).
Since 0 ≤ σ_Y²(1 - ρ²) = K(ρσ_Y/σ_X) ≤ K(b) and σ_Y² > 0, it follows that 0 ≤ ρ² ≤ 1, that is, -1 ≤ ρ ≤ 1. If ρ = 0, then K(ρσ_Y/σ_X) = σ_Y²; if ρ is close to 1 or -1, then K(ρσ_Y/σ_X) is relatively small.

K(ρσ_Y/σ_X) = σ_Y²(1 - ρ²). If ρ = 0, then K(ρσ_Y/σ_X) = σ_Y²; if ρ is close to 1 or -1, then K(ρσ_Y/σ_X) is relatively small. In other words, the vertical deviations of the points with positive probability from the line y = μ_Y + ρ(σ_Y/σ_X)(x - μ_X) are small when ρ is close to 1 or -1: ρ measures the amount of linearity in the probability distribution.
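A numerical sanity check of this derivation (illustrative values of σ_X, σ_Y, ρ chosen by me, not from the lecture): the minimizer of K(b) over a fine grid matches ρσ_Y/σ_X, and the minimum matches σ_Y²(1 - ρ²).

```python
import numpy as np

sigma_x, sigma_y, rho = 1.5, 2.0, 0.6           # arbitrary illustrative values

def K(b):
    # K(b) = sigma_Y^2 - 2*b*rho*sigma_X*sigma_Y + b^2*sigma_X^2
    return sigma_y**2 - 2 * b * rho * sigma_x * sigma_y + b**2 * sigma_x**2

bs = np.linspace(-5, 5, 100001)                 # fine grid of candidate slopes
b_star = bs[np.argmin(K(bs))]
print(b_star, rho * sigma_y / sigma_x)          # both approximately 0.8
print(K(b_star), sigma_y**2 * (1 - rho**2))     # both approximately 2.56
```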

Suppose that X and Y are independent, so that f(x,y) = f_X(x) f_Y(y), and suppose we want the expected value of the product u(X) v(Y). Then
E[u(X)v(Y)] = Σ_{S_X} Σ_{S_Y} u(x) v(y) f(x,y) = Σ_{S_X} Σ_{S_Y} u(x) v(y) f_X(x) f_Y(y)
= [Σ_{S_X} u(x) f_X(x)] [Σ_{S_Y} v(y) f_Y(y)] = E[u(X)] E[v(Y)].
In particular, the correlation coefficient of two independent variables is zero, since
Cov(X,Y) = E[(X - μ_X)(Y - μ_Y)] = E[X - μ_X] E[Y - μ_Y] = 0.
!!! Independence implies a zero correlation coefficient, but a zero correlation coefficient does not necessarily imply independence !!!
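A standard counterexample for the last warning (my own sketch, not from the slides): take X uniform on {-1, 0, 1} and Y = X²; the correlation coefficient is zero although X and Y are clearly dependent.

```python
pmf = {(-1, 1): 1/3, (0, 0): 1/3, (1, 1): 1/3}   # joint pmf of (X, Y) with Y = X**2

def E(g):
    return sum(g(x, y) * p for (x, y), p in pmf.items())

cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
print(cov)                                       # 0.0, so rho = 0

p_x0 = sum(p for (x, _), p in pmf.items() if x == 0)   # P(X = 0) = 1/3
p_y0 = sum(p for (_, y), p in pmf.items() if y == 0)   # P(Y = 0) = 1/3
print(pmf[(0, 0)], p_x0 * p_y0)                  # 1/3 vs 1/9: f(0,0) != f_X(0)*f_Y(0), so dependent
```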

Bivariate Distributions of the Discrete Type | The Correlation Coefficient | Conditional Distributions | Bivariate Distributions of the Continuous Type | The Bivariate Normal Distribution

Let X and Y have a joint discrete distribution with pmf f(x,y) on space S, and let the marginal probability mass functions be f_X(x) and f_Y(y) with spaces S_X and S_Y. Let A = {X = x} and B = {Y = y}, with (x,y) ∈ S, so that A ∩ B = {X = x, Y = y}. Because P(A ∩ B) = P(X = x, Y = y) = f(x,y) and P(B) = P(Y = y) = f_Y(y) > 0 (since y ∈ S_Y), the conditional probability of event A given event B is
P(A | B) = P(A ∩ B)/P(B) = f(x,y)/f_Y(y).

DEFINITION The conditional probability mass function of X, given that Y = y, is defined by
g(x | y) = f(x,y)/f_Y(y), provided that f_Y(y) > 0.
Similarly, the conditional probability mass function of Y, given that X = x, is defined by
h(y | x) = f(x,y)/f_X(x), provided that f_X(x) > 0.

Note that h(y | x) ≥ 0, and if we sum h(y | x) over y for a fixed x, we obtain
Σ_y h(y | x) = Σ_y f(x,y)/f_X(x) = f_X(x)/f_X(x) = 1,
so h(y | x) satisfies the conditions of a probability mass function. We can therefore compute conditional probabilities such as
P(a < Y < b | X = x) = Σ_{y: a<y<b} h(y | x)
and conditional mathematical expectations such as
E[u(Y) | X = x] = Σ_y u(y) h(y | x).
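A short sketch (my own, continuing the dice example; the helper h is mine) that builds h(y | x), confirms it is a valid pmf, and computes a conditional mean:

```python
from itertools import product

# Joint pmf of X = smaller, Y = larger outcome of two fair dice.
pmf = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (min(d1, d2), max(d1, d2))
    pmf[key] = pmf.get(key, 0) + 1/36

def h(y, x):
    """Conditional pmf h(y | x) = f(x, y) / f_X(x)."""
    f_x = sum(p for (xx, _), p in pmf.items() if xx == x)   # marginal f_X(x)
    return pmf.get((x, y), 0) / f_x

x = 2
cond = {y: h(y, x) for y in range(1, 7)}
print(cond)                                     # h(2|2) = 1/9, h(y|2) = 2/9 for y = 3,...,6
print(sum(cond.values()))                       # 1, so h(. | x) is a valid pmf
print(sum(y * h(y, x) for y in range(1, 7)))    # conditional mean E[Y | X = 2] = 38/9
```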

Special conditional mathematical expectations: the conditional mean of Y, given that X = x, is defined by
μ_{Y|x} = E(Y | x) = Σ_y y h(y | x),
and the conditional variance of Y, given that X = x, is defined by
σ²_{Y|x} = E{[Y - E(Y | x)]² | x} = Σ_y [y - E(Y | x)]² h(y | x),
which can also be computed as σ²_{Y|x} = E(Y² | x) - [E(Y | x)]².
The conditional mean μ_{X|y} and the conditional variance σ²_{X|y} are given by similar expressions.

Suppose that the conditional mean is a linear function of x; that is, E(Y | x) = a + bx. Let us find the constants a and b in terms of the characteristics μ_X, μ_Y, σ_X², σ_Y² and ρ. We assume that the respective standard deviations σ_X and σ_Y are both positive, so that the correlation coefficient exists. From
Σ_y y h(y | x) = Σ_y y f(x,y)/f_X(x) = a + bx, for x ∈ S_X,
we get
Σ_y y f(x,y) = (a + bx) f_X(x), for x ∈ S_X.
Summing over x ∈ S_X gives
Σ_{x∈S_X} Σ_y y f(x,y) = Σ_{x∈S_X} (a + bx) f_X(x), that is, μ_Y = a + bμ_X.

Multiplying the equation Σ_y y f(x,y) = (a + bx) f_X(x) by x and summing over x, we obtain
Σ_x Σ_y x y f(x,y) = Σ_x (ax + bx²) f_X(x), that is, E[XY] = aE[X] + bE[X²],
or, equivalently,
μ_X μ_Y + ρσ_Xσ_Y = aμ_X + b(μ_X² + σ_X²).
Together with μ_Y = a + bμ_X, the solution of these two equations is
a = μ_Y - ρ(σ_Y/σ_X)μ_X and b = ρσ_Y/σ_X.
Thus, if E(Y | x) is linear, it is given by
E(Y | x) = μ_Y + ρ(σ_Y/σ_X)(x - μ_X).

E(Y | x) = μ_Y + ρ(σ_Y/σ_X)(x - μ_X), and by symmetry, E(X | y) = μ_X + ρ(σ_X/σ_Y)(y - μ_Y). If the conditional mean of Y, given that X = x, is linear, it is exactly the same as the best-fitting line (the least squares regression line) considered earlier.

Bivariate Distributions of the Discrete Type | The Correlation Coefficient | Conditional Distributions | Bivariate Distributions of the Continuous Type | The Bivariate Normal Distribution

The idea of the joint distribution of two random variables of the discrete type can be extended to two random variables of the continuous type; the definitions are the same except that integrals replace summations. The joint probability density function (joint pdf) of two continuous-type random variables is an integrable function f(x,y) with the following properties:
(a) f(x,y) ≥ 0, with f(x,y) = 0 when (x,y) is not in the support (space) S of X and Y;
(b) ∫∫ f(x,y) dx dy = 1, the integral taken over the whole plane;
(c) P[(X,Y) ∈ A] = ∫∫_A f(x,y) dx dy, where {(X,Y) ∈ A} is an event defined by a region A in the plane.
Here P[(X,Y) ∈ A] is the volume of the solid over the region A in the xy-plane bounded above by the surface z = f(x,y).

The respective marginal pdfs of continuous-type random variables X and Y are given by
f_X(x) = ∫ f(x,y) dy, x ∈ S_X, and f_Y(y) = ∫ f(x,y) dx, y ∈ S_Y,
where each integral is taken over the whole range of the other variable. In the definitions of mathematical expectations, summation is replaced with integration. X and Y are independent if and only if the joint pdf factors into the product of their marginal pdfs:
f(x,y) = f_X(x) f_Y(y), x ∈ S_X, y ∈ S_Y.
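A sketch of "integrate out the other variable" (my own example, assuming SciPy is available): the hypothetical joint pdf f(x,y) = x + y on the unit square is non-negative and integrates to 1, and its marginal f_X is obtained by integrating over y.

```python
from scipy.integrate import quad

def f(x, y):
    # Hypothetical joint pdf on the unit square (not from the lecture).
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def f_X(x):
    """Marginal pdf of X: integrate the joint pdf over y."""
    value, _ = quad(lambda y: f(x, y), 0, 1)
    return value

print(f_X(0.3))                 # analytically x + 1/2 = 0.8
total, _ = quad(f_X, 0, 1)
print(total)                    # the marginal pdf itself integrates to 1
```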

Let X and Y have a distribution of the continuous type with joint pdf f(x,y) and marginal pdfs f_X(x) and f_Y(y). The conditional pdf, mean and variance of Y, given that X = x, are
h(y | x) = f(x,y)/f_X(x), provided that f_X(x) > 0,
E(Y | x) = ∫ y h(y | x) dy,
Var(Y | x) = E{[Y - E(Y | x)]² | x} = ∫ [y - E(Y | x)]² h(y | x) dy = E(Y² | x) - [E(Y | x)]².

If E(Y | x) is linear, then E(Y | x) = μ_Y + ρ(σ_Y/σ_X)(x - μ_X); if E(X | y) is linear, then E(X | y) = μ_X + ρ(σ_X/σ_Y)(y - μ_Y).

Bivariate Distributions of the Discrete Type | The Correlation Coefficient | Conditional Distributions | Bivariate Distributions of the Continuous Type | The Bivariate Normal Distribution

Let X and Y be random variables of the continuous type with joint pdf f(x,y) and marginal pdfs f_X(x) and f_Y(y). Suppose we have an application in which we can make the following three assumptions about the conditional distribution of Y, given X = x:
(a) it is normal for each real x;
(b) its mean, E(Y | x), is a linear function of x;
(c) its variance is constant, that is, it does not depend on the given value of x.
Assumption (b) implies E(Y | x) = μ_Y + ρ(σ_Y/σ_X)(x - μ_X).
Assumption (c) implies that the constant variance equals
σ²_{Y|x} = ∫ [y - μ_Y - ρ(σ_Y/σ_X)(x - μ_X)]² h(y | x) dy
(which we now multiply by f_X(x) and integrate over x).

σ²_{Y|x} = ∫ [y - μ_Y - ρ(σ_Y/σ_X)(x - μ_X)]² h(y | x) dy. Multiplying by f_X(x) and integrating over x, and noting that σ²_{Y|x} is constant so the left-hand side remains σ²_{Y|x}, we get
σ²_{Y|x} = ∫∫ [y - μ_Y - ρ(σ_Y/σ_X)(x - μ_X)]² h(y | x) f_X(x) dy dx.
Since h(y | x) f_X(x) = f(x,y), this is
σ²_{Y|x} = E{[(Y - μ_Y) - ρ(σ_Y/σ_X)(X - μ_X)]²}.

σ²_{Y|x} = E{[(Y - μ_Y) - ρ(σ_Y/σ_X)(X - μ_X)]²}.
Using the fact that the expectation E is a linear operator and recalling that E[(X - μ_X)(Y - μ_Y)] = ρσ_Xσ_Y, we have
σ²_{Y|x} = σ_Y² - 2ρ(σ_Y/σ_X)ρσ_Xσ_Y + ρ²(σ_Y²/σ_X²)σ_X² = σ_Y² - 2ρ²σ_Y² + ρ²σ_Y² = σ_Y²(1 - ρ²).
These facts about the conditional mean and variance, together with assumption (a), require that the conditional pdf of Y, given X = x, be
h(y | x) = [1/(σ_Y √(2π) √(1 - ρ²))] exp{ -[y - μ_Y - ρ(σ_Y/σ_X)(x - μ_X)]² / [2σ_Y²(1 - ρ²)] }, -∞ < y < ∞,
for every real x. Up to this point, nothing has been said about the distribution of X other than that it has mean μ_X and positive variance σ_X².

Suppose we now assume that the distribution of X is also normal; that is, the marginal pdf of X is
f_X(x) = [1/(σ_X √(2π))] exp{ -(x - μ_X)²/(2σ_X²) }, -∞ < x < ∞.
Then the joint pdf of X and Y is given by the product
f(x,y) = h(y | x) f_X(x) = [1/(2π σ_X σ_Y √(1 - ρ²))] exp{ -q(x,y)/2 },
where
q(x,y) = [1/(1 - ρ²)] { [(x - μ_X)/σ_X]² - 2ρ[(x - μ_X)/σ_X][(y - μ_Y)/σ_Y] + [(y - μ_Y)/σ_Y]² }.
A joint pdf of this form is called a bivariate normal pdf.
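A sketch (parameter values are arbitrary, and it assumes SciPy is available) that evaluates this pdf exactly as written above and compares it with SciPy's multivariate normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

def bvn_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """f(x,y) = exp(-q(x,y)/2) / (2*pi*sigma_x*sigma_y*sqrt(1 - rho^2))."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx**2 - 2 * rho * zx * zy + zy**2) / (1 - rho**2)
    return np.exp(-q / 2) / (2 * np.pi * sigma_x * sigma_y * np.sqrt(1 - rho**2))

mu_x, mu_y, sigma_x, sigma_y, rho = 1.0, -2.0, 1.5, 0.5, 0.7   # arbitrary illustrative values
cov = [[sigma_x**2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y**2]]
rv = multivariate_normal(mean=[mu_x, mu_y], cov=cov)

x, y = 0.4, -1.7
print(bvn_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho), rv.pdf([x, y]))   # the two values agree
```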

THEOREM If X and Y have a bivariate normal distribution with correlation coefficient ρ, then X and Y are independent if and only if ρ = 0.
Indeed, when ρ = 0 the joint pdf of X and Y equals f_X(x) f_Y(y), and h(y | x) is a normal pdf with mean μ_Y and variance σ_Y², since
f(x,y) = h(y | x) f_X(x) = [1/(2π σ_X σ_Y √(1 - ρ²))] exp{ -q(x,y)/2 },
where
q(x,y) = [1/(1 - ρ²)] { [(x - μ_X)/σ_X]² - 2ρ[(x - μ_X)/σ_X][(y - μ_Y)/σ_Y] + [(y - μ_Y)/σ_Y]² },
and setting ρ = 0 reduces q(x,y) to [(x - μ_X)/σ_X]² + [(y - μ_Y)/σ_Y]².