Joint Probability Distributions

Joshua Hoover
6 years ago

ST 370

In many random experiments, more than one quantity is measured, meaning that there is more than one random variable.

Example: Cell phone flash unit
A flash unit is chosen randomly from a production line; its recharge time $X$ (seconds) and flash intensity $Y$ (watt-seconds) are measured.

Example: Bernoulli trials
$X_1$ is the indicator of success on the first trial:

$$X_1 = \begin{cases} 1 & \text{success on first trial} \\ 0 & \text{otherwise} \end{cases}$$

and $X_2, X_3, \dots$, the indicators for the other trials, are all random variables.

Two or More Random Variables

To make probability statements about several random variables, we need their joint probability distribution.

Discrete random variables
If $X$ and $Y$ are discrete random variables, they have a joint probability mass function

$$f_{XY}(x_i, y_j) = P(X = x_i \text{ and } Y = y_j).$$

Example: Mobile response time
A mobile web site is accessed from a smart phone; $X$ is the signal strength, in number of bars, and $Y$ is response time, to the nearest second.

[Table: joint probability mass function $f_{XY}(x, y)$, indexed by x = number of bars and y = response time; the table values are not preserved in this copy.]

Continuous random variables
If $X$ and $Y$ are continuous random variables, they have a joint probability density function $f_{XY}(x, y)$, with the interpretation

$$P(a \le X \le b \text{ and } c \le Y \le d) = \int_a^b \int_c^d f_{XY}(x, y)\,dy\,dx.$$

If one random variable is discrete and the other is continuous, the joint distribution is more complex. In all cases, they have a joint cumulative distribution function

$$F_{XY}(x, y) = P(X \le x \text{ and } Y \le y).$$
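As a rough numerical illustration of the double-integral interpretation, the sketch below evaluates a rectangle probability with nested calls to R's `integrate()`. The density $f_{XY}(x, y) = e^{-x-y}$ for $x, y > 0$ and the helper name `rect_prob` are assumptions chosen for illustration; they are not from the course examples.

```r
# Hypothetical joint density (an assumption for illustration):
# f_XY(x, y) = exp(-x - y) for x > 0, y > 0.
f_XY <- function(x, y) exp(-x - y)

# P(lo_x <= X <= hi_x and lo_y <= Y <= hi_y) as an iterated integral:
# integrate over y for each fixed x, then integrate the result over x.
rect_prob <- function(lo_x, hi_x, lo_y, hi_y) {
  inner <- function(x) {
    sapply(x, function(xi) {
      integrate(function(y) f_XY(xi, y), lo_y, hi_y)$value
    })
  }
  integrate(inner, lo_x, hi_x)$value
}

p <- rect_prob(0, 1, 0, 2)

# For this particular density the integral has a closed form,
# which gives a check on the numerical result.
exact <- (1 - exp(-1)) * (1 - exp(-2))
print(c(numerical = p, exact = exact))
```

The inner function is wrapped in `sapply()` because `integrate()` expects its integrand to accept a vector of evaluation points.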

Marginal probability distributions
Since $X$ is a random variable, it also has its own probability distribution, ignoring the value of $Y$, called its marginal probability distribution.

Discrete case:

$$f_X(x_i) = P(X = x_i) = P(X = x_i \text{ and } Y \text{ takes any value}) = \sum_j P(X = x_i, Y = y_j) = \sum_j f_{XY}(x_i, y_j),$$

and similarly

$$f_Y(y_j) = \sum_i f_{XY}(x_i, y_j).$$
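The discrete marginal sums can be sketched in R as row and column sums of a joint pmf stored as a matrix. The probabilities below are invented for illustration and are not the values from the mobile response time example.

```r
# Hypothetical joint pmf (invented numbers, for illustration only).
# Rows index the values of Y, columns index the values of X.
joint <- matrix(
  c(0.10, 0.05, 0.05,
    0.10, 0.20, 0.10,
    0.05, 0.15, 0.20),
  nrow = 3, byrow = TRUE,
  dimnames = list(y = c("1", "2", "3"), x = c("1", "2", "3"))
)

# A valid joint pmf is nonnegative and sums to 1.
stopifnot(all(joint >= 0), isTRUE(all.equal(sum(joint), 1)))

# Marginal pmf of X: sum over all values of Y (down each column).
f_X <- colSums(joint)

# Marginal pmf of Y: sum over all values of X (across each row).
f_Y <- rowSums(joint)

print(f_X)
print(f_Y)
```

Each marginal is itself a probability mass function, so `sum(f_X)` and `sum(f_Y)` should both equal 1.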

Example: Mobile response time
Marginal distributions of $X$ and $Y$:

[Table: joint probability mass function with marginal totals, indexed by x = number of bars and y = response time; the table values are not preserved in this copy.]

Continuous case:

$$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy$$

and

$$f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx.$$
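A short worked example may help here. The density $f_{XY}(x, y) = x + y$ on the unit square is an assumption chosen for illustration, not one of the course examples.

```latex
% Assumed joint density (illustration only):
%   f_{XY}(x, y) = x + y  for 0 <= x <= 1, 0 <= y <= 1, and 0 elsewhere.
\begin{align*}
f_X(x) &= \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy
        = \int_0^1 (x + y)\,dy
        = x + \tfrac{1}{2}, \qquad 0 \le x \le 1, \\
f_Y(y) &= \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx
        = \int_0^1 (x + y)\,dx
        = y + \tfrac{1}{2}, \qquad 0 \le y \le 1.
\end{align*}
% Check: \int_0^1 (x + 1/2)\,dx = 1/2 + 1/2 = 1, so f_X is a valid density.
```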

Cumulative distribution:

$$F_X(x) = P(X \le x) = P(X \le x,\ Y \text{ takes any value}) = P(X \le x,\ Y < \infty) = F_{XY}(x, \infty)$$

and

$$F_Y(y) = F_{XY}(\infty, y).$$

Conditional probability distributions
Suppose that $X$ and $Y$ are discrete random variables, and that we observe the value of $X$: $X = x_i$ for one of its values $x_i$. What does that tell us about $Y$?

Recall conditional probability:

$$P(Y = y_j \mid X = x_i) = \frac{P(Y = y_j \cap X = x_i)}{P(X = x_i)} = \frac{f_{XY}(x_i, y_j)}{f_X(x_i)}.$$

This is the conditional probability mass function of $Y$ given $X = x_i$, written $f_{Y|X}(y \mid x_i)$.
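The ratio defining the conditional pmf can be sketched in R by dividing each column of a joint pmf matrix by the corresponding marginal probability. The joint pmf below is invented for illustration and is not the table from the mobile response time example.

```r
# Hypothetical joint pmf (invented numbers, for illustration only).
# Rows index the values of Y, columns index the values of X.
joint <- matrix(
  c(0.10, 0.05, 0.05,
    0.10, 0.20, 0.10,
    0.05, 0.15, 0.20),
  nrow = 3, byrow = TRUE,
  dimnames = list(y = c("1", "2", "3"), x = c("1", "2", "3"))
)

# Marginal pmf of X (column sums).
f_X <- colSums(joint)

# Conditional pmf of Y given X = x: divide column x by f_X(x).
# sweep() applies the division along margin 2 (columns).
cond_Y_given_X <- sweep(joint, 2, f_X, "/")

print(round(cond_Y_given_X, 3))

# Each column is a probability distribution over y, so it sums to 1.
print(colSums(cond_Y_given_X))
```

Each column of the result is one conditional distribution of $Y$, which is why the column totals are all 1.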

Example: Mobile response time
Conditional distributions of $Y$ given $X$:

[Table: conditional probability mass functions of $Y$ given each value of $X$, indexed by x = number of bars and y = response time, with a Total row; the table values are not preserved in this copy.]

When $X$ and $Y$ are continuous random variables, the conditional probability density function of $Y$ given $X$ is also defined as a ratio:

$$f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)},$$

but the reason is less clear: $P(X = x) = 0$, so we cannot simply divide the joint probability by the marginal probability.

One approach is to condition on $X$ being near to $x$, say $x - \delta x \le X \le x + \delta x$ for some small $\delta x > 0$, and take the limit as $\delta x \to 0$.
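The limiting argument can be sketched as follows, assuming $f_{XY}$ and $f_X$ are continuous near $x$ and $f_X(x) > 0$ (regularity assumptions that the slides do not state explicitly).

```latex
% Sketch of the limit; assumes f_{XY} and f_X are continuous near x and f_X(x) > 0.
\begin{align*}
P(c \le Y \le d \mid x - \delta x \le X \le x + \delta x)
  &= \frac{\int_c^d \int_{x - \delta x}^{x + \delta x} f_{XY}(u, y)\,du\,dy}
          {\int_{x - \delta x}^{x + \delta x} f_X(u)\,du} \\
  &\approx \frac{\int_c^d 2\,\delta x\, f_{XY}(x, y)\,dy}{2\,\delta x\, f_X(x)}
   \;\longrightarrow\; \int_c^d \frac{f_{XY}(x, y)}{f_X(x)}\,dy
   \quad \text{as } \delta x \to 0.
\end{align*}
% The integrand on the right is the conditional density f_{Y|X}(y | x).
```

The factors of $2\,\delta x$ cancel, which is why the ratio of densities has a sensible limit even though $P(X = x) = 0$.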

Independent random variables
In some situations, knowing the value of $X$ gives no information about the value of $Y$. So the conditional distribution of $Y$ given $X$ is the same as the marginal distribution of $Y$:

$$f_{Y|X}(y \mid x) = f_Y(y).$$

In this case, $X$ and $Y$ are said to be independent random variables.

But

$$f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)},$$

so when $X$ and $Y$ are independent

$$\frac{f_{XY}(x, y)}{f_X(x)} = f_Y(y),$$

or

$$f_{XY}(x, y) = f_X(x)\,f_Y(y).$$

This is true for either the probability density function or the probability mass function.

So for independent random variables, it is enough to know the marginal probability distributions: the joint probability distribution is just the product of the marginal functions.

Example: Cell phone flash unit
The recharge time $X$ and flash intensity $Y$ may not be independent: they are both affected by the quality of components such as capacitors, and a defective component may cause both a long recharge time and a low flash intensity.

Example: Bernoulli trials
We assume that the trials are independent, so the indicator variables $X_1, X_2, \dots$ are also independent.
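For discrete variables, the product rule gives a simple numerical check of independence: compare the joint pmf with the outer product of its marginals. The sketch below uses invented marginals and a hypothetical helper named `is_independent`; neither comes from the course materials.

```r
# Hypothetical marginal pmfs (invented numbers, for illustration only).
f_X <- c("1" = 0.25, "2" = 0.40, "3" = 0.35)
f_Y <- c("1" = 0.20, "2" = 0.40, "3" = 0.40)

# Under independence, f_XY(x, y) = f_X(x) * f_Y(y).
# Rows index y, columns index x.
joint_indep <- outer(f_Y, f_X)

# Hypothetical helper: TRUE when a joint pmf equals the product of
# its own marginals, up to a small numerical tolerance.
is_independent <- function(joint, tol = 1e-8) {
  product <- outer(rowSums(joint), colSums(joint))
  all(abs(joint - product) < tol)
}

# Independent by construction.
print(is_independent(joint_indep))

# A joint pmf with the mass concentrated on the diagonal is dependent:
# both marginals are (0.5, 0.5), but the joint is not 0.25 everywhere.
dependent <- matrix(c(0.4, 0.1,
                      0.1, 0.4), nrow = 2, byrow = TRUE)
print(is_independent(dependent))
```

With observed data rather than known probabilities, an exact comparison like this is not appropriate; sampling variation has to be taken into account.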

Designed experiments
When you carry out a designed experiment, such as the replicated two-factor case

$$Y_{i,j,k} = \mu + \tau_i + \beta_j + (\tau\beta)_{i,j} + \epsilon_{i,j,k},$$

good technique will ensure that the result of any one run is unaffected by results of other runs. You would then assume that the responses $Y_{i,j,k}$, $i = 1, \dots, a$, $j = 1, \dots, b$, $k = 1, \dots, n$, are independent random variables.

Equivalently, you could assume that the random noise terms $\epsilon_{i,j,k}$, $i = 1, \dots, a$, $j = 1, \dots, b$, $k = 1, \dots, n$, are independent.

We always assume that the noise terms have zero expected value:

$$E(\epsilon_{i,j,k}) = 0,$$

and usually also a common variance:

$$V(\epsilon_{i,j,k}) = \sigma^2.$$

In order to find the probability distributions of statistics like the t-ratio and the F-ratio, we shall also assume that the noise terms have Gaussian distributions; that is, $\epsilon_{i,j,k}$, $i = 1, \dots, a$, $j = 1, \dots, b$, $k = 1, \dots, n$, are independent random variables, each distributed as $N(0, \sigma^2)$.

The joint distribution of these $a \times b \times n$ random variables is determined by their common $N(0, \sigma^2)$ marginal distribution and the assumption of independence.
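One way to see what these assumptions mean in practice is to simulate responses from the two-factor model with independent $N(0, \sigma^2)$ noise. All numerical values below (the design sizes, effects, and $\sigma$) are invented for illustration.

```r
set.seed(370)

# Hypothetical design sizes and parameter values (illustration only).
a <- 3; b <- 2; n <- 4
mu    <- 10
tau   <- c(-1, 0, 1)            # effects of factor A, summing to zero
beta  <- c(-0.5, 0.5)           # effects of factor B, summing to zero
# Interaction effects: a-by-b matrix whose rows and columns sum to zero.
tau_beta <- matrix(c(0.2, 0, -0.2,
                     -0.2, 0, 0.2), nrow = a, ncol = b)
sigma <- 1.5

# One row per run: every (i, j) combination replicated n times.
design <- expand.grid(k = 1:n, j = 1:b, i = 1:a)

# Independent N(0, sigma^2) noise terms, one per run.
eps <- rnorm(a * b * n, mean = 0, sd = sigma)

# Responses from Y_ijk = mu + tau_i + beta_j + (tau beta)_ij + eps_ijk.
design$y <- mu + tau[design$i] + beta[design$j] +
  tau_beta[cbind(design$i, design$j)] + eps

print(head(design))
```

Because `rnorm()` draws each value independently, the joint distribution of the simulated noise terms is the product of their common $N(0, \sigma^2)$ marginal densities.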

Residual Plots

The probability distributions of statistics like the t-ratio and the F-ratio are derived under these assumptions about the random noise terms $\epsilon$, so we should try to verify that the assumptions actually hold.

We observe the responses $Y$, but the parameters $\mu$ and so on are unknown, so we cannot compute the noise terms $\epsilon$.

The best we can do is replace the parameters by their estimates, and compute the residuals

$$e_{i,j,k} = y_{i,j,k} - \left(\hat{\mu} + \hat{\tau}_i + \hat{\beta}_j + \widehat{(\tau\beta)}_{i,j}\right) = y_{i,j,k} - \hat{y}_{i,j,k}.$$
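The residual calculation can be sketched in R with a fitted two-factor model. The example uses R's built-in `warpbreaks` data (a replicated two-factor layout) as a stand-in, since the course data files are not reproduced here.

```r
# Stand-in data: R's built-in warpbreaks (breaks by wool and tension),
# used here only because it is a replicated two-factor layout.
fit <- aov(breaks ~ wool * tension, data = warpbreaks)

e     <- residuals(fit)   # e_ijk
y_hat <- fitted(fit)      # fitted values y-hat_ijk

# Residual = observed response minus fitted value.
print(max(abs(e - (warpbreaks$breaks - y_hat))))

# With the interaction term included, each fitted value is the mean
# of the observations in the same (wool, tension) cell.
cell_means <- ave(warpbreaks$breaks, warpbreaks$wool, warpbreaks$tension)
print(isTRUE(all.equal(unname(y_hat), cell_means)))
```

The residuals therefore measure how far each observation falls from its own cell mean.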

Four plots of the residuals are often used to look for departures from the assumptions:

Residuals vs Fitted values: If $E(\epsilon) = 0$, the residuals should vary around 0, with no pattern; curvature would suggest that second-order terms are needed.

Normal quantile-quantile plot: If the noise terms $\epsilon$ are Gaussian, the quantile-quantile plot should be close to a straight line; outliers or non-Gaussian behavior, especially longer tails, will show up.

Scale-Location plot: The y-axis in this plot is $\sqrt{|\text{residual}|}$, and, if the noise terms $\epsilon$ have constant variance, the plot should show no trend.

Residuals vs Factor Levels: This plot can detect particular factor levels that change either the expected value of $\epsilon$ or its variance.

Example: Aircraft paint
A replicated two-factor case:

    paint <- read.csv("data/table csv")
    plot(aov(adhesion ~ factor(primer) * Method, paint))

Example: Wire bonds
A one-predictor regression case:

    wirebond <- read.csv("data/table csv")
    plot(lm(strength ~ Length, wirebond))

In regression analyses, the fourth plot is replaced by:

Residuals vs Leverage: This plot can reveal individual observations that strongly influence the analysis (Section 12-5).
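The course data files are not available in this copy, so the following is a sketch of the same two calls using R's built-in `warpbreaks` and `cars` data sets as stand-ins. The output file name is also an assumption; in an interactive session the `pdf()` and `dev.off()` lines can be dropped to draw on screen.

```r
# Stand-in data sets (not the course data): warpbreaks is a replicated
# two-factor layout, cars is a one-predictor regression.
fit_aov <- aov(breaks ~ wool * tension, data = warpbreaks)
fit_lm  <- lm(dist ~ speed, data = cars)

# Write the plots to a temporary PDF file (hypothetical file name).
out_file <- file.path(tempdir(), "residual_plots.pdf")
pdf(out_file)

# Arrange the four diagnostic plots for each model on one page.
op <- par(mfrow = c(2, 2))
plot(fit_aov)   # fourth panel: residuals vs factor levels
plot(fit_lm)    # fourth panel: residuals vs leverage
par(op)

dev.off()
print(out_file)
```

By default, `plot()` on a fitted `lm` or `aov` object produces four diagnostic plots: residuals vs fitted, normal Q-Q, scale-location, and a residuals vs leverage panel, which is drawn against factor levels when the leverages are constant, as in a balanced design.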

Covariance and Correlation

The probability distribution of a random variable gives complete information about its behavior, but its mean and variance are useful summaries. Similarly, the joint probability