Amherst College
Department of Economics
Economics 360
Fall 2012

Monday, September 10 Handout: Random Processes, Probability, Random Variables, and Probability Distributions

Preview
o Random Processes and Probability
   o Random Process: A process whose outcome cannot be predicted with certainty.
   o Probability: The likelihood of a particular outcome of a random process.
o Random Variable: A variable that is associated with an outcome of a random process; a variable whose numerical value cannot be determined beforehand.
   o Discrete Random Variables and Probability Distributions
      o Probability Distribution: Describes the probability for all possible values of a random variable.
      o A Random Variable's Bad News and Good News.
      o Relative Frequency Interpretation of Probability: When a random process is repeated many, many times, the relative frequency of an outcome equals its probability.
   o Describing a Probability Distribution
      o Center of the Distribution: Mean
      o Spread of the Distribution: Variance
   o Continuous Random Variables and Probability Distributions
o Estimation Procedures
   o Clint's Dilemma: Assessing Clint's Political Prospects
   o Center of an Estimate's Probability Distribution: Mean
   o Spread of an Estimate's Probability Distribution: Variance

Random Processes and Probability
Experiment: Random card draw from a deck composed of the 2, 3, 3, and 4.
o Shuffle the 4 cards thoroughly.
o Draw one card and record it.
o Replace the card.

Computing Probabilities
o There is ___ chance in ___ of drawing the 2; therefore, Prob[2] = ___.
o There is ___ chance in ___ of drawing the first 3; therefore, Prob[first 3] = ___.
o There is ___ chance in ___ of drawing the second 3; therefore, Prob[second 3] = ___.
o There is ___ chance in ___ of drawing the 4; therefore, Prob[4] = ___.

Random Variable: A variable whose value ___ be predicted beforehand with certainty.
o A discrete random variable can only take on a countable number of values.
o A continuous random variable can take on a ___ of values.
Discrete Random Variables and Probability Distributions
An Example: Define the random variable v:
   v = Value of the selected card: 2, 3, or 4.
Question: What do we know about v beforehand?
Answer: While we cannot determine the value of v beforehand, we can calculate its probability distribution.

Probability Distribution of Numerical Values
   Card Drawn    v    Prob[v]
   2             2    1/4 = .25
   Either 3      3    ___ = ___
   4             4    ___ = ___
NB: The probabilities must sum to 1. Why?
[Figure: Probability Distribution of Numerical Values. Vertical axis: Prob[v] (ticks at .25 and .50); horizontal axis: v (2, 3, 4).]

A Random Variable's Bad News and Good News: Beforehand, that is, before the experiment is conducted:
o Bad News: We cannot determine the numerical value of the random variable with certainty.
o Good News: On the other hand, we can often calculate the random variable's probability distribution, telling us how likely it is for the random variable to equal each of its possible numerical values.

Card Draw Simulation: Illustrating the Relative Frequency Interpretation of Probability
Default specification: 2, 3, 3, and 4.
Repetitions > 1,000,000:
   Value    Relative Frequency
   2        ___
   3        ___
   4        ___
Question: How are probabilities and relative frequencies related?
[Figure: Card draw simulation window. Panels: cards selected to be in the deck; card drawn in this repetition; Histogram of Numerical Values (v = 2, 3, 4; ticks at .25 and .50). Readouts: Repetitions; Value (value of the card drawn in this repetition); Mean (average of the numerical values of the cards drawn from all repetitions); Var (variance of the numerical values of the cards drawn from all repetitions). Controls: Start, Stop, Pause.]

Relative Frequency Interpretation of Probability: After many, many repetitions of the experiment, the distribution of the actual numerical values mirrors the random variable's probability distribution.
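The relative frequency interpretation can be explored with a few lines of code. What follows is a minimal Python sketch, not the course's simulation software: it assumes the four-card deck described above, represents each card only by its numerical value, and uses a hypothetical helper name (`relative_frequencies`) and an arbitrary random seed.

```python
import random
from collections import Counter

DECK = [2, 3, 3, 4]  # numerical values of the four cards in the default deck


def relative_frequencies(repetitions, seed=0):
    """Draw one card, with replacement, `repetitions` times.

    Returns a dict mapping each card value to its relative frequency.
    The function name, signature, and seed are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    counts = Counter(rng.choice(DECK) for _ in range(repetitions))
    return {value: counts[value] / repetitions for value in sorted(set(DECK))}


if __name__ == "__main__":
    # More repetitions should bring the relative frequencies closer to Prob[v].
    for n in (10, 1_000, 100_000):
        print(n, relative_frequencies(n))
```

Running it with increasing numbers of repetitions gives a sense of how quickly the relative frequencies settle near the probabilities computed by hand.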
Question: How can we describe the general properties of a random variable; that is, how can we describe the probability distribution of a random variable?
o Center of its probability distribution: Mean
o Spread of its probability distribution: Variance

Center of the Probability Distribution: Mean (Expected Value) of the Random Variable
The average of the numerical values of v after many, many repetitions of the experiment.
NB: The mean of a random variable is often called the expected value.
After many, many repetitions, v will be
o 2 about a quarter of the time
o 3 about a half of the time
o 4 about a quarter of the time
On average, the outcome, v, will be ___.
More formally,
   Mean[v] = Σ (over all v) v × Prob[v]
   For each possible value, multiply the value and its probability; then, add.
                  v = 2      v = 3      v = 4
   Mean[v]  =     ___    +   ___    +   ___
            =     ___    +   ___    +   ___    =   ___

Spread of the Probability Distribution: Variance of the Random Variable
The average of the squared deviations of the numerical values from their mean after many, many repetitions of the experiment:
o For each possible value of the random variable, calculate the deviation from the mean;
o Square each value's deviation;
o Multiply each value's squared deviation by the value's probability;
o Sum the products.

   Card Drawn    v    Mean[v]    Deviation From Mean[v]    Squared Deviation    Prob[v]
   2             2    3          ___                       ___                  1/4 = .25
   Either 3      3    3          ___                       ___                  1/2 = .50
   4             4    3          ___                       ___                  1/4 = .25

   Var[v] = Σ (over all v) (v - Mean[v])² × Prob[v]
   For each possible value, multiply the squared deviation and its probability; then, add.
                  v = 2      v = 3      v = 4
   Var[v]   =     ___    +   ___    +   ___
            =     ___    +   ___    +   ___    =   ___

NB: The distribution mean and variance are general properties of the random variable:
o The mean represents the center of the random variable's distribution.
o The variance represents the spread of the random variable's distribution.
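The two recipes above (multiply each value by its probability and add; multiply each squared deviation by its probability and add) translate directly into code. The following Python sketch assumes the probability distribution of v for the 2, 3, 3, 4 deck; the names `PROB`, `mean`, and `variance` are illustrative choices, and exact fractions are used so that no rounding enters the arithmetic.

```python
from fractions import Fraction

# Probability distribution of v for the deck 2, 3, 3, 4 (two of four cards are 3s).
PROB = {2: Fraction(1, 4), 3: Fraction(1, 2), 4: Fraction(1, 4)}


def mean(dist):
    """For each possible value, multiply the value and its probability; then add."""
    return sum(value * prob for value, prob in dist.items())


def variance(dist):
    """For each possible value, multiply the squared deviation from the mean
    and its probability; then add."""
    m = mean(dist)
    return sum((value - m) ** 2 * prob for value, prob in dist.items())


if __name__ == "__main__":
    assert sum(PROB.values()) == 1  # the probabilities must sum to 1
    print("Mean[v] =", mean(PROB))
    print("Var[v]  =", variance(PROB))
```

Because the functions take the distribution as an argument, the same two calls can be reused for any other discrete random variable in the handout.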
Card Draw Simulation: Checking Our Math
Default specification: The 2, 3, 3, and 4 are included in a deck of four cards.
   Repetitions      Mean    Variance
   > 1,000,000      ___     ___
After many, many repetitions of the experiment:
o The mean reflects the center of the distribution; more specifically, the mean equals the average of the numerical values after many, many repetitions of the experiment.
o The variance reflects the spread of the distribution.
NB: Value of Simulations: By exploiting the relative frequency interpretation of probability (after many, many repetitions of the experiment, the distribution of the actual numerical values mirrors the random variable's probability distribution), we can use simulations to reveal the probability distribution. That is, simulations allow us to confirm our logic.

Continuous Random Variables and Probability Distributions
An Example: Dan Duffer
o Good News: Dan Duffer consistently hits 200-yard drives from the tee.
o Bad News: His drives can land up to 40 yards to the left and up to 40 yards to the right of his target point.
Suppose that Dan's target point is the center of the fairway. The fairway is 32 yards wide 200 yards from the tee.
[Figure: The eighteenth hole. Labels: Tee; Target; Fairway (32 yards); Left Rough; Right Rough; Lake; 200 yards.]
Let v equal the lateral distance from Dan's target point. A negative v indicates that the drive went to the left; a positive v indicates that the drive went to the right.
A continuous random variable, unlike a discrete random variable, can take on a continuous range of values, a ___ of values.
v is a random variable.

Probability Distribution
[Figure: Probability distribution of v. Vertical axis ticks: .005, .010, .015, .020, .025; horizontal axis: v from -40 to 40 (ticks at -40, -32, -24, -16, -8, 0, 8, 16, 24, 32, 40).]
What does v's probability distribution suggest?
What is the area beneath the probability distribution? Applying the equation for the area of a triangle:
   Area Beneath = ___ + ___
                = ___ + ___ = ___
What does this imply?
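For a continuous random variable, probabilities are areas beneath the probability distribution. The Python sketch below illustrates this under stated assumptions: it takes the distribution of v to be a triangle that peaks at .025 when v = 0 and falls to 0 at v = -40 and v = +40 (a shape read from the figure's axis ticks and the reference to the triangle area formula, not stated in the text), and it approximates areas numerically with the midpoint rule. The names `density` and `prob_between` are hypothetical.

```python
PEAK = 0.025   # assumed height of the distribution at v = 0 (from the axis ticks)
LIMIT = 40.0   # drives land at most 40 yards left or right of the target


def density(v):
    """Assumed triangular distribution: PEAK at v = 0, zero at -LIMIT and +LIMIT."""
    if abs(v) >= LIMIT:
        return 0.0
    return PEAK * (1 - abs(v) / LIMIT)


def prob_between(a, b, steps=100_000):
    """Approximate Prob[a < v < b] as the area beneath the distribution
    between a and b, using the midpoint rule."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width


if __name__ == "__main__":
    # Two right triangles, each with base 40 and height .025.
    print("Area by triangle formula:", 2 * (0.5 * LIMIT * PEAK))
    print("Area numerically:        ", round(prob_between(-LIMIT, LIMIT), 4))
    # The fairway is 32 yards wide, so it spans v = -16 to v = +16.
    print("Prob[-16 < v < +16]:     ", round(prob_between(-16, 16), 4))
```

The same helper can be pointed at any other interval of v, which makes it easy to check hand calculations based on triangle areas.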
Let us now calculate some probabilities:
o What is the probability that Dan's drive will land in the left rough?
   Prob[Drive in Left Rough] = Prob[v Less Than -16] = ___ = ___
o What is the probability that Dan's drive will land in the lake?
   Prob[Drive in Lake] = Prob[v Greater Than +16] = ___ = ___
o What is the probability that Dan's drive will land in the fairway?
   Prob[Drive in Fairway] = Prob[v Between -16 and +16] = ___ = ___
Prob[Drive in Left Rough] + Prob[Drive in Lake] + Prob[Drive in Fairway] = ___ + ___ + ___ = ___
What does this imply?

Clint Ton's Dilemma
On the day before the election, Clint must decide whether or not to hold a pre-election party:
o If he is comfortably ahead, he will not hold the party; he will save his campaign funds for a future political endeavor (or perhaps a vacation to the Caribbean next January).
o If he is not comfortably ahead, he will fund a party to try to sway some voters.
There is not enough time to poll every member of the student body, however. What should he do?
Econometrician's Philosophy: If you lack the information to determine the value directly, do the best you can by estimating the value using the information you do have.

Clint's Opinion Poll: Poll a sample of the population
Questionnaire: Are you voting for Clint?
Procedure: Clint selects 16 students at random and poses the question.
Results: 12 students report that they will vote for Clint and 4 report that they will vote against him.
   Estimated Fraction of the Population Supporting Clint = 12/16 = 3/4 = .75
Clint wishes to use the information collected from the sample to draw inferences about the entire population. Seventy-five percent, .75, of those polled support Clint. This suggests that Clint leads, does it not?
Clint's Dilemma: Should Clint be confident that he has the election in hand or should he fund the party?
Polling Simulation: Learning More about Clint's Polling Procedure
Questionnaire: Are you voting for Clint?
Terms
o ActFrac = Actual Fraction of the Population Supporting Clint
o EstFrac = Estimated Fraction of the Population Supporting Clint

To decide how much confidence Clint should have, we shall learn a little more about the polling procedure. A simulation will help us. In a simulation, we can do something that we cannot do in the real world. We can specify the actual fraction of the population supporting Clint, ActFrac, and then observe the estimated fraction, EstFrac, when we conduct a poll. In this way, we can learn more about the polling procedure itself. To do so, suppose that the election is a toss-up; that is, suppose that the actual population fraction supporting Clint, ActFrac, equals .5.

[Figure: Polling simulation window. Selectors: Actual Population Fraction, ActFrac (.1, .2, .3, .4, .5, .6, .7); Sample Size (10, 16, 25, 50). Readouts: Repetition; EstFrac (numerical value of the estimated fraction in this repetition); Mean (average of the numerical values of the sample fraction from all repetitions); Var (variance of the numerical values of the sample fraction from all repetitions). Controls: Start, Stop, Pause.]

Sample Size = 16, ActFrac = .50
   Repetition    Number Supporting Clint    EstFrac
   1             ___                        ___
   2             ___                        ___
   3             ___                        ___
   4             ___                        ___
   5             ___                        ___

Observations:
o The estimated fraction, EstFrac, is a random variable. Even if we knew the actual fraction supporting Clint, ActFrac, we could not predict EstFrac before the poll.
o Only occasionally does the estimated fraction, EstFrac, in one repetition of the poll equal the actual population fraction.
o When the election is actually a toss-up, it is entirely possible that 12 or even more of the 16 students polled will support Clint.
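The last observation can be given a number. The Python sketch below is an illustration under a stated assumption: each student polled is treated as an independent draw who supports Clint with probability ActFrac, so the number of supporters in the sample follows a binomial distribution. The function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k_min, sample_size, act_frac):
    """Prob[number supporting Clint >= k_min] when each of the `sample_size`
    students polled independently supports Clint with probability `act_frac`.
    Independence of the draws is an assumption of this sketch."""
    return sum(
        comb(sample_size, k) * act_frac**k * (1 - act_frac) ** (sample_size - k)
        for k in range(k_min, sample_size + 1)
    )


if __name__ == "__main__":
    # Toss-up election (ActFrac = .5), sample of 16, 12 or more supporters.
    print(round(prob_at_least(12, 16, 0.5), 4))
```

Under these assumptions the probability comes out a little under 4 percent: a result like Clint's is unusual in a toss-up election, but it is far from impossible.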
Populations and Samples: Estimates and Actual Values
Question: How can sample information be used to draw inferences about the entire population?
This is the question Clint must address. We begin with an unrealistic, but instructive, example. So, please be patient.

Sample Size of One
Questionnaire: Are you voting for Clint?
Experiment: Write the name of each individual in the population on a 3x5 card; then:
o Thoroughly shuffle the cards.
o Randomly draw one card.
o Ask that individual if he/she supports Clint and record the answer.
o Replace the card.
The random variable v:
   v = 1 if the individual polled supports Clint
     = 0 otherwise
Question: Can we determine with certainty the numerical value of v before the experiment is conducted? ___. Hence, v is a ___ variable.
Question: What can we say about the random variable v beforehand? Answer: ___.
Question: How can we describe the probability distribution? Answer: ___.

For the moment, continue to assume that the population is split evenly; that is, suppose that half the population supports Clint and half does not:
   Individual's Response    v    Prob[v]
   For Clint                1    ___
   Not for Clint            0    ___
[Figure: Tree diagram. Individual branches to For Clint (v = 1, Prob = ___) and Not for Clint (v = 0, Prob = ___).]

Center of the Probability Distribution: Mean. The average of the numerical values after many, many repetitions of the experiment.
After many, many repetitions of the experiment, v will equal
o 1 about half of the time
o 0 about half of the time
On average, what will the numerical value of v equal? ___.
   Mean[v] = Σ (over all v) v × Prob[v]
   For each possible value, multiply the value and its probability; then, add.
                  v = 1      v = 0
   Mean[v]  =     ___    +   ___
            =     ___    +   ___    =   ___
Spread of the Probability Distribution: Variance. The average of the squared deviations of the numerical values from their mean after many, many repetitions of the experiment:
o For each possible value, calculate the deviation from the mean;
o Square each value's deviation;
o Multiply each value's squared deviation by the value's probability;
o Sum the products.

   Individual's Response    v    Mean[v]    Deviation From Mean[v]    Squared Deviation    Prob[v]
   For Clint                1    1/2        ___                       ___                  ___
   Not for Clint            0    1/2        ___                       ___                  ___

   Var[v] = Σ (over all v) (v - Mean[v])² × Prob[v]
   For each possible value, multiply the squared deviation and its probability; then, add.
                  v = 1      v = 0
   Var[v]   =     ___    +   ___
            =     ___    +   ___    =   ___

Opinion Poll Simulation Sample Size of One: Checking Our Math
Actual Population Fraction = ActFrac = p = 1/2 = .50
   Equations:                                         Simulation:
   Mean of v's       Variance of v's                  Simulation     Mean (Average) of Numerical    Variance of Numerical
   Probability       Probability                      Repetitions    Values of v from               Values of v from
   Distribution      Distribution                                    the Experiments                the Experiments
   ___               ___                              ___            ___                            ___
Conclusion: Our equations and simulation produce identical results. Again, this illustrates how we can exploit the relative frequency interpretation of probability: After many, many repetitions of the experiment, the distribution of the actual numerical values mirrors the random variable's probability distribution.
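A sample-size-one poll is easy to imitate in code. The sketch below is a stand-in for the course's opinion poll simulation, not a reproduction of it: drawing a card with replacement is modeled as a random number that falls below ActFrac with probability ActFrac, and the function name `simulate_v` and the seed are assumptions made for illustration.

```python
import random


def simulate_v(act_frac, repetitions, seed=0):
    """Poll one randomly drawn individual, `repetitions` times, with replacement.

    Returns (mean, variance) of the recorded values of v, where v = 1 if the
    individual polled supports Clint and v = 0 otherwise.
    """
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    values = [1 if rng.random() < act_frac else 0 for _ in range(repetitions)]
    mean = sum(values) / repetitions
    # Average of the squared deviations of the values from their mean.
    variance = sum((v - mean) ** 2 for v in values) / repetitions
    return mean, variance


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, simulate_v(0.5, n))
```

Comparing the printed mean and variance for a large number of repetitions with the values from the equations is the same check the simulation table asks for.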
Generalization: Let p = ActFrac = Actual fraction of the population supporting Clint
Consider the experiment: Write the name of each individual in the population on a 3x5 card.
   Individual's Response    v    Prob[v]
   For Clint                1    ___
   Not for Clint            0    ___
[Figure: Tree diagram. Individual branches to For Clint (v = 1, Prob = ___) and Not for Clint (v = 0, Prob = ___).]

Center of the Probability Distribution: Mean. The average of the numerical values after many, many repetitions of the experiment.
After many, many repetitions of the experiment, v will equal
o 1, ___ of the time
o 0, ___ of the time
   Mean[v] = Σ (over all v) v × Prob[v]
   For each possible value, multiply the value and its probability; then, add.
                  v = 1      v = 0
   Mean[v]  =     ___    +   ___
            =     ___    +   ___    =   ___

Spread of the Probability Distribution: Variance. The average of the squared deviations of the numerical values from their mean after many, many repetitions of the experiment:
o For each possible value, calculate the deviation from the mean;
o Square each value's deviation;
o Multiply each value's squared deviation by the value's probability;
o Sum the products.

   Individual's Response    v    Mean[v]    Deviation From Mean[v]    Squared Deviation    Prob[v]
   For Clint                1    p          ___                       ___                  ___
   Not for Clint            0    p          ___                       ___                  ___

   Var[v] = Σ (over all v) (v - Mean[v])² × Prob[v]
   For each possible value, multiply the squared deviation and its probability; then, add.
                  v = 1      v = 0
   Var[v]   =     ___    +   ___
            =     ___
            =     ___
            =     ___
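For reference, one way to carry the general calculation through is sketched below in LaTeX. It assumes only what the experiment states: v equals 1 with probability p and 0 with probability 1 - p. The intermediate factoring step is one possible route, not necessarily the one intended for the blanks above.

```latex
\begin{align*}
\mathrm{Mean}[v] &= 1 \cdot p + 0 \cdot (1 - p) = p \\[4pt]
\mathrm{Var}[v]  &= (1 - p)^2 \cdot p + (0 - p)^2 \cdot (1 - p) \\
                 &= p(1 - p)\bigl[(1 - p) + p\bigr] \\
                 &= p(1 - p)
\end{align*}
```

Setting p = 1/2 recovers the evenly split case: a mean of 1/2 and a variance of 1/4.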
Sample Size of Two
Questionnaire: Are you voting for Clint?
Experiment: Write the name of each individual in the population on a card.
In the first stage:
o Thoroughly shuffle the cards.
o Randomly draw one card.
o Ask that individual if he/she supports Clint and record the answer; this yields a specific numerical value of v1 for the random variable. v1 equals 1 if the first individual polled supports Clint; 0 otherwise.
o Replace the card.
In the second stage, the procedure is repeated:
o Thoroughly shuffle the cards.
o Randomly draw one card.
o Ask that individual if he/she supports Clint and record the answer; this yields a specific numerical value of v2 for the random variable. v2 equals 1 if the second individual polled supports Clint; 0 otherwise.
o Replace the card.
Calculate the fraction of those polled supporting Clint.
   Fraction of Sample Supporting Clint, Estimated Fraction: EstFrac = (v1 + v2)/2 = 1/2 (v1 + v2)
The estimated fraction of the population supporting Clint is a random variable; that is, EstFrac is a random variable. We cannot determine with certainty the numerical value of the estimated fraction, EstFrac, before the experiment is conducted.
Question: What can we say about the random variable EstFrac beforehand? Answer: We can describe its probability distribution.
Question: How can we describe the probability distribution? Answer: Compute its center (mean) and spread (variance).

Center of the Estimated Fraction's Probability Distribution: Mean.
   Mean[EstFrac] = Mean[1/2 (v1 + v2)]
What do we know?
   Mean[v1] = Mean[v] = p
   Mean[v2] = Mean[v] = p
Arithmetic of Means:
   Mean[cx] = c Mean[x]
   Mean[x + y] = Mean[x] + Mean[y]
   Mean[1/2 (v1 + v2)] = ___
                       = ___
                       = ___
                       = ___
                       = ___
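Applying the two rules for the arithmetic of means one at a time gives the following chain, written out in LaTeX as a worked sketch. The subscripted symbols v_1 and v_2 denote the first and second individuals polled; the number of steps shown is a choice made here and need not match the blanks above line for line.

```latex
\begin{align*}
\mathrm{Mean}[\mathrm{EstFrac}]
  &= \mathrm{Mean}\bigl[\tfrac{1}{2}(v_1 + v_2)\bigr] \\
  &= \tfrac{1}{2}\,\mathrm{Mean}[v_1 + v_2]
     && \text{since } \mathrm{Mean}[cx] = c\,\mathrm{Mean}[x] \\
  &= \tfrac{1}{2}\bigl(\mathrm{Mean}[v_1] + \mathrm{Mean}[v_2]\bigr)
     && \text{since } \mathrm{Mean}[x + y] = \mathrm{Mean}[x] + \mathrm{Mean}[y] \\
  &= \tfrac{1}{2}(p + p) \\
  &= p
\end{align*}
```

The mean of the estimated fraction equals the actual population fraction, whatever the value of p.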
Spread of the Estimated Fraction's Probability Distribution: Variance.
   Var[EstFrac] = Var[1/2 (v1 + v2)]
What do we know?
   Var[v1] = Var[v] = p(1 - p)
   Var[v2] = Var[v] = p(1 - p)
Arithmetic of Variances:
   Var[cx] = c² Var[x]
   Var[x + y] = Var[x] + 2Cov[x, y] + Var[y]
   Var[1/2 (v1 + v2)] = ___
                      = ___
                      = ___          v1 and v2 are independent: Cov[v1, v2] = 0
                      = ___
                      = ___
                      = ___
Question: Why are v1 and v2 independent?
Answer: Since the card of the first name drawn is replaced, whether or not the first voter polled supports Clint does not affect the probability that the second voter will support Clint. In either case, the probability that the second voter will support Clint is p, the actual population fraction. Consequently, knowing the value of v1 does not help us predict the value of v2. More formally, the numerical value of v1 does not affect v2's probability distribution and vice versa. The random variables are independent. Hence, their covariance equals 0.

Opinion Poll Simulation Sample Size of Two: Checking Our Math
Actual Population Fraction = ActFrac = p = 1/2 = .50
             Equations:                                  Simulations:
   Sample    Mean of EstFrac's    Variance of EstFrac's  Simulation     Mean (Average) of Numerical    Variance of Numerical
   Size      Probability          Probability            Repetitions    Values of EstFrac from         Values of EstFrac from
             Distribution         Distribution                          the Experiments                the Experiments
   2         ___                  ___                    ___            ___                            ___
Conclusion: Our equations and simulation produce identical results. Again, this illustrates how we can exploit the relative frequency interpretation of probability: After many, many repetitions of the experiment, the distribution of the actual numerical values mirrors the random variable's probability distribution.
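The arithmetic of means and variances can also be checked without a simulation, by listing every possible outcome of the two-stage poll together with its probability. The Python sketch below does that with exact fractions. It relies on the independence argument above (the probability of a pair of answers is the product of the individual probabilities); the function name `estfrac_mean_var` and its `sample_size` parameter are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def estfrac_mean_var(p, sample_size=2):
    """Exact mean and variance of EstFrac, enumerating every poll outcome.

    Each individual polled supports Clint (v = 1) with probability p, and the
    draws are independent because each card is replaced before the next draw.
    """
    outcomes = []  # pairs of (EstFrac value, probability of that outcome)
    for votes in product((0, 1), repeat=sample_size):
        prob = Fraction(1)
        for v in votes:
            prob *= p if v == 1 else 1 - p
        outcomes.append((Fraction(sum(votes), sample_size), prob))
    mean = sum(est * prob for est, prob in outcomes)
    var = sum((est - mean) ** 2 * prob for est, prob in outcomes)
    return mean, var


if __name__ == "__main__":
    p = Fraction(1, 2)
    mean, var = estfrac_mean_var(p)
    print("Mean[EstFrac] =", mean, "  p =", p)
    print("Var[EstFrac]  =", var, "  p(1 - p)/2 =", p * (1 - p) / 2)
```

For a sample of two, the enumeration agrees with what the arithmetic of variances yields, Var[EstFrac] = p(1 - p)/2, for any fraction p passed in.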
Summary of Random Variables
Before the experiment is conducted:
o Bad news. What we do not know: We cannot determine the numerical value of the random variable with certainty.
o Good news. What we do know: On the other hand, we can often calculate the random variable's probability distribution, telling us how likely it is for the random variable to equal each of its possible numerical values.

Relative Frequency Interpretation of Probability: After many, many repetitions of the experiment:
o The distribution of the numerical values from the experiments mirrors the random variable's probability distribution; the two distributions are identical.
      Distribution of the Numerical Values → (after many, many repetitions) → Probability Distribution
o The distribution mean and variance describe the general properties of the random variable:
   o The mean reflects the center of the distribution; more specifically, the mean equals the average of the numerical values after many, many repetitions.
   o The variance reflects the spread of the distribution.
      Mean of the Numerical Values → (after many, many repetitions) → Mean of Probability Distribution for One Repetition
      Variance of the Numerical Values → (after many, many repetitions) → Variance of Probability Distribution for One Repetition