Monday, September 10 Handout: Random Processes, Probability, Random Variables, and Probability Distributions


Amherst College
Department of Economics
Economics 360
Fall 2012

Preview
  Random Processes and Probability
    o Random Process: A process whose outcome cannot be predicted with certainty.
    o Probability: The likelihood of a particular outcome of a random process.
  Random Variable: A variable that is associated with an outcome of a random process; a variable whose numerical value cannot be determined beforehand.
    o Discrete Random Variables and Probability Distributions
        Probability Distribution: Describes the probability for all possible values of a random variable.
        A Random Variable's Bad News and Good News.
        Relative Frequency Interpretation of Probability: When a random process is repeated many, many times, the relative frequency of an outcome equals its probability.
    o Describing a Probability Distribution
        Center of the Distribution: Mean
        Spread of the Distribution: Variance
    o Continuous Random Variables and Probability Distributions
  Estimation Procedures
    o Clint's Dilemma: Assessing Clint's Political Prospects
    o Center of an Estimate's Probability Distribution: Mean
    o Spread of an Estimate's Probability Distribution: Variance

Random Processes and Probability

Experiment: Random card draw from a deck composed of the 2, 3, 3, and 4.
  Shuffle the 4 cards thoroughly.
  Draw one card and record it.
  Replace the card.

Computing Probabilities
  There is ____ chance in ____ of drawing the 2; therefore, Prob[2] = ____.
  There is ____ chance in ____ of drawing one 3; therefore, Prob[3] = ____.
  There is ____ chance in ____ of drawing the other 3; therefore, Prob[3] = ____.
  There is ____ chance in ____ of drawing the 4; therefore, Prob[4] = ____.

Random Variable: A variable whose value ____ be predicted beforehand with certainty.
  A discrete random variable can only take on a countable number of values.
  A continuous random variable can take on a ____ of values.

Discrete Random Variables and Probability Distributions

An Example: Define the random variable v:
  v = Value of the selected card: 2, 3, or 4.
Question: What do we know about v beforehand?
Answer: While we cannot determine the value of v beforehand, we can calculate its probability distribution.

Probability Distribution of Numerical Values
  Card Drawn    v    Prob[v]
  2             2    1/4 = .25
  3 or 3        3    ____ = ____
  4             4    ____ = ____
NB: The probabilities must sum to 1. Why?

[Figure: probability distribution of v; horizontal axis v = 2, 3, 4; vertical axis marked at .50]

A Random Variable's Bad News and Good News: Beforehand, that is, before the experiment is conducted:
  Bad News: We cannot determine the numerical value of the random variable with certainty.
  Good News: On the other hand, we can often calculate the random variable's probability distribution, telling us how likely it is for the random variable to equal each of its possible numerical values.

Card Draw Simulation: Illustrating the Relative Frequency Interpretation of Probability
Default specification: 2, 3, 3, and 4. Repetitions > 1,000,000:
  Value    Relative Frequency
  2        ____
  3        ____
  4        ____
Question: How are probabilities and relative frequencies related?

[Simulation window: list of cards selected to be in the deck; card drawn in this repetition; histogram of numerical values (v = 2, 3, 4; vertical axis marked at .25 and .50); Start, Stop, and Pause buttons; and the readouts below]
  Repetitions
  Value: Value of card drawn in this repetition
  Mean: Mean (average) of the numerical values of the cards drawn from all repetitions
  Var: Variance of the numerical values of the cards drawn from all repetitions

Relative Frequency Interpretation of Probability: After many, many repetitions of the experiment, the distribution of the actual numerical values mirrors the random variable's probability distribution.
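The card draw simulation described above can be imitated in a few lines of Python. This is only a sketch, not the handout's own simulation software: the function name `simulate_card_draws`, the seed, and the number of repetitions are illustrative choices.

```python
import random
from collections import Counter

def simulate_card_draws(repetitions, seed=0):
    """Draw one card, with replacement, from the four-card deck many times."""
    deck = [2, 3, 3, 4]          # numerical values of the four cards in the deck
    rng = random.Random(seed)    # seeded so that a run can be reproduced
    counts = Counter(rng.choice(deck) for _ in range(repetitions))
    # Relative frequency = number of times a value was drawn / repetitions
    return {value: counts[value] / repetitions for value in (2, 3, 4)}

frequencies = simulate_card_draws(100_000)
for value, frequency in frequencies.items():
    print(value, round(frequency, 3))
```

With enough repetitions, the printed relative frequencies should settle close to the probabilities in the probability distribution table, which is the relative frequency interpretation at work.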

Question: How can we describe the general properties of a random variable; that is, how can we describe the probability distribution of a random variable?
  Center of its probability distribution: Mean
  Spread of its probability distribution: Variance

Center of the Probability Distribution: Mean (Expected Value) of the Random Variable
The average of the numerical values of v after many, many repetitions of the experiment.
NB: The mean of a random variable is often called the expected value.

After many, many repetitions, v will be
  2 about a quarter of the time
  3 about a half of the time
  4 about a quarter of the time
On average, the outcome, v, will be ____.

More formally,
  Mean[v] = Σ (over all v) v × Prob[v]
For each possible value, multiply the value and its probability; then, add.
              v = 2     v = 3     v = 4
  Mean[v] =   ____   +  ____   +  ____
          =   ____   +  ____   +  ____   =  ____

Spread of the Probability Distribution: Variance of the Random Variable
The average of the squared deviations of the numerical values from their mean after many, many repetitions of the experiment:
  For each possible value of the random variable, calculate the deviation from the mean;
  Square each value's deviation;
  Multiply each value's squared deviation by the value's probability;
  Sum the products.

  Card Drawn    v    Mean[v]    Deviation From Mean[v]    Squared Deviation    Prob[v]
  2             2    3          ____                      ____                 1/4 = .25
  3 or 3        3    3          ____                      ____                 1/2 = .50
  4             4    3          ____                      ____                 1/4 = .25

  Var[v] = Σ (over all v) (v - Mean[v])^2 × Prob[v]
For each possible value, multiply the squared deviation and its probability; then, add.
              v = 2     v = 3     v = 4
  Var[v]  =   ____   +  ____   +  ____
          =   ____   +  ____   +  ____   =  ____

NB: The distribution mean and variance are general properties of the random variable:
  The mean represents the center of the random variable's distribution.
  The variance represents the spread of the random variable's distribution.
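The two recipes above (multiply each value by its probability and add; multiply each squared deviation by its probability and add) translate directly into code. The sketch below applies them to the card example; the function name `mean_and_variance` and the dictionary layout are illustrative assumptions, not part of the handout.

```python
def mean_and_variance(distribution):
    """Return (mean, variance) for a dict mapping each value to its probability."""
    # The probabilities of all possible values must sum to 1.
    assert abs(sum(distribution.values()) - 1.0) < 1e-12
    # Mean: for each possible value, multiply the value and its probability; then add.
    mean = sum(v * prob for v, prob in distribution.items())
    # Variance: multiply each squared deviation from the mean by its probability; then add.
    variance = sum((v - mean) ** 2 * prob for v, prob in distribution.items())
    return mean, variance

# Card example: v is 2 a quarter of the time, 3 half of the time, 4 a quarter of the time.
card_distribution = {2: 0.25, 3: 0.50, 4: 0.25}
card_mean, card_variance = mean_and_variance(card_distribution)
print(card_mean, card_variance)
```

Running it is one way to check the blanks in the worksheet after filling them in by hand.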

Card Draw Simulation: Checking Our Math
Default specification: The 2, 3, 3, and 4 are included in a deck of four cards.
  Repetitions > 1,000,000    Mean: ____    Variance: ____
After many, many repetitions of the experiment:
  The mean reflects the center of the distribution; more specifically, the mean equals the average of the numerical values after many, many repetitions of the experiment.
  The variance reflects the spread of the distribution.
NB: Value of Simulations: By exploiting the relative frequency interpretation of probability (after many, many repetitions of the experiment, the distribution of the actual numerical values mirrors the random variable's probability distribution), we can use simulations to reveal the probability distribution. That is, simulations allow us to confirm our logic.

Continuous Random Variables and Probability Distributions

An Example: Dan Duffer
  Good News: Dan Duffer consistently hits 200 yard drives from the tee.
  Bad News: His drives can land up to 40 yards to the left and up to 40 yards to the right of his target point.
Suppose that Dan's target point is the center of the fairway. The fairway is 32 yards wide 200 yards from the tee.

[Figure: the eighteenth hole; labels: Tee, Target, Fairway (32 yards), Left Rough, Right Rough, Lake, 200 yards]

Let v equal the lateral distance from Dan's target point. A negative v indicates that the drive went to the left; a positive v indicates that the drive went to the right.
v is a random variable. A continuous random variable, unlike a discrete random variable, can take on a continuous range of values, a ____ of values.

[Figure: probability distribution of v; horizontal axis v from -40 to 40 in steps of 8; vertical axis from .005 to .025 in steps of .005]

What does v's probability distribution suggest?
What is the area beneath the probability distribution? Applying the equation for the area of a triangle:
  Area Beneath = ____ + ____
               = ____ + ____ = ____
What does this imply?
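The area question can also be checked numerically. The sketch below rests on an assumption read off the figure's axes and the reference to the triangle formula: the density is taken to be a triangle that peaks at .025 when v = 0 and falls linearly to 0 at v = -40 and v = +40. The names `density` and `prob_between` are illustrative.

```python
def density(v, half_width=40.0, peak=0.025):
    """Assumed triangular density: `peak` at v = 0, falling linearly to 0 at +/- half_width."""
    if abs(v) >= half_width:
        return 0.0
    return peak * (1.0 - abs(v) / half_width)

def prob_between(a, b, steps=10_000):
    """Approximate Prob[a < v < b] as the area beneath the density (midpoint rule)."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width

total_area = prob_between(-40, 40)   # area beneath the whole distribution
fairway = prob_between(-16, 16)      # the fairway is 32 yards wide, centered on the target
print(round(total_area, 4), round(fairway, 4))
```

Under that assumed shape, the same `prob_between` helper can be pointed at any interval of v, so each region of the hole can be treated as an area beneath the distribution.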

Let us now calculate some probabilities:
  What is the probability that Dan's drive will land in the left rough?
    Prob[Drive in Left Rough] = Prob[v Less Than -16] = ____ = ____
  What is the probability that Dan's drive will land in the lake?
    Prob[Drive in Lake] = Prob[v Greater Than +16] = ____ = ____
  What is the probability that Dan's drive will land in the fairway?
    Prob[Drive in Fairway] = Prob[v Between -16 and +16] = ____ = ____
  Prob[Drive in Left Rough] + Prob[Drive in Lake] + Prob[Drive in Fairway] = ____ + ____ + ____ = ____
  What does this imply?

Clint Ton's Dilemma
On the day before the election, Clint must decide whether or not to hold a pre-election party:
  If he is comfortably ahead, he will not hold the party; he will save his campaign funds for a future political endeavor (or perhaps a vacation to the Caribbean next January).
  If he is not comfortably ahead, he will fund a party to try to sway some voters.
There is not enough time to poll every member of the student body, however. What should he do?

Econometrician's Philosophy: If you lack the information to determine the value directly, do the best you can by estimating the value using the information you do have.

Clint's Opinion Poll: Poll a sample of the population
  Questionnaire: Are you voting for Clint?
  Procedure: Clint selects 16 students at random and poses the question.
  Results: 12 students report that they will vote for Clint and 4 against Clint.
  Estimated Fraction of the Population Supporting Clint = 12/16 = 3/4 = .75

Clint wishes to use the information collected from the sample to draw inferences about the entire population. Seventy-five percent, .75, of those polled support Clint. This suggests that Clint leads, does it not?

Clint's Dilemma: Should Clint be confident that he has the election in hand or should he fund the party?

Polling Simulation: Learning More about Clint's Polling Procedure
Questionnaire: Are you voting for Clint?
Terms
  ActFrac = Actual Fraction of the Population Supporting Clint
  EstFrac = Estimated Fraction of the Population Supporting Clint

[Simulation window: selection lists for the actual population fraction, ActFrac (.1, .2, .3, .4, .5, .6, .7), and the sample size (10, 16, 25, 50); Start, Stop, and Pause buttons; and the readouts below]
  Repetition
  EstFrac: Numerical value of the estimated fraction in this repetition
  Mean: Mean (average) of the numerical values of the sample fraction from all repetitions
  Var: Variance of the numerical values of the sample fraction from all repetitions

To decide how much confidence Clint should have, we shall learn a little more about the polling procedure. A simulation will help us. In a simulation, we can do something that we cannot do in the real world. We can specify the actual proportion of the population, ActFrac, and then observe the estimated fraction, EstFrac, when we conduct a poll. In this way, we can learn more about the polling procedure itself. To do so, suppose that the election is a toss-up; that is, suppose that the actual population fraction supporting Clint, ActFrac, equals .5.

  Sample Size = 16, ActFrac = .50
  Repetition    Number Supporting Clint    EstFrac
  1             ____                       ____
  2             ____                       ____
  3             ____                       ____
  4             ____                       ____
  5             ____                       ____

Observations:
  The estimated fraction, EstFrac, is a random variable. Even if we knew the actual fraction supporting Clint, ActFrac, we could not predict EstFrac before the poll.
  Only occasionally does the estimated fraction, EstFrac, in one repetition of the poll equal the actual population fraction.
  When the election is actually a toss-up, it is entirely possible that 12 or even more of the 16 students polled will support Clint.
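As a complement to the simulation, the chance of seeing 12 or more supporters among 16 when the election is a toss-up can be computed exactly. The sketch assumes that each of the 16 students is drawn independently (with replacement), so that the number of supporters follows a binomial distribution; the function name `prob_at_least` is illustrative.

```python
from math import comb

def prob_at_least(k_min, n, p):
    """Probability that at least k_min of n independently polled students support Clint,
    when each one supports him with probability p (binomial assumption)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

# Toss-up election (ActFrac = .5), sample size 16, at least 12 supporters.
print(prob_at_least(12, 16, 0.5))
```

Under these assumptions the result is about 0.038, so a poll as favorable as Clint's does occur in a toss-up election, though not often.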

Populations and Samples: Estimates and Actual Values
Question: How can sample information be used to draw inferences about the entire population?
This is the question Clint must address. We begin with an unrealistic, but instructive, example. So, please be patient.

Sample Size of One
Questionnaire: Are you voting for Clint?
Experiment: Write the name of each individual in the population on a 3x5 card; then:
  Thoroughly shuffle the cards.
  Randomly draw one card.
  Ask that individual if he/she supports Clint and record the answer.
  Replace the card.
The random variable v:
  v = 1 if the individual polled supports Clint
    = 0 otherwise

Question: Can we determine with certainty the numerical value of v before the experiment is conducted? ____. Hence, v is a ____ variable.
Question: What can we say about the random variable v beforehand? Answer: ____.
Question: How can we describe the probability distribution? Answer: ____.

For the moment, continue to assume that the population is split evenly; that is, suppose that half the population supports Clint and half does not:
  Individual's Response    v    Prob[v]
  For Clint                1    ____
  Not for Clint            0    ____

[Figure: an individual's response, For Clint (v = 1) or Not for Clint (v = 0), with the probability of each]

Center of the Probability Distribution: Mean. The average of the numerical values after many, many repetitions of the experiment.
After many, many repetitions of the experiment, v will equal
  1 about half of the time
  0 about half of the time
On average, what will the numerical value of v equal? ____.

  Mean[v] = Σ (over all v) v × Prob[v]
For each possible value, multiply the value and its probability; then, add.
              v = 1     v = 0
  Mean[v] =   ____   +  ____
          =   ____   +  ____   =  ____

Spread of the Probability Distribution: Variance. The average of the squared deviations of the numerical values from their mean after many, many repetitions of the experiment:
  For each possible value, calculate the deviation from the mean;
  Square each value's deviation;
  Multiply each value's squared deviation by the value's probability;
  Sum the products.

  Individual's Response    v    Mean[v]    Deviation From Mean[v]    Squared Deviation    Prob[v]
  For Clint                1    ____       ____                      ____                 1/2
  Not for Clint            0    ____       ____                      ____                 1/2

  Var[v] = Σ (over all v) (v - Mean[v])^2 × Prob[v]
For each possible value, multiply the squared deviation and its probability; then, add.
              v = 1     v = 0
  Var[v]  =   ____   +  ____
          =   ____   +  ____   =  ____

Opinion Poll Simulation, Sample Size of One: Checking Our Math
Actual Population Fraction = ActFrac = p = 1/2 = .50

  Equations:
    Mean of v's Probability Distribution: ____
    Variance of v's Probability Distribution: ____
  Simulation:
    Simulation Repetitions: ____
    Mean (Average) of Numerical Values of v from the Experiments: ____
    Variance of Numerical Values of v from the Experiments: ____

Conclusion: Our equations and simulation produce identical results. Again, this illustrates how we can exploit the relative frequency interpretation of probability: After many, many repetitions of the experiment, the distribution of the actual numerical values mirrors the random variable's probability distribution.

Generalization: Let p = ActFrac = Actual fraction of the population supporting Clint
Consider the experiment: Write the name of each individual in the population on a 3x5 card.

  Individual's Response    v    Prob[v]
  For Clint                1    ____
  Not for Clint            0    ____

[Figure: an individual's response, For Clint (v = 1) or Not for Clint (v = 0), with the probability of each]

Center of the Probability Distribution: Mean. The average of the numerical values after many, many repetitions of the experiment.
After many, many repetitions of the experiment, v will equal
  1, ____ of the time
  0, ____ of the time

  Mean[v] = Σ (over all v) v × Prob[v]
For each possible value, multiply the value and its probability; then, add.
              v = 1     v = 0
  Mean[v] =   ____   +  ____
          =   ____   +  ____   =  ____

Spread of the Probability Distribution: Variance. The average of the squared deviations of the numerical values from their mean after many, many repetitions of the experiment:
  For each possible value, calculate the deviation from the mean;
  Square each value's deviation;
  Multiply each value's squared deviation by the value's probability;
  Sum the products.

  Individual's Response    v    Mean[v]    Deviation From Mean[v]    Squared Deviation    Prob[v]
  For Clint                1    ____       ____                      ____                 p
  Not for Clint            0    ____       ____                      ____                 1 - p

  Var[v] = Σ (over all v) (v - Mean[v])^2 × Prob[v]
For each possible value, multiply the squared deviation and its probability; then, add.
              v = 1     v = 0
  Var[v]  =   ____   +  ____
          =   ____
          =   ____
          =   ____
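The general case can be checked for any particular p by applying the same two recipes to the two-valued random variable v (1 with probability p, 0 with probability 1 - p). The sketch below is illustrative; the function name `indicator_mean_and_variance` and the sample values of p are assumptions made for the example.

```python
def indicator_mean_and_variance(p):
    """Mean and variance of v, where v = 1 with probability p and v = 0 with probability 1 - p."""
    distribution = {1: p, 0: 1 - p}
    # Mean: multiply each value by its probability; then add.
    mean = sum(v * prob for v, prob in distribution.items())
    # Variance: multiply each squared deviation by its probability; then add.
    variance = sum((v - mean) ** 2 * prob for v, prob in distribution.items())
    return mean, variance

for p in (0.5, 0.6, 0.75):
    mean, variance = indicator_mean_and_variance(p)
    # Compare with the closed forms p and p(1 - p).
    print(p, round(mean, 4), round(variance, 4), round(p * (1 - p), 4))
```

For every p tried, the computed mean matches p and the computed variance matches p(1 - p), which is what the algebra in the worksheet should deliver.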

Sample Size of Two
Questionnaire: Are you voting for Clint?
Experiment: Write the name of each individual in the population on a card.
  In the first stage:
    o Thoroughly shuffle the cards.
    o Randomly draw one card.
    o Ask that individual if he/she supports Clint and record the answer; this yields a specific numerical value of v1 for the random variable. v1 equals 1 if the first individual polled supports Clint; 0 otherwise.
    o Replace the card.
  In the second stage, the procedure is repeated:
    o Thoroughly shuffle the cards.
    o Randomly draw one card.
    o Ask that individual if he/she supports Clint and record the answer; this yields a specific numerical value of v2 for the random variable. v2 equals 1 if the second individual polled supports Clint; 0 otherwise.
    o Replace the card.
  Calculate the fraction of those polled supporting Clint.

Fraction of Sample Supporting Clint, Estimated Fraction:
  EstFrac = (v1 + v2)/2 = (1/2)(v1 + v2)

The estimated fraction of the population supporting Clint is a random variable; that is, EstFrac is a random variable. We cannot determine with certainty the numerical value of the estimated fraction, EstFrac, before the experiment is conducted.
Question: What can we say about the random variable EstFrac beforehand?
Answer: We can describe its probability distribution.
Question: How can we describe the probability distribution?
Answer: Compute its center (mean) and spread (variance).

Center of the Estimated Fraction's Probability Distribution: Mean.
  Mean[EstFrac] = Mean[(1/2)(v1 + v2)]
What do we know?
  Mean[v1] = Mean[v] = p
  Mean[v2] = Mean[v] = p
Arithmetic of Means:
  Mean[cx] = c Mean[x]
  Mean[x + y] = Mean[x] + Mean[y]

  Mean[(1/2)(v1 + v2)] = ____
                       = ____
                       = ____
                       = ____
                       = ____
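With only two draws, the probability distribution of EstFrac can be listed exhaustively, which gives an independent check on the arithmetic of means. The sketch assumes the two draws are independent (the card is replaced after the first draw), so the probability of a pair (v1, v2) is the product of the individual probabilities; the function name `estfrac_distribution` is illustrative.

```python
from itertools import product

def estfrac_distribution(p):
    """Exact distribution of EstFrac = (v1 + v2)/2 for two independent draws."""
    distribution = {}
    for v1, v2 in product((1, 0), repeat=2):
        # Independence assumption: Prob[v1 and v2] = Prob[v1] * Prob[v2]
        prob = (p if v1 == 1 else 1 - p) * (p if v2 == 1 else 1 - p)
        est_frac = (v1 + v2) / 2
        distribution[est_frac] = distribution.get(est_frac, 0.0) + prob
    return distribution

p = 0.5
distribution = estfrac_distribution(p)
mean = sum(est_frac * prob for est_frac, prob in distribution.items())
print(distribution, mean)
```

Trying several values of p and comparing the printed mean with p is a quick way to confirm the result of the derivation.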

Spread of the Estimated Fraction's Probability Distribution: Variance.
  Var[EstFrac] = Var[(1/2)(v1 + v2)]
What do we know?
  Var[v1] = Var[v] = p(1 - p)
  Var[v2] = Var[v] = p(1 - p)
Arithmetic of Variances:
  Var[cx] = c^2 Var[x]
  Var[x + y] = Var[x] + 2Cov[x, y] + Var[y]

  Var[(1/2)(v1 + v2)] = ____
                      = ____
                      = ____        v1 and v2 are independent: Cov[v1, v2] = 0
                      = ____
                      = ____
                      = ____

Question: Why are v1 and v2 independent?
Answer: Since the card of the first name drawn is replaced, whether or not the first voter polled supports Clint does not affect the probability that the second voter will support Clint. In either case, the probability that the second voter will support Clint is p, the actual population fraction. Consequently, knowing the value of v1 does not help us predict the value of v2. More formally, the numerical value of v1 does not affect v2's probability distribution and vice versa. The random variables are independent. Hence, their covariance equals 0.

Opinion Poll Simulation, Sample Size of Two: Checking Our Math
Actual Population Fraction = ActFrac = p = 1/2 = .50

  Sample Size: 2
  Equations:
    Mean of EstFrac's Probability Distribution: ____
    Variance of EstFrac's Probability Distribution: ____
  Simulations:
    Simulation Repetitions: ____
    Mean (Average) of Numerical Values of EstFrac from the Experiments: ____
    Variance of Numerical Values of EstFrac from the Experiments: ____

Conclusion: Our equations and simulation produce identical results. Again, this illustrates how we can exploit the relative frequency interpretation of probability: After many, many repetitions of the experiment, the distribution of the actual numerical values mirrors the random variable's probability distribution.
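A rough stand-in for the opinion poll simulation is sketched below. It is not the handout's simulation software: the function name `simulate_estfrac`, the seed, and the number of repetitions are illustrative, and each polled individual is modeled as an independent draw that supports Clint with probability p.

```python
import random

def simulate_estfrac(p, sample_size, repetitions, seed=0):
    """Repeat the poll many times; return the mean and variance of EstFrac across repetitions."""
    rng = random.Random(seed)   # seeded so that a run can be reproduced
    estimates = []
    for _ in range(repetitions):
        # Each draw is made with replacement, so every draw supports Clint with probability p.
        supporters = sum(1 for _draw in range(sample_size) if rng.random() < p)
        estimates.append(supporters / sample_size)
    mean = sum(estimates) / repetitions
    variance = sum((est - mean) ** 2 for est in estimates) / repetitions
    return mean, variance

mean, variance = simulate_estfrac(p=0.5, sample_size=2, repetitions=200_000)
print(round(mean, 3), round(variance, 3))
```

Applying the arithmetic of variances with Cov[v1, v2] = 0 gives Var[EstFrac] = p(1 - p)/2, so for p = .50 the simulated variance should land near .125 and the simulated mean near .50.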

Summary of Random Variables

Before the experiment is conducted:
  Bad news. What we do not know: We cannot determine the numerical value of the random variable with certainty.
  Good news. What we do know: On the other hand, we can often calculate the random variable's probability distribution, telling us how likely it is for the random variable to equal each of its possible numerical values.

Relative Frequency Interpretation of Probability: After many, many repetitions of the experiment:
  The distribution of the numerical values from the experiments mirrors the random variable's probability distribution; the two distributions are identical.
      Distribution of the Numerical Values  ->  (after many, many repetitions)  ->  Probability Distribution
  The distribution mean and variance describe the general properties of the random variable:
    o The mean reflects the center of the distribution; more specifically, the mean equals the average of the numerical values after many, many repetitions.
    o The variance reflects the spread of the distribution.
      Mean of the Numerical Values  ->  (after many, many repetitions)  ->  Mean of Probability Distribution for One Repetition
      Variance of the Numerical Values  ->  (after many, many repetitions)  ->  Variance of Probability Distribution for One Repetition