FPPA: Chapters 13, 14 and parts of 16, 17, and 18
STATISTICS 50
Richard A. Berk
Spring 1997
May 30, 1997

1 Thinking about Chance

People talk about "chance" and "probability" all the time. There are many different definitions implied. Some are based on states of mind (called "subjective probability"), and some are based on the proportion of times something happens over the long run (called "frequentist probability"). Examples of subjective probability can include the medical diagnosis of a patient, the chances of rain, and how likely it is that the "big bang" theory is correct. These can be thought of as one-time events in which it is still legitimate to speak about subjective probabilities referring to what you believe: "I think the chance of rain today is about .5." But note that today cannot happen again.

Examples of frequentist probability include games of chance, sample surveys, and randomized experiments. We will focus on frequentist probability since that is the kind of probability most used in applications (although the subjective approach is gaining in popularity for scientific applications). This is not just some math exercise. It is central for a lot of statistics when an effort is made to represent uncertainty. A key application will be in sample surveys such as the Gallup poll. But probability plays a key role in all sorts of risk assessments. The risk of the transmission of AIDS is the probability that a sexual contact will lead to transmission of HIV. The risk of a tanker oil spill is the probability that a substantial amount of oil will be spilled. The risk of a fatal car accident is the probability that a person will die in a car crash.

What these examples share is the translation of a proportion into a probability via a large number of repeatable events. In each case there is a denominator which contains the number of "opportunities" for the event to occur and a numerator which contains the number of times the event occurs. It is essential that both are properly conceptualized, and how they are conceptualized will typically affect the computed result. For example, one could study a very large number of tanker trips and compute the proportion of times there were spills of certain sizes. But the definition of "tanker trips" and of spills of certain sizes must be very clear. Then, it might make sense to apply the abstraction of a probability to the data (just as it is sometimes useful to apply the abstraction of the normal curve to a histogram). At this point, it is not clear what this has to do with surveys and randomized experiments, but we will get to those eventually.

2 Frequentist Probability

Now, more formally, the probability of some event is the proportion of times (or percentage) that the event can be expected to occur in a very large number of independent and identical opportunities ("trials"). This is an abstraction, since nothing in the real world is ever exactly like this. But often, the fit is good enough "for government work." Games of chance are a good way to think about it. The probability of heads in a large number of coin flips is .5 (i.e., 50% of the time). The probability of getting a 6 when rolling a die is 1/6 = .167 (i.e., 16.7% of the time). If the thing you are studying can be conceptualized in terms of games of chance, it is likely that the frequentist definition of probability applies and can be useful. If not, not. Many times it does not apply. Ask yourself: does it make sense to think about a large number of identical and independent events here?

Probabilities ("chances" in Freedman's terms) are between 0 (0%) and 1.0 (100%). Example: The probability of flipping two heads in a row is .25. You can get this perhaps most intuitively by counting up all the outcomes that could occur in two coin flips (HH, TT, HT, TH) and then computing the proportion of outcomes represented by two heads. This is what we did above with oil spills, for example. Again, the key link is between proportions and probabilities.

The probability that an event will occur is 1.0 minus the probability that it will not occur. Example: The probability of flipping anything but two heads is .75 (1.0 - .25 = .75). Again, thinking about proportions helps, since the proportion for all possible events is 1.0.
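The counting argument above is easy to verify mechanically. Here is a minimal sketch in Python (the original notes contain no code; the language choice is ours) that enumerates the four outcomes of two coin flips and checks both the two-heads probability and its complement:

    from itertools import product

    # Enumerate all outcomes of two coin flips: HH, HT, TH, TT.
    outcomes = list(product("HT", repeat=2))

    # Proportion of outcomes that are two heads.
    p_two_heads = sum(1 for o in outcomes if o == ("H", "H")) / len(outcomes)
    print(p_two_heads)        # 0.25

    # The probability of anything but two heads is the complement.
    print(1.0 - p_two_heads)  # 0.75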

3 The "Box Model"

We use the idea of a "box model" to help us understand the concept of probability and, later, many real-world applications. The metaphor is that you have a box, and in that box is a set of tickets. Each ticket has something written on it. The box is thoroughly shaken, and then you draw one ticket. The shaking and drawing is repeated a very large number of times under the same conditions. If the one ticket is put back into the box before the next draw, the selection (i.e., sampling) is with replacement. If the one ticket is not put back in the box before the next draw, the selection (i.e., sampling) is without replacement.

Example: Suppose the box contains the tickets (1,2,2,3). What is the probability of getting a 1? If you sample with replacement, the probability is .25 (or 25% of the time), since the proportion of 1's is .25. A key point is that after each draw, that ticket is replaced, and the box is shaken again to make sure that the arrangement of the tickets cannot be predicted. This is an effort to ensure that each ticket selected is chosen at random; i.e., which ticket is chosen cannot be predicted. Then, the process is repeated over and over. The result is that 25% of the tickets chosen will be 1's. Likewise, 50% of the tickets chosen will be 2's. Thus, while the result on any given draw cannot be predicted, the result of a large number of draws can be. This is a key point about the use of probabilities. We don't know what will happen on any given draw, but we know what will happen over the "long run."

A word on notation. We sometimes summarize probability statements in the following form: P(1) = .25. That is, the probability of drawing a 1 is .25.
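The long-run behavior the box model describes can also be seen by simulation. A minimal sketch, assuming Python and an arbitrary 100,000 draws:

    import random

    box = [1, 2, 2, 3]
    n_draws = 100_000

    # Sampling with replacement: the box is "reshaken" before every draw.
    draws = [random.choice(box) for _ in range(n_draws)]

    # The long-run proportion of 1's should be close to .25.
    print(draws.count(1) / n_draws)  # roughly 0.25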

4 Conditional and Unconditional Probabilities

We have been talking about unconditional probabilities. Conditional probabilities are a bit different. Once again, we use a box model to explain. Example: Suppose a box contains the tickets (3,3,4,4,5). Under sampling with replacement, the unconditional probability of selecting a 4 is .40 (i.e., 2 out of 5). Now suppose that the first ticket drawn is a 4, and we do not put it back in the box. But we go ahead and sample from the remaining tickets. If we repeat this two-step sampling process a very large number of times (remove a 4 and then sample from the rest with replacement), the conditional probability of drawing a 4 is .25 (1 out of 4). Note that the conditional and unconditional probabilities are different in this case. In techno-speak, we say that the probability of drawing a 4, given that a 4 was already drawn, is .25. In notational form: P(4|4) = .25.
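A short simulation along the same lines (again Python, with an illustrative number of trials) makes the gap between the unconditional and conditional probabilities visible:

    import random

    box = [3, 3, 4, 4, 5]
    n_trials = 100_000

    # Unconditional: draw from the full box. P(4) = 2/5 = .40.
    uncond = sum(random.choice(box) == 4 for _ in range(n_trials)) / n_trials

    # Conditional: a 4 has already been removed; draw from what is left.
    # P(4|4) = 1/4 = .25.
    remaining = [3, 3, 4, 5]
    cond = sum(random.choice(remaining) == 4 for _ in range(n_trials)) / n_trials

    print(uncond, cond)  # roughly 0.40 and 0.25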

5 Multiplication Rule

The probability that two events will occur equals the probability that the first will occur times the probability that the second will occur, given that the first has already occurred. Example: Suppose a box contains the tickets (2,2,3,3,4). The probability of drawing a 4 is .20 (1 out of 5). The probability of drawing a 2 is .40 (2 out of 5). The probability of drawing a 4 and then a 2 (sampling with replacement) is .20 times .40, which equals .08 (or 8% of the time). This is the result when you sample twice, with replacement after each draw. Now suppose you don't replace the 4 before selecting the 2. Then the probability of drawing a 4 and then a 2 is .20 times .50, or .10 (10% of the time). This is the result when you sample without replacement after the first draw.
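Both versions of the multiplication rule can be checked by simulation. A sketch, with the trial count chosen only for illustration:

    import random

    box = [2, 2, 3, 3, 4]
    n_trials = 100_000

    # With replacement: P(4 then 2) = .20 * .40 = .08.
    with_repl = 0
    for _ in range(n_trials):
        first = random.choice(box)
        second = random.choice(box)
        with_repl += (first == 4 and second == 2)

    # Without replacement: P(4 then 2) = .20 * .50 = .10.
    without_repl = 0
    for _ in range(n_trials):
        first, second = random.sample(box, 2)
        without_repl += (first == 4 and second == 2)

    print(with_repl / n_trials, without_repl / n_trials)  # roughly 0.08 and 0.10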

6 Independence

Continuing with our example, notice that in the first case, the probability of drawing a 2 on the second draw is the same regardless of what happened on the first draw. In the second case, the probability of drawing a 2 could vary, depending on the outcome of the first draw. The first case is an illustration of two independent events. In the second case, the events are dependent.

Consider shooting a pair of foul shots. If the probability of making the second shot is the same whether or not the first shot is made, the two shots are independent. If the probability of making the second shot differs depending on whether or not the first shot is made, the two shots are dependent. Thus, two events are independent if the probability of the second event, given that the first has occurred, is the same no matter how the first event turns out. Otherwise, the two events are dependent.

Another example: We have two boxes of tickets, (1,1,3,3) and (1,2,3,4). Suppose we draw a single ticket from the first box and then a single ticket from the second box. For the second box, the probability of drawing a 3 (say) is .25 no matter what we drew from the first box. The draws are, therefore, independent. Now suppose we pool the two boxes: (1,1,1,2,3,3,3,4). We draw once without replacement and then draw again. The probability of drawing a 3 on the second draw is now either 3 out of 7, if something other than a 3 is drawn first (.43 or 43% of the time), or 2 out of 7, if a 3 is drawn first (.29 or 29% of the time). Since the second probability depends on what happened on the first draw, the two draws are dependent.

When two events are independent, the probability that both events will occur is the product of their unconditional probabilities. When two events are dependent, the probability that both events will occur is the product of the unconditional probability of the first event and the conditional probability of the second event, given the outcome of the first event.

Example: Suppose the probability that a person with an Hispanic surname will win the lottery is .20. Suppose the probability that a woman will win, given that she has an Hispanic surname, is .60. Then the probability that a woman with an Hispanic surname will win is .12 (.20 times .60). If the probability of the winner being a woman is also .60 for people who have non-Hispanic surnames, gender and surname are independent. If that probability is other than .60, gender and surname are dependent (because the probability of the winner being a woman changes depending on whether the winner has an Hispanic surname). The computations are the same in both cases, but when the events are independent, the probability of the second event can be the unconditional probability (since conditioning, here on surname, does not matter). These sorts of issues come up all the time in litigation on discrimination.
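The dependence in the pooled box is easy to confirm by simulation. A sketch, assuming Python's random.sample for draws without replacement:

    import random

    pooled = [1, 1, 1, 2, 3, 3, 3, 4]
    n_trials = 100_000

    # Track the second draw separately by what happened on the first draw.
    second_is_3_after_3, after_3 = 0, 0
    second_is_3_after_other, after_other = 0, 0

    for _ in range(n_trials):
        first, second = random.sample(pooled, 2)  # two draws without replacement
        if first == 3:
            after_3 += 1
            second_is_3_after_3 += (second == 3)
        else:
            after_other += 1
            second_is_3_after_other += (second == 3)

    # Roughly 2/7 = .29 versus 3/7 = .43: the two draws are dependent.
    print(second_is_3_after_3 / after_3, second_is_3_after_other / after_other)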

7 Enumeration as a Method

For many kinds of problems in probability, it is useful to construct all of the possible outcomes for the chance process and then count the outcomes of particular interest. The total number of outcomes then goes in the denominator and the number of outcomes of interest goes in the numerator (to compute the probability). In effect, we have been doing this all along with the box model. But now our problem is to properly construct the box model, and the applications are more complicated.

Example: Suppose a sexually active person has 3 sexual partners over a 6 month period. Also suppose that each of these has a probability of .50 of having some kind of sexually transmitted disease. If you assume that the epidemiological process just described is like 3 flips of a fair coin, what is the probability that the majority (2 or more) of the partners will have an STD? There are eight possible outcomes (where "y" is yes, they are infected, and "n" is no, they are not infected): yyy, nyy, yny, yyn, ynn, nyn, nny, nnn. And there are 4 cases in which there are 2 or more y's. So the probability is 4/8 = .50. A key point: everything depends on the assumption that the pairing of sexual partners with and without STD's is like a series of coin flips with a 50-50 coin. Again, using probability computations depends on whether the abstraction fits the situation.
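Enumeration is exactly what a computer is good at. A minimal sketch that builds the eight patterns and counts the favorable ones:

    from itertools import product

    # All 2^3 = 8 equally likely infection patterns for 3 partners.
    patterns = list(product("yn", repeat=3))

    # Count patterns with 2 or more y's (a majority infected).
    majority = sum(1 for p in patterns if p.count("y") >= 2)

    print(majority / len(patterns))  # 4/8 = 0.5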

8 Mutually Exclusive Events and the Addition Rule

Two events are mutually exclusive when, if one occurs, the other cannot occur. Example: the ethnic background of this week's Lotto winner is a set of mutually exclusive events: Anglo, African-American, Asian-American, Latino, etc. The winner can only have one ethnic background (under most definitions), so the different ethnic backgrounds are mutually exclusive. Then, to find the probability of at least one of a set of mutually exclusive events happening, you can just add the probabilities. (Note: the set of events also has to be exhaustive; all of the possible events are known and taken into account.)

Example: Suppose that when a person is formally charged with a felony, the following things can happen: the case is dropped, the person can plead guilty, or the person can go to trial. Suppose also that the associated probabilities for these exhaustive and mutually exclusive events are respectively .25, .50, and .25. Then the probability of pleading or going to trial is .75.
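As a tiny arithmetic check on the addition rule (a sketch, not part of the original notes):

    # Exhaustive, mutually exclusive outcomes for a felony charge.
    p = {"dropped": 0.25, "plead": 0.50, "trial": 0.25}

    # For mutually exclusive events, probabilities simply add.
    print(p["plead"] + p["trial"])  # 0.75

    # Sanity check: an exhaustive set of probabilities sums to 1.0.
    print(sum(p.values()))          # 1.0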

9 The Law of Averages

As the number of independent trials increases, the observed probability approaches the expected probability. That is, the computations from the data approach the mathematical abstraction. Freedman's example: if you toss a coin a very large number of times, the proportion of times you get heads gets closer and closer to .50, even though the number of heads minus half the number of tosses tends to get larger and larger. If you toss a coin a large number of times, there are a large number of opportunities to depart from a 50-50 split of heads and tails. But the total number of heads as a proportion of the total number of tosses will get closer and closer to .50. More intuitively, with a large sample we get a better fix on what the probability should be. With coin flips and other games of chance, you have a chance process. What Freedman means by that is some physical procedure that can be repeated over and over under the same conditions (for all practical purposes) and that has an unpredictable outcome.
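A simulation makes Freedman's point vivid: the proportion of heads settles down even while the absolute chance error grows. A sketch with illustrative sample sizes:

    import random

    # Toss a fair coin n times; compare the proportion of heads to .50
    # and the count of heads to n/2, for growing n.
    for n in [100, 10_000, 1_000_000]:
        heads = sum(random.random() < 0.5 for _ in range(n))
        print(n, heads / n, abs(heads - n / 2))

    # Typical output: the proportion hugs .50 more and more tightly,
    # while the absolute gap |heads - n/2| tends to grow.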

10 Working with Statistical Summaries of the Tickets Drawn: The Expected Value and Standard Error

Just as the ticket that will be chosen from a box is not predictable, neither is a set of tickets. It follows that any computation done on a set of tickets is unpredictable as well. Example: Suppose the tickets are (1,2,3,4). Now we sample 4 tickets with replacement and we get (1,4,3,3). Their sum is 11. We sample 4 tickets again with replacement and we get (2,4,3,4). Their sum is 13. One more time: (1,1,4,4), for a sum of 10. So we see that the sum of each set is unpredictable and will vary from set to set. Note that since the mean is just the sum divided by the number of tickets, the mean too is unpredictable. The same story holds if instead of the mean, we computed the proportion of 3's in the set of 4 tickets drawn. (The proportion is just the mean of a bunch of 1's and 0's.) And the same story holds for any useful summary statistic we might choose to compute (e.g., the median, the IQR). But let's stick with the sum for now.

While the sums vary, it stands to reason that the sums should vary around some typical value for the box. For example, in 100 draws with replacement, you'd expect about 25 1's, 25 2's, 25 3's, and 25 4's. So the sums should vary around 250. Sometimes the sum of the 100 draws with replacement will be a bit more than 250 and sometimes a bit less. Or, if only 4 draws were made with replacement, you'd expect the sums to vary around 10. The number that these sums vary around is called the expected value, which equals the number of draws times the average (mean) of the box. We are going to use this idea later. The basic message is that any given sample is likely to be in the neighborhood of the expected value. So, if you don't happen to know the expected value, a sample value may be in the right neighborhood. For example, we do public opinion polls to learn what proportion of the public holds a particular view. We don't know what that proportion is, because if we did, there would be no need to do the survey. But we can take advantage (we will see how later) of the fact that with a proper sample, the proportion will be in the neighborhood of the expected value for the relevant population.
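Before turning to the standard error, here is a sketch that computes the expected value for the box (1,2,3,4) and shows simulated sums scattering around it (the 100 draws and 5 repetitions are illustrative choices):

    import random

    box = [1, 2, 3, 4]
    mean_of_box = sum(box) / len(box)  # 2.5

    # Expected value of the sum = number of draws times the mean of the box.
    n_draws = 100
    print(n_draws * mean_of_box)  # 250.0

    # Simulated sums of 100 draws with replacement vary around 250.
    for _ in range(5):
        print(sum(random.choices(box, k=n_draws)))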

But we'd also like to know how far a given sample result is likely to be from the expected value by chance error. This value, which we call the standard error, is, for the sum of the draws, computed by multiplying the standard deviation of the box by the square root of the number of draws. The standard error for the sum will be larger when the standard deviation of the box is larger and when the number of draws is larger. Note that the standard error applies to variation around the expected value of the sum.

Example: Suppose the tickets are (1,1,3,3), and we take draws of size 4. The expected value of the sum is 8 (the mean of 2, times 4). The deviation scores are (-1,-1,1,1), since the mean of the box is 2, so the sum of the squared deviation scores is 4. The average squared deviation is 4/4 = 1, and the square root of 1 is 1. So the standard deviation of the box is 1. Therefore, the standard error of the sum is the square root of 4 times 1, or 2. Therefore, from this box, one would expect the sum to depart from the expected value of 8 by around plus or minus 2.

Technical point: the standard error only increases with the square root of the sample size because of cancellation. Positive and negative values tend to cancel each other out.

It also turns out that for a very large number of draws, the distribution of the sums is very close to normal. And then we can use the expected value and standard error just as we used the mean and standard deviation for data. That is, we can compute percentiles using the normal approximation. For example, in the above example, a sum of 8 plus 4 (2 standard errors) is at about the 97.5th percentile. But here that means that the probability of getting a sum of 12 or greater is only 2.5%. That is, from a box with a mean of 2 and draws of size 4, a sum of 12 or larger is a rare event. Where we are going is to use such logic to determine whether an assumed expected value for a box is consistent with a sample from that box. File this away for later under the label of "tests of significance."
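Finally, a sketch that reproduces the expected value and standard error calculation for the box (1,1,3,3) and checks the spread against simulated sums (the simulation size is an arbitrary choice; for only 4 draws the normal approximation itself is rough):

    import math
    import random

    box = [1, 1, 3, 3]
    n_draws = 4

    mean = sum(box) / len(box)                                    # 2.0
    sd = math.sqrt(sum((t - mean) ** 2 for t in box) / len(box))  # 1.0

    expected_value = n_draws * mean            # 8.0
    standard_error = math.sqrt(n_draws) * sd   # 2.0
    print(expected_value, standard_error)

    # Simulated check: sums of 4 draws with replacement vary around 8
    # with a standard deviation close to the standard error of 2.
    sums = [sum(random.choices(box, k=n_draws)) for _ in range(100_000)]
    sim_mean = sum(sums) / len(sums)
    sim_sd = math.sqrt(sum((s - sim_mean) ** 2 for s in sums) / len(sums))
    print(sim_mean, sim_sd)  # roughly 8 and 2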