Proportions
A proportion is the fraction of individuals having a particular attribute. It is also the probability that an individual randomly sampled from the population will have that attribute. It can range from 0 to 1.
Example: 2092 adult passengers on the Titanic; 654 survived. Proportion of survivors = 654/2092 ≈ 0.3
Probability that two out of three randomly chosen passengers survived the Titanic
Binomial distribution The binomial distribution describes the probability of a given number of "successes" from a fixed number of independent trials, when the probability of success is the same in each trial.
Binomial distribution
Used when individuals can be divided into two (bi-) mutually exclusive named groups (-nomial). For example:
Left-handed or right-handed
Alive or dead
University student or not university student
We call the two groups successes and failures.
Binomial distribution: Probability of obtaining X left-handed flowers out of n = 27 randomly sampled, if the proportion of left-handed flowers in the population is 0.25.
n trials; p = probability of success
$\Pr[X] = \binom{n}{X} p^X (1-p)^{n-X}$
$\Pr[X]$: probability of X successes in n trials.
$\binom{n}{X} = \frac{n!}{X!\,(n-X)!}$: "n choose X", the number of unique ordered sequences of successes and failures that yield X successes in n trials.
$p^X (1-p)^{n-X}$: probability of a given ordered sequence of successes and failures that yields X successes in n trials.
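$n! = n \times (n-1) \times (n-2) \times \cdots \times 3 \times 2 \times 1$. For example, $6! = 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 720$. Also $0! = 1$ and $1! = 1$.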
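The factorial and "n choose X" definitions can be checked numerically. This is a minimal sketch using Python's standard library; the helper name `n_choose_x` is made up for illustration and is not from the slides.

```python
import math

def n_choose_x(n: int, x: int) -> int:
    """Number of unique ordered sequences with x successes in n trials: n! / (x! (n - x)!)."""
    return math.factorial(n) // (math.factorial(x) * math.factorial(n - x))

print(math.factorial(6))   # 720
print(math.factorial(0))   # 1
print(n_choose_x(3, 2))    # 3

# The standard library's math.comb computes the same quantity directly.
print(n_choose_x(5, 3) == math.comb(5, 3))  # True
```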
Binomial distribution
Assumptions:
The number of trials (n) is fixed.
Separate trials are independent.
The probability of success (p) is the same in every trial.
Probability that two out of three randomly chosen passengers survived the Titanic
$\Pr[2] = \binom{3}{2} (0.3)^2 (1-0.3)^{3-2} = \frac{3!}{2!\,1!} (0.3)^2 (0.7)^1 = 3 (0.3)^2 (0.7) = 0.189$
$\binom{3}{2}$: number of ways to get 2 survivors out of 3 passengers.
$(0.3)^2$: probability of 2 survivors.
$(1-0.3)^{3-2}$: probability of 1 death.
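The Titanic calculation can be reproduced with a few lines of code. This is a sketch under the slide's assumptions (independent draws, p = 0.3); the function name `binomial_probability` is illustrative.

```python
from math import comb

def binomial_probability(x: int, n: int, p: float) -> float:
    """Pr[X = x]: probability of x successes in n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Two survivors out of three randomly chosen passengers, with p = 0.3.
pr_two_survivors = binomial_probability(2, 3, 0.3)
print(round(pr_two_survivors, 3))  # 0.189
```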
Example: Paradise flycatchers A population of paradise flycatchers has 80% brown males and 20% white. Your field assistant captures 5 male flycatchers at random. What is the chance that 3 of those are brown and 2 are white?
Call brown a success: p = 0.8, n = 5, X = 3. $\Pr[3] = \binom{5}{3} (0.8)^3 (1-0.8)^{5-3} = \frac{120}{6 \times 2} (0.8)^3 (0.2)^2 = 0.205$
In-class Exercise: What is the probability that 3 or more are brown?
$\Pr[\text{3 or more are brown}] = \Pr[3] + \Pr[4] + \Pr[5]$
$\Pr[3] = \binom{5}{3} (0.8)^3 (1-0.8)^{5-3} = 0.205$
$\Pr[4] = \binom{5}{4} (0.8)^4 (1-0.8)^{5-4} = 0.410$
$\Pr[5] = \binom{5}{5} (0.8)^5 = 0.328$
$\Pr[\text{3 or more are brown}] = 0.205 + 0.410 + 0.328 = 0.943$ (0.942 if the terms are not rounded before adding)
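The exercise can be checked by summing the three binomial terms. This is a small sketch with an illustrative helper name. Summing the unrounded terms gives 0.942, while the slide's 0.943 comes from adding the three rounded values.

```python
from math import comb

def binomial_probability(x: int, n: int, p: float) -> float:
    """Pr[X = x] for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.8  # 5 captured males; brown counts as a success
terms = {x: binomial_probability(x, n, p) for x in range(3, n + 1)}

for x, pr in terms.items():
    print(f"Pr[{x}] = {pr:.3f}")

pr_three_or_more = sum(terms.values())
print(f"Pr[3 or more] = {pr_three_or_more:.3f}")  # 0.942
```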
Assignment #3: Chapter 5: 28, 36, 37; Chapter 6: 16, 18, 19. Due this Friday, Oct. 9th, by 2pm in your TA's homework box.
Assignment #4: Chapter 7: 21, 22, 28. Due next Friday, Oct. 16th, by 2pm in your TA's homework box.
Reading: For today, Chapter 7. For Thursday, Chapter 8.
Second part of Chapter 6 Review
Significance level: The acceptable probability of rejecting a true null hypothesis, called α. For many purposes, α = 0.05 is acceptable.
Type I error: Rejecting a true null hypothesis (a false positive); detecting an effect that is not present. The probability of a Type I error is α (the significance level).
Type II error: Not rejecting a false null hypothesis (a false negative); failing to detect an effect that is present. The probability of a Type II error is β. The smaller β, the more power a test has.
Power: The ability of a test to reject a false null hypothesis. Power = 1 - β
$H_0$: No wolf present. Type I error: Crying wolf when no wolf is present. Type II error: Not crying wolf when there is a wolf present.
$H_0$: Red- and blue-shirted athletes are equally likely to win (proportion = 0.5). Type I error: Concluding red- and blue-shirted athletes are not equally likely to win, when they actually are. Type II error: Concluding red- and blue-shirted athletes are equally likely to win, when they actually are not.
One- and two-tailed tests Most tests are two-tailed tests. This means that a deviation in either direction would reject the null hypothesis. Normally α is divided into α/2 on one side and α/2 on the other.
[Figure: distribution of the test statistic, with 2.5% of the probability in each tail]
First part of Chapter 7 Review
Binomial distribution The binomial distribution describes the probability of a given number of "successes" from a fixed number of independent trials, when the probability of success is the same in each trial.
Binomial distribution: Probability of obtaining X left-handed flowers out of n = 27 randomly sampled, if the proportion of left-handed flowers in the population is 0.25.
n trials; p = probability of success
$\Pr[X] = \binom{n}{X} p^X (1-p)^{n-X}$, where $\binom{n}{X} = \frac{n!}{X!\,(n-X)!}$
$\Pr[X]$: probability of X successes in n trials.
$\binom{n}{X}$: "n choose X", the number of unique ordered sequences of successes and failures that yield X successes in n trials.
$p^X (1-p)^{n-X}$: probability of a given ordered sequence of successes and failures that yields X successes in n trials.
Example: Paradise flycatchers A population of paradise flycatchers has 80% brown males and 20% white. Your field assistant captures 5 male flycatchers at random.
In-class Exercise: What is the probability that 3 or more are brown?
$\Pr[\text{3 or more are brown}] = \Pr[3] + \Pr[4] + \Pr[5]$
$\Pr[3] = \binom{5}{3} (0.8)^3 (1-0.8)^{5-3} = 0.205$
$\Pr[4] = \binom{5}{4} (0.8)^4 (1-0.8)^{5-4} = 0.410$
$\Pr[5] = \binom{5}{5} (0.8)^5 = 0.328$
Hypothesis testing on proportions: the binomial test
Binomial test
The binomial test uses data to test whether a population proportion p matches a null expectation for the proportion.
$H_0$: The relative frequency of successes in the population is $p_0$.
$H_A$: The relative frequency of successes in the population is not $p_0$.
Binomial distribution: Represents the sampling distribution for the number of successes (X) in a random sample of n trials, when the probability of success is the same in each trial. Rather than using a computer to simulate a vast number of random samples, we can use this to calculate the null distribution.
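As an illustration of calculating a null distribution directly, the sketch below tabulates the binomial probabilities for the left-handed flower example (n = 27, p = 0.25). The helper name is illustrative, and only the Python standard library is assumed.

```python
from math import comb

def binomial_probability(x: int, n: int, p: float) -> float:
    """Pr[X = x] for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p0 = 27, 0.25  # sample size and null proportion from the flower example
null_distribution = [binomial_probability(x, n, p0) for x in range(n + 1)]

for x, pr in enumerate(null_distribution):
    print(f"X = {x:2d}  Pr[X] = {pr:.4f}")

# The probabilities over all possible outcomes should sum to 1 (up to rounding error).
print(f"total = {sum(null_distribution):.6f}")
```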
Example: Imagine a student takes a multiple-choice test before starting a statistics class. Each of the 10 questions on the test has 5 possible answers, only one of which is correct. This student gets 4 answers right. Can we deduce from this that this student knows anything at all about statistics?
Hypotheses
$H_0$: Student got correct answers randomly. $H_0$: p = 0.2
$H_A$: Student got more answers correct than random. $H_A$: p > 0.2
This is properly a one-tailed test.
n = 10, p = 0.2
$P = \Pr[4] + \Pr[5] + \Pr[6] + \cdots + \Pr[10] = \binom{10}{4} (0.2)^4 (0.8)^6 + \binom{10}{5} (0.2)^5 (0.8)^5 + \binom{10}{6} (0.2)^6 (0.8)^4 + \cdots = 0.12$
Note: The capital P here is used for the P-value, in contrast to the population proportion with a small p.
P = 0.12. This is greater than the α value of 0.05, so we would not reject the null hypothesis. It is plausible that the student had four answers correct just by guessing randomly.
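The one-tailed P-value for the test-taking example can be computed by summing the upper tail of the null distribution. This is a minimal sketch; the function names are illustrative rather than taken from any particular statistics package.

```python
from math import comb

def binomial_probability(x: int, n: int, p: float) -> float:
    """Pr[X = x] for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def upper_tail_p_value(x_observed: int, n: int, p0: float) -> float:
    """One-tailed P-value: probability of x_observed or more successes under the null."""
    return sum(binomial_probability(x, n, p0) for x in range(x_observed, n + 1))

alpha = 0.05
p_value = upper_tail_p_value(4, 10, 0.2)  # 4 correct out of 10, guessing rate 0.2

print(round(p_value, 2))  # 0.12
print(p_value > alpha)    # True, so the null hypothesis is not rejected
```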
Estimating proportions: The proportion of successes in a sample is $\hat{p} = X/n$. The hat (^) shows that this is an estimate of p, the true population proportion.
The standard error of the estimate of a proportion is the standard deviation of the sampling distribution: $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
We usually don't know p, so we estimate the standard error with $\hat{p}$: $SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
A proportion is like a mean. Score Yes = 1 and No = 0. Then 82/344 = 0.238 and (82 × 1 + 262 × 0)/344 = 0.238.
The variance of a 0/1 score is p(1-p)

Case  Worth It?  Score (X)  Mean  (X - mean)  (X - mean)^2
1     yes        1          0.6    0.4        0.16
2     no         0          0.6   -0.6        0.36
3     no         0          0.6   -0.6        0.36
4     yes        1          0.6    0.4        0.16
5     yes        1          0.6    0.4        0.16
6     yes        1          0.6    0.4        0.16
7     yes        1          0.6    0.4        0.16
8     no         0          0.6   -0.6        0.36
9     yes        1          0.6    0.4        0.16
10    no         0          0.6   -0.6        0.36

Mean (the proportion) = 6/10 = 0.6. Sum of squares = 2.4. Variance = 2.4/10 = 0.24 = 0.6 × 0.4.
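The table's arithmetic can be verified directly. The sketch below scores each "yes" as 1 and each "no" as 0, using the ten cases listed, and compares the variance (sum of squares divided by n, as on the slide) with p(1-p).

```python
from math import isclose

# Scores for the ten cases in the table: yes = 1, no = 0.
scores = [1, 0, 0, 1, 1, 1, 1, 0, 1, 0]

n = len(scores)
p = sum(scores) / n                               # mean of the scores = proportion of "yes"
sum_of_squares = sum((x - p) ** 2 for x in scores)
variance = sum_of_squares / n                     # divides by n, matching the slide

print(round(p, 2), round(sum_of_squares, 2), round(variance, 2))  # 0.6 2.4 0.24
print(isclose(variance, p * (1 - p)))                             # True
```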
We usually don't know p, so we estimate the standard error with $\hat{p}$: $SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
A larger sample has a lower standard error
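To see how the standard error shrinks with sample size, the sketch below evaluates the standard error formula for a proportion at several values of n. The value 0.5 for the estimated proportion and the sample sizes are hypothetical choices for illustration only.

```python
from math import sqrt

def standard_error(p_hat: float, n: int) -> float:
    """Estimated standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.5  # hypothetical estimate, held fixed while n changes
for n in (10, 100, 1000):
    print(n, round(standard_error(p_hat, n), 4))
# 10 0.1581
# 100 0.05
# 1000 0.0158
```

Each hundredfold increase in n cuts the standard error by a factor of ten, because n sits under the square root.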
The law of large numbers: The greater the sample size, the closer an estimate of a proportion is likely to be to its true value.
[Figure: $\hat{p}$ plotted against sample size]
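A quick simulation can illustrate the law of large numbers. This is only a sketch: the true proportion of 0.25, the seed, and the sample sizes are arbitrary choices, and the exact numbers printed depend on the random number generator.

```python
import random

def estimate_proportion(n: int, p: float, rng: random.Random) -> float:
    """Draw n independent success/failure trials and return the sample proportion."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n

rng = random.Random(1)  # fixed seed so repeated runs give the same draws
true_p = 0.25           # hypothetical population proportion

for n in (10, 100, 1_000, 10_000, 100_000):
    p_hat = estimate_proportion(n, true_p, rng)
    print(f"n = {n:6d}  p_hat = {p_hat:.4f}  |p_hat - p| = {abs(p_hat - true_p):.4f}")
```

The estimates from larger samples tend to land closer to the true value, although any single small sample can be close by chance.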
95% confidence interval for a proportion
$p' = \frac{X+2}{n+4}$
$p' - 1.96 \sqrt{\frac{p'(1-p')}{n+4}} < p < p' + 1.96 \sqrt{\frac{p'(1-p')}{n+4}}$
This is the Agresti-Coull confidence interval.
Example: The daughters of radiologists
30 out of 87 offspring of male radiologists are male, and the rest are female. What is the best estimate of the proportion of sons among the offspring of male radiologists? What is the 95% confidence interval for this estimate?
$\hat{p} = 30/87 = 0.345$
$p' = \frac{X+2}{n+4} = \frac{30+2}{87+4} = 0.352$
$p' \pm Z \sqrt{\frac{p'(1-p')}{n+4}} = 0.352 \pm 1.96 \sqrt{\frac{0.352(1-0.352)}{87+4}} = 0.352 \pm 0.098$
$0.254 < p < 0.450$
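The radiologist interval can be reproduced with a short function. This is a minimal sketch of the Agresti-Coull formula as given on the slides (add 2 successes and 4 trials, Z = 1.96 for 95%); the function name is illustrative.

```python
from math import sqrt

def agresti_coull_interval(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Agresti-Coull interval: p' = (x + 2) / (n + 4), p' +/- z * sqrt(p'(1 - p') / (n + 4))."""
    p_prime = (x + 2) / (n + 4)
    half_width = z * sqrt(p_prime * (1 - p_prime) / (n + 4))
    return p_prime - half_width, p_prime + half_width

# 30 sons among 87 offspring of male radiologists.
lower, upper = agresti_coull_interval(30, 87)
print(f"{lower:.3f} < p < {upper:.3f}")  # 0.254 < p < 0.450
```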