A proportion is the fraction of individuals having a particular attribute. It can range from 0 to 1.

Proportions

A proportion is the fraction of individuals having a particular attribute. It is also the probability that an individual randomly sampled from the population will have that attribute. A proportion can range from 0 to 1.

Example: 2092 adult passengers on the Titanic; 654 survived. Proportion of survivors = 654/2092 ≈ 0.3

Probability that two out of three randomly chosen passengers survived the Titanic

Binomial distribution
The binomial distribution describes the probability of a given number of "successes" from a fixed number of independent trials, when the probability of success is the same in each trial.

It is used when individuals can be divided into two (bi-) mutually exclusive named groups (-nomial). For example: left-handed or right-handed; alive or dead; university student or not university student. We call the two groups successes and failures.

Example: the probability of obtaining X left-handed flowers out of n = 27 randomly sampled, if the proportion of left-handed flowers in the population is 0.25.

n trials; p = probability of success

$$\Pr[X] = \binom{n}{X}\, p^{X} (1 - p)^{n - X}$$

Here $\Pr[X]$ is the probability of X successes in n trials, $p^{X}(1 - p)^{n - X}$ is the probability of a given ordered sequence of successes and failures that yields X successes in n trials, and $\binom{n}{X}$ ("n choose X") is the number of unique ordered sequences of successes and failures that yield X successes in n trials:

$$\binom{n}{X} = \frac{n!}{X!\,(n - X)!}$$

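$n! = n \times (n-1) \times (n-2) \times \cdots \times 3 \times 2 \times 1$. For example, $6! = 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 720$. By definition, $0! = 1$ and $1! = 1$.

Binomial distribution assumptions: the number of trials (n) is fixed; separate trials are independent; the probability of success (p) is the same in every trial.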
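As a minimal sketch of the binomial formula, the function below evaluates Pr[X] with Python's standard library. The name `binomial_pmf` is chosen here for illustration and does not come from the slides; the numbers (6! and the n = 27, p = 0.25 flower example) do.

```python
from math import comb, factorial


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Pr[X = x]: probability of exactly x successes in n independent trials."""
    # comb(n, x) is "n choose x" = n! / (x! (n - x)!)
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Factorial check from the slide: 6! = 720
six_factorial = factorial(6)

# Left-handed flowers: n = 27 sampled, population proportion p = 0.25
flower_probs = [binomial_pmf(x, 27, 0.25) for x in range(28)]
total = sum(flower_probs)  # probabilities over X = 0..27 should sum to 1

print(six_factorial)
print(total)
```

Using `math.comb` avoids computing three large factorials separately, but it is the same "n choose X" quantity defined above.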

Probability that two out of three randomly chosen passengers survived the Titanic:

$$\Pr[2] = \binom{3}{2} (0.3)^{2} (1 - 0.3)^{3 - 2} = \frac{3!}{2!\,1!} (0.3)^{2} (0.7)^{1} = 3 (0.3)^{2} (0.7) = 0.189$$

Here $\binom{3}{2}$ is the number of ways to get 2 survivors out of 3 passengers, $(0.3)^{2}$ is the probability of 2 survivors, and $(1 - 0.3)^{3 - 2}$ is the probability of 1 death.
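The three pieces of the Titanic calculation can be checked separately. This is an illustrative sketch; the variable names are made up here, and p = 0.3 is the rounded survival proportion used on the slide.

```python
from math import comb

n, x, p = 3, 2, 0.3  # 3 passengers, 2 survivors, Pr[survive] = 0.3

ways = comb(n, x)              # number of orderings with 2 survivors: 3
prob_survivors = p**x          # probability that 2 named passengers survive
prob_deaths = (1 - p) ** (n - x)  # probability that the remaining 1 dies

pr_two = ways * prob_survivors * prob_deaths
print(round(pr_two, 3))  # 0.189
```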

Example: Paradise flycatchers A population of paradise flycatchers has 80% brown males and 20% white. Your field assistant captures 5 male flycatchers at random. What is the chance that 3 of those are brown and 2 are white?

Call brown a success: p = 0.8, n = 5, X = 3.

$$\Pr[3] = \binom{5}{3} (0.8)^{3} (1 - 0.8)^{5 - 3} = \frac{120}{6 \times 2} (0.8)^{3} (0.2)^{2} = 0.205$$

In-class Exercise: What is the probability that 3 or more are brown?

$$\Pr[\text{3 or more are brown}] = \Pr[3] + \Pr[4] + \Pr[5]$$

$$\Pr[3] = \binom{5}{3} (0.8)^{3} (1 - 0.8)^{5 - 3} = 0.205$$

$$\Pr[4] = \binom{5}{4} (0.8)^{4} (1 - 0.8)^{5 - 4} = 0.410$$

$$\Pr[5] = \binom{5}{5} (0.8)^{5} = 0.328$$

$$\Pr[\text{3 or more are brown}] = 0.205 + 0.410 + 0.328 = 0.943$$

(Summing the unrounded terms gives 0.942; the 0.943 comes from adding the rounded values.)
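The in-class exercise can be sketched in a few lines. The helper name `binomial_pmf` is an assumption made for this example; n = 5 and p = 0.8 are the flycatcher values from the slide.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 5, 0.8  # 5 captured males, 80% of males are brown

# Pr[3 or more brown] = Pr[3] + Pr[4] + Pr[5]
terms = {x: binomial_pmf(x, n, p) for x in (3, 4, 5)}
pr_three_or_more = sum(terms.values())

print(round(pr_three_or_more, 3))  # 0.942
```

Adding the unrounded terms gives 0.942, which differs in the third decimal from adding the three rounded terms.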

Assignment #3: Chapter 5: 28, 36, 37; Chapter 6: 16, 18, 19. Due this Friday, Oct. 9th, by 2pm in your TA's homework box.

Assignment #4: Chapter 7: 21, 22, 28. Due next Friday, Oct. 16th, by 2pm in your TA's homework box.

Reading: for today, Chapter 7; for Thursday, Chapter 8.

Second part of Chapter 6 Review

Significance level: the acceptable probability of rejecting a true null hypothesis, called α. For many purposes, α = 0.05 is acceptable.

Type I error: rejecting a true null hypothesis. This is a false positive: detecting an effect that is not present. The probability of a Type I error is α (the significance level).

Type II error: not rejecting a false null hypothesis. This is a false negative: failing to detect an effect that is present. The probability of a Type II error is β. The smaller β is, the more power a test has.

Power: the ability of a test to reject a false null hypothesis. Power = 1 - β.

H_0: No wolf present. Type I error: crying wolf when no wolf is present. Type II error: not crying wolf when there is a wolf present.

H_0: Red- and blue-shirted athletes are equally likely to win (proportion = 0.5). Type I error: concluding that red- and blue-shirted athletes are not equally likely to win, when they actually are. Type II error: concluding that red- and blue-shirted athletes are equally likely to win, when they actually are not.

One- and two-tailed tests Most tests are two-tailed tests. This means that a deviation in either direction would reject the null hypothesis. Normally α is divided into α/2 on one side and α/2 on the other.

[Figure: distribution of the test statistic, with 2.5% in each tail]

First part of Chapter 7 Review

Binomial distribution
The binomial distribution describes the probability of a given number of "successes" from a fixed number of independent trials, when the probability of success is the same in each trial.

Example: the probability of obtaining X left-handed flowers out of n = 27 randomly sampled, if the proportion of left-handed flowers in the population is 0.25.

n trials; p = probability of success

$$\Pr[X] = \binom{n}{X}\, p^{X} (1 - p)^{n - X}, \qquad \binom{n}{X} = \frac{n!}{X!\,(n - X)!}$$

$\Pr[X]$ is the probability of X successes in n trials, $p^{X}(1 - p)^{n - X}$ is the probability of a given ordered sequence of successes and failures that yields X successes in n trials, and $\binom{n}{X}$ ("n choose X") is the number of such ordered sequences.

Example: Paradise flycatchers. A population of paradise flycatchers has 80% brown males and 20% white. Your field assistant captures 5 male flycatchers at random.

In-class Exercise: What is the probability that 3 or more are brown?

$$\Pr[\text{3 or more are brown}] = \Pr[3] + \Pr[4] + \Pr[5]$$

$$\Pr[3] = \binom{5}{3} (0.8)^{3} (1 - 0.8)^{5 - 3} = 0.205$$

$$\Pr[4] = \binom{5}{4} (0.8)^{4} (1 - 0.8)^{5 - 4} = 0.410$$

$$\Pr[5] = \binom{5}{5} (0.8)^{5} = 0.328$$

Hypothesis testing on proportions: the binomial test

Binomial test
The binomial test uses data to test whether a population proportion p matches a null expectation for the proportion.
H_0: The relative frequency of successes in the population is p_0.
H_A: The relative frequency of successes in the population is not p_0.

Binomial distribution
The binomial distribution represents the sampling distribution for the number of successes (X) in a random sample of n trials, when the probability of success is the same in each trial. Rather than using a computer to simulate a vast number of random samples, we can use this to calculate the null distribution.

Example: the probability of obtaining X left-handed flowers out of n = 27 randomly sampled, if the proportion of left-handed flowers in the population is 0.25.

Example: Imagine a student takes a multiple-choice test before starting a statistics class. Each of the 10 questions on the test has 5 possible answers, only one of which is correct. This student gets 4 answers right. Can we deduce from this that this student knows anything at all about statistics?

Hypotheses
H_0: Student got correct answers randomly. H_0: p = 0.2
H_A: Student got more answers correct than random. H_A: p > 0.2
This is properly a one-tailed test.

n = 10, p = 0.2

$$P = \Pr[4] + \Pr[5] + \Pr[6] + \cdots + \Pr[10] = \binom{10}{4} (0.2)^{4} (0.8)^{6} + \binom{10}{5} (0.2)^{5} (0.8)^{5} + \binom{10}{6} (0.2)^{6} (0.8)^{4} + \cdots = 0.12$$

Note: The capital P here is used for the P-value, in contrast to the population proportion with a small p.

P = 0.12. This is greater than the α value of 0.05, so we would not reject the null hypothesis. It is plausible that the student had four answers correct just by guessing randomly.
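The P-value in the multiple-choice example is an upper-tail binomial sum, which can be sketched as follows. The function names are hypothetical and chosen only for this illustration; n = 10, p_0 = 0.2, the observed count of 4, and α = 0.05 come from the slides.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def upper_tail_p_value(observed: int, n: int, p0: float) -> float:
    """One-tailed P-value: Pr[X >= observed] under the null proportion p0."""
    return sum(binomial_pmf(x, n, p0) for x in range(observed, n + 1))


p_value = upper_tail_p_value(observed=4, n=10, p0=0.2)
alpha = 0.05
reject_null = p_value < alpha

print(round(p_value, 2), reject_null)  # 0.12 False
```

Because the alternative hypothesis is p > 0.2, only the upper tail is summed; a two-tailed test would also need the probability of equally extreme results in the other direction.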

Estimating proportions
The proportion of successes in a sample is

$$\hat{p} = \frac{X}{n}$$

where p is the true population proportion. The hat (^) shows that this is an estimate of p.

The standard error of the estimate of a proportion is the standard deviation of its sampling distribution:

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$

We usually don't know p, so we estimate the standard error with $\hat{p}$:

$$SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

A proportion is like a mean. Score Yes = 1 and No = 0. Then the proportion 82/344 = 0.238 is the same as the mean score (82 × 1 + 262 × 0)/344 = 0.238.

The variance of the 0/1 scores behind a proportion is p(1 - p)

Case  Worth It?  Score (X)  Mean  (X - mean)  (X - mean)^2
1     yes        1          0.6    0.4        0.16
2     no         0          0.6   -0.6        0.36
3     no         0          0.6   -0.6        0.36
4     yes        1          0.6    0.4        0.16
5     yes        1          0.6    0.4        0.16
6     yes        1          0.6    0.4        0.16
7     yes        1          0.6    0.4        0.16
8     no         0          0.6   -0.6        0.36
9     yes        1          0.6    0.4        0.16
10    no         0          0.6   -0.6        0.36

Mean (the proportion) = 6/10 = 0.6. Sum of squares = 2.4. Variance = 2.4/10 = 0.6 × 0.4 = 0.24.
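The table's arithmetic can be reproduced directly. This sketch uses the ten yes/no scores from the table and, like the slide, divides the sum of squares by n rather than n - 1; the variable names are illustrative.

```python
# Scores from the table: yes = 1, no = 0
scores = [1, 0, 0, 1, 1, 1, 1, 0, 1, 0]

n = len(scores)
p_hat = sum(scores) / n  # mean of the 0/1 scores = proportion of "yes"

# Sum of squared deviations from the mean, then divide by n (as on the slide)
sum_of_squares = sum((x - p_hat) ** 2 for x in scores)
variance = sum_of_squares / n

print(p_hat, round(sum_of_squares, 2), round(variance, 2))
print(round(p_hat * (1 - p_hat), 2))  # same value via p(1 - p)
```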

A larger sample has a lower standard error
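To illustrate how the estimated standard error shrinks with sample size, the sketch below evaluates sqrt(p̂(1 - p̂)/n) for a fixed p̂. The value p̂ = 0.6 and the sample sizes are assumptions chosen for illustration, and `standard_error` is a hypothetical helper name.

```python
from math import sqrt


def standard_error(p_hat: float, n: int) -> float:
    """Estimated standard error of a sample proportion."""
    return sqrt(p_hat * (1 - p_hat) / n)


p_hat = 0.6  # illustrative estimate, held fixed while n grows
sample_sizes = (10, 100, 1000)
errors = [standard_error(p_hat, n) for n in sample_sizes]

for n, se in zip(sample_sizes, errors):
    print(n, round(se, 4))
```

Because n sits under the square root, a tenfold increase in sample size reduces the standard error by a factor of about 3.2, not 10.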

The law of large numbers: the greater the sample size, the closer an estimate of a proportion is likely to be to its true value.

[Figure: $\hat{p}$ plotted against sample size]
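A small simulation can make the law of large numbers concrete. This is only a sketch: the true proportion of 0.25, the seed, and the sample sizes are assumptions picked for the example, and `simulate_p_hat` is a hypothetical helper.

```python
import random


def simulate_p_hat(n: int, p: float, rng: random.Random) -> float:
    """Draw n independent trials with success probability p; return X / n."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n


rng = random.Random(1)  # fixed seed so the run is repeatable
true_p = 0.25           # assumed population proportion

for n in (10, 100, 1000, 10000, 100000):
    estimate = simulate_p_hat(n, true_p, rng)
    print(n, estimate, abs(estimate - true_p))
```

Individual runs can wander, so a larger sample is not guaranteed to be closer every time; the estimates are only likely to be closer as n grows.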

95% confidence interval for a proportion

$$p' = \frac{X + 2}{n + 4}$$

$$p' - 1.96 \sqrt{\frac{p'(1 - p')}{n + 4}} \;\le\; p \;\le\; p' + 1.96 \sqrt{\frac{p'(1 - p')}{n + 4}}$$

This is the Agresti-Coull confidence interval.

Example: The daughters of radiologists
30 out of 87 offspring of male radiologists are male, and the rest are female. What is the best estimate of the proportion of sons among the offspring of radiologists? What is the 95% confidence interval for this estimate?

Best estimate: $\hat{p} = 30/87 = 0.345$

Agresti-Coull 95% confidence interval:

$$p' = \frac{X + 2}{n + 4} = \frac{30 + 2}{87 + 4} = 0.352$$

$$p' \pm Z \sqrt{\frac{p'(1 - p')}{n + 4}} = 0.352 \pm 1.96 \sqrt{\frac{0.352 (1 - 0.352)}{87 + 4}} = 0.352 \pm 0.098$$

$$0.254 < p < 0.450$$
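The radiologist interval can be reproduced with a short function. This is a sketch of the Agresti-Coull formula as given on the slides (add 2 successes and 4 trials, then use Z = 1.96); the function name `agresti_coull_95` is made up for this example.

```python
from math import sqrt


def agresti_coull_95(x: int, n: int) -> tuple[float, float, float]:
    """Return (p_adjusted, lower, upper) for the 95% Agresti-Coull interval."""
    p_adj = (x + 2) / (n + 4)
    half_width = 1.96 * sqrt(p_adj * (1 - p_adj) / (n + 4))
    return p_adj, p_adj - half_width, p_adj + half_width


# 30 sons out of 87 offspring of male radiologists
p_adj, lower, upper = agresti_coull_95(30, 87)
print(round(p_adj, 3), round(lower, 3), round(upper, 3))
```

The result should agree with the hand calculation above to three decimal places, about 0.254 to 0.450 around an adjusted estimate of 0.352.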