Announcements. Unit 5: Inference for Categorical Data Lecture 1: Inference for a single proportion

Transcription:

Unit 5: Inference for Categorical Data
Lecture 1: Inference for a single proportion
Statistics 101 (Mine Çetinkaya-Rundel), October 23, 2014

Housekeeping / Announcements
PA 4 due Friday at 5pm (extended)
PS 6 due Thursday, Oct 30

Main ideas

(1) Parameter and point estimate for a single proportion
Parameter of interest, $p$: proportion of success in the population (unknown)
Point estimate, $\hat{p}$: proportion of success in the sample

(2) Distribution of $\hat{p}$
Central limit theorem for proportions: Sample proportions will be nearly normally distributed with mean equal to the population mean, $p$, and standard error equal to $\sqrt{p(1-p)/n}$.

$$\hat{p} \sim N\left(\text{mean} = p,\ SE = \sqrt{\frac{p(1-p)}{n}}\right)$$

Conditions:
Independence: random sample/assignment + 10% rule
At least 10 successes and failures
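The central limit theorem statement above can be checked with a short simulation. The sketch below is only an illustration: the values `p = 0.5`, `n = 100`, the seed, and the number of simulated samples are assumptions chosen for the example, not values from the lecture.

```r
# Sketch: simulate the sampling distribution of p-hat and compare it with
# the CLT standard error sqrt(p * (1 - p) / n).
set.seed(101)      # arbitrary seed, only for reproducibility
p <- 0.5           # illustrative population proportion (assumed)
n <- 100           # illustrative sample size (assumed)
n_sims <- 10000    # number of simulated samples (assumed)

# Each rbinom() draw is the number of successes in one sample of size n,
# so dividing by n gives one simulated sample proportion.
p_hat <- rbinom(n_sims, size = n, prob = p) / n

se_theory <- sqrt(p * (1 - p) / n)   # standard error predicted by the CLT
se_sim <- sd(p_hat)                  # spread of the simulated proportions

cat("mean of simulated p-hat:", mean(p_hat), "\n")
cat("simulated SE:", se_sim, " theoretical SE:", se_theory, "\n")
```

With these settings the simulated mean should land close to `p` and the simulated spread close to the theoretical standard error of 0.05.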

Suppose $p = 0.93$. What shape does the distribution of $\hat{p}$ have in random samples of $n = 100$?
(a) unimodal and symmetric (nearly normal)
(b) bimodal and symmetric

Suppose $p = 0.05$. What shape does the distribution of $\hat{p}$ have in random samples of $n = 100$?
(a) unimodal and symmetric (nearly normal)
(b) bimodal and symmetric

Suppose $p = 0.5$. What shape does the distribution of $\hat{p}$ have in random samples of $n = 100$?
(a) unimodal and symmetric (nearly normal)
(b) bimodal and symmetric

(3) Expected vs. observed counts / proportions
When doing a HT we must assume $H_0$ is true; when constructing a CI there is no null hypothesis that governs the calculations.

S-F: number of successes and failures for checking the success-failure condition for the nearly normal distribution of $\hat{p}$:
CI: use the observed proportion: $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$
HT: use the null value of the proportion: $np_0 \ge 10$ and $n(1-p_0) \ge 10$

SE: proportion of success for calculating the standard error of $\hat{p}$, $SE = \sqrt{p(1-p)/n}$:
CI: use the observed proportion: $SE = \sqrt{\hat{p}(1-\hat{p})/n}$
HT: use the null value of the proportion: $SE = \sqrt{p_0(1-p_0)/n}$
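The success-failure check and the standard error use the same two formulas for a CI and for a HT; only the proportion plugged in changes ($\hat{p}$ for a CI, $p_0$ for a HT). A minimal sketch follows, with the helper names `sf_condition` and `se_prop` invented for this illustration and applied to the three values of $p$ from the questions above with $n = 100$.

```r
# Hypothetical helpers (names are not from the lecture).
# Pass the observed proportion p-hat for a CI, or the null value p0 for a HT.
sf_condition <- function(n, p) {
  # TRUE when there are at least 10 successes and at least 10 failures
  (n * p >= 10) & (n * (1 - p) >= 10)
}

se_prop <- function(n, p) {
  # Standard error of a sample proportion
  sqrt(p * (1 - p) / n)
}

n <- 100
p_values <- c(0.93, 0.05, 0.5)

check <- data.frame(
  p = p_values,
  successes = n * p_values,
  failures = n * (1 - p_values),
  sf_met = sf_condition(n, p_values),
  se = se_prop(n, p_values)
)
print(check)
```

Of the three cases, only $p = 0.5$ gives at least 10 successes and 10 failures when $n = 100$.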

(4) Simulation vs. theoretical inference
If the S-F condition is met, can do theoretical inference: Z test, Z interval
If the S-F condition is not met, must use simulation based methods: randomization test, bootstrap interval

Single population proportion, large sample

Behavioral indicators such as positive and negative emotions are a vital measure of a society's wellbeing, and are used for evaluating countries since traditional economic indicators (e.g. GDP) alone cannot quantify the human condition. A 2012 Gallup poll found that Singaporeans are the least likely in the world to report experiencing emotions of any kind on a daily basis. 36% of the Singaporeans polled report feeling either positive or negative emotions.

You are asked to write a newspaper article about this finding, and provide a probable range of values (a 95% confidence interval) for the true proportion of Singaporeans who experience emotions on a daily basis. Which of the following is the correct CI?

$\hat{p} = 0.36$, $n = 1{,}095$

(a) 0.36 ± …
(b) 0.36 ± 1.65 …
(c) 0.36 ± 1.96 …
(d) 0.36 ± 1.96 …
(e) 0.36 ± 1.96 ( … )

Evaluate whether these data provide convincing evidence that the majority of Singaporeans do not experience emotions on a daily basis.

$\hat{p} = 0.36$, $n = 1{,}095$
$p$: proportion of Singaporeans who experience emotions daily
$H_0: p = 0.5$
$H_A: p < 0.5$
S-F condition: $1{,}095 \times 0.5 > 10$

$$Z = \frac{\text{obs} - \text{null}}{SE} = \frac{0.36 - 0.50}{\sqrt{\frac{0.5 \times 0.5}{1095}}} \approx -9.27$$

p-value $= P(Z < -9.27) < 0.0001$
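The confidence interval and the Z test for the Gallup example can be worked through numerically. This is a sketch under the values stated in the slides ($\hat{p} = 0.36$, $n = 1{,}095$, $p_0 = 0.5$); the variable names are chosen for the illustration.

```r
# Values from the slides
p_hat <- 0.36   # observed sample proportion
n <- 1095       # sample size
p0 <- 0.5       # null value for the hypothesis test

# 95% confidence interval: the SE uses the observed proportion
se_ci <- sqrt(p_hat * (1 - p_hat) / n)
z_star <- qnorm(0.975)                    # about 1.96
ci <- p_hat + c(-1, 1) * z_star * se_ci

# Hypothesis test of H0: p = 0.5 vs HA: p < 0.5: the SE uses the null value
se_ht <- sqrt(p0 * (1 - p0) / n)
z <- (p_hat - p0) / se_ht
p_value <- pnorm(z)                       # lower tail, because HA is p < 0.5

cat("95% CI:", round(ci, 4), "\n")
cat("Z:", round(z, 2), " p-value:", p_value, "\n")
```

Under these inputs the interval is roughly (0.332, 0.388) and the test statistic is about -9.27, so the p-value is far below 0.0001.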

Single population proportion, large sample

Which of the following is the best interpretation of the p-value?
(a) If 50% of Singaporeans experience emotions daily, the probability of obtaining a sample of Singaporeans where less than 36% experience emotions is less than 0.001.
(b) If 50% of Singaporeans experience emotions daily, the probability of obtaining a sample of Singaporeans where more than 36% experience emotions is less than 0.001.
(c) If 50% of Singaporeans experience emotions daily, the probability of obtaining a sample of Singaporeans where less than 36% or more than 64% experience emotions is less than 0.001.
(d) If 36% of Singaporeans experience emotions daily, the probability of obtaining a sample of Singaporeans where more than 50% experience emotions is less than 0.001.

Single population proportion, small sample

Are you left handed?
(a) Yes
(b) No

A variety of studies suggest that 10% of the world population is left-handed. Assuming that this class is a representative sample of Duke students, which of the following is the correct set of hypotheses for testing whether the proportion of Duke students who are left-handed is different from the proportion of left-handed people in the world?
(a) $H_0: p = 0.10$; $H_A: p < 0.10$
(b) $H_0: p = 0.10$; $H_A: p \ne 0.10$
(c) $H_0: \hat{p} = 0.10$; $H_A: \hat{p} < 0.10$
(d) $H_0: \hat{p}_{Duke} = \hat{p}_{world}$; $H_A: \hat{p}_{Duke} \ne \hat{p}_{world}$
(e) $H_0: p_{Duke} = p_{world}$; $H_A: p_{Duke} \ne p_{world}$

Simulate by hand

Describe a simulation scheme for this hypothesis test.
10 chips in a bag: 1 red (left-handed), 9 black (right-handed).
Sample randomly $n$ times from the bag, with replacement, so that each draw has a 10% chance of red.
Calculate $\hat{p}$, the proportion of reds (successes) in the random sample of $n$ chips, and record this value.
Repeat many times.
Calculate the proportion of simulations where $\hat{p}$ is at least as different from 0.10 as the one observed.
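The chip-drawing scheme above can be sketched directly in base R. The class size (`n = 40`) and the observed count of left-handed students (`n_left_obs = 7`) are hypothetical placeholders, since the class data are not given; the seed and number of repetitions are also assumptions.

```r
# Sketch of the by-hand randomization scheme, using made-up class data.
set.seed(2014)     # arbitrary seed, only for reproducibility
p0 <- 0.10         # null value: 10% left-handed
n <- 40            # HYPOTHETICAL class size
n_left_obs <- 7    # HYPOTHETICAL number of left-handed students
p_hat_obs <- n_left_obs / n
n_sims <- 10000    # number of repetitions (assumed)

# The bag: 1 red chip (left-handed) and 9 black chips (right-handed)
bag <- c("red", rep("black", 9))

# Draw n chips with replacement and record the proportion of reds; repeat.
sim_p_hat <- replicate(n_sims, {
  draws <- sample(bag, size = n, replace = TRUE)
  mean(draws == "red")
})

# Two-sided p-value: share of simulations at least as far from p0 as observed.
# The small tolerance guards against floating-point ties.
p_value <- mean(abs(sim_p_hat - p0) >= abs(p_hat_obs - p0) - 1e-12)
cat("simulated two-sided p-value:", p_value, "\n")
```

Replacing the two hypothetical counts with the actual class data gives the simulated p-value for the test of $H_0: p = 0.10$ against $H_A: p \ne 0.10$.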

Single population proportion, small sample

Simulate in R

    # download() is assumed to come from the downloader package;
    # base R's download.file() takes the same url and destfile arguments
    library(downloader)
    download("https://stat.duke.edu/~mc301/r/inference.rdata", destfile = "inference.rdata")
    # loading the file presumably defines the course's inference() function
    load("inference.rdata")
    n_left = [fill in based on class data]
    n_notleft = [fill in based on class data]
    class_hand = c(rep("left", n_left), rep("not left", n_notleft))
    inference(class_hand, success = "left", est = "proportion", type = "ht", null = 0.10, alternative = "twosided", method = "simulation")

Recap

Recap on CLT based methods

Calculating the necessary sample size for a CI with a given margin of error:
If there is a previous study, use $\hat{p}$ from that study
If not, use $\hat{p} = 0.5$:
if you don't know any better, 50-50 is a good guess
$\hat{p} = 0.5$ gives the most conservative estimate (highest possible sample size)

HT vs. CI for a proportion
Success-failure condition:
CI: at least 10 observed successes and failures
HT: at least 10 expected successes and failures, calculated using the null value
Standard error:
CI: calculate using the observed sample proportion: $SE = \sqrt{\hat{p}(1-\hat{p})/n}$
HT: calculate using the null value: $SE = \sqrt{p_0(1-p_0)/n}$

Recap on simulation methods

If the S-F condition is not met:
HT: Randomization test: simulate under the assumption that $H_0$ is true, then find the p-value as the proportion of simulations where the simulated $\hat{p}$ is at least as extreme as the one observed.
CI: Bootstrap interval: resample with replacement from the original sample, and construct the interval using the percentile or standard error method.
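Two of the recap items can be illustrated in a few lines of base R. The sample size rule follows from solving $ME = z^\star \sqrt{p(1-p)/n}$ for $n$, which gives $n = (z^\star)^2\, p(1-p) / ME^2$, rounded up. The function name `sample_size`, the 3% margin of error, and the small left-handedness data set used for the bootstrap are all assumptions made for this sketch.

```r
# Sample size needed for a given margin of error (hypothetical helper).
sample_size <- function(me, p_guess = 0.5, conf_level = 0.95) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)
  ceiling(z_star^2 * p_guess * (1 - p_guess) / me^2)   # always round up
}

n_conservative <- sample_size(0.03)        # no previous study: use 0.5
n_with_prior <- sample_size(0.03, 0.36)    # previous study gave p-hat = 0.36
cat("n with p = 0.5:", n_conservative, " n with p = 0.36:", n_with_prior, "\n")

# Bootstrap percentile interval for a proportion, on HYPOTHETICAL data:
# 6 left-handed and 34 not left-handed students.
set.seed(19)   # arbitrary seed, only for reproducibility
class_hand <- c(rep("left", 6), rep("not left", 34))
boot_p_hat <- replicate(10000, {
  resample <- sample(class_hand, replace = TRUE)   # same size as the original
  mean(resample == "left")
})
boot_ci <- unname(quantile(boot_p_hat, c(0.025, 0.975)))
cat("95% bootstrap percentile interval:", boot_ci, "\n")
```

Using $\hat{p} = 0.5$ yields the larger of the two sample sizes, which is why it is the conservative choice when no earlier estimate is available.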