Lab #11. Variable B. Variable A Y a b a+b N c d c+d a+c b+d N = a+b+c+d

Similar documents
Question. Hypothesis testing. Example. Answer: hypothesis. Test: true or not? Question. Average is not the mean! μ average. Random deviation or not?

One-sample categorical data: approximate inference

Power and sample size calculations

Testing Independence

MA 1125 Lecture 33 - The Sign Test. Monday, December 4, Objectives: Introduce an example of a non-parametric test.

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary

PHP2510: Principles of Biostatistics & Data Analysis. Lecture X: Hypothesis testing. PHP 2510 Lec 10: Hypothesis testing 1

Class 24. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

Statistics in medicine

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Epidemiology Wonders of Biostatistics Chapter 13 - Effect Measures. John Koval

The t-test Pivots Summary. Pivots and t-tests. Patrick Breheny. October 15. Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/18

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

POLI 443 Applied Political Research

Two-sample inference: Continuous data

Practice Questions: Statistics W1111, Fall Solutions

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)

BIO5312 Biostatistics Lecture 6: Statistical hypothesis testings

Chapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides

Quantitative Analysis and Empirical Methods

You may use your calculator and a single page of notes.

Mock Exam - 2 hours - use of basic (non-programmable) calculator is allowed - all exercises carry the same marks - exam is strictly individual

We know from STAT.1030 that the relevant test statistic for equality of proportions is:

Chapter 12 - Lecture 2 Inferences about regression coefficient

Difference between means - t-test /25

Physics 509: Non-Parametric Statistics and Correlation Testing

Exam 2 (KEY) July 20, 2009

Two-sample inference: Continuous data

79 Wyner Math Academy I Spring 2016

Introduction to the Analysis of Tabular Data

HYPOTHESIS TESTING. Hypothesis Testing

Statistics: CI, Tolerance Intervals, Exceedance, and Hypothesis Testing. Confidence intervals on mean. CL = x ± t * CL1- = exp

Comparing p s Dr. Don Edwards notes (slightly edited and augmented) The Odds for Success

Epidemiology Principle of Biostatistics Chapter 14 - Dependent Samples and effect measures. John Koval

1 Hypothesis testing for a single mean

Welcome! Webinar Biostatistics: sample size & power. Thursday, April 26, 12:30 1:30 pm (NDT)

A proportion is the fraction of individuals having a particular attribute. Can range from 0 to 1!

ECE 510 Lecture 6 Confidence Limits. Scott Johnson Glenn Shirley

Lecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015

Class 19. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

Probability theory and inference statistics! Dr. Paola Grosso! SNE research group!! (preferred!)!!

MAT 2379, Introduction to Biostatistics, Sample Calculator Questions 1. MAT 2379, Introduction to Biostatistics

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing

An introduction to biostatistics: part 1

Measures of Association and Variance Estimation

Review for Final Exam Stat 205: Statistics for the Life Sciences

Goodness of Fit Goodness of fit - 2 classes

Hypothesis testing. Data to decisions

Lecture 24. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

# of 6s # of times Test the null hypthesis that the dice are fair at α =.01 significance

Chapter 9 Inferences from Two Samples

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

ph: 5.2, 5.6, 5.8, 6.4, 6.5, 6.8, 6.9, 7.2, 7.5 sample mean = sample sd = sample size, n = 9

Discrete Multivariate Statistics

Ch18 links / ch18 pdf links Ch18 image t-dist table

9-6. Testing the difference between proportions /20

Lecture 28 Chi-Square Analysis

Lecture 22. December 19, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

CHAPTER 10 Comparing Two Populations or Groups

Ch. 11 Inference for Distributions of Categorical Data

Two Sample Problems. Two sample problems

6/25/14. The Distribution Normality. Bell Curve. Normal Distribution. Data can be "distributed" (spread out) in different ways.

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling

Chapter 12: Inference about One Population

Mean Vector Inferences

6.3 Use Normal Distributions. Page 399 What is a normal distribution? What is standard normal distribution? What does the z-score represent?

PB HLTH 240A: Advanced Categorical Data Analysis Fall 2007

Chapters 4-6: Inference with two samples Read sections 4.2.5, 5.2, 5.3, 6.2

STAT Chapter 8: Hypothesis Tests

This is a multiple choice and short answer practice exam. It does not count towards your grade. You may use the tables in your book.

Econ 325: Introduction to Empirical Economics

TA: Sheng Zhgang (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan (W 1:20) / 346 (Th 12:05) FINAL EXAM

Nominal Data. Parametric Statistics. Nonparametric Statistics. Parametric vs Nonparametric Tests. Greg C Elvers

The Chi-Square Distributions

LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2

Test statistic P value Reject/fail to reject. Conclusion:

Chapter 11. Hypothesis Testing (II)

Are the Digits in a Mersenne Prime Random? A Probability Model

Homework Exercises. 1. You want to conduct a test of significance for p the population proportion.

Statistics 251: Statistical Methods

Regression With a Categorical Independent Variable: Mean Comparisons

Inferences About Two Population Proportions

Two-sample Categorical data: Testing

Econometrics. 4) Statistical inference

SMAM 314 Exam 3 Name. F A. A null hypothesis that is rejected at α =.05 will always be rejected at α =.01.

Chapter 7: Hypothesis Testing

Inferences About Two Proportions

Mathematical Statistics

Continuous case Discrete case General case. Hazard functions. Patrick Breheny. August 27. Patrick Breheny Survival Data Analysis (BIOS 7210) 1/21

Answer keys for Assignment 10: Measurement of study variables (The correct answer is underlined in bold text)

Two-sample Categorical data: Testing

Comparing Means from Two-Sample

68% 95% 99.7% x x 1 σ. x 1 2σ. x 1 3σ. Find a normal probability

Inferences for Correlation

Chapter 9: Sampling Distributions

Data Processing Techniques

PubHlth Intermediate Biostatistics Spring 2015 Exam 2 (Units 3, 4 & 5) Study Guide

Optional Stopping Theorem Let X be a martingale and T be a stopping time such

Transcription:

BIOS 4120: Introduction to Biostatistics Breheny Lab #11 We will explore observational studies in today s lab and review how to make inferences on contingency tables. We will only use 2x2 tables for today s lab. Then we will review for the quiz. 1. χ 2 and Fisher s Exact Test Review The χ 2 test and Fisher s Exact Test both do the same thing: they test whether or not two groups are dependent. As in, does being in one group for variable A affect whether you will be in a certain group for variable B? The χ 2 test is approximate while the Fisher s Exact test is exact. Variable B Y N Variable A Y a b a+b N c d c+d a+c b+d N = a+b+c+d The χ 2 test works when all expected cell counts are not near zero, since it is an approximate test. Usually the Fisher s Exact Test results are about the same as the χ 2 test for 2x2 contingency tables; however, you must make sure expected cell counts under the null hypothesis of independence is not near zero (about greater than 5). 2. Odds Ratio and Relative Risk Refer to the table above for this section. The odds of an event is the ratio of the number of times the event occurs to the number of times the event fails to occur. The odds for an event for the first row and second row are a/b and c/d, respectively. Since the odds ratio is literally the ratio of two odds we have: OR = a b c d = ad bc What does an odds ratio of 1 mean?

The risk of an event is the probability of an event occurring. Therefore, the relative risk is the ratio of the risks for two groups. RR = a/(a + b) c/(c + d) Estimated x% confidence interval for the true log(odds Ratio) of the population of interest: log( OR ) ± Z x% 1 + 1 + 1 + 1 a b c d NOTE: This is the confidence interval for the log(odds Ratio). So we will have to exponentiate it to get a confidence interval for the actual Odds Ratio. X% Confidence interval: exp(lower bound for log OR), exp(upper bound for log OR). 3. Example Caesarian Delivery EFM Exposure Yes No Total Yes 358 229 587 No 2492 2745 5237 Total 2850 2974 5824 a) What is the estimated relative risk? RR = (358/587)/(2492/5237) = 1.282 b) What is the estimated odds ratio? (358*2745)/(229*2492) = 1.72 c) Find the 95% confidence interval for the true odds ratio. Is one inside of your interval? What do you conclude? log(1.72) +/- 1.96*sqrt(1/358+1/229+1/2492+1/2745) = (0.37, 0.72). exp(0.37), exp(0.72) = (1.45, 2.05)

Quiz 3 Review P(X = k) = ( n k )pk (1 p) n k BINOMIAL DISTRIBUTION Uses: This is for EXACT probability. This finds the probability, given n, k, and p, of observing EXACTLY k events. Useful note: P(X = 0) + P(X = 1) + +P(X = n) = 1. Let 1 represent this space. For at least one event: 1 P(X = 0) = P(X = 1) + P(X = 2) + +P(X = n) P(X k) = P(X = 0) + P(X = 1) P(X = k) Z = p p 0 p 0 (1 p 0 ) n APPROXIMATE NORMAL FOR BINOMIAL DISTRIBUTION Uses: for hypothesis testing whether the true proportion is equal to p 0. Under the null hypothesis, this statistic follows an approximate normal distribution. x μ Z = σ NORMAL DISTRIBUTION Uses: For finding probabilities/proportions in normally distributed data T = x μ 0 s/ n STUDENT S T DISTRIBUTION Uses: testing whether the true mean of a population is equal to µ 0.

T = x 0 d s d / n Paired data: Paired T-test Uses: Testing if the true mean difference between paired data is equal to zero. Uses the magnitude of the difference and is generally more powerful than the binomial test. 1) Record how many pairs had one treatment that did better than the other. Count as successes. 2) Find the probability of observing data as extreme or more extreme, assuming the null hypothesis to be p = 0.5. Paired data: Binomial Test Uses: Assuming 50-50 chance of a treatment being better than another, treat as binomial distribution with probability = 0.5. Does not take into effect the magnitude of the difference of treatments. POWER: probability of rejecting the null hypothesis when the null hypothesis is in reality false. Affected by sample size, effect size, and variability. CONFIDENCE INTERVALS p ± Z X% p (1 p ) n x ± T x%,df s/ n These are only a select few of the topics covered for the material for the quiz on Thursday. Please review all lecture notes and homework solutions, especially the procedure slides in the lecture notes.

PRACTICE PROBLEMS 1. A company owns 400 laptops. Each laptop has an 8% probability of not working. You randomly select 20 laptops for your salespeople. (a) What is the probability that 5 will be broken? P(X = 5) = ( 20 5 )0.085 (0.92) 15 = 0.015 (b) What is the probability that they will all work? P(X = 0) = ( 20 0 )0.080 (0.92) 20 = 0.19 (c) What is the probability of at most two not working? P(X <= 2) = ( 20 0 )0.080 (0.92) 20 + ( 20 1 )0.081 (0.92) 19 + ( 20 2 )0.082 (0.92) 18 = 0.79 2. We are interested in testing whether a certain at-risk population for diabetes has a daily sugar intake that is equal to the general population, which is equal to 77 grams/day. A sample of size 37 was taken from this at-risk population, and we obtained a sample mean of 80 and sample standard deviation of 11 grams. (a) Perform a hypothesis test to test whether this population has a significantly different mean sugar intake from 77 grams. 80 77 T = = 1.66, df = 36. Use table at df = 35. Pvalue between 0.15 and 0.1. Probably (11/ 37) about 0.12. Exact p-value is: > (1-pt(1.66, df =36))*2 [1] 0.1056019 We fail to reject the null hypothesis and cannot conclude this at-risk population mean sugar intake is different from the general population mean sugar intake (p = 0.11) 3. The distribution of LDL cholesterol levels in a certain population is approximately normal with mean 90 mg/dl and standard deviation 8 mg/dl. (a) What is the probability an individual will have a LDL cholesterol level above 95 mg/dl? Z = (95-90)/8 = 0.625. P(Z > 0.625) = 0.27

(b) Suppose we have a sample of 10 people from this population. What is the probability of exactly 3 of them being above 95 mg/dl? P(X = 3) = ( 10 3 )0.273 (0.73) 7 = 0.261 (c) Take the sample of size from part b. What is the probability that their sample mean will be above 95 mg/dl? Z = 95 90 = 1.98 P(Z>1.98) = 0.024 8/ 10 (d) Suppose we take 5samples of size 10 from the population. What is the probability that at least one of the sample means will be greater than 95 mg/dl? P(X >= 1) = 1 - P(X = 0) = 1 - ( 5 0 )0.0240 (0.976) 5 = 0.11 4. A study indicates that 4% of American teenagers have tattoos. You randomly sample 30 teenagers. What is the probability that exactly 3 will have a tattoo? P(X = 3) = ( 30 3 )0.043 (0.96) 27 = 0.09 5. An XYZ cell phone is made from 55 components. Each component has a.002 probability of being defective. What is the probability that at least one XYZ cell phone will not work perfectly? P(X >= 1) = 1 - P(X = 0) = 1 - ( 55 0 )0.0020 (0.998) 55 = 0.104 6. In the following scenarios, identify what will happen to the power of a hypothesis test: (a) We increase the sample size. Power increases (b) The standard deviation of the sample is larger than what we expected. Power decreases (c) Our effect size moves from 5 units to 10 units. Power increases