Understanding Dissimilarity Among Samples

Announcements: The midterm is Wednesday. A review sheet is on the class webpage (in the list of lectures) and will be covered in discussion on Monday. Two sheets of notes are allowed, with the same rules as for the one sheet last time. Office hours today, Monday, and Tuesday are slightly revised from usual; see the webpage. Homework (due Monday): Chapter 9: #50 (each part counts for 1 point, so the problem is worth 6 points).

Chapter 9, Sections 4, 5, 9
Sampling Distributions for Proportions: One proportion or difference in two
Copyright 2004 Brooks/Cole, a division of Thomson Learning, Inc., updated by Jessica Utts, Nov 2010

Understanding Dissimilarity Among Samples
Key: We need to understand what kind of dissimilarity we should expect to see in various samples from the same population. Suppose we knew that most samples were likely to provide an answer within 10% of the population answer. Then we would also know that the population answer should be within 10% of whatever our specific sample gave. So we would have a good guess about the population value based on just one sample value.

Statistics and Parameters
A statistic is a numerical value computed from a sample. Its value may differ for different samples, e.g. the sample mean $\bar{x}$, the sample standard deviation $s$, and the sample proportion $\hat{p}$.
A parameter is a numerical value associated with a population. It is considered fixed and unchanging, e.g. the population mean $\mu$, the population standard deviation $\sigma$, and the population proportion $p$.

Sampling Distributions
Each time a new sample is taken, the sample statistic will change. The distribution of possible values of a statistic for repeated samples of the same size from a population is called the sampling distribution of the statistic. Many statistics of interest have sampling distributions that are approximately normal.

9.4 Sampling Distribution for One Sample Proportion
Suppose (unknown to us) 40% of a population carry the gene for a disease ($p = 0.40$). We will take a random sample of 25 people from this population and count $X$ = number with the gene. Although we expect on average to find 10 people (40%) with the gene, we know the number will vary for different samples of $n = 25$. In this case, $X$ is a binomial random variable with $n = 25$ and $p = 0.4$.
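To see this sample-to-sample dissimilarity concretely, the sketch below simulates the gene example: it draws many random samples of $n = 25$ from a population with $p = 0.40$ and records each sample proportion. It uses only Python's standard library; the seed, the number of repetitions (10,000), and the function name `simulate_phats` are illustrative assumptions, not part of the lecture.

```python
import random
import statistics

def simulate_phats(p, n, reps, seed=0):
    """Return `reps` sample proportions, each from n independent Bernoulli(p) trials."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    phats = []
    for _ in range(reps):
        x = sum(1 for _ in range(n) if rng.random() < p)  # one draw of X ~ Binomial(n, p)
        phats.append(x / n)
    return phats

phats = simulate_phats(p=0.40, n=25, reps=10_000)
print(phats[:4])                           # a few individual sample proportions
print(round(statistics.mean(phats), 3))    # should be close to p = 0.40
print(round(statistics.pstdev(phats), 3))  # should be close to sqrt(0.4 * 0.6 / 25), about 0.098
```

Individual sample proportions scatter around 0.40, while their average sits very near $p$ and their spread matches $\sqrt{p(1-p)/n}$ from the Normal Curve Approximation Rule for Sample Proportions.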

Many Possible Samples
Four possible random samples of 25 people:
Sample 1: X = 12, proportion with gene = 12/25 = 0.48 or 48%.
Sample 2: X = 9, proportion with gene = 9/25 = 0.36 or 36%.
Sample 3: X = 10, proportion with gene = 10/25 = 0.40 or 40%.
Sample 4: X = 7, proportion with gene = 7/25 = 0.28 or 28%.
Note: Each sample gave a different answer, and the answer did not always match the population value of 40%. Although we cannot determine whether any one sample statistic will accurately estimate the true population parameter, statisticians have determined probabilities for how far from the truth the sample values could be.

The Normal Curve Approximation Rule for Sample Proportions
Let $p$ = the population proportion of interest, or the binomial probability of success.
Let $\hat{p}$ = the sample proportion, or the proportion of successes.
If numerous random samples or repetitions of the same size $n$ are taken, the distribution of possible values of $\hat{p}$ is approximately a normal curve distribution with mean $= p$ and standard deviation $\text{s.d.}(\hat{p}) = \sqrt{\dfrac{p(1-p)}{n}}$.
This approximate distribution is the sampling distribution of $\hat{p}$.

Ex: A medicine cures 60% of patients. Sample 200 people, and let $\hat{p}$ = proportion of the sample cured. The sampling distribution for $\hat{p}$ is approximately normal with mean $= p = .60$ and standard deviation $\sqrt{\dfrac{(.4)(.6)}{200}} = .0346$. From the Empirical Rule, we expect 95% of samples to produce a $\hat{p}$ in the interval mean ± 2 s.d., i.e. $.60 \pm 2(.0346)$, or $.60 \pm .07$, or .53 to .67.
[Figure: Sampling distribution of $\hat{p}$ for $n = 200$, $p = .6$: normal with mean 0.6 and SD 0.0346; the middle 95% of possible $\hat{p}$ values lies between 0.53 and 0.67.]

The Normal Curve Approximation Rule for Sample Proportions (continued)
The Normal Approximation Rule can be applied in two situations:
Situation 1: A random sample is taken from a population.
Situation 2: A binomial experiment is repeated numerous times.
In each situation, three conditions must be met:
Condition 1 (The Physical Situation): There is an actual population or repeatable situation.
Condition 2 (Data Collection): A random sample is obtained, or the situation is repeated many times.
Condition 3 (The Size of the Sample or Number of Trials): The size of the sample or number of repetitions is relatively large; $np$ and $n(1-p)$ must be at least 5 and preferably at least 10.

How well does the approximation work? It depends on $n$ and $p$. Try this applet: http://bcs.whfreeman.com/pbs/cat_050/pbs/clt-binomial.html
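The rule and Condition 3 can be checked numerically. The Python sketch below (standard library only; all function names are illustrative) computes $\text{s.d.}(\hat{p})$, the Empirical Rule interval for the medicine example, and the $np$ / $n(1-p)$ check. As one way to see how well the approximation works for a smaller sample, it also compares the exact binomial probability $P(\hat{p} \le 0.28)$ in the gene example ($n = 25$, $p = 0.40$) with the normal-curve value; that comparison is our own illustration, not part of the lecture.

```python
import math

def sd_phat(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def empirical_rule_95(p, n):
    """Interval p +/- 2 s.d.(p-hat), expected to hold about 95% of sample proportions."""
    sd = sd_phat(p, n)
    return p - 2 * sd, p + 2 * sd

def normal_approx_ok(p, n, minimum=5):
    """Condition 3: n*p and n*(1-p) both at least `minimum` (5 required, 10 preferred)."""
    return n * p >= minimum and n * (1 - p) >= minimum

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

# Medicine example: p = 0.60, n = 200
print(round(sd_phat(0.60, 200), 4))                         # 0.0346
print([round(v, 2) for v in empirical_rule_95(0.60, 200)])  # [0.53, 0.67]
print(normal_approx_ok(0.60, 200, minimum=10))              # True

# Gene example: exact binomial vs. normal curve for P(p-hat <= 0.28)
n, p = 25, 0.40
exact = binom_cdf(7, n, p)                        # X <= 7 is the same event as p-hat <= 0.28
approx = normal_cdf((0.28 - p) / sd_phat(p, n))   # normal curve, no continuity correction
print(round(exact, 3), round(approx, 3))
```

With $n = 25$ the exact and approximate probabilities differ noticeably; here $np = 10$ sits right at the preferred minimum, which is why Condition 3 asks for large $np$ and $n(1-p)$.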

Examples for which the Rule Applies
Polls: to estimate the proportion who favor a candidate; units = all voters.
Television Ratings: to estimate the proportion of households watching a TV program; units = all households with TV.
Consumer Preferences: to estimate the proportion of consumers who prefer a new recipe compared with the old; units = all consumers.
Testing ESP: to estimate the probability a person can successfully guess which of 5 symbols is on a hidden card; repeatable situation = a guess.

Example: Belief in evolution
Gallup Poll, Feb. 6-7, 2009, n = 1,018 adults nationwide. Margin of error given as +/-3%.
"Now, thinking about another historical figure: Can you tell me with which scientific theory Charles Darwin is associated?" (Options rotated)
Correct response (Evolution, natural selection, etc.): 55%
Incorrect response: 10%
Unsure/don't know: 34%
No answer: 1%

Example, continued
"In fact, Charles Darwin is noted for developing the theory of evolution. Do you, personally, believe in the theory of evolution, do you not believe in evolution, or don't you have an opinion either way?" (Poll based on n = 1018 adults)
Believe in evolution: 39%
Do not believe in evolution: 25%
No opinion either way: 36%
No answer: 1%

Example, continued
Let $p$ = population proportion who believe in evolution. Our observed $\hat{p} = .39$, from a sample of 1018. Based on samples of $n = 1018$, $\hat{p}$ comes from a distribution of possible values that is approximately normal with mean $\mu = p$ and standard deviation $\sigma = \sqrt{\dfrac{p(1-p)}{1018}}$. Based on this, can we use $\hat{p}$ to estimate $p$?

Estimating the Population Proportion from a Single Sample Proportion
In practice, we don't know the true population proportion $p$, so we cannot compute the standard deviation of $\hat{p}$, $\text{s.d.}(\hat{p}) = \sqrt{\dfrac{p(1-p)}{n}}$.
In practice, we take only one random sample, so we have only one sample proportion $\hat{p}$. Replacing $p$ with $\hat{p}$ in the standard deviation expression gives an estimate called the standard error of $\hat{p}$:
$\text{s.e.}(\hat{p}) = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$.
If $\hat{p} = 0.39$ and $n = 1018$, then the standard error is 0.0153. So the true proportion who believe in evolution is almost surely between $0.39 - 3(0.0153) = 0.344$ and $0.39 + 3(0.0153) = 0.436$.

Independent Samples
Two samples are called independent samples when the measurements in one sample are not related to the measurements in the other sample. Independent samples arise when:
- Random samples are taken separately from two populations and the same response variable is recorded.
- One random sample is taken and a variable recorded, but units are categorized to form two populations.
- Participants are randomly assigned to one of two treatment conditions, and the same response variable is recorded.
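Returning to the evolution poll, the standard error and the ±3 s.e. range can be reproduced with a few lines of Python. This is a minimal sketch; the function name `se_phat` is an illustrative choice, and the multiplier 3 simply mirrors the "almost surely" range used for the poll.

```python
import math

def se_phat(phat, n):
    """Standard error of a sample proportion: sqrt(p-hat(1 - p-hat)/n)."""
    return math.sqrt(phat * (1 - phat) / n)

phat, n = 0.39, 1018   # Gallup, Feb. 2009: 39% believe in evolution
se = se_phat(phat, n)
low, high = phat - 3 * se, phat + 3 * se

print(round(se, 4))                   # 0.0153
print(round(low, 3), round(high, 3))  # 0.344 0.436
```

Doubling the standard error gives about 0.031, in line with the +/-3% margin of error reported with the poll.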

Parameter 2: Difference in two population proportions, based on independent samples
Example research questions:
- How much difference is there between the proportions that would quit smoking if wearing a nicotine patch versus if wearing a placebo patch?
- How much difference is there between the proportions of UCI students and UC Davis students who are an only child?
- Were the proportions believing in evolution the same in 1994 and 2005?
Population parameter: $p_1 - p_2$ = difference between the two population proportions.
Sample estimate: $\hat{p}_1 - \hat{p}_2$ = difference between the two sample proportions.

Sampling distribution for the difference in two proportions
Approximately normal.
Mean is $p_1 - p_2$ = true difference in the population proportions.
Standard deviation of $\hat{p}_1 - \hat{p}_2$ is
$\text{s.d.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$.

Ex: Two drugs have cure rates of 60% and 65%. What is the probability that Drug 1 will cure more in the sample than Drug 2 if we sample 200 people taking each drug? We want $P(\hat{p}_1 - \hat{p}_2 > 0)$.
The sampling distribution for $\hat{p}_1 - \hat{p}_2$ is approximately normal with mean $= .60 - .65 = -.05$ and
$\text{s.d.} = \sqrt{\dfrac{(.6)(.4)}{200} + \dfrac{(.65)(.35)}{200}} = .048$.
[Figure: Sampling distribution for the difference in proportions cured (Drug 1 − Drug 2), 200 in each sample: normal with mean −0.05 and SD 0.048; the area above 0, i.e. $P(\hat{p}_1 - \hat{p}_2 > 0)$, is 0.1488.]

General format for all sampling distributions in Chapter 9
The sampling distribution of the sample estimate (the sample statistic) is:
- Approximately normal.
- Mean = population parameter.
- Its standard deviation is called the standard deviation of ___, where the blank is filled in with the name of the statistic (p-hat, x-bar, etc.).
- The estimated standard deviation is called the standard error of ___.

Standard Error of the Difference Between Two Sample Proportions
$\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

Are more UCI than UCD children an only child?
$n_1 = 358$ (UCI, classes combined), $n_2 = 173$ (UCD).
UCI: 40 of the 358 students were an only child, so $\hat{p}_1 = .112$.
UCD: 14 of the 173 students were an only child, so $\hat{p}_2 = .081$.
So $\hat{p}_1 - \hat{p}_2 = .112 - .081 = .031$ and
$\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{.112(.888)}{358} + \dfrac{.081(.919)}{173}} = .0266$.
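Both two-sample calculations, the drug probability and the only-child standard error, can be sketched in Python with the standard library only. The helper `sd_diff` (an illustrative name) returns the standard deviation when true proportions are supplied and the standard error when sample proportions are supplied; `normal_cdf` is built on `math.erf`. Because the s.d. is kept unrounded, the drug probability may differ in the third decimal from a hand calculation based on .048.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def sd_diff(p1, n1, p2, n2):
    """s.d. of p-hat1 - p-hat2 for independent samples
    (the s.e. when sample proportions are plugged in)."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Two drugs: true cure rates 0.60 and 0.65, 200 patients per drug
mean = 0.60 - 0.65
sd = sd_diff(0.60, 200, 0.65, 200)
prob_drug1_ahead = 1 - normal_cdf((0 - mean) / sd)  # P(p-hat1 - p-hat2 > 0)
print(round(sd, 3), round(prob_drug1_ahead, 2))     # 0.048 0.15

# Only-child example: sample proportions, so the result is a standard error
phat1, phat2 = 40 / 358, 14 / 173
se = sd_diff(phat1, 358, phat2, 173)
print(round(phat1 - phat2, 3), round(se, 4))        # 0.031 0.0266
```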

Suppose the population proportions are the same, so the true difference is $p_1 - p_2 = 0$.
Then the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is:
- Approximately normal.
- Mean = population parameter = 0.
- The estimated standard deviation is .0266.
The observed difference of .031 is $z = .031/.0266 \approx 1.17$ standard errors above the mean of 0. The area above .031 is about .12.
[Figure: Density of the possible values of the difference in sample proportions, normal with mean 0 and SD ≈ 0.026; the shaded area above the observed difference 0.031 is about 0.12.]

Standardized Statistics for sampling distributions
Recall the general form for standardizing a random variable $x$ when it has a normal distribution:
$z = \dfrac{x - \mu}{\sigma}$
For all 5 parameters we will consider, we can find where our observed sample statistic falls if we hypothesize a specific number for the population parameter:
$z = \dfrac{\text{sample statistic} - \text{population parameter}}{\text{s.d.(sample statistic)}}$

Example: Do college students watch less TV?
In general, there isn't much correlation between age and hours of TV per day. In the 2008 General Social Survey (very large $n$), 73% watched 2 or more hours per day, so assume the population proportion is .73. In a sample of 175 college students (at Penn State), 105 said they watched 2 or more hours per day. Is it likely that the population proportion for students is also .73?
$\hat{p} = \dfrac{105}{175} = .60$, $\quad \text{s.d.}(\hat{p}) = \sqrt{\dfrac{0.73(1-0.73)}{175}} = 0.0336$, $\quad z = \dfrac{.60 - .73}{.0336} = -3.87$.
This z-score is too small! The area below it is about .00005. Students are different from the general population.

Case Study 9.1: Do Americans Really Vote When They Say They Do?
Election of 1994: In a Time Magazine poll of $n = 800$ adults (two days after the election), 56% reported that they had voted. Information from the Committee for the Study of the American Electorate: only 39% of American adults had voted.
If $p = 0.39$, then sample proportions for samples of size $n = 800$ should vary approximately normally with mean $= p = 0.39$ and
$\text{s.d.}(\hat{p}) = \sqrt{\dfrac{0.39(1-0.39)}{800}} = 0.017$.
If respondents were telling the truth, the sample percent should be no higher than $39\% + 3(1.7\%) = 44.1\%$, nowhere near the reported percentage of 56%.
If 39% of the population voted, the standardized score for the reported value of 56% is
$z = \dfrac{0.56 - 0.39}{0.017} = 10.0$.
It is virtually impossible to obtain a standardized score of 10.
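The standardized-statistic formula is the same for every example in this section, so one helper covers the TV example, the voting case study, and the only-child comparison. A minimal Python sketch using only the standard library; `z_score`, `sd_phat`, and `normal_cdf` are illustrative names, and the normal CDF is computed with `math.erf`. The only-child line plugs in the rounded values .031 and .0266 rather than the raw counts.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_score(statistic, parameter, sd):
    """Standardized statistic: (sample statistic - population parameter) / s.d."""
    return (statistic - parameter) / sd

def sd_phat(p, n):
    """s.d. of p-hat under a hypothesized population proportion p."""
    return math.sqrt(p * (1 - p) / n)

# TV example: hypothesized p = 0.73, n = 175, observed 105/175
z_tv = z_score(105 / 175, 0.73, sd_phat(0.73, 175))
print(round(z_tv, 2), normal_cdf(z_tv))   # z about -3.87; tiny lower-tail area

# Voting case study: hypothesized p = 0.39, n = 800, reported 0.56
z_vote = z_score(0.56, 0.39, sd_phat(0.39, 800))
print(round(z_vote, 1))                   # about 9.9 with the unrounded s.d.

# Only-child example: hypothesized difference 0, observed .031, s.e. .0266
z_only = z_score(0.031, 0.0, 0.0266)
print(round(z_only, 2), round(1 - normal_cdf(z_only), 2))
```

With the s.d. left unrounded, the voting z-score is about 9.9 rather than exactly 10; either way it is far beyond any plausible value.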