Chapter 10. Discrete Data Analysis
|
|
- Katherine Perkins
- 5 years ago
- Views:
Transcription
1 Chapter 1. Discrete Data Analysis 1.1 Inferences on a Population Proportion 1. Comparing Two Population Proportions 1.3 Goodness of Fit Tests for One-Way Contingency Tables 1.4 Testing for Independence in Two-Way Contingency Tables 1.5 Supplementary Problems NIPRL
2 1.1 Inferences on a Population Proportion Population Proportion p with characteristic Random sample of size n With characteristic Without characteristic Cell probability p Cell probability 1-p Cell frequency x Cell frequency n-x Sample proportion µ p X ~ B( n, p) µ x p = n E( µ p) = p and Var( µ p) = For large n, µ æ p(1 p) ö p ~ N - ç p, çè n ø µ p- p ~ N(,1) p(1 - p) n p(1 - p) n NIPRL
3 1.1.1 Confidence Intervals for Population Proportions µ p- p Since ~ N(,1) for large n, two-sided 1- a confidence interval is p(1 - p) n æ µ p(1 - p) µ p(1 - p) ö p Î p z /, p z ç - a + a / çè n n ø µ x µ p(1 - p) where p =, se..( p) =. n n µ µ p (1 - µ p ) 1 xn ( - x ) But it is customary that se..( p) = =. n n n Two-Sided Confidence Intervals for a Population Proportion X ~ Bn (, p) Approximate two-sided 1- a confidence interval for p is æ µ za / xn ( - x) µ za / xn ( - x) ö p Î p, p ç - +. çè n n n n ø The approximation is reasonable as long as both x and n - x are larger than 5. NIPRL 3
4 1.1.1 Confidence Intervals for Population Proportions Example 55 : Building Tile Cracks Random sample n = 15 of tiles in a certain group of downtown building for cracking. x = 98 are found to be cracked. µ x 98 p = = =.784, z.5 =.576 n 15 99% two-sided confidence interval æ µ za / xn ( - x) µ za / xn ( - x) ö p Î p, p ç - + çè n n n n ø æ (15-98) (15-98) ö =.784,.784 ç - + çè ø = (.588,.98)» (.6,.1) Total number of cracked tiles is between.6 5,, = 3, and.1 5,, = 5, NIPRL 4
5 1.1.1 Confidence Intervals for Population Proportions µ p - p Since ~ N(,1) for large n, one-sided 1- a confidence interval p(1 - p) n æ µ p(1 - p) ö for a lower bound is p Î p za,1 ç -. çè n ø One-Sided Confidence Intervals for a Population Proportion X ~ Bn (, p) Approximate two-sided 1- a confidence interval for p is æ µ za xn ( - x) ö æ µ za xn ( - x) ö p Î p,1 and p, p - Î -. ç n n è ø çè n n ø The approximation is reasonable as long as both x and n - x are larger than 5. NIPRL 5
6 1.1. Hypothesis Tests on a Population Proportion Two-Sided Hypothesis Tests H : p = p versus H : p ¹ p A pvalue - = P( X ³ x) if µ p = x / n > p pvalue - = P( X x) if µ p = x / n < p where X ~ Bn (, p ). When np and n(1 - p ) are both larger than 5, a normal approximation can be used to give the p-value of pvalue - = F (- z ) where F ( ) is the standard normal c.d.f. and µ p- p x- np z = =. p(1 - p) np(1 - p) n NIPRL 6
7 1.1. Hypothesis Tests on a Population Proportion Continuity Correction In order to improve the normal approximation the value in the numerator of the z-statistic x- np -.5 may be used when x- np > may x np x np be used when - <.5. A size a hypothesis test accepts H when and rejects H when z z or p - value ³ a / z > z or p - value < a. a / a NIPRL 7
8 1.1. Hypothesis Tests on a Population Proportion One-Sided Hypothesis Tests for a Population Proportion H : p ³ p versus H : p < p A pvalue - = P( X x) where X ~ Bn (, p ). - The normal approximation to this is x- np + - ( ) and. pvalue = F z z =.5 np (1- p ) - Accept H if z ³ - z and reject H if z < - z. a a For H : p < p versus H : p ³ p, A pvalue - = P( X ³ x) where X ~ Bn (, p ). - The normal approximation to this is x- np -.5 pvalue - = 1 - F ( z) and z =. np (1- p ) a - Accept H if z - z and reject H if z > - z. a NIPRL 8
9 1.1. Hypothesis Tests on a Population Proportion Example 55 : Building Tile Cracks 1% or more of the building tiles are cracked? H : p ³.1 versus H A : p <.1 n = 15, x = 98, p =.1. z x- np +.5 = = - np (1- p ).5. pvalue - = F (-.5) =.6. z = -.5 NIPRL 9
10 1.1.3 Sample Size Calculations The most convenient way to assess the amount of precision afforded by a sample size n is to consider L of two-sided confidence interval for p. µ µ p(1 - p) 4 z µ µ a / p(1 - p) L = za / Þ n =. n L Since µ p (1 - µ p ) is unknown, if we take µ p (1 - µ p ) the largest, za / µ 1 n = and p =. L However, if µ p is far from.5, we can take µ p = p* from prior information for p and n 4 z p *(1- p* ).(either p* <.5 or p* >.5) L a / ; NIPRL 1
11 1.1.3 Sample Size Calculations Example 59 : Political Polling To determine the proportion p of people who agree with the statement The city mayor is doing a good job. within 3% accuracy. (-3% ~ +3%), how many people do they need to poll? Construct a 99% confidence interval for p with a length no large thanl = 6% ( µ p - 3% ~ µ p + 3%) za /.576 Þ n = = = ( a =.1) L.6 Þ representative sample : at least 1844 If "1 respondents, ± 3% sampling error", Þ z = nl = 1.6 = 1.9, F (- 1.9) =.87. Þ a a / ;.57 Þ 95% confidence interval NIPRL 11
12 1.1.3 Sample Size Calculations Example 55 : Building Tile Cracks Recall a 99% confidence interval for p is (.588,.98). We need to know p within 1% with 99% confidence.( L = %) Based on the above interval, let p* =.1=.5. 4 za / p*(1 - p*) Þ n ; = 597. or about 6. L Þ representative sample : 475 (initial sample = 15) After the secondary sampling, n = 6 andx = = 46. µ 46 p = =.677 and 6 æ µ za / xn ( - x) µ za / xn ( - x) ö p Î p, p ç - + çè n n n n ø = (.593,.761) NIPRL 1
13 1. Comparing Two Population Proportions Comparing two population proportions Assume X ~ Bn (, p ), Y ~ B( m, p ) and X ^ Y. A µ x µ y µ µ x y pa = and pb = Þ pa - pb = -. n m n m µ µ µ µ pa(1 - pa) pb(1 - pb ) Var( pa - pb) = Var( pa) + Var( pb) = +. n m ( µ p µ A - pb) - ( pa - p µ µ B ) ( pa - pb )- ( pa - pb) z = =. µ p (1 µ ) µ (1 µ ) ( ) ( ) A - pa pb - p xn- x y m- y B n m n m If H : p = p, the pooled estimate is A B µ µ µ x + y pa - pb p = and z =. n + m µ µ æ1 1 ö p(1 - p) ç + çè n m ø B NIPRL 13
14 1..1 Confidence Intervals for the Difference Between Two Population Proportions Confidence Intervals for the Difference Between Two Population Proportions Assume X ~ Bn (, p ), Y ~ B( m, p ) and X ^ Y. Approximate two-sided 1-a A B confidence level confidence interval is æ µ µ µ p µ µ µ µ µ µ µ ö A(1 - pa) pb(1 - pb) pa - pb Î pa - pb - z µ µ pa(1 - pa) pb(1 - pb ) a /, pa pb z a / + n m n m ç è ø æ µ µ xn ( - x) y( m- y) µ µ xn ( - x) y( m- y) ö = pa pb za /, p 3 3 A pb z ç a / çè n m n m ø Approximate one-sided 1-a confidence level confidence interval is æ µ µ xn ( - x) y( m- y) ö pa - pb Î pa pb z 3 3, 1 ç - - a + and çè n m ø æ µ µ xn ( - x) y( m- y) ö pa - pb Î 1, pa pb z ç a çè n m ø The approximations are reasonable as long as x, n - x, y andm -y are all larger than 5. NIPRL 14
15 1..1 Confidence Intervals for the Difference Between Two Population Proportions Example 55 : Building Tile Cracks Building A : 46 cracked tiles out of n = 6. Building B : 83 cracked tiles out of m =. µ x 46 µ y 83 p = A.677 and pb.415. n = 6 = = m = = æ µ µ xn ( - x) y( m- y) µ µ xn ( - x) y( m- y) ö pa - pb Î pa pb z /, p 3 3 A pb z ç - - a a / çè n m n m ø = (.1,.44) wherea =.1. The fact this confidence interval contains only positive values indicates that p > p. A B NIPRL 15
16 1.. Hypothesis Tests on the Difference Between Two Population Proportions Hypothesis Tests of the Equality of Two Population Proportions Assume X ~ Bn (, p ), Y ~ Bm (, p ) and X ^ Y. A B H : p = p versus H : p ¹ p has p -value = F (- z ) A B A A wherez = B p A - pb and µ p = µ µ æ1 1 ö p(1 - p) ç + çè n m ø - AcceptH if z z and RejectH if z > z. a / a / x + y n + m H : p ³ p versus H : p < p has p - value = F ( z) A B A A B AcceptH if z ³ - z and RejectH if z < - z. a a H : p p versus H : p > p has p -value = 1 - F ( z) A B A A B AcceptH if z - z and RejectH if z > - z. a a NIPRL 16
17 1.. Hypothesis Tests on the Difference Between Two Population Proportions Example 59 : Political Polling µ 67 µ 41 pa = =.65 and pb = = Population age age >= 4 A B H : p = p versus H : p ¹ p A B A A B µ Pooled estimate p = = z = 11.39, p -value = F ( ) ; Random sample n=95 Random sample m=143 The city mayor is doing a good job. - Two-sided 99% confidence interval is p - p Î A B (.199,.311) Agree : x = 67 Disagree : n-x = 35 Agree : y = 41 Disagree : m-y = 6 NIPRL 17
18 Summary problems (1) Why do we assume large sample sizes for statistical inferences concerning proportions? So that the Normal approximation is a reasonable approach. α () Can you find an exact size test concerning proportions? No, in general. NIPRL 18
19 1.3 Goodness of Fit Tests for One-Way Contingency Tables One-Way Classifications Each of n observations is classified into one of k categories or cells. cell frequencies : x, x, ¼, x withx + x + L + x = n x ~ Bn (, p ) Þ µ p = 1 k 1 cell probabilities : p, p, ¼, p with p + p + L + p = 1 i i i Hypothesis Test 1 k 1 xi n H : p = p 1 i k versus H : H is false * i A The null hypothesis of homogeneity is that p * i i 1 = for 1 i k k k k NIPRL 19
20 1.3.1 One-Way Classifications Example 1 : Machine Breakdowns n = 46 machine breakdowns. x 1 = 9 : electrical problems x = 4 : mechanical problems x 3 = 13 : operator misuse It is suggested that the probabilities of these three kinds are p* 1 =., p* =.5, p* 3 =.3. NIPRL
21 1.3.1 One-Way Classifications Goodness of Fit Tests for One-Way Contingency Tables Consider a multinomial distribution with k-cells and p1 p ¼ p k 1 ¼, k with 1 a set of unknown cell probabilities,,, and a set of observed cell frequencies x, x, H p = p i k * : i 1 i p - value = P( c ³ X ) or p - value = P( c ³ G ) k- 1 k- 1 ( x - e ) æx ö wherex = andg = x ln ç ø k k i i i å å i i= 1 ei i= 1 çè ei * with ei np, 1 = i k i The p -value are appropriate as long ase a, k- 1 a, k- 1 a, k- 1 a, k- 1 x x + x + L + x = n ³ 5, 1 i k. At size a, accepth if X c (or if G c ) and rejecth if X > c (or if G > c ). i k NIPRL 1
22 1.3.1 One-Way Classifications Example 1 : Machine Breakdowns H : p 1 =., p =.5, p 3 =.3 Observed cell freq. Electrical Mechanical Operator misuse x 1 = 9 x = 4 x 3 = 13 n = 46 Expected e 1 = 46*. e = 46*.5 e 3 = 46*.3 n = 46 cell freq. = 9. =3. =13.8 X G (9.- 9.) (4.- 3.) ( ) = + + = æ æ9. ö æ4. ö æ13. ö = 9. ln + 4. ln ln =.945 ç è çè 9. ø çè 3. ø èç 13.8 ø p - value = P( c ³.94) ;.95 ( k - 1= 3-1= ) NIPRL
23 1.3.1 One-Way Classifications H is plausible from the p - value = P( c ³.94) ;.95. Consider 95% confidence interval for p. p æ (46-4) (46-4) ö Î, ç - + çè ø = (.378,.666) Check for the homogeneity. Let p = p = p = Thene = e = e = = ( ) ( ) ( ) X = + + = p - value = Pc ( ³ 7.87) ;. Þ 1 3 ( ) Not plausible. Also notice Ï.378, NIPRL 3
24 1.3. Testing Distributional Assumptions Example 3 : Software Errors For some of expected values are smaller than 5, it is appropriate to group the cells. Test if the data are from a Poisson distribution with mean=3. NIPRL 4
25 1.3. Testing Distributional Assumptions X ( ) ( ) ( ) = ( ) ( ) ( ) = 5.1 p - value = Pc ( ³ 5.1) =.4 Þ 5 Plausible that the software errors have a Poisson distribution with meanl = 3. If : the software errors have a Poisson distribution, e would be calculated usingl = x =.76 and p -value would i H be calculated from a chi-square distribution with k - 1-1= 4 degrees of freedom. NIPRL 5
26 1.4 Testing for Independence in Two-Way Contingency Tables Two-Way Classifications A two-way (r x c) contingency table. Second Categorization Level 1 Level Level j Level c First Categorization Level 1 x 11 x 1 x 1c x 1. Level x 1 x x c x. Level i x ij x i. Level r x r1 x r x rc x r. x. 1 x. x. j x. c n = x.. Column marginal frequencies Row marginal frequencies NIPRL 6
27 1.4.1 Two-Way Classifications Example 55 : Building Tile Cracks Location Building A Building B Tile Condition Undamaged x 11 = 5594 x 1 = 1917 x 1. = 7511 Cracked x 1 = 46 x = 83 x. = 489 x. 1 = 6 x. = n = x.. = 8 Notice that the column marginal frequencies are fixed. ( x. 1 = 6, x. = ) NIPRL 7
28 1.4. Testing for Independence Testing for Independence in a Two-Way Contingency Table H : Two categorizations are independent. Test statistics r c ( x ) r c ij - e æ ij x ij ö X = org xij ln å å = å å i= 1 j= 1 e ç ij i= 1 j= 1 çè eij ø xx i.. j whereeij = n p -value = P( c ³ X ) or p - value = P( c ³ G ) where n = ( r - 1) ( c- 1) A sizea hypothesis test accepts H if c > X (or G ) rejects H n if c < X (or G ) a, n n a, n NIPRL 8
29 1.4. Testing for Independence Example 55 : Building Tile Cracks Building A Building B Undamaged x 11 = 5594 e 11 = x 1 = 1917 e 1 = x 1. = 7511 Cracked x 1 = 46 e 1 = x = 83 e = 1.5 x. = 489 x. 1 = 6 x. = n = x.. = 8 e e = = , e = = = = , e = = NIPRL 9
30 1.4. Testing for Independence X ( ) ( ) = ( ) ( ) = p - value = P( c ³ ) ; 1 Consequently, it is not plausible that becomes cracked) and does not contain zero. p B p A (the probability that a tile on Building A (the probability that a tile on Building B becomes cracked) are equal. This is consistent that the confidence interval of p and p p - p Î A B (.1,.44) A formula for the Pearson chi-square statistic for table is X = n( x x - x x ) x x x x g g1 g g A B NIPRL 3
31 Summary problems 1. Construct a goodness-of-fit test for testing a distributional assumption of a normal distribution by applying the one-way classification method.. In testing H : p = p vsh : noth, what is the df of the Pearson * ij ij 1 chi-square statistic, wheni = 1,..., r, j = 1,..., c? NIPRL 31
Topic 21 Goodness of Fit
Topic 21 Goodness of Fit Contingency Tables 1 / 11 Introduction Two-way Table Smoking Habits The Hypothesis The Test Statistic Degrees of Freedom Outline 2 / 11 Introduction Contingency tables, also known
More informationSTAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure).
STAT 515 -- Chapter 13: Categorical Data Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure). Many studies allow for more than 2 categories. Example
More informationij i j m ij n ij m ij n i j Suppose we denote the row variable by X and the column variable by Y ; We can then re-write the above expression as
page1 Loglinear Models Loglinear models are a way to describe association and interaction patterns among categorical variables. They are commonly used to model cell counts in contingency tables. These
More informationTUTORIAL 8 SOLUTIONS #
TUTORIAL 8 SOLUTIONS #9.11.21 Suppose that a single observation X is taken from a uniform density on [0,θ], and consider testing H 0 : θ = 1 versus H 1 : θ =2. (a) Find a test that has significance level
More informationSociology 6Z03 Review II
Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability
More informationFormulas and Tables by Mario F. Triola
Copyright 010 Pearson Education, Inc. Ch. 3: Descriptive Statistics x f # x x f Mean 1x - x s - 1 n 1 x - 1 x s 1n - 1 s B variance s Ch. 4: Probability Mean (frequency table) Standard deviation P1A or
More informationModule 10: Analysis of Categorical Data Statistics (OA3102)
Module 10: Analysis of Categorical Data Statistics (OA3102) Professor Ron Fricker Naval Postgraduate School Monterey, California Reading assignment: WM&S chapter 14.1-14.7 Revision: 3-12 1 Goals for this
More information13.1 Categorical Data and the Multinomial Experiment
Chapter 13 Categorical Data Analysis 13.1 Categorical Data and the Multinomial Experiment Recall Variable: (numerical) variable (i.e. # of students, temperature, height,). (non-numerical, categorical)
More informationChapter 10. Chapter 10. Multinomial Experiments and. Multinomial Experiments and Contingency Tables. Contingency Tables.
Chapter 10 Multinomial Experiments and Contingency Tables 1 Chapter 10 Multinomial Experiments and Contingency Tables 10-1 1 Overview 10-2 2 Multinomial Experiments: of-fitfit 10-3 3 Contingency Tables:
More information11-2 Multinomial Experiment
Chapter 11 Multinomial Experiments and Contingency Tables 1 Chapter 11 Multinomial Experiments and Contingency Tables 11-11 Overview 11-2 Multinomial Experiments: Goodness-of-fitfit 11-3 Contingency Tables:
More informationIntroduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution
Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution James V. Lambers Department of Mathematics The University of Southern Mississippi James V. Lambers Statistical Data Analysis
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationStatistics 3858 : Contingency Tables
Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More information10.2: The Chi Square Test for Goodness of Fit
10.2: The Chi Square Test for Goodness of Fit We can perform a hypothesis test to determine whether the distribution of a single categorical variable is following a proposed distribution. We call this
More informationF O R SOCI AL WORK RESE ARCH
7 TH EUROPE AN CONFERENCE F O R SOCI AL WORK RESE ARCH C h a l l e n g e s i n s o c i a l w o r k r e s e a r c h c o n f l i c t s, b a r r i e r s a n d p o s s i b i l i t i e s i n r e l a t i o n
More informationSummary of Chapters 7-9
Summary of Chapters 7-9 Chapter 7. Interval Estimation 7.2. Confidence Intervals for Difference of Two Means Let X 1,, X n and Y 1, Y 2,, Y m be two independent random samples of sizes n and m from two
More informationPractice Questions: Statistics W1111, Fall Solutions
Practice Questions: Statistics W, Fall 9 Solutions Question.. The standard deviation of Z is 89... P(=6) =..3. is definitely inside of a 95% confidence interval for..4. (a) YES (b) YES (c) NO (d) NO Questions
More informationExample. χ 2 = Continued on the next page. All cells
Section 11.1 Chi Square Statistic k Categories 1 st 2 nd 3 rd k th Total Observed Frequencies O 1 O 2 O 3 O k n Expected Frequencies E 1 E 2 E 3 E k n O 1 + O 2 + O 3 + + O k = n E 1 + E 2 + E 3 + + E
More informationGEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs
STATISTICS 4 Summary Notes. Geometric and Exponential Distributions GEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs P(X = x) = ( p) x p x =,, 3,...
More informationFormulas and Tables. for Elementary Statistics, Tenth Edition, by Mario F. Triola Copyright 2006 Pearson Education, Inc. ˆp E p ˆp E Proportion
Formulas and Tables for Elementary Statistics, Tenth Edition, by Mario F. Triola Copyright 2006 Pearson Education, Inc. Ch. 3: Descriptive Statistics x Sf. x x Sf Mean S(x 2 x) 2 s Å n 2 1 n(sx 2 ) 2 (Sx)
More information15: CHI SQUARED TESTS
15: CHI SQUARED ESS MULIPLE CHOICE QUESIONS In the following multiple choice questions, please circle the correct answer. 1. Which statistical technique is appropriate when we describe a single population
More informationIntroduction to Survey Analysis!
Introduction to Survey Analysis! Professor Ron Fricker! Naval Postgraduate School! Monterey, California! Reading Assignment:! 2/22/13 None! 1 Goals for this Lecture! Introduction to analysis for surveys!
More informationStatistics for Managers Using Microsoft Excel
Statistics for Managers Using Microsoft Excel 7 th Edition Chapter 1 Chi-Square Tests and Nonparametric Tests Statistics for Managers Using Microsoft Excel 7e Copyright 014 Pearson Education, Inc. Chap
More informationTesting Independence
Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1
More informationPOLI 443 Applied Political Research
POLI 443 Applied Political Research Session 6: Tests of Hypotheses Contingency Analysis Lecturer: Prof. A. Essuman-Johnson, Dept. of Political Science Contact Information: aessuman-johnson@ug.edu.gh College
More informationConfidence Intervals, Testing and ANOVA Summary
Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0
More informationy ˆ i = ˆ " T u i ( i th fitted value or i th fit)
1 2 INFERENCE FOR MULTIPLE LINEAR REGRESSION Recall Terminology: p predictors x 1, x 2,, x p Some might be indicator variables for categorical variables) k-1 non-constant terms u 1, u 2,, u k-1 Each u
More informationSection 4.6 Simple Linear Regression
Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval
More informationInferences for Proportions and Count Data
Inferences for Proportions and Count Data Corresponds to Chapter 9 of Tamhane and Dunlop Slides prepared by Elizabeth Newton (MIT), with some slides by Ramón V. León (University of Tennessee) 1 Inference
More informationLecture 25: Models for Matched Pairs
Lecture 25: Models for Matched Pairs Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture
More informationChi-Squared Tests. Semester 1. Chi-Squared Tests
Semester 1 Goodness of Fit Up to now, we have tested hypotheses concerning the values of population parameters such as the population mean or proportion. We have not considered testing hypotheses about
More informationLecture 41 Sections Mon, Apr 7, 2008
Lecture 41 Sections 14.1-14.3 Hampden-Sydney College Mon, Apr 7, 2008 Outline 1 2 3 4 5 one-proportion test that we just studied allows us to test a hypothesis concerning one proportion, or two categories,
More informationSTP 226 ELEMENTARY STATISTICS NOTES
STP 226 ELEMENTARY STATISTICS NOTES PART 1V INFERENTIAL STATISTICS CHAPTER 12 CHI SQUARE PROCEDURES 12.1 The Chi Square Distribution A variable has a chi square distribution if the shape of its distribution
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationLecture 8: Summary Measures
Lecture 8: Summary Measures Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 8:
More information(x t. x t +1. TIME SERIES (Chapter 8 of Wilks)
45 TIME SERIES (Chapter 8 of Wilks) In meteorology, the order of a time series matters! We will assume stationarity of the statistics of the time series. If there is non-stationarity (e.g., there is a
More informationBivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data.
Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January
More informationEpidemiology Wonders of Biostatistics Chapter 11 (continued) - probability in a single population. John Koval
Epidemiology 9509 Wonders of Biostatistics Chapter 11 (continued) - probability in a single population John Koval Department of Epidemiology and Biostatistics University of Western Ontario What is being
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Hypothesis testing. Anna Wegloop Niels Landwehr/Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Hypothesis testing Anna Wegloop iels Landwehr/Tobias Scheffer Why do a statistical test? input computer model output Outlook ull-hypothesis
More informationHYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC
1 HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC 7 steps of Hypothesis Testing 1. State the hypotheses 2. Identify level of significant 3. Identify the critical values 4. Calculate test statistics 5. Compare
More informationThe purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.
Chapter 9 Pearson s chi-square test 9. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That
More informationCategorical Data Analysis Chapter 3
Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,
More informationStatistics. Statistics
The main aims of statistics 1 1 Choosing a model 2 Estimating its parameter(s) 1 point estimates 2 interval estimates 3 Testing hypotheses Distributions used in statistics: χ 2 n-distribution 2 Let X 1,
More informationWe know from STAT.1030 that the relevant test statistic for equality of proportions is:
2. Chi 2 -tests for equality of proportions Introduction: Two Samples Consider comparing the sample proportions p 1 and p 2 in independent random samples of size n 1 and n 2 out of two populations which
More informationTable of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).
Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X.04) =.8508. For z < 0 subtract the value from,
More informationFormulas and Tables. for Essentials of Statistics, by Mario F. Triola 2002 by Addison-Wesley. ˆp E p ˆp E Proportion.
Formulas and Tables for Essentials of Statistics, by Mario F. Triola 2002 by Addison-Wesley. Ch. 2: Descriptive Statistics x Sf. x x Sf Mean S(x 2 x) 2 s Å n 2 1 n(sx 2 ) 2 (Sx) 2 s Å n(n 2 1) Mean (frequency
More informationStatistical Inference: Estimation and Confidence Intervals Hypothesis Testing
Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire
More informationT i t l e o f t h e w o r k : L a M a r e a Y o k o h a m a. A r t i s t : M a r i a n o P e n s o t t i ( P l a y w r i g h t, D i r e c t o r )
v e r. E N G O u t l i n e T i t l e o f t h e w o r k : L a M a r e a Y o k o h a m a A r t i s t : M a r i a n o P e n s o t t i ( P l a y w r i g h t, D i r e c t o r ) C o n t e n t s : T h i s w o
More informationDiscrete Multivariate Statistics
Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are
More informationCentral Limit Theorem ( 5.3)
Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately
More informationRama Nada. -Ensherah Mokheemer. 1 P a g e
- 9 - Rama Nada -Ensherah Mokheemer - 1 P a g e Quick revision: Remember from the last lecture that chi square is an example of nonparametric test, other examples include Kruskal Wallis, Mann Whitney and
More informationThe goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions.
The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions. A common problem of this type is concerned with determining
More informationLecture 22. December 19, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.
Lecture 22 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University December 19, 2007 1 2 3 4 5 6 7 8 9 1 tests for equivalence of two binomial 2 tests for,
More informationSlides Prepared by JOHN S. LOUCKS St. Edward s s University Thomson/South-Western. Slide
s Preared by JOHN S. LOUCKS St. Edward s s University 1 Chater 11 Comarisons Involving Proortions and a Test of Indeendence Inferences About the Difference Between Two Poulation Proortions Hyothesis Test
More informationST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios
ST3241 Categorical Data Analysis I Two-way Contingency Tables 2 2 Tables, Relative Risks and Odds Ratios 1 What Is A Contingency Table (p.16) Suppose X and Y are two categorical variables X has I categories
More informationParametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami
Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric Assumptions The observations must be independent. Dependent variable should be continuous
More informationFormulas and Tables for Elementary Statistics, Eighth Edition, by Mario F. Triola 2001 by Addison Wesley Longman Publishing Company, Inc.
Formulas and Tables for Elementary Statistics, Eighth Edition, by Mario F. Triola 2001 by Addison Wesley Longman Publishing Company, Inc. Ch. 2: Descriptive Statistics x Sf. x x Sf Mean S(x 2 x) 2 s 2
More informationReview of One-way Tables and SAS
Stat 504, Lecture 7 1 Review of One-way Tables and SAS In-class exercises: Ex1, Ex2, and Ex3 from http://v8doc.sas.com/sashtml/proc/z0146708.htm To calculate p-value for a X 2 or G 2 in SAS: http://v8doc.sas.com/sashtml/lgref/z0245929.htmz0845409
More informationChi-Square. Heibatollah Baghi, and Mastee Badii
1 Chi-Square Heibatollah Baghi, and Mastee Badii Different Scales, Different Measures of Association Scale of Both Variables Nominal Scale Measures of Association Pearson Chi-Square: χ 2 Ordinal Scale
More information1.0 Hypothesis Testing
.0 Hypothesis Testing The Six Steps of Hypothesis Testing Rejecting or Accepting the Null Samples and Data. Parametric Tests for Comparing Two Populations Testing Equality of Population Means (T Test!
More informationLOOKING FOR RELATIONSHIPS
LOOKING FOR RELATIONSHIPS One of most common types of investigation we do is to look for relationships between variables. Variables may be nominal (categorical), for example looking at the effect of an
More informationThe scatterplot is the basic tool for graphically displaying bivariate quantitative data.
Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January
More informationLecture 9. Selected material from: Ch. 12 The analysis of categorical data and goodness of fit tests
Lecture 9 Selected material from: Ch. 12 The analysis of categorical data and goodness of fit tests Univariate categorical data Univariate categorical data are best summarized in a one way frequency table.
More informationPractice Problems Section Problems
Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,
More informationAnalysis of data in square contingency tables
Analysis of data in square contingency tables Iva Pecáková Let s suppose two dependent samples: the response of the nth subject in the second sample relates to the response of the nth subject in the first
More informationReview: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.
1 Review: Let X 1, X,..., X n denote n independent random variables sampled from some distribution might not be normal!) with mean µ) and standard deviation σ). Then X µ σ n In other words, X is approximately
More informationMultiple comparisons - subsequent inferences for two-way ANOVA
1 Multiple comparisons - subsequent inferences for two-way ANOVA the kinds of inferences to be made after the F tests of a two-way ANOVA depend on the results if none of the F tests lead to rejection of
More informationPart 1.) We know that the probability of any specific x only given p ij = p i p j is just multinomial(n, p) where p k1 k 2
Problem.) I will break this into two parts: () Proving w (m) = p( x (m) X i = x i, X j = x j, p ij = p i p j ). In other words, the probability of a specific table in T x given the row and column counts
More informationprobability of k samples out of J fall in R.
Nonparametric Techniques for Density Estimation (DHS Ch. 4) n Introduction n Estimation Procedure n Parzen Window Estimation n Parzen Window Example n K n -Nearest Neighbor Estimation Introduction Suppose
More informationLecture 7: Hypothesis Testing and ANOVA
Lecture 7: Hypothesis Testing and ANOVA Goals Overview of key elements of hypothesis testing Review of common one and two sample tests Introduction to ANOVA Hypothesis Testing The intent of hypothesis
More informationOn Selecting Tests for Equality of Two Normal Mean Vectors
MULTIVARIATE BEHAVIORAL RESEARCH, 41(4), 533 548 Copyright 006, Lawrence Erlbaum Associates, Inc. On Selecting Tests for Equality of Two Normal Mean Vectors K. Krishnamoorthy and Yanping Xia Department
More information7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between
7.2 One-Sample Correlation ( = a) Introduction Correlation analysis measures the strength and direction of association between variables. In this chapter we will test whether the population correlation
More informationMath 152. Rumbos Fall Solutions to Exam #2
Math 152. Rumbos Fall 2009 1 Solutions to Exam #2 1. Define the following terms: (a) Significance level of a hypothesis test. Answer: The significance level, α, of a hypothesis test is the largest probability
More informationQuantitative Analysis and Empirical Methods
Hypothesis testing Sciences Po, Paris, CEE / LIEPP Introduction Hypotheses Procedure of hypothesis testing Two-tailed and one-tailed tests Statistical tests with categorical variables A hypothesis A testable
More informationChi-square (χ 2 ) Tests
Math 442 - Mathematical Statistics II April 30, 2018 Chi-square (χ 2 ) Tests Common Uses of the χ 2 test. 1. Testing Goodness-of-fit. 2. Testing Equality of Several Proportions. 3. Homogeneity Test. 4.
More informationOrdinal Variables in 2 way Tables
Ordinal Variables in 2 way Tables Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 C.J. Anderson (Illinois) Ordinal Variables
More information2.3 Analysis of Categorical Data
90 CHAPTER 2. ESTIMATION AND HYPOTHESIS TESTING 2.3 Analysis of Categorical Data 2.3.1 The Multinomial Probability Distribution A mulinomial random variable is a generalization of the binomial rv. It results
More informationSection VII. Chi-square test for comparing proportions and frequencies. F test for means
Section VII Chi-square test for comparing proportions and frequencies F test for means 0 proportions: chi-square test Z test for comparing proportions between two independent groups Z = P 1 P 2 SE d SE
More informationProbability theory and inference statistics! Dr. Paola Grosso! SNE research group!! (preferred!)!!
Probability theory and inference statistics Dr. Paola Grosso SNE research group p.grosso@uva.nl paola.grosso@os3.nl (preferred) Roadmap Lecture 1: Monday Sep. 22nd Collecting data Presenting data Descriptive
More informationCh. 11 Inference for Distributions of Categorical Data
Ch. 11 Inference for Distributions of Categorical Data CH. 11 2 INFERENCES FOR RELATIONSHIPS The two sample z procedures from Ch. 10 allowed us to compare proportions of successes in two populations or
More informationPower and Sample Size for Log-linear Models 1
Power and Sample Size for Log-linear Models 1 STA 312: Fall 2012 1 See last slide for copyright information. 1 / 1 Overview 2 / 1 Power of a Statistical test The power of a test is the probability of rejecting
More informationReview of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
More informationThe Multinomial Model
The Multinomial Model STA 312: Fall 2012 Contents 1 Multinomial Coefficients 1 2 Multinomial Distribution 2 3 Estimation 4 4 Hypothesis tests 8 5 Power 17 1 Multinomial Coefficients Multinomial coefficient
More informationTable of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).
Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X 1.04) =.8508. For z < 0 subtract the value from
More informationSampling, Confidence Interval and Hypothesis Testing
Sampling, Confidence Interval and Hypothesis Testing Christopher Grigoriou Executive MBA HEC Lausanne 2007-2008 1 Sampling : Careful with convenience samples! World War II: A statistical study to decide
More information2.6.3 Generalized likelihood ratio tests
26 HYPOTHESIS TESTING 113 263 Generalized likelihood ratio tests When a UMP test does not exist, we usually use a generalized likelihood ratio test to verify H 0 : θ Θ against H 1 : θ Θ\Θ It can be used
More informationPump failure data. Pump Failures Time
Outline 1. Poisson distribution 2. Tests of hypothesis for a single Poisson mean 3. Comparing multiple Poisson means 4. Likelihood equivalence with exponential model Pump failure data Pump 1 2 3 4 5 Failures
More informationDescribing Contingency tables
Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds
More informationChi-square (χ 2 ) Tests
Math 145 - Elementary Statistics April 17, 2007 Common Uses of the χ 2 test. 1. Testing Goodness-of-fit. Chi-square (χ 2 ) Tests 2. Testing Equality of Several Proportions. 3. Homogeneity Test. 4. Testing
More information2 Describing Contingency Tables
2 Describing Contingency Tables I. Probability structure of a 2-way contingency table I.1 Contingency Tables X, Y : cat. var. Y usually random (except in a case-control study), response; X can be random
More informationEvaluation. Andrea Passerini Machine Learning. Evaluation
Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain
More informationSome comments on Partitioning
Some comments on Partitioning Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/30 Partitioning Chi-Squares We have developed tests
More informationHomework 9 Sample Solution
Homework 9 Sample Solution # 1 (Ex 9.12, Ex 9.23) Ex 9.12 (a) Let p vitamin denote the probability of having cold when a person had taken vitamin C, and p placebo denote the probability of having cold
More informationThe Chi-Square Distributions
MATH 03 The Chi-Square Distributions Dr. Neal, Spring 009 The chi-square distributions can be used in statistics to analyze the standard deviation of a normally distributed measurement and to test the
More informationCorrespondence Analysis
Correspondence Analysis Q: when independence of a 2-way contingency table is rejected, how to know where the dependence is coming from? The interaction terms in a GLM contain dependence information; however,
More informationSession 3 The proportional odds model and the Mann-Whitney test
Session 3 The proportional odds model and the Mann-Whitney test 3.1 A unified approach to inference 3.2 Analysis via dichotomisation 3.3 Proportional odds 3.4 Relationship with the Mann-Whitney test Session
More informationThe Components of a Statistical Hypothesis Testing Problem
Statistical Inference: Recall from chapter 5 that statistical inference is the use of a subset of a population (the sample) to draw conclusions about the entire population. In chapter 5 we studied one
More informationGoodness of Fit Goodness of fit - 2 classes
Goodness of Fit Goodness of fit - 2 classes A B 78 22 Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A = 0.75! Exact p-value Exact confidence
More information! " # $! % & '! , ) ( + - (. ) ( ) * + / 0 1 2 3 0 / 4 5 / 6 0 ; 8 7 < = 7 > 8 7 8 9 : Œ Š ž P P h ˆ Š ˆ Œ ˆ Š ˆ Ž Ž Ý Ü Ý Ü Ý Ž Ý ê ç è ± ¹ ¼ ¹ ä ± ¹ w ç ¹ è ¼ è Œ ¹ ± ¹ è ¹ è ä ç w ¹ ã ¼ ¹ ä ¹ ¼ ¹ ±
More information