Chapter 24: Comparing means

Hollie Stafford
Example: Consumer Reports annually conducts a survey of automobile reliability. Approximately 4 million households are surveyed by mail.¹ The 1990 survey is summarized in the figure by manufacturing location.

[Figure: reliability ratings of individual 1990 car models (for example, Toyota Camry, Honda Accord, Ford Taurus, Chevrolet Caprice, Buick Century), grouped by manufacturing location: Domestic and Foreign. Axis: Reliability.]

The data suggest that owners consider domestic cars to be less reliable than foreign cars. Do these data conclusively support the contention that domestic cars are less reliable than foreign cars? The question is best addressed by making it more specific: is the mean reliability rating of domestic cars less than that of foreign cars?
To proceed, let $\mu_1$ and $\mu_2$ denote the mean reliability ratings of foreign and domestic cars, respectively, for the time period of interest, and suppose that the data shown in the figure are representative of that [?]0-year period.² The objective is to draw inferences regarding $\mu_1 - \mu_2$; specifically, to assess the strength of evidence in favor of the hypothesis $H_a: \mu_1 - \mu_2 > 0$ and against $H_0: \mu_1 - \mu_2 = 0$.

¹ The response rate is on the order of [?]0%.
² All car owners are not alike, and they can have personality traits that directly influence their choice of vehicle, their vehicle expectations, and how they subsequently treat and maintain their cars. It's impossible to control for this kind of systematic error in their surveys.

Inferences regarding $\mu_1 - \mu_2$: The point estimator of $\mu_1 - \mu_2$ is $\bar y_1 - \bar y_2$. The mathematical foundation for confidence intervals and hypothesis tests regarding $\mu_1$ and $\mu_2$ is the sampling distribution of $\bar y_1 - \bar y_2$. Before developing it, recall that $\bar y_1 \sim N(\mu_1, \sigma_1/\sqrt{n_1})$ and $\bar y_2 \sim N(\mu_2, \sigma_2/\sqrt{n_2})$. The results described in Chapter 6 imply that the following statements are true:

$$E(\bar y_1 - \bar y_2) = \mu_1 - \mu_2,$$
$$\operatorname{Var}(\bar y_1 - \bar y_2) = \operatorname{Var}(\bar y_1) + \operatorname{Var}(\bar y_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2},$$
$$\sigma(\bar y_1 - \bar y_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$

The last two statements are true only if $\bar y_1$ and $\bar y_2$ are computed from independent samples from the respective populations. The term $\sigma(\bar y_1 - \bar y_2)$ is the standard deviation of $\bar y_1 - \bar y_2$, and the Central Limit Theorem implies that, approximately,

$$\bar y_1 - \bar y_2 \sim N\left(\mu_1 - \mu_2,\ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right).$$

In almost all practical applications, $\sigma_1$ and $\sigma_2$ must be estimated by the sample standard deviations $s_1$ and $s_2$, and $\sigma(\bar y_1 - \bar y_2)$ is estimated by the standard error of the difference:

$$\hat\sigma(\bar y_1 - \bar y_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$

As in the one-sample case, when $\sigma(\bar y_1 - \bar y_2)$ is replaced by $\hat\sigma(\bar y_1 - \bar y_2)$, the standardized form of $\bar y_1 - \bar y_2$ has a distribution that is approximately $t$.³ That is,

$$T = \frac{(\bar y_1 - \bar y_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \sim t_{df} \quad \text{(approximately)}.$$

³ If the sampled populations are normal, the distribution of the standardized form is very close to $t_{df}$; if they are not, then the standardized form is only approximately $t_{df}$ in distribution.

Consequently, the t-distribution is used to obtain critical values and to compute p-values. The only substantive difference between the one- and two-sample situations is that the degrees of freedom (df) for the two-sample case cannot be determined exactly. There are two common approximations of the degrees of freedom:

1. Set the degrees of freedom to be the smaller of $n_1 - 1$ and $n_2 - 1$. This approximation is conservative in the sense that confidence intervals are slightly wider than necessary and p-values are slightly larger than would be obtained from a more accurate approximation.
2. The second approximation (Satterthwaite's approximation) is more accurate but troublesome to compute. It is

$$df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}.$$

A $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$. Let $t^*$ denote the critical value for $df$ degrees of freedom and a $100(1-\alpha)\%$ confidence level. Then a $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is

$$\bar y_1 - \bar y_2 \pm t^*\,\hat\sigma(\bar y_1 - \bar y_2) = \bar y_1 - \bar y_2 \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$

For the automobile reliability comparison, $n_1 = 26$ and $n_2 = 23$, so the first approximation of df is 22. The R function t.test computes Satterthwaite's approximation of the degrees of freedom as df = 46.797.⁴ The critical value for a 95% interval, obtained from R using the command qt(.975, 46.797), is $t^* = 2.012$. The sample statistics are $\bar y_1 = 4.384$, $s_1 = 1.03$, $\bar y_2 = 2.261$, and $s_2 = 0.964$. Thus,

$$\bar y_1 - \bar y_2 \pm t^*\,\hat\sigma(\bar y_1 - \bar y_2) = 2.123 \pm 2.012 \times 0.285 = [1.55,\ 2.70].$$

The estimated difference in reliability ratings between foreign and domestic cars is 2.123, and a 95% confidence interval for the mean difference between all foreign and all domestic cars is [1.55, 2.70]. Zero is not even remotely close to being bracketed by the interval. If Satterthwaite's approximation of the degrees of freedom were replaced by the alternative approximation (the smaller of $n_1 - 1 = 25$ and $n_2 - 1 = 22$), then $t^* = 2.074$. The difference in confidence interval width between the two degrees-of-freedom approximations is negligible.

⁴ t.test also computes a confidence interval and a test of the two-sided alternative $H_a: \mu_1 - \mu_2 \neq 0$.

A hypothesis test for comparing $\mu_1$ and $\mu_2$

A common null hypothesis when comparing two groups is $H_0: \mu_1 = \mu_2$, or equivalently, $H_0: \mu_1 - \mu_2 = 0$. The alternative hypothesis can be one-sided or two-sided:

$H_a: \mu_1 - \mu_2 > 0$ (or $H_a: \mu_1 > \mu_2$)
$H_a: \mu_1 - \mu_2 < 0$ (or $H_a: \mu_1 < \mu_2$)
$H_a: \mu_1 - \mu_2 \neq 0$ (or $H_a: \mu_1 \neq \mu_2$)

The test statistic is the two-sample t-statistic

$$T = \frac{\bar y_1 - \bar y_2}{\hat\sigma(\bar y_1 - \bar y_2)} = \frac{\bar y_1 - \bar y_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}.$$

If $H_0$ is correct, then $T$ approximately follows a $t_{df}$ distribution. As discussed above, there are two common approximations of the degrees of freedom:

1. Set the degrees of freedom to be the smaller of $n_1 - 1$ and $n_2 - 1$.
2. Satterthwaite's approximation:
$$df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}.$$

$H_a$ determines how the p-value is computed, where $t$ is the observed value of $T$:

$H_a: \mu_1 - \mu_2 < 0$: p-value $= P(T \le t \mid H_0)$
$H_a: \mu_1 - \mu_2 > 0$: p-value $= P(T \ge t \mid H_0)$
$H_a: \mu_1 - \mu_2 \neq 0$: p-value $= P(|T| \ge |t| \mid H_0)$

Conditions of the two-sample t-procedures

Sampling:
Randomization: In an observational study, the two samples are random samples from their respective populations. In a controlled experiment, the subjects are randomly assigned to the two treatments.
Independent samples: In an observational study, the two samples are drawn independently of each other. In a controlled experiment, the condition is met if the subjects are randomly assigned to the treatment groups.

Normality: The distribution of the variable across population 1 is normal; similarly, the distribution of the variable across population 2 is normal. Because this condition is rarely met exactly, it matters that the distribution of $T$ is still accurately approximated by the $t_{df}$ distribution provided that each sample satisfies the applicable rule:
If the sample size is less than 15, then the sample distribution is without skewness or outliers.
If the sample size is between 15 and 40, then the sample distribution is roughly normal (only mild skewness or mild outliers are present).
If the sample size is greater than 40, then the sample distribution is of little concern.⁵

Example: Low birth weight is often an indicator of developmental delays and susceptibility to disease. Infant mortality rates and birth defect rates are greater for low-birth-weight babies than for normal-birth-weight babies. A woman's behavior during pregnancy (including diet, smoking habits, and prenatal care) can greatly alter the chances of carrying the baby to term and, consequently, of delivering a baby of normal birth weight. It's suspected that maternal hypertension is associated with low birth weight, and to investigate the relationship, data were collected on 189 women,⁶ of whom 12 had a history of hypertension and 177 did not.

[Figure: distribution of birth weight (g) for newborns of mothers with a hypertension history and with no hypertension history.]

The figure above shows the distribution of birth weight for newborns born to mothers with a history of hypertension and to mothers without a history of hypertension. The figure is suggestive of a difference between the mean weight $\mu_1$ of newborns born to mothers without a history of hypertension and the mean weight $\mu_2$ of newborns born to mothers with a history of hypertension. A formal comparison can be conducted through a hypothesis test.

⁵ In other words, if a sample size is greater than 40, then there is little concern even if the sample is highly skewed and contains outliers.
⁶ Hosmer and Lemeshow (2000), Applied Logistic Regression, Second Edition. Data were collected at Baystate Medical Center, Springfield, Massachusetts during …
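The p-value rules given earlier for the three alternatives can be sketched in Python with only the standard library. This is a minimal illustration under stated assumptions, not the method R or StatCrunch uses: `t_pdf`, `t_cdf` and `p_value` are hypothetical helper names, and the t cumulative distribution function is approximated by applying Simpson's rule to the t density, which is adequate for moderate values of $|t|$ but not for extreme tail probabilities. As a check, 2.074 is the 97.5th percentile of $t_{22}$, so the two-sided p-value there should be about 0.05.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution; df need not be an integer."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0.

    Adequate for moderate |x|; not accurate for extreme tail probabilities.
    """
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def p_value(t: float, df: float, alternative: str) -> float:
    """p-value of the observed statistic t for the chosen H_a on mu1 - mu2."""
    if alternative == "less":        # H_a: mu1 - mu2 < 0
        return t_cdf(t, df)
    if alternative == "greater":     # H_a: mu1 - mu2 > 0
        return 1 - t_cdf(t, df)
    if alternative == "two-sided":   # H_a: mu1 - mu2 != 0
        return 2 * (1 - t_cdf(abs(t), df))
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# 2.074 is the 97.5th percentile of t with 22 df.
print(round(p_value(2.074, 22, "two-sided"), 3))  # 0.05
```

For the birth-weight test that follows ($H_a: \mu_1 - \mu_2 > 0$, $T \approx 1.61$, Satterthwaite df about 11.9), `p_value(1.611, 11.909, "greater")` comes out at roughly 0.067, in line with the p-value reported there.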

The hypotheses are $H_0: \mu_1 - \mu_2 = 0$ and $H_a: \mu_1 - \mu_2 > 0$. The sample statistics are

No history of hypertension: $n_1 = 177$, $\bar y_1 = 2972$ g, $s_1 = 709.4$ g
History of hypertension: $n_2 = 12$, $\bar y_2 = 2536.8$ g, $s_2 = 917.4$ g

The sample difference, about 435 g, is 14.6% of $\bar y_1$, which is practically significant. The test statistic is

$$T = \frac{\bar y_1 - \bar y_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{2972 - 2536.8}{\sqrt{709.4^2/177 + 917.4^2/12}} \approx \frac{435.2}{270.1} \approx 1.61.$$

Satterthwaite's approximation of the degrees of freedom is 11.909, and the p-value is $P(T \ge 1.61 \mid H_0) = 0.0666$. There is some evidence supporting the contention that lower birth weights are associated with maternal hypertension. Perhaps if the sample size for the hypertensive group were larger, then the data would be more conclusive. The independence and normality conditions are apparently satisfied (I used the R command qqnorm to construct the normal quantile-quantile plots to the right).

[Figure: normal quantile-quantile plots of birth weight for the two groups; horizontal axis: Theoretical Quantiles; vertical axis: Sample Quantiles.]

Normal quantile-quantile plots provide a visual check of the fit of a theoretical distribution to the observed data. A normal quantile-quantile plot is constructed by plotting the observed values of a variable against the theoretical quantiles that would be expected if the data were sampled from a normal distribution. If the fit of the theoretical distribution to the observed values is good, then the plotted values fall along a straight line, though sampling variability implies that there will be some variation about the line. If the data were not obtained by random sampling from a normal distribution, then the plotted pairs will deviate substantially from a straight line. I don't know enough about the data to comment on whether the samples are random or representative of larger populations of newborns.

Example: The textbook website has a data file containing the sugar content (percentage by weight) of 27 children's cereals and 29 adult cereals. The data are visually summarized to the right.

[Figure: box plots of sugar content (Sugar, %) for Adult and Children cereals.]

The box plot suggests that the cereals are vastly different with respect to sugar content. Is this observation supported by an objective statistical analysis? Carry out a hypothesis test appropriate for answering this question.

Step 1 identifies to which of the following situations this problem corresponds:
1. One population involving the population proportion $p$
2. Two populations involving a comparison of population proportions $p_1$ and $p_2$
3. One population involving the population mean $\mu$
4. Two populations involving a comparison of population means $\mu_1$ and $\mu_2$

This problem is a two-population problem involving a comparison of population means $\mu_1$ and $\mu_2$: the populations of interest are all children's cereals and all adult cereals. The parameters of interest are the mean sugar content (percentage by weight) $\mu_1$ of children's cereals and the mean sugar content (percentage by weight) $\mu_2$ of adult cereals.

Step 2 sets up the hypotheses for testing whether $\mu_1$ is greater than $\mu_2$:
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$
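Step 3 evaluates the standard error of $\bar y_1 - \bar y_2$. The formulas for the standard error, the two degrees-of-freedom approximations and the confidence interval can be collected into small Python helpers; this is a sketch, the function names are hypothetical, and the critical value $t^*$ is passed in rather than computed (for example, taken from R's qt). As a check, it reproduces the reliability example from its reported summary statistics (foreign: $\bar y_1 = 4.384$, $s_1 = 1.03$, $n_1 = 26$; domestic: $\bar y_2 = 2.261$, $s_2 = 0.964$, $n_2 = 23$; $t^* = 2.012$).

```python
import math


def se_diff(s1: float, n1: int, s2: float, n2: int) -> float:
    """Estimated standard error of ybar1 - ybar2 (separate, unpooled variances)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def satterthwaite_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Satterthwaite's approximation of the degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def conservative_df(n1: int, n2: int) -> int:
    """Conservative approximation: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


def conf_int(ybar1, s1, n1, ybar2, s2, n2, t_crit):
    """ybar1 - ybar2 +/- t_crit * se; t_crit is supplied, e.g. from R's qt()."""
    diff = ybar1 - ybar2
    margin = t_crit * se_diff(s1, n1, s2, n2)
    return diff - margin, diff + margin


# Reliability example: group 1 = foreign cars, group 2 = domestic cars.
foreign = (4.384, 1.03, 26)    # (ybar1, s1, n1)
domestic = (2.261, 0.964, 23)  # (ybar2, s2, n2)

print(round(satterthwaite_df(1.03, 26, 0.964, 23), 1), conservative_df(26, 23))  # 46.8 22
lo, hi = conf_int(*foreign, *domestic, t_crit=2.012)
print(round(lo, 2), round(hi, 2))  # 1.55 2.7
```

The Satterthwaite value comes out near 46.8 rather than the 46.797 reported by t.test because the inputs here are rounded summaries, not the raw ratings.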

Step 3 identifies the test statistic and computes the terms necessary to evaluate it. The test statistic is the two-sample t-statistic

$$t = \frac{\bar y_1 - \bar y_2}{\hat\sigma(\bar y_1 - \bar y_2)}, \quad \text{where} \quad \hat\sigma(\bar y_1 - \bar y_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$

From StatCrunch, $\bar y_1 = 47.2\%$, $\bar y_2 = 11.7\%$ and $\bar y_1 - \bar y_2 = 35.5\%$. Also, $s_1 = 4.35$, $s_2 = 11.88$, $n_1 = 27$, and $n_2 = 29$, so

$$\hat\sigma(\bar y_1 - \bar y_2) = \sqrt{\frac{4.35^2}{27} + \frac{11.88^2}{29}} = 2.36.$$

Step 4 checks the large-sample and sampling conditions.
1. Randomization is doubtful. I suspect the data are a convenience sample from the shelves of a supermarket; to be fair, I don't have any evidence that this is how the data were collected.
2. Independence: there are only a handful of cereal manufacturers, so I believe that a number of cereals in the samples were made by the same manufacturers. Independence is doubtful.
3. The samples contain a few outliers and some skewness. More than 40 observations are needed in each sample to be confident that the normality condition is satisfied. Normality is doubtful.
4. It's unclear whether the 10% condition is satisfied, since the population sizes are unknown. I suspect that there are fewer than [?]00 cereals nationwide.

The formal test is nearly pointless because none of the conditions are met; furthermore, the box plots show very large differences between the distributions, as do the sample means. However, for those readers who are unwilling to draw a conclusion from the box plots, the t-statistic, viewed as a coarse measure of strength of evidence, shows very strong evidence against $H_0$ and in favor of $H_a$. The conclusion should be stated conservatively since the conditions are not met.
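The sample-size rule of thumb from the normality condition, which item 3 of Step 4 applies, can be written as a small function. It is only a sketch: the function names and the three-level `departure` label ("none", "mild", "strong") are assumptions introduced here, and judging the departure from normality still has to be done by eye from a plot. Labeling both cereal samples "strong" reflects the judgment in item 3 that normality is doubtful.

```python
def normality_condition_met(n: int, departure: str) -> bool:
    """Sample-size rule of thumb for one sample.

    departure: 'none'   -> no skewness or outliers
               'mild'   -> only mild skewness or mild outliers
               'strong' -> more than mild skewness or outliers
    """
    if departure not in ("none", "mild", "strong"):
        raise ValueError("departure must be 'none', 'mild' or 'strong'")
    if n < 15:
        return departure == "none"
    if n <= 40:
        return departure in ("none", "mild")
    return True  # n > 40: the shape of the sample is of little concern


def both_samples_ok(n1: int, dep1: str, n2: int, dep2: str) -> bool:
    """The rule must hold for each of the two samples."""
    return normality_condition_met(n1, dep1) and normality_condition_met(n2, dep2)


# Cereal example: 27 and 29 cereals, departures judged more than mild.
print(both_samples_ok(27, "strong", 29, "strong"))  # False
```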

Step 5 computes the test statistic and the p-value:

$$t = \frac{35.5}{2.36} = 15.0.$$

Set $df = n_1 - 1 = 26$. StatCrunch computes the p-value $P(t_{26} \ge 15.0)$, which is vanishingly small, so I will report that the p-value is less than 0.01.

Step 6 states the conclusion: there is very strong evidence that children's cereals contain more sugar than adult cereals; moreover, the observed difference, 35.5%, is very large. Said another way, there's about four times as much sugar, on average, in children's cereals as in adult cereals.
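The arithmetic in Steps 3, 5 and 6 can be checked with a few lines of Python. This sketch uses the rounded StatCrunch summaries quoted in Step 3 (children's cereals: $n_1 = 27$, $\bar y_1 = 47.2$, $s_1 = 4.35$; adult cereals: $n_2 = 29$, $\bar y_2 = 11.7$, $s_2 = 11.88$), so it agrees with the text only to the reported precision. It also computes Satterthwaite's approximation, which the text does not report, for comparison with the conservative $df = 26$.

```python
import math

# Rounded summaries from Step 3 (sugar content, percent by weight).
n1, ybar1, s1 = 27, 47.2, 4.35   # children's cereals
n2, ybar2, s2 = 29, 11.7, 11.88  # adult cereals

v1, v2 = s1**2 / n1, s2**2 / n2          # estimated variances of the two sample means
se = math.sqrt(v1 + v2)                   # standard error of ybar1 - ybar2
t_stat = (ybar1 - ybar2) / se             # two-sample t-statistic
df_conservative = min(n1 - 1, n2 - 1)     # smaller of n1 - 1 and n2 - 1
df_satterthwaite = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
ratio = ybar1 / ybar2                     # "about four times as much sugar"

print(round(se, 2), round(t_stat, 1), df_conservative)  # 2.36 15.0 26
print(round(df_satterthwaite, 1), round(ratio, 1))      # 35.8 4.0
```

Because the Satterthwaite value (about 35.8) exceeds 26, using it would make the already tiny p-value slightly smaller; the conservative choice does not change the conclusion.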
