This page intentionally left blank



Copyright © 2006 New Age International (P) Ltd., Publishers
Published by New Age International (P) Ltd., Publishers

All rights reserved. No part of this ebook may be reproduced in any form, by photostat, microfilm, xerography, or any other means, or incorporated into any information retrieval system, electronic or mechanical, without the written permission of the publisher.

PUBLISHING FOR ONE WORLD
NEW AGE INTERNATIONAL (P) LIMITED, PUBLISHERS
4835/24, Ansari Road, Daryaganj, New Delhi

PREFACE

Statistics is used in research and data analysis in almost all fields. Official government statistics are our oldest records and serve as historical evidence. Many people have contributed to the refinement of the statistics we use today, and its development has been a long process. We now have many statistical tools for the analysis of data in fields such as business, medicine, engineering, agriculture and management.

Many people find it difficult to decide which statistical technique should be applied and where. Even though computer software has reduced the work involved, a basic knowledge of the methods is a must for their proper application. This book provides the important and widely used statistical tests, with worked examples and exercises drawn from real-life applications, presented in a simple and understandable manner. It will be useful to researchers who need to apply these tests in their data analysis, statisticians will find it handy for easy reference, and it is a good companion for all who need statistical tools in their field.

The author is greatly indebted to the Authorities of Annamalai University for permission to publish this book.

V. Rajagopalan


CONTENTS

Preface

1. INTRODUCTION

2. PARAMETRIC TESTS
Test 1  Test for a Population Proportion
Test 2  Test for a Population Mean (Population variance is known)
Test 3  Test for a Population Mean (Population variance is unknown)
Test 4  Test for a Population Variance (Population mean is known)
Test 5  Test for a Population Variance (Population mean is unknown)
Test 6  Test for Goodness of Fit
Test 7  Test for Equality of two Population Proportions
Test 8  Test for Equality of two Population Means (Population variances are equal and known)
Test 9  Test for Equality of two Population Means (Population variances are unequal and known)
Test 10  Test for Equality of two Population Means (Population variances are equal and unknown)
Test 11  Test for Paired Observations
Test 12  Test for Equality of two Population Standard Deviations
Test 13  Test for Equality of two Population Variances
Test 14  Test for Consistency in a 2 × 2 Table
Test 15  Test for Homogeneity of Several Population Proportions
Test 16  Test for Homogeneity of Several Population Variances (Bartlett's test)
Test 17  Test for Homogeneity of Several Population Means
Test 18  Test for Independence of Attributes
Test 19  Test for Population Correlation Coefficient Equals Zero
Test 20  Test for Population Correlation Coefficient Equals a Specified Value
Test 21  Test for Population Partial Correlation Coefficient
Test 22  Test for Equality of two Population Correlation Coefficients
Test 23  Test for Multiple Correlation Coefficient
Test 24  Test for Regression Coefficient
Test 25  Test for Intercept in a Regression

3. ANALYSIS OF VARIANCE TESTS
Test 26  Test for Completely Randomized Design
Test 27  ANOCOVA Test for Completely Randomized Design
Test 28  Test for Randomized Block Design
Test 29  Test for Randomized Block Design (More than one observation per cell)
Test 30  ANOCOVA Test for Randomized Block Design
Test 31  Test for Latin Square Design
Test 32  Test for 2² Factorial Design
Test 33  Test for 2³ Factorial Design
Test 34  Test for Split Plot Design
Test 35  ANOVA Test for Strip Plot Design

4. MULTIVARIATE TESTS
Test 36  Test for Population Mean Vectors (Covariance matrix is known)
Test 37  Test for Population Mean Vector (Covariance matrix is known)
Test 38  Test for Equality of Population Mean Vectors (Covariance matrices are equal and known)
Test 39  Test for Equality of Population Mean Vectors (Covariance matrices are equal and unknown)
Test 40  Test for Equality of Population Mean Vectors (Covariance matrices are unequal and unknown)

5. NON-PARAMETRIC TESTS
Test 41  Sign Test for Median
Test 42  Sign Test for Medians (Paired observations)
Test 43  Median Test
Test 44  Median Test for two Populations
Test 45  Median Test for K Populations
Test 46  Wald-Wolfowitz Run Test
Test 47  Kruskal-Wallis Rank Sum Test (H Test)
Test 48  Mann-Whitney-Wilcoxon Rank Sum Test
Test 49  Mann-Whitney-Wilcoxon U-Test
Test 50  Kolmogorov-Smirnov Test for Goodness of Fit
Test 51  Kolmogorov-Smirnov Test for Comparing two Populations
Test 52  Spearman Rank Correlation Test
Test 53  Test for Randomness
Test 54  Test for Randomness of Rank Correlation
Test 55  Friedman's Test for Multiple Treatment of a Series of Objects

6. SEQUENTIAL TESTS
Test 56  Sequential Test for Population Mean (Variance is known)
Test 57  Sequential Test for Standard Deviation (Mean is known)
Test 58  Sequential Test for Dichotomous Classification
Test 59  Sequential Test for the Parameter of a Bernoulli Population
Test 60  Sequential Probability Ratio Test

TABLES
REFERENCES

CHAPTER 1 INTRODUCTION

Testing of statistical hypotheses is a remarkable aspect of statistical theory, which helps us to make decisions where there is uncertainty. There are many real-life situations in which we would like to take a decision for further action, and there are problems for which we would like to determine whether a claim is acceptable or not. Suppose that we are interested in testing the following claims:

1. The average consumption of electricity in city A is 75 units per month.
2. Bath soap B reduces the rate of skin infections by 50%.
3. Oral polio vaccine is more potent than parenteral polio vaccine.
4. A new variety of paddy yields 6.5 tonnes per hectare.
5. Drug C produces less drug dependence than drug D.
6. Health drink E improves weight gain by 5% for children.
7. A plant produced by cloning grows 50% faster than an ordinary one.
8. A door-to-door campaign increases the sales of a washing powder by 0%.
9. Machine F produces more items within specifications than Machine G.
10. The proportion of defective items in a large consignment of coconuts is less than 4%.

These are a few of the many kinds of problems that can be solved only with the help of statistical methods. To solve such problems, we need the following basic concepts of statistical theory.

1. POPULATION

In any statistical investigation, the interest usually lies in assessing the general magnitude of one or more characters relating to the individuals belonging to a group. Such a group of individuals under study is called a population. The number of units in a population is known as the population size, which may be either finite or infinite. In a finite population, the size is denoted by N. Thus, in statistics, a population is an aggregate of objects, animate or inanimate, under study.

In a statistical survey, complete enumeration of the population is tedious if the population size is very large or infinite. In some situations, even though 100% inspection is possible, the units are destroyed in the course of inspection. As complete enumeration is limited by constraints such as manpower, time and expenditure, we take the help of sampling.

2. SAMPLE

A finite, small subset of the units of a population is called a sample, and the number of units in a sample is called the sample size, denoted by n. The process of selecting a sample is known as sampling. Every member of a sample is called a sample unit, and the numerical values of the sample units are called observations. If each unit of the population has an equal chance of being included in the sample, the sample is called a random sample. A sample of n observations is denoted by $X_1, X_2, \ldots, X_n$.

3. PARAMETERS

Statistical measures such as the mean, standard deviation, variance and correlation coefficient are called parameters when they are calculated from the whole population. If the population information is not completely available, or the population is not finite, the parameters cannot be evaluated. In such cases the parameters are termed unknown.

4. STATISTICS

Statistical measures obtained from the sample alone are called statistics. Any function of the sample observations is also known as a statistic. The standard symbols used for parameters and statistics are listed below:

Statistical measure        Parameter     Statistic
Mean                       $\mu$         $\bar{X}$
Median                     $M$           $m$
Standard deviation         $\sigma$      $s$
Variance                   $\sigma^2$    $s^2$
Proportion                 $P$           $p$
Correlation coefficient    $\rho$        $r$
Regression coefficient     $\beta$       $b$

5. SAMPLING ERROR

Errors arise because only a part of the population (i.e., a sample) is used to estimate the parameters and to draw inferences about the population. Such an error is called sampling error.

6. STATISTICAL INFERENCE

The process of arriving at valid conclusions about the population based on a sample or samples is called statistical inference. It has two major divisions, namely estimation and testing of hypotheses.

7. ESTIMATION

When the parameters are unknown, they are estimated by the corresponding statistics computed from the samples. Such a process is called estimation. A statistic used to estimate an unknown parameter is called an estimator. For example, the sample mean is an estimator of the population mean. The specific value that the estimator takes for a given sample is called an estimate. Estimation is broadly classified into two types, namely point estimation and interval estimation.
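The distinction between an estimator (a rule applied to the sample) and an estimate (the value it produces) can be illustrated with a minimal Python sketch. The function name `point_estimates` and the sample values are hypothetical, and the variance is computed with the divisor n − 1, which is an assumption of this sketch rather than a convention stated in this section.

```python
import math
import statistics

def point_estimates(sample):
    """Return point estimates of the population mean, variance and
    standard deviation computed from one sample (hypothetical helper)."""
    mean = statistics.fmean(sample)          # estimator of mu: the sample mean
    variance = statistics.variance(sample)   # sample variance, divisor n - 1 (assumption)
    return {
        "n": len(sample),
        "mean": mean,
        "variance": variance,
        "sd": math.sqrt(variance),
    }

# Hypothetical observations X_1, ..., X_n
sample = [12.0, 15.0, 11.0, 14.0, 13.0]
estimates = point_estimates(sample)
print(estimates)
```

Here the function plays the role of the estimator, while the numbers it returns for this particular sample are the estimates.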

8. POINT AND INTERVAL ESTIMATION

If a single value is used as an estimate of the unknown parameter, it is called a point estimate. If instead we choose two values a and b (a < b) such that the unknown parameter is expected to lie between a and b, the interval (a, b) is called an interval estimate.

9. TESTING OF HYPOTHESIS

Hypothesis testing begins with an assumption, or hypothesized value, that we make about the unknown population parameter. Sample data are collected and sample statistics are obtained from them. These statistics are used to test whether the assumption made about the parameter is correct. The difference between the hypothesized value and the actual value of the sample statistic is determined, and we then decide whether the difference is significant or not. The smaller the difference, the greater the likelihood that our hypothesized value is correct. We cannot accept or reject the hypothesized value of a population parameter simply by intuition. The statistical tests for the significance of the difference between the hypothesized value and the actual value of the sample statistic, or of the difference between any set of sample statistics, are called tests of significance.

10. STANDARD ERROR

The standard deviation of a statistic is known as its standard error, abbreviated S.E. It plays an important role in statistical tests. The standard errors of some well-known statistics for large samples are given below:

S.No.  Statistic                  Standard error
1      $\bar{X}$                  $\sigma/\sqrt{n}$
2      $p$                        $\sqrt{PQ/n}$
3      $s$                        $\sigma/\sqrt{2n}$
4      $s^2$                      $\sigma^2\sqrt{2/n}$
5      $r$                        $(1-\rho^2)/\sqrt{n}$
6      $(\bar{X}_1-\bar{X}_2)$    $\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}$
7      $(s_1-s_2)$                $\sqrt{\sigma_1^2/(2n_1)+\sigma_2^2/(2n_2)}$
8      $(p_1-p_2)$                $\sqrt{P_1Q_1/n_1+P_2Q_2/n_2}$

11. PARAMETRIC TESTS

The statistical tests for testing the parameters of a population are called parametric tests. The different kinds of parametric tests are studied in Chapter 2.

The following test procedure is adopted in studying the parametric tests in a systematic manner.

11.1 Null Hypothesis

It is a tentative statement about the unknown population parameter, to be tested on the basis of the sample data. It always states that there is no difference between the hypothesized value and the actual value of the sample statistic. It is tested for possible rejection under the assumption that it is true. It is usually denoted by $H_0$.

11.2 Alternative Hypothesis

Any hypothesis that is complementary to the null hypothesis is called an alternative hypothesis. It is usually denoted by $H_1$.

11.3 Type-I and Type-II Errors

In hypothesis testing, we draw inferences about the population parameters on the basis of the sample data alone. Owing to sampling error, there is a possibility of rejecting a true null hypothesis, called a Type-I error, and of accepting a false null hypothesis, called a Type-II error. They are tabulated as follows:

Conclusion                                  $H_0$ is true ($H_1$ is false)   $H_0$ is false ($H_1$ is true)
$H_0$ is accepted ($H_1$ is rejected)       Correct decision                 Type-II error
$H_0$ is rejected ($H_1$ is accepted)       Type-I error                     Correct decision

The acceptance or rejection of $H_0$ depends on the test criterion used. In any hypothesis test, we would like to control both Type-I and Type-II errors. The probability of committing a Type-I error is denoted by $\alpha$ and the probability of committing a Type-II error is denoted by $\beta$.

11.4 Level of Significance

There is no standard or universal level of significance for testing hypotheses. In most instances, a 5 percent or a 1 percent level of significance is used. The level of significance should be kept small, because a higher significance level means a higher probability of rejecting a null hypothesis when it is true.
Usually, the level of significance, which is the size of the Type-I error (either 5% or 1%), is fixed in advance, before the sample information is collected.

11.5 Critical Region

A region of the sample space S, corresponding to a statistic t, that amounts to rejection of $H_0$ is termed the region of rejection or critical region. If $\omega$ is the critical region and t is a statistic based on a sample of size n, then $P(t \in \omega \mid H_0) = \alpha$. That is, the null hypothesis is rejected if the observed value falls in the critical region. The boundary value of the critical region is called the critical value. Let it be $Z_\alpha$.

11.6 One-sided and Two-sided Tests

In any test, the critical region is represented by a portion of the area under the probability curve of the sampling distribution of the statistic. A test in which the alternative hypothesis is one-sided (left-sided or right-sided) is called a one-sided test. For example, a test of the mean of a population, $H_0: \mu = \mu_0$, against the alternative hypothesis $H_1: \mu < \mu_0$ (left-sided) or $H_1: \mu > \mu_0$ (right-sided) is a one-sided test, whereas a test of $H_0$ against $H_1: \mu \ne \mu_0$ (two-sided) is known as a two-sided test.

11.7 Test Statistic

A statistical test is conducted by means of a test statistic whose probability distribution is determined under the assumption that the null hypothesis is true. It is based on the statistic, the expected value of the statistic (the hypothesized value assumed in $H_0$) and the standard error of the statistic. The value of the test statistic computed from the observed data is called the observed value of the test statistic. Let it be Z. We use this value to arrive at the conclusion.

11.8 Conclusion

The conclusion is arrived at by comparing the observed value of the test statistic with the critical value. If $|Z| \le Z_\alpha$, we conclude that there is no evidence against the null hypothesis $H_0$, and hence it may be accepted. If $|Z| > Z_\alpha$, we conclude that there is evidence against the null hypothesis $H_0$ and in favor of $H_1$. Hence $H_0$ is rejected and $H_1$ is accepted.

12. ANALYSIS OF VARIANCE

Analysis of variance is a powerful statistical tool in tests of significance. Under parametric tests, we discuss the statistical tests relating to the mean of a population or the equality of the means of two populations. When we have three or more samples to consider at a time, an alternative procedure is needed for testing the hypothesis that all the samples are drawn from the same population, or from populations having the same mean. Analysis of variance (ANOVA) was introduced by R.A. Fisher to deal with this problem in the analysis of agricultural data. Variation in the observations is inherent in nature. The total variation in the observed data is due to two causes, namely (i) assignable causes and (ii) chance causes.
By this technique, the total variation in the sample data can be split into variation between samples and variation within samples. The second kind of variation is due to experimental error. These tests are widely applicable in agricultural field experiments, where the interest lies in the yield obtained with different kinds of seeds, fertilizers, pesticides, irrigation, methods of cultivation and so on. Accordingly, different types of ANOVA tests are available, and they are provided in Chapter 3. The following terms are needed in ANOVA tests.

12.1 Treatments

The various factors or methods adopted in a comparative experiment are termed treatments. For example, in field experiments, different varieties of paddy seeds, different kinds of fertilizers and different methods of cultivation are called treatments.

12.2 Experimental Unit

A small area of experimental material to which a treatment is applied is called an experimental unit. In agricultural experiments, the cultivated land, usually called the experimental material, is divided into smaller plots to which different treatments can be applied. Such plots are called experimental units.

12.3 Blocks

In field experiments, the experimental material is first divided into relatively homogeneous divisions, known as blocks. Each block is further divided into small plots, the experimental units.

12.4 Replication

The repetition of the treatments on the experimental units a number of times in the investigation is called replication. In agricultural experiments, each block receives all the treatments, so that every treatment is repeated once in each block. Hence, in the analysis, the number of blocks is the same as the number of replications.

12.5 Randomization

The allocation of the treatments to the experimental units in a random manner is called randomization. Different kinds of randomization are adopted in the ANOVA tests, namely complete randomization, randomization within blocks, row-wise, column-wise and so on, according to the type of experimental design.

13. MULTIVARIATE DATA ANALYSIS

The analysis of data on more than one character (variable), usually called multivariate analysis, plays an important role in the theory of statistics. Such data are two-dimensional. For example, in the study of the physical characters age ($X_1$), height ($X_2$) and weight ($X_3$) of N individuals, the data can be arranged as a matrix of order $3 \times N$, one direction being the sample units and the other being the variables. Hence matrix theory has a major role in multivariate data analysis, and readers should have a knowledge of matrix algebra. The tests of significance relating to multivariate data are provided in Chapter 4.

14. NON-PARAMETRIC METHODS

The hypothesis tests mentioned above make inferences about population parameters. These parametric tests use the statistics of samples that come from the population being tested. For those tests, we make assumptions about the population from which the samples are drawn. There are tests that place no restriction or assumption on the population from which we sample. They are known as distribution-free or non-parametric tests. The hypotheses of non-parametric tests are concerned with something other than the value of a population parameter. The different kinds of non-parametric tests are discussed in Chapter 5.

15. SEQUENTIAL TESTS

The statistical tests mentioned earlier are based on a fixed sample size. That is, the number of sample observations for those tests is a constant. In sequential tests, however, the number of observations required depends on the outcome of the observations and is therefore not pre-determined, but a random variable. The sequential test of a hypothesis $H_0$ against $H_1$ is described as follows. At each stage of the experiment, a sample observation is drawn and one of the following three decisions is made: (i) accept $H_0$, (ii) reject $H_0$ (or accept $H_1$), or (iii) continue the experiment by making an additional observation. Thus the test procedure is carried out sequentially. Some of the sequential tests are provided in Chapter 6.
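The three-decision rule of a sequential test can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses Wald's sequential probability ratio test for a Bernoulli parameter with the usual approximate boundaries $\log\{(1-\beta)/\alpha\}$ and $\log\{\beta/(1-\alpha)\}$, and the function name `sprt_bernoulli` and its arguments are hypothetical, not taken from this book.

```python
import math

def sprt_bernoulli(observations, p0, p1, alpha, beta):
    """Sequential test of H0: P = p0 against H1: P = p1 (with p1 > p0).

    Returns (decision, number_of_observations_used). The boundaries are
    Wald's approximations, an assumption of this sketch.
    """
    upper = math.log((1 - beta) / alpha)   # cross it: reject H0
    lower = math.log(beta / (1 - alpha))   # cross it: accept H0
    log_ratio = 0.0
    for count, x in enumerate(observations, start=1):
        # Add the log likelihood ratio contributed by this observation.
        if x:
            log_ratio += math.log(p1 / p0)
        else:
            log_ratio += math.log((1 - p1) / (1 - p0))
        if log_ratio >= upper:
            return "reject H0", count
        if log_ratio <= lower:
            return "accept H0", count
    # Neither boundary reached: an additional observation is needed.
    return "continue", len(observations)

# Hypothetical stream of 0/1 outcomes
decision, used = sprt_bernoulli([1, 1, 1], p0=0.1, p1=0.5, alpha=0.05, beta=0.05)
print(decision, used)
```

The number of observations actually used is not fixed in advance: it depends on when the accumulated evidence first crosses one of the two boundaries.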

CHAPTER 2 PARAMETRIC TESTS


TEST 1
TEST FOR A POPULATION PROPORTION

Aim

To test whether the population proportion P may be regarded as $P_0$, based on a random sample. That is, to investigate the significance of the difference between the observed sample proportion p and the assumed population proportion $P_0$.

Source

If X is the number of occurrences of an event in n independent trials with constant probability P of occurrence of that event in each trial, then $E(X) = nP$ and $V(X) = nPQ$, where $Q = 1 - P$ is the probability of non-occurrence of that event. It has been proved that, for large n, the binomial distribution tends to the normal distribution. Hence the normal test can be applied. In a random sample of size n, let X be the number of persons possessing the given attribute, and let the observed proportion in the sample be $p = X/n$. Then

$$E(p) = P, \qquad S.E.(p) = \sqrt{Var(p)} = \sqrt{\frac{P(1-P)}{n}}.$$

Assumption

The sample size must be sufficiently large (i.e., n > 30) to justify the normal approximation to the binomial.

Null Hypothesis

$H_0$: The population proportion P is regarded as $P_0$. That is, there is no significant difference between the observed sample proportion p and the assumed population proportion $P_0$, i.e., $H_0: P = P_0$.

Alternative Hypotheses

$H_{1(1)}: P \ne P_0$
$H_{1(2)}: P > P_0$
$H_{1(3)}: P < P_0$

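Level of Significance ($\alpha$) and Critical Region

(1) $|Z| > Z_\alpha$ such that $P\{|Z| > Z_\alpha\} = \alpha$
[Figure: standard normal curve with a rejection region of area α/2 in each tail]

(2) $Z > Z_\alpha$ such that $P\{Z > Z_\alpha\} = \alpha$
[Figure: standard normal curve with a rejection region of area α in the right tail]

(3) $Z < -Z_\alpha$ such that $P\{Z < -Z_\alpha\} = \alpha$
[Figure: standard normal curve with a rejection region of area α in the left tail]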
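The boundary of each critical region is a quantile of the standard normal distribution, so the critical values can be computed rather than read from a table. The following Python sketch uses only the standard library; the function name `z_critical` and its `tail` argument are hypothetical names chosen for this illustration.

```python
from statistics import NormalDist

def z_critical(alpha, tail="two-sided"):
    """Critical value of the standard normal distribution.

    tail = "two-sided": value c with P(|Z| > c) = alpha
    tail = "right":     value c with P(Z > c)  = alpha
    tail = "left":      value c with P(Z < c)  = alpha (negative)
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two-sided":
        return std_normal.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two-sided', 'right' or 'left'")

for alpha in (0.01, 0.05, 0.10):
    print(alpha,
          round(z_critical(alpha), 3),
          round(z_critical(alpha, "right"), 3),
          round(z_critical(alpha, "left"), 3))
```

For the two-sided region the probability α is shared equally between the two tails, which is why the quantile is taken at 1 − α/2.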

Critical Values ($Z_\alpha$)

Critical value ($Z_\alpha$)    Level of significance ($\alpha$)
                               1%                    5%                     10%
1. Two-sided test              $Z_\alpha = 2.58$     $Z_\alpha = 1.96$      $Z_\alpha = 1.645$
2. Right-sided test            $Z_\alpha = 2.33$     $Z_\alpha = 1.645$     $Z_\alpha = 1.28$
3. Left-sided test             $-Z_\alpha = -2.33$   $-Z_\alpha = -1.645$   $-Z_\alpha = -1.28$

Test Statistic

$$Z = \frac{p - P}{\sqrt{\dfrac{P(1-P)}{n}}} \qquad (\text{under } H_0: P = P_0)$$

The statistic Z follows the standard normal distribution.

Conclusions

1. If $|Z| \le Z_\alpha$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$. Hence it may be accepted at the α% level of significance. Otherwise reject $H_0$, or accept $H_{1(1)}$.
2. If $Z \le Z_\alpha$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$, and hence it may be accepted at the α% level of significance. Otherwise reject $H_0$, or accept $H_{1(2)}$.
3. If $Z \ge -Z_\alpha$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$, and hence it may be accepted at the α% level of significance. Otherwise reject $H_0$, or accept $H_{1(3)}$.

Example 1

Hindustan Lever Ltd. (HLL) expects that more than 30% of the households in Delhi city will consume its product if it manufactures a new face cream. A random sample of 500 households from the city is surveyed, and 163 are in favor of the product. Examine whether the expectation of the company would be met at the 1% level.

Solution

Aim: To test whether the new face cream manufactured by HLL will be consumed by 30% of the households in New Delhi or by more.

$H_0$: The new face cream will be consumed by 30% of the households in New Delhi, i.e., $H_0: P = 0.3$.

$H_1$: The new face cream will be consumed by more than 30% of the households in New Delhi, i.e., $H_1: P > 0.3$.

Level of Significance: $\alpha = 0.05$ and Critical Value: $Z_\alpha = 1.645$

Based on the above data, we observe that n = 500 and p = 163/500 = 0.326.

Test Statistic (under $H_0: P = 0.3$):

$$Z = \frac{p - P}{\sqrt{\dfrac{P(1-P)}{n}}} = \frac{0.326 - 0.3}{\sqrt{\dfrac{(0.3)(0.7)}{500}}} = 1.27$$

Conclusion: Since $Z < Z_\alpha$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$. Hence $H_0$ is accepted at the 5% level of significance. The same conclusion holds at the 1% level asked for in the problem, since 1.27 < 2.33. That is, the data do not support the expectation that more than 30% of the households in New Delhi will consume the new face cream.

Example 2

A plastic surgery department wants to know the necessity of mesh repair of hernia. It thinks that only 15% of hernia patients need mesh. In a sample of 250 hernia patients from hospitals, only 42 needed mesh. Test at the 1% level of significance whether the expectation of the department about mesh repair of hernia patients is true.

Solution

Aim: To test whether the necessity of hernia repair with mesh is 15% or not.

$H_0$: The necessity of mesh repair of hernia is 15%, i.e., $H_0: P = 0.15$.

$H_1$: The necessity of mesh repair of hernia is not 15%, i.e., $H_1: P \ne 0.15$.

Level of Significance: $\alpha = 0.01$ and Critical Value (two-sided): $Z_\alpha = 2.58$

Based on the above data, we observe that n = 250 and p = 42/250 = 0.168.

Test Statistic (under $H_0: P = 0.15$):

$$Z = \frac{p - P}{\sqrt{\dfrac{P(1-P)}{n}}} = \frac{0.168 - 0.15}{\sqrt{\dfrac{(0.15)(0.85)}{250}}} = 0.80$$

Conclusion: Since $|Z| < Z_\alpha$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$. Hence $H_0$ is accepted at the 1% level of significance. That is, the department's expectation that 15% of hernia patients need mesh repair may be regarded as true.

EXERCISES

1. A random sample of 400 apples was taken from a large consignment and 35 were found to be bad. Examine whether the bad items in the lot will be 7% at the 1% level.
2. 50 people were attacked by a disease, of whom 5 died. Will you reject the hypothesis that the death rate, if attacked by this disease, is 3% against the hypothesis that it is more, at the 5% level?
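The steps of this test (sample proportion, standard error under $H_0$, test statistic, comparison with the critical value) can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function name `proportion_z_test` and its arguments are hypothetical, and the figures passed to it (163 favorable households out of 500, $P_0 = 0.3$) are taken as an assumed reading of the face cream example.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, p0, alpha=0.05, tail="right"):
    """Large-sample Z test of H0: P = p0 using the normal approximation."""
    p_hat = successes / n                          # observed sample proportion p
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # S.E.(p) under H0
    z = (p_hat - p0) / standard_error
    std_normal = NormalDist()
    if tail == "two-sided":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "left":
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    else:
        raise ValueError("tail must be 'two-sided', 'right' or 'left'")
    return z, critical, reject

z, critical, reject = proportion_z_test(163, 500, 0.3, alpha=0.05, tail="right")
print(f"Z = {z:.2f}, critical value = {critical:.3f}, reject H0: {reject}")
```

With these inputs the statistic is about 1.27, below the right-sided 5% critical value, so $H_0$ is not rejected.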

TEST 2
TEST FOR A POPULATION MEAN
(Population Variance is Known)

Aim

To test whether the population mean µ may be regarded as $\mu_0$, based on a random sample. That is, to investigate the significance of the difference between the sample mean $\bar{X}$ and the assumed population mean $\mu_0$.

Source

Let $\bar{X}$ be the mean of a random sample of n independent observations drawn from a population whose mean µ is unknown and whose variance $\sigma^2$ is known.

Assumptions

(i) The population from which the sample is drawn is assumed to be normally distributed.
(ii) The population variance $\sigma^2$ is known.

Null Hypothesis

$H_0$: The sample has been drawn from a population with mean $\mu = \mu_0$. That is, there is no significant difference between the sample mean $\bar{X}$ and the assumed population mean $\mu_0$, i.e., $H_0: \mu = \mu_0$.

Alternative Hypotheses

$H_{1(1)}: \mu \ne \mu_0$
$H_{1(2)}: \mu > \mu_0$
$H_{1(3)}: \mu < \mu_0$

Level of Significance ($\alpha$) and Critical Region: (As in Test 1)

Test Statistic

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \qquad (\text{under } H_0: \mu = \mu_0)$$

The statistic Z follows the standard normal distribution.

Conclusions (As in Test 1)

Example 1

The daily wages of a factory's workers are assumed to be normally distributed. A random sample of 50 workers has an average daily wage of rupees 120. Test whether the average daily wage in that factory may be regarded as rupees 125, with a standard deviation of rupees 20, at the 5% level of significance.

Solution

Aim: To test the null hypothesis that the average daily wage of the factory's workers may be regarded as rupees 125, with a standard deviation of rupees 20.

$H_0$: The average daily wage of the factory's workers is 125 rupees, i.e., $H_0: \mu = 125$.

$H_1$: The average daily wage of the factory's workers is not 125 rupees, i.e., $H_1: \mu \ne 125$.

Level of Significance: $\alpha = 0.05$ and Critical Value: $Z_\alpha = 1.96$

Test Statistic (under $H_0: \mu = 125$):

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{120 - 125}{20/\sqrt{50}} = -1.77$$

Conclusion: Since the absolute value of the observed test statistic, $|Z| = 1.77$, is smaller than the critical value 1.96 at the 5% level of significance, the data do not provide any evidence against the null hypothesis $H_0$. Hence it is accepted, and we conclude that the average daily wage of the factory's workers may be regarded as rupees 125 with a standard deviation of rupees 20.

Example 2

A bulb manufacturing company hypothesizes that the average life of its product is 1,450 hours. It knows that the standard deviation of bulb life is 210 hours. From a sample of 100 bulbs, the company finds a sample mean of 1,390 hours. At the 1% level of significance, should the company conclude that the average life of the bulbs is less than the hypothesized 1,450 hours?

Solution

Aim: To test whether the average life of the bulbs may be regarded as 1,450 hours or less.

$H_0$: The average life of the bulbs is 1,450 hours, i.e., $H_0: \mu = 1450$.

$H_1$: The average life of the bulbs is below 1,450 hours, i.e., $H_1: \mu < 1450$.

Level of Significance: $\alpha = 0.01$ and Critical Value: $-Z_\alpha = -2.33$

Test Statistic (under $H_0: \mu = 1450$):

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{1390 - 1450}{210/\sqrt{100}} = -2.86$$

Conclusion: Since the observed value of the test statistic, Z = −2.86, is smaller than the critical value −2.33 at the 1% level of significance, the data provide evidence against the null hypothesis $H_0$ and in favor of $H_1$. Hence $H_1$ is accepted, and we conclude that the average life of the bulbs is significantly less than the hypothesized 1,450 hours.

EXERCISES

1. A film producer knows that his movies ran for an average of 100 days in each city of Tamilnadu, and the corresponding standard deviation was 8 days. A researcher randomly chose 80 theatres in the southern districts and found that they ran the movie for an average of 86 days. Test the hypothesis at the 1% significance level.
2. A sample of 50 children observed from the rural areas of a district has an average birth weight of 2.85 kg. Past records show that the standard deviation of birth weight in the district is 0.3 kg. Can we expect that the average birth weight of the children in the district will be more than 3 kg at the 5% level?
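The computation of this test can be sketched in Python as follows. The function name `mean_z_test` is hypothetical, and the inputs (sample mean 1,390, hypothesized mean 1,450, σ = 210, n = 100, left-sided test at the 1% level) are illustrative values assumed for the bulb-life example.

```python
import math
from statistics import NormalDist

def mean_z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two-sided"):
    """Z test of H0: mu = mu0 when the population variance is known."""
    standard_error = sigma / math.sqrt(n)     # S.E. of the sample mean
    z = (sample_mean - mu0) / standard_error
    std_normal = NormalDist()
    if tail == "two-sided":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "left":
        critical = std_normal.inv_cdf(alpha)  # negative critical value
        reject = z < critical
    else:
        raise ValueError("tail must be 'two-sided', 'right' or 'left'")
    return z, critical, reject

z_bulb, crit_bulb, reject_bulb = mean_z_test(1390, 1450, 210, 100,
                                             alpha=0.01, tail="left")
print(f"Z = {z_bulb:.2f}, critical value = {crit_bulb:.2f}, "
      f"reject H0: {reject_bulb}")
```

Because the alternative is left-sided, the statistic is compared with the negative critical value, and $H_0$ is rejected only when Z falls below it.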

TEST 3
TEST FOR A POPULATION MEAN
(Population Variance is Unknown)

Aim

To test whether the population mean µ may be regarded as $\mu_0$, based on a random sample. That is, to investigate the significance of the difference between the sample mean $\bar{X}$ and the assumed population mean $\mu_0$.

Source

A random sample of n observations $X_i$ (i = 1, 2, …, n) is drawn from a population whose mean µ and variance $\sigma^2$ are unknown.

Assumptions

(i) The population from which the sample is drawn is normally distributed.
(ii) The population variance $\sigma^2$ is unknown. (Since $\sigma^2$ is unknown, it is replaced by its unbiased estimate $S^2$.)

Null Hypothesis

$H_0$: The sample has been drawn from a population with mean $\mu = \mu_0$. That is, there is no significant difference between the sample mean $\bar{X}$ and the assumed population mean $\mu_0$, i.e., $H_0: \mu = \mu_0$.

Alternative Hypotheses

$H_{1(1)}: \mu \ne \mu_0$
$H_{1(2)}: \mu > \mu_0$
$H_{1(3)}: \mu < \mu_0$

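Level of Significance ($\alpha$) and Critical Region

(1) $|t| > t_{\alpha, n-1}$ such that $P\{|t| > t_{\alpha, n-1}\} = \alpha$
[Figure: t-distribution curve with a rejection region of area α/2 in each tail]

(2) $t > t_{\alpha, n-1}$ such that $P\{t > t_{\alpha, n-1}\} = \alpha$
[Figure: t-distribution curve with a rejection region of area α in the right tail]

(3) $t < -t_{\alpha, n-1}$ such that $P\{t < -t_{\alpha, n-1}\} = \alpha$
[Figure: t-distribution curve with a rejection region of area α in the left tail]

The critical values ($t_{\alpha, n-1}$) are obtained from the table of the t-distribution (see TABLES).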

Test Statistic

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \qquad (\text{under } H_0: \mu = \mu_0)$$

where

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2.$$

The statistic t follows the t-distribution with (n − 1) degrees of freedom.

Conclusions

1. If $|t| \le t_\alpha$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$, and hence it may be accepted at the α% level of significance. Otherwise reject $H_0$, or accept $H_{1(1)}$.
2. If $t \le t_\alpha$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$, and hence it may be accepted at the α% level of significance. Otherwise reject $H_0$, or accept $H_{1(2)}$.
3. If $t \ge -t_\alpha$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$, and hence it may be accepted at the α% level of significance. Otherwise reject $H_0$, or accept $H_{1(3)}$.

Example 1

A sample of 12 students from a school has the following scores in an I.Q. test: ... Do these data support the claim that the mean I.Q. mark of the school students is 80? Test at the 5% level.

Solution

Aim: To test whether the mean I.Q. mark of the school students may be regarded as 80 or not.

$H_0$: The mean I.Q. mark of the school students is 80, i.e., $H_0: \mu = 80$.

$H_1$: The mean I.Q. mark of the school students is not 80, i.e., $H_1: \mu \ne 80$.

Level of Significance: $\alpha = 0.05$ and Critical Value: $t_{0.05, 11} = 2.20$

Test Statistic (under $H_0: \mu = 80$):

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\bar{X} - 80}{7.0/\sqrt{12}} = 0.5$$

Conclusion: Since $|t| < 2.20$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$. Hence $H_0$ is accepted at the 5% level of significance. That is, the mean I.Q. mark of the school students may be regarded as 80.

Example 2
The average breaking strength of steel rods is specified as.5 kg. To test this, a sample of 0 rods was examined. The mean and standard deviation obtained were.35 kg and.5 respectively. Is the result of the experiment significant at 5% level?

Solution
Aim: To test whether the specified average breaking strength of steel rods,.5 kg, is true or not.
$H_0$: The specified average breaking strength of steel rods,.5 kg, is true, i.e., $H_0: \mu$ =.5.
$H_1$: The specified average breaking strength of steel rods,.5 kg, is not true, i.e., $H_1: \mu \neq$.5.
Level of Significance: $\alpha$ = 0.05 and Critical Value: t 0.05,9 =.09
Test Statistic: $t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ (under $H_0: \mu$ =.5).35.5 = =
Conclusion: Since $|t|$ <.09, we conclude that the data do not provide any evidence against the null hypothesis $H_0$, and hence it may be accepted at 5% level of significance. That is, the specified average breaking strength of steel rods,.5 kg, is true.

EXERCISES
1. A salesperson says that the average sales of pickle in a week will be 0 numbers. The sales in a sample of 8 weeks were observed as Examine whether the claim of the salesperson is true at % significance level.
2. A sample of 0 coconut trees from a grove has the following yield of coconuts in a season: Shall we conclude that the average yield of coconuts from the grove is 65? Test at % level.

TEST 4
TEST FOR A POPULATION VARIANCE (Population Mean is Known)

Aim
To test whether the population variance $\sigma^2$ can be regarded as $\sigma_0^2$, based on a random sample. That is, to investigate the significance of the difference between the assumed population variance $\sigma_0^2$ and the sample variance $s^2$.

Source
A random sample of $n$ observations $X_i$ $(i = 1, 2, \ldots, n)$ is drawn from a normal population with known mean $\mu$ and unknown variance $\sigma^2$.

Assumption
The population from which the sample is drawn is normally distributed.

Null Hypothesis
$H_0$: The population variance $\sigma^2$ is $\sigma_0^2$. That is, there is no significant difference between the assumed population variance $\sigma_0^2$ and the sample variance $s^2$, i.e., $H_0: \sigma^2 = \sigma_0^2$.

Alternative Hypotheses
$H_{1(1)}: \sigma^2 \neq \sigma_0^2$
$H_{1(2)}: \sigma^2 > \sigma_0^2$
$H_{1(3)}: \sigma^2 < \sigma_0^2$

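Level of Significance ($\alpha$) and Critical Region
(1) $\chi^2 < \chi^2_{1-(\alpha/2),\,n}$ or $\chi^2 > \chi^2_{(\alpha/2),\,n}$ such that $P\{\chi^2 < \chi^2_{1-(\alpha/2),\,n} \,\cup\, \chi^2 > \chi^2_{(\alpha/2),\,n}\} = \alpha$
(Figure: $\chi^2$ density with an area of $\alpha/2$ in each tail, below $\chi^2_{1-(\alpha/2),\,n}$ and above $\chi^2_{(\alpha/2),\,n}$.)
(2) $\chi^2 > \chi^2_{\alpha,\,n}$ such that $P\{\chi^2 > \chi^2_{\alpha,\,n}\} = \alpha$
(Figure: $\chi^2$ density with an area of $\alpha$ in the upper tail, above $\chi^2_{\alpha,\,n}$.)
(3) $\chi^2 < \chi^2_{1-\alpha,\,n}$ such that $P\{\chi^2 < \chi^2_{1-\alpha,\,n}\} = \alpha$
(Figure: $\chi^2$ density with an area of $\alpha$ in the lower tail, below $\chi^2_{(1-\alpha),\,n}$.)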

The critical values of the left-sided and right-sided tests, denoted a and b, are obtained from Table 3.

Test Statistic
$$\chi^2 = \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\sigma_0^2}$$
The statistic $\chi^2$ follows the $\chi^2$ distribution with $n$ degrees of freedom.

Conclusions
1. If $\chi^2_{1-(\alpha/2)} \le \chi^2 \le \chi^2_{(\alpha/2)}$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$, and hence it may be accepted at the $\alpha$ level of significance. Otherwise reject $H_0$ or accept $H_{1(1)}$.
2. If $\chi^2 \le \chi^2_{\alpha}$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$, and hence it may be accepted at the $\alpha$ level of significance. Otherwise reject $H_0$ or accept $H_{1(2)}$.
3. If $\chi^2 \ge \chi^2_{1-\alpha}$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$, and hence it may be accepted at the $\alpha$ level of significance. Otherwise reject $H_0$ or accept $H_{1(3)}$.

Example 1
An agriculturist expects that the average yield of coconut is 63 per coconut tree and the variance is 0.5 per year from a coconut grove. A random sample of 0 coconut trees has the following yield in a year: Test whether the variance is significant at 5% level of significance.

Solution
Aim: To test whether the variance of the yield of coconut from the grove differs significantly from the sample variance or not.
$H_0$: The variance of the yield of coconut in the grove is 0.5, i.e., $H_0: \sigma^2$ = 0.5
$H_1$: The variance of the yield of coconut in the grove is not 0.5, i.e., $H_1: \sigma^2 \neq$ 0.5
Level of Significance: $\alpha$ = 0.05
Critical Values: $\chi^2$ (.975), 0 = 3.47 & $\chi^2$ (.05), 0 = 0.483
Critical Region: P ($\chi^2$ (.975), 0 < 3.47) + P ($\chi^2$ (.05), 0 > 0.483) = 0.0
Test Statistic: $\chi^2 = \dfrac{\sum_{i=1}^{n} (X_i - \mu)^2}{\sigma_0^2}$ = 49. =

Conclusion: Since $\chi^2_{1-(\alpha/2)} < \chi^2 < \chi^2_{(\alpha/2)}$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$. Hence, $H_0$ is accepted at 5% level of significance. That is, the variance of the yield of coconut in the grove can be regarded as 0.5.

Example 2
The variation of birth weight (as measured by the variance) of children in a region is expected to be more than 0.6. The mean birth weight is known to be.4 kg. A sample of children is selected, whose birth weights are as follows.
Weight (in kg):
Set up the hypotheses and test the expectation at 5% level of significance.

Solution
Aim: To test whether the variance of the birth weight of the children is 0.6 or more.
$H_0$: The variance of the birth weight of children in the region is 0.6, i.e., $H_0: \sigma^2$ = 0.6
$H_1$: The variance of the birth weight of children in the region is more than 0.6, i.e., $H_1: \sigma^2 >$ 0.6
Level of Significance: $\alpha$ = 0.05 and Critical Value: $\chi^2$ 0.05, =
Test Statistic: $\chi^2 = \dfrac{\sum_{i=1}^{n} (X_i - \mu)^2}{\sigma_0^2}$ = =.94
Conclusion: Since $\chi^2 < \chi^2_{\alpha}$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$. Hence, $H_0$ is accepted at 5% level of significance. That is, the variance of the birth weight of children in the region is 0.6.

EXERCISES
1. A psychologist is aware of studies showing that the mean and variability (measured as variance) of the attention spans of 5-year-olds can be summarized as 80 and 64 minutes respectively. She wants to study whether the variability of the attention span of 6-year-olds is different. A sample of 0 6-year-olds has the following attention spans in minutes: State explicit null and alternative hypotheses and test at 5% level.
2. The average and variance of the daily expenditure of office-going women are known to be Rs.30 and Rs.0 respectively. A sample of 0 office-going women is selected, whose daily expenditure is obtained as Test whether the variance of the daily expenditure of office-going women is 0 at % level of significance.
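The Test 4 statistic is a sum of squared deviations from the known mean, divided by the hypothesized variance. The sketch below shows that calculation and the two-sided decision rule. It is only an illustration: the names `chi_square_known_mean` and `two_sided_decision` are hypothetical, the data are made up, and the two critical values are read from a $\chi^2$ table by hand.

```python
def chi_square_known_mean(sample, mu, sigma0_sq):
    """Return (chi-square, degrees of freedom) when the population mean mu is known."""
    n = len(sample)
    chi2 = sum((x - mu) ** 2 for x in sample) / sigma0_sq
    return chi2, n  # n degrees of freedom because mu is not estimated


def two_sided_decision(chi2, lower, upper):
    """Keep H0 when the statistic lies between the two tabulated critical values."""
    return "do not reject H0" if lower <= chi2 <= upper else "reject H0"


# Hypothetical data, not taken from the examples in the text.
sample = [2, 4, 6, 8]
chi2_stat, df = chi_square_known_mean(sample, mu=5, sigma0_sq=4)

# Tabulated chi-square points with 4 degrees of freedom for alpha = 0.05:
# lower (upper-tail probability 0.975) = 0.484, upper (0.025) = 11.143.
decision = two_sided_decision(chi2_stat, lower=0.484, upper=11.143)
print(f"chi-square = {chi2_stat:.3f}, df = {df}, decision: {decision}")
```

For a right-sided or left-sided alternative, only one tabulated value is needed and the comparison becomes `chi2 <= upper` or `chi2 >= lower` respectively.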

TEST 5
TEST FOR A POPULATION VARIANCE (Population Mean is Unknown)

Aim
To test whether the population variance $\sigma^2$ can be regarded as $\sigma_0^2$, based on a random sample. That is, to investigate the significance of the difference between the assumed population variance $\sigma_0^2$ and the sample variance $s^2$.

Source
A random sample of $n$ observations $X_i$ $(i = 1, 2, \ldots, n)$ is drawn from a normal population with mean $\mu$ and variance $\sigma^2$ (both unknown). The unknown population mean $\mu$ is estimated by its unbiased estimate $\bar{X}$.

Assumption
The population from which the sample is drawn is normally distributed.

Null Hypothesis
$H_0$: The population variance $\sigma^2$ is $\sigma_0^2$. That is, there is no significant difference between the assumed population variance $\sigma_0^2$ and the sample variance $s^2$, i.e., $H_0: \sigma^2 = \sigma_0^2$.

Alternative Hypotheses
$H_{1(1)}: \sigma^2 \neq \sigma_0^2$
$H_{1(2)}: \sigma^2 > \sigma_0^2$
$H_{1(3)}: \sigma^2 < \sigma_0^2$

Level of Significance ($\alpha$) and Critical Region: (As in Test 4)

Test Statistic
$$\chi^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma_0^2}$$
The statistic $\chi^2$ follows the $\chi^2$ distribution with $(n-1)$ degrees of freedom.

Conclusions (As in Test 4)

Example 1
A Statistics Professor conducted an examination for a class of 3 freshmen and sophomores. The mean score was 7.7 and the sample standard deviation was 5.9. Past experience leads the Professor to believe that a standard deviation of about 3 points on a 00-point examination indicates that the exam does a good job. Does this exam meet his goodness criterion at 0% level?

Solution
Aim: To test whether the examination meets the professor's goodness criterion or not.
$H_0$: The variance of the score on the exam is regarded as 3 (=69), i.e., $H_0: \sigma^2$ = 69
$H_1$: The variance of the score on the exam is not 69, i.e., $H_1: \sigma^2 \neq$ 69
Level of Significance: $\alpha$ = 0.0
Critical Values: $\chi^2$ (.95), 30 = & $\chi^2$ (.05), 30 =
Critical Region: P ($\chi^2$ (.95),30 < 8.493) + P ($\chi^2$ (.05),30 > ) = 0.0
Test Statistic: $\chi^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma_0^2} = \dfrac{n s^2}{\sigma_0^2}$ = 3 (5.9) / 3 =
Conclusion: Since $\chi^2 > \chi^2_{(\alpha/2)}$, we conclude that the data provide evidence against the null hypothesis $H_0$ and in favor of $H_1$. Hence, $H_1$ is accepted at 0% level of significance. That is, this examination does not meet his goodness criterion of a standard deviation of 3.

Example 2
The variation of daily sales in a vegetable mart is reported as Rs.00. A sample of 0 days was observed, with variance Rs.60. Test whether the variance of the sales in the vegetable mart can be regarded as Rs.00 or not at % level of significance.

Solution
Aim: To test whether the variance of the sales in the vegetable mart can be regarded as Rs.00 or not.
$H_0$: The variance of the sales in the vegetable mart is Rs.00, i.e., $H_0: \sigma^2$ = 00
$H_1$: The variance of the sales in the vegetable mart is not Rs.00, i.e., $H_1: \sigma^2 \neq$ 00
Level of Significance: $\alpha$ = 0.05
Critical Values: $\chi^2$ (.975), 9 = & $\chi^2$ (.05), 9 = 3.85

Critical Region: P ($\chi^2$ (.975), 9 < 8.907) + P ($\chi^2$ (.05), 9 > 3.85) = 0.05
Test Statistic: $\chi^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma_0^2}$ = = 3 00
Conclusion: Since $\chi^2_{1-(\alpha/2)} < \chi^2 < \chi^2_{(\alpha/2)}$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$. Hence, $H_0$ is accepted at 5% level of significance. That is, the variance of the sales in the vegetable mart is Rs.00.

EXERCISES
1. A manufacturer claims that the lifetime of a certain brand of batteries produced by his company has a variance of more than 6800 hours. A sample of 0 batteries selected from the production department of that company has a variance of 5000 hours. Test the manufacturer's claim at 5% level.
2. A manufacturer recorded the cut-off bias (volt) of a sample of 0 tubes as follows: The variability of cut-off bias for tubes of a standard type, as measured by the standard deviation, is 0.0 volts. Is the variability of the new tube with respect to cut-off bias less than that of the standard type at % level?
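Test 5 differs from Test 4 only in that the deviations are taken from the sample mean, which costs one degree of freedom. A minimal sketch of that calculation follows. The function name `chi_square_unknown_mean` and the data are hypothetical, and the critical values are assumed to be looked up in a $\chi^2$ table rather than computed.

```python
def chi_square_unknown_mean(sample, sigma0_sq):
    """Return (chi-square, degrees of freedom) when the population mean is unknown."""
    n = len(sample)
    x_bar = sum(sample) / n  # sample mean replaces the unknown population mean
    sum_sq = sum((x - x_bar) ** 2 for x in sample)
    return sum_sq / sigma0_sq, n - 1  # one degree of freedom is lost to x_bar


# Hypothetical data, not taken from the examples in the text.
sample = [2, 4, 6, 8]
chi2_stat, df = chi_square_unknown_mean(sample, sigma0_sq=5)

# Tabulated chi-square points with 3 degrees of freedom for alpha = 0.05:
# lower (upper-tail probability 0.975) = 0.216, upper (0.025) = 9.348.
lower, upper = 0.216, 9.348
decision = "do not reject H0" if lower <= chi2_stat <= upper else "reject H0"
print(f"chi-square = {chi2_stat:.3f}, df = {df}, decision: {decision}")
```

When only the sample size and a sample variance $s^2$ with divisor $n$ are reported, the sum of squares is $n s^2$, which gives the form $n s^2 / \sigma_0^2$ used in the first worked example.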

TEST 6
TEST FOR GOODNESS OF FIT

Aim
To test whether the observed frequencies are a good fit to the theoretical frequencies. That is, to investigate the significance of the difference between the observed frequencies and the expected frequencies, arranged in $K$ classes.

Source
Let $O_i$ $(i = 1, 2, \ldots, K)$ be a set of observed frequencies in $K$ classes based on any experiment, and let $E_i$ $(i = 1, 2, \ldots, K)$ be the corresponding set of expected (theoretical or hypothetical) frequencies.

Assumptions
(i) The observed frequencies in the $K$ classes should be independent.
(ii) $\sum_{i=1}^{K} O_i = \sum_{i=1}^{K} E_i = N$.
(iii) The total frequency $N$ should be sufficiently large (i.e., $N > 50$).
(iv) Each expected frequency in the $K$ classes should be at least 5.

Null Hypothesis
$H_0$: The observed frequencies are a good fit to the theoretical frequencies. That is, there is no significant difference between the observed frequencies and the expected frequencies, arranged in $K$ classes.

Alternative Hypothesis
$H_1$: The observed frequencies are not a good fit to the theoretical frequencies. That is, there is a significant difference between the observed frequencies and the expected frequencies, arranged in $K$ classes.

Level of Significance ($\alpha$) and Critical Region
$\chi^2 > \chi^2_{\alpha,\,(K-1)}$ such that $P\{\chi^2 > \chi^2_{\alpha,\,(K-1)}\} = \alpha$

Test Statistic
$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$$
The statistic $\chi^2$ follows the $\chi^2$ distribution with $(K-1)$ degrees of freedom.

Conclusion
If $\chi^2 \le \chi^2_{\alpha,\,(K-1)}$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$, and hence it may be accepted at the $\alpha$ level of significance. Otherwise reject $H_0$ or accept $H_1$.

Example 1
The sales of milk from a milk booth vary from day to day. A sample of one week's sales (number of liters) is observed as follows.
Day: Monday Tuesday Wednesday Thursday Friday Saturday Sunday
Sales:
Examine whether the sales of milk are the same over the entire week at % level of significance.

Solution
Aim: To test whether the sales of milk are the same over the entire week or not.
$H_0$: The sales of milk are the same over the entire week.
$H_1$: The sales of milk are not the same over the entire week.
Level of Significance: $\alpha$ = 0.0
Critical value: $\chi^2$ 0.0,6 = 6.8

Day | Observed frequency ($O_i$) | Expected frequency ($E_i$) | $(O_i - E_i)^2$ | $(O_i - E_i)^2 / E_i$
Monday Tuesday Wednesday Thursday Friday Saturday Sunday

Test Statistic: $\chi^2 = \sum_{i=1}^{K} \dfrac{(O_i - E_i)^2}{E_i}$ = 7.05

Conclusion: Since $\chi^2 < \chi^2_{\alpha,\,(K-1)}$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$. Hence, $H_0$ is accepted at % level of significance. That is, the sales of milk are the same over the entire week.

Example 2
In an experiment on pea breeding, Mendel obtained the following frequencies of seeds from 560 seeds: 3 round and yellow (RY), 04 wrinkled and yellow (WY); round and green (RG), 3 wrinkled and green (WG). Theory predicts that the frequencies should be in the proportion 9:3:3:1 respectively. Set up the hypothesis and test it at % level.

Solution
Aim: To test whether the observed frequencies of the pea breeding are in the ratio 9:3:3:1.
$H_0$: The observed frequencies of the pea breeding are in the ratio 9:3:3:1.
$H_1$: The observed frequencies of the pea breeding are not in the ratio 9:3:3:1.
Level of Significance: $\alpha$ = 0.0
Critical value: $\chi^2$ 0.0,3 =.345

Seed type | Observed frequency ($O_i$) | Expected frequency ($E_i$) | $(O_i - E_i)^2$ | $(O_i - E_i)^2 / E_i$
RY WY RG WG

Test Statistic: $\chi^2 = \sum_{i=1}^{K} \dfrac{(O_i - E_i)^2}{E_i}$ =
Conclusion: Since $\chi^2 < \chi^2_{\alpha,\,(K-1)}$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$. Hence, $H_0$ is accepted at % level of significance. That is, the observed frequencies of the pea breeding are in the ratio 9:3:3:1.

EXERCISES
1. A chemical extraction plant processes seawater to collect sodium chloride and magnesium. It is known that seawater contains sodium chloride, magnesium and other elements in the ratio of 6:4:34. A sample of 300 hundred tonnes of seawater has yielded 95 tonnes of sodium chloride and 9 tonnes of magnesium. Are these data consistent with the known composition of seawater at 0% level?
2. Among 80 offspring of a certain cross between guinea pigs, 4 were red, 6 were black and were white. According to the genetic model, these numbers should be in the ratio 9:3:4. Are these consistent with the model at % level of significance?
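The goodness-of-fit calculation can be sketched as follows: expected frequencies are derived from a theoretical ratio, then the $\chi^2$ statistic is summed class by class. The function names `expected_from_ratio` and `chi_square_gof` are hypothetical, and the observed counts are made up for illustration (they are not the counts of either worked example). The critical value is assumed to come from a $\chi^2$ table.

```python
def expected_from_ratio(total, ratio):
    """Split the total frequency N across classes in the given theoretical ratio."""
    ratio_sum = sum(ratio)
    return [total * r / ratio_sum for r in ratio]


def chi_square_gof(observed, expected):
    """Return (chi-square, degrees of freedom) for K classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1


# Hypothetical observed counts for four classes expected in the ratio 9:3:3:1.
observed = [95, 25, 32, 8]
expected = expected_from_ratio(sum(observed), [9, 3, 3, 1])
chi2_stat, df = chi_square_gof(observed, expected)

# 7.815 is the tabulated upper 5% point of chi-square with 3 degrees of freedom.
decision = "do not reject H0" if chi2_stat <= 7.815 else "reject H0"
print(f"expected = {expected}, chi-square = {chi2_stat:.3f}, df = {df}, {decision}")
```

Before relying on the result, it is worth checking assumption (iv) by confirming that every entry of `expected` is at least 5.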

TEST 7
TEST FOR EQUALITY OF TWO POPULATION PROPORTIONS

Aim
To test whether the two population proportions $P_1$ and $P_2$ are equal, based on two random samples. That is, to investigate the significance of the difference between the two sample proportions $p_1$ and $p_2$.

Source
In a random sample of $n_1$ observations, $X_1$ observations possess an attribute A, so the sample proportion is $p_1 = X_1/n_1$. Let the corresponding proportion in the population be denoted by $P_1$, which is unknown. In another sample of $n_2$ observations, $X_2$ observations possess the attribute A, so the sample proportion is $p_2 = X_2/n_2$. Let the corresponding proportion in the population be denoted by $P_2$, which is unknown.

Assumption
The sizes of the two samples are sufficiently large (i.e., $n_1, n_2 \ge 30$) to justify the normal approximation to the binomial.

Null Hypothesis
$H_0$: The two population proportions $P_1$ and $P_2$ are equal. That is, there is no significant difference between the two sample proportions $p_1$ and $p_2$, i.e., $H_0: P_1 = P_2$.

Alternative Hypotheses
$H_{1(1)}: P_1 \neq P_2$
$H_{1(2)}: P_1 > P_2$
$H_{1(3)}: P_1 < P_2$

Level of Significance ($\alpha$) and Critical Region: (As in Test 1)

Test Statistic
$$Z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{P(1-P)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \qquad (\text{under } H_0: P_1 = P_2)$$
$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$
The statistic $Z$ follows the Standard Normal distribution.

Conclusions (As in Test 1)

Example 1
Random samples of 300 male and 400 female students were asked whether they would like the CBCS system to be introduced in their university. 60 male and 30 female students were in favor of the proposal. Test the hypothesis that the proportions of male and female students in favor of the proposal are equal at % level.

Solution
Aim: To test whether the proportions of male and female students in favour of introducing the CBCS system in their university are equal or not.
$H_0$: The proportions of male ($P_1$) and female ($P_2$) students in favour of the proposal of introducing the CBCS system in their university are equal, i.e., $H_0: P_1 = P_2$.
$H_1$: The proportions of male and female students in favour of the proposal of introducing the CBCS system in their university are not equal, i.e., $H_1: P_1 \neq P_2$.
Level of Significance: $\alpha$ = 0.0 and Critical Value: $Z_\alpha$ =.33
Based on the data, we observed that $n_1$ = 300, $p_1$ = 6 / = 0.53, $n_2$ = 400, $p_2$ = =
$P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ = ( ) + ( ) = =
Test Statistic: $Z = \dfrac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{P(1-P)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$ (under $H_0: P_1 = P_2$)
Z = ( ) =.3
Conclusion: Since $|Z| < Z_\alpha$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$, and hence it is accepted at % level of significance. That is, the proportions of male and female students in favour of the proposal of introducing the CBCS system in their university are equal.

Example 2
In a random sample of 000 children selected from rural areas of a district in Tamilnadu, five are found to be affected by polio. In another sample of 500 from urban areas of the same district, three are affected. Is it reasonable to claim that the proportion of polio-affected children in the rural area is more than in the urban area at % level?

Solution
Aim: To test whether the proportion of polio-affected children in the rural area is the same as, or more than, in the urban area.
$H_0$: The proportions of polio-affected children in rural ($P_1$) and urban ($P_2$) areas are equal, i.e., $H_0: P_1 = P_2$.
$H_1$: The proportion of polio-affected children in the rural area is more than in the urban area, i.e., $H_1: P_1 > P_2$.
Level of Significance: $\alpha$ = 0.0 and Critical Value: $Z_\alpha$ =.33
Based on the data, we observed that $n_1$ = 000, $p_1$ = 5 / = 0.005, $n_2$ = 500, $p_2$ = =
$P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ = ( ) + ( ) =
Test Statistic: $Z = \dfrac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{P(1-P)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$ (under $H_0: P_1 = P_2$)
Z = ( ) =.30
Conclusion: Since $Z < Z_\alpha$, we conclude that the data do not provide any evidence against the null hypothesis $H_0$, and hence it is accepted at % level of significance. That is, the proportions of polio-affected children in rural and urban areas are equal.

EXERCISES
1. In a sample of 300 pregnancies in city A in a year, 63 births are females. In another sample of 50 pregnancies in city B in the same year, 3 births are females. Test whether the proportions of female births in both cities are equal at % level of significance.
2. In a sample of 500 persons selected from a city in Tamilnadu, 0 are tea drinkers. In another sample of 300 persons from a city in Kerala, 60 persons are tea drinkers. Test the hypothesis that the proportion of tea drinkers in Tamilnadu is less than that in Kerala at 0% level.
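The two-proportion test pools both samples to estimate the common proportion under $H_0$ and then standardizes the difference of the sample proportions. A minimal sketch using only the Python standard library is given below. The function name `two_proportion_z` and the counts are hypothetical, and the critical value is computed from the standard normal distribution instead of being read from a table.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Return Z for H0: P1 = P2, using the pooled proportion under H0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # equals (n1*p1 + n2*p2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error


# Hypothetical counts, not taken from the examples in the text.
z_stat = two_proportion_z(x1=120, n1=200, x2=150, n2=300)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96
decision = "do not reject H0" if abs(z_stat) <= z_crit else "reject H0"
print(f"Z = {z_stat:.3f}, critical value = {z_crit:.3f}, decision: {decision}")
```

For a one-sided alternative, the critical value would instead be `NormalDist().inv_cdf(1 - alpha)` and the comparison would be made on `z_stat` itself rather than its absolute value.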
