LECTURE NOTES. INTSTA2 Introductory Statistics 2. Francis Joseph H. Campeña, De La Salle University Manila



Contents

1 Normal Distribution
    1.1 Normal Distribution
2 Sampling and Sampling Distribution
    2.1 Sampling and Sampling Distribution
3 Estimation
    3.1 Estimating the Population Mean (µ)
    3.2 Estimating the Population Proportion (π)
    3.3 Estimating the Population Variance (σ²)
    3.4 Errors in Estimation and Sample Size Determination
4 Estimation of Two Parameters
    4.1 Estimating Difference of Two Means
    4.2 Estimating Difference of Two Proportions
    4.3 Estimating the Ratio of Two Variances: σ₁²/σ₂²
5 Statistical Test of Hypothesis
    5.1 Hypothesis Testing
Bibliography

Chapter 1 Normal Distribution

Recall that a continuous random variable has probability zero of assuming exactly any one of its values. Because of the nature of such a variable, we also cannot enumerate all of its possible values. Thus, when we consider continuous random variables and their probabilities, we only look at the probability that the random variable takes a value in a specified interval. In this course we consider only one type of continuous random variable, the normal random variable, and its associated probability distribution.

1.1 Normal Distribution

The normal distribution is one of the most important continuous distributions in the entire field of statistics. The graph of this distribution is called the normal curve. The distribution is sometimes called the Gaussian distribution in honor of Karl Friedrich Gauss, who derived its equation.

Remark. Properties of the normal curve:
1. It has a bell-shaped curve.
2. The mode, which is the point on the horizontal axis where the curve is a maximum, occurs at $x = \mu$.
3. The curve is symmetric about a vertical axis through the mean, $\mu$.
4. The normal curve approaches the horizontal axis asymptotically as we proceed in either direction away from the mean. (The graph approaches the x-axis but never intersects it.)
5. The total area under the curve and above the horizontal axis is equal to 1.

Definition. A continuous random variable $X$ having the bell-shaped distribution is called a normal random variable. The mathematical equation for the probability distribution of the normal random variable depends on two parameters, $\mu$ and $\sigma$, its mean and standard deviation. Thus we denote the probability density of $X$ by $N(x; \mu; \sigma)$. If $X$ is a normal random variable with mean $\mu$ and variance $\sigma^2$, then the equation of the normal curve is

$$N(x; \mu; \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \quad \text{for } -\infty < x < \infty.$$

Remark. It is difficult to compute the probabilities of a normal random variable directly from the above formula. Another way of calculating such probabilities is to transform the normal random variable to its corresponding standard normal random variable, whose probabilities are tabulated. Thus we define the standard normal random variable and its distribution.


Definition. The distribution of a normal random variable with mean $\mu = 0$ and standard deviation $\sigma = 1$ is called a standard normal distribution.

To transform a normal random variable to a standard normal one, we use the following formula:

$$Z = \frac{X - \mu}{\sigma}.$$

Using the table for the standard normal random variable, we can determine the probability of any normal random variable by transforming it to its corresponding standard normal random variable.

Example. Given a normally distributed random variable $X$ with mean 18 and standard deviation 2.5, find
1. $P(X < 15)$.
2. $P(17 < X < 21)$.

Solution:
(a) $P(X < 15) = P\left(Z < \frac{15 - 18}{2.5}\right) = P(Z < -1.2) = 0.1151$. (Refer to the standard normal table.)
(b) $P(17 < X < 21) = P\left(\frac{17 - 18}{2.5} < Z < \frac{21 - 18}{2.5}\right) = P(-0.4 < Z < 1.2) = P(Z < 1.2) - P(Z < -0.4) = 0.8849 - 0.3446 = 0.5403$.
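The table lookups in the example above can be cross-checked numerically. The following is a minimal sketch in Python, using only the standard library and the identity $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$. The function names are illustrative and not part of the notes, and the example's parameters are taken to be mean 18 and standard deviation 2.5.

```python
import math

def standard_normal_cdf(z: float) -> float:
    """P(Z < z) for a standard normal Z, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_probability(lower: float, upper: float, mu: float, sigma: float) -> float:
    """P(lower < X < upper) for a normal X, obtained by standardizing both bounds."""
    z_lower = (lower - mu) / sigma
    z_upper = (upper - mu) / sigma
    return standard_normal_cdf(z_upper) - standard_normal_cdf(z_lower)

# Assumed example parameters: mean 18, standard deviation 2.5
mu, sigma = 18.0, 2.5
p_below_15 = normal_probability(-math.inf, 15.0, mu, sigma)
p_17_to_21 = normal_probability(17.0, 21.0, mu, sigma)
print(round(p_below_15, 4), round(p_17_to_21, 4))
```

The results should agree with the table values to about four decimal places; small differences come from the rounding of z in the printed table.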

Example. An electrical firm manufactures light bulbs that have a length of life that is normally distributed with mean equal to 800 hours and standard deviation of 40 hours. Find the probability that a bulb burns between 778 and 834 hours.

Solution: The distribution of the life of the light bulbs is illustrated by the figure below:

[Figure]

The z values corresponding to $x_1 = 778$ and $x_2 = 834$ are

$$z_1 = \frac{778 - 800}{40} = -0.55, \qquad z_2 = \frac{834 - 800}{40} = 0.85.$$

Hence,

$$P(778 < X < 834) = P(-0.55 < Z < 0.85) = P(Z < 0.85) - P(Z < -0.55) = 0.8023 - 0.2912 = 0.5111.$$

Exercises

(1) Given a normally distributed random variable X with mean 18 and standard deviation 2.5, find the value of k such that

(2) A certain type of storage battery lasts on average 3.0 years, with a standard deviation of 0.5 years. Assuming that the battery lives are normally distributed, find the probability that a given battery will last less than.3 years.

(3) An electrical firm manufactures light bulbs that have a length of life that is normally distributed with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a bulb burns between 778 and 834 hours.

(4) If the average height of miniature poodles is 30 centimeters, with a standard deviation of 4.1 cm, what percentage of miniature poodles exceeds 35 cm in height, assuming that the height follows a normal distribution and can be measured to any desired degree of accuracy?

(5) A set of final examination grades in an introductory statistics course was found to be normally distributed, with a mean of 73 and a variance of 64.
(a) What is the probability of getting a grade of 91 or less in this exam?
(b) What percentage of students scored between 81 and 89?
(c) Only 5% of the students taking the test scored higher than what grade?

(6) Plastic bags used for packaging produce are manufactured so that the breaking strength of the bag is normally distributed with a mean of 5 pounds per square inch and a standard deviation of 1.5 pounds per square inch.
(a) What proportion of the bags produced have a breaking strength of between 5 and 5.5 pounds per square inch?
(b) What is the probability that a randomly selected bag will have a breaking strength of at least 6 pounds per square inch?
(c) What percentage of the bags have a breaking strength of less than 4.17 pounds per square inch?
(d) Between what two values symmetrically distributed around the mean will 95% of the breaking strengths fall?

(7) Suppose the length of time it takes a college student to find a parking spot in the university parking lot follows a normal distribution with a mean of 3.5 minutes and a standard deviation of 1 minute. If we randomly select 36 college students, find the probability that the average time it takes them to find a parking spot is
(a) less than 3. minutes?
(b) between 3.4 and 3.7 minutes?
(c) more than 3.8 minutes?

(8) The time needed to complete a final exam in a particular college course is normally distributed with mean 80 minutes and standard deviation of 10 minutes.
(a) What is the probability of completing the exam in an hour or less?
(b) What is the probability that a student will complete the exam in more than 60 minutes but less than 75 minutes?
(c) Assume that the class has 60 students and that the examination period is 90 minutes in length. How many of the students do you expect will be unable to complete the exam in the allotted time?

Chapter 2 Sampling and Sampling Distribution

Recall that one of the objectives of statistics is to make inferences concerning a population. These inferences are based only on partial information about the population, since a statistic is computed from a sample, and the value of a statistic may vary from sample to sample. Because of this, we first need to understand the variation associated with the statistic involved in our inference. Another concern in inference based on sample information is how the samples are taken and how large the sample size must be so that meaningful interpretations can be drawn from the sample. This concern is addressed in a specialized field of statistics, sampling theory, which is beyond the scope of this course. An overview of the terms and concepts of sampling theory is given in Section 2.1.

2.1 Sampling and Sampling Distribution

A statistic is a numerical descriptive measure derived from a sample. Different random samples generally produce different values of a given statistic. Since a statistic varies from sample to sample, it is itself a random variable. Recall that we can construct a probability distribution for a random variable; hence a probability distribution can also be constructed for a statistic. We call the probability distribution of a statistic a sampling distribution.

Definition. The sampling distribution of a statistic is the probability distribution of the possible values of the statistic that results when random samples of size n are repeatedly drawn from the population.

Example. A population consists of N = 5 numbers: 1, 2, 3, 4, and 5. If a random sample of size n = 3 is selected, find the sampling distribution of the sample mean.

Solution: Computing the population mean and variance gives $\mu = 3$ and $\sigma^2 = 2$. Since there are only 5 distinct and equally likely elements in the population, each element occurs with the same probability, that is, $P(x) = \frac{1}{5}$. Since we are choosing only 3 of the 5 elements, there are only $_5C_3 = 10$ different possible samples, as follows:

No.   Sample     Sample mean
1     1, 2, 3    2
2     1, 2, 4    2.33
3     1, 2, 5    2.67
4     1, 3, 4    2.67
5     1, 3, 5    3
6     1, 4, 5    3.33
7     2, 3, 4    3
8     2, 3, 5    3.33
9     2, 4, 5    3.67
10    3, 4, 5    4

Thus the sampling distribution of the sample mean $\bar{x}$ is

Sample mean    2     2.33   2.67   3     3.33   3.67   4
Probability    0.1   0.1    0.2    0.2   0.2    0.1    0.1

Notice that the average of all the sample means is 3 and their variance is $\frac{1}{3}$. If we increase the sample size to, say, n = 4 and compute the sampling distribution of $\bar{x}$ again, we still get a mean of 3 but a variance of 0.125.
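The enumeration in this example is small enough to reproduce by brute force. The sketch below, in Python with only the standard library, lists every sample of size n drawn without replacement, builds the sampling distribution of the sample mean, and computes its mean and variance exactly with fractions. The function names are illustrative and not part of the notes.

```python
from fractions import Fraction
from itertools import combinations

def sampling_distribution_of_mean(population, n):
    """Map each possible sample mean to its probability, treating all
    samples of size n (drawn without replacement) as equally likely."""
    samples = list(combinations(population, n))
    dist = {}
    for sample in samples:
        mean = Fraction(sum(sample), n)
        dist[mean] = dist.get(mean, 0) + Fraction(1, len(samples))
    return dist

def mean_and_variance(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

population = [1, 2, 3, 4, 5]
dist3 = sampling_distribution_of_mean(population, 3)
mean3, var3 = mean_and_variance(dist3)
mean4, var4 = mean_and_variance(sampling_distribution_of_mean(population, 4))

for value in sorted(dist3):
    print(float(value), float(dist3[value]))
print(mean3, var3, mean4, var4)
```

Using exact fractions avoids rounding noise, so the variances can be compared directly with $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$ for this population.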

Remark. Notice that $\mu_{\bar{x}}$, the mean of the sample means, is equal to the population mean, and that the variance $\sigma^2_{\bar{x}}$ (or the standard deviation $\sigma_{\bar{x}}$) decreases as the sample size increases. If all possible random samples of size n are drawn, without replacement, from a finite population of size N with mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of the sample mean will be approximately normally distributed, with mean and standard deviation given by

$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}.$$

The factor $\sqrt{\frac{N - n}{N - 1}}$ is called the finite correction factor. For large or infinite populations, this correction factor is approximately equal to 1. Hence

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.$$

The above result on the sampling distribution of the sample mean gives us the foundation of the next theorem, the central limit theorem. The central limit theorem states that, under fairly general conditions, sums and means of samples of random observations drawn from a population with any distribution tend to possess, approximately, a bell-shaped distribution in repeated sampling. Thus their distribution can be assumed to be approximately normal.

One significance of the central limit theorem is that it explains why some measurements in the real world tend to possess an approximately normal distribution. To illustrate, consider the weight of a person. Weight can be affected by many factors, environmental or genetic, for instance family lineage such as the parents' weights. Another factor can be the physical activities of the person. All these factors may affect the weight of a person, and the central limit theorem, together with other theorems applicable to the normal distribution, provides an explanation for why such a quantity is approximately normal.

Another significance of the central limit theorem, and probably its most important attribute, is its application to statistical inference. Many estimators that are used to make inferences about population parameters are sums or averages of sample observations.

Theorem (Central Limit Theorem). If random samples of size n are drawn from a large or infinite population with mean $\mu$ and variance $\sigma^2$, then the sampling distribution of the sample mean $\bar{x}$ is approximately normally distributed with mean and standard deviation

$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}},$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the population, respectively. Thus,

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

is a value of a standard normal random variable Z.

Remark.
1. If samples are taken from a population having a normal distribution, then the sampling distribution of the sample mean will have a normal distribution no matter what n is.
2. If samples are taken from a population which is not normally distributed, then the sampling distribution of the sample mean will have an approximately normal distribution only for large samples, that is, when $n \geq 30$.
3. The standard deviation of the sampling distribution of $\bar{x}$, $\sigma_{\bar{x}}$, is called the standard error of the sample mean.
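The theorem can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly skewed, non-normal choice made here, not one from the notes), the sample size is n = 36, and the function names are hypothetical. It checks that the simulated sample means centre on $\mu$, that their spread is close to $\sigma/\sqrt{n}$, and that roughly 95% of them fall within 1.96 standard errors of $\mu$.

```python
import random
import statistics

def simulate_sample_means(draw, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(repetitions)]

def draw_exponential(rng):
    """One observation from an exponential population with mean 1 and sd 1 (assumed)."""
    return rng.expovariate(1.0)

n = 36
means = simulate_sample_means(draw_exponential, n, repetitions=20000)

observed_mean = statistics.fmean(means)   # should be close to mu = 1
observed_se = statistics.stdev(means)     # should be close to sigma / sqrt(n)
theoretical_se = 1.0 / n ** 0.5           # sigma / sqrt(n) with sigma = 1

# Share of sample means within 1.96 standard errors of mu
within = sum(abs(m - 1.0) <= 1.96 * theoretical_se for m in means) / len(means)
print(observed_mean, observed_se, theoretical_se, within)
```

Repeating the run with a smaller n, such as 5, should show a visibly poorer normal approximation, which is the point of the large-sample condition in the remark.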

Chapter 3 Estimation

The procedures and formulas used to estimate the values of unknown population parameters from sample data are based on the theory of sampling distributions and on the methods used to collect the samples. Sampling distributions allow us to associate a specific level of confidence with each statistical inference, and thus to quantify how much confidence we can place in a sample statistic correctly estimating the population parameter.

Definition. An estimator is a rule, usually expressed as a formula, that tells us how to calculate an estimate based on information in the sample.

We can classify estimators into two types, point estimators and interval estimators.
1. Point estimation - Based on sample data, a single number is calculated to estimate the population parameter. The rule or formula that describes this calculation is called the point estimator, and the resulting number is called the point estimate.
2. Interval estimation - Based on sample data, two numbers are calculated to form an interval within which the parameter is expected to lie. The rule or formula that describes this calculation is called the interval estimator, and the resulting pair of numbers is called an interval estimate or confidence interval.

3.1 Estimating the Population Mean (µ)

A. Point Estimate for µ
(1) The best point estimate for the population mean, $\mu$, is the sample mean, $\bar{x}$.
(2) The point estimator $\bar{x}$ is unbiased with standard error given by $SE = \frac{\sigma}{\sqrt{n}}$.
(3) The margin of error of the point estimate, $\bar{x}$, is given by $\pm 1.96\,SE$.
(4) If $\sigma$ is unknown and $n \geq 30$, the sample standard deviation s can be used to approximate $\sigma$.

B. Interval Estimate for µ
To construct an interval estimate for the population mean, we consider two cases. The first case is when the standard deviation of the population is known, or is unknown but the sample size is large enough, that is, $n \geq 30$. The second case is when the standard deviation is not known and the sample size is less than 30.

(a) CASE 1: If $\sigma$ is known, or $\sigma$ is unknown but $n \geq 30$, a $(1 - \alpha)100\%$ confidence interval for the population mean $\mu$ is given by

$$\bar{x} \pm Z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)$$

where:
$\bar{x}$ = sample mean
$Z_{\alpha/2}$ = z-score with an area of $\alpha/2$ to the right
n = sample size
$\sigma$ = population standard deviation

(b) CASE 2: If $\sigma$ is unknown and $n < 30$, a $(1 - \alpha)100\%$ confidence interval for the population mean $\mu$ is given by

$$\bar{x} \pm t_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right)$$

where:
$\bar{x}$ = sample mean
$t_{\alpha/2}$ = critical t-value with an area of $\alpha/2$ to the right and $n - 1$ degrees of freedom
n = sample size
s = sample standard deviation

Remark.
(1) If $\bar{x}$ is used as an estimate of $\mu$, we can then be $(1 - \alpha)100\%$ confident that the error will not exceed $Z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)$.
(2) If $\bar{x}$ is used as an estimate of $\mu$, we can then be $(1 - \alpha)100\%$ confident that the error will not exceed a specified amount e when the sample size is $n = \left( \frac{Z_{\alpha/2}\,\sigma}{e} \right)^2$.

Example. The mean and standard deviation of the quality grade point averages of a random sample of 36 college seniors are calculated to be 2.6 and 0.3, respectively. Find the 95% confidence interval for the mean of the entire senior class.

Solution: The following are known: $\bar{x} = 2.6$, $s = 0.3$, $n = 36$, and $\alpha = 5\%$ for a 95% C.I. Since $n \geq 30$, s is used to approximate $\sigma$. A 95% confidence interval for the quality grade point average of the entire senior class is given by

$$\bar{x} \pm Z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right) = 2.6 \pm Z_{0.025} \left( \frac{0.3}{\sqrt{36}} \right) = 2.6 \pm (1.96)(0.05) = (2.502,\ 2.698).$$

3.2 Estimating the Population Proportion (π)

A. Point Estimate for π
(1) The best point estimate for the population proportion, $\pi$, is the sample proportion, $\hat{p}$.
(2) The point estimator $\hat{p}$ is unbiased with standard error given by $SE = \sqrt{\frac{\hat{p}\hat{q}}{n}}$.
(3) The margin of error of the point estimate, $\hat{p}$, is given by $\pm 1.96\,SE$.
(4) The maximum error in estimating $\pi$ using $\hat{p}$ is given by $\pm Z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$.

B. Interval Estimate for π
To construct an interval estimate for the population proportion, we use the following formula. A $(1 - \alpha)100\%$ confidence interval for the population proportion $\pi$ is given by

$$\hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$$

where:
$\hat{p}$ = sample proportion
$\hat{q} = 1 - \hat{p}$
$Z_{\alpha/2}$ = z-score with an area of $\alpha/2$ to the right
n = sample size

Example. In a random sample of 500 people eating lunch at a hospital cafeteria on various Fridays, it was found that 160 preferred seafood. Find a 95% confidence interval for the actual proportion of people who eat seafood on Fridays at this cafeteria.

Solution: The following are known: $x = 160$, $n = 500$, and $\alpha = 5\%$ for a 95% C.I. The point estimate of $\pi$ is $\hat{p} = \frac{160}{500} = 0.32$. A 95% confidence interval for the proportion of people who prefer seafood on Fridays at this cafeteria is given by

$$\hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.32 \pm Z_{0.025} \sqrt{\frac{(0.32)(0.68)}{500}} = 0.32 \pm (1.96) \sqrt{\frac{(0.32)(0.68)}{500}},$$

that is, $0.279 < \pi < 0.361$.
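The large-sample intervals for a mean and for a proportion can be checked numerically. The sketch below, in Python, assumes the critical value is read from the standard normal table and passed in as an argument. The function names are illustrative, and the inputs are taken from the two worked examples (sample mean 2.6, s = 0.3, n = 36, and 160 successes out of 500).

```python
import math

def z_interval_mean(x_bar, sd, n, z):
    """Large-sample confidence interval for a mean: x_bar +/- z * sd / sqrt(n)."""
    margin = z * sd / math.sqrt(n)
    return x_bar - margin, x_bar + margin

def z_interval_proportion(successes, n, z):
    """Large-sample confidence interval for a proportion: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin

Z_0025 = 1.96  # table value leaving an area of 0.025 to the right

gpa_interval = z_interval_mean(2.6, 0.3, 36, Z_0025)
seafood_interval = z_interval_proportion(160, 500, Z_0025)
print([round(v, 3) for v in gpa_interval])
print([round(v, 3) for v in seafood_interval])
```

Passing the critical value in, rather than computing it, keeps the sketch aligned with the table-based procedure of the notes; a different confidence level only changes the value of z.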

3.3 Estimating the Population Variance (σ²)

Suppose a sample of size n is drawn from a normal population with variance $\sigma^2$. The point estimate for the population variance, $\sigma^2$, is the sample variance, $s^2$, and a $(1 - \alpha)100\%$ confidence interval for the population variance is given by

$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{1 - \alpha/2}}$$

where $\chi^2_{\alpha/2}$ and $\chi^2_{1 - \alpha/2}$ are chi-square values with $n - 1$ degrees of freedom leaving areas of $\alpha/2$ and $1 - \alpha/2$, respectively, to the right.

Example. The following are the volumes, in deciliters, of 10 cans of peaches distributed by a certain company: 46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45., and Find a 95% confidence interval for the variance of all such cans of peaches distributed by this company, assuming that the volume is a normally distributed random variable.

Solution: The following are known: $n = 10$ and $\alpha = 0.05$. We compute $s^2$ from the data and read the chi-square values with 9 degrees of freedom from the table: $\chi^2_{\alpha/2} = \chi^2_{0.025} = 19.023$ and $\chi^2_{1 - \alpha/2} = \chi^2_{0.975} = 2.700$. Using these values, a 95% confidence interval for the variance of the volume of canned peaches by this company is

$$\frac{(9)(0.934)}{19.023} < \sigma^2 < \frac{(9)(0.934)}{2.700}.$$

3.4 Errors in Estimation and Sample Size Determination

We note that a $(1 - \alpha)100\%$ confidence interval provides an estimate of the accuracy of our point estimate. If the parameter is actually at the center of the interval estimate, then the point estimate estimates the parameter without error. However, this will not always be the case. Hence, we provide the following theorems.

Theorem (Error in Estimating µ). If $\bar{x}$ is used as an estimate of $\mu$, we can then be $(1 - \alpha)100\%$ confident that the error will not exceed $Z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)$.

Theorem (Sample Size for Estimating µ). If $\bar{x}$ is used as an estimate of $\mu$, we can then be $(1 - \alpha)100\%$ confident that the error will not exceed a specified amount e when the sample size is $n = \left( \frac{Z_{\alpha/2}\,\sigma}{e} \right)^2$.

Theorem (Error in Estimating π). If $\hat{p}$ is used as an estimate of $\pi$, we can then be $(1 - \alpha)100\%$ confident that the error will not exceed $Z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$.

Theorem (Sample Size for Estimating π). If $\hat{p}$ is used as an estimate of $\pi$, we can then be $(1 - \alpha)100\%$ confident that the error will not exceed a specified amount e when the sample size is $n = \frac{Z_{\alpha/2}^2\,\hat{p}\hat{q}}{e^2}$.
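The two sample-size theorems translate directly into a short calculation. The sketch below is a minimal Python illustration: the function names are hypothetical, the target errors (0.05 for the mean, 0.02 for the proportion) are values chosen here for illustration, and rounding up to the next whole observation is a conventional choice rather than something stated in the theorems.

```python
import math

def sample_size_for_mean(sigma, e, z):
    """Smallest whole n with z * sigma / sqrt(n) <= e, from n = (z * sigma / e)^2."""
    return math.ceil((z * sigma / e) ** 2)

def sample_size_for_proportion(p_hat, e, z):
    """Smallest whole n with z * sqrt(p_hat * q_hat / n) <= e, from n = z^2 * p_hat * q_hat / e^2."""
    return math.ceil(z ** 2 * p_hat * (1.0 - p_hat) / e ** 2)

Z_0025 = 1.96  # table value for 95% confidence

# Illustrative inputs: sigma = 0.3 with target error 0.05,
# and a preliminary p_hat = 0.32 with target error 0.02.
n_mean = sample_size_for_mean(0.3, 0.05, Z_0025)
n_prop = sample_size_for_proportion(0.32, 0.02, Z_0025)
print(n_mean, n_prop)
```

Because n grows with the square of $1/e$, halving the allowed error roughly quadruples the required sample size.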

Exercises

1. A scientist interested in monitoring chemical contaminants in food, and thereby the accumulation of contaminants in human diets, selected a random sample of n = 50 male adults. It was found that the average daily intake of dairy products was 756 grams per day with a standard deviation of 35 grams per day. Construct a 95% confidence interval for the mean daily intake of dairy products for men.

2. A random sample of 1 female students in a certain dormitory showed an average weekly expenditure of P400 for snack foods, with a standard deviation of P1.50. Construct a 90% confidence interval for the average amount spent on snack foods by female students living in this dormitory, assuming the expenditures to be normally distributed.

3. The contents of 7 similar containers of sulfuric acid are 9.8, 10., 10.4, 9.8, 10.0, 10., and 9.6 liters. Find a 95% confidence interval for the mean content of all such containers, assuming an approximate normal distribution for container contents.

4. The mean and standard deviation of the quality grade point averages of a random sample of 36 college seniors are calculated to be 2.6 and 0.3, respectively. Find the 95% and 99% confidence intervals for the mean of the entire senior class.

5. The following data were collected from a sample in an experiment: n = 64, x =.5 and s = 3.4.
(a) What is the point estimate of µ?
(b) What is the margin of error associated with the point estimate of µ?
(c) Construct a 99% confidence interval for µ.
(d) What is the maximum error of the estimate for (c)?

6. A telephone answering service completes a report at the end of each call, in which the length of the call is recorded. A random sample of 9 reports yields a mean length of call of 1. minutes. Construct a 95% confidence interval for the mean length of call for the whole telephone answering service company if it is known that the population is normally distributed with a standard deviation of 0.6 minutes.

7. A random sample of 10 chocolate bars has an average of 30 calories with a standard deviation of 15 calories. Assume that the distribution of the calories is approximately normal.
(a) Construct a 99% confidence interval for the mean calorie content of this chocolate bar.
(b) How large a sample is needed if we wish to be 99% confident that our sample mean will be within 5 calories of the true mean?

8. A sample selected from a population gave a sample proportion equal to 0.73.
(a) Make a 99% confidence interval for π assuming n = 100.
(b) Make a 99% confidence interval for π assuming n = 600.
(c) Make a 99% confidence interval for π assuming n =
(d) Does the width of the confidence intervals constructed in (a) to (c) decrease as the sample size increases? If yes, explain why.

9. In a poll of 617 workers conducted for Ernst and Young, 5% said that they had observed their co-workers stealing products or cash from their employers.
(a) What is the point estimate of the corresponding population proportion?
(b) What is the margin of error associated with the point estimate?
(c) Find a 95% confidence interval for the proportion of all such workers who have observed their co-workers stealing products or cash from their employers.

10. In a random sample of 500 teenagers 1 to 17 years old, it was found that 330 have regular access to computers and the Internet.
(a) What is the point estimate of the corresponding population proportion?
(b) Construct a 95% confidence interval for the true proportion of teenagers 1 to 17 years old who have regular access to computers and the Internet.
(c) What can you assert with 95% confidence about the possible size of the error e if you estimate the true proportion of teenagers 1 to 17 years old who have regular access to computers and the Internet to be equal to 0.66?

11. A random sample of 985 likely voters in the upcoming election was polled during a phonathon conducted by the Liberal party. Of those surveyed, 59 indicated that they intend to vote for the Liberal candidate in the upcoming election.
(a) Construct a 90% confidence interval for the proportion of likely voters in the population who intend to vote for the Liberal candidate.
(b) What can we assert with 90% confidence about the possible size of the error if we estimate the fraction of voters who intend to vote for the Liberal candidate is
(c) How large a sample is needed if we want to be 90% confident that our estimate of p is within 0.1?

Chapter 4 Estimation of Two Parameters

4.1 Estimating Difference of Two Means

Let $\mu_1$ and $\sigma_1$ be the mean and standard deviation, respectively, of the first population, and let $\mu_2$ and $\sigma_2$ be the mean and standard deviation, respectively, of the second population. Random samples of size $n_1$ are taken from the first population and random samples of size $n_2$ are taken from the second population.

A. Point estimation for $\mu_1 - \mu_2$:
(1) The best point estimate for the difference between two population means, $\mu_1 - \mu_2$, is given by the difference between their sample means, $\bar{x}_1 - \bar{x}_2$.
(2) The point estimator, $\bar{x}_1 - \bar{x}_2$, is unbiased with standard error given by $SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$.
(3) The margin of error of the point estimate is given by $1.96\,SE$.
(4) If $\sigma_1$ and $\sigma_2$ are unknown but both $n_1$ and $n_2$ are 30 or more, then the sample variances $s_1^2$ and $s_2^2$ can be used.

B. Interval estimation for $\mu_1 - \mu_2$:
We consider four cases in constructing a $(1 - \alpha)100\%$ confidence interval for the difference between two population means.

Case 1: Large Sample Case
$\sigma_1$ and $\sigma_2$ are known, or $\sigma_1$ and $\sigma_2$ are unknown but $n_1 \geq 30$ and $n_2 \geq 30$.

$$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Case 2: Small Sample Case, Equal Variance
$\sigma_1$ and $\sigma_2$ are unknown and $n_1 < 30$ and $n_2 < 30$, but $\sigma_1 = \sigma_2$.

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where $v = n_1 + n_2 - 2$ and $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$.

Case 3: Small Sample Case, Unequal Variance
$\sigma_1$ and $\sigma_2$ are unknown and $n_1 < 30$ and $n_2 < 30$, but $\sigma_1 \neq \sigma_2$.

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where

$$v = df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{\left( s_1^2 / n_1 \right)^2}{n_1 - 1} + \frac{\left( s_2^2 / n_2 \right)^2}{n_2 - 1}}.$$
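The quantities that distinguish Case 2 from Case 3 are the pooled standard deviation and the degrees of freedom. The sketch below, in Python, computes both and forms the Case 2 interval. The summary statistics are hypothetical values invented for illustration, the function names are not from the notes, and the critical value $t_{0.025} = 2.069$ for 23 degrees of freedom is read from a t table and passed in rather than computed.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation s_p for the equal-variance case (Case 2)."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def unequal_variance_df(s1, n1, s2, n2):
    """Degrees of freedom v for the unequal-variance case (Case 3)."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

def interval(estimate, standard_error, critical_value):
    """estimate +/- critical_value * standard_error."""
    margin = critical_value * standard_error
    return estimate - margin, estimate + margin

# Hypothetical summary statistics, for illustration only
x1, s1, n1 = 50.0, 3.0, 15
x2, s2, n2 = 46.0, 4.0, 10

sp = pooled_sd(s1, n1, s2, n2)
se_pooled = sp * math.sqrt(1 / n1 + 1 / n2)
T_0025_DF23 = 2.069  # t table value for v = n1 + n2 - 2 = 23
case2_interval = interval(x1 - x2, se_pooled, T_0025_DF23)

v = unequal_variance_df(s1, n1, s2, n2)  # usually not a whole number
print(round(sp, 4), [round(b, 3) for b in case2_interval], round(v, 2))
```

In Case 3 the computed v is generally fractional; a common convention, assumed here rather than stated in the notes, is to round it down before looking up the t value.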

Case 4: Paired Sample
$\sigma_1$ and $\sigma_2$ are unknown and $n_1 < 30$ and $n_2 < 30$, but $\sigma_1 \neq \sigma_2$.

$$\bar{d} \pm t_{\alpha/2}\, \frac{s_d}{\sqrt{n}}$$

where n is the number of pairs, $\bar{d}$ is the mean of the differences, and $s_d$ is the standard deviation of the differences, computed as

$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} \quad \text{and} \quad s_d = \sqrt{\frac{\sum_{i=1}^{n} d_i^2 - \frac{\left( \sum_{i=1}^{n} d_i \right)^2}{n}}{n - 1}}.$$

4.2 Estimating Difference of Two Proportions

Suppose independent random samples of size $n_1$ and $n_2$ are taken from two populations, and let $x_1$ and $x_2$ be the number of successes in the first and second samples, respectively.

A. Point estimation for $\pi_1 - \pi_2$:
(1) The best point estimate for the difference between two population proportions, $\pi_1 - \pi_2$, is given by the difference between their sample proportions, $\hat{p}_1 - \hat{p}_2$.
(2) The point estimator, $\hat{p}_1 - \hat{p}_2$, is unbiased with standard error given by $SE = \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$.
(3) The margin of error of the point estimate is given by $1.96\,SE$.

B. Interval estimation for $\pi_1 - \pi_2$:
A $(1 - \alpha)100\%$ confidence interval for the difference between two population proportions is given by

$$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

4.3 Estimating the Ratio of Two Variances: σ₁²/σ₂²

For any two independent random samples of size $n_1$ and $n_2$ selected from two normal populations, the ratio of the sample variances, $\frac{s_1^2}{s_2^2}$, is computed, and a $(1 - \alpha)100\%$ confidence interval for $\frac{\sigma_1^2}{\sigma_2^2}$ is given by

$$\frac{s_1^2}{s_2^2} \cdot \frac{1}{f_{\alpha/2}(v_1, v_2)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2}\, f_{\alpha/2}(v_2, v_1)$$

where $f_{\alpha/2}(v_1, v_2)$ is an f value with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom leaving an area of $\alpha/2$ to the right.

Exercises

1. The wearing qualities of two types of automobile tires were compared by road testing samples of 100 tires of each type. The number of miles until wear-out was recorded, where wear-out was defined as a specific amount of tire wear. The test results are given below:

Tire 1: $\bar{x}_1$ = 6,000, $s_1^2$ = 1,440,000
Tire 2: $\bar{x}_2$ = 5,100, $s_2^2$ = 1,960,000

Estimate the difference in mean miles to wear-out, $\mu_1 - \mu_2$.

2. A standardized chemistry test was given to a random sample of 50 girls and 75 boys. The girls made an average grade of 76 with a standard deviation of 6, while the boys made an average grade of 8 with a standard deviation of 8. Find a 90% confidence interval for the difference $\mu_1 - \mu_2$, where $\mu_1$ is the mean score of all boys and $\mu_2$ is the mean score of all girls who might take the test.

26 CHAPTER 4. ESTIMATION OF TWO PARAMETERS 5 3. A course in Mathematics is taught to 1 students by the conventional classroom procedure. A second group of 10 students was given the same course by means of programmed materials. At the end of the term, the same examination was given to each group. The 1 students meeting in the classroom made an average group of 85 with a standard deviation of 4 while the 10 students using programmed materials made an average of 81 with a standard deviation of 5. Find a 90% confidence interval for the difference between the population means, assuming the population approximates a normal distribution with equal variances. 4. Records for the past 15 years have shown the average rainfall in a certain region of the country for the month of May to be 4.93 cm with a standard deviation of 1.14 cm. A second region of the country has had an average rainfall in May of.64 cm with a standard deviation of 0.06 cm during the past 10 years. Find a 95% confidence interval for the difference of the true average rainfalls in these two regions assuming that the observations come from normal populations with different variances. 5. It is claimed that a new diet will reduce a persons weight by 4.5 kilograms on the average in a period of weeks. The weights of 7 women who followed this diet were recorded before and after a -week period: Bef ore Af ter (a) Find the average of the differences in weights before and after the weight loss program for the 4 women who participated. (b) Find a 95% Confidence interval for the differences in weights before and after the weight loss program for the 4 women who participated. (c) 6. A poll is taken among the residents of a city and the rounding county to determine the feasibility of a proposal to construct a civic center. 
If 400 of 5000 city residents favor the proposal and 100 of 000 county residents favor it, find a 90% confidence interval for the true difference in the fractions favoring the proposal to construct the civic center.

27 CHAPTER 4. ESTIMATION OF TWO PARAMETERS 6 7. A geneticist is interested in the proportion of males and females in the population that have a certain minor blood disorder. In a random sample of 100 males, 4 are found to be afflicted, whereas 13 out of 100 females tested appear to have the disorder. Compute a 99% confidence interval for the difference between proportion of males and females that have this blood disorder. 8. In a study of the relationship between birth order and college success, an investigator found that 16 in a sample of 180 college students were firstborn or only child. In a sample of 100 non-graduates of comparable age and socio-economic background, the number of firstborn or only child was 54. Find a point estimate for the difference between the proportions of firstborn or only child in the two populations from which these samples were drawn. 9. An efficiency expert wishes to determine the average time that it takes to drill 3 holes in a certain metal clamp. How large should a sample will be needed for the expert to be 95% confident that his sample mean will be within 15 seconds of the true mean? 10. The government awarded grants to the agricultural departments of nine universities to test the yield capabilities of two new varieties of wheat. Each variety was planted on plots of equal area at each university and the yields, in kilograms per plot, were recorded as follows: V ariety V ariety Find a 95% confidence interval for the mean difference between the yields of the two varieties assuming the distributions of yileds to be approximately normal. 11. A random sample of 1 female students in a certain dormitory showed an average weekly expenditure of Php for snack foods, with a standard deviation of Php (a) What is the point estimate for the average weekly expenditure of females in this dormitory?

    (b) What is the standard error in estimating the average weekly expenditure of females in this dormitory?
    (c) Construct a 90% confidence interval for the average amount spent each week on snack foods by female students living in this dormitory, assuming the expenditures to be approximately normally distributed.

12. Two kinds of thread are being compared for strength. Fifty pieces of each type of thread are tested under similar conditions. Brand A had an average tensile strength of 78.3 kilograms with a standard deviation of 5.6 kilograms, while Brand B had an average tensile strength of 87.2 kilograms with a standard deviation of 6.3 kilograms. (Use µ1 for Brand A and µ2 for Brand B.)
    (a) What is the point estimate for the difference in the average tensile strength of the two threads?
    (b) What is the standard error in estimating the difference of the average tensile strength of the two kinds of thread?
    (c) Construct a 95% confidence interval for the difference of the population means.
    (d) What is the maximum error in estimating the difference of the average tensile strength of the two kinds of thread?

13. A new rocket-launching system is being considered for deployment of small short-range launches. The existing system has π = 0.8 as the probability of a successful launch. A sample of 40 experimental launches is made with the new system and 34 are successful.
    (a) Construct a 95% confidence interval for π.
    (b) Would you conclude that the new system is better? Explain your answer.

14. A study is made to determine if a cold climate results in more students being absent from school during a semester than a warmer climate. Two groups of students are selected at random, one group from Vermont and the other group from Georgia. Of the 300 students from Vermont, 64 were absent at least 1 day during the semester, and of the 400 students from Georgia, 51 were absent 1 or more days. Find a 95% confidence interval for the difference between the fractions of the students who are absent in the two states.

15. In a random sample of 100 homes in a certain city, it is found that 68 are heated by natural gas. Find the 98% confidence interval for the fraction of homes in this city that are heated by natural gas.

16. A random sample of 75 college students is selected and 16 are found to have cars on campus. Use a 95% confidence interval to estimate the fraction of students who have cars on campus.
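As a way of checking hand computations for exercises such as 13(a) and 14, the following Python sketch computes large-sample confidence intervals for one proportion and for the difference of two proportions. It assumes the usual normal-approximation formulas (point estimate plus or minus z times the standard error); the function names `proportion_ci` and `diff_proportions_ci` are illustrative and are not part of these notes.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Normal-approximation interval for a single proportion (assumed formula)."""
    p_hat = x / n
    # z value that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def diff_proportions_ci(x1, n1, x2, n2, confidence):
    """Normal-approximation interval for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * std_error, diff + z * std_error


# Exercise 13(a): 34 successes in 40 launches, 95% confidence
print(proportion_ci(34, 40, 0.95))

# Exercise 14: 64 of 300 (Vermont) versus 51 of 400 (Georgia), 95% confidence
print(diff_proportions_ci(64, 300, 51, 400, 0.95))
```

Because the z value is computed rather than read from a table, results may differ from table-based answers in the third or fourth decimal place.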

Chapter 5 Statistical Test of Hypothesis

From a certain perspective, hypothesis testing resembles a jury trial. The null hypothesis corresponds to the presumption that the defendant is not guilty, and the alternative hypothesis corresponds to a guilty verdict. In a jury trial the defendant is presumed not guilty unless the prosecution can show beyond a reasonable doubt that the defendant is guilty. If the jury believes that the evidence is strong enough to refute the null hypothesis, it gives a verdict in favor of the alternative hypothesis, which is a guilty verdict. In general, when performing hypothesis testing, we set up the null (Ho) and alternative (Ha) hypotheses in such a way that we believe that Ho is true unless there is sufficient evidence (information from a sample; statistics) to show otherwise.

5.1 Hypothesis Testing

A statistical hypothesis is an assertion or conjecture concerning one or more populations.

Remark

1. Null hypothesis - the hypothesis that we wish to focus our attention on. Generally this is a statement that a population parameter has a specified value. It is the hypothesis that is tested and the one which the researcher wishes to reject or not to reject. It specifies an exact value of the population parameter. Denoted by Ho.
2. Alternative hypothesis - the hypothesis that is accepted if the null hypothesis is rejected. It allows for the possibility of several values. Denoted by Ha or H1. May be directional (< or >) or non-directional (≠).

A test of hypothesis is a procedure for deciding, on the basis of sample data, whether or not to reject a statistical hypothesis. In performing a statistical test of hypothesis we consider the following situations:

Null hypothesis     REJECT              DO NOT REJECT
TRUE                Type I error        Correct decision
FALSE               Correct decision    Type II error

The probability of committing a Type I error is also called the level of significance and is denoted by the small Greek letter alpha, α. Some of the common values used for the level of significance are 0.1, 0.05, and 0.01. For example, if α = 0.1 for a certain test, then there is a 10% chance of rejecting the null hypothesis when it is in fact true. This is often loosely described as being 90% confident in the decision when the null hypothesis is rejected.
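The meaning of the level of significance can be illustrated with a small simulation. The sketch below repeatedly draws samples from a population for which the null hypothesis is true and counts how often a two-tailed z test rejects it. The setup (a normal population with known σ, sample size 25, 5000 trials, and the function name `simulate_type_i_rate`) is an assumption chosen for illustration, not something taken from these notes.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_type_i_rate(alpha, n=25, trials=5000, mu0=0.0, sigma=1.0, seed=1):
    """Estimate how often a true Ho: mu = mu0 is rejected at level alpha."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        # Sample drawn from a population in which Ho really is true
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a Type I error: rejecting a true Ho
    return rejections / trials


rate = simulate_type_i_rate(alpha=0.1)
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimated rate should land close to the chosen α, which is what the phrase "probability of committing a Type I error" refers to.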

Remark

The following are some important properties pertaining to α and β, where β denotes the probability of committing a Type II error.

1. The Type I error and Type II error are related. For a fixed sample size, a decrease in the probability of one generally results in an increase in the probability of the other.
2. The size of the critical region, and therefore the probability of committing a Type I error, can always be reduced by adjusting the critical values.
3. An increase in the sample size will reduce α and β simultaneously.
4. If the null hypothesis is false, β is a maximum when the true value of a parameter is close to the hypothesized value. The greater the distance between the true value and the hypothesized value, the smaller β will be.

Remark
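These properties can be checked numerically in a simple special case. The sketch below assumes a right-tailed z test of Ho: µ = µ0 with known σ, for which β = Φ(z_α − (µ_true − µ0)√n/σ). The formula, the function name `type_ii_error`, and the numbers used are assumptions for illustration and do not come from these notes.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu_true, sigma, n, alpha):
    """Beta for a right-tailed z test of Ho: mu = mu0 (sigma assumed known)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)  # critical value of the right-tailed test
    shift = (mu_true - mu0) * sqrt(n) / sigma  # standardized distance from mu0
    # Probability that the test statistic falls below the critical value
    return z.cdf(z_crit - shift)


# Hypothetical setting: mu0 = 50, sigma = 10
for mu_true in (51, 53, 55):
    for n in (25, 100):
        for alpha in (0.05, 0.01):
            beta = type_ii_error(50, mu_true, 10, n, alpha)
            print(f"mu_true={mu_true} n={n:3d} alpha={alpha}: beta={beta:.4f}")
```

In the printed table, β shrinks as the true mean moves away from 50 and as n grows, and it rises when α is lowered from 0.05 to 0.01 with n held fixed.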

The following are some important terms and concepts in performing a test of hypothesis.

1. Level of significance, α. The level of significance, α, is the probability of committing the error of rejecting the null hypothesis when, in fact, it is true.
2. One-tailed tests vs. two-tailed tests.
   One-tailed test. A one-tailed test is performed when the alternative hypothesis is concerned with values specifically below or above the exact value stated in the null hypothesis. The alternative hypothesis is directional (i.e. < or >).
   Two-tailed test. A two-tailed test is performed when the alternative hypothesis is concerned with values that are not equal to the exact value stated in the null hypothesis. The alternative hypothesis is non-directional.
3. Test statistic. The value generated from sample data. It is the test value to be compared with the critical values.
4. Critical region (region of rejection). Depends on the type of test to be performed. If the test is one-tailed, then the critical region is concentrated on either the left tail (for <) or the right tail (for >) of the distribution. If the test is two-tailed, then the critical region is distributed on each tail of the distribution.
   Critical values are obtained depending on the type of test to be performed. If the test is one-tailed, the significance level α will be the area either on the left tail or on the right tail of the distribution. If the test is two-tailed, the area in each tail of the distribution will be α/2.
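The way the tail areas determine the critical values can be sketched in Python for a test statistic that follows the standard normal distribution. The function name `z_critical_values` and the symbols used for the alternative ("<", ">", "!=") are illustrative choices, not notation from these notes.

```python
from statistics import NormalDist


def z_critical_values(alpha, alternative):
    """Critical value(s) of a z test for the given form of Ha."""
    z = NormalDist()
    if alternative == "<":   # left-tailed: all of alpha in the left tail
        return (z.inv_cdf(alpha),)
    if alternative == ">":   # right-tailed: all of alpha in the right tail
        return (z.inv_cdf(1 - alpha),)
    if alternative == "!=":  # two-tailed: alpha / 2 in each tail
        c = z.inv_cdf(1 - alpha / 2)
        return (-c, c)
    raise ValueError("alternative must be '<', '>' or '!='")


for alt in ("<", ">", "!="):
    values = ", ".join(f"{v:.3f}" for v in z_critical_values(0.05, alt))
    print(f"alpha=0.05, Ha uses {alt}: critical value(s) {values}")
```

For α = 0.05 this gives roughly ±1.645 for the one-tailed tests and ±1.960 for the two-tailed test, matching the familiar table values.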

The following are the steps in performing a test of hypothesis:

(1) Set up the null and alternative hypotheses.
(2) Indicate the level of significance.
(3) Determine the critical region and the corresponding critical values.
(4) Compute the value of the test statistic.
(5) Make a decision.
(6) Draw the appropriate conclusion.
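The six steps can be traced in a short Python sketch. It borrows the rocket-launch figures from the Chapter 4 exercises (34 successes in 40 launches, existing success probability 0.8) purely as an illustration. The choice of a right-tailed large-sample z test for one proportion, the test statistic z = (p̂ − π0)/√(π0(1 − π0)/n), the level α = 0.05, and the function name `one_proportion_z_test` are all assumptions that these notes do not specify.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(x, n, pi0, alpha):
    """Right-tailed large-sample z test of Ho: pi = pi0 against Ha: pi > pi0."""
    # Steps 1 and 2: the hypotheses and alpha are fixed by the arguments.
    # Step 3: the critical region is z > z_crit.
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Step 4: compute the test statistic from the sample.
    p_hat = x / n
    z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
    # Step 5: decide by comparing the statistic with the critical value.
    reject = z > z_crit
    return z, z_crit, reject


z, z_crit, reject = one_proportion_z_test(x=34, n=40, pi0=0.8, alpha=0.05)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}")
# Step 6: state the conclusion in terms of the problem.
if reject:
    print("Reject Ho: the data suggest the success probability exceeds 0.8.")
else:
    print("Do not reject Ho: no sufficient evidence that it exceeds 0.8.")
```

With these figures the test statistic is about 0.79, which falls short of the critical value of about 1.645, so under these assumptions Ho is not rejected.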


Chapter 6 ESTIMATION OF PARAMETERS

Recall that one of the objectives of statistics is to make inferences concerning a population. These inferences are based only on partial information regarding the