DEPARTMENT OF MATHEMATICS AND STATISTICS


Memorial University of Newfoundland
St. John's, Newfoundland, CANADA A1C 5S7
ph. (709) fax (709)

Alwell Julius Oyet, PhD

STATISTICS FOR PHYSICAL SCIENCES: LECTURE NOTES

4. CONTINUOUS RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

4.1 Continuous Random Variables

Recall that for a discrete random variable, the set of all possible outcomes has to be discrete in the sense that the elements of the set can be listed sequentially, beginning with the smallest number. On the other hand, if the set of all possible values of a random variable is an entire interval of numbers, then the random variable is said to be continuous. In this case, between any two possible values we can find infinitely many other possible values. This implies that the values cannot be listed sequentially.

Example: For instance, let X = length of a randomly chosen rattlesnake. Suppose the smallest rattlesnake has length a and the longest rattlesnake is b units long. Then all possible values lie between a and b, and we write a ≤ x ≤ b as the range of possible values.

Example: Let X = reaction temperature (in °C) in a certain chemical process. Suppose the minimum temperature is −5°C and the maximum temperature for the process is 5°C. Then the set of all possible values is {x : −5 ≤ x ≤ 5}.

It is possible to argue that the restrictions of our measuring instruments may sometimes limit us to a discrete world. For instance, our measurements of temperature may sometimes make us feel that temperature is discrete. However, continuous random variables and distributions often approximate real-world situations very well. Furthermore, problems based on continuous variables are often easier to solve than those based on discrete variables.

4.2 Probability Distributions For Continuous Random Variables

In the discrete case, the probability distribution of a random variable, called the probability mass function (pmf), shows us how the total probability of 1 is distributed among the possible values of the random variable. The probability distribution of a continuous random variable is called the probability density function.

Now, let Y be a continuous random variable. Then a probability density function (pdf) of Y is a function f(y) such that for any two numbers a and b, with a ≤ b,

P(a ≤ Y ≤ b) = ∫_a^b f(y) dy.

That is, if we plot the graph of f(y), the density curve, the probability that Y takes a value between a and b is the area between these two numbers and under the graph of f(y). In other words, a probability involving a continuous random variable is an area under the curve. It therefore makes sense that for any constant c, P(Y = c) = 0; the area under the graph at a single point is zero. This implies that if Y is a continuous random variable, then for any two numbers a and b with a ≤ b,

P(a ≤ Y ≤ b) = P(a < Y ≤ b) = P(a ≤ Y < b) = P(a < Y < b).

We wish to emphasize that the equality of the four probabilities above holds only for continuous random variables.

For any function f(y) to be a legitimate probability density function, it must satisfy

1. f(y) ≥ 0, for all y;
2. ∫_{−∞}^{∞} f(y) dy = total area under the density curve = 1.

Example: Problem 5, Page 151. A college professor never finishes his lecture before the end of the hour and always finishes his lectures within 2 min after the hour. Let X = the time that elapses between the end of the hour and the end of the lecture, and suppose the pdf of X is

f(x) = kx², 0 ≤ x ≤ 2; 0 otherwise.

(a) Find the value of k.
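A quick numeric sketch of part (a), not the book's worked solution: the second legitimacy condition forces the total area under f(x) = kx² on [0, 2] to equal 1, and since the integral of x² from 0 to 2 is 8/3, this gives k = 3/8.

```python
# Sketch for Problem 5(a): choose k so that f(x) = k*x^2 integrates to 1 on [0, 2].

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

k = 1.0 / integrate(lambda x: x**2, 0.0, 2.0)   # total area must equal 1
print(round(k, 4))   # 0.375, i.e. k = 3/8
```

The same midpoint-rule helper can be reused to check any of the area computations in this section.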

(b) What is the probability that the lecture ends within 1 min of the end of the hour?

Solution:

In our discussion of discrete random variables we studied some random variables with special mass functions, such as the binomial, hypergeometric and Poisson. The first random variable with a special density function we shall discuss is the uniformly distributed random variable. By definition, a continuous random variable X is said to have a uniform distribution on the interval [A, B] if the pdf of X is

f(x; A, B) = 1/(B − A), A ≤ x ≤ B; 0 otherwise.

This is the continuous counterpart of the notion of equally likely outcomes of a sample space.

Example: Problem 7, Page 151. The time X (min) for a lab assistant to prepare the equipment for a certain experiment is believed to have a uniform distribution with parameters 25 and 35.

(a) Write the pdf of X and sketch its graph.
(b) For any a such that 25 < a < a + 2 < 35, what is the probability that preparation time is between a and a + 2 min?

Solution:

4.3 Cumulative Distribution Functions and Expected Values

The cumulative distribution function (cdf) F(x) of a continuous random variable is, for every number x, the entire area to the left of the point x under the density curve f(x) of X. That is,

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(y) dy.

Notice that the variable in the integrand has been changed to y in order not to confuse it with the upper limit of the integral. For the purpose of illustration, let us determine the cdf of the uniform random variable X in Problem 7, Page 151. Now, for this problem,

f(x; 25, 35) = 1/10, 25 ≤ x ≤ 35; 0 otherwise.

It follows that for every x in [25, 35),

F(x) = ∫_{−∞}^{x} f(y) dy = ∫_{25}^{x} (1/10) dy = (x − 25)/10.
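The uniform pdf and the cdf just derived can be coded directly; this is a small sketch of Problem 7's distribution, with F(x) = (x − 25)/10 on the interval.

```python
# Uniform(25, 35) density and cumulative distribution function.

def pdf(x, A=25.0, B=35.0):
    """Uniform density: constant 1/(B - A) on [A, B], zero elsewhere."""
    return 1.0 / (B - A) if A <= x <= B else 0.0

def cdf(x, A=25.0, B=35.0):
    """Area under the density to the left of x."""
    if x < A:
        return 0.0
    if x >= B:
        return 1.0
    return (x - A) / (B - A)

print(cdf(30))                       # halfway through the interval: 0.5
print(round(cdf(33) - cdf(31), 4))   # P(31 <= X <= 33) = 0.2
```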

Therefore the cdf of X = time for a lab assistant to prepare equipment is

F(x) = 0, x < 25; (x − 25)/10, 25 ≤ x < 35; 1, x ≥ 35.

The idea of complements can also be used to compute probabilities when dealing with continuous random variables. If X is a continuous random variable and a is any constant, then

P(X ≥ a) = P(X > a) = 1 − P(X ≤ a) = 1 − F(a).

Some students may have noticed that this is different from the result for discrete random variables. Recall that if Y is a discrete random variable and a is any integer, then

P(Y ≥ a) = 1 − P(Y < a) = 1 − F(a − 1).

Students should therefore take note of this difference between discrete and continuous random variables. Also, we note that if X is a continuous random variable and a and b are constants such that a ≤ b, then

P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a).

Again, this is slightly different from what we would have done if X were discrete.

Percentiles of a Continuous Distribution

Now, we have seen that given a value, say x = 4, and the density function of the continuous random variable X, we can compute the cdf of X at the point x = 4. For instance, let

f(x) = 0.5x, 0 ≤ x ≤ 2; 0 otherwise.

Then it is easy to verify that

F(x) = 0, x < 0; 0.25x², 0 ≤ x < 2; 1, x ≥ 2.

Thus, we can see that at x = 3, F(3) = 1 and F(−0.2) = 0. Similarly, at x = 1.5, F(1.5) = 0.25(1.5)² = 0.5625. Now, the question is, suppose we know that F(x) = 0.375 and we are given the density function f(x); is it possible to find the corresponding value of x, say x = c? In this case, solving 0.25x² = 0.375 shows that the value of x that gives F(x) = 0.375 is x = √1.5 ≈ 1.22. In statistics, we call the point x = √1.5 the (100 × 0.375)th = 37.5th percentile of the distribution of the continuous random variable X. Students may have observed that for both discrete and continuous random variables, the smallest value of the cdf is zero and the largest value is 1. A general definition of the percentile of a distribution is as follows.

Definition: Let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous random variable X, which we shall denote by c, is that value for which

p = F(c) = ∫_{−∞}^{c} f(y) dy.

Important cases of p are as follows.

(a) p = 0.5 gives the 50th percentile, which is also the median of the distribution of X. The median is the point on the x axis which divides the area under the density curve of X into two equal parts. In the course text, the median is denoted by µ̃.

(b) p = 0.25 gives the 25th percentile, which is also the first quartile of the distribution of X. The first quartile is the point on the x axis which divides the area under the density curve of X into two parts such that the area to the left is 25% of the total area and the area to the right is 75% of the total area.

(c) p = 0.75 gives the 75th percentile, which is also the third quartile of the distribution of X. The third quartile is the point on the x axis which divides the area under the density curve of X into two parts such that the area to the left is 75% of the total area and the area to the right is 25% of the total area.

As an example, suppose we wish to find the median of the distribution of the random variable X with density function given by

f(x) = 0.5x, 0 ≤ x ≤ 2; 0 otherwise.

The first step is to compute F(x). From previous results, we have

F(x) = 0, x < 0; 0.25x², 0 ≤ x < 2; 1, x ≥ 2.

Now, we are seeking the value of x for which F(x) = 50% = 0.5. Thus, we solve the equation F(x) = 0.25x² = 0.5 for x. We obtain x = ±√2. Since the negative value falls outside the acceptable limits, the median of the distribution of X is x = √2 ≈ 1.414.

Examples and Practice Problems (see class notes for solutions)
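For this particular cdf the percentile equation can be inverted in closed form: solving F(c) = 0.25c² = p on [0, 2] gives c = 2√p. A minimal sketch of that inversion:

```python
# Percentiles of the distribution with F(x) = 0.25*x^2 on [0, 2].
import math

def percentile(p):
    """Solve 0.25*c**2 = p for c in [0, 2], i.e. c = 2*sqrt(p)."""
    return 2.0 * math.sqrt(p)

print(round(percentile(0.5), 4))     # median = sqrt(2) = 1.4142
print(round(percentile(0.375), 4))   # 37.5th percentile = sqrt(1.5) = 1.2247
```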

Problem 1, Page 150. Let X denote the amount of time for which a book on 2-hour reserve at a college library is checked out by a randomly selected student, and suppose that X has density function

f(x) = 0.5x, 0 ≤ x ≤ 2; 0 otherwise.

Calculate the following probabilities:

(a) P(X ≤ 1)
(b) P(0.5 ≤ X ≤ 1.5)
(c) P(1.5 ≤ X)

Problem 7, Page 151. The time X (min) for a lab assistant to prepare the equipment for a certain experiment is believed to have a uniform distribution with A = 25 and B = 35.

(a) Write the pdf of X and sketch its graph.
(b) What is the probability that preparation time exceeds 33 min?
(c) What is the probability that preparation time is within 2 min of the mean time?
(d) For any a such that 25 < a < a + 2 < 35, what is the probability that preparation time is between a and a + 2 min?

Problem. Find the cdf of X = measurement error with pdf

f(x) = (3/32)(4 − x²), −2 ≤ x ≤ 2; 0 otherwise.

(a) Compute P(X < 0).
(b) Compute P(−1 < X < 1).
(c) Verify that the median of the distribution of X is zero.

4.4 Expected or Mean Values

Definition: The expected or mean value of a continuous random variable X with pdf f(x) is

µ = µ_X = E(X) = ∫_{−∞}^{∞} x f(x) dx.

If h(X) is any function of X, then

µ_{h(X)} = E[h(X)] = ∫_{−∞}^{∞} h(x) f(x) dx.

A very important special case, which we have met in our study of discrete random variables, is h(X) = (X − µ)². The expectation is then called the variance of X and denoted by σ², σ²_X or V(X). That is,

σ² = V(X) = ∫_{−∞}^{∞} (x − µ)² f(x) dx.

We repeat, once again, that for computational purposes it is easier to use the expression

V(X) = E(X²) − [E(X)]².

Examples and Practice Problems (see class notes for solutions)

Problem 21, Page 160. An ecologist wishes to mark off a circular sampling region having radius 10 m. However, the radius of the resulting region is actually a random variable R with pdf

f(r) = (3/4)[1 − (10 − r)²], 9 ≤ r ≤ 11; 0 otherwise.

(a) Compute E(R) and V(R).
(b) What is the expected area of the resulting circular region?

Problem 22, Page 160. The weekly demand for propane gas (in 1000s of gallons) from a particular facility is a continuous random variable X with pdf

f(x) = 2(1 − 1/x²), 1 ≤ x ≤ 2; 0 otherwise.

(a) Compute the cdf of X.
(b) Obtain an expression for the (100p)th percentile. What is the median?
(c) Compute the mean and variance of X.
(d) If 1.5 thousand gallons are in stock at the beginning of the week and no new supply is due in during the week, how much of the 1.5 thousand gallons is expected to be left at the end of the week?
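As a numeric check of these definitions, here is the mean and variance of the checkout-time density f(x) = 0.5x on [0, 2] from the percentile example; analytically E(X) = 4/3 and V(X) = 2/9.

```python
# Mean and variance of f(x) = 0.5*x on [0, 2] by numerical integration.

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

mean = integrate(lambda x: x * (0.5 * x), 0.0, 2.0)        # E(X)
ex2  = integrate(lambda x: x**2 * (0.5 * x), 0.0, 2.0)     # E(X^2)
var  = ex2 - mean**2          # the shortcut V(X) = E(X^2) - [E(X)]^2
print(round(mean, 4), round(var, 4))   # 1.3333 0.2222
```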

4.5. The Normal Distribution

In statistics, the normal distribution is the most important distribution, because several random variables that we encounter in real life have distributions that can be approximated by the normal distribution. Students who choose to pursue a career in statistics, or take more statistics courses, will find that the normal distribution is central to several statistical techniques that are used in data analysis.

Definition: A continuous random variable X is said to have a normal distribution with parameters µ and σ² if the pdf of X is

f(x; µ, σ²) = (1/(σ√(2π))) exp{−(1/2)((x − µ)/σ)²}, −∞ < x < ∞,

where −∞ < µ < ∞ and σ² > 0. It can be shown that the mean and variance of a normally distributed random variable X are E(X) = µ and V(X) = σ². The function g(x) = exp(x) is called the exponential function, with g(1) = exp(1) ≈ 2.718. In statistics, the notation X ∼ N(µ, σ²) is interpreted to mean that the random variable X has the normal distribution with mean µ and variance σ².

Some properties of the normal distribution are as follows.

(a) The normal density curve is bell shaped.
(b) The normal density curve is symmetric about the mean x = µ.
(c) The median of the normal density curve is equal to the mean of the normal density curve. That is, the curve is centered on x = µ, and the point x = µ divides the area under the normal curve into two equal parts. Therefore, the area to the left of x = µ is 0.5 and the area to the right is also 0.5.
(d) If X ∼ N(µ, σ₁²) and Y ∼ N(µ, σ₂²) are two normally distributed random variables with the same mean but different variances (σ₁² > σ₂²), then the density curve of X, the random variable with the larger variance σ₁², will be more spread out than the density curve for Y.

Now, let X ∼ N(µ, σ²) and consider a change of variable from x to z by defining a new random variable

Z = (X − µ)/σ,

which comes from the argument of the exponential function in the expression for the normal density. This technique is called standardizing the random variable X.
If Q is a random variable with mean µ_Q = E(Q) and standard deviation σ_Q = √V(Q), then we can define a new random variable by standardizing Q. In general, to standardize Q, we subtract µ_Q from Q and divide by σ_Q to obtain the expression, say U, given by

U = (Q − µ_Q)/σ_Q.

It is quite straightforward to verify that µ_Z = E(Z) = 0 and σ²_Z = V(Z) = 1. Now, using a common technique in statistics, which is beyond the scope of this course, we can see that the new density function for Z, called the standard normal density function, is

f(z; 0, 1) = (1/√(2π)) exp(−z²/2), −∞ < z < ∞.

The properties of the standard normal density curve are the same as before, except that µ = 0 and σ² = 1. We shall denote the cdf of a standard normal random variable Z by Φ(z):

Φ(z) = P(Z ≤ z) = ∫_{−∞}^{z} f(y; 0, 1) dy.

Observe that computing probabilities involving normal random variables will require that we integrate the normal density function. This integral is not easy to evaluate. As a result, values of the cumulative distribution function Φ(z) of the standard normal curve have been evaluated and tabulated for us to use.

Examples (see notes for solutions)

Problem 26, Page 171. Let Z be a standard normal random variable and calculate the following probabilities:

(a) P(0 ≤ Z ≤ 2.17)
(b) P(−1.53 ≤ Z ≤ 2)
(c) P(−1.75 ≤ Z)

Problem 27, Page 171. In each case, determine the value of the constant c that makes the probability statement correct.

(a) Φ(c) =
(b) P(c ≤ Z) =
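Although the integral has no elementary antiderivative, Φ can be computed from the error function, Φ(z) = (1 + erf(z/√2))/2, which is exactly what the printed tables tabulate. A sketch using this identity for Problem 26(a):

```python
# Standard normal cdf Phi(z) via the error function.
import math

def phi(z):
    """Phi(z) = P(Z <= z) for standard normal Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Problem 26(a): P(0 <= Z <= 2.17) = Phi(2.17) - Phi(0)
p_a = phi(2.17) - phi(0.0)
print(round(p_a, 4))   # 0.485, matching the table value .4850
```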

Percentiles of the Standard Normal Distribution

We recall that the point c such that Φ(c) = P(Z ≤ c) = p is called the (100p)th percentile of the standard normal distribution. A notation that is commonly used in relation to the normal distribution is z_α, where z_α is the 100(1 − α)th percentile of the standard normal distribution. That is,

Φ(z_α) = P(Z ≤ z_α) = 1 − α.

This means that the area under the standard normal curve to the right of z_α is α and the area to the left is 1 − α. As a result of the symmetry of the standard normal curve about the point z = 0, we have that the area to the left of −z_α is α and the area to the right of −z_α is 1 − α. The z_α values are usually called critical values rather than percentiles, for reasons that will be made clear later in our studies. This notation is particularly useful and convenient because later in this course we will need the critical values for some small values of α.

Example: From the standard normal table, we can see that

(a) z_0.05 = 95th percentile of the standard normal distribution = 1.645. That is, for Φ(z_0.05) = P(Z ≤ z_0.05) = 1 − 0.05 = 0.95, we must have z_0.05 = 1.645.
(b) Similarly, z_0.025 = 97.5th percentile of the standard normal distribution = 1.96.
(c) z_0.01 = 99th percentile of the standard normal distribution = 2.33.

Students will sometimes encounter problems that involve nonstandard normal distributions, in the sense that although the distribution is normal, the mean is not zero and the variance may or may not be 1. In that case, we cannot use the standard normal tables directly to compute probabilities. We first have to transform, or standardize, the nonstandard normal random variable into a standard normal random variable, as previously discussed, before we can use the standard normal table. For instance, let X ∼ N(µ, σ²) with µ ≠ 0 and σ ≠ 1, and suppose we wish to compute P(a ≤ X ≤ b).
Since X has a nonstandard normal distribution, we first apply the transformation Z = (X − µ)/σ to the variable X and to the limits a and b, and then use the standard normal table to obtain the required probability as follows:

P(a ≤ X ≤ b) = P((a − µ)/σ ≤ (X − µ)/σ ≤ (b − µ)/σ)
             = P((a − µ)/σ ≤ Z ≤ (b − µ)/σ)
             = Φ((b − µ)/σ) − Φ((a − µ)/σ).
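This standardization is a one-line change of variable in code. The sketch below applies it to the numbers of Problem 31 on the next page (µ = 15 kips, σ = 1.25 kips), so that, for example, P(X ≤ 18) = Φ((18 − 15)/1.25) = Φ(2.4).

```python
# Probabilities for a nonstandard normal via Z = (X - mu)/sigma.
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    return phi((x - mu) / sigma)     # the change of variable Z = (X - mu)/sigma

p_at_most_18 = normal_cdf(18, 15, 1.25)                                # Phi(2.4)
p_10_to_12   = normal_cdf(12, 15, 1.25) - normal_cdf(10, 15, 1.25)
print(round(p_at_most_18, 4))   # 0.9918
print(round(p_10_to_12, 4))     # 0.0082
```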

Example (see class notes for solution)

Problem 31, Page 171. Suppose the force acting on a column that helps to support a building is normally distributed with mean 15 kips and standard deviation 1.25 kips.

(a) What is the probability that the force is at most 18 kips?
(b) What is the probability that the force is between 10 and 12 kips?
(c) What is the probability that the force differs from 15 kips by at most 2 standard deviations?
(d) How would you characterize the largest 5% of all values of the force?

Problem 32, Page 171. The article "Reliability of Domestic-Waste Biofilm Reactors" (J. of Envir. Engr., 1995: ) suggests that substrate concentration (mg/cm³) of influent to a reactor is normally distributed with µ = 0.3 and σ = 0.06.

(a) What is the probability that the concentration exceeds 0.25?
(b) How would you characterize the largest 5% of all concentration values?

The Normal Approximation to the Binomial Distribution

Students may have noticed that the normal distribution is a continuous distribution. However, it is sometimes used to approximate distributions that are not continuous. Surprisingly, these approximations are sometimes very close to the true value. In this section we shall discuss one of several possible approximations. In such cases a correction, called the continuity correction, is often introduced into the calculations to account for the fact that we are using a continuous distribution to approximate a discrete distribution.

Let X ∼ Bin(n, p). Then, for a sufficiently large value of n, the distribution of X can be approximated by a normal distribution with mean µ = np and variance σ² = np(1 − p). The approximation appears to work well if both np ≥ 10 and n(1 − p) ≥ 10. In particular, if x is any one of the possible values of X, then

P(X ≤ x) = B(x; n, p) ≈ P(X ≤ x + 0.5)
         = P((X − µ)/σ ≤ (x + 0.5 − µ)/σ)
         = P(Z ≤ (x + 0.5 − np)/√(np(1 − p)))
         = Φ((x + 0.5 − np)/√(np(1 − p))).
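How close is the approximation in practice? A sketch comparing the exact binomial probability with the continuity-corrected normal value for the seat-belt numbers of Problem 51 on the next page (n = 500, p = 0.4, P(180 ≤ X ≤ 230)):

```python
# Exact binomial probability vs. continuity-corrected normal approximation.
import math

n, p = 500, 0.4
mu = n * p                               # 200
sigma = math.sqrt(n * p * (1 - p))       # about 10.95

def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(180, 231))
approx = phi((230 + 0.5 - mu) / sigma) - phi((180 - 0.5 - mu) / sigma)
print(round(exact, 4), round(approx, 4))   # the two values agree to about 3 decimals
```

Here np = 200 and n(1 − p) = 300, so the rule of thumb is comfortably satisfied.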

Example (see class notes for solution)

Problem 51, Page 173. Suppose only 40% of all drivers in a certain state regularly wear a seat belt. A random sample of 500 drivers is selected. What is the probability that between 180 and 230 (inclusive) of the drivers in the sample regularly wear a seat belt?

5. JOINTLY DISTRIBUTED DISCRETE RANDOM VARIABLES AND RANDOM SAMPLES

In some experimental situations, the experimenter may define more than one random variable, say X and Y, on the sample space of the experiment and may be interested in studying some properties of these random variables. The experimenter may wish to obtain the distribution of X and the distribution of Y separately. It is also possible to obtain the distribution of X and Y together (i.e., the joint distribution of X and Y). The joint probability distribution of X and Y outlines how much probability mass is placed on each possible pair of values (x, y). In this section, we shall restrict our discussion to joint distributions for discrete random variables.

Definition: Let X and Y be two discrete random variables defined on the sample space of an experiment. The joint probability mass function (pmf) p(x, y) of X and Y is defined, for each pair of numbers (x, y), by

p(x, y) = P(X = x and Y = y).

Given the joint pmf of two discrete random variables, it is possible to obtain the pmf of each of the random variables separately. These separate pmfs are called marginal pmfs. The marginal pmf of, say, X is obtained by summing the joint pmf over all possible values of Y.

Definition: Let the joint probability mass function (pmf) of X and Y be p(x, y). Then the marginal pmf of X, denoted by p_X(x), is given by

p_X(x) = Σ_{all y} p(x, y).

Similarly, the marginal pmf of Y, denoted by p_Y(y), is given by

p_Y(y) = Σ_{all x} p(x, y).
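The marginal-pmf definition translates directly into a sum over a table. The joint table below is a made-up two-by-two illustration, not one of the textbook's tables, stored as a dictionary keyed by (x, y).

```python
# Marginal pmfs obtained by summing a joint pmf, as in the definition above.
joint = {(0, 0): 0.10, (0, 1): 0.20,    # hypothetical joint pmf
         (1, 0): 0.30, (1, 1): 0.40}

def marginal_x(joint):
    """p_X(x) = sum over all y of p(x, y)."""
    px = {}
    for (x, y), prob in joint.items():
        px[x] = px.get(x, 0.0) + prob
    return px

px = marginal_x(joint)
print({x: round(prob, 2) for x, prob in sorted(px.items())})   # {0: 0.3, 1: 0.7}
```

Swapping the roles of x and y in the accumulator gives p_Y in the same way.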

Given the marginal pmfs of X and Y, we say that the two random variables are independent if the joint pmf is equal to the product of the marginal pmfs at every possible pair of values (x, y). That is, if X and Y are independent, then

p(x, y) = p_X(x) · p_Y(y), for every (x, y).

If this relation fails to hold, even for only one of all the possible pairs, then the random variables are not independent; they are dependent.

Examples and Practice Problems (see class notes for solutions)

Problem 1, Page 215. A service station has both self-service and full-service islands. On each island, there is a single regular unleaded pump with two hoses. Let X denote the number of hoses being used on the self-service island, and let Y denote the number of hoses in use on the full-service island. The joint pmf of X and Y is given in the accompanying table.

(a) What is P(X = 1 and Y = 1)?
(b) Compute P(X ≤ 1 and Y ≤ 1).
(c) Compute P(X ≠ 0 and Y ≠ 0).
(d) Compute P(X + Y ≤ 1).
(e) Compute the marginal pmf of X and of Y.
(f) Are X and Y independent random variables?

Problem 3, Page 216. A certain market has both an express checkout line and a superexpress checkout line. Let X₁ denote the number of customers in line at the express checkout at a particular time of day, and let X₂ denote the number of customers in line at the superexpress checkout at the same time. Suppose the joint pmf of X₁ and X₂ is as given in the accompanying table.

(a) What is the probability that there is exactly one customer in each line?
(b) What is the probability that the numbers of customers in the two lines are identical?
(c) What is the probability that there are at least two more customers in one line than in the other line?
(d) What is the probability that the total number of customers in the two lines is exactly four? At least four?

Problem 4, Page 216. Return to the situation described in Problem 3, Page 216.

(a) Determine the marginal pmf of X₁ and then calculate the expected number of customers in line at the express checkout.
(b) Determine the marginal pmf of X₂.
(c) Are X₁ and X₂ independent?

5.1. Expected Values, Covariance and Correlation

Consider two jointly distributed discrete random variables X and Y with joint pmf p(x, y). Let h(X, Y) be any function of the random variables; then h(X, Y) is itself a random variable. The expected or mean value of h(X, Y), denoted by µ_{h(X,Y)} or E[h(X, Y)], is given by

µ_{h(X,Y)} = E[h(X, Y)] = Σ_{all x} Σ_{all y} h(x, y) p(x, y).

A special case is when h(X, Y) takes the form h(X, Y) = (X − µ_X)(Y − µ_Y). In that case, E[h(X, Y)] is called the covariance of X and Y and denoted by Cov(X, Y). That is, the covariance between X and Y is

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = Σ_{all x} Σ_{all y} (x − µ_X)(y − µ_Y) p(x, y).

For ease of computation it is best to use the formula

Cov(X, Y) = E(XY) − µ_X µ_Y.

If X and Y are independent, it can be shown that E(XY) = E(X)·E(Y) = µ_X µ_Y. It is then clear that when X and Y are independent, Cov(X, Y) = 0. On the other hand, Cov(X, Y) = 0 does not necessarily imply independence.

We have seen that the variance σ²_X or V(X) of a random variable is a measure of the size of the error made by an experimenter. If the experimenter committed a lot of mistakes during experimentation, the variance will tend to be large. Alternatively, if the experimenter was very careful and made little or no mistakes during repeated trials of the experiment, the variance will tend to be small. Now, the covariance between X and Y measures how the two variables change together. For instance, if the values of Y tend to increase as X increases, then we say that there is a positive linear relationship between X and Y. In that case, the value of Cov(X, Y) will be positive, and large if the linear relationship is very strong. On the other hand, if the values of Y tend to decrease as X increases, then there is a negative linear relationship between X and Y, and Cov(X, Y) will be negative.

One problem with using covariance as a measure of the strength of linear relationships is the fact that it is usually affected by the units in which the observations were measured. In order to remove the dependence of the covariance on the units, it is standardized to obtain the correlation coefficient, denoted by ρ_{X,Y}, or simply ρ, or Corr(X, Y), and given by

ρ_{X,Y} = Cov(X, Y)/(σ_X σ_Y),

where σ_X is the standard deviation of X and σ_Y is the standard deviation of Y.

Examples and Practice Problems (see class notes for solutions)

Refer to Problem 1, Page 215.

(a) Compute the expected total number of hoses in use, E(X + Y).
(b) Compute the covariance for X and Y.
(c) Compute the correlation for X and Y.

Problem 22, Page 224. An instructor has given a short quiz consisting of two parts. For a randomly selected student, let X = the number of points earned on the first part and Y = the number of points earned on the second part. Suppose that the joint pmf of X and Y is given in the accompanying table.
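Both the covariance shortcut and the standardization to ρ can be sketched on a small joint pmf. The table here is the same made-up two-by-two illustration as before, not a textbook table.

```python
# Covariance via Cov(X, Y) = E(XY) - mu_X*mu_Y, then correlation rho.
import math

joint = {(0, 0): 0.10, (0, 1): 0.20,    # hypothetical joint pmf
         (1, 0): 0.30, (1, 1): 0.40}

def expect(h):
    """E[h(X, Y)] = double sum of h(x, y) * p(x, y)."""
    return sum(h(x, y) * prob for (x, y), prob in joint.items())

mux, muy = expect(lambda x, y: x), expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - mux * muy
sx = math.sqrt(expect(lambda x, y: x**2) - mux**2)
sy = math.sqrt(expect(lambda x, y: y**2) - muy**2)
rho = cov / (sx * sy)
print(round(cov, 4), round(rho, 4))   # -0.02 and about -0.0891
```

A small negative ρ like this signals an almost negligible negative linear relationship, which illustrates why ρ, not the raw covariance, is the unit-free measure of strength.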

(a) If the score recorded in the grade book is the total number of points earned on the two parts, what is the expected recorded score E(X + Y)?
(b) If the maximum of the two scores is recorded, what is the expected recorded score?
(c) Compute the covariance for X and Y.
(d) Compute the correlation for X and Y.
(e) Instead of marking the first part out of 15 and the second part out of 10, suppose the instructor changed the points so that y = 0, 1, 2, 3 for the first part and x = 0, 1, 2 for the second part. How would this affect the covariance and the correlation? Explain.

5.2. Statistics and Their Distribution

Suppose that the owner of a vending machine on campus is trying to determine how often the machine should be refilled in a week. The owner decides to observe the number of items left in the vending machine at the end of each day. Let X = number of items left in the machine on any given day. Now, at the end of Day 1, suppose the observed number of items is x₁. Similarly, let x₂, x₃, ..., x₃₀ be the observed numbers from Day 2 to Day 30. We shall call these observed values a sample of size 30. From these observed numbers we can compute the average number of items the owner expects will be left over in any month of the year,

x̄ = (x₁ + x₂ + ··· + x₃₀)/30 = (1/30) Σ_{i=1}^{30} x_i.

We can also determine how the numbers vary from day to day through

s² = (1/29) Σ_{i=1}^{30} (x_i − x̄)²,

called the sample variance. Suppose we repeat the experiment for another 30 days. We expect the observed values x₁, x₂, x₃, ..., x₃₀ to be slightly or very different from the observed values from the first experiment. This will in turn lead to different values of x̄ and s². This variation in observed values in turn implies that the value of any function of the sample observations, such as the mean, variance, standard deviation, and so on, also varies from sample to sample.

Definition: A statistic is any quantity that is calculated from sample data.
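The two formulas above are a direct computation on observed data. The leftover counts below are made up for illustration; note the divisor n − 1 in the sample variance, matching the 1/29 for a sample of size 30.

```python
# Sample mean and sample variance from observed data.
data = [4, 7, 3, 8, 5, 6, 4, 7]     # hypothetical leftover counts for 8 days

n = len(data)
xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)   # divisor n - 1
print(xbar, round(s2, 4))   # 5.5 3.1429
```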
Now, before we actually observe the daily number of leftover items, the owner has no way of knowing what the value will be on any given day. The number could range from 0 to the total number of items the machine can contain, say n. Therefore, before the experiment is actually performed, the

daily number of leftover items is itself viewed as a random variable. Denote the random variables by X₁, X₂, X₃, ..., X₃₀. Therefore, the mean and variance computed from these random variables will also be random variables,

X̄ = (X₁ + X₂ + ··· + X₃₀)/30 = (1/30) Σ_{i=1}^{30} X_i and S² = (1/29) Σ_{i=1}^{30} (X_i − X̄)².

Notice that we have used the lower case or small letter x for the actual observations and the upper case or capital letter X to represent the random variable. Thus, the sample mean regarded as a statistic is denoted by X̄, with calculated or observed value x̄. Similarly, the sample variance regarded as a statistic is represented as S², with s² as its computed value.

Definition: The random variables X₁, X₂, ..., X_n are said to form a random sample of size n if

1. the X_i's are independent random variables;
2. every X_i has the same distribution.

Another way of saying the same thing is to say that X₁, X₂, ..., X_n are independent and identically distributed, abbreviated as iid.

Recall that every random variable has a probability distribution. This implies that every statistic, being a random variable, has a probability distribution. The probability distribution of a statistic is sometimes called its sampling distribution, simply to emphasize how the values of a statistic vary from sample to sample. The main issue is how the sampling distribution of a statistic can be derived. There are a number of ways in which this can be done. If the X_i's form a random sample and their distribution is known, then we can use probability rules to determine the distribution of a statistic computed from the random sample. In other situations, where it is too complicated to use probability rules, we may obtain information about the sampling distribution of a statistic by using a computer to mimic the behaviour of the statistic through what is normally called a simulation experiment. We shall use an example to illustrate how the probability mass function of a statistic can be derived.
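A minimal sketch of such a simulation experiment: draw many samples of size 30 from a known population and watch how the statistic x̄ varies from sample to sample. The uniform(0, 10) population here is an arbitrary stand-in, chosen only so the answer is known (its mean is 5).

```python
# Simulation experiment: the sampling variability of the sample mean.
import random

random.seed(1)                     # fixed seed so the run is reproducible
n, reps = 30, 2000
means = []
for _ in range(reps):
    sample = [random.uniform(0, 10) for _ in range(n)]
    means.append(sum(sample) / n)  # one observed value of the statistic X-bar

grand = sum(means) / reps
print(round(grand, 2))   # the simulated means cluster near the population mean 5
```

Plotting a histogram of `means` would show the bell shape promised by the central limit theorem discussed below.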
Example (see class notes for solution)

Problem 38, Page 234. There are two traffic lights on my way to work. Let X₁ be the number of lights at which I must stop, and suppose that the distribution of X₁ is as follows:

x₁:     0    1    2
p(x₁):  0.2  0.5  0.3

with µ = 1.1 and σ² = 0.49.

Let X₂ be the number of lights at which I must stop on the way home; X₂ is independent of X₁. Assume that X₂ has the same distribution as X₁, so that X₁, X₂ is a random sample of size n = 2.

(a) Let T₀ = X₁ + X₂ and determine the probability distribution of T₀.
(b) Calculate µ_{T₀} and σ²_{T₀}.

Problem 41, Page 236. Let X be the number of packages being mailed by a randomly selected customer at a certain shipping facility. Suppose the distribution of X is as given in the accompanying table.

(a) Consider a random sample of size n = 2 and let X̄ be the sample mean number of packages shipped. Obtain the probability distribution of X̄.
(b) Refer to Part (a) and calculate P(X̄ ≤ 2.5).
(c) Calculate µ_X̄ and σ²_X̄.
(d) Again consider a random sample of size n = 2, but now focus on the statistic R = the sample range (difference between the largest and the smallest values in the sample). Obtain the distribution of R.

5.3. The Distribution of the Sample Mean

In Problem 41, Page 236, we saw that the population mean of the random variable X was µ = E(X) = 2 and the population variance was σ² = V(X) = E(X²) − µ² = 5 − 4 = 1. The sample mean X̄ of a random sample of size n on the random variable X provides an indication of what a typical value of X will be, and the sample variance S² tells us how spread out the values of X will be around this typical value X̄. Intuitively, we can say that since X̄ is a typical value of X, the mean or expected value of X̄ should be the same as that of X. That is, we expect that µ_X̄ = E(X̄) = E(X) = µ = 2. In our example in class, Problem 41, Page 236, we saw that this was the case. However, we saw that the variance of X̄ was slightly different from the variance of X, in the sense that

σ²_X̄ = V(X̄) = V(X)/n = σ²/n = 1/2 = 0.5.
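Returning to Problem 38, the exact distribution of T₀ = X₁ + X₂ can be enumerated by multiplying the two independent pmfs over every pair of values. The sketch uses p(0) = 0.2, p(1) = 0.5, p(2) = 0.3, the unique pmf on {0, 1, 2} consistent with µ = 1.1 and σ² = 0.49.

```python
# Exact distribution of T0 = X1 + X2 for two iid stops (Problem 38).
p = {0: 0.2, 1: 0.5, 2: 0.3}

t0 = {}
for x1, p1 in p.items():
    for x2, p2 in p.items():          # independence: joint prob is the product
        t0[x1 + x2] = t0.get(x1 + x2, 0.0) + p1 * p2

mu_t0 = sum(t * q for t, q in t0.items())
var_t0 = sum(t**2 * q for t, q in t0.items()) - mu_t0**2
print({t: round(q, 2) for t, q in sorted(t0.items())})
print(round(mu_t0, 2), round(var_t0, 2))   # 2.2 and 0.98, i.e. 2*mu and 2*sigma^2
```

Note that µ_{T₀} = 2µ and σ²_{T₀} = 2σ², exactly as the propositions below predict.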

This gives us an idea of the relationship between the mean of X, µ, and the mean of X̄, µ_X̄, and between the variance of X, σ², and the variance of X̄, σ²_X̄. This relationship can be extended to more general situations. We state this extension below.

Proposition: Let X₁, X₂, ..., X_n be a random sample from a distribution with mean value µ and standard deviation σ. Then

1. µ_X̄ = E(X̄) = E(X) = µ.
2. σ²_X̄ = V(X̄) = V(X)/n = σ²/n, and σ_X̄ = √(V(X)/n) = σ/√n.

In addition, if T₀ = X₁ + X₂ + X₃ + ··· + X_n is the sample total, then E(T₀) = nµ, σ²_{T₀} = V(T₀) = nσ², and σ_{T₀} = σ√n.

We remark that the above statement refers to a random sample from any distribution. We state below the distribution of X̄ if the random sample is from a normal distribution.

Proposition: If in particular the random sample X₁, X₂, ..., X_n is from a normal distribution with mean µ and variance σ² (X ∼ N(µ, σ²)), then

1. X̄ ∼ N(µ, σ²/n).
2. T₀ ∼ N(nµ, nσ²).

Remark: Even when the random sample X₁, X₂, ..., X_n is not from a normal distribution, if the sample size n is sufficiently large it can be shown that both the sample mean X̄ and the sample total T₀ can be approximated by a normal distribution. That is, if the random sample is not from a normal distribution, then approximately

1. X̄ ∼ N(µ, σ²/n).
2. T₀ ∼ N(nµ, nσ²).

The larger the value of n, the better the approximation. This statement of approximation is called the central limit theorem (CLT). In this class, by sufficiently large we mean n > 30.

Example (see class notes for solution)

Problems 46, 47, Page 242. The inside diameter of a randomly selected piston ring is a random variable with mean value 12 cm and standard deviation 0.04 cm.
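The propositions make short work of problems like this one. A sketch for the piston rings: with µ = 12, σ = 0.04 and n = 16, the standard deviation of X̄ is 0.04/√16 = 0.01, and under normality P(11.99 ≤ X̄ ≤ 12.01) = Φ(1) − Φ(−1).

```python
# Sampling distribution of X-bar for the piston-ring example.
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 12.0, 0.04, 16
se = sigma / math.sqrt(n)                                  # sigma of X-bar
prob = phi((12.01 - mu) / se) - phi((11.99 - mu) / se)     # standardize both limits
print(round(se, 4), round(prob, 4))   # 0.01 and about 0.6827
```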

(a) If X̄ is the sample mean diameter for a random sample of n = 16 rings, where is the sampling distribution of X̄ centered, and what is the standard deviation of the X̄ distribution?
(b) Suppose the distribution of diameter is normal. Calculate P(11.99 ≤ X̄ ≤ 12.01).

Problem 48, Page 242
Let X₁, X₂, …, X₁₀₀ denote the actual net weights of 100 randomly selected 50-lb bags of fertilizer. If the expected weight of each bag is 50 and the variance is 1, calculate P(49.75 ≤ X̄ ≤ 50.25).

Problem 50, Page 243
The breaking strength of a rivet has a mean value of 10,000 psi and a standard deviation of 500 psi. What is the probability that the sample mean breaking strength for a random sample of 40 rivets is between 9,900 and 10,200?

5.4 The Distribution of a Linear Combination

Some students may have noticed that the sample mean can also be written as

X̄ = (X₁ + X₂ + ⋯ + Xₙ)/n = (1/n)X₁ + (1/n)X₂ + ⋯ + (1/n)Xₙ.

If we replace the coefficients of X₁, X₂, …, Xₙ in the expression for X̄ by arbitrary constants a₁, a₂, …, aₙ, we obtain a more general expression

Y = a₁X₁ + a₂X₂ + ⋯ + aₙXₙ = Σ_{i=1}^n aᵢXᵢ,

called a linear combination, which, for different choices of the constants, represents many commonly used statistics. We observe that the sample mean X̄ is the special case of the linear combination Y with a₁ = a₂ = ⋯ = aₙ = 1/n. When a₁ = a₂ = ⋯ = aₙ = 1, the linear combination Y becomes the sample total T₀. The next issue is how to compute the expectation and variance of a linear combination.

Proposition: Let X₁, X₂, …, Xₙ be random variables with mean values E(X₁) = µ₁, E(X₂) = µ₂, …, E(Xₙ) = µₙ and variances V(X₁) = σ₁², V(X₂) = σ₂², …, V(Xₙ) = σₙ². Then the mean value of the linear combination Y is

µ_Y = E(a₁X₁ + a₂X₂ + ⋯ + aₙXₙ) = a₁E(X₁) + a₂E(X₂) + ⋯ + aₙE(Xₙ) = a₁µ₁ + a₂µ₂ + ⋯ + aₙµₙ = Σ_{i=1}^n aᵢµᵢ.
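A quick numerical check of the mean proposition, using made-up coefficients aᵢ and means µᵢ (assumptions for illustration only), with the sample mean recovered as the special case aᵢ = 1/n:

```python
# Hypothetical coefficients and means for illustration only.
a = [0.5, -1.0, 2.0]
mu = [10.0, 4.0, 3.0]

# E(a1 X1 + ... + an Xn) = a1 mu1 + ... + an mun, by linearity of expectation.
mean_Y = sum(ai * mi for ai, mi in zip(a, mu))
print(mean_Y)  # 0.5*10 - 1.0*4 + 2.0*3 = 7.0

# The sample mean is the special case a_i = 1/n:
n = len(mu)
mean_xbar = sum((1.0 / n) * mi for mi in mu)  # equals (10 + 4 + 3)/3
print(mean_xbar)
```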

Furthermore, if the random variables X₁, X₂, …, Xₙ are independent, then the variance of the linear combination Y is

σ²_Y = V(a₁X₁ + a₂X₂ + ⋯ + aₙXₙ) = a₁²V(X₁) + a₂²V(X₂) + ⋯ + aₙ²V(Xₙ) = a₁²σ₁² + a₂²σ₂² + ⋯ + aₙ²σₙ² = Σ_{i=1}^n aᵢ²σᵢ².

Now, if the random variables X₁, X₂, …, Xₙ are not independent, then the variance of the linear combination Y is

σ²_Y = V(a₁X₁ + a₂X₂ + ⋯ + aₙXₙ) = Σ_{i=1}^n Σ_{j=1}^n aᵢaⱼ Cov(Xᵢ, Xⱼ).

To fix ideas, we discuss the following examples.

Example (see class notes for solution)
Problem 59, Page 247
Let X₁, X₂ and X₃ represent the times necessary to perform three successive repair tasks at a certain service facility. Suppose they are independent normal random variables with expected values µ₁, µ₂ and µ₃ and variances σ₁², σ₂² and σ₃², respectively. If µ₁ = µ₂ = µ₃ = 60 and σ₁² = σ₂² = σ₃² = 15, calculate
(a) P(X₁ + X₂ + X₃ ≤ 200).
(b) P(58 ≤ X̄ ≤ 62).
(c) P(−10 ≤ X₁ − 0.5X₂ − 0.5X₃ ≤ 5).
(d) If µ₁ = 40, µ₂ = 50, µ₃ = 60, σ₁² = 10, σ₂² = 12 and σ₃² = 14, calculate P(X₁ + X₂ + X₃ ≤ 160).

Problem 70, Page 248
Suppose we take a random sample of size n from a continuous distribution having median 0, so that the probability of any one observation being positive is 0.5. We now disregard the signs of the observations, rank them from smallest to largest in absolute value, and then let W = the sum of the ranks of the observations having positive signs. For example, if the observations are −0.3, 0.7, 2.1, and −2.5, then the

ranks of the positive observations are 2 and 3, so W = 5. In Chapter 15, W is called Wilcoxon's signed-rank statistic. W can be represented as follows:

W = 1·Y₁ + 2·Y₂ + 3·Y₃ + ⋯ + n·Yₙ = Σ_{i=1}^n i·Yᵢ,

where the Yᵢ's are independent Bernoulli random variables, each with p = 0.5 (Yᵢ = 1 corresponds to the observation with rank i being positive). Compute the following:

(a) E(Yᵢ) and then E(W). Hint: Σ_{i=1}^n i = n(n + 1)/2.
(b) V(Yᵢ) and then V(W). Hint: Σ_{i=1}^n i² = n(n + 1)(2n + 1)/6.

6. POINT ESTIMATION

NOT PART OF SYLLABUS. SKIP TO CHAPTER 7.

7. INTERVAL ESTIMATION

Let us assume that we are trying to figure out the amount of money a student needs to survive at a typical university for a given semester. Now, based on our experience as students, or on other information available to us in the form of data, we may estimate the amount in two ways. We can choose to use the data to specify a single amount as our estimate. Such an estimate, which specifies a single value, is called a point estimate; it is a single value, a point in some sense. Such estimates are discussed in Chapter 6. Someone who is more careful and does not like to be wrong may suggest instead that we specify an interval as an estimate. So we may say that the amount should lie between $3500 and $4500. Such an estimate is

called an interval estimate. We may decide not to be too precise by specifying a very wide range of values. In that case, the interval estimate will not be very helpful to students because it is too wide. In this section, we will focus on how to use the information in data to obtain interval estimates.

7.1 Basic Properties of Confidence Intervals

We shall introduce the basic concepts of interval estimation with an example that is practically unrealistic. We shall then consider situations that may arise in practice. Consider a random variable X = alcohol content of a certain brand of wine manufactured by a company. Suppose the problem is to estimate the unknown population mean µ of the alcohol content of all bottles of this particular brand of wine manufactured by the company. Let us assume that X ~ N(µ, σ²), with σ² known, and suppose that we take a random sample of n wine bottles of this brand with alcohol contents X₁, X₂, …, Xₙ. On average, the alcohol content of the n wine bottles will be X̄. From the results of Chapter 5, we know that X̄ ~ N(µ, σ²/n). If we standardize X̄, we obtain a standard normal random variable

Z = (X̄ − µ)/(σ/√n).

Using the cumulative standard normal probability table, we find that P(−1.96 < Z < 1.96) = 0.95. If we replace Z by the expression for standardizing the mean, we have

P(−1.96 < (X̄ − µ)/(σ/√n) < 1.96) = 0.95.

Then, by simplifying the expression within the brackets, we can rewrite the probability statement above as

P(X̄ − 1.96σ/√n < µ < X̄ + 1.96σ/√n) = 0.95.

This simply means that the probability that the random interval

(X̄ − 1.96σ/√n, X̄ + 1.96σ/√n)

will cover the true mean µ is 0.95. The interval is random because the statistic X̄ is random. That is, before the experiment is performed and measurements are taken, we are 95% certain or confident

that the true mean µ will lie inside the interval above. Once the experiment is performed and measurements are observed, say X₁ = x₁, X₂ = x₂, …, Xₙ = xₙ, the observed average or mean becomes X̄ = x̄. The resulting fixed, non-random interval, given by

(x̄ − 1.96σ/√n, x̄ + 1.96σ/√n),

or written as x̄ − 1.96σ/√n < µ < x̄ + 1.96σ/√n, or in the more compact form x̄ ± 1.96σ/√n, is called a 95% confidence interval for µ.

The situation we have used to obtain the expression for the confidence interval is practically unrealistic because of the assumptions we have made. We assumed that the random sample is from a normal population with unknown mean µ and known variance σ². In practice, the mean is needed in order to compute the variance. So, if the mean is unknown, it is highly unlikely that the variance will be known.

Some students may have noticed that there is a relationship between the value 1.96 in the expression for the confidence interval and the confidence level 0.95. We observe that 1.96 is the (1 − 0.05/2)100th = (1 − 0.025)100th = 97.5th percentile or critical value of the standard normal distribution. Using the notation from Chapter 4, we write z_{0.025} = 1.96. Thus, for a general confidence level, say 99% or 98% or 90%, a (1 − α)100% confidence interval for the population mean µ of a distribution is obtained by replacing 1.96 with the critical value z_{α/2}:

(x̄ − z_{α/2} σ/√n, x̄ + z_{α/2} σ/√n), or x̄ ± z_{α/2} σ/√n.

The width of the (1 − α)100% confidence interval is

(x̄ + z_{α/2} σ/√n) − (x̄ − z_{α/2} σ/√n) = 2 z_{α/2} σ/√n.

An interval with a large width shows that the error in our estimation is too large; such an interval is not very informative, and the estimate becomes unrealistic.

Example (see class notes for solution)
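As a minimal sketch of the computation, assuming made-up values of x̄, σ and n (none of these numbers come from the notes), the 95% interval x̄ ± 1.96σ/√n can be computed as:

```python
import math

# Hypothetical numbers for illustration (sigma known, as assumed in 7.1).
xbar, sigma, n = 12.5, 0.8, 25
z = 1.96  # z_{0.025}: the 97.5th percentile of the standard normal

half_width = z * sigma / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
print((lower, upper))  # 95% confidence interval for mu
print(2 * half_width)  # width = 2 * z_{alpha/2} * sigma / sqrt(n)
```

For other confidence levels, only z changes (e.g. a 90% interval uses z_{0.05} = 1.645).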

Problems 1, 5, Page 290
Assume that the helium porosity (in percentage) of coal samples taken from any particular seam is normally distributed with a known true standard deviation.
(a) What is the confidence level for the interval x̄ ± 2.81σ/√n?
(b) What value of z_{α/2} in the CI formula results in a confidence level of 99.7%?
(c) Compute a 98% CI for the true average porosity of a certain seam, given the average porosity for 16 specimens from the seam.
(d) How large a sample size is necessary if the width of the 95% interval is to be 0.4?

Problem 2, Page 290
Each of the following is a confidence interval for µ = true average (i.e., population mean) resonance frequency (Hz) for all tennis rackets of a certain type: (114.4, 115.6) and (114.1, 115.9).
(a) What is the value of the sample mean resonance frequency?
(b) Both intervals were calculated from the same sample data. The confidence level for one of these intervals is 90% and for the other is 99%. Which of the intervals has the 90% confidence level, and why?

7.2 Large-Sample Confidence Intervals for a Population Mean and Proportion

In Section 7.1 we made some assumptions which are practically unrealistic. We assumed that the variance of the population distribution, σ², is known. The assumption that the random sample is from a normal population may be valid sometimes but may fail in other situations. We will now discuss large-sample confidence intervals whose validity does not depend on these assumptions.

Large-Sample Interval for µ

Let X₁, X₂, …, Xₙ be a random sample from a population whose distribution is not known, with mean µ and variance σ² that are also unknown. Recall that if the sample size n is large, then by the central limit theorem we discussed in Chapter 5, the distribution of the sample mean X̄ is approximately normal with mean E(X̄) = µ and variance V(X̄) = σ²/n. That is, X̄ ≈ N(µ, σ²/n). It follows that the standardized random variable

Z = (X̄ − µ)/(σ/√n)

has approximately the standard normal distribution. Therefore,

P(−z_{α/2} ≤ (X̄ − µ)/(σ/√n) ≤ z_{α/2}) ≈ 1 − α.

Following the technique used in Section 7.1, we can rearrange the expression in the parentheses to obtain an approximate (1 − α)100% confidence interval for the population mean µ as x̄ ± z_{α/2} σ/√n. The difficulty with this expression for the confidence interval is that σ is not known. Therefore, since n is sufficiently large, we simply replace σ by the sample standard deviation

s = √[ (1/(n − 1)) ( Σ_{i=1}^n xᵢ² − n x̄² ) ],

to obtain the large-sample confidence interval for µ with an approximate confidence level of (1 − α)100%:

x̄ ± z_{α/2} s/√n.

Examples (see class notes for solution)
Problem 13, Page 297
The article "Extravisual Damage Detection? Defining the Standard Normal Tree" (Photogrammetric Engr. and Remote Sensing, 1981) discusses the use of color infrared photography in the identification of normal trees in Douglas fir stands. Among the data reported were summary statistics for green-filter analytic optical densitometric measurements on samples of both healthy and diseased trees. For a sample of 69 healthy trees, the sample mean dye-layer density was 1.028, with a reported sample standard deviation.
(a) Calculate a 95% (two-sided) CI for the true average dye-layer density for all such trees.
(b) Suppose the investigators had made a rough guess of 0.16 for the value of s before collecting data. What sample size would be necessary to obtain an interval width of 0.05 for a confidence level of 95%?
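A sketch of the large-sample interval x̄ ± z_{α/2} s/√n with hypothetical data (the sample values are assumptions for illustration). The sample-size step at the end uses the width relation w = 2z_{α/2}s/√n rearranged to n = (2z_{α/2}s/w)², with the rough guess s₀ = 0.16 and width w = 0.05 from Problem 13(b):

```python
import math
import statistics

# Hypothetical large sample (n > 30) for illustration only; in practice
# these would be the observed measurements.
data = [9.8, 10.1, 10.4, 9.9, 10.0, 10.2, 9.7, 10.3] * 5  # n = 40
n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation, divisor n - 1

z = 1.96  # z_{alpha/2} for an approximate 95% confidence level
half_width = z * s / math.sqrt(n)
print((xbar - half_width, xbar + half_width))  # large-sample 95% CI for mu

# Sample size needed for a 95% interval of width w, given a rough guess s0
# for s: solve w = 2 * z * s0 / sqrt(n) for n and round up.
s0, w = 0.16, 0.05
n_needed = math.ceil((2 * z * s0 / w) ** 2)
print(n_needed)  # 158
```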

Large-Sample Interval for a Population Proportion p

In Chapter 4 we discussed how the normal distribution can be used to approximate the binomial distribution. Now, suppose that X is a binomial random variable with parameters n and p, where n is the number of trials and p is the probability or proportion of successes. Recall that if n is sufficiently large, then X has approximately a normal distribution with mean E(X) = np and variance V(X) = np(1 − p). Now, if X = number of successes in n trials, a reasonable estimator of the proportion of successes is p̂ = X/n, the sample fraction of successes. Since p̂ is just X divided by the constant n, the distribution of p̂ is also approximately normal. It can be shown that E(p̂) = p and V(p̂) = p(1 − p)/n. Therefore, the standardized p̂, defined by

Z = (p̂ − p)/√(p(1 − p)/n),

is approximately standard normal. Thus, using the same approach as before, we find that

P(−z_{α/2} ≤ (p̂ − p)/√(p(1 − p)/n) ≤ z_{α/2}) ≈ 1 − α.

If we rearrange the inequalities in the bracket, we obtain a large-sample confidence interval for a population proportion p, with an approximate confidence level of (1 − α)100%, given by

[ p̂ + z²_{α/2}/(2n) ± z_{α/2} √( p̂(1 − p̂)/n + z²_{α/2}/(4n²) ) ] / [ 1 + z²_{α/2}/n ].

Suppose that the width of the confidence interval for p is w. Then the sample size required to obtain a confidence interval of width w is

n = [ 2 z²_{α/2} p̂(1 − p̂) − z²_{α/2} w² ± √( 4 z⁴_{α/2} p̂(1 − p̂)[ p̂(1 − p̂) − w² ] + w² z⁴_{α/2} ) ] / w².

Unfortunately, the formula for n involves the unknown p̂. It can be shown that the maximum value of p̂(1 − p̂) is attained when p̂ = 0.5. Therefore, using the value p̂ = 0.5 will ensure that the width is at most w regardless of what value of p̂ results from the sample. Alternatively, if the investigator believes strongly, based on prior experience and information, that the true value of p, say p₀, is less than 0.5, then the value p₀ can be used in place of p̂.

Examples (see class notes for solution).
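The interval above can be sketched as a small helper function; the counts used below are the ones from Problem 21, Page 298 (133 firearm-owning households out of 539):

```python
import math

def score_interval(x, n, z):
    """Large-sample (score) confidence interval for a proportion p,
    given x successes in n trials and critical value z = z_{alpha/2}."""
    phat = x / n
    denom = 1 + z**2 / n
    center = (phat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Counts from Problem 21: 133 of 539 households owned at least one firearm.
lo, hi = score_interval(133, 539, 1.96)  # approximate 95% CI for p
print((lo, hi))
```

One virtue of this form over the simpler p̂ ± z_{α/2}√(p̂(1 − p̂)/n) is that the endpoints always stay inside [0, 1], even when p̂ is close to 0 or 1.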

Problem 21, Page 298
A random sample of 539 households from a certain city was selected, and it was determined that 133 of these households owned at least one firearm. Compute a 95% confidence interval for the proportion of all households in this city that own at least one firearm.

Problem 23, Page 298
The article "An Evaluation of Football Helmets Under Impact Conditions" (American Journal of Sports Medicine, 1984) reports that when each football helmet in a random sample of 37 suspension-type helmets was subjected to a certain impact test, 24 showed damage. Let p denote the proportion of all helmets of this type that would show damage when tested in the prescribed manner.
(a) Calculate a 99% CI for p.
(b) What sample size would be required for the width of a 99% CI to be at most 0.10, irrespective of p̂?

7.3 Intervals Based on a Normal Population Distribution

The interval estimates we discussed in Section 7.1 were not practical because of the assumptions underlying them. In Section 7.2 we discussed intervals for large samples, that is, when n > 30. Here, we discuss interval estimates for small samples and the assumptions under which the interval estimation method can be used. In Section 7.2 the intervals were obtained by invoking the central limit theorem, because the sample size was assumed to be large. The key was that the standardized variable

Z = (X̄ − µ)/(S/√n)

has approximately the standard normal distribution when n is large, even when the population distribution of the random sample X₁, X₂, …, Xₙ is not normal. Now, when the sample size n is small, we cannot use the central limit theorem to obtain a random variable that has an approximate standard normal distribution. However, we can show that if n is small and the random sample X₁, X₂, …, Xₙ of size n is from a normal population, then the standardized variable

T = (X̄ − µ)/(S/√n)

has a probability distribution called the Student t-distribution with n − 1 degrees of freedom, where S is the sample standard deviation.
Notice that the standardized variable has not changed; we

have simply chosen to denote it by T rather than Z in order to emphasize the point that it is no longer normally distributed. Due to the complicated mathematical expression for the probability density function of T, we shall not provide the expression in this class. In addition, to assist us in computing probabilities without actually integrating the density function, a table of values of cumulative probabilities for the t-distribution is available in most elementary statistics texts and will be made available to us in class.

Properties of t Distributions

The normal distribution is characterized by two parameters, the mean µ and the variance σ². By that we mean that once these two parameters are known, the normal distribution is completely determined. On the other hand, the t-distribution is characterized by only one parameter, called the degrees of freedom, which we shall denote by the Greek letter ν. Possible values of ν are 1, 2, 3, …. For a fixed value of ν, some of the properties or features of the density curve of the t-distribution are as follows. Let t_ν denote the density curve of the t-distribution with ν degrees of freedom. Then,

1. each t_ν curve is bell-shaped and centered at 0. The standard normal z curve also has this property.
2. each t_ν curve is more spread out than the standard normal z curve. In other words, the variance of data from a population with a t-distribution is likely to be larger than the variance of data from a population with a standard normal distribution.
3. as ν increases, the spread of the corresponding t_ν curve decreases.
4. as ν becomes larger and larger, or tends to infinity, the sequence of t_ν curves approaches the standard normal z curve.

Notation
In Chapter 4, we introduced the z_α notation, which we called the z critical value. The equivalent notation for the t-distribution is t_{α,ν} = the point on the t-axis of the t_ν curve for which the area to the right under the curve is α. We shall call t_{α,ν} a t critical value.
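Properties 2 through 4 can be seen numerically. The critical values below are standard t-table entries for α = 0.025, and the variance formula ν/(ν − 2) for ν > 2 is a known property of the t-distribution that we state here without proof:

```python
# t critical values t_{0.025, nu} taken from a standard t table, compared
# with z_{0.025} = 1.96: each exceeds 1.96 (the t curve is more spread out),
# and they shrink toward 1.96 as nu grows.
t_crit = {5: 2.571, 10: 2.228, 30: 2.042}
z_crit = 1.96
for nu, t in sorted(t_crit.items()):
    print(nu, t, t > z_crit)

# Known fact, stated without proof: for nu > 2 the variance of the t_nu
# distribution is nu / (nu - 2), which decreases toward 1 (the standard
# normal variance) as nu increases -- properties 2 and 3 above.
variances = [nu / (nu - 2) for nu in (3, 5, 10, 30)]
print(variances)  # [3.0, 1.666..., 1.25, 1.071...]
```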
The One-Sample t Confidence Interval for a Population Mean

Consider a random sample X₁, X₂, …, Xₙ of size n from a normal population, with n < 30. Let X̄ be the sample mean. Then the standardized random variable

T = (X̄ − µ)/(S/√n)

has a t-distribution with n − 1 degrees of freedom.
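Replacing z_{α/2} by the critical value t_{α/2,n−1} yields the standard one-sample t interval x̄ ± t_{α/2,n−1} S/√n. A minimal sketch with hypothetical data (the measurements are made up for illustration; the critical value t_{0.025,9} = 2.262 comes from a standard t table):

```python
import math
import statistics

# Small-sample setting: n < 30, normal population assumed.
data = [12.1, 11.9, 12.3, 12.0, 11.8, 12.2, 12.4, 11.7, 12.0, 12.1]  # n = 10
n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)   # sample standard deviation S
t_crit = 2.262               # t_{alpha/2, n-1}, alpha = 0.05, 9 df

half_width = t_crit * s / math.sqrt(n)
print((xbar - half_width, xbar + half_width))  # 95% t CI for mu
```

Note that because t_{0.025,9} = 2.262 > z_{0.025} = 1.96, the t interval is wider than the corresponding z interval would be; this is the price of estimating σ from a small sample.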


More information

Closed book and notes. 120 minutes. Cover page, five pages of exam. No calculators.

Closed book and notes. 120 minutes. Cover page, five pages of exam. No calculators. IE 230 Seat # Closed book and notes. 120 minutes. Cover page, five pages of exam. No calculators. Score Final Exam, Spring 2005 (May 2) Schmeiser Closed book and notes. 120 minutes. Consider an experiment

More information

Joint probability distributions: Discrete Variables. Two Discrete Random Variables. Example 1. Example 1

Joint probability distributions: Discrete Variables. Two Discrete Random Variables. Example 1. Example 1 Joint probability distributions: Discrete Variables Two Discrete Random Variables Probability mass function (pmf) of a single discrete random variable X specifies how much probability mass is placed on

More information

ENGG2430A-Homework 2

ENGG2430A-Homework 2 ENGG3A-Homework Due on Feb 9th,. Independence vs correlation a For each of the following cases, compute the marginal pmfs from the joint pmfs. Explain whether the random variables X and Y are independent,

More information

This does not cover everything on the final. Look at the posted practice problems for other topics.

This does not cover everything on the final. Look at the posted practice problems for other topics. Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry

More information

3 Modeling Process Quality

3 Modeling Process Quality 3 Modeling Process Quality 3.1 Introduction Section 3.1 contains basic numerical and graphical methods. familiar with these methods. It is assumed the student is Goal: Review several discrete and continuous

More information

Closed book and notes. 60 minutes. Cover page and four pages of exam. No calculators.

Closed book and notes. 60 minutes. Cover page and four pages of exam. No calculators. IE 230 Seat # Closed book and notes. 60 minutes. Cover page and four pages of exam. No calculators. Score Exam #3a, Spring 2002 Schmeiser Closed book and notes. 60 minutes. 1. True or false. (for each,

More information

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu Home Work: 1 1. Describe the sample space when a coin is tossed (a) once, (b) three times, (c) n times, (d) an infinite number of times. 2. A coin is tossed until for the first time the same result appear

More information

Class 26: review for final exam 18.05, Spring 2014

Class 26: review for final exam 18.05, Spring 2014 Probability Class 26: review for final eam 8.05, Spring 204 Counting Sets Inclusion-eclusion principle Rule of product (multiplication rule) Permutation and combinations Basics Outcome, sample space, event

More information

Preliminary Statistics Lecture 3: Probability Models and Distributions (Outline) prelimsoas.webs.com

Preliminary Statistics Lecture 3: Probability Models and Distributions (Outline) prelimsoas.webs.com 1 School of Oriental and African Studies September 2015 Department of Economics Preliminary Statistics Lecture 3: Probability Models and Distributions (Outline) prelimsoas.webs.com Gujarati D. Basic Econometrics,

More information

1 Probability and Random Variables

1 Probability and Random Variables 1 Probability and Random Variables The models that you have seen thus far are deterministic models. For any time t, there is a unique solution X(t). On the other hand, stochastic models will result in

More information

Special Discrete RV s. Then X = the number of successes is a binomial RV. X ~ Bin(n,p).

Special Discrete RV s. Then X = the number of successes is a binomial RV. X ~ Bin(n,p). Sect 3.4: Binomial RV Special Discrete RV s 1. Assumptions and definition i. Experiment consists of n repeated trials ii. iii. iv. There are only two possible outcomes on each trial: success (S) or failure

More information

Introduction to Probability

Introduction to Probability LECTURE NOTES Course 6.041-6.431 M.I.T. FALL 2000 Introduction to Probability Dimitri P. Bertsekas and John N. Tsitsiklis Professors of Electrical Engineering and Computer Science Massachusetts Institute

More information

Course information: Instructor: Tim Hanson, Leconte 219C, phone Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment.

Course information: Instructor: Tim Hanson, Leconte 219C, phone Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment. Course information: Instructor: Tim Hanson, Leconte 219C, phone 777-3859. Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment. Text: Applied Linear Statistical Models (5th Edition),

More information

Point Estimation and Confidence Interval

Point Estimation and Confidence Interval Chapter 8 Point Estimation and Confidence Interval 8.1 Point estimator The purpose of point estimation is to use a function of the sample data to estimate the unknown parameter. Definition 8.1 A parameter

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions

More information

STAT515, Review Worksheet for Midterm 2 Spring 2019

STAT515, Review Worksheet for Midterm 2 Spring 2019 STAT55, Review Worksheet for Midterm 2 Spring 29. During a week, the proportion of time X that a machine is down for maintenance or repair has the following probability density function: 2( x, x, f(x The

More information

Chapter 2. Continuous random variables

Chapter 2. Continuous random variables Chapter 2 Continuous random variables Outline Review of probability: events and probability Random variable Probability and Cumulative distribution function Review of discrete random variable Introduction

More information

Review of Probability Theory

Review of Probability Theory Review of Probability Theory Arian Maleki and Tom Do Stanford University Probability theory is the study of uncertainty Through this class, we will be relying on concepts from probability theory for deriving

More information

Learning Objectives for Stat 225

Learning Objectives for Stat 225 Learning Objectives for Stat 225 08/20/12 Introduction to Probability: Get some general ideas about probability, and learn how to use sample space to compute the probability of a specific event. Set Theory:

More information

1: PROBABILITY REVIEW

1: PROBABILITY REVIEW 1: PROBABILITY REVIEW Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 1: Probability Review 1 / 56 Outline We will review the following

More information

Week 9 The Central Limit Theorem and Estimation Concepts

Week 9 The Central Limit Theorem and Estimation Concepts Week 9 and Estimation Concepts Week 9 and Estimation Concepts Week 9 Objectives 1 The Law of Large Numbers and the concept of consistency of averages are introduced. The condition of existence of the population

More information

Continuous Random Variables and Probability Distributions

Continuous Random Variables and Probability Distributions Continuous Random Variables and Probability Distributions Instructor: Lingsong Zhang 1 4.1 Probability Density Functions Probability Density Functions Recall from Chapter 3 that a random variable X is

More information

Statistics for Economists Lectures 6 & 7. Asrat Temesgen Stockholm University

Statistics for Economists Lectures 6 & 7. Asrat Temesgen Stockholm University Statistics for Economists Lectures 6 & 7 Asrat Temesgen Stockholm University 1 Chapter 4- Bivariate Distributions 41 Distributions of two random variables Definition 41-1: Let X and Y be two random variables

More information

Stat 315: HW #6. Fall Due: Wednesday, October 10, 2018

Stat 315: HW #6. Fall Due: Wednesday, October 10, 2018 Stat 315: HW #6 Fall 018 Due: Wednesday, October 10, 018 Updated: Monday, October 8 for misprints. 1. An airport shuttle route includes two intersections with traffic lights. Let i be the number of lights

More information

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R In probabilistic models, a random variable is a variable whose possible values are numerical outcomes of a random phenomenon. As a function or a map, it maps from an element (or an outcome) of a sample

More information

Statistical Methods in Particle Physics

Statistical Methods in Particle Physics Statistical Methods in Particle Physics. Probability Distributions Prof. Dr. Klaus Reygers (lectures) Dr. Sebastian Neubert (tutorials) Heidelberg University WS 07/8 Gaussian g(x; µ, )= p exp (x µ) https://en.wikipedia.org/wiki/normal_distribution

More information

Estadística I Exercises Chapter 4 Academic year 2015/16

Estadística I Exercises Chapter 4 Academic year 2015/16 Estadística I Exercises Chapter 4 Academic year 2015/16 1. An urn contains 15 balls numbered from 2 to 16. One ball is drawn at random and its number is reported. (a) Define the following events by listing

More information

Solutions - Final Exam

Solutions - Final Exam Solutions - Final Exam Instructors: Dr. A. Grine and Dr. A. Ben Ghorbal Sections: 170, 171, 172, 173 Total Marks Exercise 1 7 Exercise 2 6 Exercise 3 6 Exercise 4 6 Exercise 5 6 Exercise 6 9 Total 40 Score

More information

Random Variables and Expectations

Random Variables and Expectations Inside ECOOMICS Random Variables Introduction to Econometrics Random Variables and Expectations A random variable has an outcome that is determined by an experiment and takes on a numerical value. A procedure

More information

Exponential, Gamma and Normal Distribuions

Exponential, Gamma and Normal Distribuions Exponential, Gamma and Normal Distribuions Sections 5.4, 5.5 & 6.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 9-3339 Cathy Poliak,

More information

Probability. Lecture Notes. Adolfo J. Rumbos

Probability. Lecture Notes. Adolfo J. Rumbos Probability Lecture Notes Adolfo J. Rumbos October 20, 204 2 Contents Introduction 5. An example from statistical inference................ 5 2 Probability Spaces 9 2. Sample Spaces and σ fields.....................

More information

Confidence intervals CE 311S

Confidence intervals CE 311S CE 311S PREVIEW OF STATISTICS The first part of the class was about probability. P(H) = 0.5 P(T) = 0.5 HTTHHTTTTHHTHTHH If we know how a random process works, what will we see in the field? Preview of

More information

Twelfth Problem Assignment

Twelfth Problem Assignment EECS 401 Not Graded PROBLEM 1 Let X 1, X 2,... be a sequence of independent random variables that are uniformly distributed between 0 and 1. Consider a sequence defined by (a) Y n = max(x 1, X 2,..., X

More information

Determining the Spread of a Distribution

Determining the Spread of a Distribution Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative

More information

CS145: Probability & Computing

CS145: Probability & Computing CS45: Probability & Computing Lecture 5: Concentration Inequalities, Law of Large Numbers, Central Limit Theorem Instructor: Eli Upfal Brown University Computer Science Figure credits: Bertsekas & Tsitsiklis,

More information

Definition: A random variable X is a real valued function that maps a sample space S into the space of real numbers R. X : S R

Definition: A random variable X is a real valued function that maps a sample space S into the space of real numbers R. X : S R Random Variables Definition: A random variable X is a real valued function that maps a sample space S into the space of real numbers R. X : S R As such, a random variable summarizes the outcome of an experiment

More information

37.3. The Poisson Distribution. Introduction. Prerequisites. Learning Outcomes

37.3. The Poisson Distribution. Introduction. Prerequisites. Learning Outcomes The Poisson Distribution 37.3 Introduction In this Section we introduce a probability model which can be used when the outcome of an experiment is a random variable taking on positive integer values and

More information

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems

More information

Continuous Random Variables. and Probability Distributions. Continuous Random Variables and Probability Distributions ( ) ( ) Chapter 4 4.

Continuous Random Variables. and Probability Distributions. Continuous Random Variables and Probability Distributions ( ) ( ) Chapter 4 4. UCLA STAT 11 A Applied Probability & Statistics for Engineers Instructor: Ivo Dinov, Asst. Prof. In Statistics and Neurology Teaching Assistant: Christopher Barr University of California, Los Angeles,

More information

Determining the Spread of a Distribution

Determining the Spread of a Distribution Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative

More information

Continuous distributions

Continuous distributions CHAPTER 7 Continuous distributions 7.. Introduction A r.v. X is said to have a continuous distribution if there exists a nonnegative function f such that P(a X b) = ˆ b a f(x)dx for every a and b. distribution.)

More information

BASICS OF PROBABILITY

BASICS OF PROBABILITY October 10, 2018 BASICS OF PROBABILITY Randomness, sample space and probability Probability is concerned with random experiments. That is, an experiment, the outcome of which cannot be predicted with certainty,

More information

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Definitions Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Semester , Example Exam 1

Semester , Example Exam 1 Semester 1 2017, Example Exam 1 1 of 10 Instructions The exam consists of 4 questions, 1-4. Each question has four items, a-d. Within each question: Item (a) carries a weight of 8 marks. Item (b) carries

More information

HT Introduction. P(X i = x i ) = e λ λ x i

HT Introduction. P(X i = x i ) = e λ λ x i MODS STATISTICS Introduction. HT 2012 Simon Myers, Department of Statistics (and The Wellcome Trust Centre for Human Genetics) myers@stats.ox.ac.uk We will be concerned with the mathematical framework

More information

CME 106: Review Probability theory

CME 106: Review Probability theory : Probability theory Sven Schmit April 3, 2015 1 Overview In the first half of the course, we covered topics from probability theory. The difference between statistics and probability theory is the following:

More information

15 Discrete Distributions

15 Discrete Distributions Lecture Note 6 Special Distributions (Discrete and Continuous) MIT 4.30 Spring 006 Herman Bennett 5 Discrete Distributions We have already seen the binomial distribution and the uniform distribution. 5.

More information

Arkansas Tech University MATH 3513: Applied Statistics I Dr. Marcel B. Finan

Arkansas Tech University MATH 3513: Applied Statistics I Dr. Marcel B. Finan 2.4 Random Variables Arkansas Tech University MATH 3513: Applied Statistics I Dr. Marcel B. Finan By definition, a random variable X is a function with domain the sample space and range a subset of the

More information

Introduction to Statistical Data Analysis Lecture 3: Probability Distributions

Introduction to Statistical Data Analysis Lecture 3: Probability Distributions Introduction to Statistical Data Analysis Lecture 3: Probability Distributions James V. Lambers Department of Mathematics The University of Southern Mississippi James V. Lambers Statistical Data Analysis

More information

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved. 1-1 Chapter 1 Sampling and Descriptive Statistics 1-2 Why Statistics? Deal with uncertainty in repeated scientific measurements Draw conclusions from data Design valid experiments and draw reliable conclusions

More information