Continuous random variables
A continuous random variable X takes all values in an interval of numbers. The probability distribution of X is described by a density curve. The total area under a density curve is 1. The probability of any event is the area under the density curve and above the values of X that make up the event. For any continuous random variable X, P(X = a) = 0 for every value a.
week 6 1
Density curves
A density curve is a curve that is always on or above the horizontal axis and has area exactly 1 underneath it. A density curve describes the overall pattern of a distribution. The area under the curve and above any range of values is the relative frequency (proportion) of all observations that fall in that range of values.
Example: The curve below shows the density curve for scores in an exam. The area of the shaded region is the proportion of students who scored between 6 and 8.
Example: Uniform distribution
The density function of a continuous Uniform random variable X is given in the graph below. Find
i) P(X < 7)
ii) P(6 < X < 8)
iii) P(X = 7)
iv) P(5.5 < X < 7 or 8 < X < 9)
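The graph for this example is not reproduced here, so the interval on which X is uniform is unknown. As a sketch only, the code below assumes a hypothetical Uniform distribution on the interval from 5 to 9 (the names `uniform_prob`, `LO` and `HI` are illustrative). It shows the general idea: for a uniform density, a probability is the length of the overlap between the event and the interval, divided by the length of the interval.

```python
def uniform_prob(lo, hi, a, b):
    """P(a < X < b) for X uniform on (lo, hi): overlap length / total length."""
    left = max(a, lo)
    right = min(b, hi)
    return max(0.0, right - left) / (hi - lo)

# Hypothetical interval; the real one must be read from the graph.
LO, HI = 5.0, 9.0

p_i = uniform_prob(LO, HI, float("-inf"), 7)   # P(X < 7)
p_ii = uniform_prob(LO, HI, 6, 8)              # P(6 < X < 8)
p_iii = uniform_prob(LO, HI, 7, 7)             # P(X = 7): a single point has no area
# The two ranges do not overlap, so their probabilities add.
p_iv = uniform_prob(LO, HI, 5.5, 7) + uniform_prob(LO, HI, 8, 9)

print(p_i, p_ii, p_iii, p_iv)
```

With a different interval read from the graph, only `LO` and `HI` need to change.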
Median and mean of a density curve
The median of a distribution described by a density curve is the point that divides the area under the curve in half. A mode of a distribution described by a density curve is a peak point of the curve, the location where the curve is highest. Quartiles of a distribution can be roughly located by dividing the area under the curve into quarters as accurately as possible by eye.
Normal distributions
An important class of density curves is the symmetric, unimodal, bell-shaped curves known as normal curves. They describe normal distributions. All normal distributions have the same overall shape. The exact density curve for a particular normal distribution is specified by giving its mean μ and its standard deviation σ. The mean is located at the center of the symmetric curve and is the same as the median and the mode. Changing μ without changing σ moves the normal curve along the horizontal axis without changing its spread.
The standard deviation σ controls the spread of a normal curve.
There are other symmetric bell-shaped density curves that are not normal, e.g. the t distributions. The normal density curves are specified by a particular function. The height of a normal density curve at any point x is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}$$
Notation: A normal distribution with mean μ and standard deviation σ is denoted by N(μ, σ).
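As a small illustration, the height of a normal density curve can be evaluated directly from the formula and compared with `statistics.NormalDist.pdf` from the Python standard library. The function name `normal_density` and the chosen values of μ and σ are illustrative assumptions, not part of the slides.

```python
import math
from statistics import NormalDist

def normal_density(x, mu, sigma):
    """Height of the N(mu, sigma) density curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Peak height of the standard normal curve, 1 / sqrt(2*pi), about 0.3989.
peak_standard = normal_density(0.0, 0.0, 1.0)

# Halving sigma doubles the peak height: a smaller spread gives a taller curve.
peak_narrow = normal_density(0.0, 0.0, 0.5)

# The hand-written formula agrees with the library implementation.
library_value = NormalDist(mu=0.0, sigma=1.0).pdf(1.0)
formula_value = normal_density(1.0, 0.0, 1.0)

print(peak_standard, peak_narrow, library_value, formula_value)
```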
The 68-95-99.7 rule
In the normal distribution with mean μ and standard deviation σ:
Approx. 68% of the observations fall within σ of the mean μ.
Approx. 95% of the observations fall within 2σ of the mean μ.
Approx. 99.7% of the observations fall within 3σ of the mean μ.
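The three percentages can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `within` is illustrative. Because standardizing removes μ and σ, it is enough to check the standard normal curve.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {within(k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```

The exact values are slightly different from 68%, 95% and 99.7%, which is why the rule is stated as approximate.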
Example
The distribution of heights of women aged 18-24 is approximately N(64.5, 2.5), that is, normal with mean μ = 64.5 inches and standard deviation σ = 2.5 inches. The 68-95-99.7 rule says that the middle 95% (approx.) of women are between 64.5 − 5 and 64.5 + 5 inches tall. The other 5% have heights outside the range from 59.5 to 69.5 inches and, by symmetry, 2.5% of the women are taller than 69.5 inches.
Exercise:
1) The middle 68% (approx.) of women are between ____ and ____ inches tall.
2) ____% of the women are taller than 66.75.
3) ____% of the women are taller than 72.
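The worked part of this example can be reproduced in a few lines. This is a sketch using `statistics.NormalDist` from the Python standard library, with variable names chosen for illustration. It compares the rule-of-thumb figures (95% and 2.5%) with the exact areas for N(64.5, 2.5).

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)  # heights in inches, from the example

# Two standard deviations either side of the mean.
low = heights.mean - 2 * heights.stdev
high = heights.mean + 2 * heights.stdev

middle = heights.cdf(high) - heights.cdf(low)   # rule says about 0.95
upper_tail = 1 - heights.cdf(high)              # rule says about 0.025

print(low, high)
print(f"{middle:.4f} {upper_tail:.4f}")
```

The same pattern, with one or three standard deviations, applies to the exercise parts.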
The standard normal distribution
The standard normal distribution is the normal distribution N(0, 1), that is, the mean is μ = 0 and the standard deviation is σ = 1. If a random variable X has normal distribution N(μ, σ), then the standardized variable
$$Z = \frac{X - \mu}{\sigma}$$
has the standard normal distribution. There is no closed-form formula for areas under a normal curve. Calculations use either software or a table of areas. Most software and tables calculate one kind of area: cumulative proportions. A cumulative proportion is the proportion of observations in a distribution that fall at or below a given value; it is also the area under the curve to the left of that value.
The standard normal tables
Table IV gives proportions for the standard normal distribution. The table entry for each value z is the area under the curve between and z. The notation we use to find cumulative probabilities is P(Z ≤ z). Example:
The standard normal tables - Example
What proportion of the observations of a N(0, 1) distribution takes values
a) less than z = 1.4?
b) greater than z = 1.4?
c) greater than z = -1.96?
d) between z = 0.43 and z = 2.15?
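In place of a printed table, cumulative proportions can be obtained from software. The sketch below uses `statistics.NormalDist` from the Python standard library and takes the z values as they are printed in the question (reading .43 as 0.43); the helper name `phi` is illustrative. Each part is expressed in terms of cumulative proportions only, which is how a table lookup works.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def phi(z):
    """Cumulative proportion P(Z <= z): area to the left of z."""
    return Z.cdf(z)

a = phi(1.4)               # less than 1.4
b = 1 - phi(1.4)           # greater than 1.4: total area 1 minus the area to the left
c = 1 - phi(-1.96)         # greater than -1.96
d = phi(2.15) - phi(0.43)  # between 0.43 and 2.15: difference of two cumulative areas

print(f"{a:.4f} {b:.4f} {c:.4f} {d:.4f}")
# 0.9192 0.0808 0.9750 0.3178
```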
Properties of the normal distribution
If a random variable Z has a N(0, 1) distribution, then P(Z = z) = 0. The area under the curve at any single point is 0. The area between any two points a and b (a < b) under the standard normal curve is given by
$$P(a \le Z \le b) = P(Z \le b) - P(Z \le a)$$
As mentioned earlier, if a random variable X has a N(μ, σ) distribution, then the standardized variable
$$Z = \frac{X - \mu}{\sigma}$$
has a standard normal distribution, and any calculations about X can be done using the following rules:
P(X = k) = 0 for all k.
$$P(X \le a) = P\left(Z \le \frac{a - \mu}{\sigma}\right)$$
$$P(X \ge b) = 1 - P\left(Z \le \frac{b - \mu}{\sigma}\right)$$
$$P(a \le X \le b) = P\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right)$$
The solution to the equation P(X ≤ k) = p is
$$k = \mu + \sigma z_p$$
where z_p is the value z from the standard normal table that has area (and cumulative proportion) p below it, i.e. z_p is the pth percentile of the standard normal distribution.
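The standardization rules translate directly into code. The sketch below is one possible arrangement, using `statistics.NormalDist` from the Python standard library for the standard normal areas; the function names are illustrative, and the N(64.5, 2.5) heights distribution from the earlier example is reused as test data.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def prob_below(a, mu, sigma):
    """P(X <= a) = P(Z <= (a - mu) / sigma)."""
    return Z.cdf((a - mu) / sigma)

def prob_above(b, mu, sigma):
    """P(X >= b) = 1 - P(Z <= (b - mu) / sigma)."""
    return 1 - Z.cdf((b - mu) / sigma)

def prob_between(a, b, mu, sigma):
    """P(a <= X <= b) as a difference of two cumulative proportions."""
    return Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)

def percentile(p, mu, sigma):
    """Solve P(X <= k) = p for k: k = mu + sigma * z_p."""
    return mu + sigma * Z.inv_cdf(p)

mu, sigma = 64.5, 2.5
print(prob_below(62, mu, sigma))        # area below 62 inches
print(prob_above(69.5, mu, sigma))      # area above 69.5 inches
print(prob_between(62, 67, mu, sigma))  # area within one sd of the mean
print(percentile(0.5, mu, sigma))       # the median equals the mean
```

Going from a value to a proportion uses the cumulative area; going from a proportion back to a value uses the inverse lookup z_p, so `percentile(prob_below(k, mu, sigma), mu, sigma)` returns k up to rounding.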
Questions
1. The marks of STA221 students have a N(65, 15) distribution. Find the proportion of students having marks
(a) less than 5.
(b) greater than 8.
(c) between 5 and 8.
2. Scores on the SAT verbal test follow approximately the N(55, 11) distribution. How high must a student score in order to place in the top 1% of all students taking the SAT?
3. The time it takes to complete a stat22 term test is normally distributed with mean 1 minutes and standard deviation 14 minutes. How much time should be allowed if we wish to ensure that at least 9 out of 1 students (on average) can complete it? (final exam Dec. 21)
4. General Motors of Canada has a deal: an oil filter and lube job in 25 minutes or the next one free. Suppose that you worked for GM and knew that the time needed to provide these services was approximately normal with mean 15 minutes and std. dev. 2.5 minutes. How many minutes would you have recommended to put in the ad above if it was decided that about 5 free services for 1 customers was reasonable?
5. In a survey of patients of a rehabilitation hospital the mean length of stay in the hospital was 12 weeks with a std. dev. of 1 week. The distribution was approximately normal.
a) Out of 1 patients how many would you expect to stay longer than 13 weeks?
b) What is the percentile rank of a stay of 11.3 weeks?
c) What percentage of patients would you expect to be in longer than 12 weeks?
d) What is the length of stay at the 9th percentile?
e) What is the median length of stay?
Normal quantile plots and their use
If the stem-and-leaf plot or histogram appears roughly symmetric and unimodal, we use another graph, called a normal quantile plot, as a better way of judging the adequacy of a normal model. Any normal distribution produces a straight line on that plot.
Interpretation of normal quantile plots:
If the points on a normal quantile plot lie close to a straight line, the plot indicates that the data are normal.
Systematic deviations from a straight line indicate a nonnormal distribution.
Outliers appear as points that are far away from the overall pattern of the plot.
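To make the construction concrete, the sketch below computes normal scores and pairs them with the sorted data, which gives the coordinates of a normal quantile plot. It assumes one common convention for the plotting positions, (i − 3/8)/(n + 1/4); statistical packages may use a slightly different rule. The two data sets and the function names are hypothetical, and the correlation between sorted data and normal scores is used here only as a rough numerical stand-in for "how straight the plot looks".

```python
from statistics import NormalDist, mean

def normal_scores(n):
    """Normal scores for n ordered observations (assumed plotting positions)."""
    z = NormalDist()
    return [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def straightness(data):
    """Correlation between sorted data and normal scores; near 1 means nearly straight."""
    ys = sorted(data)
    xs = normal_scores(len(ys))
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: one roughly bell-shaped sample, one right-skewed sample.
roughly_normal = [47.1, 52.3, 49.8, 50.4, 48.6, 51.2, 50.0, 53.0, 49.1, 50.9]
right_skewed = [0.1, 0.2, 0.3, 0.5, 0.8, 1.3, 2.1, 3.4, 5.5, 8.9]

# Points of the normal quantile plot: (normal score, ordered observation).
points = list(zip(normal_scores(len(roughly_normal)), sorted(roughly_normal)))

print(straightness(roughly_normal))
print(straightness(right_skewed))
```

The skewed sample gives a noticeably lower value, matching the curved pattern such data produce on the plot. A plot should still be inspected by eye, since a single number cannot show outliers or the direction of skewness.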
Histogram, the nscores plot and the normal quantile plot for data generated from a normal distribution (N(50, 2)).
[Figure: histogram (Frequency against value); nscores plot (value against nscores); normal probability plot for value (Percent against Data).]
Histogram, the nscores plots and the normal quantile plot for data generated from a right skewed distribution.
[Figure: histogram (Frequency against value); nscores plots (value against nscores, and nscores against value); normal probability plot for value (Percent against Data).]
Histogram, the nscores plots and the normal quantile plot for data generated from a left skewed distribution.
[Figure: histogram (Frequency against value); nscores plots (value against nscores, and nscores against value); normal probability plot for value (Percent against Data).]
Histogram, the nscores plots and the normal quantile plot for data generated from a uniform distribution on (0, 5).
[Figure: histogram (Frequency against value); nscores plots (value against nscores, and nscores against value); normal probability plot for value (Percent against Data).]
Question (similar to Q5 Term test Oct, 2)
Below are 4 normal probability (quantile) plots and 4 histograms produced by MINITAB for some data sets. The histograms are not in the same order as the normal scores plots. Match the histograms with the nscores plots.
[Figure: four nscores plots (data against nscores) and four histograms (Frequency against data).]