EM375 STATISTICS AND MEASUREMENT UNCERTAINTY INTRODUCTION TO PROBABILITY Probability is a numerical value expressing the likelihood of an event happening. Example: Draw cards from a full deck. We define event A as drawing a picture card. There are 52 cards and 12 picture cards (we choose to exclude aces as picture cards). PP(AA) = 12 52 = 0.2307 Probabilities can be fractional, rational or percentage form. PP(AA) = 12 = 0.2307 = 23.07% 52 The complement or inverse of event A is given the symbol AA and is read as not A. For our example, AA means drawing anything other than a picture card. Some rules of probability: Probability is bounded in the range 0 to 0%. 0 PP 1.0 PP(AA) = 1 = 0% means event A must happen. PP(AA) = 0 = 0% means event A cannot happen. PP(AA) + PP(AA ) = 1 since either the event or its inverse must occur. If two events A and B are mutually exclusive, they cannot both occur at the same time. For example, when tossing a die if we define event A means tossing a 4 and event B means tossing a 6, we cannot, with a single throw, toss both a 4 and a 6. Therefore events A and B are mutually exclusive one happening excludes the possibility of the other happening. In this case: PP(AA or BB) = PP(AA) + PP(BB) PP(4 or 6) = PP(4) + PP(6) = 1 6 + 1 6 = 1 3 Two events A and B are independent of each other if the outcome of one does not depend on the outcome of the other. For example, the probability the display on your laptop monitor failing during your time at USNA is 4% and the probability that the disk drive will fail is 2%. The display and disk drive failures are independent. In this case the probability of both failing is given by: EM375: Probability - 1
PP(AA and BB) = PP(AAAA) = PP(AA)PP(BB) PP(both fail) = 0.04 0.02 = 0.0008 = 0.08% The probability of A, or B, or both, is called the probability of A union B and is: PP(AA BB) = PP(AA) + PP(BB) PP(AAAA) = PP(AA) + PP(BB) PP(AA)PP(BB) For the above example, what is the probability of your display and/or disk drive failing? PP(AA BB) = PP(AA) + PP(BB) PP(AAAA) = 0.04 + 0.02 0.04 0.02 = 0.0592 = 5.92% Example: You toss a single die and record the value. Then you toss it again. We define event A means tossing a 4 at the first throw and event B means tossing a 6 at the second throw. 1. Are events A and B mutually exclusive? Throwing a 4 at the first toss does not stop you throwing a 6 at the second toss. Therefore events A and B are not mutually exclusive. 2. Are events A and B independent? The result of the first toss has no effect on the second toss. Therefore events A and B are independent. 3. What is the probability of throwing a 4 on the first toss and throwing a 6 at the second toss? PP(AA and BB) = PP(AAAA) = PP(AA)PP(BB) = 1 6. 1 6 = 1 = 0.0277 = 2.77% 36 EM375: Probability - 2
We can demonstrate the independent probability equation for the laptop problem (display and/or drive failing) with a Venn diagram. On this course we do not go into the full use and production of Venn diagrams. For this example, we can assume that a Venn diagram shows probabilities as shapes. The probability of an event happening is the area of the shape. Here are the Venn diagrams in support of the above PP(AA BB)equation: This diagram represents the universe and has area 0%. It also shows event P(A), the probability of a laptop display failure. This diagram shows event P(B), the probability of a disk failure. The shaded gray area is the area we need to calculate to get PP(AA BB) EM375: Probability - 3
We need to take account of the probability of both A and B happening, P(AB). Thus PP(AA BB) = PP(AA) + PP(BB) PP(AAAA) = PP(AA) + PP(BB) PP(AA)PP(BB) PROBABILITY DISTRIBUTION FUNCTIONS Distributions of random variables tend to follow certain mathematical formulae. If we can determine the formula that matches a given situation, we can then use it to make predictions. For example, if we know the mean and standard deviation for the life expectancy of automobile tires we can predict how many tires we will need to support a fleet of vehicles. For example, if we have a fleet of 20 Humvees and know the statistics of tire wear (maybe the average life of a tire and its standard deviation), we can estimate how many tires we need to ship to the middle of the desert to keep 80% of the vehicles operational. The functions we use are called probability mass functions (for discrete random variables) or probability density functions (for continuous random variables). On this course we are mainly concerned with continuous functions. The typical process might be: Run an experiment and analyze the data to determine the statistics of a sample data set. These statistics are our best estimates of the parameters of the mathematical probability distribution functions. We then use the mathematical model (theoretical probability distribution function) to predict future events. EM375: Probability - 4
PROBABILITY DENSITY FUNCTIONS (pdf) We will define a probability density function as f(x). The probability of a random variable x being in the range a to b is given as: PP(aa xx bb) = ff(xx)dddd Recalling that integration of a function gives the area under that function, we see that the probability of a random variable x being in the range a to b is given by the area under the pdf between x = a and x = b. Note that the probability of a continuous random variable being exactly a number is zero, since: PP(aa = xx) = PP(aa xx aa) = ff(xx)dddd = 0 Some properties of probability density functions: ff(xx) 0 for all x. Otherwise some ranges of x would have a negative probability NOT ALLOWED! ff(xx)dddd = 1 since x MUST be somewhere in the range < xx < + Mean or Expected value: μμ = Variance: σσ 2 = (xx μμ) 2 ff(xx)dddd xxxx(xx)dddd aa bb Note that the mean and variance are POPULATION PARAMETERS. We determined them from the equations, and thus the full population. There was no sampling. aa aa EM375: Probability - 5
EXAMPLE: The life of a ball bearing is characterized by the following probability density function, where x is in units of 00 hours: 0 xx < ff(xx) = 3000 xx > 0.3 0.2 f(x) 0.1 0 0 20 30 40 Random variable, x a) Confirm this function has the properties of a probability density function. ff(xx) 0 for all x: Check there are no negative values for any x in range < xx < +. Zero values are OK ff(xx)dddd = 1: ff(xx)dddd = 0 + 3000 dddd = 3000 = 0 ( 1) = 1 so it checks 3xx 3 b) Calculate the expected life of a bearing, and its standard deviation. The expected life is the mean μμ = xxxx(xx)dddd xxxx(xx)dddd = 0 + xx 3000 dddd = (i.e.,,000 hours since x is in units of 00 hours) The variance is σσ 2 = (xx μμ) 2 ff(xx)dddd = (xx ) 2 3000 dddd = 75 khours 2 The standard deviation is σσ = (variance) = 75 = 8.66 (i.e., 8,660 hours) EM375: Probability - 6
c) What is the probability that a bearing chosen at random will have a life, x, that is more than the expected (mean) value? PP(xx > ) = 3000 dddd = 0.296 = 29.6% d) What is the probability that a bearing chosen at random will have a life, x, that is less than the expected (mean) value? Alternatively PP(xx < ) = 3000 dddd = 0 + 3000 dddd = 0.704 = 70.4% So PP(xx < ) + PP(xx > ) = 1 PP(xx < ) = 1 PP(xx > ) = 1 0.296 = 0.704 = 70.4% e) What is the probability that a bearing chosen at random will have a life, x, that is exactly the expected (mean) value? PP(xx = ) = 3000 dddd = 0 CUMULATIVE DISTRIBUTION FUNCTIONS The cumulative distribution function, cdf or F(x), gives the probability of a random variable rv being less than a certain value x. The cdf for many standard distributions is often tabulated in statistical texts and is also available in programs such as Matlab, and even Excel! PP(rrrr < xx) = FF(xx) = ff(xx)dddd Rather than integrating the pdf directly to calculate probabilities, we can use tabulated values of cdf to calculate the probability of a random variable being in a certain range: xx PP(aa < xx < bb) = PP(xx < bb) PP(xx < aa) = FF(bb) FF(aa) Separately look up (or calculate) the cdf for F(a) and F(b). Then calculate the required probability by taking the difference between the cdf values. EM375: Probability - 7
Also, PP(xx > aa) = 1 PP(xx < aa) = 1 FF(aa) EXAMPLE: For the previous bearing example the cdf can be calculated as follows: For xx < For xx > xx xx FF(xx) = ff(xx)dddd = 0 FF(xx) = ff(xx)dddd = 3000 dddd xx = 00 xx xx 3 = 1 00 xx 3 This two-part solution is plotted in the following figure: Why go to the added complexity of integrating the pdf to get a cdf, then using the cdf? Why not use the pdf directly for each problem? We will soon see that some pdfs (for example, the normal distribution) cannot be integrated analytically; they can only be integrated numerically. As a consequence, effort was put into pre integrating the pdfs for many common distributions, and then providing them in books of statistical tables. These tables are now embedded in programs such as Matlab, and can be accessed quite easily - much easier than integrating the pdf each time. The following example EM375: Probability - 8
shows how to use the cdf from a graph. You will soon get lots of practice using Matlab to calculate probabilities for other distributions! Continuing the example: a) What is the probability that a bearing will have a life expectancy of less than,000 hours? PP(xx < ) From graph, F() = 0.70 = 70%. This is the answer. Note that a more exact answer is FF() = 00 3 = 0.70370 = 70.37% b) What is the probability that a bearing will have a life expectancy between,000 and 30,000 hours? PP( < xx < 30) We already know that, from the graph, F() is about 0.70 From the graph, F(30) is about 0.96 PP( < xx < 30) = FF(30) FF() = 0.96 0.70 = 0.26 = 26% Since we have a mathematically simple cdf function, we could also use that to calculate these values more accurately. FF(30) = 1 00 xx 3 = 1 00 30 3 = 0.96292 FF() = 1 00 xx 3 = 1 00 3 = 0.70370 PP( < xx < 30) = FF(30) FF() = 0.96292 0.70370 = 0.25922 = 25.922% EM375: Probability - 9