
Chapter 4: Distributions

From Probability, For the Enthusiastic Beginner (Draft version, March 2016), David Morin, morin@physics.harvard.edu

At the beginning of Section 3.1, we introduced the concepts of random variables and probability distributions. A random variable is a variable that can take on certain numerical values with certain probabilities. The collection of these probabilities is called the probability distribution for the random variable. A probability distribution specifies how the total probability (which is always 1) is distributed among the various possible outcomes.

In this chapter, we will discuss probability distributions in detail. In Section 4.1 we warm up with some examples of discrete distributions, and then in Section 4.2 we discuss continuous distributions. These involve the probability density, which is the main new concept in this chapter. It takes some getting used to, but we'll have plenty of practice with it. In Sections 4.3 through 4.8 we derive and discuss a number of the more common and important distributions. They are, respectively, the uniform, Bernoulli, binomial, exponential, Poisson, and Gaussian (or normal) distributions.

Parts of this chapter are a bit mathematical, but there's no way around this if we want to do things properly. However, we've relegated some of the more technical issues to Appendices B and C. If you want to skip those and just accept the results that we derive there, that's fine. But you are strongly encouraged to at least take a look at Appendix B, where we derive many properties of the number e, which is the most important number in probability and statistics.

4.1 Discrete distributions

In this section we'll give a few simple examples of discrete distributions. To start off, consider the results from Example 3 in Section 2.3.4, where we calculated the probabilities of obtaining the various possible numbers of Heads in five coin flips. We found:

P(0) = 1/32, P(1) = 5/32, P(2) = 10/32, P(3) = 10/32, P(4) = 5/32, P(5) = 1/32.   (4.1)
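As a quick numerical check (not part of the original text), the probabilities in Eq. (4.1) can be reproduced by brute force: enumerate all 2^5 = 32 equally likely sequences of five flips and count the Heads in each. A minimal Python sketch:

```python
from itertools import product
from collections import Counter

# Enumerate all 32 equally likely outcomes of five coin flips
# (each flip is 0 = Tails or 1 = Heads) and tally the number of Heads.
counts = Counter(sum(flips) for flips in product((0, 1), repeat=5))

for n in range(6):
    print(f"P({n}) = {counts[n]}/32")
# Output: P(0) = 1/32, P(1) = 5/32, P(2) = 10/32,
#         P(3) = 10/32, P(4) = 5/32, P(5) = 1/32
```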

These probabilities add up to 1, as they should. Fig. 4.1 shows a plot of P(n) versus n. The random variable here is the number of Heads, and it can take on the values of 0 through 5, with the above probabilities.

Figure 4.1: The probability distribution for the number of Heads in five coin flips.

As we've done in Fig. 4.1, the convention is to plot the random variable on the horizontal axis and the probability on the vertical axis. The collective information, given either visually in Fig. 4.1 or explicitly in Eq. (4.1), is the probability distribution. A probability distribution simply tells you what all the probabilities are for the values that the random variable can take.

Note that P(n) in the present example is nonzero only if n takes on one of the discrete values, 0, 1, 2, 3, 4, or 5. It's a silly question to ask for the probability of getting 4.27 Heads, because n must of course be an integer. The probability of getting 4.27 Heads is trivially zero. Hence the word discrete in the title of this section.

Another simple example of a discrete probability distribution is the one for the six possible outcomes of the roll of one die. The random variable in this setup is the number on the top face of the die. If the die is fair, then all six numbers have equal probabilities, so the probability for each is 1/6, as shown in Fig. 4.2.

Figure 4.2: The probability distribution for the roll of one die.

What if the die isn't fair? For example, what if we make the 1 face heavier than the others by embedding a small piece of lead in the center of that face, just below the surface? The die is then more likely to land with the 1 face pointing down. The 6 face is opposite the 1, so the die is more likely to land with the 6 pointing up. Fig. 4.2 will therefore be modified by raising the 6 dot and lowering

the other five dots; the sum of the probabilities must still be 1, of course. P₂ through P₅ are all equal, by symmetry. The exact values of all the probabilities depend in a complicated way on how the mass of the lead weight compares with the mass of the die, and also on the nature of both the die and the table on which the die is rolled (how much friction, how bouncy, etc.).

As mentioned at the beginning of Section 3.1, a random variable is assumed to take on numerical values, by definition. So the outcomes of Heads and Tails for a single coin flip technically aren't random variables. But it still makes sense to plot the probabilities as shown in Fig. 4.3, even though the outcomes on the horizontal axis aren't associated with a random variable. Of course, if we define a random variable to be the number of Heads, then the Heads in the figure turns into a 1, and the Tails turns into a 0. In most situations, however, the outcomes take on numerical values right from the start, so we can officially label them as random variables. But even if they don't, we'll often take the liberty of still referring to the thing being plotted on the horizontal axis of a probability distribution as a random variable.

Figure 4.3: The probability distribution for a single coin flip.

4.2 Continuous distributions

4.2.1 Motivation

Probability distributions are fairly straightforward when the random variable is discrete. You just list (or plot) the probabilities for each of the possible values of the random variable. These probabilities will always add up to 1. However, not everything comes in discrete quantities. For example, the temperature outside your house takes on a continuous set of values, as does the amount of water in a glass. (We'll ignore the atomic nature of matter!)

In finding the probability distribution for a continuous random variable, you might think that the procedure should be exactly the same as in the discrete case. That is, if our random variable is the temperature at a particular location at noon tomorrow, then you might think that you simply have to answer questions of the form: What is the probability that the temperature at noon tomorrow will be 70° Fahrenheit?

Unfortunately, there is something wrong with this question, because it is too easy to answer. The answer is that the probability is zero, because there is simply no chance that the temperature at a specific time (and a specific location) will be exactly 70°. If it's 70.1°, that's not good enough. And neither is 70.01°, nor even 70.001°. Basically, since the temperature takes on a continuous set of values (and hence an infinite number of possible values), the probability of a specific value occurring is 1/∞, which is zero.¹

¹Of course, if you're using a digital thermometer that measures the temperature to the nearest tenth of a degree, then it does make sense to ask for the probability that the thermometer reads, say, 70.0 degrees. This probability is generally nonzero. This is due to the fact that the reading on the digital thermometer is a discrete random variable, whereas the actual temperature is a continuous random variable.

However, even though the above question ("What is the probability that the temperature at noon tomorrow will be 70°?") is a poor one, that doesn't mean we should throw in the towel and conclude that probability distributions don't exist for continuous random variables. They do in fact exist, because there are some useful questions we can ask. These useful questions take the general form of: What is the probability that the temperature at a particular location at noon tomorrow lies somewhere between 69° and 71°? This question has a nontrivial answer, in the sense that it isn't automatically zero. And depending on what the forecast is for tomorrow, the answer might be something like 20%.

We can also ask: What is the probability that the temperature at noon lies somewhere between 69.5° and 70.5°? The answer to this question is smaller than the answer to the previous one, because it involves a range of only one degree instead of two degrees. If we assume that inside the range of 69° to 71° the temperature is equally likely to be found anywhere (which is a reasonable approximation, although undoubtedly not exactly correct), and if the previous answer was 20%, then the present answer is (roughly) 10%, because the range is half the size.

The point here is that the smaller the range, the smaller the chance that the temperature lies in that range. Conversely, the larger the range, the larger the chance that the temperature lies in that range. Taken to an extreme, if we ask for the probability that the temperature at noon lies somewhere between −100° and 200°, then the answer is exactly equal to 1 (ignoring liquid nitrogen spills, forest fires, and such things!).

In addition to depending on the size of the range, the probability also of course depends on where the range is located on the temperature scale. For example, the probability that the temperature at noon lies somewhere between 69° and 71° is undoubtedly different from the probability that it lies somewhere between 11° and 13°. Both ranges have a span of two degrees, but if the given day happens to be in late summer, the temperature is much more likely to be around 70° than to be sub-freezing (let's assume we're in, say, Boston). To actually figure out the probabilities, many different pieces of data would have to be considered. In the present temperature example, the data would be of the meteorological type. But if we were interested in the probability that a random person is between 69 and 71 inches tall, then we'd need to consider a whole different set of data.

The lesson to take away from all this is that if we're looking at a random variable that can take on a continuous set of values, the probability that this random variable falls into a given range depends on three things. It depends on:

1. the location of the range,
2. the size of the range,
3. the specifics of the situation we're dealing with.

The third of these is what determines the probability density, which is a function whose argument is the location of the range. We'll now discuss probability densities.

4.2.2 Probability density

Consider the plot in Fig. 4.4, which gives a hypothetical probability distribution for the temperature example we've been discussing. This plot shows the probability distribution on the vertical axis, as a function of the temperature T (the random variable) on the horizontal axis. We have chosen to measure the temperature in Fahrenheit. We're denoting the probability distribution by² ρ(T) instead of P(T), to distinguish it from the type of probability distribution we've been talking about for discrete variables. The reason for this new notation is that ρ(T) is a probability density and not an actual probability. We'll talk about this below. When writing the functional form of a probability distribution, we'll denote probability densities with lowercase letters, like the ρ in ρ(T) or the f in f(x). And we'll denote actual probabilities with uppercase letters, like the P in P(n).

²As mentioned at the beginning of Section 3.1, a random variable is usually denoted with an uppercase letter, while the actual values are denoted with lowercase letters. So we should technically be writing ρ(t) here. But since an uppercase T is the accepted notation for temperature, we'll use T for the actual value.

Figure 4.4: A hypothetical probability distribution for the temperature.

We haven't yet said exactly what we mean by ρ(T). But in any case, it's clear from Fig. 4.4 that the temperature is more likely to be near 70° than near 60°. The following definition of ρ(T) allows us to be precise about what we mean by this.

Definition of the probability density function, ρ(T): ρ(T) is the function of T that, when multiplied by a small interval ΔT, gives the probability that the temperature lies between T and T + ΔT. That is,

P(temp lies between T and T + ΔT) = ρ(T) ΔT.   (4.2)

Note that the lefthand side contains an actual probability P, whereas the righthand side contains a probability density, ρ(T). The latter needs to be multiplied by a range of T (or whatever quantity we're dealing with) in order to obtain an actual probability. The above definition is relevant to any continuous random variable, of course, not just temperature.

Eq. (4.2) might look a little scary, but a few examples should clear things up. From Fig. 4.4, it looks like ρ(70°) is about 0.07. So if we pick ΔT = 1°, we find that the probability of the temperature lying between 70° and 71° is about

ρ(T) ΔT = (0.07)(1) = 0.07 = 7%.   (4.3)

If we instead pick a smaller ΔT, say 0.5°, we find that the probability of the temperature lying between 70° and 70.5° is about (0.07)(0.5) = 3.5%. And if we pick an even smaller ΔT, say 0.1°, we find that the probability of the temperature lying between 70° and 70.1° is about (0.07)(0.1) = 0.7%.

Similarly, we can apply Eq. (4.2) to any other value of T. For example, it looks like ρ(60°) is about 0.02. So if we pick ΔT = 1°, we find that the probability of the temperature lying between 60° and 61° is about (0.02)(1) = 2%. And as above, we can pick other values of ΔT too.

Note that, in accordance with Eq. (4.2), we have been using the value of ρ at the lower end of the given temperature interval. That is, when the interval was 70° to 71°, we used ρ(70°) and then multiplied this by ΔT. But couldn't we just as well use the value of ρ at the upper end of the interval? That is, couldn't the righthand side of Eq. (4.2) just as well be ρ(T + ΔT) ΔT? Indeed it could. But as long as ΔT is small, it doesn't matter much which value of ρ we use. They will both give essentially the same answer. See the second remark below.

Remember that three inputs are necessary when finding the probability that the temperature lies in a specified range. As we noted at the end of Section 4.2.1, the first input is the value of T we're concerned with, the second is the range ΔT, and the third is the information encapsulated in the probability density function, ρ(T), evaluated at the given value of T. The latter two of these three quantities are the two quantities that are multiplied together on the righthand side of Eq. (4.2). Knowing only one of these isn't enough to give you a probability.

To recap, there is a very important difference between the probability distribution for a continuous random variable and that for a discrete random variable. For a continuous variable, the probability distribution consists of a probability density. But for a discrete variable, it consists of actual probabilities. We plot a density for a continuous distribution, because it wouldn't make sense to plot actual probabilities, since they're all zero. This is true because the probability of obtaining exactly a particular value is zero, since there is an infinite number of possible values. Conversely, we plot actual probabilities for a discrete distribution, because it wouldn't make sense to plot a density, since it consists of a collection of infinite spikes. This is true because on a die roll, for example, there is a 1/6 chance of obtaining a number between, say, 4.9999 and 5.0001. The probability density at the outcome of 5, which from Eq. (4.2) equals the probability divided by the interval length, is then (1/6)/(0.0002), which is huge. And the interval can be made arbitrarily small, which means that the density is arbitrarily large.
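To get a feel for Eq. (4.2) numerically, here is a short Python sketch. The curve in Fig. 4.4 is hand-drawn, so we substitute a hypothetical Gaussian density whose value near 70° is roughly 0.07; the curve and its parameters are stand-ins, not the book's actual function. The sketch compares the rectangle estimate ρ(70°)ΔT with the exact area under the curve, for shrinking ΔT:

```python
import math

# A hypothetical bell-shaped temperature density (a stand-in for the
# hand-drawn curve in Fig. 4.4), normalized so its total area is 1.
# The parameters are chosen so that rho(70) is roughly 0.07.
MU, SIGMA = 70.0, 5.7

def rho(T):
    return math.exp(-(T - MU)**2 / (2 * SIGMA**2)) / (SIGMA * math.sqrt(2 * math.pi))

# Compare the rectangle estimate rho(70)*dT of Eq. (4.2) with the
# true area under rho between 70 and 70 + dT (a fine Riemann sum).
for dT in (1.0, 0.5, 0.1):
    rectangle = rho(70.0) * dT
    n = 10_000
    area = sum(rho(70.0 + (i + 0.5) * dT / n) * dT / n for i in range(n))
    print(f"dT = {dT}: rho*dT = {rectangle:.5f}, true area = {area:.5f}")
```

As ΔT shrinks, the rectangle estimate and the true area agree to more and more digits, which is exactly the sense in which Eq. (4.2) holds for small ΔT.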

To sum up, the term probability distribution applies to both continuous and discrete variables, whereas the term probability density applies only to continuous variables.

Remarks:

1. ρ(T) is a function of T, so it depends on what units we're using to measure T. We used Fahrenheit above, but what if we instead want to use Celsius? Problem 4.1 addresses this issue (but you will need to read Section 4.2.3 first).

2. Note the inclusion of the word small in the definition of the probability density in Eq. (4.2). The reason for this word is that we want ρ(T) to be (roughly) constant over the specified range. If ΔT is small enough, then this is approximately true. If ρ(T) varied greatly over the range of ΔT, then it wouldn't be clear which value of ρ(T) we should multiply by ΔT to obtain the probability. The point is that if ΔT is small enough, then all of the ρ(T) values are roughly the same, so it doesn't matter which one we pick. An alternative definition of the density ρ(T) is

P(temp lies between T − (ΔT)/2 and T + (ΔT)/2) = ρ(T) ΔT.   (4.4)

The only difference between this definition and the one in Eq. (4.2) is that we're now using the value of ρ(T) at the midpoint of the temperature range, instead of the left-end value we used in Eq. (4.2). Both definitions are equally valid, because they give essentially the same result for ρ(T), provided that ΔT is small. Similarly, we could use the value of ρ(T) at the right end of the temperature range. How small do we need ΔT to be? The answer to this will be evident when we talk about probability in terms of area in Section 4.2.3. In short, we need the change in ρ(T) over the span of ΔT to be small compared with the values of ρ(T) in that span.

3. The probability density function involves only (1) the value of T (or whatever) we're concerned with, and (2) the specifics of the situation at hand (meteorological data in the above temperature example, etc.). The density is completely independent of the arbitrary value of ΔT that we choose. This is how things work with any kind of density. For example, consider the mass density of gold. This mass density is a property of the gold itself. More precisely, it is a function of each point in the gold. For pure gold, the density is constant throughout the volume, but we could imagine impurities that would make the mass density be a varying function of position, just as the above probability density is a varying function of temperature. Let's call the mass density ρ(r), where r signifies the possible dependence of ρ on the location of a given point within the volume. (The position of a given point can be described by the vector pointing from the origin to the point. And vectors are generally denoted by boldface letters like r.) Let's call the small volume we're concerned with ΔV. Then the mass in the small volume ΔV is given by the product of the density and the volume, that is, ρ(r) ΔV. This is directly analogous to the fact that the probability in the above temperature example is given by the product of the probability density and the temperature span,

that is, ρ(T) ΔT. The correspondence among the various quantities is

Mass in ΔV around location r  ⟷  Prob. that temp lies in ΔT around T,
ρ(r)  ⟷  ρ(T),
ΔV  ⟷  ΔT.   (4.5)

4.2.3 Probability equals area

The graphical interpretation of the product ρ(T) ΔT in Eq. (4.2) is that it is the area of the rectangle shown in Fig. 4.5. This is true because ΔT is the base of the rectangle, and ρ(T) is the height.

Figure 4.5: Interpretation of the product ρ(T) ΔT as an area.

We have chosen ΔT to be 2° in the figure. With this choice, the area of the rectangle, which equals ρ(70°)·(2°), gives a reasonably good approximation to the probability that the temperature lies between 70° and 72°. But it isn't exact, because ρ(T) isn't constant over the 2° interval. A better approximation to the probability that the temperature lies between 70° and 72° is achieved by splitting the 2° interval into two intervals of 1° each, and then adding up the probabilities of lying in each of these two intervals. These two probabilities are approximately equal to ρ(70°)·(1°) and ρ(71°)·(1°), and the two corresponding rectangles are shown in Fig. 4.6.

Figure 4.6: Subdividing the area, to produce a better approximation to the probability.

But again, the sum of the areas of these two rectangles is still only an approximate result for the true probability that the temperature lies between 70° and 72°, because ρ(T) isn't constant over the 1° intervals either. A better approximation is achieved by splitting the 1° intervals into smaller intervals, and then again into even smaller ones. And so on. When we get to the point of having 100 or 1000 extremely thin rectangles, the sum of their areas will essentially be the area shown in Fig. 4.7. This area is the correct probability that the temperature lies between 70° and 72°. So in retrospect, we see that the rectangular area in Fig. 4.5 exceeds the true probability by the area of the tiny triangular-ish region in the upper righthand corner of the rectangle.

Figure 4.7: The area below the curve between 70° and 72° equals the probability that the temperature lies between 70° and 72°.

We therefore arrive at a more precise definition (compared with Eq. (4.2)) of the probability density, ρ(T):

Improved definition of the probability density function, ρ(T): ρ(T) is the function of T for which the area under the ρ(T) curve between T and T + ΔT gives the probability that the temperature (or whatever quantity we're dealing with) lies between T and T + ΔT.

This is an exact definition, and there is no need for ΔT to be small, as there was in the definition in Eq. (4.2). The difference is that the present definition involves the exact area, whereas Eq. (4.2) involved the area of a rectangle (via simple multiplication by ΔT), which was only an approximation. But technically the only thing we need to add to Eq. (4.2) is the requirement that we take the ΔT → 0 limit. That makes the definition rigorous.

The total area under any probability density curve must be 1, because this area equals the probability that the temperature (or whatever) takes on some value between −∞ and +∞, and because every possible result is included in the −∞ to +∞ range. However, in any realistic case, the density is essentially zero outside a specific finite region. So there is essentially no contribution to the area from the parts

of the plot outside that region. There is therefore no need to go to ±∞. The total area under each of the curves in the above figures, including the tails on either side which we haven't bothered to draw, is indeed equal to 1 (at least roughly; the curves were drawn by hand).

Given a probability density function f(x), the cumulative distribution function F(x) is defined to be the probability that X takes on a value that is less than or equal to x. That is, F(x) = P(X ≤ x). For a continuous distribution, this definition implies that F(x) equals the area under the f(x) curve from −∞ up to the given x value. A quick corollary is that the probability P(a < X ≤ b) that X lies between two given values a and b is equal to F(b) − F(a). For a discrete distribution, the definition F(x) = P(X ≤ x) still applies, but we now calculate P(X ≤ x) by forming a discrete sum instead of finding an area. Although the cumulative distribution function can be very useful in probability and statistics, we won't use it much in this book.

We'll now spend a fair amount of time in Sections 4.3 through 4.8 discussing some common types of probability distributions. There is technically an infinite number of possible distributions, although only a hundred or so come up frequently enough to have names. And even many of these are rather obscure. A handful, however, come up again and again in a variety of settings, so we'll concentrate on these. They are the uniform, Bernoulli, binomial, exponential, Poisson, and Gaussian (or normal) distributions.

4.3 Uniform distribution

We'll start with a very simple continuous probability distribution, one that is uniform over a given interval, and zero otherwise. Such a distribution might look like the one shown in Fig. 4.8. If the distribution extends from x₁ to x₂, then the value of ρ(x) in that region must be 1/(x₂ − x₁), so that the total area is 1.

Figure 4.8: A uniform distribution.

This type of distribution could arise, for example, from a setup where a rubber ball bounces around in an empty rectangular room. When it finally comes to rest, we measure its distance x from a particular one of the walls. If you initially throw the ball hard enough, then it's a pretty good approximation to say that x is equally likely to take on any value between 0 and L, where L is the length of the room in the relevant direction. In this setup, the x₁ in Fig. 4.8 equals 0 (so we would need to shift the rectangle to the left), and the x₂ equals L.

The random variable here is X, and the value it takes is denoted by x. So x is what we plot on the horizontal axis. Since we're dealing with a continuous distribution, we plot the probability density (not the probability!) on the vertical axis. If L equals 10 feet, then outside the region 0 < x < 10, the probability density ρ(x) equals zero. Inside this region, the density equals the total probability divided by the total interval, which gives 1 per 10 feet, or equivalently 1/10 per foot. If we want to find the actual probability that the ball ends up between, say, x = 6 and x = 8, then we just multiply ρ(x) by the interval length, which is 2 feet. The result is (1/10 per foot)(2 feet), which equals 2/10 = 1/5. This makes sense, of course, because the 2-foot interval is 1/5 of the total distance.

A uniform density is easy to deal with, because the area under a given part of the curve (which equals the probability) is simply a rectangle. And the area of a rectangle is just the base times the height, which is the interval length times the density. This is exactly the product we formed above. When the density isn't uniform, it can be very difficult sometimes to find the area under a given part of the curve.

Note that the larger the region of nonzero ρ(x) in a uniform distribution, the smaller the value of ρ(x). This follows from the fact that the total area under the density curve (which is just a straight line segment in this case) must equal 1. So if the base becomes longer, the height must become shorter.
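Translating the rubber-ball example into code makes the base-times-height rule concrete. This is just an illustrative sketch, with L = 10 feet as in the text:

```python
L = 10.0  # length of the room, in feet

def rho(x):
    # Uniform density: 1/L inside the room, 0 outside.
    return 1.0 / L if 0.0 <= x <= L else 0.0

def prob(a, b):
    # For a uniform density, the area under the curve between a and b
    # is just a rectangle: base times height.
    a, b = max(a, 0.0), min(b, L)
    return max(b - a, 0.0) * (1.0 / L)

print(prob(0, 10))   # total probability: 1.0
print(prob(6, 8))    # (1/10 per foot)(2 feet) = 0.2 = 1/5
```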

4.4 Bernoulli distribution

We'll now consider a very simple discrete distribution, called the Bernoulli distribution. This is the distribution for a process in which only two possible outcomes, 1 and 0, can occur, with probabilities p and 1 − p, respectively. (They must add up to 1, of course.) The plot of this probability distribution is shown in Fig. 4.9. It is common to call the outcome of 1 a success and the outcome of 0 a failure. A special case of a Bernoulli distribution is the distribution for a coin toss, where the probabilities for Heads and Tails (which we can assign the values of 1 and 0, respectively) are both equal to 1/2.

Figure 4.9: A Bernoulli distribution takes on the values 1 and 0 with probabilities p and 1 − p.

The Bernoulli distribution is the simplest of all distributions, with the exception of the trivial case where only one possible outcome can occur, which therefore has a probability of 1. The uniform and Bernoulli distributions are simple enough that there isn't much to say. In contrast, the distributions in the following four sections (binomial, exponential, Poisson, and Gaussian) are a bit more interesting, so we'll have plenty to say about them.

4.5 Binomial distribution

The binomial distribution, which is discrete, is an extension of the Bernoulli distribution. The binomial distribution is defined to be the probability distribution for the total number of successes that arise in an arbitrary number of independent and identically distributed Bernoulli processes. An example of a binomial distribution is the probability distribution for the number of Heads in, say, five coin tosses, which we discussed in Section 4.1. We could just as well pick any other number of tosses. In the case of five coin tosses, each coin toss is a Bernoulli process. When we put all five tosses together and look at the total number of successes (Heads), we get a binomial distribution. Let's label the total number of successes as k. In this specific example, there are n = 5 Bernoulli processes, with each one having a p = 1/2 probability of success. The probability distribution P(k) is simply the one we plotted earlier in Fig. 4.1, where we counted the number of Heads.

Let's now find the binomial distribution associated with a general number n of independent Bernoulli trials, each with the same probability of success, p. So our goal is to find the value of P(k) for all of the different possible values of the total number of successes, k. The possible values of k range from 0 up to the number of trials, n.

To calculate the binomial distribution (for given n and p), we first note that p^k is the probability that a specific set of k of the n Bernoulli processes all yield success, because each of the k processes has a p probability of yielding success. We then need the other n − k processes to not yield success, because we want exactly k successes. This happens with probability (1 − p)^{n−k}, because each of the n − k processes has a 1 − p probability of yielding failure. The probability that a specific set of k processes (and no others) all yield success is therefore p^k (1 − p)^{n−k}. Finally, since there are \binom{n}{k} ways to pick a specific set of k processes, we see that the probability that exactly k of the n processes yield success is

P(k) = \binom{n}{k} p^k (1 − p)^{n−k}   (binomial distribution)   (4.6)

This is the desired binomial distribution. Note that this distribution depends on two parameters: the number n of Bernoulli trials and the probability p of success in each trial. If you want to make these parameters explicit, you can write the binomial distribution P(k) as B_{n,p}(k). That is,

B_{n,p}(k) = \binom{n}{k} p^k (1 − p)^{n−k}.   (4.7)

But we'll generally just use the simple P(k) notation.
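Eq. (4.6) is a one-liner in code. Here is a minimal sketch using Python's math.comb for the binomial coefficient; as a sanity check, it reproduces the five-flip probabilities of Eq. (4.1):

```python
from math import comb

def binomial_pmf(k, n, p):
    # Eq. (4.6): the probability of exactly k successes in n
    # independent Bernoulli trials, each succeeding with probability p.
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The five-coin-flip distribution of Section 4.1, Eq. (4.1):
print([binomial_pmf(k, 5, 0.5) for k in range(6)])
# [0.03125, 0.15625, 0.3125, 0.3125, 0.15625, 0.03125]
#  = 1/32, 5/32, 10/32, 10/32, 5/32, 1/32
```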

In the special case of a binomial distribution generated from n coin tosses, we have p = 1/2. So Eq. (4.6) gives the probability of obtaining k Heads as

P(k) = \binom{n}{k} / 2^n.   (4.8)

To recap: In Eq. (4.6), n is the total number of Bernoulli processes, p is the probability of success in each Bernoulli process, and k is the total number of successes in the n processes. (So k can be anything from 0 to n.) Fig. 4.10 shows the binomial distribution for the cases of n = 30 and p = 1/2 (which arises from 30 coin tosses), and n = 30 and p = 1/6 (which arises from 30 die rolls, with a particular one of the six numbers representing success).

Figure 4.10: Two binomial distributions with n = 30 but different values of p.

Example (Equal probabilities): Given n, for what value of p is the probability of zero successes equal to the probability of one success?

Solution: In Eq. (4.6) we want P(0) to equal P(1). This gives

\binom{n}{0} p^0 (1 − p)^{n−0} = \binom{n}{1} p^1 (1 − p)^{n−1}
⟹ (1 − p)^n = np(1 − p)^{n−1}
⟹ 1 − p = np
⟹ p = 1/(n + 1).   (4.9)

This p = 1/(n + 1) value is the special value of p for which various competing effects cancel. On one hand, P(1) contains an extra factor of n from the \binom{n}{1} coefficient, which arises from the fact that there are n different ways for one success to happen. But on the other hand, P(1) also contains a factor of p, which arises from the fact that one success does happen. The first of these effects makes P(1) larger than P(0), while the second makes it smaller.³ The effects cancel when p = 1/(n + 1). Fig. 4.11 shows the plot for n = 10 and p = 1/11. The p = 1/(n + 1) case is the cutoff between the maximum of P(k) occurring when k is zero or nonzero.

³Another effect is that P(1) is larger because it contains one fewer factor of (1 − p). But this effect is minor when p is small, which is the case if n is large, due to the p = 1/(n + 1) form of the answer.
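A quick numerical check of Eq. (4.9), using the n = 10, p = 1/11 case plotted in Fig. 4.11:

```python
from math import comb

n = 10
p = 1 / (n + 1)  # the special value from Eq. (4.9), here 1/11

P0 = comb(n, 0) * p**0 * (1 - p)**n
P1 = comb(n, 1) * p**1 * (1 - p)**(n - 1)
print(P0, P1)  # both print 0.38554..., so P(0) = P(1), as claimed
```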

Figure 4.11: P(0) equals P(1) if p = 1/(n + 1).

If p is larger than 1/(n + 1), as it is in both plots in Fig. 4.10 above, then the maximum occurs at a nonzero value of k. That is, the distribution has a bump. On the other hand, if p is smaller than 1/(n + 1), then the maximum occurs at k = 0. That is, the distribution has its peak at k = 0 and falls off from there.

Having derived the binomial distribution in Eq. (4.6), there is a simple double check that we can perform on the result. Since the number of successes, k, can take on any integer value from 0 to n, the sum of the P(k) probabilities from k = 0 to k = n must equal 1. The P(k) expression in Eq. (4.6) does indeed satisfy this requirement, due to the binomial expansion, which tells us that

(p + (1 − p))^n = \sum_{k=0}^{n} \binom{n}{k} p^k (1 − p)^{n−k}.   (4.10)

This is just Eq. (1.21) from Section 1.8.3, with a = p and b = 1 − p. The lefthand side of Eq. (4.10) is simply 1^n = 1. And each term in the sum on the righthand side is a P(k) term from Eq. (4.6). So Eq. (4.10) becomes

1 = \sum_{k=0}^{n} P(k),   (4.11)

as we wanted to show. You are encouraged to verify this result for the probabilities in, say, the left plot in Fig. 4.10. Feel free to make rough estimates of the probabilities when reading them off the plot. You will find that the sum is indeed 1, up to the rough estimates you make.

The task of Problem 4.4 is to use Eq. (3.4) to explicitly demonstrate that the expectation value of the binomial distribution in Eq. (4.6) equals pn. In other words, if our binomial distribution is derived from n Bernoulli trials, each having a probability p of success, then we should expect a total of pn successes (on average, if we do a large number of sets of n trials). This must be true, of course, because a fraction p of the n trials yield success, on average, by the definition of p for the given Bernoulli process.
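Both checks can also be done by computer: summing the P(k) of Eq. (4.6) should give 1, as in Eq. (4.11), and the mean Σ k·P(k) should come out to pn (the subject of Problem 4.4). A sketch, run on the two distributions of Fig. 4.10:

```python
from math import comb

def binomial_pmf(k, n, p):
    # Eq. (4.6)
    return comb(n, k) * p**k * (1 - p)**(n - k)

for n, p in [(30, 1/2), (30, 1/6)]:   # the two cases plotted in Fig. 4.10
    total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
    mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
    print(f"n = {n}, p = {p:.4f}: sum = {total:.6f}, mean = {mean:.4f}, pn = {n * p:.4f}")
```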

Remark: We should emphasize what is meant by a probability distribution. Let's say that you want to experimentally verify that the left plot in Fig. 4.10 is the correct probability distribution for the total number of Heads that show up in 30 coin flips. You of course can't do this by flipping a coin just once. And you can't even do it by flipping a coin 30 times, because all you'll get from that is just one number for the total number of Heads. For example, you might obtain 17 Heads. In order to experimentally verify the distribution, you need to perform a large number of sets of 30 coin flips, and you need to record the total number of Heads you get in each 30-flip set. The result will be a long string of numbers such as 13, 16, 15, 16, 18, 14, 11, 17, .... If you then calculate the fractions of the time that each number appears, these fractions should (roughly) agree with the probabilities shown in Fig. 4.10. The longer the string of numbers, the better the agreement, in general. The main point here is that the distribution doesn't say much about one particular set of 30 flips. Rather, it says what the expected distribution of outcomes is for a large number of sets of 30 flips.

4.6 Exponential distribution

In Sections 4.6 through 4.8 we'll look at three probability distributions (exponential, Poisson, and Gaussian) that are a bit more involved than the three we've just discussed (uniform, Bernoulli, and binomial). We'll start with the exponential distribution, which takes the general form

ρ(t) = Ae^{−bt},   (4.12)

where A and b are quantities that depend on the specific situation at hand. We will find below in Eq. (4.26) that these quantities must be related in a certain way in order for the total probability to be 1. The parameter t corresponds to whatever the random variable is. The exponential distribution is a continuous one, so ρ(t) is a probability density.

The most common type of situation where this distribution arises is the following. Consider a repeating event that happens completely randomly in time. By completely randomly we mean that there is a uniform probability that the event happens at any given instant (or more precisely, in any small time interval of a given length), independent of what has already happened. That is, the process has no memory. The exponential distribution that we'll eventually arrive at (after a lot of work!) in Eq. (4.26) gives the probability distribution for the waiting time until the next event occurs. Since the time t is a continuous quantity, we'll need to develop some formalism to analyze the distribution. To ease into it, let's start with the slightly easier case where time is assumed to be discrete.

4.6.1 Discrete case

Consider a process where we roll a hypothetical 10-sided die once every second. So time is discretized into 1-second intervals. It's actually not necessary to introduce time here at all. We could simply talk about the number of iterations of the process. But it's easier to talk about things like the waiting time than the number of iterations you need to wait for. So for convenience, we'll discuss things in the context of time. If the die shows a 1, we'll consider that a success. The other nine numbers represent failure.

There are two reasonable questions we can ask: What is the average

waiting time (that is, the expectation value of the waiting time) between successes? And what is the probability distribution of the waiting times between successes?

Average waiting time

It is fairly easy to determine the average waiting time. There are 10 possible numbers on the die, so on average we can expect 1/10 of them to be 1's. If we run the process for a long time, say, an hour (which consists of 3600 seconds), then we can expect about 360 1's. The average waiting time between successes is therefore (3600 seconds)/360 = 10 seconds.

More generally, if the probability of success in each trial is p, then the average waiting time is 1/p (assuming that the trials happen at 1-second intervals). This can be seen by the same reasoning as above. If we perform n trials of the process, then pn of them will yield success, on average. The average waiting time between successes is the total time (n) divided by the number of successes (pn):

Average waiting time = n/(pn) = 1/p.   (4.13)

Note that the preceding reasoning gives us the average waiting time, without requiring any knowledge of the actual probability distribution of the waiting times (which we will calculate below). Of course, once we do know what the probability distribution is, we should be able to calculate the average (the expectation value) of the waiting times. This is the task of Problem 4.7.

Distribution of waiting times

Finding the probability distribution of the waiting times requires a little more work than finding the average waiting time. For the 10-sided die example, the question we're trying to answer is: What is the probability that if we consider two successive 1's, the time between them will be 6 seconds? Or 30 seconds? Or 1 second? And so on. Although the average waiting time is 10 seconds, this certainly doesn't mean that the waiting time will always be 10 seconds. In fact, we will find below that the probability that the waiting time is exactly 10 seconds is quite small.

Let's be general and say that the probability of success in each trial is p (so p = 1/10 in our present setup). Then the question is: What is the probability, P(k), that we will have to wait exactly k iterations (each of which is 1 second here) to obtain the next success? To answer this, note that in order for the next success to happen on the kth iteration, there must be failure (which happens with probability 1 − p) on the first k − 1 iterations, and then success on the kth one. The probability of this happening is

P(k) = (1 − p)^{k−1} p   (geometric distribution)   (4.14)

This is the desired (discrete) probability distribution for the waiting time. This distribution goes by the name of the geometric distribution, because the probabilities form a geometric progression, due to the increasing power of the (1 − p) factor. The geometric distribution is the discrete version of the exponential distribution that we'll arrive at in Eq. (4.26) below.
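Numerically, both facts (that the geometric probabilities of Eq. (4.14) sum to 1, and that their mean is 1/p, as Problem 4.7 asks you to show) can be checked by truncating the infinite sums at a large cutoff, beyond which the terms are negligible. A sketch for p = 1/10:

```python
p = 0.1
K = 2000  # cutoff for the infinite sums; the tail beyond this is negligible

pmf = [(1 - p)**(k - 1) * p for k in range(1, K + 1)]   # Eq. (4.14)

print(sum(pmf))                                   # ~ 1.0
print(sum(k * q for k, q in enumerate(pmf, 1)))   # ~ 10.0 = 1/p
print(pmf[9])                                     # P(10) = (0.9)^9 (0.1) ~ 0.0387
```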

Eq. (4.14) tells us that the probability that the next success comes on the very next iteration is p, the probability that it comes on the second iteration is (1 − p)p, the probability that it comes on the third iteration is (1 − p)²p, and so on. Each probability is smaller than the previous one by the factor (1 − p). A plot of the distribution for p = 1/10 is shown in Fig. 4.12. The distribution is maximum at k = 1 and falls off from that value. Even though k = 10 is the average waiting time, the probability of the waiting time being exactly k = 10 is only P(10) = (0.9)⁹(0.1) ≈ 0.04 = 4%.

Figure 4.12: The geometric distribution with p = 1/10.

If p is large (close to 1), the plot of P(k) starts high (at p, which is close to 1) and then falls off quickly, because the factor (1 − p) is close to 0. On the other hand, if p is small (close to 0), the plot of P(k) starts low (at p, which is close to 0) and then falls off slowly, because the factor (1 − p) is close to 1.

As a double check on the result in Eq. (4.14), we know that the next success has to eventually happen sometime, so the sum of all the P(k) probabilities must be 1. These P(k) probabilities form a geometric series whose first term is p and whose ratio is 1 − p. The general formula for the sum of a geometric series with first term a and ratio r is a/(1 − r), so we have

P(1) + P(2) + P(3) + ⋯ = p + p(1 − p) + p(1 − p)² + ⋯ = p/(1 − (1 − p)) = 1,   (4.15)

as desired. As another check, we can verify that the expectation value (the average) of the waiting time for the geometric distribution in Eq. (4.14) equals 1/p, as we already found above; see Problem 4.7.

You are encouraged to use a coin to experimentally verify Eq. (4.14) (or equivalently, the plot analogous to Fig. 4.12) for the case of p = 1/2. Just flip a coin as many times as you can in ten minutes, each time writing down a 1 if you get Heads and a 0 if you get Tails. Then make a long list of the waiting times between the 1's. Then count up the number of one-toss waits, the number of two-toss waits, and so on. Then divide each of these numbers by the total number of waits (not the total number of tosses!) to find the probability of each waiting length. The results should be (roughly) consistent with Eq. (4.14) for p = 1/2. In this case, the probabilities in Eq. (4.14) for k = 1, 2, 3, 4, ... are 1/2, 1/4, 1/8, 1/16, ....
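If you'd rather not spend ten minutes flipping a coin, the same experiment is easy to simulate. This sketch (an illustration, not from the book) generates a long run of fair-coin flips, tabulates the waiting times between successive Heads, and compares the observed fractions with Eq. (4.14) for p = 1/2:

```python
import random
from collections import Counter

random.seed(0)
flips = [random.randint(0, 1) for _ in range(100_000)]  # 1 = Heads

# Record the waiting times between successive 1's.
waits, count = [], 0
for f in flips:
    count += 1
    if f == 1:
        waits.append(count)
        count = 0

tally = Counter(waits)
for k in range(1, 6):
    observed = tally[k] / len(waits)
    predicted = (1 - 0.5)**(k - 1) * 0.5   # Eq. (4.14) with p = 1/2
    print(f"k = {k}: observed {observed:.4f}, predicted {predicted:.4f}")
```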

4.6.2 Rates, expectation values, and probabilities

Let's now consider the case where time is a continuous quantity. That is, let's assume that we can have a successful event at any instant, not just at the evenly-spaced 1-second marks as above. A continuous process whose probability is uniform in time can be completely described by just one number: the average rate of success, which we'll call λ. We generally won't bother writing the word average, so we'll just call λ the rate. Before getting into the derivation of the continuous exponential distribution in Section 4.6.3, we'll need to talk a little about rates.

The rate λ can be determined by counting the number of successful events that occur during a long time interval, and then dividing by this time. For example, if 300 (successful) events happen during 100 minutes, then the rate λ is 3 events per minute. Of course, if you count the number of events in a different span of 100 minutes, you will most likely get a slightly different number, perhaps 313 or 281. But in the limit of a very long time interval, you will find essentially the same rate, independent of which specific long interval you use. If the rate λ is 3 events per minute, you can alternatively write this as 1 event per 20 seconds, or 1/20 of an event per second. There is an infinite number of ways to write λ, and it's personal preference which one you pick. Just remember that you have to state the "per time interval" you're using. If you just say that the rate is 3, that doesn't mean anything.

What is the expectation value of the number of events that happen during a time t? This expected number simply equals the product λt, from the definition of λ. If the expected number were anything other than λt, then if we divided it by t to obtain the rate, we wouldn't get λ. If you want to be a little more rigorous, consider a very large number n of intervals with length t. The total time in these intervals is nt. This total time is very large, so the number of events that happen during this time is (approximately) equal to (nt)λ, by the definition of λ. The expected number of events in each of the n intervals with length t is therefore ntλ/n = λt, as above. So we can write

(Expected number of events in time t) = λt   (4.16)

In the above setup where λ equals 3 events per minute, the expected number of events that happen in, say, 5 minutes is

λt = (3 events per minute)(5 minutes) = 15 events.   (4.17)

Does this mean that we are guaranteed to have exactly 15 events during a particular 5-minute span? Absolutely not. We can theoretically have any number of events, although there is essentially zero chance that the number will differ significantly from 15. (The probability of obtaining the various numbers of events is governed by the Poisson distribution, which we'll discuss in Section 4.7.) But the expectation value is 15. That is, if we perform a large number of 5-minute trials and then calculate the average number of events that occur in each trial, the result will be close to 15.
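That thought experiment can be simulated by chopping each 5-minute span into tiny steps of length ϵ and letting an event occur in each step with probability λϵ (precisely the small-interval picture developed next). A sketch with hypothetical numbers, λ = 3 per minute as above:

```python
import random

random.seed(0)
lam = 3.0     # rate: 3 events per minute
t = 5.0       # each trial watches a 5-minute span
eps = 0.01    # tiny time step, in minutes
steps = int(t / eps)

# In each tiny step, one event occurs with probability lam * eps = 0.03.
trials = 2000
counts = [sum(random.random() < lam * eps for _ in range(steps))
          for _ in range(trials)]
print(sum(counts) / trials)   # close to lam * t = 15
```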

A trickier question to ask is: What is the probability that exactly one event happens during a time t? Since λ is the rate, you might think that you can just multiply λ by t, as we did above, to say that the probability is λt. But this certainly can't be correct, because it would imply a probability of 15 for a 5-minute interval in the above setup. This is nonsense, because probabilities can't be larger than 1. If we instead pick a time interval of 20 seconds (1/3 of a minute), we obtain a λt value of 1. This doesn't have the fatal flaw of being larger than 1, but it has another issue, in that it says that exactly one event is guaranteed to happen during a 20-second interval. This can't be correct either, because it's certainly possible for zero (or two or three, etc.) events to occur. We'll figure out the exact probabilities of these numbers in Section 4.7.

The strategy of multiplying λ by t to obtain a probability doesn't seem to work. However, there is one special case where it does work. If the time interval is extremely small (let's call it ϵ, which is a standard letter to use for something that is very small), then it is true that the probability of exactly one event occurring during the ϵ time interval is essentially equal to λϵ. We're using the word essentially because, although this statement is technically not true, it becomes arbitrarily close to being true in the limit where ϵ approaches zero. In the above example with λ = 1/20 events per second, the statement, "λt is the probability that exactly one event happens during a time t," is a lousy approximation if t = 20 seconds, a decent approximation if t = 2 seconds, and a very good approximation if t = 0.2 seconds. And it only gets better as the time interval gets smaller. We'll explain why in the first remark below. We can therefore say that if P_ϵ(1) stands for the probability that exactly one event happens during a small time interval ϵ, then

P_ϵ(1) ≈ λϵ   (if ϵ is very small)   (4.18)

The smaller ϵ is, the better this approximation is. Technically, the condition in Eq. (4.18) is really "if λϵ is very small." But we'll generally be dealing with normal-sized λ's, so λϵ being small is equivalent to ϵ being small. When we deal with continuous time below, we'll actually be taking the ϵ → 0 limit. In this mathematical limit, the ≈ sign in Eq. (4.18) becomes an exact = sign.

To sum up: If t is very small, then λt is both the expected number of events that happen during the time t and (essentially) the probability that exactly one event happens during the time t. If t isn't very small, then λt is only the expected number of events.
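The lousy/decent/very good progression can be made quantitative by borrowing a result from Section 4.7: for a completely random process with rate λ, the exact probability of exactly one event in a time t turns out to be (λt)e^{−λt}. Taking that on faith for now, a short sketch with λ = 1/20 events per second:

```python
import math

lam = 1 / 20   # rate: 1/20 of an event per second

# Exact P(1) for a completely random process (from the Poisson
# distribution of Section 4.7): P_t(1) = (lam*t) * exp(-lam*t).
for t in (20.0, 2.0, 0.2):
    approx = lam * t
    exact = lam * t * math.exp(-lam * t)
    print(f"t = {t:5.1f} s: lam*t = {approx:.4f}, exact P(1) = {exact:.4f}")
```

The approximation λt overshoots badly at t = 20 s (1 versus about 0.37), but is within about 1% at t = 0.2 s, matching the progression described above.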

Remarks:

1. We claimed above that λt equals the probability of exactly one event occurring, only if t is very small. The reason for this restriction is that if t isn't small, then there is the possibility of multiple events occurring during the time t. We can be explicit about this as follows. Since we know from Eq. (4.16) that the expected number of events during any time t is λt, we can use the expression for the expectation value in Eq. (3.4) to write

λt = P_t(0)·0 + P_t(1)·1 + P_t(2)·2 + P_t(3)·3 + ⋯,   (4.19)

where P_t(k) is the probability of obtaining exactly k events during the time t. Solving for P_t(1) gives

P_t(1) = λt − P_t(2)·2 − P_t(3)·3 − ⋯.   (4.20)

We see that P_t(1) is smaller than λt due to the P_t(2) and P_t(3), etc., probabilities. So P_t(1) isn't equal to λt. However, if all of the probabilities of multiple events occurring (P_t(2), P_t(3), etc.) are very small, then P_t(1) is essentially equal to λt. And this is exactly what happens if the time interval is very small. For small times, there is hardly any chance of the event even occurring once. So it is even less likely that it will occur twice, and even less likely for three times, etc.

We can be a little more precise about this. The following argument isn't completely rigorous, but it should convince you that if t is very small, then P_t(1) is essentially equal to λt. If t is very small, then assuming we don't know yet that P_t(1) equals λt, we can still say that it should be roughly proportional to λt. This is true because if an event has only a tiny chance of occurring, then if you cut λ in half, the probability is essentially cut in half. Likewise if you cut t in half. This proportionality then implies that the probability that exactly two events occur is essentially proportional to (λt)². We'll see in Section 4.7 that there is actually a factor of 1/2 involved here, but that is irrelevant in the present argument. The important point is the quadratic nature of (λt)². If λt is sufficiently small, then (λt)² is negligible compared with λt. Likewise for P_t(3) ∝ (λt)³, etc. We can therefore ignore the scenarios where multiple events occur. So with t ≡ ϵ, Eq. (4.20) becomes

P_ϵ(1) = λϵ − P_ϵ(2)·2 − P_ϵ(3)·3 − ⋯ ≈ λϵ,   (4.21)

in agreement with Eq. (4.18). As mentioned above, if λϵ is small, it is because ϵ is small, at least in the situations we'll be dealing with.

2. Imagine drawing the λ vs. t "curve." We have put "curve" in quotes because the curve is actually just a straight horizontal line, since we're assuming a constant λ. If we consider a time interval Δt, the associated area under the curve equals λΔt, because we have a simple rectangular region. So from Eq. (4.18), this area gives the probability that an event occurs during a time Δt, provided that Δt is very small. This might make you think that λ can be interpreted as a probability distribution, because we found in Section 4.2.3 that the area under a distribution curve gives the probability. However, the λ "curve" cannot be interpreted as a probability distribution, because this area-equals-probability result holds only for very small Δt. The area under a distribution curve has to give the probability for any interval on the horizontal axis. The λ "curve" doesn't satisfy this property. The total area under the λ "curve" is infinite (because the straight horizontal line extends for all time), whereas actual probability distributions must have a total area of 1.

3. Since only one quantity, λ, is needed to describe everything about a random process whose probability is uniform in time, any other quantity we might want to determine must be able to be written in terms of λ. This will become evident below.


More information

MA554 Assessment 1 Cosets and Lagrange s theorem

MA554 Assessment 1 Cosets and Lagrange s theorem MA554 Assessment 1 Cosets and Lagrange s theorem These are notes on cosets and Lagrange s theorem; they go over some material from the lectures again, and they have some new material it is all examinable,

More information

Quadratic Equations Part I

Quadratic Equations Part I Quadratic Equations Part I Before proceeding with this section we should note that the topic of solving quadratic equations will be covered in two sections. This is done for the benefit of those viewing

More information

Continuum Probability and Sets of Measure Zero

Continuum Probability and Sets of Measure Zero Chapter 3 Continuum Probability and Sets of Measure Zero In this chapter, we provide a motivation for using measure theory as a foundation for probability. It uses the example of random coin tossing to

More information

Grade 8 Chapter 7: Rational and Irrational Numbers

Grade 8 Chapter 7: Rational and Irrational Numbers Grade 8 Chapter 7: Rational and Irrational Numbers In this chapter we first review the real line model for numbers, as discussed in Chapter 2 of seventh grade, by recalling how the integers and then the

More information

Toss 1. Fig.1. 2 Heads 2 Tails Heads/Tails (H, H) (T, T) (H, T) Fig.2

Toss 1. Fig.1. 2 Heads 2 Tails Heads/Tails (H, H) (T, T) (H, T) Fig.2 1 Basic Probabilities The probabilities that we ll be learning about build from the set theory that we learned last class, only this time, the sets are specifically sets of events. What are events? Roughly,

More information

Algebra Exam. Solutions and Grading Guide

Algebra Exam. Solutions and Grading Guide Algebra Exam Solutions and Grading Guide You should use this grading guide to carefully grade your own exam, trying to be as objective as possible about what score the TAs would give your responses. Full

More information

Part 3: Parametric Models

Part 3: Parametric Models Part 3: Parametric Models Matthew Sperrin and Juhyun Park August 19, 2008 1 Introduction There are three main objectives to this section: 1. To introduce the concepts of probability and random variables.

More information

Random Variables Example:

Random Variables Example: Random Variables Example: We roll a fair die 6 times. Suppose we are interested in the number of 5 s in the 6 rolls. Let X = number of 5 s. Then X could be 0, 1, 2, 3, 4, 5, 6. X = 0 corresponds to the

More information

[Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty.]

[Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty.] Math 43 Review Notes [Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty Dot Product If v (v, v, v 3 and w (w, w, w 3, then the

More information

In the real world, objects don t just move back and forth in 1-D! Projectile

In the real world, objects don t just move back and forth in 1-D! Projectile Phys 1110, 3-1 CH. 3: Vectors In the real world, objects don t just move back and forth in 1-D In principle, the world is really 3-dimensional (3-D), but in practice, lots of realistic motion is 2-D (like

More information

Conditional probabilities and graphical models

Conditional probabilities and graphical models Conditional probabilities and graphical models Thomas Mailund Bioinformatics Research Centre (BiRC), Aarhus University Probability theory allows us to describe uncertainty in the processes we model within

More information

Getting Started with Communications Engineering

Getting Started with Communications Engineering 1 Linear algebra is the algebra of linear equations: the term linear being used in the same sense as in linear functions, such as: which is the equation of a straight line. y ax c (0.1) Of course, if we

More information

6.041SC Probabilistic Systems Analysis and Applied Probability, Fall 2013 Transcript Tutorial:A Random Number of Coin Flips

6.041SC Probabilistic Systems Analysis and Applied Probability, Fall 2013 Transcript Tutorial:A Random Number of Coin Flips 6.041SC Probabilistic Systems Analysis and Applied Probability, Fall 2013 Transcript Tutorial:A Random Number of Coin Flips Hey, everyone. Welcome back. Today, we're going to do another fun problem that

More information

CMPSCI 240: Reasoning Under Uncertainty

CMPSCI 240: Reasoning Under Uncertainty CMPSCI 240: Reasoning Under Uncertainty Lecture 5 Prof. Hanna Wallach wallach@cs.umass.edu February 7, 2012 Reminders Pick up a copy of B&T Check the course website: http://www.cs.umass.edu/ ~wallach/courses/s12/cmpsci240/

More information

RVs and their probability distributions

RVs and their probability distributions RVs and their probability distributions RVs and their probability distributions In these notes, I will use the following notation: The probability distribution (function) on a sample space will be denoted

More information

Ratios, Proportions, Unit Conversions, and the Factor-Label Method

Ratios, Proportions, Unit Conversions, and the Factor-Label Method Ratios, Proportions, Unit Conversions, and the Factor-Label Method Math 0, Littlefield I don t know why, but presentations about ratios and proportions are often confused and fragmented. The one in your

More information

The Exciting Guide To Probability Distributions Part 2. Jamie Frost v1.1

The Exciting Guide To Probability Distributions Part 2. Jamie Frost v1.1 The Exciting Guide To Probability Distributions Part 2 Jamie Frost v. Contents Part 2 A revisit of the multinomial distribution The Dirichlet Distribution The Beta Distribution Conjugate Priors The Gamma

More information

POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS

POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS 1.1. The Rutherford-Chadwick-Ellis Experiment. About 90 years ago Ernest Rutherford and his collaborators at the Cavendish Laboratory in Cambridge conducted

More information

Cosets and Lagrange s theorem

Cosets and Lagrange s theorem Cosets and Lagrange s theorem These are notes on cosets and Lagrange s theorem some of which may already have been lecturer. There are some questions for you included in the text. You should write the

More information

Mathematics-I Prof. S.K. Ray Department of Mathematics and Statistics Indian Institute of Technology, Kanpur. Lecture 1 Real Numbers

Mathematics-I Prof. S.K. Ray Department of Mathematics and Statistics Indian Institute of Technology, Kanpur. Lecture 1 Real Numbers Mathematics-I Prof. S.K. Ray Department of Mathematics and Statistics Indian Institute of Technology, Kanpur Lecture 1 Real Numbers In these lectures, we are going to study a branch of mathematics called

More information

Steve Smith Tuition: Maths Notes

Steve Smith Tuition: Maths Notes Maths Notes : Discrete Random Variables Version. Steve Smith Tuition: Maths Notes e iπ + = 0 a + b = c z n+ = z n + c V E + F = Discrete Random Variables Contents Intro The Distribution of Probabilities

More information

Chapter 9: Roots and Irrational Numbers

Chapter 9: Roots and Irrational Numbers Chapter 9: Roots and Irrational Numbers Index: A: Square Roots B: Irrational Numbers C: Square Root Functions & Shifting D: Finding Zeros by Completing the Square E: The Quadratic Formula F: Quadratic

More information

Notes on Mathematics Groups

Notes on Mathematics Groups EPGY Singapore Quantum Mechanics: 2007 Notes on Mathematics Groups A group, G, is defined is a set of elements G and a binary operation on G; one of the elements of G has particularly special properties

More information

Introduction to Algebra: The First Week

Introduction to Algebra: The First Week Introduction to Algebra: The First Week Background: According to the thermostat on the wall, the temperature in the classroom right now is 72 degrees Fahrenheit. I want to write to my friend in Europe,

More information

Lecture 1: Probability Fundamentals

Lecture 1: Probability Fundamentals Lecture 1: Probability Fundamentals IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge January 22nd, 2008 Rasmussen (CUED) Lecture 1: Probability

More information

Problem Solving. Kurt Bryan. Here s an amusing little problem I came across one day last summer.

Problem Solving. Kurt Bryan. Here s an amusing little problem I came across one day last summer. Introduction Problem Solving Kurt Bryan Here s an amusing little problem I came across one day last summer. Problem: Find three distinct positive integers whose reciprocals add up to one. Prove that the

More information

Conditional Probability

Conditional Probability Conditional Probability Idea have performed a chance experiment but don t know the outcome (ω), but have some partial information (event A) about ω. Question: given this partial information what s the

More information

3: Gauss s Law July 7, 2008

3: Gauss s Law July 7, 2008 3: Gauss s Law July 7, 2008 3.1 Electric Flux In order to understand electric flux, it is helpful to take field lines very seriously. Think of them almost as real things that stream out from positive charges

More information

MATH 19B FINAL EXAM PROBABILITY REVIEW PROBLEMS SPRING, 2010

MATH 19B FINAL EXAM PROBABILITY REVIEW PROBLEMS SPRING, 2010 MATH 9B FINAL EXAM PROBABILITY REVIEW PROBLEMS SPRING, 00 This handout is meant to provide a collection of exercises that use the material from the probability and statistics portion of the course The

More information

MATH 3510: PROBABILITY AND STATS July 1, 2011 FINAL EXAM

MATH 3510: PROBABILITY AND STATS July 1, 2011 FINAL EXAM MATH 3510: PROBABILITY AND STATS July 1, 2011 FINAL EXAM YOUR NAME: KEY: Answers in blue Show all your work. Answers out of the blue and without any supporting work may receive no credit even if they are

More information

Chapter 8: An Introduction to Probability and Statistics

Chapter 8: An Introduction to Probability and Statistics Course S3, 200 07 Chapter 8: An Introduction to Probability and Statistics This material is covered in the book: Erwin Kreyszig, Advanced Engineering Mathematics (9th edition) Chapter 24 (not including

More information

Chapter 2. Mathematical Reasoning. 2.1 Mathematical Models

Chapter 2. Mathematical Reasoning. 2.1 Mathematical Models Contents Mathematical Reasoning 3.1 Mathematical Models........................... 3. Mathematical Proof............................ 4..1 Structure of Proofs........................ 4.. Direct Method..........................

More information

Alex s Guide to Word Problems and Linear Equations Following Glencoe Algebra 1

Alex s Guide to Word Problems and Linear Equations Following Glencoe Algebra 1 Alex s Guide to Word Problems and Linear Equations Following Glencoe Algebra 1 What is a linear equation? It sounds fancy, but linear equation means the same thing as a line. In other words, it s an equation

More information

Probability Year 9. Terminology

Probability Year 9. Terminology Probability Year 9 Terminology Probability measures the chance something happens. Formally, we say it measures how likely is the outcome of an event. We write P(result) as a shorthand. An event is some

More information

CS1800: Strong Induction. Professor Kevin Gold

CS1800: Strong Induction. Professor Kevin Gold CS1800: Strong Induction Professor Kevin Gold Mini-Primer/Refresher on Unrelated Topic: Limits This is meant to be a problem about reasoning about quantifiers, with a little practice of other skills, too

More information

1 INFO Sep 05

1 INFO Sep 05 Events A 1,...A n are said to be mutually independent if for all subsets S {1,..., n}, p( i S A i ) = p(a i ). (For example, flip a coin N times, then the events {A i = i th flip is heads} are mutually

More information

STAT2201. Analysis of Engineering & Scientific Data. Unit 3

STAT2201. Analysis of Engineering & Scientific Data. Unit 3 STAT2201 Analysis of Engineering & Scientific Data Unit 3 Slava Vaisman The University of Queensland School of Mathematics and Physics What we learned in Unit 2 (1) We defined a sample space of a random

More information

Problems from Probability and Statistical Inference (9th ed.) by Hogg, Tanis and Zimmerman.

Problems from Probability and Statistical Inference (9th ed.) by Hogg, Tanis and Zimmerman. Math 224 Fall 2017 Homework 1 Drew Armstrong Problems from Probability and Statistical Inference (9th ed.) by Hogg, Tanis and Zimmerman. Section 1.1, Exercises 4,5,6,7,9,12. Solutions to Book Problems.

More information

Fitting a Straight Line to Data

Fitting a Straight Line to Data Fitting a Straight Line to Data Thanks for your patience. Finally we ll take a shot at real data! The data set in question is baryonic Tully-Fisher data from http://astroweb.cwru.edu/sparc/btfr Lelli2016a.mrt,

More information

P (E) = P (A 1 )P (A 2 )... P (A n ).

P (E) = P (A 1 )P (A 2 )... P (A n ). Lecture 9: Conditional probability II: breaking complex events into smaller events, methods to solve probability problems, Bayes rule, law of total probability, Bayes theorem Discrete Structures II (Summer

More information

Astronomy 102 Math Review

Astronomy 102 Math Review Astronomy 102 Math Review 2003-August-06 Prof. Robert Knop r.knop@vanderbilt.edu) For Astronomy 102, you will not need to do any math beyond the high-school alegbra that is part of the admissions requirements

More information

What is a random variable

What is a random variable OKAN UNIVERSITY FACULTY OF ENGINEERING AND ARCHITECTURE MATH 256 Probability and Random Processes 04 Random Variables Fall 20 Yrd. Doç. Dr. Didem Kivanc Tureli didemk@ieee.org didem.kivanc@okan.edu.tr

More information

the time it takes until a radioactive substance undergoes a decay

the time it takes until a radioactive substance undergoes a decay 1 Probabilities 1.1 Experiments with randomness Wewillusethetermexperimentinaverygeneralwaytorefertosomeprocess that produces a random outcome. Examples: (Ask class for some first) Here are some discrete

More information

RATES OF CHANGE. A violin string vibrates. The rate of vibration can be measured in cycles per second (c/s),;

RATES OF CHANGE. A violin string vibrates. The rate of vibration can be measured in cycles per second (c/s),; DISTANCE, TIME, SPEED AND SUCH RATES OF CHANGE Speed is a rate of change. It is a rate of change of distance with time and can be measured in miles per hour (mph), kilometres per hour (km/h), meters per

More information

Statistics 100A Homework 5 Solutions

Statistics 100A Homework 5 Solutions Chapter 5 Statistics 1A Homework 5 Solutions Ryan Rosario 1. Let X be a random variable with probability density function a What is the value of c? fx { c1 x 1 < x < 1 otherwise We know that for fx to

More information

Math101, Sections 2 and 3, Spring 2008 Review Sheet for Exam #2:

Math101, Sections 2 and 3, Spring 2008 Review Sheet for Exam #2: Math101, Sections 2 and 3, Spring 2008 Review Sheet for Exam #2: 03 17 08 3 All about lines 3.1 The Rectangular Coordinate System Know how to plot points in the rectangular coordinate system. Know the

More information

Binomial random variable

Binomial random variable Binomial random variable Toss a coin with prob p of Heads n times X: # Heads in n tosses X is a Binomial random variable with parameter n,p. X is Bin(n, p) An X that counts the number of successes in many

More information

Random variables. DS GA 1002 Probability and Statistics for Data Science.

Random variables. DS GA 1002 Probability and Statistics for Data Science. Random variables DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Motivation Random variables model numerical quantities

More information

Lecture 2 - Length Contraction

Lecture 2 - Length Contraction Lecture 2 - Length Contraction A Puzzle We are all aware that if you jump to the right, your reflection in the mirror will jump left. But if you raise your hand up, your reflection will also raise its

More information

Lesson One Hundred and Sixty-One Normal Distribution for some Resolution

Lesson One Hundred and Sixty-One Normal Distribution for some Resolution STUDENT MANUAL ALGEBRA II / LESSON 161 Lesson One Hundred and Sixty-One Normal Distribution for some Resolution Today we re going to continue looking at data sets and how they can be represented in different

More information

The following are generally referred to as the laws or rules of exponents. x a x b = x a+b (5.1) 1 x b a (5.2) (x a ) b = x ab (5.

The following are generally referred to as the laws or rules of exponents. x a x b = x a+b (5.1) 1 x b a (5.2) (x a ) b = x ab (5. Chapter 5 Exponents 5. Exponent Concepts An exponent means repeated multiplication. For instance, 0 6 means 0 0 0 0 0 0, or,000,000. You ve probably noticed that there is a logical progression of operations.

More information

Generating Function Notes , Fall 2005, Prof. Peter Shor

Generating Function Notes , Fall 2005, Prof. Peter Shor Counting Change Generating Function Notes 80, Fall 00, Prof Peter Shor In this lecture, I m going to talk about generating functions We ve already seen an example of generating functions Recall when we

More information

One-to-one functions and onto functions

One-to-one functions and onto functions MA 3362 Lecture 7 - One-to-one and Onto Wednesday, October 22, 2008. Objectives: Formalize definitions of one-to-one and onto One-to-one functions and onto functions At the level of set theory, there are

More information

Grades 7 & 8, Math Circles 24/25/26 October, Probability

Grades 7 & 8, Math Circles 24/25/26 October, Probability Faculty of Mathematics Waterloo, Ontario NL 3G1 Centre for Education in Mathematics and Computing Grades 7 & 8, Math Circles 4/5/6 October, 017 Probability Introduction Probability is a measure of how

More information

Lecture 10: Probability distributions TUESDAY, FEBRUARY 19, 2019

Lecture 10: Probability distributions TUESDAY, FEBRUARY 19, 2019 Lecture 10: Probability distributions DANIEL WELLER TUESDAY, FEBRUARY 19, 2019 Agenda What is probability? (again) Describing probabilities (distributions) Understanding probabilities (expectation) Partial

More information

Probability Distributions

Probability Distributions CONDENSED LESSON 13.1 Probability Distributions In this lesson, you Sketch the graph of the probability distribution for a continuous random variable Find probabilities by finding or approximating areas

More information

1.20 Formulas, Equations, Expressions and Identities

1.20 Formulas, Equations, Expressions and Identities 1.0 Formulas, Equations, Expressions and Identities Collecting terms is equivalent to noting that 4 + 4 + 4 + 4 + 4 + 4 can be written as 6 4; i.e., that multiplication is repeated addition. It s wise

More information

GRE Quantitative Reasoning Practice Questions

GRE Quantitative Reasoning Practice Questions GRE Quantitative Reasoning Practice Questions y O x 7. The figure above shows the graph of the function f in the xy-plane. What is the value of f (f( ))? A B C 0 D E Explanation Note that to find f (f(

More information

Senior Math Circles November 19, 2008 Probability II

Senior Math Circles November 19, 2008 Probability II University of Waterloo Faculty of Mathematics Centre for Education in Mathematics and Computing Senior Math Circles November 9, 2008 Probability II Probability Counting There are many situations where

More information

The First Derivative Test

The First Derivative Test The First Derivative Test We have already looked at this test in the last section even though we did not put a name to the process we were using. We use a y number line to test the sign of the first derivative

More information

Chapter 4: An Introduction to Probability and Statistics

Chapter 4: An Introduction to Probability and Statistics Chapter 4: An Introduction to Probability and Statistics 4. Probability The simplest kinds of probabilities to understand are reflected in everyday ideas like these: (i) if you toss a coin, the probability

More information

(x 1 +x 2 )(x 1 x 2 )+(x 2 +x 3 )(x 2 x 3 )+(x 3 +x 1 )(x 3 x 1 ).

(x 1 +x 2 )(x 1 x 2 )+(x 2 +x 3 )(x 2 x 3 )+(x 3 +x 1 )(x 3 x 1 ). CMPSCI611: Verifying Polynomial Identities Lecture 13 Here is a problem that has a polynomial-time randomized solution, but so far no poly-time deterministic solution. Let F be any field and let Q(x 1,...,

More information

Solutions to February 2008 Problems

Solutions to February 2008 Problems Solutions to February 008 Problems Problem. An Egyptian-style pyramid has a square b b base and height h. What is the volume of the smallest sphere that contains the pyramid? (This may be a little trickier

More information

Conditional Probability, Independence and Bayes Theorem Class 3, Jeremy Orloff and Jonathan Bloom

Conditional Probability, Independence and Bayes Theorem Class 3, Jeremy Orloff and Jonathan Bloom Conditional Probability, Independence and Bayes Theorem Class 3, 18.05 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Know the definitions of conditional probability and independence of events. 2.

More information

Chapter 14. From Randomness to Probability. Copyright 2012, 2008, 2005 Pearson Education, Inc.

Chapter 14. From Randomness to Probability. Copyright 2012, 2008, 2005 Pearson Education, Inc. Chapter 14 From Randomness to Probability Copyright 2012, 2008, 2005 Pearson Education, Inc. Dealing with Random Phenomena A random phenomenon is a situation in which we know what outcomes could happen,

More information

CS280, Spring 2004: Final

CS280, Spring 2004: Final CS280, Spring 2004: Final 1. [4 points] Which of the following relations on {0, 1, 2, 3} is an equivalence relation. (If it is, explain why. If it isn t, explain why not.) Just saying Yes or No with no

More information

AN ALGEBRA PRIMER WITH A VIEW TOWARD CURVES OVER FINITE FIELDS

AN ALGEBRA PRIMER WITH A VIEW TOWARD CURVES OVER FINITE FIELDS AN ALGEBRA PRIMER WITH A VIEW TOWARD CURVES OVER FINITE FIELDS The integers are the set 1. Groups, Rings, and Fields: Basic Examples Z := {..., 3, 2, 1, 0, 1, 2, 3,...}, and we can add, subtract, and multiply

More information

Probability and the Second Law of Thermodynamics

Probability and the Second Law of Thermodynamics Probability and the Second Law of Thermodynamics Stephen R. Addison January 24, 200 Introduction Over the next several class periods we will be reviewing the basic results of probability and relating probability

More information

You separate binary numbers into columns in a similar fashion. 2 5 = 32

You separate binary numbers into columns in a similar fashion. 2 5 = 32 RSA Encryption 2 At the end of Part I of this article, we stated that RSA encryption works because it s impractical to factor n, which determines P 1 and P 2, which determines our private key, d, which

More information

Math Fundamentals for Statistics I (Math 52) Unit 7: Connections (Graphs, Equations and Inequalities)

Math Fundamentals for Statistics I (Math 52) Unit 7: Connections (Graphs, Equations and Inequalities) Math Fundamentals for Statistics I (Math 52) Unit 7: Connections (Graphs, Equations and Inequalities) By Scott Fallstrom and Brent Pickett The How and Whys Guys This work is licensed under a Creative Commons

More information

Chapter 3: The Derivative in Graphing and Applications

Chapter 3: The Derivative in Graphing and Applications Chapter 3: The Derivative in Graphing and Applications Summary: The main purpose of this chapter is to use the derivative as a tool to assist in the graphing of functions and for solving optimization problems.

More information

Algebra. Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Algebra. Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. This document was written and copyrighted by Paul Dawkins. Use of this document and its online version is governed by the Terms and Conditions of Use located at. The online version of this document is

More information

High School Math Contest

High School Math Contest High School Math Contest University of South Carolina February th, 017 Problem 1. If (x y) = 11 and (x + y) = 169, what is xy? (a) 11 (b) 1 (c) 1 (d) (e) 8 Solution: Note that xy = (x + y) (x y) = 169

More information

Sample Spaces, Random Variables

Sample Spaces, Random Variables Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted

More information