Introduction to Probability


Salvatore Pace
September 2, 2018

1 Introduction

In a frequentist interpretation of probability, a probability measure P(A) says that if I repeat an experiment N times, I should see event A happen roughly P(A) N times. This intuition is not wrong, and everyday experience tells us that as N -> infinity, the frequentist interpretation becomes correct. For example, flip a fair coin and count how many times you get heads and tails. The more you flip the coin, the closer the fraction of heads will come to 1/2.

It was Andrey Kolmogorov, a 20th-century Soviet mathematician, who first successfully quantified this intuition with the following probability axioms:

1. The probability of an event must be a non-negative real number (P(A) >= 0).

2. The sum of the probabilities of all possible events is one (P(Ω) = Σ_{i=1}^{N} P(A_i) = 1, where Ω = {A_1, A_2, ..., A_N} is the set of all possible events).

3. The probability of the union of two mutually exclusive events (two events that cannot happen at the same time) is equal to the sum of their individual probabilities (P(A_1 + A_2) = P(A_1) + P(A_2) iff events A_1 and A_2 are mutually exclusive).

That is all there is: everything in probability can technically be derived from these three axioms. To state one of the important consequences, if two events A and B are independent (whether event A happens does not affect the probability that event B happens), then the probability that both events happen is given by

    P(A and B) = P(A) P(B).    (1)

That is, if two events are independent, the probability for them both to happen is equal to the product of their individual probabilities (for an explanation of why this is, look up conditional probability). Furthermore, another useful tool is that if asked for the probability that event A occurs, one can instead compute one minus the probability that event A does not occur, which by axiom 2 gives the same result.
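The frequentist statement above is easy to check numerically. Below is a minimal Python sketch (my addition, not part of the original notes; it uses only the standard library) that simulates fair coin flips and prints how the observed fraction of heads approaches P(heads) = 1/2 as the number of flips grows.

    import random

    random.seed(0)  # fix the seed so the run is reproducible

    for n_flips in (10, 100, 1000, 10000, 100000):
        heads = sum(random.random() < 0.5 for _ in range(n_flips))
        # The empirical frequency heads/n_flips should approach P(heads) = 0.5
        print(f"N = {n_flips:6d}   fraction of heads = {heads / n_flips:.4f}")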

2 Discrete random variables

This section deals with sample spaces that are countable. Countability is discussed more at the beginning of section 3.

2.1 Equally-likely probability

When all elements of your sample space Ω are equally likely, the probability of an event A_n is proportional to the number of ways event A_n can occur. To make this more clear, let us consider flipping a fair coin three times. Denoting tails as T and heads as H, I can get the following outcomes:

    Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT},

where, for example, HHT means I first got heads, then another heads, and lastly tails. This coin-flipping example fits the equally-likely setting since I am equally likely to get heads as tails each time I flip the coin, and HHH is just as likely as HHT, because each flip does not depend on my past results.

So, if I am interested in the question "what is the probability of getting exactly two heads if I flip a fair coin three times?", my initial statement of this section says that it should be proportional to the number of ways I can get two heads (HHT or HTH or THH). We can figure out the exact form by looking at the second axiom of probability. If P(Ω) is proportional to the number of ways Ω can occur (the number of outcomes one can get from flipping a coin three times), then in order to make sure P(Ω) = 1, I must normalize my probability measure by dividing by the number of outcomes one can get from flipping a coin three times. Therefore, in this example, the probability of getting two heads is

    P(2 heads) = N(two heads) / N(total outcomes) = 3/8,    (2)

where N(A) denotes the number of ways for event A to occur. In general, the probability of event A, when each outcome of my experiment/trial is equally likely, is

    P(A) = N(A) / N(Ω).    (3)

The game of counting is a rich field of math known as combinatorics, and it is very important in statistical physics. In the equally-likely case, all one must do is simply count, but as you can imagine, this counting can become extremely nontrivial fast (for the interested reader, Google "Catalan numbers")!
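Equation 3 can be verified by brute force for the three-flip example. The short sketch below (my addition, assuming nothing beyond the Python standard library) enumerates the eight equally likely outcomes and counts those containing exactly two heads.

    from itertools import product

    # Enumerate all 2**3 = 8 equally likely outcomes of three flips
    omega = list(product("HT", repeat=3))

    # Event A: exactly two heads
    two_heads = [outcome for outcome in omega if outcome.count("H") == 2]

    # P(A) = N(A) / N(Omega) = 3/8
    print(len(two_heads), "/", len(omega), "=", len(two_heads) / len(omega))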

2.2 Binomial Distribution

A more general probability game allows the probabilities of outcomes to be unequal. For example, let us go back to our friendly coin-flipping game as motivation, but this time the probability of getting heads is p and the probability of getting tails is q (which, by axiom 2, is q = 1 - p). What if someone now asked: what is the probability that, if I flip a coin N times, I get n heads? Armed with equation 1, one may guess that

    P(# heads = n) ∝ p^n q^(N-n) = p^n (1 - p)^(N-n),    (4)

where the N - n term is the number of tails I get (if I don't get heads, I must get tails). While this takes into account the given probabilities, it does not take into account that there are multiple configurations giving the same result. For example, if I flip a coin N times and want to get N - 1 heads, there are N different ways, or combinations, that give me my result (one for each position the single tail could occupy). Thus, we should multiply equation 4 by the total number of configurations giving n heads. The reason for this factor can also be seen by considering axiom 3.

To figure out what this factor is, imagine I have N boxes, each needing to be filled with a head or a tail. When deciding where my first head should go, there are N boxes to choose from. For my second, there are N - 1 boxes, and generally, for my k-th head, there are N - k + 1 boxes to choose from. Thus, by the basic counting rule (which states that if there are m_1 ways of doing one thing, m_2 ways of doing the second thing, m_3 ways of doing the third thing, etc., then the total number of ways to do all of the things is m_1 m_2 ... m_k = Π_{i=1}^{k} m_i), the number of ways to put n things in N boxes is

    N (N - 1) ... (N - n + 1) = N! / (N - n)!.    (5)

However, if the things you are putting in boxes are indistinguishable, then the above over-counts. For example, if I swap the heads in box 1 with the heads in box 2, I do not get a new result (it is not a separate element of my sample space). Therefore, we must divide equation 5 by the number of permutations of the n heads, namely n!. The reason it is n! follows from the same argument that led to equation 5 (at first I can choose among n different heads to place, then, since I already placed one, among n - 1, and so on). Therefore, the factor I must multiply equation 4 by is N!/((N - n)! n!). This leads to what is called the Binomial Distribution:

    P_N(n) = N! / ((N - n)! n!) p^n (1 - p)^(N-n).    (6)

It measures the probability for cases such as: I have N independent trials in my experiment; what is the probability that I get n successful trials if each trial has probability p of succeeding?

Another discrete probability distribution is the Poisson distribution. We can derive it as a limiting case of the binomial distribution when N is large, p is small, and λ = Np is a constant of about order one. This limiting case occurs when the likelihood of an event occurring is rare. The Poisson distribution is given by:

    P_λ(n) = λ^n e^{-λ} / n!.    (7)

2.3 Proof: Binomial -> Poisson

Using Stirling's approximation, x! ≈ x^x exp[-x] sqrt(2πx), we can rewrite the binomial distribution (written here with n trials and k successes) as

    P = n! / (k!(n - k)!) p^k (1 - p)^(n-k)
      ≈ [sqrt(2πn) (n/e)^n] / [sqrt(2π(n - k)) ((n - k)/e)^(n-k) k!] p^k (1 - p)^(n-k)
      ≈ [n^n e^{-k} / ((n - k)^(n-k) k!)] p^k (1 - p)^(n-k),

where in the last step we dropped the factor sqrt(n/(n - k)), which goes to 1 for large n. Now, letting n -> infinity while keeping np = λ constant:

    P ≈ [n^n (λ/n)^k (1 - λ/n)^(n-k) e^{-k}] / [n^(n-k) (1 - k/n)^(n-k) k!]
      = [λ^k (1 - λ/n)^(n-k) e^{-k}] / [(1 - k/n)^(n-k) k!].

Reminding ourselves that lim_{n -> infinity} (1 - x/n)^n = e^{-x},

    P -> λ^k e^{-λ} e^{-k} / (e^{-k} k!) = λ^k e^{-λ} / k!,

which is the Poisson distribution.
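The limit in section 2.3 can also be seen numerically. The sketch below (my addition; the parameter choices are purely illustrative) evaluates the binomial distribution of equation 6 for increasing N with λ = Np held fixed and compares it to the Poisson probability of equation 7.

    from math import comb, exp, factorial

    def binomial_pmf(n, N, p):
        """Equation 6: probability of n successes in N trials."""
        return comb(N, n) * p**n * (1 - p)**(N - n)

    def poisson_pmf(n, lam):
        """Equation 7: Poisson probability with mean lam."""
        return lam**n * exp(-lam) / factorial(n)

    lam, n = 2.0, 3          # fixed lambda = N*p, ask for n successes
    for N in (10, 100, 1000, 10000):
        p = lam / N
        print(f"N = {N:5d}   binomial = {binomial_pmf(n, N, p):.6f}   "
              f"poisson = {poisson_pmf(n, lam):.6f}")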

3 Continuous random variables

So far, the random variables (variables that follow some sort of probability function) we have dealt with have been discrete, i.e., their sample spaces have been countable. This idea of discreteness is important in many fields of physics and can be hard to wrap your head around at first. If something is discrete, you can assign each of its values an integer; if something is continuous, you cannot. Hence the math word "countable". With this, you get into the realm of some infinities being bigger than others, which is also hard to wrap your head around at first. For the interested reader, a famous proof that the infinite set of all real numbers is larger than the infinite set of all integers is Cantor's diagonal argument.

Now, if you have a continuous random variable, then the sample space of possible outcomes must be uncountable, and therefore the probability of getting exactly one particular element of this set is zero. Some textbooks use this as the definition of a continuous random variable. However, it can also be interpreted as a consequence of a single element of an uncountable set having zero measure. Therefore, for a continuous random variable x, since the probability of any single value is zero, we instead ask questions about P(a < x < b). For discrete random variables, something like P(a < x < b) would be answered with a sum of P(x_i) over the countable set of x_i which satisfy a < x_i < b. One may therefore expect that, for continuous random variables, we use integrals instead of sums! That is, if x is a continuous random variable, then

    P(a < x < b) = ∫_a^b ds p(s),    (8)

where p(s) is called the probability density function. This definition can be made more rigorous with the aid of Riemann sums. Following from the three axioms of probability, one can see that the only requirements on p(s) are that it be a non-negative function and properly normalized so that axiom 2 is satisfied.

An extremely important probability distribution is the Gaussian distribution (also called the normal distribution). It is the classical bell curve we all know and love. It is essential not only in physics but in all fields of science (its universality is due to something called the central limit theorem). It is given by:

    p(x; µ, σ) = 1/sqrt(2πσ²) e^{-(x - µ)²/(2σ²)}.    (9)

There are many ways to arrive at this, but one which is particularly enlightening is the de Moivre-Laplace theorem, which shows that the Gaussian distribution is a limiting case of the binomial distribution for large N.
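Equation 8 is straightforward to evaluate numerically for the Gaussian density of equation 9. The sketch below (my addition; the interval and parameters are arbitrary choices) approximates the integral with a simple Riemann sum, in the spirit of the remark above.

    from math import exp, pi, sqrt

    def gaussian_pdf(x, mu=0.0, sigma=1.0):
        """Equation 9: the Gaussian (normal) probability density."""
        return exp(-(x - mu)**2 / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)

    def prob_between(a, b, n_steps=10000, mu=0.0, sigma=1.0):
        """Equation 8 via a midpoint Riemann sum: P(a < x < b) = integral of p(s) ds."""
        ds = (b - a) / n_steps
        return sum(gaussian_pdf(a + (i + 0.5) * ds, mu, sigma) for i in range(n_steps)) * ds

    # About 68% of the probability lies within one standard deviation of the mean
    print(prob_between(-1.0, 1.0))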

3.1 Proof: Binomial -> Gaussian

Starting with the binomial distribution,

    P_bin(k) = N! / (k!(N - k)!) p^k q^(N-k),

we can rewrite it by using Stirling's approximation to the factorial, x! ≈ x^x exp[-x] sqrt(2πx). This leads to:

    P_bin(k) ≈ [N^N exp[-N] sqrt(2πN)] / [k^k exp[-k] sqrt(2πk) (N - k)^(N-k) exp[-(N - k)] sqrt(2π(N - k))] p^k q^(N-k),

which simplifies to:

    P_bin(k) ≈ sqrt(N / (2πk(N - k))) (Np/k)^k (Nq/(N - k))^(N-k).

In the case of large N, we can approximate k/N ≈ p. We can use this approximation by rewriting our current expression as:

    P_bin(k) ≈ 1/sqrt(2πN (k/N)(1 - k/N)) (Np/k)^k (Nq/(N - k))^(N-k) ≈ 1/sqrt(2πNpq) (Np/k)^k (Nq/(N - k))^(N-k).

Our amplitude is now correct, but we eventually want an exponential term, so we do the following:

    P_bin(k) ≈ 1/sqrt(2πNpq) exp{ k log(Np/k) + (N - k) log(Nq/(N - k)) }.

At this point, we will define a new variable that makes the later algebra easier: let z = (k - µ)/σ = (k - Np)/sqrt(Npq). The term z is what we want to have in the exponential of our Gaussian, so making this substitution will make it easier to get the exponential into the correct final form. With this new variable, it is easy to verify that k = Np + z sqrt(Npq) and N - k = Nq - z sqrt(Npq). Let us look at each logarithm's argument with this new variable:

    k/(Np) = 1 + z sqrt(Npq)/(Np) = 1 + z sqrt(q/(Np)),

and

    (N - k)/(Nq) = 1 - z sqrt(Npq)/(Nq) = 1 - z sqrt(p/(Nq)).

Plugging this in, and pulling the factor of -1 in the exponent out in front of each logarithm, our approximation now becomes:

    P_bin(k) ≈ 1/sqrt(2πNpq) exp{ -(Np + z sqrt(Npq)) log(1 + z sqrt(q/(Np))) - (Nq - z sqrt(Npq)) log(1 - z sqrt(p/(Nq))) }.

Using Taylor's theorem, log(1 + x) ≈ x - x²/2, and stopping at second order because the largest term in a Gaussian exponential is of second order, our expression is now:

    P_bin(k) ≈ 1/sqrt(2πNpq) exp{ -(Np + z sqrt(Npq)) [z sqrt(q/(Np)) - z² q/(2Np)] - (Nq - z sqrt(Npq)) [-z sqrt(p/(Nq)) - z² p/(2Nq)] }.

Distributing (and dropping terms that vanish as N -> infinity) leads to:

    P_bin(k) ≈ 1/sqrt(2πNpq) exp{ (-z sqrt(Npq) + z² q/2 - z² q) + (z sqrt(Npq) + z² p/2 - z² p) } = 1/sqrt(2πNpq) exp{ -(p + q) z²/2 },

which, using p + q = 1 and plugging in our expression for z, simplifies the approximation to:

    P_bin(k) ≈ 1/sqrt(2πNpq) exp{ -(k - Np)² / (2Npq) }.

This is the Gaussian distribution of equation 9 with µ = Np and σ² = Npq.

4 Expectation value and variance

As a final part of these notes, I will quickly define the expectation value and the variance. For discrete random variables, the expectation value is given by

    ⟨f(x)⟩ = Σ_s f(s) P(s),    (10)

and for continuous random variables it is given by

    ⟨f(x)⟩ = ∫ ds f(s) p(s),    (11)

where the sum and the integral run over all possible values of the random variable. By these definitions, the expectation value is essentially a weighted average, in which the most likely values contribute the most. Nevertheless, the expectation value does not have to be a possible outcome of the trial/experiment you are doing. Some people call the expectation value the mean or the average.

The variance of a random variable x is defined by

    Var(x) = ⟨(x - ⟨x⟩)²⟩    (12)

and can be written in the more friendly form

    Var(x) = ⟨(x - ⟨x⟩)²⟩ = ⟨x² - 2x⟨x⟩ + ⟨x⟩²⟩ = ⟨x²⟩ - 2⟨x⟩⟨x⟩ + ⟨x⟩² = ⟨x²⟩ - ⟨x⟩²,    (13)

where we have used the linearity of the expectation, the fact that ⟨x⟩ is simply a number, and that the expectation value of a number is the number itself. Conceptually, the variance measures how spread out the random values are from their expectation value.
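Equations 10, 12, and 13 are easy to evaluate directly for a small discrete distribution. As a minimal sketch (my addition), the expectation value and variance of a single fair six-sided die can be computed by brute-force summation; note that the result ⟨x⟩ = 3.5 is not itself a possible outcome, as remarked above.

    # Expectation value and variance of one fair six-sided die, directly from
    # equations 10, 12, and 13. Each face s = 1..6 has P(s) = 1/6.
    faces = range(1, 7)
    P = 1 / 6

    mean = sum(s * P for s in faces)                   # <x>, equation 10 with f(x) = x
    var = sum((s - mean)**2 * P for s in faces)        # equation 12
    var_alt = sum(s**2 * P for s in faces) - mean**2   # equation 13, same number

    print(mean, var, var_alt)   # 3.5 is not itself a possible outcome of a die roll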

The table below shows the expectation value and variance of the distributions discussed in these notes. I very much encourage you to prove some of these yourself!

    Distribution    Expectation value    Variance
    Binomial        Np                   Np(1 - p)
    Poisson         λ                    λ
    Gaussian        µ                    σ²

5 Exercises

1) Given the probability measure P(i, j) = e^{-5} 2^i 3^j / (i! j!) for the discrete random variables i, j ∈ {0, 1, 2, 3, ...}, show that P(i, j) satisfies the first two probability axioms.

2) A box contains four balls numbered 1, 2, 3, and 4. A ball is chosen at random, its number is noted, and the ball is returned to the box. This process is then repeated one more time.
a) Determine the sample space Ω.
b) If each outcome is assigned the same probability, what is that common probability?
c) Using the probability assignment in part (b), find the probability that the two numbers chosen are different.

3) Given the probability measure P(x) = C 4^x / x!, with x ∈ {0, 1, 2, 3, ...}:
a) Find the normalizing constant C, so that the second probability axiom is satisfied.
b) Find the probability that x is at least 2 (i.e., P(x ≥ 2)). (Hint: look at the 2nd and 3rd probability axioms.)

4) You throw two 6-sided dice. Find the expectation value of their sum, given that they are fair dice (each side is equally likely).

5) You flip a weighted coin 100 times; the coin has the property that it is two times more likely to land heads than tails. What is the probability that you get 42 heads?

6) Prove that the Gaussian distribution (equation 9) is properly normalized. (Hint: if it is properly normalized, then P(-∞ < x < ∞)² = P(-∞ < x < ∞) P(-∞ < y < ∞) = 1, where x and y both follow the Gaussian distribution. Therefore, showing P(-∞ < x < ∞) P(-∞ < y < ∞) = 1 implies P(-∞ < x < ∞) = 1.)

Handwritten appendix: expectation value and variance of the binomial distribution.

The factor N!/(n!(N - n)!) in equation 6 is called the binomial coefficient. By the binomial theorem,

    (p + q)^N = Σ_{n=0}^{N} N!/(n!(N - n)!) p^n q^(N-n),

so the moments can be generated by differentiating with respect to p (treating p and q as independent and setting p + q = 1 at the end):

    ⟨n⟩ = Σ_n n P_N(n) = (p d/dp)(p + q)^N = Np(p + q)^(N-1) = Np,

    ⟨n²⟩ = (p d/dp)²(p + q)^N = Np(p + q)^(N-1) + N(N - 1)p²(p + q)^(N-2) = Np + N(N - 1)p².

Using Var(x) = ⟨x²⟩ - ⟨x⟩², the variance of the binomial distribution is

    Var(n) = Np + N(N - 1)p² - (Np)² = Np - Np² = Np(1 - p) = Npq.
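The closed forms derived above can be checked numerically. The sketch below (my addition; N and p are arbitrary illustrative values) sums n P_N(n) and n² P_N(n) directly over equation 6 and compares the results with Np and Np(1 - p).

    from math import comb

    N, p = 20, 0.25
    pmf = [comb(N, n) * p**n * (1 - p)**(N - n) for n in range(N + 1)]

    mean = sum(n * P for n, P in enumerate(pmf))
    var = sum(n**2 * P for n, P in enumerate(pmf)) - mean**2

    print(mean, "vs", N * p)              # <n>    = Np
    print(var,  "vs", N * p * (1 - p))    # Var(n) = Np(1-p) = Npq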

Handwritten appendix: normalization of the Gaussian distribution (cf. exercise 6). Squaring the integral and switching to polar coordinates,

    [∫_{-∞}^{∞} e^{-x²/(2σ²)} dx]² = ∫∫ e^{-(x² + y²)/(2σ²)} dx dy = ∫_0^{2π} dθ ∫_0^{∞} r e^{-r²/(2σ²)} dr = 2πσ²,

so ∫_{-∞}^{∞} e^{-x²/(2σ²)} dx = sqrt(2πσ²), which is exactly the prefactor in equation 9. Shifting x -> x - µ does not change the integral, so the Gaussian distribution is properly normalized.