Lecture On Probability Distributions

1 Random Variables & Probability Distributions

Earlier we defined a random variable as a way of associating each outcome in a sample space with a real number. In our dice rolling experiment, each of the 36 possible outcomes must be associated with one of the numbers 2 through 12. We call x, the sum of the two faces on the dice, a random variable, since it associates each outcome with a real number from 2 to 12.

<How many different sums can possibly turn up on the roll of two dice?> 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or eleven different numbers.

Now, for a while we will limit ourselves to random variables which can take on a finite number of values, i.e., we will limit ourselves to discrete random variables... later we will include a discussion of continuous random variables when we talk about the normal distribution.

1.1. Probability Mass Functions:

Now that we know that a random variable is a real-numbered representation of the outcomes of an experiment, the question is how we associate our measures of probability with the different values of our random variable.

Definition: A probability MASS function is a function which assigns a probability P(x) to each real number x within the range of a discrete random variable x.

Examples: For each roll of two (fair) dice, the probability that they will sum to a given number is given by the table. {next slide} This
correspondence between a value of a random variable and a measure of probability for that value is a probability function: it connects each value of the random variable with its probability.

Sum of Dice:   2     3     4     5     6     7     8     9     10    11    12
Probability:   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Total probability: 1

<What is P(5)? 4/36. What is P(9)? 4/36. What is the sum of the probabilities over all values of the R.V.?> (1)

Now, whenever we can, we like to express probability functions by means of compact formulas; we'll see momentarily how we can write the probability functions for some very important probability distributions.
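To see where the entries in the table above come from, here is a minimal Python sketch (an illustration added here, not part of the original lecture) that enumerates the 36 equally likely outcomes of two fair dice and tallies the probability of each sum:

```python
from fractions import Fraction
from itertools import product

# Tally the 36 equally likely outcomes of rolling two fair dice by their sum.
counts = {}
for d1, d2 in product(range(1, 7), repeat=2):
    counts[d1 + d2] = counts.get(d1 + d2, 0) + 1

# Probability mass function: P(s) = (number of ways to roll sum s) / 36.
pmf = {s: Fraction(n, 36) for s, n in counts.items()}
for s, n in sorted(counts.items()):
    print(f"P({s}) = {n}/36")   # P(2) = 1/36, ..., P(7) = 6/36, ..., P(12) = 1/36

# The probabilities over the whole range 2..12 must sum to 1.
assert sum(pmf.values()) == 1
```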
{next slide}

[Figure: Probability Distribution for Sum of Two Dice. Bar chart; x-axis: Sum of Dice, 2 to 12; y-axis: percentage, 0 to 18.]

1.2. CUMULATIVE PROBABILITY FUNCTION:

A cumulative probability function gives the probability that a random variable takes on a value less than or equal to a given value. Take the dice throwing example just presented.

<What is the range of values of the RV Xs?> 2 to 12.

<What is Pr(Xs ≤ 5)?> = P(x=2) + P(x=3) + P(x=4) + P(x=5) = 1/36 + 2/36 + 3/36 + 4/36 = 10/36, i.e.,

$F(5) = \sum_{x_i \le 5} P(x_i) = \frac{10}{36}$.
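The cumulative function is easy to compute by brute force; the following sketch (again an added illustration, not from the lecture itself) rebuilds the PMF and sums it up to the point of interest:

```python
from fractions import Fraction
from itertools import product

# PMF of the sum of two fair dice, as in the table above.
pmf = {}
for d1, d2 in product(range(1, 7), repeat=2):
    pmf[d1 + d2] = pmf.get(d1 + d2, Fraction(0)) + Fraction(1, 36)

def cdf(x):
    """Cumulative probability F(x) = Pr(X <= x): sum of P(x_i) over all x_i <= x."""
    return sum(p for s, p in pmf.items() if s <= x)

print(cdf(5))    # 5/18, i.e. 10/36, matching the hand calculation above
print(cdf(12))   # 1: the entire sample space lies at or below 12
```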
{next slide}

[Figure: Cumulative Probability Distribution for Sum of Two Dice. x-axis: Sum of Dice, 2 to 12; y-axis: percentage, 0 to 100.]

1.3. Properties of a Probability Function

To summarize, the probability function of a random variable has two basic properties: {next slide}

Properties of a probability function:

$0 \le P(\tilde{x} = x) \le 1$
The probability of any value of a random variable is in the range 0 to 1, inclusive.

$\sum_{\text{all } x} P(\tilde{x} = x) = 1.0$
The probabilities of all values of a random variable in the sample space sum to 1.

1.4. Features of probability functions

<Now, when we roll two (fair) dice, what is the most likely sum of the two?> If you look at the probability distribution, you would guess 7, right? {next slide}
[Figure: Probability Distribution for Sum of Two Dice, with the peak at 7 labeled "most likely (or 'expected') value". x-axis: Sum of Dice, 2 to 12; y-axis: percentage, 0 to 18.]

Well, your intuition is actually quite good; the outcome most expected is the one with the highest probability of occurring. This works for a symmetric distribution, but not necessarily for all distributions. Just as we defined the mean of an empirical distribution of data as roughly the center of the distribution, we can define the mean of a random variable roughly as the center of mass of the probability distribution of the random variable. Mathematically, we weight each possible outcome by its probability and sum those products. That sum is the expected value of the random variable: {next slide}
Features of Probability Functions: Measure of Central Location:

$E(\tilde{x}) = \sum_{\text{all } x} x \, P(x)$

The expected value of a random variable is the weighted sum of all its possible values, with the weights being the probability of each value. So, let's see how the formula for expected value works in the dice tossing case: {next slide}

Sum of Dice   Probability   Product x · P(x)
2             1/36          2 x 0.028 = 0.056
3             2/36          3 x 0.056 = 0.167
4             3/36          4 x 0.083 = 0.333
5             4/36          5 x 0.111 = 0.556
6             5/36          6 x 0.139 = 0.833
7             6/36          7 x 0.167 = 1.167
8             5/36          8 x 0.139 = 1.111
9             4/36          9 x 0.111 = 1.000
10            3/36          10 x 0.083 = 0.833
11            2/36          11 x 0.056 = 0.611
12            1/36          12 x 0.028 = 0.333

Total probability: 1

Expected Value $= \sum_{\text{all } x} x_i \, P(\tilde{x} = x_i) = 7.0$
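The same weighted sum can be checked in a couple of lines of Python (an added sketch; the closed-form count 6 - |x - 7| of ways to roll each sum is just a compact restatement of the table above):

```python
from fractions import Fraction

# P(x) for the sum of two fair dice: (6 - |x - 7|) ways out of 36, as in the table.
pmf = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

# E(x) = sum over all x of x * P(x): weight each value by its probability.
expected = sum(x * p for x, p in pmf.items())
print(expected)   # 7, the center of mass of the distribution
```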
Secondly, we have the expression for the variance of a probability distribution: {next slide}

Measure of the Spread of the Distribution:

$V(\tilde{x}) = E\big[(\tilde{x} - \mu)^2\big] = \sum_{\text{all } x} (x - \mu)^2 \, P(x)$

The variance of a random variable is the expected value of the squared deviations from the mean of the probability distribution; that is, the weighted sum of the squared deviations, with the weights being the probability of each value.
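Continuing the dice example in the same added-sketch style, the variance formula can be applied term by term:

```python
from fractions import Fraction

# P(x) for the sum of two fair dice.
pmf = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

mu = sum(x * p for x, p in pmf.items())                # E(x) = 7

# V(x) = sum over all x of (x - mu)^2 * P(x).
var = sum((x - mu) ** 2 * p for x, p in pmf.items())
print(var)                  # 35/6, about 5.83
print(float(var) ** 0.5)    # standard deviation, about 2.42
```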
1.5. Proof of Tchebysheff's Theorem

A while back we talked about Tchebysheff's Theorem and about how, for any distribution, it allowed us to develop conservative estimates of the probability of obtaining a value within k standard deviations of the mean. Now that we know a little more about probability distributions, we can actually prove that theorem. Here's the theorem: {next slide}

Tchebysheff's Theorem: Given any probability distribution with mean $\mu$ and standard deviation $\sigma$, the probability of obtaining a value within k standard deviations of the mean is at least $1 - 1/k^2$, i.e.,

$\Pr\big(|\tilde{x} - \mu| \le k\sigma\big) \ge 1 - \frac{1}{k^2}$.

Note that Tchebysheff's theorem also implies that the probability that we get a value more than k standard deviations away from the mean is at most $1/k^2$.

Proof: {next slide}

1. We know from our definition of the variance of a probability distribution that:

$\sigma^2 = V(\tilde{x}) = \sum_{\text{all } x} (x - \mu)^2 \, P(x)$

2. We can split our probability distribution into 3 parts corresponding to the areas (a) outside the $\mu \pm k\sigma$ region and (b) inside it:

region 1: $x \le \mu - k\sigma$; region 2: $\mu - k\sigma < x < \mu + k\sigma$; region 3: $x \ge \mu + k\sigma$ {next slide}

3. Then we can decompose the variance as follows:

$\sigma^2 = \sum_{\text{region 1}} (x - \mu)^2 P(x) + \sum_{\text{region 2}} (x - \mu)^2 P(x) + \sum_{\text{region 3}} (x - \mu)^2 P(x)$

4. Since all three of these terms are greater than or equal to zero, we can drop the second term, leaving us with:

$\sigma^2 \ge \sum_{\text{region 1}} (x - \mu)^2 P(x) + \sum_{\text{region 3}} (x - \mu)^2 P(x)$

5. Now, since the absolute value of the deviation of x from the mean is at least $k\sigma$ for every term in regions 1 and 3, we can write:

$\sigma^2 \ge k^2\sigma^2 \sum_{\text{region 1}} P(x) + k^2\sigma^2 \sum_{\text{region 3}} P(x)$
or, dividing through by $k^2\sigma^2$:

$\frac{1}{k^2} \ge \sum_{\text{region 1}} P(x) + \sum_{\text{region 3}} P(x) = \Pr(x \le \mu - k\sigma) + \Pr(x \ge \mu + k\sigma)$

Therefore, the probability that a random variable is more than k standard deviations away from the mean of its probability distribution is less than or equal to $1/k^2$. End of Proof of Tchebysheff's theorem.

1.6. More Rules of Expectations

Here are some more rules of expectations associated with probability functions: {next slide}

More Rules of Expectations:

$E(k) = k$, where k is a constant.
$V(k) = 0$: the variance of a constant is zero (duh!).
$E(k\tilde{x}) = k\,E(\tilde{x})$.
$V(k\tilde{x}) = k^2\,V(\tilde{x})$ (the weight $k^2$ is always positive).
$E(\tilde{x} + \tilde{y}) = E(\tilde{x}) + E(\tilde{y})$.
$E(\tilde{x}\,\tilde{y}) = E(\tilde{x})\,E(\tilde{y})$ if $\tilde{x}$ and $\tilde{y}$ are independent.
$V(\tilde{x} + \tilde{y}) = V(\tilde{x}) + V(\tilde{y})$ if $\tilde{x}$ and $\tilde{y}$ are independent.
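A quick way to convince yourself of the last three rules is to enumerate two independent dice (another added sketch; building the joint distribution as a product of the marginals is exactly what independence licenses):

```python
from fractions import Fraction
from itertools import product

# One fair die: P(x) = 1/6 for x = 1..6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def E(pmf):
    """Expected value: sum of x * P(x)."""
    return sum(x * p for x, p in pmf.items())

def V(pmf):
    """Variance: sum of (x - mu)^2 * P(x)."""
    mu = E(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Independence lets us build the joint distribution as P(x, y) = P(x) * P(y).
joint = {(x, y): px * py
         for (x, px), (y, py) in product(die.items(), die.items())}

# E(x + y) = E(x) + E(y): holds with or without independence.
assert sum((x + y) * p for (x, y), p in joint.items()) == E(die) + E(die)   # 7

# E(xy) = E(x) E(y): needs independence.
assert sum(x * y * p for (x, y), p in joint.items()) == E(die) * E(die)     # 49/4

# V(x + y) = V(x) + V(y): needs independence.
mu = E(die) + E(die)
assert sum((x + y - mu) ** 2 * p
           for (x, y), p in joint.items()) == V(die) + V(die)               # 35/6
```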
The Expectation of a Linear Transformation of a Random Variable:

The linear transformation is:

$\tilde{y} = a + b\tilde{x}$

This implies that the expectation of the linear transformation of $\tilde{x}$ is:

$E(\tilde{y}) = E(a + b\tilde{x}) = a + b\,E(\tilde{x})$

A particularly useful linear transformation: take any random variable $\tilde{x}$ with mean $\mu$ and standard deviation $\sigma$, and form the new random variable:

$\tilde{z} = \frac{\tilde{x} - \mu}{\sigma}$

or, rearranging,

$\tilde{z} = \frac{1}{\sigma}\tilde{x} - \frac{\mu}{\sigma}$.

Now, let's find the expectation and variance of our new random variable $\tilde{z}$:

$E(\tilde{z}) = \frac{1}{\sigma}E(\tilde{x}) - \frac{\mu}{\sigma} = \frac{\mu}{\sigma} - \frac{\mu}{\sigma} = 0.$

The expectation of our new random variable $\tilde{z}$ is zero!
The variance of our transformed random variable is:

$V(\tilde{z}) = V\left(\frac{1}{\sigma}\tilde{x} - \frac{\mu}{\sigma}\right) = \frac{1}{\sigma^2}V(\tilde{x}) + V\left(\frac{\mu}{\sigma}\right)$  (Expectations Rules: $V(k\tilde{x}) = k^2 V(\tilde{x})$ and $V(k) = 0$)

$= \frac{\sigma^2}{\sigma^2} + 0 = 1$

So, our new random variable has a variance of 1. Notice that this standardized random variable, z, has a mean = 0 and a variance/standard deviation = 1 irrespective of the type of distribution from which the random variable comes. These properties of the standardized random variable, z, will prove to be extremely useful when we begin to work with the normal distribution. Indeed, much of what we will be doing for the rest of the course involves taking a random variable x and transforming it into a new random variable y that has known properties (such as σ = 1 and expected value = 0).
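As a concrete check, standardizing the dice sum distribution from earlier gives a mean of 0 and a variance of 1 (an added sketch; the square root forces floating-point arithmetic, so expect tiny rounding error):

```python
from fractions import Fraction

# P(x) for the sum of two fair dice, with mean 7 and variance 35/6.
pmf = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}
mu = sum(x * p for x, p in pmf.items())
sigma = float(sum((x - mu) ** 2 * p for x, p in pmf.items())) ** 0.5

# Standardize: z = (x - mu) / sigma.  Each z value keeps the probability
# of the x value it came from.
z_pmf = {(x - mu) / sigma: p for x, p in pmf.items()}

E_z = sum(z * p for z, p in z_pmf.items())
V_z = sum((z - E_z) ** 2 * p for z, p in z_pmf.items())
print(E_z)   # 0 (up to floating-point rounding)
print(V_z)   # 1 (up to floating-point rounding)
```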
1.7. Summary

Now, we're at the point in the course where we start collecting a lot of information directly from probability theory which will later be enormously useful when we approach statistical inference. But first, let's see where we've been:

1. We started off with the notion of uncertainty: of attempting to find out things about an unknown population on the basis of samples from that population.

2. We defined a measure of our uncertainty and called it probability, and we said that our probability measure ought to have certain properties.

3. Then we went on to deduce other features of our probability measure which follow directly from our postulates.

Now, it's time to start looking directly at specific probability distributions that have known and useful properties that we can use in statistical inference to reduce our uncertainty about characteristics of unknown populations.

2 Binomial Distribution

Repeated identical trials are called Bernoulli trials if three conditions are satisfied:

1. each trial has two possible outcomes, denoted generically s, for success, and f, for failure;

2. the trials are independent; and

3. the probability of a success remains the same from trial to trial; it is called the success probability and denoted p.

The binomial distribution is the probability distribution for the number of successes in a sequence of Bernoulli trials.

Example: Mortality: Mortality tables enable actuaries to obtain the probability that a person at any particular age will live a specified number of years. Such probabilities, in turn, permit the determination of life-insurance premiums, retirement pensions, annuity payments, and related items of importance to insurance companies and others. According to tables provided by the U.S. National Center for Health Statistics in Vital Statistics of the United States, there is about an 80% chance