EE 178 Probabilistic Systems Analysis, Spring 2018
Lecture 6: Random Variables: Probability Mass Function and Expectation

Probability Mass Function

When we introduced the basic probability model in Note 1, we defined three things: 1) the basic random variables; 2) the sample space Ω consisting of all possible outcomes of the experiment; 3) the probability of each of the outcomes.

Usually, a probability model consists of multiple random variables. If we want to focus on just one of them, two things about it are important: 1) the set of values it can take; 2) the probabilities with which it takes on those values.

Let a be any number in the range of a random variable X. Since X = a is an event, we can talk about its probability, P(X = a). The collection of these probabilities, for all possible values of a, is known as the probability mass function or distribution of the r.v. X.

Definition 6.1 (probability mass function or distribution): The probability mass function (or distribution) of a random variable X is the collection of values {(a, P_X(a) = P(X = a)) : a ∈ A}, where A is the set of all possible values taken by X.

The probability mass function of a random variable can be computed from the probabilities of the outcomes. For example, consider an experiment consisting of two independent rolls of a die. Let X be the result of the first roll and Y the result of the second roll. The sample space, shown in Figure 1, has 36 outcomes, each with probability 1/36. Then P(X = 3) is simply the sum of the probabilities of the outcomes in the 3rd column, and P(Y = 2) is the sum of the probabilities of the outcomes in the 2nd row. Hence

    P_X(a) = P(X = a) = 1/6,  a = 1, ..., 6,
    P_Y(b) = P(Y = b) = 1/6,  b = 1, ..., 6.

Note that the collection of events X = a, for a ∈ A, satisfies two important properties:
- any two events X = a_1 and X = a_2 with a_1 ≠ a_2 are disjoint;
- the union of all these events is the entire sample space Ω.

The collection of events thus forms a partition of the sample space. As a consequence, the sum of the probabilities P(X = a) over all possible values of a is exactly 1: when we sum up the probabilities of the events X = a, we are really summing up the probabilities of all the outcomes. In the dice rolling example, the events X = 1, X = 2, X = 3, X = 4, X = 5, X = 6 are the six columns of the two-dimensional sample space and form a partition of it.

In the dice rolling example, the probability mass functions of X and Y are computed from the probability assignment to the outcomes. But more often than not, things happen in reverse: a probability model is put together by first specifying the probability mass functions of the individual random variables, and then the whole model is completed by specifying the probabilistic relationship between them.
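
To make this concrete, here is a minimal Python sketch (not part of the original notes; variable names are ours) that builds the 36 equally likely outcomes of the two-dice experiment and recovers P_X(a) by summing outcome probabilities, exactly as described above.

    from fractions import Fraction
    from collections import defaultdict

    # Sample space for two independent rolls of a fair die:
    # 36 outcomes (x, y), each with probability 1/36.
    outcomes = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

    # P_X(a) = P(X = a): sum the probabilities of the outcomes in "column" a.
    P_X = defaultdict(Fraction)
    for (x, y), prob in outcomes.items():
        P_X[x] += prob

    print(P_X[3])             # 1/6
    print(sum(P_X.values()))  # 1, since the events X = a partition the sample space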

Figure 1: The sample space for the example of rolling two dice. The column, the row, and the diagonal correspond to the three events X = 3, Y = 2, and S = 4 respectively, where S = X + Y.

For example, the probabilistic model for the dice rolling example is built by assuming that X and Y each have the uniform probability mass function, and the whole model is constructed by assuming that X and Y are independent.

Defining New Random Variables

A probability model is built by defining certain basic random variables. But more often than not, we want to ask questions about other random variables which are not among the basic ones but can be defined in terms of them. For example, the basic random variables for the dice rolling problem are X, the result of the roll of the first die, and Y, the result of the roll of the second die. But maybe we are not interested in these random variables individually but in their sum S = X + Y. Then S is a random variable in its own right. Just like the basic random variables X and Y, the newly defined random variable S has a probability mass function P_S(c) = P(S = c), and this can be computed. In our example,

    P(S = 4) = \sum_{a=1}^{3} P(X = a, Y = 4 - a) = \sum_{a=1}^{3} P_X(a) P_Y(4 - a) = 1/12.

The event S = 4 corresponds to the diagonal subset of outcomes in Figure 1. The entire probability mass function is shown in the table below:

    a        2     3     4     5     6     7     8     9     10    11    12
    P_S(a)   1/36  1/18  1/12  1/9   5/36  1/6   5/36  1/9   1/12  1/18  1/36

The distribution of a general random variable X, whether it is a basic random variable or one defined in terms of the basic random variables, can be visualized as a bar diagram, as shown in Figure 2. The x-axis represents the values that the random variable can take on, and the height of the bar at a value a is the probability P(X = a). Each of these probabilities can be computed by looking at the probability of the corresponding event in the sample space.
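
Continuing the same (hypothetical) sketch, the pmf of the derived random variable S = X + Y can be computed by grouping outcomes according to their sum; the printed values agree with the table above.

    from fractions import Fraction
    from collections import defaultdict

    outcomes = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

    # P_S(c) = P(S = c): add up the probabilities of all outcomes with x + y = c.
    P_S = defaultdict(Fraction)
    for (x, y), prob in outcomes.items():
        P_S[x + y] += prob

    print(P_S[4])  # 1/12, the diagonal event in Figure 1
    print(P_S[7])  # 1/6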

Figure 2: Visualization of how the distribution of a random variable is defined. The bottom part of the figure refers to an example to be discussed in a later lecture.

Binomial Distribution

Suppose you have n balls and select k out of the n balls. How many such subsets of k balls exist? We denote the number of such subsets by \binom{n}{k}. To compute \binom{n}{k} in terms of n and k, consider the number of ways to rearrange the n items. This is n!, which equals n(n-1) \cdots 2 \cdot 1. Another way to count the rearrangements is to first choose which k items occupy the first k positions, in \binom{n}{k} ways, and then rearrange within the first k positions and within the last n - k positions. Therefore, we must have

    n! = \binom{n}{k} \, k! \, (n-k)!

and therefore

    \binom{n}{k} = \frac{n!}{k!(n-k)!}.

The binomial distribution is one of the most important distributions in probability. It can be defined in terms of a coin-tossing experiment. Consider n independent tosses of a biased coin with Heads probability p. Define the random variable X_i = 1 if the ith toss is Heads, and X_i = 0 otherwise. Let X be the number of Heads. Note that X can be defined in terms of the basic random variables X_i: X = X_1 + ... + X_n. To compute the distribution of X, we first enumerate the possible values X can take on; they are simply 0, 1, ..., n. Then we compute the probability of each event X = i for i = 0, ..., n. The probability of the event X = i is the sum of the probabilities of all the outcomes with exactly i Heads. Any such outcome has probability p^i (1-p)^{n-i}, and there are exactly \binom{n}{i} such outcomes. So

    P(X = i) = \binom{n}{i} p^i (1-p)^{n-i},  i = 0, 1, ..., n.    (1)

This is the binomial distribution with parameters n and p. A random variable with this distribution is called a binomial random variable (for brevity, we write X ~ Bin(n, p)). An example of a binomial distribution is shown in Figure 3.

Although we defined the binomial distribution in terms of an experiment involving tossing coins, this distribution is useful for modeling many real-world problems. Consider for example the problem of reliable data storage in the face of hard disk failures. The technology is called RAID. (See http://en.wikipedia.org/wiki/raid.) Reliability is provided by adding redundancy and using error-correction coding: the data is distributed across n disks and can be recovered as long as no more than k disks fail. (The parameters n and k depend on the level of RAID used.) Assuming each disk fails independently with probability p, the number of disk failures X is binomially distributed with parameters n and p. So the probability that the data is unrecoverable is

    P(X > k) = \sum_{i=k+1}^{n} \binom{n}{i} p^i (1-p)^{n-i}.

For a given value of p, we can choose k large enough that this probability is very small, i.e. the data is recoverable with probability at least, say, 0.99.
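
As an illustration (a sketch, not from the notes; the numbers for n, k, and p are made up, and math.comb requires Python 3.8 or later), equation (1) and the unrecoverability probability P(X > k) can be evaluated directly:

    from math import comb

    def binomial_pmf(i, n, p):
        # P(X = i) for X ~ Bin(n, p), as in equation (1).
        return comb(n, i) * p**i * (1 - p)**(n - i)

    def prob_unrecoverable(n, k, p):
        # P(X > k): probability that more than k of the n disks fail.
        return sum(binomial_pmf(i, n, p) for i in range(k + 1, n + 1))

    # Hypothetical numbers, for illustration only: 10 disks, each failing
    # independently with probability 0.05, data recoverable if at most 2 fail.
    print(prob_unrecoverable(n=10, k=2, p=0.05))  # roughly 0.01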

Figure 3: The binomial distributions for two choices of (n, p).

Joint Probability Mass Functions

The pmf of a random variable X summarizes all the probabilistic information about it. When we have two random variables X and Y, the events of interest are X = a and Y = b for all possible values (a, b) that (X, Y) can take on. Thus, a natural generalization of the notion of pmf to multiple random variables is the following.

Definition 6.2 (joint pmf (distribution)): The joint pmf (distribution) of two discrete random variables X and Y is the collection of values {(a, b, P_{X,Y}(a, b) := P(X = a, Y = b)) : (a, b) ∈ A × B}, where A and B are the sets of all possible values taken by X and Y respectively.

This notion obviously generalizes to three or more random variables. In fact, the probability assignment to the outcomes of the sample space can be viewed as the joint pmf of all the basic random variables X_1, ..., X_n defining the probability model. Just like the distribution of a single random variable, the joint distribution is normalized, i.e.

    \sum_{a \in A, \, b \in B} P_{X,Y}(a, b) = 1.

This follows from noticing that the events {X = a, Y = b} (where a ranges over A and b ranges over B) partition the sample space.

The joint distribution of two random variables fully describes their statistical relationship. Moreover, the individual distributions of X and Y can be recovered from the joint distribution as follows:

    P_X(a) = \sum_{b \in B} P_{X,Y}(a, b),  a ∈ A,    (2)
    P_Y(b) = \sum_{a \in A} P_{X,Y}(a, b),  b ∈ B.    (3)

The first follows from the fact that the events Y = b (where b ranges over B) form a partition of the sample space Ω, so the events {X = a, Y = b} (where b ranges over B) are disjoint and their union is the event X = a. The second fact follows for similar reasons.

Pictorially, one can think of the joint distribution values as entries filling a table, with the columns indexed by the values that X can take on and the rows indexed by the values that Y can take on (see Figure 4).
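
The marginalization formulas (2) and (3) translate directly into code. Below is a small sketch (not from the notes) with a made-up joint pmf stored as a table keyed by (a, b); note that these particular marginals would not, by themselves, pin down the joint table.

    from fractions import Fraction
    from collections import defaultdict

    # A small, made-up joint pmf P_{X,Y}(a, b), stored as a table keyed by (a, b).
    joint = {
        (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
        (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
    }

    # Equation (2): P_X(a) = sum over b of P_{X,Y}(a, b)   (sum down a column).
    # Equation (3): P_Y(b) = sum over a of P_{X,Y}(a, b)   (sum across a row).
    P_X, P_Y = defaultdict(Fraction), defaultdict(Fraction)
    for (a, b), prob in joint.items():
        P_X[a] += prob
        P_Y[b] += prob

    print(sum(joint.values()))   # 1: the joint distribution is normalized
    print(dict(P_X), dict(P_Y))  # marginals: {0: 1/2, 1: 1/2} and {0: 3/8, 1: 5/8}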

Figure 4: A tabular representation of a joint distribution.

To get the distribution of X, all one needs to do is sum the entries in each of the columns. To get the distribution of Y, just sum the entries in each of the rows. This process is sometimes called marginalization, and the individual distributions are sometimes called marginal distributions to differentiate them from the joint distribution.

Note that, in general, the individual distributions of X and Y alone do not fully specify the joint distribution. However, in the special case when X and Y are independent, we have

    P_{X,Y}(a, b) = P_X(a) P_Y(b).
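
Finally, a short sketch (again ours, not from the notes) of the independent case: the joint pmf of the two dice from Figure 1 is just the product of the marginals.

    from fractions import Fraction

    # Marginal pmfs of the two dice from Figure 1 (each uniform on 1..6).
    P_X = {a: Fraction(1, 6) for a in range(1, 7)}
    P_Y = {b: Fraction(1, 6) for b in range(1, 7)}

    # Independence: P_{X,Y}(a, b) = P_X(a) * P_Y(b).
    joint = {(a, b): P_X[a] * P_Y[b] for a in P_X for b in P_Y}

    print(joint[(3, 2)])        # 1/36
    print(sum(joint.values()))  # 1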