Review of Basic Probability Theory

Review of Basic Probability Theory
James H. Steiger
Department of Psychology and Human Development
Vanderbilt University

Review of Basic Probability Theory
1 Introduction
2 Set Theory
   Definition of a Set
   Complement, Union, Intersection, and Difference
   Set Partition and Mutually Exclusive Sets
3 Probabilistic Experiments and Events
   Probabilistic Experiments and their Sample Spaces
   Elementary and Compound Events
4 Axioms and Basic Theorems of Probability
5 Basic Rules for Computing Probability
   The General Rule
6 Joint Events, Conditional Probability, and Independence
   Joint and Marginal Events
   Conditional Probability
7 Sequences and the Multiplicative Rules
   Sequences as Intersections
   The Multiplicative Rules
8 Probability Distributions
   Discrete Probability Distributions
   Continuous Probability Distributions
   Calculating Probability with R
9 Random Variables and Their Characteristics
   Random Variables
   Expected Value of a Random Variable
   Variance and Covariance of Random Variables
   The Algebra of Expected Values, Variances, Covariances and Linear Combinations

Introduction

If you took Psychology 310, you had a thorough introduction to set theory and probability theory from the ground up. RDASA Chapter 3 provides a very condensed account of many of the key facts we developed. I'll review these facts here, but lecture slides and recordings from Psychology 310 are also available should you need them. MWL use the notation p(A) to stand for the probability of an event A, while I use the notation Pr(A).

Set Theory

Some grounding in basic set theory is very useful for gaining a solid understanding of probability theory. Many books attempt to discuss probability theory without ever formally defining set theory, relying instead on an informal version of it. This saves time, but also reduces precision, depending on the sophistication of the reader. MWL use informal language and notation, while I shall stick with the formal notation of set theory. In the long run, it will be useful to you to be able to read texts using either notational variation.

Set Theory: Definition of a Set

A set is any well-defined collection of objects. The term "object" is used loosely. It could refer to numbers, ideas, people, or fruits. The elements of a set are the objects in the set.

Set Theory: Complement, Union, Intersection, and Difference

The universal set Ω is the set of all objects currently under consideration. The null (empty) set ∅ is a set with no elements. The complement of A, i.e., Ā, is the set of all elements not in A (but in Ω).

Consider two sets A and B:
The union of A and B, i.e., A ∪ B, is the set of all elements in A, in B, or in both.
The intersection of A and B, i.e., A ∩ B, is the set of all elements in both A and B. Informal discussions of probability will refer to Pr(A and B) to signify Pr(A ∩ B).
The set difference, A − B, is defined as the set of elements in A but not in B. That is, A − B = A ∩ B̄ (the intersection of A with the complement of B).

Set Theory: Set Partition and Mutually Exclusive Sets

Two sets A and B are said to be mutually exclusive if they have no elements in common, i.e., A ∩ B = ∅. A set of events is exhaustive of another set if their union is equal to that set.

Set Theory: Set Partition and Mutually Exclusive Sets

A group of n sets E_i, i = 1, ..., n, is said to partition the set A if the E_i are mutually exclusive and exhaustive with respect to A. Note the following:
1. Any set is the union of its elements.
2. Any set is partitioned by its elements.

Probabilistic Experiments and Events: Probabilistic Experiments and their Sample Spaces

A probabilistic experiment E is a situation in which
1. Something can happen.
2. What can happen is well-defined.
3. The outcome is potentially uncertain.

Probabilistic Experiments and Events: Probabilistic Experiments and their Sample Spaces

Associated with every probabilistic experiment E_i is a sample space S_i. The sample space is defined as the universal set of all possible outcomes of E_i.

Probabilistic Experiments and Events: Elementary and Compound Events

The elementary events in a sample space are the elements of S. As such, they constitute the finest possible partition of S.

Example (Elementary Events). You throw a fair die, and observe the number that comes up. The elementary events are 1, 2, 3, 4, 5, 6.

Probabilistic Experiments and Events: Elementary and Compound Events

Compound events are the union of two or more elementary events.

Example (Compound Events). In the die throw experiment, the event Even = 2 ∪ 4 ∪ 6 = {2, 4, 6} is a compound event.

Axioms and Basic Theorems of Probability

Given a sample space S and any event A within S, we assign to the event a number called the probability of A, written Pr(A). The probabilities in S must satisfy the following:
1. Pr(A) ≥ 0
2. Pr(S) = 1
3. If two events A and B in S are mutually exclusive, then Pr(A ∪ B) = Pr(A) + Pr(B).

Axioms and Basic Theorems of Probability

From the axioms, we can immediately derive the following theorems:

Pr(Ā) = 1 − Pr(A)
Pr(∅) = 0
Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)

Basic Rules for Computing Probability: The General Rule

For any event A composed of elementary events e_i, i.e., A = e_1 ∪ e_2 ∪ ... ∪ e_n, the probability of A is the sum of the probabilities of the elementary events in A, i.e.,

Pr(A) = Pr(e_1) + Pr(e_2) + ... + Pr(e_n)

Basic Rules for Computing Probability: The General Rule

Since the probabilities of the elementary events must sum to 1, if the elementary events are equally likely, then every one of the n_S elementary events has probability 1/n_S. Consequently, the total probability of an event A can be computed as

Pr(A) = n_A / n_S    (1)

where n_A is the number of elementary events in A, and n_S is the number of elementary events in the sample space.
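
Equation (1) is easy to illustrate in R. The following is a minimal added sketch (my example, not part of the original slides) computing the probability of an even number on a fair die:

S <- 1:6                  # sample space of the die throw
A <- S[S %% 2 == 0]       # the compound event Even = {2, 4, 6}
length(A) / length(S)     # Pr(A) = n_A / n_S = 0.5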

Joint Events, Conditional Probability, and Independence: Joint and Marginal Events

In many situations, two or more physical processes are occurring simultaneously, and we can define the elementary events of the sample space to be the intersections of the outcomes on the two processes. For example, I might throw a die and a coin simultaneously. In that case, we have 12 elementary events, each one of which is an intersection. They are H ∩ 1, H ∩ 2, ..., T ∩ 1, ..., T ∩ 6. The marginal events are the outcomes on the individual processes, i.e., H, T, 1, 2, 3, 4, 5, 6.

Joint Events, Conditional Probability, and Independence: Joint and Marginal Events

Here is a tabular representation. Note that the probability of a marginal event in a row or column is the sum of the probabilities of the joint events in that row or column. (Why?)

                 Die
  Coin     1      2      3      4      5      6
  H       H∩1    H∩2    H∩3    H∩4    H∩5    H∩6
  T       T∩1    T∩2    T∩3    T∩4    T∩5    T∩6
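
The row-and-column summation noted above can be checked in R. This added sketch assumes, purely for illustration, a fair coin and a fair die thrown independently (the dependent table two slides ahead is a different example):

coin <- c(H = 1/2, T = 1/2)
die  <- rep(1/6, 6); names(die) <- 1:6
joint <- outer(coin, die)   # 2 x 6 table of joint probabilities
rowSums(joint)              # coin marginals: 0.5 0.5
colSums(joint)              # die marginals: 1/6 each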

Joint Events, Conditional Probability, and Independence: Conditional Probability

The conditional probability of A given B, written Pr(A | B), is the probability of A within the reduced sample space defined by B. To evaluate conditional probability, we simply move inside the class of events defined by B, and calculate what proportion of the events in B are examples of A. For example, suppose the probabilities of our joint events were as given in the table on the next slide.

Joint Events, Conditional Probability, and Independence: Conditional Probability

  Coin \ Die      1     2     3     4     5     6   | marginal
  H              1/6    0    1/6    0    1/6    0   |   1/2
  T               0    1/6    0    1/6    0    1/6  |   1/2
  marginal       1/6   1/6   1/6   1/6   1/6   1/6  |

What is Pr(1 | H)? Pr(1 | T)? What is Pr(H | 1)? Pr(H | 2)?

Notice that the conditional probabilities within any conditionalizing event are simply the joint probabilities inside that event, re-standardized so that they add up to 1. They are re-standardized by dividing by what they currently add up to, i.e., Pr(B). This leads to the formula for conditional probability often given as its definition:

Pr(A | B) = Pr(A ∩ B) / Pr(B)    (2)
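
The questions above can be answered directly from Equation (2). Here is an added R sketch (not part of the original slides) built from the joint table:

joint <- rbind(H = c(1/6, 0, 1/6, 0, 1/6, 0),
               T = c(0, 1/6, 0, 1/6, 0, 1/6))
colnames(joint) <- 1:6
joint["H", "1"] / sum(joint["H", ])   # Pr(1 | H) = 1/3
joint["T", "1"] / sum(joint["T", ])   # Pr(1 | T) = 0
joint["H", "1"] / sum(joint[, "1"])   # Pr(H | 1) = 1
joint["H", "2"] / sum(joint[, "2"])   # Pr(H | 2) = 0
joint["H", ] / sum(joint["H", ])      # the full conditional distribution given H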

Joint Events, Conditional Probability, and Independence: Conditional Probability

Conditional probability reflects how the probabilities of the outcomes on one process change after we are informed about the status of the outcome of the other process. If knowledge of one process changes the probability structure for the other, we say that the processes are dependent. If knowledge of one process does not change the probability structure for the other, we say that the processes are independent. This leads to two formally equivalent definitions of independence. Two events A and B are independent if

Pr(A | B) = Pr(A)    (3)

or

Pr(A ∩ B) = Pr(A) Pr(B)    (4)

Sequences and the Multiplicative Rules: Sequences as Intersections

Sequences of events are, of course, events themselves. Consider when we shuffle a poker deck and draw two cards off the top. The sequence of cards involves Card 1 followed by Card 2. If we now consider the event "Ace on the first card followed by Ace on the second card," we quickly realize that this is the intersection of two events, i.e., A_1 ∩ A_2. How can we compute the probability of this event?

Sequences and the Multiplicative Rules: The Multiplicative Rules

Recall that Pr(B | A) = Pr(B ∩ A) / Pr(A) = Pr(A ∩ B) / Pr(A). This implies that

Pr(A ∩ B) = Pr(A) Pr(B | A)    (5)

If A and B are independent, the rule simplifies to

Pr(A ∩ B) = Pr(A) Pr(B)

Sequences and the Multiplicative Rules: The Multiplicative Rules

These rules are sometimes referred to as the multiplicative rules of probability. They generalize to sequences of any length. For example, suppose you draw 3 cards from a poker deck without replacement. Since the outcome on each card affects the probability structure of the succeeding draws, the events are not independent, and the probability of drawing 3 aces can be written using the general multiplicative rule as

Pr(A_1 ∩ A_2 ∩ A_3) = Pr(A_1) Pr(A_2 | A_1) Pr(A_3 | A_1 ∩ A_2)    (6)
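
Plugging in the numbers for a standard 52-card deck with 4 aces, Equation (6) becomes (4/52)(3/51)(2/50). A one-line check in R (an added calculation, not from the slides):

(4/52) * (3/51) * (2/50)    # approximately 0.000181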

Probability Distributions

There are two fundamental types of random variables, discrete and continuous. With discrete random variables, the number of events is countable. With continuous random variables, the number of events is uncountably infinite, and the events are continuous over at least some range on the number line.

Probability Distributions: Discrete Probability Distributions

When the number of events is countable and finite, the probability of each elementary event can be calculated, and the sum of these probabilities is 1. In this case, each event has a probability, and the probability density function, or pdf, is usually denoted p(x). This is the probability that the random variable X takes on the value x, i.e., p(x) = Pr(X = x). The cumulative probability function, F(x), is the cumulative probability up to and including x, i.e., F(x) = Pr(X ≤ x).
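
Both functions are easy to tabulate for a small discrete distribution. Here is an added R sketch (my example, not from the slides) for a fair six-sided die:

x <- 1:6
p <- rep(1/6, 6)    # p(x) = Pr(X = x)
sum(p)              # the probabilities sum to 1
F <- cumsum(p)      # F(x) = Pr(X <= x)
F[4]                # Pr(X <= 4) = 2/3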

Probability Distributions: Continuous Probability Distributions

When a random variable X is continuous, it takes on all values over some continuous range, and so the number of outcomes is uncountably infinite. For continuous random variables, the probability of a particular outcome x cannot be defined, even if x can occur; the probability is infinitesimally small. Rather than defining the probability of x, we instead define an alternative concept, probability density. Probability density f(x) is not probability. Rather, it is the instantaneous rate at which the cumulative probability F is increasing at x. That is, it is the slope (derivative) of F(x) at x. In a similar fashion, F(x) is the area under the probability density curve; it is the integral of f(x). Although the probability of a single value x cannot be defined with continuous random variables, the probability of an interval can be. To compute the probability of an interval, we simply take the difference of the two cumulative probabilities. That is,

Pr(a ≤ X ≤ b) = F(b) − F(a)
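
The relationship between f and F can be verified numerically. This added sketch (not from the slides) integrates the standard normal density up to 1 and compares the result with the cumulative probability at 1:

integrate(dnorm, lower = -Inf, upper = 1)   # about 0.8413
pnorm(1)                                    # 0.8413447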

Probability Distributions: Calculating Probability with R

R provides a set of functions for performing probability calculations for many of the most important distributions. The key function types are
1. d, for calculating probability (discrete r.v.) or probability density (continuous r.v.)
2. p, for calculating cumulative probability.
3. q, for calculating inverse cumulative probabilities, or quantiles.
To perform these calculations, you need to know the code for the distribution you are working with. Some common codes are:
1. norm. The normal distribution.
2. t. Student's t distribution.
3. f. The F distribution.
4. chisq. The chi-square distribution.
You combine the function type and the distribution name to obtain the name of the function you need to perform a particular calculation.
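
For instance, combining a function type with a distribution code (these are standard R functions; the particular values are only added illustrations):

dnorm(0)              # standard normal density at 0 (about 0.399)
dbinom(2, 10, 0.5)    # Pr(X = 2) for a Binomial(10, 0.5); binom is another available code
qchisq(0.95, df = 1)  # 95th percentile of a chi-square with 1 df (about 3.84)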

Probability Distributions: Calculating Probability with R

What value of the standard normal distribution is at the 95th percentile?

> qnorm(0.95)
[1] 1.645

What is the probability that a t-variable with 19 degrees of freedom is less than or equal to 1.00?

> pt(1, 19)
[1] 0.8351

What is the probability that an observation from a normal distribution with mean 500 and standard deviation 100 will be between 600 and 700?

> pnorm(700, 500, 100) - pnorm(600, 500, 100)
[1] 0.1359

Random Variables and Their Characteristics: Random Variables

In many introductory courses, the informal notion of a random variable X is that it is a process that generates numerical outcomes according to some rule. This definition suffices for practical purposes in such courses, and we can get by with it here. A more formal definition is that, given a probabilistic experiment E with sample space of outcomes S, a random variable X is a function assigning a unique number to each outcome in S. Because each outcome is connected to a unique number, each unique number inherits all the probabilities attached to its outcomes. Here is a simple example from the discrete case of a die throw.

Random Variables and Their Characteristics: Random Variables

Suppose you throw a fair die, and code the outcomes as in the table below.

  Outcome (in S)   1   2   3   4   5   6
  Value of X       1   0   1   0   1   0

The random variable X would then have the probability distribution shown in the following table.

  x    P_X(x)
  1     1/2
  0     1/2
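
The same induced distribution can be recovered programmatically. An added R sketch (assumed code, not from the slides):

outcome <- 1:6
X <- ifelse(outcome %% 2 == 1, 1, 0)   # odd outcomes coded 1, even coded 0
tapply(rep(1/6, 6), X, sum)            # P_X(0) = 1/2, P_X(1) = 1/2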

Random Variables and Their Characteristics: Expected Value of a Random Variable

The expected value, or mean, of a random variable X, denoted E(X) (or, alternatively, µ_X), is the long-run average of the values taken on by the random variable. Technically, this quantity is defined differently depending on whether a random variable is discrete or continuous. For some random variables, E(|X|) = ∞, and we say that the expected value does not exist.
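
For the discrete die-coded example above, the expected value is the probability-weighted average of the values of X. A brief added R sketch also checks the long-run-average interpretation by simulation:

x <- c(1, 0)
p <- c(1/2, 1/2)
sum(x * p)                                        # E(X) = 0.5
mean(sample(x, 1e5, replace = TRUE, prob = p))    # long-run average, approximately 0.5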

Random Variables and Their Characteristics: Variance and Covariance of Random Variables

The variance of a random variable X is its average squared deviation score. We can convert a random variable into deviation scores by subtracting its mean, i.e., X − E(X). So the squared deviation score is (X − E(X))², and the average squared deviation score is

Var(X) = σ²_X = E[(X − E(X))²]

An alternate, very useful formula is

Var(X) = σ²_X = E(X²) − (E(X))²
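
Both formulas give the same answer. Here is an added R check (not from the slides) using the die-coded random variable from the earlier example:

x <- c(1, 0); p <- c(1/2, 1/2)
EX <- sum(x * p)
sum((x - EX)^2 * p)     # E[(X - E(X))^2] = 0.25
sum(x^2 * p) - EX^2     # E(X^2) - (E(X))^2 = 0.25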

Random Variables and Their Characteristics: Variance and Covariance of Random Variables

In a similar vein, the covariance of two random variables X and Y is their average cross-product of deviation scores, i.e.,

Cov(X, Y) = E[(X − E(X))(Y − E(Y))]    (7)

which may also be calculated as

Cov(X, Y) = E(XY) − E(X) E(Y)    (8)
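
Equations (7) and (8) agree, as a quick simulation shows. This is an added sketch with arbitrary correlated variables (not part of the slides); sample means stand in for expectations:

set.seed(1)
x <- rnorm(1e5)
y <- x + rnorm(1e5)
mean((x - mean(x)) * (y - mean(y)))   # E[(X - E(X))(Y - E(Y))], about 1
mean(x * y) - mean(x) * mean(y)       # E(XY) - E(X) E(Y), same value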

Random Variables and Their Characteristics: The Algebra of Expected Values, Variances, Covariances and Linear Combinations

Properties of the expected value and variance of a random variable are developed in detail in the Psychology 310 notes, and in the handout chapter on Covariance Algebra. For our purposes, the key properties are:
1. Under linear transforms, the expected value (mean) of a random variable behaves the same as the mean of a list of numbers, i.e., for constants a, b and random variable X,

   E(aX + b) = a E(X) + b    (9)

2. If X and Y are random variables and are uncorrelated,

   E(XY) = E(X) E(Y)    (10)

3. The mean, variance, and covariance of linear combinations follow the identical rules we developed earlier. For example,

   E(aX + bY) = a E(X) + b E(Y)    (11)

   Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)    (12)

   Cov(X + Y, X − Y) = Var(X) − Var(Y)    (13)
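
Rules (12) and (13) hold exactly for sample variances and covariances as well, which makes them easy to verify in R. An added simulation sketch (not from the slides):

set.seed(2)
x <- rnorm(1e5)
y <- x + rnorm(1e5)
var(x + y)                        # equals the next line (up to floating point)
var(x) + var(y) + 2 * cov(x, y)
cov(x + y, x - y)                 # equals the next line as well
var(x) - var(y)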