Lecture 2: Probability and Distributions
Sandy Eckel (seckel@jhsph.edu), 22 April 2008

Probability: Why do we care?
Probability helps us by:
- Allowing us to translate scientific questions into mathematical notation
- Providing a framework for answering scientific questions
Later, we will see how some common statistical methods in the scientific literature are actually probability concepts in disguise.

What is Probability?
Probability is a measure of uncertainty about the occurrence of events. There are two definitions of probability: the classical definition and the relative frequency definition.

Classical Definition
If an event can occur in N equally likely and mutually exclusive ways, and if m of these ways possess the characteristic E, then the probability of E is

P(E) = m/N

Example: Coin toss
Flip one coin. Tails and heads are equally likely, so there are N = 2 possible events. Let H = Heads and T = Tails. We are interested in the probability of tails:

P(Tails) = P(T) = 1/2

Relative Frequency Definition
If an experiment is repeated n times, and characteristic E occurs m of those times, then the relative frequency of E is m/n, and it is approximately equal to the probability of E:

P(E) ≈ m/n

Example: Multiple coin tosses I
Flip 100 coins:

Outcome      Frequency
T = Tails           53
H = Heads           47
Total              100

P(Tails) = P(T) ≈ 53/100 = 0.53 ≈ 0.50

Example: Multiple coin tosses II
What happens if we flip 10,000 coins?

Outcome      Frequency
T = Tails         5063
H = Heads         4937
Total            10000

P(Tails) = P(T) ≈ 5063/10000 ≈ 0.51 ≈ 0.50
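The coin-toss tables above can be reproduced by simulation. A minimal sketch, assuming a fair coin; the function name and the fixed seed are illustrative, not part of the lecture:

```python
import random

def relative_frequency_of_tails(n_flips, seed=0):
    """Flip a fair coin n_flips times; return the relative frequency of tails."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    tails = sum(rng.random() < 0.5 for _ in range(n_flips))
    return tails / n_flips

# The relative frequency approaches P(T) = 0.5 as the number of flips grows.
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency_of_tails(n))
```

As the slides note, the relative frequency settles near 0.5 only in the long run; small samples can wander noticeably.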

Relative frequency intuition
The probability of T is the limit of the relative frequency of T as the sample size n goes to infinity: the "long run" relative frequency.

Outcome characteristics
- Statistical independence
- Mutually exclusive

Statistical Independence
Two events are statistically independent if the joint probability of both events occurring is the product of the probabilities of each event occurring:

P(A and B) = P(A) × P(B)

Example
Let A = first-born child is female, and B = second child is female. Then P(A and B) is the probability that the first and second children are both female. Assuming independence:

P(A and B) = P(A) × P(B) = 1/2 × 1/2 = 1/4

Statistical independence: comment
In a study where we are selecting patients at random from a population of interest, we assume that the outcomes we observe are independent. In what situations would this assumption be violated?

Mutually exclusive
Two events are mutually exclusive if the joint probability of both events occurring is 0:

P(A and B) = 0

Example: A = first child is female, B = first child is male.

ASIDE: Independence and mutual exclusivity aren't the same
Two events are independent if P(A and B) = P(A) × P(B). Two events are mutually exclusive if P(A and B) = 0. So the events A and B are both mutually exclusive AND independent only when P(A) = 0 or P(B) = 0.

Probability rules
1. The probability of any event is non-negative and no greater than 1: 0 ≤ P(E) ≤ 1.
2. Given n mutually exclusive events E_1, E_2, ..., E_n covering the sample space, the probabilities of the events sum to 1:
   P(E_1) + P(E_2) + ... + P(E_n) = 1
3. If E_i and E_j are mutually exclusive events, then the probability that either E_i or E_j occurs is:
   P(E_i ∪ E_j) = P(E_i) + P(E_j)
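The three probability rules can be checked mechanically for a simple sample space. A sketch using exact fractions and a fair six-sided die; the die example is an illustration, not from the slides:

```python
from fractions import Fraction

# A fair six-sided die: six mutually exclusive, equally likely outcomes.
p = {face: Fraction(1, 6) for face in range(1, 7)}

# Rule 1: every probability lies between 0 and 1.
assert all(0 <= prob <= 1 for prob in p.values())

# Rule 2: mutually exclusive events covering the sample space sum to 1.
assert sum(p.values()) == 1

# Rule 3: for mutually exclusive events, probabilities add.
assert p[1] + p[2] == Fraction(2, 6)
print("all three rules hold")
```

Using `Fraction` rather than floats keeps the arithmetic exact, so rule 2 is an equality rather than an approximation.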

Set notation
A set is a collection of distinct objects; an element of a set is an object in the set. The union of two sets A and B is the set containing all elements in A, in B, or in both (notation: A ∪ B). The intersection of two sets A and B is the set containing all elements found in both A and B (notation: A ∩ B).

The addition rule
If two events A and B are not mutually exclusive, then the probability that event A or event B occurs is:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

where P(A ∩ B) is the probability that both events occur.

Conditional probability
The conditional probability of an event A given an event B is:

P(A | B) = P(A ∩ B) / P(B), where P(B) ≠ 0

The multiplication rule
In general, P(A ∩ B) = P(B) × P(A | B). When events A and B are independent, P(A | B) = P(A), and so:

P(A ∩ B) = P(A) × P(B)
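The addition rule can be illustrated by brute-force enumeration. This sketch uses a standard 52-card deck, an example not in the lecture, with "heart" and "face card" as two non-mutually-exclusive events:

```python
from fractions import Fraction
from itertools import product

# Build a standard 52-card deck as (suit, rank) pairs.
suits = ["hearts", "diamonds", "clubs", "spades"]
ranks = list(range(2, 11)) + ["J", "Q", "K", "A"]
deck = list(product(suits, ranks))

def prob(event):
    """Exact probability of an event (a predicate on cards), outcomes equally likely."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

heart = lambda card: card[0] == "hearts"
face = lambda card: card[1] in ("J", "Q", "K")

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
lhs = prob(lambda card: heart(card) or face(card))
rhs = prob(heart) + prob(face) - prob(lambda card: heart(card) and face(card))
assert lhs == rhs == Fraction(22, 52)  # 13 hearts + 12 face cards - 3 overlaps
```

Subtracting P(A ∩ B) corrects for the 3 face-card hearts that would otherwise be counted twice.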

Bayes' rule
Useful for computing P(B | A) if P(A | B) and P(A | B^c) are known.
Example (screening): we know P(test positive | true positive); we want P(true positive | test positive).
Example: Bayesian statistics uses assumptions about P(data | state of the world) to derive statements about P(state of the world | data).

The rule:

P(B | A) = P(A | B) P(B) / [ P(A | B) P(B) + P(A | B^c) P(B^c) ]

where B^c denotes the complement of B, or "not B".

Example: Sex and Age I

                 Age
            Young (B1)  Older (B2)  Total
Male (A1)        30          20       50
Female (A2)      40          10       50
Total            70          30      100

Example: Sex and Age II
A1 = {all males}, A2 = {all females}
B1 = {all young}, B2 = {all older}
A1 ∪ A2 = {all people} = B1 ∪ B2
A1 ∩ A2 = {no people} = ∅ = B1 ∩ B2
A1 ∪ B1 = {male or young}
A1 ∪ B2 = {male or older}
A2 ∩ B2 = {female and older}

Example: Sex and Age III
P(A1) = P(male) = 50/100 = 0.5
P(A2) = P(female) = 50/100 = 0.5
P(B1) = P(young) = 70/100 = 0.7
P(B2) = P(older) = 30/100 = 0.3

Example: Sex and Age IV
Using the same Sex and Age table:
P(A2 ∩ B2) = P(older and female) = 10/100 = 0.1
P(A1 ∪ B1) = P(young or male) = P(A1) + P(B1) − P(A1 ∩ B1) = 50/100 + 70/100 − 30/100 = 90/100 = 0.9

Example: Sex and Age V
P(B2 | A2) = P(older | female) = P(B2 ∩ A2)/P(A2) = (10/100)/(50/100) = 10/50 = 0.2
P(B2 | A1) = P(older | male) = P(B2 ∩ A1)/P(A1) = (20/100)/(50/100) = 20/50 = 0.4
P(B2) = P(older) = 30/100 = 0.3

Example: Sex and Age VI
P(B2 | A2) ≠ P(B2 | A1) ≠ P(B2). In this group, sex and age are not independent.

Example: Sex and Age VII
Try these on your own:
P(A1 ∩ A2) = ?
P(B1 ∪ B2) = ?
P(A2 ∪ B2) = ?
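The conditional probabilities in the Sex and Age slides can be recomputed directly from the table counts. A sketch; the `counts` dictionary and the helper `p` are illustrative, not lecture code:

```python
# Joint counts from the Sex and Age table: (sex, age group) -> count.
counts = {("male", "young"): 30, ("male", "older"): 20,
          ("female", "young"): 40, ("female", "older"): 10}
total = sum(counts.values())  # 100 people

def p(cells):
    """Probability of the event consisting of the listed table cells."""
    return sum(counts[c] for c in cells) / total

male = [("male", "young"), ("male", "older")]
female = [("female", "young"), ("female", "older")]
older = [("male", "older"), ("female", "older")]

p_older = p(older)                                           # P(B2) = 0.3
p_older_given_male = p([("male", "older")]) / p(male)        # P(B2 | A1) = 0.4
p_older_given_female = p([("female", "older")]) / p(female)  # P(B2 | A2) = 0.2
print(p_older, p_older_given_male, p_older_given_female)
```

Since P(B2 | A1) and P(B2 | A2) differ from P(B2), the code confirms that sex and age are not independent in this table.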

Example: Blood Groups I

                   Sex
Blood group   Male   Female   Total
O              113      170     283
A              103      155     258
B               25       37      62
AB              10       15      25
Total          251      377     628

Example: Blood Groups II
P(male) = 1 − P(female) = 251/628 ≈ 0.4
P(O) = 283/628 ≈ 0.45
P(A) = 258/628 ≈ 0.41
P(B) = 62/628 ≈ 0.10
P(AB) = 25/628 ≈ 0.04

Example: Blood Groups III
Question: are sex and blood group independent?
P(O | male) = 113/251 ≈ 0.45
P(O | female) = 170/377 ≈ 0.45
These are the same as P(O) = 283/628 ≈ 0.45, and we can show the same equalities for all blood types. Yes, sex and blood group appear to be independent of each other in this sample.

Example: Disease in the population
For patients with Disease X, suppose we knew the age proportions within each sex, as well as the sex distribution. Question: could we compute the sex proportions in each age group (young/older)? Answer: use Bayes' rule.
A1 = {males}, A2 = {females}; B1 = {young}, B2 = {older}
P(A1) = P(A2) = 0.5
P(B2 | A2) = 0.2, P(B2 | A1) = 0.4

P(A2 | B2) = P(B2 | A2) P(A2) / [ P(B2 | A2) P(A2) + P(B2 | A1) P(A1) ]
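The Bayes' rule setup for the Disease X example can be carried through numerically; plugging in the stated values gives the share of older patients who are female. The variable names are illustrative:

```python
# Known inputs from the slide.
p_male, p_female = 0.5, 0.5          # P(A1), P(A2)
p_older_given_male = 0.4             # P(B2 | A1)
p_older_given_female = 0.2           # P(B2 | A2)

# Bayes' rule: P(A2 | B2) = P(B2|A2)P(A2) / [P(B2|A2)P(A2) + P(B2|A1)P(A1)]
numerator = p_older_given_female * p_female
p_female_given_older = numerator / (numerator + p_older_given_male * p_male)
print(round(p_female_given_older, 3))  # 1/3 of the older patients are female
```

The denominator is just P(B2) expanded over the two sexes, which is why Bayes' rule only needs the conditional age proportions and the sex distribution.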

Probability Distributions
Often, we assume a true underlying distribution. Example: P(tails) = 1/2, P(heads) = 1/2. This distribution is characterized by a mathematical formula and a set of possible outcomes. There are two types of distributions: discrete and continuous.

Commonly Used Distributions
Discrete:
- Binomial: two possible outcomes. Underlies much of the statistical application to epidemiology; the basic model for logistic regression.
- Poisson: used for counts of events or rates; the basis for log-linear models.
Continuous:
- Normal: the bell-shaped curve. Many characteristics are normally distributed, or approximately so; the basic model for linear regression.
- Exponential: useful in describing growth.

Counting techniques
- Factorials: count the number of ways to arrange things.
- Permutations: count the number of possible ordered arrangements of subsets of a given size.
- Combinations: count the number of possible unordered arrangements of subsets of a given size.

Factorials
Notation: n! ("n factorial") is the number of possible arrangements of n objects:

n! = n(n−1)(n−2)(n−3)···(3)(2)(1)

Standard convention: 0! = 1.

Permutations
The number of ways you can take r objects from a total of n objects when order matters:

nPr = n!/(n−r)! = n(n−1)(n−2)···(n−r+1)

Combinations
The number of ways you can take r objects from a total of n objects when order doesn't matter ("n choose r"):

C(n, r) = n!/(r!(n−r)!)

Example: C(4, 2) = 4!/(2!(4−2)!) = (4·3·2·1)/(2!·2!) = 24/4 = 6

Note: the number of combinations is less than or equal to the number of permutations.

The Binomial Distribution
You've seen it before: 2×2 tables and applications, proportions (CIs and tests), sensitivity and specificity, odds ratio and relative risk, logistic regression.

Binomial Distribution Assumptions
Bernoulli trial model: the study or experiment consists of n smaller experiments (trials), each of which has only two possible outcomes:
- Dead or alive
- Success or failure
- Diseased or not diseased
The outcomes of the trials are independent, and the probabilities of the outcomes remain the same from trial to trial.
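Python's standard library implements these counting functions directly, which makes the formulas easy to check. A sketch, assuming Python 3.8+ (where `math.perm` and `math.comb` were added):

```python
from math import comb, factorial, perm

assert factorial(4) == 24   # 4! = 4*3*2*1
assert perm(4, 2) == 12     # ordered: 4!/(4-2)! = 12
assert comb(4, 2) == 6      # unordered: 4!/(2!*2!) = 6, matching the slide

# Combinations never exceed permutations: each unordered subset of size r
# corresponds to r! ordered arrangements.
for r in range(5):
    assert comb(4, r) <= perm(4, r)
print("counting identities verified")
```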

Binomial Distribution Function
The probability of obtaining x successes in n Bernoulli trials is:

P(X = x) = C(n, x) p^x (1−p)^(n−x)

where p = probability of a success, q = 1−p = probability of a failure, X is a random variable, and x is a particular number.

Example: Binomial (n=2) I
What is the probability, in a random sample of size 2, of observing 0, 1, or 2 heads?

# heads   Possible outcome   Probability
2         HH                 p·p = p^2
1         HT                 p·q
1         TH                 q·p
0         TT                 q·q = q^2

Example: Binomial (n=2) II and III
Recall that x = number of observed heads and P(X = x) = C(n, x) p^x (1−p)^(n−x):

P(X = 0) = C(2, 0) (0.5)^0 (0.5)^2 = [2!/(0!(2−0)!)] (1)(0.5)^2 = 0.25 = q^2
P(X = 1) = C(2, 1) (0.5)^1 (0.5)^1 = [2!/(1!(2−1)!)] (0.5)(0.5) = 2(0.5)(0.5) = 0.5 = 2pq
P(X = 2) = C(2, 2) (0.5)^2 (0.5)^0 = [2!/(2!(2−2)!)] (0.25) = 0.25 = p^2
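The binomial formula above is a one-liner in code; `binomial_pmf` is an illustrative helper name, not from the lecture:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Two fair coin tosses (n = 2, p = 0.5), matching the slide:
print([binomial_pmf(x, 2, 0.5) for x in (0, 1, 2)])  # [0.25, 0.5, 0.25]
```

The three values sum to 1, as they must for a probability distribution over all possible outcomes.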

Example: Binomial (n=3) I

# successes   Samples                 P(X = x)
3             {+++}                   C(3,3) p^3 q^0 = p^3
2             {++−, +−+, −++}         C(3,2) p^2 q = 3p^2 q
1             {+−−, −+−, −−+}         C(3,1) p q^2 = 3p q^2
0             {−−−}                   C(3,0) p^0 q^3 = q^3

Example: Binomial (n=3) II
Since X takes discrete values only:
P(X ≤ 1) = P(X = 0) + P(X = 1)
P(X < 1) = P(X = 0)
P(X > 2) = P(X = 3)
P(1 ≤ X ≤ 2) = P(X = 1) + P(X = 2)
P(X ≥ 1) = P(X = 1) + P(X = 2) + P(X = 3) = 1 − P(X = 0)

Example: Binomial (n=3) III
The probability that a person suffering from a head cold will obtain relief with a particular drug is 0.9. Three randomly selected cold sufferers are given the drug. Here p = 0.9, q = 1 − p = 0.1, and n = 3.

Example: Binomial (n=3) IV
There are 2^3 = 8 possible outcomes:

Trial 1   Trial 2   Trial 3   Probability
S         S         S         ppp
S         S         F         ppq
S         F         S         pqp
F         S         S         qpp
F         F         S         qqp
F         S         F         qpq
S         F         F         pqq
F         F         F         qqq

Example: Binomial (n=3) V
Probability that exactly zero (none) obtain relief:

P(X = 0) = C(3, 0) p^0 q^3 = q^3 = (0.1)^3 = 0.001

Probability that exactly one obtains relief:

P(X = 1) = C(3, 1) p^1 q^2 = [3!/(1!2!)] p q^2 = 3 p q^2 = 3(0.9)(0.1)^2 = 0.027

Mean and Variance
The mean of a random variable (r.v.) X, also called its expected value or expectation:
µ = E(X) = Σ_i x_i P(X = x_i) for a discrete r.v.
µ = E(X) = ∫ x f(x) dx for a continuous r.v.
The variance of a random variable X:
σ^2 = Var(X) = E[(X − µ)^2] = E(X^2) − µ^2
The standard deviation is σ = sqrt(Var(X)).

Example: Bernoulli Distribution
Let X = 1 with probability p, and 0 otherwise.
Calculation of the mean: E(X) = µ = Σ_i x_i P(X = x_i) = 1·p + 0·(1−p) = p
Calculation of the variance: E(X^2) = (1^2)·p + (0^2)·(1−p) = p, so Var(X) = E(X^2) − µ^2 = p − p^2 = p(1−p)

Properties of Expectation
1. E(c) = c, where c is a constant
2. E(cX) = c E(X)
3. E(X_1 + X_2) = E(X_1) + E(X_2)
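The mean and variance definitions can be verified by direct enumeration over a discrete distribution. `mean_var` below is an illustrative helper applied to the Bernoulli(0.9) case from the slides:

```python
def mean_var(pmf):
    """Mean and variance of a discrete r.v. given as {value: probability}."""
    mu = sum(x * prob for x, prob in pmf.items())
    var = sum(x**2 * prob for x, prob in pmf.items()) - mu**2  # E(X^2) - mu^2
    return mu, var

# Bernoulli(p): E(X) = p and Var(X) = p(1 - p), as derived on the slide.
p = 0.9
mu, var = mean_var({1: p, 0: 1 - p})
print(mu, var)  # approximately 0.9 and 0.09
```

The same helper works for any finite discrete distribution, such as the binomial examples above, by passing the full value-to-probability table.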

Properties of Variance
1. Var(c) = 0, where c is a constant
2. Var(cX) = c^2 Var(X)
3. Var(X_1 + X_2) = Var(X_1) + Var(X_2) if X_1 and X_2 are independent

Binomial Mean and Variance
If S is Binomial(n, p), then S = Σ_{i=1}^n X_i, where the X_i are independent Bernoulli(p) random variables, so:
E(S) = Σ_{i=1}^n E(X_i) = Σ_{i=1}^n p = np
Var(S) = Σ_{i=1}^n Var(X_i) = Σ_{i=1}^n p(1−p) = np(1−p)

Poisson Distribution
Describes occurrences or objects that are distributed randomly in space or time. Often used to describe the distribution of the number of occurrences of a rare event. The underlying assumptions are similar to those for the binomial distribution; useful when there are counts with no denominator.

Distribution   Parameters needed
Binomial       n, p
Poisson        λ = np = the expected number of events per unit time

Poisson Distribution Examples
- Number of Prussian officers killed by horse kicks between 1875 and 1894
- Spatial distribution of stars, weeds, bacteria, flying-bomb strikes
- Emergency room or hospital admissions
- Typographical errors
- Deaths due to a rare disease

Poisson Assumptions
- The occurrences of a random event in an interval of time are independent.
- In theory, an infinite number of occurrences of the event are possible (though perhaps rare) within the interval.
- In any extremely small portion of the interval, the probability of more than one occurrence of the event is approximately zero.

Poisson Probability
The probability of x occurrences of an event in an interval is:

P(X = x) = e^(−λ) λ^x / x!,   x = 0, 1, 2, ...

where λ is the expected number of occurrences in the interval and e is a constant (≈ 2.718). For the Poisson distribution, mean = variance = λ.

Example: Traffic accidents I
Suppose the goal has been set of bringing the expected number of traffic accidents per day in Baltimore down to 3. There are 5 fatal accidents today. Has the goal been attained?
The number of accidents follows a Poisson distribution because:
- The population that drives in Baltimore is large
- The number of accidents is relatively small
- People have similar risks of having an accident (?)
- The number of people driving each day is fairly stable
- The probability of two accidents occurring at exactly the same time is approximately zero

Example: Traffic accidents II
We are aiming for a rate of λ = 3 fatal accidents per day, or lower. The observed number is 5:

P(X = 5; λ = 3) = e^(−3) 3^5 / 5! = 0.101

Has the goal been attained?
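The traffic-accident calculation can be verified with a short script; `poisson_pmf` is an illustrative helper name, not lecture code:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e**(-lam) * lam**x / x!"""
    return exp(-lam) * lam**x / factorial(x)

# Traffic accidents: target rate lam = 3 per day, 5 fatal accidents observed.
print(round(poisson_pmf(5, 3), 3))  # 0.101
```

A probability of about 0.1 for exactly 5 accidents is not so small that observing 5 in one day clearly contradicts a rate of 3, which is the point of the slide's closing question.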

Example: Suicide in the City
If the rate for a given rare condition is expressed as µ per time period, the expected number of events is µt, where t is the time period. Some questions we can answer using the properties of the Poisson distribution:
- Suppose the weekly rate of suicide in a large city is 2. What is the probability of one suicide in a given week? P(X = 1; λ = 2)
- What is the probability of 2 suicides in 2 weeks? Since the weekly rate of suicide is 2 per week, we expect 2 × 2 = 4 suicides per 2-week period, so we can use the Poisson distribution to calculate P(X = 2; λ = 4).

Poisson and Binomial
The Poisson distribution can be used to approximate a Binomial(n, p) distribution when n is large and p is very small, or when np = λ is fixed and n becomes infinitely large.

Example: Cancer in a large population
Yearly cases of esophageal cancer in a large city; 30 cases were observed in 1990:

P(X = 30) = e^(−λ) λ^30 / 30!

where λ = the yearly average number of cases of esophageal cancer.

Example: Down's syndrome I
The incidence of Down's syndrome as a function of mother's age.
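The Poisson-approximates-Binomial claim can be checked numerically. This sketch uses Binomial(25, 1/100) against Poisson(λ = np = 0.25), the setting of the Down's syndrome example; the helper names are illustrative:

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    # P(X = x) = C(n, x) p^x (1-p)^(n-x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    # P(X = x) = e^(-lam) lam^x / x!
    return exp(-lam) * lam**x / factorial(x)

# Binomial(25, 1/100) vs its Poisson(lam = np = 0.25) approximation.
n, p = 25, 0.01
for x in range(3):
    print(x, round(binomial_pmf(x, n, p), 3), round(poisson_pmf(x, n * p), 3))
```

The printed pairs agree to three decimal places, reproducing the comparison table in the Down's syndrome slides.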

Example: Down's syndrome II
Suppose the incidence of Down's syndrome in babies of 40-year-old mothers is 1/100. Out of 25 babies born to 40-year-old women, what is the frequency of babies with Down's syndrome? We can approach this problem using a Binomial(25, 1/100) model or a Poisson(λ = 0.25) model.

Example: Down's syndrome III

Babies with Down's        P(X = x)
syndrome              Poisson   Binomial
0                      0.779      0.778
1                      0.195      0.196
2                      0.024      0.024
>2                     0.002      0.002

Note: the approximation becomes even better for larger values of n.

Lecture 2 Summary
We've covered a lot of ground this lecture. A quick summary of what we just discussed:
- Probability: commonly used definitions and properties, including independence, mutual exclusivity, the addition rule, conditional probability, the multiplication rule, and Bayes' rule
- Discrete distributions: Binomial and Poisson
Tomorrow we'll discuss the Normal distribution, the Central Limit Theorem, the t-distribution, and confidence intervals.