Lecture 4: Random Variables and Distributions

Goals: Random variables; an overview of discrete and continuous distributions important in genetics/genomics; working with distributions in R.

Random Variables: A rv is any rule (i.e., a function) that associates a number with each outcome in the sample space. (Figure: sample space outcomes mapped to points such as -1, 0, 1 on the real line.)

Two Types of Random Variables: A discrete random variable has a countable number of possible values. A continuous random variable takes all values in an interval of numbers.

Discrete Probability Distributions of RVs: Let X be a discrete rv. Then the probability mass function (pmf) f(x) of X is f(x) = P(X = x) for each possible value x, and f(x) = 0 otherwise. Continuous: Let X be a continuous rv. Then the probability density function (pdf) of X is a function f(x) such that for any two numbers a and b with a ≤ b, P(a ≤ X ≤ b) = ∫_a^b f(x) dx.

Using CDFs to Compute Probabilities: For a continuous rv with pdf f, the cdf is F(x) = P(X ≤ x) = ∫_{-∞}^x f(y) dy, and P(a ≤ X ≤ b) = F(b) - F(a).


Expectation of Random Variables: Discrete: Let X be a discrete rv that takes on values in the set D and has pmf f(x). Then the expected or mean value of X is µ_X = E[X] = Σ_{x∈D} x·f(x). Continuous: The expected or mean value of a continuous rv X with pdf f(x) is µ_X = E[X] = ∫_{-∞}^{∞} x·f(x) dx.

Variance of Random Variables: Discrete: Let X be a discrete rv with pmf f(x) and expected value µ. The variance of X is σ²_X = V[X] = Σ_{x∈D} (x - µ)²·f(x) = E[(X - µ)²]. Continuous: The variance of a continuous rv X with pdf f(x) and mean µ is σ²_X = V[X] = ∫_{-∞}^{∞} (x - µ)²·f(x) dx = E[(X - µ)²].

Example of Expectation and Variance: Let L_1, L_2, ..., L_n be a sequence of n nucleotides and define the rv X_i = 1 if L_i = A, and X_i = 0 otherwise. The pmf is then P(X_i = 1) = P(L_i = A) = p_A and P(X_i = 0) = P(L_i = C, G, or T) = 1 - p_A. E[X_i] = 1·p_A + 0·(1 - p_A) = p_A. Var[X_i] = E[(X_i - µ)²] = E[X_i²] - µ² = [1²·p_A + 0²·(1 - p_A)] - p_A² = p_A(1 - p_A).
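
A quick sketch of this in R, assuming a hypothetical value p_A = 0.25: simulating the Bernoulli indicators shows the sample mean and variance landing near p_A and p_A(1 - p_A).

  p_A <- 0.25                                 # hypothetical probability that a site is A
  x <- rbinom(100000, size = 1, prob = p_A)   # simulate Bernoulli(p_A) indicators X_i
  mean(x)                                     # close to p_A = 0.25
  var(x)                                      # close to p_A * (1 - p_A) = 0.1875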

The Distributions We'll Study: 1. Binomial Distribution, 2. Hypergeometric Distribution, 3. Poisson Distribution, 4. Normal Distribution.

Binomial Distribution: The experiment consists of n trials (e.g., 15 tosses of a coin, 20 patients, 1000 people surveyed). The trials are identical, and each can result in one of the same two outcomes (e.g., head or tail in each toss of a coin), generally called success and failure. The probability of success is p and the probability of failure is 1 - p. The trials are independent, with a constant success probability for each observation (e.g., the probability of getting a tail is the same each time we toss the coin).

Binomial Distribution: pmf: P(X = x) = (n choose x) p^x (1 - p)^(n - x). cdf: P(X ≤ x) = Σ_{y=0}^{x} (n choose y) p^y (1 - p)^(n - y). E(X) = np, Var(X) = np(1 - p).
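
In R, dbinom gives the binomial pmf and pbinom the cdf. A brief sketch with hypothetical values n = 10 and p = 0.3:

  n <- 10; p <- 0.3                 # hypothetical parameter values
  dbinom(4, size = n, prob = p)     # pmf: P(X = 4)
  pbinom(4, size = n, prob = p)     # cdf: P(X <= 4)
  n * p                             # E(X) = np
  n * p * (1 - p)                   # Var(X) = np(1 - p)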

Binomial Distribution: Example 1: A couple, who are both carriers for a recessive disease, wish to have 5 children. They want to know the probability that they will have four healthy kids: P(X = 4) = (5 choose 4) 0.75^4 × 0.25^1 ≈ 0.395. (Figure: bar plot of the pmf p(x) for x = 0, 1, ..., 5.)
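
The same calculation in R, with each child healthy with probability 0.75 across n = 5 independent births:

  dbinom(4, size = 5, prob = 0.75)    # P(X = 4), about 0.395
  dbinom(0:5, size = 5, prob = 0.75)  # the full pmf p(x) for x = 0, ..., 5 (the bar plot above)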

Binomial Distribution: Example 2: Wright-Fisher model: There are i copies of the A allele in a population of size 2N in generation t. What is the distribution of the number of A alleles in generation t + 1? p_ij = (2N choose j) (i / 2N)^j (1 - i / 2N)^(2N - j), for j = 0, 1, ..., 2N.
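
Since the next-generation count is binomial with 2N draws and success probability i/(2N), one generation of drift can be sketched in R; N = 100 and i = 30 below are purely hypothetical values for illustration:

  N <- 100; i <- 30                                # hypothetical population size and allele count
  rbinom(1, size = 2 * N, prob = i / (2 * N))      # sample the A-allele count in generation t + 1
  dbinom(25, size = 2 * N, prob = i / (2 * N))     # transition probability p_ij for a particular j, here j = 25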

Hypergeometric Distribution: The population to be sampled consists of a finite number N of individuals, objects, or elements. Each individual can be characterized as a success or a failure, with m successes in the population. A sample of size k is drawn, and the rv of interest is X = number of successes in the sample.

Hypergeometric Distribution: Similar in spirit to the binomial distribution, but sampling is from a finite population without replacement. Example: 20 white balls out of 100 balls. If we randomly sample 10 balls, what is the probability that 7 or more are white?

Hypergeometric Distribution: pmf of a hypergeometric rv: P(X = i | n, m, k) = (m choose i)(n choose k - i) / (m + n choose k), for i = 0, 1, 2, 3, ..., where k = number of balls selected, m = number of balls in the urn considered successes, n = number of balls in the urn considered failures, and m + n = total number of balls in the urn.
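
For the urn example above (m = 20 white, n = 80 non-white, k = 10 drawn), the probability of 7 or more white balls can be computed in R with dhyper and phyper:

  sum(dhyper(7:10, m = 20, n = 80, k = 10))              # P(X >= 7) by summing the pmf
  phyper(6, m = 20, n = 80, k = 10, lower.tail = FALSE)  # same value via the cdf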

Hypergeometric Distribution: Extensively used in genomics to test for enrichment. In that setting, the observed number of successes is the number of genes of interest with the annotation, the sample size is the number of genes of interest, and the number of successes in the population is the number of genes with the annotation (the annotated genes).
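
A sketch of such an enrichment test in R, with made-up counts purely for illustration (500 annotated genes out of 20,000 total, 100 genes of interest, 10 of them annotated); the one-sided p-value is P(X ≥ 10):

  phyper(10 - 1, m = 500, n = 20000 - 500, k = 100, lower.tail = FALSE)  # P(X >= 10 annotated genes among the genes of interest)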

Poisson Distribution: Useful in studying rare events. The Poisson distribution is also used in situations where events happen at certain points in time. The Poisson distribution approximates the binomial distribution when n is large and p is small.

Poisson Distribution: A rv X follows a Poisson distribution if the pmf of X is P(X = i) = e^(-λ) λ^i / i!, for i = 0, 1, 2, 3, .... λ is frequently a rate: λ = αt = expected number of events in t units of time, where α is the rate per unit time. The Poisson safely approximates a binomial experiment when n > 100, p < 0.01, and np = λ < 20. E(X) = Var(X) = λ.
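
In R, dpois and ppois give the Poisson pmf and cdf. A small sketch with a hypothetical rate λ = 2:

  lambda <- 2                               # hypothetical rate
  dpois(3, lambda)                          # pmf: P(X = 3)
  ppois(3, lambda)                          # cdf: P(X <= 3)
  exp(-lambda) * lambda^3 / factorial(3)    # matches dpois(3, lambda)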

Poisson RV: Example 1: The number of crossovers, X, between two markers is X ~ Poisson(λ = d): P(X = i) = e^(-d) d^i / i!, so P(X = 0) = e^(-d) and P(X ≥ 1) = 1 - e^(-d).

Poisson RV: Example 2: Recent work in Drosophila suggests the spontaneous rate of deleterious mutations is ~1.2 per diploid genome. Thus, let's tentatively assume X ~ Poisson(λ = 1.2) for humans. What is the probability that an individual has 12 or more spontaneous deleterious mutations? P(X ≥ 12) = 1 - Σ_{i=0}^{11} e^(-1.2) 1.2^i / i! ≈ 6.17 × 10^-9.
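
Checking this number in R:

  ppois(11, lambda = 1.2, lower.tail = FALSE)   # P(X >= 12), about 6.17e-09
  1 - sum(dpois(0:11, lambda = 1.2))            # same computation via the pmf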

Poisson RV: Example 3: Suppose that a rare disease has an incidence of 1 in 1000 people per year. Assuming that members of the population are affected independently, find the probability of k cases in a population of 10,000 (followed over 1 year) for k = 0, 1, 2. The expected value (mean) is λ = 0.001 × 10,000 = 10. P(X = 0) = 10^0 e^(-10) / 0! ≈ 0.0000454. P(X = 1) = 10^1 e^(-10) / 1! ≈ 0.000454. P(X = 2) = 10^2 e^(-10) / 2! ≈ 0.00227.
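
The same three probabilities in R:

  dpois(0:2, lambda = 10)   # P(X = 0), P(X = 1), P(X = 2): about 4.54e-05, 4.54e-04, 2.27e-03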

Normal Distribution: The most important probability distribution. Many rvs are approximately normally distributed, and even when they aren't, their sums and averages often are (CLT).

Normal Distribution: pdf of the normal distribution: f(x; µ, σ²) = (1 / √(2πσ²)) e^(-(x - µ)² / (2σ²)). Standard normal distribution (µ = 0, σ² = 1): f(z; 0, 1) = (1 / √(2π)) e^(-z² / 2). cdf of Z: P(Z ≤ z) = ∫_{-∞}^z f(y; 0, 1) dy.
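
In R, dnorm evaluates the normal pdf and pnorm the cdf. A brief sketch (the non-standard parameters in the last line are hypothetical):

  dnorm(0)                          # standard normal density at z = 0, equal to 1 / sqrt(2 * pi)
  pnorm(1.96)                       # P(Z <= 1.96), about 0.975
  pnorm(1, mean = 2, sd = 0.5)      # cdf of a N(2, 0.5^2) rv evaluated at x = 1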

Standardizing a Normal RV: If X has a normal distribution with mean µ and standard deviation σ, we can standardize to a standard normal rv: Z = (X - µ) / σ.

I Digress: Sampling Distributions: Before data is collected, we regard the observations as random variables (X_1, X_2, ..., X_n). This implies that until data is collected, any function (statistic) of the observations (mean, sd, etc.) is also a random variable. Thus, any statistic, because it is a random variable, has a probability distribution, referred to as a sampling distribution. Let's focus on the sampling distribution of the mean, X̄.

Behold the Power of the CLT: Let X_1, X_2, ..., X_n be an iid random sample from a distribution with mean µ and standard deviation σ. If n is sufficiently large, X̄ ~ N(µ, σ/√n), where σ/√n is the standard deviation of the sample mean.
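
A simulation sketch of the CLT in R, using an exponential parent distribution chosen here just for illustration (µ = 1, σ = 1): the distribution of sample means is roughly normal with mean µ and standard deviation σ/√n.

  set.seed(1)
  n <- 50
  xbar <- replicate(10000, mean(rexp(n, rate = 1)))   # 10,000 sample means of size n
  mean(xbar)       # close to µ = 1
  sd(xbar)         # close to σ / sqrt(n) = 1 / sqrt(50)
  hist(xbar)       # roughly bell-shaped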

Example: If the mean and standard deviation of serum iron values from healthy men are 120 and 15 mg per 100 ml, respectively, what is the probability that a random sample of 50 normal men will yield a mean between 115 and 125 mg per 100 ml? First, the sampling distribution of the mean has mean 120 and standard deviation 15/√50 ≈ 2.12. Then P(115 ≤ X̄ ≤ 125) = P((115 - 120)/2.12 ≤ Z ≤ (125 - 120)/2.12) = P(-2.36 ≤ Z ≤ 2.36) = P(Z ≤ 2.36) - P(Z ≤ -2.36) = 0.9909 - 0.0091 = 0.9818.
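
The same probability computed directly in R:

  se <- 15 / sqrt(50)                                                  # standard error of the mean, about 2.12
  pnorm(125, mean = 120, sd = se) - pnorm(115, mean = 120, sd = se)    # about 0.982, matching the hand calculation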

R: Understand how to calculate probabilities from probability distributions. Normal: dnorm and pnorm. Poisson: dpois and ppois. Binomial: dbinom and pbinom. Hypergeometric: dhyper and phyper. Exploring relationships among distributions.
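
As one sketch of exploring relationships among distributions, we can compare the binomial pmf with its Poisson approximation for large n and small p, reusing the rare-disease numbers from Example 3 (n = 10,000, p = 0.001, λ = 10):

  k <- 0:2
  dbinom(k, size = 10000, prob = 0.001)   # exact binomial probabilities
  dpois(k, lambda = 10)                   # Poisson approximation, nearly identical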