1 APM 504: Probability Notes
Jay Taylor, Spring 2015

2 Outline
1 Probability and Uncertainty
2 Random Variables: Discrete Distributions; Continuous Distributions
3 Multivariate Distributions
4 Sums of Random Variables

3 Probability and Uncertainty: Interpretations and Basic Principles
Frequentist Interpretation: The probability of an event is equal to its limiting frequency in an infinite series of independent identical trials.
Bayesian Interpretations:
Logical: The probability of a proposition is equal to the strength of evidence in favor of the proposition.
Subjective: The probability of a proposition is equal to the strength of an individual's belief in the proposition.

4 Probability and Uncertainty
Although probabilities can be interpreted in different ways, these interpretations are usually based on the same mathematical rules. To describe these, we will use P(E) to denote the probability of an event or proposition E.
Probability Axioms
1 If S is certain to be true, then P(S) = 1.
2 0 ≤ P(E) ≤ 1 for any proposition E.
3 If E and F are mutually exclusive propositions, then the probability that either E or F is true is equal to the sum of the probabilities of E and of F: P(E or F) = P(E) + P(F).

5 Probability and Uncertainty
A more formal treatment can be given using Kolmogorov's notion of a probability space.
Definition: A probability space is a triple (Ω, F, P) consisting of the following objects:
1 A set Ω called the sample space.
2 A collection F of subsets of Ω, called a σ-algebra, satisfying: Ω ∈ F; if A ∈ F, then Ā = Ω\A ∈ F; if A_1, A_2, ... ∈ F, then ∪_{i=1}^∞ A_i ∈ F.
3 A probability measure P : F → R satisfying: P(A) ≥ 0 for every A ∈ F; P(Ω) = 1; P(∪_{i≥1} A_i) = Σ_{i≥1} P(A_i) for any countable collection of disjoint sets A_i ∈ F.

6 Probability and Uncertainty
The axioms of probability imply several useful properties:
P(Ā) = 1 − P(A);
P(∅) = 0;
If A ⊆ B, then P(A) ≤ P(B);
P(A ∪ B) = P(A) + P(B) − P(A, B);
General inclusion-exclusion formula:
P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(A_i) − Σ_{i<j} P(A_i A_j) + Σ_{i<j<k} P(A_i A_j A_k) − ... + (−1)^{n+1} P(A_1 A_2 ··· A_n).
Exercise: Show that these properties follow from the definition.

7 Probability and Uncertainty
The probability assigned to a proposition depends on the information or evidence available to us. This can be made explicit through conditional probability.
Conditional Probability: Suppose that E and F are propositions and that P(E) > 0. If we know E to be true, then the conditional probability of F given E is equal to
P(F | E) = P(E and F) / P(E).
[Venn diagram: P(S) = 1; P(E) = 0.46, P(F) = 0.33, P(E and F) = 0.21, so P(F | E) = 0.21/0.46 ≈ 0.46.]

8 Probability and Uncertainty
Joint probabilities can often be calculated by conditioning on one of the propositions.
Product Rule: P(E and F) = P(E) P(F | E).
Example: Suppose that two balls are sampled without replacement from an urn containing five red balls and five blue balls. If we let E be the event that the first ball sampled is red and F be the event that the second ball sampled is red, then the probability that both balls sampled are red is
P(E, F) = P(E) P(F | E) = (5/10)(4/9) = 2/9.
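As a quick numerical check (an addition, not part of the original slides), the urn calculation can be reproduced by simulation; this is a minimal Python sketch using only the standard library.

```python
import random

# Estimate P(both balls red) by simulating draws without replacement
# from an urn with 5 red and 5 blue balls.
urn = ["red"] * 5 + ["blue"] * 5
trials = 100_000
both_red = sum(random.sample(urn, 2) == ["red", "red"] for _ in range(trials))
print(both_red / trials)  # should be close to 2/9 = 0.2222...
```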

9 Probability and Uncertainty
Because we can condition on either E or F, the joint probability of E and F can be decomposed in two different ways using the product rule:
P(E and F) = P(E) P(F | E) (conditioning on E) = P(F) P(E | F) (conditioning on F).
It follows that the two expressions on the right-hand side are equal, i.e.,
P(E) P(F | E) = P(F) P(E | F),
and if we then divide both sides by P(E), we arrive at one of the most important formulas in probability theory:
Bayes' Formula: P(F | E) = P(F) P(E | F) / P(E).

10 Probability and Uncertainty
The denominator in Bayes' formula can often be calculated with the help of the following formula.
The Law of Total Probability: Suppose that F_1, ..., F_n are mutually exclusive events and that B is an event contained in the union F_1 ∪ ··· ∪ F_n. Then
P(B) = Σ_{i=1}^n P(B, F_i) = Σ_{i=1}^n P(F_i) P(B | F_i).
In other words, the probability of B is equal to the average of the conditional probabilities P(B | F_i), weighted by the probabilities of the events being conditioned on.

11 Probability and Uncertainty
Example: Reversed Sexual Size Dimorphism in Spotted Owls
Like many raptors, adult female Spotted Owls (Strix occidentalis) are larger, on average, than their male counterparts. For example, a study of a California population found that the wing chord distribution (in mm) is approximately N(329, 6) in females and N(320, 6) in males (Blakesley et al., 1990, J. Field Ornithology).
[Figure: wing chord densities for male and female Spotted Owls.]

12 Probability and Uncertainty
Problem: Suppose that an adult bird with a wing chord of 329 mm is randomly sampled from a population with a 1:1 adult sex ratio. What is the probability that this is a female?
Solution: Let F (resp., M) be the event that the bird is female (resp., male) and let W be the event that the wing chord is 329 mm. Then
P(F) = 0.5
p(W | F) = (1/(6√(2π))) e^{−(329−329)²/72}
p(W) = P(F) p(W | F) + P(M) p(W | M) = 0.5 (1/(6√(2π))) e^{−(329−329)²/72} + 0.5 (1/(6√(2π))) e^{−(329−320)²/72}
and upon substituting these quantities into Bayes' formula we find that
P(F | W) = P(F) p(W | F) / p(W) ≈ 0.755.
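The same calculation is easy to reproduce numerically. Here is a hedged sketch, assuming SciPy is available and that the wing chord densities are normal with standard deviation 6, as above:

```python
from scipy.stats import norm

# Bayes' formula with normal likelihoods for a 329 mm wing chord.
prior_f = prior_m = 0.5
like_f = norm.pdf(329, loc=329, scale=6)   # p(W | F)
like_m = norm.pdf(329, loc=320, scale=6)   # p(W | M)
evidence = prior_f * like_f + prior_m * like_m   # law of total probability
posterior_f = prior_f * like_f / evidence        # Bayes' formula
print(posterior_f)  # approximately 0.755
```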

13 Probability and Uncertainty
In general, there is no simple relationship between the probabilities P(A), P(B) and P(A, B). However, there is an important special case where the three probabilities are related by a simple identity.
Independence: Events A and B are independent if P(A, B) = P(A) P(B). Events A_1, A_2, A_3, ... are independent if every finite collection {A_{i_1}, A_{i_2}, ..., A_{i_n}} of distinct events satisfies the identity
P(∩_{j=1}^n A_{i_j}) = Π_{j=1}^n P(A_{i_j}).

14 Probability and Uncertainty
Colloquially, we say that two events are independent if there is no causal or logical relationship between them, i.e., knowing that B is true does not change the likelihood that A is true. Indeed, if A and B are independent and P(A) > 0 and P(B) > 0, then
P(A | B) = P(A, B)/P(B) = P(A)P(B)/P(B) = P(A)
P(B | A) = P(A, B)/P(A) = P(A)P(B)/P(A) = P(B),
which shows that our formal definition of independence is consistent with this heuristic interpretation. However, the formal definition is slightly broader in that it applies even when one or more of the events has probability 0.

15 Random Variables
Suppose that (Ω, F, P) is a probability space that represents our beliefs concerning the state of some system. In many cases, it will not be possible to directly observe which state ω ∈ Ω the system occupies, but it will be possible to conduct experiments that provide some information about this state. Mathematically, we can model such an experiment by a function
X : Ω → E
defined on the sample space and taking values in a set E. Since the value of the variable X(ω) depends on the unknown state ω of the system, we say that X is a random variable. If we wish to emphasize the range of X, then we say that X is an E-valued random variable.

16 Random Variables: Distribution of a Random Variable
Suppose that X : Ω → E is a random variable defined on a probability space (Ω, F, P). The distribution of X is the probability distribution P_X on E defined by the following identity:
P_X(A) ≡ P(X ∈ A) = P(X⁻¹(A)).
Notice that the two expressions on the right-hand side are determined by the probability measure P on Ω, i.e., the distribution of a random variable depends on the probability distribution on the underlying space on which the variable is defined.
Technical aside: To make this definition rigorous, we need to introduce a σ-algebra E on E and require that X⁻¹(A) ∈ F whenever A ∈ E. X is said to be measurable with respect to the two σ-algebras F and E if this condition is satisfied. The distribution of X is then a probability measure on the measurable space (E, E).

17 Random Variables: Discrete Distributions
Definition: A random variable X is said to be discrete if X takes values in a set E that is either finite or countably infinite. The probability mass function of a discrete random variable X is the function p_X : E → [0, 1] defined by the formula
p_X(e) = P(X = e).
Example: A random variable X with values in the set E = {0, 1} and probability mass function p_X(1) = p, p_X(0) = 1 − p is said to be a Bernoulli random variable with parameter p.

18 Random Variables: Discrete Distributions
The probability mass function of a discrete random variable completely determines its distribution via the following identity.
Calculating probabilities with probability mass functions: Let X be a discrete random variable with probability mass function p_X : E → [0, 1]. Then, for any subset A ⊆ E,
P(X ∈ A) = Σ_{x∈A} p_X(x).
In other words, to calculate the probability that X belongs to A, we simply sum the probability mass function over all of the values in A. In particular, when the probability mass function is summed over the entire space, the sum must be equal to 1:
Σ_{x∈E} p_X(x) = P(X ∈ E) = 1.

19 Random Variables: Discrete Distributions
Expectations of Discrete Variables: If X is a discrete random variable with values in a subset of the real numbers, then the expected value (expectation, mean) of X is the weighted average of these values:
E[X] = Σ_{x∈E} x p_X(x).
Example: If X is Bernoulli with parameter p, then the expected value of X is
E[X] = p·1 + (1 − p)·0 = p.
Notice that the expected value of a random variable need not belong to the range of the variable.

20 Random Variables: Discrete Distributions
Properties of Expectations: The following two properties are often useful:
Linearity: E[c_1 X_1 + ··· + c_n X_n] = Σ_{i=1}^n c_i E[X_i].
Transformations: If f : E → R is a real-valued function, then E[f(X)] = Σ_{x∈E} f(x) p_X(x).
Caveat: If f is a non-linear function, then in general E[f(X)] ≠ f(E[X]).

21 Random Variables: Discrete Distributions
Variance: If X is a discrete random variable with values in a subset of the real numbers, then the variance of X is the weighted average of the squared difference between X and its mean:
Var(X) ≡ E[(X − E[X])²] = Σ_{x∈E} (x − E[X])² p_X(x).
In practice, it is often more convenient to calculate the variance using the following formula:
Var(X) = E[X²] − E[X]².
Example: If X ∼ Bernoulli(p), then, since X² = X,
Var(X) = p − p² = p(1 − p).

22 Random Variables: Discrete Distributions
Binomial Distribution: X is said to have the binomial distribution with parameters n ≥ 1 and p ∈ [0, 1], written X ∼ Binomial(n, p), if X takes values in the set E = {0, 1, ..., n} with probability mass function
P(X = k) = (n choose k) p^k (1 − p)^{n−k}, k = 0, 1, ..., n.
Furthermore, the mean and the variance of X are
E[X] = np, Var(X) = np(1 − p).
Application: Suppose that we perform a series of n independent and identically distributed (IID) trials, each of which results in a success with probability p or a failure with probability 1 − p. Then the total number of successes in the n trials is a binomial random variable with parameters n and p. In particular, a Bernoulli random variable with parameter p is also a binomial random variable with parameters n = 1 and p.
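For readers following along in software, a brief sketch (assuming SciPy is available) that checks the pmf, mean, and variance formulas above:

```python
from scipy.stats import binom

# Binomial pmf, mean, and variance for n = 10 trials, success prob 0.3.
n, p = 10, 0.3
print(binom.pmf(3, n, p))                 # P(X = 3) = C(10,3) 0.3^3 0.7^7
print(binom.mean(n, p), binom.var(n, p))  # np = 3.0, np(1-p) = 2.1
```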

23 Random Variables: Discrete Distributions
Geometric Distribution: X is said to have the geometric distribution with parameter p ∈ (0, 1], written X ∼ Geometric(p), if X takes values in the non-negative integers E = {0, 1, ...} with probability mass function
P(X = k) = (1 − p)^k p, k ≥ 0.
Furthermore, the mean and the variance of X are
E[X] = (1 − p)/p, Var(X) = (1 − p)/p².
Application: Suppose that we perform a series of IID trials, each of which results in a success with probability p. Then the number of failures that occur before we observe the first success is geometrically distributed with parameter p.
Alternate definitions: Some authors define the geometric distribution to be the number of trials required to obtain the first success, in which case X takes values in the set E = {1, 2, ...} and P(X = k) = (1 − p)^{k−1} p.
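One practical caveat worth flagging (an addition, not from the slides): software libraries differ on which of the two conventions above they implement. A sketch with SciPy, whose geom follows the trials-to-first-success convention; shifting by loc = -1 recovers the failures-before-first-success convention used in these notes:

```python
from scipy.stats import geom

p = 0.25
print(geom.pmf(1, p))          # p(1-p)^0, trial-count convention, support 1,2,...
print(geom.pmf(0, p, loc=-1))  # p(1-p)^0, failure-count convention, support 0,1,...
print(geom.mean(p, loc=-1))    # (1-p)/p = 3.0
```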

24 Random Variables: Discrete Distributions
Negative Binomial Distribution: X is said to have the negative binomial distribution with parameters r > 0 and p ∈ (0, 1], written X ∼ NB(r, p), if X takes values in the non-negative integers E = {0, 1, ...} with probability mass function
P(X = k) = (r + k − 1 choose k) p^r (1 − p)^k, k ≥ 0.
Furthermore, the mean and the variance of X are
E[X] = r(1 − p)/p, Var(X) = r(1 − p)/p².
Application: Suppose that we perform a series of IID trials, each of which results in a success with probability p. Then the number of failures that occur before the r-th success is a negative binomial random variable with parameters r and p.

25 Random Variables: Discrete Distributions
The following plot shows the probability mass function for a series of negative binomial distributions, with fixed success probability p = 0.5 and increasing r = 0.2, 1, 5, 10. When r = 1, this distribution reduces to a geometric distribution. However, as r increases, the distribution becomes more symmetric (less skewed) around its mean.
[Figure: negative binomial pmfs p(k) with p = 0.5 and r = 0.2, 1, 5, 10; the r = 1 case coincides with Geometric(0.5).]

26 Random Variables: Discrete Distributions
Poisson Distribution: X is said to have the Poisson distribution with parameter λ ≥ 0, written X ∼ Poisson(λ), if X takes values in the non-negative integers E = {0, 1, ...} with probability mass function
P(X = k) = e^{−λ} λ^k / k!, k ≥ 0.
Furthermore, the mean and the variance of X are
E[X] = Var(X) = λ.
Application: Suppose that we perform a large number n of IID trials, each with success probability p = λ/n. Then the total number of successes is approximately Poisson distributed with parameter λ, and this approximation becomes exact in the limit as n → ∞. This is a special case of the Law of Rare Events.
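A small sketch (assuming SciPy) illustrating the Law of Rare Events numerically: the Binomial(n, λ/n) pmf approaches the Poisson(λ) pmf as n grows.

```python
from scipy.stats import binom, poisson

# Compare P(X = 3) under Binomial(n, lam/n) and Poisson(lam).
lam = 2.0
for n in (10, 100, 10_000):
    print(n, binom.pmf(3, n, lam / n), poisson.pmf(3, lam))
```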

27 Random Variables: Discrete Distributions
The Poisson distribution provides a useful approximation for the distribution of many kinds of count data. Some examples include:
- The number of misspelled words on a page of a book.
- The number of misdialed phone numbers in a city on a particular day.
- The number of beta particles emitted by a ¹⁴C source in an hour.
- The number of major earthquakes that occur in a year.
- The number of mutated sites in a gene that differ between two closely-related species.

28 Random Variables: Discrete Distributions
The Poisson distribution is less suited for modeling count data corresponding to events or individuals that are clumped. In such cases, the data are usually overdispersed, i.e., the variance is greater than the mean, and the negative binomial distribution may provide a better fit.
- For NB(r, p) with mean µ, the variance is σ² = µ + µ²/r.
- r is an inverse measure of aggregation or dispersion: smaller values of r give a more skewed distribution.
- If we let r → ∞ with p chosen so that the mean is µ, then NB(r, p) → Poisson(µ).
- Examples: numbers of macroparasites per individual and numbers of infections per outbreak are usually modeled with the negative binomial distribution.
[Figure: negative binomial pmfs with r = 1, 2, 5, 10 compared with a Poisson pmf of the same mean.]

29 Random Variables: Continuous Distributions
A real-valued random variable X is said to be continuous if there is a non-negative function p : R → [0, ∞), called the probability density function of X, such that
P(X ∈ A) = ∫_A p(x) dx for any (nice) subset A ⊆ R.
Technical aside: A set is nice if it belongs to the Borel σ-algebra on R. This is the smallest σ-algebra on R that contains all open intervals. While we will largely ignore measurability in this course, it is an important concept in fully rigorous treatments of probability theory. See David Williams' book Probability with Martingales (1991).

30 Random Variables: Continuous Distributions
The probability density function plays the same role for continuous random variables that the probability mass function plays for discrete random variables. Be careful, however, not to confuse the two. The existence of a density function has several consequences:
Densities integrate to 1: ∫_{−∞}^{∞} p(x) dx = P(X ∈ R) = 1.
Points have zero probability mass: P(X = t) = ∫_t^t p(x) dx = 0.
These two identities seem to lead to a paradox. On the one hand, the probability that X is equal to any particular point x is zero for every x. On the other hand, the probability that X is equal to some point, i.e., X ∈ R, is 1. However, there is no contradiction here, since there are uncountably infinitely many real numbers and we have not required that probability distributions be additive over arbitrary collections of disjoint sets:
1 = P(X ∈ R) = P(∪_{x∈R} {X = x}) ≠ Σ_{x∈R} P(X = x) = 0.

31 Random Variables: Continuous Distributions
Cumulative Distribution Function: If X is a real-valued random variable (not necessarily continuous), the cumulative distribution function of X is the function F : R → [0, 1] defined by
F(x) = P(X ≤ x).
If X is continuous, then the density p(x) and the cumulative distribution function F(x) are related in the following way:
F(x) = ∫_{−∞}^x p(t) dt and p(x) = F′(x).
Furthermore, the density of X at a point x can be estimated by the following formula:
p(x) ≈ P(x − ε < X ≤ x + ε)/(2ε) = (F(x + ε) − F(x − ε))/(2ε).

32 Random Variables: Continuous Distributions
Expectations of Continuous Random Variables: If X is a continuous random variable with probability density function p(x), then the expected value of X is the weighted average
E[X] = ∫_{−∞}^{∞} x p(x) dx.
In general, expectations of continuous random variables behave like expectations of discrete random variables provided that we replace the sum by an integral and the probability mass function by the probability density function. For example, if f : R → R is a real-valued function and X is as in the definition, then
E[f(X)] = ∫_{−∞}^{∞} f(x) p(x) dx.

33 Random Variables: Continuous Distributions
Uniform Distribution: X is said to have the uniform distribution on the interval (l, u), written X ∼ U(l, u), if X takes values in the bounded set E = (l, u) with probability density function
p(x) = 1/(u − l), x ∈ (l, u).
Furthermore, the mean and the variance of X are
E[X] = (l + u)/2, Var(X) = (u − l)²/12.
If l = 0 and u = 1, then X is said to be a standard uniform random variable.
Application: In Monte Carlo simulations, we usually must transform a sequence of independent standard uniform random variables into a sequence of random variables with the target distribution.
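As an illustration of this remark (not from the original slides), the inverse-CDF transform turns standard uniforms into samples from a target distribution; a minimal NumPy sketch, using the exponential distribution introduced later in these notes:

```python
import numpy as np

# Inverse-CDF transform: if U ~ U(0,1), then X = -log(1 - U)/lam ~ Exp(lam),
# since the exponential CDF F(x) = 1 - exp(-lam x) inverts to this formula.
rng = np.random.default_rng(0)
lam = 2.0
u = rng.uniform(size=100_000)   # standard uniform inputs
x = -np.log(1.0 - u) / lam      # transformed to Exp(lam)
print(x.mean(), x.var())        # should approach 1/lam = 0.5 and 1/lam^2 = 0.25
```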

34 Random Variables: Continuous Distributions
Beta Distribution: X is said to have the Beta distribution with parameters a, b > 0, written X ∼ Beta(a, b), if X takes values in the set E = (0, 1) with probability density function
p(x) = x^{a−1} (1 − x)^{b−1} / β(a, b), x ∈ (0, 1).
Here, β(a, b) is the Beta function, which is defined by the integral
β(a, b) = ∫_0^1 x^{a−1} (1 − x)^{b−1} dx.
Furthermore, the mean and the variance of X are
E[X] = a/(a + b), Var(X) = ab/((a + b)²(a + b + 1)).
Remark: If a = b = 1, then the Beta distribution reduces to the standard uniform distribution.

35 Random Variables: Continuous Distributions
Application: The Beta distribution provides a flexible family of probability distributions for quantities that take values in the interval [0, 1], e.g., proportions or probabilities. Depending on whether the parameters are less than or greater than 1, the Beta density is either bimodal, with maxima at x = 0 and x = 1, or unimodal, with a mode at (a − 1)/(a + b − 2).
Other applications include:
- order statistics for IID uniform variables
- posterior distribution of success probabilities
- equilibrium frequencies of neutral alleles in panmictic populations
[Figure: Beta densities for a = b = 0.1 (bimodal) and a = b = 2 (unimodal).]

36 Random Variables: Continuous Distributions
Exponential Distribution: X is said to have the exponential distribution with rate parameter λ > 0, written X ∼ Exp(λ), if X takes values in the set E = [0, ∞) with probability density function
p(x) = λ e^{−λx}, x ≥ 0.
Furthermore, the mean and the variance of X are
E[X] = 1/λ, Var(X) = 1/λ²,
while the cumulative distribution function is
F(t) = P(X ≤ t) = 1 − e^{−λt}.
Application: Exponential random variables are often used to model random times, e.g., the time until a radioactive nucleus decays. Of course, the shape of such a distribution should not depend on the units of measurement, and this is true of the exponential distribution. In particular, if X is exponentially distributed with rate λ and Y = γX, then Y is exponentially distributed with rate λ/γ.

37 Random Variables: Continuous Distributions
Memorylessness of Exponential Distributions: The exponential distributions are the only continuous distributions that satisfy the following property. Given t, s > 0,
P(X > t + s | X > t) = P(X > t + s)/P(X > t) = e^{−λ(t+s)}/e^{−λt} = e^{−λs} = P(X > s).
In other words, the variable has no memory of having survived until time t: the probability that it survives from time t to time t + s is the same as the probability of surviving from time 0 to time s. Put another way, the rate at which the variable dies is constant in time.

38 Random Variables: Continuous Distributions
There is also an important connection between the geometric distribution and the exponential distribution.
Approximation of the Geometric Distribution by the Exponential: Suppose that for each n ≥ 1, X_n is a geometric random variable with success probability p_n = λ/n, and let Y_n = X_n/n. Each of these variables has mean E[Y_n] = 1/λ − 1/n ≈ λ⁻¹. Furthermore, if we let n tend to infinity, then
P(Y_n > t) = P(X_n > nt) = (1 − λ/n)^{nt} → e^{−λt} = P(X > t),
where X is exponentially distributed with rate λ.
In other words, we can approximate a geometric random variable with small success probability by an exponential random variable provided that we change the units in which time is measured.
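The convergence claimed above is easy to see numerically; a short sketch in plain Python/NumPy:

```python
import numpy as np

# (1 - lam/n)^(nt) approaches e^(-lam t) as n grows.
lam, t = 1.5, 2.0
for n in (10, 100, 10_000):
    print(n, (1 - lam / n) ** (n * t))
print(np.exp(-lam * t))  # limiting exponential tail probability
```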

39 Random Variables: Continuous Distributions
Gamma Distribution: X is said to have the gamma distribution with shape parameter α > 0 and rate parameter λ > 0, written X ∼ Gamma(α, λ), if X takes values in the set E = [0, ∞) with probability density function
p(x) = (λ^α / Γ(α)) x^{α−1} e^{−λx}, x ≥ 0.
Here, Γ(α) is the Gamma function, which is defined for α > 0 by the integral
Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx.
Furthermore, the mean and the variance of X are
E[X] = α/λ, Var(X) = α/λ².
Remark: Sometimes the Gamma distribution is described in terms of the shape parameter α and a scale parameter θ = 1/λ.

40 Random Variables: Continuous Distributions
The Gamma function introduced on the preceding slide is an important object in its own right and appears in many settings in mathematics and statistics. Although Γ(α) usually cannot be explicitly evaluated, the following identity, obtained by integrating by parts, is often useful:
Γ(α + 1) = ∫_0^∞ x^α e^{−x} dx = [−x^α e^{−x}]_0^∞ + ∫_0^∞ α x^{α−1} e^{−x} dx = αΓ(α).
In particular, if α = n is an integer, then
Γ(n + 1) = nΓ(n) = n(n − 1)Γ(n − 1) = ··· = n!Γ(1) = n!
since Γ(1) = 1. Thus the Gamma function can be regarded as a smooth extension of the factorial function to the positive real numbers.
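Both identities can be checked numerically; a brief sketch assuming SciPy's gamma function is available:

```python
from math import factorial
from scipy.special import gamma

# Check Gamma(alpha + 1) = alpha * Gamma(alpha) and Gamma(n + 1) = n!.
alpha = 2.7
print(gamma(alpha + 1), alpha * gamma(alpha))  # equal up to rounding
print(gamma(6), factorial(5))                  # Gamma(6) = 5! = 120
```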

41 Random Variables: Continuous Distributions
The Gamma distribution is related to the exponential distribution in much the same way that the negative binomial distribution is related to the geometric distribution.
- When α = 1, the Gamma distribution reduces to the exponential.
- If X_1, ..., X_n are independent exponential RVs with rate λ, then their sum X = X_1 + ··· + X_n is a Gamma RV with shape parameter α = n and rate λ. Thus the Gamma distribution is often used to model random lifespans that elapse after a series of independent events, as sketched below.
[Figure: Gamma densities with rate 1 for several shape parameters α; the α = 1 case is Exponential(1).]
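A hedged simulation sketch of the sum-of-exponentials fact, assuming NumPy (which parameterizes the exponential by scale = 1/rate):

```python
import numpy as np

# The sum of n = 3 independent Exp(lam) variables should match
# Gamma(shape = 3, rate = lam), whose mean is n/lam and variance n/lam^2.
rng = np.random.default_rng(1)
lam, n = 2.0, 3
sums = rng.exponential(scale=1 / lam, size=(100_000, n)).sum(axis=1)
print(sums.mean(), sums.var())  # should approach 1.5 and 0.75
```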

42 Random Variables: Continuous Distributions
Normal Distributions: X is said to have the normal (or Gaussian) distribution with mean µ and variance σ² > 0, written X ∼ N(µ, σ²), if X takes values in R with probability density function
p(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}.
In this case, the mean and the variance of X are µ and σ², as implied by the name of the distribution. Furthermore, if µ = 0 and σ² = 1, then X is said to be a standard normal random variable.
Applications: Normal distributions arise as the limiting distribution of a sum of a large number of independent random variables. This is the content of the Central Limit Theorem. Many quantities encountered in biological systems are approximately normally distributed, including many morphological traits.

43 Random Variables: Continuous Distributions
Some useful properties of the normal distribution include:
- Sums of independent normal RVs are normal: If X_1, ..., X_n are independent normal RVs and X_i ∼ N(µ_i, σ_i²), then their sum X = X_1 + ··· + X_n is a normal random variable with mean µ_1 + ··· + µ_n and variance σ_1² + ··· + σ_n².
- Linear transforms preserve normality: If X ∼ N(µ, σ²), then Y = aX + b is normal with mean aµ + b and variance a²σ². In particular, if Z ∼ N(0, 1), then X = σZ + µ ∼ N(µ, σ²).
[Figure: normal densities with σ² = 0.25, 1, 4, 25.]

44 Multivariate Distributions: Multivariate Distributions and Random Vectors
Joint and Marginal Distributions: Suppose that X_1, ..., X_n are random variables defined on a common probability space (Ω, F, P) with values in the sets E_1, ..., E_n, respectively. Then the joint distribution of these variables is the distribution of the random vector X = (X_1, ..., X_n) on the set E = E_1 × ··· × E_n, i.e.,
P(X ∈ A) = P((X_1, ..., X_n) ∈ A) for A ⊆ E.
Furthermore, the distribution of each variable X_i considered on its own is said to be the marginal distribution of that variable.
The joint distribution of a collection of random variables tells us how the variables are related to one another.

45 Multivariate Distributions
Example: If each of the variables X_1, ..., X_n is marginally discrete, then the joint distribution is uniquely determined by the joint probability mass function p_{X_1,...,X_n} : E → [0, 1], which is defined by
p_{X_1,...,X_n}(x_1, ..., x_n) = P(X_1 = x_1, ..., X_n = x_n).
In this case, the marginal probability mass function of X_i can be recovered by summing the joint probability mass function over all possible values of the remaining variables:
p_{X_i}(y) = Σ_{x ∈ E : x_i = y} p_{X_1,...,X_n}(x_1, ..., x_n).

46 Multivariate Distributions
Multinomial Distribution: Let n ≥ 1 and let p_1, ..., p_k be a collection of non-negative real numbers such that p_1 + ··· + p_k = 1. We say that the random vector X = (X_1, ..., X_k) has the multinomial distribution with parameters n and (p_1, ..., p_k) if each of the variables X_i takes values in the set {0, ..., n} and the joint probability mass function of these variables is given by
p(n_1, ..., n_k) = (n choose n_1, ..., n_k) p_1^{n_1} ··· p_k^{n_k}, provided n_1 + ··· + n_k = n.
Application: As the name suggests, the multinomial distribution generalizes the binomial distribution. Suppose that we conduct n IID trials and that each trial can result in one of k possible outcomes, which have probabilities p_1, ..., p_k. If X_i denotes the number of trials that result in the i-th outcome, then (X_1, ..., X_k) has the multinomial distribution with parameters n and (p_1, ..., p_k).
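A short sampling sketch (assuming NumPy) illustrating the multinomial distribution and the fact that each component X_i is marginally Binomial(n, p_i), so its mean is n p_i:

```python
import numpy as np

# Draw 50,000 multinomial vectors with n = 100 trials and 3 outcome classes.
rng = np.random.default_rng(2)
counts = rng.multinomial(100, [0.2, 0.3, 0.5], size=50_000)
print(counts.mean(axis=0))  # should approach n*p = [20, 30, 50]
```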

47 Multivariate Distributions
Joint Continuity: A collection of real-valued random variables X_1, ..., X_n is said to be jointly continuous if there is a function f : Rⁿ → [0, ∞), called the joint density function, such that for every set A = [a_1, b_1] × ··· × [a_n, b_n],
P((X_1, ..., X_n) ∈ A) = ∫_A f(x_1, ..., x_n) dx_1 ··· dx_n.
In this case, each variable X_i is marginally continuous and the marginal density functions can be recovered by integrating the joint density function, e.g.,
f_{X_1}(y) = ∫_{R^{n−1}} f(y, x_2, ..., x_n) dx_2 ··· dx_n.

48 Multivariate Distributions
Dirichlet Distribution: Suppose that k ≥ 2 and let α_1, ..., α_k be a collection of positive real numbers with sum α = α_1 + ··· + α_k. We say that the random vector X = (X_1, ..., X_k) has the Dirichlet distribution with parameters k and (α_1, ..., α_k) if X takes values in the (k − 1)-dimensional simplex
Δ_{k−1} = {(x_1, ..., x_k) : x_1, ..., x_k ≥ 0 and Σ_{i=1}^k x_i = 1}
with joint density function
p(x_1, ..., x_k) = (Γ(α) / Π_{i=1}^k Γ(α_i)) Π_{i=1}^k x_i^{α_i − 1}.
When k = 2, the Dirichlet distribution reduces to the Beta distribution on the segment {(x, 1 − x) : 0 ≤ x ≤ 1}. Notice that Δ_{k−1} can be identified with the set of probability distributions on the discrete set {1, ..., k}.

49 Multivariate Distributions
When working with stochastic processes, it is often useful to consider the conditional distribution of one set of variables given knowledge of another set of variables.
Conditional Distributions:
1 Suppose that X and Y are discrete random variables with joint probability mass function p_{X,Y}. Then the conditional distribution of X given Y = y is
P(X = x | Y = y) = p_{X,Y}(x, y) / p_Y(y), provided p_Y(y) > 0.
2 Similarly, if X and Y are jointly continuous with joint density function f_{X,Y}, then conditional on Y = y, X is continuous with conditional density function
f_{X|Y}(x | y) = f_{X,Y}(x, y) / f_Y(y), provided f_Y(y) > 0.

50 Multivariate Distributions
Although we can always recover the marginal distributions of a collection of random variables from their joint distribution, the inverse operation usually is not possible without additional information. One case where this is possible is when the variables are independent.
Independence of Random Variables:
1 The random variables X_1, ..., X_n are said to be independent if
P(X_1 ∈ E_1, ..., X_n ∈ E_n) = Π_{i=1}^n P(X_i ∈ E_i)
for all sets E_1, ..., E_n such that the events {X_i ∈ E_i} are well-defined.
2 An infinite collection of random variables is said to be independent if every finite sub-collection is independent.
In other words, X and Y are independent if and only if the events {X ∈ E} and {Y ∈ F} are independent for all subsets E and F of the ranges of X and Y.

51 Multivariate Distributions
As a general rule, calculations involving multivariate distributions are greatly simplified when the component variables are independent. The following theorem provides an important example of this principle.
Theorem: Suppose that X_1, ..., X_n are independent real-valued random variables and that f_1, ..., f_n are functions from R to R. Then f_1(X_1), ..., f_n(X_n) are independent real-valued random variables and
E[Π_{i=1}^n f_i(X_i)] = Π_{i=1}^n E[f_i(X_i)],
whenever the expectations on both sides of the identity are defined. In particular, by letting each f_i(x) = x be the identity function we obtain the following important special case:
E[Π_{i=1}^n X_i] = Π_{i=1}^n E[X_i].

52 Multivariate Distributions
Of course, it is often the case that variables of interest are not independent, and then we need metrics to quantify the extent to which they depend on each other. One such metric is the covariance.
Covariance: Suppose that X and Y are real-valued random variables defined on the same probability space. The covariance of X and Y is the quantity
Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y].
The covariance between two random variables is a measure of their linear association. In particular, if X and Y are independent, then since the theorem on the previous slide shows that E[XY] = E[X]E[Y], it follows that Cov(X, Y) = 0. However, the converse of this result is not true: the mere fact that two variables have zero covariance does not imply that they are independent.

53 Multivariate Distributions
Properties of Covariance: Covariances have a number of useful properties:
- Cov(X, X) = Var(X);
- Cov(X, Y) = Cov(Y, X);
- Cov(aX, bY) = ab Cov(X, Y);
- Bilinearity: Cov(Σ_{i=1}^n X_i, Σ_{j=1}^m Y_j) = Σ_{i=1}^n Σ_{j=1}^m Cov(X_i, Y_j);
- Cauchy-Schwarz inequality: |Cov(X, Y)| ≤ √(Var(X) Var(Y)).
Exercise: Verify the above properties.

54 Multivariate Distributions
When we work with random vectors containing more than two variables, it is often useful to organize the covariances into a single matrix.
Variance-Covariance Matrix: Suppose that X_1, ..., X_n are real-valued random variables and let σ_ij = Cov(X_i, X_j) denote the covariance of X_i and X_j. Then the variance-covariance matrix of the random vector X = (X_1, ..., X_n) is the n × n matrix Σ with entry σ_ij in the i-th row and j-th column:
Σ = ( σ_11 σ_12 ··· σ_1n )
    ( σ_21 σ_22 ··· σ_2n )
    (  ...  ...      ... )
    ( σ_n1 σ_n2 ··· σ_nn )
Because Cov(X, Y) = Cov(Y, X), it is clear that every variance-covariance matrix is symmetric. Furthermore, it can be shown that any such matrix is also non-negative definite, i.e., given any (column) vector β ∈ Rⁿ, we have
βᵀ Σ β = Var(Σ_{i=1}^n β_i X_i) ≥ 0.

55 Multivariate Distributions
Multivariate Normal Distribution: A continuous random vector X = (X_1, ..., X_n) with values in Rⁿ is said to have the multivariate normal distribution with mean vector µ = (µ_1, ..., µ_n) and n × n variance-covariance matrix Σ if it has joint density function
p(x) = (2π)^{−n/2} |Σ|^{−1/2} exp(−(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ)).
Here Σ is a positive-definite symmetric matrix with determinant |Σ| > 0 and inverse Σ⁻¹.
If X is multivariate normal, then every component variable X_i is a normal random variable. Furthermore, if X and Y are independent multivariate normal random vectors with values in Rⁿ, then so is X + Y, i.e., sums of independent multivariate normal random vectors are also multivariate normal.

56 Sums of Random Variables: Sums of Independent Random Variables
Many problems in applied probability and statistics involve sums of independent random variables. For example, in discrete-time branching processes, the size of a population at time t + 1 is equal to the sum of the number of offspring born to each adult female alive at time t.
In some cases, the distribution of the sum of two independent integer-valued random variables can be calculated with the help of the following result.
Theorem: Suppose that X and Y are independent integer-valued random variables with probability mass functions p_X and p_Y, respectively. Then Z = X + Y is an integer-valued random variable with probability mass function
p_Z(n) = p_X ∗ p_Y(n) ≡ Σ_{k=−∞}^{∞} p_X(k) p_Y(n − k).
The operation p_X ∗ p_Y is called the discrete convolution of p_X and p_Y.

57 Sums of Random Variables
The proof of this result uses only elementary properties of probability distributions and independence:
p_Z(n) = P(Z = n) = P(X + Y = n)
  = P(∪_{k=−∞}^{∞} {X = k, Y = n − k})
  = Σ_{k=−∞}^{∞} P(X = k, Y = n − k)
  = Σ_{k=−∞}^{∞} P(X = k) P(Y = n − k)
  = Σ_{k=−∞}^{∞} p_X(k) p_Y(n − k).

58 Sums of Random Variables
By way of example, we can show that the sum of two independent Poisson random variables is Poisson. Suppose that X ∼ Poisson(λ) and Y ∼ Poisson(µ) are independent and let Z = X + Y. Then the probability mass function of Z is
p_Z(n) = Σ_{k=−∞}^{∞} p_X(k) p_Y(n − k)
  = Σ_{k=0}^{n} (e^{−λ} λ^k / k!)(e^{−µ} µ^{n−k} / (n − k)!)
  = e^{−(λ+µ)} (1/n!) Σ_{k=0}^{n} (n! / (k!(n − k)!)) λ^k µ^{n−k}
  = e^{−(λ+µ)} (λ + µ)^n / n!,
where the last step uses the binomial theorem. This shows that X + Y ∼ Poisson(λ + µ), a result that is very useful when working with Poisson processes.
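The convolution identity can also be verified numerically; a sketch assuming NumPy and SciPy, truncating the pmfs to a finite grid:

```python
import numpy as np
from scipy.stats import poisson

# Discrete convolution of Poisson(lam) and Poisson(mu) pmfs on {0,...,59}
# should match the Poisson(lam + mu) pmf, up to tiny truncation error.
lam, mu = 2.0, 3.0
k = np.arange(60)
pz = np.convolve(poisson.pmf(k, lam), poisson.pmf(k, mu))[: len(k)]
print(np.max(np.abs(pz - poisson.pmf(k, lam + mu))))  # essentially zero
```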

59 Sums of Random Variables
An analogous result holds for sums of independent continuous random variables.
Theorem: Let X and Y be independent continuous random variables with densities p_X and p_Y, respectively. Then Z = X + Y is a continuous random variable with density
p_Z(z) = p_X ∗ p_Y(z) ≡ ∫_{−∞}^{∞} p_X(t) p_Y(z − t) dt,
and p_X ∗ p_Y(z) is called the convolution integral of p_X and p_Y.
Exercise: Use this theorem to show that the sum of two independent exponential random variables with rate parameter λ is Gamma distributed with shape parameter α = 2 and rate parameter λ.

60 Sums of Random Variables
Two of the most important results in probability theory concern the asymptotic or limiting behavior of a sum of independent random variables as the number of terms tends to infinity.
The Strong Law of Large Numbers (SLLN): Suppose that X_1, X_2, ... is a sequence of independent and identically-distributed (IID) real-valued random variables with E|X_1| < ∞. If µ = E[X_1] and S_n = X_1 + ··· + X_n, then the sequence of sample means S_n/n converges almost surely to µ, i.e.,
P(lim_{n→∞} S_n/n = µ) = 1.
Interpretation: As the number of independent trials increases, the sample mean is certain to converge to the true mean.
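A minimal simulation sketch of the SLLN (assuming NumPy), using Exp(1) variables so that µ = 1:

```python
import numpy as np

# Running sample means of iid Exp(1) variables settle down to mu = 1.
rng = np.random.default_rng(3)
x = rng.exponential(size=1_000_000)
for n in (10, 1_000, 1_000_000):
    print(n, x[:n].mean())  # approaches 1 as n grows
```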

61 Sums of Random Variables
The SLLN is an example of a more general heuristic, which states that deterministic behavior can emerge in random systems containing a large number of weakly interacting components. For example, it is this heuristic which justifies the use of deterministic ODEs to model chemical reactions. However, in many instances, we are interested in the fluctuations of the system about this deterministic limit. This is addressed by the Central Limit Theorem.
Central Limit Theorem (CLT): Suppose that X_1, X_2, ... is a sequence of IID real-valued random variables with finite mean µ and finite variance σ². If S_n = X_1 + ··· + X_n and
Z_n = (S_n − nµ)/(σ√n),
then the sequence Z_1, Z_2, ... converges in distribution to a standard normal random variable Z, i.e., for every t ∈ (−∞, ∞),
lim_{n→∞} P(Z_n ≤ t) = P(Z ≤ t) = (1/√(2π)) ∫_{−∞}^t e^{−x²/2} dx.

62 Sums of Random Variables
The variables Z_1, Z_2, ... introduced in the CLT are said to be standardized in the sense that for every n ≥ 1, E[Z_n] = 0 and Var(Z_n) = 1. In other words, since we already know that the sample means S_n/n are converging almost surely to the mean µ, to study the fluctuations of S_n/n around this limit we need to subtract the limit and then amplify the differences S_n/n − µ by a factor that grows rapidly enough to compensate for the fact that this difference is tending to 0:
Z_n = (√n/σ)(S_n/n − µ).
What the CLT tells us is that irrespective of the distribution of the X_i's, these amplified differences will be approximately normally distributed when n is large. This observation presumably explains why so many quantities are approximately normally distributed: in effect, the microscopic details are lost to the Gaussian limit whenever we consider a macroscopic system in which the components act additively.

63 Sums of Random Variables
Example: Suppose that X_1, X_2, ... are independent Bernoulli random variables with success probability p and let S_n = X_1 + ··· + X_n be the sum of the first n variables. Since S_n is the number of successes in a series of n independent trials, we know that S_n is a binomial random variable with parameters n and p. Furthermore, by the CLT, we know that when n is large, the standardized sums
Z_n = (S_n − np)/√(np(1 − p))
are approximately normally distributed. However, since linear transformations of a normal random variable are normal, it follows that S_n is itself approximately normally distributed with mean np and variance np(1 − p), i.e.,
Binomial(n, p) ≈ N(np, np(1 − p)).
Remark: This observation can be used to construct a fast approximate algorithm for sampling from a binomial distribution.
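A brief numerical sketch (assuming NumPy and SciPy) of this normal approximation, anticipating the p = 0.2 comparison on the next slide:

```python
import numpy as np
from scipy.stats import binom, norm

# Compare the Binomial(n, p) pmf with the N(np, np(1-p)) density at the
# integer points 0, ..., n; the largest gap shrinks as n grows.
p = 0.2
for n in (10, 100):
    k = np.arange(n + 1)
    approx = norm.pdf(k, n * p, np.sqrt(n * p * (1 - p)))
    print(n, np.max(np.abs(binom.pmf(k, n, p) - approx)))
```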

64 Sums of Random Variables
The convergence of the binomial distribution to the normal distribution with increasing n is illustrated in the figure below, which compares the normal distribution with mean np and variance np(1 − p) with the binomial distribution for n = 10 (left) and n = 100 (right) when p = 0.2.

65 Sums of Random Variables
Another example of the scope of the CLT is provided by the distribution of adult human heights, which is approximately normal. The figure shows a histogram of the heights of a sample of 5000 adults (source: SOCR), together with the best-fitting normal distribution.
Normality of quantitative traits can be explained by Fisher's infinitesimal model:
- The trait depends on a large number L of variable loci.
- The two alleles at each locus have small effects X_{l,m} and X_{l,p} on the trait.
- The loci act additively.
Then an individual's height may be expressed as
H = Σ_{l=1}^L (X_{l,m} + X_{l,p}) + ε,
where ε is the random environmental effect on height.
[Figure: histogram of adult heights (inches) with fitted normal density.]


More information

Department of Large Animal Sciences. Outline. Slide 2. Department of Large Animal Sciences. Slide 4. Department of Large Animal Sciences

Department of Large Animal Sciences. Outline. Slide 2. Department of Large Animal Sciences. Slide 4. Department of Large Animal Sciences Outline Advanced topics from statistics Anders Ringgaard Kristensen Covariance and correlation Random vectors and multivariate distributions The multinomial distribution The multivariate normal distribution

More information

Probability reminders

Probability reminders CS246 Winter 204 Mining Massive Data Sets Probability reminders Sammy El Ghazzal selghazz@stanfordedu Disclaimer These notes may contain typos, mistakes or confusing points Please contact the author so

More information

Lectures 16-17: Poisson Approximation. Using Lemma (2.4.3) with θ = 1 and then Lemma (2.4.4), which is valid when max m p n,m 1/2, we have

Lectures 16-17: Poisson Approximation. Using Lemma (2.4.3) with θ = 1 and then Lemma (2.4.4), which is valid when max m p n,m 1/2, we have Lectures 16-17: Poisson Approximation 1. The Law of Rare Events Theorem 2.6.1: For each n 1, let X n,m, 1 m n be a collection of independent random variables with PX n,m = 1 = p n,m and PX n,m = = 1 p

More information

Review of Probabilities and Basic Statistics

Review of Probabilities and Basic Statistics Alex Smola Barnabas Poczos TA: Ina Fiterau 4 th year PhD student MLD Review of Probabilities and Basic Statistics 10-701 Recitations 1/25/2013 Recitation 1: Statistics Intro 1 Overview Introduction to

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

7 Random samples and sampling distributions

7 Random samples and sampling distributions 7 Random samples and sampling distributions 7.1 Introduction - random samples We will use the term experiment in a very general way to refer to some process, procedure or natural phenomena that produces

More information

ACM 116: Lectures 3 4

ACM 116: Lectures 3 4 1 ACM 116: Lectures 3 4 Joint distributions The multivariate normal distribution Conditional distributions Independent random variables Conditional distributions and Monte Carlo: Rejection sampling Variance

More information

Formulas for probability theory and linear models SF2941

Formulas for probability theory and linear models SF2941 Formulas for probability theory and linear models SF2941 These pages + Appendix 2 of Gut) are permitted as assistance at the exam. 11 maj 2008 Selected formulae of probability Bivariate probability Transforms

More information

LIST OF FORMULAS FOR STK1100 AND STK1110

LIST OF FORMULAS FOR STK1100 AND STK1110 LIST OF FORMULAS FOR STK1100 AND STK1110 (Version of 11. November 2015) 1. Probability Let A, B, A 1, A 2,..., B 1, B 2,... be events, that is, subsets of a sample space Ω. a) Axioms: A probability function

More information

3. Review of Probability and Statistics

3. Review of Probability and Statistics 3. Review of Probability and Statistics ECE 830, Spring 2014 Probabilistic models will be used throughout the course to represent noise, errors, and uncertainty in signal processing problems. This lecture

More information

Topic 2: Review of Probability Theory

Topic 2: Review of Probability Theory CS 8850: Advanced Machine Learning Fall 2017 Topic 2: Review of Probability Theory Instructor: Daniel L. Pimentel-Alarcón c Copyright 2017 2.1 Why Probability? Many (if not all) applications of machine

More information

Statistics 1B. Statistics 1B 1 (1 1)

Statistics 1B. Statistics 1B 1 (1 1) 0. Statistics 1B Statistics 1B 1 (1 1) 0. Lecture 1. Introduction and probability review Lecture 1. Introduction and probability review 2 (1 1) 1. Introduction and probability review 1.1. What is Statistics?

More information

Poisson approximations

Poisson approximations Chapter 9 Poisson approximations 9.1 Overview The Binn, p) can be thought of as the distribution of a sum of independent indicator random variables X 1 + + X n, with {X i = 1} denoting a head on the ith

More information

1 Review of Probability and Distributions

1 Review of Probability and Distributions Random variables. A numerically valued function X of an outcome ω from a sample space Ω X : Ω R : ω X(ω) is called a random variable (r.v.), and usually determined by an experiment. We conventionally denote

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB

More information

THE QUEEN S UNIVERSITY OF BELFAST

THE QUEEN S UNIVERSITY OF BELFAST THE QUEEN S UNIVERSITY OF BELFAST 0SOR20 Level 2 Examination Statistics and Operational Research 20 Probability and Distribution Theory Wednesday 4 August 2002 2.30 pm 5.30 pm Examiners { Professor R M

More information

Continuous random variables

Continuous random variables Continuous random variables Continuous r.v. s take an uncountably infinite number of possible values. Examples: Heights of people Weights of apples Diameters of bolts Life lengths of light-bulbs We cannot

More information

Continuous Probability Spaces

Continuous Probability Spaces Continuous Probability Spaces Ω is not countable. Outcomes can be any real number or part of an interval of R, e.g. heights, weights and lifetimes. Can not assign probabilities to each outcome and add

More information

2 Random Variable Generation

2 Random Variable Generation 2 Random Variable Generation Most Monte Carlo computations require, as a starting point, a sequence of i.i.d. random variables with given marginal distribution. We describe here some of the basic methods

More information

CME 106: Review Probability theory

CME 106: Review Probability theory : Probability theory Sven Schmit April 3, 2015 1 Overview In the first half of the course, we covered topics from probability theory. The difference between statistics and probability theory is the following:

More information

Probability Distributions Columns (a) through (d)

Probability Distributions Columns (a) through (d) Discrete Probability Distributions Columns (a) through (d) Probability Mass Distribution Description Notes Notation or Density Function --------------------(PMF or PDF)-------------------- (a) (b) (c)

More information

1 Review of Probability

1 Review of Probability 1 Review of Probability Random variables are denoted by X, Y, Z, etc. The cumulative distribution function (c.d.f.) of a random variable X is denoted by F (x) = P (X x), < x

More information

CSE 312 Final Review: Section AA

CSE 312 Final Review: Section AA CSE 312 TAs December 8, 2011 General Information General Information Comprehensive Midterm General Information Comprehensive Midterm Heavily weighted toward material after the midterm Pre-Midterm Material

More information

Stat 5101 Notes: Brand Name Distributions

Stat 5101 Notes: Brand Name Distributions Stat 5101 Notes: Brand Name Distributions Charles J. Geyer February 14, 2003 1 Discrete Uniform Distribution DiscreteUniform(n). Discrete. Rationale Equally likely outcomes. The interval 1, 2,..., n of

More information

2. Suppose (X, Y ) is a pair of random variables uniformly distributed over the triangle with vertices (0, 0), (2, 0), (2, 1).

2. Suppose (X, Y ) is a pair of random variables uniformly distributed over the triangle with vertices (0, 0), (2, 0), (2, 1). Name M362K Final Exam Instructions: Show all of your work. You do not have to simplify your answers. No calculators allowed. There is a table of formulae on the last page. 1. Suppose X 1,..., X 1 are independent

More information

Lecture 17: The Exponential and Some Related Distributions

Lecture 17: The Exponential and Some Related Distributions Lecture 7: The Exponential and Some Related Distributions. Definition Definition: A continuous random variable X is said to have the exponential distribution with parameter if the density of X is e x if

More information

Classical Probability

Classical Probability Chapter 1 Classical Probability Probability is the very guide of life. Marcus Thullius Cicero The original development of probability theory took place during the seventeenth through nineteenth centuries.

More information

Things to remember when learning probability distributions:

Things to remember when learning probability distributions: SPECIAL DISTRIBUTIONS Some distributions are special because they are useful They include: Poisson, exponential, Normal (Gaussian), Gamma, geometric, negative binomial, Binomial and hypergeometric distributions

More information

Continuous Random Variables. and Probability Distributions. Continuous Random Variables and Probability Distributions ( ) ( ) Chapter 4 4.

Continuous Random Variables. and Probability Distributions. Continuous Random Variables and Probability Distributions ( ) ( ) Chapter 4 4. UCLA STAT 11 A Applied Probability & Statistics for Engineers Instructor: Ivo Dinov, Asst. Prof. In Statistics and Neurology Teaching Assistant: Christopher Barr University of California, Los Angeles,

More information

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems

More information

Probability: Why do we care? Lecture 2: Probability and Distributions. Classical Definition. What is Probability?

Probability: Why do we care? Lecture 2: Probability and Distributions. Classical Definition. What is Probability? Probability: Why do we care? Lecture 2: Probability and Distributions Sandy Eckel seckel@jhsph.edu 22 April 2008 Probability helps us by: Allowing us to translate scientific questions into mathematical

More information

Chapter 2. Discrete Distributions

Chapter 2. Discrete Distributions Chapter. Discrete Distributions Objectives ˆ Basic Concepts & Epectations ˆ Binomial, Poisson, Geometric, Negative Binomial, and Hypergeometric Distributions ˆ Introduction to the Maimum Likelihood Estimation

More information

Statistics STAT:5100 (22S:193), Fall Sample Final Exam B

Statistics STAT:5100 (22S:193), Fall Sample Final Exam B Statistics STAT:5 (22S:93), Fall 25 Sample Final Exam B Please write your answers in the exam books provided.. Let X, Y, and Y 2 be independent random variables with X N(µ X, σ 2 X ) and Y i N(µ Y, σ 2

More information

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows. Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage

More information

The Wright-Fisher Model and Genetic Drift

The Wright-Fisher Model and Genetic Drift The Wright-Fisher Model and Genetic Drift January 22, 2015 1 1 Hardy-Weinberg Equilibrium Our goal is to understand the dynamics of allele and genotype frequencies in an infinite, randomlymating population

More information

Statistics for scientists and engineers

Statistics for scientists and engineers Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3

More information

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna

More information

System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models

System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models Fatih Cavdur fatihcavdur@uludag.edu.tr March 20, 2012 Introduction Introduction The world of the model-builder

More information

CS37300 Class Notes. Jennifer Neville, Sebastian Moreno, Bruno Ribeiro

CS37300 Class Notes. Jennifer Neville, Sebastian Moreno, Bruno Ribeiro CS37300 Class Notes Jennifer Neville, Sebastian Moreno, Bruno Ribeiro 2 Background on Probability and Statistics These are basic definitions, concepts, and equations that should have been covered in your

More information

SDS 321: Introduction to Probability and Statistics

SDS 321: Introduction to Probability and Statistics SDS 321: Introduction to Probability and Statistics Lecture 14: Continuous random variables Purnamrita Sarkar Department of Statistics and Data Science The University of Texas at Austin www.cs.cmu.edu/

More information

Lecture 1: Probability Fundamentals

Lecture 1: Probability Fundamentals Lecture 1: Probability Fundamentals IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge January 22nd, 2008 Rasmussen (CUED) Lecture 1: Probability

More information

Lecture 2: Probability and Distributions

Lecture 2: Probability and Distributions Lecture 2: Probability and Distributions Ani Manichaikul amanicha@jhsph.edu 17 April 2007 1 / 65 Probability: Why do we care? Probability helps us by: Allowing us to translate scientific questions info

More information

Poisson Processes. Stochastic Processes. Feb UC3M

Poisson Processes. Stochastic Processes. Feb UC3M Poisson Processes Stochastic Processes UC3M Feb. 2012 Exponential random variables A random variable T has exponential distribution with rate λ > 0 if its probability density function can been written

More information