
Econ 514: Probability and Statistics

Lecture 6: Special probability distributions

Summarizing probability distributions

Let $X$ be a random variable with probability distribution $P_X$. We consider two types of probability distributions:

Discrete distributions: $P_X$ is absolutely continuous with respect to the counting measure.

Continuous distributions: $P_X$ is absolutely continuous with respect to the Lebesgue measure.

In both cases there is a density $f_X$. Initially we consider scalar $X$ and later random vectors $X$.

How do we summarize the probability distribution $P_X$? Obvious method: make a graph of the density. See figure 1 for the discrete case and figures 2 and 3 for the continuous case.


The graph can be used to visualize the support and to compute probabilities: intervals where $f_X$ is large have high probability.

Summarizing using moments

We can also try to summarize $P_X$ by numbers. This never gives a complete picture, because we summarize a function $f_X$ by a few numbers. The obvious choice is $E(X)$, the expected value of $X$ and the mean of the distribution $P_X$.

Interpretation: average over repetitions. Repeat the random experiment $N$ times and call the outcomes $X_1, X_2, X_3, \ldots, X_N$. If $N$ is large then
$$\frac{1}{N}\sum_{i=1}^N X_i \approx E(X),$$
i.e. the mean is the average over repetitions.
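To make the averaging interpretation concrete, here is a minimal simulation sketch in Python (NumPy assumed available; the die-rolling experiment and all names are chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(514)

# Repeat the experiment N times: here the experiment is rolling a fair die,
# so E(X) = (1 + 2 + ... + 6) / 6 = 3.5.
N = 100_000
draws = rng.integers(1, 7, size=N)

print(draws.mean())  # close to 3.5 for large N
```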

Interpretation: optimal prediction. Consider a predictor $m$ of the outcome $X$. The prediction error of this predictor is $X - m$. Assume that the loss function is proportional to $(X - m)^2$, i.e. proportional to the squared prediction error. The optimal predictor minimizes the expected loss
$$E\big((X-m)^2\big) = E\Big(\big((X - E(X)) + (E(X) - m)\big)^2\Big) =$$
$$= E\big((X-E(X))^2 + 2(X-E(X))(E(X)-m) + (E(X)-m)^2\big) =$$
$$= E\big((X-E(X))^2\big) + (E(X)-m)^2,$$
because the cross term has expectation $0$. This is minimal if $m = E(X)$.

Special case: if $f_X$ is symmetric around $\mu$, i.e. $f_X(\mu + x) = f_X(\mu - x)$, then, if $E(|X|) < \infty$, we have $E(X) = \mu$.

$E(X)$ can be outside the support of $X$: see figure 3. Implication for prediction?

The mean $E(X)$ is a measure of location of the distribution $P_X$. A measure of dispersion is the variance of $X$, defined by
$$\mathrm{Var}(X) = E\big((X - E(X))^2\big).$$
The interpretation is clear in the discrete case:
$$\mathrm{Var}(X) = \sum_i (x_i - E(X))^2 f_X(x_i)$$
with $x_i - E(X)$ the deviation from the mean and $f_X(x_i)$ the weight, i.e. the probability of the deviation.

We have
$$\mathrm{Var}(X) = E\big((X-E(X))^2\big) = E\big(X^2 - 2XE(X) + E(X)^2\big) = E(X^2) - 2E(X)^2 + E(X)^2 = E(X^2) - E(X)^2,$$
which is useful in computations.

Often we use $\mu$ or $\mu_X$ for $E(X)$ and $\sigma^2$ or $\sigma^2_X$ for $\mathrm{Var}(X)$. The standard deviation of (the distribution of) $X$, often denoted by $\sigma_X$, is defined by
$$\sigma_X = \sqrt{\mathrm{Var}(X)}.$$

Example: picking a number at random from $[-1, 1]$, so
$$f_X(x) = \tfrac{1}{2} I_{[-1,1]}(x).$$
By symmetry $E(X) = 0$. The variance is equal to $E(X^2)$:
$$\mathrm{Var}(X) = \sigma^2_X = \int_{-1}^{1} x^2 \cdot \tfrac{1}{2}\, dx = \left[\frac{x^3}{6}\right]_{-1}^{1} = \frac{1}{3}.$$
The standard deviation is $\sigma_X = 1/\sqrt{3}$.
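A quick Monte Carlo check of this computation (a sketch, assuming NumPy; the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pick numbers at random from [-1, 1] and compare the sample moments
# with the exact values E(X) = 0 and Var(X) = 1/3.
x = rng.uniform(-1.0, 1.0, size=1_000_000)

print(x.mean())                # close to 0
print(x.var())                 # close to 1/3 = 0.333...
print(x.std(), 1 / np.sqrt(3))  # both close to 0.577...
```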

Mean and variance are determined by $E(X)$ and $E(X^2)$. These are the first two moments of the distribution of $X$. In general the $k$-th moment, often denoted by $\mu_k$, is
$$\mu_k = E(X^k).$$
We can also define the central moments, i.e. the moments around $E(X) = \mu$:
$$m_k = E\big((X - \mu)^k\big).$$
The third central moment measures skewness and the fourth kurtosis. If the distribution is symmetric then the skewness is $0$ (see exercise). Kurtosis is a measure of peakedness (useful if the distribution is symmetric).

More moments means more knowledge about the distribution. What if we know all moments $\mu_k$, $k = 1, 2, \ldots$?

A useful tool to obtain moments is the moment generating function of $X$, denoted by $M_X(t)$ and defined by
$$M_X(t) = E\big(e^{tX}\big)$$
if this expectation exists for $-h < t < h$, $h > 0$. Obviously $M_X(0) = 1$.

Take the derivative with respect to $t$ and interchange integration and differentiation:
$$\frac{dM_X}{dt}(t) = \int x e^{tx} f_X(x)\, dx.$$
For a non-negative random variable this is allowed if
$$E\big(X e^{hX}\big) < \infty.$$
Why? Hence
$$\frac{dM_X}{dt}(0) = E(X).$$

In general
$$\frac{d^k M_X}{dt^k}(t) = \int x^k e^{tx} f_X(x)\, dx,$$
so that
$$\frac{d^k M_X}{dt^k}(0) = E(X^k).$$

The moments $E(X^k)$ do not uniquely determine the distribution of $X$. Casella and Berger give a counterexample. With some further assumptions the moments do determine the distribution:

If the distributions of $X$ and $Y$ have bounded support, then they are the same if and only if the moments are the same.

If the moment generating functions $M_X, M_Y$ exist and if they are equal for $-h < t < h$, then $X$ and $Y$ have the same distribution.

We can also consider the characteristic function
$$m_X(t) = E\big(e^{itX}\big).$$
Because
$$e^{itx} = \cos(tx) + i\sin(tx),$$
the characteristic function always exists. There is a 1-1 correspondence between characteristic functions and distributions.

Special distributions

There is a catalogue of standard distributions $P_X$ of random variables $X$. Often a random experiment that we encounter in practice is such that the associated random variable $X$ has such a standard distribution. Choosing a standard distribution is the selection of a mathematical model for a random experiment, described by the probability space $(\mathbb{R}, \mathcal{B}, P_X)$. Often $P_X$ depends on parameters that have to be chosen in order to have a fully specified mathematical model.

Description of special distributions:

(i) In what type of random experiments can the standard distribution be used?

(ii) Mean, variance, mgf (if it exists).

(iii) Shape of the density, i.e. graph of the density.

Discrete distributions

Discrete uniform distribution

Consider a random experiment with a finite number of outcomes that without loss of generality can be labeled $1, \ldots, N$. If the outcomes are equally likely, $P_X$ has a density with respect to the counting measure
$$f_X(x) = \Pr(X = x) = \frac{1}{N}, \quad x = 1, \ldots, N$$
$$= 0, \quad \text{elsewhere.}$$
This is the discrete uniform distribution. This distribution has one parameter, $N$. Moments etc. only have meaning if the outcomes $1, \ldots, N$ are not just labels, but a count.

Moment generating function:
$$M_X(t) = \frac{1}{N}\sum_{k=1}^N e^{tk} = \frac{1}{N}\,\frac{e^t(1 - e^{tN})}{1 - e^t}.$$
Using $\sum_{k=1}^N k = \frac{1}{2}N(N+1)$ and $\sum_{k=1}^N k^2 = \frac{N(N+1)(2N+1)}{6}$, we have
$$E(X) = \frac{1}{N}\sum_{k=1}^N k = \frac{N+1}{2}$$
$$E(X^2) = \frac{1}{N}\sum_{k=1}^N k^2 = \frac{(N+1)(2N+1)}{6}$$

so that
$$\mathrm{Var}(X) = E(X^2) - E(X)^2 = \frac{(N+1)(N-1)}{12}.$$

Bernoulli distribution

The random experiment has two outcomes that we label $0$ and $1$. Denote $\Pr(X = 1) = p$. $P_X$ has a density with respect to the counting measure
$$f_X(x) = p^x (1-p)^{1-x}, \quad x = 0, 1$$
$$= 0, \quad \text{elsewhere.}$$
This is the Bernoulli distribution. There is one parameter $p$ with $0 \leq p \leq 1$. The mgf is
$$M_X(t) = p e^t + 1 - p$$
and
$$E(X) = p \qquad E(X^2) = E(X) = p$$
$$\mathrm{Var}(X) = E(X^2) - E(X)^2 = p - p^2 = p(1-p).$$

Binomial distribution

Consider a sequence of independent Bernoulli random experiments (or trials). Define $X$ as the number of 1's in $n$ trials. Consider the event $X = x$.

For this event $x$ trials must have outcome $1$ and $n - x$ outcome $0$. A sequence with $x$ 1's and $n - x$ 0's is e.g.
$$\underbrace{1 \cdots 1}_{x}\;\underbrace{0 \cdots 0}_{n-x}.$$
The probability of this sequence is $p^x(1-p)^{n-x}$. There are $\binom{n}{x}$ sequences of 0's and 1's that have the same probability, so that
$$\Pr(X = x) = \binom{n}{x} p^x (1-p)^{n-x}.$$
Hence $P_X$ has a density with respect to the counting measure
$$f_X(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n$$
$$= 0, \quad \text{elsewhere.}$$
This is the Binomial distribution. Notation:
$$X \sim B(n, p).$$
Binomial formula:
$$(a + b)^n = \sum_{k=0}^n \binom{n}{k} a^k b^{n-k}.$$
We use this formula to establish that the density sums to 1:
$$\sum_{x=0}^n \binom{n}{x} p^x (1-p)^{n-x} = (p + (1-p))^n = 1.$$

The mgf is
$$M_X(t) = \sum_{x=0}^n \binom{n}{x} e^{tx} p^x (1-p)^{n-x} = \sum_{x=0}^n \binom{n}{x} (pe^t)^x (1-p)^{n-x} = \big(pe^t + 1 - p\big)^n.$$
Using the mgf we find
$$E(X) = \frac{dM_X}{dt}(0) = n\big(pe^t + 1 - p\big)^{n-1} pe^t\Big|_{t=0} = np$$
$$E(X^2) = \frac{d^2M_X}{dt^2}(0) = n\big(pe^t + 1 - p\big)^{n-1} pe^t\Big|_{t=0} + n(n-1)\big(pe^t + 1 - p\big)^{n-2} p^2 e^{2t}\Big|_{t=0} = np + n(n-1)p^2,$$
so that
$$\mathrm{Var}(X) = E(X^2) - E(X)^2 = np(1-p).$$

Let $Y_k$ be the outcome of the $k$-th Bernoulli trial, so that
$$X = \sum_{k=1}^n Y_k$$
with $Y_k$, $k = 1, \ldots, n$, stochastically independent. This implies that
$$E(X) = nE(Y_1) = np \qquad \mathrm{Var}(X) = n\mathrm{Var}(Y_1) = np(1-p)$$
$$M_X(t) = (M_{Y_1}(t))^n = \big(pe^t + 1 - p\big)^n.$$
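The differentiation of the mgf can be checked symbolically. A sketch assuming SymPy is available; it recovers $np$ and $np(1-p)$ from $M_X(t) = (pe^t + 1 - p)^n$:

```python
import sympy as sp

t, p, n = sp.symbols('t p n', positive=True)

# Binomial mgf M_X(t) = (p*e^t + 1 - p)^n
M = (p * sp.exp(t) + 1 - p) ** n

EX = sp.simplify(M.diff(t).subs(t, 0))       # first moment
EX2 = sp.simplify(M.diff(t, 2).subs(t, 0))   # second moment
Var = sp.simplify(sp.expand(EX2 - EX**2))    # variance

print(EX)   # n*p
print(Var)  # n*p*(1 - p), possibly printed in expanded form
```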

Shape of the density $f_X$. We have
$$\frac{f_X(x)}{f_X(x-1)} = 1 + \frac{(n+1)p - x}{x(1-p)}.$$
We conclude that $f_X$ is increasing for $x < (n+1)p$ and decreasing for $x > (n+1)p$. If $p > \frac{n}{n+1}$ then $f_X$ is increasing for $x = 0, \ldots, n$, and if $p < \frac{1}{n+1}$ then $f_X$ is decreasing for $x = 0, \ldots, n$. Otherwise $f_X$ is first increasing and then decreasing.

The value of $x$ that maximizes $f_X$ is called the mode of the distribution of $X$. For the binomial distribution the mode is the largest integer less than or equal to $(n+1)p$.

The binomial distribution has two parameters, $n$ and $p$, with $n$ a positive integer and $0 \leq p \leq 1$.

Example: sampling. Let $p$ be the fraction of households in the US with income less than $15000 per year. Select $n$ households at random from the population. Define $X$ as the number of households among the $n$ selected with income less than $15000. The distribution of $X$ is binomial if the selections of households are independent. This is true if the selection is done with replacement and approximately true if the population is sufficiently large.

Assume $n = 100$ and that 16 households have an income less than $15000. Now 16 is an estimate of $E(X) = np$, and this suggests that it is reasonable to guess that $\hat{p} = 16/n = 0.16$, i.e. that 16% of US households have an income less than $15000 per year.

Hypergeometric distribution

In the example we assumed (counterfactually) that selection was with replacement. Now consider a population of size $N$ from which we select a sample of size $n$ without replacement. In the population, $M$ households have an income of less than $15000. $X$ is the number of households among the $n$ selected with income less than $15000. $X = x$ iff

we select $x$ households from the $M$ with an income of less than $15000: this can be done in $\binom{M}{x}$ ways;

we select the remaining $n - x$ households from the $N - M$ with an income greater than or equal to $15000: this can be done in $\binom{N-M}{n-x}$ ways.

The total number of selections (without replacement) of $n$ households from the population of $N$ households is $\binom{N}{n}$. Combining these results we have
$$\Pr(X = x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}.$$

The distribution $P_X$ has a density with respect to the counting measure
$$f_X(x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}, \quad x = 0, \ldots, n$$
$$= 0, \quad \text{otherwise.}$$
The distribution $P_X$ is the Hypergeometric distribution. It can be shown (see Casella and Berger) that
$$E(X) = n\frac{M}{N} \qquad \mathrm{Var}(X) = \frac{N-n}{N-1}\, n\, \frac{M}{N}\left(1 - \frac{M}{N}\right).$$
Compare these results to those for the binomial distribution.
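To see how close the two distributions are when the population is large relative to the sample, one can compare them numerically. A sketch assuming scipy.stats, with illustrative values of $N$, $M$ and $n$:

```python
from scipy.stats import hypergeom, binom

# Illustrative numbers: population size N, M "successes" in the population,
# sample size n. Note scipy's hypergeom takes (total, #successes, #draws).
N, M, n = 10_000, 1_600, 100
p = M / N

hg = hypergeom(N, M, n)   # sampling without replacement
bi = binom(n, p)          # sampling with replacement

print(hg.mean(), bi.mean())          # both equal n*M/N = 16
print(hg.var(), bi.var())            # 13.31... vs 13.44
print(bi.var() * (N - n) / (N - 1))  # the finite-population correction
```

The hypergeometric variance is the binomial variance times the factor $(N-n)/(N-1)$, which tends to $1$ as $N \to \infty$ with $n$ fixed.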

Geometric distribution

Consider a sequence of independent Bernoulli random experiments with probability of outcome $1$ equal to $p$. Call outcome $1$ a success and outcome $0$ a failure. Define $X$ as the number of experiments before the first success. $X = x$ iff the outcomes of the first $x + 1$ Bernoulli experiments are
$$\underbrace{0 \cdots 0}_{x}\; 1,$$
i.e. there are $x$ leading 0's. Hence
$$\Pr(X = x) = (1-p)^x p.$$

$P_X$ has a density with respect to the counting measure
$$f_X(x) = (1-p)^x p, \quad x = 0, 1, \ldots$$
$$= 0, \quad \text{otherwise.}$$
The distribution $P_X$ is called the Geometric distribution. This distribution has one parameter $p$ with $0 \leq p \leq 1$. The mgf is
$$M_X(t) = E\big(e^{tX}\big) = \sum_{x=0}^\infty e^{tx}(1-p)^x p = p\sum_{x=0}^\infty \big((1-p)e^t\big)^x = \frac{p}{1 - (1-p)e^t}$$
for $(1-p)e^t < 1$.

From the mgf we find
$$E(X) = \frac{dM_X}{dt}(0) = \frac{1-p}{p}$$
$$E(X^2) = \frac{d^2M_X}{dt^2}(0) = \frac{1-p}{p^2} + \left(\frac{1-p}{p}\right)^2,$$
so that
$$\mathrm{Var}(X) = E(X^2) - E(X)^2 = \frac{1-p}{p^2}.$$
Sometimes we define $X_1$ as the number of Bernoulli experiments needed for the first success. Then
$$X_1 = X + 1$$
and e.g.
$$M_{X_1}(t) = E\big(e^{tX_1}\big) = e^t E\big(e^{tX}\big) = e^t M_X(t).$$

Example of the geometric distribution: consider a job seeker and let $p$ be the probability of receiving a job offer in any week. The week in which the first offer is received has the distribution $P_{X_1}$. We have for $x_2 \geq x_1$
$$\Pr(X_1 > x_2 \mid X_1 > x_1) = \frac{\Pr(X_1 > x_2)}{\Pr(X_1 > x_1)} = \frac{\sum_{x=x_2+1}^\infty (1-p)^{x-1} p}{\sum_{x=x_1+1}^\infty (1-p)^{x-1} p} = \frac{(1-p)^{x_2}}{(1-p)^{x_1}} =$$
$$= (1-p)^{x_2 - x_1} = \sum_{x=x_2-x_1+1}^\infty (1-p)^{x-1} p = \Pr(X_1 > x_2 - x_1).$$
Conclusion: if the job seeker has waited $x_1$ weeks, the probability that he/she has to wait another $x_2 - x_1$ weeks is the same as the probability of waiting $x_2 - x_1$ weeks from the beginning of the job search. The geometric distribution has no memory.
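The no-memory property can be checked by simulation. A sketch assuming NumPy (the offer probability $p = 0.2$ and the weeks 5 and 8 are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.2

# Weeks until the first offer: geometric on {1, 2, ...}.
weeks = rng.geometric(p, size=1_000_000)

# Memorylessness: P(X1 > 8 | X1 > 5) should equal P(X1 > 3).
lhs = (weeks > 8).sum() / (weeks > 5).sum()
rhs = (weeks > 3).mean()
print(lhs, rhs, (1 - p) ** 3)  # all close to 0.512
```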

Negative binomial distribution

The setup is as for the geometric distribution. Define $X$ as the number of failures before the $r$-th success. $X = x$ iff trial $x + r$ is a success (event $A$) and in the previous $x + r - 1$ trials there are $r - 1$ successes and $x$ failures (event $B$). Because the events $A$ and $B$ depend on independent random variables, $P(A \cap B) = P(A)P(B)$.

$P(A) = p$.

A particular sequence with $r - 1$ successes and $x$ failures has probability $p^{r-1}(1-p)^x$. Because we can choose the $r - 1$ successes in the $x + r - 1$ trials in $\binom{x+r-1}{r-1}$ ways, this is the number of such sequences. Hence
$$P(B) = \binom{x+r-1}{r-1} p^{r-1} (1-p)^x.$$

Combining,
$$\Pr(X = x) = p\binom{x+r-1}{r-1} p^{r-1} (1-p)^x.$$
$P_X$ has a density with respect to the counting measure
$$f_X(x) = \binom{x+r-1}{r-1} p^r (1-p)^x, \quad x = 0, 1, \ldots$$
$$= 0, \quad \text{otherwise.}$$
This is the Negative binomial distribution. The parameters are $r$ (a positive integer) and $p$ with $0 \leq p \leq 1$.

Poisson distribution

The Poisson distribution applies to the number of occurrences of some event in a time interval of finite length, e.g. the number of job offers received by a job seeker in a month. Offers can arrive at any moment (in continuous time). Compare with the geometric distribution.

Define $X(a, b)$ as the number of offers in $[a, b)$. The symbol $o(h)$ (small o of $h$) denotes any function with $\lim_{h \to 0} \frac{o(h)}{h} = 0$.

Assumptions:

(i) $\Pr(X(s, s+h) = 1) = \lambda h + o(h)$

(ii) $\Pr(X(s, s+h) \geq 2) = o(h)$

(iii) $X(a, b)$ and $X(c, d)$ are independent if $[a, b) \cap [c, d) = \emptyset$.

Consider $[0, t)$ and divide it into $n$ intervals of length $h = \frac{t}{n}$. Then (neglecting probabilities that are of order $o(h)$)
$$\Pr(X(0, t) = k) = \binom{n}{k}(\lambda h)^k (1 - \lambda h)^{n-k} = \binom{n}{k}\left(\frac{\lambda t}{n}\right)^k \left(1 - \frac{\lambda t}{n}\right)^{n-k} =$$
$$= \frac{(\lambda t)^k}{k!} \cdot \frac{n(n-1)\cdots(n-k+1)}{n^k}\left(1 - \frac{\lambda t}{n}\right)^{n}\left(1 - \frac{\lambda t}{n}\right)^{-k}.$$
Now
$$\lim_{n \to \infty} \frac{n(n-1)\cdots(n-k+1)}{n^k} = 1 \qquad \lim_{n \to \infty}\left(1 - \frac{\lambda t}{n}\right)^{-k} = 1 \qquad \lim_{n \to \infty}\left(1 - \frac{\lambda t}{n}\right)^{n} = e^{-\lambda t}.$$
Conclusion: for $n \to \infty$, and if we write $X$ for $X(0, t)$,
$$\Pr(X = k) = e^{-\lambda t}\frac{(\lambda t)^k}{k!}.$$
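The limit can be illustrated numerically by evaluating the binomial probability for increasing $n$. A sketch assuming scipy.stats, with $\lambda$, $t$ and $k$ chosen for illustration:

```python
from scipy.stats import binom, poisson

# Split [0, t) into n pieces and let n grow; the binomial probability of
# k events converges to the Poisson probability with parameter lam * t.
lam, t, k = 2.0, 1.0, 3

for n in (10, 100, 10_000):
    print(n, binom.pmf(k, n, lam * t / n))

print('Poisson:', poisson.pmf(k, lam * t))  # e^{-2} 2^3 / 3! = 0.1804...
```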

The distribution $P_X$ has a density with respect to the counting measure
$$f_X(x) = e^{-\theta}\frac{\theta^x}{x!}, \quad x = 0, 1, \ldots$$
$$= 0, \quad \text{otherwise.}$$
The distribution $P_X$ is the Poisson distribution. It has one parameter $\theta > 0$. Notation:
$$X \sim \text{Poisson}(\theta).$$
The mgf is
$$M_X(t) = \sum_{x=0}^\infty e^{tx} e^{-\theta}\frac{\theta^x}{x!} = e^{-\theta}\sum_{x=0}^\infty \frac{(e^t\theta)^x}{x!} = e^{(e^t - 1)\theta},$$
so that
$$E(X) = \theta \qquad E(X^2) = \theta^2 + \theta$$
and
$$\mathrm{Var}(X) = \theta.$$
Note $E(X) = \mathrm{Var}(X)$.

Continuous distributions

Uniform distribution

Random experiment: pick a number at random from $[a, b]$.
$$P_X([a, x]) = \frac{x - a}{b - a} = \int_a^x \frac{1}{b-a}\, ds.$$
Hence $P_X$ has a density with respect to the Lebesgue measure
$$f_X(x) = \frac{1}{b-a}, \quad a \leq x \leq b$$
$$= 0, \quad \text{otherwise.}$$
$P_X$ is the Uniform distribution on $[a, b]$. Notation:
$$X \sim U[a, b].$$
We have
$$M_X(t) = \frac{e^{bt} - e^{at}}{(b-a)t} \qquad E(X) = \frac{a+b}{2} \qquad \mathrm{Var}(X) = \frac{(b-a)^2}{12}.$$

[Graph of the uniform density.]

Normal distribution

The distribution $P_X$ has density with respect to the Lebesgue measure
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2\sigma^2}(x-\mu)^2}, \quad -\infty < x < \infty.$$
The mgf is
$$M_X(t) = E\big(e^{tX}\big) = e^{t\mu} E\big(e^{t(X-\mu)}\big) = e^{t\mu}\int \frac{1}{\sigma\sqrt{2\pi}} e^{t(x-\mu)} e^{-\frac{1}{2\sigma^2}(x-\mu)^2}\, dx = e^{t\mu}\int \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2\sigma^2}\left((x-\mu)^2 - 2\sigma^2 t(x-\mu)\right)}\, dx.$$
Now
$$(x-\mu)^2 - 2\sigma^2 t(x-\mu) = (x-\mu)^2 - 2\sigma^2 t(x-\mu) + \sigma^4 t^2 - \sigma^4 t^2 = (x - \mu - \sigma^2 t)^2 - \sigma^4 t^2,$$
so that
$$M_X(t) = e^{t\mu + \frac{1}{2}\sigma^2 t^2}\int \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2\sigma^2}(x - \mu - \sigma^2 t)^2}\, dx = e^{t\mu + \frac{1}{2}\sigma^2 t^2}.$$

From the mgf,
$$E(X) = \mu \qquad E(X^2) = \sigma^2 + \mu^2,$$
so that
$$\mathrm{Var}(X) = \sigma^2.$$
The distribution $P_X$ is the Normal distribution with mean $\mu$ and variance $\sigma^2$. Notation:
$$X \sim N(\mu, \sigma^2).$$

Define
$$Z = \frac{X - \mu}{\sigma}.$$
Then
$$E(Z) = 0 \qquad \mathrm{Var}(Z) = 1.$$
Hence $Z$ has a normal distribution with $\mu = 0$, $\sigma^2 = 1$. This is the standard normal distribution with density
$$\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2}, \quad -\infty < x < \infty,$$
and cdf
$$\Phi(x) = \int_{-\infty}^x \phi(s)\, ds.$$
We can compute the probability of an interval $[a, b]$ with the standard normal cdf:
$$\Pr(a \leq X \leq b) = \Pr\left(\frac{a-\mu}{\sigma} \leq Z \leq \frac{b-\mu}{\sigma}\right) = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right).$$
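In practice $\Phi$ is evaluated numerically. A sketch assuming scipy.stats, with illustrative values of $\mu$, $\sigma$, $a$ and $b$:

```python
from scipy.stats import norm

# X ~ N(mu, sigma^2); probability of the interval [a, b].
mu, sigma = 1.0, 2.0
a, b = 0.0, 3.0

# Via the standard normal cdf, as in the formula above.
prob = norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma)
print(prob)

# The same computation directly with location and scale arguments.
print(norm.cdf(b, mu, sigma) - norm.cdf(a, mu, sigma))
```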

Shape of the normal density: the bell curve.

Why is the normal distribution so popular?

Galton's quincunx or dropping board: a ball drops through rows of pins and is deflected left or right at each row. Define $X_n$ as the position (relative to $0$) after $n$ rows of pins. If $Z_n$ takes values $-1$ and $1$ and gives the direction at row $n$, then
$$X_n = Z_1 + \cdots + Z_n.$$

If $n$ is large then $X_n$ has approximately the normal distribution.

Central limit theorem: the sum of many independent small effects gives a normal distribution.
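The quincunx is easy to simulate. A sketch assuming NumPy; with $n = 50$ rows the positions $X_n = Z_1 + \cdots + Z_n$ should have mean $0$ and variance $n$, and a histogram of them would show the bell shape:

```python
import numpy as np

rng = np.random.default_rng(2)

# Quincunx: each of n rows deflects the ball by -1 or +1 with equal
# probability; the final position is the sum of the deflections.
n, balls = 50, 100_000
Z = rng.choice([-1, 1], size=(balls, n))
X = Z.sum(axis=1)

# By the CLT, X is approximately N(0, n): check mean and variance.
print(X.mean(), X.var(), n)
```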

Exponential distribution

Consider the waiting time to an event that can occur at any time (compare with the geometric distribution). Define the hazard or failure rate by
$$\Pr(\text{event in } [t, t+dt] \mid \text{event after } t) = \Pr(t \leq X < t + dt \mid X \geq t) = \frac{f_X(t)\,dt}{1 - F_X(t)}.$$
Assume
$$\frac{f_X(t)}{1 - F_X(t)} = \lambda.$$
Then the solution to this equation, obtained by integration, is
$$f_X(t) = \lambda e^{-\lambda t}.$$

The distribution $P_X$ has a density with respect to the Lebesgue measure
$$f_X(x) = \lambda e^{-\lambda x}, \quad x \geq 0$$
$$= 0, \quad \text{otherwise.}$$
$P_X$ is the Exponential distribution. There is one parameter $\lambda > 0$ and the notation is
$$X \sim \text{Exp}(\lambda).$$
The mgf is
$$M_X(t) = \frac{\lambda}{\lambda - t}, \quad t < \lambda,$$
and hence
$$E(X) = \frac{1}{\lambda} \qquad \mathrm{Var}(X) = \frac{1}{\lambda^2}.$$

Note that for $t \geq s$
$$\Pr(X > t \mid X > s) = \frac{\Pr(X > t)}{\Pr(X > s)} = \frac{e^{-\lambda t}}{e^{-\lambda s}} = e^{-\lambda(t-s)}.$$
If you have waited $s$, the probability of an additional wait of $t - s$ is the same as if the wait had started at time $0$. Like the geometric distribution, the exponential distribution has no memory. If $X$ is the length of a human life, compare $\Pr(X > 40 \mid X > 30)$ and $\Pr(X > 70 \mid X > 60)$.

Connection with the Poisson distribution: if the event is recurrent and the waiting time between occurrences has an exponential distribution with parameter $\lambda$, then the number of occurrences in $[0, t]$ has a Poisson distribution with parameter $\lambda t$.
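This connection can be checked by simulation: generate exponential waiting times, count how many occurrences fall in $[0, t]$, and compare with the Poisson moments. A sketch assuming NumPy, with illustrative $\lambda$ and $t$:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, t, reps = 2.0, 1.0, 100_000

# Exponential waiting times between occurrences; count events in [0, t].
waits = rng.exponential(1 / lam, size=(reps, 30))  # 30 waits is ample here
arrival_times = waits.cumsum(axis=1)
counts = (arrival_times <= t).sum(axis=1)

# The counts should be Poisson(lam * t): mean = variance = 2.
print(counts.mean(), counts.var())
```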

Gamma distribution

The Gamma distribution is the distribution of $X = Y_1 + \cdots + Y_r$ with $Y_k$ independent exponential random variables with parameter $\lambda$. $X$ is the waiting time to the $r$-th occurrence of the event. Compare with the negative binomial distribution.

The distribution $P_X$ has a density with respect to the Lebesgue measure
$$f_X(x) = \frac{\lambda}{\Gamma(r)}(\lambda x)^{r-1} e^{-\lambda x}, \quad x \geq 0$$
$$= 0, \quad \text{otherwise,}$$
with $\Gamma$ the Gamma function. $\Gamma(r) = (r-1)!$ if $r$ is a positive integer, and otherwise it has to be computed numerically.

This is the Gamma distribution with parameters $\lambda, r > 0$; $r$ need not be an integer. Notation:
$$X \sim \Gamma(\lambda, r).$$
The mgf is
$$M_X(t) = \left(\frac{\lambda}{\lambda - t}\right)^r, \quad t < \lambda,$$
so that
$$E(X) = \frac{r}{\lambda} \qquad \mathrm{Var}(X) = \frac{r}{\lambda^2}.$$

Lognormal distribution

Let $Y \sim N(\mu, \sigma^2)$ and define $X = e^Y$. The distribution $P_X$ has density
$$f_X(x) = \frac{1}{x\sigma\sqrt{2\pi}} e^{-\frac{1}{2\sigma^2}(\ln x - \mu)^2}, \quad x > 0$$
$$= 0, \quad \text{otherwise.}$$
Derive this density. This is the Lognormal distribution with parameters $\mu$ and $\sigma^2$. The mean and variance can be derived from the mgf of the normal distribution:
$$E(X) = e^{\mu + \frac{1}{2}\sigma^2} \qquad \mathrm{Var}(X) = e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2}.$$
What are $E(\ln X)$ and $\mathrm{Var}(\ln X)$?

Cauchy distribution

A random variable that has a distribution with density with respect to the Lebesgue measure
$$f_X(x) = \frac{1}{\pi\beta\left(1 + \left(\frac{x-\alpha}{\beta}\right)^2\right)}, \quad -\infty < x < \infty,$$
has a Cauchy distribution with parameters $\alpha$ and $\beta > 0$. The density is symmetric around $\alpha$, which is the median of $X$. $E(X)$ does not exist and $\mathrm{Var}(X) = \infty$. The mgf is infinite for $t \neq 0$.

Chi-squared distribution

The chi-squared distribution is a special case of the $\Gamma$ distribution: set $r = \frac{k}{2}$ and $\lambda = \frac{1}{2}$. The density is
$$f_X(x) = \frac{1}{\Gamma\left(\frac{k}{2}\right) 2^{\frac{k}{2}}}\, x^{\frac{k}{2}-1} e^{-\frac{x}{2}}, \quad x \geq 0$$
$$= 0, \quad \text{otherwise.}$$
The parameter $k$ is called the degrees of freedom of the distribution. The chi-squared distribution is important because of the following result: if $X$ has a standard normal distribution, then $Y = X^2$ has a chi-squared distribution with $k = 1$.

We derive the mgf:
$$M_Y(t) = E\big(e^{tX^2}\big) = \int \frac{1}{\sqrt{2\pi}} e^{tx^2 - \frac{1}{2}x^2}\, dx = \frac{1}{\sqrt{1-2t}}\int \frac{\sqrt{1-2t}}{\sqrt{2\pi}}\, e^{-\frac{1-2t}{2}x^2}\, dx = \frac{1}{\sqrt{1-2t}} = \left(\frac{\frac{1}{2}}{\frac{1}{2} - t}\right)^{\frac{1}{2}},$$
which is the mgf of the $\Gamma$ distribution with $r = \frac{1}{2}$ and $\lambda = \frac{1}{2}$, i.e. the chi-squared distribution with $k = 1$.
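The result is easy to verify by simulation. A sketch assuming NumPy and scipy.stats:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)

# Square a standard normal sample and compare with chi-squared (k = 1).
z2 = rng.standard_normal(1_000_000) ** 2

print(z2.mean(), chi2(df=1).mean())             # both close to 1
print(z2.var(), chi2(df=1).var())               # both close to 2
print((z2 <= 1.0).mean(), chi2(df=1).cdf(1.0))  # cdf check at x = 1
```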

Exponential family of distributions

The exponential family of densities consists of the densities that can be expressed as
$$f_X(x) = h(x)\, c(\theta)\, e^{\sum_{i=1}^k w_i(\theta) t_i(x)}, \quad -\infty < x < \infty.$$
Note that $c$ and $w_i$, $i = 1, \ldots, k$, do not depend on $x$, and $h$ and $t_i$, $i = 1, \ldots, k$, do not depend on $\theta$. Here $\theta$ is the vector of parameters of the distribution.

Why useful: we will see that if we have data from an exponential family distribution, the information can be summarized by $t_i$, $i = 1, \ldots, k$.

Examples

(i) Binomial distribution: for $x = 0, \ldots, n$
$$f_X(x) = \binom{n}{x} p^x (1-p)^{n-x} = \binom{n}{x}(1-p)^n e^{x \ln\left(\frac{p}{1-p}\right)}.$$
Hence
$$h(x) = \binom{n}{x} \qquad t(x) = x \qquad c(\theta) = (1-p)^n \qquad w(\theta) = \ln\left(\frac{p}{1-p}\right).$$

(ii) Normal distribution: for $-\infty < x < \infty$
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2\sigma^2}(x-\mu)^2} = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{\mu^2}{2\sigma^2}}\, e^{-\frac{1}{2\sigma^2}x^2 + \frac{\mu}{\sigma^2}x}.$$
Hence
$$h(x) = 1 \qquad t_1(x) = x^2 \qquad t_2(x) = x$$
$$c(\theta) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{\mu^2}{2\sigma^2}} \qquad w_1(\theta) = -\frac{1}{2\sigma^2} \qquad w_2(\theta) = \frac{\mu}{\sigma^2}.$$
Other exponential family distributions: Poisson, exponential, Gamma.

The density of the uniform distribution is
$$f_X(x) = \frac{1}{b-a} I(a \leq x \leq b).$$
The function $I(a \leq x \leq b)$ cannot be factorized into a function of $x$ and a function of $a, b$. Hence it does not belong to the exponential family.

Multivariate distributions: recapitulation

Consider a probability space $(\Omega, \mathcal{A}, P)$ and define a vector of random variables or random vector $X$ as a function $X : \Omega \to \mathbb{R}^K$, i.e.
$$X(\omega) = \begin{pmatrix} X_1(\omega) \\ \vdots \\ X_K(\omega) \end{pmatrix}.$$
The distribution of $X$ is a probability measure $P_X : \mathcal{B}^K \to [0, 1]$. This is usually called the joint distribution of the random vector $X$. We consider the case that $P_X$ has a density with respect to the counting measure (discrete distribution) or with respect to the Lebesgue measure (continuous distribution). The density $f_X(x_1, \ldots, x_K)$ is called the joint density of $X$.

We have
$$\Pr(X_1 \in B) = P_X(B \times \mathbb{R} \times \cdots \times \mathbb{R}) = \int_B \cdots \int f_X(x_1, \ldots, x_K)\, dx_1 \cdots dx_K = \int_B f_{X_1}(x_1)\, dx_1$$
with
$$f_{X_1}(x_1) = \int \cdots \int f_X(x_1, x_2, \ldots, x_K)\, dx_2 \cdots dx_K.$$
$f_{X_1}$ is called the marginal density of $X_1$. The marginal density of $X_k$ for any $k$ is obtained in the same way. For discrete distributions replace integration by summation.

Consider the subvectors $X_1, \ldots, X_{K_1}$ and $X_{K_1+1}, \ldots, X_K$. The distributions of these subvectors are independent if and only if
$$f_X(x_1, \ldots, x_K) = f_{X_1 \ldots X_{K_1}}(x_1, \ldots, x_{K_1})\, f_{X_{K_1+1} \ldots X_K}(x_{K_1+1}, \ldots, x_K),$$
i.e. the joint density is the product of the marginal densities.

The conditional distribution of $X_1, \ldots, X_{K_1}$ given $X_{K_1+1}, \ldots, X_K$ has density
$$f_{X_1 \ldots X_{K_1} \mid X_{K_1+1} \ldots X_K}(x_1, \ldots, x_{K_1} \mid x_{K_1+1}, \ldots, x_K) = \frac{f_X(x_1, \ldots, x_K)}{f_{X_{K_1+1} \ldots X_K}(x_{K_1+1}, \ldots, x_K)},$$
i.e. it is the ratio of the joint density and the marginal density of the variables on which we condition.

If $\tilde{X}$ is any subvector of $X$ that does not have $X_1$ as a component, then the conditional mean of $X_1$ given $\tilde{X} = \tilde{x}$ can be computed using the conditional density of $X_1$ given $\tilde{X}$:
$$E(X_1 \mid \tilde{X} = \tilde{x}) = \int_{\mathbb{R}} x_1 f_{X_1 \mid \tilde{X}}(x_1 \mid \tilde{x})\, dx_1.$$
For a discrete distribution replace integration by summation. The conditional variance of $X_1$ given $\tilde{X}$ is
$$\mathrm{Var}(X_1 \mid \tilde{X} = \tilde{x}) = E\Big(\big(X_1 - E(X_1 \mid \tilde{X} = \tilde{x})\big)^2 \,\Big|\, \tilde{X} = \tilde{x}\Big).$$
We have
$$\mathrm{Var}(X_1 \mid \tilde{X} = \tilde{x}) = E\big(X_1^2 - 2X_1E(X_1 \mid \tilde{X} = \tilde{x}) + E(X_1 \mid \tilde{X} = \tilde{x})^2 \mid \tilde{X} = \tilde{x}\big) =$$
$$= E\big(X_1^2 \mid \tilde{X} = \tilde{x}\big) - 2E(X_1 \mid \tilde{X} = \tilde{x})^2 + E(X_1 \mid \tilde{X} = \tilde{x})^2 = E\big(X_1^2 \mid \tilde{X} = \tilde{x}\big) - E(X_1 \mid \tilde{X} = \tilde{x})^2.$$
Compare this result to that for the unconditional variance.

Law of iterated expectations:
$$E(X_1) = E_{\tilde{X}}\big(E_{X_1 \mid \tilde{X}}(X_1 \mid \tilde{X})\big).$$
Remember that on the rhs we just integrate $E(X_1 \mid \tilde{X} = \tilde{x})$ with respect to the distribution of $\tilde{X}$.

For the variance, note
$$E_{\tilde{X}}\big[\mathrm{Var}(X_1 \mid \tilde{X})\big] = E_{\tilde{X}}\big[E\big(X_1^2 \mid \tilde{X}\big)\big] - E_{\tilde{X}}\big[E(X_1 \mid \tilde{X})^2\big]$$
and, because $E(X_1 \mid \tilde{X})$ is a random variable that is a function of $\tilde{X}$,
$$\mathrm{Var}\big[E(X_1 \mid \tilde{X})\big] = E_{\tilde{X}}\big[E(X_1 \mid \tilde{X})^2\big] - \big(E_{\tilde{X}}\big[E(X_1 \mid \tilde{X})\big]\big)^2.$$
Adding these equations we obtain
$$E\big(\mathrm{Var}(X_1 \mid \tilde{X})\big) + \mathrm{Var}\big(E(X_1 \mid \tilde{X})\big) = E(X_1^2) - (E(X_1))^2 = \mathrm{Var}(X_1).$$
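Both identities can be verified numerically in a simple two-stage experiment. A sketch assuming NumPy; the Poisson-binomial hierarchy is chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000

# Two-stage experiment: X2 ~ Poisson(3), and given X2 = m,
# X1 ~ Binomial(m, 0.5).
x2 = rng.poisson(3.0, size=n)
x1 = rng.binomial(x2, 0.5)

# Law of iterated expectations: E(X1) = E(E(X1 | X2)) = E(0.5 * X2) = 1.5
print(x1.mean(), (0.5 * x2).mean())

# Variance decomposition:
# Var(X1) = E(Var(X1 | X2)) + Var(E(X1 | X2)) = E(0.25 * X2) + Var(0.5 * X2)
print(x1.var(), (0.25 * x2).mean() + (0.5 * x2).var())
```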

Summary measures associated with multivariate distributions, i.e. the distribution of a random vector $X$.

Obvious: means and variances of the random variables in $X$ (marginal means and variances). For random vectors we also consider the covariance of any two components of $X$, say $X_1$ and $X_2$:
$$\mathrm{Cov}(X_1, X_2) = E\big[(X_1 - E(X_1))(X_2 - E(X_2))\big].$$
The covariance is informative on the relation between $X_1$ and $X_2$; e.g. for a discrete distribution
$$\mathrm{Cov}(X_1, X_2) = \sum_{x_1}\sum_{x_2}(x_1 - E(X_1))(x_2 - E(X_2)) f_{X_1X_2}(x_1, x_2).$$
If outcomes with $x_1 - E(X_1) > 0$ and $x_2 - E(X_2) > 0$, or $x_1 - E(X_1) < 0$ and $x_2 - E(X_2) < 0$ (deviations go in the same direction), are more likely than outcomes with $x_1 - E(X_1) > 0$ and $x_2 - E(X_2) < 0$, or $x_1 - E(X_1) < 0$ and $x_2 - E(X_2) > 0$ (deviations go in opposite directions), then $\mathrm{Cov}(X_1, X_2) > 0$.

In that case there is a positive association between $X_1$ and $X_2$. If the second type of outcomes is more likely, then $\mathrm{Cov}(X_1, X_2) < 0$ and the association is negative.

Note that for constants $c, d$
$$\mathrm{Cov}(cX_1, dX_2) = cd\,\mathrm{Cov}(X_1, X_2),$$
so that the size of $\mathrm{Cov}(X_1, X_2)$ is not a good measure of the strength of the association. To measure the strength we define the correlation coefficient of $X_1, X_2$ by
$$\rho_{X_1X_2} = \frac{\mathrm{Cov}(X_1, X_2)}{\sqrt{\mathrm{Var}(X_1)}\sqrt{\mathrm{Var}(X_2)}}.$$

To derive its properties we need the Cauchy-Schwarz inequality
$$|E(X_1X_2)| \leq \sqrt{E(X_1^2)}\sqrt{E(X_2^2)}.$$
Proof: consider
$$0 \leq E\big[(tX_1 + X_2)^2\big] = t^2 E(X_1^2) + 2tE(X_1X_2) + E(X_2^2).$$
The rhs is a quadratic function of $t$ with at most one zero. Hence the discriminant satisfies
$$4E(X_1X_2)^2 - 4E(X_1^2)E(X_2^2) \leq 0.$$
Dividing by 4 and taking the square root gives the inequality.

If
$$E\big[(tX_1 + X_2)^2\big] = 0$$
then
$$\Pr(tX_1 + X_2 = 0) = 1,$$
i.e. the joint distribution is concentrated on the line $tx_1 + x_2 = 0$.

Properties of the correlation coefficient:
$$\rho_{cX_1, dX_2} = \rho_{X_1X_2} \quad \text{(for } cd > 0\text{).}$$
By Cauchy-Schwarz,
$$|\mathrm{Cov}(X_1, X_2)| = \big|E\big[(X_1 - E(X_1))(X_2 - E(X_2))\big]\big| \leq \sqrt{E\big((X_1 - E(X_1))^2\big)}\sqrt{E\big((X_2 - E(X_2))^2\big)},$$
so that
$$|\rho_{X_1X_2}| \leq 1.$$

Note
$$|\rho_{X_1X_2}| = 1 \iff \Pr\big(X_2 - E(X_2) = t(X_1 - E(X_1))\big) = 1 \text{ for some } t \neq 0.$$
Hence $\Pr(X_2 = a + bX_1) = 1$ with $a = E(X_2) - tE(X_1)$ and $b = t$. Note that
$$\Pr\big(X_2 - E(X_2) = t(X_1 - E(X_1))\big) = 1 \implies \mathrm{Cov}(X_1, X_2) = b\,\mathrm{Var}(X_1),$$
so that $\mathrm{sign}(\rho_{X_1X_2}) = \mathrm{sign}(\mathrm{Cov}(X_1, X_2)) = \mathrm{sign}(b)$.

Conclusion: $|\rho_{X_1X_2}| = 1 \iff \Pr(X_2 = a + bX_1) = 1$ for some $b \neq 0$. If $\rho_{X_1X_2} = 1$ then $b > 0$, and if $\rho_{X_1X_2} = -1$ then $b < 0$.

The correlation coefficient is a measure of the strength of the association, and the extreme values correspond to a linear relation.

In the case of a multivariate distribution we organize the variances and covariances in a matrix, the variance(-covariance) matrix of $X$:
$$\mathrm{Var}(X) = \begin{pmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_K) \\ \mathrm{Cov}(X_1, X_2) & \mathrm{Var}(X_2) & \cdots & \mathrm{Cov}(X_2, X_K) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_1, X_K) & \mathrm{Cov}(X_2, X_K) & \cdots & \mathrm{Var}(X_K) \end{pmatrix}.$$
Note that this is a symmetric $K \times K$ matrix:
$$\mathrm{Var}(X) = \mathrm{Var}(X)'.$$
Often we use the notation $\mathrm{Var}(X) = \Sigma$.

Remember that if $X$ is a $K$-vector, then
$$(X - \mu)(X - \mu)' = \begin{pmatrix} X_1 - \mu_1 \\ \vdots \\ X_K - \mu_K \end{pmatrix}\begin{pmatrix} X_1 - \mu_1 & \cdots & X_K - \mu_K \end{pmatrix} =$$
$$= \begin{pmatrix} (X_1 - \mu_1)^2 & (X_1 - \mu_1)(X_2 - \mu_2) & \cdots & (X_1 - \mu_1)(X_K - \mu_K) \\ (X_1 - \mu_1)(X_2 - \mu_2) & (X_2 - \mu_2)^2 & \cdots & (X_2 - \mu_2)(X_K - \mu_K) \\ \vdots & \vdots & \ddots & \vdots \\ (X_1 - \mu_1)(X_K - \mu_K) & (X_2 - \mu_2)(X_K - \mu_K) & \cdots & (X_K - \mu_K)^2 \end{pmatrix},$$
so that, if we denote $\mu = E(X)$,
$$\Sigma = \mathrm{Var}(X) = E\big((X - \mu)(X - \mu)'\big).$$

Linear and quadratic functions of random vectors

If $X$ is a random vector with $K$ components and $a$ is a $K$-vector of constants, we define the linear function of $X$
$$a'X = \sum_{k=1}^K a_k X_k.$$
Hence
$$E(a'X) = E\left(\sum_{k=1}^K a_k X_k\right) = \sum_{k=1}^K a_k E(X_k) = a'E(X).$$
Also
$$\mathrm{Var}(a'X) = E\big[(a'X - E(a'X))^2\big] = E\big[(a'X - a'\mu)(a'X - a'\mu)\big] = E\big[(a'X - a'\mu)(X'a - \mu'a)\big] =$$
$$= E\big[a'(X - \mu)(X - \mu)'a\big] = a'E\big[(X - \mu)(X - \mu)'\big]a = a'\Sigma a.$$
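A numerical check of these two formulas (a sketch assuming NumPy; $\mu$, $\Sigma$ and $a$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative 3-dimensional normal with a chosen covariance matrix Sigma.
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
a = np.array([1.0, -2.0, 0.5])

X = rng.multivariate_normal(mu, Sigma, size=1_000_000)
Y = X @ a  # the linear function a'X for every draw

print(Y.mean(), a @ mu)        # E(a'X) = a'mu
print(Y.var(), a @ Sigma @ a)  # Var(a'X) = a'Sigma a
```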

Moment generating function of a joint distribution

If $X$ is a random vector, the mgf of $X$ is
$$M_X(t) = E\big(e^{t_1X_1 + \cdots + t_KX_K}\big)$$
if the mgf exists for $-h < t_k < h$, $k = 1, \ldots, K$. Note
$$t = \begin{pmatrix} t_1 \\ \vdots \\ t_K \end{pmatrix}.$$
Note that
$$\frac{\partial^2 M_X}{\partial t_1 \partial t_2}(t) = E\big(X_1X_2\, e^{t_1X_1 + \cdots + t_KX_K}\big),$$
so that
$$\frac{\partial^2 M_X}{\partial t_1 \partial t_2}(0) = E(X_1X_2).$$

This can be used to compute the covariance, because
$$\mathrm{Cov}(X_1, X_2) = E(X_1X_2) - E(X_1)E(X_2).$$
The mgf of the marginal distribution of $X_1$ is
$$M_{X_1}(t_1) = M_X(t_1, 0, \ldots, 0).$$

Special multivariate distributions

Multinomial distribution

Binomial distribution: the number of 1's in $n$ independent Bernoulli experiments. Instead of a Bernoulli experiment with two outcomes, consider a random experiment with $K$ outcomes $k = 1, \ldots, K$. An example is to pick a student at random from class and record his/her nationality. Label the nationalities $k = 1, \ldots, K$. If the fraction with nationality $k$ is $p_k$, then if the outcome of the random selection is $Y$ we have
$$\Pr(Y = k) = p_k, \quad k = 1, \ldots, K,$$
with $\sum_{k=1}^K p_k = 1$.

Repeat this experiment $n$ times and let the repetitions be independent. Define $X_k$ as the number of experiments with outcome $k$. Note $\sum_{k=1}^K X_k = n$, so that $X_K$ is determined by $X_1, \ldots, X_{K-1}$.

Consider a sequence of $n$ outcomes:

Experiment: $1$, $2$, $3$, ..., $n-1$, $n$
Outcome: $3$, $4$, $1$, ..., $K-1$, $K$
Probability: $p_3$, $p_4$, $p_1$, ..., $p_{K-1}$, $p_K$

The probability of this sequence is
$$p_1^{x_1} p_2^{x_2} \cdots p_{K-1}^{x_{K-1}} p_K^{x_K}$$
with $x_K = n - \sum_{k=1}^{K-1} x_k$. To compute $\Pr(X_1 = x_1, \ldots, X_{K-1} = x_{K-1})$ we count the number of such sequences.

This is equivalent to picking $x_1$ experiments with outcome 1, $x_2$ with outcome 2, etc. from the $n$ experiments.

Start with picking the $x_1$ experiments with outcome 1 among the $n$ experiments. This can be done in $\binom{n}{x_1}$ ways.

From the remaining $n - x_1$ experiments pick the experiments with outcome 2. This can be done in $\binom{n-x_1}{x_2}$ ways.

The total number of ways to choose the experiments with outcomes 1 and 2 is
$$\binom{n}{x_1}\binom{n-x_1}{x_2} = \frac{n!}{x_1!\,x_2!\,(n-x_1-x_2)!}.$$
Using the same argument repeatedly we find that the total number of ways to choose the experiments with outcomes $1, 2, \ldots, K$ is
$$\frac{n!}{x_1! \cdots x_{K-1}!\,(n - x_1 - \cdots - x_{K-1})!} = \frac{n!}{x_1! \cdots x_K!}.$$

Hence
$$\Pr(X_1 = x_1, \ldots, X_{K-1} = x_{K-1}) = \frac{n!}{x_1! \cdots x_K!}\, p_1^{x_1} p_2^{x_2} \cdots p_{K-1}^{x_{K-1}} p_K^{x_K}.$$
The Multinomial joint density of $X_1, \ldots, X_{K-1}$ is
$$f_X(x_1, \ldots, x_{K-1}) = \frac{n!}{\prod_{k=1}^K x_k!}\prod_{k=1}^K p_k^{x_k}, \quad 0 \leq x_k \leq n, \; \sum_{k=1}^K x_k = n$$
$$= 0, \quad \text{otherwise.}$$
Multinomial formula:
$$(a_1 + \cdots + a_K)^n = \sum_{x_1 + \cdots + x_K = n} \frac{n!}{x_1! \cdots x_K!}\, a_1^{x_1} \cdots a_K^{x_K}.$$

Using this, the mgf is
$$M_X(t) = E\big(e^{t_1X_1 + \cdots + t_{K-1}X_{K-1}}\big) = \sum_{x_1 + \cdots + x_K = n} \frac{n!}{x_1! \cdots x_K!}\big(e^{t_1}p_1\big)^{x_1} \cdots \big(e^{t_{K-1}}p_{K-1}\big)^{x_{K-1}} p_K^{x_K} = \left(\sum_{k=1}^{K-1} e^{t_k}p_k + p_K\right)^n.$$
From the mgf we find
$$E(X_k) = np_k \qquad \mathrm{Var}(X_k) = np_k(1-p_k) \qquad \mathrm{Cov}(X_k, X_l) = -np_kp_l.$$
Exercise: What is the marginal distribution of $X_k$? What is the conditional distribution of $X_1, X_2$ given $X_3 = x_3, \ldots, X_{K-1} = x_{K-1}$?
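These moments can be checked against simulated multinomial draws. A sketch assuming NumPy, with illustrative $n$ and $p_k$; note in particular the negative covariance:

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative multinomial: n trials, K = 3 outcomes.
n = 50
p = np.array([0.2, 0.3, 0.5])
X = rng.multinomial(n, p, size=500_000)

print(X.mean(axis=0), n * p)                 # E(X_k) = n p_k
print(X[:, 0].var(), n * p[0] * (1 - p[0]))  # Var(X_1) = n p_1 (1 - p_1)
cov01 = np.cov(X[:, 0], X[:, 1])[0, 1]
print(cov01, -n * p[0] * p[1])               # Cov(X_1, X_2) = -n p_1 p_2
```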

Multivariate normal distribution

The random $K$-vector $X$ has a $K$-dimensional Multivariate normal distribution if its distribution has a density with respect to the $K$-dimensional Lebesgue measure equal to
$$f_X(x) = \frac{1}{|\Sigma|^{\frac{1}{2}}(2\pi)^{\frac{K}{2}}}\, e^{-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)}, \quad -\infty < x < \infty.$$
By completion of squares (see the 1-dimensional case) the mgf is
$$M_X(t) = e^{t'\mu + \frac{1}{2}t'\Sigma t}.$$
Exercise: Derive the mgf. Hence
$$E(X) = \mu \qquad \mathrm{Var}(X) = \Sigma.$$
Exercise: Derive these results.

The marginal distribution of $X_k$ is normal with mean $\mu_k$ and variance $\sigma_k^2$, the $k$-th element of the main diagonal of $\Sigma$. Exercise: Prove this using the mgf.

Special case $K = 2$: the bivariate normal distribution. Let the random vector be
$$\begin{pmatrix} Y \\ X \end{pmatrix}.$$
The conditional distribution of $Y$ given $X = x$ is normal with
$$E(Y \mid X = x) = \mu_Y + \frac{\sigma_{XY}}{\sigma_X^2}(x - \mu_X)$$
$$\mathrm{Var}(Y \mid X = x) = \sigma_Y^2\left(1 - \frac{\sigma_{XY}^2}{\sigma_X^2\sigma_Y^2}\right) = \sigma_Y^2\big(1 - \rho_{XY}^2\big)$$
with $\sigma_{XY} = \mathrm{Cov}(X, Y)$. The conditional mean is linear in $x$. Compare with the result that $\Pr(Y = a + bX) = 1$ if and only if $|\rho_{XY}| = 1$.
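The conditional mean and variance formulas can be checked by simulating the bivariate normal and conditioning on a thin slice of $x$ values. A sketch assuming NumPy; all parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(8)

# Illustrative bivariate normal for (Y, X).
mu_Y, mu_X = 2.0, 1.0
s_Y2, s_X2, s_XY = 4.0, 1.0, 1.2  # variances and covariance

cov = np.array([[s_Y2, s_XY], [s_XY, s_X2]])
Y, X = rng.multivariate_normal([mu_Y, mu_X], cov, size=2_000_000).T

# Condition on X close to x = 1.5 and compare with the formulas.
x = 1.5
sel = np.abs(X - x) < 0.01
print(Y[sel].mean(), mu_Y + s_XY / s_X2 * (x - mu_X))  # conditional mean
print(Y[sel].var(), s_Y2 - s_XY**2 / s_X2)             # conditional variance
```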

Regression fallacy or regression to the mean

Francis Galton (1822-1911) observed that tall fathers have on average shorter sons, and short fathers have on average taller sons (in Victorian England mothers and daughters did not count). If this process were to continue, one would expect that in the long run the extremes would disappear and all fathers and sons would have the average height.

Using the same reasoning: short sons have on average taller fathers (with a height closer to the mean) and tall sons have on average shorter fathers (again with a height closer to the mean). By this argument there is a tendency to move away from the mean!

Similar observations can be made about many phenomena: rookie players who do exceptionally well in the first year tend to have a slump in the second; bringing in new management when a company underperforms seems to improve performance, etc.

Analysis:
$$X = \text{height of father} \qquad Y = \text{height of son}.$$
A reasonable assumption is that $X, Y$ have a bivariate normal distribution with
$$E(X) = E(Y) = \mu \qquad \mathrm{Var}(X) = \mathrm{Var}(Y) = \sigma^2 \qquad 0 < \rho_{XY} < 1.$$

Hence
$$E(Y \mid X = x) = \mu + \rho(x - \mu).$$
If $x > \mu$,
$$0 < E(Y \mid X = x) - \mu < x - \mu,$$
i.e. the average height of sons of fathers with more than average height is closer to the mean. If $x < \mu$,
$$0 > E(Y \mid X = x) - \mu > x - \mu,$$
i.e. the average height of sons of fathers with less than average height is closer to the mean. However, the heights of fathers and sons have the same (normal) distribution, i.e. there is no change over the generations.
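A simulation makes the fallacy transparent: sons of tall fathers are on average shorter than their fathers, yet the marginal distribution of heights is unchanged. A sketch assuming NumPy; the height parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(9)

# Heights of fathers (X) and sons (Y): same marginal N(mu, sigma^2),
# correlation rho. Numbers are illustrative.
mu, sigma, rho = 178.0, 7.0, 0.5
n = 2_000_000

X = rng.normal(mu, sigma, n)
Y = mu + rho * (X - mu) + sigma * np.sqrt(1 - rho**2) * rng.normal(size=n)

tall = X > mu + sigma  # fathers at least one sd above the mean
print(X[tall].mean())      # well above mu
print(Y[tall].mean())      # closer to mu: regression to the mean
print(Y.mean(), Y.std())   # marginal distribution of sons unchanged
```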

The distribution of linear and quadratic functions of normal random vectors

$X$ is a random $K$-vector with $X \sim N(\mu, \Sigma)$. Consider the random variables

(i) $Y_1 = a'X$ with $a$ a $K$-vector of constants (scalar);

(ii) $Y_2 = AX + b$ with $A$ an $M \times K$ matrix and $b$ an $M$-vector of constants;

(iii) $Y_3 = X'CX$ with $C$ a $K \times K$ matrix of constants; $C$ is symmetric.

From the mgfs of $Y_1$ and $Y_2$ we find

(i) $Y_1 \sim N(a'\mu, a'\Sigma a)$. Exercise: Derive this.

(ii) $Y_2 \sim N(A\mu + b, A\Sigma A')$. Exercise: Derive this. We verify
$$E(Y_2) = AE(X) + b = A\mu + b$$
$$\mathrm{Var}(Y_2) = E\big[(Y_2 - A\mu - b)(Y_2 - A\mu - b)'\big] = E\big[(AX - A\mu)(AX - A\mu)'\big] = E\big[A(X-\mu)(X-\mu)'A'\big] =$$
$$= A\, E\big[(X-\mu)(X-\mu)'\big]\, A' = A\Sigma A'.$$

(iii) Special case: $X \sim N(0, I)$ and $C$ idempotent, i.e. $C^2 = C$, the matrix generalization of unity.

Let $P$ be the $K \times K$ matrix of eigenvectors of $C$, and choose $P$ such that $P'P = I$, i.e. $P$ is orthonormal. Define the diagonal matrix of eigenvalues
$$\Lambda = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_K \end{pmatrix}.$$
We have
$$CP = P\Lambda \qquad P'CP = \Lambda \qquad C = P\Lambda P',$$
because by $P'P = I$ we have $P' = P^{-1}$. Hence
$$P\Lambda P' = C = C^2 = P\Lambda^2 P',$$
so that
$$\Lambda^2 = \Lambda.$$

This implies that each $\lambda_k$ is either $0$ or $1$. Let $L$ be the number of eigenvalues equal to $1$ and consider
$$Z = P'X,$$
so that $Z \sim N(0, I)$. Hence
$$Y_3 = X'P\Lambda P'X = Z'\Lambda Z = \sum_{k=1}^K \lambda_k Z_k^2 \sim \chi^2(L).$$
Finally,
$$\mathrm{tr}(C) = \mathrm{tr}(P\Lambda P') = \mathrm{tr}(\Lambda P'P) = \mathrm{tr}(\Lambda) = L.$$

Let $X_1$ and $X_2$ be subvectors of $X$ of dimensions $K_1$ and $K_2$ with $K_1 + K_2 = K$. Then the variance matrix of $X$ is
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{pmatrix}$$
with $\mathrm{Var}(X_1) = \Sigma_{11}$, $\mathrm{Var}(X_2) = \Sigma_{22}$ and $\Sigma_{12} = E\big((X_1 - \mu_1)(X_2 - \mu_2)'\big)$.

We have that $X_1$ and $X_2$ are independent if and only if $\Sigma_{12} = 0$. To see this, note that if $\Sigma_{12} = 0$
$$\Sigma^{-1} = \begin{pmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} \end{pmatrix}^{-1} = \begin{pmatrix} \Sigma_{11}^{-1} & 0 \\ 0 & \Sigma_{22}^{-1} \end{pmatrix}.$$
Hence
$$(x - \mu)'\Sigma^{-1}(x - \mu) = (x_1 - \mu_1)'\Sigma_{11}^{-1}(x_1 - \mu_1) + (x_2 - \mu_2)'\Sigma_{22}^{-1}(x_2 - \mu_2).$$
Substitution in the density of the multivariate normal distribution shows that this density factorizes into a function of $x_1$ and a function of $x_2$, which establishes that these random vectors are independent.

Conclusion: in the normal distribution $X_1, X_2$ are independent if and only if $\mathrm{Cov}(X_1, X_2) = 0$.

Define $Y_4 = X'DX$ with $D$ idempotent. Then if $X \sim N(0, I)$:

(i) $Y_1$ and $Y_3$ are independent if and only if $Ca = 0$;

(ii) $Y_3$ and $Y_4$ are stochastically independent if and only if $DC = CD = 0$.

Proof: (i) $Y_3 = X'CX = X'C^2X = X'C'CX$, which is a function of $CX$. Hence $Y_1$ and $Y_3$ are independent if and only if
$$\mathrm{Cov}(CX, a'X) = E(CXX'a) = Ca = 0.$$
(ii) $Y_3 = X'C'CX$ and $Y_4 = X'D'DX$, so that $Y_3$ and $Y_4$ are independent if and only if
$$\mathrm{Cov}(DX, CX) = E(DXX'C') = DC' = DC = 0.$$
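The distributional result $X'CX \sim \chi^2(\mathrm{tr}(C))$ for idempotent $C$ can be checked numerically with a projection matrix. A sketch assuming NumPy and scipy.stats:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(10)
K, n = 5, 500_000

# An idempotent (projection) matrix C with trace L = 2: project onto the
# column space of a K x 2 matrix A.
A = rng.standard_normal((K, 2))
C = A @ np.linalg.inv(A.T @ A) @ A.T
L = round(np.trace(C))
print(L)  # 2

X = rng.standard_normal((n, K))
Y3 = np.einsum('ni,ij,nj->n', X, C, X)  # X'CX for each draw

# Y3 should be chi-squared with L degrees of freedom: mean L, variance 2L.
print(Y3.mean(), Y3.var(), chi2(df=L).mean(), chi2(df=L).var())
```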


More information

Week 2. Review of Probability, Random Variables and Univariate Distributions

Week 2. Review of Probability, Random Variables and Univariate Distributions Week 2 Review of Probability, Random Variables and Univariate Distributions Probability Probability Probability Motivation What use is Probability Theory? Probability models Basis for statistical inference

More information

Notes on Random Vectors and Multivariate Normal

Notes on Random Vectors and Multivariate Normal MATH 590 Spring 06 Notes on Random Vectors and Multivariate Normal Properties of Random Vectors If X,, X n are random variables, then X = X,, X n ) is a random vector, with the cumulative distribution

More information

MAS223 Statistical Inference and Modelling Exercises

MAS223 Statistical Inference and Modelling Exercises MAS223 Statistical Inference and Modelling Exercises The exercises are grouped into sections, corresponding to chapters of the lecture notes Within each section exercises are divided into warm-up questions,

More information

1 Probability theory. 2 Random variables and probability theory.

1 Probability theory. 2 Random variables and probability theory. Probability theory Here we summarize some of the probability theory we need. If this is totally unfamiliar to you, you should look at one of the sources given in the readings. In essence, for the major

More information

3 Continuous Random Variables

3 Continuous Random Variables Jinguo Lian Math437 Notes January 15, 016 3 Continuous Random Variables Remember that discrete random variables can take only a countable number of possible values. On the other hand, a continuous random

More information

MATH c UNIVERSITY OF LEEDS Examination for the Module MATH2715 (January 2015) STATISTICAL METHODS. Time allowed: 2 hours

MATH c UNIVERSITY OF LEEDS Examination for the Module MATH2715 (January 2015) STATISTICAL METHODS. Time allowed: 2 hours MATH2750 This question paper consists of 8 printed pages, each of which is identified by the reference MATH275. All calculators must carry an approval sticker issued by the School of Mathematics. c UNIVERSITY

More information

STAT Chapter 5 Continuous Distributions

STAT Chapter 5 Continuous Distributions STAT 270 - Chapter 5 Continuous Distributions June 27, 2012 Shirin Golchi () STAT270 June 27, 2012 1 / 59 Continuous rv s Definition: X is a continuous rv if it takes values in an interval, i.e., range

More information

Statistics STAT:5100 (22S:193), Fall Sample Final Exam B

Statistics STAT:5100 (22S:193), Fall Sample Final Exam B Statistics STAT:5 (22S:93), Fall 25 Sample Final Exam B Please write your answers in the exam books provided.. Let X, Y, and Y 2 be independent random variables with X N(µ X, σ 2 X ) and Y i N(µ Y, σ 2

More information

18.440: Lecture 28 Lectures Review

18.440: Lecture 28 Lectures Review 18.440: Lecture 28 Lectures 17-27 Review Scott Sheffield MIT 1 Outline Continuous random variables Problems motivated by coin tossing Random variable properties 2 Outline Continuous random variables Problems

More information

Exam P Review Sheet. for a > 0. ln(a) i=0 ari = a. (1 r) 2. (Note that the A i s form a partition)

Exam P Review Sheet. for a > 0. ln(a) i=0 ari = a. (1 r) 2. (Note that the A i s form a partition) Exam P Review Sheet log b (b x ) = x log b (y k ) = k log b (y) log b (y) = ln(y) ln(b) log b (yz) = log b (y) + log b (z) log b (y/z) = log b (y) log b (z) ln(e x ) = x e ln(y) = y for y > 0. d dx ax

More information

Exponential Distribution and Poisson Process

Exponential Distribution and Poisson Process Exponential Distribution and Poisson Process Stochastic Processes - Lecture Notes Fatih Cavdur to accompany Introduction to Probability Models by Sheldon M. Ross Fall 215 Outline Introduction Exponential

More information

Set Theory Digression

Set Theory Digression 1 Introduction to Probability 1.1 Basic Rules of Probability Set Theory Digression A set is defined as any collection of objects, which are called points or elements. The biggest possible collection of

More information

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 8. For any two events E and F, P (E) = P (E F ) + P (E F c ). Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 Sample space. A sample space consists of a underlying

More information

Moments. Raw moment: February 25, 2014 Normalized / Standardized moment:

Moments. Raw moment: February 25, 2014 Normalized / Standardized moment: Moments Lecture 10: Central Limit Theorem and CDFs Sta230 / Mth 230 Colin Rundel Raw moment: Central moment: µ n = EX n ) µ n = E[X µ) 2 ] February 25, 2014 Normalized / Standardized moment: µ n σ n Sta230

More information

Statistics for scientists and engineers

Statistics for scientists and engineers Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3

More information

Lectures on Elementary Probability. William G. Faris

Lectures on Elementary Probability. William G. Faris Lectures on Elementary Probability William G. Faris February 22, 2002 2 Contents 1 Combinatorics 5 1.1 Factorials and binomial coefficients................. 5 1.2 Sampling with replacement.....................

More information

Topic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr.

Topic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr. Topic 2: Probability & Distributions ECO220Y5Y: Quantitative Methods in Economics Dr. Nick Zammit University of Toronto Department of Economics Room KN3272 n.zammit utoronto.ca November 21, 2017 Dr. Nick

More information

Statistical distributions: Synopsis

Statistical distributions: Synopsis Statistical distributions: Synopsis Basics of Distributions Special Distributions: Binomial, Exponential, Poisson, Gamma, Chi-Square, F, Extreme-value etc Uniform Distribution Empirical Distributions Quantile

More information

Lecture 2: Repetition of probability theory and statistics

Lecture 2: Repetition of probability theory and statistics Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:

More information

Ching-Han Hsu, BMES, National Tsing Hua University c 2015 by Ching-Han Hsu, Ph.D., BMIR Lab. = a + b 2. b a. x a b a = 12

Ching-Han Hsu, BMES, National Tsing Hua University c 2015 by Ching-Han Hsu, Ph.D., BMIR Lab. = a + b 2. b a. x a b a = 12 Lecture 5 Continuous Random Variables BMIR Lecture Series in Probability and Statistics Ching-Han Hsu, BMES, National Tsing Hua University c 215 by Ching-Han Hsu, Ph.D., BMIR Lab 5.1 1 Uniform Distribution

More information

Probability distributions. Probability Distribution Functions. Probability distributions (contd.) Binomial distribution

Probability distributions. Probability Distribution Functions. Probability distributions (contd.) Binomial distribution Probability distributions Probability Distribution Functions G. Jogesh Babu Department of Statistics Penn State University September 27, 2011 http://en.wikipedia.org/wiki/probability_distribution We discuss

More information