Econ 514: Probability and Statistics. Lecture 6: Special probability distributions.
Summarizing probability distributions

Let X be a random variable with probability distribution P_X. We consider two types of probability distributions:

Discrete distributions: P_X is absolutely continuous with respect to the counting measure.
Continuous distributions: P_X is absolutely continuous with respect to the Lebesgue measure.

In both cases there is a density f_X. Initially we consider scalar X and later random vectors X.
How do we summarize the probability distribution P_X? An obvious method is to make a graph of the density: figure 1 for the discrete case and figures 2 and 3 for the continuous case.
[Figures 1-3: graphs of discrete and continuous densities]
The graph can be used to visualize the support and to compute probabilities: intervals where f_X is large have high probability.

Summarizing using moments

We can also try to summarize P_X by numbers. This never gives a complete picture, because we summarize a function f_X by some number. The obvious choice is E(X), the expected value of X and the mean of the distribution P_X.

Interpretation: average over repetitions. Repeat the random experiment N times and call the outcomes X_1, X_2, X_3, ..., X_N. If N is large then

(1/N) Σ_{i=1}^N X_i ≈ E(X),

i.e. the mean is the average over repetitions.
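The average-over-repetitions interpretation is easy to check by simulation. A minimal sketch (not part of the notes; the uniform example and the number of repetitions are arbitrary choices):

```python
import random

def sample_mean(n_reps, draw, seed=0):
    """Average of the outcomes of n_reps repetitions of a random experiment."""
    rng = random.Random(seed)
    return sum(draw(rng) for _ in range(n_reps)) / n_reps

# X uniform on [-1, 1], so E(X) = 0; the average over many repetitions is close to it.
m = sample_mean(100_000, lambda rng: rng.uniform(-1.0, 1.0))
```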
Interpretation: optimal prediction. Consider a predictor m of the outcome X. The prediction error of this predictor is X - m. Assume that the loss function is proportional to (X - m)^2, i.e. proportional to the squared prediction error. The optimal predictor minimizes the expected loss:

E((X - m)^2) = E(((X - E(X)) + (E(X) - m))^2)
= E((X - E(X))^2 + 2(X - E(X))(E(X) - m) + (E(X) - m)^2)
= E((X - E(X))^2) + (E(X) - m)^2,

because E(X - E(X)) = 0. This is minimal if m = E(X).

Special case: if f_X is symmetric around µ, i.e. f_X(µ + x) = f_X(µ - x), then, provided E(|X|) < ∞, we have E(X) = µ.

E(X) can be outside the support of X: see figure 3. Implication for prediction?
The mean E(X) is a measure of location of the distribution P_X. A measure of dispersion is the variance of X, defined by

Var(X) = E((X - E(X))^2).

The interpretation is clear in the discrete case:

Var(X) = Σ_i (x_i - E(X))^2 f_X(x_i),

with x_i - E(X) the deviation from the mean and f_X(x_i) the weight, i.e. the probability of that deviation. We have

Var(X) = E((X - E(X))^2) = E(X^2 - 2X E(X) + E(X)^2) = E(X^2) - 2E(X)^2 + E(X)^2 = E(X^2) - E(X)^2.

This is useful in computations.
Often we use µ or µ_X for E(X) and σ^2 or σ_X^2 for Var(X). The standard deviation of (the distribution of) X, often denoted by σ_X, is defined by

σ_X = √Var(X).

Example: picking a number at random from [-1, 1]:

f_X(x) = (1/2) I_{[-1,1]}(x).

By symmetry E(X) = 0. The variance is equal to E(X^2):

Var(X) = σ_X^2 = ∫_{-1}^{1} x^2 (1/2) dx = [x^3/6]_{-1}^{1} = 1/3.

Standard deviation: σ_X = 1/√3.
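The example can be checked by simulation, a sketch not in the notes (sample size chosen arbitrarily):

```python
import random

rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
# mean should be close to 0 and var close to Var(X) = 1/3
```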
Mean and variance are determined by E(X) and E(X^2). These are the first two moments of the distribution of X. In general the k-th moment, often denoted by µ_k, is

µ_k = E(X^k).

We can also define the central moments, i.e. the moments around E(X) = µ:

m_k = E((X - µ)^k).

The third central moment measures skewness and the fourth kurtosis. If the distribution is symmetric then the skewness is 0 (see exercise). Kurtosis is a measure of peakedness (useful if the distribution is symmetric).
More moments means more knowledge about the distribution. What if we know all moments µ_k, k = 1, 2, ...?

A useful tool for obtaining moments is the moment generating function (mgf) of X, denoted by M_X(t) and defined by

M_X(t) = E(e^{tX}),

if this expectation exists for -h < t < h, for some h > 0. Obviously M_X(0) = 1. Take the derivative with respect to t and interchange integration and differentiation:

dM_X(t)/dt = ∫ x e^{tx} f_X(x) dx.

For a non-negative random variable this is allowed if E(X e^{hX}) < ∞. Why? Hence

dM_X(0)/dt = E(X).
In general

d^k M_X(t)/dt^k = ∫ x^k e^{tx} f_X(x) dx,

so that

d^k M_X(0)/dt^k = E(X^k).

The moments E(X^k) do not in general uniquely determine the distribution of X; Casella and Berger give a counterexample. With some further assumptions the moments do determine the distribution:

If the distributions of X and Y have bounded support, then they are the same if and only if all their moments are the same.
If the moment generating functions M_X, M_Y exist and are equal for -h < t < h, then X and Y have the same distribution.
We can also consider the characteristic function

m_X(t) = E(e^{itX}).

Because

e^{itx} = cos(tx) + i sin(tx)

is bounded, the characteristic function always exists. There is a 1-1 correspondence between characteristic functions and distributions.
Special distributions

There is a catalogue of standard distributions P_X of random variables X. Often a random experiment that we encounter in practice is such that the associated random variable X has such a standard distribution. Choosing a standard distribution is the selection of a mathematical model for a random experiment, described by the probability space (R, B, P_X). Often P_X depends on parameters that have to be chosen in order to have a fully specified mathematical model.

Description of special distributions:
(i) In what type of random experiments can the standard distribution be used?
(ii) Mean, variance, mgf (if it exists).
(iii) Shape of the density, i.e. the graph of the density.
Discrete distributions

Discrete uniform distribution

Consider a random experiment with a finite number of outcomes that without loss of generality can be labeled 1, ..., N. If the outcomes are equally likely, P_X has a density with respect to the counting measure:

f_X(x) = Pr(X = x) = 1/N, x = 1, ..., N
       = 0, elsewhere.

This is the discrete uniform distribution. It has one parameter, N. Moments etc. only have meaning if the outcomes 1, ..., N are not just labels, but a count.

Moment generating function:

M_X(t) = (1/N) Σ_{k=1}^N e^{tk} = (e^t/N) (1 - e^{tN})/(1 - e^t).

Using Σ_{k=1}^N k = N(N+1)/2 and Σ_{k=1}^N k^2 = N(N+1)(2N+1)/6, we have

E(X) = (1/N) Σ_{k=1}^N k = (N+1)/2
E(X^2) = (1/N) Σ_{k=1}^N k^2 = (N+1)(2N+1)/6
so that

Var(X) = E(X^2) - E(X)^2 = (N+1)(2N+1)/6 - (N+1)^2/4 = (N+1)(N-1)/12.

Bernoulli distribution

The random experiment has two outcomes that we label 0 and 1. Denote Pr(X = 1) = p. P_X has a density with respect to the counting measure:

f_X(x) = p^x (1-p)^{1-x}, x = 0, 1
       = 0, elsewhere.

This is the Bernoulli distribution. There is one parameter p with 0 ≤ p ≤ 1. The mgf is

M_X(t) = p e^t + 1 - p

and

E(X) = p
E(X^2) = E(X) = p
Var(X) = E(X^2) - E(X)^2 = p - p^2 = p(1-p).

Binomial distribution

Consider a sequence of independent Bernoulli random experiments (or trials). Define X as the number of 1's in n trials. Consider the event X = x.
For this event, x trials must have outcome 1 and n - x trials outcome 0. One sequence with x 1's and n - x 0's is, e.g., the sequence with x leading 1's followed by n - x 0's. The probability of this sequence is p^x (1-p)^{n-x}. There are C(n, x) = n!/(x!(n-x)!) sequences of 0's and 1's that have the same probability, so that

Pr(X = x) = C(n, x) p^x (1-p)^{n-x}.

Hence P_X has a density with respect to the counting measure:

f_X(x) = C(n, x) p^x (1-p)^{n-x}, x = 0, 1, ..., n
       = 0, elsewhere.

This is the Binomial distribution. Notation: X ~ B(n, p).

Binomial formula:

(a + b)^n = Σ_{k=0}^n C(n, k) a^k b^{n-k}.

We use this formula to establish that the density sums to 1:

Σ_{x=0}^n C(n, x) p^x (1-p)^{n-x} = (p + (1-p))^n = 1.
The mgf is

M_X(t) = Σ_{x=0}^n C(n, x) e^{tx} p^x (1-p)^{n-x} = Σ_{x=0}^n C(n, x) (pe^t)^x (1-p)^{n-x} = (pe^t + 1 - p)^n.

Using the mgf we find

E(X) = dM_X(0)/dt = [n(pe^t + 1 - p)^{n-1} pe^t]_{t=0} = np
E(X^2) = d^2M_X(0)/dt^2 = [n(pe^t + 1 - p)^{n-1} pe^t + n(n-1)(pe^t + 1 - p)^{n-2} p^2e^{2t}]_{t=0} = np + n(n-1)p^2,

so that

Var(X) = E(X^2) - E(X)^2 = np(1-p).

Let Y_k be the outcome of the k-th Bernoulli trial, so that

X = Σ_{k=1}^n Y_k,

with Y_k, k = 1, ..., n stochastically independent. This implies

E(X) = nE(Y_1) = np
Var(X) = nVar(Y_1) = np(1-p)
M_X(t) = (M_{Y_1}(t))^n = (pe^t + 1 - p)^n.

Shape of the density f_X:
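The mean np and variance np(1-p) can be verified directly from the pmf. A small sketch (not in the notes; n and p are arbitrary):

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial density: C(n, x) p^x (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var = sum(x * x * binom_pmf(x, n, p) for x in range(n + 1)) - mean**2
# mean == n*p == 3.0 and var == n*p*(1-p) == 2.1
```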
We have

f_X(x)/f_X(x-1) = 1 + ((n+1)p - x)/(x(1-p)).

We conclude that f_X is increasing for x < (n+1)p and decreasing for x > (n+1)p. If p > n/(n+1) then f_X is increasing for x = 0, ..., n, and if p < 1/(n+1) then f_X is decreasing for x = 0, ..., n. Otherwise f_X is first increasing, then decreasing. The value of x that maximizes f_X is called the mode of the distribution of X. For the binomial distribution the mode is the largest integer less than or equal to (n+1)p (if (n+1)p is an integer, both (n+1)p - 1 and (n+1)p maximize f_X).

The binomial distribution has two parameters n, p with n a positive integer and 0 ≤ p ≤ 1.

Example: sampling. Let p be the fraction of households in the US with income less than $15000 per year. Select n households at random from the population. Define X as the number of households among the n selected with income less than $15000. The distribution of X is binomial if the selections of households are independent. This is true if the selection is done with replacement, and approximately true if the population is sufficiently large.
Assume n = 100 and 16 households have an income less than $15000. Now 16 is an estimate of E(X) = np, and this suggests that it is reasonable to guess that p̂ = 16/n = .16, i.e. that 16% of US households have an income less than $15000 per year.
Hypergeometric distribution

In the example we assumed (counterfactually) that selection was with replacement. Now consider a population of size N from which we select a sample of size n without replacement. In the population, M households have an income of less than $15000. X is the number of households among the n selected with income less than $15000. X = x iff

we select x households from the M with an income of less than $15000: this can be done in C(M, x) ways;
we select the remaining n - x households from the N - M with an income greater than or equal to $15000: this can be done in C(N - M, n - x) ways.
The total number of selections (without replacement) of n households from the population of N households is C(N, n). Combining these results we have

Pr(X = x) = C(M, x) C(N - M, n - x) / C(N, n).
The distribution P_X has a density with respect to the counting measure:

f_X(x) = C(M, x) C(N - M, n - x) / C(N, n), x = 0, ..., n
       = 0, otherwise.

This is the Hypergeometric distribution. It can be shown (see Casella and Berger) that

E(X) = n M/N
Var(X) = n (M/N)(1 - M/N)(N - n)/(N - 1).

Compare these results to those for the binomial distribution with p = M/N: the mean is the same, while the variance is smaller by the factor (N - n)/(N - 1), which is close to 1 if the population is large relative to the sample.
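The moment formulas can be checked numerically from the pmf. A sketch, not part of the notes (the population and sample sizes are arbitrary):

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """Hypergeometric density: C(M, x) C(N-M, n-x) / C(N, n)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

N, M, n = 50, 20, 10
support = range(max(0, n - (N - M)), min(n, M) + 1)
mean = sum(x * hypergeom_pmf(x, N, M, n) for x in support)
var = sum(x * x * hypergeom_pmf(x, N, M, n) for x in support) - mean**2
# mean == n*M/N == 4.0; var == n*(M/N)*(1-M/N)*(N-n)/(N-1)
```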
Geometric distribution

Consider a sequence of independent Bernoulli random experiments with probability of outcome 1 equal to p. Call outcome 1 a success and outcome 0 a failure. Define X as the number of experiments before the first success. X = x iff the outcomes of the first x + 1 Bernoulli experiments are 0, ..., 0, 1 with x leading 0's. Hence

Pr(X = x) = (1 - p)^x p.
P_X has a density with respect to the counting measure:

f_X(x) = (1 - p)^x p, x = 0, 1, ...
       = 0, otherwise.

The distribution P_X is called the Geometric distribution. It has one parameter p with 0 < p ≤ 1. The mgf is

M_X(t) = E(e^{tX}) = Σ_{x=0}^∞ e^{tx} (1 - p)^x p = p Σ_{x=0}^∞ ((1 - p)e^t)^x = p / (1 - (1 - p)e^t),

provided (1 - p)e^t < 1.
From the mgf we find

E(X) = dM_X(0)/dt = (1 - p)/p
E(X^2) = d^2M_X(0)/dt^2 = (1 - p)(2 - p)/p^2,

so that

Var(X) = E(X^2) - E(X)^2 = (1 - p)(2 - p)/p^2 - ((1 - p)/p)^2 = (1 - p)/p^2.

Sometimes we define X_1 as the number of Bernoulli experiments needed for the first success. Then

X_1 = X + 1

and e.g.

M_{X_1}(t) = E(e^{tX_1}) = e^t E(e^{tX}) = e^t M_X(t).
Example of the geometric distribution: consider a job seeker and let p be the probability of receiving a job offer in any week. The week in which the first offer is received has the distribution P_{X_1}. We have for x_2 ≥ x_1

Pr(X_1 > x_2 | X_1 > x_1) = Pr(X_1 > x_2) / Pr(X_1 > x_1)
= (Σ_{x=x_2+1}^∞ (1 - p)^{x-1} p) / (Σ_{x=x_1+1}^∞ (1 - p)^{x-1} p)
= (1 - p)^{x_2} / (1 - p)^{x_1}
= (1 - p)^{x_2 - x_1}
= Σ_{x=x_2-x_1+1}^∞ (1 - p)^{x-1} p = Pr(X_1 > x_2 - x_1).

Conclusion: if the job seeker has waited x_1 weeks, the probability that he/she has to wait more than another x_2 - x_1 weeks is the same as the probability of waiting more than x_2 - x_1 weeks from the beginning of the job search. The geometric distribution has no memory.
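The no-memory property can be verified exactly from the survival function Pr(X_1 > x) = (1 - p)^x. A short check (not in the notes; p and the waiting times are arbitrary):

```python
p = 0.2

def surv(x):
    """Pr(X1 > x) for X1 = number of trials needed for the first success."""
    return (1 - p) ** x

x1, x2 = 3, 8
cond = surv(x2) / surv(x1)   # Pr(X1 > x2 | X1 > x1)
fresh = surv(x2 - x1)        # Pr(X1 > x2 - x1), as if the wait started fresh
# cond == fresh: the geometric distribution has no memory
```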
Negative binomial distribution

The setup is as for the geometric distribution. Define X as the number of failures before the r-th success. X = x iff trial x + r is a success (event A) and the previous x + r - 1 trials contain r - 1 successes and x failures (event B). Because the events A and B depend on independent random variables, P(A ∩ B) = P(A)P(B). We have P(A) = p. A particular sequence with r - 1 successes and x failures has probability p^{r-1}(1 - p)^x. Because we can choose the r - 1 successes among the x + r - 1 trials in C(x + r - 1, r - 1) ways, this is the number of such sequences. Hence

P(B) = C(x + r - 1, r - 1) p^{r-1} (1 - p)^x.
Combining,

Pr(X = x) = p C(x + r - 1, r - 1) p^{r-1} (1 - p)^x.

P_X has a density with respect to the counting measure:

f_X(x) = C(x + r - 1, r - 1) p^r (1 - p)^x, x = 0, 1, ...
       = 0, otherwise.

This is the Negative binomial distribution. The parameters are r (a positive integer) and p with 0 < p ≤ 1.
Poisson distribution

The Poisson distribution applies to the number of occurrences of some event in a time interval of finite length, e.g. the number of job offers received by a job seeker in a month. Offers can arrive at any moment (in continuous time); compare with the geometric distribution. Define X(a, b) as the number of offers in [a, b). The symbol o(h) ("small o of h") denotes any function with lim_{h→0} o(h)/h = 0.

Assumptions:
(i) Pr(X(s, s + h) = 1) = λh + o(h)
(ii) Pr(X(s, s + h) ≥ 2) = o(h)
(iii) X(a, b) and X(c, d) are independent if [a, b) ∩ [c, d) = ∅.
Consider [0, t) and divide it into n intervals of length h = t/n. Then (neglecting probabilities that are of order o(h))

Pr(X(0, t) = k) = C(n, k) (λh)^k (1 - λh)^{n-k}
= C(n, k) (λt/n)^k (1 - λt/n)^{n-k}
= ((λt)^k / k!) (n(n-1)···(n-k+1)/n^k) (1 - λt/n)^{n-k}.

Now

lim_{n→∞} n(n-1)···(n-k+1)/n^k = 1
lim_{n→∞} (1 - λt/n)^{n-k} = lim_{n→∞} (1 - λt/n)^{-k} lim_{n→∞} (1 - λt/n)^n = e^{-λt}.

Conclusion: for n → ∞, and if we write X for X(0, t),

Pr(X = k) = e^{-λt} (λt)^k / k!.
The distribution P_X has a density with respect to the counting measure:

f_X(x) = e^{-θ} θ^x / x!, x = 0, 1, ...
       = 0, otherwise.

The distribution P_X is the Poisson distribution. It has one parameter θ > 0 (in the derivation above, θ = λt). Notation: X ~ Poisson(θ). The mgf is

M_X(t) = Σ_{x=0}^∞ e^{tx} e^{-θ} θ^x / x! = e^{-θ} Σ_{x=0}^∞ (e^t θ)^x / x! = e^{θ(e^t - 1)},

so that

E(X) = θ
E(X^2) = θ^2 + θ
Var(X) = θ.

Note E(X) = Var(X).
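The equality of mean and variance can be checked numerically from the pmf. A sketch, not in the notes (θ and the truncation point are arbitrary; the tail mass beyond x = 100 is negligible for this θ):

```python
from math import exp, factorial

def poisson_pmf(x, theta):
    """Poisson density: exp(-theta) theta^x / x!."""
    return exp(-theta) * theta**x / factorial(x)

theta = 3.5
xs = range(101)  # truncated support; remaining tail mass is negligible
mean = sum(x * poisson_pmf(x, theta) for x in xs)
var = sum(x * x * poisson_pmf(x, theta) for x in xs) - mean**2
# mean and var should both equal theta
```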
Continuous distributions

Uniform distribution

Random experiment: pick a number at random from [a, b]. Then

P_X([a, x]) = (x - a)/(b - a) = ∫_a^x 1/(b - a) ds.

Hence P_X has a density with respect to the Lebesgue measure:

f_X(x) = 1/(b - a), a ≤ x ≤ b
       = 0, otherwise.

P_X is the Uniform distribution on [a, b]. Notation: X ~ U[a, b]. We have

M_X(t) = (e^{bt} - e^{at})/((b - a)t)
E(X) = (a + b)/2
Var(X) = (b - a)^2/12.
[Graph of the uniform density]
Normal distribution

The distribution P_X has density with respect to the Lebesgue measure

f_X(x) = (1/(σ√(2π))) e^{-(x-µ)^2/(2σ^2)}, -∞ < x < ∞.

The mgf is

M_X(t) = E(e^{tX}) = e^{tµ} E(e^{t(X-µ)})
= e^{tµ} ∫ (1/(σ√(2π))) e^{t(x-µ)} e^{-(x-µ)^2/(2σ^2)} dx
= e^{tµ} ∫ (1/(σ√(2π))) e^{-((x-µ)^2 - 2σ^2 t(x-µ))/(2σ^2)} dx.

Now

(x-µ)^2 - 2σ^2 t(x-µ) = (x-µ)^2 - 2σ^2 t(x-µ) + σ^4 t^2 - σ^4 t^2 = (x - µ - σ^2 t)^2 - σ^4 t^2,

so that

M_X(t) = e^{tµ + σ^2t^2/2} ∫ (1/(σ√(2π))) e^{-(x - µ - σ^2t)^2/(2σ^2)} dx = e^{tµ + σ^2t^2/2}.
From the mgf,

E(X) = µ
E(X^2) = σ^2 + µ^2,

so that Var(X) = σ^2. The distribution P_X is the Normal distribution with mean µ and variance σ^2. Notation: X ~ N(µ, σ^2).
Define

Z = (X - µ)/σ.

Then

E(Z) = 0, Var(Z) = 1.

Hence Z has a normal distribution with µ = 0, σ^2 = 1. This is the standard normal distribution, with density

φ(x) = (1/√(2π)) e^{-x^2/2}, -∞ < x < ∞,

and cdf

Φ(x) = ∫_{-∞}^x φ(s) ds.

We can compute the probability of an interval [a, b] with the standard normal cdf:

Pr(a ≤ X ≤ b) = Pr((a - µ)/σ ≤ Z ≤ (b - µ)/σ) = Φ((b - µ)/σ) - Φ((a - µ)/σ).
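The interval probability can be computed with the error function from the standard library, using Φ(x) = (1 + erf(x/√2))/2. A sketch, not in the notes (the interval is the familiar 95% one for N(0, 1)):

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def normal_interval_prob(a, b, mu, sigma):
    """Pr(a <= X <= b) for X ~ N(mu, sigma^2) by standardizing to Z."""
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

p = normal_interval_prob(-1.96, 1.96, 0.0, 1.0)  # ≈ 0.95
```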
Shape of the normal density: the bell curve.
Why is the normal distribution so popular? Consider Galton's quincunx or dropping board. Define X_n as the position (relative to 0) after n rows of pins. If Z_n takes the values -1 and 1 and gives the direction taken at row n, then

X_n = Z_1 + ... + Z_n.
If n is large then X_n has approximately a normal distribution. Central limit theorem: a sum of many independent small effects gives a normal distribution.
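The quincunx is straightforward to simulate. A sketch, not part of the notes (the number of rows and balls is arbitrary); since the Z_n are ±1 with equal probability, X_n has mean 0 and variance n:

```python
import random

def quincunx_position(n_rows, rng):
    """Final position of a ball after n_rows pins, each deflecting it -1 or +1."""
    return sum(rng.choice((-1, 1)) for _ in range(n_rows))

rng = random.Random(42)
positions = [quincunx_position(100, rng) for _ in range(10_000)]
mean = sum(positions) / len(positions)
var = sum((x - mean) ** 2 for x in positions) / len(positions)
# mean ≈ 0, var ≈ 100; a histogram of positions is bell-shaped
```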
Exponential distribution

Consider the waiting time to an event that can occur at any time (compare with the geometric distribution). Define the hazard or failure rate by

Pr(event in [t, t + dt) | event after t) = Pr(t ≤ X < t + dt | X ≥ t) = f_X(t)dt / (1 - F_X(t)).

Assume that the hazard is constant:

f_X(t) / (1 - F_X(t)) = λ.

Then the solution to this differential equation, obtained by integration, is

f_X(t) = λe^{-λt}.
The distribution P_X has a density with respect to the Lebesgue measure:

f_X(x) = λe^{-λx}, x ≥ 0
       = 0, otherwise.

P_X is the Exponential distribution. There is one parameter λ > 0 and the notation is X ~ Exp(λ). The mgf is

M_X(t) = λ/(λ - t), t < λ,

and hence

E(X) = 1/λ
Var(X) = 1/λ^2.
Note that for t ≥ s

Pr(X > t | X > s) = Pr(X > t)/Pr(X > s) = e^{-λt}/e^{-λs} = e^{-λ(t-s)}.

If you have waited s, the probability of an additional wait of t - s is the same as if the wait had started at time 0. As with the geometric distribution, the exponential distribution has no memory. If X is the length of a human life, compare Pr(X > 40 | X > 30) and Pr(X > 70 | X > 60).

Connection with the Poisson distribution: if the event is recurrent and the waiting time between occurrences has an exponential distribution with parameter λ, then the number of occurrences in [0, t] has a Poisson distribution with parameter λt.
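The exponential-Poisson connection can be checked by simulation: generate Exp(λ) inter-arrival times and count how many events land in [0, t]. A sketch, not in the notes (λ, t and the number of replications are arbitrary):

```python
import random

rng = random.Random(7)
lam, t, reps = 2.0, 3.0, 20_000

def count_events(rng):
    """Number of events in [0, t] when inter-arrival times are Exp(lam)."""
    total, n = rng.expovariate(lam), 0
    while total <= t:
        n += 1
        total += rng.expovariate(lam)
    return n

counts = [count_events(rng) for _ in range(reps)]
mean = sum(counts) / reps
# counts should be approximately Poisson(lam*t), so mean ≈ lam*t = 6.0
```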
Gamma distribution

The Gamma distribution is the distribution of X = Y_1 + ... + Y_r, with the Y_k independent exponential random variables with parameter λ. X is the waiting time to the r-th occurrence of the event; compare with the negative binomial distribution. The distribution P_X has a density with respect to the Lebesgue measure:

f_X(x) = (λ/Γ(r)) (λx)^{r-1} e^{-λx}, x ≥ 0
       = 0, otherwise,

with Γ the gamma function. Γ(r) = (r - 1)! if r is a positive integer; otherwise it has to be computed numerically.
This is the Gamma distribution with parameters λ, r > 0; r need not be an integer. Notation: X ~ Γ(λ, r). The mgf is

M_X(t) = (λ/(λ - t))^r, t < λ,

so that

E(X) = r/λ
Var(X) = r/λ^2.
Lognormal distribution

Let Y ~ N(µ, σ^2) and define X = e^Y. The distribution P_X has density

f_X(x) = (1/(xσ√(2π))) e^{-(ln x - µ)^2/(2σ^2)}, x > 0
       = 0, otherwise.

Exercise: Derive this density. This is the Lognormal distribution with parameters µ and σ^2. The mean and variance can be derived from the mgf of the normal distribution:

E(X) = e^{µ + σ^2/2}
Var(X) = e^{2µ + 2σ^2} - e^{2µ + σ^2}.

What are E(ln X) and Var(ln X)?
Cauchy distribution

A random variable that has a distribution with density with respect to the Lebesgue measure

f_X(x) = 1/(πβ [1 + ((x - α)/β)^2]), -∞ < x < ∞,

has a Cauchy distribution with parameters α and β > 0. The density is symmetric around α, which is the median of X. E(X) does not exist and Var(X) = ∞. The mgf is ∞ for t ≠ 0, so it does not exist.
Chi-squared distribution

The chi-squared distribution is a special case of the Γ distribution: set r = k/2 and λ = 1/2. The density is

f_X(x) = (1/(Γ(k/2) 2^{k/2})) x^{k/2 - 1} e^{-x/2}, x ≥ 0
       = 0, otherwise.

The parameter k is called the degrees of freedom of the distribution. The chi-squared distribution is important because of the following result: if X has a standard normal distribution, then Y = X^2 has a chi-squared distribution with k = 1.
We derive the mgf:

M_Y(t) = E(e^{tX^2}) = ∫ (1/√(2π)) e^{tx^2 - x^2/2} dx = ∫ (1/√(2π)) e^{-(1 - 2t)x^2/2} dx
= 1/√(1 - 2t) = ((1/2)/((1/2) - t))^{1/2}, t < 1/2,

which is the mgf of the Γ distribution with r = 1/2 and λ = 1/2, i.e. the chi-squared distribution with k = 1.
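The result can be checked by simulation: squaring standard normal draws should give the mean 1 and variance 2 of the χ²(1) (= Γ(1/2, 1/2)) distribution. A sketch, not in the notes (sample size arbitrary):

```python
import random

rng = random.Random(11)
ys = [rng.gauss(0.0, 1.0) ** 2 for _ in range(200_000)]
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
# chi-squared with k = 1 degrees of freedom: mean = 1, variance = 2
```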
Exponential family of distributions

The exponential family of densities consists of the densities that can be expressed as

f_X(x) = h(x) c(θ) e^{Σ_{i=1}^k w_i(θ) t_i(x)}.

Note that c and w_i, i = 1, ..., k do not depend on x, and h and t_i, i = 1, ..., k do not depend on θ; θ is the vector of parameters of the distribution. Why is this useful? We will see that if we have data from an exponential family distribution, the information can be summarized by the t_i, i = 1, ..., k.

Examples

(i) Binomial distribution: for x = 0, ..., n

f_X(x) = C(n, x) p^x (1 - p)^{n-x} = C(n, x) (1 - p)^n e^{x ln(p/(1-p))}.

Hence

h(x) = C(n, x), t(x) = x, c(θ) = (1 - p)^n, w(θ) = ln(p/(1 - p)).
(ii) Normal distribution: for -∞ < x < ∞

f_X(x) = (1/(σ√(2π))) e^{-(x-µ)^2/(2σ^2)} = (1/(σ√(2π))) e^{-µ^2/(2σ^2)} e^{-x^2/(2σ^2) + (µ/σ^2)x}.

Hence

h(x) = 1, t_1(x) = x^2, t_2(x) = x
c(θ) = (1/(σ√(2π))) e^{-µ^2/(2σ^2)}, w_1(θ) = -1/(2σ^2), w_2(θ) = µ/σ^2.

Other exponential family distributions: Poisson, exponential, Gamma. The density of the uniform distribution is

f_X(x) = (1/(b - a)) I(a ≤ x ≤ b).

The function I(a ≤ x ≤ b) cannot be factorized into a function of x and a function of a, b. Hence the uniform distribution does not belong to the exponential family.
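The binomial factorization in example (i) can be verified numerically: the usual pmf and the h(x) c(θ) exp(w(θ) t(x)) form must agree on the whole support. A sketch, not in the notes (n and p arbitrary):

```python
from math import comb, exp, log

n, p = 12, 0.3

def pmf(x):
    """Standard binomial density."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def exp_family_form(x):
    """Same density written as h(x) c(theta) exp(w(theta) t(x))."""
    h = comb(n, x)
    c = (1 - p) ** n
    w = log(p / (1 - p))
    return h * c * exp(w * x)

# the two expressions agree for x = 0, ..., n
```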
Multivariate distributions: recapitulation

Consider a probability space (Ω, A, P) and define a vector of random variables or random vector X as a function X : Ω → R^K, i.e.

X(ω) = (X_1(ω), ..., X_K(ω))'.

The distribution of X is a probability measure P_X : B_K → [0, 1]. This is usually called the joint distribution of the random vector X. We consider the case that P_X has a density with respect to the counting measure (discrete distribution) or with respect to the Lebesgue measure (continuous distribution). The density f_X(x_1, ..., x_K) is called the joint density of X.
We have

Pr(X_1 ∈ B) = P_X(B × R × ... × R) = ∫_B ... ∫ f_X(x_1, ..., x_K) dx_1 ... dx_K = ∫_B f_{X_1}(x_1) dx_1,

with

f_{X_1}(x_1) = ∫ ... ∫ f_X(x_1, x_2, ..., x_K) dx_2 ... dx_K.

f_{X_1} is called the marginal density of X_1. The marginal density of X_k for any k is obtained in the same way. For discrete distributions, replace integration by summation.
Consider the subvectors (X_1, ..., X_{K_1}) and (X_{K_1+1}, ..., X_K). The distributions of these subvectors are independent if and only if

f_X(x_1, ..., x_K) = f_{X_1...X_{K_1}}(x_1, ..., x_{K_1}) f_{X_{K_1+1}...X_K}(x_{K_1+1}, ..., x_K),

i.e. the joint density is the product of the marginal densities.

The conditional distribution of X_1, ..., X_{K_1} given X_{K_1+1}, ..., X_K has density

f_{X_1...X_{K_1}|X_{K_1+1}...X_K}(x_1, ..., x_{K_1} | x_{K_1+1}, ..., x_K) = f_X(x_1, ..., x_K) / f_{X_{K_1+1}...X_K}(x_{K_1+1}, ..., x_K),

i.e. it is the ratio of the joint density and the marginal density of the variables on which we condition.
If X̃ is any subvector of X that does not have X_1 as a component, then the conditional mean of X_1 given X̃ = x̃ can be computed using the conditional density of X_1 given X̃:

E(X_1 | X̃ = x̃) = ∫_R x_1 f_{X_1|X̃}(x_1 | x̃) dx_1.

For a discrete distribution, replace integration by summation. The conditional variance of X_1 given X̃ is

Var(X_1 | X̃ = x̃) = E((X_1 - E(X_1 | X̃ = x̃))^2 | X̃ = x̃).

We have

Var(X_1 | X̃ = x̃) = E(X_1^2 - 2X_1 E(X_1 | X̃ = x̃) + E(X_1 | X̃ = x̃)^2 | X̃ = x̃)
= E(X_1^2 | X̃ = x̃) - 2E(X_1 | X̃ = x̃)^2 + E(X_1 | X̃ = x̃)^2
= E(X_1^2 | X̃ = x̃) - E(X_1 | X̃ = x̃)^2.

Compare this result to that for the unconditional variance.
Law of iterated expectations:

E(X_1) = E_X̃(E(X_1 | X̃)).

Remember that on the rhs we just integrate E(X_1 | X̃ = x̃) with respect to the distribution of X̃. For the variance, note

E_X̃[Var(X_1 | X̃)] = E_X̃[E(X_1^2 | X̃)] - E_X̃[E(X_1 | X̃)^2]

and, because E(X_1 | X̃) is a random variable that is a function of X̃,

Var[E(X_1 | X̃)] = E_X̃[E(X_1 | X̃)^2] - (E_X̃[E(X_1 | X̃)])^2.

If we add these equations we obtain

E[Var(X_1 | X̃)] + Var(E(X_1 | X̃)) = E(X_1^2) - (E(X_1))^2 = Var(X_1).
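The variance decomposition can be verified exactly on a small discrete joint distribution. A sketch, not in the notes (the joint pmf below is a hypothetical example):

```python
# Hypothetical joint pmf of (X1, X2) on {0,1} x {0,1}
pmf = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.4}

def var(pairs):
    """Variance and mean of a discrete distribution given as (value, prob) pairs."""
    m = sum(v * p for v, p in pairs)
    return sum((v - m) ** 2 * p for v, p in pairs), m

# Unconditional variance of X1
total_var, _ = var([(x1, p) for (x1, _), p in pmf.items()])

# E[Var(X1|X2)] and Var(E(X1|X2))
ev, means = 0.0, []
for x2 in (0, 1):
    px2 = sum(p for (_, b), p in pmf.items() if b == x2)
    cond = [(x1, p / px2) for (x1, b), p in pmf.items() if b == x2]
    v, m = var(cond)
    ev += v * px2
    means.append((m, px2))
vve, _ = var(means)
# total_var == ev + vve (law of total variance)
```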
Summary measures associated with multivariate distributions, i.e. the distribution of a random vector X
Obvious summary measures: the means and variances of the random variables in X (marginal means and variances). In random vectors we also consider the covariance of any two components of X, say X_1 and X_2:

Cov(X_1, X_2) = E[(X_1 - E(X_1))(X_2 - E(X_2))].

The covariance is informative on the relation between X_1 and X_2; e.g. for a discrete distribution

Cov(X_1, X_2) = Σ_{x_1} Σ_{x_2} (x_1 - E(X_1))(x_2 - E(X_2)) f_{X_1X_2}(x_1, x_2).

If outcomes with x_1 - E(X_1) > 0 and x_2 - E(X_2) > 0, or with x_1 - E(X_1) < 0 and x_2 - E(X_2) < 0 (deviations go in the same direction), are more likely than outcomes with x_1 - E(X_1) > 0 and x_2 - E(X_2) < 0, or with x_1 - E(X_1) < 0 and x_2 - E(X_2) > 0 (deviations go in opposite directions), then Cov(X_1, X_2) > 0.
In that case there is a positive association between X_1 and X_2. If the second type of outcomes is more likely, then Cov(X_1, X_2) < 0 and the association is negative. Note that for constants c, d

Cov(cX_1, dX_2) = cd Cov(X_1, X_2),

so that the size of Cov(X_1, X_2) is not a good measure of the strength of the association. To measure the strength we define the correlation coefficient of X_1, X_2 by

ρ_{X_1X_2} = Cov(X_1, X_2) / (√Var(X_1) √Var(X_2)).
To derive its properties we need the Cauchy-Schwarz inequality:

|E(X_1X_2)| ≤ √E(X_1^2) √E(X_2^2).

Proof: consider

0 ≤ E[(tX_1 + X_2)^2] = t^2 E(X_1^2) + 2t E(X_1X_2) + E(X_2^2).

The rhs is a quadratic in t with at most one zero, so its discriminant satisfies

4E(X_1X_2)^2 - 4E(X_1^2)E(X_2^2) ≤ 0.

Dividing by 4 and taking the square root gives the inequality. If

E[(tX_1 + X_2)^2] = 0,

then

Pr(tX_1 + X_2 = 0) = 1,

i.e. the joint distribution is concentrated on the line t x_1 + x_2 = 0.
Properties of the correlation coefficient:

ρ_{cX_1, dX_2} = ρ_{X_1X_2} for c, d > 0.

By Cauchy-Schwarz,

|Cov(X_1, X_2)| = |E[(X_1 - E(X_1))(X_2 - E(X_2))]| ≤ √E((X_1 - E(X_1))^2) √E((X_2 - E(X_2))^2),

so that |ρ_{X_1X_2}| ≤ 1.
Note that |ρ_{X_1X_2}| = 1 iff Pr(X_2 - E(X_2) = t(X_1 - E(X_1))) = 1 for some t ≠ 0. Hence Pr(X_2 = a + bX_1) = 1 with a = E(X_2) - tE(X_1) and b = t. Note that Pr(X_2 - E(X_2) = t(X_1 - E(X_1))) = 1 implies

Cov(X_1, X_2) = b Var(X_1),

so that sign(ρ_{X_1X_2}) = sign(Cov(X_1, X_2)) = sign(b).

Conclusion: |ρ_{X_1X_2}| = 1 iff Pr(X_2 = a + bX_1) = 1 for some b ≠ 0. If ρ_{X_1X_2} = 1 then b > 0, and if ρ_{X_1X_2} = -1 then b < 0. The correlation coefficient is a measure of the strength of the association, and the extreme values correspond to a linear relation.
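That exact linear relations give correlation ±1 is easy to verify numerically. A sketch, not in the notes (the coefficients and sample are arbitrary):

```python
import random

def corr(xs, ys):
    """Sample correlation coefficient: cov / (sd_x * sd_y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

rng = random.Random(5)
xs = [rng.gauss(0, 1) for _ in range(1000)]
ys = [2.0 + 3.0 * x for x in xs]   # exact linear relation with b > 0
zs = [1.0 - 2.0 * x for x in xs]   # exact linear relation with b < 0
# corr(xs, ys) == 1 and corr(xs, zs) == -1 (up to rounding)
```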
In the case of a multivariate distribution we organize the variances and covariances in a matrix, the variance(-covariance) matrix of X:

Var(X) = [ Var(X_1)        Cov(X_1, X_2)  ...  Cov(X_1, X_K)
           Cov(X_1, X_2)   Var(X_2)       ...  Cov(X_2, X_K)
           ...
           Cov(X_1, X_K)   Cov(X_2, X_K)  ...  Var(X_K)      ].

Note that this is a symmetric K × K matrix: Var(X) = Var(X)'. Often we use the notation Var(X) = Σ.
Remember that if X is a K-vector, then

(X - µ)(X - µ)' = (X_1 - µ_1, ..., X_K - µ_K)' (X_1 - µ_1, ..., X_K - µ_K)

is the K × K matrix with (k, l) element (X_k - µ_k)(X_l - µ_l): the diagonal contains (X_1 - µ_1)^2, ..., (X_K - µ_K)^2 and the off-diagonal elements are the cross-products. So, if we denote µ = E(X),

Σ = Var(X) = E((X - µ)(X - µ)').
Linear and quadratic functions of random vectors

If X is a random vector with K components and a is a K-vector of constants, we define the linear function of X

a'X = Σ_{k=1}^K a_k X_k.

Hence

E(a'X) = E(Σ_{k=1}^K a_k X_k) = Σ_{k=1}^K a_k E(X_k) = a'E(X).

Also

Var(a'X) = E[(a'X - E(a'X))^2] = E[(a'X - a'µ)(a'X - a'µ)'] = E[a'(X - µ)(X - µ)'a] = a'E[(X - µ)(X - µ)']a = a'Var(X)a.
Moment generating function of a joint distribution

If X is a random vector, the mgf of X is

M_X(t) = E(e^{t_1X_1 + ... + t_KX_K}),

if the mgf exists for -h < t_k < h, k = 1, ..., K, where t = (t_1, ..., t_K)'. Note that

∂^2 M_X(t)/∂t_1∂t_2 = E(X_1X_2 e^{t_1X_1 + ... + t_KX_K}),

so that

∂^2 M_X/∂t_1∂t_2 (0) = E(X_1X_2).
This can be used to compute the covariance, because

Cov(X_1, X_2) = E(X_1X_2) - E(X_1)E(X_2).

The mgf of the marginal distribution of X_1 is

M_{X_1}(t_1) = M_X(t_1, 0, ..., 0).
Special multivariate distributions

Multinomial distribution

Recall the binomial distribution: the number of 1's in n independent Bernoulli experiments. Instead of a Bernoulli experiment with two outcomes, consider a random experiment with K outcomes k = 1, ..., K. An example is to pick a student at random from a class and record his/her nationality. Label the nationalities k = 1, ..., K. If the fraction with nationality k is p_k, then, if the outcome of the random selection is Y, we have

Pr(Y = k) = p_k, k = 1, ..., K,

with Σ_{k=1}^K p_k = 1.
Repeat this experiment n times and let the repetitions be independent. Define X_k as the number of experiments with outcome k. Note Σ_{k=1}^K X_k = n, so that X_K is determined by X_1, ..., X_{K-1}. Consider a particular sequence of n outcomes, e.g. one that starts with outcomes 3, 4, 1, 1, ... and ends with K - 1, K, with probability p_3 p_4 p_1 p_1 ... p_{K-1} p_K. Any sequence in which outcome k occurs x_k times has probability

p_1^{x_1} p_2^{x_2} ... p_{K-1}^{x_{K-1}} p_K^{x_K},

with x_K = n - Σ_{k=1}^{K-1} x_k. To compute Pr(X_1 = x_1, ..., X_{K-1} = x_{K-1}) we count the number of such sequences.
This is equivalent to picking x_1 experiments with outcome 1, x_2 with outcome 2, etc. from the n experiments:

Start with picking the x_1 experiments with outcome 1 among the n experiments. This can be done in C(n, x_1) ways.
From the remaining n - x_1 experiments, pick the experiments with outcome 2. This can be done in C(n - x_1, x_2) ways.

The total number of ways to choose the experiments with outcomes 1 and 2 is

C(n, x_1) C(n - x_1, x_2) = n! / (x_1! x_2! (n - x_1 - x_2)!).

Using the same argument repeatedly, we find that the total number of ways to choose the experiments with outcomes 1, 2, ..., K is

n! / (x_1! ... x_{K-1}! (n - x_1 - ... - x_{K-1})!) = n! / (x_1! ... x_K!).
Hence

Pr(X_1 = x_1, ..., X_{K-1} = x_{K-1}) = (n! / (x_1! ... x_K!)) p_1^{x_1} p_2^{x_2} ... p_K^{x_K}.

The Multinomial joint density of X_1, ..., X_{K-1} is

f_X(x_1, ..., x_{K-1}) = (n! / Π_{k=1}^K x_k!) Π_{k=1}^K p_k^{x_k}, 0 ≤ x_k ≤ n, Σ_{k=1}^K x_k = n
                       = 0, otherwise.

Multinomial formula:

(a_1 + ... + a_K)^n = Σ_{x_1+...+x_K=n} (n! / (x_1! ... x_K!)) a_1^{x_1} ... a_K^{x_K}.
Using this formula, the mgf is

M_X(t) = E(e^{t_1X_1 + ... + t_{K-1}X_{K-1}})
= Σ_{x_1+...+x_K=n} (n! / (x_1! ... x_K!)) (e^{t_1}p_1)^{x_1} ... (e^{t_{K-1}}p_{K-1})^{x_{K-1}} p_K^{x_K}
= (Σ_{k=1}^{K-1} e^{t_k} p_k + p_K)^n.

From the mgf we find

E(X_k) = np_k
Var(X_k) = np_k(1 - p_k)
Cov(X_k, X_l) = -np_kp_l, k ≠ l.

Exercise: What is the marginal distribution of X_k? What is the conditional distribution of X_1, X_2 given X_3 = x_3, ..., X_{K-1} = x_{K-1}?
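The negative covariance between counts is intuitive (more outcomes of one kind leave fewer trials for the others) and can be checked by simulation. A sketch, not in the notes (K = 3 with arbitrary probabilities):

```python
import random

rng = random.Random(9)
probs = [0.2, 0.3, 0.5]
n, reps = 30, 20_000

def draw_counts(rng):
    """One multinomial draw: counts of the K = 3 outcomes in n trials."""
    counts = [0, 0, 0]
    for _ in range(n):
        u, acc = rng.random(), 0.0
        for k, p in enumerate(probs):
            acc += p
            if u < acc:
                counts[k] += 1
                break
    return counts

samples = [draw_counts(rng) for _ in range(reps)]
m1 = sum(s[0] for s in samples) / reps
m2 = sum(s[1] for s in samples) / reps
cov12 = sum((s[0] - m1) * (s[1] - m2) for s in samples) / reps
# E(X1) = n*p1 = 6 and Cov(X1, X2) = -n*p1*p2 = -1.8
```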
Multivariate normal distribution

The K-dimensional random vector X has a K-dimensional Multivariate normal distribution if its distribution has a density with respect to the K-dimensional Lebesgue measure equal to

f_X(x) = (1 / (|Σ|^{1/2} (2π)^{K/2})) e^{-(1/2)(x - µ)'Σ^{-1}(x - µ)}, x ∈ R^K.

By completion of squares (see the 1-dimensional case) the mgf is

M_X(t) = e^{t'µ + (1/2)t'Σt}.

Exercise: Derive the mgf. Hence

E(X) = µ
Var(X) = Σ.

Exercise: Derive these results. The marginal distribution of X_k is normal with mean µ_k and variance σ_k^2, the k-th element of the main diagonal of Σ. Exercise: Prove this using the mgf.
Special case K = 2: the bivariate normal distribution. Let the random vector be (Y, X)'. The conditional distribution of Y given X = x is normal with

E(Y | X = x) = µ_Y + (σ_{XY}/σ_X^2)(x - µ_X)
Var(Y | X = x) = σ_Y^2 (1 - σ_{XY}^2/(σ_X^2 σ_Y^2)) = σ_Y^2 (1 - ρ_{XY}^2),

with σ_{XY} = Cov(X, Y). The conditional mean is linear in x. Compare with the result that Pr(Y = a + bX) = 1 if and only if |ρ_{XY}| = 1.
Regression fallacy or regression to the mean

Francis Galton observed that tall fathers have on average shorter sons, and short fathers have on average taller sons (in Victorian England mothers and daughters did not count). If this process were to continue, one would expect that in the long run the extremes would disappear and all fathers and sons would have the average height.

Using the same reasoning: short sons have on average taller fathers (with a height closer to the mean) and tall sons have on average shorter fathers (again with a height closer to the mean). By this argument there is a tendency to move away from the mean!

Similar observations can be made about many phenomena: rookie players who do exceptionally well in the first year tend to have a slump in the second; bringing in new management when a company underperforms seems to improve performance, etc.
Analysis. Let

X = height of father
Y = height of son.

A reasonable assumption is that X, Y have a bivariate normal distribution with

E(X) = E(Y) = µ
Var(X) = Var(Y) = σ^2
0 < ρ_{XY} < 1.
Hence

E(Y | X = x) = µ + ρ(x - µ).

If x > µ,

0 < E(Y | X = x) - µ < x - µ,

i.e. the average height of sons of fathers with more than average height is closer to the mean. If x < µ,

0 > E(Y | X = x) - µ > x - µ,

i.e. the average height of sons of fathers with less than average height is closer to the mean. However, the heights of fathers and sons have the same (normal) distribution, i.e. there is no change over the generations.
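Regression to the mean is easy to see in a simulation of the bivariate normal model above. A sketch, not in the notes (the numbers µ = 175 cm, σ = 7 cm, ρ = 0.5 are hypothetical):

```python
import random

rng = random.Random(13)
mu, sigma, rho = 175.0, 7.0, 0.5  # hypothetical values
pairs = []
for _ in range(50_000):
    x = rng.gauss(mu, sigma)  # father's height
    # son's height given the father's: N(mu + rho*(x - mu), sigma^2*(1 - rho^2))
    y = rng.gauss(mu + rho * (x - mu), sigma * (1 - rho**2) ** 0.5)
    pairs.append((x, y))

tall = [(x, y) for x, y in pairs if x > mu + sigma]  # tall fathers
avg_tall_father = sum(x for x, _ in tall) / len(tall)
avg_tall_son = sum(y for _, y in tall) / len(tall)
# the sons of tall fathers are on average shorter than their fathers,
# but still taller than the population mean
```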
The distribution of linear and quadratic functions of normal random vectors

Let X be a K-dimensional random vector with X ~ N(µ, Σ). Consider the random variables

(i) Y_1 = a'X, with a a K-vector of constants (Y_1 is scalar);
(ii) Y_2 = AX + b, with A an M × K matrix and b an M-vector of constants;
(iii) Y_3 = X'CX, with C a symmetric K × K matrix of constants.
From the mgfs of Y_1 and Y_2 we find

(i) Y_1 ~ N(a'µ, a'Σa). Exercise: Derive this.
(ii) Y_2 ~ N(Aµ + b, AΣA'). Exercise: Derive this.

We verify:

E(Y_2) = AE(X) + b = Aµ + b
Var(Y_2) = E[(Y_2 - Aµ - b)(Y_2 - Aµ - b)'] = E[(AX - Aµ)(AX - Aµ)'] = E[A(X - µ)(X - µ)'A'] = A E[(X - µ)(X - µ)'] A' = AΣA'.

(iii) Special case: X ~ N(0, I) and C idempotent, i.e. C^2 = C, the matrix generalization of unity.
79 Let P be the K x K matrix of eigenvectors of C, chosen such that P'P = I, i.e. P is orthonormal. Define the diagonal matrix of eigenvalues
Λ = diag(λ_1, ..., λ_K).
We have
CP = PΛ and P'CP = Λ,
so that
C = PΛP'
because by P'P = I we have P' = P^{-1}. Hence
PΛP' = C = C² = PΛ²P'
so that Λ² = Λ.
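The conclusion Λ² = Λ (eigenvalues all 0 or 1, so tr(C) equals the rank) can be illustrated numerically. A sketch using a projection matrix C = Z(Z'Z)^{-1}Z' as the idempotent example; Z and the dimensions K = 6, L = 3 are assumptions chosen for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Symmetric idempotent matrix: the projection onto the column space of
# an (assumed, arbitrary) K x L matrix Z with full column rank
K, L = 6, 3
Z = rng.standard_normal((K, L))
C = Z @ np.linalg.inv(Z.T @ Z) @ Z.T

# C^2 = C, every eigenvalue is (numerically) 0 or 1, and tr(C) = L
eigvals = np.linalg.eigvalsh(C)
idem_err = np.abs(C @ C - C).max()
trace_C = np.trace(C)
```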
80 This implies that each λ_k is either 0 or 1. Let L be the number of eigenvalues equal to 1 and consider
Z = P'X,
so that Z ~ N(0, I). Hence
Y_3 = X'PΛP'X = Z'ΛZ = Σ_{k=1}^{K} λ_k Z_k² ~ χ²(L).
Finally,
tr(C) = tr(PΛP') = tr(ΛP'P) = tr(Λ) = L.
Let X_1 and X_2 be subvectors of X of dimensions K_1 and K_2 with K_1 + K_2 = K. Then the variance matrix of X is
Σ = [ Σ_11   Σ_12
      Σ_12'  Σ_22 ]
with Var(X_1) = Σ_11, Var(X_2) = Σ_22 and Σ_12 = E[(X_1 - µ_1)(X_2 - µ_2)']. We have that X_1 and X_2 are independent if and only if Σ_12 = 0. To see this, note that if Σ_12 = 0,
Σ^{-1} = [ Σ_11  0
           0     Σ_22 ]^{-1} = [ Σ_11^{-1}  0
                                 0          Σ_22^{-1} ]
Hence
(x - µ)'Σ^{-1}(x - µ) = (x_1 - µ_1)'Σ_11^{-1}(x_1 - µ_1) + (x_2 - µ_2)'Σ_22^{-1}(x_2 - µ_2).
Substitution in the density of the multivariate normal distribution shows that this density factorizes into a function of x_1 and a function of x_2, which establishes that these random vectors are independent.
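The distributional result Y_3 = X'CX ~ χ²(L) for idempotent C can be checked through the first two moments: a χ²(L) variable has mean L and variance 2L. A simulation sketch, again using an assumed projection matrix of rank L = 3:

```python
import numpy as np

rng = np.random.default_rng(3)

# Symmetric idempotent C of rank L (projection matrix, assumed example)
K, L = 6, 3
Z = rng.standard_normal((K, L))
C = Z @ np.linalg.inv(Z.T @ Z) @ Z.T

# For X ~ N(0, I_K), Y3 = X'CX should be chi-squared with tr(C) = L d.f.
X = rng.standard_normal((300_000, K))
Y3 = np.einsum('ni,ij,nj->n', X, C, X)  # quadratic form row by row

# A chi-squared(L) variable has mean L and variance 2L
mean_Y3, var_Y3 = Y3.mean(), Y3.var()
```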
81 Conclusion: in the normal distribution, X_1 and X_2 are independent if and only if Cov(X_1, X_2) = 0.
Define Y_4 = X'BX with B symmetric and idempotent. Then if X ~ N(0, I):
(i) Y_1 and Y_3 are independent if and only if Ca = 0.
(ii) Y_3 and Y_4 are stochastically independent if and only if BC = CB = 0.
Proof:
(i) Y_3 = X'CX = X'C²X = X'C'CX, which is a function of CX. Hence Y_1 and Y_3 are independent if and only if
Cov(CX, a'X) = E(CXX'a) = Ca = 0.
(ii) Y_3 = X'C'CX is a function of CX and Y_4 = X'B'BX is a function of BX, so Y_3 and Y_4 are independent if and only if
Cov(BX, CX) = E(BXX'C') = BC' = BC = 0.
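Result (ii) can be illustrated numerically: two quadratic forms built from projections onto orthogonal subspaces satisfy BC = 0 and are independent. A sketch; the dimensions and the construction of B and C from an orthonormal basis are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(4)

# Two symmetric idempotent matrices with BC = CB = 0: projections onto
# orthogonal subspaces, built from a random orthonormal basis Q
K = 6
Q, _ = np.linalg.qr(rng.standard_normal((K, K)))
C = Q[:, :2] @ Q[:, :2].T    # projection onto the first 2 basis vectors
B = Q[:, 2:5] @ Q[:, 2:5].T  # projection onto the next 3

# For X ~ N(0, I), Y3 = X'CX and Y4 = X'BX should be independent
X = rng.standard_normal((300_000, K))
Y3 = np.einsum('ni,ij,nj->n', X, C, X)
Y4 = np.einsum('ni,ij,nj->n', X, B, X)

bc_err = np.abs(B @ C).max()            # verifies BC = 0
corr = np.corrcoef(Y3, Y4)[0, 1]        # should be near zero
```

The sample correlation between the two quadratic forms vanishes (up to sampling noise), consistent with their independence.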