Section 21. Expected Values. Po-Ning Chen, Professor. Institute of Communications Engineering. National Chiao Tung University

Section 21 Expected Values Po-Ning Chen, Professor Institute of Communications Engineering National Chiao Tung University Hsin Chu, Taiwan 30010, R.O.C.

Expected value as integral 21-1

Definition (expected value) The expected value of a random variable X is the integral of X with respect to its cdf, i.e.,

E[X] = E[X^+] - E[X^-] = ∫_0^∞ x dF_X(x) - ∫_{-∞}^0 (-x) dF_X(x) = ∫_{-∞}^∞ x dF_X(x).

Properties of expected value
1. E[X] exists if, and only if, at least one of E[X^+] and E[X^-] is finite.
2. X is integrable if E[|X|] < ∞.
3. For a B/B-measurable (or simply B-measurable) function g,
   E[g(X)] = E[g(X)^+] - E[g(X)^-]
           = ∫_{{x ∈ R : g(x) ≥ 0}} g(x) dF_X(x) - ∫_{{x ∈ R : g(x) < 0}} [-g(x)] dF_X(x)
           = ∫_{-∞}^∞ g(x) dF_X(x).

Absolute moments 21-2

Definition (absolute moments) The kth absolute moment of X is:

E[|X|^k] = ∫_{-∞}^∞ |x|^k dF_X(x).

Properties of absolute moments
1. The absolute moment always exists.
2. If the kth absolute moment is finite, then the jth absolute moment is finite for any j ≤ k.
   Proof: This is easily proved from |x|^j ≤ 1 + |x|^k for j ≤ k.

Moments 21-3

Definition (moments) The kth moment of X is:

E[X^k] = ∫_{-∞}^∞ x^k dF_X(x)
       = ∫_{-∞}^∞ max{x^k, 0} dF_X(x) - ∫_{-∞}^∞ (-min{x^k, 0}) dF_X(x)
       = E[(X^k)^+] - E[(X^k)^-].

Properties of moments
1. E[X^k] exists if, and only if, at least one of E[(X^k)^+] and E[(X^k)^-] is finite.
2. X^k is integrable if E[|X^k|] < ∞.
3. For a B/B-measurable (or simply B-measurable) function g,
   E[g(X^k)] = E[g(X^k)^+] - E[g(X^k)^-]
             = ∫_{{x ∈ R : g(x^k) ≥ 0}} g(x^k) dF_X(x) - ∫_{{x ∈ R : g(x^k) < 0}} [-g(x^k)] dF_X(x)
             = ∫_{-∞}^∞ g(x^k) dF_X(x).

Moments 21-4

Example 21.1 (moments of the standard normal) The pdf of a standard normal distribution is equal to:

(1/√(2π)) e^{-x²/2}.

Integration by parts shows that:

(1/√(2π)) ∫_{-∞}^∞ x^k e^{-x²/2} dx = (1/√(2π)) ∫_{-∞}^∞ (k-1) x^{k-2} e^{-x²/2} dx,

or equivalently, E[X^k] = (k-1) E[X^{k-2}]. As a result,

E[X^k] = 0 for k odd, and E[X^k] = 1·3·5···(k-1) for k even.
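
A quick numerical sanity check of this recursion (my addition, not part of the original slides; it assumes numpy and scipy are available):

    # Numerical check of E[X^k] = 1*3*...*(k-1) for the standard normal (k even).
    import numpy as np
    from scipy import integrate

    def std_normal_moment(k):
        # E[X^k] = (1/sqrt(2*pi)) * integral of x^k exp(-x^2/2), computed numerically.
        pdf = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
        val, _ = integrate.quad(lambda x: x**k * pdf(x), -np.inf, np.inf)
        return val

    print(std_normal_moment(2))  # ~1
    print(std_normal_moment(4))  # ~3
    print(std_normal_moment(6))  # ~15
    print(std_normal_moment(3))  # ~0 (odd moments vanish)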

Computation of mean 21-5

Theorem For a non-negative random variable X,

E[X] = ∫_0^∞ Pr[X > t] dt = ∫_0^∞ Pr[X ≥ t] dt.

In other words, the area under the ccdf (complementary cdf) is the mean. (The theorem is valid even if E[X] = ∞.) (Combined with the law of large numbers, this means E[X] can be estimated directly from the empirical ccdf!)

Geometric interpretation Suppose Pr[X = x_i] = p_i for 1 ≤ i ≤ k.

[Figure: staircase cdf of a discrete random variable taking values x_1 < x_2 < ... < x_k with probabilities p_1, ..., p_k; the region between the cdf and the level 1 decomposes into rectangles of areas p_1 x_1, p_2 x_2, ..., p_k x_k.]

The area of the ccdf = p_1 x_1 + p_2 x_2 + p_3 x_3 + ... + p_{k-1} x_{k-1} + p_k x_k = E[X].

Computation of mean 21-6

Theorem For α ≥ 0,

∫_{α^+}^∞ x dF_X(x) = α Pr[X > α] + ∫_α^∞ Pr[X > t] dt.

Note that it is always true that ∫_0^∞ x dF_X(x) = ∫_{0^+}^∞ x dF_X(x), but it is possible that ∫_α^∞ x dF_X(x) ≠ ∫_{α^+}^∞ x dF_X(x)!

Proof: Let Y = X · I_[X>α], where I_[X>α] = 1 if X > α, and zero otherwise. Hence, Y = 0 for X ≤ α, and Y = X for X > α. Consequently,

∫_{α^+}^∞ x dF_X(x) = E[Y]
  = ∫_0^∞ Pr[Y > t] dt
  = ∫_0^α Pr[Y > t] dt + ∫_α^∞ Pr[Y > t] dt
  = ∫_0^α Pr[X > α] dt + ∫_α^∞ Pr[X > t] dt
  = α Pr[X > α] + ∫_α^∞ Pr[X > t] dt.

The empirical approximation of Pr[X > t] (or Pr[X ≥ t]) is more easily obtained than dF(x). With the above result, E[X] (or E[Y]) can be established directly from Pr[X > t].
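
A numerical sanity check of the two identities above (my addition; the Exponential(1) example and the cutoff α = 0.7 are arbitrary choices, and scipy is assumed available):

    # Check E[X] = integral of Pr[X>t] over [0,inf) and the tail identity, for X ~ Exponential(1).
    import numpy as np
    from scipy import integrate, stats

    X = stats.expon()          # rate 1: pdf e^{-x}, ccdf e^{-t}
    alpha = 0.7

    # Mean as the area under the ccdf.
    area_ccdf, _ = integrate.quad(X.sf, 0, np.inf)
    print(area_ccdf, X.mean())                      # both ~1

    # Integral of x dF(x) over (alpha, inf) versus alpha*Pr[X>alpha] + integral of Pr[X>t] over (alpha, inf).
    lhs, _ = integrate.quad(lambda x: x * X.pdf(x), alpha, np.inf)
    tail_area, _ = integrate.quad(X.sf, alpha, np.inf)
    rhs = alpha * X.sf(alpha) + tail_area
    print(lhs, rhs)                                 # both ~ (alpha+1)*exp(-alpha)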

Inequalities regarding moments 21-7

Markov's inequality 21-8

Lemma (Markov's inequality) For any k > 0 (and implicitly α > 0),

Pr[|X| ≥ α] ≤ (1/α^k) E[|X|^k].

Proof: The inequality below is valid for any x ∈ R:

|x|^k ≥ α^k · 1{|x|^k ≥ α^k}.   (21.1)

Hence,

E[|X|^k] = ∫_{-∞}^∞ |x|^k dF_X(x) ≥ ∫_{-∞}^∞ α^k 1{|x|^k ≥ α^k} dF_X(x) = α^k Pr[|X| ≥ α].

Equality holds if, and only if, equality in (21.1) is true with probability 1, i.e.,

Pr[ |X|^k = α^k · 1{|X|^k ≥ α^k} ] = 1,

or equivalently, Pr[|X| = 0 or |X| = α] = 1.
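
A small Monte Carlo illustration (my addition, not from the original notes) comparing the Markov bound with the empirical tail probability of an Exponential(1) random variable; the choices α = 3 and k = 1, 2, 3 are arbitrary:

    # Compare Pr[|X| >= alpha] with E[|X|^k]/alpha^k for X ~ Exponential(1).
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1.0, size=1_000_000)
    alpha = 3.0

    for k in (1, 2, 3):
        tail = np.mean(np.abs(x) >= alpha)                  # empirical Pr[|X| >= alpha]
        bound = np.mean(np.abs(x) ** k) / alpha ** k        # Markov bound with exponent k
        print(k, tail, bound)                               # tail <= bound for every k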

Chebyshev-Bienaymé inequality 21-9

Lemma (Chebyshev-Bienaymé inequality) For α > 0,

Pr[|X - E[X]| ≥ α] ≤ (1/α²) Var[X].

Proof: By Markov's inequality with k = 2, we have:

Pr[|X - E[X]| ≥ α] ≤ (1/α²) E[|X - E[X]|²].

Equality holds if, and only if,

Pr[|X - E[X]| = 0] + Pr[|X - E[X]| = α] = 1,

which implies that

Pr[X = E[X] + α] = Pr[X = E[X] - α] = p and Pr[X = E[X]] = 1 - 2p

for α > 0.

Jensen's inequality 21-10

Definition (convexity) A function ϕ(x) is said to be convex over an interval (a, b) if for every x_1, x_2 ∈ (a, b) and 0 ≤ λ ≤ 1,

ϕ(λ x_1 + (1-λ) x_2) ≤ λ ϕ(x_1) + (1-λ) ϕ(x_2).

Furthermore, a function ϕ is said to be strictly convex if equality holds only when λ = 0 or λ = 1. (Can we replace (a, b) by an arbitrary real set X?)

Definition (support line) A line y = ax + b is said to be a support line of the function ϕ(x) if, among all lines with the same slope a, it is the largest one satisfying ax + b ≤ ϕ(x) for every x.

A support line ax + b may not necessarily intersect ϕ(·). In other words, it is possible that no x_0 satisfies ax_0 + b = ϕ(x_0). However, the existence of an intersection between the function ϕ(·) and its support line is guaranteed if ϕ(·) is convex.

Jensen's inequality 21-11

[Figure: an example of a function ϕ(x) and a support line ax + b that do not intersect.]

Jensen's inequality 21-12

Lemma (Jensen's inequality) Suppose that the function ϕ(·) is convex on the domain X of X. (Implicitly, E[X] ∈ X.) Then

ϕ(E[X]) ≤ E[ϕ(X)].

Proof: Let ax + b be a support line through the point (E[X], ϕ(E[X])). Thus, over the domain X of ϕ(x),

ax + b ≤ ϕ(x).

(If equality holds on X in this step, then equality remains true for the subsequent steps.)

Taking the expectation of both sides, we obtain

a E[X] + b ≤ E[ϕ(X)],

but we know that a E[X] + b = ϕ(E[X]). Consequently,

ϕ(E[X]) ≤ E[ϕ(X)].

Equality holds if, and only if, there exist a and b such that a E[X] + b = ϕ(E[X]) and Pr({x ∈ X : ax + b = ϕ(x)}) = 1.
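
A brief numerical illustration (my addition): for the convex function ϕ(x) = e^x and a standard normal X, Jensen's inequality says e^{E[X]} = 1 ≤ E[e^X] = e^{1/2}:

    # Jensen's inequality check: phi(E[X]) <= E[phi(X)] for phi(x) = exp(x), X ~ N(0,1).
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.standard_normal(1_000_000)

    lhs = np.exp(x.mean())        # phi(E[X]) ~ exp(0) = 1
    rhs = np.exp(x).mean()        # E[phi(X)] ~ exp(1/2) ~ 1.6487
    print(lhs, rhs, lhs <= rhs)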

Jensen's inequality 21-13

[Figure: the support line y = ax + b of the convex function ϕ(x), touching it at the point (E[X], ϕ(E[X])).]

Hölder's inequality 21-14

Lemma (Hölder's inequality) For p > 1, q > 1 and 1/p + 1/q = 1,

E[|XY|] ≤ E^{1/p}[|X|^p] E^{1/q}[|Y|^q].

Proof: The inequality is trivially valid if E^{1/p}[|X|^p] E^{1/q}[|Y|^q] = 0, so assume without loss of generality that E^{1/p}[|X|^p] E^{1/q}[|Y|^q] > 0.

exp{x} is a convex function in x. Hence, by Jensen's inequality (equivalently, convexity with weights 1/p and 1/q),

exp{(1/p)s + (1/q)t} ≤ (1/p) exp{s} + (1/q) exp{t}.

Since e^x is strictly convex, equality holds iff s = t.

Let a = exp{s/p} and b = exp{t/q}. Then the above inequality becomes:

ab ≤ (1/p) a^p + (1/q) b^q,   with equality iff a^p = b^q,

whose validity is not restricted to positive a and b but extends to non-negative a and b.

By letting a = |x| / E^{1/p}[|X|^p] and b = |y| / E^{1/q}[|Y|^q], we obtain:

|xy| / ( E^{1/p}[|X|^p] E^{1/q}[|Y|^q] ) ≤ (1/p) |x|^p / E[|X|^p] + (1/q) |y|^q / E[|Y|^q].

Equality holds if, and only if,

Pr[ |X|^p / E[|X|^p] = |Y|^q / E[|Y|^q] ] = 1.

Hölder's inequality 21-15

Taking the expectation of both sides yields:

E[|XY|] / ( E^{1/p}[|X|^p] E^{1/q}[|Y|^q] ) ≤ (1/p) E[|X|^p]/E[|X|^p] + (1/q) E[|Y|^q]/E[|Y|^q] = 1/p + 1/q = 1.

Lemma (Hölder's inequality) For p > 1, q > 1 and 1/p + 1/q = 1,

E[|XY|] ≤ E^{1/p}[|X|^p] E^{1/q}[|Y|^q].

Equality holds if, and only if,

Pr[ |X|^p / E[|X|^p] = |Y|^q / E[|Y|^q] ] = 1, or Pr[X = 0] = 1, or Pr[Y = 0] = 1.

Example. Take p = q = 2 and a pair (X, Y) with joint pmf

              Y = 0    Y = 1
    X = 0     p_00     p_01
    X = 1     p_10     p_11

Then

E[|XY|] = p_11 = p_11^{1/2} p_11^{1/2} ≤ (p_10 + p_11)^{1/2} (p_01 + p_11)^{1/2} = E^{1/2}[|X|²] E^{1/2}[|Y|²],

with equality holding if, and only if, p_10 = p_01 = 0.
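
A small numerical check of Hölder's inequality (my addition; the exponents p = 3, q = 3/2 and the dependent pair (X, Y) are arbitrary choices):

    # Monte Carlo check of E[|XY|] <= E[|X|^p]^(1/p) * E[|Y|^q]^(1/q).
    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.standard_normal(1_000_000)
    y = x + rng.standard_normal(1_000_000)     # deliberately dependent on x

    p, q = 3.0, 1.5                             # 1/p + 1/q = 1
    lhs = np.mean(np.abs(x * y))
    rhs = np.mean(np.abs(x) ** p) ** (1 / p) * np.mean(np.abs(y) ** q) ** (1 / q)
    print(lhs, rhs, lhs <= rhs)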

Cauchy-Schwarz's inequality 21-16

(Recall Hölder's inequality) For p > 1, q > 1 and 1/p + 1/q = 1,

E[|XY|] ≤ E^{1/p}[|X|^p] E^{1/q}[|Y|^q].

Suppose E[|X|] > 0 and E[|Y|] > 0. Then equality holds if, and only if, there exists a constant a such that Pr[|X|^p = a|Y|^q] = 1.

Lemma (Cauchy-Schwarz's inequality)

E[|XY|] ≤ E^{1/2}[X²] E^{1/2}[Y²].

Equality holds if, and only if, there exists a constant a such that Pr[X² = aY²] = 1.

Proof: A special case of Hölder's inequality with p = q = 2. Equality holds if, and only if, Pr[X² = aY²] = 1 for some a.

Lyapounov's inequality 21-17

Lemma (Lyapounov's inequality) For 0 < α < β,

E^{1/α}[|Z|^α] ≤ E^{1/β}[|Z|^β].

Equality holds if, and only if, Pr[|Z| = a] = 1 for some a.

Proof: Letting X = |Z|^α, Y = 1, p = β/α and q = β/(β-α) in Hölder's inequality yields:

E[|Z|^α] ≤ E^{α/β}[ (|Z|^α)^{β/α} ] E^{(β-α)/β}[ 1^{β/(β-α)} ] = E^{α/β}[|Z|^β].

Equality holds if, and only if,

Pr[ (|Z|^α)^{β/α} = a ] = Pr[ |Z|^β = a ] = Pr[ |Z| = a^{1/β} ] = 1

for some a.

Notably, in the statement of the lemma, β is strictly larger than α. It is certain that if α = β, the inequality automatically becomes an equality.
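
A quick check (my addition) that the normalized absolute moments E^{1/α}[|Z|^α] are non-decreasing in α, here for an arbitrary Exponential(1) choice of Z:

    # Lyapounov's inequality: E[|Z|^a]^(1/a) is non-decreasing in a.
    import numpy as np

    rng = np.random.default_rng(3)
    z = rng.exponential(size=1_000_000)

    norms = [np.mean(np.abs(z) ** a) ** (1 / a) for a in (0.5, 1, 2, 3, 4)]
    print(norms)                          # the list should be increasing
    print(np.all(np.diff(norms) >= 0))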

Joint integrals 21-18

Lemma For a B^k/B-measurable (or simply B^k-measurable) function g and a k-dimensional random vector X,

E[g(X)] = ∫_{R^k} g(x^k) dF_X(x^k),

if one of E[(g(X))^+] and E[(g(X))^-] is finite.

Definition (covariance) The covariance of a k-dimensional random vector X and an l-dimensional random vector Y is the k×l matrix

Cov[X, Y] = E[ (X - E[X])(Y - E[Y])^T ]
          = [ E[(X_i - E[X_i])(Y_j - E[Y_j])] ]_{i=1..k, j=1..l},

i.e., its (i, j) entry is E[(X_i - E[X_i])(Y_j - E[Y_j])], where T represents the vector transpose operation, provided that one of E[((X_i - E[X_i])(Y_j - E[Y_j]))^+] and E[((X_i - E[X_i])(Y_j - E[Y_j]))^-] is finite for every i, j.
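
A minimal numpy illustration of this covariance matrix (my addition; the dimensions k = 2, l = 3 and the way Y is built from X are arbitrary):

    # Sample covariance matrix Cov[X, Y] for a 2-dimensional X and a 3-dimensional Y.
    import numpy as np

    rng = np.random.default_rng(4)
    n = 100_000
    x = rng.standard_normal((n, 2))                                      # rows are samples of X
    y = x @ rng.standard_normal((2, 3)) + rng.standard_normal((n, 3))    # Y depends on X

    # Directly from the definition E[(X - E[X])(Y - E[Y])^T]:
    cov_def = (x - x.mean(axis=0)).T @ (y - y.mean(axis=0)) / n          # 2 x 3 matrix

    # The same block extracted from the joint covariance matrix of (X, Y):
    cov_np = np.cov(np.hstack([x, y]), rowvar=False)[:2, 2:]
    print(np.round(cov_def, 3))
    print(np.round(cov_np, 3))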

Joint integrals 21-19

Definition (uncorrelated) X and Y are uncorrelated if Cov[X, Y] = 0_{k×l}.

Definition (independence) X and Y are independent if, for all x_1, ..., x_k and y_1, ..., y_l,

Pr[ (X_1 ≤ x_1, ..., X_k ≤ x_k) ∩ (Y_1 ≤ y_1, ..., Y_l ≤ y_l) ]
  = Pr[X_1 ≤ x_1, ..., X_k ≤ x_k] · Pr[Y_1 ≤ y_1, ..., Y_l ≤ y_l].

Lemma (integrability of product) For independent X_1, X_2, ..., X_k, if each X_i is integrable, then so is X_1 X_2 ··· X_k, and

E[X_1 X_2 ··· X_k] = E[X_1] E[X_2] ··· E[X_k].

Lemma (sum of variances for pair-wise independent samples) If X_1, X_2, ..., X_k are pair-wise independent and square-integrable,

Var[X_1 + X_2 + ··· + X_k] = Var[X_1] + Var[X_2] + ··· + Var[X_k].

Notably, pair-wise independence does not imply complete independence.

Joint integrals 21-20

Example (only pair-wise independence) Toss a fair coin twice, and assume the tosses are independent. Define

X = 1 if a head appears on the first toss, and 0 otherwise;
Y = 1 if a head appears on the second toss, and 0 otherwise;
Z = 1 if exactly one head and one tail appear on the two tosses, and 0 otherwise.

Then

Pr[X = 1, Y = 1, Z = 1] = 0, but Pr[X = 1] Pr[Y = 1] Pr[Z = 1] = (1/2)(1/2)(1/2) = 1/8.

So X, Y and Z are not (mutually) independent. Obviously, X and Y are independent. In addition,

Pr[X = 1 | Z = 1] = Pr[X = 1, Z = 1] / Pr[Z = 1] = (1/4)/(1/2) = 1/2 = Pr[X = 1],
Pr[X = 1 | Z = 0] = Pr[X = 1, Z = 0] / Pr[Z = 0] = (1/4)/(1/2) = 1/2 = Pr[X = 1],

so X and Z are independent. One can similarly show (or argue by symmetry) that Y and Z are independent.

Joint integrals 21-21

Example (con't) From the four equally likely outcomes:

    head head  →  X + Y + Z = 2
    head tail  →  X + Y + Z = 2
    tail head  →  X + Y + Z = 2
    tail tail  →  X + Y + Z = 0

we get E[X + Y + Z] = 3/2 and

Var[X + Y + Z] = (3/4)(2 - 3/2)² + (1/4)(0 - 3/2)² = 3/4.

This is equal to:

Var[X] + Var[Y] + Var[Z] = 1/4 + 1/4 + 1/4 = 3/4.

This result matches the fact that "the variance of a sum equals the sum of the variances" holds for pair-wise independent random variables.
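
A short simulation of this coin-toss example (my addition), confirming both the pair-wise independence and the variance computation:

    # Simulate the two-coin example: X, Y, Z are pair-wise independent but not mutually independent.
    import numpy as np

    rng = np.random.default_rng(5)
    n = 1_000_000
    x = rng.integers(0, 2, size=n)     # head on first toss
    y = rng.integers(0, 2, size=n)     # head on second toss
    z = (x != y).astype(int)           # exactly one head and one tail

    print(np.mean((x == 1) & (z == 1)), np.mean(x == 1) * np.mean(z == 1))   # both ~0.25 (independent)
    print(np.mean((x == 1) & (y == 1) & (z == 1)))                           # 0, not 1/8
    print(np.var(x + y + z), np.var(x) + np.var(y) + np.var(z))              # both ~0.75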

Moment generating function 21-22

Definition (moment generating function) The moment generating function of X is defined as:

M_X(t) = E[e^{tX}] = ∫_{-∞}^∞ e^{tx} dF_X(x),

for all t for which this is finite.

If M_X(t) is defined (i.e., finite) throughout an interval (-t_0, t_0), where t_0 > 0, then

M_X(t) = Σ_{k=0}^∞ (t^k / k!) E[X^k].

In other words, M_X(t) has a Taylor expansion about t = 0 with positive radius of convergence if it is defined in some (-t_0, t_0).

In case M_X(t) has a Taylor expansion about t = 0 with positive radius of convergence, the moments of X can be computed from the derivatives of M_X(t) through:

M_X^{(k)}(0) = E[X^k].

If M_{X_i}(t) is defined throughout an interval (-t_0, t_0) for each i, and X_1, X_2, ..., X_n are independent, then the moment generating function of X_1 + ··· + X_n is also defined on (-t_0, t_0), and is equal to Π_{i=1}^n M_{X_i}(t).
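
A small symbolic check (my addition, using sympy) that differentiating an mgf at t = 0 recovers the moments; here for the Exponential(α) mgf M(t) = α/(α - t), derived on slide 21-35:

    # Moments from mgf derivatives: M^(k)(0) = E[X^k] = k!/alpha^k for Exponential(alpha).
    import sympy as sp

    t, alpha = sp.symbols('t alpha', positive=True)
    M = alpha / (alpha - t)                      # mgf of Exponential(alpha), valid for t < alpha

    for k in range(1, 5):
        moment = sp.simplify(sp.diff(M, t, k).subs(t, 0))
        print(k, moment)                         # k!/alpha**k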

Moment generating function 21-23

Example (a random variable whose moment generating function is defined only at zero) The pdf of a Cauchy distribution is

f(x) = 1 / (π(1 + x²)).

For t ≠ 0, the integral

∫_{-∞}^∞ e^{tx} (1/(π(1 + x²))) dx = ∫_0^∞ (e^{tx} + e^{-tx}) (1/(π(1 + x²))) dx
  ≥ ∫_0^∞ e^{|t|x} (1/(π(1 + x²))) dx
  ≥ ∫_0^∞ |t| x / (π(1 + x²)) dx      (by e^x ≥ 1 + x ≥ x for x ≥ 0)
  ≥ ∫_1^∞ |t| x / (π(x² + x²)) dx = (|t|/(2π)) ∫_1^∞ (1/x) dx = ∞.

So the moment generating function is not defined for any t ≠ 0.

The Cauchy distribution is indeed the Student's t-distribution with 1 degree of freedom.

Moment generating function 21-24

Example (Student's t-distribution with n degrees of freedom) This distribution has moments of order ≤ n - 1, but any higher moments either do not exist or are infinite. Some books regard infinite moments as non-existent, so they write: "This distribution has moments of order ≤ n - 1, but no higher moments exist."

Let X, Y_1, Y_2, ..., Y_n be i.i.d. with standard normal marginal distribution. Then

T_n = X√n / √(Y_1² + ··· + Y_n²)

is said to have the Student's t-distribution (on R) with n degrees of freedom.

The numerator X√n has a normal density with mean 0 and variance n. χ² = Y_1² + ··· + Y_n² has a chi-square distribution with n degrees of freedom, and has density f_{1/2,n/2}(y), where

f_{α,ν}(x) = (1/Γ(ν)) α^ν x^{ν-1} e^{-αx}   on [0, ∞)

is the gamma density (sometimes named the Erlangian density when ν is a positive integer) with parameters ν > 0 and α > 0, and Γ(t) = ∫_0^∞ x^{t-1} e^{-x} dx is the gamma function.

Moment generating function 21-25

(Closure under convolution for the gamma density) f_{α,µ} * f_{α,ν} = f_{α,µ+ν}.

Proof:

(f_{α,µ} * f_{α,ν})(x)
  = ∫_0^∞ f_{α,µ}(x - y) f_{α,ν}(y) dy
  = (α^{µ+ν} / (Γ(µ)Γ(ν))) ∫_0^x (x - y)^{µ-1} e^{-α(x-y)} y^{ν-1} e^{-αy} dy
  = (α^{µ+ν} / (Γ(µ)Γ(ν))) e^{-αx} ∫_0^x (x - y)^{µ-1} y^{ν-1} dy
  = (α^{µ+ν} / (Γ(µ)Γ(ν))) e^{-αx} ∫_0^1 (x - xt)^{µ-1} (xt)^{ν-1} x dt     (by y = xt)
  = (1/Γ(µ+ν)) α^{µ+ν} x^{µ+ν-1} e^{-αx} · (Γ(µ+ν)/(Γ(µ)Γ(ν))) ∫_0^1 (1-t)^{µ-1} t^{ν-1} dt
  = f_{α,µ+ν}(x),

where the last factor equals one by the beta integral: for µ > 0 and ν > 0,

B(µ, ν) = ∫_0^1 (1-t)^{µ-1} t^{ν-1} dt = Γ(µ)Γ(ν)/Γ(µ+ν)

is the so-called beta integral.
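
A Monte Carlo check of this closure property (my addition; the parameter values are arbitrary and scipy is assumed available): the sum of independent Gamma(µ, α) and Gamma(ν, α) samples should be indistinguishable from Gamma(µ+ν, α):

    # Sum of independent gammas with a common rate is gamma: f_{a,mu} * f_{a,nu} = f_{a,mu+nu}.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    alpha, mu, nu = 2.0, 1.5, 3.0            # rate alpha, shapes mu and nu
    n = 200_000

    s = rng.gamma(shape=mu, scale=1/alpha, size=n) + rng.gamma(shape=nu, scale=1/alpha, size=n)
    # KS test against Gamma(mu+nu, rate alpha); the statistic should be tiny since the null is true.
    print(stats.kstest(s, stats.gamma(a=mu + nu, scale=1/alpha).cdf))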

Moment generating function 21-26

That the pdf of the chi-square distribution with 1 degree of freedom equals f_{1/2,1/2}(y) can be derived as follows:

Pr[Y_1² ≤ y] = Pr[-√y ≤ Y_1 ≤ √y] = Φ(√y) - Φ(-√y),

where Φ(·) represents the unit Gaussian cdf. So the derivative of the above expression is

y^{-1/2} φ(y^{1/2}) = (1/Γ(1/2)) (1/2)^{1/2} y^{(1/2)-1} e^{-(1/2)y}   for y ≥ 0.

Then, by closure under convolution for gamma densities, the chi-square distribution with n degrees of freedom can be obtained.

In statistical mechanics, Y_1² + Y_2² + Y_3² appears as the square of the speed of a particle. So the density of the particle speed (on [0, ∞)) is equal to v(x) = 2x f_{1/2,3/2}(x²), which is called the Maxwell density.

By letting Ȳ = √(Y_1² + ··· + Y_n²),

Pr[T_n ≤ t] = Pr[(X√n)/Ȳ ≤ t] = ∫_0^∞ Pr[X ≤ yt/√n] dF_Ȳ(y) = ∫_0^∞ Φ(yt/√n) (2y f_{1/2,n/2}(y²)) dy.

Moment generating function 21-27

So the density of T_n is:

f_{T_n}(t) = ∫_0^∞ (y/√n) φ(yt/√n) (2y f_{1/2,n/2}(y²)) dy
  = ∫_0^∞ (y/√(2πn)) e^{-y²t²/(2n)} · (1/(2^{n/2-1} Γ(n/2))) y^{n-1} e^{-y²/2} dy
  = (1/(2^{(n-1)/2} Γ(n/2) √(πn))) ∫_0^∞ y^n e^{-y²(1 + t²/n)/2} dy
  = (1/(2^{(n-1)/2} Γ(n/2) √(πn))) ∫_0^∞ (2^{n/2} s^{n/2} 2^{-1/2} s^{-1/2} / (1 + t²/n)^{(n+1)/2}) e^{-s} ds     (by y = √(2s) (1 + t²/n)^{-1/2})
  = (1/(Γ(n/2) √(πn) (1 + t²/n)^{(n+1)/2})) ∫_0^∞ s^{(n+1)/2 - 1} e^{-s} ds
  = C_n / (1 + t²/n)^{(n+1)/2},   where C_n = Γ((n+1)/2) / (Γ(n/2) √(πn)).
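
A simulation check (my addition) that the ratio construction of slide 21-24 really produces this density, compared against scipy's Student-t implementation; the choice n = 5 is arbitrary:

    # T_n = X*sqrt(n) / sqrt(Y_1^2 + ... + Y_n^2) should follow Student's t with n degrees of freedom.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    n_dof, n_samples = 5, 200_000

    x = rng.standard_normal(n_samples)
    y = rng.standard_normal((n_samples, n_dof))
    t_samples = x * np.sqrt(n_dof) / np.sqrt((y ** 2).sum(axis=1))

    # KS test against t(n_dof); the statistic should be tiny since the null is true.
    print(stats.kstest(t_samples, stats.t(df=n_dof).cdf))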

Moment generating function 21-28

For t ≠ 0, the integral

∫_{-∞}^∞ e^{tx} C_n / (1 + x²/n)^{(n+1)/2} dx
  ≥ C_n ∫_0^∞ e^{|t|x} (1/(1 + x²/n)^{(n+1)/2}) dx
  ≥ (C_n |t|^n / n!) ∫_0^∞ x^n / (1 + x²/n)^{(n+1)/2} dx     (by e^x ≥ 1 + x + x²/2 + ··· + x^n/n! ≥ x^n/n! for x ≥ 0)
  ≥ (C_n |t|^n / n!) ∫_{√n}^∞ x^n / (x²/n + x²/n)^{(n+1)/2} dx
  = (C_n |t|^n n^{(n+1)/2} / (n! 2^{(n+1)/2})) ∫_{√n}^∞ (1/x) dx = ∞.

So the moment generating function is not defined for any t ≠ 0.

Moment generating function 21-29

However, for r < n,

E[|T_n|^r] = ∫_{-∞}^∞ |t|^r C_n / (1 + t²/n)^{(n+1)/2} dt
  = 2 C_n ∫_0^∞ t^r / (1 + t²/n)^{(n+1)/2} dt
  = 2 C_n · n^{(r+1)/2} Γ((n-r)/2) Γ((r+1)/2) / (2 Γ((n+1)/2))
  = n^{r/2} Γ((n-r)/2) Γ((r+1)/2) / (Γ(n/2) √π) < ∞.

But

E[(T_n^n)^+] = { 2 C_n ∫_0^∞ t^n / (1 + t²/n)^{(n+1)/2} dt, if n even; C_n ∫_0^∞ t^n / (1 + t²/n)^{(n+1)/2} dt, if n odd } = ∞,

and

E[(T_n^n)^-] = { 0, if n even; C_n ∫_0^∞ t^n / (1 + t²/n)^{(n+1)/2} dt = ∞, if n odd }.

This is a good example for which the moments, even if some of them exist (and are finite), cannot be obtained through the moment generating function.
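
As a check (my addition), the closed-form moment formula above can be compared against a direct numerical integration of |t|^r f_{T_n}(t); the choice n = 5, r = 2 is arbitrary:

    # E[|T_n|^r] = n^(r/2) * Gamma((n-r)/2) * Gamma((r+1)/2) / (Gamma(n/2) * sqrt(pi)), for r < n.
    import numpy as np
    from scipy import integrate, stats
    from scipy.special import gamma

    n, r = 5, 2
    formula = n ** (r / 2) * gamma((n - r) / 2) * gamma((r + 1) / 2) / (gamma(n / 2) * np.sqrt(np.pi))

    pdf = stats.t(df=n).pdf
    numeric, _ = integrate.quad(lambda t: np.abs(t) ** r * pdf(t), -np.inf, np.inf)
    print(formula, numeric)     # for n=5, r=2 both equal Var[T_5] = 5/3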

Some seldom-known densities 21-30

Gamma distribution: f_{α,ν}(x) (on [0, ∞)) has mean ν/α and variance ν/α².

Snedecor's distribution or F-distribution: It is the distribution of

Z = [ (X_1² + ··· + X_m²)/m ] / [ (Y_1² + ··· + Y_n²)/n ],

where the X_i and Y_j are all independent standard normal. Its pdf with positive integer parameters m and n is:

( m^{m/2} Γ((m+n)/2) / (n^{m/2} Γ(m/2) Γ(n/2)) ) · z^{m/2-1} / (1 + mz/n)^{(m+n)/2}   on z ≥ 0.

Bilateral exponential distribution: It is the distribution of Z = X_1 - X_2, where X_1 and X_2 are independent and have common exponential density αe^{-αx} on x ≥ 0. Its pdf is (1/2) α e^{-α|x|} on x ∈ R. It has zero mean and variance 2/α².
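
A quick simulation (my addition) of the bilateral exponential construction: the difference of two independent Exponential(α) variables should follow a Laplace distribution with scale 1/α (the value α = 1.5 is arbitrary):

    # Z = X1 - X2 with X1, X2 ~ Exponential(alpha) i.i.d. has density (alpha/2) e^{-alpha|x|}.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    alpha, n = 1.5, 200_000
    z = rng.exponential(1/alpha, n) - rng.exponential(1/alpha, n)

    print(stats.kstest(z, stats.laplace(scale=1/alpha).cdf))     # tiny KS statistic expected
    print(z.var(), 2 / alpha**2)                                 # both ~0.889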

Some seldom-known densities 21-31

Beta distribution: Its pdf with parameters µ > 0 and ν > 0 is

β_{µ,ν}(x) = (Γ(µ+ν)/(Γ(µ)Γ(ν))) (1-x)^{µ-1} x^{ν-1}   on 0 < x < 1.

Its mean is ν/(µ+ν), and its variance is µν/[(µ+ν)²(µ+ν+1)].

Arc sine distribution: Its pdf is β_{1/2,1/2}(x) = 1/(π√(x(1-x))) on 0 < x < 1. Its cdf is given by (2/π) sin^{-1}(√x) for 0 < x < 1.

Generalized arc sine distribution: It is a beta distribution with µ + ν = 1.

Pareto distribution: It is the distribution of Z = X^{-1} - 1, where X is beta distributed with parameters µ > 0 and ν > 0. Its density is

(Γ(µ+ν)/(Γ(µ)Γ(ν))) · z^{µ-1} / (1 + z)^{µ+ν}   on 0 < z < ∞.

This is often used to model incoming traffic with a heavy tail, since the density decays as z^{-(ν+1)}.

Some seldom-known densities 21-32

Cauchy distribution: It is the distribution of Z = αX/Y, where X and Y are independent standard normal. Its pdf with parameter α > 0 is

γ_α(x) = (1/π) · α/(x² + α²)   on R.

Its cdf is 1/2 + (1/π) tan^{-1}(x/α).

It is also closed under convolution, i.e., γ_s * γ_u = γ_{s+u}. It is interesting that the Cauchy distribution is also closed under scaling, i.e., aX has density γ_{aα}(·) if X has density γ_α(·). Hence, we can easily obtain the density of a_1 X_1 + a_2 X_2 + ··· + a_n X_n as γ_{a_1 α_1 + a_2 α_2 + ··· + a_n α_n}(·), if the X_i are independent with densities γ_{α_i}(·).
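
A simulation check of the convolution and scaling closure (my addition; the coefficients and parameters below are arbitrary): for independent Cauchy variables, a_1 X_1 + a_2 X_2 should again be Cauchy with parameter a_1 α_1 + a_2 α_2:

    # a1*X1 + a2*X2 for independent Cauchy(alpha_i) is Cauchy(a1*alpha1 + a2*alpha2).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(9)
    n = 200_000
    a1, a2, alpha1, alpha2 = 2.0, 3.0, 0.5, 1.0

    x1 = alpha1 * rng.standard_cauchy(n)
    x2 = alpha2 * rng.standard_cauchy(n)
    z = a1 * x1 + a2 * x2

    # KS test against Cauchy with the predicted parameter; tiny statistic expected.
    print(stats.kstest(z, stats.cauchy(scale=a1 * alpha1 + a2 * alpha2).cdf))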

Some seldom-known densities 21-33

One-sided stable distribution of index 1/2: It is the distribution of Z = α²/X², where X has a standard normal distribution. Its pdf is

f_α(z) = (α/√(2πz³)) e^{-(1/2)α²/z}   on z ≥ 0.

It is also closed under convolution, namely, f_α * f_β = f_{α+β}. If Z_1, Z_2, ..., Z_n are i.i.d. with marginal density f_α(·), then (Z_1 + ··· + Z_n)/n² also has density f_α(·).

Weibull distribution: Its pdf and cdf are respectively given by

(α/β)(y/β)^{α-1} e^{-(y/β)^α}   and   1 - e^{-(y/β)^α},

with α > 0, β > 0 and support (0, ∞). By defining X = 1/Y, where Y has the above distribution, we derive the pdf and cdf of X as, respectively:

αβ(βx)^{-α-1} e^{-1/(βx)^α}   and   e^{-1/(βx)^α}.

This is useful for order statistics.

Some seldom-known densities 21-34

For example, let X_{(n)} denote the largest one among the i.i.d. X_1, X_2, ..., X_n. Then, for large n,

Pr[X_{(n)}/n ≤ x] ≈ e^{-(α/π)(1/x)},   if the X_j are Cauchy distributed with parameter α.

Or

Pr[X_{(n)}/n² ≤ x] ≈ e^{-(α√(2/π))(1/x^{1/2})},   if the X_j have a one-sided stable distribution of index 1/2 with parameter α.

Logistic distribution: Its cdf with parameters α > 0 and β ∈ R is

1/(1 + e^{-αx-β})   on R.

Distributions with nice moment generating functions 21-35

Gaussian distribution:

M(t) = (1/√(2πσ²)) ∫_{-∞}^∞ e^{tx} e^{-(x-m)²/(2σ²)} dx = e^{mt + σ²t²/2},

which exists for all t ∈ R.

Exponential distribution:

M(t) = ∫_0^∞ e^{tx} αe^{-αx} dx = α/(α - t) = Σ_{k=0}^∞ (k!/α^k) (t^k/k!),

which is defined for t < α. So the kth moment is k!/α^k.

Poisson distribution:

M(t) = Σ_{r=0}^∞ e^{rt} e^{-λ} λ^r/r! = e^{λ(e^t - 1)},

which exists for all t ∈ R.
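
A final numerical illustration (my addition; the parameter values are arbitrary): Monte Carlo estimates of E[e^{tX}] agree with the closed-form mgfs listed above:

    # Monte Carlo check of the Gaussian and Poisson mgfs.
    import numpy as np

    rng = np.random.default_rng(10)
    t = 0.4
    m, sigma, lam = 1.0, 2.0, 3.0

    x_gauss = rng.normal(m, sigma, 1_000_000)
    x_pois = rng.poisson(lam, 1_000_000)

    print(np.mean(np.exp(t * x_gauss)), np.exp(m * t + sigma**2 * t**2 / 2))   # both ~2.05
    print(np.mean(np.exp(t * x_pois)), np.exp(lam * (np.exp(t) - 1)))          # both ~4.37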