Notes on Mathematical Expectations and Classes of Distributions Introduction to Econometric Theory Econ. 770

Jonathan B. Hill
Dept. of Economics, University of North Carolina - Chapel Hill
October 4, 2

1 MATHEMATICAL EXPECTATIONS

Let $X$ be a random variable on the probability space $(\Omega, \mathcal{F}, P)$. The expectation of $X$ is a weighted (Lebesgue-Stieltjes) average over the support of $X$, where the weight is based on the distribution $F(x) := P(X \le x)$:
$$\mu := E[X] = \int_{-\infty}^{\infty} x \, dF(x).$$
The generality of the integral allows for discrete and continuous cases. By definition the induced measure $P_X$ that is based on $F$ defines the expectation: we may synonymously write $E[X] = \int x \, P_X(dx)$, which is identically
$$E[X] = \sup_{\{E_i\}_{i=1}^{n}} \sum_{i=1}^{n} \Big( \inf_{x \in E_i} x \Big) P_X(E_i),$$
where $\{E_i\}_{i=1}^{n}$ is a partition of the support of $X$.

1.1 Discrete and Continuous X

If $X$ is discrete we write
$$E[X] = \sum_{x} x \, P(X = x),$$
where the summation is over all possible values of $X$. If $X$ is continuous then $F(x) = \int_{-\infty}^{x} f(u)\,du$, hence $dF(x) = f(x)\,dx$, so we write
$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx.$$

Example (discrete): Let $\Omega = \{1, 2, \dots, 6\}$ be the space for a die roll, and $X(\omega) = |\omega - 4| \in \{0, 1, 2, 3\}$. The measure is $P(\{\omega\}) = 1/6$ for any $\omega \in \Omega$, hence
$$P(X = 0) = P(\omega : X(\omega) = 0) = P(\{4\}) = 1/6$$
$$P(X = 1) = P(\omega : X(\omega) = 1) = P(\{3, 5\}) = 1/6 + 1/6 = 1/3$$
$$P(X = 2) = P(\omega : X(\omega) = 2) = P(\{2, 6\}) = 1/6 + 1/6 = 1/3$$
$$P(X = 3) = P(\omega : X(\omega) = 3) = P(\{1\}) = 1/6.$$
Then
$$\mu = E[X] = \sum_x x \, P(X = x) = 0 \cdot \tfrac{1}{6} + 1 \cdot \tfrac{1}{3} + 2 \cdot \tfrac{1}{3} + 3 \cdot \tfrac{1}{6} = 1.5.$$

Example (continuous): Let $f(x) = 1/x$ for $1 \le x \le e$. Then $\int_1^e f(x)\,dx = \ln(e) - \ln(1) = 1$, and
$$\mu = E[X] = \int_1^e x \cdot \frac{1}{x}\,dx = e - 1.$$

1.2 Moments

The $k$-th raw moment is $E[X^k]$ for $k = 1, 2, \dots$ The $k$-th central moment is $E|X - \mu|^k$ for $k \ge 1$. Variance measures dispersion:
$$\sigma^2 := Var[X] = E(X - \mu)^2,$$
which reduces to
$$Var[X] = E(X - \mu)^2 = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - 2\mu^2 + \mu^2 = E[X^2] - \mu^2.$$
The standard deviation is $\sigma := \sqrt{Var[X]}$. A random variable with a larger variance, ceteris paribus, has a larger likelihood of being distant from the mean: literally, the mean squared distance to the mean $E(X - \mu)^2$ will be large.

Skewness is the third standardized moment
$$S(X) := E\left[\left(\frac{X - \mu}{\sigma}\right)^3\right].$$
A (continuous) distribution symmetric at $\mu$ satisfies $f(\mu + x) = f(\mu - x)$ for all $x$, hence $P(X \le \mu) = P(X \ge \mu) = 1/2$, and the mean is identically $\mu$. A symmetric distribution satisfies $E(X - \mu)^k = 0$ for all odd $k = 1, 3, \dots$, hence $S(X) = 0$. If $S(X) > 0$ the distribution is skewed right, or positively skewed.
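The discrete die-roll example above is easy to verify by enumerating the induced distribution of $X$. A minimal Python sketch (exact arithmetic via fractions):

```python
from fractions import Fraction

# Die-roll example: Omega = {1,...,6}, X(w) = |w - 4|, P({w}) = 1/6.
# Build the induced pmf of X and compute E[X] = sum_x x * P(X = x).
pmf = {}
for w in range(1, 7):
    x = abs(w - 4)
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, 6)

print({x: str(p) for x, p in sorted(pmf.items())})  # {0: '1/6', 1: '1/3', 2: '1/3', 3: '1/6'}
print(sum(x * p for x, p in pmf.items()))           # 3/2
```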

Proof of symmetry at the mean. Consider a continuous distribution. We have by the change of variable $u = x - \mu$
$$\mu = E[X] = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{-\infty}^{\infty} (\mu + u) f(\mu + u)\,du = \mu \int_{-\infty}^{\infty} f(\mu + u)\,du + \int_{-\infty}^{\infty} u f(\mu + u)\,du$$
$$= \mu + \int_{-\infty}^{0} u f(\mu + u)\,du + \int_{0}^{\infty} u f(\mu + u)\,du = \mu + \int_{0}^{\infty} u f(\mu + u)\,du - \int_{0}^{\infty} u f(\mu - u)\,du.$$
By symmetry $f(\mu + u) = f(\mu - u)$ for all $u$, hence $\int_{0}^{\infty} u f(\mu + u)\,du - \int_{0}^{\infty} u f(\mu - u)\,du = 0$, therefore $E[X] = \mu$. QED.

Proof of zero odd moments under symmetry. Consider a continuous distribution. Since $E[X] = \mu$, for odd $k$
$$E(X - \mu)^k = \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx = \int_{-\infty}^{\infty} u^k f(\mu + u)\,du = \int_{0}^{\infty} u^k f(\mu + u)\,du - \int_{0}^{\infty} u^k f(\mu - u)\,du = 0,$$
where the last two equalities use $(-u)^k = -u^k$ for odd $k$ and symmetry $f(\mu + u) = f(\mu - u)$. QED.

Example (Pareto with index 4.5): Let
$$P(X \ge x) = P(X \le -x) = \frac{1}{2}(1 + x)^{-4.5} \quad \text{for all } x \ge 0.$$
Then $X$ is symmetric and continuous with density $f$:
$$x \ge 0: \quad f(x) = \frac{d}{dx}\left[1 - \frac{1}{2}(1 + x)^{-4.5}\right] = 2.25 (1 + x)^{-5.5}$$
$$x \le 0: \quad f(x) = \frac{d}{dx}\left[\frac{1}{2}(1 - x)^{-4.5}\right] = 2.25 (1 - x)^{-5.5}.$$

The function $f(x)$ integrates to one and is therefore a proper density:
$$\int_{-\infty}^{\infty} f(x)\,dx = 2\int_{0}^{\infty} 2.25 (1 + x)^{-5.5}\,dx = 4.5 \int_{0}^{\infty} (1 + x)^{-5.5}\,dx = -(1 + x)^{-4.5}\Big|_{0}^{\infty} = 1.$$
The mean is zero because $X$ is symmetric at zero:
$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{-\infty}^{0} 2.25\, x (1 - x)^{-5.5}\,dx + \int_{0}^{\infty} 2.25\, x (1 + x)^{-5.5}\,dx = -\int_{0}^{\infty} 2.25\, u (1 + u)^{-5.5}\,du + \int_{0}^{\infty} 2.25\, x (1 + x)^{-5.5}\,dx = 0.$$
The variance $Var[X] = E[X^2]$ is computed by two applications of integration by parts and one application of change of variables:
$$E[X^2] = 2 \int_{0}^{\infty} 2.25\, x^2 (1 + x)^{-5.5}\,dx = 2\left( -\frac{x^2}{2}(1 + x)^{-4.5}\Big|_{0}^{\infty} + \int_{0}^{\infty} x (1 + x)^{-4.5}\,dx \right)$$
$$= 2\left( -\frac{x}{3.5}(1 + x)^{-3.5}\Big|_{0}^{\infty} + \frac{1}{3.5}\int_{0}^{\infty} (1 + x)^{-3.5}\,dx \right) = \frac{2}{3.5}\left( -\frac{(1 + x)^{-2.5}}{2.5}\Big|_{0}^{\infty} \right) = \frac{2}{3.5 \cdot 2.5} = .22857143.$$
We can check the result by simulation: a sample draw of 5,000 independent Pareto random variables renders a sample variance of .2288327. A plot of the observations follows in Figure 1; a sketch of such a simulation is given below.
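A minimal simulation sketch (sample size, seed, and the inverse-CDF sampler are illustrative choices, not taken from the notes):

```python
import numpy as np

# Check Var[X] = 2/(3.5*2.5) ≈ 0.2286 for the symmetric Pareto with index 4.5,
# i.e. P(X >= x) = P(X <= -x) = 0.5*(1+x)**(-4.5) for x >= 0, via inverse-CDF sampling.
rng = np.random.default_rng(0)
n = 50_000
u = rng.uniform(size=n)
x = np.where(u >= 0.5,
             (2 * (1 - u)) ** (-1 / 4.5) - 1,   # right half: F(x) = 1 - 0.5(1+x)^-4.5
             1 - (2 * u) ** (-1 / 4.5))         # left half:  F(x) = 0.5(1-x)^-4.5

print(x.var(ddof=1))       # close to the exact value below
print(2 / (3.5 * 2.5))     # 0.22857142857...
```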

[FIGURE 1: Pareto with index 4.5]

Example (Pareto with index 2.5): If the distribution is now
$$P(X \ge x) = P(X \le -x) = \frac{1}{2}(1 + x)^{-2.5} \quad \text{for all } x \ge 0,$$
then the density is
$$f(x) = f(-x) = 1.25 (1 + x)^{-3.5} \quad \text{for } x \ge 0.$$
By symmetry at zero $E[X] = 0$, and the variance is
$$Var[X] = E[X^2] = 2 \int_{0}^{\infty} x^2 \cdot 1.25 (1 + x)^{-3.5}\,dx = 2\left( -\frac{x^2}{2}(1 + x)^{-2.5}\Big|_{0}^{\infty} + \int_{0}^{\infty} x (1 + x)^{-2.5}\,dx \right)$$
$$= 2\left( -\frac{x}{1.5}(1 + x)^{-1.5}\Big|_{0}^{\infty} + \frac{1}{1.5}\int_{0}^{\infty} (1 + x)^{-1.5}\,dx \right) = \frac{2}{1.5}\left( -\frac{(1 + x)^{-.5}}{.5}\Big|_{0}^{\infty} \right) = \frac{2}{1.5 \cdot .5} = 2.667.$$
Hence this Pareto has a larger variance than in the previous example. Figure 2 shows the densities of the two Pareto distributions: notice the present Pareto with index 2.5 has much heavier tails, corresponding to a larger variance.

[FIGURE 2: Pareto with index 2.5 or 4.5]

Kurtosis is the fourth standardized moment:
$$K(X) := E\left[\left(\frac{X - \mu}{\sigma}\right)^4\right].$$
Kurtosis measures heaviness of tails, although so does variance. The advantage of kurtosis is standardization: we can compare kurtosis across distributions.

Example (Normal vs. Pareto): The normal distribution $N(\mu, \sigma^2)$ has a density
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-.5(x - \mu)^2/\sigma^2}.$$
You can prove $E[X] = \mu$ and $Var[X] = \sigma^2$ by direct computation using integration with polar coordinates. This distribution is symmetric at $\mu$, and has $K(X) = 3$. The Pareto distribution with index 2.5 has $E[X^4] = \infty$ so the kurtosis is not defined. Figure 3 plots the standard normal and the Pareto with index 2.5.

[FIGURE 3: Pareto and Normal]
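A quick Monte Carlo illustration of the kurtosis comparison above (sample size and seed are arbitrary): the sample kurtosis of normal draws settles near 3, while for the Pareto with index 2.5 the population fourth moment is infinite, so the sample kurtosis is large and unstable.

```python
import numpy as np

# Sample kurtosis: E[(X - mu)^4] / sigma^4 estimated from draws.
rng = np.random.default_rng(5)
n = 100_000
kurt = lambda v: np.mean((v - v.mean()) ** 4) / np.var(v) ** 2

z = rng.normal(size=n)                              # standard normal
u = rng.uniform(size=n)                             # symmetric Pareto, index 2.5, via inverse CDF
x = np.where(u >= 0.5, (2 * (1 - u)) ** (-1 / 2.5) - 1, 1 - (2 * u) ** (-1 / 2.5))

print(kurt(z))   # ≈ 3
print(kurt(x))   # large, and varies wildly across seeds (E[X^4] = infinity)
```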

Example (Normal vs. Asymmetric Pareto): Finally, consider the Pareto distribution
$$P(X \le x) = \frac{1}{2}(1 - x)^{-4.5} \text{ for } x \le 0 \quad \text{and} \quad P(X \ge x) = \frac{1}{2}(1 + x)^{-2.5} \text{ for } x \ge 0.$$
Then the right tail is much heavier than the left tail. The density is
$$f(x) = 2.25 (1 - x)^{-5.5} \text{ for } x \le 0 \quad \text{and} \quad f(x) = 1.25 (1 + x)^{-3.5} \text{ for } x \ge 0.$$

[FIGURE 4: Asymmetric Pareto and Normal]

1.3 Inequalities

There are many useful inequalities associated with expectations. Since an expectation is itself just a Lebesgue-Stieltjes integral, these inequalities are ultimately grounded in very primitive mathematical properties. In all cases the following inequalities are true provided the required expectations exist.

Chebyshev's Inequality: $P(|X| \ge \varepsilon) \le E[X^2]/\varepsilon^2$ for any $\varepsilon > 0$.

Markov's Inequality: $P(|X| \ge \varepsilon) \le E|X|/\varepsilon$ for any $\varepsilon > 0$.

Proof:
$$E|X| = \int_{-\infty}^{\infty} |x|\,dF(x) = \int_{-\infty}^{-\varepsilon} |x|\,dF(x) + \int_{-\varepsilon}^{\varepsilon} |x|\,dF(x) + \int_{\varepsilon}^{\infty} |x|\,dF(x) \ge \int_{-\infty}^{-\varepsilon} |x|\,dF(x) + \int_{\varepsilon}^{\infty} |x|\,dF(x)$$

$$\ge \varepsilon\left(\int_{-\infty}^{-\varepsilon} dF(x) + \int_{\varepsilon}^{\infty} dF(x)\right) = \varepsilon\left(P(X \le -\varepsilon) + P(X \ge \varepsilon)\right) = \varepsilon P(|X| \ge \varepsilon). \text{ QED.}$$

Hölder's Inequality: For any $p, q > 1$ with $1/p + 1/q = 1$: $E|XY| \le (E|X|^p)^{1/p} (E|Y|^q)^{1/q}$.

Proof: By concavity the natural log satisfies $\ln(\lambda a + (1 - \lambda) b) \ge \lambda \ln(a) + (1 - \lambda)\ln(b)$ for any $a, b > 0$ and $\lambda \in (0, 1)$. Therefore $a^{\lambda} b^{1 - \lambda} \le \lambda a + (1 - \lambda) b$. Choose
$$a = \frac{|X|^p}{E|X|^p}, \quad b = \frac{|Y|^q}{E|Y|^q}, \quad \lambda = \frac{1}{p} \ \text{ and therefore } \ 1 - \lambda = \frac{1}{q}$$
to get
$$\frac{|X| |Y|}{(E|X|^p)^{1/p} (E|Y|^q)^{1/q}} \le \frac{1}{p}\frac{|X|^p}{E|X|^p} + \frac{1}{q}\frac{|Y|^q}{E|Y|^q}.$$
Now take expectations: the left side is $E|XY| / \left[(E|X|^p)^{1/p} (E|Y|^q)^{1/q}\right]$ and the right side is
$$\frac{1}{p}\frac{E|X|^p}{E|X|^p} + \frac{1}{q}\frac{E|Y|^q}{E|Y|^q} = \frac{1}{p} + \frac{1}{q} = 1,$$
hence we have shown
$$\frac{E|XY|}{(E|X|^p)^{1/p} (E|Y|^q)^{1/q}} \le 1, \quad \text{or} \quad E|XY| \le (E|X|^p)^{1/p} (E|Y|^q)^{1/q}. \text{ QED.}$$

Cauchy-Schwarz Inequality: Simply Hölder's with $p = q = 2$.

Lyapunov's Inequality: $(E|X|^r)^{1/r} \le (E|X|^s)^{1/s}$ for all $0 < r \le s$.

Proof: Apply Hölder's inequality $E|\tilde{X}\tilde{Y}| \le (E|\tilde{X}|^p)^{1/p} (E|\tilde{Y}|^q)^{1/q}$ for any $p, q > 1$, $1/p + 1/q = 1$. Choose $\tilde{X} = |X|^r$, $\tilde{Y} = 1$ and $p = s/r$ to deduce
$$E|X|^r \le (E|X|^{s})^{r/s},$$
hence $(E|X|^r)^{1/r} \le (E|X|^s)^{1/s}$. QED.
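The Markov bound above is easy to see numerically. A minimal sketch (the exponential draws and sample size are illustrative choices):

```python
import numpy as np

# Markov's inequality: P(|X| >= eps) <= E|X|/eps. Compare empirical tail
# frequencies with the empirical bound for a few thresholds.
rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=100_000)   # nonnegative, E|X| = 1

for eps in (1.0, 2.0, 4.0):
    lhs = np.mean(np.abs(x) >= eps)            # empirical P(|X| >= eps)
    rhs = np.mean(np.abs(x)) / eps             # Markov bound E|X|/eps
    print(eps, lhs, rhs, lhs <= rhs)
```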

1.4 Moment Generating Function

As long as a random variable has all moments, e.g. $|E[X^k]| < \infty$ for all $k = 1, 2, 3, \dots$, then it has a moment generating function:
$$M(t) := E\left[e^{tX}\right].$$
This function generates moments in the sense
$$\frac{d^k}{dt^k} M(t)\Big|_{t=0} = E[X^k].$$
This is easily seen by expanding $e^{tX}$ around $t = 0$:
$$e^{tX} = 1 + tX + \frac{(tX)^2}{2!} + \dots = \sum_{k=0}^{\infty} \frac{(tX)^k}{k!},$$
hence
$$M(t) = E\left[e^{tX}\right] = \sum_{k=0}^{\infty} \frac{t^k E[X^k]}{k!}.$$
Notice this implicitly requires $E[X^k]$ to exist for all $k$. Random variables with all moments finite include those with bounded support and those with exponential tails (e.g. exponential distribution, normal distribution). Now differentiate:
$$\frac{d}{dt} M(t)\Big|_{t=0} = \sum_{k=1}^{\infty} \frac{k t^{k-1} E[X^k]}{k!}\Big|_{t=0} = E[X].$$
Repeat to obtain $E[X^k]$. Examples are given in the following section.

Why does a bounded random variable have all moments? Let $X$ satisfy $P(|X| \le b) = 1$, hence its support is no larger than $[-b, b]$ for finite $b > 0$. Then
$$E|X|^k = \int_{-\infty}^{\infty} |x|^k\,dF(x) = \int_{-b}^{b} |x|^k\,dF(x) \le b^k \int_{-b}^{b} dF(x) = b^k P(|X| \le b) = b^k < \infty.$$
Boundedness inherently implies all moments exist.

Moment generating functions have a unique corresponding distribution function.

Claim: Each function $M(t)$ has a unique corresponding distribution $F(x) = P(X \le x)$.

Matching moment generating functions across random variables $X$ and $Y$ suffices to match their distributions. However, they must have moment generating functions: merely matching their moments itself does not suffice.
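As an illustration of the moment-generation property above, one can differentiate a known MGF symbolically. A minimal sketch using sympy and the normal MGF $e^{\mu t + \sigma^2 t^2/2}$ derived later in these notes:

```python
import sympy as sp

# M^(k)(0) = E[X^k]: differentiate the normal MGF M(t) = exp(mu*t + sig2*t**2/2).
t, mu, sig2 = sp.symbols('t mu sigma2', real=True)
M = sp.exp(mu * t + sig2 * t**2 / 2)

m1 = sp.diff(M, t, 1).subs(t, 0)   # E[X]   = mu
m2 = sp.diff(M, t, 2).subs(t, 0)   # E[X^2] = mu**2 + sigma2
print(sp.simplify(m1), sp.simplify(m2), sp.simplify(m2 - m1**2))   # mu, mu**2 + sigma2, sigma2
```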

Claim: If $M_X(t) = M_Y(t)$ for every $t$ in some open neighborhood of $0$ then $F_X(x) = F_Y(x)$ for all $x$.

Matching moments of arbitrary bounded transforms also suffices.

Claim: If $E[g(X)] = E[g(Y)]$ for every bounded continuous function $g$ on $\mathbb{R}$ then $F_X(x) = F_Y(x)$ for all $x$.

These properties are used predominantly in asymptotic theory. For example, if we want to prove that sequences of estimators $\hat{\theta}_n$ and $\tilde{\theta}_n$ of some unknown parameter $\theta_0$ have the same asymptotic distribution, then we might attempt to show $|E[g(\hat{\theta}_n)] - E[g(\tilde{\theta}_n)]| \to 0$ as $n \to \infty$ for all bounded $g$.

1.5 Characteristic Function

The moment generating function exists when $X$ has all moments finite. This holds sufficiently if $X$ has bounded support, or has a distribution with exponential tail decay (e.g. exponential, logistic, normal, chi-squared). We can compare distributions by comparing moment generating functions, but only if the moment generating functions exist. A generalization of the moment generating function is the characteristic function:
$$\phi(t) = E\left[e^{itX}\right], \quad \text{where } i = \sqrt{-1}.$$
All random variables have a characteristic function because $e^{itX}$ is bounded.

Claim: $|\phi(t)| \le 1$.

Proof. By DeMoivre's formula and a standard trigonometric identity, for any $t, x \in \mathbb{R}$
$$\left|e^{itx}\right| = |\cos(tx) + i\sin(tx)| = \left(\cos(tx)^2 + \sin(tx)^2\right)^{1/2} = 1.$$
Therefore by properties of the Lebesgue-Stieltjes integral
$$|\phi(t)| = \left|E\left[e^{itX}\right]\right| \le E\left|e^{itX}\right| = 1. \text{ QED.}$$

Like the moment generating function, characteristic functions are unique for a distribution.

Claim: Each function $\phi(t)$ has a unique corresponding distribution $F(x) = P(X \le x)$.

If a moment $E|X|^k < \infty$ for integer $k \in \mathbb{N}$ then $\phi(t)$ reveals it as follows.

Claim: If $E|X|^k < \infty$ for $k \in \mathbb{N}$ then
$$\frac{d^k}{dt^k}\phi(t)\Big|_{t=0} = i^k E[X^k], \quad \text{hence} \quad E[X^k] = i^{-k} \frac{d^k}{dt^k}\phi(t)\Big|_{t=0}.$$

Proof. See Davidson (1994: Theorem .4 and Corollary .5). QED.

It should be noted that the characteristic function need not be real-valued.

Claim: $\phi(t)$ is real-valued if and only if $X$ is symmetrically distributed.

Proof. The following argument exposes more properties of characteristic functions; consult Davidson (1994) for more information. We know $X$ is symmetric if and only if $P(X \le -x) = P(X \ge x)$, hence if and only if $X$ and $-X$ have the same distribution, hence if and only if $X$ and $-X$ have the same characteristic function. But $-X$ is a linear function of $X$, and linear transforms satisfy
$$\phi_{a + bX}(t) = e^{iat}\phi_X(bt),$$
hence $X$ is symmetric if and only if
$$\phi_X(t) = \phi_{-X}(t) = \phi_X(-t) = \overline{\phi_X(t)},$$
where $\overline{\phi_X(t)}$ is the complex conjugate of $\phi_X(t)$. But $\phi_X(t) = \overline{\phi_X(t)}$ if and only if $\phi_X(t)$ is real, and finally $\phi_X(t)$ is real if and only if all existing odd moments are zero (i.e. $E[X^k] = 0$ for $k = 1, 3, 5, \dots$ if those moments exist), hence if and only if $X$ is symmetric.

2 DISTRIBUTION CLASSES

2.1 Discrete Distributions

2.1.1 Bernoulli

The random variable $X$ is distributed Bernoulli, binary or $\{0, 1\}$-distributed, if
$$P(X = 1) = p \ \text{ and } \ P(X = 0) = 1 - p \quad \text{for } p \in [0, 1].$$
This is denoted $X \sim Bernoulli(p)$. The values $\{0, 1\}$ are arbitrary and can be any two values, e.g. $\{-1, 1\}$. Trivially
$$E[X] = \sum_x x\,P(X = x) = 0 \cdot (1 - p) + 1 \cdot p = p$$
and
$$Var[X] = E[X^2] - (E[X])^2 = \sum_x x^2 P(X = x) - p^2 = p - p^2 = p(1 - p).$$
Bernoullis are bounded and therefore have all moments. The moment generating function is
$$E\left[e^{tX}\right] = \sum_x e^{tx} P(X = x) = e^{0} P(X = 0) + e^{t} P(X = 1) = (1 - p) + p e^{t}.$$

Hence
$$E[X] = \frac{d}{dt} E\left[e^{tX}\right]\Big|_{t=0} = p e^{t}\Big|_{t=0} = p$$
$$E[X^2] = \frac{d^2}{dt^2} E\left[e^{tX}\right]\Big|_{t=0} = p e^{t}\Big|_{t=0} = p$$
and so on. This is trivial by direct computation: $E[X^k] = \sum_x x^k P(X = x) = 1^k \cdot p = p$. The characteristic function is
$$E\left[e^{itX}\right] = e^{i \cdot 0 \cdot t} P(X = 0) + e^{it} P(X = 1) = (1 - p) + p e^{it}.$$

Example: Bernoullis include: flipping a coin, $\Omega = \{h, t\}$; married or not; sells a stock share or not; accepts a job offer or not.

2.1.2 Binomial

If we count the number of positive outcomes in $n$ independent Bernoulli trials, we have a Binomial, denoted $X \sim Bin(n, p)$. For example, flip $n$ coins and count the heads. Ask 5 people if they are married and count the yeses.

The distribution can be directly computed by using combinatorics. The probability $X = k$ is the probability of one particular outcome with $k$ positives, $p^k (1 - p)^{n - k}$, times the number of ways we can get $k$ positives, $n!/[(n - k)!\,k!]$:
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} = \frac{n!}{(n - k)!\,k!}\, p^k (1 - p)^{n - k}.$$
Let us prove this sums to one:
$$\sum_{k=0}^{n} P(X = k) = \sum_{k=0}^{n} \binom{n}{k} p^k (1 - p)^{n - k} = (p + (1 - p))^n = 1,$$
because $\sum_{k=0}^{n} \binom{n}{k} a^k b^{n - k}$ is identically a binomial expansion: by the binomial theorem $(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n - k}$.

It is much easier to define a Binomial as a sum of $n$ independent Bernoullis $X_i \sim Bernoulli(p)$:
$$X = \sum_{i=1}^{n} X_i.$$
Then $X = k$ if $k$ of the Bernoullis are positives (i.e. $X_i = 1$). Again, like flipping $n$ coins and counting the number of heads, or asking 5 people if they are married. Then
$$E[X] = \sum_{i=1}^{n} E[X_i] = np$$

and by independence (in general $Var[\sum_i X_i] = \sum_i Var[X_i]$ is not true, but under independence it is true; we will discuss joint distributions and independence later)
$$Var[X] = \sum_{i=1}^{n} Var[X_i] = np(1 - p).$$

Example: Exactly 8% of items produced at a factory are defective. You independently sample 30. What is the probability that 1 or 2 or 3 are defective? What is the mean number of defectives? Let $X_i = 1$ if the $i$-th item is defective, and let $X = \sum_{i=1}^{30} X_i$ be the number of defectives. There is a .08 chance $X_i = 1$, hence $E[X] = 30 \times .08 = 2.4$. By repeatedly sampling 30 items, the mean number of defectives over infinitely many such samples will be exactly $30 \times .08 = 2.4$. Finally,
$$P(1 \le X \le 3) = \sum_{k=1}^{3} \binom{30}{k} (.08)^k (.92)^{30 - k} \approx .2138 + .2697 + .2188 = .7022.$$
The distribution is plotted below.

[FIGURE 5: Binomial B(30, .08)]

2.1.3 Poisson

The Poisson distribution describes an integer count with infinite support:
$$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!} \quad \text{for } k = 0, 1, 2, \dots \ \text{ where } \lambda > 0,$$
which we write $X \sim Poisson(\lambda)$. Figure 6 shows two Poisson distributions.

[FIGURE 6: Poisson]

This distribution is used to count occurrences in a time frame in which an upper bound cannot be predetermined. Examples are the number of asset trades in a day, the number of volcanic eruptions in a decade, and so on. We can show $E[X] = Var[X] = \lambda$. Consider
$$E[X] = \sum_{k=0}^{\infty} k \frac{e^{-\lambda}\lambda^k}{k!} = \sum_{k=1}^{\infty} \frac{e^{-\lambda}\lambda^k}{(k - 1)!} = \lambda \sum_{k=1}^{\infty} \frac{e^{-\lambda}\lambda^{k-1}}{(k - 1)!} = \lambda,$$
because $\sum_{k=0}^{\infty} e^{-\lambda}\lambda^k / k! = \sum_{k=0}^{\infty} P(X = k) = 1$.

The distribution also naturally arises in the sense of being roughly identical to a Binomial $Bin(n, p)$ for a large number of trials $n$ and a small probability of a positive outcome $p$:
$$Bin(n, p) \approx Poisson(np).$$
Since the Binomial is a pain to work with when $n$ is large (a large $n$ implies we must manage $n + 1$ probabilities in order to compute moments), the Poisson approximation is a nice simplification. Further, it can be proven that
$$Bin(n, p) \to Poisson(\lambda) \quad \text{as } n \to \infty \text{ and } p \to 0 \text{ with } np \to \lambda.$$
This says that as the number of trials grows to infinity and the probability of a positive goes to zero, the result is a Poisson.

Example: Let there be $n = 500$ trials concerning whether a home buyer defaults on a mortgage, with probability $p = .01$, so $np = 5$. Let $X$ be the number of people who defaulted. Then $P(X = 1)$ by the Poisson approximation is roughly $e^{-5}\, 5 / 1! = .0337$, and exactly
$$(.01)(.99)^{499} \frac{500!}{(500 - 1)!\,1!} = .0332.$$
Similarly, $P(X = 2)$ is roughly $e^{-5}\, 5^2 / 2! = .0842$, and exactly
$$(.01)^2 (.99)^{498} \frac{500!}{(500 - 2)!\,2!} = .0836.$$
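The approximation in the mortgage example is easy to verify with scipy (package usage is illustrative; the numbers come from the example above):

```python
from scipy.stats import binom, poisson

# n = 500 trials, default probability p = .01, so lambda = n*p = 5:
# the Poisson(5) pmf approximates the Binomial(500, .01) pmf.
n, p = 500, 0.01
lam = n * p
for k in (1, 2):
    print(k, binom.pmf(k, n, p), poisson.pmf(k, lam))
# k=1: exact ≈ .0332, approximate ≈ .0337;  k=2: exact ≈ .0836, approximate ≈ .0842
```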

Example: Assume the mean number of visitors to the Federal Reserve Bank in St. Louis in one hour is 6. Assume the number of visitors is Poisson distributed. What is the probability of 20 visitors in 2.5 hours? Of no more than 7 visitors in 2 hours, or between 10 and 20 visitors in 2 hours? In 2.5 hours on average there are 15 visitors, hence $X \sim Poisson(15)$. Then
$$P(X = 20) = \frac{e^{-15}\,15^{20}}{20!} = .0418.$$
In 2 hours on average there are 12 visitors, hence
$$P(X \le 7) = \sum_{k=0}^{7} \frac{e^{-12}\,12^k}{k!} = .0895 \quad \text{and} \quad P(10 \le X \le 20) = \sum_{k=10}^{20} \frac{e^{-12}\,12^k}{k!} = .746.$$

2.2 Continuous Distributions

2.2.1 Uniform

The uniform density on the interval $[a, b]$ is a constant, $f(x) = 1/(b - a)$, and $f(x) = 0$ for $x < a$ or $x > b$. We write $X \sim U[a, b]$. The distribution function is
$$F(x) = \int_{a}^{x} \frac{1}{b - a}\,du = \frac{x - a}{b - a} \quad \text{for } x \in [a, b].$$
Moments are easily computed:
$$E[X] = \int_{a}^{b} \frac{x}{b - a}\,dx = \frac{x^2}{2(b - a)}\Big|_{a}^{b} = \frac{b^2 - a^2}{2(b - a)} = \frac{a + b}{2}$$
$$E[X^2] = \int_{a}^{b} \frac{x^2}{b - a}\,dx = \frac{x^3}{3(b - a)}\Big|_{a}^{b} = \frac{b^3 - a^3}{3(b - a)}$$
and so on.

Example: Let $X \sim U[0, 1]$. Then $F(x) = x$: see the figure below. The moments are
$$E[X] = \int_{0}^{1} x\,dx = \frac{x^2}{2}\Big|_{0}^{1} = \frac{1}{2} \quad \text{and} \quad E[X^2] = \int_{0}^{1} x^2\,dx = \frac{x^3}{3}\Big|_{0}^{1} = \frac{1}{3},$$
hence $Var[X] = 1/3 - 1/4 = 1/12$. The moment generating function is
$$M(t) = E\left[e^{tX}\right] = \int_{0}^{1} e^{tx}\,dx = \frac{e^{tx}}{t}\Big|_{0}^{1} = \frac{e^{t} - 1}{t}.$$
Since $(e^t - 1)/t$ cannot be directly differentiated at zero, expand $e^t$ around $t = 0$: $e^t = \sum_{k=0}^{\infty} t^k/k! = 1 + \sum_{k=1}^{\infty} t^k/k!$. Therefore
$$M(t) = \frac{1}{t}\sum_{k=1}^{\infty} \frac{t^k}{k!} = \sum_{k=1}^{\infty} \frac{t^{k-1}}{k!}.$$

Hence
$$\frac{d}{dt}M(t)\Big|_{t=0} = \sum_{k=2}^{\infty} \frac{(k - 1)\, t^{k-2}}{k!}\Big|_{t=0} = \frac{1}{2!} = \frac{1}{2}$$
$$\frac{d^2}{dt^2}M(t)\Big|_{t=0} = \sum_{k=3}^{\infty} \frac{(k - 1)(k - 2)\, t^{k-3}}{k!}\Big|_{t=0} = \frac{2}{3!} = \frac{2}{6} = \frac{1}{3}$$
and so on. The characteristic function is
$$\phi(t) = E\left[e^{itX}\right] = \frac{e^{it} - 1}{it}.$$

[FIGURE 7: Uniform distribution on [0, 1]]

2.2.2 Exponential

The exponential class is represented with density
$$f(x) = \frac{1}{\lambda}\exp\{-x/\lambda\} \quad \text{for } x \ge 0 \text{ and } \lambda > 0.$$
We write $X \sim \exp(\lambda)$. The parameter $\lambda$ determines the distribution shape and therefore mean and variance. See the figure below.

[FIGURE 8: Exponential distribution for several values of $\lambda$ (e.g. $\lambda = .5$, 2, 6)]

Notice
$$\int_{0}^{\infty} f(x)\,dx = \int_{0}^{\infty} \frac{1}{\lambda} e^{-x/\lambda}\,dx = -e^{-x/\lambda}\Big|_{0}^{\infty} = 1,$$
hence for any $\lambda > 0$ the function $f(x)$ is a probability density. The exponential distribution exhibits exponential tail decay:
$$P(X \ge \varepsilon) = \int_{\varepsilon}^{\infty} \frac{1}{\lambda} e^{-x/\lambda}\,dx = -e^{-x/\lambda}\Big|_{\varepsilon}^{\infty} = e^{-\varepsilon/\lambda}.$$
Hence, a larger shape parameter aligns with slower decay, and a smaller parameter aligns with faster decay (compare $\lambda = .5$ to the larger values of $\lambda$ in Figure 8). A distribution with exponential tail decay has all moments finite: $E[X^k] < \infty$ for all $k$ (it is a very thin tailed distribution, so all moments exist). This is easily proven by using bounds for the Lebesgue-Stieltjes integral. First, note that for any $k$ and any $0 < c < 1/\lambda$ we have $x^k \le e^{cx}$ for all sufficiently large $x$, because by convexity $k\ln(x) \le cx$ for all large $x$. Therefore for any $k$, $E[X^k] = \int_{0}^{\infty} x^k \frac{1}{\lambda} e^{-x/\lambda}\,dx < \infty$, since the integrand is eventually dominated by $\frac{1}{\lambda}e^{-(1/\lambda - c)x}$.

The mean is, by integration-by-parts,
$$E[X] = \int_{0}^{\infty} x \frac{1}{\lambda} e^{-x/\lambda}\,dx = -x e^{-x/\lambda}\Big|_{0}^{\infty} + \int_{0}^{\infty} e^{-x/\lambda}\,dx = 0 + \left(-\lambda e^{-x/\lambda}\Big|_{0}^{\infty}\right) = \lambda.$$
Similarly $Var[X] = \lambda^2$ is easy to show by repeated application of integration by parts. By construction $X$ is skewed right: $S(X) > 0$.

2.2.3 Normal

The normal distribution falls in the exponential class. The density is
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-.5(x - \mu)^2/\sigma^2} \quad \text{for } x \in \mathbb{R}.$$
The quadratic ensures a symmetric distribution at $\mu$. We write $X \sim N(\mu, \sigma^2)$. Using integration by polar coordinates, and by parts, it can be proven that
$$\int_{-\infty}^{\infty} f(x)\,dx = 1, \quad E[X] = \mu \quad \text{and} \quad Var[X] = \sigma^2.$$
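These three facts are also easy to confirm numerically. A minimal sketch (the particular values of $\mu$ and $\sigma^2$ are arbitrary):

```python
import numpy as np
from scipy import integrate

# Check that the N(mu, sig2) density integrates to 1 with mean mu and variance sig2.
mu, sig2 = 1.3, 2.0
f = lambda x: np.exp(-0.5 * (x - mu) ** 2 / sig2) / np.sqrt(2 * np.pi * sig2)

total, _ = integrate.quad(f, -np.inf, np.inf)
mean, _ = integrate.quad(lambda x: x * f(x), -np.inf, np.inf)
var, _ = integrate.quad(lambda x: (x - mean) ** 2 * f(x), -np.inf, np.inf)
print(total, mean, var)   # ≈ 1.0, 1.3, 2.0
```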

The standardized random variable $Z := (X - \mu)/\sigma \sim N(0, 1)$ has a standard normal distribution. The kurtosis is identically 3 since
$$K(X) = E\left[\left(\frac{X - \mu}{\sigma}\right)^4\right] = E\left[Z^4\right] = 3$$
by direct computation from the standard normal distribution. Normals have the following property: for any $a, b \in \mathbb{R}$
$$E\left[e^{a + bX}\right] = e^{a + bE[X] + b^2 Var[X]/2} = e^{a + b\mu + b^2\sigma^2/2}.$$
Therefore the moment generating function and characteristic function are easily deduced:
$$M(t) := E\left[e^{tX}\right] = e^{\mu t + \sigma^2 t^2/2}$$
$$\phi(t) := E\left[e^{itX}\right] = e^{i\mu t - \sigma^2 t^2/2}.$$
By symmetry all odd centered moments are zero, while all even centered moments are functions of $\sigma$:
$$E(X - \mu)^k = 0 \quad \text{for } k = 1, 3, 5, \dots$$
$$E(X - \mu)^k = \sigma^k (k - 1)! \quad \text{for } k = 2, 4, 6, \dots$$
where $(k - 1)!$ here denotes the product of all odd numbers $1, 3, \dots$ up to this odd $k - 1$ (e.g. $(2 - 1)! = 1$, $(4 - 1)! = 1 \cdot 3$, $(6 - 1)! = 1 \cdot 3 \cdot 5$). Therefore
$$E(X - \mu)^4 = \sigma^4 (4 - 1)! = \sigma^4 \cdot 1 \cdot 3 = 3\sigma^4,$$
which gives us kurtosis: $E[((X - \mu)/\sigma)^4] = 3$.

The moment generating function also gives us all moments and reveals all moments are linear functions of powers of $\mu$ and $\sigma^2$:
$$\frac{d}{dt}M(t)\Big|_{t=0} = e^{\mu t + \sigma^2 t^2/2}\left(\mu + \sigma^2 t\right)\Big|_{t=0} = \mu$$
$$\frac{d^2}{dt^2}M(t)\Big|_{t=0} = \left[e^{\mu t + \sigma^2 t^2/2}\left(\mu + \sigma^2 t\right)^2 + \sigma^2 e^{\mu t + \sigma^2 t^2/2}\right]\Big|_{t=0} = \mu^2 + \sigma^2.$$
The parameter $\mu$ determines location (central tendency), while $\sigma^2$ determines shape (in this case, dispersion). See Figure 9. Since the only parameters that determine a normal are $\mu$ and $\sigma^2$, for a given $\mu$ the probability of being distant from $\mu$ is

larger for larger $\sigma^2$.

Example: If $X \sim N(0, 1)$ then $P(X \ge 1.5) = .0668$, while for a zero-mean normal with a larger variance the same probability is larger: $P(X \ge 1.5) = .2677$ for the case plotted. See Figure 10.

[FIGURE 9: Normal densities for combinations of $\mu \in \{0, 2\}$ and two values of $\sigma^2$]

[FIGURE 10: Normal probabilities]

Normals (likely) do not naturally occur in the sense of precisely describing the distribution of observed events (e.g. human weight, income, interest rates). But they naturally occur in the following sense. Consider a random draw of $n$ identically distributed random variables $\{X_1, \dots, X_n\}$ with mean $\mu$ and variance $\sigma^2$. Then under remarkably general conditions
$$\frac{1}{\sigma\sqrt{n}} \sum_{i=1}^{n} (X_i - \mu) \overset{d}{\to} N(0, 1) \quad \text{as } n \to \infty.$$
That is, the random variable $\frac{1}{\sigma\sqrt{n}}\sum_{i=1}^{n}(X_i - \mu)$ is normally distributed as $n \to \infty$. This is a version of the Central Limit Theorem, a cornerstone of modern statistical sciences. Although in many textbooks the underlying assumptions are that the $X_i$ are iid with finite variance, in fact the variance does not need to be finite (a famously forgotten fact), and highly dependent and heterogeneous data are also allowed. We will prove this beautiful fact of the cosmos later in the semester.
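A minimal simulation illustration of the CLT (the exponential parent distribution, $n$, and the number of replications are illustrative choices):

```python
import numpy as np

# Standardized sums of exponential draws (mean 1, variance 1) behave like N(0,1)
# for large n: compare empirical upper-tail frequencies with standard normal values.
rng = np.random.default_rng(2)
n, reps = 500, 20_000
x = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - 1.0) / 1.0     # sqrt(n)*(xbar - mu)/sigma

for c in (1.0, 1.645, 1.96):
    print(c, np.mean(z >= c))   # ≈ .159, .05, .025 under N(0,1)
```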

2.2.4 Chi-Squared

Let $Z_1, \dots, Z_k$ be iid (independent and identically distributed) standard normals: $Z_i \sim N(0, 1)$. Then by definition $\sum_{i=1}^{k} Z_i^2$ has the distribution chi-squared with $k$ degrees of freedom, denoted $X \sim \chi^2(k)$. Consider $X = Z^2$ for just one $Z \sim N(0, 1)$. Since standard normals are symmetric and thin tailed with support $\mathbb{R}$, a $\chi^2(1)$ random variable will be right skewed, with the bulk of probability near zero. The density follows from direct computation of the probability function:
$$P(X \le x) = P(Z^2 \le x) = P\left(-x^{1/2} \le Z \le x^{1/2}\right) = \Phi\left(x^{1/2}\right) - \Phi\left(-x^{1/2}\right),$$
hence by the chain-rule
$$f(x) = \frac{d}{dx} P(X \le x) = 2\phi\left(x^{1/2}\right) \cdot \frac{1}{2} x^{-1/2} = \frac{\phi\left(x^{1/2}\right)}{x^{1/2}},$$
where $\phi(z)$ is the standard normal pdf: $\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$. The moments follow from the construction $X = \sum_{i=1}^{k} Z_i^2$, $E[Z_i^2] = 1$, $Var[Z_i^2] = 2$, and independence: $E[X] = k$ and $Var[X] = 2k$.

3 TRANSFORMATIONS

3.1 Poisson

A simple linear function of Poissons is Poisson.

Claim: If $X_i \sim Poisson(\lambda_i)$ for $i = 1, \dots, n$ are independent then $\sum_{i=1}^{n} X_i \sim Poisson\left(\sum_{i=1}^{n} \lambda_i\right)$.

Remark: This does not hold for an arbitrary affine combination $a + \sum_{i=1}^{n} b_i X_i$. Why?

Proof. Consider $n = 2$: the general proof follows by repeating this one argument.

We have by mutual exclusivity and independence
$$P(X_1 + X_2 = k) = \sum_{j=0}^{k} P(X_1 = j, X_2 = k - j) = \sum_{j=0}^{k} P(X_1 = j) P(X_2 = k - j) = \sum_{j=0}^{k} \frac{e^{-\lambda_1}\lambda_1^j}{j!} \cdot \frac{e^{-\lambda_2}\lambda_2^{k-j}}{(k - j)!}$$
$$= \frac{e^{-(\lambda_1 + \lambda_2)}}{k!} \sum_{j=0}^{k} \frac{k!}{j!(k - j)!} \lambda_1^j \lambda_2^{k - j} = \frac{e^{-(\lambda_1 + \lambda_2)} (\lambda_1 + \lambda_2)^k}{k!},$$
which exploits the binomial expansion: $\sum_{j=0}^{k} \frac{k!}{j!(k - j)!} \lambda_1^j \lambda_2^{k - j} = (\lambda_1 + \lambda_2)^k$. QED.

3.2 Normal

Normals, however, belong to a class of distributions that are closed under arbitrary affine combinations. In fact, in this class (the Stable distributions, or $\alpha$-stable) the normals are the only ones that must be symmetric and have a finite variance: all others have an infinite variance.

Claim: If $X_i \sim N(\mu_i, \sigma_i^2)$ for $i = 1, \dots, n$ are independent then
$$Y := a + \sum_{i=1}^{n} b_i X_i \sim N\left(a + \sum_{i=1}^{n} b_i \mu_i, \ \sum_{i=1}^{n} b_i^2 \sigma_i^2\right).$$

Proof. I will prove $E[e^{itY}] = e^{i\mu_* t - \sigma_*^2 t^2/2}$ for $\mu_* = a + \sum_{i=1}^{n} b_i \mu_i$ and $\sigma_*^2 = \sum_{i=1}^{n} b_i^2 \sigma_i^2$, which is the characteristic function form of a normal. Since characteristic functions uniquely define a distribution function, the proof will then be complete. We have by independence
$$E\left[e^{itY}\right] = e^{iat} E\left[\prod_{i=1}^{n} e^{itb_i X_i}\right] = e^{iat} \prod_{i=1}^{n} E\left[e^{itb_i X_i}\right].$$
Now use $X_i \sim N(\mu_i, \sigma_i^2)$ to obtain
$$E\left[e^{itY}\right] = e^{iat} \prod_{i=1}^{n} e^{i\mu_i b_i t - \sigma_i^2 b_i^2 t^2/2} = \exp\left\{it\left(a + \sum_{i=1}^{n} b_i \mu_i\right) - \frac{t^2}{2}\sum_{i=1}^{n} b_i^2 \sigma_i^2\right\}.$$
The latter is the characteristic function of a normal with mean $a + \sum_{i=1}^{n} b_i \mu_i$ and variance $\sum_{i=1}^{n} b_i^2 \sigma_i^2$. QED.
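Both closure claims are easy to spot-check by simulation. A minimal sketch (parameter values, sample size, and seed are illustrative):

```python
import numpy as np

# (i) X1 + X2 with Xi ~ Poisson(lam_i) independent should behave like Poisson(lam1 + lam2):
#     its mean and variance should both be ≈ lam1 + lam2.
# (ii) a + b1*X1 + b2*X2 with Xi ~ N(mu_i, sig_i^2) independent should have
#      mean a + b1*mu1 + b2*mu2 and variance b1^2*sig1^2 + b2^2*sig2^2.
rng = np.random.default_rng(3)
n = 200_000

s = rng.poisson(1.5, n) + rng.poisson(2.0, n)
print(s.mean(), s.var())        # both ≈ 3.5

y = 1.0 + 2.0 * rng.normal(0.0, 1.0, n) - 1.0 * rng.normal(2.0, 3.0, n)
print(y.mean(), y.var())        # ≈ -1.0 and ≈ 4*1 + 1*9 = 13
```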

3.3 Monotonic Transforms

Let $X$ have an absolutely continuous distribution with density $f_X(x)$. If $Y := g(X)$ is continuous and monotonic then we can deduce its density from $f_X$. Notice
$$P(Y \le y) = P(g(X) \le y) = P\left(X \le g^{-1}(y)\right) \ \text{ if $g$ is monotonic increasing}$$
$$P(Y \le y) = P(g(X) \le y) = P\left(X \ge g^{-1}(y)\right) \ \text{ if $g$ is monotonic decreasing}.$$
Therefore, if $g$ is increasing,
$$F_Y(y) = P(Y \le y) = \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx = F_X\left(g^{-1}(y)\right), \quad \text{hence} \quad f_Y(y) = f_X\left(g^{-1}(y)\right) \frac{d}{dy} g^{-1}(y),$$
and if $g$ is decreasing,
$$F_Y(y) = P(Y \le y) = 1 - F_X\left(g^{-1}(y)\right), \quad \text{hence} \quad f_Y(y) = -f_X\left(g^{-1}(y)\right) \frac{d}{dy} g^{-1}(y).$$
Either way it is identically
$$f_Y(y) = f_X\left(g^{-1}(y)\right) \left|\frac{d}{dy} g^{-1}(y)\right|.$$

Example: Let $X \sim U[a, b]$ and $Y = 2X^3$. Then $g^{-1}(y) = (y/2)^{1/3}$, $y \in [2a^3, 2b^3]$, and
$$\frac{d}{dy} g^{-1}(y) = \frac{1}{3}\left(\frac{y}{2}\right)^{-2/3} \cdot \frac{1}{2},$$
hence
$$f_Y(y) = f_X\left(g^{-1}(y)\right)\left|\frac{d}{dy} g^{-1}(y)\right| = \frac{(y/2)^{-2/3}}{6(b - a)} \ \text{ on } [2a^3, 2b^3]$$
and $f_Y(y) = 0$ everywhere else. Indeed
$$\int_{2a^3}^{2b^3} f_Y(y)\,dy = \int_{2a^3}^{2b^3} \frac{(y/2)^{-2/3}}{6(b - a)}\,dy = \frac{(y/2)^{1/3}}{b - a}\Big|_{2a^3}^{2b^3} = \frac{b - a}{b - a} = 1.$$
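A short numerical check of the example (taking $a = 0$, $b = 1$ for illustration, so $F_Y(y) = (y/2)^{1/3}$ on $[0, 2]$):

```python
import numpy as np

# Y = 2*X**3 with X ~ U[0, 1]: compare empirical P(Y <= y) with F_Y(y) = (y/2)**(1/3).
rng = np.random.default_rng(4)
x = rng.uniform(0.0, 1.0, 100_000)
y = 2 * x ** 3

for q in (0.25, 1.0, 1.6):
    print(q, np.mean(y <= q), (q / 2) ** (1 / 3))   # empirical vs. theoretical CDF
```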