Slides. Advanced Statistics
1 Slides Advanced Statistics Summer Term 2011 (April 5, 2011 May 17, 2011) Tuesdays, and Room: J 498 Prof. Dr. Bernd Wilfling Westfälische Wilhelms-Universität Münster
2 Contents
1 Introduction
1.1 Syllabus
1.2 Why Advanced Statistics?
2 Random Variables, Distribution Functions, Expectation, Moment Generating Functions
2.1 Basic Terminology
2.2 Random Variable, Cumulative Distribution Function, Density Function
2.3 Expectation, Moments and Moment Generating Functions
2.4 Special Parametric Families of Univariate Distributions
3 Joint and Conditional Distributions, Stochastic Independence
3.1 Joint and Marginal Distribution
3.2 Conditional Distribution and Stochastic Independence
3.3 Expectation and Joint Moment Generating Functions
3.4 The Multivariate Normal Distribution
4 Distributions of Functions of Random Variables
4.1 Expectations of Functions of Random Variables
4.2 Cumulative-distribution-function Technique
4.3 Moment-generating-function Technique
4.4 General Transformations
5 Methods of Estimation
5.1 Sampling, Estimators, Limit Theorems
5.2 Properties of Estimators
5.3 Methods of Estimation: Least-Squares Estimators, Method-of-moments Estimators, Maximum-Likelihood Estimators
6 Hypothesis Testing
6.1 Basic Terminology
6.2 Classical Testing Procedures: Wald Test, Likelihood-Ratio Test, Lagrange-Multiplier Test i
3 References and Related Reading
In German:
Mosler, K. und F. Schmid (2008). Wahrscheinlichkeitsrechnung und schließende Statistik (3. Auflage). Springer Verlag, Heidelberg.
Schira, J. (2009). Statistische Methoden der VWL und BWL: Theorie und Praxis (3. Auflage). Pearson Studium, München.
Wilfling, B. (2010). Statistik I. Skript zur Vorlesung Deskriptive Statistik im Wintersemester 2010/2011 an der Westfälischen Wilhelms-Universität Münster.
Wilfling, B. (2011). Statistik II. Skript zur Vorlesung Wahrscheinlichkeitsrechnung und schließende Statistik im Sommersemester 2011 an der Westfälischen Wilhelms-Universität Münster.
In English:
Chiang, A. (1984). Fundamental Methods of Mathematical Economics, 3rd edition. McGraw-Hill, Singapore.
Feller, W. (1968). An Introduction to Probability Theory and its Applications, Vol. 1. John Wiley & Sons, New York.
Feller, W. (1971). An Introduction to Probability Theory and its Applications, Vol. 2. John Wiley & Sons, New York.
Garthwaite, P.H., Jolliffe, I.T. and B. Jones (2002). Statistical Inference, 3rd edition. Oxford University Press, Oxford.
Mood, A.M., Graybill, F.A. and D.C. Boes (1974). Introduction to the Theory of Statistics, 3rd edition. McGraw-Hill, Tokyo. ii
4 1. Introduction 1.1 Syllabus Aim of this course: Consolidation of probability calculus statistical inference (on the basis of previous Bachelor courses) Preparatory course to Econometrics, Empirical Economics 1
5 Web-site: Study Courses summer term 2011 Advanced Statistics Style: Lecture is based on slides Slides are downloadable as PDF-files from the web-site References: See Contents 2
6 How to get prepared for the exam: Courses Class in Advanced Statistics (Thu, and , J 498, April 7, 2011 May 19, 2011) Auxiliary material to be used in the exam: Pocket calculator (non-programmable) All course-slides and solutions to class-exercises No textbooks 3
7 Class teacher: Dipl.-Mathem. Marc Lammerding (see personal web-site) 4
8 1.2 Why Advanced Statistics? Contents of the BA course Statistics II: Random experiments, events, probability Random variables, distributions Samples, statistics Estimators Tests of hypothesis Aim of the BA course Statistics II : Elementary understanding of statistical concepts (sampling, estimation, hypothesis-testing) 5
9 Now: Course in Advanced Statistics (probability calculus and mathematical statistics) Aim of this course: Better understanding of distribution theory How can we find good estimators? How can we construct good tests of hypothesis? 6
10 Preliminaries: BA courses Mathematics Statistics I Statistics II The slides for the BA courses Statistics I+II are downloadable from the web-site (in German) Later courses based on Advanced Statistics : All courses belonging to the three modules Econometrics and Empirical Economics (Econometrics I+II, Analysis of Time Series,...) 7
11 2. Random Variables, Distribution Functions, Expectation, Moment Generating Functions
Aim of this section: Mathematical definition of the concepts
random variable
(cumulative) distribution function
(probability) density function
expectation and moments
moment generating function 8
12 Preliminaries: Repetition of the notions random experiment outcome (sample point) and sample space event probability (see Wilfling (2011), Chapter 2) 9
13 2.1 Basic Terminology Definition 2.1: (Random experiment) A random experiment is an experiment (a) for which we know in advance all conceivable outcomes that it can take on, but (b) for which we do not know in advance the actual outcome that it eventually takes on. Random experiments are performed in controllable trials. 10
14 Examples of random experiments:
Drawing of lottery numbers
Roulette, tossing a coin, rolling a die
Technical experiments (testing the hardness of lots from steel production etc.)
In economics: Random experiments (according to Def. 2.1) are rare (historical data, trials are not controllable)
Modern discipline: Experimental Economics 11
15 Definition 2.2: (Sample point, sample space)
Each conceivable outcome ω of a random experiment is called a sample point. The totality of conceivable outcomes (or sample points) is defined as the sample space and is denoted by Ω.
Examples:
Random experiment of rolling a single die: Ω = {1, 2, 3, 4, 5, 6}
Random experiment of tossing a coin until HEAD shows up: Ω = {H, TH, TTH, TTTH, TTTTH, ...}
Random experiment of measuring tomorrow's exchange rate between the euro and the US-$: Ω = [0, ∞) 12
16 Obviously: The number of elements in Ω can be either (1) finite or (2) infinite, but countable or (3) infinite and uncountable
Now: Definition of the notion Event based on mathematical sets
Definition 2.3: (Event)
An event of a random experiment is a subset of the sample space Ω. We say the event A occurs if the random experiment has an outcome ω ∈ A. 13
17 Remarks: Events are typically denoted by A, B, C, ... or A₁, A₂, ...
A = Ω is called the sure event (since for every sample point ω we have ω ∈ A)
A = ∅ (empty set) is called the impossible event (since for every ω we have ω ∉ A)
If the event A is a subset of the event B (A ⊆ B) we say that the occurrence of A implies the occurrence of B (since for every ω ∈ A we also have ω ∈ B)
Obviously: Events are represented by mathematical sets → application of set operations to events 14
18 Combining events (set operations):
Intersection: ∩_{i=1}^{n} A_i occurs if all A_i occur
Union: ∪_{i=1}^{n} A_i occurs if at least one A_i occurs
Set difference: C = A\B occurs if A occurs and B does not occur
Complement: C = Ω\A = Ā occurs if A does not occur
The events A and B are called disjoint if A ∩ B = ∅ (both events cannot occur simultaneously) 15
19 Now: For any arbitrary event A we are looking for a number P(A) which represents the probability that A occurs
Formally: P : A ↦ P(A) (P(·) is a set function)
Question: Which properties should the probability function (set function) P(·) have? 16
20 Definition 2.4: (Kolmogorov axioms)
The following axioms for P(·) are called the Kolmogorov axioms:
Nonnegativity: P(A) ≥ 0 for every A
Standardization: P(Ω) = 1
Additivity: For two disjoint events A and B (i.e. for A ∩ B = ∅) P(·) satisfies P(A ∪ B) = P(A) + P(B) 17
21 Easy to check: The three axioms imply several additional properties and rules for computing with probabilities
Theorem 2.5: (General properties)
The Kolmogorov axioms imply the following properties:
Probability of the complementary event: P(Ā) = 1 − P(A)
Probability of the impossible event: P(∅) = 0
Range of probabilities: 0 ≤ P(A) ≤ 1 18
22 Next: General rules for computing with probabilities
Theorem 2.6: (Calculation rules)
The Kolmogorov axioms imply the following calculation rules (A, B, C are arbitrary events):
Addition rule (I): P(A ∪ B) = P(A) + P(B) − P(A ∩ B) (probability that A or B occurs) 19
23 Addition rule (II): P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) − P(A ∩ C) + P(A ∩ B ∩ C) (probability that A or B or C occurs)
Probability of the difference event: P(A\B) = P(A ∩ B̄) = P(A) − P(A ∩ B) 20
24 Notice: If B implies A (i.e. if B ⊆ A) it follows that P(A\B) = P(A) − P(B) 21
25 2.2 Random Variable, Cumulative Distribution Function, Density Function
Frequently: Instead of being interested in a concrete sample point ω ∈ Ω itself, we are rather interested in a number depending on ω
Examples:
Profit in euro when playing roulette
Profit earned when selling a stock
Monthly salary of a randomly selected person
Intuitive meaning of a random variable: A rule translating the abstract ω into a number 22
26 Definition 2.7: (Random variable [rv])
A random variable, denoted by X or X(·), is a mathematical function of the form
X : Ω → R, ω ↦ X(ω).
Remarks: A random variable relates each sample point ω ∈ Ω to a real number
Intuitively: A random variable X characterizes a number that is a priori unknown 23
27 When the random experiment is carried out, the random variable X takes on the value x x is called realization or value of the random variable X after the random experiment has been carried out Random variables are denoted by capital letters, realizations are denoted by small letters The rv X describes the situation ex ante, i.e. before carrying out the random experiment The realization x describes the situation ex post, i.e. after having carried out the random experiment 24
28 Example 1: Consider the experiment of tossing a single coin (H=Head, T =Tail). Let the rv X represent the Number of Heads We have Ω = {H, T } The random variable X can take on two values: X(T ) = 0, X(H) = 1 25
29 Example 2: Consider the experiment of tossing a coin three times. Let X represent the Number of Heads
We have Ω = {(H,H,H), (H,H,T), ..., (T,T,T)} with ω₁ = (H,H,H), ω₂ = (H,H,T), ..., ω₈ = (T,T,T)
The rv X is defined by X(ω) = number of H in ω
Obviously: X relates distinct ω's to the same number, e.g. X((H,H,T)) = X((H,T,H)) = X((T,H,H)) = 2 26
30 Example 3: Consider the experiment of randomly selecting 1 person from a group of people. Let X represent the person's status of employment
We have Ω = {employed, unemployed} with ω₁ = employed, ω₂ = unemployed
X can be defined as X(ω₁) = 1, X(ω₂) = 0 27
31 Example 4: Consider the experiment of measuring tomorrow's price of a specific stock. Let X denote the stock price
We have Ω = [0, ∞), i.e. X is defined by X(ω) = ω
Conclusion: The random variable X can take on distinct values with specific probabilities 28
32 Question: How can we determine these specific probabilities and how can we calculate with them?
Simplifying notation: (a, b, x ∈ R)
P(X = a) := P({ω : X(ω) = a})
P(a < X < b) := P({ω : a < X(ω) < b})
P(X ≤ x) := P({ω : X(ω) ≤ x})
Solution: We can compute these probabilities via the so-called cumulative distribution function of X 29
33 Intuitively: The cumulative distribution function of the random variable X characterizes the probabilities according to which the possible values x are distributed along the real line (the so-called distribution of X)
Definition 2.8: (Cumulative distribution function [cdf])
The cumulative distribution function of a random variable X, denoted by F_X, is defined to be the function
F_X : R → [0, 1], x ↦ F_X(x) = P({ω : X(ω) ≤ x}) = P(X ≤ x). 30
34 Example: Consider the experiment of tossing a coin three times. Let X represent the Number of Heads
We have Ω = {(H,H,H), (H,H,T), ..., (T,T,T)} with ω₁ = (H,H,H), ω₂ = (H,H,T), ..., ω₈ = (T,T,T)
For the probabilities of X we find
P(X = 0) = P({(T,T,T)}) = 1/8
P(X = 1) = P({(T,T,H), (T,H,T), (H,T,T)}) = 3/8
P(X = 2) = P({(T,H,H), (H,T,H), (H,H,T)}) = 3/8
P(X = 3) = P({(H,H,H)}) = 1/8 31
35 Thus, the cdf is given by
F_X(x) = 0 for x < 0, 1/8 for 0 ≤ x < 1, 4/8 for 1 ≤ x < 2, 7/8 for 2 ≤ x < 3, 1 for x ≥ 3
Remarks: In practice, it will be sufficient to know only the cdf F_X of X
In many situations, it will appear impossible to exactly specify the sample space Ω or the explicit function X : Ω → R. However, often we may derive the cdf F_X from other factual considerations 32
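The pmf and the step-function cdf above can be reproduced by brute-force enumeration of the eight equally likely outcomes; a minimal Python sketch (illustrative, not part of the original slides):

```python
from fractions import Fraction
from itertools import product

# Enumerate all 8 equally likely outcomes of three coin tosses.
outcomes = list(product("HT", repeat=3))

# Discrete density (pmf): f[x] = P(X = x), where X = number of heads.
f = {x: Fraction(sum(1 for w in outcomes if w.count("H") == x), len(outcomes))
     for x in range(4)}

def F(x):
    """cdf F_X(x) = P(X <= x), a step function over the support {0,1,2,3}."""
    return sum(p for xj, p in f.items() if xj <= x)
```

Evaluating `F` at the jump points reproduces the values 1/8, 4/8, 7/8 and 1 from the slide.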
36 General properties of F_X:
F_X(x) is a monotone, nondecreasing function
We have lim_{x→−∞} F_X(x) = 0 and lim_{x→+∞} F_X(x) = 1
F_X is continuous from the right; that is, lim_{z↓x} F_X(z) = F_X(x) 33
37 Summary: Via the cdf F_X(x) we can answer the following question: What is the probability that the random variable X takes on a value that does not exceed x?
Now: Consider the question: What is the value which X does not exceed with a prespecified probability p ∈ (0, 1)? → quantile function of X 34
38 Definition 2.9: (Quantile function)
Consider the rv X with cdf F_X. For every p ∈ (0, 1) the quantile function of X, denoted by Q_X(p), is defined as
Q_X : (0, 1) → R, p ↦ Q_X(p) = min{x : F_X(x) ≥ p}.
The value of the quantile function x_p = Q_X(p) is called the pth quantile of X.
Remarks: The pth quantile x_p of X is defined as the smallest number x satisfying F_X(x) ≥ p
In other words: The pth quantile x_p is the smallest value that X does not exceed with probability p 35
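For a discrete rv the minimum in Definition 2.9 can be taken over the support. A hedged Python sketch, reusing the three-coin pmf from the earlier example as an assumed input:

```python
from fractions import Fraction

# pmf of "number of heads in three coin tosses" (assumed, from the earlier example)
f = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
support = sorted(f)

def F(x):
    """cdf F_X(x) = P(X <= x)."""
    return sum(p for xj, p in f.items() if xj <= x)

def Q(p):
    """Quantile function: the smallest x in the support with F_X(x) >= p."""
    return min(x for x in support if F(x) >= p)
```

For example, the median is Q(1/2) = 1, since F(0) = 1/8 < 1/2 but F(1) = 4/8 ≥ 1/2.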
39 Special quantiles: Median: p = 0.5 Quartiles: p = 0.25, 0.5, 0.75 Quintiles: p = 0.2, 0.4, 0.6, 0.8 Deciles: p = 0.1, 0.2,..., 0.9 Now: Consideration of two distinct classes of random variables (discrete vs. continuous rv s) 36
40 Reason: Each class requires a specific mathematical treatment
Mathematical tools for analyzing discrete rv's: Finite and infinite sums
Mathematical tools for analyzing continuous rv's: Differential and integral calculus
Remarks: Some rv's are partly discrete and partly continuous; such rv's are not treated in this course 37
41 Definition 2.10: (Discrete random variable)
A random variable X will be defined to be discrete if it can take on either (a) only a finite number of values x₁, x₂, ..., x_J or (b) an infinite, but countable number of values x₁, x₂, ..., each with strictly positive probability; that is, if for all j we have
P(X = x_j) > 0 and Σ_j P(X = x_j) = 1. 38
42 Examples of discrete variables:
Countable variables (X = Number of ...)
Encoded qualitative variables
Further definitions:
Definition 2.11: (Support of a discrete random variable)
The support of a discrete rv X, denoted by supp(X), is defined to be the totality of all values that X can take on with a strictly positive probability:
supp(X) = {x₁, ..., x_J} or supp(X) = {x₁, x₂, ...}. 39
43 Definition 2.12: (Discrete density function)
For a discrete random variable X the function
f_X(x) = P(X = x)
is defined to be the discrete density function of X.
Remarks: The discrete density function f_X(·) takes on strictly positive values only for elements of the support of X. For realizations of X that do not belong to the support of X, i.e. for x ∉ supp(X), we have f_X(x) = 0:
f_X(x) = P(X = x_j) > 0 for x = x_j ∈ supp(X), and f_X(x) = 0 for x ∉ supp(X) 40
44 The discrete density function f_X(·) has the following properties:
f_X(x) ≥ 0 for all x
Σ_{x_j ∈ supp(X)} f_X(x_j) = 1
For any arbitrary set A ⊆ R the probability of the event {ω : X(ω) ∈ A} = {X ∈ A} is given by
P(X ∈ A) = Σ_{x_j ∈ A} f_X(x_j) 41
45 Example: Consider the experiment of tossing a coin three times and let X = Number of Heads (see slide 31)
Obviously: X is discrete and has the support supp(X) = {0, 1, 2, 3}
The discrete density function of X is given by
f_X(x) = 1/8 for x = 0, 3/8 for x = 1, 3/8 for x = 2, 1/8 for x = 3, and 0 for x ∉ supp(X) 42
46 The cdf of X is given by (see slide 32)
F_X(x) = 0 for x < 0, 1/8 for 0 ≤ x < 1, 4/8 for 1 ≤ x < 2, 7/8 for 2 ≤ x < 3, 1 for x ≥ 3
Obviously: The cdf F_X(·) can be obtained from f_X(·):
F_X(x) = P(X ≤ x) = Σ_{x_j ∈ supp(X), x_j ≤ x} f_X(x_j) 43
47 Conclusion: The cdf of a discrete random variable X is a step function with steps at the points x_j ∈ supp(X). The height of the step at x_j is given by
F_X(x_j) − lim_{x↑x_j} F_X(x) = P(X = x_j) = f_X(x_j),
i.e. the step height is equal to the value of the discrete density function at x_j (relationship between cdf and discrete density function) 44
48 Now: Definition of continuous random variables Intuitively: In contrast to discrete random variables, continuous random variables can take on an uncountable number of values (e.g. every real number on a given interval) In fact: Definition of a continuous random variable is quite technical 45
49 Definition 2.13: (Continuous rv, probability density function)
A random variable X is called continuous if there exists a function f_X : R → [0, ∞) such that the cdf of X can be written as
F_X(x) = ∫_{−∞}^{x} f_X(t) dt for all x ∈ R.
The function f_X(x) is called the probability density function (pdf) of X.
Remarks: The cdf F_X(·) of a continuous random variable X is a primitive function (antiderivative) of the pdf f_X(·)
F_X(x) = P(X ≤ x) is equal to the area under the pdf f_X(·) between the limits −∞ and x 46
50 [Figure: cdf F_X(·) and pdf f_X(·); the shaded area under f_X(t) to the left of x equals P(X ≤ x) = F_X(x)] 47
51 Properties of the pdf f_X(·):
1. A pdf f_X(·) cannot take on negative values, i.e. f_X(x) ≥ 0 for all x ∈ R
2. The area under a pdf is equal to one, i.e. ∫_{−∞}^{+∞} f_X(x) dx = 1
3. If the cdf F_X(x) is differentiable we have f_X(x) = F′_X(x) = dF_X(x)/dx 48
52 Example: (Uniform distribution over [0, 10])
Consider the random variable X with pdf
f_X(x) = 0.1 for x ∈ [0, 10], and 0 for x ∉ [0, 10]
Derivation of the cdf F_X:
For x < 0 we have F_X(x) = ∫_{−∞}^{x} f_X(t) dt = ∫_{−∞}^{x} 0 dt = 0 49
53 For x ∈ [0, 10] we have
F_X(x) = ∫_{−∞}^{x} f_X(t) dt = ∫_{−∞}^{0} 0 dt + ∫_{0}^{x} 0.1 dt = [0.1 t]₀ˣ = 0.1 x 50
54 For x > 10 we have
F_X(x) = ∫_{−∞}^{x} f_X(t) dt = ∫_{−∞}^{0} 0 dt + ∫_{0}^{10} 0.1 dt + ∫_{10}^{x} 0 dt = 1 51
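The piecewise cdf derived above can be cross-checked numerically; a small Python sketch (illustrative, not from the slides) that approximates F_X(x) by a midpoint Riemann sum of the pdf:

```python
def f(x):
    """pdf of the uniform distribution on [0, 10]."""
    return 0.1 if 0 <= x <= 10 else 0.0

def F(x, n=100_000):
    """cdf via a midpoint Riemann sum of the pdf from 0 to x
    (the pdf vanishes below 0, so the lower limit -inf can be replaced by 0)."""
    if x <= 0:
        return 0.0
    h = x / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h
```

The numerical values agree with the closed forms 0.1·x on [0, 10] and 1 beyond 10 (up to discretization error at the boundary).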
55 Now: Interval probabilities, i.e. (for a, b ∈ R, a < b) P(X ∈ (a, b]) = P(a < X ≤ b)
We have
P(a < X ≤ b) = P({ω : a < X(ω) ≤ b})
= P({ω : X(ω) > a} ∩ {ω : X(ω) ≤ b})
= 1 − P(({ω : X(ω) > a} ∩ {ω : X(ω) ≤ b})ᶜ)
= 1 − P({ω : X(ω) ≤ a} ∪ {ω : X(ω) > b}) (De Morgan's law) 52
56 = 1 − [P(X ≤ a) + P(X > b)] (the two events are disjoint)
= 1 − [F_X(a) + (1 − P(X ≤ b))]
= 1 − [F_X(a) + 1 − F_X(b)]
= F_X(b) − F_X(a)
= ∫_{−∞}^{b} f_X(t) dt − ∫_{−∞}^{a} f_X(t) dt = ∫_{a}^{b} f_X(t) dt 53
57 [Figure: interval probability between the limits a and b; the shaded area under f_X(x) between a and b equals P(a < X ≤ b)] 54
58 Important result for a continuous rv X:
P(X = a) = 0 for all a ∈ R
Proof:
P(X = a) = lim_{b↓a} P(a < X ≤ b) = lim_{b↓a} ∫_{a}^{b} f_X(x) dx = ∫_{a}^{a} f_X(x) dx = 0
Conclusion: The probability that a continuous random variable X takes on a single explicit value is always zero 55
59 [Figure: probability of a single value; f_X(x) with points b₃, b₂, b₁ approaching a from the right] 56
60 Notice: This does not imply that the event {X = a} cannot occur
Consequence: Since for continuous random variables we always have P(X = a) = 0 for all a ∈ R, it follows that
P(a < X < b) = P(a ≤ X < b) = P(a ≤ X ≤ b) = P(a < X ≤ b) = F_X(b) − F_X(a)
(when computing interval probabilities for continuous rv's, it does not matter whether the interval is open or closed) 57
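The consequence P(a < X ≤ b) = F_X(b) − F_X(a) is easy to exercise with the uniform example from the preceding slides; a minimal sketch (the closed-form cdf is taken from slides 49 to 51):

```python
def F(x):
    """Closed-form cdf of the uniform distribution on [0, 10]."""
    return 0.0 if x < 0 else (0.1 * x if x <= 10 else 1.0)

def interval_prob(a, b):
    """P(a < X <= b) = F_X(b) - F_X(a); endpoints are irrelevant since
    P(X = a) = 0 for a continuous rv."""
    return F(b) - F(a)
```

For instance, P(2 < X ≤ 5) = F(5) − F(2) = 0.5 − 0.2 = 0.3.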
61 2.3 Expectation, Moments and Moment Generating Functions
Repetition: Expectation of an arbitrary random variable X
Definition 2.14: (Expectation)
The expectation of the random variable X, denoted by E(X), is defined by
E(X) = Σ_{x_j ∈ supp(X)} x_j · P(X = x_j), if X is discrete,
E(X) = ∫_{−∞}^{+∞} x · f_X(x) dx, if X is continuous. 58
62 Remarks: The expectation of the random variable X is the sum of all realizations, each weighted by the probability of its occurrence
Instead of E(X) we often write µ_X
There exist random variables that do not have an expectation (see class) 59
63 Example 1: (Discrete random variable)
Consider the experiment of tossing two dice. Let X represent the absolute difference of the two dice. What is the expectation of X?
The support of X is given by supp(X) = {0, 1, 2, 3, 4, 5} 60
64 The discrete density function of X is given by
f_X(x) = 6/36 for x = 0, 10/36 for x = 1, 8/36 for x = 2, 6/36 for x = 3, 4/36 for x = 4, 2/36 for x = 5, and 0 for x ∉ supp(X)
This gives
E(X) = 0 · 6/36 + 1 · 10/36 + 2 · 8/36 + 3 · 6/36 + 4 · 4/36 + 5 · 2/36 = 70/36 ≈ 1.9444 61
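The density and expectation above follow from enumerating all 36 equally likely dice pairs; a short Python cross-check (illustrative, not part of the slides) using exact fractions:

```python
from fractions import Fraction
from itertools import product

# X = absolute difference of two dice; exact pmf by enumerating all 36 pairs.
pairs = list(product(range(1, 7), repeat=2))
f = {x: Fraction(sum(1 for a, b in pairs if abs(a - b) == x), 36)
     for x in range(6)}

# Expectation as the probability-weighted sum over the support.
EX = sum(x * p for x, p in f.items())
```

This reproduces f_X(0) = 6/36, f_X(1) = 10/36, ..., and E(X) = 70/36 exactly.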
65 Example 2: (Continuous random variable)
Consider the continuous random variable X with pdf
f_X(x) = x/4 for 1 ≤ x ≤ 3, and 0 elsewise
To calculate the expectation we split up the integral:
E(X) = ∫_{−∞}^{+∞} x · f_X(x) dx = ∫_{−∞}^{1} 0 dx + ∫_{1}^{3} x · (x/4) dx + ∫_{3}^{+∞} 0 dx 62
66 = ∫_{1}^{3} (x²/4) dx = (1/4) · [x³/3]₁³ = (1/4) · (27/3 − 1/3) = 26/12 = 13/6 ≈ 2.1667
Frequently: Random variable X plus discrete density or pdf f_X is known
We have to find the expectation of the transformed random variable Y = g(X) 63
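The value E(X) = 13/6 can be verified numerically; a minimal sketch (an assumed midpoint Riemann sum, not part of the original slides) integrating x·f_X(x) over [1, 3]:

```python
def f(x):
    """pdf f_X(x) = x/4 on [1, 3], zero elsewhere."""
    return x / 4 if 1 <= x <= 3 else 0.0

# Midpoint Riemann sum of x * f(x) over [1, 3].
n = 100_000
h = 2 / n
EX = sum((1 + (i + 0.5) * h) * f(1 + (i + 0.5) * h) for i in range(n)) * h
```

The sum converges to 13/6 ≈ 2.1667, matching the closed-form integral.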
67 Theorem 2.15: (Expectation of a transformed rv)
Let X be a random variable with discrete density or pdf f_X(·). For any Baire-function g : R → R the expectation of the transformed random variable Y = g(X) is given by
E(Y) = E[g(X)] = Σ_{x_j ∈ supp(X)} g(x_j) · P(X = x_j), if X is discrete,
E(Y) = E[g(X)] = ∫_{−∞}^{+∞} g(x) · f_X(x) dx, if X is continuous. 64
68 Remarks: All functions considered in this course are Baire-functions For the special case g(x) = x (the identity function) Theorem 2.15 coincides with Definition 2.14 Next: Some important rules for calculating expected values 65
69 Theorem 2.16: (Properties of expectations)
Let X be an arbitrary random variable (discrete or continuous), c, c₁, c₂ ∈ R constants and g, g₁, g₂ : R → R functions. Then:
1. E(c) = c.
2. E[c · g(X)] = c · E[g(X)].
3. E[c₁ · g₁(X) + c₂ · g₂(X)] = c₁ · E[g₁(X)] + c₂ · E[g₂(X)].
4. If g₁(x) ≤ g₂(x) for all x ∈ R then E[g₁(X)] ≤ E[g₂(X)].
Proof: Class 66
70 Now: Consider the random variable X (discrete or continuous) and the explicit function g(x) = [x − E(X)]² → variance and standard deviation of X
Definition 2.17: (Variance, standard deviation)
For any random variable X the variance, denoted by Var(X), is defined as the expected quadratic distance between X and its expectation E(X); that is
Var(X) = E[(X − E(X))²].
The standard deviation of X, denoted by SD(X), is defined to be the (positive) square root of the variance: SD(X) = +√Var(X). 67
71 Remark: Setting g(x) = [x − E(X)]² in Theorem 2.15 (on slide 64) yields the following explicit formulas for discrete and continuous random variables:
Var(X) = E[g(X)] = Σ_{x_j ∈ supp(X)} [x_j − E(X)]² · P(X = x_j), if X is discrete,
Var(X) = ∫_{−∞}^{+∞} [x − E(X)]² · f_X(x) dx, if X is continuous 68
72 Example: (Discrete random variable)
Consider again the experiment of tossing two dice with X representing the absolute difference of the two dice (see Example 1 on slide 60). The variance is given by
Var(X) = (0 − 70/36)² · 6/36 + (1 − 70/36)² · 10/36 + (2 − 70/36)² · 8/36 + (3 − 70/36)² · 6/36 + (4 − 70/36)² · 4/36 + (5 − 70/36)² · 2/36 = 665/324 ≈ 2.0525
Notice: The variance is an expectation per definitionem → rules for expectations are applicable 69
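The variance of the dice-difference rv can be computed exactly in the same enumeration style as the expectation; a brief Python sketch (illustrative, not from the slides), which also checks the shortcut Var(X) = E(X²) − [E(X)]²:

```python
from fractions import Fraction
from itertools import product

# Exact pmf of X = absolute difference of two dice.
pairs = list(product(range(1, 7), repeat=2))
f = {x: Fraction(sum(1 for a, b in pairs if abs(a - b) == x), 36)
     for x in range(6)}

EX = sum(x * p for x, p in f.items())                  # E(X) = 70/36
VarX = sum((x - EX) ** 2 * p for x, p in f.items())    # E[(X - E(X))^2]
```

Exact arithmetic gives Var(X) = 665/324 ≈ 2.0525.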
73 Theorem 2.18: (Rules for variances)
Let X be an arbitrary random variable (discrete or continuous) and a, b ∈ R real constants; then
1. Var(X) = E(X²) − [E(X)]².
2. Var(a + b · X) = b² · Var(X).
Proof: Class
Next: Two important inequalities dealing with expectations and transformed random variables 70
74 Theorem 2.19: (Chebyshev inequality)
Let X be an arbitrary random variable and g : R → R₊ a nonnegative function. Then, for every k > 0 we have
P[g(X) ≥ k] ≤ E[g(X)] / k.
Special case: Consider
g(x) = [x − E(X)]² and k = r² · Var(X) (r > 0)
Theorem 2.19 implies
P{[X − E(X)]² ≥ r² · Var(X)} ≤ Var(X) / (r² · Var(X)) = 1/r² 71
75 Now:
P{[X − E(X)]² ≥ r² · Var(X)} = P{|X − E(X)| ≥ r · SD(X)} = 1 − P{|X − E(X)| < r · SD(X)}
It follows that
P{|X − E(X)| < r · SD(X)} ≥ 1 − 1/r²
(specific Chebyshev inequality) 72
76 Remarks: The specific Chebyshev inequality provides a minimal probability of the event that any arbitrary random variable X takes on a value from the following interval:
[E(X) − r · SD(X), E(X) + r · SD(X)]
For example, for r = 3 we have
P{|X − E(X)| < 3 · SD(X)} ≥ 1 − 1/9 = 8/9,
which is equivalent to
P{E(X) − 3 · SD(X) < X < E(X) + 3 · SD(X)} ≥ 8/9
or
P{X ∈ (E(X) − 3 · SD(X), E(X) + 3 · SD(X))} ≥ 8/9 73
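The specific Chebyshev bound can be checked against an exactly known distribution; a hedged Python sketch (illustrative, not part of the slides) using the dice-difference rv from slide 60 for a few values of r:

```python
from fractions import Fraction
from itertools import product
import math

# Exact pmf, mean and variance of X = absolute difference of two dice.
pairs = list(product(range(1, 7), repeat=2))
f = {x: Fraction(sum(1 for a, b in pairs if abs(a - b) == x), 36)
     for x in range(6)}
EX = sum(x * p for x, p in f.items())
VarX = sum((x - EX) ** 2 * p for x, p in f.items())
SD = math.sqrt(VarX)

def prob_within(r):
    """P(|X - E(X)| < r * SD(X)), computed exactly from the pmf."""
    return sum(p for x, p in f.items() if abs(x - EX) < r * SD)
```

For each tested r the exact probability indeed dominates the Chebyshev lower bound 1 − 1/r².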
77 Theorem 2.20: (Jensen inequality)
Let X be a random variable with mean E(X) and let g : R → R be a convex function, i.e. for all x we have g″(x) ≥ 0; then
E[g(X)] ≥ g(E[X]).
Remarks: If the function g is concave (i.e. if g″(x) ≤ 0 for all x) then Jensen's inequality states that E[g(X)] ≤ g(E[X])
Notice that in general we have E[g(X)] ≠ g(E[X]) 74
78 Example: Consider the random variable X and the function g(x) = x²
We have g″(x) = 2 ≥ 0 for all x, i.e. g is convex
It follows from Jensen's inequality that
E[g(X)] = E(X²) ≥ [E(X)]² = g(E[X]),
i.e. E(X²) − [E(X)]² ≥ 0
This implies Var(X) = E(X²) − [E(X)]² ≥ 0 (the variance of an arbitrary rv cannot be negative) 75
79 Now: Consider the random variable X with expectation E(X) = µ_X, the integer number n ∈ N and the functions
g₁(x) = xⁿ and g₂(x) = [x − µ_X]ⁿ
Definition 2.21: (Moments, central moments)
(a) The n-th moment of X, denoted by µ′_n, is defined as µ′_n ≡ E[g₁(X)] = E(Xⁿ).
(b) The n-th central moment of X about µ_X, denoted by µ_n, is defined as µ_n ≡ E[g₂(X)] = E[(X − µ_X)ⁿ]. 76
80 Relations:
µ′₁ = E(X) = µ_X (the 1st moment coincides with E(X))
µ₁ = E[X − µ_X] = E(X) − µ_X = 0 (the 1st central moment is always equal to 0)
µ₂ = E[(X − µ_X)²] = Var(X) (the 2nd central moment coincides with Var(X)) 77
81 Remarks: The first four moments of a random variable X are important measures of the probability distribution (expectation, variance, skewness, kurtosis) The moments of a random variable X play an important role in theoretical and applied statistics In some cases, when all moments are known, the cdf of a random variable X can be determined 78
82 Question: Can we find a function that gives us a representation of all moments of a random variable X?
Definition 2.22: (Moment generating function)
Let X be a random variable with discrete density or pdf f_X(·). The expected value of e^{t·X} is defined to be the moment generating function of X if the expected value exists for every value of t in some interval −h < t < h, h > 0. That is, the moment generating function of X, denoted by m_X(t), is defined as
m_X(t) = E[e^{t·X}]. 79
83 Remarks: The moment generating function m_X(t) is a function in t
There are rv's X for which m_X(t) does not exist
If m_X(t) exists it can be calculated as
m_X(t) = E[e^{t·X}] = Σ_{x_j ∈ supp(X)} e^{t·x_j} · P(X = x_j), if X is discrete,
m_X(t) = ∫_{−∞}^{+∞} e^{t·x} · f_X(x) dx, if X is continuous 80
84 Question: Why is m_X(t) called the moment generating function?
Answer: Consider the nth derivative of m_X(t) with respect to t:
dⁿ/dtⁿ m_X(t) = Σ_{x_j ∈ supp(X)} (x_j)ⁿ · e^{t·x_j} · P(X = x_j) for discrete X,
dⁿ/dtⁿ m_X(t) = ∫_{−∞}^{+∞} xⁿ · e^{t·x} · f_X(x) dx for continuous X 81
85 Now, evaluate the nth derivative at t = 0:
dⁿ/dtⁿ m_X(0) = Σ_{x_j ∈ supp(X)} (x_j)ⁿ · P(X = x_j) for discrete X,
dⁿ/dtⁿ m_X(0) = ∫_{−∞}^{+∞} xⁿ · f_X(x) dx for continuous X,
i.e. dⁿ/dtⁿ m_X(0) = E(Xⁿ) = µ′_n (see Definition 2.21(a) on slide 76) 82
86 Example: Let X be a continuous random variable with pdf
f_X(x) = 0 for x < 0, and λ · e^{−λ·x} for x ≥ 0
(exponential distribution with parameter λ > 0)
We have
m_X(t) = E[e^{t·X}] = ∫_{−∞}^{+∞} e^{t·x} · f_X(x) dx = ∫_{0}^{+∞} λ · e^{(t−λ)·x} dx = λ/(λ − t) for t < λ 83
87 It follows that
m′_X(t) = λ/(λ − t)² and m″_X(t) = 2λ/(λ − t)³,
and thus
m′_X(0) = E(X) = 1/λ and m″_X(0) = E(X²) = 2/λ²
Now: Important result on moment generating functions 84
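The moment-generating mechanism can be illustrated numerically: differentiating m_X(t) = λ/(λ − t) at t = 0 by finite differences should recover E(X) = 1/λ and E(X²) = 2/λ². A minimal Python sketch (the rate λ = 2 is an assumed illustration value):

```python
# mgf of the exponential distribution with an assumed rate lam = 2:
# m(t) = lam / (lam - t) for t < lam.
lam = 2.0

def m(t):
    return lam / (lam - t)

# Central finite differences approximate the first and second derivatives at t = 0,
# which by the moment-generating property equal E(X) and E(X^2).
h = 1e-4
m1 = (m(h) - m(-h)) / (2 * h)
m2 = (m(h) - 2 * m(0) + m(-h)) / h ** 2
```

With λ = 2 the approximations are close to E(X) = 0.5 and E(X²) = 0.5.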
88 Theorem 2.23: (Identification property)
Let X and Y be two random variables with densities f_X(·) and f_Y(·), respectively. Suppose that m_X(t) and m_Y(t) both exist and that m_X(t) = m_Y(t) for all t in the interval −h < t < h for some h > 0. Then the two cdfs F_X(·) and F_Y(·) are equal; that is, F_X(x) = F_Y(x) for all x.
Remarks: Theorem 2.23 states that there is a unique cdf F_X(x) for a given moment generating function m_X(t)
→ if we can find m_X(t) for X then, at least theoretically, we can find the distribution of X
We will make use of this property in Section 4 85
89 Example: Suppose that a random variable X has the moment generating function
m_X(t) = 1/(1 − t) for −1 < t < 1
Then the pdf of X is given by
f_X(x) = 0 for x < 0, and e^{−x} for x ≥ 0
(exponential distribution with parameter λ = 1) 86
90 2.4 Special Parametric Families of Univariate Distributions Up to now: General mathematical properties of arbitrary distributions Discrimination: discrete vs continuous distributions Consideration of the cdf F X (x) the discrete density or the pdf f X (x) expectations of the form E[g(X)] the moment generating function m X (t) 87
91 Central result: The distribution of a random variable X is (essentially) determined by f X (x) or F X (x) F X (x) can be determined by f X (x) (cf. slide 46) f X (x) can be determined by F X (x) (cf. slide 48) Question: How many different distributions are known to exist? 88
92 Answer: Infinitely many
But: In practice, there are some important parametric families of distributions that provide good models for representing real-world random phenomena
These families of distributions are described in detail in all textbooks on mathematical statistics (see e.g. Mosler & Schmid (2008), Mood et al. (1974)) 89
93 Important families of discrete distributions Bernoulli distribution Binomial distribution Geometric distribution Poisson distribution Important families of continuous distributions Uniform or rectangular distribution Exponential distribution Normal distribution 90
94 Remark: The most important family of distributions of all is the normal distribution
Definition 2.24: (Normal distribution)
A continuous random variable X is defined to be normally distributed with parameters µ ∈ R and σ² > 0, denoted by X ~ N(µ, σ²), if its pdf is given by
f_X(x) = (1/(√(2π) · σ)) · e^{−(1/2)·((x−µ)/σ)²}, x ∈ R. 91
95 [Figure: pdfs of the normal distributions N(0,1), N(5,1), N(5,3) and N(5,5)] 92
96 Remarks: The special normal distribution N(0, 1) is called the standard normal distribution, the pdf of which is denoted by ϕ(x)
The properties of, as well as the calculation rules for, normally distributed random variables are important prerequisites for this course (see Wilfling (2011), Section 3.4) 93
97 3. Joint and Conditional Distributions, Stochastic Independence Aim of this section: Multidimensional random variables (random vectors) (joint and marginal distributions) Stochastic (in)dependence and conditional distribution Multivariate normal distribution (definition, properties) Literature: Mood, Graybill, Boes (1974), Chapter IV, pp Wilfling (2011), Chapter 4 94
98 3.1 Joint and Marginal Distribution Now: Consider several random variables simultaneously Applications: Several economic applications Statistical inference 95
99 Definition 3.1: (Random vector)
Let X₁, ..., X_n be a set of n random variables each representing the same random experiment, i.e. X_i : Ω → R for i = 1, ..., n. Then X = (X₁, ..., X_n) is called an n-dimensional random variable or an n-dimensional random vector.
Remark: In the literature random vectors are often denoted by X = (X₁, ..., X_n) or more simply by X₁, ..., X_n 96
100 For n = 2 it is common practice to write X = (X, Y) or (X, Y) or X, Y
Realizations are denoted by small letters: x = (x₁, ..., x_n) ∈ Rⁿ or x = (x, y) ∈ R²
Now: Characterization of the probability distribution of the random vector X 97
101 Definition 3.2: (Joint cumulative distribution function)
Let X = (X₁, ..., X_n) be an n-dimensional random vector. The function defined by
F_{X₁,...,X_n} : Rⁿ → [0, 1]
F_{X₁,...,X_n}(x₁, ..., x_n) = P(X₁ ≤ x₁, X₂ ≤ x₂, ..., X_n ≤ x_n)
is called the joint cumulative distribution function of X.
Remark: Definition 3.2 applies to discrete as well as to continuous random variables X₁, ..., X_n 98
102 Some properties of the bivariate cdf (n = 2):
F_{X,Y}(x, y) is monotone increasing in x and y
lim_{x→−∞} F_{X,Y}(x, y) = 0
lim_{y→−∞} F_{X,Y}(x, y) = 0
lim_{x→+∞, y→+∞} F_{X,Y}(x, y) = 1
Remark: Analogous properties hold for the n-dimensional cdf F_{X₁,...,X_n}(x₁, ..., x_n) 99
103 Now: Joint discrete versus joint continuous random vectors
Definition 3.3: (Joint discrete random vector)
The random vector X = (X₁, ..., X_n) is defined to be a joint discrete random vector if it can assume only a finite (or a countably infinite) number of realizations x = (x₁, ..., x_n) such that
P(X₁ = x₁, X₂ = x₂, ..., X_n = x_n) > 0 and
Σ P(X₁ = x₁, X₂ = x₂, ..., X_n = x_n) = 1,
where the summation is over all possible realizations of X. 100
104 Definition 3.4: (Joint continuous random vector)
The random vector X = (X₁, ..., X_n) is defined to be a joint continuous random vector if and only if there exists a nonnegative function f_{X₁,...,X_n}(x₁, ..., x_n) such that
F_{X₁,...,X_n}(x₁, ..., x_n) = ∫_{−∞}^{x_n} ... ∫_{−∞}^{x₁} f_{X₁,...,X_n}(u₁, ..., u_n) du₁ ... du_n
for all (x₁, ..., x_n). The function f_{X₁,...,X_n} is defined to be a joint probability density function of X.
Example: Consider X = (X, Y) with joint pdf
f_{X,Y}(x, y) = x + y for (x, y) ∈ [0, 1] × [0, 1], and 0 elsewise 101
Joint pdf f_{X,Y}(x, y) [3D surface plot over the unit square]
102
The joint cdf can be obtained by
F_{X,Y}(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f_{X,Y}(u, v) du dv = ∫_0^y ∫_0^x (u + v) du dv = ...
= 0.5(x²y + xy²), for (x, y) ∈ [0, 1] × [0, 1]
= 0.5(x² + x), for (x, y) ∈ [0, 1] × [1, ∞)
= 0.5(y² + y), for (x, y) ∈ [1, ∞) × [0, 1]
= 1, for (x, y) ∈ [1, ∞) × [1, ∞)
(Proof: Class)
103
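The closed-form cdf on the unit square can be checked numerically (a Python sketch, not part of the original slides): a midpoint-rule double integral of the pdf over [0, x] × [0, y] should reproduce 0.5(x²y + xy²).

```python
def joint_pdf(x, y):
    """Joint pdf from the example: f(x,y) = x + y on [0,1]x[0,1], 0 elsewise."""
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def joint_cdf(x, y):
    """Closed-form joint cdf on [0,1]x[0,1] derived on this slide."""
    return 0.5 * (x**2 * y + x * y**2)

def cdf_numeric(x, y, n=400):
    """Midpoint-rule approximation of the double integral over [0,x]x[0,y]."""
    hx, hy = x / n, y / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += joint_pdf((i + 0.5) * hx, (j + 0.5) * hy)
    return total * hx * hy

print(round(joint_cdf(0.7, 0.4), 6))   # 0.154
print(round(cdf_numeric(0.7, 0.4), 6))
```

Since the integrand is linear, the midpoint rule is exact here up to floating-point error.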
Remarks:
If X = (X_1, ..., X_n) is a joint continuous random vector, then
∂^n F_{X_1,...,X_n}(x_1, ..., x_n) / (∂x_1 ... ∂x_n) = f_{X_1,...,X_n}(x_1, ..., x_n)
The volume under the joint pdf represents probabilities:
P(a_1^u < X_1 ≤ a_1^o, ..., a_n^u < X_n ≤ a_n^o) = ∫_{a_n^u}^{a_n^o} ... ∫_{a_1^u}^{a_1^o} f_{X_1,...,X_n}(u_1, ..., u_n) du_1 ... du_n
104
In this course:
Emphasis on joint continuous random vectors
Analogous results for joint discrete random vectors (see Mood, Graybill, Boes (1974), Chapter IV)

Now: Determination of the distribution of a single random variable X_i from the joint distribution of the random vector (X_1, ..., X_n) → marginal distribution
105
Definition 3.5: (Marginal distribution)
Let X = (X_1, ..., X_n) be a continuous random vector with joint cdf F_{X_1,...,X_n} and joint pdf f_{X_1,...,X_n}. Then
F_{X_1}(x_1) = F_{X_1,...,X_n}(x_1, +∞, +∞, ..., +∞, +∞)
F_{X_2}(x_2) = F_{X_1,...,X_n}(+∞, x_2, +∞, ..., +∞, +∞)
...
F_{X_n}(x_n) = F_{X_1,...,X_n}(+∞, +∞, +∞, ..., +∞, x_n)
are called marginal cdfs, while
106
f_{X_1}(x_1) = ∫_{−∞}^{+∞} ... ∫_{−∞}^{+∞} f_{X_1,...,X_n}(x_1, x_2, ..., x_n) dx_2 ... dx_n
f_{X_2}(x_2) = ∫_{−∞}^{+∞} ... ∫_{−∞}^{+∞} f_{X_1,...,X_n}(x_1, x_2, ..., x_n) dx_1 dx_3 ... dx_n
...
f_{X_n}(x_n) = ∫_{−∞}^{+∞} ... ∫_{−∞}^{+∞} f_{X_1,...,X_n}(x_1, x_2, ..., x_n) dx_1 dx_2 ... dx_{n−1}
are called marginal pdfs of the one-dimensional (univariate) random variables X_1, ..., X_n.
107
Example:
Consider the bivariate pdf
f_{X,Y}(x, y) = 40(x − 0.5)² y³ (3 − 2x − y), for (x, y) ∈ [0, 1] × [0, 1]; 0, elsewise
108
Bivariate pdf f_{X,Y}(x, y) [3D surface plot]
109
The marginal pdf of X obtains as
f_X(x) = ∫_0^1 40(x − 0.5)² y³ (3 − 2x − y) dy
= 40(x − 0.5)² ∫_0^1 (3y³ − 2xy³ − y⁴) dy
= 40(x − 0.5)² [ (3/4)y⁴ − (2x/4)y⁴ − (1/5)y⁵ ]_0^1
= 40(x − 0.5)² (3/4 − x/2 − 1/5)
= −20x³ + 42x² − 27x + 5.5
110
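The polynomial obtained above can be verified numerically (an illustrative Python sketch, not part of the slides): integrating y out of the joint pdf on a fine grid should match the closed form, and the marginal should integrate to 1.

```python
def f_joint(x, y):
    """Example pdf from the slides: 40(x-0.5)^2 y^3 (3 - 2x - y) on [0,1]^2."""
    return 40 * (x - 0.5)**2 * y**3 * (3 - 2*x - y)

def f_X(x):
    """Closed form of the marginal pdf of X derived above."""
    return -20*x**3 + 42*x**2 - 27*x + 5.5

def f_X_numeric(x, n=2000):
    """Midpoint-rule integration of the joint pdf over y in [0,1]."""
    h = 1.0 / n
    return sum(f_joint(x, (j + 0.5) * h) for j in range(n)) * h

print(round(f_X(0.2), 6))          # 1.62
print(round(f_X_numeric(0.2), 6))
```

The antiderivative −5x⁴ + 14x³ − 13.5x² + 5.5x evaluates to 1 at x = 1, confirming that f_X integrates to 1.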
Marginal pdf f_X(x) [plot]
111
The marginal pdf of Y obtains as
f_Y(y) = ∫_0^1 40(x − 0.5)² y³ (3 − 2x − y) dx
= 40y³ ∫_0^1 (x − 0.5)² (3 − 2x − y) dx
= −(10/3) y³ (y − 2)
112
Marginal pdf f_Y(y) [plot]
113
Remarks:
When considering the marginal instead of the joint distributions, we are faced with an information loss (the joint distribution uniquely determines all marginal distributions, but the converse does not hold in general)
Besides the respective univariate marginal distributions, there are also multivariate distributions which can be obtained from the joint distribution of X = (X_1, ..., X_n)
114
Example:
For n = 5 consider X = (X_1, ..., X_5) with joint pdf f_{X_1,...,X_5}. Then the marginal pdf of Z = (X_1, X_3, X_5) obtains as
f_{X_1,X_3,X_5}(x_1, x_3, x_5) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} f_{X_1,...,X_5}(x_1, x_2, x_3, x_4, x_5) dx_2 dx_4
(integrate out the irrelevant components)
115
3.2 Conditional Distribution and Stochastic Independence

Now: Distribution of a random variable X under the condition that another random variable Y has already taken on the realization y (conditional distribution of X given Y = y)
116
Definition 3.6: (Conditional distribution)
Let X = (X, Y) be a bivariate continuous random vector with joint pdf f_{X,Y}(x, y). The conditional density of X given Y = y is defined to be
f_{X|Y=y}(x) = f_{X,Y}(x, y) / f_Y(y).
Analogously, the conditional density of Y given X = x is defined to be
f_{Y|X=x}(y) = f_{X,Y}(x, y) / f_X(x).
117
Remark:
Conditional densities of random vectors are defined analogously, e.g.
f_{X_1,X_2,X_4 | X_3=x_3, X_5=x_5}(x_1, x_2, x_4) = f_{X_1,X_2,X_3,X_4,X_5}(x_1, x_2, x_3, x_4, x_5) / f_{X_3,X_5}(x_3, x_5)
118
Example:
Consider the bivariate pdf
f_{X,Y}(x, y) = 40(x − 0.5)² y³ (3 − 2x − y), for (x, y) ∈ [0, 1] × [0, 1]; 0, elsewise
with marginal pdf f_Y(y) = −(10/3) y³ (y − 2) (cf. Slides 112, 113)
119
It follows that
f_{X|Y=y}(x) = f_{X,Y}(x, y) / f_Y(y) = 40(x − 0.5)² y³ (3 − 2x − y) / [−(10/3) y³ (y − 2)] = 12(x − 0.5)² (3 − 2x − y) / (2 − y)
120
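A conditional density must integrate to 1 in x for every fixed y. This can be confirmed numerically for the density just derived (a Python sketch, not part of the slides):

```python
def f_cond(x, y):
    """Conditional density of X given Y=y derived on this slide."""
    return 12 * (x - 0.5)**2 * (3 - 2*x - y) / (2 - y)

def integral_over_x(y, n=2000):
    """Midpoint-rule integral of the conditional density over x in [0,1]."""
    h = 1.0 / n
    return sum(f_cond((i + 0.5) * h, y) for i in range(n)) * h

for y in (0.01, 0.5, 0.95):
    print(y, round(integral_over_x(y), 6))   # each close to 1.0
```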
Conditional pdf f_{X|Y=0.01}(x) of X given Y = 0.01 [plot]
121
Conditional pdf f_{X|Y=0.95}(x) of X given Y = 0.95 [plot]
122
Now: Combine the concepts joint distribution and conditional distribution to define the notion of stochastic independence (for two random variables first)

Definition 3.7: (Stochastic independence [I])
Let (X, Y) be a bivariate continuous random vector with joint pdf f_{X,Y}(x, y). X and Y are defined to be stochastically independent if and only if
f_{X,Y}(x, y) = f_X(x) · f_Y(y) for all x, y ∈ R.
123
Remarks:
Alternatively, stochastic independence can be defined via the cdfs: X and Y are stochastically independent if and only if
F_{X,Y}(x, y) = F_X(x) · F_Y(y) for all x, y ∈ R.
If X and Y are independent, we have
f_{X|Y=y}(x) = f_{X,Y}(x, y) / f_Y(y) = f_X(x) · f_Y(y) / f_Y(y) = f_X(x)
f_{Y|X=x}(y) = f_{X,Y}(x, y) / f_X(x) = f_X(x) · f_Y(y) / f_X(x) = f_Y(y)
If X and Y are independent and g and h are two continuous functions, then g(X) and h(Y) are also independent.
124
Now: Extension to n random variables

Definition 3.8: (Stochastic independence [II])
Let (X_1, ..., X_n) be a continuous random vector with joint pdf f_{X_1,...,X_n}(x_1, ..., x_n) and joint cdf F_{X_1,...,X_n}(x_1, ..., x_n). X_1, ..., X_n are defined to be stochastically independent if and only if, for all (x_1, ..., x_n) ∈ R^n,
f_{X_1,...,X_n}(x_1, ..., x_n) = f_{X_1}(x_1) · ... · f_{X_n}(x_n)
or
F_{X_1,...,X_n}(x_1, ..., x_n) = F_{X_1}(x_1) · ... · F_{X_n}(x_n).
125
Remarks:
For discrete random vectors we define: X_1, ..., X_n are stochastically independent if and only if, for all (x_1, ..., x_n) ∈ R^n,
P(X_1 = x_1, ..., X_n = x_n) = P(X_1 = x_1) · ... · P(X_n = x_n)
or
F_{X_1,...,X_n}(x_1, ..., x_n) = F_{X_1}(x_1) · ... · F_{X_n}(x_n)
In the case of independence, the joint distribution results from the marginal distributions
If X_1, ..., X_n are stochastically independent and g_1, ..., g_n are continuous functions, then Y_1 = g_1(X_1), ..., Y_n = g_n(X_n) are also stochastically independent
126
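The factorization criterion can be illustrated numerically with two bivariate pdfs on the unit square (illustrative densities, not from the slides): f(x, y) = 4xy factorizes into its marginals (2x)(2y), while f(x, y) = x + y does not.

```python
def marginal(joint, axis_val, which, n=1000):
    """Integrate the other coordinate out over [0,1] by the midpoint rule."""
    h = 1.0 / n
    if which == "x":   # marginal density of X at axis_val
        return sum(joint(axis_val, (j + 0.5) * h) for j in range(n)) * h
    return sum(joint((i + 0.5) * h, axis_val) for i in range(n)) * h

f_indep = lambda x, y: 4 * x * y   # factorizes: (2x)(2y) -> independent
f_dep   = lambda x, y: x + y       # does not factorize -> dependent

x, y = 0.3, 0.6
gap_indep = abs(f_indep(x, y) - marginal(f_indep, x, "x") * marginal(f_indep, y, "y"))
gap_dep   = abs(f_dep(x, y)   - marginal(f_dep, x, "x")   * marginal(f_dep, y, "y"))
print(gap_indep)   # essentially 0
print(gap_dep)     # clearly positive
```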
3.3 Expectation and Joint Moment Generating Functions

Now: Definition of the expectation of a function
g : R^n → R, (x_1, ..., x_n) ↦ g(x_1, ..., x_n)
of a continuous random vector X = (X_1, ..., X_n)
127
Definition 3.9: (Expectation of a function)
Let (X_1, ..., X_n) be a continuous random vector with joint pdf f_{X_1,...,X_n}(x_1, ..., x_n) and g : R^n → R a real-valued continuous function. The expectation of the function g of the random vector is defined to be
E[g(X_1, ..., X_n)] = ∫_{−∞}^{+∞} ... ∫_{−∞}^{+∞} g(x_1, ..., x_n) · f_{X_1,...,X_n}(x_1, ..., x_n) dx_1 ... dx_n.
128
Remarks:
For a discrete random vector (X_1, ..., X_n) the analogous definition is
E[g(X_1, ..., X_n)] = Σ g(x_1, ..., x_n) · P(X_1 = x_1, ..., X_n = x_n),
where the summation is over all realizations of the vector
Definition 3.9 includes the expectation of a univariate random variable X: Set n = 1 and g(x) = x →
E(X_1) ≡ E(X) = ∫_{−∞}^{+∞} x f_X(x) dx
Definition 3.9 includes the variance of X: Set n = 1 and g(x) = [x − E(X)]² →
Var(X_1) ≡ Var(X) = ∫_{−∞}^{+∞} [x − E(X)]² f_X(x) dx
129
Definition 3.9 includes the covariance of two variables: Set n = 2 and g(x_1, x_2) = [x_1 − E(X_1)] · [x_2 − E(X_2)] →
Cov(X_1, X_2) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} [x_1 − E(X_1)][x_2 − E(X_2)] f_{X_1,X_2}(x_1, x_2) dx_1 dx_2
Via the covariance we define the correlation coefficient:
Corr(X_1, X_2) = Cov(X_1, X_2) / √(Var(X_1) · Var(X_2))
General properties of expected values, variances, covariances and the correlation coefficient → Class
130
Now: Expectations and variances of random vectors

Definition 3.10: (Expected vector, covariance matrix)
Let X = (X_1, ..., X_n) be a random vector. The expected vector of X is defined to be
E(X) = (E(X_1), ..., E(X_n))'.
The covariance matrix of X is defined to be
Cov(X) =
[ Var(X_1)       Cov(X_1, X_2)  ...  Cov(X_1, X_n) ]
[ Cov(X_2, X_1)  Var(X_2)       ...  Cov(X_2, X_n) ]
[ ...                                              ]
[ Cov(X_n, X_1)  Cov(X_n, X_2)  ...  Var(X_n)      ]
131
Remark:
Obviously, the covariance matrix is symmetric by definition

Now: Expected vectors and covariance matrices under linear transformations of random vectors. Let
X = (X_1, ..., X_n) be an n-dimensional random vector
A be an (m × n) matrix of real numbers
b be an (m × 1) column vector of real numbers
132
Obviously: Y = AX + b is an (m × 1) random vector:
Y =
[ a_11 X_1 + a_12 X_2 + ... + a_1n X_n + b_1 ]
[ a_21 X_1 + a_22 X_2 + ... + a_2n X_n + b_2 ]
[ ... ]
[ a_m1 X_1 + a_m2 X_2 + ... + a_mn X_n + b_m ]
133
The expected vector of Y is given by
E(Y) =
[ a_11 E(X_1) + a_12 E(X_2) + ... + a_1n E(X_n) + b_1 ]
[ ... ]
[ a_m1 E(X_1) + a_m2 E(X_2) + ... + a_mn E(X_n) + b_m ]
= A E(X) + b
The covariance matrix of Y is given by
Cov(Y) =
[ Var(Y_1)       Cov(Y_1, Y_2)  ...  Cov(Y_1, Y_m) ]
[ Cov(Y_2, Y_1)  Var(Y_2)       ...  Cov(Y_2, Y_m) ]
[ ... ]
[ Cov(Y_m, Y_1)  Cov(Y_m, Y_2)  ...  Var(Y_m)      ]
= A Cov(X) A'
(Proof: Class)
134
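The rule Cov(AX + b) = A Cov(X) A' can be checked by simulation (a Python sketch with illustrative values for A, Σ and b, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])   # illustrative Cov(X)
A = np.array([[1.0, 2.0], [0.0, 3.0]])       # illustrative transformation
b = np.array([1.0, -1.0])

X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=200_000)
Y = X @ A.T + b                    # Y = AX + b, applied row-wise

cov_mc = np.cov(Y, rowvar=False)   # Monte-Carlo estimate of Cov(Y)
cov_th = A @ Sigma @ A.T           # theoretical A Cov(X) A'
print(np.round(cov_th, 2))
print(np.round(cov_mc, 2))
```

The shift b drops out of the covariance, exactly as in the univariate rule Var(aX + b) = a² Var(X).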
Remark:
Cf. the analogous results for univariate variables:
E(aX + b) = a E(X) + b
Var(aX + b) = a² Var(X)

Up to now: Expected values for unconditional distributions
Now: Expected values for conditional distributions (cf. Definition 3.6, Slide 117)
135
Definition 3.11: (Conditional expected value of a function)
Let (X, Y) be a continuous random vector with joint pdf f_{X,Y}(x, y) and let g : R² → R be a real-valued function. The conditional expected value of the function g given X = x is defined to be
E[g(X, Y) | X = x] = ∫_{−∞}^{+∞} g(x, y) · f_{Y|X}(y) dy.
136
Remarks:
An analogous definition applies to a discrete random vector (X, Y)
Definition 3.11 naturally extends to higher-dimensional distributions
For g(x, y) = y we obtain the special case E[g(X, Y) | X = x] = E(Y | X = x)
Note that E[g(X, Y) | X = x] is a function of x
137
Example:
Consider the joint pdf
f_{X,Y}(x, y) = x + y, for (x, y) ∈ [0, 1] × [0, 1]; 0, elsewise
The conditional distribution of Y given X = x is given by
f_{Y|X}(y) = (x + y) / (x + 0.5), for (x, y) ∈ [0, 1] × [0, 1]; 0, elsewise
For g(x, y) = y the conditional expectation is given as
E(Y | X = x) = ∫_0^1 y · (x + y)/(x + 0.5) dy = (1/(x + 0.5)) · (x/2 + 1/3)
138
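The closed form of E(Y | X = x) can be verified numerically (a Python sketch, not part of the slides), integrating y · f_{Y|X}(y) over [0, 1]:

```python
def cond_mean(x):
    """Closed form from this slide: E(Y | X = x) = (x/2 + 1/3) / (x + 0.5)."""
    return (x / 2 + 1.0 / 3) / (x + 0.5)

def cond_mean_numeric(x, n=2000):
    """Midpoint-rule integral of y * f_{Y|X}(y) = y (x + y)/(x + 0.5) over [0,1]."""
    h = 1.0 / n
    return sum((j + 0.5) * h * (x + (j + 0.5) * h) / (x + 0.5)
               for j in range(n)) * h

print(round(cond_mean(0.5), 6))          # 7/12 = 0.583333
print(round(cond_mean_numeric(0.5), 6))
```

Note how the conditional expectation is a function of the conditioning value x, as the previous slide emphasizes.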
Remarks:
Consider the function g(x, y) = g(y) (i.e. g does not depend on x)
Denote h(x) = E[g(Y) | X = x]
We calculate the unconditional expectation of the transformed variable h(X)
We have
139
E{E[g(Y) | X = x]} = E[h(X)]
= ∫_{−∞}^{+∞} h(x) f_X(x) dx
= ∫_{−∞}^{+∞} E[g(Y) | X = x] f_X(x) dx
= ∫_{−∞}^{+∞} [ ∫_{−∞}^{+∞} g(y) f_{Y|X}(y) dy ] f_X(x) dx
= ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(y) f_{Y|X}(y) f_X(x) dy dx
= ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(y) f_{X,Y}(x, y) dy dx
= E[g(Y)]
140
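The chain above (the law of iterated expectations) can be checked exactly on a small discrete example (a Python sketch with a hypothetical joint pmf, not from the slides):

```python
# Hypothetical joint pmf of (X, Y) on {0,1} x {1,2}; probabilities sum to 1
p = {(0, 1): 0.1, (0, 2): 0.3, (1, 1): 0.4, (1, 2): 0.2}

# Direct expectation E[Y]
ey_direct = sum(y * pr for (x, y), pr in p.items())

# E[Y | X = x] for each x, then averaged over the marginal of X
px = {x: sum(pr for (xx, y), pr in p.items() if xx == x) for x in (0, 1)}
cond = {x: sum(y * pr for (xx, y), pr in p.items() if xx == x) / px[x]
        for x in (0, 1)}
ey_iterated = sum(cond[x] * px[x] for x in (0, 1))

print(ey_direct)    # 1.5
print(ey_iterated)  # 1.5 as well
```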
Theorem 3.12:
Let (X, Y) be an arbitrary discrete or continuous random vector. Then
E[g(Y)] = E{E[g(Y) | X = x]}
and, in particular,
E[Y] = E{E[Y | X = x]}.

Now: Three important rules for conditional and unconditional expected values
141
Theorem 3.13:
Let (X, Y) be an arbitrary discrete or continuous random vector and g_1(·), g_2(·) two unidimensional functions. Then
1. E[g_1(Y) + g_2(Y) | X = x] = E[g_1(Y) | X = x] + E[g_2(Y) | X = x],
2. E[g_1(Y) · g_2(X) | X = x] = g_2(x) · E[g_1(Y) | X = x].
3. If X and Y are stochastically independent, we have
E[g_1(X) · g_2(Y)] = E[g_1(X)] · E[g_2(Y)].
142
Finally: Moment generating function for random vectors

Definition 3.14: (Joint moment generating function)
Let X = (X_1, ..., X_n) be an arbitrary discrete or continuous random vector. The joint moment generating function of X is defined to be
m_{X_1,...,X_n}(t_1, ..., t_n) = E[e^{t_1 X_1 + ... + t_n X_n}],
if this expectation exists for all t_1, ..., t_n with −h < t_j < h for an arbitrary value h > 0 and for all j = 1, ..., n.
143
Remarks:
Via the joint moment generating function m_{X_1,...,X_n}(t_1, ..., t_n) we can derive the following mathematical objects:
the marginal moment generating functions m_{X_1}(t_1), ..., m_{X_n}(t_n)
the moments of the marginal distributions
the so-called joint moments
144
Important result: (cf. Theorem 2.23, Slide 85)
For any given joint moment generating function m_{X_1,...,X_n}(t_1, ..., t_n) there exists a unique joint cdf F_{X_1,...,X_n}(x_1, ..., x_n)
145
3.4 The Multivariate Normal Distribution

Now: Extension of the univariate normal distribution

Definition 3.15: (Multivariate normal distribution)
Let X = (X_1, ..., X_n) be a continuous random vector. X is defined to have a multivariate normal distribution with parameters
µ = (µ_1, ..., µ_n)' and Σ = [ σ_1² ... σ_1n ; ... ; σ_n1 ... σ_n² ],
if for x = (x_1, ..., x_n) ∈ R^n its joint pdf is given by
f_X(x) = (2π)^{−n/2} [det(Σ)]^{−1/2} exp{ −(1/2) (x − µ)' Σ^{−1} (x − µ) }.
146
Remarks:
See Chiang (1984, p. 92) for a definition and the properties of the determinant det(A) of the matrix A
Notation: X ~ N(µ, Σ)
µ is a column vector with µ_1, ..., µ_n ∈ R
Σ is a regular, positive definite, symmetric (n × n) matrix
Role of the parameters: E(X) = µ and Cov(X) = Σ
147
Joint pdf of the multivariate standard normal distribution N(0, I_n):
φ(x) = (2π)^{−n/2} exp{ −(1/2) x'x }
Cf. the analogy to the univariate pdf in Definition 2.24, Slide 91

Properties of the N(µ, Σ) distribution:
Partial vectors (marginal distributions) of X also have multivariate normal distributions, i.e. if
X = (X_1', X_2')' ~ N( (µ_1', µ_2')', [ Σ_11 Σ_12 ; Σ_21 Σ_22 ] ),
then
X_1 ~ N(µ_1, Σ_11) and X_2 ~ N(µ_2, Σ_22)
148
Thus, all univariate variables of X = (X_1, ..., X_n) have univariate normal distributions:
X_1 ~ N(µ_1, σ_1²), X_2 ~ N(µ_2, σ_2²), ..., X_n ~ N(µ_n, σ_n²)
The conditional distributions are also (univariately or multivariately) normal:
X_1 | X_2 = x_2 ~ N( µ_1 + Σ_12 Σ_22^{−1} (x_2 − µ_2), Σ_11 − Σ_12 Σ_22^{−1} Σ_21 )
Linear transformations: Let A be an (m × n) matrix, b an (m × 1) vector of real numbers and X = (X_1, ..., X_n) ~ N(µ, Σ). Then
AX + b ~ N(Aµ + b, AΣA')
149
Example:
Consider X ~ N(µ, Σ) with µ = (0, 1)'. Find the distribution of Y = AX + b for given A and b. It follows that
Y ~ N(Aµ + b, AΣA').
In particular, Aµ + b = (3, 6)'.
150
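Computations of this kind are a one-liner with numpy. The sketch below uses µ = (0, 1)' from the example but illustrative values for A, Σ and b (they are assumptions, not the slide's own entries); they are chosen so that Aµ + b = (3, 6)' as on the slide.

```python
import numpy as np

mu = np.array([0.0, 1.0])                    # mean vector from the example
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])   # illustrative covariance matrix
A = np.array([[1.0, 3.0], [2.0, 5.0]])       # illustrative transformation
b = np.array([0.0, 1.0])                     # illustrative shift

mean_Y = A @ mu + b        # mean vector of Y = AX + b
cov_Y = A @ Sigma @ A.T    # covariance matrix of Y
print(mean_Y)              # [3. 6.]
print(cov_Y)
```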
Now: Consider the bivariate case (n = 2), i.e.
X = (X, Y)', E(X) = (µ_X, µ_Y)', Σ = [ σ_X² σ_XY ; σ_YX σ_Y² ]
We have
σ_XY = σ_YX = Cov(X, Y) = σ_X σ_Y · Corr(X, Y) = σ_X σ_Y ρ
The joint pdf follows from Definition 3.15 with n = 2:
f_{X,Y}(x, y) = 1 / (2π σ_X σ_Y √(1 − ρ²)) · exp{ −1/(2(1 − ρ²)) · [ (x − µ_X)²/σ_X² − 2ρ(x − µ_X)(y − µ_Y)/(σ_X σ_Y) + (y − µ_Y)²/σ_Y² ] }
(Derivation: Class)
151
f_{X,Y}(x, y) for µ_X = µ_Y = 0, σ_X = σ_Y = 1 and two different values of ρ [3D surface plots]
152-153
Remarks:
The marginal distributions are given by X ~ N(µ_X, σ_X²) and Y ~ N(µ_Y, σ_Y²)
Interesting result for the normal distribution: If (X, Y) has a bivariate normal distribution, then X and Y are independent if and only if ρ = Corr(X, Y) = 0
The conditional distributions are given by
X | Y = y ~ N( µ_X + ρ (σ_X/σ_Y) (y − µ_Y), σ_X² (1 − ρ²) )
Y | X = x ~ N( µ_Y + ρ (σ_Y/σ_X) (x − µ_X), σ_Y² (1 − ρ²) )
(Proof: Class)
154
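These conditional-distribution formulas can be checked by simulation (a Python sketch with illustrative parameter values, not from the slides): a bivariate normal pair is built from the regression representation Y = µ_Y + ρ(σ_Y/σ_X)(X − µ_X) + √(1 − ρ²) σ_Y Z, and the empirical mean of Y near a fixed X = x is compared with the theoretical conditional mean.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_x, mu_y, s_x, s_y, rho = 1.0, -2.0, 2.0, 0.5, 0.7   # illustrative values

n = 500_000
X = mu_x + s_x * rng.standard_normal(n)
Z = rng.standard_normal(n)
Y = mu_y + rho * (s_y / s_x) * (X - mu_x) + np.sqrt(1 - rho**2) * s_y * Z

# The construction reproduces Corr(X, Y) = rho and Var(Y) = s_y^2
print(round(float(np.corrcoef(X, Y)[0, 1]), 2))   # ~0.7
print(round(float(Y.std()), 2))                   # ~0.5

# Conditional mean near X = 2 vs. mu_y + rho*(s_y/s_x)*(2 - mu_x)
mask = np.abs(X - 2.0) < 0.05
cond_mean_mc = float(Y[mask].mean())
cond_mean_th = mu_y + rho * (s_y / s_x) * (2.0 - mu_x)
print(round(cond_mean_mc, 2), round(cond_mean_th, 3))
```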
4. Distributions of Functions of Random Variables

Setup:
Consider as given the joint distribution of X_1, ..., X_n (i.e. consider as given f_{X_1,...,X_n} and F_{X_1,...,X_n})
Consider k functions g_1 : R^n → R, ..., g_k : R^n → R
Find the joint distribution of the k random variables Y_1 = g_1(X_1, ..., X_n), ..., Y_k = g_k(X_1, ..., X_n) (i.e. find f_{Y_1,...,Y_k} and F_{Y_1,...,Y_k})
155
Example:
Consider as given X_1, ..., X_n with f_{X_1,...,X_n}
Consider the functions g_1(X_1, ..., X_n) = Σ_{i=1}^n X_i and g_2(X_1, ..., X_n) = (1/n) Σ_{i=1}^n X_i
Find f_{Y_1,Y_2} with Y_1 = Σ_{i=1}^n X_i and Y_2 = (1/n) Σ_{i=1}^n X_i

Remark:
From the joint distribution f_{Y_1,...,Y_k} we can derive the k marginal distributions f_{Y_1}, ..., f_{Y_k} (cf. Chapter 3, Slides 106, 107)
156
Aim of this chapter:
Techniques for finding the (marginal) distribution(s) of (Y_1, ..., Y_k)
157
4.1 Expectations of Functions of Random Variables

Simplification:
In a first step, we are not interested in the exact distributions, but merely in certain expected values of Y_1, ..., Y_k

Expectation two ways:
Consider as given the (continuous) random variables X_1, ..., X_n and the function g : R^n → R
Consider the random variable Y = g(X_1, ..., X_n) and find the expectation E[g(X_1, ..., X_n)]
158
Two ways of calculating E(Y):
E(Y) = ∫_{−∞}^{+∞} y f_Y(y) dy
or
E(Y) = ∫_{−∞}^{+∞} ... ∫_{−∞}^{+∞} g(x_1, ..., x_n) f_{X_1,...,X_n}(x_1, ..., x_n) dx_1 ... dx_n
(cf. Definition 3.9, Slide 128)
It can be proved that both ways of calculating E(Y) are equivalent → choose the most convenient calculation
159
Now: Calculation rules for expected values, variances, covariances of sums of random variables

Setting:
X_1, ..., X_n are given continuous or discrete random variables with joint density f_{X_1,...,X_n}
The (transforming) function g : R^n → R is given by g(x_1, ..., x_n) = Σ_{i=1}^n x_i
160
In a first step, find the expectation and the variance of
Y = g(X_1, ..., X_n) = Σ_{i=1}^n X_i

Theorem 4.1: (Expectation and variance of a sum)
For the given random variables X_1, ..., X_n we have
E( Σ_{i=1}^n X_i ) = Σ_{i=1}^n E(X_i)
and
Var( Σ_{i=1}^n X_i ) = Σ_{i=1}^n Var(X_i) + 2 Σ_{i=1}^n Σ_{j=i+1}^n Cov(X_i, X_j).
161
Implications:
For given constants a_1, ..., a_n ∈ R we have
E( Σ_{i=1}^n a_i X_i ) = Σ_{i=1}^n a_i E(X_i) (why?)
For two random variables X_1 and X_2 we have E(X_1 ± X_2) = E(X_1) ± E(X_2)
If X_1, ..., X_n are stochastically independent, it follows that Cov(X_i, X_j) = 0 for all i ≠ j and hence
Var( Σ_{i=1}^n X_i ) = Σ_{i=1}^n Var(X_i)
162
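Theorem 4.1 also holds as an exact identity in sample moments, which makes it easy to check numerically (a Python sketch, not part of the slides): for any data set, the sample variance of x1 + x2 equals the sum of the sample variances plus twice the sample covariance (using ddof = 0 throughout).

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.standard_normal(1000)
x2 = 0.5 * x1 + rng.standard_normal(1000)   # deliberately correlated with x1

lhs = np.var(x1 + x2)                                       # Var(X1 + X2)
rhs = np.var(x1) + np.var(x2) + 2 * np.cov(x1, x2, ddof=0)[0, 1]
print(lhs, rhs)   # identical up to floating-point error
```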
Now: Calculating the covariance of two sums of random variables

Theorem 4.2: (Covariance of two sums)
Let X_1, ..., X_n and Y_1, ..., Y_m be two sets of random variables and let a_1, ..., a_n and b_1, ..., b_m be two sets of constants. Then
Cov( Σ_{i=1}^n a_i X_i, Σ_{j=1}^m b_j Y_j ) = Σ_{i=1}^n Σ_{j=1}^m a_i b_j Cov(X_i, Y_j).
163
Implications:
The variance of a weighted sum of random variables is given by
Var( Σ_{i=1}^n a_i X_i ) = Cov( Σ_{i=1}^n a_i X_i, Σ_{j=1}^n a_j X_j )
= Σ_{i=1}^n Σ_{j=1}^n a_i a_j Cov(X_i, X_j)
= Σ_{i=1}^n a_i² Var(X_i) + Σ_{i=1}^n Σ_{j=1, j≠i}^n a_i a_j Cov(X_i, X_j)
= Σ_{i=1}^n a_i² Var(X_i) + 2 Σ_{i=1}^n Σ_{j=i+1}^n a_i a_j Cov(X_i, X_j)
164
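In matrix form, the derivation above says Var(a'X) = a' Cov(X) a. As with Theorem 4.1, this holds exactly in sample moments, so it can be verified on any data set (a Python sketch, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
# 1000 observations of a 3-dimensional vector with correlated components
data = rng.standard_normal((1000, 3)) @ np.array([[1.0, 0.4, 0.0],
                                                  [0.0, 1.0, 0.3],
                                                  [0.0, 0.0, 1.0]])
a = np.array([2.0, -1.0, 0.5])           # weights of the weighted sum

S = np.cov(data, rowvar=False, ddof=0)   # sample covariance matrix
lhs = np.var(data @ a)                   # sample Var of the weighted sum
rhs = a @ S @ a                          # quadratic form a' S a
print(lhs, rhs)   # identical up to floating-point error
```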
DECEMBER 7, 204 LECTURE 2 JOINT (BIVARIATE) DISTRIBUTIONS, MARGINAL DISTRIBUTIONS, INDEPENDENCE So far we have considered one random variable at a time. However, in economics we are typically interested
More informationWe introduce methods that are useful in:
Instructor: Shengyu Zhang Content Derived Distributions Covariance and Correlation Conditional Expectation and Variance Revisited Transforms Sum of a Random Number of Independent Random Variables more
More informationBivariate distributions
Bivariate distributions 3 th October 017 lecture based on Hogg Tanis Zimmerman: Probability and Statistical Inference (9th ed.) Bivariate Distributions of the Discrete Type The Correlation Coefficient
More informationRandom Variables. Cumulative Distribution Function (CDF) Amappingthattransformstheeventstotherealline.
Random Variables Amappingthattransformstheeventstotherealline. Example 1. Toss a fair coin. Define a random variable X where X is 1 if head appears and X is if tail appears. P (X =)=1/2 P (X =1)=1/2 Example
More informationSTAT Chapter 5 Continuous Distributions
STAT 270 - Chapter 5 Continuous Distributions June 27, 2012 Shirin Golchi () STAT270 June 27, 2012 1 / 59 Continuous rv s Definition: X is a continuous rv if it takes values in an interval, i.e., range
More informationLIST OF FORMULAS FOR STK1100 AND STK1110
LIST OF FORMULAS FOR STK1100 AND STK1110 (Version of 11. November 2015) 1. Probability Let A, B, A 1, A 2,..., B 1, B 2,... be events, that is, subsets of a sample space Ω. a) Axioms: A probability function
More informationAnalysis of Engineering and Scientific Data. Semester
Analysis of Engineering and Scientific Data Semester 1 2019 Sabrina Streipert s.streipert@uq.edu.au Example: Draw a random number from the interval of real numbers [1, 3]. Let X represent the number. Each
More informationRecitation 2: Probability
Recitation 2: Probability Colin White, Kenny Marino January 23, 2018 Outline Facts about sets Definitions and facts about probability Random Variables and Joint Distributions Characteristics of distributions
More informationp. 6-1 Continuous Random Variables p. 6-2
Continuous Random Variables Recall: For discrete random variables, only a finite or countably infinite number of possible values with positive probability (>). Often, there is interest in random variables
More informationStatistics, Data Analysis, and Simulation SS 2015
Statistics, Data Analysis, and Simulation SS 2015 08.128.730 Statistik, Datenanalyse und Simulation Dr. Michael O. Distler Mainz, 27. April 2015 Dr. Michael O. Distler
More informationRandom Variables and Expectations
Inside ECOOMICS Random Variables Introduction to Econometrics Random Variables and Expectations A random variable has an outcome that is determined by an experiment and takes on a numerical value. A procedure
More informationChapter 2: Random Variables
ECE54: Stochastic Signals and Systems Fall 28 Lecture 2 - September 3, 28 Dr. Salim El Rouayheb Scribe: Peiwen Tian, Lu Liu, Ghadir Ayache Chapter 2: Random Variables Example. Tossing a fair coin twice:
More informationContents 1. Contents
Contents 1 Contents 6 Distributions of Functions of Random Variables 2 6.1 Transformation of Discrete r.v.s............. 3 6.2 Method of Distribution Functions............. 6 6.3 Method of Transformations................
More informationPreliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com
1 School of Oriental and African Studies September 2015 Department of Economics Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com Gujarati D. Basic Econometrics, Appendix
More informationProbability Review. Gonzalo Mateos
Probability Review Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester gmateosb@ece.rochester.edu http://www.ece.rochester.edu/~gmateosb/ September 11, 2018 Introduction
More informationProbability Notes. Compiled by Paul J. Hurtado. Last Compiled: September 6, 2017
Probability Notes Compiled by Paul J. Hurtado Last Compiled: September 6, 2017 About These Notes These are course notes from a Probability course taught using An Introduction to Mathematical Statistics
More informationChapter 4. Chapter 4 sections
Chapter 4 sections 4.1 Expectation 4.2 Properties of Expectations 4.3 Variance 4.4 Moments 4.5 The Mean and the Median 4.6 Covariance and Correlation 4.7 Conditional Expectation SKIP: 4.8 Utility Expectation
More informationNorthwestern University Department of Electrical Engineering and Computer Science
Northwestern University Department of Electrical Engineering and Computer Science EECS 454: Modeling and Analysis of Communication Networks Spring 2008 Probability Review As discussed in Lecture 1, probability
More informationTheorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1
Chapter 2 Probability measures 1. Existence Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension to the generated σ-field Proof of Theorem 2.1. Let F 0 be
More informationWeek 12-13: Discrete Probability
Week 12-13: Discrete Probability November 21, 2018 1 Probability Space There are many problems about chances or possibilities, called probability in mathematics. When we roll two dice there are possible
More informationStat 5101 Notes: Algorithms
Stat 5101 Notes: Algorithms Charles J. Geyer January 22, 2016 Contents 1 Calculating an Expectation or a Probability 3 1.1 From a PMF........................... 3 1.2 From a PDF...........................
More information1 Probability theory. 2 Random variables and probability theory.
Probability theory Here we summarize some of the probability theory we need. If this is totally unfamiliar to you, you should look at one of the sources given in the readings. In essence, for the major
More informationLecture 1: Review on Probability and Statistics
STAT 516: Stochastic Modeling of Scientific Data Autumn 2018 Instructor: Yen-Chi Chen Lecture 1: Review on Probability and Statistics These notes are partially based on those of Mathias Drton. 1.1 Motivating
More informationIntroduction to Computational Finance and Financial Econometrics Probability Review - Part 2
You can t see this text! Introduction to Computational Finance and Financial Econometrics Probability Review - Part 2 Eric Zivot Spring 2015 Eric Zivot (Copyright 2015) Probability Review - Part 2 1 /
More informationMath-Stat-491-Fall2014-Notes-I
Math-Stat-491-Fall2014-Notes-I Hariharan Narayanan October 2, 2014 1 Introduction This writeup is intended to supplement material in the prescribed texts: Introduction to Probability Models, 10th Edition,
More informationProbability. Table of contents
Probability Table of contents 1. Important definitions 2. Distributions 3. Discrete distributions 4. Continuous distributions 5. The Normal distribution 6. Multivariate random variables 7. Other continuous
More informationAppendix A : Introduction to Probability and stochastic processes
A-1 Mathematical methods in communication July 5th, 2009 Appendix A : Introduction to Probability and stochastic processes Lecturer: Haim Permuter Scribe: Shai Shapira and Uri Livnat The probability of
More information1 Exercises for lecture 1
1 Exercises for lecture 1 Exercise 1 a) Show that if F is symmetric with respect to µ, and E( X )
More informationLecture 25: Review. Statistics 104. April 23, Colin Rundel
Lecture 25: Review Statistics 104 Colin Rundel April 23, 2012 Joint CDF F (x, y) = P [X x, Y y] = P [(X, Y ) lies south-west of the point (x, y)] Y (x,y) X Statistics 104 (Colin Rundel) Lecture 25 April
More informationChapter 5 continued. Chapter 5 sections
Chapter 5 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions
More informationJoint Probability Distributions and Random Samples (Devore Chapter Five)
Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01: Probability and Statistics for Engineers Spring 2013 Contents 1 Joint Probability Distributions 2 1.1 Two Discrete
More informationLectures on Elementary Probability. William G. Faris
Lectures on Elementary Probability William G. Faris February 22, 2002 2 Contents 1 Combinatorics 5 1.1 Factorials and binomial coefficients................. 5 1.2 Sampling with replacement.....................
More informationCourse: ESO-209 Home Work: 1 Instructor: Debasis Kundu
Home Work: 1 1. Describe the sample space when a coin is tossed (a) once, (b) three times, (c) n times, (d) an infinite number of times. 2. A coin is tossed until for the first time the same result appear
More informationFundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner
Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization
More informationPROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability
More informationIf g is also continuous and strictly increasing on J, we may apply the strictly increasing inverse function g 1 to this inequality to get
18:2 1/24/2 TOPIC. Inequalities; measures of spread. This lecture explores the implications of Jensen s inequality for g-means in general, and for harmonic, geometric, arithmetic, and related means in
More informationECE 4400:693 - Information Theory
ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43 Outline 1 Review: Entropy of discrete RVs 2 Differential
More informationECE531: Principles of Detection and Estimation Course Introduction
ECE531: Principles of Detection and Estimation Course Introduction D. Richard Brown III WPI 22-January-2009 WPI D. Richard Brown III 22-January-2009 1 / 37 Lecture 1 Major Topics 1. Web page. 2. Syllabus
More information18.440: Lecture 28 Lectures Review
18.440: Lecture 28 Lectures 18-27 Review Scott Sheffield MIT Outline Outline It s the coins, stupid Much of what we have done in this course can be motivated by the i.i.d. sequence X i where each X i is
More informationIntroduction to Statistical Inference Self-study
Introduction to Statistical Inference Self-study Contents Definition, sample space The fundamental object in probability is a nonempty sample space Ω. An event is a subset A Ω. Definition, σ-algebra A
More informationRandom variables. DS GA 1002 Probability and Statistics for Data Science.
Random variables DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Motivation Random variables model numerical quantities
More information