Probability. Table of contents


1 Probability: Table of contents
1. Important definitions
2. Distributions
3. Discrete distributions
4. Continuous distributions
5. The Normal distribution
6. Multivariate random variables
7. Other continuous distributions

L. Trapani, MSc Induction - Probability

2 Overview
After having examined descriptive statistics in TN 2, we are now ready to tackle the foundations of probability and random variables, covering the tools that are necessary for regression.

3 1 Probability: some important definitions
The following definitions are important since they form the vocabulary that is commonly used in statistics.

4 Definition: outcomes
The first important notion that needs to be introduced is that of outcome, which is the building block for probability theory. Any experiment can produce a certain number of results, which will be termed the outcomes of the experiment. Examples can be very simple or very complicated:
tossing a coin produces the two outcomes {H, T}, head or tail
rolling a die can result in 6 possible outcomes, namely {1, 2, 3, 4, 5, 6}
the random extraction of a real number (using e.g. Excel) in the interval [0,1] can result in infinitely many outcomes

5 Definition: sample space
The set of all possible outcomes is called the sample space. In the previous examples, the sample spaces are, respectively:
the set {H, T}, which contains only two elements
the set {1, 2, 3, 4, 5, 6}, which contains 6 elements
the interval [0,1], which contains an infinite number of elements

6 Definition: events
Events are combinations of elementary outcomes or, to phrase this alternatively, combinations of elements in the sample space of an experiment. It makes sense to consider events: consider the case of rolling a die
one could be interested in betting on a single outcome, i.e. one may wish to bet on 4 coming out
but it is also possible that one may wish to bet on the more complicated event that an even number comes out
this would mean betting on the subset {2, 4, 6} of the sample space

7 Thus, as said, events are combinations of elementary outcomes, which enable the statistician to work out the probability associated with all possible results of an experiment, not merely the elementary outcomes. Other than all the possible combinations of the elementary outcomes, there are two special events that are considered in probability theory:
the so-called null event, represented as an empty set: this means that as a result of the experiment nothing happens (roughly speaking, you roll the die and the die disappears)
the sample space itself is an event: it means that as a result of the experiment something happens (rather obviously, this event happens for sure)

8 The collection of all possible events (including the sample space itself and the null event) is also referred to as the σ-algebra of the experiment. Probabilities will be calculated with respect to events. Note that identifying all possible events could be tedious. In particular, it can be proven that if the number of elementary outcomes is n, then the total number of possible events is 2^n:
there are 4 events when one flips a coin: {nothing happens}, {head}, {tail} and {either head or tail}
there are as many as 64 events when one rolls a die
there is an infinite number of events when sampling randomly from the interval [0,1]
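The count 2^n can be checked directly by enumerating every subset of the sample space. A minimal Python sketch (the notes use Excel; the function name `all_events` is just an illustrative choice):

```python
from itertools import combinations

def all_events(sample_space):
    """Enumerate every event, i.e. every subset of the sample space,
    including the null event (empty set) and the sample space itself."""
    outcomes = list(sample_space)
    events = []
    for size in range(len(outcomes) + 1):
        events.extend(combinations(outcomes, size))
    return events

coin = all_events({"H", "T"})
die = all_events({1, 2, 3, 4, 5, 6})
print(len(coin))  # 4 events: {}, {H}, {T}, {H, T}
print(len(die))   # 2^6 = 64 events
```

The four coin events listed on the slide correspond exactly to the four subsets of {H, T}.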

9 Definition: random variable
Before defining probabilities it is necessary to introduce a useful, albeit rather abstract, notion: the concept of random variable. Formally, let Σ denote the σ-algebra of an experiment and let w denote one event in Σ. Then we can define the random variable X as a function that is fed with an event in Σ and returns a real number; formally
X : Σ → ℝ, with X(w) ∈ ℝ for all w ∈ Σ

10 Much less rigorously, but more intuitively, a random variable is a device that sticks a bar-code onto each and every event:
all events in Σ must be processed by the random variable
each event will have its own unique identification code
the notion of random variable is formal, but convenient: instead of describing events verbally, e.g. "I flip a coin and I get either head or tail", it provides a concise representation, by means of a number, of each event

11 Definition: discrete versus continuous random variables
The examples discussed above (coin, die, random number generator) differ in one important point: the dimension of the sample space (and, therefore, the dimension of the σ-algebra). When the number of elements in the sample space (and thus in the σ-algebra) is either finite or at least countable, the corresponding random variable is said to be a discrete random variable. Otherwise, if the elements in the sample space are not countable (not merely infinite, but so many that they cannot even be counted), the random variable is referred to as a continuous random variable.

12 Definition: probability
Consider an experiment and its associated random variable X; it is not important whether X is continuous or discrete, but for simplicity assume X is discrete. Then the function P(X=x) is the probability that X takes the value x, or the likelihood that the event represented by x happens, provided that:
P(X=x) is a number in the interval [0,1]
the sum of the probabilities across all outcomes is 1; this corresponds to the sample space, i.e. the event "something happens", and something should happen for sure
and some other higher-level conditions hold

13 Roughly speaking, to calculate the probability of an event (which is ultimately what one wants to do) it is necessary to start from the outcomes, define events and random variables, then calculate P. The expression P(X=x)=0.35, for example, reads "the chance that X takes the value x, and thus the chance that the event represented by x happens, is 35%".

14 2 Distributions
Distributions are another important building block in order to compute probabilities.

15 Cumulative distribution functions

16 Exhibit 1: Cumulative probability distribution and the additivity of probability (the probability of sales growing between 20 and 30% is 34%). [Chart: cumulative probability against sales growth, 0% to 50%]

17 Cumulative distribution functions
We can define the cumulative distribution function (CDF; also known as the distribution function) of a random variable X as the probability that, in a given random experiment, the random variable X will take a value (also known as a realization) which is less than or equal to x. Formally, the CDF is a function F such that F(x) = P[X ≤ x]. Thus, one should expect that F(x) takes values in the interval [0,1]. Note that here x indicates values on the x-axis (see Exhibit 1).

18 The additivity of probability
The CDF can never decrease, because a decrease would imply negative probabilities, and probabilities are measured on a [0,1] scale. Consider the following example, illustrated in Exhibit 1:
you estimate that sales of a given product have a 50% probability of growing by 20% or less over the next year
then the probability of sales growing by 30% or less cannot be lower than 50%

19 The additivity of probability
This is an intuitive consequence of the additivity of probability: the probability of a sales growth of 30% or less equals the probability of sales growing at 20% or less plus the probability of sales growing between 20 and 30%.

20 Formally, this means the following:
let b be a real number, and a be another real number such that a < b
then the event X ≤ b can be decomposed as {X ≤ a OR a < X ≤ b}
intuition leads to
P[X ≤ b] = P[X ≤ a] + P[a < X ≤ b]
and therefore
F(b) = F(a) + P[a < X ≤ b]
P[a < X ≤ b] = F(b) − F(a)
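The decomposition can be checked on a simple discrete example; a quick Python sketch (not part of the original notes) using the CDF of a fair die:

```python
from fractions import Fraction

def F(x):
    """CDF of a fair die: F(x) = P[X <= x]."""
    return Fraction(min(max(int(x), 0), 6), 6)

# P[2 < X <= 4] = F(4) - F(2) = 4/6 - 2/6 = 1/3
p = F(4) - F(2)
print(p)  # 1/3
```

Using exact fractions makes the additivity identity hold with no rounding.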

21 Exhibit 1 (repeated): Cumulative probability distribution and the additivity of probability (the probability of sales growing between 20 and 30% is 34%).

22 3 Discrete distributions. Moments
The general notion of CDF will be spelt out more rigorously, by making a distinction between discrete and continuous random variables. The binomial distribution is used to illustrate the features of the CDF of a discrete random variable, and also to introduce the concepts of expected value, variance and standard deviation of a discrete probability distribution. The binomial distribution has a large number of applications in business, especially finance (binomial models for the pricing of options and for investment valuation).

23 Discrete distributions
We have been using discrete random variables with the coin-tossing and die-rolling examples. We know that a discrete random variable can take only a finite (or countably infinite) number of values. Each of the possible values can be attributed a probability measure, known as the probability mass function (PMF). The probabilities attributed to all the possible outcomes must clearly add up to one, because probability is measured on a [0,1] scale.

24 Binomial distribution
This distribution calculates the probabilities of the possible outcomes in problems:
with a fixed number of tests or trials (e.g. we flip a coin a few times)
when the possible outcome of a single trial can only be success (X = 1) or failure (X = 0)
when trials are independent (uncorrelated)
and when the probability of success p = Prob[X = 1] is constant throughout the experiment

25 The binomial distribution is a parametric distribution, with two parameters:
n = number of trials
p = probability of success; in the simple case of tossing a fair coin, P[X = 1 (head)] = 0.5
As a preliminary example, consider a binomial random variable with parameters (n = 10, p = 0.5), as visualized in Exhibits 2, 3 and 4. Probabilities were computed using the BINOMDIST Excel function.
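The BINOMDIST computation can be reproduced outside Excel; a minimal Python sketch (standard library only) of the binomial PMF and CDF for the (n = 10, p = 0.5) example:

```python
from math import comb

def binom_pmf(k, n, p):
    """P[X = k]: probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P[X <= k]: cumulative probability of k successes or fewer."""
    return sum(binom_pmf(j, n, p) for j in range(k + 1))

# Fair coin tossed 10 times, as in Exhibit 2
print(round(binom_pmf(6, 10, 0.5), 3))  # 0.205
print(round(binom_cdf(6, 10, 0.5), 3))  # 0.828
```

The printed values match the 20.5% and 82.8% entries used later when relating the PMF to the CDF.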

26 Exhibit 2: Probability mass function and cumulative probability of tossing a fair coin 10 times (n = 10, p = 50%)
No of heads  Prob mass  Cumulative prob
0            0.1%       0.1%
1            1.0%       1.1%
2            4.4%       5.5%
3            11.7%      17.2%
4            20.5%      37.7%
5            24.6%      62.3%
6            20.5%      82.8%
7            11.7%      94.5%
8            4.4%       98.9%
9            1.0%       99.9%
10           0.1%       100.0%

27 Exhibit 3: Binomial probability mass function of tossing a fair coin 10 times (n = 10, p = 50%). [Bar chart: probability, 0% to 30%, against number of successes]

28 Exhibit 4: Cumulative probability of tossing a fair coin 10 times (n = 10, p = 50%). [Step chart: cumulative probability, 0% to 100%, against number of successes]

29 Observe the relationship between the CDF (Exhibit 4) and the PMF (Exhibit 3): they are completely equivalent, in that each can be recovered from the other by a one-to-one transformation. Consider e.g. the probability that X = 6:
this can be calculated immediately from the PMF, and the table in Exhibit 2 shows that P[X = 6] = 20.5%
however, exactly the same result could be obtained by employing the CDF:
P[X = 6] = P[X ≤ 6] − P[X ≤ 5] = F(6) − F(5) = 82.8% − 62.3% = 20.5%

30 It is interesting to note what the CDF looks like: its graph is not continuous; it has discontinuities at each event, and the size of each jump is the probability that the corresponding event will take place. This is a property of discrete random variables: it can be said that a discrete random variable is a random variable whose CDF is piecewise continuous (actually, piecewise flat) with jumps corresponding to each event. Also, note that, as previously observed:
the CDF = 0 on the left of the lowest possible outcome
the CDF = 1 on the right of the highest possible outcome

31 Binomial distribution
We can now study some features of the binomial distribution. We can see that, with p = 0.5, the probability mass function is perfectly symmetrical around the mean (see Exhibit 3). However, the binomial distribution can also be used to model random variables where p ≠ 0.5:
the case p = 0.5 is just one possible example; it is a good representation of cases such as flipping a fair coin
however, e.g. in management applications it is often the case that p need not be 0.5; or think of finance, where you buy 10 shares and all you are interested in is whether you make or lose money on each

32 Binomial distribution
In Exhibits 5 and 6 the probabilities of success are respectively p = 0.4 and p = 0.7; this generates skewed mass functions.

33 Exhibit 5: Binomial probability mass function for (n = 10, p = 40%). [Bar chart: probability, 0% to 30%, against number of successes]

34 Exhibit 6: Binomial probability mass function for (n = 10, p = 70%). [Bar chart: probability, 0% to 30%, against number of successes]

35 Moments of a discrete random variable
In what follows, we shall give some very general definitions that apply to all possible discrete random variables, and subsequently apply them to the binomial distribution. Formally, consider a random variable which can take the values x_1, …, x_n; we define
uncentered moments of order k, denoted as E[X^k], as
E[X^k] = Σ_{j=1}^n p(x_j) x_j^k
centered (around the mean μ) moments of order k, denoted as E[(X − μ)^k], as
E[(X − μ)^k] = Σ_{j=1}^n p(x_j) (x_j − μ)^k

36 Thus, moments are weighted averages of X^k, where the weights are given by the probability that the random variable X takes the value x_j. Therefore, moments are similar to descriptive statistics: think of the centered moment of order k = 2; this uses squared deviations (x_j − μ)^2, and thus it is possible to make a parallel with the sample variance. However, whilst the sample variance does not weigh the terms (x_j − μ)^2, moments attach a small weight (i.e. a small probability) to unlikely events and a bigger weight to more likely events.

37 Expected value of a discrete random variable
The expected value is defined as the uncentered moment of order k = 1, and the equation is
E[X] = Σ_{j=1}^n p(x_j) x_j

38 The expected value is the analogue of the mean; only, in this case, each event is weighted by its probability. Note that the expected value need not be one of the values taken by the random variable, and thus it should not be interpreted as the most likely value to come out of the experiment. This can be seen if one calculates the expected outcome from rolling a die, which is 3.5 (a number that can obviously never come out). We have done together the exercise based on the table of events X = 1, X = 2, X = 3 and their probabilities.

39 Possible interpretations are:
if each event is viewed as a mass point whose weight is given by the probability, then the expected value is the equilibrium point of the probability distribution
the expected value is a rational prediction of the outcome of an experiment

40 Variance of a discrete random variable
The variance of a discrete random variable is the centered moment of order 2. It is defined as the weighted average of the squared deviations of the possible n outcomes from their expected value:
Var(X) = E[(X − μ)^2] = Σ_{j=1}^n p(x_j) (x_j − μ)^2

41 Some algebra leads to the following (quite useful and widely used) alternative formulation in terms of the expected value and the second-order uncentered moment:
Var(X) = E[(X − μ)^2]
= E[X^2 − 2μX + μ^2]
= E[X^2] − 2μE[X] + E[μ^2]
= E[X^2] − μ^2
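The shortcut Var(X) = E[X^2] − μ^2 can be checked against the defining formula on the fair-die example; a small Python sketch:

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)

mu = sum(p * x for x in outcomes)                        # E[X] = 7/2
var_centered = sum(p * (x - mu)**2 for x in outcomes)    # E[(X - mu)^2]
var_shortcut = sum(p * x**2 for x in outcomes) - mu**2   # E[X^2] - mu^2

print(var_centered, var_shortcut)  # both 35/12
```

Both routes give exactly 35/12, confirming the algebra above.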

42 Last, as in the case of the variance as a descriptive statistic, it is common to employ the standard deviation as well, defined as
σ = √Var(X)

43 The variance is the analogue of the sample variance; only, in this case, each event is weighted by its probability, as in the expected value case. Note that the variance is always non-negative; in particular, the variance is equal to zero if there is no deviation from the mean, i.e. if there is only one event that occurs for sure (with probability one). The variance is interpreted as the degree of dispersion around the mean: the smaller the variance, the less likely extreme events are to happen. Thus, the variance is customarily employed as a measure of risk/uncertainty.

44 Linear transformations
Main question: what are the expected value and variance of aX + b?

45 From the formula that defines the expected value, it is possible to observe that the expected value of a linear transformation of a random variable is the linear transformation of the expected value, i.e.
E[aX + b] = Σ_{j=1}^n p(x_j)(a x_j + b) = a Σ_{j=1}^n p(x_j) x_j + b Σ_{j=1}^n p(x_j) = aE[X] + b

46 Also, the following useful formula for the variance of a linear transformation of a random variable can be proved:
Var(aX + b) = a^2 Var(X) for all a, b
This formula states that the variance increases quadratically with changes of scale of magnitude a, and that it is invariant to shifts of any size b. Think of the following intuitive argument:
(1) when one buys 10 shares, one's risk is 100 times as much (don't put all your eggs in just one basket)
(2) adding an asset with no risk at all cannot make your portfolio more or less risky
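Both rules can be verified numerically on the fair die; a Python sketch with an arbitrary (illustrative) scale a = 10 and shift b = 3:

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)

def E(g):
    """Expected value of g(X) for a fair die."""
    return sum(p * g(x) for x in outcomes)

mu = E(lambda x: x)
var = E(lambda x: (x - mu)**2)

a, b = 10, 3  # arbitrary scale and shift
mu_t = E(lambda x: a*x + b)
var_t = E(lambda x: (a*x + b - mu_t)**2)

print(mu_t == a*mu + b)     # True: E[aX + b] = aE[X] + b
print(var_t == a**2 * var)  # True: shift b drops out, scale enters squared
```

With a = 10 the variance is indeed 100 times as large, matching the 10-shares intuition above.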

47 Higher order moments
Two frequently employed centered moments are
the skewness, which is the centered moment of order 3:
E[(X − μ)^3] = Σ_{j=1}^n p(x_j) (x_j − μ)^3
the kurtosis, which is the centered moment of order 4:
E[(X − μ)^4] = Σ_{j=1}^n p(x_j) (x_j − μ)^4

48 The skewness is commonly interpreted as a measure of asymmetry of the distribution:
if the distribution is symmetric, the skewness is equal to zero (though zero skewness does not by itself guarantee symmetry)
a positive skewness implies that the right tail of the distribution is longer than the left tail or, in more intuitive terms, that extreme positive events are more likely than extreme negative events
likewise, a negative skewness entails a longer left tail

49 The kurtosis is interpreted as a measure of the heaviness of the tails or, in more intuitive terms, as a measure of the impact of extreme events. Note that the kurtosis is never negative:
when the kurtosis is close to zero, all events are clustered together around the mean: there are no extreme events
the larger the kurtosis, the larger the likelihood of significantly extreme events (no matter whether these are positive or negative)
thus, kurtosis is interpreted as another possible measure of risk

50 Quite often, these moments are rescaled by the variance, being defined as
Skewness: E[((X − μ)/σ)^3] = (1 / [Var(X)]^{3/2}) Σ_{j=1}^n p(x_j) (x_j − μ)^3
Kurtosis: E[((X − μ)/σ)^4] = (1 / [Var(X)]^2) Σ_{j=1}^n p(x_j) (x_j − μ)^4
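As a sanity check, the rescaled moments can be computed directly from a PMF; a Python sketch for the binomial (n = 10, p = 0.5) of Exhibit 3, whose symmetry should give zero skewness:

```python
from math import comb, sqrt

n, p = 10, 0.5
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mu = sum(pk * k for k, pk in enumerate(pmf))
var = sum(pk * (k - mu)**2 for k, pk in enumerate(pmf))
sigma = sqrt(var)

skew = sum(pk * ((k - mu) / sigma)**3 for k, pk in enumerate(pmf))
kurt = sum(pk * ((k - mu) / sigma)**4 for k, pk in enumerate(pmf))

print(skew)  # approximately 0: the p = 0.5 case is symmetric
print(kurt)  # approximately 2.8: slightly below the normal benchmark of 3
```

Repeating this with p = 0.4 or p = 0.7 (Exhibits 5 and 6) produces nonzero skewness, positive and negative respectively.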

51 4 Continuous random variables. Probability density functions
In this section we shall examine how probability is defined for continuous random variables. This is an important issue, with applications in finance. Whilst the mathematics can be slightly more complicated than in the discrete random variable case, continuous random variables are used with extreme frequency and arise very often indeed in statistics and econometrics.

52 Continuous probability distributions
Exhibit 7 shows the CDF of a continuous random variable (it is a standard normal distribution) defined over the real line (the x-axis).

53 Exhibit 7: Continuous probability distribution. The numbers for this figure are calculated using the Excel function NORMSDIST.

54 It is interesting to note that in this case the CDF is indeed continuous. This is a property of continuous random variables: it can be said that a continuous random variable is a random variable whose CDF is continuous, with no jumps. Note that the probability of X being smaller than or equal to a value x increases continuously as x moves from −∞ to +∞. Note also that
lim_{x→+∞} F(x) = 1
lim_{x→−∞} F(x) = 0

55 The CDF has the same use and interpretation as in the discrete case. It could be interesting to evaluate the probability that the random variable X is equal to a value x, i.e. P[X = x]. This is equivalent to finding out the probability mass function, and it can be done by using the CDF. In particular, try to calculate P[X = 1]; since the CDF is continuous, this would be given by
P[X = 1] = F(1) − F(1) = 0

56 Thus, the probability that the continuous random variable X is equal to a single value is always zero, i.e. P[X = x] = 0 for all x. We must recall that the real line (the x-axis) has an infinite number of points in any interval, no matter how small. Therefore, as an intuitive argument, it is impossible to attribute a positive probability value to any single point, say to X = 1.51 (the sum of such probabilities would be infinite). Therefore, continuous distributions do not have a probability mass function. We can of course attribute a probability measure to intervals, such as from 1.50 to 1.60.

57 Probability density function
A widely employed notion is that of the probability density function (PDF). The PDF has a one-to-one correspondence with the CDF, and therefore it serves the same purpose and contains the same information. Formally, the PDF of a random variable X is a function f(x) defined as
f(x) = (d/dx) F(x)

58 Thus (once again formally), the PDF contains the same information as the CDF. In order to see this, recall that the reverse operation of differentiation is integration. Thus, the probability that the random variable X lies between the real numbers a and b, with a < b, is equivalently calculated as
P[a < X < b] = F(b) − F(a)
P[a < X < b] = ∫_a^b f(x) dx
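The equivalence of the two routes can be illustrated numerically for the standard normal; a Python sketch (the midpoint-rule grid size is an arbitrary illustrative choice) comparing F(b) − F(a) with the integral of the PDF:

```python
from math import erf, exp, pi, sqrt

def phi(x):
    """Standard normal PDF."""
    return exp(-x**2 / 2) / sqrt(2 * pi)

def Phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

a, b = -1.0, 1.0
by_cdf = Phi(b) - Phi(a)

# Midpoint-rule approximation of the integral of the PDF over [a, b]
N = 100_000
h = (b - a) / N
by_integral = sum(phi(a + (i + 0.5) * h) for i in range(N)) * h

print(round(by_cdf, 4))       # 0.6827
print(round(by_integral, 4))  # 0.6827
```

Both give roughly 68.3%, the probability of falling within one standard deviation of the mean.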

59 Exhibit 8: Probability density function of the standard normal distribution (μ = 0, σ = 1). [The shaded area under the whole curve equals 1]

60 The integral of the PDF also equals the area under the curve. It is therefore common practice to visualize probabilities by shading the relevant area underneath the PDF. Recall that probability is measured from zero to one [0, 1]. Formally, this means that in order for f(x) to be a PDF it must hold that
∫_{−∞}^{+∞} f(x) dx = 1
This implies that the area under the whole probability density function must equal one, as visualized in Exhibit 8.

61 It should be emphasized that the function f(x) does NOT return the probability that the random variable X is equal to x: as we know, it holds that P[X = x] = 0 for all possible x. The PDF is merely a tool to compute the probability that X belongs to an interval; such a calculation is performed by means of integrals.

62 Moments of a continuous random variable
Just like in the discrete case, moments play a pivotal role here as well. The interpretation of moments is the same as in the discrete random variable case, albeit the formulas differ:
uncentered moments of order k are defined as
E[X^k] = ∫_{−∞}^{+∞} x^k f(x) dx
centered moments of order k are defined as
E[(X − μ)^k] = ∫_{−∞}^{+∞} (x − μ)^k f(x) dx

63 The following moments are commonly employed:
Expected value: μ = E[X] = ∫_{−∞}^{+∞} x f(x) dx
Variance: σ^2 = E[(X − μ)^2] = ∫_{−∞}^{+∞} (x − μ)^2 f(x) dx

64 Skewness: E[((X − μ)/σ)^3] = ∫_{−∞}^{+∞} ((x − μ)/σ)^3 f(x) dx
Kurtosis: E[((X − μ)/σ)^4] = ∫_{−∞}^{+∞} ((x − μ)/σ)^4 f(x) dx

65 5 Examples of continuous distributions. The normal distribution
In this section we shall see the most frequently employed continuous distributions. These are frequently employed in statistics; the most important one, which deserves particular attention, is the normal distribution.

66 The normal distribution
A random variable X which can take values over the whole real axis is said to have a normal distribution with mean μ and variance σ^2 if the equation of its PDF is
f(x) = (1/√(2πσ^2)) e^{−(1/2)((x − μ)/σ)^2} for x ∈ (−∞, +∞)

67 Exhibit 9: Normal density functions: f(z) is the standard normal; the other two curves, characterized by f(μ, σ), are f(0, 0.5) and f(1, 0.5).

68 Thus, the normal distribution is characterized by only two parameters: the mean μ and the standard deviation σ. We denote a random variable following the normal distribution as X ~ N[μ, σ^2]. Note that:
the normal density function is perfectly symmetrical around the mean
the higher the standard deviation, the more the curve is spread around the mean

69 The standard normal distribution
In the special case where the mean is zero (μ = 0) and the standard deviation is one (σ = 1), the distribution is known as the standard normal distribution. The equation in this case reduces to
f(x) = (1/√(2π)) e^{−x^2/2} for x ∈ (−∞, +∞)

70 There is unfortunately no explicit equation for the CDF of the normal. However, probabilities such as P[X ≤ x] can be computed by employing numerical methods; this can be done in Excel with the function NORMDIST (and NORMSDIST for the standard normal). The cumulative normal distribution is usually denoted by F(x), and by Φ(x) for the standard normal.

71 Properties of the normal distribution
Moments: there is no need to calculate the mean or variance, as these are given in the definition of the normal distribution: for a random variable X ~ N[μ, σ^2] we have
E[X] = μ
E[(X − μ)^2] = σ^2

72 The normal distribution is symmetric around the mean μ, and therefore there is no need to calculate the skewness, which is zero. It would be possible (but rather cumbersome) to calculate the kurtosis of X; however, it can be shown that for every μ and σ^2
E[((X − μ)/σ)^4] = 3

73 It is customary to employ the number 3 as a benchmark for kurtosis:
random variables with kurtosis smaller than 3 have thinner tails than the normal, with the probability mass more clustered around the mean: these distributions are referred to as platykurtic (thin tails)
random variables with kurtosis bigger than 3 have fatter tails than the normal, with more probability mass far from the mean: these distributions are referred to as leptokurtic (fat tails)

74 Probability under the tails
The normal PDF is perfectly symmetrical, and therefore the probabilities under the left and the right tails are equal. For example, the areas identified in Exhibit 10 contain the following probabilities:
Area                          Probability
Left of −2 or right of +2     4.6%
Between −2 and +2             95.4%
Left of −1 or right of +1     31.7%
Between −1 and +1             68.3%

75 Exhibit 10: Standard normal density function. Standard deviations are on the x-axis. Calculated with the Excel function NORMSDIST.

76 Alternatively, one can identify some probability values (usually: 1%, 2.5%, 5%) and then calculate the number of standard deviations from the mean (left or right) that correspond to these values. From the table below, e.g., one can see that roughly 5% of the probability mass is in the region [x < μ − 2σ] OR [x > μ + 2σ]:
thus, the interval [μ − 2σ, μ + 2σ] contains roughly 95% of the probability mass (the exact number of standard deviations is 1.96)
another name for these intervals is confidence intervals
Probability (both tails)   No of std from mean
0.1%                       3.29
1%                         2.58
2.5%                       2.24
5%                         1.96
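These standard-deviation cutoffs can be recovered with the inverse CDF; a Python sketch using the standard library's `statistics.NormalDist` (a stand-in for Excel's NORMSDIST/NORMSINV pair):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# For a total probability alpha split across both tails,
# find z such that P[-z < X < z] = 1 - alpha
for alpha in (0.001, 0.01, 0.025, 0.05):
    z = Z.inv_cdf(1 - alpha / 2)
    print(f"{alpha:.3%} in the tails -> {z:.2f} std from the mean")
```

For alpha = 5% this yields 1.96, the exact value behind the rule-of-thumb "two standard deviations" 95% confidence interval.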

77 Standardization
Most often, in order to compute probabilities for a random variable X ~ N[μ, σ^2], it is convenient to standardize X, i.e. to transform X into an N[0,1] variable. Standardization is performed as follows:
subtract the mean μ from X, thereby getting the new, zero-mean random variable X − μ
rescale X − μ by the standard deviation, getting the new, unit-variance random variable (X − μ)/σ
then the new random variable (X − μ)/σ has a standard normal distribution

78 This can be quickly proved by recalling the formulas for the expected value and variance of a linear transformation of a random variable. This procedure can be used e.g. to calculate the probability P[a < X < b] for a given random variable X ~ N[μ, σ^2]:
P[a < X < b] = P[(a − μ)/σ < (X − μ)/σ < (b − μ)/σ] = ∫_{(a−μ)/σ}^{(b−μ)/σ} (1/√(2π)) e^{−u^2/2} du
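A worked standardization, with hypothetical numbers chosen for illustration (X ~ N[5, 4], so μ = 5 and σ = 2), sketched in Python:

```python
from statistics import NormalDist

mu, sigma = 5, 2   # illustrative choice: X ~ N[5, 2^2]
a, b = 3, 7        # compute P[3 < X < 7]

# Standardize: P[a < X < b] = Phi((b - mu)/sigma) - Phi((a - mu)/sigma)
Z = NormalDist()
by_standardizing = Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)

# Same result directly from the N(5, 2^2) distribution
direct = NormalDist(mu, sigma).cdf(b) - NormalDist(mu, sigma).cdf(a)

print(round(by_standardizing, 4))  # 0.6827 -- one sigma either side of the mean
```

Here (a − μ)/σ = −1 and (b − μ)/σ = +1, so the answer is the familiar 68.3%.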

79 6 A note on multivariate random variables
In this section we shall briefly see how to represent and calculate probabilities for more than just one random variable.

80 Multivariate random variables
So far, the attention has been focused on single, univariate random variables. However, it may be interesting to consider the joint behaviour of two or more random variables; e.g. consider a portfolio of 10 shares: the returns on each of these shares could be viewed as 10 different random variables, and the portfolio is described by the joint behaviour of the 10 shares. In what follows, we shall focus on the bivariate case, wherein only two random variables are analysed jointly, say X and Y; most of our analysis will consist of definitions.

81 Some useful definitions
The joint CDF of X and Y is a function F(x, y) which returns the probability that, at the same time, X ≤ x AND Y ≤ y; formally:
F(x, y) = P[X ≤ x AND Y ≤ y]

82 The joint PDF of X and Y is a function f(x, y) defined as
f(x, y) = ∂²F(x, y) / ∂x∂y
The joint PDF can be used to calculate probabilities such as
P[a < X < b, c < Y < d] = ∫_a^b ∫_c^d f(x, y) dy dx

83 Independence
A very important definition is that of independence between two random variables X and Y. Intuitively, X and Y are said to be independent if the probability that X lies in any interval [a, b] is not affected by the value(s) taken by Y; thus, knowing the behaviour of Y is not useful to estimate P[a < X < b]. Formally, the two random variables X and Y are independent if their joint PDF is given by the product of the PDF of each variable, i.e.
f(x, y) = f_X(x) f_Y(y)

84 Conditioning
Conditioning is an important notion that becomes useful in regression analysis. Intuitively, when X and Y are NOT independent, knowing what value Y takes can indeed help to assess the probability that X lies within an interval [a, b]. The notation P[a < X < b | Y = y] reads "the probability that X lies in [a, b] given/knowing that Y has taken the value y". The tool to compute such probabilities is the so-called conditional PDF of X given Y, denoted as f_{X|Y}(x|y) (the ratio of the joint PDF to the PDF of Y, f(x, y)/f_Y(y)) and used as follows:
P[a < X < b | Y = y] = ∫_a^b f_{X|Y}(x|y) dx

85 Moments of multivariate distributions
Some useful rules on the moments of multivariate distributions:
Expectations: it is worth pointing out that the expected value of the sum of random variables is the sum of the expected values, i.e.
E[X + Y] = E[X] + E[Y]
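Notably, this rule holds whether or not X and Y are independent. A quick illustrative check with a deliberately dependent pair:

```python
import random

# E[X + Y] = E[X] + E[Y] even when Y depends on X.
random.seed(9)
N = 200_000
xs = [random.uniform(0.0, 2.0) for _ in range(N)]  # E[X] = 1
ys = [x + random.gauss(0.0, 1.0) for x in xs]      # Y built from X; E[Y] = 1
mean_sum = sum(x + y for x, y in zip(xs, ys)) / N
print(round(mean_sum, 1))  # near E[X] + E[Y] = 2
```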

86 Covariance: as in the descriptive statistics case, covariance is a measure of the linear association between two random variables
The definition is Cov(X,Y) = E[(X − μ_X)(Y − μ_Y)], where μ denotes the expected value
The formula is
E[(X − μ_X)(Y − μ_Y)] = ∫∫ (x − μ_X)(y − μ_Y) f(x,y) dx dy
                      = ∫∫ xy f(x,y) dx dy − μ_X μ_Y = E[XY] − μ_X μ_Y
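The two expressions for the covariance can be checked by simulation. An illustrative sketch with Y = X + Z for independent X and Z, a construction (not from the slides) for which Cov(X, Y) = Var(X) = 1:

```python
import random

# Check that the definitional form E[(X-muX)(Y-muY)] and the shortcut
# E[XY] - muX*muY give the same covariance.
random.seed(3)
N = 200_000
xs, ys = [], []
for _ in range(N):
    x = random.gauss(0.0, 1.0)
    y = x + random.gauss(0.0, 1.0)  # correlated with X by construction
    xs.append(x)
    ys.append(y)

mx = sum(xs) / N
my = sum(ys) / N
e_xy = sum(x * y for x, y in zip(xs, ys)) / N
cov_def = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / N
cov_shortcut = e_xy - mx * my
print(round(cov_def, 2), round(cov_shortcut, 2))  # both near Var(X) = 1
```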

87 An important fact to bear in mind about the covariance is that if two random variables X and Y are independent, then their covariance is equal to zero
However, the converse does not generally hold, i.e. zero covariance is not enough for independence
Just as in the descriptive statistics case, one can define the correlation between X and Y, a number between −1 and 1 which represents the degree of linear association between X and Y:
ρ_XY = Cov(X,Y) / √(Var(X) Var(Y)),   ρ_XY ∈ [−1, 1]
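The classic counterexample for "zero covariance does not imply independence" (standard in textbooks, though not stated on the slides) is X ~ N(0,1) and Y = X²: Y is a deterministic function of X, yet Cov(X,Y) = E[X³] − E[X]E[X²] = 0. A simulation sketch:

```python
import random

# Y = X^2 is totally dependent on X, but the covariance is still zero.
random.seed(4)
N = 400_000
xs = [random.gauss(0.0, 1.0) for _ in range(N)]
ys = [x * x for x in xs]
mx = sum(xs) / N
my = sum(ys) / N
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / N
print(round(cov, 2))  # near 0 despite the perfect (nonlinear) dependence
```

The covariance only detects linear association, which is exactly why Y = X² slips through.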

88 Variance: it is important to remember that the variance of the sum (or difference) of two random variables is given by
Var(X ± Y) = Var(X) + Var(Y) ± 2Cov(X,Y)
Here, independence would entail
Var(X ± Y) = Var(X) + Var(Y)
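An illustrative check of the independent case, with two hypothetical independent normals of variances 1 and 4:

```python
import random
from statistics import pvariance

# Under independence, Var(X + Y) = Var(X) + Var(Y).
random.seed(5)
N = 200_000
xs = [random.gauss(0.0, 1.0) for _ in range(N)]  # Var(X) = 1
ys = [random.gauss(0.0, 2.0) for _ in range(N)]  # Var(Y) = 2^2 = 4
sums = [x + y for x, y in zip(xs, ys)]
var_sum = pvariance(sums)
print(round(var_sum, 1))  # near 1 + 4 = 5
```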

89 Note that even though all notation for the multivariate case has been presented for continuous random variables, it can be extended to the discrete case
For example: consider flipping a coin, and introduce the random variable X which equals 1 if HEAD comes out and 0 if TAIL comes out
Of course P[X=1] = P[X=0] = 0.5, so the expected value of X is E[X] = 0.5
In light of this, consider a binomial random variable Y which counts the number of heads in 10 independent trials
Then E[Y] = E[X] + ... + E[X] = 10·E[X] = 5
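The slide's coin example can also be computed directly from the binomial probability mass function, without invoking the additivity rule:

```python
from math import comb

# E[Y] for Y ~ Binomial(10, 0.5), computed as sum_k k * P[Y = k].
n, p = 10, 0.5
mean = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
print(mean)  # 5.0, matching the shortcut 10 * E[X] = 10 * 0.5
```

Summing over the full distribution and applying linearity of expectation give the same answer, which is the point of the slide.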

90 7 Other examples of continuous distributions
After discussing the normal distribution, in this section we shall see three other frequently employed continuous distributions
All of these random variables are derived from the standard normal, which shows the importance of this distribution

91 Other frequently employed distributions
In statistics, there are some other distributions that arise quite frequently
These are defined as transformations of the standard normal distribution, and they typically have rather complicated PDFs (which, however, do not need to be remembered)
The most frequently used distributions are:
the chi-squared distribution
Student's t distribution
Fisher's F distribution

92 The chi-squared distribution
Consider the random variable X ~ N[0,1]
The new random variable Y = X² is said to have a chi-squared distribution with one degree of freedom, denoted χ²(1)
More generally, consider n independent standard normal random variables X_1, ..., X_n; then the new variable
Y = Σ_{j=1}^n X_j²
is said to have a chi-squared distribution with n degrees of freedom, denoted χ²(n)
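The definition translates directly into a simulation. A sketch (illustrative choice of n) that builds χ²(n) draws as sums of squared standard normals and checks the two facts noted on the following slides, positivity and E[Y] = n:

```python
import random

# Simulate chi-squared(n) draws from the definition Y = sum of n squared
# independent standard normals.
random.seed(6)
n, N = 5, 100_000
draws = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(N)]
mean_draws = sum(draws) / N
print(min(draws) > 0.0)    # True: a sum of squares cannot be negative
print(round(mean_draws, 1))  # near E[Y] = n = 5
```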

93 Exhibit 11. The chi-squared distribution for various degrees of freedom k

94 Some notes:
independence is a crucial requirement
the chi-squared takes only positive values (it is a square, so no negative values are allowed)
the chi-squared has mean different from zero (and positive) and is not symmetric around the mean, as can be seen in Exhibit 11
it can be proved that if X has a χ²(n) distribution, then E[X] = n
it can be proved that if X has a χ²(n) distribution and n is very large, then the PDF of X is very similar to the PDF of a normally distributed random variable

95 Other distributions: Student's t distribution
Consider two independent random variables X and Y, such that X ~ N[0,1] and Y has a χ²(n) distribution
Then the new random variable Z defined as
Z = X / √(Y/n)
is said to have a Student's t distribution with n degrees of freedom, denoted t_n
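As with the chi-squared, the definition can be simulated directly. A sketch (n = 8 is an arbitrary illustrative choice) that builds t_n draws as Z = X/√(Y/n) and checks the symmetry properties discussed on the next slides:

```python
import random

# Build Student-t(n) draws from independent X ~ N(0,1) and Y ~ chi2(n).
random.seed(7)
n, N = 8, 100_000

def chi2(k):
    """One chi-squared(k) draw as a sum of k squared standard normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))

zs = [random.gauss(0.0, 1.0) / (chi2(n) / n) ** 0.5 for _ in range(N)]
mean = sum(zs) / N
frac_neg = sum(z < 0 for z in zs) / N
print(round(mean, 1), round(frac_neg, 1))  # near 0 and 0.5: symmetric about 0
```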

96 Exhibit 12. Student's t distribution for different values of the degrees of freedom k

97 Some notes:
the Student's t distribution can take positive and negative values
the Student's t distribution has mean equal to zero and is perfectly symmetric around the mean, just like the normal

98 It can be proved that if Z has a t_n distribution, then its kurtosis is always greater than 3; thus, the Student's t distribution can be useful to represent situations where extreme events occur frequently
It can also be proved that if Z has a t_n distribution and n is very large, then the PDF of Z is very similar to the PDF of a normally distributed random variable

99 Exhibit 13. Comparison between the normal and Student's t distributions

100 Other distributions: Fisher's F distribution
Consider two independent random variables X and Y, such that X has a χ²(n1) distribution and Y has a χ²(n2) distribution
Then the new random variable Z defined as
Z = (X/n1) / (Y/n2)
is said to have a Fisher's F distribution with n1 and n2 degrees of freedom, denoted F_{n1,n2}
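A simulation sketch of the definition (the degrees of freedom n1 = 4, n2 = 12 are illustrative choices), which also checks positivity and the known mean n2/(n2 − 2) of the F distribution:

```python
import random

# Build F(n1, n2) draws as Z = (X/n1) / (Y/n2) with independent
# chi-squared numerator and denominator.
random.seed(8)
n1, n2, N = 4, 12, 50_000

def chi2(k):
    """One chi-squared(k) draw as a sum of k squared standard normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))

zs = [(chi2(n1) / n1) / (chi2(n2) / n2) for _ in range(N)]
mean_z = sum(zs) / N
print(min(zs) > 0.0)     # True: a ratio of positive variables is positive
print(round(mean_z, 1))  # near n2 / (n2 - 2) = 1.2
```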

101 Exhibit 14. Fisher's F distribution for various values of the degrees of freedom d1 (numerator) and d2 (denominator)

102 Some notes:
the F distribution can take only positive values: it is the ratio of two random variables that themselves take only positive values
it has nonzero mean and is not symmetric
the quantities n1 and n2 are sometimes referred to as, respectively, the degrees of freedom of the numerator and the degrees of freedom of the denominator
the F distribution is used very frequently in econometrics whenever hypothesis testing is carried out


More information

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows. Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage

More information

5. Conditional Distributions

5. Conditional Distributions 1 of 12 7/16/2009 5:36 AM Virtual Laboratories > 3. Distributions > 1 2 3 4 5 6 7 8 5. Conditional Distributions Basic Theory As usual, we start with a random experiment with probability measure P on an

More information

Random Variables and Their Distributions

Random Variables and Their Distributions Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital

More information

2. Variance and Covariance: We will now derive some classic properties of variance and covariance. Assume real-valued random variables X and Y.

2. Variance and Covariance: We will now derive some classic properties of variance and covariance. Assume real-valued random variables X and Y. CS450 Final Review Problems Fall 08 Solutions or worked answers provided Problems -6 are based on the midterm review Identical problems are marked recap] Please consult previous recitations and textbook

More information

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed

More information

Introduction to Probability Theory for Graduate Economics Fall 2008

Introduction to Probability Theory for Graduate Economics Fall 2008 Introduction to Probability Theory for Graduate Economics Fall 008 Yiğit Sağlam October 10, 008 CHAPTER - RANDOM VARIABLES AND EXPECTATION 1 1 Random Variables A random variable (RV) is a real-valued function

More information

p(z)

p(z) Chapter Statistics. Introduction This lecture is a quick review of basic statistical concepts; probabilities, mean, variance, covariance, correlation, linear regression, probability density functions and

More information

Chapter 1. Probability, Random Variables and Expectations. 1.1 Axiomatic Probability

Chapter 1. Probability, Random Variables and Expectations. 1.1 Axiomatic Probability Chapter 1 Probability, Random Variables and Expectations Note: The primary reference for these notes is Mittelhammer (1999. Other treatments of probability theory include Gallant (1997, Casella & Berger

More information

Advanced Herd Management Probabilities and distributions

Advanced Herd Management Probabilities and distributions Advanced Herd Management Probabilities and distributions Anders Ringgaard Kristensen Slide 1 Outline Probabilities Conditional probabilities Bayes theorem Distributions Discrete Continuous Distribution

More information

Appendix A : Introduction to Probability and stochastic processes

Appendix A : Introduction to Probability and stochastic processes A-1 Mathematical methods in communication July 5th, 2009 Appendix A : Introduction to Probability and stochastic processes Lecturer: Haim Permuter Scribe: Shai Shapira and Uri Livnat The probability of

More information

Random Variables. Saravanan Vijayakumaran Department of Electrical Engineering Indian Institute of Technology Bombay

Random Variables. Saravanan Vijayakumaran Department of Electrical Engineering Indian Institute of Technology Bombay 1 / 13 Random Variables Saravanan Vijayakumaran sarva@ee.iitb.ac.in Department of Electrical Engineering Indian Institute of Technology Bombay August 8, 2013 2 / 13 Random Variable Definition A real-valued

More information

Slope Fields: Graphing Solutions Without the Solutions

Slope Fields: Graphing Solutions Without the Solutions 8 Slope Fields: Graphing Solutions Without the Solutions Up to now, our efforts have been directed mainly towards finding formulas or equations describing solutions to given differential equations. Then,

More information

Chapter 5. Means and Variances

Chapter 5. Means and Variances 1 Chapter 5 Means and Variances Our discussion of probability has taken us from a simple classical view of counting successes relative to total outcomes and has brought us to the idea of a probability

More information

2 Functions of random variables

2 Functions of random variables 2 Functions of random variables A basic statistical model for sample data is a collection of random variables X 1,..., X n. The data are summarised in terms of certain sample statistics, calculated as

More information

Measure and integration

Measure and integration Chapter 5 Measure and integration In calculus you have learned how to calculate the size of different kinds of sets: the length of a curve, the area of a region or a surface, the volume or mass of a solid.

More information