ACE 562 Fall 2005 Lecture 2: Probability, Random Variables and Distributions, by Professor Scott H. Irwin
ACE 562 Fall 2005 Lecture 2: Probability, Random Variables and Distributions
by Professor Scott H. Irwin

Required Readings:
- Griffiths, Hill and Judge. "Some Basic Ideas: Statistical Concepts for Economists," Ch. 2 in Learning and Practicing Econometrics
- Mirer. "Random Variables and Probability Distributions," Ch. 9; "The Normal and t Distributions," Ch. 10 in Economic Statistics and Econometrics (readings packet)
- Gilovich, et al. "The Hot Hand in Basketball: On the Misperception of Random Sequences." Cognitive Psychology 17 (1985) (readings packet)

Optional Readings:
- Mirer. "Descriptive Statistics," Ch. 3; "Probability Theory," Ch. 8

ACE 562, University of Illinois at Urbana-Champaign 2-1
"Most sophomores can easily learn to put numbers into a computer and get back statistical results. To do this wisely---to make the statistical analysis valid---requires a real understanding of the techniques involved." ---T. Mirer

Overview

Intro stats courses typically cover the following topics:
- Types of data
- Descriptive statistics
- Frequency distributions and histograms
- Probability
- Random variables
- Probability distributions
- Confidence intervals
- Hypothesis tests
A sample of data on wheat acreage planted in the US:

Marketing Year   Planted Acreage (million acres)
1975/76          74,
1976/77          80,
1977/78          75,
1978/79          65,
1979/80          71,
1980/81          80,
1981/82          88,
1982/83          86,
1983/84          76,
1984/85          79,
1985/86          75,
1986/87          71,
1987/88          65,
1988/89          65,
1989/90          76,
1990/91          77,041

Source: USDA

From a statistical viewpoint, how was this data generated?
Viewing the world through a statistical lens:

Random variable → Data sample → Statistics (mean, variance, etc.) → Repeated sampling → Confidence intervals and hypothesis tests

This arrangement highlights the foundational role played by random variables in classical statistical analysis. Our study of linear regression fits in this general model.

Note: I assume you understand probability concepts at the level found in Chapter 8 of Mirer.
Random Variables

Random variable: a process that generates data. More formally, a random variable is a variable whose value is unknown until it is observed.

"Random" implies the existence of some probability distribution defined over the set of all possible values. An arbitrary variable does not have a probability distribution associated with its values.

Think of a random variable as a machine that produces numbers one after another:
- Each number produced is a value, or realization, of the random variable
- There is uncertainty about the next value at any point in time
- A complete production run produces a set of numbers called a sample
Discrete Random Variable

A discrete random variable can take only a finite number of values, which can be counted using the positive integers.

Examples:

Prize money from the following lottery is a discrete random variable:
- first prize: $1,000
- second prize: $50
- third prize: $25

since it has only four (a finite number; count: 1, 2, 3, 4) possible outcomes: $0.00, $25, $50, $1,000.

The outcome from rolling a six-sided die is a discrete random variable, since there are only six (finite) possible outcomes: 1, 2, 3, 4, 5, or 6.
A list of all of the possible values taken by a discrete random variable, along with the chances of each outcome occurring, is called a probability function or probability density function (pdf).

die          x    f(x)
one dot      1    1/6
two dots     2    1/6
three dots   3    1/6
four dots    4    1/6
five dots    5    1/6
six dots     6    1/6

In the discrete case only, f(x) is the probability that X takes on the value x, or equivalently,

f(x_t) = Pr(X = x_t)

Notation note: capital X represents the random variable, while lower case x_t represents a particular value, or realization, of X.
Probability density functions for discrete random variables have two basic properties:

1. 0 ≤ f(x_t) ≤ 1 for all t
2. f(x_1) + f(x_2) + … + f(x_T) = 1

In other words, the probability of each outcome must be between zero and one, and the sum of probabilities over all outcomes is one.

Probability density functions of discrete random variables may be represented in three equivalent ways:
1. Table (always)
2. Graph (always)
3. Equation (sometimes)

Consider this example: A hat contains 10 balls, with one labeled 5, two labeled 6, three labeled 7 and four labeled 8. The random variable is the outcome of selecting one ball out of the hat.
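As a quick sketch, the hat example can be tabulated directly in code; the `Fraction` bookkeeping is just one way to build the pdf table and confirm its two basic properties:

```python
from fractions import Fraction

# Hat with 10 balls: one labeled 5, two labeled 6, three labeled 7, four labeled 8.
balls = [5] * 1 + [6] * 2 + [7] * 3 + [8] * 4

# Probability function f(x) = Pr(X = x): count of each label over the total.
f = {x: Fraction(balls.count(x), len(balls)) for x in set(balls)}

for x in sorted(f):
    print(x, f[x])  # 5 1/10, 6 1/5, 7 3/10, 8 2/5

# Both basic properties of a discrete pdf hold:
assert all(0 <= p <= 1 for p in f.values())
assert sum(f.values()) == 1
```

This is the "table" representation built programmatically; the same dictionary could be plotted to give the "graph" representation.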
Properties of Discrete Random Variables

Descriptive statistics mainly attempt to measure the middle and dispersion of a data sample. We are interested in the same properties of random variables.

Before developing these measures, it is helpful to review some rules of summation, which will be used throughout the course:

Rule 1. If X takes on T values x_1, x_2, ..., x_T, then

Σ_{t=1}^{T} x_t = x_1 + x_2 + … + x_T

Note that summation is a linear operator, which means it operates term by term.

Rule 2. If a is a constant, then

Σ_{t=1}^{T} a = Ta
Rule 3. If a is a constant, then

Σ_{t=1}^{T} a x_t = a Σ_{t=1}^{T} x_t = a x_1 + a x_2 + … + a x_T

The arithmetic mean (average) of T values of X is simply an application of this rule:

x̄ = (1/T) Σ_{t=1}^{T} x_t = (x_1 + x_2 + … + x_T) / T

Also,

Σ_{t=1}^{T} (x_t − x̄) = 0

Rule 4. If f(x) is a function of X, then

Σ_{t=1}^{T} f(x_t) = f(x_1) + f(x_2) + … + f(x_T)
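These rules are easy to check numerically. The sample values below are an arbitrary illustration (not data from the notes); the check shows the mean as an application of Rule 3 and the fact that deviations from the mean sum to zero:

```python
# Summation rules on a small sample: the mean as (1/T) * sum of x_t, and the
# fact that deviations from the sample mean sum to zero.
xs = [2.0, 5.0, 6.0, 7.0]  # illustrative values (assumed for the example)
T = len(xs)

mean = sum(xs) / T                   # Rule 3 applied with a = 1/T
dev_sum = sum(x - mean for x in xs)  # sum of (x_t - mean) = 0

print(mean, round(dev_sum, 12))  # 5.0 0.0
```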
We often use an abbreviated form of the summation notation. For example, if f(x) is a function of the values of X,

Σ_{t=1}^{T} f(x_t) = f(x_1) + f(x_2) + … + f(x_T)
= Σ_t f(x_t)   ("Sum over all values of the index t")
= Σ_x f(x)     ("Sum over all possible values of X")

Rule 5. If X and Y are two variables, then

Σ_{t=1}^{T} (x_t + y_t) = Σ_{t=1}^{T} x_t + Σ_{t=1}^{T} y_t

Rule 6. If X and Y are two variables, then

Σ_{t=1}^{T} (a x_t + b y_t) = a Σ_{t=1}^{T} x_t + b Σ_{t=1}^{T} y_t
Note: Several summation signs can be used in one expression. Suppose the variable X takes T values and Y takes S values, and let f(x,y) = x + y. Then the double summation of this function is

Σ_{t=1}^{T} Σ_{s=1}^{S} f(x_t, y_s) = Σ_{t=1}^{T} Σ_{s=1}^{S} (x_t + y_s)

To evaluate such expressions, work from the innermost sum outward: first set t = 1 and sum over all values of s, then set t = 2, and so on.

To illustrate, let T = 2 and S = 3. Then

Σ_{t=1}^{2} Σ_{s=1}^{3} f(x_t, y_s) = Σ_{t=1}^{2} [f(x_t, y_1) + f(x_t, y_2) + f(x_t, y_3)]
= f(x_1, y_1) + f(x_1, y_2) + f(x_1, y_3) + f(x_2, y_1) + f(x_2, y_2) + f(x_2, y_3)

The order of summation does not matter, so

Σ_{t=1}^{T} Σ_{s=1}^{S} f(x_t, y_s) = Σ_{s=1}^{S} Σ_{t=1}^{T} f(x_t, y_s)
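A minimal sketch of the T = 2, S = 3 illustration, with assumed values for the x's and y's, confirms that the two orders of summation give the same total:

```python
# Double summation with T = 2, S = 3 and f(x, y) = x + y, evaluated in both
# orders to confirm that the order of summation does not matter.
xs = [1, 2]        # illustrative values x_1, x_2 (assumed)
ys = [10, 20, 30]  # illustrative values y_1, y_2, y_3 (assumed)

sum_t_first = sum(sum(x + y for y in ys) for x in xs)  # inner sum over s
sum_s_first = sum(sum(x + y for x in xs) for y in ys)  # inner sum over t

assert sum_t_first == sum_s_first
print(sum_t_first)  # 129
```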
Mean of a Discrete Random Variable

The "middle," or mean, of a discrete random variable is its expected value. Often, a special notation is used for the mean:

μ = E(X) or β = E(X)

There are two entirely different, but mathematically equivalent, ways of defining the expected value.

Analytical Definition

If X is a discrete random variable which can take the values x_1, x_2, …, x_T with probability density values f(x_1), f(x_2), …, f(x_T), the expected value of X is computed using the following mathematical expectation formula:

E(X) = Σ_{t=1}^{T} x_t f(x_t) = x_1 f(x_1) + x_2 f(x_2) + … + x_T f(x_T)
Note that the expected value (mean) is determined by weighting all the possible values of X by the corresponding probabilities and summing. Hence, the mean is a weighted average of the possible values of the discrete random variable.

Empirical Definition

The expected value of discrete random variable X is the average value from an experiment of producing an infinite number of samples. We can use a thought experiment to illustrate the empirical definition:
- First, use the discrete random variable "machine" to generate a single sample of T observations on X: (x_1, x_2, ..., x_T)
- Next, use the discrete random variable "machine" to generate a very large (infinite) number of samples of size T
- Now, take the simple arithmetic average of all the x_t's generated in the previous two steps
The computed average is the expected value of the discrete random variable and the center of its pdf.

Note: The above thought experiment is identical to one where we consider taking one sample of infinite size and compute the arithmetic average for this single infinitely-sized sample.

Analytical vs. Empirical

The analytical and empirical definitions produce exactly the same expected value for a discrete random variable. The equivalence depends on the number of samples in the empirical case going to infinity. When the number of samples goes to infinity, the observations on X occur with frequencies across all samples exactly equal to the corresponding probabilities [f(x_t)] in the analytical case.
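The thought experiment can be approximated on a computer. Using the hat example's pdf, the sketch below compares the analytical expectation with the average of a large (but of course finite) number of simulated draws; the sample size and seed are arbitrary choices:

```python
import random

# Analytical vs. empirical mean for the hat example: values 5-8 with
# probabilities 1/10, 2/10, 3/10, 4/10. The simulated average should
# approach the mathematical expectation as the number of draws grows.
values = [5, 6, 7, 8]
probs = [0.1, 0.2, 0.3, 0.4]

analytical = sum(x * p for x, p in zip(values, probs))  # E(X) = sum of x f(x)

random.seed(0)
draws = random.choices(values, weights=probs, k=200_000)
empirical = sum(draws) / len(draws)

print(round(analytical, 10), round(empirical, 2))
assert abs(analytical - empirical) < 0.02
```

Here E(X) = 5(0.1) + 6(0.2) + 7(0.3) + 8(0.4) = 7.0, and 200,000 draws land well within a few hundredths of it.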
Variance of a Discrete Random Variable

It is essential to have a measure of the dispersion, or variability, of a discrete random variable. Again, we can define the variance of a discrete random variable both analytically and empirically.

Before developing the analytical definition, it is useful to introduce the expectation of a function of a discrete random variable. If g(X) is a function of the discrete random variable X, then

E[g(X)] = Σ_{t=1}^{T} g(x_t) f(x_t) = g(x_1) f(x_1) + g(x_2) f(x_2) + … + g(x_T) f(x_T)

Important applications of this result:

1. If g(X) = a, where a is a constant, then E(a) = a
2. If g(X) = cX, where c is a constant and X is a discrete random variable, then E(cX) = cE(X)

3. If g(X) = a + cX, where a and c are constants and X is a random variable, then E(a + cX) = a + cE(X)

Analytical Definition

To begin, set g(X) = [X − E(X)]². Then

E[g(X)] = E[X − E(X)]²
= [x_1 − E(X)]² f(x_1) + [x_2 − E(X)]² f(x_2) + … + [x_T − E(X)]² f(x_T)
= var(X) = σ²

We can think of the variance as the expected value of the squared deviations around the mean of X. Or, variance is a weighted average of the squared distances between the values of X and the mean of the random variable.
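Continuing with the hat example, the analytical definition gives the variance directly as a probability-weighted average of squared deviations:

```python
# Variance of the hat example as a probability-weighted average of squared
# deviations: var(X) = sum over t of [x_t - E(X)]^2 f(x_t).
values = [5, 6, 7, 8]
probs = [0.1, 0.2, 0.3, 0.4]

mean = sum(x * p for x, p in zip(values, probs))
var = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

print(round(mean, 10), round(var, 10))  # 7.0 1.0
```

The squared deviations (−2)², (−1)², 0², 1² weighted by 0.1, 0.2, 0.3, 0.4 sum to exactly 1.0.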
Empirical Definition

The variance of discrete random variable X is the average squared deviation from the arithmetic mean, based on an experiment of producing an infinite number of samples. We can again use a thought experiment to illustrate the empirical definition:
- First, use the discrete random variable "machine" to generate a single sample of T observations on X: (x_1, x_2, ..., x_T)
- Next, use the discrete random variable "machine" to generate a very large (infinite) number of samples of size T
- Now, take the simple arithmetic average of all the x_t's generated in the previous two steps
- Then, compute the squared deviation of each x_t from the simple arithmetic average computed above
Finally, the average of the squared deviations is the variance of the discrete random variable and the dispersion of its pdf.

Analytical vs. Empirical

The analytical and empirical definitions produce exactly the same variance for a discrete random variable. The equivalence depends on the number of samples in the empirical case going to infinity. When the number of samples goes to infinity, the observations on X occur with frequencies across all samples exactly equal to the corresponding probabilities [f(x_t)] in the analytical case.
Discrete Random Variables and Samples of Data Review

If we generate a fixed number of observations from a discrete random variable, we have a sample. The data for a sample can be:
- summarized by a relative frequency distribution
- analyzed with descriptive statistics

At this point, it is helpful to re-emphasize that a random variable is a theoretical construct, used to represent some kind of physical, economic or sociological process. Examples are simply there to illustrate the concept. In reality, we never see a random variable, only the resulting data sample.
Continuous Random Variables

A continuous random variable can take any real value (not just whole numbers) in at least one interval on the real line. In other words, an infinite number of possible values may occur for the next realization of the variable.

Examples:
- gross national product (GNP)
- money supply
- price of eggs
- household income
- expenditure on clothing

The probability distribution of a continuous random variable has two components:
1. A statement of the possibly occurring values
2. A function that gives information about probabilities

These two components have to be altered relative to the case of a discrete random variable.
Consider the case of a continuous random variable where the probability for each outcome is the same. Since there is an infinite number of possible outcomes, the probability of any one outcome is one divided by infinity. The implication is that the probability of any individual value is zero!

It is no longer useful to focus on f(x_t) = Pr(X = x_t), since it equals zero. For continuous random variables, we can only relate probabilities to an interval of X. If the interval is [a, b], we want to compute Pr(a ≤ X ≤ b).

Hence, a continuous random variable uses the area under a curve, rather than the height, f(x), to represent probability.
Formally, the area under a curve is the integral of the equation that generates the curve:

Pr(a ≤ X ≤ b) = ∫_{x=a}^{b} f(x) dx

In other words, for continuous random variables the integral of f(x), and not f(x) itself, defines the area and, therefore, the probability.

So, what does f(x) represent in the continuous case? It is still the height of the pdf, but it does not equal probability. Instead, it represents the relative likelihood of a value of X occurring. Note that f(x) may take on a value greater than one.
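A small numerical sketch makes the "probability as area" idea concrete. The pdf below is an assumed illustration (not from the notes): f(x) = 2x on [0, 1], which integrates to one and whose height exceeds 1 near x = 1 even though every probability is at most 1. A midpoint Riemann sum approximates the integral:

```python
# Pr(a <= X <= b) as the area under the pdf, approximated by a midpoint
# Riemann sum. Illustrative pdf (assumed): f(x) = 2x on [0, 1].
def f(x):
    return 2 * x

def prob(a, b, n=100_000):
    width = (b - a) / n
    # Sum the pdf height at each subinterval midpoint times the width.
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

p = prob(0.5, 1.0)
print(round(p, 6))  # 0.75
```

The exact answer is ∫_{0.5}^{1} 2x dx = 1 − 0.25 = 0.75, which the sum recovers.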
Mean and Variance of a Continuous Random Variable

Once again, we want to develop measures of the "middle" and "dispersion" of a random variable. In the continuous case, to derive the analytical definitions, we must resort to integral calculus. Specifically, for a continuous random variable defined over the range x_min to x_max:

E(X) = β = ∫_{x=x_min}^{x_max} x f(x) dx

var(X) = σ² = ∫_{x=x_min}^{x_max} [x − E(X)]² f(x) dx

The empirical definitions are the same in both the continuous and discrete random variable cases; the only twist is the type of variable generating the infinite number of samples.
Normal Probability Density Function for Continuous Random Variables

Many continuous random variables have pdf's that share a common mathematical form. One of the most important families of distributions in econometrics is the famous normal distribution. If X is a continuous random variable that can take on values from −∞ to ∞, it has a normal pdf of the following form:

f(x) = [1 / √(2πσ²)] exp[−(x − β)² / (2σ²)],  −∞ < x < ∞

We say that X is distributed normally with mean β and variance σ² [X ~ N(β, σ²)]. Each member of the normal distribution family has a different β (mean) and/or σ² (variance).
Standard Normal Distribution

It is possible to determine probabilities for areas under normal curves using integral calculus. This is a cumbersome task, which can be avoided by taking advantage of the fact that any one normal distribution can be obtained from another by:
- compressing or expanding it
- shifting it to the left or right

We can formalize this idea by considering the following transformation of the continuous random variable X:

Z = (X − β) / σ

The new random variable, Z, has the following pdf:

f(z) = [1 / √(2π)] exp(−z² / 2),  −∞ < z < ∞

We say that Z is distributed standard normally with mean 0 and variance 1 [Z ~ N(0, 1)].
The equivalence of probability statements between a normal and standard normal distribution can also be shown mathematically. Let x_l and x_u represent two values of the random variable X, and suppose we would like to determine

Pr(x_l ≤ X ≤ x_u)

We can subtract the mean β from each term and not change the meaning of the probability statement (since β is a constant):

Pr(x_l − β ≤ X − β ≤ x_u − β)

Based on the same logic, we can divide each term by σ:

Pr[(x_l − β)/σ ≤ (X − β)/σ ≤ (x_u − β)/σ]

Which can be re-stated as

Pr(z_l ≤ Z ≤ z_u)

Hence, the equivalence of the probability statements.
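This equivalence can be checked numerically with the standard library's normal distribution. The mean, standard deviation and interval below are assumed purely for illustration:

```python
from statistics import NormalDist

# Equivalence of a normal probability and its standardized version:
# Pr(x_l <= X <= x_u) for X ~ N(beta, sigma^2) equals Pr(z_l <= Z <= z_u)
# for Z ~ N(0, 1), where z = (x - beta) / sigma.
beta, sigma = 100.0, 15.0   # illustrative mean and std. dev. (assumed)
x_l, x_u = 85.0, 130.0      # illustrative interval (assumed)

X = NormalDist(mu=beta, sigma=sigma)
Z = NormalDist(mu=0.0, sigma=1.0)

p_x = X.cdf(x_u) - X.cdf(x_l)                                  # direct
p_z = Z.cdf((x_u - beta) / sigma) - Z.cdf((x_l - beta) / sigma)  # standardized

print(round(p_x, 4), round(p_z, 4))
assert abs(p_x - p_z) < 1e-12
```

Here the interval standardizes to z_l = −1 and z_u = 2, so both calculations give Φ(2) − Φ(−1) ≈ 0.8186.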
Transformations of Random Variables

The results from the previous section can be generalized. We will assume the following linear transformation:

Y_t = a + bX_t

For discrete random variables, the transformation re-labels the possible values of X, but does not affect the probabilities of their occurring. For continuous random variables, the transformation also re-labels the possible values of X, but does not affect the probability of Y and X being in corresponding intervals.
Mean and Variance With the Linear Transformation

When the following transformation is applied to either a discrete or continuous random variable,

Y_t = a + bX_t

the mean and variance of Y can be computed from the following relations:

β_Y = a + bβ_X
σ²_Y = b²σ²_X
σ_Y = |b|σ_X

These relations can be proven using the original equations used to define expected value and variance.
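A sketch of these relations, reusing the hat example's pdf with arbitrary constants a and b (assumed for the illustration), and a simulated sample as a sanity check on the mean:

```python
import random

# Checking beta_Y = a + b*beta_X and var(Y) = b^2 var(X) on the hat example.
values, probs = [5, 6, 7, 8], [0.1, 0.2, 0.3, 0.4]
a, b = 3.0, -2.0  # arbitrary constants for the illustration (assumed)

mean_x = sum(x * p for x, p in zip(values, probs))                 # E(X) = 7
var_x = sum((x - mean_x) ** 2 * p for x, p in zip(values, probs))  # var(X) = 1

mean_y = a + b * mean_x   # beta_Y = a + b * beta_X
var_y = b ** 2 * var_x    # var(Y) = b^2 * var(X)

random.seed(1)
ys = [a + b * x for x in random.choices(values, weights=probs, k=200_000)]
sample_mean = sum(ys) / len(ys)

print(round(mean_y, 10), round(var_y, 10))  # -11.0 4.0
assert abs(sample_mean - mean_y) < 0.05
```

Note the standard deviation scales by |b|: here σ_Y = 2σ_X even though b is negative.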
Joint Probability Density Functions

Previously, we have worked with a single random variable that generates numbers. The generation process is governed by the probability distribution of the random variable.

Now, we want to envision a more complex process where two numbers are generated, for example:
- Throwing a pair of dice simultaneously
- Determination of corn and soybean futures prices at the Chicago Board of Trade

The key is that two numbers are produced simultaneously, but are separately identifiable. To generalize, we can think of a process where the output is the combination of numbers for two random variables Y and X. The probability of observing the different combinations of Y and X is given by the joint probability density function, f(x,y).
Joint Probability Density Function for Discrete Random Variables
[Figure: the joint pdf f(x,y) plotted over the values of X and Y]
Formal Definitions

The marginal probability density functions, f(x) and f(y), for discrete random variables can be obtained by:
- summing over f(x,y) with respect to the values of Y to obtain f(x)
- summing over f(x,y) with respect to the values of X to obtain f(y)

The marginal distributions are the same thing as the regular univariate pdf's for Y and X. The term "marginal" is used because the univariate pdf's are displayed in the margin of joint pdf tables.
The conditional probability density function of X given Y = y is

f(x|y) = Pr(X = x | Y = y) = f(x,y) / f(y)

The conditional probability density function of Y given X = x is

f(y|x) = Pr(Y = y | X = x) = f(x,y) / f(x)

In each of the above cases, think of fixing X or Y at some value and then determining the pdf.
Independence

Again, consider two discrete random variables X and Y that have a joint pdf f(x,y). Assume that all of the conditional distributions, f(x|y_t), are the same. This implies that the probability of obtaining different values of X is not affected by the simultaneously determined value of Y. We can then say that X and Y are independent random variables.

Mathematically, independence can be stated as

f(x|y_t) = f(x) for all t

From this result we can derive a highly useful implication of independence called the multiplication rule.
Start by re-stating the definition of conditional probability:

f(x|y) = f(x,y) / f(y)

Independence allows us to substitute f(x) for f(x|y) as follows:

f(x) = f(x,y) / f(y)

Re-arranging,

f(x,y) = f(x) f(y)

Note that this condition holds for each and every pair of values x and y. Also, the multiplication rule generalizes to more than two random variables and to continuous random variables.
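The multiplication rule is easy to test on a joint pdf table. The small joint distribution below is an assumed illustration (built so that X and Y happen to be independent); the code computes the marginals from the joint table and checks f(x,y) = f(x) f(y) for every pair:

```python
from fractions import Fraction

# Independence check via the multiplication rule f(x, y) = f(x) f(y),
# on an assumed illustrative joint pdf for two discrete variables.
F = Fraction
joint = {
    (1, 1): F(1, 8), (1, 2): F(3, 8),
    (2, 1): F(1, 8), (2, 2): F(3, 8),
}

# Marginal pdfs: sum the joint pdf over the other variable.
fx = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (1, 2)}
fy = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (1, 2)}

independent = all(joint[(x, y)] == fx[x] * fy[y] for (x, y) in joint)
print(independent)  # True
```

Changing any single cell of the table (while keeping the total at one) would break the product condition and the check would report False.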
Covariance and Correlation

In econometrics, a key issue is the relationship between variables. The covariance between two random variables, X and Y, measures the linear association between them:

cov(X,Y) = E{[X − E(X)][Y − E(Y)]}

To more explicitly define covariance for the case of discrete random variables, we need the following rules of summation:

Σ_{t=1}^{T} Σ_{s=1}^{S} f(x_t, y_s) = Σ_{t=1}^{T} [f(x_t, y_1) + f(x_t, y_2) + … + f(x_t, y_S)]

Σ_{t=1}^{T} Σ_{s=1}^{S} f(x_t, y_s) = Σ_{s=1}^{S} Σ_{t=1}^{T} f(x_t, y_s)

Then

cov(X,Y) = E{[X − E(X)][Y − E(Y)]} = Σ_{t=1}^{T} Σ_{s=1}^{S} [x_t − E(X)][y_s − E(Y)] f(x_t, y_s)
Covariance is difficult to interpret because it depends on the units of measurement of X and Y. The correlation between two random variables X and Y overcomes this problem by creating a pure number falling between −1 and +1:

ρ = cov(X,Y) / √[var(X) var(Y)]

Independent random variables have zero covariance and, therefore, zero correlation. The converse is not true, because X and Y may be related in a non-linear manner.
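As a sketch, covariance and correlation can be computed directly from a joint pdf table. The two-valued joint distribution below is an assumed illustration, not data from the notes:

```python
# Covariance and correlation of two discrete random variables from their
# joint pdf (an assumed illustrative table).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Means E(X) and E(Y), summing over the whole joint table.
ex = sum(x * p for (x, y), p in joint.items())
ey = sum(y * p for (x, y), p in joint.items())

# cov(X,Y) = sum over all (x, y) of [x - E(X)][y - E(Y)] f(x, y)
cov = sum((x - ex) * (y - ey) * p for (x, y), p in joint.items())

vx = sum((x - ex) ** 2 * p for (x, y), p in joint.items())
vy = sum((y - ey) ** 2 * p for (x, y), p in joint.items())
rho = cov / (vx * vy) ** 0.5

print(round(cov, 10), round(rho, 10))  # 0.15 0.6
```

Because X and Y tend to take matching values here, both the covariance and the correlation are positive.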
Mean and Variance of a Weighted Sum of Random Variables

There are many situations in econometrics where we want to create a new random variable as the weighted sum of other random variables. For two discrete random variables, this can be expressed as

W = a_1 X + a_2 Y

To develop the mean and variance of W, we need some more results from the rules of summation:

Σ_{t=1}^{T} (x_t + y_t) = Σ_{t=1}^{T} x_t + Σ_{t=1}^{T} y_t = x_1 + x_2 + … + x_T + y_1 + y_2 + … + y_T

Σ_{t=1}^{T} (a x_t + b y_t) = a Σ_{t=1}^{T} x_t + b Σ_{t=1}^{T} y_t = a x_1 + a x_2 + … + a x_T + b y_1 + b y_2 + … + b y_T
The expected value of the weighted sum of random variables is the sum of the expectations of the individual terms. Since expectation is a linear operator, it can be applied term by term:

E(W) = E(a_1 X) + E(a_2 Y) = a_1 E(X) + a_2 E(Y)

So, the expectation of a weighted sum of random variables is the weighted sum of the expectations of the individual random variables.

The variance of W is found as follows:

var(W) = E[W − E(W)]² = E[a_1 X + a_2 Y − E(a_1 X + a_2 Y)]²
= a_1² var(X) + a_2² var(Y) + 2 a_1 a_2 cov(X,Y)

Notice what happens when we assume X and Y are independent:

var(W) = a_1² var(X) + a_2² var(Y)
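The variance formula can be verified by computing var(W) two ways from the same joint pdf: directly from the distribution of W, and via the a_1² var(X) + a_2² var(Y) + 2a_1a_2 cov(X,Y) decomposition. The joint table and weights below are assumed illustrations:

```python
# var(W) for W = a1*X + a2*Y, computed two ways from an assumed joint pdf.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
a1, a2 = 2.0, 3.0  # illustrative weights (assumed)

ex = sum(x * p for (x, y), p in joint.items())
ey = sum(y * p for (x, y), p in joint.items())
vx = sum((x - ex) ** 2 * p for (x, y), p in joint.items())
vy = sum((y - ey) ** 2 * p for (x, y), p in joint.items())
cov = sum((x - ex) * (y - ey) * p for (x, y), p in joint.items())

ew = a1 * ex + a2 * ey  # E(W) = a1*E(X) + a2*E(Y)
var_formula = a1 ** 2 * vx + a2 ** 2 * vy + 2 * a1 * a2 * cov

# Direct computation from the joint distribution of W = a1*X + a2*Y.
var_direct = sum((a1 * x + a2 * y - ew) ** 2 * p for (x, y), p in joint.items())

print(round(var_formula, 4), round(var_direct, 4))  # 5.05 5.05
assert abs(var_formula - var_direct) < 1e-9
```

Because cov(X,Y) is positive in this example, var(W) exceeds the independence benchmark a_1² var(X) + a_2² var(Y) = 3.25.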
Expectation and Variance August 22, 2017 STAT 151 Class 3 Slide 1 Outline of Topics 1 Motivation 2 Expectation - discrete 3 Transformations 4 Variance - discrete 5 Continuous variables 6 Covariance STAT
More informationStatistical signal processing
Statistical signal processing Short overview of the fundamentals Outline Random variables Random processes Stationarity Ergodicity Spectral analysis Random variable and processes Intuition: A random variable
More informationProbability Theory. Introduction to Probability Theory. Principles of Counting Examples. Principles of Counting. Probability spaces.
Probability Theory To start out the course, we need to know something about statistics and probability Introduction to Probability Theory L645 Advanced NLP Autumn 2009 This is only an introduction; for
More informationStatistical and Learning Techniques in Computer Vision Lecture 1: Random Variables Jens Rittscher and Chuck Stewart
Statistical and Learning Techniques in Computer Vision Lecture 1: Random Variables Jens Rittscher and Chuck Stewart 1 Motivation Imaging is a stochastic process: If we take all the different sources of
More informationNotes for Math 324, Part 19
48 Notes for Math 324, Part 9 Chapter 9 Multivariate distributions, covariance Often, we need to consider several random variables at the same time. We have a sample space S and r.v. s X, Y,..., which
More informationProbabilities & Statistics Revision
Probabilities & Statistics Revision Christopher Ting Christopher Ting http://www.mysmu.edu/faculty/christophert/ : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036 January 6, 2017 Christopher Ting QF
More informationf X, Y (x, y)dx (x), where f(x,y) is the joint pdf of X and Y. (x) dx
INDEPENDENCE, COVARIANCE AND CORRELATION Independence: Intuitive idea of "Y is independent of X": The distribution of Y doesn't depend on the value of X. In terms of the conditional pdf's: "f(y x doesn't
More informationSTAT2201. Analysis of Engineering & Scientific Data. Unit 3
STAT2201 Analysis of Engineering & Scientific Data Unit 3 Slava Vaisman The University of Queensland School of Mathematics and Physics What we learned in Unit 2 (1) We defined a sample space of a random
More informationLECTURE 1. Introduction to Econometrics
LECTURE 1 Introduction to Econometrics Ján Palguta September 20, 2016 1 / 29 WHAT IS ECONOMETRICS? To beginning students, it may seem as if econometrics is an overly complex obstacle to an otherwise useful
More informationCovariance and Correlation
Covariance and Correlation ST 370 The probability distribution of a random variable gives complete information about its behavior, but its mean and variance are useful summaries. Similarly, the joint probability
More information2.3 Estimating PDFs and PDF Parameters
.3 Estimating PDFs and PDF Parameters estimating means - discrete and continuous estimating variance using a known mean estimating variance with an estimated mean estimating a discrete pdf estimating a
More informationJoint Probability Distributions and Random Samples (Devore Chapter Five)
Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01: Probability and Statistics for Engineers Spring 2013 Contents 1 Joint Probability Distributions 2 1.1 Two Discrete
More informationLecture 2: Probability
Lecture 2: Probability Statistical Paradigms Statistical Estimator Method of Estimation Output Data Complexity Prior Info Classical Cost Function Analytical Solution Point Estimate Simple No Maximum Likelihood
More informationMA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems
MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions
More informationUnivariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation
Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation PRE 905: Multivariate Analysis Spring 2014 Lecture 4 Today s Class The building blocks: The basics of mathematical
More information1 Random variables and distributions
Random variables and distributions In this chapter we consider real valued functions, called random variables, defined on the sample space. X : S R X The set of possible values of X is denoted by the set
More informationChapter 1. Probability, Random Variables and Expectations. 1.1 Axiomatic Probability
Chapter 1 Probability, Random Variables and Expectations Note: The primary reference for these notes is Mittelhammer (1999. Other treatments of probability theory include Gallant (1997, Casella & Berger
More informationLecture 2: Probability
Lecture 2: Probability Statistical Paradigms Statistical Estimator Method of Estimation Output Data Complexity Prior Info Classical Cost Function Analytical Solution Point Estimate Simple No Maximum Likelihood
More informationLecture 6. Probability events. Definition 1. The sample space, S, of a. probability experiment is the collection of all
Lecture 6 1 Lecture 6 Probability events Definition 1. The sample space, S, of a probability experiment is the collection of all possible outcomes of an experiment. One such outcome is called a simple
More information1 Basic continuous random variable problems
Name M362K Final Here are problems concerning material from Chapters 5 and 6. To review the other chapters, look over previous practice sheets for the two exams, previous quizzes, previous homeworks and
More informationMATH 38061/MATH48061/MATH68061: MULTIVARIATE STATISTICS Solutions to Problems on Random Vectors and Random Sampling. 1+ x2 +y 2 ) (n+2)/2
MATH 3806/MATH4806/MATH6806: MULTIVARIATE STATISTICS Solutions to Problems on Rom Vectors Rom Sampling Let X Y have the joint pdf: fx,y) + x +y ) n+)/ π n for < x < < y < this is particular case of the
More informationChapter 2. Continuous random variables
Chapter 2 Continuous random variables Outline Review of probability: events and probability Random variable Probability and Cumulative distribution function Review of discrete random variable Introduction
More informationLectures 5 & 6: Hypothesis Testing
Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across
More informationThe regression model with one stochastic regressor.
The regression model with one stochastic regressor. 3150/4150 Lecture 6 Ragnar Nymoen 30 January 2012 We are now on Lecture topic 4 The main goal in this lecture is to extend the results of the regression
More informationPerhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.
Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage
More informationNumerical Methods Lecture 7 - Statistics, Probability and Reliability
Topics Numerical Methods Lecture 7 - Statistics, Probability and Reliability A summary of statistical analysis A summary of probability methods A summary of reliability analysis concepts Statistical Analysis
More informationNotes on Mathematics Groups
EPGY Singapore Quantum Mechanics: 2007 Notes on Mathematics Groups A group, G, is defined is a set of elements G and a binary operation on G; one of the elements of G has particularly special properties
More informationMultivariate distributions
CHAPTER Multivariate distributions.. Introduction We want to discuss collections of random variables (X, X,..., X n ), which are known as random vectors. In the discrete case, we can define the density
More informationLecture 2: Repetition of probability theory and statistics
Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:
More informationChapter 2 The Simple Linear Regression Model: Specification and Estimation
Chapter The Simple Linear Regression Model: Specification and Estimation Page 1 Chapter Contents.1 An Economic Model. An Econometric Model.3 Estimating the Regression Parameters.4 Assessing the Least Squares
More informationChp 4. Expectation and Variance
Chp 4. Expectation and Variance 1 Expectation In this chapter, we will introduce two objectives to directly reflect the properties of a random variable or vector, which are the Expectation and Variance.
More informationStatistics for Data Analysis. Niklaus Berger. PSI Practical Course Physics Institute, University of Heidelberg
Statistics for Data Analysis PSI Practical Course 2014 Niklaus Berger Physics Institute, University of Heidelberg Overview You are going to perform a data analysis: Compare measured distributions to theoretical
More informationProbability and Distributions
Probability and Distributions What is a statistical model? A statistical model is a set of assumptions by which the hypothetical population distribution of data is inferred. It is typically postulated
More informationLecture 1: Basics of Probability
Lecture 1: Basics of Probability (Luise-Vitetta, Chapter 8) Why probability in data science? Data acquisition is noisy Sampling/quantization external factors: If you record your voice saying machine learning
More informationSystem Identification
System Identification Arun K. Tangirala Department of Chemical Engineering IIT Madras July 27, 2013 Module 3 Lecture 1 Arun K. Tangirala System Identification July 27, 2013 1 Objectives of this Module
More informationMultivariate probability distributions and linear regression
Multivariate probability distributions and linear regression Patrik Hoyer 1 Contents: Random variable, probability distribution Joint distribution Marginal distribution Conditional distribution Independence,
More informationSTAT 414: Introduction to Probability Theory
STAT 414: Introduction to Probability Theory Spring 2016; Homework Assignments Latest updated on April 29, 2016 HW1 (Due on Jan. 21) Chapter 1 Problems 1, 8, 9, 10, 11, 18, 19, 26, 28, 30 Theoretical Exercises
More informationS n = x + X 1 + X X n.
0 Lecture 0 0. Gambler Ruin Problem Let X be a payoff if a coin toss game such that P(X = ) = P(X = ) = /2. Suppose you start with x dollars and play the game n times. Let X,X 2,...,X n be payoffs in each
More information01 Probability Theory and Statistics Review
NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement
More informationProbability. Paul Schrimpf. January 23, UBC Economics 326. Probability. Paul Schrimpf. Definitions. Properties. Random variables.
Probability UBC Economics 326 January 23, 2018 1 2 3 Wooldridge (2013) appendix B Stock and Watson (2009) chapter 2 Linton (2017) chapters 1-5 Abbring (2001) sections 2.1-2.3 Diez, Barr, and Cetinkaya-Rundel
More informationProperties of Summation Operator
Econ 325 Section 003/004 Notes on Variance, Covariance, and Summation Operator By Hiro Kasahara Properties of Summation Operator For a sequence of the values {x 1, x 2,..., x n, we write the sum of x 1,
More information5 Operations on Multiple Random Variables
EE360 Random Signal analysis Chapter 5: Operations on Multiple Random Variables 5 Operations on Multiple Random Variables Expected value of a function of r.v. s Two r.v. s: ḡ = E[g(X, Y )] = g(x, y)f X,Y
More informationECONOMICS TRIPOS PART I. Friday 15 June to 12. Paper 3 QUANTITATIVE METHODS IN ECONOMICS
ECONOMICS TRIPOS PART I Friday 15 June 2007 9 to 12 Paper 3 QUANTITATIVE METHODS IN ECONOMICS This exam comprises four sections. Sections A and B are on Mathematics; Sections C and D are on Statistics.
More informationNorthwestern University Department of Electrical Engineering and Computer Science
Northwestern University Department of Electrical Engineering and Computer Science EECS 454: Modeling and Analysis of Communication Networks Spring 2008 Probability Review As discussed in Lecture 1, probability
More information18.440: Lecture 25 Covariance and some conditional expectation exercises
18.440: Lecture 25 Covariance and some conditional expectation exercises Scott Sheffield MIT Outline Covariance and correlation Outline Covariance and correlation A property of independence If X and Y
More informationStatistical Distribution Assumptions of General Linear Models
Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions
More informationMultivariate Random Variable
Multivariate Random Variable Author: Author: Andrés Hincapié and Linyi Cao This Version: August 7, 2016 Multivariate Random Variable 3 Now we consider models with more than one r.v. These are called multivariate
More informationA review of probability theory
1 A review of probability theory In this book we will study dynamical systems driven by noise. Noise is something that changes randomly with time, and quantities that do this are called stochastic processes.
More informationStatistics 100A Homework 5 Solutions
Chapter 5 Statistics 1A Homework 5 Solutions Ryan Rosario 1. Let X be a random variable with probability density function a What is the value of c? fx { c1 x 1 < x < 1 otherwise We know that for fx to
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Week #1
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Week #1 Today Introduction to machine learning The course (syllabus) Math review (probability + linear algebra) The future
More informationECE Lecture #9 Part 2 Overview
ECE 450 - Lecture #9 Part Overview Bivariate Moments Mean or Expected Value of Z = g(x, Y) Correlation and Covariance of RV s Functions of RV s: Z = g(x, Y); finding f Z (z) Method : First find F(z), by
More informationCS 246 Review of Proof Techniques and Probability 01/14/19
Note: This document has been adapted from a similar review session for CS224W (Autumn 2018). It was originally compiled by Jessica Su, with minor edits by Jayadev Bhaskaran. 1 Proof techniques Here we
More informationArkansas Tech University MATH 3513: Applied Statistics I Dr. Marcel B. Finan
2.4 Random Variables Arkansas Tech University MATH 3513: Applied Statistics I Dr. Marcel B. Finan By definition, a random variable X is a function with domain the sample space and range a subset of the
More informationLecture 1: Introduction and probability review
Stat 200: Introduction to Statistical Inference Autumn 2018/19 Lecture 1: Introduction and probability review Lecturer: Art B. Owen September 25 Disclaimer: These notes have not been subjected to the usual
More informationStatistics for Economists. Lectures 3 & 4
Statistics for Economists Lectures 3 & 4 Asrat Temesgen Stockholm University 1 CHAPTER 2- Discrete Distributions 2.1. Random variables of the Discrete Type Definition 2.1.1: Given a random experiment with
More informationJoint Distribution of Two or More Random Variables
Joint Distribution of Two or More Random Variables Sometimes more than one measurement in the form of random variable is taken on each member of the sample space. In cases like this there will be a few
More informationLecture Note 1: Probability Theory and Statistics
Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would
More informationReview of Probability. CS1538: Introduction to Simulations
Review of Probability CS1538: Introduction to Simulations Probability and Statistics in Simulation Why do we need probability and statistics in simulation? Needed to validate the simulation model Needed
More informationProblem 1. Problem 2. Problem 3. Problem 4
Problem Let A be the event that the fungus is present, and B the event that the staph-bacteria is present. We have P A = 4, P B = 9, P B A =. We wish to find P AB, to do this we use the multiplication
More information1 Review of Probability and Distributions
Random variables. A numerically valued function X of an outcome ω from a sample space Ω X : Ω R : ω X(ω) is called a random variable (r.v.), and usually determined by an experiment. We conventionally denote
More informationStatistics and Data Analysis
Statistics and Data Analysis The Crash Course Physics 226, Fall 2013 "There are three kinds of lies: lies, damned lies, and statistics. Mark Twain, allegedly after Benjamin Disraeli Statistics and Data
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More informationFINAL EXAM: Monday 8-10am
ECE 30: Probabilistic Methods in Electrical and Computer Engineering Fall 016 Instructor: Prof. A. R. Reibman FINAL EXAM: Monday 8-10am Fall 016, TTh 3-4:15pm (December 1, 016) This is a closed book exam.
More informationReview: mostly probability and some statistics
Review: mostly probability and some statistics C2 1 Content robability (should know already) Axioms and properties Conditional probability and independence Law of Total probability and Bayes theorem Random
More informationCOMP6053 lecture: Sampling and the central limit theorem. Jason Noble,
COMP6053 lecture: Sampling and the central limit theorem Jason Noble, jn2@ecs.soton.ac.uk Populations: long-run distributions Two kinds of distributions: populations and samples. A population is the set
More informationFor a stochastic process {Y t : t = 0, ±1, ±2, ±3, }, the mean function is defined by (2.2.1) ± 2..., γ t,
CHAPTER 2 FUNDAMENTAL CONCEPTS This chapter describes the fundamental concepts in the theory of time series models. In particular, we introduce the concepts of stochastic processes, mean and covariance
More information