Discrete Probability Distributions


Ka-fu WONG

24 August 2007

Abstract: When there are many basic events, their probabilities are better described by a probability function. Such a function can of course be visualized as a histogram or a table. Nevertheless, students are expected to get used to probability functions. Do not avoid them; be prepared. More complicated probability functions will be introduced in a later chapter. We will try to illustrate the difficult concepts with many examples, but there is no need to stop at these examples: try to construct your own and discuss them with your TA and instructors.

With the basic knowledge of probability reviewed in the last chapter, we are equipped to discuss some commonly used statistical concepts and probability distributions. These concepts and distributions are used often by economists in economics and finance.

We live in an uncertain world; almost everything that will happen tomorrow is a random event. Examples are numerous: Will it rain tomorrow? Will the US central bank (the Federal Reserve) raise its target interest rate (the Federal Funds rate) at the next FOMC meeting? Will the price of my favorite stock go up tomorrow? By how much will the economy grow this year? How much time will it take me to get to school tomorrow? Some of these events are numerical in nature; those that are not can be coded into numerical values. Since the numerical value of such a random event is not known at the time we ask the question, it is a random variable.

Definition 1 (Random variables): A random variable is a numerical value determined by the outcome of an experiment. A random variable is often denoted by a capital letter, e.g., X or Y.

Example 1 (Random variables): The following variable X is a random variable that maps numerical values to some weather events:

X   event
1   rainy or cloudy
2   cloudy
3   cloudy or sunny

However, this type of random variable is not convenient to work with because the events are not mutually exclusive.¹ The following classification of events and mapping to the random variable is preferred:

X   event
1   rainy
2   cloudy
3   sunny

Note that these events are mutually exclusive and exhaustive. There are many different ways of mapping values of X to events; nothing prevents us from adopting a different mapping, such as

X    event
-1   rainy
0    cloudy
5    sunny

However, unless we have strong reasons to do otherwise, it is conventional to map X to consecutive positive integers (i.e., 1, 2, 3, 4, 5, ...) when the events are discrete.

How should we describe a random variable that has not yet been realized? We can list all the possible outcomes of the random variable and how likely each outcome is.

Definition 2 (Probability distribution): A probability distribution is the listing of all possible outcomes of an experiment and their corresponding probabilities. These outcomes should be mutually exclusive and exhaustive.

Once the probability distribution is known, we can compute the probability of composite events, such as P(A and B) and P(A or B), using the probability rules we learned in earlier chapters. According to the numerical values the random variable can take and their probabilities of occurrence, we classify random variables as discrete or continuous, and hence speak of discrete probability distributions and continuous probability distributions.

¹ Recall that when events A and B are mutually exclusive, we have P(A or B) = P(A) + P(B) and P(A and B) = 0.

Definition 3 (Discrete probability distributions): A discrete probability distribution describes the probabilities of a random variable that takes discrete values. The number of values (hence outcomes) the random variable takes need not be finite, but it must be countable (there can be countably infinite outcomes). A discrete probability distribution is also known as a probability mass function.

Example 2 (The twelve zodiac signs): Chinese astrology has 12 animals representing a 12-year cycle of the lunar calendar: Rat, Ox, Tiger, Rabbit, Dragon, Snake, Horse, Goat, Monkey, Rooster, Dog and Pig. If couples do not have a strong preference over the birth years of their children, the probability of a randomly drawn person being born in the year of the Dragon is the same as in any other year.² That is,

Zodiac sign   Rat    Ox     Tiger  Rabbit  Dragon  Snake  Horse  Goat   Monkey  Rooster  Dog    Pig
P(X)          1/12   1/12   1/12   1/12    1/12    1/12   1/12   1/12   1/12    1/12     1/12   1/12

And, when the zodiac signs are mapped to numerical values:

Zodiac sign   Rat    Ox     Tiger  Rabbit  Dragon  Snake  Horse  Goat   Monkey  Rooster  Dog    Pig
X             1      2      3      4       5       6      7      8      9       10       11     12
P(X)          1/12   1/12   1/12   1/12    1/12    1/12   1/12   1/12   1/12    1/12     1/12   1/12

Example 3 (Household size in mainland China and Hong Kong): The following is the probability distribution of family household size in mainland China in 2005 (Source: Table 4-14, Family Households by Size and Region (2005), available at http://www.stats.gov.cn/tjsj/ndsj/2006/html/d0414e.xls):

Household size, X   1       2       3       4       5       6       7       8       9       >= 10
P(X)                0.107   0.245   0.298   0.192   0.102   0.038   0.011   0.004   0.002   0.001

² The Far Eastern Economic Review (24 June 1999) reported that 2000 (a Dragon year) was expected to be a banner year for births in China. It pointed out that many Chinese considered the combination of a Dragon year and the Millennium a double whammy of good fortune. Consequently, in mainland China, many couples planned to give their cherished only child a head start in life by getting pregnant in 1999, so their babies would be born in 2000, a Dragon year.

[Bar chart: P(X) against household size X = 1, 2, ..., >= 10, mainland China, 2005.]

Note that household size X is a positive integer, i.e., {1, 2, 3, ...}. From the distribution, we can compute the probability that a randomly drawn family has a household size smaller than or equal to 3:

P(X <= 3) = P(X = 1) + P(X = 2) + P(X = 3) = 0.107 + 0.245 + 0.298 = 0.650

As a comparison, here is the probability distribution of family household size in Hong Kong in 2006 (Source: Table 161: Domestic households by household size and type of quarters, 2006, available at http://www.censtatd.gov.hk/):

Household size, X   1       2       3       4       5       6
P(X)                0.165   0.241   0.232   0.227   0.096   0.039

[Bar chart: P(X) against household size X = 1, 2, ..., 6, Hong Kong, 2006.]

The probability that a randomly drawn family has a household size smaller than or equal to 3 is P(X <= 3) = P(X = 1) + P(X = 2) + P(X = 3) = 0.638, which is not too different from that of mainland China. Should we conclude that there is no difference between the two distributions? Probably not: Hong Kong actually has more households of size 1 and fewer of size 3 than mainland China.

As illustrated in the examples above, a discrete probability distribution may be listed in a table, or drawn as a bar chart with the x-axis showing the possible values of the random variable and the y-axis showing their probabilities. These two methods are not convenient when X can take many possible values. A better method is to use a function that relates the values x the random variable can take to their probabilities P(X = x).

Example 4 (Probability function for discrete random variables): The following function describes the probability distribution of a random variable:

P(X) = X/10, for X = 1, 2, 3, 4.

The probabilities can be listed in a table,

X      1     2     3     4
P(X)   0.1   0.2   0.3   0.4

and drawn in a bar chart, as in the earlier examples.

[Bar chart: P(X) against X = 1, 2, 3, 4.]
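As a quick check of Example 4, a short Python sketch (an illustration, not part of the original notes) can tabulate this probability function and verify that it satisfies the requirements of a probability distribution:

```python
from fractions import Fraction

# Probability function from Example 4: P(X = x) = x/10 for x = 1, 2, 3, 4.
def p(x):
    return Fraction(x, 10)

support = [1, 2, 3, 4]
pmf = {x: p(x) for x in support}   # P(1)=1/10, P(2)=2/10, P(3)=3/10, P(4)=4/10

assert all(0 <= pmf[x] <= 1 for x in support)   # each probability lies in [0, 1]
assert sum(pmf.values()) == 1                   # probabilities sum to one
```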

When the random variable can take many values, listing the probability distribution in a table will prove difficult. An example would be

$$P(X) = \frac{1001 - X}{500500}, \quad X = 1, 2, 3, \dots, 1000.$$

Definition 4 (Continuous probability distributions): A continuous probability distribution describes a random variable that takes continuous values, i.e., one that can assume an infinite number of values within a given range or interval(s).

Example 5 (Continuous probability distributions): Depending on the accuracy of measurement, the time it takes a student to travel to class can range from zero to two hours, i.e., X ∈ [0, 2].

A continuous probability distribution cannot be listed as in the discrete case, because the random variable can take uncountably many values. Note that in this example the outcomes lie within an interval; at least theoretically, the variable can take any number inside the interval. The values need not be integers, nor multiples of any particular number. However, in real life, no hairdresser is going to charge us $74.345672; prices are usually rounded to the nearest integer. We do not report time as 3.241787 seconds; measuring time to six decimal places is possible but costly. Thus, depending on the purpose, measurements of time are often rounded to the nearest second or minute. Of course, the measurement of time does need to be accurate to six decimal places in the launch of a space shuttle or in some science experiments. In short, in real-life applications, variables that are in principle continuous are often treated, approximated, or recorded as discrete values.

1 Features of a Discrete Probability Distribution

Probability distributions may be classified according to the number of random variables they describe:

Number of random variables   Joint distribution
1                            Univariate probability distribution
2                            Bivariate probability distribution
3                            Trivariate probability distribution
...                          ...
n                            Multivariate probability distribution

These distributions have similar characteristics. We will discuss these characteristics for the univariate and bivariate distributions; the extension to the multivariate case is straightforward.

Theorem 1 (Characteristics of a univariate discrete distribution): Let x_1, ..., x_N be the list of all possible outcomes (N of them) of a random variable X.

1. The probability of a particular outcome, P(x_i), must lie between 0 and 1, i.e., P(x_i) ∈ [0, 1].

2. The sum of the probabilities of all possible outcomes (exhaustive and mutually exclusive) must be 1. That is,

P(x_1) + ... + P(x_N) = 1

3. The outcomes are mutually exclusive. That is, for all i not equal to k,

P(x_i and x_k) = 0 and P(x_i or x_k) = P(x_i) + P(x_k)

Example 6 (Univariate probability distribution): Consider a random experiment in which a fair coin is tossed three times. Let x be the number of heads. Let H represent the outcome of a head and T the outcome of a tail. The possible outcomes of such an experiment are TTT, TTH, THT, THH, HTT, HTH, HHT, HHH. Thus the possible values of x (number of heads) are

x   outcomes          P(X = x)
0   TTT               1/8
1   TTH, THT, HTT     3/8
2   THH, HTH, HHT     3/8
3   HHH               1/8

By the definition of a random variable, X as defined in this experiment is a random variable.
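The pmf in Example 6 can be derived mechanically by enumerating the sample space; the following Python sketch (illustrative, not from the original notes) does exactly that:

```python
from itertools import product
from collections import Counter

# Enumerate all 2^3 equally likely outcomes of three fair-coin tosses.
outcomes = list(product("HT", repeat=3))          # ('H','H','H'), ('H','H','T'), ...

# Count heads in each outcome and tally how often each count occurs.
heads = Counter(seq.count("H") for seq in outcomes)

pmf = {x: heads[x] / len(outcomes) for x in sorted(heads)}
print(pmf)   # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```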

Theorem 2 (Characteristics of a bivariate discrete distribution): If X and Y are discrete random variables that take N and M possible values respectively, we may define their joint probability function as P_XY(x, y). Let (x_1, y_1), ..., (x_N, y_1), ..., (x_1, y_M), ..., (x_N, y_M) be the list of all possible outcomes (NM of them). Usually we put all these into a table like the following:

         x_1             x_2             ...   x_N             Total
y_1      P_XY(x_1,y_1)   P_XY(x_2,y_1)   ...   P_XY(x_N,y_1)   P_Y(y_1)
y_2      P_XY(x_1,y_2)   P_XY(x_2,y_2)   ...   P_XY(x_N,y_2)   P_Y(y_2)
...      ...             ...             ...   ...             ...
y_M      P_XY(x_1,y_M)   P_XY(x_2,y_M)   ...   P_XY(x_N,y_M)   P_Y(y_M)
Total    P_X(x_1)        P_X(x_2)        ...   P_X(x_N)        1

1. The probability of a particular outcome, P_XY(x, y), must lie between 0 and 1, i.e., P_XY(x, y) ∈ [0, 1].

2. The outcomes are mutually exclusive. That is, for all i ≠ k or j ≠ l,

P_XY((x_i, y_j) and (x_k, y_l)) = 0 and P_XY((x_i, y_j) or (x_k, y_l)) = P_XY(x_i, y_j) + P_XY(x_k, y_l)

3. The marginal probability function of X is

$$P_X(x) = \sum_{j=1}^{M} P_{XY}(x, y_j) = \sum_y P_{XY}(x, y)$$

Note that the marginal probability function of X is used when we do not care about the values Y takes. Similarly, the marginal probability function of Y is

$$P_Y(y) = \sum_{i=1}^{N} P_{XY}(x_i, y) = \sum_x P_{XY}(x, y)$$

4. The sum of all joint probabilities equals 1:

$$\sum_{i=1}^{N} \sum_{j=1}^{M} P_{XY}(x_i, y_j) = \sum_x \sum_y P_{XY}(x, y) = P_{XY}(x_1, y_1) + \dots + P_{XY}(x_N, y_M) = 1$$

5. The conditional probability function of X given Y is

$$P_{X|Y}(x \mid y) = \begin{cases} P(X = x \mid Y = y) & \text{if } P(Y = y) > 0 \\ 0 & \text{if } P(Y = y) = 0 \end{cases}$$

For each fixed y this is a probability function for X, i.e., the conditional probability function is non-negative and

$$\sum_x P_{X|Y}(x \mid y) = 1$$

By the definition of conditional probability, P_{X|Y}(x|y) = P_XY(x, y)/P_Y(y). E.g., P(HSI rises | Rainy) = 0.2/0.35 in Example 8 below. When X and Y are independent, P_{X|Y}(x|y) equals P_X(x).

The relationships among these statements are better visualized in a table, as shown in the examples below.

Example 7 (Bivariate distribution I): A bag contains 4 white, 3 red and 5 blue balls. Two are chosen at random without replacement. Let X be the number of red balls chosen and let Y be the number of white balls chosen. X and Y both take possible values 0, 1 and 2. We find that:

p(0, 0) = P(X = 0, Y = 0) = P(both blue) = 10/66
p(1, 0) = P(X = 1, Y = 0) = P(one red, one blue) = 15/66
p(0, 1) = P(X = 0, Y = 1) = P(one white, one blue) = 20/66
p(0, 2) = P(X = 0, Y = 2) = P(both white) = 6/66
p(2, 0) = P(X = 2, Y = 0) = P(both red) = 3/66
p(1, 1) = P(X = 1, Y = 1) = P(one red, one white) = 12/66

The remaining combinations of X and Y are impossible. Hence the joint probability table will look like

         y = 0    y = 1    y = 2   Total
x = 0    10/66    20/66    6/66    36/66
x = 1    15/66    12/66    0       27/66
x = 2    3/66     0        0       3/66
Total    28/66    32/66    6/66    1

Example 8 (Bivariate distribution II): The hypothetical joint distribution of the movement of the Hang Seng Index (HSI) and the weather is shown in the following table.

                  Rainy (y_1)   Not rainy (y_2)
HSI falls (x_1)   0.15          0.40
HSI rises (x_2)   0.20          0.25

From the table, we can calculate the following quantities:

1. The unconditional probability of X, i.e., P(x), and the unconditional probability of Y, i.e., P(y):

           y_1    y_2    P(X = x)
x_1        0.15   0.40   0.55
x_2        0.20   0.25   0.45
P(Y = y)   0.35   0.65

2. The conditional probability of Y given X, i.e., P_{Y|X}(y|x), and the conditional probability of X given Y, i.e., P_{X|Y}(x|y):

             y_1         y_2         P(X)   P(X | y_1)   P(X | y_2)
x_1          0.15        0.40        0.55   0.15/0.35    0.40/0.65
x_2          0.20        0.25        0.45   0.20/0.35    0.25/0.65
P(Y)         0.35        0.65
P(Y | x_1)   0.15/0.55   0.40/0.55
P(Y | x_2)   0.20/0.45   0.25/0.45

3. Stock market movement is not independent of the weather: P(x_1 | y_1) ≠ P(x_1).

Note that the numbers in this example are not based on empirical data. Nevertheless, it remains an interesting question whether the weather and stock movements in Hong Kong are correlated. Evidence on the United States is mixed; see, for example, Hirshleifer and Shumway (2003) and Goetzmann and Zhu (2004).
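The marginal and conditional probabilities in Example 8 follow mechanically from the joint table; here is a small Python sketch (illustrative only) of that bookkeeping:

```python
# Joint distribution of Example 8: keys are (HSI movement, weather).
joint = {
    ("falls", "rainy"): 0.15, ("falls", "not rainy"): 0.40,
    ("rises", "rainy"): 0.20, ("rises", "not rainy"): 0.25,
}

# Marginals: sum the joint probabilities over the other variable.
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p
    p_y[y] = p_y.get(y, 0) + p

# Conditional P(X = x | Y = rainy) = P(x, rainy) / P(rainy).
p_x_given_rainy = {x: joint[(x, "rainy")] / p_y["rainy"] for x in p_x}

print(p_x)               # {'falls': 0.55, 'rises': 0.45}
print(p_y)               # {'rainy': 0.35, 'not rainy': 0.65}
print(p_x_given_rainy)   # {'falls': 0.4286..., 'rises': 0.5714...}
```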

Theorem 3 (Transformation of random variables): A transformation of random variable(s) results in a new random variable.

Example 9 (Transformation of random variables): If X and Y are random variables, the following variables Z are also random variables:

1. Z = 2X
2. Z = 3 + 2X
3. Z = X²
4. Z = log(X)
5. Z = X + Y
6. Z = X² + Y²
7. Z = XY

Of course, Z = 0 · X = 0 is not really random, but it is sometimes called a degenerate random variable.

Example 10 (A linear transformation of a random variable): Let a and b be constants, and X a random variable. The random variable Z = a + bX will have a probability distribution similar to that of X.

1. Let X be the gender outcome of drawing a student in a class. The probability distribution is

x      1 (male)   2 (female)
P(x)   0.4        0.6

Let Z = 2 − X. The probability distribution of Z is

z      1 (male)   0 (female)
P(z)   0.4        0.6

In essence, we are just re-coding the gender variable to a different set of integers.

2. Let X be the quiz grade outcome of drawing a student in a class. The probability distribution is

x      1      2     3     4     5
P(x)   0.00   0.3   0.4   0.2   0.1

Suppose the professor decides to double the points awarded to all questions in the quiz, i.e., a linear transformation of the grades Z = 2X. The probability distribution of Z is

z      2      4     6     8     10
P(z)   0.00   0.3   0.4   0.2   0.1

3. Let X be the age outcome of drawing a student in a class in 2004. The probability distribution is

x      16     17    18    19     20
P(x)   0.01   0.1   0.7   0.13   0.06

Suppose we want to display the result as year of birth; it is as if we are doing a linear transformation of the variable to Z = 2004 − X.

z      1988   1987   1986   1985   1984
P(z)   0.01   0.1    0.7    0.13   0.06

In essence, a linear transformation may be viewed as a change of unit, say from age to year of birth. It does not change the probability distribution.

4. Let X be the daily stock return (percentage change in stock price) of a company, which takes discrete values from −5% to 5%. The probability distribution is

x      −5%    −4%    −3%    −2%    −1%    0%     1%     2%     3%     4%     5%
P(x)   0.01   0.05   0.09   0.12   0.15   0.16   0.15   0.12   0.09   0.05   0.01

Suppose we invest $20,000 in the stock. The probability distribution of the value of our stock, Z = 20,000 × (1 + X), is

z      19,000   19,200   19,400   19,600   19,800   20,000   20,200   20,400   20,600   20,800   21,000
P(z)   0.01     0.05     0.09     0.12     0.15     0.16     0.15     0.12     0.09     0.05     0.01

2 Expectations and expected values

The concept of expectation plays a central role in statistics and economics. The expectation (often known as the mean, or first moment) is a measure of the central location of a distribution. It is also known as the long-run average value of the random variable, i.e., the average of the outcomes of many repetitions of the experiment. The concept of expectation is widely used in macroeconomics (rational expectations), in the study of uncertainty in microeconomics (expected utility), and in the study of investment (expected portfolio returns).

Definition 5 (Expectation, mean, first moment): Let X be a random variable that takes N possible values {x_1, ..., x_N} with probability distribution {P(x_1), ..., P(x_N)}. The expectation of X is

$$E(X) = \sum_{i=1}^{N} x_i P(x_i)$$

The expectation E(X) is often denoted by the Greek letter µ (pronounced "mu"). Thus, the expectation of a random variable is a weighted average of all the possible values of the random variable, weighted by its probability distribution.

Example 11 (Expected rate of return): Let X be the daily stock return (percentage change in stock price) of a company, which takes discrete values from −5% to 5%. The probability distribution is

x      −5%    −4%    −3%    −2%    −1%    0%     1%     2%     3%     4%     5%
P(x)   0.01   0.05   0.09   0.12   0.15   0.16   0.15   0.12   0.09   0.05   0.01

The expected rate of return of investing in the stock is 0%. Should we be surprised to find a zero expected rate of return for the probability distribution above? Not really, because the stock return is symmetrically distributed around zero.

Simulation 1 (Long-run average interpretation of expectation): Take the setup of the last example: X is the daily stock return with the probability distribution above. We would like to verify that the expectation can be interpreted as a long-run average. The simulation proceeds as follows:

1. Randomly draw a stock return according to the above probability distribution, e.g., a 0.01 chance of drawing −5%.
2. Repeat drawing from the distribution n times and compute the average of the stock returns over these n draws.

The following table reports the results of the simulations.

Number of draws (n)   1         10       30       100       500       1500      5000      10000
Average return        −3.000%   0.200%   0.133%   −0.290%   −0.126%   −0.105%   −0.039%   −0.001%

As expected, as the number of draws n increases, the average return of the stock approaches the theoretical expectation, 0%. Thus, the simulation confirms that the expected value has the interpretation of the long-run average of the random variable.
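A simulation along these lines is easy to reproduce; here is one possible Python sketch (illustrative, not from the original notes; the exact averages depend on the random seed, so they will differ from the table above):

```python
import random

returns = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]   # in percent
probs   = [0.01, 0.05, 0.09, 0.12, 0.15, 0.16, 0.15, 0.12, 0.09, 0.05, 0.01]

random.seed(1)
for n in [1, 10, 30, 100, 500, 1500, 5000, 10000]:
    draws = random.choices(returns, weights=probs, k=n)   # n independent draws
    print(n, sum(draws) / n)   # the sample average approaches E(X) = 0 as n grows
```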

Definition 6 (Expectation of a random variable from a bivariate distribution): Suppose X and Y are jointly distributed random variables that take values {(x_1, y_1), ..., (x_N, y_M)} with probability distribution {P_XY(x_1, y_1), ..., P_XY(x_N, y_M)}. The expectation of X is

$$E(X) = \sum_{i=1}^{N} \sum_{j=1}^{M} x_i P_{XY}(x_i, y_j) = \sum_x \sum_y x P_{XY}(x, y)$$

Example 12 (Expectation of a random variable in a bivariate distribution): In order to better serve the elderly, providing them with better facilities to enrich their retired lives, a neighborhood committee conducted a census among the people older than 55 living in the neighborhood. Part of the result is summarized in the table below (due to the one-child policy, no family consists of more than 5 members):

                 Retiree presence, X
Family size, Y   1 = Yes   0 = No
1                .04       .08
2                .20       .05
3                .18       .04
4                .15       .02
5                .22       .02

According to this information, and assuming the structure is similar within the district, what is the expected family size for people older than 55 in the district?

$$E(Y) = \sum_{j=1}^{5} \sum_{i=1}^{2} y_j P_{XY}(x_i, y_j) = 1(.04 + .08) + 2(.20 + .05) + 3(.18 + .04) + 4(.15 + .02) + 5(.22 + .02) = 3.16$$

Rounding to the nearest integer, a reasonable expectation of the family size is 3 for families containing at least one member older than 55. We can also compute the expected family size for families containing at least one retiree, i.e., X = 1:

E(Y | X = 1) = (1 × 0.04 + 2 × 0.20 + 3 × 0.18 + 4 × 0.15 + 5 × 0.22)/(0.04 + 0.20 + 0.18 + 0.15 + 0.22) = 3.39

Example 13 (Sum of two random variables):

1. Consider tossing a fair coin once. Let a head be coded 1 and a tail 0. The probability distribution and the expected value of the random variable are

X      1     0
P(X)   0.5   0.5

E(X) = 0.5 × 1 + 0.5 × 0 = 0.5

2. Consider tossing a fair coin twice. Let a head be coded 1 and a tail 0. The joint probability distribution of the two random variables (X_1 the result of the first toss, X_2 the result of the second toss) and their expected values are

          X_1 = 1   X_1 = 0
X_2 = 1   0.25      0.25
X_2 = 0   0.25      0.25

or

(X_1, X_2)    (1,1)   (1,0)   (0,1)   (0,0)
P(X_1, X_2)   0.25    0.25    0.25    0.25

E(X_1) = 0.5 × 1 + 0.5 × 0 = 0.5

E(X_2) = 0.5 × 1 + 0.5 × 0 = 0.5

The total number of heads in the two tosses is Z = X_1 + X_2. The probability distribution and the expected value of this random variable are

Z      0      1     2
P(Z)   0.25   0.5   0.25

E(Z) = 0.25 × 0 + 0.5 × 1 + 0.25 × 2 = 1

This example shows numerically that E(Z) = E(X_1) + E(X_2), i.e., 1 = 0.5 + 0.5. More generally, one can verify mathematically that E(X + Y) = E(X) + E(Y).

Example 14 (Expectation of the product of two random variables): The random variables X and Y are jointly distributed as

        Y = 1   Y = 2   Y = 3
X = 1   0.10    0.05    0.10
X = 4   0.35    0.40    0

Compute E(X), E(Y), and E(XY).

1. E(X) = (0.1 + 0.05 + 0.1) × 1 + (0.35 + 0.4 + 0) × 4 = 3.25.
2. E(Y) = (0.1 + 0.35) × 1 + (0.05 + 0.4) × 2 + (0.1 + 0) × 3 = 1.65.
3. To compute E(XY), we first make a probability distribution table of Z = XY:

Z      1      2      3      4      8      12
P(Z)   0.10   0.05   0.10   0.35   0.40   0

Hence E(Z) = E(XY) = 0.10(1) + 0.05(2) + 0.10(3) + 0.35(4) + 0.40(8) + 0(12) = 5.1.

In this example, it is not difficult to verify that E(XY) ≠ E(X)E(Y).

Simulation 2 (Expectation of the product of two random variables): Take the setup of the last example. We would like to verify that the expectation can be interpreted as a long-run average. It is useful to rearrange the joint distribution as

(X, Y)    (1,1)   (1,2)   (1,3)   (4,1)   (4,2)   (4,3)
P(X, Y)   0.10    0.05    0.10    0.35    0.40    0

1. Randomly draw a combination of X and Y from the probability distribution, e.g., a 0.4 chance of drawing the pair X = 4 and Y = 2, and compute the product Z = XY.
2. Repeat drawing from the distribution n times and compute the average of X, the average of Y, and the average of Z over these n draws.

The simulation results are shown in the following table, along with the theoretical expectations.

Number of draws (n)   1      10     50     200     1000    2000    5000    Theoretical
Average X             4.00   3.10   3.10   3.143   3.208   3.226   3.253   3.25
Average Y             2.00   1.40   1.78   1.625   1.644   1.63    1.647   1.65
Average Z             8.00   4.40   5.26   4.88    5.031   5.037   5.114   5.10

It is not difficult to see that the averages approach their theoretical counterparts as n increases.
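A possible Python sketch of Simulation 2 (illustrative; exact averages depend on the random seed):

```python
import random

pairs   = [(1, 1), (1, 2), (1, 3), (4, 1), (4, 2), (4, 3)]
weights = [0.10, 0.05, 0.10, 0.35, 0.40, 0.00]

random.seed(1)
for n in [10, 200, 1000, 5000]:
    draws = random.choices(pairs, weights=weights, k=n)
    avg_x = sum(x for x, _ in draws) / n
    avg_y = sum(y for _, y in draws) / n
    avg_z = sum(x * y for x, y in draws) / n
    # Averages should approach E(X) = 3.25, E(Y) = 1.65, E(XY) = 5.1.
    print(n, avg_x, avg_y, avg_z)
```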

Definition 7 (Conditional expectation, conditional mean, conditional moment): For two random variables that are jointly distributed with a bivariate probability distribution, the conditional expectation or conditional mean E(X | Y = y_j) is computed by the formula

$$E(X \mid Y = y_j) = \sum_x x P_{X|Y}(x \mid y_j) = x_1 P_{X|Y}(x_1 \mid y_j) + x_2 P_{X|Y}(x_2 \mid y_j) + \dots + x_N P_{X|Y}(x_N \mid y_j)$$

Sometimes we write µ_{X|Y=y_j} = E(X | Y = y_j). The unconditional expectation or mean of X is related to the conditional mean by

$$E(X) = \sum_y E(X \mid Y = y) P_Y(y) = E[E(X \mid Y)]$$

Example 15 (Conditional expectation): The random variables X and Y are jointly distributed as

        Y = 1   Y = 2   Y = 3
X = 1   0.10    0.05    0.10
X = 4   0.35    0.40    0

Compute the conditional expectation E(Y | X = 4). First we find the conditional probability distribution:

Y              1                        2                        3
P(Y | X = 4)   0.35/(0.35 + 0.40 + 0)   0.40/(0.35 + 0.40 + 0)   0/(0.35 + 0.40 + 0)

Using the formula in the definition, we have

E(Y | X = 4) = Σ_y y P_{Y|X}(y | 4) = 1 × [0.35/0.75] + 2 × [0.40/0.75] + 3 × [0/0.75] = 1.53

Example 16 (I haven't found good real data yet): [Still looking for a good example...]

Theorem 4 (Expectation of a linearly transformed random variable): If a and b are constants and X is a random variable, then

1. E(a) = a
2. E(bX) = bE(X)
3. E(a + bX) = a + bE(X)

Proof: We show only the most general case, E(a + bX) = a + bE(X). Since the event {Z = a + bx} is the same as the event {X = x}, P(a + bx) = P(x), and

E(a + bX) = Σ_x (a + bx) P(a + bx)
          = Σ_x (a + bx) P(x)
          = (a + bx_1)P(x_1) + (a + bx_2)P(x_2) + ... + (a + bx_N)P(x_N)
          = aP(x_1) + bx_1 P(x_1) + aP(x_2) + bx_2 P(x_2) + ... + aP(x_N) + bx_N P(x_N)
          = a[P(x_1) + P(x_2) + ... + P(x_N)] + b[x_1 P(x_1) + x_2 P(x_2) + ... + x_N P(x_N)]
          = a + bE(X)

Example 17 (Expectations of linearly transformed random variables):

1. Let X be the quiz grade outcome of drawing a student in a class. The probability distribution is

x      1      2     3     4     5
P(x)   0.00   0.3   0.4   0.2   0.1

E(X) = 0 × 1 + 0.3 × 2 + 0.4 × 3 + 0.2 × 4 + 0.1 × 5 = 3.1

Suppose the professor decides to double the points awarded to all questions in the quiz, i.e., a linear transformation of the grades Z = 2X. The probability distribution of Z is

z      2      4     6     8     10
P(z)   0.00   0.3   0.4   0.2   0.1

E(Z) = 0 × 2 + 0.3 × 4 + 0.4 × 6 + 0.2 × 8 + 0.1 × 10 = 6.2

It is not difficult to verify that E(Z) = 2E(X).

2. Let X be the age outcome of drawing a student in a class in 2004. The probability distribution is

x      16     17    18    19     20
P(x)   0.01   0.1   0.7   0.13   0.06

E(X) = 0.01 × 16 + 0.1 × 17 + 0.7 × 18 + 0.13 × 19 + 0.06 × 20 = 18.13

Suppose we want to display the result as year of birth; it is as if we are doing a linear transformation of the variable to Z = 2004 − X.

z      1988   1987   1986   1985   1984
P(z)   0.01   0.1    0.7    0.13   0.06

E(Z) = 0.01 × 1988 + 0.1 × 1987 + 0.7 × 1986 + 0.13 × 1985 + 0.06 × 1984 = 1985.87

It is not difficult to verify that E(Z) = 2004 − E(X).

Definition 8 (Variance, or central second moment): Let X be a random variable that takes N possible values {x_1, ..., x_N} with probability distribution {P(x_1), ..., P(x_N)}. The variance of X is

$$V(X) = E[X - E(X)]^2 = \sum_{i=1}^{N} (x_i - E(X))^2 P(x_i)$$

The variance V(X) is often denoted by the Greek letter σ² (pronounced "sigma squared"). Note that the variance of a random variable is the expectation of the squared deviation of the random variable from its mean. That is, if we define a transformed variable Z = [X − E(X)]², then V(X) = E(Z). Thus, we expect the properties of the variance to parallel those of the expectation of a transformed variable.

Definition 9 (Variance of a random variable from a bivariate distribution): Suppose X and Y are jointly distributed random variables that take values {(x_1, y_1), ..., (x_N, y_M)} with probability distribution {P_XY(x_1, y_1), ..., P_XY(x_N, y_M)}. The variance of X is

$$V(X) = \sum_{i=1}^{N} \sum_{j=1}^{M} [x_i - E(X)]^2 P_{XY}(x_i, y_j) = \sum_x \sum_y [x - E(X)]^2 P_{XY}(x, y)$$

Example 18 (Variance of a bivariate random variable): Suppose random variables X and Y are jointly distributed as follows:

         X = 1   X = 2
Y = 0    .15     .10
Y = 6    .35     .24
Y = 10   .11     .05

First, the expectation of X is

E(X) = 1 × (.15 + .35 + .11) + 2 × (.10 + .24 + .05) = 1.39

Hence we have

$$V(X) = \sum_{i=1}^{N} \sum_{j=1}^{M} [x_i - E(X)]^2 P_{XY}(x_i, y_j) = (1 - 1.39)^2 (.15 + .35 + .11) + (2 - 1.39)^2 (.10 + .24 + .05) = .2379$$

Simulation 3 (Variance of two jointly distributed random variables): We want to check whether the variance of a group of randomly drawn observations is approximately equal to the theoretical value computed above. This requires a large number of observations n. We use the same method as in the expectation simulations earlier.

1. Set up the probability distribution as follows:

(X, Y)    (1,0)   (1,6)   (1,10)   (2,0)   (2,6)   (2,10)
P(X, Y)   0.15    0.35    0.11     0.10    0.24    0.05

2. Repeat drawing from the probability distribution n times and compute the sample variance of the observations.

Below we show the results of our simulation. As expected, the sample variance of X is very close to the theoretical variance computed in the last example, 0.2379.

n        2       10      50      200      1000     2000     5000
Var(X)   0.000   0.233   0.249   0.249    0.242    0.239    0.239
Var(Y)   0.000   8.160   9.446   10.838   10.396   10.528   10.647

Students are advised to calculate the theoretical value of V(Y) and check whether the simulation results approach that value.
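One way to run Simulation 3 in Python (an illustrative sketch; note that statistics.variance uses the n − 1 divisor, so small-n values bounce around):

```python
import random
from statistics import variance

pairs   = [(1, 0), (1, 6), (1, 10), (2, 0), (2, 6), (2, 10)]
weights = [0.15, 0.35, 0.11, 0.10, 0.24, 0.05]

random.seed(1)
for n in [10, 200, 1000, 5000]:
    draws = random.choices(pairs, weights=weights, k=n)
    xs = [x for x, _ in draws]
    ys = [y for _, y in draws]
    # Sample variances should settle near V(X) = 0.2379 and the
    # theoretical V(Y) as n grows.
    print(n, variance(xs), variance(ys))
```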

Definition 10 (Conditional variance): For a bivariate probability distribution, the conditional variance V(X | Y = y_j) is computed by the formula

$$V(X \mid Y = y_j) = \sum_x (x - E(X \mid Y = y_j))^2 P_{X|Y}(x \mid y_j) = (x_1 - E(X \mid Y = y_j))^2 P_{X|Y}(x_1 \mid y_j) + \dots + (x_N - E(X \mid Y = y_j))^2 P_{X|Y}(x_N \mid y_j)$$

where E(X | Y = y_j) is the conditional expectation of X given Y = y_j and P_{X|Y}(x | y_j) is the conditional probability of X given Y = y_j.

Theorem 5 (Variance of a linearly transformed random variable): If a and b are constants and X is a random variable, then

1. V(a) = 0
2. V(bX) = b²V(X)
3. V(a + bX) = b²V(X)

Proof: We show only the most general case, V(a + bX) = b²V(X).

V(a + bX) = E[(a + bX) − (a + bE(X))]²
          = E[bX − bE(X)]²
          = E[b(X − E(X))]²
          = E[b²(X − E(X))²]
          = b²E[(X − E(X))²]
          = b²V(X)

Example 19 (Variances of linearly transformed random variables):

1. Let X be the quiz grade outcome of drawing a student in a class. The probability distribution is

x      1      2     3     4     5
P(x)   0.00   0.3   0.4   0.2   0.1

E(X) = 0 × 1 + 0.3 × 2 + 0.4 × 3 + 0.2 × 4 + 0.1 × 5 = 3.1
V(X) = 0 × (1 − 3.1)² + 0.3 × (2 − 3.1)² + 0.4 × (3 − 3.1)² + 0.2 × (4 − 3.1)² + 0.1 × (5 − 3.1)² = 0.89

Suppose the professor decides to double the points awarded to all questions in the quiz, i.e., a linear transformation of the grades Z = 2X. The probability distribution of Z is

z      2      4     6     8     10
P(z)   0.00   0.3   0.4   0.2   0.1

E(Z) = 0 × 2 + 0.3 × 4 + 0.4 × 6 + 0.2 × 8 + 0.1 × 10 = 6.2
V(Z) = 0 × (2 − 6.2)² + 0.3 × (4 − 6.2)² + 0.4 × (6 − 6.2)² + 0.2 × (8 − 6.2)² + 0.1 × (10 − 6.2)² = 3.56

It is not difficult to verify that V(Z) = 2²V(X) = 4V(X).

2. Let X be the age outcome of drawing a student in a class in 2004. The probability distribution is

x      16     17    18    19     20
P(x)   0.01   0.1   0.7   0.13   0.06

E(X) = 0.01 × 16 + 0.1 × 17 + 0.7 × 18 + 0.13 × 19 + 0.06 × 20 = 18.13
V(X) = 0.01 × (16 − 18.13)² + 0.1 × (17 − 18.13)² + 0.7 × (18 − 18.13)² + 0.13 × (19 − 18.13)² + 0.06 × (20 − 18.13)² = 0.4931

Suppose we want to display the result as year of birth; it is as if we are doing a linear transformation of the variable to Z = 2004 − X.

z      1988   1987   1986   1985   1984
P(z)   0.01   0.1    0.7    0.13   0.06

E(Z) = 0.01 × 1988 + 0.1 × 1987 + 0.7 × 1986 + 0.13 × 1985 + 0.06 × 1984 = 1985.87
V(Z) = 0.01 × (1988 − 1985.87)² + 0.1 × (1987 − 1985.87)² + 0.7 × (1986 − 1985.87)² + 0.13 × (1985 − 1985.87)² + 0.06 × (1984 − 1985.87)² = 0.4931

It is not difficult to verify that V(Z) = V(X): the shift and sign flip leave the variance unchanged, since V(a + bX) = b²V(X) with b = −1.

Definition 11 (Covariance): The covariance between two random variables X and Y measures how the two variables move together. It is defined as

$$C(X, Y) = E[(X - E(X))(Y - E(Y))] = \sum_x \sum_y (x - E(X))(y - E(Y)) P_{XY}(x, y)$$

Note that the covariance can be written as

C(X, Y) = E[(X − E(X))(Y − E(Y))]
        = E[XY − E(X)Y − XE(Y) + E(X)E(Y)]
        = E[XY] − E[E(X)Y] − E[XE(Y)] + E[E(X)E(Y)]
        = E[XY] − E(X)E(Y) − E(X)E(Y) + E(X)E(Y)
        = E[XY] − E(X)E(Y)

The concept of covariance is very important throughout the economics and finance literature. For instance, in finance we are interested in how the stock return of a company varies with the market return (the Capital Asset Pricing Model, known as CAPM for short). Note that E(XY) may be computed following the steps illustrated in Example 14.

Example 20 (Covariance of two random variables): The random variables X and Y are jointly distributed as

        Y = 1   Y = 2   Y = 3
X = 1   0.10    0.05    0.10
X = 4   0.35    0.40    0

It is useful to rearrange the table as

X         1      1      1      4      4      4
Y         1      2      3      1      2      3
P(X, Y)   0.10   0.05   0.10   0.35   0.40   0

From the table, we compute E(X) = 3.25, E(Y) = 1.65, and E(XY) = 5.1.

X                        1         1         1         4         4        4
Y                        1         2         3         1         2        3
X − E(X)                 −2.25     −2.25     −2.25     0.75      0.75     0.75
Y − E(Y)                 −0.65     0.35      1.35      −0.65     0.35     1.35
(X − E(X))(Y − E(Y))     1.4625    −0.7875   −3.0375   −0.4875   0.2625   1.0125
P(X, Y)                  0.10      0.05      0.10      0.35      0.40     0

There are two ways of computing the covariance:

1. C(X, Y) = E(XY) − E(X)E(Y) = 5.1 − 3.25 × 1.65 = −0.2625.
2. C(X, Y) = Σ_x Σ_y (x − E(X))(y − E(Y)) P_XY(x, y) = (1.4625)(0.1) + (−0.7875)(0.05) + (−3.0375)(0.1) + (−0.4875)(0.35) + (0.2625)(0.4) + (1.0125)(0) = −0.2625.

Thus we verify that both approaches to computing the covariance are equivalent.
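Both routes to the covariance are easy to verify numerically; a possible Python sketch (illustrative):

```python
# Joint distribution of Example 20 as (x, y): probability.
joint = {(1, 1): 0.10, (1, 2): 0.05, (1, 3): 0.10,
         (4, 1): 0.35, (4, 2): 0.40, (4, 3): 0.00}

ex  = sum(x * p for (x, _), p in joint.items())       # E(X)  = 3.25
ey  = sum(y * p for (_, y), p in joint.items())       # E(Y)  = 1.65
exy = sum(x * y * p for (x, y), p in joint.items())   # E(XY) = 5.1

cov1 = exy - ex * ey                                                 # shortcut formula
cov2 = sum((x - ex) * (y - ey) * p for (x, y), p in joint.items())   # definition
print(cov1, cov2)   # both -0.2625 (up to floating-point rounding)
```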

Theorem 6 (Covariance of a linearly transformed random variable): If a and b are constants and X and Y are random variables, then

1. C(a, b) = 0
2. C(a, bX) = 0
3. C(a + bX, Y) = bC(X, Y)

Proof: We show only the most general case, C(a + bX, Y) = bC(X, Y).

C(a + bX, Y) = E{[(a + bX) − (a + bE(X))][Y − E(Y)]}
             = E{[bX − bE(X)][Y − E(Y)]}
             = E{[b(X − E(X))][Y − E(Y)]}
             = bE{[X − E(X)][Y − E(Y)]}
             = bC(X, Y)

Theorem 7 (Variance of a sum of random variables): If a and b are constants and X and Y are random variables, then

1. V(X + Y) = V(X) + V(Y) + 2C(X, Y)
2. V(aX + bY) = a²V(X) + b²V(Y) + 2abC(X, Y)

Proof: We show only the most general case, V(aX + bY) = a²V(X) + b²V(Y) + 2abC(X, Y).

V(aX + bY) = E[(aX + bY) − (aE(X) + bE(Y))]²
           = E[aX − aE(X) + bY − bE(Y)]²
           = E[a(X − E(X)) + b(Y − E(Y))]²
           = E[a²(X − E(X))² + b²(Y − E(Y))² + 2ab(X − E(X))(Y − E(Y))]
           = a²E[(X − E(X))²] + b²E[(Y − E(Y))²] + 2abE[(X − E(X))(Y − E(Y))]
           = a²V(X) + b²V(Y) + 2abC(X, Y)

Example 21 (Variance of a sum of random variables): In a class of 52 students, the variances of midterm and final exam scores are 137.2393 and 126.1129 respectively. The covariance of the two scores is 90.34089. If the weight on the midterm is 0.4 and the weight on the final is 0.6, what is the variance of the overall score?

Mapping to the notation we used earlier, X can be labeled the midterm exam score, Y the final exam score, a = 0.4, and b = 0.6, so that Z = aX + bY is the overall score. With this mapping of notation, we can calculate the variance of the overall score using the formula

V(aX + bY) = a²V(X) + b²V(Y) + 2abC(X, Y) = 0.4² × 137.2393 + 0.6² × 126.1129 + 2 × 0.4 × 0.6 × 90.34089 = 110.7225

The strength of the dependence between X and Y is measured by the correlation coefficient.

Definition 12 (Correlation coefficient): The correlation coefficient between two random variables X and Y is

$$Corr(X, Y) = \frac{C(X, Y)}{\sqrt{V(X)V(Y)}}$$

Note that Corr(X, Y) always lies between −1 and +1.

1. A correlation of −1 means that the two variables are perfectly negatively correlated: whenever X rises, Y falls.
2. A correlation of +1 means that the two variables are perfectly positively correlated: whenever X rises, Y also rises.
3. A correlation of 0 means that the two variables are not correlated.

Example 22 (Correlation of two random variables): In a class of 52 students, the variances of midterm and final exam scores are 137.2393 and 126.1129 respectively. The covariance of the two scores is 90.34089. What is the correlation between the two exam scores?

Mapping to the notation we used earlier, X can be labeled the midterm exam score and Y the final exam score. With this mapping of notation, we can calculate the correlation of the two exam scores using the formula

$$Corr(X, Y) = \frac{C(X, Y)}{\sqrt{V(X)V(Y)}} = \frac{90.34089}{\sqrt{137.2393 \times 126.1129}} = 0.6867$$

Thus, students' performances in different exams tend to be highly correlated.

Definition 13 (Moments of a random variable): The k-th moment is defined as the expectation of the k-th power of a random variable:

$$m_k = E(X^k)$$

Similarly, the k-th centralized moment is defined as

$$\bar{m}_k = E[(X - E(X))^k]$$

It is clear that the second centralized moment, E[(X − E(X))²], is the variance of the random variable.

Definition 14 (Independence): Consider two random variables X and Y with joint probability P_XY(x, y), marginal probabilities P_X(x) and P_Y(y), and conditional probabilities P_{X|Y}(x|y) and P_{Y|X}(y|x).

1. They are said to be independent of each other if and only if P_XY(x, y) = P_X(x) P_Y(y) for all x and y. That is, X and Y are independent if each cell probability P_XY(x, y) is the product of the corresponding row and column totals.
2. X is said to be independent of Y if and only if P_{X|Y}(x|y) = P_X(x) for all x and y.
3. Y is said to be independent of X if and only if P_{Y|X}(y|x) = P_Y(y) for all x and y.

Theorem 8 (Consequence of independence): If X and Y are independent random variables, then

E(XY) = E(X)E(Y)

We caution that E(XY) = E(X)E(Y) need not imply that the random variables X and Y are independent, as the following example shows.

Example 23 (Consequence of independence): Let X and Y have the following probability distribution.

        y = 0   y = 1   y = 2   Total
x = 0   1/12    2/12    0       3/12
x = 1   2/12    0       4/12    6/12
x = 2   1/12    2/12    0       3/12
Total   4/12    4/12    4/12    1

Here E(XY) = 1, E(X) = 1 and E(Y) = 1, so E(XY) = E(X)E(Y). However, X and Y are not independent. For instance, P(X = 1, Y = 2) ≠ P(X = 1) P(Y = 2).
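A brute-force check over the table in Example 23 makes the caution concrete; a possible Python sketch (illustrative):

```python
from fractions import Fraction as F

# Joint distribution of Example 23, with probabilities in twelfths.
joint = {(x, y): F(p, 12) for (x, y), p in {
    (0, 0): 1, (0, 1): 2, (0, 2): 0,
    (1, 0): 2, (1, 1): 0, (1, 2): 4,
    (2, 0): 1, (2, 1): 2, (2, 2): 0}.items()}

px = {x: sum(joint[(x, y)] for y in range(3)) for x in range(3)}
py = {y: sum(joint[(x, y)] for x in range(3)) for y in range(3)}

exy = sum(x * y * p for (x, y), p in joint.items())
ex  = sum(x * p for x, p in px.items())
ey  = sum(y * p for y, p in py.items())
print(exy == ex * ey)   # True: E(XY) = E(X)E(Y)

# ...yet not every cell probability is the product of its marginals:
print(all(joint[(x, y)] == px[x] * py[y]
          for x in range(3) for y in range(3)))   # False: not independent
```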

3 Binomial Probability Distribution

Example 24 (Tossing coins): Consider the probability distribution of the number of heads when different numbers of fair coins³ are tossed.

³ A coin is fair if and only if P(head) = P(tail) = 0.5.

1. Toss one coin. The probability distribution of the number of heads is

# of heads, X   0     1
P(X)            0.5   0.5

2. Toss two coins. The probability distribution of the number of heads is

# of heads, X   0      1     2
P(X)            0.25   0.5   0.25

3. Toss three coins. The probability distribution of the number of heads is

# of heads, X   0       1       2       3
P(X)            0.125   0.375   0.375   0.125

4. Toss four coins. The probability distribution of the number of heads is

# of heads, X   0        1      2       3      4
P(X)            0.0625   0.25   0.375   0.25   0.0625

As in the example, we could continue writing out tables for different numbers of coins tossed, and for different probabilities of obtaining heads. A more convenient way to summarize the probability distribution is a mathematical formula.

Definition 15 (Binomial probability distribution): Consider n independent trials. In each trial there are only two possible outcomes; let the outcomes be labeled "success" and "failure". Let x be the number of observed successes in the n trials, and π the probability of success in each trial. The probability of x successes in n trials is

$$P(x) = {}_nC_x \, \pi^x (1 - \pi)^{n - x}$$

The mean and variance are E(X) = nπ and V(X) = nπ(1 − π).

Example 25 (Binomial): After its return to mainland China in 1997, Hong Kong experienced the Asian Financial Crisis as well as SARS (Severe Acute Respiratory Syndrome).⁴ In the third quarter of 2003 (when Hong Kong was experiencing SARS), the unemployment rate reached a historic high of 8.4%. Suppose we had drawn a sample of 14 workers in 2004; we would have:

⁴ For an explanation of SARS, see, for instance, http://en.wikipedia.org/wiki/severe_acute_respiratory_syndrome.

1. The probability that exactly three are unemployed:

P(x = 3) = ₁₄C₃ (0.084)³ (1 − 0.084)¹¹ = (364)(0.000593)(0.380934) = 0.082184

2. The probability that at least three are unemployed:

P(x ≥ 3) = P(x = 3) + P(x = 4) + ... + P(x = 14) = ₁₄C₃ (0.084)³ (0.916)¹¹ + ... + ₁₄C₁₄ (0.084)¹⁴ (0.916)⁰ = 0.082184 + 0.020726 + ... + 0.000 = 0.107293

3. The probability that at least one is unemployed:

P(x ≥ 1) = 1 − P(x = 0) = 1 − ₁₄C₀ (0.084)⁰ (0.916)¹⁴ = 1 − 0.292777 = 0.707223

4. Mean: E(X) = nπ = 14 × 0.084 = 1.176

5. Variance: V(X) = nπ(1 − π) = 14 × 0.084 × 0.916 = 1.077216
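These binomial quantities are available directly in SciPy; a possible sketch for Example 25 (assuming SciPy is installed):

```python
from scipy.stats import binom

n, pi = 14, 0.084
print(binom.pmf(3, n, pi))       # P(X = 3)  ~ 0.0822
print(1 - binom.cdf(2, n, pi))   # P(X >= 3) ~ 0.1073
print(1 - binom.pmf(0, n, pi))   # P(X >= 1) ~ 0.7072
print(binom.mean(n, pi), binom.var(n, pi))   # 1.176, 1.077216
```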

The following figure plots the binomial distribution for different numbers of workers drawn (n = 10, 20, 30, 40, 50).

[Figure: binomial distributions with π = 0.084 for n = 10, 20, 30, 40, 50; probability against X.]

Note that the plot does not show values of x above 10, because the probability of such x values is small in our case; of course, theoretically, x can assume any integer from 0 to n. We also note that the binomial distribution approaches a bell shape as n increases.

4 Hypergeometric Distribution

Suppose we draw n students at random from a class of N students, without replacement. What is the probability that x of the n selected students are female? Let's label "female" as success and denote by X the number of female students in the sample. It looks as if X should have a binomial probability distribution. That is WRONG! The random variable X has a binomial distribution only if the probability of drawing a female remains the same across draws. In this example, the probabilities of drawing a female in the n draws are not the same, because the population is finite.

Definition 16 (Finite population): A finite population is a population consisting of a fixed number of known individuals, objects, or measurements.

Example 26 (Changing probability in subsequent draws): From a bag containing 7 red chips and 5 blue chips you select 2 chips one after the other without replacement. The following tree diagram shows the combinations of outcomes and their probabilities.

[Tree diagram: first draw R1 with probability 7/12 or B1 with probability 5/12; after R1, R2 with probability 6/11 or B2 with probability 5/11; after B1, R2 with probability 7/11 or B2 with probability 4/11.]

We can easily see that the probability of drawing a red chip changes in subsequent draws and depends on what has been drawn previously. Let R_i be the event of a red chip in the i-th draw (i = 1, 2) and B_i the event of a blue chip in the i-th draw (i = 1, 2).

1. In the first draw, the probability of drawing a red chip is P(R_1) = 7/12.
2. In the second draw, the probability of drawing a red chip depends on the outcome of the first draw:
(a) If the first draw is red, P(R_2 | R_1) = 6/11.
(b) If the first draw is blue, P(R_2 | B_1) = 7/11.

When the population is finite, the probability of drawing a success (an outcome with the characteristic of interest) changes over a sequence of trials, so an assumption of the binomial is violated. To account for the change in probability across trials, we need to use the hypergeometric distribution instead of the binomial. The hypergeometric distribution has the following characteristics:

1. There are only 2 possible outcomes per trial.
2. The probability of a success is not the same in each trial.
3. It results from a count of the number of successes in a fixed number of trials.

Definition 17 (Hypergeometric distribution): Consider n draws without replacement from a finite population. Each draw has two possible outcomes, labeled "success" and "failure". Let x be the number of observed successes in the n draws, N the size of the population, and S the number of successes in the population. The probability of x successes in a sample of n observations is

$$P(x) = \frac{({}_SC_x)({}_{N-S}C_{n-x})}{{}_NC_n}$$
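As a sanity check on Definition 17, the pmf can be evaluated both directly and with scipy.stats.hypergeom; a possible sketch using the chips of Example 26 (assuming SciPy is installed; note that SciPy orders the arguments as population size, number of successes, number of draws):

```python
from math import comb
from scipy.stats import hypergeom

# Example 26: 2 draws without replacement from 7 red + 5 blue chips.
N, S, n = 12, 7, 2   # population size, successes (red) in population, draws

for x in range(n + 1):
    direct = comb(S, x) * comb(N - S, n - x) / comb(N, n)   # Definition 17
    print(x, direct, hypergeom.pmf(x, N, S, n))             # the two agree
```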

The process that gives rise to the hypergeometric distribution is very similar to the one that gives rise to the binomial, and students of statistics often confuse the two. The general rule is to use the hypergeometric distribution to find the probability of a specified number of successes or failures if both of the following conditions are fulfilled:

1. The sample is selected from a finite population without replacement. (Recall that a criterion for the binomial distribution is that the probability of success remains the same from trial to trial; if a sample is selected from a finite population with replacement, the probability does remain constant.)
2. The size of the sample n is greater than 5% of the size of the population N.

If the sample size n is small relative to the population size N, the change in probability across draws is so small that we can safely ignore it without affecting the calculated result.

The following figure shows some hypergeometric distributions, H(n, S/N, N), along with their corresponding binomial distributions, B(n, π = S/N).

[Figure: hypergeometric distributions H(20, 0.05, 100), H(20, 0.2, 100), H(20, 0.5, 100) plotted against the binomial distributions B(20, 0.05), B(20, 0.2), B(20, 0.5).]

It shows that the hypergeometric and binomial distributions are similar for a given n and N, especially when S/N is small. Thus, when S/N is small, the binomial distribution can be a good approximation to the hypergeometric.

The following figure shows some hypergeometric distributions, H(n, S/N, N), with the same population proportion of successes (i.e., S/N kept fixed) but different n. We can see that the distribution approaches a bell shape and becomes smoother as the number of draws n increases. Moreover, given S/N, when N increases (i.e., n/N approaches zero), the hypergeometric distribution approaches the binomial. Thus, when n/N is small, we can safely use the binomial distribution to approximate the hypergeometric distribution.

[Figure: hypergeometric distributions H(5, 0.2, 100), H(10, 0.2, 100), H(20, 0.2, 100), H(30, 0.2, 100), H(40, 0.2, 100), H(50, 0.2, 100).]

Example 27 (Hypergeometric): The National Air Safety Board has a list of 10 reported safety violations. Suppose only 4 of the reported violations are actual violations and the Safety Board will only be able to investigate five of them. What is the probability that three of the five violations randomly selected for investigation are actual violations?

$$P(X = 3) = \frac{({}_4C_3)({}_{10-4}C_{5-3})}{{}_{10}C_5} = \frac{({}_4C_3)({}_6C_2)}{{}_{10}C_5} = \frac{4 \times 15}{252} = 0.238$$

5 Poisson Probability Distribution

The binomial distribution becomes more skewed to the right (positively skewed) as the probability of success becomes smaller. The limiting form of the binomial distribution, where the probability of success π is small and n is large, is called the Poisson probability distribution.

Definition 18 (Poisson probability distribution): The Poisson distribution can be described mathematically by the formula

$$P(X = x) = \frac{\mu^x e^{-\mu}}{x!}$$

where µ is the mean number of successes in a particular interval of time, e is the constant 2.71828..., and x is the number of successes. In binomial situations the mean number of successes µ can be determined as nπ, where n is the number of trials and π the probability of a success. The variance of the Poisson distribution is also equal to µ = nπ.

Unlike the binomial and hypergeometric, the Poisson random variable X generally has no specific upper limit. The Poisson probability distribution is always skewed to the right, and it becomes more symmetric as µ gets large, as illustrated in the following plot.

[Figure: Poisson distributions with µ = 1, 2.5, 5, 10, 15, 25; probability against X.]

Example 28 (Poisson probability distribution): According to the Shanghai Yearbook from 2000 to 2006, there were many fire accidents every year, threatening citizens' property and even their lives. The numbers of fire accidents over 1999 to 2005 are reported in the following table:

Year              1999    2000    2001    2002    2003    2004    2005
Number of fires   6551    5164    3164    5983    5781    5134    4167
Average per day   17.95   14.15   8.67    16.39   15.84   14.03   11.42

Assuming that the average over all seven years can be used to predict future accidents, what is the probability that there are exactly 14 fires in Shanghai on the next day?

Note that the number of accidents on the next day is better described as a Poisson random variable. We first compute µ, the average number of fires per day over these 7 years:

$$\mu = \frac{6551 + 5164 + 3164 + 5983 + 5781 + 5134 + 4167}{7 \times 365} = 14.07$$

The probability that there are exactly 14 fires in Shanghai on the next day is

$$P(x = 14) = \frac{\mu^x e^{-\mu}}{x!} = \frac{14.07^{14} e^{-14.07}}{14!} = 0.1060$$

[The data are available at http://www.shtong.gov.cn/node2/node19828/index.html, the website of the Shanghai Local Records.]
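The arithmetic of Example 28 is easy to check; a possible Python sketch using only the standard library:

```python
from math import exp, factorial

fires = [6551, 5164, 3164, 5983, 5781, 5134, 4167]   # 1999-2005
mu = sum(fires) / (7 * 365)                          # mean fires per day, ~14.07

def poisson_pmf(x, mu):
    return mu ** x * exp(-mu) / factorial(x)

print(mu)                    # ~14.07
print(poisson_pmf(14, mu))   # ~0.106
```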

6 What distribution to use?

There is no need to memorize the formulas of these probability distributions, because in practice we can always find them on the internet, in Excel, or in a textbook. Computing a probability from a given distribution is a simple task. What the computer cannot replace, however, is the human judgment of which distribution is appropriate in a given situation. Here we review how to make that judgment.

The Poisson distribution describes the number of times an event occurs over an INTERVAL of TIME or SPACE, and there is no upper limit on the values a Poisson random variable can take. Thus, if we are considering a sample of 10 observations and are asked to compute the probability of having 6 successes, we should not use the Poisson, because the maximum number of successes is limited to 10; it is only reasonable to consider the binomial or the hypergeometric.

The hypergeometric distribution describes the number of successes in a sample when the probability of success varies across trials due to the without-replacement sampling strategy. To compute a hypergeometric probability, one needs to know N and S separately. Suppose we know that the probability of success is 0.3 and we are considering a sample of 10 observations, and we are asked to compute the probability of having 6 successes. We cannot use the hypergeometric, because we do not have N and S separately; instead we have to use the binomial, even though we are dealing with a finite population.

Let's check our understanding with the following examples.

Example 29 (Choice of distributions): In a shipment of 15 boxes of eggs, 5 are contaminated. If 4 boxes are inspected, what is the probability that exactly 1 is contaminated?

First, we recognize that this is not Poisson, because exactly 4 boxes are inspected (i.e., the sample size is 4). Second, it is sampling without replacement, because when inspecting four boxes for contamination we would not sample with replacement. Third, both N (15 boxes) and S (5 contaminated) are given. Hence we use the hypergeometric distribution.

Example 30 (Choice of distributions): A research team is conducting a new medical survey to update the color-blindness rate among males in the area. They conduct the test by randomly choosing male passersby and asking each to take a medical test. What is the probability that exactly 3 of the examined males are color blind in a sample of 10? (In the team's report, the updated color-blindness rate among males is 7%.)

There are only three pieces of information given in the problem: π = 7%, n = 10, and x = 3. Thus we can immediately exclude the hypergeometric, which needs N and S separately to characterize the distribution. We may also reject the Poisson, because the number of successes here is bounded by the sample size, whereas the Poisson requires the maximum number of successes to be unlimited. It turns out that we should use the binomial here, even though a survey is usually conducted without replacement: the population is effectively infinite, so sampling without replacement makes no significant difference.

Example 31 (Choice of distributions): The Traffic Department of the City of Beijing classifies the traffic condition at a crossroad at peak hour as good if the traffic flow is higher than 40 vehicles/minute, and as crowded if it is less than 10 vehicles/minute. According to the data of the past months, the average traffic flow at peak hours on Chang'an Street is 30 vehicles/minute. What is the probability that the traffic condition is crowded at 17:43, definitely a peak-hour time? (Do not try to compute the exact value; write up the formula only.)

This problem concerns a count of events per interval of time, which is a very good hint to use the Poisson. Going further, we find that the parameters needed are available: µ = 30 and, for the crowded condition, x < 10. The other information (the threshold of 40 for "good") turns out to be useless here. So we just go ahead with the Poisson:

$$P(\text{crowded}) = P(x \le 9) = \sum_{x=0}^{9} \frac{30^x e^{-30}}{x!}$$