PCMI 2017 - Introduction to Random Matrix Theory. Handout #2 (06.27.2017): Review of Probability Theory


Chapter 1 - Events and Their Probabilities

1.1. Events as Sets

Definition (σ-field). A collection F of subsets of Ω is called a σ-field if it satisfies the following conditions:
a) ∅ ∈ F;
b) if A_1, A_2, ... ∈ F, then ∪_{i=1}^∞ A_i ∈ F;
c) if A ∈ F, then A^C ∈ F.

Examples. A few examples of σ-fields associated with Ω:
a) F_1 = {∅, Ω};
b) F_2 = {∅, A, A^C, Ω}, where A is any subset of Ω;
c) F_3 = {0,1}^Ω = {A : A ⊆ Ω}, the collection of all subsets of Ω.

With any experiment we may associate a pair (Ω, F), where Ω is the sample space (i.e., the set of all possible outcomes, or elementary events) and F is a σ-field of subsets of Ω which contains all the events in whose occurrence we may be interested. Therefore, to call a set A an event is equivalent to asserting that A belongs to the σ-field in question. We usually translate statements about combinations of events into set-theoretic jargon.

1.2. Probability

Definition (Probability Measure). A probability measure P on (Ω, F) is a function P : F → [0, 1] satisfying
a) P(∅) = 0, P(Ω) = 1;
b) if A_1, A_2, A_3, ... are disjoint members of F (i.e. A_i ∩ A_j = ∅ for all pairs (i, j) with i ≠ j), then
P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).

Remark. The probability measure P is a function which associates to any event a real number between 0 and 1.

Definition (Probability Space). A triple (Ω, F, P), consisting of a set Ω, a σ-field F of subsets of Ω, and a probability measure P on (Ω, F), is called a probability space.

Proposition (De Morgan's Laws). Let {A_i}_{i∈I} be a collection of sets (all subsets of a set Ω). Then
(∪_{i∈I} A_i)^C = ∩_{i∈I} A_i^C and (∩_{i∈I} A_i)^C = ∪_{i∈I} A_i^C,
where the complements are taken with respect to the set Ω.

Proposition (Basic Properties of Probability Spaces). Consider the probability space (Ω, F, P) and let A, B ∈ F. Then
a) P(A^C) = 1 − P(A);
b) if A ⊆ B, then P(B) = P(A) + P(B \ A); in particular P(A) ≤ P(B);
c) P(A ∪ B) = P(A) + P(B) − P(A ∩ B);
d) more generally, if A_1, A_2, ..., A_n are members of F, then
P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(A_i) − Σ_{1≤i<j≤n} P(A_i ∩ A_j) + Σ_{1≤i<j<k≤n} P(A_i ∩ A_j ∩ A_k) − ... + (−1)^{n+1} P(A_1 ∩ A_2 ∩ ... ∩ A_n),
where, for example, Σ_{1≤i<j≤n} P(A_i ∩ A_j) sums over all unordered pairs (i, j) with i ≠ j.

Proposition (Properties of Increasing/Decreasing Sequences of Events). Consider the probability space (Ω, F, P).
a) If A_1 ⊆ A_2 ⊆ A_3 ⊆ ... is an increasing sequence of events and A = ∪_{i=1}^∞ A_i, then P(A) = lim_{i→∞} P(A_i).
b) If B_1 ⊇ B_2 ⊇ B_3 ⊇ ... is a decreasing sequence of events and B = ∩_{i=1}^∞ B_i, then P(B) = lim_{i→∞} P(B_i).

Definition (Null Event). Consider the probability space (Ω, F, P). The event A ∈ F is called null if P(A) = 0.

Remarks. The event ∅ is called the impossible event and is a null event (since P(∅) = 0). However, there exist null events that are not the impossible event.

Definition (Almost Sure Event). Consider the probability space (Ω, F, P). The event A ∈ F is called almost sure if P(A) = 1.

Remarks. The event Ω is called the certain event and is an almost sure event (since P(Ω) = 1). However, there exist almost sure events that are not the certain event.

1.3. Conditional Probability

Definition (Conditional Probability). Consider the probability space (Ω, F, P) and let A, B ∈ F. If P(B) > 0, then the probability that the event A occurs given that B occurs is defined to be
P(A | B) = P(A ∩ B) / P(B).

Definition (Partition of a Set Ω). Consider the probability space (Ω, F, P). A family B_1, B_2, ..., B_n of events is called a partition of the set Ω if
a) B_i ∩ B_j = ∅ for all pairs (i, j) with i ≠ j;
b) ∪_{i=1}^n B_i = Ω.

Remark. Each elementary event ω ∈ Ω belongs to exactly one set in a partition of Ω.

Proposition (Conditional Probabilities Using Partitions). In the probability space (Ω, F, P), consider the events A, B ∈ F with 0 < P(B) < 1. Then
P(A) = P(A | B) P(B) + P(A | B^C) P(B^C).
More generally, if the events B_1, B_2, ..., B_n form a partition of Ω such that P(B_i) > 0 for all i, we have
P(A) = Σ_{i=1}^n P(A | B_i) P(B_i).

Example: With any random experiment we may associate a probability space, which is a triple (Ω, F, P), where:
- Ω is the sample space (all the possible outcomes);
- F is a σ-field of subsets of Ω (all the possible events considered);
- P is the probability measure, a function P : F → [0, 1] which associates to any event A ∈ F its probability, a real number between 0 and 1.

Important Special Case: We often consider experiments with finitely many possible outcomes and we assume that all the outcomes are equally likely. In this case, we consider the probability space (Ω, F, P), where
Ω = {the set of all possible outcomes},
F = {the collection of all the subsets of Ω},
P : F → [0, 1], P(A) = |A| / |Ω| for any A ∈ F (therefore, for any A ⊆ Ω),
where |S| denotes the number of elements of a set S.

Remark: In this special case, Ω and F are finite sets and |F| = 2^{|Ω|}.

Special Case - Concrete Example: Consider rolling a fair die. Then we consider the probability space (Ω, F, P), where
Ω = {1, 2, 3, 4, 5, 6}; note that |Ω| = 6.
F is the collection of all 2^6 = 64 subsets of Ω: the empty set ∅, the six singletons {1}, ..., {6}, the fifteen two-element subsets {1,2}, {1,3}, ..., {5,6}, the twenty three-element subsets, the fifteen four-element subsets, the six five-element subsets, and Ω = {1, 2, 3, 4, 5, 6} itself.
P : F → [0, 1], P(A) = |A| / |Ω| = |A| / 6 for any A ∈ F.
In particular, if A = {2, 4, 6} (the event "the number rolled is even"), then
P(A) = |{2, 4, 6}| / 6 = 3/6 = 1/2.
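The computation above is easy to reproduce mechanically. The following Python sketch (mine, not part of the handout; the helper name prob is illustrative) enumerates F as the power set of Ω for the die and evaluates P(A) = |A|/|Ω| for the event "the number rolled is even":

    from itertools import chain, combinations

    omega = {1, 2, 3, 4, 5, 6}          # sample space for one roll of a fair die

    # F = the collection of all subsets of omega (its power set), so |F| = 2**6 = 64.
    F = [set(s) for s in chain.from_iterable(
        combinations(sorted(omega), r) for r in range(len(omega) + 1))]

    def prob(A):
        """Classical probability P(A) = |A| / |Omega| for a subset A of omega."""
        return len(A) / len(omega)

    even = {2, 4, 6}
    print(len(F))        # 64 events in total
    print(prob(even))    # 0.5, matching P({2, 4, 6}) = 3/6 = 1/2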

Useful Results from Combinatorics

1. [Simple Permutations] The number of possible orderings (permutations) of n distinct objects is n! = 1 · 2 · 3 ··· (n − 1) · n.
2. [Combinations Without Repetitions] The number of ways of choosing r distinct objects from n distinct objects is (n choose r) = n! / ((n − r)! r!).
3. [Permutations Without Repetitions] The number of ways of choosing an ordered r-tuple of distinct objects from n distinct objects is n! / (n − r)! = n(n − 1) ··· (n − r + 1).
4. [Permutations With Repetitions] The number of ways of choosing an ordered r-tuple of objects from n distinct objects, allowing repetitions (i.e., the same object can be chosen multiple times), is n^r.

Convention: 0! = 1. (A short computational check of these counting formulas appears after the table below.)

The Correspondence Set Theory - Probability Theory

The following table shows the correspondence between set theory and probability theory.

Notation | Set theory jargon | Probability jargon
Ω | whole space | sample space (certain event)
∅ | empty set | impossible event
ω, ω ∈ Ω | member of Ω | outcome (elementary event)
A, A ⊆ Ω | subset of Ω | event that some outcome in A occurs
A^C | complement of A | event that no outcome in A occurs
A ∩ B | intersection | both A and B
A ∪ B | union | either A or B, or both
A \ B | difference | A, but not B
A △ B | symmetric difference | either A or B, but not both
A ⊆ B | inclusion | if A, then B
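The four counting formulas can be checked with Python's standard library (a sketch; math.comb and math.perm are standard-library functions, not notation from the handout):

    import math

    n, r = 6, 3

    print(math.factorial(n))   # 1. orderings of n distinct objects: n! = 720
    print(math.comb(n, r))     # 2. unordered choices of r out of n: n!/((n-r)! r!) = 20
    print(math.perm(n, r))     # 3. ordered r-tuples without repetition: n!/(n-r)! = 120
    print(n ** r)              # 4. ordered r-tuples with repetition: n^r = 216
    print(math.factorial(0))   # convention: 0! = 1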

Note. By definition, a collection F of subsets of Ω is called a σ-field if it contains the empty set and is closed under countable union and under taking complements. You may wonder why we call such a collection of objects a σ-field. If you had Abstract Algebra, it is interesting to think about the following:
a) Any σ-field F is closed under the operations ∪ (union), ∩ (intersection) and △ (symmetric difference).
b) In addition to being closed under finite union, F is, by definition, closed under countable union (hence the part "σ" in "σ-field").
c) ∅ is the identity for the operation ∪ (union); Ω is the identity for the operation ∩ (intersection); ∅ is the identity for the operation △ (symmetric difference).
d) (F, △) is an Abelian group (△ is the group operation).
e) (F, △, ∩) is a commutative ring (△ is the first operation, ∩ is the second operation).

Chapter 2 - Random Variables

2.1. Definitions and Basic Properties

Definition (Random Variable). Consider a probability space (Ω, F, P). A random variable associated to this space is a function X : Ω → R with the property that, for each x ∈ R, the set A(x) = {ω ∈ Ω : X(ω) ≤ x} belongs to the σ-field F.

Definition (The Distribution Function of a Random Variable). Consider a probability space (Ω, F, P) and let X : Ω → R be a random variable associated to this space. The distribution function of X is the function F : R → [0, 1] given by F(x) = P({ω ∈ Ω : X(ω) ≤ x}).

Remark. The event {ω ∈ Ω : X(ω) ≤ x} is denoted by {X ≤ x} or simply X ≤ x. Similar notations are used for the events {ω ∈ Ω : X(ω) < x}, {ω ∈ Ω : X(ω) ≥ x}, {ω ∈ Ω : X(ω) > x}, and {ω ∈ Ω : X(ω) = x}.

Proposition (Properties of the Distribution Function of a Random Variable). Consider a probability space (Ω, F, P), let X : Ω → R be a random variable associated to this space and let F be the distribution function associated to X. Then F has the following properties:
a) lim_{x→−∞} F(x) = 0, lim_{x→+∞} F(x) = 1;
b) if x < y, then F(x) ≤ F(y);
c) F is right-continuous, that is, F(x + h) → F(x) as h ↓ 0.

Remark. A function F is the distribution function of some random variable if and only if it satisfies (a), (b) and (c) from the previous proposition.

Proposition (More Properties of the Distribution Function of a Random Variable). As before, consider a probability space (Ω, F, P), let X : Ω → R be a random variable associated to this space and let F be the distribution function associated to X. Then:
a) P({X > x}) = 1 − F(x);
b) P({x < X ≤ y}) = F(y) − F(x);
c) P(X = x) = F(x) − lim_{y↑x} F(y).

2.2. Discrete and Continuous Random Variables

Definition (Discrete Random Variable). Consider a probability space (Ω, F, P) and let X : Ω → R be a random variable associated to this space. The random variable X is called discrete if it takes values only in some countable subset {x_1, x_2, ...} of R.

Remark. The distribution function of a discrete random variable has jump discontinuities at the values x_1, x_2, ... and is constant in between.

Definition (The Probability Mass Function Associated to a Discrete Random Variable). Consider a probability space (Ω, F, P) and let X : Ω → R be a discrete random variable associated to this space. The probability mass function associated to X is the function f : R → [0, 1] given by f(x) = P(X = x).

Definition (Continuous Random Variable). Consider a probability space (Ω, F, P) and let X : Ω → R be a random variable associated to this space. The random variable X is called continuous if there exists an integrable function f : R → [0, ∞) such that the distribution function F of the random variable X can be expressed as
F(x) = ∫_{−∞}^x f(u) du
for any real number x.

Definition (The Probability Density Function Associated to a Continuous Random Variable). With the notations from the previous definition, the function f is called the probability density function of the continuous random variable X.

Remark 1. If X : Ω → R is a random variable, then for any x ∈ R the sets {ω ∈ Ω : X(ω) > x}, {ω ∈ Ω : X(ω) ≥ x}, {ω ∈ Ω : X(ω) < x}, and {ω ∈ Ω : X(ω) = x} belong to the σ-field F.

Remark 2. The event {ω ∈ Ω : X(ω) ≤ x} is denoted by {X ≤ x} or simply X ≤ x. Similarly, the events {ω ∈ Ω : X(ω) > x}, {ω ∈ Ω : X(ω) ≥ x}, {ω ∈ Ω : X(ω) < x}, and {ω ∈ Ω : X(ω) = x} are denoted by {X > x}, {X ≥ x}, {X < x}, and {X = x}.

Examples of Random Variables and Their Distribution Functions

Example 1 [Constant Random Variables]. Consider a probability space (Ω, F, P) and let c ∈ R. Then the function X : Ω → R defined by X(ω) = c is a random variable. Its distribution function is F = F_X : R → [0, 1] defined by
F_X(x) = 0 if x < c, and F_X(x) = 1 if x ≥ c.

Example 2 [Bernoulli Random Variables]. A possibly biased coin is tossed once. We can take Ω = {H, T}, F = {∅, {H}, {T}, Ω} and the probability measure P : F → [0, 1] defined by
P(∅) = 0, P({H}) = p, P({T}) = 1 − p, P(Ω) = 1,
where p is a real number in [0, 1]. We define Y : Ω → R by Y(H) = 1 and Y(T) = 0. Then Y is a random variable with the distribution function F = F_Y : R → [0, 1] defined by
F_Y(x) = 0 if x < 0; F_Y(x) = 1 − p if 0 ≤ x < 1; F_Y(x) = 1 if x ≥ 1.

Example 3 [Indicator Functions]. Consider a probability space (Ω, F, P). Then any event A ∈ F defines a random variable I_A : Ω → R by
I_A(ω) = 1 if ω ∈ A, and I_A(ω) = 0 if ω ∉ A.
The random variable I_A is called the indicator function of the set A. Its distribution function is the function F = F_{I_A} : R → [0, 1] defined by
F_{I_A}(x) = 0 if x < 0; F_{I_A}(x) = 1 − P(A) if 0 ≤ x < 1; F_{I_A}(x) = 1 if x ≥ 1.

2.3. Worked Examples

Random Angle. An arrow is flung down at random onto a plane and the angle ω between the arrow and true north is measured (in the clockwise direction). The result is a number in Ω = [0, 2π). We consider a σ-field on Ω which contains all the nice subsets of Ω (including the collection of open subintervals (a, b), 0 ≤ a < b ≤ 2π). The implicit symmetry suggests that
P((a, b)) = (b − a) / (2π)
(the probability that the angle lies in some interval is directly proportional to the length of the interval).

Consider the random variable X : Ω = [0, 2π) → R defined by X(ω) = ω. The distribution function of this random variable is F_X : R → [0, 1] defined by
F_X(x) = 0 if x < 0; F_X(x) = x / (2π) if 0 ≤ x < 2π; F_X(x) = 1 if x ≥ 2π.
The random variable X is continuous because F_X(x) = ∫_{−∞}^x f_X(u) du, where f_X : R → [0, ∞) is the function
f_X(x) = 1 / (2π) if 0 ≤ x ≤ 2π, and f_X(x) = 0 otherwise.
The function f_X is the probability density function associated to the random variable X.

Darts. A dart is flung at a circular target of radius 3; we assume for simplicity that the player is guaranteed to hit the target somewhere. We can think of the hitting point as the outcome of a random experiment. We can take
Ω = {(x, y) ∈ R² : x² + y² < 9}
and F = {nice subsets of Ω}. The probability that the dart lands in some region A is proportional to its area |A|. Thus
P(A) = |A| / (9π).

We want to define a random variable "score" X : Ω → R. (The smaller the score, the better.)

Case 1. The target is partitioned by three concentric circles C_1, C_2, and C_3, centered at the origin and with radii 1, 2, and 3. These circles divide the target into three annuli A_1, A_2, and A_3, where
A_k = {(x, y) : k − 1 ≤ √(x² + y²) < k}, k = 1, 2, 3.
We define the score to be
X(ω) = 1 if ω ∈ A_1; X(ω) = 2 if ω ∈ A_2; X(ω) = 3 if ω ∈ A_3.
X is a discrete random variable with the distribution function F_X : R → [0, 1]:
F_X(r) = 0 if r < 1; F_X(r) = 1/9 if 1 ≤ r < 2; F_X(r) = 4/9 if 2 ≤ r < 3; F_X(r) = 1 if r ≥ 3.
Its probability mass function is the function f : R → [0, 1] defined by
f(r) = 1/9 if r = 1; f(r) = 3/9 if r = 2; f(r) = 5/9 if r = 3; f(r) = 0 otherwise.

Case 2. We define the score Y : Ω → R to be the distance between the hitting point and the origin:
Y(ω) = √(x² + y²) if ω = (x, y).
The distribution function of the random variable Y is the function F_Y : R → [0, 1] defined by
F_Y(r) = 0 if r < 0; F_Y(r) = r²/9 if 0 ≤ r ≤ 3; F_Y(r) = 1 if r > 3.
The random variable Y is continuous; its probability density function is the function f_Y : R → [0, ∞):
f_Y(r) = 2r/9 if 0 ≤ r ≤ 3, and f_Y(r) = 0 otherwise.
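Both scoring rules can be checked by simulation. The sketch below (not part of the handout) draws uniform points on the disc of radius 3 and compares the empirical frequencies with the mass function of X and with the distribution function F_Y(r) = r²/9:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000

    # Uniform points on the disc of radius 3: sample the square and keep points inside.
    pts = rng.uniform(-3, 3, size=(3 * n, 2))
    pts = pts[pts[:, 0]**2 + pts[:, 1]**2 < 9][:n]
    dist = np.sqrt(pts[:, 0]**2 + pts[:, 1]**2)      # Case 2 score Y = distance to the origin

    # Case 1 score: X = k on the annulus k-1 <= dist < k, for k = 1, 2, 3.
    X = np.floor(dist).astype(int) + 1
    print([round(np.mean(X == k), 3) for k in (1, 2, 3)])   # approximately [1/9, 3/9, 5/9]

    # Case 2: P(Y <= r) should be r**2 / 9 for 0 <= r <= 3.
    for r in (1.0, 1.5, 2.0):
        print(r, round(np.mean(dist <= r), 3), round(r**2 / 9, 3))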

2.4. Random Vectors

Definition [Random Vector]. A random vector (of length n) on a probability space (Ω, F, P) is an n-tuple X = (X_1, X_2, ..., X_n), where X_k : Ω → R is a random variable for each k (1 ≤ k ≤ n).

Definition [Joint Distribution Function of a Random Vector]. Let X = (X_1, X_2, ..., X_n) be a random vector on the probability space (Ω, F, P). The joint distribution function of the random vector X is the function F_X : R^n → [0, 1] given by
F_X(x_1, x_2, ..., x_n) = P(X_1 ≤ x_1, X_2 ≤ x_2, ..., X_n ≤ x_n).

Proposition [Properties of the Joint Distribution Function of a Random Vector]. Let X = (X, Y) be a random vector on the probability space (Ω, F, P). The joint distribution function F_{X,Y} : R² → [0, 1] of the random vector X = (X, Y) has the following properties:
a) lim_{x,y→−∞} F_{X,Y}(x, y) = 0, lim_{x,y→+∞} F_{X,Y}(x, y) = 1;
b) if x_1 ≤ x_2 and y_1 ≤ y_2, then F_{X,Y}(x_1, y_1) ≤ F_{X,Y}(x_2, y_2);
c) F_{X,Y} is continuous from above: F_{X,Y}(x + u, y + v) → F_{X,Y}(x, y) as u, v ↓ 0.

Remark. If the joint distribution function of (X, Y) is F_{X,Y}, then the distribution functions F_X and F_Y of the random variables X and Y can be computed using
lim_{y→∞} F_{X,Y}(x, y) = F_X(x) and lim_{x→∞} F_{X,Y}(x, y) = F_Y(y).
The functions F_X and F_Y are called the marginal distribution functions of F_{X,Y}.

Definition [Jointly Discrete Random Variables / Joint Probability Mass Function of a Random Vector]. The random variables X and Y on the probability space (Ω, F, P) are called (jointly) discrete if the vector (X, Y) takes values in some countable (or finite) subset of R². The random vector (X, Y), with X and Y jointly discrete, has joint probability mass function f : R² → [0, 1] given by f(x, y) = P(X = x, Y = y).

Definition [Jointly Continuous Random Variables / Joint Probability Density Function of a Random Vector]. The random variables X and Y on the probability space (Ω, F, P) are called (jointly) continuous if their joint distribution function F_{X,Y} can be expressed as
F_{X,Y}(x, y) = ∫_{−∞}^x ∫_{−∞}^y f(u, v) dv du, (x, y) ∈ R²,
for some integrable function f : R² → [0, ∞) called the joint probability density function of the random vector (X, Y).

Remark 1. If it exists, the joint probability density function can be computed using
f(x, y) = ∂²F_{X,Y} / (∂x ∂y) (x, y).

Remark 2. If it exists, the joint probability density function f(x, y) satisfies
∫_{−∞}^∞ ∫_{−∞}^∞ f(u, v) dv du = 1.

Problem 1. A fair coin is tossed twice. Let X be the number of heads and let Y be the indicator function of the event {X = 1}. Find P(X = x, Y = y) for all appropriate values of x and y.

Problem 2. Let X and Y be two random variables with the joint distribution function F = F(x, y). Show that
P(1 < X ≤ 2, 3 < Y ≤ 4) = F(2, 4) − F(1, 4) − F(2, 3) + F(1, 3).

Problem 3. Consider the random variables X and Y with joint density function
f(x, y) = 6 e^{−2x−3y} if x, y > 0, and f(x, y) = 0 otherwise.
a) Compute P(X ≤ x, Y ≤ y).
b) Find the marginal distribution functions F_X(x) and F_Y(y).
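Problem 3 can also be checked numerically. The sketch below (not the intended pencil-and-paper solution) integrates the given density over [0, 1] × [0, 1] with the trapezoidal rule and compares the result with the product (1 − e^{−2})(1 − e^{−3}) suggested by the factorization f(x, y) = 2e^{−2x} · 3e^{−3y}:

    import numpy as np

    # Joint density from Problem 3: f(x, y) = 6 exp(-2x - 3y) for x, y > 0.
    def f(x, y):
        return 6.0 * np.exp(-2.0 * x - 3.0 * y)

    x = np.linspace(0.0, 1.0, 2001)
    y = np.linspace(0.0, 1.0, 2001)
    X, Y = np.meshgrid(x, y, indexing="ij")

    inner = np.trapz(f(X, Y), y, axis=1)     # integrate over y for each fixed x
    F11 = np.trapz(inner, x)                 # then integrate over x: F_{X,Y}(1, 1)

    print(F11)
    print((1 - np.exp(-2.0)) * (1 - np.exp(-3.0)))   # the two numbers agree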

Chapter 3 - Discrete Random Variables

3.1. Probability Mass Functions

Definition (The Probability Mass Function Associated to a Discrete Random Variable). Consider a probability space (Ω, F, P) and let X : Ω → R be a discrete random variable associated to this space. The probability mass function associated to X is the function f : R → [0, 1] given by f(x) = P(X = x).

Proposition (Defining Properties of the Probability Mass Function). Consider a probability space (Ω, F, P) and let X : Ω → R be a discrete random variable associated to this space. The probability mass function f : R → [0, 1] associated to X satisfies:
a) the set of x such that f(x) ≠ 0 is countable (or finite);
b) Σ_i f(x_i) = 1, where x_1, x_2, ... are the values of x such that f(x) ≠ 0.

Examples of Discrete Random Variables

1. The Binomial Distribution. A discrete random variable X : Ω → R is said to have the binomial distribution with parameters n and p (and is denoted bin(n, p)) if:
- X takes values in {0, 1, 2, ..., n};
- the probability mass function f : R → [0, 1] of X is the function
f(k) = (n choose k) p^k (1 − p)^{n−k} if k is an integer and 0 ≤ k ≤ n, and f(k) = 0 otherwise.
Note that X = Y_1 + Y_2 + ... + Y_n, where each Y_k is a Bernoulli random variable taking the value 0 with probability 1 − p and the value 1 with probability p.

2. The Poisson Distribution. A discrete random variable X : Ω → R is said to have the Poisson distribution with parameter λ > 0 (and is denoted Poisson(λ)) if:
- X takes values in {0, 1, 2, ...};
- the probability mass function f : R → [0, 1] of X is the function
f(k) = (λ^k / k!) e^{−λ} if k is a nonnegative integer, and f(k) = 0 otherwise.
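A small sketch (mine, not the handout's) builds the two mass functions directly from the formulas and checks defining property b) above, namely that the values sum to 1:

    from math import comb, exp, factorial

    def binomial_pmf(n, p):
        """Mass function of bin(n, p) as a dict {k: P(X = k)}."""
        return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

    def poisson_pmf(lam, cutoff=200):
        """Mass function of Poisson(lam), truncated at a large cutoff."""
        return {k: lam**k / factorial(k) * exp(-lam) for k in range(cutoff)}

    print(sum(binomial_pmf(10, 0.3).values()))   # 1.0 (up to rounding)
    print(sum(poisson_pmf(4.0).values()))        # ~1.0 (the tail beyond the cutoff is negligible)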

Problem 1. For what values of the constant C do the following define mass functions on the positive integers 1, 2, ...?
a) f(x) = C 3^{−x}
b) f(x) = C 3^{−x} / x
c) f(x) = C x 3^{−x}

Problem 2. For a random variable X having each of the probability mass functions in Problem 1, find:
a) P(X > 2),
b) the most probable value of X,
c) the probability that X is even.

3.2. Independence

Definition (Two Independent Discrete Random Variables). The discrete random variables X : Ω → R and Y : Ω → R are independent if for any x, y ∈ R the events {ω ∈ Ω : X(ω) = x} and {ω ∈ Ω : Y(ω) = y} are independent.

Remark. As usual, the event {ω ∈ Ω : X(ω) = x} is denoted by {X = x} and the event {ω ∈ Ω : Y(ω) = y} is denoted by {Y = y}. The discrete random variables X : Ω → R and Y : Ω → R are independent if and only if for any x, y ∈ R,
P(X = x, Y = y) = P(X = x) P(Y = y).

Theorem (Functions of Two Independent Discrete Random Variables). Suppose that the discrete random variables X : Ω → R and Y : Ω → R are independent and g, h : R → R. Then the random variables g(X) and h(Y) are jointly discrete and independent.

Definition (General Families of Discrete Random Variables). A family of random variables {X_i : Ω → R}_{i∈I} is called independent if, for any collection of real numbers {x_i}_{i∈I}, the events {X_i = x_i}, i ∈ I, are independent. Equivalently, this means that the family of random variables {X_i : Ω → R}_{i∈I} is independent if and only if
P(X_i = x_i for all i ∈ J) = Π_{i∈J} P(X_i = x_i)
for all sets {x_i}_{i∈I} and for all finite subsets J of I.

3.3. Expectation and Moments for Discrete Random Variables

Definition (The Expected Value of a Discrete Random Variable). Let X : Ω → R be a discrete random variable with values in the set S = {x_1, x_2, ...} and suppose that f : R → [0, 1] is the probability mass function of X. The mean value (or expectation, or expected value) of X is the number
E(X) = Σ_{x∈S} x f(x),
whenever the sum is absolutely convergent.

Note: A series Σ_{n=1}^∞ x_n is said to be absolutely convergent if the series Σ_{n=1}^∞ |x_n| converges.

Proposition (The Expected Value of a Function of a Discrete Random Variable). Let X : Ω → R be a discrete random variable with values in the set S = {x_1, x_2, ...} and suppose that f : R → [0, 1] is the probability mass function of X. Let g : R → R be another function. Then g(X) is a discrete random variable and
E(g(X)) = Σ_{x∈S} g(x) f(x),
whenever the sum is absolutely convergent.

Definition (The k-th Moment of a Discrete Random Variable). Let X : Ω → R be a discrete random variable and let k be a positive integer. The k-th moment m_k of X is defined to be m_k = E(X^k).

Proposition (Formula for Computing the k-th Moment of a Discrete Random Variable). Let X : Ω → R be a discrete random variable with values in the set S = {x_1, x_2, ...} and suppose that f : R → [0, 1] is the probability mass function of X. Then the k-th moment m_k of X can be computed using the formula
m_k = Σ_{x∈S} x^k f(x).

Definition (The k-th Central Moment of a Discrete Random Variable). Let X : Ω → R be a discrete random variable and let k be a positive integer. The k-th central moment σ_k of X is defined to be
σ_k = E((X − E(X))^k) = E((X − m_1)^k).

Note:
σ_2 is called the variance of X (denoted var(X)).
σ = √(σ_2) is called the standard deviation of X.
σ_3 measures the skewness of X.
σ_4 is used to find the kurtosis of X.

Proposition (Formula for Computing the k-th Central Moment of a Discrete Random Variable). Let X : Ω → R be a discrete random variable with values in the set S = {x_1, x_2, ...} and suppose that f : R → [0, 1] is the probability mass function of X. Then the k-th central moment σ_k of X can be computed using the formula
σ_k = Σ_{x∈S} (x − m_1)^k f(x),
where m_1 is the first moment of X.

Proposition (Formula for Computing Variances). Let X : Ω → R be a discrete random variable. Then
var(X) = E(X²) − (E(X))².

Theorem (Properties of the Expectation). Let X : Ω → R and Y : Ω → R be two discrete random variables. Then:
a) if X ≥ 0 (which means that X(ω) ≥ 0 for all ω ∈ Ω), then E(X) ≥ 0;
b) if a, b ∈ R, then E(aX + bY) = a E(X) + b E(Y);
c) if X = 1 (which means that X(ω) = 1 for all ω ∈ Ω), then E(X) = 1.

Theorem (The Expectation of the Product of Two Independent Discrete Random Variables). Let X : Ω → R and Y : Ω → R be two independent discrete random variables. Then E(XY) = E(X) E(Y).

Definition (Uncorrelated Random Variables). Let X : Ω → R and Y : Ω → R be two discrete random variables. Then X and Y are called uncorrelated if E(XY) = E(X) E(Y).

Note. Two independent random variables are uncorrelated. The converse is not true.

Theorem (Properties of the Variance). Let X : Ω → R and Y : Ω → R be two discrete random variables. Then:
a) var(aX) = a² var(X) for any a ∈ R;
b) if X and Y are uncorrelated, then var(X + Y) = var(X) + var(Y).
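The moment formulas and var(X) = E(X²) − (E(X))² are easy to exercise on a small mass function; the helper below is an illustrative sketch (the name expectation is mine), applied to a Bernoulli variable with p = 0.3:

    def expectation(pmf, g=lambda x: x):
        """E(g(X)) = sum of g(x) f(x) over x, for a discrete X with mass function pmf."""
        return sum(g(x) * p for x, p in pmf.items())

    p = 0.3
    bern = {0: 1 - p, 1: p}        # Bernoulli(p): value 1 with probability p, 0 otherwise

    m1 = expectation(bern)                                # first moment E(X)
    m2 = expectation(bern, lambda x: x**2)                # second moment E(X^2)
    central2 = expectation(bern, lambda x: (x - m1)**2)   # second central moment

    print(m1, m2 - m1**2, central2, p * (1 - p))   # 0.3 0.21 0.21 0.21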

Problem 1. Let X be a Bernoulli random variable, taking the value 1 with probability p and 0 with probability 1 − p. Find E(X), E(X²), and var(X).

Problem 2. Let X be bin(n, p). Show that E(X) = np and var(X) = np(1 − p).

Problem 3. Suppose that the discrete random variables X : Ω → R and Y : Ω → R are independent. Prove that:
a) X² and Y are independent;
b) X² and Y² are independent.

Problem 4. An urn contains 3 balls numbered 1, 2, 3. We remove two balls at random (without replacement) and add up their numbers. Find the mean and the variance of the total.

3.4. Indicators and Matching

Recall the definition of the indicator function:

Definition [Indicator Function]. Consider a probability space (Ω, F, P). Then any event A ∈ F defines a random variable I_A : Ω → R by
I_A(ω) = 1 if ω ∈ A, and I_A(ω) = 0 if ω ∉ A.
I_A is called the indicator function of A.

Note: If P(A) = p, then E(I_A) = p and var(I_A) = p(1 − p).

3.5. Dependence

Recall the following definition:

Definition (Joint Distribution Function / Joint Probability Mass Function). Let X : Ω → R and Y : Ω → R be two discrete random variables. The joint distribution function of X and Y is the function F = F_{X,Y} : R² → [0, 1] defined by F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y). The joint probability mass function of X and Y is the function f = f_{X,Y} : R² → [0, 1] defined by f_{X,Y}(x, y) = P(X = x, Y = y).

Proposition (Formula for Computing the Marginal Mass Functions). Let X : Ω → R and Y : Ω → R be two discrete random variables and let f_{X,Y} be their joint mass function. The probability mass functions f_X and f_Y of X and Y are called the marginal mass functions of the pair (X, Y). They can be computed using the following formulas:
f_X(x) = Σ_y f_{X,Y}(x, y), f_Y(y) = Σ_x f_{X,Y}(x, y).

Proposition (The Joint Probability Mass Function of Two Independent Random Variables). The discrete random variables X and Y are independent if and only if
f_{X,Y}(x, y) = f_X(x) f_Y(y) for all x, y ∈ R.
More generally, X and Y are independent if and only if f_{X,Y}(x, y) can be factorized as the product g(x) h(y) of a function of x alone and a function of y alone.

Proposition (Formula for the Expectation of a Function of Two Discrete Random Variables). Let X : Ω → R and Y : Ω → R be two discrete random variables and let f_{X,Y} be their joint probability mass function. Let g : R² → R be another function. Then
E(g(X, Y)) = Σ_{x,y} g(x, y) f_{X,Y}(x, y).

Example: Let Z = X² + Y². We have Z = g(X, Y), where g(x, y) = x² + y². Therefore
E(Z) = E(X² + Y²) = Σ_{x,y} (x² + y²) f_{X,Y}(x, y).

Definition (Covariance of Two Random Variables X and Y). The covariance of two discrete random variables X and Y is defined to be
cov(X, Y) = E[(X − E(X))(Y − E(Y))].

Remarks:
1. A simple computation shows that cov(X, Y) = E(XY) − E(X) E(Y). Therefore, the random variables X and Y are uncorrelated if and only if cov(X, Y) = 0.
2. For any random variable X, we have cov(X, X) = var(X).
3. If any of E(X), E(Y), or E(XY) does not exist or is infinite, then cov(X, Y) does not exist.
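A concrete illustration (a sketch, not part of the handout): for two tosses of a fair coin with X = number of heads and Y = the indicator of {X = 1} (the pair from Problem 1 of Section 2.4), the joint mass function puts weight 1/4, 1/2, 1/4 on (0, 0), (1, 1), (2, 0), and the marginals, E(XY) and cov(X, Y) come out as follows:

    # Joint mass function of (X, Y): X = number of heads in two fair tosses,
    # Y = indicator of the event {X = 1}.
    joint = {(0, 0): 0.25, (1, 1): 0.50, (2, 0): 0.25}

    f_X, f_Y = {}, {}
    for (x, y), p in joint.items():
        f_X[x] = f_X.get(x, 0.0) + p      # marginal mass function of X
        f_Y[y] = f_Y.get(y, 0.0) + p      # marginal mass function of Y

    EX = sum(x * p for x, p in f_X.items())
    EY = sum(y * p for y, p in f_Y.items())
    EXY = sum(x * y * p for (x, y), p in joint.items())

    print(EX, EY, EXY, EXY - EX * EY)   # 1.0 0.5 0.5 0.0: uncorrelated, yet not independent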

Definition (Correlation Coefficient). The correlation (or correlation coefficient) of two discrete random variables X and Y with nonzero variances is defined to be
ρ(X, Y) = cov(X, Y) / √(var(X) var(Y)).

Proposition (The Cauchy-Schwartz Inequality for Random Variables). For any two discrete random variables X and Y, we have
(E(XY))² ≤ E(X²) E(Y²),
with equality if and only if there exist two real numbers a and b (at least one of which is nonzero) such that P(aX = bY) = 1.

Remark: The equality in the Cauchy-Schwartz inequality is attained if and only if X and Y are proportional, which means that there exists a constant c ∈ R such that P(X/Y = c) = 1 or P(Y/X = c) = 1.

Proposition (Possible Values for the Correlation Coefficient). Let X and Y be two discrete random variables and let ρ(X, Y) be their correlation coefficient. Then
|ρ(X, Y)| ≤ 1.
Equality is attained in the previous inequality if and only if there exist a, b, c ∈ R such that P(aX + bY = c) = 1.

Remark: The condition P(aX + bY = c) = 1 means that there exists a linear dependence between X and Y. More precisely, if ρ = 1, then Y increases linearly with X, and if ρ = −1, then Y decreases linearly with X.

3.6. Conditional Distribution and Conditional Expectation

We consider a probability space (Ω, F, P) and two discrete random variables X : Ω → R and Y : Ω → R. Let S = {x ∈ R : P(X = x) > 0}. Note that S is the set of values of the random variable X and is a finite or countable set.

Definition (Conditional Distribution Function). For any x ∈ S we define the conditional distribution function of Y given X = x to be the function F_{Y|X}(· | x) : R → [0, 1] defined by
F_{Y|X}(y | x) = P(Y ≤ y | X = x).

Definition (Conditional Probability Mass Function). For any x ∈ S we define the conditional probability mass function of Y given X = x to be the function f_{Y|X}(· | x) : R → [0, 1] defined by
f_{Y|X}(y | x) = P(Y = y | X = x).

Remark. If X, Y : Ω → R are two discrete random variables with joint probability mass function f_{X,Y} and with marginal probability mass functions f_X and f_Y, then
f_{Y|X}(y | x) = P(Y = y | X = x) = P(Y = y, X = x) / P(X = x) = f_{X,Y}(x, y) / f_X(x).

Definition (The Random Variable Y|{X=x} and its Expectation). Suppose that x ∈ S and the discrete random variable Y takes values in the set T = {y_1, y_2, ...}. By definition, the random variable Y|{X=x} also takes values in the set T, and for any y ∈ T we have
P(Y|{X=x} = y) = P(Y = y | X = x) = f_{Y|X}(y | x).
We have
E(Y|{X=x}) = Σ_{y∈T} y P(Y|{X=x} = y) = Σ_{y∈T} y f_{Y|X}(y | x) = Σ_{y∈T} y f_{X,Y}(x, y) / f_X(x).

Definition (Conditional Expectation). The function ψ : S → R is defined by
ψ(x) = E(Y|{X=x}).
The random variable ψ(X) is called the conditional expectation of Y given X and is denoted by E(Y | X).

Theorem (The Law of Iterated Expectation). The conditional expectation E(Y | X) satisfies
E(E(Y | X)) = E(Y).
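The law of iterated expectation can be verified numerically on the same small joint mass function used in the covariance sketch above (a sketch; psi mirrors the ψ of the definition):

    joint = {(0, 0): 0.25, (1, 1): 0.50, (2, 0): 0.25}   # the coin-tossing example above

    f_X = {}
    for (x, y), p in joint.items():
        f_X[x] = f_X.get(x, 0.0) + p

    def psi(x):
        """psi(x) = E(Y | X = x) = sum over y of y * f_{X,Y}(x, y) / f_X(x)."""
        return sum(y * p for (xx, y), p in joint.items() if xx == x) / f_X[x]

    E_Y = sum(y * p for (x, y), p in joint.items())
    E_psi_X = sum(psi(x) * p for x, p in f_X.items())    # E(E(Y | X)) = E(psi(X))

    print(E_Y, E_psi_X)   # both 0.5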

3.7. Sums of Random Variables

Theorem (The Probability Mass Function of the Sum of Two Discrete Random Variables). Suppose that X : Ω → R and Y : Ω → R are two discrete random variables with joint probability mass function f_{X,Y}. Then Z = X + Y is a discrete random variable and its probability mass function f_Z can be computed by
f_Z(z) = P(Z = z) = P(X + Y = z) = Σ_x f_{X,Y}(x, z − x).

Remark. f_Z(z) = Σ_x f_{X,Y}(x, z − x) = Σ_x f_{X,Y}(z − x, x).

Definition (Convolution of Two Functions). Suppose that f, g : R → R are two functions such that the sets S = {x ∈ R : f(x) ≠ 0} and T = {x ∈ R : g(x) ≠ 0} are finite or countable. Then the convolution of f and g is the function h = f * g : R → R defined by
h(z) = (f * g)(z) = Σ_x f(z − x) g(x)
(assuming that all the sums converge).

Remark. f * g = g * f.

Theorem (The Probability Mass Function of the Sum of Two Independent Discrete Random Variables). Suppose that X, Y : Ω → R are two independent discrete random variables with joint probability mass function f_{X,Y} and with marginal probability mass functions f_X and f_Y. Then we have f_{X+Y} = f_X * f_Y.
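The convolution formula can be exercised on two independent fair dice (a sketch; convolve_pmf is an illustrative helper name, not notation from the handout): the mass function of the total is the familiar triangular distribution on {2, ..., 12}.

    def convolve_pmf(f, g):
        """Convolution of two mass functions given as dicts: (f * g)(z) = sum_x f(z - x) g(x)."""
        h = {}
        for x, px in f.items():
            for y, py in g.items():
                h[x + y] = h.get(x + y, 0.0) + px * py
        return h

    die = {k: 1.0 / 6.0 for k in range(1, 7)}   # mass function of one fair die
    total = convolve_pmf(die, die)              # f_{X+Y} = f_X * f_Y for independent dice

    print(total[2], total[7], total[12])              # 1/36, 6/36, 1/36
    print(abs(sum(total.values()) - 1.0) < 1e-12)     # True: the result is again a mass function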

Chapter 4 - Continuous Random Variables

Recall from Chapter 2:

Definition (Continuous Random Variables). Let (Ω, F, P) be a probability space and let X : Ω → R be a random variable with the distribution function F_X : R → [0, 1] (recall that F_X(x) = P(X ≤ x)). X is said to be a continuous random variable if there exists an integrable function f = f_X : R → [0, ∞) such that
F_X(x) = ∫_{−∞}^x f(u) du.
The function f_X is called the density of the continuous random variable X.

Remark. The density function f_X is not unique (since two functions which are equal everywhere except at a point have the same integrals). However, if the distribution function is differentiable at a point x, we normally set f_X(x) = F'_X(x). We can think of f(x) dx as the element of probability
P(x < X ≤ x + dx) = F(x + dx) − F(x) ≈ f(x) dx.

Definition (The Borel σ-field, Borel Sets, Borel Measurable Functions). The smallest σ-algebra of subsets of R which contains all the open intervals is called the Borel σ-algebra and is denoted by B. Note that B contains all the nice subsets of R (intervals, countable unions of intervals, Cantor sets, etc.). The sets in B are called Borel sets. A function g : R → R is called Borel measurable if for any B ∈ B we have g^{−1}(B) ∈ B. This means that under a Borel measurable function, the inverse image of a nice set is a nice set.

Remark: All nice functions are Borel measurable. More precisely: any continuous, monotonic, piecewise continuous or piecewise monotonic function is Borel measurable.

Proposition (Properties of Continuous Random Variables). If X is a continuous random variable with density f, then:
a) ∫_{−∞}^∞ f(x) dx = 1;
b) P(X = x) = 0 for all x ∈ R;
c) P(a ≤ X ≤ b) = ∫_a^b f(x) dx for any a < b;
d) in general, if B is a nice subset of R (an interval, a countable union of intervals and so on), we have
P(X ∈ B) = ∫_B f(x) dx.

Definition (Independent Random Variables). Two random variables X : Ω → R and Y : Ω → R are independent if the events {X ≤ x} and {Y ≤ y} are independent for all x, y ∈ R.

Proposition (Functions of Independent Random Variables). Let X : Ω → R and Y : Ω → R be two independent random variables and let g, h : R → R be Borel measurable functions (for example, continuous functions or characteristic functions). Then g(X) and h(Y) are independent random variables.

Definition (The Expectation of a Continuous Random Variable). Let X be a continuous random variable with the density function f. The expectation of X is
E(X) = ∫_{−∞}^∞ x f(x) dx,
whenever this integral exists.

Proposition (The Expectation of a Function of a Random Variable). Let X be a continuous random variable with the density function f_X, and let g : R → R be such that g(X) is a continuous random variable. Then
E(g(X)) = ∫_{−∞}^∞ g(x) f_X(x) dx.

Note: The previous proposition allows us to define the moments m_1, m_2, m_3, ... and the central moments σ_1, σ_2, σ_3, ... of a continuous random variable X with density f_X:
m_k = E(X^k) = ∫_{−∞}^∞ x^k f_X(x) dx,
σ_k = E((X − m_1)^k) = ∫_{−∞}^∞ (x − m_1)^k f_X(x) dx,
for all k = 0, 1, 2, ... Some (or all) of these moments may not exist. As in the case of discrete random variables, σ_2 is called the variance and σ = √(σ_2) is called the standard deviation of X.

4.1. Examples of Continuous Random Variables

1. The Uniform Distribution. The random variable X is uniform on the interval [a, b] if it has the distribution function
F_X(x) = 0 if x ≤ a; F_X(x) = (x − a)/(b − a) if a < x ≤ b; F_X(x) = 1 if x > b,
and density
f_X(x) = 1/(b − a) if a < x ≤ b, and f_X(x) = 0 otherwise.

2. The Exponential Distribution with Parameter λ. The random variable X is exponential with parameter λ > 0 if it has the distribution function
F_X(x) = 1 − e^{−λx} if x ≥ 0, and F_X(x) = 0 otherwise,
and the density
f_X(x) = λ e^{−λx} if x ≥ 0, and f_X(x) = 0 otherwise.

3. The Normal (Gaussian) Distribution with Parameters µ and σ². The random variable X is normal (Gaussian) with parameters µ and σ² if it has the density function
f_X(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞.
This random variable is denoted by N(µ, σ²); it has mean µ and variance σ². N(0, 1) is called the standard normal distribution (it has mean 0 and variance 1).
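A quick numerical sanity check for the exponential distribution (a sketch, assuming λ = 2): its density integrates to 1 and its first two moments agree with the values 1/λ and 1/λ² listed in the table later in this chapter.

    import numpy as np

    lam = 2.0
    x = np.linspace(0.0, 40.0, 400_001)        # [0, 40] carries essentially all of the mass
    f = lam * np.exp(-lam * x)                 # exponential(lambda) density

    total = np.trapz(f, x)                     # ~1
    mean = np.trapz(x * f, x)                  # ~1/lambda = 0.5
    var = np.trapz((x - mean)**2 * f, x)       # ~1/lambda^2 = 0.25

    print(total, mean, var)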

Remark. If X is N(µ, σ²) and σ > 0, then Y = (X − µ)/σ is N(0, 1).
The density and the distribution functions of N(0, 1) are denoted by φ and Φ. Thus:
φ(x) = (1/√(2π)) e^{−x²/2},
Φ(y) = P(N(0, 1) ≤ y) = ∫_{−∞}^y φ(x) dx = (1/√(2π)) ∫_{−∞}^y e^{−x²/2} dx.

4. The Gamma Distribution. The random variable X has the gamma distribution with parameters λ, t > 0 if it has the density
f_X(x) = (1/Γ(t)) λ^t x^{t−1} e^{−λx} if x ≥ 0, and f_X(x) = 0 otherwise,
where Γ(t) = ∫_0^∞ x^{t−1} e^{−x} dx. This random variable is denoted by Γ(λ, t).
Important special cases:
- Γ(λ, 1) is the exponential distribution with parameter λ.
- If d is a positive integer, Γ(1/2, d/2) is said to have the χ²(d) distribution (the chi-squared distribution with d degrees of freedom).

5. The Cauchy Distribution. The random variable X has the Cauchy distribution if it has the density
f_X(x) = 1/(π(1 + x²)), −∞ < x < ∞.

6. The Beta Distribution. The random variable X is beta with parameters a, b > 0 if it has the density
f_X(x) = (1/B(a, b)) x^{a−1} (1 − x)^{b−1} if 0 ≤ x ≤ 1, and f_X(x) = 0 otherwise,
where B(a, b) = ∫_0^1 x^{a−1} (1 − x)^{b−1} dx. (B(a, b) is called the Beta function.)

This random variable is denoted by β(a, b).
Important special case: If a = b = 1, then X is uniform on [0, 1].
Remark: The Beta function satisfies B(a, b) = Γ(a)Γ(b)/Γ(a + b).

7. The Weibull Distribution. The random variable X is Weibull with parameters α, β > 0 if it has the distribution function
F_X(x) = 1 − e^{−αx^β} if x ≥ 0, and F_X(x) = 0 otherwise,
and the density
f_X(x) = αβ x^{β−1} e^{−αx^β} if x ≥ 0, and f_X(x) = 0 otherwise.
Important special case: If β = 1, then X has the exponential distribution with parameter α.

The means and the variances of the continuous random variables presented before can be found in the following table:
- Uniform [a, b]: density 1/(b − a) for x in [a, b]; mean (a + b)/2; variance (b − a)²/12.
- Exponential (λ): density λ e^{−λx} for x in [0, ∞); mean 1/λ; variance 1/λ².
- Normal (µ, σ²): density (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)} for x in (−∞, ∞); mean µ; variance σ².
- Gamma (λ, t): density (1/Γ(t)) λ^t x^{t−1} e^{−λx} for x in [0, ∞); mean t/λ; variance t/λ².
- Cauchy: density 1/(π(1 + x²)) for x in (−∞, ∞); the mean and the variance do not exist.
- Beta (a, b): density (1/B(a, b)) x^{a−1}(1 − x)^{b−1} for x in [0, 1]; mean a/(a + b); variance ab/((a + b)²(a + b + 1)).
- Weibull (α, β): density αβ x^{β−1} e^{−αx^β} for x in [0, ∞); mean α^{−1/β} Γ(1 + 1/β); variance α^{−2/β} Γ(1 + 2/β) − m².

Note: In the last entry, m denotes the mean of the Weibull distribution with parameters α and β: m = α^{−1/β} Γ(1 + 1/β).

4.2. Dependence

Recall the following definitions:

Definition [Joint Distribution Function of a Pair of Random Variables]. Let X and Y be two random variables on the probability space (Ω, F, P). The joint distribution function of (X, Y) is the function F = F_{X,Y} : R² → [0, 1] given by
F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y).

Remark. If the joint distribution function of (X, Y) is F_{X,Y}, then the distribution functions F_X and F_Y of the random variables X and Y can be computed using
lim_{y→∞} F_{X,Y}(x, y) = F_X(x) and lim_{x→∞} F_{X,Y}(x, y) = F_Y(y).
F_X and F_Y are called the marginal distribution functions of X and Y.

Definition [Jointly Continuous Random Variables / Joint Probability Density Function of a Random Vector]. The random variables X and Y on the probability space (Ω, F, P) are called (jointly) continuous if their joint distribution function F_{X,Y} can be expressed as
F_{X,Y}(x, y) = ∫_{−∞}^x ∫_{−∞}^y f(u, v) dv du, (x, y) ∈ R²,
for some integrable function f : R² → [0, ∞) called the joint probability density function of the pair (X, Y).

Remark. If it exists, the joint probability density function can be computed using
f(x, y) = ∂²F / (∂x ∂y) (x, y).

Proposition (Formula for Computing Probabilities Associated to a Pair of Jointly Continuous Random Variables). Let X and Y be two jointly continuous random variables on the probability space (Ω, F, P), with the joint probability density function f = f(x, y). Suppose that B is a nice subset of R². Then
P((X, Y) ∈ B) = ∫∫_B f(x, y) dx dy.

Remark. In particular, if B = [a, b] × [c, d], then
P(a ≤ X ≤ b, c ≤ Y ≤ d) = P((X, Y) ∈ [a, b] × [c, d]) = ∫_c^d ∫_a^b f(x, y) dx dy.

Proposition (Formula for Computing the Marginal Density Functions). Let X and Y be two jointly continuous random variables on the probability space (Ω, F, P), with the joint probability density function f = f(x, y). Then X and Y are continuous random variables and their density functions f_X and f_Y can be computed using the formulas
f_X(x) = ∫_{−∞}^∞ f(x, y) dy, f_Y(y) = ∫_{−∞}^∞ f(x, y) dx.
The functions f_X and f_Y are called the marginal probability density functions of X and Y.

Proposition (Formula for Computing the Expectation of a Function of Two Jointly Continuous Random Variables). Let X and Y be two jointly continuous random variables on the probability space (Ω, F, P), with the joint probability density function f = f(x, y). Let g : R² → R be a sufficiently nice function (Borel measurable). Then g(X, Y) is a random variable whose expectation can be computed using
E(g(X, Y)) = ∫_{−∞}^∞ ∫_{−∞}^∞ g(x, y) f(x, y) dx dy.

Remark (Linearity). In particular, if a, b ∈ R and g(x, y) = ax + by, we get E(g(X, Y)) = E(aX + bY) = a E(X) + b E(Y).

Definition (Independent Continuous Random Variables). Let X and Y be two jointly continuous random variables on the probability space (Ω, F, P), with joint distribution function F = F(x, y) and joint probability density function f = f(x, y). Suppose that F_X and F_Y are the marginal distribution functions of X and Y; suppose also that f_X and f_Y are the marginal probability density functions of X and Y. The random variables X and Y are independent if and only if
F(x, y) = F_X(x) F_Y(y) for all x, y ∈ R,
which is equivalent to
f(x, y) = f_X(x) f_Y(y) for all x, y ∈ R.

Theorem (Cauchy-Schwartz Inequality). For any pair X, Y of jointly continuous random variables, we have
(E(XY))² ≤ E(X²) E(Y²),
with equality if and only if there exist two constants a and b (not both 0) such that P(aX = bY) = 1.

Problem 1 [Buffon's Needle]. A plane is ruled by the vertical lines x = n (n = 0, ±1, ±2, ...) and a needle of unit length is cast randomly onto the plane. What is the probability that it intersects some line? (We suppose that the needle shows no preference for position or direction.)

Problem 2 [Random Numbers]. Suppose that X, Y and Z are independent random variables uniformly distributed in [0, 1]. Compute the following probabilities:
a) P(Y ≤ 2X)
b) P(Y = 2X)
c) P(Z ≤ XY)
d) P(Z = XY)

Problem 3 [Buffon's Needle Revisited]. Two grids of parallel lines are superimposed: the first grid contains lines distance a apart, and the second contains lines distance b apart which are perpendicular to those of the first set. A needle of length r (< min{a, b}) is dropped at random. Show that the probability it intersects a line equals r(2a + 2b − r)/(πab).

Problem 4 [The Standard Bivariate Normal Distribution]. Let ρ be a constant satisfying −1 < ρ < 1 and consider two jointly continuous random variables X and Y with the joint density function f : R² → R defined by
f(x, y) = (1/(2π√(1 − ρ²))) exp(−(x² − 2ρxy + y²)/(2(1 − ρ²)))
(f is called the standard bivariate normal density of a pair of continuous random variables).
a) Show that both X and Y are N(0, 1).
b) Show that cov(X, Y) = ρ.

Problem 5. Suppose that X and Y are jointly continuous random variables with the joint density function
f(x, y) = (1/y) exp(−x/y − y) if 0 < x, y < ∞, and f(x, y) = 0 otherwise.
Show that the marginal density function of Y is the function f_Y(y) = e^{−y} if y > 0, and f_Y(y) = 0 otherwise.

Table of discrete random variables:
- Bernoulli (p): values {0, 1}; f_X(0) = 1 − p, f_X(1) = p; mean p; variance p(1 − p).
- Binomial (n, p): values {0, 1, 2, ..., n}; f_X(k) = (n choose k) p^k (1 − p)^{n−k}; mean np; variance np(1 − p).
- Poisson (λ): values {0, 1, 2, ...}; f_X(k) = (λ^k / k!) e^{−λ}; mean λ; variance λ.
- Geometric (p): values {1, 2, ...}; f_X(k) = p(1 − p)^{k−1}; mean 1/p; variance (1 − p)/p².

4.3. Conditional Distribution and Conditional Expectation

Definition [Conditional Distribution Function / Conditional Density Function]. Suppose that X and Y are jointly continuous random variables with the joint density f_{X,Y} and the marginal density functions f_X and f_Y. Let x be a number such that f_X(x) > 0. The conditional distribution function of Y given X = x is the function F_{Y|X}(· | x) given by
F_{Y|X}(y | x) = ∫_{−∞}^y f_{X,Y}(x, v)/f_X(x) dv.
F_{Y|X}(y | x) is sometimes denoted by P(Y ≤ y | X = x). The function f_{Y|X}(· | x) given by
f_{Y|X}(y | x) = f_{X,Y}(x, y)/f_X(x)
is called the conditional density function of Y given X = x.

Definition [Conditional Expectation]. Let X and Y be two jointly continuous random variables. The conditional expectation of Y given X is the random variable ψ(X), where the function ψ is defined on the set {x : f_X(x) > 0} by
ψ(x) = E(Y | X = x) = ∫_{−∞}^∞ y f_{Y|X}(y | x) dy.
The random variable ψ(X) is denoted by E(Y | X).

Theorem (Law of Iterated Expectation). Let X and Y be two jointly continuous random variables and let ψ(X) = E(Y | X) be their conditional expectation. Then
E(E(Y | X)) = E(Y).

4.4. Functions of Random Variables

Theorem [The Change of Variables Formula in Two Dimensions]. Suppose that T = T(x_1, x_2) = (y_1(x_1, x_2), y_2(x_1, x_2)) maps the domain A ⊆ R² to the domain B ⊆ R² and is invertible (one-to-one and onto). Suppose that the inverse of T is the function T^{−1} : B → A, T^{−1}(y_1, y_2) = (x_1(y_1, y_2), x_2(y_1, y_2)). Let g : A → R be an integrable function. We have
∫∫_A g(x_1, x_2) dx_1 dx_2 = ∫∫_B g(x_1(y_1, y_2), x_2(y_1, y_2)) |J(y_1, y_2)| dy_1 dy_2,
where
J(y_1, y_2) = ∂(x_1, x_2)/∂(y_1, y_2) = (∂x_1/∂y_1)(∂x_2/∂y_2) − (∂x_1/∂y_2)(∂x_2/∂y_1),
all partial derivatives being evaluated at (y_1, y_2). J(y_1, y_2) is called the Jacobian of the inverse transformation T^{−1} at the point (y_1, y_2).

Theorem [The Change of the Joint Density Under a Function (in Two Dimensions)]. Suppose that the random variables X_1 and X_2 have the joint density function f_{X_1,X_2}, which is nonzero on the set A ⊆ R² and zero outside the set A. Suppose that B is a subset of R² and T = T(x_1, x_2) : A → B is invertible, with the inverse T^{−1} : B → A defined by T^{−1}(y_1, y_2) = (x_1(y_1, y_2), x_2(y_1, y_2)). Let (Y_1, Y_2) = T(X_1, X_2). Then the random variables Y_1 and Y_2 are jointly continuous and have the joint density
f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(x_1(y_1, y_2), x_2(y_1, y_2)) |J(y_1, y_2)| if (y_1, y_2) ∈ B, and 0 otherwise,
where J(y_1, y_2) = ∂(x_1, x_2)/∂(y_1, y_2) is the Jacobian of the inverse transformation T^{−1} at (y_1, y_2).

4.5. Sums of Random Variables

Theorem [The Formula for the Density of the Sum of Two Jointly Continuous Random Variables]. Suppose that the random variables X and Y are jointly continuous and have the joint density function f_{X,Y}. Then the random variable Z = X + Y is continuous and its density function f_Z is given by the formula
f_Z(z) = ∫_{−∞}^∞ f_{X,Y}(x, z − x) dx.

Definition [The Convolution of Two Functions]. The convolution of two functions f : R → R and g : R → R is the function h = f * g : R → R defined by
h(z) = (f * g)(z) = ∫_{−∞}^∞ f(x) g(z − x) dx
(assuming that the integral exists).

Remark: f * g = g * f, which is equivalent to
∫_{−∞}^∞ f(x) g(z − x) dx = ∫_{−∞}^∞ f(z − x) g(x) dx
for any z (assuming that the integrals exist).

Proposition [The Formula for the Density of the Sum of Two Independent Jointly Continuous Random Variables]. Suppose that the random variables X and Y are jointly continuous, have the joint density function f_{X,Y} and the marginal densities f_X and f_Y. Suppose also that X and Y are independent. Then the random variable Z = X + Y is continuous and its density function f_Z is given by the formula
f_Z = f_X * f_Y.

Remark: With the notations used before, we also have f_Z = f_Y * f_X.
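Problem 1 below can be previewed numerically: discretizing the convolution integral f_Z = f_X * f_Y on a grid and convolving two N(0, 1) densities reproduces the N(0, 2) density up to grid error (a sketch; numpy.convolve multiplied by the grid spacing approximates the integral).

    import numpy as np

    x = np.linspace(-10.0, 10.0, 2001)
    dx = x[1] - x[0]

    phi = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)    # N(0, 1) density on the grid

    # (f * g)(z) ~ sum over x of f(x) g(z - x) dx, via a discrete convolution.
    conv = np.convolve(phi, phi, mode="same") * dx

    n02 = np.exp(-x**2 / 4.0) / np.sqrt(4.0 * np.pi)    # N(0, 2) density

    print(np.max(np.abs(conv - n02)))   # small: only grid and truncation error remain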

Problem 1. Let X and Y be independent N(0, 1) random variables. Show that X + Y is N(0, 2).

Problem 2. Show that, if X is N(µ_1, σ_1²), Y is N(µ_2, σ_2²), and X and Y are independent, then Z = X + Y is N(µ_1 + µ_2, σ_1² + σ_2²).

4.6. Distributions Arising from the Normal Distribution

Distributions used by statisticians:
- Normal (µ, σ²): density (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)} for x in (−∞, ∞); mean µ; variance σ².
- Gamma (λ, t): density (1/Γ(t)) λ^t x^{t−1} e^{−λx} for x in [0, ∞); mean t/λ; variance t/λ².
- Chi-squared χ²(d): density (1/Γ(d/2)) (1/2)^{d/2} x^{d/2−1} e^{−x/2} for x in [0, ∞); mean d; variance 2d.
- Student's t(r): density (Γ((r+1)/2)/(√(πr) Γ(r/2))) (1 + x²/r)^{−(r+1)/2} for x in (−∞, ∞); mean 0 (for r > 1); variance r/(r − 2) (for r > 2).
- F(r, s): density f_{F(r,s)} given below, for x in (0, ∞); mean m_1(F(r, s)); variance σ²(F(r, s)),
where, for the F distribution, we have
f_{F(r,s)}(x) = (Γ((r+s)/2)/(Γ(r/2) Γ(s/2))) (r/s) (rx/s)^{r/2−1} [1 + (rx/s)]^{−(r+s)/2}, x > 0,
m_1(F(r, s)) = s/(s − 2), for s > 2,
σ²(F(r, s)) = 2s²(r + s − 2)/(r(s − 2)²(s − 4)), for s > 4.

Properties of the Gamma Function. The gamma function
Γ(t) = ∫_0^∞ x^{t−1} e^{−x} dx,
defined for any t > 0, satisfies the following properties:
i) Γ(1) = 1, Γ(1/2) = √π;
ii) for any t > 0, Γ(t + 1) = t Γ(t);
iii) for any positive integer n, Γ(n + 1) = n! = n(n − 1)(n − 2) ··· 2 · 1;
iv) for any positive integer n, Γ(n + 1/2) = (n − 1/2)(n − 3/2) ··· (1/2) √π.

Remark: The volume of the unit ball in n dimensions is π^{n/2} / Γ(n/2 + 1).

Useful Integrals:
∫_{−∞}^∞ e^{−x²/2} dx = √(2π), ∫_{−∞}^∞ e^{−x²} dx = √π, ∫_0^∞ e^{−x²} dx = √π/2.

Recall the following:

The Normal (Gaussian) Distribution with Parameters µ and σ². The random variable X is normal (Gaussian) with parameters µ and σ² if it has the density function
f_X(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞.
This random variable is denoted by N(µ, σ²); it has mean µ and variance σ². N(0, 1) is called the standard normal distribution (it has mean 0 and variance 1).

Remark. If X is N(µ, σ²) and σ > 0, then Y = (X − µ)/σ is N(0, 1).
The density and the distribution functions of N(0, 1) are denoted by φ and Φ. Thus:
φ(x) = (1/√(2π)) e^{−x²/2},
Φ(y) = P(N(0, 1) ≤ y) = ∫_{−∞}^y φ(x) dx = (1/√(2π)) ∫_{−∞}^y e^{−x²/2} dx.
Another commonly used function is
erf(x) = (2/√π) ∫_0^x e^{−t²} dt.

The Gamma Distribution. The random variable X has the gamma distribution with parameters λ, t > 0 if it has the density
f_X(x) = (1/Γ(t)) λ^t x^{t−1} e^{−λx} if x ≥ 0, and f_X(x) = 0 otherwise,
where Γ(t) = ∫_0^∞ x^{t−1} e^{−x} dx.

This random variable is denoted by Γ(λ, t).
Important special cases:
- Γ(λ, 1) is the exponential distribution with parameter λ.
- If d is a positive integer and λ = 1/2, Γ(1/2, d/2) is said to have the χ²(d) distribution (the chi-squared distribution with d degrees of freedom).

The Chi-Squared Distribution. Suppose that X_1, X_2, ..., X_d are independent N(0, 1) random variables. Then the random variable
Y = X_1² + X_2² + ... + X_d²
has the χ²(d) distribution (the chi-squared distribution with d degrees of freedom). As defined before, χ²(d) = Γ(1/2, d/2). The density function of Y = χ²(d) is
f_Y(x) = (1/Γ(d/2)) (1/2)^{d/2} x^{d/2−1} e^{−x/2} if x ≥ 0, and f_Y(x) = 0 otherwise.

The Student's t Distribution. Suppose that the random variables X and Y are independent, X is N(0, 1) and Y is χ²(n). Then the random variable T = X/√(Y/n) is said to have the t distribution with n degrees of freedom, written t(n). Its density function is
f_T(x) = (Γ((n+1)/2)/(√(πn) Γ(n/2))) (1 + x²/n)^{−(n+1)/2}, −∞ < x < ∞.

The F Distribution. Suppose that the random variables U and V are independent, U is χ²(r) and V is χ²(s). Then the random variable F = (U/r)/(V/s) is said to have the F distribution with r and s degrees of freedom, written F(r, s). Its density function is
f_{F(r,s)}(x) = (Γ((r+s)/2)/(Γ(r/2) Γ(s/2))) (r/s) (rx/s)^{r/2−1} [1 + (rx/s)]^{−(r+s)/2}, x > 0.

FINDING CONFIDENCE INTERVALS

Definition [Sample Mean]. The sample mean of a set of random variables X_1, X_2, ..., X_n is the random variable
X̄ = (Σ_{k=1}^n X_k) / n.

Definition [Sample Variance]. The sample variance of a set of random variables X_1, X_2, ..., X_n is the random variable S² defined by
S² = (Σ_{k=1}^n (X_k − X̄)²) / (n − 1),
where X̄ is the sample mean.

Remark: If X_1, X_2, ..., X_n are independent N(µ, σ²) random variables, X̄ is their sample mean, and S² is their sample variance, then E(X̄) = µ and E(S²) = σ².

Theorem [The Distributions of X̄ and S²]. If X_1, X_2, ..., X_n are independent N(µ, σ²) random variables, X̄ is their sample mean, and S² is their sample variance, then:
a) X̄ is N(µ, σ²/n);
b) (n − 1) S² / σ² is χ²(n − 1).

Remark: X̄ being N(µ, σ²/n) implies that √n (X̄ − µ)/σ is N(0, 1).

Theorem [Confidence Interval for the Mean of a Population]. If X_1, X_2, ..., X_n are independent N(µ, σ²) random variables, X̄ is their sample mean, and S² is their sample variance, then:
a) √n (X̄ − µ)/S is t(n − 1);
b) if 0 < α < 1 and t is chosen such that P(−t ≤ t(n − 1) ≤ t) = 1 − α, then
P(X̄ − tS/√n ≤ µ ≤ X̄ + tS/√n) = 1 − α.
This means that [X̄ − tS/√n, X̄ + tS/√n] is a (1 − α) confidence interval for µ.

4.7. Sampling From a Distribution

We describe two methods of sampling from a distribution: 1. the inverse transform technique and 2. the rejection method.
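The two methods are only named here; as a hedged illustration (my sketch, not the handout's continuation), the inverse transform technique applied to the exponential distribution and the rejection method applied to the Beta(2, 2) density can look like this:

    import numpy as np

    rng = np.random.default_rng(0)

    # 1. Inverse transform: if U is uniform on [0, 1] and F is a distribution function,
    #    then X = F^{-1}(U) has distribution function F.  For the exponential(lambda),
    #    F(x) = 1 - exp(-lambda x), so F^{-1}(u) = -log(1 - u)/lambda.
    lam = 2.0
    u = rng.uniform(size=100_000)
    x = -np.log(1.0 - u) / lam
    print(x.mean(), x.var())            # ~1/lambda = 0.5 and ~1/lambda^2 = 0.25

    # 2. Rejection method: to sample from a density f with f <= c*g for a density g that is
    #    easy to sample, accept a proposal Y ~ g when U <= f(Y)/(c*g(Y)).  Here f is the
    #    Beta(2, 2) density 6x(1 - x) on [0, 1], g is the uniform density on [0, 1], c = 1.5.
    def beta22(t):
        return 6.0 * t * (1.0 - t)

    proposals = rng.uniform(size=100_000)
    accept = rng.uniform(size=100_000) <= beta22(proposals) / 1.5
    sample = proposals[accept]
    print(sample.mean(), sample.var())  # ~a/(a+b) = 0.5 and ~ab/((a+b)^2 (a+b+1)) = 0.05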