Introduction to Statistical Inference: Self-Study
Definition, sample space
The fundamental object in probability is a nonempty sample space $\Omega$. An event is a subset $A \subseteq \Omega$.
Definition, σ-algebra
A family of subsets of $\Omega$, denoted by $\mathcal{F}$, is a σ-algebra on $\Omega$ if
1. $\emptyset \in \mathcal{F}$.
2. If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$.
3. If $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_i A_i \in \mathcal{F}$.
Definition, probability measure
Let $\Omega$ be nonempty, and let $\mathcal{F}$ be a σ-algebra on $\Omega$. The mapping $P : \mathcal{F} \to [0, 1]$ is a probability measure if
1. $P(A) \in [0, 1]$ for all $A \in \mathcal{F}$,
2. $P(\Omega) = 1$,
3. for all $A_1, A_2, \ldots \in \mathcal{F}$ with $A_i \cap A_j = \emptyset$ for $i \neq j$, it holds that $P\left(\bigcup_i A_i\right) = \sum_i P(A_i)$.
Corollaries
$P(A) = 1 - P(A^c)$, since $A$ and $A^c$ are disjoint and $A \cup A^c = \Omega$. Also, $P(B \cup C) = P(B) + P(C) - P(B \cap C)$.
Example
Consider rolling two dice. The corresponding sample space is $\Omega = \{(1, 1), (1, 2), \ldots, (6, 6)\}$. The event "both dice > 2" is $A = \{\omega = (\omega_1, \omega_2) \in \Omega : \omega_1 > 2,\ \omega_2 > 2\}$. In this example, $P(\{\omega\}) = 1/36$ for all $\omega \in \Omega$.
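Probabilities in this finite sample space can be checked by brute-force enumeration; a minimal Python sketch:

```python
from itertools import product

# Sample space for two dice: all 36 ordered pairs.
omega = list(product(range(1, 7), repeat=2))

# Event A: both dice show more than 2.
A = [w for w in omega if w[0] > 2 and w[1] > 2]

# Every outcome has probability 1/36, so P(A) = |A| / |Omega|.
print(len(A) / len(omega))  # 16/36 ≈ 0.444
```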
Definition, conditional probability
Let $P(B) \neq 0$. The probability of an event $A$ given $B$, denoted $P(A \mid B)$, is the probability of $A$ under the assumption that $B$ has already occurred:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Example
The probability of getting a 3 (event $A$) when rolling the first die, given that the other die gave a 4 (event $B$):
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/36}{6 \cdot 1/36} = \frac{1}{6}.$$
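The same conditional probability can be verified by enumeration; a minimal sketch:

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))

# A: the first die shows 3; B: the second die shows 4.
A = {w for w in omega if w[0] == 3}
B = {w for w in omega if w[1] == 4}

p = lambda event: len(event) / len(omega)

# P(A | B) = P(A ∩ B) / P(B)
print(p(A & B) / p(B))  # 1/6 ≈ 0.167
```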
Definition, independence
The events $A_1, \ldots, A_n$ are independent if for all $1 \leq i_1 < i_2 < \cdots < i_k \leq n$
$$P(A_{i_1} \cap \cdots \cap A_{i_k}) = P(A_{i_1}) \cdots P(A_{i_k}).$$
Example
In the dice example, $P(A \cap B) = 1/36$, and on the other hand $P(A)P(B) = (1/6) \cdot (1/6) = 1/36$, for any events $A$ = "the first die gives $a$" and $B$ = "the second die gives $b$" with $(a, b) \in \Omega$. Thus such events are independent.
Definition, random variable
A real-valued random variable $X$ is a mapping from the sample space to the real line, i.e. $X = X(\omega) : \Omega \to \mathbb{R}$. More precisely: let $\Omega$ be nonempty and let $\mathcal{F}$ be a σ-algebra on $\Omega$. Let $X = X(\omega) : \Omega \to \mathbb{R}$ be a function. If $\{\omega : X(\omega) \leq r\} \in \mathcal{F}$ for all $r \in \mathbb{R}$ (i.e. $X$ is $\mathcal{F}$-measurable), then $X$ is a random variable.
Example, two dice
As an example of a random variable, consider the sum: $X : \{(1, 1), \ldots, (6, 6)\} \to \{2, \ldots, 12\}$, $X(\omega) = \omega_1 + \omega_2$. Note, however, that the identity function $Y(\omega_1, \omega_2) = (\omega_1, \omega_2)$ also defines a random variable. Since $Y : \Omega \to \mathbb{R}^2$, this random variable is vector-valued.
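The distribution of the sum can be tabulated by pushing $P$ through $X$; a minimal sketch:

```python
from itertools import product
from collections import Counter

# X(ω) = ω₁ + ω₂ on the two-dice sample space.
omega = list(product(range(1, 7), repeat=2))
counts = Counter(w[0] + w[1] for w in omega)

# p_X(x) = P(X = x); each underlying outcome has probability 1/36.
pmf = {x: c / 36 for x, c in sorted(counts.items())}
print(pmf)  # e.g. p_X(7) = 6/36 = 1/6
```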
Definition, probability function
The probability function of a random variable $X$, denoted $P_X$, is $P_X(A) = P(\{\omega : X(\omega) \in A\})$.
Definition, cumulative distribution function
The cumulative distribution function (cdf) of a random variable $X$, denoted $F_X$, is $F_X(x) = P(\{\omega \in \Omega : X(\omega) \leq x\})$ (or, in short, $F_X(x) = P_X(X \leq x)$).
Random variable
Usually, in practice, $\omega$ is not observed directly and the analysis is based on the observed random variable $X(\omega)$. Thus statistical analysis is based on the measure $P_X$, not on $P$.
Definition, density function and probability mass function
The probability density function (pdf) $f_X(x)$ of a continuous random variable $X$ is the derivative of its cumulative distribution function,
$$f_X(x) = \frac{d}{dx} F_X(x).$$
(Note that the density function does not always exist.) In the case of a discrete random variable $X$, the analogue of a probability density function is a probability mass function (pmf) $p_X(x) = P(X = x)$, which corresponds to the probability of the event $X = x$.
Random variables are often defined by giving their cumulative distribution functions and/or density functions.
Examples
discrete $X$: for example the binomial or Poisson distribution
continuous $X$: for example the uniform, normal or exponential distribution
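These standard distributions are available in scientific libraries; a minimal sketch using scipy.stats (the parameter values below are arbitrary illustrative choices):

```python
from scipy import stats

# Discrete example: X ~ Binomial(n=10, p=0.3).
X = stats.binom(n=10, p=0.3)
print(X.pmf(3))  # p_X(3) = P(X = 3)
print(X.cdf(3))  # F_X(3) = P(X <= 3)

# Continuous example: Z ~ N(0, 1).
Z = stats.norm(loc=0, scale=1)
print(Z.pdf(0.0))  # f_Z(0) ≈ 0.3989
print(Z.cdf(0.0))  # F_Z(0) = 0.5
```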
Multivariate distributions
Very often, instead of dealing with only one random variable $X$, we are interested in several random variables $X_1, \ldots, X_k$.
Joint cumulative distribution function
Let $X_1, \ldots, X_k$ be random variables. Then the joint cumulative distribution function of $X_1, \ldots, X_k$ is given by
$$F_{X_1,\ldots,X_k}(x_1, \ldots, x_k) = P(X_1 \leq x_1, \ldots, X_k \leq x_k).$$
Joint probability density function
Let $X_1, \ldots, X_k$ be continuous random variables. Then the joint probability density function of $X_1, \ldots, X_k$ (if it exists) is given by
$$f_{X_1,\ldots,X_k}(x_1, \ldots, x_k) = \frac{\partial^k}{\partial x_1 \cdots \partial x_k} F_{X_1,\ldots,X_k}(x_1, \ldots, x_k).$$
Joint probability mass function
Let $X_1, \ldots, X_k$ be discrete random variables. Then the joint probability mass function of $X_1, \ldots, X_k$ is given by
$$p_{X_1,\ldots,X_k}(x_1, \ldots, x_k) = P(X_1 = x_1, \ldots, X_k = x_k).$$
Let $X_1, \ldots, X_k$ be continuous random variables. Assume that the joint probability density function $f_{X_1,\ldots,X_k}(x_1, \ldots, x_k)$ exists. Then
$$P((X_1, \ldots, X_k) \in A) = \int_{(x_1,\ldots,x_k) \in A} f_{X_1,\ldots,X_k}(x_1, \ldots, x_k)\, dx_1 \cdots dx_k.$$
For discrete variables,
$$P((X_1, \ldots, X_k) \in A) = \sum_{c \in C :\ (X_1(c),\ldots,X_k(c)) \in A} p_{X_1,\ldots,X_k}(X_1(c), \ldots, X_k(c)),$$
where $C$ is the sample space of the random experiment.
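For a concrete case, the integral can be evaluated numerically; a minimal sketch using scipy, with the joint density $f(x, y) = x + y$ from the independence example below and the arbitrarily chosen event $A = \{X \leq 1/2,\ Y \leq 1/2\}$:

```python
from scipy import integrate

# Joint density f(x, y) = x + y on the unit square.
f = lambda y, x: x + y  # dblquad integrates over the first argument first

# P(X <= 1/2, Y <= 1/2) = ∫₀^½ ∫₀^½ (x + y) dy dx
p, _ = integrate.dblquad(f, 0, 0.5, 0, 0.5)
print(p)  # 0.125
```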
Marginal distributions
Let $Z_1, \ldots, Z_h$ and $Y_1, \ldots, Y_l$ be continuous random variables with joint probability density functions $f_{Z_1,\ldots,Z_h}(z_1, \ldots, z_h)$, $f_{Y_1,\ldots,Y_l}(y_1, \ldots, y_l)$ and $f_{Z_1,\ldots,Z_h,Y_1,\ldots,Y_l}(z_1, \ldots, z_h, y_1, \ldots, y_l)$. Then
$$f_{Z_1,\ldots,Z_h}(z_1, \ldots, z_h) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{Z_1,\ldots,Z_h,Y_1,\ldots,Y_l}(z_1, \ldots, z_h, y_1, \ldots, y_l)\, dy_1 \cdots dy_l.$$
For discrete variables,
$$p_{Z_1,\ldots,Z_h}(z_1, \ldots, z_h) = \sum_{y_1, \ldots, y_l} p_{Z_1,\ldots,Z_h,Y_1,\ldots,Y_l}(z_1, \ldots, z_h, y_1, \ldots, y_l).$$
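In the discrete case, marginalization is just a sum over the remaining coordinates; a minimal sketch with a hypothetical joint pmf stored as a dict:

```python
from itertools import product

# Hypothetical joint pmf of (Z, Y), uniform on {1, 2} x {1, 2, 3}.
joint = {(z, y): 1 / 6 for z, y in product((1, 2), (1, 2, 3))}

# Marginal pmf of Z: sum the joint pmf over all values of Y.
marginal_z = {}
for (z, y), p in joint.items():
    marginal_z[z] = marginal_z.get(z, 0) + p

print(marginal_z)  # {1: 0.5, 2: 0.5}
```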
Independence
Let $X_1, \ldots, X_n$ be continuous random variables with probability density functions $f_{X_1}(x_1), \ldots, f_{X_n}(x_n)$ and a joint probability density function $f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)$. If
$$f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n),$$
the random variables $X_1, \ldots, X_n$ are independent. Discrete random variables are independent if
$$p_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = p_{X_1}(x_1) \cdots p_{X_n}(x_n).$$
Example, independence
Let $X$ and $Y$ have the joint pdf
$$f(x, y) = \begin{cases} x + y, & 0 \leq x \leq 1,\ 0 \leq y \leq 1 \\ 0, & \text{otherwise.} \end{cases}$$
Are the variables $X$ and $Y$ independent? Now
$$f(x) = \int_0^1 (x + y)\, dy = x + \tfrac{1}{2}, \quad 0 < x < 1,$$
and
$$f(y) = \int_0^1 (x + y)\, dx = y + \tfrac{1}{2}, \quad 0 < y < 1.$$
If the random variables were independent, then $f(x, y) = f(x) f(y)$. Let $x = 1/3$ and $y = 1/3$. Now
$$f(x, y) = x + y = \tfrac{1}{3} + \tfrac{1}{3} = \tfrac{2}{3}.$$
On the other hand,
$$f(x) f(y) = (x + \tfrac{1}{2})(y + \tfrac{1}{2}) = \tfrac{5}{6} \cdot \tfrac{5}{6} = \tfrac{25}{36} \neq \tfrac{2}{3}.$$
Thus $X$ and $Y$ are not independent.
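The same check can be done numerically by computing the marginals with quadrature; a minimal sketch:

```python
from scipy import integrate

f = lambda x, y: x + y  # joint pdf on the unit square

# Marginals by numerical integration over the other variable.
f_x = lambda x: integrate.quad(lambda y: f(x, y), 0, 1)[0]  # x + 1/2
f_y = lambda y: integrate.quad(lambda x: f(x, y), 0, 1)[0]  # y + 1/2

x, y = 1 / 3, 1 / 3
print(f(x, y))          # 2/3 ≈ 0.667
print(f_x(x) * f_y(y))  # 25/36 ≈ 0.694, so f(x, y) ≠ f(x) f(y)
```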
Example, independence
Let $X$ and $Y$ have the joint pmf
$$p(x, y) = \begin{cases} \tfrac{1}{4}, & x \in \{1, 2\},\ y \in \{1, 2\} \\ 0, & \text{otherwise.} \end{cases}$$
Now
$$p(x) = \sum_{y \in \{1,2\}} p(x, y) = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2}, \quad x \in \{1, 2\},$$
and otherwise $p(x) = 0$. Similarly,
$$p(y) = \sum_{x \in \{1,2\}} p(x, y) = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2}, \quad y \in \{1, 2\},$$
and otherwise $p(y) = 0$.
If $p(x, y) = p(x)\, p(y)$ for all $x, y$, then $X$ and $Y$ are independent. Now
$$p(x)\, p(y) = \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4} = p(x, y), \quad x \in \{1, 2\},\ y \in \{1, 2\},$$
and
$$p(x)\, p(y) = 0 = p(x, y) \quad \text{otherwise.}$$
The random variables are independent!
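The factorization can be verified mechanically over the whole support; a minimal sketch:

```python
from itertools import product

# Joint pmf from the example: uniform on {1, 2} x {1, 2}.
joint = {(x, y): 0.25 for x, y in product((1, 2), (1, 2))}

# Marginals of X and Y.
p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (1, 2)}
p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (1, 2)}

# Independence check: p(x, y) == p(x) p(y) at every support point.
print(all(abs(joint[x, y] - p_x[x] * p_y[y]) < 1e-12 for x, y in joint))  # True
```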
Conditional distribution
Let $Z_1, \ldots, Z_n$ and $Y_1, \ldots, Y_m$ be continuous random variables with joint probability density functions $f_{Z_1,\ldots,Z_n}(z_1, \ldots, z_n)$, $f_{Y_1,\ldots,Y_m}(y_1, \ldots, y_m)$ and $f_{Z_1,\ldots,Z_n,Y_1,\ldots,Y_m}(z_1, \ldots, z_n, y_1, \ldots, y_m)$. Then
$$f_{Y_1,\ldots,Y_m \mid Z_1,\ldots,Z_n}(y_1, \ldots, y_m \mid z_1, \ldots, z_n) = \frac{f_{Z_1,\ldots,Z_n,Y_1,\ldots,Y_m}(z_1, \ldots, z_n, y_1, \ldots, y_m)}{f_{Z_1,\ldots,Z_n}(z_1, \ldots, z_n)},$$
for $f_{Z_1,\ldots,Z_n}(z_1, \ldots, z_n) > 0$. For discrete random variables,
$$p_{Y_1,\ldots,Y_m \mid Z_1,\ldots,Z_n}(y_1, \ldots, y_m \mid z_1, \ldots, z_n) = \frac{p_{Z_1,\ldots,Z_n,Y_1,\ldots,Y_m}(z_1, \ldots, z_n, y_1, \ldots, y_m)}{p_{Z_1,\ldots,Z_n}(z_1, \ldots, z_n)},$$
for $p_{Z_1,\ldots,Z_n}(z_1, \ldots, z_n) > 0$.
Definition, expected value
Let $X$ be a continuous random variable. If $\int_{-\infty}^{\infty} |h(x)|\, f_X(x)\, dx < \infty$, then the expected value of the random variable $h(X)$ is (the real number)
$$E[h(X)] = \int_{-\infty}^{\infty} h(x) f_X(x)\, dx.$$
Let $X$ be a discrete random variable with the domain $I$. If $\sum_{x \in I} |h(x)|\, p_X(x) < \infty$, then the expected value of $h(X)$ is
$$E[h(X)] = \sum_{x \in I} h(x) p_X(x).$$
Example
The expected value of $X$, $E[X]$, is obtained by setting $h(x) = x$. The variance of $X$, $\mathrm{var}[X]$, is obtained by setting $h(x) = (x - E[X])^2$. The $k$th moment of $X$, $E[X^k]$, is obtained by setting $h(x) = x^k$.
Numerical example, expected value
Let $X$ be a continuous random variable with the pdf
$$f_X(x) = \begin{cases} 1, & 0 \leq x \leq 1 \\ 0, & \text{otherwise.} \end{cases}$$
Now
$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\, dx = \int_0^1 x \cdot 1\, dx = \tfrac{1}{2}.$$
Let $X$ be a discrete random variable with the pmf
$$p_X(x) = P(X = x) = \tfrac{1}{30} x^2, \quad x \in \{1, 2, 3, 4\}.$$
Now
$$E[X] = \sum_x x\, p_X(x) = 1 \cdot \tfrac{1}{30} + 2 \cdot \tfrac{4}{30} + 3 \cdot \tfrac{9}{30} + 4 \cdot \tfrac{16}{30} = \tfrac{10}{3}.$$
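Both expectations are easy to reproduce numerically; a minimal sketch:

```python
from scipy import integrate

# Continuous case: f_X(x) = 1 on [0, 1], so E[X] = ∫₀¹ x · 1 dx.
e_cont, _ = integrate.quad(lambda x: x * 1.0, 0, 1)
print(e_cont)  # 0.5

# Discrete case: p_X(x) = x²/30 for x in {1, 2, 3, 4}.
e_disc = sum(x * x**2 / 30 for x in (1, 2, 3, 4))
print(e_disc)  # 3.333... = 10/3
```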
Theorems, formulae for expectation and variance
Let $X_1, \ldots, X_n$ be random variables with finite expectations and variances, and let $a, b \in \mathbb{R}$. Then
$$E\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n E[X_i],$$
$$E[aX_i + b] = aE[X_i] + b,$$
$$\mathrm{var}[aX_i + b] = a^2\, \mathrm{var}[X_i].$$
Let $X_1, \ldots, X_n$ be independent. Then
$$E[X_1 X_2 \cdots X_n] = E[X_1] E[X_2] \cdots E[X_n],$$
$$\mathrm{var}\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n \mathrm{var}[X_i].$$
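These identities can be illustrated by simulation; a minimal sketch with two independent standard normal samples (an arbitrary choice of distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two independent standard normal samples.
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)

# E[X1 + X2] ≈ E[X1] + E[X2]; by independence, var[X1 + X2] ≈ var[X1] + var[X2].
print((x1 + x2).mean(), x1.mean() + x2.mean())  # both ≈ 0
print((x1 + x2).var(), x1.var() + x2.var())     # both ≈ 2
```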