STAT 5 Summary Sheet — Karl B. Gregory — Spring 08

1. Transformations of a random variable

Let X be a rv with support $\mathcal{X}$ and let g be a function mapping $\mathcal{X}$ to $\mathcal{Y}$ with inverse mapping $g^{-1}(A) = \{x \in \mathcal{X} : g(x) \in A\}$ for all $A \subset \mathcal{Y}$.

If X is a discrete rv, the rv $Y = g(X)$ is discrete and has pmf given by
$$p_Y(y) = \begin{cases} \sum_{x \in g^{-1}(y)} p_X(x) & \text{for } y \in \mathcal{Y} \\ 0 & \text{for } y \notin \mathcal{Y}. \end{cases}$$

If X has pdf $f_X$, then the cdf of $Y = g(X)$ is given by
$$F_Y(y) = \int_{\{x : g(x) \le y\}} f_X(x)\,dx, \quad -\infty < y < \infty.$$

If X has pdf $f_X$ which is continuous on $\mathcal{X}$, and if g is monotone and $g^{-1}$ has a continuous derivative, then the pdf of $Y = g(X)$ is given by
$$f_Y(y) = \begin{cases} f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right| & \text{for } y \in \mathcal{Y} \\ 0 & \text{for } y \notin \mathcal{Y}. \end{cases}$$

If X has mgf $M_X(t)$, the mgf of $Y = aX + b$ is given by $M_Y(t) = e^{bt} M_X(at)$.

If X has continuous cdf $F_X$, the rv $U = F_X(X)$ has the uniform distribution on $(0,1)$.

To generate $X \sim F_X$, generate $U \sim \text{Uniform}(0,1)$ and set $X = F_X^{-1}(U)$, where $F_X^{-1}(u) = \inf\{x : F_X(x) \ge u\}$. If $F_X$ is strictly increasing, $F_X^{-1}$ satisfies
$$x = F_X^{-1}(u) \iff u = F_X(x)$$
for all $0 < u < 1$ and $-\infty < x < \infty$.
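The generation recipe above can be sketched in Python. This is a minimal check, not part of the original sheet; it uses the Exponential distribution with mean $\lambda$ (the parameterization used later in Table 1), whose quantile function has the closed form $F_X^{-1}(u) = -\lambda\log(1-u)$, and the value $\lambda = 2$ is illustrative.

```python
import math
import random

def exp_quantile(u, lam):
    # Quantile function of the Exponential distribution with mean lam:
    # F_X(x) = 1 - exp(-x/lam), so F_X^{-1}(u) = -lam*log(1 - u)
    return -lam * math.log(1.0 - u)

random.seed(1)
lam = 2.0

# Generate U ~ Uniform(0,1) and set X = F_X^{-1}(U)
draws = [exp_quantile(random.random(), lam) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)  # should be close to lam
```

With 100,000 draws the sample mean lands within a few hundredths of $\lambda$, consistent with the probability integral transform.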
2. Transformations of multiple random variables

Let $(X_1, X_2)$ be a pair of continuous random variables with pdf $f_{(X_1,X_2)}$ having joint support $\mathcal{X} = \{(x_1,x_2) : f_{(X_1,X_2)}(x_1,x_2) > 0\}$, and let $Y_1 = g_1(X_1,X_2)$ and $Y_2 = g_2(X_1,X_2)$, where $g_1$ and $g_2$ are one-to-one functions taking values in $\mathcal{X}$ and returning values in the set $\mathcal{Y} = \{(y_1,y_2) : y_1 = g_1(x_1,x_2),\ y_2 = g_2(x_1,x_2),\ (x_1,x_2) \in \mathcal{X}\}$. Then the joint pdf $f_{(Y_1,Y_2)}$ of $(Y_1,Y_2)$ is given by
$$f_{(Y_1,Y_2)}(y_1,y_2) = \begin{cases} f_{(X_1,X_2)}(g_1^{-1}(y_1,y_2),\, g_2^{-1}(y_1,y_2))\, |J(y_1,y_2)| & \text{for } (y_1,y_2) \in \mathcal{Y} \\ 0 & \text{for } (y_1,y_2) \notin \mathcal{Y}, \end{cases}$$
where $g_1^{-1}$ and $g_2^{-1}$ are the functions satisfying
$$y_1 = g_1(x_1,x_2) \text{ and } y_2 = g_2(x_1,x_2) \iff x_1 = g_1^{-1}(y_1,y_2) \text{ and } x_2 = g_2^{-1}(y_1,y_2),$$
and
$$J(y_1,y_2) = \begin{vmatrix} \frac{\partial}{\partial y_1} g_1^{-1}(y_1,y_2) & \frac{\partial}{\partial y_2} g_1^{-1}(y_1,y_2) \\[4pt] \frac{\partial}{\partial y_1} g_2^{-1}(y_1,y_2) & \frac{\partial}{\partial y_2} g_2^{-1}(y_1,y_2) \end{vmatrix},$$
provided $J(y_1,y_2)$ is not equal to zero for all $(y_1,y_2) \in \mathcal{Y}$. For real numbers a, b, c, d,
$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.$$

Let $X_1, \dots, X_n$ be mutually independent rvs such that $X_i$ has mgf $M_{X_i}(t)$ for $i = 1, \dots, n$. Then the mgf of $Y = X_1 + \dots + X_n$ is given by
$$M_Y(t) = \prod_{i=1}^n M_{X_i}(t).$$
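The product rule for mgfs can be verified exactly for two independent Bernoulli rvs by enumerating the joint support $\{0,1\} \times \{0,1\}$. A small Python check, not from the original sheet; the parameter values $p_1 = 0.3$, $p_2 = 0.6$, $t = 0.5$ are illustrative.

```python
import math

def bernoulli_mgf(t, p):
    # M_X(t) = p*e^t + (1 - p) for X ~ Bernoulli(p)
    return p * math.exp(t) + (1.0 - p)

def bernoulli_pmf(x, p):
    return p if x == 1 else 1.0 - p

p1, p2, t = 0.3, 0.6, 0.5  # illustrative values

# mgf of Y = X1 + X2 computed directly: E[e^{t(X1+X2)}] over the joint support
mgf_sum = sum(bernoulli_pmf(x1, p1) * bernoulli_pmf(x2, p2) * math.exp(t * (x1 + x2))
              for x1 in (0, 1) for x2 in (0, 1))

# product of the individual mgfs
mgf_product = bernoulli_mgf(t, p1) * bernoulli_mgf(t, p2)
```

The two quantities agree up to floating-point rounding, as the theorem requires.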
Let $X_1, \dots, X_n$ be IID rvs from such-and-such distribution (IID means independent and identically distributed).

Let $X_1, \dots, X_n \overset{\text{iid}}{\sim}$ such-and-such distribution.

Let $X_1, \dots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2 < \infty$ and define the sample mean and sample variance as
$$\bar{X}_n = \frac{1}{n}(X_1 + \dots + X_n), \qquad S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2.$$
We have $E\bar{X}_n = \mu$, $\operatorname{Var}\bar{X}_n = \sigma^2/n$, and $ES_n^2 = \sigma^2$.

For a random sample $X_1, \dots, X_n$, the order statistics are the ordered data values, represented as
$$X_{(1)} < X_{(2)} < \dots < X_{(n)}.$$

Let $X_{(1)} < \dots < X_{(n)}$ be the order statistics of a random sample from a population with cdf $F_X$ and pdf $f_X$. Then the pdf of the kth order statistic $X_{(k)}$ is given by
$$f_{X_{(k)}}(x) = \frac{n!}{(k-1)!(n-k)!}\,[F_X(x)]^{k-1}[1 - F_X(x)]^{n-k} f_X(x)$$
for $k = 1, \dots, n$. In particular, the minimum $X_{(1)}$ and the maximum $X_{(n)}$ have the pdfs
$$f_{X_{(1)}}(x) = n[1 - F_X(x)]^{n-1} f_X(x), \qquad f_{X_{(n)}}(x) = n[F_X(x)]^{n-1} f_X(x).$$

Let $X_{(1)} < \dots < X_{(n)}$ be the order statistics of a random sample from a population with cdf $F_X$ and pdf $f_X$. Then the joint pdf of the jth and kth order statistics $(X_{(j)}, X_{(k)})$, with $1 \le j < k \le n$, is given by
$$f_{(X_{(j)},X_{(k)})}(u,v) = \frac{n!}{(j-1)!(k-1-j)!(n-k)!}\, f_X(u) f_X(v) [F_X(u)]^{j-1}[F_X(v) - F_X(u)]^{k-1-j}[1 - F_X(v)]^{n-k}$$
for $-\infty < u < v < \infty$. In particular, the joint pdf of the minimum and maximum $(X_{(1)}, X_{(n)})$ is
$$f_{(X_{(1)},X_{(n)})}(u,v) = n(n-1) f_X(u) f_X(v) [F_X(v) - F_X(u)]^{n-2}$$
for $-\infty < u < v < \infty$.

4. Pivot quantities and sampling from the Normal distribution

A pivot quantity is a function of sample statistics and population parameters which has a known distribution (not depending on unknown parameters).
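As a numerical check on the pdf of the minimum given earlier: for a random sample from Uniform(0,1), $f_{X_{(1)}}(x) = n(1-x)^{n-1}$ on $(0,1)$, which integrates to $E X_{(1)} = 1/(n+1)$. A short Python simulation, not from the original sheet; $n = 5$ and the repetition count are illustrative.

```python
import random

random.seed(2)
n, reps = 5, 200_000

# Simulate the minimum of n iid Uniform(0,1) rvs, many times
mins = [min(random.random() for _ in range(n)) for _ in range(reps)]

mean_min = sum(mins) / reps
# The pdf f(x) = n*(1-x)**(n-1) on (0,1) gives E[X_(1)] = 1/(n+1)
target = 1.0 / (n + 1)
```

With these settings the simulated mean of the minimum agrees with $1/(n+1) = 1/6$ to about three decimal places.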
A $(1-\alpha)100\%$ confidence interval (CI) for an unknown parameter $\theta \in \mathbb{R}$ is an interval $(L, U)$, where L and U are random variables such that $P(L < \theta < U) = 1 - \alpha$, for $\alpha \in (0,1)$.

Important distributions and upper-quantile notation:

The Normal(0,1) distribution has pdf
$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty,$$
and for $\xi \in (0,1)$ we denote by $z_\xi$ the value satisfying $P(Z > z_\xi) = \xi$, where $Z \sim \text{Normal}(0,1)$.

The Chi-square distributions $\chi^2_\nu$, $\nu = 1, 2, \dots$, have the pdfs
$$f_X(x; \nu) = \frac{1}{\Gamma(\nu/2)\, 2^{\nu/2}}\, x^{\nu/2 - 1} e^{-x/2}, \quad x > 0,$$
and for $\xi \in (0,1)$ we denote by $\chi^2_{\nu,\xi}$ the value satisfying $P(X > \chi^2_{\nu,\xi}) = \xi$, where $X \sim \chi^2_\nu$. The parameter $\nu$ is called the degrees of freedom.

The t distributions $t_\nu$, $\nu = 1, 2, \dots$, have the pdfs
$$f_T(t; \nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \quad -\infty < t < \infty,$$
and for $\xi \in (0,1)$ we denote by $t_{\nu,\xi}$ the value satisfying $P(T > t_{\nu,\xi}) = \xi$, where $T \sim t_\nu$. The parameter $\nu$ is called the degrees of freedom.

The F distributions $F_{\nu_1,\nu_2}$, $\nu_1, \nu_2 = 1, 2, \dots$, have the pdfs
$$f_R(r; \nu_1, \nu_2) = \frac{\Gamma\!\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\!\left(\frac{\nu_1}{2}\right)\Gamma\!\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} r^{\nu_1/2 - 1} \left(1 + \frac{\nu_1}{\nu_2}\, r\right)^{-(\nu_1+\nu_2)/2}, \quad r > 0,$$
and for $\xi \in (0,1)$ we denote by $F_{\nu_1,\nu_2,\xi}$ the value satisfying $P(R > F_{\nu_1,\nu_2,\xi}) = \xi$, where $R \sim F_{\nu_1,\nu_2}$. The parameters $\nu_1$ and $\nu_2$ are called, respectively, the numerator degrees of freedom and the denominator degrees of freedom.

Let $Z \sim \text{Normal}(0,1)$ and $X \sim \chi^2_\nu$ be independent rvs. Then
$$T = \frac{Z}{\sqrt{X/\nu}} \sim t_\nu.$$

Let $X_1 \sim \chi^2_{\nu_1}$ and $X_2 \sim \chi^2_{\nu_2}$ be independent rvs. Then
$$R = \frac{X_1/\nu_1}{X_2/\nu_2} \sim F_{\nu_1,\nu_2}.$$

Let $X_1, \dots, X_n$ be a random sample from the Normal($\mu$, $\sigma^2$) distribution. Then:

$\sqrt{n}(\bar{X}_n - \mu)/\sigma \sim \text{Normal}(0,1)$, giving
$$P\left(\bar{X}_n - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X}_n + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha.$$

$(n-1)S_n^2/\sigma^2 \sim \chi^2_{n-1}$, giving
$$P\left(\frac{(n-1)S_n^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)S_n^2}{\chi^2_{n-1,1-\alpha/2}}\right) = 1 - \alpha.$$
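The upper-quantile notation $z_\xi$ above can be computed directly with the Python standard library. A minimal sketch, not part of the sheet; it assumes `statistics.NormalDist` (available in Python 3.8+), and the helper name `z_upper` is mine.

```python
from statistics import NormalDist

def z_upper(xi):
    # Upper-tail quantile z_xi satisfying P(Z > z_xi) = xi, Z ~ Normal(0,1):
    # equivalently the (1 - xi) quantile of the standard normal cdf
    return NormalDist().inv_cdf(1.0 - xi)

z_025 = z_upper(0.025)  # roughly 1.96, the usual 95% critical value
z_05 = z_upper(0.05)    # roughly 1.645
```

The chi-square, t, and F quantiles used elsewhere on the sheet are not in the standard library; a package such as SciPy would typically be used for those.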
n( X n µ/s n t n, giving P ( Xn,S n( X S n t n n,α/ n < µ < X S n + t n n,α/ n = α. Consider two independent random samples X,..., X n from the Normal(µ, σ distribution and Y,..., Y n from the Normal(µ, σ distribution with sample means X n and Ȳn and sample variances S and S, respectively. Then S /σ S /σ F n,n, giving ( S P (S,S F S n,n, α/ < σ σ < S F S n,n,α/ = α. 5. Parametric estimation and properties of estimators Parametric framework: Let X,..., X n have a joint distribution which depends on a finite number of parameters θ,..., θ d, the values of which are unknown and lie in the spaces Θ,..., Θ d, respectively, where Θ k R, for k =,..., d. To know the joint distribution of X,..., X n, it is sufficient to know the values of θ,..., θ d, so we define estimators ˆθ,..., ˆθ d of θ,..., θ d based on X,..., X n. Nonparametric framework: Let X,..., X n have a joint distribution which depends on an infinite number of parameters. For example, suppose X,..., X n is a random sample from a distribution with pdf f X, where we do not specify any functional form for f X. Then we may regard the value of the pdf f X (x at every point x R as a parameter, and since there are an infinite number of values of x R, there are infinitely many parameters. The bias of an estimator ˆθ of a parameter θ is defined as Bias ˆθ = Eˆθ θ. And estimator is called unbiased if Bias ˆθ = 0, that is if Eˆθ = θ. The standard error of an estimator ˆθ of a parameter θ is defined as SE ˆθ = Var ˆθ, so the standard error of an estimator is simply its standard deviation. The mean squared error (MSE of an estimator ˆθ of a parameter θ is defined as MSE ˆθ = E(ˆθ θ, so the MSE of ˆθ is the expected squared distance between ˆθ and θ. MSE ˆθ = Var ˆθ + (Bias ˆθ. If ˆθ is an unbiased estimator of θ, it is not generally true that τ(ˆθ is an unbiased estimator of τ(θ. 6. Large-sample properties of estimators, consistency and WLLN 5
A sequence of estimators $\{\hat{\theta}_n\}_{n \ge 1}$ of a parameter $\theta \in \Theta \subset \mathbb{R}$ is called consistent if for every $\epsilon > 0$ and every $\theta \in \Theta$,
$$\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| < \epsilon) = 1.$$

The weak law of large numbers (WLLN): Let $X_1, \dots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2 < \infty$. Then $\bar{X}_n = \frac{1}{n}(X_1 + \dots + X_n)$ is a consistent estimator of $\mu$.

A sequence of estimators $\{\hat{\theta}_n\}_{n \ge 1}$ for $\theta$ is consistent if
(a) $\lim_{n \to \infty} \operatorname{Var}\hat{\theta}_n = 0$ and
(b) $\lim_{n \to \infty} \operatorname{Bias}\hat{\theta}_n = 0$.

Let $\{\hat{\theta}_{1,n}\}_{n \ge 1}$ and $\{\hat{\theta}_{2,n}\}_{n \ge 1}$ be sequences of estimators which are consistent for $\theta_1$ and $\theta_2$, respectively. Then
(a) $\hat{\theta}_{1,n} \pm \hat{\theta}_{2,n}$ is consistent for $\theta_1 \pm \theta_2$.
(b) $\hat{\theta}_{1,n}\hat{\theta}_{2,n}$ is consistent for $\theta_1\theta_2$.
(c) $\hat{\theta}_{1,n}/\hat{\theta}_{2,n}$ is consistent for $\theta_1/\theta_2$, so long as $\theta_2 \ne 0$.
(d) For any continuous function $\tau : \mathbb{R} \to \mathbb{R}$, $\tau(\hat{\theta}_{1,n})$ is consistent for $\tau(\theta_1)$.
(e) For any sequences $\{a_n\}_{n \ge 1}$ and $\{b_n\}_{n \ge 1}$ such that $\lim_{n \to \infty} a_n = 1$ and $\lim_{n \to \infty} b_n = 0$, $a_n\hat{\theta}_{1,n} + b_n$ is consistent for $\theta_1$.

If $X_1, \dots, X_n$ is a random sample from a distribution with mean $\mu$, variance $\sigma^2 < \infty$, and finite 4th moment, the sample variance $S_n^2$ is consistent for $\sigma^2$.

If $X_1, \dots, X_n$ is a random sample from the Bernoulli(p) distribution and $\hat{p} = \frac{1}{n}(X_1 + \dots + X_n)$, then $\hat{p}(1 - \hat{p})$ is a consistent estimator of $p(1-p)$.

For a consistent sequence of estimators $\{\hat{\theta}_n\}_{n \ge 1}$ of $\theta$, we often write $\hat{\theta}_n \overset{p}{\to} \theta$, where $\overset{p}{\to}$ denotes convergence in probability. We often say, "$\hat{\theta}_n$ converges in probability to $\theta$."

7. Large-sample pivot quantities and the central limit theorem

A sequence of random variables $Y_1, Y_2, \dots$ with cdfs $F_{Y_1}, F_{Y_2}, \dots$ is said to converge in distribution to the random variable $Y \sim F_Y$ if
$$\lim_{n \to \infty} F_{Y_n}(y) = F_Y(y)$$
for all $y \in \mathbb{R}$ at which $F_Y$ is continuous. We express convergence in distribution with the notation $Y_n \overset{D}{\to} Y$, and we refer to the distribution with cdf $F_Y$ as the asymptotic distribution of the sequence of random variables $Y_1, Y_2, \dots$ Convergence in distribution is a sense in which a random variable (we can think of a quantity computed on larger and larger samples) behaves more and more like another random variable (as the sample size grows).
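The WLLN can be seen numerically: the error $|\bar{X}_n - \mu|$ tends to shrink as n grows. A Python sketch, not part of the sheet; the Bernoulli(0.5) population and the sample sizes are illustrative.

```python
import random

random.seed(3)
p = 0.5  # population mean of a Bernoulli(0.5) rv

def sample_mean(n):
    # X_bar_n for a simulated Bernoulli(p) random sample of size n
    return sum(random.random() < p for _ in range(n)) / n

# Absolute estimation error at increasing sample sizes; by the WLLN the
# error is small with high probability once n is large
errors = {n: abs(sample_mean(n) - p) for n in (100, 10_000, 100_000)}
```

Any single run can be non-monotone (the guarantee is probabilistic), but the error at the largest n is reliably tiny.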
A large-sample pivot quantity is a function of sample statistics and population parameters whose asymptotic distribution is fully known (not depending on unknown parameters).
The Central Limit Theorem (CLT): Let $X_1, \dots, X_n$ be a random sample from any distribution with mean $\mu$ and variance $\sigma^2 < \infty$. Then
$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \overset{D}{\to} Z, \quad Z \sim \text{Normal}(0,1).$$

Application to CIs: The above means that for any $\alpha \in (0,1)$,
$$\lim_{n \to \infty} P\left(-z_{\alpha/2} < \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha,$$
giving that $\bar{X}_n \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is an approximate $(1-\alpha)100\%$ CI for $\mu$ as long as n is large. A rule of thumb says $n \ge 30$ is large.

Slutsky's Theorem: Let $X_n \overset{D}{\to} X$ and $Y_n \overset{p}{\to} c$ for a constant c. Then
(a) $X_nY_n \overset{D}{\to} cX$ and
(b) $X_n + Y_n \overset{D}{\to} X + c$.

Corollary of Slutsky's Theorem: Let $X_1, \dots, X_n$ be a random sample from any distribution with mean $\mu$ and variance $\sigma^2 < \infty$. Moreover, let $\hat{\sigma}_n$ be a consistent estimator of $\sigma$ based on $X_1, \dots, X_n$. Then
$$\frac{\bar{X}_n - \mu}{\hat{\sigma}_n/\sqrt{n}} \overset{D}{\to} Z, \quad Z \sim \text{Normal}(0,1).$$

Application to CIs: The above means that for any $\alpha \in (0,1)$,
$$\lim_{n \to \infty} P\left(-z_{\alpha/2} < \frac{\bar{X}_n - \mu}{\hat{\sigma}_n/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha,$$
giving that $\bar{X}_n \pm z_{\alpha/2}\frac{\hat{\sigma}_n}{\sqrt{n}}$ is an approximate $(1-\alpha)100\%$ CI for $\mu$ as long as n is large. A rule of thumb says $n \ge 30$ is large.

Let $X_1, \dots, X_n$ be a random sample from any distribution with mean $\mu$, variance $\sigma^2 < \infty$, and finite 4th moment. Then
$$\frac{\bar{X}_n - \mu}{S_n/\sqrt{n}} \overset{D}{\to} Z, \quad Z \sim \text{Normal}(0,1),$$
so that for large n, $\bar{X}_n \pm z_{\alpha/2}\frac{S_n}{\sqrt{n}}$ is an approximate $(1-\alpha)100\%$ CI for $\mu$.
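The large-sample interval $\bar{X}_n \pm z_{\alpha/2} S_n/\sqrt{n}$ can be checked by simulating its coverage. A Python sketch, not part of the sheet; the population (Exponential with mean 1), $n = 50$, and the number of replications are illustrative choices.

```python
import math
import random

random.seed(4)
z = 1.959964           # z_{0.025}
n, reps, mu = 50, 2000, 1.0  # Exponential(rate 1) population has mean mu = 1

covered = 0
for _ in range(reps):
    x = [random.expovariate(1.0) for _ in range(n)]
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)  # sample variance S_n^2
    half = z * math.sqrt(s2 / n)                      # margin of error
    covered += (xbar - half < mu < xbar + half)

coverage = covered / reps  # should be near the nominal 0.95 for large n
```

For a skewed population at $n = 50$ the realized coverage typically runs slightly below the nominal 95%, which is why the result is only an approximation.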
Let $X_1, \dots, X_n$ be a random sample from the Bernoulli(p) distribution and let $\hat{p}_n = \bar{X}_n$. Then
$$\frac{\hat{p}_n - p}{\sqrt{\hat{p}_n(1 - \hat{p}_n)/n}} \overset{D}{\to} Z, \quad Z \sim \text{Normal}(0,1),$$
so that for large n (a rule of thumb is to require $\min\{n\hat{p}_n,\, n(1 - \hat{p}_n)\} \ge 5$),
$$\hat{p}_n \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_n(1 - \hat{p}_n)}{n}}$$
is an approximate $(1-\alpha)100\%$ CI for p.

Summary of $(1-\alpha)100\%$ CIs for $\mu$: Let $X_1, \dots, X_n$ be independent rvs with the same distribution as the rv X, where $EX = \mu$ and $\operatorname{Var} X = \sigma^2$.

X ~ Normal($\mu$, $\sigma^2$):
  $\sigma^2$ known:   $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ (exact)
  $\sigma^2$ unknown: $\bar{X} \pm t_{n-1,\alpha/2}\, S_n/\sqrt{n}$ (exact)

X non-Normal, $n \ge 30$:
  $\sigma^2$ known:   $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ (approx)
  $\sigma^2$ unknown: $\bar{X} \pm z_{\alpha/2}\, S_n/\sqrt{n}$ (approx)

X non-Normal, $n < 30$: beyond the scope of this course.

8. Classical two-sample results comparing means, proportions, and variances

Summary of $(1-\alpha)100\%$ CIs for $\mu_1 - \mu_2$: Let $X_1, \dots, X_{n_1}$ be independent rvs with the same distribution as the rv X, where $EX = \mu_1$ and $\operatorname{Var} X = \sigma_1^2$, and let $Y_1, \dots, Y_{n_2}$ be independent rvs with the same distribution as the rv Y, where $EY = \mu_2$ and $\operatorname{Var} Y = \sigma_2^2$.
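The approximate proportion interval above is easy to code directly. A minimal Python sketch, not from the sheet; the function name `prop_ci` and the data values (40 successes in 100 trials) are mine, and $z_{0.025} \approx 1.96$ is hard-coded for a 95% interval.

```python
import math

def prop_ci(x, n, z=1.959964):
    # Approximate (Wald) CI for p from x successes in n Bernoulli trials:
    # p_hat +/- z * sqrt(p_hat*(1-p_hat)/n)
    p_hat = x / n
    # rule of thumb for validity: min(n*p_hat, n*(1-p_hat)) >= 5
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half

lo, hi = prop_ci(40, 100)  # p_hat = 0.40, margin of error about 0.096
```

With x = 40 and n = 100 the interval is roughly (0.304, 0.496).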
Case (i), $\sigma_1^2 \ne \sigma_2^2$:

X, Y Normal, $\min\{n_1, n_2\} < 30$:  $\bar{X} - \bar{Y} \pm t_{\hat{\nu},\alpha/2}\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$
X, Y non-Normal, $\min\{n_1, n_2\} < 30$:  beyond the scope of this course.
$\min\{n_1, n_2\} \ge 30$:  $\bar{X} - \bar{Y} \pm z_{\alpha/2}\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$

In the above,
$$\hat{\nu} = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{1}{n_1-1}\left(\dfrac{S_1^2}{n_1}\right)^2 + \dfrac{1}{n_2-1}\left(\dfrac{S_2^2}{n_2}\right)^2}.$$

Case (ii), $\sigma_1^2 = \sigma_2^2$:

X, Y Normal, $\min\{n_1, n_2\} < 30$:  $\bar{X} - \bar{Y} \pm t_{n_1+n_2-2,\alpha/2}\, S_{\text{pooled}}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
X, Y non-Normal, $\min\{n_1, n_2\} < 30$:  beyond the scope of this course.
$\min\{n_1, n_2\} \ge 30$:  $\bar{X} - \bar{Y} \pm z_{\alpha/2}\, S_{\text{pooled}}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

where $S_{\text{pooled}}^2 = \dfrac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}$.
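The estimated degrees of freedom $\hat{\nu}$ in Case (i) is a common source of arithmetic slips, so it is worth coding once. A minimal Python sketch, not from the sheet; the function name and the numeric inputs ($S_1^2 = 4$, $n_1 = 10$, $S_2^2 = 9$, $n_2 = 15$) are illustrative.

```python
def satterthwaite_df(s1_sq, n1, s2_sq, n2):
    # nu_hat = (S1^2/n1 + S2^2/n2)^2 /
    #          [ (S1^2/n1)^2/(n1-1) + (S2^2/n2)^2/(n2-1) ]
    a = s1_sq / n1
    b = s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

nu_hat = satterthwaite_df(4.0, 10, 9.0, 15)  # roughly 23 for these inputs
```

In practice $\hat{\nu}$ is not an integer; it is either used directly or rounded down before looking up $t_{\hat{\nu},\alpha/2}$.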
A $(1-\alpha)100\%$ CI for $p_1 - p_2$: Let $X_1, \dots, X_{n_1} \overset{\text{iid}}{\sim}$ Bernoulli($p_1$) and $Y_1, \dots, Y_{n_2} \overset{\text{iid}}{\sim}$ Bernoulli($p_2$), and let $\hat{p}_1 = \frac{1}{n_1}(X_1 + \dots + X_{n_1})$ and $\hat{p}_2 = \frac{1}{n_2}(Y_1 + \dots + Y_{n_2})$. Then for large $n_1$ and $n_2$ (a rule of thumb is $\min\{n_1\hat{p}_1,\, n_1(1-\hat{p}_1)\} \ge 5$ and $\min\{n_2\hat{p}_2,\, n_2(1-\hat{p}_2)\} \ge 5$),
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
is an approximate $(1-\alpha)100\%$ CI for $p_1 - p_2$.

A $(1-\alpha)100\%$ CI for $\sigma_1^2/\sigma_2^2$: Consider two independent random samples $X_1, \dots, X_{n_1}$ from the Normal($\mu_1$, $\sigma_1^2$) distribution and $Y_1, \dots, Y_{n_2}$ from the Normal($\mu_2$, $\sigma_2^2$) distribution with sample variances $S_1^2$ and $S_2^2$, respectively. Then
$$\left(\frac{S_1^2}{S_2^2}\, F_{n_2-1,n_1-1,1-\alpha/2},\ \ \frac{S_1^2}{S_2^2}\, F_{n_2-1,n_1-1,\alpha/2}\right)$$
is a $(1-\alpha)100\%$ CI for $\sigma_1^2/\sigma_2^2$.

9. Sample size calculations

Let $\hat{\theta}$ be an estimator of a parameter $\theta$ and let $\hat{\theta} \pm \eta$ be a CI for $\theta$. Then the quantity $\eta$ is called the margin of error (ME) of the CI.

Let $X_1, \dots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. The smallest sample size n such that the ME of the CI $\bar{X}_n \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is at most M is
$$n = \left\lceil \left(\frac{z_{\alpha/2}\,\sigma}{M}\right)^2 \right\rceil,$$
where $\lceil x \rceil$ is the smallest integer greater than or equal to x. Plug in an estimate $\hat{\sigma}$ of $\sigma$ from a previous study to make the calculation.

Let $X_1, \dots, X_n$ be a random sample from the Bernoulli(p) distribution. The smallest sample size n such that the ME of the CI
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
is at most M is
$$n = \left\lceil \left(\frac{z_{\alpha/2}}{M}\right)^2 p(1-p) \right\rceil.$$
Plug in an estimate $\hat{p}$ of p from a previous study to make the calculation, or use $p = 1/2$ to err on the side of a larger n.

Let $X_1, \dots, X_{n_1}$ be iid rvs with mean $\mu_1$ and variance $\sigma_1^2$ and let $Y_1, \dots, Y_{n_2}$ be iid rvs with mean $\mu_2$ and variance $\sigma_2^2$. To find the smallest $n = n_1 + n_2$ such that the ME of the CI
$$\bar{X} - \bar{Y} \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
is at most M, compute
$$n = \left\lceil \left(\frac{z_{\alpha/2}}{M}\right)^2 (\sigma_1 + \sigma_2)^2 \right\rceil$$
and set
$$n_1 = \left(\frac{\sigma_1}{\sigma_1 + \sigma_2}\right) n \quad \text{and} \quad n_2 = \left(\frac{\sigma_2}{\sigma_1 + \sigma_2}\right) n.$$
Plug in estimates $\hat{\sigma}_1$ and $\hat{\sigma}_2$ for $\sigma_1$ and $\sigma_2$ from a previous study to make the calculation.

Let $X_1, \dots, X_{n_1}$ be a random sample from the Bernoulli($p_1$) distribution and let $Y_1, \dots, Y_{n_2}$ be a random sample from the Bernoulli($p_2$) distribution. To find the smallest $n = n_1 + n_2$ such that the ME of the CI
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
is at most M, compute
$$n = \left\lceil \left(\frac{z_{\alpha/2}}{M}\right)^2 \left(\sqrt{p_1(1-p_1)} + \sqrt{p_2(1-p_2)}\right)^2 \right\rceil$$
and set
$$n_1 = \left(\frac{\sqrt{p_1(1-p_1)}}{\sqrt{p_1(1-p_1)} + \sqrt{p_2(1-p_2)}}\right) n \quad \text{and} \quad n_2 = \left(\frac{\sqrt{p_2(1-p_2)}}{\sqrt{p_1(1-p_1)} + \sqrt{p_2(1-p_2)}}\right) n.$$
Plug in estimates $\hat{p}_1$ and $\hat{p}_2$ for $p_1$ and $p_2$ from a previous study to make the calculation.

10. First principles of estimation, sufficiency, and the Rao–Blackwell theorem

Let $X_1, \dots, X_n$ have a joint distribution depending on the parameter $\theta \in \Theta \subset \mathbb{R}^d$. A statistic $T(X_1, \dots, X_n)$ is a sufficient statistic for $\theta$ if it carries all the information about $\theta$ contained in $X_1, \dots, X_n$. Specifically, $T(X_1, \dots, X_n)$ is a sufficient statistic for $\theta$ if the joint distribution of $X_1, \dots, X_n$ conditional on the value of $T(X_1, \dots, X_n)$ does not depend on $\theta$.

We can establish sufficiency using the following results:

For discrete rvs $(X_1, \dots, X_n) \sim p_{(X_1,\dots,X_n)}(x_1, \dots, x_n; \theta)$, $T(X_1, \dots, X_n)$ is a sufficient statistic for $\theta$ if for all $(x_1, \dots, x_n)$ in the support of $(X_1, \dots, X_n)$, the ratio
$$\frac{p_{(X_1,\dots,X_n)}(x_1, \dots, x_n; \theta)}{p_T(T(x_1, \dots, x_n); \theta)}$$
is free of $\theta$, where $p_T(t; \theta)$ is the pmf of $T(X_1, \dots, X_n)$.

For continuous rvs $(X_1, \dots, X_n) \sim f_{(X_1,\dots,X_n)}(x_1, \dots, x_n; \theta)$, $T(X_1, \dots, X_n)$ is a sufficient statistic for $\theta$ if for all $(x_1, \dots, x_n)$ in the support of $(X_1, \dots, X_n)$, the ratio
$$\frac{f_{(X_1,\dots,X_n)}(x_1, \dots, x_n; \theta)}{f_T(T(x_1, \dots, x_n); \theta)}$$
is free of $\theta$, where $f_T(t; \theta)$ is the pdf of $T(X_1, \dots, X_n)$.
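The sample size formulas of Section 9 reduce to a few lines of arithmetic. A Python sketch, not from the sheet; the function names are mine, the default z is $z_{0.025}$ for 95% intervals, the numeric inputs are illustrative, and rounding the split sizes $n_1, n_2$ up with `ceil` is my convention (it can make $n_1 + n_2$ exceed n by one).

```python
import math

Z_975 = 1.959964  # z_{0.025}

def n_for_mean(sigma, M, z=Z_975):
    # smallest n with z*sigma/sqrt(n) <= M, i.e. n = ceil((z*sigma/M)^2)
    return math.ceil((z * sigma / M) ** 2)

def n_for_proportion(p, M, z=Z_975):
    # smallest n with z*sqrt(p(1-p)/n) <= M; p = 0.5 is the conservative choice
    return math.ceil((z / M) ** 2 * p * (1.0 - p))

def two_sample_sizes(sigma1, sigma2, M, z=Z_975):
    # total n = ceil((z/M)^2 (sigma1+sigma2)^2), split proportionally to the sigmas
    n = math.ceil((z / M) ** 2 * (sigma1 + sigma2) ** 2)
    n1 = math.ceil(sigma1 / (sigma1 + sigma2) * n)
    n2 = math.ceil(sigma2 / (sigma1 + sigma2) * n)
    return n, n1, n2

n_mean = n_for_mean(10.0, 2.0)          # ceil((1.96*10/2)^2) = 97
n_prop = n_for_proportion(0.5, 0.03)    # conservative p = 1/2 planning value
n_tot, n1, n2 = two_sample_sizes(3.0, 1.0, 1.0)
```

For example, estimating a mean to within $M = 2$ with $\sigma \approx 10$ requires 97 observations, and a proportion to within 0.03 at the conservative $p = 1/2$ requires 1068.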
The factorization theorem gives another way to see whether a statistic is sufficient for $\theta$:

For discrete rvs $(X_1, \dots, X_n) \sim p_{(X_1,\dots,X_n)}(x_1, \dots, x_n; \theta)$, $T(X_1, \dots, X_n)$ is a sufficient statistic for $\theta$ if and only if there exist functions $g(t; \theta)$ and $h(x_1, \dots, x_n)$ such that
$$p_{(X_1,\dots,X_n)}(x_1, \dots, x_n; \theta) = g(T(x_1, \dots, x_n); \theta)\, h(x_1, \dots, x_n)$$
for all $(x_1, \dots, x_n)$ in the support of $(X_1, \dots, X_n)$ and all $\theta \in \Theta$.

For continuous rvs $(X_1, \dots, X_n) \sim f_{(X_1,\dots,X_n)}(x_1, \dots, x_n; \theta)$, $T(X_1, \dots, X_n)$ is a sufficient statistic for $\theta$ if and only if there exist functions $g(t; \theta)$ and $h(x_1, \dots, x_n)$ such that
$$f_{(X_1,\dots,X_n)}(x_1, \dots, x_n; \theta) = g(T(x_1, \dots, x_n); \theta)\, h(x_1, \dots, x_n)$$
for all $(x_1, \dots, x_n)$ in the support of $(X_1, \dots, X_n)$ and all $\theta \in \Theta$.

We also consider statistics with multiple components, which take the form $T(X_1, \dots, X_n) = (T_1(X_1, \dots, X_n), \dots, T_K(X_1, \dots, X_n))$ for some K. For example, $T(X_1, \dots, X_n) = (X_{(1)}, \dots, X_{(n)})$ is the set of order statistics (which is always sufficient); here $K = n$.

Minimum-variance unbiased estimator (MVUE): Let $\hat{\theta}$ be an unbiased estimator of $\theta \in \Theta \subset \mathbb{R}$. If $\operatorname{Var}\hat{\theta} \le \operatorname{Var}\tilde{\theta}$ for every unbiased estimator $\tilde{\theta}$ of $\theta$, then $\hat{\theta}$ is a MVUE.

Rao–Blackwell Theorem: Let $\tilde{\theta}$ be an estimator of $\theta \in \Theta \subset \mathbb{R}$ such that $E\tilde{\theta} = \theta$, and let T be a sufficient statistic for $\theta$. Define $\hat{\theta} = E[\tilde{\theta} \mid T]$. Then $\hat{\theta}$ is an estimator of $\theta$ such that $E\hat{\theta} = \theta$ and $\operatorname{Var}\hat{\theta} \le \operatorname{Var}\tilde{\theta}$.

Interpretation: An unbiased estimator of $\theta$ can always be improved (or at least not worsened) by taking into account the value of a sufficient statistic for $\theta$. Therefore, a MVUE must be a function of a sufficient statistic.

The MVUE for a parameter $\theta$ is (essentially) unique (for all situations considered in this course); to find it, do the following:
(a) Find a sufficient statistic for $\theta$.
(b) Find a function of the sufficient statistic which is unbiased for $\theta$. This is the MVUE.

This also works for finding the MVUE of a function $\tau(\theta)$ of $\theta$: in step (b), just find a function of the sufficient statistic which is unbiased for $\tau(\theta)$.
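The Rao–Blackwell improvement can be seen by simulation in the Bernoulli case: $\tilde{\theta} = X_1$ is unbiased for p, $T = X_1 + \dots + X_n$ is sufficient, and by symmetry $E[X_1 \mid T] = T/n = \bar{X}$. A Python sketch, not from the sheet; n, p, and the replication count are illustrative.

```python
import random

random.seed(5)
n, p, reps = 10, 0.3, 50_000

crude, rb = [], []
for _ in range(reps):
    x = [1 if random.random() < p else 0 for _ in range(n)]
    crude.append(x[0])      # theta_tilde = X_1, unbiased but noisy
    rb.append(sum(x) / n)   # E[X_1 | sum X_i] = X_bar, the Rao-Blackwellized version

def sample_var(v):
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

var_crude = sample_var(crude)  # near p(1-p) = 0.21
var_rb = sample_var(rb)        # near p(1-p)/n = 0.021
```

Both estimators average to p, but conditioning on the sufficient statistic cuts the variance by roughly a factor of n.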
11. MoMs and MLEs

Let $X_1, \dots, X_n$ be independent rvs with the same distribution as the rv X, where X has a distribution depending on the parameters $\theta_1, \dots, \theta_d$. If the first d moments $EX, \dots, EX^d$ of X are finite, then the method of moments (MoM) estimators $\hat{\theta}_1, \dots, \hat{\theta}_d$ are the values of $\theta_1, \dots, \theta_d$ which solve the following system of equations:
$$m_1 := \frac{1}{n}\sum_{i=1}^n X_i = EX =: \mu_1'(\theta_1, \dots, \theta_d)$$
$$\vdots$$
$$m_d := \frac{1}{n}\sum_{i=1}^n X_i^d = EX^d =: \mu_d'(\theta_1, \dots, \theta_d)$$
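A worked MoM example for the Gamma($\alpha$, $\beta$) distribution in the sheet's parameterization ($EX = \alpha\beta$, $\operatorname{Var} X = \alpha\beta^2$, from Table 1): the moment equations $m_1 = \alpha\beta$ and $m_2 - m_1^2 = \alpha\beta^2$ solve to $\hat{\beta} = (m_2 - m_1^2)/m_1$ and $\hat{\alpha} = m_1/\hat{\beta}$. A Python sketch, not from the sheet; the true values $\alpha = 2$, $\beta = 3$ and the sample size are illustrative.

```python
import random

random.seed(6)
alpha, beta, n = 2.0, 3.0, 200_000
# random.gammavariate(alpha, beta) draws from Gamma with mean alpha*beta
x = [random.gammavariate(alpha, beta) for _ in range(n)]

m1 = sum(x) / n                     # first sample moment
m2 = sum(xi * xi for xi in x) / n   # second sample moment

# Solve m1 = alpha*beta and m2 - m1^2 = alpha*beta^2 for (alpha, beta)
beta_hat = (m2 - m1 ** 2) / m1
alpha_hat = m1 / beta_hat
```

At this sample size the MoM estimates land within a few hundredths of the true parameter values.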
Let $X_1, \dots, X_n$ be a random sample from a distribution with pdf $f_X(x; \theta_1, \dots, \theta_d)$ or pmf $p_X(x; \theta_1, \dots, \theta_d)$ which depends on some parameters $(\theta_1, \dots, \theta_d) \in \Theta \subset \mathbb{R}^d$. Then the likelihood function is defined as
$$L(\theta_1, \dots, \theta_d; X_1, \dots, X_n) = \begin{cases} \prod_{i=1}^n f_X(X_i; \theta_1, \dots, \theta_d) & \text{if } X_1, \dots, X_n \overset{\text{iid}}{\sim} f_X(x; \theta_1, \dots, \theta_d) \\ \prod_{i=1}^n p_X(X_i; \theta_1, \dots, \theta_d) & \text{if } X_1, \dots, X_n \overset{\text{iid}}{\sim} p_X(x; \theta_1, \dots, \theta_d). \end{cases}$$
Moreover, the log-likelihood function is defined as
$$\ell(\theta_1, \dots, \theta_d; X_1, \dots, X_n) = \log L(\theta_1, \dots, \theta_d; X_1, \dots, X_n).$$

Let $X_1, \dots, X_n$ be rvs with likelihood function $L(\theta_1, \dots, \theta_d; X_1, \dots, X_n)$ for some parameters $(\theta_1, \dots, \theta_d) \in \Theta \subset \mathbb{R}^d$. Then the maximum likelihood estimators (MLEs) of $\theta_1, \dots, \theta_d$ are the values $\hat{\theta}_1, \dots, \hat{\theta}_d$ which maximize the likelihood function over all $(\theta_1, \dots, \theta_d) \in \Theta$. That is,
$$(\hat{\theta}_1, \dots, \hat{\theta}_d) = \underset{(\theta_1,\dots,\theta_d) \in \Theta}{\operatorname{argmax}}\ L(\theta_1, \dots, \theta_d; X_1, \dots, X_n) = \underset{(\theta_1,\dots,\theta_d) \in \Theta}{\operatorname{argmax}}\ \ell(\theta_1, \dots, \theta_d; X_1, \dots, X_n).$$

If $\ell(\theta_1, \dots, \theta_d; X_1, \dots, X_n)$ is differentiable and has a single maximum in the interior of $\Theta$, then we can find the MLEs $\hat{\theta}_1, \dots, \hat{\theta}_d$ by solving the following system of equations:
$$\frac{\partial}{\partial \theta_1}\, \ell(\theta_1, \dots, \theta_d; X_1, \dots, X_n) = 0$$
$$\vdots$$
$$\frac{\partial}{\partial \theta_d}\, \ell(\theta_1, \dots, \theta_d; X_1, \dots, X_n) = 0$$

The MLE is always a function of a sufficient statistic. If $\hat{\theta}$ is the MLE for $\theta$, then $\tau(\hat{\theta})$ is the MLE for $\tau(\theta)$, for any function $\tau$.
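The score-equation recipe can be checked numerically for the Poisson($\lambda$) case, where the closed-form MLE is $\hat{\lambda} = \bar{X}$. A Python sketch, not from the sheet; the data values are illustrative, and ternary search is used because the Poisson log-likelihood is strictly concave in $\lambda$ (its second derivative is $-\sum x_i/\lambda^2 < 0$).

```python
import math

data = [3, 1, 4, 1, 5]  # small illustrative data set

def poisson_loglik(lam):
    # l(lambda) = -n*lambda + (sum x_i)*log(lambda) - sum log(x_i!)
    return (-len(data) * lam + sum(data) * math.log(lam)
            - sum(math.lgamma(xi + 1) for xi in data))

# Maximize the strictly concave log-likelihood by ternary search on (0, 100]
lo, hi = 1e-6, 100.0
for _ in range(200):
    a = lo + (hi - lo) / 3
    b = hi - (hi - lo) / 3
    if poisson_loglik(a) < poisson_loglik(b):
        lo = a
    else:
        hi = b

lam_mle = (lo + hi) / 2  # matches the closed form: the sample mean
```

The numeric maximizer agrees with $\bar{X} = 2.8$ to high precision, as the score equation $-n + \sum x_i/\lambda = 0$ predicts.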
Table 1: Commonly encountered pmfs and pdfs along with their mgfs, expected values, and variances.

Bernoulli(p): $p_X(x; p) = p^x(1-p)^{1-x}$, $x = 0, 1$; $M_X(t) = pe^t + (1-p)$; $EX = p$; $\operatorname{Var} X = p(1-p)$.

Binomial(n, p): $p_X(x; n, p) = \binom{n}{x} p^x(1-p)^{n-x}$, $x = 0, 1, \dots, n$; $M_X(t) = [pe^t + (1-p)]^n$; $EX = np$; $\operatorname{Var} X = np(1-p)$.

Geometric(p): $p_X(x; p) = (1-p)^{x-1}p$, $x = 1, 2, \dots$; $M_X(t) = \dfrac{pe^t}{1 - (1-p)e^t}$; $EX = \dfrac{1}{p}$; $\operatorname{Var} X = \dfrac{1-p}{p^2}$.

Negative Binomial(r, p): $p_X(x; p, r) = \binom{x-1}{r-1}(1-p)^{x-r}p^r$, $x = r, r+1, \dots$; $M_X(t) = \left[\dfrac{pe^t}{1 - (1-p)e^t}\right]^r$; $EX = \dfrac{r}{p}$; $\operatorname{Var} X = \dfrac{r(1-p)}{p^2}$.

Poisson($\lambda$): $p_X(x; \lambda) = e^{-\lambda}\lambda^x/x!$, $x = 0, 1, \dots$; $M_X(t) = e^{\lambda(e^t - 1)}$; $EX = \lambda$; $\operatorname{Var} X = \lambda$.

Hypergeometric(N, M, K): $p_X(x; N, M, K) = \binom{M}{x}\binom{N-M}{K-x}\big/\binom{N}{K}$, $x = 0, 1, \dots, K$; mgf: very complicated!; $EX = \dfrac{KM}{N}$; $\operatorname{Var} X = \dfrac{KM(N-K)(N-M)}{N^2(N-1)}$.

Discrete uniform on $\{1, \dots, K\}$: $p_X(x; K) = \dfrac{1}{K}$, $x = 1, \dots, K$; $M_X(t) = \dfrac{1}{K}\sum_{x=1}^K e^{tx}$; $EX = \dfrac{K+1}{2}$; $\operatorname{Var} X = \dfrac{(K+1)(K-1)}{12}$.

Discrete uniform on data values $x_1, \dots, x_n$: $p_X(x; x_1, \dots, x_n) = \dfrac{1}{n}$, $x = x_1, \dots, x_n$; $M_X(t) = \dfrac{1}{n}\sum_{i=1}^n e^{tx_i}$; $EX = \bar{x} = \dfrac{1}{n}\sum_{i=1}^n x_i$; $\operatorname{Var} X = \dfrac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2$.

Normal($\mu$, $\sigma^2$): $f_X(x; \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$, $-\infty < x < \infty$; $M_X(t) = e^{\mu t + \sigma^2 t^2/2}$; $EX = \mu$; $\operatorname{Var} X = \sigma^2$.

Gamma($\alpha$, $\beta$): $f_X(x; \alpha, \beta) = \dfrac{1}{\Gamma(\alpha)\beta^\alpha}\, x^{\alpha-1}\exp\!\left(-\dfrac{x}{\beta}\right)$, $0 < x < \infty$; $M_X(t) = (1 - \beta t)^{-\alpha}$; $EX = \alpha\beta$; $\operatorname{Var} X = \alpha\beta^2$.

Exponential($\lambda$): $f_X(x; \lambda) = \dfrac{1}{\lambda}\exp\!\left(-\dfrac{x}{\lambda}\right)$, $0 < x < \infty$; $M_X(t) = (1 - \lambda t)^{-1}$; $EX = \lambda$; $\operatorname{Var} X = \lambda^2$.

Chi-square($\nu$): $f_X(x; \nu) = \dfrac{1}{\Gamma(\nu/2)2^{\nu/2}}\, x^{\nu/2-1}\exp\!\left(-\dfrac{x}{2}\right)$, $0 < x < \infty$; $M_X(t) = (1 - 2t)^{-\nu/2}$; $EX = \nu$; $\operatorname{Var} X = 2\nu$.

Beta($\alpha$, $\beta$): $f_X(x; \alpha, \beta) = \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}$, $0 < x < 1$; $M_X(t) = 1 + \sum_{k=1}^\infty \dfrac{t^k}{k!}\prod_{r=0}^{k-1}\dfrac{\alpha+r}{\alpha+\beta+r}$; $EX = \dfrac{\alpha}{\alpha+\beta}$; $\operatorname{Var} X = \dfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$.

Table 2: The multinoulli and multinomial pmfs and the bivariate Normal pdf.

Multinoulli: $p_{(X_1,\dots,X_K)}(x_1, \dots, x_K; p_1, \dots, p_K) = p_1^{x_1}\cdots p_K^{x_K}$ for $(x_1, \dots, x_K) \in \left\{(x_1,\dots,x_K) \in \{0,1\}^K : \sum_{k=1}^K x_k = 1\right\}$.

Multinomial: $p_{(Y_1,\dots,Y_K)}(y_1, \dots, y_K; n, p_1, \dots, p_K) = \dfrac{n!}{y_1!\cdots y_K!}\, p_1^{y_1}\cdots p_K^{y_K}$ for $(y_1, \dots, y_K) \in \left\{(y_1,\dots,y_K) \in \{0,1,\dots,n\}^K : \sum_{k=1}^K y_k = n\right\}$.

Bivariate Normal:
$$f_{(X,Y)}(x, y; \mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right\}$$