Central Limit Theorem (§5.3)
Let X_1, X_2, ... be a sequence of independent random variables, each having mean µ and variance σ². Then the distribution of the partial sum $S_n = \sum_{i=1}^n X_i$ becomes approximately normal with mean nµ and variance nσ² as n → ∞, that is,
$$P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le a\right) \to P(Z \le a) \quad \text{as } n \to \infty, \text{ for } -\infty < a < \infty,$$
where Z ∼ N(0, 1). Similarly, the distribution of the sample mean $\bar{X}_n = \frac{1}{n} S_n$ becomes approximately N(µ, σ²/n) as n → ∞, that is,
$$P\left(\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le a\right) \to P(Z \le a) \quad \text{as } n \to \infty, \text{ for } -\infty < a < \infty.$$
Related homework: 1/10, 1/13
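A quick way to see the CLT in action is to simulate. The sketch below (my own illustration, assuming NumPy and SciPy are available; the Exp(1) distribution, sample size, and seed are arbitrary choices) draws many partial sums and compares the standardized sums to the N(0, 1) cdf.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, reps = 50, 100_000
mu, sigma = 1.0, 1.0                      # mean and sd of Exp(1)

# reps independent partial sums S_n of n i.i.d. Exp(1) draws
S = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)
Z = (S - n * mu) / (sigma * np.sqrt(n))   # standardized partial sums

# Empirical P(Z <= a) vs. the N(0,1) cdf at a few points
for a in (-1.0, 0.0, 1.0):
    print(a, (Z <= a).mean(), norm.cdf(a))
```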
χ², t, and F distributions (§6.2)
Let Z_1, Z_2, ..., Z_n be independent standard normal random variables and define X = Z_1² + Z_2² + ⋯ + Z_n². Then the distribution of X is called the chi-square distribution with n degrees of freedom, denoted χ²_n.
Let Z and U be two independent random variables with Z ∼ N(0, 1) and U ∼ χ²_n. Then the distribution of the random variable
$$T = \frac{Z}{\sqrt{U/n}}$$
is called the t distribution with n degrees of freedom, denoted t_n.
Let U and V be two independent random variables with U ∼ χ²_m and V ∼ χ²_n. Then the distribution of the random variable
$$F = \frac{U/m}{V/n}$$
is called the F distribution with degrees of freedom m and n, denoted F_{m,n}.
Related homework: 1/15, 1/17
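As a sanity check, one can build a t-variable directly from its definition and compare it to the reference distribution. A minimal sketch (assuming NumPy/SciPy; the degrees of freedom and seed are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 5, 200_000

Z = rng.standard_normal(reps)
U = rng.chisquare(df=n, size=reps)   # distributed as a sum of n squared std normals
T = Z / np.sqrt(U / n)               # t_n by construction

# Empirical vs. theoretical 95th percentile of t_n
print(np.quantile(T, 0.95), stats.t.ppf(0.95, df=n))
```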
Sample mean and sample variance (§6.3)
Let X_1, X_2, ..., X_n be a sequence of i.i.d. random variables (a random sample), each having mean µ and variance σ². The sample mean and sample variance are defined as
$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i \quad \text{and} \quad S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2,$$
respectively. Properties of $\bar X$ and S²:
- $E[\bar X] = \mu$, $\mathrm{Var}[\bar X] = \sigma^2/n$, and $E[S^2] = \sigma^2$.
If the random sample is from a normal distribution, then
- $\bar X$ and S² are independent;
- $\dfrac{\bar X - \mu}{S/\sqrt{n}} \sim t_{n-1}$, and $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$.
Related homework: 1/17, 1/22
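These properties are easy to verify by simulation. A minimal sketch (assuming NumPy; the normal parameters and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, mu, sigma = 10, 100_000, 3.0, 2.0

X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
S2 = X.var(axis=1, ddof=1)          # ddof=1 gives the unbiased S^2

print(xbar.mean(), mu)              # E[Xbar] = mu
print(xbar.var(), sigma**2 / n)     # Var[Xbar] = sigma^2/n
print(S2.mean(), sigma**2)          # E[S^2] = sigma^2
print(np.corrcoef(xbar, S2)[0, 1])  # near 0: Xbar and S^2 independent for normal data
```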
MME (Method of Moments Estimate) (§8.4)
Let X_1, X_2, ..., X_n be a random sample from a probability distribution with parameter θ. The method of moments estimate is based on the law of large numbers:
$$\hat\mu_k = \frac{1}{n}\sum_{i=1}^n X_i^k \to \mu_k = E[X_1^k] \quad \text{as } n \to \infty,$$
that is, the kth sample moment $\hat\mu_k$ converges to the kth moment µ_k. Thus we can use $\hat\mu_k$ to estimate µ_k. If the parameter θ can be determined from the moments, θ = g(µ_1, ...), then the MME for θ is $\hat\theta = g(\hat\mu_1, \dots)$. Whenever lower moments are sufficient to determine θ, we do not use higher moments.
Related homework: 1/22, 1/24
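A worked illustration (my own example, not from the notes; assumes NumPy): for Gamma(α, λ) we have µ_1 = α/λ and µ_2 − µ_1² = α/λ², so solving gives α̂ = µ̂_1²/σ̂² and λ̂ = µ̂_1/σ̂² with σ̂² = µ̂_2 − µ̂_1².

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, lam, n = 4.0, 2.0, 5_000
x = rng.gamma(shape=alpha, scale=1/lam, size=n)

m1 = x.mean()              # first sample moment
m2 = (x**2).mean()         # second sample moment
s2 = m2 - m1**2            # moment-based variance estimate

alpha_mme = m1**2 / s2     # solves mu1 = alpha/lam, var = alpha/lam^2
lam_mme = m1 / s2
print(alpha_mme, lam_mme)  # should be close to (4.0, 2.0)
```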
MLE (Maximum Likelihood Estimate) (§8.5)
Let X_1, X_2, ..., X_n be a random sample from a probability distribution with parameter θ. The maximum likelihood estimate is based on the principle of maximizing the likelihood function (the joint pdf/pmf of the observed sample):
$$\mathrm{lik}(\theta) = f(X_1, X_2, \dots, X_n \mid \theta) = \prod_{i=1}^n f(X_i \mid \theta),$$
treating X_1, ..., X_n as constants and θ as the variable. Thus the MLE $\hat\theta$ for θ satisfies
$$\mathrm{lik}(\hat\theta) = \max_{\theta}\,\{\mathrm{lik}(\theta)\}.$$
Usually it is easier to maximize the log-likelihood function (via calculus):
$$\ell(\theta) = \log[\mathrm{lik}(\theta)] = \log f(X_1, X_2, \dots, X_n \mid \theta) = \sum_{i=1}^n \log f(X_i \mid \theta).$$
In some cases (when the support of the pdf/pmf depends on θ), we must maximize the likelihood function directly, e.g. Unif(0, θ).
Related homework: 1/22, 1/24
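When no closed form exists, the log-likelihood can be maximized numerically. A minimal sketch (assuming SciPy; the Gamma model, starting point, and seed are my own choices) that minimizes the negative log-likelihood:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(4)
x = rng.gamma(shape=4.0, scale=0.5, size=2_000)   # true (alpha, lam) = (4, 2)

def neg_loglik(params):
    alpha, lam = params
    if alpha <= 0 or lam <= 0:
        return np.inf
    # -sum of log f(x_i | alpha, lam), Gamma with rate parametrization
    return -np.sum(stats.gamma.logpdf(x, a=alpha, scale=1/lam))

res = optimize.minimize(neg_loglik, x0=[1.0, 1.0], method="Nelder-Mead")
print(res.x)   # MLE (alpha_hat, lam_hat), close to (4, 2)
```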
Properties of MLE (§8.5)
Let X be a random variable from a probability distribution with parameter θ. The Fisher information for θ is
$$I(\theta) = E\left[\left(\frac{\partial}{\partial\theta} \log f(X \mid \theta)\right)^2\right] = -E\left[\frac{\partial^2}{\partial\theta^2} \log f(X \mid \theta)\right].$$
Let X_1, X_2, ..., X_n be a random sample from a probability distribution with parameter θ, and $\hat\theta$ be the MLE for θ. Then the asymptotic variance of $\hat\theta$ is $\frac{1}{nI(\theta)}$. Moreover, $\sqrt{nI(\theta)}\,(\hat\theta - \theta)$ becomes approximately N(0, 1) as n → ∞, that is,
$$P\left(\frac{\hat\theta - \theta}{\sqrt{1/(nI(\theta))}} \le a\right) \to P(Z \le a) \quad \text{as } n \to \infty, \text{ for } -\infty < a < \infty,$$
where Z ∼ N(0, 1).
Related homework: 1/29
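A simulation sketch of the asymptotic normality (my own example, assuming NumPy/SciPy): for Exp(λ) the Fisher information is I(λ) = 1/λ², so √(nI(λ))(λ̂ − λ) = √n (λ̂ − λ)/λ should look standard normal for large n.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
lam, n, reps = 2.0, 200, 50_000

x = rng.exponential(scale=1/lam, size=(reps, n))
lam_hat = 1 / x.mean(axis=1)             # MLE of the rate
z = np.sqrt(n) * (lam_hat - lam) / lam   # sqrt(n I(lam)) (lam_hat - lam), I = 1/lam^2

for a in (-1.0, 1.0):
    print((z <= a).mean(), norm.cdf(a))  # empirical vs. N(0,1) cdf
```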
Properties of point estimates (§8.7)
Let X_1, X_2, ..., X_n be a random sample from a probability distribution with parameter θ, and $\hat\theta$ be a point estimate (e.g. MLE, MME) for θ.
- The bias of the point estimate $\hat\theta$ is $b(\hat\theta) = E[\hat\theta] - \theta$. The point estimate $\hat\theta$ is said to be unbiased if $b(\hat\theta) = 0$.
- The mean squared error of the point estimate $\hat\theta$ is
$$\mathrm{MSE}(\hat\theta) = E[(\hat\theta - \theta)^2] = \mathrm{Var}[\hat\theta] + b(\hat\theta)^2.$$
- Cramér-Rao lower bound: for an unbiased estimate $\hat\theta$,
$$\mathrm{Var}[\hat\theta] \ge \frac{1}{nI(\theta)},$$
where I(θ) is the Fisher information.
Related homework: 1/27
Interval estimation: confidence intervals (§8.5)
Let X_1, X_2, ..., X_n be a random sample from a probability distribution with parameter θ, and $\hat\theta$ be the MLE for θ. An approximate confidence interval for θ with confidence level 100p% is
$$\left(\hat\theta - z_{\frac{1+p}{2}}\sqrt{\frac{1}{nI(\hat\theta)}},\;\; \hat\theta + z_{\frac{1+p}{2}}\sqrt{\frac{1}{nI(\hat\theta)}}\right),$$
where I(θ) is the Fisher information, and P(Z ≤ z_p) = p for Z ∼ N(0, 1).
Let X_1, X_2, ..., X_n be a random sample from a normal distribution N(µ, σ²). A confidence interval for µ with confidence level 100p% is
$$\left(\bar X - t_{\frac{1+p}{2},\,n-1}\sqrt{\frac{S^2}{n}},\;\; \bar X + t_{\frac{1+p}{2},\,n-1}\sqrt{\frac{S^2}{n}}\right),$$
where $\bar X$ and S² are the sample mean and sample variance, respectively, and P(T ≤ t_{p,m}) = p for T ∼ t_m. A confidence interval for σ² with confidence level 100p% is
$$\left(\frac{(n-1)S^2}{\chi^2_{\frac{1+p}{2},\,n-1}},\;\; \frac{(n-1)S^2}{\chi^2_{\frac{1-p}{2},\,n-1}}\right),$$
where P(U ≤ χ²_{p,m}) = p for U ∼ χ²_m.
Related homework: 2/3, 2/5
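A sketch of the two normal-model intervals (assuming SciPy; the data are simulated for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.normal(loc=10.0, scale=3.0, size=25)
n, p = len(x), 0.95

xbar, S2 = x.mean(), x.var(ddof=1)

# CI for mu: Xbar +/- t_{(1+p)/2, n-1} * sqrt(S^2/n)
t = stats.t.ppf((1 + p) / 2, df=n - 1)
print(xbar - t * np.sqrt(S2 / n), xbar + t * np.sqrt(S2 / n))

# CI for sigma^2: ( (n-1)S^2/chi2_{(1+p)/2},  (n-1)S^2/chi2_{(1-p)/2} )
hi = stats.chi2.ppf((1 + p) / 2, df=n - 1)
lo = stats.chi2.ppf((1 - p) / 2, df=n - 1)
print((n - 1) * S2 / hi, (n - 1) * S2 / lo)
```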
Sufficient statistics (§8.8)
Let X_1, X_2, ..., X_n be a random sample from a probability distribution with parameter θ, and T = T(X_1, ..., X_n) be a statistic. The statistic T is said to be sufficient for the parameter θ if the conditional joint distribution of X_1, ..., X_n given T = t no longer depends on θ, for all possible t.
Factorization Theorem. A statistic T is sufficient for the parameter θ if and only if
$$f(x_1, \dots, x_n \mid \theta) = g(T(x_1, \dots, x_n), \theta)\, h(x_1, \dots, x_n)$$
for some functions g(t, θ) and h, where f(x_1, ..., x_n | θ) is the joint pdf/pmf of X_1, ..., X_n.
A probability distribution with parameter θ is said to belong to the exponential family if its pdf/pmf is of the form
$$f(x \mid \theta) = \begin{cases} e^{c(\theta)T(x) + d(\theta) + S(x)}, & x \in A \\ 0, & x \notin A \end{cases}$$
where the set A does not depend on θ. Then $T = \sum_{i=1}^n T(X_i)$ is a sufficient statistic for θ, where X_1, ..., X_n is a random sample.
Related homework: 2/7, 2/10
General hypothesis testing (§9.1, §9.2)
A hypothesis is a statement about the population distribution.
- If a hypothesis completely specifies the distribution, it is called a simple hypothesis (e.g. µ = µ₀).
- If a hypothesis partially specifies the distribution, it is called a composite hypothesis (e.g. µ > µ₀).
- Typically, H₀ is chosen to be the more specific hypothesis.
A test for the hypotheses H₀ and H_A consists of a test statistic T and a rejection region R: if T ∈ R, then we reject H₀; if T ∉ R, then we do not reject H₀.
Consequences of a test decision:

                 Reject H₀           Do not reject H₀
H₀ is true       Type I error        Correct decision
H₀ is false      Correct decision    Type II error

Related homework: 2/17, 2/19
General hypothesis testing (§9.1, §9.2)
The significance level α of a test is the probability of making a Type I error with the given test:
$$\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true}) = P(T \in R \mid H_0 \text{ is true}).$$
The probability of making a Type II error is denoted by β:
$$\beta = P(\text{do not reject } H_0 \mid H_0 \text{ is false}) = P(T \notin R \mid H_0 \text{ is false}).$$
The power of a test is the probability of detecting a false H₀, and it equals 1 − β:
$$\text{power} = 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ is false}) = P(T \in R \mid H_0 \text{ is false}).$$
Let X_1, ..., X_n be a random sample and T(X_1, ..., X_n) = t. The p-value of the sample X_1, ..., X_n is the smallest significance level α = α(t), with corresponding rejection region R = R(t), such that t ∈ R(t); that is, the smallest level at which the observed value t still leads to rejecting H₀.
Related homework: 2/17, 2/19
Likelihood ratio test (§9.1, §9.2)
For the simple hypotheses H₀: θ = θ₀ and H_A: θ = θ₁, the likelihood ratio test based on a random sample X_1, ..., X_n has test statistic
$$\Lambda = \frac{\mathrm{lik}(X_1, \dots, X_n \mid \theta = \theta_0)}{\mathrm{lik}(X_1, \dots, X_n \mid \theta = \theta_1)}$$
and rejection region R = {Λ < c}. For the likelihood ratio test:
- the significance level is α = P(Λ < c | θ = θ₀);
- the probability of making a Type II error is β = P(Λ ≥ c | θ = θ₁);
- power = 1 − β = P(Λ < c | θ = θ₁);
- if Λ(X_1, ..., X_n) = λ, then the p-value of the sample is p = P(Λ < λ | θ = θ₀).
Related homework: 2/19, 2/21
Generalized likelihood ratio test (§9.4)
For the hypotheses H₀: θ ∈ ω₀ and H_A: θ ∈ ω₁, the generalized likelihood ratio test based on a random sample X_1, ..., X_n has test statistic
$$\Lambda = \frac{\max_{\theta \in \omega_0} \mathrm{lik}(X_1, \dots, X_n \mid \theta)}{\max_{\theta \in \Omega} \mathrm{lik}(X_1, \dots, X_n \mid \theta)},$$
where Ω = ω₀ ∪ ω₁, and rejection region R = {Λ < c}. For the generalized likelihood ratio test:
- $\max_{\theta \in \omega_0} \mathrm{lik}(X_1, \dots, X_n \mid \theta) = \mathrm{lik}(\tilde\theta)$, where $\tilde\theta$ is the MLE of θ under the restriction θ ∈ ω₀;
- $\max_{\theta \in \Omega} \mathrm{lik}(X_1, \dots, X_n \mid \theta) = \mathrm{lik}(\hat\theta)$, where $\hat\theta$ is the MLE of θ under the restriction θ ∈ Ω.
Under certain conditions, when the sample size n is large, the distribution of −2 log Λ under H₀ is approximately χ²_df, where df = dim Ω − dim ω₀ and dim refers to the number of free parameters.
Related homework: 2/19, 2/21
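A small sketch of the χ² approximation (my own example, assuming NumPy/SciPy): for N(µ, 1) data with H₀: µ = 0 against an unrestricted µ, a short calculation gives −2 log Λ = n X̄², which should be approximately χ²₁ under H₀.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
n, reps = 30, 100_000

x = rng.standard_normal((reps, n))     # data under H0: mu = 0, sigma = 1
stat = n * x.mean(axis=1) ** 2         # -2 log Lambda for this model

# Rejection rate at the chi^2_1 critical value should match alpha
alpha = 0.05
print((stat > chi2.ppf(1 - alpha, df=1)).mean())   # ~0.05
```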
Inference for µ based on normal model: known σ²
Let X_1, X_2, ..., X_n be a random sample from a normal distribution with mean µ and variance σ² (σ² is known). A 100(1 − α)% confidence interval for µ is
$$\left(\bar X - z_{1-\frac{\alpha}{2}}\,\frac{\sigma}{\sqrt n},\;\; \bar X + z_{1-\frac{\alpha}{2}}\,\frac{\sigma}{\sqrt n}\right).$$
The test for H₀: µ = µ₀ vs. H_A has test statistic
$$T = \frac{\bar X - \mu_0}{\sigma/\sqrt n}$$
and

                          H_A: µ > µ₀       H_A: µ < µ₀       H_A: µ ≠ µ₀
Rejection region          {T > c}           {T < c}           {|T| > c}
α (given c)               1 − Φ(c)          Φ(c)              2[1 − Φ(c)]
c (given α)               z_{1−α}           z_α               z_{1−α/2}
p-value (given T = t)     1 − Φ(t)          Φ(t)              2[1 − Φ(|t|)]
β (given µ = µ₁ and α)    Ψ(z_{1−α})        1 − Ψ(z_α)        Ψ(z_{1−α/2}) − Ψ(z_{α/2})

where Ψ(z) = Φ(z + (µ₀ − µ₁)/(σ/√n)), and Φ is the cdf of N(0, 1).
Related homework: 2/24
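A sketch of the z-test computations (assuming SciPy; the values µ₀ = 0, µ₁ = 0.5, σ = 1, n = 25, and the observed t are illustrative):

```python
import numpy as np
from scipy.stats import norm

mu0, mu1, sigma, n, alpha = 0.0, 0.5, 1.0, 25, 0.05

# One-sided test H_A: mu > mu0
c = norm.ppf(1 - alpha)                  # critical value z_{1-alpha}
Psi = lambda z: norm.cdf(z + (mu0 - mu1) / (sigma / np.sqrt(n)))
beta = Psi(c)                            # Type II error at mu = mu1
print("c =", c, "beta =", beta, "power =", 1 - beta)

# p-values for an observed test statistic t
t = 1.8
print("one-sided p =", 1 - norm.cdf(t),
      "two-sided p =", 2 * (1 - norm.cdf(abs(t))))
```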
Inference for µ based on normal model: unknown σ²
Let X_1, X_2, ..., X_n be a random sample from a normal distribution with mean µ and variance σ² (σ² is unknown). A 100(1 − α)% confidence interval for µ is
$$\left(\bar X - t_{1-\frac{\alpha}{2},\,n-1}\,\frac{S}{\sqrt n},\;\; \bar X + t_{1-\frac{\alpha}{2},\,n-1}\,\frac{S}{\sqrt n}\right).$$
The test for H₀: µ = µ₀ vs. H_A has test statistic
$$T = \frac{\bar X - \mu_0}{S/\sqrt n}$$
and

                          H_A: µ > µ₀         H_A: µ < µ₀         H_A: µ ≠ µ₀
Rejection region          {T > c}             {T < c}             {|T| > c}
α (given c)               1 − F_{n−1}(c)      F_{n−1}(c)          2[1 − F_{n−1}(c)]
c (given α)               t_{1−α, n−1}        t_{α, n−1}          t_{1−α/2, n−1}
p-value (given T = t)     1 − F_{n−1}(t)      F_{n−1}(t)          2[1 − F_{n−1}(|t|)]

where F_{n−1} is the cdf of the t-distribution with n − 1 degrees of freedom.
Related homework: 2/26
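In practice the one-sample t-test is available directly; a sketch assuming SciPy, with simulated data, checked against the formulas above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.normal(loc=0.4, scale=1.0, size=20)

# Two-sided test of H0: mu = 0
res = stats.ttest_1samp(x, popmean=0.0)
print(res.statistic, res.pvalue)

# Check against the formulas: T = (Xbar - mu0)/(S/sqrt(n))
n = len(x)
T = x.mean() / (x.std(ddof=1) / np.sqrt(n))
print(T, 2 * (1 - stats.t.cdf(abs(T), df=n - 1)))
```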
Test for goodness-of-fit (§9.5)
Setting: assume that the population contains m categories and the probability that a random observation is of category i is p_i. A random sample of size n contains X_i observations of category i. (Thus the X_i's follow a multinomial distribution with parameters n and the p_i's.)
Hypotheses: H₀: p_i = p_i(θ), and H_A: H₀ is not true. In words, the null hypothesis H₀ specifies a model for the p_i's.
The generalized likelihood ratio test:
- test statistic
$$\Lambda = \frac{\max_{p = p(\theta)} \mathrm{lik}(X_1, \dots, X_m \mid p)}{\max_{\sum p_i = 1} \mathrm{lik}(X_1, \dots, X_m \mid p)} = \prod_{i=1}^m \left(\frac{p_i(\hat\theta)}{\hat p_i}\right)^{X_i},$$
where p = (p_1, ..., p_m), $\hat\theta$ is the MLE for θ, and $\hat p_i = X_i/n$ is the MLE for p_i subject to Σ p_i = 1;
- rejection region R = {Λ < c}, and −2 log Λ is approximately χ²_df with df = (m − 1) − dim θ.
An equivalent test (Pearson's χ² test):
- test statistic
$$X^2 = \sum_{i=1}^m \frac{(O_i - E_i)^2}{E_i},$$
where O_i = X_i represents the observed counts and $E_i = n\, p_i(\hat\theta)$ represents the expected counts;
- rejection region R = {X² > c}, and X² is approximately χ²_df with df = (m − 1) − dim θ.
Related homework: 3/10
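A sketch of Pearson's test with an estimated parameter (my own Poisson example, assuming NumPy/SciPy): the Poisson mean is estimated by MLE, expected counts come from the fitted pmf, and one degree of freedom is subtracted for the estimated parameter.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
x = rng.poisson(lam=2.0, size=500)

lam_hat = x.mean()                      # MLE of the Poisson mean
cells = [0, 1, 2, 3, 4]                 # categories 0..4 plus "5 or more"
O = np.array([(x == k).sum() for k in cells] + [(x >= 5).sum()])
p = np.array([stats.poisson.pmf(k, lam_hat) for k in cells])
p = np.append(p, 1 - p.sum())           # probability of the last cell
E = len(x) * p                          # expected counts n * p_i(theta_hat)

X2 = ((O - E) ** 2 / E).sum()
df = len(O) - 1 - 1                     # (m - 1) - dim(theta), one estimated parameter
print(X2, 1 - stats.chi2.cdf(X2, df))   # statistic and approximate p-value
```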
Inference for µ_X − µ_Y based on normal model with two independent samples: known σ² (§11.2)
Let X_1, ..., X_n be a random sample from N(µ_X, σ²) and Y_1, ..., Y_m be a random sample from N(µ_Y, σ²), with the X's and Y's independent. (Assume σ² is known.) A 100(1 − α)% confidence interval for µ_X − µ_Y is
$$\left((\bar X - \bar Y) - z_{1-\frac{\alpha}{2}}\,\sigma\sqrt{\tfrac{1}{n} + \tfrac{1}{m}},\;\; (\bar X - \bar Y) + z_{1-\frac{\alpha}{2}}\,\sigma\sqrt{\tfrac{1}{n} + \tfrac{1}{m}}\right).$$
Let Δ = µ_X − µ_Y. The test for H₀: Δ = Δ₀ vs. H_A has test statistic
$$T = \frac{(\bar X - \bar Y) - \Delta_0}{\sigma\sqrt{\tfrac{1}{n} + \tfrac{1}{m}}}$$
and

                          H_A: Δ > Δ₀       H_A: Δ < Δ₀       H_A: Δ ≠ Δ₀
Rejection region          {T > c}           {T < c}           {|T| > c}
α (given c)               1 − Φ(c)          Φ(c)              2[1 − Φ(c)]
c (given α)               z_{1−α}           z_α               z_{1−α/2}
p-value (given T = t)     1 − Φ(t)          Φ(t)              2[1 − Φ(|t|)]
β (given Δ = Δ₁ and α)    Ψ(z_{1−α})        1 − Ψ(z_α)        Ψ(z_{1−α/2}) − Ψ(z_{α/2})

where Ψ(z) = Φ(z + (Δ₀ − Δ₁)/(σ√(1/n + 1/m))), and Φ is the cdf of N(0, 1).
Related homework: 3/14
Inference for µ_X − µ_Y based on normal model with two independent samples: unknown σ² (§11.2)
Let X_1, ..., X_n be a random sample from N(µ_X, σ²) and Y_1, ..., Y_m be a random sample from N(µ_Y, σ²), with the X's and Y's independent. (Assume σ² is unknown.) A 100(1 − α)% confidence interval for µ_X − µ_Y is
$$\left((\bar X - \bar Y) - t_{1-\frac{\alpha}{2},\,n+m-2}\, S_p\sqrt{\tfrac{1}{n} + \tfrac{1}{m}},\;\; (\bar X - \bar Y) + t_{1-\frac{\alpha}{2},\,n+m-2}\, S_p\sqrt{\tfrac{1}{n} + \tfrac{1}{m}}\right),$$
where S_p is the pooled standard deviation:
$$S_p^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}.$$
Let Δ = µ_X − µ_Y. The test for H₀: Δ = Δ₀ vs. H_A has test statistic
$$T = \frac{(\bar X - \bar Y) - \Delta_0}{S_p\sqrt{\tfrac{1}{n} + \tfrac{1}{m}}}$$
and

                          H_A: Δ > Δ₀           H_A: Δ < Δ₀           H_A: Δ ≠ Δ₀
Rejection region          {T > c}               {T < c}               {|T| > c}
α (given c)               1 − F_{n+m−2}(c)      F_{n+m−2}(c)          2[1 − F_{n+m−2}(c)]
c (given α)               t_{1−α, n+m−2}        t_{α, n+m−2}          t_{1−α/2, n+m−2}
p-value (given T = t)     1 − F_{n+m−2}(t)      F_{n+m−2}(t)          2[1 − F_{n+m−2}(|t|)]

where F_{n+m−2} is the cdf of the t-distribution with n + m − 2 degrees of freedom.
Related homework: 3/14
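A sketch of the pooled two-sample t-test (assuming SciPy; data simulated for illustration). scipy.stats.ttest_ind pools the variances when equal_var=True:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
x = rng.normal(5.0, 2.0, size=30)
y = rng.normal(4.0, 2.0, size=25)

# Pooled two-sample t-test of H0: mu_X = mu_Y
res = stats.ttest_ind(x, y, equal_var=True)
print(res.statistic, res.pvalue)

# Check against the formulas
n, m = len(x), len(y)
Sp2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)
T = (x.mean() - y.mean()) / np.sqrt(Sp2 * (1 / n + 1 / m))
print(T, 2 * (1 - stats.t.cdf(abs(T), df=n + m - 2)))
```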
Test for comparing two populations: Wilcoxon rank-sum test (§11.2)
Let X_1, ..., X_n be a random sample from a population with cdf F and Y_1, ..., Y_m be a random sample from a population with cdf G. The hypotheses are H₀: F = G and H_A: F ≠ G.
Wilcoxon rank-sum test (also called the Mann-Whitney test):
- Order the observations X_i and Y_j jointly, and assign ranks (1 through n + m) to each observation according to their order. Let R(Z) denote the rank of the observation Z. Assume that m ≤ n.
- The test statistic is $T_Y = \sum_{j=1}^m R(Y_j)$.
- The rejection region is {T_Y < c₁ or T_Y > c₂}.
- The distribution of T_Y under H₀ can be determined from combinatorics. For example, the pmf of T_Y for n = m = 2 is p(3) = p(4) = p(6) = p(7) = 1/6 and p(5) = 1/3.
- In practice, we apply symmetry and use the test statistic R = min(R′, R″), where R′ = T_Y and R″ = m(n + m + 1) − R′ (assuming m ≤ n), with rejection region {R < c} (Table 8 of the textbook).
Related homework: 3/17
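A sketch using SciPy (data illustrative): scipy.stats.mannwhitneyu reports the equivalent Mann-Whitney U statistic for its first argument, which differs from the rank-sum T_Y only by the constant m(m + 1)/2.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.normal(0.0, 1.0, size=8)
y = rng.normal(0.8, 1.0, size=6)

# Mann-Whitney U test of H0: F = G (two-sided)
res = stats.mannwhitneyu(y, x, alternative="two-sided")
print(res.statistic, res.pvalue)

# Relation to the rank-sum: U_Y = T_Y - m(m+1)/2
m = len(y)
ranks = stats.rankdata(np.concatenate([y, x]))
T_Y = ranks[:m].sum()
print(T_Y - m * (m + 1) / 2)   # equals res.statistic
```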
Inference for µ_X − µ_Y based on normal model with matched pair design (§11.3)
Let X_1, ..., X_n be a random sample from a population with mean µ_X and Y_1, ..., Y_n be a random sample from a population with mean µ_Y, where X_i and Y_i are paired for each 1 ≤ i ≤ n. The differences D_i = X_i − Y_i can be regarded as a random sample from a population with mean µ_X − µ_Y. Furthermore, the D_i's are assumed to be a random sample from N(d, σ²), where d = µ_X − µ_Y. The inference methods for d are exactly the same as in "Inference for µ based on normal model."
Related homework: 3/21
Test for comparing two populations with matched pair design: signed rank test (§11.3)
Let X_1, ..., X_n be a random sample from a population with cdf F and Y_1, ..., Y_n be a random sample from a population with cdf G, where X_i and Y_i are paired for each 1 ≤ i ≤ n. Let D_i = X_i − Y_i be the differences. The hypotheses are H₀: the D_i's are symmetric about 0, and H_A: the D_i's are not symmetric about 0.
Signed rank test:
- Order the magnitudes |D_i| of the differences, and assign ranks (1 through n) to each one according to their order. Let R(|D_i|) denote the rank of |D_i|.
- The test statistic is $W_+ = \sum_{i=1}^n \mathbf{1}_{(0,\infty)}(D_i)\, R(|D_i|)$, where 𝟏_{(0,∞)}(x) = 1 if x > 0 and 0 otherwise.
- The rejection region is {W₊ < c₁ or W₊ > c₂}.
- The distribution of W₊ under H₀ can be determined from combinatorics. For example, the pmf of W₊ for n = 2 is p(0) = p(1) = p(2) = p(3) = 1/4.
- In practice, we apply symmetry and use the test statistic W = min(W₊, W₋), where W₋ = n(n + 1)/2 − W₊, with rejection region {W < c} (Table 9 of the textbook).
Related homework: 3/24
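A sketch using SciPy (illustrative paired data): as documented, scipy.stats.wilcoxon reports min(W₊, W₋) for the two-sided alternative, matching the W used here; the check below recomputes W₊ by hand.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
x = rng.normal(10.0, 2.0, size=15)
y = x - rng.normal(0.5, 1.0, size=15)   # paired measurements
d = x - y

# Signed rank test of H0: differences symmetric about 0
res = stats.wilcoxon(d, alternative="two-sided")
print(res.statistic, res.pvalue)        # statistic is min(W+, W-)

# Check W+ by hand: ranks of |D_i| summed over positive differences
r = stats.rankdata(np.abs(d))
W_plus = r[d > 0].sum()
print(min(W_plus, len(d) * (len(d) + 1) / 2 - W_plus))
```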
One-way ANOVA (§12.2): setting
Consider I groups (populations). For each group, a random sample of size J is drawn. Let Y_ij denote the jth observation in the ith sample. The statistical model is
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij},$$
where µ is the overall average of the I groups, α_i is the correction for the ith group (so that $\sum_{i=1}^I \alpha_i = 0$), and the ε_ij's are i.i.d. N(0, σ²) random variables (errors).
The sum of squares between groups measures the variation between the I samples:
$$SS_B = J\sum_{i=1}^I (\bar Y_i - \bar Y)^2, \quad \text{and} \quad \frac{SS_B}{\sigma^2} \sim \chi^2_{I-1} \text{ if } \alpha_i = 0 \text{ for all } 1 \le i \le I,$$
where $\bar Y_i = \frac{1}{J}\sum_{j=1}^J Y_{ij}$ and $\bar Y = \frac{1}{IJ}\sum_{i=1}^I\sum_{j=1}^J Y_{ij}$.
The sum of squares within groups measures the overall variation inside the I samples:
$$SS_W = \sum_{i=1}^I\sum_{j=1}^J (Y_{ij} - \bar Y_i)^2, \quad \text{and} \quad \frac{SS_W}{\sigma^2} \sim \chi^2_{I(J-1)}.$$
The total sum of squares measures the overall variation of the I samples:
$$SS_T = \sum_{i=1}^I\sum_{j=1}^J (Y_{ij} - \bar Y)^2 = SS_B + SS_W.$$
One-way ANOVA (§12.2): F test
The hypotheses of one-way ANOVA:
H₀: α_i = 0 for all 1 ≤ i ≤ I vs. H_A: H₀ is false.
The F test:
- Intuition: if the variation between groups (SS_B) is large relative to the variation within groups (SS_W), then H₀ cannot be true.
- Test statistic:
$$F = \frac{SS_B/(I-1)}{SS_W/(I(J-1))}, \quad \text{and} \quad F \sim F_{I-1,\, I(J-1)} \text{ under } H_0.$$
- Rejection region: R = {F > c} with c = F_{1−α}(I − 1, I(J − 1)), where α is the significance level.
The ANOVA table:

Source            df          Sum of Squares   Mean Square              F
Between groups    I − 1       SS_B             MS_B = SS_B/(I − 1)      F = MS_B/MS_W
Within groups     I(J − 1)    SS_W             MS_W = SS_W/(I(J − 1))
Total             IJ − 1      SS_T

Related homework: 4/2, 3/31
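A sketch (assuming SciPy; simulated groups with slightly different means): scipy.stats.f_oneway gives the F statistic and p-value, checked here against the sums-of-squares formulas.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)
I, J = 3, 10
groups = [rng.normal(loc=mu_i, scale=1.0, size=J) for mu_i in (5.0, 5.5, 6.0)]

# Built-in one-way ANOVA
res = stats.f_oneway(*groups)
print(res.statistic, res.pvalue)

# Check against SS_B and SS_W
Y = np.array(groups)                       # shape (I, J)
grand = Y.mean()
SS_B = J * ((Y.mean(axis=1) - grand) ** 2).sum()
SS_W = ((Y - Y.mean(axis=1, keepdims=True)) ** 2).sum()
F = (SS_B / (I - 1)) / (SS_W / (I * (J - 1)))
print(F, 1 - stats.f.cdf(F, I - 1, I * (J - 1)))
```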
Application of χ² test (§13.3): test of homogeneity
Consider I populations, each containing J categories. A random sample of size N is drawn from these populations:

              Population 1   ...   Population I   Total
Category 1    n_11           ...   n_I1           n_·1
...           ...                  ...            ...
Category J    n_1J           ...   n_IJ           n_·J
Total         n_1·           ...   n_I·           n_··

where n_ij is the number of observations of category j from the ith population, n_i· = Σ_j n_ij, n_·j = Σ_i n_ij, and N = n_·· = Σ_i Σ_j n_ij. Let p_ij be the proportion of category j in population i. The hypotheses of the test of homogeneity are:
H₀: p_1j = ⋯ = p_Ij for all 1 ≤ j ≤ J vs. H_A: H₀ is false.
The χ² test (a test of goodness-of-fit):
- Test statistic:
$$X^2 = \sum_{i=1}^I\sum_{j=1}^J \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad \text{where } O_{ij} = n_{ij} \text{ and } E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n_{\cdot\cdot}},$$
and X² is approximately χ²_{(I−1)(J−1)} under H₀.
- Rejection region: R = {X² > c} with c = χ²_{1−α, (I−1)(J−1)}, where α is the significance level.
Related homework: 4/9
Application of χ² test (§13.4): test of independence
Consider two discrete random variables U and V. U has I possible values, with marginal pmf P(U = u_i) = p_i; and V has J possible values, with marginal pmf P(V = v_j) = q_j. A random sample of size N is drawn from the population:

           u_1    ...   u_I    Total
v_1        n_11   ...   n_I1   n_·1
...        ...          ...    ...
v_J        n_1J   ...   n_IJ   n_·J
Total      n_1·   ...   n_I·   n_··

where n_ij is the number of observations of the pair (u_i, v_j), n_i· = Σ_j n_ij, n_·j = Σ_i n_ij, and N = n_·· = Σ_i Σ_j n_ij. Let the joint pmf be P(U = u_i, V = v_j) = π_ij. The hypotheses of the test of independence are:
H₀: π_ij = p_i q_j for all 1 ≤ i ≤ I, 1 ≤ j ≤ J vs. H_A: H₀ is false.
The χ² test (a test of goodness-of-fit):
- Test statistic:
$$X^2 = \sum_{i=1}^I\sum_{j=1}^J \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad \text{where } O_{ij} = n_{ij} \text{ and } E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n_{\cdot\cdot}},$$
and X² is approximately χ²_{(I−1)(J−1)} under H₀.
- Rejection region: R = {X² > c} with c = χ²_{1−α, (I−1)(J−1)}, where α is the significance level.
Related homework: 4/9
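Both the homogeneity test above and this independence test use the same computation on the I × J table of counts. A sketch assuming SciPy (the counts are illustrative):

```python
import numpy as np
from scipy import stats

# Observed table of counts (rows: one factor, columns: the other)
O = np.array([[30, 20, 10],
              [25, 25, 20]])

chi2, p, df, E = stats.chi2_contingency(O, correction=False)
print(chi2, p, df)     # df = (I-1)(J-1) = 2 here

# Check E_ij = n_i. * n_.j / n..
row = O.sum(axis=1, keepdims=True)
col = O.sum(axis=0, keepdims=True)
print(np.allclose(E, row * col / O.sum()))
```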
Simple linear regression (§14.1)
The statistical model:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,$$
where the ε_i's are i.i.d. N(0, σ²) random variables (errors), and β₀, β₁, and the x_i's are nonrandom constants. Given sample data (x_1, y_1), ..., (x_n, y_n), we use the least squares principle to find estimates $\hat\beta_0$ and $\hat\beta_1$ for β₀ and β₁, respectively; that is, we minimize the residual sum of squares (RSS):
$$RSS = \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2.$$
Consequently, we have
$$\hat\beta_0 = \frac{\left(\sum_{i=1}^n x_i^2\right)\left(\sum_{i=1}^n y_i\right) - \left(\sum_{i=1}^n x_i\right)\left(\sum_{i=1}^n x_i y_i\right)}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}, \qquad \hat\beta_1 = \frac{n\sum_{i=1}^n x_i y_i - \left(\sum_{i=1}^n x_i\right)\left(\sum_{i=1}^n y_i\right)}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}.$$
Related homework: 4/11, 4/9
Simple linear regression (§14.1), continued
The least squares estimates $\hat\beta_0$, $\hat\beta_1$ are unbiased estimates, that is,
$$E[\hat\beta_0] = \beta_0, \quad \text{and} \quad E[\hat\beta_1] = \beta_1.$$
Furthermore,
$$\mathrm{Var}[\hat\beta_0] = \frac{\sigma^2 \sum_{i=1}^n x_i^2}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}, \qquad \mathrm{Var}[\hat\beta_1] = \frac{n\sigma^2}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}.$$
The error variance σ² can be estimated by
$$s^2 = \frac{RSS}{n-2}, \quad \text{where } RSS = \left[\sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2\right] - \frac{\left[n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)\right]^2}{n\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]}.$$
Consequently, the estimated variances of $\hat\beta_0$ and $\hat\beta_1$ are
$$s^2_{\hat\beta_0} = \frac{s^2 \sum_{i=1}^n x_i^2}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}, \qquad s^2_{\hat\beta_1} = \frac{n s^2}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}.$$
Moreover,
$$\frac{\hat\beta_0 - \beta_0}{s_{\hat\beta_0}} \sim t_{n-2}, \quad \text{and} \quad \frac{\hat\beta_1 - \beta_1}{s_{\hat\beta_1}} \sim t_{n-2}.$$
Related homework: 4/14
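A sketch tying the formulas to code (assuming NumPy/SciPy; the data are simulated): the closed-form estimates and the slope's standard error are computed directly and checked against scipy.stats.linregress.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(14)
n = 50
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.7 * x + rng.normal(0, 1.0, size=n)   # true beta0=2, beta1=0.7

Sx, Sy, Sxx, Sxy = x.sum(), y.sum(), (x**2).sum(), (x*y).sum()
denom = n * Sxx - Sx**2
b1 = (n * Sxy - Sx * Sy) / denom                 # beta1_hat
b0 = (Sxx * Sy - Sx * Sxy) / denom               # beta0_hat

RSS = ((y - b0 - b1 * x) ** 2).sum()
s2 = RSS / (n - 2)                               # estimate of sigma^2
se_b1 = np.sqrt(n * s2 / denom)                  # s_{beta1_hat}

res = stats.linregress(x, y)
print(b0, b1, se_b1)
print(res.intercept, res.slope, res.stderr)      # should match line above
```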