Central Limit Theorem (§5.3)

Let $X_1, X_2, \ldots$ be a sequence of independent random variables, each having mean $\mu$ and variance $\sigma^2$. Then the distribution of the partial sum $S_n = \sum_{i=1}^n X_i$ becomes approximately normal with mean $n\mu$ and variance $n\sigma^2$ as $n \to \infty$, that is,

$$P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le a\right) \to P(Z \le a) \quad \text{as } n \to \infty, \text{ for } -\infty < a < \infty,$$

where $Z \sim N(0,1)$. Similarly, the distribution of the sample mean $\bar{X}_n = \frac{1}{n}S_n$ becomes approximately $N(\mu, \sigma^2/n)$ as $n \to \infty$, that is,

$$P\left(\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le a\right) \to P(Z \le a) \quad \text{as } n \to \infty, \text{ for } -\infty < a < \infty.$$

Related homework: 1/10, 1/13
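
As an illustration (not part of the original notes), here is a minimal simulation sketch of the CLT: standardized sums of i.i.d. Exponential(1) draws are compared with the $N(0,1)$ cdf. The Exponential model, the sample size $n$, and the number of replications are arbitrary choices.

```python
# Minimal CLT simulation sketch (illustrative choices: Exponential(1), n = 50).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 50, 100_000
mu, sigma = 1.0, 1.0                      # mean and sd of Exponential(1)

samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))   # (S_n - n*mu)/(sigma*sqrt(n))

# Empirical P(Z <= a) should be close to the standard normal cdf.
for a in (-1.0, 0.0, 1.0):
    print(a, (z <= a).mean(), stats.norm.cdf(a))
```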

χ², t, and F distributions (§6.2)

Let $Z_1, Z_2, \ldots, Z_n$ be independent standard normal random variables and define $X = Z_1^2 + Z_2^2 + \cdots + Z_n^2$. Then the distribution of $X$ is called the chi-square distribution with $n$ degrees of freedom, and is denoted by $\chi^2_n$.

Let $Z$ and $U$ be two independent random variables with $Z \sim N(0,1)$ and $U \sim \chi^2_n$. Then the distribution of the random variable $T = \dfrac{Z}{\sqrt{U/n}}$ is called the $t$ distribution with $n$ degrees of freedom, and is denoted by $t_n$.

Let $U$ and $V$ be two independent random variables with $U \sim \chi^2_m$ and $V \sim \chi^2_n$. Then the distribution of the random variable $F = \dfrac{U/m}{V/n}$ is called the $F$ distribution with degrees of freedom $m$ and $n$, and is denoted by $F_{m,n}$.

Related homework: 1/15, 1/17

Sample mean and sample variance (§6.3)

Let $X_1, X_2, \ldots, X_n$ be a sequence of i.i.d. random variables (a random sample), each having mean $\mu$ and variance $\sigma^2$. The sample mean and sample variance are defined as

$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i \quad \text{and} \quad S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2,$$

respectively. Properties of $\bar{X}$ and $S^2$:

- $E[\bar{X}] = \mu$, $\operatorname{Var}[\bar{X}] = \sigma^2/n$, and $E[S^2] = \sigma^2$.
- If the random sample is from a normal distribution, then $\bar{X}$ and $S^2$ are independent, and
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}, \qquad \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.$$

Related homework: 1/17, 1/22

MME (Method of Moments Estimate) (§8.4)

Let $X_1, X_2, \ldots, X_n$ be a random sample from a probability distribution with parameter $\theta$. The method of moments estimate is based on the law of large numbers:

$$\hat{\mu}_k = \frac{1}{n}\sum_{i=1}^n X_i^k \to \mu_k = E[X_1^k] \quad \text{as } n \to \infty,$$

that is, the $k$th sample moment $\hat{\mu}_k$ converges to the $k$th moment $\mu_k$. Thus we can use $\hat{\mu}_k$ to estimate $\mu_k$. If the parameter $\theta$ can be determined by the moments, $\theta = g(\mu_1, \ldots)$, then the MME for $\theta$ is $\hat{\theta} = g(\hat{\mu}_1, \ldots)$. Whenever lower moments are sufficient to determine $\theta$, we do not use higher moments.

Related homework: 1/22, 1/24
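
A worked sketch (a standard textbook example, not from these notes): for a Gamma(α, λ) model with shape α and rate λ, $\mu_1 = \alpha/\lambda$ and $\mu_2 - \mu_1^2 = \alpha/\lambda^2$, so the MMEs are $\hat\alpha = \hat\mu_1^2/(\hat\mu_2 - \hat\mu_1^2)$ and $\hat\lambda = \hat\mu_1/(\hat\mu_2 - \hat\mu_1^2)$. The data below are simulated only to check the estimates.

```python
# Method of moments for Gamma(alpha, lambda): a minimal sketch.
import numpy as np

def gamma_mme(x):
    x = np.asarray(x, dtype=float)
    m1 = x.mean()                  # first sample moment
    v = np.mean((x - m1) ** 2)     # mu_hat_2 - mu_hat_1^2 (1/n version)
    return m1 ** 2 / v, m1 / v     # (alpha_hat, lambda_hat)

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=1 / 3.0, size=500)   # true alpha = 2, rate lambda = 3
print(gamma_mme(x))                                 # should be near (2, 3)
```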

MLE (Maximum Likelihood Estimate) (§8.5)

Let $X_1, X_2, \ldots, X_n$ be a random sample from a probability distribution with parameter $\theta$. The maximum likelihood estimate is based on the principle of maximizing the likelihood function (the joint pdf/pmf of the observed sample):

$$\operatorname{lik}(\theta) = f(X_1, X_2, \ldots, X_n \mid \theta) = \prod_{i=1}^n f(X_i \mid \theta),$$

treating $X_1, \ldots, X_n$ as constants and $\theta$ as the variable. The MLE $\hat{\theta}$ for $\theta$ satisfies $\operatorname{lik}(\hat{\theta}) = \max_\theta \operatorname{lik}(\theta)$. Usually, it is easier to maximize the log-likelihood function (via calculus)

$$\ell(\theta) = \log[\operatorname{lik}(\theta)] = \log f(X_1, X_2, \ldots, X_n \mid \theta) = \sum_{i=1}^n \log f(X_i \mid \theta).$$

In some cases (when the support of the pdf/pmf depends on $\theta$), we must maximize the likelihood function directly, e.g. $\mathrm{Unif}(0, \theta)$.

Related homework: 1/22, 1/24
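
A minimal numerical sketch (my own example, not from the notes): for an Exponential(λ) sample the log-likelihood is $\ell(\lambda) = n\log\lambda - \lambda\sum_i X_i$, whose maximizer is $\hat\lambda = 1/\bar X$; the code maximizes $\ell$ numerically and checks it against the closed form.

```python
# Numerical MLE sketch for an Exponential(lambda) sample.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.exponential(scale=1 / 2.5, size=200)   # true rate lambda = 2.5

def neg_loglik(lam):
    # l(lambda) = n*log(lambda) - lambda*sum(x); minimize the negative
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_loglik, bounds=(1e-6, 100.0), method="bounded")
print(res.x, 1 / x.mean())                     # numerical MLE vs closed form 1/xbar
```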

Properties of MLE (§8.5)

Let $X$ be a random variable from a probability distribution with parameter $\theta$. The Fisher information for $\theta$ is

$$I(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right)^2\right] = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(X \mid \theta)\right].$$

Let $X_1, X_2, \ldots, X_n$ be a random sample from a probability distribution with parameter $\theta$, and let $\hat{\theta}$ be the MLE for $\theta$. Then the asymptotic variance of $\hat{\theta}$ is $\dfrac{1}{nI(\theta)}$. Moreover, $\sqrt{nI(\theta)}\,(\hat{\theta} - \theta)$ becomes approximately $N(0,1)$ as $n \to \infty$, that is,

$$P\left(\frac{\hat{\theta} - \theta}{\sqrt{1/(nI(\theta))}} \le a\right) \to P(Z \le a) \quad \text{as } n \to \infty, \text{ for } -\infty < a < \infty,$$

where $Z \sim N(0,1)$.

Related homework: 1/29
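
A standard worked example (not from the notes) of computing Fisher information: for $X \sim \mathrm{Poisson}(\lambda)$,

$$\log f(x \mid \lambda) = -\lambda + x\log\lambda - \log x!, \qquad \frac{\partial^2}{\partial\lambda^2}\log f(x \mid \lambda) = -\frac{x}{\lambda^2},$$

$$I(\lambda) = -E\left[-\frac{X}{\lambda^2}\right] = \frac{E[X]}{\lambda^2} = \frac{1}{\lambda}, \qquad \operatorname{Var}(\hat{\lambda}) \approx \frac{1}{nI(\lambda)} = \frac{\lambda}{n},$$

which agrees with $\operatorname{Var}(\bar X) = \lambda/n$, since the MLE here is $\hat\lambda = \bar X$.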

Properties of point estimates (§8.7)

Let $X_1, X_2, \ldots, X_n$ be a random sample from a probability distribution with parameter $\theta$, and let $\hat{\theta}$ be a point estimate (e.g. MLE, MME) for $\theta$.

- The bias of the point estimate $\hat{\theta}$ is $b(\hat{\theta}) = E[\hat{\theta}] - \theta$. The point estimate $\hat{\theta}$ is said to be unbiased if $b(\hat{\theta}) = 0$.
- The mean squared error of the point estimate $\hat{\theta}$ is $\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \operatorname{Var}[\hat{\theta}] + b(\hat{\theta})^2$.
- Cramér–Rao lower bound: for an unbiased estimate, $\operatorname{Var}[\hat{\theta}] \ge \dfrac{1}{nI(\theta)}$, where $I(\theta)$ is the Fisher information.

Related homework: 1/27

Interval estimation: confidence intervals (§8.5)

Let $X_1, X_2, \ldots, X_n$ be a random sample from a probability distribution with parameter $\theta$, and let $\hat{\theta}$ be the MLE for $\theta$. An approximate confidence interval for $\theta$ with confidence level $100p\%$ is

$$\left(\hat{\theta} - z_{\frac{1+p}{2}}\sqrt{\frac{1}{nI(\hat{\theta})}},\ \hat{\theta} + z_{\frac{1+p}{2}}\sqrt{\frac{1}{nI(\hat{\theta})}}\right),$$

where $I(\theta)$ is the Fisher information and $z_p$ is defined by $P(Z \le z_p) = p$ for $Z \sim N(0,1)$.

Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution $N(\mu, \sigma^2)$. A confidence interval for $\mu$ with confidence level $100p\%$ is

$$\left(\bar{X} - t_{\frac{1+p}{2},\,n-1}\sqrt{\frac{S^2}{n}},\ \bar{X} + t_{\frac{1+p}{2},\,n-1}\sqrt{\frac{S^2}{n}}\right),$$

where $\bar{X}$ and $S^2$ are the sample mean and sample variance, respectively, and $t_{p,m}$ is defined by $P(T \le t_{p,m}) = p$ for $T \sim t_m$. A confidence interval for $\sigma^2$ with confidence level $100p\%$ is

$$\left(\frac{(n-1)S^2}{\chi^2_{\frac{1+p}{2},\,n-1}},\ \frac{(n-1)S^2}{\chi^2_{\frac{1-p}{2},\,n-1}}\right),$$

where $\chi^2_{p,m}$ is defined by $P(U \le \chi^2_{p,m}) = p$ for $U \sim \chi^2_m$.

Related homework: 2/3, 2/5
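
A minimal computational sketch of the two normal-model intervals above, on hypothetical data (the numbers are made up):

```python
# t interval for mu and chi-square interval for sigma^2 at confidence level p.
import numpy as np
from scipy import stats

x = np.array([4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 5.2, 4.8])   # hypothetical sample
n, p = len(x), 0.95
xbar, s2 = x.mean(), x.var(ddof=1)

tq = stats.t.ppf((1 + p) / 2, df=n - 1)                   # t_{(1+p)/2, n-1}
mu_ci = (xbar - tq * np.sqrt(s2 / n), xbar + tq * np.sqrt(s2 / n))

chi_hi = stats.chi2.ppf((1 + p) / 2, df=n - 1)            # chi^2_{(1+p)/2, n-1}
chi_lo = stats.chi2.ppf((1 - p) / 2, df=n - 1)            # chi^2_{(1-p)/2, n-1}
var_ci = ((n - 1) * s2 / chi_hi, (n - 1) * s2 / chi_lo)

print(mu_ci, var_ci)
```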

Sufficient statistics (§8.8)

Let $X_1, X_2, \ldots, X_n$ be a random sample from a probability distribution with parameter $\theta$, and let $T = T(X_1, \ldots, X_n)$ be a statistic. The statistic $T$ is said to be sufficient for the parameter $\theta$ if the conditional joint distribution of $X_1, \ldots, X_n$ given $T = t$ no longer depends on $\theta$, for all possible $t$.

Factorization Theorem. A statistic $T$ is sufficient for the parameter $\theta$ if and only if

$$f(x_1, \ldots, x_n \mid \theta) = g(T(x_1, \ldots, x_n), \theta)\, h(x_1, \ldots, x_n)$$

for some functions $g(t, \theta)$ and $h$, where $f(x_1, \ldots, x_n \mid \theta)$ is the joint pdf/pmf of $X_1, \ldots, X_n$.

A probability distribution with parameter $\theta$ is said to belong to the exponential family if its pdf/pmf is of the form

$$f(x \mid \theta) = \begin{cases} e^{c(\theta)T(x) + d(\theta) + S(x)}, & x \in A \\ 0, & x \notin A \end{cases}$$

where the set $A$ does not depend on $\theta$. In that case $T = \sum_{i=1}^n T(X_i)$ is a sufficient statistic for $\theta$, where $X_1, \ldots, X_n$ is a random sample.

Related homework: 2/7, 2/10
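
A standard worked example (not from the notes) of the factorization theorem: for $X_1, \ldots, X_n$ i.i.d. $\mathrm{Poisson}(\lambda)$, the statistic $T = \sum_i X_i$ is sufficient, since

$$f(x_1, \ldots, x_n \mid \lambda) = \prod_{i=1}^n \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = \underbrace{e^{-n\lambda}\,\lambda^{\sum_i x_i}}_{g\left(\sum_i x_i,\ \lambda\right)} \cdot \underbrace{\prod_{i=1}^n \frac{1}{x_i!}}_{h(x_1, \ldots, x_n)}.$$

This also fits the exponential-family form with $c(\lambda) = \log\lambda$, $T(x) = x$, $d(\lambda) = -\lambda$, and $S(x) = -\log x!$.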

General hypothesis testing (§9.1–9.2)

A hypothesis is a statement about the population distribution.

- If a hypothesis completely specifies the distribution, it is called a simple hypothesis (e.g. $\mu = \mu_0$).
- If a hypothesis partially specifies the distribution, it is called a composite hypothesis (e.g. $\mu > \mu_0$).
- Typically, $H_0$ is chosen to be the more specific hypothesis.

A test for the hypotheses $H_0$ and $H_A$ consists of a test statistic $T$ and a rejection region $R$: if $T \in R$, we reject $H_0$; if $T \notin R$, we do not reject $H_0$.

Consequences of a test decision:

    Decision:        Reject H_0           Do not reject H_0
    H_0 is true      Type I error         Correct decision
    H_0 is false     Correct decision     Type II error

Related homework: 2/17, 2/19

General hypothesis testing (§9.1–9.2), continued

- The significance level $\alpha$ of a test is the probability of making a Type I error: $\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true}) = P(T \in R \mid H_0 \text{ is true})$.
- The probability of making a Type II error is denoted by $\beta$: $\beta = P(\text{do not reject } H_0 \mid H_0 \text{ is false}) = P(T \notin R \mid H_0 \text{ is false})$.
- The power of a test is the probability of detecting a false $H_0$, and it equals $1 - \beta$: $\text{power} = 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ is false}) = P(T \in R \mid H_0 \text{ is false})$.
- Let $X_1, \ldots, X_n$ be a random sample with $T(X_1, \ldots, X_n) = t$. The p-value of the sample is the smallest significance level $\alpha = \alpha(t)$, taken over rejection regions $R = R(t)$ with $t \in R(t)$; that is, the smallest level at which $H_0$ is rejected based on $T = t$.

Related homework: 2/17, 2/19

Likelihood ratio test (§9.1–9.2)

For the hypotheses $H_0: \theta = \theta_0$ and $H_A: \theta = \theta_1$, the likelihood ratio test based on a random sample $X_1, \ldots, X_n$ has test statistic

$$\Lambda = \frac{\operatorname{lik}(X_1, \ldots, X_n \mid \theta = \theta_0)}{\operatorname{lik}(X_1, \ldots, X_n \mid \theta = \theta_1)}$$

and rejection region $R = \{\Lambda < c\}$. For the likelihood ratio test:

- the significance level is $\alpha = P(\Lambda < c \mid \theta = \theta_0)$;
- the probability of making a Type II error is $\beta = P(\Lambda \ge c \mid \theta = \theta_1)$;
- $\text{power} = 1 - \beta = P(\Lambda < c \mid \theta = \theta_1)$;
- if $\Lambda(X_1, \ldots, X_n) = \lambda$, then the p-value of the sample is $p = P(\Lambda < \lambda \mid \theta = \theta_0)$.

Related homework: 2/19, 2/21
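
A standard worked example (not from the notes): for a random sample from $N(\theta, \sigma^2)$ with $\sigma^2$ known and the simple hypotheses $H_0: \theta = \theta_0$ versus $H_A: \theta = \theta_1$ with $\theta_1 > \theta_0$,

$$\Lambda = \frac{\prod_{i=1}^n f(X_i \mid \theta_0)}{\prod_{i=1}^n f(X_i \mid \theta_1)} = \exp\left\{-\frac{\theta_1 - \theta_0}{\sigma^2}\sum_{i=1}^n X_i + \frac{n(\theta_1^2 - \theta_0^2)}{2\sigma^2}\right\},$$

so $\{\Lambda < c\}$ is equivalent to $\{\bar X > c'\}$ for a suitable constant $c'$: the likelihood ratio test rejects for large sample means, and $c'$ is chosen so that $P(\bar X > c' \mid \theta = \theta_0) = \alpha$.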

Generalized likelihood ratio test (§9.4)

For the hypotheses $H_0: \theta \in \omega_0$ and $H_A: \theta \in \omega_1$, the generalized likelihood ratio test based on a random sample $X_1, \ldots, X_n$ has test statistic

$$\Lambda = \frac{\max_{\theta \in \omega_0} \operatorname{lik}(X_1, \ldots, X_n \mid \theta)}{\max_{\theta \in \Omega} \operatorname{lik}(X_1, \ldots, X_n \mid \theta)},$$

where $\Omega = \omega_0 \cup \omega_1$, and rejection region $R = \{\Lambda < c\}$. For the generalized likelihood ratio test:

- $\max_{\theta \in \omega_0} \operatorname{lik}(X_1, \ldots, X_n \mid \theta) = \operatorname{lik}(\tilde{\theta})$, where $\tilde{\theta}$ is the MLE of $\theta$ under the restriction $\theta \in \omega_0$;
- $\max_{\theta \in \Omega} \operatorname{lik}(X_1, \ldots, X_n \mid \theta) = \operatorname{lik}(\hat{\theta})$, where $\hat{\theta}$ is the MLE of $\theta$ under the restriction $\theta \in \Omega$;
- under certain conditions, when the sample size $n$ is large, the distribution of $-2\log\Lambda$ under $H_0$ is approximately $\chi^2_{df}$, where $df = \dim\Omega - \dim\omega_0$ and $\dim$ refers to the number of free parameters.

Related homework: 2/19, 2/21

Inference for µ based on the normal model: known σ²

Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$ ($\sigma^2$ is known). A $100(1-\alpha)\%$ confidence interval for $\mu$ is

$$\left(\bar{X} - z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\right).$$

The test for $H_0: \mu = \mu_0$ vs. $H_A$ has test statistic $T = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ and

                              H_A: µ > µ_0     H_A: µ < µ_0     H_A: µ ≠ µ_0
    Rejection region          {T > c}          {T < c}          {|T| > c}
    α (given c)               1 − Φ(c)         Φ(c)             2[1 − Φ(c)]
    c (given α)               z_{1−α}          z_α              z_{1−α/2}
    p-value (given T = t)     1 − Φ(t)         Φ(t)             2[1 − Φ(|t|)]
    β (given µ = µ_1 and α)   Ψ(z_{1−α})       1 − Ψ(z_α)       Ψ(z_{1−α/2}) − Ψ(z_{α/2})

where $\Psi(z) = \Phi\left(z + \dfrac{\mu_0 - \mu_1}{\sigma/\sqrt{n}}\right)$ and $\Phi$ is the cdf of $N(0,1)$.

Related homework: 2/24
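
A minimal computational sketch of the one-sided case $H_A: \mu > \mu_0$ from the table above, with made-up numbers for $\bar x$, $n$, $\sigma$, $\mu_0$, and the alternative value $\mu_1$:

```python
# One-sided z test (sigma known): statistic, critical value, p-value, and power.
import numpy as np
from scipy import stats

xbar, n = 10.4, 25
mu0, mu1, sigma, alpha = 10.0, 10.5, 1.0, 0.05

t = (xbar - mu0) / (sigma / np.sqrt(n))          # test statistic
c = stats.norm.ppf(1 - alpha)                    # critical value z_{1-alpha}
p_value = 1 - stats.norm.cdf(t)

# beta = Psi(z_{1-alpha}), with Psi(z) = Phi(z + (mu0 - mu1)/(sigma/sqrt(n)))
beta = stats.norm.cdf(c + (mu0 - mu1) / (sigma / np.sqrt(n)))
print(t, c, p_value, 1 - beta)                   # reject H_0 if t > c; 1 - beta is the power
```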

Inference for µ based on the normal model: unknown σ²

Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$ ($\sigma^2$ is unknown). A $100(1-\alpha)\%$ confidence interval for $\mu$ is

$$\left(\bar{X} - t_{1-\frac{\alpha}{2},\,n-1}\frac{S}{\sqrt{n}},\ \bar{X} + t_{1-\frac{\alpha}{2},\,n-1}\frac{S}{\sqrt{n}}\right).$$

The test for $H_0: \mu = \mu_0$ vs. $H_A$ has test statistic $T = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$ and

                              H_A: µ > µ_0        H_A: µ < µ_0       H_A: µ ≠ µ_0
    Rejection region          {T > c}             {T < c}            {|T| > c}
    α (given c)               1 − F_{n−1}(c)      F_{n−1}(c)         2[1 − F_{n−1}(c)]
    c (given α)               t_{1−α, n−1}        t_{α, n−1}         t_{1−α/2, n−1}
    p-value (given T = t)     1 − F_{n−1}(t)      F_{n−1}(t)         2[1 − F_{n−1}(|t|)]

where $F_{n-1}$ is the cdf of the $t$ distribution with $n-1$ degrees of freedom.

Related homework: 2/26
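
A minimal sketch on hypothetical data; scipy's one-sample t test reproduces $T = (\bar X - \mu_0)/(S/\sqrt{n})$ and the two-sided p-value $2[1 - F_{n-1}(|t|)]$:

```python
# One-sample t test (sigma unknown) on made-up data.
import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 10.6, 10.1, 10.4, 9.9, 10.7, 10.3])
mu0 = 10.0

res = stats.ttest_1samp(x, popmean=mu0)                        # two-sided by default
t_manual = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))
print(res.statistic, t_manual, res.pvalue)
# One-sided versions: pass alternative="greater" or alternative="less".
```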

Test for goodness of fit (§9.5)

Setting: assume that the population contains $m$ categories and the probability that a random observation falls in category $i$ is $p_i$. A random sample of size $n$ contains $X_i$ observations of category $i$. (Thus the $X_i$'s follow a multinomial distribution with parameters $n$ and the $p_i$'s.)

Hypotheses: $H_0: p_i = p_i(\theta)$, and $H_A$: $H_0$ is not true. In words, the null hypothesis $H_0$ specifies a model for the $p_i$'s.

The generalized likelihood ratio test has test statistic

$$\Lambda = \frac{\max_{p = p(\theta)} \operatorname{lik}(X_1, \ldots, X_m \mid p)}{\max_{\sum p_i = 1} \operatorname{lik}(X_1, \ldots, X_m \mid p)} = \prod_{i=1}^m \left(\frac{p_i(\hat{\theta})}{\hat{p}_i}\right)^{X_i},$$

where $p = (p_1, \ldots, p_m)$, $\hat{\theta}$ is the MLE for $\theta$, and $\hat{p}_i = X_i/n$ is the MLE for $p_i$ subject to $\sum p_i = 1$; the rejection region is $R = \{\Lambda < c\}$, and $-2\log\Lambda$ is approximately $\chi^2_{df}$ with $df = (m-1) - \dim\theta$.

An equivalent test (Pearson's χ² test) has test statistic

$$X^2 = \sum_{i=1}^m \frac{(O_i - E_i)^2}{E_i},$$

where $O_i = X_i$ represents the observed counts and $E_i = n\,p_i(\hat{\theta})$ represents the expected counts; the rejection region is $R = \{X^2 > c\}$, and $X^2$ is approximately $\chi^2_{df}$ with $df = (m-1) - \dim\theta$.

Related homework: 3/10
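
A minimal sketch (hypothetical die-roll counts) of Pearson's χ² goodness-of-fit test for the fully specified model $H_0: p_i = 1/6$, so $\dim\theta = 0$ and $df = m - 1 = 5$:

```python
# Pearson chi-square goodness-of-fit test for a fair die (made-up counts).
import numpy as np
from scipy import stats

observed = np.array([18, 22, 16, 25, 19, 20])        # O_i, with n = 120
expected = observed.sum() * np.full(6, 1 / 6)        # E_i = n * p_i

x2 = ((observed - expected) ** 2 / expected).sum()   # Pearson X^2
p_value = stats.chi2.sf(x2, df=len(observed) - 1)
print(x2, p_value)
# Equivalently: stats.chisquare(observed, expected)
```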

Inference for µ_X − µ_Y based on the normal model with two independent samples: known σ² (§11.2)

Let $X_1, \ldots, X_n$ be a random sample from $N(\mu_X, \sigma^2)$ and $Y_1, \ldots, Y_m$ be a random sample from $N(\mu_Y, \sigma^2)$, with the $X$'s and $Y$'s independent. (Assume $\sigma^2$ is known.) A $100(1-\alpha)\%$ confidence interval for $\mu_X - \mu_Y$ is

$$\left((\bar{X} - \bar{Y}) - z_{1-\frac{\alpha}{2}}\,\sigma\sqrt{\frac{1}{n} + \frac{1}{m}},\ (\bar{X} - \bar{Y}) + z_{1-\frac{\alpha}{2}}\,\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}\right).$$

Let $\Delta = \mu_X - \mu_Y$. The test for $H_0: \Delta = \Delta_0$ vs. $H_A$ has test statistic

$$T = \frac{(\bar{X} - \bar{Y}) - \Delta_0}{\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}}$$

and

                              H_A: ∆ > ∆_0     H_A: ∆ < ∆_0     H_A: ∆ ≠ ∆_0
    Rejection region          {T > c}          {T < c}          {|T| > c}
    α (given c)               1 − Φ(c)         Φ(c)             2[1 − Φ(c)]
    c (given α)               z_{1−α}          z_α              z_{1−α/2}
    p-value (given T = t)     1 − Φ(t)         Φ(t)             2[1 − Φ(|t|)]
    β (given ∆ = ∆_1 and α)   Ψ(z_{1−α})       1 − Ψ(z_α)       Ψ(z_{1−α/2}) − Ψ(z_{α/2})

where $\Psi(z) = \Phi\left(z + \dfrac{\Delta_0 - \Delta_1}{\sigma\sqrt{1/n + 1/m}}\right)$ and $\Phi$ is the cdf of $N(0,1)$.

Related homework: 3/14

Inference for µ_X − µ_Y based on the normal model with two independent samples: unknown σ² (§11.2)

Let $X_1, \ldots, X_n$ be a random sample from $N(\mu_X, \sigma^2)$ and $Y_1, \ldots, Y_m$ be a random sample from $N(\mu_Y, \sigma^2)$, with the $X$'s and $Y$'s independent. (Assume $\sigma^2$ is unknown.) A $100(1-\alpha)\%$ confidence interval for $\mu_X - \mu_Y$ is

$$\left((\bar{X} - \bar{Y}) - t_{1-\frac{\alpha}{2},\,n+m-2}\,S_p\sqrt{\frac{1}{n} + \frac{1}{m}},\ (\bar{X} - \bar{Y}) + t_{1-\frac{\alpha}{2},\,n+m-2}\,S_p\sqrt{\frac{1}{n} + \frac{1}{m}}\right),$$

where $S_p$ is the pooled sample standard deviation: $S_p^2 = \dfrac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}$.

Let $\Delta = \mu_X - \mu_Y$. The test for $H_0: \Delta = \Delta_0$ vs. $H_A$ has test statistic

$$T = \frac{(\bar{X} - \bar{Y}) - \Delta_0}{S_p\sqrt{\frac{1}{n} + \frac{1}{m}}}$$

and

                              H_A: ∆ > ∆_0          H_A: ∆ < ∆_0          H_A: ∆ ≠ ∆_0
    Rejection region          {T > c}               {T < c}               {|T| > c}
    α (given c)               1 − F_{n+m−2}(c)      F_{n+m−2}(c)          2[1 − F_{n+m−2}(c)]
    c (given α)               t_{1−α, n+m−2}        t_{α, n+m−2}          t_{1−α/2, n+m−2}
    p-value (given T = t)     1 − F_{n+m−2}(t)      F_{n+m−2}(t)          2[1 − F_{n+m−2}(|t|)]

where $F_{n+m-2}$ is the cdf of the $t$ distribution with $n+m-2$ degrees of freedom.

Related homework: 3/14
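
A minimal sketch on made-up samples; the hand computation of the pooled statistic is checked against scipy's equal-variance two-sample t test:

```python
# Pooled two-sample t test, H_0: Delta = mu_X - mu_Y = 0 (made-up data).
import numpy as np
from scipy import stats

x = np.array([5.1, 4.9, 5.4, 5.0, 5.3, 5.2])
y = np.array([4.8, 4.6, 5.0, 4.7, 4.9])
n, m = len(x), len(y)

sp2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)
t_manual = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n + 1 / m))

res = stats.ttest_ind(x, y, equal_var=True)          # pooled (equal-variance) t test
print(t_manual, res.statistic, res.pvalue)
```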

Test for comparing two populations: Wilcoxon rank-sum test (§11.2)

Let $X_1, \ldots, X_n$ be a random sample from a population with cdf $F$, and $Y_1, \ldots, Y_m$ be a random sample from a population with cdf $G$. The hypotheses are $H_0: F = G$ and $H_A: F \ne G$.

Wilcoxon rank-sum test (also called the Mann–Whitney test):

- Order the observations $X_i$ and $Y_j$ jointly, and assign ranks ($1$ through $n+m$) to each observation according to this order. Let $R(Z)$ denote the rank of the observation $Z$. Assume that $m < n$.
- The test statistic is $T_Y = \sum_{j=1}^m R(Y_j)$. The rejection region is $\{T_Y < c_1 \text{ or } T_Y > c_2\}$.
- The distribution of $T_Y$ under $H_0$ can be determined by combinatorics. For example, the pmf of $T_Y$ for $n = m = 2$ is $p(3) = p(4) = p(6) = p(7) = \frac{1}{6}$ and $p(5) = \frac{1}{3}$.
- In practice, we use symmetry and the test statistic $R = \min(R', R'')$, where $R' = T_Y$ and $R'' = m(n+m+1) - R'$ (assuming $m < n$), with rejection region $\{R < c\}$ (Table 8 of the textbook).

Related homework: 3/17
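
A minimal sketch (made-up data) using scipy; mannwhitneyu reports the Mann–Whitney U statistic, a shifted version of the rank sum, so it is equivalent to the rank-sum test described above:

```python
# Wilcoxon rank-sum / Mann-Whitney test, H_0: F = G (made-up data).
import numpy as np
from scipy import stats

x = np.array([1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30])
y = np.array([0.88, 0.65, 0.60, 2.05, 1.06, 1.29, 1.31, 1.01, 1.40])

res = stats.mannwhitneyu(x, y, alternative="two-sided")
print(res.statistic, res.pvalue)
# scipy chooses an exact or normal-approximation null distribution automatically.
```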

Inference for µ_X − µ_Y based on the normal model with a matched-pairs design (§11.3)

Let $X_1, \ldots, X_n$ be a random sample from a population with mean $\mu_X$ and $Y_1, \ldots, Y_n$ be a random sample from a population with mean $\mu_Y$, where $X_i$ and $Y_i$ are paired for each $1 \le i \le n$. The differences $D_i = X_i - Y_i$ can be regarded as a random sample from a population with mean $\mu_X - \mu_Y$. Furthermore, the $D_i$'s are assumed to be a random sample from $N(d, \sigma^2)$, where $d = \mu_X - \mu_Y$. The inference methods for $d$ are exactly the same as in "Inference for µ based on the normal model."

Related homework: 3/21

Test for comparing two populations with a matched-pairs design: signed rank test (§11.3)

Let $X_1, \ldots, X_n$ be a random sample from a population with cdf $F$ and $Y_1, \ldots, Y_n$ be a random sample from a population with cdf $G$, where $X_i$ and $Y_i$ are paired for each $1 \le i \le n$. Let $D_i = X_i - Y_i$ be the differences. The hypotheses are $H_0$: the $D_i$'s are symmetric about $0$, and $H_A$: the $D_i$'s are not symmetric about $0$.

Signed rank test:

- Order the magnitudes of the differences $D_i$, and assign ranks ($1$ through $n$) according to this order. Let $R(D_i)$ denote the rank of $|D_i|$.
- The test statistic is $W_+ = \sum_{i=1}^n \mathbf{1}_{(0,\infty)}(D_i)\, R(D_i)$, where $\mathbf{1}_{(0,\infty)}(x) = 1$ if $x > 0$ and $0$ otherwise. The rejection region is $\{W_+ < c_1 \text{ or } W_+ > c_2\}$.
- The distribution of $W_+$ under $H_0$ can be determined by combinatorics. For example, the pmf of $W_+$ for $n = 2$ is $p(0) = p(1) = p(2) = p(3) = \frac{1}{4}$.
- In practice, we use symmetry and the test statistic $W = \min(W_+, W_-)$, where $W_- = n(n+1)/2 - W_+$, with rejection region $\{W < c\}$ (Table 9 of the textbook).

Related homework: 3/24

One-way ANOVA (§12.2): setting

Consider $I$ groups (populations). For each group, a random sample of size $J$ is drawn. Let $Y_{ij}$ denote the $j$th observation in the $i$th sample. The statistical model is

$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij},$$

where $\mu$ is the overall average of the $I$ groups, $\alpha_i$ is the correction of the $i$th group, and the $\varepsilon_{ij}$'s are i.i.d. $N(0, \sigma^2)$ random variables (errors).

- The sum of squares between groups measures the variation between the $I$ samples:
$$SS_B = J\sum_{i=1}^I (\bar{Y}_i - \bar{Y})^2, \qquad \frac{SS_B}{\sigma^2} \sim \chi^2_{I-1} \text{ if } \alpha_i = 0 \text{ for all } 1 \le i \le I,$$
where $\bar{Y}_i = \frac{1}{J}\sum_{j=1}^J Y_{ij}$ and $\bar{Y} = \frac{1}{IJ}\sum_{i=1}^I\sum_{j=1}^J Y_{ij}$.
- The sum of squares within groups measures the overall variation inside the $I$ samples:
$$SS_W = \sum_{i=1}^I\sum_{j=1}^J (Y_{ij} - \bar{Y}_i)^2, \qquad \frac{SS_W}{\sigma^2} \sim \chi^2_{I(J-1)}.$$
- The total sum of squares measures the overall variation of the $I$ samples:
$$SS_T = \sum_{i=1}^I\sum_{j=1}^J (Y_{ij} - \bar{Y})^2 = SS_B + SS_W.$$

One-way ANOVA (§12.2): F test

The hypotheses of one-way ANOVA are

$H_0: \alpha_i = 0$ for all $1 \le i \le I$  vs.  $H_A$: $H_0$ is false.

The F test:

- Intuition: if the variation between groups ($SS_B$) is large relative to the variation within groups ($SS_W$), this is evidence against $H_0$.
- Test statistic: $F = \dfrac{SS_B/(I-1)}{SS_W/(I(J-1))}$, and $F \sim F(I-1,\ I(J-1))$ under $H_0$.
- Rejection region: $R = \{F > c\}$ with $c = F_{1-\alpha}(I-1,\ I(J-1))$, where $\alpha$ is the significance level.

The ANOVA table (a computational sketch follows below):

    Source            df          Sum of Squares    Mean Square              F
    Between groups    I − 1       SS_B              MS_B = SS_B/(I − 1)      F = MS_B/MS_W
    Within groups     I(J − 1)    SS_W              MS_W = SS_W/(I(J − 1))
    Total             IJ − 1      SS_T

Related homework: 4/2, 3/31
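
A minimal computational sketch (made-up balanced groups with I = 3, J = 4) of the sums of squares and the F statistic, checked against scipy's f_oneway:

```python
# One-way ANOVA by hand and with scipy (made-up balanced data, I = 3, J = 4).
import numpy as np
from scipy import stats

groups = [np.array([6.1, 5.9, 6.3, 6.0]),
          np.array([6.6, 6.4, 6.8, 6.5]),
          np.array([5.8, 5.7, 6.0, 5.9])]
I, J = len(groups), len(groups[0])

grand = np.mean(np.concatenate(groups))
ss_b = J * sum((g.mean() - grand) ** 2 for g in groups)          # between-group SS
ss_w = sum(((g - g.mean()) ** 2).sum() for g in groups)          # within-group SS
f_manual = (ss_b / (I - 1)) / (ss_w / (I * (J - 1)))

res = stats.f_oneway(*groups)
print(f_manual, res.statistic, res.pvalue)
```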

Application of the χ² test (§13.3): test of homogeneity

Consider $I$ populations, each containing $J$ categories. A random sample of size $N$ is drawn from these populations:

                  Population 1   ···   Population I   Total
    Category 1    n_{11}         ···   n_{I1}         n_{·1}
      ⋮            ⋮                    ⋮              ⋮
    Category J    n_{1J}         ···   n_{IJ}         n_{·J}
    Total         n_{1·}         ···   n_{I·}         n_{··}

where $n_{ij}$ is the number of observations of category $j$ from the $i$th population, $n_{i\cdot} = \sum_j n_{ij}$, $n_{\cdot j} = \sum_i n_{ij}$, and $N = n_{\cdot\cdot} = \sum_i\sum_j n_{ij}$.

Let $p_{ij}$ be the proportion of category $j$ in population $i$. The hypotheses of the test of homogeneity are

$H_0: p_{1j} = \cdots = p_{Ij}$ for all $1 \le j \le J$  vs.  $H_A$: $H_0$ is false.

The χ² test (a test of goodness of fit):

- Test statistic: $X^2 = \sum_{i=1}^I\sum_{j=1}^J \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$, and $X^2 \approx \chi^2_{(I-1)(J-1)}$ under $H_0$, where $O_{ij} = n_{ij}$ and $E_{ij} = \dfrac{n_{i\cdot}\, n_{\cdot j}}{n_{\cdot\cdot}}$.
- Rejection region: $R = \{X^2 > c\}$ with $c = \chi^2_{1-\alpha,\,(I-1)(J-1)}$, where $\alpha$ is the significance level.

Related homework: 4/9
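
A minimal sketch (made-up counts) using scipy's chi2_contingency, which computes $X^2$, the degrees of freedom $(I-1)(J-1)$, and the expected counts $E_{ij} = n_{i\cdot} n_{\cdot j}/n_{\cdot\cdot}$; the same call also carries out the test of independence of the next slide:

```python
# Chi-square test of homogeneity on a made-up table (rows = categories here).
import numpy as np
from scipy import stats

table = np.array([[30, 45, 25],      # category 1 counts across 3 populations
                  [20, 35, 45]])     # category 2 counts across 3 populations

x2, p_value, df, expected = stats.chi2_contingency(table, correction=False)
print(x2, df, p_value)
print(expected)                      # E_ij under H_0
```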

Application of the χ² test (§13.4): test of independence

Consider two discrete random variables $U$ and $V$. $U$ has $I$ possible values, with marginal pmf $P(U = u_i) = p_i$; $V$ has $J$ possible values, with marginal pmf $P(V = v_j) = q_j$. A random sample of size $N$ is drawn from the population:

             u_1       ···   u_I       Total
    v_1      n_{11}    ···   n_{I1}    n_{·1}
     ⋮        ⋮                ⋮         ⋮
    v_J      n_{1J}    ···   n_{IJ}    n_{·J}
    Total    n_{1·}    ···   n_{I·}    n_{··}

where $n_{ij}$ is the number of observations of the pair $(u_i, v_j)$, $n_{i\cdot} = \sum_j n_{ij}$, $n_{\cdot j} = \sum_i n_{ij}$, and $N = n_{\cdot\cdot} = \sum_i\sum_j n_{ij}$.

Let the joint pmf be $P(U = u_i, V = v_j) = \pi_{ij}$. The hypotheses of the test of independence are

$H_0: \pi_{ij} = p_i q_j$ for all $1 \le i \le I$, $1 \le j \le J$  vs.  $H_A$: $H_0$ is false.

The χ² test (a test of goodness of fit):

- Test statistic: $X^2 = \sum_{i=1}^I\sum_{j=1}^J \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$, and $X^2 \approx \chi^2_{(I-1)(J-1)}$ under $H_0$, where $O_{ij} = n_{ij}$ and $E_{ij} = \dfrac{n_{i\cdot}\, n_{\cdot j}}{n_{\cdot\cdot}}$.
- Rejection region: $R = \{X^2 > c\}$ with $c = \chi^2_{1-\alpha,\,(I-1)(J-1)}$, where $\alpha$ is the significance level.

Related homework: 4/9

Simple linear regression (§14.1)

The statistical model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where the $\varepsilon_i$'s are i.i.d. $N(0, \sigma^2)$ random variables (errors), and $\beta_0$, $\beta_1$, and the $x_i$'s are nonrandom constants. Given sample data $(x_1, y_1), \ldots, (x_n, y_n)$, we use the least squares principle to find estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ for $\beta_0$ and $\beta_1$, respectively; that is, we minimize the residual sum of squares (RSS):

$$\mathrm{RSS} = \sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2.$$

Consequently, we have

$$\hat{\beta}_0 = \frac{\left(\sum_{i=1}^n x_i^2\right)\left(\sum_{i=1}^n y_i\right) - \left(\sum_{i=1}^n x_i\right)\left(\sum_{i=1}^n x_i y_i\right)}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2},
\qquad
\hat{\beta}_1 = \frac{n\sum_{i=1}^n x_i y_i - \left(\sum_{i=1}^n x_i\right)\left(\sum_{i=1}^n y_i\right)}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}.$$

Related homework: 4/11, 4/9
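
A minimal sketch (made-up data) of the least squares formulas above, checked against numpy's polyfit:

```python
# Least squares estimates for simple linear regression (made-up data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(x)

den = n * (x ** 2).sum() - x.sum() ** 2
b1 = (n * (x * y).sum() - x.sum() * y.sum()) / den                 # beta_1 hat
b0 = ((x ** 2).sum() * y.sum() - x.sum() * (x * y).sum()) / den    # beta_0 hat

print(b0, b1)
print(np.polyfit(x, y, deg=1))       # [slope, intercept] for comparison
```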

Simple linear regression (§14.1), continued

The least squares estimates $\hat{\beta}_0$, $\hat{\beta}_1$ are unbiased, that is, $E[\hat{\beta}_0] = \beta_0$ and $E[\hat{\beta}_1] = \beta_1$. Furthermore,

$$\operatorname{Var}[\hat{\beta}_0] = \frac{\sigma^2 \sum_{i=1}^n x_i^2}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2},
\qquad
\operatorname{Var}[\hat{\beta}_1] = \frac{n\sigma^2}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}.$$

The error variance $\sigma^2$ can be estimated by $s^2 = \dfrac{\mathrm{RSS}}{n-2}$, where

$$\mathrm{RSS} = \sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2 - \frac{\left[n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)\right]^2}{n\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]}.$$

Consequently, the estimated variances of $\hat{\beta}_0$ and $\hat{\beta}_1$ are

$$s_{\hat{\beta}_0}^2 = \frac{s^2 \sum_{i=1}^n x_i^2}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2},
\qquad
s_{\hat{\beta}_1}^2 = \frac{n s^2}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}.$$

Moreover,

$$\frac{\hat{\beta}_0 - \beta_0}{s_{\hat{\beta}_0}} \sim t_{n-2}, \qquad \frac{\hat{\beta}_1 - \beta_1}{s_{\hat{\beta}_1}} \sim t_{n-2}.$$

Related homework: 4/14