Quick Review on Linear Multiple Regression
Mei-Yuan Chen
Department of Finance, National Chung Hsing University
March 6, 2007
Introduction to Conditional Mean Modeling

Suppose random variables Y, X_1, X_2, ..., X_k are considered and the conditional mean of Y given X_1, X_2, ..., X_k, E(Y | X_1, X_2, ..., X_k), is of interest. Knowing E(Y | X_1, X_2, ..., X_k), the average behavior of Y conditional on specific realizations of X_1, X_2, ..., X_k can be observed. Moreover, the response of the average behavior of Y when one set of specific realizations of X_1, X_2, ..., X_k changes to another set can be analyzed. For example, the change from E(Y | X_1 = x_1, X_2 = x_2, ..., X_k = x_k) to E(Y | X_1 = x_1, X_2 = x_2 + Δ, ..., X_k = x_k) can be treated as the pure effect on the average value of Y of X_2 changing from x_2 to x_2 + Δ.
Denote m(x_1, x_2, ..., x_k) = E(Y | X_1 = x_1, X_2 = x_2, ..., X_k = x_k). For simplification, some functional form is assumed for m(x_1, x_2, ..., x_k), say a linear or nonlinear parametric functional form. Of course, m(x_1, x_2, ..., x_k) can also be treated nonparametrically. The goal of econometric analysis is to estimate and draw inferences about m(x_1, x_2, ..., x_k) using a collection of sample observations {y_t, x_{t1}, x_{t2}, ..., x_{tk}, t = 1, ..., T}, where T is the total number of sample observations.
Linear Multiple Regression

Suppose m(x_1, x_2, ..., x_k) = β_{10} x_1 + β_{20} x_2 + ... + β_{k0} x_k is assumed. Any realization (y, x_1, x_2, ..., x_k) can then be represented as

y = E(Y | X_1 = x_1, X_2 = x_2, ..., X_k = x_k) + e = β_{10} x_1 + β_{20} x_2 + ... + β_{k0} x_k + e,

where e is the difference between y and the conditional mean. Therefore, given a collection of sample observations {y_t, x_{t1}, x_{t2}, ..., x_{tk}, t = 1, ..., T}, the linear regression model is formulated as

y_t = β_{10} x_{t1} + β_{20} x_{t2} + ... + β_{k0} x_{tk} + e_t,  t = 1, ..., T.  (1)

The term e_t is called the regression error.
OLS Estimator

The linear regression model in (1) can be written in matrix notation as

y_{T×1} = X_{T×k} β_{0,k×1} + e_{T×1},

where k ≤ T. We want to find a k-dimensional regression hyperplane that best fits the data (y, X). Different estimators are obtained according to different definitions of "best." The least squares estimator defines the best fit as minimizing the squared deviations of the observed y_t from the fitted values ŷ_t. The maximum likelihood estimator defines the best fit as maximizing the likelihood value.
Least Squares Estimator

Denote the averaged squared deviation of the observed y_t from candidate fitted values ŷ_t as Q(β) = (y − Xβ)'(y − Xβ)/T. The least squares estimator for β_0 is obtained by solving

min_{β ∈ R^k} Q(β) = (y − Xβ)'(y − Xβ)/T.

The FOCs, also called the normal equations, are

∇_β Q(β) = ∇_β (y'y − 2y'Xβ + β'X'Xβ)/T = −2X'(y − Xβ)/T = 0,

and the resulting OLS estimator is β̂_T = (X'X)^{-1} X'y.
The second-order condition is satisfied because X'X is positive definite. The vector of fitted values is ŷ = X β̂_T = Py, where P = X(X'X)^{-1}X', and the vector of regression residuals is ê = y − ŷ = (I_T − P)y. By the normal equations, X'ê = 0, so that ŷ'ê = 0. When X contains a constant term, we also have Σ_{t=1}^T ê_t = 0 and Σ_{t=1}^T y_t = Σ_{t=1}^T ŷ_t.
Geometrically, ŷ = Py is the orthogonal projection of y onto the k-dimensional space span(X), the space spanned by the column vectors of X, and ê = (I_T − P)y is the orthogonal projection of y onto span(X)^⊥, the orthogonal complement of span(X). Consequently, ŷ is the best approximation of y in span(X) in the sense that ‖y − ŷ‖ ≤ ‖y − z‖ for all z ∈ span(X).
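The algebra above can be checked numerically. The following sketch (with made-up data and variable names of my own choosing) computes the OLS estimator from the normal equations and verifies the orthogonality properties X'ê = 0 and ŷ'ê = 0:

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 50, 3
# design matrix with a constant term and two random regressors
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
beta0 = np.array([1.0, 2.0, -0.5])
y = X @ beta0 + rng.normal(size=T)

# OLS estimator from the normal equations: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# projection matrix P = X (X'X)^{-1} X', fitted values, residuals
P = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = P @ y
e_hat = y - y_hat            # equals (I - P) y

# normal equations imply X'e_hat = 0, hence y_hat'e_hat = 0;
# with a constant regressor the residuals also sum to zero
print(np.abs(X.T @ e_hat).max())
```

In practice one solves the normal equations directly (as above) rather than forming the T × T matrix P, which is built here only to illustrate the projection geometry.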
Properties of OLS Estimator under Classical Assumptions

We first make the following classical assumptions on (y, X) and e:
[A1] y = Xβ_0 + e, with ‖β_0‖ < ∞, is the correct model.
[A2] X is a T × k nonstochastic and finite matrix.
[A3] X'X is nonsingular for all T ≥ k. (X is of full column rank.)
[A4] e is a random vector such that E(e) = 0.
[A4'] e is a random vector such that E(e) = 0 and E(ee') = σ_0² I_T, where σ_0² < ∞.
[A5] e ~ N(0, σ_0² I_T), where σ_0² < ∞.
(1) Given assumptions [A1-A3], β̂_T and σ̂_T² exist and are unique. (2) Given assumptions [A1-A4], β̂_T is unbiased. (3) Given assumptions [A1-A3] and [A4'],

var(β̂_T) = E[(β̂_T − E β̂_T)(β̂_T − E β̂_T)'] = E[(X'X)^{-1} X'ee'X (X'X)^{-1}] = σ_0² (X'X)^{-1}.
(4) Gauss-Markov Result: Given assumptions [A1-A3] and [A4'], β̂_T is the best linear unbiased estimator (BLUE) of β_0. (5) Given assumptions [A1-A3] and [A4'], σ̂_T² = ê'ê/(T − k) is an unbiased estimator for σ_0². (6) If we assume [A5] instead of [A4'], β̂_T is the maximum likelihood estimator (MLE), but the MLE for σ_0², namely σ̃_T² = ê'ê/T, is a biased estimator. (7) Given assumptions [A1-A3] and [A5], β̂_T and σ̂_T² are the minimum variance unbiased estimators (MVUE).
Goodness of Fit

A natural measure is the regression variance σ̂_T² = ê'ê/(T − k). Some relative measures: (1) the coefficient of determination, non-centered R²; (2) the coefficient of determination, centered R²; (3) the adjusted R², denoted R̄²:

R̄² = 1 − [ê'ê/(T − k)] / [(y'y − T ȳ_T²)/(T − 1)] = 1 − (T − 1)/(T − k) (1 − R²) = R² − (k − 1)/(T − k) (1 − R²).
Three alternatives that have been proposed for comparing models are:

1. R̃² = 1 − (T + k)/(T − k) (1 − R²), which minimizes Amemiya's prediction criterion,

PC = [ê'ê/(T − k)] (1 + k/T) = σ̂_T² (1 + k/T).

2. Akaike's information criterion (AIC):

AIC = ln(ê'ê/T) + 2k/T = ln σ̃_T² + 2k/T.

3. Schwarz's information criterion (SIC):

SIC = ln σ̃_T² + k ln(T)/T.
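As an illustration (simulated data; the variable names are mine), the centered R², the adjusted R̄², and the two information criteria can be computed directly from the residuals:

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 60, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat
ess = e_hat @ e_hat                          # e'e (residual sum of squares)

# centered R^2 and adjusted R^2
tss = ((y - y.mean()) ** 2).sum()
R2 = 1 - ess / tss
R2_adj = 1 - (T - 1) / (T - k) * (1 - R2)

# information criteria with the MLE variance sigma_tilde^2 = e'e/T
sigma2_mle = ess / T
AIC = np.log(sigma2_mle) + 2 * k / T
SIC = np.log(sigma2_mle) + k * np.log(T) / T
```

Note that SIC penalizes additional regressors more heavily than AIC whenever ln(T) > 2, i.e., for T > e² ≈ 7.4.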
Sampling Distribution of OLS Estimator under Classical Assumptions

Given [A5]: e ~ N(0, σ_0² I_T), the following distributions are immediate:

y | X ~ N(Xβ_0, σ_0² I_T);
β̂_T | X ~ N(β_0, σ_0² (X'X)^{-1});
ê | X = (I_T − P)e ~ N(0, σ_0² (I_T − P)).

As (T − k) σ̂_T²/σ_0² = ê'ê/σ_0²,

(T − k) σ̂_T²/σ_0² ~ χ²(T − k),

with mean (T − k) and variance 2(T − k). Hence, σ̂_T² has mean σ_0² and variance 2σ_0⁴/(T − k).
Testing Linear Hypotheses

H_0: Rβ_0 = r, where R is a q × k nonstochastic matrix with rank q, and r is a vector of pre-specified real values. Then

[R(X'X)^{-1}R']^{-1/2} (Rβ̂_T − r)/σ_0 ~ N(0, I_q),
(Rβ̂_T − r)' [R(X'X)^{-1}R']^{-1} (Rβ̂_T − r)/σ_0² ~ χ²(q).

Recall that (T − k) σ̂_T²/σ_0² ~ χ²(T − k). Hence, as a ratio of independent χ²(q)/q and χ²(T − k)/(T − k) variables,

φ = [{(Rβ̂_T − r)' [R(X'X)^{-1}R']^{-1} (Rβ̂_T − r)/σ_0²}/q] / [{(T − k) σ̂_T²/σ_0²}/(T − k)]
  = (Rβ̂_T − r)' [R(X'X)^{-1}R']^{-1} (Rβ̂_T − r) / (q σ̂_T²)
  ~ F(q, T − k).
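A hedged numerical sketch of the statistic φ (hypothetical data and restriction of my own choosing; the F(q, T − k) reference distribution itself is not computed here):

```python
import numpy as np

rng = np.random.default_rng(2)
T, k, q = 80, 3, 2
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat
sigma2_hat = e_hat @ e_hat / (T - k)        # unbiased variance estimator

# H0: beta_2 = 1 and beta_3 = -1, i.e., R beta_0 = r with R selecting the slopes
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
r = np.array([1.0, -1.0])

d = R @ beta_hat - r
middle = np.linalg.inv(R @ np.linalg.inv(X.T @ X) @ R.T)
phi = d @ middle @ d / (q * sigma2_hat)     # ~ F(q, T - k) under H0 and [A5]
```

The statistic is compared with an F(q, T − k) critical value in the usual way.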
An Alternative Approach

Given the constraint Rβ_0 = r, the constrained OLS estimator can be obtained by minimizing the Lagrangian:

min_β (y − Xβ)'(y − Xβ)/T + (Rβ − r)'λ,

where λ is the Lagrange multiplier. The FOCs are

−2X'(y − Xβ)/T + R'λ = 0,
Rβ − r = 0.

The FOCs can be written as

[ 2X'X/T  R' ] [ β ]   [ 2X'y/T ]
[ R       0  ] [ λ ] = [ r      ].
We can solve:

λ̈_T = 2[R(X'X/T)^{-1}R']^{-1} (Rβ̂_T − r),
β̈_T = β̂_T − (X'X/T)^{-1} R' λ̈_T / 2.

β̈_T is called the constrained OLS estimator of β_0. Note that the vector of constrained OLS residuals is

ë = y − X β̈_T = y − X β̂_T + X(β̂_T − β̈_T) = ê + X(β̂_T − β̈_T);
ë'ë = ê'ê + (β̂_T − β̈_T)' X'X (β̂_T − β̈_T) = ê'ê + (Rβ̂_T − r)' [R(X'X)^{-1}R']^{-1} (Rβ̂_T − r),

since β̂_T − β̈_T = (X'X/T)^{-1} R' [R(X'X/T)^{-1}R']^{-1} (Rβ̂_T − r). Thus

ë'ë − ê'ê = (Rβ̂_T − r)' [R(X'X)^{-1}R']^{-1} (Rβ̂_T − r)

is the numerator term in the F-test statistic φ.
φ = (ë'ë − ê'ê)/(q σ̂_T²) = (ESS_c − ESS_u)/(q σ̂_T²) = [(ESS_c − ESS_u)/q] / [ESS_u/(T − k)] = [(R_u² − R_c²)/q] / [(1 − R_u²)/(T − k)],
where the subscripts c and u signify the constrained and unconstrained models, respectively. In other words, the F-test can be interpreted as a test of the loss of fit, because it compares the performance of the constrained and unconstrained models. In particular, if we want to test whether all the coefficients (except the constant term) equal zero, then R_c² = 0, so that

φ = [R_u²/(k − 1)] / [(1 − R_u²)/(T − k)] ~ F(k − 1, T − k).
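To illustrate the loss-of-fit interpretation, the sketch below (simulated data of my own) tests that all slope coefficients are zero and checks that the R²-based formula agrees with the residual-sum-of-squares version:

```python
import numpy as np

rng = np.random.default_rng(3)
T, k = 100, 4
X = np.column_stack([np.ones(T), rng.normal(size=(T, 3))])
y = X @ np.array([0.2, 0.7, 0.0, -0.4]) + rng.normal(size=T)

# unconstrained fit
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_u = y - X @ beta_hat
ess_u = e_u @ e_u

# constrained fit: constant only (all slopes zero), so R_c^2 = 0
e_c = y - y.mean()
ess_c = e_c @ e_c

q = k - 1
phi_rss = ((ess_c - ess_u) / q) / (ess_u / (T - k))

# equivalent expression through the centered R^2 of the unconstrained model
R2_u = 1 - ess_u / ess_c
phi_r2 = (R2_u / (k - 1)) / ((1 - R2_u) / (T - k))
```

The two routes are algebraically identical because the centered total sum of squares equals the RSS of the constant-only regression.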
Asymptotic Properties of the OLS Estimator

β̂_T = (T^{-1} Σ_{t=1}^T x_t x_t')^{-1} (T^{-1} Σ_{t=1}^T x_t y_t)
     = (T^{-1} Σ_{t=1}^T x_t x_t')^{-1} (T^{-1} Σ_{t=1}^T x_t x_t') β_0 + (T^{-1} Σ_{t=1}^T x_t x_t')^{-1} (T^{-1} Σ_{t=1}^T x_t e_t)
     = β_0 + (T^{-1} Σ_{t=1}^T x_t x_t')^{-1} (T^{-1} Σ_{t=1}^T x_t e_t).
Asymptotic Normality of OLS Estimator: IID Observations
Kolmogorov's Theorem: Let {Z_t} be a sequence of i.i.d. random variables and Z̄_T ≡ T^{-1} Σ_{t=1}^T Z_t. Then Z̄_T → μ a.s. if and only if E|Z_t| < ∞ and E(Z_t) = μ.
Lindeberg-Lévy Central Limit Theorem: Let {Z_t} be a sequence of i.i.d. random scalars. If var(Z_t) ≡ σ² < ∞ and σ² ≠ 0, then

√T (Z̄_T − μ̄_T)/σ̄_T = √T (Z̄_T − μ)/σ = T^{-1/2} Σ_{t=1}^T (Z_t − μ)/σ →^A N(0, 1).
Asymptotic Normality under the IID Case: Given
[B1] y = Xβ_0 + e;
[B2] {(x_t', e_t)'} is an i.i.d. sequence;
[B3] (i) E(x_t e_t) = 0; (ii) E|x_{ti} e_t|² < ∞, i = 1, ..., k; (iii) V_T ≡ var(T^{-1/2} X'e) = V is positive definite;
[B4] (i) E|x_{ti}|² < ∞, i = 1, ..., k; (ii) M ≡ E(x_t x_t') is positive definite.
Then D^{-1/2} √T (β̂_T − β_0) →^A N(0, I), where D ≡ M^{-1} V M^{-1}.
Suppose in addition that
[B5] there exists V̂_T, symmetric and positive semidefinite, such that V̂_T − V →^p 0.
Then D̂_T − D →^p 0, where D̂_T = (X'X/T)^{-1} V̂_T (X'X/T)^{-1}.
Asymptotic Normality of OLS: Independent Heterogeneous Observations
Markov's SLLN: Let {Z_t} be a sequence of independent random variables with E(Z_t) = μ_t < ∞. If for some δ > 0, Σ_{t=1}^∞ E|Z_t − μ_t|^{1+δ}/t^{1+δ} < ∞, then Z̄_T − μ̄_T → 0 a.s.
Lindeberg-Feller CLT: Let {Z_t} be a sequence of independent random scalars with E(Z_t) = μ_t, var(Z_t) ≡ σ_t² < ∞, σ_t² ≠ 0, and distribution functions F_t(z). Then √T (Z̄_T − μ̄_T)/σ̄_T →^A N(0, 1) provided that, for every ε > 0,

lim_{T→∞} σ̄_T^{-2} T^{-1} Σ_{t=1}^T ∫_{(z − μ_t)² > ε T σ̄_T²} (z − μ_t)² dF_t(z) = 0.

The last condition of this result is called the Lindeberg condition.
Liapounov's CLT: Let {Z_t} be a sequence of independent random scalars with E(Z_t) = μ_t, var(Z_t) = σ_t², σ_t² ≠ 0, and E|Z_t − μ_t|^{2+δ} < Δ < ∞ for some δ > 0 and all t. If σ̄_T² > δ' > 0 for all T sufficiently large, then √T (Z̄_T − μ̄_T)/σ̄_T →^A N(0, 1).
Asymptotic Normality: Independent Heterogeneous Observations. Suppose that the following conditions hold:
[B1] y_t = x_t'β_0 + e_t, t = 1, ..., T;
[B2] {(x_t', e_t)'} is an independent sequence;
[B3] (i) E(x_t e_t) = 0 for all t; (ii) E|x_{ti} e_t|^{2+δ} < ∞ for some δ > 0 and all i = 1, ..., k and all t; (iii) V_T ≡ var(X'e/T^{1/2}) is uniformly positive definite;
[B4] (i) E|x_{ti}²|^{1+δ} < ∞ for some δ > 0 and all i = 1, ..., k and all t; (ii) M_T ≡ E(X'X/T) is uniformly positive definite.
Then D_T^{-1/2} √T (β̂_T − β_0) →^A N(0, I), where D_T = M_T^{-1} V_T M_T^{-1}.
Further suppose:
[B5] there exists V̂_T, p.s.d. and symmetric, such that V̂_T − V_T →^p 0.
Then, with D̂_T = (X'X/T)^{-1} V̂_T (X'X/T)^{-1}, we have D̂_T − D_T →^p 0.
Large Sample Tests

We consider various large sample tests for the linear hypothesis Rβ_0 = r, where R is a q × k nonstochastic matrix with rank q ≤ k.
Wald Test

Let Γ_T = R D_T R' = R M_T^{-1} V_T M_T^{-1} R'. Then under the null hypothesis,

Γ_T^{-1/2} √T (Rβ̂_T − r) →^A N(0, I),

and the Wald statistic is

W_T = T (Rβ̂_T − r)' Γ̂_T^{-1} (Rβ̂_T − r) →^A χ²(q),

where Γ̂_T = R D̂_T R' = R (X'X/T)^{-1} V̂_T (X'X/T)^{-1} R'.
Lagrange Multiplier Test

Given the constraint Rβ = r, the constrained OLS estimator is obtained by minimizing the Lagrangian

(y − Xβ)'(y − Xβ)/T + (Rβ − r)'λ,

where λ is the Lagrange multiplier. Intuitively, when the null hypothesis is true (i.e., the constraint is valid), the shadow price λ of the constraint should be low. Hence, whether the shadow price is close to zero is evidence for or against the hypothesis. The Lagrange multiplier (LM) test can be interpreted as a test of λ = 0.
λ̈_T = 2[R(X'X/T)^{-1}R']^{-1} (Rβ̂_T − r),
β̈_T = β̂_T − (X'X/T)^{-1} R' λ̈_T / 2,

where β̈_T is the constrained OLS estimator, and λ̈_T is the basis of the LM test.
Suppose that the asymptotic normality of β̂_T holds. Then

Λ_T^{-1/2} √T λ̈_T →^A N(0, I),

where Λ_T = 4 (R M_T^{-1} R')^{-1} Γ_T (R M_T^{-1} R')^{-1}. The LM statistic is

LM_T = T λ̈_T' Λ̂_T^{-1} λ̈_T →^A χ²(q),

where

Λ̂_T = 4 [R(X'X/T)^{-1}R']^{-1} [R(X'X/T)^{-1} V̈_T (X'X/T)^{-1} R'] [R(X'X/T)^{-1}R']^{-1},

and V̈_T is an estimator of V_T obtained from the constrained regression such that V̈_T − V_T →^p 0 under the null.
If V̂_T replaces V̈_T in Λ̂_T, then

LM_T = 4T (Rβ̂_T − r)' [R(X'X/T)^{-1}R']^{-1} Λ̂_T^{-1} [R(X'X/T)^{-1}R']^{-1} (Rβ̂_T − r) = T (Rβ̂_T − r)' Γ̂_T^{-1} (Rβ̂_T − r) = W_T.

This suggests that the two tests are asymptotically equivalent under the null hypothesis, i.e., W_T − LM_T →^p 0.
Test of s coefficients being zero: R = [0 I_s], so that Rβ_0 = 0. Accordingly, the original model can be written as y = X_1 b_{10} + X_2 b_{20} + e, where X_1 and X_2 are T × (k − s) and T × s matrices, respectively. Clearly, the constrained model is y = X_1 b_{10} + e, so that the constrained OLS estimator is β̈_T = (b̈_{1T}', 0')', where b̈_{1T} = (X_1'X_1)^{-1} X_1'y, and the constrained OLS residual vector is ë = y − X_1 b̈_{1T}.
Writing P_1 = X_1 (X_1'X_1)^{-1} X_1', it is easily verified by the matrix inversion formula that

R(X'X)^{-1} = [ −[X_2'(I − P_1)X_2]^{-1} X_2'X_1 (X_1'X_1)^{-1}   [X_2'(I − P_1)X_2]^{-1} ],
R(X'X)^{-1}R' = [X_2'(I − P_1)X_2]^{-1},
R(X'X)^{-1}X' = [X_2'(I − P_1)X_2]^{-1} X_2'(I − P_1).

Hence λ̈_T = 2X_2'(I − P_1)ë/T = 2X_2'ë/T, and

Λ̂_T = 4 [R(X'X/T)^{-1}R']^{-1} [R(X'X/T)^{-1} V̈_T (X'X/T)^{-1} R'] [R(X'X/T)^{-1}R']^{-1}
     = 4 [−X_2'X_1 (X_1'X_1)^{-1}  I_s] V̈_T [−X_2'X_1 (X_1'X_1)^{-1}  I_s]'.
The LM statistic is thus

LM_T = (T/4) λ̈_T' { [−X_2'X_1 (X_1'X_1)^{-1}  I_s] V̈_T [−X_2'X_1 (X_1'X_1)^{-1}  I_s]' }^{-1} λ̈_T.

When V̈_T = σ̈_T² (X'X/T) is consistent for V_T, where σ̈_T² = Σ_{t=1}^T ë_t²/T, the LM statistic can be further simplified as

LM_T = ë'X (X'X)^{-1} X'ë / (ë'ë/T) = T R²,

where R² is the (non-centered) R² from regressing ë on X.
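The TR² form can be sketched as follows (simulated data of my own, testing that the coefficient on a single extra regressor is zero, s = 1). The auxiliary-regression route and the direct quadratic form agree:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 120
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])   # included regressors
X2 = rng.normal(size=(T, 1))                             # regressor under test
X = np.hstack([X1, X2])
y = X1 @ np.array([0.5, 1.0]) + rng.normal(size=T)       # H0 true: coef on X2 is 0

# constrained OLS: regress y on X1 only
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y)
e_con = y - X1 @ b1                                      # constrained residuals

# auxiliary regression of the constrained residuals on the full X
g = np.linalg.solve(X.T @ X, X.T @ e_con)
fitted = X @ g
R2_nc = (fitted @ fitted) / (e_con @ e_con)              # non-centered R^2
LM = T * R2_nc

# direct quadratic form: e'X (X'X)^{-1} X'e / (e'e / T)
LM_direct = (e_con @ fitted) / (e_con @ e_con / T)
```

Under the null, LM is compared with a χ²(s) critical value.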
Likelihood Ratio Test

When the e_t are i.i.d. N(0, σ_0²), we have learned that the OLS estimator is also the MLE, maximizing

L_T(β, σ²) = −(T/2) log(2π) − (T/2) log(σ²) − (1/(2σ²)) Σ_{t=1}^T (y_t − x_t'β)².

Let β̈_T (β̂_T) be the constrained (unconstrained) MLE of β_0 and

σ̈_T² = (1/T) Σ_{t=1}^T ë_t²,  σ̃_T² = (1/T) Σ_{t=1}^T ê_t².
The likelihood ratio (LR) test is based on the log likelihood ratio:

LR_T = −2 [L_T(β̈_T, σ̈_T²) − L_T(β̂_T, σ̃_T²)] = T log(σ̈_T²/σ̃_T²).

If the null hypothesis is true, the likelihood ratio is close to one, so that LR_T is close to zero; otherwise, LR_T is positive.
As

σ̈_T² = σ̃_T² + (β̂_T − β̈_T)'(X'X/T)(β̂_T − β̈_T) = σ̃_T² + (Rβ̂_T − r)'[R(X'X/T)^{-1}R']^{-1}(Rβ̂_T − r),

we have

LR_T = T log(1 + z_T), where z_T := (Rβ̂_T − r)'[R(X'X/T)^{-1}R']^{-1}(Rβ̂_T − r)/σ̃_T².
By noting that the mean value expansion of log(1 + z) about z = 0 is (1 + z*)^{-1} z, where z* lies between z and 0, we can write

LR_T = T (1 + z_T*)^{-1} z_T = T (Rβ̂_T − r)'[R(X'X/T)^{-1}R']^{-1}(Rβ̂_T − r)/σ̃_T² + o_P(1),

where the leading term is nothing but the Wald statistic with V̂_T = σ̃_T²(X'X/T). We immediately have the following result.
Suppose that σ̃_T²(X'X/T) is consistent for V_T. Then under the null hypothesis, LR_T →^A χ²(q). Therefore, the Wald, LR, and LM tests are asymptotically equivalent provided that σ̃_T²(X'X/T) is consistent for V_T. If σ̃_T²(X'X/T) is not consistent for V_T, LR_T need not have a limiting χ² distribution. Thus, the LR test is not robust to heteroskedasticity and serial correlation, whereas the Wald and LM tests are robust if V_T is estimated properly.
Conflict Among Tests

If σ_0² is known, it can be seen that

LR_T = Σ_{t=1}^T (ë_t² − ê_t²)/σ_0² = W_T.

We have also learned that the Wald and LM tests differ only in the asymptotic covariance matrix estimator used in the statistics. It follows that when σ_0² is known, LM_T = W_T = LR_T. When σ_0² is unknown, the statistics differ according to which variance estimator is plugged in: W_T = LR_T(σ̃_T²) and LM_T = LR_T(σ̈_T²).
Observe that

LR_T − LM_T = LR_T − LR_T(σ̈_T²) = 2[L_T(β̂_T, σ̃_T²) − L_T(β̂_T, σ̈_T²)] ≥ 0,

since σ̃_T² maximizes L_T(β̂_T, σ²) over σ², and that

W_T − LR_T = LR_T(σ̃_T²) − LR_T = 2[L_T(β̈_T, σ̈_T²) − L_T(β̈_T, σ̃_T²)] ≥ 0,

since σ̈_T² maximizes L_T(β̈_T, σ²) over σ². We have thus established an inequality in finite samples: W_T ≥ LR_T ≥ LM_T.
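With V̂_T = σ̃_T²(X'X/T) and V̈_T = σ̈_T²(X'X/T), the three statistics reduce to W_T = Tz, LR_T = T log(1 + z), and LM_T = Tz/(1 + z) for some z ≥ 0, so the inequality can be verified numerically (simulated data and restriction of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 90
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([0.3, 0.9, 0.1]) + rng.normal(size=T)

# unconstrained OLS and its MLE variance estimate (sigma_tilde^2)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
sig2_u = ((y - X @ beta_hat) ** 2).mean()

# restriction H0: both slopes are zero, so the constrained fit is constant-only
# and sigma_ddot^2 is the variance of the demeaned y
sig2_c = ((y - y.mean()) ** 2).mean()

# sig2_c - sig2_u equals the quadratic form in the numerator of z_T
z = (sig2_c - sig2_u) / sig2_u
W = T * z
LR = T * np.log(1 + z)          # = T log(sig2_c / sig2_u)
LM = T * z / (1 + z)
```

Since z ≥ log(1 + z) ≥ z/(1 + z) for all z ≥ 0, the ordering W ≥ LR ≥ LM holds in every sample under this variance choice.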
Estimation of the Asymptotic Covariance Matrix

In its most general form, V_T can be written as

V_T = var(T^{-1/2} Σ_{t=1}^T x_t e_t)
    = T^{-1} Σ_{t=1}^T var(x_t e_t) + T^{-1} Σ_{τ=1}^{T−1} Σ_{t=τ+1}^T [E(x_{t−τ} e_{t−τ} e_t x_t') + E(x_t e_t e_{t−τ} x_{t−τ}')].

We have learned that the limiting distributions of the large sample tests discussed in the preceding subsections depend crucially on consistent estimation of V_T.
The Case of No Serial Correlation

We have learned that when {(x_t', e_t)'} is an independent sequence,

var(T^{-1/2} Σ_{t=1}^T x_t e_t) = T^{-1} Σ_{t=1}^T var(x_t e_t).

Let V̂_T = Σ_{t=1}^T ê_t² x_t x_t'/T. It can be seen that when β̂_T is consistent for β_0,

T^{-1} Σ_{t=1}^T ê_t² x_t x_t' − T^{-1} Σ_{t=1}^T E(e_t² x_t x_t')
  = T^{-1} Σ_{t=1}^T [e_t² x_t x_t' − E(e_t² x_t x_t')] − 2T^{-1} Σ_{t=1}^T [e_t x_t'(β̂_T − β_0)] x_t x_t' + T^{-1} Σ_{t=1}^T [(β̂_T − β_0)' x_t x_t' (β̂_T − β_0)] x_t x_t' →^p 0.
Thus, V̂_T is consistent for V_T, and

D̂_T = (T^{-1} Σ_{t=1}^T x_t x_t')^{-1} (T^{-1} Σ_{t=1}^T ê_t² x_t x_t') (T^{-1} Σ_{t=1}^T x_t x_t')^{-1}

is consistent for D_T, the asymptotic covariance matrix of √T (β̂_T − β_0).
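A sketch of this heteroskedasticity-consistent (White-type) sandwich estimator, with hypothetical heteroskedastic data of my own construction:

```python
import numpy as np

rng = np.random.default_rng(6)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])
# heteroskedastic errors: the error variance depends on the regressor
e = rng.normal(size=T) * (0.5 + np.abs(X[:, 1]))
y = X @ np.array([1.0, 2.0]) + e

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat

Minv = np.linalg.inv(X.T @ X / T)                 # (X'X/T)^{-1}
V_hat = (X * (e_hat ** 2)[:, None]).T @ X / T     # (1/T) sum of e_t^2 x_t x_t'
D_hat = Minv @ V_hat @ Minv                       # sandwich estimate of D_T
```

The diagonal of D_hat/T gives heteroskedasticity-robust variance estimates for the individual coefficients.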
More generally, if E(e_t | F_{t−1}) = 0, where F_{t−1} = σ({(e_{i−1}, x_i'); i ≤ t}) contains the information up to time t − 1, then for τ < t,

E(x_t e_t e_τ x_τ') = E(x_t E(e_t | F_{t−1}) e_τ x_τ') = 0,

so that V_T = Σ_{t=1}^T var(x_t e_t)/T. Consequently, the D̂_T above is still consistent for D_T.
General Case

In the time series context, it is possible that the x_t e_t exhibit serial correlation. If the x_t e_t are asymptotically uncorrelated, in the sense that E(x_t e_t e_{t−τ} x_{t−τ}') → 0 at a suitable rate as τ → ∞, then for large τ, Σ_{t=τ+1}^T E(x_t e_t e_{t−τ} x_{t−τ}')/T should be very small. This suggests that V_T may be well approximated by

V*_T = T^{-1} Σ_{t=1}^T var(x_t e_t) + T^{-1} Σ_{τ=1}^{m(T)} Σ_{t=τ+1}^T [E(x_{t−τ} e_{t−τ} e_t x_t') + E(x_t e_t e_{t−τ} x_{t−τ}')],

for some m(T), where m(T) should grow with T to maintain the approximation property.
In particular, m(t) is required to be O(T 1/4 ), i.e., m(t) also tends to infinity at a rate much slower than T. The following estimator is a heteroskedasticity and autocorrelation consistent convariance matrix estimator: ˆV T = 1 T T ê 2 tx t x t + 1 T t=1 m(t) T τ=1 t=τ+1 (x t τ ê t τ ê t x t + x t ê t ê t τ x t τ).
The major problem is that this V̂_T need not be p.s.d. Newey and West (1987) propose a simple modification:

V̌_T = T^{-1} Σ_{t=1}^T ê_t² x_t x_t' + T^{-1} Σ_{τ=1}^{m(T)} w_m(τ) Σ_{t=τ+1}^T (x_{t−τ} ê_{t−τ} ê_t x_t' + x_t ê_t ê_{t−τ} x_{t−τ}'),

where w_m(τ) = 1 − [τ/(m + 1)] is a weight function. Note that w_m(τ) is decreasing in τ; hence the larger the τ, the smaller the associated weight. Also note that for fixed τ, w_m(τ) → 1 as m → ∞.
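A minimal sketch of the Newey-West estimator V̌_T with Bartlett weights (the function name and the simulated AR(1) errors are my own; real applications typically use a packaged implementation):

```python
import numpy as np

def newey_west(X, e_hat, m):
    """Newey-West HAC estimate of V_T with Bartlett weights w_m(tau) = 1 - tau/(m+1)."""
    T = len(e_hat)
    xe = X * e_hat[:, None]                  # row t is x_t' * e_t
    V = xe.T @ xe / T                        # tau = 0 term: (1/T) sum of e_t^2 x_t x_t'
    for tau in range(1, m + 1):
        w = 1 - tau / (m + 1)                # Bartlett weight, decreasing in tau
        Gamma = xe[tau:].T @ xe[:-tau] / T   # (1/T) sum over t > tau of x_t e_t e_{t-tau} x_{t-tau}'
        V += w * (Gamma + Gamma.T)           # add the pair of cross-product terms
    return V

# illustrative use with serially correlated errors
rng = np.random.default_rng(8)
T = 300
X = np.column_stack([np.ones(T), rng.normal(size=T)])
u = rng.normal(size=T)
e = np.empty(T)
e[0] = u[0]
for t in range(1, T):
    e[t] = 0.5 * e[t - 1] + u[t]             # AR(1) errors
y = X @ np.array([0.0, 1.0]) + e

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
V_check = newey_west(X, y - X @ beta_hat, m=4)
```

The Bartlett weighting guarantees that the resulting estimate is positive semidefinite, which the unweighted truncated sum does not.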
Testing the Efficient Market Hypothesis (EMH)

EMH: E(p_t | Ω_{t−1}) = p_{t−1}, where Ω_{t−1} is the information set at t − 1. Under the EMH, Ω_{t−1} = {p_{t−1}}; that is, E(p_t | Ω_{t−1}) = E(p_t | p_{t−1}) = p_{t−1}. Given a linear model for the conditional mean, E(p_t | p_{t−1}) = α_0 + β_0 p_{t−1}, a linear regression model for observations t = 1, ..., T is set up as

p_t = α_0 + β_0 p_{t−1} + e_t,  t = 1, ..., T.

Testing the EMH is equivalent to testing the null hypothesis H_0: β_0 = 1.
Assumptions to be checked:
[A1]: True model? Yes!
[A2]: Is p_{t−1} nonstochastic? No! Non-classical regression analysis!
[B2]: Does {p_{t−1}²} obey a WLLN? No! Since p_{t−1} is not stationary, a spurious regression may arise. This can be checked by a data plot or by unit root tests.
What are stationarity and nonstationarity?
1. Strict Stationarity: A time series {y_t} is strictly stationary if its distributions and joint distributions are time invariant.
2. Weak Stationarity: A time series {y_t} is weakly stationary if it has constant mean, constant variance, and a covariance between y_t and y_{t+s} that depends on s but not on t.
It is clear that ln p_t is nonstationary when p_t is nonstationary. However, the first-order difference of ln p_t, Δln p_t = ln p_t − ln p_{t−1} = r_t, which is defined as the return, is stationary. Observe that if

ln p_t = α_0 + β_0 ln p_{t−1},
ln p_{t−1} = α_0 + β_0 ln p_{t−2},

then subtracting the second equation from the first gives

ln p_t − ln p_{t−1} = β_0 (ln p_{t−1} − ln p_{t−2}),  i.e.,  r_t = β_0 r_{t−1}.

Therefore, the linear regression model we consider becomes

r_t = α_0 + β_0 r_{t−1} + e_t,  t = 1, ..., T.
Question again: How do we make a reliable statistical inference for the null hypothesis?
[A1]: True model? Yes!
[A2]: Is r_{t−1} nonstochastic? No! Non-classical regression analysis!
[B2]: Does {r_{t−1}²} obey a WLLN? Yes, since {r_{t−1}} is stationary.
[A5]: Is r_t normally distributed? No! Check with EViews.
[B3]: Does {r_{t−1} e_t} obey a CLT? Yes!
Therefore, the regression analysis is implementable and the large sample tests are applicable.
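As a hedged illustration (simulated i.i.d. returns, so that the lagged return has no predictive power by construction; the formulation below, testing a zero slope on the lagged return, is one common way to implement the idea that returns are unpredictable under the EMH, and is my own choice rather than taken from the notes):

```python
import numpy as np

# simulate i.i.d. returns: under this data-generating process the EMH-style
# null of no return predictability holds by construction
rng = np.random.default_rng(7)
r = rng.normal(0.0, 0.01, size=500)

y = r[1:]                                        # r_t
X = np.column_stack([np.ones(len(y)), r[:-1]])   # constant and r_{t-1}
T, k = X.shape

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat
s2 = e_hat @ e_hat / (T - k)

# large-sample t-ratio for the slope on r_{t-1}
se_slope = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
t_stat = beta_hat[1] / se_slope
```

Since returns are typically non-normal, the t-ratio is interpreted through its asymptotic N(0, 1) approximation rather than the exact finite-sample t distribution; with heteroskedastic returns, a robust or HAC standard error (as in the preceding section) would replace s2 * (X'X)^{-1}.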