Stein-Rule Estimation and Generalized Shrinkage Methods for Forecasting Using Many Predictors

Eric Hillebrand
CREATES, Aarhus University, Denmark
ehillebrand@creates.au.dk

Tae-Hwy Lee
University of California, Riverside, Department of Economics
taelee@ucr.edu

August

Abstract

We examine the Stein-rule shrinkage estimator for possible improvements in estimation and forecasting when there are many predictors in a linear time series model. We consider the Stein-rule estimator of Hill and Judge (1987) that shrinks the unrestricted unbiased OLS estimator towards a restricted biased principal component (PC) estimator. Since the Stein-rule estimator combines the OLS and PC estimators, it is a model-averaging estimator and produces a combined forecast. The conditions under which the improvement can be achieved depend on several unknown parameters that determine the degree of the Stein-rule shrinkage. We conduct Monte Carlo simulations to examine these parameter regions. The overall picture that emerges is that the Stein-rule shrinkage estimator can dominate both the OLS and principal components estimators within an intermediate range of the signal-to-noise ratio. If the signal-to-noise ratio is low, the PC estimator is superior. If the signal-to-noise ratio is high, the OLS estimator is superior. In out-of-sample forecasting with AR(1) predictors, the Stein-rule shrinkage estimator can dominate both the OLS and PC estimators when the predictors exhibit low persistence.

Keywords: Stein-rule, shrinkage, risk, variance-bias tradeoff, OLS, principal components.

JEL Classifications: C…, C…, C…

1 Introduction

Recent contributions to the forecasting literature consider many predictors in data-rich environments and principal components. Examples include Stock and Watson (2002, 2006, …), Bai (2003), Bai and Ng (2006, 2008), Bair et al. (2006), Huang and Lee (2010), Hillebrand et al. (…), and Inoue and Kilian (2008), among others. In particular, Stock and Watson (…) note that many forecasting models in this environment can be written in a unified framework called the shrinkage representation.
Although the notion of the generalized shrinkage representation can be found in much earlier publications (e.g., Judge and Bock 1978), interest in shrinkage has been revived in the recent literature on out-of-sample forecasting.

The issue of forecasting using many predictors was discussed earlier in econometrics and statistics under the headings of ill-conditioned data and multicollinearity. In particular, Hill and Judge (1987) studied improved prediction in the presence of multicollinearity. They examined possible improvements in estimation and forecasting when there are many predictors in a linear regression model. The Stein-rule estimator proposed in their paper shrinks the unrestricted unbiased OLS estimator towards a restricted biased principal component (PC) estimator. Improvements are usually measured with a risk function based on squared forecast error loss. The asymptotic risk functions of the OLS and PC estimators are obtained rather easily. The risk of the Stein-rule estimator is more complicated, because it depends on several unknown parameters and data characteristics. It is therefore not easy to understand the conditions under which improvements can be achieved. We conduct Monte Carlo simulations to shed light on this issue, for both in-sample estimation and out-of-sample forecasting.

In general, a key feature is that the improvement attainable through Stein-rule shrinkage depends on the signal-to-noise ratio, which is affected by multiple determinants. The Stein-rule shrinkage estimator can dominate both the OLS and PC estimators within an intermediate range of the signal-to-noise ratio. If the signal-to-noise ratio is low, the PC estimator tends to be superior. If it is high, the OLS estimator tends to be superior. In out-of-sample forecasting with AR(1) predictors, the Stein-rule shrinkage estimator can dominate both the OLS and PC estimators when the predictors have low persistence.

Hill and Fomby (1992) examined the out-of-sample performance of a variety of biased estimation procedures, such as ridge regression, principal component regression, and several Stein-like estimators.
Their evaluation setup was out-of-sample prediction in the sense that the out-of-sample data differ from the data used for parameter estimation. It was not out-of-sample prediction in the sense of the recent time series forecasting literature.

As the Stein-rule estimator of Hill and Judge (1987) combines the OLS and PC estimators, it can be shown to be a model-averaging estimator that produces a combined forecast. In fact, Hansen (…) shows that the Stein-type shrinkage estimator is a Mallows-type combined estimator. Other papers have studied the relation between Stein-like shrinkage and forecast combinations. Fomby and Samanta (1991) use the Stein rule to combine forecasts directly. Clark and McCracken (2009) examine the properties of combined forecasts of two nested models and note that their combined forecast is a Stein-type shrinkage forecast. The shrinkage principle therefore provides insight into two questions: how to handle estimation under multicollinearity and forecasting with many predictors, and why forecast combinations in the sense of Bates and Granger (1969) yield improvements.

The paper is organized as follows. Section 2 presents the shrinkage representation for forecasting using principal components. In Section 3 we consider the OLS and PC estimators and their asymptotic risk under squared error loss. Section 4 presents the Stein-rule shrinkage estimator that combines the OLS and PC estimators. Sections 5 and 6 present a Monte Carlo analysis of the in-sample and out-of-sample performance of the three estimators: OLS, PC, and Stein-rule. Finally, Section 7 provides some concluding remarks.

2 Shrinkage Representation

This section uses the notation of Stock and Watson (…). Let the time series under study be denoted by $y_t$. Let $P_{it}$, $i = 1, \ldots, K$, be a set of $K$ orthonormal predictors such that $P'P/T = I_K$. These predictors can be thought of as the principal components of a possibly large data set $X_t$. The statistical model is

$$y_{t+1} = \delta' P_t + \varepsilon_{t+1}, \tag{1}$$

where $\delta \in \mathbb{R}^K$ is a parameter vector and $\varepsilon_t$ is an error with mean zero and variance $\sigma^2$. Both $y_t$ and $P_t$ are assumed to have sample mean zero. Let $\tilde y_{T+1|T}$ be the forecast of $y_{T+1}$ given information through time $T$. The theorems in Stock and Watson (…) show that an array of forecasting methods, namely Normal Bayes, Bayesian Model Averaging, Empirical Bayes, and Bagging, have the shrinkage representation

$$\tilde y_{T+1|T} = \sum_{i=1}^{K} \psi(\kappa t_i)\, \hat\delta_i P_{iT} + o_P(1). \tag{2}$$

The terms in (2) are defined as follows:

- $\hat\delta_i = T^{-1}\sum_{t=1}^{T} P_{i,t-1} y_t$ is the OLS estimator of $\delta_i$.
- $t_i = \sqrt{T}\,\hat\delta_i/\hat\sigma$ is the t-statistic for $\hat\delta_i$.
- $\hat\sigma^2 = \sum_{t=1}^{T}(y_t - \hat\delta' P_{t-1})^2/(T-K)$ is the consistent estimator of $\sigma^2$.
- $\psi$ is a function specific to the forecasting method.
- $\kappa$ is a constant specific to the forecasting method.

For example, the shrinkage representation of the OLS estimator is $\psi(\kappa t_i) = 1$ for all $i$. A pre-test estimator has the shrinkage representation $\psi(\kappa t_i) = 1_{\{|t_i| > t_c\}}$ for some critical value $t_c$. The principal components estimator that retains the first $K_1$ principal components and discards the others has the shrinkage representation $\psi(\kappa t_i) = 1$ for $i \in \{1, \ldots, K_1\}$ and $\psi(\kappa t_i) = 0$ otherwise. See also Judge and Bock (1978) and Hill and Fomby (1992) for a general representation of a family of minimax shrinkage estimators.

3 Principal Component Model

This section follows Hill and Judge (1987, 1990) and Hill and Fomby (1992), with adapted notation. Let the model in terms of the original predictors $X$ be

$$y = X\beta + \varepsilon, \tag{3}$$

where $y$ is a $T \times 1$ time series and $X$ is a $T \times K$ matrix of $K$ predictors. The parameter vector $\beta$ is $K \times 1$. The $T \times 1$ error series $\varepsilon$ has conditional mean zero, $E(\varepsilon \mid X) = 0$, and conditional variance $E(\varepsilon\varepsilon' \mid X) = \sigma^2 I_T$.
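To make the shrinkage representation (2) concrete, the sketch below computes its leading term $\sum_i \psi(\kappa t_i)\hat\delta_i P_{iT}$ for the three shrinkage functions named in Section 2: OLS, pre-test, and PC. It is a minimal NumPy illustration under stated assumptions. The constant $\kappa$ is absorbed into $\psi$. The predictors are simulated and orthonormalized so that $P'P/T = I_K$. The function names (`shrinkage_forecast`, `psi_ols`, `psi_pretest`, `psi_pc`) are hypothetical helpers, not code from Stock and Watson.

```python
import numpy as np


def shrinkage_forecast(P, y, P_T, psi):
    """Leading term of (2): sum_i psi(t_i) * delta_hat_i * P_{iT}.

    P   : (T, K) predictors P_{t-1}, scaled so that P'P/T = I_K
    y   : (T,) targets y_t aligned with the rows of P
    P_T : (K,) predictors observed at the forecast origin T
    psi : maps the vector of t-statistics to weights (kappa is absorbed)
    """
    T, K = P.shape
    delta_hat = P.T @ y / T                          # OLS, since P'P/T = I_K
    resid = y - P @ delta_hat
    sigma_hat = np.sqrt(resid @ resid / (T - K))
    t_stats = np.sqrt(T) * delta_hat / sigma_hat
    return float(np.sum(psi(t_stats) * delta_hat * P_T))


def psi_ols(t):
    return np.ones_like(t)                           # no shrinkage at all


def psi_pretest(t_c):
    return lambda t: (np.abs(t) > t_c).astype(float)        # keep significant terms


def psi_pc(K1):
    return lambda t: (np.arange(t.size) < K1).astype(float)  # keep the first K1


# Illustrative data; sizes and coefficients are arbitrary
rng = np.random.default_rng(1)
T, K = 120, 8
Q, _ = np.linalg.qr(rng.standard_normal((T, K)))
P = np.sqrt(T) * Q                                   # P'P/T = I_K
delta = np.array([1.0, 0.5, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0])
y = P @ delta + rng.standard_normal(T)
P_T = rng.standard_normal(K)

f_ols = shrinkage_forecast(P, y, P_T, psi_ols)
f_pre = shrinkage_forecast(P, y, P_T, psi_pretest(1.96))
f_pc = shrinkage_forecast(P, y, P_T, psi_pc(3))
print(f_ols, f_pre, f_pc)
```

Every method shares the same $\hat\delta_i$ and differs only through $\psi$. Swapping the weight function is therefore enough to move between OLS, pre-testing, and principal components.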
We do not assume normality of $\varepsilon$ in this section, although we generate it from the normal distribution in the simulation study in Sections 5 and 6. Our interest is to forecast $y$ when the number $K$ of predictors in $X$ is large. The location vector $\beta$ is unknown, and the objective is to estimate it by an estimator $\bar\beta(y, X)$.

We consider three estimators of $\beta$:

(i) the ordinary least squares (OLS) estimator, denoted $\hat\beta$;
(ii) the principal component (PC) estimator, denoted $\hat\beta_1$;
(iii) the Stein-like combination of $\hat\beta$ and $\hat\beta_1$, denoted $\tilde\beta$ in the next section.

In this section we examine the sampling properties of $\hat\beta$ and $\hat\beta_1$ in terms of their asymptotic risk under weighted squared error loss. In Sections 5 and 6 we compare them with the Stein-like combined estimator $\tilde\beta$. The sampling performance of an estimator $\bar\beta(y, X)$ is evaluated by its risk function, the expected weighted squared error loss with weight $Q$:

$$\operatorname{Risk}(\beta, \bar\beta(y,X), Q) = E\left[(\bar\beta(y,X) - \beta)' Q\, (\bar\beta(y,X) - \beta)\right]. \tag{4}$$

We examine the Stein-like estimator in dynamic forecasting models with weakly dependent time series, so the predictor matrix $X$ is treated as stochastic. The expectation in (4) is therefore taken over the joint probability law of $(y, X)$. In this section we compute the weighted quadratic risk with weight $Q = X'X$, which gives the squared conditional prediction error risk. In Sections 5 and 6 we also consider the weight $Q = I_K$. Below, the asymptotic risks of $\hat\beta$ and $\hat\beta_1$ are computed from their asymptotic covariances. The OLS estimator

$$\hat\beta = (X'X)^{-1}X'y, \tag{5}$$

conditional on $X$, has the asymptotic sampling property

$$\sqrt{T}\,(\hat\beta - \beta) \mid X \xrightarrow{d} N\!\left(0, \sigma^2 (X'X/T)^{-1}\right). \tag{6}$$

The asymptotic quadratic risk of the OLS estimator $\hat\beta$ weighted with $Q = X'X$ is

$$\begin{aligned}
\operatorname{Risk}(\beta, \hat\beta, X'X) &= E\left\{(\hat\beta - \beta)' X'X (\hat\beta - \beta)\right\} \\
&= \operatorname{tr} E\left\{X'X(\hat\beta - \beta)(\hat\beta - \beta)'\right\} \\
&= \operatorname{tr} E\left\{X'X\, E\left[(\hat\beta - \beta)(\hat\beta - \beta)' \mid X\right]\right\} \\
&= \operatorname{tr} E\left\{X'X\, \sigma^2 (X'X)^{-1}\right\} \\
&= \operatorname{tr}(\sigma^2 I_K) = K\sigma^2.
\end{aligned} \tag{7}$$

Since the bias $E(\hat\beta - \beta \mid X) = 0$, the risk contains only a variance component.

Turning to the PC estimator $\hat\beta_1$, write $X'X = TV\Lambda V'$. Here $V$ is the $K \times K$ matrix of eigenvectors and $\Lambda$ is the diagonal matrix of the eigenvalues of $X'X/T$, in descending order. Then $V'V = I_K$ and

$$y = X\beta + \varepsilon = XV\Lambda^{-1/2}\Lambda^{1/2}V'\beta + \varepsilon = P\delta + \varepsilon, \qquad P = XV\Lambda^{-1/2}, \quad \delta = \Lambda^{1/2}V'\beta. \tag{8}$$
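The reparameterization (8) and the OLS risk (7) can be checked numerically. The NumPy sketch below uses arbitrary illustrative values of $T$, $K$, and $\sigma^2$, not the settings used later in the paper. It verifies four things:

- $P = XV\Lambda^{-1/2}$ satisfies $P'P/T = I_K$.
- $P\delta = X\beta$.
- $\Lambda^{1/2}V'\hat\beta = T^{-1}P'y$.
- The Monte Carlo average of $(\hat\beta-\beta)'X'X(\hat\beta-\beta)$ over draws of $\varepsilon$, holding $X$ fixed, is close to $K\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, sigma2 = 200, 10, 2.0          # illustrative values, not the paper's design
X = rng.standard_normal((T, K))
beta = rng.standard_normal(K)

# Eigen-decomposition X'X = T V Lambda V', eigenvalues in descending order
evals, V = np.linalg.eigh(X.T @ X / T)
order = np.argsort(evals)[::-1]
lam, V = evals[order], V[:, order]

P = X @ V / np.sqrt(lam)              # P = X V Lambda^{-1/2}
delta = np.sqrt(lam) * (V.T @ beta)   # delta = Lambda^{1/2} V' beta

# Monte Carlo check of (7): E[(b - beta)' X'X (b - beta) | X] = K sigma^2
reps = 2000
XtX = X.T @ X
eps = np.sqrt(sigma2) * rng.standard_normal((reps, T))
Y = X @ beta + eps                                # each row is one draw of y
B = np.linalg.solve(XtX, X.T @ Y.T).T             # OLS estimates, one row per draw
D = B - beta
mc_risk = np.mean(np.einsum("ij,jk,ik->i", D, XtX, D))

# delta_hat from the OLS estimator equals T^{-1} P'y (checked on the first draw)
delta_from_ols = np.sqrt(lam) * (V.T @ B[0])
delta_from_pc = P.T @ Y[0] / T
print(mc_risk, K * sigma2)
```

Under normal errors the loss divided by $\sigma^2$ is exactly $\chi^2_K$ conditional on $X$. The Monte Carlo average should therefore sit within a few percent of $K\sigma^2$ for this number of replications.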

This is the principal components regression model. $P$ contains the principal components of $X$, and $\hat\delta = (P'P)^{-1}P'y = T^{-1}P'y$ can be computed either from the principal components or as $\hat\delta = \Lambda^{1/2}V'\hat\beta$ from the OLS estimator of $\beta$. So far, the principal components model is equivalent to the original model.

When $X$ exhibits a high degree of collinearity, the eigenvalues in $\Lambda$ vary greatly in magnitude, and some are close to zero. The components are then split as $K = K_1 + K_2$. Here $K_1$ is the number of eigenvalues that are relatively large and $K_2$ is the number of eigenvalues that are relatively close to zero. The $K_2$ principal components that correspond to the small eigenvalues are discarded; the remaining $K_1$ principal components are kept. The model becomes

$$y = P\delta + \varepsilon = (P_1 \;\; P_2)\begin{pmatrix}\delta_1 \\ \delta_2\end{pmatrix} + \varepsilon = P_1\delta_1 + P_2\delta_2 + \varepsilon \tag{9}$$

$$\quad = X(V_1 \;\; V_2)\Lambda^{-1/2}\Lambda^{1/2}(V_1 \;\; V_2)'\beta + \varepsilon = XV_1\Lambda_1^{-1/2}\Lambda_1^{1/2}V_1'\beta + XV_2\Lambda_2^{-1/2}\Lambda_2^{1/2}V_2'\beta + \varepsilon, \tag{10}$$

where $\Lambda_1$ and $\Lambda_2$ are the $K_1 \times K_1$ and $K_2 \times K_2$ diagonal matrices containing the corresponding eigenvalues. The term $P_2\delta_2 = XV_2\Lambda_2^{-1/2}\Lambda_2^{1/2}V_2'\beta$ is deleted. Therefore, principal components regression with deleted components is equivalent to OLS estimation under the restriction

$$\delta_2 = R\beta = \Lambda_2^{1/2}V_2'\beta = 0, \tag{11}$$

where $R = \Lambda_2^{1/2}V_2'$ imposes $K_2$ linear restrictions on $\beta$. Note that $R$ is stochastic because it depends on $X$. The risk of the restricted estimator is therefore the expected loss with the expectation taken over $(y, X)$. The PC estimator of $\delta_1$ with $K_2$ deleted components, corresponding to the restrictions $\delta_2 = 0$, is

$$\hat\delta_1 = (P_1'P_1)^{-1}P_1'y = T^{-1}P_1'y. \tag{12}$$

Its asymptotic distribution conditional on $X$ is

$$\sqrt{T}\,(\hat\delta_1 - \delta_1) \mid X \xrightarrow{d} N(0, \sigma^2 I_{K_1}). \tag{13}$$

The estimator $\hat\delta_1$, together with $\delta_2 = 0$, results in the fit

$$y = P_1\hat\delta_1 + \hat\varepsilon = XV_1\Lambda_1^{-1/2}\hat\delta_1 + \hat\varepsilon, \tag{14}$$

and the principal components estimator of $\beta$ is therefore

$$\hat\beta_1 = V_1\Lambda_1^{-1/2}\hat\delta_1. \tag{15}$$

This is a special case of the restricted least squares (RLS) estimators explored by Mittelhammer (1985).
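The equivalence between principal components regression and restricted least squares stated after (11) can be illustrated directly. The minimal sketch below uses simulated data and hypothetical helper names (`eig_desc`, `pc_estimator`, `rls_estimator`). It computes $\hat\beta_1$ from (12) and (15), then compares it with the standard RLS formula $\hat\beta - (X'X)^{-1}R'\left(R(X'X)^{-1}R'\right)^{-1}R\hat\beta$ under $R = \Lambda_2^{1/2}V_2'$.

```python
import numpy as np


def eig_desc(X):
    """Eigenvalues and eigenvectors of X'X/T, sorted in descending order."""
    T = X.shape[0]
    evals, V = np.linalg.eigh(X.T @ X / T)
    order = np.argsort(evals)[::-1]
    return evals[order], V[:, order]


def pc_estimator(X, y, K1):
    """beta_hat_1 = V_1 Lambda_1^{-1/2} delta_hat_1, eqs. (12) and (15)."""
    T = X.shape[0]
    lam, V = eig_desc(X)
    V1, lam1 = V[:, :K1], lam[:K1]
    P1 = X @ V1 / np.sqrt(lam1)                  # retained components
    delta1_hat = P1.T @ y / T                    # eq. (12)
    return V1 @ (delta1_hat / np.sqrt(lam1))     # eq. (15)


def rls_estimator(X, y, R):
    """Restricted least squares under the restriction R beta = 0."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    Rb = R @ b
    return b - XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, Rb)


rng = np.random.default_rng(2)
T, K, K1 = 150, 12, 4                            # illustrative sizes
X = rng.standard_normal((T, K))
y = X @ rng.standard_normal(K) + rng.standard_normal(T)

lam, V = eig_desc(X)
R = np.sqrt(lam[K1:])[:, None] * V[:, K1:].T     # R = Lambda_2^{1/2} V_2'
b_pc = pc_estimator(X, y, K1)
b_rls = rls_estimator(X, y, R)
print(np.max(np.abs(b_pc - b_rls)))
```

Both routes restrict $\beta$ to the span of $V_1$ and minimize the same sum of squares, so the printed difference should be at the level of rounding error.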
Fomby, Hill, and Johnson (1978) present an optimality property of $\hat\beta_1$. The PC estimator obtained by deleting the $K_2$ principal components associated with the smallest eigenvalues has an asymptotic covariance matrix whose trace is at least as small as that of any other RLS estimator with $J \le K_2$ restrictions. This optimality is in terms of the asymptotic quadratic risk weighted with $Q = I_K$, i.e., $\operatorname{Risk}(\beta, \hat\beta_1, I_K)$.

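For forecasting, it is of interest to examine the asymptotic quadratic risk of the PC estimator $\hat\beta_1$ weighted with $Q = X'X$:

$$\begin{aligned}
\operatorname{Risk}(\beta, \hat\beta_1, X'X) &= E\left[(\hat\beta_1 - \beta)'X'X(\hat\beta_1 - \beta)\right] \\
&= E\left[(V_1\Lambda_1^{-1/2}\hat\delta_1 - \beta)'(TV\Lambda V')(V_1\Lambda_1^{-1/2}\hat\delta_1 - \beta)\right] \\
&= T\,E\left[(\Lambda^{1/2}V'V_1\Lambda_1^{-1/2}\hat\delta_1 - \Lambda^{1/2}V'\beta)'(\Lambda^{1/2}V'V_1\Lambda_1^{-1/2}\hat\delta_1 - \Lambda^{1/2}V'\beta)\right] \\
&= T\,E\left[\left(\begin{bmatrix}I_{K_1}\\ 0\end{bmatrix}\hat\delta_1 - \delta\right)'\left(\begin{bmatrix}I_{K_1}\\ 0\end{bmatrix}\hat\delta_1 - \delta\right)\right] \\
&= T\,E\left[(\hat\delta_1 - \delta_1)'(\hat\delta_1 - \delta_1)\right] + T\delta_2'\delta_2 \\
&= T\operatorname{tr} E\left[(\hat\delta_1 - \delta_1)(\hat\delta_1 - \delta_1)'\right] + T\delta_2'\delta_2 \\
&= T\operatorname{tr}\left(T^{-1}\sigma^2 I_{K_1}\right) + T\delta_2'\delta_2 \\
&= K_1\sigma^2 + T\delta_2'\delta_2.
\end{aligned} \tag{16}$$

The first term is the variance term, which declines as $K_1$ decreases; the second term is the bias term. The second-to-last equality follows from (13). To compare the asymptotic risks of the OLS estimator $\hat\beta$ and the PC estimator $\hat\beta_1$, consider the risk difference

$$\operatorname{Risk}(\beta, \hat\beta, X'X) - \operatorname{Risk}(\beta, \hat\beta_1, X'X) = K\sigma^2 - \left(K_1\sigma^2 + T\delta_2'\delta_2\right) = K_2\sigma^2 - T\delta_2'\delta_2.$$

This difference is positive when $\delta_2'\delta_2$ is small, that is, when $\delta_2'\delta_2 < K_2\sigma^2/T$. This is the case if the restriction in (11) is reasonable, and then the OLS estimator $\hat\beta$ is dominated by the PC estimator $\hat\beta_1$.

4 Stein-Rule Estimator

Hill and Judge (1987, 1990) propose a Stein-rule estimator $\tilde\beta$ that shrinks the standard OLS estimator $\hat\beta$ towards the principal components estimator $\hat\beta_1$:

$$\begin{aligned}
\tilde\beta &= \left(1 - \frac{a\,\hat\sigma^2 (T-K)}{\hat\beta' R'\left(R(X'X)^{-1}R'\right)^{-1}R\hat\beta}\right)(\hat\beta - \hat\beta_1) + \hat\beta_1 \\
&= \hat\beta_1 + \psi(\hat\beta - \hat\beta_1) = \psi\hat\beta + (1 - \psi)\hat\beta_1,
\end{aligned} \tag{17}$$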

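where $a$ is a constant and $R$ is defined in (11). The Stein coefficient $\psi$ measures the shrinkage from the OLS estimator $\hat\beta$ to the PC estimator $\hat\beta_1$. Using $R = \Lambda_2^{1/2}V_2'$, $\hat\beta = V\Lambda^{-1/2}\hat\delta$, and $\hat\beta_1 = V_1\Lambda_1^{-1/2}\hat\delta_1$, we obtain

$$\tilde\beta = V_1\Lambda_1^{-1/2}\hat\delta_1 + \left(1 - \frac{a\,\hat\sigma^2(T-K)}{T\,\hat\delta'\Lambda^{-1/2}V'V_2\Lambda_2^{1/2}\left(\Lambda_2^{1/2}V_2'V\Lambda^{-1}V'V_2\Lambda_2^{1/2}\right)^{-1}\Lambda_2^{1/2}V_2'V\Lambda^{-1/2}\hat\delta}\right)\left(V\Lambda^{-1/2}\hat\delta - V_1\Lambda_1^{-1/2}\hat\delta_1\right). \tag{18}$$

We now use three facts: $V'V_1 = \begin{bmatrix} I_{K_1} \\ 0 \end{bmatrix}$, $V'V_2 = \begin{bmatrix} 0 \\ I_{K_2}\end{bmatrix}$, and $(X'X)^{-1} = T^{-1}V\Lambda^{-1}V'$, where $I_{K_j}$ is the $K_j \times K_j$ identity matrix. With these we obtain

$$\begin{aligned}
\tilde\beta &= \left(1 - \frac{a\,\hat\sigma^2(T-K)}{T\,\hat\delta_2'\hat\delta_2}\right)\left(V\Lambda^{-1/2}\hat\delta - V_1\Lambda_1^{-1/2}\hat\delta_1\right) + V_1\Lambda_1^{-1/2}\hat\delta_1 \\
&= V_1\Lambda_1^{-1/2}\hat\delta_1 + \psi\left(V\Lambda^{-1/2}\hat\delta - V_1\Lambda_1^{-1/2}\hat\delta_1\right).
\end{aligned} \tag{19}$$

Since $V\Lambda^{-1/2}\hat\delta = V_1\Lambda_1^{-1/2}\hat\delta_1 + V_2\Lambda_2^{-1/2}\hat\delta_2$, further rearrangement yields

$$\tilde\beta = V_1\Lambda_1^{-1/2}\hat\delta_1 + \psi\, V_2\Lambda_2^{-1/2}\hat\delta_2 \tag{20}$$

for the Stein-rule estimator, from which its shrinkage representation can now be read off. The individual t-statistics of the principal components are $t_i = \sqrt{T}\,\hat\delta_i/\hat\sigma$. Hence the coefficient of the $K_2$ terms in $\hat\delta_2$ corresponding to the discarded principal components can be written

$$\psi = 1 - \frac{a\,\hat\sigma^2(T-K)}{T\,\hat\delta_2'\hat\delta_2} = 1 - \frac{a(T-K)}{\sum_{i=K_1+1}^{K} t_i^2} = 1 - \frac{a(T-K)}{K_2\,F_{K_2,T-K}}, \tag{21}$$

where $F_{K_2,T-K} = \sum_{i=K_1+1}^{K} t_i^2/K_2$ is the test statistic for $H_0: \delta_2 = 0$, the restriction in (11). Note that

$$F_{K_2,T-K} = \frac{\sum_{i=K_1+1}^{K} t_i^2}{K_2} = \frac{T\hat\delta_2'\hat\delta_2/K_2}{\hat\sigma^2} = \frac{\text{signal from the } K_2 \text{ discarded variables}}{\text{noise}}.$$

The Stein coefficient function $\psi$ in the shrinkage representation of Section 2 is given by

$$\psi_i = \begin{cases} 1, & i \in \{1, \ldots, K_1\}, \\ \psi, & i \in \{K_1 + 1, \ldots, K\}. \end{cases} \tag{22}$$

Consider the asymptotic quadratic risk of the Stein estimator $\tilde\beta$ weighted with $Q = X'X$:

$$\operatorname{Risk}(\beta, \tilde\beta, X'X) = E\left[(\tilde\beta - \beta)'X'X(\tilde\beta - \beta)\right]. \tag{23}$$

This risk can be calculated, but it is rather complicated. It depends on parameters such as $\beta$, $\sigma^2$, and $a$, and on data characteristics such as $T$, $K$, $K_1$, and $X$ (with $X$ determining $\Lambda$ and $V$). Hence, we use Monte Carlo analysis in the next two sections to examine the risk of $\tilde\beta$ in comparison with those of $\hat\beta$ and $\hat\beta_1$.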
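The two expressions for the Stein-rule estimator should give identical numbers: (17) is written in terms of $R$, and (20)–(21) in terms of principal components and the F statistic. The minimal NumPy sketch below computes both and checks that they coincide. It assumes simulated data, an arbitrary value of $a$, and hypothetical function names. Like (17), it does not truncate $\psi$ at zero, so it is not the positive-part rule.

```python
import numpy as np


def eig_desc(X):
    """Eigenvalues and eigenvectors of X'X/T, sorted in descending order."""
    T = X.shape[0]
    evals, V = np.linalg.eigh(X.T @ X / T)
    order = np.argsort(evals)[::-1]
    return evals[order], V[:, order]


def stein_rule_direct(X, y, K1, a):
    """Eq. (17): shrink OLS towards PC using the restriction matrix R of (11)."""
    T, K = X.shape
    lam, V = eig_desc(X)
    XtX = X.T @ X
    b = np.linalg.solve(XtX, X.T @ y)                  # OLS, eq. (5)
    resid = y - X @ b
    sigma2_hat = resid @ resid / (T - K)
    V1, lam1 = V[:, :K1], lam[:K1]
    b_pc = V1 @ (V1.T @ (X.T @ y) / (T * lam1))        # eq. (15) rewritten as V1 Lambda1^{-1} V1'X'y / T
    R = np.sqrt(lam[K1:])[:, None] * V[:, K1:].T       # R = Lambda_2^{1/2} V_2'
    Rb = R @ b
    quad = Rb @ np.linalg.solve(R @ np.linalg.solve(XtX, R.T), Rb)
    psi = 1.0 - a * sigma2_hat * (T - K) / quad
    return b_pc + psi * (b - b_pc), psi


def stein_rule_pc(X, y, K1, a):
    """Eqs. (20)-(21): the same estimator written with principal components."""
    T, K = X.shape
    K2 = K - K1
    lam, V = eig_desc(X)
    P = X @ V / np.sqrt(lam)
    delta_hat = P.T @ y / T
    resid = y - P @ delta_hat
    sigma_hat = np.sqrt(resid @ resid / (T - K))
    t = np.sqrt(T) * delta_hat / sigma_hat
    F = np.sum(t[K1:] ** 2) / K2                        # F statistic for H0: delta_2 = 0
    psi = 1.0 - a * (T - K) / (K2 * F)
    b_pc = V[:, :K1] @ (delta_hat[:K1] / np.sqrt(lam[:K1]))
    b_discarded = V[:, K1:] @ (delta_hat[K1:] / np.sqrt(lam[K1:]))
    return b_pc + psi * b_discarded, psi, F


rng = np.random.default_rng(3)
T, K, K1, a = 100, 10, 3, 0.1                           # illustrative values
X = rng.standard_normal((T, K))
y = X @ rng.standard_normal(K) + rng.standard_normal(T)
b_direct, psi_direct = stein_rule_direct(X, y, K1, a)
b_pc_form, psi_pc_form, F = stein_rule_pc(X, y, K1, a)
print(psi_direct, psi_pc_form, F)
```

Setting $a = 0$ gives $\psi = 1$ and returns OLS. Larger $a$ pulls the discarded-component part of the fit towards zero.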

5 In-Sample Performance of the Stein-Rule Shrinkage Estimator

We conduct a Monte Carlo analysis to compare the risk of $\tilde\beta$ with those of $\hat\beta$ and $\hat\beta_1$. The risk of the Stein-rule estimator depends on $\beta$, $\sigma^2$, $a$, $T$, $K$, $K_1$, and $X$. For the risk comparisons we fix $T$ and $K$ while we vary $\beta$, $\sigma^2$, $a$, $K_1$, and $X$.

5.1 Simulation Design

The elementary model to be studied is the linear equation

$$y = X\beta + \varepsilon, \tag{24}$$

where $y$ is a $T \times 1$ vector and $X$ is a $T \times K$ matrix of regressors. The $T \times 1$ random vector $\varepsilon$ is drawn from a $N(0, \sigma^2 I_T)$ distribution, with $\beta \in \mathbb{R}^K$ and $\sigma^2 \in \mathbb{R}_+$. We compare the in-sample performance of the Stein-rule estimator with that of the standard OLS estimator and the principal components estimator, using the following simulation design.

We draw a $T \times K$ matrix $X$ of $N(0, 1)$ random variables. We aim to impose different eigenvalue structures on the regressor matrix $X$ in the spirit of Hill and Judge (1987). To this end, we compute the singular value decomposition $X = U\Lambda_0 V'$ and discard the diagonal matrix $\Lambda_0$. The regressor matrix $X$ is then constructed as

$$X = U\Lambda_s V', \tag{25}$$

where the diagonal matrix $\Lambda_s$ of singular values is constructed according to three different scenarios.

1. The singular values are constant:
$$\Lambda_s = \operatorname{diag}(2, \ldots, 2). \tag{26}$$
2. The singular values are linearly decreasing.
3. The singular values are exponentially decreasing:
$$\operatorname{diag}(\Lambda_s) = (\lambda_1, \ldots, \lambda_K), \qquad \lambda_k = c_0 + c_1 e^{-c_2 k}, \quad k = 1, \ldots, K, \tag{27}$$
with fixed positive constants $c_0$, $c_1$, $c_2$.

In the data-generating process, we consider different scenarios for the variance $\sigma^2$ of the error process. In particular, we use four different values of $\sigma^2$.
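The construction (25) keeps the singular vectors of a Gaussian draw and overwrites its singular values. In the sketch below, only the constant scenario (26) uses the paper's value of two. The end points of the linear scenario, the constants of the exponential scenario, and the matrix dimensions are hypothetical placeholders chosen for illustration.

```python
import numpy as np


def impose_singular_values(X, s):
    """Replace the singular values of X by the vector s, keeping U and V (eq. 25)."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)   # X = U Lambda_0 V'
    return U @ np.diag(s) @ Vt


rng = np.random.default_rng(4)
T, K = 100, 10                                        # illustrative sizes
X0 = rng.standard_normal((T, K))
k = np.arange(1, K + 1)

scenarios = {
    "constant": np.full(K, 2.0),                      # eq. (26): all equal to two
    "linear": np.linspace(4.0, 0.5, K),               # hypothetical end points
    "exponential": 0.5 + 3.5 * np.exp(-0.3 * k),      # hypothetical constants in (27)
}
designs = {name: impose_singular_values(X0, s) for name, s in scenarios.items()}

for name, Xs in designs.items():
    s = np.linalg.svd(Xs, compute_uv=False)
    print(name, "condition number:", s[0] / s[-1])
```

Each scenario keeps the same column space as the Gaussian draw, so the scenarios differ only in how unevenly the singular values are spread. The condition number therefore summarizes the degree of collinearity.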

The data-generating parameter vector $\beta$ is set to

$$\beta = \left[\frac{L}{\sqrt{K}}\right]_{k \in \{1, \ldots, K\}}, \tag{29}$$

so that its direction in parameter space is $(1/\sqrt{K}, \ldots, 1/\sqrt{K})'$ and its length is $L$. We consider four different lengths $L$, including $L = 0$ (no signal).

Table 1 lists the population $R^2$ for the resulting scenarios, where

$$\text{population } R^2 = \frac{\beta' E(X'X)\beta}{\beta' E(X'X)\beta + T\sigma^2}.$$

Our simulation design covers only a limited region of the space of design parameters $(T, K, L, \sigma^2, \Lambda_s)$. Estimating a response surface over a larger region could better indicate the range of data sets for which gains from Stein-rule estimation can be expected. This is left for future research.

Table 1: Population $R^2$ for the different simulation scenarios (rows: $L$; columns: the constant, linear, and exponential eigenvalue scenarios, each crossed with $\sigma^2$).

The performance of the estimators is measured in terms of their risk. The general risk function is $\operatorname{Risk}(\beta, \bar\beta(y,X), Q) = E\left[(\bar\beta(y,X) - \beta)'Q(\bar\beta(y,X) - \beta)\right]$, as in (4). We study two cases:

- $Q = I$, the standard mean squared error considered in James and Stein (1961) and Judge and Bock (1978).
- $Q = X'X$, as considered in Judge and Bock (1978), Hill and Judge (1987, 1990), and Hill and Fomby (1992). This risk can be interpreted as the squared distance $\lVert X\beta - X\bar\beta\rVert^2$ of the fitted value from the signal part of $y$.

There are also a few estimator-specific settings to consider. These are the number $K_1$ of principal components for the PC estimator and the value of the parameter $a$ in the Stein-rule shrinkage estimator. We consider four values of $K_1$ (see Section 5.2) and a range of values of $a$. The two settings interact through the bounds for $a$ given in Judge and Bock (1978) and Hill and Judge (1987):

$$0 \le a \le \frac{2(K - K_1 - 2)}{T - K + 2}. \tag{31}$$

This upper bound decreases as $K_1$ increases. We therefore expect the region of $a$ in which the Stein-rule shrinkage estimator outperforms the OLS and PC estimators to move towards the origin as the number of components increases.

5.2 Choosing the Number of Principal Components

Selecting the number of principal components is a problem that has spawned a large literature (see, for example, Anderson 2003, Bai & Ng 2002, Hallin & Liška 2007, and Onatski 2009). In this paper, we restrict ourselves to studying the behavior of the Stein-rule estimator for a set of numbers of components. These are the one-factor model ($K_1 = 1$), a few components, a moderate number, and many factors. Recall that the total number of regressors $K$ is held fixed.

Figures 1 to 3 show the risk of the three estimators (Stein-rule shrinkage, OLS, and PC) as a function of the parameter $a$ of the Stein-rule shrinkage estimator. The OLS and PC estimators do not depend on this parameter, so their risks are constant in the graphs. The risk of the OLS estimator is depicted by a dotted line, the risk of the PC estimator by a dashed line, and the risk of the Stein-rule shrinkage estimator by connected dots. In each figure, the left panel of four plots shows the MSE risk ($Q = I$) and the right panel of four plots shows the risk for $Q = X'X$. The four plots correspond to the different numbers $K_1$ of principal components. Each figure shows a different singular value scenario:

- Figure 1: constant singular values equal to two.
- Figure 2: linearly decreasing singular values.
- Figure 3: exponentially decreasing singular values.

The graphs show that the risk of the Stein-rule estimator follows a parabola in $a$. This indicates that there is an optimal $a$, at least in the simulation scenarios considered. Unlike for the original James and Stein (1961) estimator, this optimal $a$ is not analytically known at this point.
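The risk curves in Figures 1 to 3 can be approximated by Monte Carlo. The sketch below is an illustration under stated assumptions, not the authors' code. It evaluates both risk weights ($Q = I$ and $Q = X'X$) for OLS, PC, and the Stein rule (17) over a grid of $a$ up to the bound (31). For simplicity it holds one draw of $X$ fixed across replications, whereas the expectation in (4) is also over $X$; drawing a new $X$ inside the loop would match that. The dimensions, $L$, and $\sigma$ are illustrative, and the helper names are hypothetical.

```python
import numpy as np


def fit_all(X, y, K1):
    """OLS and PC estimates plus the factor s such that psi = 1 - a * s, eq. (21)."""
    T, K = X.shape
    evals, V = np.linalg.eigh(X.T @ X / T)
    order = np.argsort(evals)[::-1]
    lam, V = evals[order], V[:, order]
    d = (X @ V / np.sqrt(lam)).T @ y / T                 # delta_hat for all K components
    b_pc = V[:, :K1] @ (d[:K1] / np.sqrt(lam[:K1]))      # eq. (15)
    b_ols = b_pc + V[:, K1:] @ (d[K1:] / np.sqrt(lam[K1:]))
    resid = y - X @ b_ols
    s = (resid @ resid) / (T * (d[K1:] @ d[K1:]))        # sigma2_hat (T-K) / (T delta2'delta2)
    return b_ols, b_pc, s


def risk_curves(X, beta, sigma, K1, a_grid, reps, rng):
    """Average losses with axes (a, estimator [OLS, PC, Stein], Q [I, X'X])."""
    T = X.shape[0]
    XtX = X.T @ X
    out = np.zeros((len(a_grid), 3, 2))
    for _ in range(reps):
        y = X @ beta + sigma * rng.standard_normal(T)
        b_ols, b_pc, s = fit_all(X, y, K1)
        for i, a in enumerate(a_grid):
            b_stein = b_pc + (1.0 - a * s) * (b_ols - b_pc)    # eq. (17)
            for j, b in enumerate((b_ols, b_pc, b_stein)):
                e = b - beta
                out[i, j] += np.array([e @ e, e @ XtX @ e])
    return out / reps


rng = np.random.default_rng(5)
T, K, K1, L, sigma = 100, 20, 3, 1.0, 1.0               # illustrative values
X = rng.standard_normal((T, K))
beta = np.full(K, L / np.sqrt(K))                        # eq. (29)
a_max = 2 * (K - K1 - 2) / (T - K + 2)                   # upper bound in (31)
a_grid = np.linspace(0.0, a_max, 7)
risks = risk_curves(X, beta, sigma, K1, a_grid, reps=200, rng=rng)
best_a = a_grid[np.argmin(risks[:, 2, 1])]               # minimizer for Q = X'X
print(best_a)
```

Apart from $a$, the shrinkage $\psi = 1 - a s$ depends only on the data through $s = \hat\sigma^2(T-K)/(T\hat\delta_2'\hat\delta_2)$. The OLS and PC fits are therefore computed once per replication, and only the combination step is repeated across the grid.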
The minimum of the parabola moves inwards towards the origin as the number $K_1$ of components increases, as expected. When the singular values are constant or linearly decreasing, the OLS estimator generally performs better than the PC estimator. For exponentially decreasing singular values, the PC estimator often performs better than the OLS estimator. The Stein-rule shrinkage estimator has a greater relative advantage over the PC and OLS estimators for a small number of components (the two smallest values of $K_1$). For larger $K_1$, its performance approaches that of the better of the OLS and PC estimators. Note that, unlike Hill and Judge (1987), the singular value scenarios considered in this paper do not include values close to zero. We found that in most such scenarios, where a strong degree of multicollinearity is present, the principal components estimator performs better than the Stein-rule shrinkage estimator.

5.3 Different Variance Scenarios

Figures 4 through 6 report the performance of the estimators for four different noise levels $\sigma^2$. The organization of the graphs is the same as described in Section 5.2. Again, the risk of the Stein-rule shrinkage estimator describes a parabola in $a$, indicating that an optimal parameter value exists. For low values of the variance, OLS performs better than principal components. As the noise level increases, the PC estimator outperforms OLS. Within an intermediate noise range, the Stein-rule shrinkage estimator can outperform both the OLS and PC estimators.

5.4 Different Lengths of the Parameter Vector β

Figures 7 through 9 display the performance of the estimators for different lengths $L$ of the parameter vector $\beta = [L/\sqrt{K}]$. The four plots of each panel show the risks of the estimators for the four values of $L$. The organization of the graphs is the same as described in Section 5.2.

If $L = 0$, that is, there is no signal in $y$, the PC estimator outperforms both the OLS and the Stein-rule shrinkage estimators. For large values of $L$, OLS performs better than the other estimators. In an intermediate range, the Stein-rule shrinkage estimator can outperform both other estimators.

Recall from Equation (8) that $\delta = \Lambda^{1/2}V'\beta$, where $\beta = [L/\sqrt{K}]$. Hence, the length $L$ of $\beta$ determines the length of $\delta$. As shown in (16), $T\delta_2'\delta_2$ is the second term in the asymptotic risk of the PC estimator and corresponds to the bias from omitting the $K_2$ principal components. A large value of $L$ therefore increases the risk of the PC estimator relative to that of the OLS estimator. For the Stein-rule estimator, a large value of $L$ increases $T\delta_2'\delta_2$, which in turn increases the F statistic defined from (21),

$$F_{K_2,T-K} = \frac{T\hat\delta_2'\hat\delta_2/K_2}{\hat\sigma^2}.$$

This increases the Stein-rule coefficient $\psi$ and thus reduces the shrinkage from the OLS estimator $\hat\beta$ to the PC estimator $\hat\beta_1$.
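The mechanism described above, in which a longer $\beta$ raises the F statistic and hence $\psi$, can be traced in a few lines. This sketch assumes an illustrative design (Gaussian $X$, $K_1 = 3$, $a = 0.2$, arbitrary values of $L$) and a hypothetical helper `mean_psi`. It reports the Monte Carlo average of $\psi$ from (21) for each $L$.

```python
import numpy as np


def mean_psi(X, L, sigma, K1, a, reps, rng):
    """Monte Carlo mean of psi in (21) when beta = (L / sqrt(K)) * ones, as in (29)."""
    T, K = X.shape
    K2 = K - K1
    evals, V = np.linalg.eigh(X.T @ X / T)
    order = np.argsort(evals)[::-1]
    lam, V = evals[order], V[:, order]
    P = X @ V / np.sqrt(lam)                             # principal components, eq. (8)
    beta = np.full(K, L / np.sqrt(K))
    psis = np.empty(reps)
    for r in range(reps):
        y = X @ beta + sigma * rng.standard_normal(T)
        d = P.T @ y / T                                  # delta_hat
        sigma2_hat = np.sum((y - P @ d) ** 2) / (T - K)
        F = (T * (d[K1:] @ d[K1:]) / K2) / sigma2_hat    # signal / noise
        psis[r] = 1.0 - a * (T - K) / (K2 * F)
    return float(psis.mean())


rng = np.random.default_rng(6)
T, K, K1, a, sigma = 100, 20, 3, 0.2, 1.0                # illustrative values
X = rng.standard_normal((T, K))
L_values = [0.0, 1.0, 3.0, 10.0]
psi_by_L = [mean_psi(X, L, sigma, K1, a, 200, rng) for L in L_values]
print(dict(zip(L_values, psi_by_L)))
```

For $L = 0$ the F statistic fluctuates around one and $\psi$ is well below one. For large $L$ it grows quickly and $\psi$ approaches one, so the estimator moves towards OLS.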

6 Out-of-Sample Performance of the Stein-Rule Shrinkage Estimator

6.1 Simulation Design

We assess the out-of-sample (OOS) performance of the Stein-rule shrinkage estimator in two different simulation setups. The first is exactly the setup described in Section 5.1, except that forecast performance is evaluated on a set of out-of-sample observations. The two risk functions considered for the OOS comparison are the mean squared forecast error (MSFE),

$$\operatorname{MSFE}(\bar\beta(y,X)) = E\left[(\hat y - y)'(\hat y - y)\right], \tag{32}$$

where $\hat y = X\bar\beta(y,X)$, and the squared signal-to-prediction distance as in (4) with $Q = X'X$,

$$\operatorname{Risk}(\beta, \bar\beta(y,X), X'X) = E\left[(\bar\beta(y,X) - \beta)'X'X(\bar\beta(y,X) - \beta)\right]. \tag{33}$$

The second simulation environment that we study has AR(1) time series in the columns of the regressor matrix $X$. That is,

$$X = \left[\{x_{1,t}\}\;\; \{x_{2,t}\}\;\; \cdots\;\; \{x_{K,t}\}\right]_{t \in \{1, \ldots, T\}}, \tag{34}$$

and the individual columns follow

$$x_{k,t} = \phi\, x_{k,t-1} + \sigma_{X,k}\,\xi_{t,k}, \qquad k = 1, \ldots, K, \qquad \xi_{t,k} \sim N(0, 1). \tag{35}$$

The standard deviations of the AR(1) processes in the columns of $X$ are chosen to match the exponentially decaying sequence employed in Equation (27):

$$\sqrt{\operatorname{Var}(x_{k,t})} = \frac{\sigma_{X,k}}{\sqrt{1-\phi^2}} = \lambda_k, \qquad k = 1, \ldots, K. \tag{36}$$

Thus, $\sigma_{X,k} = \sigma_{X,k}(\phi) = \lambda_k\sqrt{1 - \phi^2}$. Varying $\phi$ replaces the variance dimension considered in the in-sample study. The standard deviation $\sigma$ of the noise in $y$ is held fixed, and the same set of out-of-sample observations is evaluated. Note that principal components are linear combinations of the columns of $X$,

$$P = XV\Lambda^{-1/2}. \tag{37}$$

Therefore, with coefficients $w_{k,j}$ determined by $V$ and $\Lambda$, the individual components are

$$P_{j,t} = \sum_{k=1}^{K} w_{k,j}\,x_{k,t} = \phi\sum_{k=1}^{K} w_{k,j}\,x_{k,t-1} + \sum_{k=1}^{K} w_{k,j}\,\sigma_{X,k}\,\xi_{t,k} = \phi P_{j,t-1} + \eta_{j,t},$$

where $\eta_{j,t} = \sum_{k=1}^{K} w_{k,j}\,\sigma_{X,k}\,\xi_{t,k}$. As long as the AR(1) parameter $\phi$ is the same across all columns of $X$, the principal components are themselves AR(1) processes with the same decorrelation length as the individual columns. If different $\phi_k$ are chosen across the columns, the principal components become linear combinations of AR(1) processes with different persistence parameters. This can lead to long memory behavior of the components, as described in Granger (1980).

6.2 Choosing the Number of Principal Components

Figures 10 and 11 display the out-of-sample performance of the estimators for four different numbers $K_1$ of principal components. The organization of the graphs is similar to the one described in Section 5.2. Instead of different singular value scenarios, two different simulation designs are considered. Figure 10 shows the case where the regressor matrix $X$ is drawn from independent $N(0, 1)$ distributions; Figure 11 shows the case where the regressors are AR(1) time series.

Unlike in the in-sample study, the relative performance of the PC and OLS estimators here changes with the number $K_1$ of principal components. For small $K_1$, OLS performs better than PC; for large $K_1$, PC performs better than OLS. The Stein-rule shrinkage estimator dominates for up to ten components. There is no obvious difference between the i.i.d. and the AR(1) simulation scenarios.

6.3 Different Variance Scenarios

Figure 12 shows the performance of the estimators when $X$ is drawn from an $N(0, 1)$ distribution. As in the in-sample study, OLS performs best for low noise levels and PC performs best for high noise levels. The Stein-rule shrinkage estimator can outperform both in an intermediate noise range.

Figure 13 shows the performance of the estimators when the columns of $X$ follow AR(1) dynamics. The four plots in each panel correspond to four different values of the AR parameter $\phi$. The standard deviation of the error in the AR model is set through $\sigma_{X,k}(\phi) = \lambda_k\sqrt{1-\phi^2}$, so that the standard deviation of each column follows Equation (36).
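The AR(1) design (35)–(36) and the claim that the principal components inherit the AR(1) parameter $\phi$ can be checked with a short simulation. In this NumPy sketch, the decaying standard-deviation sequence $\lambda_k$ uses hypothetical constants in place of those of (27), and each process is started from its stationary distribution. For each component, $\phi$ is estimated by least squares on its own lag. The helper `simulate_ar1_columns` is a hypothetical name.

```python
import numpy as np


def simulate_ar1_columns(T, lam, phi, rng):
    """Columns x_{k,t} = phi x_{k,t-1} + sigma_{X,k} xi_{t,k} with stationary sd lam[k]."""
    K = lam.size
    sigma_X = lam * np.sqrt(1.0 - phi ** 2)      # sigma_{X,k}(phi) = lambda_k sqrt(1 - phi^2)
    X = np.empty((T, K))
    X[0] = lam * rng.standard_normal(K)           # start in the stationary distribution
    for t in range(1, T):
        X[t] = phi * X[t - 1] + sigma_X * rng.standard_normal(K)
    return X


rng = np.random.default_rng(7)
T, K, phi = 5000, 6, 0.5                          # illustrative values
k = np.arange(1, K + 1)
lam = 0.5 + 3.5 * np.exp(-0.3 * k)                # hypothetical decaying sd sequence
X = simulate_ar1_columns(T, lam, phi, rng)

evals, V = np.linalg.eigh(X.T @ X / T)
order = np.argsort(evals)[::-1]
P = X @ V[:, order] / np.sqrt(evals[order])       # principal components, eq. (37)

# Least squares estimate of the AR(1) coefficient of each component
phi_hat = np.sum(P[1:] * P[:-1], axis=0) / np.sum(P[:-1] ** 2, axis=0)
print(phi_hat)
```

With a common $\phi$ every estimate should be close to $\phi$. Giving the columns different $\phi_k$ would instead produce components that mix persistence levels, which is the case discussed with reference to Granger (1980).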
The figure shows that the Stein-rule shrinkage estimator outperforms the OLS and PC estimators in the two low-persistence scenarios. In the two high-persistence scenarios, the PC estimator outperforms both the Stein-rule and OLS estimators. The relative performance of the OLS and PC estimators also changes with persistence: in low-persistence scenarios OLS performs better than PC, and vice versa for high persistence.
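The sketch below mimics the structure of these out-of-sample comparisons. It fits OLS, PC, and the Stein rule on an estimation window, then evaluates the MSFE (32) and the signal distance (33) on the following observations, for a low and a high value of $\phi$. It computes (33) with the out-of-sample regressors, which is an assumption. All numerical settings (window lengths, $K$, $K_1$, $a$, $\sigma$, the $\phi$ values, and the standard-deviation sequence) are illustrative, and the function names are hypothetical. It is a sketch, not the simulation code behind Figures 10 to 15.

```python
import numpy as np


def oos_losses(b, beta, X_out, y_out):
    """MSFE (32) and signal-to-prediction distance (33) on hold-out data."""
    y_hat = X_out @ b
    e = b - beta
    return np.array([(y_hat - y_out) @ (y_hat - y_out), e @ (X_out.T @ X_out) @ e])


def fit_ols_pc_stein(X, y, K1, a):
    T, K = X.shape
    evals, V = np.linalg.eigh(X.T @ X / T)
    order = np.argsort(evals)[::-1]
    lam, V = evals[order], V[:, order]
    d = (X @ V / np.sqrt(lam)).T @ y / T
    b_pc = V[:, :K1] @ (d[:K1] / np.sqrt(lam[:K1]))
    b_ols = b_pc + V[:, K1:] @ (d[K1:] / np.sqrt(lam[K1:]))
    resid = y - X @ b_ols
    psi = 1.0 - a * (resid @ resid) / (T * (d[K1:] @ d[K1:]))   # eq. (21)
    return {"OLS": b_ols, "PC": b_pc, "Stein": b_pc + psi * (b_ols - b_pc)}


def ar1_design(T, lam, phi, rng):
    """AR(1) columns with stationary standard deviations lam, eqs. (35)-(36)."""
    X = np.empty((T, lam.size))
    X[0] = lam * rng.standard_normal(lam.size)
    for t in range(1, T):
        X[t] = phi * X[t - 1] + lam * np.sqrt(1 - phi ** 2) * rng.standard_normal(lam.size)
    return X


def oos_comparison(phi, T_in, T_out, K1, a, beta, sigma, reps, rng):
    lam = 0.5 + 3.5 * np.exp(-0.3 * np.arange(1, beta.size + 1))  # hypothetical sequence
    totals = {name: np.zeros(2) for name in ("OLS", "PC", "Stein")}
    for _ in range(reps):
        X = ar1_design(T_in + T_out, lam, phi, rng)
        y = X @ beta + sigma * rng.standard_normal(T_in + T_out)
        fits = fit_ols_pc_stein(X[:T_in], y[:T_in], K1, a)
        for name, b in fits.items():
            totals[name] += oos_losses(b, beta, X[T_in:], y[T_in:])
    return {name: v / reps for name, v in totals.items()}


rng = np.random.default_rng(8)
K = 10
beta = np.full(K, 1.0 / np.sqrt(K))                   # eq. (29) with L = 1
results = {phi: oos_comparison(phi, 100, 50, 2, 0.05, beta, 1.0, 100, rng)
           for phi in (0.2, 0.95)}                    # illustrative low and high persistence
for phi, res in results.items():
    print(phi, {name: np.round(v, 2) for name, v in res.items()})
```

Each entry holds the average of $[\,\text{MSFE}, \text{signal distance}\,]$ over replications. The regressors are simulated as one contiguous AR(1) path, so the hold-out block inherits the persistence of the estimation window.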

6.4 Different Lengths of the Parameter Vector β

Figures 14 and 15 show the performance of the estimators for four different lengths $L$ of the parameter vector. As in the in-sample case, PC performs best when $L = 0$. For the next larger value of $L$, OLS performs better than PC, but both are dominated by the Stein-rule shrinkage estimator. For higher values of $L$, OLS performs best among all three estimators. This holds for both simulation environments, i.i.d. regressors and AR(1) regressors.

7 Concluding Remarks

In this paper, we have shown that the Stein-rule shrinkage estimator of Hill and Judge (1987, 1990), which shrinks the OLS estimator towards the PC estimator, can be represented as a shrinkage estimator for a forecasting model in the sense of Stock and Watson (…). We examined the performance of the estimator in a variety of simulation environments, both in-sample and out-of-sample. The overall picture that emerges is that the Stein-rule shrinkage estimator can dominate both the OLS and principal components estimators within an intermediate range of the signal-to-noise ratio. If the noise level is high (high variance of the noise terms) or the signal is low (short parameter vector), the principal components estimator is superior. If the noise level is low (low variance of the noise terms) or the signal is high (long parameter vector), OLS is superior. In out-of-sample simulations with AR(1) regressors, the Stein-rule shrinkage estimator can dominate both the OLS and principal components estimators in low-persistence situations.

Acknowledgments

We would like to thank the participants of the Advances in Econometrics conference in Baton Rouge in March for their helpful comments, in particular Carter Hill, Tom Fomby, Stan Johnson, Mike McCracken, and Lee Adkins. We would also like to thank the organizers of the conference, in particular Dek Terrell, for a job superbly done and for many useful comments on this paper. The usual disclaimer applies.

References

Anderson, T.W. (2003). An Introduction to Multivariate Statistical Analysis. 3rd edn. Hoboken, NJ: Wiley.
Bai, J. (2003). Inferential Theory for Factor Models of Large Dimensions. Econometrica, 71(1), 135–171.

Bai, J., & Ng, S. (2002). Determining the Number of Factors in Approximate Factor Models. Econometrica, 70(1), 191–221.
Bai, J., & Ng, S. (2006). Confidence Intervals for Diffusion Index Forecasts and Inference for Factor-Augmented Regressions. Econometrica, 74(4), 1133–1150.
Bai, J., & Ng, S. (2008). Forecasting Economic Time Series Using Targeted Predictors. Journal of Econometrics, 146, 304–317.
Bair, E., Hastie, T., Paul, D., & Tibshirani, R. (2006). Prediction by Supervised Principal Components. Journal of the American Statistical Association, 101(473), 119–137.
Bates, J.M., & Granger, C.W.J. (1969). The Combination of Forecasts. Operational Research Quarterly, 20, 451–468.
Clark, T.E., & McCracken, M.W. (2009). Combining Forecasts from Nested Models. Oxford Bulletin of Economics and Statistics, 71(3), 303–329.
Fomby, T.B., Hill, R.C., & Johnson, S.R. (1978). An Optimal Property of Principal Components in the Context of Restricted Least Squares. Journal of the American Statistical Association, 73(361), 191–193.
Fomby, T.B., & Samanta, S.K. (1991). Application of Stein Rules to Combination Forecasting. Journal of Business and Economic Statistics, 9(4), 391–407.
Granger, C.W.J. (1980). Long Memory Relationships and the Aggregation of Dynamic Models. Journal of Econometrics, 14, 227–238.
Hansen, B. (…). Efficient Shrinkage in Parametric Models. Manuscript, University of Wisconsin, Madison.
Hill, R.C., & Fomby, T.B. (1992). The Effect of Extrapolation on Minimax Stein-Rule Prediction. In W. Griffiths, H. Lütkepohl, & M.E. Bock (Eds.), Readings in Econometric Theory and Practice: A Volume in Honor of George G. Judge. Amsterdam: North-Holland.
Hill, R.C., & Judge, G.G. (1987). Improved Prediction in the Presence of Multicollinearity. Journal of Econometrics, 35, 83–100.
Hill, R.C., & Judge, G.G. (1990). Improved Estimation under Collinearity and Squared Error Loss. Journal of Multivariate Analysis, 32, 296–312.
Hillebrand, E., Huang, H., Lee, T.-H., & Li, C. (…). Using the Yield Curve in Forecasting Output Growth and Inflation. Manuscript, Aarhus University and UC Riverside.

Huang, H., & Lee, T.-H. (2010). To Combine Forecasts or To Combine Information? Econometric Reviews, 29, 534–570.
Inoue, A., & Kilian, L. (2008). How Useful is Bagging in Forecasting Economic Time Series? A Case Study of U.S. CPI Inflation. Journal of the American Statistical Association, 103(482), 511–522.
James, W., & Stein, C.M. (1961). Estimation with Quadratic Loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1, 361–379.
Judge, G.G., & Bock, M.E. (1978). The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics. Amsterdam: North-Holland.
Hallin, M., & Liška, R. (2007). Determining the Number of Factors in the General Dynamic Factor Model. Journal of the American Statistical Association, 102(478), 603–617.
Mittelhammer, R.C. (1985). Quadratic Risk Domination of Restricted Least Squares Estimators via Stein-Rules Auxiliary Constraints. Journal of Econometrics, 29, 289–303.
Onatski, A. (2009). Testing Hypotheses About the Number of Factors in Large Factor Models. Econometrica, 77(5), 1447–1479.
Stock, J.H., & Watson, M.W. (2002). Forecasting Using Principal Components from a Large Number of Predictors. Journal of the American Statistical Association, 97, 1167–1179.
Stock, J.H., & Watson, M.W. (2006). Forecasting with Many Predictors. In G. Elliott, C.W.J. Granger, & A. Timmermann (Eds.), Handbook of Economic Forecasting, Volume 1 (pp. 515–554). Amsterdam: North-Holland.
Stock, J.H., & Watson, M.W. (…). Generalized Shrinkage Methods for Forecasting Using Many Predictors. Manuscript, Harvard University and Princeton University.

Figure 1: $\operatorname{Risk}(\bar\beta(y,X)) = E\left[(\bar\beta(y,X) - \beta)'Q(\bar\beta(y,X) - \beta)\right]$ as a function of $a$. Left panel: $Q = I$ (MSE); right panel: $Q = X'X$. The four plots in each panel correspond to different numbers $K_1$ of principal components. The data-generating singular values are constant and equal to two; $L$ and $\sigma^2$ are held fixed. The connected-dots line shows the performance of the Stein-like estimator. For comparison, the performance of the standard OLS estimator is shown as a dotted line and that of the principal components estimator as a dashed line.

Figure 2: $\operatorname{Risk}(\bar\beta(y,X)) = E\left[(\bar\beta(y,X) - \beta)'Q(\bar\beta(y,X) - \beta)\right]$ as a function of $a$. Left panel: $Q = I$ (MSE); right panel: $Q = X'X$. The four plots in each panel correspond to different numbers $K_1$ of principal components. The data-generating singular values are linearly decreasing; $L$ and $\sigma^2$ are held fixed. The connected-dots line shows the performance of the Stein-like estimator. For comparison, the performance of the standard OLS estimator is shown as a dotted line and that of the principal components estimator as a dashed line.

Figure 3: $\operatorname{Risk}(\bar\beta(y,X)) = E\left[(\bar\beta(y,X) - \beta)'Q(\bar\beta(y,X) - \beta)\right]$ as a function of $a$. Left panel: $Q = I$ (MSE); right panel: $Q = X'X$. The four plots in each panel correspond to different numbers $K_1$ of principal components. The data-generating singular values are exponentially decreasing as in (27); $L$ and $\sigma^2$ are held fixed. The connected-dots line shows the performance of the Stein-like estimator. For comparison, the performance of the standard OLS estimator is shown as a dotted line and that of the principal components estimator as a dashed line.

Figure 4: Risk$(\hat\beta(y, X)) = E[(\hat\beta(y, X) - \beta)'Q(\hat\beta(y, X) - \beta)]$. Left panel: $Q = I$ (MSE); right panel: $Q = X'X$. Curves are shown for several values of σ. The data-generating singular values are constant and equal to two; the other parameters, L and K, are held fixed. The connected-dots line shows the performance of the Stein-like estimator; for comparison, the standard OLS estimator is shown in dots and the principal components estimator in dashes.

Figure 5: Risk$(\hat\beta(y, X)) = E[(\hat\beta(y, X) - \beta)'Q(\hat\beta(y, X) - \beta)]$. Left panel: $Q = I$ (MSE); right panel: $Q = X'X$. Curves are shown for several values of σ. The data-generating singular values are linearly decreasing; the other parameters, L and K, are held fixed. The connected-dots line shows the performance of the Stein-like estimator; for comparison, the standard OLS estimator is shown in dots and the principal components estimator in dashes.

Figure 6: Risk$(\hat\beta(y, X)) = E[(\hat\beta(y, X) - \beta)'Q(\hat\beta(y, X) - \beta)]$. Left panel: $Q = I$ (MSE); right panel: $Q = X'X$. Curves are shown for several values of σ. The data-generating singular values are exponentially decreasing; the other parameters, L and K, are held fixed. The connected-dots line shows the performance of the Stein-like estimator; for comparison, the standard OLS estimator is shown in dots and the principal components estimator in dashes.

Figure 7: Risk$(\hat\beta(y, X)) = E[(\hat\beta(y, X) - \beta)'Q(\hat\beta(y, X) - \beta)]$. Left panel: $Q = I$ (MSE); right panel: $Q = X'X$. Curves are shown for several values of L. The data-generating singular values are constant and equal to two; the other parameters, K and σ, are held fixed. The connected-dots line shows the performance of the Stein-like estimator; for comparison, the standard OLS estimator is shown in dots and the principal components estimator in dashes.

Figure 8: Risk$(\hat\beta(y, X)) = E[(\hat\beta(y, X) - \beta)'Q(\hat\beta(y, X) - \beta)]$. Left panel: $Q = I$ (MSE); right panel: $Q = X'X$. Curves are shown for several values of L. The data-generating singular values are linearly decreasing; the other parameters, K and σ, are held fixed. The connected-dots line shows the performance of the Stein-like estimator; for comparison, the standard OLS estimator is shown in dots and the principal components estimator in dashes.

Figure 9: Risk$(\hat\beta(y, X)) = E[(\hat\beta(y, X) - \beta)'Q(\hat\beta(y, X) - \beta)]$. Left panel: $Q = I$ (MSE); right panel: $Q = X'X$. Curves are shown for several values of L. The data-generating singular values are exponentially decreasing; the other parameters, K and σ, are held fixed. The connected-dots line shows the performance of the Stein-like estimator; for comparison, the standard OLS estimator is shown in dots and the principal components estimator in dashes.
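Each risk curve for the Stein-like, OLS and principal components estimators is an expectation over repeated samples and can be approximated by Monte Carlo. The sketch below is one way to do this, not the authors' code. It assumes a fixed design X, normal errors, a principal components estimator that restricts β to the span of the first L right singular vectors of X, and a Stein rule of the Judge and Bock (1978) type that shrinks OLS toward PC, $\hat\beta_S = \hat\beta_{PC} + (1 - a\,\mathrm{SSE}/D)(\hat\beta_{OLS} - \hat\beta_{PC})$ with $D = (\hat\beta_{OLS} - \hat\beta_{PC})'X'X(\hat\beta_{OLS} - \hat\beta_{PC})$. The default constant $a = (J - 2)/(T - K + 2)$ with $J = K - L$ restrictions, and the function names `estimators` and `mc_risk`, are assumptions for illustration.

```python
import numpy as np

def estimators(y, X, L, a=None):
    """OLS, L-component PC, and a Stein-like combination of the two (sketch)."""
    T, K = X.shape
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    b_ols = Vt.T @ ((U.T @ y) / s)                 # (X'X)^{-1} X'y written via the SVD
    b_pc = Vt[:L].T @ ((U[:, :L].T @ y) / s[:L])   # least squares restricted to the first L PCs
    J = K - L                                      # number of restrictions imposed by PC
    if a is None:
        if J < 3:
            raise ValueError("the default shrinkage constant needs K - L >= 3")
        a = (J - 2) / (T - K + 2)                  # assumed Judge-Bock-type constant
    sse = np.sum((y - X @ b_ols) ** 2)             # OLS residual sum of squares
    d = b_ols - b_pc
    dist = np.sum((X @ d) ** 2)                    # d'X'Xd = SSE_PC - SSE_OLS
    b_stein = b_pc + (1.0 - a * sse / dist) * d    # shrink OLS toward PC
    return {"ols": b_ols, "pc": b_pc, "stein": b_stein}

def mc_risk(X, beta, sigma, L, n_rep=2000, seed=0):
    """Monte Carlo risk E[(b - beta)'Q(b - beta)] for Q = I and Q = X'X, fixed design X."""
    rng = np.random.default_rng(seed)
    T, K = X.shape
    weights = {"I": np.eye(K), "XtX": X.T @ X}
    totals = {(name, q): 0.0 for name in ("ols", "pc", "stein") for q in weights}
    for _ in range(n_rep):
        y = X @ beta + sigma * rng.standard_normal(T)   # normal errors (assumed)
        for name, b in estimators(y, X, L).items():
            err = b - beta
            for q, Q in weights.items():
                totals[(name, q)] += err @ Q @ err
    return {key: total / n_rep for key, total in totals.items()}

# Usage: a random 100 x 10 design, beta = ones, three retained components.
X_demo = np.random.default_rng(1).standard_normal((100, 10))
risk_demo = mc_risk(X_demo, beta=np.ones(10), sigma=1.0, L=3, n_rep=500)
print(risk_demo[("stein", "XtX")], risk_demo[("ols", "XtX")])
```

Because X is held fixed across replications, the OLS risk under $Q = X'X$ should come out close to $\sigma^2 K$, which gives a quick check on the simulation before comparing the three estimators.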

Figure 10: Left panel: MSFE$(\hat\beta(y, X)) = E[(\hat y - y)'(\hat y - y)]$, where $\hat y$ is the vector of forecasts of y over the forecast sample. Right panel: Risk$(\beta, \hat\beta(y, X), X'X) = E[(\hat\beta(y, X) - \beta)'X'X(\hat\beta(y, X) - \beta)]$ for the forecast sample. Curves are shown for several values of K. The data-generating eigenvalues are exponentially decreasing; the other parameters, L and σ, are held fixed. The connected-dots line shows the performance of the Stein-like estimator; for comparison, the standard OLS estimator is shown in dots and the principal components estimator in dashes.

Figure 11: Left panel: MSFE$(\hat\beta(y, X)) = E[(\hat y - y)'(\hat y - y)]$, where $\hat y$ is the vector of forecasts of y over the forecast sample. Right panel: Risk$(\beta, \hat\beta(y, X), X'X) = E[(\hat\beta(y, X) - \beta)'X'X(\hat\beta(y, X) - \beta)]$ for the forecast sample. Curves are shown for several values of K. The columns of the regressor matrix X are AR(1) processes; the other parameters, L and σ, are held fixed. The connected-dots line shows the performance of the Stein-like estimator; for comparison, the standard OLS estimator is shown in dots and the principal components estimator in dashes.

Figure 12: Left panel: MSFE$(\hat\beta(y, X)) = E[(\hat y - y)'(\hat y - y)]$, where $\hat y$ is the vector of forecasts of y over the forecast sample. Right panel: Risk$(\beta, \hat\beta(y, X), X'X) = E[(\hat\beta(y, X) - \beta)'X'X(\hat\beta(y, X) - \beta)]$ for the forecast sample. Curves are shown for several values of σ. The data-generating eigenvalues are exponentially decreasing; the other parameters, K and L, are held fixed. The connected-dots line shows the performance of the Stein-like estimator; for comparison, the standard OLS estimator is shown in dots and the principal components estimator in dashes.

Figure 13: Left panel: MSFE$(\hat\beta(y, X)) = E[(\hat y - y)'(\hat y - y)]$, where $\hat y$ is the vector of forecasts of y over the forecast sample. Right panel: Risk$(\beta, \hat\beta(y, X), X'X) = E[(\hat\beta(y, X) - \beta)'X'X(\hat\beta(y, X) - \beta)]$ for the forecast sample. The columns of the regressor matrix X are AR(1) processes; curves are shown for several values of the autoregressive coefficient φ. The other parameters, K and L, are held fixed. The connected-dots line shows the performance of the Stein-like estimator; for comparison, the standard OLS estimator is shown in dots and the principal components estimator in dashes.

Figure 14: Left panel: MSFE$(\hat\beta(y, X)) = E[(\hat y - y)'(\hat y - y)]$, where $\hat y$ is the vector of forecasts of y over the forecast sample. Right panel: Risk$(\beta, \hat\beta(y, X), X'X) = E[(\hat\beta(y, X) - \beta)'X'X(\hat\beta(y, X) - \beta)]$ for the forecast sample. Curves are shown for several values of L. The data-generating eigenvalues are exponentially decreasing; the other parameters, K and σ, are held fixed. The connected-dots line shows the performance of the Stein-like estimator; for comparison, the standard OLS estimator is shown in dots and the principal components estimator in dashes.

Figure 15: Left panel: MSFE$(\hat\beta(y, X)) = E[(\hat y - y)'(\hat y - y)]$, where $\hat y$ is the vector of forecasts of y over the forecast sample. Right panel: Risk$(\beta, \hat\beta(y, X), X'X) = E[(\hat\beta(y, X) - \beta)'X'X(\hat\beta(y, X) - \beta)]$ for the forecast sample. Curves are shown for several values of L. The columns of the regressor matrix X are AR(1) processes; the other parameters, K and σ, are held fixed. The connected-dots line shows the performance of the Stein-like estimator; for comparison, the standard OLS estimator is shown in dots and the principal components estimator in dashes.
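The forecasting experiments with AR(1) regressors replace the fixed design by predictors with time-series persistence. A minimal sketch of such an experiment follows, assuming independent AR(1) columns with standard normal innovations, β equal to a vector of ones, an estimation sample of T1 periods followed by a forecast sample of T2 periods, and forecasts $\hat y = X_f\hat\beta$ that use the realized regressors in the forecast sample. The function names, sample split and burn-in length are hypothetical; the Stein-like combination is omitted for brevity and could be added as a third entry in `fits`.

```python
import numpy as np

def ar1_regressors(T, K, phi, rng, burn=200):
    """T x K matrix of independent AR(1) columns x_t = phi * x_{t-1} + e_t, e_t ~ N(0, 1)."""
    e = rng.standard_normal((T + burn, K))
    x = np.zeros((T + burn, K))
    for t in range(1, T + burn):
        x[t] = phi * x[t - 1] + e[t]
    return x[burn:]                                   # discard burn-in so the start value x_0 = 0 is forgotten

def pc_coefficients(y, X, L):
    """Least squares on the first L principal components of X, mapped back to K coefficients."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:L].T @ ((U[:, :L].T @ y) / s[:L])

def msfe(T1, T2, K, L, phi, sigma, n_rep=500, seed=0):
    """Monte Carlo E[(y_hat - y)'(y_hat - y)] over a T2-period forecast sample."""
    rng = np.random.default_rng(seed)
    beta = np.ones(K)                                 # hypothetical true coefficients
    totals = {"ols": 0.0, "pc": 0.0}
    for _ in range(n_rep):
        X = ar1_regressors(T1 + T2, K, phi, rng)
        y = X @ beta + sigma * rng.standard_normal(T1 + T2)
        Xe, ye, Xf, yf = X[:T1], y[:T1], X[T1:], y[T1:]   # estimation / forecast split
        fits = {"ols": np.linalg.lstsq(Xe, ye, rcond=None)[0],
                "pc": pc_coefficients(ye, Xe, L)}
        for name, b in fits.items():
            totals[name] += np.sum((Xf @ b - yf) ** 2)   # squared forecast errors summed over T2
    return {name: total / n_rep for name, total in totals.items()}
```

Rerunning `msfe` over a grid of `phi` values mirrors the comparison across persistence levels; the irreducible part of the MSFE is $\sigma^2 T_2$, so each estimator's excess over that floor measures its estimation error.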