Computer Simulates the Effect of Internal Restriction on Residuals in Linear Regression Model with First-order Autoregressive Procedures

MPRA (Munich Personal RePEc Archive) Paper No. 60362, posted 4 December 2014 13:08 UTC. Mei-Yu Lee, Department of Applied Finance, Yuanpei University, 2014. Online at https://mpra.ub.uni-muenchen.de/60362/

Journal of Statistical and Econometric Methods, vol. 3, no. 3, 2014, 1-22. ISSN: 2241-0384 (print), 2241-0376 (online). Scienpress Ltd, 2014.

Mei-Yu Lee¹

¹ Department of Applied Finance, Yuanpei University, Hsinchu, Taiwan. E-mail: mylee@gm.ypu.edu.tw
Article Info: Received: June 14, 2014. Revised: July 27, 2014. Published online: August 15, 2014.

Abstract

This paper demonstrates the impact of particular factors, namely a non-normal error distribution, constraints on the residuals, sample size, multi-collinear values of the independent variables, and the autocorrelation coefficient, on the distributions of errors and residuals. It explains how residuals increasingly tend toward a normal distribution as the linear regression method imposes more linear constraints on them. Conversely, with fewer linear requirements the shape of the error distribution shows more clearly in the residuals. We find that if the errors follow a normal distribution, then the residuals do as well. However, if the errors follow a U-quadratic distribution, then the residuals follow a mixture of the error distribution and a normal distribution, owing to the interaction of the linear requirements and the sample size. Thus, increasing the constraints on the residuals by adding independent variables causes the residuals to follow a normal distribution, which makes them a poor estimator when the errors have a non-normal distribution. Only when the sample size is large enough to eliminate the effects of these linear requirements and of multi-collinearity can the residuals be viewed as an estimator of the errors.

Mathematics Subject Classification: 37M10; 62M10; 37M05; 68U20

Keywords: Time series; Autoregressive model; Computer simulation; Non-normal distribution

1 Introduction

It is reasonable to ask why business studies routinely use linear regression models while the engineering and quality management fields do not rely on them. Researchers face a residual distribution that differs from the error distribution, which diverges from the result of Box and Pierce (1970). Box and Pierce suppose that residuals from a well-fitting model should be close to the true errors and can be regarded as estimators of the errors in an autoregressive process. Accordingly, several statistics built on residuals treat them as good estimators of the errors, including the Durbin-Watson test statistic (Durbin and Watson, 1950, 1951) and the Lagrange multiplier test statistic (Breusch and Pagan, 1980) in the linear regression model with an autoregressive error process. In general, residuals are viewed as the best available representation of the errors and are combined into estimators of the properties of the errors, such as their serial correlation or their distribution.

In the literature, Yule (1921) first discusses the problem of serial correlation, and Roos (1936) then provides basic solutions: how to make independent variables independent by choosing the lagged time, and how to capture trend and fluctuation. Box and Pierce (1970) also note that the residual autocorrelations can be approximated by a linear transformation of the error autocorrelations and possess a normal distribution. Therefore, the residual distribution plays an important role: residuals can be used to test the initially assumed error distribution. For example, the errors are taken to have a normal distribution if the residuals are identified as having a normal distribution. However, time series data may have a non-normal error distribution, which contradicts an assumption of the linear regression model. Moreover, an autoregressive error procedure contradicts the assumption of the linear regression model that the errors are independent.

This paper aims to explain, using a probability simulation approach, that the distributions of the residuals are distinct from the distributions of the errors. That is, we run a computer simulation with a random number table drawn from a uniform distribution between 0 and 1 to investigate the following three points: (1) if the errors are normally distributed, then the residuals also have a normal distribution; (2) if the errors have a non-normal distribution, then the residuals also have a non-normal distribution; and (3) as the sample size becomes larger, the residual distribution approaches the error distribution, that is, the law of large numbers gradually gains influence over the distribution of the residuals.

The second point is based on an example with U-quadratic distributed errors. If the argument of Box and Pierce were right, then the residuals should have a U-quadratic distribution and not a normal distribution. However, the calculated residuals appear to have a normal distribution, contradicting Box and Pierce. This paper presents computer simulation results showing that the normality of the residuals results from the number of independent variables, which determines the number of constraints on the residuals, $X'\hat{\varepsilon} = 0$: one plus the number of independent variables. These constraints are referred to as the linear requirements of the linear regression model, and their number is the same as the degrees of freedom (Lee, 2014a, 2014b, 2014c).

The third point refers to the change in the residual distribution, with a fixed number of independent variables and a U-quadratic error distribution, as the sample size becomes larger.

The paper is structured as follows. Section 2 describes the model setting and simulation procedure. Section 3 gives the results of the three cases in which the errors have a normal distribution, the errors have a U-quadratic distribution, and the sample size is changed. Section 4 concludes this paper.

2 Model and simulation procedures

Consider a linear regression model with k independent variables and sample size T,

$Y_{(T \times 1)} = X_{(T \times k)}\,\beta_{(k \times 1)} + \varepsilon_{(T \times 1)}$

under the conditions $E(\varepsilon) = 0$ and $E(X'\varepsilon) = 0$, where the first-order autoregressive procedure is

$\varepsilon_{t+1} = \rho\,\varepsilon_t + \mu_{t+1}$, where $t = 1, 2, \ldots, T-1$ and $|\rho| < 1$.

The estimators are $\hat{\beta} = (X'X)^{-1}X'Y$ and $\hat{Y} = X\hat{\beta}$. The residuals are then

$\hat{\varepsilon} = \varepsilon - X(X'X)^{-1}X'\varepsilon = \left(I - X(X'X)^{-1}X'\right)\varepsilon$

and the linear requirement is $X'\hat{\varepsilon} = 0$, where $Y = \hat{Y} + \hat{\varepsilon}$ (Baltagi, 2011). When $\hat{Y}$ is normally distributed, the point estimators of the regression coefficients and $\hat{\varepsilon}$ are also normally distributed; $\hat{Y}$ is approximately normal if k is large enough. We calculate the mean square error (MSE) as $\mathrm{MSE} = (Y - X\hat{\beta})'(Y - X\hat{\beta})/(T-k-1)$, from $E(\hat{\varepsilon}) = 0$ and $E(\hat{\varepsilon}\hat{\varepsilon}') = (X'X)^{-1}\,\mathrm{MSE}$. Thus, the two main factors that affect the probability distribution of the residuals are the assumed error distribution and the linear requirement $X'\hat{\varepsilon} = 0$, which has k+1 constraints. Moreover, the sample size, the multi-collinear values of the independent variables, and the autocorrelation coefficient affect the distribution of the residuals. However, as the distribution of the residuals in the above model is difficult to formulate, this paper uses only computer simulations to show how these factors affect the relation between the errors and the residuals.

2.1 The simulator method

The sampling distribution of a test statistic may or may not be known. In particular, the sampling distributions of some test statistics cannot be derived using traditional mathematical techniques such as calculus or Monte Carlo methods. The Monte Carlo method is a good simulation concept, but the use of continuous data is not possible in our computer program. To handle the continuous normal distribution (or U-quadratic distribution), we run a computer simulation using a software program that can work with any probability distribution transformation. The probability theory of this paper builds on the basic concept of a probability distribution simulator: functions of continuous random variables can be obtained by transformation from a uniform distribution with parameters 0 and 1, $X \sim U(0,1)$. Thus, we can generate the data and compute the coefficients and images.² The computer simulation is based on the following steps.

² The software program is named White model I and can be downloaded from http://goo.gl/oudpsp. The distributions of the t-th error and residual, and the distributions of the sums of the errors and residuals, can be simulated by the software. The simulation technology is from C.C.C. Ltd. (http://psccc.com.tw/en/product). The U-quadratic distribution formula can be found at http://psccc.com.tw/uploads/files/probability/1/chapter_one_02.pdf.
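The transformation from U(0,1) to a target distribution described in this subsection is commonly implemented as inverse-transform sampling. The following is a minimal sketch and not the White model I software itself. It assumes a U-quadratic distribution centred at zero and scaled to unit variance (matching the error variance of 1 used later in the paper); the names `HALF_WIDTH`, `u_quadratic_inverse_cdf`, and `moments` are hypothetical helpers introduced only for illustration.

```python
import math
import random

# Assumption: U-quadratic density f(x) = 3 x^2 / (2 h^3) on [-h, h].
# Its variance is 3 h^2 / 5, so h = sqrt(5/3) gives unit variance.
HALF_WIDTH = math.sqrt(5.0 / 3.0)


def u_quadratic_inverse_cdf(u, half_width=HALF_WIDTH):
    """Map u in [0, 1] to a U-quadratic value by inverting F(x) = 1/2 + x^3 / (2 h^3)."""
    v = 2.0 * u - 1.0
    # Sign-aware cube root, because v is negative for u < 0.5.
    return half_width * math.copysign(abs(v) ** (1.0 / 3.0), v)


def moments(values):
    """Return the sample mean, variance, and kurtosis coefficient (m4 / m2^2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return mean, m2, m4 / m2 ** 2


rng = random.Random(0)  # fixed seed so that the run is repeatable
draws = [u_quadratic_inverse_cdf(rng.random()) for _ in range(100_000)]
mean, variance, kurtosis = moments(draws)
print(f"mean={mean:.4f} variance={variance:.4f} kurtosis={kurtosis:.4f}")
```

Under these assumptions the theoretical kurtosis coefficient is 25/21, about 1.1905, which is close to the value of 1.19053 reported for the U-quadratic errors in Figure 2. A normal error can be drawn in the same way with `statistics.NormalDist().inv_cdf(u)` from the Python standard library.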

(1) Generate data from a random number table of U(0,1). Each value of the normal distribution (or U-quadratic distribution) is obtained by applying the inverse function of the normal distribution (or U-quadratic distribution) to a uniform value.

(2) Collect as many values as there are errors. Clearly, these values are i.i.d. draws from the normal distribution (or U-quadratic distribution). They form the set of values of $\mu_t$ in the serial correlation model, where $E(\mu_t) = 0$, $Var(\mu_t) = \sigma^2$, $t = 1, \ldots, T$, and $\mu_t$ follows the normal or U-quadratic distribution, with $\varepsilon_{t+1} = \rho\,\varepsilon_t + \mu_{t+1}$, $t = 1, 2, \ldots, T-1$, $|\rho| < 1$. When $\rho$ is known, $\varepsilon_{t+1}$ can be found.

(3) The residuals follow the point estimation requirements of the linear model, $Y_t = \beta_0 + \beta_1 X_{1,t} + \cdots + \beta_k X_{k,t} + \varepsilon_t$, $t = 1, \ldots, T$. If the number and values of the independent variables are known, then $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ can be estimated, whereas $\hat{\varepsilon}$ is constrained by $X'\hat{\varepsilon} = 0$. The simulator thus follows the linear model method and creates the residual values ($\hat{\varepsilon}$).

Thus, the simulation process includes:

Step 1: Give the intercept and slope values and the data set of independent variables.
Step 2: Use the simulation method to draw the error data set from the probability distribution with sample size T.
Step 3: According to the linear model, compute the data set of the dependent variable: $Y = X\beta + \varepsilon$.
Step 4: Calculate the point estimates of the regression coefficients and obtain the estimated values of the dependent variable: $\hat{Y} = X\hat{\beta}$.
Step 5: Calculate the data set of residuals: $\hat{\varepsilon} = Y - X\hat{\beta}$.

$\varepsilon$ is simulated by Steps 1 and 2, and $\hat{\varepsilon}$ is simulated by Steps 1 to 5. $\varepsilon$ and $\hat{\varepsilon}$ are each simulated 32768 × 2 × 1024 times, generating 32768 × 2 × 1024 values each, to form frequency distributions that approximate the true distributions of $\varepsilon$ and $\hat{\varepsilon}$.

3 Main Results

3.1 The errors follow a normal distribution

Section 3.1 considers the case in which the errors follow a normal distribution, with 6 independent variables, a sample size of 15, and an autocorrelation coefficient of zero. The first column of Figure 1 shows the distribution of the errors, which is a standard normal distribution, and the second column shows the shape and coefficients of the distribution of the 7th residual. Its mean, skewness, and kurtosis coefficients indicate a normal distribution and match those of the error distribution. The residuals can be viewed as an estimator of the errors because Figure 1 shows that the distribution of the residuals has the same shape as the distribution of the errors. Thus, Figure 1 supports the conclusion of Box and Pierce (1970) when the errors are normally distributed.

The reason is as follows: the residuals are a linear combination of the errors in the linear regression model, that is, $\hat{\varepsilon} = Y - \hat{Y} = X(\beta - \hat{\beta}) + \varepsilon$, and the autoregressive procedure makes the errors autocorrelated with each other without changing the relation between the residuals and the errors. At the same time, the additive property of the normal distribution applies to the linear combination of the errors. Thus, the residuals show the normally distributed property of the errors.

Figure 1: The error and residual distributions when the errors follow a normal distribution

                      error(1)    residual(7)
Mathematical mean      0.00004      -0.00003
Variance               0.99995       0.58319
S.D.                   0.99997       0.76367
Skewness coef.        -0.00034       0.00002
Kurtosis coef.         3.00023       3.00068

Another reason is that the mathematical formula of the normal distribution includes sin and cos functions, which are cyclical functions. Therefore, the residuals follow the normal distribution when the error distribution is normal. On the other hand, if the error distribution has no sin or cos properties, as with the logistic, uniform, U-quadratic, or exponential distributions, then the errors have no cyclical property and may not be shaped like the normal distribution, but like other distributions instead. This paper confirms the proposition below.

Proposition 1. The residuals are normally distributed when the errors are normally distributed, because a normal distribution, with its sin and cos functional form, is subject to the additive property.
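Steps 1 to 5 of Section 2.1 can be sketched for a single replication as follows. This is an illustrative sketch under stated assumptions, not the author's program: the design matrix `x`, the coefficients `beta`, and the seed are made up because the paper does not list its data set, the AR(1) recursion is started from $\varepsilon_1 = \mu_1$, and `solve`, `ar1_errors`, and `ols_residuals` are hypothetical helper names.

```python
import random
from statistics import NormalDist


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ar1_errors(innovations, rho):
    """Step 2: eps[t+1] = rho * eps[t] + mu[t+1], started from eps[1] = mu[1] (assumption)."""
    eps = [innovations[0]]
    for mu_next in innovations[1:]:
        eps.append(rho * eps[-1] + mu_next)
    return eps


def ols_residuals(x, y):
    """Steps 4 and 5: beta_hat = (X'X)^(-1) X'Y and residuals = Y - X beta_hat."""
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    beta_hat = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta_hat, row)) for row in x]
    return beta_hat, [yi - fi for yi, fi in zip(y, fitted)]


rng = random.Random(1)
T, k, rho = 15, 6, 0.0  # sample size, independent variables, autocorrelation

# Step 1: made-up intercept, slopes, and independent variables (first column is the intercept).
beta = [1.0] + [0.5] * k
x = [[1.0] + [rng.uniform(0.0, 10.0) for _ in range(k)] for _ in range(T)]

# Step 2: U(0,1) draws mapped through the inverse normal CDF, then the AR(1) recursion.
mu = [NormalDist().inv_cdf(rng.random()) for _ in range(T)]
eps = ar1_errors(mu, rho)

# Step 3: dependent variable Y = X beta + eps.
y = [sum(b * v for b, v in zip(beta, row)) + e for row, e in zip(x, eps)]

# Steps 4 and 5: point estimates and residuals.
beta_hat, resid = ols_residuals(x, y)

# The linear requirement X' resid = 0 gives k + 1 constraints (zero up to rounding error).
constraints = [sum(row[j] * r for row, r in zip(x, resid)) for j in range(k + 1)]
print(max(abs(c) for c in constraints))
```

Repeating Steps 2 to 5 many times and collecting a given residual builds the kind of frequency distribution that the paper reports. Replacing the inverse normal CDF with a U-quadratic one gives the setting of Section 3.2.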

However, it is hard to guarantee that the errors will always be normally distributed when researchers do not have the population data. Samples should always be tested to determine which distribution they follow.

3.2 The errors follow a U-quadratic distribution

This paper gives an example of a non-normal error distribution in which the errors, $\mu_t$, follow a U-quadratic distribution. The computer simulation then provides evidence about the distributions of the errors and residuals under the conditions of 6 independent variables, a sample size of 15, one lagged period, an error variance of 1, and an autocorrelation coefficient of zero. Lee (2014c) supposed that the values of the independent variables have a serious impact on the residuals, so this paper considers the distributions of the residuals in two cases, in which the values of the independent variables have low and high multi-collinearity respectively, in the linear regression model with the first-order autoregressive error procedure. The third column of Figure 2 presents the distribution of the first residual generated from independent variables whose population correlation coefficient is 0.99.

In Figure 2, the first column is the error distribution, which is U-quadratic; the second column is the distribution of the first residual with low multi-collinearity; and the third column is the distribution of the first residual with high multi-collinearity. The distributions of the first residual in the second and third columns resemble a normal distribution, while the distribution of the errors is U-quadratic. Because the residual distributions in Figure 2 differ from the error distribution, the residuals cannot be regarded as an estimator of the errors when the errors are non-normal. Moreover, a serial correlation test for autocorrelation of the errors should not rely on a mathematical combination of the residuals, because of the difference between the distributions of the errors and the residuals. Researchers should first investigate the data to determine which distribution they follow; otherwise they will always conclude from the residuals that the errors follow a normal distribution.

Figure 2: The error and residual distributions when the errors follow a U-quadratic distribution (T=15)

                      error(1)    residual(1), low            residual(1), high
                                  multi-collinearity          multi-collinearity
Mathematical mean      0.00071      -0.00004                    -0.00005
Variance               0.99998       0.07704                     0.58928
S.D.                   0.99999       0.27757                     0.76764
Skewness coef.        -0.00125       0.00041                    -0.00003
Kurtosis coef.         1.19053       2.82251                     2.31065

The main difference between Figure 1 and Figure 2 is the assumed error distribution, yet the residuals show a normal distribution in Figure 2. There must be some factor not yet identified, independent of whether the error distribution has the sin and cos property or the additive property of the normal distribution. Lee (2014b) found that increasing the number of independent variables from 1 to 6 causes the residual distribution to tend toward a normal distribution in the linear regression model with an autoregressive procedure. Therefore, this paper supposes that the errors are not restricted, but that the residuals are restricted by $X'\hat{\varepsilon} = 0$, which has k+1 constraints. This linear requirement on the residuals in the linear regression model distorts the residual distribution away from the original error distribution and toward a normal distribution.

The linear requirement on the residuals produces two sources in the ANOVA table when the number of independent variables is k and the sample size is T. The first source is the regression sum of squares (SSR), whose degrees of freedom are k. The second source is the error sum of squares (SSE), whose degrees of freedom are T-k-1. These two sources determine the shape of the residual distribution. As k becomes larger, the residuals are restricted by more constraints, similar to a linear combination of random variables. If more random variables are added to a linear combination, the new random variable tends toward the normal distribution. Residuals restricted by more constraints behave similarly, so that their distribution tends toward a normal distribution.

If we consider collinear values of the independent variables, the collinearity affects the convergence to a normal distribution when the error distribution is U-quadratic. The three columns of Figure 2 show that the convergence of the residuals to normality is weakened by highly multi-collinear values of the independent variables when the number of independent variables is fixed. The coefficients in Figure 2 report the variance of the residuals, $S(\hat{\varepsilon})$. MSE increases as the values of the independent variables change from low to high multi-collinearity. The shape of the distribution in the third column of Figure 2 shows that the distribution of the first residual has a flat region around its mean, more left-skewness, and less centralization. Thus, high multi-collinearity causes a serious problem for the residuals, and the distribution of the first residual no longer has the same shape as the distribution in the second column of Figure 2. Multi-collinearity of the independent variables makes the calculation and combination of the residuals more complex. Moreover, the variance and distribution of the residuals are both disturbed by the collinearity when the errors follow a non-normal distribution.
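The pull toward normality described above can be quantified without simulation when the autocorrelation coefficient is zero. With independent errors, each residual is the linear combination $\hat{\varepsilon}_i = \sum_j m_{ij}\varepsilon_j$, where $m_{ij}$ are the entries of $M = I - X(X'X)^{-1}X'$. By the standard cumulant rule for sums of independent variables, its kurtosis coefficient is then $3 + (\kappa - 3)\sum_j m_{ij}^4 / m_{ii}^2$, with $\kappa$ the kurtosis coefficient of the errors. The sketch below evaluates this for a made-up design matrix, since the paper's data set is not listed; `solve`, `residual_maker`, and `residual_kurtosis` are hypothetical helper names, and the printed numbers are not the paper's.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def residual_maker(x):
    """Return M = I - X (X'X)^(-1) X', the matrix that maps errors to residuals."""
    n, p = len(x), len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    # cols[t] is (X'X)^(-1) x_t, where x_t is row t of X written as a column.
    cols = [solve(xtx, x[t]) for t in range(n)]
    return [[(1.0 if i == j else 0.0) - sum(a * c for a, c in zip(x[i], cols[j]))
             for j in range(n)] for i in range(n)]


def residual_kurtosis(m_row, m_ii, error_kurtosis):
    """Kurtosis coefficient of one residual for i.i.d. errors (cumulant rule)."""
    return 3.0 + (error_kurtosis - 3.0) * sum(v ** 4 for v in m_row) / m_ii ** 2


rng = random.Random(2)
T, k = 15, 6
# Made-up design: an intercept column plus k uniform columns (assumption).
x = [[1.0] + [rng.uniform(0.0, 10.0) for _ in range(k)] for _ in range(T)]
m = residual_maker(x)
kappa = 25.0 / 21.0  # kurtosis coefficient of a U-quadratic error

trace = sum(m[i][i] for i in range(T))  # equals T - k - 1 up to rounding
print("trace of M:", round(trace, 6))
for i in (0, 6, 14):
    print(i + 1, round(m[i][i], 4), round(residual_kurtosis(m[i], m[i][i], kappa), 4))
```

The diagonal entry $m_{ii}$ is the variance of the i-th residual relative to the error variance, and the trace of M equals the error degrees of freedom, T-k-1 = 8 here. A ratio $\sum_j m_{ij}^4 / m_{ii}^2$ near 1 leaves the residual close to the error distribution, while a ratio near 0 moves its kurtosis toward 3.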
When the errors are asymmetric, the residuals are delayed in their convergence to normality. To address this, more independent variables can be added to the linear regression model to accelerate the residuals' convergence to normality, because more independent variables restrict the residuals further and weaken the collinear effect. Meanwhile, the degrees of freedom also decrease.³ This paper proposes a second proposition as follows.

Proposition 2.
(1) A larger number of independent variables, k, brings faster convergence to normality of the residuals when the errors have a non-normal distribution.
(2) Higher multi-collinear values of the independent variables cause slower convergence to normality of the residuals.
(3) Higher multi-collinear values and a smaller number of independent variables cause the distribution of the residuals to follow neither a normal distribution nor the error distribution, but a mixture of the error distribution and a normal distribution according to the constraints on the residuals.

The multi-collinear property and the constraints on the residuals have opposite effects and simultaneously disturb the distribution of the residuals, which becomes a mixture of a normal distribution and the error distribution.

³ In fact, the multicollinear case of the computer simulation implies that only when the number of independent variables is more than 20 and the degrees of freedom are more than 2 will the residuals and coefficients of the regression model with a first-order autoregressive error process have normally distributed point estimators, provided that the independent variables have low multicollinearity and the errors are independently and identically distributed and symmetric about zero.

3.3 Sample size is changed

The sample size effect plays a very important role in time series models, where it represents the law of large numbers. With a large enough sample, the residuals may be regarded as a good estimator of the errors. The simulation changes only the sample size, from 9 to 107, with 6 independent variables. The population variance is 1, the autocorrelation coefficient is zero, the values of the independent variables are taken from the front of the data set, and the simulation setting otherwise follows Section 3.2. The residuals gradually reveal the properties of the errors as the sample size increases from 9 to 107. Without loss of generality, this paper shows only the shapes and coefficients of the distributions of the 0.5T-th residual, in Figure 3 and in Table A-1 of Appendix A.

Figure 3: The distribution of the 0.5T-th residual

                      T=14, the 7th residual    T=60, the 30th residual
Mathematical mean      -0.00003                   0.00014
Variance                0.50400                   0.86864
S.D.                    0.70993                   0.93201
Skewness coef.         -0.00030                  -0.00019
Kurtosis coef.          2.47090                   1.63269

Figure 3 shows the distributions of the 0.5T-th residual at T = 14 and T = 60. The second column of Figure 3 has a less regular distribution shape than the first column at T = 14. Meanwhile, the shape for the 30th residual at T = 60 tends toward a U-quadratic distribution more than the distribution of the 7th residual at T = 14 does. Figure 3 clearly shows that larger sample sizes cause the 0.5T-th residual to tend toward the error distribution. Thus, the sample size effect can limit the effect of the linear requirements on the residuals. At the same time, it allows the distribution of the residuals to represent more properties of the errors.

The coefficients in Table A-1 show that the 0.5T-th residual has a mean and skewness coefficient of around zero, larger variances, and decreasing kurtosis as the sample size increases from 9 to 107. Compared with the first column of Figure 2, the coefficients in Table A-1 approach those of the error distribution as the sample size increases, especially the variance and kurtosis coefficients. The changes in the coefficients across sample sizes also show that the distribution of the residuals can be an estimator of the errors if the sample size effect is sufficiently larger than the effect of the constraints on the residuals. The interaction between the sample size and the linear requirement causes the differing shapes and coefficients of the distribution of the 0.5T-th residual. The residuals become more centralized when the linear requirement has more constraints. Nevertheless, the residuals are affected by the error distribution and are more likely to tend toward the error distribution when the number of constraints is not large enough. In other words, the residual distribution can differ greatly from the error distribution because the residuals are affected by the linear requirement of the linear regression model, the error distribution, the sample size, and the values of the independent variables. It is generally assumed that errors are normally distributed, which is a symmetric distribution. However, the residuals can be an estimator of the errors when k+1 is greater than 20, T ranges from 23 to 23+(k-19), and there is little or no multi-collinearity.

3.4 Autocorrelation coefficient of the errors is 0.7

The above statements hold under zero autocorrelation of the errors. However, nonzero autocorrelation is common in data. We discuss the case in which the autocorrelation coefficient of the errors is 0.7 and the other conditions are the same as in Section 3.2. There are again two cases, low and high multi-collinear values of the independent variables. We show the shapes and coefficients of the distributions of the first and 15th errors and residuals in Figure 4 and Figure 5, respectively.

Figure 4: The first error and residual distributions when the errors follow a U-quadratic distribution and the autocorrelation coefficient is 0.7

                      The 1st error    The 1st residual,          The 1st residual,
                                       low multi-collinearity     high multi-collinearity
Mathematical mean       0.00006         -0.00004                   -0.00005
Variance                0.99999          0.52980                    0.25373
S.D.                    0.99999          0.72787                    0.50372
Skewness coef.         -0.00012         -0.00005                   -0.00013
Kurtosis coef.          1.19049          2.17951                    2.62116

The distribution of the first error follows a U-quadratic distribution, but the first residuals, which are affected by the degree of multi-collinearity, show different distribution shapes in Figure 4. The distribution of the first residual cannot represent the properties of the first error, and greater multi-collinear values of the independent variables cause more centralized distributions of the first residual. Another important observation from Figure 4 is that the coefficients are similar to each other among the three columns, but the population distribution in the first column and the sampling distributions in the second and third columns have completely different shapes. On the other hand, the coefficients may highlight questionable results that are used to pass hypothesis testing when researchers investigate only the means and variances of residuals. Comparing the three columns, the linear requirement and multi-collinearity push the first residual toward centralization. Higher multi-collinear values of the independent variables induce more centralized residuals.

This study also simulated the distributions of the 15th error and the 15th residual, the latter for low and high multi-collinearity, in Figure 5.⁴ The distribution of the 15th error is not U-quadratic but is more centralized, like the normal distribution, because of the nonzero autocorrelation coefficient. Meanwhile, the distributions of the 15th residual are more similar to the distribution of the 15th error in Figure 5 than the first residual is to the first error in Figure 4. The means and variances in the first and third columns of Figure 5 show that the 15th residual is very similar to the 15th error. However, the diagrams and kurtosis coefficients show that there is a vast difference between the 15th residual and the 15th error. Comparing Figure 4 and Figure 5, the errors have similar coefficients, except for the kurtosis coefficients. The diagrams in the first column change from the U-quadratic shape in Figure 4 to the opposite of the U-quadratic shape in Figure 5 because of the nonzero autocorrelation coefficient. The diagrams of the low multi-collinearity residuals show bulging from the first residual to the 15th residual, and the diagrams of the high multi-collinearity residuals are similar. This highlights that the nonzero autocorrelation coefficient and multi-collinearity interact in their effect on the residuals. Higher multi-collinearity decreases the effect of the autocorrelation coefficient on the residuals.

⁴ We also simulate the case of an autocorrelation coefficient of -0.7 in Appendix B.
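The centralizing effect of a nonzero autocorrelation coefficient on the errors themselves can be checked with a short calculation. The sketch below assumes, beyond what the paper states, that $\varepsilon_1$ is a unit-variance U-quadratic draw and that the innovations are scaled to variance $1-\rho^2$ so that every $\varepsilon_t$ keeps variance 1. This assumption is suggested by the variances of about 1 reported for both the first and the 15th error, but the paper's exact recursion may differ. Under it, $\varepsilon_t$ is a weighted sum of independent U-quadratic variables, and the cumulant rule for independent sums gives its kurtosis coefficient. The name `ar1_error_kurtosis` is hypothetical.

```python
import math


def ar1_error_kurtosis(t, rho, innovation_kurtosis):
    """Kurtosis coefficient of eps_t for eps_{s+1} = rho * eps_s + sqrt(1 - rho^2) * z_{s+1}.

    Assumes eps_1 = z_1 and i.i.d. unit-variance z with the given kurtosis coefficient.
    """
    scale = math.sqrt(1.0 - rho ** 2)
    # Weights of z_1, z_2, ..., z_t in eps_t; their squares sum to 1.
    weights = [rho ** (t - 1)] + [scale * rho ** (t - s) for s in range(2, t + 1)]
    return 3.0 + (innovation_kurtosis - 3.0) * sum(w ** 4 for w in weights)


U_QUADRATIC_KURTOSIS = 25.0 / 21.0  # kurtosis coefficient of a U-quadratic variable

for t in (1, 15):
    print(t, round(ar1_error_kurtosis(t, 0.7, U_QUADRATIC_KURTOSIS), 4))
```

For t = 1 this returns the U-quadratic value 25/21, and for t = 15 with an autocorrelation coefficient of 0.7 it returns about 2.3806, close to the kurtosis coefficient of 2.38056 reported for the 15th error in Figure 5. With normal innovations the result stays at 3 for every t, consistent with Section 3.1.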

| Coefficient | The 15th error | The 15th residual (low multi-collinearity) | The 15th residual (high multi-collinearity) |
|---|---|---|---|
| Mathematical mean | -0.00008 | -0.00003 | 0.00004 |
| Variance | 0.99992 | 0.60125 | 0.83748 |
| S.D. | 0.99996 | 0.77540 | 0.91514 |
| Skewness coef. | 0.00025 | -0.00008 | 0.00026 |
| Kurtosis coef. | 2.38056 | 2.56853 | 2.61272 |

Figure 5: The 15th error and residual distributions when the error follows a U-quadratic distribution and the autocorrelation coefficient is 0.7

4 Conclusion

The purpose of this paper is to explain why residuals cannot perfectly represent errors. This paper confirms that residuals can be viewed as an estimator of the errors under specific conditions, but it may not be appropriate to assume a normal distribution because of the properties of the data. However, if the errors are normally distributed, the residuals can be a good estimator of the errors because of the properties of a normal distribution. This paper also shows, via computer simulation results, how a non-normal distribution, linear requirements, multi-collinearity, sample size, and the autocorrelation coefficient affect the distributions of the errors and the residuals.

First, residuals are restricted, but errors are not. Hence, the values of the residuals are constrained by a linear requirement from the regression analysis, and this prevents the residuals from perfectly representing the errors. Second, this paper supposes that the linear requirement, sample sizes, and the values of independent variables interact in the linear regression model. Thus:

(1) The constraints of the linear requirements are strong enough that the residuals follow a normal distribution when the sample sizes are fixed, regardless of the error term assumed.
(2) If the error term is assumed to follow a normal distribution, then the residuals follow a normal distribution.
(3) Larger sample sizes result in the residuals revealing properties of the error term when the linear requirement is fixed.

References

[1] B.H. Baltagi, Econometrics, Fifth edition, Springer: New York, 2011.
[2] C.F. Roos, Annual Survey of Statistical Techniques: The Correlation and Analysis of Time Series--Part II, Econometrica, 4(4), (1936), 368-381.
[3] G.E. Box, D.A. Pierce, Distribution of Residual Autocorrelations in Autoregressive-integrated Moving Average Time Series Models, Journal of the American Statistical Association, 65(332), (1970), 1509-1526.
[4] G.U. Yule, On the Time-correlation Problem, with Especial Reference to the Variate-Difference Correlation Method, Journal of the Royal Statistical Society, 84(4), (1921), 497-537.
[5] J. Durbin, G.S. Watson, Testing for Serial Correlation in Least Squares Regression: I, Biometrika, 37(3/4), (1950), 409-428.
[6] J. Durbin, G.S. Watson, Testing for Serial Correlation in Least Squares Regression. II, Biometrika, 38(1/2), (1951), 159-177.
[7] M.Y. Lee, The Pattern of R-Square in Linear Regression Model with First-Order Autoregressive Error Process and Bayesian property: Computer Simulation, Journal of Accounting & Finance Management Strategy, 9(1), (2014a).
[8] M.Y. Lee, Limiting Property of Durbin-Watson Test Statistic, manuscript, (2014b).

[9] M.Y. Lee, The Conflict of Residual and Error Simulated in Linear Regression Model with AR(1) Error Process, manuscript, (2014c).
[10] T.S. Breusch, L.G. Godfrey, A review of recent work on testing for autocorrelation in dynamic economic models, University of Southampton, (1980).

Appendix A.

Section 3.3 shows how the change in sample size affects the distribution of each residual. However, because each sample size has a different number of residuals, the paper reports the coefficients of the 0.5T-th residual, where 0.5T is half of the sample size T. The coefficients in Table A-1 have almost the same means for the 0.5T-th residual and do not show readers how the distributions of the 0.5T-th residual differ across sample sizes. Thus, the paper puts the coefficients of the 0.5T-th residual in the Appendix and the graphs of the distribution of the 0.5T-th residual in the main content.

Table A-1. The coefficients of the 0.5T-th residual in different sample sizes

| Coefficient | T = 9 | T = 10 | T = 12 |
|---|---|---|---|
| 0.5T | 5 | 5 | 6 |
| Mathematical mean | 0.00002 | 0.00010 | 0.00003 |
| Variance | 0.12050 | 0.66584 | 0.41115 |
| S.D. | 0.34712 | 0.81599 | 0.64121 |
| Skewness coef. | 0.00008 | 0.00007 | -0.00049 |
| Kurtosis coef. | 2.61692 | 2.14263 | 2.52345 |
| MAD | 0.28164 | 0.69024 | 0.52335 |
| Range | 2.18595 | 4.36179 | 4.03883 |
| Median | 0.10410 | 0.05433 | -0.59530 |
| IQR | -1.09240 | -1.67194 | -0.01178 |

| Coefficient | T = 14 | T = 16 | T = 18 |
|---|---|---|---|
| 0.5T | 7 | 8 | 9 |
| Mathematical mean | 0.00003 | 0.00015 | -0.00007 |
| Variance | 0.50400 | 0.69658 | 0.67610 |
| S.D. | 0.70993 | 0.83462 | 0.82225 |
| Skewness coef. | -0.00030 | 0.00016 | -0.00021 |
| Kurtosis coef. | 2.47090 | 2.09787 | 2.15551 |
| MAD | 0.58384 | 0.71287 | 0.69899 |
| Range | 4.96694 | 5.26536 | 5.76724 |
| Median | -0.49846 | 1.13480 | 0.51382 |
| IQR | 0.60169 | 1.81239 | 0.72251 |

| Coefficient | T = 20 | T = 30 | T = 40 |
|---|---|---|---|
| 0.5T | 10 | 15 | 20 |
| Mathematical mean | -0.00001 | -0.00011 | |
| Variance | 0.57802 | 0.74874 | |
| S.D. | 0.70280 | 0.86530 | |
| Skewness coef. | 0.00022 | 0.00018 | |
| Kurtosis coef. | 2.35887 | 1.97507 | |
| MAD | 0.63243 | 0.75134 | |
| Range | 5.39457 | 5.66672 | |
| Median | -0.87208 | -0.36794 | |
| IQR | -0.44459 | -0.17231 | |

| Coefficient | T = 80 | T = 107 | |
|---|---|---|---|
| 0.5T | 40 | 50 | |
| Mathematical mean | 0.00004 | 0.00028 | -0.00011 |
| Variance | 0.97787 | 0.92229 | 0.90912 |
| S.D. | 0.98888 | 0.96036 | 0.95348 |
| Skewness coef. | -0.00034 | -0.00034 | 0.00022 |
| Kurtosis coef. | 1.26972 | 1.46089 | 1.50373 |
| MAD | 0.94699 | 0.89520 | 0.88326 |
| Range | 3.91584 | 4.92764 | 4.90625 |
| Median | 0.88092 | 1.02281 | -0.30457 |
| IQR | -2.35631 | -0.48487 | -0.51299 |
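Coefficients of the kind listed in Table A-1 can be computed from a vector of simulated residuals along the following lines. This is a minimal sketch with assumed definitions: the hypothetical `coefficients` helper uses population (divide-by-n) moments, takes MAD to mean the mean absolute deviation about the mean, and computes the IQR as the difference between the third and first quartiles. The paper does not state the definitions used by its simulation software, so the values in the table may follow different conventions.

```python
import numpy as np

def coefficients(x):
    """Summary coefficients of a sample (definitions are assumptions, see text)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    d = x - mean
    var = np.mean(d ** 2)                       # population variance
    sd = np.sqrt(var)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return {
        "mean": mean,
        "variance": var,
        "sd": sd,
        "skewness": np.mean(d ** 3) / sd ** 3,  # third standardized moment
        "kurtosis": np.mean(d ** 4) / var ** 2, # fourth standardized moment
        "mad": np.mean(np.abs(d)),              # mean absolute deviation
        "range": x.max() - x.min(),
        "median": median,
        "iqr": q3 - q1,
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.normal(size=10_000)            # placeholder data, not the paper's residuals
    for name, value in coefficients(sample).items():
        print(f"{name:>9}: {value: .5f}")
```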

Appendix B.

The paper also simulates the situation in which the autocorrelation coefficient of the errors is -0.7. The residuals have a smaller mean and variance and more negative skewness under high multi-collinearity.

| Coefficient | The 1st error | The 1st residual (low multi-collinearity) | The 1st residual (high multi-collinearity) |
|---|---|---|---|
| Mathematical mean | -0.00000 | -0.00005 | -0.00016 |
| Variance | 0.99996 | 0.92580 | 0.49797 |
| S.D. | 0.99998 | 0.96219 | 0.70567 |
| Skewness coef. | 0.00008 | -0.00003 | -0.00069 |
| Kurtosis coef. | 1.19052 | 2.13261 | 2.66838 |

Figure B-1. The first error and residual distributions when the error follows a U-quadratic distribution and the autocorrelation coefficient is -0.7

Comparing the cases in which the autocorrelation coefficient of the errors is 0.7 and -0.7, first, the means of the first error and the 15th error are smaller when the autocorrelation coefficient is -0.7. Second, the probability distribution of the first error in Figure B-1 is the same as in the left side of Figure 4, as is that of the 15th error in Figure B-2 and Figure 5. Third, the low multi-collinearity situation shows that the first residual has larger variance and less centralization in Figure B-1 than in Figure 4. However, the 15th residual is more centralized in Figure B-2 than in Figure 5. Finally, the high multi-collinearity situation shows that the first residual in Figure B-1 has larger variance and more centralization than in Figure 4. However, the 15th residual has smaller variance and less centralization in Figure B-2 than in Figure 5.
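The second observation, that the error distributions do not depend on the sign of the autocorrelation coefficient, can be checked with a short derivation. It rests on an assumption that this section does not state explicitly: the errors are generated from independent U-quadratic innovations u_t with mean 0, variance 1, and kurtosis k_u = 25/21, scaled so that every error keeps unit variance. Under that assumption the fourth cumulant is additive across the independent terms, which gives a recursion for the kurtosis k_t of the t-th error:

```latex
\begin{aligned}
\varepsilon_1 &= u_1, \qquad
\varepsilon_t = \rho\,\varepsilon_{t-1} + \sqrt{1-\rho^{2}}\,u_t, \quad t \ge 2, \\
\operatorname{Var}(\varepsilon_t) &= \rho^{2}\cdot 1 + (1-\rho^{2})\cdot 1 = 1, \\
k_t - 3 &= \rho^{4}\,(k_{t-1} - 3) + (1-\rho^{2})^{2}\,(k_u - 3), \\
k_\infty &= 3 + \frac{(1-\rho^{2})^{2}}{1-\rho^{4}}\,(k_u - 3)
          = 3 + \frac{1-\rho^{2}}{1+\rho^{2}}\,(k_u - 3).
\end{aligned}
```

Only even powers of ρ appear, so ρ = 0.7 and ρ = -0.7 give the same kurtosis. With k_u = 25/21 the limit is 3 + (0.51/1.49)(-38/21) ≈ 2.3806, and by the 15th error the transient term, of order ρ^56, is negligible. This agrees with the reported kurtosis of the 15th error in both Figure 5 (2.38056) and Figure B-2 (2.38057), while the first error keeps the U-quadratic value 25/21 ≈ 1.1905.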

| Coefficient | The 15th error | The 15th residual (low multi-collinearity) | The 15th residual (high multi-collinearity) |
|---|---|---|---|
| Mathematical mean | -0.00024 | -0.00012 | -0.00017 |
| Variance | 0.99998 | 1.08709 | 0.82336 |
| S.D. | 0.99999 | 1.04264 | 0.90739 |
| Skewness coef. | 0.00036 | 0.00057 | 0.00021 |
| Kurtosis coef. | 2.38057 | 2.68820 | 2.27220 |

Figure B-2. The 15th error and residual distributions when the error follows a U-quadratic distribution and the autocorrelation coefficient is -0.7
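The residual variances in Figures 4, 5, B-1, and B-2 differ from the unit error variance, and in one case exceed it. One way to see why is to compute the residual variances exactly rather than by simulation. The sketch below assumes unit-variance AR(1) errors with correlation matrix Σ, where Σ_ij = ρ^|i-j|, so that the residual vector e = M ε has covariance M Σ M. The function names and the design matrix are hypothetical, since the paper's regressor values are not given in this section.

```python
import numpy as np

def ar1_correlation(n, rho):
    """Correlation matrix of unit-variance AR(1) errors: entry (i, j) is rho**|i-j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def residual_variances(X, rho):
    """Diagonal of M @ Sigma @ M, the exact variance of each OLS residual."""
    n = X.shape[0]
    M = np.eye(n) - X @ np.linalg.pinv(X)   # residual-maker matrix
    return np.diag(M @ ar1_correlation(n, rho) @ M)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 30                                   # assumed sample size
    x1 = rng.normal(size=n)
    for label, noise_scale in [("low", 1.0), ("high", 0.05)]:
        # Hypothetical regressors: a smaller noise_scale means higher multi-collinearity.
        x2 = x1 + noise_scale * rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        for rho in (0.7, -0.7):
            v = residual_variances(X, rho)
            print(f"{label} multi-collinearity, rho={rho:+.1f}: "
                  f"Var(e_1)={v[0]:.3f}, Var(e_15)={v[14]:.3f}")
```

With ρ = 0 the variances reduce to the diagonal of M, which never exceeds 1; with nonzero ρ the interaction between M and Σ can move individual residual variances in either direction, which is consistent with the interaction between autocorrelation and multi-collinearity described in the main text.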