Versatile Regression: simple regression with a non-normal error distribution


Benjamin Dean (a,*), Robert A. R. King (a)

(a) University of Newcastle, School of Mathematical and Physical Sciences, Callaghan 2308, NSW Australia

(*) Corresponding author. Tel.: +61 2 4921 6384
Email addresses: Benjamin.Dean@newcastle.edu.au (Benjamin Dean), Robert.King@newcastle.edu.au (Robert A. R. King)

Third Annual ASEARC Conference, December 7–8, 2009, Newcastle, Australia

Abstract

We present a simple regression technique, called Versatile Regression, where the error distribution is described by the Generalized Lambda Distribution. The flexibility of this distribution allows the error distribution to be heavy-tailed, skewed or approximately normal. Versatile Regression was found to perform well on heavy-tailed and skewed data. Versatile Regression also provided a reasonable approximation to Normal-Error Regression. Simulation studies found that Versatile Regression produced accurate parameter estimates.

Key words: Simple Regression, Generalized Lambda Distribution, Non-Normal Error Distribution

1. Introduction

Versatile Regression (VR) is a regression technique with a single predictor variable. The error distribution is non-normal, homoscedastic and identically distributed for all levels of the predictor variable. The error distribution is described using the Generalized Lambda Distribution (GLD); a quantile-defined distribution with flexible shape.

1.1. The Generalized Lambda Distribution

The GLD can represent a vast range of shapes. Freimer et al. (1988) nicely articulated this by saying "[the GLD] is very rich in the variety of density and tail shapes. It contains unimodal, U-shaped, J-shaped and monotone probability density functions. These can be symmetric or asymmetric and their tails can be smooth, abrupt or truncated, and long, medium or short." Furthermore, the GLD contains the logistic, exponential and uniform distributions as limiting cases, and provides good approximations to other distributions including the normal and gamma. A comprehensive discussion of GLD shapes is given in Karian and Dudewicz (2000).

The GLD is defined in terms of its quantile function $F^{-1}(u)$. This article makes use of Freimer et al.'s (1988) parameterisation of the GLD. We use slightly different notation than Freimer et al. (1988) and write the quantile function as

$$F^{-1}(u) = \lambda_1 + \frac{1}{\lambda_2}\left[\frac{u^{\lambda_3} - 1}{\lambda_3} - \frac{(1 - u)^{\lambda_4} - 1}{\lambda_4}\right] \tag{1}$$

where $u \in [0, 1]$, $\lambda_2 > 0$. The probability density function of the GLD is not available in closed form. However, it is available as a function of $u$.

$$f(x) = \frac{\lambda_2}{u^{\lambda_3 - 1} + (1 - u)^{\lambda_4 - 1}} \quad \text{at } x = F^{-1}(u) \tag{2}$$

$\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ are parameters that can attain any real value (except for $\lambda_2$ which must be positive), and $u$ is a variable that assumes values between zero and one. $\lambda_1$ is the location parameter, $\lambda_2$ is the (inverse) scale parameter, and $\lambda_3$, $\lambda_4$ are the shape parameters. ($\lambda_3$ controls the left tail shape and $\lambda_4$ controls the right tail shape.) A GLD with parameters $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ is denoted by GLD($\lambda_1, \lambda_2, \lambda_3, \lambda_4$).

The support of the GLD is given in Table 1. The GLD has infinite support when both $\lambda_3$ and $\lambda_4$ are non-positive, and half-infinite support when exactly one of $\lambda_3$ and $\lambda_4$ is non-positive.

$\lambda_3$     $\lambda_4$     Support
$\le 0$         $\le 0$         $(-\infty, \infty)$
$> 0$           $\le 0$         $[\lambda_1 - 1/(\lambda_2 \lambda_3), \infty)$
$\le 0$         $> 0$           $(-\infty, \lambda_1 + 1/(\lambda_2 \lambda_4)]$
$> 0$           $> 0$           $[\lambda_1 - 1/(\lambda_2 \lambda_3), \lambda_1 + 1/(\lambda_2 \lambda_4)]$

Table 1: Support of the GLD.

1.2. Alternative regression methods

1.2.1. Transformations

The regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma^2)$, is extremely popular because of its simplicity and practicality ($y_i$ denotes the value of the response variable, $x_i$ denotes the value of the predictor variable, $\beta_0$ and $\beta_1$ are the regression coefficients, and $\epsilon_i$ is a random error term). Following the terminology used by Neter et al. (1996, pg. 29–30), we refer to this as the model for Normal-Error Regression (NER).

When NER's assumptions of linearity, normality and homoscedasticity are not satisfied, it is common practice to transform the response and/or predictor variables. However, transformations introduce two main problems.

- It can be difficult to justify why a linear relationship should exist between the transformed variables.
- Interpretation, relative to the original variables of interest, is often complicated.
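Returning to the GLD of Section 1.1, the quantile function, the density expressed through $u$, and the support listed in Table 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this article: the function names are hypothetical, and a shape parameter equal to zero is handled through the limit $(u^{\lambda} - 1)/\lambda \to \log u$.

```python
import math


def power_term(u, lam):
    """(u**lam - 1) / lam, using the limit log(u) when lam is zero."""
    return math.log(u) if lam == 0.0 else (u ** lam - 1.0) / lam


def gld_quantile(u, lam1, lam2, lam3, lam4):
    """GLD quantile function F^{-1}(u) for 0 < u < 1 and lam2 > 0."""
    return lam1 + (power_term(u, lam3) - power_term(1.0 - u, lam4)) / lam2


def gld_density_at_u(u, lam2, lam3, lam4):
    """Density f(x) evaluated at the point x = F^{-1}(u)."""
    return lam2 / (u ** (lam3 - 1.0) + (1.0 - u) ** (lam4 - 1.0))


def gld_support(lam1, lam2, lam3, lam4):
    """Lower and upper end of the support, following Table 1."""
    lower = lam1 - 1.0 / (lam2 * lam3) if lam3 > 0 else -math.inf
    upper = lam1 + 1.0 / (lam2 * lam4) if lam4 > 0 else math.inf
    return lower, upper


# Heavy left tail (lam3 < 0) and lighter right tail (lam4 = 0): infinite support.
print(gld_support(0.0, 0.1, -0.4, 0.0))
print(gld_quantile(0.5, 0.0, 0.1, -0.4, 0.0))
```

Because the density is only available through $u$, evaluating it at a given $x$ requires solving $F^{-1}(u) = x$ numerically first.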

1.2.2. Quantile Regression

Quantile Regression (Koenker and Bassett, 1978) fits a linear regression function by minimizing a weighted sum of absolute residuals, where positive and negative residuals receive a weight of $p$ and $1 - p$, respectively ($0 < p < 1$). Quantile Regression makes no assumptions regarding the form of the error distribution. This makes the technique simple to perform, quick to compute, and widely applicable. However, Quantile Regression has two main disadvantages.

- The regressed $p$-quantile functions can cross over.
- Quantile Regression is a non-parametric technique and it may not produce the most accurate predictions and prediction intervals. Parametric methods that successfully estimate the error distribution are likely to obtain better results.

1.2.3. Other non-normal regression methods

There have been numerous attempts to develop regression techniques where the error distribution is non-normal. Most of the literature has appeared within the past 30 years due to advances in computing power.

A study by Zeckhauser and Thompson (1970) modelled the errors using a power distribution with density $f(z; \mu, \sigma, \theta) = k(\sigma, \theta) \exp\left(-\sigma^{-\theta} |z - \mu|^{\theta}\right)$, where $k(\sigma, \theta)$ was a normalizing factor and $\mu$, $\sigma$ and $\theta$ were the location, scale and kurtosis parameters, respectively. The distribution offered promise because it assumed the normal, double exponential and uniform distributions for certain values of $\theta$. Zeckhauser and Thompson (1970) were criticized because the error distribution had a cusp at the origin when $\theta < 1$, rendering it unrealistic (Mandelbrot, 1971).

Relevant work in more recent years includes Geweke (1993) and Fernandez and Steel (1998). Geweke (1993) employed Bayesian methods to construct a linear model where the errors followed a symmetric t-distribution. Fernandez and Steel (1998) used Bayesian MCMC methods to develop a linear model where the errors followed a skewed t-distribution.
King, Gerlach and Wraith (2) proposed a regression technique, called Starship Regression, where the error distribution followed the GLD. Parameter estimation was performed using Owen's (1988) Starship method. The estimation algorithm used a grid-based search to obtain starting values for a minimization routine. The grid-based search made the technique slow, especially when dealing with large sample sizes. Parameter estimation was also complicated by the presence of a penalty term in the objective function.

Dean, King and Howley (2009) developed a regression technique called Stretched Regression. The regression model was formulated in terms of a response distribution, rather than an error distribution centered on a regression function. The response distribution dispersed or tapered in shape as the predictor variable increased, whilst the left tail minimum remained fixed in position. The response distribution was described by the GLD.

2. Method

In VR, the regression model is $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where $\epsilon_i \sim \text{GLD}(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$. The median of the error distribution was set to zero (since the GLD can generate a wide range of shapes, including severely skewed distributions, it was more appropriate to work with the median than the mean). This required the following constraint.

$$\lambda_1 = -\frac{1}{\lambda_2}\left[\frac{0.5^{\lambda_3} - 1}{\lambda_3} - \frac{0.5^{\lambda_4} - 1}{\lambda_4}\right] \tag{3}$$

Parameter estimation was performed in two steps. Firstly, the residuals were computed using $e_i = y_i - (\beta_0 + \beta_1 x_i)$. Secondly, the log likelihood function of the error distribution parameters was maximized. The log likelihood function is

$$\log L(\lambda_1, \lambda_2, \lambda_3, \lambda_4) = \log f(\mathbf{e}; \lambda_1, \lambda_2, \lambda_3, \lambda_4) = \log \prod_{i=1}^{n} f(e_i; \lambda_1, \lambda_2, \lambda_3, \lambda_4) = \sum_{i=1}^{n} \log\left[f(e_i; \lambda_1, \lambda_2, \lambda_3, \lambda_4)\right] \tag{4}$$

The values of $\lambda_2, \lambda_3, \lambda_4, \beta_0, \beta_1$ that maximized (4) were chosen as the final parameter estimates (since $\lambda_1$ was defined as a function of $\lambda_2, \lambda_3, \lambda_4$, the independent parameters reduced to $\lambda_2, \lambda_3, \lambda_4, \beta_0, \beta_1$).
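The two estimation steps can be sketched as follows. This is a hedged illustration, not the authors' code (which the Discussion says was written in C): all function names are hypothetical, and because the GLD density is only available as a function of $u$, the sketch solves $F^{-1}(u) = e_i$ by bisection, which is an assumed choice since the article does not state how the inversion was done.

```python
import math

EPS = 1e-15  # keeps u strictly inside (0, 1) during bisection


def power_term(u, lam):
    """(u**lam - 1) / lam, using the limit log(u) when lam is zero."""
    return math.log(u) if lam == 0.0 else (u ** lam - 1.0) / lam


def gld_quantile(u, lam1, lam2, lam3, lam4):
    """GLD quantile function in the Freimer et al. (1988) parameterisation."""
    return lam1 + (power_term(u, lam3) - power_term(1.0 - u, lam4)) / lam2


def median_zero_lam1(lam2, lam3, lam4):
    """Location parameter that places the median of the errors at zero."""
    return -(power_term(0.5, lam3) - power_term(0.5, lam4)) / lam2


def gld_log_density(x, lam1, lam2, lam3, lam4, iterations=60):
    """log f(x): find u with F^{-1}(u) = x by bisection, then use f = lam2 / (...)."""
    lower = lam1 - 1.0 / (lam2 * lam3) if lam3 > 0 else -math.inf
    upper = lam1 + 1.0 / (lam2 * lam4) if lam4 > 0 else math.inf
    if not lower < x < upper:
        return -math.inf  # x lies outside the support
    lo, hi = EPS, 1.0 - EPS  # extreme x is evaluated at the clamped edge
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gld_quantile(mid, lam1, lam2, lam3, lam4) < x:
            lo = mid
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    return math.log(lam2) - math.log(u ** (lam3 - 1.0) + (1.0 - u) ** (lam4 - 1.0))


def vr_log_likelihood(params, x, y):
    """Log likelihood of (lam2, lam3, lam4, beta0, beta1) given data x, y."""
    lam2, lam3, lam4, beta0, beta1 = params
    if lam2 <= 0:
        return -math.inf
    lam1 = median_zero_lam1(lam2, lam3, lam4)
    total = 0.0
    for xi, yi in zip(x, y):
        residual = yi - (beta0 + beta1 * xi)  # step 1: residuals
        total += gld_log_density(residual, lam1, lam2, lam3, lam4)  # step 2
    return total
```

A general-purpose optimizer would then be applied to the negative of `vr_log_likelihood` over the five free parameters.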
The optimization was performed using the BFGS algorithm (Broyden (1970), Fletcher (1970), Goldfarb (1970), Shanno (1970)). The optimization always used a starting value of $-0.4$ for $\lambda_3$ and $\lambda_4$ (heavy tails ensured decent coverage of the data). The $\beta_0$ and $\beta_1$ starting points were set equal to the regression coefficients produced by NER. The starting point for $\lambda_2$ was set to

$$\left[\frac{0.75^{\lambda_3} - 0.25^{\lambda_3}}{\lambda_3} + \frac{0.75^{\lambda_4} - 0.25^{\lambda_4}}{\lambda_4}\right] \Big/ \text{IQR}(e_{\text{NER}}),$$

where $\lambda_3, \lambda_4 = -0.4$. This is obtained by approximating the NER residuals by GLD($\lambda_1, \lambda_2, \lambda_3, \lambda_4$), and rearranging the expression for the interquartile range (given by $F^{-1}(0.75; \lambda_1, \lambda_2, \lambda_3, \lambda_4) - F^{-1}(0.25; \lambda_1, \lambda_2, \lambda_3, \lambda_4)$).

3. Results

3.1. Approximation of Normal-Error Regression

Since the GLD can approximate the normal distribution, VR can provide an approximation of NER. The quality of the approximation was assessed by a simulation study. The simulation study involved randomly generating 1,000 datasets with $x_i \sim \text{Uniform}(0, 20)$ and $y_i \sim \beta_0 + \beta_1 x_i + N(0, 1)$, for sample sizes of $n = 50, 100, 200, 500$ and $1000$. $\beta_0$ and $\beta_1$ were arbitrarily set to $\beta_0 = 2$ and $\beta_1 = 1$. VR and NER were applied to the datasets. This produced parameter estimates $(\hat\lambda_2, \hat\lambda_3, \hat\lambda_4, \hat\beta_0, \hat\beta_1)$ for VR, and parameter estimates $(\hat\beta_0, \hat\beta_1, \hat\sigma)$ for NER. The performance of VR and NER was compared using the Mean Square Error (MSE) of the regression coefficients. The results are shown in Figure 1.

3.2. Accuracy of parameter estimates

A simulation study assessed the accuracy of VR's parameter estimates. For a given parameter set $(\lambda_2, \lambda_3, \lambda_4, \beta_0, \beta_1)$, 1,000 datasets were randomly generated with $x_i \sim \text{Uniform}(0, 20)$ and $y_i \sim \beta_0 + \beta_1 x_i + \text{GLD}(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ (where $\lambda_1$ was given by (3)).
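The data generation and the starting values described above can be sketched together. This is a minimal sketch under stated assumptions, not the simulation code behind the reported results: the function names, the seed, the `x_max` argument and the parameter values in the usage lines are all illustrative, GLD errors are drawn by inverse transform sampling (passing a uniform draw through the quantile function), and the quartiles come from Python's `statistics.quantiles`, whose interpolation rule may differ from the one the authors used.

```python
import math
import random
import statistics


def power_term(u, lam):
    """(u**lam - 1) / lam, using the limit log(u) when lam is zero."""
    return math.log(u) if lam == 0.0 else (u ** lam - 1.0) / lam


def gld_quantile(u, lam1, lam2, lam3, lam4):
    return lam1 + (power_term(u, lam3) - power_term(1.0 - u, lam4)) / lam2


def median_zero_lam1(lam2, lam3, lam4):
    """Location parameter that places the median of the errors at zero."""
    return -(power_term(0.5, lam3) - power_term(0.5, lam4)) / lam2


def simulate_dataset(n, beta0, beta1, lam2, lam3, lam4, x_max, rng):
    """x uniform on (0, x_max); y = beta0 + beta1 * x + median-zero GLD error."""
    lam1 = median_zero_lam1(lam2, lam3, lam4)
    xs, ys = [], []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # keep u strictly inside (0, 1)
            u = rng.random()
        x = rng.uniform(0.0, x_max)
        xs.append(x)
        ys.append(beta0 + beta1 * x + gld_quantile(u, lam1, lam2, lam3, lam4))
    return xs, ys


def ols_fit(xs, ys):
    """Least-squares intercept and slope (the NER coefficients)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def lam2_start(residuals, lam3=-0.4, lam4=-0.4):
    """Starting value for lam2 from the IQR of the NER residuals (nonzero shapes)."""
    q1, _, q3 = statistics.quantiles(residuals, n=4)
    shape = (0.75 ** lam3 - 0.25 ** lam3) / lam3 + (0.75 ** lam4 - 0.25 ** lam4) / lam4
    return shape / (q3 - q1)


def starting_values(xs, ys):
    """Starting point (lam2, lam3, lam4, beta0, beta1) for the optimizer."""
    b0, b1 = ols_fit(xs, ys)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    return lam2_start(residuals), -0.4, -0.4, b0, b1


rng = random.Random(2009)  # arbitrary seed, for reproducibility only
xs, ys = simulate_dataset(200, 20.0, 5.0, 0.1, -0.4, 0.0, 20.0, rng)
print(starting_values(xs, ys))
```

The starting value for $\lambda_2$ recovers the true inverse scale when the residuals really do follow a GLD with $\lambda_3 = \lambda_4 = -0.4$, which is the sense in which it "rearranges" the interquartile range.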

[Figure 1: MSE of regression coefficients when modelling data suited for NER. Results for VR and NER are represented by circles and triangles, respectively. Panels: $\beta_0$ and $\beta_1$; axes: MSE against $n$.]

VR was applied to the datasets and the estimated parameters $(\hat\lambda_2, \hat\lambda_3, \hat\lambda_4, \hat\beta_0, \hat\beta_1)$ were compared to the true parameters. This process was repeated for sample sizes of $n = 50, 100, 200, 500$ and $1000$.

A preliminary study used 16 different $(\lambda_2, \lambda_3, \lambda_4, \beta_0, \beta_1)$ parameter sets. These sets were produced by the possible combinations of $\lambda_2 = 0.1$, $\lambda_3 = (-0.4, 0)$, $\lambda_4 = (-0.4, 0)$, $\beta_0 = (20, 40)$ and $\beta_1 = (5, 10)$. Setting $\lambda_2 = 0.1$ generated data with large spread ($\lambda_2$ is the inverse scale parameter). Setting $\lambda_3, \lambda_4 = -0.4$ generated data with heavy tails (see Karian and Dudewicz (2000) for a description of GLD shapes). Setting $\lambda_3, \lambda_4 = 0$ generated data with lighter tails, but still on infinite support (see Table 1). $\beta_0$ and $\beta_1$ were set to arbitrary values.

The preliminary study showed that $\hat\lambda_2, \hat\lambda_3, \hat\lambda_4, \hat\beta_0, \hat\beta_1$ had sampling distributions whose shape and spread were dependent upon $\lambda_3$ and $\lambda_4$, but independent of $\beta_0$ and $\beta_1$. Hence, publishing results for more than one combination of $\beta_0$ and $\beta_1$ was found to be redundant. Consequently, this article only presents results for 4 of the 16 parameter sets from the preliminary study. These parameter sets are defined in Table 2.

Set    $\lambda_2$    $\lambda_3$    $\lambda_4$    $\beta_0$    $\beta_1$
A      0.1            -0.4           -0.4           20           5
B      0.1            -0.4           0              20           5
C      0.1            0              -0.4           20           5
D      0.1            0              0              20           5

Table 2: Definition of parameter sets.

Figures 2, 3, 4 and 5 show typical datasets and regression models produced in the simulations (the data and regression line are drawn in the xy-plane, and the error distribution is drawn above the regression line at x =, 1 and 2). The error distributions were approximately symmetric for sets A and D, but were left and right skewed for sets B and C, respectively.

Figure 6 shows the results of the simulation study. The accuracy of parameter estimates is presented for a range of sample sizes and parameter sets.

4. Discussion

Section 3.1 assessed VR's performance on data where the response variable was normally distributed. For a sample size of $n = 50$, VR's regression coefficients had much larger MSE than NER. However, VR rapidly improved in performance as the sample size increased, and there was minimal difference between VR and NER for sample sizes of $n = 200, 500$ and $1000$.

Table 3 summarizes the $\hat\lambda_2, \hat\lambda_3, \hat\lambda_4$ values produced by VR in Section 3.1. Since GLD(0, 1.4640, 0.1349, 0.1349) closely approximates $N(0, 1)$, as determined by the method of moments (Karian and Dudewicz, 2000), Table 3 indicates that VR's error distribution became approximately normal as the sample size increased. Hence, VR provided a good approximation of NER as the sample size increased.

n       $\hat\lambda_2$    $\hat\lambda_3$    $\hat\lambda_4$
50      1.26 (.44)         .34 (.27)          .34 (.28)
100     1.38 (.26)         .21 (.14)          .21 (.14)
200     1.43 (.1)          .16 (.7)           .16 (.7)
500     1.4 (.1)           .1 (.)             .1 (.)
1000    1.4 (.7)           .14 (.3)           .14 (.3)

Table 3: Mean $\hat\lambda_2$, $\hat\lambda_3$ and $\hat\lambda_4$ values produced by VR in Section 3.1. The bracketed values represent the standard error.

Section 3.2 found that VR's parameter estimates decreased in bias and spread as the sample size increased (spread was measured by the difference between the 0.975 and 0.025 quantiles of the sampling distributions). Figure 6 shows that set D had the smallest spread of $\beta_0$ and $\beta_1$ estimates. This is a consequence of the data being more condensed (see Figure 5).

Execution time is an important aspect of any modelling technique. Table 4 summarizes the execution times of VR for the simulations performed in Section 3.2. The simulations were performed on computers with dual-core AMD Opteron 2 processors (2.4 GHz) and 4 GB of RAM. The fast execution times can be largely attributed to the VR objective function being written in the C programming language.

n       A             B             C             D
50      .8 (.8)       .11 (.11)     .1 (.7)       .11 (.7)
100     .12 (.1)      .16 (.7)      .14 (.3)      .16 (.4)
200     .27 (.6)      .3 (.1)       .3 (.6)       .3 (.7)
500     .48 (.7)      . (.12)       . (.9)        .3 (.12)
1000    .94 (.12)     1.31 (.34)    1.2 (.28)     1.22 (.33)

Table 4: Mean execution times (seconds) for the simulations performed in Section 3.2. The bracketed values represent the standard deviation.

All modelling techniques have their disadvantages and VR is no exception. The shortcomings of VR are listed below.

- The GLD has finite support when both $\lambda_3$ and $\lambda_4$ are positive. Consequently, VR's error distribution is only defined on a finite domain when both $\lambda_3$ and $\lambda_4$ are positive.
- The optimization routine may not converge to a solution (this is a problem of optimization in general). In Section 3.2, 20,000 simulations were performed and every one of these simulations successfully converged to a solution. In practice, if the optimization routine did fail, the user could override the $\lambda_3$ and $\lambda_4$ starting values and the modelling process could be rerun.
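The rerun strategy mentioned in the second shortcoming can be sketched as a small wrapper. This is only an illustration of the idea: `fit` is a hypothetical user-supplied function that runs the VR optimization from given $\lambda_3$ and $\lambda_4$ starting values and reports whether it converged, and the alternative starting values after the default of $-0.4$ are arbitrary choices, not recommendations from this article.

```python
def fit_with_restarts(fit, starts=((-0.4, -0.4), (-0.2, -0.2), (0.0, 0.0), (0.2, 0.2))):
    """Call fit(lam3_start, lam4_start) until one run reports convergence.

    `fit` is a hypothetical function returning a pair (converged, estimates).
    The first entry of `starts` is the default used in the article; the
    remaining entries are illustrative overrides.
    """
    for lam3_start, lam4_start in starts:
        converged, estimates = fit(lam3_start, lam4_start)
        if converged:
            return estimates, (lam3_start, lam4_start)
    raise RuntimeError("no starting values led to convergence")
```

Returning the starting values alongside the estimates makes it easy to record which runs needed an override.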

[Figure 2: Typical regression model for set A.]

[Figure 3: Typical regression model for set B.]

[Figure 4: Typical regression model for set C.]

[Figure 5: Typical regression model for set D.]

[Figure 6: Accuracy of parameter estimates for a range of sample sizes and parameter sets. Each horizontal line represents the true parameter value. Each circle represents the mean value of the sampling distribution. Each upward and downward facing triangle represents the 0.025 and 0.975 quantiles of the sampling distribution, respectively. Panels: one column per parameter set (A, B, C, D) and one row per parameter ($\lambda_2$, $\lambda_3$, $\lambda_4$, $\beta_0$, $\beta_1$).]

5. Conclusion

This article presented Versatile Regression; a simple regression technique where the error distribution was described by the Generalized Lambda Distribution. The flexibility of this distribution allowed the error distribution to be heavy-tailed, skewed or approximately normal. Versatile Regression performed well on heavy-tailed and skewed data. Versatile Regression also provided a reasonable approximation to Normal-Error Regression.

References

Broyden, C. G., 1970. The convergence of a class of double-rank minimization algorithms. IMA Journal of Applied Mathematics, 6, 76–90.

Dean, B., King, R. A. R. and Howley, P. P., 2009. Stretched Regression: simple regression with a non-normal response distribution that smoothly changes scale and shape. In preparation. (Dept. of Statistics, University of Newcastle, Callaghan, NSW Australia).

Fernandez, C. and Steel, M. F. J., 1998. On Bayesian modeling of fat tails and skewness. Journal of the American Statistical Association, 93(441), 359–371.

Fletcher, R., 1970. A new approach to variable metric algorithms. Computer Journal, 13, 317–322.

Freimer, M., Mudholkar, G. S., Kollia, G. and Lin, C. T., 1988. A study of the Generalized Tukey Lambda family. Communications in Statistics - Theory and Methods, 17, 3547–3567.

Geweke, J., 1993. Bayesian treatment of the independent student-t linear model. Journal of Applied Econometrics, 8, 19–40.

Goldfarb, D., 1970. A family of variable metric updates derived by variational means. Mathematics of Computation, 24, 23–26.

Karian, Z. A. and Dudewicz, E. J., 2000. Fitting statistical distributions: the Generalized Lambda Distribution and Generalized Bootstrap methods. Boca Raton, CRC Press.

King, R. A. R., Gerlach, R. and Wraith, D., 2. Starship regression: A parametric quantile regression method with flexibly-shaped errors. In preparation. (Dept. of Statistics, University of Newcastle, Callaghan, NSW Australia).

Koenker, R. and Bassett, G., 1978. Regression quantiles. Econometrica, 46, 33–50.

Mandelbrot, B., 1971. Linear regression with non-normal error terms: a comment. The Review of Economics and Statistics, 53(2), 205–206.

Neter, J., Kutner, M. H., Nachtsheim, C. J. and Wasserman, W., 1996. Applied linear statistical models, fourth ed. McGraw-Hill/Irwin.

Owen, D. B., 1988. The starship. Communications in Statistics - Simulation and Computation, 17, 315–323.

Shanno, D. F., 1970. Conditioning of quasi-Newton methods for function minimization. Mathematics of Computation, 24, 647–656.

Zeckhauser, R. and Thompson, M., 1970. Linear regression with non-normal error terms. The Review of Economics and Statistics, 52(3), 280–286.