Notes on the use of the Goldfeld-Quandt test for heteroscedasticity in environment research

Biometrical Letters Vol. 45 (2008), No. 1, 43-55

Anna Budka, Danuta Kachlicka, Maria Kozłowska
Department of Mathematical and Statistical Methods, Poznań University of Life Sciences, Wojska Polskiego 28, 60-637 Poznań, Poland
e-mail: budka@up.poznan.pl

SUMMARY

Some methods of using the Goldfeld-Quandt test are described. The use of classical statistics (mean, standard deviation) or position statistics (median, average deviation, median absolute deviation) is proposed. The results can be used to test the hypothesis that the residuals from a linear regression are homoscedastic, when the experimental distribution of the considered variable is symmetrical or asymmetrical (right-skewed or left-skewed). Using the Monte Carlo method, the properties of these modifications of the Goldfeld-Quandt procedure are explored. A comparison of the effectiveness of the described methods for environment research is presented.

Key words: Goldfeld-Quandt test, heteroscedasticity, linear regression, position statistics.

1. Introduction

The pure sciences have wide application to experimentation in the field of environmental engineering. Statistics and econometrics offer quantitative methods which are an important tool in the analysis of results of environment research. These include research on the content of the air, air quality being an important issue. High air quality means that the level of pollution of the air is low. Pollution may come from sources which are either natural or anthropogenic (caused by human activity). Natural pollution is produced mainly by forest fires or the decomposition of living organisms, while anthropogenic sources include transport, fuel burning, industrial processes and others. The qualitative composition of the air is variable and, what is worse, secondary pollutants arise which are often more toxic and harmful than the primary ones. The course of these processes depends on many factors, including the spatial distribution of concentrations.

Another important aspect is the asphalt effect. When we cover the soil surface with an impermeable layer of concrete or asphalt, moisture from the ground cannot pass into the air. The air becomes dry and susceptible to pollution. Therefore a fundamental criterion to consider when buying a plot of building land is its position. Research into the effect on land prices of, among other things, the distance of the plot from the nearest highway and airport requires certain conditions to be checked, which we do using the Goldfeld-Quandt test. The object of this paper is to present certain modifications of the use of the Goldfeld-Quandt test for investigating the presence of heteroscedasticity.

2. Framework

We consider the linear model

$$y = X\beta + \varepsilon, \qquad (1)$$

where $y$ is an $n \times 1$ vector of observations on the dependent variable, $X = [\mathbf{1}, x_1, \ldots, x_{k-1}]$ is an $n \times k$ matrix of observations on $k-1$ independent variables, $\beta$ is a $k \times 1$ vector of unknown parameters, $\varepsilon$ is an $n \times 1$ random vector of errors, and $\mathbf{1}$ is the vector of ones. We assume that the errors are normally distributed and that the error term corresponding to one observed value of the dependent variable is statistically independent of the error term corresponding to any other observed value. The error is the distance between the observed value of the dependent variable and its expected value. This expectation is labelled $E(y)$, and $E(y) = X\beta$ because we assume that $E(\varepsilon) = 0$, where $0$ is the null vector. For the least squares estimates of the parameters to be best linear unbiased estimates, one more assumption should hold: the variance should be constant. The variance matrix is labelled $D(\varepsilon)$, and $D(\varepsilon) = \sigma^2 I$, where $I$ is the identity matrix. If we observe departures from these assumptions, the estimators may be biased or no longer best, or other drawbacks may arise. Hence checking the model is a very important part of regression analysis.
It is well known that in the presence of heteroscedasticity of error variances, $D(\varepsilon) = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)$, the least squares method has two major drawbacks: inefficient parameter estimates, and biased variance estimates which make standard hypothesis tests inappropriate. Known tests for heteroscedasticity based on error analysis include, for example, the most popular Goldfeld-Quandt test, the Breusch-Pagan test and the White test, described respectively by Goldfeld and Quandt (1965), Breusch and Pagan (1979) and White (1980). The literature on testing for heteroscedasticity includes many more tests (see Dufour et al., 2004). The test for heteroscedasticity in regression models based on the Goldfeld-Quandt methodology defined by Carapeto and Holt (2003) also deserves attention. Most of these tests are what Goldfeld and Quandt call non-constructive tests, in that they can be used to determine the presence or absence of heteroscedasticity, but reveal nothing about the form of the variance structure (Buse, 1984).

We are interested in testing for heteroscedasticity in situations with a known deflator. Our null hypothesis is $H_0: \sigma_i^2 = \sigma^2$ for $i = 1, 2, \ldots, n$, and the alternative hypothesis is $H_1: {\sim}H_0$ (the symbol $\sim$ denotes negation). The Goldfeld-Quandt test is proposed by, for example, Goldfeld and Quandt (1965), Buse (1984) or Maddala (2006) for verification of the null hypothesis.

The test proposed by Goldfeld and Quandt (1965) is carried out in four steps. In the first step we sort the multivariate variable with respect to one chosen independent variable, for example $x_d$, $d \in \{l: l = 1, 2, \ldots, k-1\}$. This variable is a potential deflator. In the second step, we can discard one or more central observations. In the third step, we fit separate regressions to each of the two remaining sets of observations. In the last step, we form the test statistic, which has an F-distribution under the null hypothesis. The statistic is the quotient of the mean squares for error from the separate regressions. Let $x_{d1} \le x_{d2} \le \ldots \le x_{dn}$, where the second subscript denotes the number of the observation of the variable $x_d$. We obtain a set of observations of the multivariate variable in the new ordering.
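The four steps above can be sketched for the single-regressor case as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `ols_rss` and `goldfeld_quandt` are hypothetical, the two outer blocks are forced to equal size, and the variance of the block with the larger deflator values is placed in the numerator. The p-value is left out because the Python standard library has no F distribution.

```python
import random

def ols_rss(x, y):
    """Residual sum of squares of the simple regression y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

def goldfeld_quandt(x, y, n_drop):
    """Return (F, df_numerator, df_denominator) for a single regressor."""
    pairs = sorted(zip(x, y))            # step 1: sort by the deflator x
    n = len(pairs)
    n1 = (n - n_drop) // 2               # step 2: keep two outer blocks of n1
    low, high = pairs[:n1], pairs[n - n1:]   # (one extra point is dropped if n - n_drop is odd)
    k = 2                                # intercept and slope
    rss_low = ols_rss(*zip(*low))        # step 3: separate regressions
    rss_high = ols_rss(*zip(*high))
    df = n1 - k
    f_stat = (rss_high / df) / (rss_low / df)   # step 4: ratio of error mean squares
    return f_stat, df, df

# Hypothetical data whose error standard deviation grows with x.
rng = random.Random(0)
x_demo = [float(i) for i in range(1, 61)]
y_demo = [2.0 + 3.0 * v + v * rng.gauss(0.0, 1.0) for v in x_demo]
print(goldfeld_quandt(x_demo, y_demo, n_drop=16))
```

Under the null hypothesis the returned statistic would be compared with an F distribution on the two returned degrees of freedom.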
Now we consider model (1) in the form

$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix} \beta + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \end{pmatrix}, \qquad (2)$$

where for $j = 1, 2, 3$, $y_j$ and $\varepsilon_j$ are $n_j \times 1$ vectors and $X_j$ is an $n_j \times k$ matrix. Goldfeld and Quandt (1965) state that the vector $y_2$ includes some central observations after they are sorted with respect to the chosen potential deflator. The choice of the $n_2 \times 1$ vector $y_2$ is the subject of consideration in the next section. The choice of the vector $y_2$ follows as a consequence of the choice of central observations of the variable $x_d$. Goldfeld and Quandt (1965) state that the dimension of the vectors $y_1$ and $y_3$ is the same, but other authors, for example Thursby (1982) and Dufour et al. (2004), take into account varying dimensions of these vectors.

Goldfeld and Quandt (1965) proposed using the test for heteroscedasticity after the removal of some number of central observations. They did not specify how many observations should be removed. They gave the relative frequency of cases in which the false hypothesis is rejected for samples of size $n = 30$ and $n = 60$ after omitting 0, 4, 8, 12 or 16 central observations, and they estimated the power of the test. They obtained the largest frequency for $n = 30$ and $n = 60$ after the omission of, respectively, 8 and 16 central observations (equal to 26.7%). Buse (1984) analysed the problem for $n = 20, 40, 80$, removing 20% of central observations. The same size of the removed set of observations was used by Dufour et al. (2004) for $n = 50$ and $n = 100$. Maddala (2006) suggests the removal of central observations to increase the power of the test, but he does not answer the question of how many observations to remove.

3. Results

We must quote here two sentences formulated by Goldfeld and Quandt (1965): (i) The power of this test will clearly depend upon the value of $n_2$, the number of omitted observations; for very large values of $n_2$ the power will be small but it is not obvious that the power increases monotonically as $n_2$ tends to 0; (ii) The power of the test will clearly depend on the nature of the sample of values for the variable which is the deflator.
Thus, if the variance of $x_d$ is small relative to the mean of $x_d$, the power can be expected to be small, and conversely. We believe that the number of omitted observations depends on the precision of the measurements carried out and on their distribution.

The distribution of the independent variable called the deflator may be defined by a cumulative probability function labelled $F(x_d)$. If for a value $x_{d(q)}$, where $q \in (0;1)$, the cumulative probability function $F(x_{d(q)})$ is equal to $q$, then $x_{d(q)}$ is called a quantile of order $q$. The lower quartile, median and upper quartile are measures of location and are labelled $x_{d(1/4)}$, $x_{d(1/2)}$ and $x_{d(3/4)}$ respectively. Let us recall that many authors suggest removing 20% of observations; these are the values of the variable $x_d$ within the interval $(x_{d(2/5)};\ x_{d(3/5)})$.

When the mean has been calculated for a set of several measurements, the standard deviation can be calculated for the set too. The standard deviation tells us how repeatable the measurements are, and how much this contributes to the uncertainty of the mean value. From these, the estimated standard deviation of the mean (the standard error) may be calculated. The standard deviation of the mean has also been called a standard uncertainty. The standard uncertainty is a margin whose size can be thought of as plus or minus one standard error. We propose using classical statistics or position statistics with the aim of defining the set of removed observations.

3.1. Method based on classical statistics

Proposition 1. When the variable $x_d$ has a symmetrical distribution, the mean is the central point of the distribution. We assume that the standard error is the measure of deviation from the mean. From the set of observations of the variable $x_d$, such that $x_{d1} \le x_{d2} \le \ldots \le x_{dn}$, we propose to remove all observations belonging to the interval defined by the mean of the variable plus or minus the standard error, that is

$$x_{di} \in \left(\hat{\mu} - \frac{\hat{\sigma}}{\sqrt{n}};\ \hat{\mu} + \frac{\hat{\sigma}}{\sqrt{n}}\right), \qquad (3)$$

where $\hat{\mu}$ and $\hat{\sigma}$ are estimates of the mean and standard deviation of the variable $x_d$ respectively.

Let us notice that, for the variable $x_d \sim N(\mu;\sigma)$, the orders of the quantiles $x_{d(q_1)} = \mu - \sigma/\sqrt{n}$ and $x_{d(q_2)} = \mu + \sigma/\sqrt{n}$ have the forms

$$q_1 = F\left(\mu - \frac{\sigma}{\sqrt{n}}\right) = P\left(x_d < \mu - \frac{\sigma}{\sqrt{n}}\right) = P\left(\frac{x_d - \mu}{\sigma} < -\frac{1}{\sqrt{n}}\right) = F_{0;1}\left(-\frac{1}{\sqrt{n}}\right) = \int_{-\infty}^{-1/\sqrt{n}} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx,$$

$$q_2 = F\left(\mu + \frac{\sigma}{\sqrt{n}}\right) = P\left(x_d < \mu + \frac{\sigma}{\sqrt{n}}\right) = P\left(\frac{x_d - \mu}{\sigma} < \frac{1}{\sqrt{n}}\right) = F_{0;1}\left(\frac{1}{\sqrt{n}}\right) = \int_{-\infty}^{1/\sqrt{n}} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx,$$

where $F_{0;1}$ is the cumulative distribution function of $N(0;1)$. Hence, since $F_{0;1}(-z) = 1 - F_{0;1}(z)$, we have $q_2 - q_1 = 2F_{0;1}(1/\sqrt{n}) - 1$. In the case when $n = 16$ the difference of the orders is approximately 1/5, that is $q_2 - q_1 = 2F_{0;1}(0.25) - 1 \approx 1/5$. Let us notice that for increasing sample size we have

$$\lim_{n \to \infty} (q_2 - q_1) = \lim_{n \to \infty} \left(2F_{0;1}(1/\sqrt{n}) - 1\right) = 0.$$

When $x_d$ has the normal distribution, the length of the interval (3) depends on the size of the sample. In this case we remove, for example, about 20% of the observations when $n = 16$, and a smaller share for larger samples.

3.2. Methods based on position statistics

When the variable $x_d$ has an asymmetrical distribution, the median $\mathrm{med}(x_d)$ is the central point of the distribution. The median is in the middle of the sorted data ($x_{d1} \le x_{d2} \le \ldots \le x_{dn}$). In a similar way we can define the first quartile to be 1/4 of the way through the sorted data ($x_{d(1/4)}$), and the third quartile to be 3/4 of the way through the sorted data ($x_{d(3/4)}$). For investigation of the variability about the median, a position statistic called the median absolute deviation is used. The median absolute deviation (MAD) is defined as the median of the absolute value of the difference between a variable and its median, that is

$$\mathrm{MAD}(x_d) = \mathrm{med}\,|x_d - \mathrm{med}(x_d)|.$$

For normally distributed $x_d \sim N(\mu;\sigma)$, the MAD is given by

$$\mathrm{MAD}(x_d) = \sigma\, \mathrm{med}\left|\frac{x_d - \mu}{\sigma}\right| = 0.67\,\sigma.$$
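The normal-theory quantities in this section can be checked numerically. The sketch below is an illustration only: it assumes a normally distributed deflator, the function name `removed_share` is hypothetical, and `statistics.NormalDist` from the Python standard library plays the role of $F_{0;1}$.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # plays the role of F_{0;1}

def removed_share(n):
    """Expected share of a normal sample inside (mu - sigma/sqrt(n), mu + sigma/sqrt(n))."""
    return 2.0 * STD_NORMAL.cdf(1.0 / sqrt(n)) - 1.0

# Share q2 - q1 for a few sample sizes; it shrinks as n grows.
for n in (16, 30, 60, 100):
    print(n, round(removed_share(n), 3))

# MAD of a normal variable in units of sigma: the upper quartile of N(0;1).
print(round(STD_NORMAL.inv_cdf(0.75), 4))
```

For $n = 16$ the share is close to 0.197, in line with the value of about 1/5 given above, and the last line returns the constant 0.67 used for the MAD of a normal variable.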

Let us notice that, for normally distributed $x_d$, $\mu - \mathrm{MAD}(x_d) = \mathrm{med}(x_d) - \mathrm{MAD}(x_d) = x_{d(1/4)}$, $\mu + \mathrm{MAD}(x_d) = \mathrm{med}(x_d) + \mathrm{MAD}(x_d) = x_{d(3/4)}$, and $F(\mu + 0.5\,\mathrm{MAD}(x_d)) - F(\mu - 0.5\,\mathrm{MAD}(x_d)) \approx 0.26$, that is roughly one quarter, because

$$F\left(\mu + \frac{0.67\,\sigma}{2}\right) - F\left(\mu - \frac{0.67\,\sigma}{2}\right) = P\left(\left|\frac{x_d - \mu}{\sigma}\right| < \frac{0.67}{2}\right) = 2F_{0;1}\left(\frac{0.67}{2}\right) - 1 \approx 0.26.$$

We propose taking $\mathrm{MAD}(x_d)$ as the upper limit of the length of the interval that includes the omitted observations of the variable $x_d$.

We may investigate the symmetry of the distribution of the variable $x_d$ by calculating quartiles. When the median lies midway between the quartiles, that is $\mathrm{med}(x_d) = \frac{1}{2}(x_{d(3/4)} + x_{d(1/4)})$, the distribution is symmetrical. For a right-skewed distribution we have $\mathrm{med}(x_d) < \frac{1}{2}(x_{d(3/4)} + x_{d(1/4)})$, and for a left-skewed distribution $\mathrm{med}(x_d) > \frac{1}{2}(x_{d(3/4)} + x_{d(1/4)})$. For a variable $x_d$ having an asymmetrical distribution we propose some methods for defining the set of omitted observations.

Proposition 2. Adopting the asymptotic approach for building the confidence interval for the median using the standard error (see Dawid, 1981), we propose to define the set of removed observations by the interval median plus or minus standard error. Hence we propose removing the following observations:

$$x_{di} \in \left(\mathrm{med}(x_d) - \frac{\hat{\sigma}}{\sqrt{n}};\ \mathrm{med}(x_d) + \frac{\hat{\sigma}}{\sqrt{n}}\right). \qquad (4)$$

Proposition 3. Another measure of variation is the mean absolute deviation. The mean absolute deviation (MeanAD) is defined as follows:

$$\mathrm{MeanAD}(x_d) = \frac{1}{n} \sum_{i=1}^{n} |x_{di} - \mathrm{med}(x_d)|. \qquad (5)$$

The mean absolute deviation takes values from zero to half of the range of observations. Hence we believe that we may remove observations belonging to the interval median plus or minus the mean absolute deviation divided by the square root of the number of observations, namely

$$x_{di} \in \left(\mathrm{med}(x_d) - \frac{\mathrm{MeanAD}(x_d)}{\sqrt{n}};\ \mathrm{med}(x_d) + \frac{\mathrm{MeanAD}(x_d)}{\sqrt{n}}\right). \qquad (6)$$
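A minimal sketch of how the intervals (3), (4) and (6) could be computed for a list of deflator values is given below. The function names and rule labels are hypothetical, and using `statistics.stdev` (the sample standard deviation with divisor n - 1) as the estimate of the standard deviation is an assumption, since the text does not fix the estimator.

```python
from math import sqrt
from statistics import mean, median, stdev

def mean_abs_dev(x):
    """Mean absolute deviation about the median, as in (5)."""
    m = median(x)
    return sum(abs(v - m) for v in x) / len(x)

def removal_interval(x, rule):
    """Open interval (lower, upper) of deflator values to be removed."""
    n = len(x)
    if rule == "mean_se":              # interval (3)
        centre, half = mean(x), stdev(x) / sqrt(n)
    elif rule == "median_se":          # interval (4)
        centre, half = median(x), stdev(x) / sqrt(n)
    elif rule == "median_meanad":      # interval (6)
        centre, half = median(x), mean_abs_dev(x) / sqrt(n)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return centre - half, centre + half

def removed_values(x, rule):
    """Observations falling strictly inside the removal interval."""
    lower, upper = removal_interval(x, rule)
    return [v for v in x if lower < v < upper]

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for rule in ("mean_se", "median_se", "median_meanad"):
    print(rule, removal_interval(sample, rule), removed_values(sample, rule))
```

The observations returned by `removed_values` form the central block that is set aside before the two outer regressions are fitted.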

Proposition 4. In experiments there is no such thing as a perfect measurement. Each measurement contains a degree of uncertainty due to the limits of instruments and the people using them. The uncertainty index for sample size $n$ has the form $1.859\,\mathrm{MAD}(x_d)/\sqrt{n}$. The range of omitted values of $x_d$ can be defined as the median plus or minus the value of the uncertainty index. We propose removing the following central observations:

$$x_{di} \in \left(\mathrm{med}(x_d) - \frac{1.859\,\mathrm{MAD}(x_d)}{\sqrt{n}};\ \mathrm{med}(x_d) + \frac{1.859\,\mathrm{MAD}(x_d)}{\sqrt{n}}\right). \qquad (7)$$

Above we have given four propositions for defining the set of central values of observations of the independent variable $x_d$ called a deflator. These propositions concern the use of measures of variation such as the standard error, the mean absolute deviation and the median absolute deviation. The range of removed observations is defined as an interval whose central point is the mean or median of the variable $x_d$. We will check the usefulness of the above propositions on generated data.

4. Monte Carlo simulation

In order to compare the effectiveness of the described modifications of the Goldfeld-Quandt procedure we performed a Monte Carlo study using SEPATH from Statistica 7.1. We considered sample size $n = 100$ and we used the single regressor model as applied for example in Goldfeld and Quandt (1965), Griffiths and Surekha (1986) and Carapeto and Holt (2003). However, we performed our study in a different way than these authors. In the first step, values $y_i$ and $x_i$ for $i = 1, 2, \ldots, 100$ were generated from a normal distribution $N(0;1)$, such that the pairs $(y_i; x_i)$ had highly significant correlation. In the second step we sorted $(y_i; x_i)$ with respect to $x_i$. In the third step we omitted some central observations, using one of six methods of defining the omitted set. In the next step we calculated the Goldfeld-Quandt statistic. Nine hundred replications were generated. In the last step we calculated the p-value for the Goldfeld-Quandt test and the cumulative frequency of cases in which the null hypothesis is not rejected. The power of the Goldfeld-Quandt test was calculated using Statistica 7.1 for the first five hundred replications.

We concentrate now on comparing different methods for omitting central observations. Two standard methods are considered here; the first method was described by Goldfeld and Quandt (1965) and the second by, for example, Buse (1984). The first method involves use of the test without removing central observations, and in the second method twenty percent of central observations are omitted. These two methods are compared with our four propositions. Table 1 displays, for the six methods, the experimentally calculated mean p-value, the cumulative frequency of cases in which the null hypothesis is not rejected at significance level 0.05, 0.01 or 0.001, and the mean power of the Goldfeld-Quandt test.

Table 1. Mean p-value, cumulative frequency of cases in which the null hypothesis is not rejected (α = 0.05, 0.01, 0.001), mean power of the Goldfeld-Quandt test for the Monte Carlo experiments, and number of removed observations (nro)

Set of omitted observations     p-value   frequency   frequency   frequency   power   nro
                                          (α=0.05)    (α=0.01)    (α=0.001)
Null                            0.58      0.89        0.978       0.997       0.67    0.0
20% central observations        0.488     0.904       0.983       0.998       0.57    20.0
mean ± standard error           0.486     0.900       0.98        0.998       0.45    7.9
median ± standard error         0.5       0.896       0.983       0.998       0.33    8.8
median ± MeanAD/sqrt(n)         0.540     0.898       0.98        0.998       0.5     7.
median ± 1.859 MAD/sqrt(n)      0.5       0.897       0.984       0.998       0.45    0.6

The magnitude of the differences in the p-values of our propositions compared with the two methods from earlier papers is small (<%). We obtained the smallest p-value for the interval mean plus or minus standard error. The power given in Table 1 clearly indicates that the power of the Goldfeld-Quandt test depends upon the number of omitted observations, but it is not true that the power increases monotonically as this number tends to zero. This consideration shows that the standard method recommended by many authors (20% omitted central observations) gives much the same result as our methods.
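To give a feel for the nro column, the following sketch estimates by simulation how many of n = 100 standard normal deflator values fall inside each proposed interval. It is not a reproduction of the study: the authors used Statistica, whereas this sketch uses only the Python standard library, computes no p-values or power, and the function names, seed and number of replications are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import mean, median, stdev

def mad(x):
    """Median absolute deviation about the median."""
    m = median(x)
    return median([abs(v - m) for v in x])

def centres_and_half_widths(x):
    """Centre and half-width of the removal interval for each proposition."""
    n = len(x)
    m = median(x)
    mean_ad = sum(abs(v - m) for v in x) / n
    se = stdev(x) / sqrt(n)
    return {
        "mean +/- SE": (mean(x), se),
        "median +/- SE": (m, se),
        "median +/- MeanAD/sqrt(n)": (m, mean_ad / sqrt(n)),
        "median +/- 1.859 MAD/sqrt(n)": (m, 1.859 * mad(x) / sqrt(n)),
    }

def average_removed(n=100, reps=300, seed=1):
    """Average number of observations inside each interval over reps samples."""
    rng = random.Random(seed)
    totals = {}
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for name, (c, h) in centres_and_half_widths(x).items():
            inside = sum(1 for v in x if c - h < v < c + h)
            totals[name] = totals.get(name, 0) + inside
    return {name: total / reps for name, total in totals.items()}

for name, avg in average_removed().items():
    print(f"{name}: {avg:.1f}")
```

For normal data the MAD-based interval is the widest of the four and the MeanAD-based interval the narrowest, so the first removes the most observations on average and the second the fewest.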
When we make measurements, we have no way of knowing how accurate the values are. The use of variation measures such as the standard error, median absolute deviation or standard uncertainty makes it possible to avoid taking an overoptimistic view of reality. By performing a 'truthful' analysis, we can refine the Goldfeld-Quandt procedure. We uncover and take advantage of the major sources of the uncertainty. In doing so, we use information about the measurements.

5. Research problem

Regression models in environment research, and particularly the estimated standard errors, rely upon the assumption that the residuals are independently and identically distributed. Two common violations of this assumption are serial correlation and heteroscedasticity. We assume that serial correlation is not a significant issue. However, heteroscedasticity, which refers to non-constant error variance, is a common problem in environment regression models.

The present analysis is based on the results of research described in example 4.7 by Maddala (2006). In this example six variables were described. The price of land per acre is the dependent variable, labelled $y$; the percentage of afforested area is labelled $x_1$, the distance of the land parcel from the airport is labelled $x_2$, the distance from the highway is labelled $x_3$, the area of the land parcel is labelled $x_4$ and the month in which it was sold is labelled $x_5$. The dependent variable $y$ and two independent variables $x_2$ and $x_3$ were transformed and denoted respectively $ly$, $lx_2$ and $lx_3$. The independent variable $lx_3$ is called the deflator. Hence we can write $x_d = lx_3$. Table 2 displays classical and position statistics for the six variables.

The analysis began with a study of homoscedasticity for each independent variable separately and the dependent variable (Table 3). We performed the Goldfeld-Quandt test using each of the six methods of defining the set of omitted observations separately. The same analyses were made for two groups of independent variables, in which we sorted the data with respect to $x_d = lx_3$. We performed the Goldfeld-Quandt test for the two groups to compare their results. We calculated the median absolute deviation of the deflator, $\mathrm{MAD}(x_d) = \mathrm{MAD}(lx_3) = 0.397$, and the inter-quantile range $(x_{d(3/5)} - x_{d(2/5)}) = 0.809$.
Table 3 reports the p-values for these analyses and the number of omitted observations of the deflator $lx_3$ for the two standard methods and our four propositions for defining the set of omitted observations. For the last two analyses we also calculated the power of the test.

Table 2. Classical statistics and position statistics for the variables considered in example 4.7 by Maddala (2006)

                        ly       x1       lx2      lx3      x4       x5
Mean                    8.393    0.37     .657     .43      6.93     3.8
Standard deviation      0.53     0.45     0.50     0.87     6.78     .5
Standard error          0.064    0.055    0.06     0.0      4.7      .4
Median                  8.466    0.000    .74      .4       0.00     4.0
Minimum observation     6.770    0.000    .53      -.303    3.50     .0
Maximum observation     9.700    .000     3.757    3.48     656.00   46.0

Before we consider the influence of the defined set of omitted observations on the p-value and power of the Goldfeld-Quandt test, it is necessary to make a short digression on the relation between the sizes of the sets considered. In the example the greatest number of observations belongs to the second standard set (20%, that is 13 observations), but the smallest p-value and the highest power are obtained for the propositions mean plus or minus standard error and median plus or minus standard error. The respective sets contain 9 observations each. For the next two propositions, based on the mean absolute deviation or the median absolute deviation, we obtained worse results.

6. Conclusion

To conclude our considerations, we offer four methods for building the set of omitted observations. On the basis of the Monte Carlo simulation we generally prefer the first proposition. Proposition 1 is preferred when the deflator has a symmetrical distribution, proposition 2 when the distribution is asymmetrical. For the considered research problem the results of each application of the Goldfeld-Quandt test are different. In many cases our propositions are better than the second standard method considered here. Methods based on measurement of the deviation of the deflator are better than the removal of 20% of central observations without regard to their distribution. We should reiterate that the use of variation measures such as the standard error, median absolute deviation or standard uncertainty makes it possible to avoid taking an overoptimistic view of reality.

Table 3. The p-value for the Goldfeld-Quandt test for simple linear regressions and two multivariate linear regressions with x_d = lx3 (nro: number of removed observations of variable lx3)

                                 lx2       lx3       x4        x5              x1; lx2; lx3; x4; x5    lx2; lx3
Set of omitted observations      p-value   p-value   p-value   p-value   nro   p-value   power         p-value   power
null                             0.0405    0.0049    0.3406    0.95      0     0.009     0.8357        0.00      0.8770
20% central observations         0.096     0.003     0.984     0.379     13    0.005     0.859         0.006     0.8543
mean ± standard error            0.0578    0.0005    0.847     0.406     9     0.006     0.8485        0.0006    0.906
median ± standard error          0.608     0.0005    0.639     0.406     9     0.006     0.8485        0.0006    0.906
median ± MeanAD/sqrt(n)          0.047     0.004     0.474     0.406     6     0.0053    0.773         0.000     0.8768
median ± 1.859 MAD/sqrt(n)       0.047     0.0038    0.636     0.3980    7     0.004     0.753         0.000     0.8768

REFERENCES

Breusch T., Pagan A. (1979): A Simple Test for Heteroscedasticity and Random Coefficient Variation. Econometrica 47: 1287-1294.
Buse A. (1984): Tests for additive heteroskedasticity: Goldfeld and Quandt revisited. Empirical Economics 9: 199-216.
Carapeto M., Holt W. (2003): Testing for heteroscedasticity in regression models. Journal of Applied Statistics 30(1): 13-20.
Dawid H.A. (1981): Order Statistics. New York, Wiley.
Dufour J.-M., Khalaf L., Bernard J.-T., Genest I. (2004): Simulation-based finite-sample tests for heteroskedasticity and ARCH effects. Journal of Econometrics 122: 317-347.
Griffiths W.E., Surekha K. (1986): A Monte Carlo evaluation of the power of some tests for heteroscedasticity. Journal of Econometrics 31: 219-231.
Goldfeld S.M., Quandt R.E. (1965): Some tests for homoscedasticity. American Statistical Association Journal 60: 539-547.
Maddala G.S. (2006): Ekonometria. PWN, Warszawa.
Thursby G. (1982): Misspecification, heteroscedasticity, and the Chow and Goldfeld-Quandt tests. The Review of Economics and Statistics 64(2): 314-321.
White H. (1980): A Heteroscedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroscedasticity. Econometrica 48: 817-838.