OFFICE OF NAVAL RESEARCH FINAL REPORT for TASK NO. NR PRINCIPAL INVESTIGATORS: Jeffrey D. Hart Thomas E. Wehrly

Similar documents
Econ 582 Nonparametric Regression

University, Tempe, Arizona, USA b Department of Mathematics and Statistics, University of New. Mexico, Albuquerque, New Mexico, USA

Nonparametric Methods

A New Method for Varying Adaptive Bandwidth Selection

REGRESSION WITH CORRELATED ERRORS C.A. GLASBEY

in a series of dependent observations asks for specific regression techniques that take care of the error structure to avoid under- respectively overs

J. Cwik and J. Koronacki. Institute of Computer Science, Polish Academy of Sciences. to appear in. Computational Statistics and Data Analysis

TIME SERIES DATA ANALYSIS USING EVIEWS

Kernel Density Estimation

Stat 5101 Lecture Notes

Local Polynomial Modelling and Its Applications

covariance function, 174 probability structure of; Yule-Walker equations, 174 Moving average process, fluctuations, 5-6, 175 probability structure of

Introduction to Econometrics

Hakone Seminar Recent Developments in Statistics

ARIMA Modelling and Forecasting

Box-Jenkins ARIMA Advanced Time Series

Arma-Arch Modeling Of The Returns Of First Bank Of Nigeria

Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model

Chapter 3: Regression Methods for Trends

Defect Detection using Nonparametric Regression

Do Markov-Switching Models Capture Nonlinearities in the Data? Tests using Nonparametric Methods

Does k-th Moment Exist?

Nonparametric Regression

Statistical Methods for Forecasting

Transformation and Smoothing in Sample Survey Data

Testing Error Correction in Panel data

A NOTE ON THE CHOICE OF THE SMOOTHING PARAMETER IN THE KERNEL DENSITY ESTIMATE

Nonparametric confidence intervals. for receiver operating characteristic curves

Olga Y. Savchuk EDUCATION WORK EXPERIENCE. Department of Mathematics and Statistics University of South Florida

A New Approach to Robust Inference in Cointegration

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Confidence intervals for kernel density estimation

Section 2 NABE ASTEF 65

Outlier detection in ARIMA and seasonal ARIMA models by. Bayesian Information Type Criteria

Alternatives to Basis Expansions. Kernels in Density Estimation. Kernels and Bandwidth. Idea Behind Kernel Methods

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Relevance Vector Machines for Earthquake Response Spectra

On Consistency of Tests for Stationarity in Autoregressive and Moving Average Models of Different Orders

Modelling Survival Data using Generalized Additive Models with Flexible Link

7 Semiparametric Estimation of Additive Models

Interaction effects for continuous predictors in regression modeling

Additive Isotonic Regression

Simulating Properties of the Likelihood Ratio Test for a Unit Root in an Explosive Second Order Autoregression

Three Papers by Peter Bickel on Nonparametric Curve Estimation

Single Equation Linear GMM with Serially Correlated Moment Conditions

Frontier estimation based on extreme risk measures

INTRODUCTORY REGRESSION ANALYSIS

Economic modelling and forecasting

A New Procedure for Multiple Testing of Econometric Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

Carl N. Morris. University of Texas

Elements of Multivariate Time Series Analysis

Nonparametric Principal Components Regression

On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit

Bootstrapping The Order Selection Test

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

LIST OF PUBLICATIONS. 1. J.-P. Kreiss and E. Paparoditis, Bootstrap for Time Series: Theory and Applications, Springer-Verlag, New York, To appear.

A Goodness-of-fit Test for Copulas

On Fractile Transformation of Covariates in Regression 1

KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017 COWLES FOUNDATION DISCUSSION PAPER NO.

Nonparametric Regression. Badr Missaoui

Modelling Non-linear and Non-stationary Time Series

PREWHITENING-BASED ESTIMATION IN PARTIAL LINEAR REGRESSION MODELS: A COMPARATIVE STUDY

STAT 704 Sections IRLS and Bootstrap

Section 7: Local linear regression (loess) and regression discontinuity designs

Local regression I. Patrick Breheny. November 1. Kernel weighted averages Local linear regression

DavidAllanBurt. Dissertation submitted to the Faculty of the. Virginia Polytechnic Institute and State University

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

Local Polynomial Regression

Extending clustered point process-based rainfall models to a non-stationary climate

Introduction to Eco n o m et rics

On the use of Nonparametric Regression in Assessing Parametric Regression Models

A Note on Data-Adaptive Bandwidth Selection for Sequential Kernel Smoothers

Wavelet Regression Estimation in Longitudinal Data Analysis

Single Equation Linear GMM with Serially Correlated Moment Conditions

Regression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St.

A test for improved forecasting performance at higher lead times

Bickel Rosenblatt test

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego

MGR-815. Notes for the MGR-815 course. 12 June School of Superior Technology. Professor Zbigniew Dziong

CS 534: Computer Vision Segmentation III Statistical Nonparametric Methods for Segmentation

Bootstrap Simulation Procedure Applied to the Selection of the Multiple Linear Regressions

In this chapter, we provide an introduction to covariate shift adaptation toward machine learning in a non-stationary environment.

Autoregressive Moving Average (ARMA) Models and their Practical Applications

Using Estimating Equations for Spatially Correlated A

Christopher Dougherty London School of Economics and Political Science

A New Solution to Spurious Regressions *

BOOTSTRAP CONFIDENCE INTERVALS FOR PREDICTED RAINFALL QUANTILES

The regression model with one stochastic regressor (part II)

LESLIE GODFREY LIST OF PUBLICATIONS

Parametric Inference on Strong Dependence

A Bootstrap Test for Conditional Symmetry

Bayesian Estimation and Inference for the Generalized Partial Linear Model

Mohsen Pourahmadi. 1. A sampling theorem for multivariate stationary processes. J. of Multivariate Analysis, Vol. 13, No. 1 (1983),

Statistical Methods in HYDROLOGY CHARLES T. HAAN. The Iowa State University Press / Ames

Santiago Velilla. Curriculum Vitae

Luke B Smith and Brian J Reich North Carolina State University May 21, 2013

Transcription:

AD-A240 830 S ~September 1 10, 1991 OFFICE OF NAVAL RESEARCH FINAL REPORT for 1 OCTOBER 1985 THROUGH 31 AUGUST 1991 CONTRACT N00014-85-K-0723 TASK NO. NR 042-551 Nonparametric Estimation of Functions Based Upon Correlated Observations PRINCIPAL INVESTIGATORS: Jeffrey D. Hart Thomas E. Wehrly Department of Statistics Texas A&M Urnivcrsity College Station, TX 77843 91-11041 Reproduction in whole, or in part, is permitted for any purpose of the United States Government. *This document has been approved for nublic r=!'" nd sale; its distribution is unlimited.

Our research effots can be broken down into three categories: (1) smoothing dependent data, (2) general issues arising in function estimation, and (3) hypothesis testing based on smoothing methods. These three areas will be discussed below in Sections 1, 2, and 3, respectively. 1. Smoothing Dependent Data. Our first goal was to gain an understanding of how correlation affects standard automated methods of smoothing data from fixed-design regression models. We found that, generally speaking, positive serial correlation among the data induces cross-validation and related methods to drastically undersmooth the data, while negative serial correlation leads to oversmoothing (of a less drastic nature). These results are detailed in Hart and Wehrly (1986)*, Hart (1988, 35.) and Hart (1991). In particular, it is shown theoretically in Hart (1991) that cross-validation is extremely sensitive to very small amounts of positive correlation. We next proposed and investigated smoothing methodology that explicitly accounts for dependence in the data. In Hart and Wehrly (1986) we considered the problem of nonparametrically estimating the mean fuiction in a repeated measures model. We developed a data-driven means for choosing the bandwidth of a kernel estimator of the mean function. This methodology was applied to a data set involving plasma citrate concentrations of human subjects. In Holiday, Wehrly and Hart (1987, ONR Tech. Report No. 3) and Holiday and Hart (1987, ONR Tech. Repot No. 4), the repeated measures model was studied further. In the former technical report, optimal linear estimators of the mean function were derived. Conditions for consistency of linear estimators were obtained for a number of correlation structures. A numerical study was carried out to compare four popular kernel estimators with the optimal estimator. In Holiday and Hart (1987, ONR Tech. Report No. 4), properties of kernel estimators of the pth derivative of the mean curve were investigated. The asymptotic mean squared error of kernel estimators of the first and second derivatives were obtained for the Ornstein-Uhlenbeck model and a model with a smooth correlation function. Conditions for the mean square consistency of kernel estimators were provided for these models. Smoothing methodology in the context of estimating the trend in a single time series * References with only a year are published papers. A reference with a year followed by a number is the technical report of that number in the Texas A&M Department of Statistics series. 2.#

was proposed in Hart (1988, 35.), (1991). The major difficulty in this setting is obtaining estimates of the correlation structure when no model is available for the trend. Correlation estimates based on differences of the data were studied in Hart (1989). These estimates were used in the construction of data-driven smoothers of time series data (Hart (1988, 35.), (1991)). Related smoothing methods were investigated in Ha't and Wehrly (1990, 110). Data-driven bandwidth choice in probability density estimation was investigated by Hart and Vieu (1990). They showed that bandwidth selectors derived on an assumption of independent data work reasonably well for dependent data, at least if the dependence is not extremely strong. Precise conditions on the allowable degree of dependence were provided. Hart and Vieu (1990) also proposea n alternative to ordinary cross-validation for cases where the da'a are highly dependent. Theoretical aspect3 of nonparametric function estimation based on dependent data were explored in Hall and Hart (1990b,c). They studied the effect of long-range dependence on the efficiency of kernel estimators in the respectivc scttilgs of probability density estimation and fixed-design regression. Long-range dependence was shown to degrade efficiency to a much greater extent in the latter setting. However, when dependence is sufficiently long-range, even the efficiency of kernel density estimators is substantially degraded. Hall and Hart (1990b) provided the first precise results along these lines. Baek (1991, Ph.D dissertation) under the direction of Wehrly investigated kernel estimators for a multiple time series with an additive conditional mean structure. The asymptotic normality of the Nadaraya-Watson kernel estimator was proven for an a-mixing multiple time series. 2. General Issues in Function Estimation. Bias properties of nonparametric function estimators are generally unaffected by correlation among the data. Therefore, bias reduction techniques are applicable whether the data are dependent or not. In the setting of probability density estimation, Hart (1987, ONR Tech. Report No. 2) and Hart (1988) investigated so-called ARMA density estimators. These estimators make use of the generalized jackknife to reduce the bias of Fourier series estimators. It was shown that, under geneial condil:.,as, a certain ARMA estimator has smaller mean integrated squared error than does any tapered Fourier series estimator. While this last result is for independent data, it can be easily extended to dependent data which satisfy a 0-mixing condition. Kernel regression estimators are subject to so-called boundary or edge effects, a phe- 3

nomenon in which the bias of an estimator increases near the endpoints of the estimation interval. Hall and Wehrly (1991) intrcduced a simple geometric method for removing these edge effects. It involves reflecting the data set in two estimated points, thereby generating a new data set with three times the range of the original data. The usual kernel-type estimator may be applied to the enlarged data se4 without any danger of edge effects. A cross-validation algorithm may be extended to the endb of th:e design interval, unlike its more conventional counterpart which must be downweighted at the ends of the interval to -,','id edge effccts. Hart and Wehrly (1990, 91.), (1991) also proposed special boundary kernels to deal with cases where the regression curve is linear or nearly linear and the requisite amount of smoothing is so great that the boundary region is the entire estimation interval. When the smoothing parameter of the proposed estimator is large, the kernel estimator is close to a straight line. The limiting straight line is essentially the least squares line for equally spaced data. In smoothing data with more than one predictor, the use of kernel smoothers will necessitate extremely large data sets. To alleviate this curse of dimensionality, additive models for the regression function have proved to be useful. Baek (1991, Ph.D. dissertation) investigated the performance of a one-step estimator of the marginal regression functions for a fixed design. The optimal rate of convergence is obtained for p = 2 dimensions, but the rate is suboptimal for more than two dimensions. The results suggest that an iterative estimator may be necessary to avoid the curse of dimensionality. A basic problem in sample surveys is to estimate the population sum of a function of a random variable measured on each element in the population. A model-based approach involves parametrically estimating the conditional expectation of the population total given the observed value of the random variable. If the model is incorrect, this approach gives biased estimates. Chambers, Dorfman, and Wehrly (1990, 120.) used a kernel regression estimator to obtain a bias-calibrated version of the parametric estimator. The bandwidth of the kernel estimator is selected automatically by minimizing its estimated risk. The resulting estinator is robust to model misspecification. An application to prediction of the finite population distribution function of a population of Australian beef farms is presented. 3. Hypothesis Testing Via Smoothing. Recently there has been much interest in using smoothing methods to test hypotheses about regression functions. For example, we have made use of kernel smoothers in the problem of comparing Lwo regression curves. 4

Such work has applications in the analysis of covariance and other areas. Suppose one has two independent sets of data, and that each set of data follows a model of the form data = smooth curve + error. The problem of interest is to test the hypothesis that the two curves are identical. Our approach is to calculate kernel estimators of the two curves and to compare them using a quadratic measure of discrepancy. The probability distribution of this discrepancy measure is obtained on the assumption that the two curves are identical. King, Hart and Wehrly (1989, 72.), (1991) investigate the test obtained by assuming that the errors within each data set are normally distributed. The power of the test and its robustness to departures from normality are studied by means of simulation. King (1988, Ph.D. dissertation) obtains large sample properties of the test under a general model for the error distributions. Extensions to cases where the errors are correlated are possible and will be the subject of future research. Hall and Hart (1989, 57.), (1990a) studied a bootstrap approach for testing equality of two regression curves. As in the King, Hart and Wehrly test, the Hall-Hart test requires the choice of a bandwidth. One option is to use a data-driven bandwidth based on risk estimation. An important finding by Hall and Hart (19 9 0a) is that the variablity of their test statistic is significantly increased when a data-driven bandwidth is used. However, by incorporating this extra source of variation into the bootstrap algorithm, a valid test may be obtained. Another testing problem in which smootning methods are applicable is that of checking the adequacy of a parametric regression model. Eubank and Hart (1990, 121.) proposed methodology for testing the goodness-of-fit of a linear model. Their test uses the value of a data-driven smoothing parameter as test statistic. An advantage of this approach is that the test is a well-defined function of the data, and no smoothing parameter needs to be fixed by the data analyst. In analyzing their test, Eubank and Hart (1990, 121.) obtained the large sample distributional properties of data-driven smoothing parameters in a nonstandard situation. Hart and Wehrly (1991) proposed a goodness-of-fit test that uses a data-driven bandwidth as the test statistic. Boundary kernels corresponding to a large bandwidth play a crucial part in the methodology. Like the Eubank-Hart test, the Hart-Wehrly test does not involve an arbitrary choice of the smoothiig parameter. The test is shown to be consistent against a very large class of alternative hypotheses. 5

PUBLICATIONS: (1) Hart, J.D. and Wehrly, T.E. (1986). "Kernel regression estimation using repeated measurements data," Journal of the American Statistical Association, 81, 1080-1088. (2) Hart, J. D. (1988) "An ARMA Type Probability Density Estimator," Annals of Statistics, 16, 842-855. (3) Hart, J. D. (1989) "Differencing as an Approximate De-Trending Device," Stochastic Processes and their Applications, 31, 251-259. (4) Matis, J. H., Wehrly, T.E., and Ellis, W. C., (1989). "Some Get.-alized Stochastic Compartment Models for Digesta Flow," Biometrics, 45, 701-720. (5) Matis, J. H. and Wehrly, T. E., (1990). "Generalized Stochastic Compartmental Models with Gamma Transit Times." Journal of Pharmacokinetics and Biophormaceutics, 18, 589-607 (6) Hall, P. G. and Hart, J. D. (1990a) "Bootstrap Test for Difference Between Means in Nonparametric Regression," JASA, 85, 1039-1049. (7) Hall, P. G. and Hart, J. D. (1990b) "Convergence Rates in Density Estimation for Data from Infinite-Order Moving Average Processes," Probability Theory and Related Fields, 87, 253-274. (8) Hall, P. G. and Hart, J. D. (1990c) "Nonparametric Regression with Long- Range Dependence," Stocha.,tic Processes and their Applications, 36, 339-351. (9) Eubank, R. L., Speckman, P., and Hart, J. D. (1990) "Trigonometric Series Regression Estimators with an Application to Partially Linear Models," Journal of Multivariate Analysis, 32, 70-83. (10) Hart, J. D. and Vieu, P. (1990) "Data-Driven Bandwidth Choice for Density Estimation Based on Dependent Data," Annals of Statistics, 18, 873-890. (11) Hall, P. G. and Wehrly, T. E. (1991). "A Geometrical Method for Removing Edge Effects from Kernel-Type Nonparametric Regression Estimators," Journai of the American Statistical Association, 86, 665-672. (12) Cline, D. B. H. and Hart, J. D. (1991) "Kernel Estimation of Densities with Diqcnntinuities or Discontinuous Derivatives," Statistics, 22, 69-84. (13) Hart, J. D. (1991) "Kernel Regression Estimation with Time Series Errors," Journal of the Royal Statistical Society, Series B, 53, 173-187. (14) King, E. C., Hart, J. D. and Wehrly, T. E. (1991). "Testing for the Equality of Two Regression Curves Using Linear Smoothers," Statistics and Probability Letters, 12, 239-247. (15) Hart, J. D. and Wehrly, T. E. (1991). "Kernel Regression Estimation when the Boundary Region is Large with an Application to Testing the Adequacy

of Polynomial Models," to appear in Journal of the American Statistical Association. Technical Reports Hart, J.D. and Wehrly, T.E., (1986) "Kernel Regression Estimation Using Repeated Measurements Data," ONR Technical Report No. 1. Holiday, D.B., (1986) "On Nonparametric Regression Estimation in a Correlated Errors Model," Ph.D. dissertation, Department of Statistics, Texas A&M University. Hart, J.D., (1987) "ARMA Estimators of Probability Densities with Exponential or Regularly Varying Fourier Coefficients", ONR Technical Report No. 2. Holiday, D.B., Wehrly, T.E. and Hart, J.D., (1987) "Considerations for the Linear Estimation of a Regression Function when the Data are Correlated", ONR Technical Report No. 3. Holiday, D.B. and Hart, J.D., (1987) "Kernel Estimation of the Derivative of the Regression Function Using Repeated-Measurements Data", ONR Technical Report No. 4. King, E.C., (1988) "A Test for the Equality of Two Regression Curves Based on Kernel Smoothers," Ph.D. dissertation, Department of Statistics, Texas A&M University. Baek, Jangsun, (1991) "Kernel Estimation for Nonparametric Additive Models," Ph.D. dissertation. Technical Reports from the Department of Statistics Technical Report Series, Texas A&M University 9. Daren B. H. CLINE and Jeffrey D. HART, 1987, Kernel Estimation of Densities with Discontinuities or Discontinuous Derivatives. 23. James H. MATIS, Thomas E. WEHRLY and William C. ELLIS, 1988, Some Generalized Stochastic Compartment Models for Digesta Flow, to appear in Biometrics. 33. Phillipe VIEU and Jeffrey D. HART, 1988, Nonparametric Regression Under Dependence: A Class of Asymptotically Optimal Data-Driven Bandwidths. 34. Jeffrey D. HART and Philippe VIEU, 1988, Data Driven Bandwidth Choice for Density Estimation Based on Dependent Data. 35. Jeffrey D. HART, 1988, Kernel Smoothing When the Observations are Correlated. 36. Jeffrey D. HART, 1988, Differencing as an Approximate De-Trending Device. 39. R. L. EUBANK, J. D. HART and Paul SPECKMAN, 1989, Trigonometric Series Regression Estimators with an Application to Partially Linear Models.

57. Peter HALL and Jeffrey D. HART, 1989, Bootstrap Test for Difference Between Means in Nonparametric Regression. 72. Eileen KING, Jeffrey D. HART and Thomas E. WEHRLY, 1989, Testing the Equality of Two Regression Curves Using Linear Smoothers. 75. J. H. MATIS and T. E. WEHRLY, 1989, Generalized Stochastic Compartmental Models with Gamma Transit Times. 88. Peter HALL and Jeffrey D. HART, 1990, Nonparametric Regression with Long-Range Dependence. 89. Peter HALL and Jeffrey D. HART, 1990, Convergence Rates in Density Estimation for Data from Infinite-Order Moving Average Processes. 90. Peter HALL and Jeffrey D. HART, 1990, On the Probability of Error when Using a General Akaike-Type Criterion to Estimate Autoregression Order. 91. Jeffrey D. HART and Thomas E. WEHRLY, 1990, Kernel Regression Estimation When the Boundary Region is Large. 108. Peter HALL and Thomas E. WEHRLY, 1990, A Geometrical Method for Removing Edge Effects from Kernel-Type Nonparametric Regression Estimators. 110. Jeffrey D. HART and Thomas E. WEHRLY, 1990, Kernel Regression Estimation with Autocorrelated Errors. 120. R.L. CHAMBERS, A.H. DORFMAN and Thomas E. WEHRLY, 1990, Bias Robust Estimation in Finite Populations Using Nonparametric Calibration. 121. R.L. EUBANK and Jeffrey D. HART, 1990, Testing Goodness-of-Fit in Regression via Order Selection Criteria. 129. K.F. CARDWELL and T.E. WEHRLY, 1990, The Use of a Nonparametric Significance Test in Combating Crop Disease. Other Papers Tentatively Accepted or To Appear Hardle, W. and Hart, J. D. (1988) "A Bootstrap Test for Positive Definiteness of Income Effect Matrices," J. of Econornetrics, to appear. Hardle, W., Hart, J. D., Marron, J. S. and Tsybakov, A. B. (1988) "Bandwidth Choice for Average Derivative Estimation, " JASA, to appear.