Preponderantly increasing/decreasing data in regression analysis

Similar documents
(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

Linear Regression Demystified

A Note on the Kolmogorov-Feller Weak Law of Large Numbers

Algebra of Least Squares

Riesz-Fischer Sequences and Lower Frame Bounds

Topic 9: Sampling Distributions of Estimators

PAijpam.eu ON DERIVATION OF RATIONAL SOLUTIONS OF BABBAGE S FUNCTIONAL EQUATION

Polynomials with Rational Roots that Differ by a Non-zero Constant. Generalities

A Fixed Point Result Using a Function of 5-Variables

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +

Bull. Korean Math. Soc. 36 (1999), No. 3, pp. 451{457 THE STRONG CONSISTENCY OF NONLINEAR REGRESSION QUANTILES ESTIMATORS Seung Hoe Choi and Hae Kyung

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.

The Method of Least Squares. To understand least squares fitting of data.

New Inequalities For Convex Sequences With Applications

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Comparison of Minimum Initial Capital with Investment and Non-investment Discrete Time Surplus Processes

A Note On The Exponential Of A Matrix Whose Elements Are All 1

REGRESSION (Physics 1210 Notes, Partial Modified Appendix A)

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Self-normalized deviation inequalities with application to t-statistic

On forward improvement iteration for stopping problems

7.1 Convergence of sequences of random variables

3. Z Transform. Recall that the Fourier transform (FT) of a DT signal xn [ ] is ( ) [ ] = In order for the FT to exist in the finite magnitude sense,

Diagonal approximations by martingales

Common Coupled Fixed Point of Mappings Satisfying Rational Inequalities in Ordered Complex Valued Generalized Metric Spaces

Lecture 19: Convergence

Lecture 23: Minimal sufficiency

Sequences. Notation. Convergence of a Sequence

Chapter 7 Isoperimetric problem

ECON 3150/4150, Spring term Lecture 3

Discrete-Time Systems, LTI Systems, and Discrete-Time Convolution

Chapter 6 Infinite Series

Oscillation and Property B for Third Order Difference Equations with Advanced Arguments

Topic 9: Sampling Distributions of Estimators

A New Solution Method for the Finite-Horizon Discrete-Time EOQ Problem

TR/46 OCTOBER THE ZEROS OF PARTIAL SUMS OF A MACLAURIN EXPANSION A. TALBOT

Approximating the ruin probability of finite-time surplus process with Adaptive Moving Total Exponential Least Square

Topic 9: Sampling Distributions of Estimators

Properties and Hypothesis Testing

Chapter 6 Principles of Data Reduction

The natural exponential function

Research Article Some E-J Generalized Hausdorff Matrices Not of Type M

Math 61CM - Solutions to homework 3

Concavity of weighted arithmetic means with applications

Mathematical Methods for Physics and Engineering

ON THE LEHMER CONSTANT OF FINITE CYCLIC GROUPS

Stochastic Matrices in a Finite Field

About the use of a result of Professor Alexandru Lupaş to obtain some properties in the theory of the number e 1

Correlation Regression

Recursive Algorithms. Recurrences. Recursive Algorithms Analysis

A statistical method to determine sample size to estimate characteristic value of soil parameters

Roger Apéry's proof that zeta(3) is irrational

Axioms of Measure Theory

Mi-Hwa Ko and Tae-Sung Kim

Lecture 3. Properties of Summary Statistics: Sampling Distribution

7.1 Convergence of sequences of random variables

A Hadamard-type lower bound for symmetric diagonally dominant positive matrices

CSE 1400 Applied Discrete Mathematics Number Theory and Proofs

Assignment 5: Solutions

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

Castiel, Supernatural, Season 6, Episode 18

Geometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT

Expectation and Variance of a random variable

First, note that the LS residuals are orthogonal to the regressors. X Xb X y = 0 ( normal equations ; (k 1) ) So,

Asymptotic distribution of products of sums of independent random variables

Central limit theorem and almost sure central limit theorem for the product of some partial sums

An Introduction to Randomized Algorithms

We are mainly going to be concerned with power series in x, such as. (x)} converges - that is, lims N n

5. Likelihood Ratio Tests

The value of Banach limits on a certain sequence of all rational numbers in the interval (0,1) Bao Qi Feng

Mathematical Modeling of Optimum 3 Step Stress Accelerated Life Testing for Generalized Pareto Distribution

Linear Regression Models

sin(n) + 2 cos(2n) n 3/2 3 sin(n) 2cos(2n) n 3/2 a n =

A Characterization of Compact Operators by Orthogonality

MATH 324 Summer 2006 Elementary Number Theory Solutions to Assignment 2 Due: Thursday July 27, 2006

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

SPECTRUM OF THE DIRECT SUM OF OPERATORS

Decoupling Zeros of Positive Discrete-Time Linear Systems*

Advanced Stochastic Processes.

A GRÜSS TYPE INEQUALITY FOR SEQUENCES OF VECTORS IN NORMED LINEAR SPACES AND APPLICATIONS

Statistical Inference Based on Extremum Estimators

Math 113 Exam 3 Practice

ON SOME INEQUALITIES IN NORMED LINEAR SPACES

Rates of Convergence by Moduli of Continuity

Recurrence Relations

Statistical Properties of OLS estimators

MAT1026 Calculus II Basic Convergence Tests for Series

MAS111 Convergence and Continuity

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

Precise Rates in Complete Moment Convergence for Negatively Associated Sequences

Chapter 6 Sampling Distributions

Slide Set 13 Linear Model with Endogenous Regressors and the GMM estimator

Review Problems 1. ICME and MS&E Refresher Course September 19, 2011 B = C = AB = A = A 2 = A 3... C 2 = C 3 = =

A collocation method for singular integral equations with cosecant kernel via Semi-trigonometric interpolation

Efficient GMM LECTURE 12 GMM II

Polynomial Functions and Their Graphs

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A.

Definitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients.

Estimation of the essential supremum of a regression function

Transcription:

Croatia Operatioal Research Review 269 CRORR 7(2016), 269 276 Prepoderatly icreasig/decreasig data i regressio aalysis Darija Marković 1, 1 Departmet of Mathematics, J. J. Strossmayer Uiversity of Osijek, Trg Ljudevita Gaja 6, 31000 Osijek, Croatia E-mail: darija@mathos.hr Abstract. For the give data (, x i, y i), i = 1,...,, ad the give model fuctio f(x; θ), where θ is a vector of ukow parameters, the goal of regressio aalysis is to obtai estimator θ of the ukow parameters θ such that the vector of residuals is miimized i some sese. The commo approach to this problem of miimizatio is the least-squares method, that is miimizig the L 2 orm of the vector of residuals. For oliear model fuctios, what is ecessary is fidig at least the sufficiet coditios o the data that will guaratee the existece of the best least-squares estimator. I this paper we will describe ad examie i detail the property of prepoderat icrease/decrease of the data, which esures the existece of the best estimator for certai importat oliear model fuctios. Keywords: regressio aalysis, oliear least squares, existece problem, prepoderat icrease/decrease property, Chebyshev iequality Received: October 3, 2016; accepted: December 11, 2016; available olie: December 30, 2016 DOI: 10.17535/crorr.2016.0018 1. Itroductio Regressio aalysis is oe of the most importat tools for data aalysis. For the give model fuctio f(x; θ), where θ is a vector of ukow parameters, the regressio model ca be writte i the form y i = f(x i ; θ) + ε i, i = 1,...,. Here x i deote the values of the idepedet variable, y i deotes the value of the depedet variable ad ε i are radom errors. The goal of regressio aalysis is to obtai the estimator θ of the ukow parameters θ base o the give experimetal or empirical data (, x i, y i ), where > 0 are i advace give data weights. Iversevariace weightig is typically used i statistical literature, but various weightig methods are used to calculate weights (see [12]). The errors ε i are usually assumed to be ormally distributed with mea zero ad costat variace. Correspodig author. http://www.hdoi.hr/crorr-joural c 2016 Croatia Operatioal Research Society

270 Darija Marković I the most geeral way, we ca divide regressio models ito two classes: liear ad oliear regressio models. Liear regressio models are those that are liear i all parameters ad are ofte used i applied scieces due to their simplicity. However such models are usually orealistic. Noliear regressio models usually appear whe there is some physical explaatio of the relatioship betwee the depedet ad the idepedet variable i a particular fuctioal form. Practical itroductios to oliear regressio icludig umerous data examples are give by Ratkowsky i [11] ad by Bates ad Watts i [1]. A more extesive treatmet of oliear regressio methodology is give by Seber ad Wild i [15]. The vector of ukow parameters θ i the regressio model should be estimated from the give data by miimizig a suitable goodess-of-fit expressio with respect to θ. The most popular criterio i applied scieces is based o miimizatio of the weighted sum of squared residuals, that is by miimizig the followig expressio S(θ) = (y i f(x; θ)) 2 = ε 2 i. This approach is kow as weighted least squares (LS) method. It is well kow that for a liear least squares problem a closed form solutio exists ad ca be easily obtaied. Numerical methods for solvig the oliear LS problem are described i [1, 11, 12, 15]. Before performig iterative miimizatio of the sum of squares, the questios remais as to whether the least squares estimate (LSE) exists. For the case of oliear LS problems, aswerig this questio is exceptioally difficult (see [1, 11, 12, 15]). Therefore, i order to guaratee the existece of a miimum of the fuctioal S (i.e. the existece of the LSE) it is ecessary to require that the data (, x i, y i ), i = 1,...,, satisfy some additioal coditios. It has bee show that property of prepoderat icrease/decrease, which will be described i more details i the ext sectio, has a importat role i aalyzig the existece of the LSE (see e.g. [2, 3, 5, 6, 7, 13, 14]). Our mai theoretical result is the useful Theorem 1 that describes icreasig/decreasig data by prepoderatly icreasig/decreasig data. Its applicatio i some oliear least squares existece problems is illustrated i Sectio 3. 2. Property of prepoderat icrease/decrease of the data Let us first recall some importat ad useful defiitios ad results. Throughout the etire paper we will suppose that we are give data (x i, y i ), i = 1,...,, such that x 1 x 2 x ad x 1 < x. (1) We will say that the data (x i, y i ) are icreasig if y 1 y 2 y ad y 1 < y. Similarly, we will say that the data are decreasig if y 1 y 2 y ad y 1 > y.

Prepoderatly icreasig/decreasig data i regressio aalysis 271 If y 1 = y 2 = = y we say that the data are costat or statioary. The property of prepoderat icrease/decrease ca be described by the Chebyshev iequality. This iequality is usually stated as follows (see [4, 10]): Let a 1 a 2 a ad b 1 b 2 b. The a i b i a i b i. The above Chebyshev iequality ca be exteded to iclude positive weights p i, p i p i a i b i i p i a i p i b i. (2) The Chebyshev iequality ca be reversed: If a 1 a 2 a ad b 1 b 2 b the p i p i a i b i p i a i p i b i. (3) If at least two of the a i ad at least two of the b i are distict the both iequalities (2) ad (3) are strict. For ay two fiite sequeces of real umbers a = (a 1,..., a ) ad b = (b 1,..., b ) ad ay strictly positive ad fiite sequece of real umbers p = (p 1,..., p ), the so-called Korkie s idetity holds (see [9]): p i p i a i b i p i a i p i b i = 1 2 j=1 p i p j (a i a j )(b i b j ). (4) This equality is easy to check directly by rewritig the right-had side of the equatio. Before statig the defiitio of prepoderatly icreasig/decreasig data, let us first recall that, for the give data (, x i, y i ), i = 1,...,, the correspodig weighted regressio lie is give by y = a x + b, where a = b = w i x i y i w ix i y i w i x 2 i ( w, ix i ) 2 y i a x i w. i Defiitio 1 (see [13]). The data (, x i, y i ), i = 1,...,, are said to have the prepoderat icrease (respectively decrease) property if the slope a of the associated weighted liear tred is positive (respectively egative). If the slope is equal to zero, the the data is said to be prepoderatly statioary.

272 Darija Marković The property of prepoderat icrease/decrease is also referred to as a essetial icrease/decrease property (see e.g. [2]). I geeral, this property depeds o both data (x i, y i ) ad weights. We will illustrate this i the ext simple example. Example 1. Let the data x = (2, 5, 8) ad y = (2, 5, 2) be give. We will make the followig choices of weight: w 1 = (1, 1, 1), w 2 = (1, 1, 1 2 ) ad w 3 = ( 1 2, 1, 1). For the first choice of weights, the slope of the regressio lie is equal to zero, hece the data are prepoderatly statioary. For the secod choice of weights, the slope of the regressio lie is equal to 1 7, ad the data have the property of prepoderat icrease. The last choice of weights for the slope gives the value of 1 7, ad hece i this case the data have the property of prepoderat decrease. y 6 y 6 y 6 4 4 4 2 2 2 2 4 6 8 10 (a) (w 1, x, y), y = 3 x 2 4 6 8 10 x (b) (w 2, x, y), y = 1 7 x + 16 5 2 4 6 8 10 x (c) (w 3, x, y), y = 1 7 x + 4 Figure 1: The data ad associated liear treds for differet choices of weights Give that the deomiator of a is strictly positive as x i satisfy (1), the data will be prepoderatly icreasig if ad oly if the followig coditio is satisfied Q(w, x, y) = x i y i x i y i > 0. By usig (4), that coditio ca be writte i the followig form: Q(w, x, y) = j=1 w j (x i x j )(y i y j ) > 0. (5) The above discussio is stated i the followig propositio (see also [13]). Propositio 1. The data (, x i, y i ), i = 1,...,, have the property of prepoderat icrease if ad oly if the Chebyshev iequality holds: x i y i > x i y i. Similarly, property of prepoderat decrease is equivalet to the opposite iequality. The followig results are direct cosequeces of Propositio 1 (see also [2]). Propositio 2. Effect of some trasformatio of the data.

Prepoderatly icreasig/decreasig data i regressio aalysis 273 (a) If the data (, x i, y i ), i = 1,...,, are prepoderatly icreasig (decreasig), the the data (, x i, y i ) are prepoderatly decreasig (icreasig). (b) If the data (, x i, y i ), i = 1,...,, are prepoderatly icreasig (decreasig), the the data (, x i, y i ) are prepoderatly decreasig (icreasig). (c) If the data (, x i, y i ), i = 1,...,, are prepoderatly icreasig (decreasig) ad 0 < x 1, the the data ( x i, x i, y i ), where x i = 1 x i, are prepoderatly decreasig (icreasig). (d) If the data (, x i, y i ), i = 1,...,, are prepoderatly icreasig (decreasig) ad 0 < y 1 ( 0 < y ), the the data ( y i, x i, ỹ i ), where ỹ i = 1 y i, are prepoderatly decreasig (icreasig). I regressio aalysis the most ofte case is whe values of idepedet variable are distict, i.e. x 1 < x 2 < < x. The followig theorem treats just that case. Theorem 1. Suppose we are give the o-costat data (x i, y i ), i = 1,...,, such that x 1 < x 2 < < x. The data are icreasig (or decreasig) if ad oly if the data (, x i, y i ) have the property of prepoderat icrease (or prepoderat decrease) for ay choice of weights > 0, i = 1,...,. Proof. We will prove the Theorem 1 oly for the case whe data are icreasig; the proof for the case whe data are decreasig would be doe i a similar way. Therefore suppose that the data (x i, y i ), i = 1,...,, are icreasig. The (x i x j )(y i y j ) 0. Sice the data are icreasig, strict iequality appears for at least oe pair (i, j). Due to this fact, for ay choice of weights > 0 iequality (5) holds, ad by defiitio, the data have the property of prepoderat icrease. Suppose that the data (, x i, y i ), i = 1,..., are prepoderatly icreasig for ay choice of weights > 0, i = 1,...,. Let i 1, i 2 {1,..., } such that i 1 < i 2. It is ecessary to show that y i1 y i2 ad y 1 < y. To do this, for each k N let us defie { 1 := k, i {1,..., } \ {i 1, i 2 } 1, i {i 1, i 2 }.

274 Darija Marković By assumptio, we have Q(w, x, y) = j=1 = 1 k 2 w j (x i x j )(y i y j ) i i 1,i 2 (x i x j )(y i y j ) j=1 j i 1,i 2 + 1 (x i1 x j )(y i1 y j ) k j=1 j i 2 + 1 (x i2 x j )(y i2 y j ) k j=1 j i 1 +2(x i2 x i1 )(y i2 y i1 ) > 0, k N, wherefrom by passig to the limit whe k we obtai (x i2 x i1 )(y i2 y i1 ) 0 Sice x i2 x i1 > 0, we have y i2 y i1 0. It remais to show that y 1 < y. Ideed, if y 1 = y, the y 1 = y 2 = = y, implyig that the slope of the weighted liear tred is equal to zero which cotradicts the assumptio. Remark 1. The followig result which was used i [3] for proof of the existece of the LSE ca ow be viewed as a cosequece of Theorem 1 for a special choice of weights: Suppose that the data (, x i, y i ), i = 1,...,, are such that 0 < x 1 < x 2 < < x ad > 0, i = 1,...,. The (i) If the sequece (y 1,..., y ) icreases, the x i y i with the iequality holdig for y 1 < y. (ii) If the sequece (y 1 /x 1,..., y /x ) decreases, the y i x i x 3 i x 2 i with the iequality holdig for y 1 /x 1 > y /x. y i x i 0, (6) y i x 2 i 0, (7) Coditio (6) meas that the data ( /x i, x i, y i ), i = 1,...,, have the property of prepoderat icrease, ad similarly, coditio (7) meas that the data ( x 2 i, x i, y i /x i ), i = 1,...,, have the property of prepoderat decrease.

Prepoderatly icreasig/decreasig data i regressio aalysis 275 3. Some applicatios of prepoderat icrease property i regressio aalysis I this sectio we will give a brief overview of a femportat oliear least square regressio models. It has bee show that for these models the property of prepoderat icrease/ decrease is crucial for the existece of the LSE. Example 2. The mathematical model described by a expoetial fuctio f(x; b, c) = be cx, b, c R or a liear combiatio of such fuctios is ofte used i applied research, e.g. biology, chemistry, electrical egieerig, ecoomy, astroomy, uclear physics, etc. I [7], it was show that the LSE exists, provided the data satisfy either the coditio of prepoderat icrease or the coditio of prepoderat decrease. Example 3. The geeralized logistic fuctio (or asymmetric S fuctio) f(x; b, c) = A, A, b, c, γ > 0 (1 + be cγx ) 1/γ occurs frequetly i various applied areas, such as biology, marketig, ecoomics, etc. The existece of the LSE for the geeralized logistic fuctio is cosidered i [8], where it was proved that the LSE exists if i additio to the atural coditio o the data, it is eough to require that the data are prepoderatly icreasig. Example 4. The Michaelis-Mete ezyme kietic model f(x; a, b) = ax, a, b > 0, b + x is widely used i biochemistry, pharmacology, biology ad medical research. The followig theorem gives sufficiet coditios for the existece of the LSE; the proof ca be foud i [3]. Theorem 2. Let the data (, x i, y i ), i = 1,..., m, m 3, be give, such that 0 < x 1 x 2... x m, x 1 < x m ad y i > 0, i = 1,..., m. If the data fulfill iequalities (6) ad (7), the the LS estimate for the Michaelis-Mete fuctio exists. 4. Coclusio The aim of this paper was to thoroughly ivestigate the property of prepoderat icrease/decrease of data. This property was foud very useful i solvig the existece problem for some importat oliear model fuctios. The theoretical improvemet of results give i [13] is stated i Theorem 1; that is the descriptio of the icreasig/decreasig data by prepoderatly icreasig/decreasig data. Ackowledgemet I am grateful to the aoymous reviewers ad the editor for their valuable suggestios, costructive commets ad their efforts i improvig this paper.

276 Darija Marković Refereces [1] Bates, D.M. ad Watts, D.G. (1988). Noliear Regressio Aalysis ad Its Applicatios. New York: Wiley. [2] Dubeau, F. ad Mir, Y. (2007). O discrete least squares polyomial fit, liear spaces ad data classificatio. J. Math. Stat., 3, 222-227. [3] Hadeler, K.P., Jukić, D. ad Sabo, K. (2007). Least squares problems for Michaelis Mete kietics. Math. Meth. Appl. Sci., 30, 1231-1241. [4] Hardy, G.H., Littlewood, J.E. ad Pòlya, G. (1988). Iequalities, 2d ed. Cambridge: Cambridge Uiversity Press. [5] Jukić, D., Sabo, K. ad Scitovski, R. (2007). Total least squares fittig Michaelis- Mete ezyme kietic model fuctio. J. Comput. Appl. Math., 201, 230-246. [6] Jukić, D. ad Scitovski, R. (1998). Existece results for special oliear total least squares problem. J. Math. Aal. Appl., 226, 348-363. [7] Jukić, D. ad Scitovski, R. (1997). Existece of optimal solutio for expoetial model by least squares. J. Comput. Appl. Math., 78, 317-328. [8] Jukić, D. ad Scitovski, R. (1996). The existece of optimal parameters of the geeralized logistic fuctio. Appl. Math. Comput., 77, 281-294. [9] Korkie, A.N. (1882). Ob odom opredeleom itegrale, (Russia). Math. Sborik, 10, 571-572. [10] Mitriović, D.S., Pečarić, J. ad Fik, A.M. (1993). Classical ad New Iequalities i Aalysis. Dordrecht: Kluwer. [11] Ratkowsky, D.A. (1989). Hadbook of Noliear Regressio Models. New York: Dekker. [12] Ross, G.J.S. (1990). Noliear Estimatio. New York: Spriger Verlag. [13] Scitovski, R. (1988). Coditio of prepoderat icrease ad Tchebycheffs iequality. I Jovaović, B. S. (Ed.). Proceedigs of VI. Coferece o Applied Mathematics (pp. 189-194). Belgrade. [14] Scitovski, R. (1988). Some special oliear least squares problems. Radovi matematički, 4, 279-298. [15] Seber, G.A.F. ad Wild, C.J. (1989). Noliear Regressio. New York: Wiley.