Preponderantly increasing/decreasing data in regression analysis

Croatia Operatioal Research Review 269 CRORR 7(2016), 269 276 Prepoderatly icreasig/decreasig data i regressio aalysis Darija Marković 1, 1 Departmet of Mathematics, J. J. Strossmayer Uiversity of Osijek, Trg Ljudevita Gaja 6, 31000 Osijek, Croatia E-mail: darija@mathos.hr Abstract. For the give data (, x i, y i), i = 1,...,, ad the give model fuctio f(x; θ), where θ is a vector of ukow parameters, the goal of regressio aalysis is to obtai estimator θ of the ukow parameters θ such that the vector of residuals is miimized i some sese. The commo approach to this problem of miimizatio is the least-squares method, that is miimizig the L 2 orm of the vector of residuals. For oliear model fuctios, what is ecessary is fidig at least the sufficiet coditios o the data that will guaratee the existece of the best least-squares estimator. I this paper we will describe ad examie i detail the property of prepoderat icrease/decrease of the data, which esures the existece of the best estimator for certai importat oliear model fuctios. Keywords: regressio aalysis, oliear least squares, existece problem, prepoderat icrease/decrease property, Chebyshev iequality Received: October 3, 2016; accepted: December 11, 2016; available olie: December 30, 2016 DOI: 10.17535/crorr.2016.0018 1. Itroductio Regressio aalysis is oe of the most importat tools for data aalysis. For the give model fuctio f(x; θ), where θ is a vector of ukow parameters, the regressio model ca be writte i the form y i = f(x i ; θ) + ε i, i = 1,...,. Here x i deote the values of the idepedet variable, y i deotes the value of the depedet variable ad ε i are radom errors. The goal of regressio aalysis is to obtai the estimator θ of the ukow parameters θ base o the give experimetal or empirical data (, x i, y i ), where > 0 are i advace give data weights. Iversevariace weightig is typically used i statistical literature, but various weightig methods are used to calculate weights (see [12]). The errors ε i are usually assumed to be ormally distributed with mea zero ad costat variace. Correspodig author. http://www.hdoi.hr/crorr-joural c 2016 Croatia Operatioal Research Society

270 Darija Marković I the most geeral way, we ca divide regressio models ito two classes: liear ad oliear regressio models. Liear regressio models are those that are liear i all parameters ad are ofte used i applied scieces due to their simplicity. However such models are usually orealistic. Noliear regressio models usually appear whe there is some physical explaatio of the relatioship betwee the depedet ad the idepedet variable i a particular fuctioal form. Practical itroductios to oliear regressio icludig umerous data examples are give by Ratkowsky i [11] ad by Bates ad Watts i [1]. A more extesive treatmet of oliear regressio methodology is give by Seber ad Wild i [15]. The vector of ukow parameters θ i the regressio model should be estimated from the give data by miimizig a suitable goodess-of-fit expressio with respect to θ. The most popular criterio i applied scieces is based o miimizatio of the weighted sum of squared residuals, that is by miimizig the followig expressio S(θ) = (y i f(x; θ)) 2 = ε 2 i. This approach is kow as weighted least squares (LS) method. It is well kow that for a liear least squares problem a closed form solutio exists ad ca be easily obtaied. Numerical methods for solvig the oliear LS problem are described i [1, 11, 12, 15]. Before performig iterative miimizatio of the sum of squares, the questios remais as to whether the least squares estimate (LSE) exists. For the case of oliear LS problems, aswerig this questio is exceptioally difficult (see [1, 11, 12, 15]). Therefore, i order to guaratee the existece of a miimum of the fuctioal S (i.e. the existece of the LSE) it is ecessary to require that the data (, x i, y i ), i = 1,...,, satisfy some additioal coditios. It has bee show that property of prepoderat icrease/decrease, which will be described i more details i the ext sectio, has a importat role i aalyzig the existece of the LSE (see e.g. [2, 3, 5, 6, 7, 13, 14]). Our mai theoretical result is the useful Theorem 1 that describes icreasig/decreasig data by prepoderatly icreasig/decreasig data. Its applicatio i some oliear least squares existece problems is illustrated i Sectio 3. 2. Property of prepoderat icrease/decrease of the data Let us first recall some importat ad useful defiitios ad results. Throughout the etire paper we will suppose that we are give data (x i, y i ), i = 1,...,, such that x 1 x 2 x ad x 1 < x. (1) We will say that the data (x i, y i ) are icreasig if y 1 y 2 y ad y 1 < y. Similarly, we will say that the data are decreasig if y 1 y 2 y ad y 1 > y.

Prepoderatly icreasig/decreasig data i regressio aalysis 271 If y 1 = y 2 = = y we say that the data are costat or statioary. The property of prepoderat icrease/decrease ca be described by the Chebyshev iequality. This iequality is usually stated as follows (see [4, 10]): Let a 1 a 2 a ad b 1 b 2 b. The a i b i a i b i. The above Chebyshev iequality ca be exteded to iclude positive weights p i, p i p i a i b i i p i a i p i b i. (2) The Chebyshev iequality ca be reversed: If a 1 a 2 a ad b 1 b 2 b the p i p i a i b i p i a i p i b i. (3) If at least two of the a i ad at least two of the b i are distict the both iequalities (2) ad (3) are strict. For ay two fiite sequeces of real umbers a = (a 1,..., a ) ad b = (b 1,..., b ) ad ay strictly positive ad fiite sequece of real umbers p = (p 1,..., p ), the so-called Korkie s idetity holds (see [9]): p i p i a i b i p i a i p i b i = 1 2 j=1 p i p j (a i a j )(b i b j ). (4) This equality is easy to check directly by rewritig the right-had side of the equatio. Before statig the defiitio of prepoderatly icreasig/decreasig data, let us first recall that, for the give data (, x i, y i ), i = 1,...,, the correspodig weighted regressio lie is give by y = a x + b, where a = b = w i x i y i w ix i y i w i x 2 i ( w, ix i ) 2 y i a x i w. i Defiitio 1 (see [13]). The data (, x i, y i ), i = 1,...,, are said to have the prepoderat icrease (respectively decrease) property if the slope a of the associated weighted liear tred is positive (respectively egative). If the slope is equal to zero, the the data is said to be prepoderatly statioary.

272 Darija Marković The property of prepoderat icrease/decrease is also referred to as a essetial icrease/decrease property (see e.g. [2]). I geeral, this property depeds o both data (x i, y i ) ad weights. We will illustrate this i the ext simple example. Example 1. Let the data x = (2, 5, 8) ad y = (2, 5, 2) be give. We will make the followig choices of weight: w 1 = (1, 1, 1), w 2 = (1, 1, 1 2 ) ad w 3 = ( 1 2, 1, 1). For the first choice of weights, the slope of the regressio lie is equal to zero, hece the data are prepoderatly statioary. For the secod choice of weights, the slope of the regressio lie is equal to 1 7, ad the data have the property of prepoderat icrease. The last choice of weights for the slope gives the value of 1 7, ad hece i this case the data have the property of prepoderat decrease. y 6 y 6 y 6 4 4 4 2 2 2 2 4 6 8 10 (a) (w 1, x, y), y = 3 x 2 4 6 8 10 x (b) (w 2, x, y), y = 1 7 x + 16 5 2 4 6 8 10 x (c) (w 3, x, y), y = 1 7 x + 4 Figure 1: The data ad associated liear treds for differet choices of weights Give that the deomiator of a is strictly positive as x i satisfy (1), the data will be prepoderatly icreasig if ad oly if the followig coditio is satisfied Q(w, x, y) = x i y i x i y i > 0. By usig (4), that coditio ca be writte i the followig form: Q(w, x, y) = j=1 w j (x i x j )(y i y j ) > 0. (5) The above discussio is stated i the followig propositio (see also [13]). Propositio 1. The data (, x i, y i ), i = 1,...,, have the property of prepoderat icrease if ad oly if the Chebyshev iequality holds: x i y i > x i y i. Similarly, property of prepoderat decrease is equivalet to the opposite iequality. The followig results are direct cosequeces of Propositio 1 (see also [2]). Propositio 2. Effect of some trasformatio of the data.

Prepoderatly icreasig/decreasig data i regressio aalysis 273 (a) If the data (, x i, y i ), i = 1,...,, are prepoderatly icreasig (decreasig), the the data (, x i, y i ) are prepoderatly decreasig (icreasig). (b) If the data (, x i, y i ), i = 1,...,, are prepoderatly icreasig (decreasig), the the data (, x i, y i ) are prepoderatly decreasig (icreasig). (c) If the data (, x i, y i ), i = 1,...,, are prepoderatly icreasig (decreasig) ad 0 < x 1, the the data ( x i, x i, y i ), where x i = 1 x i, are prepoderatly decreasig (icreasig). (d) If the data (, x i, y i ), i = 1,...,, are prepoderatly icreasig (decreasig) ad 0 < y 1 ( 0 < y ), the the data ( y i, x i, ỹ i ), where ỹ i = 1 y i, are prepoderatly decreasig (icreasig). I regressio aalysis the most ofte case is whe values of idepedet variable are distict, i.e. x 1 < x 2 < < x. The followig theorem treats just that case. Theorem 1. Suppose we are give the o-costat data (x i, y i ), i = 1,...,, such that x 1 < x 2 < < x. The data are icreasig (or decreasig) if ad oly if the data (, x i, y i ) have the property of prepoderat icrease (or prepoderat decrease) for ay choice of weights > 0, i = 1,...,. Proof. We will prove the Theorem 1 oly for the case whe data are icreasig; the proof for the case whe data are decreasig would be doe i a similar way. Therefore suppose that the data (x i, y i ), i = 1,...,, are icreasig. The (x i x j )(y i y j ) 0. Sice the data are icreasig, strict iequality appears for at least oe pair (i, j). Due to this fact, for ay choice of weights > 0 iequality (5) holds, ad by defiitio, the data have the property of prepoderat icrease. Suppose that the data (, x i, y i ), i = 1,..., are prepoderatly icreasig for ay choice of weights > 0, i = 1,...,. Let i 1, i 2 {1,..., } such that i 1 < i 2. It is ecessary to show that y i1 y i2 ad y 1 < y. To do this, for each k N let us defie { 1 := k, i {1,..., } \ {i 1, i 2 } 1, i {i 1, i 2 }.

274 Darija Marković By assumptio, we have Q(w, x, y) = j=1 = 1 k 2 w j (x i x j )(y i y j ) i i 1,i 2 (x i x j )(y i y j ) j=1 j i 1,i 2 + 1 (x i1 x j )(y i1 y j ) k j=1 j i 2 + 1 (x i2 x j )(y i2 y j ) k j=1 j i 1 +2(x i2 x i1 )(y i2 y i1 ) > 0, k N, wherefrom by passig to the limit whe k we obtai (x i2 x i1 )(y i2 y i1 ) 0 Sice x i2 x i1 > 0, we have y i2 y i1 0. It remais to show that y 1 < y. Ideed, if y 1 = y, the y 1 = y 2 = = y, implyig that the slope of the weighted liear tred is equal to zero which cotradicts the assumptio. Remark 1. The followig result which was used i [3] for proof of the existece of the LSE ca ow be viewed as a cosequece of Theorem 1 for a special choice of weights: Suppose that the data (, x i, y i ), i = 1,...,, are such that 0 < x 1 < x 2 < < x ad > 0, i = 1,...,. The (i) If the sequece (y 1,..., y ) icreases, the x i y i with the iequality holdig for y 1 < y. (ii) If the sequece (y 1 /x 1,..., y /x ) decreases, the y i x i x 3 i x 2 i with the iequality holdig for y 1 /x 1 > y /x. y i x i 0, (6) y i x 2 i 0, (7) Coditio (6) meas that the data ( /x i, x i, y i ), i = 1,...,, have the property of prepoderat icrease, ad similarly, coditio (7) meas that the data ( x 2 i, x i, y i /x i ), i = 1,...,, have the property of prepoderat decrease.

Prepoderatly icreasig/decreasig data i regressio aalysis 275 3. Some applicatios of prepoderat icrease property i regressio aalysis I this sectio we will give a brief overview of a femportat oliear least square regressio models. It has bee show that for these models the property of prepoderat icrease/ decrease is crucial for the existece of the LSE. Example 2. The mathematical model described by a expoetial fuctio f(x; b, c) = be cx, b, c R or a liear combiatio of such fuctios is ofte used i applied research, e.g. biology, chemistry, electrical egieerig, ecoomy, astroomy, uclear physics, etc. I [7], it was show that the LSE exists, provided the data satisfy either the coditio of prepoderat icrease or the coditio of prepoderat decrease. Example 3. The geeralized logistic fuctio (or asymmetric S fuctio) f(x; b, c) = A, A, b, c, γ > 0 (1 + be cγx ) 1/γ occurs frequetly i various applied areas, such as biology, marketig, ecoomics, etc. The existece of the LSE for the geeralized logistic fuctio is cosidered i [8], where it was proved that the LSE exists if i additio to the atural coditio o the data, it is eough to require that the data are prepoderatly icreasig. Example 4. The Michaelis-Mete ezyme kietic model f(x; a, b) = ax, a, b > 0, b + x is widely used i biochemistry, pharmacology, biology ad medical research. The followig theorem gives sufficiet coditios for the existece of the LSE; the proof ca be foud i [3]. Theorem 2. Let the data (, x i, y i ), i = 1,..., m, m 3, be give, such that 0 < x 1 x 2... x m, x 1 < x m ad y i > 0, i = 1,..., m. If the data fulfill iequalities (6) ad (7), the the LS estimate for the Michaelis-Mete fuctio exists. 4. Coclusio The aim of this paper was to thoroughly ivestigate the property of prepoderat icrease/decrease of data. This property was foud very useful i solvig the existece problem for some importat oliear model fuctios. The theoretical improvemet of results give i [13] is stated i Theorem 1; that is the descriptio of the icreasig/decreasig data by prepoderatly icreasig/decreasig data. Ackowledgemet I am grateful to the aoymous reviewers ad the editor for their valuable suggestios, costructive commets ad their efforts i improvig this paper.

276 Darija Marković Refereces [1] Bates, D.M. ad Watts, D.G. (1988). Noliear Regressio Aalysis ad Its Applicatios. New York: Wiley. [2] Dubeau, F. ad Mir, Y. (2007). O discrete least squares polyomial fit, liear spaces ad data classificatio. J. Math. Stat., 3, 222-227. [3] Hadeler, K.P., Jukić, D. ad Sabo, K. (2007). Least squares problems for Michaelis Mete kietics. Math. Meth. Appl. Sci., 30, 1231-1241. [4] Hardy, G.H., Littlewood, J.E. ad Pòlya, G. (1988). Iequalities, 2d ed. Cambridge: Cambridge Uiversity Press. [5] Jukić, D., Sabo, K. ad Scitovski, R. (2007). Total least squares fittig Michaelis- Mete ezyme kietic model fuctio. J. Comput. Appl. Math., 201, 230-246. [6] Jukić, D. ad Scitovski, R. (1998). Existece results for special oliear total least squares problem. J. Math. Aal. Appl., 226, 348-363. [7] Jukić, D. ad Scitovski, R. (1997). Existece of optimal solutio for expoetial model by least squares. J. Comput. Appl. Math., 78, 317-328. [8] Jukić, D. ad Scitovski, R. (1996). The existece of optimal parameters of the geeralized logistic fuctio. Appl. Math. Comput., 77, 281-294. [9] Korkie, A.N. (1882). Ob odom opredeleom itegrale, (Russia). Math. Sborik, 10, 571-572. [10] Mitriović, D.S., Pečarić, J. ad Fik, A.M. (1993). Classical ad New Iequalities i Aalysis. Dordrecht: Kluwer. [11] Ratkowsky, D.A. (1989). Hadbook of Noliear Regressio Models. New York: Dekker. [12] Ross, G.J.S. (1990). Noliear Estimatio. New York: Spriger Verlag. [13] Scitovski, R. (1988). Coditio of prepoderat icrease ad Tchebycheffs iequality. I Jovaović, B. S. (Ed.). Proceedigs of VI. Coferece o Applied Mathematics (pp. 189-194). Belgrade. [14] Scitovski, R. (1988). Some special oliear least squares problems. Radovi matematički, 4, 279-298. [15] Seber, G.A.F. ad Wild, C.J. (1989). Noliear Regressio. New York: Wiley.