Statistical Modeling for Citrus Yield in Pakistan

European Journal of Scientific Research
ISSN 1450-216X Vol.31 No.1 (2009), pp. 52-58
EuroJournals Publishing, Inc. 2009
http://www.eurojournals.com/ejsr.htm

Atif Akbar (E-mail: atifakber@yahoo.co.uk)
G. R. Pasha (E-mail: drpasha@bzu.edu.pk)
Muhammad Aslam (E-mail: aslamasadi@bzu.edu.pk)
Syed Khurram Arslan Wasti (E-mail: arslan_bzu@hotmail.com)

Abstract

In this paper, nonlinear modeling of citrus yield is carried out. Different approaches are used to test the validity of the models. These include the standard error, coefficient of determination, bias, intrinsic nonlinearity and parameter effect nonlinearity. Data for all provinces of Pakistan are obtained from Agricultural Statistics of Pakistan (2004-05), and seven different nonlinear models are then applied. Among these models, the Farazdaghi and Harris model is found to be the best for modeling yield-density relationships.

Keywords: Asymptotic theory; Bias; Citrus yield; Intrinsic nonlinearity; Nonlinear models; Parameter effect nonlinearity.

1. Introduction

Citrus fruit is the first fruit crop of Pakistan in international trade in terms of value. There are two clearly differentiated markets in the citrus sector: the fresh citrus fruit market, with a predominance of oranges, and the processed citrus products market, mainly orange juice. According to the Food and Agriculture Organization (FAO, 1999), Pakistan is the sixth largest producer of kinow (mandarin) and oranges in the world, with 2.1 million tons of yield per annum. Its share in the world market for mandarin and oranges was 0.9 percent and 3.6 percent in terms of value and volume, respectively, during the year 1997. Pakistan is also the largest producer of the 'Citrus Reticula' variety (kinow); this unique variety of citrus is indigenous to this part of the world. According to one estimate, approximately 95 percent of the total kinow produced all over the world is grown in Pakistan.

In Punjab, the maximum production of citrus (orange) is in Sargodha, Rahim Yar Khan and Toba Tek Singh, respectively. In Sindh, the maximum production of citrus is in Khairpur, Noshehra Feroz and Jaccobabad, respectively. In the NWFP, Swat, Mardan and Dir Lower are considered to give the maximum citrus production, while in Balouchistan, Turbat, Nasirabad and Kachi are famous for citrus (Agricultural Statistics of Pakistan, 2004-05).

In order to model the citrus yield, we have to look for regression models. Applications of regression analysis are numerous and occur in almost every field, including engineering, life and biological sciences, physics, chemistry, economics, management and social sciences. Two types of system are used for such modeling, i.e., linear and nonlinear. In linear models the parameters enter linearly, whereas in nonlinear models at least one parameter enters the regression function nonlinearly. When linear models are estimated, the ordinary least squares (OLS) estimators satisfy the Gauss-Markov criteria, but in nonlinear systems these criteria cannot be met and the LS estimators tend to be biased, non-normally distributed, and to have variances exceeding the minimum possible variance. In such situations, asymptotic theory is used to address the problems of best linear unbiased (BLU) estimation. When the regression function is nonlinear in the parameters, both the theory and the practice of estimation are considerably more difficult.

To overcome the problems discussed above, Hartley (1961) suggested a methodology for nonlinear regression problems, i.e., a numerical technique for computing the least squares estimates as the solution of a nonlinear system. Many efforts have been made to quantify nonlinear behavior and to assess the adequacy of regression models and their estimation (see e.g., Beale, 1960; Guttman and Meeter, 1965).
Gillis and Ratkowsky (1978) studied yield-density models and the behavior of their parameter estimators under nonlinearity. Bates and Watts (1980) proposed another measure of nonlinearity based on the geometric concept of curvature. They found that nonlinearity has two components, named intrinsic nonlinearity (IN) and parameter effect (PE) nonlinearity. They critically evaluated the work of Beale (1960) and Box (1971) and showed that Beale's measures generally tend to underestimate the true nonlinearity, whereas the bias measure of Box is closely related to the parameter effect nonlinearity.

In the present paper, our objective is to model the citrus yield of Pakistan. In the available literature in Pakistan, no such effort has yet been made. For this purpose, we fit seven models available in the literature, namely the logistic model and the models given by Weibull (1951), Richards (1959), Bleasdale and Nelder (1960), Holiday (1960), Farazdaghi and Harris (1968), and Morgan et al. (1975). The model selection criteria include the standard error, coefficient of determination, bias, IN, and PE. In Section 2, we present the type of data used, its source, the seven models to be used and the test of significance for IN and PE. Section 3 is reserved for results and discussion, while Section 4 concludes the paper.

2. Materials and Methods

The citrus (orange) yield data used in this study, taken from Agricultural Statistics of Pakistan (2004-05), are in the form of production of citrus in different districts of Pakistan for the years 2000-01, 2001-02 and 2002-03. The data give the yield (in tonnes per hectare) of citrus in different areas of Pakistan. Areas were converted from hectares to acres by multiplying each district's area in hectares by 2.471 and were then scaled to 1000.
Production for each year, recorded in tonnes, was converted into maunds by multiplying each value by 25; it was then multiplied by the number of trees per acre and scaled to 1000. Following Malik (1999), the number of trees per acre is taken as 82.

Bleasdale and Nelder (1960), Holiday (1960) and Farazdaghi and Harris (1968) proposed the following models to describe the yield-density relationship of different crops. Following Box (1971) and Bates and Watts (1980), we use the same models in our study. The deterministic component of each model is shown below, assuming additive error.
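The unit conversions described at the start of this section can be sketched as follows. This is only an illustration under stated assumptions: "scaled to 1000" is read here as division by 1000, and the function and constant names are hypothetical rather than taken from the paper.

```python
# Conversion factors as stated in the text.
ACRES_PER_HECTARE = 2.471   # hectares -> acres
PRODUCTION_FACTOR = 25      # multiplier applied to production in tonnes
TREES_PER_ACRE = 82         # following Malik (1999)
SCALE = 1000                # assumption: "scaled to 1000" means dividing by 1000


def scaled_area(hectares: float) -> float:
    """Convert an area in hectares to acres, then scale by 1000."""
    return hectares * ACRES_PER_HECTARE / SCALE


def scaled_production(tonnes: float) -> float:
    """Multiply production by 25 and by trees per acre, then scale by 1000."""
    return tonnes * PRODUCTION_FACTOR * TREES_PER_ACRE / SCALE


if __name__ == "__main__":
    # Hypothetical district with 1000 hectares and 40 tonnes of production.
    print(scaled_area(1000.0), scaled_production(40.0))
```

Keeping the factors as named constants makes it easy to change the reading of the scaling step if the intended convention differs.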

$$Y = (\alpha + \beta x)^{-1/\gamma} \qquad (1)$$

$$Y = (\alpha + \beta x + \gamma x^{2})^{-1} \qquad (2)$$

$$Y = (\alpha + \beta x^{\gamma})^{-1} \qquad (3)$$

where equations (1), (2) and (3) represent the Bleasdale and Nelder (BN), Holiday, and Farazdaghi and Harris (FH) models, respectively. We also use the sigmoidal growth models, again following Box (1971) and Bates and Watts (1980). These models are the logistic model, the model proposed by Richards (1959), the Morgan-Mercer-Flodin (MMF) model of Morgan et al. (1975), and the Weibull (1951) type model. They are given below in equations (4), (5), (6) and (7), respectively.

$$Y = \frac{\alpha}{1 + \exp(\beta - \gamma x)} \qquad (4)$$

$$Y = \frac{\alpha}{[1 + \exp(\beta - \gamma x)]^{1/\delta}} \qquad (5)$$

$$Y = \frac{\beta\gamma + \alpha x^{\delta}}{\gamma + x^{\delta}} \qquad (6)$$

$$Y = \alpha - \beta \exp(-\gamma x^{\delta}) \qquad (7)$$

For estimation of the above nonlinear models, the Gauss-Newton method is adopted (see Ratkowsky, 1983, and Draper and Smith, 1998, for more details). Bates and Watts (1980) and Box (1971) selected suitable models on the basis of least standard error, high R², and insignificant IN, PE and bias. Following Box (1971) and Bates and Watts (1980), the IN and PE measures are computed and the bias is calculated for model selection. To test the significance of IN and PE, the relation $1/(2\sqrt{F})$ is used, where F is the critical value obtained from the F-distribution table. To test the significance of bias, the percentage bias is compared with 1%: a bias below 1% is regarded as insignificant and vice versa, as proposed by Box (1971).

3. Results

This section summarizes the results of the data analysis. The models described in Section 2, equations (1)-(7), are coded here as M1, M2, ..., M7, respectively. The results are given for the four provinces of Pakistan for the year 2000-2001, with the estimated parameters, R², standard error, bias calculation, and IN and PE measures. The standard errors of the estimates are given in parentheses.
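As an illustration of the estimation step named in Section 2, the following is a minimal Gauss-Newton sketch for the Holiday model, taken here as Y = 1/(α + βx + γx²) (our reading of equation (2), an assumption). It is not the authors' implementation: the function names, the synthetic noise-free data and the starting values are all hypothetical, and no step-size control is included, so convergence depends on a reasonable starting point.

```python
from typing import List, Sequence


def holiday_model(x: float, a: float, b: float, c: float) -> float:
    """Assumed form of equation (2): Y = 1 / (a + b*x + c*x^2)."""
    return 1.0 / (a + b * x + c * x * x)


def holiday_gradient(x: float, a: float, b: float, c: float) -> List[float]:
    """Partial derivatives of the model with respect to a, b and c."""
    g = a + b * x + c * x * x
    d = -1.0 / (g * g)
    return [d, d * x, d * x * x]


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = s / M[i][i]
    return z


def gauss_newton(xs: Sequence[float], ys: Sequence[float],
                 start: Sequence[float], iterations: int = 25) -> List[float]:
    """Plain Gauss-Newton: repeatedly solve (J'J) step = J'r and update."""
    theta = list(start)
    p = len(theta)
    for _ in range(iterations):
        JtJ = [[0.0] * p for _ in range(p)]
        Jtr = [0.0] * p
        for x, y in zip(xs, ys):
            r = y - holiday_model(x, *theta)       # residual
            grad = holiday_gradient(x, *theta)     # one row of the Jacobian
            for i in range(p):
                Jtr[i] += grad[i] * r
                for j in range(p):
                    JtJ[i][j] += grad[i] * grad[j]
        step = solve_linear(JtJ, Jtr)
        theta = [t + s for t, s in zip(theta, step)]
    return theta


if __name__ == "__main__":
    # Hypothetical noise-free data generated from known parameters.
    xs = [float(i) for i in range(1, 11)]
    true_theta = (0.5, 0.1, 0.01)
    ys = [holiday_model(x, *true_theta) for x in xs]
    print(gauss_newton(xs, ys, [0.55, 0.11, 0.011]))
```

In practice a damped variant (such as the modification proposed by Hartley, 1961) or a library optimizer would be a safer choice for noisy data.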

Table 3.1: Parameter Estimates for M1-M7 for Punjab

Model  α                   β                   γ                   δ                  R²      S.E.
M1     0.1443 (0.0123)     0.000008 (0.1040)   0.2853 (0.0411)     --                 0.8318  868.4612
M2     0.4070 (0.2010)     0.7927 (0.0100)     0.00008 (0.1210)    --                 0.4695  1541.5910
M3     0.0007 (0.0091)     0.1813 (0.0052)     -0.7709 (0.0012)    --                 0.9899  213.1423
M4     1294.63 (0.0910)    -3576.90 (0.3110)   4.5712 (0.2140)     --                 0.3200  2394.1390
M5     151.89 (0.1020)     0.7820 (0.2194)     0.6399 (0.1387)     0.6837 (0.0193)    0.3100  2469.3412
M6     678.28 (0.2110)     616.3521 (0.1104)   0.69.19 (0.3105)    -2458.8 (0.1143)   0.2600  2150.3911
M7     -0.6132 (0.0011)    38.9230 (0.0081)    1.8310 (0.0210)     1.0672 (0.0025)    0.9851  259.5001

From Table 3.1, we report that M3 (i.e., the FH model) has the least standard error (213.1423) among all the models and the highest R² (0.9899). M7 (i.e., the MMF model) also has a reasonable R² (0.9851), with standard error 259.5001.

Table 3.2: Parameter Estimates for M1-M7 for Sindh

Model  α                   β                   γ                   δ                  R²      S.E.
M1     0.8896 (0.081)      -0.0034 (0.1010)    3.6812 (0.1105)     --                 0.5290  63.5315
M2     0.8487 (0.0910)     0.9845 (0.249)      -0.0038 (0.2102)    --                 0.0992  87.8520
M3     0.0180 (0.0011)     0.1597 (0.0312)     -23.9021 (0.0451)   --                 0.8325  91.0500
M4     55.2220 (0.1030)    -16.8123 (0.0401)   31.9320 (0.1530)    --                 0.3010  92.5601
M5     29.6810 (0.1101)    0.9462 (0.1401)     -0.1169 (0.2105)    0.0424 (0.0925)    0.2350  117.0251
M6     44774.50 (0.2109)   22.0901 (0.0120)    -7.611 (0.0610)     0.0009 (0.0525)    0.5054  68.2806
M7     1.0511 (0.0126)     -12.0500 (0.0091)   1.0320 (0.0100)     0.8896 (0.0021)    0.8749  34.3324

From Table 3.2, it is clear that M7 (i.e., the MMF model) has the least standard error (34.33) among all the models and the highest R² (0.8749), whereas M3 (i.e., the FH model) has R² = 0.8325 with standard error 91.0500. Table 3.3 again votes in favour of M3, as this model has the minimum standard error of 6.4620 with the maximum R² of 0.9813; M7 also has a small standard error (6.5757) with R² = 0.9827. Moreover, M5 also provides a reasonable R² of 0.9825 with a standard error of 8.9124. Finally, once again, Table 3.4 reveals M3 (i.e., the FH model) to be the best, having the least standard error (2.0411) among all the models and the highest R² (0.9985). Moreover, M7 has a standard error of 2.946 with R² = 0.9972.
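The two fit criteria compared above can be computed from observed and fitted values as in the sketch below. The paper does not state its formulas, so the definitions used here are assumptions: R² = 1 - RSS/TSS and a residual standard error of sqrt(RSS/(n - p)), where p is the number of estimated parameters. The function name and the example numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_summary(observed: Sequence[float], fitted: Sequence[float],
                n_params: int) -> Tuple[float, float]:
    """Return (R squared, residual standard error) for a fitted model."""
    n = len(observed)
    mean = sum(observed) / n
    # Residual and total sums of squares.
    rss = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    tss = sum((y - mean) ** 2 for y in observed)
    r_squared = 1.0 - rss / tss
    std_error = math.sqrt(rss / (n - n_params))
    return r_squared, std_error


if __name__ == "__main__":
    # Hypothetical observed and fitted yields for a three-parameter model.
    obs = [1.0, 2.0, 3.0, 4.0, 5.0]
    fit = [1.1, 1.9, 3.0, 4.1, 4.9]
    print(fit_summary(obs, fit, n_params=3))
```

Under these definitions, a model with more parameters is penalized in the standard error through the smaller residual degrees of freedom, which is one reason to read the two criteria together.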

Table 3.3: Parameter Estimates for M1-M7 for NWFP

Model  α                      β                    γ                    δ                    R²      S.E.
M1     0.9624 (0.0451)        0.2371 (0.0504)      3.0544 (0.1198)      --                   0.6459  29.2694
M2     0.0505 (0.0110)        -0.00005 (0.0594)    0.000001 (0.0438)    --                   0.8847  16.6932
M3     0.0002 (0.0010)        0.7641 (0.0095)      -0.9519 (0.0247)     --                   0.9827  6.4620
M4     37.2310 (0.2104)       -13.5156 (0.1243)    21.8411 (0.0906)     --                   0.2650  1491.8000
M5     214.04 (0.0314)        -2.5611 (0.0192)     0.0165 (0.0017)      0.0222 (0.0927)      0.9686  8.9124
M6     31782.0752 (0.12001)   93.6750 (0.0933)     -5.8273 (0.1266)     -0.00011 (0.0217)    0.6018  31.7004
M7     1.073 (0.0102)         -0.6413 (0.0211)     2.1674 (0.0013)      0.9190 (0.0124)      0.9825  6.5757

Table 3.4: Parameter Estimates for M1-M7 for Balouchistan

Model  α                   β                    γ                     δ                    R²      S.E.
M1     1.0111 (0.0121)     0.3018 (0.0290)      3.0845 (0.0626)       --                   0.7846  24.2712
M2     0.0559 (0.1301)     0.0001 (0.0401)      0.0000007 (0.0917)    --                   0.7250  27.3700
M3     0.0022 (0.0121)     3.8041 (0.0164)      -1.2646 (0.0018)      --                   0.9985  2.0411
M4     30.92 (0.1818)      28.48 (0.2015)       24.8256 (0.2015)      --                   0.2001  52.2915
M5     18.11 (0.2100)      1.9502 (0.1123)      0.3554 (0.0980)       0.3882 (0.0734)      0.2190  56.6645
M6     25.25 (0.2414)      -11.6705 (0.0835)    8.5736 (0.0567)       -1.0711 (0.0.094)    0.1282  52.9656
M7     -0.6968 (0.0010)    -0.2787 (0.0093)     1.7145 (0.0208)       1.0342 (0.0110)      0.9972  2.9463

Table 3.5: Intrinsic Nonlinearity and Parameter Effect Nonlinearity Measures of M1-M7 for the Citrus Data of Punjab

Parameter  Measure  M1      M2      M3      M4       M5       M6       M7
α          IN       0.2100  0.0011  0.0012  0.6301   0.5800   0.0658   0.0023
α          PE       0.2500  0.2910  0.1120  1.0210   1.0051   1.2001   0.1680
β          IN       0.1202  0.0321  0.0193  0.7800   0.6840   0.3501   0.1120
β          PE       0.2601  0.2319  0.1532  8.4300   14.0211  5.4005   0.2001
γ          IN       0.2300  0.4200  0.0155  0.5600   0.9811   0.6800   0.1624
γ          PE       0.4501  0.5980  0.2364  18.2300  28.0565  15.2013  0.2210
δ          IN       ----    ----    ----    ----     0.2304   0.5000   0.1102
δ          PE       ----    ----    ----    ----     12.6507  6.5801   0.3213

Table 3.5 gives the IN and PE measures with the critical value $1/(2\sqrt{F}) = 0.2926$, where F(3, 32) = 4.48. From this table, we conclude that IN and PE are non-significant for models M3 and M7 for all the parameters except the PE of M7 for parameter δ. The rest of the models have significant IN and PE.
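The comparison of the IN and PE measures with the critical value can be sketched as below. The values are copied from the third and seventh model columns of Table 3.5 (labelled M3 and M7 here by column position, an assumption), the critical value 0.2926 is used exactly as reported, and a measure is treated as significant when it exceeds that value. The names in the code are hypothetical.

```python
from typing import Dict, List, Tuple

CRITICAL_VALUE = 0.2926  # critical value as reported with Table 3.5

# (IN, PE) per parameter, copied from Table 3.5 for Punjab.
MEASURES: Dict[str, Dict[str, Tuple[float, float]]] = {
    "M3": {
        "alpha": (0.0012, 0.1120),
        "beta": (0.0193, 0.1532),
        "gamma": (0.0155, 0.2364),
    },
    "M7": {
        "alpha": (0.0023, 0.1680),
        "beta": (0.1120, 0.2001),
        "gamma": (0.1624, 0.2210),
        "delta": (0.1102, 0.3213),
    },
}


def significant_measures(measures: Dict[str, Tuple[float, float]],
                         critical: float = CRITICAL_VALUE) -> List[Tuple[str, str]]:
    """List the (parameter, measure) pairs that exceed the critical value."""
    flagged = []
    for param, (intrinsic, param_effect) in measures.items():
        if intrinsic > critical:
            flagged.append((param, "IN"))
        if param_effect > critical:
            flagged.append((param, "PE"))
    return flagged


if __name__ == "__main__":
    for model, values in MEASURES.items():
        print(model, significant_measures(values))
```

With these inputs the only flagged entry is the PE of the fourth parameter of M7, which agrees with the statement in the text.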

Table 3.6: Bias Calculation for M1-M7 for Punjab

Parameter  M1      M2       M3       M4       M5        M6       M7
α          0.1269  0.4500   0.0021   10.0204  -16.5903  9.6524   0.0126
β          0.3580  -0.5800  0.0678   15.2632  -20.3203  9.3655   0.2353
γ          0.0126  -1.2800  -0.3240  11.4602  -25.0694  3.4927   -0.6800
δ          ----    ----     ----     ----     -6.0654   -4.8907  0.0490

Table 3.6 shows that M3 and M7 have non-significant bias, while the rest of the models have significant bias in their parameters. This shows that these two models, i.e., the FH model and the MMF model, behave close to linearly and hence are more plausible choices for the citrus yield data. Similar results hold for the yield data of the other provinces of Pakistan for IN, PE and bias, so we do not include them in this discussion.

4. Conclusion

In this section we summarize the results and draw the appropriate conclusions. To select a model under nonlinearity, we use several criteria: R², the standard error of the residuals, bias, intrinsic nonlinearity (IN) and parameter effect nonlinearity (PE). The comparison is made among seven different nonlinear models (M1, M2, ..., M7), i.e., yield-density models and sigmoidal growth models. After analysing the data for different districts of the provinces of Pakistan, we have reasonable evidence in favour of M3 (the Farazdaghi and Harris model). Among all the nonlinear models discussed above, this model has the least standard error, a high R², and insignificant IN, PE and bias for the data sets of the four provinces of Pakistan, except for the province of Sindh, where it has a relatively larger standard error. Another potential candidate for selection is M7 (i.e., the Morgan-Mercer-Flodin model). In general, two models, namely the FH model and the MMF model, fulfill the above-mentioned selection criteria, and between these two we have selected the FH model because of its better behavior. Moreover, the FH model behaves like a linear model, as the IN and PE calculations presented in Table 3.5 indicate.
The insignificant IN, PE, and bias of the parameters also support the conclusion drawn above.

References

[1] Agricultural Statistics of Pakistan (2004-05). Govt. of Pakistan (www.pakistan.gov.pk).
[2] Bates, D. M. and Watts, D. G. (1980). Relative Curvature Measure of Nonlinearity. J. R. Stat. Soc. Ser. B, Vol.42, pp. 1-25.
[3] Beale, E. M. L. (1960). Confidence Region in Non-Linear Estimation. J. R. Stat. Soc. Ser. B, Vol.22, pp. 41-76.
[4] Bleasdale, J. K. A. and Nelder, J. A. (1960). Plant Population and Crop Yield. Nature 188, 342.
[5] Box, M. J. (1971). Bias in Nonlinear Estimation. J. R. Stat. Soc. Ser. B, Vol.33, No.2, pp. 171-201.
[6] Farazdaghi, H. and Harris, P. M. (1968). Plant Competition and Crop Yield. Nature 217, 289-290.
[7] Food and Agriculture Organization (1999). Govt. of Pakistan (http://www.un.org.pk/fao).
[8] Gillis, P. R. and Ratkowsky, D. A. (1978). The Behaviour of the Estimators of the Parameters of Various Yield-density Relationship. Biometrics, 34, 191-198.
[9] Guttman, I. and Meeter, D. A. (1965). On Beale's Measures of Non-Linearity. Technometrics, Vol.7, pp. 623-637.
[10] Holiday, R. (1960). Plant Population and Crop Yield. Field Crop Abstr., 13, 159-167, 247-254.
[11] Hartley, H. O. (1961). The Modification of Gauss-Newton Method for the Fitting of Nonlinear Regression Function by Least Squares. Technometrics, Vol.3, pp. 269-280.
[12] Malik, N. M. (1999). Horticulture. National Book Foundation, Islamabad.
[13] Morgan, P. H., Mercer, L. P., and Flodin, N. W. (1975). General Model for Nutritional Responses of Higher Organisms. Proc. Nat. Acad. Sci. U. S. A. 72, 4327-4331.
[14] Ratkowsky, D. A. (1983). Nonlinear Regression Modeling. Marcel Dekker.
[15] Richards, F. J. (1959). A Flexible Growth Function for Empirical Use. J. Exp. Bot. 10, 290-300.
[16] Weibull, W. (1951). A Statistical Distribution Function of Wide Applicability. J. Appl. Mech. 18, 293-296.