Mathematical Modelling of RMSE Approach on Agricultural Financial Data Sets

Available online at www.ijpab.com
Babu et al. Int. J. Pure App. Biosci. 5 (6): 942-947 (2017)
ISSN: 2320-7051
DOI: http://dx.doi.org/10.18782/2320-7051.5802

Research Article

S. K. Khadar Babu 1*, M. Sunitha 2 and M. V. Ramanaiah 3
1 Assistant Professor Senior, Department of Mathematics, SAS, VIT University, Vellore, TN, India
2 Department of Statistics, SPMVV, Tirupathi, A.P., India
3 Professor, Department of Statistics, Sri Venkateswara University, Tirupati, A.P., India
*Corresponding Author E-mail: khadarbabu.sk@vit.ac.in
Received: 28.09.2017 | Revised: 26.10.2017 | Accepted: 30.10.2017

ABSTRACT
In statistics, the root mean square error (RMSE) approach plays an important role in fitting models to agricultural data sets. This paper describes RMSE-based statistical analysis for heterogeneous data sets with standard parameters in agricultural applications. We first fit a model to crop financial data and give a complete statistical analysis through the RMSE approach. Finally, we argue that RMSE compares favourably with the coefficient of determination as a measure of fit.

Key words: Model fit, squared deviations, chi-square test.

INTRODUCTION
Agricultural science is a broad multidisciplinary field of biology that encompasses the parts of the exact, natural, economic and social sciences used in the practice and understanding of agriculture. Evaluating association and independence between two categorical factors is a classic topic in statistical inference. Pearson's celebrated goodness-of-fit test gave rise to the chi-square test used in the analysis of contingency tables. The chi-square test begins by hypothesizing that the distribution of a variable behaves in a particular manner. Usually, every data set is expected to follow a specific mathematical model.
Suppose that a variable has a frequency distribution with k categories into which the data have been grouped. The frequencies of occurrence of the variable in each category are called the observed values. The chi-square goodness-of-fit test works by determining how many cases there would be in each category if the sample data were distributed exactly according to the claim. These are termed the expected numbers of cases for each category. The total of the expected numbers of cases is always made equal to the total of the observed numbers of cases.

Linear regression is an approach for modelling the relationship between a scalar dependent variable and one or more explanatory variables. The case of one explanatory variable is called simple linear regression; with more than one explanatory variable it is called multiple linear regression.

Cite this article: Babu, S.K.K., Sunitha, M. and Ramanaiah, M.V., Mathematical Modelling of RMSE Approach on Agricultural Financial Data Sets, Int. J. Pure App. Biosci. 5(6): 942-947 (2017). doi: http://dx.doi.org/10.18782/2320-7051.5802

Copyright Nov.-Dec., 2017; IJPAB 942
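The goodness-of-fit calculation described in the introduction can be illustrated with a minimal Python sketch. The observed frequencies and the claimed proportions below are hypothetical and are not taken from the paper. The function names are likewise illustrative. The degrees of freedom are taken as k - 1, which assumes that no parameters of the claimed distribution were estimated from the data.

```python
def expected_counts(observed, claimed_proportions):
    """Expected number of cases per category under the claimed distribution.

    The expected counts are scaled so that their total equals the total
    of the observed counts.
    """
    total = sum(observed)
    weight = sum(claimed_proportions)
    return [total * p / weight for p in claimed_proportions]


def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical observed frequencies for k = 4 categories (not from the paper).
observed = [18, 22, 20, 40]
# Hypothetical claim: all four categories are equally likely.
claimed = [0.25, 0.25, 0.25, 0.25]

expected = expected_counts(observed, claimed)
statistic = chi_square_statistic(observed, expected)
# Assumes no parameters of the claimed distribution were estimated.
degrees_of_freedom = len(observed) - 1

print("expected counts:", expected)
print("chi-square statistic:", round(statistic, 2))
print("degrees of freedom:", degrees_of_freedom)
```

The statistic would then be compared with a chi-square critical value for the chosen significance level. That lookup is left out here to keep the sketch free of external libraries.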

The ordinary least squares method is used to estimate the unknown parameters in a linear regression model. In the regression model developed here, the response at the previous time period becomes the predictor. In this case, the order of an autoregression is the number of immediately preceding values in the series that are used to predict the value at the present time. This model is called the first-order autoregressive model, AR(1). The mathematical model therefore depends on past values of the yield variable.

The root mean squared error (RMSE) approach is vital in the analysis of homogeneous and heterogeneous data in science and engineering applications. In mathematical modelling, we should obtain the fluctuations using the standardized error approach. In error analysis we consider residual clustering techniques for model fitting in the mathematical sciences. This illustrates the complete analytical solution through the method of least squares.

In agricultural sciences especially, the coefficient of determination is the only method used in statistical analysis to obtain the percentage of variation in the data sets. With that measure alone, one can do no more than identify the percentage of fluctuation attributable to the explanatory variable in bulk data. The present article argues that the RMSE approach is comparatively good enough for agricultural and medical sciences. With it we can identify the error of the variables in any number of data sets, and we can easily predict the error for different explanatory variables. Statistical data can be analyzed easily using this approach.

REVIEW OF LITERATURE
The generalization of R squared was proposed by Cox and Snell (1989, 208-209) and, apparently independently, by Magee (1990), but had been suggested earlier for binary response models by Maddala (1983).

Pradip Saha et al., in "Tea quality prediction by autoregressive modelling of electronic tongue signals", used an autoregressive model to predict tea quality. An autoregressive model was developed based on the responses collected from electronic tongue (ET) sensors, and AR moving averages were also implemented. The responses obtained from the sensors are characterized for tea samples of different qualities. Coefficients of the autoregressive model are used as the characteristics of the ET response corresponding to the different samples of tea quality. The proposed method is useful for the prediction of tea quality.

Shota Tanaka et al., in "Footstep modelling and detection using locally stationary autoregressive model", developed a footstep-based surveillance system. Accurate footstep analysis would be a powerful tool in various applications.

Lawrence Lin and Lynn D. (1996) developed a new statistic, the Coefficient of Accuracy, Ca, for methods comparison. When an old measurement method is compared to a new measurement method, or if the same method is compared in two laboratories, the coefficient of determination, r^2, is typically used to measure the relationship. However, r^2 only measures the precision of the relationship. The newly developed statistic, Ca, measures the accuracy of the relationship. When these two statistics are combined, they form a single statistic for both accuracy and precision called the Concordance Correlation Coefficient.

Nagelkerke (1991) discusses a generalization of the coefficient of determination R squared to general regression models and proposes a modification of an earlier definition.

Durhan, J. L. and Stockburger, L. (1986) used the coefficient of determination in their work "Nitric acid-air diffusion coefficient experimental determination".

METHODS AND DISCUSSIONS
We define two variables: one explanatory and the other explained.
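One concrete instance of an explanatory/explained pair is the first-order autoregressive model mentioned in the introduction, where the previous value of the series is the explanatory variable and the current value is the explained variable. The sketch below estimates such a model by ordinary least squares. It is an illustration only: the series is hypothetical and noise-free, and the names `fit_ar1`, `c` and `phi` are assumptions, not notation from the paper.

```python
def fit_ar1(series):
    """Least-squares estimates (c, phi) for y_t = c + phi * y_{t-1} + e_t."""
    previous = series[:-1]  # explanatory variable: value at time t-1
    current = series[1:]    # explained variable: value at time t
    n = len(current)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    sxx = sum((p - mean_prev) ** 2 for p in previous)
    sxy = sum((p - mean_prev) * (q - mean_curr)
              for p, q in zip(previous, current))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


# Hypothetical series built from y_t = 2 + 0.5 * y_{t-1} with no noise,
# so the fit should recover c = 2 and phi = 0.5.
series = [10.0, 7.0, 5.5, 4.75, 4.375]

c, phi = fit_ar1(series)
# One-step-ahead forecast from the last observed value.
next_value = c + phi * series[-1]

print("c =", c, "phi =", phi)
print("one-step forecast:", next_value)
```

With real data the residuals would not be zero, and their size could be summarized by the RMSE in the same way as for any other regression.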
Let X and Y denote the explanatory and explained variables, respectively. The present article stresses the importance of the root mean square error approach. In model fitting, R squared plays a role, but it has limitations for analyzing the error, since it expresses it only in terms of percentage fluctuations. The RMSE directly gives the standard error involved in the data, and it is useful for inferring whether a model is the best fit.

The data for analysis were taken from the Central agricultural statistical bulletins. There are two variables: one is the year and the other is the subsidy given to farmers for crops. To these data we fit a model and illustrate the RMSE and R squared. Finally, we compare the values and give appropriate findings. Every year the Government of India spends crores of rupees on the purchase of crops through the subsidy mode for the upliftment of agriculture in India. For model fitting we take the consecutive years 2013, 2014, 2015, 2016 and 2017 and the corresponding subsidies in rupees given by the Government. The following diagrams show the appearance of the model in the data.

The fitted model for the data is Y = 7602.1X - 2E+07, where X is the year and Y the subsidy, and R squared for this fit was 0.0112. The exponential smoothing curve is Y = exp(0.0057X), with R squared 0.1221. With RMSE we obtain the exact error in the data sets and can draw conclusions for any specified problem. In the present example the R squared value with the exponential smoothing curve is 0.8357, which means that about 83% of the variation in the explained variable is accounted for by the explanatory variable. For the linear trend, the R squared value is 0.0112, which means that the linear trend accounts for only about 1% of the variation in the explained variable. The trend equation is given on the diagram. All useful diagrams are drawn to support the conclusions.

Fig. 1: Bar diagram for subsidies
Fig. 2: Trend line with R squared
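The contrast between R squared and RMSE for a fitted linear trend can be sketched as follows. The subsidy figures are hypothetical, since the transcription does not list the underlying data, and the function names are illustrative. The sketch fits a least-squares line, computes both measures, and then repeats the fit after re-expressing the subsidies in a unit 1000 times smaller.

```python
import math


def fit_linear_trend(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(observed, fitted):
    """Share of the variation in the observed values explained by the fit."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, fitted):
    """Root mean square error, in the same units as the observed values."""
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return math.sqrt(ss_res / len(observed))


# Hypothetical subsidy figures for illustration only (not the paper's data).
years = [2013, 2014, 2015, 2016, 2017]
subsidy = [10.0, 12.0, 11.0, 15.0, 14.0]

slope, intercept = fit_linear_trend(years, subsidy)
fitted = [slope * x + intercept for x in years]
r2 = r_squared(subsidy, fitted)
error = rmse(subsidy, fitted)

# The same subsidies expressed in a unit 1000 times smaller.
subsidy_scaled = [1000.0 * s for s in subsidy]
slope_s, intercept_s = fit_linear_trend(years, subsidy_scaled)
fitted_scaled = [slope_s * x + intercept_s for x in years]
r2_scaled = r_squared(subsidy_scaled, fitted_scaled)
error_scaled = rmse(subsidy_scaled, fitted_scaled)

print(f"R squared: {r2:.4f} (rescaled data: {r2_scaled:.4f})")
print(f"RMSE: {error:.4f} (rescaled data: {error_scaled:.4f})")
```

Under these assumptions R squared is unchanged by the change of unit, while RMSE scales with it. This is one way to read the article's point that RMSE reports the size of the error in the units of the data, whereas R squared reports only a proportion.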

Fig. 3: Exponential smoothing for subsidy in Rs.
Fig. 4: Bar diagram for subsidy
Fig. 5: Exponential smoothing
Fig. 6: Linear trend with R squared

Fig. 7: Bar diagram for residuals
Fig. 8: Residual linear trend line

The fitted regression line is Y = 51.208X, where X is the explanatory variable and Y the explained variable. The RMSE is

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2} = 18546.23,$$

where $Y_i$ is the observed value, $\hat{Y}_i$ the fitted value and $n$ the number of observations. The corresponding coefficient of determination is 0.0112, which gives the percentage of fluctuation explained (about 1.1%).

CONCLUSIONS AND FINDINGS
In the present article we estimate the values of R squared and RMSE for data taken from the agricultural statistics of the Government of India, using the formula for each. RMSE gives the exact error in the data, whereas R squared gives no error value: it gives a value between 0 and 1 which, once converted to a percentage, indicates only the percentage of fluctuation in the data. On comparison, RMSE is slightly superior to R squared. In agricultural sciences, practitioners are generally aware only of R squared. In engineering applications the exact error is needed, which gives an easy way to reduce it, and research in engineering is more broadly developed than in other areas. In agriculture, reducing the error is seldom considered, and only the percentage of error in the data sets is required.

REFERENCES
1. Tanaka, S., Mizuno, K. and Itai, A., Footstep modeling and detection using locally stationary autoregressive model, 10th International Conference on Information, Communications and Signal Processing (ICICS), Singapore. 1-5 (2015).
2. Durhan, J.L. and Stockburger, L., Nitric acid-air diffusion coefficient experimental determination, Atmospheric Environment, 20(3): 559-563 (1986).
3. Lawrence, L., and Lynn, D., Torbeck, Coefficient of Accuracy and Concordance Correlation Coefficient: New Statistics for Methods Comparison, PDA Journal of Pharmaceutical Science And Technology, 52(2): 55-59 (1996).
4. Nagelkerke, N.J.D., A note on a general definition of the coefficient of determination, Biometrika, 78(3): 691-692 (1991).
5. Cox, D.R., Regression models and Life-Tables, Journal of the Royal Statistical Society. Series B (Methodological), 34(2): 187-220 (1972).
6. Cox, D.R., Partial likelihood, Biometrika, 62(2): 269-276 (1975).
7. Maddala, G.S., Limited-dependent and qualitative variables in econometrics, Cambridge University Press, New York, 1983.