Peter Ochieg Taita Taveta University College P.O.Box Solomon Mwanjele Mwagha Taita Taveta University College P.O.Box Voi Kenya

Similar documents
DROUGHT ASSESSMENT USING SATELLITE DERIVED METEOROLOGICAL PARAMETERS AND NDVI IN POTOHAR REGION

OVERVIEW OF IMPROVED USE OF RS INDICATORS AT INAM. Domingos Mosquito Patricio

Chiang Rai Province CC Threat overview AAS1109 Mekong ARCC

Country Presentation-Nepal

El Nino 2015 in South Sudan: Impacts and Perspectives. Raul Cumba

NATIONAL HYDROPOWER ASSOCIATION MEETING. December 3, 2008 Birmingham Alabama. Roger McNeil Service Hydrologist NWS Birmingham Alabama

Interannual variation of MODIS NDVI in Lake Taihu and its relation to climate in submerged macrophyte region

DROUGHT INDICES BEING USED FOR THE GREATER HORN OF AFRICA (GHA)

1. INTRODUCTION 2. HIGHLIGHTS

WHEN IS IT EVER GOING TO RAIN? Table of Average Annual Rainfall and Rainfall For Selected Arizona Cities

YACT (Yet Another Climate Tool)? The SPI Explorer

El Nino 2015: The Story So Far and What To Expect Next

El Nino 2015: The Story So Far and What To Expect Next

Variability of Reference Evapotranspiration Across Nebraska

Geostatistical Analysis of Rainfall Temperature and Evaporation Data of Owerri for Ten Years

P7: Limiting Factors in Ecosystems

Application of remote sensing for agricultural disasters

Applications/Users for Improved S2S Forecasts

SOIL MOISTURE MODELING USING ARTIFICIAL NEURAL NETWORKS

Changing Hydrology under a Changing Climate for a Coastal Plain Watershed

Climate also has a large influence on how local ecosystems have evolved and how we interact with them.

Analysis of Rainfall and Other Weather Parameters under Climatic Variability of Parbhani ( )

I C P A C. IGAD Climate Prediction and Applications Centre Monthly Climate Bulletin, Climate Review for April 2018

Operational Practices in South African Weather Service (SAWS)

Communicating Climate Change Consequences for Land Use

Research Article Weather Forecasting Using Sliding Window Algorithm

I C P A C. IGAD Climate Prediction and Applications Centre Monthly Climate Bulletin, Climate Review for September 2017

GAMINGRE 8/1/ of 7

CHAPTER-11 CLIMATE AND RAINFALL

Climate Change Impact Assessment on Indian Water Resources. Ashvin Gosain, Sandhya Rao, Debajit Basu Ray

DROUGHT RISK EVALUATION USING REMOTE SENSING AND GIS : A CASE STUDY IN LOP BURI PROVINCE

Analysis of Historical Pattern of Rainfall in the Western Region of Bangladesh

Prairie Climate Centre Prairie Climate Atlas. Visualizing Climate Change Projections for the Canadian Prairie Provinces

REPORT ON SOWING PERIOD AND FLOODING RISKS IN DOUALA, CAMEROON. Cameroon Meteorological Department

Typical Hydrologic Period Report (Final)

Spatial Drought Assessment Using Remote Sensing and GIS techniques in Northwest region of Liaoning, China

Science Standard 1: Students analyze monthly precipitation and temperature records, displayed in bar charts, collected in metric units (mm).

Analyzing and Visualizing Precipitation and Soil Moisture in ArcGIS

Promoting Rainwater Harvesting in Caribbean Small Island Developing States Water Availability Mapping for Grenada Preliminary findings

International Journal of Research in Computer and Communication Technology, Vol 3, Issue 7, July

Monthly Prediction of the Vegetation Condition: A case for Ethiopia

UPPLEMENT A COMPARISON OF THE EARLY TWENTY-FIRST CENTURY DROUGHT IN THE UNITED STATES TO THE 1930S AND 1950S DROUGHT EPISODES

Drought History. for Southeast Oklahoma. Prepared by the South Central Climate Science Center in Norman, Oklahoma

2015 Fall Conditions Report

Use of satellite information in research and operational activities at NIMH of Bulgaria

Drought History. for the Northern Mountains of New Mexico. Prepared by the South Central Climate Science Center in Norman, Oklahoma

El Nino: Outlook VAM-WFP HQ September 2018

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

The 2010/11 drought in the Horn of Africa: Monitoring and forecasts using ECMWF products

2003 Water Year Wrap-Up and Look Ahead

Climatography of the United States No

UWM Field Station meteorological data

Minnesota s Climatic Conditions, Outlook, and Impacts on Agriculture. Today. 1. The weather and climate of 2017 to date

Drought History. for the Oklahoma Panhandle. Prepared by the South Central Climate Science Center in Norman, Oklahoma

MISSION DEBRIEFING: Teacher Guide

Regional Drought and Crop Yield Information System to enhance drought monitoring and forecasting in Lower Mekong region

A decision support system for predicting seasonal rainfall variations in sub-humid and semi-arid high country areas

Intraseasonal Characteristics of Rainfall for Eastern Africa Community (EAC) Hotspots: Onset and Cessation dates. In support of;

Jackson County 2013 Weather Data

The Huong River the nature, climate, hydro-meteorological issues and the AWCI demonstration project

Agrometeorological activities in RHMSS

Utilization of seasonal climate predictions for application fields Yonghee Shin/APEC Climate Center Busan, South Korea

The Vegetation Outlook (VegOut): A New Tool for Providing Outlooks of General Vegetation Conditions Using Data Mining Techniques

CATCHMENT DESCRIPTION. Little River Catchment Management Plan Stage I Report Climate 4.0

Using Reanalysis SST Data for Establishing Extreme Drought and Rainfall Predicting Schemes in the Southern Central Vietnam

MORPH forecast for carrot flies

Drought History. for the Edwards Plateau of Texas. Prepared by the South Central Climate Science Center in Norman, Oklahoma

Weather and climate outlooks for crop estimates

Climate Change and Arizona s Rangelands: Management Challenges and Opportunities

The Colorado Drought : 2003: A Growing Concern. Roger Pielke, Sr. Colorado Climate Center.

Workshop on Drought and Extreme Temperatures: Preparedness and Management for Sustainable Agriculture, Forestry and Fishery

The North American Drought Monitor - The Canadian Perspective -

I C P A C IGAD Climate Prediction & Applications centre

US Drought Status. Droughts 1/17/2013. Percent land area affected by Drought across US ( ) Dev Niyogi Associate Professor Dept of Agronomy

CURRENT METHODS AND PRODUCTS OF SEASONAL PREDICTION IN KENYA

Jackson County 2018 Weather Data 67 Years of Weather Data Recorded at the UF/IFAS Marianna North Florida Research and Education Center

To Predict Rain Fall in Desert Area of Rajasthan Using Data Mining Techniques

Final Report. COMET Partner's Project. University of Texas at San Antonio

September 2016 No. ICPAC/02/293 Bulletin Issue October 2016 Issue Number: ICPAC/02/294 IGAD Climate Prediction and Applications Centre Monthly Bulleti

Crop and pasture monitoring in Eritrea

Indian National (Weather) SATellites for Agrometeorological Applications

Funding provided by NOAA Sectoral Applications Research Project CLIMATE. Basic Climatology Colorado Climate Center

ICPAC. IGAD Climate Prediction and Applications Centre Monthly Bulletin, May 2017

Climate Change: bridging scientific knowledge and public policy. Forum Parliament House, Canberra 18 March 2010

Climatography of the United States No

Climatography of the United States No

Analysis of Data Mining Techniques for Weather Prediction

Jackson County 2014 Weather Data

Drought History. for the Low Rolling Plains of Texas. Prepared by the South Central Climate Science Center in Norman, Oklahoma

REMOTELY SENSED INFORMATION FOR CROP MONITORING AND FOOD SECURITY

El Nino: Outlook 2018

Climate Risk Management and Tailored Climate Forecasts

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

Transcription:

Comparison of Nearest Neighbor (ibk), Regression by Discretization and Isotonic Regression Classification Algorithms for Precipitation Classes Prediction Solomon Mwanjele Mwagha Taita Taveta University College P.O.Box 308-80300 Voi Kenya Masinde Muthoni Central University of Technology Bloemfontein South Africa Peter Ochieg Taita Taveta University College P.O.Box 635-80300 ABSTRACT Selection of classifier for use in prediction is a challenge. To select the best classifier comparisons can be made on various aspects of the classifiers. The key objective of this paper was to compare performance of nearest neighbor (ibk), regression by discretization and isotonic regression classifiers for predicting predefined precipitation classes over Voi, Kenya. We sought to train, test and evaluate the performance of nearest neighbor (ibk), regression by discretization and isotonic regression classification algorithms in predicting precipitation classes. A period of 1979 to 2008 daily Kenya Meteorological Department historical dataset on minimum/maximum temperatures and precipitations for Voi station was obtained. Knowledge discovery and data mining method was applied. A preprocessing module was designed to produce training and testing sets for use with classifiers. Isotonic Regression, K-nearest neighbours classifier, and RegressionByDiscretization classifiers were used for training training and testing of the data sets. The error of the predicted values, root relative squared error and the time taken to train/build each classifier model were computed. Each classifier predicted output classes 12 s in advance. Classifiers performances were compared in terms of error of the predicted values, root relative squared error and the time taken to train/build each classifier model. The predicted output classes were also compared to actual classes. Classifier performances to actual precipitation classes were compared. The study revealed that the nearest neighbor classifier is a suitable for training rainfall data for precipitation classes prediction. General Terms Clasification Algorithms, Data Mining, Knowledge Discovery Keywords Regression by discretization, isotonic regression, nearest neighbor(ibk), precipitation prediction, classification algorithms, classifier performance 1. INTRODUCTION There has been a lot of research aimed at precipitation predictions over selected regions where solutions were based on traditional, statistical and modern computational methods or a combination. Though these precipitation predictions were useful in overall region rainfall picture depiction, challenges exist for the prediction of quantity classes of precipitation for fixed range durations for instance weekly or ly quantities in a. Successful prediction of precipitation in fixed range durations can aid in selection of activities during rainy seasons such as cropping where different crops require different water requirements, or selection of a grazing land for nomadic pastoralists for a particular duration of time. Classification algorithms continue to play a big role in prediction of events based on historical data. In order to predict precipitation classes in advance algorithm performances must compared and the best one selected. 2. LITERATURE REVIEW A study on drought forecasting [2] analyzed rainfall frequencies using data from 248 rain gauges (1938-2005). SPI was determined using ANN feed feedfoward and back propagation algorithm. The findings showed that the result of ANN is suitable for drought forecast. Another study aimed at comparing ANN and ANFIS in precipitation prediction [2] realized ANN efficient in rain prediction. predicting agricultural drought [12] was done using 1880-2005 rain data to analyze agricultural drought. By applying fuzzy sets analysis on the condition of crops and valid rain history, result of fuzzy clustering obtained. Drought s extracted from fuzzy clustering results. Time series used to predict next drought. A study in [8] was aimed at translating seasonal forecast to agricultural terms using crop simulation model to translate seasonal forecast to agricultural terms. The results offered support to farmer s climate risk management. In China rainfall was predicted by direct determination of surface soil moisture using microwave observation [11] where data was acquired and analyzed over several test sites. The study was validated by conducted large field experiments. Agricultural drought was predicted in paddy fields using remotely sensed data [7] where NDVI was found to be reasonable in detecting agricultural drought. The study was limited by insufficient data as fuzzy was done in non cropping time. Metrological conditions causing drought were evaluated using the differentiate between precipitation & evapotranspiration to evaluate metrological conditions causing drought [4]. In USA historical patterns for drought were identified using VegOut Model that integrated Climate Ocean, satellite indicators[10]; regression trees were used to identify historical patterns for drought intensely and vegetation. SPI and PDSI were used to represent climate vulnerability. This study was evaluated using 2006 drought. Unlike previous studies this paper contributes on prior work by considering crop production history and weather data history together with classification algorithms to come up with precipitation classes. Our work provides future classes projections with a limit of twelve s in advance predictions. By borrowing from previous studies this research 44

emphasis is on comparison of classification algorithms in rainfall prediction in order to select the best. 3. METHODOLOGY A period of 1979 to 2008 daily KMD historical dataset on minimum/maximum temperatures and precipitations for Voi KMD station was obtained. Next the knowledge Discovery and Data mining (KDD) process steps were applied. A preprocessing module was designed to produce training and testing sets of files for use with the classifiers. Three classifiers (isotonic regression, k-nearest neighbours classifier, and regression by discretization) were used for training training and testing of the data sets. A knowledge flow was implemented for each of the three classifiers. The Waikato Environment for Knowledge Analysis (WEKA) and Java programming environment (JCreator and Net Beans) were used. 4. RESULTS A preprocessing module was designed to produce two sets of files for use with the Weka Knowledge flow each with five attributes namely:- Year, Month, Scaled precipitation values (range: 0 to 1), Precipitation class values, Index class (range: -2 to 2). Three classifiers (Isotonic Regression, K-nearest neighbours classifier, and RegressionByDiscretization) were considered for training training and testing of the data sets. The classifiers produced output classes with the following attributes:- Precipitation class values with removal filtered applied, Index class identified as the class variable. The output of each of the classifiers is as follows:- Isotonic Regression This algorithm learns an isotonic regression model to pick the attribute that result in the lowest squared error. It does not allow missing values and can only deal with numeric attributes. It considers the monotonically increasing case as well as the monotonically decreasing case. The running information on using this classifier is as follows:- Scheme: weka.classifiers.functions.isotonicregression Isotonic regression Based on attribute: prediction: -2 cut point: 0 prediction: cut point: 0.08 prediction: -1 cut point: 0.11 prediction: -0.5 cut point: 0.19 prediction: 0 cut point: 0.37 prediction: 0.5 cut point: 0.46 prediction: 1 cut point: 0.56 prediction: 1.5 cut point: 0.65 prediction: 2 Time taken to build model: 0.03 seconds Correlation coefficient 0.9993 Class complexity order 0 932.1235 bits 2.5057 bits/instance Class complexity scheme 24.4368 bits 0.0657 bits/instance Complexity improvement (Sf) 907.6867 bits 2.44 bits/instance Mean absolute error 0.0027 Root mean squared error 0.0367 Relative absolute error 0.328 % Root relative squared error 3.7127 % The error of the predicted values for numeric classes is 0.0027 The root relative squared error is 3.7127% K-nearest neighbor classifier K-nearest neighbour classifier can select appropriate value of K based on cross-validation. It can also do distance weighting using a simple distance measure to find the training instance closest to the given test instance, and predicts the same class as this training instance. If multiple instances are the same (smallest) distance to the test instance, the first one found is used. The running information for this classifier is as follows:- Scheme: weka.classifiers.lazy.ibk -K 1 -W 0 -X -A "weka.core.neighboursearch.linearnnsearch -A \"weka.core.euclideandistance -R 3,4\" -P" 45

IB1 instance-based classifier Using 1 nearest neighbour(s) for classification Time taken to build model: 0 seconds Correlation coefficient 0.9993 Class complexity order 0 936.637 bits 2.5178 bits/instance Class complexity scheme 22.9989 bits 0.0618 bits/instance Complexity improvement (Sf) 913.6381 bits 2.456 bits/instance Mean absolute error 0.0027 Root mean squared error 0.0367 Relative absolute error 0.3275 % Root relative squared error 3.7048 % The error of the predicted values for numeric classes is 0.0.0027 The root relative squared error is 3.7048% Regression By Discretization Regression by discretization is a scheme that employs any classifier on a copy of the data that has the class attribute (equal-width) discretized. The predicted value is the expected value of the mean class value for each discretized interval (based on the predicted probabilities for each interval). The base classifier used is J48 Class for generating a pruned or unpruned C4.5 decision trees. The output of this classifier is as follows:- Scheme: weka.classifiers.meta.regressionbydiscretization -B 10 -W weka.classifiers.trees.j48 -- -C 0.25 -M 2 Confidence factor for pruning is 0.25, to use binary splits and restrict the minimum number of instances in a leaf to 2 (grow the tree fully). The name of the relation contains in it the name of data file used to build it, and the names of filters that removes the fourth attribute Regression by discretization Class attribute discretized into 10 values Classifier spec: weka.classifiers.trees.j48 -C 0.25 -M 2 J48 pruned tree ------------------ <= 0.073766 <= 0: '(-inf--1.6]' (45.0) > 0: '(-1.6--1.2]' (177.0) > 0.073766 <= 0.186199 <= 0.107971: '(-1.2--0.8]' (19.0) > 0.107971: '(-0.8--0.4]' (40.0) > 0.186199 <= 0.364366: '(-0.4-0]' (47.0) > 0.364366 <= 0.555324 <= 0.447353: '(0.4-0.8]' (12.0) > 0.447353: '(0.8-1.2]' (12.0) > 0.555324 <= 0.650506: '(1.2-1.6]' (14.0) > 0.650506: '(1.6-inf)' (6.0) Number of Leaves: 9 Size of the tree: 17 Time taken to build model: 0.05 seconds Correlation coefficient 0.9974 Class complexity order 0 2.5178 bits/instance 936.637 bits Class complexity scheme 71.8388 bits 0.1931 bits/instance Complexity improvement (Sf) 864.7982 bits 2.3247 bits/instance Mean absolute error 0.0108 Root mean squared error 0.0733 Relative absolute error 1.3101 % Root relative squared error 7.4096 % The error of the predicted values for numeric classes is 0.0108 The root relative squared error is 7.4096% 46

predicted class class International Journal of Computer Applications (0975 8887) Evaluation Comparison of classifiers 2009 predictions to 2009 actual precipitation classes Month Actual 2009 2009 Class predicted by classifiers Ibk Isotonic Regression By 0 5 Step 10 Step Regression 5 Step 10 Step Discretization -0.5 5 Step 10 Step Jan -0.5 2-1 2-1 -1 2-1 Feb -1-0.5-0.5-2 Mar -0.5 0-0.5 0-0.5 0 Apr 0.5 0 2 0.5 2 0.5 2 May -2-2 -2 Jun Figure -1 55: ibk classifier predictions -1 for 2009 compared to Jul -2-2 actual Aug -2 Source: study -2 classifiers results analysis Sep -2-2 The classifier -2 results in prediction showed that the IBk (knearest Oct 0 0-0.5-2 neighbor) -0.5 classifier was -2 the best when applied in the weka knowledge flow model for training and prediction. Nov 0 0 1.5 0 2 0 2 Apart from generating the best results compared to other Dec 1.5 0.5 1.5 0.5 classifiers 2 on 0.5 performing 10 2 fold cross validation the IBk Error of the predicted values 0.0027 0.0027 classifier took 0.0108 the least amount of time to build/train the Root relative squared error 3.7048% 3.7127% model. The 7.4096% comparison of precipitation predictions for 2009 with the actual figures for 2009 results indicated that the Time taken to train/build model 0 seconds 0.03 seconds predicted and 0.05 actual seconds results were comparable. Comparison of output classes from classifiers Graph of classifiers 2009 predictions outputs compared to 2009 actual 2 1.5 1 0.5-1 -2 clasifiers comparison 0 1 2 3 4 5 6 7 8 9 10 11 12-0.5 Actual class Class predicted by classifiers Ibk 5 Step Class predicted by classifiers Ibk 10 Step Class predicted by classifiers Isotonic Regression 5 Step Class predicted by classifiers Isotonic Regression 10 Step Class predicted by classifiers Regression By Discretization 5 Step Class predicted by classifiers Regression By Discretization 10 Step Figure 49: graph comparing classifiers outputs on 5 and 10 sample step with actual In figure 55 below the predicted classes are compared to the actual computed for 2009. The predictions correspond to most of the ly actual classes. The predictions were based on IBk classifier with 10 step sampling. 2 1.5 1 0.5 Precdicted compared to actual classes Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 5. CONCLUSION The precipitation class prediction output results obtained showed that the nearest neighbor classifier is a suitable tool for training meteorological data for precipitation classes. As part of machine learning the IBk classifier results accomplished intelligence through the knowledge discovery and data mining process as aimed in the study major objective. Evaluation of our study results shows that the Ibk classifier had the least error of the predicted values (0.0027) and the least root relative squared error (3.7048%) hence can be used to predict precipitation in advance with greater accuracy compared to the other two classifiers. The recommendation for designing a solution that can cater for precipitation predictions in multiple regions is open for this study as future work. In Kenya for instance all districts can be represented and in precipitation classes predicted. Finally with adjustments of our prediction predictions socioeconomic measures on droughts anticipations can be suggested e.g. early warning allowing systems can be developed. 6. REFERENCES [1] Ashok, M. et al, 2006. Linking Seasonal Climate Forecasts with Crop Simulation to Optimize Maize Management, CCSP Workshop: Climate Science in Support of Decision Making, 14-16 November 2005 Crystal Gateway Marriott Arlington, Virginia 14-16 November 2005 [2] Dostrani, M. et al (2010). Application of ANN and ANFIS Models on Dryland Precipitation Prediction (Case Study: Yazd in Central Iran). Journal of Applied Sciences, 10: 2387-2394. [3] Gong, Z. et al, 2010. Risk Prediction of Agricultural Drought in China. 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2010).Kenya Meteorological Department, Agrometeorological bulletin, Issue No. 27/2009. Actual Predicted 47

[4] Kozyra, J. et al 2009. Institute Of Soil Science and Plant Cultivation National Research Institute, International Symposium, Climate change and Adaptation Options in Agriculture, Viena, June, 22-23 2009. [5] Ladislaus B. et al, 2010. Indigenous knowledge in seasonal rainfall prediction in Tanzania: A case of thesouth-western Highland of Tanzania, Journal of Geography and Regional Planning Vol. 3(4), pp. 66-72,April 2010. [6] Lin Zhu, Jing M. Chen, Qiming Qin, Mei Huang, Lianxi Wang, Jianping Li, Bao Cao: Assimilating Remote Sensing based Soil Moisture in an Ecosystem Model (BEPS) for Agricultural Drought Assessment.IGARSS (5) 2008: 437-440 [7] Niu Shulian; Susaki Junichi, 2006. Detection of Agricultural Drought in Paddy Fields Using NDVI from MODIS Data. A Case Study in Burirum Province, Thailand. [8] Patrick O, 2006. Agricultural Policy in Kenya: Issues and Processes, A paper for the Future Agricultures Consortium workshop, Institute of Development Studies, 20-22 March 2006. [9] Peter Reutemann, (2007). WEKA Knowledge Flow Tutorial for Version 3-5-7, University of Waikato 2007 [10] Tsegaye T. & Brian W. 2007. The Vegetation Outlook (VegOut): A New Tool for Providing Outlooks of General Vegetation Conditions Using Data Mining Techniques. ICDM Workshops 2007: 667-672 [11] Z. (Bob) Su, Y. Chen, M. Menenti, J. Sobrino, Z.-L. Li, W. Verhoef, L. Wang, Y. Ma, L. Wan, Y. He, Q.H. [12] Liu, C. Li, J. WEN, R. van der Velde, M. van Helvoirt, W. Lin, X. Shan, 2007. Drought Monitoring and Prediction over China, In: Proceedings of the 2008 Dragon symposium, Dragon programme, final results, 2004-2007, Beijing, China 21-25 April 2008. IJCA TM : www.ijcaonline.org 48