International Journal of Research in Computer and Communication Technology, Vol 3, Issue 7, July


Hybrid SVM Data Mining Techniques for Weather Data Analysis of Krishna District of Andhra Region

N. Rajasekhar (1), Dr. T. V. Rajini Kanth (2)
(1) Assistant Professor, Department of Computer Science & Engineering, VNRVJIET, Hyderabad
(2) Professor, Department of Computer Science & Engineering, SNIST, Hyderabad

ABSTRACT: Weather prediction is the application of science and technology to estimate the state of the atmosphere at a particular spatial location. The availability of large volumes of data has drawn researchers to weather analysis and forecasting. Accurate prediction helps safeguard human life and property. Forecasting techniques are useful for weather prediction, crop yield planning, traffic management, marine navigation, forestry and defense. Data mining techniques have been found to perform better than traditional statistical methods. In this paper we propose a new hybrid SVM (support vector machine) model for weather prediction that analyzes large weather data sets and finds the patterns present in them. SVM is a supervised learning method used for classification and regression. The hybrid model gave better results in weather prediction than other data mining techniques. As a case study, the weather data sets of Krishna district were analyzed using the proposed hybrid SVM technique.

Keywords - Weather Prediction, Spatial Location, Hybrid Support Vector Machine, Machine Learning Algorithms, Data Mining Techniques.

1. INTRODUCTION

Analysis of weather data is used to find atmospheric patterns, in addition to determining the state of the weather at a specific location over a desired period. Krishna district [Ref.1] is a district in the Coastal Andhra region of Andhra Pradesh, India.
It is named after the Krishna River, the third longest river that flows within India, which passes through the district and joins the Bay of Bengal near Hamsaladevi village. Machilipatnam is the administrative headquarters of the district. Vijayawada is the biggest city in the district and also the commercial center of the state. According to the 2011 Census, the district has a population of 4,529,009, of which 41.00% is urban, and a literacy rate of 74.37%. The district is bounded to the north-west and west by Telangana, to the north-east by West Godavari District, to the south-east by the Bay of Bengal, and to the south-west by Guntur District.

All the numerical data about the previous and current status of the rainfall is collected for the estimation of weather. Studying and analyzing weather data sets is useful for effective prediction. Present weather prediction models estimate the weather from atmospheric conditions. Expert systems offer no automated method for selecting the best predictive model for a forecast. New mathematical models are therefore needed to handle the large time and computational complexity that arises from the uncertainty of weather conditions. Such models can improve the accuracy of weather prediction. Effective weather forecasting has many uses, such as giving the public early information about natural disasters so that lives are saved and property damage is minimized. Predictions also help farmers and business people to plan crops and yields. Weather prediction is classified by time horizon into four types: short-range forecasts (up to 48 hours), extended forecasts (3-5 days), medium-range forecasts (3 to 7 days) and long-range forecasts (more than 7 days) [1]. We consider the rainfall data of three districts of the Andhra region. In this paper, the weather data of Krishna district is studied and analyzed for effective weather prediction.

www.ijrcct.org Page 743

2. LITERATURE SURVEY

2.1 K-Means clustering algorithm:

The K-means clustering algorithm [6, 7, 8] is one of the most widely used clustering algorithms in real-world applications. It has been applied successfully in domains such as relational databases, gene expression data and decision support systems. The algorithm has three drawbacks: the initial centroid locations are chosen at random, all variables are treated as numbers, and the number of clusters K is not known in advance. The first drawback is addressed by multiple runs or by particular initialization methods, although these initialization methods are not necessarily better than random centroids. The second drawback is addressed by handling categorical parameters with a matching dissimilarity measure. As for the third drawback, the number of clusters is an input parameter that is fixed beforehand in the regular K-means algorithm; this problem can be addressed by using cluster validity indices. Like other data mining algorithms, K-means becomes less reliable on high-dimensional data. Such data sets are nearly always too sparse, and Euclidean distance becomes meaningless in high-dimensional sparse spaces. Combining K-means with feature extraction methods such as Principal Component Analysis (PCA) and Self Organizing Maps (SOM) is a possible solution.

2.2 Support vector machine:

A support vector machine (SVM) [2, 5] is a supervised learning method that generates a decision-making system to predict new values.
Given a set of training samples, the SVM algorithm builds a model that predicts the category to which a new sample belongs. The model represents the samples as points in space and separates the categories by a clear gap that is as wide as possible. New examples are then mapped into the same space and assigned to a category according to the side of the gap on which they fall. An SVM constructs a hyperplane, or a set of hyperplanes, in a high-dimensional space that can be used for classification or regression. A good separation is achieved by the hyperplane that has the largest distance (margin) to the nearest training data points of any class. In general, the larger the margin, the smaller the generalization error of the classifier.

The technique can be described as a combination of two steps:
Learning: the SVM is trained on a set of examples.
Prediction: the trained model is applied to new samples whose results are not known.

Every example is represented as an (input, output) pair, in which the input is the data and the output denotes how it is categorized. Mathematically, every example is a pair (x, y), where x is a vector of real numbers and y is either a Boolean value (e.g. y/n, 1/-1, T/F) or a real number. The first case is treated as a classification problem and the second as a regression problem. In the regression case the (x, y) pairs are treated as points of a real function, and the SVM generates the function that best fits this series of points. The training set is the group of pairs used to build the SVM, whereas the test set is the group of pairs used for prediction. In this work, the data set given to the SVM is derived by clustering the original data set.

3. PROPOSED APPROACH

The real-world data sets that were collected are in raw format.
In the first step they are converted into a format suitable for experiments [3] by preprocessing techniques such as noise removal. K-means clustering is then applied to the data sets, and after that the SVM classification technique is applied to the clustered data set. This hybrid SVM technique [4] is suitable for effective analysis of weather data sets.

3.1 Implementation of the Proposed Approach:

K-means clustering was applied to the data set [3] of monthly mean average temperature for each of the 102 years 1901-2002, and the clustering model was built on the full training set. There are 13 attributes: year, Jan, Feb, Mar, April, May, June, July, August, Sep, Oct, Nov and Dec. Euclidean distance was used to form the clusters. The number of iterations was 6 and the within-cluster sum of squared errors was 33.373323263149274. The time taken to build the model (full training data) was 0.03 seconds. Missing values were globally replaced with the mean/mode. Table 1 gives the centroids of the 5 clusters. The total number of instances is 102: Cluster 0 has 33 (32%), Cluster 1 has 15 (15%), Cluster 2 has 24 (24%), Cluster 3 has 13 (13%) and Cluster 4 has 17 (17%). Fig.1 shows the clusters histogram. Fig.2 shows the cluster graph of monthly mean average temperature for each year, with instance number along the x-axis and year along the y-axis.

SVM classification was done with 14 attributes: year, the 12 months and the cluster label. The total number of instances is 102, and the classifier was evaluated on the full training set. The classifier used was the LibSVM wrapper (WLSVM; original code by Yasser EL-Manzalawy). The time taken to build the model was 0.14 seconds, and testing the model on the training data set took 0.02 seconds.

Agreement between the classifications and the true classes is measured by the Kappa statistic, which is a chance-corrected measure. It is computed by subtracting the agreement expected by chance from the observed agreement and dividing the result by the maximum possible agreement beyond chance. A value greater than 0 means that the classifier is doing better than chance. The Kappa statistic obtained here is 0.8828, which is close to 1 and signifies almost complete agreement between the predictions and the true classes. Of the 102 instances, 92 were classified correctly and 10 incorrectly (about 90.2% accuracy), so the classifier reproduces the cluster labels well, although not perfectly.
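The clustering stage described at the start of Section 3.1 can be illustrated with a minimal sketch. It is not the tool used in the paper; it is a plain K-means loop with Euclidean distance and a within-cluster sum of squared errors, written with the Python standard library only. The function name `kmeans`, the fixed random seed and the six two-dimensional sample points (loosely shaped like January and May temperatures) are assumptions made for illustration and are not taken from the data set.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means: random initial centroids, then assign/update until stable.

    Returns (centroids, labels, sse) where sse is the within-cluster
    sum of squared Euclidean distances.
    """
    rng = random.Random(seed)            # fixed seed so the sketch is repeatable
    centroids = rng.sample(points, k)    # random initial centroids (a known weakness)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:         # no change means convergence
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, sse


# Hypothetical (Jan, May) temperature pairs, for illustration only.
points = [
    (22.8, 31.9), (23.0, 32.0), (22.9, 32.1),
    (24.1, 32.7), (24.2, 32.6), (24.0, 32.8),
]
centroids, labels, sse = kmeans(points, k=2)
print(labels, round(sse, 3))
```

In the hybrid scheme, the labels produced by this step would be appended to each record as the class attribute on which the SVM is then trained.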
The error measures are small: most of them are near zero, the exception being the root relative squared error. Fig.3 shows the SVM classifier graph of monthly mean average temperature for each year. In this graph the clusters are taken along the x-axis and the predicted clusters along the y-axis. The annual mean values of minimum, average and maximum temperature for all the years 1901-2002 are shown in Fig.4 along with their polynomial trend line equations. In this graph the months are taken along the x-axis and temperature along the y-axis.

The equation for Mean Minimum Temperature is given by

$$y = -0.0019x^{5} + 0.0631x^{4} - 0.7665x^{3} + 3.8138x^{2} - 5.4917x + 20.378 \tag{1}$$

The equation for Mean Average Temperature is given by

$$y = -0.0024x^{5} + 0.0787x^{4} - 0.9231x^{3} + 4.3635x^{2} - 5.9399x + 25.725 \tag{2}$$

The equation for Mean Maximum Temperature is given by

$$y = -0.0029x^{5} + 0.0943x^{4} - 1.0808x^{3} + 4.9155x^{2} - 6.3828x + 31.093 \tag{3}$$

The regression [9] equation for the annual mean values is given by

$$\text{Average temperature} = 0.0442\,(\text{Cloud cover}) - 1.2564\,(\text{Diurnal temperature range}) - 9.431\,(\text{Ground frost frequency}) + 0.0208\,(\text{Precipitation}) + 0.0824\,(\text{Vapour pressure}) - 0.7168\,(\text{Wet day frequency}) + 3.9871\,(\text{Potential evapotranspiration}) + 13.3485 \tag{4}$$

4. FIGURES AND TABLES OF CLUSTER CENTROIDS

Attribute   Full Data   Cluster 0   Cluster 1   Cluster 2   Cluster 3   Cluster 4
            (102)       (33)        (15)        (24)        (13)        (17)
Year        1951.5      1924.697    1930.2      1977.3333   1986.6923   1958.9412
Jan         23.3345     23.1897     22.7651     23.5955     24.1856     23.0984
Feb         25.1828     24.8517     24.5215     25.6661     26.159      24.9804
Mar         27.6269     27.0164     27.1878     28.0848     28.5775     27.8259
Apr         30.3108     30.0328     29.5188     30.6396     30.975      30.5771
May         32.3848     32.1833     31.8497     32.8083     32.6614     32.4384
Jun         31.0933     31.0168     30.7815     31.8466     30.7131     30.7443
Jul         28.838      28.6428     28.7983     29.185      29.5832     28.192
Aug         28.4052     28.4061     28.0938     28.7352     28.6475     28.0271
Sep         28.2424     28.0239     28.2679     28.6095     28.9003     27.6229
Oct         27.1811     26.91       27.5127     27.3202     27.7857     26.7561
Nov         24.7899     24.2983     25.2432     25.1427     25.6115     24.2178
Dec         23.1654     22.8922     23.3564     23.493      23.9412     22.4715

Table 1: Clustered Centroids of Monthly Mean For Each Year Average Temperature

Fig.1: Clusters Histogram

Fig.2: Clustered graph generated using k-means clustering algorithm

DETAILED ACCURACY BY CLASS

TP Rate   FP Rate   Precision   Recall   F-Measure   MCC     ROC Area   PRC Area   Class
0.970     0.072     0.865       0.970    0.914       0.873   0.949      0.848      cluster0
0.600     0.000     1.000       0.600    0.750       0.749   0.800      0.659      cluster1
0.958     0.038     0.885       0.958    0.920       0.895   0.960      0.858      cluster2
0.846     0.000     1.000       0.846    0.917       0.910   0.923      0.866      cluster3
1.000     0.024     0.895       1.000    0.944       0.935   0.988      0.895      cluster4
0.902     0.036     0.912       0.902    0.897       0.875   0.933      0.833      Weighted Avg.

Table 2: Detailed accuracy by clustered class
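As a cross-check on Table 2 and on the Kappa statistic quoted in Section 3.1, the same quantities can be recomputed from the confusion matrix reported in Table 3. The sketch below is an illustration only; the function names are hypothetical, and the matrix is copied from Table 3 with rows as actual classes and columns as predicted classes.

```python
# Confusion matrix from Table 3: rows = actual class, columns = predicted class.
CLASSES = ["cluster0", "cluster1", "cluster2", "cluster3", "cluster4"]
MATRIX = [
    [32, 0, 0, 0, 1],
    [5, 9, 1, 0, 0],
    [0, 0, 23, 0, 1],
    [0, 0, 2, 11, 0],
    [0, 0, 0, 0, 17],
]


def accuracy(m):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total


def cohen_kappa(m):
    """Chance-corrected agreement: (observed - expected) / (1 - expected)."""
    n = sum(sum(row) for row in m)
    observed = sum(m[i][i] for i in range(len(m))) / n
    row_totals = [sum(row) for row in m]
    col_totals = [sum(col) for col in zip(*m)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


def per_class(m, i):
    """Precision, recall (TP rate) and F-measure for class index i."""
    tp = m[i][i]
    fn = sum(m[i]) - tp                      # actual i, predicted as something else
    fp = sum(row[i] for row in m) - tp       # predicted i, actually something else
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


print("accuracy:", round(accuracy(MATRIX), 4))
print("kappa:", round(cohen_kappa(MATRIX), 4))
for idx, name in enumerate(CLASSES):
    p, r, f = per_class(MATRIX, idx)
    print(name, round(p, 3), round(r, 3), round(f, 3))
```

On the matrix as printed, the per-class precision, recall and F-measure agree with Table 2 to three decimals, and the accuracy is 92/102. The Kappa value computed this way is about 0.872, slightly below the 0.8828 quoted in the text; the source does not explain the difference.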

CONFUSION MATRIX

  a    b    c    d    e    <-- classified as
 32    0    0    0    1    a = cluster0
  5    9    1    0    0    b = cluster1
  0    0   23    0    1    c = cluster2
  0    0    2   11    0    d = cluster3
  0    0    0    0   17    e = cluster4

Table 3: Confusion Matrix

Fig.3: SVM classifier graph of monthly mean for each year Average temperature

Fig.4: Graph of Annual Mean Temperatures for all the years 1901-2002 along with Trend line equations

Fig.5: Graph of Annual mean values of weather parameters for all the years (1901-2002)

5. CONCLUSION:

In the K-means cluster analysis given in Table 1, cluster 0 has the highest number of instances (33) and a centroid year of about 1925. Cluster 4 has the lowest centroid temperature, 22.4715 °C, in the month of December. Cluster 3, with 13 instances, has the highest temperature values in all months except May, June and August. Cluster 2, with 24 instances, has the highest temperature values in those three months.

The detailed accuracies in Table 2 show that cluster 4 has a TP rate of 1 and an F-measure of 0.944. Cluster 0 and cluster 2 have TP rates of 0.970 and 0.958 and F-measures of 0.914 and 0.920 respectively. This indicates that cluster 4 was classified most accurately, followed by cluster 0 and cluster 2. In the confusion matrix in Table 3, each column represents the instances in a predicted class and each row the instances in an actual class. The confusion matrix shows that the classifier was accurate for cluster 4, cluster 0 and cluster 2 but less accurate for cluster 1 and cluster 3. Only cluster 4 had all of its instances classified correctly.

Fig.5, in which months are taken along the x-axis and the different weather parameter values along the y-axis, leads to the following inferences. There is a correlation of 0.86114364 between mean precipitation and mean cloud cover. Likewise, there is a correlation of 0.851278951 between average temperature and mean vapour pressure. There is a correlation of 0.665903472 between mean diurnal temperature range and mean potential evapotranspiration. There is a negative correlation of -0.729960042 between mean ground frost frequency and mean wet day frequency. Mean average temperature is correlated with all the parameters, namely mean cloud cover, diurnal temperature range, ground frost frequency, precipitation, vapour pressure, wet day frequency, potential evapotranspiration and reference crop evapotranspiration. It has the highest correlation with reference crop evapotranspiration and a negative correlation with ground frost frequency.

As future work, this hybrid SVM technique can be extended to weather data from any spatial location for further analysis based on various weather parameters.

REFERENCES

[1] http://www.timeanddate.com/weather/forecast-accuracytime.html
[2] http://en.wikipedia.org/wiki/weather_forecasting
[3] Folorunsho Olaiya, Application of Data Mining Techniques in Weather Prediction and Climate Change Studies, I.J. Information Engineering and Electronic Business, 2012, 1, 51-59, Published Online February 2012 in MECS (http://www.mecs-press.org/), DOI: 10.5815/ijieeb.2012.01.07
[4] Dr. M. H. Dunham, Companion slides for the text, Data Mining, Introductory and Advanced Topics, Prentice Hall, 2002.
[5] Y. Radhika and M. Shashi, Atmospheric Temperature Prediction using Support Vector Machines, International Journal of Computer Theory and Engineering, Vol. 1, No. 1, April 2009, 1793-8201.
[6] Dr. T. V. Rajini Kanth, Ananthoju Vijay Kumar, Estimation of the Influence of Fertilizer Nutrients Consumption on the Wheat Crop Yield in India - a Data Mining Approach, 30 Dec 2013, Volume 3, Issue 2, Pg. No: 316-320, ISSN: 2249-8958 (Online).
[7] Dr. T. V. Rajini Kanth, Ananthoju Vijay Kumar, A Data Mining Approach for the Estimation of Climate Change on the Jowar Crop Yield in India, 25 Dec 2013, Volume 2, Issue 2, Pg. No: 16-20, ISSN: 2319-6378 (Online).
[8] A. Vijay Kumar, Dr. T. V. Rajini Kanth, Estimation of the Influential Factors of Rice Yield in India, 2nd International Conference on Advanced Computing Methodologies ICACM-2013, 02-03 Aug 2013, Elsevier Publications, Pg. No: 459-465, ISBN No: 978-93-35107-14-95.
[9] Dr. David B. Stephenson, Data analysis methods in weather and climate research, Department of Meteorology, University of Reading, July 20, 2005, http://www.met.rdg.ac.uk/cag/courses