AN APPLICATION OF MULTIPLE CORRESPONDENCE ANALYSIS TO THE CLASSIFICATION OF RESIDENTIAL ELECTRICITY CONSUMERS
Ronaldo Rocha Bastos (UFJF)
Henrique Steinherz Hippert (UFJF)
Augusto Carvalho Souza (ViannaJr)

Classification of consumers from relatively small areas into different demand groups may help in the estimation of where, and by how much, electricity consumption is likely to change, thus reducing the uncertainties in the aggregate projections required from regional electricity supply companies in Brazil by the Câmara de Comercialização de Energia Elétrica (Electricity Commercialisation Chamber), responsible for the purchase of all energy generated nationwide since 2004. In this paper we compare some methods of classification which can account for the socio-economic profile of electricity consumers and the characteristics of households that generate such residential consumption. In particular, we were interested in studying the usefulness of Multiple Correspondence Analysis (MCA), a multivariate exploratory technique for categorical data, in the identification of covariates and in the combination of its results with regression modelling and intensive computation techniques. The strategies analysed were logistic regression (LR) with both the original categorical covariates and with MCA factors as covariates, and artificial neural networks (ANN) with both types of input variables. The data used in this paper come from a multiple-stage sample survey undertaken among the residential electricity consumers of the city of Juiz de Fora, state of Minas Gerais, an area supplied by a single regional electricity company. The results obtained have shown the importance of MCA as an intermediate step to estimate coefficients that can be used in classification.
For LR, despite the similar performance of both strategies, a significant reduction of the dimensionality of the problem was attained. As to the ANN, a methodology originally designed for numerical predictors, the use of MCA factors not only improves the performance of the method, but permits its application to categorical predictors.
Keywords: Classification; logistic regression; multiple correspondence analysis; artificial neural networks; load forecasting
1. Introduction

The provision for the growing residential electricity load is made by each regional electricity supply company at a highly disaggregate level, which normally involves large capital investments in distribution lines and supply infrastructure. The overall domestic load to be supplied is, therefore, the summation of the disaggregate loads across all individual households. In Brazil, since the creation in 2004 of the national Câmara de Comercialização de Energia Elétrica CCEE (Electricity Commercialisation Chamber), responsible for the purchase of all energy generated nationwide, all electricity distributors are required to present aggregate demand forecasts of the loads they will need in their concession areas for a five-year time span. These forecasts often carry with them a high level of uncertainty, especially if little is known about the patterns of individual consumption. Under the new regulations, however, the CCEE is entitled to penalise the distributors for forecasting errors, thus increasing the need to develop accurate forecasting systems if financial risks are to be minimised. Classification of consumers from relatively small areas into different demand groups may help in the estimation of where, and by how much, electricity consumption is likely to change, thus reducing the uncertainties of such aggregate projections. One major factor affecting electricity consumption is the typology of the population which generates it, generally described by a series of categorical variables such as educational level, car ownership, and type and size of dwelling, to name but a few. By taking into account differences in population characteristics, one can better analyse expected differences in consumption. Some studies of small-area demand forecasting strategies, which demonstrate the usefulness and flexibility of such approaches, have been undertaken for electricity supply companies (e.g. Madden et al., 1994).
In Brazil, attempts to account for the characteristics of the consumers and their households in the determination of consumption patterns and in the classification of populations into electricity consumption groups are still rare (see, e.g. Hippert, 2006). This is probably due to the scarcity of adequate data or to the difficulties involved in merging data coming from different sources at different aggregation levels. The data described in section 2 made this kind of approach possible. In this paper we apply and compare some methods of classification which can account for the socio-economic profile of electricity consumers and the characteristics of households that generate such domestic consumption. In particular, we were interested in studying the usefulness of Multiple Correspondence Analysis (MCA), a multivariate exploratory technique for categorical data, in the identification of covariates and in the combination of its results with regression modelling and intensive computation techniques. The strategies analysed were logistic regression, both with the original categorical covariates and with the MCA factors as covariates, and artificial neural networks (ANN), using the same two kinds of input data.

2. Data used

The data we used come from a data set created by the Department of Statistics (UFJF), containing all variables obtained from a multiple-stage sample survey undertaken among the residential electricity consumers of the city of Juiz de Fora, state of Minas Gerais, an area supplied by a single regional electricity company. A total of 557 households were visited and their heads interviewed by means of a structured questionnaire, which contained questions about dwelling characteristics (built area, number of rooms, etc.), socio-economic characteristics of the households (educational level of head, cars owned, existence
and number of maids, etc.), energy-consuming habits (frequency and time of use of electrical household appliances), attitudes towards energy efficiency and conservation measures, and opinions about possible rationalisation of consumption. All households interviewed were also geo-referenced with the aid of a Global Positioning System (GPS) device, in order to enable their classification according to the neighbourhoods where they were located. In the study discussed here we did not use the variables related to attitudes, energy-consuming habits and location. We focused, instead, on the categorical variables (most of them ordinal) which described the dwelling characteristics and the socio-economic profiles of the consumers. The problem of missing data was not tackled in this study, as our aim was to use the results obtained from different strategies for comparative purposes. After the elimination of all cases with missing data in any of the variables mentioned above, 444 cases remained for analysis.

3. Classification Methods Adopted and Data Analysis

3.1 Multiple Correspondence Analysis

Multiple Correspondence Analysis (MCA) is a multivariate exploratory technique aimed at reducing the dimensionality of a categorical data set by means of uncorrelated factors which maximise the projection distances between the different categories of the variables considered. The solution is obtained through singular value decomposition of a rectangular matrix (see, e.g. Greenacre & Blasius, 2006, p. 12). One of its strengths is the possibility of presenting results in graphical form, which facilitates the understanding of possible similarities and associations among variables. This technique, first proposed by Benzécri in the 1960s and 1970s (Benzécri, 1992), has been used in applications in all branches of science (see, e.g. Greenacre and Blasius, 1994; 2006).
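The first step of an MCA is to recode the observations into a 0/1 indicator matrix with one column per category; the factors are then obtained from the singular value decomposition of a suitably scaled version of this matrix. A minimal sketch of that recoding step is given below; the variable names and categories are hypothetical, not taken from the survey.

```python
# Illustrative sketch: dummy (indicator) coding of categorical records,
# the first step of MCA. Variables and categories here are made up.

def indicator_matrix(records, variables):
    """Build the 0/1 indicator matrix Z used by MCA.

    records   -- list of dicts mapping variable name -> observed category
    variables -- dict mapping variable name -> ordered list of categories
    Returns (Z, columns), where each row of Z has exactly one 1 per variable.
    """
    columns = [(var, cat) for var in variables for cat in variables[var]]
    Z = [[1 if rec[var] == cat else 0 for (var, cat) in columns]
         for rec in records]
    return Z, columns

# Two toy households described by two categorical variables
variables = {"cars": ["none", "one", "two+"], "maid": ["yes", "no"]}
records = [{"cars": "one", "maid": "no"},
           {"cars": "two+", "maid": "yes"}]

Z, cols = indicator_matrix(records, variables)
print(Z)  # [[0, 1, 0, 0, 1], [0, 0, 1, 1, 0]]
```

MCA software (such as the R routines used in this study) then applies the SVD to this matrix after dividing by the grand total and centring by the row and column masses.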
In order to explore the possible patterns existing among the categorical variables related to the consumers and their households, we undertook an MCA with some selected variables from the dataset, as indicated in Table 1.

Table 1 – The categorical variables used in the study

  Categorical variable (levels)   Categories                                                  Type
  Educational level (3)           Up to High School; High School to Undergraduate; Graduate   predictor
  Cars owned (3)                  None; One; Two or More                                      predictor
  House maid (2)                  Yes; No                                                     predictor
  Dwelling area (4)               < 50 m²; … m²; … m²; > 151 m²                               predictor
  Meter type (2)                  One phase; Two/Three phases                                 grouping

A total of four predictor variables and 12 corresponding categories were analysed with MCA, giving an overall solution with eight factors. Table 2 summarises the solution, presenting the percentage of the total inertia explained by each factor.

Table 2 – MCA solution: eigenvalues and % of inertia explained by each of the eight factors (Factor 1: 22.31%; Factor 2: 14.93%)

The graphical solution for the first two factors, which account for approximately 37% of the explained inertia, is presented in Figure 1. In this graph, categories of different variables are considered to be associated when located close to each other, whereas categories of the same variable which are close to each other are considered to be similar. Factors 1 and 2 are very discriminating for the variables analysed. For example, positive values of the category coordinates on Factor 1 are related to category 1 of the variable meter type (lower consumption), and negative ones are related to category 2 (higher consumption) of the same variable. The variable meter type was plotted in the graph as a supplementary variable (Greenacre, 2006, p. 31-32), inasmuch as it does not influence the solution. The position of its two categories in the principal plane of MCA, however, confirms the expected pattern, observed for the other variables, that more affluent consumers, who tend to have a higher demand for electricity (here indicated by the type of meter), are separated from the less affluent consumers. The variable meter type, therefore, is the dependent variable to be used in the classification methods that follow.

Figure 1 – Principal plane of MCA (Factor 1 = 22.31% of inertia; Factor 2 = 14.93%)

3.2 Logistic Regression

Logistic regression (LR) is a linear method of classification that can be applied to the data presented. The conditional probabilities of a binary outcome (or an outcome of higher dimension, if polychotomous logistic regression is used) are estimated and used to classify cases described by particular values of the covariates. LR can easily incorporate categorical predictors as covariates and, according to Saporta & Niang (2006), performs better than linear discriminant analysis when the conditional distributions are not normal or have different covariances. Following the proposal of Saporta & Niang (2006), we initially performed stepwise selection of covariates (probability level 0.05) for the estimation of the conditional probability of an observation x coming from one particular group (baseline). Then, stepwise logistic regression (probability level 0.05) was again performed, this time with the factors obtained by MCA as covariates. We then analysed how close the two solutions were. In order to avoid the problem of resubstitution bias, Saporta & Niang (2006) recommend splitting the total sample into subsets for training (estimating coefficients) and for testing (classification). We followed their advice and performed a similar simulation exercise, in which 50 random samples of both types were taken without replacement, stratified by the two groups of meter type. In each sample, 356 cases (80% of the total) were used for estimation and the remaining 88 (20%) were used for classification. The same subsamples were used for both strategies. The performance of each strategy in terms of classification was evaluated by comparison of their Receiver Operating Characteristic (ROC) curves and the corresponding areas under the ROC curve (AUC). The ROC curve, formed by the true positive rates and false positive rates at different thresholds, represents the overall performance of the model for any threshold. Figure 2 presents two of these curves, for illustrative purposes, obtained by both strategies for the whole sample available in the database. It can be seen that the curves for both strategies are similar, and have about the same AUC.
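The AUC used here has a simple interpretation: it is the probability that a randomly chosen case from the positive group receives a higher predicted score than a randomly chosen case from the negative group (ties counting one half). A minimal sketch of this computation follows, on made-up scores rather than the study's data.

```python
# Minimal sketch of the AUC as a pair-counting probability; the scores
# and labels below are illustrative only.

def auc(labels, scores):
    """Area under the ROC curve by direct pair counting.

    labels -- iterable of 0/1 group indicators (1 = positive group)
    scores -- predicted probabilities, or any monotone score
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(auc(labels, scores))  # 0.888...: 8 of the 9 pairs are correctly ordered
```

An AUC of 0.5 corresponds to random choice, and 1.0 to perfect separation of the two groups, which is the yardstick used in the comparisons below.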
MCA, LR, ROC curves and AUC were computed throughout this analysis by means of routines implemented in R.

Figure 2 – ROC curves (sensitivity vs. 1 − specificity) for LR with categorical covariates and with MCA factors as covariates, on the complete sample
Table 3a presents a summary of the frequency of the number of covariates retained in the models for both strategies, and Table 3b presents the frequency of each covariate over the whole simulation exercise.

Table 3 – Categorical variable and MCA factor selection for LR in 50 samples: frequency of covariates retained

  (a) LR with categorical predictors              (b) LR with MCA factors as predictors
  Predictor                  # of times retained  Predictor   # of times retained
  Area: … m²                 49                   Factor 1    50
  Area: … m²                 49                   Factor 2    50
  Area: > 151 m²             49                   Factor 3    …
  Cars: 1                    50                   Factor 4    …
  Cars: 2+                   …                    Factor 5    …
  Maid: 1+                   3                    Factor 6    0
  Ed. level: HS–undergrad.   40                   Factor 7    3
  Ed. level: graduate        40                   Factor 8    6

It can be noticed that LR with MCA factors tends to retain fewer covariates than LR with categorical covariates. Furthermore, one can see that the categorical covariates dwelling area, cars owned and educational level appear most frequently as significant covariates; dwelling area and cars owned were present in almost all 50 models. On the other hand, factors F1, F2 (the principal plane of MCA) and F5 were the most frequent; F1 and F2 were always present in the 50 models.

3.3 Artificial Neural Networks

Artificial Neural Networks (ANN) may be viewed as nonlinear regression models which map the input variables onto the output space by means of complex sets of nested functions whose parameters are iteratively estimated by training algorithms. For this application, the ANN had the same inputs as the LR models, and a single output, restricted to the [0,1] interval, so that it could be interpreted as the probability of the input vector belonging to a given class. The ANN models we experimented with were fully-connected feed-forward perceptrons with one hidden layer and one output neuron. Hyperbolic tangent functions were used for the activation of the hidden neurons, and a logistic function for the output neuron.
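As an illustration of this architecture, the sketch below implements a one-hidden-layer perceptron with tanh hidden units and a logistic output neuron. It is trained here by plain online gradient descent (a simpler substitute for the training used in the study), keeping the weight snapshot that minimises the error on a validation set; all data are synthetic.

```python
# Sketch of a feed-forward perceptron: one tanh hidden layer, logistic
# output. Trained by plain gradient descent with a validation snapshot;
# not the study's training algorithm, and all data below are synthetic.
import math
import random

def forward(x, W1, b1, W2, b2):
    # hidden layer: tanh activations; output: logistic, so y lies in [0, 1]
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    z = sum(w * hi for w, hi in zip(W2, h)) + b2
    return 1.0 / (1.0 + math.exp(-z)), h

def train(train_set, val_set, n_hidden=3, lr=0.5, max_epochs=500):
    random.seed(0)
    n_in = len(train_set[0][0])
    W1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in)]
          for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    best, best_err = None, float("inf")
    for _ in range(max_epochs):
        for x, t in train_set:
            y, h = forward(x, W1, b1, W2, b2)
            d_out = y - t  # gradient of cross-entropy w.r.t. output net input
            for j in range(n_hidden):
                d_h = d_out * W2[j] * (1.0 - h[j] ** 2)  # through tanh
                W2[j] -= lr * d_out * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * d_h * x[i]
                b1[j] -= lr * d_h
            b2 -= lr * d_out
        # keep the weights with the lowest validation error so far
        err = sum((forward(x, W1, b1, W2, b2)[0] - t) ** 2 for x, t in val_set)
        if err < best_err:
            best_err, best = err, ([r[:] for r in W1], b1[:], W2[:], b2)
    return best

# Toy separable problem; in a real run the validation set would be distinct
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 1), ([1, 1], 1)]
W1, b1, W2, b2 = train(data, data)
probs = [forward(x, W1, b1, W2, b2)[0] for x, _ in data]
print([round(p, 2) for p in probs])
```

Because the output neuron is logistic, the network's outputs are directly comparable to the conditional probabilities estimated by LR, which is what allows the ROC/AUC comparison below.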
Training was performed with the Levenberg-Marquardt algorithm (Bishop, 1998), with early stopping defined by cross-validation: after each iteration, the ANN performance was checked on a sub-sample of the data, distinct from the training data; when the error started to increase (which meant the ANN had started to overfit the training data), the iterations were stopped (Haykin, 1999). Out of the 444 cases available in the sample, 332 were randomly selected for network training, 24 for cross-validation, and the remaining 88 cases for out-of-sample testing (the sample used for ANN testing therefore had the same size as the samples used for LR testing). The number of hidden processing elements (neurons) was defined by a grid search which compared the performances of ANNs of eight different sizes, containing 1, 2, 3, 5, 10, 15, 20 and 30 neurons. The ANNs were tested according to the same two strategies as used for the LR models: with the original covariates (Table 1), and with the factors obtained by MCA (Table 2). In both situations, Meter type was used as the output variable. Since the training algorithms tend to produce slightly different results each time they are run, as they start from random initialisation values, we ran each ANN model 30 times. In order to evaluate the performance at each run, the vectors of 88 output values were used to build ROC curves, and the areas under each curve (AUC) were estimated. For each ANN size, then, we had 30 AUC estimates. Figure 3 shows boxplots displaying the 30 areas estimated for the 30-neuron ANN under each strategy (with original covariates and with MCA factors). The eight ANNs we tested in each strategy obtained mean and median areas which were roughly equivalent; the larger ANNs, however, tended to show less dispersion in the areas and to produce fewer outliers.

4. Comparison of the classification strategies used

The methodologies used in Sections 3.2 and 3.3 were different: for each strategy (original covariates as input data, versus MCA factors), there were 50 simulations, and consequently 50 different samples, for each of the two LR strategies, as compared to 30 replications of the same ANN model on a single sample. Therefore, it is not possible to compare the results coming from LR directly with those from ANN. It is advisable, therefore, to proceed with separate comparisons, bearing in mind that the main objective of the present study was to observe the contribution of MCA as an intermediate step to both LR and ANN. Figure 3 summarises all the results discussed here. Firstly, the two LR strategies are clearly comparable in their classification capabilities, as measured by the box-plots of the AUC obtained in the 50 different test samples. A comparison by means of non-parametric tests showed that the two distributions are not significantly different (p=0.085 in Mann-Whitney, p=0.27 in Kolmogorov-Smirnov).
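The Mann-Whitney test used here compares the two samples of AUC values through the U statistic: the number of cross-sample pairs in which one sample's value exceeds the other's. A minimal sketch of the statistic follows, on made-up AUC values rather than the study's results.

```python
# Sketch of the Mann-Whitney U statistic used to compare two samples of
# AUC values; the sample values below are illustrative only.

def mann_whitney_u(xs, ys):
    """U statistic: number of (x, y) pairs with x > y (ties count 1/2)."""
    return sum((x > y) + 0.5 * (x == y) for x in xs for y in ys)

auc_factors = [0.84, 0.86, 0.85, 0.87]   # hypothetical AUCs, strategy 1
auc_categ   = [0.82, 0.83, 0.85, 0.81]   # hypothetical AUCs, strategy 2
u = mann_whitney_u(auc_factors, auc_categ)
print(u)  # 14.5 of a maximum of 16 pairs favour the first sample
```

Dividing U by the number of pairs recovers exactly the AUC interpretation used earlier: the probability that a value drawn from the first sample exceeds one drawn from the second.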
The advantage of the strategy using MCA factors is that the models were fitted with fewer inputs (only three factors on average), favouring the principle of parsimony for regression models. Secondly, the ANN trained with MCA factors as input variables shows clearly superior classification performance to the ANN trained with categorical variables (p=0.000 in Mann-Whitney and in Kolmogorov-Smirnov). The median of the distribution of the 30 runs for the former strategy is clearly higher than the median obtained for the other strategy. Finally, all four strategies clearly increase the level of information that can be input to load forecast models, as all of them present a classification performance, represented by their AUC, superior to random choice.

Figure 3 – Comparison of different strategies for the classification of electricity consumers
5. Conclusion

The results obtained have shown the importance of MCA as an intermediate step to estimate coefficients that can be used in classification. For LR, despite the similar performance of both strategies, the goal of lowering the dimensionality of the problem was attained. As to the ANN, a methodology originally designed for numerical predictors, the use of MCA factors not only improves the performance of the method, but permits its application to categorical predictors, as suggested by Saporta & Niang (2006). The typologies obtained by such classification methods are expected to be used as input to medium- or long-term demand forecast models, further improving their accuracy. The results reported in this paper are part of a larger study that intends to develop models that associate spatially referenced data about the socio-economic profile of the consumers, and the characteristics of their households, with their most likely levels of electricity demand. These models will then be used to define a demand typology to be used as an input to load forecasting models, particularly those geared to medium- or long-term forecasting for specific areas whose population characteristics are known.

References

BISHOP, C.M. Neural Networks for Pattern Recognition. Oxford: Oxford University Press, 1998.
GREENACRE, M. & BLASIUS, J. (eds.) Correspondence Analysis in the Social Sciences. London: Academic Press, 1994.
GREENACRE, M. & BLASIUS, J. (eds.) Multiple Correspondence Analysis and Related Methods. Boca Raton: Chapman & Hall/CRC, 2006.
HAYKIN, S. Neural Networks: a Comprehensive Foundation. 2nd ed. Upper Saddle River: Prentice Hall, 1999.
HIPPERT, H.S. Modelagem e Previsão de Demanda de Energia Elétrica com Base em Dados Geo-Referenciados. Anais do IX Encontro de Modelagem Computacional. Belo Horizonte, 2006.
MADDEN, M.; STEVENSON, M.A.; BROWN, P.J.B. & BATEY, P.W.J. Developing a Small-Area Electricity Demand Forecasting System. The Journal of Energy and Development, Vol. XX, n. 1, p. 1-24, 1994.
SAPORTA, G. & NIANG, N. Correspondence Analysis and Classification. In: GREENACRE, M. & BLASIUS, J. (eds.) Multiple Correspondence Analysis and Related Methods. Boca Raton: Chapman & Hall/CRC, 2006.
More informationA simulation study of model fitting to high dimensional data using penalized logistic regression
A simulation study of model fitting to high dimensional data using penalized logistic regression Ellinor Krona Kandidatuppsats i matematisk statistik Bachelor Thesis in Mathematical Statistics Kandidatuppsats
More informationNeural networks (not in book)
(not in book) Another approach to classification is neural networks. were developed in the 1980s as a way to model how learning occurs in the brain. There was therefore wide interest in neural networks
More informationAdvising on Research Methods: A consultant's companion. Herman J. Ader Gideon J. Mellenbergh with contributions by David J. Hand
Advising on Research Methods: A consultant's companion Herman J. Ader Gideon J. Mellenbergh with contributions by David J. Hand Contents Preface 13 I Preliminaries 19 1 Giving advice on research methods
More informationArtificial Neural Network
Artificial Neural Network Contents 2 What is ANN? Biological Neuron Structure of Neuron Types of Neuron Models of Neuron Analogy with human NN Perceptron OCR Multilayer Neural Network Back propagation
More informationStatistical aspects of prediction models with high-dimensional data
Statistical aspects of prediction models with high-dimensional data Anne Laure Boulesteix Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie February 15th, 2017 Typeset by
More informationShort Term Load Forecasting Using Multi Layer Perceptron
International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Short Term Load Forecasting Using Multi Layer Perceptron S.Hema Chandra 1, B.Tejaswini 2, B.suneetha 3, N.chandi Priya 4, P.Prathima
More informationDiscrete Response Multilevel Models for Repeated Measures: An Application to Voting Intentions Data
Quality & Quantity 34: 323 330, 2000. 2000 Kluwer Academic Publishers. Printed in the Netherlands. 323 Note Discrete Response Multilevel Models for Repeated Measures: An Application to Voting Intentions
More informationClassification of Ordinal Data Using Neural Networks
Classification of Ordinal Data Using Neural Networks Joaquim Pinto da Costa and Jaime S. Cardoso 2 Faculdade Ciências Universidade Porto, Porto, Portugal jpcosta@fc.up.pt 2 Faculdade Engenharia Universidade
More informationSUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION. University of Minnesota
Submitted to the Annals of Statistics arxiv: math.pr/0000000 SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION By Wei Liu and Yuhong Yang University of Minnesota In
More informationNeural Networks. Nethra Sambamoorthi, Ph.D. Jan CRMportals Inc., Nethra Sambamoorthi, Ph.D. Phone:
Neural Networks Nethra Sambamoorthi, Ph.D Jan 2003 CRMportals Inc., Nethra Sambamoorthi, Ph.D Phone: 732-972-8969 Nethra@crmportals.com What? Saying it Again in Different ways Artificial neural network
More informationSTA 414/2104: Lecture 8
STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 1 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification
More informationTurning a research question into a statistical question.
Turning a research question into a statistical question. IGINAL QUESTION: Concept Concept Concept ABOUT ONE CONCEPT ABOUT RELATIONSHIPS BETWEEN CONCEPTS TYPE OF QUESTION: DESCRIBE what s going on? DECIDE
More informationFeature Engineering, Model Evaluations
Feature Engineering, Model Evaluations Giri Iyengar Cornell University gi43@cornell.edu Feb 5, 2018 Giri Iyengar (Cornell Tech) Feature Engineering Feb 5, 2018 1 / 35 Overview 1 ETL 2 Feature Engineering
More informationCombination of M-Estimators and Neural Network Model to Analyze Inside/Outside Bark Tree Diameters
Combination of M-Estimators and Neural Network Model to Analyze Inside/Outside Bark Tree Diameters Kyriaki Kitikidou, Elias Milios, Lazaros Iliadis, and Minas Kaymakis Democritus University of Thrace,
More informationRandomized Decision Trees
Randomized Decision Trees compiled by Alvin Wan from Professor Jitendra Malik s lecture Discrete Variables First, let us consider some terminology. We have primarily been dealing with real-valued data,
More informationARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92
ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 BIOLOGICAL INSPIRATIONS Some numbers The human brain contains about 10 billion nerve cells (neurons) Each neuron is connected to the others through 10000
More informationMachine learning for pervasive systems Classification in high-dimensional spaces
Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version
More informationECE 661: Homework 10 Fall 2014
ECE 661: Homework 10 Fall 2014 This homework consists of the following two parts: (1) Face recognition with PCA and LDA for dimensionality reduction and the nearest-neighborhood rule for classification;
More informationArtificial Neural Networks
Introduction ANN in Action Final Observations Application: Poverty Detection Artificial Neural Networks Alvaro J. Riascos Villegas University of los Andes and Quantil July 6 2018 Artificial Neural Networks
More informationUNIVERSITY OF THE PHILIPPINES LOS BAÑOS INSTITUTE OF STATISTICS BS Statistics - Course Description
UNIVERSITY OF THE PHILIPPINES LOS BAÑOS INSTITUTE OF STATISTICS BS Statistics - Course Description COURSE COURSE TITLE UNITS NO. OF HOURS PREREQUISITES DESCRIPTION Elementary Statistics STATISTICS 3 1,2,s
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationThunderstorm Forecasting by using Artificial Neural Network
Thunderstorm Forecasting by using Artificial Neural Network N.F Nik Ismail, D. Johari, A.F Ali, Faculty of Electrical Engineering Universiti Teknologi MARA 40450 Shah Alam Malaysia nikfasdi@yahoo.com.my
More informationDimensionality Reduction Techniques (DRT)
Dimensionality Reduction Techniques (DRT) Introduction: Sometimes we have lot of variables in the data for analysis which create multidimensional matrix. To simplify calculation and to get appropriate,
More informationA4. Methodology Annex: Sampling Design (2008) Methodology Annex: Sampling design 1
A4. Methodology Annex: Sampling Design (2008) Methodology Annex: Sampling design 1 Introduction The evaluation strategy for the One Million Initiative is based on a panel survey. In a programme such as
More informationepochs epochs
Neural Network Experiments To illustrate practical techniques, I chose to use the glass dataset. This dataset has 214 examples and 6 classes. Here are 4 examples from the original dataset. The last values
More information* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course.
Name of the course Statistical methods and data analysis Audience The course is intended for students of the first or second year of the Graduate School in Materials Engineering. The aim of the course
More informationday month year documentname/initials 1
ECE471-571 Pattern Recognition Lecture 13 Decision Tree Hairong Qi, Gonzalez Family Professor Electrical Engineering and Computer Science University of Tennessee, Knoxville http://www.eecs.utk.edu/faculty/qi
More informationPrincipal Component Analysis Applied to Polytomous Quadratic Logistic
Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS024) p.4410 Principal Component Analysis Applied to Polytomous Quadratic Logistic Regression Andruski-Guimarães,
More informationTextbook Examples of. SPSS Procedure
Textbook s of IBM SPSS Procedures Each SPSS procedure listed below has its own section in the textbook. These sections include a purpose statement that describes the statistical test, identification of
More informationStatistics Toolbox 6. Apply statistical algorithms and probability models
Statistics Toolbox 6 Apply statistical algorithms and probability models Statistics Toolbox provides engineers, scientists, researchers, financial analysts, and statisticians with a comprehensive set of
More informationNeural Networks and Ensemble Methods for Classification
Neural Networks and Ensemble Methods for Classification NEURAL NETWORKS 2 Neural Networks A neural network is a set of connected input/output units (neurons) where each connection has a weight associated
More informationUnivariate versus Multivariate Models for Short-term Electricity Load Forecasting
Univariate versus Multivariate Models for Short-term Electricity Load Forecasting Guilherme Guilhermino Neto 1, Samuel Belini Defilippo 2, Henrique S. Hippert 3 1 IFES Campus Linhares. guilherme.neto@ifes.edu.br
More informationPrediction of Hourly Solar Radiation in Amman-Jordan by Using Artificial Neural Networks
Int. J. of Thermal & Environmental Engineering Volume 14, No. 2 (2017) 103-108 Prediction of Hourly Solar Radiation in Amman-Jordan by Using Artificial Neural Networks M. A. Hamdan a*, E. Abdelhafez b
More informationNonlinear Classification
Nonlinear Classification INFO-4604, Applied Machine Learning University of Colorado Boulder October 5-10, 2017 Prof. Michael Paul Linear Classification Most classifiers we ve seen use linear functions
More informationAssignment 3. Introduction to Machine Learning Prof. B. Ravindran
Assignment 3 Introduction to Machine Learning Prof. B. Ravindran 1. In building a linear regression model for a particular data set, you observe the coefficient of one of the features having a relatively
More informationLecture 9: Classification, LDA
Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we
More informationDATABASE AND METHODOLOGY
CHAPTER 3 DATABASE AND METHODOLOGY In the present chapter, sources of database used and methodology applied for the empirical analysis has been presented The whole chapter has been divided into three sections
More informationLecture 9: Classification, LDA
Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis Jonathan Taylor, 10/12 Slide credits: Sergio Bacallado 1 / 1 Review: Main strategy in Chapter 4 Find an estimate ˆP
More informationSample questions for Fundamentals of Machine Learning 2018
Sample questions for Fundamentals of Machine Learning 2018 Teacher: Mohammad Emtiyaz Khan A few important informations: In the final exam, no electronic devices are allowed except a calculator. Make sure
More informationEstimation of extreme flow quantiles and quantile uncertainty for ungauged catchments
Quantification and Reduction of Predictive Uncertainty for Sustainable Water Resources Management (Proceedings of Symposium HS2004 at IUGG2007, Perugia, July 2007). IAHS Publ. 313, 2007. 417 Estimation
More informationCSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska. NEURAL NETWORKS Learning
CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS Learning Neural Networks Classifier Short Presentation INPUT: classification data, i.e. it contains an classification (class) attribute.
More informationThe Service Pipe A Forgotten Asset in Leak Detection
Leakage 2005 - Conference Proceedings Page 1 R P Warren Tynemarch Systems Engineering Ltd, Crossways House, 54-60 South Street, Dorking, Surrey, RH4 2HQ, UK, rwarren@tynemarch.co.uk Keywords: service pipes;
More informationComparison of Predictive Accuracy of Neural Network Methods and Cox Regression for Censored Survival Data
Comparison of Predictive Accuracy of Neural Network Methods and Cox Regression for Censored Survival Data Stanley Azen Ph.D. 1, Annie Xiang Ph.D. 1, Pablo Lapuerta, M.D. 1, Alex Ryutov MS 2, Jonathan Buckley
More informationForecasting Crude Oil Price Using Neural Networks
CMU. Journal (2006) Vol. 5(3) 377 Forecasting Crude Oil Price Using Neural Networks Komsan Suriya * Faculty of Economics, Chiang Mai University, Chiang Mai 50200, Thailand *Corresponding author. E-mail:
More informationPATTERN CLASSIFICATION
PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS
More informationUrban Pattern Geometry and its Potential Energy Efficiency
DOI: 10.14621/tna.20170104 Urban Pattern Geometry and its Potential Energy Efficiency Anna Yunitsyna* 1, Ernest Shtepani 2 1 Department of Architecture, Epoka University Tirana, Albania; ayunitsyna@epoka.edu.al
More informationMODELLING ENERGY DEMAND FORECASTING USING NEURAL NETWORKS WITH UNIVARIATE TIME SERIES
MODELLING ENERGY DEMAND FORECASTING USING NEURAL NETWORKS WITH UNIVARIATE TIME SERIES S. Cankurt 1, M. Yasin 2 1&2 Ishik University Erbil, Iraq 1 s.cankurt@ishik.edu.iq, 2 m.yasin@ishik.edu.iq doi:10.23918/iec2018.26
More informationClassifying Hungarian sub-regions by their competitiveness
Classifying Hungarian sub-regions by their competitiveness Péter KOVÁCS, lecturer, Department of Statistics and Demography, Miklós LUKOVICS, lecturer, Institute of Economics and Economic Development, Faculty
More informationLecture 9: Classification, LDA
Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we
More informationForecasting demand in the National Electricity Market. October 2017
Forecasting demand in the National Electricity Market October 2017 Agenda Trends in the National Electricity Market A review of AEMO s forecasting methods Long short-term memory (LSTM) neural networks
More informationstatistical methods for tailoring seasonal climate forecasts Andrew W. Robertson, IRI
statistical methods for tailoring seasonal climate forecasts Andrew W. Robertson, IRI tailored seasonal forecasts why do we make probabilistic forecasts? to reduce our uncertainty about the (unknown) future
More informationIntroduction to Neural Networks
CUONG TUAN NGUYEN SEIJI HOTTA MASAKI NAKAGAWA Tokyo University of Agriculture and Technology Copyright by Nguyen, Hotta and Nakagawa 1 Pattern classification Which category of an input? Example: Character
More informationMachine Learning (CSE 446): Neural Networks
Machine Learning (CSE 446): Neural Networks Noah Smith c 2017 University of Washington nasmith@cs.washington.edu November 6, 2017 1 / 22 Admin No Wednesday office hours for Noah; no lecture Friday. 2 /
More informationNeural network modelling of reinforced concrete beam shear capacity
icccbe 2010 Nottingham University Press Proceedings of the International Conference on Computing in Civil and Building Engineering W Tizani (Editor) Neural network modelling of reinforced concrete beam
More informationLecture 7 Artificial neural networks: Supervised learning
Lecture 7 Artificial neural networks: Supervised learning Introduction, or how the brain works The neuron as a simple computing element The perceptron Multilayer neural networks Accelerated learning in
More information