
A Self-Organizing Model for Logical Regression

Jerry Farlow¹, University of Maine (1900 words)

Contact: Jerry Farlow, Dept. of Mathematics, University of Maine, Orono, ME 04469. Tel: (207) 866-3540. Email: farlow@math.umaine.edu

¹ Jerry Farlow, Dept. of Mathematics, University of Maine, Orono, ME 04473, farlow@math.umaine.edu

Abstract: Logical regression, as described by Ruczinski, Kooperberg, and LeBlanc (2003), is a multivariable regression methodology for predicting a Boolean variable Y from logical relationships among a collection of Boolean predictor variables X_1, X_2, ..., X_m. More specifically, one seeks a regression model of the form

    g(E[Y]) = b_0 + b_1 L_1 + b_2 L_2 + ... + b_m L_m    (1)

where the coefficients b_0, b_1, ..., b_m and the logical expressions L_j, j = 1, ..., m, are to be determined. The expressions L_j are logical relationships (Boolean functions having values 0 or 1) among the predictor variables, such as "X_1, X_2 are true but not X_5", or "X_3, X_5, X_7 are true but not X_1 or X_2", or "if X_1 and X_2 are true then X_5 is true", where true is taken as 1 and false is taken as 0. A major problem in finding the best model is the movement from one logical expression to another in an effort to find a path to optimality. The authors investigate the use of a greedy algorithm as well as the simulated annealing algorithm in their search for optimality. In this paper we develop a strategy, based on the self-organizing method of Ivakhnenko, to first find a collection of logical relations L_k for predicting the dependent variable Y and then use ordinary linear regression to find a regression equation of the form (1), and possibly a more general model obtained by adding continuous predictor variables to the mix. An example is presented to demonstrate the potential use of this method.

Key Words: Logical regression, GMDH algorithm

1. Introduction

Our goal is to use a variation of the self-organizing GMDH algorithm to find a collection of logical relations

    Y = L_k(X_1, X_2, ..., X_m),   k = 1, 2, ..., m    (2)

for predicting the Boolean variable Y from the Boolean variables X_1, X_2, ..., X_m. Although the procedure as described finds the m best logical relations, one most likely is interested only in the best one or two. Inasmuch as the GMDH algorithm is not known to most people, we outline the basic method so that its adaptation to logical regression is better appreciated. For more information on the GMDH algorithm the reader can consult the book Self-Organizing Methods in Modeling by Farlow (1984), or the more complete and recent book Self-Organizing Data Mining by Johann Mueller and Frank Lemke, which can be purchased and downloaded online at http://www.gmdh.net/.

2. The Group Method of Data Handling (GMDH) Algorithm

One might say that the GMDH algorithm builds a mathematical model similar to the way biological organisms are created through evolution. That is, starting with a few basic primeval forms (i.e., equations), one grows a new generation of more complex offspring (equations), then allows a survival-of-the-fittest principle to determine which offspring survive and which do not. The idea is that each new generation of offspring (equations) is better suited to model the real world than the previous ones. Continuing this process for more generations, one finds a collection of models that hopefully describes the problem at hand. The process is stopped once the model begins to overfit the real world, that is, when the model reaches optimal complexity.

In 1966 the Ukrainian cyberneticist A. G. Ivakhnenko, discouraged by the fact that many mathematical models require knowledge of the real world that is difficult or impossible to obtain, devised a heuristic self-organizing method called the Group Method of Data Handling algorithm. The GMDH algorithm can be broken into a few distinct steps.

Step 1 (constructing new variables z_1, z_2, ..., z_{C(m,2)})

The algorithm begins with regression-type data y_i, x_{i1}, x_{i2}, ..., x_{im}, i = 1, 2, ..., n, where y is the dependent variable to be predicted and x_1, x_2, ..., x_m are the predictor variables. The n observations required for the algorithm are subdivided into two groups: one group of n_t observations, called the training observations (from which the model is built), and the remaining n - n_t observations (which determine when the model is optimal), called the checking observations. This is a common cross-validation strategy for determining when models are optimal: building the model from one set of observations (the training set) and checking the model against independent observations (the checking set). See Figure 1.

Figure 1: Input Data for the GMDH Algorithm

The algorithm begins by finding the least-squares regression polynomial of the form

    y = A + B x_i + C x_j + D x_i^2 + E x_j^2 + F x_i x_j    (3)

for each of the C(m,2) = m(m-1)/2 pairs of distinct independent variables x_i, x_j, using the n_t observations in the training set. These m(m-1)/2 regression surfaces are illustrated in Figure 2.

Figure 2: Computed Quadratic Regression Surfaces

One now evaluates each of the C(m,2) regression polynomials at all n data points and stores these values (the new generation of variables) in the columns of a new n × C(m,2) array, say Z. The evaluation of the first regression polynomial and the storage of its n values in the first column of Z is illustrated in Figure 3.

Figure 3: Evaluating the C(m,2) Quadratic Regression Polynomials
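
To make Step 1 concrete, the following sketch fits the quadratic polynomial (3) for every pair of predictors on the training rows and evaluates each fitted surface at all n observations to form the array Z. This is a minimal illustration rather than the program used in this paper; the names (X, y, n_t, gmdh_step1) and the use of numpy's least-squares routine are illustrative assumptions.

    import numpy as np
    from itertools import combinations

    def gmdh_step1(X, y, n_t):
        """Fit the quadratic surface (3) for each pair of columns of X on the
        first n_t (training) rows and evaluate it at all n rows, giving Z."""
        n, m = X.shape
        Z_cols, coefs, pairs = [], [], []
        for i, j in combinations(range(m), 2):
            xi, xj = X[:, i], X[:, j]
            # design matrix for y = A + B*xi + C*xj + D*xi^2 + E*xj^2 + F*xi*xj
            D = np.column_stack([np.ones(n), xi, xj, xi**2, xj**2, xi * xj])
            beta, *_ = np.linalg.lstsq(D[:n_t], y[:n_t], rcond=None)  # training rows only
            Z_cols.append(D @ beta)   # evaluate at all n observations
            coefs.append(beta)        # keep A..F for the compositions of Step 4
            pairs.append((i, j))
        return np.column_stack(Z_cols), coefs, pairs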

The object is to keep only the best of these new columns, and this is where the checking set comes into play.

Step 2 (screening out the least effective variables)

This step replaces the original variables (the columns of X) by those columns of Z that best predict y, based on the training-set observations. This is done by computing for each column of Z some measure of association, say the root mean square r_j given by

    r_j = [ \sum_{i=1}^{n_t} (y_i - z_{ij})^2 / \sum_{i=1}^{n_t} y_i^2 ]^{1/2},   j = 1, 2, ..., C(m,2)    (4)

and then selecting those columns of Z that satisfy r_j < R, where R is some prescribed number. The number of columns of Z that replace the columns of X may be larger or smaller than the number of columns of X, although often one simply chooses m columns of Z to replace the m columns of X, thus keeping the number of predictor variables constant.

Step 3 (test for optimality)

We now cross-validate the model by computing the goodness of fit of the new variables summed over the checking set. That is, we compute

    R_j = [ \sum_{i=n_t+1}^{n} (y_i - z_{ij})^2 / \sum_{i=n_t+1}^{n} y_i^2 ]^{1/2},   j = 1, 2, ..., C(m,2)

As in Step 2, we find the smallest of these root mean squares R_j, call it RMIN, and at each generation (iteration) plot this value as shown in Figure 4.
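
A corresponding sketch of Steps 2 and 3 follows: the training-set criterion (4) screens the columns of Z, and its checking-set analogue yields RMIN. Keeping the m columns with the smallest r_j (rather than thresholding at a prescribed R) is one of the two options mentioned above; the function below is an illustrative sketch under that assumption, with names of our own choosing.

    import numpy as np

    def screen_and_check(Z, y, n_t, m_keep):
        """Steps 2-3: rank the columns of Z by the training-set criterion (4),
        keep the m_keep best, and compute RMIN on the checking set."""
        y_tr, y_ch = y[:n_t], y[n_t:]
        # training-set root mean squares r_j of equation (4)
        r = np.sqrt(((y_tr[:, None] - Z[:n_t]) ** 2).sum(axis=0) / (y_tr ** 2).sum())
        keep = np.argsort(r)[:m_keep]        # alternatively: keep = np.where(r < R)[0]
        # checking-set criterion R_j; its smallest value is RMIN
        R = np.sqrt(((y_ch[:, None] - Z[n_t:]) ** 2).sum(axis=0) / (y_ch ** 2).sum())
        rmin = R.min()
        return Z[:, keep], keep, rmin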

Figure 4: Determining the Optimal Polynomial

The values of RMIN will decrease for a few iterations (perhaps 3 to 5 iterations) but then start to increase when the model begins to overfit the observations on which it was built. Hence, one stops the procedure when the RMIN curve reaches its minimum and selects the column of Z having the smallest value of R_j as the best predictor. When the algorithm stops, the quadratic regression polynomials found at each generation have been stored, and hence by composition one can form a high-order regression polynomial of the form

    y = a + \sum_{i=1}^{m} b_i x_i + \sum_{i=1}^{m} \sum_{j=1}^{m} c_{ij} x_i x_j + \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{k=1}^{m} d_{ijk} x_i x_j x_k + ...    (5)

known as the Ivakhnenko polynomial, which best predicts Y from X. At each iteration the degree of the Ivakhnenko polynomial doubles, and for a p-th order regression polynomial the number of terms in the polynomial will be (m+1)(m+2)...(m+p)/p!. If one started with m = 10 input variables and the algorithm went through 4 generations, the Ivakhnenko polynomial would be of degree 2^4 = 16 and would contain terms such as x_1 x_3^2 x_7.
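
The term-count formula above can be checked directly; the short function below is only a small illustration, using the fact that (m+1)(m+2)...(m+p)/p! equals the binomial coefficient C(m+p, p).

    from math import comb, factorial

    def ivakhnenko_terms(m, p):
        """Number of terms in a complete polynomial of degree p in m variables:
        (m+1)(m+2)...(m+p)/p!, which equals C(m+p, p)."""
        prod = 1
        for k in range(1, p + 1):
            prod *= m + k
        return prod // factorial(p)

    # Example from the text: m = 10 inputs and 4 generations give degree p = 2**4 = 16
    assert ivakhnenko_terms(10, 16) == comb(26, 16)
    print(ivakhnenko_terms(10, 16))   # 5311735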

Step 4 (applying the results of the GMDH algorithm)

One doesn't actually compute the coefficients in the Ivakhnenko polynomial, but instead saves the regression coefficients A, B, C, D, E, F at each generation. Hence, to evaluate the Ivakhnenko polynomial and use the model as a predictor of Y from new observations, one simply carries out repeated compositions of these quadratic expressions. Figure 5 illustrates this process.

Figure 5: Evaluation of the Ivakhnenko Polynomial

3. Applying GMDH to Logical Regression

We now use the ideas of the GMDH algorithm to find logical expressions among the Boolean variables X_1, X_2, ..., X_m that best predict Y. Starting with n observations of Y and X_1, X_2, ..., X_m, we subdivide the observations into n_t training observations and n_c = n - n_t checking observations. We then use the observations in the training set to determine, for each of the C(m,2) pairs of predictor variables X_i, X_j, how well each of the eight binary functions

    f_1(X_i, X_j), f_2(X_i, X_j), ..., f_8(X_i, X_j)    (6)

predicts Y. We do this by assigning a 1 to an observation if Y = f(X_i, X_j), where f(X_i, X_j) is one of the eight binary functions. Carrying out this operation for each of the n observations, for each of the eight binary functions, and for each pair of predictor variables yields an n × 8C(m,2) matrix of 0's and 1's, which we call Z. Since the 1's in the columns of Z represent correct predictions of Y for a given logical function and pair of predictor variables, we sum the columns of Z over the training set, rank them in descending order, and select the m largest ones; that is, we keep the m columns with the largest sums. These columns of Z with the largest sums represent those logical relations between a given two variables that best predict Y. Typical examples might be X_3 ∧ X_7 or X_5 ∨ X_7.

We now replace the original data X by the m best columns of Z. This gives us a new data set whose columns are evaluations of the best logical relationships of the original variables, and hence should act as better predictors of Y than the original observations. We then repeat this process again and again, each time finding new logical relations of the previous variables, which in turn are logical relations of earlier variables. Before starting each new iteration, however, we check the best predicted values of Y (the first column of the matrix Z in the last n_c rows) against the n_c observations of Y in the checking set to determine goodness of fit. When the percentage of correct predictions reaches a maximum, the process is stopped. At this time we have logical expressions L_j for estimating Y, ordered from top to bottom. We then continue the process by finding the linear regression equation

    g(E[Y]) = b_0 + b_1 L_1 + b_2 L_2 + ... + b_m L_m + a_0 + a_1 W_1 + a_2 W_2 + ... + a_p W_p    (7)

where the variables W_1, W_2, ..., W_p are continuous variables that we can (possibly) add to the Boolean variables L_1, L_2, ..., L_m. We then find the coefficients b_j, a_j by the usual linear regression, using the original data for Y along with the data for the continuous variables W_1, W_2, ..., W_p.
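
The iteration just described can be sketched compactly. The code below is only an illustration, not the program used in Section 4: for concreteness the eight binary functions f_1, ..., f_8 of (6) are taken here to be the conjunctions and disjunctions of each variable and its negation, and the names logic_gmdh, m_keep, and n_iter are assumptions. The final regression step (7) would follow with an ordinary least-squares fit of Y on the retained columns.

    import numpy as np
    from itertools import combinations

    # eight binary Boolean functions of (a, b); an illustrative choice for (6)
    BINARY_FUNCS = [
        lambda a, b: a & b,  lambda a, b: a & ~b,  lambda a, b: ~a & b,  lambda a, b: ~a & ~b,
        lambda a, b: a | b,  lambda a, b: a | ~b,  lambda a, b: ~a | b,  lambda a, b: ~a | ~b,
    ]

    def logic_gmdh(X, Y, n_t, m_keep, n_iter):
        """GMDH-style search for logical predictors of the Boolean variable Y."""
        X = X.astype(bool)
        Y = Y.astype(bool)
        cols = [X[:, k] for k in range(X.shape[1])]   # current generation of Boolean columns
        best_check = -1.0
        for _ in range(n_iter):
            cand = [f(cols[i], cols[j])
                    for i, j in combinations(range(len(cols)), 2)
                    for f in BINARY_FUNCS]
            # agreement matrix Z: Z[obs, col] = 1 where the candidate equals Y
            Z = np.column_stack([c == Y for c in cand])
            order = np.argsort(-Z[:n_t].sum(axis=0))  # rank by training-set agreement
            keep = order[:m_keep]
            check_acc = Z[n_t:, keep[0]].mean()       # best column scored on the checking set
            if check_acc <= best_check:               # stop when checking accuracy peaks
                break
            best_check = check_acc
            cols = [cand[k] for k in keep]            # new generation replaces the old
        return cols, best_check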

4. Simulation of the Process

We generated ten data sets, each with 250 observations and ten predictor variables. The predictor variables X_1, X_2, ..., X_10 were independent Boolean variables (p = 0.5), and the dependent variable Y was determined by

    Y = (A ∧ B) ∨ (C ∧ D)            with probability q
        a Bernoulli(0.5) variable    with probability 1 - q    (8)

where 0 <= q <= 1 was chosen (generally about 0.7) and A, B, C, D were any four of the variables X_1, X_2, ..., X_10, possibly repeated, or their negations.

Example: We generated 250 observations of 10 independent variables, each with a Bernoulli distribution (p = 0.5), and a dependent variable Y generated by

    Y = (X_1 ∧ X_2) ∨ (X_3 ∧ X_4)    with probability q
        a Bernoulli(0.5) variable    with probability 1 - q

Input:

input number of independent variables: 10
input number of observations: 250
input number of observations in the training set: 150
input p = Bernoulli parameter of the independent variables (maybe p = .5): .5
input q = probability of picking Y as a logical relation of the independent variables: .9
input the first of the 4 variables to be used: 1
input the second of the 4 variables to be used: 2
input the third of the 4 variables to be used: 3
input the fourth of the 4 variables to be used: 4
input number of iterations: 5
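
Before turning to the output, the data-generating step of this example can be illustrated as follows. This is a sketch of the mechanism in (8) with the logical form (X_1 ∧ X_2) ∨ (X_3 ∧ X_4) used in the example; the function name make_dataset and the fixed seed are illustrative, not part of the paper's program.

    import numpy as np

    def make_dataset(n=250, m=10, p=0.5, q=0.9, seed=0):
        """Data in the spirit of (8): Y = (X1 and X2) or (X3 and X4) with
        probability q, otherwise an independent Bernoulli(0.5) coin flip."""
        rng = np.random.default_rng(seed)
        X = rng.random((n, m)) < p                    # independent Bernoulli(p) predictors
        signal = (X[:, 0] & X[:, 1]) | (X[:, 2] & X[:, 3])
        noise = rng.random(n) < 0.5
        use_signal = rng.random(n) < q
        Y = np.where(use_signal, signal, noise)
        return X.astype(int), Y.astype(int)

    X, Y = make_dataset()   # 250 observations, 10 Boolean predictors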

Output:

Logical Regression
number of variables = 10
number of observations = 250
number of observations in the training set = 150
number of observations in the checking set = 100

Iteration 1 (best 3 predictors)
X_3 ∧ X_4 predicts the dependent variable in the training set 76% of the time
X_1 ∧ X_2 predicts the dependent variable in the training set 75% of the time
X_2 ∧ X_3 predicts the dependent variable in the training set 68% of the time
best predictor X_3 ∧ X_4 predicts the checking-set dependent variable 75% of the time

Iteration 2 (best 3 predictors)
(X_1 ∧ X_2) ∨ (X_3 ∧ X_4) predicts the dependent variable in the training set 95% of the time
(X_1 ∧ X_2) ∨ (X_4 ∧ X_3) predicts the dependent variable in the training set 95% of the time
(X_3 ∧ X_4) ∨ (X_1 ∧ X_2 ∧ X_3) predicts the dependent variable in the training set 93% of the time
best predictor (X_1 ∧ X_2) ∨ (X_3 ∧ X_4) predicts the checking-set Y 93% of the time

coefficients in the linear combination of logical functions (the first one is a constant term):
0.034356, 0.7950, 0.10, 0.019969, 0.0080103, -0.077889, -0.078816, 0.0030564, 0.058093, 0.0577, -0.03567

stopped after 2 iterations

References:

Farlow, S. J., 1984, Self-Organizing Methods in Modeling: GMDH Type Algorithms, Marcel Dekker.

Ruczinski, I., Kooperberg, C., and LeBlanc, M., 2003, Logic Regression, J. Comp. Graph. Statist. 12.
