Koza's Algorithm

Step 1 of Koza's Algorithm. Choose a set of possible functions and terminals for the program. You don't know ahead of time which functions and terminals will be needed, so the user needs to make intelligent choices for best GP performance. For the planetary orbital problem we guessed that the function set is {+, -, *, /, sqrt} and the terminal set is {A}. (If you add more functions and terminals, the problem takes longer to compute a good answer.) handout 11-18-11

Step 2. Generate an initial population of random trees (programs) using the set of possible functions and terminals. Random trees must be syntactically correct programs: the number of branches extending from each function node must equal the number of arguments taken by that function. (Three such random trees are given in Fig. 2.2.) Notice that randomly generated programs can be of different sizes (i.e., they can have different numbers of nodes and levels in the trees).
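As a minimal sketch of this step (the tuple encoding and the leaf probability are illustrative assumptions, not from the handout), random syntactically correct trees over the orbital-problem sets can be grown like this:

```python
import random

# Function set maps each symbol to its arity; terminal set is {A}.
FUNCTIONS = {'+': 2, '-': 2, '*': 2, '/': 2, 'sqrt': 1}
TERMINALS = ['A']

def random_tree(max_depth):
    """Grow a random, syntactically correct tree as a nested tuple."""
    if max_depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)   # leaf: a terminal
    f = random.choice(list(FUNCTIONS))
    # each function node gets exactly as many children as its arity
    return (f,) + tuple(random_tree(max_depth - 1)
                        for _ in range(FUNCTIONS[f]))
```

Because the recursion always gives a function node exactly `FUNCTIONS[f]` children, every generated tree is a syntactically correct program, and different calls naturally produce trees of different sizes.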

Repeated Slide: Three possible solutions to the relationship between P and A 3

Koza's Algorithm (continued). Step 3. Calculate the fitness of each program in the population by running it on a set of fitness cases (a set of inputs for which the correct output is known). Fig. 2.2 gives the fitness for the three random programs generated. The differences in fitness among members of the population are the basis for natural selection.
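A hedged sketch of this step (the nested-tuple encoding and the "protected" operators are illustrative assumptions, not Koza's exact scheme): evaluate each program on the fitness cases and sum its errors.

```python
import math

# Trees are nested tuples, e.g. ('sqrt', ('*', 'A', ('*', 'A', 'A'))),
# over the function set {+, -, *, /, sqrt} and the single terminal A.
def evaluate(node, A):
    if node == 'A':
        return A
    op, *args = node
    v = [evaluate(a, A) for a in args]
    if op == '+':    return v[0] + v[1]
    if op == '-':    return v[0] - v[1]
    if op == '*':    return v[0] * v[1]
    if op == '/':    return v[0] / v[1] if v[1] != 0 else 1.0  # "protected" division
    if op == 'sqrt': return math.sqrt(abs(v[0]))               # "protected" square root

def fitness(tree, cases):
    """Sum of absolute errors over the fitness cases (lower is better)."""
    return sum(abs(y - evaluate(tree, x)) for x, y in cases)
```

For instance, the tree `('sqrt', ('*', 'A', ('*', 'A', 'A')))` computes sqrt(A^3), so its fitness on cases drawn from Kepler's relationship is essentially zero.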

Step 4. Apply selection, crossover, and (perhaps) mutation to the population of random programs to form a new population. It is recommended that 10% of the trees in the population (chosen probabilistically in proportion to fitness) be copied without modification into the new population. (How is this related to elitism?) The remaining 90% of the new population is formed by crossovers between parents selected (in proportion to fitness) from the current population. Fig. 2.3 shows an example of crossover.

Crossover. Crossover allows the size of a program to increase or decrease. (In crossover, the entire subtree from the crossover point down to the terminals is exchanged between the parents.) Mutation can be performed by choosing a random point in a tree and replacing the subtree beneath that point with a randomly generated subtree. Koza typically does not use a mutation operator. Steps 3 and 4 are repeated until a stopping criterion is satisfied.
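Subtree crossover on nested-tuple trees might be sketched as below (the path-based node addressing is an illustrative assumption; only the subtree swap itself comes from the slide):

```python
import random

def all_nodes(tree, path=()):
    """Yield the path (sequence of child indices) to every node in the tree."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from all_nodes(child, path + (i,))

def get(tree, path):
    """Return the subtree at the given path."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, subtree):
    """Return a copy of tree with the subtree at path replaced."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], subtree),) + tree[i + 1:]

def crossover(p1, p2):
    """Swap a random subtree of p1 with a random subtree of p2."""
    c1 = random.choice(list(all_nodes(p1)))
    c2 = random.choice(list(all_nodes(p2)))
    return replace(p1, c1, get(p2, c2)), replace(p2, c2, get(p1, c1))
```

Because whole subtrees of different depths are exchanged, the offspring can be larger or smaller than either parent, which is exactly how crossover lets program size grow or shrink.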

Repeated Slide: Crossover in Genetic Programming (figure showing parents, crossover point, and offspring)

Best Solution Obtained Earlier for the Relationship between P and A: P = (A^3)^(1/2) = (SQRT (* A (* A A))) (Lisp)

Was the GP solution P = sqrt(A^3) correct? Kepler's Third Law says P^2 = cA^3, based on physics and mathematics, and units can be selected so that c = 1, giving P = sqrt(A^3). The genetic program found this as the best solution after some generations. So does that mean we don't need theoretical thinkers?

Symbolic Regression. Genetic programming can be applied to many types of problems. One of these areas is what Koza called symbolic regression (e.g., the orbital problem). You are familiar with the idea of regression in statistics. For example, in linear regression you want to compute the coefficients a, b, c, ..., z such that Y = a x + b x^2 + ... + z x^n

For nonlinear regression, the expressions on the right side of the equation can be sums of nonlinear terms like polynomials, but in regression you assume that you know the form of the equation and that you are looking only for the coefficient values. (Some of the coefficients might be zero, thereby dropping some terms.) So nonlinear regression does not include some types of regression equations that could be included in symbolic regression, for example f(x) = sin(x)/(cos(x) + 12x + x^2).

Genetic Programming: Symbolic Regression. The difference with symbolic regression by genetic programming is that the mathematical form as well as the coefficients are selected by the algorithm, since your decision variable (which we will call an "S-expression") is composed of different algebraic expressions (e.g., including terms like x/y, cos x, or exp(x - y/z)). What possible advantage might that approach have?

GP Symbolic Regression Selects the Form as well as the Coefficients. In symbolic regression, you are trying to find the form of the equation as well as the coefficients. This is a much harder problem: it is data-to-function regression. It can be done in general for multidimensional input and output. Genetic programming can be used to solve symbolic regression problems.

Symbolic Regression Problem. For this first example we will take a one-dimensional problem, so x and y are scalars. Assume we are given 20 pairs of data (x_k, y_k), k = 1, ..., 20. We want to find a function f such that y = f(x) + error, with error less than 0.01.

First Steps. In general we do not know the mathematical expression for f. However, in this example we are going to use 20 pairs of data (x_k, y_k) where the value of y_k has been generated from y_k = (x_k)^4 + (x_k)^3 + (x_k)^2 + x_k + 1. Hence the best expression for f is this 4th-order polynomial (but the GP user does not know this). We will see if the genetic programming approach discovers that this is the best form for f. (The GP does not know before it starts calculating that a 4th-order polynomial is the form of f.)
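The experimental setup can be sketched in a few lines (the choice of evenly spaced sample points on [0, 1] is an assumption; the handout only says the points lie between 0 and 1):

```python
# Hidden target the GP is supposed to rediscover.
def target(x):
    return x**4 + x**3 + x**2 + x + 1

# 20 fitness cases (x_k, y_k) sampled on [0, 1].
cases = [(k / 19, target(k / 19)) for k in range(20)]
```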

Fitness in the Symbolic Regression Example. Fitness is calculated by comparing the value of the true function y(x) to the population member s_k(x) at twenty points {x_j} between 0 and 1. So fitness is

fitness(s_k) = Σ_{j=1}^{20} |y(x_j) − s_k(x_j)|
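This raw-fitness sum might be computed as below (a sketch with illustrative names; the use of absolute errors matches Koza's usual raw-fitness definition, and evenly spaced points on [0, 1] are an assumption):

```python
# s is a candidate function, y the true function, compared at 20 points in [0, 1].
def raw_fitness(s, y, n_points=20):
    xs = [j / (n_points - 1) for j in range(n_points)]
    return sum(abs(y(x) - s(x)) for x in xs)
```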

Generation 0: two examples of individual programs (= functions)

Best-of-Generation Individual from Generation 0

Results for the best-of-generation individual from generation 0, with T = s(x). You compute fitness by adding all the numbers in the 4th row (but there would be 20 values, not just the 5 values shown here).

Best of Generation 2: x^4 + 1.5x^3 + 0.5x^2 + x. Raw fitness is 2.57 (improved from 4.47 for the best individual of generation 1). Notice that we now have coefficients other than 1. How were they generated, given that we started only with coefficients of 1?

Fitness Curves for the simple symbolic regression problem. Usually we only plot the bottom line, which looks flat because of the scale. The bad solutions are REALLY bad.

Best Solution from Generation 34. What does this equal? Are there unnecessary terms?

Video of Applications. Econometrics: forecasting prices as a function of gross national product and money supply. Fit an equation to the past and then look at its ability to forecast.

Best Method for the Problem. If you know that the best function is a polynomial, would genetic programming be the best method to solve this problem? Why or why not? When would you recommend using genetic programming for symbolic regression?