12 Symbolic Regression


Computational Geosciences, PART I: METHODS

12-1 Introduction

Symbolic regression is the process of determining the symbolic function which describes a data set, effectively developing an analytic model which summarizes the data and is useful for predicting response behaviors as well as facilitating human insight and understanding. The symbolic regression approach adopted herein is based upon genetic programming, wherein a population of functions is allowed to breed and mutate, with the genetic propagation into subsequent generations based upon a survival-of-the-fittest criterion. Amazingly, this works and, although computationally intensive, summary solutions may be reasonably discovered using current laptop and desktop computers. It should be noted that symbolic regression is not a silver bullet; it is, in fact, complementary to its nonlinear analysis brethren of neural networks and statistical learning theory (support vector machines), as well as to classical linear analysis and data mining tools and techniques. In all cases, an understanding of the problem, the data variables, the data quality, and the definitions of success is critical, since context-free analysis can easily lead to confidently wrong answers, i.e., is very dangerous.

The symbolic regression capability provides a complete and flexible foundation for evolving multivariate data-driven models. The decision to develop this symbolic regression capability within the Mathematica environment was validated during the development of this package, as the ability to seamlessly blend symbolic and numeric computing has enabled new features which would have been onerous to implement in a strictly procedural and numeric environment.
Additional benefits include facilitating analysis documentation.

Symbolic Regression (SR) Method

Unlike traditional linear and nonlinear regression methods that fit parameters to an expression (equation/relation) of a given form, symbolic regression (SR) simultaneously searches for both the parameters and the form of the expression. Although the discipline of SR has matured significantly in the last few years (e.g., Davidson et al. 2003), its applications to geodesy are very rare, exemplified only in the works of Wu et al. (2007, 2008), who employed it for transforming GPS coordinates into two-dimensional coordinates. In a more recent work, Wu and Su (2013) developed a lattice-based clustering method and integrated it with genetic programming to build a better regression model for coordinate transformation. In the natural and technical sciences, and in finance, however, the method has been applied efficiently for quite a long time (e.g., Santini and Tettamanzi (2001); Banks (2002); Kwon and Moon (2005); Babu and Karthik (2007); Schmidt and Lipson (2009); Langdon and Gustafson (2010); Garg and Tai (2011)). In the hydrological sciences, for example, Parasuraman et al. (2007) applied it to model the dynamics of evapotranspiration, where their results performed better than the traditional Penman-Monteith method and were comparable to those of an artificial neural network (ANN). The proposed method could be of use to geodesy, where regression analysis and functional approximation are often needed. For instance, SR could be used for gravimetric corrections, which have traditionally been carried out using a wide variety of parametric and non-parametric surfaces such as polynomial models (e.g., Fotopoulos and Sideris 2005), spline interpolation

(e.g., Featherstone 2000), least squares collocation (LSC) (e.g., Iliffe et al. 2003), Kriging (e.g., Nahavandchi and Soltanpour 2004), combined least squares adjustments (e.g., Fotopoulos 2005) and thin plate spline (TPS) surfaces, solving the problem via the finite element method (e.g., Zaletnyik et al. 2007). Applying soft computing techniques, Kavzoglu and Saka (2005) and Lao-Sheng Lin (2007) employed artificial neural networks (ANN) for approximating the GPS/leveling geoidal heights instead of the corrector surface itself. Zaletnyik et al. (2007) also used an ANN, but with radial basis activation functions (RBF) and regularization in the training phase. Soltanpour et al. (2006) used second generation wavelets to approximate the corrector surface directly. Another soft computing technique, represented by support vector machines (SVM), was employed by Zaletnyik et al. (2008). Some of these models are global types of regression (e.g., linear parametric models and ANN), while some are local interpolating models (e.g., thin plate spline and SVM). Generally speaking, local methods are more precise than global ones on the training set, but their complexity is very high, since they involve all of the measured data points in the model structure. In addition, they are less effective on the validation set, since they overlearn the training set. The foregoing discussions support the need for the proposed SR method, particularly in geodesy, where it is of utmost need but rarely used.

SR can be considered a broad generalization of the class of Generalized Linear Models (GLM). A GLM is a linear combination of n basis functions β_i, i = 1, 2, ..., n, with a dependent variable y and an independent vector variable x,

    y(x) = c_0 + Σ_{i=1}^{n} c_i β_i(x) + ε

where the c_i are the coefficients and ε is the error term.
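When the basis functions β_i are fixed in advance, estimating the c_i is an ordinary least squares problem; SR differs in that it also searches for the β_i themselves. A minimal sketch of the fixed-basis case (the basis set and coefficient values here are made up for illustration):

```python
import numpy as np

# Hypothetical basis set chosen in advance; SR would have to discover these forms itself.
basis = [lambda t: t, lambda t: t**2, np.sin]

x = np.linspace(0.0, 4.0, 40)
y = 2.0 + 0.5 * x - 0.25 * x**2 + 3.0 * np.sin(x)   # "true" model: c0=2, c=(0.5, -0.25, 3)

# Design matrix: a constant column plus one column per basis function beta_i
A = np.column_stack([np.ones_like(x)] + [b(x) for b in basis])

# Least squares estimate of (c0, c1, ..., cn); the *form* itself is never questioned
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coeffs, 6))   # recovers 2, 0.5, -0.25, 3
```

If the chosen basis does not span the true relation, no amount of coefficient fitting can repair it; this is exactly the gap SR is meant to close.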
SR will search for a set of basic functions (building blocks) and coefficients (weights) in order to minimize the error ε for given y and x. The standard basic functions are constant, addition, subtraction, multiplication, division, sine, cosine, tangent, exponential, power, square root, etc. To select the optimal set of basic functions, Koza (1992) suggested the employment of genetic programming (GP). GP is a biologically inspired machine learning method that evolves computer programs to perform a task. In order to carry out genetic programming, the individuals (competing functions) are represented by binary trees. In standard GP, the leaves of the binary tree are called terminal nodes and are represented by variables and constants, while the other nodes, the so-called non-terminal nodes, are represented by functions. Let us see a simple example: consider the basic function β_i(x) whose binary tree representation can be seen in Fig. 12.1.

Fig. 12.1 The binary tree representation of a basic function β_i

In this example, there are three variables (x1, x2, x3), two constants (1, 2), and three elementary functions (plus, times, rational). The binary tree of y(x) can be built up from such trees as sub-trees. Mathematically,
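Such expression trees can be sketched with nested tuples: non-terminal nodes carry a function, terminal nodes carry variables or constants. The basic function (x1 + 2 x2)/x3 below is a made-up example in the same spirit as Fig. 12.1, not the figure itself:

```python
import operator

# Non-terminal nodes are functions; terminal nodes (leaves) are variables or constants
OPS = {'plus': operator.add, 'times': operator.mul, 'div': operator.truediv}

# The hypothetical basic function (x1 + 2*x2)/x3 as a nested-tuple binary tree
tree = ('div', ('plus', 'x1', ('times', 2, 'x2')), 'x3')

def evaluate(node, env):
    """Recursively evaluate a tree; leaves are looked up in env or taken as constants."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    return env.get(node, node)

def node_count(node):
    """Number of nodes in the tree, i.e. the complexity measure used in SR."""
    if isinstance(node, tuple):
        return 1 + node_count(node[1]) + node_count(node[2])
    return 1

print(evaluate(tree, {'x1': 4.0, 'x2': 1.0, 'x3': 2.0}))  # (4 + 2*1)/2 = 3.0
print(node_count(tree))                                    # 7 nodes
```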

    y(x) = c_0 + c_1 tree_1 + c_2 tree_2 + ...

GP randomly generates a population of individuals (y_k(x), k = 1, 2, ..., n) represented by tree structures in order to find the best performing trees. There are two important features of a function represented by a binary tree: complexity and fitness. We define complexity as the number of nodes in the binary tree needed to represent the function, Morales (2004). The fitness qualifies how good a model (y = y(x)) is. Basically, there are two types of measures used in SR: the root mean squared error (RMSE) and the R-square. The latter returns the square of the Pearson product moment correlation coefficient (R), describing the correlation between the predicted values and the target values (e.g., Ferreira (2006)). Then the goodness of the model, the fitness function, can be defined as

    f = 1/(1 + RMSE)   or   f = R²,   where 0 ≤ f ≤ 1.

GP tries to minimize this error, improving the fitness of the population of individuals (competing functions) from generation to generation by the mutation and cross-over procedures. Mutation is an eligible random change in the structure of the binary tree, applied to a randomly chosen sub-tree of the individual. This sub-tree is removed from the individual and replaced by a new, randomly created sub-tree. This operation leads to a slightly (or even substantially) different basic function. Let us consider the binary tree in Fig. 12.2a, where the sub-tree y² is replaced by y + x². The mutated binary tree can be seen in Fig. 12.2b.

Fig. 12.2 Binary tree representations of the mutation. y² in a) is replaced by y + x² in b)

The cross-over operation, representing sexuality, can accelerate the improvement of the fitness of a function more effectively than mutation alone can do. It is a random combination of two different basic functions (parents), based on their fitness, in order to create a new generation of functions fitter than the original ones.
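The fitness measure f = 1/(1 + RMSE) and a random sub-tree mutation can be sketched as follows; the tuple tree encoding and the pool of replacement sub-trees are illustrative choices, not DataModeler's internals:

```python
import random

OPS = {'plus': lambda a, b: a + b, 'times': lambda a, b: a * b}

def evaluate(node, env):
    if isinstance(node, tuple):
        return OPS[node[0]](evaluate(node[1], env), evaluate(node[2], env))
    return env.get(node, node)

def mutate(node, rng, p=0.3):
    """With probability p, replace the current sub-tree by a randomly created one."""
    if not isinstance(node, tuple) or rng.random() < p:
        return rng.choice(['x', 'y', 1, 2, ('plus', 'x', 'y'), ('times', 'y', 'y')])
    op, left, right = node
    if rng.random() < 0.5:
        return (op, mutate(left, rng, p), right)
    return (op, left, mutate(right, rng, p))

def fitness(tree, data, target):
    """f = 1/(1 + RMSE): 0 < f <= 1, and f = 1 means a perfect fit."""
    errs = [evaluate(tree, env) - t for env, t in zip(data, target)]
    rmse = (sum(e * e for e in errs) / len(errs)) ** 0.5
    return 1.0 / (1.0 + rmse)

rng = random.Random(1)
parent = ('plus', ('times', 'y', 'y'), 'x')          # y^2 + x
data = [{'x': float(i), 'y': float(j)} for i in range(3) for j in range(3)]
target = [env['y'] ** 2 + env['x'] for env in data]

print(fitness(parent, data, target))                 # 1.0, a perfect individual
print(fitness(mutate(parent, rng), data, target))    # usually below 1.0
```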
To carry out a cross-over, crossing points (non-terminal nodes) in the trees of both parents are randomly selected, as can be seen in Fig. 12.3. Then the sub-trees belonging to these nodes are exchanged, creating offspring. Let us consider the parents before the cross-over. The first parent, (x − y)/3, with its crossing point (x), is shown in Fig. 12.3a; the second parent, 3x + y²/5, with its crossing point (the sub-tree y²), is presented in Fig. 12.3b.

Fig. 12.3 The parents before the cross-over

The children produced by the cross-over are given in Fig. 12.4. The first child, (y² − y)/3, is shown in Fig. 12.4a, while the second child, 3x + x/5 = (16/5)x, is shown in Fig. 12.4b.

Fig. 12.4 The children produced by the cross-over

GP was invented by Cramer (1985) and further developed by Koza (1992). GP is a class of evolutionary algorithms working on executable tree structures (parse trees). Koza (1992) showed that GP is capable of doing symbolic regression (or function identification) by generating mathematical expressions approximating a given sample set very closely, or in some cases even perfectly. Therefore, GP finds the entire approximation model and its (numerical) parameters simultaneously. An important goal in symbolic regression is to obtain a solution which is numerically robust and does not require high levels of complexity to give accurate output values for given input parameters. Small mean errors may lead to wrong assumptions about the real quality of the found expressions. To be on the safe side, a worst-case absolute error should be determined. Sometimes, alternating between the worst-case absolute error and the RMSE or R² as the target to be minimized during the GP process is the best strategy. Complexity and fitness are conflicting features, leading to a multiobjective problem (e.g., Smits and Kotanchek (2004)). A useful expression is both predictive and parsimonious. Some expressions may be more accurate but overfit the data, whereas others may be more parsimonious but oversimplify. The prediction error versus complexity (or (1 − fitness) versus complexity) Pareto front represents the optimal solutions as they vary over expression complexity and maximum prediction error (e.g., Paláncz and Awange (2012)).
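The cross-over example above can be reproduced with a small sketch: the parents (x − y)/3 and 3x + y²/5 exchange the sub-trees x and y², yielding the children (y² − y)/3 and 3x + x/5 = (16/5)x. The tuple encoding is again only illustrative:

```python
import operator

OPS = {'plus': operator.add, 'minus': operator.sub,
       'times': operator.mul, 'div': operator.truediv}

def evaluate(node, env):
    if isinstance(node, tuple):
        return OPS[node[0]](evaluate(node[1], env), evaluate(node[2], env))
    return env.get(node, node)

def get(node, path):
    """Fetch the sub-tree at a path of child indices (1 = left, 2 = right)."""
    for i in path:
        node = node[i]
    return node

def replace(node, path, sub):
    """Return a copy of the tree with the sub-tree at `path` replaced by `sub`."""
    if not path:
        return sub
    children = list(node)
    children[path[0]] = replace(node[path[0]], path[1:], sub)
    return tuple(children)

def crossover(p1, path1, p2, path2):
    """Exchange the sub-trees at the chosen crossing points of the two parents."""
    s1, s2 = get(p1, path1), get(p2, path2)
    return replace(p1, path1, s2), replace(p2, path2, s1)

# Parents from the text: (x - y)/3 with crossing point x, and 3x + y^2/5
# with crossing point y^2
p1 = ('div', ('minus', 'x', 'y'), 3)
p2 = ('plus', ('times', 3, 'x'), ('div', ('times', 'y', 'y'), 5))
c1, c2 = crossover(p1, (1, 1), p2, (2, 1))

env = {'x': 2.0, 'y': 3.0}
print(evaluate(c1, env))   # (y^2 - y)/3 = (9 - 3)/3 = 2.0
print(evaluate(c2, env))   # 3x + x/5 = (16/5)x = 6.4
```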
As Fig. 12.5 shows, functions on the Pareto front have the following features: for fixed complexity, there is no solution (function) that could provide a smaller error than the Pareto solution;

conversely, for fixed error, there is no solution (function) that would have smaller complexity than the Pareto solution.

Fig. 12.5 The Pareto front

In order to illustrate the method, let us start with a simple, didactic example: Kepler's Third Law. This problem was considered and rediscovered by Langley et al. (1987) using a family of heuristic techniques, as well as by Koza (1992) using genetic programming.

Didactic example: Kepler's third law

Kepler's problem. The third law of Kepler states: "The square of the orbital period of a planet is directly proportional to the cube of the semi-major axis of its orbit (average distance from the Sun)."

    P² ∝ a³

where P is the orbital period of the planet and a is the semi-major axis of the orbit. For example, suppose planet A is 4 times as far from the Sun as planet B. Then planet A must traverse 4 times the distance of planet B each orbit, and moreover it turns out that planet A travels at half the speed of planet B, in order to maintain equilibrium with the reduced gravitational centripetal force due to being 4 times farther from the Sun. In total, it takes 4 · 2 = 8 times as long for planet A to travel an orbit, in agreement with the law (8² = 4³). The third law currently receives additional attention, as it can be used to estimate the distance from an exoplanet to its central star, and helps to decide whether this distance is inside the habitable zone of that star. The exact relation, which is the same for both elliptical and circular orbits, is given by the equation

    P² = 4π² a³ / (G (M + m))

where G is the gravitational constant and M and m are the masses of the star and the planet. This third law used to be known as the harmonic law, because Kepler enunciated it in a laborious attempt to determine what he viewed as the "music of the spheres" according to precise laws, and to express it in terms of musical notation.
His results were based on the Rudolphine Tables containing the observations of Tycho Brahe (1605); see Table 12.1.

Table 12.1 Normalized observed planetary data, where a is given in units of Earth's semi-major axis

Planet      Period P [yr]   Semi-major axis a
Mercury         0.24             0.39
Venus           0.61             0.72
Earth           1.00             1.00
Mars            1.88             1.52
Jupiter        11.86             5.20
Saturn         29.46             9.54
Uranus         84.01            19.19
Neptune       164.8             30.06
Pluto         248.5             39.53
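The law can be checked directly on the data of Table 12.1; in these Earth-normalized units P ≈ a^(3/2) for every planet. A quick sketch in Python (the chapter's own computations use Mathematica):

```python
# Semi-major axes [AU] and orbital periods [yr] from Table 12.1
a = [0.39, 0.72, 1.0, 1.52, 5.2, 9.54, 19.19, 30.06, 39.53]
P = [0.24, 0.61, 1.0, 1.88, 11.86, 29.46, 84.01, 164.8, 248.5]

# Kepler's third law P^2 ∝ a^3 becomes P ≈ a^(3/2) in these normalized units
worst = max(abs(ai**1.5 - Pi) / Pi for ai, Pi in zip(a, P))
for ai, Pi in zip(a, P):
    print(f"a = {ai:6.2f}   P = {Pi:8.2f}   a^1.5 = {ai**1.5:8.2f}")
print(f"worst relative deviation: {worst:.3f}")
```

Even Mercury, the worst case, deviates by under two percent, which is why the power law is so conspicuous once one looks for it.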

Let us assume that Kepler could have employed one of the function approximation techniques such as polynomial regression, artificial neural networks, support vector machines or thin plate splines. Could he have found this simple relation with these sophisticated methods?

Polynomial regression. Let us employ a fifth-order algebraic polynomial. The resulting polynomial has the form

    P = c_5 a⁵ + c_4 a⁴ + c_3 a³ + c_2 a² + c_1 a + c_0

Neural network. Employing a single-layer feedforward network (Kecman 2001) with two nodes where a sigmoid activation function is implemented, we obtained a model of the form

    P = w_1 σ(v_1 a + b_1) + w_2 σ(v_2 a + b_2) + b_0

Support vector machine. Employing a wavelet kernel (Palancz et al. 2005), the result is a sum of wavelet terms centered at the nine data points a_i = 0.39, 0.72, 1.0, 1.52, 5.2, 9.54, 19.19, 30.06, 39.53.

Thin plate spline. Employing polyharmonic interpolation (Fasshauer 2007), the approximation is

    P = Σ_{i=1}^{9} c_i φ_i,   where φ_i = r² log(r),   r = |a − a_i| for a ≠ a_i,   and φ_i = 0 for a = a_i.

As we can see, in this way one can hardly find a simple relation, although all of these methods give quite satisfactory solutions to the problem; see Table 12.2.

Table 12.2 Errors of the different methods of approximation of the observed planetary data (Model; 1 − R²; Standard Deviation) for the Polynomial Regression (1), Neural Network (2), Support Vector Machine (3) and Thin plate spline interpolation (4) models

where R stands for the Pearson product moment correlation coefficient. Remarks: (1) fifth-order algebraic polynomial; (2) feedforward neural network with two nodes and linear tail;

(3) with Gaussian kernel; (4) basis function: x² log(x).

Now let us see how we can solve this problem with symbolic regression.

Symbolic regression

Fig. 12.6 shows the Pareto front of the models generated via DataModeler. The points represent the generated models; the red points stand for the models belonging to the Pareto front.

Fig. 12.6 The Pareto front (red points) and the evaluated models in the case of Kepler's problem

In Table 12.3 some of the models on the Pareto front can be seen; they are built from terms such as x and x^(3/2).

Table 12.3 Model selection report (Model; Complexity; 1 − R²; Function)

It goes without saying that our candidate is the 4th model, since it has a small error and at the same time its complexity is low. In practice, it is useful to carry out an additional nonlinear regression with the candidate model in order to refine the values of its parameters. This means we use symbolic regression to find the best model structure, and then its

parameters will be improved via standard nonlinear regression, in order to avoid a long run of the symbolic regression procedure. Now let us compute the parameters p_i of the candidate model

    P = p_0 + p_1 a^(p_2)

Table 12.4 indicates that the estimation of the parameter p_0 is unreliable, since its confidence interval is very wide and its P-value is high.

Table 12.4 The statistics of the parameters p_0, p_1, p_2 estimated via nonlinear regression (Parameter; Estimated Value; Confidence Interval; P-value)

Now let us repeat the regression for the model

    P = p_1 a^(p_2)

We get the values of the two parameters shown in Table 12.5.

Table 12.5 The statistics of the parameters p_1, p_2 estimated via nonlinear regression (Parameter; Estimated Value; Confidence Interval; P-value)

From a practical point of view, one considers p_1 = 1 and p_2 = 1.5, although the error of this simplified model is somewhat higher. Now let us see how we compute the symbolic result via Mathematica.

Application: Mathematica

Mathematica itself has no built-in function for symbolic regression; however, a third-party product, the DataModeler package developed by Evolved Analytics, can be employed. Let us load the package

    << DataModeler`

First we should provide the corresponding input and output data; see Table 12.1

    samplepts = SetPrecision[{0.39, 0.72, 1., 1.52, 5.2, 9.54, 19.19, 30.06, 39.53}, 10];
    observedresponse = SetPrecision[{0.24, 0.61, 1., 1.88, 11.86, 29.46, 84.01, 164.8, 248.5}, 10];

Then we compute the Pareto front. The computation is carried out in parallel in order to decrease the running time.
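Before the GP run, the refined power-law candidate P = p1 a^p2 can be cross-checked by ordinary least squares in log-log space, where the exponent appears as a slope. This is a sketch in Python, not the chapter's DataModeler session:

```python
import math

# Table 12.1 data: semi-major axes [AU] and periods [yr]
a = [0.39, 0.72, 1.0, 1.52, 5.2, 9.54, 19.19, 30.06, 39.53]
P = [0.24, 0.61, 1.0, 1.88, 11.86, 29.46, 84.01, 164.8, 248.5]

# ln P = ln p1 + p2 ln a, so a straight-line fit gives p2 (slope) and p1 (intercept)
lx = [math.log(v) for v in a]
ly = [math.log(v) for v in P]
n = len(lx)
mx, my = sum(lx) / n, sum(ly) / n
p2 = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
p1 = math.exp(my - p2 * mx)
print(round(p1, 3), round(p2, 3))   # close to the Kepler values p1 = 1, p2 = 1.5
```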

    Timing@ParetoFrontLogPlot[
      quickModels1 = SymbolicRegression[samplepts, observedresponse,
        DataVariables → {x}, AlignModel → True, RobustModels → True,
        TimeConstraint → 50, FitnessPrecision → 10]]

Fig. The Pareto front (red points) computed by Mathematica

The models representing the Pareto front are listed in the following table. Here we list the models which have an error smaller than err and a complexity lower than comp

    err = 0.01;
    comp = 150;
    ModelSelectionReport[quickModels1, QualityBox → {comp, err}]

Model Selection Report (Complexity; 1 − R²; Function); the listed models are built from terms such as x, x^(3/2) and x^(5/2).

Table: Some good models representing the Pareto front

Kepler considered the 5th model, leaving out the small constant term. In order to use the selected model for

further computations, we should carry out the following operations. Here we select the model having an error smaller than err and a complexity not greater than 30

    kp = SelectModels[quickModels1, QualityBox → {30, err}];

This is already a Mathematica expression

    kpm = ModelPhenotype[kp] // First

Taking the x^(3/2) term and rounding its coefficient,

    u = kpm[[2]];
    KP = Round[Coefficient[u, x^(3/2)]] x^(3/2)

the Kepler model is

    KP = x^(3/2)

Teaching example

Let us try to approximate the following function on the basis of some discrete points. The function is

    z[{x_, y_}] := (x^2 - y^2) Sin[0.5 x]

    p1 = Plot3D[z[{x, y}], {x, -10, 10}, {y, -10, 10}, PlotPoints → {30, 30}, BoxRatios → {1, 1, 0.5}]

Fig. The function to be approximated

Let the discrete data (x_i, y_i, f(x_i, y_i)) be

    data = Flatten[Table[N[{i - 10, j - 10, z[{i - 10, j - 10}]}], {i, 0, 30, 4}, {j, 0, 30, 4}], 1];

    Show[{p1, ListPointPlot3D[data, PlotStyle → Directive[Blue, PointSize[0.015]], BoxRatios → {1, 1, 0.5}]}]

Fig. The given points

We employ symbolic regression, since we would like to have an explicit expression describing the surface. Now we separate the data into the inputs (x_i, y_i) and the outputs f(x_i, y_i)

    inputs = Transpose[Take[Transpose[data], {1, 2}]]

{{-10., -10.}, {-10., -6.}, {-10., -2.}, {-10., 2.}, {-10., 6.}, {-10., 10.}, {-10., 14.}, {-10., 18.}, {-6., -10.}, {-6., -6.}, {-6., -2.}, {-6., 2.}, {-6., 6.}, {-6., 10.}, {-6., 14.}, {-6., 18.}, {-2., -10.}, {-2., -6.}, {-2., -2.}, {-2., 2.}, {-2., 6.}, {-2., 10.}, {-2., 14.}, {-2., 18.}, {2., -10.}, {2., -6.}, {2., -2.}, {2., 2.}, {2., 6.}, {2., 10.}, {2., 14.}, {2., 18.}, {6., -10.}, {6., -6.}, {6., -2.}, {6., 2.}, {6., 6.}, {6., 10.}, {6., 14.}, {6., 18.}, {10., -10.}, {10., -6.}, {10., -2.}, {10., 2.}, {10., 6.}, {10., 10.}, {10., 14.}, {10., 18.}, {14., -10.}, {14., -6.}, {14., -2.}, {14., 2.}, {14., 6.}, {14., 10.}, {14., 14.}, {14., 18.}, {18., -10.}, {18., -6.}, {18., -2.}, {18., 2.}, {18., 6.}, {18., 10.}, {18., 14.}, {18., 18.}}

    outputs = Transpose[Take[Transpose[data], {3, 3}]] // Flatten

Then we compute the Pareto front, which now is not convex; see the figure below,

    ParetoFrontLogPlot[
      model = SymbolicRegression[inputs, outputs, AlignModel → True,
        RobustModels → True, FitnessPrecision → 10, TimeConstraint → 500]]

Fig. The Pareto front, which now is not convex

The selected models can be seen in Table 12.7

    ModelSelectionReport[model, QualityBox → {500, 0.01}]

Model Selection Report (Complexity; 1 − R²; Function)

Table 12.7 The eminent models representing the Pareto front

Now we select the model with complexity 262 and error 0.005

    kp = SelectModels[model, QualityBox → {262, 0.005}];
    kpm = ModelPhenotype[kp] // Flatten

    p2 = Plot3D[kpm, {x1, -10, 10}, {x2, -10, 10}, PlotPoints → {30, 30}, BoxRatios → {1, 1, 0.5}]

Fig. The selected model

Now let us see how the data points fit the surface of the model; see the figure below

    Show[{p2, ListPointPlot3D[data, PlotStyle → Directive[Blue, PointSize[0.015]], BoxRatios → {1, 1, 0.5}]}]

Fig. The selected model and the data points

The fit looks quite good. However, comparing the surfaces of the original model and the selected one, we can see the differences; see the figure below,

    Show[{p2, p1}]

Fig. The surfaces of the original and the selected model

At the data points we have a good fit, but where there were no points the fit is less accurate. This fact shows that we should not select the model with the smallest error, since in general this would mean overlearning. Let us select the best model

    kp = SelectModels[model, QualityBox → {302, 0.000}];
    kpm = ModelPhenotype[kp] // Flatten

    p2 = Plot3D[kpm, {x1, -10, 10}, {x2, -10, 10}, PlotPoints → {30, 30}, BoxRatios → {1, 1, 0.5}]
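The overlearning effect can be demonstrated on a one-dimensional toy problem: a high-degree polynomial driven to near-zero training error reproduces noisy samples exactly, yet behaves worse between them. The function, sample size and noise level below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return x * np.sin(x)                    # hypothetical "true" relation

x_train = np.linspace(-3.0, 3.0, 8)         # sparse, noisy training samples
y_train = f(x_train) + rng.normal(0.0, 0.3, x_train.size)
x_dense = np.linspace(-3.0, 3.0, 200)       # dense grid, mostly between the samples

results = {}
for deg in (3, 7):
    c = np.polyfit(x_train, y_train, deg)
    train_rmse = np.sqrt(np.mean((np.polyval(c, x_train) - y_train) ** 2))
    dense_rmse = np.sqrt(np.mean((np.polyval(c, x_dense) - f(x_dense)) ** 2))
    results[deg] = (train_rmse, dense_rmse)
    print(deg, round(train_rmse, 4), round(dense_rmse, 4))
```

Degree 7 interpolates the 8 noisy points, so its training RMSE is essentially zero; the error against the noiseless truth on the dense grid tells the real story. This is exactly why the text warns against simply taking the smallest-error model from the Pareto front.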

    Show[{p2, ListPointPlot3D[data, PlotStyle → Directive[Blue, PointSize[0.015]], BoxRatios → {1, 1, 0.5}]}]

Fig. The best selected model

    Show[{p2, p1}]

Fig. The best selected model and the data points

12-7 Extension of the function space

Fig. The surfaces of the original and the best selected model

In the basic option, the choice of basic function types which can be employed by the method is quite limited. To get better results, this function space can be extended; see Help → DataModeler → SymbolicRegression → FunctionPatterns. The Pareto front can be seen in the figure below. Table 12.8 shows that we have got back the original model!

    ParetoFrontLogPlot[
      model = SymbolicRegression[inputs, outputs, AlignModel → True,
        RobustModels → True, FitnessPrecision → 10,
        FunctionPatterns → BuildFunctionPatterns["ExtendedMath", Sin],
        MemoryLimit → 5 "GB", TimeConstraint → 500]]

Fig. The Pareto front in the case of the extended function space

    ModelSelectionReport[model, QualityBox → {52, 0.2}]

Model Selection Report (Complexity; 1 − R²; Function): the best model contains the Sin term.

Table 12.8 The best model of the Pareto front

    kp = SelectModels[model, QualityBox → {52, 0.2}];
    kpm = ModelPhenotype[kp] // Flatten

The result is essentially the original model, (x1² − x2²) Sin[0.5 x1].

    p2 = Plot3D[kpm, {x1, -10, 10}, {x2, -10, 10}, PlotPoints → {30, 30}, BoxRatios → {1, 1, 0.5}]

Fig. The selected model

    Show[{p2, ListPointPlot3D[data, PlotStyle → Directive[Blue, PointSize[0.015]], BoxRatios → {1, 1, 0.5}]}]

Now the two surfaces fit perfectly!

Fig. The selected model with the data points

    Show[{p2, p1}]

Fig. The surfaces of the original and the selected models

12-8 Application example: geometric transformation

Problem definition

Transformation of coordinates is important in computer vision and photogrammetry as well as in geodesy. In this example we consider some standard 2D transformations (similarity, affine and projective) between the coordinates of the fiducial marks on the comparator plate and those of the corresponding points on the reseau plate. The 16 observed (x, y) and calibrated reseau (X, Y) coordinates are given in Table 12.9.

Table 12.9 The coordinates of the corresponding observation points on the comparator and reseau planes (Points; x [mm]; y [mm]; X [mm]; Y [mm])

Similarity transformation

The advantage of this model is that it is linear in the coordinates as well as in the parameters. Consequently, an iterative solution is not required and the inverse transformation is easy. This transformation can be parametrized in the following form,

    x = a X + b Y + c
    y = −b X + a Y + d

For each observed i-th point, the following pair of observation equations can be written for the residuals (r_xi, r_yi),

    a X_i + b Y_i + c − x_i = r_xi
    −b X_i + a Y_i + d − y_i = r_yi

A minimum of two fiducial marks or reseau crosses is required for a unique solution. Having more observation points, linear least squares can be directly applied. In our example, the result can be seen in Table 12.10.

Table 12.10 The estimated parameters a, b, c, d of the similarity transformation model (Parameter; Value)

The norms of the residual vectors at the observation points are shown in Table 12.11.

Table 12.11 The norms of the residual vectors at the observation points in the case of the similarity transformation model (Points; r_xy [mm])

Affine transformation

This is also a linear model, although it needs six parameters; therefore a minimum of 3 fiducial marks or reseau crosses is required for a unique solution. An iterative solution is not required and the inverse transformation is easy. This model can be parametrized in the following form,

    x = a X + b Y + c
    y = d X + e Y + f

For each observed i-th point, the following pair of observation equations can be written for the residuals (r_xi, r_yi),

    a X_i + b Y_i + c − x_i = r_xi
    d X_i + e Y_i + f − y_i = r_yi

Having more than 3 observation points, linear least squares can be directly applied. In our example, the result can be seen in Table 12.12.
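The two linear models above are estimated by the same least squares mechanism; a minimal sketch for the similarity case, with made-up true parameter values and point layout, where the design matrix rows mirror the observation equations:

```python
import numpy as np

# Hypothetical true parameters of the similarity model
# x = a*X + b*Y + c,   y = -b*X + a*Y + d
a, b, c, d = 0.999, 0.002, 0.15, -0.08

XY = np.array([[-110., -110.], [110., -110.], [110., 110.], [-110., 110.],
               [-40., -40.], [40., 40.]])
x_obs = a * XY[:, 0] + b * XY[:, 1] + c
y_obs = -b * XY[:, 0] + a * XY[:, 1] + d

# One pair of observation equations per point; unknowns ordered (a, b, c, d)
rows, rhs = [], []
for (X, Y), xo, yo in zip(XY, x_obs, y_obs):
    rows.append([X, Y, 1.0, 0.0]); rhs.append(xo)
    rows.append([Y, -X, 0.0, 1.0]); rhs.append(yo)

params, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
print(np.round(params, 6))   # recovers a, b, c, d
```

With noise-free synthetic observations the recovery is exact; with real measurements the same solve minimizes the sum of the squared residuals r_xi, r_yi.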

Table 12.12 The estimated parameters a, b, c, d, e, f of the affine transformation model (Parameters; Values)

The norms of the residual vectors at the observation points are shown in Table 12.13.

Table 12.13 The norms of the residual vectors at the observation points in the case of the affine transformation model (Points; r_xy [mm])

Projective transformation

The projective transformation can be expressed by the following two equations,

    x = (a_1 X + a_2 Y + a_3) / (c_1 X + c_2 Y + 1)

and

    y = (b_1 X + b_2 Y + b_3) / (c_1 X + c_2 Y + 1)

These equations are a special case of the collinearity condition for mapping 2D points from one plane onto another. There are 8 unknown parameters, so a minimum of four fiducial marks is required for a unique solution. This is perhaps the main reason why it is not frequently used. However, it is relevant for systems that have been retro-fitted with a reseau plate, such as the Hasselblad camera used to acquire the assignment imagery. The least squares solution is nonlinear due to the rational nature of the functions. However, an approximate linear solution can be implemented if both sides of the equations are multiplied by the denominator and the partial derivatives with respect to the observables (as in the combined adjustment model) are ignored. The inverse of the transformation can also be computed by solving the equation system in symbolic form for X and Y; we obtain

    X = (a_2 (b_3 − y) + b_2 (x − a_3) + c_2 (a_3 y − b_3 x)) / (−x b_2 c_1 + a_2 (−b_1 + y c_1) + x b_1 c_2 + a_1 (b_2 − y c_2))

and

    Y = (a_1 (y − b_3) + b_1 (a_3 − x) + c_1 (b_3 x − a_3 y)) / (−x b_2 c_1 + a_2 (−b_1 + y c_1) + x b_1 c_2 + a_1 (b_2 − y c_2))

Having more than 4 observation points, nonlinear least squares can be applied. In our example, the result can be seen in Table 12.14.

Table 12.14 The estimated parameters a_1, a_2, a_3, b_1, b_2, b_3, c_1, c_2 of the projective transformation model (Parameters; Values)

The norms of the residual vectors at the observation points are shown in Table 12.15.

Table 12.15 The norms of the residual vectors at the observation points in the case of the projective transformation model (Points; r_xy [mm])

Application of symbolic regression

    outputx = { ... };   (* the 16 observed x coordinates *)
    outputy = {-107.4, ... };   (* the 16 observed y coordinates *)
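The closed-form inverse can be verified numerically with a round trip: transform a point forward, then recover it with the expressions for X and Y above. The parameter values here are hypothetical:

```python
# Hypothetical projective parameters (close to the identity)
a1, a2, a3 = 1.001, 0.002, 0.5
b1, b2, b3 = -0.001, 0.999, -0.3
c1, c2 = 1e-5, -2e-5

def forward(X, Y):
    w = c1 * X + c2 * Y + 1.0
    return (a1 * X + a2 * Y + a3) / w, (b1 * X + b2 * Y + b3) / w

def inverse(x, y):
    # The closed-form inverse obtained by solving the two equations for X and Y
    den = -x * b2 * c1 + a2 * (-b1 + y * c1) + x * b1 * c2 + a1 * (b2 - y * c2)
    X = (a2 * (b3 - y) + b2 * (x - a3) + c2 * (a3 * y - b3 * x)) / den
    Y = (a1 * (y - b3) + b1 * (a3 - x) + c1 * (b3 * x - a3 * y)) / den
    return X, Y

x, y = forward(110.0, -40.0)
print(inverse(x, y))   # recovers (110.0, -40.0) up to rounding
```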

    inputXY = {{-110, -110}, {-40, -110}, {40, -110}, {110, -110}, {110, -40}, {40, -40}, {-40, -40}, {-100, -40}, {-110, 40}, {-40, 40}, {40, 40}, {110, 40}, {110, 110}, {40, 110}, {-40, 110}, {-110, 110}};

By definition, neither the model nor the parameters of the model are known. However, in order to have a chance of computing the inverse of the transformation, trigonometric functions are excluded from the function set. Now we create linear and nonlinear models, but the linear models will be optimized,

    Timing@ParetoFrontLogPlot[
      model = SymbolicRegression[inputXY, outputx, OptimizeLinearModel → True]]

Fig. The Pareto front in the case of the optimized linear model for x = x(X, Y)

    ModelSelectionReport[model, QualityBox → {250, 10^-8}]

Model Selection Report (Complexity; 1 − R²; Function)

Table 12.16 Linear and nonlinear models for x = x(X, Y)

The best linear model among the linear and nonlinear models is

    kp = SelectModels[model, QualityBox → {30, 10^-8}];
    kpm = ModelPhenotype[kp] // Flatten
    xv1 = kpm /. {x1 → X, x2 → Y}

Now, only linear models will be created

    ParetoFrontLogPlot[model = CreateLinearModel[inputXY, outputx, PolynomialOrder → 4, TimeConstraint → 10]]

Fig. The Pareto front in the case of direct linear modelling for x = x(X, Y)

    ModelSelectionReport[model, QualityBox → {230, 10^-8}]

Model Selection Report (Complexity; 1 − R²; Function)

Table 12.17 The linear model for x = x(X, Y)

    kp = SelectModels[model, QualityBox → {205, 10^-9}];
    kpm = ModelPhenotype[kp] // Flatten
    xv2 = kpm /. {x1 → X, x2 → Y}

Surprisingly, the best linear model among the linear and nonlinear models is better than the direct linear model! We keep both models, xv1 and xv2.

Similarly, let us compute the y(X, Y) function, too.

    Timing@ParetoFrontLogPlot[
      model = SymbolicRegression[inputXY, outputy, OptimizeLinearModel → True, TimeConstraint → 150]]

Fig. The Pareto front in the case of the optimized linear model for y = y(X, Y)

    ModelSelectionReport[model, QualityBox → {250, 10^-8}]

Model Selection Report (Complexity; 1 − R²; Function)

Table 12.18 Linear and nonlinear models for y = y(X, Y)

The best linear model is

    kp = SelectModels[model, QualityBox → {42, 10^-8}];
    kpm = ModelPhenotype[kp] // Flatten
    yv1 = kpm /. {x1 → X, x2 → Y}

Now, only linear models will be created

ParetoFrontLogPlot[
 model = CreateLinearModel[inputXY, outputy, PolynomialOrder -> 4, TimeConstraint -> 10]]

Fig.: The Pareto front in the case of direct linear modelling for y(X, Y)

ModelSelectionReport[model, QualityBox -> {230, 10^-8}]

[a table of complexity, 1 - R^2 and expression for each candidate model]

Table: The linear model for y(X, Y)

kp = SelectModels[model, QualityBox -> {205, ...}];
kpm = ModelPhenotype[kp] // Flatten

[a fourth-order polynomial in x1 and x2]

Now the best linear model among the linear and nonlinear models has smaller complexity but a higher error level than the direct linear model! Therefore we select both

yv1 = [a quadratic polynomial in x1 and x2] /. {x1 -> X, x2 -> Y}

[the same polynomial written in X and Y]

yv2 = [a fourth-order polynomial in x1 and x2] /. {x1 -> X, x2 -> Y}

[the same polynomial written in X and Y]

Now we compute the sum of residuals:

rxy1 = MapThread[
   Norm[{(#2 - xv1), (#3 - yv1)}] /. {X -> #1[[1]], Y -> #1[[2]]} &,
   {inputXY, outputx, outputy}];
TableForm[rxy1]
Apply[Plus, rxy1]

[per-point residual norms, their sum and the RMSE]

rxy2 = MapThread[
   Norm[{(#2 - xv2), (#3 - yv2)}] /. {X -> #1[[1]], Y -> #1[[2]]} &,
   {inputXY, outputx, outputy}];
TableForm[rxy2]
Apply[Plus, rxy2]
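The MapThread/Norm call above evaluates the fitted pair (xv, yv) at every observation point and takes the Euclidean norm of the two-dimensional residual. A Python sketch of the same computation, with a hypothetical exact linear model as a sanity check:

```python
# Per-point residual norms and RMSE for a planar transformation model,
# mirroring the MapThread[Norm[...]] call: at each observation (X, Y)
# with measured target (x, y), take ||(x - fx(X,Y), y - fy(X,Y))||.
import math

def residual_norms(points, xs, ys, fx, fy):
    """Norm of the 2-D residual at each observation point."""
    return [math.hypot(x - fx(X, Y), y - fy(X, Y))
            for (X, Y), x, y in zip(points, xs, ys)]

def rmse(residuals):
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Toy check: with an exact linear model the residuals vanish.
pts = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)]
xs = [X + 2 * Y for X, Y in pts]
ys = [X - Y for X, Y in pts]
r = residual_norms(pts, xs, ys,
                   lambda X, Y: X + 2 * Y,
                   lambda X, Y: X - Y)
```

Summing `r` gives the analogue of Apply[Plus, rxy1]; `rmse(r)` gives the root-mean-square error reported for each model pair.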

[per-point residual norms, their sum and the RMSE]

So the higher-order pair gives the better result.

Table: The norm of the residual vectors r_xy [mm] at the observation points, in the case of the nonlinear transformation model obtained by SR

The figure below illustrates how this nonlinear transformation warps (deforms) the original (X, Y) plane into the (x, y) plane.

Fig.: The original plane and, below it, the plane after warping via the nonlinear transformation

Comparison of the different transformation models

Max[rxy2]
StandardDeviation[rxy2]

It is not surprising that transformation models having more parameters and a nonlinear form can provide a smaller error than the others. Let us summarize some statistical values of the different methods in the table below.

Table: Some statistical values of the different transformation models

[rows for the similarity, affine, projective and symbolic regression models; columns: max of absolute errors, standard deviation of absolute errors, and RMSE of residual errors, all in mm]

Both SR models are better than the other models. Although the nonlinear transformation obtained by symbolic regression gives the best fit, computing the inverse of the transformation is not an easy task. However, it can be solved via a Gröbner basis.
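The chapter inverts the fitted transformation symbolically via a Gröbner basis; as a purely numerical alternative, the inverse can also be evaluated point by point with a two-dimensional Newton iteration on fx(X, Y) = x, fy(X, Y) = y. The mildly nonlinear map below is a hypothetical stand-in for the SR model pair:

```python
# Numerical inversion of a planar polynomial map by Newton's method with
# a finite-difference Jacobian (an alternative to the symbolic Groebner
# basis inversion). fx, fy below are a hypothetical toy model.
def invert_map(fx, fy, x, y, X0=0.0, Y0=0.0, h=1e-6, iters=50):
    """Solve fx(X, Y) = x and fy(X, Y) = y for (X, Y)."""
    X, Y = X0, Y0
    for _ in range(iters):
        rx, ry = fx(X, Y) - x, fy(X, Y) - y
        # Finite-difference Jacobian entries
        a = (fx(X + h, Y) - fx(X, Y)) / h
        b = (fx(X, Y + h) - fx(X, Y)) / h
        c = (fy(X + h, Y) - fy(X, Y)) / h
        d = (fy(X, Y + h) - fy(X, Y)) / h
        det = a * d - b * c
        # Newton step: subtract J^{-1} applied to the residual
        X -= (d * rx - b * ry) / det
        Y -= (a * ry - c * rx) / det
    return X, Y

fx = lambda X, Y: X + 0.01 * X * Y   # toy nonlinear forward model
fy = lambda X, Y: Y + 0.01 * X ** 2

# (2.06, 3.04) is the forward image of (2, 3) under this toy map,
# so the iteration recovers (X, Y) close to (2, 3).
Xi, Yi = invert_map(fx, fy, 2.06, 3.04)
```

Unlike the Gröbner basis approach, this recovers only one local solution per starting point, which suffices for a transformation that is one-to-one over the working area.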


More information

Evolutionary computation

Evolutionary computation Evolutionary computation Andrea Roli andrea.roli@unibo.it DEIS Alma Mater Studiorum Università di Bologna Evolutionary computation p. 1 Evolutionary Computation Evolutionary computation p. 2 Evolutionary

More information

Chapter 2. Altitude Measurement

Chapter 2. Altitude Measurement Chapter Altitude Measurement Although altitudes and zenith distances are equally suitable for navigational calculations, most formulas are traditionally based upon altitudes which are easily accessible

More information

Lecture 18: Kernels Risk and Loss Support Vector Regression. Aykut Erdem December 2016 Hacettepe University

Lecture 18: Kernels Risk and Loss Support Vector Regression. Aykut Erdem December 2016 Hacettepe University Lecture 18: Kernels Risk and Loss Support Vector Regression Aykut Erdem December 2016 Hacettepe University Administrative We will have a make-up lecture on next Saturday December 24, 2016 Presentations

More information

Patterns in the Solar System (Chapter 18)

Patterns in the Solar System (Chapter 18) GEOLOGY 306 Laboratory Instructor: TERRY J. BOROUGHS NAME: Patterns in the Solar System (Chapter 18) For this assignment you will require: a calculator, colored pencils, a metric ruler, and meter stick.

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.5. Spring 2010 Instructor: Dr. Masoud Yaghini Outline How the Brain Works Artificial Neural Networks Simple Computing Elements Feed-Forward Networks Perceptrons (Single-layer,

More information

Putting Earth In Its Place

Putting Earth In Its Place Teacher Instructions Overview: During this activity, students build a model of our Solar System to gain insight into the relative sizes and distances involved. ives: The student will: create a scale model

More information

TAKEN FROM HORIZONS 7TH EDITION CHAPTER 1 TUTORIAL QUIZ

TAKEN FROM HORIZONS 7TH EDITION CHAPTER 1 TUTORIAL QUIZ TAKEN FROM HORIZONS 7TH EDITION CHAPTER 1 TUTORIAL QUIZ 1. If the solar system is scaled down so that the Sun is represented by a basketball, a. a ping-pong ball located 500 feet away would properly represent

More information

Indirect Rule Learning: Support Vector Machines. Donglin Zeng, Department of Biostatistics, University of North Carolina

Indirect Rule Learning: Support Vector Machines. Donglin Zeng, Department of Biostatistics, University of North Carolina Indirect Rule Learning: Support Vector Machines Indirect learning: loss optimization It doesn t estimate the prediction rule f (x) directly, since most loss functions do not have explicit optimizers. Indirection

More information

Lecture 4: Kepler and Galileo. Astronomy 111 Wednesday September 6, 2017

Lecture 4: Kepler and Galileo. Astronomy 111 Wednesday September 6, 2017 Lecture 4: Kepler and Galileo Astronomy 111 Wednesday September 6, 2017 Reminders Online homework #2 due Monday at 3pm Johannes Kepler (1571-1630): German Was Tycho s assistant Used Tycho s data to discover

More information

Kernel Methods & Support Vector Machines

Kernel Methods & Support Vector Machines Kernel Methods & Support Vector Machines Mahdi pakdaman Naeini PhD Candidate, University of Tehran Senior Researcher, TOSAN Intelligent Data Miners Outline Motivation Introduction to pattern recognition

More information