Regression, Curve Fitting and Optimisation

Sylvia Jones

Supervised by Elena Zanini, STOR-i, University of Lancaster, 4 September 2015

Outline: Introduction; Root Finding; Simulated Annealing; The Rosenbrock Banana Function.

Given a set of data, what is the optimum curve that can be fitted to it? This question matters whenever we study relationships between two or more variables, and whenever we want to describe data quantitatively.

If a straight line is needed, we can use the standard approach of Ordinary Least Squares (OLS). However, there are situations in which this is not appropriate.

Some Less Trivial Examples [Figure: three panels plotting y against x.]

We observe that the OLS estimate arises from an optimisation problem, namely $\hat{\beta} = \arg\min_{b \in \mathbb{R}^p} \|Y - Xb\|^2$. So it makes sense to approach optimal curve fitting from the perspective of optimisation.
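As a numerical check of this optimisation view, here is a minimal Python sketch for the single-predictor case only (an intercept and a slope, so $p = 2$; this restriction and the helper names `ols_line` and `rss` are our own). It computes the closed-form OLS coefficients and lets us confirm that nudging either coefficient increases the residual sum of squares $\|Y - Xb\|^2$.

```python
def ols_line(xs, ys):
    """Closed-form OLS fit of y = b0 + b1 * x (one predictor plus intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def rss(b0, b1, xs, ys):
    """Residual sum of squares ||Y - Xb||^2 for b = (b0, b1)."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
b0, b1 = ols_line(xs, ys)
# The OLS solution should beat any small perturbation of it.
print(b0, b1, rss(b0, b1, xs, ys), rss(b0 + 0.01, b1, xs, ys))
```

Because the objective is a convex quadratic in $b$, any perturbation of the closed-form solution can only increase the RSS.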

Optimisation has an obvious analogue in root finding, since a minimiser of a differentiable function $f$ is a root of $f'$. There are several core root-finding methods we can use: bisection; Newton-Raphson; the secant method; Muller's method. All of these except Newton-Raphson are derivative-free.
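A minimal plain-Python sketch of two of the derivative-free root finders, bisection and the secant method. The function names and tolerances are illustrative assumptions. To use them for optimisation one applies them to $f'$: here $g(x) = x^2 - 2$ is the derivative of the hypothetical objective $f(x) = x^3/3 - 2x$, whose local minimum on $[1, 2]$ is at $\sqrt{2}$.

```python
import math


def bisection(g, lo, hi, tol=1e-12, max_iter=200):
    """Find a root of g in [lo, hi], assuming g(lo) and g(hi) differ in sign."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("g(lo) and g(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0 or hi - lo < tol:
            return mid
        # Keep the half-interval on which g still changes sign.
        if g_lo * g_mid < 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)


def secant(g, x0, x1, tol=1e-12, max_iter=100):
    """Secant method: Newton-Raphson with the derivative replaced by a finite difference."""
    g0, g1 = g(x0), g(x1)
    for _ in range(max_iter):
        if g1 == g0:  # flat secant line: cannot take another step
            break
        x2 = x1 - g1 * (x1 - x0) / (g1 - g0)
        if abs(x2 - x1) < tol:
            return x2
        x0, g0, x1, g1 = x1, g1, x2, g(x2)
    return x1


def g(x):
    # derivative of the hypothetical objective f(x) = x**3 / 3 - 2 * x
    return x * x - 2.0


print(bisection(g, 1.0, 2.0), secant(g, 1.0, 2.0), math.sqrt(2.0))
```

Bisection halves the bracket each step (linear convergence) but never fails on a valid bracket; the secant method converges faster but is not guaranteed to stay in $[1, 2]$.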

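In higher dimensions, one of the more effective gradient-based methods is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method. Newton's iteration for root finding is adapted to optimisation by applying it to the gradient, giving $x_{n+1} = x_n - [Hf(x_n)]^{-1} \nabla f(x_n)$, where $Hf$ is the Hessian of $f$; BFGS replaces $[Hf(x_n)]^{-1}$ with an approximation updated from successive gradient evaluations, so the Hessian never has to be computed directly.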
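The iteration above can be illustrated with a small sketch. Note that this is plain Newton's method with the exact Hessian, not BFGS itself (BFGS would replace the exact inverse Hessian with its running approximation). The test function $f(x, y) = \cosh(x - 1) + (y + 2)^2 + \tfrac12 (x - 1)(y + 2)$ and all helper names are hypothetical; the function was chosen because its Hessian is positive definite everywhere and its minimum is at $(1, -2)$.

```python
import math


def f(x, y):
    # Hypothetical strictly convex test function, minimum at (1, -2).
    return math.cosh(x - 1) + (y + 2) ** 2 + 0.5 * (x - 1) * (y + 2)


def grad(x, y):
    return (math.sinh(x - 1) + 0.5 * (y + 2),
            2 * (y + 2) + 0.5 * (x - 1))


def hessian(x, y):
    return ((math.cosh(x - 1), 0.5),
            (0.5, 2.0))


def newton_minimise(x, y, tol=1e-12, max_iter=50):
    """Iterate x_{n+1} = x_n - [Hf(x_n)]^{-1} grad f(x_n) in two dimensions."""
    for it in range(max_iter):
        gx, gy = grad(x, y)
        (a, b), (c, d) = hessian(x, y)
        det = a * d - b * c
        # Solve Hf * step = grad f using the explicit 2x2 inverse.
        dx = (d * gx - b * gy) / det
        dy = (-c * gx + a * gy) / det
        x, y = x - dx, y - dy
        if math.hypot(dx, dy) < tol:
            return x, y, it + 1
    return x, y, max_iter


x_star, y_star, n_iter = newton_minimise(0.0, 0.0)
print(x_star, y_star, f(x_star, y_star), n_iter)
```

Near the minimum the error roughly squares at each step, which is what BFGS tries to approach without ever forming $Hf$.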

The Nelder-Mead Method. Suppose our goal is to minimise the function $f(x)$, where $x \in \mathbb{R}^n$.

Start with $n + 1$ test points $x_1, \dots, x_{n+1}$, which form a simplex.

Order these points by output value, so that $f(x_1) \le f(x_2) \le \dots \le f(x_{n+1})$. [Figure: a simplex with vertices $x_1$, $x_2$, $x_3$.]

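At each step we consider several candidate points, obtained by reflecting, expanding or contracting the worst point through the centroid of the others; if none of these is an improvement, we shrink the simplex towards the best point.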
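The steps above can be sketched as a basic Nelder-Mead implementation in plain Python. It uses the common textbook coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5) and an axis-aligned initial simplex; these choices, the stopping rule and the function names are assumptions, not necessarily the settings used in this project.

```python
def nelder_mead(f, x0, step=0.5, tol=1e-12, max_iter=5000):
    """Minimise f: R^n -> R with a basic Nelder-Mead simplex search."""
    n = len(x0)
    # Initial simplex: x0 plus one point shifted along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        point = list(x0)
        point[i] += step
        simplex.append(point)
    values = [f(p) for p in simplex]

    for _ in range(max_iter):
        # Order so that f(x_1) <= f(x_2) <= ... <= f(x_{n+1}).
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break

        # Centroid of every point except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(coef):
            # centroid + coef * (centroid - worst): coef = 1 reflects,
            # 2 expands, 0.5 contracts outside, -0.5 contracts inside.
            return [c + coef * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_ref = f(reflected)
        if values[0] <= f_ref < values[-2]:
            simplex[-1], values[-1] = reflected, f_ref
            continue
        if f_ref < values[0]:
            expanded = along(2.0)
            f_exp = f(expanded)
            if f_exp < f_ref:
                simplex[-1], values[-1] = expanded, f_exp
            else:
                simplex[-1], values[-1] = reflected, f_ref
            continue
        if f_ref < values[-1]:
            contracted, bar = along(0.5), f_ref
        else:
            contracted, bar = along(-0.5), values[-1]
        f_con = f(contracted)
        if f_con < bar:
            simplex[-1], values[-1] = contracted, f_con
            continue
        # No candidate was an improvement: shrink towards the best point.
        best = simplex[0]
        simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                            for p in simplex[1:]]
        values = [values[0]] + [f(p) for p in simplex[1:]]

    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]


def rosenbrock(p, a=1.0, b=100.0):
    x, y = p
    return (a - x) ** 2 + b * (y - x ** 2) ** 2


print(nelder_mead(rosenbrock, [-1.2, 1.0]))
```

Only function values are ever compared, which is why Nelder-Mead needs no derivatives.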

How well does this work on the problem? [Figure: three panels of Nelder-Mead fits; axes xrange and yrange.]

Disadvantages of Nelder-Mead: we usually need a reasonable idea of the form of the relationship between the two variables to obtain a sensible final fit; if the data do not conform well to the true underlying relationship, the procedure can be very costly, and it can arrive at an incorrect answer if the initial conditions are poorly specified.

Several alternative optimisation methods take a probabilistic approach. These include: Simulated Annealing; Genetic Algorithms; Ant Colony Optimisation.

Simulated Annealing (SA) is modelled on annealing, a physical process in which a material cools in a system with a controlled negative temperature gradient. When a substance such as water cools in such a system, it settles into an optimal (minimum-energy) solid arrangement.

How SA works. To use Simulated Annealing in an optimisation problem, the following need to be well defined: the neighbours of each state (e.g. in a discrete domain, a rearrangement of two adjacent elements); the energy of each state; the probability $P(E, E', T)$ of moving from state $S$ to state $S'$, where $E$ and $E'$ are their energies and $T$ is the temperature. States with smaller energy are preferred, so $P(E, E', T) > P(E', E, T)$ when $E' < E$.

How SA works. In the problem of curve fitting: we define a neighbour of the current curve as the current curve plus a small, simple function; the probabilities are set as follows: if $E < E'$, then $P(E, E', T) = \exp\left(-\frac{E' - E}{T}\right)$; otherwise, $P(E, E', T) = 1$.
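A minimal sketch of this scheme in plain Python, under several assumptions of our own: the curve is a polynomial, a neighbour adds a small Gaussian multiple of one monomial $x^k$ (our choice of "small, simple function"), the energy is the residual sum of squares, and the temperature follows a geometric cooling schedule $T \leftarrow 0.999\,T$. None of these settings is taken from the original study.

```python
import math
import random


def simulated_annealing_fit(xs, ys, degree=2, t0=1.0, cooling=0.999,
                            n_iter=5000, step=0.1, seed=0):
    """Fit polynomial coefficients by simulated annealing; returns (best, energy)."""
    rng = random.Random(seed)

    def energy(coeffs):
        # Residual sum of squares of the polynomial curve.
        return sum((sum(c * x ** k for k, c in enumerate(coeffs)) - y) ** 2
                   for x, y in zip(xs, ys))

    current = [0.0] * (degree + 1)
    e_current = energy(current)
    best, e_best = list(current), e_current
    t = t0
    for _ in range(n_iter):
        # Neighbour: add a small multiple of one simple function x**k.
        k = rng.randrange(degree + 1)
        candidate = list(current)
        candidate[k] += rng.gauss(0.0, step)
        e_candidate = energy(candidate)
        # Acceptance rule from the slide: always move if E' <= E,
        # otherwise move with probability exp(-(E' - E) / T).
        if (e_candidate <= e_current
                or rng.random() < math.exp(-(e_candidate - e_current) / t)):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = list(current), e_current
        t *= cooling  # geometric cooling schedule (assumption)
    return best, e_best


xs = [i / 19 for i in range(20)]
ys = [1.0 + 2.0 * x for x in xs]
print(simulated_annealing_fit(xs, ys))
```

Early on, when $T$ is large, uphill moves are accepted often and the search wanders; as $T$ shrinks the rule becomes almost greedy, which is why the result depends so strongly on the starting temperature.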

How well does this work on the problem? [Figure: two panels of Simulated Annealing fits; axes xrange and yrange.]

Disadvantages of SA: it often requires a high starting temperature to achieve a reasonable result; the model is very sensitive to the starting temperature, and the right choice is not obvious; it is very difficult to achieve an accurate solution, as it is hard to construct well-defined neighbours that let the search zero in on a state in a continuous domain.

Suppose we had no intuition at all about the underlying relationship, as in the example shown below. [Figure: y plotted against x.]

One way of tackling curve fitting in this instance is to give each point an associated reward function shaped like a hillock.

A reward function found to be useful is $f(r) = k_d e^{-r^{0.55}}$, where $k_d$ is a constant depending on the datapoint $d$ and $r$ is the Euclidean distance from the datapoint. One can take $k_d = e^{D_d}$.

A total reward function is then constructed by summing all the individual reward functions, and the curve that maximises the total reward can be found by a brute-force search. [Figure: size of second-largest city proper against size of largest city proper, by population (millions).]
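A minimal sketch of one possible reading of this procedure, with loudly stated assumptions: the per-point reward is taken as $k_d e^{-r^{0.55}}$ (our reading of the slide's formula), all weights $k_d$ default to 1 because $D_d$ is not defined on the slide, and the "brute-force search" is interpreted as choosing, for each $x$ on a grid, the $y$ on a grid with the largest total reward. The function names are hypothetical.

```python
import math


def reward(r, k_d=1.0):
    # Hillock-shaped reward, assuming the form k_d * exp(-r ** 0.55).
    return k_d * math.exp(-r ** 0.55)


def total_reward(x, y, points, weights=None):
    # Sum of every datapoint's reward at location (x, y).
    if weights is None:
        weights = [1.0] * len(points)
    return sum(reward(math.hypot(x - px, y - py), k)
               for (px, py), k in zip(points, weights))


def ridge_curve(points, x_grid, y_grid, weights=None):
    # Brute force: for each x, keep the grid y with the largest total reward.
    return [max(y_grid, key=lambda y: total_reward(x, y, points, weights))
            for x in x_grid]


points = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
x_grid = [0.0, 0.5, 1.0, 1.5, 2.0]
y_grid = [-1.0 + 0.01 * i for i in range(301)]
print(ridge_curve(points, x_grid, y_grid))
```

Since each reward peaks sharply at its own datapoint, the resulting curve tends to pass through the data, which is one way to see the overfitting risk listed among the disadvantages below.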

Disadvantages of this approach: the model is prone to overfitting, so additional methodology may be needed, such as cross-validation or Akaike's Information Criterion (AIC); depending on the initial weighting, the resulting optimal curve can favour the OLS line.

All these methods were tried on a series of standard test functions before moving on to a real-life application.

There are several functions that are notoriously tricky to optimise numerically, and these were used to test the robustness of the algorithms. Examples include: the Rosenbrock Banana Function; the Five-Uneven-Peak Trap; Equal Maxima; Uneven Decreasing Maxima.

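The Rosenbrock Banana Function. This function takes the form $f(x, y) = (a - x)^2 + b(y - x^2)^2$ for some constants $a, b$. For $b > 0$ its global minimum, $f = 0$, is at $(a, a^2)$, at the bottom of a long, narrow, curved valley.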
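A small sketch of why this function is a useful test: the code evaluates it and its analytic gradient, and runs fixed-step gradient descent from the common starting point $(-1.2, 1)$. The step size $10^{-3}$, the number of steps and the function names are our own illustrative choices; the default $a = 1$, $b = 100$ is the usual convention.

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    return (a - x) ** 2 + b * (y - x ** 2) ** 2


def rosenbrock_grad(x, y, a=1.0, b=100.0):
    # Partial derivatives of (a - x)^2 + b (y - x^2)^2.
    return (-2 * (a - x) - 4 * b * x * (y - x ** 2),
            2 * b * (y - x ** 2))


def gradient_descent(x, y, lr=1e-3, n_steps=5000):
    """Plain gradient descent with a fixed step size (illustrative settings)."""
    for _ in range(n_steps):
        gx, gy = rosenbrock_grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y


x_end, y_end = gradient_descent(-1.2, 1.0)
print(x_end, y_end, rosenbrock(x_end, y_end))
```

The step size has to stay small because the valley walls are steep (the Hessian has a large eigenvalue across the valley), yet this makes progress along the gently sloping valley floor slow.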

Extreme Value Theory, A Brief Background. One way of defining extreme events is to set a threshold and class anything exceeding it as extreme. This gives rise to the Generalised Pareto Distribution (GPD) for the threshold excesses $y_1, \dots, y_k$, whose likelihood is $L(\sigma, \xi) = \frac{1}{\sigma^k} \prod_{i=1}^{k} \left(1 + \frac{\xi y_i}{\sigma}\right)^{-(1 + 1/\xi)}$ for $\xi \neq 0$, and $L(\sigma, \xi) = \frac{1}{\sigma^k} \prod_{i=1}^{k} \exp\left(-\frac{y_i}{\sigma}\right)$ for $\xi = 0$; here $k$ is the number of datapoints exceeding the threshold, $\xi$ is the shape and $\sigma$ is the scale.
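To fit the GPD with a derivative-free minimiser such as Nelder-Mead, one minimises the negative logarithm of this likelihood. The following is a minimal sketch of that objective; the function names and the convention of returning $-\infty$ outside the parameter space or support are our own assumptions.

```python
import math


def gpd_log_likelihood(sigma, xi, excesses):
    """GPD log-likelihood for threshold excesses y_1, ..., y_k (all > 0)."""
    if sigma <= 0:
        return -math.inf
    k = len(excesses)
    if xi == 0:
        # xi = 0 case: the exponential likelihood.
        return -k * math.log(sigma) - sum(excesses) / sigma
    total = -k * math.log(sigma)
    for y in excesses:
        t = xi * y / sigma
        if t <= -1:
            # 1 + xi * y / sigma <= 0: y lies outside the support.
            return -math.inf
        total -= (1 + 1 / xi) * math.log1p(t)
    return total


def gpd_negative_log_likelihood(params, excesses):
    # Objective in the (sigma, xi) vector form a minimiser expects.
    sigma, xi = params
    return -gpd_log_likelihood(sigma, xi, excesses)


data = [0.3, 1.2, 0.7, 2.5, 0.1, 4.0]
print(gpd_negative_log_likelihood((2.0, 0.1), data))
```

Using `math.log1p` keeps the $\xi \neq 0$ branch accurate for small $\xi$, so it agrees closely with the exponential ($\xi = 0$) case.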

[Figure: left, difference in log of closures against days elapsed; right, amount of rainfall (mm) against day number.] The left figure shows log-differences of daily closing prices between 1996 and …; the right figure shows daily rainfall accumulations in South West England between 1914 and 1962.

We use Nelder-Mead to fit the GPD by maximum likelihood. [Table: Dataset, Threshold, $\hat{\sigma}$ and $\hat{\xi}$ for the Rain and Dow Jones datasets.] Candidate thresholds were chosen by inspecting the mean residual life plot. Other procedures, such as Simulated Annealing, were less successful than Nelder-Mead at finding the maximum likelihood estimates (MLEs).

Other Extreme Value Theory Machinery. There are several other things we can consider: an alternative and theoretically equivalent approach is to use a Poisson Point Process (PPP) model; sometimes the underlying process is more complicated, and covariates need to be added to the model. The first of these is still relatively straightforward using Nelder-Mead; introducing covariates, however, is more complex and often results in convergence to a local optimum.

Conclusions. In general: Nelder-Mead remains a very effective algorithm for "blind" optimisation; SA should be preferred only if there is a strong intuition for the starting temperature, and pinning down a sensible starting temperature may be a fruitful direction for further work; computationally, gradient-free methods are preferred.

With respect to Extreme Value Theory: Nelder-Mead becomes highly sensitive to initial conditions in the covariate case; investigating the application of SA, together with an effective choice of threshold, may be of interest.

