1 Function Approximation

1.1 Introduction

In this chapter we discuss approximating functional forms. Both in econometric and in numerical problems, the need for an approximating function often arises. One possibility is that one has a finite set of data points and wants to determine the underlying functional form. For example, suppose one knows that next period's value of the interest rate, $r_{t+1}$, is some function of the current interest rate, but one does not know what the functional form is. That is, one has the empirical relationship

    r_{t+1} = f(r_t) + \varepsilon_{t+1}        (1.1)

and the question is whether, with enough data, one can discover the functional form of $f$.

The need for an approximating functional form also arises if one could in principle calculate the function value for any given set of arguments, but doing so is very expensive. For example, the function value may be the outcome of many complex calculations, and it may take a lot of computing time to calculate one function value. With an approximating functional form one can obtain (approximate) function values much faster. Again, the goal is to come up with an approximating functional form using a finite set of data points. There is a big difference with the problem that the econometrician faces, however, because the econometrician cannot choose his data points. We will see in this section that the freedom to choose the location of the arguments makes it much easier to come up with accurate approximations.

Finally, the theory on function approximation is very useful if one is trying to solve for a function that is (implicitly) defined by a system of functional equations.

1.2 Polynomial approximations

Most of this chapter will be devoted to polynomial approximations, i.e.,

    y = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n,        (1.2)

where $x$ is a scalar. Later in this chapter we will discuss functions with multiple arguments. We will see that there are actually many different types of polynomials, obtained by choosing different basis functions $T_i(x)$. That is, a more general way to write polynomials is

    y = a_0 + a_1 T_1(x) + a_2 T_2(x) + \cdots + a_n T_n(x).        (1.3)

For example, $T_i(x)$ could be $(\ln(x))^i$. Moreover, one can take transformations. For example, if one knows that the function value is always positive, one could use this information by letting

    y = \exp\left(a_0 + a_1 T_1(x) + a_2 T_2(x) + \cdots + a_n T_n(x)\right).        (1.4)

1.2.1 How good are polynomial approximations?

Weierstrass' theorem tells us that a continuous real-valued function defined on a bounded interval on the real line can be approximated arbitrarily well in the sup norm as the order of the polynomial goes to infinity. Even functions that have a discontinuity can be approximated arbitrarily well if one uses a norm other than the sup norm.

1.2.2 How to find polynomial approximations

In this subsection, we discuss four different ways to come up with a polynomial approximation. The procedures differ in whether they use only local information (Taylor expansion) and whether they use information about derivatives or not.

Taylor expansion

If one has the function value and derivatives at one point, $x_0$, then one can calculate a polynomial approximation using the Taylor expansion:

    f(x) \approx f(x_0) + (x - x_0) \left.\frac{df}{dx}\right|_{x=x_0} + \cdots + \frac{(x - x_0)^n}{n!} \left.\frac{d^n f}{dx^n}\right|_{x=x_0}.        (1.5)

Note that the right-hand side is indeed a polynomial.
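To make this concrete, here is a minimal Python sketch (the choice of $\ln(x)$, the expansion point, and the helper name are ours, not from the text). For $\ln(x)$ the derivatives are known analytically, $d^k \ln(x)/dx^k = (-1)^{k-1}(k-1)!/x^k$, so the degree-$n$ Taylor polynomial can be evaluated directly:

```python
import numpy as np

def log_taylor(x, x0, n):
    """Degree-n Taylor polynomial of ln(x) around x0 > 0.

    The k-th Taylor coefficient is f^(k)(x0)/k! = (-1)^(k-1) / (k * x0^k).
    """
    p = np.full_like(np.asarray(x, dtype=float), np.log(x0))
    for k in range(1, n + 1):
        p += (-1) ** (k - 1) / (k * x0 ** k) * (x - x0) ** k
    return p

x = np.linspace(0.5, 2.0, 201)
for n in (2, 5, 10):
    err = np.max(np.abs(log_taylor(x, 1.0, n) - np.log(x)))
    print(f"degree {n:2d}: max error on [0.5, 2] = {err:.2e}")
```

The approximation is essentially exact near $x_0 = 1$ and deteriorates towards the endpoints, illustrating that a Taylor expansion uses only local information.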

Projection

More common, though, is that one has $n+1$ function values, $y_0, y_1, \ldots, y_n$, at $n+1$ arguments, $x_0, x_1, \ldots, x_n$. There are two procedures to find the coefficients of the polynomial. The first is to run a regression. Since the number of coefficients equals the number of data points, the fit is exact: if you do it right you get an $R^2$ equal to 1. The second is Lagrange interpolation, discussed next.

Lagrange interpolation

The approximating polynomial is also given by

    p(x) \approx y_0 L_0(x) + \cdots + y_n L_n(x),        (1.6)

where

    L_i(x) = \frac{(x - x_0) \cdots (x - x_{i-1})(x - x_{i+1}) \cdots (x - x_n)}{(x_i - x_0) \cdots (x_i - x_{i-1})(x_i - x_{i+1}) \cdots (x_i - x_n)}.        (1.7)

$L_i(x)$ is a polynomial, so the right-hand side of (1.6) is a polynomial as well. So we only have to show that this polynomial gives an exact fit at the $n+1$ points. This is easy if one realizes that

    L_i(x_j) = 1 if j = i,   and   L_i(x_j) = 0 if x_j \in \{x_0, \ldots, x_n\} \setminus \{x_i\}.        (1.8)

Consequently, one gets

    p(x_j) = y_j.        (1.9)

It is unlikely that you will actually find it useful to write the polynomial like this, but we will see that this way of writing a polynomial is useful for numerical integration.

Hermite interpolation

The last two procedures only used function values at a set of grid points. Now suppose that in addition to the $n+1$ function values one also has numerical values for the derivatives at the nodes, i.e., $y_0', y_1', \ldots, y_n'$. With these $2(n+1)$ pieces of information one can calculate a $(2n+1)$-th order polynomial,

    p(x) \approx \sum_{i=0}^{n} y_i h_i(x) + \sum_{i=0}^{n} y_i' \tilde{h}_i(x),        (1.10)

where

    h_i(x) = \left(1 - 2 L_i'(x_i)(x - x_i)\right) L_i(x)^2,
    \tilde{h}_i(x) = (x - x_i) L_i(x)^2.

Note that

    h_i(x_j) = \tilde{h}_i'(x_j) = 1 if j = i and 0 if x_j \in \{x_0, \ldots, x_n\} \setminus \{x_i\},
    \tilde{h}_i(x_j) = h_i'(x_j) = 0 for all j.

The approximation gets the function values right at all nodes because the $\tilde{h}_i(x_j)$ terms are all zero and the $h_i(x_j)$ terms are 1 at $x_j = x_i$ and zero otherwise. The approximation gets the derivatives right because the $h_i'(x_j)$ are all zero and the $\tilde{h}_i'(x_j)$ select, with their zero/one structure, the appropriate derivative.
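Since the Hermite basis is built from the Lagrange basis functions, a single sketch covers the key ingredient of both. Here is a minimal Python implementation of Lagrange interpolation (the test function and nodes are our own example); it builds the basis functions (1.7) and confirms the exact-fit property (1.9) at the nodes:

```python
import numpy as np

def lagrange_eval(x, nodes, values):
    """Evaluate the Lagrange interpolating polynomial (1.6) at x.

    L_i(x) is the product of (x - x_j)/(x_i - x_j) over all j != i, so
    L_i(x_j) = 1 if j == i and 0 otherwise, giving an exact fit at the nodes.
    """
    x = np.asarray(x, dtype=float)
    p = np.zeros_like(x)
    for i, (xi, yi) in enumerate(zip(nodes, values)):
        L = np.ones_like(x)
        for j, xj in enumerate(nodes):
            if j != i:
                L *= (x - xj) / (xi - xj)
        p += yi * L
    return p

nodes = np.array([0.0, 1.0, 2.0, 3.0])
values = np.cos(nodes)
print(lagrange_eval(nodes, nodes, values) - values)  # ~[0 0 0 0]: exact fit
```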

1.2.3 Orthogonal polynomials

From the discussion above, it became clear that finding the coefficients of the polynomial is like a projection on the space spanned by the basis functions, i.e., like a regression. In any introductory econometrics course one learns about the problem of multicollinearity and the advantage of having uncorrelated explanatory variables. The same is true in function approximation. Moreover, in numerical problems it is important to have good initial conditions. The problem with the basis functions of regular polynomials (i.e., $1, x, x^2$, etc.) is that these terms are often highly correlated unless one has a lot of variation in the argument. Adding one additional polynomial term could thus very well mean that the coefficients of all polynomial terms change substantially, even if the extra term has little additional explanatory power.

Orthogonal polynomials are such that the basis functions are by construction orthogonal with respect to a certain measure. That is, the basis functions of all orthogonal polynomials satisfy

    \int_a^b T_i(x) T_j(x) w(x) \, dx = 0   for all i \neq j,        (1.11)

for some weighting function $w(x)$. Popular orthogonal polynomials are Chebyshev polynomials, and the reason they are popular will become clear below. Chebyshev polynomials are defined on the interval $[-1, 1]$ and the weighting function is given by

    w(x) = \frac{1}{(1 - x^2)^{1/2}}.        (1.12)

The basis functions of the Chebyshev polynomials are given by

    T_0(x) = 1,
    T_1(x) = x,
    T_{n+1}(x) = 2 x T_n(x) - T_{n-1}(x).
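A short Python sketch (the degree and quadrature size are our own choices) builds the Chebyshev basis with this recursion and checks the orthogonality condition (1.11) numerically; the integral against $w(x) = (1-x^2)^{-1/2}$ is computed with Gauss-Chebyshev quadrature:

```python
import numpy as np

def chebyshev_basis(x, n):
    """Columns are T_0(x), ..., T_n(x), built with the recursion
    T_{j+1}(x) = 2 x T_j(x) - T_{j-1}(x), with T_0 = 1 and T_1 = x."""
    x = np.asarray(x, dtype=float)
    T = np.ones((x.size, n + 1))
    if n >= 1:
        T[:, 1] = x
    for j in range(1, n):
        T[:, j + 1] = 2 * x * T[:, j] - T[:, j - 1]
    return T

m = 2000
xk = np.cos(np.pi * (np.arange(m) + 0.5) / m)  # Gauss-Chebyshev nodes
T = chebyshev_basis(xk, 4)
G = np.pi / m * T.T @ T                        # approximates the integrals in (1.11)
print(np.round(G, 8))                          # off-diagonal entries are zero
```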

Note that if one builds a polynomial with Chebyshev basis functions, i.e.,

    \sum_{i=0}^{n} a_i T_i(x),        (1.13)

then one also has a standard polynomial, i.e., one of the form $b_0 + b_1 x + b_2 x^2 + \cdots + b_n x^n$, where the $b$'s are functions of the $a$'s. The same is true for other orthogonal polynomials. So the reason we use orthogonal polynomials is not that we get a different type of product. The reason we use orthogonal polynomials is that it is easier to find the coefficients; for example, if one adds higher-order terms, good initial conditions are the coefficients one found with the lower-order approximation. Chebyshev polynomials are defined on a particular interval, but note that a continuous function on a compact interval can always be transformed so that it is defined in this range.

1.2.4 Chebyshev nodes

We now define a concept whose importance will become clear in the remainder of this section. Chebyshev nodes are the x-values at which a basis function is equal to zero. For example,

    T_2(x) = 2x^2 - 1,        (1.14)

and the corresponding roots are equal to $-\sqrt{1/2}$ and $+\sqrt{1/2}$. Similarly,

    T_3(x) = 4x^3 - 3x,        (1.15)

and the roots are equal to $-\sqrt{3/4}$, $0$, and $+\sqrt{3/4}$. If one wants to construct $n$ Chebyshev nodes, one thus takes the $n$-th Chebyshev basis function and finds the $n$ roots that set it equal to zero.

1.2.5 Uniform convergence

Weierstrass' theorem implies that there are polynomial approximations that converge uniformly towards the true function, because convergence in the sup norm implies uniform convergence. To find this sequence, however, one must be smart in choosing the points that one uses in the approximation. If one uses observed data points one does not have this degree of freedom, but in many numerical problems one does. The flexibility to choose the approximation points will turn out to be a great benefit in many numerical problems. It turns out that fitting the polynomial at the Chebyshev nodes guarantees uniform convergence. A famous function to document how terrible not having uniform convergence can be is Runge's function,

    f(x) = \frac{1}{1 + 25 x^2},

defined on $[-1, 1]$. As an exercise you should compare the following two strategies to find the coefficients of the approximating polynomial. The first strategy finds the coefficients of the approximating polynomial using $n+1$ equidistant points and their function values. The second strategy uses Chebyshev nodes. The polynomials that one obtains with equidistant points only converge pointwise, and as $n$ increases one sees bigger and bigger oscillations. For a formal discussion one should read Judd (1998). But some intuition for why Chebyshev nodes are so powerful can be obtained by thinking of the formula for standard errors in a standard regression problem. If $X$ is the matrix with all the observations of the explanatory variables and $\sigma^2$ the error variance, then the covariance matrix of the estimated coefficients is given by $\sigma^2 (X'X)^{-1}$. That is, the further apart the x-values, the smaller the standard errors. Chebyshev nodes are more spread towards the boundaries than equidistant points and, thus, we obtain a more accurate approximation using polynomials fitted at Chebyshev nodes.
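The exercise is easy to carry out in Python. In the sketch below (the node counts and the use of scipy's barycentric interpolation routine are our own choices), the maximum error over $[-1, 1]$ explodes with equidistant nodes but shrinks with Chebyshev nodes:

```python
import numpy as np
from scipy.interpolate import BarycentricInterpolator

runge = lambda x: 1.0 / (1.0 + 25.0 * x ** 2)
x_fine = np.linspace(-1.0, 1.0, 1001)

for n in (5, 10, 15, 20):
    equi = np.linspace(-1.0, 1.0, n + 1)
    cheb = np.cos(np.pi * (2.0 * np.arange(n + 1) + 1.0) / (2.0 * (n + 1)))
    for name, nodes in (("equidistant", equi), ("Chebyshev", cheb)):
        p = BarycentricInterpolator(nodes, runge(nodes))  # exact fit at nodes
        err = np.max(np.abs(p(x_fine) - runge(x_fine)))
        print(f"n={n:2d}, {name:11s}: max error = {err:9.4f}")
```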

1.2.6 Other types of basis functions

Orthogonal polynomials can be written as ordinary polynomials. They differ from ordinary polynomials by having different basis functions, but an $n$-th order Chebyshev polynomial can be written as an $n$-th order regular polynomial. Nevertheless, one has quite a bit of flexibility with polynomial approximations. For example, instead of approximating $f(x)$ one can approximate $\tilde{f}(\tilde{x}) = f(\exp(\tilde{x}))$, where $\tilde{x} = \ln(x)$. Or if one knows that the function value is always positive, one can approximate $\ln(f(x))$.

Of course, one also could consider alternatives to using polynomial basis functions. An alternative is to use neural nets. The idea behind neural nets is very similar to using polynomial approximations, but they use different basis functions to build up the approximating function. In particular, let $\mathbf{x}$ be a vector with function arguments and let $f: \mathbb{R}^k \to \mathbb{R}$. Then the neural net approximation is given by

    f(\mathbf{x}) \approx \sum_{j=1}^{n} a_j \, g(\mathbf{x}' b_j + c_j),        (1.16)

where the $a_j$ and $c_j$ are scalars, the $b_j$ are vectors of the same length as $\mathbf{x}$, and $g(\cdot)$ is a scalar squashing function, that is, a function with function values in the unit interval. Neural net approximations are not very popular in macroeconomics. The reason is that neural net approximations need quite a few parameters (layers) to approximate low-order polynomials, and many series in economics are well approximated with polynomials. Neural nets have been more successful in explaining time series with more chaotic behavior.
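A minimal Python sketch of (1.16) with a logistic squashing function (the test function, dimensions, and the choice to fix the inner parameters $b_j$, $c_j$ at random draws are our own; in practice these are fitted too, which makes the estimation problem nonlinear):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))  # 200 draws of a 2-vector x
y = np.exp(X[:, 0]) * (1.0 + X[:, 1])      # function to approximate
B = rng.normal(size=(2, 10))               # inner weights b_j, one column each
c = rng.normal(size=10)                    # inner constants c_j
G = 1.0 / (1.0 + np.exp(-(X @ B + c)))     # g(x'b_j + c_j), values in (0, 1)
a, *_ = np.linalg.lstsq(G, y, rcond=None)  # outer weights a_j by least squares
print(np.max(np.abs(G @ a - y)))           # in-sample fit
```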

1.3 Splines

The approximation procedures discussed above all had in common that, for each possible argument at which one would like to evaluate the function, there is one identical polynomial. The idea behind splines is to split up the domain into different regions and to use a different polynomial for each region. This is a good strategy if the function can only be approximated well with a polynomial of a very high order over the entire domain, but can be approximated well with a sequence of low-order polynomials on different parts of the domain. The inputs to construct a spline are again $n+1$ function values at $n+1$ nodes. Splines can still be expressed as a linear combination of basis functions that is the same for each possible argument, but this combination is not a polynomial. The basis functions are zero over most of the domain. Thus, splines take a much more local approach, and a change in a function value far away from $x$ is much less likely to affect the approximation of $f(x)$ when using splines than when using polynomial approximations.

Piecewise linear

The easiest spline to consider is a piecewise linear interpolation. That is, for $x \in [x_i, x_{i+1}]$,

    f(x) \approx \left(1 - \frac{x - x_i}{x_{i+1} - x_i}\right) y_i + \frac{x - x_i}{x_{i+1} - x_i} y_{i+1}.
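A minimal Python sketch of this formula (the function name and test function are ours); np.searchsorted locates the segment containing each evaluation point:

```python
import numpy as np

def piecewise_linear(x, nodes, values):
    """Piecewise linear interpolation: on [x_i, x_{i+1}] the approximation
    is (1 - w) y_i + w y_{i+1} with w = (x - x_i) / (x_{i+1} - x_i)."""
    i = np.clip(np.searchsorted(nodes, x) - 1, 0, len(nodes) - 2)
    w = (x - nodes[i]) / (nodes[i + 1] - nodes[i])
    return (1.0 - w) * values[i] + w * values[i + 1]

nodes = np.linspace(0.0, 2.0, 6)
values = np.sqrt(nodes)
x = np.array([0.3, 1.1, 1.9])
print(piecewise_linear(x, nodes, values))
print(np.interp(x, nodes, values))  # numpy's built-in gives the same answer
```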

n-th order splines

Piecewise linear splines are in general not differentiable at the nodes, and this can be a disadvantage. But it is easy to deal with this by fitting a (low-order) polynomial on each segment and choosing the coefficients such that the approximation fits the function values at the nodes and the function is smooth at the nodes. Consider what needs to be done to implement a cubic spline. A cubic spline uses

    s_i(x) = a_i + b_i x + c_i x^2 + d_i x^3   for x \in [x_{i-1}, x_i].

Since we have $n$ segments, we have $n$ separate cubic approximations and thus $4n$ unknown coefficients. What are the conditions that we have to pin down these coefficients? We have $2 + 2(n-1)$ conditions to ensure that the approximation matches the given function values at the nodes. For the two endpoints, $x_0$ and $x_n$, we only have one cubic that has to fit correctly. But for the intermediate nodes we need the cubic approximations of both adjacent segments to give the correct answer. For example, we need that

    y_1 = a_1 + b_1 x_1 + c_1 x_1^2 + d_1 x_1^3

and

    y_1 = a_2 + b_2 x_1 + c_2 x_1^2 + d_2 x_1^3.

To ensure differentiability at the intermediate nodes we need

    b_i + 2 c_i x_i + 3 d_i x_i^2 = b_{i+1} + 2 c_{i+1} x_i + 3 d_{i+1} x_i^2   for i \in \{1, \ldots, n-1\},

which gives us $n-1$ conditions. With a cubic spline one can also ensure that the second derivatives are equal. That is,

    2 c_i + 6 d_i x_i = 2 c_{i+1} + 6 d_{i+1} x_i   for i \in \{1, \ldots, n-1\}.

We now have $2 + 4(n-1) = 4n - 2$ conditions to find $4n$ unknowns, so we need two additional conditions. For example, one can set the derivatives at the endpoints equal to zero. There are several algorithms to find the coefficients of spline approximations. See, for example, the approximation toolkit of Miranda and Fackler or the routines of Chapter 6 of Ken Judd's website at http://bucky.stanford.edu/numericalmethods/pubcode/default.htm

Coefficients of the spline

What are the coefficients of the spline? There are two answers. The first answer is that the coefficients that determine the functional form are the $n+1$ $(x_i, y_i)$ combinations. The other answer is that they are all the coefficients of the separate polynomials. Both answers are, of course, correct. But for later applications it is more convenient to think of the coefficients of the spline as the $(x_i, y_i)$ pairs; and when the nodes do not change, the coefficients are just the $n+1$ function values, i.e., the $y_i$'s. Note that these values do not directly reveal the function value at an arbitrary $x$, but given that we use a particular spline, these function values pin down the spline and thus the approximating function value at $x$. More importantly, if finding the approximating function is part of a bigger numerical project, then the goal is to find the $n+1$ function values at the nodes. Those completely pin down the solution.
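In practice one typically relies on library routines rather than solving the $4n$ equations by hand. A sketch with scipy's cubic spline routine (the nodes and test function are our own; bc_type="clamped" imposes zero first derivatives at the endpoints, one way of supplying the two extra conditions discussed above):

```python
import numpy as np
from scipy.interpolate import CubicSpline

nodes = np.linspace(-1.0, 1.0, 8)
values = np.exp(nodes)

s = CubicSpline(nodes, values, bc_type="clamped")  # s'(x_0) = s'(x_n) = 0

print(np.max(np.abs(s(nodes) - values)))  # exact fit at the nodes (~1e-16)
print(s(0.3), s(0.3, 1), s(0.3, 2))       # value, first and second derivative
```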

1.4 Shape-preserving approximations

Polynomial approximations oscillate around the true function value. Moreover, these oscillations could be such that the approximating function does not inherit important properties of the true function, such as monotonicity or concavity. Can you approximate a function and preserve such properties? That is, suppose that the $n+1$ function values satisfy properties such as being positive, monotonicity, and concavity. Can you come up with an approximation that also has these properties? If one uses one polynomial for the complete domain, then this is more likely to happen if one uses a low-order polynomial. But since the fit in terms of distance may be worse for the low-order polynomial, one could face a trade-off between accuracy and desired shape.

Actually, there is one approximation we discussed that automatically preserves monotonicity and concavity, and that is the piecewise linear approximation. That is, if the function $f(x)$ is monotonic and concave, then the $n+1$ function values will of course inherit these properties, and so will the interpolated values. Schumaker's algorithm finds second-order splines that preserve (if present) monotonicity and concavity/convexity.

1.5 Multivariate polynomials

1.5.1 Polynomial approximations

Extending polynomial approximations to multivariate problems is very easy. Just like one easily formulates multivariate polynomials using standard basis functions, one can also formulate multivariate polynomials using other basis functions. For example, consider a function that has $x$ and $y$ as arguments. Then the $n$-th order complete polynomial is given by

    \sum_{i_1 + i_2 \leq n} a_{i_1 i_2} T_{i_1}(x) T_{i_2}(y).

The $n$-th order tensor-product polynomial is given by

    \sum_{i_1 \leq n, \; i_2 \leq n} a_{i_1 i_2} T_{i_1}(x) T_{i_2}(y).

1.5.2 Splines

Linear interpolation is easy for multivariate problems. Here I give the formulas for the interpolation of a function that depends on $x_1$ and $x_2$, where one has the four function values $y^{ll}$, $y^{hl}$, $y^{lh}$, and $y^{hh}$ for the $(x_1, x_2)$ combinations $(x_1^l, x_2^l)$, $(x_1^h, x_2^l)$, $(x_1^l, x_2^h)$, and $(x_1^h, x_2^h)$. Define the weights

    \omega_1 = \frac{x_1 - x_1^l}{x_1^h - x_1^l},   \omega_2 = \frac{x_2 - x_2^l}{x_2^h - x_2^l}.

In this rectangle the interpolated value is given by

    f(x_1, x_2) \approx (1 - \omega_1)(1 - \omega_2) \, y^{ll} + \omega_1 (1 - \omega_2) \, y^{hl} + (1 - \omega_1) \omega_2 \, y^{lh} + \omega_1 \omega_2 \, y^{hh}.        (1.17)

Since the right-hand side has the cross product of $x_1$ and $x_2$, this is a first-order tensor-product polynomial but not a first-order complete polynomial. At the boundaries, i.e., the walls of the box, the function is linear, which means that the function automatically coincides with the interpolated values of the box right next to it.
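A small Python sketch of (1.17) (the corner values and evaluation points are our own example); at the corners the weights collapse to zero/one, so the interpolant fits the four given values exactly:

```python
import numpy as np

def bilinear(x1, x2, x1l, x1h, x2l, x2h, yll, yhl, ylh, yhh):
    """Bilinear interpolation (1.17) on [x1l, x1h] x [x2l, x2h];
    y_ab is the function value at (x1^a, x2^b) for a, b in {l, h}."""
    w1 = (x1 - x1l) / (x1h - x1l)
    w2 = (x2 - x2l) / (x2h - x2l)
    return ((1.0 - w1) * (1.0 - w2) * yll + w1 * (1.0 - w2) * yhl
            + (1.0 - w1) * w2 * ylh + w1 * w2 * yhh)

corners = dict(x1l=0.0, x1h=1.0, x2l=0.0, x2h=1.0,
               yll=1.0, yhl=2.0, ylh=3.0, yhh=4.0)
print(bilinear(0.0, 0.0, **corners))  # 1.0, the corner value y^ll
print(bilinear(0.5, 0.5, **corners))  # 2.5, the average of the four corners
```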

Extending splines to multivariate problems is tricky. To see why, think about what one has to ensure to construct a two-dimensional spline. The equivalent of the segment in the univariate case is now a box, with the floor representing the space of the two arguments. Now focus on the fitted values on one of the walls. These function values and the corresponding derivatives have to be equal to those on the wall of the box next to it. So instead of ensuring equality at a finite set of nodes, one now has to ensure equality at many points, even in the two-dimensional case.

1.6 What type of approximation to use?

Before one uses any approximation method, one should ask oneself what one knows about the function. For example, it is possible that there are special cases for which one knows the functional form. Also, sometimes one knows that the functional form is more likely to be simple in the logs than in the original arguments. This would be the case if one expects the elasticity of $y$ with respect to $x$ to be fairly constant (as opposed to the derivative).

The big question one faces is whether one should use one (possibly high-order) polynomial for the entire domain or several (possibly lower-order) polynomials for separate segments. The latter does not mean one has to use splines with many segments. Using prior knowledge one can also split the domain into a small number of separate regions, for example, a region where a borrowing constraint is binding and one where it is not binding. For each of the regions one then fits a separate approximating function. Note that for the case of borrowing constraints one typically would not want the function to be differentiable at the node that connects the two regions. So one really could simply fit two polynomials on the two regions.