BAYESIAN CURVE FITTING USING PIECEWISE POLYNOMIALS. Dariusz Biskup

1. Introduction

The paper presents a nonparametric procedure for estimating an unknown function $f$ in the regression model

$$y_i = f(x_i) + \varepsilon_i, \quad i = 1, \ldots, N. \tag{1}$$

It is assumed that we have $N$ observations $(x_i, y_i)$ of some independent variable $X$ and dependent variable $Y$. The observations of $Y$ are influenced by $\varepsilon_i$, a normally distributed random component with mean zero and variance $\sigma^2$. The unknown regression function $f$ can be approximated by a composition of a certain number of low-order polynomials defined on intervals separated by so-called knot points. Bayesian estimation of the regression function allows the number and the location of the knot points to be random variables which are estimated from the data. The paper presents several applications of this method which use Markov Chain Monte Carlo techniques.

2. Piecewise polynomials

The piecewise polynomial formula approximating the function $f$ from formula (1) is given by (see [1]):

$$f(x) = \sum_{n=0}^{l} \beta_{n,0} (x - r_0)_+^n + \sum_{m=1}^{k} \sum_{n=l_0}^{l} \beta_{n,m} (x - r_m)_+^n, \tag{2}$$

where:
$(a)_+ = \max(0, a)$,
$k$ is the number of knots,
$r_m$ are the knot points, indexed in ascending order, with the boundary knots $r_0 = x_1$ and $r_{k+1} = x_N$,
$l$ is the order of the piecewise polynomial,
$l_0$ defines the order of continuity; if $l_0 = 0$ then the function may be discontinuous, otherwise it is continuous with $l_0 - 1$ continuous derivatives.
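Formula (2) can be evaluated directly once the knots and coefficients are given. The following is a minimal Python sketch, not the implementation used in the paper. The function name `piecewise_poly`, its argument layout and the treatment of the zero-order truncated term as a step that switches on strictly to the right of a knot are assumptions made for illustration.

```python
import numpy as np

def piecewise_poly(x, knots, r0, beta0, beta, l, l0):
    """Evaluate the piecewise polynomial of formula (2) at the points x.

    knots : interior knot positions r_1, ..., r_k in ascending order
    r0    : left boundary knot r_0 (assumed <= every x)
    beta0 : coefficients beta_{n,0} for n = 0, ..., l   (length l + 1)
    beta  : coefficients beta_{n,m}, one row per knot,
            each row holding n = l0, ..., l             (length l - l0 + 1)
    """
    x = np.asarray(x, dtype=float)
    f = np.zeros_like(x)

    # First sum: ordinary polynomial in (x - r_0); x >= r_0 by assumption,
    # so the truncation has no effect here.
    for n in range(l + 1):
        f += beta0[n] * (x - r0) ** n

    # Second sum: truncated power terms, active only to the right of each knot.
    for m, r in enumerate(knots):
        a = x - r
        for j, n in enumerate(range(l0, l + 1)):
            # (a)_+^n, with the n = 0 term treated as a step function (assumption)
            f += beta[m][j] * np.where(a > 0, a ** n, 0.0)
    return f

# Linear, continuous case (l = l0 = 1) with one knot at x = 1:
# f(x) = 1 + 2x + 3(x - 1)_+
values = piecewise_poly([0.0, 1.0, 2.0], knots=[1.0], r0=0.0,
                        beta0=[1.0, 2.0], beta=[[3.0]], l=1, l0=1)
```

With $l_0 = 0$ the same function produces a jump at each knot, which matches the remark that the curve may then be discontinuous.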

The values of the $l$ and $l_0$ parameters have to be fixed prior to estimation. If we set, for example, $l = l_0 = 3$, then the function (2) is called a cubic polynomial spline. The Bayesian solution to the problem of parameter estimation does not require prior specification of the number and position of the knots. Both are treated as random variables which are estimated using the available data. What is required, however, is the specification of the prior distribution on the number of knots. The number of knots controls the tradeoff between the smoothness of the function and its fit to the data points. The prior distribution used has some effect on the obtained result. It seems, however, that it is not possible to fit a curve automatically without some prior knowledge about how smooth it should be. There are two alternative approaches to the position of the knots. One of them requires that the knots are positioned on the data points $x_i$. The other, which is used in the paper, has no such restriction.

3. Prior and posterior distribution

In order to estimate the parameters of model (2), it is necessary to specify their prior distributions. The following priors will be assumed:
independent normal priors for the $\beta_{n,m}$ parameters, with expectation 0 and standard deviation equal to 1: $\beta_{n,m} \sim N(0, 1)$,
uniform priors on the interval $(r_0, r_{k+1})$ for the knot positions $r_m$, $m = 1, \ldots, k$,
gamma prior for the random component variance: $\sigma^2 \sim G({,}1;\ {,}1)$,
Poisson prior for the number of knots $k$ (the expectation of this prior will vary between the examples).

Except for the prior on the number of knots, the assumed priors are non-informative. The only parameter which requires some subjective judgment is the expectation of the distribution for $k$. It is also possible to assume a hierarchical prior for $k$ (see [1]): $k \sim \mathrm{Poisson}(\lambda)$, $\lambda \sim G(a, b)$, where $a$ and $b$ are specified constants controlling the uncertainty about the expected number of knots.

Let us assume that the purpose is to approximate the value of the function $f$ at a certain point $x_0$.
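We will denote by $Y_0$ the random variable corresponding to the variable $Y$ at the point $x_0$. Let also $D = \{(x_1, y_1), \ldots, (x_N, y_N)\}$, let $\theta = (\sigma^2, \beta_{n,m})$ collect the variance and all the coefficients appearing in (2), let $\vartheta = (k, r_m : m = 1, \ldots, k)$, and let $M$ be the discrete set of possible models. Then (see [4]):

$$p(y_0 \mid D) = \sum_{\vartheta \in M} p(y_0, \vartheta \mid D) = \sum_{\vartheta \in M} p(y_0 \mid D, \vartheta)\, p(\vartheta \mid D). \tag{3}$$

The term $p(\vartheta \mid D)$, called the model probability, can be expressed in the following way:

$$p(\vartheta \mid D) = \frac{p(\vartheta)}{p(D)} \int p(D \mid \theta, \vartheta)\, p(\theta \mid \vartheta)\, d\theta,$$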

where $p(D)$ is constant, $p(\vartheta)$ is the prior probability of a model (in the paper it will be a product of the Poisson distribution for $k$ and uniform distributions for $r_m$), $p(D \mid \theta, \vartheta)$ is the likelihood of the data $D$ and $p(\theta \mid \vartheta)$ is the prior distribution of the parameters $\theta$ in the current model (here it will be a product of normals and a gamma distribution). The term $p(y_0 \mid D, \vartheta)$ in (3) is found using the following formula:

$$p(y_0 \mid D, \vartheta) = \int p(y_0 \mid D, \vartheta, \theta)\, p(\theta \mid D, \vartheta)\, d\theta, \tag{4}$$

where $p(\theta \mid D, \vartheta)$ is the posterior distribution of $\theta$ in model $\vartheta$ and $p(y_0 \mid D, \vartheta, \theta)$ is normal with expectation $f(x_0)$ and variance $\sigma^2$.

It should be noted that the posterior distribution of $y_0$ is not based on the most probable model alone. It is expressed as an average of the predictions from all the possible models, weighted by their posterior probabilities. Such a procedure is called Bayesian model averaging and is more efficient than using just one model for predictions. The computation of the model probability is analytically intractable. Finding the moments of the distribution (3) is possible using the reversible jump algorithm (see [3] and [4] for details). The algorithm is implemented in the WinBUGS language (freely available from http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml).

4. Simulated examples

4.1. Example 1

The first function to be approximated has the following form (see [2]):

$$f(x) = \sin(x) + 2\exp(-30x^2), \quad x \in [-2, 2]. \tag{5}$$

A hundred pairs of data points $(x_i, f(x_i) + \varepsilon_i)$, with $x_i$ equally spaced on the interval $[-2, 2]$ and $\varepsilon_i \sim N(0;\ 0{,}3)$, have been generated. The dataset is presented in Fig. 2. The data have been approximated with a polynomial with parameters $l = l_0 = 2$. The prior distribution for the number of knots was chosen to be Poisson(5) and Poisson(1). 5 iterations of the reversible jump algorithm have been performed, with the first 5 discarded for the burn-in.

Fig. 1. Knot number distribution for example 1 with Poisson(5) (left) and Poisson(1) (right)

The distribution for the number of knots is presented in Fig. 1. As we can see, the most probable value for the number of knots is 3 or 2, depending on the prior. It can also be seen that the maximum number of knots does not exceed 5. Fig. 2 presents the fitted curve for the Poisson(5) prior. The curve is practically identical for the Poisson(1) prior.

Fig. 2. Data for example 1

4.2. Example 2

The second function is (see [1]):

$$f(x) = 4{,}158\, \sqrt{x(1 - x)}\, \sin\!\left(\frac{2\pi (1 + 0{,}05)}{x + 0{,}05}\right), \quad x \in [0, 1]. \tag{6}$$

Five hundred pairs of data points $(x_i, f(x_i) + \varepsilon_i)$, with $x_i$ equally spaced on the interval $[0, 1]$ and $\varepsilon_i \sim N(0;\ 1)$, have been generated. The dataset is presented in Fig. 4. The data have been approximated with a polynomial with parameters $l = 2$ and $l_0 = 1$. The prior distribution for the number of knots was chosen to be Poisson() and Poisson(5). 5 iterations of the reversible jump algorithm have been performed, with the first 5 discarded for the burn-in.

The distribution for the number of knots is presented in Fig. 3. As we can see, the mode is 47 for the Poisson() prior and 8 for the Poisson(5) prior. Fig. 4 presents the fitted curve for the Poisson() prior. The curve is practically identical for the Poisson(5) prior. As we can see, the procedure is quite robust with respect to the prior distribution and for this reason can be described as almost automatic. The fit for this example has also been checked for the Poisson(1) prior. The resulting curve is a little smoother, but again no serious difference can be seen.

Fig. 3. Knot number distribution for example 2 with Poisson() (left) and Poisson(5) (right)

Fig. 4. Data for example 2

4.3. Example 3

The third example involves the regression of per capita GDP against life expectancy at birth. The dataset consisted of 174 observations, one for each country. As the curve is relatively smooth, without any discontinuities, the data have been approximated with a polynomial with parameters $l = l_0 = 2$. The prior distribution for the number of knots was chosen to be Poisson(5) and Poisson(1). 5 iterations of the reversible jump algorithm have been performed, with the first 5 discarded for the burn-in.
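The burn-in step and the knot number distributions reported in the examples can be illustrated with a short Python sketch. It is not the WinBUGS code used in the paper: the function name `knot_posterior` and the toy chain below are hypothetical, and the chain is a hand-written list rather than output of the reversible jump sampler.

```python
from collections import Counter

def knot_posterior(k_chain, burn_in):
    """Estimate the posterior distribution of the number of knots k.

    k_chain : sampled values of k, one per MCMC iteration
    burn_in : number of initial iterations to discard
    Returns (probs, mode): relative frequencies of each k and the most
    frequent value among the retained iterations.
    """
    kept = list(k_chain)[burn_in:]
    if not kept:
        raise ValueError("burn_in discards the whole chain")
    counts = Counter(kept)
    probs = {k: c / len(kept) for k, c in sorted(counts.items())}
    mode = max(probs, key=probs.get)  # ties resolve to the smallest k
    return probs, mode

# Toy chain (hypothetical values): the first two draws are treated as burn-in.
probs, mode = knot_posterior([0, 0, 1, 3, 3, 2, 3, 2, 3, 3], burn_in=2)
# probs == {1: 0.125, 2: 0.25, 3: 0.625}, mode == 3
```

The relative frequencies play the role of the histograms in the knot number figures, and the mode corresponds to the most probable number of knots quoted in the text.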

The distribution for the number of knots is presented in Fig. 5. As we can see, the most probable value for the number of knots is 1 or 4, depending on the prior. Fig. 6 presents the fitted curve for the Poisson(1) prior. The curve is practically identical for the Poisson(5) prior.

Fig. 5. Knot number distribution for example 3 with Poisson(1) (left) and Poisson(5) (right)

Fig. 6. Data for example 3

References:
[1] Denison D.G.T., Mallick B.K., Smith A.F.M.: Automatic Bayesian Curve Fitting, J. R. Statist. Soc. B, Vol. 60, 1998.
[2] DiMatteo I., Genovese C.R., Kass R.E.: Bayesian Curve Fitting with Free-Knot Splines, Biometrika 88, 2001.
[3] Green P.: Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination, Biometrika 82, 1995, 711-732.
[4] Lunn D.J., Best N., Whittaker J.: Generic reversible jump MCMC using graphical models, Technical Report EPH-2005-01, Department of Epidemiology and Public Health, Imperial College London, 2005.

Dr Dariusz Biskup, Wroclaw University of Economics, Komandorska str. 118/120, 53-345 Wroclaw, Poland; dariusz.biskup@ae.wroc.pl