B-Spline Interpolation on Lattices

Similar documents
Solving Systems of Polynomial Equations

The Laplace Expansion Theorem: Computing the Determinants and Inverses of Matrices

Fitting a Natural Spline to Samples of the Form (t, f(t))

Intersection of Ellipsoids

Polyhedral Mass Properties (Revisited)

A Relationship Between Minimum Bending Energy and Degree Elevation for Bézier Curves

Least-Squares Fitting of Data with Polynomials

Low-Degree Polynomial Roots

Distance Between Ellipses in 2D

Dynamic Collision Detection using Oriented Bounding Boxes

MULTIPLIERS OF THE TERMS IN THE LOWER CENTRAL SERIES OF THE LIE ALGEBRA OF STRICTLY UPPER TRIANGULAR MATRICES. Louis A. Levy

Information About Ellipses

Moving Along a Curve with Specified Speed

Intersection of Ellipses

COMS 6100 Class Notes

Gaussian Elimination and Back Substitution

A Hybrid Method for the Wave Equation. beilina

Calculating Algebraic Signatures Thomas Schwarz, S.J.

Intersection of Infinite Cylinders

2.1 Gaussian Elimination

Remainders. We learned how to multiply and divide in elementary

Intersection of Objects with Linear and Angular Velocities using Oriented Bounding Boxes

Computer Science Introductory Course MSc - Introduction to Java

CMSC427 Parametric curves: Hermite, Catmull-Rom, Bezier

NUMERICAL MATHEMATICS & COMPUTING 7th Edition

Solving Linear Systems Using Gaussian Elimination. How can we solve

Pascal s Triangle on a Budget. Accuracy, Precision and Efficiency in Sparse Grids

1 Positive definiteness and semidefiniteness

CHAPTER 3 Further properties of splines and B-splines

The Design Procedure. Output Equation Determination - Derive output equations from the state table

LINEAR SYSTEMS (11) Intensive Computation

Principal Curvatures of Surfaces

System of Linear Equations. Slide for MA1203 Business Mathematics II Week 1 & 2

CHAPTER 4 Knot insertion

Models of Computation, Recall Register Machines. A register machine (sometimes abbreviated to RM) is specified by:

Math 370 Homework 2, Fall 2009

MY PUTNAM PROBLEMS. log(1 + x) dx = π2

Chapter 9: Systems of Equations and Inequalities

Random Number Generation. Stephen Booth David Henty

CS513, Spring 2007 Prof. Amos Ron Assignment #5 Solutions Prepared by Houssain Kettani. a mj i,j [2,n] a 11

An Overly Simplified and Brief Review of Differential Equation Solution Methods. 1. Some Common Exact Solution Methods for Differential Equations

8-1: Backpropagation Prof. J.C. Kao, UCLA. Backpropagation. Chain rule for the derivatives Backpropagation graphs Examples

CHAPTER 4. Interpolation

In practice one often meets a situation where the function of interest, f(x), is only represented by a discrete set of tabulated points,

NOTES ON FINITE FIELDS

SOLVING LINEAR SYSTEMS

1 Solving Algebraic Equations

TOPOLOGICAL COMPLEXITY OF 2-TORSION LENS SPACES AND ku-(co)homology

Numerical Linear Algebra

6. Iterative Methods for Linear Systems. The stepwise approach to the solution...

Coskewness and Cokurtosis John W. Fowler July 9, 2005

Iterative Solvers. Lab 6. Iterative Methods

MATH 56A: STOCHASTIC PROCESSES CHAPTER 1

Numerical Linear Algebra

CSE548, AMS542: Analysis of Algorithms, Fall 2017 Date: October 11. In-Class Midterm. ( 7:05 PM 8:20 PM : 75 Minutes )

5. Surface Temperatures

Jim Lambers MAT 610 Summer Session Lecture 1 Notes

Companion. Jeffrey E. Jones

Error Detection, Correction and Erasure Codes for Implementation in a Cluster File-system

x x2 2 + x3 3 x4 3. Use the divided-difference method to find a polynomial of least degree that fits the values shown: (b)

CS475: Linear Equations Gaussian Elimination LU Decomposition Wim Bohm Colorado State University

V o l u m e 5, N u m b e r 5 2, 1 6 P a g e s. Gold B e U ClUt Stamps Double Stamp D a y E v e r y Wednesday

Distance from Line to Rectangle in 3D

Math 5630: Iterative Methods for Systems of Equations Hung Phan, UMass Lowell March 22, 2018

1 Multiply Eq. E i by λ 0: (λe i ) (E i ) 2 Multiply Eq. E j by λ and add to Eq. E i : (E i + λe j ) (E i )

The Area of Intersecting Ellipses

MODULE 7. where A is an m n real (or complex) matrix. 2) Let K(t, s) be a function of two variables which is continuous on the square [0, 1] [0, 1].

Reconstructing an Ellipsoid from its Perspective Projection onto a Plane

Simple Examples on Rectangular Domains

A matrix over a field F is a rectangular array of elements from F. The symbol

PUTNAM TRAINING POLYNOMIALS. Exercises 1. Find a polynomial with integral coefficients whose zeros include

Linear Algebra and Matrix Inversion

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

Distance from a Point to an Ellipse, an Ellipsoid, or a Hyperellipsoid

Discrete Mathematical Structures: Theory and Applications

Min/Max-Poly Weighting Schemes and the NL vs UL Problem

LINEAR SYSTEMS AND MATRICES

Interpolation of Rigid Motions in 3D

11 The Max-Product Algorithm

The purpose of computing is insight, not numbers. Richard Wesley Hamming

Chapter 7. Linear Algebra: Matrices, Vectors,

17. C M 2 (C), the set of all 2 2 matrices with complex entries. 19. Is C 3 a real vector space? Explain.

Lecture 12 (Tue, Mar 5) Gaussian elimination and LU factorization (II)

LINEAR SYSTEMS, MATRICES, AND VECTORS

ECE Information theory Final (Fall 2008)

Synopsis of Numerical Linear Algebra

Section-A. Short Questions

GALOIS GROUPS OF CUBICS AND QUARTICS (NOT IN CHARACTERISTIC 2)

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2.

Direct Methods for Solving Linear Systems. Simon Fraser University Surrey Campus MACM 316 Spring 2005 Instructor: Ha Le

Chapter 2 - Basics Structures MATH 213. Chapter 2: Basic Structures. Dr. Eric Bancroft. Fall Dr. Eric Bancroft MATH 213 Fall / 60

AMS 209, Fall 2015 Final Project Type A Numerical Linear Algebra: Gaussian Elimination with Pivoting for Solving Linear Systems

Distance to Circles in 3D

MATHEMATICS. IMPORTANT FORMULAE AND CONCEPTS for. Final Revision CLASS XII CHAPTER WISE CONCEPTS, FORMULAS FOR QUICK REVISION.

Recovery Based on Kolmogorov Complexity in Underdetermined Systems of Linear Equations

Poisson Solvers. William McLean. April 21, Return to Math3301/Math5315 Common Material.

4.2 Floating-Point Numbers

GF(2 m ) arithmetic: summary

Chapter 2 - Basics Structures

Data byte 0 Data byte 1 Data byte 2 Data byte 3 Data byte 4. 0xA Register Address MSB data byte Data byte Data byte LSB data byte

Transcription:

B-Spline Interpolation on Lattices David Eberly, Geometric Tools, Redmond WA 98052 https://www.geometrictools.com/ This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA. Created: July 15, 1999 Last Modified: December 14, 2013 Contents 1 Introduction 3 2 B-Spline Blending Matrices 4 3 Direct Implementation 6 3.1 1D Splines............................................... 6 3.2 2D Splines............................................... 7 4 Generalized and Optimized Spline Calculation 8 4.1 Multidimensional Array Layout................................... 8 4.2 Condensing Matrix Products.................................... 9 4.3 Eliminating Nested Loops...................................... 12 4.4 Reduction of Product Tensor Loops................................ 14 4.5 Reduction of Cache Tensor Loops.................................. 14 4.6 Reduction of Evaluation Loops................................... 15 5 Polynomial Construction 17 6 Avoiding Intermediate Calculations 18 7 Computing Data On-Demand 19 8 Putting It All Together 21 8.1 1D Splines............................................... 23 1

8.2 2D Splines............................................... 23 8.3 3D Splines............................................... 24 8.4 4D Splines............................................... 25 8.5 N-dimensional Splines........................................ 26 9 Timing 27 2

1 Introduction It is sufficient to understand spline interpolation on 1-dimensional tabular data with uniformly spaced input values. Spline interpolation of higher dimensional tabular data on a uniform grid becomes a tensor product of 1-dimensional interpolating functions. Given a table of function values f i N 1 i=0 where the inputs are the uniformly spaced integers 0 i N 1, we want to build a spline of degree D which smoothly approximates the function values. The spline should have local control, not because we want to move the sample points around, but because an exact matching spline requires solving large sparse matric systems to determine the spline coefficients. One must also decide if the spline should have the convex hull property (like B-splines, but not Catmull-Rom interpolation). Let B(x) be the spline function of a single variable x. It is defined piecewise on intervals of the form [i, i + 1) for 1 i N D. The restriction on i avoids having to introduce boundary conditions for the interpolation, which would require special handling. For x [i, i + 1), the spline is defined as B(x) = f i 1+j m jk X k (x) where the summation ranges are 0 j D and 0 k D. The polynomial vector is X(x) = ( 1, (x i), (x i) 2,..., (x i) D). The (D + 1) (D + 1) blending matrix M = m jk can be chosen for any desired interpolation scheme. For B-splines I will provide in the next subsection an algorithm where the input is D and the output is M. The evaluation of the B-spline for a given x involves nested summation over the appropriate indices. Derivatives of B(x) are evaluated accordingly: d m B(x) dx m = f d m X k (x) i 1+jm jk dx m where 1 m D. For m > D, the derivatives are identically zero since B(x) is a piecewise polynomial of degree D. Spline interpolation for 2-dimensional tables f ij N 1,M 1 i=0,j=0 are built as a tensor product. Let B(x, y) be the spline function. It is defined piecewise on squares of the form [i, i + 1) [j, j + 1) for 1 i N D and 1 j M D. For (x, y) [i, i + 1) [j, j + 1), the spline is defined by where B(x, y) = f i 1+k,j 1+l m kr m ls X r (x)y s (y) X(x) = ( 1, (x i), (x i) 2,..., (x i) D) Y (y) = ( 1, (y j), (y j) 2,..., (y j) D) Evaluation of B(x, y) and its partial derivatives are straightforward as in the 1-dimensional case. For higher dimensions, the construction is easily extended. For example, in 3 dimensions the spline function for (x, y, z) [i 0, i 0 + 1) [i 1, i 1 + 1) [i 2, i 2 + 1) is where B(x, y, z) = f i0 1+j 0,i 1 1+j 1,i 2 1+j 2 m j0k 0 m j1k 1 m j2k 2 X k0 (x)y k1 (y)z k2 (z) X(x) = ( 1, (x i 0 ), (x i 0 ) 2,..., (x i 0 ) D) Y (y) = ( 1, (y i 1 ), (y i 1 ) 2,..., (y i 1 ) D) Z(z) = ( 1, (z i 2 ), (z i 2 ) 2,..., (z i 2 ) D) 3

2 B-Spline Blending Matrices Here is a short discussion of recursive generation of the basis functions for B-splines. The main result is the construction of the blending matrix when the input values of the tabular data are uniformly spaced integers. Given values t i n i=0, define Recursively define Bi 0 1, t [t i, t i+1 ] (t) = 0, otherwise. ( ) ( ) t Bi d ti (t) = B d 1 ti+d+1 t i (t) + B d 1 i+1 t i+d t i t i+d+1 t (t) i+1 for d 1. For t i = i (uniformly spaced integers), Bi+j d (t) = Bd i (t j), so there is essentially one B-spline basis function to compute for each d, call it B d (t). Thus, 1, t [0, 1] B 0 (t) = 0, otherwise, and where B d+1 (t) = t d + 1 B d(t) + d + 2 t d + 1 B d(t 1) B d (t) = 0 otherwise P (k) d (t), t [k, k + 1) for 0 k d where the P (k) d (t) are polynomials of degree d (to be determined). The recursion now implies and Now let Then P (k) d (t 1) = 1 d! P (k) d+1 (t) = t d i=0 d + 1 P (k) d a (k,d) i (t 1) i = 1 d! The recursion for the polynomials now yields a (k,d+1) i P (0) 0 (t) = 1 (t) + d + 2 t d + 1 P (k 1) d (t 1), 0 k d + 1. P (k) d (t) = 1 d! = a (k,d) i 1 i=0 d i=0 j=i a (k,d) i t i. d d ( 1) j i j i b(k 1,d) i 1 a (k,d) j + (d + 2)b (k 1,d) i t i =: 1 d! d i=0 b (k,d) i t i. for 0 i d + 1 and 0 k d. By convention, if an index is out of range, the term containing that index is 0. The initial data is a (0,0) 0 = 1 (so b (0,0) 0 = 1). 4

Now define then Q (k) d (t) = 1 d! d i=0 Q (k) d a (k,d) i (t + k) i = 1 d! (k) (t) = P (t + k); i=0 d d d j=i k j i j i a (k,d) j t i =: 1 d! The blending matrix M = m ij used in the B-spline tensor (of degree d) calculation is m ij = c (d j,d) i. d i=0 c (k,d) i t i. Blending matrices for some small d values are d = 1 : 1 1 0 1 1 2 1 1 d = 2 : 2 1 2 2 0 0 1 d = 3 : 1 3 3 1 4 0 6 3 1 6 1 3 3 3 0 0 0 1 1 4 6 4 1 11 12 6 12 4 1 d = 4 : 24 11 12 6 12 6 1 4 6 4 4 0 0 0 0 1 1 5 10 10 5 1 26 50 20 20 20 5 66 0 60 0 30 10 1 d = 5 : 120 26 50 20 20 20 10 1 5 10 10 5 5 0 0 0 0 0 1 The routine BlendMatrix in the base class Spline (in file spline.c) implements the recursions described above. 5

3 Direct Implementation The direct implementations of B-splines in 1D and 2D are given here. For higher dimensions, the tensor summations require more nested for loops. For an n-dimensional problem the calculations require 2n nested loops. In the next subsection I ll go through a series of steps to optimize the code and remove the nested loops. The result is reduced address calculation which yields highly optimized code in time. The cost is increased memory usage which is used to store data in an appropriate order to gain the time decrease. 3.1 1D Splines The spline function B(x) = f x 1+j m jk X k (x) and its derivative B (x) are calculated as follows. c o n s t i n t d e g r e e ; // d e g r e e > 0 const i n t dp1 = degree +1; const i n t dim ; // dim >= degree+1 f l o a t m[ dp1 ] [ dp1 ] ; // B s p l i n e m a t r i x f l o a t data [ dim ] ; // 1D data s e t f l o a t x ; // 1 <= x < dim d e g r e e+1 // d e t e r m i n e i n t e r v a l [ bx, bx+1) f o r e v a l u a t i o n i n t bx = f l o o r ( x ) ; // compute i n t e r m e d i a t e t e n s o r ( non p o l y n o m i a l p a r t o f B( x ) ) f l o a t i n t e r [ dp1 ] ; f o r ( i n t kx = 0 ; kx <= d e g r e e ; kx++) i n t e r [ kx ] = 0 ; f o r ( i n t i x = bx 1, j x = 0 ; j x <= d e g r e e ; i x ++, j x ++) i n t e r [ kx ] += data [ i x ] m[ j x ] [ kx ] ; // p o l y n o m i a l s t o r a g e f o r s p l i n e and i t s d e r i v a t i v e s f l o a t px [ dp1 ] ; // compute B( x ) // compute p o l y n o m i a l p a r t ( 1, x, x ˆ2, x ˆ 3,..., x ˆ d e g r e e ) px [ 0 ] = 1 ; f o r ( kx = 1 ; kx <= d e g r e e ; kx++) px [ kx ] = px [ kx 1] (x bx ) ; // compute f i n a l r e s u l t f l o a t r e s u l t = 0 ; f o r ( kx = 0 ; kx <= d e g r e e ; kx++) r e s u l t += i n t e r [ kx ] px [ kx ] ; // compute B ( x ) // compute p o l y n o m i a l p a r t ( 0, 1, 2 x, 3 x ˆ 2,..., d e g r e e x ˆ degree 1) px [ 0 ] = 0 ; px [ 1 ] = 1 ; f o r ( kx = 2 ; kx <= d e g r e e ; kx++) p o l y [ kx ] = kx p o l y [ kx 1] (x bx ) ; // compute f i n a l r e s u l t f l o a t r e s u l t = 0 ; f o r ( kx = 0 ; kx <= d e g r e e ; kx++) r e s u l t += i n t e r [ kx ] px [ kx ] ; // o t h e r d e r i v a t i v e s can be s i m i l a r l y computed 6

3.2 2D Splines The spline function B(x, y) = f x 1+k, y 1+l m kr m ls X r (x)y s (y) and its derivative B x (x, y) are calculated as follows. c o n s t i n t d e g r e e ; // d e g r e e > 0 const i n t dp1 = degree +1; const i n t xdim ; // xdim >= degree+1 const i n t ydim ; // ydim >= degree+1 f l o a t m[ dp1 ] [ dp1 ] ; // B s p l i n e m a t r i x f l o a t data [ ydim ] [ xdim ] ; // 2D data s e t f l o a t x ; // 1 <= x < xdim d e g r e e+1 f l o a t y ; // 1 <= y < ydim d e g r e e+1 // d e t e r m i n e i n t e r v a l [ base0, base0 +1)x [ base1, base1 +1) f o r e v a l u a t i o n i n t bx = f l o o r ( x ) ; i n t by = f l o o r ( y ) ; // compute i n t e r m e d i a t e t e n s o r ( non p o l y n o m i a l p a r t o f B( x, y ) ) f l o a t i n t e r [ dp1 ] [ dp1 ] ; f o r ( i n t ky = 0 ; ky <= d e g r e e ; ky++) f o r ( i n t kx = 0 ; kx <= d e g r e e ; kx++) i n t e r [ ky ] [ kx ] = 0 ; f o r ( i n t i y = by 1, j y = 0 ; j y <= d e g r e e ; i y ++, j y ++) f o r ( i n t i x = bx 1, j x = 0 ; j x <= d e g r e e ; i x ++, j x ++) i n t e r [ ky ] [ kx ] += data [ i y ] [ i x ] m[ j x ] [ kx ] m[ j y ] [ ky ] ; // p o l y n o m i a l s t o r a g e f o r s p l i n e and i t s d e r i v a t i v e s f l o a t px [ dp1 ] ; f l o a t py [ dp1 ] ; // compute B( x, y ) // compute x p o l y n o m i a l p a r t ( 1, x, x ˆ2, x ˆ 3,..., x ˆ d e g r e e ) px [ 0 ] = 1 ; f o r ( kx = 1 ; kx <= d e g r e e ; kx++) px [ kx ] = px [ kx 1] (x bx ) ; // compute y p o l y n o m i a l p a r t ( 1, y, y ˆ2, y ˆ 3,..., y ˆ d e g r e e ) py [ 0 ] = 1 ; f o r ( ky = 1 ; ky <= d e g r e e ; ky++) py [ ky ] = py [ ky 1] (y by ) ; // compute f i n a l r e s u l t f l o a t r e s u l t = 0 ; f o r ( ky = 0 ; ky <= d e g r e e ; ky++) f o r ( kx = 0 ; kx <= d e g r e e ; kx++) r e s u l t += i n t e r [ ky ] [ kx ] px [ kx ] py [ ky ] ; // compute B x ( x, y ) // compute x p o l y n o m i a l p a r t ( 1, x, x ˆ2, x ˆ 3,..., x ˆ d e g r e e ) px [ 0 ] = 1 ; f o r ( kx = 1 ; kx <= d e g r e e ; kx++) px [ kx ] = px [ kx 1] (x bx ) ; // compute y p o l y n o m i a l p a r t ( 0, 1, 2 x, 3 x ˆ 2,..., d e g r e e x ˆ degree 1) py [ 0 ] = 0 ; py [ 1 ] = 1 ; f o r ( ky = 2 ; ky <= d e g r e e ; ky++) py [ ky ] = ky py [ ky 1] ( y by ) ; // compute f i n a l r e s u l t f l o a t r e s u l t = 0 ; f o r ( ky = 0 ; ky <= d e g r e e ; ky++) f o r ( kx = 0 ; kx <= d e g r e e ; kx++) r e s u l t += i n t e r [ ky ] [ kx ] px [ kx ] py [ ky ] ; // o t h e r d e r i v a t i v e s can be s i m i l a r l y computed 7

4 Generalized and Optimized Spline Calculation The first of my goals is to create a spline class for which the user can specify the dimension of the data set dynamically. In the 2D example, the dimension is used statically in three ways. The first way is the structure of the data set and the intermediate tensor as float[][]. The second way is the occurrence of nested loops, two at a time. The third way is that the innermost loop of the intermediate calculation requires multiplication by n matrix values. The second of my goals is to optimize the spline calculations for speed. The bottleneck in the spline evaluation is the calculation of the intermediate tensor, which in general involves 2n nested loops. The next subsections present modifications to meet the three goals. As each modification is made, the revised code for the 2-dimensional case is given. 4.1 Multidimensional Array Layout An n-dimensional array can be stored in memory as a 1-dimensional array of contiguous bytes. The elements of the array are ordered lexicographicaly. Let the data set be defined on an n-dimensional lattice L = n 1 d 0 [0, b d) where [0, b d ) = i ZZ : 0 i < b d for b d > 0. That is, the data set is a function f : L IR. If f(x) is the data value at coordinate x = (x 0, x 1,..., x n 1 ), then the corresponding 1-dimensional index is i = x 0 + b 0 x 1 + b 0 b 1 x 2 + (b 0 b n 2 )x n 1. The coefficient of x k is k 1 d=0 b k (if k = 0, the product defaults to 1). Given a 1-dimensional index i, the corresponding coordinate can be obtained by a sequence of mod and div operations: x = (i mod b 0, (i div b 0 ) mod b 1, ((i div b 0 ) div b 1 ) mod b 2,...). The data and intermediate tensor can be stored as 1-dimensional arrays. The relevant changes in the 2D code are f l o a t data [ xdim ydim ] ; // f ( i, j ) = data [ i+xdim j ] // compute i n t e r m e d i a t e t e n s o r f l o a t i n t e r [ dp1 dp1 ] ; f o r ( i n t ky = 0 ; ky <= d e g r e e ; ky++) f o r ( i n t kx = 0 ; kx <= d e g r e e ; kx++) f l o a t sum = 0 ; f o r ( i n t i y = by 1, j y = 0 ; j y <= d e g r e e ; i y ++, j y ++) f o r ( i n t i x = bx 1, j x = 0 ; j x <= d e g r e e ; i x ++, j x ++) sum += data [ i x+xdim i y ] m[ j x ] [ kx ] m[ j y ] [ ky ] ; i n t e r [ kx+dp1 ky ] = sum ; // compute B( x, y ) f l o a t r e s u l t = 0 ; f o r ( ky = 0 ; ky <= d e g r e e ; ky++) f o r ( kx = 0 ; kx <= d e g r e e ; kx++) r e s u l t += i n t e r [ kx+dp1 ky ] px [ kx ] py [ ky ] ; A small modification to the loops can avoid the address calculation kx+dp1*ky. This works because of the lexicographical ordering of inter and the ordering of the k loops. // compute i n t e r m e d i a t e t e n s o r i n t i i n d e x = 0 ; 8

f o r ( i n t ky = 0 ; ky <= d e g r e e ; ky++) f o r ( i n t kx = 0 ; kx <= d e g r e e ; kx++) f l o a t sum = 0 ; f o r ( i n t i y = by 1, j y = 0 ; j y <= d e g r e e ; i y ++, j y ++) f o r ( i n t i x = bx 1, j x = 0 ; j x <= d e g r e e ; i x ++, j x ++) sum += data [ i x+xdim i y ] m[ j x ] [ kx ] m[ j y ] [ ky ] ; i n t e r [ i i n d e x ++] = sum ; // compute B( x, y ) f l o a t r e s u l t = 0 ; i i n d e x = 0 ; f o r ( ky = 0 ; ky <= d e g r e e ; ky++) f o r ( kx = 0 ; kx <= d e g r e e ; kx++) r e s u l t += i n t e r [ i i n d e x ] px [ kx ] py [ ky ] ; A similar modification to avoid the address calculation ix+xdim*iy is not straightforward since the data accessed is a subrectangle of the original data set whose columns (x) are stored in contiguous memory for each row, but whose rows (y) are not contiguous. It is, however, not necessary to do so. The method used to avoid a static number of matrix values m[j][k] in the innermost loop will require the data to be cached into a contiguous block of memory. Anticipating this, the intermediate tensor code becomes // cache t h e data f l o a t cache [ dp1 dp1 ] ; i n t c i n d e x = 0 ; f o r ( i n t i y = by 1; i y <= by 1+d e g r e e ; i y ++) f o r ( i n t i x = bx 1; i x <= bx 1+d e g r e e ; i x ++) cache [ c i n d e x ] = data [ i x+xdim i y ] ; // compute i n t e r m e d i a t e t e n s o r i n t i i n d e x = 0 ; f o r ( i n t ky = 0 ; ky <= d e g r e e ; ky++) f o r ( i n t kx = 0 ; kx <= d e g r e e ; kx++) f l o a t sum = 0 ; c i n d e x = 0 ; f o r ( i n t j y = 0 ; j y <= d e g r e e ; j y ++) f o r ( i n t j x = 0 ; j x <= d e g r e e ; j x ++) sum += cache [ c i n d e x ] m[ j x ] [ kx ] m[ j y ] [ ky ] ; i n t e r [ i i n d e x ++] = sum ; 4.2 Condensing Matrix Products The next step is to remove the explicit matrix product m[jx][kx] * m[jy][ky] which forces a static dependence on dimension 2. Let i 0 and i 1 be given (which is the case for a single spline evaluation). The intermediate second-order tensor is In the code we have the correspondences I k0k 1 = m j0k 0 m j1k 1 f i0 1+j 0 i 1 1+j 1. 9

symbols i 0 i 1 i 0 1 + j 0 i 1 1 + j 1 j 0 j 1 k 0 k 1 variables bx by ix iy jx jy kx ky The matrix product can be viewed abstractly as a fourth-order tensor P j0j 1k 0k 1 = m j0k 0 m j1k 1. For given i 0 and i 1 (which is the case for a single spline evaluation), the cached data is a second-order tensor The intermediate tensor is therefore C j0j 1 = f i0 1+j 0 i 1 1+j 1. I k0k 1 = P j0j 1k 0k 1 C j0j 1. Let J be the 1-dimensional index corresponding to the coordinate (j 0, j 1 ). If D is the degree of the spline, then 0 J < D 2 and J is mapped to the coordinate via J = j 0 + D j 1. Similarly K corresponds to (k 0, k 1 ) with 0 K < D 2 and K = k 0 + D k 1. The intermediate tensor can be written in multiindex notation as I K = P JK C J where the right-hand side is a matrix P times a vector C. The intermediate code, which additionally stores P as a 1D array, now becomes // cache t h e data f l o a t cache [ dp1 dp1 ] ; i n t c i n d e x = 0 ; f o r ( i n t i y = by 1; i y <= by 1+d e g r e e ; i y ++) f o r ( i n t i x = bx 1; i x <= bx 1+d e g r e e ; i x ++) cache [ c i n d e x ] = data [ i x+xdim i y ] ; // compute p r o d u c t t e n s o r ( o r d e r o f l o o p s i s i m p o r t a n t ) f l o a t p r o d u c t [ dp1 dp1 dp1 dp1 ] ; i n t p i n d e x = 0 ; f o r ( i n t k1 = 0 ; k1 <= d e g r e e ; k1++) f o r ( i n t k0 = 0 ; k0 <= d e g r e e ; k0++) f o r ( i n t j 1 = 0 ; j 1 <= d e g r e e ; j 1++) f o r ( i n t j 0 = 0 ; j 0 <= d e g r e e ; j 0++) p r o d u c t [ p i n d e x ] = m[ j 0 ] [ k0 ] m[ j 1 ] [ k1 ] ; // compute i n t e r m e d i a t e t e n s o r const i n t dp1 to 2 = dp1 dp1 ; f o r ( i n t K = 0 ; K < d p 1 t o 2 ; K++) f l o a t sum = 0 ; f o r ( i n t J = 0 ; J < d p 1 t o 2 ; J++) sum += p r o d u c t [ J+d p 1 t o 2 K] cache [ J ] ; i n t e r [K] = sum ; 10

A minor modification can be made to remove the calculation of J+dp1 to 2*K. p i n d e x = 0 ; f o r ( i n t K = 0 ; K < d p 1 t o 2 ; K++) f l o a t sum = 0 ; f o r ( i n t J = 0 ; J < d p 1 t o 2 ; J++, p i n d e x++) sum += p r o d u c t [ p i n d e x ] cache [ J ] ; i n t e r [K] = sum ; Note that (in general) the original 2n intermediate tensor loops are reduced to 2 loops, thereby reducing address calculations. The cost is the increased space required to store the product tensor. After these changes, the intermediate tensor calculation is still the bottleneck in the calculations. The calculation time can be further reduced. The (D + 1) (D + 1) B-spline matrix has at least D zeros (the ones occurring in the last row). The presence of these zeros forces many of the product tensor entries to be zero. For example, dimension degree zeros total ratio 2 1 7 16 0.438 2 32 81 0.395 3 112 256 0.438 4 184 625 0.294 5 455 1296 0.351 3 1 37 64 0.578 2 386 729 0.529 3 2368 4096 0.578 4 6364 15625 0.407 5 22267 46656 0.477 4 1 175 256 0.684 2 4160 6561 0.634 3 44800 65536 0.684 4 196144 390625 0.502 A significant portion of time is spent updating the running sum in the innermost loop with zero. We can add another fourth-order tensor (of integers) to the code. This tensor measures how many of the product tensor entries to skip to get to the next nonzero value. That is, if P i is the product tensor stored as a 1-dimensional array, then define the skip tensor by S i = j if P i 0, P i+j 0, and P k = 0 for i < k < i + j. Note that if P i 0 and P i+1 0, then S i = 1. If P i = 0, the value of S i is irrelevant, but I will set it to 1. The code to calculate the skips is // compute p r o d u c t t e n s o r ( o r d e r o f l o o p s i s i m p o r t a n t ) i n t dp1 to 4 = dp1 dp1 dp1 dp1 ; f l o a t p r o d u c t [ d p 1 t o 4 ] ; i n t p i n d e x = 0 ; 11

f o r ( i n t k1 = 0 ; k1 <= d e g r e e ; k1++) f o r ( i n t k0 = 0 ; k0 <= d e g r e e ; k0++) f o r ( i n t j 1 = 0 ; j 1 <= d e g r e e ; j 1++) f o r ( i n t j 0 = 0 ; j 0 <= d e g r e e ; j 0++) p r o d u c t [ p i n d e x ] = m[ j 0 ] [ k0 ] m[ j 1 ] [ k1 ] ; // compute s k i p t e n s o r f l o a t s k i p [ d p 1 t o 4 ] ; f o r ( i n t i = 0 ; i < d p 1 t o 4 ; ) s k i p [ i ] = 1 ; f o r ( i n t j = 0 ; j < d p 1 t o 4 && p r o d u c t [ j ] == 0 ; j ++) s k i p [ j ]++; i = j ; // compute i n t e r m e d i a t e t e n s o r ( o r d e r o f i n c r e m e n t o f J and I i m p o r t a n t ) f o r ( i n t K = 0, I = 0 ; K < d p 1 t o 2 ; K++) f l o a t sum = 0 ; f o r ( i n t J = 0 ; J < d p 1 t o 2 ; J += s k i p [ I ], I += s k i p [ I ] ) sum += p r o d u c t [ I ] cache [ J ] ; i n t e r [K] = sum ; The product and skip tensors can be computed at time of spline object creation. 4.3 Eliminating Nested Loops The general idea of converting N nested loops to a single loop is described. Consider the example i n t L0, L1 ; // s p e c i f i e d l o w e r bounds i n t U0, U1 ; // s p e c i f i e d upper bounds, L0 < U0 and L1 < U1 // 2 n e s t e d l o o p s f o r ( i n t i 1 = L1 ; i 1 < U1 ; i 1 ++) f o r ( i n t i 0 = L0 ; i 0 < U0 ; i 0 ++) // l o o p body goes h e r e One way to convert to a single loop is to use the conversions between 1-dimensional indices and 2-dimensional coordinates. Using the lexicographic ordering j = (i 0 L 0 ) + b 0 (i 1 L 1 ) where b 0 = U 0 L 0, // e q u i v a l e n t s i n g l e l o o p i n t b0 = U0 L0, b1 = U1 L1, q u a n t i t y = b0 b1 ; f o r ( i n t j = 0 ; j < q u a n t i t y ; j ++) i 0 = L0 + ( j % b0 ) ; i 1 = L1 + ( j / b0 ) ; // l o o p body goes h e r e The general case of N loops is c o n s t i n t N; // d i m e n s i o n s, N > 0 i n t L [N], U[N ] ; // bounds, L [ d ] < U[ d ] i n t i [N ] ; // l o o p i n d i c e s, L [ d ] <= i [ d ] < U[ d ] // N n e s t e d l o o p s f o r ( i [ N 1] = L [ N 1]; i [ N 1] < U[N 1]; i [ N 1]++) f o r ( i [ N 2] = L [ N 1]; i [ N 2] < U[N 2]; i [ N 2]++) : f o r ( i [ 0 ] = L [ 0 ] ; i [ 0 ] < U [ 0 ] ; i [0]++) // l o o p body goes h e r e // e q u i v a l e n t s i n g l e l o o p 12

i n t b [N ] ; i n t q u a n t i t y = 1 ; f o r ( i n t d = 0 ; d < N; d++) b [ d ] = U[ d] L [ d ] ; q u a n t i t y = b [ d ] ; f o r ( i n t j = 0 ; j < q u a n t i t y ; j ++) i n t temp = j ; f o r ( d = 0 ; d < N; d++) i [ d ] = L [ d ] + temp % b [ d ] ; temp /= b [ d ] ; // l o o p body goes h e r e In exchange for a single loop, the execution time increases because of coordinate calcuations. Another reduction scheme is described now. The previous one made conversions from indices to coordinates which work for a random index. The algorithm does not take into account that the indices are processed in the order corresponding to the lexicographical ordering of the coordinates. Consider the same example as before, i n t L0, L1 ; // s p e c i f i e d l o w e r bounds i n t U0, U1 ; // s p e c i f i e d upper bounds, L0 < U0 and L1 < U1 // 2 n e s t e d l o o p s f o r ( i n t i 1 = L1 ; i 1 < U1 ; i 1 ++) f o r ( i n t i 0 = L0 ; i 0 < U0 ; i 0 ++) // l o o p body goes h e r e // e q u i v a l e n t s i n g l e l o o p i n t b0 = U0 L0, b1 = U1 L1, q u a n t i t y = b0 b1 ; i 0 = L0 ; i 1 = L1 ; f o r ( i n t j = 0 ; j < q u a n t i t y ; j ++) // l o o p body goes h e r e i f ( ++i 0 < U0 ) continue ; i 0 = 0 ; i 1 ++; // j==q u a n t i t y when i 1==U1, so o n l y j t e s t needed In the general case we have the reduction c o n s t i n t N; // d i m e n s i o n s, N > 0 i n t L [N], U[N ] ; // bounds, L [ d ] < U[ d ] i n t i [N ] ; // l o o p i n d i c e s, L [ d ] <= i [ d ] < U[ d ] // N n e s t e d l o o p s f o r ( i [ N 1] = L [ N 1]; i [ N 1] < U[N 1]; i [ N 1]++) f o r ( i [ N 2] = L [ N 2]; i [ N 2] < U[N 2]; i [ N 2]++) : f o r ( i [ 0 ] = L [ 0 ] ; i [ 0 ] < U [ 0 ] ; i [0]++) // l o o p body goes h e r e // e q u i v a l e n t s i n g l e l o o p i n t q u a n t i t y = 1 ; f o r ( i n t d = 0 ; d < N; d++) q u a n t i t y = U[ d] L [ d ] ; f o r ( d = 0 ; d < N; d++) i [ d ] = L [ d ] ; 13

f o r ( i n t j = 0 ; j < q u a n t i t y ; j ++) // l o o p body goes h e r e f o r ( d = 0 ; d < N; d++) i f ( ++i [ d ] < U[ d ] ) break ; i [ d ] = L [ d ] ; 4.4 Reduction of Product Tensor Loops The first reduction method is applied to computing the product tensor from the spline matrix. For dimension N, 2N nested loops are required c o n s t i n t N; // d i m e n s i o n s i n t dp1 to 2N = 1 ; f o r ( i n t d = 0 ; d < N; d++) dp1 to 2N = dp1 ; dp1 to 2N = dp1 to 2N ; // compute p r o d u c t t e n s o r ( o r d e r o f l o o p s i s i m p o r t a n t ) f l o a t p r o d u c t [ dp1 to 2N ] ; i n t p i n d e x = 0 ; i n t j [N], k [N ] ; f o r ( k [ N 1] = 0 ; k [ N 1] <= d e g r e e ; k [ N 1]++) : f o r ( k [ 0 ] = 0 ; k [ 0 ] <= d e g r e e ; k [0]++) f o r ( j [ N 1] = 0 ; j [ N 1] <= d e g r e e ; j [ N 1]++) : f o r ( j [ 0 ] = 0 ; j [ 0 ] <= d e g r e e ; j [0]++) p r o d u c t [ p i n d e x ] = m[ j [ 0 ] ] [ k [ 0 ] ]... m[ j [ N 1 ] ] [ k [ N 1 ] ] ; These can be reduced to i n t i [2 N ] ; f o r ( i n t j = 0 ; j < dp1 to 2N ; j ++) i n t temp = j ; f o r ( d = 0 ; d < 2 N; d++) i [ d ] = temp % dp1 ; temp /= dp1 ; p r o d u c t [ j ] = 1 ; f o r ( i n t k = 0 ; k < N; k++) p r o d u c t [ k ] = m[ i [ k ] ] [ i [ k+n ] ] ; Because of the nested loop ordering, the 2N-dimensional coordinate i[] contains j 0 through j N 1 in its first N components and k 0 through k N 1 in its last N components. 4.5 Reduction of Cache Tensor Loops This reduction uses the second method, but it needs a minor modification to extract the sublattice of data points. The 2-dimensional code is i n t i x = bx 1, i y = by 1, d i n d e x = i x+xdim i y ; f o r ( i n t c i n d e x = 0 ; c i n d e x < d p 1 t o 2 ; c i n d e x ++, d i n d e x++) cache [ c i n d e x ] = data [ d i n d e x ] ; i f ( ++i x <= bx 1+degree ) continue ; 14

i x = bx 1; d i n d e x += xdim dp1 ; i y ++; The data sublattice is a rectangle of size (D + 1) (D + 1). When the index ix points to the last column in a row, the next incrementing of it takes it out-of-range, so it is reset to the first column in the next row ix = bx-1. At the same time the data index dindex was incremented to an out-of-range column (immediately to the right of the data set). The addition of xdim to dindex gets you to the same out-of-range column, but in the next row. The subtraction of dp1 from dindex gets you to the first column in that next row. The extension of this idea to N-dimensions is c o n s t i n t N; // d i m e n s i o n s, N > 0 i n t d e g r e e ; // d e g r e e D o f s p l i n e, D > 0 i n t dp1 ; // D+1 i n t dp1 to N ; // (D+1)ˆN i n t dim [N ] ; // d i m e n s i o n s i z e s, dim [ d ] > 0 i n t g r i d m i n [N ] ; // c o o r d i n a t e o f f i r s t data p o i n t to be p r o c e s s e d i n t g r i d m a x [N ] ; // g r i d m a x [ d ] = g r i d m i n [ d]+ d e g r e e i n t i [N ] ; f o r ( i n t d = 0 ; d < N; d++) i [ d ] = g r i d m i n [ d ] ; // compute 1 dim s t a r t i n g i n d e x from N dim s t a r t i n g c o o r d i n a t e i n t d i n d e x = i [ N 1]; f o r ( d = N 2; d >= 0 ; d ) d i n d e x = dim [ d ] ; d i n d e x += i [ d ] ; // compute data a d d r e s s d i f f e r e n c e s, used when l o o p v a r out of r a n g e i n t d e l t a [ N 1]; i n t temp = 1 ; f o r ( d = 0 ; d < N 1; d++) d e l t a [ d ] = temp ( dim [ d] dp1 ) ; temp = dim [ d ] ; f o r ( i n t c i n d e x = 0 ; c i n d e x < dp1 to N ; c i n d e x ++, d i n d e x++) cache [ c i n d e x ] = data [ d i n d e x ] ; f o r ( d = 0 ; d < N; d++) i f ( i [ d ] <= g r i d m a x [ d ] ) break ; i [ d ] = g r i d m i n [ d ] ; d i n d e x += d e l t a [ d ] ; 4.6 Reduction of Evaluation Loops Consider the example of evaluating a B-spline in two variables. The final loops in the process are shown below. // i n t e r m e d i a t e t e n s o r c o m p u t a t i o n s go h e r e // p o l y n o m i a l c o m p u t a t i o n s f o r px [ ] and py [ ] go h e r e // compute B( x, y ) f l o a t r e s u l t = 0 ; i n t i i n d e x = 0 ; f o r ( i n t ky = 0 ; ky <= d e g r e e ; ky++) 15

f o r ( i n t kx = 0 ; kx <= d e g r e e ; kx++) r e s u l t += i n t e r [ i i n d e x ] px [ kx ] py [ ky ] ; The loops can be replaced using the second reduction method: i n t i 0 = 0, i 1 = 0 ; f l o a t r e s u l t = 0 ; f o r ( i n t j = 0 ; j < d p 1 t o 2 ; j ++) r e s u l t += i n t e r [ j ] px [ i 0 ] py [ i 1 ] ; i f ( i 0 <= d e g r e e ) continue ; i 0 = 0 ; i 1 ++; In N dimensions, the reduction is f l o a t p o l y [N ] [ dp1 ] ; // N v a r i a b l e s, each p o l y o f d e g r e e D+1 i n t i [N ] ; f l o a t r e s u l t = 0 ; f o r ( i n t j = 0 ; j < dp1 to N ; j ++) f l o a t temp = i n t e r [ j ] ; f o r ( i n t d = 0 ; d < N; d++) temp = p o l y [ d ] [ i [ d ] ] ; r e s u l t += temp ; f o r ( d = 0 ; d < N; d++) i f ( i [ d ] <= d e g r e e ) break ; i [ d ] = 0 ; Now consider evaluating a derivative, say B x (x, y). We evaluated the appropriate polynomial terms, but used the same final set of loops. Note, however, that for a derivative in x of order m, the first m components of the polynomial array are zero. Thus, the loops could be written as shown to avoid the multiplications by zero. // compute dx p a r t i a l d e r i v a t i v e s o f B( x, y ) w i t h r e s p e c t to x // and dy p a r t i a l d e r i v a t i v e s o f B( x, y ) w i t h r e s p e c t to y f l o a t r e s u l t = 0 ; f o r ( ky = dy ; ky <= degree ; ky++) f o r ( kx = dx ; kx <= degree ; kx++) r e s u l t += i n t e r [ ky ] [ kx ] px [ kx ] py [ ky ] ; In the previous example, the loop index j is used to traverse all of the intermediate tensor values. To avoid the multiplications by zero, now j needs to skip over blocks of intermediate tensor values which would have been the other operands of the zero multiplication. The code is i n t dx, dy ; // o r d e r s o f d i f f e r e n t a t i o n // px [ 0 ] =... = px [ dx 1] = 0 // py [ 0 ] =... = px [ dy 1] = 0 i n t i 0 = dx, i 1 = dy ; i n t d e l t a 0 = dx ; // amount o f s k i p i n t h e i n t e r [ ] t e n s o r f l o a t r e s u l t = 0 ; f o r ( i n t j = i 0+dp1 i 1 ; j < dg1 to N ; j +) r e s u l t += i n t e r [ j ] px [ i 0 ] py [ i 1 ] ; i f ( ++i 0 <= degree ) continue ; i 0 = dx ; j += d e l t a 0 ; // s k i p t h e f i r s t dx i n t e r [ ] v a l u e s i n t h e n e x t row 16

i 1 ++; In N dimensions, the reduction is f l o a t p o l y [N ] [ dp1 ] ; // N v a r i a b l e s, each p o l y o f d e g r e e D+1 i n t dx [N ] ; // o r d e r s o f d i f f e r e n t i a t i o n f o r v a r s x0 t h r o u g h x N 1 // p o l y [ d ] [ 0 ] =... = p o l y [ d ] [ dx [ d ] 1] = 0 // l o o p i n d i c e s i n t i [N ] ; f o r ( i n t d = 0 ; d < N; d++) i [ d ] = dx [ d ] ; // s k i p amounts f o r i n t e r [ ] t e n s o r i n t d e l t a [ N 1]; i n t temp = 1 ; f o r ( d = 0 ; d < N 1; d++) d e l t a [ d ] = temp dx [ d ] ; temp = dp1 ; // s t a r t i n g i n d e x i n t j = i [ N 1]; f o r ( d = N 2; d >= 0 ; d ) j = dp1 ; j += i [ d ] ; f l o a t r e s u l t = 0 ; f o r ( ; j < dp1 to N ; j ++) f l o a t temp = i n t e r [ j ] ; f o r ( i n t d = 0 ; d < N; d++) temp = p o l y [ d ] [ i [ d ] ] ; r e s u l t += temp ; f o r ( d = 0 ; d < N; d++) i f ( i [ d ] <= d e g r e e ) break ; i [ d ] = dx [ d ] ; j += d e l t a [ d ] ; 5 Polynomial Construction It is sufficient to consider the 1-dimensional case since each polynomial is independent of the others. The following table illustrates what the polynomial arrays will store. The value t = x x and D is the degree 17

of the polynomial. derivative order polynomial array 0 ( 1, t, t 2, t 3,..., t D ) 1 ( 0, 1, 2t, 3t 2,..., Dt D 1 ) 2 ( 0, 0, 2, 6t,..., D(D 1)t D 2 ).. D ( 0, 0, 0, 0,..., D! ) D + 1 ( 0, 0, 0, 0,..., 0 ) To allow for a dynamically selected degree D, we need to construct the coefficients for the polynomials above. We can do so by building a (D + 1) (D + 1) matrix, [α rc ] which stores the coefficients indicated in the table above. The matrix is upper triangular with row index 0 r D + 1 indicating order of differentiation and column index 0 c D + 1 indicating the power of t that goes with the coefficient. It is given by The code implementing this is α rc = f l o a t c o e f f [ dp1 ] [ dp1 ] ; f o r ( i n t r = 0 ; r <= d e g r e e ; r++) f o r ( i n t c = r ; c <= d e g r e e ; c++) c o e f f [ r ] [ c ] = 1 ; f o r ( i = 0 ; i < r ; i ++) c o e f f [ r ] [ c ] = c i ; The code to compute the polynomial array is f l o a t p o l y [ dp1 ] ; f l o a t t = x f l o o r ( x ) ; i n t o r d e r = dx ; // o r d e r o f d i f f e r e n t a t i o n f l o a t temp = 1 ; f o r ( i n t d = o r d e r ; o r d e r <= d e g r e e ; d++) p o l y [ d ] = c o e f f [ o r d e r ] [ d ] temp ; temp = t ; 1 c r = 0 r 1 i=0 (c i) c r > 0. 6 Avoiding Intermediate Calculations The order of calculations in the evaluation of B-splines was: compute floors of the input variables to use as base indices, compute the intermediate tensor product, compute the polynomial terms, multiply the intermediate tensor product with the polynomials and return the result. Even with the optimizations discussed earlier, the intermediate tensor calculation is still the bottleneck. For random selection of input variables, this tensor must always be calculated. In many applications, the evaluation of splines is localized. Given that the spline is evaluated at x, the probability is large that the next evaluation is at y where x y is smaller than one unit distance relative to the lattice on which the data is defined. Using the locality of reference, it 18

is highly probable that the base indices are the same for both x and y. If we update the intermediate tensor only when a base index changes, we could save some significant time. The code to do this is given. Variable names are consistent with what was used earlier in this section. // i n i t i a l i z a t i o n code c o n s t i n t N; // d i m e n s i o n s, N > 0 i n t o l d b a s e [N ] ; // s a v e s t h e base i n d i c e s from t h e p r e v i o u s e v a l u a t i o n i n t base [N ] ; // c u r r e n t base i n d i c e s, 1 <= base [ d ] < N d e g r e e+1 f o r ( i n t d = 0 ; d < N; d++) o l d b a s e [ d ] = 1; // base [ d ] w i l l be d i f f e r e n t on f i r s t c a l l // code p e r e v a l u a t i o n f l o a t x [N ] ; // i n p u t to s p l i n e e v a l u a t i o n f o r ( d = 0 ; d < N; d++) base [ d ] = f l o o r ( x [ d ] ) ; // compute base i n d i c e s f o r c u r r e n t c a l l f o r ( d = 0 ; d < N; d++) i f ( o l d b a s e [ d ]!= base [ d ] ) // base i n d e x has changed // s w i t c h to new l o c a l g r i d f o r ( i n t k = 0 ; k < N; k++) o l d b a s e [ k ] = base [ k ] ; g r i d m i n [ k ] = base [ k ] 1; g r i d m a x [ k ] = g r i d m i n [ k]+ d e g r e e ; // compute i n t e r m e d i a t e t e n s o r code goes h e r e break ; // compute p o l y n o m i a l terms code goes h e r e // f i n a l c a l c u l a t i o n s go h e r e ( m u l t i p l i c a t i o n o f p o l y n o m i a l s and // i n t e r m e d i a t e t e n s o r ) 7 Computing Data On-Demand For some applications, the tabular data might not be completely known when the B-spline object is to be constructed. The reason most likely is that the precalculation of the entire data set is very expensive. In these applications it is economical to compute only those local grid points needed for evaluation. For a N-dimensional data set, each dimension requiring K values, the total number of grid points is K N. Precalculation takes order O(K N ) time. If the application only needs to analyze data lying approximately on a curve in the data set, the time to process this localized data is order O(DK) where D is the degree of the spline. Clearly this is a large time savings. The user of a spline object can provide a callback function which is executed whenever tabular data have not yet been calculated. The callback has the job of computing the tabular data. A mechanism to handle this is as follows. Any unknown tabular data are assigned an invalid floating point number. I use MAXFLOAT for this purpose. If for some reason the application allows MAXFLOAT as valid data, then some invalid bit pattern for IEEE short reals should be chosen. Whenever a new local grid is entered, the spline object checks for invalid data and executes the callback for that data. For a 2-dimensional spline class Spline2, // data i s t h e t a b u l a r data p o i n t e r used by s p l i n e // data i s o f s i z e xdim by ydim v o i d S p l i n e 2 : : EvaluateUnknownData ( ) f o r ( i n t i 1 = g r i d m i n [ 1 ] ; i 1 <= g r i d m a x [ 1 ] ; i 1 ++) f o r ( i n t i 0 = g r i d m i n [ 0 ] ; i 0 <= g r i d m a x [ 0 ] ; i 0 ++) 19

i n t i n d e x = i 0+xmin i 1 ; i f ( data [ i n d e x ] == INVALID DATA ) data [ i n d e x ] = e v a l u a t e c a l l b a c k ( i0, i 1 ) ; Th callback function pointer is assigned to spline object member function evaluate callback) by the user. The method for general dimensions is i n t S p l i n e N : : I n d e x ( i n t i ) i n t i n d e x = i [ N 1]; f o r ( i n t d = N 2; d >= 0 ; d ) i n d e x = dp1 ; i n d e x += i [ d ] ; r e t u r n i n d e x ; v o i d SplineN : : EvaluateUnknownData ( ) i n t i [N ] ; f o r ( i n t d = 0 ; d < N; d++) i [ d ] = g r i d m i n [ d ] ; f o r ( i n t j = 0 ; j < dp1 to N ; j ++) i n t i n d e x = I n d e x ( i ) ; i f ( data [ i n d e x ] == INVALID DATA ) data [ i n d e x ] = e v a l u a t e c a l l b a c k ( i n d e x ) ; f o r ( d = 0 ; d < N; d++) i f ( ++i [ d ] <= g r i d m a x [ d ] ) break ; i [ d ] = g r i d m i n [ d ] ; Only the code block for computing the intermediate tensor product needs the tabular data. The evaluation code (in general dimensions) now looks like // S p l i n e p r o v i d e s member f u n c t i o n p o i n t e r, // f l o a t ( e v a l u a t e c a l l b a c k ) ( i n t ) // f o r s t o r i n g t h e c a l l b a c k. P o i n t e r i s n u l l u n l e s s s e t by u s e r. f l o a t x [N ] ; // i n p u t to s p l i n e e v a l u a t i o n f o r ( d = 0 ; d < N; d++) base [ d ] = f l o o r ( x [ d ] ) ; f o r ( d = 0 ; d < N; d++) i f ( o l d b a s e [ d ]!= base [ d ] ) // s w i t c h to new l o c a l g r i d f o r ( i n t k = 0 ; k < N; k++) o l d b a s e [ k ] = base [ k ] ; g r i d m i n [ k ] = base [ k ] 1; g r i d m a x [ k ] = g r i d m i n [ k]+ d e g r e e ; // compute unknown data i f ( e v a l u a t e c a l l b a c k ) EvaluateUnknownData ( ) ; // compute i n t e r m e d i a t e t e n s o r code goes h e r e break ; // compute p o l y n o m i a l terms code goes h e r e // f i n a l c a l c u l a t i o n s go h e r e ( m u l t i p l i c a t i o n o f p o l y n o m i a l s and 20

// i n t e r m e d i a t e t e n s o r ) In general, EvaluateUnknownData() checks the entire (D + 1) N set of grid points for invalid data. When the base indices are incremented or decremented by at most 1, it is not necessary to check all values. If you want to optimize this routine, it would go as follows. Suppose that any of the pairs of old and new base indicies differ by at most 1. For simplicity, assume that new data members are added to the proper classes which keep track of these differences, The function computations can be modified to // 1D s p l i n e s i n t b x d i f f = bx o l d b x ; v o i d EvaluateUnknownData ( ) i n t k0 = bx+2 bx d i f f +1; // k0 = bx 1 o r bx+3 i f ( data [ k0 ] == INVALID FLOAT ) data [ k0 ] = e v a l u a t e c a l l b a c k ( k0 ) ; // 2D s p l i n e s i n t b x d i f f = bx o l d b x ; i n t b y d i f f = by o l d b y ; v o i d EvaluateUnknownData ( ) i n t k0, k1 ; i f ( b x d i f f ) k0 = bx+2 b x d i f f +1; // k0 = bx 1 o r bx+3 f o r ( k1 = g r i d m i n [ 1 ] ; k1 <= g r i d m a x [ 1 ] ; k1++) i n t i n d e x = k0+xmin k1 ; i f ( data [ i n d e x ] == INVALID DATA ) data [ i n d e x ] = e v a l u a t e c a l l b a c k ( k0, k1 ) ; i f ( b y d i f f ) k1 = by+2 b y d i f f +1; // k1 = by 1 o r by+3 f o r ( k0 = g r i d m i n [ 0 ] ; k0 <= g r i d m a x [ 0 ] ; k0++) i n t i n d e x = k0+xmin k1 ; i f ( data [ i n d e x ] == INVALID DATA ) data [ i n d e x ] = e v a l u a t e c a l l b a c k ( k0, k1 ) ; Note that there are still some duplicate checks, but the total number of checks is reduced from (D + 1) n to at most n (D + 1) n 1. Rather than checking the entire volume of grid points, we are only checking the new faces that occur when the base indices change. The additional check occurs at the intersection of the faces. I have not implemented this modification since in low dimensions it probably does not gain any time. 8 Putting It All Together I have constructed the spline classes by derivation from an abstract base spline class. The base class has the job of initializing things such as the spline blending matrix, the polynomial coefficient matrix, the product and skip tensors for optimization, and the base indices. The class declaraion is in spline.h: c l a s s mgcspline p u b l i c : mgcspline ( i n t d i m e n s i o n s, i n t d e g r e e, i n t dim, f l o a t d a t a ) ; v i r t u a l mgcspline ( ) ; 21

i n t Dimensions ( ) r e t u r n d i m e n s i o n s ; i n t Degree ( ) r e t u r n d e g r e e ; c o n s t f l o a t M a t r i x ( ) r e t u r n ( c o n s t f l o a t ) m a t r i x ; f l o a t DomainMin ( i n t d = 0) r e t u r n dom min [ d ] ; f l o a t DomainMax ( i n t d = 0) r e t u r n dom max [ d ] ; i n t GridMin ( i n t d = 0) r e t u r n g r i d m i n [ d ] ; i n t GridMax ( i n t d = 0) r e t u r n g r i d m a x [ d ] ; f l o a t ( e v a l u a t e c a l l b a c k ) ( i n t ) ; s t a t i c c o n s t f l o a t INVALID DATA ; // s p l i n e e v a l u a t i o n f o r f u n c t i o n i n t e r p o l a t i o n ( no d e r i v a t i v e s ) v i r t u a l f l o a t o p e r a t o r ( ) ( f l o a t x ) = 0 ; // s p l i n e e v a l u a t i o n, d e r i v a t i v e c o u n t s g i v e n i n dx [ ] v i r t u a l f l o a t o p e r a t o r ( ) ( i n t dx, f l o a t x ) = 0 ; protected : i n t dimensions ; // N, number o f d i m e n s i o n s i n t d e g r e e ; // D, d e g r e e o f p o l y n o m i a l s p l i n e i n t Dp1 ; // D+1 i n t Dp1 to N ; // power (D+1,N) i n t Dp1 to 2N ; // power (D+1,2N) i n t dim ; // d i m e n s i o n s i z e s dim [ 0 ] t h r o u g h dim [ N 1] f l o a t data ; // N d i m e n s i o n a l a r r a y o f data f l o a t dom min ; // minimum a l l o w e d v a l u e o f s p l i n e i n p u t v e c t o r f l o a t dom max ; // maximum a l l o w e d v a l u e o f s p l i n e i n p u t v e c t o r i n t g r i d m i n ; // minimum a l l o w e d v a l u e f o r c u r r e n t l o c a l g r i d i n t g r i d m a x ; // maximum a l l o w e d v a l u e f o r c u r r e n t l o c a l g r i d i n t base ; // base i n d i c e s f o r g r i d o f l o c a l c o n t r o l p o i n t s i n t o l d b a s e ; // o l d base i n d i c e s f o r g r i d o f l o c a l c o n t r o l p o i n t s f l o a t cache ; // cache f o r s u b b l o c k o f data f l o a t i n t e r ; // i n t e r m e d i a t e p r o d u c t o f data w i t h b l e n d i n g m a t r i x f l o a t p o l y ; // p o l y [N ] [ D+1] f o r s t o r i n g p o l y n o m i a l s and d e r i v a t i v e s f l o a t c o e f f ; // c o e f f i c i e n t s f o r p o l y n o m i a l c o n s t r u c t i o n f l o a t m a t r i x ; // (D+1)x (D+1) b l e n d i n g m a t r i x f l o a t p r o d u c t ; // o u t e r t e n s o r p r o d u c t o f m a t r i x w i t h i t s e l f N t i m e s i n t s k i p ; // f o r s k i p p i n g z e r o v a l u e s o f mtensor v i r t u a l v o i d EvaluateUnknownData ( ) = 0 ; v i r t u a l v o i d C o m p u t e I n t e r m e d i a t e ( ) = 0 ; v o i d S e t P o l y n o m i a l ( i n t o r d e r, f l o a t d i f f, f l o a t p o l y ) ; p r i v a t e : v o i d C r e a t e ( ) ; s t a t i c i n t Choose ( i n t n, i n t k ) ; s t a t i c f l o a t B l e n d M a t r i x ( i n t deg ) ; // e r r o r h a n d l i n g p u b l i c : s t a t i c i n t v e r b o s e ; s t a t i c unsigned e r r o r ; s t a t i c v o i d Report ( ostream& o s t r ) ; p r i v a t e : s t a t i c c o n s t unsigned i n v a l i d d i m e n s i o n s ; s t a t i c c o n s t unsigned i n v a l i d d e g r e e ; s t a t i c c o n s t unsigned i n v a l i d d i m ; s t a t i c c o n s t unsigned n u l l d a t a ; s t a t i c c o n s t unsigned a l l o c a t i o n f a i l e d ; s t a t i c c o n s t char message [ ] ; s t a t i c i n t Number ( unsigned s i n g l e e r r o r ) ; s t a t i c v o i d Report ( unsigned s i n g l e e r r o r ) ; ; The static member INVALID DATA is initialized to MAXFLOAT in the spline source file. Method SetPolynomial evaluates polynomial terms. Method Create does the allocation and initialization of the various arrays. Method Choose computes the combinatorial value of n choose k. Method BlendMatrix computes the blending matrix described in the theory of B-splines. The member function pointer evaluate callback is 22

initialized to null by the constructor. If your application requires data to be computed on-demand, then set this pointer to the function that generates your data. If necessary, you ll need to map the input index to an n-dimensional coordinate if your function is defined on coordinates. Eventually I should put in an additional pointer to a callback that takes as input a coordinate. The evaluation operators then need to check if either pointer is not null. The method EvaluateUnknownData, which calls the callback, also needs to be duplicated so that one handles the index-callback, the other handles the coordinate-callback. 8.1 1D Splines The class declaration for 1D splines is in spline1.h: c l a s s mgcspline1 : p u b l i c mgcspline p u b l i c : mgcspline1 ( i n t d e g r e e, i n t dim, f l o a t d a t a ) ; // s p l i n e e v a l u a t i o n f o r f u n c t i o n i n t e r p o l a t i o n ( no d e r i v a t i v e s ) f l o a t o p e r a t o r ( ) ( f l o a t x ) ; f l o a t o p e r a t o r ( ) ( f l o a t x ) r e t u r n ( t h i s ) ( x ) ; // s p l i n e e v a l u a t i o n, d e r i v a t i v e count g i v e n i n dx f l o a t o p e r a t o r ( ) ( i n t dx, f l o a t x ) ; f l o a t o p e r a t o r ( ) ( i n t dx, f l o a t x ) r e t u r n ( t h i s ) ( dx, x ) ; p r i v a t e : s t a t i c c o n s t i n t N; ; v o i d EvaluateUnknownData ( ) ; v o i d C o m p u t e I n t e r m e d i a t e ( ) ; The static variable N conveniently stores the constant dimension 2. An example of how to use the class to build a bicubic spline curve is given below. i n t main ( ) i n t d e g r e e = 3 ; i n t dim = 1 0 0 ; // number o f c u r v e p o i n t s to i n t e r p o l a t e f l o a t xdata [ 1 0 0 ], ydata [ 1 0 0 ] ; // t h e p o i n t s, s e t them to w h a t e v e r mgcspline1 x ( degree, dim, xdata ), y ( degree, dim, ydata ) ; // c a l c u l a t e a p o i n t on t h e b i c u b i c s p l i n e c u r v e cout << p o i n t i s ( << x ( 5 0 ) <<, << y ( 5 0 ) << ) <, e n d l ; // c a l c u l a t e a t a n g e n t a t t h e p o i n t cout << t a n g e n t i s ( << x ( 1, 5 0 ) <<, << y ( 1, 5 0 ) << ) <, e n d l ; r e t u r n 0 ; 8.2 2D Splines The class declaration for 2D splines is in spline2.h: c l a s s mgcspline2 : p u b l i c mgcspline p u b l i c : mgcspline2 ( i n t d e g r e e, i n t dim, f l o a t d a t a ) ; i n t I n d e x ( i n t x, i n t y ) r e t u r n x+dim [ 0 ] y ; 23

// s p l i n e e v a l u a t i o n f o r f u n c t i o n i n t e r p o l a t i o n ( no d e r i v a t i v e s ) f l o a t o p e r a t o r ( ) ( f l o a t x ) ; f l o a t o p e r a t o r ( ) ( f l o a t x, f l o a t y ) ; // s p l i n e e v a l u a t i o n, d e r i v a t i v e c o u n t s g i v e n i n dx [ ] f l o a t o p e r a t o r ( ) ( i n t dx, f l o a t x ) ; f l o a t o p e r a t o r ( ) ( i n t dx, i n t dy, f l o a t x, f l o a t y ) ; p r i v a t e : s t a t i c c o n s t i n t N; ; v o i d EvaluateUnknownData ( ) ; v o i d C o m p u t e I n t e r m e d i a t e ( ) ; The static variable N conveniently stores the constant dimension 2. The method Index converts 2D coordinates to 1D indices. An example of using the class is shown below. This one uses the callback mechanism to compute data on-demand. i n t q u a n t i t y ; f l o a t C a l l b a c k ( i n t i n d e x ) r e t u r n exp ( i n d e x / f l o a t ( q u a n t i t y ) ) ; i n t main ( ) i n t d e g r e e = 3 ; i n t dim [ 2 ] = 8, 8 ; q u a n t i t y = dim [ 0 ] dim [ 1 ] ; i n t x [ 2 ] ; f l o a t data = new f l o a t [ q u a n t i t y ] ; f o r ( x [ 1 ] = 0 ; x [ 1 ] < dim [ 1 ] ; x [1]++) f o r ( x [ 0 ] = 0 ; x [ 0 ] < dim [ 0 ] ; x [0]++) i n t i n d e x = x [0]+ dim [ 0 ] x [ 1 ] ; data [ i n d e x ] = mgcspline : : INVALID DATA ; mgcspline2 f ( degree, dim, data ) ; f. e v a l u a t e c a l l b a c k = C a l l b a c k ; f l o a t r x [ 2 ] ; f o r ( x [ 1 ] = f. DomainMin ( 1 ) ; x [ 1 ] <= f. DomainMax ( 1 ) ; x [1]++) r x [ 1 ] = x [ 1 ] ; f o r ( x [ 0 ] = f. DomainMin ( 0 ) ; x [ 0 ] <= f. DomainMax ( 0 ) ; x [0]++) r x [ 0 ] = x [ 0 ] ; cout << f ( r x ) << ; cout << e n d l ; d e l e t e [ ] data ; r e t u r n 0 ; 8.3 3D Splines The class declaration for 3D splines is in spline3.h: c l a s s mgcspline3 : p u b l i c mgcspline p u b l i c : mgcspline3 ( i n t d e g r e e, i n t dim, f l o a t d a t a ) ; 24

i n t I n d e x ( i n t x, i n t y, i n t z ) r e t u r n x+dim [ 0 ] ( y+dim [ 1 ] z ) ; // s p l i n e e v a l u a t i o n f o r f u n c t i o n i n t e r p o l a t i o n ( no d e r i v a t i v e s ) f l o a t o p e r a t o r ( ) ( f l o a t x ) ; f l o a t o p e r a t o r ( ) ( f l o a t x, f l o a t y, f l o a t z ) ; // s p l i n e e v a l u a t i o n, d e r i v a t i v e c o u n t s g i v e n i n dx [ ] f l o a t o p e r a t o r ( ) ( i n t dx, f l o a t x ) ; f l o a t o p e r a t o r ( ) ( i n t dx, i n t dy, i n t dz, f l o a t x, f l o a t y, f l o a t z ) ; p r i v a t e : s t a t i c c o n s t i n t N; ; v o i d EvaluateUnknownData ( ) ; v o i d C o m p u t e I n t e r m e d i a t e ( ) ; The static variable N conveniently stores the constant dimension 3. The method Index converts 3D coordinates to 1D indices. An example of using the class is shown below. This one precomputes the data. i n t main ( ) i n t dim [ 3 ] = 8, 8, 8 ; i n t q u a n t i t y = dim [ 0 ] dim [ 1 ] dim [ 2 ] ; f l o a t data = new f l o a t [ q u a n t i t y ] ; mgcspline3 f ( 3, dim, data ) ; i n t x [ 3 ] ; f o r ( x [ 2 ] = 0 ; x [ 2 ] < dim [ 2 ] ; x [2]++) f o r ( x [ 1 ] = 0 ; x [ 1 ] < dim [ 1 ] ; x [1]++) f o r ( x [ 0 ] = 0 ; x [ 0 ] < dim [ 0 ] ; x [0]++) i n t i n d e x = f. I n d e x ( x [ 0 ], x [ 1 ], x [ 2 ] ) ; data [ i n d e x ] = exp ( i n d e x / f l o a t ( q u a n t i t y ) ) ; i n t o r d e r [ 3 ] = 1, 0, 2 ; // compute f x z z ( x, y, z ) f l o a t r x [ 3 ] ; f o r ( x [ 2 ] = f. DomainMin ( 2 ) ; x [ 2 ] <= f. DomainMax ( 2 ) ; x [2]++) r x [ 2 ] = x [ 2 ] ; f o r ( x [ 1 ] = f. DomainMin ( 1 ) ; x [ 1 ] <= f. DomainMax ( 1 ) ; x [1]++) r x [ 1 ] = x [ 1 ] ; f o r ( x [ 0 ] = f. DomainMin ( 0 ) ; x [ 0 ] <= f. DomainMax ( 0 ) ; x [0]++) r x [ 0 ] = x [ 0 ] ; cout << f ( o r d e r, r x ) << ; cout << e n d l ; cout << e n d l ; d e l e t e [ ] data ; r e t u r n 0 ; 8.4 4D Splines The class declaration for 3D splines is in spline3.h: c l a s s mgcspline4 : p u b l i c mgcspline p u b l i c : mgcspline4 ( i n t d e g r e e, i n t dim, f l o a t d a t a ) ; i n t I n d e x ( i n t x, i n t y, i n t z, i n t w) r e t u r n x+dim [ 0 ] ( y+dim [ 1 ] ( z+dim [ 2 ] w ) ) ; 25