Solving Continuous Linear Least-Squares Problems by Iterated Projection

by Ralf Juengling

Department of Computer Science, Portland State University
PO Box 751
Portland, OR 97207
USA
Email: juenglin@cs.pdx.edu

Abstract

I present a new divide-and-conquer algorithm for solving continuous linear least-squares problems. The method is applicable when the column space of the linear system relating data to model parameters is translation-invariant. The central operation is a matrix-vector product, which makes the method very easy to implement. Secondly, the structure of the computation suggests a straightforward parallel implementation. A complexity analysis for a sequential implementation shows that the method has the same asymptotic complexity as well-known algorithms for discrete linear least-squares. For illustration we work out the details for the problem of fitting quadratic bivariate polynomials to a piecewise constant function.

1 Introduction

The linear least-squares problem is to determine the parameter values β* for a function f that minimize the sum of squared differences between data values {d_i} and function values {f(x_i; β)} for corresponding points {x_i},

  β* = argmin_β Σ_{i=1}^{m} (d_i − f(x_i; β))².   (1)

The problem is a linear least-squares problem if the model f is linear in the parameters, that is,

  f(x; β) = Σ_{k=1}^{n} β_k f_k(x).   (2)

A classical way of determining β* goes as follows:

1. Express (1) in matrix form,

  β* = argmin_β ‖d − Aβ‖²,   (3)

where

  A = [ f_1(x_1)  f_2(x_1)  ⋯  f_n(x_1) ]
      [ f_1(x_2)  f_2(x_2)  ⋯  f_n(x_2) ]
      [    ⋮         ⋮             ⋮    ]
      [ f_1(x_m)  f_2(x_m)  ⋯  f_n(x_m) ]

and d is the vector of data values.

2. Construct and solve the so-called normal equations,

  AᵀAβ = Aᵀd.   (4)

Assuming m > n (more data values than parameters) and that A has full rank, the solution to (3) may be written in terms of the pseudoinverse A⁺ of A,

  β* = (AᵀA)⁻¹Aᵀd = A⁺d.   (5)

Geometrically speaking, d* = Aβ* is the point in the range of A closest to d in the Euclidean norm. The pseudoinverse projects d to the solution β*, which may be thought of as the coordinates of d* in the column space of A.

In the following we consider the continuous, linear version of problem (1),

  β* = argmin_β ∫_Ω ( d(x) − Σ_k β_k f_k(x) )² dx.   (6)
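To ground steps (3) to (5) before moving on to the continuous setting, here is a minimal numerical illustration using NumPy; the univariate quadratic model and the synthetic data are illustrative choices of mine, not part of the method developed below.

    import numpy as np

    # Fit f(x; beta) = beta_1 + beta_2*x + beta_3*x^2 to noisy samples d_i.
    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 50)
    d = 2.0 - x + 3.0 * x**2 + 0.01 * rng.standard_normal(x.size)

    # Model matrix A of equation (3): one column per basis function f_k.
    A = np.column_stack([np.ones_like(x), x, x**2])

    # Solve the normal equations (4); adequate here, though a QR-based
    # solver is preferred numerically for ill-conditioned A.
    beta_ne = np.linalg.solve(A.T @ A, A.T @ d)

    # Equivalently, apply the pseudoinverse A+ of equation (5).
    beta_pinv = np.linalg.pinv(A) @ d

    assert np.allclose(beta_ne, beta_pinv)
    print(beta_ne)   # approximately (2, -1, 3)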
Note that in the continuous setting we assume there is a data function d(x), defined over some domain Ω. An example is single-band image data, which is often interpreted as a piecewise constant function over the image domain [3].

Section 2 introduces the key ideas of the new method and explains its prerequisites. Section 3 gives the algorithm and discusses its computational merits. As an example, in Section 5 I work out the elements of the algorithm for the problem of fitting a bivariate quadratic polynomial.

The method for solving (6) that I call iterated projection works for any dimension of Ω, but for the sake of clear exposition I assume Ω has dimension 2 and Ω = [−1/2, 1/2] × [−1/2, 1/2].

2 Linear Least-Squares by Iterated Projection

Figure 1. Domain Ω with coordinate origin at center (a); Ω partitioned into four subdomains Ω_1, Ω_2, Ω_3, Ω_4 in clockwise order (b); next finer partition (c); finest partition, at which the data function d may be accurately represented over the subdomains (d). A local coordinate system is centered on each subdomain (only shown for the upper left subdomains in (b) and (c)).

Recall that the pseudoinverse in (5) projects the data directly onto a solution vector. With iterated projection we also obtain the solution by projection. However, instead of directly projecting the data we project solutions of intermediate least-squares problems.

Imagine we partition Ω into four equally sized subdomains Ω_1, …, Ω_4 as shown in Figure 1b, and that we are able to solve (6) for each subdomain, with solutions β̃_1, …, β̃_4. Can we compute the solution β* for Ω from the four solutions β̃_1, …, β̃_4 for Ω_1, …, Ω_4? As we will see below, the answer is yes, under certain circumstances.

With "problem (6) for subdomain Ω_i" we mean that the integral in (6) is over Ω_i instead of Ω, and that the coordinates are with respect to a coordinate system centered on Ω_i (cf. Figure 1b). But centering the coordinate systems with respect to each Ω_i means the four integrals are over the same set of points Ω_{1/2} = [−1/4, 1/4] × [−1/4, 1/4]. These are important aspects of iterated projection: the problems corresponding to the four subdomains are four instances of the same problem (only the data term in (6) is different), as well as similar to the original problem (the domain is Ω_{1/2} instead of Ω).

A few more bits of notation are necessary to define the intermediate problems more precisely. We denote the coordinates with respect to Ω of the four subdomains' coordinate origins by x_1, …, x_4, and the solutions corresponding to the four subdomain problems by β̃_1, …, β̃_4, respectively. In the continuous setting β* are the coordinates of a function closest to d in the space F of functions over Ω that is spanned by {f_k|_Ω}_{k=1}^{n}. Likewise, the solution β̃_i for subdomain Ω_i are the coordinates of a function closest to x ↦ d(x + x_i) in the space F_{1/2} of functions over Ω_{1/2} spanned by {f_k|_{Ω_{1/2}}}_{k=1}^{n}.

We can use the space F_{1/2} to construct another space of functions over Ω,

  F̃ = { f : Ω → ℝ | x ↦ f(x + x_i) is in F_{1/2} for i = 1, …, 4 }.

Note that F̃ has dimension 4n, whereas F has dimension n. The answer to our question is that we can construct a solution β* for (6) from the solutions β̃_i for the subdomains if F is a subspace of F̃,

  F ⊆ F̃.   (7)
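Before developing the divide-and-conquer route it may help to see problem (6) attacked head-on. When d is piecewise constant on an m × m grid, the normal equations in function space, Qβ = b with Q the Gram matrix of the f_k and b_k = ∫_Ω d(x) f_k(x) dx, can be assembled exactly, cell by cell. The sketch below does this for the quadratic basis worked out in Section 5, using Gauss-Legendre quadrature, which is exact for the polynomial integrands involved; the function names and grid conventions are mine, not the paper's.

    import numpy as np

    def basis(x1, x2):
        # Quadratic basis of Section 5: f = (1, x1, x2, x1^2, x1*x2, x2^2).
        return np.array([1.0, x1, x2, x1**2, x1 * x2, x2**2])

    # 3-point Gauss-Legendre rule: exact for the degree <= 4 integrands below.
    nodes, weights = np.polynomial.legendre.leggauss(3)

    def solve_direct(D):
        """Continuous least-squares fit over Omega = [-1/2,1/2]^2 to data that
        is constant on each cell of an m-by-m grid; D[j, i] is the value of
        cell (i, j), with i indexing x1 and j indexing x2, both increasing."""
        m = D.shape[0]
        w = 1.0 / m                                    # cell width
        centers = -0.5 + w * (np.arange(m) + 0.5)
        Q = np.zeros((6, 6))                           # Gram matrix of the f_k
        b = np.zeros(6)                                # moments  int d f_k dx
        for i, c1 in enumerate(centers):
            for j, c2 in enumerate(centers):
                for u, wu in zip(nodes, weights):
                    for v, wv in zip(nodes, weights):
                        f = basis(c1 + 0.5 * w * u, c2 + 0.5 * w * v)
                        s = wu * wv * (0.5 * w) ** 2   # weight times Jacobian
                        Q += s * np.outer(f, f)
                        b += s * D[j, i] * f
        return np.linalg.solve(Q, b)

    rng = np.random.default_rng(0)
    print(solve_direct(rng.random((8, 8))))            # fit to a random image

Every data cell is visited once, so the work grows with the number of cells; iterated projection matches this asymptotically while replacing the global system solve by matrix-vector products.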
2.1 The Subspace Property

When does the subspace property (7) hold for a set of functions {f_k}? What we must verify is that for any set of parameters β there are parameters β̃_1, …, β̃_4 such that

  Σ_k β_k f_k(x + x_i) = Σ_k β̃_{i,k} f_k(x),  for all x ∈ Ω_{1/2}, i = 1, …, 4.   (8)

In other words, relation (7) holds when the space of functions over ℝ² spanned by {f_k} is translation-invariant. Examples are the spaces of polynomials of any finite order, or the space of functions spanned by sine-cosine pairs with the same wave number.

2.2 Solving the Linear Least-Squares Problem

We want to obtain the linear least-squares solution β* on Ω = Ω_h from the solutions β̃_1, …, β̃_4 for Ω_h's subdomains Ω_1, …, Ω_4, respectively (we write Ω_h for a square domain of side length h, so Ω = Ω_1 and the subdomain problems are posed over Ω_{h/2}). As we said above, we obtain β* by projecting the solutions β̃_1, …, β̃_4. Let a function f̃ ∈ F̃ be parameterized by β̃ ∈ ℝ^{4n},

  β̃ᵀ = ( β̃_1ᵀ  β̃_2ᵀ  β̃_3ᵀ  β̃_4ᵀ ),

where β̃_i corresponds to the part of f̃ covering subdomain Ω_i, and let f ∈ F be parameterized by γ ∈ ℝⁿ. The projector from F̃ onto F we seek maps β̃ to the vector

  β̂ = argmin_γ Σ_{i=1}^{4} ∫_{Ω_{h/2}} ( Σ_l β̃_{i,l} f_l(x) − f(x + x_i; γ) )² dx.   (9)

Assume, for the moment, we know the projector, an n × 4n matrix, and call it R_h. Using R_h, the solution to problem (6) is

  β* = R_h β̃ = [ R_{h,1}  R_{h,2}  R_{h,3}  R_{h,4} ] ( β̃_1ᵀ  β̃_2ᵀ  β̃_3ᵀ  β̃_4ᵀ )ᵀ,   (10)

where R_{h,i} denotes the obvious n × n submatrix of R_h.

Equation (10) expresses the central idea and operation of the iterated projection algorithm: obtain solutions to smaller versions of the linear least-squares problem first, then derive the solution to the original problem by a simple matrix-vector product. The solutions to the smaller problems are obtained in the same way, by projecting solutions for sub-subdomains, using a similar projection operator R_{h/2}. The resulting algorithm is recursive. The recursion ends at subdomains over which the data function d is an element of the space spanned by {f_k} over that domain, or when d may be approximated in that space with negligible error (cf. Figure 1d).

2.3 The Projection Operator

We now turn to the problem of determining the projection matrix R_h used in (10). There are different ways of deriving R_h. In this section I describe one way, and in Section 5 I demonstrate another way by example.

To begin, note that when subspace property (7) holds, for every element β̂ in the range of R_h there is one vector β̃ for which (9) is exactly zero. If this is the case, then β̃ and β̂ = R_h β̃ represent the same function over Ω. We denote by P_h the 4n × n matrix that maps a vector β representing a function in F to the vector β̃ representing the same function in F̃,

  β̃ = P_h β,  P_h = [ P_{h,1} ]
                    [ P_{h,2} ]
                    [ P_{h,3} ]
                    [ P_{h,4} ].   (11)

Determining P_h for a particular set of functions {f_k} is just another way of confirming that the subspace property (7) holds. P_h is relevant here because concrete expressions for it are easy to derive, and we can express R_h in terms of P_h below.
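For polynomial bases the blocks P_{h,i} have concrete closed forms: they are translation (Taylor shift) matrices. A small sketch for the quadratic basis of Section 5, where the entries follow from the identities derived in the appendix; the helper names are mine.

    import numpy as np

    def shift_matrix(a, b):
        """Block of P mapping coefficients beta of
        f = b1 + b2*x1 + b3*x2 + b4*x1^2 + b5*x1*x2 + b6*x2^2
        to the coefficients of the same function expressed in coordinates
        centered at (a, b), i.e. of x -> f(x + (a, b))."""
        return np.array([[1, a, b, a * a, a * b, b * b],
                         [0, 1, 0, 2 * a, b,     0    ],
                         [0, 0, 1, 0,     a,     2 * b],
                         [0, 0, 0, 1,     0,     0    ],
                         [0, 0, 0, 0,     1,     0    ],
                         [0, 0, 0, 0,     0,     1    ]], dtype=float)

    def f_eval(beta, x1, x2):
        return beta @ np.array([1.0, x1, x2, x1**2, x1 * x2, x2**2])

    # Check that shifted coefficients represent the same function, cf. (8):
    rng = np.random.default_rng(1)
    beta = rng.standard_normal(6)
    a, b = -0.25, 0.25               # center x_1 of the upper left subdomain
    beta_loc = shift_matrix(a, b) @ beta
    x1, x2 = 0.1, -0.2               # a point of Omega_{1/2}, local coordinates
    assert np.isclose(f_eval(beta_loc, x1, x2), f_eval(beta, x1 + a, x2 + b))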
The second ingredient we need is the inner product on the parameter space of F that corresponds to the least-squares Euclidean norm,

  ⟨f, g⟩_{Ω_h} = ∫_{Ω_h} f(x) g(x) dx
               = ∫_{Ω_h} ( Σ_k β_k f_k(x) ) ( Σ_l γ_l f_l(x) ) dx
               = βᵀ [ ⟨f_1, f_1⟩_{Ω_h}  ⋯  ⟨f_1, f_n⟩_{Ω_h} ]
                    [        ⋮                   ⋮          ] γ
                    [ ⟨f_n, f_1⟩_{Ω_h}  ⋯  ⟨f_n, f_n⟩_{Ω_h} ]
               = βᵀ Q_h γ.   (12)

Q_h is the Gram matrix of the functions {f_k|_{Ω_h}}. Inner products corresponding to the subdomain function spaces are defined in the same way,

  ⟨f, g⟩_{Ω_{h/2}} = β̃_iᵀ [ ⟨f_1, f_1⟩_{Ω_{h/2}}  ⋯  ⟨f_1, f_n⟩_{Ω_{h/2}} ]
                          [         ⋮                      ⋮              ] γ̃_i
                          [ ⟨f_n, f_1⟩_{Ω_{h/2}}  ⋯  ⟨f_n, f_n⟩_{Ω_{h/2}} ]
                   = β̃_iᵀ Q_{h/2} γ̃_i.   (13)

With P_h and Q_{h/2} we can express the least-squares property (9), which is the defining property of R_h, without using integrals:

  R_h β̃ = argmin_γ Σ_i ∫_{Ω_{h/2}} ( Σ_l β̃_{i,l} f_l(x) − f(x + x_i; γ) )² dx
        = argmin_γ Σ_i ∫_{Ω_{h/2}} ( Σ_l β̃_{i,l} f_l(x) − Σ_l (P_{h,i} γ)_l f_l(x) )² dx
        = argmin_γ Σ_i ( β̃_i − P_{h,i} γ )ᵀ Q_{h/2} ( β̃_i − P_{h,i} γ ).   (14)

Since the sum in (14) is quadratic in γ, we can set its gradient with respect to γ to zero and solve for γ to find the minimizer. The result is the following expression for R_h,

  R_h = ( Σ_{i=1}^{4} P_{h,i}ᵀ Q_{h/2} P_{h,i} )⁻¹ [ P_{h,1}ᵀ Q_{h/2}  P_{h,2}ᵀ Q_{h/2}  P_{h,3}ᵀ Q_{h/2}  P_{h,4}ᵀ Q_{h/2} ].   (15)
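Equation (15) is straightforward to evaluate numerically once the P_{h,i} and Q_{h/2} are in hand. The sketch below does so for the quadratic basis of Section 5 at scale h = 1, and checks the identity Σ_i R_{1,i} P_{1,i} = I, which must hold because a function that already lies in F is reproduced exactly by the projection; the helper names and the quadrature shortcut are mine.

    import numpy as np

    def basis(x1, x2):
        return np.array([1.0, x1, x2, x1**2, x1 * x2, x2**2])

    def gram(half):
        # Gram matrix of the basis over [-half, half]^2, by exact quadrature.
        nodes, weights = np.polynomial.legendre.leggauss(3)
        Q = np.zeros((6, 6))
        for u, wu in zip(nodes, weights):
            for v, wv in zip(nodes, weights):
                f = basis(half * u, half * v)
                Q += wu * wv * half**2 * np.outer(f, f)
        return Q

    def shift_matrix(a, b):   # P block, cf. Section 2.3 and the appendix
        return np.array([[1, a, b, a * a, a * b, b * b],
                         [0, 1, 0, 2 * a, b,     0    ],
                         [0, 0, 1, 0,     a,     2 * b],
                         [0, 0, 0, 1,     0,     0    ],
                         [0, 0, 0, 0,     1,     0    ],
                         [0, 0, 0, 0,     0,     1    ]], dtype=float)

    h = 1.0
    centers = [(-h/4, h/4), (h/4, h/4), (h/4, -h/4), (-h/4, -h/4)]  # clockwise
    Q = gram(h / 4)                                # Q_{h/2}, over Omega_{h/2}
    P = [shift_matrix(a, b) for a, b in centers]
    M = sum(Pi.T @ Q @ Pi for Pi in P)             # the matrix inverted in (15)
    R = np.linalg.solve(M, np.hstack([Pi.T @ Q for Pi in P]))   # equation (15)

    I = sum(R[:, 6*i:6*(i+1)] @ P[i] for i in range(4))
    assert np.allclose(I, np.eye(6))        # functions in F are reproduced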
3 Computational Characteristics

The following algorithm, Iterated_Projection_LSQ, is a very simple instance of iterated projection; it requires that the data function is piecewise constant over the square regions of a regular partition of Ω into m × m = 2^l × 2^l regions (cf. Figure 1d). Not included in the algorithm description is the construction of the projection matrices R_h. These are assumed to have been precomputed for all relevant scales prior to calling Iterated_Projection_LSQ.

Algorithm Iterated_Projection_LSQ(D, h)
Input. D: m × m data matrix; h: scale of domain.
Output. β*: least-squares solution over domain Ω_h = [−h/2, h/2] × [−h/2, h/2].
Steps.
1. If m = 1:
2.   β* ← projection of the constant function d(x) = D_{1,1} (cf. Section 2.2)
3. else:
4.   partition D into quadrants, D = ( D_1 D_2 ; D_4 D_3 )
5.   β̃_i ← Iterated_Projection_LSQ(D_i, h/2), for i = 1, …, 4
6.   β* ← R_h ( β̃_1ᵀ β̃_2ᵀ β̃_3ᵀ β̃_4ᵀ )ᵀ (Equation (10))

3.1 Computational Complexity of Iterated Projection

Algorithm Iterated_Projection_LSQ is formulated for two-dimensional domains. However, adapting the algorithm to domains of different dimensionality is straightforward, and we discuss computational complexity for the general case of d dimensions. In the general case D is assumed to have m^d entries and is partitioned into 2^d parts in Step 4, the projection matrix R_h is of size n × 2^d n, and the depth of the recursion, l = log₂ m, is independent of d.

We assume that analytical expressions for R_h are available; thus, precomputing the projection matrices means evaluating those expressions for all l scales. Assuming that the amount of computation for evaluating each R_h-entry is independent of h, precomputing the projection matrices requires O(log(m) 2^d n²) operations and O(log(m) 2^d n²) space.

Computing β* in Step 2 requires O(n) operations per entry of D, or O(m^d n) operations total. Computing β* in Step 6 requires a matrix-vector product of O(2^d n²) operations. Step 6 is carried out

  Σ_{i=1}^{l} 2^{d(i−1)} = (2^{dl} − 1) / (2^d − 1)

times, resulting in a total operation count of O(2^{dl} n²), or O(m^d n²), for Step 6.

Steps 5 and 6 together may be organized as a loop over all D_i. In that case the amount of space required by iterated projection, in addition to the space required for the projection matrices, is proportional to 2^d n and proportional to the depth of the recursion, O(log(m) 2^d n).

3.2 Comparison to Classical Linear Least-Squares Algorithms

In discrete linear least-squares problems, the problem domain Ω has been abstracted away and its dimension usually plays no part in a complexity analysis. Instead, the complexity of those algorithms is expressed in the size of matrix A in (3), say M × n [4]. Setting M = m^d for comparison, the cost of iterated projection is O(M n²) operations and O(log_{2^d}(M) 2^d n²) units of space (ignoring the space required for entering the problem).

A classical algorithm for solving the normal equations is Cholesky factorization,

Algorithm Cholesky_LSQ(A, d)
Input. A: M × n model matrix; d: vector of M data values.
Output. β*: least-squares solution argmin_β ‖Aβ − d‖.
Steps.
1. Compute B = AᵀA
2. Compute y = Aᵀd
3. Factorize B, B = RᵀR (R upper triangular)
4. Solve Rᵀx = y
5. Solve Rβ* = x

The costs of these steps are, respectively, O(M n²), O(M n), O(n³), O(n²), and O(n²) operations. In most situations where iterated projection is applicable we could carry out Step 1 and Step 3 of Cholesky_LSQ in advance, and the asymptotic cost would be dominated by Step 2, O(M n). Storing the Cholesky factors requires O(n²) units of space. Other classical algorithms' complexity characteristics are essentially the same [4], and we conclude that the asymptotic computational cost of iterated projection is the same as the cost of well-known algorithms for discrete linear least-squares.
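To make the cost discussion concrete, here is a direct Python transcription of Iterated_Projection_LSQ for the two-dimensional quadratic case. It is a sketch under my own conventions, not the author's Matlab code: rows of D index x_2 from bottom to top, columns index x_1 from left to right, the projection matrices are built on the fly via (15) instead of being precomputed, and the leaf projection of Step 2 is simply β = (D_{1,1}, 0, …, 0), since a constant already lies in the span of the basis. The helpers gram and shift_matrix are repeated from the previous sketch to keep the block self-contained.

    import numpy as np

    def basis(x1, x2):
        return np.array([1.0, x1, x2, x1**2, x1 * x2, x2**2])

    def gram(half):
        # Gram matrix of the basis over [-half, half]^2 (3-point Gauss is exact)
        nodes, weights = np.polynomial.legendre.leggauss(3)
        Q = np.zeros((6, 6))
        for u, wu in zip(nodes, weights):
            for v, wv in zip(nodes, weights):
                f = basis(half * u, half * v)
                Q += wu * wv * half**2 * np.outer(f, f)
        return Q

    def shift_matrix(a, b):
        return np.array([[1, a, b, a * a, a * b, b * b],
                         [0, 1, 0, 2 * a, b,     0    ],
                         [0, 0, 1, 0,     a,     2 * b],
                         [0, 0, 0, 1,     0,     0    ],
                         [0, 0, 0, 0,     1,     0    ],
                         [0, 0, 0, 0,     0,     1    ]], dtype=float)

    def projection_matrix(h):
        # Equation (15) at scale h; subdomain centers in clockwise order.
        centers = [(-h/4, h/4), (h/4, h/4), (h/4, -h/4), (-h/4, -h/4)]
        Q = gram(h / 4)
        P = [shift_matrix(a, b) for a, b in centers]
        M = sum(Pi.T @ Q @ Pi for Pi in P)
        return np.linalg.solve(M, np.hstack([Pi.T @ Q for Pi in P]))

    def iterated_projection_lsq(D, h=1.0, R_cache=None):
        m = D.shape[0]
        if m == 1:                 # Step 2: project the constant function
            return np.array([D[0, 0], 0, 0, 0, 0, 0], dtype=float)
        R_cache = {} if R_cache is None else R_cache
        if h not in R_cache:       # R_h would normally be precomputed
            R_cache[h] = projection_matrix(h)
        k = m // 2                 # Step 4: quadrants, clockwise from upper left
        quads = [D[k:, :k], D[k:, k:], D[:k, k:], D[:k, :k]]
        betas = [iterated_projection_lsq(Dq, h / 2, R_cache) for Dq in quads]
        return R_cache[h] @ np.concatenate(betas)   # Step 6, equation (10)

    # A constant image must be reproduced exactly (up to rounding).
    beta = iterated_projection_lsq(np.full((8, 8), 3.5))
    assert np.allclose(beta, [3.5, 0, 0, 0, 0, 0])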
4 Discussion

I have presented a new idea, iterated projection, for computing the solution to continuous linear least-squares problems. The idea is to project the data, which is thought of as a function, into a sequence of function spaces of decreasing dimensionality. The final projection in this sequence is into the space in which the solution is sought.

As discussed in Section 3.2, iterated projection is asymptotically as efficient as other linear least-squares algorithms. However, Iterated_Projection_LSQ is a strikingly simple algorithm, consisting essentially of a single operation, the matrix-vector product in Step 6, and is arguably easier to accelerate by special hardware. Secondly, the way the computation is organized in Iterated_Projection_LSQ makes it easy to distribute the work among multiple processors.

Classical linear least-squares algorithms and iterated projection do not solve exactly the same problem. The former solve the problem

  Find β* such that ‖d − Aβ‖ is minimal for β = β*,   (16)

iterated projection solves the problem (with f linear in β)

  Find β* such that ∫_Ω ( d(x) − f(x; β) )² dx is minimal for β = β*.   (17)

In practice, problem (17) is often approximated by a problem of the form (16), obtained through sampling d and the f_k at selected points {x_i} (cf. (1) to (3) in Section 1). In such situations, and when the subspace property (7) holds, iterated projection is a compelling alternative to discrete linear least-squares algorithms. It may give exact solutions (except for rounding errors), and it may require less computation when d is of appropriate form (e.g., functions in finite-element spaces; Step 2 in Iterated_Projection_LSQ needs to be adapted accordingly).

Another interesting aspect of iterated projection is that, since no linear system needs to be solved, the problems of underdetermined or singular systems do not occur. This recommends iterated projection for problems like least-squares based image reconstruction and segmentation [1, 3], where small regions often lead to underdetermined or singular least-squares problems.

5 Appendix: Iterated Projection with Quadratic Polynomials

In this appendix we derive the projection matrix R_1 for problem (17) with {f_k} given as

  f_1(x) = 1,  f_2(x) = x_1,  f_3(x) = x_2,  f_4(x) = x_1²,  f_5(x) = x_1 x_2,  f_6(x) = x_2².   (18)

First we verify that the space spanned by f_1, …, f_6 is translation-invariant. Let β represent any such function, f(x; β) = Σ_{k=1}^{6} β_k f_k(x), let x_0 be the origin of another coordinate system, and let f(x′; β̃) denote the representation with respect to the coordinate system centered at x_0. The two representations describe the same function if their values and those of all their derivatives are identical at any one point. Choose x_0 as that point and write down the identities to obtain β̃ in terms of β and x_0:

  β̃_1 = β_1 + β_2 x_{0,1} + β_3 x_{0,2} + β_4 x_{0,1}² + β_5 x_{0,1} x_{0,2} + β_6 x_{0,2}²
  β̃_2 = β_2 + 2β_4 x_{0,1} + β_5 x_{0,2}
  β̃_3 = β_3 + β_5 x_{0,1} + 2β_6 x_{0,2}
  β̃_4 = β_4,  β̃_5 = β_5,  β̃_6 = β_6.

To derive matrix P_1 in (11) we set x_0 to (−1/4, 1/4), (1/4, 1/4), (1/4, −1/4), and (−1/4, −1/4), respectively, and obtain
  P_{1,1} = T(−1/4, 1/4),  P_{1,2} = T(1/4, 1/4),  P_{1,3} = T(1/4, −1/4),  P_{1,4} = T(−1/4, −1/4),

where T(a, b) denotes the translation matrix read off the identities above,

  T(a, b) = [ 1  a  b  a²  ab  b² ]
            [ 0  1  0  2a  b   0  ]
            [ 0  0  1  0   a   2b ]
            [ 0  0  0  1   0   0  ]
            [ 0  0  0  0   1   0  ]
            [ 0  0  0  0   0   1  ].

Matrix Q_{1/2} in (13) can be seen to be

  Q_{1/2} = [ 1/4    0      0      1/192   0       1/192  ]
            [ 0      1/192  0      0       0       0      ]
            [ 0      0      1/192  0       0       0      ]
            [ 1/192  0      0      1/5120  0       1/9216 ]
            [ 0      0      0      0       1/9216  0      ]
            [ 1/192  0      0      1/9216  0       1/5120 ].   (19)

These are the ingredients needed to derive R_1 via (15).

5.1 Deriving R by QR Factorization

One algorithm for solving least-squares problem (3) is by QR factorization. It consists of computing the reduced QR factorization of A in (3), A = Q̂R̂, by which we obtain an orthonormal basis for the column space of A. Next, the data vector is expanded in that orthonormal basis, and, finally, an expansion in the basis of interest, the columns of A, is obtained by solving a triangular system. I now show how to derive R_1 by applying essentially this algorithm, but in the continuous setting.

The polynomials (18) do not form an orthogonal basis with respect to the inner product (12), otherwise Q_{1/2} in (19) would be diagonal. An orthonormal basis for the space spanned by (18) are the products of the Legendre polynomials up to order two. Defined over Ω = [−1/2, 1/2] × [−1/2, 1/2], these are

  φ_1(x) = 1
  φ_2(x) = 2√3 x_1
  φ_3(x) = 2√3 x_2
  φ_4(x) = √5 (6x_1² − 1/2)
  φ_5(x) = 12 x_1 x_2
  φ_6(x) = √5 (6x_2² − 1/2).

The QR factorization of A = [f_1 f_2 f_3 f_4 f_5 f_6] over Ω is

  [ f_1 f_2 f_3 f_4 f_5 f_6 ] = [ φ_1 φ_2 φ_3 φ_4 φ_5 φ_6 ] C = Φ C,   (20)

with the triangular factor

  C = [ 1  0        0        1/12     0     1/12    ]
      [ 0  1/(2√3)  0        0        0     0       ]
      [ 0  0        1/(2√3)  0        0     0       ]
      [ 0  0        0        1/(6√5)  0     0       ]
      [ 0  0        0        0        1/12  0       ]
      [ 0  0        0        0        0     1/(6√5) ].
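Before continuing, the orthonormality of the φ_k and the factorization (20) are quick to verify symbolically; the following sketch uses SymPy, an added dependency the paper does not rely on, and also recomputes the Gram matrix Q_{1/2} of (19).

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')

    def inner(f, g, half):
        # <f, g> over the square [-half, half]^2
        return sp.integrate(f * g, (x1, -half, half), (x2, -half, half))

    f = [sp.Integer(1), x1, x2, x1**2, x1 * x2, x2**2]
    phi = [sp.Integer(1),
           2 * sp.sqrt(3) * x1, 2 * sp.sqrt(3) * x2,
           sp.sqrt(5) * (6 * x1**2 - sp.Rational(1, 2)),
           12 * x1 * x2,
           sp.sqrt(5) * (6 * x2**2 - sp.Rational(1, 2))]

    half = sp.Rational(1, 2)          # Omega = [-1/2, 1/2]^2

    # The phi_k are orthonormal over Omega ...
    G = sp.Matrix(6, 6, lambda j, k: inner(phi[j], phi[k], half))
    assert G == sp.eye(6)

    # ... and C is recovered entrywise as C[j, l] = <phi_j, f_l>, cf. (20).
    C = sp.Matrix(6, 6, lambda j, l: inner(phi[j], f[l], half))
    for l in range(6):
        assert sp.expand(f[l] - sum(C[j, l] * phi[j] for j in range(6))) == 0

    # The Gram matrix Q_{1/2} of (19), over Omega_{1/2} = [-1/4, 1/4]^2.
    print(sp.Matrix(6, 6, lambda j, l: inner(f[j], f[l], sp.Rational(1, 4))))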
The product R_1 β̃ may now be expressed as

  R_1 β̃ = C⁻¹ Σ_{i=1}^{4} ( ∫_{Ω_i} Φ(x)ᵀ [ f_1 ⋯ f_6 ](x − x_i) dx ) β̃_i
        = C⁻¹ Σ_{i=1}^{4} R̃_{1,i} β̃_i,  with (R̃_{1,i})_{k,l} = ⟨φ_k(· + x_i), f_l⟩_{Ω_{1/2}},

hence

  R_1 = [ C⁻¹ R̃_{1,1}  C⁻¹ R̃_{1,2}  C⁻¹ R̃_{1,3}  C⁻¹ R̃_{1,4} ].   (21)

Evaluating the integrals in R̃_{1,i}, and writing (a, b) = x_i for the center of subdomain Ω_i (so a = ±1/4 and b = ±1/4), we get

  R̃_{1,i} = [ 1/4       0         0         1/192      0      1/192     ]
            [ (√3/2)a   √3/96     0         (√3/96)a   0      (√3/96)a  ]
            [ (√3/2)b   0         √3/96     (√3/96)b   0      (√3/96)b  ]
            [ 0         (√5/16)a  0         √5/1920    0      0         ]
            [ 3ab       b/16      a/16      ab/16      1/768  ab/16     ]
            [ 0         0         (√5/16)b  0          0      √5/1920   ].

Multiplying R̃_{1,i} by C⁻¹ we finally arrive at the four components of R_1,

  R_{1,i} = [ 1/4    −(5/32)a  −(5/32)b  1/256    0     1/256   ]
            [ 3a     1/16      0         a/16     0     a/16    ]
            [ 3b     0         1/16      b/16     0     b/16    ]
            [ 0      (15/8)a   0         1/64     0     0       ]
            [ 36ab   (3/4)b    (3/4)a    (3/4)ab  1/64  (3/4)ab ]
            [ 0      0         (15/8)b   0        0     1/64    ],

again with (a, b) = x_i. A Matlab implementation of iterated projection based on these results is available for download from ftp://ftp.cs.pdx.edu/pub/juenglin/iterated_projection/examples.tar.

Bibliography

[1] P. J. Besl and R. C. Jain. Segmentation through variable-order surface fitting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(2):167–192, March 1988.

[2] T. Kanungo, B. Dom, W. Niblack, and D. Steele. A fast algorithm for MDL-based multi-band image segmentation. In Proceedings of the Conference on Computer Vision and Pattern Recognition, pages 609–616, Los Alamitos, CA, USA, 1994. IEEE Computer Society Press.

[3] Y. G. Leclerc. Constructing simple stable descriptions for image partitioning. International Journal of Computer Vision, 3(1):73–102, May 1989.

[4] L. N. Trefethen and D. Bau. Numerical Linear Algebra. SIAM, Philadelphia, PA, 1997.