Recent Developments of Approximation Theory and Greedy Algorithms
Peter Binev, Department of Mathematics and Interdisciplinary Mathematics Institute, University of South Carolina
Reduced Order Modeling in General Relativity, Pasadena, CA, June 6-7, 2013
Outline
Greedy Algorithms: Initial Remarks; Greedy Bases; Examples for Greedy Algorithms
Tree Approximation: Initial Setup; Binary Partitions; Near-Best Approximation; Near-Best Tree Approximation
Parameter Dependent PDEs: Reduced Basis Method; Kolmogorov Widths; Results; Robustness
Greedy Algorithms / Initial Remarks: Polynomial Approximations
find the best L2-approximation via polynomials to a function on an interval [a, b]
space X = L2[a, b] of functions: f ∈ X ⟺ ‖f‖ < ∞
norm ‖f‖ = ‖f‖_X = ‖f‖_2 := ( ∫_[a,b] |f(x)|² dx )^{1/2}
basis φ_1 = 1, φ_2 = x, φ_3 = x², ..., φ_n = x^{n-1}, ...
space of polynomials of degree n-1: Φ_n := span{φ_1, φ_2, ..., φ_n}
approximation p_n := argmin_{p∈Φ_n} ‖f - p‖
representation p_n = Σ_{j=1}^n c_{n,j}(f) φ_j
in general the coefficients c_{n,j}(f) are not easy to find
in this case we can use orthogonality → Hilbert spaces
Greedy Algorithms / Initial Remarks: Polynomial Approximations
find the best L2-approximation via polynomials to a function on a domain Ω
space X = L2(Ω) of functions: f ∈ X ⟺ ‖f‖ < ∞
norm ‖f‖ = ‖f‖_X = ‖f‖_2 := ( ∫_Ω |f(x)|² dx )^{1/2}
basis φ_1, φ_2, φ_3, ..., φ_n, ...
space of polynomials Φ_n := span{φ_1, φ_2, ..., φ_n}
approximation p_n := argmin_{p∈Φ_n} ‖f - p‖
representation p_n = Σ_{j=1}^n c_{n,j}(f) φ_j
in general the coefficients c_{n,j}(f) are not easy to find
in this case we can use orthogonality → Hilbert spaces
Greedy Algorithms / Initial Remarks: Hilbert Space Setup
Banach space X: normed linear space, f ∈ X ⟹ ‖f‖_X < ∞
Hilbert space H: Banach space with a scalar product ⟨f, g⟩
L2(Ω) is a Hilbert space with ⟨f, g⟩ := ∫_Ω f(x) g̅(x) dx   (g̅ denotes complex conjugation)
(induced) norm ‖f‖ := ⟨f, f⟩^{1/2}
orthogonality: f ⊥ g ⟺ ⟨f, g⟩ = 0
orthogonal basis (Gram-Schmidt): ψ_n = φ_n + Σ_{j=1}^{n-1} q_j φ_j with ψ_n ⊥ Φ_{n-1}
space of polynomials Φ_n = span{φ_1, φ_2, ..., φ_n} = span{ψ_1, ψ_2, ..., ψ_n}
representation p_n := argmin_{p∈Φ_n} ‖f - p‖ = Σ_{j=1}^n C_j(f) ψ_j with C_j(f) := ⟨f, ψ_j⟩ / ⟨ψ_j, ψ_j⟩
the coefficients C_j do not depend on n
in case ‖ψ_j‖ = 1 we have p_n = Σ_{j=1}^n ⟨f, ψ_j⟩ ψ_j
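The Gram-Schmidt construction on this slide can be sketched numerically. A minimal pure-Python illustration (not from the slides: all names are ours, and the L2([0,1]) inner product is discretized by a midpoint rule, so everything is exact only in the discretized inner product):

```python
# Hedged sketch: best L2 polynomial approximation on [0, 1] via a
# Gram-Schmidt orthonormal basis; the integral is replaced by a midpoint
# rule with N points (an illustrative assumption, not the slides' setup).

N = 10_000
XS = [(i + 0.5) / N for i in range(N)]

def inner(f, g):
    """Discretized L2([0,1]) inner product <f, g>."""
    return sum(f(x) * g(x) for x in XS) / N

def gram_schmidt(phis):
    """Orthonormalize the functions phis in the discretized L2 norm."""
    psis = []
    for phi in phis:
        # subtract the projections onto the previous orthonormal functions
        coefs = [inner(phi, psi) for psi in psis]
        def res(x, phi=phi, coefs=coefs, prev=tuple(psis)):
            return phi(x) - sum(c * q(x) for c, q in zip(coefs, prev))
        norm = inner(res, res) ** 0.5
        psis.append(lambda x, res=res, norm=norm: res(x) / norm)
    return psis

def project(f, psis):
    """p_n = sum_j <f, psi_j> psi_j: best approximation from span(psis)."""
    cs = [inner(f, psi) for psi in psis]
    return lambda x: sum(c * psi(x) for c, psi in zip(cs, psis))

# monomials 1, x, x^2 -> best quadratic L2 fit of f(x) = x^3
psis = gram_schmidt([lambda x: 1.0, lambda x: x, lambda x: x * x])
p = project(lambda x: x ** 3, psis)
```

The residual f - p_n is orthogonal to every ψ_j in the discretized inner product, which is exactly the characterization of the best approximation used on the slide.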
Greedy Algorithms / Initial Remarks: Approximation in Hilbert Spaces
orthonormal basis {ψ_j}_{j=1}^∞ of H: ⟨ψ_j, ψ_k⟩ = δ_{j,k} := { 0 if j ≠ k, 1 if j = k },  H = span{ψ_j}_{j=1}^∞
linear approximation p_n = Σ_{j=1}^n ⟨f, ψ_j⟩ ψ_j ∈ Φ_n
approximation error ‖f - p_n‖² = Σ_{j=n+1}^∞ |⟨f, ψ_j⟩|²
nonlinear approximation g_n = Σ_{j=1}^n ⟨f, ψ_{k_j}⟩ ψ_{k_j} with index set Λ_n = {k_1, k_2, ..., k_n} ⊂ ℕ,  Λ_n = Λ_{n-1} ∪ {k_n}
How to find Λ_n?  choose |⟨f, ψ_{k_1}⟩| ≥ |⟨f, ψ_{k_2}⟩| ≥ |⟨f, ψ_{k_3}⟩| ≥ ...
Parseval's identity: ‖f‖² = Σ_{j=1}^∞ |⟨f, ψ_j⟩|²
Greedy Algorithms / Initial Remarks: Nonlinear Approximation
given a basis {ψ_j}_{j=1}^∞, choose any n elements from it and form the linear combinations Σ_{j=1}^n C_j ψ_{k_j}
approximation class (not a space!):  Σ_n := { g = Σ_{k∈Λ} C_k ψ_k : Λ ⊂ ℕ, #Λ ≤ n }
Σ_n = Σ_n({ψ_j}_{j=1}^∞) can be defined for any basis in a Banach space X
approximate f ∈ X via functions from Σ_n
best approximation σ_n(f) := inf_{g∈Σ_n} ‖f - g‖
basic question: how to find g_n ∈ Σ_n such that ‖f - g_n‖ ≤ C σ_n(f)?
Note: although {ψ_j}_{j=1}^∞ and {φ_j}_{j=1}^∞ might yield the same polynomial spaces Φ_n, it is usually the case that Σ_n({ψ_j}) ≠ Σ_n({φ_j}) for all n, and even the rates of σ_n can be completely different
Greedy Algorithms / Greedy Bases: Greedy Approximation
how to find efficiently g_n ∈ Σ_n that approximates f well?
the case of an orthonormal basis {ψ_j}_{j=1}^∞ in a Hilbert space X:
incremental algorithm for finding g_n = Σ_{k∈Λ_n} ⟨f, ψ_k⟩ ψ_k
  Λ_0 = ∅ and Λ_j = Λ_{j-1} ∪ {k_j}, where k_j = argmax_{k∈ℕ∖Λ_{j-1}} |⟨f, ψ_k⟩| = argmax_{k∈ℕ} |⟨f - g_{j-1}, ψ_k⟩|
for a general basis in a Banach space X, let f ∈ X have the representation f = Σ_{j=1}^∞ c_j(f) ψ_j
Greedy Algorithm: define g_n = Σ_{k∈Λ_n} c_k(f) ψ_k for Λ_0 = ∅ and Λ_j = Λ_{j-1} ∪ {k_j}, where k_j = argmax_{k∈ℕ∖Λ_{j-1}} |c_k(f)|
Note: in the general case g_n is no longer the best approximation from Σ_n to f
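For an orthonormal basis the incremental selection above amounts to keeping the n largest coefficients, and by Parseval the squared error is the sum of the discarded squared coefficients. A minimal illustrative sketch on a finite coefficient sequence (names are ours, not from the slides):

```python
# Hedged sketch: n-term greedy selection for an orthonormal basis.  A
# function is represented by its coefficients c_k = <f, psi_k>; keeping
# the n largest |c_k| gives the best n-term approximation, with squared
# error equal to the sum of the discarded |c_k|^2 (Parseval).

def greedy_n_term(coeffs, n):
    """Return (index set Lambda_n, squared error) of the n-term greedy pick."""
    order = sorted(range(len(coeffs)), key=lambda k: -abs(coeffs[k]))
    kept = set(order[:n])
    err2 = sum(c * c for k, c in enumerate(coeffs) if k not in kept)
    return kept, err2

# toy coefficient sequence: the greedy set for n = 2 is {1, 3}
c = [0.1, 0.9, 0.05, 0.7, 0.2]
kept, err2 = greedy_n_term(c, 2)
```

Here err2 = 0.1² + 0.05² + 0.2², the Parseval tail over the discarded indices.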
Greedy Algorithms / Greedy Bases: Greedy Basis
the bases for which g_n is a good approximation:
Greedy Basis {ψ_j}_{j=1}^∞: for any f ∈ X the greedy approximation g_n = g_n(f) to f satisfies ‖f - g_n‖ ≤ G σ_n(f) with a constant G independent of f and n
Unconditional Basis {ψ_j}_{j=1}^∞: for any sign sequence {θ_j}_{j=1}^∞, θ_j = ±1, the operator M_θ defined by M_θ( Σ_j a_j ψ_j ) = Σ_j θ_j a_j ψ_j is bounded
Democratic Basis {ψ_j}_{j=1}^∞: there exists a constant D such that for any two finite sets of indices P and Q with the same cardinality #P = #Q we have ‖ Σ_{k∈P} ψ_k ‖ ≤ D ‖ Σ_{k∈Q} ψ_k ‖
Theorem [Konyagin, Temlyakov]: A basis is greedy if and only if it is unconditional and democratic.
Greedy Algorithms / Greedy Bases: Weak Greedy Algorithm
often it is difficult (or even impossible) to find the maximizing element ψ_k
settle for an element which is at least γ times the best, with 0 < γ ≤ 1
define g_n(f) := Σ_{k∈Λ_n} c_k(f) ψ_k for Λ_0 = ∅ and Λ_j = Λ_{j-1} ∪ {k_j}, where |c_{k_j}(f)| ≥ γ max_{k∈ℕ∖Λ_{j-1}} |c_k(f)|
Theorem [Konyagin, Temlyakov]: For any greedy basis of a Banach space X and any γ ∈ (0, 1] there is a basis-specific constant C(γ), independent of f and n, such that ‖f - g_n(f)‖ ≤ C(γ) σ_n(f)
Greedy Algorithms / Examples for Greedy Algorithms: General Greedy Strategy
start with g_0 = 0 and Λ_0 = ∅; set j = 1 and loop through the next items:
analyze the element f - g_{j-1} together with the possible improvements related to (some of) the elements from {ψ_k}_{k=1}^∞ by calculating a decision functional λ_j(f - g_{j-1}, ψ_k) for each possible ψ_k
  in tree approximation the number of possible elements is bounded by (a multiple of) j
  in the classical settings λ_j is usually related to inf_{k,C} ‖f - g_{j-1} - C ψ_k‖
be greedy: use the element ψ_{k_j} with the largest λ_j, or at least one for which λ_j(f - g_{j-1}, ψ_{k_j}) ≥ γ sup_k λ_j(f - g_{j-1}, ψ_k)
set Λ_j = Λ_{j-1} ∪ {k_j}
calculate the next approximation g_j based on {ψ_k}_{k∈Λ_j}
  in the classical settings g_j is found in the form g_{j-1} + C ψ_{k_j}
set j := j + 1 and continue the loop
Greedy Algorithms / Examples for Greedy Algorithms: Greedy Algorithms for Dictionaries in Hilbert Spaces
{ψ_k}_{k=1}^∞ is a dictionary (not a basis!) in a Hilbert space, with ‖ψ_k‖ = 1
Pure Greedy Algorithm:  k_j := argmax_k |⟨f - g_{j-1}, ψ_k⟩|  and  g_j = g_{j-1} + ⟨f - g_{j-1}, ψ_{k_j}⟩ ψ_{k_j}
Orthogonal Greedy Algorithm:  k_j := argmax_k |⟨f - g_{j-1}, ψ_k⟩|  and  g_j = P_{{ψ_k}_{k∈Λ_j}} f, where P_Ψ f is the orthogonal projection of f on the space span{Ψ}
Weak Greedy Algorithm with 0 < γ ≤ 1:  |⟨f - g_{j-1}, ψ_{k_j}⟩| ≥ γ sup_k |⟨f - g_{j-1}, ψ_k⟩|  and  g_j = g_{j-1} + ⟨f - g_{j-1}, ψ_{k_j}⟩ ψ_{k_j}
Weak Orthogonal Greedy Algorithm with 0 < γ ≤ 1:  |⟨f - g_{j-1}, ψ_{k_j}⟩| ≥ γ sup_k |⟨f - g_{j-1}, ψ_k⟩|  and  g_j = P_{{ψ_k}_{k∈Λ_j}} f
more in [V. Temlyakov, Greedy Approximation, Cambridge University Press, 2011]
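The Pure Greedy Algorithm above can be sketched in the finite-dimensional Hilbert space ℝ^d with plain lists (an illustrative toy, not the slides' setting; a real code would use a linear algebra library and a large dictionary):

```python
# Hedged sketch: Pure Greedy Algorithm for a normalized dictionary in R^d.
# Each step adds <r, psi> psi for the dictionary element psi most
# correlated with the residual r = f - g_j.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pure_greedy(f, dictionary, steps):
    """Run `steps` PGA iterations; return the approximation g."""
    g = [0.0] * len(f)
    for _ in range(steps):
        r = [a - b for a, b in zip(f, g)]
        # greedy choice: maximize |<r, psi>| over the dictionary
        psi = max(dictionary, key=lambda q: abs(dot(r, q)))
        c = dot(r, psi)
        g = [gi + c * qi for gi, qi in zip(g, psi)]
    return g

# for an orthonormal dictionary PGA recovers f exactly in d steps
e1, e2 = [1.0, 0.0], [0.0, 1.0]
g = pure_greedy([3.0, 4.0], [e1, e2], steps=2)
```

For an orthonormal dictionary PGA and OGA coincide; the interesting (and much harder) behavior appears for redundant dictionaries, which is the subject of the convergence theory cited above.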
Greedy Algorithms / Examples for Greedy Algorithms: Two Additional Examples
Coarse-to-Fine Algorithms in Tree Approximation
  framework for adaptive partitioning strategies
  g_n corresponds to a (binary) tree with complexity n
  the functionals λ_k are estimators of the local errors
  the search for k_j is limited to the leaves of the tree corresponding to g_{j-1}
  the greedy strategy does not work and needs modifications
  theoretical estimates ensure near-best approximation with essentially linear complexity
Greedy Approach to the Reduced Basis Method
  the problem is to approximate a high dimensional parametric set via a low dimensional subspace
  the error cannot be calculated efficiently, and one has to settle for the calculation of a surrogate, so using weak greedy strategies is a must
  the general comparison of the greedy approximation with the best approximation requires exponential constants, so one should apply finer estimation techniques
Tree Approximation / Initial Setup: Adaptive Approximation on Binary Partitions
function f : X → Y
X ⊂ ℝ^d is a domain equipped with a measure ρ_X such that ρ_X(X) = 1
Y ⊂ [-M, M] ⊂ ℝ for a given constant M
adaptive binary partitions of X with building blocks Δ_{j,k}, j = 1, 2, ... and k = 0, 1, ..., 2^j - 1
  k represents a bitstream of length j
  Δ_{0,∅} = X;  Δ_{1,0} ∪ Δ_{1,1} = Δ_{0,∅} and ρ_X(Δ_{1,0} ∩ Δ_{1,1}) = 0
  Δ_{j+1,2k} ∪ Δ_{j+1,2k+1} = Δ_{j,k} and ρ_X(Δ_{j+1,2k} ∩ Δ_{j+1,2k+1}) = 0
adaptive partition P: start with Δ_{0,∅} and, for certain pairs (j, k), replace Δ_{j,k} with Δ_{j+1,2k} and Δ_{j+1,2k+1}
corresponding binary tree T = T(P) with nodes Δ_{j,k}
Tree Approximation / Binary Partitions
[figures: successive binary subdivisions of a domain and the corresponding binary trees]
Tree Approximation / Binary Partitions: Adaptive Approximation on Binary Partitions
piecewise polynomial approximation of f on the partition P:  f_P(x) := Σ_{Δ∈P} p_{Δ,f}(x) χ_Δ(x)
the process of finding an appropriate partition P can be defined on the corresponding tree T = T(P) → tree algorithms
in T the node Δ_{j,k} is the parent of its children Δ_{j+1,2k} and Δ_{j+1,2k+1}
not every tree corresponds to a partition → admissible trees: T is admissible if for each node Δ ∈ T its sibling is also in T
the elements of P are the terminal nodes of T, its leaves L(T)
usually the complexity of P is measured by the number N = #P of its elements
the number of nodes of the binary tree T(P) is an equivalent measure, since #T(P) = 2N - 1
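The piecewise approximation f_P can be illustrated in the simplest case of piecewise constants on dyadic subintervals of [0, 1] (our own toy sketch, not from the slides; the L2 integrals are discretized on a fine midpoint grid):

```python
# Hedged sketch: squared L2 error of the best piecewise-constant
# approximation f_P on a partition of [0, 1].  On each cell the best
# constant is the average of f, so the error is the sum of cell variances.

N_GRID = 8192
GRID = [(i + 0.5) / N_GRID for i in range(N_GRID)]

def piecewise_constant_error2(f, partition):
    """Discretized ||f - f_P||^2 for a list of cells (a, b)."""
    err2 = 0.0
    for a, b in partition:
        pts = [x for x in GRID if a <= x < b]
        avg = sum(f(x) for x in pts) / len(pts)      # best constant on the cell
        err2 += sum((f(x) - avg) ** 2 for x in pts) / N_GRID
    return err2

f = lambda x: x
e_coarse = piecewise_constant_error2(f, [(0.0, 1.0)])            # about 1/12
e_fine = piecewise_constant_error2(f, [(0.0, 0.5), (0.5, 1.0)])  # about 1/48
```

Each subdivision of a cell can only decrease the error (here by the factor 4 typical for smooth f), which is the subadditivity that the tree algorithms below exploit.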
Tree Approximation / Near-Best Approximation
Best Approximation:  σ_N(f) := inf_{P : #P≤N} ‖f - f_P‖
Approximation class A^s(X):  f ∈ A^s(X) ⟺ σ_N(f) = O(N^{-s/d})
  shows the asymptotic behavior of the approximation
  note the dependence on the dimension → curse of dimensionality
Usually the theoretical results are given in terms of how the algorithms perform for functions from an approximation class; this does not provide any assurance about the performance for an individual function. Can we do better?
Near-Best Approximation of f: there exist constants C_1 < ∞ and c_2 > 0 such that ‖f - f_N‖ ≤ C_1 σ_{c_2 N}(f)
  sometimes referred to as instance optimality
Tree Approximation / Near-Best Tree Approximation: Error Functionals
a functional e : Δ → e(Δ) assigns to each node Δ ∈ T an error e(Δ) ≥ 0
total error  E(T) := Σ_{Δ∈L(T)} e(Δ)
Subadditivity: for any node Δ ∈ T, if C(Δ) is the set of its children, then  e(Δ) ≥ Σ_{Δ'∈C(Δ)} e(Δ')
Weak Subadditivity: there exists C_0 ≥ 1 such that for any Δ ∈ T and for any finite subtree T' ⊂ T with root node Δ:  Σ_{Δ'∈L(T')} e(Δ') ≤ C_0 e(Δ)
Tree Approximation / Near-Best Tree Approximation: Greedy Strategy for Tree Approximation
Example: approximation in L2[0,1] of a function f defined as a linear combination of scaled Haar functions:
  f(x) := A H_{Δ_0} + B Σ_{I∈I} H_I
  Δ_0 := [0, 2^{-M}], where M is a huge constant
  I is a set of 2^{k-1} dyadic subintervals of [1/2, 1] with length 2^{-k}
  ‖H_Δ‖_{L2[0,1]} = 1 and A = B + ε with ε > 0 arbitrarily small (B > 0)
  e([0, 2^{-m}]) = A² for m ≤ M and e(I) = B² for I ∈ I
The greedy algorithm will first subdivide [1/2, 1] and its descendants until we obtain the set of intervals I. From then on it will subdivide [0, 2^{-m}] for m < M (the ancestors of Δ_0). After N := 2^k + M - 2 subdivisions, the greedy algorithm will give the tree T with error E(T) = ‖f‖²_{L2} = A² + 2^{k-1} B².
If we had subdivided [1/2, 1] and its descendants to dyadic level k + 1, we would have used just n := 2^{k+1} subdivisions and gotten an error σ_n(f) = A²:
  σ_{2^{k+1}} = A² ≪ A² + 2^{k-1} B² = E(T)  with  #T = 2^k + M - 2 ≫ 2^{k+1}
Tree Approximation / Near-Best Tree Approximation: Modified Greedy Strategy for Tree Approximation
the standard greedy strategy does not work for tree approximation
we need a modification that changes the decision functional:
  design modified error functionals that appropriately penalize the depth of subdivision
  use the greedy strategy based on these modified error functionals
  use dynamic instead of static decision functionals
extensions of the algorithms for high dimensions → sparse occupancy trees
Tree Approximation / Near-Best Tree Approximation: Basic Idea of the Tree Algorithm [B., DeVore 2004]
for all nodes Δ of the initial tree T_0 we define ẽ(Δ) := e(Δ)
then, for each child Δ_j, j = 1, ..., m(Δ), of Δ we define
  ẽ(Δ_j) := q(Δ) := ( Σ_{i=1}^{m(Δ)} e(Δ_i) ) · ẽ(Δ) / ( e(Δ) + ẽ(Δ) )
note that ẽ is constant on the children of Δ
define the penalty terms  p(Δ_j) := e(Δ_j) / ẽ(Δ_j)
the main property of ẽ:  Σ_{j=1}^{m(Δ)} p(Δ_j) = p(Δ) + 1
Tree Approximation / Near-Best Tree Approximation: Adaptive Algorithm on Binary Trees [2007]
Modified error ẽ:
  for the initial partition subtree T_0 ⊂ T:  ẽ(Δ) := e(Δ) for Δ ∈ T_0
  for each child Δ' of Δ:  ẽ(Δ') := ( 1/e(Δ') + 1/ẽ(Δ) )^{-1}
Adaptive Tree Algorithm (creates a sequence of trees T_j, j = 1, 2, ...):
  start with T_0
  subdivide the leaves Δ ∈ L(T_{j-1}) with the largest ẽ(Δ) to produce T_j
To eliminate sorting, we can consider all ẽ(Δ) with 2^l ≤ ẽ(Δ) < 2^{l+1} as if they were equally large.
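The harmonic-sum update ẽ(Δ') = (1/e(Δ') + 1/ẽ(Δ))^{-1} and the leaf selection can be sketched on dyadic subintervals of [0, 1]. An illustrative toy (our own names; the local error e is a stand-in, here simply the interval length, where any nonnegative subadditive error functional could be plugged in):

```python
import heapq

# Hedged sketch of the adaptive tree algorithm with the modified error
# e~(child) = (1/e(child) + 1/e~(parent))^(-1) on dyadic intervals.

def local_error(a, b):
    return b - a          # toy local error of the cell [a, b]

def adaptive_tree(num_subdivisions):
    """Greedily subdivide the leaf with the largest modified error e~."""
    root = (0.0, 1.0)
    e0 = local_error(*root)
    # max-heap of leaves keyed by -e~ (heapq is a min-heap)
    heap = [(-e0, root, e0)]
    for _ in range(num_subdivisions):
        neg_et, (a, b), et = heapq.heappop(heap)
        m = (a + b) / 2
        for child in ((a, m), (m, b)):
            ec = local_error(*child)
            etc = 1.0 / (1.0 / ec + 1.0 / et)   # modified error of the child
            heapq.heappush(heap, (-etc, child, etc))
    return [leaf for _, leaf, _ in heap]

leaves = adaptive_tree(3)
```

Because ẽ of a child is always smaller than both e(child) and ẽ(parent), deep chains of subdivisions are automatically penalized, which is exactly what defeats the bad example for the plain greedy strategy.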
Tree Approximation / Near-Best Tree Approximation: Near-Best Approximation on Binary Trees
Best Approximation:  σ_N(f) := inf_{P : #P≤N} ‖f - f_P‖
Assume that the error functionals e(Δ) ≥ 0 satisfy the subadditivity condition. Then the adaptive tree algorithm producing a tree T_N that corresponds to a partition P_N with N elements satisfies, for any n ≤ N,
  E(T_N) = ‖f - f_N‖ ≤ ( N / (N - n + 1) ) σ_n(f)
This gives the constant C_1 = N / ((1 - c_2)N + 1) in the general estimate ‖f - f_N‖ ≤ C_1 σ_{c_2 N}(f) for any chosen 0 < c_2 ≤ 1.
Parameter Dependent PDEs
input parameters µ ∈ D ⊂ ℝ^p
differential operator A_µ : H → H′
functional ℓ ∈ H′
solution u_µ of A_µ u_µ = ℓ
quantity of interest I(µ) = ℓ(u_µ): find I(µ_opt) = opt_{µ∈D} I(µ)
example:  ⟨A_µ u, v⟩ := a_µ(u, v) = Σ_{j=1}^p θ_j(µ_j) ∫_{Ω_j} ∇u · ∇v dx
uniform ellipticity:  c_1 ‖v‖² ≤ a_µ(v, v) ≤ C_1 ‖v‖²  for all v ∈ H, µ ∈ D
[Y. Maday, T. Patera, G. Turinici, ...]
Parameter Dependent PDEs / Reduced Basis Method: Reduced Basis (exploit sparsity)
solution manifold: compact set  F := { u_µ = A_µ^{-1} ℓ : µ ∈ D } ⊂ H
offline: compute f_0, f_1, ..., f_{n-1} such that for F_n := span{f_0, ..., f_{n-1}}
  σ_n := max_{f∈F} ‖f - P_n f‖ ≤ [tolerance]
online: for each µ-query solve a small Galerkin problem in F_n
  a_µ(u^n_µ, f_j) = ℓ(f_j),  j = 0, ..., n - 1
Parameter Dependent PDEs / Reduced Basis Method: Example
a_µ(u, v) = Σ_{j=1}^3 µ_j ∫_{Ω_j} ∇u · ∇v dx + ∫_{Ω_4} ∇u · ∇v dx
ℓ(v) = ∫_{Γ_C} v ds
µ_1 = 0.1, µ_2 = 0.3, µ_3 = 0.8:  ℓ(u) = 1.24705
µ_1 = 0.4, µ_2 = 0.4, µ_3 = 7.1:  ℓ(u) = 0.58505
Parameter Dependent PDEs / Reduced Basis Method: Reduced Basis (exploit sparsity)
offline: f_0, f_1, ..., f_{n-1} such that for F_n := span{f_0, ..., f_{n-1}}:  σ_n := max_{f∈F} ‖f - P_n f‖ ≤ [tolerance]
online: for each µ solve a small Galerkin problem in F_n:  a_µ(u^n_µ, f_j) = ℓ(f_j),  j = 0, ..., n - 1
ℓ(u_µ) - ℓ(u^n_µ) = a_µ(u_µ - u^n_µ, u_µ - u^n_µ) ≤ C_1 σ_n(F)² ≤ C_1 [tolerance]²
use u^n_µ for solving the optimization problem
Parameter Dependent PDEs / Reduced Basis Method: Basis Construction (greedy approach)
ideal algorithm: pure greedy
  f_0 := argmax_{f∈F} ‖f‖,  F_1 := span{f_0}
  given F_n := span{f_0, ..., f_{n-1}} and σ_n(F) := max_{f∈F} ‖f - P_n f‖, take  f_n := argmax_{f∈F} ‖f - P_n f‖
a feasible variant: weak greedy algorithm
  use a computable surrogate R_n(f) for which  c_2 R_n(f) ≤ ‖f - P_n f‖ ≤ C_2 R_n(f)
  choose f_n with  ‖f_n - P_n f_n‖ ≥ γ σ_n(F),  e.g.  f_n := argmax_{f∈F} R_n(f)  with  γ = c_2 / C_2
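The offline greedy stage can be sketched with the solution manifold replaced by a finite set of snapshot vectors in ℝ^d and the exact projection error used in place of a surrogate, i.e. the "ideal" pure greedy with γ = 1 (an illustrative toy with our own names, not a reduced basis code):

```python
# Hedged sketch of the offline greedy stage of the reduced basis method
# on a finite snapshot set in R^d, using exact projection errors.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dist_to_span(f, onb):
    """Distance from f to span(onb) for an orthonormal list onb, plus residual."""
    r = list(f)
    for q in onb:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return dot(r, r) ** 0.5, r

def rb_greedy(snapshots, tol):
    """Grow an orthonormal reduced basis until max projection error <= tol."""
    onb, history = [], []
    while True:
        # sigma_n = max over the (discretized) manifold of the projection error
        err, f = max((dist_to_span(s, onb) for s in snapshots),
                     key=lambda t: t[0])
        history.append(err)
        if err <= tol:
            return onb, history
        onb.append([ri / err for ri in f])   # normalized residual joins the basis

snaps = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 0.5]]
basis, errs = rb_greedy(snaps, tol=1e-12)
```

The recorded sequence errs is the (monotone) greedy error σ_n(F) on this toy manifold; in a real code the max over F is replaced by a max of the cheap surrogate R_n over a training set of parameters.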
Parameter Dependent PDEs / Kolmogorov Widths
d_n(F) := inf_{dim(Y)=n} sup_{f∈F} dist_H(f, Y) ≤ σ_n(F)
Can one bound σ_n(F) in terms of d_n(F)?
Are optimal subspaces spanned by elements of F?
inner widths:  d̄_n(F) := inf_{Y=span{g_1,...,g_n}, g_i∈F} sup_{f∈F} dist_H(f, Y) ≤ σ_n(F)
Theorem: for any compact set F we have  d̄_n(F) ≤ (n + 1) d_n(F);
given any ε > 0 there is a set F such that  d̄_n(F) ≥ (n - 1 - ε) d_n(F)
Parameter Dependent PDEs / Kolmogorov Widths: Widths vs Greedy
[figures: Kolmogorov widths vs the greedy basis for n = 1 and n = 2]
Parameter Dependent PDEs / Results
pure greedy (γ = 1) [Buffa, Maday, Patera, Prudhomme, Turinici]:  σ_n(F) ≤ C n 2^n d_n(F)
slight improvement:  σ_n(F) ≤ (2^{n+1} / √3) d_n(F)
for any n > 0 and any ε > 0 there exists a set F = F_n such that for the pure greedy algorithm  σ_n(F) ≥ (1 - ε) 2^n d_n(F)
What if 2^n d_n(F) → 0?  What if γ < 1?
Parameter Dependent PDEs / Results: Polynomial Convergence Rates
Theorem [Binev, Cohen, Dahmen, DeVore, Petrova, Wojtaszczyk]: Suppose that d_0(F) ≤ M. Then
  d_n(F) ≤ M n^{-α} for n > 0  ⟹  σ_n(F) ≤ C M n^{-α} for n > 0
with  C := 4^α q^{α+1/2}  and  q := ⌈2^{α+1} γ^{-1}⌉².
The proof uses the Flatness Lemma: let 0 < θ < 1 and assume that for q := ⌈2 θ^{-1} γ^{-1}⌉² and some integers m and n we have  σ_{n+qm}(F) ≥ θ σ_n(F).  Then  σ_n(F) ≤ q^{1/2} d_m(F).
Parameter Dependent PDEs / Results: Idea of the Proof
Flatness Lemma:  σ_{n+qm}(F) ≥ θ σ_n(F)  ⟹  σ_n(F) ≤ q^{1/2} d_m(F)
to show σ_n(F) ≤ C M n^{-α} for n ≥ N_0:
  assume it fails for some N > N_0 → flatness for some m ≤ n → apply the Flatness Lemma → contradiction
hence  d_n(F) ≤ M n^{-α} for n > 0  ⟹  σ_n(F) ≤ C M n^{-α} for n > 0
Parameter Dependent PDEs / Results: Sub-Exponential Rates (finer resolutions between n^{-α} and e^{-an})
Theorem [DeVore, Petrova, Wojtaszczyk]: For any compact set F and n ≥ 1 we have
  σ_n(F) ≤ (√2 / γ) min_{1≤m<n} d_m(F)^{(n-m)/n}
In particular,
  σ_{2n}(F) ≤ √(2 d_n(F)) / γ
and
  d_n(F) ≤ C_0 e^{-a n^α} for n ≥ 1  ⟹  σ_n(F) ≤ (√(2 C_0) / γ) e^{-c_1 a n^α} for n ≥ 1,  with c_1 = 2^{-1-2α}.
Parameter Dependent PDEs / Robustness
in reality the f_j cannot be computed exactly: we receive f̂_j (which might not be in F) with ‖f_j - f̂_j‖ ≤ ε
instead of F_n use  F̂_n := span{ f̂_0, ..., f̂_{n-1} }
performance of the noisy weak greedy algorithm:  σ̂_n(F) := sup_{f∈F} dist_H(f, F̂_n)
Theorem [polynomial rates, n > 0]:  d_n(F) ≤ M n^{-α}  ⟹  σ̂_n(F) ≤ C max{ M n^{-α}, ε }  with C = C(α, γ)
a similar result holds for sub-exponential rates
Thank you!