Algebraic flux correction and its application to convection-dominated flow Matthias Möller matthias.moeller@math.uni-dortmund.de Institute of Applied Mathematics (LS III) University of Dortmund, Germany 8th Hirschegg Workshop on Conservation Laws 9-15 September 2007, Hirschegg 1
Overview 1 Algebraic flux correction Motivation Design principles Flux limiting 2 Nonlinear solution algorithm Choice of preconditioners Construction of Jacobians Numerical examples 3 Recent work Application to compressible flows Grid adaptivity 4 Conclusions 2
Motivation Scalar transport equation u t + (vu) = 0 High-order methods: wiggles Standard Galerkin FEM M c du dt + Ku = 0 Low-order methods: smearing 1 1 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Remedy: nonlinear combination of high- and low-order discretizations. 3
Algebraic flux correction, Kuz05a High-order scheme M c du dt + Ku = 0 K = {k ij}, k ij > 0, j i Apply artificial diffusion D = {d ij }, d ij = max{k ij, 0, k ji } = d ji Low-order scheme M l du dt + Lu = 0 L = K + D, l ij 0, j i Decompose the residual difference into internodal fluxes r H i r L i = f ij, j i f ij = [ m ij d dt + d ij] (uj u i ) = f ji Nonlinear high-resolution scheme containing limited antidiffusion M l du dt + Lu = f, f i = j i α ij f ij, 0 α ij 1 4
Algebraic flux correction, Kuz05a High-order scheme M c du dt + Ku = 0 K = {k ij}, k ij > 0, j i Apply artificial diffusion D = {d ij }, d ij = max{k ij, 0, k ji } = d ji Low-order scheme M l du dt + Lu = 0 L = K + D, l ij 0, j i Decompose the residual difference into internodal fluxes r H i r L i = f ij, j i f ij = [ m ij d dt + d ij] (uj u i ) = f ji Nonlinear high-resolution scheme containing limited antidiffusion M l du dt + Lu = f, f i = j i α ij f ij, 0 α ij 1 4
Algebraic flux correction, Kuz05a High-order scheme M c du dt + Ku = 0 K = {k ij}, k ij > 0, j i Apply artificial diffusion D = {d ij }, d ij = max{k ij, 0, k ji } = d ji Low-order scheme M l du dt + Lu = 0 L = K + D, l ij 0, j i Decompose the residual difference into internodal fluxes r H i r L i = f ij, j i f ij = [ m ij d dt + d ij] (uj u i ) = f ji Nonlinear high-resolution scheme containing limited antidiffusion M l du dt + Lu = f, f i = j i α ij f ij, 0 α ij 1 4
Algebraic flux correction, Kuz05a High-order scheme M c du dt + Ku = 0 K = {k ij}, k ij > 0, j i Apply artificial diffusion D = {d ij }, d ij = max{k ij, 0, k ji } = d ji Low-order scheme M l du dt + Lu = 0 L = K + D, l ij 0, j i Decompose the residual difference into internodal fluxes r H i r L i = f ij, j i f ij = [ m ij d dt + d ij] (uj u i ) = f ji Nonlinear high-resolution scheme containing limited antidiffusion M l du dt + Lu = f, f i = j i α ij f ij, 0 α ij 1 4
Flux limiting Nonlinear high-resolution scheme (lumped-mass version) M l du dt + Lu = f, f i = j i α ij f ij, f ij = d ij (u j u i ) = f ji α ij 0 f = 0 M l du dt + Lu = 0 α ij 1 f = Du M l du dt + Ku = 0 low-order scheme high-order scheme Goal: Determine α ij as close to 1 as possible such that (Lu f ) i = j i q ij (u j u i ), q ij 0 j i 5
Multidimensional flux correction 1. Compute the sums of positive/negative antidiffusive fluxes P + i = j i max{0, f ij }, P i = j i min{0, f ij } 2. Define the corresponding upper/lower bounds Q + i = j i q ij max{0, u j u i }, Q i = j i q ij min{0, u j u i } 3. Evaluate the nodal correction factors for positive/negative fluxes R + i = min{1, Q + i /P + i }, R i = min{1, Q i /P i } 4. Apply correction factors to the raw antidiffusive fluxes f i = j i α ij f ij a ij = { min{r + i, R j }, if f ij 0 min{r i, R + j }, otherwise 6
Multidimensional flux correction 1. Compute the sums of positive/negative antidiffusive fluxes P + i = j i max{0, f ij }, P i = j i min{0, f ij } 2. Define the corresponding upper/lower bounds Q + i = j i q ij max{0, u j u i }, Q i = j i q ij min{0, u j u i } 3. Evaluate the nodal correction factors for positive/negative fluxes R + i = min{1, Q + i /P + i }, R i = min{1, Q i /P i } 4. Apply correction factors to the raw antidiffusive fluxes f i = j i α ij f ij a ij = { min{r + i, R j }, if f ij 0 min{r i, R + j }, otherwise 6
Multidimensional flux correction 1. Compute the sums of positive/negative antidiffusive fluxes P + i = j i max{0, f ij }, P i = j i min{0, f ij } 2. Define the corresponding upper/lower bounds Q + i = j i q ij max{0, u j u i }, Q i = j i q ij min{0, u j u i } 3. Evaluate the nodal correction factors for positive/negative fluxes R + i = min{1, Q + i /P + i }, R i = min{1, Q i /P i } 4. Apply correction factors to the raw antidiffusive fluxes f i = j i α ij f ij a ij = { min{r + i, R j }, if f ij 0 min{r i, R + j }, otherwise 6
Multidimensional flux correction 1. Compute the sums of positive/negative antidiffusive fluxes P + i = j i max{0, f ij }, P i = j i min{0, f ij } 2. Define the corresponding upper/lower bounds Q + i = j i q ij max{0, u j u i }, Q i = j i q ij min{0, u j u i } 3. Evaluate the nodal correction factors for positive/negative fluxes R + i = min{1, Q + i /P + i }, R i = min{1, Q i /P i } 4. Apply correction factors to the raw antidiffusive fluxes f i = j i α ij f ij a ij = { min{r + i, R j }, if f ij 0 min{r i, R + j }, otherwise 6
Nonlinear solution algorithm Nonlinear algebraic system for 0 < θ 1 N(u) := Lu f M l u n+1 u n t + θn(u n+1 ) + (1 θ)n(u n ) = 0 Fixed-point iteration u (0) := u n u (m+1) = u (m) C 1 F (u (m) ), m = 0, 1,... until F (u (m+1) ) tol or F (u (m+1) ) tol F (u n ). Practical implementation C u (m+1) = F (u (m) ) m = 0, 1,... u (m+1) = u (m) u (m+1) u (0) = u n 7
Nonlinear solution algorithm Nonlinear algebraic system for 0 < θ 1 N(u) := Lu f F (u n+1 u n+1 u n ) := M l + θn(u n+1 ) + (1 θ)n(u n ) = 0 t Fixed-point iteration u (0) := u n u (m+1) = u (m) C 1 F (u (m) ), m = 0, 1,... until F (u (m+1) ) tol or F (u (m+1) ) tol F (u n ). Practical implementation C u (m+1) = F (u (m) ) m = 0, 1,... u (m+1) = u (m) u (m+1) u (0) = u n 7
Choice of preconditioner Linearized system C u (m+1) = F (u (m) ) Diagonal mass matrix C = M l / t cheap, but only usable for very small time steps Low-order operator C = M l / t + θ L fixed-point defect correction scheme may converge slowly no update needed if the velocity field remains unchanged Jacobian operator C = M l / t + θ N(u) u nonlinear operator N(u) is constructed discretely flux limiter may not be globally differentiable Idea: approximate by divided differences and see what happens 8
Jacobian matrix, Moe07a Nodal approximation of Jacobian J = {j ik } D k [N i ] := N i(u + he k ) N i (u he k ) 2h j ik = D k [N i ] + O(h 2 ) D k [N i ] = D k [(Lu f ) i ] = D k [(Ku + Du f ) i ] = D k [ j k ij u j + j i (1 α ij )f ij ] = k ik + j D k [k ij ]u j + j i D k [(1 α ij )f ij ] Averaged coefficient kik := k ik(u + he k ) + k ik (u he k ) 2 9
Jacobian matrix, Moe07a Nodal approximation of Jacobian J = {j ik } D k [N i ] := N i(u + he k ) N i (u he k ) 2h j ik = D k [N i ] + O(h 2 ) D k [N i ] = D k [(Lu f ) i ] = D k [(Ku + Du f ) i ] = D k [ j k ij u j + j i (1 α ij )f ij ] = k ik + j D k [k ij ]u j + j i D k [(1 α ij )f ij ] Averaged coefficient kik := k ik(u + he k ) + k ik (u he k ) 2 9
Jacobian matrix, Moe07a Nodal approximation of Jacobian J = {j ik } D k [N i ] := N i(u + he k ) N i (u he k ) 2h j ik = D k [N i ] + O(h 2 ) D k [N i ] = D k [(Lu f ) i ] = D k [(Ku + Du f ) i ] = D k [ j k ij u j + j i (1 α ij )f ij ] = k ik + j D k [k ij ]u j + j i D k [(1 α ij )f ij ] Averaged coefficient kik := k ik(u + he k ) + k ik (u he k ) 2 9
Sparsity pattern, Moe07a Jacobian coefficients j ik = k ik + j D k[k ij ]u j + j i D k[(1 α ij )f ij ] There exists an edge ij iff the FE basis functions have overlapping supports G := K Structure of the Jacobian Remarks z kk 0, g ii = 1, g ij = 1 ij Z := J z ik 0 ik z jk 0 ik ij Same structure is used for edge-oriented stabilization techniques Sparsity pattern can be generated by multiplication: Z = G 2 10
Sparsity pattern, Moe07a Jacobian coefficients j ik = k ik + j D k[k ij ]u j + j i D k[(1 α ij )f ij ] There exists an edge ij iff the FE basis functions have overlapping supports G := K g ii = 1, g ij = 1 ij u he k k Structure of the Jacobian Z := J z kk 0, z ik 0 ik z jk 0 ik ij Remarks Same structure is used for edge-oriented stabilization techniques Sparsity pattern can be generated by multiplication: Z = G 2 10
Sparsity pattern, Moe07a Jacobian coefficients j ik = k ik + j D k[k ij ]u j + j i D k[(1 α ij )f ij ] There exists an edge ij iff the FE basis functions have overlapping supports G := K Structure of the Jacobian Remarks z kk 0, g ii = 1, g ij = 1 ij Z := J z ik 0 ik z jk 0 ik ij Same structure is used for edge-oriented stabilization techniques Sparsity pattern can be generated by multiplication: Z = G 2 10
Sparsity pattern, Moe07a Jacobian coefficients j ik = k ik + j D k[k ij ]u j + j i D k[(1 α ij )f ij ] There exists an edge ij iff the FE basis functions have overlapping supports G := K Structure of the Jacobian Remarks z kk 0, g ii = 1, g ij = 1 ij Z := J z ik 0 ik z jk 0 ik ij Same structure is used for edge-oriented stabilization techniques Sparsity pattern can be generated by multiplication: Z = G 2 10
Sparsity pattern, Moe07a Jacobian coefficients j ik = k ik + j D k[k ij ]u j + j i D k[(1 α ij )f ij ] There exists an edge ij iff the FE basis functions have overlapping supports G := K Structure of the Jacobian Remarks z kk 0, g ii = 1, g ij = 1 ij Z := J z ik 0 ik z jk 0 ik ij Same structure is used for edge-oriented stabilization techniques Sparsity pattern can be generated by multiplication: Z = G 2 10
Example: Stationary convection-diffusion Convection-diffusion equation v u d u = 0 v = (cos 10, sin 10 ) FEM-TVD: d = 10 3 Initial guess in Ω = (0, 1) 2 u u (x, y) = Boundary conditions { 1 x if y 0.5 0 otherwise 128 128 Q 1 -elements u(x, 0) = 0, u(1, y) = 0, { u 1 if y 0.5, (x, 1) = 0, u(0, y) = y 0 otherwise. 11
Example: τ u + v u d u = 0 d = 10 3, τ = 1.0, BiCGSTAB+ILU, rel lin 10 3 norm of nonlinear residual 10 0 Newton level 6 Newton level 7 10 2 Newton level 8 Defcor level 6 Defcor level 7 10 4 Defcor level 8 10 6 10 8 10 10 10 12 10 14 0 50 100 150 number of nonlinear iterations NVT CPU NN NL Newton s method 4225 1 13 71 16641 4 13 102 66049 23 13 189 defect correction 4225 3 146 1263 16641 9 78 1060 66049 35 43 1063 12 Newton s method vs. defect correction total CPU time reduces by factor 2-3 reduction of nonlinear iterations by factor 4-16
Example: τ u + v u d u = 0 d = 10 3, τ = 10.0, BiCGSTAB+ILU, rel lin 10 3 norm of nonlinear residual 10 0 Newton level 6 Newton level 7 10 2 Newton level 8 Defcor level 6 Defcor level 7 10 4 Defcor level 8 10 6 10 8 10 10 10 12 10 14 0 20 40 60 80 100 120 140 160 180 number of nonlinear iterations NVT CPU NN NL Newton s method 4225 2 37 263 16641 5 12 152 66049 19 10 182 defect correction 4225 4 166 1612 16641 10 86 1234 66049 38 46 1173 12 Newton s method vs. defect correction total CPU time reduces by factor 2-3 reduction of nonlinear iterations by factor 4-16
Example: τ u + v u d u = 0 d = 10 4, τ = 1.0, BiCGSTAB+ILU, rel lin 10 3 norm of nonlinear residual 10 0 Newton level 6 Newton level 7 10 2 Newton level 8 Defcor level 6 Defcor level 7 10 4 Defcor level 8 10 6 10 8 10 10 10 12 10 14 0 100 200 300 400 500 600 number of nonlinear iterations NVT CPU NN NL Newton s method 4225 2 32 541 16641 54 65 3514 66049 95 37 1219 defect correction 4225 9 400 4279 16641 63 560 8013 66049 263 595 5861 12 Newton s method vs. defect correction total CPU time reduces by factor 2-3 reduction of nonlinear iterations by factor 4-16
Example: τ u + v u d u = 0 d = 10 4, τ = 10.0, BiCGSTAB+ILU, rel lin 10 3 norm of nonlinear residual 10 0 Newton level 6 Newton level 7 10 2 Newton level 8 Defcor level 6 Defcor level 7 10 4 Defcor level 8 10 6 10 8 10 10 10 12 10 14 0 100 200 300 400 500 600 700 800 900 number of nonlinear iterations NVT CPU NN NL Newton s method 4225 5 53 1144 16641 37 45 4287 66049 180 63 4147 defect correction 4225 10 447 5306 16641 75 616 9875 66049 356 853 7519 Newton s method vs. defect correction total CPU time reduces by factor 2-3 reduction of nonlinear iterations by factor 4-16 12
Example: τ u + v u d u = 0 d = 10 4, τ = 10.0, BiCGSTAB+ILU, rel lin 10 3 norm of nonlinear residual 10 0 Newton level 6 Newton level 7 10 2 Newton level 8 Defcor level 6 Defcor level 7 10 4 Defcor level 8 10 6 10 8 10 10 10 12 10 14 0 100 200 300 400 500 600 700 800 900 number of nonlinear iterations NVT CPU NN NL Newton s method 4225 5 53 1144 16641 37 45 4287 66049 180 63 4147 defect correction 4225 10 447 5306 16641 75 616 9875 66049 356 853 7519 Newton s method vs. defect correction total CPU time reduces by factor 2-3 reduction of nonlinear iterations by factor 4-16 12
Example: Burgers equation in space-time Inviscid Burgers equation ( ) u 2 x + t u = 0 2 Discrete upwind: 32,768 P 1 -elements Space-time domain Ω = (0, 1) (0, 0.5) Boundary conditions 1 if 0 x < 0.4 t = 0 0.5 if 0.4 x 8 t = 0 u(x, t) = 0 if 0.8 < x 1 t = 0 or x = 0 0 t 0.5 u u h 1 = 1.6922e 2 u u h 2 = 4.0169e 2 13
Example: Burgers equation in space-time Inviscid Burgers equation ( ) u 2 x + t u = 0 2 FEM-TVD: 32,768 P 1 -elements Space-time domain Ω = (0, 1) (0, 0.5) Boundary conditions 1 if 0 x < 0.4 t = 0 0.5 if 0.4 x 8 t = 0 u(x, t) = 0 if 0.8 < x 1 t = 0 or x = 0 0 t 0.5 u u h 1 = 4.9211e 3 u u h 2 = 1.9082e 2 13
Example: τ u + x ( 1 2 u2 ) + t u = 0 Discrete upwind: 1 nit./ τ, BiCGSTAB+ILU, rel lin 10 3 τ = 0.1 Newton defect correction error NVT CPU NN NL CPU NN NL u u h 1 u u h 2 4225 1 25 162 1 32 172 4.86e-2 7.84e-2 16641 2 24 287 3 35 352 2.93e-2 5.63e-2 66049 15 24 518 19 41 716 1.69e-2 4.01e-2 τ = 1.0 Newton defect correction error NVT CPU NN NL CPU NN NL u u h 1 u u h 2 4225 1 12 136 1 21 321 4.86e-2 7.84e-2 16641 2 11 242 4 26 661 2.93e-2 5.63e-2 66049 12 12 412 30 35 1348 1.69e-2 4.01e-2 Discrete Jacobian operator yields robust Newton iteration 14
Example: τ u + x ( 1 2 u2 ) + t u = 0 FEM-TVD: up to 10 nit./ τ n, τ n [0.01, 1.0], rel nonl 10 3 BiCGSTAB+ILU, rel lin 10 3 Newton defect correction CPU NSTEP NN NL CPU NSTEP NN NL 10 38 119 5817 53 476 3741 42955 73 38 167 8597 224 254 2517 46755 1295 39 303 40162 3818 1065 6152 179584 error 4225 16641 66049 u u h 1 2.29e-2 1.01e-2 4.92e-3 u u h 2 5.34e-2 3.26e-2 1.90e-2 Newton s method vs. defect correction total CPU time reduces by factor 3 reduction of outer iterations by factor 15-30 15
Example: Swirling flow, Leveque Swirling flow in Ω = (0, 1) 2 u t + (vu) = 0 Time-dependent velocity t (0, T ) v x = sin 2 (πx) sin(2πy)g(t) v y = sin 2 (πy) sin(2πx)g(t) g(t) = cos(πt/t ), T = 1.5 Time step control: PID t [10 3, 10 1 ], un+1 u n u n+1 0.5% 16
Example: Swirling flow, contd. Newton defect correction NVT NSTEP CPU NN NL CPU NN NL 4225 763 (70) 46 4474 4498 212 31898 31398 16641 977 (77) 251 5528 5556 1141 38295 38295 66049 1126 (66) 1220 5988 5990 5592 42410 42410 0.02 4225 nodes 16641 nodes 0.018 66049 nodes 10 1 0.016 0.014 0.012 0.01 10 2 0.008 0.006 0.004 0.002 0 0 0.5 1 1.5 adaptive time step 10 3 256 128 64 error reduction 17 CPU improved by factor 4.5; reduction of iterations by factor 7
Example: Swirling flow, contd. Newton defect correction NVT NSTEP CPU NN NL CPU NN NL 4225 763 (70) 46 4474 4498 212 31898 31398 16641 977 (77) 251 5528 5556 1141 38295 38295 66049 1126 (66) 1220 5988 5990 5592 42410 42410 0.02 4225 nodes 16641 nodes 0.018 66049 nodes 0.016 10 4 10 6 0.014 0.012 0.01 0.008 0.006 0.004 norm of nonlinear residual 10 8 10 10 10 12 0.002 0 0 0.5 1 1.5 adaptive time step 10 14 0 10 20 30 40 50 60 70 80 number of nonlinear iterations 17 CPU improved by factor 4.5; reduction of iterations by factor 7
Outlook: Compressible flows, Kuz05b Divergence Form U t + 3 d=1 F d x d = 0 A d = F d U Quasi-linear form U t + 3 U d=1 Ad x d = 0 FEM discretization [ M du ] C dt = i j i a ij(u j u i ) a ij = r ij Λ ij r 1 ij Transformation to local characteristic variables makes it possible to perform upwinding and algebraic flux correction as in the scalar case. M = 3, α = 0, density Mach number 18
Grid adaption, Moe06 1 2048 triangles 1 7194 triangles 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0 0 0.5 1 1.5 2 2.5 3 3.5 4 0 0 0.5 1 1.5 2 2.5 3 3.5 4 1 3503 triangles 1 15664 triangles 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0 0 0.5 1 1.5 2 2.5 3 3.5 4 1 Density distribution on the final mesh 0 0 0.5 1 1.5 2 2.5 3 3.5 4 2.2 2 1.8 cutline at y = 0.6 grid1 grid2 grid3 grid4 0.5 1.6 1.4 1.2 0 0 0.5 1 1.5 2 2.5 3 3.5 4 1 0.8 0 0.5 1 1.5 2 2.5 3 3.5 4 19
Time-dependent grid adaptivity 20
Time-dependent grid adaptivity 20
Time-dependent grid adaptivity 20
Time-dependent grid adaptivity 20
Time-dependent grid adaptivity 20
Time-dependent grid adaptivity 20
Time-dependent grid adaptivity 20
Conclusions and Outlook nonlinear high-resolution schemes can be constructed by adding artificial diffusion + limited antidiffusion implicit flux correction schemes can be combined with algebraic Newton methods to speed up convergence Jacobian matrix is approximated by divided differences and assembled efficiently edge-by-edge in a black-box fashion the sparsity pattern of the Jacobian needs to be extended which can be accomplished by standard matrix multiplication Further research: compressible Euler/Navier-Stokes equations time-dependent grid adaptation + goal oriented error estimation 21
References Kuz05a D. Kuzmin, M.M. Algebraic flux correction I. Scalar conservation laws. In: Flux-Corrected Transport, Principles, Algorithms, and Applications, D. Kuzmin, R. Löhner, S. Turek (eds.). Springer: Germany, 2005; 155-206 Kuz05b D. Kuzmin, M.M. Algebraic flux correction II. Compressible Euler equations. In: Flux-Corrected Transport, Principles, Algorithms, and Applications, D. Kuzmin, R. Löhner, S. Turek (eds.). Springer: Germany, 2005; 207-250. Moe06 M.M., D. Kuzmin. Adaptive mesh refinement for high-resolution finite element schemes. IJNMF 52, 2006; 545-569. Moe07a M.M. Efficient solution techniques for implicit finite element schemes with flux limiters. IJNMF (in press). Moe07b M.M, D. Kuzmin, D. Kourounis. Implicit FEM-FCT algorithms and discrete Newton methods for transient problems. Tech. Rep. 340, 2007, University of Dortmund. 22
PID time step control Normalize relative changes e n = un u n 1 tol u n Reject time step if e n > 1 t n := β t n, 0 < β < 1 Compute new time step t n+1 = ( en 1 e n e n Parameters (Valli, Carey, Coutinho) ) kp ( ) ki ( 1 e 2 ) kd n 1 t n e n e n 2 k P = 0.075, k I = 0.175, k D = 0.01 Back 23
Semi-implicit FCT limiter, Part I 1. Initialization P ± i Q ± i R ± i 0 2.Compute positivity-preserving intermediate low-order solution ũ = u n + (1 θ) tm 1 l Lu n 3. Evaluate raw antidiffusive flux f n ij P ± i := P ± i + max min {0, f n = td n ij (un i ij }, P ± j := P ± j + max min 4. Update admissible increments for both nodes Q ± i uj n ) and update {0, f n ij } := max min {Q± i, ũ j ũ i }, Q ± j := max min {Q± j, ũ i ũ j } 5. Limite the raw antidiffusive fluxes f n ij R ± i := m i Q ± i /P ± i f ij := { min{r + i, R j }fij n, min{r i, R + j }fij n, if F ij n > 0, otherwise. 24
Semi-implicit FCT limiter, Part II 1. Evaluate the target flux f ij = [m ij + θ td (m) ij ](u (m) i u (m) j ) [m ij (1 θ) tdij n ](ui n uj n ) 2.Constrain each flux by means of f ij { f ij = min{f ij, max{0, f ij }}, if f ij > 0, max{f ij, min{0, f ij }}, otherwise. 3. Insert the limited antidiffusive flux into the right-hand side b(m) i := b (m) i + f ij, b(m) j := b (m) j f ij Back 25