A Jacobi Davidson Algorithm for Large Eigenvalue Problems from Opto-Electronics

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 1/30 A Jacobi Davidson Algorithm for Large Eigenvalue Problems from Opto-Electronics Peter Arbenz 1 Urban Weber 1 Ratko Veprek 2 Bernd Witzigmann 2 1 Institute of Computational Science, ETH Zurich, 2 Integrated Systems Laboratory, ETH Zurich Work in progress RANMEP, Hsinchu, Taiwan, January 4 8, 2008.

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 2/30 Introduction I In solid state physics, the (electronic) band structure of a solid describes ranges of energy that an electron can assume. Electrons of single free-standing atom occupy atomic orbitals, that form discrete set of energy levels. When a large number of atoms are brought together to form a solid, the number of orbitals becomes exceedingly large, and the difference in energy between them becomes very small. Bands of energy levels are form rather than discrete energy levels. Intervals that contain no orbitals are called band gaps. From wikipedia.org

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 3/30 Introduction II Any solid has a large number of bands, in theory, many. However, all but a few lie at energies so high that any electron that reaches those energies escapes from the solid. These bands are usually disregarded.

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 4/30 Introduction III The uppermost occupied band is called valence band by analogy to the valence electrons of individual atoms. The lowermost unoccupied band is called the conduction band because only when electrons are excited to the conduction band can current flow in these materials.

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 5/30 Introduction IV A numerical evaluation of the band structure takes into account the periodic nature of a crystal lattice. The Schrödinger equation for a single particle in a cristal lattice is given by ( 2 2m 0 2 + V (r))ψ(r) = EΨ(r) (1) where the potential V exhibits the cristal periodicities: V (r) = V (r + R) for all lattice vectors R. Solutions of (1) are Bloch functions: where u nk (r) is lattice-periodic. Ψ nk (r) = e ik r u nk (r). (2)

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 6/30 Introduction V Here, k is called the wave vector, and is related to the direction of motion of the electron in the crystal, and n is the band index, which simply numbers the energy bands. The wave vector k takes on values within the Brillouin zone corresponding to the crystal lattice. Plugging (2) into (1) yields 2 2m 0 ( 2 + 2ik + k 2 + V (r) ) u nk (r) = E n (k)u nk (r) For each of the possible k there are infinitely many n = 1, 2,.... (3)

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 7/30 Introduction VI There are a number of possible ways to compute the spectral bands. 1 Solve (3) for k = k 0 = 0 and determine the rest of the eigenvalues by perturbation theory. (Effective mass m ) 2 Solve (3) in a subspace spanned by a small number of eigenfunctions (zone-centered Bloch functions) u n0 (Kane, 1957). 3 In the k p method the two methods are somehow combined by writing m Φ(r) = g i (r)u i0 (r). (4) i=1 Here, the so-called envelope g i replaces the plane wave e ik r in (2) (Luttinger-Kohn, 1955; Löwdin, 1951). We go for the k p method for its superior accuracy.

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 8/30 Introduction VII We finally arrive at an eigenvalue problem for the envelopes: H (2) ij (r) i j + H (1) i (r) i + H (0) (r) + U(r) g(r) = Eg(r) i,j=1,2,3 (5) Here, the perturbation U(r) takes into account an impurity potential of the material or quantum dot, etc. Note, that g is a m-vector (field). For the numerical computation g is represented by piecewise (linear) finite elements.

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 9/30 Quantum dot Probability density of an excited state in a valence band in a pyramidal quantum dot.

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 10/30 The linear algebra problem We end up with the generalized complex Hermitian eigenvalue problem Ax = λmx. (6) where A = A C n n, M T = M R n n, Depending on the bands included, A is either definite or indefinite. The latter holds if valence and conduction band are taken into account. Just a small number (k = 1 8) of bands are used. Matrices have k k blocks. Only a few of the eigenvalues closest to 0 of (6) are desired.

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 11/30 Transformation complex Hermitian real symmetric I For A = A r + i A i C n m, A r, A i R n m, define the mapping (Day & Heroux, 2001) ( ) ϕ : C m n R 2m 2n Ar A : A ϕ(a) := i A i A r (7) that transforms a complex m n matrix into a real 2m 2n matrix. If operations are allowed, then ϕ(ab) = ϕ(a)ϕ(b). (8) Many codes for solving eigenvalue problems, in particular for large sparse eigenvalue problems, are available only in real arithmetic. Very often the sparsity structures of real and imaginary parts of the underlying matrices differ considerably.

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 12/30 Transformation complex Hermitian real symmetric II We can rewrite the complex eigenvalue problem Ax = xλ (9) in the real form ϕ(ax) = ϕ(a)ϕ(x) = ϕ(xλ) = ϕ(x)ϕ(λ). (10) or, ( Ar A i A i A r ) ( ) ( ) ( xr x i xr x = i λr λ i x i x r So, a complex eigenvalue of A becomes a pair of complex conjugate eigenvalues of ϕ(a). A Hermitian = λ i = 0. x i x r λ i λ r ). (11)

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 13/30 Transformation complex Hermitian real symmetric III Special cases: ϕ(x )ϕ(x) = ϕ( x 2 ) ( x T r x T i x T i x T r ) ( ) xr x i x i x r ( ) = x 2 1 0 0 1 ( ) Enforcing x yr y = 0 thus means that is made orthogonal y ( ) ( ) i ( ) xi xr yi to both and. The companion vector x r x i y r will then automatically be orthogonal to these two vectors. Computation is done with just one vector.

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 14/30 Computing only a few eigenpairs Case 1. A is positive definite. Compute eigenvalues one by one starting with the smallest. Case 2: A is indefinite. (M is always spd.) Use approaches employing harmonic Ritz values/vectors. Preconditioner? Generate two new positive definite eigenvalue problems (A σ i M)M 1 (A σ i M)x = λmx, i = 1, 2. (12) where σ 1 is close to the smallest positive eigenvalue of (A, M) and σ 2 is close to the smallest negative eigenvalue of (A, M) (Vömel, 2007; Shi Shu, 2006). Compute eigenvalues of (12) with shift σ 1 one by one starting with the smallest. Compute eigenvalues of (12) with shift σ 2 one by one starting with the smallest. Preconditioner? AMG preconditioner?

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 15/30 Symmetric Jacobi Davidson (JDSYM) (Sleijpen/van der Vorst, 1996; Geus, 2003) Let V k = span{v 1,..., v k }, v T k Mv j = δ kj, be the actual search space (not a Krylov space). Rayleigh Ritz Galerkin procedure: Extract Ritz pair ( λ, q) in V k with λ closest to some target value τ. Convergence: If r k M 1 (A λm) q M 1 < ε q M then we have found an eigenpair Solve correction equation for t k M q, (I M q q T )(A η k M)(I q q T M)t k = r k, q T Mt k = 0. (13) M-orthonormalize t k to V k to obtain v k+1 Expand search space: V k+1 = span{v 1,..., v k+1 }.

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 16/30 Remarks on JDSYM In exact arithmetic, one-vector JD is Rayleigh quotient iteration. Eigenvalue approximations converge cubically. Shift η k in (13) is set to target value τ initially ( inverse iteration) and to the Rayleigh quotient ρ( q) close to convergence. Keep subspace size bounded: If k = j max reduce size of the search space to j min. Use j min best Ritz vectors in V jmax to define V jmin. The correction equation is solved only approximatively. We use a Krylov space method: QMRS (admits indefinite preconditioner) (Freund, 1992). Eigenvectors corresponding to higher eigenvalues are computed in the orthogonal complement of previously computed eigenvectors.

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 17/30 Preconditioning the correction equation The correction equation is given by (I M q q T )(A η k M)(I q q T M)t k = r k, q T Mt k = 0. Preconditioning means solving with a system of the form (I M q q T )K(I q q T M)c = b, q T Mc = 0. (14) where K is a preconditioner for A ρ k M. As we are looking for just a few of the smallest eigenvalues we take K A σm where σ is close to the desired eigenvalues. Usually, σ = τ. We use the same aggregation-based AMG preconditioner K for all correction equations (Vaněk, Brezina, Mandel, 1996) (Remember we only compute a few eigenpairs.)

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 18/30 Setup procedure for an abstract multigrid solver 1: Define the number of levels, L 2: for level l = 0,..., L 1 do 3: if l < L 1 then 4: Define prolongator P l ; 5: Define restriction R l = Pl T ; 6: K l+1 = R l K l P l ; 7: Define smoother S l ; 8: else 9: Prepare for solving with K l ; 10: end if 11: end for

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 19/30 Smoothed aggregation (SA) AMG preconditioner I 1 Build adjacency graph G 0 of K 0 = K. (Take m m block structure into account.) 2 Group graph vertices into contiguous subsets, called aggregates. Each aggregate represents a coarser grid vertex. Typical aggregates: 3 3 3 nodes (of the graph) up to 5 5 5 nodes (if aggressive coarsening is used) (Par)METIS Note: The matrices K 1, K 2,... need much less memory space than K 0! Typical operator complexity for SA: 1.4 (!!!)

Smoothed aggregation (SA) AMG preconditioner II 3 Define a grid transfer operator: Low-energy modes (near-kernel) are chopped according to aggregation B l = Let B (l) j B (l) 1.. B (l) n l+1 = rows of B l corresponding to grid points assigned to j th aggregate. B (l) j = Q (l) j R (l) j be QR factorization of B (l) j then B l = P l B l+1, PT l Pl = I, with P l = diag(q (l) 1,..., Q(l) n l+1 ) and B l+1 = R (l) 1.. R (l) n l+1 Columns of B l+1 span the near kernel of K l+1. Notice: matrices K l are not used in constructing tentative prolongators P l, near kernels B l, and graphs G l. RANMEP, Hsinchu, Taiwan, January 4 8, 2008 20/30.

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 21/30 Smoothed aggregation (SA) AMG preconditioner III 4 For elliptic problems, it is advisable to perform an additional step, to obtain smoothed aggregation (SA). P l = (I l ω l D 1 l K l ) P l, ω l = smoothed prolongator 4/3 λ max (D 1 l K l ), In non-smoothed aggregation: P l = P l 5 Smoother S l : polynomial smoother Choose a Chebyshev polynomial that is small on the upper part of the spectrum of K l (Adams, Brezina, Hu, Tuminaro, 2003). Parallelizes perfectly, quality independent of processor number.

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 22/30 Matrix-free multigrid Our approach: pcg which almost smoothed aggregation AMG preconditioning We set K = K 0 = A σm in the positive definite case and K = (A σm)m 1 lumped (A σm) in the indefinite case. (We apply K 0.) We lump because M 1 is dense. G 0 is the graph corresponding to (A σm)m 1 lumped (A σm). P 0 is not smoothed, i.e. P 0 = P 0. K 1 = P T 0 K 0P 0 is formed explicitely. All graphs, including G 0, are constructed.

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 23/30 The Software Environment: Trilinos The Trilinos Project is an effort to develop parallel solver algorithms and libraries within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific applications. See http://software.sandia.gov/trilinos/ Provides means to distribute (multi)vectors and (sparse) matrices (Epetra and EpetraExt packages). Provides solvers that work on these distributed data. Here we use iterative solvers and preconditioners (packages AztecOO/IFPACK), smoothed aggregation multilevel AMG preconditioner (ML), direct solver wrappers (Amesos) and data distribution for parallelization (Zoltan/ParMETIS). Quality of documentation depends on package.

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 24/30 Parallel mesh reading Implementation: HDF5 format/library Allows for parallel I/O Mesh file content A list of tetrahedra (4 vertices) Mesh reading scales with number of processors Binary file format allows for efficient I/O A list of vertex coordinates (x, y, z) A list of surface triangles (3 vertices)

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 25/30 Timings for a small quantum wire problem (4 4, spd) Problem size n = 49860, 2*n = 99720 Number of nonzeros in the graph = 1259826 Some JDSYM parameters itmax=1000 linitmax=50 kmax=8 jmin=8 jmax=16 tau=0.0e+00 jdtol=1.0e-08 eps tr=1.0e-02 toldecay=1.5e+00 sigma=0.0e+00 linsolver=qmrs blksize=1

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 26/30 Convergence history of a typical JD run

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 27/30 Performance with spd problems E(p) = t(1) p t(p) 2DBigWire p t [sec] E(p) t(prec) n out n avg inner 1 400 1.00 243 59 15.6 2 181 1.10 147 56 15.8 4 140 0.71 115 56 15.7 8 215 0.23 178 61 17.1

Performance with indefinite problems Isoprobability surface at probability density of 0.006 of an elevated electron state of a pyramidal Indium-Arsenide (InAs) quantum dot calculated using k p 8 8. RANMEP, Hsinchu, Taiwan, January 4 8, 2008 28/30

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 29/30 Conclusions JD solves the spd eigenvalue very efficiently (with the right number of processors) The SA multilevel preconditioner is unbeatable (memory consumption, scalability), if it is applicable. Uncertainties with respect to success with the indefinite system does preconditioner work? near null space right? stopping criterion? how is the accuracy affected by squaring of the matrix?

RANMEP, Hsinchu, Taiwan, January 4 8, 2008 30/30 The End