Workshop on QIS and QMP, Dec 20, 2009
Plaquette Renormalized Tensor Network States: Application to Frustrated Systems
Ying-Jer Kao, Center for Quantum Science and Engineering; Hsin-Chih Hsiao, Ji-Feng Yu (NTU); Anders W. Sandvik, Ling Wang (Boston)
Frustration
Antiferromagnetic interactions on frustrated lattices: kagome (ZnCu3(OH)6Cl2), pyrochlore (Tb2Ti2O7), hyperkagome (Na4Ir3O8), garnet (Gd3Ga5O12).
Large number of degenerate classical ground states; emergence of novel spin-disordered ground states due to quantum fluctuations.
Hard to study numerically: size limitation in exact diagonalization, sign problem in QMC (also for fermions), dimensionality limit for DMRG (d=1).
Negative Sign Problem
H = Σ_{⟨ij⟩} H_ij = Σ_{⟨ij⟩} (S⁺_i S⁻_j + S⁻_i S⁺_j)
Flipping the spins around a frustrated triangle gives a configuration weight p(c) = (−1)³ ⟨α|H₁₂ H₂₃ H₃₁|α⟩ < 0.
One samples from |p(c)| and reweights: ⟨A⟩ = ⟨A s⟩_{|p|} / ⟨s⟩_{|p|}, where s = sgn p(c).
Negative matrix elements cause fluctuations in the signs: the statistical error of the ensemble average diverges at low temperature.
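As a toy illustration (not from the talk), the reweighted estimator above can be simulated directly. The model below is hypothetical: each sampled configuration carries sign −1 with probability q(β) = (1 − e^(−2β))/2, so the average sign ⟨s⟩ = e^(−2β) decays exponentially with β, and the relative error of the estimator blows up at low temperature.

```python
import numpy as np

# Toy sign-problem sketch (illustrative model, not the talk's code):
# sample signs from a hypothetical distribution whose mean decays as
# exp(-2*beta), mimicking the exponential collapse of <s>_{|p|}.
rng = np.random.default_rng(5)

def sign_average(beta, n_samples=100_000):
    # sign is -1 with probability q, so <s> = 1 - 2q = exp(-2*beta)
    q = (1 - np.exp(-2 * beta)) / 2
    s = np.where(rng.random(n_samples) < q, -1.0, 1.0)
    return s.mean()

for beta in (0.5, 2.0, 4.0):
    print(beta, sign_average(beta))
```

As β grows, ⟨s⟩ sinks below the Monte Carlo noise floor and the ratio ⟨As⟩/⟨s⟩ becomes uncontrollable, which is the sign problem in miniature.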
Simulation of Quantum Systems
Is it possible to classically simulate a quantum system faithfully?
To represent a quantum state |ψ⟩ = Σ_{i₁,i₂,…,i_n=1}^{d} c_{i₁i₂…i_n} |i₁ i₂ … i_n⟩, a classical representation requires dⁿ complex coefficients, which grows exponentially with n.
Ground states of quantum many-body systems are usually entangled.
Entanglement
Complementary viewpoints on entanglement:
Quantum information theory: a crucial resource for processing and sending information in novel ways.
Quantum many-body physics: entanglement gives rise to exotic phases of matter.
Numerical simulation of strongly correlated quantum systems: a source of difficulties!
What kinds of superpositions appear in nature? Those constrained by symmetries and local interactions, carrying little entanglement.
Entanglement: Matrix Product State
|ψ⟩ = Σ_{i₁,…,i_N} A^{[1]i₁}_{α₁α₂} A^{[2]i₂}_{α₂α₃} ⋯ A^{[N]i_N}_{α_N α_{N+1}} |i₁ i₂ … i_N⟩
Between each pair of nearest-neighbor lattice sites, introduce a D-dimensional maximally entangled state |φ⟩ = (1/√D) Σ_{α=1}^{D} |αα⟩.
At each lattice site there are two ends of these maximally entangled states. The operator A projects the D²-dimensional virtual space onto the d-dimensional physical space (AKLT ground state: D=2, d=3).
|ψ⟩ = Σ_{i₁,…,i_N=1}^{d} A₁(i₁)A₂(i₂) ⋯ A_N(i_N) |i₁, i₂, …, i_N⟩
We re-express the 2^N coefficients of |ψ⟩ in terms of about 2D²N parameters (linear in N).
Key ingredient behind the success of DMRG.
S. Östlund and S. Rommer, Phys. Rev. Lett. 75, 3537 (1995)
Affleck, Kennedy, Lieb, Tasaki, Commun. Math. Phys. 115 (1988)
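The parameter counting above can be made concrete. Below is a minimal sketch (illustrative names, random tensors, open boundaries) of an MPS for N spin-1/2 sites: the state is stored as N small tensors, roughly N·d·D² numbers in total, yet any of the d^N coefficients can be recovered as a matrix product.

```python
import numpy as np

# Minimal MPS sketch (not the talk's code): N spin-1/2 sites, bond dimension D.
N, d, D = 12, 2, 3
rng = np.random.default_rng(0)

# A[n] has shape (d, D_left, D_right); the boundary bonds have dimension 1.
A = [rng.normal(size=(d, 1 if n == 0 else D, 1 if n == N - 1 else D))
     for n in range(N)]

def mps_coefficient(spins):
    """c_{s1...sN} = A_1(s_1) A_2(s_2) ... A_N(s_N), a 1x1 matrix product."""
    M = A[0][spins[0]]
    for n in range(1, N):
        M = M @ A[n][spins[n]]
    return M[0, 0]

# 192 stored parameters here vs 2^12 = 4096 coefficients of the full state.
n_params = sum(a.size for a in A)
```

The storage is linear in N, which is exactly why DMRG-style methods scale to long chains.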
Tensor product states
Extending similar ideas to higher dimensions: tensor product states, also known as projected entangled pair states (PEPS).
|ψ⟩ = Σ_{{s}} tTr{A(s₁)A(s₂) ⋯ A(s_N)} |s₁, s₂, …, s_N⟩, with s = ±1.
A^s_{ijkl}: a rank-4 tensor per site, with virtual indices i, j, k, l = 1…D.
F. Verstraete and J. I. Cirac, arXiv:cond-mat/0407066
Variational wave function
Use the tensor product state as a trial wave function:
E = ⟨ψ|H|ψ⟩ / ⟨ψ|ψ⟩ ≥ E₀, with H = Σ_i H_i, H_i = Ô⁰_i + Ô¹_i Ô²_j + …
⟨ψ|ψ⟩ = Σ_{s₁,s₂,…} Σ_{i,j,k,l,…} A^{s₁}_{i,j,k,l} (A^{s₁}_{i,j,k,l})* ⋯ = tTr[T T T …]
Double tensor: T = Σ_s A^s ⊗ (A^s)*
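The double tensor can be sketched directly (random placeholder tensors, illustrative index names): fusing each ket virtual index with its bra partner turns the rank-4 tensor A^s with bonds of dimension D into a rank-4 tensor T with bonds of dimension D².

```python
import numpy as np

# Double-tensor sketch: T = sum_s A^s (x) (A^s)*, with ket/bra virtual
# indices fused pairwise so each bond of T has dimension D^2.
d, D = 2, 3
rng = np.random.default_rng(1)
A = rng.normal(size=(d, D, D, D, D)) + 1j * rng.normal(size=(d, D, D, D, D))

# Index order: s, then ket legs (i,j,k,l) and bra legs (m,n,o,p).
T8 = np.einsum('sijkl,smnop->imjnkolp', A, A.conj())
T = T8.reshape(D * D, D * D, D * D, D * D)   # bonds fused to dimension D^2 = 9
```

Exchanging ket and bra legs conjugates T, a hermiticity property that follows directly from the definition T = Σ_s A^s ⊗ (A^s)*.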
Variational Energy
H = Σ_i H_i, H_i = Ô⁰_i + Ô¹_i Ô²_j + …
⟨ψ|H|ψ⟩ = tTr[T⁰_i T T …] + tTr[T¹_i T²_j T T …] + …
T^a = Σ_{s,s'} A^s ⊗ (A^{s'})* ⟨s'|Ô^a|s⟩
Computationally intensive: the cost scales as D^{N_bond}. Direct computation of the contraction is impossible; approximations (RG schemes) are needed.
Tensor Contraction
Contracting the internal indices of a block, the four constituent tensors can be viewed as a single four-leg tensor. The external bond dimension becomes D² after one contraction and grows exponentially as we keep contracting: D⁴, D⁸, …
Computationally intensive; impossible to store the intermediate results. Need some RG scheme.
Tensor entanglement renormalization
Singular value decomposition into rank-3 tensors:
T_{αβμν} ≈ Σ_{γ=1}^{D_cut} S¹_{μνγ} S³_{αβγ}  and  T_{αβμν} ≈ Σ_{γ=1}^{D_cut} S²_{ανγ} S⁴_{μβγ}
Introduce a cut-off D_cut for the bond contraction:
T'_{γσλρ} = Σ_{αβμν} S¹_{βαγ} S²_{μβσ} S³_{νμλ} S⁴_{ανρ}
H. C. Jiang, Z. Y. Weng, and T. Xiang, Phys. Rev. Lett. 101, 090603 (2008)
Gu, Levin, Wen, Phys. Rev. B 78, 205116 (2008)
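The SVD split with cut-off can be sketched as follows (random placeholder tensor; the leg grouping and the absorption of √s into both factors are illustrative choices): group the four legs into two pairs, SVD the resulting matrix, and keep only the D_cut largest singular values.

```python
import numpy as np

# SVD-truncation sketch: split a rank-4 tensor T_{abmn} into two rank-3
# tensors by grouping legs (a,b) vs (m,n) and truncating to D_cut values.
D, Dcut = 4, 8
rng = np.random.default_rng(2)
T = rng.normal(size=(D, D, D, D))

M = T.reshape(D * D, D * D)
U, s, Vh = np.linalg.svd(M, full_matrices=False)
chi = min(Dcut, s.size)
S1 = (U[:, :chi] * np.sqrt(s[:chi])).reshape(D, D, chi)         # S1_{ab,gamma}
S3 = (np.sqrt(s[:chi])[:, None] * Vh[:chi]).reshape(chi, D, D)  # S3_{gamma,mn}

# The Frobenius error equals the norm of the discarded singular values.
T_approx = np.einsum('abg,gmn->abmn', S1, S3)
err = np.linalg.norm(T - T_approx)
```

Because the error is exactly the norm of the discarded singular values, the truncation is locally optimal for this particular cut, but, as noted on the next slide, it is not globally variational.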
Transverse Ising model
H = −Σ_{⟨ij⟩} σᶻ_i σᶻ_j − h Σ_i σˣ_i
With D=2, D_cut=18, non-mean-field results are obtained.
However: the method is not variational, the SVD truncation is only locally optimized, and translational invariance is explicitly broken.
Gu, Levin, Wen, Phys. Rev. B 78, 205116 (2008)
Distribution of Singular Values
[Figure: singular-value spectra for the transverse Ising model (8×8, h=3.08) and the fully frustrated XX model (8×8, h=1).]
Frustrated systems remain difficult: their singular values decay slowly.
Plaquette-Renormalization
8-index tensor (D⁸) → 4-index tensor (D_cut⁴); 8-index double tensor (D¹⁶) → 4-index double tensor (D_cut⁸).
[Figure: renormalization of an 8-index plaquette tensor built from four T⁰ tensors, using auxiliary 3-index tensors S¹; plaquette legs a, b, …, g, h = 1…D, renormalized legs i, j, k, l = 1…D_cut (D_cut² for double tensors).]
Wang, Kao, Sandvik, arXiv:0901.0214
Plaquette renormalized trial wave function
We treat the elements of the A and S tensors as variational parameters and minimize the total energy.
Optimization uses the principal axis method: derivative-free, but computationally expensive.
Expectation values
Plaquette-Renormalization of TNS
Effective reduced tensor network for an 8×8 lattice: sum over all inequivalent bonds and sites.
The method is variational; T and R are optimized globally. Size scaling: L² log L.
Wang, Kao, Sandvik, arXiv:0901.0214
Transverse Ising Model
[Figure: m_z and m_x vs h/J for D = D_cut = 2 and L = 4, 8, 16, 32, 64.]
Assume translational invariance: all initial T's are the same. Globally optimized T and R.
h_c = 3.33 (QMC: 3.04); m_z ~ (h − h_c)^β with β ≈ 0.40. Close to h_c, β ≈ 0.50, i.e. mean-field like (see Sandvik's talk).
Wang, Kao, Sandvik, arXiv:0901.0214
Transverse Ising Model Spins at different lattice sites inside the plaquette have different environments. We use different tensors inside a plaquette.
J1-J2 Heisenberg Model
H = J₁ Σ_{⟨ij⟩} S_i · S_j + J₂ Σ_{⟨⟨ij⟩⟩} S_i · S_j
H. J. Schulz, T. A. Ziman and D. Poilblanc, J. Phys. I 6, 675 (1996)
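For orientation, the Hamiltonian above can be diagonalized exactly on the smallest nontrivial cluster. The sketch below (illustrative helper names, not from the slides) builds the S=1/2 J1-J2 model on a single 2×2 plaquette, with J₁ on the edges and J₂ on the diagonals, via Kronecker products.

```python
import numpy as np

# Exact diagonalization sketch of the J1-J2 model on a 2x2 plaquette.
sx = np.array([[0, 1], [1, 0]]) / 2
sy = np.array([[0, -1j], [1j, 0]]) / 2
sz = np.array([[1, 0], [0, -1]]) / 2
I2 = np.eye(2)

def op(site_ops, n=4):
    """Kronecker product placing the given single-site operators on n sites."""
    mats = [site_ops.get(i, I2) for i in range(n)]
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def heis(i, j, n=4):
    """S_i . S_j acting on the n-site Hilbert space."""
    return sum(op({i: s, j: s}, n) for s in (sx, sy, sz))

J1, J2 = 1.0, 0.5
nn = [(0, 1), (1, 2), (2, 3), (3, 0)]   # plaquette edges (J1 bonds)
nnn = [(0, 2), (1, 3)]                  # plaquette diagonals (J2 bonds)
H = J1 * sum(heis(i, j) for i, j in nn) + J2 * sum(heis(i, j) for i, j in nnn)
E0 = np.linalg.eigvalsh(H)[0]           # ground-state energy: -1.75 at J2=0.5
```

The value −1.75 follows analytically from total-spin algebra on the plaquette, which makes this a useful sanity check before scaling up.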
Error in energy
Accuracy improves as D increases; D also sets a length scale.
Benchmark at J₂ = 0: A. W. Sandvik, Phys. Rev. B 56, 11678 (1997)
Building block: plaquette
12 internal sums; each free index and each summation contributes a factor of D, so the maximum computational effort is D⁸ (contracting the tensors one at a time costs D⁷, D⁷, D⁸, D⁷, D⁶ per step).
T^{(1),αβ}_{γδ} = Σ_{α₁,α₂,…,δ₁,δ₂,i,j,k,l} S^{(1),α}_{α₁α₂} S^{(1),β}_{β₁β₂} S^{(1),γ}_{γ₁γ₂} S^{(1),δ}_{δ₁δ₂} T^{(0),α₁l}_{γ₂i} T^{(0),α₂j}_{iδ₁} T^{(0),jβ₁}_{kδ₂} T^{(0),lβ₂}_{γ₁k}
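The plaquette contraction can be sketched as one einsum call (random placeholder tensors; the index pattern follows the slide's equation, with α₁=e, α₂=f, β₁=g, β₂=h, γ₁=m, γ₂=n, δ₁=p, δ₂=q):

```python
import numpy as np

# Plaquette-renormalization sketch: four T^(0) tensors and four S^(1)
# tensors contract to one renormalized tensor T^(1) with D_cut legs.
D, Dcut = 2, 4
rng = np.random.default_rng(3)
T0 = rng.normal(size=(D, D, D, D))   # T^(0): four bonds of dimension D
S = rng.normal(size=(Dcut, D, D))    # S^(1): one D_cut leg, two D legs

T1 = np.einsum('aef,bgh,cmn,dpq,elni,fjip,jgkq,lhmk->abcd',
               S, S, S, S, T0, T0, T0, T0, optimize=True)
```

With `optimize=True`, einsum contracts pairwise rather than forming the full 16-index sum, which is the same idea as the D⁷-D⁸ step costs quoted above.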
Computational Costs
Globally optimized T and R. The contraction calculation is highly parallelizable (run on the IBM Blue Gene/L at BU), yet even D=2 takes weeks to optimize.
Bottleneck: the plaquette internal contractions. Take advantage of the GPU (Graphics Processing Unit).
Wang, Kao, Sandvik, arXiv:0901.0214
GPU supercomputing
The GPU is specialized for compute-intensive, massively data-parallel computation.
NVIDIA Tesla 10-series processor: 1.4 billion transistors, 240 processing cores, 1 teraflop of processing power.
NTU CQSE GPU cluster: 16× Tesla S1070 = 64 teraflops.
GPU kernel strategy
Inputs: T (D⁴ elements) and S (D³ elements), unchanged during the contraction; load them into texture memory (optimized for 2D reads). Output: T' (D⁴ elements).
Thread-block geometry is up to 3D (tx, ty, tz); the grid of blocks is up to 2D (bx, by).
Reshape T into a D²×D² matrix and S into a D×D² matrix. Launch a D²×D² grid of thread blocks, each with D×D threads; each block computes one element of T', with the block indices (bx, by) representing the external indices.
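The point of the reshaping step can be shown in numpy (illustrative shapes and names, not the actual kernel): once the summed index sits on the inner dimension of both operands, the tensor contraction is a single matrix multiply, the operation GPUs execute most efficiently.

```python
import numpy as np

# Matricization sketch: one S-T contraction written two equivalent ways,
# as an einsum and as a reshaped GEMM.
D, Dcut = 4, 8
rng = np.random.default_rng(4)
S = rng.normal(size=(Dcut, D, D))    # S[a, e, f]: one D_cut leg, two D legs
T = rng.normal(size=(D, D, D, D))

# einsum form: R[a, f, l, n, i] = sum_e S[a, e, f] * T[e, l, n, i]
R_ein = np.einsum('aef,elni->aflni', S, T)

# GEMM form: S becomes a (Dcut*D) x D matrix, T a D x D^3 matrix.
R_mat = (S.transpose(0, 2, 1).reshape(Dcut * D, D)
         @ T.reshape(D, D * D * D)).reshape(Dcut, D, D, D, D)
```

Mapping each output tile of this matrix product to one thread block is what the (bx, by) external-index scheme above accomplishes on the device.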
Benchmark: one plaquette contraction
CPU code (Intel Core 2 Duo E4700, 2.6 GHz): D=6, 2.1 GFLOPS; D=8, 3.4 GFLOPS.
GPU single-precision code (unoptimized): D=6, 24.9 GFLOPS (GTX 8800) and 49.7 GFLOPS (GTX 280); D=8, 92.2 GFLOPS (GTX 280). A 10× to 30× speedup; double precision costs at least a factor of 8 in performance.
The GPU code still requires optimization: bank conflicts in shared-memory reads/writes, low streaming-multiprocessor occupancy, the maximum D bound by shared-memory size in the current scheme, and hiding of global-memory bandwidth.
Conclusion
Tensor network states are promising candidates for understanding frustrated quantum spin systems.
In the plaquette-renormalized tensor network representation, no approximations are made when contracting the effective renormalized tensor network.
Non-mean-field results are obtained even with the smallest possible nontrivial tensors and truncation (D = 2), but a larger internal bond dimension D is necessary to get the right physics.
GPUs can potentially speed up the computationally intensive part of the calculation.
Future direction: Monte Carlo sampling.