Covariance Matrix Estimation for the Cryo-EM Heterogeneity Problem
1 Covariance Matrix Estimation for the Cryo-EM Heterogeneity Problem
Amit Singer
Princeton University, Department of Mathematics and Program in Applied and Computational Mathematics
July 25, 2014
Joint work with Gene Katsevich (Princeton) and Alexander Katsevich (UCF)
Amit Singer (Princeton University) July / 41
2 Single Particle Cryo-Electron Microscopy
[Figure: drawing of the imaging process.]
3 The X-ray Transform
(P_s φ)(x, y) = ∫_ℝ φ(R_s^T r) dz, where r = (x, y, z)^T, R_s ∈ SO(3), and φ ∈ L²(B) is the electric potential of the molecule.
[Figure: the electron source projects the molecule φ onto the image I_s; the rows R_s^1, R_s^2, R_s^3 of R_s ∈ SO(3) determine the viewing direction.]
4 The cryo-EM problem
If S : L²(D) → ℝ^q is discretization onto an N × N grid (q ≈ (π/4)N²), then the microscope produces
I_s = S P_s φ + ε_s, s = 1, ..., n,
where ε_s is noise.
Nyquist: one can reconstruct only up to a bandlimit W. Natural search space for φ: the space B of bandlimited functions. Suppose dim(B) = p.
Cryo-EM problem: given the I_s, estimate the R_s and find an approximation of φ in B.
5 The heterogeneity problem
What if the molecule has more than one possible structure?
(Image source: H. Liao and J. Frank, Classification by bootstrapping in single particle methods, Proceedings of the 2010 IEEE International Conference on Biomedical Imaging, 2010.)
6 The heterogeneity problem (discrete conformational space)
Problem (1): φ : Ω ⊂ ℝ³ → ℝ is random, with P[φ = φ_c] = p_c, c = 1, ..., C. φ_1, ..., φ_n are i.i.d. samples from φ. Let I_s = S P_s φ_s + ε_s, where P_1, ..., P_n are obtained from i.i.d. rotations R_s (independent of the φ_s) drawn from a distribution over SO(3), and ε_s ~ N(0, σ² I_q), independent of φ_s, P_s.
Given the R_s and I_s, find the number of classes C, approximations of the molecular structures φ_c in B, and their probabilities p_c.
7 Previous work
- Common lines (e.g., Shatsky et al., 2010)
- Maximum likelihood (e.g., Wang et al., 2013)
- Bootstrapping and PCA (e.g., Penczek et al., 2011)
8 The heterogeneity problem: finite-dimensional formulation
For the purposes of this talk, suppose that φ_c ∈ B. Note that P_s := S P_s |_B is a linear operator P_s : ℝ^p → ℝ^q. If X_s is the basis representation of φ_s, then I_s = P_s X_s + ε_s.
X_1, ..., X_n are i.i.d. samples of a random vector X in ℝ^p. When X has C classes, Σ = Var[X] has rank C − 1: after centering, the class means span at most a (C − 1)-dimensional subspace.
Penczek (2011) proposed a (suboptimal) method that uses the top C − 1 eigenvectors of Σ to solve the heterogeneity problem. How do we estimate Σ?
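The rank claim is easy to verify numerically: samples from a C-point distribution lie in the affine span of the C class means, so the covariance has rank at most C − 1. A toy check (the dimensions and centers below are arbitrary, not from the talk):

```python
import numpy as np

# Samples from a C-point distribution lie in the affine span of the C class
# means, so the covariance matrix has rank at most C - 1.
rng = np.random.default_rng(0)
p, C, n = 12, 3, 100_000
centers = rng.normal(size=(C, p))            # toy conformations in basis coordinates
X = centers[rng.integers(0, C, size=n)]      # i.i.d. samples, equal class weights

Sigma = np.cov(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
# the top C - 1 = 2 eigenvalues are order one; the remaining ones vanish
```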
9 Covariance Estimation from Projected Data
Consider the following general statistical estimation problem:
Problem (2): X is a random vector on ℂ^p, with E[X] = µ and Var(X) = Σ. X_1, ..., X_n are i.i.d. samples from X. Let I_s = P_s X_s + ε_s, s = 1, ..., n, where P_1, ..., P_n : ℂ^p → ℂ^q are i.i.d. samples (independent of the X_s) from a distribution over ℂ^{q×p} (q ≤ p), and the ε_s are i.i.d. noises such that E[ε_s] = 0, Var[ε_s] = σ² I_q, independent of X_s, P_s.
Given I_s, P_s, and σ², estimate µ and Σ.
10 PCA from projected data: well-known special cases
I_s = P_s X_s + ε_s, s = 1, ..., n.
If p = q and P_s = I_p, we have the usual covariance estimation problem. When the P_s are coordinate-selection operators, we have a missing-data statistics problem (Little and Rubin, 1987). This is also a special case of the matrix sensing problem (Recht, Fazel, Parrilo, 2010).
Here we do not attempt to impute the missing entries or estimate the X_s assuming Σ has low rank. Instead we estimate Σ and its rank directly.
11 Obtaining an estimator
I_s = P_s X_s + ε_s. Note that E[I_s] = P_s µ and Var[I_s] = P_s Σ P_s^* + σ² I.
We construct estimators µ_n and Σ_n as solutions of least-squares problems:
µ_n = argmin_µ ∑_{s=1}^n ||I_s − P_s µ||²,
Σ_n = argmin_Σ ∑_{s=1}^n ||(I_s − P_s µ_n)(I_s − P_s µ_n)^* − (P_s Σ P_s^* + σ² I)||_F².
Σ_n is not constrained to be positive semidefinite: we are typically interested only in the few leading eigenvectors of Σ_n.
12 Resulting linear equations
These lead to linear equations for µ_n and Σ_n:
A_n µ_n := (1/n ∑_{s=1}^n P_s^* P_s) µ_n = 1/n ∑_{s=1}^n P_s^* I_s,
L_n Σ_n := 1/n ∑_{s=1}^n P_s^* P_s Σ_n P_s^* P_s = 1/n ∑_{s=1}^n P_s^* (I_s − P_s µ_n)(I_s − P_s µ_n)^* P_s − σ²/n ∑_{s=1}^n P_s^* P_s.
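In the coordinate-selection special case, A_n is diagonal and L_n acts entrywise (each entry of Σ is scaled by the number of samples observing that pair of coordinates), so the normal equations reduce to entrywise divisions. A minimal simulation sketch, with all sizes and the two-class setup made up for illustration:

```python
import numpy as np

# Covariance estimation from projected data, specialized to random
# coordinate-selection operators P_s (the missing-data setting). For masks,
# A_n is diagonal and L_n acts entrywise, giving available-case estimators.
rng = np.random.default_rng(0)
p, q, n, sigma = 8, 4, 20_000, 0.1
centers = 2 * rng.normal(size=(2, p))             # two toy classes -> rank-1 Sigma
X = centers[rng.integers(0, 2, size=n)]
obs = [rng.choice(p, size=q, replace=False) for _ in range(n)]
I = [X[s, idx] + sigma * rng.normal(size=q) for s, idx in enumerate(obs)]

# Mean: A_n mu_n = (1/n) sum_s P_s^* I_s, with A_n diagonal (coordinate counts).
counts = np.zeros(p)
a = np.zeros(p)
for idx, I_s in zip(obs, I):
    counts[idx] += 1
    a[idx] += I_s
mu_n = a / counts

# Covariance: (L_n Sigma)[i, j] = M[i, j] * Sigma[i, j], where M[i, j] counts
# the samples observing both coordinates i and j.
M = np.zeros((p, p))
B = np.zeros((p, p))
for idx, I_s in zip(obs, I):
    r = I_s - mu_n[idx]
    B[np.ix_(idx, idx)] += np.outer(r, r)
    B[idx, idx] -= sigma**2                       # noise bias correction
    M[np.ix_(idx, idx)] += 1
Sigma_n = B / M

eigvals = np.sort(np.linalg.eigvalsh(Sigma_n))[::-1]
# the top eigenvalue dominates (the true rank is 1); the rest is sampling noise
```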
13 Connection to missing-data statistics
We recover the available-case estimators of the mean and covariance, which are known to be consistent under the missing completely at random (MCAR) assumption.
14 PCA in high dimensions
For the sample covariance matrix Σ_n = 1/n ∑_{i=1}^n x_i x_i^T with x_i ~ N(0, I_p), the asymptotic regime p, n → ∞ with p/n → γ ("high-dimensional statistics") is by now well understood (Johnstone, ICM 2006): Marchenko-Pastur distribution, Tracy-Widom distribution, spiked model (Σ = I_p + σ² vv^T), phase transition.
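The P_s = I_p special case is easy to simulate directly; a quick sketch (sizes arbitrary) compares the top eigenvalue of a white Wishart matrix with the Marchenko-Pastur bulk edge (1 + √γ)²:

```python
import numpy as np

# Largest eigenvalue of a white Wishart matrix vs. the Marchenko-Pastur
# bulk edge (1 + sqrt(gamma))^2, gamma = p/n. Fluctuations around the edge
# are Tracy-Widom, of order n^(-2/3).
rng = np.random.default_rng(0)
p, n = 400, 1600
gamma = p / n
X = rng.normal(size=(n, p))
S = X.T @ X / n
lam_max = np.linalg.eigvalsh(S)[-1]
edge = (1 + np.sqrt(gamma)) ** 2      # = 2.25 for gamma = 1/4
```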
15 Connection to high-dimensional PCA
1/n ∑_{s=1}^n P_s^* P_s Σ_n P_s^* P_s = 1/n ∑_{s=1}^n P_s^* (I_s − P_s µ_n)(I_s − P_s µ_n)^* P_s − σ²/n ∑_{s=1}^n P_s^* P_s.
When P_s = I, we recover the sample covariance matrix. How does the spectrum of Σ_n behave as a function of n, p, q and the geometry of the P_s? Empirical spectral density? Distribution of the largest eigenvalue? Spiked model?
16 Theoretical results about Σ_n: preliminary study
Proposition [Asymptotic consistency, fixed p]. Suppose A_n → A and L_n → L, where A, L are invertible. Suppose P_s is uniformly bounded and that X has bounded fourth moments. Then
E[Σ_n] − Σ = O(1/n); Var[Σ_n] = O(1/n).
Proposition [Finite-sample bounds]. Suppose ||X|| ≤ C₁, ||ε_s|| ≤ C₂, ||P_s|| ≤ C₃ a.s. Assume that X is centered. If λ = λ_min(L_n) and B = C₃² C₁ + C₃ C₂, then
P{ ||Σ − Σ_n|| ≥ t } ≤ p exp( −(3nλ²t²/2) / (B⁴ + 2B²λt) ), t ≥ 0.
17 Regularization
Extreme example: q = 2, the P_s are random coordinate-selection operators. Coupon collector: n ≈ p² log p samples are needed for L_n to be invertible.
Regularization:
min_Σ ∑_{s=1}^n ||(I_s − P_s µ_n)(I_s − P_s µ_n)^* − (P_s Σ P_s^* + σ² I)||_F² + λ ||Σ||_F².
This sets unattainable entries of Σ_n to 0. Other regularization strategies are possible, e.g., penalizing WΣW^* to promote sparsity in some basis W, but they come at the price of increased computational complexity.
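In the coordinate-selection special case, L_n multiplies Σ entrywise by the pair-observation frequencies, so the Frobenius-regularized problem has an entrywise closed form: Σ_n[i, j] = B[i, j] / (M[i, j] + λ). A tiny hand-built illustration (the matrices M and B below are made up, not from the talk):

```python
import numpy as np

# For coordinate-selection P_s, (L_n Sigma)[i, j] = M[i, j] * Sigma[i, j],
# where M[i, j] is the fraction of samples observing both coordinates i and j.
# Adding lambda * ||Sigma||_F^2 gives Sigma_reg[i, j] = B[i, j] / (M[i, j] + lam),
# so never-observed pairs (M = 0) are set to 0 instead of being undefined.
lam = 0.1
M = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.0]])   # pair-observation frequencies (toy)
B = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 0.4],
              [0.0, 0.4, 1.5]])   # right-hand side of the normal equations (toy)
Sigma_reg = B / (M + lam)
```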
18 Back to heterogeneity: Fourier slice theorem
If P̂_s is the Fourier-domain version of P_s, then P̂_s simply restricts Fourier volumes to central planes perpendicular to the viewing direction.
19 The slice theorem and covariance matrix estimation
The Fourier slice theorem suggests working in the Fourier domain: φ̂_s = Fφ_s, Î_s = FI_s. Estimate µ̂, Σ̂ and then inverse transform to get µ, Σ.
P̂_s is a restriction to the central plane corresponding to R_s. In 3D, for any pair of frequencies we can find a central plane that passes through both, so the P̂_s act as coordinate-selection operators in the continuous limit.
By contrast, the covariance of 2D objects cannot be estimated from their 1D line projections: a generic pair of 2D frequencies does not lie on a common line through the origin.
20 The ramp filter
Â_n µ̂_n = (1/n ∑_{s=1}^n P̂_s^* P̂_s) µ̂_n.
In the continuous limit: replace P̂_s^* P̂_s with its average, an integral over SO(3) with respect to the Haar measure. Fourier slice theorem: (P̂_s^* P̂_s φ̂)(ρ) = φ̂(ρ) δ(ρ · R_s³), where R_s³ is the viewing direction.
(Â µ̂)(ρ) = µ̂(ρ) · (1/4π) ∫_{S²} δ(ρ · θ) dθ = µ̂(ρ) · (1/4π) ∫_{S²} δ(|ρ| (ρ/|ρ|) · θ) dθ = µ̂(ρ) / (2|ρ|).
Inverting Â therefore multiplies by 2|ρ|: the classical ramp filter.
21 The triangular area filter
L̂_n Σ̂_n := 1/n ∑_{s=1}^n P̂_s^* P̂_s Σ̂_n P̂_s^* P̂_s.
Fourier slice theorem: (P̂_s^* P̂_s φ̂)(ρ) = φ̂(ρ) δ(ρ · R_s³), so
(P̂_s^* P̂_s Σ̂ P̂_s^* P̂_s)(ρ₁, ρ₂) = Σ̂(ρ₁, ρ₂) δ(ρ₁ · R_s³) δ(ρ₂ · R_s³).
Integrating over R_s³: (L̂ Σ̂)(ρ₁, ρ₂) = Σ̂(ρ₁, ρ₂) K(ρ₁, ρ₂). L̂ is a diagonal operator, with
K(ρ₁, ρ₂) = (1/4π) ∫_{S²} δ(ρ₁ · θ) δ(ρ₂ · θ) dθ = (1/4π) · 2/|ρ₁ × ρ₂|.
K is the density of planes that go through the frequencies ρ₁, ρ₂. The filter 1/K is proportional to the area of the triangle with vertices at ρ₁, ρ₂, and the origin.
22 The triangular area filter
(L̂ Σ̂)(ρ₁, ρ₂) = Σ̂(ρ₁, ρ₂) K(ρ₁, ρ₂), with K(ρ₁, ρ₂) = (1/4π) ∫_{S²} δ(ρ₁ · θ) δ(ρ₂ · θ) dθ = (1/4π) · 2/|ρ₁ × ρ₂|.
[Figure: the triangular area filter 1/K, proportional to the area of the triangle spanned by ρ₁, ρ₂, and the origin.]
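Both kernels can be sanity-checked by Monte Carlo, replacing each delta function by an indicator of small width; a toy check (the sample size, smoothing width, and test frequencies are arbitrary):

```python
import numpy as np

# Monte Carlo check of the two back-projection kernels, smoothing each delta
# into an indicator of half-width eps:
#   ramp:     (1/4pi) int_{S^2} delta(rho . theta) dtheta            = 1/(2|rho|)
#   triangle: (1/4pi) int_{S^2} delta(rho1 . theta) delta(rho2 . theta) dtheta
#             = (1/4pi) * 2/|rho1 x rho2|
rng = np.random.default_rng(1)
n, eps = 2_000_000, 0.05
theta = rng.normal(size=(n, 3))
theta /= np.linalg.norm(theta, axis=1, keepdims=True)   # uniform on S^2

rho = np.array([2.0, 0.0, 0.0])
ramp = np.mean(np.abs(theta @ rho) < eps) / (2 * eps)   # approx 1/(2*2) = 0.25

rho1 = np.array([1.0, 0.0, 0.0])
rho2 = np.array([0.0, 1.0, 0.0])                        # |rho1 x rho2| = 1
hits = (np.abs(theta @ rho1) < eps) & (np.abs(theta @ rho2) < eps)
tri = np.mean(hits) / (4 * eps**2)                      # approx 1/(2*pi)
```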
23 Challenge: discretization
Now that we are in the Fourier domain, how do we discretize? If we use a Cartesian grid with nearest-neighbor interpolation, we incur a large discretization error. If the volumes are N × N × N voxel arrays, then Σ is of size N³ × N³. A low-pass-filter approximation might be unavoidable.
We need to respect rotations and restrictions to planes (i.e., spherical coordinates), but then L̂_n is no longer diagonal and we can easily run into computational problems when inverting L̂_n. We need a representation of volumes in which L̂_n is sparse.
24 Finding the right basis
Idea: find image and volume domains Î and V̂ (in Fourier space) so that P̂_s(V̂) ⊂ Î.
Construction: Î = span({ (1/√(2π)) f_k(r) e^{imϕ} }); V̂ = span({ f_k(r) Y_l^m(θ, ϕ) }).
Note: the resulting matrix P̂_s is block diagonal. This is a somewhat unusual construction, since typically the radial functions depend on the angular frequency l (e.g., spherical Bessel, 3D Slepian). We have to be very careful with our construction so that our basis approximately spans B.
25 Fourier linear equations
The linear equations in the Fourier domain are essentially the same:
Â_n µ̂_n := (∑_{s=1}^n P̂_s^* P̂_s) µ̂_n = â;  L̂_n Σ̂_n := ∑_{s=1}^n P̂_s^* P̂_s Σ̂_n P̂_s^* P̂_s = B̂.
Note: L̂_n is block diagonal (sparsity!), acting on the blocks Σ̂^{k₁,k₂} separately. Denote each block of L̂_n by L̂_n^{k₁,k₂}.
26 Challenge: size of L̂_n
We still must deal with the enormous size of L̂_n, where L̂_n Σ̂_n = ∑_{s=1}^n P̂_s^* P̂_s Σ̂_n P̂_s^* P̂_s.
27 Sum to integral
If the rotations are uniformly distributed, then almost surely
L̂_n Σ̂_n = 1/n ∑_{s=1}^n P̂_s^* P̂_s Σ̂_n P̂_s^* P̂_s → (1/4π) ∫_{S²} P̂(θ)^* P̂(θ) Σ̂_n P̂(θ)^* P̂(θ) dθ = L̂ Σ̂_n.
28 What is L̂?
Let us view Σ̂^{k₁,k₂} : S² × S² → ℂ as a bandlimited bilinear form. It turns out that
L̂^{k₁,k₂} Σ̂^{k₁,k₂} = LPF_{k₁,k₂}(Σ̂^{k₁,k₂} · K), where K(α, β) = (1/4π) · 2/|α × β|.
Thus L̂ is no longer diagonal... but L̂ turns out to be sparse! This can be shown using Clebsch-Gordan coefficients for products of spherical harmonics.
29 Sparsity of L̂
Only about 6% of the entries of L̂^{15,15} are nonzero.
[Figure: sparsity structure of a block of L̂.]
30 Condition number and computational complexity
1. L̂ is self-adjoint and positive semidefinite.
2. λ_min(L̂) ≥ 2 (implies consistency).
3. Conjecture: λ_max(L̂^{k₁,k₂}) grows linearly with min(k₁, k₂).
4. Conjugate gradient requires O(√κ(L̂^{k₁,k₂})) iterations, where the cost of each iteration is nnz(L̂^{k₁,k₂}).
5. Computational complexity (here N_res = (2/π) w_max): O(n N² N_res² + N_res^{9.5}), but all steps can be run in parallel (e.g., L̂ is block diagonal).
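Point 4 is the standard conjugate-gradient bound; a self-contained sketch (the matrix size and condition number below are made up) shows the iteration count staying well below the dimension for a moderately conditioned SPD system:

```python
import numpy as np

def cg(A, b, tol=1e-8, max_iter=10_000):
    """Plain conjugate gradient for SPD A; returns solution and iteration count."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            return x, k
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iter

# SPD test matrix with a prescribed condition number kappa
rng = np.random.default_rng(0)
m, kappa = 200, 100.0
Q, _ = np.linalg.qr(rng.normal(size=(m, m)))
A = (Q * np.linspace(1.0, kappa, m)) @ Q.T
b = rng.normal(size=m)
x, iters = cg(A, b)
# iterations scale like sqrt(kappa), well below the dimension m
```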
31 Numerical results: two classes
Synthetic dataset: simulate heterogeneity using two toy volumes.
[Figure: slices of the toy volumes.]
Note: the ideal covariance matrix has rank 1.
32 Numerical results: two classes
Noisy projection images: [Figure: (a) clean, (b) SNR = 0.05, (c) SNR = 0.01.]
33 Numerical results: two classes
Reconstructions from noisy data.
[Figure: reconstructions of the mean and top eigenvector for three different SNR values ((a) clean, (b) SNR = 0.01, (c), (d) further SNR values). The top row contains clean and reconstructed slices of the mean; the bottom row contains slices of the top eigenvector.]
34 Numerical results: two classes
[Figure: correlations of true and computed top eigenvectors for various SNRs.]
35 Numerical results: two classes
[Figure: corresponding eigenvalue histograms.]
36 Numerical results: three classes
The ideal Σ has rank 2.
[Figure: eigenvalue histograms of the reconstructed covariance matrix in the three-class case for three SNR values. Note that the noise distribution first engulfs the second eigenvalue, and eventually the top eigenvalue as well.]
37 Numerical results: three classes (clustering)
Least squares and a Gaussian mixture model: each volume is approximated as X̂_s ≈ µ̂_n + ∑_{c=1}^{C−1} α_{s,c} V̂_c, where the V̂_c are the top eigenvectors and the coordinates are fitted per image by least squares,
α_s = argmin_α || Î_s − P̂_s µ̂_n − ∑_{c=1}^{C−1} α_c P̂_s V̂_c ||²,
followed by a Gaussian mixture model on the α_s.
[Figure: the coordinates α_s for the three-class case, colored according to true class.]
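A minimal sketch of this clustering step in the coordinate-selection special case, with C = 2 so that a single coordinate α_s is fitted per image and the classes separate by its sign (all sizes, the assumed-known µ and v, and the sign threshold are made up for illustration, not taken from the talk's experiments):

```python
import numpy as np

# Per-image least squares for the class coordinate: with one eigenvector v,
#   alpha_s = <P_s v, I_s - P_s mu> / ||P_s v||^2,
# then cluster the alpha_s (here simply by sign, since the two classes sit
# symmetrically at alpha = -1 and alpha = +1).
rng = np.random.default_rng(2)
p, q, n, sigma = 30, 10, 2000, 0.2
v = rng.normal(size=p)
v /= np.linalg.norm(v)                       # toy top eigenvector (known here)
mu = rng.normal(size=p)                      # toy mean volume (known here)
labels = rng.integers(0, 2, size=n)          # true class of each image
alpha_true = np.where(labels == 0, -1.0, 1.0)

alpha_hat = np.empty(n)
for s in range(n):
    idx = rng.choice(p, size=q, replace=False)        # coordinate-selection P_s
    I_s = mu[idx] + alpha_true[s] * v[idx] + sigma * rng.normal(size=q)
    alpha_hat[s] = v[idx] @ (I_s - mu[idx]) / (v[idx] @ v[idx])

pred = (alpha_hat > 0).astype(int)
accuracy = np.mean(pred == labels)
```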
38 Numerical results: Continuous Variability
39 Summary
- It is possible to do PCA from projected data and estimate the rank without imputation.
- Open problems remain for the distribution of the eigenvalues of Σ_n in the large-n, large-p regime (one also needs to consider q and the distribution of the P_s).
- We generalized the ramp filter of classical computerized tomography to the triangular area filter for tomographic inversion of structural variability.
- A careful selection of bases and the discovery of a sparse structure allowed us to tractably solve the heterogeneity problem.
- We began the study of the projection covariance transform L, an operator of interest in tomographic problems involving variability.
40 Future research directions
- Random matrix theory investigation for high-dimensional PCA of projected data
- Compare the approach with matrix sensing/completion methods
- Study L̂ further
- Include the contrast transfer function (CTF) in the forward model
- Add and examine regularization schemes
- Relax the uniform-rotations assumption
- Test on real datasets
- Other applications (e.g., 4D tomography)
41 Thank You!
Funding: NIH/NIGMS R01GM, AFOSR FA, Simons Foundation LTR DTD, DARPA N.
G. Katsevich, A. Katsevich, A. Singer (submitted). Covariance Matrix Estimation for the Cryo-EM Heterogeneity Problem.
More informationNetwork Topology Inference from Non-stationary Graph Signals
Network Topology Inference from Non-stationary Graph Signals Rasoul Shafipour Dept. of Electrical and Computer Engineering University of Rochester rshafipo@ece.rochester.edu http://www.ece.rochester.edu/~rshafipo/
More informationLearning Multiple Tasks with a Sparse Matrix-Normal Penalty
Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Yi Zhang and Jeff Schneider NIPS 2010 Presented by Esther Salazar Duke University March 25, 2011 E. Salazar (Reading group) March 25, 2011 1
More informationTractable Upper Bounds on the Restricted Isometry Constant
Tractable Upper Bounds on the Restricted Isometry Constant Alex d Aspremont, Francis Bach, Laurent El Ghaoui Princeton University, École Normale Supérieure, U.C. Berkeley. Support from NSF, DHS and Google.
More informationData Preprocessing Tasks
Data Tasks 1 2 3 Data Reduction 4 We re here. 1 Dimensionality Reduction Dimensionality reduction is a commonly used approach for generating fewer features. Typically used because too many features can
More informationDecompositions of frames and a new frame identity
Decompositions of frames and a new frame identity Radu Balan a, Peter G. Casazza b, Dan Edidin c and Gitta Kutyniok d a Siemens Corporate Research, 755 College Road East, Princeton, NJ 08540, USA; b Department
More informationGeometric Modeling Summer Semester 2010 Mathematical Tools (1)
Geometric Modeling Summer Semester 2010 Mathematical Tools (1) Recap: Linear Algebra Today... Topics: Mathematical Background Linear algebra Analysis & differential geometry Numerical techniques Geometric
More informationMethods for sparse analysis of high-dimensional data, II
Methods for sparse analysis of high-dimensional data, II Rachel Ward May 26, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 55 High dimensional
More informationDictionary Learning Using Tensor Methods
Dictionary Learning Using Tensor Methods Anima Anandkumar U.C. Irvine Joint work with Rong Ge, Majid Janzamin and Furong Huang. Feature learning as cornerstone of ML ML Practice Feature learning as cornerstone
More informationPerformance Limits on the Classification of Kronecker-structured Models
Performance Limits on the Classification of Kronecker-structured Models Ishan Jindal and Matthew Nokleby Electrical and Computer Engineering Wayne State University Motivation: Subspace models (Union of)
More informationCourse topics (tentative) The role of random effects
Course topics (tentative) random effects linear mixed models analysis of variance frequentist likelihood-based inference (MLE and REML) prediction Bayesian inference The role of random effects Rasmus Waagepetersen
More informationCross-Validation with Confidence
Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University WHOA-PSI Workshop, St Louis, 2017 Quotes from Day 1 and Day 2 Good model or pure model? Occam s razor We really
More informationSparse PCA with applications in finance
Sparse PCA with applications in finance A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon 1 Introduction
More informationPerhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.
Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage
More informationInformation-theoretically Optimal Sparse PCA
Information-theoretically Optimal Sparse PCA Yash Deshpande Department of Electrical Engineering Stanford, CA. Andrea Montanari Departments of Electrical Engineering and Statistics Stanford, CA. Abstract
More informationMethods for sparse analysis of high-dimensional data, II
Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional
More informationPrincipal Component Analysis
Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used
More informationSingular Value Decomposition and Principal Component Analysis (PCA) I
Singular Value Decomposition and Principal Component Analysis (PCA) I Prof Ned Wingreen MOL 40/50 Microarray review Data per array: 0000 genes, I (green) i,i (red) i 000 000+ data points! The expression
More informationHigh dimensional Ising model selection
High dimensional Ising model selection Pradeep Ravikumar UT Austin (based on work with John Lafferty, Martin Wainwright) Sparse Ising model US Senate 109th Congress Banerjee et al, 2008 Estimate a sparse
More informationLINEAR ALGEBRA 1, 2012-I PARTIAL EXAM 3 SOLUTIONS TO PRACTICE PROBLEMS
LINEAR ALGEBRA, -I PARTIAL EXAM SOLUTIONS TO PRACTICE PROBLEMS Problem (a) For each of the two matrices below, (i) determine whether it is diagonalizable, (ii) determine whether it is orthogonally diagonalizable,
More information8.1 Concentration inequality for Gaussian random matrix (cont d)
MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration
More informationComputational and Statistical Tradeoffs via Convex Relaxation
Computational and Statistical Tradeoffs via Convex Relaxation Venkat Chandrasekaran Caltech Joint work with Michael Jordan Time-constrained Inference o Require decision after a fixed (usually small) amount
More informationNIH Public Access Author Manuscript Ann Math. Author manuscript; available in PMC 2012 November 23.
NIH Public Access Author Manuscript Published in final edited form as: Ann Math. 2011 September 1; 174(2): 1219 1241. doi:10.4007/annals.2011.174.2.11. Representation theoretic patterns in three dimensional
More informationConfidence Intervals for Low-dimensional Parameters with High-dimensional Data
Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology
More informationRapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization
Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization Shuyang Ling Department of Mathematics, UC Davis Oct.18th, 2016 Shuyang Ling (UC Davis) 16w5136, Oaxaca, Mexico Oct.18th, 2016
More informationPrincipal Component Analysis (PCA) for Sparse High-Dimensional Data
AB Principal Component Analysis (PCA) for Sparse High-Dimensional Data Tapani Raiko, Alexander Ilin, and Juha Karhunen Helsinki University of Technology, Finland Adaptive Informatics Research Center Principal
More informationRandom Matrix Theory Lecture 1 Introduction, Ensembles and Basic Laws. Symeon Chatzinotas February 11, 2013 Luxembourg
Random Matrix Theory Lecture 1 Introduction, Ensembles and Basic Laws Symeon Chatzinotas February 11, 2013 Luxembourg Outline 1. Random Matrix Theory 1. Definition 2. Applications 3. Asymptotics 2. Ensembles
More informationPrincipal Components Theory Notes
Principal Components Theory Notes Charles J. Geyer August 29, 2007 1 Introduction These are class notes for Stat 5601 (nonparametrics) taught at the University of Minnesota, Spring 2006. This not a theory
More informationMIT Final Exam Solutions, Spring 2017
MIT 8.6 Final Exam Solutions, Spring 7 Problem : For some real matrix A, the following vectors form a basis for its column space and null space: C(A) = span,, N(A) = span,,. (a) What is the size m n of
More informationSketching for Large-Scale Learning of Mixture Models
Sketching for Large-Scale Learning of Mixture Models Nicolas Keriven Université Rennes 1, Inria Rennes Bretagne-atlantique Adv. Rémi Gribonval Outline Introduction Practical Approach Results Theoretical
More informationLarge sample covariance matrices and the T 2 statistic
Large sample covariance matrices and the T 2 statistic EURANDOM, the Netherlands Joint work with W. Zhou Outline 1 2 Basic setting Let {X ij }, i, j =, be i.i.d. r.v. Write n s j = (X 1j,, X pj ) T and
More informationLecture Notes 5: Multiresolution Analysis
Optimization-based data analysis Fall 2017 Lecture Notes 5: Multiresolution Analysis 1 Frames A frame is a generalization of an orthonormal basis. The inner products between the vectors in a frame and
More information