Global Positioning from Local Distances


Global Positioning from Local Distances
Amit Singer
Yale University, Department of Mathematics, Program in Applied Mathematics
SIAM Annual Meeting, San Diego, July 2008

Joint work with:
Ronald Coifman (Yale University, Applied Mathematics)
Yoel Shkolnisky (Yale University, Applied Mathematics)
Fred Sigworth (Yale School of Medicine, Cellular & Molecular Physiology)
Yuval Kluger (NYU, Department of Cell Biology)
David Cowburn (New York Structural Biology Center)
Yosi Keller (Bar-Ilan University, Electrical Engineering)

Three Dimensional Puzzle

Cryo-Electron Microscopy: Projection Images
The microscope records projection images: line integrals of the molecule along the viewing direction $g \in SO(3)$,
    $P_g(x, y) = \int \phi_g(x, y, z)\, dz,$
where $\phi(r)$ is the electric potential of the molecule and $\phi_g(r) = \phi(g^{-1} r)$.
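As a minimal numerical illustration (not the speaker's code, and assuming the viewing direction is the z-axis so that no rotation of the volume is needed), the projection of a discretized potential is just a Riemann sum over the z index:

```python
import numpy as np

# Toy "molecule": a discretized electric potential phi on an n x n x n grid
# (an off-center Gaussian blob; purely illustrative).
n = 64
grid = np.linspace(-1.0, 1.0, n)
x, y, z = np.meshgrid(grid, grid, grid, indexing="ij")
phi = np.exp(-((x - 0.2) ** 2 + y ** 2 + (z + 0.3) ** 2) / 0.05)

# P_g(x, y) = integral of phi_g(x, y, z) dz, approximated by a Riemann sum.
# Here g is the identity (viewing along z); a general g in SO(3) would
# rotate the volume before summing.
dz = grid[1] - grid[0]
P = phi.sum(axis=2) * dz          # the 2D projection image, shape (n, n)
```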

Projection Images: Toy Example

Fourier projection-slice theorem
The 2D Fourier transform $\hat{P}$ of a projection $P$ is a central planar slice of the molecule's 3D Fourier transform. For any two projections $P_1$ and $P_2$, the two planes intersect along a line, so $\hat{P}_1$ and $\hat{P}_2$ agree on a pair of radial lines: $\Lambda_{k_2, l_2} = \Lambda_{k_1, l_1}$.
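A quick NumPy sanity check of the projection-slice theorem (a sketch on a random test volume): the 2D FFT of the z-projection coincides with the $k_z = 0$ central slice of the volume's 3D FFT.

```python
import numpy as np

rng = np.random.default_rng(0)
vol = rng.standard_normal((32, 32, 32))   # any test volume will do

# 2D Fourier transform of the z-projection ...
proj_hat = np.fft.fft2(vol.sum(axis=2))

# ... equals the k_z = 0 central slice of the 3D Fourier transform.
slice_hat = np.fft.fftn(vol)[:, :, 0]

assert np.allclose(proj_hat, slice_hat)
```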

The geometry of the slice theorem
Every image corresponds to a great circle on $S^2$.
Any pair of images has a common line; equivalently, any pair of great circles meets at two antipodal points.

Three Dimensional Puzzle
The radial lines are the puzzle pieces.
Every image is a circular chain of pieces.
Common line: meeting point.

Spiders: It's the Network
K projection images, L radial lines per image.
We build a weighted directed graph G = (V, E, W). The vertices are the radial lines ($|V| = KL$):
    $V = \{(k, l) : 1 \le k \le K,\ 0 \le l \le L - 1\}$
    $E = \{((k_1, l_1), (k_2, l_2)) : (k_1, l_1) \text{ points to } (k_2, l_2)\}$
    $W_{(k_1,l_1),(k_2,l_2)} = 1$ if $((k_1,l_1),(k_2,l_2)) \in E$, and $0$ otherwise.
W is a sparse weight matrix of size $KL \times KL$.

Spider: first pair of legs
The blue vertex $(k_1, l_1)$ is the head of the spider.
Link $(k_1, l_1)$ with $(k_1, l_1 + l)$ for $-d \le l \le d$ (radial lines of the same image).
Weights: $W_{(k_1,l_1),(k_1,l_1+l)} = 1$.

Spider: remaining legs
$(k_1, l_1)$ and $(k_2, l_2)$ are common radial lines of different images.
Links: $((k_1, l_1), (k_2, l_2 + l)) \in E$ for $-d \le l \le d$.
Weights: $W_{(k_1,l_1),(k_2,l_2+l)} = 1$.
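A simplified sketch of how the sparse weight matrix might be assembled from the two rules above, assuming the common lines have already been detected and stored in a hypothetical dictionary common_line[(k1, k2)] = (c1, c2) giving the index of the common radial line in each of the two images (variable names and the interface are illustrative, not the speaker's code):

```python
import numpy as np
from scipy.sparse import lil_matrix, csr_matrix

def build_spider_graph(K, L, d, common_line):
    """Sparse KL x KL 0/1 weight matrix W of the directed spider graph.

    common_line[(k1, k2)] = (c1, c2): radial line c1 of image k1 and radial
    line c2 of image k2 form the common line of the pair (hypothetical
    output of a common-line detection step).
    """
    idx = lambda k, l: k * L + (l % L)        # vertex (k, l) -> matrix index
    W = lil_matrix((K * L, K * L))

    # First pair of legs: (k, l) -> (k, l + s) for -d <= s <= d (same image).
    for k in range(K):
        for l in range(L):
            for s in range(-d, d + 1):
                if s != 0:
                    W[idx(k, l), idx(k, l + s)] = 1

    # Remaining legs: if (k1, c1) and (k2, c2) are the common radial lines,
    # link (k1, c1) -> (k2, c2 + s) and (k2, c2) -> (k1, c1 + s).
    for (k1, k2), (c1, c2) in common_line.items():
        for s in range(-d, d + 1):
            W[idx(k1, c1), idx(k2, c2 + s)] = 1
            W[idx(k2, c2), idx(k1, c1 + s)] = 1

    return csr_matrix(W)
```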

Averaging operator
Row-stochastic normalization of W: $A = D^{-1} W$, where D is a diagonal matrix with $D_{ii} = \sum_{j=1}^{N} W_{ij}$.
The matrix A is an averaging operator:
    $(Af)(k_1, l_1) = \frac{1}{|\{(k_2, l_2) : ((k_1,l_1),(k_2,l_2)) \in E\}|} \sum_{((k_1,l_1),(k_2,l_2)) \in E} f(k_2, l_2).$
A assigns to the head of each spider the average of f over its legs.
$A\mathbf{1} = \mathbf{1}$: the trivial eigenvector is $\psi_0 = \mathbf{1}$, with $\lambda_0 = 1$.
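The row-stochastic normalization is one line with scipy.sparse; a minimal sketch, assuming W is a scipy sparse matrix such as the one built above:

```python
import numpy as np
from scipy.sparse import diags

def row_normalize(W):
    """Averaging operator A = D^{-1} W, with D_ii = sum_j W_ij (row sums)."""
    d = np.asarray(W.sum(axis=1)).ravel()
    return diags(1.0 / d) @ W

# Sanity check: A 1 = 1, so psi_0 = 1 is the trivial eigenvector (lambda_0 = 1).
# A = row_normalize(W); assert np.allclose(A @ np.ones(A.shape[0]), 1.0)
```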

Coordinate Eigenvectors
The coordinate vectors x, y and z are eigenvectors: $Ax = \lambda x$, $Ay = \lambda y$, $Az = \lambda z$.
The center of mass of every spider lies directly beneath the spider's head: any pair of opposite legs balance each other (the weights are symmetric).

Embedding and algorithm
1. Find the common lines for all pairs of images.
2. Construct the averaging operator A.
3. Compute the eigenvectors $A\psi_i = \lambda_i \psi_i$.
4. Embed the data into the eigenspace $(\psi_1, \psi_2, \psi_3)$: $(k, l) \mapsto (\psi_1(k,l), \psi_2(k,l), \psi_3(k,l))$.
This reveals the molecule orientations up to a global rotation and reflection.
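The whole pipeline then reduces to a sparse eigenvector computation; a hedged sketch with scipy.sparse.linalg.eigs (A is not symmetric, so the general Arnoldi solver is used and tiny imaginary parts are discarded):

```python
import numpy as np
from scipy.sparse.linalg import eigs

def orientation_embedding(A):
    """Embed each radial line (k, l) by the eigenvectors (psi_1, psi_2, psi_3).

    A is the sparse averaging operator.  The embedding recovers the
    orientations up to a global rotation and reflection.
    """
    vals, vecs = eigs(A, k=4, which="LR")   # psi_0 = 1 plus the next three
    order = np.argsort(-vals.real)
    vecs = vecs[:, order].real
    return vecs[:, 1:4]                     # drop the trivial eigenvector
```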

Numerical Spectrum
[Figure: eigenvalues $\lambda_i$ of A plotted against the index i (i = 0, ..., 50) for K = 200 images, L = 100 radial lines, d = 10.]

Toy Example

E. coli ribosome

E. coli ribosome (cont.)

Beyond Cryo-EM: the center-of-mass averaging operator
The coordinates x, y, z are eigenfunctions of the averaging operator: $Ax = \lambda x$, $Ay = \lambda y$, $Az = \lambda z$.
The sphere has constant curvature. Is there a generalization to Euclidean spaces?

Global Positioning from Local Distances
Problem setup: N points $r_i \in \mathbb{R}^p$ (p = 2, 3). Find the coordinates $r_i = (x_i^1, \ldots, x_i^p)$, given the noisy distances between neighboring points, $\delta_{ij} = \|r_i - r_j\|_2 + \text{noise}$.
Solution: build an operator whose eigenfunctions are the global coordinates. This requires only an efficient eigenvector computation for a sparse matrix.
Applications:
  Sensor networks.
  Protein structure determination from NMR spectroscopy (the spin-spin interaction between hydrogen atoms decays as $1/r^6$).
  Surface reconstruction and PDE solvers.
  More?

Global Positioning from Local Distances: History
Multidimensional scaling (MDS), when all distances $d_{ij} = \|r_i - r_j\|$ are given: law of cosines + SVD of the inner-product matrix.
Optimization: minimize one of a variety of loss functions, e.g.
    $\min_{r_1, \ldots, r_N} \sum_{i \sim j} \left[ \|r_i - r_j\|^2 - \delta_{ij}^2 \right]^2$;
many variables, not convex, local minima.
Semidefinite programming (SDP): slow.
Graph Laplacian regularization (Weinberger, Sha, Zhu & Saul, NIPS 2006).
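For reference, a minimal classical-MDS sketch (applicable when all pairwise distances are available): double-center the squared distances to obtain the inner-product (Gram) matrix, then take its top-p eigenpairs.

```python
import numpy as np

def classical_mds(D, p=2):
    """Classical MDS from a full N x N distance matrix D.

    Double centering of the squared distances gives the Gram matrix
    B = -0.5 * J D^2 J of the centered points; its top-p eigenpairs give
    coordinates up to a rigid transformation.  (Eigendecomposition is used
    here; it coincides with the SVD since B is symmetric PSD.)
    """
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # inner-product matrix
    vals, vecs = np.linalg.eigh(B)             # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:p]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```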

Rigidity

Locally Rigid Embedding
Assume local rigidity. For each point, embed its k nearest neighbors locally (using MDS or otherwise).
This gives N local coordinate systems that are mutually rotated and possibly reflected.
How do we glue the different coordinate systems together?

Center of Mass
Consider the point $r_i$ and its k nearest neighbors $r_{i_1}, r_{i_2}, \ldots, r_{i_k}$. Find weights such that $r_i$ is the center of mass of its neighbors:
    $\sum_{j=1}^{k} W_{i, i_j} r_{i_j} = r_i, \qquad \sum_{j=1}^{k} W_{i, i_j} = 1.$
This is a system of p + 1 linear equations in k unknowns; it is underdetermined for k > p + 1.
The weights are invariant to rigid transformations: rotation, translation, and reflection.
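A sketch of the weight computation for one point, assuming the point and its k neighbors have already been placed in a common local frame (e.g. by the local MDS step above); np.linalg.lstsq returns the minimum-norm solution of the underdetermined system:

```python
import numpy as np

def center_of_mass_weights(x0, X):
    """Weights w (length k) with  sum_j w_j X[j] = x0  and  sum_j w_j = 1.

    x0 : (p,)   local coordinates of the point r_i
    X  : (k, p) local coordinates of its k nearest neighbors
    The system has p + 1 equations in k unknowns; for k > p + 1 it is
    underdetermined and lstsq returns the minimum-norm solution.
    """
    k, p = X.shape
    A = np.vstack([X.T, np.ones((1, k))])      # (p + 1) x k system matrix
    b = np.concatenate([x0, [1.0]])            # right-hand side (r_i and 1)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w
```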

Eigenvectors of W
The N × N weight matrix W is sparse: every row has at most k non-zero elements.
W is not symmetric, and it must have negative weights for points on the boundary.
By construction, $\sum_{j=1}^{N} W_{ij} = 1$ and $\sum_{j=1}^{N} W_{ij} r_j = r_i$, so
    $W\mathbf{1} = \mathbf{1}, \quad Wx^1 = x^1, \ \ldots, \ Wx^p = x^p.$
We are practically done: the eigenvectors of W with eigenvalue $\lambda = 1$ are the desired coordinates. A little linear algebra is needed to deal with multiplicity and noise.
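A hedged sketch of the eigenvector extraction, assuming W is a scipy sparse matrix; shift-invert around a point slightly off 1 is used because eigenvalue 1 need not be the largest in modulus and (W - I) is singular in the noiseless case:

```python
import numpy as np
from scipy.sparse.linalg import eigs

def coordinate_eigenvectors(W, num):
    """`num` eigenpairs of the sparse LRE matrix W with eigenvalues closest to 1.

    The returned eigenvectors (approximately) span 1, x^1, ..., x^p; the
    following slides show how to disentangle the coordinates from them.
    """
    vals, vecs = eigs(W.tocsc(), k=num, sigma=1.0 + 1e-6)
    order = np.argsort(np.abs(vals - 1.0))
    return vals[order].real, vecs[:, order].real
```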

1097 US Cities, k = 18 nearest neighbors
[Figure: four panels. The original city coordinates (x, y), and the locally rigid embedding $(\phi_1, \phi_2)$ computed from clean distances, from distances with 1% noise, and from distances with 10% noise.]

US Cities: Numerical Spectrum
[Figure] Numerical spectrum of W for different levels of noise: clean distances (left), 1% noise (center), and 10% noise (right).

519 hydrogen atoms of 2GGR: 1% noise
[Figure: reconstructed (x, y, z) coordinates of the hydrogen atoms.]

2GGR: 5% noise
[Figure: reconstructed (x, y, z) coordinates of the hydrogen atoms.]

2GGR: Numerical Spectrum
[Figure: numerical spectrum of W for the 2GGR distance data at the different noise levels.]

Dealing with the multiplicity
The eigenvalue $\lambda = 1$ is degenerate with multiplicity p + 1: $W\phi_i = \phi_i$ for $i = 0, 1, \ldots, p$.
The computed eigenvectors $\phi_0, \phi_1, \ldots, \phi_p$ are linear combinations of $\mathbf{1}, x^1, \ldots, x^p$. We may assume $\phi_0 = \mathbf{1}$ and $\langle \phi_j, \mathbf{1} \rangle = 0$ for $j = 1, \ldots, p$.
We look for a p × p matrix A that maps the eigenmap $\Phi_i = (\phi_i^1, \ldots, \phi_i^p)$ to the original coordinates $r_i = (x_i^1, \ldots, x_i^p)$: $r_i = A\Phi_i$ for $i = 1, \ldots, N$.
The squared distance between $r_i$ and $r_j$ is
    $d_{ij}^2 = \|r_i - r_j\|^2 = (\Phi_i - \Phi_j)^T A^T A (\Phi_i - \Phi_j).$
This is an overdetermined system of linear equations in the entries of $A^T A$. Least squares gives $A^T A$, whose Cholesky decomposition yields A.
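A sketch of this least-squares step (illustrative interface, not the speaker's code): each pair with a known distance contributes one linear equation in the p(p+1)/2 free entries of the symmetric matrix $B = A^T A$, and a Cholesky factorization of the fitted B gives A up to an orthogonal factor, which is all that is needed since the configuration is only determined up to a rigid motion anyway.

```python
import numpy as np

def recover_A(Phi, pairs, d2):
    """Fit B = A^T A from d_ij^2 = (Phi_i - Phi_j)^T B (Phi_i - Phi_j).

    Phi   : (N, p) eigenmap coordinates Phi_i
    pairs : list of (i, j) index pairs with known distances
    d2    : squared distances d_ij^2 for those pairs
    """
    p = Phi.shape[1]
    rows, rhs = [], []
    for (i, j), dd in zip(pairs, d2):
        u = Phi[i] - Phi[j]
        M = np.outer(u, u)
        # Coefficients of the p(p+1)/2 free entries of the symmetric B
        # (off-diagonal entries appear twice in u^T B u, hence the factor 2).
        rows.append([M[a, b] * (1 if a == b else 2)
                     for a in range(p) for b in range(a, p)])
        rhs.append(dd)
    sol, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(rhs), rcond=None)
    # Rebuild the symmetric B from its upper-triangular entries.
    B = np.zeros((p, p))
    idx = 0
    for a in range(p):
        for b in range(a, p):
            B[a, b] = B[b, a] = sol[idx]
            idx += 1
    # B = A^T A; the Cholesky factor gives an (upper-triangular) valid A,
    # provided the fitted B is positive definite.
    return np.linalg.cholesky(B).T
```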

Dealing with noisy distances $\delta_{ij}$
Noise breaks the degeneracy and may lead to crossings of eigenvalues.
The coordinate vectors are approximated as linear combinations of m non-trivial eigenvectors, $\Phi_i = (\phi_i^1, \ldots, \phi_i^m)$ with $m > p$ (still $m \ll N$): $r_i = A\Phi_i$, with A now p × m instead of p × p.
Replace $d_{ij}^2 = \|r_i - r_j\|^2 = (\Phi_i - \Phi_j)^T A^T A (\Phi_i - \Phi_j)$ with a constrained minimization over the m × m positive semidefinite matrix $P = A^T A$:
    $\min_P \sum_{i \sim j} \left[ (\Phi_i - \Phi_j)^T P (\Phi_i - \Phi_j) - \delta_{ij}^2 \right]^2, \quad \text{such that } P \succeq 0.$
This is a small SDP (the formulation uses the Schur complement lemma).
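The constrained problem is a small convex program over the m × m PSD matrix P. A hedged sketch using cvxpy (assuming cvxpy with an SDP-capable solver such as SCS is installed); the least-squares objective is handed to the modeling layer directly, which performs the Schur-complement-style reformulation into an SDP internally:

```python
import numpy as np
import cvxpy as cp

def fit_metric(Phi, pairs, delta2):
    """Fit the m x m PSD matrix P = A^T A to the noisy squared distances.

    Phi    : (N, m) truncated eigenmap (m > p non-trivial eigenvectors)
    pairs  : list of (i, j) pairs with measured distances
    delta2 : noisy squared distances delta_ij^2 for those pairs
    """
    m = Phi.shape[1]
    P = cp.Variable((m, m), PSD=True)          # constraint P >= 0
    residuals = []
    for (i, j), dd in zip(pairs, delta2):
        u = Phi[i] - Phi[j]
        # (Phi_i - Phi_j)^T P (Phi_i - Phi_j) = trace(P uu^T), affine in P.
        residuals.append(cp.sum(cp.multiply(P, np.outer(u, u))) - dd)
    cp.Problem(cp.Minimize(cp.sum_squares(cp.hstack(residuals)))).solve()
    return P.value
```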

Comparison to LLE and Graph Laplacian regularization
Locally Linear Embedding (LLE) is a non-linear dimensionality reduction method. LLE constructs the weights by solving the overdetermined system
    $\min_W \big\| X_i - \sum_j W_{ij} X_j \big\|^2$, where $X_i \in \mathbb{R}^n$.
Here the vectors $X_i$ are not given, so a preprocessing step (the local embeddings) is needed to reveal them.
Graph Laplacian regularization: approximate the coordinates by the eigenvectors of $W_{ij} = \exp(-d_{ij}^2 / \varepsilon)$ for $i \ne j$, $W_{ij} = 0$ otherwise.
LRE requires locally rigid subgraphs; graph neighbors can be physically distant.

Thank You!