Non-unique Games over Compact Groups and Orientation Estimation in Cryo-EM

Similar documents
3D Structure Determination using Cryo-Electron Microscopy Computational Challenges

Introduction to Three Dimensional Structure Determination of Macromolecules by Cryo-Electron Microscopy

PCA from noisy linearly reduced measurements

Three-dimensional structure determination of molecules without crystallization

Some Optimization and Statistical Learning Problems in Structural Biology

Multireference Alignment, Cryo-EM, and XFEL

Orientation Assignment in Cryo-EM

Class Averaging in Cryo-Electron Microscopy

Global Positioning from Local Distances

arxiv: v1 [cs.cv] 12 Jul 2016

Beyond Scalar Affinities for Network Analysis or Vector Diffusion Maps and the Connection Laplacian

The Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences

Tools for Cryo-EM Map Fitting. Paul Emsley MRC Laboratory of Molecular Biology

Fast and Robust Phase Retrieval

REPRESENTATION THEORY WEEK 7

Fast Angular Synchronization for Phase Retrieval via Incomplete Information

Semidefinite Programming

Positive semidefinite rank

NMR of large protein systems: Solid state and dynamic nuclear polarization. Sascha Lange, Leibniz-Institut für Molekulare Pharmakologie (FMP)

NIH Public Access Author Manuscript Ann Math. Author manuscript; available in PMC 2012 November 23.

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN

Structure, Dynamics and Function of Membrane Proteins using 4D Electron Microscopy

F. Piazza Center for Molecular Biophysics and University of Orléans, France. Selected topic in Physical Biology. Lecture 1

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.

Covariance Matrix Estimation for the Cryo-EM Heterogeneity Problem

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

CS273: Algorithms for Structure Handout # 13 and Motion in Biology Stanford University Tuesday, 11 May 2003

Three-Dimensional Electron Microscopy of Macromolecular Assemblies

NMR BMB 173 Lecture 16, February

Image Assessment San Diego, November 2005

Equivariant semidefinite lifts and sum of squares hierarchies

MIT Algebraic techniques and semidefinite optimization May 9, Lecture 21. Lecturer: Pablo A. Parrilo Scribe:???

Determining Protein Structure BIBC 100

Introduction to the Mathematics of Medical Imaging

Computing Steerable Principal Components of a Large Set of Images and Their Rotations Colin Ponce and Amit Singer

Nonlinear Analysis with Frames. Part III: Algorithms

Distance Geometry-Matrix Completion

Lift me up but not too high Fast algorithms to solve SDP s with block-diagonal constraints

Representation theoretic patterns in three dimensional Cryo-Electron Microscopy I: The intrinsic reconstitution algorithm

REPRESENTATION THEORETIC PATTERNS IN THREE DIMENSIONAL CRYO-ELECTRON MICROSCOPY III - PRESENCE OF POINT SYMMETRIES

Electron microscopy in molecular cell biology II

MIT Algebraic techniques and semidefinite optimization February 14, Lecture 3

Recovery of Compactly Supported Functions from Spectrogram Measurements via Lifting

Cellular basis of life History of cell Biology Year Name of the scientist Importance

REPRESENTATION THEORETIC PATTERNS IN THREE DIMENSIONAL CRYO-ELECTRON MICROSCOPY I - THE INTRINSIC RECONSTITUTION ALGORITHM

REPRESENTATION THEORETIC PATTERNS IN THREE DIMENSIONAL CRYO-ELECTRON MICROSCOPY III - PRESENCE OF POINT SYMMETRIES

Experimental Techniques in Protein Structure Determination

7. Nuclear Magnetic Resonance

SDP Relaxations for MAXCUT

Tutorial 1 Geometry, Topology, and Biology Patrice Koehl and Joel Hass

Numerical Approximation Methods for Non-Uniform Fourier Data

Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization

Biological Small Angle X-ray Scattering (SAXS) Dec 2, 2013

Chapter 12: Intracellular sorting

Structural biomathematics: an overview of molecular simulations and protein structure prediction

Basics of protein structure

Fourier Series and Fourier Transform

Lecture Note 5: Semidefinite Programming for Stability Analysis

Semi-definite representibility. For fun and profit

Principles of Nuclear Magnetic Resonance in One and Two Dimensions

ParM filament images were extracted and from the electron micrographs and

Three-dimensional structure of a viral genome-delivery portal vertex

Fourier series on triangles and tetrahedra

Constructing Approximation Kernels for Non-Harmonic Fourier Data

ITERATIVE METHODS IN LARGE FIELD ELECTRON MICROSCOPE TOMOGRAPHY. Xiaohua Wan

Integrating Globus into a Science Gateway for Cryo-EM

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

Algorithms in Bioinformatics

CRYSTALLOGRAPHY AND STORYTELLING WITH DATA. President, Association of Women in Science, Bethesda Chapter STEM Consultant

Rotational motion of a rigid body spinning around a rotational axis ˆn;

Cells and the Stuff They re Made of. Indiana University P575 1

Mitochondria Mitochondria were first seen by kollicker in 1850 in muscles and called them sarcosomes. Flemming (1882) described these organelles as

MULTI-SEGMENT RECONSTRUCTION USING INVARIANT FEATURES. Mona Zehni, Minh N. Do, Zhizhen Zhao

Learning Binary Classifiers for Multi-Class Problem

CAP 5510 Lecture 3 Protein Structures

Rochester Institute of Technology Rochester, New York. COLLEGE of Science Department of Chemistry. NEW (or REVISED) COURSE:

Contents. xiii. Preface v

STA 4273H: Statistical Machine Learning

Self-Calibration and Biconvex Compressive Sensing

Solving Corrupted Quadratic Equations, Provably

A New Approach to EM of Helical Polymers Yields New Insights

ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM COLLEGE OF SCIENCE. Chester F. Carlson Center for Imaging Science

FFTs in Graphics and Vision. Fast Alignment of Spherical Functions

Scientific Computing: An Introductory Survey

Anouncements. Assignment 3 has been posted!

MATHEMATICS. Course Syllabus. Section A: Linear Algebra. Subject Code: MA. Course Structure. Ordinary Differential Equations

Sparse Covariance Selection using Semidefinite Programming

Decentralized Control of Stochastic Systems

Nonparametric Bayesian Methods (Gaussian Processes)

Quantum state discrimination with post-measurement information!

4 Group representations

Kernel-based Approximation. Methods using MATLAB. Gregory Fasshauer. Interdisciplinary Mathematical Sciences. Michael McCourt.

Lower bounds on the size of semidefinite relaxations. David Steurer Cornell

Automated Assignment of Backbone NMR Data using Artificial Intelligence

Approximation algorithms for nonnegative polynomial optimization problems over unit spheres

Short note on compact operators - Monday 24 th March, Sylvester Eriksson-Bique

Statistical Issues in Searches: Photon Science Response. Rebecca Willett, Duke University

Part III Symmetries, Fields and Particles

9.2 Support Vector Machines 159

Introduction to" Protein Structure

Transcription:

Non-unique Games over Compact Groups and Orientation Estimation in Cryo-EM Amit Singer Princeton University Department of Mathematics and Program in Applied and Computational Mathematics July 28, 2016 Amit Singer (Princeton University) July 2016 1 / 23

Single Particle Reconstruction using cryo-em Schematic drawing of the imaging process: The cryo-em problem: Amit Singer (Princeton University) July 2016 2 / 23

New detector technology: Exciting times for cryo-em BIOCHEMISTRY The Resolution Revolution Werner Kühlbrandt P recise knowledge of the structure of macromolecules in the cell is essential for understanding how they function. Structures of large macromolecules can now be obtained at near-atomic resolution by averaging thousands of electron microscope images recorded before radiation damage accumulates. This is what Amunts et al. have done in their research article on page 1485 of this issue ( 1), reporting the structure of the large subunit of the mitochondrial ribosome at 3.2 Å resolution by electron cryo-microscopy (cryo-em). Together with other recent high-resolution cryo-em structures ( 2 4) (see the figure), this achievement heralds the beginning of a new era in molecular biology, where structures at near-atomic resolution are no longer the prerogative of x-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. Ribosomes are ancient, massive protein- RNA complexes that translate the linear genetic code into three-dimensional proteins. Mitochondria semi-autonomous organelles www.sciencemag.org SCIENCE VOL 343 28 MARCH 2014 1443 A B C Advances in detector technology and image processing are yielding high-resolution electron cryo-microscopy structures of biomolecules. Near-atomic resolution with cryo-em. (A) The large subunit of the yeast mitochondrial ribosome at 3.2 Å reported by Amunts et al. In the detailed view below, the base pairs of an RNA double helix and a magnesium ion (blue) are clearly resolved. (B) TRPV1 ion channel at 3.4 Å ( 2), with a detailed view of residues lining the ion pore on the four-fold axis of the tetrameric channel. (C) F 420-reducing [NiFe] hydrogenase at 3.36 Å ( 3). The detail shows an α helix in the FrhA subunit with resolved side chains. The maps are not drawn to scale. Amit Singer (Princeton University) July 2016 3 / 23

Cryo-EM in the news... March 31, 2016 REPORTS The 3.8 Å resolution cryo-em structure of Zika virus Cite as: Sirohi et al., Science 10.1126/science.aaf5316 (2016). Devika Sirohi, 1 * Zhenguo Chen, 1 * Lei Sun, 1 Thomas Klose, 1 Theodore C. Pierson, 2 Michael G. Rossmann, 1 Richard J. Kuhn 1 1 Markey Center for Structural Biology and Purdue Institute for Inflammation, Immunology and Infectious Disease, Purdue University, West Lafayette, IN 47907, USA. 2 Viral Pathogenesis Section, Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA. Amit Singer (Princeton University) July 2016 4 / 23

Method of the Year 2015 January 2016 Volume 13 No 1 Single-particle cryo-electron microscopy (cryo-em) is our choice for Method of the Year 2015 for its newfound ability to solve protein structures at near-atomic resolution. Featured is the 2.2-Å cryo-em structure of β-galactosidase as recently reported by Bartesaghi et al. (Science 348, 1147 1151, 2015). Cover design by Erin Dewalt. Amit Singer (Princeton University) July 2016 5 / 23

Big Movie Data, Publicly Available http://www.ebi.ac.uk/pdbe/emdb/empiar/ Amit Singer (Princeton University) July 2016 6 / 23

E. coli 50S ribosomal subunit 27,000 particle images provided by Dr. Fred Sigworth, Yale Medical School 3D reconstruction by S, Lanhui Wang, and Jane Zhao Amit Singer (Princeton University) July 2016 7 / 23

Orientation Estimation: Fourier proection-slice theorem (x,y ) R ic c = (x,y,0) T Proection I i Molecule φ R i SO(3) Proection I i Î i 3D Fourier space (x i,y i) R ic = R c i Electron source Proection I Î 3D Fourier space Cryo-EM inverse problem: Find φ (and R 1,...,R n) given I 1,...,I n. n = 3: Vainshtein and Goncharov 1986, van Heel 1987 n > 3: S, Shkolnisky (SIAM Imaging 2011) min R 1,R 2,...,R n SO(3) R i c R c i 2 i Amit Singer (Princeton University) July 2016 8 / 23

Maximum Likelihood Estimation The images contain more information than that expressed by optimal pairwise matching of common lines. Algorithms based on pairwise matching can succeed only at high SNR. We would like to try all possible rotations R 1,...,R n and choose the combination for which the agreement on the common lines (implied by the rotations) as observed in the images is maximal. Computationally intractable: exponentially large search space, complicated cost function. min g 1,...,g n G n i,=1 f (g i g 1 ) Amit Singer (Princeton University) July 2016 9 / 23

3D Puzzle G = SO(3) min g 1,g 2,...,g n G n i,=1 f (g i g 1 ) Amit Singer (Princeton University) July 2016 10 / 23

Non-Unique Games over Compact Groups Optimization problem: min g 1,g 2,...,g n G n i,=1 f (g i g 1 ) G is a compact group, f : G R smooth, bandlimited functions. Parameter space G G G is exponentially large. For G = Z 2 = { 1,+1} this encodes Max-Cut, which is NP-hard. 1 w 14 w 12 2 w 24 w 25 4 w 45 3 w 34 w 35 5 Amit Singer (Princeton University) July 2016 11 / 23

Why non-unique games? Max-2-Lin(Z L ) formulation of Unique Games (Khot et al 2005): Find x 1,...,x n Z L that satisfy as many difference eqs as possible x i x = b mod L, (i,) E This corresponds to G = Z L and { 1 x = b f (x) = 0 x b in min x 1,x 2,...,x n Z L i,=1 n f (x i x ) Our games are non-unique in general, and the group is not necessarily finite. Amit Singer (Princeton University) July 2016 12 / 23

Fourier transform over G Recall for G = SO(2) f(α) = ˆf(k) = 1 2π k= 2π 0 ˆf(k)e ıkα f(α)e ıkα dα In general, for a compact group G ] f(g) = d k Tr[ˆf(k)ρ k (g) ˆf(k) = k=0 G Here ρ k are the unitary irreducible representations of G d k is the dimension of the representation ρ k (e.g., d k = 1 for SO(2), d k = 2k +1 for SO(3)) dg is the Haar measure on G f(g)ρ k (g) dg Amit Singer (Princeton University) July 2016 13 / 23

Linearization of the cost function Introduce matrix variables ( matrix lifting ) Fourier expansion of f Linear cost function f(g 1,...,g n ) = f (g) = n i,=1 X (k) k=0 = ρ k (g i g 1 ) ] d k Tr[ˆf (k)ρ k (g) f (g i g 1 ) = n i,=1k=0 ] d k Tr [ˆf (k)x (k) Amit Singer (Princeton University) July 2016 14 / 23

Constraints on the variables X (k) = ρ k (g i g 1 ) 1 X (k) 0 2 X (k) ii = I dk, for i = 1,...,n 3 rank(x (k) ) = d k = ρ k (g i g 1 ) = ρ k (g i )ρ k (g 1 ) = ρ k (g i )ρ k (g ) ρ k (g 1 ) X (k) ρ k (g 2 ) [ = ρk (g 1 ) ρ k (g 2 ) ρ k (g n ) ]. ρ k (g n ) X (k) We drop the non-convex rank constraint. The relaxation is too loose, as we can have X (k) = 0 (for i ). Even with the rank constraint, nothing ensures that X (k) and X (k ) correspond to the same group element g i g 1. Amit Singer (Princeton University) July 2016 15 / 23

Additional constraints on X (k) = ρ k (g i g 1 ) The delta function for G = SO(2) δ(α) = k= Shifting the delta function to α i α δ(α (α i α )) = k= e ıkα e ıkα e ık(α i α ) = k= The delta function is non-negative and integrates to 1: e ıkα X (k) 0, α [0,2π) k= 1 2π 2π 0 k= e ıkα X (k) (0) dα = X = 1 e ıkα X (k) Amit Singer (Princeton University) July 2016 16 / 23

Finite truncation via Feér kernel In practice, we cannot use infinite number of representations to compose the delta function. Simple truncation leads to the Dirichlet kernel which changes sign D m (α) = m k= m e ıkα This is also the source for the Gibbs phenomenon and the non-uniform convergence of the Fourier series. The Feér kernel is non-negative F m (α) = 1 m m 1 k=0 D k (α) = m k= m ( 1 k ) e ıkα m The Feér kernel is the first order Cesàro mean of the Dirichlet kernel. Amit Singer (Princeton University) July 2016 17 / 23

Finite truncation via Feér-Riesz factorization Non-negativity constraints over SO(2) m ( 1 k ) e ıkα X (k) 0, α [0,2π) m k= m Feér-Riesz: P is a non-negative trigonometric polynomial over the circle, i.e. P(e ıα ) 0 α [0,2π) iff P(e ıα ) = Q(e ıα ) 2 for some polynomial Q. { Leads to semidefinite constraints on X (k) } k for each i,. Similar non-negativity constraints hold for general G using the delta function over G δ(g) = d k Tr[ρ k (g)] k=0 For example, Feér proved that for SO(3) the second order Cesàro mean of the Dirichlet kernel is non-negative. Amit Singer (Princeton University) July 2016 18 / 23

Additional constraints for SO(3) X (k) is a representation of SO(3). X (0) = 1 X (1) convso(3) is a semidefinite constraint using unit quaternions and Euler-Rodrigues formula (Saunderson, Parrilo, Willsky SIOPT 2015) Q = qq T : Q 0, Tr(Q) = 1, Q = T(X (1) ) X (k) sum-of-squares relaxation Amit Singer (Princeton University) July 2016 19 / 23

Tightness of the semidefinite program We solve an SDP for the matrices X (1),...,X (m). Numerically, the solution of the SDP has the desired ranks up to a certain level of noise (w.h.p). In other words, even though the search-space is exponentially large, we typically find the MLE in polynomial time. This is a viable alternative to heuristic methods such as EM and alternating minimization. The SDP gives a certificate whenever it finds the MLE. Amit Singer (Princeton University) July 2016 20 / 23

Final Remarks Loss of handedness ambiguity in cryo-em: If g 1,...,g n SO(3) is the solution, then so is Jg 1 J 1,...,Jg n J 1 for J = diag( 1, 1,1). [ ] Define X (k) = 1 2 ρ k (g i g 1 )+ρ k (Jg i g 1 J 1 ) Splits the representation: 2k +1 = d k = k +(k +1), reduced computation Point group symmetry (cyclic, dihedral, etc.): reduces the dimension of the representation (invariant polynomials) Translations and rotations simultaneously: SE(3) is a non-compact group, but we can map it to SO(4). Simultaneous rotation estimation and classification Amit Singer (Princeton University) July 2016 21 / 23

References A. Singer, Y. Shkolnisky, Three-Dimensional Structure Determination from Common Lines in Cryo-EM by Eigenvectors and Semidefinite Programming, SIAM Journal on Imaging Sciences, 4 (2), pp. 543 572 (2011). A. S. Bandeira, M. Charikar, A. Singer, A. Zhu, Multireference Alignment using Semidefinite Programming, 5th Innovations in Theoretical Computer Science (ITCS 2014). A. S. Bandeira, Y. Chen, A. Singer Non-Unique Games over Compact Groups and Orientation Estimation in Cryo-EM, http://arxiv.org/abs/1505.03840. R. R. Lederman, A. Singer, A Representation Theory Perspective on Simultaneous Alignment and Classification, http://arxiv.org/abs/1607.03464v1. Amit Singer (Princeton University) July 2016 22 / 23

ASPIRE: Algorithms for Single Particle Reconstruction Open source toolbox, publicly available: http://spr.math.princeton.edu/ Amit Singer (Princeton University) July 2016 23 / 23