Distance Geometry-Matrix Completion

Similar documents
Lecture 7: Positive Semidefinite Matrices

Linear Algebra Review. Vectors

Chapter 3 Transformations

The Singular Value Decomposition

Knowledge Discovery and Data Mining 1 (VO) ( )

Geometric Constraints II

Introduction to Semidefinite Programming I: Basic properties a

(a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? Solution: dim N(A) 1, since rank(a) 3. Ax =

Spectral Graph Theory and You: Matrix Tree Theorem and Centrality Metrics

Ma/CS 6b Class 20: Spectral Graph Theory

Chapter 7: Symmetric Matrices and Quadratic Forms

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 2

linearly indepedent eigenvectors as the multiplicity of the root, but in general there may be no more than one. For further discussion, assume matrice

Singular Value Decomposition

9. Nuclear Magnetic Resonance

Linear Algebra: Characteristic Value Problem

Matrix Rank Minimization with Applications

CSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming

Linear algebra and applications to graphs Part 1

Ma/CS 6b Class 20: Spectral Graph Theory

SYLLABUS. 1 Linear maps and matrices

HW2 - Due 01/30. Each answer must be mathematically justified. Don t forget your name.

Singular Value Decomposition

Singular Value Decomposition and Principal Component Analysis (PCA) I

Conceptual Questions for Review

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

UNIT 6: The singular value decomposition.

Maths for Signals and Systems Linear Algebra in Engineering

Linear Algebra & Analysis Review UW EE/AA/ME 578 Convex Optimization

Linear Algebra Primer

Reconstruction and Higher Dimensional Geometry

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination

Computational math: Assignment 1

LinGloss. A glossary of linear algebra

Lecture 20: November 1st

Linear Algebra (Review) Volker Tresp 2018

Linear Algebra & Geometry why is linear algebra useful in computer vision?

Chap 3. Linear Algebra

Lecture 1 Review: Linear models have the form (in matrix notation) Y = Xβ + ε,

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

7. Symmetric Matrices and Quadratic Forms

Matrices A brief introduction

Lecture 1 and 2: Random Spanning Trees

forms Christopher Engström November 14, 2014 MAA704: Matrix factorization and canonical forms Matrix properties Matrix factorization Canonical forms

Review of Linear Algebra

STAT200C: Review of Linear Algebra

14 Singular Value Decomposition

Math Matrix Algebra

7. Nuclear Magnetic Resonance

CS 143 Linear Algebra Review

Multivariate Statistical Analysis

Linear Algebra Solutions 1

Lecture 8: Linear Algebra Background

Quantum Computing Lecture 2. Review of Linear Algebra

CHARACTERIZATIONS. is pd/psd. Possible for all pd/psd matrices! Generating a pd/psd matrix: Choose any B Mn, then

Math 520 Exam 2 Topic Outline Sections 1 3 (Xiao/Dumas/Liaw) Spring 2008

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88

MATH 240 Spring, Chapter 1: Linear Equations and Matrices

The Singular Value Decomposition (SVD) and Principal Component Analysis (PCA)

Spherical Euclidean Distance Embedding of a Graph

COMP 558 lecture 18 Nov. 15, 2010

MATRICES ARE SIMILAR TO TRIANGULAR MATRICES

Parallel Singular Value Decomposition. Jiaxing Tan

Mathematical foundations - linear algebra

Bare minimum on matrix algebra. Psychology 588: Covariance structure and factor models

Semidefinite Programming Basics and Applications

I. Multiple Choice Questions (Answer any eight)

Symmetric Matrices and Eigendecomposition

Quick Tour of Linear Algebra and Graph Theory

CSE 554 Lecture 7: Alignment

Solving distance geometry problems for protein structure determination

Review problems for MA 54, Fall 2004.

Linear Algebra Formulas. Ben Lee

PY 351 Modern Physics Short assignment 4, Nov. 9, 2018, to be returned in class on Nov. 15.

Theorem A.1. If A is any nonzero m x n matrix, then A is equivalent to a partitioned matrix of the form. k k n-k. m-k k m-k n-k

Introduction to Numerical Linear Algebra II

Diagonalizing Matrices

Simultaneous Diagonalization of Positive Semi-definite Matrices

Matrix Theory, Math6304 Lecture Notes from October 25, 2012

Tutorials in Optimization. Richard Socher

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD

Matrix Factorization and Analysis

Basic Elements of Linear Algebra

Taxonomy of n n Matrices. Complex. Integer. Real. diagonalizable. Real. Doubly stochastic. Unimodular. Invertible. Permutation. Orthogonal.

LMI MODELLING 4. CONVEX LMI MODELLING. Didier HENRION. LAAS-CNRS Toulouse, FR Czech Tech Univ Prague, CZ. Universidad de Valladolid, SP March 2009

Linear Algebra Massoud Malek

Least-Squares Rigid Motion Using SVD

Notes on Linear Algebra

Quick Tour of Linear Algebra and Graph Theory

Chapter 1. Matrix Algebra

Notes on the Matrix-Tree theorem and Cayley s tree enumerator

TBP MATH33A Review Sheet. November 24, 2018

ANSWERS. E k E 2 E 1 A = B

Notes on singular value decomposition for Math 54. Recall that if A is a symmetric n n matrix, then A has real eigenvalues A = P DP 1 A = P DP T.

Common-Knowledge / Cheat Sheet

Fundamentals of Matrices

Mathematical Foundations of Applied Statistics: Matrix Algebra

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas

1. General Vector Spaces

Lecture 2: Linear Algebra Review

Transcription:

Christos Konaxis Algs in Struct BioInfo 2010

Outline Outline Tertiary structure Incomplete data

Tertiary structure Measure diffrence of matched sets Def. Root Mean Square Deviation RMSD = 1 n x i y i 2, n i=1 where x i, y i R 3 are (C α ) atom coordinates in SAME coordinate frame. X = [x 1,..., x n ], Y = [y 1,..., y n ], then RMSD(X, Y ) = 1 n X Y F, where M 2 F = i,j M 2 ij = tr(m T M), is the Frobenious metric, and tr(a) = i A ii is the trace of matrix A = [A ij ].

Tertiary structure Optimal alignment of matched sets Translate to common origin by subtracting centroid x c = 1 x i. n Rotate to optimal alignment by rotation matrix Q : Q T = I. Also should have detq = 1. Exists deterministic linear algebra algorithm [Kabsch]. Overall cost= O(n 3 ), but least-squares approximation in O(n). i

Tertiary structure Optimal rotation Assume common centroid: RMSD(X, Y ) = min Q Y XQ F, Q T Q = I 3. Y XQ F = tr(y T Y ) + tr(x T X ) 2tr(Q T X T Y ), so must maximize tr(q T X T Y ). Consider SVD: X T Y = UΣV T, U T U = V T V = I, Σ = diag[s i ] : s 1 s 3. Then tr(q T X T Y ) = tr(q T UΣV T ) = tr(v T Q T UΣ) tr(σ), because T = V T Q T U orthonormal T ij 1. Hence maximum at T = I Q = UV T. If dett = 1 then just negate T 33.

Introduction Nuclear Magnetic Resonance (NMR) and Nuclear Overhauser Effect (NOE) spectroscopy provide approximate inter-atomic distances for molecular structures as large as 5.000 atoms. The distances measured by NMR and NOESY experiments (usually a small subset of all possible pairs) must be converted into a 3D structure consistent with the measurements. In general the distances are imprecisely measured: for each distance d ij we have l ij d ij u ij.

The Method is based on the foundational work of Cayley (1841) and Menger (1928) who showed how convexity and other basic geometric properties could be defined in terms of distances between pairs of points. The problem can be reduced to the completion of a partial matrix M satisfying certain prorperties.

Distance matrices Definition (and structure): A distance matrix D is square, D ii = 0, D ij = D ji 0. Definition. A distance matrix D is euclidean and embeddable in R k iff points p i R k : D ij = 1 2 dist(p i p j ) 2. Embeddable matrices in R 3 correspond to 3D conformations.

Embedding matrices in R k Thm [Schoenberg 35,Blumenthal 53] Take border (Cayley-Menger) matrix 0 1 1 1 B =. D. 1 Then, D embeds in R k iff rank(b) k + 2. Cor. A distance matrix D expresses a 3D conformation iff rank(b) = 5

Cyclohexane s distance matrix p 1 p 2 p 3 p 4 p 5 p 6 p 1 p 2 p 3 p 4 p 5 p 6 0 1 1 1 1 1 1 1 0 u c x 14 c u 1 u 0 u c x 25 c 1 c u 0 u c x 36 1 x 14 c u 0 u c 1 c x 25 c u 0 u 1 u c x 36 c u 0 Known: u 1.526 (adjacent), φ 110.4 o c 2.285 (triangle). Rank condition (= 5) equivalent to the vanishing of all 6 6 minors. This yields a 3 3 system of quadratic polynomials in the x 14, x 25, x 36. If all c, u same, then 2 isolated conformations, one 1-dim set. If the c, u perturbed, then 16 solutions R [Emiris-Mourrain].

Points from distances Given distance matrix D, compute coordinate matrix X. D = 0 v 1 2 v 2 2... v n 2 v 1 2 0 v 1 v 2 2... v 1 v n 2 v 2 2 v 2 v 1 2 0... v 2 v n 2. Find Gram matrix G =........ v n 2 v n v 1 2 v n v 2 2... 0 v 1 v 1 v 1 v 2... v 1 v n v 2 v 1 v 2 v 2... v 2 v n........ v nv 1 v nv 2... v nv n Elements of G computed using: 2v i v j = v i 2 + v j 2 v i v j 2, or 1. Subtract the first row of D from each row. 2. Subtract the first column from each column. 3. Delete the first row and column. Result: 2G.

Points from distances Given G, find n 3 matrix X s.t. G = X T X using eigenvectors matrix V (s.t. V T V = I ), and eigenvalues diagonal matrix E. Then, GV = EV. Since G is symmetric and comes from a 3D distance matrix, it has 3 non-zero real eigenvalues with real eigenvectors. Construct a diagonal matrix E whose entries on the diagonal are the square roots of the entries of E. Then, G = V E EV T = X T X, where X := EV T.

Positive (semi)definite matrices Def. An n n real matrix M is positive (semi)definite if Denoted M 0, M 0. [ ] Examples: 1 1 1 1 x T Mx > 0 (x T Mx 0) x 0. 3 1 2 2 2 0 1 0 1 Lem. [Sylvester]. M 0 det M i > 0, i i upper-left minor M i, i = 1,..., n. Cor. M 0 det M > 0. If M 0 (M 0), then all its eigenvalues are positive (non-negative).

Embeding via rank Thm. {p i } embed in R 3 (and not R 2 ) iff D 0 & rkd = 3. [ ] y R n, y T PP T y = (y T P)(y T P) T 0: positive semidefinite. rk(ab) = min{rka,rkb} rkd =rkp. embed p a, p b, p c linearly independent rkp = 3. [ ] Singular Value Decomposition (SVD): symmetric real D = USU T, where U T U = UU T = I, S =diag[s 1,..., s n ], s i = eigenvalue 2 0. rkd = 3 S = [s 1, s 2, s 3, 0, ], S = [ s i ], D = U S SU T. P := U S is n 3: defines n points p i R 3.

Bound smoothing For any three points i, j, k in R 3 the triangle inequality holds: d ik d jk d ij d ik + d jk. Meausered distances: l ij d ij u ij l ik d ik u ik l jk d jk u jk Improved upper bound: ū ij = min{u ij, u ik + u jk } Improved lower bound: l ij = max{l ij, l ik u jk, l jk u ik }

The tightened upper bounds can be computed independently of the lower. The tightened upper bounds further improve the tightened lower bounds: ū ik u ik, ū jk u jk, l ij = max{l ij, l ik u jk, l jk u ik }, so lij = max{l ij, l ik ū jk, l jk ū ik } If l ij > ū ij (e.g. when an upper bound is too low) we have a triangle inequality violation.

Incomplete data Matrix completion problems Problem: Given a partial matrix M, can M be completed to a positive semidefinite matrix (PSD), positive definite matrix (PD), Euclidean distance matrix (EDM)? EDM is reduced to PSD: If D = (d ij ), d ii = 0, is a symmetric n n matrix, define (n 1) (n 1) symmetric matrix X = (x ij ), where x ij := 1 2 (d in + d jn d ij ), i, j = 1,..., n 1. D is a distance matrix iff X is positive semidefinite. Moreover v i R k iff rank(x ) k.

Incomplete data It is not known if PSD is in NP. PSD can be solved with an arbitrary precision in polynomial time (interior point, elipsoid method). We will examine polynomial instances of PSD. If a matrix contains a fully determined line or column then its completion problem reduces to the completion of a smaller matrix.

Incomplete data Assumptions: M = Hermitian (M = M), all diagonal entries of M are specified (for positive semidefinite matrices), moreover they are equal to zero (for distance matrices). If m ij is specified, then m ji = m ij is also specified. For every n n matrix M we define the graph (pattern of M) G = ([1, n], E) with vertices [1,..., n]. There is an edge ij between vertices i, j if entry m ij is specified.

Incomplete data Partial PSD matrices Def. A matrix M is partial positive semidefinite (partial-psd) if every principal specified submatrix of M is positive semidefinite. Lem. If the incomplete matrix M has a PSD (PD, distance matrix) completion, then M is partial-psd (partial-pd, partial-distance matrix). Hence we have necessary (but not sufficient) conditions for the existance of a completion of M. Counter-example: Partial-PSD but PSD-completion: 1 1? 0 1 1? 1 1 1

Incomplete data Chordal graphs A chord of a circuit C of a graph G is an edge of G which is not in C but which joins two vertices of C. A graph is chordal if every circuit of length 4 has at least one chord. Th. Every partial positive semidefinite matrix M with pattern G has a positive semidefinite completion iff G is chordal. Constructive proof; can be turned into a polynomial time algorithm. Th. PSD can be solved in polynomial time if the matrix has a chordal pattern. (bit model).