Organizational details
Lecture: Tue 13-14, Thu 10-12 in DI 205. Exercises: Thu 16:15-18:00, lab room Schanzenstrasse; mostly programming in Matlab/Octave; attendance voluntary. Exercise sheets go online each Tuesday after the lecture; submission via courses on Thursday before the exercise session. Criterion for exam admission: 75% of the exercise sheets worked through meaningfully. Solutions are discussed one week later. Material: CS253, http://informatik.unibas.ch/ Lehre/Teaching. Registration: MOnA (lecture), courses (exercises). WiRe 12 V. Roth 1
Chapter 0 Overview
Linear Systems of Equations
The solution set for the equations x - y = -1 and 3x + y = 9 is the single point (2, 3). The solution set for two equations in three variables is usually a line.
Some examples
The equations 3x + 2y = 6 and 3x + 2y = 12 are inconsistent. The equations x - 2y = 1, 3x + 5y = 8, and 4x + 3y = 7 are not linearly independent.
Numerical Methods for Linear Systems
Direct solution methods: Gauss-Jordan elimination with pivoting; matrix factorizations (LU, Cholesky); quantifying inaccuracy: conditioning.
Iterative solution methods: Jacobi iteration, iterative improvement.
Over-determined systems: singular value decomposition.
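Of the iterative methods listed, Jacobi iteration is the simplest to write down. A minimal numpy sketch (the course exercises use Matlab/Octave; this Python version and its example system are illustrative only):

```python
import numpy as np

def jacobi(A, b, iters=100):
    """Jacobi iteration: x_{k+1} = D^{-1} (b - (A - D) x_k)."""
    D = np.diag(A)            # diagonal of A as a vector
    R = A - np.diag(D)        # off-diagonal part
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        x = (b - R @ x) / D
    return x

# Diagonally dominant system, so Jacobi converges.
A = np.array([[4.0, 1.0], [2.0, 5.0]])
b = np.array([6.0, 12.0])
print(jacobi(A, b))  # -> close to [1. 2.], the exact solution
```

For diagonally dominant matrices the iteration converges linearly; without that property it may diverge, which is why direct methods are covered first.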
LU factorization
Ax = b becomes LUx = b, which is equivalent to:
Ly = b, solved by forward substitution, followed by
Ux = y, solved by back substitution.
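The two-step substitution scheme can be sketched as follows (Python/numpy rather than the course's Matlab/Octave; Doolittle factorization without pivoting, so the example matrix is assumed to need no row exchanges):

```python
import numpy as np

def lu_decompose(A):
    """Doolittle LU factorization without pivoting (assumes nonzero pivots)."""
    n = A.shape[0]
    L = np.eye(n)
    U = A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, U

def forward_sub(L, b):
    """Solve Ly = b for unit lower-triangular L."""
    y = np.zeros_like(b, dtype=float)
    for i in range(len(b)):
        y[i] = b[i] - L[i, :i] @ y[:i]
    return y

def back_sub(U, y):
    """Solve Ux = y for upper-triangular U."""
    x = np.zeros_like(y, dtype=float)
    for i in reversed(range(len(y))):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[4.0, 3.0], [6.0, 3.0]])
b = np.array([10.0, 12.0])
L, U = lu_decompose(A)
x = back_sub(U, forward_sub(L, b))
print(x)  # -> [1. 2.]
```

The payoff of the factorization: once L and U are known, each new right-hand side costs only two triangular solves instead of a full elimination.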
Singular Value Decomposition
An ill-conditioned system Ax = b may have a direct solution, but it may be only a poor approximation of the exact solution x. Better: use the SVD and zero the small singular values.
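The "zero the small singular values" recipe, as a minimal numpy sketch (Python stands in for the course's Matlab/Octave; the rank-deficient example matrix is made up):

```python
import numpy as np

def svd_solve(A, b, rcond=1e-10):
    """Least-squares solve via SVD; singular values below rcond * s_max are zeroed."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.array([1.0 / si if si > rcond * s[0] else 0.0 for si in s])
    return Vt.T @ (s_inv * (U.T @ b))

# Rank-deficient system: Gaussian elimination would divide by zero here,
# but the truncated SVD returns the minimum-norm least-squares solution.
A = np.array([[1.0, 1.0], [1.0, 1.0]])
b = np.array([2.0, 2.0])
print(svd_solve(A, b))  # -> close to [1. 1.]
```

Zeroing a singular value discards the direction in which A carries (almost) no information, rather than amplifying noise by dividing by a tiny number.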
Application Example: Modelling face images
The two main problems in supervised learning
Least squares problem
SVD and LS problem: sales pitch
The SVD method is powerful, convenient, intuitive, and numerically advantageous. Problems with ill-conditioning can be circumvented automatically. The SVD can solve problems for which both the normal equations and other factorizations fail.
Classification
Classification: find class boundaries in training data {(x_1, y_1), ..., (x_n, y_n)} by learning discriminants (supervised learning).
[Figure: scatter plot of width vs. lightness for salmon and sea bass.] FIGURE 1.4. The two features of lightness and width for sea bass and salmon. The dark line could serve as a decision boundary of our classifier. Overall classification error on the data shown is lower than if we use only one feature as in Fig. 1.3, but there will still be some errors. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright (c) 2001 by John Wiley & Sons, Inc.
Fisher's Linear Discriminant Analysis
Fisher's discriminant and least squares
Remark: The Fisher vector c_F = Sigma_W^(-1) (m_1 - m_2) coincides with the solution of the LS problem c_LS = arg min_c ||Ac - b||^2 if
n_1 = # samples in class 1, n_2 = # samples in class 2,
b = (+1/n_1, ..., +1/n_1, -1/n_2, ..., -1/n_2)^t (n_1 entries of +1/n_1 followed by n_2 entries of -1/n_2),
A = matrix with rows x_1^t, ..., x_{n_1}^t, x_{n_1+1}^t, ..., x_{n_1+n_2}^t,
with sum_{i=1}^n x_i = 0 (i.e. origin in the sample mean).
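The equivalence can be checked numerically. In the sketch below (Python/numpy for illustration; the Gaussian toy data and class means are invented), the least-squares solution and the Fisher vector agree up to a positive scale factor:

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([2.0, 0.0], 1.0, size=(30, 2))   # class 1 samples
X2 = rng.normal([-2.0, 0.0], 1.0, size=(20, 2))  # class 2 samples
X = np.vstack([X1, X2])
X -= X.mean(axis=0)                              # origin in the sample mean
n1, n2 = len(X1), len(X2)

# Least-squares side: rows of A are the samples, b holds +1/n1 and -1/n2.
b = np.concatenate([np.full(n1, 1 / n1), np.full(n2, -1 / n2)])
c_ls, *_ = np.linalg.lstsq(X, b, rcond=None)

# Fisher side: within-class scatter matrix and mean difference.
m1, m2 = X[:n1].mean(axis=0), X[n1:].mean(axis=0)
Sw = (X[:n1] - m1).T @ (X[:n1] - m1) + (X[n1:] - m2).T @ (X[n1:] - m2)
c_f = np.linalg.solve(Sw, m1 - m2)

# The two normalized directions coincide.
print(c_ls / np.linalg.norm(c_ls))
print(c_f / np.linalg.norm(c_f))
```

Only the direction matters for a discriminant, so agreement up to scale is exactly the claimed equivalence.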
Example: Secondary Structure Prediction in Proteins
Approach: Fisher's discriminant -> least squares -> SVD
Linear Programming (LP)
Linear programming, sometimes called linear optimization, solves the following problem: for d independent variables x_1, ..., x_d,
maximize   z = c_1 x_1 + c_2 x_2 + ... + c_d x_d = c^t x   (1)
subject to the constraints   Ax <= b   (2)
where A is an n x d matrix, c and x are d-dimensional vectors, and b is an n-dimensional vector.
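A small LP in exactly this form, solved with SciPy's `linprog` (Matlab/Octave have an analogous `linprog`/`glpk`; the objective and constraints here are invented). Note that `linprog` minimizes, so the objective is negated:

```python
from scipy.optimize import linprog

# Maximize z = x + y  subject to  x + 2y <= 4,  3x + y <= 6,  x, y >= 0.
c = [-1.0, -1.0]                 # negate: linprog minimizes
A_ub = [[1.0, 2.0], [3.0, 1.0]]  # the matrix A of (2)
b_ub = [4.0, 6.0]                # the vector b of (2)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)  # optimum x = 1.6, y = 1.2 with z = 2.8
```

The optimum sits at a vertex of the feasible polygon (the intersection of the two constraint lines), which is the geometric fact the simplex method exploits.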
Example: Simplex
[Figure: feasible region in the (x, y) plane, bounded by the constraint lines x = 0, y = 0, u = 0, v = 0, and w = 0, with unit marks at 1 on both axes.]
Optimization without Gradients
Optimization with gradient information: steepest descent, conjugate gradients, Newton, etc. (will be covered in the Numerics course). Sometimes direct methods without gradient information are needed: the functional is not differentiable, gradients are difficult to compute, or gradient-based optimization is problematic due to many local minima. Example: image registration. Proposed method: downhill simplex.
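The downhill simplex method is available in SciPy as the Nelder-Mead minimizer. A minimal sketch on the Rosenbrock test function (the function and starting point are standard test choices, not from the slides), using no gradient information at all:

```python
from scipy.optimize import minimize

# Rosenbrock function: narrow curved valley, minimum at (1, 1).
rosen = lambda p: (1 - p[0]) ** 2 + 100 * (p[1] - p[0] ** 2) ** 2

# Nelder-Mead = downhill simplex: only function values, no gradients.
res = minimize(rosen, x0=[-1.2, 1.0], method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-8})
print(res.x)  # -> close to [1. 1.]
```

The simplex crawls down the valley by reflecting, expanding, and contracting a triangle of trial points, which is why it also works for non-smooth functionals such as registration similarity measures.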
Example: Non-rigid Multi-modal Registration
Non-rigid, multi-modal MR-CT registration (ear). As the CT images generally have fewer geometric distortions, they are taken as the reference image; the MR is taken as the floating image.
[Figures: Original MR; Original CT with MR contour; Registered MR; CT with registered MR contour]
Dynamic Programming
R. Bellman began the systematic study of dynamic programming in 1955. The word "programming" refers to the use of a tabular solution method. DP typically applies to optimization problems in which subproblems of the same form arise repeatedly. Key technique: store the solution to each subproblem in case it should reappear.
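The "store each subproblem's solution" technique in its smallest form (a Python illustration on a toy recurrence, the Fibonacci numbers; the course examples below are more substantial):

```python
from functools import lru_cache

# Memoization: cache each subproblem so it is computed only once.
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(50))  # -> 12586269025, instantly; the uncached recursion is exponential
```

Without the cache the recursion recomputes the same subproblems exponentially often; with it, each of the 51 subproblems is solved exactly once.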
Examples: Optimal Binary Search Trees
Keys:          -5    1    8    7   13   21
Probabilities: 1/8  1/32 1/16 1/32 1/4  1/2
[Figure: the optimal binary search tree, with root 8, children 1 and 13, and leaves -5, 7, and 21.]
DP for comparing biological sequences
Theory: during the course of evolution, mutations occurred, creating differences between families of contemporary species. Point mutations:
Insertion - insertion of one or several letters into the sequence.
Deletion - deletion of one or more letters from the sequence.
Substitution - replacing a sequence letter by another.
When we compare two sequences, we are looking for evidence that they have diverged from a common ancestor by a mutation process. How can similarity be formalized?
Sequence Alignment
Definition 1. (informal) Aligning two sequences x = x_1 ... x_n and y = y_1 ... y_m: (i) insert spaces, (ii) place the resulting sequences one above the other so that every character or space has a counterpart in the other sequence.
Example: for the sequences ACBCDDDB and CADBDAD, one possible alignment would be
A C - - B C D D D B
- C A D B - D A D -
another one
- A C B C D D D B
C A D B D A D - -
The Finite State Automaton (FSA) model
M(i, j) = max { M(i-1, j-1) + s(x_i, y_j),
                I_x(i-1, j-1) + s(x_i, y_j),
                I_y(i-1, j-1) + s(x_i, y_j) }
I_x(i, j) = max { M(i-1, j) - d,  I_x(i-1, j) - e }
I_y(i, j) = max { M(i, j-1) - d,  I_y(i, j-1) - e }
Assumption: a deletion will not be followed directly by an insertion. This is always guaranteed if -d - e is less than the lowest mismatch score.
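The affine-gap model needs the three matrices M, I_x, I_y. As a simpler illustration of the same DP idea, here is a minimal global alignment score (Needleman-Wunsch) with a single matrix and a linear gap penalty; the scoring values (match +1, mismatch -1, gap -2) are assumptions, not taken from the slides:

```python
def align_score(x, y, match=1, mismatch=-1, gap=-2):
    """Best global-alignment score of x and y with a linear gap penalty."""
    n, m = len(x), len(y)
    # M[i][j] = best score aligning x[:i] with y[:j]
    M = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        M[i][0] = i * gap                      # x against all gaps
    for j in range(1, m + 1):
        M[0][j] = j * gap                      # y against all gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1] + s,  # (mis)match
                          M[i - 1][j] + gap,    # gap in y
                          M[i][j - 1] + gap)    # gap in x
    return M[n][m]

print(align_score("ACBCDDDB", "CADBDAD"))  # score of the example sequences above
```

The affine model replaces the single `gap` term with the gap-open cost d and gap-extend cost e, tracked by the extra matrices I_x and I_y.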
Example FSA alignment
What is Phylogenetics? A: most recent common ancestor of bird and jellyfish X: portion of history shared by bird and jellyfish
The Problem of Phylogenetic Tree Construction
Problem: find the tree which best describes the relationship between a set of objects.
[Figure: example tree with leaves carrot, whale, chimpanzee, human]
Cladistics: systematic classification of groups of organisms on the basis of shared characteristics derived from a common ancestor. Assumptions: any group of organisms is related by descent from a common ancestor; there is a bifurcating (binary) pattern of cladogenesis; changes in characteristics occur in lineages over time.
Application Areas
Research in biology, linguistics, archaeology, ...
- The Tree of Life (Systematics):
1st generation: Linnaeus (1758) - independent of evolutionary history.
2nd generation: Lamarck, Darwin, Haeckel (1800s) - based on phylogenetic relationships (no objective criteria).
3rd generation: Zimmerman, Hennig et al. (1950s and 1960s) - phylogenies based on shared attributes ("character compatibility" models).
4th generation: many people (since the 1970s) - molecular sequence data available in huge quantities.
- The Indo-European tree of languages by Ringe, Warnow et al. (1995)
Indo-European Language Tree
Phylogenies with Protein Sequences
Peptide sequences of Triosephosphate Isomerase:
Spinach  CNGTKESITKLVSDLNSATLEAD--VDVVVAPPFVYIDQVKSSLTGRVEISA
Rice     CNGTTDQVDKIVKILNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQVAA
Mosquito MNGDKASIADLCKVLTTGPLNAD--TEVVVGCPAPYLTLARSQLPDSVCVAA
Monkey   MNGRKQNLGELIGTLNAAKVPAD--TEVVCAPPTAYIDFARQKLDPKIAVAA
Human    MNGRKQSLGELIGTLNAAKVPAD--TEVVCAPPTAYIDFARQKLDPKIAVAA
(Differences between Spinach and Rice = orange; differences between Monkey and Human = blue; gap = "-".)
Basis of phylogenetic inference: the more differences, the less related the species. Find the tree which best explains the differences.
The Least Squares Tree Problem
Problem: Least Squares Tree.
INPUT: the distance D_ij between species i and j, for each 1 <= i, j <= n, and a corresponding set of weights w_ij.
QUESTION: find the phylogenetic tree T, with the species as its leaves, that minimizes SSQ(T).
In general, finding the least squares tree is an NP-complete problem. Two polynomial-time heuristics exist: UPGMA and neighbor joining.
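UPGMA, the first of the two heuristics, greedily merges the pair of clusters with the smallest average leaf-to-leaf distance. A compact, unoptimized Python sketch (the leaf names echo the earlier example slide; the distance matrix is invented):

```python
def upgma(names, D):
    """names: leaf labels; D[i][j]: symmetric distance matrix.
    Returns a nested-tuple tree, merging closest clusters by average distance."""
    # Each cluster is (subtree, set of leaf indices).
    clusters = [(name, {i}) for i, name in enumerate(names)]
    while len(clusters) > 1:
        # Find the pair with the smallest average pairwise distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                A, B = clusters[a][1], clusters[b][1]
                avg = sum(D[i][j] for i in A for j in B) / (len(A) * len(B))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        merged = ((clusters[a][0], clusters[b][0]),
                  clusters[a][1] | clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]

names = ["human", "chimp", "whale", "carrot"]
D = [[0, 1, 4, 8],
     [1, 0, 4, 8],
     [4, 4, 0, 8],
     [8, 8, 8, 0]]
print(upgma(names, D))  # human and chimp merge first, then whale, then carrot
```

UPGMA implicitly assumes a constant evolutionary rate (ultrametric distances); neighbor joining drops that assumption, which is why both heuristics are taught.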
[Figure: UPGMA tree construction on six leaves. First, leaves 1 and 2 are joined into node 7 with branch lengths t_1 = t_2 = (1/2) d_12, and leaves 4 and 5 are joined at height (1/2) d_45. Next, leaf 3 is joined with node 7 into node 8 at height (1/2) d_37, and finally leaf 6 is joined with node 8 into node 9 at height (1/2) d_68.]
Parsimony
Site:     1 2 3 4 5 6
Aardvark  C A G G T A
Bison     C A G A C A
Chimp     C G G G T A
Dog       T G C A C T
Elephant  T G C G T A
Sankoff's DP algorithm
Step 1: for each node v and each state t compute the quantity S_t^c(v): the minimum cost of the subtree whose root is v, given that the state of character c at v is t (i.e. v_c = t). In postorder, for each leaf v:
S_t^c(v) = 0 if v_c = t, infinity otherwise.
For an internal node v with subnodes u and w:
S_t^c(v) = min_i { C_ti^c + S_i^c(u) } + min_j { C_tj^c + S_j^c(w) }
[Figure: node v in state t, with transitions t -> i to subnode u and t -> j to subnode w.]
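A small Python sketch of this recursion for a single character, under assumed inputs: the tree is given as nested pairs, the cost matrix is the unit substitution cost C_ti = 0 if t = i else 1, and the example tree topology for site 1 of the table above is a made-up choice:

```python
INF = float("inf")
STATES = "ACGT"

def cost(t, i):
    return 0 if t == i else 1  # assumed unit substitution cost C_ti

def sankoff(tree):
    """tree: a leaf state like 'C', or a pair (left, right).
    Returns {state t: S_t(root)} computed in postorder."""
    if isinstance(tree, str):                    # leaf: 0 for its state, inf otherwise
        return {t: (0 if t == tree else INF) for t in STATES}
    Su = sankoff(tree[0])                        # postorder: subnodes first
    Sw = sankoff(tree[1])
    return {t: min(cost(t, i) + Su[i] for i in STATES)
             + min(cost(t, j) + Sw[j] for j in STATES)
            for t in STATES}

# Site 1 of the table (C C C T T), on the assumed topology
# ((Aardvark, Bison), (Chimp, (Dog, Elephant))):
S = sankoff((("C", "C"), ("C", ("T", "T"))))
print(min(S.values()))  # -> 1: a single C->T substitution explains this site
```

Step 2 of the algorithm (not shown) traces back through the stored minima to recover the ancestral state assignments.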
Branch and Bound for Parsimony (cont'd)