Organizational Details

Organizational Details
Lecture: Tue 13-14, Thu 10-12 in DI 205
Exercises: Thu 16:15-18:00, lab room, Schanzenstrasse. Mainly programming in Matlab/Octave. Participation is voluntary.
Exercise sheets go online each Tue after the lecture; hand-in is Thu before the exercise session, in courses.
Requirement for exam admission: 75% of the exercise sheets meaningfully completed.
The exercises are discussed one week later.
Material: CS253 http://informatik.unibas.ch/ Lehre/Teaching
Registration: MOnA (course), courses (exercises)
WiRe 12 V. Roth 1

Chapter 0 Overview

Linear Systems of Equations
The solution set for the equations x − y = −1 and 3x + y = 9 is the single point (2, 3).
The solution set for two equations in three variables is usually a line.

Some examples
The equations 3x + 2y = 6 and 3x + 2y = 12 are inconsistent.
The equations x − 2y = −1, 3x + 5y = 8, and 4x + 3y = 7 are not linearly independent: the third is the sum of the first two.

Numerical Methods for Linear Systems
Direct solution methods:
- Gauss-Jordan elimination with pivoting
- Matrix factorizations (LU, Cholesky)
- Quantifying inaccuracy: conditioning
Iterative solution methods:
- Jacobi iterative improvement
Over-determined systems: singular value decomposition
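As a rough illustration of the first direct method, here is a minimal pure-Python sketch of Gauss-Jordan elimination with partial (row) pivoting. The function name `gauss_jordan_solve` and the singularity threshold `1e-12` are assumptions made for this example, not part of the lecture material; the course exercises themselves use Matlab/Octave.

```python
def gauss_jordan_solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting.

    A is a list of n rows (each a list of n numbers), b a list of n numbers.
    Sketch only: dense, O(n^3), no scaling of rows.
    """
    n = len(A)
    # Work on a copy of the augmented matrix [A | b].
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # assumed threshold
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Normalize the pivot row so that the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate the column from every other row (above and below).
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


# The system x - y = -1, 3x + y = 9 has the single solution (2, 3).
solution = gauss_jordan_solve([[1, -1], [3, 1]], [-1, 9])
```

For the pair 3x + 2y = 6 and 3x + 2y = 12 the same routine finds no usable pivot in the second column and raises `ValueError`. The routine cannot tell an inconsistent system from an underdetermined one; it only detects that the matrix is singular.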

LU factorization
Ax = b becomes LUx = b, which is equivalent to
- Ly = b, solved by forward substitution, followed by
- Ux = y, solved by back substitution
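The two triangular solves can be sketched as follows. This is a minimal example that assumes a Doolittle factorization (unit diagonal in L) without pivoting, so it only works when no zero pivot appears; all function names are chosen for illustration.

```python
def lu_decompose(A):
    """Doolittle factorization A = L U (L has a unit diagonal), no pivoting."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Row i of U.
        for j in range(i, n):
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        # Column i of L (fails with ZeroDivisionError on a zero pivot).
        for j in range(i + 1, n):
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U


def forward_substitution(L, b):
    """Solve L y = b for lower-triangular L, top row first."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def back_substitution(U, y):
    """Solve U x = y for upper-triangular U, bottom row first."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x


def lu_solve(A, b):
    L, U = lu_decompose(A)
    return back_substitution(U, forward_substitution(L, b))


# 4x + 3y = 10 and 6x + 3y = 12 give x = 1, y = 2.
x = lu_solve([[4.0, 3.0], [6.0, 3.0]], [10.0, 12.0])
```

Once L and U are known, each additional right-hand side b costs only the two substitutions, which is the usual reason to prefer a factorization over repeated elimination.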

Singular Value Decomposition
An ill-conditioned system Ax = b may have a direct solution, but this may be only a poor approximation of the exact solution x.
Better: use the SVD and zero the small singular values, i.e. drop their contribution to the solution.
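Written out as a sketch: assume the decomposition $A = U \Sigma V^{T}$ with singular values $\sigma_i$ and singular vectors $u_i$, $v_i$, and a cut-off $\tau$ below which a singular value counts as "small". The symbol $\tau$ and its choice are assumptions of this note, not given on the slide.

```latex
\[
A = \sum_{i} \sigma_i \, u_i v_i^{T},
\qquad
\hat{x} = \sum_{i \,:\, \sigma_i > \tau} \frac{u_i^{T} b}{\sigma_i} \, v_i .
\]
```

Terms with $\sigma_i \le \tau$ are simply left out, so the division by a tiny $\sigma_i$, which would amplify rounding errors in $b$, never takes place.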

Application Example: Modelling face images

The two main problems in supervised learning

Least squares problem

SVD and LS problem: sales pitch
The SVD method is
- powerful
- convenient
- intuitive
- numerically advantageous
Problems with ill-conditioning can be circumvented automatically.
The SVD can solve problems for which both the normal equations and other factorizations fail.

Classification
Classification: find class boundaries in training data {(x_1, y_1), ..., (x_n, y_n)} by learning discriminants (supervised learning).
[Figure: scatter plot of salmon and sea bass; horizontal axis: lightness, vertical axis: width.]
FIGURE 1.4. The two features of lightness and width for sea bass and salmon. The dark line could serve as a decision boundary of our classifier. Overall classification error on the data shown is lower than if we use only one feature as in Fig. 1.3, but there will still be some errors. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.

Fisher's Linear Discriminant Analysis

Fisher's discriminant and least squares
Remark: The Fisher vector $\hat{c}_F = \Sigma_W^{-1}(m_1 - m_2)$ coincides (up to a scaling factor) with the solution of the LS problem $\hat{c}_{LS} = \arg\min_c \|Ac - b\|^2$ if
$n_1$ = # samples in class 1, $n_2$ = # samples in class 2,
$b = (+1/n_1, \dots, +1/n_1, -1/n_2, \dots, -1/n_2)^t$, $A = (x_1, \dots, x_{n_1}, x_{n_1+1}, \dots, x_{n_1+n_2})^t$,
with $\sum_{i=1}^{n} x_i = 0$ (i.e. origin in sample mean).
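The remark can be checked numerically on a small two-dimensional data set. The sketch below is only an illustration: the sample points are made up, the within-class matrix is taken as the unnormalized sum of the two class scatter matrices (an assumption about how Σ_W is defined), and the least-squares problem is solved through the 2x2 normal equations rather than the SVD.

```python
def solve2(M, r):
    """Solve the 2x2 system M c = r by Cramer's rule."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return ((r[0] * d - b * r[1]) / det, (a * r[1] - r[0] * c) / det)


def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """Sum of outer products (x - m)(x - m)^t over all points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        dev = (p[0] - m[0], p[1] - m[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += dev[i] * dev[j]
    return s


def fisher_vector(class1, class2):
    """Within-class scatter inverse times (m1 - m2)."""
    m1, m2 = mean(class1), mean(class2)
    s1, s2 = scatter(class1, m1), scatter(class2, m2)
    sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    return solve2(sw, (m1[0] - m2[0], m1[1] - m2[1]))


def ls_vector(class1, class2):
    """Least-squares solution of A c = b with targets +1/n1 and -1/n2."""
    m = mean(class1 + class2)
    # Rows of A: samples shifted so that the overall sample mean is the origin.
    rows = [(p[0] - m[0], p[1] - m[1]) for p in class1 + class2]
    b = [1.0 / len(class1)] * len(class1) + [-1.0 / len(class2)] * len(class2)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(2)] for i in range(2)]
    atb = tuple(sum(r[i] * t for r, t in zip(rows, b)) for i in range(2))
    return solve2(ata, atb)


# Made-up sample points for the two classes.
class1 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0)]
class2 = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0), (9.0, 9.0)]
c_fisher = fisher_vector(class1, class2)
c_ls = ls_vector(class1, class2)
```

The two vectors point in the same direction; their lengths differ, which does not matter for a discriminant direction.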

Example: Secondary Structure Prediction in Proteins
Approach: Fisher's discriminant -> least squares -> SVD

Linear Programming (LP)
Linear programming, sometimes called linear optimization, solves the following problem: for d independent variables $x_1, \dots, x_d$, maximize
$z = c_1 x_1 + c_2 x_2 + \dots + c_d x_d = cx$  (1)
subject to the constraints
$Ax \le b$  (2)
A is an n × d matrix, c and x are d-dimensional vectors, and b is an n-dimensional vector.

Example: Simplex
[Figure: feasible region in the (x, y) plane, bounded by the lines u = 0, v = 0, w = 0, x = 0 and y = 0.]

Optimization without Gradients
Optimization with gradient information: steepest descent, conjugate gradients, Newton, etc. (will be covered in the Numerics course).
Sometimes direct methods without gradient information are needed:
- the functional is not differentiable,
- gradients are difficult to compute,
- gradient-based optimization is problematic due to many local minima.
Example: image registration.
Proposed method: Downhill Simplex.
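For orientation, here is a simplified sketch of the downhill simplex idea (also known as the Nelder-Mead method). It uses only function values: reflect the worst vertex, expand if that pays off, otherwise contract, and shrink the whole simplex as a last resort. The coefficients (1, 2, 0.5), the stopping rule and all names are assumptions of this example; production implementations handle more cases.

```python
def nelder_mead(f, x0, step=1.0, tol=1e-8, max_iter=5000):
    """Minimize f over R^n starting from x0, without using gradients."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        p = list(x0)
        p[i] += step
        simplex.append(p)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        # Stop once the simplex has collapsed to (almost) a point.
        size = max(abs(p[i] - best[i]) for p in simplex for i in range(n))
        if size < tol:
            break
        # Centroid of all vertices except the worst one.
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point on the line through the centroid and the worst vertex.
            return [centroid[i] + t * (worst[i] - centroid[i]) for i in range(n)]

        reflected = along(-1.0)
        if f(reflected) < f(best):
            expanded = along(-2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway toward the best one.
                simplex = [best] + [
                    [best[i] + 0.5 * (p[i] - best[i]) for i in range(n)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)


# Smooth test function with its minimum at (1, -2).
x_min = nelder_mead(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2, [0.0, 0.0])
```

In image registration the objective would be a similarity measure between the reference and the transformed floating image, evaluated as a black box in exactly this way.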

Example: Non-rigid Multi-modal Registration
Non-rigid, multi-modal MR-CT registration (ear).
As the CT images generally have fewer geometric distortions, they should be taken as the reference image; the MR image is taken as the floating image.
[Figure panels: Original MR; Original CT with MR contour; Registered MR; CT with registered MR contour.]

Dynamic Programming
R. Bellman began the systematic study of dynamic programming in 1955.
The word "programming" refers to the use of a tabular solution method.
DP typically applies to optimization problems in which subproblems of the same form arise repeatedly.
Key technique: store the solution to each subproblem in case it should reappear.

Examples: Optimal Binary Search Trees
Keys:           -5     1      8      7      13    21
Probabilities:  1/8    1/32   1/16   1/32   1/4   1/2
[Figure: binary search tree with root 8; second level 1 and 13; third level -5, 7 and 21.]
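A possible DP formulation, given as a sketch: the cost of the best tree on a contiguous range of sorted keys is the total probability of that range plus the best split into a left and a right subtree, and each range is solved only once. The cost convention (expected number of comparisons, root at depth 1, successful searches only) is an assumption of this example.

```python
from fractions import Fraction
from functools import lru_cache


def optimal_bst_cost(probs):
    """Minimum expected search cost for keys given in sorted order.

    probs[k] is the access probability of the k-th smallest key.
    """
    n = len(probs)
    # prefix[j] - prefix[i] is the total probability of keys i .. j-1.
    prefix = [0]
    for p in probs:
        prefix.append(prefix[-1] + p)

    @lru_cache(maxsize=None)  # store each subproblem in case it reappears
    def cost(i, j):
        if i >= j:
            return 0
        weight = prefix[j] - prefix[i]
        # Try every key r in the range as root of the subtree.
        return weight + min(cost(i, r) + cost(r + 1, j) for r in range(i, j))

    return cost(0, n)


# Keys from the slide in sorted order: -5, 1, 7, 8, 13, 21.
probabilities = [Fraction(1, 8), Fraction(1, 32), Fraction(1, 32),
                 Fraction(1, 16), Fraction(1, 4), Fraction(1, 2)]
best_cost = optimal_bst_cost(probabilities)
```

Under this cost convention the result for the slide's probabilities is 63/32. Without the cache the recursion would revisit the same ranges exponentially often; with it there are only O(n^2) ranges and O(n^3) work in total.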

DP for comparing biological sequences
Theory: during the course of evolution, mutations occurred, creating differences between families of contemporary species.
Point mutations:
- Insertion: insertion of one or several letters into the sequence.
- Deletion: deleting a letter (or more) from the sequence.
- Substitution: replacing a sequence letter by another.
When we compare two sequences, we are looking for evidence that they have diverged from a common ancestor by a mutation process.
How can similarity be formalized?

Sequence Alignment
Definition 1. (informal) Aligning two sequences x = x_1 ... x_n and y = y_1 ... y_m: (i) insert spaces, (ii) place the resulting sequences one above the other so that every character or space has a counterpart in the other sequence.
Example: sequences ACBCDDDB and CADBDAD.
One possible alignment:
A C - - B C D D D B
- C A D B - D A D -
Another one:
- A C B C D D D B
C A D B D A D - -

The Finite State Automaton (FSA) model
$M(i,j) = \max \{ M(i-1,j-1) + s(x_i,y_j),\ I_x(i-1,j-1) + s(x_i,y_j),\ I_y(i-1,j-1) + s(x_i,y_j) \}$
$I_x(i,j) = \max \{ M(i-1,j) - d,\ I_x(i-1,j) - e \}$
$I_y(i,j) = \max \{ M(i,j-1) - d,\ I_y(i,j-1) - e \}$
Assumption: a deletion will not be followed directly by an insertion. This is always guaranteed if $-d - e$ is less than the lowest mismatch score.
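The three-state recurrence can be turned into a short scoring routine. This is a sketch under stated assumptions: d is read as the gap-open penalty and e as the gap-extension penalty, the alignment is global with end gaps penalized like any other gap, and the scoring function and parameter values in the example are made up.

```python
NEG_INF = float("-inf")


def affine_gap_score(x, y, s, d, e):
    """Best global alignment score with gap-open d and gap-extend e.

    M[i][j]:  best score with x[i-1] aligned to y[j-1]
    Ix[i][j]: best score with x[i-1] aligned to a gap
    Iy[i][j]: best score with y[j-1] aligned to a gap
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    # Boundary (assumed): a leading gap of length k costs d + (k - 1) * e.
    for i in range(1, n + 1):
        Ix[i][0] = -d - (i - 1) * e
    for j in range(1, m + 1):
        Iy[0][j] = -d - (j - 1) * e
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score = s(x[i - 1], y[j - 1])
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + score
            Ix[i][j] = max(M[i - 1][j] - d, Ix[i - 1][j] - e)
            Iy[i][j] = max(M[i][j - 1] - d, Iy[i][j - 1] - e)
    return max(M[n][m], Ix[n][m], Iy[n][m])


def simple_score(a, b):
    """Hypothetical scoring: +1 for a match, -1 for a mismatch."""
    return 1.0 if a == b else -1.0


# With d = 2 and e = 0.5 we have -d - e = -2.5 < -1, so the assumption holds.
score = affine_gap_score("ACGT", "AT", simple_score, d=2.0, e=0.5)
```

Here the best alignment pairs A with A and T with T around a single gap of length two, for a score of 1 + 1 - 2 - 0.5 = -0.5.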

Example FSA alignment

What is Phylogenetics?
A: most recent common ancestor of bird and jellyfish
X: portion of history shared by bird and jellyfish

The Problem of Phylogenetic Tree Construction
Problem: find the tree which best describes the relationship between a set of objects.
[Figure: example tree with leaves carrot, whale, chimpanzee, human.]
Cladistics: systematic classification of groups of organisms on the basis of shared characteristics derived from a common ancestor.
Assumptions:
- Any group of organisms is related by descent from a common ancestor.
- There is a bifurcating (binary) pattern of cladogenesis.
- Changes in characteristics occur in lineages over time.

Application Areas
Research in biology, linguistics, archaeology, ...
- The Tree of Life (systematics):
  1st generation: Linnaeus (1758). Independent of evolutionary history.
  2nd generation: Lamarck, Darwin, Haeckel (1800s). Based on phylogenetic relationships (no objective criteria).
  3rd generation: Zimmermann, Hennig et al. (1950s and 1960s). Phylogenies based on shared attributes (character compatibility models).
  4th generation: many people (since the 1970s). Molecular sequence data available in huge quantities.
- The Indo-European tree of languages by Ringe, Warnow et al. (1995)

Indo-European Language Tree

Phylogenies with Protein Sequences
Peptide sequences of Triosephosphate Isomerase:
Spinach   CNGTKESITKLVSDLNSATLEAD--VDVVVAPPFVYIDQVKSSLTGRVEISA
Rice      CNGTTDQVDKIVKILNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQVAA
Mosquito  MNGDKASIADLCKVLTTGPLNAD--TEVVVGCPAPYLTLARSQLPDSVCVAA
Monkey    MNGRKQNLGELIGTLNAAKVPAD--TEVVCAPPTAYIDFARQKLDPKIAVAA
Human     MNGRKQSLGELIGTLNAAKVPAD--TEVVCAPPTAYIDFARQKLDPKIAVAA
(Differences between spinach and rice = orange, differences between monkey and human = blue, gap = - ).
Basis of phylogenetic inference: the more differences, the less closely related the species are. Find the tree which best explains the differences.

The Least Squares Tree Problem
Problem: Least Squares Tree.
INPUT: the distance $D_{ij}$ between species i and j, for each $1 \le i, j \le n$, and a corresponding set of weights $w_{ij}$.
QUESTION: find the phylogenetic tree T, with the species as its leaves, that minimizes SSQ(T).
In general, finding the least squares tree is an NP-complete problem.
Two polynomial-time heuristics: UPGMA and Neighbor-Joining.

[Figure: successive clustering steps on five leaves 1 to 5. Leaves 1 and 2 are joined at node 6 with t_1 = t_2 = 1/2 d_12; leaves 4 and 5 are joined at node 7 at height 1/2 d_45.]

[Figure, continued: leaf 3 and node 7 are joined at node 8 at height 1/2 d_37; nodes 6 and 8 are joined at node 9 at height 1/2 d_68.]
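UPGMA, one of the two heuristics named for the least squares tree problem, can be sketched as follows: repeatedly join the two closest clusters, place the new node at half their distance, and define the distance from the merged cluster to any other cluster as the size-weighted average. The data layout (a dict keyed by `frozenset` pairs, trees as nested tuples) and the toy distances are assumptions of this example.

```python
from itertools import combinations


def upgma(dist):
    """Build a rooted tree from pairwise leaf distances.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    Returns (root, height): the tree as nested tuples and a dict with the
    height of every node (leaves have height 0).
    """
    d = dict(dist)
    size, height = {}, {}
    for pair in dist:
        for leaf in pair:
            size[leaf] = 1
            height[leaf] = 0.0
    active = set(size)
    while len(active) > 1:
        # Closest pair of active clusters (sorted only for reproducible ties).
        a, b = min(combinations(sorted(active, key=str), 2),
                   key=lambda pair: d[frozenset(pair)])
        new = (a, b)
        size[new] = size[a] + size[b]
        height[new] = d[frozenset((a, b))] / 2.0
        active -= {a, b}
        for c in active:
            # Average distance, weighted by the number of leaves in a and b.
            d[frozenset((new, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / size[new]
        active.add(new)
    root = active.pop()
    return root, height


# Toy ultrametric distances (made up for illustration).
toy = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0, frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0, frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
tree, heights = upgma(toy)
```

On the toy data A and B are joined first at height 1, then C at height 3, then D at height 5. UPGMA recovers the correct tree only when the distances behave like a molecular clock (ultrametric); Neighbor-Joining drops that requirement.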

Parsimony
Site:      1 2 3 4 5 6
Aardvark   C A G G T A
Bison      C A G A C A
Chimp      C G G G T A
Dog        T G C A C T
Elephant   T G C G T A

Sankoff's DP algorithm
Step 1: for each node v and each state t, compute the quantity $S_t^c(v)$: the minimum cost of the subtree whose root is v, given that the state of character c at v is t (i.e. $v_c = t$).
In postorder, for each leaf v:
$S_t^c(v) = 0$ if $v_c = t$, and $S_t^c(v) = \infty$ otherwise.
For an internal node v with subnodes u and w:
$S_t^c(v) = \min_i \{ C_{ti}^c + S_i^c(u) \} + \min_j \{ C_{tj}^c + S_j^c(w) \}$
[Figure: node v with $v_c = t$; edge t -> i to subnode u with $u_c = i$; edge t -> j to subnode w with $w_c = j$.]
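Step 1 can be sketched as a postorder recursion over the tree. Two things in this example are assumptions and not taken from the slides: the tree topology `TREE` (the slides give the character table but no tree in text form) and the unit cost matrix (0 for no change, 1 for any substitution). The character data is the Aardvark to Elephant table shown earlier.

```python
INF = float("inf")
STATES = "ACGT"


def unit_cost(t, i):
    """Assumed cost C_ti: 0 if the state is kept, 1 for any change."""
    return 0 if t == i else 1


def sankoff(tree, leaf_state, cost=unit_cost):
    """Return {t: S_t(v)} for the root v of `tree`, for one character.

    A tree is either a leaf name (str) or a pair (left, right).
    leaf_state maps each leaf name to its observed state.
    """
    if isinstance(tree, str):
        return {t: (0 if leaf_state[tree] == t else INF) for t in STATES}
    u, w = tree
    # Postorder: both subtrees are solved before the node itself.
    s_u = sankoff(u, leaf_state, cost)
    s_w = sankoff(w, leaf_state, cost)
    return {
        t: min(cost(t, i) + s_u[i] for i in STATES)
        + min(cost(t, j) + s_w[j] for j in STATES)
        for t in STATES
    }


ALIGNMENT = {
    "Aardvark": "CAGGTA",
    "Bison": "CAGACA",
    "Chimp": "CGGGTA",
    "Dog": "TGCACT",
    "Elephant": "TGCGTA",
}
# Hypothetical topology, chosen only to have something to evaluate.
TREE = (("Aardvark", "Bison"), ("Chimp", ("Dog", "Elephant")))


def parsimony_scores(tree, alignment):
    """Minimum cost per site: the best root state for each character."""
    n_sites = len(next(iter(alignment.values())))
    scores = []
    for c in range(n_sites):
        states = {name: seq[c] for name, seq in alignment.items()}
        scores.append(min(sankoff(tree, states).values()))
    return scores


site_scores = parsimony_scores(TREE, ALIGNMENT)
total_score = sum(site_scores)
```

For the assumed topology and unit costs the six sites cost 1, 1, 1, 2, 2 and 1, so the tree has parsimony score 8. A different topology or cost matrix will generally give a different score, which is what a search over trees (for example by branch and bound) compares.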

Branch and Bound for Parsimony (cont'd)