Substitution Matrices and Alignment Statistics


BMI/CS 776
www.biostat.wisc.edu/~craven/776.html
Mark Craven, craven@biostat.wisc.edu
February 2002

Substitution Matrices
- two popular sets of matrices for protein sequences:
  - PAM matrices [Dayhoff et al. 1978]
  - BLOSUM matrices [Henikoff & Henikoff 1992]
- both try to capture the relative substitutability of amino acid pairs in the context of evolution

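Substitution Matrix Motivation
- consider the simplest alignment: an ungapped global alignment of two sequences $x$ and $y$ of length $n$
- in scoring this alignment, we'd like to assess the ratio of two probabilities:
  - $M$: the sequences have a common ancestor
  - $R$: the sequences are aligned by chance
- we'd like our substitution matrix to score an alignment by estimating this ratio

Substitution Matrices: Basic Idea
- let $q_a$ be the frequency of amino acid $a$
- consider the case where the alignment of $x$ and $y$ is random:
  $$P(x, y \mid R) = \prod_i q_{x_i} \prod_i q_{y_i}$$
- let $p_{ab}$ be the probability that $a$ and $b$ derived from a common ancestor
- then the case where the alignment is due to common ancestry is:
  $$P(x, y \mid M) = \prod_i p_{x_i y_i}$$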

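Substitution Matrices: Basic Idea (continued)
- the odds ratio of these two alternatives is given by:
  $$\frac{P(x, y \mid M)}{P(x, y \mid R)} = \frac{\prod_i p_{x_i y_i}}{\prod_i q_{x_i} \prod_i q_{y_i}} = \prod_i \frac{p_{x_i y_i}}{q_{x_i} q_{y_i}}$$
- taking the log, we get:
  $$\log \frac{P(x, y \mid M)}{P(x, y \mid R)} = \sum_i \log \frac{p_{x_i y_i}}{q_{x_i} q_{y_i}}$$
- the score for an alignment is thus given by:
  $$S = \sum_i s(x_i, y_i)$$
- the substitution matrix score for the pair $a$, $b$ is then given by:
  $$s(a, b) = \log \frac{p_{ab}}{q_a q_b}$$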
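The log-odds scoring above can be sketched in a few lines of Python. This is only an illustration: the two-letter alphabet, the probabilities, and the function names `log_odds_matrix` and `alignment_score` are all made up for the example and do not come from the slides.

```python
import math

def log_odds_matrix(p, q):
    """Return s(a, b) = log(p_ab / (q_a * q_b)) for every pair in p."""
    return {(a, b): math.log(p_ab / (q[a] * q[b])) for (a, b), p_ab in p.items()}

def alignment_score(x, y, s):
    """Score an ungapped global alignment as the sum of per-column scores."""
    if len(x) != len(y):
        raise ValueError("ungapped global alignment needs equal-length sequences")
    return sum(s[(a, b)] for a, b in zip(x, y))

# Toy two-letter alphabet; every number here is an invented assumption.
q = {"A": 0.5, "B": 0.5}                      # background frequencies q_a
p = {("A", "A"): 0.4, ("A", "B"): 0.1,        # joint probabilities p_ab
     ("B", "A"): 0.1, ("B", "B"): 0.4}

s = log_odds_matrix(p, q)
score = alignment_score("AAB", "ABB", s)

# The score equals the log of the odds ratio P(x, y | M) / P(x, y | R).
odds = (0.4 * 0.1 * 0.4) / (0.5 ** 6)
```

Summing logs rather than multiplying ratios keeps the score additive over alignment columns, which is what lets a substitution matrix be used as a simple lookup table.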

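PAM Matrices
- but how do we get values for $p_{ab}$ (the probability that $a$ and $b$ arose from a common ancestor)?
- it depends on how long ago the sequences diverged:
  - diverged recently: $p_{ab} \approx 0$ for $a \neq b$
  - diverged long ago: $p_{ab} \approx q_a q_b$
- PAM approach: estimate the probability that $b$ was substituted for $a$ in a given measure of evolutionary distance

PAM Matrices
- key idea: trusted alignments of closely related sequences provide information about biologically permissible mutations
- step 1: for 71 protein families, constructed hypothetical phylogenetic trees
- from the trees, filled matrix $A$ with the number of observed substitutions
- observe: [expression lost in the transcription]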

PAM Matrices
- step 2: from $A$, calculate the matrix containing
  $$P(b \mid a) = \frac{A_{ab}}{\sum_c A_{ac}}$$
- step 3: normalize this matrix so the expected number of substitutions is 1% of the protein (PAM-1):
  $$P(b \mid a, t = 1)$$

PAM Matrices
- there is a whole family of matrices: PAM-10, ..., PAM-250
- these matrices are extrapolated from the PAM-1 matrix (by matrix multiplication)
- PAM is a relative measure of evolutionary distance:
  - 1 PAM = 1 accepted mutation per 100 amino acids
  - 250 PAM = 2.5 accepted mutations per amino acid
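Steps 2 and 3 and the extrapolation to PAM-t can be sketched as follows. This is a rough illustration under stated assumptions, not Dayhoff's actual procedure: the 3-letter count matrix `A` and frequencies `q` are invented, the normalization rescales off-diagonal entries so that the expected substituted fraction is exactly 0.01, and the helper names are hypothetical.

```python
def conditional_from_counts(A):
    """Step 2: B[a][b] = A[a][b] / sum_c A[a][c]."""
    n = len(A)
    return [[A[a][b] / sum(A[a]) for b in range(n)] for a in range(n)]

def normalize_to_pam1(B, q, rate=0.01):
    """Step 3 (assumed scheme): scale off-diagonal entries so the expected
    fraction of substituted residues is `rate`, then set the diagonal so
    each row sums to 1."""
    n = len(B)
    expected = sum(q[a] * sum(B[a][b] for b in range(n) if b != a)
                   for a in range(n))
    scale = rate / expected
    P = [[scale * B[a][b] if a != b else 0.0 for b in range(n)]
         for a in range(n)]
    for a in range(n):
        P[a][a] = 1.0 - sum(P[a])
    return P

def matmul(X, Y):
    """Plain square-matrix product."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def pam(P1, t):
    """Extrapolate PAM-t as the t-th power of the PAM-1 matrix."""
    n = len(P1)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = matmul(result, P1)
    return result

# Invented substitution counts and background frequencies (3-letter alphabet).
A = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]
q = [0.5, 0.3, 0.2]

P1 = normalize_to_pam1(conditional_from_counts(A), q)
P250 = pam(P1, 250)
```

Because each row of `P1` is a probability distribution, every power of it is too, so `P250` can be read directly as $P(b \mid a, t = 250)$.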

PAM Matrices
- step 4: determine the substitution matrix:
  $$s(a, b \mid t) = \log \frac{P(b \mid a, t)}{q_b}$$

BLOSUM Matrices
- similar idea to PAM matrices
- probabilities estimated from more distantly related proteins:
  - blocks of sequence fragments that represent structurally conserved regions
- transition frequencies observed directly by identifying blocks that are at least
  - 45% identical (BLOSUM-45)
  - 50% identical (BLOSUM-50)
  - 62% identical (BLOSUM-62)
  - etc.
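To make "frequencies observed directly" concrete, here is a deliberately simplified sketch that counts aligned residue pairs in the columns of one ungapped block and turns them into log-odds scores of the form $\log(p_{ab} / (q_a q_b))$. It is an assumption-laden toy: it omits the identity-threshold clustering that distinguishes BLOSUM-45 from BLOSUM-62, the block is invented, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def pair_probabilities(block):
    """Estimate p_ab from all residue pairs within each column of a block.
    Each unordered pair is counted in both orders so that p is symmetric."""
    counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

def background(p):
    """Background frequency q_a as the marginal of p_ab."""
    q = Counter()
    for (a, _), value in p.items():
        q[a] += value
    return dict(q)

def log_odds(p, q):
    """s(a, b) = log(p_ab / (q_a * q_b))."""
    return {(a, b): math.log(v / (q[a] * q[b])) for (a, b), v in p.items()}

# Invented block of three aligned fragments over a two-letter alphabet.
block = ["AAB", "AAB", "ABB"]

p = pair_probabilities(block)
q = background(p)
s = log_odds(p, q)
```

In this toy block, identical pairs are more common than chance and mixed pairs less common, so matches score positive and mismatches negative.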

PAM 250 Matrix
[figure: the PAM-250 substitution matrix]

DNA vs. Protein Comparison
- if the sequence of interest encodes a protein, compare at the protein sequence level:
  - many changes in DNA sequences do not change the encoded protein
  - substitution matrices for protein sequences represent biochemical information

Statistics of Alignment Scores (how to choose a threshold for S)
- for a given $S$, we can calculate the probability that we would get a match with score $> S$ under a random model (where we are aligning a large number of unrelated sequences)
- now turn this around: set $S$ so that this probability is small; the matches we get are then likely to be significant

Distribution of Scores
- Karlin & Altschul, PNAS, 1990
- consider a random model in which
  - we are looking for HSPs (high-scoring ungapped local alignments)
  - the lengths of the sequences in each pair are $m$ and $n$
- the probability that there is an HSP with score greater than $S$ is given by:
  $$P(\text{score} > S) = 1 - e^{-K m n e^{-\lambda S}}$$
- this comes from an extreme value distribution
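The HSP probability and the "turn this around" step can be sketched as below. The values of `K`, `lam`, `m`, and `n` are placeholders chosen only so the example runs (real values depend on the substitution matrix and amino acid frequencies), and both function names are hypothetical.

```python
import math

def hsp_p_value(S, m, n, K, lam):
    """P(score > S) = 1 - exp(-K * m * n * exp(-lam * S))."""
    expected_hsps = K * m * n * math.exp(-lam * S)
    # -expm1(-x) computes 1 - exp(-x) accurately when x is small.
    return -math.expm1(-expected_hsps)

def score_threshold(alpha, m, n, K, lam):
    """Invert the formula: the S at which P(score > S) equals alpha."""
    return math.log(K * m * n / -math.log1p(-alpha)) / lam

# Placeholder constants and sequence lengths (assumptions, not fitted values).
K, lam = 0.1, 0.3
m, n = 250, 1000

S_min = score_threshold(0.05, m, n, K, lam)
p_at_threshold = hsp_p_value(S_min, m, n, K, lam)
```

The inversion follows from solving $K m n e^{-\lambda S} = -\ln(1 - \alpha)$ for $S$, so any score above `S_min` has a chance probability below `alpha` under this model.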

Distribution of Scores
  $$P(\text{score} > S) = 1 - e^{-K m n e^{-\lambda S}}$$
- $S$ is a given score threshold
- $m$ and $n$ are the lengths of the sequences under consideration
- $K$ and $\lambda$ are constants that can be calculated from
  - the substitution matrix
  - the frequencies of the individual amino acids

Statistics of Alignment Scores
- given this, set $S$ so that the probability of getting a score $> S$ by chance is very small (0.05 or less)
- this analysis assumes
  - ungapped alignments
  - all residues drawn independently
  - the expected score for a pair of randomly chosen residues is negative:
    $$\sum_{i,j=1}^{20} q_i q_j \, s(i, j) < 0$$
- computational experiments suggest the analysis holds for gapped alignments (but $K$ and $\lambda$ must be estimated from data)
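As a hedged illustration of how $\lambda$ can be obtained from the matrix and the residue frequencies: in the Karlin-Altschul theory, $\lambda$ is the unique positive solution of $\sum_{i,j} q_i q_j e^{\lambda s(i,j)} = 1$, which exists when the expected score is negative and at least one score is positive. The slides do not spell this equation out, so treat the sketch below, its two-letter scoring scheme, and its function names as assumptions.

```python
import math

def expected_score(q, s):
    """Expected score for a pair of independently drawn residues."""
    return sum(q[a] * q[b] * s[a][b] for a in q for b in q)

def solve_lambda(q, s, iterations=100):
    """Find the positive root of sum_ab q_a q_b exp(lam * s_ab) = 1 by bisection."""
    if expected_score(q, s) >= 0:
        raise ValueError("expected score must be negative")
    if max(s[a][b] for a in q for b in q) <= 0:
        raise ValueError("at least one score must be positive")

    def f(lam):
        return sum(q[a] * q[b] * math.exp(lam * s[a][b])
                   for a in q for b in q) - 1.0

    hi = 1.0
    while f(hi) <= 0:        # grow until we are past the positive root
        hi *= 2
    lo = hi
    while f(lo) >= 0:        # shrink until we are between 0 and the root
        lo /= 2
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Invented two-letter scoring scheme: match +1, mismatch -2.
q = {"A": 0.5, "B": 0.5}
s = {"A": {"A": 1, "B": -2}, "B": {"A": -2, "B": 1}}

mean = expected_score(q, s)
lam = solve_lambda(q, s)
```

For this toy scheme the equation reduces to $u^3 - 2u^2 + 1 = 0$ with $u = e^{\lambda}$, whose relevant root is the golden ratio, which gives a handy check on the bisection.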