Computational Biology Lecture 8: Substitution matrices Saad Mneimneh

Size: px
Start display at page:

Download "Computational Biology Lecture 8: Substitution matrices Saad Mneimneh"

Transcription

1 Computatonal Bology Lecture 8: Substtuton matrces Saad Mnemneh As we have ntroduced last tme, smple scorng schemes lke + or a match, - or a msmatch and -2 or a gap are not justable bologcally, especally or amno acd sequences (protens). Instead, more elaborated scorng unctons are used. These scores are usually obtaned as a result o analyzng chemcal propertes and statstcal data or amno acds and DNA sequences. For example, t s known that same sze amno acds are more lkely to be substtuted by one another. Smlarly, amno acds wth same anty to water are lkely to serve the same purpose n some cases. On the other hand, some mutatons are not acceptable (may lead to demse o the organsm). PAM and BLOSUM matrces are amongst results o such analyss. We wll see the technques through whch PAM and BLOSUM matrces are obtaned. Substrtuton matrces Chemcal propertes o amno acds govern how the amno acds substtue one another. In prncple, a substrtuton matrx s, where s j s used to score algnng character wth character j, should relect the probablty o two characters substtung one another. The queston s how to buld such a probablty matrx that closely maps realty? Derent strateges result n derent matrces but the central dea s the same. I we go back to the concept o a hgh scorng segment par, theory tells us that the algnment (ungapped) gven by such a segment s governed by a lmtng dstrbuton such that where: s s the substuton matrx used q j = e λsj q j s the probablty o observng character algned wth character j p s the probablty o occurrence o character Thereore, s j = λ ln q j Ths ormula or s j suggests a way to constrcut the matrx s. I hgh scorng algnments are to be real, {q j } represent the desred probabltes o substtons, whle {p } represent the background probabltes o occurrence. By observng related amles o sequences, one could estmate q and p and hence obtan the matrx s by usng some scalng actor λ. Note that λ,j s j =,j ln q j =,j p j ln q j p j Inormaton theory tells us that the above sum s strctly less than 0 p q, whch s a desred property. Also, note that the score S n bts o a gven segment (see prevous lecture) s S = log eλs K Snce S s a sum o terms o the orm λ ln q j, e λs s a product o terms o the orm q j. Thereore, S relect the log lkelhood o observng the algnment due to substtuton (governed probablstcally by q) relatve to smply by chance (governed probablstcally by p). Dvson by the constant K adjusts the score to account or the rate o observng maxmal scorng segments as descrbed n the prevous lecture.

2 Here s another ntutve approach that justes the above scheme. To construct a substtuton matrx to score proten algnments, a amly o protens can be consdered, and a multple algnment o all the proten sequences n the amly s obtaned. Agan, we are consderng algnments wth no gaps; thereore, we assume sequences have the same length (a vald assumpton they are related) and, thereore, the multple algnment s trval. or any par o amno acds and j we need q j, the probablty o observng algned wth j (whch s same as q j ), and p, the probablty o observng an. The queston s, n an algnment (ungapped) o sequences x and y that algns two o ther amno acds and j, dd ths happen by chance or was ndeed because o a mutaton rom to j or vce versa? To capture ths complementary behavors, we consder two models: M, where x and y are related and obtaned accordng to the jont probabltes q j, and R, where x and y are unrelated and obtaned ndependently at random accordng to the ndvdual probabltes p and p j. Consderng ths, now the score s the lkelhood that the sequences are related compared relatve to them beng unrelated. Ths s called the odds rato and s mathematcally expressed as: P (x, y M) score(x, y) = P (x, y R) = q x y p x p. y Ths ormula says that the score o the (ungapped) algnment s the probablty that the symbols o x and y are algned because they are related, relatve to the probablty o ther symbols beng algned just by chance. For two algned amno acds and j, we take s pont o vew, what s the probablty to see a j on the other sde? Well, ths s the probablty that an mutated nto a j, p( j). However, there s a mere chance o p j or a random occurence o a j as well. Hence, the probablty rato p( j) p j relects how much beleves that ths j s related to t. Now, snce q j = = p p( j) (the probablty o observng an and ts mutated orm), we can also express the lkelhood as: q j p( j) = p j. By dong ths or every par o algned symbols n the algnment and ndng the product o the terms, we obtan the ormula above, whch relect how much we beleve that the two entre sequences are related. In all the algnment algorthms we have seen so ar, we reled on the act that the score s addtve, and ths was a key property or the dynamc programmng to work. In ths case - when the score s computed as,j multplcatve. In order to make t addtve we can take the log and compute: log,j =,j log q x y. Ths s called the log-odds rato. Thereore, the values makng up the sum wll be the ndvdual scores ound n the matrx, hence s j = log qj up to some scalng actor. Note that ths s symmetrc, so scorng algned wth j s the same as scorng j algned wth, hence, the drecton o the algnment s not mportant (but one could n prncple make a dstncton needed). Now the mportant queston: how to compute p, p j, and q j? We re gong to look at two ways o computng ths: PAM and BLOSUM matrces. PAM (Pont Accepted Mutatons) matrces q x y q x y - t s PAM stands or Pont Accepted Mutatons. An accepted mutaton s dened as a mutaton that was postvely selected by the envronment and dd not caused the demse o the organsm. A PAM matrx M holds the probablty o beng replaced by j n a certan evolutonary tme perod. The longer the evolutonary perod o tme, the more dcult t s to determne the correct values. The reason beng that could mutate several tmes beore becomng a j, and t wll be hard to capture all these ntermedate mutatons, snce we only observe and j. What we are gong to do s look over mutatons that occurred n a relatvely short evolutonary perod o tme. One unt o evoluton s dened to be the amount o evoluton that changes, on the average, n 00 amno acds. Consderng ths unt, a -PAM matrx s rst computed. Usng ths as a startng pont, a k-pam matrx can be generated rom the -PAM matrx. For a -PAM matrx M, M j s gong to be p( j) scaled by a actor, such that the expected number o mutatons s 0.0; n other words t s the same as havng the probablty o n 00 or a mutaton to occur. The computatonal steps that lead to the -PAM matrx are: 2

3 Compute p or every. Compute p( j) or every par and j and let M j = p( j). Scale M such that the expected number o mutatons p ( M ) s 0.0. M Use s j = 0 log j 0 p j to obtan the addtve scores. s j s rounded to an nteger and here, the scalng actor 0 s used just to provde a better nteger approxmaton. Next we ll take a closer look at each o these steps. Let the requency count j be the number o tmes s algned wth j countng both drectons. Then, let the number o occurrences o, = j j, and the count o all characters =. Now, we can estmate p j = j, whose meanng s smply the rate at whch was ound to be algned wth j. Smlarly, the rate o ndng an occurence o s p =. Now, havng both p j and p determned, the elements o the matrx M are beng computed as: M j = p( j) = pj p. M s ndeed a probablty matrx, and ths can be proved by notng that j M j =. To llustrate ths step o computng a matrx M, let s have a quck example. Let the algnment be: In ths case, the requences are: hence the estmated probabltes: A B A A AB = BA = AA = 2 A = X AX = AB + AA = B = X BX = BA = = X X = 4 The matrx M wll be: The expected number o mutatons s X p AB = AB = 4 p BA = BA = 4 p AA = AA = 2 p A = A = 4 p B = B = 4 p(a B) = p AB p A = p(b A) = p BA p B = M = [ 2 0 p X ( M XX ) = p A ( M AA ) + p B ( M BB ) = 4 ( 2 ) + ( 0) = 0.5 = 50% 4 The next step s the scalng o M such that t s consstent wth the denton o a -PAM matrx: n 00 expected mutatons. Suppose matrx M s elements, M j, are scaled by a actor α. In ths case the new values become M j = αm j. Ths wll change the values o the row sums such that j M j = α. Snce we want a probablty matrx - every row sums up to - a small adjustment s needed: we wll add α to every element on the man dagonal: ] M j = αm j, j M = αm + α Ths wll restore the property o a probablty matrx. Now what should α be? Let s compute the new expected number o mutatons: p ( M ) = p ( αm + α) = α p ( M )

4 Ths s just α multpled by the old expected number o mutatons. Thereore, we can set α approprately. For nstance, n the example above, α = Havng a -PAM matrx computed, the queston s how to compute a 2-PAM matrx? In other words, what s the probablty p 2 ( j) o mutatng nto j n two unts o evoluton. Ths s the probablty o mutatng nto k, or some k, n the rst unt o evoluton, and then, k mutatng nto j n the second unt o evoluton. Mathematcally, ths can be expressed as: p 2 ( j) = k p( k)p(k j) = k M k M kj Ths s the ormula used to obtan the entry correspondng to the par and j when multply M by tsel. Hence, the 2-PAM matrx s just M 2. An analogous step s used to show that the k-pam matrx s the same as M k. When workng wth a k-pam probablty matrx the score wll be computed n the same way: s k j = 0 log M k j 0 p j. The only change s that now the values o M k are plugged nstead o those o M. BLOSUM (BLOCKS Substtuton Matrces) matrces As mentoned earler, BLOSUM are another type o matrces used n scorng sequence algnments. They are ntended to be used or scorng smlartes o proten sequences that are evolutonary ar apart (dstant). Computng ther values s done usng the normaton stored n a database o blocks (called the BLOCKS database) where each block s a multple ungapped algnment o related proten sequences. The sequences o each block are clustered, puttng two sequences nto the same cluster ther percentage o matchng algned resdues - or level o smlarty - s above a certan threshold L%. We dene two sequences to be dstant they all n derent clusters. Thereore, two dstant sequences der by at least (00 L)%. The computaton o BLOSUM-L, or a partcular value o L, s based on countng the number o mutatons among dstant sequences only. Thereore, lower values o L correspond to longer evolutonary tmes, and are applcable or more dstant sequences. As explaned above, n computng a BLOSUM-L matrx s entres, we want to count the number o mutatons between dstant sequences only - the ones that are less than L% smlar. The value ab s the relatve requency o seeng a algned wth b. Whenever such an algnment s observed or two sequences that are n derent clusters, ab s ncremented by n n 2, where n and n 2 are the szes o the two clusters (we scale by the sze o the cluster snce larger clusters are more lkely to contan mutatons). The steps through whch a matrx s computed are: Estmate p = j j ; Estmate q j = k,l j j k,l kl ; BLOSUM-L(,j)= log q j wth some scalng actor λ. Consder an example where sequences are generated at random (so we are not usng the BLOCKS database here) such that p A = p G = p C = p T = 4 and the level o smlarty s 50%,.e. the probablty that two algned resdues are the same s 0.5. Then L = 50%, we expect to have one cluster, where: and p AA = p GG = p CC = p T T = = 8 p AG = p AC = p AT = p GA = p GC = p GT = p CA = p CG = p CT = p T A = p T G = p T C = = 24 Then a match wll have a score and a msmatch wll have a score m = log /8 /4./4 = s = log /24 /4./4 =

5 Reerences Setubal J., Medans, J., Introducton to Molecular Bology, Chapter. Drubn R. et al., Bologcal Sequence Analyss, Chapter 2. 5

Search sequence databases 2 10/25/2016

Search sequence databases 2 10/25/2016 Search sequence databases 2 10/25/2016 The BLAST algorthms Ø BLAST fnds local matches between two sequences, called hgh scorng segment pars (HSPs). Step 1: Break down the query sequence and the database

More information

be the i th symbol in x and

be the i th symbol in x and 2 Parwse Algnment We represent sequences b strngs of alphetc letters. If we recognze a sgnfcant smlart between a new sequence and a sequence out whch somethng s alread know, we can transfer nformaton out

More information

General Tips on How to Do Well in Physics Exams. 1. Establish a good habit in keeping track of your steps. For example, when you use the equation

General Tips on How to Do Well in Physics Exams. 1. Establish a good habit in keeping track of your steps. For example, when you use the equation General Tps on How to Do Well n Physcs Exams 1. Establsh a good habt n keepng track o your steps. For example when you use the equaton 1 1 1 + = d d to solve or d o you should rst rewrte t as 1 1 1 = d

More information

CS 331 DESIGN AND ANALYSIS OF ALGORITHMS DYNAMIC PROGRAMMING. Dr. Daisy Tang

CS 331 DESIGN AND ANALYSIS OF ALGORITHMS DYNAMIC PROGRAMMING. Dr. Daisy Tang CS DESIGN ND NLYSIS OF LGORITHMS DYNMIC PROGRMMING Dr. Dasy Tang Dynamc Programmng Idea: Problems can be dvded nto stages Soluton s a sequence o decsons and the decson at the current stage s based on the

More information

Note on EM-training of IBM-model 1

Note on EM-training of IBM-model 1 Note on EM-tranng of IBM-model INF58 Language Technologcal Applcatons, Fall The sldes on ths subject (nf58 6.pdf) ncludng the example seem nsuffcent to gve a good grasp of what s gong on. Hence here are

More information

Singular Value Decomposition: Theory and Applications

Singular Value Decomposition: Theory and Applications Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real

More information

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for P Charts. Dr. Wayne A. Taylor

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for P Charts. Dr. Wayne A. Taylor Taylor Enterprses, Inc. Control Lmts for P Charts Copyrght 2017 by Taylor Enterprses, Inc., All Rghts Reserved. Control Lmts for P Charts Dr. Wayne A. Taylor Abstract: P charts are used for count data

More information

ECEN 5005 Crystals, Nanocrystals and Device Applications Class 19 Group Theory For Crystals

ECEN 5005 Crystals, Nanocrystals and Device Applications Class 19 Group Theory For Crystals ECEN 5005 Crystals, Nanocrystals and Devce Applcatons Class 9 Group Theory For Crystals Dee Dagram Radatve Transton Probablty Wgner-Ecart Theorem Selecton Rule Dee Dagram Expermentally determned energy

More information

Eigenvalues of Random Graphs

Eigenvalues of Random Graphs Spectral Graph Theory Lecture 2 Egenvalues of Random Graphs Danel A. Spelman November 4, 202 2. Introducton In ths lecture, we consder a random graph on n vertces n whch each edge s chosen to be n the

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Notes on Frequency Estimation in Data Streams

Notes on Frequency Estimation in Data Streams Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1 Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons

More information

Random Walks on Digraphs

Random Walks on Digraphs Random Walks on Dgraphs J. J. P. Veerman October 23, 27 Introducton Let V = {, n} be a vertex set and S a non-negatve row-stochastc matrx (.e. rows sum to ). V and S defne a dgraph G = G(V, S) and a drected

More information

2.3 Nilpotent endomorphisms

2.3 Nilpotent endomorphisms s a block dagonal matrx, wth A Mat dm U (C) In fact, we can assume that B = B 1 B k, wth B an ordered bass of U, and that A = [f U ] B, where f U : U U s the restrcton of f to U 40 23 Nlpotent endomorphsms

More information

Example: (13320, 22140) =? Solution #1: The divisors of are 1, 2, 3, 4, 5, 6, 9, 10, 12, 15, 18, 20, 27, 30, 36, 41,

Example: (13320, 22140) =? Solution #1: The divisors of are 1, 2, 3, 4, 5, 6, 9, 10, 12, 15, 18, 20, 27, 30, 36, 41, The greatest common dvsor of two ntegers a and b (not both zero) s the largest nteger whch s a common factor of both a and b. We denote ths number by gcd(a, b), or smply (a, b) when there s no confuson

More information

Spring Force and Power

Spring Force and Power Lecture 13 Chapter 9 Sprng Force and Power Yeah, energy s better than orces. What s net? Course webste: http://aculty.uml.edu/andry_danylov/teachng/physcsi IN THIS CHAPTER, you wll learn how to solve problems

More information

University of Washington Department of Chemistry Chemistry 452/456 Summer Quarter 2014

University of Washington Department of Chemistry Chemistry 452/456 Summer Quarter 2014 Lecture 16 8/4/14 Unversty o Washngton Department o Chemstry Chemstry 452/456 Summer Quarter 214. Real Vapors and Fugacty Henry s Law accounts or the propertes o extremely dlute soluton. s shown n Fgure

More information

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for U Charts. Dr. Wayne A. Taylor

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for U Charts. Dr. Wayne A. Taylor Taylor Enterprses, Inc. Adjusted Control Lmts for U Charts Copyrght 207 by Taylor Enterprses, Inc., All Rghts Reserved. Adjusted Control Lmts for U Charts Dr. Wayne A. Taylor Abstract: U charts are used

More information

Difference Equations

Difference Equations Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1

More information

Lecture 2 Solution of Nonlinear Equations ( Root Finding Problems )

Lecture 2 Solution of Nonlinear Equations ( Root Finding Problems ) Lecture Soluton o Nonlnear Equatons Root Fndng Problems Dentons Classcaton o Methods Analytcal Solutons Graphcal Methods Numercal Methods Bracketng Methods Open Methods Convergence Notatons Root Fndng

More information

Split alignment. Martin C. Frith April 13, 2012

Split alignment. Martin C. Frith April 13, 2012 Splt algnment Martn C. Frth Aprl 13, 2012 1 Introducton Ths document s about algnng a query sequence to a genome, allowng dfferent parts of the query to match dfferent parts of the genome. Here are some

More information

Bézier curves. Michael S. Floater. September 10, These notes provide an introduction to Bézier curves. i=0

Bézier curves. Michael S. Floater. September 10, These notes provide an introduction to Bézier curves. i=0 Bézer curves Mchael S. Floater September 1, 215 These notes provde an ntroducton to Bézer curves. 1 Bernsten polynomals Recall that a real polynomal of a real varable x R, wth degree n, s a functon of

More information

/ n ) are compared. The logic is: if the two

/ n ) are compared. The logic is: if the two STAT C141, Sprng 2005 Lecture 13 Two sample tests One sample tests: examples of goodness of ft tests, where we are testng whether our data supports predctons. Two sample tests: called as tests of ndependence

More information

Design and Analysis of Algorithms

Design and Analysis of Algorithms Desgn and Analyss of Algorthms CSE 53 Lecture 4 Dynamc Programmng Junzhou Huang, Ph.D. Department of Computer Scence and Engneerng CSE53 Desgn and Analyss of Algorthms The General Dynamc Programmng Technque

More information

Lecture 3. Ax x i a i. i i

Lecture 3. Ax x i a i. i i 18.409 The Behavor of Algorthms n Practce 2/14/2 Lecturer: Dan Spelman Lecture 3 Scrbe: Arvnd Sankar 1 Largest sngular value In order to bound the condton number, we need an upper bound on the largest

More information

Physics 2A Chapters 6 - Work & Energy Fall 2017

Physics 2A Chapters 6 - Work & Energy Fall 2017 Physcs A Chapters 6 - Work & Energy Fall 017 These notes are eght pages. A quck summary: The work-energy theorem s a combnaton o Chap and Chap 4 equatons. Work s dened as the product o the orce actng on

More information

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011 Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected

More information

Temperature. Chapter Heat Engine

Temperature. Chapter Heat Engine Chapter 3 Temperature In prevous chapters of these notes we ntroduced the Prncple of Maxmum ntropy as a technque for estmatng probablty dstrbutons consstent wth constrants. In Chapter 9 we dscussed the

More information

A Simple Research of Divisor Graphs

A Simple Research of Divisor Graphs The 29th Workshop on Combnatoral Mathematcs and Computaton Theory A Smple Research o Dvsor Graphs Yu-png Tsao General Educaton Center Chna Unversty o Technology Tape Tawan yp-tsao@cuteedutw Tape Tawan

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

Download the files protein1.txt and protein2.txt from the course website.

Download the files protein1.txt and protein2.txt from the course website. Queston 1 Dot plots Download the fles proten1.txt and proten2.txt from the course webste. Usng the dot plot algnment tool http://athena.boc.uvc.ca/workbench.php?tool=dotter&db=poxvrdae, algn the proten

More information

Chapter 6. Operational Amplifier. inputs can be defined as the average of the sum of the two signals.

Chapter 6. Operational Amplifier.  inputs can be defined as the average of the sum of the two signals. 6 Operatonal mpler Chapter 6 Operatonal mpler CC Symbol: nput nput Output EE () Non-nvertng termnal, () nvertng termnal nput mpedance : Few mega (ery hgh), Output mpedance : Less than (ery low) Derental

More information

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system Transfer Functons Convenent representaton of a lnear, dynamc model. A transfer functon (TF) relates one nput and one output: x t X s y t system Y s The followng termnology s used: x y nput output forcng

More information

Lecture 10: May 6, 2013

Lecture 10: May 6, 2013 TTIC/CMSC 31150 Mathematcal Toolkt Sprng 013 Madhur Tulsan Lecture 10: May 6, 013 Scrbe: Wenje Luo In today s lecture, we manly talked about random walk on graphs and ntroduce the concept of graph expander,

More information

: Numerical Analysis Topic 2: Solution of Nonlinear Equations Lectures 5-11:

: Numerical Analysis Topic 2: Solution of Nonlinear Equations Lectures 5-11: 764: Numercal Analyss Topc : Soluton o Nonlnear Equatons Lectures 5-: UIN Malang Read Chapters 5 and 6 o the tetbook 764_Topc Lecture 5 Soluton o Nonlnear Equatons Root Fndng Problems Dentons Classcaton

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Multple sequence algnment Parwse sequence algnment ( and ) Substtuton matrces Database searchng Maxmum Lelhood Estmaton Observaton: Data, D (HHHTHHTH) What process generated ths data? Alternatve hypothess:

More information

Week 11: Chapter 11. The Vector Product. The Vector Product Defined. The Vector Product and Torque. More About the Vector Product

Week 11: Chapter 11. The Vector Product. The Vector Product Defined. The Vector Product and Torque. More About the Vector Product The Vector Product Week 11: Chapter 11 Angular Momentum There are nstances where the product of two vectors s another vector Earler we saw where the product of two vectors was a scalar Ths was called the

More information

Solutions to Problem Set 6

Solutions to Problem Set 6 Solutons to Problem Set 6 Problem 6. (Resdue theory) a) Problem 4.7.7 Boas. n ths problem we wll solve ths ntegral: x sn x x + 4x + 5 dx: To solve ths usng the resdue theorem, we study ths complex ntegral:

More information

Lecture 7: Boltzmann distribution & Thermodynamics of mixing

Lecture 7: Boltzmann distribution & Thermodynamics of mixing Prof. Tbbtt Lecture 7 etworks & Gels Lecture 7: Boltzmann dstrbuton & Thermodynamcs of mxng 1 Suggested readng Prof. Mark W. Tbbtt ETH Zürch 13 März 018 Molecular Drvng Forces Dll and Bromberg: Chapters

More information

COMPLEX NUMBERS AND QUADRATIC EQUATIONS

COMPLEX NUMBERS AND QUADRATIC EQUATIONS COMPLEX NUMBERS AND QUADRATIC EQUATIONS INTRODUCTION We know that x 0 for all x R e the square of a real number (whether postve, negatve or ero) s non-negatve Hence the equatons x, x, x + 7 0 etc are not

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

Physics 2A Chapter 3 HW Solutions

Physics 2A Chapter 3 HW Solutions Phscs A Chapter 3 HW Solutons Chapter 3 Conceptual Queston: 4, 6, 8, Problems: 5,, 8, 7, 3, 44, 46, 69, 70, 73 Q3.4. Reason: (a) C = A+ B onl A and B are n the same drecton. Sze does not matter. (b) C

More information

Answers Problem Set 2 Chem 314A Williamsen Spring 2000

Answers Problem Set 2 Chem 314A Williamsen Spring 2000 Answers Problem Set Chem 314A Wllamsen Sprng 000 1) Gve me the followng crtcal values from the statstcal tables. a) z-statstc,-sded test, 99.7% confdence lmt ±3 b) t-statstc (Case I), 1-sded test, 95%

More information

Annexes. EC.1. Cycle-base move illustration. EC.2. Problem Instances

Annexes. EC.1. Cycle-base move illustration. EC.2. Problem Instances ec Annexes Ths Annex frst llustrates a cycle-based move n the dynamc-block generaton tabu search. It then dsplays the characterstcs of the nstance sets, followed by detaled results of the parametercalbraton

More information

CS286r Assign One. Answer Key

CS286r Assign One. Answer Key CS286r Assgn One Answer Key 1 Game theory 1.1 1.1.1 Let off-equlbrum strateges also be that people contnue to play n Nash equlbrum. Devatng from any Nash equlbrum s a weakly domnated strategy. That s,

More information

Lecture Nov

Lecture Nov Lecture 18 Nov 07 2008 Revew Clusterng Groupng smlar obects nto clusters Herarchcal clusterng Agglomeratve approach (HAC: teratvely merge smlar clusters Dfferent lnkage algorthms for computng dstances

More information

Finite Difference Method

Finite Difference Method 7/0/07 Instructor r. Ramond Rump (9) 747 698 rcrump@utep.edu EE 337 Computatonal Electromagnetcs (CEM) Lecture #0 Fnte erence Method Lecture 0 These notes ma contan coprghted materal obtaned under ar use

More information

10-701/ Machine Learning, Fall 2005 Homework 3

10-701/ Machine Learning, Fall 2005 Homework 3 10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

Lecture Space-Bounded Derandomization

Lecture Space-Bounded Derandomization Notes on Complexty Theory Last updated: October, 2008 Jonathan Katz Lecture Space-Bounded Derandomzaton 1 Space-Bounded Derandomzaton We now dscuss derandomzaton of space-bounded algorthms. Here non-trval

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 1 10/1/013 Martngale Concentraton Inequaltes and Applcatons Content. 1. Exponental concentraton for martngales wth bounded ncrements.

More information

x = , so that calculated

x = , so that calculated Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

Case A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k.

Case A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k. THE CELLULAR METHOD In ths lecture, we ntroduce the cellular method as an approach to ncdence geometry theorems lke the Szemeréd-Trotter theorem. The method was ntroduced n the paper Combnatoral complexty

More information

Online Classification: Perceptron and Winnow

Online Classification: Perceptron and Winnow E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng

More information

Mixture o f of Gaussian Gaussian clustering Nov

Mixture o f of Gaussian Gaussian clustering Nov Mture of Gaussan clusterng Nov 11 2009 Soft vs hard lusterng Kmeans performs Hard clusterng: Data pont s determnstcally assgned to one and only one cluster But n realty clusters may overlap Soft-clusterng:

More information

Hopfield Training Rules 1 N

Hopfield Training Rules 1 N Hopfeld Tranng Rules To memorse a sngle pattern Suppose e set the eghts thus - = p p here, s the eght beteen nodes & s the number of nodes n the netor p s the value requred for the -th node What ll the

More information

A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS

A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS HCMC Unversty of Pedagogy Thong Nguyen Huu et al. A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS Thong Nguyen Huu and Hao Tran Van Department of mathematcs-nformaton,

More information

Statistics and Quantitative Analysis U4320. Segment 3: Probability Prof. Sharyn O Halloran

Statistics and Quantitative Analysis U4320. Segment 3: Probability Prof. Sharyn O Halloran Statstcs and Quanttatve Analyss U430 Segment 3: Probablty Prof. Sharyn O Halloran Revew: Descrptve Statstcs Code book for Measures Sample Data Relgon Employed 1. Catholc 0. Unemployed. Protestant 1. Employed

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

Convergence of random processes

Convergence of random processes DS-GA 12 Lecture notes 6 Fall 216 Convergence of random processes 1 Introducton In these notes we study convergence of dscrete random processes. Ths allows to characterze phenomena such as the law of large

More information

Vapnik-Chervonenkis theory

Vapnik-Chervonenkis theory Vapnk-Chervonenks theory Rs Kondor June 13, 2008 For the purposes of ths lecture, we restrct ourselves to the bnary supervsed batch learnng settng. We assume that we have an nput space X, and an unknown

More information

18.1 Introduction and Recap

18.1 Introduction and Recap CS787: Advanced Algorthms Scrbe: Pryananda Shenoy and Shjn Kong Lecturer: Shuch Chawla Topc: Streamng Algorthmscontnued) Date: 0/26/2007 We contnue talng about streamng algorthms n ths lecture, ncludng

More information

= z 20 z n. (k 20) + 4 z k = 4

= z 20 z n. (k 20) + 4 z k = 4 Problem Set #7 solutons 7.2.. (a Fnd the coeffcent of z k n (z + z 5 + z 6 + z 7 + 5, k 20. We use the known seres expanson ( n+l ( z l l z n below: (z + z 5 + z 6 + z 7 + 5 (z 5 ( + z + z 2 + z + 5 5

More information

Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y)

Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y) Secton 1.5 Correlaton In the prevous sectons, we looked at regresson and the value r was a measurement of how much of the varaton n y can be attrbuted to the lnear relatonshp between y and x. In ths secton,

More information

Linear Regression Analysis: Terminology and Notation

Linear Regression Analysis: Terminology and Notation ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented

More information

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering / Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons

More information

Composite Hypotheses testing

Composite Hypotheses testing Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter

More information

Formulas for the Determinant

Formulas for the Determinant page 224 224 CHAPTER 3 Determnants e t te t e 2t 38 A = e t 2te t e 2t e t te t 2e 2t 39 If 123 A = 345, 456 compute the matrx product A adj(a) What can you conclude about det(a)? For Problems 40 43, use

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Experment-I MODULE VII LECTURE - 3 ANALYSIS OF COVARIANCE Dr Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Any scentfc experment s performed

More information

Hidden Markov Models

Hidden Markov Models CM229S: Machne Learnng for Bonformatcs Lecture 12-05/05/2016 Hdden Markov Models Lecturer: Srram Sankararaman Scrbe: Akshay Dattatray Shnde Edted by: TBD 1 Introducton For a drected graph G we can wrte

More information

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution Department of Statstcs Unversty of Toronto STA35HS / HS Desgn and Analyss of Experments Term Test - Wnter - Soluton February, Last Name: Frst Name: Student Number: Instructons: Tme: hours. Ads: a non-programmable

More information

Statistical Inference. 2.3 Summary Statistics Measures of Center and Spread. parameters ( population characteristics )

Statistical Inference. 2.3 Summary Statistics Measures of Center and Spread. parameters ( population characteristics ) Ismor Fscher, 8//008 Stat 54 / -8.3 Summary Statstcs Measures of Center and Spread Dstrbuton of dscrete contnuous POPULATION Random Varable, numercal True center =??? True spread =???? parameters ( populaton

More information

Complex Variables. Chapter 18 Integration in the Complex Plane. March 12, 2013 Lecturer: Shih-Yuan Chen

Complex Variables. Chapter 18 Integration in the Complex Plane. March 12, 2013 Lecturer: Shih-Yuan Chen omplex Varables hapter 8 Integraton n the omplex Plane March, Lecturer: Shh-Yuan hen Except where otherwse noted, content s lcensed under a BY-N-SA. TW Lcense. ontents ontour ntegrals auchy-goursat theorem

More information

Norms, Condition Numbers, Eigenvalues and Eigenvectors

Norms, Condition Numbers, Eigenvalues and Eigenvectors Norms, Condton Numbers, Egenvalues and Egenvectors 1 Norms A norm s a measure of the sze of a matrx or a vector For vectors the common norms are: N a 2 = ( x 2 1/2 the Eucldean Norm (1a b 1 = =1 N x (1b

More information

DIFFERENTIAL SCHEMES

DIFFERENTIAL SCHEMES DIFFERENTIAL SCHEMES RAYMOND T. HOOBLER Dedcated to the memory o Jerry Kovacc 1. schemes All rngs contan Q and are commutatve. We x a d erental rng A throughout ths secton. 1.1. The topologcal space. Let

More information

LECTURE 9 CANONICAL CORRELATION ANALYSIS

LECTURE 9 CANONICAL CORRELATION ANALYSIS LECURE 9 CANONICAL CORRELAION ANALYSIS Introducton he concept of canoncal correlaton arses when we want to quantfy the assocatons between two sets of varables. For example, suppose that the frst set of

More information

Quantum Mechanics I - Session 4

Quantum Mechanics I - Session 4 Quantum Mechancs I - Sesson 4 Aprl 3, 05 Contents Operators Change of Bass 4 3 Egenvectors and Egenvalues 5 3. Denton....................................... 5 3. Rotaton n D....................................

More information

Randomness and Computation

Randomness and Computation Randomness and Computaton or, Randomzed Algorthms Mary Cryan School of Informatcs Unversty of Ednburgh RC 208/9) Lecture 0 slde Balls n Bns m balls, n bns, and balls thrown unformly at random nto bns usually

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Exerments-I MODULE III LECTURE - 2 EXPERIMENTAL DESIGN MODELS Dr. Shalabh Deartment of Mathematcs and Statstcs Indan Insttute of Technology Kanur 2 We consder the models

More information

Math 217 Fall 2013 Homework 2 Solutions

Math 217 Fall 2013 Homework 2 Solutions Math 17 Fall 013 Homework Solutons Due Thursday Sept. 6, 013 5pm Ths homework conssts of 6 problems of 5 ponts each. The total s 30. You need to fully justfy your answer prove that your functon ndeed has

More information

Credit Card Pricing and Impact of Adverse Selection

Credit Card Pricing and Impact of Adverse Selection Credt Card Prcng and Impact of Adverse Selecton Bo Huang and Lyn C. Thomas Unversty of Southampton Contents Background Aucton model of credt card solctaton - Errors n probablty of beng Good - Errors n

More information

Chapter 3 Differentiation and Integration

Chapter 3 Differentiation and Integration MEE07 Computer Modelng Technques n Engneerng Chapter Derentaton and Integraton Reerence: An Introducton to Numercal Computatons, nd edton, S. yakowtz and F. zdarovsky, Mawell/Macmllan, 990. Derentaton

More information

SIMPLE LINEAR REGRESSION

SIMPLE LINEAR REGRESSION Smple Lnear Regresson and Correlaton Introducton Prevousl, our attenton has been focused on one varable whch we desgnated b x. Frequentl, t s desrable to learn somethng about the relatonshp between two

More information

Journal of Universal Computer Science, vol. 1, no. 7 (1995), submitted: 15/12/94, accepted: 26/6/95, appeared: 28/7/95 Springer Pub. Co.

Journal of Universal Computer Science, vol. 1, no. 7 (1995), submitted: 15/12/94, accepted: 26/6/95, appeared: 28/7/95 Springer Pub. Co. Journal of Unversal Computer Scence, vol. 1, no. 7 (1995), 469-483 submtted: 15/12/94, accepted: 26/6/95, appeared: 28/7/95 Sprnger Pub. Co. Round-o error propagaton n the soluton of the heat equaton by

More information

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016 U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and

More information

Introduction ( Week 1-2) Course introduction A brief introduction to molecular biology A brief introduction to sequence comparison Part I: Algorithms

Introduction ( Week 1-2) Course introduction A brief introduction to molecular biology A brief introduction to sequence comparison Part I: Algorithms Course organzaton 1 Introducton Week 1-2) Course ntroducton A bref ntroducton to molecular bology A bref ntroducton to sequence comparson Part I: Algorthms for Sequence Analyss Week 3-8) Chapter 1-3 Models

More information

BIOINFORMATICS: PAST, PRESENT AND FUTURE. Susan R. Wilson Mathematical Sciences Institute, Australian National University, Australia

BIOINFORMATICS: PAST, PRESENT AND FUTURE. Susan R. Wilson Mathematical Sciences Institute, Australian National University, Australia BIOINFORMATICS: PAST, PRESENT AND FUTURE Susan R. Wlson Mathematcal Scences Insttute, Australan Natonal Unversty, Australa Keywords: Bonformatcs, bologcal sequence analyss, sequence algnment, hdden Markov

More information

Course organization. Part II: Algorithms for Network Biology (Week 12-16)

Course organization. Part II: Algorithms for Network Biology (Week 12-16) Course organzaton Introducton Week 1-2) Course ntroducton A bref ntroducton to molecular bology A bref ntroducton to sequence comparson Part I: Algorthms for Sequence Analyss Week 3-11) Chapter 1-3 Models

More information

ESCI 341 Atmospheric Thermodynamics Lesson 10 The Physical Meaning of Entropy

ESCI 341 Atmospheric Thermodynamics Lesson 10 The Physical Meaning of Entropy ESCI 341 Atmospherc Thermodynamcs Lesson 10 The Physcal Meanng of Entropy References: An Introducton to Statstcal Thermodynamcs, T.L. Hll An Introducton to Thermodynamcs and Thermostatstcs, H.B. Callen

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

On the Repeating Group Finding Problem

On the Repeating Group Finding Problem The 9th Workshop on Combnatoral Mathematcs and Computaton Theory On the Repeatng Group Fndng Problem Bo-Ren Kung, Wen-Hsen Chen, R.C.T Lee Graduate Insttute of Informaton Technology and Management Takmng

More information

1 GSW Iterative Techniques for y = Ax

1 GSW Iterative Techniques for y = Ax 1 for y = A I m gong to cheat here. here are a lot of teratve technques that can be used to solve the general case of a set of smultaneous equatons (wrtten n the matr form as y = A), but ths chapter sn

More information

Stanford University CS254: Computational Complexity Notes 7 Luca Trevisan January 29, Notes for Lecture 7

Stanford University CS254: Computational Complexity Notes 7 Luca Trevisan January 29, Notes for Lecture 7 Stanford Unversty CS54: Computatonal Complexty Notes 7 Luca Trevsan January 9, 014 Notes for Lecture 7 1 Approxmate Countng wt an N oracle We complete te proof of te followng result: Teorem 1 For every

More information

7. Products and matrix elements

7. Products and matrix elements 7. Products and matrx elements 1 7. Products and matrx elements Based on the propertes of group representatons, a number of useful results can be derved. Consder a vector space V wth an nner product ψ

More information