Substitution Matrices and Alignment Statistics
BMI/CS 776
www.biostat.wisc.edu/~craven/776.html
Mark Craven
craven@biostat.wisc.edu
February 2002

Substitution Matrices
- two popular sets of matrices for protein sequences
  - PAM matrices [Dayhoff et al. 1978]
  - BLOSUM matrices [Henikoff & Henikoff 1992]
- both try to capture the relative substitutability of amino acid pairs in the context of evolution
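
Substitution Matrix Motivation
- consider the simplest alignment: an ungapped global alignment of two sequences, $x$ and $y$, of length $n$
- in scoring this alignment, we'd like to assess
$$\frac{P(x, y \mid M)}{P(x, y \mid R)}$$
  - $M$: sequences have a common ancestor
  - $R$: sequences are aligned by chance
- we'd like our substitution matrix to score an alignment by estimating this ratio

Substitution Matrices: Basic Idea
- let $q_a$ be the frequency of amino acid $a$
- consider the case where the alignment of $x$ and $y$ is random:
$$P(x, y \mid R) = \prod_i q_{x_i} \prod_i q_{y_i}$$
- let $p_{ab}$ be the probability that $a$ and $b$ derived from a common ancestor
- then the case where the alignment is due to common ancestry is:
$$P(x, y \mid M) = \prod_i p_{x_i y_i}$$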
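
Substitution Matrices: Basic Idea
- the odds ratio of these two alternatives is given by:
$$\frac{P(x, y \mid M)}{P(x, y \mid R)} = \frac{\prod_i p_{x_i y_i}}{\prod_i q_{x_i} \prod_i q_{y_i}} = \prod_i \frac{p_{x_i y_i}}{q_{x_i} q_{y_i}}$$
- taking the log we get:
$$\log \frac{P(x, y \mid M)}{P(x, y \mid R)} = \sum_i \log \frac{p_{x_i y_i}}{q_{x_i} q_{y_i}}$$

Substitution Matrices: Basic Idea
- the score for an alignment is thus given by:
$$S = \sum_i s(x_i, y_i)$$
- the substitution matrix score for the pair $a$, $b$ is then given by:
$$s(a, b) = \log \frac{p_{ab}}{q_a q_b}$$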
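
The log-odds scoring idea can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the two-letter alphabet and every value in `q` and `p` are made up for the example, and the function names are hypothetical. It checks that summing per-column scores gives the same number as taking the log of the odds ratio directly.

```python
import math

# Hypothetical two-letter alphabet; every number below is illustrative only.
q = {"A": 0.6, "B": 0.4}                      # background frequencies q_a
p = {("A", "A"): 0.5, ("A", "B"): 0.1,        # joint probabilities p_ab
     ("B", "A"): 0.1, ("B", "B"): 0.3}        # (marginals equal q)


def s(a, b):
    """Substitution score s(a, b) = log(p_ab / (q_a * q_b))."""
    return math.log(p[(a, b)] / (q[a] * q[b]))


def alignment_score(x, y):
    """Score of an ungapped global alignment: the sum of per-column scores."""
    if len(x) != len(y):
        raise ValueError("ungapped global alignment needs equal lengths")
    return sum(s(a, b) for a, b in zip(x, y))


def log_odds_ratio(x, y):
    """log of P(x, y | M) / P(x, y | R), computed directly from the two models."""
    p_match = math.prod(p[(a, b)] for a, b in zip(x, y))
    p_random = math.prod(q[a] for a in x) * math.prod(q[b] for b in y)
    return math.log(p_match / p_random)


x, y = "AABB", "ABBB"
print(alignment_score(x, y), log_odds_ratio(x, y))  # the two values agree
```

With these toy numbers, identical pairs score above zero and the mismatched pair scores below zero, because the mismatch is less likely under the common-ancestor model than under the random model.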

PAM Matrices
- but how do we get values for $p_{ab}$ (the probability that $a$ and $b$ arose from a common ancestor)?
- it depends on how long ago the sequences diverged
  - diverged recently: $p_{ab} \approx 0$ for $a \neq b$
  - diverged long ago: $p_{ab} \approx q_a q_b$
- PAM approach: estimate the probability that $b$ was substituted for $a$ in a given measure of evolutionary distance

PAM Matrices
- key idea: trusted alignments of closely related sequences provide information about biologically permissible mutations
- step 1: for 71 protein families, constructed hypothetical phylogenetic trees
- from the trees, filled matrix $A$ with the number of observed substitutions

PAM Matrices
- step 2: from $A$, calculate a matrix containing
$$P(b \mid a) = \frac{A_{ab}}{\sum_c A_{ac}}$$
- step 3: normalize this matrix so the expected number of substitutions is 1% of the protein (PAM-1, $t = 1$)

PAM Matrices
- there is a whole family of matrices: PAM-10, ..., PAM-250
- these matrices are extrapolated from the PAM-1 matrix (by matrix multiplication)
- PAM is a relative measure of evolutionary distance
  - 1 PAM = 1 accepted mutation per 100 amino acids
  - 250 PAM = 2.5 accepted mutations per amino acid
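
Steps 2 and 3 and the extrapolation to larger PAM distances might look roughly like the sketch below. It is an illustration under stated assumptions, not Dayhoff's actual procedure: the three-letter alphabet and the counts are invented, the frequencies `q` are simply taken from the row totals of the counts, and the 1% normalization is done by uniformly rescaling the off-diagonal entries, which is one simple way to reach that target. All function names are hypothetical.

```python
def row_normalize(counts):
    """Step 2: P(b | a) = A_ab / sum_c A_ac."""
    return [[c / sum(row) for c in row] for row in counts]


def expected_substitutions(P, q):
    """Expected fraction of positions that change: sum_a q_a * (1 - P(a | a))."""
    return sum(q[a] * (1.0 - P[a][a]) for a in range(len(P)))


def scale_to_pam1(P, q, target=0.01):
    """Step 3 (assumed scheme): rescale off-diagonals so 1% of positions change."""
    n = len(P)
    factor = target / expected_substitutions(P, q)
    scaled = [[factor * P[a][b] if a != b else 0.0 for b in range(n)]
              for a in range(n)]
    for a in range(n):
        scaled[a][a] = 1.0 - sum(scaled[a])  # each row must still sum to 1
    return scaled


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(M, t):
    """PAM-t as the t-th power of the PAM-1 matrix (repeated multiplication)."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, M)
    return result


# Invented substitution counts for a hypothetical three-letter alphabet.
counts = [[80, 6, 4], [6, 50, 14], [4, 14, 42]]
total = sum(sum(row) for row in counts)
q = [sum(row) / total for row in counts]  # assumption: frequencies from row totals

pam1 = scale_to_pam1(row_normalize(counts), q)
pam250 = mat_power(pam1, 250)
print(expected_substitutions(pam1, q), expected_substitutions(pam250, q))
```

Each row of `pam250` is still a probability distribution over replacement letters; its diagonal entries are much smaller than those of `pam1`, reflecting the longer evolutionary distance.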

PAM Matrices
- step 4: determine the substitution matrix
$$s(a, b \mid t) = \log \frac{P(b \mid a, t)}{q_b}$$

BLOSUM Matrices
- similar idea to PAM matrices
- probabilities estimated from more distantly related proteins
  - blocks of sequence fragments that represent structurally conserved regions
- transition frequencies observed directly by identifying blocks that are at least
  - 45% identical (BLOSUM-45)
  - 50% identical (BLOSUM-50)
  - 62% identical (BLOSUM-62)
  - etc.
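
To make "observed directly" concrete, here is a small sketch of estimating pair probabilities from a single block and turning them into log-odds scores. It is a simplification under several assumptions: the block is invented, the clustering of sequences at an identity threshold (the 45%, 50%, 62% above) is omitted entirely, pairs are counted as ordered pairs so that the resulting table is symmetric, and base-2 logs are used without rounding. Function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def pair_probabilities(block):
    """Estimate p_ab from every ordered pair of residues within each column."""
    counts = Counter()
    for column in zip(*block):                   # iterate over alignment columns
        counts.update(permutations(column, 2))   # ordered pairs of distinct rows
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def log_odds_scores(block):
    """s(a, b) = log2(p_ab / (q_a * q_b)) for the pairs observed in the block."""
    p = pair_probabilities(block)
    q = Counter()
    for (a, _b), prob in p.items():
        q[a] += prob                             # q_a is the marginal of p_ab
    return {(a, b): math.log2(prob / (q[a] * q[b]))
            for (a, b), prob in p.items()}


# Invented block of four aligned fragments over a two-letter alphabet.
block = ["AAC", "AAC", "ACC", "CAC"]
for pair, score in sorted(log_odds_scores(block).items()):
    print(pair, round(score, 3))
```

Pairs that never occur in the block get no score here; a real matrix needs enough data (or some smoothing) so that every pair is observed.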

PAM 250 Matrix
(figure: the PAM 250 matrix)

DNA vs. Protein Comparison
- if the sequence of interest encodes protein, compare at the protein sequence level:
  - many changes in DNA sequences do not change the encoded protein
  - substitution matrices for protein sequences represent biochemical information

Statistics of Alignment Scores (how to choose a threshold for S)
- for a given $S$, we can calculate the probability that we would get a match with score $> S$ under a random model (where we are aligning a large number of unrelated sequences)
- now turn this around: set $S$ so that this probability is small; thus the matches we get are likely to be significant

Distribution of Scores
- Karlin & Altschul, PNAS, 1990
- consider a random model in which we are looking for HSPs (high scoring ungapped local alignments)
- the lengths of the sequences in each pair are $m$ and $n$
- the probability that there is an HSP with score greater than $S$ is given by:
$$P(\text{score} > S) = 1 - e^{-Kmn e^{-\lambda S}}$$
- this comes from an extreme value distribution
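
The Karlin & Altschul formula is easy to evaluate numerically. The sketch below is only an illustration: the values of `K` and `lam` are placeholders chosen for the example, not constants derived from any real substitution matrix, and the function name is hypothetical.

```python
import math


def hsp_p_value(S, m, n, K, lam):
    """P(score > S) = 1 - exp(-K * m * n * exp(-lam * S))."""
    expected_hsps = K * m * n * math.exp(-lam * S)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-expected_hsps)


# Placeholder constants; real K and lambda depend on the scoring system.
K, lam = 0.1, 0.3
m, n = 300, 300
for S in (20, 30, 40, 50):
    print(S, hsp_p_value(S, m, n, K, lam))
```

The probability falls off steeply as the threshold grows, and doubling either sequence length has the same effect as doubling `K`, since the lengths enter only through the product `K * m * n`.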

Distribution of Scores
$$P(\text{score} > S) = 1 - e^{-Kmn e^{-\lambda S}}$$
- $S$ is a given score threshold
- $m$ and $n$ are the lengths of the sequences under consideration
- $K$ and $\lambda$ are constants that can be calculated from
  - the substitution matrix
  - the frequencies of the individual amino acids

Statistics of Alignment Scores
- given this, set $S$ so that the probability of getting a score $> S$ by chance is very small (0.05 or less)
- this analysis assumes
  - ungapped alignments
  - all residues drawn independently
  - the expected score for a pair of randomly chosen residues is negative:
$$\sum_{i,j=1}^{20} q_i q_j s_{ij} < 0$$
- computational experiments suggest the analysis holds for gapped alignments (but $K$ and $\lambda$ must be estimated from data)
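
Choosing the threshold amounts to inverting the score distribution: setting $1 - e^{-Kmn e^{-\lambda S}} = \alpha$ and solving gives $S = \frac{1}{\lambda} \ln \frac{Kmn}{-\ln(1 - \alpha)}$. The sketch below computes that threshold and also checks the negative-expected-score assumption on a toy scoring table. The constants, the two-letter alphabet, and the scores are all invented for illustration, and the function names are hypothetical.

```python
import math


def score_threshold(alpha, m, n, K, lam):
    """Smallest S with P(score > S) <= alpha, from inverting the formula."""
    return math.log(K * m * n / (-math.log1p(-alpha))) / lam


def expected_random_score(q, s):
    """Expected score of a random residue pair: sum over a, b of q_a * q_b * s(a, b)."""
    return sum(q[a] * q[b] * s[(a, b)] for a in q for b in q)


# Placeholder constants; real K and lambda depend on the scoring system.
K, lam = 0.1, 0.3
print(score_threshold(0.05, 300, 300, K, lam))

# Toy frequencies and scores, used only to illustrate the negativity check.
q = {"A": 0.6, "B": 0.4}
s = {("A", "A"): 1, ("A", "B"): -2, ("B", "A"): -2, ("B", "B"): 2}
print(expected_random_score(q, s) < 0)
```

If the expected score were positive, long random alignments would accumulate high scores by length alone, and the extreme value analysis would not apply.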