Introduction to Sequence Analysis
- Donna Jenkins
Transcription
Utah State University, Spring 2012
STAT 5570: Statistical Bioinformatics Notes

References
- Chapters 2 & 7 of Biological Sequence Analysis (Durbin et al., 2001)

Review
Genes are:
- sequences of DNA that do something
- can be expressed as a string of nucleic acids: A, C, G, T (4-letter alphabet)

Central Dogma of Molecular Biology:
DNA → mRNA → protein → bio. action

Proteins can be expressed as a string of amino acids (20-letter alphabet; sometimes 24 due to similarities).

Why look at protein sequence?
Levels of protein structure:
- Primary structure: order of amino acids
- Secondary structure: repeating structures (beta-sheets and alpha-helices) in backbone
- Tertiary structure: full three-dimensional folded structure
- Quaternary structure: interaction of multiple backbones

Sequence → shape → function
Similar sequence → similar function?
Consider simple pairwise alignment
Sequence 1: HEAGAWGHEE
Sequence 2: AWHEAE

How similar are these two sequences?
- Match up exactly?
- Subsequences similar?
- Which positions could possibly be matched without severe penalty?

Possible alignments (sequence 1 / sequence 2)
- Alignment 1: HEAGAWGHEE / AWHEAE
- Alignment 2: HEAGAWGHEE / AW-HE-AE
- Alignment 3: HEA-GAWGHEE / AWHEAE
- Alignment 4: HEAGAWGHE-E / AW-HEAE

To find the best alignment, need some way to rate alignments.
Think of gaps in alignment as: mutational insertion or deletion

Basic idea of scoring potential alignments
- Positive score (+): identities and conservative substitutions
- Negative score (-): non-conservative changes (not expected in real alignments)
- Add score at each position
- Equivalent to assuming mutations are independent
- Reasonable assumption for DNA and proteins but not structural RNAs

Some Notation
Let $x$ be sequence 1 and $y$ be sequence 2, with $x_i$ the $i$th symbol in $x$ and $y_j$ the $j$th symbol in $y$.
- $p_{ab} = P\{a, b \text{ from common ancestor}\}$
- $q_a$ = freq. of letter $a$ in sequence

Random Model $R$: assume independence of sequences
  $P(x, y \mid R) = \prod_i q_{x_i} \prod_j q_{y_j}$

Matched Model $M$: assume residues $a$ & $b$ are aligned as a pair with prob. $p_{ab}$
  $P(x, y \mid M) = \prod_i p_{x_i y_i}$
Compare these two models
Odds Ratio:
  $\frac{P(x, y \mid M)}{P(x, y \mid R)} = \prod_i \frac{p_{x_i y_i}}{q_{x_i} q_{y_i}}$
Need: $p_{ab}$

Log Odds Ratio:
  $S = \sum_i s(x_i, y_i)$, where $s(a, b) = \log\left(\frac{p_{ab}}{q_a q_b}\right)$
$s(a, b)$ is the log likelihood ratio of pair $(a, b)$ occurring as an aligned pair, as opposed to an unaligned pair.

Score Matrix (or substitution matrix)
[Table: scores $s(a, b)$, with rows and columns indexed by amino acids A, R, N, D, ..., Y, V]
- This is a portion of the BLOSUM50 substitution matrix; others exist.
- These are scaled and rounded log-odds values (for computational efficiency).

How to get these substitution values?
Basic idea:
- Look at existing, known alignments
- Compare sequences of aligned proteins and look at substitution frequencies
This is a chicken-or-the-egg problem:
- the alignment depends on the scoring scheme
- the scoring scheme depends on the alignment
Maybe better to base alignment on tertiary structures (or some other alignment).

Some substitution matrix types
BLOSUM (Henikoff): BLOCK substitution matrix
- derived from the BLOCKS database: a set of aligned ungapped protein families, clustered according to a threshold percentage (L) of identical residues
- compare residue frequencies between clusters
- L=50 → BLOSUM50
PAM (Dayhoff): percentage of acceptable point mutations per $10^8$ years
- derived from a general model for protein evolution, based on the number L of PAMs (evolutionary distance)
- PAM1 from comparing sequences with <1% divergence
- L=250 → PAM250 = PAM1^250
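The log-odds score $s(a, b) = \log(p_{ab} / (q_a q_b))$ can be illustrated with a small base-R sketch. The two-letter alphabet, the frequencies `q` and `p`, and the scaling factor are made-up values chosen only for illustration; they are not taken from BLOSUM or PAM.

```r
# Toy two-letter alphabet; all frequencies below are hypothetical values.
letters2 <- c("A", "B")

# q[a]: background frequency of letter a (random model R)
q <- c(A = 0.5, B = 0.5)

# p[a, b]: probability that a and b occur as an aligned pair (match model M)
p <- matrix(c(0.40, 0.10,
              0.10, 0.40),
            nrow = 2, byrow = TRUE,
            dimnames = list(letters2, letters2))

# Log-odds score s(a, b) = log( p_ab / (q_a * q_b) )
s <- log(p / outer(q, q))

# Scaled and rounded scores, in the spirit of published matrices
# (the factor 2 and the base-2 log are arbitrary choices here)
s.int <- round(2 * s / log(2))

# Score of an ungapped alignment = sum of the per-position scores
x <- c("A", "B", "A")
y <- c("A", "A", "A")
S <- sum(s[cbind(x, y)])
```

Identical pairs get a positive score because they occur more often under the match model than under independence; the mismatched pair gets a negative score.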
Which substitution matrix to use?
No universal best way. In general:
- low PAM: find short alignments of similar seq.
- high PAM: find longer, weaker local alignments
- BLOSUM standards: BLOSUM50 for alignment with gaps, BLOSUM62 for ungapped alignments
- higher PAM, lower BLOSUM: more divergent (looking for more distantly related proteins)
A reasonable strategy: BLOSUM62 complemented with PAM250

Which matrix for aligning DNA sequences?
The BLOSUM and PAM matrices are based on similarities between amino acids. No such similarity is assumed for nucleic acids; residues either match or they don't.
Unitary matrix (identity matrix):
- +1 for identical match (or +3 or ...)
- 0 for non-match (or -2 or ...)

How to score gaps?
One way: affine gap penalty (a linear transformation followed by a translation)
  $\gamma(g) = -d - (g - 1)e$
- $g$: length of gap
- $d$: gap opening penalty
- $e$: gap extension penalty ($e < d$)
Think of gaps in alignment as: mutational insertion or deletion

Tabular representation of alignment
[Table: one sequence (AWHEAE) along one margin and the other (HEAGAWGHEE) along the other margin; start with 0 in the first cell]
- begin (or continue) gap: $-d$ (or $-e$)
- match letters (residues): $+ s(a, b)$
- Fill in the table to give the max. of the possible values at each successive element
- Keep track of which direction generated the max.
- Then use the path that gives the highest final score (lower right corner)
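The table-filling idea above can be sketched in base R. This is a simplified illustration, not the slides' method: it uses a linear gap cost (the special case $e = d$ of the affine penalty) and a unitary match/mismatch score instead of BLOSUM50, and it returns only the final score without the traceback. The names `gap.penalty` and `nw.score` are hypothetical.

```r
# Affine gap penalty: gamma(g) = -d - (g - 1) * e for a gap of length g >= 1
gap.penalty <- function(g, d, e) -d - (g - 1) * e

# Fill the dynamic-programming table for a global alignment.
# Simplifications (assumptions of this sketch):
#   - linear gap cost: every gapped position costs d (i.e. e = d)
#   - unitary scoring: 'match' for identical letters, 'mismatch' otherwise
nw.score <- function(x, y, match = 1, mismatch = 0, d = 1) {
  x <- strsplit(x, "")[[1]]
  y <- strsplit(y, "")[[1]]
  n <- length(x)
  m <- length(y)
  score.tab <- matrix(0, nrow = n + 1, ncol = m + 1)
  score.tab[, 1] <- -d * (0:n)   # leading gaps in y
  score.tab[1, ] <- -d * (0:m)   # leading gaps in x
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      s <- if (x[i] == y[j]) match else mismatch
      score.tab[i + 1, j + 1] <- max(
        score.tab[i, j] + s,       # align x[i] with y[j]
        score.tab[i, j + 1] - d,   # gap in y
        score.tab[i + 1, j] - d    # gap in x
      )
    }
  }
  score.tab[n + 1, m + 1]          # final score: lower right corner
}

# Example call on the two sequences used in these notes
nw.score("HEAGAWGHEE", "AWHEAE")
```

A full affine version would track whether each cell was reached by opening or extending a gap, which needs additional tables; that is omitted here to keep the sketch short.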
Alignment algorithms
- Global: Needleman-Wunsch. Find the optimal alignment for entire sequences (prev. slide).
- Local: Smith-Waterman. Find the optimal alignment for subsequences.
- Repeated matches: allow for starting over sequences (find motifs in long sequences).
- Overlap matches: allow for one sequence to contain or overlap the other (for comparing fragments).
- Heuristic: BLAST, FASTA. For comparing a single sequence against a large database of sequences.

Compare global and local alignments
Sequence 1: HEAGAWGHEE
Sequence 2: AWHEAE

    Global Pairwise Alignment (1 of 1)
    pattern: [1] HEAGAWGHE-E
    subject: [1] ---AW-HEAE
    score: 23

    Local Pairwise Alignment (1 of 1)
    pattern: [5] AWGHE-E
    subject: [2] AW-HEAE
    score:

Simple pairwise alignment in R

    library(Biostrings)

    # Define sequences
    seq1 <- "HEAGAWGHEE"
    seq2 <- "AWHEAE"

    # perform global alignment
    g.align <- pairwiseAlignment(seq1, seq2,
                 substitutionMatrix='BLOSUM50',
                 gapOpening=-4, gapExtension=-1,
                 type='global')
    g.align

    # perform local alignment
    l.align <- pairwiseAlignment(seq1, seq2,
                 substitutionMatrix='BLOSUM50',
                 gapOpening=-4, gapExtension=-1,
                 type='local')
    l.align

Look at a bigger example (slide 20)
The pairseqsim package (not in current Bioconductor) has a companion file (ex.fasta) with sequence data for 67 protein sequences in FASTA format:

    >At1g01010 NAC domain protein, putative
    MEDQVGFGFRNDEELVGHYLRNKIEGNTSRDVEVAISEVNICSYDWNLRFQSKYKSRD...
    VISWIILVG
    >At1g01020 unknown protein
    MAASEHRCVGCGFRVKSLFIQYSGNIRLMKCGNCKEVADEYIECERMIIFIDLILHRK
    VYRHVLYNAINATVNIQHLLWKLVFAYLLLDCYRSLLLRKSDEESSFSDSVLLSIKVR
    SFLFNGLN
    >At1g01030 DNA-binding protein, putative
    MDLSLATTTTSSDQEQDRDQELTSNIGASSSSGSGNNNNLMMMIEKEHMFDKVV...
    EESWLVRGEIGASSSSSSALRLNLSTDHDDDNDDGDDGDDDQFAKKGKSSLSLNFN
    >At1g01040 CAF protein
    MVMEDEREATIKSYWLDACEDISCDLIDDLVSEFDSSVAVNESTDENGVINDFFGGI...
    DKDRKRARVCSYQSERSNLSGRGHVNNSREGDRFMNRKRTRNWDEAGNNKKKRECNNYRR

http://
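The FASTA layout shown above (a header line starting with `>` followed by one or more sequence lines) can be read with a few lines of base R. This is a minimal sketch for illustration; the two records are made up rather than taken from ex.fasta, and a real file would be read with `readLines()` on its path instead of the inline vector.

```r
# Two short made-up records in FASTA format (hypothetical data)
fasta.lines <- c(">seq1 first toy record",
                 "MEDQVG",
                 "FGFR",
                 ">seq2 second toy record",
                 "MAASEH")

# Header lines start with ">"; every other line continues the current record
is.header <- grepl("^>", fasta.lines)
record.id <- cumsum(is.header)

# Paste the sequence lines of each record together
seqs <- tapply(fasta.lines[!is.header], record.id[!is.header],
               paste, collapse = "")
seqs <- as.vector(seqs)

# Use the header text (without ">") as the sequence names
names(seqs) <- sub("^>", "", fasta.lines[is.header])

nchar(seqs)   # length of each sequence
```

The sketch assumes every header is followed by at least one sequence line; an empty record would shift the names.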
Bigger example
For a given sequence (subject), "At1g01010 NAC domain protein, putative", find the most similar sequence in a list (pattern): "At1g01190 cytochrome P450, putative"

    Global Pairwise Alignment (1 of 1)
    pattern: [1] MRTEIESLWVF-----ALASKFNIYMQQHFASLL---VAIAITWFTITI...
    subject: [1] MEDQVG--FGFRNDEELVGH---YLRNKIEGNTSRDVEVAIS EVNIC...
    score: 313

R code (slide 22)

    # read in data in FASTA format
    f1 <- "C://folder//ex.fasta"   # file saved from website (slide 20)
    # readAAStringSet() is the current name of the older read.AAStringSet()
    ff <- readAAStringSet(f1, format="fasta")

    # compare first sequence (subject) with the others (pattern)
    sub <- ff[1]
    names(sub)   # "At1g01010 NAC domain protein, putative"
    pat <- ff[2:length(ff)]

    # get scores of all global alignments
    s <- pairwiseAlignment(pat, sub,
           substitutionMatrix='PAM250',
           gapOpening=-4, gapExtension=-1,
           type='global', scoreOnly=TRUE)
    hist(s, main=c('global alignment scores with', names(sub)))

    # look at best alignment
    k <- which.max(s)
    names(pat[k])   # "At1g01190 cytochrome P450, putative"
    pairwiseAlignment(pat[k], sub,
      substitutionMatrix='PAM250',
      gapOpening=-4, gapExtension=-1,
      type='global')

(names refer to gene name or locus)

Phylogenetic trees: intro & motivation
- Phylogeny: relationship among species
- Phylogenetic tree: visualization of phylogeny (usually a dendrogram)
How can we do this here?
- Consider multiple sequences (maybe from different species)
- Similar sequences are called homologues
  - descended from common ancestor sequence?
  - similar function?
- Want to visualize these relationships

Quick review of agglomerative clustering
- define distance between points
- each point (sequence here) starts as its own cluster
- find closest clusters and merge them
- Linkage: how to define distance between new cluster and existing clusters
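The agglomerative steps reviewed above can be traced on a tiny example with base R's `hclust()`. The four items and their distances are made up for illustration; `method = "average"` is the linkage used later in these notes.

```r
# Toy symmetric distance matrix for four items (hypothetical values)
items <- c("a", "b", "c", "d")
dist.mat <- matrix(c( 0, 2, 6, 10,
                      2, 0, 5,  9,
                      6, 5, 0,  4,
                     10, 9, 4,  0),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(items, items))

# Each item starts as its own cluster; the closest clusters are merged in turn
hc <- hclust(as.dist(dist.mat), method = "average")

hc$merge    # which clusters were joined at each step (negative = single item)
hc$height   # distance at which each merge happened

# Cut the tree into two clusters
groups <- cutree(hc, k = 2)
```

Here a and b merge first (distance 2), then c and d (distance 4), and the two pairs join last at the average of the four between-pair distances. Calling `plot(hc)` draws the corresponding dendrogram.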
Recall linkage methods (a few)
Let $i$, $j$, $k$ be clusters, $d_{ij}$ be the distance between clusters $i$ and $j$, $d_{k,ij}$ be the distance between $k$ and the new cluster $ij$, and $n_i$ be the number of points in cluster $i$.
- Single (nearest neighbor): $d_{k,ij} = \min(d_{ki}, d_{kj})$
- Average: $d_{k,ij} = (d_{ki} + d_{kj}) / 2$
- Ward: $d_{k,ij} = \frac{(n_i + n_k) d_{ki} + (n_j + n_k) d_{kj} - n_k d_{ij}}{n_i + n_j + n_k}$
- UPGMA: $d_{k,ij} = \frac{n_i d_{ki} + n_j d_{kj}}{n_i + n_j}$

Defining distance between sequences $i$ & $j$
Why not Euclidean, Pearson, etc.? Sequences are not points in space.
Could use (after pairwise alignment):
- 1 - normalized score {score (or 0) divided by smaller self-score}
- 1 - %identity, based on length of shorter sequence
- 1 - %similarity
Making use of models for residue substitution (for DNA):
- Let $f$ = fraction of sites in pairwise alignment where residues differ = 1 - %identity
- Jukes-Cantor distance: $d_{ij} = -\frac{3}{4} \log\left(1 - \frac{4f}{3}\right)$

Visualize relationships among 11 sequences from ex.fasta file

    # Function to get phylogenetic distance matrix for multiple sequences
    # -- don't worry about syntax here; just see next slide for usage
    get.phylo.dist <- function(seqs, subM='BLOSUM62', open=-4, ext=-1, type='local')
    {
      # Get matrix of pairwise local alignment scores
      num.seq <- length(seqs)
      s.mat <- matrix(ncol=num.seq, nrow=num.seq)
      for(i in 1:num.seq)
      {
        for(j in i:num.seq)
        {
          s.mat[i,j] <- s.mat[j,i] <- pairwiseAlignment(seqs[i], seqs[j],
                          substitutionMatrix=subM,
                          gapOpening=open, gapExtension=ext,
                          type=type, scoreOnly=TRUE)
        }
      }

      # Convert scores to normalized scores
      norm.mat <- matrix(ncol=num.seq, nrow=num.seq)
      for(i in 1:num.seq)
      {
        for(j in i:num.seq)
        {
          min.self <- min(s.mat[i,i], s.mat[j,j])
          norm.mat[i,j] <- norm.mat[j,i] <- s.mat[i,j]/min.self
        }
        norm.mat[i,i] <- 0   # diagonal is ignored by as.dist() below
      }

      # Return distance matrix
      colnames(norm.mat) <- rownames(norm.mat) <- substr(names(seqs),1,9)
      return(as.dist(1-norm.mat))
    }
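The Jukes-Cantor distance defined above, $d_{ij} = -\frac{3}{4}\log(1 - 4f/3)$, is easy to compute once $f$ is known. The sketch below is a base-R illustration under the assumption that the two DNA strings are already aligned and of equal length; the function names and the example sequences are hypothetical.

```r
# Jukes-Cantor distance from f, the fraction of aligned sites that differ
jukes.cantor <- function(f) {
  # the logarithm is undefined once f reaches 3/4
  stopifnot(all(f >= 0), all(f < 0.75))
  -0.75 * log(1 - 4 * f / 3)
}

# Fraction of differing sites in two equal-length, already-aligned strings
frac.diff <- function(a, b) {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  stopifnot(length(a) == length(b))
  mean(a != b)
}

# Made-up example: the last two of ten sites differ
f <- frac.diff("ACGTACGTAC", "ACGTACGTTT")
d <- jukes.cantor(f)
```

The distance is always at least as large as $f$, because the model allows for multiple substitutions at the same site that a simple count of differences cannot see.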
R code for phylogenetic trees from pairwise distances

    # Choose sequences
    seqs <- ff[50:60]   # recall ff object from slide 22

    # Phylogenetic tree
    dmat <- get.phylo.dist(seqs, subM='BLOSUM62', type='local')
    plot(hclust(dmat, method="average"), main='Phylogenetic Tree',
         xlab='Normalized Score')

    # heatmap representation
    library(cluster)
    library(RColorBrewer)
    hmcol <- colorRampPalette(brewer.pal(10,"PuOr"))(256)
    hclust.ave <- function(d){hclust(d, method="average")}
    heatmap(as.matrix(dmat), sym=TRUE, col=hmcol,
            cexRow=4, cexCol=1, hclustfun=hclust.ave)

Aside: visualizing sequence content

    # count each letter in the first sequence and highlight one of them
    tab <- table(strsplit(as.character(ff[1]), ""))
    use.col <- rep('yellow', length(tab))
    t <- names(tab)=='S'
    use.col[t] <- 'blue'
    barplot(tab, col=use.col, main=names(ff[1]))

Probably more useful for: assessing C-G counts in DNA sequences

    library(affy); library(hgu95av2.db); library(annotate)
    GI <- as.list(hgu95av2ACCNUM)
    n.GI <- names(GI)
    t <- n.GI=="1950_s_at"
    seq <- getSEQ(GI[t])
    tab <- table(strsplit(seq, ""))
    use.col <- rep('yellow', length(tab))
    t <- names(tab)=='G'
    use.col[t] <- 'blue'
    barplot(tab, col=use.col,
            main="sequence content of 1950_s_at on hgu95av2")

Summary
Look at sequence similarity to find functional similarity?
Pairwise alignment basics:
- Scoring matrix: BLOSUM, PAM, etc.
- Alignment algorithm: global, local, etc.
Coming up:
- searching online databases (BLAST)
- multiple alignments
- pattern (motif) finding
- using sequencing to measure gene expression
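The C-G count mentioned in the aside can be computed without any Bioconductor package. The following base-R sketch uses a made-up DNA string; `base.counts` and `gc.fraction` are hypothetical helper names, and letters other than A, C, G, T are simply ignored.

```r
# Count the four bases in a DNA string (case-insensitive)
base.counts <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  table(factor(bases, levels = c("A", "C", "G", "T")))
}

# Fraction of counted bases that are C or G
gc.fraction <- function(dna) {
  tab <- base.counts(dna)
  as.numeric(tab["C"] + tab["G"]) / sum(tab)
}

dna <- "gattacagcc"          # made-up sequence
tab <- base.counts(dna)
gc  <- gc.fraction(dna)
```

The resulting `tab` can be passed straight to `barplot()`, as in the slide code above.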
C-wave event automated regstraton usng a nonlnear global search method Shuangquan Chen*,1, Xang-Yang L 1,2 and Xaomng L 1 1 CNPC Keylab of Geophyscal Prospectng, Chna Unversty of Petroleum, Bejng, 102249,
More informationLecture Nov
Lecture 18 Nov 07 2008 Revew Clusterng Groupng smlar obects nto clusters Herarchcal clusterng Agglomeratve approach (HAC: teratvely merge smlar clusters Dfferent lnkage algorthms for computng dstances
More informationOnline Classification: Perceptron and Winnow
E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng
More informationI529: Machine Learning in Bioinformatics (Spring 2017) Markov Models
I529: Machne Learnng n Bonformatcs (Sprng 217) Markov Models Yuzhen Ye School of Informatcs and Computng Indana Unversty, Bloomngton Sprng 217 Outlne Smple model (frequency & profle) revew Markov chan
More informationThe Second Anti-Mathima on Game Theory
The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player
More informationSupplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso
Supplement: Proofs and Techncal Detals for The Soluton Path of the Generalzed Lasso Ryan J. Tbshran Jonathan Taylor In ths document we gve supplementary detals to the paper The Soluton Path of the Generalzed
More informationLogistic regression with one predictor. STK4900/ Lecture 7. Program
Logstc regresson wth one redctor STK49/99 - Lecture 7 Program. Logstc regresson wth one redctor 2. Maxmum lkelhood estmaton 3. Logstc regresson wth several redctors 4. Devance and lkelhood rato tests 5.
More informationFoundations of Arithmetic
Foundatons of Arthmetc Notaton We shall denote the sum and product of numbers n the usual notaton as a 2 + a 2 + a 3 + + a = a, a 1 a 2 a 3 a = a The notaton a b means a dvdes b,.e. ac = b where c s an
More informationConfidence intervals for weighted polynomial calibrations
Confdence ntervals for weghted olynomal calbratons Sergey Maltsev, Amersand Ltd., Moscow, Russa; ur Kalambet, Amersand Internatonal, Inc., Beachwood, OH e-mal: kalambet@amersand-ntl.com htt://www.chromandsec.com
More informationNegative Binomial Regression
STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...
More informationAn Experiment/Some Intuition (Fall 2006): Lecture 18 The EM Algorithm heads coin 1 tails coin 2 Overview Maximum Likelihood Estimation
An Experment/Some Intuton I have three cons n my pocket, 6.864 (Fall 2006): Lecture 18 The EM Algorthm Con 0 has probablty λ of heads; Con 1 has probablty p 1 of heads; Con 2 has probablty p 2 of heads
More informationSupplementary Material for Spectral Clustering based on the graph p-laplacian
Sulementary Materal for Sectral Clusterng based on the grah -Lalacan Thomas Bühler and Matthas Hen Saarland Unversty, Saarbrücken, Germany {tb,hen}@csun-sbde May 009 Corrected verson, June 00 Abstract
More informationAn Introduction to Morita Theory
An Introducton to Morta Theory Matt Booth October 2015 Nov. 2017: made a few revsons. Thanks to Nng Shan for catchng a typo. My man reference for these notes was Chapter II of Bass s book Algebrac K-Theory
More informationSupport Vector Machines. Vibhav Gogate The University of Texas at dallas
Support Vector Machnes Vbhav Gogate he Unversty of exas at dallas What We have Learned So Far? 1. Decson rees. Naïve Bayes 3. Lnear Regresson 4. Logstc Regresson 5. Perceptron 6. Neural networks 7. K-Nearest
More informationSection 8.3 Polar Form of Complex Numbers
80 Chapter 8 Secton 8 Polar Form of Complex Numbers From prevous classes, you may have encountered magnary numbers the square roots of negatve numbers and, more generally, complex numbers whch are the
More informationAdvanced Topics in Optimization. Piecewise Linear Approximation of a Nonlinear Function
Advanced Tocs n Otmzaton Pecewse Lnear Aroxmaton of a Nonlnear Functon Otmzaton Methods: M8L Introducton and Objectves Introducton There exsts no general algorthm for nonlnear rogrammng due to ts rregular
More informationMODELING TRAFFIC LIGHTS IN INTERSECTION USING PETRI NETS
The 3 rd Internatonal Conference on Mathematcs and Statstcs (ICoMS-3) Insttut Pertanan Bogor, Indonesa, 5-6 August 28 MODELING TRAFFIC LIGHTS IN INTERSECTION USING PETRI NETS 1 Deky Adzkya and 2 Subono
More information